Update

2026-01-29 17:58:07 +08:00
parent b74bacb0b5
commit cf679b34bf
3 changed files with 449 additions and 3 deletions
--- a/Docs/DevLogs/Day13.md
+++ b/Docs/DevLogs/Day13.md
@@ -1,4 +1,4 @@
-# Day 13 - Voice Cloning Integration Complete
+# Day 13 - Voice Cloning Integration + Subtitles

 **Date**: 2026-01-29

@@ -276,4 +276,156 @@ pm2 logs vigent2-qwen-tts --lines 50
 - [task_complete.md](../task_complete.md) - Task overview
 - [Day12.md](./Day12.md) - iOS compatibility and Qwen3-TTS deployment
 - [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide
+- [SUBTITLE_DEPLOY.md](../SUBTITLE_DEPLOY.md) - Subtitle feature deployment guide
 - [DEPLOY_MANUAL.md](../DEPLOY_MANUAL.md) - Full deployment manual
+
+---
+
+## 🎬 Per-Character Highlighted Subtitles + Opening Title
+
+### Background
+
+To improve video quality, this update adds per-character highlighted subtitles (karaoke-style) and an opening title.
+
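+### Technical Approach
+
+| Component | Technology | Notes |
+|------|------|------|
+| Subtitle alignment | **faster-whisper** | Generates character-level timestamps |
+| Video rendering | **Remotion** | React-based video composition framework |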
+
+### Architecture
+
+```
+Original pipeline:
+  Text → EdgeTTS → Audio → LatentSync → FFmpeg compositing → Final video
+
+New pipeline:
+  Text → EdgeTTS → Audio ─┬→ LatentSync → Lip-synced video ──┐
+                          └→ faster-whisper → Subtitle JSON ─┴→ Remotion compositing → Final video
+```
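
The fan-out in the new pipeline can be sketched as below. This is only an illustration under assumptions: `lip_sync`, `align_subtitles`, and `build_inputs` are hypothetical stand-ins for the real services, and the log does not say whether the two branches actually run concurrently or one after the other.

```python
import asyncio


async def lip_sync(audio: str) -> str:
    """Hypothetical stand-in for the LatentSync branch; returns a video path."""
    await asyncio.sleep(0)  # placeholder for the real work
    return audio.replace(".wav", "_lipsync.mp4")


async def align_subtitles(audio: str) -> str:
    """Hypothetical stand-in for the faster-whisper branch; returns a JSON path."""
    await asyncio.sleep(0)  # placeholder for the real work
    return audio.replace(".wav", "_captions.json")


async def build_inputs(audio: str) -> tuple:
    """Run both branches on the same audio and collect what the compositor needs."""
    video, captions = await asyncio.gather(lip_sync(audio), align_subtitles(audio))
    return video, captions


result = asyncio.run(build_inputs("speech.wav"))
```

Both branches depend only on the audio file, which is why they can be expressed as independent tasks. If both models compete for the same GPU, running them sequentially may be the safer choice.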
+
+### New Backend Services
+
+#### 1. Subtitle Service (`whisper_service.py`)
+
+Generates character-level timestamps with faster-whisper:
+
+```python
+from faster_whisper import WhisperModel
+
+class WhisperService:
+    def __init__(self, model_size="large-v3", device="cuda"):
+        self.model = WhisperModel(model_size, device=device)
+
+    async def align(self, audio_path: str, text: str, output_path: str):
+        segments, info = self.model.transcribe(audio_path, word_timestamps=True)
+        # Split each word into single characters and linearly interpolate the timestamps
+        result = {"segments": [...]}
+        # Save to JSON
+```
+
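+**Character-splitting algorithm**: faster-whisper returns word-level timestamps for Chinese, so the service splits each word into single characters and linearly interpolates their timestamps: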
+
+```python
+# Input: {"word": "大家好", "start": 0.0, "end": 0.9}
+# Output:
+[
+  {"word": "大", "start": 0.0, "end": 0.3},
+  {"word": "家", "start": 0.3, "end": 0.6},
+  {"word": "好", "start": 0.6, "end": 0.9}
+]
+```
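
A minimal sketch of the split-and-interpolate step, assuming each word's duration is divided evenly among its characters. The helper name `split_word` and the rounding to milliseconds are illustrative choices, not taken from `whisper_service.py`.

```python
from __future__ import annotations


def split_word(word: dict) -> list[dict]:
    """Split one word-level entry into per-character entries.

    The word's time span is divided into equal slices, one per character.
    """
    chars = list(word["word"].strip())
    if not chars:
        return []
    step = (word["end"] - word["start"]) / len(chars)
    return [
        {
            "word": ch,
            # Round to milliseconds to hide floating-point noise.
            "start": round(word["start"] + i * step, 3),
            "end": round(word["start"] + (i + 1) * step, 3),
        }
        for i, ch in enumerate(chars)
    ]


pieces = split_word({"word": "大家好", "start": 0.0, "end": 0.9})
```

Even division is a simplification: characters are not spoken at a uniform rate, so the highlight may drift slightly inside long words.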
+
+#### 2. Remotion Rendering Service (`remotion_service.py`)
+
+Invokes Remotion to render the subtitles and title:
+
+```python
+class RemotionService:
+    async def render(self, video_path, output_path, captions_path, title, ...):
+        cmd = f"npx ts-node render.ts --video {video_path} --output {output_path} ..."
+        # Run the render
+```
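
One way to run such a command is to pass it as an argument list instead of a single formatted string, so that paths containing spaces or shell metacharacters stay intact. The sketch below is an assumption about how this could look, not the code in `remotion_service.py`. `build_render_argv` and `run_render` are hypothetical names, and only the `--video` and `--output` flags shown above are used; any further flags go through `extra_args` unchanged.

```python
import asyncio
from typing import Sequence


def build_render_argv(video_path: str, output_path: str,
                      extra_args: Sequence[str] = ()) -> list:
    """Build the render command as an argument list (no shell quoting needed)."""
    return ["npx", "ts-node", "render.ts",
            "--video", video_path, "--output", output_path, *extra_args]


async def run_render(argv: Sequence[str], cwd: str) -> None:
    """Run the command without a shell and raise if it exits non-zero."""
    proc = await asyncio.create_subprocess_exec(
        *argv, cwd=cwd,
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    _, stderr = await proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(f"render failed: {stderr.decode(errors='replace')}")


# A path with a space stays a single argument.
argv = build_render_argv("/data/my video.mp4", "/data/out.mp4")
```

Raising on a non-zero exit code gives the caller a clear signal for switching to another compositing path.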
+
+### Remotion Project Structure
+
+```
+remotion/
+├── package.json              # Node.js dependencies
+├── render.ts                 # Server-side render script
+└── src/
+    ├── Video.tsx             # Main video component
+    ├── components/
+    │   ├── Title.tsx         # Opening title (fade in/out)
+    │   ├── Subtitles.tsx     # Per-character highlighted subtitles
+    │   └── VideoLayer.tsx    # Video layer
+    └── utils/
+        └── captions.ts       # Caption data types
+```
+
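+### Frontend UI
+
+Added a settings section for the title and subtitles:
+
+| Feature | Notes |
+|------|------|
+| Opening title input | Optional; shown for 3 seconds at the start of the video |
+| Subtitle toggle | On by default; can be turned off |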
+
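+### Problems Encountered and Fixes
+
+#### Problem 1: `fs` module error
+
+**Symptom**: Remotion bundling fails with `fs.js doesn't exist`
+
+**Cause**: the `loadCaptions` function in `captions.ts` uses the Node.js `fs` module
+
+**Fix**: removed the unused `loadCaptions` function
+
+#### Problem 2: Video file fails to load
+
+**Symptom**: the local video cannot be read through the `file://` protocol
+
+**Fix**:
+1. `render.ts` points `publicDir` at the video's directory
+2. `VideoLayer.tsx` loads the video with `staticFile()`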
+
+```typescript
+// render.ts
+const publicDir = path.dirname(path.resolve(options.videoPath));
+const bundleLocation = await bundle({
+  entryPoint: path.resolve(__dirname, './src/index.ts'),
+  publicDir,  // key setting
+});
+
+// VideoLayer.tsx
+const videoUrl = staticFile(videoSrc);
+```
+
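+### Test Results
+
+- ✅ faster-whisper subtitle alignment succeeded (~1 s)
+- ✅ Remotion render succeeded (~10 s)
+- ✅ Per-character subtitle highlighting works
+- ✅ Opening title fade in/out works
+- ✅ Fallback works (falls back to FFmpeg when Remotion fails)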
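
The fallback in the last item can be sketched as a try/except around the primary renderer. This is a hedged illustration only: `compose_with_fallback` and the two stand-in callables are hypothetical, and the real wiring in `videos.py` is not shown in this log.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def compose_with_fallback(primary: Callable[[], str],
                          fallback: Callable[[], str]) -> str:
    """Try the primary compositor; on any error, log it and use the fallback."""
    try:
        return primary()
    except Exception as exc:
        logger.warning("Remotion render failed, falling back to FFmpeg: %s", exc)
        return fallback()


def failing_remotion() -> str:
    """Stand-in for a Remotion render that fails."""
    raise RuntimeError("bundle error")


def ffmpeg_compose() -> str:
    """Stand-in for the original FFmpeg compositing step."""
    return "final_ffmpeg.mp4"


output = compose_with_fallback(failing_remotion, ffmpeg_compose)
```

In this sketch the fallback output has no subtitles or title, so logging the failure matters: otherwise a silently degraded video is hard to diagnose.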
+
+---
+
+## 📁 Files Changed Today (Complete)
+
+| File | Change | Notes |
+|------|----------|------|
+| `models/Qwen3-TTS/qwen_tts_server.py` | Added | Qwen3-TTS HTTP inference service |
+| `run_qwen_tts.sh` | Added | PM2 startup script (repo root) |
+| `backend/app/services/voice_clone_service.py` | Added | Voice cloning service (calls over HTTP) |
+| `backend/app/services/whisper_service.py` | Added | Subtitle alignment service (faster-whisper) |
+| `backend/app/services/remotion_service.py` | Added | Remotion rendering service |
+| `backend/app/api/ref_audios.py` | Added | Reference audio management API |
+| `backend/app/api/videos.py` | Modified | Integrates subtitles and title |
+| `backend/app/main.py` | Modified | Registers the ref-audios router |
+| `backend/requirements.txt` | Modified | Adds the faster-whisper dependency |
+| `remotion/` | Added | Remotion video rendering project |
+| `frontend/src/app/page.tsx` | Modified | TTS mode selector + title/subtitle UI |
+| `Docs/SUBTITLE_DEPLOY.md` | Added | Subtitle feature deployment doc |