Update
@@ -1,4 +1,4 @@
# Day 13 - Voice Cloning Integration Complete
# Day 13 - Voice Cloning Integration + Subtitles

**Date**: 2026-01-29

@@ -276,4 +276,156 @@ pm2 logs vigent2-qwen-tts --lines 50
- [task_complete.md](../task_complete.md) - Task overview
- [Day12.md](./Day12.md) - iOS compatibility and Qwen3-TTS deployment
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide
- [SUBTITLE_DEPLOY.md](../SUBTITLE_DEPLOY.md) - Subtitle feature deployment guide
- [DEPLOY_MANUAL.md](../DEPLOY_MANUAL.md) - Full deployment manual

---

## 🎬 Per-Character Highlighted Subtitles + Opening Title

### Background

To improve video quality, this update adds per-character highlighted subtitles (a karaoke-style effect) and an opening title.

### Technical Approach

| Component | Technology | Description |
|------|------|------|
| Subtitle alignment | **faster-whisper** | Generates character-level timestamps |
| Video rendering | **Remotion** | React-based video composition framework |

### Architecture

```
Previous pipeline:
Text → EdgeTTS → audio → LatentSync → FFmpeg compositing → final video

New pipeline:
Text → EdgeTTS → audio ─┬→ LatentSync → lip-synced video ──┐
                        └→ faster-whisper → subtitle JSON ─┴→ Remotion compositing → final video
```
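
The new pipeline branches after the audio step: lip-sync and alignment both need only the audio, while compositing needs both results. Below is a minimal sketch of that fan-out/fan-in shape, assuming the two branches can run concurrently (the log does not say whether the backend actually does so). `run_lipsync`, `run_alignment`, and `run_pipeline` are hypothetical stand-ins, not the project's functions.

```python
import asyncio


async def run_lipsync(audio_path: str) -> str:
    # Hypothetical stand-in for the LatentSync step; returns a fake output path.
    await asyncio.sleep(0)
    return audio_path.replace(".wav", "_lipsync.mp4")


async def run_alignment(audio_path: str) -> str:
    # Hypothetical stand-in for the faster-whisper step; returns a fake output path.
    await asyncio.sleep(0)
    return audio_path.replace(".wav", "_captions.json")


async def run_pipeline(audio_path: str) -> dict:
    # Both branches depend only on the audio, so they are awaited together.
    video_path, captions_path = await asyncio.gather(
        run_lipsync(audio_path),
        run_alignment(audio_path),
    )
    # The compositing step (Remotion in the diagram) would consume both results.
    return {"video": video_path, "captions": captions_path}


print(asyncio.run(run_pipeline("demo.wav")))
```

`asyncio.gather` returns results in the order the coroutines were passed, which is why the tuple unpacking above is safe.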

### New Backend Services

#### 1. Subtitle service (`whisper_service.py`)

Generates character-level timestamps using faster-whisper:

```python
from faster_whisper import WhisperModel

class WhisperService:
    def __init__(self, model_size="large-v3", device="cuda"):
        self.model = WhisperModel(model_size, device=device)

    async def align(self, audio_path: str, text: str, output_path: str):
        segments, info = self.model.transcribe(audio_path, word_timestamps=True)
        # Split each word into single characters, linearly interpolating the timestamps
        result = {"segments": [...]}
        # Save to JSON
```

**Character-splitting algorithm**: faster-whisper returns word-level timestamps for Chinese, so the system splits each word into single characters and linearly interpolates the timestamps, giving each character an equal share of the word's duration:

```python
# Input: {"word": "大家好", "start": 0.0, "end": 0.9}
# Output:
[
    {"word": "大", "start": 0.0, "end": 0.3},
    {"word": "家", "start": 0.3, "end": 0.6},
    {"word": "好", "start": 0.6, "end": 0.9}
]
```
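
The splitting step can be sketched as a small pure function. This is an illustration of the linear interpolation described above, not the code in `whisper_service.py`. The function name `split_word_to_chars`, the whitespace filtering, and the rounding to milliseconds are all assumptions.

```python
def split_word_to_chars(word: str, start: float, end: float) -> list:
    """Split one word-level entry into per-character entries with evenly spaced times."""
    # Assumption: whitespace carries no timing of its own, so it is dropped.
    chars = [ch for ch in word if not ch.isspace()]
    if not chars:
        return []
    # Each character gets an equal share of the word's duration.
    step = (end - start) / len(chars)
    return [
        {
            "word": ch,
            # Rounding to 3 decimals (milliseconds) is an assumption to avoid float noise.
            "start": round(start + i * step, 3),
            "end": round(start + (i + 1) * step, 3),
        }
        for i, ch in enumerate(chars)
    ]


print(split_word_to_chars("大家好", 0.0, 0.9))
```

Because each character's `end` is computed with the same expression as the next character's `start`, adjacent entries share a boundary and the highlight never has gaps within a word.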

#### 2. Remotion render service (`remotion_service.py`)

Invokes Remotion to render the subtitles and title:

```python
class RemotionService:
    async def render(self, video_path, output_path, captions_path, title, ...):
        cmd = f"npx ts-node render.ts --video {video_path} --output {output_path} ..."
        # Run the render
```
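
One way to run such a command without going through a shell is to pass the arguments as a list, so that paths containing spaces or quotes need no escaping. The sketch below assumes only the `--video` and `--output` flags shown above; any other flags are passed through via a hypothetical `extra_args` parameter rather than guessed. `build_render_command` and `run_command` are illustrative names, not the project's API.

```python
import asyncio
from typing import Optional, Sequence


def build_render_command(
    video_path: str,
    output_path: str,
    extra_args: Optional[Sequence[str]] = None,
) -> list:
    # Only --video and --output appear in the log; further flags are
    # passed through unchanged via extra_args.
    cmd = ["npx", "ts-node", "render.ts", "--video", video_path, "--output", output_path]
    cmd.extend(extra_args or [])
    return cmd


async def run_command(cmd: Sequence[str], cwd: Optional[str] = None) -> str:
    # create_subprocess_exec receives each argument separately, so no shell parsing occurs.
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        # Surface stderr so the caller can log it or trigger a fallback.
        raise RuntimeError(
            f"command failed ({proc.returncode}): {stderr.decode(errors='replace')}"
        )
    return stdout.decode(errors="replace")


print(build_render_command("/tmp/my video.mp4", "/tmp/out.mp4"))
```

Raising on a non-zero exit code gives the caller a single place to decide what to do when the render fails.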

### Remotion Project Structure

```
remotion/
├── package.json              # Node.js dependencies
├── render.ts                 # Server-side render script
└── src/
    ├── Video.tsx             # Main video component
    ├── components/
    │   ├── Title.tsx         # Opening title (fade in/out)
    │   ├── Subtitles.tsx     # Per-character highlighted subtitles
    │   └── VideoLayer.tsx    # Video layer
    └── utils/
        └── captions.ts       # Subtitle data types
```

### Frontend UI

A new settings block for the title and subtitles:

| Feature | Description |
|------|------|
| Opening title input | Optional; shown for 3 seconds at the start of the video |
| Subtitle toggle | On by default; can be turned off |

### Problems Encountered and Fixes

#### Problem 1: `fs` module error

**Symptom**: Remotion bundling fails with `fs.js doesn't exist`

**Cause**: `captions.ts` contained a `loadCaptions` function that used the Node.js `fs` module, which is not available in the browser bundle that Remotion builds

**Fix**: Removed the unused `loadCaptions` function

#### Problem 2: Video file cannot be read

**Symptom**: The local video cannot be read through the `file://` protocol

**Fix**:
1. `render.ts` points `publicDir` at the directory containing the video
2. `VideoLayer.tsx` loads the video with `staticFile()`

```typescript
// render.ts
const publicDir = path.dirname(path.resolve(options.videoPath));
const bundleLocation = await bundle({
  entryPoint: path.resolve(__dirname, './src/index.ts'),
  publicDir,  // key setting
});

// VideoLayer.tsx
const videoUrl = staticFile(videoSrc);
```

### Test Results

- ✅ faster-whisper subtitle alignment succeeded (~1 s)
- ✅ Remotion render succeeded (~10 s)
- ✅ Per-character subtitle highlighting works as expected
- ✅ Opening title fades in and out correctly
- ✅ Fallback works (falls back to FFmpeg when Remotion fails)

---

## 📁 Files Changed Today (Complete)

| File | Change type | Description |
|------|----------|------|
| `models/Qwen3-TTS/qwen_tts_server.py` | Added | Qwen3-TTS HTTP inference service |
| `run_qwen_tts.sh` | Added | PM2 launch script (project root) |
| `backend/app/services/voice_clone_service.py` | Added | Voice cloning service (called over HTTP) |
| `backend/app/services/whisper_service.py` | Added | Subtitle alignment service (faster-whisper) |
| `backend/app/services/remotion_service.py` | Added | Remotion render service |
| `backend/app/api/ref_audios.py` | Added | Reference audio management API |
| `backend/app/api/videos.py` | Modified | Integrates subtitle and title features |
| `backend/app/main.py` | Modified | Registers the ref-audios router |
| `backend/requirements.txt` | Modified | Adds the faster-whisper dependency |
| `remotion/` | Added | Remotion video rendering project |
| `frontend/src/app/page.tsx` | Modified | TTS mode selection + title and subtitle UI |
| `Docs/SUBTITLE_DEPLOY.md` | Added | Subtitle feature deployment doc |