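# ViGent2 Subtitle and Title Deployment Guide

This document describes how to deploy two ViGent2 features: word-by-word highlighted subtitles and an opening title.

## Feature Overview

| Feature | Description |
|------|------|
| **Word-by-word highlighted subtitles** | faster-whisper generates word-level timestamps; Remotion renders a karaoke-style highlight |
| **Opening title** | A title shown at the start of the video with a fade-in/fade-out animation, disappearing after a few seconds |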

## Technical Architecture

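```
Original pipeline:
  text → EdgeTTS → audio → LatentSync → FFmpeg compositing → final video

New pipeline (single material):
  text → EdgeTTS/Qwen3-TTS/pre-generated voiceover → audio ─┬→ LatentSync → lip-synced video ──┐
                                                            └→ faster-whisper → subtitle JSON ─┴→ Remotion compositing → final video

New pipeline (multiple materials):
  audio → materials concatenated per custom_assignments → LatentSync (single inference pass) → lip-synced video ─┐
  audio → faster-whisper → subtitle JSON ────────────────────────────────────────────────────────────────────────┴→ Remotion compositing → final video
```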
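The "subtitle JSON" handed from faster-whisper to Remotion is essentially word timestamps grouped into display lines. The following is a minimal sketch, assuming words arrive as `(text, start, end)` tuples in seconds and assuming a hypothetical schema (`text`, `start`, `end`, `words` per line); the actual format produced by `whisper_service.py` and consumed by `captions.ts` may differ. The names `group_words_into_lines` and `max_chars` are illustrative only.

```python
import json

Word = tuple[str, float, float]  # (text, start_seconds, end_seconds)


def _close_line(words: list[dict]) -> dict:
    # A line runs from its first word's start to its last word's end.
    # Words are joined without separators, which suits Chinese text.
    return {
        "text": "".join(w["word"] for w in words),
        "start": words[0]["start"],
        "end": words[-1]["end"],
        "words": words,
    }


def group_words_into_lines(words: list[Word], max_chars: int = 12) -> list[dict]:
    """Group word-level timestamps into caption lines (hypothetical schema)."""
    lines: list[dict] = []
    current: list[dict] = []
    length = 0
    for text, start, end in words:
        token = text.strip()  # whisper words may carry leading spaces
        if not token:
            continue
        # Start a new line when this word would push the current one past max_chars
        if current and length + len(token) > max_chars:
            lines.append(_close_line(current))
            current, length = [], 0
        current.append({"word": token, "start": round(start, 3), "end": round(end, 3)})
        length += len(token)
    if current:
        lines.append(_close_line(current))
    return lines


if __name__ == "__main__":
    sample = [("大家", 0.0, 0.4), ("好", 0.4, 0.6), ("欢迎", 0.8, 1.2), ("来到", 1.2, 1.5)]
    # ensure_ascii=False keeps the Chinese characters readable in the JSON
    print(json.dumps(group_words_into_lines(sample, max_chars=3), ensure_ascii=False, indent=2))
```

With faster-whisper, the input tuples could be built from the segments returned by `model.transcribe(audio_path, word_timestamps=True)`, whose `.words` entries expose `.word`, `.start` and `.end`. Keeping each word inside its line is what lets the renderer highlight words one at a time.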

## System Requirements

| Component | Requirement |
|------|------|
| Node.js | 18+ |
| Python | 3.10+ |
| GPU memory | faster-whisper needs about 3-4 GB of VRAM |
| FFmpeg | Installed |

---

## Deployment Steps

### Step 1: Install faster-whisper (Python)

```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate

# Install faster-whisper (quote the requirement so the shell does not treat ">" as a redirect)
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
```

> **Note**: On first run, faster-whisper automatically downloads the `large-v3` Whisper model (about 3 GB).

### Step 2: Install Remotion (Node.js)

```bash
cd /home/rongye/ProgramFiles/ViGent2/remotion

# Install dependencies
npm install

# Precompile the render script (required in production)
npm run build:render
```

### Step 3: Restart the Backend Service

```bash
pm2 restart vigent2-backend
```

### Step 4: Verify the Installation

```bash
# Check that faster-whisper is installed
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
python -c "from faster_whisper import WhisperModel; print('faster-whisper OK')"

# Check that Remotion is installed
cd /home/rongye/ProgramFiles/ViGent2/remotion
npx remotion --version
```

---

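## File Structure

### New Backend Files

| File | Description |
|------|------|
| `backend/app/services/whisper_service.py` | Subtitle alignment service (based on faster-whisper) |
| `backend/app/services/remotion_service.py` | Remotion rendering service |

### Remotion Project Structure

```
remotion/
├── package.json              # Node.js dependency configuration
├── tsconfig.json             # TypeScript configuration
├── render.ts                 # Server-side render script
└── src/
    ├── index.ts              # Remotion entry point
    ├── Root.tsx              # Root component
    ├── Video.tsx             # Main video component
    ├── components/
    │   ├── Title.tsx         # Opening title component
    │   ├── Subtitles.tsx     # Word-by-word highlighted subtitle component
    │   └── VideoLayer.tsx    # Video layer component
    ├── utils/
    │   └── captions.ts       # Subtitle data processing utilities
    └── fonts/                # Font files directory (optional)
```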

---

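## API Parameters

The video generation API (`POST /api/videos/generate`) accepts the following new parameters:

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `title` | string | null | Video title shown at the start (optional) |
| `enable_subtitles` | boolean | true | Whether to enable word-by-word highlighted subtitles |

### Request Example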

```json
{
  "material_path": "https://...",
  "text": "大家好，欢迎来到我的频道",
  "tts_mode": "edgetts",
  "voice": "zh-CN-YunxiNeural",
  "title": "今日分享",
  "enable_subtitles": true
}
```

---

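## Video Generation Pipeline

Progress allocation across the stages of the new video generation pipeline:

| Stage | Progress | Description |
|------|------|------|
| Download material | 0% → 5% | Download the input video from Supabase |
| TTS speech generation | 5% → 25% | EdgeTTS / Qwen3-TTS / download of the pre-generated voiceover |
| Lip sync | 25% → 80% | LatentSync inference |
| Subtitle alignment | 80% → 85% | faster-whisper generates word-level timestamps |
| Remotion rendering | 85% → 95% | Composite subtitles and title |
| Upload result | 95% → 100% | Upload to Supabase Storage |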
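Each stage owns a fixed slice of the 0-100 scale, so a stage's own completion fraction maps linearly onto overall progress. A minimal sketch, assuming each stage can report a fraction between 0.0 and 1.0; the stage keys are hypothetical labels for the rows in the table.

```python
# Overall progress range (percent) per stage, copied from the table above.
# The keys are hypothetical labels, not identifiers from the backend.
STAGES: dict[str, tuple[float, float]] = {
    "download": (0, 5),
    "tts": (5, 25),
    "lipsync": (25, 80),
    "align_subtitles": (80, 85),
    "remotion_render": (85, 95),
    "upload": (95, 100),
}


def overall_progress(stage: str, fraction: float) -> float:
    """Map a stage's own completion fraction (0.0-1.0) onto the overall 0-100 scale."""
    start, end = STAGES[stage]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range reports
    return start + (end - start) * fraction
```

For example, LatentSync at half completion reports 25 + 55 × 0.5 = 52.5%. The lip-sync slice is the widest because inference dominates the total runtime.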

---

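## Fallback Handling

The system falls back automatically so that basic video generation keeps working when a component fails:

| Scenario | Handling |
|------|----------|
| Subtitle alignment fails | Skip subtitles and continue generating the video |
| Remotion is not installed | Compose directly with FFmpeg |
| Remotion rendering fails | Fall back to FFmpeg composition |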
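The three rows of this table translate into a try/fallback chain. The sketch below is an illustration under stated assumptions, not the backend's actual code: the aligner, availability check and two renderers are passed in as hypothetical callables, and the logger name is made up.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("vigent2.pipeline")  # hypothetical logger name


def compose_final_video(
    video_path: str,
    audio_path: str,
    align: Callable[[str], dict],
    remotion_available: Callable[[], bool],
    render_remotion: Callable[[str, Optional[dict]], str],
    render_ffmpeg: Callable[[str, str], str],
) -> str:
    """Return the output path, degrading step by step as in the fallback table."""
    # Row 1: subtitle alignment fails -> continue without subtitles
    try:
        captions: Optional[dict] = align(audio_path)
    except Exception:
        logger.exception("subtitle alignment failed, continuing without subtitles")
        captions = None

    # Row 2: Remotion not installed -> compose directly with FFmpeg
    if not remotion_available():
        return render_ffmpeg(video_path, audio_path)

    # Row 3: Remotion render fails -> fall back to FFmpeg
    try:
        return render_remotion(video_path, captions)
    except Exception:
        logger.exception("Remotion render failed, falling back to FFmpeg")
        return render_ffmpeg(video_path, audio_path)
```

Injecting the steps as callables keeps the fallback order testable without a GPU, Node.js or FFmpeg present.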

---

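## Configuration

### Subtitle Service Configuration

The subtitle service lives in `backend/app/services/whisper_service.py` and uses these defaults:

| Parameter | Default | Description |
|------|--------|------|
| `model_size` | large-v3 | Whisper model size |
| `device` | cuda | Device to run on |
| `compute_type` | float16 | Compute precision |

To change them, edit the initialization parameters of `WhisperService` in `whisper_service.py`.

### Remotion Configuration

Remotion rendering parameters are configured in `backend/app/services/remotion_service.py`:

| Parameter | Default | Description |
|------|--------|------|
| `fps` | 25 | Output frame rate |
| `title_duration` | 3.0 | How long the title is displayed (seconds) |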
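Because Remotion works in frames, the second-based values above and the word timestamps all go through the same conversion: at the default 25 fps a 3.0 s title spans 75 frames. The sketch below shows that arithmetic and a binary-search lookup of the word to highlight at a given frame. It is written in Python for illustration and is not the logic in `Subtitles.tsx`; the function names are hypothetical.

```python
import bisect

FPS = 25              # default fps from the table above
TITLE_DURATION = 3.0  # default title_duration in seconds


def seconds_to_frame(seconds: float, fps: int = FPS) -> int:
    """Convert a timestamp in seconds to the nearest frame number."""
    return int(round(seconds * fps))


def active_word_index(word_starts: list[float], frame: int, fps: int = FPS) -> int:
    """Index of the most recently started word at `frame`, or -1 before the first word.

    `word_starts` must be sorted ascending (start times in seconds). After the
    last word ends this still returns the last index; a renderer may also
    compare against end times to clear the highlight.
    """
    t = frame / fps
    return bisect.bisect_right(word_starts, t) - 1
```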

---

## Troubleshooting

### faster-whisper Issues

**Problem**: `ModuleNotFoundError: No module named 'faster_whisper'`

```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
```

**Problem**: Not enough GPU memory

Edit `whisper_service.py` to use a smaller model with int8 precision:
```python
WhisperService(model_size="medium", compute_type="int8")
```
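On machines with differing GPUs, the default configuration and the smaller one above can be tried in order instead of editing the file by hand. A minimal sketch, assuming that `WhisperService` loads its model in the constructor and that an out-of-memory failure surfaces as `RuntimeError`; both are assumptions to verify against `whisper_service.py`. `load_with_fallback` and `CANDIDATES` are hypothetical names.

```python
from typing import Callable, Sequence, TypeVar

M = TypeVar("M")

# Tried in order: the default from the configuration table, then the smaller setup above.
CANDIDATES: Sequence[dict] = (
    {"model_size": "large-v3", "compute_type": "float16"},
    {"model_size": "medium", "compute_type": "int8"},
)


def load_with_fallback(load: Callable[..., M], candidates: Sequence[dict] = CANDIDATES) -> tuple[M, dict]:
    """Return the first successfully loaded model together with the config that worked."""
    last_error: Exception | None = None
    for config in candidates:
        try:
            return load(**config), config
        except RuntimeError as exc:  # assumption: OOM is raised as RuntimeError
            last_error = exc
    raise RuntimeError("no candidate configuration could be loaded") from last_error
```

Usage would look like `service, config = load_with_fallback(WhisperService)`; logging `config` makes it visible which model a given machine ended up with.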

### Remotion Issues

**Problem**: `node_modules not found`

```bash
cd /home/rongye/ProgramFiles/ViGent2/remotion
npm install
```

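**Problem**: Remotion render fails with an `fs` module error

Make sure `remotion/src/utils/captions.ts` does not use the Node.js `fs` module. Remotion bundles the composition to run in a browser environment, where `fs` is not available.

**Problem**: Remotion render fails with a video file read error (`file://` protocol)

Make sure `render.ts` sets the `publicDir` option to the directory containing the video, and that `VideoLayer.tsx` loads the video with `staticFile()`:

```typescript
// render.ts
import * as path from 'path';
import { bundle } from '@remotion/bundler';

// Serve the video's own directory as the public dir so staticFile() can resolve it
const publicDir = path.dirname(path.resolve(options.videoPath));
const bundleLocation = await bundle({
  entryPoint: path.resolve(__dirname, './src/index.ts'),
  publicDir,  // key setting
});

// VideoLayer.tsx
import { staticFile } from 'remotion';

// videoSrc is the file name relative to publicDir (e.g. path.basename(options.videoPath)),
// not an absolute path or a file:// URL
const videoUrl = staticFile(videoSrc);
```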

**Problem**: Remotion rendering fails for other reasons

Check the backend logs:
```bash
pm2 logs vigent2-backend
```

### Checking Service Health

```bash
# Subtitle service health check
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
python -c "from app.services.whisper_service import whisper_service; import asyncio; print(asyncio.run(whisper_service.check_health()))"

# Remotion health check
python -c "from app.services.remotion_service import remotion_service; import asyncio; print(asyncio.run(remotion_service.check_health()))"
```

---

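## Optional Optimizations

### Add a Chinese Font

For better subtitle rendering, you can add a Chinese font:

```bash
# Download the Noto Sans SC font (the fonts directory is optional, so create it if missing)
mkdir -p /home/rongye/ProgramFiles/ViGent2/remotion/src/fonts
cd /home/rongye/ProgramFiles/ViGent2/remotion/src/fonts
wget https://github.com/googlefonts/noto-cjk/raw/main/Sans/OTF/SimplifiedChinese/NotoSansSC-Regular.otf -O NotoSansSC.otf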
```

### Using GPU 0

By default, faster-whisper runs on GPU 0, keeping it separate from LatentSync (GPU 1) to avoid VRAM contention. To pin a specific GPU:

```python
# Edit in whisper_service.py
WhisperService(device="cuda:0")  # 或 "cuda:1"
```

---

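## Changelog

| Date | Version | Description |
|------|------|------|
| 2026-01-29 | 1.0.0 | Initial release: word-by-word highlighted subtitles and opening title implemented with faster-whisper + Remotion |
| 2026-01-30 | 1.0.1 | Refined subtitle highlight styling and title animation for clearer visuals |
| 2026-02-10 | 1.1.0 | Updated architecture diagram: multi-material concat-then-infer, pre-generated voiceover option |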