Update

2026-01-29 17:58:07 +08:00 · 2026-01-29 17:54:43 +08:00
21 changed files with 4372 additions and 10 deletions
--- a/Docs/DevLogs/Day13.md
+++ b/Docs/DevLogs/Day13.md
@@ -1,4 +1,4 @@
-# Day 13 - Voice Cloning Integration Complete
+# Day 13 - Voice Cloning Integration + Subtitles

 **Date**: 2026-01-29

@@ -276,4 +276,156 @@ pm2 logs vigent2-qwen-tts --lines 50
 - [task_complete.md](../task_complete.md) - Task overview
 - [Day12.md](./Day12.md) - iOS compatibility and Qwen3-TTS deployment
 - [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide
+- [SUBTITLE_DEPLOY.md](../SUBTITLE_DEPLOY.md) - Subtitle feature deployment guide
 - [DEPLOY_MANUAL.md](../DEPLOY_MANUAL.md) - Full deployment manual
+
+---
+
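+## 🎬 Per-Character Highlighted Subtitles + Opening Title
+
+### Background
+
+To improve video quality, this change adds per-character highlighted subtitles (karaoke style) and an opening title.
+
+### Technical Approach
+
+| Component | Technology | Notes |
+|------|------|------|
+| Subtitle alignment | **faster-whisper** | Generates character-level timestamps (split from word-level output) |
+| Video rendering | **Remotion** | React-based video composition framework |
+
+### Architecture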
+
+```
+Previous pipeline:
+  text → EdgeTTS → audio → LatentSync → FFmpeg compose → final video
+
+New pipeline:
+  text → EdgeTTS → audio ─┬→ LatentSync → lip-synced video ─┐
+                          └→ faster-whisper → caption JSON ──┴→ Remotion compose → final video
+```
+
+### New Backend Services
+
+#### 1. Subtitle service (`whisper_service.py`)
+
+Generates character-level timestamps with faster-whisper:
+
+```python
+from faster_whisper import WhisperModel
+
+class WhisperService:
+    def __init__(self, model_size="large-v3", device="cuda"):
+        self.model = WhisperModel(model_size, device=device)
+
+    async def align(self, audio_path: str, text: str, output_path: str):
+        segments, info = self.model.transcribe(audio_path, word_timestamps=True)
+        # Split each word into single characters, interpolating timestamps linearly
+        result = {"segments": [...]}
+        # Save to JSON
+```
+
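+**Character-splitting algorithm**: faster-whisper returns word-level timestamps for Chinese, so each word is split into single characters and its time span is divided evenly among them (linear interpolation):
+
+```python
+# Input: {"word": "大家好", "start": 0.0, "end": 0.9}
+# Output: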
+[
+  {"word": "大", "start": 0.0, "end": 0.3},
+  {"word": "家", "start": 0.3, "end": 0.6},
+  {"word": "好", "start": 0.6, "end": 0.9}
+]
+```
+
+#### 2. Remotion render service (`remotion_service.py`)
+
+Invokes Remotion to render the subtitles and title:
+
+```python
+class RemotionService:
+    async def render(self, video_path, output_path, captions_path, title, ...):
+        cmd = f"npx ts-node render.ts --video {video_path} --output {output_path} ..."
+        # Run the render
+```
+
+### Remotion Project Structure
+
+```
+remotion/
+├── package.json              # Node.js dependencies
+├── render.ts                 # Server-side render script
+└── src/
+    ├── Video.tsx             # Main video component
+    ├── components/
+    │   ├── Title.tsx         # Opening title (fade in/out)
+    │   ├── Subtitles.tsx     # Per-character highlighted subtitles
+    │   └── VideoLayer.tsx    # Video layer
+    └── utils/
+        └── captions.ts       # Caption data types
+```
+
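+### Frontend UI
+
+A new title and subtitle settings block:
+
+| Feature | Notes |
+|------|------|
+| Opening title input | Optional; shown for 3 seconds at the start of the video |
+| Subtitle toggle | On by default; can be turned off |
+
+### Problems Encountered and Fixes
+
+#### Problem 1: `fs` module error
+
+**Symptom**: The Remotion bundle step fails with `fs.js doesn't exist`
+
+**Cause**: `captions.ts` contained a `loadCaptions` function that used the Node.js `fs` module
+
+**Fix**: Removed the unused `loadCaptions` function
+
+#### Problem 2: Video file could not be read
+
+**Symptom**: Local video files cannot be read over the `file://` protocol
+
+**Fix**:
+1. `render.ts` sets `publicDir` to the directory containing the video
+2. `VideoLayer.tsx` loads the video with `staticFile()`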
+
+```typescript
+// render.ts
+const publicDir = path.dirname(path.resolve(options.videoPath));
+const bundleLocation = await bundle({
+  entryPoint: path.resolve(__dirname, './src/index.ts'),
+  publicDir,  // the key setting
+});
+
+// VideoLayer.tsx
+const videoUrl = staticFile(videoSrc);
+```
+
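+### Test Results
+
+- ✅ faster-whisper subtitle alignment succeeds (~1 s)
+- ✅ Remotion render succeeds (~10 s)
+- ✅ Per-character subtitle highlighting works
+- ✅ Opening title fades in and out correctly
+- ✅ Fallback works (falls back to FFmpeg when Remotion fails)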
+
+---
+
+## 📁 Files Changed Today (Complete)
+
+| File | Change | Notes |
+|------|----------|------|
+| `models/Qwen3-TTS/qwen_tts_server.py` | Added | Qwen3-TTS HTTP inference service |
+| `run_qwen_tts.sh` | Added | PM2 launch script (repo root) |
+| `backend/app/services/voice_clone_service.py` | Added | Voice cloning service (HTTP calls) |
+| `backend/app/services/whisper_service.py` | Added | Subtitle alignment service (faster-whisper) |
+| `backend/app/services/remotion_service.py` | Added | Remotion render service |
+| `backend/app/api/ref_audios.py` | Added | Reference audio management API |
+| `backend/app/api/videos.py` | Modified | Integrates subtitles and title |
+| `backend/app/main.py` | Modified | Registers the ref-audios router |
+| `backend/requirements.txt` | Modified | Adds the faster-whisper dependency |
+| `remotion/` | Added | Remotion video rendering project |
+| `frontend/src/app/page.tsx` | Modified | TTS mode selection + title/subtitle UI |
+| `Docs/SUBTITLE_DEPLOY.md` | Added | Subtitle feature deployment doc |
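The caption JSON described in the Day 13 log pairs each character with a `start`/`end` time. As a rough illustration of how a renderer could decide which character to highlight at a given playback time, here is a minimal Python sketch. `active_char_index` is a hypothetical helper that is not part of this commit, and it assumes the `words` list is sorted by `start` with no overlaps; the real highlighting logic lives in `Subtitles.tsx`, which is not shown here.

```python
from bisect import bisect_right
from typing import Optional


def active_char_index(words: list[dict], t: float) -> Optional[int]:
    """Return the index of the character spoken at time t, or None.

    Hypothetical helper: assumes `words` is sorted by "start" and that
    each entry has "start" and "end" keys, as in the caption JSON.
    """
    starts = [w["start"] for w in words]
    # Last entry whose start time is <= t
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i]["end"]:
        return i
    return None


# Sample data taken from the character-splitting example in the log
words = [
    {"word": "大", "start": 0.0, "end": 0.3},
    {"word": "家", "start": 0.3, "end": 0.6},
    {"word": "好", "start": 0.6, "end": 0.9},
]

if __name__ == "__main__":
    for t in (0.1, 0.45, 0.9):
        print(t, active_char_index(words, t))
```

Treating each interval as half-open (`start <= t < end`) means exactly one character is active at a boundary such as `t = 0.3`, and none once the last character has ended.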
--- a/Docs/SUBTITLE_DEPLOY.md
+++ b/Docs/SUBTITLE_DEPLOY.md
@@ -0,0 +1,281 @@
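+# ViGent2 Subtitle and Title Deployment Guide
+
+This document describes how to deploy ViGent2's per-character highlighted subtitles and opening title.
+
+## Feature Overview
+
+| Feature | Notes |
+|------|------|
+| **Per-character highlighted subtitles** | faster-whisper generates character-level timestamps; Remotion renders the karaoke effect |
+| **Opening title** | Shown at the start of the video with a fade in/out animation; disappears after `title_duration` (3 seconds by default) |
+
+## Architecture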
+
+```
+Previous pipeline:
+  text → EdgeTTS → audio → LatentSync → FFmpeg compose → final video
+
+New pipeline:
+  text → EdgeTTS → audio ─┬→ LatentSync → lip-synced video ─┐
+                          └→ faster-whisper → caption JSON ──┴→ Remotion compose → final video
+```
+
+## System Requirements
+
+| Component | Requirement |
+|------|------|
+| Node.js | 18+ |
+| Python | 3.10+ |
+| GPU memory | faster-whisper needs about 3-4 GB of VRAM |
+| FFmpeg | Installed |
+
+---
+
+## Deployment Steps
+
+### Step 1: Install faster-whisper (Python)
+
+```bash
+cd /home/rongye/ProgramFiles/ViGent2/backend
+source venv/bin/activate
+
+# Install faster-whisper (quote the requirement so the shell does not treat ">=" as a redirect)
+pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+> **Note**: On first run, faster-whisper automatically downloads the `large-v3` Whisper model (~3 GB)
+
+### Step 2: Install Remotion (Node.js)
+
+```bash
+cd /home/rongye/ProgramFiles/ViGent2/remotion
+
+# Install dependencies
+npm install
+```
+
+### Step 3: Restart the backend service
+
+```bash
+pm2 restart vigent2-backend
+```
+
+### Step 4: Verify the installation
+
+```bash
+# Check that faster-whisper is installed
+cd /home/rongye/ProgramFiles/ViGent2/backend
+source venv/bin/activate
+python -c "from faster_whisper import WhisperModel; print('faster-whisper OK')"
+
+# Check that Remotion is installed
+cd /home/rongye/ProgramFiles/ViGent2/remotion
+npx remotion --version
+```
+
+---
+
+## File Structure
+
+### New Backend Files
+
+| File | Notes |
+|------|------|
+| `backend/app/services/whisper_service.py` | Subtitle alignment service (based on faster-whisper) |
+| `backend/app/services/remotion_service.py` | Remotion render service |
+
+### Remotion Project Structure
+
+```
+remotion/
+├── package.json              # Node.js dependency manifest
+├── tsconfig.json             # TypeScript configuration
+├── render.ts                 # Server-side render script
+└── src/
+    ├── index.ts              # Remotion entry point
+    ├── Root.tsx              # Root component
+    ├── Video.tsx             # Main video component
+    ├── components/
+    │   ├── Title.tsx         # Opening title component
+    │   ├── Subtitles.tsx     # Per-character highlighted subtitle component
+    │   └── VideoLayer.tsx    # Video layer component
+    ├── utils/
+    │   └── captions.ts       # Caption data utilities
+    └── fonts/                # Font files (optional)
+```
+
+---
+
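+## API Parameters
+
+The video generation API (`POST /api/videos/generate`) accepts these new parameters:
+
+| Parameter | Type | Default | Notes |
+|------|------|--------|------|
+| `title` | string | null | Video title (shown at the opening; optional) |
+| `enable_subtitles` | boolean | true | Whether to enable per-character highlighted subtitles |
+
+### Example Request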
+
+```json
+{
+  "material_path": "https://...",
+  "text": "大家好，欢迎来到我的频道",
+  "tts_mode": "edgetts",
+  "voice": "zh-CN-YunxiNeural",
+  "title": "今日分享",
+  "enable_subtitles": true
+}
+```
+
+---
+
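+## Video Generation Pipeline
+
+Progress allocation for the new pipeline:
+
+| Stage | Progress | Notes |
+|------|------|------|
+| Download material | 0% → 5% | Downloads the input video from Supabase |
+| TTS speech generation | 5% → 25% | EdgeTTS or Qwen3-TTS generates the audio |
+| Lip sync | 25% → 80% | LatentSync inference |
+| Subtitle alignment | 80% → 85% | faster-whisper generates character-level timestamps |
+| Remotion render | 85% → 95% | Composites subtitles and title |
+| Upload result | 95% → 100% | Uploads to Supabase Storage |
+
+---
+
+## Fallback Handling
+
+Automatic fallbacks keep basic video generation working:
+
+| Scenario | Handling |
+|------|----------|
+| Subtitle alignment fails | Skip subtitles and continue generating the video |
+| Remotion not installed | Compose directly with FFmpeg |
+| Remotion render fails | Fall back to FFmpeg composition |
+
+---
+
+## Configuration
+
+### Subtitle Service
+
+The subtitle service lives in `backend/app/services/whisper_service.py`. Defaults:
+
+| Parameter | Default | Notes |
+|------|--------|------|
+| `model_size` | large-v3 | Whisper model size |
+| `device` | cuda | Device to run on |
+| `compute_type` | float16 | Compute precision |
+
+To change these, edit the `WhisperService` constructor arguments in `whisper_service.py`.
+
+### Remotion
+
+Remotion render parameters are set in `backend/app/services/remotion_service.py`:
+
+| Parameter | Default | Notes |
+|------|--------|------|
+| `fps` | 25 | Output frame rate |
+| `title_duration` | 3.0 | How long the title is shown (seconds) |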
+
+---
+
+## Troubleshooting
+
+### faster-whisper
+
+**Problem**: `ModuleNotFoundError: No module named 'faster_whisper'`
+
+```bash
+cd /home/rongye/ProgramFiles/ViGent2/backend
+source venv/bin/activate
+pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+**Problem**: Not enough GPU memory
+
+Edit `whisper_service.py` to use a smaller model:
+```python
+WhisperService(model_size="medium", compute_type="int8")
+```
+
+### Remotion
+
+**Problem**: `node_modules not found`
+
+```bash
+cd /home/rongye/ProgramFiles/ViGent2/remotion
+npm install
+```
+
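+**Problem**: Remotion render fails with an `fs` module error
+
+Make sure `remotion/src/utils/captions.ts` does not use the Node.js `fs` module. Remotion bundles the composition for a browser environment, where `fs` is unavailable.
+
+**Problem**: Remotion render fails with a video file read error (`file://` protocol)
+
+Make sure `render.ts` passes the `publicDir` option pointing at the directory that contains the video, and that `VideoLayer.tsx` loads the video with `staticFile()`: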
+
+```typescript
+// render.ts
+const publicDir = path.dirname(path.resolve(options.videoPath));
+const bundleLocation = await bundle({
+  entryPoint: path.resolve(__dirname, './src/index.ts'),
+  publicDir,  // the key setting
+});
+
+// VideoLayer.tsx
+const videoUrl = staticFile(videoSrc);  // load via staticFile
+```
+
+**Problem**: Remotion render fails
+
+Check the backend logs:
+```bash
+pm2 logs vigent2-backend
+```
+
+### Checking Service Health
+
+```bash
+# Subtitle service health check
+cd /home/rongye/ProgramFiles/ViGent2/backend
+source venv/bin/activate
+python -c "from app.services.whisper_service import whisper_service; import asyncio; print(asyncio.run(whisper_service.check_health()))"
+
+# Remotion health check
+python -c "from app.services.remotion_service import remotion_service; import asyncio; print(asyncio.run(remotion_service.check_health()))"
+```
+
+---
+
+## Optional Optimizations
+
+### Adding a Chinese Font
+
+For better subtitle rendering, you can add a Chinese font:
+
+```bash
+# Download the Noto Sans SC font
+cd /home/rongye/ProgramFiles/ViGent2/remotion/src/fonts
+wget https://github.com/googlefonts/noto-cjk/raw/main/Sans/OTF/SimplifiedChinese/NotoSansSC-Regular.otf -O NotoSansSC.otf
+```
+
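+### Using GPU 0
+
+faster-whisper runs on GPU 0 by default, separate from LatentSync (GPU 1), which avoids VRAM contention. To pick a GPU explicitly, note that faster-whisper expects `device="cuda"` plus a separate `device_index` rather than a `cuda:N` string, and `WhisperService` does not yet forward `device_index`, so the `WhisperModel(...)` call in `_load_model()` has to be edited:
+
+```python
+# In whisper_service.py, inside _load_model()
+WhisperModel(self.model_size, device=self.device, device_index=0, compute_type=self.compute_type)  # or device_index=1
+```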
+
+---
+
+## Changelog
+
+| Date | Version | Notes |
+|------|------|------|
+| 2026-01-29 | 1.0.0 | Initial version: per-character highlighted subtitles and opening title built on faster-whisper + Remotion |
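The stage table in the deployment guide assigns each pipeline stage a slice of the overall progress bar. One way to turn a stage-local fraction into an overall percentage is sketched below. The `STAGES` mapping copies the ranges from that table, but the stage keys and the `overall_progress` function are illustrative assumptions; `videos.py` sets progress values inline rather than through a helper like this.

```python
# Progress ranges copied from the "Video Generation Pipeline" table.
# The dictionary keys are hypothetical names chosen for this sketch.
STAGES: dict[str, tuple[int, int]] = {
    "download": (0, 5),
    "tts": (5, 25),
    "lipsync": (25, 80),
    "align": (80, 85),
    "render": (85, 95),
    "upload": (95, 100),
}


def overall_progress(stage: str, fraction: float) -> int:
    """Map a stage-local fraction in [0, 1] onto the overall 0-100 scale."""
    lo, hi = STAGES[stage]
    # Clamp so a misbehaving stage cannot push progress outside its slice
    fraction = min(max(fraction, 0.0), 1.0)
    return lo + int((hi - lo) * fraction)


if __name__ == "__main__":
    print(overall_progress("render", 0.5))
    print(overall_progress("lipsync", 1.0))
```

Keeping the ranges in one table-like structure makes it harder for the documented allocation and the code to drift apart.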
--- a/Docs/task_complete.md
+++ b/Docs/task_complete.md
@@ -3,7 +3,7 @@
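 **Project**: ViGent2 digital-human talking-head video generation system
 **Server**: Dell R730 (2× RTX 3090 24GB)
 **Last updated**: 2026-01-29
-**Overall progress**: 100% (Day 13 voice cloning integration complete)
+**Overall progress**: 100% (Day 13 voice cloning + subtitles complete)

 ## 📖 Quick Navigation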

@@ -177,6 +177,14 @@
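 - [x] **Supabase ref-audios Bucket** (reference audio bucket + RLS policies)
 - [x] **End-to-end test verification** (full voice cloning flow passes)

+### Phase 21: Per-Character Highlighted Subtitles + Opening Title (Day 13)
+- [x] **faster-whisper subtitle alignment** (character-level timestamp generation)
+- [x] **Remotion video rendering** (React video composition framework)
+- [x] **Per-character highlighted subtitles** (karaoke effect)
+- [x] **Opening title** (fade in/out animation)
+- [x] **Frontend title/subtitle settings UI**
+- [x] **Fallback** (falls back to FFmpeg when Remotion fails)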
+
 ---

 ## 🛤️ Roadmap
@@ -187,6 +195,7 @@
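 ### 🟠 Feature Completion
 - [x] Qwen3-TTS integrated into ViGent2 ✅ Done on Day 13
 - [x] Scheduled publishing ✅ Done on Day 7
+- [x] Per-character highlighted subtitles ✅ Done on Day 13
 - [ ] **Backend scheduled publishing** - replace platform-side scheduling with APScheduler-based task scheduling
 - [ ] Batch video generation
 - [ ] Subtitle style editor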
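@@ -366,11 +375,15 @@ Day 12: iOS compatibility and mobile optimization   ✅ Done
       - **Qwen3-TTS 0.6B deployment** (voice cloning model, GPU0)
       - **Deployment doc** (QWEN3_TTS_DEPLOY.md)

-Day 13: Voice cloning integration        ✅ Done
+Day 13: Voice cloning + subtitles        ✅ Done
       - Qwen3-TTS HTTP service (standalone FastAPI, port 8009)
       - Voice cloning service (voice_clone_service.py)
       - Reference audio management API (upload/list/delete)
       - Frontend TTS mode selection (EdgeTTS / voice cloning)
       - Supabase ref-audios Bucket configuration
       - End-to-end test verification passed
+       - **faster-whisper subtitle alignment** (character-level timestamps)
+       - **Remotion video rendering** (per-character highlighted subtitles + opening title)
+       - **Frontend title/subtitle settings UI**
+       - **Deployment doc** (SUBTITLE_DEPLOY.md)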

--- a/README.md
+++ b/README.md
@@ -10,7 +10,9 @@

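 - 🎬 **Lip sync** - Powered by LatentSync 1.6, a 512×512 high-resolution diffusion model
 - 🎙️ **TTS voice-over** - EdgeTTS with multiple voices (Yunxi, Xiaoxiao, and more)
-- 🔊 **Voice cloning** - Qwen3-TTS 0.6B, fast cloning from 3 seconds of reference audio 🆕
+- 🔊 **Voice cloning** - Qwen3-TTS 0.6B, fast cloning from 3 seconds of reference audio
+- 📝 **Per-character highlighted subtitles** - faster-whisper + Remotion, karaoke effect 🆕
+- 🎬 **Opening title** - fade in/out animation, customizable 🆕
 - 📱 **Fully automated publishing** - QR-code login + cookie persistence; scheduled publishing to multiple platforms (Bilibili/Douyin/Xiaohongshu)
 - 🖥️ **Web UI** - Modern Next.js interface, adapted for iOS/Android mobile
 - 🔐 **User system** - Supabase + JWT authentication with an admin console and sign-up/login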
@@ -29,6 +31,7 @@
 | Lip sync | **LatentSync 1.6** (Latent Diffusion, 512×512) |
 | TTS | EdgeTTS |
 | Voice cloning | **Qwen3-TTS 0.6B** |
+| Subtitle rendering | **faster-whisper + Remotion** |
 | Video processing | FFmpeg |
 | Auto publishing | Playwright |

@@ -152,6 +155,7 @@ nohup python -m scripts.server > server.log 2>&1 &

 - [Manual deployment guide](Docs/DEPLOY_MANUAL.md)
 - [Supabase deployment guide](Docs/SUPABASE_DEPLOY.md)
+- [Subtitle feature deployment guide](Docs/SUBTITLE_DEPLOY.md)
 - [LatentSync deployment guide](models/LatentSync/DEPLOY.md)
 - [Development logs](Docs/DevLogs/)
 - [Task progress](Docs/task_complete.md)
--- a/backend/app/api/videos.py
+++ b/backend/app/api/videos.py
@@ -13,6 +13,8 @@ from app.services.video_service import VideoService
 from app.services.lipsync_service import LipSyncService
 from app.services.voice_clone_service import voice_clone_service
 from app.services.storage import storage_service
+from app.services.whisper_service import whisper_service
+from app.services.remotion_service import remotion_service
 from app.core.config import settings
 from app.core.deps import get_current_user

@@ -26,6 +28,9 @@ class GenerateRequest(BaseModel):
    tts_mode: str = "edgetts"  # "edgetts" | "voiceclone"
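    ref_audio_id: Optional[str] = None  # storage path of the reference audio
    ref_text: Optional[str] = None  # transcript of the reference audio
+    # Subtitle and title options
+    title: Optional[str] = None  # video title (shown at the opening)
+    enable_subtitles: bool = True  # whether to enable per-character highlighted subtitles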

 tasks = {} # In-memory task store

@@ -167,17 +172,84 @@ async def _process_video_generation(task_id: str, req: GenerateRequest, user_id:

        lipsync_time = time.time() - lipsync_start
        print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
+        tasks[task_id]["progress"] = 80
+
+        # 3. faster-whisper subtitle alignment - progress 80% -> 85%
+        captions_path = None
+        if req.enable_subtitles:
+            tasks[task_id]["message"] = "正在生成字幕 (Whisper)..."
+            tasks[task_id]["progress"] = 82
+
+            captions_path = temp_dir / f"{task_id}_captions.json"
+            temp_files.append(captions_path)
+
+            try:
+                await whisper_service.align(
+                    audio_path=str(audio_path),
+                    text=req.text,
+                    output_path=str(captions_path)
+                )
+                print(f"[Pipeline] Whisper alignment completed")
+            except Exception as e:
+                logger.warning(f"Whisper alignment failed, skipping subtitles: {e}")
+                captions_path = None
+
        tasks[task_id]["progress"] = 85

-        # 3. Composition - progress 85% -> 100%
-        tasks[task_id]["message"] = "正在合成最终视频..."
-        tasks[task_id]["progress"] = 90
+        # 4. Remotion composition (subtitles + title) - progress 85% -> 95%
+        # Use Remotion only when there are captions or a title to render
+        use_remotion = bool((captions_path and captions_path.exists()) or req.title)

-        video = VideoService()
        final_output_local_path = temp_dir / f"{task_id}_output.mp4"
        temp_files.append(final_output_local_path)

-        await video.compose(str(lipsync_video_path), str(audio_path), str(final_output_local_path))
+        if use_remotion:
+            tasks[task_id]["message"] = "正在合成视频 (Remotion)..."
+            tasks[task_id]["progress"] = 87
+
+            # Mux audio and video with FFmpeg first (Remotion needs a video that already has audio)
+            composed_video_path = temp_dir / f"{task_id}_composed.mp4"
+            temp_files.append(composed_video_path)
+
+            video = VideoService()
+            await video.compose(str(lipsync_video_path), str(audio_path), str(composed_video_path))
+
+            # Check whether Remotion is available
+            remotion_health = await remotion_service.check_health()
+            if remotion_health.get("ready"):
+                try:
+                    def on_remotion_progress(percent):
+                        # Map Remotion progress (0-100) onto 87-95%
+                        mapped = 87 + int(percent * 0.08)
+                        tasks[task_id]["progress"] = mapped
+
+                    await remotion_service.render(
+                        video_path=str(composed_video_path),
+                        output_path=str(final_output_local_path),
+                        captions_path=str(captions_path) if captions_path else None,
+                        title=req.title,
+                        title_duration=3.0,
+                        fps=25,
+                        enable_subtitles=req.enable_subtitles,
+                        on_progress=on_remotion_progress
+                    )
+                    print(f"[Pipeline] Remotion render completed")
+                except Exception as e:
+                    logger.warning(f"Remotion render failed, using FFmpeg fallback: {e}")
+                    # Fall back to the FFmpeg-composed video
+                    import shutil
+                    shutil.copy(str(composed_video_path), final_output_local_path)
+            else:
+                logger.warning(f"Remotion not ready: {remotion_health.get('error')}, using FFmpeg")
+                import shutil
+                shutil.copy(str(composed_video_path), final_output_local_path)
+        else:
+            # No subtitles or title needed: compose directly with FFmpeg
+            tasks[task_id]["message"] = "正在合成最终视频..."
+            tasks[task_id]["progress"] = 90
+
+            video = VideoService()
+            await video.compose(str(lipsync_video_path), str(audio_path), str(final_output_local_path))

        total_time = time.time() - start_time
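
The composition branch in the `videos.py` hunk above boils down to a small decision: Remotion is used only when there is something to overlay and the Remotion project passes its health check; otherwise the FFmpeg-composed video is the final output. A minimal sketch of that decision as a pure function follows. `Compositor` and `choose_compositor` are hypothetical names introduced for illustration and do not exist in the commit.

```python
from enum import Enum
from typing import Optional


class Compositor(Enum):
    REMOTION = "remotion"
    FFMPEG = "ffmpeg"


def choose_compositor(
    has_captions: bool,
    title: Optional[str],
    remotion_ready: bool,
) -> Compositor:
    """Pick the compositor before rendering starts (illustrative only)."""
    # Remotion is only worth invoking when there is an overlay to draw
    wants_overlay = has_captions or bool(title)
    if wants_overlay and remotion_ready:
        return Compositor.REMOTION
    return Compositor.FFMPEG


if __name__ == "__main__":
    print(choose_compositor(True, None, True))
    print(choose_compositor(False, "", True))
```

This covers only the decision made up front. In the hunk above, a Remotion render that raises at run time still falls back to the FFmpeg output inside the `except` block.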

--- a/backend/app/services/remotion_service.py
+++ b/backend/app/services/remotion_service.py
@@ -0,0 +1,150 @@
+"""
+Remotion video rendering service.
+Invokes Node.js Remotion for video composition (subtitles + title).
+"""
+
+import asyncio
+import subprocess
+from pathlib import Path
+from typing import Optional
+from loguru import logger
+
+
+class RemotionService:
+    """Remotion video rendering service."""
+
+    def __init__(self, remotion_dir: Optional[str] = None):
+        # Remotion project directory
+        if remotion_dir:
+            self.remotion_dir = Path(remotion_dir)
+        else:
+            # Defaults to the ViGent2/remotion directory
+            self.remotion_dir = Path(__file__).parent.parent.parent.parent / "remotion"
+
+    async def render(
+        self,
+        video_path: str,
+        output_path: str,
+        captions_path: Optional[str] = None,
+        title: Optional[str] = None,
+        title_duration: float = 3.0,
+        fps: int = 25,
+        enable_subtitles: bool = True,
+        on_progress: Optional[callable] = None
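+    ) -> str:
+        """
+        Render a video with Remotion (adds subtitles and title).
+
+        Args:
+            video_path: input video path (the lip-synced video)
+            output_path: output video path
+            captions_path: path to the caption JSON file (generated by Whisper)
+            title: video title (optional)
+            title_duration: how long the title is shown (seconds)
+            fps: frame rate
+            enable_subtitles: whether to enable subtitles
+            on_progress: progress callback
+
+        Returns:
+            Output video path
+        """
+        # Build the command arguments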
+        cmd = [
+            "npx", "ts-node", "render.ts",
+            "--video", str(video_path),
+            "--output", str(output_path),
+            "--fps", str(fps),
+            "--enableSubtitles", str(enable_subtitles).lower()
+        ]
+
+        if captions_path:
+            cmd.extend(["--captions", str(captions_path)])
+
+        if title:
+            cmd.extend(["--title", title])
+            cmd.extend(["--titleDuration", str(title_duration)])
+
+        logger.info(f"Running Remotion render: {' '.join(cmd)}")
+
+        # Run the subprocess in a thread pool
+        def _run_render():
+            process = subprocess.Popen(
+                cmd,
+                cwd=str(self.remotion_dir),
+                stdout=subprocess.PIPE,
+                stderr=subprocess.STDOUT,
+                text=True,
+                bufsize=1
+            )
+
+            output_lines = []
+            for line in iter(process.stdout.readline, ''):
+                line = line.strip()
+                if line:
+                    output_lines.append(line)
+                    logger.debug(f"[Remotion] {line}")
+
+                    # Parse progress
+                    if "Rendering:" in line and "%" in line:
+                        try:
+                            percent_str = line.split("Rendering:")[1].strip().replace("%", "")
+                            percent = int(percent_str)
+                            if on_progress:
+                                on_progress(percent)
+                        except (ValueError, IndexError):
+                            pass
+
+            process.wait()
+
+            if process.returncode != 0:
+                error_msg = "\n".join(output_lines[-20:])  # last 20 lines
+                raise RuntimeError(f"Remotion render failed (code {process.returncode}):\n{error_msg}")
+
+            return output_path
+
+        loop = asyncio.get_running_loop()
+        result = await loop.run_in_executor(None, _run_render)
+
+        logger.info(f"Remotion render complete: {result}")
+        return result
+
+    async def check_health(self) -> dict:
+        """Check the health of the Remotion service."""
+        try:
+            # Check that the remotion directory exists
+            if not self.remotion_dir.exists():
+                return {
+                    "ready": False,
+                    "error": f"Remotion directory not found: {self.remotion_dir}"
+                }
+
+            # Check that package.json exists
+            package_json = self.remotion_dir / "package.json"
+            if not package_json.exists():
+                return {
+                    "ready": False,
+                    "error": "package.json not found"
+                }
+
+            # Check that node_modules exists
+            node_modules = self.remotion_dir / "node_modules"
+            if not node_modules.exists():
+                return {
+                    "ready": False,
+                    "error": "node_modules not found, run 'npm install' first"
+                }
+
+            return {
+                "ready": True,
+                "remotion_dir": str(self.remotion_dir)
+            }
+
+        except Exception as e:
+            return {
+                "ready": False,
+                "error": str(e)
+            }
+
+
+# Global service instance
+remotion_service = RemotionService()
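The progress parser in `_run_render` takes everything after `Rendering:`, strips the `%` sign, and calls `int()` on the rest, so a line with trailing text after the percentage is silently skipped. A regex-based variant that tolerates extra text is sketched below. The exact format printed by `render.ts` is not visible on this page, so the `Rendering: NN%` pattern is an assumption, and `parse_progress` is a hypothetical helper rather than code from the commit.

```python
import re
from typing import Optional

# Assumed log format: "Rendering: 45%" optionally followed by more text
_PROGRESS_RE = re.compile(r"Rendering:\s*(\d{1,3})(?:\.\d+)?\s*%")


def parse_progress(line: str) -> Optional[int]:
    """Extract an integer percentage from a render log line, if present."""
    match = _PROGRESS_RE.search(line)
    if match is None:
        return None
    # Cap at 100 in case the renderer reports a value above the maximum
    return min(int(match.group(1)), 100)


if __name__ == "__main__":
    for sample in ("Rendering: 45%", "Rendering: 45% (frame 90/200)", "Bundling..."):
        print(repr(sample), "->", parse_progress(sample))
```

Returning `None` for non-matching lines keeps the caller's loop simple: it can forward any non-`None` value to `on_progress` and ignore everything else.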
--- a/backend/app/services/whisper_service.py
+++ b/backend/app/services/whisper_service.py
@@ -0,0 +1,176 @@
+"""
+Subtitle alignment service.
+Uses faster-whisper to generate character-level timestamps.
+"""
+
+import json
+import re
+from pathlib import Path
+from typing import Optional
+from loguru import logger
+
+# Model cache
+_whisper_model = None
+
+
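+def split_word_to_chars(word: str, start: float, end: float) -> list:
+    """
+    Split a word into single characters, interpolating timestamps linearly.
+
+    Args:
+        word: word text
+        start: word start time
+        end: word end time
+
+    Returns:
+        List of single-character entries, each with word/start/end
+    """
+    # Drop whitespace characters; every other character (including punctuation) is kept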
+    chars = [c for c in word if c.strip()]
+    if not chars:
+        return []
+
+    if len(chars) == 1:
+        return [{"word": chars[0], "start": start, "end": end}]
+
+    # Interpolate timestamps linearly
+    duration = end - start
+    char_duration = duration / len(chars)
+
+    result = []
+    for i, char in enumerate(chars):
+        char_start = start + i * char_duration
+        char_end = start + (i + 1) * char_duration
+        result.append({
+            "word": char,
+            "start": round(char_start, 3),
+            "end": round(char_end, 3)
+        })
+
+    return result
+
+
+class WhisperService:
+    """Subtitle alignment service (based on faster-whisper)."""
+
+    def __init__(
+        self,
+        model_size: str = "large-v3",
+        device: str = "cuda",
+        compute_type: str = "float16",
+    ):
+        self.model_size = model_size
+        self.device = device
+        self.compute_type = compute_type
+
+    def _load_model(self):
+        """Lazily load the faster-whisper model."""
+        global _whisper_model
+
+        if _whisper_model is None:
+            from faster_whisper import WhisperModel
+
+            logger.info(f"Loading faster-whisper model: {self.model_size} on {self.device}")
+            _whisper_model = WhisperModel(
+                self.model_size,
+                device=self.device,
+                compute_type=self.compute_type
+            )
+            logger.info("faster-whisper model loaded")
+
+        return _whisper_model
+
+    async def align(
+        self,
+        audio_path: str,
+        text: str,
+        output_path: Optional[str] = None
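+    ) -> dict:
+        """
+        Transcribe the audio and generate character-level timestamps.
+
+        Args:
+            audio_path: path to the audio file
+            text: original script (kept for reference only; captions come from the Whisper transcription)
+            output_path: optional path for the output JSON file
+
+        Returns:
+            Dictionary containing character-level timestamps
+        """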
+        import asyncio
+
+        def _do_transcribe():
+            model = self._load_model()
+
+            logger.info(f"Transcribing audio: {audio_path}")
+
+            # Transcribe with word-level timestamps
+            segments_iter, info = model.transcribe(
+                audio_path,
+                language="zh",
+                word_timestamps=True,  # enable word-level timestamps
+                vad_filter=True,  # use VAD to filter out silence
+            )
+
+            logger.info(f"Detected language: {info.language} (prob: {info.language_probability:.2f})")
+
+            segments = []
+            for segment in segments_iter:
+                seg_data = {
+                    "text": segment.text.strip(),
+                    "start": segment.start,
+                    "end": segment.end,
+                    "words": []
+                }
+
+                # Extract each word's timestamps and split the word into characters
+                if segment.words:
+                    for word_info in segment.words:
+                        word_text = word_info.word.strip()
+                        if word_text:
+                            # Split the word into characters, interpolating timestamps linearly
+                            chars = split_word_to_chars(
+                                word_text,
+                                word_info.start,
+                                word_info.end
+                            )
+                            seg_data["words"].extend(chars)
+
+                if seg_data["words"]:  # only keep segments that have content
+                    segments.append(seg_data)
+
+            return {"segments": segments}
+
+        # Run in a thread pool
+        loop = asyncio.get_running_loop()
+        result = await loop.run_in_executor(None, _do_transcribe)
+
+        # Save to file
+        if output_path:
+            output_file = Path(output_path)
+            output_file.parent.mkdir(parents=True, exist_ok=True)
+            with open(output_file, "w", encoding="utf-8") as f:
+                json.dump(result, f, ensure_ascii=False, indent=2)
+            logger.info(f"Captions saved to: {output_path}")
+
+        return result
+
+    async def check_health(self) -> dict:
+        """Check service health."""
+        try:
+            from faster_whisper import WhisperModel
+            return {
+                "ready": True,
+                "model_size": self.model_size,
+                "device": self.device,
+                "backend": "faster-whisper"
+            }
+        except ImportError:
+            return {
+                "ready": False,
+                "error": "faster-whisper not installed"
+            }
+
+
+# Global service instance
+whisper_service = WhisperService()
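`align()` writes a `{"segments": [{"words": [...]}]}` structure that the renderer later reads. Because the per-character times are interpolated rather than measured, a cheap sanity check before rendering may catch malformed output early. The sketch below is one possible check, assuming the JSON shape produced by `align()`. `validate_captions` and its `tolerance` parameter are hypothetical and not part of the commit.

```python
def validate_captions(captions: dict, tolerance: float = 1e-3) -> list[str]:
    """Return a list of human-readable problems found in a caption dict.

    Hypothetical helper: checks that every entry ends no earlier than it
    starts, and that entries do not overlap the previous one by more than
    `tolerance` seconds.
    """
    problems: list[str] = []
    prev_end = 0.0
    for si, seg in enumerate(captions.get("segments", [])):
        for wi, word in enumerate(seg.get("words", [])):
            if word["end"] < word["start"]:
                problems.append(f"segment {si} word {wi}: end before start")
            if word["start"] < prev_end - tolerance:
                problems.append(f"segment {si} word {wi}: overlaps previous entry")
            prev_end = max(prev_end, word["end"])
    return problems


# Well-formed sample in the same shape that align() writes to disk
good = {"segments": [{"text": "大家好", "start": 0.0, "end": 0.9, "words": [
    {"word": "大", "start": 0.0, "end": 0.3},
    {"word": "家", "start": 0.3, "end": 0.6},
    {"word": "好", "start": 0.6, "end": 0.9},
]}]}

if __name__ == "__main__":
    print(validate_captions(good))
```

An empty list means nothing suspicious was found. A caller could log any reported problems and fall back to rendering without subtitles, in line with the existing fallback behaviour.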
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -28,3 +28,6 @@ supabase>=2.0.0
 python-jose[cryptography]>=3.3.0
 passlib[bcrypt]>=1.7.4
 bcrypt==4.0.1
+
+# Subtitle alignment
+faster-whisper>=1.0.0
--- a/frontend/README.md
+++ b/frontend/README.md
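@@ -24,6 +24,11 @@ The ViGent2 frontend, built with Next.js 14 + TailwindCSS.
 - **Reference audio management**: upload/list/delete reference audio (3-20 second WAV).
 - **One-click cloning**: selecting a reference audio automatically calls the Qwen3-TTS service.

+### 4. Subtitles and Title [added on Day 13]
+- **Opening title**: optional input; the title fades in and out over the first 3 seconds of the video.
+- **Per-character highlighted subtitles**: karaoke effect, on by default, can be turned off.
+- **Automatic alignment**: character-level timestamps generated with faster-whisper.
+
 ## 🛠️ Tech Stack

 - **Framework**: Next.js 14 (App Router)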
--- a/frontend/src/app/page.tsx
+++ b/frontend/src/app/page.tsx
@@ -74,6 +74,10 @@ export default function Home() {

  const [selectedVideoId, setSelectedVideoId] = useState<string | null>(null);

+  // Subtitle and title state
+  const [videoTitle, setVideoTitle] = useState<string>("");
+  const [enableSubtitles, setEnableSubtitles] = useState<boolean>(true);
+
  // Voice cloning state
  const [ttsMode, setTtsMode] = useState<'edgetts' | 'voiceclone'>('edgetts');
  const [refAudios, setRefAudios] = useState<RefAudio[]>([]);
@@ -356,6 +360,8 @@ export default function Home() {
        material_path: materialObj.path,
        text: text,
        tts_mode: ttsMode,
+        title: videoTitle.trim() || undefined,
+        enable_subtitles: enableSubtitles,
      };

      if (ttsMode === 'edgetts') {
@@ -587,6 +593,46 @@ export default function Home() {
              </div>
            </div>

+            {/* Title and subtitle settings */}
+            <div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
+              <h2 className="text-base sm:text-lg font-semibold text-white mb-4 flex items-center gap-2">
+                🎬 标题与字幕
+              </h2>
+
+              {/* Video title input */}
+              <div className="mb-4">
+                <label className="text-sm text-gray-300 mb-2 block">
+                  片头标题（可选）
+                </label>
+                <input
+                  type="text"
+                  value={videoTitle}
+                  onChange={(e) => setVideoTitle(e.target.value)}
+                  placeholder="输入视频标题，将在片头显示"
+                  className="w-full px-3 sm:px-4 py-2 text-sm sm:text-base bg-black/30 border border-white/10 rounded-xl text-white placeholder-gray-500 focus:outline-none focus:border-purple-500 transition-colors"
+                />
+              </div>
+
+              {/* Subtitle toggle */}
+              <div className="flex items-center justify-between">
+                <div>
+                  <span className="text-sm text-gray-300">逐字高亮字幕</span>
+                  <p className="text-xs text-gray-500 mt-1">
+                    自动生成卡拉OK效果字幕
+                  </p>
+                </div>
+                <label className="relative inline-flex items-center cursor-pointer">
+                  <input
+                    type="checkbox"
+                    checked={enableSubtitles}
+                    onChange={(e) => setEnableSubtitles(e.target.checked)}
+                    className="sr-only peer"
+                  />
+                  <div className="w-11 h-6 bg-gray-600 peer-focus:outline-none rounded-full peer peer-checked:after:translate-x-full peer-checked:after:border-white after:content-[''] after:absolute after:top-[2px] after:left-[2px] after:bg-white after:border-gray-300 after:border after:rounded-full after:h-5 after:w-5 after:transition-all peer-checked:bg-purple-600"></div>
+                </label>
+              </div>
+            </div>
+
            {/* Voice-over mode selection */}
            <div className="bg-white/5 rounded-2xl p-6 border border-white/10 backdrop-blur-sm">
              <h2 className="text-lg font-semibold text-white mb-4 flex items-center gap-2">
@@ -833,7 +879,7 @@ export default function Home() {
                      style={{ width: `${currentTask.progress}%` }}
                    />
                  </div>
-                  <p className="text-gray-300">{currentTask.message}</p>
+                  <p className="text-gray-300">正在用AI生成中...</p>
                </div>
              </div>
            )}
--- a/remotion/package-lock.json
+++ b/remotion/package-lock.json
--- a/remotion/package.json
+++ b/remotion/package.json
@@ -0,0 +1,24 @@
+{
+  "name": "vigent-remotion",
+  "version": "1.0.0",
+  "description": "Remotion video composition for ViGent2 subtitles and titles",
+  "scripts": {
+    "start": "remotion studio",
+    "build": "remotion bundle",
+    "render": "npx ts-node render.ts"
+  },
+  "dependencies": {
+    "remotion": "^4.0.0",
+    "@remotion/renderer": "^4.0.0",
+    "@remotion/cli": "^4.0.0",
+    "@remotion/media-utils": "^4.0.0",
+    "react": "^18.2.0",
+    "react-dom": "^18.2.0"
+  },
+  "devDependencies": {
+    "@types/node": "^20.0.0",
+    "@types/react": "^18.2.0",
+    "typescript": "^5.0.0",
+    "ts-node": "^10.9.0"
+  }
+}
--- a/remotion/render.ts
+++ b/remotion/render.ts
@@ -0,0 +1,153 @@
+/**
+ * Remotion server-side render script.
+ * Renders a video from the command line.
+ *
+ * Usage:
+ * npx ts-node render.ts --video /path/to/video.mp4 --captions /path/to/captions.json --title "Video title" --output /path/to/output.mp4
+ */
+
+import { bundle } from '@remotion/bundler';
+import { renderMedia, selectComposition } from '@remotion/renderer';
+import path from 'path';
+import fs from 'fs';
+
+interface RenderOptions {
+  videoPath: string;
+  captionsPath?: string;
+  title?: string;
+  titleDuration?: number;
+  outputPath: string;
+  fps?: number;
+  enableSubtitles?: boolean;
+}
+
+async function parseArgs(): Promise<RenderOptions> {
+  const args = process.argv.slice(2);
+  const options: Partial<RenderOptions> = {};
+
+  for (let i = 0; i < args.length; i += 2) {
+    const key = args[i].replace('--', '');
+    const value = args[i + 1];
+
+    switch (key) {
+      case 'video':
+        options.videoPath = value;
+        break;
+      case 'captions':
+        options.captionsPath = value;
+        break;
+      case 'title':
+        options.title = value;
+        break;
+      case 'titleDuration':
+        options.titleDuration = parseFloat(value);
+        break;
+      case 'output':
+        options.outputPath = value;
+        break;
+      case 'fps':
+        options.fps = parseInt(value, 10);
+        break;
+      case 'enableSubtitles':
+        options.enableSubtitles = value === 'true';
+        break;
+    }
+  }
+
+  if (!options.videoPath || !options.outputPath) {
+    console.error('Usage: npx ts-node render.ts --video <path> --output <path> [--captions <path>] [--title <text>] [--fps <number>]');
+    process.exit(1);
+  }
+
+  return options as RenderOptions;
+}
+
+async function main() {
+  const options = await parseArgs();
+  const fps = options.fps || 25;
+
+  console.log('Starting Remotion render...');
+  console.log('Options:', JSON.stringify(options, null, 2));
+
+  // Load the caption data
+  let captions = undefined;
+  if (options.captionsPath && fs.existsSync(options.captionsPath)) {
+    const captionsContent = fs.readFileSync(options.captionsPath, 'utf-8');
+    captions = JSON.parse(captionsContent);
+    console.log(`Loaded captions with ${captions.segments?.length || 0} segments`);
+  }
+
+  // Determine the video duration
+  let durationInFrames = 300; // default: 300 frames (12 s at 25 fps)
+  try {
+    // Query the duration with ffprobe; execFileSync passes the path without shell interpolation
+    const { execFileSync } = require('child_process');
+    const ffprobeArgs = ['-v', 'error', '-show_entries', 'format=duration',
+      '-of', 'default=noprint_wrappers=1:nokey=1', options.videoPath];
+    const ffprobeOutput = execFileSync('ffprobe', ffprobeArgs, { encoding: 'utf-8' });
+    const durationInSeconds = parseFloat(ffprobeOutput.trim());
+    if (!Number.isFinite(durationInSeconds)) throw new Error(`Unexpected ffprobe output: ${ffprobeOutput}`);
+    durationInFrames = Math.ceil(durationInSeconds * fps);
+    console.log(`Video duration: ${durationInSeconds}s (${durationInFrames} frames at ${fps}fps)`);
+  } catch (e) {
+    console.warn('Could not get video duration, using default:', e);
+  }
+
+  // Use the directory containing the video as publicDir and pass the bare file name as videoSrc
+  const publicDir = path.dirname(path.resolve(options.videoPath));
+  const videoFileName = path.basename(options.videoPath);
+  console.log(`Public dir: ${publicDir}, Video file: ${videoFileName}`);
+
+  // Bundle the Remotion project
+  console.log('Bundling Remotion project...');
+  const bundleLocation = await bundle({
+    entryPoint: path.resolve(__dirname, './src/index.ts'),
+    webpackOverride: (config) => config,
+    publicDir,
+  });
+
+  // Select the composition
+  const composition = await selectComposition({
+    serveUrl: bundleLocation,
+    id: 'ViGentVideo',
+    inputProps: {
+      videoSrc: videoFileName,
+      captions,
+      title: options.title,
+      titleDuration: options.titleDuration || 3,
+      enableSubtitles: options.enableSubtitles !== false,
+    },
+  });
+
+  // Override duration
+  composition.durationInFrames = durationInFrames;
+  composition.fps = fps;
+
+  // Render the video
+  console.log('Rendering video...');
+  await renderMedia({
+    composition,
+    serveUrl: bundleLocation,
+    codec: 'h264',
+    outputLocation: options.outputPath,
+    inputProps: {
+      videoSrc: videoFileName,
+      captions,
+      title: options.title,
+      titleDuration: options.titleDuration || 3,
+      enableSubtitles: options.enableSubtitles !== false,
+    },
+    onProgress: ({ progress }) => {
+      const percent = Math.round(progress * 100);
+      process.stdout.write(`\rRendering: ${percent}%`);
+    },
+  });
+
+  console.log('\nRender complete!');
+  console.log(`Output: ${options.outputPath}`);
+}
+
+main().catch((err) => {
+  console.error('Render failed:', err);
+  process.exit(1);
+});
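The `parseArgs` loop in `render.ts` above walks the arguments in fixed steps of two, so a flag given without a value shifts every later pair. A minimal sketch of a more tolerant parser follows. It is an illustration only and not part of this commit: `parseFlagPairs` is a hypothetical name, and treating a bare flag as `'true'` is an assumption.

```typescript
// Hypothetical helper: collects `--key value` pairs into a plain record.
// A flag with no value (end of input, or followed by another flag) is stored as 'true'.
function parseFlagPairs(argv: string[]): Record<string, string> {
  const flags: Record<string, string> = {};
  for (let i = 0; i < argv.length; i++) {
    const token = argv[i];
    if (!token.startsWith('--')) {
      continue; // stray positional token: ignore it
    }
    const key = token.slice(2);
    const value = argv[i + 1];
    if (value === undefined || value.startsWith('--')) {
      flags[key] = 'true';
    } else {
      flags[key] = value;
      i++; // skip the value that was just consumed
    }
  }
  return flags;
}

const parsedFlags = parseFlagPairs(['--video', 'a.mp4', '--enableSubtitles', '--fps', '30']);
console.log(parsedFlags);
```

One limitation of this sketch: a value that itself begins with `--` (for example a title) would be read as the next flag.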
--- a/remotion/src/Root.tsx
+++ b/remotion/src/Root.tsx
@@ -0,0 +1,30 @@
+import React from 'react';
+import { Composition } from 'remotion';
+import { Video, VideoProps } from './Video';
+
+/**
+ * Remotion root component.
+ * Defines the video composition configuration.
+ */
+export const RemotionRoot: React.FC = () => {
+  return (
+    <>
+      <Composition
+        id="ViGentVideo"
+        component={Video}
+        durationInFrames={300} // default value; overridden by render.ts
+        fps={25}
+        width={1280}
+        height={720}
+        defaultProps={{
+          videoSrc: '',
+          audioSrc: undefined,
+          captions: undefined,
+          title: undefined,
+          titleDuration: 3,
+          enableSubtitles: true,
+        }}
+      />
+    </>
+  );
+};
--- a/remotion/src/Video.tsx
+++ b/remotion/src/Video.tsx
@@ -0,0 +1,45 @@
+import React from 'react';
+import { AbsoluteFill } from 'remotion';
+import { VideoLayer } from './components/VideoLayer';
+import { Title } from './components/Title';
+import { Subtitles } from './components/Subtitles';
+import { CaptionsData } from './utils/captions';
+
+export interface VideoProps {
+  videoSrc: string;
+  audioSrc?: string;
+  captions?: CaptionsData;
+  title?: string;
+  titleDuration?: number;
+  enableSubtitles?: boolean;
+}
+
+/**
+ * Main video component.
+ * Composes the video layer, the title layer, and the subtitle layer.
+ */
+export const Video: React.FC<VideoProps> = ({
+  videoSrc,
+  audioSrc,
+  captions,
+  title,
+  titleDuration = 3,
+  enableSubtitles = true,
+}) => {
+  return (
+    <AbsoluteFill style={{ backgroundColor: 'black' }}>
+      {/* Bottom layer: video */}
+      <VideoLayer videoSrc={videoSrc} audioSrc={audioSrc} />
+
+      {/* Middle layer: subtitles */}
+      {enableSubtitles && captions && (
+        <Subtitles captions={captions} />
+      )}
+
+      {/* Top layer: title */}
+      {title && (
+        <Title title={title} duration={titleDuration} />
+      )}
+    </AbsoluteFill>
+  );
+};
--- a/remotion/src/components/Subtitles.tsx
+++ b/remotion/src/components/Subtitles.tsx
@@ -0,0 +1,85 @@
+import React from 'react';
+import { AbsoluteFill, useCurrentFrame, useVideoConfig } from 'remotion';
+import {
+  CaptionsData,
+  getCurrentSegment,
+  getCurrentWordIndex,
+} from '../utils/captions';
+
+interface SubtitlesProps {
+  captions: CaptionsData;
+  highlightColor?: string;
+  normalColor?: string;
+  fontSize?: number;
+}
+
+/**
+ * Character-by-character highlighted subtitle component.
+ * Highlights each character in turn according to its timestamp.
+ */
+export const Subtitles: React.FC<SubtitlesProps> = ({
+  captions,
+  highlightColor = '#FFFFFF',
+  normalColor = 'rgba(255, 255, 255, 0.5)',
+  fontSize = 36,
+}) => {
+  const frame = useCurrentFrame();
+  const { fps } = useVideoConfig();
+
+  const currentTimeInSeconds = frame / fps;
+
+  // Find the segment for the current time
+  const currentSegment = getCurrentSegment(captions, currentTimeInSeconds);
+
+  if (!currentSegment || currentSegment.words.length === 0) {
+    return null;
+  }
+
+  // Find the index of the character currently highlighted
+  const currentWordIndex = getCurrentWordIndex(currentSegment, currentTimeInSeconds);
+
+  return (
+    <AbsoluteFill
+      style={{
+        justifyContent: 'flex-end',
+        alignItems: 'center',
+        paddingBottom: '60px',
+      }}
+    >
+      <div
+        style={{
+          background: 'rgba(0, 0, 0, 0.6)',
+          padding: '12px 24px',
+          borderRadius: '12px',
+          maxWidth: '80%',
+          textAlign: 'center',
+        }}
+      >
+        <p
+          style={{
+            margin: 0,
+            fontSize: `${fontSize}px`,
+            fontFamily: '"Noto Sans SC", "Microsoft YaHei", sans-serif',
+            fontWeight: 500,
+            lineHeight: 1.5,
+          }}
+        >
+          {currentSegment.words.map((word, index) => (
+            <span
+              key={`${word.word}-${index}`}
+              style={{
+                color: index <= currentWordIndex ? highlightColor : normalColor,
+                // No CSS transition: Remotion renders each frame independently, so the color switches per frame
+                textShadow: index <= currentWordIndex
+                  ? '0 2px 10px rgba(255,255,255,0.3)'
+                  : 'none',
+              }}
+            >
+              {word.word}
+            </span>
+          ))}
+        </p>
+      </div>
+    </AbsoluteFill>
+  );
+};
--- a/remotion/src/components/Title.tsx
+++ b/remotion/src/components/Title.tsx
@@ -0,0 +1,94 @@
+import React from 'react';
+import {
+  AbsoluteFill,
+  interpolate,
+  useCurrentFrame,
+  useVideoConfig,
+} from 'remotion';
+
+interface TitleProps {
+  title: string;
+  duration?: number; // how long the title is shown (seconds)
+  fadeOutStart?: number; // when the fade-out begins (seconds)
+}
+
+/**
+ * Opening title component.
+ * Shows the title at the start of the video with a fade-in and fade-out.
+ */
+export const Title: React.FC<TitleProps> = ({
+  title,
+  duration = 3,
+  fadeOutStart = 2,
+}) => {
+  const frame = useCurrentFrame();
+  const { fps } = useVideoConfig();
+
+  const currentTimeInSeconds = frame / fps;
+
+  // Render nothing once the display duration has passed
+  if (currentTimeInSeconds > duration) {
+    return null;
+  }
+
+  // Fade-in (0 to 0.5 s)
+  const fadeInOpacity = interpolate(
+    currentTimeInSeconds,
+    [0, 0.5],
+    [0, 1],
+    { extrapolateRight: 'clamp' }
+  );
+
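+  // Fade-out; the start is kept below `duration` because interpolate() needs a strictly increasing input range
+  const fadeOutOpacity = interpolate(
+    currentTimeInSeconds,
+    [Math.min(fadeOutStart, duration - 1 / fps), duration],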
+    [1, 0],
+    { extrapolateLeft: 'clamp', extrapolateRight: 'clamp' }
+  );
+
+  const opacity = Math.min(fadeInOpacity, fadeOutOpacity);
+
+  // Subtle scale-up animation
+  const scale = interpolate(
+    currentTimeInSeconds,
+    [0, 0.5],
+    [0.95, 1],
+    { extrapolateRight: 'clamp' }
+  );
+
+  return (
+    <AbsoluteFill
+      style={{
+        justifyContent: 'center',
+        alignItems: 'center',
+        opacity,
+      }}
+    >
+      <div
+        style={{
+          transform: `scale(${scale})`,
+          textAlign: 'center',
+          padding: '40px 60px',
+          background: 'linear-gradient(135deg, rgba(0,0,0,0.7) 0%, rgba(0,0,0,0.5) 100%)',
+          borderRadius: '20px',
+          backdropFilter: 'blur(10px)',
+        }}
+      >
+        <h1
+          style={{
+            color: 'white',
+            fontSize: '48px',
+            fontWeight: 'bold',
+            fontFamily: '"Noto Sans SC", "Microsoft YaHei", sans-serif',
+            textShadow: '0 4px 20px rgba(0,0,0,0.5)',
+            margin: 0,
+            lineHeight: 1.4,
+          }}
+        >
+          {title}
+        </h1>
+      </div>
+    </AbsoluteFill>
+  );
+};
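The opacity logic in `Title.tsx` above combines a fade-in ramp and a fade-out ramp by taking their minimum. The sketch below reproduces that envelope as a pure function so it can be checked without Remotion. It is an illustration under assumptions, not the component's code: `ramp` and `titleOpacity` are hypothetical names, and `ramp` stands in for Remotion's `interpolate()` with clamped extrapolation.

```typescript
// Hypothetical stand-in for a clamped linear interpolation from (t0, v0) to (t1, v1).
function ramp(t: number, t0: number, t1: number, v0: number, v1: number): number {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 + ((v1 - v0) * (t - t0)) / (t1 - t0);
}

// Hypothetical helper: title opacity at time t (seconds).
// Defaults mirror the component: 3 s duration, 0.5 s fade-in, fade-out starting at 2 s.
function titleOpacity(t: number, duration = 3, fadeIn = 0.5, fadeOutStart = 2): number {
  if (t < 0 || t > duration) return 0; // title is not shown outside its window
  const fadeInOpacity = ramp(t, 0, fadeIn, 0, 1);
  // If the title is shorter than fadeOutStart there is no room to fade out; skip that ramp.
  const fadeOutOpacity = fadeOutStart < duration ? ramp(t, fadeOutStart, duration, 1, 0) : 1;
  return Math.min(fadeInOpacity, fadeOutOpacity);
}

console.log([0, 0.25, 1, 2.5, 3].map((t) => titleOpacity(t)));
```

Taking the minimum of the two ramps keeps the envelope continuous even when the fade-in and fade-out windows overlap.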
--- a/remotion/src/components/VideoLayer.tsx
+++ b/remotion/src/components/VideoLayer.tsx
@@ -0,0 +1,33 @@
+import React from 'react';
+import { AbsoluteFill, OffthreadVideo, Audio, staticFile } from 'remotion';
+
+interface VideoLayerProps {
+  videoSrc: string;
+  audioSrc?: string;
+}
+
+/**
+ * Video layer component.
+ * Renders the underlying video and audio.
+ */
+export const VideoLayer: React.FC<VideoLayerProps> = ({
+  videoSrc,
+  audioSrc,
+}) => {
+  // Load the video from publicDir via staticFile
+  const videoUrl = staticFile(videoSrc);
+
+  return (
+    <AbsoluteFill>
+      <OffthreadVideo
+        src={videoUrl}
+        style={{
+          width: '100%',
+          height: '100%',
+          objectFit: 'contain',
+        }}
+      />
+      {audioSrc && <Audio src={staticFile(audioSrc)} />}
+    </AbsoluteFill>
+  );
+};
--- a/remotion/src/index.ts
+++ b/remotion/src/index.ts
@@ -0,0 +1,4 @@
+import { registerRoot } from 'remotion';
+import { RemotionRoot } from './Root';
+
+registerRoot(RemotionRoot);
--- a/remotion/src/utils/captions.ts
+++ b/remotion/src/utils/captions.ts
@@ -0,0 +1,66 @@
+/**
+ * Caption data type definitions and helper utilities
+ */
+
+export interface WordTimestamp {
+  word: string;
+  start: number;
+  end: number;
+}
+
+export interface Segment {
+  text: string;
+  start: number;
+  end: number;
+  words: WordTimestamp[];
+}
+
+export interface CaptionsData {
+  segments: Segment[];
+}
+
+/**
+ * Returns the caption segment that should be shown at the current time
+ */
+export function getCurrentSegment(
+  captions: CaptionsData,
+  currentTimeInSeconds: number
+): Segment | null {
+  for (const segment of captions.segments) {
+    if (currentTimeInSeconds >= segment.start && currentTimeInSeconds <= segment.end) {
+      return segment;
+    }
+  }
+  return null;
+}
+
+/**
+ * Returns the index of the character highlighted at the current time
+ */
+export function getCurrentWordIndex(
+  segment: Segment,
+  currentTimeInSeconds: number
+): number {
+  for (let i = 0; i < segment.words.length; i++) {
+    const word = segment.words[i];
+    if (currentTimeInSeconds >= word.start && currentTimeInSeconds <= word.end) {
+      return i;
+    }
+    // If the current time falls in the gap between two characters, return the earlier one
+    if (i < segment.words.length - 1) {
+      const nextWord = segment.words[i + 1];
+      if (currentTimeInSeconds > word.end && currentTimeInSeconds < nextWord.start) {
+        return i;
+      }
+    }
+  }
+  // If the current time is past the end of the last character, return the last character
+  if (segment.words.length > 0) {
+    const lastWord = segment.words[segment.words.length - 1];
+    if (currentTimeInSeconds >= lastWord.end) {
+      return segment.words.length - 1;
+    }
+  }
+  return -1;
+}
+
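For sorted, non-overlapping word timestamps, the lookup in `getCurrentWordIndex` above behaves like "the last word whose start time has been reached". A compact sketch of that reading is shown below. It is an assumption-laden alternative, not the commit's function: `highlightIndexAt` is a hypothetical name, the input is assumed sorted by `start`, and at an exact boundary where one word's `end` equals the next word's `start` it picks the later word, whereas the original picks the earlier one.

```typescript
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

// Hypothetical helper: index of the last word whose start time is <= t, or -1 if none has started.
function highlightIndexAt(words: WordTimestamp[], t: number): number {
  let index = -1;
  for (let i = 0; i < words.length; i++) {
    if (words[i].start <= t) {
      index = i;
    } else {
      break; // words are assumed sorted by start time
    }
  }
  return index;
}

// Sample data taken from the character-splitting example in the dev log.
const sampleWords: WordTimestamp[] = [
  { word: '大', start: 0.0, end: 0.3 },
  { word: '家', start: 0.3, end: 0.6 },
  { word: '好', start: 0.6, end: 0.9 },
];

console.log([0.1, 0.45, 1.2].map((t) => highlightIndexAt(sampleWords, t))); // [ 0, 1, 2 ]
```

Because `Subtitles.tsx` highlights every index up to and including the returned one, a result of `-1` leaves the whole segment in the normal color.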
--- a/remotion/tsconfig.json
+++ b/remotion/tsconfig.json
@@ -0,0 +1,19 @@
+{
+  "compilerOptions": {
+    "target": "ES2020",
+    "module": "commonjs",
+    "lib": ["ES2020", "DOM"],
+    "jsx": "react-jsx",
+    "strict": true,
+    "esModuleInterop": true,
+    "skipLibCheck": true,
+    "forceConsistentCasingInFileNames": true,
+    "resolveJsonModule": true,
+    "declaration": true,
+    "declarationMap": true,
+    "outDir": "./dist",
+    "rootDir": "."
+  },
+  "include": ["src/**/*", "render.ts"],
+  "exclude": ["node_modules", "dist"]
+}
Author	SHA1	Message	Date
Kevin Wong	cf679b34bf	Update	2026-01-29 17:58:07 +08:00
Kevin Wong	b74bacb0b5	Update	2026-01-29 17:54:43 +08:00