Compare commits
2 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | cf679b34bf | | |
| | b74bacb0b5 | | |
@@ -1,4 +1,4 @@
# Day 13 - Voice Cloning Integration Complete
# Day 13 - Voice Cloning Integration + Subtitles

**Date**: 2026-01-29

@@ -276,4 +276,156 @@ pm2 logs vigent2-qwen-tts --lines 50
- [task_complete.md](../task_complete.md) - Task overview
- [Day12.md](./Day12.md) - iOS compatibility and Qwen3-TTS deployment
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide
- [SUBTITLE_DEPLOY.md](../SUBTITLE_DEPLOY.md) - Subtitle feature deployment guide
- [DEPLOY_MANUAL.md](../DEPLOY_MANUAL.md) - Full deployment manual

---

## 🎬 Per-Character Highlighted Subtitles + Opening Title

### Background

To improve video quality, this change adds per-character highlighted subtitles (karaoke effect) and an opening title.

### Technical Approach

| Component | Technology | Notes |
|------|------|------|
| Subtitle alignment | **faster-whisper** | Generates character-level timestamps |
| Video rendering | **Remotion** | React-based video composition framework |

### Architecture

```
Original pipeline:
text → EdgeTTS → audio → LatentSync → FFmpeg compose → final video

New pipeline:
text → EdgeTTS → audio ─┬→ LatentSync → lip-synced video ──┐
                        └→ faster-whisper → captions JSON ─┴→ Remotion compose → final video
```

### New Backend Services

#### 1. Subtitle service (`whisper_service.py`)

Generates character-level timestamps with faster-whisper:

```python
from faster_whisper import WhisperModel

class WhisperService:
    def __init__(self, model_size="large-v3", device="cuda"):
        self.model = WhisperModel(model_size, device=device)

    async def align(self, audio_path: str, text: str, output_path: str):
        segments, info = self.model.transcribe(audio_path, word_timestamps=True)
        # Split each word into single characters, linearly interpolating the timestamps
        result = {"segments": [...]}
        # Save to JSON
```

**Character-splitting algorithm**: for Chinese, faster-whisper returns word-level timestamps, so the service splits each word into single characters and linearly interpolates their timestamps:

```python
# Input: {"word": "大家好", "start": 0.0, "end": 0.9}
# Output:
[
    {"word": "大", "start": 0.0, "end": 0.3},
    {"word": "家", "start": 0.3, "end": 0.6},
    {"word": "好", "start": 0.6, "end": 0.9}
]
```
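
The interpolation described above can be sketched as a small standalone function. This is a minimal illustration that mirrors the `split_word_to_chars` helper added in `whisper_service.py`; it assumes every character in a word gets an equal share of the word's duration, which is an approximation rather than a true forced alignment.

```python
def split_word_to_chars(word: str, start: float, end: float) -> list[dict]:
    """Split a word into characters, spreading [start, end] evenly across them."""
    # Whitespace carries no visible glyph, so it gets no time slot.
    chars = [c for c in word if not c.isspace()]
    if not chars:
        return []
    step = (end - start) / len(chars)
    return [
        {
            "word": c,
            "start": round(start + i * step, 3),
            "end": round(start + (i + 1) * step, 3),
        }
        for i, c in enumerate(chars)
    ]


for entry in split_word_to_chars("大家好", 0.0, 0.9):
    print(entry)
```

Rounding to three decimals keeps the JSON compact; millisecond precision is already finer than one frame at 25 fps (40 ms).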

#### 2. Remotion render service (`remotion_service.py`)

Calls Remotion to render the subtitles and title:

```python
class RemotionService:
    async def render(self, video_path, output_path, captions_path, title, ...):
        cmd = f"npx ts-node render.ts --video {video_path} --output {output_path} ..."
        # Run the render
```
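
The summary above shows the command as a single f-string, while the `remotion_service.py` added in this commit builds an argument list. A minimal sketch of that list-building step follows; the flag names are taken from the commit, but the standalone function `build_render_cmd` is a hypothetical name used only for illustration.

```python
from typing import Optional


def build_render_cmd(
    video_path: str,
    output_path: str,
    captions_path: Optional[str] = None,
    title: Optional[str] = None,
    title_duration: float = 3.0,
    fps: int = 25,
    enable_subtitles: bool = True,
) -> list[str]:
    """Build the argv list for `npx ts-node render.ts` (hypothetical helper)."""
    cmd = [
        "npx", "ts-node", "render.ts",
        "--video", str(video_path),
        "--output", str(output_path),
        "--fps", str(fps),
        "--enableSubtitles", str(enable_subtitles).lower(),
    ]
    if captions_path:
        cmd += ["--captions", str(captions_path)]
    if title:
        # The title stays a single argv entry even if it contains spaces or quotes.
        cmd += ["--title", title, "--titleDuration", str(title_duration)]
    return cmd


print(build_render_cmd("in.mp4", "out.mp4", title="今日 分享"))
```

Passing a list to `subprocess.Popen` without `shell=True` avoids shell quoting problems that a user-supplied title could otherwise cause in an interpolated command string.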

### Remotion Project Layout

```
remotion/
├── package.json           # Node.js dependencies
├── render.ts              # Server-side render script
└── src/
    ├── Video.tsx          # Main video component
    ├── components/
    │   ├── Title.tsx      # Opening title (fade in/out)
    │   ├── Subtitles.tsx  # Per-character highlighted subtitles
    │   └── VideoLayer.tsx # Video layer
    └── utils/
        └── captions.ts    # Caption data types
```

### Frontend UI

A new title and subtitle settings panel:

| Feature | Notes |
|------|------|
| Opening title input | Optional; shown for 3 seconds at the start of the video |
| Subtitle toggle | On by default; can be turned off |

### Problems and Fixes

#### Problem 1: `fs` module error

**Symptom**: Remotion bundling fails with `fs.js doesn't exist`

**Cause**: `captions.ts` contained a `loadCaptions` function that used the Node.js `fs` module

**Fix**: Removed the unused `loadCaptions` function

#### Problem 2: Video file cannot be read

**Symptom**: Local videos cannot be loaded over the `file://` protocol

**Fix**:
1. `render.ts` points `publicDir` at the directory that contains the video
2. `VideoLayer.tsx` loads the video with `staticFile()`

```typescript
// render.ts
const publicDir = path.dirname(path.resolve(options.videoPath));
const bundleLocation = await bundle({
  entryPoint: path.resolve(__dirname, './src/index.ts'),
  publicDir,  // key setting
});

// VideoLayer.tsx
const videoUrl = staticFile(videoSrc);
```

### Test Results

- ✅ faster-whisper subtitle alignment succeeds (~1 s)
- ✅ Remotion render succeeds (~10 s)
- ✅ Per-character subtitle highlighting works
- ✅ Opening title fades in and out correctly
- ✅ Fallback works (falls back to FFmpeg when Remotion fails)

---

## 📁 Files Changed Today (Complete)

| File | Change | Notes |
|------|----------|------|
| `models/Qwen3-TTS/qwen_tts_server.py` | Added | Qwen3-TTS HTTP inference service |
| `run_qwen_tts.sh` | Added | PM2 launch script (repo root) |
| `backend/app/services/voice_clone_service.py` | Added | Voice cloning service (HTTP client) |
| `backend/app/services/whisper_service.py` | Added | Subtitle alignment service (faster-whisper) |
| `backend/app/services/remotion_service.py` | Added | Remotion render service |
| `backend/app/api/ref_audios.py` | Added | Reference audio management API |
| `backend/app/api/videos.py` | Modified | Integrates subtitles and title |
| `backend/app/main.py` | Modified | Registers the ref-audios router |
| `backend/requirements.txt` | Modified | Adds the faster-whisper dependency |
| `remotion/` | Added | Remotion video rendering project |
| `frontend/src/app/page.tsx` | Modified | TTS mode selection + title/subtitle UI |
| `Docs/SUBTITLE_DEPLOY.md` | Added | Subtitle feature deployment doc |

281
Docs/SUBTITLE_DEPLOY.md
Normal file
@@ -0,0 +1,281 @@
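# ViGent2 Subtitle and Title Deployment Guide

This document describes how to deploy ViGent2's per-character highlighted subtitles and opening title.

## Feature Overview

| Feature | Description |
|------|------|
| **Per-character highlighted subtitles** | faster-whisper generates character-level timestamps; Remotion renders the karaoke effect |
| **Opening title** | Shown at the start of the video with a fade in/out animation, then disappears after a few seconds |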

## Architecture

```
Original pipeline:
text → EdgeTTS → audio → LatentSync → FFmpeg compose → final video

New pipeline:
text → EdgeTTS → audio ─┬→ LatentSync → lip-synced video ──┐
                        └→ faster-whisper → captions JSON ─┴→ Remotion compose → final video
```

## System Requirements

| Component | Requirement |
|------|------|
| Node.js | 18+ |
| Python | 3.10+ |
| GPU memory | faster-whisper needs roughly 3-4 GB of VRAM |
| FFmpeg | Installed |

---

## Deployment Steps

### Step 1: Install faster-whisper (Python)

```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate

# Install faster-whisper (the spec is quoted so the shell does not treat ">" as a redirect)
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
```

> **Note**: On first run, faster-whisper automatically downloads the `large-v3` Whisper model (~3 GB)

### Step 2: Install Remotion (Node.js)

```bash
cd /home/rongye/ProgramFiles/ViGent2/remotion

# Install dependencies
npm install
```

### Step 3: Restart the backend service

```bash
pm2 restart vigent2-backend
```

### Step 4: Verify the installation

```bash
# Check that faster-whisper is installed
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
python -c "from faster_whisper import WhisperModel; print('faster-whisper OK')"

# Check that Remotion is installed
cd /home/rongye/ProgramFiles/ViGent2/remotion
npx remotion --version
```

---

## File Layout

### New backend files

| File | Description |
|------|------|
| `backend/app/services/whisper_service.py` | Subtitle alignment service (based on faster-whisper) |
| `backend/app/services/remotion_service.py` | Remotion render service |

### Remotion project layout

```
remotion/
├── package.json           # Node.js dependency manifest
├── tsconfig.json          # TypeScript configuration
├── render.ts              # Server-side render script
└── src/
    ├── index.ts           # Remotion entry point
    ├── Root.tsx           # Root component
    ├── Video.tsx          # Main video component
    ├── components/
    │   ├── Title.tsx      # Opening title component
    │   ├── Subtitles.tsx  # Per-character highlighted subtitle component
    │   └── VideoLayer.tsx # Video layer component
    ├── utils/
    │   └── captions.ts    # Caption data utilities
    └── fonts/             # Font files (optional)
```

---

## API Parameters

The video generation API (`POST /api/videos/generate`) gains the following parameters:

| Parameter | Type | Default | Description |
|------|------|--------|------|
| `title` | string | null | Video title (shown at the start of the video; optional) |
| `enable_subtitles` | boolean | true | Whether to enable per-character highlighted subtitles |

### Example Request

```json
{
  "material_path": "https://...",
  "text": "大家好,欢迎来到我的频道",
  "tts_mode": "edgetts",
  "voice": "zh-CN-YunxiNeural",
  "title": "今日分享",
  "enable_subtitles": true
}
```
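
A client can assemble this request body programmatically. The sketch below is a hypothetical helper (`build_generate_payload` is not part of the commit) that mirrors the frontend behaviour of omitting `title` when the input is blank, so the backend default of `null` applies.

```python
import json
from typing import Optional


def build_generate_payload(
    material_path: str,
    text: str,
    tts_mode: str = "edgetts",
    voice: Optional[str] = None,
    title: str = "",
    enable_subtitles: bool = True,
) -> dict:
    """Build the JSON body for POST /api/videos/generate (hypothetical helper)."""
    payload = {
        "material_path": material_path,
        "text": text,
        "tts_mode": tts_mode,
        "enable_subtitles": enable_subtitles,
    }
    if voice:
        payload["voice"] = voice
    cleaned_title = title.strip()
    if cleaned_title:
        # Only send a title when the user actually typed one.
        payload["title"] = cleaned_title
    return payload


body = build_generate_payload(
    "https://...", "大家好,欢迎来到我的频道",
    voice="zh-CN-YunxiNeural", title=" 今日分享 ",
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```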

---

## Video Generation Flow

Progress allocation in the new pipeline:

| Stage | Progress | Description |
|------|------|------|
| Download material | 0% → 5% | Download the input video from Supabase |
| TTS speech generation | 5% → 25% | EdgeTTS or Qwen3-TTS generates the audio |
| Lip sync | 25% → 80% | LatentSync inference |
| Subtitle alignment | 80% → 85% | faster-whisper generates character-level timestamps |
| Remotion render | 85% → 95% | Composites subtitles and title |
| Upload result | 95% → 100% | Upload to Supabase Storage |
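
Each stage reports its own 0-100% progress, which then has to be mapped onto its slice of the overall bar. The sketch below shows one way to do that; `map_stage_progress` is a hypothetical helper, and the commit's `videos.py` hard-codes the equivalent arithmetic for the Remotion stage as `87 + int(percent * 0.08)`.

```python
def map_stage_progress(stage_percent: float, stage_start: int, stage_end: int) -> int:
    """Map a stage-local percentage (0-100) onto the overall [stage_start, stage_end] range."""
    # Clamp first so a noisy reporter can never push the bar outside the stage's slice.
    clamped = max(0.0, min(100.0, float(stage_percent)))
    return stage_start + int(clamped * (stage_end - stage_start) / 100)


for pct in (0, 50, 100):
    print(pct, map_stage_progress(pct, 85, 95))
```

Keeping the stage boundaries as parameters means the table above can change without touching the mapping logic.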

---

## Fallback Handling

The system degrades automatically so that core functionality is not affected:

| Scenario | Handling |
|------|----------|
| Subtitle alignment fails | Skip subtitles and continue generating the video |
| Remotion not installed | Compose directly with FFmpeg |
| Remotion render fails | Fall back to the FFmpeg-composed video |
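
The render-failure row of the table can be sketched as a try/except around the overlay render. This is a minimal illustration under stated assumptions: `render_with_fallback` and `failing_render` are hypothetical names, and the fake renderer always raises so that the fallback path is exercised.

```python
import asyncio
import shutil
import tempfile
from pathlib import Path


async def render_with_fallback(render, composed: Path, output: Path) -> str:
    """Try the overlay render; on any failure, ship the plain composed video."""
    try:
        await render(composed, output)
        return "remotion"
    except Exception as exc:  # broad on purpose: any render error triggers the fallback
        print(f"render failed, falling back to FFmpeg output: {exc}")
        shutil.copy(composed, output)
        return "ffmpeg"


async def failing_render(src: Path, dst: Path) -> None:
    # Hypothetical stand-in for a renderer that crashes.
    raise RuntimeError("simulated render failure")


def demo() -> tuple[str, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        composed = Path(tmp) / "composed.mp4"
        output = Path(tmp) / "output.mp4"
        composed.write_bytes(b"fake video bytes")
        used = asyncio.run(render_with_fallback(failing_render, composed, output))
        return used, output.read_bytes()


print(demo())
```

Because the FFmpeg-composed file is produced before the overlay render starts, the fallback is a cheap file copy rather than a second encode.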

---

## Configuration

### Subtitle service

The subtitle service lives in `backend/app/services/whisper_service.py`. Defaults:

| Parameter | Default | Description |
|------|--------|------|
| `model_size` | large-v3 | Whisper model size |
| `device` | cuda | Device to run on |
| `compute_type` | float16 | Compute precision |

To change these, edit the `WhisperService` constructor arguments in `whisper_service.py`.

### Remotion

Remotion render parameters are configured in `backend/app/services/remotion_service.py`:

| Parameter | Default | Description |
|------|--------|------|
| `fps` | 25 | Output frame rate |
| `title_duration` | 3.0 | How long the title is shown (seconds) |

---

## Troubleshooting

### faster-whisper

**Problem**: `ModuleNotFoundError: No module named 'faster_whisper'`

```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
```

**Problem**: Not enough GPU memory

Edit `whisper_service.py` to use a smaller model:
```python
WhisperService(model_size="medium", compute_type="int8")
```

### Remotion

**Problem**: `node_modules not found`

```bash
cd /home/rongye/ProgramFiles/ViGent2/remotion
npm install
```

**Problem**: Remotion render fails with an `fs` module error

Make sure `remotion/src/utils/captions.ts` does not use the Node.js `fs` module. Remotion bundles this code for a browser environment, where `fs` is not available.

**Problem**: Remotion render fails because the video file cannot be read (`file://` protocol)

Make sure `render.ts` uses the `publicDir` option to point at the directory containing the video, and that `VideoLayer.tsx` loads the video with `staticFile()`:

```typescript
// render.ts
const publicDir = path.dirname(path.resolve(options.videoPath));
const bundleLocation = await bundle({
  entryPoint: path.resolve(__dirname, './src/index.ts'),
  publicDir,  // key setting
});

// VideoLayer.tsx
const videoUrl = staticFile(videoSrc);  // use staticFile
```

**Problem**: Remotion render fails

Check the backend logs:
```bash
pm2 logs vigent2-backend
```

### Checking service health

```bash
# Subtitle service health check
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
python -c "from app.services.whisper_service import whisper_service; import asyncio; print(asyncio.run(whisper_service.check_health()))"

# Remotion health check
python -c "from app.services.remotion_service import remotion_service; import asyncio; print(asyncio.run(remotion_service.check_health()))"
```

---
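
## Optional Optimizations

### Add a Chinese font

For better subtitle rendering, you can add a Chinese font:

```bash
# Download the Noto Sans SC font
cd /home/rongye/ProgramFiles/ViGent2/remotion/src/fonts
wget https://github.com/googlefonts/noto-cjk/raw/main/Sans/OTF/SimplifiedChinese/NotoSansSC-Regular.otf -O NotoSansSC.otf
```

### Using GPU 0

faster-whisper uses GPU 0 by default, which keeps it separate from LatentSync (GPU 1) and avoids VRAM contention. To pin a specific GPU, note that faster-whisper's `WhisperModel` expects `device="cuda"` together with a separate `device_index` argument rather than a `"cuda:0"` string, so `_load_model` would need to pass the index through:

```python
# In whisper_service.py, inside _load_model (device_index is not yet a WhisperService parameter)
WhisperModel(self.model_size, device="cuda", device_index=0, compute_type=self.compute_type)  # or device_index=1
```

---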

## Changelog

| Date | Version | Notes |
|------|------|------|
| 2026-01-29 | 1.0.0 | Initial version: per-character highlighted subtitles and opening title built on faster-whisper + Remotion |
@@ -3,7 +3,7 @@
**Project**: ViGent2 digital-human talking-head video generation system
**Server**: Dell R730 (2× RTX 3090 24GB)
**Last updated**: 2026-01-29
**Overall progress**: 100% (Day 13 voice cloning integration complete)
**Overall progress**: 100% (Day 13 voice cloning + subtitles complete)

## 📖 Quick Navigation

@@ -177,6 +177,14 @@
- [x] **Supabase ref-audios Bucket** (reference audio bucket + RLS policies)
- [x] **End-to-end test verification** (full voice cloning flow passes)

### Phase 21: Per-Character Highlighted Subtitles + Opening Title (Day 13)
- [x] **faster-whisper subtitle alignment** (character-level timestamp generation)
- [x] **Remotion video rendering** (React video composition framework)
- [x] **Per-character highlighted subtitles** (karaoke effect)
- [x] **Opening title** (fade in/out animation)
- [x] **Frontend title/subtitle settings UI**
- [x] **Fallback mechanism** (falls back to FFmpeg when Remotion fails)

---

## 🛤️ Roadmap
@@ -187,6 +195,7 @@
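### 🟠 Feature Completion
- [x] Qwen3-TTS integrated into ViGent2 ✅ Done on Day 13
- [x] Scheduled publishing ✅ Done on Day 7
- [x] Per-character highlighted subtitles ✅ Done on Day 13
- [ ] **Backend scheduled publishing** - Replace platform-side scheduling with APScheduler-based task scheduling
- [ ] Batch video generation
- [ ] Subtitle style editor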
@@ -366,11 +375,15 @@ Day 12: iOS 兼容与移动端优化 ✅ 完成
- **Qwen3-TTS 0.6B deployment** (voice cloning model, GPU0)
- **Deployment doc** (QWEN3_TTS_DEPLOY.md)

Day 13: Voice cloning integration ✅ Done
Day 13: Voice cloning + subtitles ✅ Done
- Qwen3-TTS HTTP service (standalone FastAPI, port 8009)
- Voice cloning service (voice_clone_service.py)
- Reference audio management API (upload/list/delete)
- Frontend TTS mode selection (EdgeTTS / voice cloning)
- Supabase ref-audios Bucket configuration
- End-to-end test verification passed
- **faster-whisper subtitle alignment** (character-level timestamps)
- **Remotion video rendering** (per-character highlighted subtitles + opening title)
- **Frontend title/subtitle settings UI**
- **Deployment doc** (SUBTITLE_DEPLOY.md)

@@ -10,7 +10,9 @@
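
- 🎬 **Lip sync** - Powered by LatentSync 1.6, a 512×512 high-resolution diffusion model
- 🎙️ **TTS voiceover** - EdgeTTS with multiple voices (Yunxi, Xiaoxiao, etc.)
- 🔊 **Voice cloning** - Qwen3-TTS 0.6B, fast cloning from 3 seconds of reference audio 🆕
- 🔊 **Voice cloning** - Qwen3-TTS 0.6B, fast cloning from 3 seconds of reference audio
- 📝 **Per-character highlighted subtitles** - faster-whisper + Remotion, karaoke effect 🆕
- 🎬 **Opening title** - Fade in/out animation, customizable 🆕
- 📱 **Fully automated publishing** - QR-code login + cookie persistence, scheduled publishing to multiple platforms (Bilibili/Douyin/Xiaohongshu)
- 🖥️ **Web UI** - Modern Next.js interface, adapted for iOS/Android mobile
- 🔐 **User system** - Supabase + JWT authentication, with an admin console and sign-up/login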
@@ -29,6 +31,7 @@
| Lip sync | **LatentSync 1.6** (Latent Diffusion, 512×512) |
| TTS | EdgeTTS |
| Voice cloning | **Qwen3-TTS 0.6B** |
| Subtitle rendering | **faster-whisper + Remotion** |
| Video processing | FFmpeg |
| Auto publishing | Playwright |

@@ -152,6 +155,7 @@ nohup python -m scripts.server > server.log 2>&1 &

- [Manual deployment guide](Docs/DEPLOY_MANUAL.md)
- [Supabase deployment guide](Docs/SUPABASE_DEPLOY.md)
- [Subtitle feature deployment guide](Docs/SUBTITLE_DEPLOY.md)
- [LatentSync deployment guide](models/LatentSync/DEPLOY.md)
- [Development logs](Docs/DevLogs/)
- [Task progress](Docs/task_complete.md)

@@ -13,6 +13,8 @@ from app.services.video_service import VideoService
from app.services.lipsync_service import LipSyncService
from app.services.voice_clone_service import voice_clone_service
from app.services.storage import storage_service
from app.services.whisper_service import whisper_service
from app.services.remotion_service import remotion_service
from app.core.config import settings
from app.core.deps import get_current_user

@@ -26,6 +28,9 @@ class GenerateRequest(BaseModel):
    tts_mode: str = "edgetts"  # "edgetts" | "voiceclone"
    ref_audio_id: Optional[str] = None  # reference audio storage path
    ref_text: Optional[str] = None  # transcript of the reference audio
    # Subtitle and title options
    title: Optional[str] = None  # video title (shown at the start of the video)
    enable_subtitles: bool = True  # whether to enable per-character highlighted subtitles

tasks = {}  # In-memory task store

@@ -167,17 +172,84 @@ async def _process_video_generation(task_id: str, req: GenerateRequest, user_id:

    lipsync_time = time.time() - lipsync_start
    print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
    tasks[task_id]["progress"] = 80

    # 3. Subtitle alignment (faster-whisper) - progress 80% -> 85%
    captions_path = None
    if req.enable_subtitles:
        tasks[task_id]["message"] = "正在生成字幕 (Whisper)..."
        tasks[task_id]["progress"] = 82

        captions_path = temp_dir / f"{task_id}_captions.json"
        temp_files.append(captions_path)

        try:
            await whisper_service.align(
                audio_path=str(audio_path),
                text=req.text,
                output_path=str(captions_path)
            )
            print(f"[Pipeline] Whisper alignment completed")
        except Exception as e:
            logger.warning(f"Whisper alignment failed, skipping subtitles: {e}")
            captions_path = None

    tasks[task_id]["progress"] = 85

    # 3. Composition - progress 85% -> 100%
    tasks[task_id]["message"] = "正在合成最终视频..."
    tasks[task_id]["progress"] = 90
    # 4. Remotion composition (subtitles + title) - progress 85% -> 95%
    # Decide whether Remotion is needed (used when there are captions or a title)
    use_remotion = (captions_path and captions_path.exists()) or req.title

    video = VideoService()
    final_output_local_path = temp_dir / f"{task_id}_output.mp4"
    temp_files.append(final_output_local_path)

    await video.compose(str(lipsync_video_path), str(audio_path), str(final_output_local_path))
    if use_remotion:
        tasks[task_id]["message"] = "正在合成视频 (Remotion)..."
        tasks[task_id]["progress"] = 87

        # Compose audio and video with FFmpeg first (Remotion needs a video that already has audio)
        composed_video_path = temp_dir / f"{task_id}_composed.mp4"
        temp_files.append(composed_video_path)

        video = VideoService()
        await video.compose(str(lipsync_video_path), str(audio_path), str(composed_video_path))

        # Check whether Remotion is available
        remotion_health = await remotion_service.check_health()
        if remotion_health.get("ready"):
            try:
                def on_remotion_progress(percent):
                    # Map Remotion progress onto 87-95%
                    mapped = 87 + int(percent * 0.08)
                    tasks[task_id]["progress"] = mapped

                await remotion_service.render(
                    video_path=str(composed_video_path),
                    output_path=str(final_output_local_path),
                    captions_path=str(captions_path) if captions_path else None,
                    title=req.title,
                    title_duration=3.0,
                    fps=25,
                    enable_subtitles=req.enable_subtitles,
                    on_progress=on_remotion_progress
                )
                print(f"[Pipeline] Remotion render completed")
            except Exception as e:
                logger.warning(f"Remotion render failed, using FFmpeg fallback: {e}")
                # Fall back to the FFmpeg-composed video
                import shutil
                shutil.copy(str(composed_video_path), final_output_local_path)
        else:
            logger.warning(f"Remotion not ready: {remotion_health.get('error')}, using FFmpeg")
            import shutil
            shutil.copy(str(composed_video_path), final_output_local_path)
    else:
        # No subtitles or title needed; compose directly with FFmpeg
        tasks[task_id]["message"] = "正在合成最终视频..."
        tasks[task_id]["progress"] = 90

        video = VideoService()
        await video.compose(str(lipsync_video_path), str(audio_path), str(final_output_local_path))

    total_time = time.time() - start_time


150
backend/app/services/remotion_service.py
Normal file
@@ -0,0 +1,150 @@
"""
Remotion video rendering service
Calls Node.js Remotion to composite the video (subtitles + title)
"""

import asyncio
import subprocess
from pathlib import Path
from typing import Optional
from loguru import logger


class RemotionService:
    """Remotion video rendering service"""

    def __init__(self, remotion_dir: Optional[str] = None):
        # Remotion project directory
        if remotion_dir:
            self.remotion_dir = Path(remotion_dir)
        else:
            # Defaults to the ViGent2/remotion directory
            self.remotion_dir = Path(__file__).parent.parent.parent.parent / "remotion"

    async def render(
        self,
        video_path: str,
        output_path: str,
        captions_path: Optional[str] = None,
        title: Optional[str] = None,
        title_duration: float = 3.0,
        fps: int = 25,
        enable_subtitles: bool = True,
        on_progress: Optional[callable] = None
    ) -> str:
        """
        Render the video with Remotion (adds subtitles and title)

        Args:
            video_path: Input video path (the lip-synced video)
            output_path: Output video path
            captions_path: Path to the captions JSON file (generated by Whisper)
            title: Video title (optional)
            title_duration: How long the title is shown (seconds)
            fps: Frame rate
            enable_subtitles: Whether subtitles are enabled
            on_progress: Progress callback

        Returns:
            Output video path
        """
        # Build the command arguments
        cmd = [
            "npx", "ts-node", "render.ts",
            "--video", str(video_path),
            "--output", str(output_path),
            "--fps", str(fps),
            "--enableSubtitles", str(enable_subtitles).lower()
        ]

        if captions_path:
            cmd.extend(["--captions", str(captions_path)])

        if title:
            cmd.extend(["--title", title])
            cmd.extend(["--titleDuration", str(title_duration)])

        logger.info(f"Running Remotion render: {' '.join(cmd)}")

        # Run the subprocess in a thread pool
        def _run_render():
            process = subprocess.Popen(
                cmd,
                cwd=str(self.remotion_dir),
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                bufsize=1
            )

            output_lines = []
            for line in iter(process.stdout.readline, ''):
                line = line.strip()
                if line:
                    output_lines.append(line)
                    logger.debug(f"[Remotion] {line}")

                    # Parse progress
                    if "Rendering:" in line and "%" in line:
                        try:
                            percent_str = line.split("Rendering:")[1].strip().replace("%", "")
                            percent = int(percent_str)
                            if on_progress:
                                on_progress(percent)
                        except (ValueError, IndexError):
                            pass

            process.wait()

            if process.returncode != 0:
                error_msg = "\n".join(output_lines[-20:])  # last 20 lines
                raise RuntimeError(f"Remotion render failed (code {process.returncode}):\n{error_msg}")

            return output_path

        # Inside a coroutine, get_running_loop() is the idiomatic way to reach the active loop
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(None, _run_render)

        logger.info(f"Remotion render complete: {result}")
        return result

    async def check_health(self) -> dict:
        """Check the health of the Remotion service"""
        try:
            # Check that the remotion directory exists
            if not self.remotion_dir.exists():
                return {
                    "ready": False,
                    "error": f"Remotion directory not found: {self.remotion_dir}"
                }

            # Check that package.json exists
            package_json = self.remotion_dir / "package.json"
            if not package_json.exists():
                return {
                    "ready": False,
                    "error": "package.json not found"
                }

            # Check that node_modules exists
            node_modules = self.remotion_dir / "node_modules"
            if not node_modules.exists():
                return {
                    "ready": False,
                    "error": "node_modules not found, run 'npm install' first"
                }

            return {
                "ready": True,
                "remotion_dir": str(self.remotion_dir)
            }

        except Exception as e:
            return {
                "ready": False,
                "error": str(e)
            }


# Global service instance
remotion_service = RemotionService()
176
backend/app/services/whisper_service.py
Normal file
@@ -0,0 +1,176 @@
"""
Subtitle alignment service
Uses faster-whisper to generate character-level timestamps
"""

import json
import re
from pathlib import Path
from typing import Optional
from loguru import logger

# Model cache
_whisper_model = None


def split_word_to_chars(word: str, start: float, end: float) -> list:
    """
    Split a word into single characters, linearly interpolating the timestamps

    Args:
        word: Word text
        start: Word start time
        end: Word end time

    Returns:
        List of single-character entries, each with word/start/end
    """
    # Drop whitespace characters (no other filtering is applied)
    chars = [c for c in word if c.strip()]
    if not chars:
        return []

    if len(chars) == 1:
        return [{"word": chars[0], "start": start, "end": end}]

    # Linearly interpolate the timestamps
    duration = end - start
    char_duration = duration / len(chars)

    result = []
    for i, char in enumerate(chars):
        char_start = start + i * char_duration
        char_end = start + (i + 1) * char_duration
        result.append({
            "word": char,
            "start": round(char_start, 3),
            "end": round(char_end, 3)
        })

    return result


class WhisperService:
    """Subtitle alignment service (based on faster-whisper)"""

    def __init__(
        self,
        model_size: str = "large-v3",
        device: str = "cuda",
        compute_type: str = "float16",
    ):
        self.model_size = model_size
        self.device = device
        self.compute_type = compute_type

    def _load_model(self):
        """Lazily load the faster-whisper model"""
        global _whisper_model

        if _whisper_model is None:
            from faster_whisper import WhisperModel

            logger.info(f"Loading faster-whisper model: {self.model_size} on {self.device}")
            _whisper_model = WhisperModel(
                self.model_size,
                device=self.device,
                compute_type=self.compute_type
            )
            logger.info("faster-whisper model loaded")

        return _whisper_model

    async def align(
        self,
        audio_path: str,
        text: str,
        output_path: Optional[str] = None
    ) -> dict:
        """
        Transcribe the audio and generate character-level timestamps

        Args:
            audio_path: Path to the audio file
            text: Original text (kept for reference; the Whisper transcript is what is actually used)
            output_path: Optional path for the output JSON file

        Returns:
            Dict containing character-level timestamps
        """
        import asyncio

        def _do_transcribe():
            model = self._load_model()

            logger.info(f"Transcribing audio: {audio_path}")

            # Transcribe and request word-level timestamps
            segments_iter, info = model.transcribe(
                audio_path,
                language="zh",
                word_timestamps=True,  # enable word-level timestamps
                vad_filter=True,       # enable VAD to filter out silence
            )

            logger.info(f"Detected language: {info.language} (prob: {info.language_probability:.2f})")

            segments = []
            for segment in segments_iter:
                seg_data = {
                    "text": segment.text.strip(),
                    "start": segment.start,
                    "end": segment.end,
                    "words": []
                }

                # Take each word's timestamps and split the word into single characters
                if segment.words:
                    for word_info in segment.words:
                        word_text = word_info.word.strip()
                        if word_text:
                            # Split the word into characters, linearly interpolating timestamps
                            chars = split_word_to_chars(
                                word_text,
                                word_info.start,
                                word_info.end
                            )
                            seg_data["words"].extend(chars)

                if seg_data["words"]:  # only keep segments that have content
                    segments.append(seg_data)

            return {"segments": segments}

        # Run in a thread pool (get_running_loop() is the idiomatic call inside a coroutine)
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(None, _do_transcribe)

        # Save to file
        if output_path:
            output_file = Path(output_path)
            output_file.parent.mkdir(parents=True, exist_ok=True)
            with open(output_file, "w", encoding="utf-8") as f:
                json.dump(result, f, ensure_ascii=False, indent=2)
            logger.info(f"Captions saved to: {output_path}")

        return result

    async def check_health(self) -> dict:
        """Check service health"""
        try:
            from faster_whisper import WhisperModel
            return {
                "ready": True,
                "model_size": self.model_size,
                "device": self.device,
                "backend": "faster-whisper"
            }
        except ImportError:
            return {
                "ready": False,
                "error": "faster-whisper not installed"
            }


# Global service instance
whisper_service = WhisperService()
@@ -28,3 +28,6 @@ supabase>=2.0.0
python-jose[cryptography]>=3.3.0
passlib[bcrypt]>=1.7.4
bcrypt==4.0.1

# Subtitle alignment
faster-whisper>=1.0.0

@@ -24,6 +24,11 @@ ViGent2 的前端界面,采用 Next.js 14 + TailwindCSS 构建。
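- **Reference audio management**: Upload/list/delete reference audio (3-20 second WAV).
- **One-click cloning**: Selecting a reference audio automatically calls the Qwen3-TTS service.

### 4. Subtitles and Title [Added on Day 13]
- **Opening title**: Optional input; a title fades in and out over 3 seconds at the start of the video.
- **Per-character highlighted subtitles**: Karaoke effect; on by default, can be turned off.
- **Automatic alignment**: Character-level timestamps generated with faster-whisper.

## 🛠️ Tech Stack

- **Framework**: Next.js 14 (App Router)
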
@@ -74,6 +74,10 @@ export default function Home() {

  const [selectedVideoId, setSelectedVideoId] = useState<string | null>(null);

  // Subtitle and title state
  const [videoTitle, setVideoTitle] = useState<string>("");
  const [enableSubtitles, setEnableSubtitles] = useState<boolean>(true);

  // Voice cloning state
  const [ttsMode, setTtsMode] = useState<'edgetts' | 'voiceclone'>('edgetts');
  const [refAudios, setRefAudios] = useState<RefAudio[]>([]);
@@ -356,6 +360,8 @@ export default function Home() {
        material_path: materialObj.path,
        text: text,
        tts_mode: ttsMode,
        title: videoTitle.trim() || undefined,
        enable_subtitles: enableSubtitles,
      };

      if (ttsMode === 'edgetts') {
@@ -587,6 +593,46 @@ export default function Home() {
          </div>
        </div>

        {/* Title and subtitle settings */}
        <div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
          <h2 className="text-base sm:text-lg font-semibold text-white mb-4 flex items-center gap-2">
            🎬 标题与字幕
          </h2>

          {/* Video title input */}
          <div className="mb-4">
            <label className="text-sm text-gray-300 mb-2 block">
              片头标题(可选)
            </label>
            <input
              type="text"
              value={videoTitle}
              onChange={(e) => setVideoTitle(e.target.value)}
              placeholder="输入视频标题,将在片头显示"
              className="w-full px-3 sm:px-4 py-2 text-sm sm:text-base bg-black/30 border border-white/10 rounded-xl text-white placeholder-gray-500 focus:outline-none focus:border-purple-500 transition-colors"
            />
          </div>

          {/* Subtitle toggle */}
          <div className="flex items-center justify-between">
            <div>
              <span className="text-sm text-gray-300">逐字高亮字幕</span>
              <p className="text-xs text-gray-500 mt-1">
                自动生成卡拉OK效果字幕
              </p>
            </div>
            <label className="relative inline-flex items-center cursor-pointer">
              <input
                type="checkbox"
                checked={enableSubtitles}
                onChange={(e) => setEnableSubtitles(e.target.checked)}
                className="sr-only peer"
              />
              <div className="w-11 h-6 bg-gray-600 peer-focus:outline-none rounded-full peer peer-checked:after:translate-x-full peer-checked:after:border-white after:content-[''] after:absolute after:top-[2px] after:left-[2px] after:bg-white after:border-gray-300 after:border after:rounded-full after:h-5 after:w-5 after:transition-all peer-checked:bg-purple-600"></div>
            </label>
          </div>
        </div>

        {/* TTS mode selection */}
        <div className="bg-white/5 rounded-2xl p-6 border border-white/10 backdrop-blur-sm">
          <h2 className="text-lg font-semibold text-white mb-4 flex items-center gap-2">
@@ -833,7 +879,7 @@ export default function Home() {
                style={{ width: `${currentTask.progress}%` }}
              />
            </div>
            <p className="text-gray-300">{currentTask.message}</p>
            <p className="text-gray-300">正在用AI生成中...</p>
          </div>
        </div>
      )}

2907
remotion/package-lock.json
generated
Normal file
File diff suppressed because it is too large
24
remotion/package.json
Normal file
@@ -0,0 +1,24 @@
{
  "name": "vigent-remotion",
  "version": "1.0.0",
  "description": "Remotion video composition for ViGent2 subtitles and titles",
  "scripts": {
    "start": "remotion studio",
    "build": "remotion bundle",
    "render": "npx ts-node render.ts"
  },
  "dependencies": {
    "remotion": "^4.0.0",
    "@remotion/renderer": "^4.0.0",
    "@remotion/cli": "^4.0.0",
    "@remotion/media-utils": "^4.0.0",
    "react": "^18.2.0",
    "react-dom": "^18.2.0"
  },
  "devDependencies": {
    "@types/node": "^20.0.0",
    "@types/react": "^18.2.0",
    "typescript": "^5.0.0",
    "ts-node": "^10.9.0"
  }
}
153
remotion/render.ts
Normal file
@@ -0,0 +1,153 @@
/**
 * Remotion server-side rendering script.
 * Renders a video from the command line.
 *
 * Usage:
 *   npx ts-node render.ts --video /path/to/video.mp4 --captions /path/to/captions.json --title "Video title" --output /path/to/output.mp4
 */

import { bundle } from '@remotion/bundler';
import { renderMedia, selectComposition } from '@remotion/renderer';
import { execFileSync } from 'child_process';
import path from 'path';
import fs from 'fs';

interface RenderOptions {
  videoPath: string;
  captionsPath?: string;
  title?: string;
  titleDuration?: number;
  outputPath: string;
  fps?: number;
  enableSubtitles?: boolean;
}

async function parseArgs(): Promise<RenderOptions> {
  const args = process.argv.slice(2);
  const options: Partial<RenderOptions> = {};

  for (let i = 0; i < args.length; i += 2) {
    const key = args[i].replace('--', '');
    const value = args[i + 1];

    switch (key) {
      case 'video':
        options.videoPath = value;
        break;
      case 'captions':
        options.captionsPath = value;
        break;
      case 'title':
        options.title = value;
        break;
      case 'titleDuration':
        options.titleDuration = parseFloat(value);
        break;
      case 'output':
        options.outputPath = value;
        break;
      case 'fps':
        options.fps = parseInt(value, 10);
        break;
      case 'enableSubtitles':
        options.enableSubtitles = value === 'true';
        break;
    }
  }

  if (!options.videoPath || !options.outputPath) {
    console.error('Usage: npx ts-node render.ts --video <path> --output <path> [--captions <path>] [--title <text>] [--titleDuration <seconds>] [--fps <number>] [--enableSubtitles <true|false>]');
    process.exit(1);
  }

  return options as RenderOptions;
}

async function main() {
  const options = await parseArgs();
  const fps = options.fps || 25;

  console.log('Starting Remotion render...');
  console.log('Options:', JSON.stringify(options, null, 2));

  // Load the caption data
  let captions = undefined;
  if (options.captionsPath && fs.existsSync(options.captionsPath)) {
    const captionsContent = fs.readFileSync(options.captionsPath, 'utf-8');
    captions = JSON.parse(captionsContent);
    console.log(`Loaded captions with ${captions.segments?.length || 0} segments`);
  }

  let durationInFrames = 300; // fallback if ffprobe fails: 12 seconds at 25 fps
  try {
    // Read the duration with ffprobe; execFileSync passes the path as one argument, so quotes and spaces are safe
    const ffprobeOutput = execFileSync(
      'ffprobe',
      ['-v', 'error', '-show_entries', 'format=duration', '-of', 'default=noprint_wrappers=1:nokey=1', options.videoPath],
      { encoding: 'utf-8' }
    );
    const durationInSeconds = parseFloat(ffprobeOutput.trim());
    if (!Number.isFinite(durationInSeconds)) throw new Error(`Unexpected ffprobe output: ${ffprobeOutput}`);
    durationInFrames = Math.ceil(durationInSeconds * fps);
    console.log(`Video duration: ${durationInSeconds}s (${durationInFrames} frames at ${fps}fps)`);
  } catch (e) {
    console.warn('Could not get video duration, using default:', e);
  }

  // Use the video's directory as publicDir and pass only the file name as videoSrc
  const publicDir = path.dirname(path.resolve(options.videoPath));
  const videoFileName = path.basename(options.videoPath);
  console.log(`Public dir: ${publicDir}, Video file: ${videoFileName}`);

  // Bundle the Remotion project
  console.log('Bundling Remotion project...');
  const bundleLocation = await bundle({
    entryPoint: path.resolve(__dirname, './src/index.ts'),
    publicDir,
  });

  // Select the composition
  const composition = await selectComposition({
    serveUrl: bundleLocation,
    id: 'ViGentVideo',
    inputProps: {
      videoSrc: videoFileName,
      captions,
      title: options.title,
      titleDuration: options.titleDuration || 3,
      enableSubtitles: options.enableSubtitles !== false,
    },
  });

  // Override the duration and frame rate registered in Root.tsx
  composition.durationInFrames = durationInFrames;
  composition.fps = fps;

  // Render the video
  console.log('Rendering video...');
  await renderMedia({
    composition,
    serveUrl: bundleLocation,
    codec: 'h264',
    outputLocation: options.outputPath,
    inputProps: {
      videoSrc: videoFileName,
      captions,
      title: options.title,
      titleDuration: options.titleDuration || 3,
      enableSubtitles: options.enableSubtitles !== false,
    },
    onProgress: ({ progress }) => {
      const percent = Math.round(progress * 100);
      process.stdout.write(`\rRendering: ${percent}%`);
    },
  });

  console.log('\nRender complete!');
  console.log(`Output: ${options.outputPath}`);
}

main().catch((err) => {
  console.error('Render failed:', err);
  process.exit(1);
});
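The duration handling in `main()` comes down to turning ffprobe's text output into a frame count. The sketch below isolates that step as a pure function so the rounding and fallback behavior are easy to check. The helper name `framesFromDuration` and its `fallbackFrames` parameter are hypothetical and not part of `render.ts`; the 300-frame default mirrors the script's fallback.

```typescript
// Hypothetical helper (not in render.ts): converts ffprobe's duration output
// into a frame count, falling back to a default when the output is unusable.
function framesFromDuration(
  ffprobeOutput: string,
  fps: number,
  fallbackFrames = 300
): number {
  const seconds = parseFloat(ffprobeOutput.trim());
  // ffprobe can print "N/A" or nothing at all; parseFloat then yields NaN.
  if (!Number.isFinite(seconds) || seconds <= 0) {
    return fallbackFrames;
  }
  // Round up so the last partial frame of the source video is not cut off.
  return Math.ceil(seconds * fps);
}

console.log(framesFromDuration("12.5\n", 25)); // 313
console.log(framesFromDuration("N/A", 25));    // 300
```

Rounding up rather than down means the composition is never shorter than the source clip, at the cost of at most one extra frame.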
30
remotion/src/Root.tsx
Normal file
@@ -0,0 +1,30 @@
import React from 'react';
import { Composition } from 'remotion';
import { Video } from './Video';

/**
 * Remotion root component.
 * Registers the video composition and its default configuration.
 */
export const RemotionRoot: React.FC = () => {
  return (
    <>
      <Composition
        id="ViGentVideo"
        component={Video}
        durationInFrames={300} // default; overridden by render.ts
        fps={25}
        width={1280}
        height={720}
        defaultProps={{
          videoSrc: '',
          audioSrc: undefined,
          captions: undefined,
          title: undefined,
          titleDuration: 3,
          enableSubtitles: true,
        }}
      />
    </>
  );
};
45
remotion/src/Video.tsx
Normal file
@@ -0,0 +1,45 @@
import React from 'react';
import { AbsoluteFill } from 'remotion';
import { VideoLayer } from './components/VideoLayer';
import { Title } from './components/Title';
import { Subtitles } from './components/Subtitles';
import { CaptionsData } from './utils/captions';

export interface VideoProps {
  videoSrc: string;
  audioSrc?: string;
  captions?: CaptionsData;
  title?: string;
  titleDuration?: number;
  enableSubtitles?: boolean;
}

/**
 * Main video component.
 * Stacks the video layer, the subtitle layer, and the title layer.
 */
export const Video: React.FC<VideoProps> = ({
  videoSrc,
  audioSrc,
  captions,
  title,
  titleDuration = 3,
  enableSubtitles = true,
}) => {
  return (
    <AbsoluteFill style={{ backgroundColor: 'black' }}>
      {/* Bottom layer: video */}
      <VideoLayer videoSrc={videoSrc} audioSrc={audioSrc} />

      {/* Middle layer: subtitles */}
      {enableSubtitles && captions && (
        <Subtitles captions={captions} />
      )}

      {/* Top layer: title */}
      {title && (
        <Title title={title} duration={titleDuration} />
      )}
    </AbsoluteFill>
  );
};
85
remotion/src/components/Subtitles.tsx
Normal file
@@ -0,0 +1,85 @@
import React from 'react';
import { AbsoluteFill, useCurrentFrame, useVideoConfig } from 'remotion';
import {
  CaptionsData,
  getCurrentSegment,
  getCurrentWordIndex,
} from '../utils/captions';

interface SubtitlesProps {
  captions: CaptionsData;
  highlightColor?: string;
  normalColor?: string;
  fontSize?: number;
}

/**
 * Word-by-word highlighted subtitle component.
 * Highlights each character as its timestamp is reached.
 */
export const Subtitles: React.FC<SubtitlesProps> = ({
  captions,
  highlightColor = '#FFFFFF',
  normalColor = 'rgba(255, 255, 255, 0.5)',
  fontSize = 36,
}) => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  const currentTimeInSeconds = frame / fps;

  // Find the segment to display at the current time
  const currentSegment = getCurrentSegment(captions, currentTimeInSeconds);

  if (!currentSegment || currentSegment.words.length === 0) {
    return null;
  }

  // Index of the character currently being highlighted
  const currentWordIndex = getCurrentWordIndex(currentSegment, currentTimeInSeconds);

  return (
    <AbsoluteFill
      style={{
        justifyContent: 'flex-end',
        alignItems: 'center',
        paddingBottom: '60px',
      }}
    >
      <div
        style={{
          background: 'rgba(0, 0, 0, 0.6)',
          padding: '12px 24px',
          borderRadius: '12px',
          maxWidth: '80%',
          textAlign: 'center',
        }}
      >
        <p
          style={{
            margin: 0,
            fontSize: `${fontSize}px`,
            fontFamily: '"Noto Sans SC", "Microsoft YaHei", sans-serif',
            fontWeight: 500,
            lineHeight: 1.5,
          }}
        >
          {currentSegment.words.map((word, index) => (
            <span
              key={`${word.word}-${index}`}
              style={{
                color: index <= currentWordIndex ? highlightColor : normalColor,
                // No CSS transition: Remotion renders each frame independently, so styles must depend only on the frame
                textShadow: index <= currentWordIndex
                  ? '0 2px 10px rgba(255,255,255,0.3)'
                  : 'none',
              }}
            >
              {word.word}
            </span>
          ))}
        </p>
      </div>
    </AbsoluteFill>
  );
};
94
remotion/src/components/Title.tsx
Normal file
@@ -0,0 +1,94 @@
import React from 'react';
import {
  AbsoluteFill,
  interpolate,
  useCurrentFrame,
  useVideoConfig,
} from 'remotion';

interface TitleProps {
  title: string;
  duration?: number; // how long the title is shown (seconds)
  fadeOutStart?: number; // when the fade-out begins (seconds)
}

/**
 * Opening title component.
 * Shows the title at the start of the video with a fade-in and fade-out.
 */
export const Title: React.FC<TitleProps> = ({
  title,
  duration = 3,
  fadeOutStart = 2,
}) => {
  const frame = useCurrentFrame();
  const { fps } = useVideoConfig();

  const currentTimeInSeconds = frame / fps;

  // Render nothing once the display time has passed (or if the duration is not positive)
  if (duration <= 0 || currentTimeInSeconds > duration) {
    return null;
  }

  // Fade-in (0 to 0.5 s)
  const fadeInOpacity = interpolate(
    currentTimeInSeconds,
    [0, 0.5],
    [0, 1],
    { extrapolateRight: 'clamp' }
  );

  // Fade-out; interpolate() needs an increasing input range, so fall back to the midpoint if fadeOutStart >= duration
  const fadeOutOpacity = interpolate(
    currentTimeInSeconds,
    [fadeOutStart < duration ? fadeOutStart : duration / 2, duration],
    [1, 0],
    { extrapolateLeft: 'clamp', extrapolateRight: 'clamp' }
  );

  const opacity = Math.min(fadeInOpacity, fadeOutOpacity);

  // Subtle scale-up animation
  const scale = interpolate(
    currentTimeInSeconds,
    [0, 0.5],
    [0.95, 1],
    { extrapolateRight: 'clamp' }
  );

  return (
    <AbsoluteFill
      style={{
        justifyContent: 'center',
        alignItems: 'center',
        opacity,
      }}
    >
      <div
        style={{
          transform: `scale(${scale})`,
          textAlign: 'center',
          padding: '40px 60px',
          background: 'linear-gradient(135deg, rgba(0,0,0,0.7) 0%, rgba(0,0,0,0.5) 100%)',
          borderRadius: '20px',
          backdropFilter: 'blur(10px)',
        }}
      >
        <h1
          style={{
            color: 'white',
            fontSize: '48px',
            fontWeight: 'bold',
            fontFamily: '"Noto Sans SC", "Microsoft YaHei", sans-serif',
            textShadow: '0 4px 20px rgba(0,0,0,0.5)',
            margin: 0,
            lineHeight: 1.4,
          }}
        >
          {title}
        </h1>
      </div>
    </AbsoluteFill>
  );
};
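The title's opacity is the minimum of a fade-in ramp and a fade-out ramp. To make that envelope concrete, the sketch below evaluates it at a few timestamps using the component's default timings (0.5 s fade-in, fade-out from 2 s to 3 s). `clampedLerp` and `titleOpacity` are hypothetical names; `clampedLerp` is a plain stand-in for Remotion's `interpolate()` with clamping on both sides, written so the example runs without the `remotion` package.

```typescript
// Stand-in for Remotion's interpolate() with both sides clamped;
// written here so the sketch runs without the remotion package.
function clampedLerp(
  t: number,
  [t0, t1]: [number, number],
  [v0, v1]: [number, number]
): number {
  const ratio = Math.min(1, Math.max(0, (t - t0) / (t1 - t0)));
  return v0 + (v1 - v0) * ratio;
}

// Opacity of the title at time t (seconds): fade in over the first 0.5 s,
// hold at full opacity, then fade out between fadeOutStart and duration.
function titleOpacity(t: number, duration = 3, fadeOutStart = 2): number {
  if (t > duration) return 0; // the component renders nothing past this point
  const fadeIn = clampedLerp(t, [0, 0.5], [0, 1]);
  const fadeOut = clampedLerp(t, [fadeOutStart, duration], [1, 0]);
  return Math.min(fadeIn, fadeOut);
}

for (const t of [0, 0.25, 1, 2.5, 3]) {
  console.log(`t=${t}s opacity=${titleOpacity(t)}`);
}
```

Taking the minimum of the two ramps avoids special-casing the hold phase: whichever ramp is currently below 1 determines the result.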
33
remotion/src/components/VideoLayer.tsx
Normal file
@@ -0,0 +1,33 @@
import React from 'react';
import { AbsoluteFill, OffthreadVideo, Audio, staticFile } from 'remotion';

interface VideoLayerProps {
  videoSrc: string;
  audioSrc?: string;
}

/**
 * Video layer component.
 * Renders the underlying video and, optionally, a separate audio track.
 */
export const VideoLayer: React.FC<VideoLayerProps> = ({
  videoSrc,
  audioSrc,
}) => {
  // Resolve the video through staticFile so it is loaded from publicDir
  const videoUrl = staticFile(videoSrc);

  return (
    <AbsoluteFill>
      <OffthreadVideo
        src={videoUrl}
        style={{
          width: '100%',
          height: '100%',
          objectFit: 'contain',
        }}
      />
      {audioSrc && <Audio src={staticFile(audioSrc)} />}
    </AbsoluteFill>
  );
};
4
remotion/src/index.ts
Normal file
@@ -0,0 +1,4 @@
import { registerRoot } from 'remotion';
import { RemotionRoot } from './Root';

registerRoot(RemotionRoot);
66
remotion/src/utils/captions.ts
Normal file
@@ -0,0 +1,66 @@
/**
 * Caption data types and lookup helpers.
 */

export interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

export interface Segment {
  text: string;
  start: number;
  end: number;
  words: WordTimestamp[];
}

export interface CaptionsData {
  segments: Segment[];
}

/**
 * Returns the caption segment that should be displayed at the given time, or null if none applies.
 */
export function getCurrentSegment(
  captions: CaptionsData,
  currentTimeInSeconds: number
): Segment | null {
  for (const segment of captions.segments) {
    if (currentTimeInSeconds >= segment.start && currentTimeInSeconds <= segment.end) {
      return segment;
    }
  }
  return null;
}

/**
 * Returns the index of the character highlighted at the given time, or -1 before the first one starts.
 */
export function getCurrentWordIndex(
  segment: Segment,
  currentTimeInSeconds: number
): number {
  for (let i = 0; i < segment.words.length; i++) {
    const word = segment.words[i];
    if (currentTimeInSeconds >= word.start && currentTimeInSeconds <= word.end) {
      return i;
    }
    // If the current time falls in the gap between two characters, return the earlier one
    if (i < segment.words.length - 1) {
      const nextWord = segment.words[i + 1];
      if (currentTimeInSeconds > word.end && currentTimeInSeconds < nextWord.start) {
        return i;
      }
    }
  }
  // Past the end of the last character: return the last index
  if (segment.words.length > 0) {
    const lastWord = segment.words[segment.words.length - 1];
    if (currentTimeInSeconds >= lastWord.end) {
      return segment.words.length - 1;
    }
  }
  return -1;
}

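To see how the timestamp lookup drives the highlight, the sketch below walks three characters through a few frames at 25 fps and prints which ones would be highlighted. The sample timings are illustrative, and `highlightedUpTo` and `renderAt` are hypothetical names: `highlightedUpTo` is a simplified variant of `getCurrentWordIndex`, not a copy of it (at an exact boundary time it selects the later character, whereas the original selects the earlier one).

```typescript
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

// Illustrative sample data in the shape used by captions.ts.
const words: WordTimestamp[] = [
  { word: "大", start: 0.0, end: 0.3 },
  { word: "家", start: 0.3, end: 0.6 },
  { word: "好", start: 0.6, end: 0.9 },
];

// Simplified variant of getCurrentWordIndex: index of the last character
// whose start time has been reached, or -1 before the first one starts.
function highlightedUpTo(ws: WordTimestamp[], t: number): number {
  let index = -1;
  ws.forEach((w, i) => {
    if (t >= w.start) index = i;
  });
  return index;
}

// Text rendering of the karaoke state: highlighted characters in brackets,
// matching the `index <= currentWordIndex` test in the Subtitles component.
function renderAt(frame: number, fps: number): string {
  const idx = highlightedUpTo(words, frame / fps);
  return words.map((w, i) => (i <= idx ? `[${w.word}]` : w.word)).join("");
}

console.log(renderAt(0, 25));  // [大]家好
console.log(renderAt(10, 25)); // [大][家]好
console.log(renderAt(20, 25)); // [大][家][好]
```

Because the state is computed purely from `frame / fps`, any frame can be rendered in isolation, which is what Remotion's frame-by-frame rendering requires.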
19
remotion/tsconfig.json
Normal file
@@ -0,0 +1,19 @@
{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "lib": ["ES2020", "DOM"],
    "jsx": "react-jsx",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "declaration": true,
    "declarationMap": true,
    "outDir": "./dist",
    "rootDir": "."
  },
  "include": ["src/**/*", "render.ts"],
  "exclude": ["node_modules", "dist"]
}