Kevin Wong 1717635bfd Update

2026-02-25 17:51:58 +08:00

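ViGent2 Subtitle and Title Feature Deployment Guide

This document describes how to deploy ViGent2's word-by-word highlighted subtitles and opening title features.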

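Feature Overview

Feature	Description
Word-by-word highlighted subtitles	faster-whisper generates word-level timestamps; Remotion renders a karaoke-style highlight effect
Opening title	The title is shown at the start of the video with fade-in and fade-out animation and disappears after a few seconds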

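Technical Architecture

Original pipeline:
  Text → EdgeTTS → Audio → LatentSync → FFmpeg compositing → Final video

New pipeline (single material):
  Text → EdgeTTS/Qwen3-TTS/pre-generated voiceover → Audio ─┬→ LatentSync → Lip-synced video ──┐
                                                            └→ faster-whisper → Subtitle JSON ─┴→ Remotion compositing → Final video

New pipeline (multiple materials):
  Audio → Concatenate materials per custom_assignments → LatentSync (single inference pass) → Lip-synced video ─┐
  Audio → faster-whisper → Subtitle JSON ───────────────────────────────────────────────────────────────────────┴→ Remotion compositing → Final video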

System Requirements

Component	Requirement
Node.js	18+
Python	3.10+
GPU VRAM	faster-whisper needs about 3-4 GB of VRAM
FFmpeg	Installed

Deployment Steps

Step 1: Install faster-whisper (Python)

cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate

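# Install faster-whisper (quote the requirement so the shell does not treat ">" as a redirect)
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple

Note: On first run, faster-whisper automatically downloads the large-v3 Whisper model (~3 GB).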

Step 2: Install Remotion (Node.js)

cd /home/rongye/ProgramFiles/ViGent2/remotion

# Install dependencies
npm install

# Precompile the render script (required in production)
npm run build:render

Step 3: Restart the backend service

pm2 restart vigent2-backend

Step 4: Verify the installation

# Check that faster-whisper is installed
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
python -c "from faster_whisper import WhisperModel; print('faster-whisper OK')"

# Check that Remotion is installed
cd /home/rongye/ProgramFiles/ViGent2/remotion
npx remotion --version
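
The checks above can also be gathered into one script. The following is a minimal sketch, not part of the ViGent2 codebase: the function name `check_environment` is hypothetical, and it only tests that each tool is present on `PATH` (it does not verify the Node.js 18+ version requirement).

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Return a pass/fail flag for each prerequisite named in this guide."""
    return {
        # Python 3.10+ is listed under System Requirements.
        "python>=3.10": sys.version_info >= (3, 10),
        # Presence on PATH only; versions are not inspected.
        "node": shutil.which("node") is not None,
        "npx": shutil.which("npx") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when the package is not importable.
        "faster_whisper": importlib.util.find_spec("faster_whisper") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Run it inside the backend virtual environment so that the `faster_whisper` lookup reflects the packages the service will actually see.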

File Structure

New backend files

File	Description
`backend/app/services/whisper_service.py`	Subtitle alignment service (based on faster-whisper)
`backend/app/services/remotion_service.py`	Remotion rendering service

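Remotion project structure

remotion/
├── package.json              # Node.js dependency configuration
├── tsconfig.json             # TypeScript configuration
├── render.ts                 # Server-side render script
└── src/
    ├── index.ts              # Remotion entry point
    ├── Root.tsx              # Root component
    ├── Video.tsx             # Main video component
    ├── components/
    │   ├── Title.tsx         # Opening title component
    │   ├── Subtitles.tsx     # Word-by-word highlighted subtitle component
    │   └── VideoLayer.tsx    # Video layer component
    ├── utils/
    │   └── captions.ts       # Caption data processing utilities
    └── fonts/                # Font directory (optional)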

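API Parameters

The video generation API (POST /api/videos/generate) adds the following parameters:

Parameter	Type	Default	Description
`title`	string	null	Video title (shown at the start of the video; optional)
`enable_subtitles`	boolean	true	Whether to enable word-by-word highlighted subtitles

Request example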

{
  "material_path": "https://...",
  "text": "大家好，欢迎来到我的频道",
  "tts_mode": "edgetts",
  "voice": "zh-CN-YunxiNeural",
  "title": "今日分享",
  "enable_subtitles": true
}
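
As an illustration, the same request can be assembled from Python using only the standard library. This is a sketch under stated assumptions: `BASE_URL` and the helper name `build_generate_request` are hypothetical, and the guide does not say whether the endpoint requires authentication headers.

```python
import json
import urllib.request

# Assumption: the backend address is not given in this guide.
BASE_URL = "http://localhost:8000"


def build_generate_request(text, material_path, title=None, enable_subtitles=True,
                           tts_mode="edgetts", voice="zh-CN-YunxiNeural"):
    """Build a POST request for /api/videos/generate with the fields shown above."""
    payload = {
        "material_path": material_path,
        "text": text,
        "tts_mode": tts_mode,
        "voice": voice,
        "title": title,                      # None serializes to JSON null (the default)
        "enable_subtitles": enable_subtitles,
    }
    # ensure_ascii=False keeps Chinese text readable in the request body.
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/videos/generate",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(build_generate_request(...))` against a running backend.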

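Video Generation Pipeline

Progress allocation for the new video generation pipeline:

Stage	Progress	Description
Download material	0% → 5%	Download the input video from Supabase
TTS speech generation	5% → 25%	EdgeTTS / Qwen3-TTS / download of pre-generated voiceover
Lip sync	25% → 80%	LatentSync inference
Subtitle alignment	80% → 85%	faster-whisper generates word-level timestamps
Remotion render	85% → 95%	Composite subtitles and title
Upload result	95% → 100%	Upload to Supabase Storage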
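
To make the allocation concrete, here is a small sketch that maps a stage's local completion fraction onto the overall percentage ranges in the table. The stage keys and the function name are hypothetical; only the percentage ranges come from this guide.

```python
# Overall progress range (start %, end %) for each stage, taken from the table above.
STAGES = {
    "download": (0, 5),
    "tts": (5, 25),
    "lipsync": (25, 80),
    "subtitles": (80, 85),
    "render": (85, 95),
    "upload": (95, 100),
}


def overall_progress(stage: str, fraction: float) -> float:
    """Convert a stage-local fraction (0.0 to 1.0) into overall progress in percent."""
    start, end = STAGES[stage]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range inputs
    return start + (end - start) * fraction
```

For example, LatentSync inference that is half done corresponds to `overall_progress("lipsync", 0.5)`, which lands midway between 25% and 80%.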

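Fallback Handling

The system includes automatic fallback mechanisms so that core functionality is not affected:

Scenario	Handling
Subtitle alignment fails	Skip subtitles and continue generating the video
Remotion is not installed	Composite directly with FFmpeg
Remotion render fails	Fall back to FFmpeg compositing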
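
The fallback rules in the table can be sketched as follows. This is not the actual backend code: the three callables (`align`, `render_remotion`, `compose_ffmpeg`) are hypothetical stand-ins, and the sketch assumes that Remotion is still attempted when alignment fails (for example to render the title), which the guide does not state explicitly.

```python
import logging

log = logging.getLogger("vigent2.sketch")


def compose_with_fallback(align, render_remotion, compose_ffmpeg, remotion_available):
    """Return the output path, degrading step by step as described in the table."""
    captions = None
    try:
        captions = align()
    except Exception as exc:
        # Subtitle alignment failed: skip subtitles, keep generating the video.
        log.warning("subtitle alignment failed, continuing without subtitles: %s", exc)

    if remotion_available:
        try:
            return render_remotion(captions)
        except Exception as exc:
            # Remotion render failed: fall back to FFmpeg compositing.
            log.warning("Remotion render failed, falling back to FFmpeg: %s", exc)

    # Remotion not installed, or its render failed.
    return compose_ffmpeg()
```

Catching broad exceptions is deliberate here, since any failure in an optional stage should degrade the output rather than abort the job.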

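Configuration

Subtitle service configuration

The subtitle service lives in backend/app/services/whisper_service.py. Default configuration:

Parameter	Default	Description
`model_size`	large-v3	Whisper model size
`device`	cuda	Device to run on
`compute_type`	float16	Compute precision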

To change these, edit the WhisperService initialization parameters in whisper_service.py.
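
For orientation, the sketch below shows how these three parameters are typically passed to faster-whisper and how word-level timestamps can be flattened into a list. It is an illustration, not the contents of whisper_service.py: the function names and the output shape (`text`, `start`, `end`) are assumptions, and the actual subtitle JSON schema used by ViGent2 is not documented here.

```python
from types import SimpleNamespace  # used only for the stand-in demo below


def words_to_captions(segments):
    """Flatten faster-whisper segments into a list of per-word timing dicts."""
    captions = []
    for segment in segments:
        # segment.words is None unless word_timestamps=True was requested.
        for word in segment.words or []:
            captions.append({
                "text": word.word.strip(),
                "start": round(word.start, 3),
                "end": round(word.end, 3),
            })
    return captions


def transcribe_words(audio_path, model_size="large-v3", device="cuda",
                     compute_type="float16"):
    """Transcribe an audio file and return word-level captions."""
    # Imported lazily so the helper above works without the package or a GPU.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    return words_to_captions(segments)


# Demo with stand-in objects shaped like faster-whisper segments (no GPU needed).
demo = [SimpleNamespace(words=[SimpleNamespace(word=" 大家", start=0.0, end=0.42)])]
print(words_to_captions(demo))
```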

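Remotion configuration

Remotion render parameters are configured in backend/app/services/remotion_service.py:

Parameter	Default	Description
`fps`	25	Output frame rate
`title_display_mode`	`short`	Title display mode (`short` = shown briefly; `persistent` = always shown)
`title_duration`	4.0	Title display duration (seconds; only applies in `short` mode)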
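
To show how `fps`, `title_display_mode`, and `title_duration` interact, here is a sketch of a per-frame opacity calculation for the title. The real animation lives in Title.tsx and may differ; the `fade_seconds` parameter, the linear fade, and the choice to keep `persistent` titles fully opaque are all assumptions made for illustration.

```python
def title_opacity(frame, fps=25, title_duration=4.0, fade_seconds=0.4, mode="short"):
    """Return the title opacity (0.0 to 1.0) at a given frame index."""
    if mode == "persistent":
        # Assumption: a persistent title stays fully visible for the whole video.
        return 1.0

    total = int(round(title_duration * fps))      # 4.0 s at 25 fps -> 100 frames
    fade = max(1, int(round(fade_seconds * fps)))  # length of each fade in frames
    if frame < 0 or frame >= total:
        return 0.0                                 # title is gone after its duration

    fade_in = min(1.0, frame / fade)
    fade_out = min(1.0, (total - 1 - frame) / fade)
    return max(0.0, min(fade_in, fade_out))
```

With the defaults, the title occupies the first 100 frames, ramps up over the first 10, and ramps down over the last 10.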

Troubleshooting

faster-whisper issues

Problem: ModuleNotFoundError: No module named 'faster_whisper'

cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple

Problem: Insufficient GPU memory

Edit whisper_service.py to use a smaller model:

WhisperService(model_size="medium", compute_type="int8")

Remotion issues

Problem: node_modules not found

cd /home/rongye/ProgramFiles/ViGent2/remotion
npm install

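Problem: Remotion render fails with an fs module error

Make sure remotion/src/utils/captions.ts does not use the Node.js fs module. Remotion bundles this code for a browser environment, where fs is not available.

Problem: Remotion render fails with a video file read error (file:// protocol)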

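Make sure render.ts passes the publicDir option pointing to the directory that contains the video, and that VideoLayer.tsx loads the video with staticFile():

// render.ts
import path from 'path';
import { bundle } from '@remotion/bundler';

const publicDir = path.dirname(path.resolve(options.videoPath));
const bundleLocation = await bundle({
  entryPoint: path.resolve(__dirname, './src/index.ts'),
  publicDir,  // key setting: serve the video's directory as the public directory
});

// VideoLayer.tsx
import { staticFile } from 'remotion';

const videoUrl = staticFile(videoSrc);  // videoSrc is a file name relative to publicDir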

Problem: Remotion render fails

Check the backend logs:

pm2 logs vigent2-backend

Check service health

# Subtitle service health check
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
python -c "from app.services.whisper_service import whisper_service; import asyncio; print(asyncio.run(whisper_service.check_health()))"

# Remotion health check
python -c "from app.services.remotion_service import remotion_service; import asyncio; print(asyncio.run(remotion_service.check_health()))"

Optional Optimizations

Add a Chinese font

For better subtitle rendering, you can add a Chinese font:

# Download the Noto Sans SC font
cd /home/rongye/ProgramFiles/ViGent2/remotion/src/fonts
wget https://github.com/googlefonts/noto-cjk/raw/main/Sans/OTF/SimplifiedChinese/NotoSansSC-Regular.otf -O NotoSansSC.otf

Using GPU 0

By default faster-whisper runs on GPU 0, separate from LatentSync (GPU 1), to avoid VRAM contention. To select a specific GPU:

# Change this in whisper_service.py
WhisperService(device="cuda:0")  # or "cuda:1"
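
One detail worth knowing when editing this setting: faster-whisper's own `WhisperModel` takes the device name and the GPU index as separate arguments (`device="cuda"`, `device_index=1`). How `WhisperService` translates a string such as `"cuda:1"` is not shown in this guide, so the helper below is only a hypothetical sketch of one way such a string could be split.

```python
def parse_device(spec: str):
    """Split a spec like 'cuda:1' into ('cuda', 1); a bare 'cuda' or 'cpu' gets index 0."""
    name, _, index = spec.partition(":")
    return name, (int(index) if index else 0)


# Hypothetical usage (requires faster-whisper and a GPU, so it is left commented out):
# from faster_whisper import WhisperModel
# name, idx = parse_device("cuda:1")
# model = WhisperModel("large-v3", device=name, device_index=idx, compute_type="float16")
```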

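Changelog

Date	Version	Notes
2026-01-29	1.0.0	Initial release: word-by-word highlighted subtitles and opening title built on faster-whisper + Remotion
2026-01-30	1.0.1	Refined the subtitle highlight style and title animation for clearer visuals
2026-02-10	1.1.0	Updated the architecture diagram: multi-material concat-then-infer, pre-generated voiceover option
2026-02-25	1.2.0	Subtitle timestamps switched from linear interpolation to Whisper rhythm mapping, fixing subtitle drift in long videos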
