Compare commits


2 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Kevin Wong | 9de2cb40b4 | Update | 2026-02-28 14:44:51 +08:00 |
| Kevin Wong | 29c67f629d | Update | 2026-02-28 09:16:41 +08:00 |
20 changed files with 558 additions and 225 deletions


@@ -65,6 +65,7 @@ backend/
* `POST /api/materials`: upload a material
* `GET /api/materials`: list materials
* `PUT /api/materials/{material_id}`: rename a material
* `GET /api/materials/stream/{material_id}`: stream a material file same-origin (for front-end canvas frame capture; avoids cross-origin CORS taint)
4. **Social publishing (Publish)**
* `POST /api/publish`: publish a video to Douyin / WeChat Channels / Bilibili / Xiaohongshu
@@ -160,6 +161,18 @@ backend/
- Multi-material clips are uniformly re-encoded before concatenation, forcing `25fps + CFR` to reduce stutter caused by mismatched time bases at segment boundaries.
- The concat flow enables `+genpts` to rebuild timestamps, improving timeline continuity after splicing.
- MOV materials carrying rotation metadata are orientation-normalized before resolution checks and subsequent steps.
- The compose stage (muxing the video and audio tracks) uses `-c:v copy` stream copy instead of re-encoding, finishing almost instantly.
- FFmpeg subprocesses have timeout guards: 600s for `_run_ffmpeg()`, 30s for `_get_duration()`, so malformed files cannot hang forever.
### Global concurrency control
- The video-generation entry point uses `asyncio.Semaphore(2)` to cap concurrent tasks at 2; queued tasks show a "排队中..." (queued) status.
- Redis task keys carry TTLs: 24 hours on creation, 2 hours once completed/failed; `list()` prunes expired index entries automatically.
### Subtitle timestamp optimization
- Whisper output passes through `smooth_word_timestamps()` three-step smoothing: monotonicity guarantee, overlap elimination (midpoint split), and tiny-gap filling (<50ms).
- Supports `original_text` rhythm mapping: original characters are proportionally mapped onto Whisper timestamps, fixing mismatches between AI-rewritten/multilingual copy and the transcription.
## 📦 Asset library & static assets


@@ -97,7 +97,7 @@ python -m scripts.server # check it starts (Ctrl+C to exit)
### 3b. MuseTalk 1.5 (long-video lip sync, GPU0)
> MuseTalk is a single-step latent-space inpainting model (not a diffusion model); inference is near real-time, suited to videos >=120s. It shares GPU0 with CosyVoice; fp16 inference needs roughly 4-8GB VRAM.
> MuseTalk is a single-step latent-space inpainting model (not a diffusion model); inference is near real-time, suited to videos >=120s. It shares GPU0 with CosyVoice; fp16 inference needs roughly 4-8GB VRAM. The compose stage uses NVENC GPU hardware encoding (h264_nvenc) plus pure-numpy blending, avoiding double encoding and PIL conversion overhead.
See the dedicated standalone deployment guide:
**[MuseTalk Deployment Guide](MUSETALK_DEPLOY.md)**
@@ -211,8 +211,10 @@ cp .env.example .env
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Supabase API public URL (front-end access) |
| `LATENTSYNC_GPU_ID` | 1 | GPU selection (0 or 1) |
| `LATENTSYNC_USE_SERVER` | false | Set to true to enable resident-server acceleration |
| `LATENTSYNC_INFERENCE_STEPS` | 16 | Inference steps (16-50) |
| `LATENTSYNC_GUIDANCE_SCALE` | 1.5 | Guidance scale (1.0-3.0) |
| `LATENTSYNC_INFERENCE_STEPS` | 20 | Inference steps (16-50) |
| `LATENTSYNC_GUIDANCE_SCALE` | 2.0 | Guidance scale (1.0-3.0) |
| `LATENTSYNC_ENABLE_DEEPCACHE` | true | DeepCache inference acceleration |
| `LATENTSYNC_SEED` | 1247 | Fixed random seed (reproducible) |
| `DEBUG` | true | Set to false in production |
| `REDIS_URL` | `redis://localhost:6379/0` | Task state store (falls back to memory if unavailable) |
| `WEIXIN_HEADLESS_MODE` | headless-new | Channels Playwright mode (headful/headless-new) |


@@ -201,3 +201,63 @@ const materialPosterUrl = useVideoFrameCapture(
| TensorRT (DiT module) | +20-30% | Requires compiling a .plan engine |
| torch.compile() | +10-20% | One line of code, but first compile is slow |
| vLLM (LLM module) | +10-15% | Extra dependency |
---
## MuseTalk compose-stage performance optimization
### Overview
After the MuseTalk v2 optimizations, total time dropped from 1799s to 819s (2.2x), but the compose stage (Phase 6) still takes 462.2s (56.4%), the single largest bottleneck. This round optimizes in two directions: pure-numpy blending instead of PIL conversion, and FFmpeg pipe + NVENC GPU hardware encoding instead of double encoding.
### 1. Pure-numpy blending instead of PIL (blending.py)
- **Problem**: `get_image_blending` does 3 numpy↔PIL conversions plus BGR↔RGB channel flips per frame — pure waste
- **Approach**: add a `get_image_blending_fast()` function
    - Stays in BGR numpy arrays throughout; no PIL conversions or channel flips
    - Mask blending uses vectorized numpy broadcasting `mask * (1/255)` instead of `PIL.paste with mask`
    - The original `get_image_blending` is kept as a fallback
- **Fallback chain**: `blending_fast` → `blending` (PIL) → `get_image` (full recompute)
### 2. FFmpeg pipe + NVENC hardware encoding instead of double encoding (server.py)
**Before (double encoding)**:
```
Phase 6: per-frame → cv2.VideoWriter (mp4v CPU software encode) → temp_raw.mp4
Phase 7: FFmpeg reads temp_raw.mp4 → H.264 CPU re-encode + audio mux → output.mp4
```
**After (single GPU encode)**:
```
Phase 6: per-frame → FFmpeg stdin pipe (rawvideo → h264_nvenc GPU encode) → temp_raw.mp4
Phase 7: FFmpeg only muxes audio (-c:v copy -c:a copy) → output.mp4 (seconds)
```
- NVENC parameters: `-c:v h264_nvenc -preset p4 -cq 20 -pix_fmt yuv420p`
- The RTX 3090's dedicated NVENC silicon encodes without occupying CUDA cores, at >500fps
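The pipe described above (and the resource-management hardening in the next section) can be sketched as follows. This is a minimal illustration, not the project's actual code: `build_nvenc_pipe_cmd` and `encode_frames` are hypothetical names, and the NVENC flags simply mirror the parameters listed above.

```python
import subprocess

def build_nvenc_pipe_cmd(width, height, fps, out_path):
    # Raw BGR frames arrive on stdin; a single NVENC encode, no intermediate file
    return [
        "ffmpeg", "-y", "-v", "warning",
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                                  # read frames from stdin
        "-c:v", "h264_nvenc", "-preset", "p4", "-cq", "20",
        "-pix_fmt", "yuv420p",
        out_path,
    ]

def encode_frames(frames, width, height, fps, out_path):
    """Stream HxWx3 uint8 BGR frames straight into the encoder."""
    proc = subprocess.Popen(
        build_nvenc_pipe_cmd(width, height, fps, out_path),
        stdin=subprocess.PIPE, stderr=subprocess.PIPE,
    )
    try:
        for frame in frames:
            proc.stdin.write(frame.tobytes())
    finally:
        proc.stdin.close()  # always close stdin, even if a frame write fails
    proc.wait()
    # read stderr only after wait, then close it; ignore non-UTF-8 bytes
    err = proc.stderr.read().decode(errors="ignore")
    proc.stderr.close()
    return proc.returncode == 0, err
```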
### 3. Hardened FFmpeg process resource management
- The frame-writing loop is wrapped in `try/finally` so `proc.stdin.close()` runs even on exceptions
- stderr is read after `proc.wait()` and then closed, avoiding buffer deadlock
- stderr decoding uses `errors="ignore"` so non-UTF-8 output cannot crash the worker
### 4. `run_ffmpeg` security improvements
- Dropped `shell=True` in favor of list arguments, preventing command injection via special characters in paths
- Phase 7's FFmpeg command switched from string concatenation to list arguments
### Tuning history
| Version | Phase 6 | Phase 7 | Total | Conclusion |
|------|---------|---------|------|------|
| Day27 baseline | 462s | 38s | 819s | — |
| v1: libx264 -preset medium | 548s | 0.3s | 854s | CPU-encode backpressure; actually slower |
| v2: h264_nvenc (current) | TBD | TBD | TBD | NVENC has no backpressure; Phase 6 estimated < 200s |
### Changed files
| File | Change |
|------|------|
| `models/MuseTalk/musetalk/utils/blending.py` | Added pure-numpy `get_image_blending_fast()` |
| `models/MuseTalk/scripts/server.py` | Phase 6: FFmpeg pipe + NVENC + blending_fast; Phase 7: -c:v copy; `run_ffmpeg` drops shell=True |

Docs/DevLogs/Day29.md (new file, 206 lines)

@@ -0,0 +1,206 @@
## Subtitle-sync fix + lip-sync parameter tuning + full pipeline optimization + preview background fix (Day 29)
### Overview
This round is a full review-and-optimize pass over the video generation pipeline: fixed subtitle/speech desync (Whisper timestamp smoothing + original-text rhythm mapping), tuned LatentSync lip-sync parameters, switched compose to stream copy to drop a redundant re-encode, added FFmpeg timeout guards, a global concurrency limit, Redis task TTLs, temp-file cleanup, and dead-code removal. Also fixed the style-preview background CORS breakage caused by the front-end domain migration (`vigent.hbyrkj.top` → `ipagent.ai-labz.cn`).
---
## ✅ Changes
### 1. Subtitle-sync fix (Whisper timestamps + original-text rhythm mapping)
- **Problem**: subtitle highlighting drifts out of sync with speech — it runs ahead or behind, and the highlight skips
- **Root cause**: Whisper's per-character timestamps jitter slightly (a character's end > the next character's start), and inter-character gaps make the highlight "flicker"
#### whisper_service.py — timestamp post-processing
Added a `smooth_word_timestamps()` function with three-step smoothing:
1. **Monotonicity**: a character's start is never earlier than the previous character's start
2. **Overlap elimination**: overlapping characters are split at the midpoint
3. **Gap filling**: gaps < 50ms are joined directly, avoiding highlight dropouts
```python
def smooth_word_timestamps(words):
    for i in range(1, len(words)):
        prev, w = words[i - 1], words[i]
        # overlap -> split at the midpoint
        if w["start"] < prev["end"]:
            mid = (prev["end"] + w["start"]) / 2
            prev["end"] = mid; w["start"] = mid
        # tiny gap -> join directly
        gap = w["start"] - prev["end"]
        if 0 < gap < 0.05:
            prev["end"] = w["start"]
```
#### whisper_service.py — original-text rhythm mapping
- **Problem**: AI-rewritten or multilingual copy differs from Whisper's transcription, so using Whisper's text directly garbles the subtitles
- **Approach**: when the `original_text` parameter is non-empty, substitute the original characters for Whisper's text while keeping Whisper's speech-rhythm timestamps
    - Implementation: proportionally map the N original characters onto the M Whisper timestamps (linear interpolation)
    - Character-count ratio sanity check (warn when the ratio is >1.5x or <0.67x)
    - Per-character duration clamped to 40ms-800ms to prevent extreme drift
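The proportional mapping plus duration clamp described above can be sketched as follows. A minimal illustration under stated assumptions: `rhythm_map` is a hypothetical name, and the input shape (`{"start", "end"}` dicts) is assumed, not taken from the project's actual code.

```python
def rhythm_map(original_text, whisper_words, min_dur=0.04, max_dur=0.8):
    """Map N original characters onto M Whisper character timestamps
    by linear interpolation over the index axis."""
    n_o, n_w = len(original_text), len(whisper_words)
    if n_o == 0 or n_w == 0:
        return []
    starts = [w["start"] for w in whisper_words]
    final_end = whisper_words[-1]["end"]

    def interp(t):
        # fractional Whisper index t in [0, n_w] -> timestamp
        if t >= n_w:
            return final_end
        lo, frac = int(t), t - int(t)
        nxt = starts[lo + 1] if lo + 1 < n_w else final_end
        return starts[lo] + frac * (nxt - starts[lo])

    remapped = []
    for i, ch in enumerate(original_text):
        s = interp(i / n_o * n_w)
        e = interp((i + 1) / n_o * n_w)
        # clamp per-character duration to [min_dur, max_dur]
        e = min(max(e, s + min_dur), s + max_dur)
        remapped.append({"word": ch, "start": round(s, 3), "end": round(e, 3)})
    return remapped
```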
#### captions.ts — subtitle lookup on the Remotion side
Added `getCurrentSegment()` and `getCurrentWordIndex()`:
- Precisely look up the subtitle segment and highlighted character index for the current frame time
- Handle inter-character gaps (between two characters, return the previous character's index so the highlight stays continuous)
- Past the last character's end time, return the last character (avoids end-of-clip flicker)
---
### 2. LatentSync lip-sync parameter tuning
| Parameter | Day28 | Day29 | Notes |
|------|----------|----------|------|
| `LATENTSYNC_INFERENCE_STEPS` | 16 | 20 | Slightly more steps for better mouth-shape quality |
| `LATENTSYNC_GUIDANCE_SCALE` | (default) | 2.0 | Balances lip fit against naturalness |
| `LATENTSYNC_ENABLE_DEEPCACHE` | (default) | true | DeepCache inference acceleration |
| `LATENTSYNC_SEED` | (default) | 1247 | Fixed seed for reproducibility |
| Remotion concurrency | 16 | 4 | Lower concurrency to avoid resource contention |
---
### 3. compose() stream copy instead of redundant re-encoding (high priority)
**File**: `video_service.py`
- **Problem**: `compose()` only muxes the video and audio tracks, yet did a full re-encode with `libx264 -preset medium -crf 20` every time, taking minutes. Across the whole pipeline a single video could be x264-encoded up to 5 times
- **Approach**: when no looping is needed (`loop_count == 1`), use `-c:v copy` stream copy (near-instant); keep libx264 only when looping
```python
if loop_count > 1:
cmd.extend(["-c:v", "libx264", "-preset", "fast", "-crf", "23"])
else:
cmd.extend(["-c:v", "copy"])
```
- compose output is an intermediate artifact (Remotion re-encodes it anyway), so stream copy saves one encode with zero quality loss
---
### 4. FFmpeg timeout guards (high priority)
**File**: `video_service.py`
- `_run_ffmpeg()`: added `timeout=600` (10 minutes) and catches `subprocess.TimeoutExpired`
- `_get_duration()`: added `timeout=30`
- Prevents a malformed video from hanging FFmpeg forever and blocking background tasks
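A minimal sketch of such a guard (`run_with_timeout` is an illustrative name; the project's actual helpers differ in detail):

```python
import subprocess

def run_with_timeout(cmd, timeout=600):
    """Run a subprocess with a hard timeout so a malformed input
    file cannot hang the worker forever."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # the child is killed by subprocess.run before this is raised
        return False
```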
---
### 5. Global task concurrency limit (high priority)
**File**: `workflow.py`
- Module-level `asyncio.Semaphore(2)`, acquired at the `process_video_generation()` entry
- Queued tasks show a "排队中..." (queued) status
- Prevents several requests running FFmpeg + Remotion at once and exhausting CPU/memory
```python
_generation_semaphore = asyncio.Semaphore(2)
async def process_video_generation(task_id, req, user_id):
_update_task(task_id, message="排队中...")
async with _generation_semaphore:
await _process_video_generation_inner(task_id, req, user_id)
```
---
### 6. Redis task TTL + index cleanup (medium priority)
**File**: `task_store.py`
- `create()`: sets a 24-hour TTL (`ex=86400`)
- `update()`: completed/failed tasks get a 2-hour TTL (`ex=7200`); everything else 24 hours
- `list()`: prunes expired index entries while iterating (`srem`)
- Fixes the unbounded accumulation of Redis task keys
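The status-dependent TTL choice can be sketched like this (illustrative names; `client` stands for any redis-py-compatible object, and `ex=` is redis-py's expiry-in-seconds argument to `SET`):

```python
import json

DONE_TTL = 7200      # completed/failed tasks expire after 2 hours
DEFAULT_TTL = 86400  # everything else expires after 24 hours

def ttl_for_status(status):
    return DONE_TTL if status in ("completed", "failed") else DEFAULT_TTL

def save_task(client, key, task):
    # persist the task dict with a TTL chosen from its status
    client.set(key, json.dumps(task, ensure_ascii=False),
               ex=ttl_for_status(task.get("status", "pending")))
```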
---
### 7. Temporary font file cleanup (medium priority)
**File**: `workflow.py`
- `prepare_style_for_remotion()` copies fonts into temp_dir but never added them to the cleanup list
- Now iterates three prefixes (subtitle/title/secondary_title) × four extensions (.ttf/.otf/.woff/.woff2) and appends any existing font file to `temp_files`
---
### 8. Whisper+split deduplication (low priority)
**File**: `workflow.py`
- The two branches (custom_assignments mismatch vs. default) had 100% identical Whisper → _split_equal code (36 duplicated lines)
- Extracted into an inner function `_whisper_and_split()` shared by both branches
---
### 9. LipSync dead-code removal (low priority)
**File**: `lipsync_service.py`
- Deleted the `_preprocess_video()` method (92 lines); nothing in the project called it
---
### 10. Title/subtitle preview background CORS fix
- **Problem**: after the front end moved from `vigent.hbyrkj.top` to `ipagent.ai-labz.cn`, the signed material URLs (`api.hbyrkj.top`) no longer share a root domain with the front end, and the Supabase Kong gateway's CORS config does not cover the new domain → `<video crossOrigin="anonymous">` fails to load → canvas frame capture fails → fallback to the gradient background
- **Root cause**: the Day28 implementation relied on Supabase returning an `Access-Control-Allow-Origin` header; the domain change broke that assumption
**Fix — same-origin proxy (sidesteps CORS entirely)**:
| Component | Change |
|------|------|
| `materials/router.py` | New `GET /api/materials/stream/{material_id}` endpoint that reads straight from local disk via `get_local_file_path()` and returns a `FileResponse` |
| `useHomeController.ts` | Frame-capture URL changed to `/api/materials/stream/${mat.id}` (same-origin) instead of the cross-origin signed URL |
| `useVideoFrameCapture.ts` | Removed `crossOrigin = "anonymous"`; same-origin requests don't need it |
Flow: `user clicks preview → /api/materials/stream/xxx → Next.js rewrite → FastAPI FileResponse → same-origin <video> → canvas capture succeeds`
---
### 11. Alipay callback domain update
**File**: `.env`
```
ALIPAY_NOTIFY_URL=https://ipagent.ai-labz.cn/api/payment/notify
ALIPAY_RETURN_URL=https://ipagent.ai-labz.cn/pay
```
---
## 📁 Changed file list
| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | Timestamp smoothing + original-text rhythm mapping + per-character duration clamp |
| `remotion/src/utils/captions.ts` | Added `getCurrentSegment` / `getCurrentWordIndex` |
| `backend/app/services/video_service.py` | compose stream copy + FFmpeg timeout guards |
| `backend/app/modules/videos/workflow.py` | Semaphore(2) concurrency limit + font cleanup + Whisper logic dedup |
| `backend/app/modules/videos/task_store.py` | Redis TTL + expired-index cleanup |
| `backend/app/services/lipsync_service.py` | Deleted `_preprocess_video()` dead code |
| `backend/app/services/remotion_service.py` | concurrency 16 → 4 |
| `remotion/render.ts` | Added concurrency parameter support |
| `backend/app/modules/materials/router.py` | Added `/stream/{material_id}` same-origin proxy endpoint |
| `frontend/.../useVideoFrameCapture.ts` | Removed crossOrigin |
| `frontend/.../useHomeController.ts` | Frame-capture URL now uses the same-origin proxy |
| `backend/.env` | Lip-sync parameters + Alipay domain update |
---
## 🔍 Verification
1. **Subtitle sync**: generate a video and watch the per-character highlighting — no lead, lag, or skipped highlights
2. **compose stream copy**: the FFmpeg log for the compose step should show `-c:v copy`, and the step should drop from minutes to seconds
3. **FFmpeg timeouts**: confirm in code that the timeout arguments are in place
4. **Concurrency limit**: submit 3 tasks back to back; the 3rd should show "queued" and start only after the first 2 finish
5. **Redis TTL**: `redis-cli TTL vigent:tasks:<id>` should show an expiry
6. **Font cleanup**: after generation, no font files should remain in the temp directory
7. **Preview background**: pick a material → click "preview style"; the first video frame should appear (not a gradient)
8. **Alipay**: after initiating payment, the callback and return URLs use the new domain


@@ -187,7 +187,7 @@ Remotion render parameters are configured in `backend/app/services/remotion_service.py`:
| Parameter | Default | Description |
|------|--------|------|
| `fps` | 25 | Output frame rate |
| `concurrency` | 16 | Remotion concurrent render processes (default 16, overridable via the `--concurrency` CLI flag) |
| `concurrency` | 4 | Remotion concurrent render processes (default 4, overridable via the `--concurrency` CLI flag) |
| `title_display_mode` | `short` | Title display mode (`short` = shown briefly; `persistent` = always shown) |
| `title_duration` | 4.0 | Title display duration in seconds (only in `short` mode) |
@@ -294,3 +294,5 @@ WhisperService(device="cuda:0") # or "cuda:1"
| 2026-01-30 | 1.0.1 | Subtitle highlight styling and title animation polish; clearer visuals |
| 2026-02-25 | 1.2.0 | Subtitle timestamps switched from linear interpolation to Whisper rhythm mapping, fixing long-video subtitle drift |
| 2026-02-27 | 1.3.0 | Architecture diagram updated for MuseTalk hybrid routing; Remotion concurrency raised from 8 to 16; GPU allocation notes updated |
| 2026-02-28 | 1.3.1 | MuseTalk compose-stage optimization: pure-numpy blending + FFmpeg pipe NVENC GPU hardware encode instead of double encoding |
| 2026-02-28 | 1.4.0 | compose stream copy instead of re-encode; FFmpeg timeout guards (600s/30s); Remotion concurrency 16→4; Whisper timestamp smoothing + original-text rhythm mapping; global generation Semaphore(2); Redis task TTL |


@@ -37,7 +37,7 @@
- 💳 **Paid membership** - Alipay desktop-web payment auto-activates membership; it deactivates on expiry and prompts renewal; admins can also activate manually.
- 🔐 **Auth & isolation** - Supabase-based user isolation; phone-number signup/login and password management.
- 🛡️ **Service watchdog** - built-in Watchdog that monitors and restarts hung services for 7x24h stability.
- 🚀 **Performance** - video pre-compression, resident model services (near-instant loading), dual-GPU pipeline concurrency, MuseTalk face-detection downsampling + BiSeNet caching, Remotion 16-way concurrent rendering
- 🚀 **Performance** - compose stream copy (no re-encode), FFmpeg timeout guards, global generation concurrency limit (Semaphore(2)), Remotion 4-way concurrent rendering, MuseTalk NVENC GPU hardware encode + pure-numpy blending, resident model services, dual-GPU pipeline concurrency, Redis task TTL auto-cleanup
---


@@ -25,10 +25,10 @@ LATENTSYNC_USE_SERVER=true
# LATENTSYNC_API_URL=http://localhost:8007
# Inference steps (20-50; higher = better quality, slower)
LATENTSYNC_INFERENCE_STEPS=16
LATENTSYNC_INFERENCE_STEPS=20
# Guidance scale (1.0-3.0; higher = tighter lip sync, but may jitter)
LATENTSYNC_GUIDANCE_SCALE=1.5
LATENTSYNC_GUIDANCE_SCALE=2.0
# Enable DeepCache acceleration (recommended)
LATENTSYNC_ENABLE_DEEPCACHE=true
@@ -94,5 +94,5 @@ SUPABASE_STORAGE_LOCAL_PATH=/home/rongye/ProgramFiles/Supabase/volumes/storage/s
ALIPAY_APP_ID=2021006132600283
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
ALIPAY_NOTIFY_URL=https://vigent.hbyrkj.top/api/payment/notify
ALIPAY_RETURN_URL=https://vigent.hbyrkj.top/pay
ALIPAY_NOTIFY_URL=https://ipagent.ai-labz.cn/api/payment/notify
ALIPAY_RETURN_URL=https://ipagent.ai-labz.cn/pay


@@ -1,14 +1,28 @@
from fastapi import APIRouter, HTTPException, Request, Depends
from fastapi.responses import FileResponse
from loguru import logger
from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.materials.schemas import RenameMaterialRequest
from app.modules.materials import service
from app.services.storage import storage_service
router = APIRouter()
@router.get("/stream/{material_id:path}")
async def stream_material(material_id: str, current_user: dict = Depends(get_current_user)):
"""Stream the material file directly (same-origin, avoids CORS canvas taint)."""
user_id = current_user["id"]
if not material_id.startswith(f"{user_id}/"):
raise HTTPException(403, "无权访问此素材")
local_path = storage_service.get_local_file_path("materials", material_id)
if not local_path:
raise HTTPException(404, "素材文件不存在")
return FileResponse(local_path, media_type="video/mp4")
@router.post("")
async def upload_material(
request: Request,


@@ -54,7 +54,7 @@ class RedisTaskStore:
"progress": 0,
"user_id": user_id,
}
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False))
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False), ex=86400)
self._client.sadd(self._index_key, task_id)
return task
@@ -71,12 +71,17 @@ class RedisTaskStore:
keys = [self._key(task_id) for task_id in task_ids]
raw_items = self._client.mget(keys)
tasks = []
for raw in raw_items:
if raw:
try:
tasks.append(json.loads(raw))
except Exception:
continue
expired = []
for task_id, raw in zip(task_ids, raw_items):
if raw is None:
expired.append(task_id)
continue
try:
tasks.append(json.loads(raw))
except Exception:
continue
if expired:
self._client.srem(self._index_key, *expired)
return tasks
def update(self, task_id: str, updates: Dict[str, Any]) -> Dict[str, Any]:
@@ -84,7 +89,8 @@ class RedisTaskStore:
if task.get("status") == "not_found":
task = {"status": "pending", "task_id": task_id}
task.update(updates)
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False))
ttl = 7200 if task.get("status") in ("completed", "failed") else 86400
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False), ex=ttl)
self._client.sadd(self._index_key, task_id)
return task


@@ -24,6 +24,9 @@ from app.services.remotion_service import remotion_service
from .schemas import GenerateRequest
from .task_store import task_store
# Global concurrency limit: at most 2 video generation tasks at once
_generation_semaphore = asyncio.Semaphore(2)
def _locale_to_whisper_lang(locale: str) -> str:
"""'en-US' -> 'en', 'zh-CN' -> 'zh'"""
@@ -169,6 +172,12 @@ def _split_equal(segments: List[dict], material_paths: List[str]) -> List[dict]:
async def process_video_generation(task_id: str, req: GenerateRequest, user_id: str):
_update_task(task_id, message="排队中...")
async with _generation_semaphore:
await _process_video_generation_inner(task_id, req, user_id)
async def _process_video_generation_inner(task_id: str, req: GenerateRequest, user_id: str):
temp_files = []
try:
start_time = time.time()
@@ -283,6 +292,42 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
captions_path = None
async def _whisper_and_split():
"""Whisper alignment -> _split_equal even split of materials (shared logic)"""
_update_task(task_id, message="正在生成字幕 (Whisper)...")
_captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(_captions_path)
captions_data = None
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(_captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
_captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
result = _split_equal(captions_data["segments"], material_paths)
else:
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0
seg_dur = audio_dur / len(material_paths)
result = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
return result, _captions_path
if is_multi:
# ══════════════════════════════════════
# Multi-material pipeline
@@ -327,80 +372,10 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
f" 与素材数量({len(material_paths)})不一致,回退自动分配"
)
# Original logic: Whisper -> _split_equal
_update_task(task_id, message="正在生成字幕 (Whisper)...")
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
captions_data = None
captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
assignments = _split_equal(captions_data["segments"], material_paths)
else:
# Whisper failed -> split evenly by duration (no character alignment)
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0 # safe fallback
seg_dur = audio_dur / len(material_paths)
assignments = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
assignments, captions_path = await _whisper_and_split()
else:
# Original logic: Whisper -> _split_equal
_update_task(task_id, message="正在生成字幕 (Whisper)...")
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
captions_data = None
captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
assignments = _split_equal(captions_data["segments"], material_paths)
else:
# Whisper failed -> split evenly by duration (no character alignment)
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0 # safe fallback
seg_dur = audio_dur / len(material_paths)
assignments = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
assignments, captions_path = await _whisper_and_split()
# Extend segments to cover the full audio range (first starts at 0, last ends at the audio end)
audio_duration = video._get_duration(str(audio_path))
@@ -721,6 +696,13 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
f"{task_id}_secondary_title_font"
)
# Clean up temporary font files
for prefix in [f"{task_id}_subtitle_font", f"{task_id}_title_font", f"{task_id}_secondary_title_font"]:
for ext in [".ttf", ".otf", ".woff", ".woff2"]:
font_tmp = temp_dir / f"{prefix}{ext}"
if font_tmp.exists():
temp_files.append(font_tmp)
final_output_local_path = temp_dir / f"{task_id}_output.mp4"
temp_files.append(final_output_local_path)


@@ -121,98 +121,6 @@ class LipSyncService:
logger.warning(f"⚠️ 视频循环异常: {e}")
return video_path
def _preprocess_video(self, video_path: str, output_path: str, target_height: int = 720) -> str:
"""
Video preprocessing: compress the video to speed up later stages
- Cap the max height at target_height (default 720p)
- Keep the aspect ratio
- Use a fast encode preset
Returns: the path of the preprocessed video
"""
import subprocess
import json
# Probe video info (JSON output is more reliable)
probe_cmd = [
"ffprobe", "-v", "error",
"-select_streams", "v:0",
"-show_entries", "stream=height,width",
"-of", "json",
video_path
]
try:
result = subprocess.run(probe_cmd, capture_output=True, text=True, timeout=10)
if result.returncode != 0:
logger.warning(f"⚠️ ffprobe 失败: {result.stderr[:100]}")
return video_path
probe_data = json.loads(result.stdout)
streams = probe_data.get("streams", [])
if not streams:
logger.warning("⚠️ 无法获取视频流信息,跳过预处理")
return video_path
current_height = streams[0].get("height", 0)
current_width = streams[0].get("width", 0)
if current_height == 0:
logger.warning("⚠️ 视频高度为 0跳过预处理")
return video_path
logger.info(f"📹 原始视频分辨率: {current_width}×{current_height}")
except json.JSONDecodeError as e:
logger.warning(f"⚠️ ffprobe 输出解析失败: {e}")
return video_path
except subprocess.TimeoutExpired:
logger.warning("⚠️ ffprobe 超时,跳过预处理")
return video_path
except Exception as e:
logger.warning(f"⚠️ 获取视频信息失败: {e}")
return video_path
# Skip compression if the video is already small enough
if current_height <= target_height:
logger.info(f"📹 视频高度 {current_height}p <= {target_height}p无需压缩")
return video_path
logger.info(f"📹 预处理视频: {current_height}p → {target_height}p")
# Compress with FFmpeg
compress_cmd = [
"ffmpeg", "-y",
"-i", video_path,
"-vf", f"scale=-2:{target_height}", # keep the aspect ratio, set height to target_height
"-c:v", "libx264",
"-preset", "ultrafast", # fastest encode preset
"-crf", "23", # quality factor
"-c:a", "copy", # copy the audio stream as-is
output_path
]
try:
result = subprocess.run(
compress_cmd,
capture_output=True,
text=True,
timeout=120 # allow up to 2 minutes
)
if result.returncode == 0 and Path(output_path).exists():
original_size = Path(video_path).stat().st_size / 1024 / 1024
new_size = Path(output_path).stat().st_size / 1024 / 1024
logger.info(f"✅ 视频压缩完成: {original_size:.1f}MB → {new_size:.1f}MB")
return output_path
else:
logger.warning(f"⚠️ 视频压缩失败: {result.stderr[:200]}")
return video_path
except subprocess.TimeoutExpired:
logger.warning("⚠️ 视频压缩超时,使用原始视频")
return video_path
except Exception as e:
logger.warning(f"⚠️ 视频压缩异常: {e}")
return video_path
async def generate(
self,
video_path: str,


@@ -71,7 +71,8 @@ class RemotionService:
"--video", str(video_path),
"--output", str(output_path),
"--fps", str(fps),
"--enableSubtitles", str(enable_subtitles).lower()
"--enableSubtitles", str(enable_subtitles).lower(),
"--concurrency", "4"
])
if captions_path:


@@ -118,18 +118,21 @@ class VideoService:
cmd_str = ' '.join(shlex.quote(str(c)) for c in cmd)
logger.debug(f"FFmpeg CMD: {cmd_str}")
try:
# Synchronous call for BackgroundTasks compatibility
result = subprocess.run(
cmd,
shell=False,
capture_output=True,
text=True,
encoding='utf-8',
timeout=600,
)
if result.returncode != 0:
logger.error(f"FFmpeg Error: {result.stderr}")
return False
return True
except subprocess.TimeoutExpired:
logger.error("FFmpeg timed out after 600s")
return False
except Exception as e:
logger.error(f"FFmpeg Exception: {e}")
return False
@@ -148,6 +151,7 @@ class VideoService:
cmd,
capture_output=True,
text=True,
timeout=30,
)
return float(result.stdout.strip())
except Exception:
@@ -221,17 +225,20 @@ class VideoService:
# Previous state: subtitles disabled due to font issues
# if subtitle_path: ...
# Audio map with high quality encoding
# Use stream copy when no looping is needed (near-instant); re-encode only when looping
if loop_count > 1:
cmd.extend([
"-c:v", "libx264", "-preset", "fast", "-crf", "23",
])
else:
cmd.extend(["-c:v", "copy"])
cmd.extend([
"-c:v", "libx264",
"-preset", "medium", # balance speed vs. compression efficiency
"-crf", "20", # final output: high quality (visually lossless)
"-c:a", "aac",
"-b:a", "192k", # audio bitrate
"-shortest"
"-b:a", "192k",
"-shortest",
"-map", "0:v", "-map", "1:a",
])
# Use audio from input 1
cmd.extend(["-map", "0:v", "-map", "1:a"])
cmd.append(output_path)


@@ -151,6 +151,46 @@ def split_segment_to_lines(words: List[dict], max_chars: int = MAX_CHARS_PER_LIN
return segments
def smooth_word_timestamps(words: List[dict]) -> List[dict]:
"""
Timestamp post-processing smoothing:
1. Guarantee strictly monotonically increasing timestamps
2. Remove tiny jitter in Whisper output (a char's end > the next char's start)
3. Fill inter-character gaps to avoid subtitle-highlight "dropouts"
"""
if len(words) <= 1:
return words
result = [words[0].copy()]
for i in range(1, len(words)):
w = words[i].copy()
prev = result[-1]
# Ensure start is not earlier than the previous char's start (monotonic)
if w["start"] < prev["start"]:
w["start"] = prev["start"]
# Ensure start is not earlier than the previous char's end
if w["start"] < prev["end"]:
# Two chars overlap -> split at the midpoint
mid = (prev["end"] + w["start"]) / 2
prev["end"] = round(mid, 3)
w["start"] = round(mid, 3)
# Fill inter-char gaps (join directly when gap < 50ms, avoiding highlight dropouts)
gap = w["start"] - prev["end"]
if 0 < gap < 0.05:
prev["end"] = w["start"]
# Ensure end >= start
if w["end"] < w["start"]:
w["end"] = w["start"] + 0.05
result.append(w)
return result
class WhisperService:
"""Subtitle alignment service (based on faster-whisper)."""
@@ -219,6 +259,8 @@ class WhisperService:
language=language,
word_timestamps=True, # enable character-level timestamps
vad_filter=True, # VAD-filter out silence
beam_size=8, # wider beam for better timestamp accuracy
# condition_on_previous_text stays at the default True to avoid systematically early timestamps
)
logger.info(f"Detected language: {info.language} (prob: {info.language_probability:.2f})")
@@ -244,6 +286,7 @@ class WhisperService:
all_words.extend(chars)
if all_words:
all_words = smooth_word_timestamps(all_words)
line_segments = split_segment_to_lines(all_words, max_chars)
all_segments.extend(line_segments)
@@ -268,6 +311,14 @@ class WhisperService:
w_starts = [c["start"] for c in whisper_chars]
w_final_end = whisper_chars[-1]["end"]
# Character-count ratio sanity check
ratio = n_o / n_w
if ratio > 1.5 or ratio < 0.67:
logger.warning(
f"original_text 与 Whisper 字数比例异常: {n_o}/{n_w} = {ratio:.2f}, "
f"字幕时间戳精度可能下降"
)
logger.info(
f"Using original_text for subtitles (len={len(original_text)}), "
f"rhythm-mapping {n_o} orig chars onto {n_w} Whisper chars, "
@@ -302,11 +353,21 @@ class WhisperService:
"end": round(t_end, 3),
})
all_segments = split_segment_to_lines(remapped, max_chars)
# Clamp per-character duration to prevent extreme drift when the ratio is abnormal
MIN_CHAR_DURATION = 0.04 # 40ms (one frame @ 25fps)
MAX_CHAR_DURATION = 0.8 # 800ms
for r in remapped:
dur = r["end"] - r["start"]
if dur < MIN_CHAR_DURATION:
r["end"] = round(r["start"] + MIN_CHAR_DURATION, 3)
elif dur > MAX_CHAR_DURATION:
r["end"] = round(r["start"] + MAX_CHAR_DURATION, 3)
all_segments = split_segment_to_lines(smooth_word_timestamps(remapped), max_chars)
logger.info(f"Rebuilt {len(all_segments)} subtitle segments (rhythm-mapped)")
elif orig_chars:
# Not enough Whisper characters -> fall back to linear interpolation
all_segments = split_segment_to_lines(orig_chars, max_chars)
all_segments = split_segment_to_lines(smooth_word_timestamps(orig_chars), max_chars)
logger.info(f"Rebuilt {len(all_segments)} subtitle segments (linear fallback)")
logger.info(f"Generated {len(all_segments)} subtitle segments")


@@ -400,13 +400,14 @@ export const useHomeController = () => {
});
// Video URL of the first timeline segment (for frame-capture preview)
// Use the first segment when a timeline exists; otherwise (e.g. no voiceover chosen) fall back to selectedMaterials[0]
// Use the backend proxy URL (same-origin) to avoid CORS canvas taint
const firstTimelineMaterialUrl = useMemo(() => {
const firstSeg = timelineSegments[0];
const matId = firstSeg?.materialId ?? selectedMaterials[0];
if (!matId) return null;
const mat = materials.find((m) => m.id === matId);
return mat?.path ? resolveMediaUrl(mat.path) : null;
if (!mat) return null;
return `/api/materials/stream/${mat.id}`;
}, [materials, timelineSegments, selectedMaterials]);
const materialPosterUrl = useVideoFrameCapture(showStylePreview ? firstTimelineMaterialUrl : null);


@@ -18,7 +18,6 @@ export function useVideoFrameCapture(videoUrl: string | null): string | null {
let isActive = true;
const video = document.createElement("video");
video.crossOrigin = "anonymous";
video.muted = true;
video.preload = "auto";
video.playsInline = true;


@@ -109,6 +109,31 @@ def get_image_blending(image, face, face_box, mask_array, crop_box):
return body[:,:,::-1]
def get_image_blending_fast(image, face, face_box, mask_array, crop_box):
"""Pure-numpy blending: no PIL conversions, no BGR↔RGB flips.
All inputs/outputs are BGR numpy uint8; semantically equivalent to get_image_blending.
"""
x, y, x1, y1 = face_box
x_s, y_s, x_e, y_e = crop_box
result = image.copy()
# 1. Paste the generated face into its position inside the crop region
crop_region = result[y_s:y_e, x_s:x_e].copy()
fy, fx = y - y_s, x - x_s
fh, fw = y1 - y, x1 - x
crop_region[fy:fy+fh, fx:fx+fw] = face
# 2. Mask alpha blend (vectorized numpy broadcasting)
mask_f = mask_array[:, :, np.newaxis].astype(np.float32) * (1.0 / 255.0)
orig_region = result[y_s:y_e, x_s:x_e].astype(np.float32)
new_region = crop_region.astype(np.float32)
blended = orig_region * (1.0 - mask_f) + new_region * mask_f
result[y_s:y_e, x_s:x_e] = blended.astype(np.uint8)
return result
def get_image_prepare_material(image, face_box, upper_boundary_ratio=0.5, expand=1.5, fp=None, mode="raw"):
body = Image.fromarray(image[:,:,::-1])


@@ -77,7 +77,7 @@ from transformers import WhisperModel
musetalk_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(musetalk_root))
from musetalk.utils.blending import get_image, get_image_blending, get_image_prepare_material
from musetalk.utils.blending import get_image, get_image_blending, get_image_blending_fast, get_image_prepare_material
from musetalk.utils.face_parsing import FaceParsing
from musetalk.utils.audio_processor import AudioProcessor
from musetalk.utils.utils import get_file_type, get_video_fps, datagen, load_all_model
@@ -124,13 +124,15 @@ BLEND_CACHE_EVERY = 5 # BiSeNet mask 缓存: 每 N 帧更新一次
def run_ffmpeg(cmd):
"""Run an FFmpeg command"""
print(f"Executing: {cmd}")
"""Run an FFmpeg command (accepts a list or a string)"""
if isinstance(cmd, str):
cmd = cmd.split()
print(f"Executing: {' '.join(cmd)}")
try:
result = subprocess.run(cmd, shell=True, check=True, capture_output=True, text=True)
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
return True
except subprocess.CalledProcessError as e:
print(f"Error executing ffmpeg: {cmd}")
print(f"Error executing ffmpeg: {' '.join(cmd)}")
print(f"Return code: {e.returncode}")
print(f"Stderr: {e.stderr[:500]}")
return False
@@ -427,7 +429,7 @@ def _run_inference(req: LipSyncRequest) -> dict:
# ===== Phase 4: VAE 潜空间编码 =====
t0 = time.time()
input_latent_list = []
extra_margin = 10
extra_margin = 15
for bbox, frame in zip(coord_list, frames):
if bbox == coord_placeholder:
continue
@@ -477,7 +479,7 @@ def _run_inference(req: LipSyncRequest) -> dict:
timings["5_unet"] = time.time() - t0
print(f"✅ UNet 推理: {len(res_frame_list)} 帧 [{timings['5_unet']:.1f}s]")
# ===== Phase 6: Compose (cached BiSeNet mask + cv2.VideoWriter) =====
# ===== Phase 6: Compose (cv2.VideoWriter + pure-numpy blending) =====
t0 = time.time()
h, w = frames[0].shape[:2]
@@ -523,13 +525,17 @@ def _run_inference(req: LipSyncRequest) -> dict:
continue
try:
combine_frame = get_image_blending(
combine_frame = get_image_blending_fast(
ori_frame, res_frame, adjusted_bbox, cached_mask, cached_crop_box)
except Exception:
# Fall back to the full method when blending fails
combine_frame = get_image(
ori_frame, res_frame, list(adjusted_bbox),
mode=blend_mode, fp=fp)
# Fall back to the PIL method when blending_fast fails
try:
combine_frame = get_image_blending(
ori_frame, res_frame, adjusted_bbox, cached_mask, cached_crop_box)
except Exception:
combine_frame = get_image(
ori_frame, res_frame, list(adjusted_bbox),
mode=blend_mode, fp=fp)
writer.write(combine_frame)
@@ -537,13 +543,15 @@ def _run_inference(req: LipSyncRequest) -> dict:
timings["6_blend"] = time.time() - t0
print(f"🎨 合成 [{timings['6_blend']:.1f}s]")
# ===== Phase 7: FFmpeg re-encode to H.264 + audio mux =====
# ===== Phase 7: FFmpeg H.264 encode + audio mux =====
t0 = time.time()
cmd = (
f"ffmpeg -y -v warning -i {temp_raw_path} -i {audio_path} "
f"-c:v libx264 -crf 18 -pix_fmt yuv420p "
f"-c:a copy -shortest {output_vid_path}"
)
cmd = [
"ffmpeg", "-y", "-v", "warning",
"-i", temp_raw_path, "-i", audio_path,
"-c:v", "libx264", "-crf", "18", "-pix_fmt", "yuv420p",
"-c:a", "copy", "-shortest",
output_vid_path
]
if not run_ffmpeg(cmd):
raise RuntimeError("FFmpeg 重编码+音频合并失败")


@@ -155,18 +155,56 @@ async function main() {
console.log(`Public dir: ${publicDir}, Video file: ${videoFileName}`);
// Bundle the Remotion project
console.log('Bundling Remotion project...');
// Fix: resolve src/index.ts via process.cwd() so it is found under both dist/render.js and ts-node
// Assumes the script always runs from the remotion root (guaranteed by the Python service)
const entryPoint = path.resolve(process.cwd(), 'src/index.ts');
console.log(`Entry point: ${entryPoint}`);
const bundleLocation = await bundle({
entryPoint,
webpackOverride: (config) => config,
publicDir,
});
// Bundle cache: a hash of src-dir mtimes decides whether to re-bundle
const BUNDLE_CACHE_DIR = path.resolve(process.cwd(), '.remotion-bundle-cache');
const hashFile = path.join(BUNDLE_CACHE_DIR, '.hash');
function getSourceHash(): string {
// Collect the mtime of every file under src as the cache key
const srcDir = path.resolve(process.cwd(), 'src');
const mtimes: string[] = [];
function walkDir(dir: string) {
for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
const fullPath = path.join(dir, entry.name);
if (entry.isDirectory()) {
walkDir(fullPath);
} else {
mtimes.push(`${fullPath}:${fs.statSync(fullPath).mtimeMs}`);
}
}
}
walkDir(srcDir);
mtimes.sort();
return mtimes.join('|');
}
const currentHash = getSourceHash();
let bundleLocation: string;
if (fs.existsSync(hashFile) && fs.readFileSync(hashFile, 'utf-8') === currentHash) {
bundleLocation = BUNDLE_CACHE_DIR;
console.log('Using cached bundle');
} else {
console.log('Bundling Remotion project...');
console.log(`Entry point: ${entryPoint}`);
const freshBundle = await bundle({
entryPoint,
webpackOverride: (config) => config,
publicDir,
});
// Copy into the cache directory
if (fs.existsSync(BUNDLE_CACHE_DIR)) {
fs.rmSync(BUNDLE_CACHE_DIR, { recursive: true });
}
fs.cpSync(freshBundle, BUNDLE_CACHE_DIR, { recursive: true });
fs.writeFileSync(hashFile, currentHash);
bundleLocation = BUNDLE_CACHE_DIR;
console.log('Bundle cached for future use');
}
// Unified inputProps, including video dimensions for calculateMetadata
const inputProps = {
@@ -198,7 +236,7 @@ async function main() {
composition.height = videoHeight;
// Render the video
const concurrency = options.concurrency || 16;
const concurrency = options.concurrency || 4;
console.log(`Rendering video (concurrency=${concurrency})...`);
await renderMedia({
composition,


@@ -27,7 +27,7 @@ export function getCurrentSegment(
currentTimeInSeconds: number
): Segment | null {
for (const segment of captions.segments) {
if (currentTimeInSeconds >= segment.start && currentTimeInSeconds <= segment.end) {
if (currentTimeInSeconds >= segment.start && currentTimeInSeconds < segment.end) {
return segment;
}
}
@@ -43,7 +43,7 @@ export function getCurrentWordIndex(
): number {
for (let i = 0; i < segment.words.length; i++) {
const word = segment.words[i];
if (currentTimeInSeconds >= word.start && currentTimeInSeconds <= word.end) {
if (currentTimeInSeconds >= word.start && currentTimeInSeconds < word.end) {
return i;
}
// If the current time falls between two characters, return the previous one