Compare commits

...

4 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Kevin Wong | 48bc78fe38 | Update | 2026-03-02 16:35:16 +08:00 |
| Kevin Wong | abf005f225 | Update | 2026-02-28 17:49:32 +08:00 |
| Kevin Wong | 9de2cb40b4 | Update | 2026-02-28 14:44:51 +08:00 |
| Kevin Wong | 29c67f629d | Update | 2026-02-28 09:16:41 +08:00 |
38 changed files with 1887 additions and 539 deletions


@@ -65,6 +65,7 @@ backend/
* `POST /api/materials`: upload a material
* `GET /api/materials`: list materials
* `PUT /api/materials/{material_id}`: rename a material
* `GET /api/materials/stream/{material_id}`: stream a material file from the same origin (used for frontend canvas frame capture, avoiding cross-origin CORS taint)
4. **Social Publishing (Publish)**
* `POST /api/publish`: publish a video to Douyin / WeChat Channels / Bilibili / Xiaohongshu
@@ -160,6 +161,18 @@ backend/
- Multi-material segments are uniformly re-encoded before concatenation, forcing `25fps + CFR` to reduce stutter caused by inconsistent time bases at segment boundaries.
- The concat flow enables `+genpts` to rebuild timestamps, improving timeline continuity after splicing.
- MOV materials carrying rotation metadata are orientation-normalized first, before resolution checks and the rest of the pipeline.
- The compose stage (merging the video and audio tracks) uses `-c:v copy` stream copy instead of re-encoding and finishes almost instantly.
- FFmpeg subprocesses have timeout protection: 600 s for `_run_ffmpeg()` and 30 s for `_get_duration()`, preventing malformed files from hanging forever.
### Global Concurrency Control
- The video-generation entry point uses `asyncio.Semaphore(2)` to cap execution at 2 concurrent tasks; queued tasks show a "Queued..." status.
- Redis task keys carry TTLs (24 hours at creation, 2 hours once completed/failed), and `list()` cleans up expired index entries automatically.
### Subtitle Timestamp Optimization
- Whisper output passes through `smooth_word_timestamps()`, a three-step smoothing pass: enforce monotonic increase, resolve overlaps (split at the midpoint), and fill tiny gaps (<50 ms).
- Supports `original_text` rhythm mapping: original characters are proportionally mapped onto Whisper timestamps, resolving mismatches between AI-rewritten/multilingual copy and the transcription.
## 📦 Asset Library & Static Resources


@@ -70,6 +70,18 @@ run_cosyvoice.sh # PM2 launch script
| ref_text | string | yes | Transcript of the reference audio |
| language | string | no | Language (default "Chinese"; CosyVoice auto-detects) |
| speed | float | no | Speech rate (default 1.0, range 0.5-2.0; 0.8-1.2 recommended) |
| instruct_text | string | no | Tone instruction (default ""; when non-empty, switches to `inference_instruct2` mode) |
**Inference mode branch:**
- `instruct_text` empty → `inference_zero_shot(text, prompt_text, ref_audio)` — pure voice cloning
- `instruct_text` non-empty → `inference_instruct2(text, instruct_text, ref_audio)` — voice cloning with tone/emotion control
**Example tone instructions:**
```
"You are a helpful assistant. 请非常开心地说一句话。<|endofprompt|>"
"You are a helpful assistant. 请非常伤心地说一句话。<|endofprompt|>"
"You are a helpful assistant. 请非常生气地说一句话。<|endofprompt|>"
```
**Returns:** a WAV audio file


@@ -97,7 +97,7 @@ python -m scripts.server  # test that it starts (Ctrl+C to exit)
### 3b. MuseTalk 1.5 (long-video lip sync, GPU0)
- > MuseTalk is a single-step latent inpainting model (not a diffusion model) with near-real-time inference, suited to videos >= 120 s. It shares GPU0 with CosyVoice; fp16 inference needs about 4-8 GB of VRAM.
+ > MuseTalk is a single-step latent inpainting model (not a diffusion model) with near-real-time inference, suited to videos >= 120 s. It shares GPU0 with CosyVoice; fp16 inference needs about 4-8 GB of VRAM. The synthesis stage uses NVENC GPU hardware encoding (h264_nvenc) plus pure-numpy blending, avoiding double encoding and PIL conversion overhead.
See the dedicated standalone deployment guide:
**[MuseTalk Deployment Guide](MUSETALK_DEPLOY.md)**
@@ -211,8 +211,10 @@ cp .env.example .env
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Supabase API public URL (frontend access) |
| `LATENTSYNC_GPU_ID` | 1 | GPU selection (0 or 1) |
| `LATENTSYNC_USE_SERVER` | false | Set to true to enable the resident-service speedup |
- | `LATENTSYNC_INFERENCE_STEPS` | 16 | Inference steps (16-50) |
- | `LATENTSYNC_GUIDANCE_SCALE` | 1.5 | Guidance scale (1.0-3.0) |
+ | `LATENTSYNC_INFERENCE_STEPS` | 20 | Inference steps (16-50) |
+ | `LATENTSYNC_GUIDANCE_SCALE` | 2.0 | Guidance scale (1.0-3.0) |
| `LATENTSYNC_ENABLE_DEEPCACHE` | true | DeepCache inference speedup |
| `LATENTSYNC_SEED` | 1247 | Fixed random seed (reproducible) |
| `DEBUG` | true | Set to false in production |
| `REDIS_URL` | `redis://localhost:6379/0` | Task-state store (falls back to memory when unavailable) |
| `WEIXIN_HEADLESS_MODE` | headless-new | WeChat Channels Playwright mode (headful/headless-new) |


@@ -201,3 +201,63 @@ const materialPosterUrl = useVideoFrameCapture(
| TensorRT (DiT module) | +20-30% | requires compiling a .plan engine |
| torch.compile() | +10-20% | one line of code, but slow first compile |
| vLLM (LLM module) | +10-15% | extra dependency |
---
## MuseTalk Synthesis-Stage Performance Optimization
### Overview
After the MuseTalk v2 optimizations, total time dropped from 1799 s to 819 s (2.2x), but the synthesis stage (Phase 6) still takes 462.2 s (56.4%) and remains the single largest bottleneck. This round works on two fronts: pure-numpy blending to replace PIL conversions, and an FFmpeg pipe + NVENC GPU hardware encode to replace double encoding.
### 1. Pure-numpy blending replaces PIL (blending.py)
- **Problem**: `get_image_blending` performed 3 numpy↔PIL conversions plus BGR↔RGB channel flips per frame, pure waste
- **Fix**: new `get_image_blending_fast()` function
- Keeps everything as BGR numpy arrays; no PIL conversions or channel flips
- Mask blending uses vectorized numpy broadcasting, `mask * (1/255)`, instead of `PIL.paste` with a mask
- The old `get_image_blending` is kept as a fallback
- **Fallback chain**: `blending_fast` → `blending` (PIL) → `get_image` (full recompute)
### 2. FFmpeg pipe + NVENC hardware encode replaces double encoding (server.py)
**Before (double encoding)**:
```
Phase 6: per frame → cv2.VideoWriter (mp4v CPU software encode) → temp_raw.mp4
Phase 7: FFmpeg reads temp_raw.mp4 → H.264 CPU re-encode + audio mux → output.mp4
```
**After (single GPU encode)**:
```
Phase 6: per frame → FFmpeg stdin pipe (rawvideo → h264_nvenc GPU encode) → temp_raw.mp4
Phase 7: FFmpeg only muxes audio (-c:v copy -c:a copy) → output.mp4 (seconds)
```
- NVENC parameters: `-c:v h264_nvenc -preset p4 -cq 20 -pix_fmt yuv420p`
- The RTX 3090 encodes on its dedicated NVENC chip without occupying CUDA cores; encode speed >500 fps
### 3. FFmpeg process resource-management hardening
- The frame-writing loop is wrapped in `try/finally`, guaranteeing `proc.stdin.close()` runs even on exceptions
- stderr is read after `proc.wait()` and before closing, avoiding buffer deadlocks
- stderr is decoded with `errors="ignore"` so non-UTF-8 output cannot crash the process
### 4. `run_ffmpeg` safety improvements
- Dropped `shell=True` in favor of argument lists, preventing command injection through special characters in paths
- The Phase 7 FFmpeg command moved from string concatenation to an argument list
### Tuning history
| Version | Phase 6 | Phase 7 | Total | Verdict |
|------|---------|---------|------|------|
| Day27 baseline | 462s | 38s | 819s | — |
| v1: libx264 -preset medium | 548s | 0.3s | 854s | CPU-encode backpressure; actually slower |
| v2: h264_nvenc (current) | TBD | TBD | TBD | NVENC has zero backpressure; Phase 6 estimated < 200s |
### Changed files
| File | Change |
|------|------|
| `models/MuseTalk/musetalk/utils/blending.py` | added pure-numpy `get_image_blending_fast()` |
| `models/MuseTalk/scripts/server.py` | Phase 6: FFmpeg pipe + NVENC + blending_fast; Phase 7: -c:v copy; `run_ffmpeg` without shell=True |

Docs/DevLogs/Day29.md (new file, 283 lines)

@@ -0,0 +1,283 @@
## Subtitle Sync Fix + Lip Parameter Tuning + Full Video-Pipeline Optimization + Preview Background Fix + CosyVoice Tone Control (Day 29)
### Overview
This round does a full review and optimization of the video-generation pipeline: fixed subtitle/voice desync (Whisper timestamp smoothing + original-text rhythm mapping), tuned LatentSync lip parameters, switched compose to stream copy to drop a redundant re-encode, added FFmpeg timeout protection, a global concurrency limit, Redis task TTLs, temp-file cleanup, and dead-code removal. Fixed the style-preview background CORS breakage caused by the frontend domain migration. Added CosyVoice tone control: voice-clone mode now supports happy/sad/angry emotional delivery (based on `inference_instruct2`).
---
## ✅ Changes
### 1. Subtitle Sync Fix (Whisper timestamps + original-text rhythm mapping)
- **Problem**: subtitle highlighting desyncs from the voice; highlights run ahead or behind, or skip words
- **Root cause**: Whisper's per-word timestamps have slight jitter (a word's end past the next word's start), and inter-word gaps make the highlight "flicker"
#### whisper_service.py — timestamp post-processing
New `smooth_word_timestamps()` function with three smoothing steps:
1. **Monotonic increase**: a word's start is never earlier than the previous word's start
2. **Overlap resolution**: when two words overlap in time, split at the midpoint
3. **Gap filling**: gaps under 50 ms between words are closed directly, avoiding highlight dropouts
```python
def smooth_word_timestamps(words):
    for i in range(1, len(words)):
        prev, w = words[i - 1], words[i]
        # Overlap → split at the midpoint
        if w["start"] < prev["end"]:
            mid = (prev["end"] + w["start"]) / 2
            prev["end"] = mid
            w["start"] = mid
        # Tiny gap (< 50 ms) → connect directly
        gap = w["start"] - prev["end"]
        if 0 < gap < 0.05:
            prev["end"] = w["start"]
    return words
```
#### whisper_service.py — original-text rhythm mapping
- **Problem**: AI-rewritten or multilingual copy differs from the Whisper transcription; using the Whisper text directly produces garbage
- **Fix**: when the `original_text` parameter is non-empty, substitute the original characters for the Whisper text while keeping Whisper's speech-rhythm timestamps
- Implementation: map the N original characters proportionally onto the M Whisper timestamps (linear interpolation)
- Character-count ratio sanity check (warns when the ratio is >1.5x or <0.67x)
- Per-character duration clamped to 40 ms ~ 800 ms to prevent extreme drift
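As a rough illustration of the idea (the helper name `map_original_to_timestamps` and the word-dict shape are assumptions for this sketch, not the project's actual code), the proportional mapping could look like:

```python
def map_original_to_timestamps(original_text, whisper_words,
                               min_dur=0.04, max_dur=0.8):
    """Map N original characters onto M Whisper word timestamps by position,
    keeping Whisper's speech rhythm but clamping per-character durations."""
    chars = [c for c in original_text if not c.isspace()]
    n, m = len(chars), len(whisper_words)
    if n == 0 or m == 0:
        return []

    def char_time(i):
        # Fractional position of character boundary i within the word list
        pos = i * m / n
        j = min(int(pos), m - 1)
        w = whisper_words[j]
        frac = pos - j
        return w["start"] + (w["end"] - w["start"]) * frac

    mapped = []
    for i, ch in enumerate(chars):
        start = char_time(i)
        dur = char_time(i + 1) - start
        dur = min(max(dur, min_dur), max_dur)  # clamp to 40 ms ~ 800 ms
        mapped.append({"word": ch, "start": start, "end": start + dur})
    return mapped
```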
#### captions.ts — subtitle lookup on the Remotion side
New `getCurrentSegment()` and `getCurrentWordIndex()` functions:
- Precisely find the subtitle segment and highlighted word index for the current frame time
- Handle inter-word gaps (between two words, return the previous word's index so the highlight stays continuous)
- Past the last word's end time, return the last word (avoiding end-of-clip flicker)
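Expressed in Python for illustration only (the real implementation lives in `captions.ts`), the word-index lookup behaves like:

```python
def get_current_word_index(words, t):
    """Index of the word to highlight at time t (seconds).

    Gaps between words return the previous index (continuous highlight);
    times past the last word return the last index (no end-of-clip flicker).
    """
    if not words:
        return -1
    for i, w in enumerate(words):
        if t < w["start"]:
            return max(i - 1, 0)  # inside the gap before word i
        if t <= w["end"]:
            return i
    return len(words) - 1
```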
---
### 2. LatentSync Lip Parameter Tuning
| Parameter | Day28 value | Day29 value | Notes |
|------|----------|----------|------|
| `LATENTSYNC_INFERENCE_STEPS` | 16 | 20 | slightly more steps for better mouth-shape quality |
| `LATENTSYNC_GUIDANCE_SCALE` | (default) | 2.0 | balances lip fit against naturalness |
| `LATENTSYNC_ENABLE_DEEPCACHE` | (default) | true | DeepCache inference speedup |
| `LATENTSYNC_SEED` | (default) | 1247 | fixed seed for reproducibility |
| Remotion concurrency | 16 | 4 | lower concurrency to avoid resource contention |
---
### 3. compose() Stream Copy Replaces a Redundant Re-encode (high priority)
**File**: `video_service.py`
- **Problem**: `compose()` only muxes the video and audio tracks, yet it re-encoded fully with `libx264 -preset medium -crf 20` every time, taking minutes. Across the whole pipeline, one video could be x264-encoded up to 5 times
- **Fix**: when no looping is needed (`loop_count == 1`), use `-c:v copy` stream copy, which finishes almost instantly; keep libx264 when looping
```python
if loop_count > 1:
    cmd.extend(["-c:v", "libx264", "-preset", "fast", "-crf", "23"])
else:
    cmd.extend(["-c:v", "copy"])
```
- compose output is an intermediate product (Remotion encodes again), so stream copy saves one encode with zero quality loss
---
### 4. FFmpeg Timeout Protection (high priority)
**File**: `video_service.py`
- `_run_ffmpeg()`: added `timeout=600` (10 minutes), catching `subprocess.TimeoutExpired`
- `_get_duration()`: added `timeout=30`
- Prevents a malformed video from hanging FFmpeg forever and blocking background tasks
---
### 5. Global Task Concurrency Limit (high priority)
**File**: `workflow.py`
- A module-level `asyncio.Semaphore(2)` is acquired at the `process_video_generation()` entry point
- Queued tasks show a "排队中..." ("Queued...") status
- Prevents several requests from running FFmpeg + Remotion at once and exhausting CPU/memory
```python
_generation_semaphore = asyncio.Semaphore(2)

async def process_video_generation(task_id, req, user_id):
    _update_task(task_id, message="排队中...")  # "Queued..."
    async with _generation_semaphore:
        await _process_video_generation_inner(task_id, req, user_id)
```
---
### 6. Redis Task TTL + Index Cleanup (medium priority)
**File**: `task_store.py`
- `create()`: sets a 24-hour TTL (`ex=86400`)
- `update()`: completed/failed states get a 2-hour TTL (`ex=7200`); everything else keeps 24 hours
- `list()`: removes expired index entries while iterating (`srem`)
- Fixes the unbounded accumulation of Redis task keys
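A minimal sketch of this TTL + pruning pattern, assuming a redis-py-style client `r` (the key names and function signatures here are illustrative, not the project's actual `task_store.py`; with real redis-py, `smembers` returns bytes that would need decoding):

```python
import json

def update_task(r, task_id, data):
    """Store task JSON with a TTL: 2h for terminal states, 24h otherwise."""
    ttl = 7200 if data.get("status") in ("completed", "failed") else 86400
    r.set(f"vigent:tasks:{task_id}", json.dumps(data), ex=ttl)

def list_tasks(r, index_key="vigent:tasks:index"):
    """List tasks, pruning index entries whose keys have already expired."""
    tasks = []
    for task_id in list(r.smembers(index_key)):
        raw = r.get(f"vigent:tasks:{task_id}")
        if raw is None:
            r.srem(index_key, task_id)  # key expired → drop the stale index entry
            continue
        tasks.append(json.loads(raw))
    return tasks
```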
---
### 7. Temp Font File Cleanup (medium priority)
**File**: `workflow.py`
- `prepare_style_for_remotion()` copied fonts into temp_dir but never added them to the cleanup list
- It now iterates three prefix groups (subtitle/title/secondary_title) × four extensions (.ttf/.otf/.woff/.woff2) and adds any existing font files to `temp_files`
---
### 8. Whisper+split Deduplication (low priority)
**File**: `workflow.py`
- The Whisper→_split_equal code in two branches (custom_assignments mismatch vs. default) was 100% identical (36 duplicated lines)
- Extracted into an internal `_whisper_and_split()` helper shared by both branches
---
### 9. LipSync Dead-Code Removal (low priority)
**File**: `lipsync_service.py`
- Deleted the `_preprocess_video()` method (92 lines); nothing in the project called it
---
### 10. Title/Subtitle Preview Background CORS Fix
- **Problem**: after the frontend domain moved from `vigent.hbyrkj.top` to `ipagent.ai-labz.cn`, the material's signed URL (`api.hbyrkj.top`) no longer shares a root domain with the frontend. The Supabase Kong gateway's CORS config doesn't cover the new domain → `<video crossOrigin="anonymous">` fails to load → canvas frame capture fails → fallback to the gradient background
- **Root cause**: the Day28 implementation relied on Supabase returning an `Access-Control-Allow-Origin` header; the domain change broke that dependency
**Fix — same-origin proxy (bypasses CORS entirely)**:
| Component | Change |
|------|------|
| `materials/router.py` | new `GET /api/materials/stream/{material_id}` endpoint that reads directly from local disk via `get_local_file_path()` and returns a `FileResponse` |
| `useHomeController.ts` | frame-capture URL changed to the same-origin `/api/materials/stream/${mat.id}`, dropping the cross-origin signed URL |
| `useVideoFrameCapture.ts` | removed `crossOrigin = "anonymous"`; same-origin requests don't need it |
Flow: `user clicks preview → /api/materials/stream/xxx → Next.js rewrite → FastAPI FileResponse → same-origin <video> → canvas frame capture succeeds`
---
### 11. Alipay Callback Domain Update
**File**: `.env`
```
ALIPAY_NOTIFY_URL=https://ipagent.ai-labz.cn/api/payment/notify
ALIPAY_RETURN_URL=https://ipagent.ai-labz.cn/pay
```
---
## 📁 Changed Files
| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | timestamp smoothing + original-text rhythm mapping + per-character duration clamp |
| `remotion/src/utils/captions.ts` | added `getCurrentSegment` / `getCurrentWordIndex` |
| `backend/app/services/video_service.py` | compose stream copy + FFmpeg timeout protection |
| `backend/app/modules/videos/workflow.py` | Semaphore(2) concurrency limit + font cleanup + Whisper dedup |
| `backend/app/modules/videos/task_store.py` | Redis TTL + expired-index cleanup |
| `backend/app/services/lipsync_service.py` | removed dead `_preprocess_video()` |
| `backend/app/services/remotion_service.py` | concurrency 16 → 4 |
| `remotion/render.ts` | added concurrency parameter support |
| `backend/app/modules/materials/router.py` | added same-origin `/stream/{material_id}` proxy endpoint |
| `frontend/.../useVideoFrameCapture.ts` | removed crossOrigin |
| `frontend/.../useHomeController.ts` | frame-capture URL switched to the same-origin proxy |
| `backend/.env` | lip parameters + Alipay domain update |
---
### 12. CosyVoice Tone Control
- **Feature**: voice-clone mode gains a "tone" dropdown (normal/cheerful/low/serious), using CosyVoice3's `inference_instruct2()` to control tone and emotion through natural-language instructions
- **Default behavior unchanged**: selecting "normal" still uses `inference_zero_shot()`, identical to before the change
#### Data flow
```
user selects a tone → setEmotion("happy") → persisted to localStorage
→ generate voice-over → emotion mapped to instruct_text
→ POST /api/generated-audios/generate { instruct_text }
→ voice_clone_service → POST localhost:8010/generate { instruct_text }
→ instruct_text non-empty ? inference_instruct2() : inference_zero_shot()
```
#### CosyVoice service — `cosyvoice_server.py`
- The `/generate` endpoint gains an `instruct_text: str = Form("")` parameter
- Inference branch: empty → `inference_zero_shot()`; non-empty → `inference_instruct2(text, instruct_text, ref_audio_path, ...)`
- `inference_instruct2` needs no `prompt_text`; it takes `instruct_text` + `prompt_wav` directly
#### Backend pass-through
- `schemas.py`: `GenerateAudioRequest` gains `instruct_text: Optional[str] = None`
- `service.py`: the voiceclone branch of `generate_audio_task()` passes `instruct_text=req.instruct_text or ""`
- `voice_clone_service.py`: `_generate_once()` and `generate_audio()` gain an `instruct_text` parameter
#### Frontend
- `useHomeController.ts`: new `emotion` state + `emotionToInstruct` mapping table
- `useHomePersistence.ts`: tone selection persisted to localStorage
- `useGeneratedAudios.ts`: `generateAudio` params gain `instruct_text`
- `GeneratedAudiosPanel.tsx`: tone dropdown (left of the speed button) reusing the speed-dropdown styling; visible only in voiceclone mode
- `HomePage.tsx`: passes through `emotion`/`onEmotionChange`
#### instruct_text format (from the CosyVoice3 instruct_list)
```
normal:   "" (uses inference_zero_shot)
cheerful: "You are a helpful assistant. 请非常开心地说一句话。<|endofprompt|>"
low:      "You are a helpful assistant. 请非常伤心地说一句话。<|endofprompt|>"
serious:  "You are a helpful assistant. 请非常生气地说一句话。<|endofprompt|>"
```
---
## 📁 Changed Files
| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | timestamp smoothing + original-text rhythm mapping + per-character duration clamp |
| `remotion/src/utils/captions.ts` | added `getCurrentSegment` / `getCurrentWordIndex` |
| `backend/app/services/video_service.py` | compose stream copy + FFmpeg timeout protection |
| `backend/app/modules/videos/workflow.py` | Semaphore(2) concurrency limit + font cleanup + Whisper dedup |
| `backend/app/modules/videos/task_store.py` | Redis TTL + expired-index cleanup |
| `backend/app/services/lipsync_service.py` | removed dead `_preprocess_video()` |
| `backend/app/services/remotion_service.py` | concurrency 16 → 4 |
| `remotion/render.ts` | added concurrency parameter support |
| `backend/app/modules/materials/router.py` | added same-origin `/stream/{material_id}` proxy endpoint |
| `frontend/.../useVideoFrameCapture.ts` | removed crossOrigin |
| `frontend/.../useHomeController.ts` | frame-capture URL via same-origin proxy + emotion state + emotionToInstruct mapping |
| `backend/.env` | lip parameters + Alipay domain update |
| `models/CosyVoice/cosyvoice_server.py` | `/generate` gains `instruct_text`; branches between `inference_instruct2` / `inference_zero_shot` |
| `backend/app/services/voice_clone_service.py` | `_generate_once` / `generate_audio` pass through `instruct_text` |
| `backend/app/modules/generated_audios/schemas.py` | `GenerateAudioRequest` gains `instruct_text` |
| `backend/app/modules/generated_audios/service.py` | voiceclone branch passes `instruct_text` |
| `frontend/.../useGeneratedAudios.ts` | `generateAudio` params gain `instruct_text` |
| `frontend/.../useHomePersistence.ts` | emotion persisted (localStorage) |
| `frontend/.../GeneratedAudiosPanel.tsx` | tone dropdown UI (embedded + standalone) |
| `frontend/.../HomePage.tsx` | passes through emotion / onEmotionChange |
---
## 🔍 Verification
1. **Subtitle sync**: generate a video and watch the per-word highlight; no lead/lag/dropouts should appear
2. **compose stream copy**: the compose step in FFmpeg logs should show `-c:v copy`, with runtime dropping from minutes to seconds
3. **FFmpeg timeout**: confirm in code that the timeout parameters are in place
4. **Concurrency limit**: submit 3 tasks back-to-back; the 3rd should show "Queued" and only start after the first 2 finish
5. **Redis TTL**: `redis-cli TTL vigent:tasks:<id>` should show an expiry
6. **Font cleanup**: after generating a video, no font files should remain in the temp directory
7. **Preview background**: select a material → click "Preview style"; the first video frame should appear (not a gradient)
8. **Alipay**: after initiating a payment, the callback and redirect URLs should use the new domain
9. **Tone control**: in voice-clone mode, pick "happy"/"angry" and generate a voice-over; CosyVoice logs should show `🎭 Instruct mode` and the audio tone should clearly change
10. **Tone default**: selecting "normal" behaves exactly as before (uses `inference_zero_shot`)
11. **Tone persistence**: switch tones, refresh the page; the dropdown should restore the last selection
12. **Tone visibility**: the tone dropdown appears only in voiceclone mode, not in edgetts mode

Docs/DevLogs/Day30.md (new file, 363 lines)

@@ -0,0 +1,363 @@
## Remotion Cache Fix + Encoding-Pipeline Quality Optimization + Lip-Sync Fault Tolerance + Model Selection (Day 30)
### Overview
This round tackles four areas: (1) a serious bug where Remotion bundle caching dropped titles/subtitles; (2) a full optimization of the dual-engine LatentSync + MuseTalk encoding pipeline, eliminating redundant lossy encodes; (3) more robust LatentSync: inference continues instead of aborting when some material frames have no detectable face; (4) frontend lip-sync model selection, letting users switch between default/fast/advanced models on demand.
---
## ✅ Changes
### 1. Remotion Bundle Cache 404 Fix (serious bug)
- **Problem**: generated videos had no titles or subtitles; Remotion rendering failed and silently fell back to FFmpeg (which cannot overlay text)
- **Root cause**: Remotion's bundle cache copies `publicDir` (where videos/fonts live) only on the first bundling. Once the code stabilizes, the cache keeps hitting, and newly generated video and font files are absent from the old cache's `public/` directory → the Remotion HTTP server returns 404 → rendering fails
- **Attempt**: first tried `fs.symlinkSync` symlinks, but Remotion's internal HTTP server doesn't follow symlinks
- **Final fix**: use `fs.linkSync` hard links (zero-copy on the same filesystem, fully transparent to the app), falling back to `fs.copyFileSync` across filesystems
**File**: `remotion/render.ts`
```typescript
function ensureInCachedPublic(cachedPublicDir: string, srcAbsPath: string, fileName: string) {
  const cachedPath = path.join(cachedPublicDir, fileName);
  // Skip if the file is already present (e.g. hard-linked on a previous render)
  if (fs.existsSync(cachedPath)) return;
  // Prefer a hard link (zero-copy); fall back to copying across filesystems
  try {
    fs.linkSync(srcAbsPath, cachedPath);
  } catch {
    fs.copyFileSync(srcAbsPath, cachedPath);
  }
}
```
When the cached bundle is used, the files the current render needs (video + fonts) are hard-linked into the cache's `public/` directory automatically:
- the video file (`videoFileName`)
- font files (extracted from the `font_file` fields of `subtitleStyle` / `titleStyle` / `secondaryTitleStyle`)
---
### 2. Video Encoding Pipeline Quality Optimization
A full audit of the pipeline found that from material upload to final output, a video could go through **5-6 lossy re-encodes**, while the official LatentSync demo uses only 1-2.
#### Encoding chain before
| # | Stage | CRF | Problem |
|---|------|-----|------|
| 1 | orientation normalization | 23 | conditional |
| 2 | `prepare_segment` scale + duration | 23 | always runs; quality too low |
| 3 | LatentSync `read_video` FPS conversion | 18 | **re-encodes even when already 25fps** |
| 4 | LatentSync `imageio` frame writing | 13 | model output |
| 5 | LatentSync final mux | 18 | **CRF13 output immediately re-encoded at CRF18** |
| 6 | compose | copy | optimized on Day29 |
| 7 | multi-material concat | 23 | **segment parameters already uniform; re-encode unnecessary** |
| 8 | Remotion render | ~18 | always runs (text overlay) |
#### Optimizations
##### 2a. LatentSync `read_video` skips the redundant FPS re-encode
**File**: `models/LatentSync/latentsync/utils/util.py`
- The original code ran `ffmpeg -r 25 -crf 18` unconditionally, even when the input was already 25fps
- New FPS check: when `abs(current_fps - 25.0) < 0.5`, use the original file directly
- Our `prepare_segment` already outputs a uniform 25fps, so this step was entirely redundant
```python
cap = cv2.VideoCapture(video_path)
current_fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()
if abs(current_fps - 25.0) < 0.5:
    print(f"Video already at {current_fps:.1f}fps, skipping FPS conversion")
    target_video_path = video_path
else:
    # Re-encode only when the input is not already 25fps
    command = f"ffmpeg ... -r 25 -crf 18 ..."
```
##### 2b. LatentSync final mux: stream copy instead of re-encode
**File**: `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py`
- Original: after `imageio` wrote frames at high-quality CRF 13, the final mux re-encoded everything with `libx264 -crf 18`
- Fix: switch to `-c:v copy` stream copy, muxing only the audio track; zero video loss
```diff
- ffmpeg ... -c:v libx264 -crf 18 -c:a aac -q:v 0 -q:a 0
+ ffmpeg ... -c:v copy -c:a aac -q:a 0
```
##### 2c. `prepare_segment` + `normalize_orientation`: CRF 23 → 18
**File**: `backend/app/services/video_service.py`
- `normalize_orientation`: CRF 23 → 18
- `prepare_segment` trim temp file: CRF 23 → 18
- `prepare_segment` main command: CRF 23 → 18
- CRF 18 is the "high quality" tier, matching LatentSync's internal standard
##### 2d. Multi-material concat stream copy
**File**: `backend/app/services/video_service.py`
- The original code re-encoded the concatenation with `libx264 -crf 23`
- All segments are already normalized by `prepare_segment` to the same resolution/frame rate/encoding parameters
- Switched to `-c:v copy` stream copy, eliminating one full re-encode
```diff
- -vsync cfr -r 25 -c:v libx264 -preset fast -crf 23 -pix_fmt yuv420p
+ -c:v copy
```
#### Encoding chain after
| # | Stage | CRF | Status |
|---|------|-----|------|
| 1 | orientation normalization | **18** | higher quality (conditional) |
| 2 | `prepare_segment` | **18** | higher quality (always runs) |
| 3 | ~~LatentSync FPS conversion~~ | - | **eliminated** |
| 4 | LatentSync model output | 13 | unchanged (unavoidable) |
| 5 | ~~LatentSync final mux~~ | - | **eliminated (copy)** |
| 6 | compose | copy | unchanged |
| 7 | ~~multi-material concat~~ | - | **eliminated (copy)** |
| 8 | Remotion render | ~18 | unchanged (unavoidable) |
**Total: 5-6 lossy encodes → 3** (prepare_segment → LatentSync model output → Remotion); quality loss is nearly halved.
---
## 📁 Changed Files
| File | Change |
|------|------|
| `remotion/render.ts` | hard-link videos + fonts into the cached public dir on bundle-cache hits |
| `models/LatentSync/latentsync/utils/util.py` | `read_video` checks FPS; skips re-encode at 25fps |
| `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py` | final mux `-c:v copy`; no-face frame tolerance (affine_transform + restore_video) |
| `backend/app/services/video_service.py` | `normalize_orientation` CRF 23→18; `prepare_segment` CRF 23→18; `concat_videos` `-c:v copy` |
| `backend/app/modules/videos/workflow.py` | fall back to the original video when single-material LatentSync fails |
---
### 3. LatentSync No-Face Frame Tolerance
- **Problem**: if some frames in the material have no detectable face (head turns, occlusion, empty shots), `affine_transform` raises and the whole inference task fails
- **Changes**:
- `affine_transform_video`: catch per-frame exceptions and fill with the nearest valid frame's face/box/affine_matrix (keeping the tensor batch dimension intact); still raise when no frame has a face
- `restore_video`: new `valid_face_flags` parameter; frames without a face keep the original image (no mouth replacement)
- `loop_video`: `valid_face_flags` follows looping and flipping
- `workflow.py`: on the single-material path, if `lipsync.generate()` fails entirely, copy the original video and continue; the task does not fail
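The fill step can be sketched like this (a simplified stand-in with a hypothetical helper name; the real change lives inside `affine_transform_video`, where each result bundles face/box/affine_matrix):

```python
def fill_invalid_frames(per_frame_results):
    """Replace None entries (no face detected) with the nearest earlier valid
    result, so downstream tensors keep a complete batch dimension.
    Returns the filled list plus per-frame validity flags."""
    flags = [r is not None for r in per_frame_results]
    if not any(flags):
        # All frames faceless → still an error, as in the original behavior
        raise RuntimeError("no face detected in any frame")
    # Seed with the first valid result so leading faceless frames get filled too
    last_valid = next(r for r in per_frame_results if r is not None)
    filled = []
    for r in per_frame_results:
        if r is not None:
            last_valid = r
        filled.append(last_valid)
    return filled, flags
```

The `flags` list is what a `valid_face_flags`-style parameter would carry into the restore step, so faceless frames keep their original pixels.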
---
### 4. MuseTalk Encoding-Chain Optimization
#### 4a. FFmpeg rawvideo pipe direct encode (eliminates the lossy intermediate file)
**File**: `models/MuseTalk/scripts/server.py`
- **Old flow**: UNet inference frames → `cv2.VideoWriter(mp4v)` intermediate file (lossy) → FFmpeg re-encode + audio mux (lossy again)
- **New flow**: UNet inference frames → FFmpeg rawvideo stdin pipe → single libx264 encode + audio mux
```python
ffmpeg_cmd = [
    "ffmpeg", "-y", "-v", "warning",
    "-f", "rawvideo", "-pix_fmt", "bgr24",
    "-s", f"{w}x{h}", "-r", str(fps),
    "-i", "-",                      # stdin pipe input
    "-i", audio_path,
    "-c:v", "libx264", "-preset", ENCODE_PRESET, "-crf", str(ENCODE_CRF),
    "-pix_fmt", "yuv420p",
    "-c:a", "copy", "-shortest",
    output_vid_path,
]
ffmpeg_proc = subprocess.Popen(ffmpeg_cmd, stdin=subprocess.PIPE, ...)
# Each frame is written directly: pipe_in.write(frame.tobytes())
```
Key implementation details:
- `-pix_fmt bgr24` matches OpenCV's native frame format, so there is zero conversion overhead
- `np.ascontiguousarray` guarantees frames are memory-contiguous
- `BrokenPipeError` handling plus return-code checks cover the failure paths
- `pipe_in.close()` runs before `ffmpeg_proc.wait()`, correctly signaling EOF
- All synthesis fallbacks (resize failure, mask failure, blending failure) emit the original frame through `_write_pipe_frame`
#### 4b. MuseTalk parameters moved to env vars + quality-first profile
**File**: `models/MuseTalk/scripts/server.py` + `backend/.env`
All inference and encoding parameters moved from hardcoded values to `.env`; the current profile is "quality-first":
| Parameter | Old default | Quality-first | Effect |
|------|----------|-----------|------|
| `MUSETALK_DETECT_EVERY` | 5 | **2** | 2.5x face-detection frequency; steadier tracking |
| `MUSETALK_BLEND_CACHE_EVERY` | 5 | **2** | more frequent mask updates; cleaner facial-edge blending |
| `MUSETALK_EXTRA_MARGIN` | 15 | **14** | chin-region fine-tuning |
| `MUSETALK_BLEND_MODE` | auto | **jaw** | explicit jaw mode for v1.5 |
| `MUSETALK_ENCODE_CRF` | 18 | **14** | near visually lossless (output is re-encoded by Remotion anyway) |
| `MUSETALK_ENCODE_PRESET` | medium | **slow** | better compression efficiency at the same CRF |
| `MUSETALK_AUDIO_PADDING` | 2/2 | 2/2 | unchanged |
| `MUSETALK_FACEPARSING_CHEEK` | 90/90 | 90/90 | unchanged |
Full list of newly configurable parameters: `DETECT_EVERY`, `BLEND_CACHE_EVERY`, `AUDIO_PADDING_LEFT/RIGHT`, `EXTRA_MARGIN`, `DELAY_FRAME`, `BLEND_MODE`, `FACEPARSING_LEFT/RIGHT_CHEEK_WIDTH`, `ENCODE_CRF`, `ENCODE_PRESET`
---
### 5. Workflow Async De-Blocking + compose Skip Optimization
#### 5a. Blocking calls moved to the thread pool
**File**: `backend/app/modules/videos/workflow.py`
Several synchronous FFmpeg calls in the workflow blocked the asyncio event loop, leaving other API requests (health checks, task-status queries) unresponsive. A new shared helper `_run_blocking()` routes all blocking calls through the thread pool:
```python
async def _run_blocking(func, *args):
    """Run a blocking function in a thread pool so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)
```
Blocking call sites converted:
| Call | Location | Notes |
|------|------|------|
| `video.normalize_orientation()` | single-material rotation normalization | FFmpeg rotate/transcode |
| `video.prepare_segment()` | multi-material segment preparation | FFmpeg scale + trim, parallelized across segments with `asyncio.gather` |
| `video.concat_videos()` | multi-material concatenation | FFmpeg concat |
| `video.prepare_segment()` | single-material prepare | FFmpeg scale + trim |
| `video.mix_audio()` | BGM mixing | FFmpeg audio mix |
| `video._get_duration()` | audio/video duration probing (3 sites) | ffprobe subprocess |
#### 5b. `prepare_segment` skips scaling at matching resolutions
**File**: `backend/app/modules/videos/workflow.py`
Previously, `target_resolution` was always passed to `prepare_segment` whether or not the material already matched, triggering the scale filter plus a libx264 re-encode. Now resolutions are compared per material:
- **Multi-material**: judged per segment; matching segments pass `None` (`prepare_target_res = None if res == base_res else base_res`) and take the `-c:v copy` branch
- **Single material**: `get_resolution` comparison first; pass `None` on a match
When the resolution matches and no trimming, looping, or frame-rate change is needed, `prepare_segment` internally takes `-c:v copy`, a fully lossless path.
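The skip decision can be sketched as follows (a simplified, hypothetical stand-in for `prepare_segment`'s actual command building; the FFmpeg flags are real, the helper name and CRF choice are illustrative):

```python
def build_prepare_cmd(src, dst, target_resolution=None, target_fps=None):
    """Build an FFmpeg command: re-encode only when scaling or a frame-rate
    change is requested; otherwise fall back to lossless stream copy."""
    cmd = ["ffmpeg", "-y", "-i", src]
    filters = []
    if target_resolution is not None:
        w, h = target_resolution
        filters.append(f"scale={w}:{h}")
    if filters or target_fps is not None:
        if filters:
            cmd += ["-vf", ",".join(filters)]
        if target_fps is not None:
            cmd += ["-r", str(target_fps), "-vsync", "cfr"]
        cmd += ["-c:v", "libx264", "-preset", "fast", "-crf", "18"]
    else:
        cmd += ["-c:v", "copy"]  # resolution already matches → zero-loss path
    cmd += ["-c:a", "copy", dst]
    return cmd
```

Passing `target_resolution=None` for a segment that already matches the base resolution is what routes it onto the `-c:v copy` branch.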
#### 5c. `_get_duration()` moved to the thread pool
**File**: `backend/app/modules/videos/workflow.py`
3 synchronous `video._get_duration()` ffprobe calls changed to `await _run_blocking(video._get_duration, ...)` to stop blocking the event loop.
#### 5d. compose loop-path CRF unified
**File**: `backend/app/services/video_service.py`
When the video needs looping, `compose()`'s encode is raised from CRF 23 to CRF 18, matching the pipeline-wide quality standard.
#### 5e. Multi-material segment validation
**File**: `backend/app/modules/videos/workflow.py`
After multi-material `prepare_segment` completes, a new segment-count consistency check prevents empty segments from reaching concat and raising.
#### 5f. compose() internal de-blocking
**File**: `backend/app/services/video_service.py`
`compose()` is now `async def`; its internal `_get_duration()` and `_run_ffmpeg()` run in the thread pool via `loop.run_in_executor`.
#### 5g. Pass-through when no second compose is needed
**File**: `backend/app/modules/videos/workflow.py`
When there is no BGM (`final_audio_path == audio_path`), the LatentSync/MuseTalk output already carries the correct audio track, so the redundant compose step is skipped:
```python
needs_audio_compose = str(final_audio_path) != str(audio_path)
```
- **Remotion path**: if the audio is unchanged, skip the pre-compose and feed the lipsync output straight into Remotion
- **Non-Remotion path**: if the audio is unchanged, `shutil.copy` the lipsync output through instead of running compose
---
### 6. Frontend Lip-Sync Model Selection
A new model dropdown to the right of the generate button lets users pick the lip-sync engine, passed through the full chain to the backend router.
#### Model options
| Option | Value | Routing logic |
|------|------|------|
| Default model | `default` | keep the existing threshold policy (`LIPSYNC_DURATION_THRESHOLD` cutoff: short videos → LatentSync, long videos → MuseTalk) |
| Fast model | `fast` | force MuseTalk; fall back to LatentSync when unavailable |
| Advanced model | `advanced` | force LatentSync, skipping MuseTalk |
All three modes ultimately fall back to LatentSync, so there is never a no-model-available state.
#### Data flow
```
frontend select → setLipsyncModelMode("fast") → persisted to localStorage
user clicks "Generate video" → handleGenerate()
→ payload.lipsync_model = lipsyncModelMode
→ POST /api/videos/generate { ..., lipsync_model: "fast" }
→ workflow: req.lipsync_model passed through to lipsync.generate(model_mode=...)
→ lipsync_service.generate(): routes by model_mode
→ fast: force MuseTalk → fall back to LatentSync
→ advanced: force LatentSync
→ default: threshold policy
```
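The three-way routing described above can be sketched as follows (a hypothetical helper; the real branching lives inside `lipsync_service.generate()`, and the 120 s default mirrors `LIPSYNC_DURATION_THRESHOLD`):

```python
def choose_engine(model_mode, duration_s, threshold=120.0, musetalk_available=True):
    """Route to a lip-sync engine by mode; every path can fall back to LatentSync."""
    if model_mode == "advanced":
        return "latentsync"          # force LatentSync, skip MuseTalk
    if model_mode == "fast":
        # Force MuseTalk, falling back to LatentSync when unavailable
        return "musetalk" if musetalk_available else "latentsync"
    # default: threshold policy — long videos go to MuseTalk
    if duration_s >= threshold and musetalk_available:
        return "musetalk"
    return "latentsync"
```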
#### Changed files
| File | Change |
|------|------|
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | model `<select>` dropdown right of the generate button |
| `frontend/src/features/home/ui/HomePage.tsx` | passes through `modelMode` / `onModelModeChange` |
| `frontend/src/features/home/model/useHomeController.ts` | `lipsyncModelMode` state + payload pass-through |
| `frontend/src/features/home/model/useHomePersistence.ts` | read/validate/write three-step persistence |
| `backend/app/modules/videos/schemas.py` | `lipsync_model: Literal["default", "fast", "advanced"]` |
| `backend/app/modules/videos/workflow.py` | `model_mode=req.lipsync_model` pass-through at both the multi- and single-material sites |
| `backend/app/services/lipsync_service.py` | `generate()` gains a `model_mode` parameter with three-way routing |
---
## 📁 All Changed Files
| File | Change |
|------|------|
| `remotion/render.ts` | hard-link videos + fonts into the cached public dir on bundle-cache hits |
| `models/LatentSync/latentsync/utils/util.py` | `read_video` checks FPS; skips re-encode at 25fps |
| `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py` | final mux `-c:v copy`; no-face frame tolerance |
| `backend/app/services/video_service.py` | CRF 23→18; `concat_videos` copy; `compose()` made async + loop CRF 18 |
| `backend/app/modules/videos/workflow.py` | thread-pooling; same-resolution scale skip; compose skip; segment validation; model-selection pass-through |
| `backend/app/modules/videos/schemas.py` | new `lipsync_model` field |
| `backend/app/services/lipsync_service.py` | `generate()` gains `model_mode` three-way routing |
| `models/MuseTalk/scripts/server.py` | FFmpeg rawvideo pipe; parameters via env vars |
| `backend/.env` | new MuseTalk quality-first parameters |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | model dropdown UI |
| `frontend/src/features/home/ui/HomePage.tsx` | model-state pass-through |
| `frontend/src/features/home/model/useHomeController.ts` | `lipsyncModelMode` state + payload |
| `frontend/src/features/home/model/useHomePersistence.ts` | model-selection persistence |
---
## 🔍 Verification
1. **Titles/subtitles restored**: generated videos should have titles and per-word highlighted subtitles (Remotion render succeeds; no FFmpeg fallback)
2. **Remotion logs**: should show `Hardlinked into cached bundle:` or `Copied into cached bundle:` rather than 404
3. **LatentSync FPS skip**: logs should show `Video already at 25.0fps, skipping FPS conversion`
4. **LatentSync mux**: the final mux in FFmpeg logs should use `-c:v copy`
5. **Quality comparison**: with the same material + audio, the mouth region (especially teeth) should be visibly sharper than before
6. **Multi-material concat**: the concat step should be a stream copy, dropping from seconds to milliseconds
7. **No-face tolerance**: materials with head-turn/occluded frames no longer fail the task; faceless frames keep the original image
8. **MuseTalk pipe encode**: no intermediate mp4v file should appear in logs; synthesis writes directly into the pipe
9. **MuseTalk quality params**: `curl localhost:8011/health` confirms the service is up; mouth edges in generated video are cleaner
10. **Event loop unblocked**: while a video generates, endpoints like `/api/tasks/{id}` should respond normally without timeouts
11. **compose skip**: with no BGM, logs should show `Audio unchanged, skip pre-Remotion compose`
12. **Same-resolution scale skip**: when the material is already at the target resolution, `prepare_segment` should take `-c:v copy` (no scale filter in logs)
13. **compose loop CRF**: loop-path encodes should be CRF 18 (not 23)
14. **Model selection UI**: a default/fast/advanced model dropdown should appear right of the generate button
15. **Model selection persistence**: switch models, refresh the page; the dropdown should restore the last choice
16. **Fast-model routing**: with "fast model" selected, backend logs should show `强制快速模型MuseTalk` (force fast model: MuseTalk)
17. **Advanced-model routing**: with "advanced model" selected, backend logs should show `强制高级模型LatentSync` (force advanced model: LatentSync)
18. **Default model unchanged**: "default model" behaves exactly as before (threshold routing)


@@ -37,6 +37,7 @@ ViGent2's frontend UI, built with Next.js 16 + TailwindCSS.
- **Re-recognition**: old reference audio can be re-transcribed and re-trimmed (RotateCw button).
- **One-click cloning**: automatically calls the CosyVoice 3.0 service once a reference audio is selected.
- **Speed control**: voice-clone mode supports 5 speed steps (0.8-1.2), persisted (Day 23).
- **Tone control**: voice-clone mode supports 4 tones (normal/cheerful/low/serious), based on CosyVoice3 `inference_instruct2`, persisted (Day 29).
- **Multilingual support**: EdgeTTS voice lists for 10 languages; voice-clone language pass-through (Day 22).
### 4. Voice-Over First + Timeline Arrangement [added Day 23]


@@ -201,6 +201,29 @@ LatentSync 1.6 needs ~18GB VRAM. If you hit OOM errors:
- `inference_steps`: raising to 30-50 improves quality
- `guidance_scale`: raising improves lip sync, but going too high can cause jitter
### Encoding-Pipeline Optimizations (Day 30)
Two redundant encodes in LatentSync's default internal flow have been optimized:
1. **`read_video` FPS conversion**: the original code ran `ffmpeg -r 25 -crf 18` unconditionally; it now checks the FPS and skips when already 25fps (our `prepare_segment` already outputs 25fps)
2. **final mux double encode**: the original code re-encoded with `libx264 -crf 18` after `imageio` wrote frames at CRF 13; it now uses `-c:v copy` stream copy
These two optimizations live in:
- `latentsync/utils/util.py` — the `read_video()` function
- `latentsync/pipelines/lipsync_pipeline.py` — the final mux command
---
### No-Face Frame Tolerance (Day 30)
When some frames have no detectable face (head turns, occlusion, empty shots), inference no longer aborts:
- `affine_transform_video`: fills a failed frame with the nearest valid one; still raises when no frame has a face
- `restore_video`: faceless frames keep the original image with no mouth replacement
- Backend `workflow.py`: if LatentSync fails entirely, automatically fall back to the original video; the task does not fail
Changes live in `latentsync/pipelines/lipsync_pipeline.py`
---
## References


@@ -1,6 +1,6 @@
# MuseTalk Deployment Guide
- > **Updated**: 2026-02-27
+ > **Updated**: 2026-03-02
> **Applies to**: MuseTalk v1.5 (resident-service mode)
> **Architecture**: FastAPI resident service + PM2 process management
@@ -173,17 +173,36 @@ curl http://localhost:8011/health
Relevant variables in `backend/.env`:
```ini
- # MuseTalk config
+ # MuseTalk base config
MUSETALK_GPU_ID=0                       # GPU index (coexists with CosyVoice)
MUSETALK_API_URL=http://localhost:8011  # resident-service address
MUSETALK_BATCH_SIZE=32                  # inference batch size
MUSETALK_VERSION=v15                    # model version
MUSETALK_USE_FLOAT16=true               # half-precision speedup
# Inference quality parameters
MUSETALK_DETECT_EVERY=2                 # face-detection downsampling interval (frames; smaller = more accurate but slower)
MUSETALK_BLEND_CACHE_EVERY=2            # BiSeNet mask cache refresh interval (frames)
MUSETALK_AUDIO_PADDING_LEFT=2           # Whisper temporal context (left)
MUSETALK_AUDIO_PADDING_RIGHT=2          # Whisper temporal context (right)
MUSETALK_EXTRA_MARGIN=14                # v1.5 chin-region expansion (pixels)
MUSETALK_DELAY_FRAME=0                  # audio/lip alignment offset (frames)
MUSETALK_BLEND_MODE=jaw                 # blend mode: auto / jaw / raw
MUSETALK_FACEPARSING_LEFT_CHEEK_WIDTH=90   # cheek width (v1.5 only)
MUSETALK_FACEPARSING_RIGHT_CHEEK_WIDTH=90
# Encoding quality parameters
MUSETALK_ENCODE_CRF=14                  # lower CRF = sharper (14 ≈ near visually lossless)
MUSETALK_ENCODE_PRESET=slow             # x264 preset (slow = better compression efficiency)
# Hybrid lip-sync routing
LIPSYNC_DURATION_THRESHOLD=120          # seconds; >= this value uses MuseTalk
```
> **Parameter profiles**
> - Speed-first: `DETECT_EVERY=5, BLEND_CACHE_EVERY=5, ENCODE_CRF=18, ENCODE_PRESET=medium`
> - Quality-first (current): `DETECT_EVERY=2, BLEND_CACHE_EVERY=2, ENCODE_CRF=14, ENCODE_PRESET=slow`
---
## Related Files
@@ -207,22 +226,36 @@ LIPSYNC_DURATION_THRESHOLD=120 # seconds; >= this value uses MuseTalk
|--------|------|
| `MUSETALK_BATCH_SIZE` 8→32 | RTX 3090 has VRAM to spare; ~3x UNet inference speedup |
| Direct frame reads via cv2.VideoCapture | skips the ffmpeg→PNG→imread chain |
- | Face detection downsampled (every 5 frames) | DWPose + FaceAlignment run only on sampled frames; bbox linearly interpolated in between |
- | BiSeNet mask cache (every 5 frames) | `get_image_prepare_material` runs every 5 frames; intermediate frames reuse it via `get_image_blending` |
- | Direct cv2.VideoWriter writes | skips per-frame PNG writes + ffmpeg re-encode |
+ | Face detection downsampled (every N frames) | DWPose + FaceAlignment run only on sampled frames; bbox linearly interpolated in between |
+ | BiSeNet mask cache (every N frames) | `get_image_prepare_material` runs every N frames; intermediate frames reuse it |
+ | FFmpeg rawvideo pipe direct encode | the lossy `cv2.VideoWriter(mp4v)` intermediate file replaced by direct stdin-pipe writes, eliminating one redundant lossy encode |
+ | Parameters via env vars | all inference/encoding parameters read from `.env`; quick switching between speed-first and quality-first |
| Per-stage timing | 7 stages timed precisely for later tuning |
### Encoding chain
```
UNet inference frames (raw BGR24)
→ FFmpeg rawvideo stdin pipe
→ single libx264 encode (CRF 14, preset slow) + audio mux
→ final .mp4 output
```
Versus the old flow: the lossy `cv2.VideoWriter(mp4v)` intermediate file is gone, and the encode count drops from 2 to 1.
### Tuning parameters
- Adjustable at the top of `models/MuseTalk/scripts/server.py`:
+ All parameters are configured via `backend/.env` (restart the MuseTalk service to apply changes):
- ```python
- DETECT_EVERY = 5        # face-detection downsampling interval (frames)
- BLEND_CACHE_EVERY = 5   # BiSeNet mask cache interval (frames)
+ ```ini
+ MUSETALK_DETECT_EVERY=2        # face-detection interval (frames); quality-first 2, speed-first 5
+ MUSETALK_BLEND_CACHE_EVERY=2   # BiSeNet mask cache interval (frames)
+ MUSETALK_ENCODE_CRF=14         # encode quality (14 ≈ near visually lossless, 18 = high quality)
+ MUSETALK_ENCODE_PRESET=slow    # encode speed (slow = better compression, medium = balanced)
```
- > For talking-head video (the face barely moves), the interpolation error at a 5-frame interval is negligible.
- > For scenes with vigorous face motion, lower it to 2-3.
+ > For talking-head video (the face barely moves), the interpolation error at detect_every=5 is negligible.
+ > For vigorous face motion, or for maximum quality, use detect_every=2.
---


@@ -187,7 +187,7 @@ Remotion render parameters are configured in `backend/app/services/remotion_service.py`:
| Parameter | Default | Description |
|------|--------|------|
| `fps` | 25 | output frame rate |
- | `concurrency` | 16 | number of concurrent Remotion render processes (default 16; override with the `--concurrency` CLI flag) |
+ | `concurrency` | 4 | number of concurrent Remotion render processes (default 4; override with the `--concurrency` CLI flag) |
| `title_display_mode` | `short` | title display mode (`short` = shown briefly; `persistent` = always on) |
| `title_duration` | 4.0 | title display duration (seconds; `short` mode only) |
@@ -241,6 +241,15 @@ const bundleLocation = await bundle({
const videoUrl = staticFile(videoSrc); // use staticFile
**Problem**: Remotion render fails with 404 — video file not found (bundle cache issue)
Remotion uses a bundle cache to speed up packaging. On cache hits, newly generated video/font files must be hard-linked into the cache's `public/` directory. If 404 errors appear, clear the cache and retry:
```bash
rm -rf /home/rongye/ProgramFiles/ViGent2/remotion/.remotion-bundle-cache
pm2 restart vigent2-backend
```
**Problem**: Remotion render fails
Check the backend logs:
@@ -294,3 +303,6 @@ WhisperService(device="cuda:0") # or "cuda:1"
| 2026-01-30 | 1.0.1 | Subtitle-highlight styling and title-animation polish for clearer visuals |
| 2026-02-25 | 1.2.0 | Subtitle timestamps switched from linear interpolation to Whisper rhythm mapping, fixing long-video subtitle drift |
| 2026-02-27 | 1.3.0 | Architecture diagram updated for MuseTalk hybrid routing; Remotion concurrency raised from 8 to 16; GPU allocation notes updated |
| 2026-02-28 | 1.3.1 | MuseTalk synthesis-stage optimization: pure-numpy blending + FFmpeg pipe NVENC GPU encoding replacing double encoding |
| 2026-02-28 | 1.4.0 | compose stream copy replaces re-encode; FFmpeg timeout protection (600s/30s); Remotion concurrency 16→4; Whisper timestamp smoothing + original-text rhythm mapping; global video-generation Semaphore(2); Redis task TTL |
| 2026-03-02 | 1.5.0 | Remotion bundle cache fix (hard-link videos/fonts into the cached public dir); encoding pipeline: prepare_segment/normalize CRF 23→18; multi-material concat switched to stream copy |


@@ -1,8 +1,8 @@
# ViGent2 开发任务清单 (Task Log)
**项目**: ViGent2 数字人口播视频生成系统
**进度**: 100% (Day 28 - CosyVoice FP16 加速 + 文档全面更新)
**更新时间**: 2026-02-27
**进度**: 100% (Day 30 - Remotion 缓存修复 + 编码流水线质量优化)
**更新时间**: 2026-03-02
---
@@ -10,7 +10,37 @@
> 这里记录了每一天的核心开发内容与 milestone。
### Day 28: CosyVoice FP16 加速 + 文档全面更新 (Current)
### Day 30: Remotion 缓存修复 + 编码流水线质量优化 + 唇形同步容错 (Current)
- [x] **Remotion cache 404 fix**: on a bundle cache hit, newly generated video/font files were absent from the stale cached `public/` directory → 404 → fallback to FFmpeg (no titles or subtitles). Now the files needed by the current render are hard-linked (`fs.linkSync`) into the cache directory.
- [x] **LatentSync `read_video` skips redundant FPS re-encode**: the input FPS is probed; when it is already 25fps, the `ffmpeg -r 25 -crf 18` re-encode is skipped.
- [x] **LatentSync final mux uses stream copy**: the mux step after `imageio` writes frames at CRF 13 switched from `libx264 -crf 18` to `-c:v copy`, eliminating a redundant second encode.
- [x] **`prepare_segment` + `normalize_orientation` quality bump**: CRF 23 → 18, matching LatentSync's internal quality standard.
- [x] **Multi-material concat stream copy**: segment parameters are already unified, so `concat_videos` switched from `libx264 -crf 23` to `-c:v copy`.
- [x] **Total encode count**: reduced from 5-6 lossy encodes to 3 (prepare_segment → LatentSync/MuseTalk model output → Remotion).
- [x] **LatentSync no-face-frame tolerance**: inference no longer aborts when some frames contain no detectable face; faceless frames keep the original picture, and a failing single material falls back to the original video.
- [x] **MuseTalk direct pipe encoding**: the lossy `cv2.VideoWriter(mp4v)` intermediate file was replaced with an FFmpeg rawvideo stdin pipe, removing one redundant lossy encode.
- [x] **MuseTalk parameters moved to environment variables**: inference and encoding parameters (detect_every / blend_cache / CRF / preset, etc.) migrated from hard-coded values to `backend/.env` (currently the quality-first profile: CRF 14, preset slow, detect_every 2, blend_cache_every 2).
- [x] **Non-blocking async workflow**: added a `_run_blocking()` thread-pool helper; 5 synchronous FFmpeg calls (orientation normalization / prepare_segment / concat / BGM mixing) now go through `await _run_blocking()`, so the event loop is no longer blocked.
- [x] **compose skip optimization**: with no BGM, `final_audio_path == audio_path`, so the redundant compose step is skipped; the Remotion path uses the lipsync output directly and the non-Remotion path passes it through with `shutil.copy`.
- [x] **`compose()` made async**: `compose()` is now `async def`; internally `_get_duration` and `_run_ffmpeg` run via `run_in_executor`.
- [x] **Skip scale at matching resolution**: multi-material segments compare resolutions per segment; matches pass `None` and take the copy branch (same for single material), avoiding a pointless re-encode when the source is already at the target resolution.
- [x] **`_get_duration()` moved to the thread pool**: 3 synchronous ffprobe probes in the workflow now use `await _run_blocking()`.
- [x] **compose loop CRF unified**: looping scenario CRF 23 → 18, consistent with the pipeline-wide quality standard.
- [x] **Multi-material segment validation**: after prepare completes, the segment count is verified so no empty segment reaches concat.
- [x] **Frontend lip-sync model selector**: a model dropdown (default / fast / advanced) next to the generate button, with `lipsync_model` passed through the whole chain to the backend routes. Default keeps the threshold strategy, fast forces MuseTalk, advanced forces LatentSync; all three modes fall back to LatentSync. The selection persists in localStorage.
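The MuseTalk rawvideo stdin pipe from the list above can be sketched roughly as follows. The flags mirror the documented settings (CRF 14, preset slow), but helper names and the output filename are illustrative, not the project's exact code:

```python
import subprocess

def build_pipe_encoder_cmd(width: int, height: int, fps: int = 25,
                           crf: int = 14, preset: str = "slow") -> list[str]:
    """FFmpeg argv that reads raw BGR frames on stdin and encodes once.

    Frames go straight into ffmpeg's stdin, so no lossy intermediate file
    (as with cv2.VideoWriter(mp4v)) is ever produced.
    """
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                       # frames come from stdin
        "-c:v", "libx264", "-preset", preset, "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        "out.mp4",
    ]

def encode_frames(frames, width: int, height: int) -> None:
    """Feed numpy BGR frames (H x W x 3, uint8) through the pipe."""
    proc = subprocess.Popen(build_pipe_encoder_cmd(width, height),
                            stdin=subprocess.PIPE)
    for frame in frames:
        proc.stdin.write(frame.tobytes())
    proc.stdin.close()
    proc.wait()
```

Compared with writing an `mp4v` file and re-encoding it, this keeps exactly one lossy encode in the MuseTalk stage.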
### Day 29: Video Pipeline Optimization + CosyVoice Tone Control
- [x] **Subtitle sync fix**: three-step Whisper timestamp smoothing (monotonic increase + overlap removal + gap filling) plus original-text rhythm mapping (linear interpolation with per-character duration clamping).
- [x] **LatentSync mouth-shape tuning**: inference_steps 16→20, guidance_scale 2.0, DeepCache enabled, Remotion concurrency 16→4.
- [x] **compose stream copy**: when no looping is needed, `-c:v copy` replaces the libx264 re-encode; compose time drops from minutes to seconds.
- [x] **FFmpeg timeout guards**: `_run_ffmpeg()` timeout=600, `_get_duration()` timeout=30.
- [x] **Global concurrency limit**: `asyncio.Semaphore(2)` caps the number of concurrently running generation tasks.
- [x] **Redis task TTL**: create 24h, completed/failed 2h; `list` automatically prunes expired index entries.
- [x] **Temporary font cleanup**: font files are added to the temp_files cleanup list.
- [x] **Preview background CORS fix**: the same-origin material proxy `/api/materials/stream/{id}` sidesteps cross-origin issues entirely.
- [x] **CosyVoice tone control**: voice-clone mode gains a tone dropdown (normal / cheerful / low / serious) driven by `inference_instruct2()` natural-language instructions; instruct_text is passed through the whole chain, and the default "normal" keeps existing behavior.
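The compose stream-copy decision above can be pictured with a minimal command builder; this is a sketch of the documented behavior, not the service's exact code:

```python
def build_compose_cmd(video: str, audio: str, out: str,
                      loop_count: int = 1) -> list[str]:
    """Mux a video track with a (possibly different) audio track.

    When the video needs no looping, the video stream is copied
    bit-for-bit (-c:v copy) and only the audio is encoded, so the
    step finishes in seconds instead of minutes.
    """
    cmd = ["ffmpeg", "-y", "-i", video, "-i", audio]
    if loop_count > 1:  # looping changes frames, so a re-encode is required
        cmd += ["-c:v", "libx264", "-preset", "fast", "-crf", "18"]
    else:
        cmd += ["-c:v", "copy"]
    cmd += ["-c:a", "aac", "-b:a", "192k", "-shortest",
            "-map", "0:v", "-map", "1:a", out]
    return cmd
```

The `-map 0:v -map 1:a` pair takes video from input 0 and audio from input 1, which is what lets the new audio (e.g. the BGM mix) replace the original track without touching a single video frame.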
### Day 28: CosyVoice FP16 Speedup + Full Documentation Refresh
- [x] **CosyVoice FP16 half-precision speedup**: `AutoModel()` enables `fp16=True`; LLM inference and Flow Matching run in automatic mixed precision, for an estimated 30-40% speedup and ~30% lower VRAM.
- [x] **Full documentation refresh**: README.md / DEPLOY_MANUAL.md / SUBTITLE_DEPLOY.md / BACKEND_README.md updated with the MuseTalk hybrid lip-sync scheme, performance optimizations, Remotion concurrent rendering, and more.
@@ -258,7 +288,7 @@
| **Core API** | 100% | ✅ Stable |
| **Web UI** | 100% | ✅ Stable (mobile-adapted) |
| **Lip Sync** | 100% | ✅ LatentSync 1.6 |
| **TTS Dubbing** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + dubbing-first workflow + timeline orchestration + auto transcription + speed control |
| **TTS Dubbing** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + dubbing-first workflow + timeline orchestration + auto transcription + speed control + tone control |
| **Auto Publishing** | 100% | ✅ Douyin / WeChat Channels / Bilibili / Xiaohongshu |
| **User Auth** | 100% | ✅ Phone number + JWT |
| **Paid Membership** | 100% | ✅ Alipay desktop-site payment + automatic activation |

View File

@@ -16,8 +16,8 @@
## ✨ Features
### Core Capabilities
- 🎬 **HD Lip Sync** - Hybrid scheme: short videos (<120s) use LatentSync 1.6 (high-quality latent diffusion), long videos (>=120s) use MuseTalk 1.5 (real-time-class single-step inference), with automatic routing and fallback.
- 🎙️ **Multi-modal Dubbing** - Supports **EdgeTTS** (Microsoft ultra-natural voices, 10 languages) and **CosyVoice 3.0** (3-second voice cloning, 9 languages + 18 dialects, adjustable speed). Uploaded reference audio is auto-transcribed by Whisper and smartly trimmed. Dubbing-first workflow: generate the voice-over → pick materials → generate the video.
- 🎬 **HD Lip Sync** - Hybrid scheme: short videos (<120s) use LatentSync 1.6 (high-quality latent diffusion), long videos (>=120s) use MuseTalk 1.5 (real-time-class single-step inference), with automatic routing and fallback. Frontend model selector: default (threshold auto-routing) / fast (force MuseTalk) / advanced (force LatentSync).
- 🎙️ **Multi-modal Dubbing** - Supports **EdgeTTS** (Microsoft ultra-natural voices, 10 languages) and **CosyVoice 3.0** (3-second voice cloning, 9 languages + 18 dialects, adjustable speed/tone). Uploaded reference audio is auto-transcribed by Whisper and smartly trimmed. Dubbing-first workflow: generate the voice-over → pick materials → generate the video.
- 📝 **Smart Subtitles** - faster-whisper + Remotion integration auto-generates word-by-word highlighted (karaoke-style) subtitles.
- 🎨 **Style Presets** - 12 title + 8 subtitle style presets with preview, font-size adjustment, and a custom font library. Native CSS stroke rendering: crisp, no ghosting.
- 🏷️ **Title Display Modes** - Opening titles support `transient` / `persistent` display (default transient, 4 seconds); the user preference is persisted automatically.
@@ -37,7 +37,7 @@
- 💳 **Paid Membership** - Alipay desktop-site payment activates membership automatically; on expiry it is deactivated with a renewal prompt; manual admin activation is also available.
- 🔐 **Auth & Isolation** - Supabase-backed user isolation with phone-number signup/login and password management.
- 🛡️ **Service Watchdog** - Built-in watchdog monitors and restarts hung services for 7x24h stability.
- 🚀 **Performance** - Video pre-compression, resident model services (near-instant loading), dual-GPU pipeline concurrency, MuseTalk face-detection down-sampling + BiSeNet caching, Remotion 16-way concurrent rendering.
- 🚀 **Performance** - Encoding pipeline trimmed from 5-6 lossy encodes to 3 (prepare_segment → model output → Remotion), compose stream copy without re-encoding, scale skipped at matching resolution, FFmpeg timeout guards, global generation limit (Semaphore(2)), Remotion 4-way concurrent rendering, MuseTalk rawvideo direct pipe encoding (no lossy intermediate file), resident model services, dual-GPU pipeline concurrency, Redis task TTL auto-cleanup, thread-pooled blocking calls in the workflow.
---

View File

@@ -25,10 +25,10 @@ LATENTSYNC_USE_SERVER=true
# LATENTSYNC_API_URL=http://localhost:8007
# Inference steps (20-50; higher = better quality but slower)
LATENTSYNC_INFERENCE_STEPS=16
LATENTSYNC_INFERENCE_STEPS=30
# Guidance scale (1.0-3.0; higher = tighter lip sync, but may jitter)
LATENTSYNC_GUIDANCE_SCALE=1.5
LATENTSYNC_GUIDANCE_SCALE=1.9
# Enable DeepCache acceleration (recommended)
LATENTSYNC_ENABLE_DEEPCACHE=true
@@ -52,9 +52,36 @@ MUSETALK_VERSION=v15
# Half-precision speedup
MUSETALK_USE_FLOAT16=true
# Face-detection interval in frames (smaller = more stable quality, slower)
MUSETALK_DETECT_EVERY=2
# BiSeNet mask cache refresh interval in frames (smaller = more stable quality, slower)
MUSETALK_BLEND_CACHE_EVERY=2
# Whisper temporal context (larger = smoother, but mouth response gets duller)
MUSETALK_AUDIO_PADDING_LEFT=2
MUSETALK_AUDIO_PADDING_RIGHT=2
# v1.5 jaw-region expansion in pixels (larger shows more lower lip/teeth, but edges get less stable)
MUSETALK_EXTRA_MARGIN=14
# Audio-to-mouth alignment offset in frames (positive = mouth later, negative = mouth earlier)
MUSETALK_DELAY_FRAME=0
# Blend mode: auto (per model version) / jaw / raw
MUSETALK_BLEND_MODE=jaw
# FaceParsing cheek width (v1.5 only; affects the blend-mask extent)
MUSETALK_FACEPARSING_LEFT_CHEEK_WIDTH=90
MUSETALK_FACEPARSING_RIGHT_CHEEK_WIDTH=90
# Final encode quality (lower CRF = sharper but larger files)
MUSETALK_ENCODE_CRF=14
MUSETALK_ENCODE_PRESET=slow
# =============== Hybrid lip-sync routing ===============
# Audio duration >= this threshold (seconds) routes to MuseTalk; below it, LatentSync
LIPSYNC_DURATION_THRESHOLD=120
LIPSYNC_DURATION_THRESHOLD=100
# =============== Upload settings ===============
# Max upload file size (MB)
@@ -94,5 +121,5 @@ SUPABASE_STORAGE_LOCAL_PATH=/home/rongye/ProgramFiles/Supabase/volumes/storage/s
ALIPAY_APP_ID=2021006132600283
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
ALIPAY_NOTIFY_URL=https://vigent.hbyrkj.top/api/payment/notify
ALIPAY_RETURN_URL=https://vigent.hbyrkj.top/pay
ALIPAY_NOTIFY_URL=https://ipagent.ai-labz.cn/api/payment/notify
ALIPAY_RETURN_URL=https://ipagent.ai-labz.cn/pay

View File

@@ -10,6 +10,7 @@ class GenerateAudioRequest(BaseModel):
ref_text: Optional[str] = None
language: str = "zh-CN"
speed: float = 1.0
instruct_text: Optional[str] = None
class RenameAudioRequest(BaseModel):

View File

@@ -81,6 +81,7 @@ async def generate_audio_task(task_id: str, req: GenerateAudioRequest, user_id:
output_path=audio_path,
language=_locale_to_tts_lang(req.language),
speed=req.speed,
instruct_text=req.instruct_text or "",
)
finally:
if os.path.exists(ref_local):

View File

@@ -1,14 +1,28 @@
from fastapi import APIRouter, HTTPException, Request, Depends
from fastapi.responses import FileResponse
from loguru import logger
from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.materials.schemas import RenameMaterialRequest
from app.modules.materials import service
from app.services.storage import storage_service
router = APIRouter()
@router.get("/stream/{material_id:path}")
async def stream_material(material_id: str, current_user: dict = Depends(get_current_user)):
"""Stream the material file directly (same-origin, avoids CORS canvas taint)"""
user_id = current_user["id"]
if not material_id.startswith(f"{user_id}/"):
raise HTTPException(403, "无权访问此素材")
local_path = storage_service.get_local_file_path("materials", material_id)
if not local_path:
raise HTTPException(404, "素材文件不存在")
return FileResponse(local_path, media_type="video/mp4")
@router.post("")
async def upload_material(
request: Request,

View File

@@ -38,3 +38,4 @@ class GenerateRequest(BaseModel):
bgm_volume: Optional[float] = 0.2
custom_assignments: Optional[List[CustomAssignment]] = None
output_aspect_ratio: Literal["9:16", "16:9"] = "9:16"
lipsync_model: Literal["default", "fast", "advanced"] = "default"

View File

@@ -54,7 +54,7 @@ class RedisTaskStore:
"progress": 0,
"user_id": user_id,
}
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False))
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False), ex=86400)
self._client.sadd(self._index_key, task_id)
return task
@@ -71,12 +71,17 @@ class RedisTaskStore:
keys = [self._key(task_id) for task_id in task_ids]
raw_items = self._client.mget(keys)
tasks = []
for raw in raw_items:
if raw:
try:
tasks.append(json.loads(raw))
except Exception:
continue
expired = []
for task_id, raw in zip(task_ids, raw_items):
if raw is None:
expired.append(task_id)
continue
try:
tasks.append(json.loads(raw))
except Exception:
continue
if expired:
self._client.srem(self._index_key, *expired)
return tasks
def update(self, task_id: str, updates: Dict[str, Any]) -> Dict[str, Any]:
@@ -84,7 +89,8 @@ class RedisTaskStore:
if task.get("status") == "not_found":
task = {"status": "pending", "task_id": task_id}
task.update(updates)
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False))
ttl = 7200 if task.get("status") in ("completed", "failed") else 86400
self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False), ex=ttl)
self._client.sadd(self._index_key, task_id)
return task
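The TTL policy in `create()`/`update()` above reduces to one rule, sketched here (the helper name is illustrative; the store inlines this logic):

```python
def task_ttl_seconds(status: str) -> int:
    """Terminal tasks expire after 2h; everything else keeps the 24h TTL."""
    return 7200 if status in ("completed", "failed") else 86400
```

Combined with the `srem` pruning in `list()`, this keeps the index set from accumulating keys whose payloads have already expired.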

View File

@@ -24,6 +24,9 @@ from app.services.remotion_service import remotion_service
from .schemas import GenerateRequest
from .task_store import task_store
# Global concurrency limit: at most 2 video generation tasks run at once
_generation_semaphore = asyncio.Semaphore(2)
def _locale_to_whisper_lang(locale: str) -> str:
"""'en-US' → 'en', 'zh-CN' → 'zh'"""
@@ -91,6 +94,12 @@ def _update_task(task_id: str, **updates: Any) -> None:
task_store.update(task_id, updates)
async def _run_blocking(func, *args):
"""Run a blocking function in the thread pool to avoid stalling the event loop."""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(None, func, *args)
# ── Multi-material helpers ──
@@ -169,6 +178,12 @@ def _split_equal(segments: List[dict], material_paths: List[str]) -> List[dict]:
async def process_video_generation(task_id: str, req: GenerateRequest, user_id: str):
_update_task(task_id, message="排队中...")
async with _generation_semaphore:
await _process_video_generation_inner(task_id, req, user_id)
async def _process_video_generation_inner(task_id: str, req: GenerateRequest, user_id: str):
temp_files = []
try:
start_time = time.time()
@@ -205,7 +220,8 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
# Normalize rotation metadata (e.g. iPhone MOV 1920x1080 + rotation=-90)
normalized_input_path = temp_dir / f"{task_id}_input_norm.mp4"
normalized_result = video.normalize_orientation(
normalized_result = await _run_blocking(
video.normalize_orientation,
str(input_material_path),
str(normalized_input_path),
)
@@ -283,6 +299,42 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
captions_path = None
async def _whisper_and_split():
"""Whisper alignment → _split_equal even material split (shared logic)"""
_update_task(task_id, message="正在生成字幕 (Whisper)...")
_captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(_captions_path)
captions_data = None
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(_captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
_captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
result = _split_equal(captions_data["segments"], material_paths)
else:
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = await _run_blocking(video._get_duration, str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0
seg_dur = audio_dur / len(material_paths)
result = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
return result, _captions_path
if is_multi:
# ══════════════════════════════════════
# Multi-material pipeline
@@ -327,83 +379,13 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
f" 与素材数量({len(material_paths)})不一致,回退自动分配"
)
# Original logic: Whisper → _split_equal
_update_task(task_id, message="正在生成字幕 (Whisper)...")
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
captions_data = None
captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
assignments = _split_equal(captions_data["segments"], material_paths)
else:
# Whisper 失败 → 按时长均分(不依赖字符对齐)
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0 # 安全兜底
seg_dur = audio_dur / len(material_paths)
assignments = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
assignments, captions_path = await _whisper_and_split()
else:
# Original logic: Whisper → _split_equal
_update_task(task_id, message="正在生成字幕 (Whisper)...")
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
captions_data = None
captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
assignments = _split_equal(captions_data["segments"], material_paths)
else:
# Whisper 失败 → 按时长均分(不依赖字符对齐)
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0 # 安全兜底
seg_dur = audio_dur / len(material_paths)
assignments = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
assignments, captions_path = await _whisper_and_split()
# Extend segments to cover the full audio range (first segment starts at 0, last ends at the audio tail)
audio_duration = video._get_duration(str(audio_path))
audio_duration = await _run_blocking(video._get_duration, str(audio_path))
if assignments and audio_duration > 0:
assignments[0]["start"] = 0.0
assignments[-1]["end"] = audio_duration
@@ -427,9 +409,7 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
await _download_material(assignment["material_path"], material_local)
normalized_material = temp_dir / f"{task_id}_material_{i}_norm.mp4"
loop = asyncio.get_event_loop()
normalized_result = await loop.run_in_executor(
None,
normalized_result = await _run_blocking(
video.normalize_orientation,
str(material_local),
str(normalized_material),
@@ -457,22 +437,21 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
logger.info(f"[MultiMat] 素材分辨率不一致,统一到 {base_res[0]}x{base_res[1]}")
# ── Step 2: trim each segment's material to its duration, in parallel ──
prepared_segments: List[Path] = [None] * num_segments
prepared_segments: List[Optional[Path]] = [None] * num_segments
async def _prepare_one_segment(i: int, assignment: dict):
"""Trim/loop a single material to the target duration"""
seg_dur = assignment["end"] - assignment["start"]
prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
temp_files.append(prepared_path)
prepare_target_res = None if resolutions[i] == base_res else base_res
loop = asyncio.get_event_loop()
await loop.run_in_executor(
None,
await _run_blocking(
video.prepare_segment,
str(material_locals[i]),
seg_dur,
str(prepared_path),
base_res,
prepare_target_res,
assignment.get("source_start", 0.0),
assignment.get("source_end"),
25,
@@ -497,10 +476,14 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
_update_task(task_id, progress=50, message="正在拼接素材片段...")
concat_path = temp_dir / f"{task_id}_concat.mp4"
temp_files.append(concat_path)
video.concat_videos(
[str(p) for p in prepared_segments],
prepared_segment_paths = [str(p) for p in prepared_segments if p is not None]
if len(prepared_segment_paths) != num_segments:
raise RuntimeError("Multi-material: prepared segments mismatch")
await _run_blocking(
video.concat_videos,
prepared_segment_paths,
str(concat_path),
target_fps=25,
25,
)
# ── Step 3: a single LatentSync inference pass ──
@@ -510,7 +493,12 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
_update_task(task_id, progress=55, message="正在合成唇形 (LatentSync)...")
print(f"[LipSync] Multi-material: single LatentSync on concatenated video")
try:
await lipsync.generate(str(concat_path), str(audio_path), str(lipsync_video_path))
await lipsync.generate(
str(concat_path),
str(audio_path),
str(lipsync_video_path),
model_mode=req.lipsync_model,
)
except Exception as e:
logger.warning(f"[LipSync] Failed, fallback to concat without lipsync: {e}")
import shutil
@@ -544,18 +532,22 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
single_source_end = req.custom_assignments[0].source_end
_update_task(task_id, progress=20, message="正在准备素材片段...")
audio_dur = video._get_duration(str(audio_path))
audio_dur = await _run_blocking(video._get_duration, str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0
single_res = await _run_blocking(video.get_resolution, str(input_material_path))
single_target_res = None if single_res == target_resolution else target_resolution
prepared_single_path = temp_dir / f"{task_id}_prepared_single.mp4"
temp_files.append(prepared_single_path)
video.prepare_segment(
await _run_blocking(
video.prepare_segment,
str(input_material_path),
audio_dur,
str(prepared_single_path),
target_resolution=target_resolution,
source_start=single_source_start,
source_end=single_source_end,
single_target_res,
single_source_start,
single_source_end,
None,
)
input_material_path = prepared_single_path
@@ -568,7 +560,18 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
if is_ready:
print(f"[LipSync] Starting LatentSync inference...")
_update_task(task_id, progress=35, message="正在运行 LatentSync 推理...")
await lipsync.generate(str(input_material_path), str(audio_path), str(lipsync_video_path))
try:
await lipsync.generate(
str(input_material_path),
str(audio_path),
str(lipsync_video_path),
model_mode=req.lipsync_model,
)
except Exception as e:
logger.warning(f"[LipSync] Failed on single-material, fallback to prepared video: {e}")
_update_task(task_id, message="唇形同步失败,使用原始视频...")
import shutil
shutil.copy(str(input_material_path), str(lipsync_video_path))
else:
print(f"[LipSync] LatentSync not ready, copying original video")
_update_task(task_id, message="唇形同步不可用,使用原始视频...")
@@ -589,6 +592,7 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
final_audio_path = audio_path
_whisper_task = None
_bgm_task = None
mix_output_path: Optional[Path] = None
# In single-material mode Whisper has not run yet; start it here in parallel with BGM
need_whisper = not is_multi and req.enable_subtitles and captions_path is None
@@ -629,10 +633,8 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
async def _run_bgm():
_update_task(task_id, message="正在合成背景音乐...", progress=86)
loop = asyncio.get_event_loop()
try:
await loop.run_in_executor(
None,
await _run_blocking(
video.mix_audio,
_voice_path,
_bgm_path,
@@ -658,7 +660,7 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
captions_path = None
result_idx += 1
if _bgm_task is not None:
if results[result_idx]:
if results[result_idx] and mix_output_path is not None:
final_audio_path = mix_output_path
@@ -721,16 +723,28 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
f"{task_id}_secondary_title_font"
)
# Clean up temporary font files
for prefix in [f"{task_id}_subtitle_font", f"{task_id}_title_font", f"{task_id}_secondary_title_font"]:
for ext in [".ttf", ".otf", ".woff", ".woff2"]:
font_tmp = temp_dir / f"{prefix}{ext}"
if font_tmp.exists():
temp_files.append(font_tmp)
final_output_local_path = temp_dir / f"{task_id}_output.mp4"
temp_files.append(final_output_local_path)
needs_audio_compose = str(final_audio_path) != str(audio_path)
if use_remotion:
_update_task(task_id, message="正在合成视频 (Remotion)...", progress=87)
remotion_input_path = lipsync_video_path
composed_video_path = temp_dir / f"{task_id}_composed.mp4"
temp_files.append(composed_video_path)
await video.compose(str(lipsync_video_path), str(final_audio_path), str(composed_video_path))
if needs_audio_compose:
composed_video_path = temp_dir / f"{task_id}_composed.mp4"
temp_files.append(composed_video_path)
await video.compose(str(lipsync_video_path), str(final_audio_path), str(composed_video_path))
remotion_input_path = composed_video_path
else:
logger.info("[Pipeline] Audio unchanged, skip pre-Remotion compose")
remotion_health = await remotion_service.check_health()
if remotion_health.get("ready"):
@@ -747,7 +761,7 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
title_duration = max(0.5, min(float(req.title_duration or 4.0), 30.0))
await remotion_service.render(
video_path=str(composed_video_path),
video_path=str(remotion_input_path),
output_path=str(final_output_local_path),
captions_path=str(captions_path) if captions_path else None,
title=req.title,
@@ -765,15 +779,18 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
except Exception as e:
logger.warning(f"Remotion render failed, using FFmpeg fallback: {e}")
import shutil
shutil.copy(str(composed_video_path), final_output_local_path)
shutil.copy(str(remotion_input_path), str(final_output_local_path))
else:
logger.warning(f"Remotion not ready: {remotion_health.get('error')}, using FFmpeg")
import shutil
shutil.copy(str(composed_video_path), final_output_local_path)
shutil.copy(str(remotion_input_path), str(final_output_local_path))
else:
_update_task(task_id, message="正在合成最终视频...", progress=90)
await video.compose(str(lipsync_video_path), str(final_audio_path), str(final_output_local_path))
if needs_audio_compose:
await video.compose(str(lipsync_video_path), str(final_audio_path), str(final_output_local_path))
else:
import shutil
shutil.copy(str(lipsync_video_path), str(final_output_local_path))
total_time = time.time() - start_time

View File

@@ -11,12 +11,12 @@ import asyncio
import httpx
from pathlib import Path
from loguru import logger
from typing import Optional
from typing import Optional, Literal
from app.core.config import settings
class LipSyncService:
class LipSyncService:
"""Lip-sync service - LatentSync 1.6 + MuseTalk 1.5 hybrid scheme"""
def __init__(self):
@@ -121,139 +121,43 @@ class LipSyncService:
logger.warning(f"⚠️ 视频循环异常: {e}")
return video_path
def _preprocess_video(self, video_path: str, output_path: str, target_height: int = 720) -> str:
"""
视频预处理:压缩视频以加速后续处理
- 限制最大高度为 target_height (默认720p)
- 保持宽高比
- 使用快速编码预设
Returns: 预处理后的视频路径
"""
import subprocess
import json
# 获取视频信息 (使用 JSON 格式更可靠)
probe_cmd = [
"ffprobe", "-v", "error",
"-select_streams", "v:0",
"-show_entries", "stream=height,width",
"-of", "json",
video_path
]
try:
result = subprocess.run(probe_cmd, capture_output=True, text=True, timeout=10)
if result.returncode != 0:
logger.warning(f"⚠️ ffprobe 失败: {result.stderr[:100]}")
return video_path
probe_data = json.loads(result.stdout)
streams = probe_data.get("streams", [])
if not streams:
logger.warning("⚠️ 无法获取视频流信息,跳过预处理")
return video_path
current_height = streams[0].get("height", 0)
current_width = streams[0].get("width", 0)
if current_height == 0:
logger.warning("⚠️ 视频高度为 0跳过预处理")
return video_path
logger.info(f"📹 原始视频分辨率: {current_width}×{current_height}")
except json.JSONDecodeError as e:
logger.warning(f"⚠️ ffprobe 输出解析失败: {e}")
return video_path
except subprocess.TimeoutExpired:
logger.warning("⚠️ ffprobe 超时,跳过预处理")
return video_path
except Exception as e:
logger.warning(f"⚠️ 获取视频信息失败: {e}")
return video_path
# 如果视频已经足够小,跳过压缩
if current_height <= target_height:
logger.info(f"📹 视频高度 {current_height}p <= {target_height}p无需压缩")
return video_path
logger.info(f"📹 预处理视频: {current_height}p → {target_height}p")
# 使用 FFmpeg 压缩
compress_cmd = [
"ffmpeg", "-y",
"-i", video_path,
"-vf", f"scale=-2:{target_height}", # 保持宽高比,高度设为 target_height
"-c:v", "libx264",
"-preset", "ultrafast", # 最快编码速度
"-crf", "23", # 质量因子
"-c:a", "copy", # 音频直接复制
output_path
]
try:
result = subprocess.run(
compress_cmd,
capture_output=True,
text=True,
timeout=120 # 增加超时时间到2分钟
)
if result.returncode == 0 and Path(output_path).exists():
original_size = Path(video_path).stat().st_size / 1024 / 1024
new_size = Path(output_path).stat().st_size / 1024 / 1024
logger.info(f"✅ 视频压缩完成: {original_size:.1f}MB → {new_size:.1f}MB")
return output_path
else:
logger.warning(f"⚠️ 视频压缩失败: {result.stderr[:200]}")
return video_path
except subprocess.TimeoutExpired:
logger.warning("⚠️ 视频压缩超时,使用原始视频")
return video_path
except Exception as e:
logger.warning(f"⚠️ 视频压缩异常: {e}")
return video_path
async def generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int = 25,
model_mode: Literal["default", "fast", "advanced"] = "default",
) -> str:
"""Generate a lip-synced video"""
logger.info(f"🎬 唇形同步任务: {Path(video_path).name} + {Path(audio_path).name}")
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
normalized_mode: Literal["default", "fast", "advanced"] = model_mode
if normalized_mode not in ("default", "fast", "advanced"):
normalized_mode = "default"
logger.info(f"🧠 Lipsync 模式: {normalized_mode}")
if self.use_local:
return await self._local_generate(video_path, audio_path, output_path, fps, normalized_mode)
else:
return await self._remote_generate(video_path, audio_path, output_path, fps, normalized_mode)
async def generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int = 25
) -> str:
"""生成唇形同步视频"""
logger.info(f"🎬 唇形同步任务: {Path(video_path).name} + {Path(audio_path).name}")
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
if self.use_local:
return await self._local_generate(video_path, audio_path, output_path, fps)
else:
return await self._remote_generate(video_path, audio_path, output_path, fps)
async def _local_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int
) -> str:
"""使用 subprocess 调用 LatentSync conda 环境"""
# 检查前置条件
if not self._check_conda_env():
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path
if not self._check_weights():
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path
logger.info("⏳ 等待 GPU 资源 (排队中)...")
async with self._lock:
# 使用临时目录存放中间文件
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)
async def _local_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""Invoke the LatentSync conda environment via subprocess"""
logger.info("⏳ 等待 GPU 资源 (排队中)...")
async with self._lock:
# Keep intermediate files in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)
# Get audio and video durations
audio_duration = self._get_media_duration(audio_path)
@@ -268,24 +172,53 @@ class LipSyncService:
str(looped_video),
audio_duration
)
else:
actual_video_path = video_path
# 混合路由: 长视频走 MuseTalk短视频走 LatentSync
if audio_duration and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD:
logger.info(
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s路由到 MuseTalk"
)
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync长视频会较慢")
if self.use_server:
# 模式 A: 调用常驻服务 (加速模式)
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
else:
actual_video_path = video_path
# Model routing
force_musetalk = model_mode == "fast"
force_latentsync = model_mode == "advanced"
auto_to_musetalk = (
model_mode == "default"
and audio_duration is not None
and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD
)
if force_musetalk:
logger.info("⚡ 强制快速模型MuseTalk")
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,快速模型回退到 LatentSync")
elif auto_to_musetalk:
logger.info(
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s路由到 MuseTalk"
)
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync长视频会较慢")
elif force_latentsync:
logger.info("🎯 强制高级模型LatentSync")
# Check LatentSync prerequisites (only when falling back to or using LatentSync)
if not self._check_conda_env():
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path
if not self._check_weights():
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path
if self.use_server:
# Mode A: call the resident server (accelerated mode)
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
@@ -480,15 +413,18 @@ class LipSyncService:
"请确保 LatentSync 服务已启动 (cd models/LatentSync && python scripts/server.py)"
)
async def _remote_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int
) -> str:
"""调用远程 LatentSync API 服务"""
logger.info(f"📡 调用远程 API: {self.api_url}")
async def _remote_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""Call the remote LatentSync API service"""
if model_mode == "fast":
logger.warning("⚠️ 远程模式未接入 MuseTalk快速模型将使用远程 LatentSync")
logger.info(f"📡 调用远程 API: {self.api_url}")
try:
async with httpx.AsyncClient(timeout=600.0) as client:

View File

@@ -71,7 +71,8 @@ class RemotionService:
"--video", str(video_path),
"--output", str(output_path),
"--fps", str(fps),
"--enableSubtitles", str(enable_subtitles).lower()
"--enableSubtitles", str(enable_subtitles).lower(),
"--concurrency", "4"
])
if captions_path:

View File

@@ -1,6 +1,7 @@
"""
Video composition service
"""
import asyncio
import os
import subprocess
import json
@@ -96,7 +97,7 @@ class VideoService:
"-map", "0:a?",
"-c:v", "libx264",
"-preset", "fast",
"-crf", "23",
"-crf", "18",
"-c:a", "copy",
"-movflags", "+faststart",
output_path,
@@ -118,18 +119,21 @@ class VideoService:
cmd_str = ' '.join(shlex.quote(str(c)) for c in cmd)
logger.debug(f"FFmpeg CMD: {cmd_str}")
try:
# Synchronous call for BackgroundTasks compatibility
result = subprocess.run(
cmd,
shell=False,
capture_output=True,
text=True,
encoding='utf-8',
timeout=600,
)
if result.returncode != 0:
logger.error(f"FFmpeg Error: {result.stderr}")
return False
return True
except subprocess.TimeoutExpired:
logger.error("FFmpeg timed out after 600s")
return False
except Exception as e:
logger.error(f"FFmpeg Exception: {e}")
return False
@@ -148,6 +152,7 @@ class VideoService:
cmd,
capture_output=True,
text=True,
timeout=30,
)
return float(result.stdout.strip())
except Exception:
@@ -195,9 +200,10 @@ class VideoService:
"""Compose the final video"""
# Ensure output dir
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
video_duration = self._get_duration(video_path)
audio_duration = self._get_duration(audio_path)
loop = asyncio.get_running_loop()
video_duration = await loop.run_in_executor(None, self._get_duration, video_path)
audio_duration = await loop.run_in_executor(None, self._get_duration, audio_path)
# Audio loop if needed
loop_count = 1
@@ -221,21 +227,25 @@ class VideoService:
# Previous state: subtitles disabled due to font issues
# if subtitle_path: ...
# Audio map with high quality encoding
# Use stream copy when no looping is needed (near-instant); re-encode only when looping
if loop_count > 1:
cmd.extend([
"-c:v", "libx264", "-preset", "fast", "-crf", "18",
])
else:
cmd.extend(["-c:v", "copy"])
cmd.extend([
"-c:v", "libx264",
"-preset", "medium", # 平衡速度与压缩效率
"-crf", "20", # 最终输出:高质量(肉眼无损)
"-c:a", "aac",
"-b:a", "192k", # 音频比特率
"-shortest"
"-b:a", "192k",
"-shortest",
"-map", "0:v", "-map", "1:a",
])
# Use audio from input 1
cmd.extend(["-map", "0:v", "-map", "1:a"])
cmd.append(output_path)
if self._run_ffmpeg(cmd):
ok = await loop.run_in_executor(None, self._run_ffmpeg, cmd)
if ok:
return output_path
else:
raise RuntimeError("FFmpeg composition failed")
@@ -260,12 +270,7 @@ class VideoService:
"-fflags", "+genpts",
"-i", str(list_path),
"-an",
"-vsync", "cfr",
"-r", str(target_fps),
"-c:v", "libx264",
"-preset", "fast",
"-crf", "23",
"-pix_fmt", "yuv420p",
"-c:v", "copy",
"-movflags", "+faststart",
output_path,
]
@@ -339,6 +344,7 @@ class VideoService:
needs_loop = target_duration > available
needs_scale = target_resolution is not None
needs_fps = bool(target_fps and target_fps > 0)
target_fps_value = int(target_fps) if needs_fps and target_fps is not None else None
has_source_end = clip_end < video_dur
# When looping is needed and a trim range exists, cut the clip first, then loop the trimmed file
@@ -353,7 +359,7 @@ class VideoService:
"-i", video_path,
"-t", str(available),
"-an",
"-c:v", "libx264", "-preset", "fast", "-crf", "23",
"-c:v", "libx264", "-preset", "fast", "-crf", "18",
trim_temp,
]
if not self._run_ffmpeg(trim_cmd):
@@ -373,20 +379,20 @@ class VideoService:
cmd.extend(["-i", actual_input, "-t", str(target_duration), "-an"])
filters = []
if needs_fps:
filters.append(f"fps={int(target_fps)}")
if target_fps_value is not None:
filters.append(f"fps={target_fps_value}")
if needs_scale:
w, h = target_resolution
filters.append(f"scale={w}:{h}:force_original_aspect_ratio=decrease,pad={w}:{h}:(ow-iw)/2:(oh-ih)/2")
if filters:
cmd.extend(["-vf", ",".join(filters)])
if needs_fps:
cmd.extend(["-vsync", "cfr", "-r", str(int(target_fps))])
if target_fps_value is not None:
cmd.extend(["-vsync", "cfr", "-r", str(target_fps_value)])
# Looping, scaling, a non-zero start point, or an FPS change forces a re-encode; otherwise stream copy keeps the original quality
if needs_loop or needs_scale or source_start > 0 or has_source_end or needs_fps:
cmd.extend(["-c:v", "libx264", "-preset", "fast", "-crf", "23"])
cmd.extend(["-c:v", "libx264", "-preset", "fast", "-crf", "18"])
else:
cmd.extend(["-c:v", "copy"])

View File

@@ -32,6 +32,7 @@ class VoiceCloneService:
ref_text: str,
language: str,
speed: float = 1.0,
instruct_text: str = "",
max_retries: int = 4,
) -> bytes:
timeout = httpx.Timeout(240.0)
@@ -39,15 +40,18 @@ class VoiceCloneService:
for attempt in range(max_retries):
try:
async with httpx.AsyncClient(timeout=timeout) as client:
data = {
"text": text,
"ref_text": ref_text,
"language": language,
"speed": str(speed),
}
if instruct_text:
data["instruct_text"] = instruct_text
response = await client.post(
f"{self.base_url}/generate",
files={"ref_audio": ("ref.wav", ref_audio_data, "audio/wav")},
data={
"text": text,
"ref_text": ref_text,
"language": language,
"speed": str(speed),
},
data=data,
)
retryable = False
@@ -99,6 +103,7 @@ class VoiceCloneService:
output_path: str,
language: str = "Chinese",
speed: float = 1.0,
instruct_text: str = "",
) -> str:
"""
Generate speech via voice cloning
@@ -132,6 +137,7 @@ class VoiceCloneService:
ref_text=ref_text,
language=language,
speed=speed,
instruct_text=instruct_text,
)
with open(output_path, "wb") as f:
f.write(audio_bytes)
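The client only sends `instruct_text` in the multipart form when it is non-empty, so the server's empty-string default keeps plain zero-shot cloning. A sketch of that form construction (the helper name is hypothetical):

```python
def build_tts_form(text, ref_text, language, speed, instruct_text=""):
    """Build the multipart form fields for the CosyVoice /generate endpoint.

    instruct_text is included only when non-empty, which leaves the server
    on its zero-shot (non-instruct) code path by default.
    """
    data = {
        "text": text,
        "ref_text": ref_text,
        "language": language,
        "speed": str(speed),  # form fields are strings
    }
    if instruct_text:
        data["instruct_text"] = instruct_text
    return data
```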

View File

@@ -151,6 +151,46 @@ def split_segment_to_lines(words: List[dict], max_chars: int = MAX_CHARS_PER_LIN
return segments
def smooth_word_timestamps(words: List[dict]) -> List[dict]:
"""
Post-process timestamp smoothing:
1. Guarantee strictly monotonically increasing timestamps
2. Remove tiny jitter in Whisper output (a char's end > the next char's start)
3. Bridge inter-character gaps so subtitle highlighting doesn't "skip"
"""
if len(words) <= 1:
return words
result = [words[0].copy()]
for i in range(1, len(words)):
w = words[i].copy()
prev = result[-1]
# Ensure start is not earlier than the previous char's start (monotonic)
if w["start"] < prev["start"]:
w["start"] = prev["start"]
# Ensure start is not earlier than the previous char's end
if w["start"] < prev["end"]:
# The two chars overlap: split at the midpoint
mid = (prev["end"] + w["start"]) / 2
prev["end"] = round(mid, 3)
w["start"] = round(mid, 3)
# Bridge inter-char gaps (join directly when gap < 50ms to avoid highlight skipping)
gap = w["start"] - prev["end"]
if 0 < gap < 0.05:
prev["end"] = w["start"]
# Ensure end >= start
if w["end"] < w["start"]:
w["end"] = w["start"] + 0.05
result.append(w)
return result
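The three smoothing rules can be exercised on a toy word list. This is a compact, self-contained restatement of the logic above, for illustration only:

```python
def smooth_words(words):
    """Compact restatement of the smoothing rules: monotonic starts,
    midpoint-split overlaps, bridged sub-50ms gaps, end >= start."""
    if len(words) <= 1:
        return [w.copy() for w in words]
    out = [words[0].copy()]
    for cur in words[1:]:
        w, prev = cur.copy(), out[-1]
        w["start"] = max(w["start"], prev["start"])   # 1) monotonic starts
        if w["start"] < prev["end"]:                  # 2) overlap -> split at midpoint
            mid = round((prev["end"] + w["start"]) / 2, 3)
            prev["end"], w["start"] = mid, mid
        if 0 < w["start"] - prev["end"] < 0.05:       # 3) bridge gap < 50ms
            prev["end"] = w["start"]
        if w["end"] < w["start"]:                     # safety: end >= start
            w["end"] = w["start"] + 0.05
        out.append(w)
    return out
```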
class WhisperService:
"""字幕对齐服务(基于 faster-whisper"""
@@ -219,6 +259,8 @@ class WhisperService:
language=language,
word_timestamps=True, # enable char-level timestamps
vad_filter=True, # enable VAD silence filtering
beam_size=8, # wider beam search improves timestamp accuracy
# keep condition_on_previous_text at the default True to avoid systematically early timestamps
)
logger.info(f"Detected language: {info.language} (prob: {info.language_probability:.2f})")
@@ -244,6 +286,7 @@ class WhisperService:
all_words.extend(chars)
if all_words:
all_words = smooth_word_timestamps(all_words)
line_segments = split_segment_to_lines(all_words, max_chars)
all_segments.extend(line_segments)
@@ -268,6 +311,14 @@ class WhisperService:
w_starts = [c["start"] for c in whisper_chars]
w_final_end = whisper_chars[-1]["end"]
# Sanity check on the character-count ratio
ratio = n_o / n_w
if ratio > 1.5 or ratio < 0.67:
logger.warning(
f"original_text vs Whisper char-count ratio abnormal: {n_o}/{n_w} = {ratio:.2f}, "
f"subtitle timestamp accuracy may degrade"
)
logger.info(
f"Using original_text for subtitles (len={len(original_text)}), "
f"rhythm-mapping {n_o} orig chars onto {n_w} Whisper chars, "
@@ -302,11 +353,21 @@ class WhisperService:
"end": round(t_end, 3),
})
all_segments = split_segment_to_lines(remapped, max_chars)
# Clamp per-character duration to prevent extreme drift when the ratio is abnormal
MIN_CHAR_DURATION = 0.04 # 40ms (one frame @ 25fps)
MAX_CHAR_DURATION = 0.8 # 800ms
for r in remapped:
dur = r["end"] - r["start"]
if dur < MIN_CHAR_DURATION:
r["end"] = round(r["start"] + MIN_CHAR_DURATION, 3)
elif dur > MAX_CHAR_DURATION:
r["end"] = round(r["start"] + MAX_CHAR_DURATION, 3)
all_segments = split_segment_to_lines(smooth_word_timestamps(remapped), max_chars)
logger.info(f"Rebuilt {len(all_segments)} subtitle segments (rhythm-mapped)")
elif orig_chars:
# Not enough Whisper chars; fall back to linear interpolation
all_segments = split_segment_to_lines(orig_chars, max_chars)
all_segments = split_segment_to_lines(smooth_word_timestamps(orig_chars), max_chars)
logger.info(f"Rebuilt {len(all_segments)} subtitle segments (linear fallback)")
logger.info(f"Generated {len(all_segments)} subtitle segments")
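The proportional mapping step itself is only partially visible in this hunk. The core idea — place each original character onto the Whisper timestamp array at its proportional index — can be sketched as follows (illustrative only, not the repo's exact code):

```python
def rhythm_map(orig_chars, w_starts, w_final_end):
    """Map n_o original characters proportionally onto n_w Whisper char timings.

    w_starts are the per-char start times from Whisper; w_final_end is the
    last char's end time. Each original char inherits the timing of the
    Whisper char at its scaled index.
    """
    n_o, n_w = len(orig_chars), len(w_starts)
    remapped = []
    for i, ch in enumerate(orig_chars):
        # scale this char's fractional position into the Whisper index range
        j = min(int(i * n_w / n_o), n_w - 1)
        start = w_starts[j]
        end = w_starts[j + 1] if j + 1 < n_w else w_final_end
        remapped.append({"word": ch, "start": round(start, 3), "end": round(end, 3)})
    return remapped
```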

View File

@@ -127,6 +127,7 @@ export const useGeneratedAudios = ({
ref_text?: string;
language: string;
speed?: number;
instruct_text?: string;
}) => {
setIsGeneratingAudio(true);
setAudioTask({ status: "pending", progress: 0, message: "正在提交..." });

View File

@@ -124,6 +124,8 @@ interface RefAudio {
created_at: number;
}
type LipsyncModelMode = "default" | "fast" | "advanced";
import type { Material } from "@/shared/types/material";
export const useHomeController = () => {
@@ -155,6 +157,7 @@ export const useHomeController = () => {
const [titleDisplayMode, setTitleDisplayMode] = useState<"short" | "persistent">("short");
const [subtitleBottomMargin, setSubtitleBottomMargin] = useState<number>(80);
const [outputAspectRatio, setOutputAspectRatio] = useState<"9:16" | "16:9">("9:16");
const [lipsyncModelMode, setLipsyncModelMode] = useState<LipsyncModelMode>("default");
const [showStylePreview, setShowStylePreview] = useState<boolean>(false);
const [materialDimensions, setMaterialDimensions] = useState<{ width: number; height: number } | null>(null);
@@ -182,6 +185,9 @@ export const useHomeController = () => {
// Speech-rate control
const [speed, setSpeed] = useState<number>(1.0);
// Emotion/tone control (voice-clone mode only)
const [emotion, setEmotion] = useState<string>("normal");
// ClipTrimmer modal state
const [clipTrimmerOpen, setClipTrimmerOpen] = useState(false);
const [clipTrimmerSegmentId, setClipTrimmerSegmentId] = useState<string | null>(null);
@@ -400,13 +406,14 @@ export const useHomeController = () => {
});
// Video URL of the first timeline segment (for frame-capture preview)
// Use the first timeline segment when present; otherwise (e.g. no voice-over selected) fall back to selectedMaterials[0]
// Use the backend proxy URL (same-origin, avoids CORS canvas taint)
const firstTimelineMaterialUrl = useMemo(() => {
const firstSeg = timelineSegments[0];
const matId = firstSeg?.materialId ?? selectedMaterials[0];
if (!matId) return null;
const mat = materials.find((m) => m.id === matId);
return mat?.path ? resolveMediaUrl(mat.path) : null;
if (!mat) return null;
return `/api/materials/stream/${mat.id}`;
}, [materials, timelineSegments, selectedMaterials]);
const materialPosterUrl = useVideoFrameCapture(showStylePreview ? firstTimelineMaterialUrl : null);
@@ -488,6 +495,8 @@ export const useHomeController = () => {
setSubtitleBottomMargin,
outputAspectRatio,
setOutputAspectRatio,
lipsyncModelMode,
setLipsyncModelMode,
selectedBgmId,
setSelectedBgmId,
bgmVolume,
@@ -501,6 +510,8 @@ export const useHomeController = () => {
setSelectedAudioId,
speed,
setSpeed,
emotion,
setEmotion,
});
const { savedScripts, saveScript, deleteScript: deleteSavedScript } = useSavedScripts(storageKey);
@@ -875,6 +886,13 @@ export const useHomeController = () => {
return;
}
const emotionToInstruct: Record<string, string> = {
normal: "",
happy: "You are a helpful assistant. 请非常开心地说一句话。<|endofprompt|>",
sad: "You are a helpful assistant. 请非常伤心地说一句话。<|endofprompt|>",
angry: "You are a helpful assistant. 请非常生气地说一句话。<|endofprompt|>",
};
const params = {
text: text.trim(),
tts_mode: ttsMode,
@@ -883,6 +901,7 @@ export const useHomeController = () => {
ref_text: ttsMode === "voiceclone" ? refText : undefined,
language: textLang,
speed: ttsMode === "voiceclone" ? speed : undefined,
instruct_text: ttsMode === "voiceclone" ? emotionToInstruct[emotion] || "" : undefined,
};
await generateAudio(params);
};
@@ -920,6 +939,7 @@ export const useHomeController = () => {
text: selectedAudio.text || text,
generated_audio_id: selectedAudio.id,
language: selectedAudio.language || textLang,
lipsync_model: lipsyncModelMode,
title: videoTitle.trim() || undefined,
enable_subtitles: true,
output_aspect_ratio: outputAspectRatio,
@@ -1140,6 +1160,8 @@ export const useHomeController = () => {
setSubtitleBottomMargin,
outputAspectRatio,
setOutputAspectRatio,
lipsyncModelMode,
setLipsyncModelMode,
resolveAssetUrl,
getFontFormat,
buildTextShadow,
@@ -1214,6 +1236,8 @@ export const useHomeController = () => {
selectAudio,
speed,
setSpeed,
emotion,
setEmotion,
timelineSegments,
reorderSegments,
setSourceRange,

View File

@@ -52,6 +52,8 @@ interface UseHomePersistenceOptions {
setSubtitleBottomMargin: React.Dispatch<React.SetStateAction<number>>;
outputAspectRatio: '9:16' | '16:9';
setOutputAspectRatio: React.Dispatch<React.SetStateAction<'9:16' | '16:9'>>;
lipsyncModelMode: 'default' | 'fast' | 'advanced';
setLipsyncModelMode: React.Dispatch<React.SetStateAction<'default' | 'fast' | 'advanced'>>;
selectedBgmId: string;
setSelectedBgmId: React.Dispatch<React.SetStateAction<string>>;
bgmVolume: number;
@@ -65,6 +67,8 @@ interface UseHomePersistenceOptions {
setSelectedAudioId: React.Dispatch<React.SetStateAction<string | null>>;
speed: number;
setSpeed: React.Dispatch<React.SetStateAction<number>>;
emotion: string;
setEmotion: React.Dispatch<React.SetStateAction<string>>;
}
export const useHomePersistence = ({
@@ -109,6 +113,8 @@ export const useHomePersistence = ({
setSubtitleBottomMargin,
outputAspectRatio,
setOutputAspectRatio,
lipsyncModelMode,
setLipsyncModelMode,
selectedBgmId,
setSelectedBgmId,
bgmVolume,
@@ -122,6 +128,8 @@ export const useHomePersistence = ({
setSelectedAudioId,
speed,
setSpeed,
emotion,
setEmotion,
}: UseHomePersistenceOptions) => {
const [isRestored, setIsRestored] = useState(false);
@@ -152,7 +160,9 @@ export const useHomePersistence = ({
const savedTitleDisplayMode = localStorage.getItem(`vigent_${storageKey}_titleDisplayMode`);
const savedSubtitleBottomMargin = localStorage.getItem(`vigent_${storageKey}_subtitleBottomMargin`);
const savedOutputAspectRatio = localStorage.getItem(`vigent_${storageKey}_outputAspectRatio`);
const savedLipsyncModelMode = localStorage.getItem(`vigent_${storageKey}_lipsyncModelMode`);
const savedSpeed = localStorage.getItem(`vigent_${storageKey}_speed`);
const savedEmotion = localStorage.getItem(`vigent_${storageKey}_emotion`);
setText(savedText || "大家好,欢迎来到我的频道,今天给大家分享一些有趣的内容。");
setVideoTitle(savedTitle ? clampTitle(savedTitle) : "");
@@ -230,11 +240,21 @@ export const useHomePersistence = ({
setOutputAspectRatio(savedOutputAspectRatio);
}
if (
savedLipsyncModelMode === 'default'
|| savedLipsyncModelMode === 'fast'
|| savedLipsyncModelMode === 'advanced'
) {
setLipsyncModelMode(savedLipsyncModelMode);
}
if (savedSpeed) {
const parsed = parseFloat(savedSpeed);
if (!Number.isNaN(parsed)) setSpeed(parsed);
}
if (savedEmotion) setEmotion(savedEmotion);
// eslint-disable-next-line react-hooks/set-state-in-effect
setIsRestored(true);
}, [
@@ -249,6 +269,7 @@ export const useHomePersistence = ({
setSelectedVideoId,
setSelectedAudioId,
setSpeed,
setEmotion,
setSubtitleFontSize,
setSubtitleSizeLocked,
setText,
@@ -262,6 +283,7 @@ export const useHomePersistence = ({
setTitleDisplayMode,
setSubtitleBottomMargin,
setOutputAspectRatio,
setLipsyncModelMode,
setTtsMode,
setVideoTitle,
setVideoSecondaryTitle,
@@ -377,6 +399,12 @@ export const useHomePersistence = ({
}
}, [outputAspectRatio, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_lipsyncModelMode`, lipsyncModelMode);
}
}, [lipsyncModelMode, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_bgmId`, selectedBgmId);
@@ -427,5 +455,11 @@ export const useHomePersistence = ({
}
}, [speed, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_emotion`, emotion);
}
}, [emotion, storageKey, isRestored]);
return { isRestored };
};

View File

@@ -18,7 +18,6 @@ export function useVideoFrameCapture(videoUrl: string | null): string | null {
let isActive = true;
const video = document.createElement("video");
video.crossOrigin = "anonymous";
video.muted = true;
video.preload = "auto";
video.playsInline = true;

View File

@@ -1,10 +1,14 @@
import { Rocket } from "lucide-react";
type LipsyncModelMode = "default" | "fast" | "advanced";
interface GenerateActionBarProps {
isGenerating: boolean;
progress: number;
disabled: boolean;
materialCount?: number;
modelMode: LipsyncModelMode;
onModelModeChange: (value: LipsyncModelMode) => void;
onGenerate: () => void;
}
@@ -13,45 +17,61 @@ export function GenerateActionBar({
progress,
disabled,
materialCount = 1,
modelMode,
onModelModeChange,
onGenerate,
}: GenerateActionBarProps) {
return (
<div>
<button
onClick={onGenerate}
disabled={disabled}
className={`w-full py-4 rounded-xl font-bold text-lg transition-all ${disabled
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white shadow-lg hover:shadow-purple-500/25"
}`}
>
{isGenerating ? (
<span className="flex items-center justify-center gap-3">
<svg className="animate-spin h-5 w-5" viewBox="0 0 24 24">
<circle
className="opacity-25"
cx="12"
cy="12"
r="10"
stroke="currentColor"
strokeWidth="4"
fill="none"
/>
<path
className="opacity-75"
fill="currentColor"
d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"
/>
</svg>
... {progress}%
</span>
) : (
<span className="flex items-center justify-center gap-2">
<Rocket className="h-5 w-5" />
</span>
)}
</button>
<div className="flex items-center gap-2">
<button
onClick={onGenerate}
disabled={disabled}
className={`flex-1 py-4 rounded-xl font-bold text-lg transition-all ${disabled
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white shadow-lg hover:shadow-purple-500/25"
}`}
>
{isGenerating ? (
<span className="flex items-center justify-center gap-3">
<svg className="animate-spin h-5 w-5" viewBox="0 0 24 24">
<circle
className="opacity-25"
cx="12"
cy="12"
r="10"
stroke="currentColor"
strokeWidth="4"
fill="none"
/>
<path
className="opacity-75"
fill="currentColor"
d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"
/>
</svg>
... {progress}%
</span>
) : (
<span className="flex items-center justify-center gap-2">
<Rocket className="h-5 w-5" />
</span>
)}
</button>
<select
value={modelMode}
onChange={(e) => onModelModeChange(e.target.value as LipsyncModelMode)}
disabled={isGenerating}
className="h-[58px] rounded-xl border border-white/15 bg-black/30 px-3 text-sm text-gray-200 outline-none focus:border-purple-400"
title="选择唇形模型"
>
<option value="default"></option>
<option value="fast"></option>
<option value="advanced"></option>
</select>
</div>
{!isGenerating && materialCount >= 2 && (
<p className="text-xs text-gray-400 text-center mt-1.5">
({materialCount} )

View File

@@ -23,6 +23,8 @@ interface GeneratedAudiosPanelProps {
speed: number;
onSpeedChange: (speed: number) => void;
ttsMode: string;
emotion: string;
onEmotionChange: (e: string) => void;
embedded?: boolean;
}
@@ -41,14 +43,18 @@ export function GeneratedAudiosPanel({
speed,
onSpeedChange,
ttsMode,
emotion,
onEmotionChange,
embedded = false,
}: GeneratedAudiosPanelProps) {
const [editingId, setEditingId] = useState<string | null>(null);
const [editName, setEditName] = useState("");
const [playingId, setPlayingId] = useState<string | null>(null);
const [speedOpen, setSpeedOpen] = useState(false);
const [emotionOpen, setEmotionOpen] = useState(false);
const audioRef = useRef<HTMLAudioElement | null>(null);
const speedRef = useRef<HTMLDivElement>(null);
const emotionRef = useRef<HTMLDivElement>(null);
const stopPlaying = useCallback(() => {
if (audioRef.current) {
@@ -80,6 +86,17 @@ export function GeneratedAudiosPanel({
return () => document.removeEventListener("mousedown", handler);
}, [speedOpen]);
// Close emotion dropdown on click outside
useEffect(() => {
const handler = (e: MouseEvent) => {
if (emotionRef.current && !emotionRef.current.contains(e.target as Node)) {
setEmotionOpen(false);
}
};
if (emotionOpen) document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, [emotionOpen]);
const togglePlay = (audio: GeneratedAudio, e: React.MouseEvent) => {
e.stopPropagation();
if (playingId === audio.id) {
@@ -125,12 +142,48 @@ export function GeneratedAudiosPanel({
] as const;
const currentSpeedLabel = speedOptions.find((o) => o.value === speed)?.label ?? "正常";
const emotionOptions = [
{ value: "normal", label: "正常" },
{ value: "happy", label: "欢快" },
{ value: "sad", label: "低沉" },
{ value: "angry", label: "严肃" },
] as const;
const currentEmotionLabel = emotionOptions.find((o) => o.value === emotion)?.label ?? "正常";
const content = (
<>
{embedded ? (
<>
{/* Row 1: speed + generate-audio (right-aligned) */}
{/* Row 1: emotion + speed + generate-audio (right-aligned) */}
<div className="flex justify-end items-center gap-1.5 mb-3">
{ttsMode === "voiceclone" && (
<div ref={emotionRef} className="relative">
<button
onClick={() => setEmotionOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentEmotionLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${emotionOpen ? "rotate-180" : ""}`} />
</button>
{emotionOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{emotionOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onEmotionChange(opt.value); setEmotionOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
emotion === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
{ttsMode === "voiceclone" && (
<div ref={speedRef} className="relative">
<button
@@ -192,6 +245,34 @@ export function GeneratedAudiosPanel({
</h2>
<div className="flex gap-1.5">
{ttsMode === "voiceclone" && (
<div ref={emotionRef} className="relative">
<button
onClick={() => setEmotionOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentEmotionLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${emotionOpen ? "rotate-180" : ""}`} />
</button>
{emotionOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{emotionOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onEmotionChange(opt.value); setEmotionOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
emotion === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
{ttsMode === "voiceclone" && (
<div ref={speedRef} className="relative">
<button

View File

@@ -97,6 +97,8 @@ export function HomePage() {
setTitleDisplayMode,
outputAspectRatio,
setOutputAspectRatio,
lipsyncModelMode,
setLipsyncModelMode,
resolveAssetUrl,
getFontFormat,
buildTextShadow,
@@ -168,6 +170,8 @@ export function HomePage() {
selectAudio,
speed,
setSpeed,
emotion,
setEmotion,
timelineSegments,
reorderSegments,
setSourceRange,
@@ -293,6 +297,8 @@ export function HomePage() {
speed={speed}
onSpeedChange={setSpeed}
ttsMode={ttsMode}
emotion={emotion}
onEmotionChange={setEmotion}
/>
</div>
@@ -427,6 +433,8 @@ export function HomePage() {
progress={currentTask?.progress || 0}
materialCount={selectedMaterials.length}
disabled={isGenerating || selectedMaterials.length === 0 || !selectedAudio}
modelMode={lipsyncModelMode}
onModelModeChange={setLipsyncModelMode}
onGenerate={handleGenerate}
/>
</div>

View File

@@ -174,6 +174,7 @@ async def generate(
ref_text: str = Form(...),
language: str = Form("Chinese"),
speed: float = Form(1.0),
instruct_text: str = Form(""),
):
"""
Voice-clone generation
@@ -236,16 +237,30 @@ async def generate(
# CosyVoice3 prompt_text format
prompt_text = f"You are a helpful assistant.<|endofprompt|>{ref_text}"
use_instruct = bool(instruct_text.strip())
if use_instruct:
print(f"🎭 Instruct mode: {instruct_text[:60]}...")
def _do_inference():
"""在线程池中执行推理"""
results = list(_model.inference_zero_shot(
text,
prompt_text,
ref_audio_path,
stream=False,
speed=speed,
text_frontend=True,
))
if use_instruct:
results = list(_model.inference_instruct2(
text,
instruct_text,
ref_audio_path,
stream=False,
speed=speed,
text_frontend=True,
))
else:
results = list(_model.inference_zero_shot(
text,
prompt_text,
ref_audio_path,
stream=False,
speed=speed,
text_frontend=True,
))
if not results:
raise RuntimeError("CosyVoice returned empty results")
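The branch above reduces to a small dispatch on whether `instruct_text` is non-empty. A sketch using the call shapes shown in this diff; the wrapper name is hypothetical, and the model object is assumed to expose the two CosyVoice inference methods:

```python
def pick_inference(model, text, ref_text, ref_audio_path, instruct_text="", speed=1.0):
    """Dispatch between plain zero-shot cloning and instruct-controlled cloning."""
    if instruct_text.strip():
        # instruct2 mode: the tone instruction replaces the transcript prompt
        return model.inference_instruct2(
            text, instruct_text, ref_audio_path,
            stream=False, speed=speed, text_frontend=True,
        )
    # zero-shot mode: prompt is the assistant preamble plus the reference transcript
    prompt_text = f"You are a helpful assistant.<|endofprompt|>{ref_text}"
    return model.inference_zero_shot(
        text, prompt_text, ref_audio_path,
        stream=False, speed=speed, text_frontend=True,
    )
```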

View File

@@ -253,21 +253,58 @@ class LipsyncPipeline(DiffusionPipeline):
faces = []
boxes = []
affine_matrices = []
valid_face_flags = []
print(f"Affine transforming {len(video_frames)} faces...")
for frame in tqdm.tqdm(video_frames):
face, box, affine_matrix = self.image_processor.affine_transform(frame)
faces.append(face)
boxes.append(box)
affine_matrices.append(affine_matrix)
try:
face, box, affine_matrix = self.image_processor.affine_transform(frame)
faces.append(face)
boxes.append(box)
affine_matrices.append(affine_matrix)
valid_face_flags.append(True)
except Exception:
faces.append(None)
boxes.append(None)
affine_matrices.append(None)
valid_face_flags.append(False)
valid_indices = [i for i, flag in enumerate(valid_face_flags) if flag]
if not valid_indices:
raise RuntimeError("Face not detected in any frame")
for i in range(len(faces)):
if faces[i] is not None:
continue
nearest_idx = min(valid_indices, key=lambda idx: abs(idx - i))
faces[i] = faces[nearest_idx].clone()
boxes[i] = boxes[nearest_idx]
affine_matrices[i] = affine_matrices[nearest_idx]
missing_count = len(valid_face_flags) - len(valid_indices)
if missing_count > 0:
print(
f"Warning: face not detected in {missing_count}/{len(valid_face_flags)} frames. "
"Those frames will keep original content."
)
faces = torch.stack(faces)
return faces, boxes, affine_matrices
return faces, boxes, affine_matrices, valid_face_flags
def restore_video(self, faces: torch.Tensor, video_frames: np.ndarray, boxes: list, affine_matrices: list):
def restore_video(
self,
faces: torch.Tensor,
video_frames: np.ndarray,
boxes: list,
affine_matrices: list,
valid_face_flags: Optional[list] = None,
):
video_frames = video_frames[: len(faces)]
out_frames = []
print(f"Restoring {len(faces)} faces...")
for index, face in enumerate(tqdm.tqdm(faces)):
if valid_face_flags is not None and not valid_face_flags[index]:
out_frames.append(video_frames[index])
continue
x1, y1, x2, y2 = boxes[index]
height = int(y2 - y1)
width = int(x2 - x1)
@@ -281,33 +318,37 @@ class LipsyncPipeline(DiffusionPipeline):
def loop_video(self, whisper_chunks: list, video_frames: np.ndarray):
# If the audio is longer than the video, we need to loop the video
if len(whisper_chunks) > len(video_frames):
faces, boxes, affine_matrices = self.affine_transform_video(video_frames)
faces, boxes, affine_matrices, valid_face_flags = self.affine_transform_video(video_frames)
num_loops = math.ceil(len(whisper_chunks) / len(video_frames))
loop_video_frames = []
loop_faces = []
loop_boxes = []
loop_affine_matrices = []
loop_valid_face_flags = []
for i in range(num_loops):
if i % 2 == 0:
loop_video_frames.append(video_frames)
loop_faces.append(faces)
loop_boxes += boxes
loop_affine_matrices += affine_matrices
loop_valid_face_flags += valid_face_flags
else:
loop_video_frames.append(video_frames[::-1])
loop_faces.append(faces.flip(0))
loop_boxes += boxes[::-1]
loop_affine_matrices += affine_matrices[::-1]
loop_valid_face_flags += valid_face_flags[::-1]
video_frames = np.concatenate(loop_video_frames, axis=0)[: len(whisper_chunks)]
faces = torch.cat(loop_faces, dim=0)[: len(whisper_chunks)]
boxes = loop_boxes[: len(whisper_chunks)]
affine_matrices = loop_affine_matrices[: len(whisper_chunks)]
valid_face_flags = loop_valid_face_flags[: len(whisper_chunks)]
else:
video_frames = video_frames[: len(whisper_chunks)]
faces, boxes, affine_matrices = self.affine_transform_video(video_frames)
faces, boxes, affine_matrices, valid_face_flags = self.affine_transform_video(video_frames)
return video_frames, faces, boxes, affine_matrices
return video_frames, faces, boxes, affine_matrices, valid_face_flags
@torch.no_grad()
def __call__(
@@ -367,7 +408,7 @@ class LipsyncPipeline(DiffusionPipeline):
audio_samples = read_audio(audio_path)
video_frames = read_video(video_path, use_decord=False)
video_frames, faces, boxes, affine_matrices = self.loop_video(whisper_chunks, video_frames)
video_frames, faces, boxes, affine_matrices, valid_face_flags = self.loop_video(whisper_chunks, video_frames)
synced_video_frames = []
@@ -457,7 +498,13 @@ class LipsyncPipeline(DiffusionPipeline):
)
synced_video_frames.append(decoded_latents)
synced_video_frames = self.restore_video(torch.cat(synced_video_frames), video_frames, boxes, affine_matrices)
synced_video_frames = self.restore_video(
torch.cat(synced_video_frames),
video_frames,
boxes,
affine_matrices,
valid_face_flags=valid_face_flags,
)
audio_samples_remain_length = int(synced_video_frames.shape[0] / video_fps * audio_sample_rate)
audio_samples = audio_samples[:audio_samples_remain_length].cpu().numpy()
@@ -473,5 +520,5 @@ class LipsyncPipeline(DiffusionPipeline):
sf.write(os.path.join(temp_dir, "audio.wav"), audio_samples, audio_sample_rate)
command = f"ffmpeg -y -loglevel error -nostdin -i {os.path.join(temp_dir, 'video.mp4')} -i {os.path.join(temp_dir, 'audio.wav')} -c:v libx264 -crf 18 -c:a aac -q:v 0 -q:a 0 {video_out_path}"
command = f"ffmpeg -y -loglevel error -nostdin -i {os.path.join(temp_dir, 'video.mp4')} -i {os.path.join(temp_dir, 'audio.wav')} -c:v copy -c:a aac -q:a 0 {video_out_path}"
subprocess.run(command, shell=True)
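The fallback in `affine_transform_video` above fills frames where face detection failed with the data from the nearest detected frame. The index-selection logic reduces to a small helper; this is an illustrative sketch, not the pipeline's exact code:

```python
def fill_from_nearest(values, valid_flags):
    """Replace entries whose flag is False with the value at the nearest valid index."""
    valid = [i for i, ok in enumerate(valid_flags) if ok]
    if not valid:
        # mirrors the pipeline's "Face not detected in any frame" error
        raise RuntimeError("no valid entries")
    return [
        values[i] if ok else values[min(valid, key=lambda j: abs(j - i))]
        for i, ok in enumerate(valid_flags)
    ]
```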

View File

@@ -49,11 +49,22 @@ def read_video(video_path: str, change_fps=True, use_decord=True):
if os.path.exists(temp_dir):
shutil.rmtree(temp_dir)
os.makedirs(temp_dir, exist_ok=True)
command = (
f"ffmpeg -loglevel error -y -nostdin -i {video_path} -r 25 -crf 18 {os.path.join(temp_dir, 'video.mp4')}"
)
subprocess.run(command, shell=True)
target_video_path = os.path.join(temp_dir, "video.mp4")
# Probe the input video FPS; skip re-encoding when it is already 25fps
cap = cv2.VideoCapture(video_path)
current_fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()
if abs(current_fps - 25.0) < 0.5:
# Already 25fps: use the original file directly, avoiding one lossy re-encode
print(f"Video already at {current_fps:.1f}fps, skipping FPS conversion")
target_video_path = video_path
else:
command = (
f"ffmpeg -loglevel error -y -nostdin -i {video_path} -r 25 -crf 18 {os.path.join(temp_dir, 'video.mp4')}"
)
subprocess.run(command, shell=True)
target_video_path = os.path.join(temp_dir, "video.mp4")
else:
target_video_path = video_path

View File

@@ -109,6 +109,31 @@ def get_image_blending(image, face, face_box, mask_array, crop_box):
return body[:,:,::-1]
def get_image_blending_fast(image, face, face_box, mask_array, crop_box):
"""纯 numpy blending无 PIL 转换,无 BGR↔RGB 翻转。
所有输入输出均为 BGR numpy uint8与 get_image_blending 语义等价。
"""
x, y, x1, y1 = face_box
x_s, y_s, x_e, y_e = crop_box
result = image.copy()
# 1. Paste the generated face into its position within the crop region
crop_region = result[y_s:y_e, x_s:x_e].copy()
fy, fx = y - y_s, x - x_s
fh, fw = y1 - y, x1 - x
crop_region[fy:fy+fh, fx:fx+fw] = face
# 2. Alpha-blend with the mask (vectorised numpy broadcasting)
mask_f = mask_array[:, :, np.newaxis].astype(np.float32) * (1.0 / 255.0)
orig_region = result[y_s:y_e, x_s:x_e].astype(np.float32)
new_region = crop_region.astype(np.float32)
blended = orig_region * (1.0 - mask_f) + new_region * mask_f
result[y_s:y_e, x_s:x_e] = blended.astype(np.uint8)
return result
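The broadcast blend at the heart of `get_image_blending_fast` can be exercised in isolation. A self-contained sketch of the same masked linear mix (function name is illustrative):

```python
import numpy as np

def alpha_blend(original, replacement, mask):
    """Vectorised per-pixel alpha blend: mask 255 keeps `replacement`,
    mask 0 keeps `original`, intermediate values mix linearly."""
    # expand HxW mask to HxWx1 so it broadcasts over the 3 colour channels
    m = mask[:, :, np.newaxis].astype(np.float32) * (1.0 / 255.0)
    out = original.astype(np.float32) * (1.0 - m) + replacement.astype(np.float32) * m
    return out.astype(np.uint8)
```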
def get_image_prepare_material(image, face_box, upper_boundary_ratio=0.5, expand=1.5, fp=None, mode="raw"):
body = Image.fromarray(image[:,:,::-1])

View File

@@ -4,14 +4,14 @@ MuseTalk v1.5 resident inference service (optimized, v2)
- GPU: read MUSETALK_GPU_ID from backend/.env (default 0)
- Architecture: FastAPI + lifespan (same pattern as LatentSync server.py)
Optimizations (vs v1):
1. cv2.VideoCapture reads frames directly (skips ffmpeg→PNG→imread)
2. Downsampled face detection (detect every N frames, interpolate bbox in between)
3. BiSeNet mask cache (refresh every N frames, reuse in between)
4. cv2.VideoWriter writes video directly (skips per-frame PNG writes)
5. batch_size 8→32
6. Per-stage timing
"""
Optimizations (vs v1):
1. cv2.VideoCapture reads frames directly (skips ffmpeg→PNG→imread)
2. Downsampled face detection (detect every N frames, interpolate bbox in between)
3. BiSeNet mask cache (refresh every N frames, reuse in between)
4. FFmpeg rawvideo pipe encoding (drops the intermediate lossy mp4v step)
5. batch_size 8→32
6. Per-stage timing
"""
import os
import sys
@@ -77,24 +77,35 @@ from transformers import WhisperModel
musetalk_root = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(musetalk_root))
from musetalk.utils.blending import get_image, get_image_blending, get_image_prepare_material
from musetalk.utils.blending import get_image, get_image_blending, get_image_blending_fast, get_image_prepare_material
from musetalk.utils.face_parsing import FaceParsing
from musetalk.utils.audio_processor import AudioProcessor
from musetalk.utils.utils import get_file_type, get_video_fps, datagen, load_all_model
from musetalk.utils.preprocessing import get_landmark_and_bbox, read_imgs, coord_placeholder
# --- Read extra config from .env ---
def load_env_config():
"""Read MuseTalk-related environment variables"""
config = {
"batch_size": 32,
"version": "v15",
"use_float16": True,
}
try:
env_path = musetalk_root.parent.parent / "backend" / ".env"
if env_path.exists():
with open(env_path, "r", encoding="utf-8") as f:
def load_env_config():
"""读取 MuseTalk 相关环境变量"""
config = {
"batch_size": 32,
"version": "v15",
"use_float16": True,
"detect_every": 5,
"blend_cache_every": 5,
"audio_padding_left": 2,
"audio_padding_right": 2,
"extra_margin": 15,
"delay_frame": 0,
"blend_mode": "auto",
"faceparsing_left_cheek_width": 90,
"faceparsing_right_cheek_width": 90,
"encode_crf": 18,
"encode_preset": "medium",
}
try:
env_path = musetalk_root.parent.parent / "backend" / ".env"
if env_path.exists():
with open(env_path, "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if line.startswith("MUSETALK_BATCH_SIZE="):
@@ -105,32 +116,90 @@ def load_env_config():
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["version"] = val
elif line.startswith("MUSETALK_USE_FLOAT16="):
val = line.split("=")[1].strip().split("#")[0].strip().lower()
config["use_float16"] = val in ("true", "1", "yes")
except Exception as e:
print(f"⚠️ 读取额外配置失败: {e}")
return config
env_config = load_env_config()
elif line.startswith("MUSETALK_USE_FLOAT16="):
val = line.split("=")[1].strip().split("#")[0].strip().lower()
config["use_float16"] = val in ("true", "1", "yes")
elif line.startswith("MUSETALK_DETECT_EVERY="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["detect_every"] = max(1, int(val))
elif line.startswith("MUSETALK_BLEND_CACHE_EVERY="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["blend_cache_every"] = max(1, int(val))
elif line.startswith("MUSETALK_AUDIO_PADDING_LEFT="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["audio_padding_left"] = max(0, int(val))
elif line.startswith("MUSETALK_AUDIO_PADDING_RIGHT="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["audio_padding_right"] = max(0, int(val))
elif line.startswith("MUSETALK_EXTRA_MARGIN="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["extra_margin"] = max(0, int(val))
elif line.startswith("MUSETALK_DELAY_FRAME="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["delay_frame"] = int(val)
elif line.startswith("MUSETALK_BLEND_MODE="):
val = line.split("=")[1].strip().split("#")[0].strip().lower()
if val in ("auto", "jaw", "raw"):
config["blend_mode"] = val
elif line.startswith("MUSETALK_FACEPARSING_LEFT_CHEEK_WIDTH="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["faceparsing_left_cheek_width"] = max(0, int(val))
elif line.startswith("MUSETALK_FACEPARSING_RIGHT_CHEEK_WIDTH="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["faceparsing_right_cheek_width"] = max(0, int(val))
elif line.startswith("MUSETALK_ENCODE_CRF="):
val = line.split("=")[1].strip().split("#")[0].strip()
if val:
config["encode_crf"] = min(51, max(0, int(val)))
elif line.startswith("MUSETALK_ENCODE_PRESET="):
val = line.split("=")[1].strip().split("#")[0].strip().lower()
if val in (
"ultrafast", "superfast", "veryfast", "faster", "fast",
"medium", "slow", "slower", "veryslow"
):
config["encode_preset"] = val
except Exception as e:
print(f"⚠️ 读取额外配置失败: {e}")
return config
env_config = load_env_config()
# Global model cache
models = {}
# ===================== Tuning parameters =====================
DETECT_EVERY = 5 # face-detection downsampling: detect every N frames
BLEND_CACHE_EVERY = 5 # BiSeNet mask cache: refresh every N frames
# ====================================================
# ===================== Tuning parameters =====================
DETECT_EVERY = int(env_config["detect_every"]) # face-detection downsampling: detect every N frames
BLEND_CACHE_EVERY = int(env_config["blend_cache_every"]) # BiSeNet mask cache: refresh every N frames
AUDIO_PADDING_LEFT = int(env_config["audio_padding_left"])
AUDIO_PADDING_RIGHT = int(env_config["audio_padding_right"])
EXTRA_MARGIN = int(env_config["extra_margin"])
DELAY_FRAME = int(env_config["delay_frame"])
BLEND_MODE = str(env_config["blend_mode"])
FACEPARSING_LEFT_CHEEK_WIDTH = int(env_config["faceparsing_left_cheek_width"])
FACEPARSING_RIGHT_CHEEK_WIDTH = int(env_config["faceparsing_right_cheek_width"])
ENCODE_CRF = int(env_config["encode_crf"])
ENCODE_PRESET = str(env_config["encode_preset"])
# ====================================================
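Each `elif` branch in `load_env_config()` repeats the same split-strip-clamp sequence. A compact helper in the same spirit might look like this (hypothetical, not part of the script; note it splits on the *first* `=` only, which is slightly more robust than the script's `line.split("=")[1]`):

```python
def parse_env_value(line: str) -> str:
    """Text after the first '=', with any inline '#' comment stripped."""
    value = line.split("=", 1)[1]
    return value.strip().split("#", 1)[0].strip()

def clamped_int(raw, lo=None, hi=None, default=0):
    """Parse an int, clamping into [lo, hi]; fall back to default on junk."""
    try:
        v = int(raw)
    except ValueError:
        return default
    if lo is not None:
        v = max(lo, v)
    if hi is not None:
        v = min(hi, v)
    return v

# parse_env_value("MUSETALK_ENCODE_CRF=23  # quality") -> "23"
# clamped_int("99", lo=0, hi=51) -> 51
```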
def run_ffmpeg(cmd):
"""执行 FFmpeg 命令"""
print(f"Executing: {cmd}")
"""执行 FFmpeg 命令(接受列表或字符串)"""
if isinstance(cmd, str):
cmd = cmd.split()
print(f"Executing: {' '.join(cmd)}")
try:
result = subprocess.run(cmd, shell=True, check=True, capture_output=True, text=True)
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
return True
except subprocess.CalledProcessError as e:
print(f"Error executing ffmpeg: {cmd}")
print(f"Error executing ffmpeg: {' '.join(cmd)}")
print(f"Return code: {e.returncode}")
print(f"Stderr: {e.stderr[:500]}")
return False
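One caveat in the new string branch of `run_ffmpeg`: a bare `cmd.split()` shatters any argument that contains spaces, such as a quoted file path. A hedged sketch of a safer variant using the standard-library `shlex` (the helper name is ours, not the service's):

```python
import shlex
import subprocess

def run_cmd(cmd) -> bool:
    """Run a command given as an argv list or a shell-style string."""
    if isinstance(cmd, str):
        # shlex.split honors quoting, so "my clip.mp4" stays one argument
        cmd = shlex.split(cmd)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0
```

With plain `str.split`, `'ffmpeg -i "my clip.mp4"'` would split the path into `'"my'` and `'clip.mp4"'`; `shlex.split` keeps it intact.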
@@ -189,11 +258,14 @@ async def lifespan(app: FastAPI):
whisper = whisper.to(device=device, dtype=weight_dtype).eval()
whisper.requires_grad_(False)
# FaceParsing
if version == "v15":
fp = FaceParsing(left_cheek_width=90, right_cheek_width=90)
else:
fp = FaceParsing()
# FaceParsing
if version == "v15":
fp = FaceParsing(
left_cheek_width=FACEPARSING_LEFT_CHEEK_WIDTH,
right_cheek_width=FACEPARSING_RIGHT_CHEEK_WIDTH,
)
else:
fp = FaceParsing()
# Restore the working directory
os.chdir(original_cwd)
@@ -209,9 +281,13 @@ async def lifespan(app: FastAPI):
models["version"] = version
models["timesteps"] = torch.tensor([0], device=device)
print("✅ MuseTalk v1.5 模型加载完成,服务就绪!")
print(f"⚙️ 优化参数: batch_size={env_config['batch_size']}, "
f"detect_every={DETECT_EVERY}, blend_cache_every={BLEND_CACHE_EVERY}")
print("✅ MuseTalk v1.5 模型加载完成,服务就绪!")
print(f"⚙️ 优化参数: batch_size={env_config['batch_size']}, "
f"detect_every={DETECT_EVERY}, blend_cache_every={BLEND_CACHE_EVERY}, "
f"audio_padding=({AUDIO_PADDING_LEFT},{AUDIO_PADDING_RIGHT}), extra_margin={EXTRA_MARGIN}, "
f"delay_frame={DELAY_FRAME}, blend_mode={BLEND_MODE}, "
f"faceparsing_cheek=({FACEPARSING_LEFT_CHEEK_WIDTH},{FACEPARSING_RIGHT_CHEEK_WIDTH}), "
f"encode=libx264/{ENCODE_PRESET}/crf{ENCODE_CRF}")
yield
models.clear()
torch.cuda.empty_cache()
@@ -352,15 +428,15 @@ def _detect_faces_subsampled(frames, detect_every=5):
# Core inference (optimized)
# =====================================================================
@torch.no_grad()
def _run_inference(req: LipSyncRequest) -> dict:
"""
Optimized inference pipeline:
1. Read frames directly via cv2.VideoCapture (skips ffmpeg→PNG→imread)
2. Subsampled face detection (every N frames, interpolated in between)
3. BiSeNet mask caching (refreshed every N frames)
4. Direct writing via cv2.VideoWriter (skips per-frame PNGs)
5. Per-phase timing
def _run_inference(req: LipSyncRequest) -> dict:
"""
Optimized inference pipeline:
1. Read frames directly via cv2.VideoCapture (skips ffmpeg→PNG→imread)
2. Subsampled face detection (every N frames, interpolated in between)
3. BiSeNet mask caching (refreshed every N frames)
4. Direct encoding through an FFmpeg rawvideo pipe (no lossy intermediate file)
5. Per-phase timing
"""
vae = models["vae"]
unet = models["unet"]
pe = models["pe"]
@@ -409,12 +485,12 @@ def _run_inference(req: LipSyncRequest) -> dict:
# ===== Phase 2: Whisper audio features =====
t0 = time.time()
whisper_input_features, librosa_length = audio_processor.get_audio_feature(audio_path)
whisper_chunks = audio_processor.get_whisper_chunk(
whisper_input_features, device, weight_dtype, whisper, librosa_length,
fps=fps,
audio_padding_length_left=2,
audio_padding_length_right=2,
)
whisper_chunks = audio_processor.get_whisper_chunk(
whisper_input_features, device, weight_dtype, whisper, librosa_length,
fps=fps,
audio_padding_length_left=AUDIO_PADDING_LEFT,
audio_padding_length_right=AUDIO_PADDING_RIGHT,
)
timings["2_whisper"] = time.time() - t0
print(f"🎵 Whisper 特征 [{timings['2_whisper']:.1f}s]")
@@ -425,12 +501,12 @@ def _run_inference(req: LipSyncRequest) -> dict:
print(f"🔍 人脸检测 [{timings['3_face']:.1f}s]")
# ===== Phase 4: VAE 潜空间编码 =====
t0 = time.time()
input_latent_list = []
extra_margin = 10
for bbox, frame in zip(coord_list, frames):
if bbox == coord_placeholder:
continue
t0 = time.time()
input_latent_list = []
extra_margin = EXTRA_MARGIN
for bbox, frame in zip(coord_list, frames):
if bbox == coord_placeholder:
continue
x1, y1, x2, y2 = bbox
if version == "v15":
y2 = min(y2 + extra_margin, frame.shape[0])
@@ -451,13 +527,13 @@ def _run_inference(req: LipSyncRequest) -> dict:
input_latent_list_cycle = input_latent_list + input_latent_list[::-1]
video_num = len(whisper_chunks)
gen = datagen(
whisper_chunks=whisper_chunks,
vae_encode_latents=input_latent_list_cycle,
batch_size=batch_size,
delay_frame=0,
device=device,
)
gen = datagen(
whisper_chunks=whisper_chunks,
vae_encode_latents=input_latent_list_cycle,
batch_size=batch_size,
delay_frame=DELAY_FRAME,
device=device,
)
res_frame_list = []
total_batches = int(np.ceil(float(video_num) / batch_size))
@@ -477,21 +553,44 @@ def _run_inference(req: LipSyncRequest) -> dict:
timings["5_unet"] = time.time() - t0
print(f"✅ UNet 推理: {len(res_frame_list)} 帧 [{timings['5_unet']:.1f}s]")
# ===== Phase 6: 合成 (缓存 BiSeNet mask + cv2.VideoWriter) =====
t0 = time.time()
h, w = frames[0].shape[:2]
temp_raw_path = output_vid_path + ".raw.mp4"
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
writer = cv2.VideoWriter(temp_raw_path, fourcc, fps, (w, h))
if not writer.isOpened():
raise RuntimeError(f"cv2.VideoWriter 打开失败: {temp_raw_path}")
cached_mask = None
cached_crop_box = None
blend_mode = "jaw" if version == "v15" else "raw"
# ===== Phase 6: composite and write into the FFmpeg rawvideo pipe =====
t0 = time.time()
h, w = frames[0].shape[:2]
ffmpeg_cmd = [
"ffmpeg", "-y", "-v", "warning",
"-f", "rawvideo",
"-pix_fmt", "bgr24",
"-s", f"{w}x{h}",
"-r", str(fps),
"-i", "-",
"-i", audio_path,
"-c:v", "libx264", "-preset", ENCODE_PRESET, "-crf", str(ENCODE_CRF), "-pix_fmt", "yuv420p",
"-c:a", "copy", "-shortest",
output_vid_path,
]
ffmpeg_proc = subprocess.Popen(
ffmpeg_cmd,
stdin=subprocess.PIPE,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
)
pipe_in = ffmpeg_proc.stdin
if pipe_in is None:
raise RuntimeError("FFmpeg 管道初始化失败")
def _write_pipe_frame(frame: np.ndarray):
try:
pipe_in.write(np.ascontiguousarray(frame, dtype=np.uint8).tobytes())
except BrokenPipeError as exc:
raise RuntimeError("FFmpeg 管道写入失败") from exc
cached_mask = None
cached_crop_box = None
if BLEND_MODE == "auto":
blend_mode = "jaw" if version == "v15" else "raw"
else:
blend_mode = BLEND_MODE
for i in tqdm(range(len(res_frame_list)), desc="compositing"):
res_frame = res_frame_list[i]
@@ -501,58 +600,54 @@ def _run_inference(req: LipSyncRequest) -> dict:
x1, y1, x2, y2 = bbox
if version == "v15":
y2 = min(y2 + extra_margin, ori_frame.shape[0])
adjusted_bbox = (x1, y1, x2, y2)
try:
res_frame = cv2.resize(res_frame.astype(np.uint8), (x2 - x1, y2 - y1))
except Exception:
writer.write(ori_frame)
continue
adjusted_bbox = (x1, y1, x2, y2)
try:
res_frame = cv2.resize(res_frame.astype(np.uint8), (x2 - x1, y2 - y1))
except Exception:
_write_pipe_frame(ori_frame)
continue
# Refresh the BiSeNet face-parsing mask every N frames; reuse the cache in between
if i % BLEND_CACHE_EVERY == 0 or cached_mask is None:
try:
cached_mask, cached_crop_box = get_image_prepare_material(
ori_frame, adjusted_bbox, mode=blend_mode, fp=fp)
except Exception:
# If prepare fails, fall back to the full path
combine_frame = get_image(
ori_frame, res_frame, list(adjusted_bbox),
mode=blend_mode, fp=fp)
writer.write(combine_frame)
continue
except Exception:
# If prepare fails, fall back to the full path
combine_frame = get_image(
ori_frame, res_frame, list(adjusted_bbox),
mode=blend_mode, fp=fp)
_write_pipe_frame(combine_frame)
continue
try:
combine_frame = get_image_blending(
combine_frame = get_image_blending_fast(
ori_frame, res_frame, adjusted_bbox, cached_mask, cached_crop_box)
except Exception:
# If blending fails, fall back to the full path
combine_frame = get_image(
ori_frame, res_frame, list(adjusted_bbox),
mode=blend_mode, fp=fp)
writer.write(combine_frame)
writer.release()
timings["6_blend"] = time.time() - t0
print(f"🎨 合成 [{timings['6_blend']:.1f}s]")
# ===== Phase 7: FFmpeg 重编码 H.264 + 合并音频 =====
t0 = time.time()
cmd = (
f"ffmpeg -y -v warning -i {temp_raw_path} -i {audio_path} "
f"-c:v libx264 -crf 18 -pix_fmt yuv420p "
f"-c:a copy -shortest {output_vid_path}"
)
if not run_ffmpeg(cmd):
raise RuntimeError("FFmpeg 编码+音频合并失败")
# 清理临时文件
if os.path.exists(temp_raw_path):
os.unlink(temp_raw_path)
timings["7_encode"] = time.time() - t0
print(f"🔊 编码+音频 [{timings['7_encode']:.1f}s]")
# If blending_fast fails, fall back to the PIL path
try:
combine_frame = get_image_blending(
ori_frame, res_frame, adjusted_bbox, cached_mask, cached_crop_box)
except Exception:
combine_frame = get_image(
ori_frame, res_frame, list(adjusted_bbox),
mode=blend_mode, fp=fp)
_write_pipe_frame(combine_frame)
pipe_in.close()
timings["6_blend"] = time.time() - t0
print(f"🎨 合成 [{timings['6_blend']:.1f}s]")
# ===== Phase 7: 等待 FFmpeg 编码完成 =====
t0 = time.time()
return_code = ffmpeg_proc.wait()
if return_code != 0:
raise RuntimeError("FFmpeg 编码+音频合并失败")
timings["7_encode"] = time.time() - t0
print(f"🔊 编码+音频 [{timings['7_encode']:.1f}s]")
# ===== 汇总 =====
total_time = time.time() - t_total
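The Phase 6/7 change streams raw BGR frames straight into a long-lived FFmpeg process instead of writing a lossy intermediate file. The pattern, stripped to its essentials (the output path and flat-gray frame contents here are made up for illustration):

```python
import shutil
import subprocess

w, h, fps, n = 64, 48, 25, 10
cmd = [
    "ffmpeg", "-y", "-v", "error",
    "-f", "rawvideo", "-pix_fmt", "bgr24", "-s", f"{w}x{h}", "-r", str(fps),
    "-i", "-",                                  # frames arrive on stdin
    "-c:v", "libx264", "-pix_fmt", "yuv420p", "/tmp/pipe_demo.mp4",
]
frames = [bytes([i * 20]) * (w * h * 3) for i in range(n)]  # flat gray frames

if shutil.which("ffmpeg"):                      # degrade gracefully without ffmpeg
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    for frame in frames:
        proc.stdin.write(frame)                 # one bgr24 frame = h * w * 3 bytes
    proc.stdin.close()
    assert proc.wait() == 0                     # non-zero exit means the encode failed
```

Because the encoder runs concurrently with frame generation, the old separate "re-encode + mux" phase collapses into a single `wait()` at the end, which is where most of the Phase 7 time savings come from.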


@@ -155,18 +155,97 @@ async function main() {
console.log(`Public dir: ${publicDir}, Video file: ${videoFileName}`);
// Bundle the Remotion project
console.log('Bundling Remotion project...');
// Fix: resolve src/index.ts from process.cwd() so it is found under both dist/render.js and ts-node
// Assumes the script always runs from the remotion root (guaranteed by the Python service)
const entryPoint = path.resolve(process.cwd(), 'src/index.ts');
console.log(`Entry point: ${entryPoint}`);
const bundleLocation = await bundle({
entryPoint,
webpackOverride: (config) => config,
publicDir,
});
// Bundle cache: hash the mtimes of the src directory to decide whether to re-bundle
const BUNDLE_CACHE_DIR = path.resolve(process.cwd(), '.remotion-bundle-cache');
const hashFile = path.join(BUNDLE_CACHE_DIR, '.hash');
function getSourceHash(): string {
// Collect the mtimes of every file under src as the cache key
const srcDir = path.resolve(process.cwd(), 'src');
const mtimes: string[] = [];
function walkDir(dir: string) {
for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
const fullPath = path.join(dir, entry.name);
if (entry.isDirectory()) {
walkDir(fullPath);
} else {
mtimes.push(`${fullPath}:${fs.statSync(fullPath).mtimeMs}`);
}
}
}
walkDir(srcDir);
mtimes.sort();
return mtimes.join('|');
}
const currentHash = getSourceHash();
let bundleLocation: string;
// Helper: make a file reachable in the cached public dir (hardlink preferred over copy)
function ensureInCachedPublic(cachedPublicDir: string, srcAbsPath: string, fileName: string) {
const cachedPath = path.join(cachedPublicDir, fileName);
// Skip if it already exists with the same size and inode
try {
if (fs.existsSync(cachedPath)) {
const srcStat = fs.statSync(srcAbsPath);
const cachedStat = fs.statSync(cachedPath);
if (srcStat.size === cachedStat.size && srcStat.ino === cachedStat.ino) return;
}
} catch { /* file doesn't exist or broken, will recreate */ }
// Remove any stale file/link
try { fs.unlinkSync(cachedPath); } catch { /* doesn't exist, fine */ }
// Prefer a hardlink (zero-copy, transparent to the app); fall back to copying across filesystems
try {
fs.linkSync(srcAbsPath, cachedPath);
console.log(`Hardlinked into cached bundle: ${fileName}`);
} catch {
fs.copyFileSync(srcAbsPath, cachedPath);
console.log(`Copied into cached bundle: ${fileName}`);
}
}
if (fs.existsSync(hashFile) && fs.readFileSync(hashFile, 'utf-8') === currentHash) {
bundleLocation = BUNDLE_CACHE_DIR;
console.log('Using cached bundle');
// Make sure the files needed for this render are reachable in the cached bundle's public dir
const cachedPublicDir = path.join(BUNDLE_CACHE_DIR, 'public');
if (!fs.existsSync(cachedPublicDir)) {
fs.mkdirSync(cachedPublicDir, { recursive: true });
}
// 1) The video file
ensureInCachedPublic(cachedPublicDir, path.resolve(options.videoPath), videoFileName);
// 2) Font files (extracted from subtitleStyle / titleStyle / secondaryTitleStyle)
const styleSources = [options.subtitleStyle, options.titleStyle, options.secondaryTitleStyle];
for (const style of styleSources) {
const fontFile = (style as Record<string, unknown>)?.font_file as string | undefined;
if (fontFile) {
const fontSrcPath = path.join(publicDir, fontFile);
if (fs.existsSync(fontSrcPath)) {
ensureInCachedPublic(cachedPublicDir, path.resolve(fontSrcPath), fontFile);
}
}
}
} else {
console.log('Bundling Remotion project...');
console.log(`Entry point: ${entryPoint}`);
const freshBundle = await bundle({
entryPoint,
webpackOverride: (config) => config,
publicDir,
});
// Copy into the cache directory
if (fs.existsSync(BUNDLE_CACHE_DIR)) {
fs.rmSync(BUNDLE_CACHE_DIR, { recursive: true });
}
fs.cpSync(freshBundle, BUNDLE_CACHE_DIR, { recursive: true });
fs.writeFileSync(hashFile, currentHash);
bundleLocation = BUNDLE_CACHE_DIR;
console.log('Bundle cached for future use');
}
// Unified inputProps, including the video dimensions for calculateMetadata
const inputProps = {
@@ -198,7 +277,7 @@ async function main() {
composition.height = videoHeight;
// Render the video
const concurrency = options.concurrency || 16;
const concurrency = options.concurrency || 4;
console.log(`Rendering video (concurrency=${concurrency})...`);
await renderMedia({
composition,

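The bundle cache above boils down to: fingerprint the inputs cheaply, compare with the stored fingerprint, and rebuild only on mismatch. A sketch of the same idea in Python (file layout is illustrative; the real script stores its hash in `.remotion-bundle-cache/.hash`):

```python
from pathlib import Path

def source_fingerprint(src_dir: Path) -> str:
    """Sorted 'path:mtime' pairs for every file under src_dir, joined as one key."""
    entries = sorted(
        f"{p}:{p.stat().st_mtime_ns}" for p in src_dir.rglob("*") if p.is_file()
    )
    return "|".join(entries)

def needs_rebuild(src_dir: Path, hash_file: Path):
    """Return (stale, fingerprint); the caller persists the fingerprint after a successful build."""
    current = source_fingerprint(src_dir)
    stale = not (hash_file.exists() and hash_file.read_text() == current)
    return stale, current
```

Writing the fingerprint only after a successful build (as the TS code does) means a crashed build never poisons the cache: the next run still sees a mismatch and rebuilds.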

@@ -27,7 +27,7 @@ export function getCurrentSegment(
currentTimeInSeconds: number
): Segment | null {
for (const segment of captions.segments) {
if (currentTimeInSeconds >= segment.start && currentTimeInSeconds <= segment.end) {
if (currentTimeInSeconds >= segment.start && currentTimeInSeconds < segment.end) {
return segment;
}
}
@@ -43,7 +43,7 @@ export function getCurrentWordIndex(
): number {
for (let i = 0; i < segment.words.length; i++) {
const word = segment.words[i];
if (currentTimeInSeconds >= word.start && currentTimeInSeconds <= word.end) {
if (currentTimeInSeconds >= word.start && currentTimeInSeconds < word.end) {
return i;
}
// If the current time falls between two words, return the previous word
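The `<=` → `<` changes in both lookups make each interval half-open, [start, end), so a timestamp that lands exactly on a boundary matches only the segment or word that begins there, never two at once. The effect in miniature (synthetic segments, not the project's data):

```python
segments = [{"start": 0.0, "end": 1.5}, {"start": 1.5, "end": 3.0}]

def find_closed(t):
    # old behavior: closed intervals, a boundary instant matches both neighbors
    return [i for i, s in enumerate(segments) if s["start"] <= t <= s["end"]]

def find_half_open(t):
    # new behavior: [start, end), each instant maps to at most one segment
    return [i for i, s in enumerate(segments) if s["start"] <= t < s["end"]]

# At the boundary t = 1.5 the closed version is ambiguous:
#   find_closed(1.5)    -> [0, 1]
#   find_half_open(1.5) -> [1]
```

Since the functions return the first match, the old closed-interval check silently attributed boundary frames to the earlier segment; the half-open check hands them to the segment that is actually starting.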