# MuseTalk Deployment Guide

> **Last updated**: 2026-02-27
> **Applies to**: MuseTalk v1.5 (resident service mode)
> **Architecture**: FastAPI resident service + PM2 process management

---

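## Architecture Overview

MuseTalk serves as the long-video engine of the **hybrid lip-sync solution**:

- **Short videos (<120s)** → LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=120s)** → MuseTalk 1.5 (GPU0, port 8011)
- The routing threshold is controlled by `LIPSYNC_DURATION_THRESHOLD`
- When MuseTalk is unavailable, requests automatically fall back to LatentSync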
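
The routing rule in this list can be sketched as a small function. This is only an illustration, not the code in `lipsync_service.py`: the name `choose_engine` and the `musetalk_available` flag are hypothetical, and the default threshold simply mirrors the documented value of `LIPSYNC_DURATION_THRESHOLD`.

```python
def choose_engine(duration_s: float, musetalk_available: bool,
                  threshold_s: float = 120.0) -> str:
    """Pick a lip-sync engine for a clip (hypothetical helper).

    threshold_s stands in for LIPSYNC_DURATION_THRESHOLD; a real service
    would read it from configuration instead of hard-coding it.
    """
    if duration_s >= threshold_s and musetalk_available:
        return "musetalk"    # long video: MuseTalk 1.5 (GPU0, port 8011)
    # Short video, or fallback when MuseTalk is unavailable.
    return "latentsync"      # LatentSync 1.6 (GPU1, port 8007)
```

Note that the boundary is inclusive: a clip of exactly 120 seconds goes to MuseTalk, matching the `>=` in the list.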

---

## Hardware Requirements

| Component | Minimum | Recommended |
|-----------|---------|-------------|
| GPU | 8GB VRAM (RTX 3060) | 24GB VRAM (RTX 3090) |
| RAM | 32GB | 64GB |
| CUDA | 11.7+ | 11.8 |

> MuseTalk fp16 inference needs roughly 4-8GB of VRAM, so it can share GPU0 with CosyVoice.

---

## Installation

### 1. Conda environment

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
conda create -n musetalk python=3.10 -y
conda activate musetalk
```

### 2. PyTorch 2.0.1 + CUDA 11.8

> This exact version is required because the prebuilt mmcv packages depend on it.

```bash
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```

### 3. Dependencies

```bash
pip install -r requirements.txt

# MMLab packages
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv==2.0.1"
mim install "mmdet==3.1.0"
pip install chumpy --no-build-isolation
pip install "mmpose==1.1.0" --no-deps

# FastAPI service dependencies
pip install fastapi uvicorn httpx
```
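
Because the steps above pin several versions, a quick check after installation can catch a mismatched package before the service is started. The sketch below uses only the standard library; the helper names `installed_version` and `find_mismatches` are hypothetical, and the pins are copied from the commands above.

```python
from importlib import metadata

# Version pins taken from the install commands in this guide.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
    "mmcv": "2.0.1",
    "mmdet": "3.1.0",
    "mmpose": "1.1.0",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected: dict, lookup=installed_version) -> dict:
    """Map each mismatching package to an (expected, found) tuple."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Wheels from the cu118 index report a local suffix such as
        # "2.0.1+cu118", so compare only the part before the "+".
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Running `print(find_mismatches(EXPECTED))` inside the `musetalk` environment should print an empty dict when every pin matches.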

---

## Model Weights

### Directory layout

```
models/MuseTalk/models/
├── musetalk/                  ← v1 base model
│   ├── config.json -> musetalk.json       (symlink)
│   ├── musetalk.json
│   ├── musetalkV15 -> ../musetalkV15      (symlink, critical!)
│   └── pytorch_model.bin                  (~3.2GB)
├── musetalkV15/               ← v1.5 UNet model
│   ├── musetalk.json
│   └── unet.pth                           (~3.2GB)
├── sd-vae/                    ← Stable Diffusion VAE
│   ├── config.json
│   └── diffusion_pytorch_model.bin
├── whisper/                   ← OpenAI Whisper Tiny
│   ├── config.json
│   ├── pytorch_model.bin                  (~151MB)
│   └── preprocessor_config.json
├── dwpose/                    ← DWPose human pose detection
│   └── dw-ll_ucoco_384.pth                (~387MB)
├── syncnet/                   ← SyncNet lip-sync evaluation
│   └── latentsync_syncnet.pt
└── face-parse-bisent/         ← face parsing model
    ├── 79999_iter.pth                     (~53MB)
    └── resnet18-5c106cde.pth              (~45MB)
```

### Download

Use the script bundled with the project:

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
conda activate musetalk
bash download_weights.sh
```

Or download manually through the Python API:

```bash
conda activate musetalk
export HF_ENDPOINT=https://hf-mirror.com
python -c "
from huggingface_hub import snapshot_download
snapshot_download('TMElyralab/MuseTalk', local_dir='models',
    allow_patterns=['musetalk/*', 'musetalkV15/*'])
snapshot_download('stabilityai/sd-vae-ft-mse', local_dir='models/sd-vae',
    allow_patterns=['config.json', 'diffusion_pytorch_model.bin'])
snapshot_download('openai/whisper-tiny', local_dir='models/whisper',
    allow_patterns=['config.json', 'pytorch_model.bin', 'preprocessor_config.json'])
snapshot_download('yzd-v/DWPose', local_dir='models/dwpose',
    allow_patterns=['dw-ll_ucoco_384.pth'])
"
```

### Create the required symlinks

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk/models/musetalk
ln -sf musetalk.json config.json
ln -sf ../musetalkV15 musetalkV15
```

> **Important**: if the `musetalk/musetalkV15` symlink is missing, weight detection fails (`weights: False`).
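
Since a missing file or symlink only surfaces later as `weights: False`, it can help to verify the layout right after downloading. The following is a minimal sketch, assuming the directory tree listed in this section is complete; `check_weights` is a hypothetical helper and is not part of MuseTalk.

```python
from pathlib import Path

# Relative paths copied from the directory layout in this guide.
REQUIRED_FILES = [
    "musetalk/musetalk.json",
    "musetalk/pytorch_model.bin",
    "musetalkV15/musetalk.json",
    "musetalkV15/unet.pth",
    "sd-vae/config.json",
    "sd-vae/diffusion_pytorch_model.bin",
    "whisper/config.json",
    "whisper/pytorch_model.bin",
    "whisper/preprocessor_config.json",
    "dwpose/dw-ll_ucoco_384.pth",
    "syncnet/latentsync_syncnet.pt",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
]
REQUIRED_SYMLINKS = ["musetalk/config.json", "musetalk/musetalkV15"]


def check_weights(models_dir) -> list:
    """Return a list of problems; an empty list means the layout looks complete."""
    root = Path(models_dir)
    problems = []
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    for rel in REQUIRED_SYMLINKS:
        link = root / rel
        if not link.is_symlink():
            problems.append(f"missing symlink: {rel}")
        elif not link.exists():  # the link exists but its target does not
            problems.append(f"broken symlink: {rel}")
    return problems
```

For example, `check_weights("/home/rongye/ProgramFiles/ViGent2/models/MuseTalk/models")` returns the list of anything still missing.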

---

## Starting the Service

### PM2 process management (recommended)

```bash
# First-time registration
cd /home/rongye/ProgramFiles/ViGent2
pm2 start run_musetalk.sh --name vigent2-musetalk
pm2 save

# Day-to-day management
pm2 restart vigent2-musetalk
pm2 logs vigent2-musetalk
pm2 stop vigent2-musetalk
```

### Manual start

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
```

### Health check

```bash
curl http://localhost:8011/health
# {"status":"ok","model_loaded":true}
```
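
Model loading may take a while after a restart, so a script that depends on the service might poll the health endpoint instead of calling it once. The sketch below uses only the standard library and assumes the response shape shown above; `is_ready`, `fetch_health`, and `wait_until_ready` are hypothetical names, and the timeout values are arbitrary.

```python
import json
import time
import urllib.request


def is_ready(payload: dict) -> bool:
    """True when the health payload reports a loaded model."""
    return payload.get("status") == "ok" and payload.get("model_loaded") is True


def fetch_health(url: str, timeout_s: float = 5.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(url: str = "http://localhost:8011/health",
                     timeout_s: float = 120.0, interval_s: float = 2.0,
                     fetch=fetch_health) -> bool:
    """Poll until the service is ready; return False if the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if is_ready(fetch(url)):
                return True
        except (OSError, ValueError):
            pass  # connection refused/timeout (OSError) or invalid JSON (ValueError)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

The `fetch` parameter exists only so the polling loop can be exercised without a running server.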

---

## Backend Configuration

Relevant variables in `backend/.env`:

```ini
# MuseTalk settings
MUSETALK_GPU_ID=0                          # GPU index (shared with CosyVoice)
MUSETALK_API_URL=http://localhost:8011     # resident service address
MUSETALK_BATCH_SIZE=32                     # inference batch size
MUSETALK_VERSION=v15                       # model version
MUSETALK_USE_FLOAT16=true                  # half-precision acceleration

# Hybrid lip-sync routing
LIPSYNC_DURATION_THRESHOLD=120             # seconds; MuseTalk is used at or above this value
```
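
To illustrate how these variables might be turned into typed settings, here is a minimal sketch that reads them from an environment mapping. It is not the contents of `backend/app/core/config.py`, which may be organized differently; the `MuseTalkSettings` class and `load_settings` function are hypothetical, and the defaults simply repeat the values listed above.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MuseTalkSettings:
    gpu_id: int
    api_url: str
    batch_size: int
    version: str
    use_float16: bool
    duration_threshold_s: float


def load_settings(env=None) -> MuseTalkSettings:
    """Build settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return MuseTalkSettings(
        gpu_id=int(env.get("MUSETALK_GPU_ID", "0")),
        api_url=env.get("MUSETALK_API_URL", "http://localhost:8011"),
        batch_size=int(env.get("MUSETALK_BATCH_SIZE", "32")),
        version=env.get("MUSETALK_VERSION", "v15"),
        use_float16=_as_bool(env.get("MUSETALK_USE_FLOAT16", "true")),
        duration_threshold_s=float(env.get("LIPSYNC_DURATION_THRESHOLD", "120")),
    )
```

Parsing into explicit types up front means a typo such as `MUSETALK_BATCH_SIZE=3x` fails at startup rather than in the middle of an inference request.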

---

## Related Files

| File | Description |
|------|-------------|
| `models/MuseTalk/scripts/server.py` | FastAPI resident service (port 8011) |
| `run_musetalk.sh` | PM2 launch script |
| `backend/app/services/lipsync_service.py` | Hybrid routing + `_call_musetalk_server()` |
| `backend/app/core/config.py` | `MUSETALK_*` settings |
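
---

## Performance Optimization (server.py v2)

The first long-video test (136s, 3404 frames) took 30 minutes. Profiling showed that the bottlenecks were face detection (28%), BiSeNet compositing (22%), and I/O (17%), rather than UNet inference (17%).

### Implemented optimizations

| Optimization | Description |
|--------------|-------------|
| `MUSETALK_BATCH_SIZE` 8→32 | The RTX 3090 has ample VRAM; UNet inference speeds up ~3x |
| Read frames directly with cv2.VideoCapture | Skips the ffmpeg→PNG→imread pipeline |
| Face detection subsampling (every 5 frames) | DWPose + FaceAlignment run only on sampled frames; bboxes for in-between frames are linearly interpolated |
| BiSeNet mask caching (every 5 frames) | `get_image_prepare_material` runs every 5 frames; in-between frames reuse the result via `get_image_blending` |
| Write directly with cv2.VideoWriter | Skips per-frame PNG writes + ffmpeg re-encoding |
| Per-stage timing | Precise timing of 7 stages to make later tuning easier |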
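
Per-stage timing of the kind listed in the last row can be collected with a small context manager. This is a sketch under assumptions, not the instrumentation used in `server.py`: the `StageTimer` class and the stage names in the usage note are hypothetical.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock time per named stage."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Add to any earlier time recorded under the same stage name.
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def report(self) -> dict:
        """Return each stage's share of the total time, in percent."""
        total = sum(self.totals.values())
        if total == 0:
            return {name: 0.0 for name in self.totals}
        return {name: 100.0 * t / total for name, t in self.totals.items()}
```

Wrapping each phase as `with timer.stage("face_detect"): ...` and printing `timer.report()` at the end yields a percentage breakdown like the one quoted at the top of this section.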
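
### Tuning parameters

Adjustable at the top of `models/MuseTalk/scripts/server.py`:

```python
DETECT_EVERY = 5        # face detection interval (frames)
BLEND_CACHE_EVERY = 5   # BiSeNet mask cache interval (frames)
```

> For talking-head videos (where the face barely moves), the interpolation error at a 5-frame interval is negligible.
> For scenes with vigorous face motion, lower the interval to 2-3.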
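
To make the effect of `DETECT_EVERY` concrete, the sketch below fills in boxes for frames between two sampled detections by linear interpolation. It illustrates the idea only and is not the code in `server.py`; the function name `interpolate_bboxes` and the `(x1, y1, x2, y2)` box format are assumptions.

```python
def interpolate_bboxes(sampled: dict, num_frames: int) -> list:
    """Return one (x1, y1, x2, y2) box per frame.

    `sampled` maps frame index -> box for frames where detection actually
    ran. Frames between two samples are linearly interpolated; frames
    before the first or after the last sample reuse the nearest box.
    """
    if not sampled:
        raise ValueError("at least one sampled box is required")
    keys = sorted(sampled)
    boxes = []
    j = 0  # index into keys of the latest sample at or before frame i
    for i in range(num_frames):
        while j + 1 < len(keys) and keys[j + 1] <= i:
            j += 1
        left = keys[j]
        if i <= left or j + 1 == len(keys):
            boxes.append(tuple(sampled[left]))
            continue
        right = keys[j + 1]
        t = (i - left) / (right - left)  # position between the two samples, 0..1
        boxes.append(tuple(round(a + t * (b - a))
                           for a, b in zip(sampled[left], sampled[right])))
    return boxes
```

With `DETECT_EVERY = 5`, detection would run on frames 0, 5, 10, and so on, and a call such as `interpolate_bboxes({0: box0, 5: box5}, 6)` fills frames 1 through 4. The faster the face moves between samples, the further the true box can drift from the straight-line estimate, which is why a smaller interval suits high-motion footage.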

---

## FAQ

### huggingface-hub version conflict

```
ImportError: huggingface-hub>=0.19.3,<1.0 is required
```

**Fix**: downgrade huggingface-hub

```bash
pip install "huggingface-hub>=0.19.3,<1.0"
```

### mmcv import failure

```bash
pip uninstall mmcv mmcv-full -y
mim install "mmcv==2.0.1"
```

### Audio/video length mismatch

Already fixed in `musetalk/utils/audio_processor.py` (zero-padding logic); no further action is needed.