Init: initial code

This commit is contained in:
Kevin Wong
2026-01-14 14:39:02 +08:00
parent 41c2e3f9d3
commit 302a43a22f
44 changed files with 9999 additions and 316 deletions

models/MuseTalk/DEPLOY.md

@@ -0,0 +1,186 @@
# MuseTalk Deployment Guide
## Hardware Requirements
| Item | Minimum | Recommended |
|------|---------|-------------|
| GPU | 8GB VRAM (e.g. RTX 3060) | 24GB VRAM (e.g. RTX 3090) |
| RAM | 32GB | 64GB |
| CUDA | 11.7+ | 12.0+ |
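A quick way to confirm a machine meets the VRAM requirement is to query `nvidia-smi`. The helper below is our own illustration (not part of MuseTalk); the `nvidia-smi` query flags are standard:

```python
import subprocess

def gpu_memory_mib(query_output=None):
    """Return total VRAM in MiB for each visible GPU.

    Parses the output of
    `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`;
    pass query_output to parse a captured string instead of running nvidia-smi.
    """
    if query_output is None:
        query_output = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    return [int(line.strip()) for line in query_output.splitlines() if line.strip()]
```

A 24 GB card reports roughly 24576 MiB; anything under about 8192 MiB falls below the minimum in the table above.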
---
## 📦 Installation Steps
### 1. Clone the MuseTalk repository
```bash
# Enter the ViGent project's models directory
cd /home/rongye/ProgramFiles/ViGent/models
# Clone the MuseTalk repository
git clone https://github.com/TMElyralab/MuseTalk.git MuseTalk_repo
# Preserve our custom files
cp MuseTalk/DEPLOY.md MuseTalk_repo/
cp MuseTalk/musetalk_api.py MuseTalk_repo/
# Swap the directories
rm -rf MuseTalk
mv MuseTalk_repo MuseTalk
```
### 2. Create a virtual environment
```bash
cd /home/rongye/ProgramFiles/ViGent/models/MuseTalk
conda create -n musetalk python=3.10 -y
conda activate musetalk
```
### 3. Install PyTorch (CUDA 12.1)
```bash
# CUDA 12.1 build (works with the server's CUDA 12.8 driver)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```
### 4. Install MuseTalk dependencies
```bash
pip install -r requirements.txt
# Install the mmlab stack (required by MuseTalk)
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
```
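After the installs finish, a lightweight sanity check is to confirm that Python can locate each package before running any heavy imports. The helper below is our own sketch; the module names match the `mim install` targets above:

```python
import importlib.util

def check_installed(modules=("torch", "mmengine", "mmcv", "mmdet", "mmpose")):
    """Map each module name to whether Python can locate it on sys.path."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(("✅" if ok else "❌"), name)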
### 5. Download model weights ⬇️
> **The weight files are large (about 5 GB); make sure your network connection is stable.**
#### Option 1: Download from Hugging Face (recommended)
```bash
cd /home/rongye/ProgramFiles/ViGent/models/MuseTalk
# Install huggingface-cli
pip install huggingface_hub
# Download the MuseTalk weights (v1.5)
huggingface-cli download TMElyralab/MuseTalk \
  --local-dir ./models/musetalk \
  --include "*.pth" "*.json"
# Download the MuseTalk V15 weights
huggingface-cli download TMElyralab/MuseTalk \
  --local-dir ./models/musetalkV15 \
  --include "unet.pth"
# Download the SD-VAE model (Stable Diffusion VAE)
huggingface-cli download stabilityai/sd-vae-ft-mse \
  --local-dir ./models/sd-vae-ft-mse
# Download the Whisper model (for audio feature extraction)
# MuseTalk uses whisper-tiny
huggingface-cli download openai/whisper-tiny \
  --local-dir ./models/whisper
```
#### Option 2: Manual download
Download from the links below and place the files in the corresponding directories:
| Model | Download link | Target path |
|------|----------|----------|
| MuseTalk | [Hugging Face](https://huggingface.co/TMElyralab/MuseTalk) | `models/MuseTalk/models/musetalk/` |
| MuseTalk V15 | same as above | `models/MuseTalk/models/musetalkV15/` |
| SD-VAE | [Hugging Face](https://huggingface.co/stabilityai/sd-vae-ft-mse) | `models/MuseTalk/models/sd-vae-ft-mse/` |
| Whisper | [Hugging Face](https://huggingface.co/openai/whisper-tiny) | `models/MuseTalk/models/whisper/` |
| DWPose | per the official README | `models/MuseTalk/models/dwpose/` |
| Face Parse | per the official README | `models/MuseTalk/models/face-parse-bisent/` |
### 6. Verify the installation
```bash
cd /home/rongye/ProgramFiles/ViGent/models/MuseTalk
conda activate musetalk
# Test inference (on GPU 1)
CUDA_VISIBLE_DEVICES=1 python -m scripts.inference \
  --version v15 \
  --inference_config configs/inference/test.yaml \
  --result_dir ./results \
  --use_float16
```
---
## 📂 Directory Layout
Directory structure after installation:
```
models/MuseTalk/
├── configs/
│   └── inference/
├── models/                    # ⬅️ weight files
│   ├── musetalk/              # MuseTalk base weights
│   │   ├── config.json
│   │   └── pytorch_model.bin
│   ├── musetalkV15/           # v1.5 UNet
│   │   └── unet.pth
│   ├── sd-vae-ft-mse/         # Stable Diffusion VAE
│   │   └── diffusion_pytorch_model.bin
│   ├── whisper/               # Whisper model
│   ├── dwpose/                # pose detection
│   └── face-parse-bisent/     # face parsing
├── musetalk/                  # MuseTalk source
├── scripts/
│   └── inference.py
├── DEPLOY.md                  # this document
└── musetalk_api.py            # API service
```
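Before running inference it is worth checking that the key weight files from the tree above are actually in place. A small sketch (the file list mirrors the tree and may need adjusting to the exact checkpoints you downloaded):

```python
from pathlib import Path

# Key files from the directory tree above (adjust if your checkpoints differ)
EXPECTED_WEIGHTS = [
    "models/musetalk/config.json",
    "models/musetalk/pytorch_model.bin",
    "models/musetalkV15/unet.pth",
    "models/sd-vae-ft-mse/diffusion_pytorch_model.bin",
]

def missing_weights(root="."):
    """Return the expected weight files that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_WEIGHTS if not (root / rel).exists()]

if __name__ == "__main__":
    gaps = missing_weights()
    print("✅ all weights present" if not gaps else f"❌ missing: {gaps}")
```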
---
## 🔧 ViGent Integration
### Environment variables
Set these in `/home/rongye/ProgramFiles/ViGent/backend/.env`:
```bash
# MuseTalk settings
MUSETALK_LOCAL=true
MUSETALK_GPU_ID=1
MUSETALK_VERSION=v15
MUSETALK_USE_FLOAT16=true
MUSETALK_BATCH_SIZE=8
```
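On the backend side, these variables can be read with a small stdlib helper. Names and defaults follow the `.env` block above; the function itself is an illustration, not ViGent code:

```python
import os

def musetalk_settings(env=None):
    """Parse the MUSETALK_* variables shown above, with the same defaults."""
    env = os.environ if env is None else env
    truthy = lambda v: str(v).strip().lower() in ("1", "true", "yes")
    return {
        "local": truthy(env.get("MUSETALK_LOCAL", "true")),
        "gpu_id": int(env.get("MUSETALK_GPU_ID", "1")),
        "version": env.get("MUSETALK_VERSION", "v15"),
        "use_float16": truthy(env.get("MUSETALK_USE_FLOAT16", "true")),
        "batch_size": int(env.get("MUSETALK_BATCH_SIZE", "8")),
    }
```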
### Start the backend service
```bash
cd /home/rongye/ProgramFiles/ViGent/backend
source venv/bin/activate
# Pin the GPU and start
CUDA_VISIBLE_DEVICES=1 uvicorn app.main:app --host 0.0.0.0 --port 8000
```
---
## 🚨 FAQ
### Q1: CUDA out of memory
**Fix**: lower `MUSETALK_BATCH_SIZE`, or enable `MUSETALK_USE_FLOAT16=true`.
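One generic way to make the batch-size reduction automatic is to retry inference with a halved batch whenever an out-of-memory error is raised. The sketch below is our own; the callable and exception type are placeholders (with recent PyTorch you would catch `torch.cuda.OutOfMemoryError`):

```python
def run_with_smaller_batches(infer, batch_size, oom_error=MemoryError, min_batch=1):
    """Call infer(batch_size), halving batch_size on OOM until min_batch."""
    while batch_size >= min_batch:
        try:
            return infer(batch_size)
        except oom_error:
            batch_size //= 2
    raise RuntimeError("batch size fell below the minimum without fitting in memory")
```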
### Q2: mmcv fails to install
**Fix**: make sure your CUDA version matches, and use `mim install mmcv==2.0.1`.
### Q3: Whisper fails to load
**Fix**: check that `models/whisper/` contains the complete model files.

models/MuseTalk/musetalk_api.py

@@ -0,0 +1,157 @@
"""
MuseTalk API 服务
这个脚本将 MuseTalk 封装为 FastAPI 服务,
可以独立部署在 GPU 服务器上。
用法:
python musetalk_api.py --port 8001
"""
import os
import sys
import argparse
import tempfile
import shutil
from pathlib import Path
from typing import Optional
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
from fastapi.responses import FileResponse
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
# 添加 MuseTalk 路径
MUSETALK_DIR = Path(__file__).parent
sys.path.insert(0, str(MUSETALK_DIR))
app = FastAPI(
    title="MuseTalk API",
    description="Lip-sync inference service",
    version="0.1.0"
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global model instance (lazy-loaded)
_model = None


def get_model():
    """Lazily load the MuseTalk model."""
    global _model
    if _model is None:
        print("🔄 Loading MuseTalk model...")
        # TODO: adjust to the actual MuseTalk API
        # from musetalk.inference import MuseTalkInference
        # _model = MuseTalkInference()
        print("✅ MuseTalk model loaded")
    return _model
@app.get("/")
async def root():
return {"name": "MuseTalk API", "status": "ok"}
@app.get("/health")
async def health():
"""健康检查"""
return {"status": "healthy", "gpu": True}
@app.post("/lipsync")
async def lipsync(
video: UploadFile = File(..., description="输入视频文件"),
audio: UploadFile = File(..., description="音频文件"),
fps: int = Form(25, description="输出帧率")
):
"""
唇形同步推理
Args:
video: 输入视频 (静态人物)
audio: 驱动音频
fps: 输出帧率
Returns:
生成的视频文件
"""
# 创建临时目录
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)
# 保存上传的文件
video_path = tmpdir / "input_video.mp4"
audio_path = tmpdir / "input_audio.wav"
output_path = tmpdir / "output.mp4"
with open(video_path, "wb") as f:
shutil.copyfileobj(video.file, f)
with open(audio_path, "wb") as f:
shutil.copyfileobj(audio.file, f)
try:
# 执行唇形同步
model = get_model()
# TODO: 调用实际的 MuseTalk 推理
# result = model.inference(
# source_video=str(video_path),
# driving_audio=str(audio_path),
# output_path=str(output_path),
# fps=fps
# )
# 临时: 使用 subprocess 调用 MuseTalk CLI
import subprocess
cmd = [
sys.executable, "-m", "scripts.inference",
"--video_path", str(video_path),
"--audio_path", str(audio_path),
"--output_path", str(output_path),
]
result = subprocess.run(
cmd,
cwd=str(MUSETALK_DIR),
capture_output=True,
text=True
)
if result.returncode != 0:
raise RuntimeError(f"MuseTalk 推理失败: {result.stderr}")
if not output_path.exists():
raise RuntimeError("输出文件不存在")
# 返回生成的视频
# 需要先复制到持久化位置
final_output = Path("outputs") / f"lipsync_{video.filename}"
final_output.parent.mkdir(exist_ok=True)
shutil.copy(output_path, final_output)
return FileResponse(
final_output,
media_type="video/mp4",
filename=f"lipsync_{video.filename}"
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8001)
    parser.add_argument("--host", type=str, default="0.0.0.0")
    args = parser.parse_args()
    print(f"🚀 MuseTalk API listening at http://{args.host}:{args.port}")
    uvicorn.run(app, host=args.host, port=args.port)
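As a usage example, the `/lipsync` endpoint can be exercised from the standard library alone (no `requests` dependency). The multipart encoder below is hand-rolled for illustration, and the host/port are assumptions matching the defaults above:

```python
import io
import uuid
import urllib.request
from pathlib import Path

def encode_multipart(fields, files):
    """Build a multipart/form-data body; files is [(name, filename, bytes)]."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, filename, data in files:
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        )
        buf.write(data)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

def lipsync(base_url, video_path, audio_path, fps=25):
    """POST video + audio to /lipsync and return the raw mp4 bytes."""
    boundary, body = encode_multipart(
        {"fps": str(fps)},
        [("video", Path(video_path).name, Path(video_path).read_bytes()),
         ("audio", Path(audio_path).name, Path(audio_path).read_bytes())],
    )
    req = urllib.request.Request(
        f"{base_url}/lipsync",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, `lipsync("http://localhost:8001", "face.mp4", "speech.wav")` would return the generated video bytes once the service is running.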