Init: initial code

This commit is contained in:
Kevin Wong
2026-01-14 14:39:02 +08:00
parent 41c2e3f9d3
commit 302a43a22f
44 changed files with 9999 additions and 316 deletions

models/MuseTalk/DEPLOY.md

@@ -0,0 +1,186 @@
# MuseTalk Deployment Guide
## Hardware Requirements
| Item | Minimum | Recommended |
|------|---------|-------------|
| GPU | 8GB VRAM (e.g. RTX 3060) | 24GB VRAM (e.g. RTX 3090) |
| RAM | 32GB | 64GB |
| CUDA | 11.7+ | 12.0+ |
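A quick way to confirm a machine meets the VRAM requirement is to query `nvidia-smi`. The helper below is our own illustration (not part of MuseTalk); the `nvidia-smi` query flags are standard:

```python
import subprocess

def gpu_memory_mib(query_output=None):
    """Return total VRAM in MiB for each visible GPU.

    Parses the output of
    `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`;
    pass query_output to parse a captured string instead of running nvidia-smi.
    """
    if query_output is None:
        query_output = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    return [int(line.strip()) for line in query_output.splitlines() if line.strip()]
```

A 24 GB card reports roughly 24576 MiB; anything under about 8192 MiB falls below the minimum in the table above.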
---
## 📦 Installation Steps
### 1. Clone the MuseTalk repository
```bash
# Enter the ViGent project's models directory
cd /home/rongye/ProgramFiles/ViGent/models
# Clone the MuseTalk repository
git clone https://github.com/TMElyralab/MuseTalk.git MuseTalk_repo
# Preserve our custom files
cp MuseTalk/DEPLOY.md MuseTalk_repo/
cp MuseTalk/musetalk_api.py MuseTalk_repo/
# Swap the directories
rm -rf MuseTalk
mv MuseTalk_repo MuseTalk
```
### 2. Create a virtual environment
```bash
cd /home/rongye/ProgramFiles/ViGent/models/MuseTalk
conda create -n musetalk python=3.10 -y
conda activate musetalk
```
### 3. Install PyTorch (CUDA 12.1)
```bash
# CUDA 12.1 build (works with the server's CUDA 12.8 driver)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
```
### 4. Install MuseTalk dependencies
```bash
pip install -r requirements.txt
# Install the mmlab stack (required by MuseTalk)
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
```
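After the installs finish, a lightweight sanity check is to confirm that Python can locate each package before running any heavy imports. The helper below is our own sketch; the module names match the `mim install` targets above:

```python
import importlib.util

def check_installed(modules=("torch", "mmengine", "mmcv", "mmdet", "mmpose")):
    """Map each module name to whether Python can locate it on sys.path."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(("✅" if ok else "❌"), name)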
### 5. Download model weights ⬇️
> **The weight files are large (about 5 GB); make sure your network connection is stable.**
#### Option 1: Download from Hugging Face (recommended)
```bash
cd /home/rongye/ProgramFiles/ViGent/models/MuseTalk
# Install huggingface-cli
pip install huggingface_hub
# Download the MuseTalk weights (v1.5)
huggingface-cli download TMElyralab/MuseTalk \
  --local-dir ./models/musetalk \
  --include "*.pth" "*.json"
# Download the MuseTalk V15 weights
huggingface-cli download TMElyralab/MuseTalk \
  --local-dir ./models/musetalkV15 \
  --include "unet.pth"
# Download the SD-VAE model (Stable Diffusion VAE)
huggingface-cli download stabilityai/sd-vae-ft-mse \
  --local-dir ./models/sd-vae-ft-mse
# Download the Whisper model (for audio feature extraction)
# MuseTalk uses whisper-tiny
huggingface-cli download openai/whisper-tiny \
  --local-dir ./models/whisper
```
#### Option 2: Manual download
Download from the links below and place the files in the corresponding directories:
| Model | Download link | Target path |
|------|----------|----------|
| MuseTalk | [Hugging Face](https://huggingface.co/TMElyralab/MuseTalk) | `models/MuseTalk/models/musetalk/` |
| MuseTalk V15 | same as above | `models/MuseTalk/models/musetalkV15/` |
| SD-VAE | [Hugging Face](https://huggingface.co/stabilityai/sd-vae-ft-mse) | `models/MuseTalk/models/sd-vae-ft-mse/` |
| Whisper | [Hugging Face](https://huggingface.co/openai/whisper-tiny) | `models/MuseTalk/models/whisper/` |
| DWPose | per the official README | `models/MuseTalk/models/dwpose/` |
| Face Parse | per the official README | `models/MuseTalk/models/face-parse-bisent/` |
### 6. Verify the installation
```bash
cd /home/rongye/ProgramFiles/ViGent/models/MuseTalk
conda activate musetalk
# Test inference (on GPU 1)
CUDA_VISIBLE_DEVICES=1 python -m scripts.inference \
  --version v15 \
  --inference_config configs/inference/test.yaml \
  --result_dir ./results \
  --use_float16
```
---
## 📂 Directory Layout
Directory structure after installation:
```
models/MuseTalk/
├── configs/
│   └── inference/
├── models/                    # ⬅️ weight files
│   ├── musetalk/              # MuseTalk base weights
│   │   ├── config.json
│   │   └── pytorch_model.bin
│   ├── musetalkV15/           # v1.5 UNet
│   │   └── unet.pth
│   ├── sd-vae-ft-mse/         # Stable Diffusion VAE
│   │   └── diffusion_pytorch_model.bin
│   ├── whisper/               # Whisper model
│   ├── dwpose/                # pose detection
│   └── face-parse-bisent/     # face parsing
├── musetalk/                  # MuseTalk source
├── scripts/
│   └── inference.py
├── DEPLOY.md                  # this document
└── musetalk_api.py            # API service
```
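Before running inference it is worth checking that the key weight files from the tree above are actually in place. A small sketch (the file list mirrors the tree and may need adjusting to the exact checkpoints you downloaded):

```python
from pathlib import Path

# Key files from the directory tree above (adjust if your checkpoints differ)
EXPECTED_WEIGHTS = [
    "models/musetalk/config.json",
    "models/musetalk/pytorch_model.bin",
    "models/musetalkV15/unet.pth",
    "models/sd-vae-ft-mse/diffusion_pytorch_model.bin",
]

def missing_weights(root="."):
    """Return the expected weight files that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_WEIGHTS if not (root / rel).exists()]

if __name__ == "__main__":
    gaps = missing_weights()
    print("✅ all weights present" if not gaps else f"❌ missing: {gaps}")
```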
---
## 🔧 ViGent Integration
### Environment variables
Set these in `/home/rongye/ProgramFiles/ViGent/backend/.env`:
```bash
# MuseTalk settings
MUSETALK_LOCAL=true
MUSETALK_GPU_ID=1
MUSETALK_VERSION=v15
MUSETALK_USE_FLOAT16=true
MUSETALK_BATCH_SIZE=8
```
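On the backend side, these variables can be read with a small stdlib helper. Names and defaults follow the `.env` block above; the function itself is an illustration, not ViGent code:

```python
import os

def musetalk_settings(env=None):
    """Parse the MUSETALK_* variables shown above, with the same defaults."""
    env = os.environ if env is None else env
    truthy = lambda v: str(v).strip().lower() in ("1", "true", "yes")
    return {
        "local": truthy(env.get("MUSETALK_LOCAL", "true")),
        "gpu_id": int(env.get("MUSETALK_GPU_ID", "1")),
        "version": env.get("MUSETALK_VERSION", "v15"),
        "use_float16": truthy(env.get("MUSETALK_USE_FLOAT16", "true")),
        "batch_size": int(env.get("MUSETALK_BATCH_SIZE", "8")),
    }
```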
### Start the backend service
```bash
cd /home/rongye/ProgramFiles/ViGent/backend
source venv/bin/activate
# Pin the GPU and start
CUDA_VISIBLE_DEVICES=1 uvicorn app.main:app --host 0.0.0.0 --port 8000
```
---
## 🚨 FAQ
### Q1: CUDA out of memory
**Fix**: lower `MUSETALK_BATCH_SIZE`, or enable `MUSETALK_USE_FLOAT16=true`.
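One generic way to make the batch-size reduction automatic is to retry inference with a halved batch whenever an out-of-memory error is raised. The sketch below is our own; the callable and exception type are placeholders (with recent PyTorch you would catch `torch.cuda.OutOfMemoryError`):

```python
def run_with_smaller_batches(infer, batch_size, oom_error=MemoryError, min_batch=1):
    """Call infer(batch_size), halving batch_size on OOM until min_batch."""
    while batch_size >= min_batch:
        try:
            return infer(batch_size)
        except oom_error:
            batch_size //= 2
    raise RuntimeError("batch size fell below the minimum without fitting in memory")
```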
### Q2: mmcv fails to install
**Fix**: make sure your CUDA version matches, and use `mim install mmcv==2.0.1`.
### Q3: Whisper fails to load
**Fix**: check that `models/whisper/` contains the complete model files.

models/MuseTalk/musetalk_api.py

@@ -0,0 +1,157 @@
"""
MuseTalk API 服务
这个脚本将 MuseTalk 封装为 FastAPI 服务,
可以独立部署在 GPU 服务器上。
用法:
python musetalk_api.py --port 8001
"""
import os
import sys
import argparse
import tempfile
import shutil
from pathlib import Path
from typing import Optional
from fastapi import FastAPI, UploadFile, File, Form, HTTPException
from fastapi.responses import FileResponse
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
# 添加 MuseTalk 路径
MUSETALK_DIR = Path(__file__).parent
sys.path.insert(0, str(MUSETALK_DIR))
app = FastAPI(
    title="MuseTalk API",
    description="Lip-sync inference service",
    version="0.1.0"
)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Global model instance (lazy-loaded)
_model = None


def get_model():
    """Lazily load the MuseTalk model."""
    global _model
    if _model is None:
        print("🔄 Loading MuseTalk model...")
        # TODO: adjust to the actual MuseTalk API
        # from musetalk.inference import MuseTalkInference
        # _model = MuseTalkInference()
        print("✅ MuseTalk model loaded")
    return _model
@app.get("/")
async def root():
return {"name": "MuseTalk API", "status": "ok"}
@app.get("/health")
async def health():
"""健康检查"""
return {"status": "healthy", "gpu": True}
@app.post("/lipsync")
async def lipsync(
video: UploadFile = File(..., description="输入视频文件"),
audio: UploadFile = File(..., description="音频文件"),
fps: int = Form(25, description="输出帧率")
):
"""
唇形同步推理
Args:
video: 输入视频 (静态人物)
audio: 驱动音频
fps: 输出帧率
Returns:
生成的视频文件
"""
# 创建临时目录
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)
# 保存上传的文件
video_path = tmpdir / "input_video.mp4"
audio_path = tmpdir / "input_audio.wav"
output_path = tmpdir / "output.mp4"
with open(video_path, "wb") as f:
shutil.copyfileobj(video.file, f)
with open(audio_path, "wb") as f:
shutil.copyfileobj(audio.file, f)
try:
# 执行唇形同步
model = get_model()
# TODO: 调用实际的 MuseTalk 推理
# result = model.inference(
# source_video=str(video_path),
# driving_audio=str(audio_path),
# output_path=str(output_path),
# fps=fps
# )
# 临时: 使用 subprocess 调用 MuseTalk CLI
import subprocess
cmd = [
sys.executable, "-m", "scripts.inference",
"--video_path", str(video_path),
"--audio_path", str(audio_path),
"--output_path", str(output_path),
]
result = subprocess.run(
cmd,
cwd=str(MUSETALK_DIR),
capture_output=True,
text=True
)
if result.returncode != 0:
raise RuntimeError(f"MuseTalk 推理失败: {result.stderr}")
if not output_path.exists():
raise RuntimeError("输出文件不存在")
# 返回生成的视频
# 需要先复制到持久化位置
final_output = Path("outputs") / f"lipsync_{video.filename}"
final_output.parent.mkdir(exist_ok=True)
shutil.copy(output_path, final_output)
return FileResponse(
final_output,
media_type="video/mp4",
filename=f"lipsync_{video.filename}"
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8001)
    parser.add_argument("--host", type=str, default="0.0.0.0")
    args = parser.parse_args()
    print(f"🚀 MuseTalk API listening at http://{args.host}:{args.port}")
    uvicorn.run(app, host=args.host, port=args.port)
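As a usage example, the `/lipsync` endpoint can be exercised from the standard library alone (no `requests` dependency). The multipart encoder below is hand-rolled for illustration, and the host/port are assumptions matching the defaults above:

```python
import io
import uuid
import urllib.request
from pathlib import Path

def encode_multipart(fields, files):
    """Build a multipart/form-data body; files is [(name, filename, bytes)]."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, filename, data in files:
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"; filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        )
        buf.write(data)
        buf.write(b"\r\n")
    buf.write(f"--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()

def lipsync(base_url, video_path, audio_path, fps=25):
    """POST video + audio to /lipsync and return the raw mp4 bytes."""
    boundary, body = encode_multipart(
        {"fps": str(fps)},
        [("video", Path(video_path).name, Path(video_path).read_bytes()),
         ("audio", Path(audio_path).name, Path(audio_path).read_bytes())],
    )
    req = urllib.request.Request(
        f"{base_url}/lipsync",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, `lipsync("http://localhost:8001", "face.mp4", "speech.wav")` would return the generated video bytes once the service is running.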