# Qwen3-TTS 1.7B Deployment Guide

> This document describes how to deploy the Qwen3-TTS 1.7B-Base voice cloning model on an Ubuntu server.

## System Requirements

| Requirement | Specification |
|------|------|
| GPU | NVIDIA RTX 3090 24GB (or better) |
| VRAM | ≥ 8GB (inference), ≥ 12GB (with flash-attn) |
| CUDA | 12.1+ |
| Python | 3.10.x |
| OS | Ubuntu 20.04+ |

---

## GPU Allocation

| GPU | Service | Model |
|-----|------|------|
| GPU0 | **Qwen3-TTS** | 1.7B-Base (voice cloning, higher quality) |
| GPU1 | LatentSync | 1.6 (lip sync) |

---

## Step 1: Clone the Repository

```bash
cd /home/rongye/ProgramFiles/ViGent2/models
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS
```

---

## Step 2: Create a Conda Environment

```bash
# Create a fresh conda environment
conda create -n qwen-tts python=3.10 -y
conda activate qwen-tts
```

---

## Step 3: Install Python Dependencies

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS

# Install the qwen-tts package (editable mode)
pip install -e .

# Install the sox audio processing library (required)
conda install -y -c conda-forge sox
```

### Optional: Install FlashAttention (recommended)

FlashAttention can significantly speed up inference and reduce VRAM usage:

```bash
pip install -U flash-attn --no-build-isolation
```

If the build runs out of memory, limit compilation concurrency:

```bash
MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
```

---

## Step 4: Download Model Weights

### Option A: ModelScope (recommended; faster inside China)

```bash
pip install modelscope

# Download the tokenizer (651MB)
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./checkpoints/Tokenizer

# Download the 1.7B-Base model (6.8GB)
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./checkpoints/1.7B-Base
```

### Option B: HuggingFace

```bash
pip install -U "huggingface_hub[cli]"

huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./checkpoints/Tokenizer
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./checkpoints/1.7B-Base
```

After downloading, the directory layout should look like:

```
checkpoints/
├── Tokenizer/           # ~651MB
│   ├── config.json
│   ├── model.safetensors
│   └── ...
└── 1.7B-Base/           # ~6.8GB
    ├── config.json
    ├── model.safetensors
    └── ...
```

---

## Step 5: Verify the Installation

### 5.1 Check the Environment

```bash
conda activate qwen-tts

# Check PyTorch and CUDA
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"

# Check sox
sox --version
```

### 5.2 Run an Inference Test

Create a test script `test_inference.py`:

```python
"""Qwen3-TTS voice cloning test."""
import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

print("Loading Qwen3-TTS model on GPU:0...")
model = Qwen3TTSModel.from_pretrained(
    "./checkpoints/1.7B-Base",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
print("Model loaded!")

# Test voice cloning (a reference audio clip must be prepared)
ref_audio = "./examples/myvoice.wav"  # 3-20 s reference clip
ref_text = "参考音频的文字内容"  # transcript of the reference audio

test_text = "这是一段测试文本,用于验证声音克隆功能是否正常工作。"

print("Generating cloned voice...")
wavs, sr = model.generate_voice_clone(
    text=test_text,
    language="Chinese",
    ref_audio=ref_audio,
    ref_text=ref_text,
)

sf.write("test_output.wav", wavs[0], sr)
print(f"✅ Saved: test_output.wav | {sr}Hz | {len(wavs[0])/sr:.2f}s")
```

Run the test:

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
python test_inference.py
```
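
A reference clip outside the 3-20 second window is a common source of failures, so it can be worth validating it up front. Here is a small stdlib check; `ref_audio_duration` and `is_valid_ref` are helper names introduced for illustration, not part of qwen_tts:

```python
"""Validate that a reference WAV clip is within the 3-20 s window (sketch)."""
import math
import struct
import wave

def ref_audio_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def is_valid_ref(path: str, lo: float = 3.0, hi: float = 20.0) -> bool:
    """Check that the clip length falls inside [lo, hi] seconds."""
    return lo <= ref_audio_duration(path) <= hi

# Synthesize a 5-second, 440 Hz, 16-bit mono clip just to demonstrate the check
sr = 24000
with wave.open("demo_ref.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit PCM
    w.setframerate(sr)
    w.writeframes(b"".join(
        struct.pack("<h", int(3000 * math.sin(2 * math.pi * 440 * t / sr)))
        for t in range(sr * 5)
    ))

print(is_valid_ref("demo_ref.wav"))  # → True
```
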

---

## Step 6: Install HTTP Service Dependencies

```bash
conda activate qwen-tts
pip install fastapi uvicorn python-multipart
```
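
The server file `qwen_tts_server.py` itself is not reproduced in this guide. As a rough sketch only — the endpoint shapes follow the API Reference section and the model calls mirror `test_inference.py`; the shipped file may differ:

```python
"""Hypothetical minimal sketch of qwen_tts_server.py (not the shipped file)."""
import io
import tempfile

import soundfile as sf
import torch
import uvicorn
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import Response

from qwen_tts import Qwen3TTSModel

app = FastAPI()

# Load the model once at startup, pinned to GPU0 per the GPU allocation table
model = Qwen3TTSModel.from_pretrained(
    "./checkpoints/1.7B-Base", device_map="cuda:0", dtype=torch.bfloat16
)

@app.get("/health")
def health():
    return {"service": "Qwen3-TTS Voice Clone", "model": "1.7B-Base",
            "ready": True, "gpu_id": 0}

@app.post("/generate")
async def generate(
    ref_audio: UploadFile = File(...),
    text: str = Form(...),
    ref_text: str = Form(...),
    language: str = Form("Chinese"),
):
    # Persist the uploaded reference clip to a temp file for the model
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await ref_audio.read())
        tmp.flush()
        wavs, sr = model.generate_voice_clone(
            text=text, language=language, ref_audio=tmp.name, ref_text=ref_text
        )
    # Return the generated audio as a WAV response body
    buf = io.BytesIO()
    sf.write(buf, wavs[0], sr, format="WAV")
    return Response(content=buf.getvalue(), media_type="audio/wav")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8009)
```
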

---

## Step 7: Start the Service (Managed by PM2)

### Manual Test

```bash
conda activate qwen-tts
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
python qwen_tts_server.py
```

Visit http://localhost:8009/health to verify the service status.

### PM2 Persistent Service

> ⚠️ **Note**: The launch script `run_qwen_tts.sh` lives in the project **root**, not in the models/Qwen3-TTS directory.

1. Start via the launch script:

```bash
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
pm2 save
```

2. View logs:

```bash
pm2 logs vigent2-qwen-tts
```

3. Restart the service:

```bash
pm2 restart vigent2-qwen-tts
```
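
If `run_qwen_tts.sh` ever needs to be recreated, a minimal launcher might look like the following; the conda install path is an assumption, so adjust it to your environment:

```shell
#!/usr/bin/env bash
# run_qwen_tts.sh — minimal launcher sketch (conda path is an assumption)
source "$HOME/miniconda3/etc/profile.d/conda.sh"
conda activate qwen-tts
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
exec python qwen_tts_server.py
```
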

---

## Directory Layout

After deployment, the directory structure should look like:

```
/home/rongye/ProgramFiles/ViGent2/
├── run_qwen_tts.sh              # PM2 launch script (project root)
└── models/Qwen3-TTS/
    ├── checkpoints/
    │   ├── Tokenizer/           # speech codec
    │   └── 1.7B-Base/           # voice cloning model (higher quality)
    ├── qwen_tts/                # source code
    │   ├── inference/
    │   ├── models/
    │   └── ...
    ├── examples/
    │   └── myvoice.wav          # reference audio
    ├── qwen_tts_server.py       # HTTP inference service (port 8009)
    ├── pyproject.toml
    ├── requirements.txt
    └── test_inference.py        # test script
```

---

## API Reference

### Health Check

```
GET http://localhost:8009/health
```

Response:

```json
{
  "service": "Qwen3-TTS Voice Clone",
  "model": "1.7B-Base",
  "ready": true,
  "gpu_id": 0
}
```

### Voice-Clone Generation

```
POST http://localhost:8009/generate
Content-Type: multipart/form-data

Fields:
- ref_audio: reference audio file (WAV)
- text: text to synthesize
- ref_text: transcript of the reference audio
- language: language (default: Chinese)

Response: audio/wav file
```
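
For a quick smoke test from the command line (the reference file and text strings below are placeholders):

```shell
# Health check
curl http://localhost:8009/health

# Voice-clone generation; save the returned WAV
curl -X POST http://localhost:8009/generate \
  -F "ref_audio=@./examples/myvoice.wav" \
  -F "text=这是一段测试文本。" \
  -F "ref_text=参考音频的转写文字" \
  -F "language=Chinese" \
  -o cloned.wav
```
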

---

## Model Notes

### Available Models

| Model | Capability | Size |
|------|------|------|
| 0.6B-Base | 3-second fast voice cloning | 2.4GB |
| 0.6B-CustomVoice | 9 preset voices | 2.4GB |
| **1.7B-Base** | **Voice cloning (higher quality)** ✅ currently deployed | 6.8GB |
| 1.7B-VoiceDesign | Voice generation from natural-language descriptions | 6.8GB |

### Supported Languages

Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian

---

## Troubleshooting

### sox Not Found

```
SoX could not be found!
```

**Fix**: Install sox via conda:

```bash
conda install -y -c conda-forge sox
```

### CUDA Out of Memory

Qwen3-TTS 1.7B typically needs 8-10GB of VRAM. If you hit OOM:

1. Make sure nothing else is running on GPU0
2. Skip flash-attn (it increases VRAM usage)
3. Use a shorter reference clip (3-5 s)
4. If VRAM is still insufficient, fall back to the 0.6B-Base model

### Model Fails to Load

Make sure these files exist:
- `checkpoints/1.7B-Base/config.json`
- `checkpoints/1.7B-Base/model.safetensors`

### Poor Output Audio Quality

1. Reference audio quality: use a clean, noise-free 3-10 s clip
2. ref_text accuracy: the transcript of the reference audio must be exact
3. Language setting: make sure the `language` parameter matches the language of the text

---

## ViGent2 Backend Integration

### Voice Cloning Service (`voice_clone_service.py`)

The backend calls the Qwen3-TTS service over HTTP. Note that `ref_text` (and optionally `language`) must be sent along with the audio, matching the `/generate` fields above:

```python
import aiohttp

QWEN_TTS_URL = "http://localhost:8009"

async def generate_cloned_audio(
    ref_audio_path: str,
    text: str,
    ref_text: str,
    output_path: str,
    language: str = "Chinese",
):
    async with aiohttp.ClientSession() as session:
        with open(ref_audio_path, "rb") as f:
            data = aiohttp.FormData()
            data.add_field("ref_audio", f, filename="ref.wav")
            data.add_field("text", text)
            data.add_field("ref_text", ref_text)  # transcript of the reference clip
            data.add_field("language", language)

            async with session.post(f"{QWEN_TTS_URL}/generate", data=data) as resp:
                resp.raise_for_status()
                audio_data = await resp.read()
                with open(output_path, "wb") as out:
                    out.write(audio_data)
                return output_path
```

### Reference-Audio Supabase Bucket

```sql
-- Create the ref-audios bucket
INSERT INTO storage.buckets (id, name, public)
VALUES ('ref-audios', 'ref-audios', true)
ON CONFLICT (id) DO NOTHING;

-- RLS policy
CREATE POLICY "Allow public uploads" ON storage.objects
FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');
```

---

## Changelog

| Date | Version | Notes |
|------|------|------|
| 2026-01-30 | 1.1.0 | Made 1.7B-Base the default model, replacing the old 0.6B paths |

---

## References

- [Qwen3-TTS GitHub](https://github.com/QwenLM/Qwen3-TTS)
- [ModelScope models](https://modelscope.cn/collections/Qwen/Qwen3-TTS)
- [HuggingFace models](https://huggingface.co/collections/Qwen/qwen3-tts)
- [Technical report](https://arxiv.org/abs/2601.15621)
- [Official blog](https://qwen.ai/blog?id=qwen3tts-0115)