# Qwen3-TTS 1.7B Deployment Guide

> This document describes how to deploy the Qwen3-TTS 1.7B-Base voice cloning model on an Ubuntu server.

## System Requirements

| Requirement | Specification |
|------|------|
| GPU | NVIDIA RTX 3090 24GB (or better) |
| VRAM | ≥ 8GB (inference), ≥ 12GB (with flash-attn) |
| CUDA | 12.1+ |
| Python | 3.10.x |
| OS | Ubuntu 20.04+ |

---

## GPU Allocation

| GPU | Service | Model |
|-----|------|------|
| GPU0 | **Qwen3-TTS** | 1.7B-Base (voice cloning, higher quality) |
| GPU1 | LatentSync | 1.6 (lip sync) |

---

## Step 1: Clone the Repository

```bash
cd /home/rongye/ProgramFiles/ViGent2/models
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS
```

---

## Step 2: Create a Conda Environment

```bash
# Create a fresh conda environment
conda create -n qwen-tts python=3.10 -y
conda activate qwen-tts
```

---

## Step 3: Install Python Dependencies

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS

# Install the qwen-tts package (editable mode)
pip install -e .

# Install the sox audio processing library (required)
conda install -y -c conda-forge sox
```

### Optional: Install FlashAttention (recommended)

FlashAttention can significantly speed up inference and reduce VRAM usage:

```bash
pip install -U flash-attn --no-build-isolation
```

If the build runs out of memory, limit compilation concurrency:

```bash
MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
```

---

## Step 4: Download Model Weights

### Option A: ModelScope (recommended; faster inside China)

```bash
pip install modelscope

# Download the tokenizer (651MB)
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./checkpoints/Tokenizer

# Download the 1.7B-Base model (6.8GB)
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./checkpoints/1.7B-Base
```

### Option B: HuggingFace

```bash
pip install -U "huggingface_hub[cli]"

huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./checkpoints/Tokenizer
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./checkpoints/1.7B-Base
```

After downloading, the directory layout should look like:

```
checkpoints/
├── Tokenizer/           # ~651MB
│   ├── config.json
│   ├── model.safetensors
│   └── ...
└── 1.7B-Base/           # ~6.8GB
    ├── config.json
    ├── model.safetensors
    └── ...
```

---

## Step 5: Verify the Installation

### 5.1 Check the Environment

```bash
conda activate qwen-tts

# Check PyTorch and CUDA
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"

# Check sox
sox --version
```

### 5.2 Run an Inference Test

Create a test script `test_inference.py`:

```python
"""Qwen3-TTS voice cloning test."""
import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

print("Loading Qwen3-TTS model on GPU:0...")
model = Qwen3TTSModel.from_pretrained(
    "./checkpoints/1.7B-Base",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
print("Model loaded!")

# Test voice cloning (a reference audio clip must be prepared)
ref_audio = "./examples/myvoice.wav"  # 3-20 s reference clip
ref_text = "参考音频的文字内容"  # transcript of the reference audio

test_text = "这是一段测试文本,用于验证声音克隆功能是否正常工作。"

print("Generating cloned voice...")
wavs, sr = model.generate_voice_clone(
    text=test_text,
    language="Chinese",
    ref_audio=ref_audio,
    ref_text=ref_text,
)

sf.write("test_output.wav", wavs[0], sr)
print(f"✅ Saved: test_output.wav | {sr}Hz | {len(wavs[0])/sr:.2f}s")
```

Run the test:

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
python test_inference.py
```
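
A reference clip outside the 3-20 second window is a common source of failures, so it can be worth validating it up front. Here is a small stdlib check; `ref_audio_duration` and `is_valid_ref` are helper names introduced for illustration, not part of qwen_tts:

```python
"""Validate that a reference WAV clip is within the 3-20 s window (sketch)."""
import math
import struct
import wave

def ref_audio_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def is_valid_ref(path: str, lo: float = 3.0, hi: float = 20.0) -> bool:
    """Check that the clip length falls inside [lo, hi] seconds."""
    return lo <= ref_audio_duration(path) <= hi

# Synthesize a 5-second, 440 Hz, 16-bit mono clip just to demonstrate the check
sr = 24000
with wave.open("demo_ref.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit PCM
    w.setframerate(sr)
    w.writeframes(b"".join(
        struct.pack("<h", int(3000 * math.sin(2 * math.pi * 440 * t / sr)))
        for t in range(sr * 5)
    ))

print(is_valid_ref("demo_ref.wav"))  # → True
```
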

---

## Step 6: Install HTTP Service Dependencies

```bash
conda activate qwen-tts
pip install fastapi uvicorn python-multipart
```
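
The server file `qwen_tts_server.py` itself is not reproduced in this guide. As a rough sketch only — the endpoint shapes follow the API Reference section and the model calls mirror `test_inference.py`; the shipped file may differ:

```python
"""Hypothetical minimal sketch of qwen_tts_server.py (not the shipped file)."""
import io
import tempfile

import soundfile as sf
import torch
import uvicorn
from fastapi import FastAPI, File, Form, UploadFile
from fastapi.responses import Response

from qwen_tts import Qwen3TTSModel

app = FastAPI()

# Load the model once at startup, pinned to GPU0 per the GPU allocation table
model = Qwen3TTSModel.from_pretrained(
    "./checkpoints/1.7B-Base", device_map="cuda:0", dtype=torch.bfloat16
)

@app.get("/health")
def health():
    return {"service": "Qwen3-TTS Voice Clone", "model": "1.7B-Base",
            "ready": True, "gpu_id": 0}

@app.post("/generate")
async def generate(
    ref_audio: UploadFile = File(...),
    text: str = Form(...),
    ref_text: str = Form(...),
    language: str = Form("Chinese"),
):
    # Persist the uploaded reference clip to a temp file for the model
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        tmp.write(await ref_audio.read())
        tmp.flush()
        wavs, sr = model.generate_voice_clone(
            text=text, language=language, ref_audio=tmp.name, ref_text=ref_text
        )
    # Return the generated audio as a WAV response body
    buf = io.BytesIO()
    sf.write(buf, wavs[0], sr, format="WAV")
    return Response(content=buf.getvalue(), media_type="audio/wav")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8009)
```
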

---

## Step 7: Start the Service (Managed by PM2)

### Manual Test

```bash
conda activate qwen-tts
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
python qwen_tts_server.py
```

Visit http://localhost:8009/health to verify the service status.

### PM2 Persistent Service

> ⚠️ **Note**: The launch script `run_qwen_tts.sh` lives in the project **root**, not in the models/Qwen3-TTS directory.

1. Start via the launch script:

```bash
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
pm2 save
```

2. View logs:

```bash
pm2 logs vigent2-qwen-tts
```

3. Restart the service:

```bash
pm2 restart vigent2-qwen-tts
```
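
If `run_qwen_tts.sh` ever needs to be recreated, a minimal launcher might look like the following; the conda install path is an assumption, so adjust it to your environment:

```shell
#!/usr/bin/env bash
# run_qwen_tts.sh — minimal launcher sketch (conda path is an assumption)
source "$HOME/miniconda3/etc/profile.d/conda.sh"
conda activate qwen-tts
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
exec python qwen_tts_server.py
```
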

---

## Directory Layout

After deployment, the directory structure should look like:

```
/home/rongye/ProgramFiles/ViGent2/
├── run_qwen_tts.sh              # PM2 launch script (project root)
└── models/Qwen3-TTS/
    ├── checkpoints/
    │   ├── Tokenizer/           # speech codec
    │   └── 1.7B-Base/           # voice cloning model (higher quality)
    ├── qwen_tts/                # source code
    │   ├── inference/
    │   ├── models/
    │   └── ...
    ├── examples/
    │   └── myvoice.wav          # reference audio
    ├── qwen_tts_server.py       # HTTP inference service (port 8009)
    ├── pyproject.toml
    ├── requirements.txt
    └── test_inference.py        # test script
```

---

## API Reference

### Health Check

```
GET http://localhost:8009/health
```

Response:

```json
{
  "service": "Qwen3-TTS Voice Clone",
  "model": "1.7B-Base",
  "ready": true,
  "gpu_id": 0
}
```

### Voice-Clone Generation

```
POST http://localhost:8009/generate
Content-Type: multipart/form-data

Fields:
- ref_audio: reference audio file (WAV)
- text: text to synthesize
- ref_text: transcript of the reference audio
- language: language (default: Chinese)

Response: audio/wav file
```
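
For a quick smoke test from the command line (the reference file and text strings below are placeholders):

```shell
# Health check
curl http://localhost:8009/health

# Voice-clone generation; save the returned WAV
curl -X POST http://localhost:8009/generate \
  -F "ref_audio=@./examples/myvoice.wav" \
  -F "text=这是一段测试文本。" \
  -F "ref_text=参考音频的转写文字" \
  -F "language=Chinese" \
  -o cloned.wav
```
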

---

## Model Notes

### Available Models

| Model | Capability | Size |
|------|------|------|
| 0.6B-Base | 3-second fast voice cloning | 2.4GB |
| 0.6B-CustomVoice | 9 preset voices | 2.4GB |
| **1.7B-Base** | **Voice cloning (higher quality)** ✅ currently deployed | 6.8GB |
| 1.7B-VoiceDesign | Voice generation from natural-language descriptions | 6.8GB |

### Supported Languages

Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian

---

## Troubleshooting

### sox Not Found

```
SoX could not be found!
```

**Fix**: Install sox via conda:

```bash
conda install -y -c conda-forge sox
```

### CUDA Out of Memory

Qwen3-TTS 1.7B typically needs 8-10GB of VRAM. If you hit OOM:

1. Make sure nothing else is running on GPU0
2. Skip flash-attn (it increases VRAM usage)
3. Use a shorter reference clip (3-5 s)
4. If VRAM is still insufficient, fall back to the 0.6B-Base model

### Model Fails to Load

Make sure these files exist:
- `checkpoints/1.7B-Base/config.json`
- `checkpoints/1.7B-Base/model.safetensors`

### Poor Output Audio Quality

1. Reference audio quality: use a clean, noise-free 3-10 s clip
2. ref_text accuracy: the transcript of the reference audio must be exact
3. Language setting: make sure the `language` parameter matches the language of the text

---

## ViGent2 Backend Integration

### Voice Cloning Service (`voice_clone_service.py`)

The backend calls the Qwen3-TTS service over HTTP. Note that `ref_text` (and optionally `language`) must be sent along with the audio, matching the `/generate` fields above:

```python
import aiohttp

QWEN_TTS_URL = "http://localhost:8009"

async def generate_cloned_audio(
    ref_audio_path: str,
    text: str,
    ref_text: str,
    output_path: str,
    language: str = "Chinese",
):
    async with aiohttp.ClientSession() as session:
        with open(ref_audio_path, "rb") as f:
            data = aiohttp.FormData()
            data.add_field("ref_audio", f, filename="ref.wav")
            data.add_field("text", text)
            data.add_field("ref_text", ref_text)  # transcript of the reference clip
            data.add_field("language", language)

            async with session.post(f"{QWEN_TTS_URL}/generate", data=data) as resp:
                resp.raise_for_status()
                audio_data = await resp.read()
                with open(output_path, "wb") as out:
                    out.write(audio_data)
                return output_path
```

### Reference-Audio Supabase Bucket

```sql
-- Create the ref-audios bucket
INSERT INTO storage.buckets (id, name, public)
VALUES ('ref-audios', 'ref-audios', true)
ON CONFLICT (id) DO NOTHING;

-- RLS policy
CREATE POLICY "Allow public uploads" ON storage.objects
FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');
```

---

## Changelog

| Date | Version | Notes |
|------|------|------|
| 2026-01-30 | 1.1.0 | Made 1.7B-Base the default model, replacing the old 0.6B paths |

---

## References

- [Qwen3-TTS GitHub](https://github.com/QwenLM/Qwen3-TTS)
- [ModelScope models](https://modelscope.cn/collections/Qwen/Qwen3-TTS)
- [HuggingFace models](https://huggingface.co/collections/Qwen/qwen3-tts)
- [Technical report](https://arxiv.org/abs/2601.15621)
- [Official blog](https://qwen.ai/blog?id=qwen3tts-0115)