Originals/ViGent2

Kevin Wong e33dfc3031 Update

2026-02-10 13:31:29 +08:00

Qwen3-TTS 1.7B Deployment Guide

This document describes how to deploy the Qwen3-TTS 1.7B-Base voice cloning model on an Ubuntu server.

System requirements

Requirement	Specification
GPU	NVIDIA RTX 3090 24GB (or better)
VRAM	≥ 8GB (inference), ≥ 12GB (with flash-attn)
CUDA	12.1+
Python	3.10.x
OS	Ubuntu 20.04+
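
As a quick way to compare a machine against the VRAM row above, the following is a minimal sketch that parses the output of `nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits`. The helper names and the choice of 8 × 1024 MiB as the cutoff are assumptions made for illustration, not part of the project.

```python
import subprocess

MIN_VRAM_MIB = 8 * 1024  # assumed cutoff: the 8GB inference minimum from the table


def parse_gpu_memory(csv_text: str) -> dict[int, int]:
    """Parse 'index, memory.total' CSV rows into {gpu_index: total_MiB}."""
    gpus: dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, total_mib = (field.strip() for field in line.split(","))
        gpus[int(index)] = int(total_mib)
    return gpus


def query_gpu_memory() -> dict[int, int]:
    """Ask nvidia-smi for the total memory of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


def gpus_below_minimum(gpus: dict[int, int], minimum_mib: int = MIN_VRAM_MIB) -> list[int]:
    """Return the indices of GPUs whose total memory is under the minimum."""
    return [index for index, total in gpus.items() if total < minimum_mib]
```

Calling `gpus_below_minimum(query_gpu_memory())` on the server should return an empty list when every GPU meets the inference minimum.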

GPU allocation

GPU	Service	Model
GPU0	Qwen3-TTS	1.7B-Base (voice cloning, higher quality)
GPU1	LatentSync	1.6 (lip sync)
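
One common way to keep each service on its own card is the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below is an assumption about how such pinning could be done, not a description of how the ViGent2 scripts do it; `pin_to_gpu` is a hypothetical helper.

```python
import os


def pin_to_gpu(physical_gpu_id: int) -> str:
    """Restrict this process to a single physical GPU.

    Call this before CUDA is initialized; the simplest rule is to call it
    before importing torch. Once only one GPU is visible, that GPU is
    always addressed as "cuda:0" inside the process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_gpu_id)
    return "cuda:0"


# Example: a process meant for GPU1 in the table would call
#   device = pin_to_gpu(1)
# and then pass `device` to its model-loading code.
```

Because the visible GPU is renumbered, a service pinned to physical GPU1 still uses the device string `cuda:0`, which is easy to misread in logs.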

Step 1: Clone the repository

cd /home/rongye/ProgramFiles/ViGent2/models
git clone https://github.com/QwenLM/Qwen3-TTS.git
cd Qwen3-TTS

Step 2: Create the Conda environment

# Create a new conda environment
conda create -n qwen-tts python=3.10 -y
conda activate qwen-tts

Step 3: Install Python dependencies

cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS

# Install the qwen-tts package (editable mode)
pip install -e .

# Install the sox audio processing tool (required)
conda install -y -c conda-forge sox

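Optional: Install FlashAttention (strongly recommended)

FlashAttention can significantly speed up the model (load time reduced by 85%). Budget extra VRAM for it: the system requirements in this guide call for ≥ 12GB with flash-attn versus ≥ 8GB without.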

pip install -U flash-attn --no-build-isolation

If the build runs out of memory, limit the number of parallel compile jobs:

MAX_JOBS=4 pip install -U flash-attn --no-build-isolation

Step 4: Download the model weights

Option A: ModelScope (recommended, faster in mainland China)

pip install modelscope

# Download the Tokenizer (651MB)
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./checkpoints/Tokenizer

# Download the 1.7B-Base model (6.8GB)
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./checkpoints/1.7B-Base

Option B: Hugging Face

pip install -U "huggingface_hub[cli]"

huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./checkpoints/Tokenizer
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./checkpoints/1.7B-Base

After the download finishes, the directory layout should look like this:

checkpoints/
├── Tokenizer/       # ~651MB
│   ├── config.json
│   ├── model.safetensors
│   └── ...
└── 1.7B-Base/       # ~6.8GB
    ├── config.json
    ├── model.safetensors
    └── ...
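
The layout above can be checked with a few lines of Python before loading the model. This is a minimal sketch: the file list covers only the entries shown explicitly in the tree, and `missing_checkpoint_files` is a hypothetical helper name.

```python
from pathlib import Path

# Only the files named explicitly in the tree above; the "..." entries are not checked.
REQUIRED_FILES = {
    "Tokenizer": ["config.json", "model.safetensors"],
    "1.7B-Base": ["config.json", "model.safetensors"],
}


def missing_checkpoint_files(checkpoints_dir: str) -> list[str]:
    """Return the relative paths of expected files that are not present."""
    root = Path(checkpoints_dir)
    return [
        f"{subdir}/{name}"
        for subdir, names in REQUIRED_FILES.items()
        for name in names
        if not (root / subdir / name).is_file()
    ]
```

Running `missing_checkpoint_files("./checkpoints")` from the Qwen3-TTS directory returns an empty list when both downloads completed.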

Step 5: Verify the installation

5.1 Check the environment

conda activate qwen-tts

# Check PyTorch and CUDA
python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')"

# Check sox
sox --version

5.2 Run an inference test

Create the test script test_inference.py:

"""Qwen3-TTS voice cloning test"""
import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel

print("Loading Qwen3-TTS model on GPU:0...")
model = Qwen3TTSModel.from_pretrained(
    "./checkpoints/1.7B-Base",
    device_map="cuda:0",
    dtype=torch.bfloat16,
)
print("Model loaded!")

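# Test voice cloning (a reference audio file must be prepared first)
ref_audio = "./examples/myvoice.wav"  # reference audio, 3-20 seconds long
ref_text = "参考音频的文字内容"  # placeholder: replace with the exact transcript of the reference audio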

test_text = "这是一段测试文本，用于验证声音克隆功能是否正常工作。"

print("Generating cloned voice...")
wavs, sr = model.generate_voice_clone(
    text=test_text,
    language="Chinese",
    ref_audio=ref_audio,
    ref_text=ref_text,
)

sf.write("test_output.wav", wavs[0], sr)
print(f"✅ Saved: test_output.wav | {sr}Hz | {len(wavs[0])/sr:.2f}s")

Run the test:

cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
python test_inference.py

Step 6: Install HTTP service dependencies

conda activate qwen-tts
pip install fastapi uvicorn python-multipart

Step 7: Start the service (managed by PM2)

Manual test

conda activate qwen-tts
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
python qwen_tts_server.py

Open http://localhost:8009/health to verify that the service is up.

Persistent service with PM2

⚠️ Note: the startup script run_qwen_tts.sh lives in the project root, not in the models/Qwen3-TTS directory.

Start it with the startup script:

cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
pm2 save

View the logs:

pm2 logs vigent2-qwen-tts

Restart the service:

pm2 restart vigent2-qwen-tts

Directory structure

After deployment, the directory layout should look like this:

/home/rongye/ProgramFiles/ViGent2/
├── run_qwen_tts.sh              # PM2 startup script (project root)
└── models/Qwen3-TTS/
    ├── checkpoints/
    │   ├── Tokenizer/           # speech codec
    │   └── 1.7B-Base/           # voice cloning model (higher quality)
    ├── qwen_tts/                # source code
    │   ├── inference/
    │   ├── models/
    │   └── ...
    ├── examples/
    │   └── myvoice.wav          # reference audio
    ├── qwen_tts_server.py       # HTTP inference service (port 8009)
    ├── pyproject.toml
    ├── requirements.txt
    └── test_inference.py        # test script

API Reference

Health check

GET http://localhost:8009/health

Response:

{
  "service": "Qwen3-TTS Voice Clone",
  "model": "1.7B-Base",
  "ready": true,
  "gpu_id": 0
}
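
A deployment script can poll this endpoint and wait for `ready` to become true before sending work. The sketch below uses only the standard library; the retry count, delay, and function names are illustrative assumptions.

```python
import json
import time
import urllib.request

HEALTH_URL = "http://localhost:8009/health"  # endpoint documented above


def is_ready(payload: dict) -> bool:
    """Return True only if the health payload reports ready == true."""
    return payload.get("ready") is True


def wait_until_ready(url: str = HEALTH_URL, attempts: int = 30,
                     delay_s: float = 2.0) -> dict:
    """Poll the health endpoint until the model is loaded, else raise TimeoutError."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                payload = json.load(resp)
            if is_ready(payload):
                return payload
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts;
            # ValueError covers a body that is not valid JSON.
            pass
        time.sleep(delay_s)
    raise TimeoutError(f"{url} did not report ready after {attempts} attempts")
```

Checking `ready` rather than just the HTTP status matters if the server answers requests while the model is still loading; whether qwen_tts_server.py behaves that way is not stated in this guide.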

Voice clone generation

POST http://localhost:8009/generate
Content-Type: multipart/form-data

Fields:
  - ref_audio: reference audio file (WAV)
  - text: text to synthesize
  - ref_text: transcript of the reference audio
  - language: language (default: Chinese)

Response: audio/wav file
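
For a quick manual call without extra dependencies, the request can be built with the standard library alone. This is a sketch under the assumption that the server accepts a plain multipart/form-data body with the four fields listed above; `encode_multipart` and `clone_voice` are hypothetical helper names.

```python
import urllib.request
import uuid
from pathlib import Path

GENERATE_URL = "http://localhost:8009/generate"  # endpoint documented above


def encode_multipart(fields: dict[str, str], file_field: str,
                     filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one WAV file; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def clone_voice(ref_wav: str, text: str, ref_text: str, out_path: str,
                language: str = "Chinese", timeout: float = 300.0) -> str:
    """POST the reference audio and text, then save the returned WAV."""
    body, content_type = encode_multipart(
        {"text": text, "ref_text": ref_text, "language": language},
        "ref_audio", Path(ref_wav).name, Path(ref_wav).read_bytes(),
    )
    request = urllib.request.Request(
        GENERATE_URL, data=body, method="POST",
        headers={"Content-Type": content_type},
    )
    # urlopen raises HTTPError on non-2xx responses, so an error body
    # is never written to disk as if it were audio.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        Path(out_path).write_bytes(resp.read())
    return out_path
```

The 300-second timeout is a guess; generation time depends on text length and GPU load.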

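Model notes

Available models

Model	Function	Size
0.6B-Base	3-second rapid voice cloning	2.4GB
0.6B-CustomVoice	9 preset voices	2.4GB
1.7B-Base	Voice cloning (higher quality) ✅ currently in use	6.8GB
1.7B-VoiceDesign	Generates voices from natural-language descriptions	6.8GB

Supported languages

Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian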

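Troubleshooting

sox not found

SoX could not be found!

Fix:

Install sox through conda:

conda install -y -c conda-forge sox

Make sure the startup script run_qwen_tts.sh exports the conda environment's bin directory to PATH (when PM2 launches the process, the system PATH does not include the conda environment directory):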

export PATH="/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin:$PATH"
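
To confirm that a given PATH would actually resolve `sox`, the lookup can be reproduced with `shutil.which`. This is a minimal sketch; `find_sox` is a hypothetical helper, and the directory constant simply repeats the path from the export line above.

```python
import os
import shutil
from typing import Optional

CONDA_ENV_BIN = "/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin"  # from this guide


def find_sox(extra_dirs: list[str], base_path: Optional[str] = None) -> Optional[str]:
    """Resolve `sox` as a process would after prepending extra_dirs to PATH.

    base_path defaults to the current PATH; pass "" to simulate a minimal
    environment such as the one a process manager may provide.
    """
    if base_path is None:
        base_path = os.environ.get("PATH", "")
    # Drop empty entries: an empty PATH element would mean "current directory".
    entries = [entry for entry in [*extra_dirs, base_path] if entry]
    return shutil.which("sox", path=os.pathsep.join(entries))
```

For example, `find_sox([], base_path="/usr/bin:/bin")` returning `None` while `find_sox([CONDA_ENV_BIN], base_path="/usr/bin:/bin")` returns a path would match the failure described above.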

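CUDA out of memory

Qwen3-TTS 1.7B typically needs 8-10GB of VRAM. If you hit an OOM error:

Make sure no other programs are running on GPU0
Run without flash-attn (it increases VRAM usage)
Use a shorter reference audio clip (3-5 seconds)
If VRAM is still insufficient, fall back to the 0.6B-Base model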
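
The fallback in the last item could be automated by checking free VRAM before loading. The sketch below is illustrative only: the 10 GiB threshold is an assumption taken from the upper end of the 8-10GB range, and `./checkpoints/0.6B-Base` is a hypothetical path, since this guide only downloads the 1.7B-Base weights.

```python
LARGE_CHECKPOINT = "./checkpoints/1.7B-Base"
SMALL_CHECKPOINT = "./checkpoints/0.6B-Base"  # hypothetical path, not downloaded by this guide


def free_vram_gib(device: int = 0) -> float:
    """Return the free memory on a CUDA device in GiB."""
    import torch  # imported lazily so pick_checkpoint stays usable without a GPU

    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / 1024**3


def pick_checkpoint(free_gib: float, threshold_gib: float = 10.0) -> str:
    """Choose the 1.7B checkpoint when enough VRAM is free, else the 0.6B one."""
    return LARGE_CHECKPOINT if free_gib >= threshold_gib else SMALL_CHECKPOINT
```

A loader would then call `pick_checkpoint(free_vram_gib(0))` and pass the result to `Qwen3TTSModel.from_pretrained`.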

Model fails to load

Make sure the following files exist:

checkpoints/1.7B-Base/config.json
checkpoints/1.7B-Base/model.safetensors

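Audio output quality issues

Reference audio quality: use a clear, noise-free clip of 3-10 seconds
ref_text accuracy: the transcript of the reference audio must be accurate
Language setting: make sure the language parameter matches the language of the text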
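
The duration guideline in the first item can be checked before a clip is sent to the service. This is a minimal sketch using the standard `wave` module, which reads PCM WAV files only; the function names are hypothetical, and the defaults mirror the 3-10 second range above.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_audio(path: str, min_s: float = 3.0, max_s: float = 10.0) -> float:
    """Return the clip duration, or raise ValueError if it is out of range."""
    duration = wav_duration_seconds(path)
    if not min_s <= duration <= max_s:
        raise ValueError(
            f"{path}: {duration:.2f}s is outside the {min_s:g}-{max_s:g}s range"
        )
    return duration
```

This only validates length. Noise level and transcript accuracy still need to be checked by ear.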

ViGent2 backend integration

Voice clone service (`voice_clone_service.py`)

The backend calls the Qwen3-TTS service over HTTP:

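import aiohttp

QWEN_TTS_URL = "http://localhost:8009"

async def generate_cloned_audio(
    ref_audio_path: str,
    text: str,
    output_path: str,
    ref_text: str | None = None,
    language: str = "Chinese",
) -> str:
    """Send reference audio and text to the Qwen3-TTS service and save the returned WAV."""
    async with aiohttp.ClientSession() as session:
        with open(ref_audio_path, "rb") as f:
            data = aiohttp.FormData()
            data.add_field("ref_audio", f, filename="ref.wav", content_type="audio/wav")
            data.add_field("text", text)
            if ref_text is not None:
                data.add_field("ref_text", ref_text)  # transcript of the reference audio
            data.add_field("language", language)

            async with session.post(f"{QWEN_TTS_URL}/generate", data=data) as resp:
                resp.raise_for_status()  # do not save an error body as audio
                audio_data = await resp.read()

    with open(output_path, "wb") as out:
        out.write(audio_data)
    return output_path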

Reference audio Supabase bucket

-- Create the ref-audios bucket
INSERT INTO storage.buckets (id, name, public)
VALUES ('ref-audios', 'ref-audios', true)
ON CONFLICT (id) DO NOTHING;

-- RLS policy
CREATE POLICY "Allow public uploads" ON storage.objects
FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');

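Changelog

Date	Version	Notes
2026-02-09	1.2.0	Fixed the SoX PATH issue (run_qwen_tts.sh now exports the conda bin directory); empty_cache() is called after each generation
2026-01-30	1.1.0	Made 1.7B-Base the explicit default model and replaced the old 0.6B paths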
