Update

2026-01-29 12:16:41 +08:00
parent 4a3dd2b225
commit 661a8f357c
18 changed files with 2092 additions and 80 deletions
--- a/Docs/QWEN3_TTS_DEPLOY.md
+++ b/Docs/QWEN3_TTS_DEPLOY.md
@@ -169,24 +169,106 @@ python test_inference.py

 ---

+## Step 6: Install HTTP service dependencies
+
+```bash
+conda activate qwen-tts
+pip install fastapi uvicorn python-multipart
+```
+
+---
+
+## Step 7: Start the service (managed by PM2)
+
+### Manual test
+
+```bash
+conda activate qwen-tts
+cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
+python qwen_tts_server.py
+```
+
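+Open http://localhost:8009/health to verify that the service is up.
+
+### Persistent service with PM2
+
+> ⚠️ **Note**: the launch script `run_qwen_tts.sh` lives in the project **root directory**, not in the models/Qwen3-TTS directory.
+
+1. Start the service with the launch script: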
+```bash
+cd /home/rongye/ProgramFiles/ViGent2
+pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
+pm2 save
+```
+
+2. View the logs:
+```bash
+pm2 logs vigent2-qwen-tts
+```
+
+3. Restart the service:
+```bash
+pm2 restart vigent2-qwen-tts
+```
+
+---
+
 ## Directory structure

 After deployment, the directory structure should look like this:

 ```
-/home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS/
-├── checkpoints/
-│   ├── Tokenizer/           # speech codec
-│   └── 0.6B-Base/           # voice cloning model
-├── qwen_tts/                # source code
-│   ├── inference/
-│   ├── models/
-│   └── ...
-├── examples/
-│   └── myvoice.wav          # reference audio
-├── pyproject.toml
-├── requirements.txt
-└── test_inference.py        # test script
+/home/rongye/ProgramFiles/ViGent2/
+├── run_qwen_tts.sh              # PM2 launch script (project root)
+└── models/Qwen3-TTS/
+    ├── checkpoints/
+    │   ├── Tokenizer/           # speech codec
+    │   └── 0.6B-Base/           # voice cloning model
+    ├── qwen_tts/                # source code
+    │   ├── inference/
+    │   ├── models/
+    │   └── ...
+    ├── examples/
+    │   └── myvoice.wav          # reference audio
+    ├── qwen_tts_server.py       # HTTP inference service (port 8009)
+    ├── pyproject.toml
+    ├── requirements.txt
+    └── test_inference.py        # test script
+```
+
+---
+
+## API reference
+
+### Health check
+
+```
+GET http://localhost:8009/health
+```
+
+Response:
+```json
+{
+  "service": "Qwen3-TTS Voice Clone",
+  "model": "0.6B-Base",
+  "ready": true,
+  "gpu_id": 0
+}
+```
+
+### Voice clone generation
+
+```
+POST http://localhost:8009/generate
+Content-Type: multipart/form-data
+
+Fields:
+  - ref_audio: reference audio file (WAV)
+  - text: text to synthesize
+  - ref_text: transcript of the reference audio
+  - language: language (default: Chinese)
+
+Response: audio/wav file
 ```

 ---
@@ -244,6 +326,46 @@ Qwen3-TTS 0.6B typically needs only 4-6 GB of VRAM. If you hit OOM:

 ---

+## ViGent2 backend integration
+
+### Voice clone service (`voice_clone_service.py`)
+
+The backend calls the Qwen3-TTS service over HTTP:
+
+```python
+import aiohttp
+
+QWEN_TTS_URL = "http://localhost:8009"
+
+async def generate_cloned_audio(ref_audio_path: str, text: str, output_path: str):
+    async with aiohttp.ClientSession() as session:
+        with open(ref_audio_path, "rb") as f:
+            data = aiohttp.FormData()
+            data.add_field("ref_audio", f, filename="ref.wav")
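+            data.add_field("text", text)
+            # Note: ref_text and language (see API reference) are not sent here.
+            async with session.post(f"{QWEN_TTS_URL}/generate", data=data) as resp:
+                resp.raise_for_status()  # do not write an error body out as audio
+                with open(output_path, "wb") as out:
+                    out.write(await resp.read())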
+    return output_path
+```
+
+### Supabase bucket for reference audio
+
+```sql
+-- Create the ref-audios bucket
+INSERT INTO storage.buckets (id, name, public)
+VALUES ('ref-audios', 'ref-audios', true)
+ON CONFLICT (id) DO NOTHING;
+
+-- RLS policy
+CREATE POLICY "Allow public uploads" ON storage.objects
+FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');
+```
+
+---
+
 ## Reference links

 - [Qwen3-TTS GitHub](https://github.com/QwenLM/Qwen3-TTS)
@@ -251,3 +373,4 @@ Qwen3-TTS 0.6B typically needs only 4-6 GB of VRAM. If you hit OOM:
  - [HuggingFace models](https://huggingface.co/collections/Qwen/qwen3-tts)
  - [Technical report](https://arxiv.org/abs/2601.15621)
  - [Official blog](https://qwen.ai/blog?id=qwen3tts-0115)
+
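
The guide in this diff verifies the service by opening `/health` by hand. A caller that starts right after `pm2 start` may want to wait until the response reports `"ready": true` before sending requests. The following is a minimal sketch of such a wait loop, not part of the commit. The function names `fetch_health` and `wait_until_ready`, the retry count, and the delay are hypothetical. It also assumes, without confirmation from the source, that `ready` is `true` only once the service can handle requests.

```python
import json
import time
import urllib.request
from typing import Callable

# URL taken from the API reference in the diff above.
HEALTH_URL = "http://localhost:8009/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 3.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,   # hypothetical default
    delay: float = 2.0,   # hypothetical default, in seconds
) -> bool:
    """Poll until the service reports ready=true; return False if it never does."""
    for attempt in range(attempts):
        try:
            if fetch().get("ready") is True:
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts (URLError is a
            # subclass); ValueError covers a body that is not valid JSON.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` with the defaults polls for about a minute. The `fetch` parameter lets tests or other transports substitute their own function.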