Compare commits
38 Commits
| SHA1 |
|---|
| 0e3502c6f0 |
| a1604979f0 |
| 08221e48de |
| 42b5cc0c02 |
| 1717635bfd |
| 0a5a17402c |
| bc0fe9326a |
| 035ee29d72 |
| a6cc919e5c |
| 96a298e51c |
| e33dfc3031 |
| 3129d45b25 |
| e226224119 |
| ee342cc40f |
| 1a291a03b8 |
| 1e52346eb4 |
| 945262a7fc |
| be6a3436bb |
| b2c1042c5c |
| aaa8088c82 |
| 31469ca01d |
| 22ea3dd0db |
| 8a5912c517 |
| 74516dbcdb |
| 5357d97012 |
| 33d8e52802 |
| 9af50a9066 |
| 6c6fbae13a |
| cb10da52fc |
| eb3ed23326 |
| 6e58f4bbe7 |
| 7bfd6bf862 |
| 569736d05b |
| ec16e08bdb |
| 6801d3e8aa |
| cf679b34bf |
| b74bacb0b5 |
| 661a8f357c |
18
.gitignore
vendored
@@ -20,11 +20,14 @@ node_modules/
|
||||
out/
|
||||
.turbo/
|
||||
|
||||
# ============ IDE ============
|
||||
# ============ IDE / AI 工具 ============
|
||||
.vscode/
|
||||
.idea/
|
||||
*.swp
|
||||
*.swo
|
||||
.agents/
|
||||
.opencode/
|
||||
.claude/
|
||||
|
||||
# ============ 系统文件 ============
|
||||
.DS_Store
|
||||
@@ -35,11 +38,22 @@ desktop.ini
|
||||
backend/outputs/
|
||||
backend/uploads/
|
||||
backend/cookies/
|
||||
backend/user_data/
|
||||
backend/debug_screenshots/
|
||||
backend/keys/
|
||||
*_cookies.json
|
||||
|
||||
# ============ MuseTalk ============
|
||||
# ============ 模型权重 ============
|
||||
models/*/checkpoints/
|
||||
models/MuseTalk/models/
|
||||
models/MuseTalk/results/
|
||||
models/LatentSync/temp/
|
||||
|
||||
# ============ Remotion 构建 ============
|
||||
remotion/dist/
|
||||
|
||||
# ============ 临时文件 ============
|
||||
Temp/
|
||||
|
||||
# ============ 日志 ============
|
||||
*.log
|
||||
|
||||
278
Docs/ALIPAY_DEPLOY.md
Normal file
@@ -0,0 +1,278 @@
|
||||
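# Enabling Paid Membership via Alipay: Deployment Guide

This document covers the full deployment flow for Alipay's "Computer Website Payment" (电脑网站支付) product. After registering, a user pays through Alipay and the membership is activated automatically, valid for 1 year.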
|
||||
|
||||
---
|
||||
|
||||
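## Prerequisites

- An Alipay enterprise or individual-merchant (个体商户) account
- An application created on the [Alipay Open Platform](https://open.alipay.com), with its APPID in hand
- The application has the **"Computer Website Payment" (电脑网站支付)** product enabled (the `alipay.trade.page.pay` API)
- The server domain is served over HTTPS (Alipay callbacks require a publicly reachable address)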
|
||||
|
||||
---
|
||||
|
||||
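## Part 1: Alipay Open Platform Configuration

### 1. Create an application

Log in to https://open.alipay.com → Console → Create Application (or reuse an existing application).

### 2. Enable the "Computer Website Payment" product

Open the application details → Product Binding / Product Management → add **"Computer Website Payment" (电脑网站支付)** → submit for review.

> **Note**: If this product is not enabled, requests fail with `ACQ.ACCESS_FORBIDDEN`.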
|
||||
|
||||
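### 3. Generate a key pair

Open the application details → Development Settings → API Signing Method → choose **RSA2 (SHA256)**:

1. Generate an RSA 2048-bit key pair with Alipay's official key tool
2. Upload the **application public key** to the Open Platform
3. After the upload, the platform displays the **Alipay public key** (`alipayPublicKey_RSA2`)

You end up with two things:

- **Application private key**: kept locally; the code uses it to sign requests
- **Alipay public key**: returned by the platform; the code uses it to verify callback signatures

> The application public key is only an intermediate artifact for the upload; the code does not need it.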
|
||||
|
||||
---
|
||||
|
||||
## 第二部分:服务器配置
|
||||
|
||||
### 1. 放置密钥文件
|
||||
|
||||
将密钥保存为标准 PEM 格式,放到 `backend/keys/` 目录:
|
||||
|
||||
```bash
|
||||
mkdir -p /home/rongye/ProgramFiles/ViGent2/backend/keys
|
||||
```
|
||||
|
||||
**`backend/keys/app_private_key.pem`**(应用私钥):
|
||||
|
||||
```
|
||||
-----BEGIN PRIVATE KEY-----
|
||||
MIIEvQIBADANBgkqhkiG9w0BAQEFAASC...(你的私钥内容)
|
||||
...
|
||||
-----END PRIVATE KEY-----
|
||||
```
|
||||
|
||||
**`backend/keys/alipay_public_key.pem`**(支付宝公钥):
|
||||
|
||||
```
|
||||
-----BEGIN PUBLIC KEY-----
|
||||
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A...(支付宝公钥内容)
|
||||
...
|
||||
-----END PUBLIC KEY-----
|
||||
```
|
||||
|
||||
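#### PEM format requirements

Alipay's key tool exports a single line of plain text, which has to be converted to standard PEM:

- Header and footer markers are required (`-----BEGIN/END ...-----`)
- The key body wraps every 64 characters
- The private key header is `-----BEGIN PRIVATE KEY-----` (PKCS#8)
- The public key header is `-----BEGIN PUBLIC KEY-----`

If what you have is a bare one-line key, convert it with the following commands: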
|
||||
|
||||
```bash
# Format the private key (assumes the bare key is in raw_private.txt).
# tr strips spaces and line breaks so fold always starts from one clean line;
# the bare echo ends the last Base64 line so the END marker gets its own line.
{
  echo "-----BEGIN PRIVATE KEY-----"
  tr -d ' \r\n' < raw_private.txt | fold -w 64
  echo
  echo "-----END PRIVATE KEY-----"
} > app_private_key.pem

# Format the public key the same way
{
  echo "-----BEGIN PUBLIC KEY-----"
  tr -d ' \r\n' < raw_public.txt | fold -w 64
  echo
  echo "-----END PUBLIC KEY-----"
} > alipay_public_key.pem
```
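
A converted file can be checked against the PEM format requirements before the backend is restarted. The following is a minimal sketch, not part of the project: `check_pem` is a hypothetical helper that only inspects the shape of the file (markers, 64-character wrapping, Base64 alphabet) and does not verify that the key itself is valid.

```python
import re

# Header lines required by the PEM format requirements in this guide.
PEM_HEADERS = {
    "private": "-----BEGIN PRIVATE KEY-----",
    "public": "-----BEGIN PUBLIC KEY-----",
}


def check_pem(text: str, kind: str) -> list[str]:
    """Return a list of format problems; an empty list means the shape looks right."""
    problems = []
    lines = text.strip().splitlines()
    header = PEM_HEADERS[kind]
    footer = header.replace("BEGIN", "END")

    if not lines or lines[0] != header:
        problems.append(f"first line should be {header!r}")
    if len(lines) < 2 or lines[-1] != footer:
        problems.append(f"last line should be {footer!r}")

    body = lines[1:-1]
    if not body:
        problems.append("key body is empty")
    for i, line in enumerate(body):
        is_last = i == len(body) - 1
        # Every body line is exactly 64 characters, except possibly the last one.
        if len(line) > 64 or (not is_last and len(line) != 64):
            problems.append(f"body line {i + 1} is not wrapped at 64 characters")
        if not re.fullmatch(r"[A-Za-z0-9+/=]+", line):
            problems.append(f"body line {i + 1} contains non-Base64 characters")
    return problems
```

Calling `check_pem(open("app_private_key.pem").read(), "private")` on a bare one-line key reports the missing markers, which corresponds to the `RSA key format is not supported` symptom described in the FAQ.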
|
||||
|
||||
> `backend/keys/` 目录已加入 `.gitignore`,不会被提交到仓库。
|
||||
|
||||
### 2. 配置环境变量
|
||||
|
||||
在 `backend/.env` 中添加:
|
||||
|
||||
```ini
|
||||
# =============== 支付宝配置 ===============
|
||||
ALIPAY_APP_ID=你的应用APPID
|
||||
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
|
||||
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
|
||||
ALIPAY_NOTIFY_URL=https://vigent.hbyrkj.top/api/payment/notify
|
||||
ALIPAY_RETURN_URL=https://vigent.hbyrkj.top/pay
|
||||
```
|
||||
|
||||
| Variable | Description |
|------|------|
| `ALIPAY_APP_ID` | APPID of the Alipay Open Platform application |
| `ALIPAY_PRIVATE_KEY_PATH` | Absolute path to the application private key PEM file |
| `ALIPAY_PUBLIC_KEY_PATH` | Absolute path to the Alipay public key PEM file |
| `ALIPAY_NOTIFY_URL` | Asynchronous callback URL (server-to-server); must be reachable over public HTTPS |
| `ALIPAY_RETURN_URL` | Synchronous return URL (the page the browser is redirected to after payment) |

`config.py` has a few more tunable parameters (they already have defaults and normally do not need to be added to `.env`):

| Variable | Default | Description |
|------|--------|------|
| `ALIPAY_SANDBOX` | `false` | Whether to use the sandbox environment |
| `PAYMENT_AMOUNT` | `999.00` | Membership price (CNY) |
| `PAYMENT_EXPIRE_DAYS` | `365` | Membership validity in days |
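
To illustrate how the required and optional variables fit together, here is a minimal sketch using only the standard library. The `PaymentSettings` class and `load_payment_settings` function are hypothetical; the project's actual `config.py` is not shown in this document and may be organized differently. The sketch stores the amount as `Decimal`, which is a choice made here to avoid float rounding, not something taken from the project.

```python
import os
from dataclasses import dataclass
from decimal import Decimal

REQUIRED = (
    "ALIPAY_APP_ID",
    "ALIPAY_PRIVATE_KEY_PATH",
    "ALIPAY_PUBLIC_KEY_PATH",
    "ALIPAY_NOTIFY_URL",
    "ALIPAY_RETURN_URL",
)


@dataclass(frozen=True)
class PaymentSettings:
    app_id: str
    private_key_path: str
    public_key_path: str
    notify_url: str
    return_url: str
    sandbox: bool = False
    amount: Decimal = Decimal("999.00")
    expire_days: int = 365


def load_payment_settings(env=os.environ) -> PaymentSettings:
    """Build settings from an environment mapping, failing fast on missing values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return PaymentSettings(
        app_id=env["ALIPAY_APP_ID"],
        private_key_path=env["ALIPAY_PRIVATE_KEY_PATH"],
        public_key_path=env["ALIPAY_PUBLIC_KEY_PATH"],
        notify_url=env["ALIPAY_NOTIFY_URL"],
        return_url=env["ALIPAY_RETURN_URL"],
        # Optional values fall back to the documented defaults.
        sandbox=env.get("ALIPAY_SANDBOX", "false").lower() in ("1", "true", "yes"),
        amount=Decimal(env.get("PAYMENT_AMOUNT", "999.00")),
        expire_days=int(env.get("PAYMENT_EXPIRE_DAYS", "365")),
    )
```

Failing at startup when a required variable is absent tends to be easier to diagnose than a signing error on the first payment request.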
|
||||
|
||||
### 3. 创建数据库表
|
||||
|
||||
通过 Docker 在本地 Supabase 中执行:
|
||||
|
||||
```bash
|
||||
docker exec -i supabase-db psql -U postgres -c "
|
||||
CREATE TABLE IF NOT EXISTS orders (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
|
||||
out_trade_no TEXT UNIQUE NOT NULL,
|
||||
amount DECIMAL(10, 2) NOT NULL DEFAULT 999.00,
|
||||
status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'paid', 'failed')),
|
||||
trade_no TEXT,
|
||||
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
|
||||
paid_at TIMESTAMP WITH TIME ZONE
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_orders_out_trade_no ON orders(out_trade_no);
|
||||
"
|
||||
```
|
||||
|
||||
### 4. 安装依赖
|
||||
|
||||
```bash
|
||||
# 后端(在 venv 中)
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
venv/bin/pip install python-alipay-sdk
|
||||
```
|
||||
|
||||
> 前端无额外依赖需要安装。
|
||||
|
||||
### 5. Nginx 配置
|
||||
|
||||
确保 Nginx 将 `/api/payment/notify` 代理到后端。如果现有配置已覆盖 `/api/` 前缀,则无需额外修改:
|
||||
|
||||
```nginx
|
||||
location /api/ {
|
||||
proxy_pass http://localhost:8006;
|
||||
# ... 现有配置
|
||||
}
|
||||
```
|
||||
|
||||
### 6. 重启服务
|
||||
|
||||
```bash
|
||||
# 构建前端
|
||||
cd /home/rongye/ProgramFiles/ViGent2/frontend
|
||||
npx next build
|
||||
|
||||
# 重启
|
||||
pm2 restart vigent2-backend
|
||||
pm2 restart vigent2-frontend
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 第三部分:正式上线
|
||||
|
||||
测试通过后,将 `backend/app/core/config.py` 中的测试金额改为正式价格:
|
||||
|
||||
```python
|
||||
PAYMENT_AMOUNT: float = 999.00 # 正式价格
|
||||
```
|
||||
|
||||
或在 `backend/.env` 中添加覆盖:
|
||||
|
||||
```ini
|
||||
PAYMENT_AMOUNT=999.00
|
||||
```
|
||||
|
||||
然后重启后端:
|
||||
|
||||
```bash
|
||||
pm2 restart vigent2-backend
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 支付流程说明
|
||||
|
||||
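```
User registers → logs in (password correct but is_active=false)
  → backend returns 403 + payment_token
  → frontend navigates to the /pay page
  → POST /api/payment/create-order → returns the Alipay cashier URL
  → frontend redirects to the Alipay cashier page (QR code, account login, balance, and other payment methods)
  → user completes the payment
  → Alipay async callback: POST /api/payment/notify
  → backend verifies the signature → updates the order → activates the user (is_active=true, expires_at=+365 days)
  → Alipay redirects the browser back to /pay?out_trade_no=xxx
  → frontend polls GET /api/payment/status/{out_trade_no}
  → poll returns paid → show success → go to the login page
  → user logs in again → enters the system
```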
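
The callback step of this flow (verify the signature, update the order, activate the user) can be sketched as follows. This is an illustration under stated assumptions, not the project's `service.py`: `handle_notify` is a hypothetical function, `orders` and `users` are plain dicts standing in for the database, and `verify` is an injected callable standing in for the SDK's signature check. The parameter names `out_trade_no`, `trade_no`, and `trade_status` are the fields Alipay sends in its asynchronous notification.

```python
from datetime import datetime, timedelta, timezone


def handle_notify(params, orders, users, verify, expire_days=365, now=None):
    """Process one async notification and return the plain-text reply for Alipay."""
    now = now or datetime.now(timezone.utc)

    # 1. Reject anything whose signature does not check out.
    if not verify(params):
        return "fail"

    order = orders.get(params.get("out_trade_no"))
    if order is None:
        return "fail"

    # 2. Only successful trades activate a membership.
    if params.get("trade_status") not in ("TRADE_SUCCESS", "TRADE_FINISHED"):
        return "success"  # acknowledged, nothing to activate

    # 3. Notifications can be delivered more than once; stay idempotent.
    if order["status"] == "paid":
        return "success"

    order.update(status="paid", trade_no=params.get("trade_no"), paid_at=now)

    # 4. Activate the user for the configured number of days.
    user = users[order["user_id"]]
    user["is_active"] = True
    user["expires_at"] = now + timedelta(days=expire_days)
    return "success"
```

A production handler would normally also compare the notified amount with the stored order amount before marking the order as paid; that check is omitted here to keep the sketch short.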
|
||||
|
||||
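**Computer Website Payment vs. Face-to-Face Payment (当面付)**: Computer Website Payment (`alipay.trade.page.pay`) redirects to Alipay's official cashier page, where the user can pay by scanning a QR code, logging in to an Alipay account, using their balance, and so on, which gives a better experience. Face-to-Face Payment (`alipay.trade.precreate`) only generates a QR code and supports scan-to-pay only.

Renewal after expiry follows the same flow: login detects the expiry → returns PAYMENT_REQUIRED → redirects to /pay.

Manual activation by an administrator is unaffected; both paths coexist.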
|
||||
|
||||
---
|
||||
|
||||
## 涉及文件
|
||||
|
||||
| 文件 | 变更类型 | 说明 |
|
||||
|------|---------|------|
|
||||
| `backend/requirements.txt` | 修改 | 添加 `python-alipay-sdk` |
|
||||
| `backend/database/schema.sql` | 修改 | 新增 `orders` 表 |
|
||||
| `backend/app/core/config.py` | 修改 | 支付宝配置项 |
|
||||
| `backend/app/core/security.py` | 修改 | payment_token 函数 |
|
||||
| `backend/app/core/deps.py` | 修改 | is_active 安全兜底 |
|
||||
| `backend/app/repositories/orders.py` | 新建 | orders 数据层 |
|
||||
| `backend/app/modules/payment/__init__.py` | 新建 | 模块初始化 |
|
||||
| `backend/app/modules/payment/schemas.py` | 新建 | 请求/响应模型 |
|
||||
| `backend/app/modules/payment/service.py` | 新建 | 支付业务逻辑(电脑网站支付) |
|
||||
| `backend/app/modules/payment/router.py` | 新建 | 3 个 API 端点 |
|
||||
| `backend/app/modules/auth/router.py` | 修改 | 登录返回 PAYMENT_REQUIRED |
|
||||
| `backend/app/main.py` | 修改 | 注册 payment_router |
|
||||
| `backend/.env` | 修改 | 支付宝环境变量 |
|
||||
| `backend/keys/` | 新建 | PEM 密钥文件 |
|
||||
| `frontend/src/shared/lib/auth.ts` | 修改 | login() 处理 paymentToken |
|
||||
| `frontend/src/shared/api/axios.ts` | 修改 | PUBLIC_PATHS 加 /pay |
|
||||
| `frontend/src/app/login/page.tsx` | 修改 | paymentToken 跳转 |
|
||||
| `frontend/src/app/register/page.tsx` | 修改 | 注册成功提示文案 |
|
||||
| `frontend/src/app/pay/page.tsx` | 新建 | 付费页面(重定向到支付宝收银台) |
|
||||
|
||||
---
|
||||
|
||||
## 常见问题
|
||||
|
||||
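### RSA key format is not supported

The key file is missing the PEM header/footer markers or is not wrapped at 64 characters. Reformat it as described under "PEM format requirements" (PEM 格式要求).

### ACQ.ACCESS_FORBIDDEN

The application has not enabled the "Computer Website Payment" product. Add and enable it under Alipay Open Platform → application details → Product Management.

### Alipay callbacks never arrive

1. Check that `ALIPAY_NOTIFY_URL` is reachable over public HTTPS
2. Check that Nginx proxies `/api/payment/notify` to the backend
3. If a callback times out (no response within 15 s), Alipay retries it, 8 times in total over about 24 hours

### The page does not redirect back after payment

Check that `ALIPAY_RETURN_URL` is correct; it must be the full URL of the frontend `/pay` page (for example `https://vigent.hbyrkj.top/pay`). After the user pays, Alipay redirects the browser to this address with `out_trade_no` and other parameters appended.

### The frontend shows "network error" instead of the specific error

The API functions lacked a try/catch around axios exceptions. This has been fixed in `register()` and `login()` in `auth.ts`.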
|
||||
209
Docs/BACKEND_DEV.md
Normal file
@@ -0,0 +1,209 @@
|
||||
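# ViGent2 Backend Development Conventions

This document defines the structural conventions, API contracts, and implementation habits for backend development. The goal is for new features to land in a uniform pattern, and for old logic to be extracted gradually as it gets fixed.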
|
||||
|
||||
---
|
||||
|
||||
## 1. 模块化与分层原则
|
||||
|
||||
每个业务功能放入 `app/modules/<feature>/`,以“薄路由 + 厚服务/流程”组织代码。
|
||||
|
||||
- **router.py**:只做参数校验、权限校验、调用 service/workflow、返回统一响应。
|
||||
- **schemas.py**:Pydantic 请求/响应模型。
|
||||
- **service.py**:业务逻辑与集成逻辑(非长流程)。
|
||||
- **workflow.py**:长流程/重任务编排(视频生成、渲染、异步任务)。
|
||||
- **__init__.py**:模块标记。
|
||||
|
||||
其它层级职责:
|
||||
|
||||
- **repositories/**:数据读写(Supabase),不包含业务逻辑。
|
||||
- **services/**:外部依赖与基础能力(TTS、Storage、Remotion 等)。
|
||||
- **core/**:配置、安全、依赖注入、统一响应。
|
||||
|
||||
---
|
||||
|
||||
## 2. 目录结构(当前约定)
|
||||
|
||||
```
|
||||
backend/
|
||||
├── app/
|
||||
│ ├── core/ # config、deps、security、response
|
||||
│ ├── modules/ # 业务模块(路由 + 逻辑)
|
||||
│ │ ├── videos/ # 视频生成任务(router/schemas/service/workflow)
|
||||
│ │ ├── materials/ # 素材管理(router/schemas/service)
|
||||
│ │ ├── publish/ # 多平台发布
|
||||
│ │ ├── auth/ # 认证与会话
|
||||
│ │ ├── ai/ # AI 功能(标题标签生成、多语言翻译)
|
||||
│ │ ├── assets/ # 静态资源(字体/样式/BGM)
|
||||
│ │ ├── ref_audios/ # 声音克隆参考音频(router/schemas/service)
|
||||
│ │ ├── generated_audios/ # 预生成配音管理(router/schemas/service)
|
||||
│ │ ├── login_helper/ # 扫码登录辅助
|
||||
│ │ ├── tools/ # 工具接口(router/schemas/service)
|
||||
│ │ ├── payment/ # 支付宝付费开通(router/schemas/service)
|
||||
│ │ └── admin/ # 管理员功能
|
||||
│ ├── repositories/ # Supabase 数据访问
|
||||
│ ├── services/ # 外部服务集成
|
||||
│ │ ├── uploader/ # 平台发布器(douyin/weixin)
|
||||
│ │ ├── qr_login_service.py
|
||||
│ │ ├── publish_service.py
|
||||
│ │ ├── remotion_service.py
|
||||
│ │ ├── storage.py
|
||||
│ │ └── ...
|
||||
│ └── tests/
|
||||
├── assets/ # 字体 / 样式 / bgm
|
||||
├── user_data/ # 用户隔离数据(Cookie 等)
|
||||
├── scripts/
|
||||
└── requirements.txt
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. 接口契约规范(统一响应)
|
||||
|
||||
所有 JSON API 返回统一结构:
|
||||
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"message": "ok",
|
||||
"data": { },
|
||||
"code": 0
|
||||
}
|
||||
```
|
||||
|
||||
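- Successful responses use `success_response`.
- Errors are raised as `HTTPException`; a global exception handler turns them into `{success:false, message, code}`.
- `detail` is no longer used as the error text for the frontend (the frontend now reads `message`).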
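
The envelope can be illustrated with two small helpers. This is a sketch only: `success_response` is named in this document, but its signature here is an assumption, and `error_payload` is a hypothetical stand-in for what the global exception handler might produce. In particular, using the HTTP status as `code` for errors is assumed, not confirmed by the source.

```python
def success_response(data=None, message="ok", code=0):
    """Wrap a payload in the unified success envelope."""
    return {
        "success": True,
        "message": message,
        "data": data if data is not None else {},
        "code": code,
    }


def error_payload(status_code, detail):
    """Shape an error the way the frontend expects: text lives in `message`."""
    return {"success": False, "message": str(detail), "code": status_code}
```

Keeping the error text in `message` for both success and failure lets the frontend read a single field regardless of the outcome.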
|
||||
|
||||
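### `/api/videos/generate` parameter contract (key conventions)

- Each `custom_assignments` item uses `material_path/start/end/source_start/source_end?`, and the visible timeline segments are authoritative.
- `output_aspect_ratio` only accepts `9:16` / `16:9`; the default is `9:16`.
- Title display parameters:
  - `title_display_mode`: `short` / `persistent` (default `short`)
  - `title_duration`: default `4.0` (seconds); only applies in `short` mode
- Opening secondary-title parameters:
  - `secondary_title`: secondary title text (optional, at most 20 characters); shown only in the video frame and not part of the published title
  - `secondary_title_style_id` / `secondary_title_font_size` / `secondary_title_top_margin`: secondary title style settings
- The workflow/remotion side must pass these fields through unchanged so that frontend and backend semantics do not drift.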
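
A few of these constraints can be expressed as a small validating model. The sketch below uses a standard-library dataclass so that it runs without extra dependencies; the project's `schemas.py` uses Pydantic according to this document, so treat `GenerateOptions` as a hypothetical illustration of the rules rather than the real schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerateOptions:
    output_aspect_ratio: str = "9:16"
    title_display_mode: str = "short"
    title_duration: float = 4.0  # seconds; only meaningful in "short" mode
    secondary_title: Optional[str] = None

    def __post_init__(self):
        if self.output_aspect_ratio not in ("9:16", "16:9"):
            raise ValueError("output_aspect_ratio must be 9:16 or 16:9")
        if self.title_display_mode not in ("short", "persistent"):
            raise ValueError("title_display_mode must be short or persistent")
        if self.secondary_title is not None and len(self.secondary_title) > 20:
            raise ValueError("secondary_title is limited to 20 characters")
```

Validating at the edge keeps the router thin and lets the workflow assume the fields it receives are already within the contract.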
|
||||
|
||||
---
|
||||
|
||||
## 4. 认证与权限
|
||||
|
||||
- 认证方式:**HttpOnly Cookie** (`access_token`)。
|
||||
- `get_current_user` / `get_current_user_optional` 位于 `core/deps.py`。
|
||||
- Session 单设备校验使用 `repositories/sessions.py`。
|
||||
|
||||
---
|
||||
|
||||
## 5. 任务与状态
|
||||
|
||||
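- Video generation tasks are orchestrated in one place, `modules/videos/workflow.py`.
- Task state is read and written through `modules/videos/task_store.py`; **do not maintain a global dict directly**.
- Redis (`REDIS_URL`) is used by default, with automatic fallback to in-memory storage when it is unavailable.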
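
The Redis-with-memory-fallback behaviour can be sketched as below. This is not the project's `task_store.py`: the `TaskStore` class, its `prefix` argument, and the decision to stop using Redis after the first failure are all assumptions made for illustration. The client is injected and only needs `set(name, value)` and `get(name)`, which a redis-py client provides.

```python
import json


class TaskStore:
    """Task state store that prefers a Redis-like client and falls back to memory."""

    def __init__(self, redis_client=None, prefix="task:"):
        self._redis = redis_client
        self._prefix = prefix
        self._memory = {}

    def set(self, task_id, state):
        if self._redis is not None:
            try:
                self._redis.set(self._prefix + task_id, json.dumps(state))
                return
            except Exception:
                # Redis is unreachable: switch to the in-memory dict from now on.
                self._redis = None
        self._memory[task_id] = state

    def get(self, task_id):
        if self._redis is not None:
            try:
                raw = self._redis.get(self._prefix + task_id)
                return json.loads(raw) if raw is not None else None
            except Exception:
                self._redis = None
        return self._memory.get(task_id)
```

One consequence of this simple design is that state written to Redis before a failure is not visible afterwards, and in-memory state is lost on restart and not shared between worker processes.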
|
||||
|
||||
---
|
||||
|
||||
## 6. 文件与存储
|
||||
|
||||
- 所有文件上传/下载/删除/移动通过 `services/storage.py`。
|
||||
- 需要重命名时使用 `move_file`,避免直接读写 Storage。
|
||||
|
||||
### Cookie 存储(用户隔离)
|
||||
|
||||
Cookies produced by QR-code login on each platform are stored per user:

```
backend/user_data/{user_uuid}/cookies/
├── douyin_cookies.json
├── weixin_cookies.json
└── ...
```

- `publish_service.py` locates them through `_get_cookies_dir(user_id)` / `_get_cookie_path(user_id, platform)`
- Session keys use the format `"{user_id}_{platform}"`, so concurrent logins by different users do not interfere with each other
- After a successful login the cookies are saved to the corresponding path automatically and loaded automatically at publish time
|
||||
|
||||
---
|
||||
|
||||
## 7. 代码约定
|
||||
|
||||
- 只在 router 做校验与响应拼装。
|
||||
- 业务逻辑写在 service/workflow。
|
||||
- 数据库访问写在 repositories。
|
||||
- 统一使用 `loguru` 打日志。
|
||||
|
||||
---
|
||||
|
||||
## 8. 开发流程建议
|
||||
|
||||
- **新增功能**:先建模块,**必须**包含 `router.py + schemas.py + service.py`,不允许 router-only。
|
||||
- **修复 Bug**:顺手把涉及的逻辑抽到对应 service/workflow(渐进式改造)。
|
||||
- **改旧模块**:改动哪部分就拆哪部分,不要求一次重构整个文件。
|
||||
- **核心流程变更**:必跑冒烟(登录/生成/发布)。
|
||||
|
||||
> **渐进原则**:新代码高标准,旧代码逐步改。不做大规模一次性重构,避免引入回归风险。
|
||||
|
||||
---
|
||||
|
||||
## 9. 常用环境变量
|
||||
|
||||
- `SUPABASE_URL` / `SUPABASE_KEY`
|
||||
- `SUPABASE_PUBLIC_URL`
|
||||
- `REDIS_URL`
|
||||
- `GLM_API_KEY`
|
||||
- `LATENTSYNC_*`
|
||||
- `CORS_ORIGINS` (CORS 白名单,默认 *)
|
||||
|
||||
### MuseTalk / 混合唇形同步
|
||||
- `MUSETALK_GPU_ID` (GPU 编号,默认 0)
|
||||
- `MUSETALK_API_URL` (常驻服务地址,默认 http://localhost:8011)
|
||||
- `MUSETALK_BATCH_SIZE` (推理批大小,默认 32)
|
||||
- `MUSETALK_VERSION` (v15)
|
||||
- `MUSETALK_USE_FLOAT16` (半精度,默认 true)
|
||||
- `LIPSYNC_DURATION_THRESHOLD` (秒,>=此值用 MuseTalk,默认 120)
|
||||
|
||||
### 微信视频号
|
||||
- `WEIXIN_HEADLESS_MODE` (headful/headless-new)
|
||||
- `WEIXIN_CHROME_PATH` / `WEIXIN_BROWSER_CHANNEL`
|
||||
- `WEIXIN_USER_AGENT` / `WEIXIN_LOCALE` / `WEIXIN_TIMEZONE_ID`
|
||||
- `WEIXIN_FORCE_SWIFTSHADER`
|
||||
- `WEIXIN_TRANSCODE_MODE` (reencode/faststart/off)
|
||||
|
||||
### 抖音
|
||||
- `DOUYIN_HEADLESS_MODE` (headful/headless-new,默认 headless-new)
|
||||
- `DOUYIN_CHROME_PATH` / `DOUYIN_BROWSER_CHANNEL`
|
||||
- `DOUYIN_USER_AGENT` (默认 Chrome/144)
|
||||
- `DOUYIN_LOCALE` / `DOUYIN_TIMEZONE_ID`
|
||||
- `DOUYIN_FORCE_SWIFTSHADER`
|
||||
- `DOUYIN_DEBUG_ARTIFACTS` / `DOUYIN_RECORD_VIDEO` / `DOUYIN_KEEP_SUCCESS_VIDEO`
|
||||
|
||||
### 支付宝
|
||||
- `ALIPAY_APP_ID` / `ALIPAY_PRIVATE_KEY_PATH` / `ALIPAY_PUBLIC_KEY_PATH`
|
||||
- `ALIPAY_NOTIFY_URL` / `ALIPAY_RETURN_URL`
|
||||
- `ALIPAY_SANDBOX` (沙箱模式,默认 false)
|
||||
- `PAYMENT_AMOUNT` (会员价格,默认 999.00)
|
||||
- `PAYMENT_EXPIRE_DAYS` (会员有效天数,默认 365)
|
||||
|
||||
---
|
||||
|
||||
## 10. Playwright 发布调试
|
||||
|
||||
- 诊断日志落盘:`backend/app/debug_screenshots/weixin_network.log` / `douyin_network.log`
|
||||
- 关键失败截图:`backend/app/debug_screenshots/weixin_*.png` / `douyin_*.png`
|
||||
- 视频号建议使用 headful + xvfb-run(避免 headless 解码/指纹问题)
|
||||
|
||||
---
|
||||
|
||||
## 11. 最小新增模块示例
|
||||
|
||||
```
|
||||
app/modules/foo/
|
||||
├── router.py
|
||||
├── schemas.py
|
||||
├── service.py
|
||||
└── workflow.py
|
||||
```
|
||||
|
||||
router 仅调用 service/workflow 并返回 `success_response`。
|
||||
270
Docs/BACKEND_README.md
Normal file
@@ -0,0 +1,270 @@
|
||||
# ViGent2 后端开发指南
|
||||
|
||||
本文档提供后端架构概览与接口规范。开发规范与分层约定见 `Docs/BACKEND_DEV.md`。
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ 架构概览
|
||||
|
||||
后端采用 **FastAPI** 框架,基于 Python 3.10+ 构建,主要负责业务逻辑处理、AI 任务调度以及与各微服务组件的交互。
|
||||
|
||||
### 目录结构
|
||||
|
||||
```
|
||||
backend/
|
||||
├── app/
|
||||
│ ├── core/ # 核心配置 (config.py, security.py, response.py)
|
||||
│ ├── modules/ # 业务模块 (router/service/workflow/schemas)
|
||||
│ │ ├── videos/ # 视频生成任务(router/schemas/service/workflow)
|
||||
│ │ ├── materials/ # 素材管理(router/schemas/service)
|
||||
│ │ ├── publish/ # 多平台发布
|
||||
│ │ ├── auth/ # 认证与会话
|
||||
│ │ ├── ai/ # AI 功能(标题标签生成、多语言翻译)
|
||||
│ │ ├── assets/ # 静态资源(字体/样式/BGM)
|
||||
│ │ ├── ref_audios/ # 声音克隆参考音频(router/schemas/service)
|
||||
│ │ ├── generated_audios/ # 预生成配音管理(router/schemas/service)
|
||||
│ │ ├── login_helper/ # 扫码登录辅助
|
||||
│ │ ├── tools/ # 工具接口(router/schemas/service)
|
||||
│ │ ├── payment/ # 支付宝付费开通(router/schemas/service)
|
||||
│ │ └── admin/ # 管理员功能
|
||||
│ ├── repositories/ # Supabase 数据访问
|
||||
│ ├── services/ # 外部服务集成 (TTS/Remotion/Storage/Uploader 等)
|
||||
│ └── tests/ # 单元测试与集成测试
|
||||
├── scripts/ # 运维脚本 (watchdog.py, init_db.py)
|
||||
├── assets/ # 资源库 (fonts, bgm, styles)
|
||||
├── user_data/ # 用户隔离数据 (Cookie 等)
|
||||
└── requirements.txt # 依赖清单
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔌 API 接口规范
|
||||
|
||||
后端服务默认运行在 `8006` 端口。
|
||||
|
||||
- **文档地址**: `http://localhost:8006/docs` (Swagger UI)
|
||||
- **认证方式**: HttpOnly Cookie (JWT)
|
||||
|
||||
### 核心模块
|
||||
|
||||
1. **认证 (Auth)**
|
||||
* `POST /api/auth/login`: 用户登录 (手机号)
|
||||
* `POST /api/auth/register`: 用户注册
|
||||
* `GET /api/auth/me`: 获取当前用户信息
|
||||
|
||||
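> Membership expiry policy: at login and whenever a protected endpoint authenticates a request, the backend checks `users.expires_at`. An expired account is deactivated automatically (`is_active=false`), its sessions are cleared, and the response is `403: 会员已到期,请续费` ("membership expired, please renew").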
|
||||
|
||||
2. **视频生成 (Videos)**
|
||||
* `POST /api/videos/generate`: 提交生成任务
|
||||
* `GET /api/videos/tasks/{task_id}`: 查询单个任务状态
|
||||
* `GET /api/videos/tasks`: 获取用户所有任务列表
|
||||
* `GET /api/videos/generated`: 获取历史视频列表
|
||||
* `DELETE /api/videos/generated/{video_id}`: 删除历史视频
|
||||
|
||||
3. **素材管理 (Materials)**
|
||||
* `POST /api/materials`: 上传素材
|
||||
* `GET /api/materials`: 获取素材列表
|
||||
* `PUT /api/materials/{material_id}`: 重命名素材
|
||||
|
||||
4. **社交发布 (Publish)**
|
||||
* `POST /api/publish`: 发布视频到 抖音/微信视频号/B站/小红书
|
||||
* `POST /api/publish/login`: 扫码登录平台
|
||||
* `GET /api/publish/login/status`: 查询登录状态(含刷脸验证二维码)
|
||||
* `GET /api/publish/accounts`: 获取已登录账号列表
|
||||
|
||||
> 提示:视频号/抖音发布建议使用 headful + xvfb-run 运行后端。
|
||||
|
||||
5. **资源库 (Assets)**
|
||||
* `GET /api/assets/subtitle-styles`: 字幕样式列表
|
||||
* `GET /api/assets/title-styles`: 标题样式列表
|
||||
* `GET /api/assets/bgm`: 背景音乐列表
|
||||
|
||||
6. **声音克隆 (Ref Audios)**
|
||||
* `POST /api/ref-audios`: 上传参考音频 (multipart/form-data,自动 Whisper 转写 ref_text)
|
||||
* `GET /api/ref-audios`: 获取参考音频列表
|
||||
* `PUT /api/ref-audios/{id}`: 重命名参考音频
|
||||
* `DELETE /api/ref-audios/{id}`: 删除参考音频
|
||||
* `POST /api/ref-audios/{id}/retranscribe`: 重新识别参考音频文字(Whisper 转写 + 超 10s 自动截取)
|
||||
|
||||
7. **AI 功能 (AI)**
|
||||
* `POST /api/ai/generate-meta`: AI 生成标题和标签
|
||||
* `POST /api/ai/translate`: AI 多语言翻译(支持 9 种目标语言)
|
||||
|
||||
8. **预生成配音 (Generated Audios)**
|
||||
* `POST /api/generated-audios/generate`: 异步生成配音(返回 task_id)
|
||||
* `GET /api/generated-audios/tasks/{task_id}`: 轮询生成进度
|
||||
* `GET /api/generated-audios`: 列出用户所有配音
|
||||
* `DELETE /api/generated-audios/{audio_id}`: 删除配音
|
||||
* `PUT /api/generated-audios/{audio_id}`: 重命名配音
|
||||
|
||||
9. **工具 (Tools)**
|
||||
* `POST /api/tools/extract-script`: 从视频链接提取文案
|
||||
|
||||
10. **健康检查**
|
||||
* `GET /api/lipsync/health`: 唇形同步服务健康状态(含 LatentSync + MuseTalk + 混合路由阈值)
|
||||
* `GET /api/voiceclone/health`: CosyVoice 3.0 服务健康状态
|
||||
|
||||
11. **支付 (Payment)**
|
||||
* `POST /api/payment/create-order`: 创建支付宝电脑网站支付订单(需 payment_token)
|
||||
* `POST /api/payment/notify`: 支付宝异步通知回调(返回纯文本 success/fail)
|
||||
* `GET /api/payment/status/{out_trade_no}`: 查询订单支付状态(前端轮询)
|
||||
|
||||
> 登录时若账号未激活或已过期,返回 403 + `payment_token`,前端跳转 `/pay` 页面完成付费。详见 [支付宝部署指南](ALIPAY_DEPLOY.md)。
|
||||
|
||||
### 统一响应结构
|
||||
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"message": "ok",
|
||||
"data": { },
|
||||
"code": 0
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎛️ 视频生成扩展参数
|
||||
|
||||
`POST /api/videos/generate` 支持以下可选字段:
|
||||
|
||||
- `material_path`: 视频素材路径(单素材模式)
|
||||
- `material_paths`: 多素材路径数组(多机位模式,≥2 个素材时按句子自动切换)
|
||||
- `tts_mode`: TTS 模式 (`edgetts` / `voiceclone`)
|
||||
- `voice`: EdgeTTS 音色 ID(edgetts 模式)
|
||||
- `ref_audio_id` / `ref_text`: 参考音频 ID 与文本(voiceclone 模式)
|
||||
- `generated_audio_id`: 预生成配音 ID(存在时跳过内联 TTS,使用已生成的配音文件)
|
||||
- `speed`: 语速(声音克隆模式,默认 1.0,范围 0.8-1.2)
|
||||
- `custom_assignments`: 自定义素材分配数组(每项含 `material_path` / `start` / `end` / `source_start` / `source_end?`),存在时优先按时间轴可见段生成
|
||||
- `output_aspect_ratio`: 输出画面比例(`9:16` 或 `16:9`,默认 `9:16`)
|
||||
- `language`: TTS 语言(默认自动检测,声音克隆时透传给 CosyVoice 3.0)
|
||||
- `title`: 片头标题文字
|
||||
- `title_display_mode`: 标题显示模式(`short` / `persistent`,默认 `short`)
|
||||
- `title_duration`: 标题显示时长(秒,默认 `4.0`;`short` 模式生效)
|
||||
- `subtitle_style_id`: 字幕样式 ID
|
||||
- `title_style_id`: 标题样式 ID
|
||||
- `subtitle_font_size`: 字幕字号(覆盖样式默认值)
|
||||
- `title_font_size`: 标题字号(覆盖样式默认值)
|
||||
- `title_top_margin`: 标题距顶部像素
|
||||
- `secondary_title`: 片头副标题文字(可选,限 20 字,仅视频画面显示)
|
||||
- `secondary_title_style_id`: 副标题样式 ID
|
||||
- `secondary_title_font_size`: 副标题字号
|
||||
- `secondary_title_top_margin`: 副标题距主标题间距
|
||||
- `subtitle_bottom_margin`: 字幕距底部像素
|
||||
- `enable_subtitles`: 是否启用字幕
|
||||
- `bgm_id`: 背景音乐 ID
|
||||
- `bgm_volume`: 背景音乐音量(0-1,默认 0.2)
|
||||
|
||||
### 多素材稳定性说明
|
||||
|
||||
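- Multi-material clips are all re-encoded before concatenation and forced to `25fps + CFR`, which reduces stutter caused by mismatched time bases at segment boundaries.
- The concat step enables `+genpts` to rebuild timestamps, improving timeline continuity after concatenation.
- MOV material with rotation metadata is normalized for orientation first, before the resolution check and the rest of the pipeline.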
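
The two FFmpeg steps described above might be assembled roughly as follows. This is a sketch of one possible command line, not the commands the project runs: the helper names, the codec choices (`libx264`, `aac`), and the use of the `fps` filter to obtain a constant frame rate are assumptions. The `-fflags +genpts` input option and the concat demuxer (`-f concat -safe 0`) are standard FFmpeg features.

```python
def normalize_cmd(src: str, dst: str, fps: int = 25) -> list[str]:
    """Re-encode one clip to a constant frame rate before concatenation."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vf", f"fps={fps}",  # resample to a fixed frame rate
        "-c:v", "libx264",
        "-c:a", "aac",
        dst,
    ]


def concat_cmd(list_file: str, dst: str) -> list[str]:
    """Concatenate pre-normalized clips, regenerating presentation timestamps."""
    return [
        "ffmpeg", "-y",
        "-fflags", "+genpts",  # input option: rebuild missing or odd PTS values
        "-f", "concat", "-safe", "0",
        "-i", list_file,
        "-c", "copy",
        dst,
    ]
```

Building the arguments as a list and passing it to `subprocess.run` avoids shell quoting problems with file names that contain spaces.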
|
||||
|
||||
## 📦 资源库与静态资源
|
||||
|
||||
- 本地资源目录:`backend/assets/{fonts,bgm,styles}`
|
||||
- 静态访问路径:`/assets`(用于前端样式预览与背景音乐试听)
|
||||
|
||||
## 🎵 背景音乐混音策略
|
||||
|
||||
- 混音发生在 **唇形对齐之后**,避免影响字幕/口型时间轴。
|
||||
- 使用 FFmpeg `amix`,禁用归一化以保持配音音量稳定。
|
||||
|
||||
## 🛠️ 开发环境搭建
|
||||
|
||||
### 1. 虚拟环境
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
python -m venv venv
|
||||
source venv/bin/activate # Linux/macOS
|
||||
# .\venv\Scripts\activate # Windows
|
||||
```
|
||||
|
||||
### 2. 依赖安装
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
### 3. 环境变量配置
|
||||
|
||||
复制 `.env.example` 到 `.env` 并配置必要的 Key:
|
||||
|
||||
```ini
|
||||
# Supabase
|
||||
SUPABASE_URL=http://localhost:8008
|
||||
SUPABASE_KEY=your_service_role_key
|
||||
|
||||
# GLM API (用于 AI 标题生成)
|
||||
GLM_API_KEY=your_glm_api_key
|
||||
|
||||
# LatentSync 配置
|
||||
LATENTSYNC_GPU_ID=1
|
||||
|
||||
# MuseTalk 配置 (长视频唇形同步)
|
||||
MUSETALK_GPU_ID=0
|
||||
MUSETALK_API_URL=http://localhost:8011
|
||||
MUSETALK_BATCH_SIZE=32
|
||||
LIPSYNC_DURATION_THRESHOLD=120
|
||||
```
|
||||
|
||||
### 4. 启动服务
|
||||
|
||||
**开发模式 (热重载)**:
|
||||
```bash
|
||||
uvicorn app.main:app --host 0.0.0.0 --port 8006 --reload
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🧩 服务集成指南
|
||||
|
||||
### 集成新模型
|
||||
|
||||
如果需要集成新的 AI 模型 (例如新的 TTS 引擎):
|
||||
|
||||
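1. Create a new Service class under `app/services/` (for example `NewTTSService`).
2. Implement a `generate` method; it may call the model through a subprocess or over HTTP.
3. **Important**: if the model uses the GPU, guard it with an `asyncio.Lock` to control concurrency and prevent OOM.
4. Create the corresponding module under `app/modules/`, add router/service/schemas, and register the router in `main.py`.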
|
||||
|
||||
### 唇形同步混合路由
|
||||
|
||||
`lipsync_service.py` implements hybrid LatentSync + MuseTalk routing:

- Short videos (< `LIPSYNC_DURATION_THRESHOLD` seconds) → LatentSync 1.6 (GPU1, port 8007)
- Long videos (>= threshold) → MuseTalk 1.5 (GPU0, port 8011)
- Automatic fallback to LatentSync when MuseTalk is unavailable
- The routing logic is fully transparent to the workflow
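
The routing rule itself is small enough to write down directly. The function below is a hypothetical sketch of the decision only; how `lipsync_service.py` measures duration or probes MuseTalk availability is not described in this document.

```python
def choose_lipsync_backend(
    duration_s: float,
    threshold_s: float = 120.0,
    musetalk_available: bool = True,
) -> str:
    """Pick the lip-sync backend for a clip of the given duration."""
    if duration_s >= threshold_s and musetalk_available:
        return "musetalk"
    # Short clips, and long clips when MuseTalk is down, go to LatentSync.
    return "latentsync"
```

For example, with the default threshold of 120 seconds, a 119.9-second clip goes to LatentSync, a 120-second clip goes to MuseTalk, and a 300-second clip falls back to LatentSync if MuseTalk is unavailable.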
|
||||
|
||||
### 添加定时任务
|
||||
|
||||
目前推荐使用 **APScheduler** 或 **Crontab** 来管理定时任务。
|
||||
社交媒体的定时发布功能目前依赖 `playwright` 的延迟执行,未来计划迁移到 Celery 队列。
|
||||
|
||||
---
|
||||
|
||||
## 🛡️ 错误处理
|
||||
|
||||
全项目统一使用 `Loguru` 进行日志记录。
|
||||
|
||||
```python
from fastapi import HTTPException
from loguru import logger

try:
    ...  # business logic goes here
except Exception as e:
    # Log the real cause, but return only a generic message to the client
    logger.error(f"操作失败: {str(e)}")
    raise HTTPException(status_code=500, detail="服务器内部错误") from e
```
|
||||
|
||||
---
|
||||
|
||||
## 🧪 测试
|
||||
|
||||
运行测试套件:
|
||||
|
||||
```bash
|
||||
pytest
|
||||
```
|
||||
212
Docs/COSYVOICE3_DEPLOY.md
Normal file
@@ -0,0 +1,212 @@
|
||||
# CosyVoice 3.0 部署文档
|
||||
|
||||
## 概览
|
||||
|
||||
| 项目 | 值 |
|
||||
|------|------|
|
||||
| 模型 | Fun-CosyVoice3-0.5B-2512 (0.5B 参数) |
|
||||
| 端口 | 8010 |
|
||||
| GPU | 0 (CUDA_VISIBLE_DEVICES=0) |
|
||||
| 推理精度 | FP16 (自动混合精度) |
|
||||
| PM2 名称 | vigent2-cosyvoice (id=15) |
|
||||
| Conda 环境 | cosyvoice (Python 3.10) |
|
||||
| 启动脚本 | `run_cosyvoice.sh` |
|
||||
| 服务脚本 | `models/CosyVoice/cosyvoice_server.py` |
|
||||
| 模型加载时间 | ~22-34 秒 |
|
||||
| 显存占用 | ~3-5 GB |
|
||||
|
||||
## 支持语言
|
||||
|
||||
中文、英文、日语、韩语、德语、西班牙语、法语、意大利语、俄语,18+ 中国方言
|
||||
|
||||
## 目录结构
|
||||
|
||||
```
|
||||
models/CosyVoice/
|
||||
├── cosyvoice_server.py # FastAPI 服务 (端口 8010)
|
||||
├── cosyvoice/ # CosyVoice 源码
|
||||
│ └── cli/cosyvoice.py # AutoModel 入口
|
||||
├── third_party/Matcha-TTS/ # 子模块依赖
|
||||
├── pretrained_models/
|
||||
│ ├── Fun-CosyVoice3-0.5B/ # 模型文件 (~8.2GB)
|
||||
│ │ ├── llm.pt # LLM 模型 (1.9GB)
|
||||
│ │ ├── llm.rl.pt # RL 模型 (1.9GB, 备用)
|
||||
│ │ ├── flow.pt # Flow 模型 (1.3GB)
|
||||
│ │ ├── hift.pt # HiFT 声码器 (80MB)
|
||||
│ │ ├── campplus.onnx # 说话人嵌入 (27MB)
|
||||
│ │ ├── speech_tokenizer_v3.onnx # 语音分词器 (925MB)
|
||||
│ │ ├── cosyvoice3.yaml # 模型配置
|
||||
│ │ └── CosyVoice-BlankEN/ # Qwen tokenizer
|
||||
│ └── CosyVoice-ttsfrd/ # 文本正则化资源
|
||||
│ ├── resource/ # 解压后的 ttsfrd 资源
|
||||
│ └── resource.zip
|
||||
run_cosyvoice.sh # PM2 启动脚本
|
||||
```
|
||||
|
||||
## API 接口
|
||||
|
||||
### GET /health
|
||||
|
||||
健康检查,返回:
|
||||
```json
|
||||
{
|
||||
"service": "CosyVoice 3.0 Voice Clone",
|
||||
"model": "Fun-CosyVoice3-0.5B",
|
||||
"ready": true,
|
||||
"gpu_id": 0
|
||||
}
|
||||
```
|
||||
|
||||
### POST /generate
|
||||
|
||||
声音克隆生成。
|
||||
|
||||
**参数 (multipart/form-data):**
|
||||
|
||||
| 参数 | 类型 | 必填 | 说明 |
|
||||
|------|------|------|------|
|
||||
| ref_audio | File | 是 | 参考音频 (WAV) |
|
||||
| text | string | 是 | 要合成的文本 |
|
||||
| ref_text | string | 是 | 参考音频的转写文字 |
|
||||
| language | string | 否 | 语言 (默认 "Chinese",CosyVoice 自动检测) |
|
||||
| speed | float | 否 | 语速 (默认 1.0,范围 0.5-2.0,建议 0.8-1.2) |
|
||||
|
||||
**返回:** WAV 音频文件
|
||||
|
||||
**状态码:**
|
||||
- 200: 成功
|
||||
- 429: GPU 忙,请重试
|
||||
- 500: 生成失败/超时
|
||||
- 503: 模型未加载/服务中毒
|
||||
|
||||
## 安全机制
|
||||
|
||||
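1. **GPU inference lock** (`asyncio.Lock`): prevents concurrent inference from corrupting GPU state
2. **429 rejection**: when the lock is held, the request is rejected immediately with 429 and the client retries
3. **Timeout protection**: `60 + len(text) * 2` seconds, capped at 300 seconds
4. **Poisoned flag**: after a timeout the service is marked as poisoned and the health check returns `ready: false`
5. **Forced exit**: 1.5 seconds after a timeout the process calls `os._exit(1)` and PM2 restarts it
6. **Startup self-test**: at startup a real inference is run on a short text to verify the GPU inference path; on failure `_model_loaded = False` and the health check returns `ready: false`, avoiding false positives
7. **Automatic reference-audio trimming**: reference audio longer than 10 seconds is trimmed to its first 10 seconds (CosyVoice recommends 3-10 seconds) to avoid sampling anomalies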
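
The first three mechanisms (lock, immediate 429, and the timeout formula) can be sketched together. This is an illustration, not the code in `cosyvoice_server.py`: `guarded_generate` and `inference_timeout` are hypothetical names, `infer` is an assumed async callable that performs the actual synthesis, and the poisoned flag and forced exit are left out.

```python
import asyncio


def inference_timeout(text: str) -> int:
    """Timeout in seconds: 60 + 2 per character, capped at 300."""
    return min(60 + len(text) * 2, 300)


async def guarded_generate(lock: asyncio.Lock, text: str, infer):
    """Run one inference under the GPU lock; returns (status_code, result)."""
    # Reject immediately instead of queueing when another request holds the GPU.
    if lock.locked():
        return 429, None
    async with lock:
        try:
            result = await asyncio.wait_for(
                infer(text), timeout=inference_timeout(text)
            )
        except asyncio.TimeoutError:
            return 500, None
        return 200, result
```

Rejecting with 429 rather than waiting on the lock keeps request latency bounded and pushes retry policy to the client, which matches the behaviour described in the list.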
|
||||
|
||||
## 运维命令
|
||||
|
||||
```bash
|
||||
# 启动
|
||||
pm2 start run_cosyvoice.sh --name vigent2-cosyvoice
|
||||
|
||||
# 重启
|
||||
pm2 restart vigent2-cosyvoice
|
||||
|
||||
# 查看日志
|
||||
pm2 logs vigent2-cosyvoice --lines 50
|
||||
|
||||
# 健康检查
|
||||
curl http://localhost:8010/health
|
||||
|
||||
# 停止
|
||||
pm2 stop vigent2-cosyvoice
|
||||
```
|
||||
|
||||
## 从零部署步骤
|
||||
|
||||
### 1. 克隆仓库
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models
|
||||
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
|
||||
cd CosyVoice
|
||||
git submodule update --init --recursive
|
||||
```
|
||||
|
||||
### 2. 创建 Conda 环境
|
||||
|
||||
```bash
|
||||
conda create -n cosyvoice -y python=3.10
|
||||
conda activate cosyvoice
|
||||
```
|
||||
|
||||
### 3. 安装依赖
|
||||
|
||||
注意:不能直接 `pip install -r requirements.txt`,有版本冲突需要处理。
|
||||
|
||||
```bash
|
||||
# 安装 PyTorch 2.3.1 (CUDA 12.1) — 必须先装,版本严格要求
|
||||
pip install torch==2.3.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
|
||||
|
||||
# 核心推理依赖
|
||||
pip install conformer==0.3.2 HyperPyYAML==1.2.2 inflect==7.3.1 \
|
||||
librosa==0.10.2 lightning==2.2.4 modelscope==1.20.0 omegaconf==2.3.0 \
|
||||
pydantic==2.7.0 soundfile==0.12.1 fastapi==0.115.6 uvicorn==0.30.0 \
|
||||
transformers==4.51.3 protobuf==4.25 hydra-core==1.3.2 \
|
||||
rich==13.7.1 diffusers==0.29.0 x-transformers==2.11.24 wetext==0.0.4
|
||||
|
||||
# onnxruntime-gpu
|
||||
pip install onnxruntime-gpu==1.18.0 \
|
||||
--extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
|
||||
|
||||
# 其他必要依赖
|
||||
pip install gdown matplotlib pyarrow wget onnx python-multipart httpx
|
||||
|
||||
# openai-whisper 需要 setuptools < 71(提供 pkg_resources)
|
||||
pip install "setuptools<71"
|
||||
pip install --no-build-isolation openai-whisper==20231117
|
||||
|
||||
# pyworld 需要 g++ 和 Cython
|
||||
pip install Cython
|
||||
PATH="/usr/bin:$PATH" pip install pyworld==0.3.4
|
||||
|
||||
# 关键版本修复
|
||||
pip install "numpy<2" # onnxruntime-gpu 不兼容 numpy 2.x
|
||||
pip install "ruamel.yaml<0.18" # hyperpyyaml 不兼容 ruamel.yaml 0.19+
|
||||
```
|
||||
|
||||
> **重要**: CosyVoice 要求 torch==2.3.1。torch 2.10+ 会导致 CUBLAS_STATUS_INVALID_VALUE 错误。
|
||||
> torch 2.3.1+cu121 自带 nvidia-cudnn-cu12,onnxruntime CUDAExecutionProvider 可正常使用。
|
||||
|
||||
### 4. 下载模型
|
||||
|
||||
```bash
|
||||
# 使用 huggingface_hub (国内用 hf-mirror.com)
|
||||
HF_ENDPOINT=https://hf-mirror.com python -c "
|
||||
from huggingface_hub import snapshot_download
|
||||
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
|
||||
snapshot_download('FunAudioLLM/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
|
||||
"
|
||||
```
|
||||
|
||||
### 5. 安装 ttsfrd (可选,提升文本正则化质量)
|
||||
|
||||
```bash
|
||||
cd pretrained_models/CosyVoice-ttsfrd/
|
||||
unzip resource.zip -d .
|
||||
pip install ttsfrd_dependency-0.1-py3-none-any.whl
|
||||
pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl
|
||||
```
|
||||
|
||||
### 6. 注册 PM2
|
||||
|
||||
```bash
|
||||
pm2 start run_cosyvoice.sh --name vigent2-cosyvoice
|
||||
pm2 save
|
||||
```
|
||||
|
||||
## 已知问题
|
||||
|
||||
1. **ttsfrd "prepare tts engine failed"**: ttsfrd C 库内部日志,Python 层初始化成功,不影响使用
|
||||
2. **Sliding Window Attention 警告**: transformers 库提示,不影响推理结果
|
||||
3. **onnxruntime Memcpy 性能提示**: `Memcpy nodes are not supported by the CUDA EP`,仅为性能建议日志,不影响功能
|
||||
|
||||
> 注:libcudnn.so.8 问题在 torch 2.3.1+cu121 环境下已解决(自带 nvidia-cudnn-cu12),onnxruntime CUDAExecutionProvider 可正常加载。
|
||||
|
||||
## Comparison with Qwen3-TTS

| Feature | Qwen3-TTS (retired) | CosyVoice 3.0 (current) |
|------|-----------|----------------|
| Port | 8009 | 8010 |
| Model size | 0.6B | 0.5B |
| Languages | Chinese/English/Japanese/Korean | 9 languages + 18 dialects |
| Cloning method | ref_audio + ref_text | ref_audio + ref_text |
| Prompt format | ref_text passed directly | `You are a helpful assistant.<\|endofprompt\|>` + ref_text |
| Built-in segmentation | None; the client must segment | Built-in text_normalize segments automatically |
| Status | Retired (PM2 stopped) | In production |
|
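The prompt-format row of the comparison can be illustrated with a small Python sketch. The helper names are hypothetical and not taken from the project; only the prefix string comes from the table, where the backslashes are Markdown table escapes rather than part of the prompt.

```python
# Hypothetical helpers: only the prefix string comes from the comparison table.
COSYVOICE_PROMPT_PREFIX = "You are a helpful assistant.<|endofprompt|>"


def build_cosyvoice_prompt(ref_text: str) -> str:
    """Prefix the reference transcript as the CosyVoice 3.0 column describes."""
    return COSYVOICE_PROMPT_PREFIX + ref_text


def build_qwen3_prompt(ref_text: str) -> str:
    """Qwen3-TTS took the reference transcript unchanged."""
    return ref_text
```

Keeping the prefix in one constant means a client that switches between the two services only has to change which builder it calls.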
||||
@@ -7,8 +7,8 @@
|
||||
| Server | Dell PowerEdge R730 |
| CPU | 2× Intel Xeon E5-2680 v4 (56 threads) |
| Memory | 192GB DDR4 |
-| GPU 0 | NVIDIA RTX 3090 24GB |
-| GPU 1 | NVIDIA RTX 3090 24GB (used for LatentSync) |
+| GPU 0 | NVIDIA RTX 3090 24GB (MuseTalk + CosyVoice) |
+| GPU 1 | NVIDIA RTX 3090 24GB (LatentSync) |
| Deployment path | `/home/rongye/ProgramFiles/ViGent2` |
|
||||
|
||||
---
|
||||
@@ -28,8 +28,17 @@ node --version
|
||||
# Check FFmpeg
ffmpeg -version

# Check Chrome (WeChat Channels publishing)
google-chrome --version

# Check Xvfb
xvfb-run --help

# Check pm2 (service management)
pm2 --version

# Check Redis (task-state storage, recommended)
redis-server --version
|
||||
```
|
||||
|
||||
If any dependency is missing:
|
||||
@@ -37,8 +46,17 @@ pm2 --version
|
||||
sudo apt update
sudo apt install ffmpeg

# Install Xvfb (WeChat Channels publishing)
sudo apt install xvfb

# Install pm2
npm install -g pm2

# Install Chrome (WeChat Channels publishing)
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/google-linux-signing-keyring.gpg
printf "deb [arch=amd64 signed-by=/usr/share/keyrings/google-linux-signing-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main\n" | sudo tee /etc/apt/sources.list.d/google-chrome.list > /dev/null
sudo apt update
sudo apt install -y google-chrome-stable
|
||||
```
|
||||
|
||||
---
|
||||
@@ -54,7 +72,9 @@ cd /home/rongye/ProgramFiles/ViGent2
|
||||
|
||||
---
|
||||
|
||||
-## Step 3: Deploy the AI model (LatentSync 1.6)
+## Step 3: Deploy the AI models

+### 3a. LatentSync 1.6 (short-video lip sync, GPU1)

> ⚠️ **Important**: LatentSync needs its own Conda environment and **~18GB of VRAM**. Do **not** install it directly into the backend environment.
|
||||
|
||||
@@ -75,6 +95,26 @@ conda activate latentsync
|
||||
python -m scripts.server  # check that it starts; press Ctrl+C to exit
|
||||
```
|
||||
|
||||
### 3b. MuseTalk 1.5 (long-video lip sync, GPU0)

> MuseTalk is a single-step latent-space inpainting model (not a diffusion model). Its inference runs at near real-time speed, which suits long videos of >=120s. It shares GPU0 with CosyVoice; fp16 inference needs roughly 4-8GB of VRAM.

See the separate, detailed deployment guide:
**[MuseTalk Deployment Guide](MUSETALK_DEPLOY.md)**

In brief:
1. Create a dedicated `musetalk` Conda environment (Python 3.10 + PyTorch 2.0.1 + CUDA 11.8)
2. Install mmcv/mmdet/mmpose and the other dependencies
3. Download the model weights (`download_weights.sh`)
4. Create the required symlinks (`musetalk/config.json`, `musetalk/musetalkV15`)

**Verify the MuseTalk deployment**:
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
# In another terminal: curl http://localhost:8011/health
```
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Install backend dependencies
|
||||
@@ -96,6 +136,27 @@ pip install -r requirements.txt
|
||||
playwright install chromium
|
||||
```
|
||||
|
||||
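> Tip: for WeChat Channels publishing, use system Chrome + xvfb-run (this avoids headless decoding failures).
> Douyin publishing likewise works best in headful mode (`DOUYIN_HEADLESS_MODE=headful`).

### Notes on QR-code login

- **Cookies are isolated per user**: each user's cookies are stored under `backend/user_data/{uuid}/cookies/`, so concurrent logins by multiple users do not interfere with one another.
- **Key lessons from Douyin QR login**:
  - After the scan, **never reload the QR page**; doing so destroys the session token
  - Use a **new tab** to detect login completion (check that the URL contains `creator-micro` and that session cookies exist)
  - Douyin may pop up **face verification**; the backend automatically extracts the verification QR code and returns it to the frontend for display
- **WeChat Channels publishing**: title, description, and tags are all written into the "video description" field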
|
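The per-user cookie layout `backend/user_data/{uuid}/cookies/` can be sketched in Python as follows. This is a minimal illustration rather than the project's actual code: the function name and the UUID validation step are assumptions.

```python
import uuid
from pathlib import Path


def user_cookie_dir(base_dir: str, user_id: str) -> Path:
    """Return <base_dir>/user_data/<uuid>/cookies for a validated user id.

    Hypothetical helper. uuid.UUID raises ValueError for anything that is not
    a UUID, which also rejects path-traversal strings such as "../other-user".
    """
    normalized = str(uuid.UUID(user_id))
    return Path(base_dir) / "user_data" / normalized / "cookies"
```

Because every user resolves to a distinct directory, two concurrent login sessions never write to the same cookie file.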
||||
|
||||
---
|
||||
|
||||
### Optional: AI title/tag generation

> ✅ To enable "AI title/tag generation", make sure the backend can reach the external API.

- `https://open.bigmodel.cn` must be reachable
- The API key is configured in `backend/app/services/glm_service.py` (replace it with your own key)
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Deploy the user authentication system (Supabase + Auth)
|
||||
@@ -126,6 +187,8 @@ playwright install chromium
|
||||
CREATE POLICY "Allow public read" ON storage.objects FOR SELECT TO anon USING (bucket_id = 'materials' OR bucket_id = 'outputs');
|
||||
EOF
|
||||
```
|
||||
|
||||
> **Note**: on startup the backend automatically creates the additional storage buckets (`ref-audios`, `generated-audios`); there is no need to create them manually.
|
||||
|
||||
---
|
||||
|
||||
@@ -148,9 +211,44 @@ cp .env.example .env
|
||||
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Public Supabase API URL (used by the frontend) |
| `LATENTSYNC_GPU_ID` | 1 | GPU selection (0 or 1) |
| `LATENTSYNC_USE_SERVER` | false | Set to true to enable the resident server for faster inference |
-| `LATENTSYNC_INFERENCE_STEPS` | 20 | Inference steps (20-50) |
+| `LATENTSYNC_INFERENCE_STEPS` | 16 | Inference steps (16-50) |
| `LATENTSYNC_GUIDANCE_SCALE` | 1.5 | Guidance scale (1.0-3.0) |
| `DEBUG` | true | Set to false in production |
| `REDIS_URL` | `redis://localhost:6379/0` | Task-state storage (falls back to in-memory when unavailable) |
| `WEIXIN_HEADLESS_MODE` | headless-new | WeChat Channels Playwright mode (headful/headless-new) |
| `WEIXIN_CHROME_PATH` | `/usr/bin/google-chrome` | System Chrome path |
| `WEIXIN_BROWSER_CHANNEL` | | Chromium channel (optional) |
| `WEIXIN_USER_AGENT` | Chrome 120 UA | WeChat Channels browser-fingerprint UA |
| `WEIXIN_LOCALE` | zh-CN | WeChat Channels locale |
| `WEIXIN_TIMEZONE_ID` | Asia/Shanghai | WeChat Channels time zone |
| `WEIXIN_FORCE_SWIFTSHADER` | true | Force software WebGL to avoid "context lost" |
| `WEIXIN_TRANSCODE_MODE` | reencode | Transcode before upload (reencode/faststart/off) |
| `DOUYIN_HEADLESS_MODE` | headless-new | Douyin Playwright mode (headful/headless-new) |
| `DOUYIN_CHROME_PATH` | `/usr/bin/google-chrome` | Douyin Chrome path |
| `DOUYIN_BROWSER_CHANNEL` | | Douyin Chromium channel (optional) |
| `DOUYIN_USER_AGENT` | Chrome/144 UA | Douyin browser-fingerprint UA |
| `DOUYIN_LOCALE` | zh-CN | Douyin locale |
| `DOUYIN_TIMEZONE_ID` | Asia/Shanghai | Douyin time zone |
| `DOUYIN_FORCE_SWIFTSHADER` | true | Force software WebGL |
| `DOUYIN_DEBUG_ARTIFACTS` | false | Keep debug screenshots |
| `DOUYIN_RECORD_VIDEO` | false | Record a video of browser actions |
| `DOUYIN_KEEP_SUCCESS_VIDEO` | false | Keep the recording after a successful run |
| `CORS_ORIGINS` | `*` | Allowed CORS origins (use an allowlist in production) |
| `MUSETALK_GPU_ID` | 0 | MuseTalk GPU index |
| `MUSETALK_API_URL` | `http://localhost:8011` | MuseTalk resident service URL |
| `MUSETALK_BATCH_SIZE` | 32 | MuseTalk inference batch size |
| `MUSETALK_VERSION` | v15 | MuseTalk model version |
| `MUSETALK_USE_FLOAT16` | true | MuseTalk half-precision acceleration |
| `LIPSYNC_DURATION_THRESHOLD` | 120 | Seconds; >= this value uses MuseTalk, < this value uses LatentSync |
| `ALIPAY_APP_ID` | empty | Alipay application APPID |
| `ALIPAY_PRIVATE_KEY_PATH` | empty | Path to the application private key PEM file |
| `ALIPAY_PUBLIC_KEY_PATH` | empty | Path to the Alipay public key PEM file |
| `ALIPAY_NOTIFY_URL` | empty | Alipay asynchronous callback URL (public HTTPS) |
| `ALIPAY_RETURN_URL` | empty | Browser redirect URL after payment completes |
| `PAYMENT_AMOUNT` | `999.00` | Membership price (CNY) |
| `PAYMENT_EXPIRE_DAYS` | `365` | Membership validity in days |

> For the full Alipay setup (key generation, PEM format, product activation, and so on), see the **[Alipay Deployment Guide](ALIPAY_DEPLOY.md)**.
|
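The `LIPSYNC_DURATION_THRESHOLD` row describes a simple routing rule. The sketch below illustrates it in Python under stated assumptions: the function name and the `musetalk_available` flag are hypothetical, and the fallback branch reflects the documented behavior that LatentSync is used when MuseTalk is unavailable.

```python
def choose_lipsync_engine(
    duration_s: float,
    threshold_s: float = 120.0,       # LIPSYNC_DURATION_THRESHOLD default
    musetalk_available: bool = True,  # assumed to come from a health probe
) -> str:
    """Pick the lip-sync engine for a clip of the given duration."""
    # Long clips go to MuseTalk, but only if the service is reachable.
    if duration_s >= threshold_s and musetalk_available:
        return "musetalk"
    # Short clips, or long clips while MuseTalk is down, use LatentSync.
    return "latentsync"
```

Note that the comparison is `>=`, so a clip of exactly 120 seconds is routed to MuseTalk.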
||||
|
||||
---
|
||||
|
||||
@@ -180,6 +278,12 @@ source venv/bin/activate
|
||||
uvicorn app.main:app --host 0.0.0.0 --port 8006
|
||||
```
|
||||
|
||||
We recommend starting the backend with the project script (the xvfb + headful publishing environment is built in):
```bash
cd /home/rongye/ProgramFiles/ViGent2
./run_backend.sh  # defaults to 8006; override with PORT
```
|
||||
|
||||
### Start the frontend (terminal 2)
|
||||
|
||||
```bash
|
||||
@@ -194,6 +298,13 @@ cd /home/rongye/ProgramFiles/ViGent2/models/LatentSync
|
||||
conda activate latentsync
|
||||
python -m scripts.server
|
||||
```
|
||||
|
||||
### Start MuseTalk (terminal 4, long-video lip sync)

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
```
|
||||
|
||||
### Verification
|
||||
|
||||
@@ -214,9 +325,19 @@ python -m scripts.server
|
||||
1. Create the startup script `run_backend.sh`:
```bash
cat > run_backend.sh << 'EOF'
-#!/bin/bash
-cd /home/rongye/ProgramFiles/ViGent2/backend
-./venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 8006
+#!/usr/bin/env bash
+set -e
+BASE_DIR="$(cd "$(dirname "$0")" && pwd)"
+export WEIXIN_HEADLESS_MODE=headful
+export DOUYIN_HEADLESS_MODE=headful
+export WEIXIN_DEBUG_ARTIFACTS=false
+export WEIXIN_RECORD_VIDEO=false
+export DOUYIN_DEBUG_ARTIFACTS=false
+export DOUYIN_RECORD_VIDEO=false
+PORT=${PORT:-8006}
+cd "$BASE_DIR/backend"
+exec xvfb-run --auto-servernum --server-args="-screen 0 1920x1080x24" \
+  ./venv/bin/uvicorn app.main:app --host 0.0.0.0 --port "$PORT"
EOF
chmod +x run_backend.sh
```
|
||||
@@ -258,19 +379,72 @@ chmod +x run_latentsync.sh
|
||||
pm2 start ./run_latentsync.sh --name vigent2-latentsync
|
||||
```
|
||||
|
||||
-### 4. Save the current process list (start on boot)
+### 4. Start the CosyVoice 3.0 voice-cloning service (optional)

> This service must be running to use voice cloning. For detailed deployment steps see the [CosyVoice 3.0 deployment doc](COSYVOICE3_DEPLOY.md).

1. The startup script lives in the project root: `run_cosyvoice.sh`

2. Start it with pm2:
```bash
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_cosyvoice.sh --name vigent2-cosyvoice
pm2 save
```

3. Verify the service:
```bash
# Check health
curl http://localhost:8010/health
```
|
||||
|
||||
### 5. Start the MuseTalk long-video lip-sync service

> Long videos (>=120s) are routed to MuseTalk automatically. If MuseTalk is unavailable, the backend falls back to LatentSync automatically.
> For detailed deployment steps see the [MuseTalk Deployment Guide](MUSETALK_DEPLOY.md).

1. The startup script lives in the project root: `run_musetalk.sh`

2. Start it with pm2:
```bash
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_musetalk.sh --name vigent2-musetalk
pm2 save
```

3. Verify the service:
```bash
curl http://localhost:8011/health
# {"status":"ok","model_loaded":true}
```
|
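The `curl` health checks can also be scripted. The following Python sketch uses only the standard library; it assumes the response body is JSON shaped like the MuseTalk sample (`status` and `model_loaded`), and that CosyVoice answers in a compatible shape, which the source does not confirm. The `SERVICES` mapping and function names are illustrative.

```python
import json
import urllib.request

# Ports taken from the deployment steps; the mapping itself is illustrative.
SERVICES = {
    "vigent2-cosyvoice": "http://localhost:8010/health",
    "vigent2-musetalk": "http://localhost:8011/health",
}


def is_healthy(body: str) -> bool:
    """Interpret a /health response body (assumed JSON with a 'status' field)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    # 'model_loaded' is optional here; when present it must be true.
    return payload.get("status") == "ok" and payload.get("model_loaded", True) is True


def check(url: str, timeout: float = 5.0) -> bool:
    """Fetch a health endpoint; any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'ok' if check(url) else 'DOWN'}")
```

Separating `is_healthy` from the network call keeps the parsing logic testable without a running service.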
||||
|
||||
### 6. Start the service watchdog (Watchdog)

> 🛡️ **Recommended**: monitors the health of the CosyVoice and LatentSync services and restarts them automatically if they hang.

```bash
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_watchdog.sh --name vigent2-watchdog
pm2 save
```

### 7. Save the current process list (start on boot)

```bash
pm2 save
pm2 startup
```

> **Tip**: the full PM2 process list should contain 5-6 services: vigent2-backend, vigent2-frontend, vigent2-latentsync, vigent2-cosyvoice, vigent2-musetalk, vigent2-watchdog.
|
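The source does not show what `run_watchdog.sh` does internally. As a rough illustration of the watchdog idea (probe each service, restart it after repeated failures), here is a Python sketch. The class, the failure threshold, and the probe callables are all assumptions; `pm2 restart <name>` is the only real command used.

```python
import subprocess
from typing import Callable, Dict, List


def pm2_restart(name: str) -> None:
    """Restart a PM2-managed process by name."""
    subprocess.run(["pm2", "restart", name], check=False)


class Watchdog:
    """Restart a service after `max_failures` consecutive failed probes (hypothetical design)."""

    def __init__(
        self,
        probes: Dict[str, Callable[[], bool]],
        max_failures: int = 3,
        restart: Callable[[str], None] = pm2_restart,
    ) -> None:
        self.probes = probes
        self.max_failures = max_failures
        self.restart = restart
        self.failures = {name: 0 for name in probes}

    def tick(self) -> List[str]:
        """Run every probe once; return the names of services that were restarted."""
        restarted = []
        for name, probe in self.probes.items():
            if probe():
                self.failures[name] = 0  # a healthy probe clears the streak
                continue
            self.failures[name] += 1
            if self.failures[name] >= self.max_failures:
                self.restart(name)
                self.failures[name] = 0
                restarted.append(name)
        return restarted
```

Requiring several consecutive failures avoids restarting a service that is merely busy with a long inference when a single probe times out.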
||||
|
||||
### Common pm2 commands

```bash
pm2 status                   # show the status of all services
pm2 logs                     # show all logs
pm2 logs vigent2-backend     # show backend logs
pm2 logs vigent2-cosyvoice   # show CosyVoice logs
pm2 logs vigent2-musetalk    # show MuseTalk logs
pm2 restart all              # restart all services
pm2 stop vigent2-latentsync  # stop the LatentSync service
pm2 delete all               # delete all services
|
||||
@@ -322,7 +496,46 @@ server {
|
||||
|
||||
---
|
||||
|
||||
-## Step 12: Configure the Alibaba Cloud Nginx gateway (critical)
|
||||
---
|
||||
|
||||
## Step 13: Deploy optional features (subtitles and copywriting assistant)

This section covers deploying per-character highlighted subtitles, the intro title, and the script-extraction assistant.

### 13.1 Deploy the subtitle system (Subtitle System)

Consists of the `faster-whisper` (subtitle generation) and `Remotion` (video rendering) components.

For detailed steps see the **[Subtitle Deployment Guide](SUBTITLE_DEPLOY.md)**.

In brief:
1. Install the Python dependency: `faster-whisper`
2. Install the Node.js dependencies: `npm install` (in the `remotion/` directory)
3. Verify: `npx remotion --version`

### 13.2 Deploy the script-extraction assistant (Copywriting Assistant)

Extracts the spoken script from Bilibili/Douyin/TikTok video links and rewrites it with AI.

1. **Install the core dependencies**:
```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
source venv/bin/activate
pip install yt-dlp zai-sdk
```

2. **Configure AI rewriting (GLM)**:
Make sure `GLM_API_KEY` is set in `.env`:
```ini
GLM_API_KEY=your_zhipu_api_key
```

3. **Verify**:
Open `http://localhost:8006/docs` and test the `/api/tools/extract-script` endpoint.
|
||||
|
||||
---
|
||||
|
||||
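+## Step 14: Configure the Alibaba Cloud Nginx gateway (critical)

> ⚠️ **CRITICAL**: if a domain such as `api.hbyrkj.top` is the entry point, the upload limit must be lifted in the Nginx configuration on Alibaba Cloud (or whichever host is the public entry point).
> **This is the root cause of the 500/413 errors.**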
|
||||
@@ -370,6 +583,8 @@ python3 -c "import torch; print(torch.cuda.is_available())"
|
||||
sudo lsof -i :8006
|
||||
sudo lsof -i :3002
|
||||
sudo lsof -i :8007
|
||||
sudo lsof -i :8010 # CosyVoice
|
||||
sudo lsof -i :8011 # MuseTalk
|
||||
```
|
||||
|
||||
### View logs
|
||||
@@ -379,6 +594,8 @@ sudo lsof -i :8007
|
||||
pm2 logs vigent2-backend
|
||||
pm2 logs vigent2-frontend
|
||||
pm2 logs vigent2-latentsync
|
||||
pm2 logs vigent2-cosyvoice
|
||||
pm2 logs vigent2-musetalk
|
||||
```
|
||||
|
||||
### SSH lag / slow system response
|
||||
@@ -405,9 +622,11 @@ pm2 logs vigent2-latentsync
|
||||
| `fastapi` | Web API framework |
| `uvicorn` | ASGI server |
| `edge-tts` | Microsoft TTS voice-over |
| `httpx` | HTTP client for the GLM API |
| `playwright` | Automated social-media publishing |
| `biliup` | Bilibili video upload |
| `loguru` | Logging |
| `python-alipay-sdk` | Alipay payment integration |
|
||||
|
||||
### Key frontend dependencies
|
||||
|
||||
@@ -416,6 +635,7 @@ pm2 logs vigent2-latentsync
|
||||
| `next` | React framework |
| `swr` | Data fetching and caching |
| `tailwindcss` | CSS styling |
| `wavesurfer.js` | Audio waveform (timeline editor) |
|
||||
|
||||
### Key LatentSync dependencies
|
||||
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
### Background
Handle authentication failures for API requests in one place, so that individual pages do not each repeat the 401/403 handling.

-### Implementation (`frontend/src/lib/axios.ts`)
+### Implementation (`frontend/src/shared/api/axios.ts`)
|
||||
|
||||
```typescript
|
||||
import axios from 'axios';
|
||||
@@ -325,7 +325,7 @@ models/Qwen3-TTS/
|
||||
|
||||
| File | Change | Description |
|------|----------|------|
-| `frontend/src/lib/axios.ts` | Modified | Global Axios interceptor (automatic redirect on 401/403) |
+| `frontend/src/shared/api/axios.ts` | Modified | Global Axios interceptor (automatic redirect on 401/403) |
| `frontend/src/app/layout.tsx` | Modified | viewport configuration + body gradient background |
| `frontend/src/app/globals.css` | Modified | Safe-area CSS support |
| `frontend/src/app/page.tsx` | Modified | Removed the standalone gradient + responsive header |
|
||||
@@ -342,6 +342,6 @@ models/Qwen3-TTS/
|
||||
|
||||
## 🔗 Related documents

-- [task_complete.md](../task_complete.md) - Task overview
+- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day11.md](./Day11.md) - Upload architecture refactor
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide
|
||||
|
||||
431
Docs/DevLogs/Day13.md
Normal file
@@ -0,0 +1,431 @@
|
||||
# Day 13 - Voice-cloning integration + subtitles

**Date**: 2026-01-29

---

## 🎙️ Qwen3-TTS service integration

### Background
With the Qwen3-TTS model deployed on Day 12, today's focus is integrating it into ViGent2 to provide end-to-end voice cloning.

### Architecture
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────────┐
│                     Frontend (Next.js)                      │
│ Reference audio upload → TTS mode selection → video request │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                   Backend (FastAPI :8006)                   │
│    ref-audios API → voice_clone_service → video_service     │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│              Qwen3-TTS service (FastAPI :8009)              │
│            HTTP /generate → returns cloned audio            │
└─────────────────────────────────────────────────────────────┘
```
|
||||
|
||||
### Qwen3-TTS HTTP service (`qwen_tts_server.py`)

A standalone FastAPI service listening on port 8009:

```python
from fastapi import FastAPI, UploadFile, Form, HTTPException
from fastapi.responses import Response
import torch
import soundfile as sf
from qwen_tts import Qwen3TTSModel
import io, os

app = FastAPI(title="Qwen3-TTS Voice Clone Service")

# GPU configuration
GPU_ID = os.getenv("QWEN_TTS_GPU_ID", "0")
model = None

@app.on_event("startup")
async def load_model():
    global model
    model = Qwen3TTSModel.from_pretrained(
        "./checkpoints/0.6B-Base",
        device_map=f"cuda:{GPU_ID}",
        dtype=torch.bfloat16,
    )

@app.get("/health")
async def health():
    return {"service": "Qwen3-TTS", "ready": model is not None, "gpu_id": GPU_ID}

@app.post("/generate")
async def generate(
    ref_audio: UploadFile,
    text: str = Form(...),
    ref_text: str = Form(""),
    language: str = Form("Chinese"),
):
    # Save the reference audio to a temporary file
    ref_path = f"/tmp/ref_{ref_audio.filename}"
    with open(ref_path, "wb") as f:
        f.write(await ref_audio.read())

    # Generate the cloned audio
    wavs, sr = model.generate_voice_clone(
        text=text,
        language=language,
        ref_audio=ref_path,
        ref_text=ref_text or "一段参考音频。",
    )

    # Return the result as WAV audio
    buffer = io.BytesIO()
    sf.write(buffer, wavs[0], sr, format="WAV")
    buffer.seek(0)
    return Response(content=buffer.read(), media_type="audio/wav")
```
|
||||
|
||||
### Backend voice-cloning service (`voice_clone_service.py`)

Calls the Qwen3-TTS service over HTTP:

```python
import aiohttp
from loguru import logger

QWEN_TTS_URL = "http://localhost:8009"

async def generate_cloned_audio(
    ref_audio_path: str,
    text: str,
    output_path: str,
    ref_text: str = "",
) -> str:
    """Call the Qwen3-TTS service to generate cloned audio"""

    async with aiohttp.ClientSession() as session:
        # Keep the file open until the request has been sent
        with open(ref_audio_path, "rb") as f:
            data = aiohttp.FormData()
            data.add_field("ref_audio", f, filename="ref.wav")
            data.add_field("text", text)
            data.add_field("ref_text", ref_text)

            async with session.post(f"{QWEN_TTS_URL}/generate", data=data) as resp:
                if resp.status != 200:
                    raise Exception(f"Qwen3-TTS error: {resp.status}")

                audio_data = await resp.read()
                with open(output_path, "wb") as out:
                    out.write(audio_data)

    return output_path
```
|
||||
|
||||
---
|
||||
|
||||
## 📂 Reference-audio management API

### New API endpoints (`ref_audios.py`)

| Endpoint | Method | Purpose |
|------|------|------|
| `/api/ref-audios` | GET | List reference audio files |
| `/api/ref-audios` | POST | Upload a reference audio file |
| `/api/ref-audios/{id}` | DELETE | Delete a reference audio file |

### Supabase bucket configuration

Create a dedicated storage bucket for reference audio:

```sql
-- Create the ref-audios bucket
INSERT INTO storage.buckets (id, name, public)
VALUES ('ref-audios', 'ref-audios', true)
ON CONFLICT (id) DO NOTHING;

-- RLS policies
CREATE POLICY "Allow public uploads" ON storage.objects
FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');

CREATE POLICY "Allow public read" ON storage.objects
FOR SELECT TO anon USING (bucket_id = 'ref-audios');

CREATE POLICY "Allow public delete" ON storage.objects
FOR DELETE TO anon USING (bucket_id = 'ref-audios');
```
|
||||
|
||||
---
|
||||
|
||||
## 🎨 Frontend voice-cloning UI

### TTS mode selection

The video generation page gains a voice-cloning option:

```tsx
{/* TTS mode selection */}
<div className="flex gap-2 mb-4">
  <button
    onClick={() => setTtsMode("edge")}
    className={`px-4 py-2 rounded-lg ${ttsMode === "edge" ? "bg-purple-600" : "bg-white/10"}`}
  >
    🔊 EdgeTTS
  </button>
  <button
    onClick={() => setTtsMode("clone")}
    className={`px-4 py-2 rounded-lg ${ttsMode === "clone" ? "bg-purple-600" : "bg-white/10"}`}
  >
    🎙️ 声音克隆
  </button>
</div>
```

### Reference-audio management

New upload and list views for reference audio:

| Feature | Implementation |
|------|------|
| Audio upload | Drag-and-drop WAV/MP3, uploaded directly to Supabase |
| List view | Shows file name, duration, and upload time |
| Quick select | Click an item to use it as the reference audio |
| Delete | Remove reference audio that is no longer needed |
|
||||
|
||||
---
|
||||
|
||||
## ✅ End-to-end test

### Test flow
1. **Upload reference audio**: 3-second reference clip → Supabase ref-audios bucket
2. **Select voice-cloning mode**: switch the TTS mode to "声音克隆" (voice cloning)
3. **Enter the script**: a test voice-over script
4. **Generate the video**:
   - The TTS stage calls Qwen3-TTS (17.7s)
   - The LipSync stage calls LatentSync (122.8s)
5. **Playback check**: the voice in the video matches the reference timbre

### Test results
- ✅ Reference audio uploaded successfully
- ✅ Qwen3-TTS generated the cloned audio (15s inference, 4.6s of audio)
- ✅ LatentSync lip sync works
- ✅ Total generation time 143.1s
- ✅ Video plays correctly in the frontend
|
||||
|
||||
---
|
||||
|
||||
## 🔧 PM2 service configuration

### New Qwen3-TTS service

**Install the prerequisites**:
```bash
conda activate qwen-tts
pip install fastapi uvicorn python-multipart
```

Startup script `run_qwen_tts.sh` (in the project **root**):
```bash
#!/bin/bash
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin/python qwen_tts_server.py
```

PM2 management commands:
```bash
# Start from the project root
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
pm2 save

# Check status
pm2 status

# View logs
pm2 logs vigent2-qwen-tts --lines 50
```

### Full service list

| Service | Port | Purpose |
|--------|------|------|
| vigent2-backend | 8006 | FastAPI backend |
| vigent2-frontend | 3002 | Next.js frontend |
| vigent2-latentsync | 8007 | LatentSync lip sync |
| vigent2-qwen-tts | 8009 | Qwen3-TTS voice cloning |
|
||||
|
||||
---
|
||||
|
||||
## 📁 Files changed today

| File | Change | Description |
|------|----------|------|
| `models/Qwen3-TTS/qwen_tts_server.py` | Added | Qwen3-TTS HTTP inference service |
| `run_qwen_tts.sh` | Added | PM2 startup script (project root) |
| `backend/app/services/voice_clone_service.py` | Added | Voice-cloning service (HTTP client) |
| `backend/app/api/ref_audios.py` | Added | Reference-audio management API |
| `backend/app/main.py` | Modified | Registers the ref-audios router |
| `frontend/src/app/page.tsx` | Modified | TTS mode selection + reference-audio UI |

---

## 🔗 Related documents

- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day12.md](./Day12.md) - iOS compatibility and Qwen3-TTS deployment
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide
- [SUBTITLE_DEPLOY.md](../SUBTITLE_DEPLOY.md) - Subtitle deployment guide
- [DEPLOY_MANUAL.md](../DEPLOY_MANUAL.md) - Full deployment manual
|
||||
|
||||
---
|
||||
|
||||
## 🎬 Per-character highlighted subtitles + intro title

### Background

To improve video quality, this adds per-character highlighted subtitles (a karaoke effect) and an intro title.

### Technical approach

| Component | Technology | Notes |
|------|------|------|
| Subtitle alignment | **faster-whisper** | Produces character-level timestamps |
| Video rendering | **Remotion** | React-based video composition framework |

### Architecture

```
Original pipeline:
text → EdgeTTS → audio → LatentSync → FFmpeg compositing → final video

New pipeline:
text → EdgeTTS → audio ─┬→ LatentSync → lip-synced video ──┐
                        └→ faster-whisper → subtitle JSON ─┴→ Remotion compositing → final video
```
|
||||
|
||||
### New backend services

#### 1. Subtitle service (`whisper_service.py`)

Generates character-level timestamps with faster-whisper:

```python
from faster_whisper import WhisperModel

class WhisperService:
    def __init__(self, model_size="large-v3", device="cuda"):
        self.model = WhisperModel(model_size, device=device)

    async def align(self, audio_path: str, text: str, output_path: str):
        segments, info = self.model.transcribe(audio_path, word_timestamps=True)
        # Split each word into single characters; timestamps are linearly interpolated
        result = {"segments": [...]}
        # Save to JSON
```

**Character-splitting algorithm**: for Chinese, faster-whisper returns word-level timestamps, so the system splits each word into single characters and interpolates the timestamps linearly:

```python
# Input: {"word": "大家好", "start": 0.0, "end": 0.9}
# Output:
[
    {"word": "大", "start": 0.0, "end": 0.3},
    {"word": "家", "start": 0.3, "end": 0.6},
    {"word": "好", "start": 0.6, "end": 0.9}
]
```
|
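The splitting step can be sketched as a standalone function. This is an illustrative reconstruction under stated assumptions, not the code in `whisper_service.py`: the function name, the whitespace stripping, and the rounding to milliseconds are all choices made for this example.

```python
from typing import Dict, List


def split_word_to_chars(word: Dict) -> List[Dict]:
    """Split one word-level entry into per-character entries.

    Each character receives an equal share of the word's duration
    (linear interpolation between the word's start and end).
    """
    chars = list(word["word"].strip())  # assumption: drop surrounding whitespace
    if not chars:
        return []
    start, end = word["start"], word["end"]
    step = (end - start) / len(chars)
    return [
        {
            "word": ch,
            "start": round(start + i * step, 3),
            "end": round(start + (i + 1) * step, 3),
        }
        for i, ch in enumerate(chars)
    ]
```

Equal shares are a simplification: characters are not spoken at a uniform rate, but within a short word the error is usually small enough for highlight timing.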
||||
|
||||
#### 2. Remotion rendering service (`remotion_service.py`)

Invokes Remotion to render the subtitles and title:

```python
class RemotionService:
    async def render(self, video_path, output_path, captions_path, title, ...):
        cmd = f"npx ts-node render.ts --video {video_path} --output {output_path} ..."
        # Run the render
```
|
||||
|
||||
### Remotion project layout

```
remotion/
├── package.json              # Node.js dependencies
├── render.ts                 # Server-side render script
└── src/
    ├── Video.tsx             # Main video component
    ├── components/
    │   ├── Title.tsx         # Intro title (fade in/out)
    │   ├── Subtitles.tsx     # Per-character highlighted subtitles
    │   └── VideoLayer.tsx    # Video layer
    └── utils/
        └── captions.ts       # Subtitle data types
```
|
||||
|
||||
### Frontend UI

New settings section for the title and subtitles:

| Feature | Description |
|------|------|
| Intro title input | Optional; shown for 3 seconds at the start of the video |
| Subtitle toggle | On by default; can be turned off |

### Problems encountered and fixes

#### Problem 1: `fs` module error

**Symptom**: Remotion bundling fails with `fs.js doesn't exist`

**Cause**: `captions.ts` contained a `loadCaptions` function that used the Node.js `fs` module

**Fix**: removed the unused `loadCaptions` function

#### Problem 2: video file cannot be read

**Symptom**: the `file://` protocol cannot read the local video

**Fix**:
1. `render.ts` points `publicDir` at the video's directory
2. `VideoLayer.tsx` loads the video with `staticFile()`

```typescript
// render.ts
const publicDir = path.dirname(path.resolve(options.videoPath));
const bundleLocation = await bundle({
  entryPoint: path.resolve(__dirname, './src/index.ts'),
  publicDir, // the key setting
});

// VideoLayer.tsx
const videoUrl = staticFile(videoSrc);
```

### Test results

- ✅ faster-whisper subtitle alignment succeeds (~1 s)
- ✅ Remotion rendering succeeds (~10 s)
- ✅ Per-character highlighting works
- ✅ Intro title fades in and out correctly
- ✅ Fallback works (falls back to FFmpeg when Remotion fails)
|
||||
|
||||
---
|
||||
|
||||
## 📁 Files changed today (complete)

| File | Change | Description |
|------|----------|------|
| `models/Qwen3-TTS/qwen_tts_server.py` | Added | Qwen3-TTS HTTP inference service |
| `run_qwen_tts.sh` | Added | PM2 startup script (project root) |
| `backend/app/services/voice_clone_service.py` | Added | Voice-cloning service (HTTP client) |
| `backend/app/services/whisper_service.py` | Added | Subtitle alignment service (faster-whisper) |
| `backend/app/services/remotion_service.py` | Added | Remotion rendering service |
| `backend/app/api/ref_audios.py` | Added | Reference-audio management API |
| `backend/app/api/videos.py` | Modified | Integrates subtitles and title |
| `backend/app/main.py` | Modified | Registers the ref-audios router |
| `backend/requirements.txt` | Modified | Adds the faster-whisper dependency |
| `remotion/` | Added | Remotion video rendering project |
| `frontend/src/app/page.tsx` | Modified | TTS mode selection + title/subtitle UI |
| `Docs/SUBTITLE_DEPLOY.md` | Added | Subtitle deployment doc |
|
||||
402
Docs/DevLogs/Day14.md
Normal file
@@ -0,0 +1,402 @@
|
||||
# Day 14 - Model upgrade + title/tag generation + frontend fixes

**Date**: 2026-01-30

---

## 🚀 Qwen3-TTS model upgrade (0.6B → 1.7B)

### Background

To improve voice-cloning quality, the Qwen3-TTS model is upgraded from 0.6B-Base to 1.7B-Base.

### Changes

| Item | Before | After |
|------|--------|--------|
| Model | 0.6B-Base | **1.7B-Base** |
| Size | 2.4GB | 6.8GB |
| Quality | Baseline | Higher quality |

### Code change

**File**: `models/Qwen3-TTS/qwen_tts_server.py`

```python
# Before
MODEL_PATH = Path(__file__).parent / "checkpoints" / "0.6B-Base"

# After
MODEL_PATH = Path(__file__).parent / "checkpoints" / "1.7B-Base"
```

### Model download

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS

# Download the 1.7B-Base model (6.8GB)
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./checkpoints/1.7B-Base
```

### Results

- ✅ Model loads correctly (GPU0, bfloat16)
- ✅ Voice-cloning quality improved
- ✅ Inference speed is acceptable
|
||||
|
||||
---
|
||||
|
||||
## 🎨 Title and subtitle display improvements

### Subtitle component (`Subtitles.tsx`)

**File**: `remotion/src/components/Subtitles.tsx`

Changes:
- Adjusted the highlight color settings
- Improved the text outline (multi-layer shadow)
- Adjusted letter spacing and line height

```typescript
export const Subtitles: React.FC<SubtitlesProps> = ({
  captions,
  highlightColor = '#FFFF00', // highlight color
  normalColor = '#FFFFFF',    // normal text color
  fontSize = 52,
}) => {
  // Style tweaks
  const style = {
    textShadow: `
      2px 2px 4px rgba(0,0,0,0.8),
      -2px -2px 4px rgba(0,0,0,0.8),
      ...
    `,
    letterSpacing: '2px',
    lineHeight: 1.4,
    maxWidth: '90%',
  };
};
```

### Title component (`Title.tsx`)

**File**: `remotion/src/components/Title.tsx`

Changes:
- Fade-in/fade-out animation
- Slide-down entrance animation
- Configurable display duration

```typescript
interface TitleProps {
  title: string;
  duration?: number;     // how long the title is shown (seconds, default 3)
  fadeOutStart?: number; // when the fade-out starts (seconds, default 2)
}

// Animation
// Fade in: 0-0.5 s
// Fade out: 2-3 s
// Slide down: 0-0.5 s, -20px → 0px
```
|
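The timing in the comments above can be written out as a small function. This Python sketch only illustrates the arithmetic: linear easing is an assumption (the source does not say which easing `Title.tsx` uses), and the function and parameter names are made up for the example.

```python
from typing import Tuple


def clamp01(x: float) -> float:
    """Clamp a value to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def title_style(
    t: float,
    duration: float = 3.0,        # total display time in seconds
    fade_out_start: float = 2.0,  # when the fade-out begins
    fade_in: float = 0.5,         # length of fade-in and slide
    slide_px: float = 20.0,       # initial upward offset in pixels
) -> Tuple[float, float]:
    """Return (opacity, translate_y_px) for the title at time t seconds."""
    fade_in_progress = clamp01(t / fade_in)
    fade_out_progress = clamp01((t - fade_out_start) / (duration - fade_out_start))
    # Opacity rises during fade-in and falls during fade-out.
    opacity = min(fade_in_progress, 1.0 - fade_out_progress)
    # The title starts slide_px above its resting position and slides down.
    translate_y = -slide_px * (1.0 - fade_in_progress)
    return opacity, translate_y
```

With the defaults, the title is fully opaque and at rest between 0.5 s and 2 s, which matches the three phases listed in the comments.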
||||
|
||||
### Results

- ✅ Subtitles are easier to read
- ✅ Title animation is smoother
|
||||
|
||||
---
|
||||
|
||||
## 🤖 Automatic title and tag generation

### Feature description

Uses AI (Zhipu GLM-4-Flash) to generate a video title and tags from the voice-over script.

### Backend implementation

#### 1. GLM service (`glm_service.py`)

**File**: `backend/app/services/glm_service.py`

```python
class GLMService:
    """Zhipu GLM AI service"""

    async def generate_meta(self, text: str) -> dict:
        """Generate a title and tags from the script"""

        # The prompt (kept in Chinese) asks for a catchy title of at most
        # 10 characters and exactly 3 relevant tags, returned as JSON.
        prompt = """根据以下口播文案,生成一个吸引人的短视频标题和3个相关标签。

要求:
1. 标题要简洁有力,能吸引观众点击,不超过10个字
2. 标签要与内容相关,便于搜索和推荐,只要3个

返回格式:{"title": "标题", "tags": ["标签1", "标签2", "标签3"]}
"""
        # Call the GLM-4-Flash API
        response = await self._call_api(prompt + text)
        return self._parse_json(response)
```

**Tolerant JSON parsing**:
- Direct JSON parsing
- Extraction of an embedded JSON object
- Extraction from a fenced `json` code block
|
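The three fallbacks can be sketched as follows. This is a hypothetical stand-in for `_parse_json`, whose real implementation is not shown in the source; the order of the attempts and the regular expression are assumptions. The fence marker is built with string multiplication only to keep literal backtick runs out of this snippet.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_json_lenient(text: str) -> Optional[Any]:
    """Try progressively looser strategies to get JSON out of a model reply."""
    # 1. The reply is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. The JSON sits inside a fenced code block.
    match = _FENCE_RE.search(text)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass

    # 3. Take everything from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass

    return None  # nothing parseable; the caller decides how to report it
```

Returning `None` instead of raising lets the API layer choose between retrying the model call and returning an error to the frontend.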
||||
|
||||
#### 2. API endpoint (`ai.py`)

**File**: `backend/app/api/ai.py`

```python
from pydantic import BaseModel

class GenerateMetaRequest(BaseModel):
    text: str  # voice-over script

class GenerateMetaResponse(BaseModel):
    title: str       # generated title
    tags: list[str]  # generated tags

@router.post("/generate-meta", response_model=GenerateMetaResponse)
async def generate_meta(request: GenerateMetaRequest):
    """Generate a title and tags with AI"""
    result = await glm_service.generate_meta(request.text)
    return result
```
|
||||
|
||||
### Frontend implementation

**File**: `frontend/src/app/page.tsx`

#### UI button

```tsx
<button
  onClick={handleGenerateMeta}
  disabled={isGeneratingMeta || !text.trim()}
  className="px-2 py-1 text-xs rounded transition-all whitespace-nowrap"
>
  {isGeneratingMeta ? "⏳ 生成中..." : "🤖 AI生成标题标签"}
</button>
```

#### Handler logic

```typescript
const handleGenerateMeta = async () => {
  if (!text.trim()) {
    alert("请先输入口播文案"); // "Please enter the script first"
    return;
  }

  setIsGeneratingMeta(true);
  try {
    const { data } = await api.post('/api/ai/generate-meta', { text: text.trim() });

    // Update the title on the home page
    setVideoTitle(data.title || "");

    // Sync to the publish page via localStorage
    localStorage.setItem(`vigent_${storageKey}_publish_title`, data.title || "");
    localStorage.setItem(`vigent_${storageKey}_publish_tags`, JSON.stringify(data.tags || []));
  } catch (err: any) {
    alert(`AI 生成失败: ${err.message}`); // "AI generation failed"
  } finally {
    setIsGeneratingMeta(false);
  }
};
```

### Publish page integration

**File**: `frontend/src/app/publish/page.tsx`

Restores the AI-generated title and tags from localStorage:

```typescript
// Restore title and tags
const savedTitle = localStorage.getItem(`vigent_${storageKey}_publish_title`);
const savedTags = localStorage.getItem(`vigent_${storageKey}_publish_tags`);

if (savedTags) {
  try {
    const parsed = JSON.parse(savedTags);
    if (Array.isArray(parsed)) {
      setTags(parsed.join(', ')); // array → comma-separated string
    } else {
      setTags(savedTags);
    }
  } catch {
    setTags(savedTags);
  }
}
```

### Results

- ✅ AI title and tag generation works
- ✅ Data syncs to the publish page automatically
- ✅ Both JSON-array and plain-string tag formats are handled
|
||||
|
||||
---
|
||||
|
||||
## 🐛 Fix: front-end text not persisted

### Problem

**Symptom**: after a page refresh, the script, title and other user input were lost.

**Cause**:
1. When restoring the auth state fails, `userId` is `null`
2. The original code checked `!userId` and then overwrote the localStorage data with defaults
3. As a result, the saved user data was wiped

### Solution

**File**: `frontend/src/app/page.tsx`

#### 1. Add a restore-complete flag

```typescript
const [isRestored, setIsRestored] = useState(false);
```

#### 2. Restore data only after auth has finished

```typescript
useEffect(() => {
  if (isAuthLoading) return; // wait for auth to finish

  // Use userId, or 'guest', as the key
  const key = userId || 'guest';

  // Restore data from localStorage
  const savedText = localStorage.getItem(`vigent_${key}_text`);
  if (savedText) setText(savedText);

  // ... restore the other fields

  setIsRestored(true); // mark restore as complete
}, [userId, isAuthLoading]);
```

#### 3. Save only after restore has completed

```typescript
useEffect(() => {
  if (isRestored) {
    localStorage.setItem(`vigent_${storageKey}_text`, text);
  }
}, [text, storageKey, isRestored]);
```

### User isolation

```typescript
const storageKey = userId || 'guest';
```

| User state | storageKey | Notes |
|----------|------------|------|
| Logged in | `user_xxx` | Data is isolated per user |
| Not logged in / auth failed | `guest` | Shared key |

### Restore flow

```
1. Page loads
   ↓
2. Check isAuthLoading
   ├─ true: wait for auth to finish
   └─ false: continue
   ↓
3. Determine storageKey (userId || 'guest')
   ↓
4. Read data from localStorage
   ├─ saved data found: restore it into state
   └─ no saved data: use defaults
   ↓
5. Set isRestored = true
   ↓
6. Save to localStorage on later state changes
```

### Persisted items

| Key | Description |
|-----|------|
| `vigent_${key}_text` | Voice-over script |
| `vigent_${key}_title` | Video title |
| `vigent_${key}_subtitles` | Subtitle toggle |
| `vigent_${key}_ttsMode` | TTS mode |
| `vigent_${key}_voice` | Selected voice |
| `vigent_${key}_material` | Selected material |
| `vigent_${key}_publish_title` | Publish title |
| `vigent_${key}_publish_tags` | Publish tags |

### Result

- ✅ Data is restored correctly after a page refresh
- ✅ A failed auth no longer overwrites saved data
- ✅ Per-user data isolation works
|
||||
---
|
||||
|
||||
## 🐛 Fix: login page refresh loop

### Problem

**Symptom**: when not logged in, the login page kept refreshing and never stayed on the form.

**Cause**:
1. `AuthProvider` calls `/api/auth/me` on initialization
2. A logged-out user gets a 401
3. The global `axios` interceptor redirects to `/login` on 401/403
4. The login page itself is inside the Provider, so the redirect loops

### Solution

**File**: `frontend/src/shared/api/axios.ts`

The interceptor skips the redirect on public routes and only sends protected pages to the login page:

```typescript
const PUBLIC_PATHS = new Set(['/login', '/register']);
const isPublicPath = typeof window !== 'undefined' && PUBLIC_PATHS.has(window.location.pathname);

if ((status === 401 || status === 403) && !isRedirecting && !isPublicPath) {
  // ... existing redirect logic unchanged
}
```

### Result

- ✅ The login page no longer refreshes and the form accepts input
- ✅ Protected pages still redirect to the login page on 401/403

---

## 📁 Files changed today

| File | Change | Description |
|------|----------|------|
| `models/Qwen3-TTS/qwen_tts_server.py` | Modified | Model path upgraded to 1.7B-Base |
| `Docs/QWEN3_TTS_DEPLOY.md` | Modified | Deployment guide updated for the 1.7B version |
| `remotion/src/components/Subtitles.tsx` | Modified | Improved subtitle rendering |
| `remotion/src/components/Title.tsx` | Modified | Improved title animation |
| `backend/app/services/glm_service.py` | Added | GLM AI service |
| `backend/app/api/ai.py` | Added | AI title/tag generation API |
| `backend/app/main.py` | Modified | Registered the ai router |
| `frontend/src/app/page.tsx` | Modified | AI generate button + localStorage fix |
| `frontend/src/app/publish/page.tsx` | Modified | Restores AI-generated tags |
| `frontend/src/shared/api/axios.ts` | Modified | Public routes skip the 401/403 login redirect |

---

## 🔗 Related documents

- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day13.md](./Day13.md) - Voice cloning integration + subtitles
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS 1.7B deployment guide
410
Docs/DevLogs/Day15.md
Normal file
@@ -0,0 +1,410 @@
|
||||
# Day 15 - Phone-number login migration + account settings

**Date**: 2026-02-02

---

## 🔐 Auth migration: email → phone number

### Background

Per business requirements, user authentication moves from email login to phone-number login (11-digit Chinese mobile numbers).

### Scope

| Component | Change |
|------|----------|
| Database schema | `email` column replaced by `phone` |
| Backend API | Register / login / current-user endpoints use `phone` |
| Front-end pages | Login / register pages use a phone-number input |
| Admin configuration | `ADMIN_EMAIL` becomes `ADMIN_PHONE` |
|
||||
---
|
||||
|
||||
## 📦 Backend changes

### 1. Database schema (`schema.sql`)

**File**: `backend/database/schema.sql`

```sql
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    phone TEXT UNIQUE NOT NULL,  -- previously email
    password_hash TEXT NOT NULL,
    username TEXT,
    role TEXT DEFAULT 'pending' CHECK (role IN ('pending', 'user', 'admin')),
    is_active BOOLEAN DEFAULT FALSE,
    expires_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_users_phone ON users(phone);
```

### 2. Auth API (`auth.py`)

**File**: `backend/app/api/auth.py`

#### Request model update

```python
import re
from typing import Optional

from pydantic import BaseModel, field_validator


class RegisterRequest(BaseModel):
    phone: str
    password: str
    username: Optional[str] = None

    @field_validator('phone')
    @classmethod
    def validate_phone(cls, v):
        # Accept exactly 11 digits
        if not re.match(r'^\d{11}$', v):
            raise ValueError('手机号必须是11位数字')  # "phone number must be 11 digits"
        return v
```

#### New change-password endpoint

```python
class ChangePasswordRequest(BaseModel):
    old_password: str
    new_password: str

    @field_validator('new_password')
    @classmethod
    def validate_new_password(cls, v):
        if len(v) < 6:
            raise ValueError('新密码长度至少6位')  # "new password must be at least 6 characters"
        return v


@router.post("/change-password")
async def change_password(request: ChangePasswordRequest, req: Request, response: Response):
    """Change the password after verifying the current one."""
    # 1. Verify the current password
    # 2. Update the password hash
    # 3. Regenerate the session token
    # 4. Return a new JWT cookie
```

### 3. Configuration update

**File**: `backend/app/core/config.py`

```python
# Admin configuration
ADMIN_PHONE: str = ""  # previously ADMIN_EMAIL
ADMIN_PASSWORD: str = ""
```
|
||||
**文件**: `backend/.env`
|
||||
|
||||
```bash
|
||||
ADMIN_PHONE=15549380526
|
||||
ADMIN_PASSWORD=lam1988324
|
||||
```
|
||||
|
||||
### 4. Admin initialization (`main.py`)

**File**: `backend/app/main.py`

```python
@app.on_event("startup")
async def init_admin():
    admin_phone = settings.ADMIN_PHONE  # previously ADMIN_EMAIL
    # ... create the admin user using the phone field
```

### 5. Admin API (`admin.py`)

**File**: `backend/app/api/admin.py`

```python
class UserListItem(BaseModel):
    id: str
    phone: str  # previously email
    username: Optional[str]
    role: str
    is_active: bool
    expires_at: Optional[str]
    created_at: str
```
|
||||
---
|
||||
|
||||
## 🖥️ Front-end changes

### 1. Login page (`login/page.tsx`)

**File**: `frontend/src/app/login/page.tsx`

```tsx
const [phone, setPhone] = useState('');

// Validate the phone number format
if (!/^\d{11}$/.test(phone)) {
  setError('请输入正确的11位手机号'); // "Please enter a valid 11-digit phone number"
  return;
}

<input
  type="tel"
  value={phone}
  onChange={(e) => setPhone(e.target.value.replace(/\D/g, '').slice(0, 11))}
  maxLength={11}
  placeholder="请输入11位手机号"
/>
```

### 2. Register page (`register/page.tsx`)

Also uses a phone-number input, with the same 11-digit validation.

### 3. Auth helpers (`auth.ts`)

**File**: `frontend/src/shared/lib/auth.ts`

```typescript
export interface User {
  id: string;
  phone: string; // previously email
  username: string | null;
  role: string;
  is_active: boolean;
}

export async function login(phone: string, password: string): Promise<AuthResponse> { ... }
export async function register(phone: string, password: string, username?: string): Promise<AuthResponse> { ... }
export async function changePassword(oldPassword: string, newPassword: string): Promise<AuthResponse> { ... }
```

### 4. Account settings dropdown on the home page (`page.tsx`)

**File**: `frontend/src/app/page.tsx`

The former "Log out" button becomes an account settings dropdown:

```tsx
function AccountSettingsDropdown() {
  const [isOpen, setIsOpen] = useState(false);
  const [showPasswordModal, setShowPasswordModal] = useState(false);
  // ...

  return (
    <div className="relative">
      <button onClick={() => setIsOpen(!isOpen)}>
        ⚙️ 账户
      </button>

      {/* Dropdown menu */}
      {isOpen && (
        <div className="absolute right-0 mt-2 w-40 bg-gray-800 ...">
          <button onClick={() => setShowPasswordModal(true)}>
            🔐 修改密码
          </button>
          <button onClick={handleLogout} className="text-red-300">
            🚪 退出登录
          </button>
        </div>
      )}

      {/* Change-password modal */}
      {showPasswordModal && (
        <div className="fixed inset-0 z-50 ...">
          <form onSubmit={handleChangePassword}>
            <input placeholder="当前密码" />
            <input placeholder="新密码" />
            <input placeholder="确认新密码" />
          </form>
        </div>
      )}
    </div>
  );
}
```

### 5. Admin page (`admin/page.tsx`)

**File**: `frontend/src/app/admin/page.tsx`

```tsx
interface UserListItem {
  id: string;
  phone: string; // previously email
  // ...
}

// Show the phone number instead of the email
<div className="text-gray-400 text-sm">{user.phone}</div>
```
|
||||
---
|
||||
|
||||
## 🗄️ Database migration

### Migration script

**File**: `backend/database/migrate_to_phone.sql`

```sql
-- Drop the old tables (CASCADE handles foreign-key dependencies)
DROP TABLE IF EXISTS user_sessions CASCADE;
DROP TABLE IF EXISTS social_accounts CASCADE;
DROP TABLE IF EXISTS users CASCADE;

-- Recreate the tables using the phone column
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    phone TEXT UNIQUE NOT NULL,
    -- ...
);

-- Recreate dependent tables and indexes
CREATE TABLE user_sessions (...);
CREATE TABLE social_accounts (...);
CREATE INDEX idx_users_phone ON users(phone);
```

### How to run

```bash
# Option 1: Docker command
docker exec -i supabase-db psql -U postgres < backend/database/migrate_to_phone.sql

# Option 2: Supabase Studio SQL Editor
# Open https://supabase.hbyrkj.top -> SQL Editor -> paste and run
```

---

## ✅ Deployment steps

```bash
# 1. Run the database migration
docker exec -i supabase-db psql -U postgres < backend/database/migrate_to_phone.sql

# 2. Rebuild the front end
cd frontend && npm run build

# 3. Restart the services
pm2 restart vigent2-backend vigent2-frontend
```
|
||||
---
|
||||
|
||||
## 📁 今日修改文件清单
|
||||
|
||||
| 文件 | 变更类型 | 说明 |
|
||||
|------|----------|------|
|
||||
| `backend/database/schema.sql` | 修改 | email → phone |
|
||||
| `backend/database/migrate_to_phone.sql` | 新增 | 数据库迁移脚本 |
|
||||
| `backend/app/api/auth.py` | 修改 | 手机号验证 + 修改密码 API |
|
||||
| `backend/app/api/admin.py` | 修改 | UserListItem.email → phone |
|
||||
| `backend/app/core/config.py` | 修改 | ADMIN_EMAIL → ADMIN_PHONE |
|
||||
| `backend/app/main.py` | 修改 | 管理员初始化使用 phone |
|
||||
| `backend/.env` | 修改 | ADMIN_PHONE=15549380526 |
|
||||
| `frontend/src/app/login/page.tsx` | 修改 | 手机号登录 + 11位验证 |
|
||||
| `frontend/src/app/register/page.tsx` | 修改 | 手机号注册 + 11位验证 |
|
||||
| `frontend/src/shared/lib/auth.ts` | 修改 | phone 参数 + changePassword 函数 |
|
||||
| `frontend/src/app/page.tsx` | 修改 | AccountSettingsDropdown 组件 |
|
||||
| `frontend/src/app/admin/page.tsx` | 修改 | 用户列表显示手机号 |
|
||||
| `frontend/src/contexts/AuthContext.tsx` | 修改 | 存储完整用户信息含 expires_at |
|
||||
|
||||
---
|
||||
|
||||
## 🆕 Follow-up improvements (Day 15 afternoon)

### Account expiry display

The account dropdown shows the user's expiry date:

| Case | Format |
|----------|------|
| `expires_at` is set | `2026-03-15` |
| NULL | `永久有效` ("valid permanently") |

**Related changes**:
- `backend/app/api/auth.py`: UserResponse gains an `expires_at` field
- `frontend/src/contexts/AuthContext.tsx`: stores the full user object
- `frontend/src/app/page.tsx`: formats and displays the expiry date

### Click outside to close the dropdown

`useRef` + `useEffect` listen for global click events; clicking outside the menu closes it.

### Forced re-login after a password change

After the password is changed successfully:
1. Show "密码修改成功,正在跳转登录页..." ("Password changed, redirecting to the login page...")
2. Call the logout API after 1.5 seconds
3. Redirect to the login page

---

## 🔗 Related documents

- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day14.md](./Day14.md) - Model upgrade + AI titles and tags
- [AUTH_DEPLOY.md](../AUTH_DEPLOY.md) - Auth system deployment guide
|
||||
---
|
||||
|
||||
## 🤖 Model and feature enhancements (Day 15 evening)

### 1. GLM-4.7-Flash upgrade

**File**: `backend/app/services/glm_service.py`

The script-rewriting model was upgraded from `glm-4-flash` to `glm-4.7-flash`:

```python
response = client.chat.completions.create(
    model="glm-4.7-flash",  # Upgrade from glm-4-flash
    messages=[...],
    # ...
)
```

**Improvements**:
- Faster responses
- Rewritten scripts are more fluent and more logical

### 2. Standalone script extraction assistant

A standalone script extraction tool that pulls text from video/audio files or URLs.

#### Backend (`backend/app/api/tools.py`)

- **Multiple sources**: file upload (MP4/MP3/WAV) or URL download
- **Smart download**:
  - `yt-dlp`: general-purpose download (Douyin/TikTok/Bilibili)
  - `Playwright`: smart fallback (Bilibili Dashboard API, Douyin Cookie Bypass)
- **Automatic URL cleanup**: a regex extracts the HTTP link from share text
- **Pipeline**: download -> FFmpeg to WAV (16k) -> Whisper transcription -> GLM-4.7 rewriting
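
The "automatic URL cleanup" step can be sketched as below. This is a minimal illustration, not the code in `tools.py`: the function name `extract_first_url`, the regex and the trailing-punctuation list are all assumptions.

```python
import re
from typing import Optional

# Assumed pattern: an http(s) URL runs until whitespace or a CJK ideograph.
_URL_RE = re.compile(r"https?://[^\s\u4e00-\u9fff]+")


def extract_first_url(share_text: str) -> Optional[str]:
    """Return the first link found in pasted share text, or None."""
    match = _URL_RE.search(share_text)
    if match is None:
        return None
    # Share text often glues punctuation onto the link; trim common cases.
    return match.group(0).rstrip(".,;!?)，。！？）")
```

A caller would pass the pasted share text straight in and hand the result to the downloader only when it is not `None`.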
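#### Front end (`frontend/src/components/ScriptExtractionModal.tsx`)

- **Standalone modal**: opened from the top navigation bar
- **Features**:
  - Paste a link / drag in a file
  - Live progress display (download -> transcription -> rewriting)
  - **One-click fill**: puts the extracted result straight into the main input box
  - **Auto detection**: distinguishes the platform and the link automatically
- **Interaction polish**:
  - Clicking the backdrop does not close the modal by accident
  - Copy works in plain-HTTP environments (fallback textArea)

### 3. Uploaded video preview

Uploaded videos in the material list (`frontend/src/app/page.tsx`) can now be previewed:
- Clicking the thumbnail opens a video player modal
- Shortcuts for download and for jumping to publish

---

## 📝 Task checklist update

- [x] Auth migration (phone number)
- [x] Account management (password change / expiry)
- [x] GLM-4.7 model upgrade
- [x] Standalone script extraction assistant (Bilibili/Douyin support)
- [x] Video preview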
139
Docs/DevLogs/Day16.md
Normal file
@@ -0,0 +1,139 @@
|
||||
## 🔧 Qwen-TTS Flash Attention optimization (10:00)

### Background
By default the Qwen3-TTS 1.7B model loads slowly and uses a lot of GPU memory during inference. Adding Flash Attention 2 significantly improves load time and inference efficiency.

### Implementation
Install `flash-attn` in the `qwen-tts` Conda environment:

```bash
conda activate qwen-tts
pip install -U flash-attn --no-build-isolation
```

### Verification
- **Load time**: from ~60s down to **8.9s** ⚡
- **GPU memory**: significantly lower, removing the OOM risk
- **Code changes**: none; this is an environment-only optimization (detected automatically)

## 🛡️ Service watchdog (10:30)

### Problem
The resident services (`vigent2-qwen-tts` and `vigent2-latentsync`) can hang because of GPU memory fragmentation or long uptime (port open but unresponsive).

### Solution
A Python watchdog script polls each service's `/health` endpoint every 30 seconds and restarts the service automatically after 3 consecutive failures.

1. **Watchdog script**: `backend/scripts/watchdog.py`
2. **Launcher script**: `run_watchdog.sh` (PM2-based)

### Core logic
```python
# Three consecutive failed heartbeats trigger a restart
if service["failures"] >= service['threshold']:
    subprocess.run(["pm2", "restart", service["name"]])
```
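
The counting rule behind that check can be sketched as a pure function. This is an illustration under assumptions, not the contents of `watchdog.py`: `ServiceState` and `record_check` are hypothetical names, and resetting the counter after a restart is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Tracks consecutive failed health checks for one PM2 service."""
    name: str
    threshold: int = 3  # restart after this many consecutive failures
    failures: int = 0


def record_check(state: ServiceState, healthy: bool) -> bool:
    """Update the failure counter; return True when a restart is due."""
    if healthy:
        state.failures = 0  # any success breaks the streak
        return False
    state.failures += 1
    if state.failures >= state.threshold:
        state.failures = 0  # start counting afresh after the restart
        return True
    return False
```

The polling loop would call `record_check` once per 30-second probe and run `pm2 restart <name>` only when it returns `True`, which keeps the restart decision testable without touching PM2.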
### Deployment status
- `vigent2-watchdog` is running and registered in the PM2 process list
- Monitored services: `vigent2-qwen-tts` (8009), `vigent2-latentsync` (8007)

---

## ⚡ LatentSync performance check

A code audit confirmed that LatentSync 1.6 already ships with these optimizations:
- ✅ **Flash Attention**: uses `torch.nn.functional.scaled_dot_product_attention` natively
- ✅ **DeepCache**: enabled (`cache_interval=3`), giving a ~2.5x speedup
- ✅ **GPU concurrency**: the dual-GPU pipeline (GPU0 TTS | GPU1 LipSync) is confirmed working

---

## 🎨 Interaction and view improvements (14:20)

### Home page
- After a video finishes generating, the preview selects the newest output first
- Selections persist: material / background music / past works
- Lists scroll to the selected item internally, so the page does not jump
- Refresh returns to the top (home page)
- Title/subtitle style preview panel
- Previewing a background track selects it and turns BGM on; the volume slider affects the preview in real time

### Publish page
- Refresh returns to the top (publish page)

---

## 🎵 Background music pipeline fix (15:00)

### Fixes
- FFmpeg mixing now runs with `shell=False`, so the shell no longer misparses `filter_complex`
- `amix` normalization is disabled, so the voice-over volume is no longer pushed down
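
Both fixes can be sketched together: pass FFmpeg an argument list (no shell) and set `normalize=0` on `amix`. This is an audio-only illustration, not the code in `video_service.py`; the function name, the stream labels and the default BGM volume are assumptions, and `normalize` requires a reasonably recent FFmpeg build.

```python
from typing import List


def build_bgm_mix_command(voice_path: str, bgm_path: str, output_path: str,
                          bgm_volume: float = 0.2) -> List[str]:
    """Build an argv list that mixes a voice track with background music."""
    # normalize=0 stops amix from scaling every input down by the input count.
    filter_complex = (
        f"[1:a]volume={bgm_volume}[bgm];"
        "[0:a][bgm]amix=inputs=2:duration=first:normalize=0[aout]"
    )
    return [
        "ffmpeg", "-y",
        "-i", voice_path,
        "-i", bgm_path,
        "-filter_complex", filter_complex,  # one argv item, never shell-parsed
        "-map", "[aout]",
        output_path,
    ]
```

Running it with `subprocess.run(cmd, check=True)` keeps the default `shell=False`, so the brackets and semicolons in the filter graph reach FFmpeg untouched.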
### Key change
`backend/app/services/video_service.py`

---

## 🗣️ Subtitle line-break fix (15:20)

### Changes
- The subtitle splitting logic keeps English words whole, so mixed Chinese/English text is no longer cut mid-word
### Files involved
- `backend/app/services/whisper_service.py`

---

## 🧱 Asset library and style support (15:40)

### Changes
- Font library / BGM assets wired into local assets
- New style configuration files (subtitle/title)
- New assets API and static mount at `/assets`
- Remotion supports style parameters and font loading

### Files involved
- `backend/assets/fonts/`
- `backend/assets/bgm/`
- `backend/assets/styles/subtitle.json`
- `backend/assets/styles/title.json`
- `backend/app/services/assets_service.py`
- `backend/app/api/assets.py`
- `backend/app/main.py`
- `backend/app/api/videos.py`
- `backend/app/services/remotion_service.py`
- `remotion/src/components/Subtitles.tsx`
- `remotion/src/components/Title.tsx`
- `remotion/src/Video.tsx`
- `remotion/render.ts`
- `frontend/src/app/page.tsx`
- `frontend/next.config.ts`

---

## 🛠️ Operations changes (16:10)

### Changes
- The watchdog no longer monitors LatentSync, so long inference runs are not killed by mistake
- LatentSync's PM2 entry gains a memory restart threshold (runtime configuration)

---

## 🎯 Unified front-end button icons (16:40)

### Changes
- Button icons on the home and publish pages are replaced with Lucide SVGs
- Interactive buttons keep a consistent size and alignment

### Files involved
- `frontend/src/features/home/ui/`
- `frontend/src/app/publish/page.tsx`

---

## 📝 Documentation updates

- [x] `Docs/QWEN3_TTS_DEPLOY.md`: added the Flash Attention install guide
- [x] `Docs/DEPLOY_MANUAL.md`: added watchdog deployment notes
- [x] `Docs/TASK_COMPLETE.md`: progress updated to 100% (Day 16)
176
Docs/DevLogs/Day17.md
Normal file
@@ -0,0 +1,176 @@
|
||||
# Day 17 - Front-end refactor and UX improvements

## 🧩 Front-end UI split (09:10)

### Changes
- The home `page.tsx` is split into standalone UI components; state and logic still live in the page
- New home component directory `frontend/src/features/home/ui/`

### Components
- `HomeHeader`
- `MaterialSelector`
- `ScriptEditor`
- `TitleSubtitlePanel`
- `VoiceSelector`
- `RefAudioPanel`
- `BgmPanel`
- `GenerateActionBar`
- `PreviewPanel`
- `HistoryList`

---

## 🧰 Shared front-end utilities extracted (09:30)

### Changes
- Extracted shared helpers for the API base, resource URLs, date formatting and similar
- The home and publish pages call the same helpers, removing duplicated logic

### Files involved
- `frontend/src/shared/lib/media.ts`
- `frontend/src/app/page.tsx`
- `frontend/src/app/publish/page.tsx`

---

## 📝 Front-end guidelines updated (09:40)

### Changes
- Updated `FRONTEND_DEV.md` to match the current directory structure
- Added usage rules and examples for `media.ts`
- Added component-splitting rules and a page checklist

### Files involved
- `Docs/FRONTEND_DEV.md`

---

## 🎨 Interaction and view improvements (10:00)

### Title/subtitle preview
- The preview scales with the material's resolution, so font sizes are closer to the final video
- Title/subtitle style choices persist and no longer reset to defaults on refresh
- Default styles updated: title 90px ZCOOL KuaiLe (站酷快乐体); subtitle 60px classic yellow + DingTalkJinBuTi

### Publish page
- Work selection is now a card list with search and a preview modal

---

## ⚡ Minor performance optimizations (10:30)

### Changes
- List rendering uses `content-visibility` (materials / history / reference audio / publishable works); the BGM list keeps scroll-to-selected
- First-screen data requests run in parallel (`Promise.allSettled`)
- localStorage writes are debounced (script text / title / BGM volume / publish form)

---

## 🖼️ Preview modal enhancements (11:10)

### Changes
- The preview modal is now one reusable component that supports a title and a hint
- Publish-page preview and material preview share the modal styling
- Unified modal header (icon + title + close button)

### Files involved
- `frontend/src/components/VideoPreviewModal.tsx`
- `frontend/src/app/page.tsx`
- `frontend/src/app/publish/page.tsx`

---

## 🧭 Terminology unified (11:20)

### Changes
- “视频预览” → “作品预览” ("Video preview" → "Work preview")
- “历史视频” → “历史作品” ("Past videos" → "Past works")
- “选择要发布的视频” → “选择要发布的作品” ("Select a video to publish" → "Select a work to publish")
- “选择素材视频” → “视频素材” ("Select material video" → "Video material")
- “选择配音方式” → “配音方式” ("Select voice-over mode" → "Voice-over mode")

---

## 🧱 Phase 2 hook extraction (11:45)

### Changes
- `useTitleSubtitleStyles`: fetches title/subtitle styles and handles default selection
- `useMaterials`: material list / upload / delete logic
- `useRefAudios`: reference audio list / upload / delete logic
- `useBgm`: background music list and loading state
- `useMediaPlayers`: centralizes audio preview logic (reference audio / background music)
- `useGeneratedVideos`: fetches the past-works list and handles selection

### Files involved
- `frontend/src/features/home/model/useTitleSubtitleStyles.ts`
- `frontend/src/features/home/model/useMaterials.ts`
- `frontend/src/features/home/model/useRefAudios.ts`
- `frontend/src/features/home/model/useBgm.ts`
- `frontend/src/features/home/model/useMediaPlayers.ts`
- `frontend/src/features/home/model/useGeneratedVideos.ts`
- `frontend/src/app/page.tsx`

---

## 🧩 Home persistence fix (12:20)

### Changes
- Wired in `useHomePersistence` and completed the `isRestored` restore/save logic
- Fixed restoring selections after a home-page refresh; `npm run build` passes

### Files involved
- `frontend/src/app/page.tsx`
- `frontend/src/features/home/model/useHomePersistence.ts`

---

## 🧩 Publish preview and playback fixes (14:10)

### Changes
- Publish-page work preview handles both signed URLs and relative paths
- Reference audio preview always goes through `resolveMediaUrl`
- Material/BGM selection falls back to a valid item when the list changes
- Recording preview URLs are revoked, the preview modal restores scroll state, and the global task notice is mounted

### Files involved
- `frontend/src/app/publish/page.tsx`
- `frontend/src/features/home/model/useMediaPlayers.ts`
- `frontend/src/features/home/model/useBgm.ts`
- `frontend/src/features/home/model/useMaterials.ts`
- `frontend/src/features/home/ui/RefAudioPanel.tsx`
- `frontend/src/components/VideoPreviewModal.tsx`
- `frontend/src/app/layout.tsx`

---

## 🧩 Title sync and length limit (15:30)

### Changes
- Editing the opening title also writes the publish-info title
- Title input works with Chinese IMEs and is limited to 15 characters (the same rule applies to publish info)

### Files involved
- `frontend/src/features/home/model/useHomeController.ts`
- `frontend/src/features/home/ui/TitleSubtitlePanel.tsx`
- `frontend/src/features/publish/model/usePublishController.ts`

---

## 🧱 Lightweight FSD migration (16:20)

### Changes
- Slimmer pages: `app` keeps only entry components; business logic moves into controller hooks
- Introduced `features/*` layering: UI and model are separated, with Home/Publish grouped by feature
- Shared capabilities moved down into `shared/*` (lib/hooks/api)

### Files involved
- `frontend/src/features/home/ui/HomePage.tsx`
- `frontend/src/features/home/model/useHomeController.ts`
- `frontend/src/features/publish/ui/PublishPage.tsx`
- `frontend/src/features/publish/model/usePublishController.ts`
- `frontend/src/shared/lib/media.ts`
- `frontend/src/shared/lib/title.ts`
- `frontend/src/shared/api/axios.ts`
- `frontend/src/shared/hooks/useTitleInput.ts`
- `frontend/src/app/page.tsx`
- `frontend/src/app/publish/page.tsx`
168
Docs/DevLogs/Day18.md
Normal file
@@ -0,0 +1,168 @@
|
||||
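# Day 18 - Backend modularization and standards

## 🧱 Backend modular refactor (10:10)

### Changes
- API routes now delegate to `modules/*`; a router only handles parameters, permissions and the response
- Video generation logic moved down into `workflow`; task state extracted into `task_store`
- `TaskStore` prefers Redis and falls back to memory automatically when Redis is unavailable
- Supabase access extracted into `repositories/*`; `deps/auth/admin` fully reworked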
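
The "Redis first, memory fallback" behaviour of `TaskStore` might look roughly like this. It is a sketch under assumptions, not the code in `task_store.py`: the client is any object with `get`/`set` methods (a real Redis client would fit), and always keeping a local copy is a simplification of this sketch.

```python
import json
from typing import Any, Dict, Optional


class TaskStore:
    """Stores task state in a Redis-like client, or in memory if it fails."""

    def __init__(self, client: Optional[Any] = None) -> None:
        self._client = client  # hypothetical: object with get(key) / set(key, value)
        self._memory: Dict[str, str] = {}

    def save(self, task_id: str, state: Dict[str, Any]) -> None:
        payload = json.dumps(state)
        # Keep a local copy so reads still work if the client goes away.
        self._memory[task_id] = payload
        if self._client is not None:
            try:
                self._client.set(task_id, payload)
            except Exception:
                self._client = None  # stop using the broken client

    def load(self, task_id: str) -> Optional[Dict[str, Any]]:
        payload = None
        if self._client is not None:
            try:
                payload = self._client.get(task_id)
            except Exception:
                self._client = None
        if payload is None:
            payload = self._memory.get(task_id)
        return json.loads(payload) if payload is not None else None
```

Note that a memory fallback only survives within one process; task state is lost on restart and is not shared between workers, which is the trade-off the Redis path avoids.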
### Files involved
- `backend/app/modules/videos/router.py`
- `backend/app/modules/videos/workflow.py`
- `backend/app/modules/videos/task_store.py`
- `backend/app/modules/videos/service.py`
- `backend/app/modules/*/router.py`
- `backend/app/repositories/users.py`
- `backend/app/repositories/sessions.py`
- `backend/app/core/deps.py`

---

## ✅ Unified responses and exception handling (11:00)

### Changes
- Unified JSON response structure: `success/message/data/code`
- The global exception handler converts `detail` into `message`
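
The response envelope can be sketched with two helpers. The four field names come from the entry above; the helper names, the default message and the use of `0` as the success code are assumptions, not the contents of `response.py`.

```python
from typing import Any, Dict


def success_response(data: Any = None, message: str = "ok", code: int = 0) -> Dict[str, Any]:
    """Wrap a payload in the success/message/data/code envelope."""
    return {"success": True, "message": message, "data": data, "code": code}


def error_response(detail: Any, code: int) -> Dict[str, Any]:
    """Map an exception's `detail` onto the envelope's `message` field."""
    message = detail if isinstance(detail, str) else str(detail)
    return {"success": False, "message": message, "data": None, "code": code}
```

A global exception handler would call `error_response(exc.detail, exc.status_code)` so that clients only ever read `message`, whichever layer raised the error.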
### Files involved
- `backend/app/core/response.py`
- `backend/app/main.py`

---

## 🎞️ Material renaming and storage operations (11:40)

### Changes
- New material rename endpoint `PUT /api/materials/{material_id}`
- Storage gains `move_file` to support rename/move

### Files involved
- `backend/app/modules/materials/router.py`
- `backend/app/services/storage.py`

---

## 🧾 Platform list adjusted (12:10)

### Changes
- Platform order is now: Douyin → WeChat Channels → Bilibili → Xiaohongshu
- Removed the Kuaishou configuration

### Files involved
- `backend/app/services/publish_service.py`

---

## 📘 Backend development guidelines added (12:30)

### Changes
- Added `BACKEND_DEV.md` as the backend guidelines document
- `BACKEND_README.md` updated with the modular structure and response format

### Files involved
- `Docs/BACKEND_DEV.md`
- `Docs/BACKEND_README.md`

---

## 🚀 Faster entry into publish management (13:10)

### Changes
- The home page prefetches the `/publish` route, so opening publish management is faster
- The publish page reads prefetched data from `sessionStorage` for a faster first render
- Account and work lists show skeleton screens instead of a blank wait

### Files involved
- `frontend/src/features/home/ui/HomePage.tsx`
- `frontend/src/features/home/model/useHomeController.ts`
- `frontend/src/features/publish/model/usePublishController.ts`
- `frontend/src/features/publish/ui/PublishPage.tsx`

---

## 📁 Home material loading optimization (13:30)

### Changes
- Signed URLs for the material list are generated concurrently (limit 8), shortening load time
- The material list shows a loading skeleton whose item count follows the previous material count
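
Bounded concurrency of this kind is commonly done with `asyncio.Semaphore`. The sketch below assumes an async signing function is available; `sign_all` and `sign_one` are hypothetical names and the real router may be structured differently.

```python
import asyncio
from typing import Awaitable, Callable, List


async def sign_all(paths: List[str],
                   sign_one: Callable[[str], Awaitable[str]],
                   limit: int = 8) -> List[str]:
    """Sign every path concurrently, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path: str) -> str:
        async with semaphore:  # blocks while `limit` calls are running
            return await sign_one(path)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(p) for p in paths))
```

Because `gather` preserves input order, the caller can zip the returned URLs back onto the material list without tracking indices.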
### Files involved
- `backend/app/modules/materials/router.py`
- `frontend/src/features/home/model/useMaterials.ts`
- `frontend/src/features/home/model/useHomeController.ts`
- `frontend/src/features/home/ui/HomePage.tsx`
- `frontend/src/features/home/ui/MaterialSelector.tsx`

---

## 🎬 Preview loading improvements (14:00)

### Changes
- Preview videos set `preload="metadata"` to shorten the wait for the first frame
- Hovering the preview button on the publish page prefetches the video

### Files involved
- `frontend/src/components/VideoPreviewModal.tsx`
- `frontend/src/features/home/ui/PreviewPanel.tsx`
- `frontend/src/features/publish/ui/PublishPage.tsx`

---

## 📹 WeChat Channels publishing (16:30)

### Changes
- New Channels uploader `WeixinUploader`, covering upload / title / description / tags / publish
- Channels QR login configuration completed (iframe QR scan, candidate QR code filtering)
- Channels wired into the publish platforms and routes
- Chinese error messages, plus screenshots at key steps saved to `debug_screenshots`

### Files involved
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/services/qr_login_service.py`
- `backend/app/services/publish_service.py`
- `backend/app/modules/publish/router.py`
- `backend/app/modules/login_helper/router.py`

---

## 🧪 Channels upload stability fixes (17:40)

### Changes
- Unified browser fingerprint (UA/locale/timezone), with support for system Chrome
- Added a headful + xvfb-run mode to avoid headless detection and decode failures
- Forced SwiftShader to fix WebGL context loss
- Videos are transcoded to a compatible MP4 (H.264 + AAC + faststart) before upload
- Better upload status detection and a debug log `weixin_network.log`

### Files involved
- `backend/app/core/config.py`
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/services/qr_login_service.py`
- `run_backend.sh`

---

## 🧾 Publish diagnostics (18:10)

### Changes
- Douyin publishing now writes a network log and failure screenshots, to help locate upload/publish failures
- Channels upload failure screenshots and network logs are written to disk

### Files involved
- `backend/app/services/uploader/douyin_uploader.py`
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/debug_screenshots/*`

---

## 🧩 Publish page interaction changes (18:20)

### Changes
- The publish button is disabled until a platform is selected
- Removed the scheduled-publish UI/parameters; only immediate publish remains

### Files involved
- `frontend/src/features/publish/ui/PublishPage.tsx`
- `frontend/src/features/publish/model/usePublishController.ts`
485
Docs/DevLogs/Day19.md
Normal file
@@ -0,0 +1,485 @@
|
||||
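## 🛡️ Guard against refresh while publishing (15:46, merged)

### Changes
- The publish button text is now always `正在发布...请勿刷新或关闭网页` ("Publishing... do not refresh or close the page")
- While publishing, a browser `beforeunload` guard is active; refreshing or closing the page triggers the native confirmation prompt
- Applies to every platform on the publish management page (Douyin / WeChat Channels / Bilibili / Xiaohongshu)
- Logged as a follow-up: a publish-task recovery mechanism (task-based publishing + persisted state + front-end polling to resume)

### Files involved
- `frontend/src/features/publish/model/usePublishController.ts`
- `frontend/src/features/publish/ui/PublishPage.tsx`

---

## 🖼️ More reliable publish-success screenshots (15:26, merged)

### Changes
- After success is detected, wait for the page to load and then a further `3s` before the screenshot, to avoid capturing a half-loaded page
- To fix screenshots where the page content filled only a third of the image, the success screenshot changed from `full_page=True` to a viewport screenshot (`full_page=False`)
- Channels restores `zoom=1.0` before the success screenshot, so zooming during the flow does not affect the final proportions
- Douyin success screenshots use the same strategy, so both look consistent in the front end

### Files involved
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/services/uploader/douyin_uploader.py`

---

## 🧪 Channels screen-recording debug switch (15:12, since turned off)

### Changes
- The Channels uploader gains Playwright screen recording, gated by `WEIXIN_DEBUG_ARTIFACTS && WEIXIN_RECORD_VIDEO`
- New Channels recording settings:
  - `WEIXIN_RECORD_VIDEO`
  - `WEIXIN_KEEP_SUCCESS_VIDEO`
  - `WEIXIN_RECORD_VIDEO_WIDTH`
  - `WEIXIN_RECORD_VIDEO_HEIGHT`
- The upload flow saves the recording in `finally`; failure recordings are always kept, and success recordings are cleaned up according to the switch
- Channels debug/recording was enabled temporarily during troubleshooting; it is now back to off by default (`run_backend.sh` sets it to `false`)

### Files involved
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/core/config.py`
- `run_backend.sh`
- `Docs/DEPLOY_MANUAL.md`

---

## 🔁 Backend launcher unified as run_backend.sh (15:00)

### Changes
- Deleted the old script `run_backend_xvfb.sh`
- `run_backend.sh` now always starts with xvfb + headful (the non-xvfb variant is gone)
- Default port unified from `8010` to `8006`
- The launcher turns off WeChat/Douyin debug artifacts by default
- Updated the startup and pm2 examples in the deployment manual to use `run_backend.sh`

### Files involved
- `run_backend.sh`
- `run_backend_xvfb.sh` (deleted)
- `Docs/DEPLOY_MANUAL.md`

---

## 🧾 Channels stall and missing description fixed (14:52)

### Changes
- Log review confirmed that the Channels `post_create` request had succeeded, but the result check relied only on page text, causing a long "waiting for publish result"
- Result check improved: when `post_create` succeeds and the page reaches `post/list`, success is reported immediately
- A publish timeout now returns a failure (no more false `success=true`)
- "Title + tags go in the video description" is hardened: locate the input by its `视频描述` ("video description") label first, then fall back to placeholder and contenteditable
- The Channels wait-for-result timeout is reduced from `180s` to `90s`

### Files involved
- `backend/app/services/uploader/weixin_uploader.py`

---

## 🚦 Channels publish stall: root cause and fast result check (14:45)

### Changes
- Root cause: the request had actually been submitted (`post_create` succeeded), but the result check kept polling for a text hint, causing the long wait
- New network success signal: a successful `post/post_create` response marks the post as submitted
- If the post is submitted and the page is back on the content list (`/post/list`), success is reported immediately instead of waiting out the timeout
- New API failure signal: an error from `post_create` returns a failure immediately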
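
The decision rules in this entry, together with the timeout rule from the 14:52 entry, can be condensed into one function. This is a hypothetical sketch: `judge_publish`, `PublishResult` and the parameter names are assumptions, and the real uploader derives these signals from Playwright network events.

```python
from enum import Enum


class PublishResult(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    PENDING = "pending"


def judge_publish(create_ok: bool, create_failed: bool, current_url: str,
                  elapsed_s: float, timeout_s: float = 90.0) -> PublishResult:
    """Decide the publish outcome from network signals and the current URL."""
    if create_failed:
        return PublishResult.FAILED   # API error: fail right away
    if create_ok and "/post/list" in current_url:
        return PublishResult.SUCCESS  # submitted and back on the list page
    if elapsed_s >= timeout_s:
        return PublishResult.FAILED   # a timeout is never reported as success
    return PublishResult.PENDING      # keep polling
```

Keeping the rule as a pure function means the ordering (failure signal first, then success, then timeout) can be unit-tested without a browser.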
### Files involved
- `backend/app/services/uploader/weixin_uploader.py`

---

## 📸 Channels publish-success screenshot shown in the front end (13:34)

### Changes
- WeChat Channels gains a publish-success screenshot: after a successful publish, the current success page is captured
- Screenshots use the same private, isolated directory: `private_outputs/publish_screenshots/{user_id}`
- The `screenshot_url` returned to the front end uses the authenticated endpoint `/api/publish/screenshot/{filename}`
- The Channels uploader now receives `user_id`, so screenshots are isolated per user

### Files involved
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/services/publish_service.py`

---

## ✍️ Channels description fill fixed + debug artifacts off (13:26)

### Changes
- Per the latest rule, the title and tags are both written into the "video description" input
- Tags are normalized to the `#tag` form and de-duplicated
- If the "video description" input is not found, the publish fails immediately, avoiding "published successfully but title/tags empty"
- Channels debug artifacts are off: new setting `WEIXIN_DEBUG_ARTIFACTS=false` disables debug logs and screenshots
- `run_backend.sh` sets `WEIXIN_DEBUG_ARTIFACTS=false`, forcing it off at the launcher level
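
Tag normalization and the combined description text can be sketched as follows. The helper names and the choice to join with single spaces are assumptions; the uploader's actual formatting may differ.

```python
from typing import Iterable, List


def normalize_tags(tags: Iterable[str]) -> List[str]:
    """Return tags as '#name', de-duplicated, in first-seen order."""
    seen = set()
    result: List[str] = []
    for raw in tags:
        # Strip whitespace and any leading ASCII or full-width hash signs.
        name = raw.strip().lstrip("#＃").strip()
        if not name or name in seen:
            continue
        seen.add(name)
        result.append("#" + name)
    return result


def build_description(title: str, tags: Iterable[str]) -> str:
    """Join the title and normalized tags into one description string."""
    parts = [title.strip()] + normalize_tags(tags)
    return " ".join(p for p in parts if p)
```

De-duplication here is case-sensitive and compares the tag name after the hash sign is removed, so `AI` and `#AI` collapse into one tag.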
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/weixin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
- `run_backend.sh`
|
||||
|
||||
---
|
||||
|
||||
## 🚫 强制关闭抖音调试产物 (13:15)
|
||||
|
||||
### 内容
|
||||
- 进一步收紧为“默认不生成任何抖音 debug 截屏/日志/录屏”
|
||||
- 录屏开关改为依赖 `DOUYIN_DEBUG_ARTIFACTS && DOUYIN_RECORD_VIDEO`,避免单独误开
|
||||
- `run_backend.sh` 增加环境变量强制关闭:
|
||||
- `DOUYIN_DEBUG_ARTIFACTS=false`
|
||||
- `DOUYIN_RECORD_VIDEO=false`
|
||||
- 仅保留给用户看的发布成功截图(私有目录 + 鉴权访问)
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
- `run_backend.sh`
|
||||
|
||||
---
|
||||
|
||||
## 🧹 关闭调试截屏/录屏并清理历史文件 (13:08)
|
||||
|
||||
### 内容
|
||||
- 抖音调试产物默认关闭:
|
||||
- `DOUYIN_DEBUG_ARTIFACTS=false`
|
||||
- `DOUYIN_RECORD_VIDEO=false`
|
||||
- 保留功能信号监听(上传提交/封面生成/发布接口状态)用于流程判断,不依赖调试文件
|
||||
- 已删除现有抖音调试文件(`debug_screenshots` 下的 `douyin_*` 截图、日志与失败录屏)
|
||||
- 继续保留并展示“给用户看的发布成功截图”(用户隔离 + 鉴权访问)
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/core/config.py`
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/debug_screenshots/douyin_*` (deleted)
|
||||
- `backend/app/debug_screenshots/videos/douyin_*` (deleted)
|
||||
|
||||
---
|
||||
|
||||
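## 🔒 Per-user isolation of success screenshots (12:58)

### Changes

- Publish-success screenshots are now stored per user and no longer written to the public static directory
- Storage moved to a private path: `private_outputs/publish_screenshots/{user_id}`
- New authenticated endpoint: `GET /api/publish/screenshot/{filename}` (login required; users can only read their own screenshots)
- The `screenshot_url` returned to the frontend now points at the authenticated endpoint, so another user's screenshot cannot be fetched by guessing its path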
|
||||
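
A minimal sketch of the path check such an endpoint needs, assuming the `user_id` comes from the authenticated session. `resolve_screenshot` is a hypothetical helper, not the project's actual router code; only the directory name is taken from the entry above.

```python
from pathlib import Path

SCREENSHOT_ROOT = Path("private_outputs/publish_screenshots")  # directory named in the log


def resolve_screenshot(user_id, filename, root=SCREENSHOT_ROOT):
    """Return the path of `filename` inside the user's folder, or None if unsafe."""
    # Accept only a bare file name: no directory separators, no traversal.
    if not filename or Path(filename).name != filename or filename in {".", ".."}:
        return None
    user_dir = (root / str(user_id)).resolve()
    candidate = (user_dir / filename).resolve()
    # The resolved path must stay inside the requesting user's directory.
    if user_dir not in candidate.parents:
        return None
    return candidate
```

A route handler would return 404 when this yields `None` or the file does not exist, so a caller cannot tell "forbidden" apart from "missing".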
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/services/publish_service.py`
|
||||
- `backend/app/modules/publish/router.py`
|
||||
- `backend/app/core/config.py`
|
||||
|
||||
---
|
||||
|
||||
## 🎯 封面触发提速与审核中截图强化 (12:49)
|
||||
|
||||
### 内容
|
||||
- 修复“上传完成后长时间不进入封面”:当出现 `重新上传+预览` 且已收到视频提交信号时,立即进入封面步骤
|
||||
- 目标是减少“处理中”文案残留导致的额外等待
|
||||
- 成功截图逻辑强化为优先“真实点击审核中标签”,新增文本点击兜底,不再只用可见即通过
|
||||
- 若审核中列表未马上出现标题,自动刷新并再次进入审核中重查后再截图
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
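## 🔐 Stronger login-state detection (avoids false "upload failed" reports) (12:41)

### Changes

- To address false "file chooser was not triggered" errors, login-page detection was added:
  - URL keywords: `passport/login/check_qrconnect/sso`
  - Page text: `扫码登录/验证码登录/立即登录/抖音APP扫码登录` and similar
  - Login controls: phone number / verification code inputs, login button
- If the page is identified as a login page after the upload-stage retry, the uploader returns `Cookie 已失效,请重新登录` directly
- This keeps an actual logged-out session from being misreported as a broken upload entry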
|
||||
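
The URL and text checks listed above can be sketched as a pure function. This is an illustration only: `looks_like_login_page` is a hypothetical name, the keyword lists are copied from the entry, and the third check (login controls in the DOM) is left out because it needs a live page object.

```python
LOGIN_URL_KEYWORDS = ("passport", "login", "check_qrconnect", "sso")
LOGIN_TEXT_MARKERS = ("扫码登录", "验证码登录", "立即登录", "抖音APP扫码登录")


def looks_like_login_page(url, page_text):
    """Heuristic: True when the URL or the visible text suggests a login page."""
    lowered = url.lower()
    if any(keyword in lowered for keyword in LOGIN_URL_KEYWORDS):
        return True
    return any(marker in page_text for marker in LOGIN_TEXT_MARKERS)
```

Substring matching is deliberately loose here; a short keyword such as `sso` can match unrelated URLs, so a real implementation may want to match on path segments instead.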
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⏱️ 发布阶段超时与网络不佳快速失败 (12:30)
|
||||
|
||||
### 内容
|
||||
- 针对“网络不佳后长时间卡住”增加发布阶段快速失败
|
||||
- 上传完成后到发布结果设置总超时 `60s`(`POST_UPLOAD_STAGE_TIMEOUT`),超过直接失败
|
||||
- 识别发布接口 `create_v2` 的 HTTP 错误(如 403)并立即返回失败,不再等待 180 秒
|
||||
- 发布结果判定新增网络类失败文案匹配(`网络不佳/网络异常/请稍后重试`)
|
||||
- 阻塞弹窗关闭策略新增 `暂不设置`,避免“设置横封面获更多流量”弹窗阻塞点击发布
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧯 封面已完成但误判失败修复 (12:22)
|
||||
|
||||
### 内容
|
||||
- 针对报错“封面为必填但未设置成功”新增页面态兜底,避免封面已完成却未点击发布
|
||||
- 新增 `_is_cover_configured_on_page()`:通过 `横封面/竖封面` + 封面预览图判断页面已配置封面
|
||||
- 当出现 `horizontal_switch_missed` 或 `no_cover_button` 时,若页面已配置封面则允许继续发布
|
||||
- 封面必填主流程增加 `configured_fallback_continue` 兜底,降低误杀
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧾 成功截图切到审核中视图 (11:26)
|
||||
|
||||
### 内容
|
||||
- 按需求将“发布成功截图”改为内容管理 `审核中/待审核` 视图,不再截“全部作品”
|
||||
- 发布成功后先进入内容管理并点击 `审核中`(或 `待审核`)再截图
|
||||
- 截图前额外尝试等待当前标题出现在审核中列表,便于确认是最新发布作品
|
||||
- 发布超时兜底验证也改为优先在审核中列表查找标题
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ✅ 封面步骤按指定顺序强约束 (11:18)
|
||||
|
||||
### 内容
|
||||
- 按确认流程收紧旧发布页封面链路:
|
||||
- 作品描述填完 → 点击 `选择封面` → 点击 `设置横封面` → 点击 `完成` → 等待封面效果检测通过 → 才允许发布
|
||||
- 新增 `require_horizontal` 约束:封面必填场景必须切换到横封面,否则直接失败重试
|
||||
- 新增封面效果检测通过等待:优先 `cover/gen` 新请求信号,其次页面“检测通过”文案
|
||||
- 避免因漏点 `设置横封面` 导致后续卡住或误发布
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧩 横封面点击漏判修复 (11:10)
|
||||
|
||||
### 内容
|
||||
- 根据复现反馈修复“未点击设置横封面导致封面流程卡住”问题
|
||||
- 新增 `_switch_to_horizontal_cover()`,扩展横封面入口选择器(`设置横封面/横封面/横版封面`)
|
||||
- 进入封面弹窗后先关闭阻塞弹窗再点击横封面,点击失败会重试一次
|
||||
- 若页面存在横封面入口但始终未切换成功,直接返回失败并重试,避免长时间假等待
|
||||
- 新增日志:`[douyin][cover] switched_horizontal ...`、`horizontal_switch_missed`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⚡ 横封面后直接完成优化 (11:03)
|
||||
|
||||
### 内容
|
||||
- 根据实测反馈,在点击 `设置横封面` 后新增一次“立即点击完成”快速路径
|
||||
- 若平台已自动选中横封面,将直接确认并退出弹窗,不再执行后续封面扫描
|
||||
- 新增日志:`[douyin][cover] fast_confirm_after_switch ...`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⚙️ 封面步骤提速优化 (10:58)
|
||||
|
||||
### 内容
|
||||
- 复盘日志确认旧发布页封面步骤存在明显耗时(示例:`required_by_text` 到 `cover selected` 约 35 秒)
|
||||
- 新增封面“快速确认”路径:若平台已默认选中封面,直接确认并跳过多余扫描
|
||||
- 收紧封面成功条件:仅“确认按钮点击成功”才算封面设置成功,避免误判
|
||||
- 缩短不必要等待并新增封面耗时日志:`[douyin][cover] fast_confirm/selected=... confirmed=... elapsed=...`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧾 发布成功截图前台展示 (10:48)
|
||||
|
||||
### 内容
|
||||
- 按需求删除 `run_backend_xvfb_live.sh`,不再提供实时直播脚本
|
||||
- 抖音发布成功时自动保存成功截图到 `outputs/publish_screenshots`
|
||||
- 发布接口返回 `screenshot_url`,前端发布结果卡片直接展示截图并支持点击查看大图
|
||||
- 发布结果不再 10 秒自动清空,方便用户确认“是否真正发布成功”
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `frontend/src/features/publish/model/usePublishController.ts`
|
||||
- `frontend/src/features/publish/ui/PublishPage.tsx`
|
||||
- `run_backend_xvfb_live.sh` (deleted)
|
||||
|
||||
---
|
||||
|
||||
## 🧬 抖音界面差异根因与环境对齐 (10:20)
|
||||
|
||||
### 内容
|
||||
- 定位到 Playwright 与手动 Win11 Chrome 的环境指纹不一致(Linux 平台 + 自动化上下文),可能触发不同灰度界面
|
||||
- 抖音上传器新增独立浏览器配置项,不再复用 `WEIXIN_*` 配置
|
||||
- 新增 `DOUYIN_*` 配置:`HEADLESS_MODE/USER_AGENT/LOCALE/TIMEZONE_ID/CHROME_PATH/BROWSER_CHANNEL/FORCE_SWIFTSHADER`
|
||||
- 上传器启动改为 `_build_launch_options()`,可直接切换到系统 Chrome + headful(推荐配合 xvfb)
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
|
||||
---
|
||||
|
||||
## 🪄 新旧发布页封面逻辑分流 (10:28)
|
||||
|
||||
### 内容
|
||||
- 依据页面结构自动分流:
|
||||
- 新版发布页(封面非必填):默认跳过封面设置
|
||||
- 旧版发布页(出现 `设置封面` + `必填`):强制先设置封面
|
||||
- 新增 `_is_cover_required()` 判断,避免在新页面做多余封面操作
|
||||
- 若判定为非必填但点击发布失败,会回退尝试设置封面后再重试发布
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 📺 虚拟屏实时观看方案 (10:36)
|
||||
|
||||
### 内容
|
||||
- 新增 `run_backend_xvfb_live.sh`,在 Xvfb 下同时启动后端与实时画面转码
|
||||
- 通过 ffmpeg 抓取虚拟屏并输出 HLS:`/outputs/live/live.m3u8`
|
||||
- 适用于“边跑自动发布边实时观看”,不依赖 VNC
|
||||
- 默认仍保留失败录屏,HLS 用于过程实时观察
|
||||
|
||||
### 涉及文件
|
||||
- `run_backend_xvfb_live.sh`
|
||||
|
||||
---
|
||||
|
||||
## 🎥 抖音后台录屏能力 (09:55)
|
||||
|
||||
### 内容
|
||||
- 新增抖音自动发布过程录屏能力,便于定位“卡住在哪一步”
|
||||
- 录屏文件保存目录:`backend/app/debug_screenshots/videos`
|
||||
- 默认开启录屏,默认只保留失败录屏(成功录屏自动清理)
|
||||
- 每次执行会在网络日志追加录屏保存记录(`[douyin][record]`)
|
||||
- 增加发布阶段关键标记日志:`publish_wait ready`、`publish_click try/clicked`
|
||||
- 新增配置项:`DOUYIN_RECORD_VIDEO`、`DOUYIN_KEEP_SUCCESS_VIDEO`、`DOUYIN_RECORD_VIDEO_WIDTH`、`DOUYIN_RECORD_VIDEO_HEIGHT`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
|
||||
---
|
||||
|
||||
## 🚀 发布按钮等待逻辑修正 (10:00)
|
||||
|
||||
### 内容
|
||||
- 根据线上反馈,发布页不再做冗长前置等待,改为“尽快尝试点击发布”
|
||||
- 新增发布按钮定位策略(role + text 多选择器),避免 `exact role` 匹配失败导致假等待
|
||||
- 将发布按钮等待上限从上传超时(300s)独立为 `PUBLISH_BUTTON_TIMEOUT=60s`
|
||||
- 点击发布阶段统一走 `_click_publish_button`,并持续记录 `publish_wait/publish_click` 日志
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧪 上传完成特征判定增强 (10:07)
|
||||
|
||||
### 内容
|
||||
- 基于实测页面特征补齐“上传中/上传完成”判定:
|
||||
- 上传中:`上传过程中请不要刷新`、`取消上传`、`已上传/当前速度/剩余时间`
|
||||
- 上传完成:`重新上传` + `预览视频/预览封面/标题`
|
||||
- 仅在确认上传完成后才允许执行发布点击,避免“未传完提前发布”
|
||||
- 新增上传等待日志:`[douyin][upload_wait] ...`,可直观看到卡在上传中还是等完成信号
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⏸️ 上传完成后延时发布 (10:10)
|
||||
|
||||
### 内容
|
||||
- 根据实测反馈,增加“上传完成后固定等待 2 秒”再点发布
|
||||
- 避免刚出现完成信号就立即点击,给前端状态收敛留缓冲
|
||||
- 新增日志标记:`[douyin][upload_ready] wait_before_publish=2s`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🖼️ 恢复封面设置流程 (10:14)
|
||||
|
||||
### 内容
|
||||
- 按实测需求恢复“上传完成后先设置封面,再发布”流程
|
||||
- 封面设置改为最多尝试 2 次,成功写入 `[douyin][cover] selected`
|
||||
- 若封面未设置成功则直接终止发布并保存截图 `cover_not_selected`
|
||||
- 避免出现“未设封面就点击发布”的情况
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
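## 🛠️ Douyin publish flow fix (09:20)

### Changes

- Following the current page flow, the uploader now opens the home page and clicks `高清发布` before entering the upload page
- Unpublished drafts are handled: when `你还有上次未发布的视频` is detected, `放弃` is clicked automatically
- Upload strategy: click `上传视频` and use the file chooser first; fall back to trying multiple input selectors only if that fails
- The flow continues only after a publish-state signal such as `基础信息/作品描述/发布设置/重新上传` is detected, which avoids a false "already uploaded" judgment
- Fixed the temp-file strategy for videos without an extension: try a hardlink first, copy on failure; the symlink fallback was removed
- Adapted to the current smart-cover flow: manual cover steps are skipped
- Topics are now appended to the intro/description area in `#tag` form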
|
||||
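
The "hardlink first, copy on failure" strategy for extensionless videos might be sketched as follows. `stage_with_extension` and the `.mp4` default are assumptions made for illustration; the real uploader may name and place the temp file differently.

```python
import os
import shutil
from pathlib import Path


def stage_with_extension(src, dest_dir, suffix=".mp4"):
    """Expose `src` under a name ending in `suffix`: hard link first, copy on failure."""
    src = Path(src)
    dest = Path(dest_dir) / (src.name + suffix)
    if dest.exists():
        dest.unlink()
    try:
        os.link(src, dest)  # cheap: no data is copied, but only works on one filesystem
    except OSError:
        shutil.copy2(src, dest)  # e.g. cross-device link or unsupported filesystem
    return dest
```

Unlike a symlink, both results are regular files, so a browser file chooser sees a real file with the expected extension.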
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⚡ 抖音等待链路再收敛 (09:52)
|
||||
|
||||
### 内容
|
||||
- 根据“选完视频即进入发布页”流程,移除独立的上传完成轮询阶段
|
||||
- 改为在点击发布前统一等待“发布按钮可点击”,避免重复等待导致总时长偏长
|
||||
- 新增 `publish_wait` 调试日志,按秒记录按钮可点击等待时长
|
||||
- 超时文案改为明确提示“发布按钮长时间不可点击”
|
||||
- 上传入口改为严格 file chooser 流程:只走“点击上传视频 → 选择文件 → 进入发布页”链路
|
||||
- 移除直接 input 回退上传,避免绕开上传入口导致状态机异常
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧭 抖音卡慢环节定位与修复 (09:45)
|
||||
|
||||
### 内容
|
||||
- 通过 `douyin_network.log` 定位到卡慢发生在“上传完成判定”阶段,而非真正提交发布接口
|
||||
- 新增上传完成网络信号:`CommitUploadInner` 成功与封面生成成功信号写入日志
|
||||
- 收紧“上传完成”判定,移除 `publish_button_enabled` 这种过早放行条件
|
||||
- 仅在检测到 `重新上传/重新选择` 或上传提交信号后才进入下一步,降低误判导致的长等待
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ✅ 抖音发布结果判定修正 (09:38)
|
||||
|
||||
### 内容
|
||||
- 修复“发布检测超时仍返回 success=true”的问题,超时场景改为 `success=false`
|
||||
- 优化超时返回文案,明确为“发布状态未知,需要后台确认”
|
||||
- 下线过于宽松的管理页兜底判定(仅出现 `审核中` 不再当作发布成功)
|
||||
- 超时时即使管理页出现同名标题也不直接判定成功,避免旧作品同名导致误报
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⏱️ 抖音上传完成判定优化 (09:34)
|
||||
|
||||
### 内容
|
||||
- 根据最新日志确认文件上传已开始并有分片上传请求成功,但流程长时间停留在“等待上传完成”
|
||||
- 扩展“上传完成”判定条件,不再只依赖单一 `long-card + 重新上传` 选择器
|
||||
- 新增上传完成信号:`重新上传/重新选择` 可见、发布按钮可用、`发布设置` 或 `预览视频` 可见
|
||||
- 上传等待日志增加耗时秒数,便于判断是否真实卡住
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
103
Docs/DevLogs/Day20.md
Normal file
@@ -0,0 +1,103 @@
|
||||
## 🔧 Code quality and security hardening (13:30)

### Overview

A full code review of the project was carried out today: 27 improvement items were triaged and 18 core fixes completed.
|
||||
|
||||
### 已完成优化
|
||||
|
||||
#### 功能性修复
|
||||
- [x] **P0-1**: LatentSync 回退逻辑空实现 → 改为 `raise RuntimeError`
|
||||
- [x] **P1-1**: 任务状态接口无用户归属校验 → 添加用户认证依赖
|
||||
- [x] **P1-2**: 前端 User 类型定义重复 → 统一到 `shared/types/user.ts`
|
||||
|
||||
#### 性能优化
|
||||
- [x] **P1-3**: 参考音频列表 N+1 查询 → 使用 `asyncio.gather` 并发
|
||||
- [x] **P1-4**: 视频上传整读内存 → 新增 `upload_file_from_path` 流式处理
|
||||
- [x] **P1-5**: async 路由内同步阻塞 → `httpx.AsyncClient` 替换 `requests`
|
||||
- [x] **P2-2**: GLM 服务同步调用 → `asyncio.to_thread` 包装
|
||||
- [x] **P2-3**: Remotion 渲染启动慢 → 预编译 JS + `build:render` 脚本
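
As an illustration of item P1-3 (replacing a per-item await loop with `asyncio.gather`), here is a minimal sketch. `fetch_signed_url` is a hypothetical stand-in for whatever storage call the real service makes per reference audio.

```python
import asyncio


async def fetch_signed_url(path):
    """Hypothetical stand-in for one storage round trip per reference audio."""
    await asyncio.sleep(0.01)
    return f"signed://{path}"


async def list_ref_audios(paths):
    """Issue all lookups concurrently; gather returns results in input order."""
    urls = await asyncio.gather(*(fetch_signed_url(p) for p in paths))
    return [{"path": p, "url": u} for p, u in zip(paths, urls)]
```

With N items the total latency is roughly one round trip instead of N, at the cost of N simultaneous requests; a semaphore can cap that if the backend rate-limits.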
|
||||
|
||||
#### 安全修复
|
||||
- [x] **P1-8**: 硬编码 Cookie → 移至环境变量 `DOUYIN_COOKIE`
|
||||
- [x] **P1-9**: 请求日志打印完整 headers → 敏感信息脱敏
|
||||
- [x] **P2-10**: ffprobe 使用 `shell=True` → 改为参数列表
|
||||
- [x] **P2-11**: CORS 配置 `*` + credentials → 从 `CORS_ORIGINS` 环境变量读取
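
Item P2-10 (ffprobe without `shell=True`) can be sketched as below. The function names are hypothetical and the chosen ffprobe options are one common way to read a duration, not necessarily what `video_service.py` uses.

```python
import json
import subprocess


def build_ffprobe_cmd(video_path):
    """Argument list for ffprobe; the path is one argv entry and is never shell-parsed."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "json",
        str(video_path),
    ]


def probe_duration(video_path):
    """Run ffprobe (must be on PATH) and return the duration in seconds."""
    proc = subprocess.run(
        build_ffprobe_cmd(video_path), capture_output=True, text=True, check=True
    )
    return float(json.loads(proc.stdout)["format"]["duration"])
```

Because no shell is involved, a file name containing `;` or `$()` is passed through literally instead of being executed.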
|
||||
|
||||
#### 配置优化
|
||||
- [x] **P2-5**: 存储服务硬编码路径 → 环境变量 `SUPABASE_STORAGE_LOCAL_PATH`
|
||||
- [x] **P3-3**: Remotion `execSync` 同步调用 → promisified `exec` 异步
|
||||
- [x] **P3-5**: LatentSync 相对路径 → 基于 `__file__` 绝对路径
|
||||
|
||||
### 暂不处理(收益有限)
|
||||
- [~] **P1-6**: useHomeController 超大文件 (884行)
|
||||
- [~] **P1-7**: 抖音/微信上传器重复代码(流程差异大)
|
||||
|
||||
### 低优先级(后续处理)
|
||||
- [~] **P2-6~P2-9**: API 转发壳、前端 API 客户端混用、ESLint、重复逻辑
|
||||
- [~] **P3-1~P3-4**: 阻塞式交互、Modal 过大、样式兼容层
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/latentsync_service.py` - 回退逻辑
|
||||
- `backend/app/modules/videos/router.py` - 任务状态认证
|
||||
- `backend/app/modules/tools/router.py` - httpx 异步、Cookie 配置化
|
||||
- `backend/app/services/glm_service.py` - 异步包装
|
||||
- `backend/app/services/storage.py` - 流式上传、路径配置化
|
||||
- `backend/app/services/video_service.py` - ffprobe 安全调用
|
||||
- `backend/app/main.py` - CORS 配置、日志脱敏
|
||||
- `backend/app/core/config.py` - 新增配置项
|
||||
- `remotion/render.ts` - 异步 exec
|
||||
- `remotion/package.json` - build:render 脚本
|
||||
- `models/LatentSync/scripts/server.py` - 绝对路径
|
||||
- `frontend/src/shared/types/user.ts` - 统一类型定义
|
||||
|
||||
### 新增环境变量
|
||||
```bash
|
||||
# New .env settings (all have defaults; none are required)
CORS_ORIGINS=*                           # CORS allowlist
SUPABASE_STORAGE_LOCAL_PATH=/path/to/... # local storage path
DOUYIN_COOKIE=...                        # cookie for Douyin video downloads
|
||||
```
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
pm2 restart vigent2-backend
|
||||
pm2 restart vigent2-latentsync
|
||||
# Remotion 已自动编译
|
||||
```
|
||||
|
||||
### 🎨 交互与体验优化 (17:00)
|
||||
|
||||
- [x] **UX-1**: PublishPage 图片加载优化 (`<img>` → `next/image`)
|
||||
- [x] **UX-2**: 按钮 Loading 状态统一 (提取脚本弹窗 + 发布页)
|
||||
- [x] **UX-3**: 骨架屏加载优化 (发布页加载中状态)
|
||||
- [x] **UX-4**: 全局快捷键支持 (ESC 关闭弹窗, Enter 确认)
|
||||
- [x] **UX-5**: 移除全局 GlobalTaskIndicator (视觉降噪)
|
||||
- [x] **UX-6**: 视频生成完成自动刷新列表并选中最新
|
||||
|
||||
### 🐛 缺陷修复与回归治理 (17:30)
|
||||
|
||||
#### 严重缺陷修复
|
||||
- [x] **BUG-1**: Remotion 渲染脚本路径解析错误 (导致标题字幕丢失)
|
||||
- *原因*: `render.js` 预编译后使用了 `__dirname`,在 `dist` 目录下寻找源码失败。
|
||||
- *修复*: 修改 `render.ts` 使用 `process.cwd()` 动态解析路径,并重新编译。
|
||||
|
||||
- [x] **BUG-2**: 发布页视频选择持久化失效 (Auth 异步竞态)
|
||||
- *原因*: 页面加载时 `useAuth` 尚未返回用户 ID,导致使用 `guest` Key 读取不到记录,随后被默认值覆盖。
|
||||
- *修复*: 引入 `isVideoRestored` 状态机,强制等待 Auth 完成且 Video 列表加载完毕后,才执行恢复逻辑。
|
||||
|
||||
#### 回归问题治理
|
||||
- [x] **REG-1**: 首页历史作品 ID 恢复后内容不显示
|
||||
- *原因*: 持久化模块恢复了 ID,但 `useGeneratedVideos` 未监听 ID 变化同步 URL。
|
||||
- *修复*: 新增 `useEffect` 监听 `selectedVideoId` 变化并同步 `generatedVideo` URL。
|
||||
|
||||
- [x] **REG-2**: 首页/发布页“默认选中第一个”逻辑丢失
|
||||
- *原因*: 重构移除旧逻辑后,新用户或无缓存用户进入页面无默认选中。
|
||||
- *修复*: 在 `isRestored` 且无选中时,增加兜底逻辑自动选中列表第一项。
|
||||
|
||||
- [x] **REG-3**: 素材选择持久化失效 (闭包陷阱)
|
||||
- *原因*: `useMaterials` 加载回调中捕获了旧的 `selectedMaterial` 状态,覆盖了已恢复的值。
|
||||
- *修复*: 改为函数式状态更新 (`setState(prev => ...)`),确保基于最新状态判断。
|
||||
|
||||
- [x] **REF-1**: 持久化逻辑全站收敛与排查
|
||||
- *优化*: 清理 `useBgm`, `useGeneratedVideos`, `useTitleSubtitleStyles` 中的冗余 `localStorage` 读取,统一由 `useHomePersistence` 管理。
|
||||
- *排查*: 深度排查 `useRefAudios`, `useTitleSubtitleStyles` 等模块,确认逻辑健壮,无类似回归风险。
|
||||
449
Docs/DevLogs/Day21.md
Normal file
@@ -0,0 +1,449 @@
|
||||
## 🐛 缺陷修复:视频生成与持久化回归 (Day 21)
|
||||
|
||||
### 概述
|
||||
本日修复 Day 20 优化后引入的 3 个回归缺陷:Remotion 渲染崩溃容错、首页作品选择持久化、发布页作品选择持久化。
|
||||
|
||||
---
|
||||
|
||||
### 已完成修复
|
||||
|
||||
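#### BUG-1: Remotion render process crash drops the title/subtitles

- **Symptom**: The generated video has no title or subtitles; the pipeline falls back to plain FFmpeg composition.
- **Root cause**: The Remotion Node.js process exits with SIGABRT (code -6) after rendering reaches 100%, and the Python side treats that as a failure.
- **Fix**: On a non-zero exit, `remotion_service.py` first checks whether the output file exists and has a plausible size (>1KB); if so, the render is treated as successful.
- **File**: `backend/app/services/remotion_service.py`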
|
||||
|
||||
```python
|
||||
if process.returncode != 0:
|
||||
output_file = Path(output_path)
|
||||
if output_file.exists() and output_file.stat().st_size > 1024:
|
||||
logger.warning(
|
||||
f"Remotion process exited with code {process.returncode}, "
|
||||
f"but output file exists ({output_file.stat().st_size} bytes). Treating as success."
|
||||
)
|
||||
return output_path
|
||||
raise RuntimeError(...)
|
||||
```
|
||||
|
||||
#### BUG-2: 首页历史作品选择刷新后不保持
|
||||
- **现象**: 用户选择某个历史作品后刷新页面,总是回到第一个视频。
|
||||
- **根因**: `fetchGeneratedVideos()` 在初始加载时无条件自动选中第一个视频,覆盖了 `useHomePersistence` 的恢复值。
|
||||
- **修复**: `fetchGeneratedVideos` 增加 `preferVideoId` 参数,仅在明确指定时才自动选中;新增 `"__latest__"` 哨兵值用于生成完成后选中最新。
|
||||
- **文件**: `frontend/src/features/home/model/useGeneratedVideos.ts`, `frontend/src/features/home/model/useHomeController.ts`
|
||||
|
||||
```typescript
|
||||
// 任务完成 → 自动选中最新
|
||||
useEffect(() => {
|
||||
if (prevIsGenerating.current && !isGenerating) {
|
||||
if (currentTask?.status === "completed") {
|
||||
void fetchGeneratedVideos("__latest__");
|
||||
} else {
|
||||
void fetchGeneratedVideos();
|
||||
}
|
||||
}
|
||||
prevIsGenerating.current = isGenerating;
|
||||
}, [isGenerating, currentTask, fetchGeneratedVideos]);
|
||||
```
|
||||
|
||||
#### BUG-3: 发布页作品选择刷新后不保持(根因:签名 URL 不稳定)
|
||||
- **现象**: 发布管理页选择视频后刷新,选择丢失(无任何视频被选中)。
|
||||
- **根因**: 后端 `/api/videos/generated` 返回的 `path` 是 Supabase 签名 URL,每次请求都会变化。发布页用 `path` 作为选择标识存入 localStorage,刷新后新的 `path` 与保存值永远不匹配。首页不受影响是因为使用稳定的 `video.id`。
|
||||
- **修复**: 发布页全面改用 `id`(稳定标识)替代 `path`(签名 URL)进行选择、持久化和比较。
|
||||
- **文件**:
|
||||
- `frontend/src/shared/types/publish.ts` — `PublishVideo` 新增 `id` 字段
|
||||
- `frontend/src/features/publish/model/usePublishController.ts` — `selectedVideo` 存储 `id`,发布时根据 `id` 查找 `path`
|
||||
- `frontend/src/features/publish/ui/PublishPage.tsx` — `key`/`onClick`/选中比较改用 `v.id`
|
||||
- `frontend/src/features/home/model/useHomeController.ts` — 预取缓存加入 `id` 字段
|
||||
|
||||
```typescript
|
||||
// 类型定义新增 id
|
||||
export interface PublishVideo {
|
||||
id: string; // 稳定标识符
|
||||
name: string;
|
||||
path: string; // 签名 URL(仅用于播放/发布)
|
||||
}
|
||||
|
||||
// 发布时根据 id 查找 path
|
||||
const video = videos.find(v => v.id === selectedVideo);
|
||||
await api.post('/api/publish', { video_path: video.path, ... });
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/services/remotion_service.py` | Remotion 崩溃容错 |
|
||||
| `frontend/src/features/home/model/useGeneratedVideos.ts` | 首页视频选择不自动覆盖 |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 任务完成监听 + 预取缓存加 id |
|
||||
| `frontend/src/shared/types/publish.ts` | PublishVideo 新增 id 字段 |
|
||||
| `frontend/src/features/publish/model/usePublishController.ts` | 选择/持久化/发布改用 id |
|
||||
| `frontend/src/features/publish/ui/PublishPage.tsx` | UI 选择比较改用 id |
|
||||
|
||||
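### Key lesson

> **A signed URL cannot serve as a persistent identifier.** Supabase Storage signed URLs carry a timestamp and signature parameters, so they differ on every request. Any identifier that must survive across requests or page reloads has to be the stable `id` field returned by the backend.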
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
pm2 restart vigent2-backend # Remotion 容错
|
||||
npm run build && pm2 restart vigent2-frontend # 前端持久化修复
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎨 浮动样式预览窗口优化 (Day 21)
|
||||
|
||||
### 概述
|
||||
标题与字幕面板中的预览区域原本是内联折叠的,展开后调节下方滑块时看不到预览效果。改为 `position: fixed` 浮动窗口,固定在视口左上角,滚动页面时预览始终可见,边调边看。
|
||||
|
||||
### 已完成优化
|
||||
|
||||
#### 1. 新建浮动预览组件 `FloatingStylePreview.tsx`
|
||||
- `createPortal(jsx, document.body)` 渲染到 body 层级,脱离面板 DOM 树
|
||||
- `position: fixed` + 左上角固定定位,滚动时不移动
|
||||
- `z-index: 150`(低于 VideoPreviewModal 的 200)
|
||||
- 顶部标题栏 + X 关闭按钮,ESC 键关闭
|
||||
- 桌面端固定宽度 280px,移动端自适应(最大 360px)
|
||||
- `previewScale = windowWidth / previewBaseWidth` 自行计算缩放
|
||||
- `maxHeight: calc(100dvh - 32px)` 防止超出视口
|
||||
|
||||
#### 2. 修改 `TitleSubtitlePanel.tsx`
|
||||
- 删除内联预览区域(`ref={previewContainerRef}` 整块 JSX)
|
||||
- 条件渲染 `<FloatingStylePreview />`,按钮文本保持"预览样式"/"收起预览"
|
||||
- 移除 `previewScale`、`previewAspectRatio`、`previewContainerRef` props
|
||||
- 保留 `previewBaseWidth/Height`(浮动窗口需要原始尺寸计算 scale)
|
||||
|
||||
#### 3. 清理 `useHomeController.ts`
|
||||
- 移除 `previewContainerWidth` 状态
|
||||
- 移除 `titlePreviewContainerRef` ref
|
||||
- 移除 ResizeObserver useEffect(浮动窗口自管尺寸,不再需要)
|
||||
|
||||
#### 4. 简化 `HomePage.tsx` 传参
|
||||
- 移除 `previewContainerWidth`、`titlePreviewContainerRef` 解构
|
||||
- 移除 `previewScale`、`previewAspectRatio`、`previewContainerRef` prop 传递
|
||||
|
||||
#### 5. 移动端适配
|
||||
- `ScriptEditor.tsx`:标题行改为 `flex-wrap`,"AI生成标题标签"按钮不再溢出
|
||||
- 预览默认比例从 1280×720 (16:9) 改为 1080×1920 (9:16),符合抖音竖屏视频
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | **新建** 浮动预览组件 |
|
||||
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 移除内联预览,渲染浮动组件 |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 移除 preview 容器相关状态和 ResizeObserver |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 简化 props 传递,默认比例改 9:16 |
|
||||
| `frontend/src/features/home/ui/ScriptEditor.tsx` | 移动端按钮换行适配 |
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
npm run build && pm2 restart vigent2-frontend
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 多平台发布体系重构:用户隔离与抖音刷脸验证 (Day 21)
|
||||
|
||||
### 概述
|
||||
重构发布系统的两大核心问题:① 多用户场景下 Cookie/会话缺乏隔离,② 抖音登录新增刷脸验证步骤无法处理。同时修复了平台配置混用和微信视频号发布流程问题。
|
||||
|
||||
---
|
||||
|
||||
### 一、平台配置独立化
|
||||
|
||||
#### 问题
|
||||
所有平台(抖音、微信、B站、小红书)共用 WEIXIN_* 配置,导致 User-Agent、Headless 模式等设置不匹配。
|
||||
|
||||
#### 修复 — `config.py`
|
||||
- 新增 `DOUYIN_*` 独立配置项:`DOUYIN_HEADLESS_MODE`、`DOUYIN_USER_AGENT`(Chrome/144)、`DOUYIN_LOCALE`、`DOUYIN_TIMEZONE_ID`、`DOUYIN_CHROME_PATH`、`DOUYIN_FORCE_SWIFTSHADER`、调试开关等
|
||||
- 微信保持已有 `WEIXIN_*` 配置
|
||||
- B站/小红书使用通用默认值
|
||||
|
||||
#### 修复 — `qr_login_service.py` 平台配置映射
|
||||
```python
|
||||
# 之前:所有平台都用 WEIXIN 设置
|
||||
# 之后:每个平台独立配置
|
||||
PLATFORM_CONFIGS = {
|
||||
"douyin": { headless, user_agent, locale, timezone... },
|
||||
"weixin": { headless, user_agent, locale, timezone... },
|
||||
"bilibili": { 通用配置 },
|
||||
"xiaohongshu": { 通用配置 },
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 二、用户隔离的 Cookie 管理
|
||||
|
||||
#### 问题
|
||||
多用户共享同一套 Cookie 文件,用户 A 的登录态可能被用户 B 覆盖。
|
||||
|
||||
#### 修复 — `publish_service.py`
|
||||
- `_get_cookies_dir(user_id)` → `backend/user_data/{uuid}/cookies/`
|
||||
- `_get_cookie_path(user_id, platform)` → 按用户+平台返回独立 Cookie 文件路径
|
||||
- `_get_session_key(user_id, platform)` → `"{user_id}_{platform}"` 格式的会话 key
|
||||
- 登录/发布流程全链路传入 `user_id`,清理残留会话避免干扰
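
A rough sketch of the three helpers listed above, written as free functions. The directory layout and the session-key format come from the entry; the cookie file name pattern is an assumption (the repository's `.gitignore` ignores `*_cookies.json`, which suggests but does not confirm it).

```python
from pathlib import Path

USER_DATA_ROOT = Path("backend/user_data")  # root directory taken from the log entry


def get_cookies_dir(user_id, root=USER_DATA_ROOT):
    """Per-user cookie directory: backend/user_data/{user_id}/cookies/."""
    return root / str(user_id) / "cookies"


def get_cookie_path(user_id, platform, root=USER_DATA_ROOT):
    """Per-user, per-platform cookie file. The file name pattern is an assumption."""
    return get_cookies_dir(user_id, root) / f"{platform}_cookies.json"


def get_session_key(user_id, platform):
    """Key for the in-memory login session table: '{user_id}_{platform}'."""
    return f"{user_id}_{platform}"
```

Since every path and key is derived from `user_id`, two users logging into the same platform can no longer overwrite each other's state.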
|
||||
|
||||
---
|
||||
|
||||
### 三、抖音刷脸验证二维码
|
||||
|
||||
#### 问题
|
||||
抖音扫码登录后可能弹出刷脸验证窗口,内含新的二维码需要用户再次扫描,前端无法感知和展示。
|
||||
|
||||
#### 修复 — 后端 `qr_login_service.py`
|
||||
- 扩展 QR 选择器:支持跨 iframe 搜索二维码元素
|
||||
- 抖音 API 拦截:监听 `check_qrconnect` 响应,检测 `redirect_url`
|
||||
- 检测 "完成验证" / "请前往APP完成验证" 文案
|
||||
- 在验证弹窗内找到正方形二维码(排除头像),截图返回给前端
|
||||
- API 确认后直接导航到 redirect_url(不重新加载 QR 页,避免销毁会话)
|
||||
|
||||
#### 修复 — 后端 `publish_service.py`
|
||||
- `get_login_session_status()` 新增 `face_verify_qr` 字段返回
|
||||
- 登录成功且 Cookie 保存后自动清理会话
|
||||
|
||||
#### 修复 — 前端
|
||||
- `usePublishController.ts`:新增 `faceVerifyQr` 状态,轮询时获取 `face_verify_qr` 字段
|
||||
- `PublishPage.tsx`:QR 弹窗优先展示刷脸验证二维码,附提示文案
|
||||
|
||||
```tsx
|
||||
{faceVerifyQr ? (
|
||||
<>
|
||||
<Image src={`data:image/png;base64,${faceVerifyQr}`} />
|
||||
<p>需要身份验证,请用抖音APP扫描上方二维码完成刷脸验证</p>
|
||||
</>
|
||||
) : /* 普通登录二维码 */ }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 四、微信视频号发布流程优化
|
||||
|
||||
#### 修复 — `weixin_uploader.py`
|
||||
- 添加 `user_id` 参数支持,发布截图目录隔离
|
||||
- 新增 `post_create` API 响应监听,精准判断发布成功
|
||||
- 发布结果判定:URL 离开创建页 或 API 确认提交 → 视为成功
|
||||
- 标题/标签处理改为统一写入"视频描述"字段(不再单独填写 title/tags)
|
||||
|
||||
---
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/core/config.py` | 新增 DOUYIN_* 独立配置项 |
|
||||
| `backend/app/services/qr_login_service.py` | 平台配置拆分、刷脸验证二维码、跨 iframe 选择器 |
|
||||
| `backend/app/services/publish_service.py` | 用户隔离 Cookie 管理、刷脸验证状态返回 |
|
||||
| `backend/app/services/uploader/weixin_uploader.py` | user_id 支持、post_create API 监听、描述字段合并 |
|
||||
| `frontend/src/features/publish/model/usePublishController.ts` | faceVerifyQr 状态 |
|
||||
| `frontend/src/features/publish/ui/PublishPage.tsx` | 刷脸验证二维码展示 |
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
pm2 restart vigent2-backend # 发布服务 + QR登录
|
||||
npm run build && pm2 restart vigent2-frontend # 刷脸验证UI
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ 架构优化:前端结构微调 + 后端模块分层 (Day 21)
|
||||
|
||||
### 概述
|
||||
根据架构审计结果,完成前端目录规范化和后端核心模块的分层补全。
|
||||
|
||||
### 一、前端结构微调
|
||||
|
||||
#### 1. ScriptExtractionModal 迁移
|
||||
- `components/ScriptExtractionModal.tsx` → `features/home/ui/ScriptExtractionModal.tsx`
|
||||
- 连带 `components/script-extraction/` 目录一并迁移到 `features/home/ui/script-extraction/`
|
||||
- 更新 `HomePage.tsx` 的 import 路径
|
||||
|
||||
#### 2. contexts/ 目录归并
|
||||
- `src/contexts/AuthContext.tsx` → `src/shared/contexts/AuthContext.tsx`
|
||||
- `src/contexts/TaskContext.tsx` → `src/shared/contexts/TaskContext.tsx`
|
||||
- 更新 6 处 import(layout.tsx, useHomeController.ts, usePublishController.ts, AccountSettingsDropdown.tsx, GlobalTaskIndicator.tsx)
|
||||
- 删除空的 `src/contexts/` 目录
|
||||
|
||||
#### 3. 清理重构遗留空目录
|
||||
- 删除 `src/lib/`、`src/components/home/`、`src/hooks/`
|
||||
|
||||
### 二、后端模块分层补全
|
||||
|
||||
将 3 个 400+ 行的 router-only 模块拆分为 `router.py + schemas.py + service.py`:
|
||||
|
||||
| 模块 | 改造前 | 改造后 router |
|
||||
|------|--------|--------------|
|
||||
| `materials/` | 416 行 | 63 行 |
|
||||
| `tools/` | 417 行 | 33 行 |
|
||||
| `ref_audios/` | 421 行 | 71 行 |
|
||||
|
||||
业务逻辑全部提取到 `service.py`,数据模型定义在 `schemas.py`,router 只做参数校验 + 调用 service + 返回响应。
|
||||
|
||||
### 三、开发规范更新
|
||||
|
||||
`BACKEND_DEV.md` 第 8 节新增渐进原则:
|
||||
- 新模块**必须**包含 `router.py + schemas.py + service.py`
|
||||
- 改旧模块时顺手拆涉及的部分
|
||||
- 新代码高标准,旧代码逐步改
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | 从 components/ 迁入 |
|
||||
| `frontend/src/features/home/ui/script-extraction/` | 从 components/ 迁入 |
|
||||
| `frontend/src/shared/contexts/AuthContext.tsx` | 从 contexts/ 迁入 |
|
||||
| `frontend/src/shared/contexts/TaskContext.tsx` | 从 contexts/ 迁入 |
|
||||
| `backend/app/modules/materials/schemas.py` | **新建** |
|
||||
| `backend/app/modules/materials/service.py` | **新建** |
|
||||
| `backend/app/modules/materials/router.py` | 精简为薄路由 |
|
||||
| `backend/app/modules/tools/schemas.py` | **新建** |
|
||||
| `backend/app/modules/tools/service.py` | **新建** |
|
||||
| `backend/app/modules/tools/router.py` | 精简为薄路由 |
|
||||
| `backend/app/modules/ref_audios/schemas.py` | **新建** |
|
||||
| `backend/app/modules/ref_audios/service.py` | **新建** |
|
||||
| `backend/app/modules/ref_audios/router.py` | 精简为薄路由 |
|
||||
| `Docs/BACKEND_DEV.md` | 目录结构标注分层、新增渐进原则 |
|
||||
| `Docs/BACKEND_README.md` | 目录结构标注分层 |
|
||||
| `Docs/FRONTEND_DEV.md` | 更新目录结构(contexts 迁移、ScriptExtractionModal 迁移) |
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
pm2 restart vigent2-backend
|
||||
npm run build && pm2 restart vigent2-frontend
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎬 多素材视频生成(多机位效果)
|
||||
|
||||
### 概述
|
||||
支持用户上传多个不同角度的自拍视频,生成视频时按句子自动切换素材,最终效果类似多机位拍摄。单素材时走原有流程,无额外开销。
|
||||
|
||||
### 核心架构
|
||||
|
||||
#### 流水线变更
|
||||
```
|
||||
【单素材(不变)】
|
||||
text → TTS → audio → LatentSync(1个素材+完整audio) → Whisper字幕 → Remotion → 成片
|
||||
|
||||
【多素材(新增)】
|
||||
text → TTS → audio → Whisper字幕(提前) → 按素材数量均分时长(对齐字边界)
|
||||
→ 对每段: 切分audio + LatentSync(素材[i]+音频片段[i])
|
||||
→ FFmpeg拼接所有片段 → Remotion(完整字幕时间戳) → 成片
|
||||
```
|
||||
|
||||
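#### Material switching logic (equal split)

1. Whisper transcribes the full audio and yields per-character timestamps
2. The total audio duration is **split evenly by the number of materials** (`total_duration / N`)
3. Each split point is snapped to the nearest Whisper character boundary, so no cut lands in the middle of a character
4. The first segment's start is extended to 0.0 and the last segment's end to the end of the audio, which guarantees full coverage

> **Design decision**: The first approach split sentences on punctuation in the original script, but user scripts often contain no full stops (only commas), which produced a single segment. The equal-split approach does not depend on script punctuation and splits any input correctly.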
|
||||
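
The four steps above can be sketched as one function. This is not the project's `_split_equal()`: the signature, the `(start, end)` tuple input and the tie-breaking are assumptions, and the `min(n, len(words))` cap mirrors the guard described in the Day 22 log.

```python
def split_equal(words, n_materials, audio_duration):
    """Split [0, audio_duration] into up to n_materials spans cut at word boundaries.

    `words` is a list of (start, end) tuples in seconds, sorted by time.
    """
    if not words or n_materials <= 1:
        return [(0.0, audio_duration)]
    n = min(n_materials, len(words))  # never ask for more spans than words
    boundaries = [end for _, end in words[:-1]]  # candidate cut points between words
    cuts = []
    for k in range(1, n):
        ideal = audio_duration * k / n
        # Only boundaries after the previous cut are eligible, so cuts stay increasing.
        remaining = [b for b in boundaries if not cuts or b > cuts[-1]]
        if not remaining:
            break
        cuts.append(min(remaining, key=lambda b: abs(b - ideal)))
    # First span starts at 0.0 and the last one ends at the audio end (step 4).
    edges = [0.0] + cuts + [audio_duration]
    return list(zip(edges[:-1], edges[1:]))
```

Span `i` is then paired with material `i`; when fewer spans than materials come back, the extra materials are simply unused.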
|
||||
---
|
||||
|
||||
### 一、后端改动
|
||||
|
||||
#### 1. `backend/app/modules/videos/schemas.py`
|
||||
- 新增 `material_paths: Optional[List[str]]` 字段
|
||||
- 保留 `material_path: str` 向后兼容
|
||||
|
||||
#### 2. `backend/app/modules/videos/workflow.py`(核心改动)
|
||||
|
||||
**新增函数**:
|
||||
- `_split_equal(segments, material_paths)`: 按素材数量均分音频时长,对齐到最近的 Whisper 字边界
|
||||
|
||||
**修改 `process_video_generation()`**:
|
||||
- `is_multi = len(material_paths) > 1` 判断走多素材/单素材分支
|
||||
- 多素材分支:Whisper 提前 → 均分切分 → 音频切分 → 逐段 LatentSync → FFmpeg 拼接
|
||||
|
||||
#### 3. `backend/app/services/video_service.py`
|
||||
- 新增 `concat_videos()`: FFmpeg concat demuxer (`-c copy`) 拼接视频片段
|
||||
- 新增 `split_audio()`: FFmpeg 按时间范围切分音频 (`-ss` + `-t` + `-c copy`)
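
The two FFmpeg invocations named above can be sketched as argument builders. The helper names are hypothetical; the flags (`-ss`, `-t`, `-c copy`, `-f concat`) are the ones the entry mentions plus `-safe 0`, which the concat demuxer needs for absolute paths. The range and empty-list checks follow the Day 22 fixes.

```python
def build_split_audio_cmd(src, start, end, dst):
    """ffmpeg argv that stream-copies the [start, end) range of `src` into `dst`."""
    duration = end - start
    if duration <= 0:
        raise ValueError(f"invalid range: start={start}, end={end}")
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", str(src),
            "-t", f"{duration:.3f}", "-c", "copy", str(dst)]


def concat_list_text(video_paths):
    """Contents of the concat list file, one `file '...'` line per segment."""
    if not video_paths:
        raise ValueError("video_paths is empty")
    return "".join(f"file '{p}'\n" for p in video_paths)


def build_concat_cmd(list_file, dst):
    """ffmpeg argv for the concat demuxer reading `list_file`."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(dst)]
```

Paths that contain a single quote would need escaping in the list file; the sketch does not handle that case.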
|
||||
|
||||
#### 4. `backend/scripts/watchdog.py`
|
||||
- 健康检查阈值从 3 次提高到 5 次(容忍期 2.5 分钟)
|
||||
- 新增重启后 120 秒冷却期,避免模型加载期间被误判为故障
|
||||
- 启动时给所有服务 60 秒初始冷却期
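
One way to model the threshold and cooldown rules above is a small state object per service. `ServiceWatch` is a hypothetical class for illustration, not the structure of `watchdog.py`; the numbers (5 failures, 120 s and 60 s) come from the entry.

```python
import time


class ServiceWatch:
    """Track consecutive health-check failures with startup and post-restart cooldowns."""

    def __init__(self, threshold=5, restart_cooldown=120, startup_cooldown=60,
                 clock=time.monotonic):
        self.threshold = threshold
        self.restart_cooldown = restart_cooldown
        self.clock = clock
        self.failures = 0
        self.cooldown_until = clock() + startup_cooldown

    def record(self, healthy):
        """Feed one check result; return True when a restart should be triggered."""
        if self.clock() < self.cooldown_until:
            return False  # still warming up: ignore the result entirely
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0
            self.cooldown_until = self.clock() + self.restart_cooldown
            return True
        return False
```

Injecting `clock` keeps the logic testable without sleeping; the caller performs the actual restart when `record` returns `True`.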
|
||||
|
||||
---
|
||||
|
||||
### 二、前端改动
|
||||
|
||||
#### 1. 新增依赖
|
||||
```bash
|
||||
npm install @dnd-kit/core @dnd-kit/sortable @dnd-kit/utilities
|
||||
```
|
||||
|
||||
#### 2. `frontend/src/features/home/model/useMaterials.ts`
|
||||
- `selectedMaterial: string` → `selectedMaterials: string[]`(多选)
|
||||
- 新增 `toggleMaterial(id)`: 切换选中/取消(至少保留1个)
|
||||
- 新增 `reorderMaterials(activeId, overId)`: 拖拽排序
|
||||
- 上传格式扩展:新增 `.mkv/.webm/.flv/.wmv/.m4v/.ts/.mts`
|
||||
|
||||
#### 3. `frontend/src/features/home/ui/MaterialSelector.tsx`(重写)
|
||||
- 素材列表每行增加复选框 + 序号徽标(①②③)
|
||||
- 选中 ≥2 个时显示拖拽排序区(@dnd-kit `SortableContext`)
|
||||
- 每个排序项:拖拽把手 + 序号 + 素材名 + 移除按钮
|
||||
- HTML input accept 改为 `video/*`
|
||||
|
||||
#### 4. `frontend/src/features/home/model/useHomeController.ts`
|
||||
- 多素材 payload:`material_paths` 数组 + `material_path` 向后兼容
|
||||
- `enable_subtitles` 硬编码为 `true`(移除开关)
|
||||
- 验证:至少选中 1 个素材
|
||||
|
||||
#### 5. `frontend/src/features/home/model/useHomePersistence.ts`
|
||||
- 素材持久化改为 JSON 数组,向后兼容旧格式(单字符串)
|
||||
- 移除 `enableSubtitles` 持久化
|
||||
|
||||
#### 6. `frontend/src/features/home/ui/TitleSubtitlePanel.tsx`
|
||||
- 移除"逐字高亮字幕"开关,字幕样式区始终显示
|
||||
|
||||
#### 7. `frontend/src/features/home/ui/HomePage.tsx`
|
||||
- 更新 props 传递(`selectedMaterials`, `toggleMaterial`, `reorderMaterials`)
|
||||
|
||||
---
|
||||
|
||||
### 三、Bug 修复记录
|
||||
|
||||
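#### BUG-1: Multi-material job uses only the first video (punctuation-based sentence splitting failed)

- **Symptom**: Two materials were selected but the generated video used only the first; the log showed `Multi-material: 1 segments, 2 materials`.
- **Root cause v1**: Sentences were originally split with the regex `[。!?!?]` on Whisper output, but Whisper does not emit punctuation.
- **Fix v1**: Split on punctuation in the original script instead. User scripts often contain only commas (,) and no sentence-ending punctuation (。!?), so this still collapsed to one segment.
- **Final fix**: Punctuation-based splitting was dropped entirely in favor of `_split_equal()`, which **splits the audio duration evenly by the number of materials** and snaps each cut to the nearest Whisper character boundary. It relies on no punctuation and works for every script.

#### BUG-2: Lip sync drifts (audio time offset)

- **Root cause**: `split_audio` cut the audio with Whisper's start/end times (e.g. 0.11~7.21), while `compose()` used the full original audio (0.0 to the end), which introduced a time offset.
- **Fix**: Force the first segment's start to 0.0 and the last segment's end to the actual audio duration so the cut audio covers the whole track.

#### BUG-3: `min_segment_sec` over-merging caused degradation (removed with the approach change)

- **Root cause**: In the old approach, when the second of two sentences was shorter than 3 seconds, the minimum-duration check merged them into one segment and the multi-material job degraded to single-material.
- **Status**: The equal-split approach does not have this problem; the related code has been removed.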
|
||||
|
||||
---
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
| 文件 | 变更类型 | 说明 |
|
||||
|------|----------|------|
|
||||
| `backend/app/modules/videos/schemas.py` | 修改 | 新增 material_paths 字段 |
|
||||
| `backend/app/modules/videos/workflow.py` | 修改 | 多素材流水线核心逻辑 + 3个 Bug 修复 |
|
||||
| `backend/app/services/video_service.py` | 修改 | 新增 concat_videos / split_audio |
|
||||
| `backend/scripts/watchdog.py` | 修改 | 阈值优化 + 冷却期机制 |
|
||||
| `frontend/package.json` | 修改 | 新增 @dnd-kit 依赖 |
|
||||
| `frontend/src/features/home/model/useMaterials.ts` | 修改 | 多选 + 排序状态管理 |
|
||||
| `frontend/src/features/home/ui/MaterialSelector.tsx` | 重写 | 多选复选框 + 拖拽排序 UI |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 修改 | 多素材 payload + 移除字幕开关 |
|
||||
| `frontend/src/features/home/model/useHomePersistence.ts` | 修改 | JSON 数组持久化 |
|
||||
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 修改 | 移除字幕开关 |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 修改 | 更新 props 传递 |
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
pm2 restart vigent2-backend
|
||||
npm run build && pm2 restart vigent2-frontend
|
||||
```
|
||||
221
Docs/DevLogs/Day22.md
Normal file
@@ -0,0 +1,221 @@
|
||||
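## 🔧 Multi-material generation: optimization and robustness hardening (Day 22)

### Overview

A full review of the multi-material (multi-camera) video generation feature built on Day 21: 6 high-priority bugs fixed, 8 UX improvements completed, and the multi-material pipeline refactored from "LatentSync per segment" to a "concatenate first, then infer" architecture, which cuts the number of inference runs from N to 1.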
|
||||
|
||||
---
|
||||
|
||||
### 一、后端高优 Bug 修复
|
||||
|
||||
#### 1. `_split_equal()` 素材数 > 字符数边界溢出
|
||||
- **问题**: 5 个素材但只有 2 个 Whisper 字符时,边界索引重复,部分素材被跳过
|
||||
- **修复**: 加入 `n = min(n, len(all_chars))` 上限保护
|
||||
- **文件**: `backend/app/modules/videos/workflow.py`
|
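
The clamp can be illustrated with a minimal sketch. The function below is a hypothetical stand-in for `_split_equal()` (its real signature and return type are not shown in this log); it only demonstrates why capping `n` at the character count keeps every range non-empty.

```python
from typing import List, Sequence, Tuple


def split_equal(chars: Sequence[str], n: int) -> List[Tuple[int, int]]:
    """Split `chars` into at most `n` contiguous, non-empty index ranges.

    Hypothetical helper: returns (start, end) index pairs, end exclusive.
    """
    if not chars or n <= 0:
        return []
    # The fix described above: never request more groups than characters,
    # otherwise several boundaries collapse onto the same index.
    n = min(n, len(chars))
    total = len(chars)
    # Integer boundaries; strictly increasing because total / n >= 1.
    bounds = [(i * total) // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(n)]
```

With 5 materials and 2 characters, only the first 2 materials receive a range; what happens to the remaining materials is decided by the caller and is not covered here.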
||||
|
||||
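#### 2. No fallback when a single LatentSync segment fails in multi-material mode

- **Problem**: in single-material mode a LatentSync failure falls back to the original material, but multi-material mode raised the exception and failed the whole task
- **Fix**: added try-except inside the multi-material loop; on failure it falls back to the original material clip
- **File**: `backend/app/modules/videos/workflow.py`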
|
||||
|
||||
#### 3. ZeroDivisionError when `num_segments == 0`

- **Problem**: once every assignment had been skipped, `i / num_segments` divided by zero
- **Fix**: check `if num_segments == 0` before the loop and raise an explicit error
- **File**: `backend/app/modules/videos/workflow.py`
|
||||
|
||||
#### 4. `split_audio` did not validate duration > 0

- **Problem**: FFmpeg misbehaves when `end <= start`
- **Fix**: added `if duration <= 0: raise ValueError(...)`
- **File**: `backend/app/services/video_service.py`
|
||||
|
||||
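#### 5. Equal split by duration as a fallback when Whisper fails

- **Problem**: after a Whisper failure the job degraded straight to single-material and the other materials were wasted
- **Fix**: split evenly by `audio_duration / len(material_paths)`, with no dependence on character alignment
- **File**: `backend/app/modules/videos/workflow.py`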
|
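
A minimal sketch of the duration-based fallback, assuming assignments are plain dicts with `material_path`, `start`, and `end` keys (the actual assignment structure in `workflow.py` is not shown in this log):

```python
from typing import Dict, List


def split_by_duration(audio_duration: float, material_paths: List[str]) -> List[Dict]:
    """Divide the audio evenly across materials without any Whisper data."""
    if audio_duration <= 0:
        raise ValueError("audio_duration must be positive")
    if not material_paths:
        raise ValueError("material_paths must not be empty")

    seg = audio_duration / len(material_paths)
    last = len(material_paths) - 1
    assignments = []
    for i, path in enumerate(material_paths):
        start = i * seg
        # Pin the last segment to the exact audio length so that
        # floating-point drift cannot leave a gap at the end.
        end = audio_duration if i == last else (i + 1) * seg
        assignments.append({"material_path": path, "start": start, "end": end})
    return assignments
```

Pinning the first start to 0 and the last end to the audio duration mirrors the alignment rule used elsewhere in this log for Whisper-based splitting.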
||||
|
||||
#### 6. `concat_videos` did not check for an empty list

- **Problem**: FFmpeg errors out when an empty `video_paths` is passed in
- **Fix**: added `if not video_paths: raise ValueError(...)`
- **File**: `backend/app/services/video_service.py`
|
||||
|
||||
---
|
||||
|
||||
### II. Frontend improvements

#### 1. Non-null assertion fix in payload construction

- `m!.path` → `m?.path` + `.filter(Boolean)`, which prevents a crash after a material has been deleted
- **File**: `frontend/src/features/home/model/useHomeController.ts`

#### 2. Generate button shows backend progress messages

- Added a `message` prop; while generating it shows text such as "(processing segment 2/3...)"
- **Files**: `frontend/src/features/home/ui/GenerateActionBar.tsx`, `HomePage.tsx`

#### 3. Newly uploaded materials are selected automatically

- After a successful upload, the material lists before and after are compared and the new IDs are appended to `selectedMaterials`
- **File**: `frontend/src/features/home/model/useMaterials.ts`

#### 4. Unified Material interface

- Three duplicate `interface Material` definitions were extracted into `shared/types/material.ts`
- **Files**: `frontend/src/shared/types/material.ts` (new), `useMaterials.ts`, `useHomeController.ts`, `MaterialSelector.tsx`

#### 5. Drag-to-sort fix

- Removed `DragOverlay` (`backdrop-blur` creates a new containing block, which broke positioning)
- Switched to native `useSortable` dragging + `CSS.Translate`; the dragged element is highlighted with a shadow
- **File**: `frontend/src/features/home/ui/MaterialSelector.tsx`

#### 6. Material selection capped at 4

- `toggleMaterial` now enforces `MAX_MATERIALS = 4`
- Once the limit is reached, unselected items become semi-transparent and disabled; the hint text changes to "multi-select, up to 4"
- **Files**: `useMaterials.ts`, `MaterialSelector.tsx`

#### 7. Responsive sort area on mobile

- Material list `max-h-64` → `max-h-48 sm:max-h-64`
- **File**: `MaterialSelector.tsx`

#### 8. Multi-material duration hint

- When 2 or more materials are selected, the text "Multi-material mode (N cameras), generation takes longer" appears below the generate button
- **Files**: `GenerateActionBar.tsx`, `HomePage.tsx`
|
||||
|
||||
---
|
||||
|
||||
### III. Core architecture refactor: concatenate first, then infer

#### V1 (Day 21): LatentSync per segment

```
Material A → LatentSync(material A, audio segment 1) → lipsync_A
Material B → LatentSync(material B, audio segment 2) → lipsync_B
FFmpeg concat(lipsync_A, lipsync_B) → final video
```

- Drawback: N materials = N LatentSync inference runs (about 30s each)

#### V2 (Day 22): concatenate first, then infer

```
Material A → prepare_segment(trim to 3.67s) → prepared_A
Material B → prepare_segment(trim to 4.00s) → prepared_B
FFmpeg concat(prepared_A, prepared_B) → concat_video (7.67s)
LatentSync(concat_video, full audio) → final video
```

- Benefit: only **1** LatentSync inference run is needed, so the time drops from about N×30s to about 1×30s
|
||||
|
||||
#### New `prepare_segment()` method

```python
def prepare_segment(self, video_path, target_duration, output_path, target_resolution=None):
    # Source longer than target: trim (-t)
    # Source shorter than target: loop (-stream_loop) + trim
    # Same resolution: -c copy, lossless (no re-encoding)
    # Different resolution: scale + pad to the first material's resolution
    ...
```
|
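
The four branches above can be sketched as a pure function that assembles an FFmpeg argument list. This is an illustration, not the project's `prepare_segment()`: the function name, the explicit `source_duration` / `source_resolution` parameters, and the exact flag ordering are assumptions. The FFmpeg options themselves (`-stream_loop`, `-t`, `-c copy`, and the `scale` / `pad` filters) are standard.

```python
from typing import List, Optional, Tuple

Resolution = Tuple[int, int]


def build_prepare_cmd(
    video_path: str,
    source_duration: float,
    target_duration: float,
    output_path: str,
    source_resolution: Resolution,
    target_resolution: Optional[Resolution] = None,
) -> List[str]:
    """Assemble an ffmpeg command that fits one clip to `target_duration`."""
    cmd = ["ffmpeg", "-y"]
    if source_duration < target_duration:
        # Clip is too short: loop the input; -t below cuts it to length.
        cmd += ["-stream_loop", "-1"]
    cmd += ["-i", video_path, "-t", f"{target_duration:.3f}"]

    if target_resolution is None or target_resolution == source_resolution:
        # Resolutions agree: stream copy, no re-encoding.
        cmd += ["-c", "copy"]
    else:
        # Resolutions differ: fit inside the target frame, then pad to center.
        w, h = target_resolution
        vf = (
            f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
            f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2"
        )
        cmd += ["-vf", vf]
    cmd.append(output_path)
    return cmd
```

Keeping command construction separate from `subprocess` execution makes the branch logic easy to unit-test without FFmpeg installed.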
||||
|
||||
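#### Resolution handling

- Added a `get_resolution()` method to detect each material's resolution
- When all materials share the same resolution: lossless trim with `-c copy` (original quality preserved)
- When resolutions differ: everything is normalized to the first material's resolution using `force_original_aspect_ratio=decrease` + centered `pad`
- LatentSync only processes the 512×512 mouth region; the output keeps the original resolution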
|
||||
|
||||
#### Time alignment check

| Stage | Time base | Alignment |
|-------|-----------|-----------|
| TTS audio | Original duration (7.67s) | Reference |
| Whisper subtitles | Based on the TTS audio | Timestamps aligned to the audio |
| Equal split | Total assignment duration = audio duration | First segment start=0, last segment end=audio_duration |
| prepare (each segment) | `-t seg_dur` exact cut | Sum ≈ audio duration |
| LatentSync | concat_video + full audio | 0.5s internal tolerance |
| compose | lipsync_video + audio/BGM | `-shortest` keeps them in sync |
| Remotion | Renders subtitles from captions_path | Timestamps aligned to the audio |
|
||||
|
||||
---
|
||||
|
||||
### Files touched

| File | Change type | Notes |
|------|-------------|-------|
| `backend/app/modules/videos/workflow.py` | Modified | 6 bug fixes + pipeline refactor (concatenate first, then infer) |
| `backend/app/services/video_service.py` | Modified | Added `prepare_segment()` and `get_resolution()`, `split_audio` validation, `concat_videos` empty-list check |
| `frontend/src/shared/types/material.ts` | New | Unified Material interface |
| `frontend/src/features/home/model/useMaterials.ts` | Modified | Auto-select on upload, 4-material cap |
| `frontend/src/features/home/model/useHomeController.ts` | Modified | Payload non-null assertion fix, Material interface import |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Modified | Drag fix, 4-material cap UI, mobile responsiveness |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | Modified | Progress message display, multi-material duration hint |
| `frontend/src/features/home/ui/HomePage.tsx` | Modified | Passes the `message` and `materialCount` props |
|
||||
|
||||
---
|
||||
|
||||
### IV. AI multilingual translation

#### Feature

An "AI multilingual" button was added to the script editor. It translates a Chinese voice-over script into any of 9 languages in one click, and the original can be restored at any time.

#### Supported languages

English, Japanese (日本語), Korean (한국어), French (Français), German (Deutsch), Spanish (Español), Russian (Русский), Italian (Italiano), Portuguese (Português)

#### Implementation

##### Backend

- **`backend/app/services/glm_service.py`**: added a `translate_text()` method that calls the Zhipu GLM API (temperature=0.3); the prompt asks for the translation only, keeping the original tone and style
- **`backend/app/modules/ai/router.py`**: added the `POST /api/ai/translate` endpoint, which accepts `{text, target_lang}` and returns `{translated_text}`

##### Frontend

- **`frontend/src/features/home/ui/ScriptEditor.tsx`**: added the `LANGUAGES` list (9 languages), a language dropdown (closes on outside click), a loading state during translation, and a "Restore original" button (shown at the top of the menu once a translation has been made)
- **`frontend/src/features/home/model/useHomeController.ts`**: added `handleTranslate` (calls the translation API and saves the original on the first translation), the `originalText` state, and `handleRestoreOriginal` (restores the original)

#### Files touched

| File | Change | Notes |
|------|--------|-------|
| `backend/app/services/glm_service.py` | Modified | Added the `translate_text()` method |
| `backend/app/modules/ai/router.py` | Modified | Added the `/api/ai/translate` endpoint |
| `frontend/src/features/home/ui/ScriptEditor.tsx` | Modified | Language menu UI, translation loading state, restore-original button |
| `frontend/src/features/home/model/useHomeController.ts` | Modified | `handleTranslate`, `originalText`, `handleRestoreOriginal` |
|
||||
|
||||
---
|
||||
|
||||
### V. Multilingual TTS support

#### Background

With translation in place, users can turn a Chinese script into other languages. TTS, however, still only supported Chinese when generating the video:

- **EdgeTTS**: the voice list contained only 5 `zh-CN-*` Chinese voices
- **Voice cloning (Qwen3-TTS)**: the `language` parameter was hard-coded to `"Chinese"`

#### Implementation

##### 1. Frontend: language-aware voice list

- `VOICES` grew from a flat array into `Record<string, VoiceOption[]>` covering 10 locales (zh-CN / en-US / ja-JP / ko-KR / fr-FR / de-DE / es-ES / ru-RU / it-IT / pt-BR), with 2 voices each (male/female)
- Added the `LANG_TO_LOCALE` map: translation target language name → EdgeTTS locale (e.g. `"English" → "en-US"`)
- Added the `textLang` state, which tracks the current script language and defaults to `"zh-CN"`

##### 2. Automatic voice switch on translation

- After `handleTranslate` succeeds: `textLang` is set from the target language, and in EdgeTTS mode `voice` switches to that language's default voice
- On `handleRestoreOriginal`: `textLang` is reset to `"zh-CN"` and the default Chinese voice is restored
- `VoiceSelector` shows the voice list for the current `textLang`

##### 3. Passing the language through to voice cloning

- Frontend: added the `LOCALE_TO_QWEN_LANG` map (`zh-CN→"Chinese"`, `en-US→"English"`, everything else→`"Auto"`)
- The generate request payload now carries a `language` field (voice-clone mode only)
- The backend `GenerateRequest` schema gained `language: str = "Chinese"`
- `workflow.py`: the hard-coded `language="Chinese"` became `language=req.language`

##### 4. Bug fix: persisting textLang

- **Problem**: `voice` was persisted but `textLang` was not. After a page refresh `voice` came back as an English voice while `textLang` fell back to Chinese, so VoiceSelector showed the Chinese voice list with an English voice selected and no button highlighted
- **Fix**: added localStorage read/write for `textLang` in `useHomePersistence`

#### Data flow

```
User translates to "English"
  → ScriptEditor.onTranslate("English")
  → LANG_TO_LOCALE["English"] = "en-US"
  → setTextLang("en-US"), setVoice("en-US-GuyNeural")
  → VoiceSelector shows VOICES["en-US"] = [Guy, Jenny]
  → On generate:
      EdgeTTS: payload.voice = "en-US-GuyNeural"
      Voice clone: payload.language = "English" (via getQwenLanguage)
```

#### Files touched

| File | Change | Notes |
|------|--------|-------|
| `frontend/src/features/home/model/useHomeController.ts` | Modified | Multilingual `VOICES` record, `textLang` state, `LANG_TO_LOCALE` / `LOCALE_TO_QWEN_LANG` maps, automatic voice switch on translation |
| `frontend/src/features/home/model/useHomePersistence.ts` | Modified | Persists `textLang` |
| `backend/app/modules/videos/schemas.py` | Modified | `GenerateRequest` gained the `language` field |
| `backend/app/modules/videos/workflow.py` | Modified | Voice-clone call uses `req.language` instead of the hard-coded value |
|
||||
856
Docs/DevLogs/Day23.md
Normal file
@@ -0,0 +1,856 @@
|
||||
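## 🎙️ Voice-Over First Refactor: Phase 1 (Day 23)

### Overview

Voice-over generation was split out of the video generation flow, giving a new workflow: generate the voice-over, select it, pick materials, then generate the video. Users can manage voice-overs independently (generate / preview / rename / delete / select) and see the duration once one is selected, which provides the data needed for the material timeline editor in phase 2.

**Old flow**: script + pick materials → one-click generate (inline TTS → Whisper → equal split → LipSync → compose)

**New flow**: script → voice mode → **generate voice-over** → select voice-over → pick materials → background music → generate video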
|
||||
|
||||
---
|
||||
|
||||
### I. Backend: new `generated_audios` module

#### Module layout

```
backend/app/modules/generated_audios/
├── __init__.py
├── router.py          # 5 API endpoints
├── schemas.py         # request/response models
└── service.py         # generate / list / delete / rename
```

#### API endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | `/api/generated-audios/generate` | Generate a voice-over asynchronously (returns task_id) |
| GET | `/api/generated-audios/tasks/{task_id}` | Poll generation progress |
| GET | `/api/generated-audios` | List all of the user's voice-overs |
| DELETE | `/api/generated-audios/{audio_id}` | Delete a voice-over |
| PUT | `/api/generated-audios/{audio_id}` | Rename |
|
||||
|
||||
#### Storage

- Supabase storage bucket: `generated-audios` (created automatically at startup)
- Audio file: `{user_id}/{timestamp}_audio.wav`
- Metadata file: `{user_id}/{timestamp}_audio.json` (contains display_name, text, tts_mode, duration_sec, and so on)
|
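
As an illustration of the naming scheme, the sketch below builds the two object keys and a metadata payload. The helper name, the integer Unix timestamp, and the exact set of metadata fields beyond those listed above are assumptions; the upload itself is left out.

```python
import json
import time
from typing import Optional, Tuple


def build_audio_objects(
    user_id: str,
    display_name: str,
    text: str,
    tts_mode: str,
    duration_sec: float,
    timestamp: Optional[int] = None,
) -> Tuple[str, str, str]:
    """Return (wav_key, json_key, metadata_json) for one generated voice-over."""
    # Assumption: the timestamp is whole seconds since the epoch.
    ts = int(time.time()) if timestamp is None else timestamp
    base = f"{user_id}/{ts}_audio"
    metadata = {
        "display_name": display_name,
        "text": text,
        "tts_mode": tts_mode,
        "duration_sec": duration_sec,
    }
    # ensure_ascii=False keeps non-ASCII script text readable in the JSON file.
    return f"{base}.wav", f"{base}.json", json.dumps(metadata, ensure_ascii=False)
```

Because both keys share the same `{user_id}/{timestamp}_audio` stem, either file can be located from the other.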
||||
|
||||
#### Generation flow

Reuses the existing `TTSService` / `voice_clone_service` / `task_store`:

```
POST /generate → create task → BackgroundTask:
  1. edgetts → TTSService.generate_audio()
     voiceclone → download ref_audio → voice_clone_service.generate_audio()
  2. ffprobe to get the duration
  3. upload .wav + .json to the generated-audios bucket
  4. update task (status=completed, output={audio_id, duration_sec, ...})
```
|
||||
|
||||
---
|
||||
|
||||
### II. Backend: changes to the video generation workflow

#### New `GenerateRequest` field

```python
generated_audio_id: Optional[str] = None  # ID of a pre-generated voice-over (skips inline TTS when set)
```

#### New branch in the TTS stage of `workflow.py`

```python
if req.generated_audio_id:
    ...  # download the pre-generated voice-over and read language from its metadata
elif req.tts_mode == "voiceclone":
    ...  # existing voice-clone logic
else:
    ...  # existing EdgeTTS logic
```

Backward compatible: when `generated_audio_id` is not sent, the existing inline TTS flow is unaffected.
|
||||
|
||||
---
|
||||
|
||||
### III. Frontend: new voice-over list hook + panel

#### `useGeneratedAudios.ts`

- State: `generatedAudios[]`, `selectedAudio`, `isGeneratingAudio`, `audioTask`
- Methods: `fetchGeneratedAudios()`, `generateAudio()`, `deleteAudio()`, `renameAudio()`, `selectAudio()`
- Polling: after a generate request the task status is polled every 1s; on completion the list refreshes and the newest voice-over is selected
- Independent of the video generation TaskContext (the two do not interfere)

#### `GeneratedAudiosPanel.tsx`

- Per voice-over: play/pause, name, duration, rename, delete
- Selected state: `border-purple-500 bg-purple-500/20`
- Embedded progress bar (shown while generating)
- The original script of the selected voice-over is shown at the bottom (truncated)
- Playback logic is self-contained in the panel (`new Audio()` + play/pause toggle)
|
||||
|
||||
---
|
||||
|
||||
### IV. Frontend: panel reordering

**Old order**: MaterialSelector → ScriptEditor → TitleSubtitle → VoiceSelector → BgmPanel → GenerateActionBar

**New order**:

1. ScriptEditor (script editing)
2. TitleSubtitlePanel (title and subtitle styles)
3. VoiceSelector (voice mode)
4. **GeneratedAudiosPanel** (voice-over list) ← new
5. MaterialSelector (video materials) ← moved down; unlocked only after a voice-over is selected
6. BgmPanel (background music)
7. GenerateActionBar (generate video)

#### Material area gating

When no voice-over is selected, the material area shows a semi-transparent overlay with the hint "Generate and select a voice-over first". Upload / preview / rename / delete stay available at all times; only the selection checkboxes are masked.

#### Duration info

Once a voice-over is selected, the top of MaterialSelector shows:

```
Current voice-over: 45.2 s | 3 materials selected (split evenly, ~15.1 s each)
```

#### Updated generate-button condition

```typescript
// Old condition
disabled={isGenerating || selectedMaterials.length === 0 || (ttsMode === "voiceclone" && !selectedRefAudio)}
// New condition
disabled={isGenerating || selectedMaterials.length === 0 || !selectedAudio}
```
|
||||
|
||||
---
|
||||
|
||||
### V. Persistence

`useHomePersistence` now reads and writes `selectedAudioId` in localStorage, so the selected voice-over is restored after a page refresh.
|
||||
|
||||
---
|
||||
|
||||
### Files touched

#### Backend: new

| File | Notes |
|------|-------|
| `backend/app/modules/generated_audios/__init__.py` | Module marker |
| `backend/app/modules/generated_audios/router.py` | 5 API endpoints |
| `backend/app/modules/generated_audios/service.py` | Generate / list / delete / rename |
| `backend/app/modules/generated_audios/schemas.py` | Request/response models |

#### Backend: modified

| File | Change |
|------|--------|
| `backend/app/main.py` | Registers the generated_audios router |
| `backend/app/services/storage.py` | Added `BUCKET_GENERATED_AUDIOS`; the bucket is created at startup |
| `backend/app/modules/videos/schemas.py` | `GenerateRequest` gained the `generated_audio_id` field |
| `backend/app/modules/videos/workflow.py` | TTS stage gained the pre-generated audio branch |

#### Frontend: new

| File | Notes |
|------|-------|
| `frontend/src/features/home/model/useGeneratedAudios.ts` | Voice-over list hook |
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | Voice-over list panel |

#### Frontend: modified

| File | Change |
|------|--------|
| `frontend/src/features/home/ui/HomePage.tsx` | Panel reordering + material area gating + inserted GeneratedAudiosPanel |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Added the `selectedAudioDuration` prop + duration info |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | Disable condition changed to `!selectedAudio` |
| `frontend/src/features/home/model/useHomeController.ts` | Integrates useGeneratedAudios, adds handleGenerateAudio, handleGenerate now sends generated_audio_id |
| `frontend/src/features/home/model/useHomePersistence.ts` | Persists selectedAudioId |
|
||||
|
||||
---
|
||||
|
||||
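## 🎞️ Material Timeline Editing: Phase 2 (Day 23)

### Overview

Building on the phase 1 "voice-over first" flow, a **timeline editor** was added. Users can:

1. See how much time each material block gets, laid over the audio waveform
2. Drag the dividers to adjust each material's duration (the blocks tile the audio with no gaps; resizing one compresses or expands its neighbor)
3. Set a **source start offset** for each material (start from any point in the video instead of always from the beginning)

**Old behavior**: multiple materials were split evenly (`_split_equal`), with no control over per-segment duration or source start

**New behavior**: visual allocation in the timeline editor + drag to adjust + ClipTrimmer for source range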
|
||||
|
||||
---
|
||||
|
||||
### I. Backend changes

#### 1.1 New `CustomAssignment` model

```python
# backend/app/modules/videos/schemas.py
class CustomAssignment(BaseModel):
    material_path: str
    start: float               # start on the audio timeline
    end: float                 # end on the audio timeline
    source_start: float = 0.0  # start offset within the source video
```

`GenerateRequest` gained `custom_assignments: Optional[List[CustomAssignment]] = None`. When present, the Whisper-based equal split is skipped and the user-defined allocation is used directly.
|
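
Since user-defined assignments replace the automatic split, it can help to check that they tile the audio timeline before running the pipeline. The sketch below is a hypothetical validation step, not something this log says exists in `workflow.py`; it uses a plain dataclass as a stand-in for the Pydantic `CustomAssignment` so that it runs without dependencies, and the 0.05s tolerance is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Assignment:
    """Stand-in for CustomAssignment (same fields, no Pydantic)."""
    material_path: str
    start: float
    end: float
    source_start: float = 0.0


def validate_assignments(items: List[Assignment], audio_duration: float,
                         tol: float = 0.05) -> None:
    """Raise ValueError unless the segments cover [0, audio_duration] gaplessly."""
    if not items:
        raise ValueError("custom_assignments must not be empty")
    cursor = 0.0
    for i, a in enumerate(items):
        if a.end <= a.start:
            raise ValueError(f"segment {i}: end must be greater than start")
        if a.source_start < 0:
            raise ValueError(f"segment {i}: source_start must be >= 0")
        if abs(a.start - cursor) > tol:
            raise ValueError(f"segment {i}: gap or overlap at {a.start:.2f}s")
        cursor = a.end
    if abs(cursor - audio_duration) > tol:
        raise ValueError("segments do not end at the audio duration")
```

The check passes silently on valid input and names the offending segment otherwise.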
||||
|
||||
#### 1.2 `prepare_segment` supports `source_start`

```python
def prepare_segment(self, video_path, target_duration, output_path,
                    target_resolution=None, source_start: float = 0.0):
```

Key logic:

- When `source_start > 0`, `-ss` is used for a fast seek and re-encoding is forced (stream copy would snap to keyframes and be imprecise)
- When looping is needed and `source_start` is set, the span from `source_start` to the end of the video is cut out first and the cut file is then looped (otherwise `stream_loop` restarts each loop from 0s of the video)
- The temporary cut file is cleaned up in `finally`

#### 1.3 `workflow.py` supports `custom_assignments`

- **Multi-material mode**: when `custom_assignments` is present, the user's allocation is used directly (Whisper still runs to produce subtitles) and each `prepare_segment` call receives `source_start`
- **Single-material mode**: when `custom_assignments` has 1 entry with `source_start > 0`, the clip is cut first and then passed to LatentSync
- **Backward compatible**: when `custom_assignments` is `None`, the old path is taken unchanged
|
||||
|
||||
---
|
||||
|
||||
### II. New frontend components

#### 2.1 `useTimelineEditor.ts`: timeline segment management hook

```typescript
interface TimelineSegment {
  id: string;            // React key
  materialId: string;    // material ID
  materialName: string;  // display name
  start: number;         // start on the audio timeline, in seconds
  end: number;           // end on the audio timeline, in seconds
  sourceStart: number;   // start offset in the source video (default 0)
  sourceEnd: number;     // end offset in the source video (0 = to the end)
  color: string;         // block color
}
```

Core methods:

- `initSegments()`: splits audioDuration evenly by count whenever selectedMaterials changes
- `resizeSegment(id, newEnd)`: drags the right edge, keeping every segment at least 1s long
- `setSourceRange(id, sourceStart, sourceEnd)`: sets the source range
- `toCustomAssignments()`: converts to the backend `CustomAssignment[]` format
|
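
The resize constraint can be sketched in a few lines. The version below is written in Python for consistency with the other examples in this log, while the real hook is TypeScript. It addresses segments by index rather than by `id`, and the behavior when the pair of segments is too short to resize (return the input unchanged) is an assumption.

```python
from typing import Dict, List

MIN_SEGMENT_SEC = 1.0  # minimum length of any segment, as stated above


def resize_segment(segments: List[Dict], index: int, new_end: float) -> List[Dict]:
    """Move the divider between segments[index] and segments[index + 1].

    Returns a new list; the input is not mutated.
    """
    # The last segment's right edge is pinned to the end of the audio.
    if index < 0 or index >= len(segments) - 1:
        return segments
    cur, nxt = segments[index], segments[index + 1]
    lowest = cur["start"] + MIN_SEGMENT_SEC
    highest = nxt["end"] - MIN_SEGMENT_SEC
    if lowest > highest:
        return segments  # not enough room to move the divider at all
    boundary = min(max(new_end, lowest), highest)
    out = [dict(s) for s in segments]
    out[index]["end"] = boundary
    out[index + 1]["start"] = boundary  # neighbor absorbs the change, no gap
    return out
```

Clamping the shared boundary on both sides is what keeps the timeline gapless: one segment grows exactly as much as its neighbor shrinks.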
||||
|
||||
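#### 2.2 `TimelineEditor.tsx`: waveform + colored-block timeline

- **wavesurfer.js** renders the audio waveform (display only, no playback)
- The block layer is laid out proportionally and shows material name + duration + a trim marker
- Dividers between blocks are draggable (`onPointerDown/Move/Up` for continuous pixel-level dragging)
- Clicking a block opens ClipTrimmer

#### 2.3 `ClipTrimmer.tsx`: material trimming modal

- Live HTML5 `<video>` preview; `video.currentTime` follows the slider while dragging
- Two-ended range slider (start/end), interlocked so the range is at least 0.5s
- Shows trimmed duration vs. allocated duration (with a "will loop" / "will be cut" hint)
- Source video duration is read on `loadedmetadata`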
|
||||
|
||||
---
|
||||
|
||||
### III. Frontend integration changes

#### 3.1 `useHomeController.ts`

- Integrates the `useTimelineEditor` hook
- Added `clipTrimmerOpen` / `clipTrimmerSegmentId` state
- `handleGenerate` always sends `custom_assignments` for multiple materials; it also sends it for a single material when `sourceStart > 0`
- Removed the no-longer-used `reorderMaterials` export

#### 3.2 `HomePage.tsx`

- TimelineEditor is inserted between MaterialSelector and BgmPanel (shown only when a voice-over is selected and materials are chosen)
- Added the ClipTrimmer modal at the bottom
- Removed the `reorderMaterials` and `selectedAudioDuration` props

#### 3.3 `MaterialSelector.tsx`

- Removed the voice-over duration bar (moved to TimelineEditor)
- Removed the drag-to-sort area (SortableChip + @dnd-kit code)
- Removed the `onReorderMaterials` / `selectedAudioDuration` props
|
||||
|
||||
---
|
||||
|
||||
### IV. Bugs fixed during review

| # | Severity | Problem | Fix |
|---|----------|---------|-----|
| 1 | **Medium** | `prepare_segment` seeks imprecisely when `source_start > 0` is combined with stream copy | Added `source_start > 0` to the re-encode condition |
| 2 | **High** | With `stream_loop + source_start`, loops restart from 0s of the video rather than from source_start | Split into two steps: cut the clip first, then loop the cut file |
| 3 | **Low** | `useHomeController` exported the deprecated `reorderMaterials` | Removed |
|
||||
|
||||
---
|
||||
|
||||
### Files touched

#### Backend: modified

| File | Change |
|------|--------|
| `backend/app/modules/videos/schemas.py` | Added the `CustomAssignment` model; `GenerateRequest` gained the `custom_assignments` field |
| `backend/app/services/video_service.py` | `prepare_segment` gained the `source_start` parameter and the two-step cut-then-loop handling |
| `backend/app/modules/videos/workflow.py` | Multi- and single-material pipelines support `custom_assignments` and pass `source_start` through |

#### Frontend: new

| File | Notes |
|------|-------|
| `frontend/src/features/home/model/useTimelineEditor.ts` | Timeline segment management hook |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | Waveform + colored-block timeline component |
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | Material trimming modal |

#### Frontend: modified

| File | Change |
|------|--------|
| `frontend/src/features/home/ui/HomePage.tsx` | Inserted TimelineEditor + ClipTrimmer |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Removed duration info + drag-to-sort area + related props |
| `frontend/src/features/home/model/useHomeController.ts` | Integrates useTimelineEditor; handleGenerate sends custom_assignments |
| `frontend/package.json` | Added the `wavesurfer.js` dependency |
|
||||
|
||||
---
|
||||
|
||||
## 🎨 UI Polish + TTS Stability Fixes: Phase 3 (Day 23)

### Overview

Based on user feedback, 6 UI issues were fixed, along with the SoX path problem and VRAM cache handling in the voice-clone service.

> **Note**: Qwen3-TTS was later replaced by CosyVoice 3.0 (port 8010); what follows records the fixes as made at the time.

---

### I. Qwen3-TTS stability fixes (since replaced by CosyVoice 3.0)

#### 1.1 SoX PATH fix

**Problem**: when PM2 started qwen-tts, the `sox` tool lived in the conda env's bin directory and was not on the system PATH. Audio encoding/decoding fell back to a CPU-heavy path, and the log showed the warning `SoX could not be found!`.

**Fix**: `run_qwen_tts.sh` exports the conda env bin directory onto PATH:

```bash
export PATH="/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin:$PATH"
```

#### 1.2 CUDA cache cleanup

**Fix**: `qwen_tts_server.py` calls `torch.cuda.empty_cache()` after every generation, whether it succeeded or failed, to keep VRAM fragmentation from building up. Inference runs in a thread pool via `asyncio.to_thread()` so the event loop is not blocked and health checks do not time out.

> **Follow-up**: Qwen3-TTS has been retired; CosyVoice 3.0 keeps the same safeguards (GPU inference lock, timeout protection, VRAM cleanup, startup self-check).
|
||||
|
||||
---
|
||||
|
||||
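### II. Unified button layout in the voice-over list (feedback #1 + #6)

**Problem**: the preview button in `GeneratedAudiosPanel` sat on the left, apart from Edit/Delete, which was inconsistent with `RefAudioPanel`. The script summary area at the bottom was also unnecessary.

**Fix**:

- Play/Edit/Delete are grouped on the right, shown on hover, in the order preview → rename → delete
- Removed the script summary for the selected voice-over
- Layout now matches RefAudioPanel: name + duration on the left, action buttons on the right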
|
||||
|
||||
---
|
||||
|
||||
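### III. Removed the voice-over dependency overlay on the material area (feedback #2)

**Problem**: MaterialSelector was covered by a `!selectedAudio` overlay, so materials could not be used until a voice-over was selected.

**Fix**: removed the disabled overlay `<div>` wrapping MaterialSelector in `HomePage.tsx`. Materials can be uploaded / previewed / managed at any time; only TimelineEditor requires a selected voice-over, and it already has its own condition (`selectedAudio && selectedMaterials.length > 0`).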
|
||||
|
||||
---
|
||||
|
||||
### IV. Drag-to-reorder on the timeline (feedback #3)

**Problem**: TimelineEditor did not support changing the order of materials.

**Fix**:

- `useTimelineEditor` already had a `reorderSegments()` method (swaps the material info of two segments but keeps their time ranges)
- `reorderSegments` is exposed through `useHomeController` and passed to `TimelineEditor`
- Blocks support HTML5 Drag & Drop: `draggable` + `onDragStart/Over/Drop/End`
- While dragging: the source block is semi-transparent (`opacity-50`) and the target block gets a highlight ring (`ring-2 ring-purple-400 scale-[1.02]`)
- Cursor styles: `cursor-grab` / `active:cursor-grabbing`
|
||||
|
||||
---
|
||||
|
||||
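### V. Dual-handle range slider for trimming (feedback #4)

**Problem**: ClipTrimmer used two separate `<input type="range">` sliders, so start and end were set independently, which was unintuitive.

**Fix**: replaced with a custom dual-handle range slider:

- A single track with a purple round handle (start) and a pink round handle (end)
- Track background `bg-white/10`; the selected range is highlighted in the material's color
- Dragging uses Pointer Events: `onPointerDown` captures the handle → `onPointerMove` updates the position → `onPointerUp` releases it
- Handles are interlocked: the start cannot exceed end - 0.5s, and the end cannot go below start + 0.5s
- Start (purple) and end (pink) time labels are shown underneath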
|
||||
|
||||
---
|
||||
|
||||
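### VI. Video preview in the trimmer (feedback #5)

**Problem**: the video in ClipTrimmer could only be viewed as a still frame; the trimmed range could not be played back.

**Fix**:

- Clicking the video area plays/pauses (Play/Pause icon overlay)
- Playback range: plays from sourceStart and stops automatically at sourceEnd
- Returns to the start when playback ends
- `video.currentTime` follows the handle while dragging (seeks to the current position to show the frame)
- A playback progress marker (white vertical line) is overlaid on the range slider track
- `preload="auto"` preloads the video so seeking is fast while dragging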
|
||||
|
||||
---
|
||||
|
||||
### Files touched

#### Backend: modified

| File | Change |
|------|--------|
| `run_qwen_tts.sh` | Exports the conda env bin directory onto PATH, fixing the missing SoX (retired) |
| `models/Qwen3-TTS/qwen_tts_server.py` | `torch.cuda.empty_cache()` after every generation; `asyncio.to_thread` to avoid blocking (retired) |

#### Frontend: modified

| File | Change |
|------|--------|
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | Unified button layout (Play/Edit/Delete grouped on the right), removed the script summary |
| `frontend/src/features/home/ui/HomePage.tsx` | Removed the voice-over overlay on MaterialSelector, passes onReorderSegment |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | Added HTML5 Drag & Drop reordering and the onReorderSegment prop |
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | Dual-handle range slider + video playback preview + playback progress marker |
| `frontend/src/features/home/model/useHomeController.ts` | Exposes the reorderSegments method |
|
||||
|
||||
---
|
||||
|
||||
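## 📝 Saved Scripts + Timeline Drag Fix: Phase 4 (Day 23)

### Overview

Added manual saving and loading of scripts, fixed a bug where material durations did not follow the material after a drag reorder on the timeline, and unified the button styling.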
|
||||
|
||||
---
|
||||
|
||||
### I. Saving and loading scripts

#### Feature

Users can manually save the current script to a history list and load it back at any time. Only manually saved scripts appear in the list; this is fully independent of auto-save (`useHomePersistence`).

#### UI layout

```
Button bar: [Saved scripts▼] [Script extraction assistant] [AI multilingual▼] [AI title & tag generation]
Bottom bar: 128 characters          [Save script]
```

- **Saved scripts dropdown**: lists saved entries (name + date + delete button); clicking an entry loads the script; an empty list shows "No saved scripts yet"
- **Save script button**: disabled when the script is empty; on click it shows `toast.success("文案已保存")` ("Script saved")
- **Estimated duration removed**: the bottom bar keeps only the character count + save button

#### Implementation

##### `useSavedScripts.ts` (new)

```typescript
interface SavedScript { id: string; name: string; content: string; savedAt: number }
```

- localStorage key: `vigent_{storageKey}_savedScripts`
- `saveScript(content)`: names the entry from the first 15 characters, inserts it at the head of the list, and **writes to localStorage directly**
- `deleteScript(id)`: removes the entry and writes to localStorage directly
- `useEffect([lsKey])`: re-reads from localStorage when lsKey changes (guest → userId)
- **No auto-persist effect is used**, so an empty array cannot overwrite existing data when storageKey switches

##### Data flow

```
ScriptEditor (UI)
  ↑ savedScripts / onSaveScript / onLoadScript / onDeleteScript  (pure props + callbacks)
  │
useHomeController
  ├── useSavedScripts(storageKey)  → { savedScripts, saveScript, deleteScript }
  └── handleSaveScript()           → saveScript(text) + toast
  │
HomePage
  └── passes props to ScriptEditor
```
|
||||
|
||||
---
|
||||
|
||||
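### II. Timeline drag-reorder bug fix

#### Problem

After dragging to change the material order, each material's duration stayed in its original slot instead of moving with the material. For example, material 1 (3s) + material 2 (8s + 4s of looping) became material 2 (3s) + material 1 (8s + 4s of looping) after the drag: the duration allocation had not changed.

#### Root cause

`reorderSegments` used a **property swap**: it copied `materialId`, `sourceStart`, `sourceEnd`, and similar fields between the two slots one by one and then called `recalcPositions`. Each slot kept its own time range, so the durations never moved with the materials.

#### Fix

Switched to an **array move** (splice): the whole segment object is taken out of its old position and inserted at the new one. The segment object carries all of its properties (materialId, sourceStart, sourceEnd, color, and so on) as a unit, and `recalcPositions` then recomputes the positions.

```typescript
// Before: property swap
const fromMat = { materialId: next[fromIdx].materialId, ... };
const toMat = { materialId: next[toIdx].materialId, ... };
next[fromIdx] = { ...next[fromIdx], ...toMat };
next[toIdx] = { ...next[toIdx], ...fromMat };

// After: array move
const [moved] = next.splice(fromIdx, 1);
next.splice(toIdx, 0, moved);
```

Side benefit: with 3 or more materials, a drag now behaves as an "insert" rather than a "swap", which matches user expectations better.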
|
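
The effect of the array move can be checked with a small model of the two functions, written in Python for consistency with the other examples here. It assumes `recalcPositions` lays the segments end to end while preserving each segment's own duration, which this log implies but does not show.

```python
from typing import Dict, List


def recalc_positions(segments: List[Dict]) -> List[Dict]:
    """Lay segments end to end from 0, keeping each segment's duration."""
    out, cursor = [], 0.0
    for seg in segments:
        duration = seg["end"] - seg["start"]
        out.append({**seg, "start": cursor, "end": cursor + duration})
        cursor += duration
    return out


def reorder_segments(segments: List[Dict], from_idx: int, to_idx: int) -> List[Dict]:
    """Move one whole segment (duration included) to a new position."""
    moved_list = list(segments)
    moved = moved_list.pop(from_idx)   # same effect as splice(fromIdx, 1)
    moved_list.insert(to_idx, moved)   # same effect as splice(toIdx, 0, moved)
    return recalc_positions(moved_list)
```

Using the example from the problem statement, a 3s segment followed by a 12s segment becomes a 12s segment followed by a 3s segment after the move, so the durations now travel with their materials.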
||||
|
||||
---
|
||||
|
||||
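### III. Unified button styling

#### Problem

The 4 buttons (Saved scripts, Script extraction assistant, AI multilingual, AI title & tag generation) had inconsistent heights, and the text in the AI buttons was wrapped in a nested `<span>`, which caused layout differences.

#### Fix

- All 4 buttons use `h-7 px-2.5 text-xs rounded inline-flex items-center gap-1` (fixed 28px height)
- Removed the extra `<span>` nesting inside the AI multilingual / AI title & tag generation buttons in favor of a `<>...</>` fragment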
|
||||
|
||||
---
|
||||
|
||||
### Files touched

#### Frontend: new

| File | Notes |
|------|-------|
| `frontend/src/features/home/model/useSavedScripts.ts` | Saved scripts hook (localStorage persistence) |

#### Frontend: modified

| File | Change |
|------|--------|
| `frontend/src/features/home/ui/ScriptEditor.tsx` | Saved scripts dropdown + save button + removed estimated duration + unified button height |
| `frontend/src/features/home/model/useHomeController.ts` | Integrates useSavedScripts, adds handleSaveScript |
| `frontend/src/features/home/ui/HomePage.tsx` | Passes savedScripts / handleSaveScript / deleteSavedScript to ScriptEditor |
| `frontend/src/features/home/model/useTimelineEditor.ts` | reorderSegments changed from property swap to array move (splice) |
|
||||
|
||||
---
|
||||
|
||||
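## 🔤 Subtitle Language Mismatch + Video Aspect Ratio Fix: Phase 5 (Day 23)

### Overview

Fixes two video generation bugs:

1. **Subtitle language mismatch**: Chinese voice-over + English translated script → subtitles wrongly shown in English (Whisper transcribes independently and ignores the original text)
2. **Title/subtitle layout in the wrong aspect ratio**: after generating from 9:16 portrait material, titles and subtitles were rendered with a 16:9 landscape layout

Also fixes the loss of English spaces in `split_word_to_chars`, found during code review.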
|
||||
|
||||
---
|
||||
|
||||
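### I. Subtitles use the original text instead of the Whisper transcript

#### Root cause

Whisper transcribes the audio on its own and ignores the `text` argument entirely. When the voice-over language differs from the script in the editor (for example: the user writes a Chinese script → translates it to English → generates an English voice-over → changes the script back to Chinese), Whisper "hears" English speech and outputs English subtitles.

#### Approach

Whisper is only responsible for detecting the **overall speech time range** (`first_start` → `last_end`); the subtitle text always comes from the original script saved with the voice-over.

#### `whisper_service.py`: `align()` gains an `original_text` parameter

```python
async def align(self, audio_path, text, output_path=None,
                language="zh", original_text=None):
```

When `original_text` is non-empty:

1. Run the Whisper transcription as usual and record `whisper_first_start` and `whisper_last_end`
2. Pass `original_text` to `split_word_to_chars()`, which distributes it linearly over the total time range
3. Break lines by punctuation and character count with `split_segment_to_lines()`
4. Replace Whisper's transcript with the result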
|
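
Step 2, linear distribution over the detected speech range, can be sketched as follows. The function name and the `word` / `start` / `end` record shape are assumptions made for illustration; the real `split_word_to_chars()` signature is not shown in this log.

```python
from typing import Dict, List


def distribute_tokens(tokens: List[str], first_start: float, last_end: float) -> List[Dict]:
    """Give every token an equal slice of [first_start, last_end]."""
    if not tokens:
        return []
    if last_end <= first_start:
        raise ValueError("last_end must be greater than first_start")
    step = (last_end - first_start) / len(tokens)
    return [
        {
            "word": tok,
            "start": first_start + i * step,
            "end": first_start + (i + 1) * step,
        }
        for i, tok in enumerate(tokens)
    ]
```

Equal slices ignore real speech rhythm, so per-token timing is approximate; the trade-off accepted here is that the subtitle text is always exactly the script.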
||||
|
||||
#### `workflow.py`: voice-over metadata always overrides + original text is passed in

```python
# Before (override only when the script is empty)
if not req.text.strip():
    req.text = meta.get("text", req.text)

# After (always override with the voice-over metadata)
meta_text = meta.get("text", "")
if meta_text:
    req.text = meta_text
```

All 4 `whisper_service.align()` call sites now pass `original_text=req.text`.
|
||||
|
||||
---
|
||||
|
||||
### II. Passing the video size to Remotion dynamically

#### Root cause

`remotion/src/Root.tsx` hard-coded `width={1280} height={720}`. Although `render.ts` detected the real size with ffprobe and overrode `composition.width/height`, the component had already been initialized at 1280×720 during `selectComposition`, so title and subtitle positioning was based on the wrong canvas size.

#### Fix

##### `Root.tsx`: `calculateMetadata` reads the size from props

```tsx
<Composition
  id="ViGentVideo"
  component={Video}
  durationInFrames={300}
  fps={25}
  width={1080}
  height={1920}
  calculateMetadata={async ({ props }) => ({
    width: props.width || 1080,
    height: props.height || 1920,
  })}
  defaultProps={{
    videoSrc: '',
    width: 1080,
    height: 1920,
    // ...
  }}
/>
```

The default changed from 1280×720 to 1080×1920 (portrait first), and `calculateMetadata` ensures the `selectComposition` stage uses the real size detected by ffprobe.

##### `Video.tsx`: VideoProps gains optional `width/height`

These exist only so `calculateMetadata` can read them; the component's rendering does not use them.

##### `render.ts`: inputProps always carries the video size

```typescript
const inputProps = {
  videoSrc: videoFileName,
  captions,
  title: options.title,
  // ...
  width: videoWidth,    // detected by ffprobe
  height: videoHeight,  // detected by ffprobe
};
```

`selectComposition` and `renderMedia` use the same `inputProps`. The explicit `composition.width/height` override is kept as a safeguard.
|
||||
|
||||
---
|
||||
|
||||
### III. Code review fix: lost English spaces

#### Problem

`split_word_to_chars` was designed to handle a single Whisper word (such as `" Hello"`). When `original_text` passes in a whole passage, interior spaces hit `continue` without flushing `ascii_buffer`, so `"Hello World"` becomes `"HelloWorld"`.

#### Execution trace

```
Input: "Hello World"
  H,e,l,l,o → ascii_buffer = "Hello"
  ' '       → continue (skipped, no flush!)
  W,o,r,l,d → ascii_buffer = "HelloWorld"
  Result: tokens = ["HelloWorld"]  ← space lost
```

#### Fix

On whitespace, flush `ascii_buffer` and set `pending_space` so the next token gets a leading space:

```python
if not char.strip():
    if ascii_buffer:
        tokens.append(ascii_buffer)
        ascii_buffer = ""
    if tokens:
        pending_space = True
    continue
```

After the fix: `"Hello World"` → tokens = `["Hello", " World"]` → subtitles display correctly. Chinese text is unaffected.
|
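
For context, here is a self-contained tokenizer built around the fragment above. Only the whitespace branch comes from this log; the surrounding loop, the rule that ASCII letters and digits accumulate while every other character becomes its own token, and the helper name are assumptions about how `split_word_to_chars` is structured.

```python
from typing import List


def split_to_tokens(text: str) -> List[str]:
    """Split text into tokens: ASCII words stay whole, other chars stand alone."""
    tokens: List[str] = []
    ascii_buffer = ""
    pending_space = False

    def push(token: str) -> None:
        # Prefix a space if whitespace was seen since the previous token.
        nonlocal pending_space
        tokens.append(" " + token if pending_space else token)
        pending_space = False

    for char in text:
        if not char.strip():  # whitespace: flush the word, remember the space
            if ascii_buffer:
                push(ascii_buffer)
                ascii_buffer = ""
            if tokens:
                pending_space = True
            continue
        if char.isascii() and char.isalnum():
            ascii_buffer += char  # keep building the current ASCII word
        else:
            if ascii_buffer:
                push(ascii_buffer)
                ascii_buffer = ""
            push(char)  # CJK characters and punctuation are single tokens
    if ascii_buffer:
        push(ascii_buffer)
    return tokens
```

Attaching the space to the following token, rather than emitting it as a token of its own, keeps the token count equal to the number of visible units, which matters when time is divided evenly among tokens.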
||||
|
||||
---
|
||||
|
||||
### Files touched

#### Backend: modified

| File | Change |
|------|--------|
| `backend/app/services/whisper_service.py` | `align()` gained the `original_text` parameter; `split_word_to_chars` no longer loses English spaces |
| `backend/app/modules/videos/workflow.py` | Voice-over metadata always overrides text/language; the 4 `align()` calls pass `original_text` |

#### Frontend: modified (Remotion)

| File | Change |
|------|--------|
| `remotion/src/Root.tsx` | Default size changed to 1080×1920; added `calculateMetadata` + width/height defaultProps |
| `remotion/src/Video.tsx` | VideoProps gained optional `width`/`height` |
| `remotion/render.ts` | inputProps always carries `videoWidth`/`videoHeight`, shared by selectComposition and renderMedia |
|
||||
|
||||
---
|
||||
|
||||
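## 🎤 Automatic Reference Audio Transcription + Speed Control: Phase 6 (Day 23)

### Overview

This addresses mismatched ref_text in voice cloning. The old approach used a fixed string from the frontend as ref_text, but CosyVoice zero-shot cloning requires ref_text to match what is actually said in the reference audio; when it does not, the model "hallucinates" an extra fragment at the start of the generated audio.

**Improvement**: on upload, Whisper now transcribes the reference audio automatically and the result is used as ref_text. Speed control was added at the same time.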
|
||||
|
||||
---
|
||||
|
||||
### I. Automatic Whisper transcription of reference audio

#### 1.1 `whisper_service.py`: automatic language detection

The `transcribe()` method used to hard-code `language="zh"`. It now takes an optional `language` parameter (default `None` = auto-detect), which supports reference audio in other languages.

#### 1.2 `ref_audios/service.py`: transcription on upload

New upload flow: transcode to WAV → check duration (≥1s) → if longer than 10s, cut at a silence point → **Whisper transcription** → verify non-empty → upload.

```python
try:
    transcribed = await whisper_service.transcribe(tmp_wav_path)
    if transcribed.strip():
        ref_text = transcribed.strip()
except Exception as e:
    logger.warning(f"Auto-transcribe failed: {e}")

if not ref_text or not ref_text.strip():
    # User-facing message: "Could not recognize the audio content; make sure it contains clear speech"
    raise ValueError("无法识别音频内容，请确保音频包含清晰的语音")
```

#### 1.3 `ref_audios/router.py`: ref_text becomes optional

`ref_text: str = Form("")` (no longer required); the frontend no longer sends the fixed string.
|
||||
|
||||
---
|
||||
|
||||
### 2. Smart trimming of reference audio (10-second cap)

CosyVoice works best with reference audio of 3-10 seconds.

#### 2.1 Silence-point detection

ffmpeg `silencedetect` is used to find the last silence end point within the first 10 seconds (threshold -30dB, minimum 0.3s), so the cut does not land in the middle of a word:

```python
def _find_silence_cut_point(file_path, max_duration):
    # silencedetect → parse silence_end → pick the last silence point between 3s and max_duration
    # fall back to max_duration if none is found
    ...
```
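
The outline above can be sketched as follows. This is an illustrative reconstruction rather than the project's actual code: the helper names `pick_cut_point` and `find_silence_cut_point` are hypothetical, and the sketch assumes `silencedetect` reports `silence_end: <seconds>` on ffmpeg's stderr, which is how the filter normally logs its results.

```python
import re
import subprocess

# silencedetect logs lines such as:
#   [silencedetect @ 0x...] silence_end: 4.52 | silence_duration: 0.61
SILENCE_END_RE = re.compile(r"silence_end:\s*([0-9]+(?:\.[0-9]+)?)")


def pick_cut_point(stderr_text: str, max_duration: float,
                   min_duration: float = 3.0) -> float:
    """Return the last silence end within [min_duration, max_duration]."""
    ends = [float(value) for value in SILENCE_END_RE.findall(stderr_text)]
    candidates = [t for t in ends if min_duration <= t <= max_duration]
    # Fall back to a hard cut at max_duration when no silence qualifies.
    return max(candidates) if candidates else max_duration


def find_silence_cut_point(file_path: str, max_duration: float = 10.0) -> float:
    """Run ffmpeg silencedetect (-30dB, 0.3s) and choose a cut point."""
    cmd = [
        "ffmpeg", "-hide_banner", "-i", file_path,
        "-af", "silencedetect=noise=-30dB:d=0.3",
        "-f", "null", "-",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return pick_cut_point(proc.stderr, max_duration)
```

Keeping the parsing in a pure function makes the selection rule easy to unit-test without invoking ffmpeg.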

#### 2.2 Fade-out

When trimming, the last 0.1 second is faded out (`afade=t=out`) to avoid a pop at the cut.

---

### 3. Re-transcription (migrating old data)

#### 3.1 New API

`POST /api/ref-audios/{audio_id}/retranscribe`: download the audio → trim if longer than 10s → Whisper transcription → re-upload the audio and metadata.

#### 3.2 Frontend UI

- RefAudioPanel gains a RotateCw button ("re-transcribe text") that shows `animate-spin` while transcription is running
- Old audio whose ref_text starts with the fixed text shows a ⚠ yellow warning

---

### 4. Speed control (CosyVoice speed parameter)

#### 4.1 End-to-end parameter flow

```
Frontend GeneratedAudiosPanel (speed selector)
  → useHomeController (speed state + persistence)
  → useGeneratedAudios.generateAudio(params)
  → POST /api/generated-audios/generate { speed: 1.0 }
  → GenerateAudioRequest.speed (Pydantic)
  → generate_audio_task → voice_clone_service.generate_audio(speed=)
  → _generate_once → POST /generate { speed: "1.0" }
  → cosyvoice_server → _model.inference_zero_shot(speed=speed)
```

#### 4.2 Frontend UI

In voice-clone mode, a speed dropdown (`语速: 正常 ▼`, "Speed: Normal") appears to the left of the "Generate voice-over" button in the title bar of the voice-over list panel:

| Label | speed value |
|------|----------|
| Slower (较慢) | 0.8 |
| Slightly slower (稍慢) | 0.9 |
| Normal (正常) | 1.0 (default) |
| Slightly faster (稍快) | 1.1 |
| Faster (较快) | 1.2 |

The speed choice is persisted to localStorage (`vigent_{storageKey}_speed`).

---

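### 5. Gating when no reference audio is selected

In voice-clone mode, when no reference audio is selected:

- The "Generate voice-over" button is disabled, with the title hint "请先选择参考音频" ("Select a reference audio first")
- A yellow warning bar in the panel reads "声音克隆模式需要先选择参考音频" ("Voice-clone mode requires a reference audio")

---

### 6. Frontend cleanup

- Removed the `FIXED_REF_TEXT` constant and the `fixedRefText` prop
- Removed the "Please read the following text" guidance block
- Upload hint simplified to "Upload any voice sample (3-10 seconds); the system will recognize the content and clone the voice automatically"
- Recording-area note: "3-10 seconds recommended; longer audio is trimmed automatically"

---
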
### Files changed

#### Backend

| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | `transcribe()` gains an optional `language` parameter, default None (auto-detect) |
| `backend/app/modules/ref_audios/service.py` | Auto-transcribe on upload + silence-point trimming + fade-out + retranscribe function |
| `backend/app/modules/ref_audios/router.py` | `ref_text` changed to Form(""); new retranscribe endpoint |
| `backend/app/modules/generated_audios/schemas.py` | `GenerateAudioRequest` adds `speed: float = 1.0` |
| `backend/app/modules/generated_audios/service.py` | Passes `req.speed` to voice_clone_service |
| `backend/app/services/voice_clone_service.py` | `generate_audio()` / `_generate_once()` accept and forward speed |
| `models/CosyVoice/cosyvoice_server.py` | `/generate` endpoint accepts `speed` and passes it to `inference_zero_shot(speed=)` |

#### Frontend

| File | Change |
|------|------|
| `frontend/src/features/home/model/useHomeController.ts` | New speed state; removed FIXED_REF_TEXT; handleGenerateAudio passes speed |
| `frontend/src/features/home/model/useHomePersistence.ts` | Speed persistence added |
| `frontend/src/features/home/model/useRefAudios.ts` | Removed fixedRefText; added retranscribe |
| `frontend/src/features/home/model/useGeneratedAudios.ts` | generateAudio params add speed |
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | Speed selector + missing-reference-audio gating |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | Removed read-aloud guidance; added re-transcribe button + ⚠ warning |
| `frontend/src/features/home/ui/HomePage.tsx` | Passes speed/setSpeed/ttsMode to GeneratedAudiosPanel |
185
Docs/DevLogs/Day24.md
Normal file
@@ -0,0 +1,185 @@
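## 🔧 Auth Expiry Handling + Multi-Material Timeline Stability Fixes (Day 24)

### Overview

Two main tracks were completed today:

1. **Account and auth handling**: membership expiry now takes effect at request time (triggered by the login and auth endpoints) and returns a uniform renewal prompt.
2. **Video generation stability**: an end-to-end round of fixes covering the multi-material timeline, trim semantics, frozen frames at concat boundaries, aspect ratio, and subtitle/title fitting.

---
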
## 🔐 Request-Time Membership Expiry: Stage 1 (Day 24)

### Goal

Avoid depending on a scheduled job: expiry is evaluated and the account is deactivated as soon as the user logs in or calls a protected endpoint.

### Behavior changes

- Expiry is determined from `users.expires_at`.
- Once a user is found to be expired:
  - `is_active` is automatically set to `false`
  - all of the user's sessions are deleted
  - `403` is returned with the message `会员已到期,请续费` ("Membership expired, please renew")

### Implementation

- `users.py` adds `deactivate_user_if_expired()`, plus `_parse_expires_at()` for uniform timezone parsing.
- `deps.py` runs the expiry check in both `get_current_user` and `get_current_user_optional`.
- `auth/router.py` adds expiry deactivation on the login path; `/api/auth/me` now goes through `Depends(get_current_user)`.
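
As a rough illustration of the expiry decision, the sketch below parses a timestamp into an aware UTC datetime and compares it with the current time. It is a simplified stand-in, not the repository code: the function names, the treatment of a missing `expires_at` as "never expires", and the assumption that naive timestamps are UTC are all hypothetical choices made for this example.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expires_at(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 string into an aware UTC datetime (or None)."""
    if not value:
        return None
    text = value.strip()
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: timestamps stored without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_expired(expires_at: Optional[str],
               now: Optional[datetime] = None) -> bool:
    """True when the membership end time is at or before `now`."""
    expiry = parse_expires_at(expires_at)
    if expiry is None:
        # Assumption: no expiry recorded means the account never expires.
        return False
    current = now or datetime.now(timezone.utc)
    return expiry <= current
```

In a request-time check, a `True` result would be the point at which the account is deactivated, its sessions removed, and the `403` returned.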
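
---

## 🖼️ Aspect-Ratio Control + Subtitle/Title Fitting: Stage 2 (Day 24)

### 2.1 Configurable output aspect ratio

- A new "Aspect ratio" dropdown at the top of the timeline: `9:16` / `16:9`.
- Default is `9:16`, persisted to localStorage.
- The generate request carries `output_aspect_ratio`; the backend applies the target resolution uniformly in both the single-material and multi-material flows.

### 2.2 Preventing title/subtitle overflow on narrow canvases

To reduce "fine in preview, overflows in the final render" discrepancies, preview and render now share one strategy:

- Responsive scaling based on the composition width.
- Wrapping enabled: `white-space: normal` + `word-break` + `overflow-wrap`.
- Stroke, letter spacing, and top/bottom margins scale by the same ratio.

### 2.3 Opening-title display mode (brief / persistent)

- A dropdown at the end of the "Opening title" row in the "Title & Subtitles" panel supports `短暂显示` (brief) / `常驻显示` (persistent).
- The default mode is brief, with a default duration of 4 seconds.
- The user's choice is persisted to localStorage and survives a refresh.
- The generate request adds `title_display_mode`; brief mode passes `title_duration=4.0`.
- Remotion supports the parameter end to end:
  - `short`: the title fades out after the configured duration and stops rendering;
  - `persistent`: the title stays for the whole video (fade-in kept, no fade-out).

---
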
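## 🎥 Orientation Normalization + Multi-Material Concat Stability: Stage 3 (Day 24)

### 3.1 MOV rotation metadata causing wrong landscape/portrait detection

Problem: the encoded resolution is landscape, but the video relies on rotation side-data to display as portrait (common with phone MOV files).

Fix:

- `get_video_metadata()` now also returns `rotation/effective_width/effective_height`.
- New `normalize_orientation()` physically normalizes the orientation of materials that carry rotation metadata before the pipeline runs.
- Both single-material and multi-material flows normalize orientation right after download, then make the resolution decision.

### 3.2 "Only the first segment shows" and frozen frames at boundaries

Two kinds of protection were added for concat reliability:

- **Assignment guard**: when `custom_assignments` does not match the number of materials, the backend falls back to automatic assignment, so bad input cannot leave only the first segment in effect.
- **Encoding consistency**:
  - all segments are re-encoded during preparation;
  - the concat step no longer uses stream copy;
  - everything is further unified to `25fps + CFR`, and `+genpts` is added at concat, reducing the risk of "picture frozen while the lips keep moving" caused by discontinuous timestamps at segment boundaries.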
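
The encoding-consistency rules can be illustrated with a small command builder. This is only a sketch under stated assumptions: `build_concat_command` is a hypothetical helper, the choice of `libx264` and `yuv420p` is illustrative, and constant frame rate is obtained here with the `fps` filter, which may not be the option the project actually uses.

```python
from typing import List


def build_concat_command(list_file: str, output_path: str,
                         fps: int = 25) -> List[str]:
    """Build an ffmpeg concat command that re-encodes at a constant fps."""
    return [
        "ffmpeg", "-y",
        "-fflags", "+genpts",          # input option: regenerate timestamps
        "-f", "concat", "-safe", "0",  # concat demuxer reading a list file
        "-i", list_file,
        "-vf", f"fps={fps}",           # force a constant output frame rate
        "-c:v", "libx264",             # re-encode instead of stream copy
        "-pix_fmt", "yuv420p",
        output_path,
    ]
```

For example, `build_concat_command("segments.txt", "joined.mp4")` yields an argument list that can be passed to `subprocess.run`; note that `-fflags +genpts` appears before `-i` because it applies to the input.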

---

## ⏱️ Timeline Trim-Semantics Alignment: Stage 4 (Day 24)

### Background

The intended timeline semantics are:

- each segment can set `sourceStart/sourceEnd`;
- when the total duration exceeds the audio, only visible segments are kept and the last one is cut to the audio length;
- when the total duration falls short, the last visible segment loops to fill the gap.

Frontend and backend are now aligned to these semantics.

### 4.1 `source_end` wired end to end

Previously only `source_start` was sent, so the backend could not know exactly where the trim ends.

Changes:

- Frontend `toCustomAssignments()` adds an optional `source_end`.
- Backend `CustomAssignment` schema adds `source_end`.
- workflow passes `source_end` through to `prepare_segment()` (both single- and multi-material).
- `prepare_segment()` takes a `source_end` parameter, computes the usable clip as `[source_start, source_end)`, and when looping is needed it trims first and loops second, so the loop range cannot drift.

### 4.2 Effective-duration calculation fix

Fixes the wrong effective duration when `sourceStart > 0` and `sourceEnd = 0`:

- the old logic used the full material duration;
- the new logic uses `materialDuration - sourceStart`.

The fix applies to both:

- the segment-duration calculation in `recalcPositions()`;
- the "loop fill" visualization ratio in TimelineEditor.
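
The duration rule is small enough to state as code. The sketch below is written in Python for illustration only (the actual fix lives in the TypeScript frontend); the function names are hypothetical, and treating `source_end == 0` as "open-ended" follows the description above.

```python
import math


def effective_duration(material_duration: float,
                       source_start: float = 0.0,
                       source_end: float = 0.0) -> float:
    """Length of the usable clip [source_start, source_end)."""
    # source_end == 0 means "until the end of the material".
    end = source_end if source_end > 0 else material_duration
    end = min(end, material_duration)
    return max(end - source_start, 0.0)


def loops_needed(target_duration: float, clip_duration: float) -> int:
    """How many times a trimmed clip must repeat to cover the target."""
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    return max(1, math.ceil(target_duration / clip_duration))
```

With a 10-second material and `source_start=3`, the effective duration is 7 seconds rather than the full 10, which is the case the old logic got wrong.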
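
### 4.3 Visible-segment assignment priority

Fixes the case where, with fewer visible segments than selected materials, `custom_assignments` was dropped and the backend fell back to automatic assignment:

- the generate request now treats the `assignments` of the visible timeline segments as the source of truth;
- materials beyond the timeline do not take part in this generation.

### 4.4 Single-material trim trigger completed

In single-material mode, `custom_assignments` is also sent when only the end point was changed (`sourceEnd > 0`), so the trim takes effect.

---

## 🧭 Page Interaction and UX Details: Stage 5 (Day 24)

- The page scrolls to the top after a refresh instead of restoring the previous scroll position.
- Material-list and history-video-list scrolling gains a "skip the first auto-scroll" guard, reducing page jumps during state restore.
- Redundant copy was removed from the timeline ratio area to keep the information concise.

---
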
## Files changed

### Backend

| File | Change |
|------|------|
| `backend/app/repositories/users.py` | Added `deactivate_user_if_expired()` and `_parse_expires_at()` |
| `backend/app/core/deps.py` | `get_current_user` / `get_current_user_optional` run the expiry check |
| `backend/app/modules/auth/router.py` | Expiry deactivation on login + `/api/auth/me` uses the shared auth dependency |
| `backend/app/modules/videos/schemas.py` | `CustomAssignment` adds `source_end`; `output_aspect_ratio` retained |
| `backend/app/modules/videos/workflow.py` | `source_end` passed through for multi- and single-material; multi-material prepare/concat unified to 25fps; title display mode parameters passed to Remotion |
| `backend/app/services/video_service.py` | Rotation metadata parsing and orientation normalization; `prepare_segment` supports `source_end/target_fps`; concat forces CFR + `+genpts` |
| `backend/app/services/remotion_service.py` | render supports `title_display_mode/title_duration` and forwards them to render.ts |

### Frontend

| File | Change |
|------|------|
| `frontend/src/features/home/model/useTimelineEditor.ts` | `CustomAssignment` adds `source_end`; fixed the duration calculation for an open-ended sourceStart |
| `frontend/src/features/home/model/useHomeController.ts` | Multi-material requests use the visible assignments; single-material trim trigger completed |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | Aspect-ratio dropdown; loop ratio computed from the effective duration after trimming |
| `frontend/src/features/home/model/useHomePersistence.ts` | `outputAspectRatio` and `titleDisplayMode` persisted |
| `frontend/src/features/home/ui/HomePage.tsx` | Scroll to top on page entry; ClipTrimmer/Timeline interactions kept consistent |
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | Title/subtitle style preview aligned with the final render strategy |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | Title row gains the brief/persistent dropdown |

### Remotion

| File | Change |
|------|------|
| `remotion/src/components/Title.tsx` | Responsive scaling and automatic wrapping for the title; new brief/persistent display mode control |
| `remotion/src/components/Subtitles.tsx` | Responsive scaling and automatic wrapping for subtitles, reducing preview/render differences |
| `remotion/src/Video.tsx` | New `titleDisplayMode` passed to the title component |
| `remotion/src/Root.tsx` | Default props add `titleDisplayMode='short'` and `titleDuration=4` |
| `remotion/render.ts` | New CLI flag `--titleDisplayMode`; inputProps adds `titleDisplayMode` |

---

## Verification log

- Backend syntax check: `python -m py_compile backend/app/modules/videos/schemas.py backend/app/modules/videos/workflow.py backend/app/services/video_service.py backend/app/services/remotion_service.py`
- Frontend type check: `npx tsc --noEmit`
- Frontend ESLint: `npx eslint src/features/home/model/useHomeController.ts src/features/home/model/useHomePersistence.ts src/features/home/ui/HomePage.tsx src/features/home/ui/TitleSubtitlePanel.tsx`
- Remotion render script build: `npm run build:render`
254
Docs/DevLogs/Day25.md
Normal file
@@ -0,0 +1,254 @@
## 🔧 Script Extraction Assistant Fix: Douyin Links Could Not Be Extracted (Day 25)

### Overview

After a Douyin link was pasted, the script extraction assistant could not extract the script: yt-dlp failed with `Fresh cookies are needed`, and the manual fallback had also stopped working because Douyin's page structure changed. The complete fix landed today, along with removal of the no-longer-needed `DOUYIN_COOKIE` setting.

---

## 🐛 Diagnosis

### Failure chain

1. **yt-dlp fails**: `ERROR: [Douyin] Fresh cookies (not necessarily logged in) are needed`
   - yt-dlp version `2025.12.08` is too old
   - the Douyin API `aweme/v1/web/aweme/detail/` requires signed cookies (`s_v_web_id` and others); even upgrading yt-dlp to the latest version and passing a cookie does not solve it, which is a known yt-dlp issue
2. **Manual fallback fails**: `Could not find RENDER_DATA in page`
   - the old approach went through the desktop user profile page + `modal_id`; Douyin SSR no longer returns `videoDetail` data
3. **`DOUYIN_COOKIE` in `.env`**: timestamped December 2024, long expired

---

## ✅ Fix: Mobile Share Page + Automatic ttwid

### Core idea

Stop depending on yt-dlp for Douyin downloads and on a hand-maintained cookie. Instead:

1. Fetch a fresh `ttwid` automatically from ByteDance's public API (an anonymous token, not tied to an account)
2. Use the `ttwid` to request the mobile share page `m.douyin.com/share/video/{id}`
3. Extract the `play_addr` playback URL from the JSON embedded in the page and download it

### Key code (`_download_douyin_manual` rewritten)

```python
# 1. Fetch a fresh ttwid
ttwid_resp = await client.post(
    "https://ttwid.bytedance.com/ttwid/union/register/",
    json={"region": "cn", "aid": 6383, "service": "www.douyin.com", ...}
)
ttwid = ttwid_resp.cookies.get("ttwid", "")

# 2. Request the mobile share page
page_resp = await client.get(
    f"https://m.douyin.com/share/video/{video_id}",
    headers={"cookie": f"ttwid={ttwid}", ...}
)

# 3. Extract play_addr
addr_match = re.search(r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"', page_text)
video_url = addr_match.group(2).replace(r"\u002F", "/")
```
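
Step 3 can be isolated into a small pure function, which also makes the "no match" case explicit. The sketch below reuses the regular expression from the excerpt; the function name `extract_play_url` and the decision to return `None` when the pattern is missing are assumptions for illustration, not necessarily how the project handles that case.

```python
import re
from typing import Optional

# Same pattern as in the excerpt: capture the first entry of url_list.
PLAY_ADDR_RE = re.compile(
    r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"'
)


def extract_play_url(page_text: str) -> Optional[str]:
    """Return the first play_addr URL embedded in the page, if any."""
    match = PLAY_ADDR_RE.search(page_text)
    if match is None:
        # The page layout changed or the video is unavailable.
        return None
    # The embedded JSON escapes "/" as the literal text \u002F.
    return match.group(2).replace(r"\u002F", "/")
```

Returning `None` lets the caller raise a descriptive error instead of failing with an `AttributeError` on a missing match.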

### Advantages

- No more hand-maintained `DOUYIN_COOKIE`; a ttwid is fetched automatically on every request
- Unaffected by the state of yt-dlp's Douyin support
- Works for all users and is not tied to a specific account

---

## 🧹 Removing the DOUYIN_COOKIE Setting

`DOUYIN_COOKIE` was used only for script extraction. The new approach no longer needs it, so it was removed from the following places:

| File | Change |
|------|------|
| `backend/.env` | Removed the `DOUYIN_COOKIE` entry and its comment |
| `backend/app/core/config.py` | Removed the `DOUYIN_COOKIE: str = ""` field definition |
| `backend/app/modules/tools/service.py` | Removed the yt-dlp cookie-passing logic and the `_write_netscape_cookies` helper |

---

## 🔤 Frontend Wording Fix

The label "AI 洗稿结果" in the script extraction UI was changed to "AI 改写结果" ("AI rewrite result").

| File | Change |
|------|------|
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | `AI 洗稿结果` → `AI 改写结果` |
| `backend/app/modules/tools/service.py` | "洗稿" → "改写" in comments |
| `backend/app/services/glm_service.py` | "洗稿" → "改写文案" in the docstring |

---

## 📦 Other Changes

- **yt-dlp upgrade**: `2025.12.08` → `2026.2.21`
- **yt-dlp initialization fix**: options are now passed directly via `YoutubeDL(ydl_opts)` (previously an empty instance was created and its params updated afterwards, which had no effect)
- **User-Agent update**: `Chrome/91` → `Chrome/131` in yt-dlp

---

## Files changed

### Backend

| File | Change |
|------|------|
| `backend/app/modules/tools/service.py` | Rewrote `_download_douyin_manual` (mobile share page approach); fixed yt-dlp initialization; removed cookie-related code; reworded comments |
| `backend/app/services/glm_service.py` | docstring "洗稿" → "改写文案" |
| `backend/app/core/config.py` | Removed the `DOUYIN_COOKIE` field |
| `backend/.env` | Removed the `DOUYIN_COOKIE` setting |
| `backend/requirements.txt` | yt-dlp version upgraded |

### Frontend

| File | Change |
|------|------|
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | "AI 洗稿结果" → "AI 改写结果" |

---

## ✏️ AI Smart Rewrite: Custom Prompt

### Overview

The "AI smart rewrite" feature of the script extraction assistant used a hard-coded prompt, so users could not tune the rewriting style. A collapsible "Custom prompt" area was added to the right of the checkbox: users can edit their own prompt, it is persisted to localStorage, and the backend substitutes it for the default prompt when provided.

### Backend

**Router layer** (`router.py`): `extract_script_tool` adds an optional Form parameter `custom_prompt: Optional[str] = Form(None)` and passes it to the service.

**Service layer** (`service.py`): the `extract_script()` signature adds `custom_prompt` and forwards it to `glm_service.rewrite_script(script, custom_prompt)`.

**AI layer** (`glm_service.py`): `rewrite_script(self, text, custom_prompt=None)`; if `custom_prompt` has a value, the custom prompt is concatenated with the original text, otherwise the existing default prompt is used.

```python
if custom_prompt and custom_prompt.strip():
    # Custom prompt, followed by the label "原始文案:" ("Original script:") and the text
    prompt = f"""{custom_prompt.strip()}

原始文案:
{text}"""
else:
    # Existing default prompt (elided in the original log)
    prompt = f"""请将以下视频文案进行改写。...(原有默认)"""
```

### Frontend

**Hook** (`useScriptExtraction.ts`):

- New `customPrompt` / `showCustomPrompt` state
- Initial value restored from `localStorage.getItem("vigent_rewriteCustomPrompt")`
- Changes to `customPrompt` are saved to localStorage with a 300ms debounce
- In `handleExtract()`, when `doRewrite && customPrompt.trim()` is truthy, `formData.append("custom_prompt", ...)` is added
- `customPrompt` is not cleared when the modal resets (it is a persisted preference)

**UI** (`ScriptExtractionModal.tsx`):

- A "Custom prompt ▼" button on the same row as the checkbox, on the right (shown only when `doRewrite` is on)
- Clicking it expands a textarea editor with the hint "Leave empty to use the default prompt" at the bottom
- Unchecking AI smart rewrite hides the custom prompt area automatically

### Files changed

| File | Change |
|------|------|
| `backend/app/modules/tools/router.py` | New `custom_prompt` Form parameter |
| `backend/app/modules/tools/service.py` | `extract_script()` forwards `custom_prompt` |
| `backend/app/services/glm_service.py` | `rewrite_script()` supports a custom prompt |
| `frontend/.../useScriptExtraction.ts` | New state, localStorage persistence, FormData parameter |
| `frontend/.../ScriptExtractionModal.tsx` | UI button + expandable textarea |

### Verification

- Backend: `python -m py_compile` passes for the three files
- Frontend: `npx tsc --noEmit` passes

---

## 🐛 SSR Build Fix: localStorage is not defined

### Problem

`npm run build` failed with `ReferenceError: localStorage is not defined`. The `useState` initializer in `useScriptExtraction.ts` also runs in the SSR (Node.js) environment, where `localStorage` does not exist.

### Fix

The `useState` initializer is guarded with `typeof window !== "undefined"`:

```typescript
const [customPrompt, setCustomPrompt] = useState(
  () => typeof window !== "undefined" ? localStorage.getItem(CUSTOM_PROMPT_KEY) || "" : ""
);
```

| File | Change |
|------|------|
| `frontend/.../useScriptExtraction.ts` | `useState` initializer gains an SSR safety guard |

---

## 🎬 Opening Secondary Title

### Overview

Adds an opening secondary title (secondary_title) shown below the main title, for supplementary explanation or a teaser. It has its own style settings (font, size, color, and so on), can be generated by AI together with the main title, is limited to 20 characters, appears only in the video frame, and is not part of the publishing title.

Naming convention: `secondary_title` on the backend (snake_case), `videoSecondaryTitle` on the frontend (camelCase), and "片头副标题" ("opening secondary title") in the user interface.

---

### Backend

| File | Change |
|------|------|
| `backend/app/modules/videos/schemas.py` | `GenerateRequest` adds 4 optional fields: `secondary_title`, `secondary_title_style_id`, `secondary_title_font_size`, `secondary_title_top_margin` |
| `backend/app/services/glm_service.py` | AI prompt now asks for a secondary title (at most 20 characters); JSON format adds the `secondary_title` field |
| `backend/app/modules/ai/router.py` | `GenerateMetaResponse` adds `secondary_title: str = ""`; the endpoint returns `result.get("secondary_title", "")` |
| `backend/app/modules/videos/workflow.py` | `use_remotion` condition adds `or req.secondary_title`; secondary title style resolution reuses `get_style("title", ...)`; font size/margin overrides; `prepare_style_for_remotion` handles the secondary title font; `remotion_service.render()` receives `secondary_title` + `secondary_title_style` |
| `backend/app/services/remotion_service.py` | `render()` adds `secondary_title` and `secondary_title_style` parameters and builds the CLI flags `--secondaryTitle` and `--secondaryTitleStyle` |

### Remotion

| File | Change |
|------|------|
| `remotion/render.ts` | `RenderOptions` adds `secondaryTitle?` + `secondaryTitleStyle?`; `parseArgs()` adds switch cases; `inputProps` adds both fields |
| `remotion/src/components/Title.tsx` | `TitleProps` adds `secondaryTitle?` and `secondaryTitleStyle?`; `AbsoluteFill` switches to `flexDirection: 'column'` for vertical stacking; a secondary title `<h2>` follows the main title `<h1>` with its own style (default font size 48px, weight 700) and shares the fade-in/fade-out animation; the secondary title font uses a separate `@font-face` (`SecondaryTitleFont`) to avoid clashing with the main title |
| `remotion/src/Video.tsx` | `VideoProps` adds `secondaryTitle?` + `secondaryTitleStyle?`; both are passed to the `<Title>` component; render condition changed to `{(title \|\| secondaryTitle) && ...}` |
| `remotion/src/Root.tsx` | `defaultProps` adds `secondaryTitle: undefined` + `secondaryTitleStyle: undefined` |

### Frontend

| File | Change |
|------|------|
| `frontend/src/shared/lib/title.ts` | Added `SECONDARY_TITLE_MAX_LENGTH = 20` and `clampSecondaryTitle()` |
| `frontend/src/features/home/model/useHomeController.ts` | New state `videoSecondaryTitle`, `selectedSecondaryTitleStyleId`, `secondaryTitleFontSize`, `secondaryTitleTopMargin`, `secondaryTitleSizeLocked`; new `secondaryTitleInput = useTitleInput({ maxLength: 20 })` (not synced to the publish page); `handleGenerateMeta()` receives and fills `secondary_title`; `handleGenerate()` adds the secondary title fields to the payload; all new state is exposed in the return value |
| `frontend/src/features/home/model/useHomePersistence.ts` | New localStorage keys `secondaryTitle`, `secondaryTitleStyle`, `secondaryTitleFontSize`, `secondaryTitleTopMargin`, with matching restore and save effects |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | Props add the secondary title fields; an "Opening secondary title (max 20 characters)" input below the main title input; secondary title style selector (reusing the titleStyles presets), font size slider (30-100px), margin slider (0-100px) |
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | Title preview switched to a flex column layout; a secondary title preview row below the main title, rendered with its own style |
| `frontend/src/features/home/ui/HomePage.tsx` | Destructures the new state from `useHomeController` and passes it to `TitleSubtitlePanel` |

---

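## 🐛 Reference Audio Upload: InvalidKey Fix for Chinese Filenames

### Problem

Uploading a reference audio with a Chinese name (for example "我的声音.wav") made Supabase Storage return `InvalidKey`, because the storage path used the original Chinese filename directly.

### Fix

A new `sanitize_filename()` function in `ref_audios/service.py` reduces the filename in the storage path to ASCII-safe characters (only `A-Za-z0-9._-`):

- NFKD normalization → drop non-ASCII → replace illegal characters with `_`
- When a purely Chinese/emoji name is empty after cleaning, fall back to an MD5 hash (for example `audio_e924b1193007`)
- Filename length capped at 50 characters
- The original Chinese filename is kept in metadata as the display name, so the frontend display is unaffected

```
Before: cbbe.../1771915755_我的声音.wav → InvalidKey
After:  cbbe.../1771915755_audio_xxxxxxxx.wav → upload succeeds
```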
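
A possible shape for such a helper is sketched below. It follows the listed rules, but the details are assumptions for illustration: the stem and extension are cleaned separately so the extension survives, the hash fallback uses the first 12 hex digits of the MD5 digest (matching the length of the example above), and leading or trailing `.`/`_` are stripped from the stem.

```python
import hashlib
import re
import unicodedata
from pathlib import PurePosixPath


def _to_ascii_safe(text: str) -> str:
    """NFKD-normalize, drop non-ASCII, replace other characters with '_'."""
    normalized = unicodedata.normalize("NFKD", text)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^A-Za-z0-9._-]", "_", ascii_only)


def sanitize_filename(filename: str, max_length: int = 50) -> str:
    """Return a storage-safe filename limited to A-Za-z0-9._- characters."""
    path = PurePosixPath(filename)
    safe_stem = _to_ascii_safe(path.stem).strip("._")
    safe_suffix = _to_ascii_safe(path.suffix)
    if not safe_stem:
        # Nothing usable is left (e.g. a purely Chinese name): hash fallback.
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()[:12]
        safe_stem = f"audio_{digest}"
    # Keep the extension intact and cap the overall length.
    stem_budget = max(1, max_length - len(safe_suffix))
    return safe_stem[:stem_budget] + safe_suffix
```

The original name would still be stored separately (for example in metadata) so the UI can show it unchanged.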

| File | Change |
|------|------|
| `backend/app/modules/ref_audios/service.py` | Added `sanitize_filename()`; the upload path uses the cleaned filename |
239
Docs/DevLogs/Day26.md
Normal file
@@ -0,0 +1,239 @@
## 🎨 Frontend Polish: Section Merging + Numbered Headings + UI Refinements (Day 26)

### Overview

The home page had 9 independent sections (7 in the left column + 2 in the right), each with its own card container and heading, which made the page look fragmented. Related sections are now merged into 5 main sections, Chinese ordinal numbers (一 to 十) are added, emoji icons are removed, and the layout and interaction details of several child components are refined.

---

## ✅ Changes

### 1. Section merging plan

**Left column (4 main sections + 2 standalone areas):**

| No. | Section | Sub-sections | Original component |
|------|--------|--------|--------|
| 一 (1) | Script extraction and editing | - | ScriptEditor |
| 二 (2) | Title & subtitles | - | TitleSubtitlePanel |
| 三 (3) | Voice-over | Voice-over mode / voice-over list | VoiceSelector + GeneratedAudiosPanel |
| 四 (4) | Material editing | Video materials / timeline editing | MaterialSelector + TimelineEditor |
| 五 (5) | Background music | - | BgmPanel |
| - | Generate button | - | GenerateActionBar (unnumbered) |

**Right column (1 main section):**

| No. | Section | Sub-sections | Original component |
|------|--------|--------|--------|
| 六 (6) | Works | Works list / works preview | HistoryList + PreviewPanel |

**Publish page (/publish):**

| No. | Section |
|------|--------|
| 七 (7) | Platform accounts |
| 八 (8) | Select work to publish |
| 九 (9) | Publishing info |
| 十 (10) | Select publishing platform |

### 2. embedded mode

6 components gain an `embedded?: boolean` prop (default `false`):

- `VoiceSelector`: when embedded, the outer card and main heading are not rendered
- `GeneratedAudiosPanel`: when embedded, a two-row layout: row 1 (speed + generate voice-over, right-aligned), row 2 (voice-over list + refresh)
- `MaterialSelector`: when embedded, renders its own h3 sub-heading "视频素材" (video materials) with the upload/refresh buttons on the same row
- `TimelineEditor`: when embedded, renders its own h3 sub-heading "时间轴编辑" (timeline editing) with the aspect-ratio/playback controls on the same row
- `PreviewPanel`: when embedded, the outer card and heading are not rendered
- `HistoryList`: when embedded, the outer card and heading are not rendered (the refresh button is provided by HomePage)

### 3. Numbered headings + emoji removal

All numbered sections drop their emoji icons and use plain Chinese ordinals:

- ScriptEditor: `✍️ 文案提取与编辑` → `一、文案提取与编辑`
- TitleSubtitlePanel: `🎬 标题与字幕` → `二、标题与字幕`
- BgmPanel: `🎵 背景音乐` → `五、背景音乐`
- HomePage right column: `五、作品` → `六、作品`
- PublishPage: `👤 平台账号` → `七、平台账号`, `📹 选择发布作品` → `八、选择发布作品`, `✍️ 发布信息` → `九、发布信息`, `📱 选择发布平台` → `十、选择发布平台`

### 4. Sub-heading and divider styles

- **Main heading**: `text-base sm:text-lg font-semibold text-white`
- **Sub-heading**: `text-sm font-medium text-gray-400`
- **Divider**: `<div className="border-t border-white/10 my-4" />`

### 5. Voice-over list layout

GeneratedAudiosPanel uses a two-row layout in embedded mode:

- **Row 1**: speed dropdown + generate voice-over button (right-aligned, `flex justify-end`)
- **Row 2**: `<h3>配音列表</h3>` + refresh button (justified to both ends)
- Non-embedded mode keeps the original single-row layout

### 6. TitleSubtitlePanel dropdown alignment

- The labels of the title style / secondary title style / subtitle style rows all use `w-20` (fixed 80px) so the dropdowns align vertically
- Dropdown width is `w-1/3 min-w-[100px]` to avoid being too wide

### 7. RefAudioPanel copy simplified

- The bottom paragraph "上传任意语音样本(3-10秒)…" moves next to the "我的参考音频" (my reference audio) heading and is shortened to `(上传3-10秒语音样本)` ("upload a 3-10 second voice sample")

### 8. Phone number in the account dropdown

- AccountSettingsDropdown adds a phone number area above the account expiry date
- It displays `user?.phone || '未知账户'` (fallback: "unknown account")

### 9. Title display mode applies to the secondary title

- **payload fix**: in `useHomeController.ts`, the condition for sending `title_display_mode` changes from `videoTitle.trim()` to `videoTitle.trim() || videoSecondaryTitle.trim()`, so the display mode is sent even when only a secondary title exists
- **UI change**: the brief/persistent dropdown moves from the opening-title input row to the "二、标题与字幕" section heading row (same row as the preview style button), making it clear that the setting applies to both the title and the secondary title
- Remotion's `Title.tsx` already supports this (title and secondary title render as one component, controlled by a single `displayMode`)

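### 10. Timeline blur mask

The mask moves from the outer wrapper into the "四、素材编辑" (material editing) card and covers only the timeline sub-area (`rounded-xl`).

### 11. User info available immediately after login

- AuthContext exposes a new `setUser` method to consumers
- After a successful login, the login page calls `setUser(result.user)` to write to the Context immediately, with no page refresh needed
- Fixes the account dropdown showing "未知账户" ("unknown account") after login and showing the phone number only after a refresh

### 12. Copy and option tweaks

- MaterialSelector description `(可多选,最多4个)` → `(上传自拍视频,最多可选4个)` ("upload selfie videos, select up to 4")
- TitleSubtitlePanel display mode options `短暂显示/常驻显示` → `标题短暂显示/标题常驻显示`

### 13. UI/UX improvements (6 items)

- **Action buttons visible on mobile**: action buttons in the voice-over list, works list, material list, reference audio, and script history change from `opacity-0` (shown only on hover) to `opacity-40` (semi-transparent by default, fully opaque on hover), so touch devices can discover them
- **Phone number masking**: AccountSettingsDropdown masks the middle four digits, `138****5678`
- **Title character counter**: the title/secondary title inputs in TitleSubtitlePanel show a live count such as `3/15` on the right, turning red when over the limit
- **List scrollbar hint**: ~~voice-over list, works list, material list, and BGM list changed from `hide-scrollbar` to `custom-scrollbar`~~ → all reverted to `hide-scrollbar` to hide the scrollbar (scrolling itself is unchanged)
- **Timeline drag hint**: TimelineEditor blocks gain a `GripVertical` grip icon in the top-left corner, hinting that they can be dragged to reorder
- **Larger trim handles**: ClipTrimmer handles grow from 16px to 20px, and the touch target from 32px to 40px

### 14. Code quality fixes (4 items)

- **AccountSettingsDropdown**: closing the password dialog now also clears state via `setSuccess('')`
- **MaterialSelector**: `selectedSet` wrapped in `useMemo` to avoid rebuilding on every render
- **TimelineEditor**: `visibleSegments`/`overflowSegments` wrapped in `useMemo`
- **MaterialSelector**: when 4 materials are selected, buttons of unselected items get `disabled`

### 15. Responsive layout for platform accounts on the publish page

- **Single-row layout**: icon + name + status on the left, buttons on the right (`flex items-center`)
- **Compact on mobile**: icon `h-6 w-6`, buttons `text-xs px-2 py-1 rounded-md`, spacing `space-y-2 px-3 py-2.5`
- **Roomier on desktop**: `sm:h-7 sm:w-7`, `sm:text-sm sm:px-3 sm:py-1.5 sm:rounded-lg`, `sm:space-y-3 sm:px-4 sm:py-3.5`
- Both sizes look clean and match the style of the other sections
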
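### 16. Mobile refresh-to-top fix

- **Problem**: after a refresh on mobile, the page did not return to the top but scrolled to the background music section
- **Root cause**: 1) the browser's native scroll restoration overrode `scrollTo(0,0)`; 2) the list scroll effect has two dependencies (`selectedId` + `list`), so when data loaded asynchronously the second trigger bypassed the ref guard and ran `scrollIntoView`, making the page jump
- **Fix**: three measures combined: ① `history.scrollRestoration = "manual"` disables native restoration; ② a time gate, the `scrollEffectsEnabled` ref (all list auto-scrolling is blocked for 1 second), replaces the one-shot ref guard; ③ a 200ms delayed `scrollTo(0,0)` as a fallback

### 17. Smaller style preview window on mobile

- **Problem**: on mobile, tapping "预览样式" (preview style) opened a window that filled the screen (358px wide, about 636px tall) and covered the style controls
- **Fix**: mobile width shrinks from `window.innerWidth - 32` to **160px**; the position moves from the top-left to the **bottom-right** (`right:12, bottom:12`) so it does not cover the controls above; maximum height is limited to `50dvh`
- Desktop is unchanged (280px, top-left)

### 18. List scrollbars hidden everywhere

- The 7 places changed to `custom-scrollbar` (thin purple scrollbar) earlier on Day 26 are all reverted to `hide-scrollbar`
- Affected: BgmPanel, GeneratedAudiosPanel, HistoryList, MaterialSelector (2 places), ScriptExtractionModal (2 places)
- Scrolling is unaffected; the scrollbar is only hidden visually

### 19. Voice-over buttons adapted for mobile

- VoiceSelector "选择声音/克隆声音" (select voice / clone voice) buttons: padding `px-4` → `px-2 sm:px-4`, font size adds `text-sm sm:text-base`, icons add `shrink-0`
- Fixes the buttons being squeezed on narrow mobile screens so that "克隆声音" was not visible

### 20. Material heading overflow fix

- The MaterialSelector embedded heading row drops `whitespace-nowrap`
- The description `(上传自拍视频,最多可选4个)` is hidden on mobile (`hidden sm:inline`) and shown as usual on desktop
- Fixes the refresh button being pushed outside its container on mobile

### 21. Larger generate voice-over button

- "生成配音" (generate voice-over) is a core action, so it moves from secondary-button size to primary-button size
- Padding `px-2/px-3 py-1/py-1.5` → `px-4 py-2`, font `text-xs` → `text-sm font-medium`
- Icon `h-3.5 w-3.5` → `h-4 w-4`, plus `shadow-sm` and hover `shadow-md`
- Enlarged in both embedded and non-embedded modes

### 22. Generation progress bar repositioned

- **Problem**: the progress bar sat inside the "六、作品" (works) card, below the works preview, and was not prominent enough
- **Fix**: the progress bar is extracted from PreviewPanel into the HomePage right column and rendered as a standalone card **above** the "六、作品" card
- It uses a purple border (`border-purple-500/30`) to stand out and shows the task message and percentage
- PreviewPanel no longer renders the progress bar in embedded mode (it receives `currentTask={null}`)
- The progress card disappears automatically when generation completes
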
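### 23. LatentSync timeout fix

- **Problem**: a video of about 2 minutes (3023 frames, 190 inference segments) was estimated to need 54 minutes of inference, but the httpx timeout was only 20 minutes, so the LatentSync call failed and the pipeline fell back to no lip sync
- **Root cause**: `httpx.AsyncClient(timeout=1200.0)` in `lipsync_service.py` is not long enough for long-video inference
- **Fix**: the timeout changes from `1200s` (20 minutes) to `3600s` (1 hour), enough to cover inference for 2-3 minute videos

### 24. Subtitle timestamp rhythm mapping (fixes subtitle drift in long videos)

- **Problem**: in a 2-minute video the subtitles were clearly out of sync with the speech, and the offset grew toward the end
- **Root cause**: the `original_text` handling in `whisper_service.py` discarded Whisper's per-word timestamps, kept only the overall time range, and interpolated linearly across it, giving every character the same duration and ignoring speech-rate changes and pauses entirely
- **Fix**: Whisper's per-character timestamps are kept as a speech rhythm template, and the original text's characters are mapped proportionally onto that rhythm (rhythm-mapping) instead of being spread evenly. The subtitle text is unchanged; only the timestamps follow the real speech rate
- **Algorithm**: the i-th character of the original text maps to position `(i/N)*M` on the Whisper timeline (N = number of original characters, M = number of Whisper characters), with linear interpolation between adjacent Whisper time points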
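
The mapping can be sketched in a few lines. The code below is one plausible reading of the algorithm, not the actual `whisper_service.py` implementation: the function name is hypothetical, Whisper's output is assumed to be a list of per-character `(start, end)` pairs, and the timeline is assumed to be made of each Whisper character's start time plus the final end time.

```python
from typing import List, Tuple


def map_text_to_rhythm(
    original_text: str,
    whisper_chars: List[Tuple[float, float]],
) -> List[Tuple[str, float, float]]:
    """Assign each original character a (start, end) on Whisper's rhythm."""
    n = len(original_text)
    m = len(whisper_chars)
    if n == 0 or m == 0:
        return []

    # M + 1 boundary times: the start of every Whisper character,
    # followed by the end of the last one.
    boundaries = [start for start, _ in whisper_chars] + [whisper_chars[-1][1]]

    def time_at(position: float) -> float:
        # Linear interpolation between adjacent Whisper time points.
        k = min(int(position), m - 1)
        frac = position - k
        return boundaries[k] + frac * (boundaries[k + 1] - boundaries[k])

    result = []
    for i, char in enumerate(original_text):
        start = time_at(i * m / n)        # position (i/N)*M
        end = time_at((i + 1) * m / n)    # start of the next character
        result.append((char, start, end))
    return result
```

With two Whisper characters lasting 1s and 2s and a four-character original text, the first two characters share the first second and the last two share the remaining two seconds, so slower passages automatically get longer subtitle slots.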

---

## 📁 Modified Files

| File | Change |
|------|------|
| `VoiceSelector.tsx` | New embedded prop; mobile button adaptation (`px-2 sm:px-4`) |
| `GeneratedAudiosPanel.tsx` | New embedded prop; two-row layout; action button visibility; larger "generate voice-over" button |
| `MaterialSelector.tsx` | New embedded prop; own sub-heading + action buttons; useMemo; disabled guard; action button visibility; heading overflow fix |
| `TimelineEditor.tsx` | New embedded prop; own sub-heading + controls; useMemo; drag grip icon |
| `PreviewPanel.tsx` | New embedded prop |
| `HistoryList.tsx` | New embedded prop; action button visibility |
| `ScriptEditor.tsx` | Numbered heading; emoji removed; action button visibility |
| `TitleSubtitlePanel.tsx` | Numbered heading; emoji removed; dropdown alignment; display mode dropdown moved up; character counter |
| `BgmPanel.tsx` | Numbered heading |
| `HomePage.tsx` | Core refactor: merged sections, numbered headings, generate voice-over button moved in, `scrollRestoration` + delayed fallback for refresh-to-top, progress bar extracted above the works card |
| `PublishPage.tsx` | Four sections numbered (七 to 十); emoji removed; responsive single-row platform cards |
| `RefAudioPanel.tsx` | Simplified hint copy; action button visibility |
| `AccountSettingsDropdown.tsx` | Phone number display (masked); success state cleared on close |
| `AuthContext.tsx` | New `setUser` method; user state updated immediately after login |
| `login/page.tsx` | Calls `setUser` with the user data after a successful login |
| `useHomeController.ts` | titleDisplayMode condition fix; list scroll time gate `scrollEffectsEnabled` |
| `FloatingStylePreview.tsx` | Mobile preview window shrunk (160px) and moved to the bottom-right |
| `ScriptExtractionModal.tsx` | Scrollbar reverted to hidden |
| `ClipTrimmer.tsx` | Larger slider handles; taller touch target |
| `lipsync_service.py` | httpx timeout changed from 1200s to 3600s |
| `whisper_service.py` | Subtitle timestamps changed from linear interpolation to Whisper rhythm mapping |

---

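## 🔍 Verification

- `npm run build`: zero errors, zero warnings
- Merged layout: sub-sections are clearly separated and main headings are numbered
- Backward compatible: `embedded` defaults to `false`, so standalone use of the components is unaffected
- Voice-over list two-row layout: speed + generate voice-over on top, voice-over list + refresh below
- Dropdowns align vertically
- Brief/persistent display applies to both the title and the secondary title
- Action buttons are visible on mobile (touch screens)
- Phone number is shown masked
- Title character counter works
- All list scrollbars are hidden
- Timeline drag grip icon is shown
- Publish page platform cards: compact on mobile, roomier on desktop, consistent style
- Mobile refresh returns to the top and no longer scrolls to the background music section
- Mobile style preview window does not cover the controls
- Both mobile voice-over buttons (select voice / clone voice) are visible
- Mobile material heading row buttons do not overflow
- The generate voice-over button ranks visually above secondary buttons
- The generation progress bar is shown as a standalone card above the works card
- LatentSync long-video inference no longer times out and falls back
- Subtitle timestamps follow the speech rhythm and do not drift in long videos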
231
Docs/DevLogs/Day27.md
Normal file
@@ -0,0 +1,231 @@
## Remotion Stroke Fix + Font Style Expansion + TypeScript Fixes (Day 27)

### Overview

Fixes title/subtitle stroke rendering (stroke too thick + ghosting on the secondary title), expands the font style options (titles 4→12, subtitles 4→8), and fixes TypeScript type errors in the Remotion project.

---

## ✅ Changes

### 1. Stroke rendering fix (title + subtitles)

- **Problem**: the black title stroke was too thick, and the secondary title showed ghosting
- **Root cause**: `buildTextShadow` simulated a stroke with `textShadow` in 4 directions. The diagonal offsets stack, so the stroke looks thicker than the actual `stroke_size`; the 4 corner directions also leave gaps and overlaps in between, which causes the ghosting
- **Fix**: switch to the native CSS stroke `-webkit-text-stroke` + `paint-order: stroke fill` (Remotion renders with Chromium, which supports both)
- **Old approach**:
  ```javascript
  textShadow: `-8px -8px 0 #000, 8px -8px 0 #000, -8px 8px 0 #000, 8px 8px 0 #000, 0 0 16px rgba(0,0,0,0.5), 0 2px 4px rgba(0,0,0,0.3)`
  ```
- **New approach**:
  ```javascript
  WebkitTextStroke: `5px #000000`,
  paintOrder: 'stroke fill',
  textShadow: `0 2px 4px rgba(0,0,0,0.3)`,
  ```
- The `stroke_size` of all preset styles is also reduced from 8 to 5, which looks cleaner with the native stroke

### 2. Font style expansion

**Title styles**: 4 → 12 (+8)

| ID | Style name | Font | Colors |
|----|--------|------|------|
| title_pangmen | 庞门正道 | 庞门正道标题体3.0 | White text, black stroke |
| title_round | 优设标题圆 | 优设标题圆 | White text, purple stroke |
| title_alibaba | 阿里数黑体 | 阿里巴巴数黑体 | White text, black stroke |
| title_chaohei | 文道潮黑 | 文道潮黑 | Cyan-blue text, dark blue stroke |
| title_wujie | 无界黑 | 标小智无界黑 | White text, dark gray stroke |
| title_houdi | 厚底黑 | Aa厚底黑 | Red text, deep black stroke |
| title_banyuan | 寒蝉半圆体 | 寒蝉半圆体 | White text, black stroke |
| title_jixiang | 欣意吉祥宋 | 字体圈欣意吉祥宋 | Gold text, brown stroke |

**Subtitle styles**: 4 → 8 (+4)

| ID | Style name | Font | Highlight color |
|----|--------|------|--------|
| subtitle_pink | 少女粉 | DingTalk JinBuTi | Pink #FF69B4 |
| subtitle_lime | 清新绿 | DingTalk Sans | Neon green #76FF03 |
| subtitle_gold | 金色隶书 | 阿里妈妈刀隶体 | Gold #FDE68A |
| subtitle_kai | 楷体红字 | SimKai | Red #FF4444 |

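### 3. TypeScript Type Error Fixes

- **Root.tsx**: the `Composition` generic type did not match the `calculateMetadata` parameter type. Inlined `calculateMetadata` with explicitly annotated parameter types; `defaultProps` is constrained with `satisfies VideoProps`
- **Video.tsx**: added a `[key: string]: unknown` index signature to the `VideoProps` interface to satisfy the `Record<string, unknown>` constraint Remotion requires
- **VideoLayer.tsx**: the `OffthreadVideo` component does not support the `loop` prop, so it was removed (the prop was being ignored anyway)

### 4. Progress Bar Text Reverted

- **Problem**: the progress bar showed the detailed stage messages pushed by the backend (e.g. "正在合成唇型", "synthesizing lip shapes"), but users wanted to see only "正在AI生成中..." ("AI generation in progress...")
- **Fix**: in `HomePage.tsx`, the progress bar text changed from `{currentTask.message || "正在AI生成中..."}` to the fixed `正在AI生成中...`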
|
||||
|
||||
---
|
||||
|
||||
## 📁 Modified Files

| File | Change |
|------|--------|
| `remotion/src/components/Title.tsx` | `buildTextShadow` → `buildStrokeStyle` (native CSS stroke), applied to both the title and the secondary title |
| `remotion/src/components/Subtitles.tsx` | `buildTextShadow` → `buildStrokeStyle` (native CSS stroke) |
| `remotion/src/Root.tsx` | Fixed the `Composition` generic type and the `calculateMetadata` parameter types |
| `remotion/src/Video.tsx` | Added an index signature to `VideoProps` |
| `remotion/src/components/VideoLayer.tsx` | Removed the `loop` prop, which `OffthreadVideo` does not support |
| `backend/assets/styles/title.json` | Title styles expanded from 4 to 12, `stroke_size` 8→5 |
| `backend/assets/styles/subtitle.json` | Subtitle styles expanded from 4 to 8 |
| `frontend/.../HomePage.tsx` | Progress bar text reverted to the fixed "正在AI生成中..." |

---

## 🔍 Verification

- `npx tsc --noEmit`: zero errors
- `npm run build:render`: render script compiles successfully
- `npm run build` (frontend): zero errors
- Stroke: title, secondary title, and subtitles use the native CSS stroke, with no ghosting and no bloated outlines
- Style selection: the frontend dropdown loads all 12 title styles and 8 subtitle styles
|
||||
|
||||
---
|
||||
|
||||
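## Video Generation Pipeline Performance Optimization

### Overview

A broad performance pass over the video generation pipeline, covering FFmpeg encoding parameters, LatentSync inference parameters, multi-material parallelization, and parallelization of the post-processing stage. Estimated results: a 15s single-material video drops from ~280s to ~190s (32%), and a 30s two-material video from ~400s to ~240s (40%).

**Server configuration**: 2x RTX 3090 (24GB), 2x Xeon E5-2680 v4 (56 logical cores), 192GB RAM

### Phase 1: FFmpeg Encoding Optimization

**Final composition preset `slow` → `medium`**
- The composition stage drops from ~50s to ~25s with almost no change in quality

**Intermediate files CRF 18 → 23**
- Intermediate artifacts (trim, prepare_segment, concat, loop, normalize_orientation) are not the final output and do not need high-quality encoding
- Each intermediate step is 3-8 seconds faster

**Final composition CRF 18 → 20**
- For a 15-second talking-head video, CRF 18 and CRF 20 are indistinguishable to the naked eye

### Phase 2: LatentSync Inference Parameter Tuning

**inference_steps 20 → 16**
- Inference time falls linearly by 20% (~180s → ~144s)

**guidance_scale 2.0 → 1.5**
- Lowers the classifier-free guidance weight; per-step compute is estimated to drop slightly (5-10%)

> ⚠️ Both changes require restarting the LatentSync service and then testing lip-sync quality; keep them only if the quality is acceptable. If it is not, revert the .env parameters.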
|
||||
|
||||
### Phase 3: Multi-Material Pipeline Parallelization

**Parallel material download + normalization**
- Replaced the serial `for` loop with `asyncio.gather()`; `normalize_orientation` runs in a thread pool via `run_in_executor`
- N materials: from serial N×5s to ~5s

**Parallel segment preprocessing**
- Per-segment `prepare_segment` calls replaced with `asyncio.gather()` + `run_in_executor`
- 2 materials ~90s → ~50s; 4 materials ~180s → ~60s
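
The `asyncio.gather()` + `run_in_executor` pattern described in this phase can be sketched as follows. This is a minimal illustration under assumptions, not the code in `workflow.py`: `prepare_segment` here is a hypothetical stand-in that sleeps instead of invoking FFmpeg, and sizing the pool to the number of segments is an assumption.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def prepare_segment(index: int) -> str:
    """Hypothetical stand-in for a blocking FFmpeg call on one segment."""
    time.sleep(0.1)  # simulate blocking encode work
    return f"segment_{index}.mp4"


async def prepare_all(count: int) -> list[str]:
    """Run all blocking segment jobs in a thread pool and wait for them together."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [
            loop.run_in_executor(pool, prepare_segment, i) for i in range(count)
        ]
        # gather() returns results in submission order, not completion order.
        return await asyncio.gather(*futures)


if __name__ == "__main__":
    print(asyncio.run(prepare_all(4)))
```

Because results come back in submission order, segments can be concatenated without re-sorting, and wall time approaches that of the slowest single job rather than the sum of all jobs.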
|
||||
|
||||
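### Phase 4: Pipeline Overlap

**Whisper subtitle alignment and BGM mixing in parallel**
- The two are independent (both depend only on audio_path), so they run in parallel with `asyncio.gather()`
- In single-material mode, Whisper moves from a serial step after LatentSync to running in parallel with BGM mixing
- Behavior is unchanged when BGM or subtitles are disabled; the two run in parallel only when both are enabled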
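
One way to express the "parallel only when both are enabled" behavior is sketched below. It rests on assumptions: `align_subtitles` and `mix_bgm` are hypothetical placeholders for the Whisper alignment and BGM mixing steps, and the output file names are invented for illustration.

```python
import asyncio
from typing import Optional


async def align_subtitles(audio_path: str) -> str:
    """Hypothetical placeholder for Whisper subtitle alignment."""
    await asyncio.sleep(0.05)
    return f"{audio_path}.captions.json"


async def mix_bgm(audio_path: str) -> str:
    """Hypothetical placeholder for BGM mixing."""
    await asyncio.sleep(0.05)
    return f"{audio_path}.mixed.wav"


async def _skip() -> None:
    """No-op used when a step is disabled, so gather() keeps a fixed shape."""
    return None


async def post_process(
    audio_path: str, subtitles_enabled: bool, bgm_enabled: bool
) -> tuple[Optional[str], Optional[str]]:
    # Both steps depend only on audio_path, so they can overlap.
    captions_job = align_subtitles(audio_path) if subtitles_enabled else _skip()
    bgm_job = mix_bgm(audio_path) if bgm_enabled else _skip()
    captions, mixed = await asyncio.gather(captions_job, bgm_job)
    return captions, mixed
```

With only one step enabled, the disabled slot resolves immediately to `None`, so the call behaves like the original serial path.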
|
||||
|
||||
### Modified Files

| File | Change |
|------|--------|
| `backend/app/services/video_service.py` | compose: preset slow→medium, CRF 18→20; normalize_orientation/prepare_segment/concat: CRF 18→23 |
| `backend/app/services/lipsync_service.py` | _loop_video_to_duration: CRF 18→23 |
| `backend/.env` | LATENTSYNC_INFERENCE_STEPS=16, LATENTSYNC_GUIDANCE_SCALE=1.5 |
| `backend/app/modules/videos/workflow.py` | import asyncio; parallel material download/normalization; parallel segment preprocessing; Whisper + BGM in parallel |

### Rollback Plan

- FFmpeg parameters: if image quality is unsatisfactory, set the final CRF back to 18 and the preset back to slow
- LatentSync: if lip-sync quality drops, set `LATENTSYNC_INFERENCE_STEPS` back to 20 and `LATENTSYNC_GUIDANCE_SCALE` back to 2.0 in .env
- Parallelization: purely architectural, no quality impact, no rollback needed
|
||||
|
||||
---
|
||||
|
||||
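## MuseTalk + LatentSync Hybrid Lip-Sync

### Overview

LatentSync 1.6 produces high quality but its inference is extremely slow (~78% of total generation time); long videos (>= 2 min) take 20-60 minutes, which is unacceptable. MuseTalk 1.5 uses single-step latent-space inpainting (not a diffusion model), and its per-frame inference runs near real time (30fps+ on a V100), which suits long videos. The hybrid scheme routes automatically by audio duration: short videos use LatentSync for quality, long videos use MuseTalk for speed.

### Architecture

- **Routing threshold**: `LIPSYNC_DURATION_THRESHOLD` (default 120s)
- **Short videos (<120s)**: LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=120s)**: MuseTalk 1.5 (GPU0, port 8011)
- **Fallback**: automatically falls back to LatentSync when MuseTalk is unavailable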
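
The routing rule above amounts to a small decision function. The sketch below assumes a hypothetical helper name, `choose_lipsync_backend`, and a boolean availability flag; only the 120s default and the fallback behavior come from the notes, and the real logic in `lipsync_service.py` may differ.

```python
LIPSYNC_DURATION_THRESHOLD = 120.0  # seconds; the documented default


def choose_lipsync_backend(
    audio_duration: float,
    musetalk_available: bool,
    threshold: float = LIPSYNC_DURATION_THRESHOLD,
) -> str:
    """Pick a backend by audio duration, falling back to LatentSync.

    Hypothetical helper for illustration only.
    """
    if audio_duration >= threshold and musetalk_available:
        return "musetalk"
    # Short audio, or MuseTalk is down: use LatentSync.
    return "latentsync"


if __name__ == "__main__":
    for seconds, available in [(30, True), (120, True), (136, False)]:
        print(seconds, available, choose_lipsync_backend(seconds, available))
```

Keeping the threshold as a parameter mirrors the fact that it is read from configuration rather than hard-coded.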
|
||||
|
||||
### Changed Files

| File | Change |
|------|--------|
| `models/MuseTalk/` | Code copied from Temp/MuseTalk + weights downloaded |
| `models/MuseTalk/scripts/server.py` | New persistent FastAPI service (port 8011, GPU0) |
| `backend/app/core/config.py` | Added MUSETALK_* and LIPSYNC_DURATION_THRESHOLD |
| `backend/.env` | Added the corresponding environment variables |
| `backend/app/services/lipsync_service.py` | Added `_call_musetalk_server()` + hybrid routing logic + extended `check_health()` |
|
||||
|
||||
---
|
||||
|
||||
## MuseTalk Inference Performance Optimization (server.py v2)

### Overview

The first long-video test of MuseTalk (136s, 3404 frames) took 1799s (~30 minutes). Profiling showed the bottlenecks were face detection (28%), BiSeNet compositing (22%), and I/O (17%), rather than UNet inference itself (17%). Six optimizations are estimated to bring this down to 8-10 minutes (~3x speedup).

### Bottleneck Analysis (before optimization, 1799s)

| Stage | Time | Share | Bottleneck |
|------|------|------|---------|
| DWPose + face detection | ~510s | 28% | `batch_size_fa=1`, 2 neural networks per frame, fully serial |
| Compositing + BiSeNet face parsing | ~400s | 22% | BiSeNet + PNG write on every frame |
| UNet inference | ~300s | 17% | batch_size=8 is too small |
| I/O (PNG read/write + FFmpeg) | ~300s | 17% | Slow PNG compression, ffmpeg→PNG→imread chain |
| VAE encoding | ~100s | 6% | Per-frame encoding, not batched |

### 6 Optimizations

| # | Optimization | Details |
|---|--------|------|
| 1 | **batch_size 8→32** | Changed in `.env`; the RTX 3090 has ample VRAM |
| 2 | **Read frames directly with cv2.VideoCapture** | Skips the ffmpeg→PNG→imread chain, saving 3404 PNG encode/decode round trips |
| 3 | **Subsampled face detection (every 5 frames)** | Run DWPose + FaceAlignment every 5 frames; linearly interpolate the bbox for frames in between |
| 4 | **BiSeNet mask caching (every 5 frames)** | Run `get_image_prepare_material` every 5 frames; in-between frames reuse the cached mask via `get_image_blending` |
| 5 | **Write directly with cv2.VideoWriter** | Skips per-frame PNG writes + ffmpeg re-encoding; VideoWriter writes the mp4 directly |
| 6 | **Per-stage timing** | Precise timing for 7 stages to guide further tuning |
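
The subsampling idea in item 3 can be sketched as follows. This is an illustration only, not the `_detect_faces_subsampled()` in `server.py`: `detect` is a hypothetical callback standing in for DWPose + FaceAlignment, and treating the last frame as an extra key frame is an assumption made so that every frame has a detected neighbour on both sides.

```python
from typing import Callable

BBox = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def detect_faces_subsampled(
    num_frames: int, detect: Callable[[int], BBox], stride: int = 5
) -> list[BBox]:
    """Detect on every `stride`-th frame and linearly interpolate in between."""
    if num_frames <= 0:
        return []

    # Key frames: every stride-th frame, plus the final frame (assumption).
    key_indices = list(range(0, num_frames, stride))
    if key_indices[-1] != num_frames - 1:
        key_indices.append(num_frames - 1)
    key_boxes = {i: detect(i) for i in key_indices}

    boxes: list[BBox] = []
    for left, right in zip(key_indices, key_indices[1:]):
        span = right - left
        for frame in range(left, right):
            t = (frame - left) / span  # 0.0 at the left key frame
            boxes.append(
                tuple(
                    (1 - t) * a + t * b
                    for a, b in zip(key_boxes[left], key_boxes[right])
                )
            )
    boxes.append(key_boxes[key_indices[-1]])  # last key frame itself
    return boxes
```

For an 11-frame clip with the default stride, the detector runs on frames 0, 5, and 10 only; the other eight boxes are interpolated.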
|
||||
|
||||
### Modified Files

| File | Change |
|------|--------|
| `models/MuseTalk/scripts/server.py` | Completely rewrote `_run_inference()`, added `_detect_faces_subsampled()` |
| `backend/.env` | `MUSETALK_BATCH_SIZE` 8→32 |
|
||||
|
||||
---
|
||||
|
||||
## Remotion Concurrent Rendering Optimization

### Overview

With the server's 56 logical cores, Remotion rendering used only 8 concurrent workers by default (`min(8, cores/2)`). This was raised to 16, with an estimated drop from ~5 minutes to ~2-3 minutes.

### Changes

- `remotion/render.ts`: `renderMedia()` now takes a `concurrency` option (default 16), overridable with the `--concurrency` CLI argument
- `remotion/dist/render.js`: recompiled

### Modified Files

| File | Change |
|------|--------|
| `remotion/render.ts` | Added a `concurrency` field to `RenderOptions`; `renderMedia()` is passed `concurrency` |
| `remotion/dist/render.js` | Recompiled from TypeScript |
|
||||
203
Docs/DevLogs/Day28.md
Normal file
@@ -0,0 +1,203 @@
|
||||
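## CosyVoice FP16 Acceleration + Documentation Update + AI Rewrite UI Refactor + Title/Subtitle Panel Reorder with Video Frame Preview (Day 28)

### Overview

Enabled FP16 half-precision inference for the CosyVoice 3.0 voice-cloning service, with an estimated 30-40% speedup. Updated 4 project documents to match. Refactored the AI script-rewrite UI (two-step RewriteModal flow + logic extracted from ScriptExtractionModal). On the frontend, moved the "标题与字幕" (Title & Subtitles) panel from step 2 to step 4 (after material editing), and changed the style preview window background from a purple-pink gradient to a screenshot of the video's opening frame, so the preview is WYSIWYG.

---

## ✅ Changes

### 1. CosyVoice FP16 Half-Precision Acceleration

- **Problem**: CosyVoice 3.0 ran in full FP32 precision with an RTF (Real-Time Factor) of about 0.9-1.35x; generating 2 minutes of audio took about 2 minutes
- **Root cause**: `AutoModel()` was initialized without `fp16=True`, so both LLM inference and Flow Matching (DiT) ran in FP32
- **Fix**: a one-line change enables FP16 automatic mixed precision

```python
# Old: _model = AutoModel(model_dir=str(MODEL_DIR))
# New:
_model = AutoModel(model_dir=str(MODEL_DIR), fp16=True)
```

- **How it takes effect**: `CosyVoice3Model` uses `torch.cuda.amp.autocast(self.fp16)` in `llm_job()` and `token2wav()` to run the computation in FP16 automatically
- **Expected effect**:
  - 30-40% faster inference
  - ~30% lower VRAM usage
  - Essentially no loss in speech quality (FP16 precision is sufficient for a 0.5B model)
- **Verification**: self-check passed after the service restart; health check reports `ready: true`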
|
||||
|
||||
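### 2. Full Documentation Update (4 files)

Added the Day 27 additions (MuseTalk hybrid lip-sync, performance optimizations, Remotion concurrent rendering, and so on) to all related documents.

#### README.md
- Project description updated to "LatentSync 1.6 + MuseTalk 1.5 hybrid lip-sync"
- Lip-sync feature description changed to the hybrid scheme (LatentSync for short videos, MuseTalk for long videos)
- Tech stack table: added MuseTalk 1.5
- Project structure: added `models/MuseTalk/`
- Service architecture table: added MuseTalk (port 8011)
- Documentation hub: added a link to the MuseTalk deployment guide
- Performance optimization description: added subsampled detection + Remotion 16-way concurrency

#### DEPLOY_MANUAL.md
- GPU allocation notes updated (GPU0=MuseTalk+CosyVoice, GPU1=LatentSync)
- Step 3 split into 3a (LatentSync) + 3b (MuseTalk)
- Environment variable table: added 7 MuseTalk variables, removed the obsolete `DOUYIN_COOKIE`
- LatentSync inference steps default 20→16
- Test run: added a terminal for starting MuseTalk
- PM2 management: added the MuseTalk service (item 5)
- Port check and log viewing commands: added 8011/vigent2-musetalk

#### SUBTITLE_DEPLOY.md
- Architecture diagram updated to LatentSync/MuseTalk hybrid routing
- Added a description of lip-sync routing
- Remotion configuration table: added the `concurrency` parameter (default 16)
- GPU allocation notes updated
- Changelog: added a v1.3.0 entry

#### BACKEND_README.md
- Health check endpoint description updated to cover LatentSync + MuseTalk + the hybrid routing threshold
- Environment variable configuration: added MuseTalk-related variables
- Service integration guide: added a "唇形同步混合路由" (hybrid lip-sync routing) section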
|
||||
|
||||
---
|
||||
|
||||
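### 3. AI Script-Rewrite UI Refactor

#### RewriteModal Refactor

Turned the AI rewrite modal into a two-step flow to improve the interaction:

**Step 1: Configure and trigger**:
- Custom prompt input (optional), automatically persisted to localStorage
- The "开始改写" (Start Rewrite) button triggers the `/api/ai/rewrite` request

**Step 2: Compare results and choose**:
- Top: AI rewrite result + "使用此结果" (Use This Result) button (purple-pink gradient, prominent)
- Bottom: original text for comparison + "保留原文" (Keep Original) button (gray, low-key)
- Footer: "重新改写" (Rewrite Again) returns to step 1 and keeps the custom prompt
- ESC shortcut closes the modal

#### ScriptExtractionModal Logic Extraction

Extracted all business logic of the script-extraction modal into a standalone hook, `useScriptExtraction`:

- **useScriptExtraction.ts** (new): manages dual URL/file input modes, drag-and-drop upload, the extraction request, the step state machine (config → processing → result), and clipboard copy
- **ScriptExtractionModal.tsx**: pure presentational component that consumes the hook's return value; added ESC/Enter shortcuts

#### ScriptEditor Toolbar Adjustments

- Button group right-aligned (`justify-end`), with uniform height `h-7` and corner radius
- The "历史文案" (Script History) button uses gray (bg-gray-600) to mark it as auxiliary
- "文案提取助手" (Script Extraction Assistant) uses purple (bg-purple-600) to mark it as the primary feature
- "AI多语言" (AI Multilingual) uses a green gradient (emerald-teal); "AI生成标题标签" (AI Title & Tag Generation) uses a blue gradient (blue-cyan)
- "AI智能改写" (AI Smart Rewrite) and "保存文案" (Save Script) moved to the status bar below the text box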
|
||||
|
||||
---
|
||||
|
||||
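### 4. Title/Subtitle Panel Reorder + Video Frame Background Preview

#### Panel Reorder

Moved `<TitleSubtitlePanel>` from step 2 to step 4 (after material editing), so that by the time users set title and subtitle styles they have already selected materials and arranged the timeline.

New order:
```
一. Script extraction and editing (unchanged)
二. Voiceover (was 三)
三. Material editing (was 四)
四. Title & subtitles (was 二) → moved after material editing
```

#### New useVideoFrameCapture Hook

Captures the frame at 0.1s from a video URL and returns a JPEG data URL:

- Creates a `<video>` element with `crossOrigin="anonymous"` (materials are stored at a cross-origin Supabase Storage address)
- Binds the `loadedmetadata` / `canplay` / `seeked` / `error` listeners first, then sets src (to avoid missing events)
- Seeks to 0.1s once `loadedmetadata` or `canplay` fires; the `seeked` callback captures the frame with canvas `drawImage`
- Scales the canvas to 480px wide before encoding (the preview window is at most 280px wide, which saves memory)
- Exports with `canvas.toDataURL("image/jpeg", 0.7)`
- Guards against the edge case where `videoWidth/videoHeight` is 0
- try-catch guards against a tainted canvas; returns null on failure (falls back to the gradient)
- An `isActive` flag plus a `seeked` dedupe flag prevent stale and repeated updates
- Cleans up the video element after capture to free memory

#### On-Demand Capture (performance optimization)

Capture is triggered only while the style preview window is open:

```typescript
const materialPosterUrl = useVideoFrameCapture(
  showStylePreview ? firstTimelineMaterialUrl : null
);
```

The capture source prefers the **first timeline segment's material** (the real opening after the user's drag-and-drop reordering) and falls back to `selectedMaterials[0]` (when no voiceover has been generated and the timeline is empty).

#### Preview Background Replacement

When a video frame is available, `FloatingStylePreview` shows the original frame directly (no translucent overlay, so colors stay true), and text relies on its stroke for legibility; without a video frame it falls back to the original purple-pink gradient background.

#### Pitfalls

1. **CORS tainted canvas**: material files are stored in Supabase Storage (`api.hbyrkj.top`) behind cross-origin signed URLs. `video.crossOrigin = "anonymous"` must be set, otherwise canvas `toDataURL` is blocked with a SecurityError
2. **Empty timeline**: `useTimelineEditor` returns an empty array when `audioDuration <= 0` (no voiceover selected), so a fallback to `selectedMaterials[0]` is needed
3. **Listener order**: bind the event listeners before setting `video.src`, otherwise events can be missed when the video loads quickly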
|
||||
|
||||
---
|
||||
|
||||
## 📁 Modified Files

| File | Change |
|------|--------|
| `models/CosyVoice/cosyvoice_server.py` | Added the `fp16=True` argument to `AutoModel()` |
| `README.md` | Updated the hybrid lip-sync description, tech stack, service architecture, and project structure |
| `Docs/DEPLOY_MANUAL.md` | MuseTalk deployment steps, environment variables, PM2 management, port checks |
| `Docs/SUBTITLE_DEPLOY.md` | Architecture diagram, Remotion concurrency, GPU allocation, changelog |
| `Docs/BACKEND_README.md` | Health check, environment variables, hybrid routing section |
| `frontend/.../RewriteModal.tsx` | Two-step rewrite flow (custom prompt → result comparison) |
| `frontend/.../script-extraction/useScriptExtraction.ts` | **New**: script-extraction logic hook |
| `frontend/.../ScriptExtractionModal.tsx` | Pure presentational component that consumes the hook; added shortcuts |
| `frontend/.../ScriptEditor.tsx` | Toolbar right-aligned + color-coded buttons + rewrite/save moved to the bottom |
| `frontend/.../useVideoFrameCapture.ts` | **New**: video frame capture hook, crossOrigin + canvas scaling |
| `frontend/.../useHomeController.ts` | Added useMemo to compute the material URL, calls the frame capture hook, gated by showStylePreview |
| `frontend/.../HomePage.tsx` | Panel reorder (former 二 moved to 四), numbering updated, passes materialPosterUrl through |
| `frontend/.../TitleSubtitlePanel.tsx` | Numbering "二"→"四", added the previewBackgroundUrl prop |
| `frontend/.../FloatingStylePreview.tsx` | Added the previewBackgroundUrl prop; conditionally renders the video frame or the gradient background |

---

## 🔍 Verification

- CosyVoice restarted successfully; health check `{"ready": true}`
- Self-check inference passed (7.2s for "你好")
- FP16 takes effect in the LLM and Flow Matching stages via `torch.cuda.amp.autocast(self.fp16)`
- `npx tsc --noEmit`: zero errors
- AI rewrite: custom prompt persisted → rewrite result + original comparison → "使用此结果"/"保留原文"
- Script extraction: URL / file dual modes → processing animation → result filled in
- Panel order: 一→script, 二→voiceover, 三→material editing, 四→title & subtitles
- Style preview background: shows the real opening frame when a material exists; falls back to the purple-pink gradient otherwise
- No capture is triggered while the preview is closed, so no resources are wasted
|
||||
|
||||
---
|
||||
|
||||
## 💡 CosyVoice Performance Analysis Notes

### Current Performance Baseline (FP32, before optimization)

| Text length | Audio duration | Inference time | RTF |
|----------|----------|----------|-----|
| 42 chars | 9.8s | 13.2s | 1.35x |
| 89 chars | 18.2s | 20.3s | 1.12x |
| ~530 chars | 115.8s | 107.7s | 0.93x |
| ~670 chars | 143.5s | 131.6s | 0.92x |
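
RTF in this table is inference time divided by audio duration, so values below 1 mean faster than real time. A quick check of the four rows under that definition (the function name is an assumption for illustration):

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return inference_seconds / audio_seconds


# (audio duration, inference time) pairs from the FP32 baseline table.
BASELINE = [(9.8, 13.2), (18.2, 20.3), (115.8, 107.7), (143.5, 131.6)]

rtfs = [round(real_time_factor(inference, audio), 2) for audio, inference in BASELINE]

if __name__ == "__main__":
    print(rtfs)  # [1.35, 1.12, 0.93, 0.92]
```

The downward trend with longer text suggests a fixed per-request overhead that is amortized over longer audio.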
|
||||
|
||||
### Possible Future Optimizations (diminishing returns, not planned for now)

| Optimization | Expected gain | Complexity |
|--------|----------|--------|
| TensorRT (DiT module) | +20-30% | Requires compiling a .plan engine |
| torch.compile() | +10-20% | One line of code, but slow first compile |
| vLLM (LLM module) | +10-15% | Extra dependency |
|
||||
@@ -389,7 +389,7 @@ if not qr_element:
|
||||
|
||||
## 📋 文档规则优化 (16:42 - 17:10)
|
||||
|
||||
**问题**:Doc_Rules需要优化,避免误删历史内容、规范工具使用、防止任务清单遗漏
|
||||
**问题**:DOC_RULES需要优化,避免误删历史内容、规范工具使用、防止任务清单遗漏
|
||||
|
||||
**优化内容(最终版)**:
|
||||
|
||||
@@ -411,7 +411,7 @@ if not qr_element:
|
||||
- 移除无关项目组件
|
||||
|
||||
**修改文件**:
|
||||
- `Docs/Doc_Rules.md` - 包含检查清单的最终完善版
|
||||
- `Docs/DOC_RULES.md` - 包含检查清单的最终完善版
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -228,7 +228,7 @@ else:
|
||||
|
||||
| 文件 | 说明 | 状态 |
|
||||
|------|------|------|
|
||||
| `src/lib/auth.ts` | 认证工具函数 | ✅ |
|
||||
| `src/shared/lib/auth.ts` | 认证工具函数 | ✅ |
|
||||
| `src/app/login/page.tsx` | 登录页 | ✅ |
|
||||
| `src/app/register/page.tsx` | 注册页 | ✅ |
|
||||
| `src/app/admin/page.tsx` | 管理后台 | ✅ |
|
||||
|
||||
@@ -8,8 +8,8 @@
|
||||
|
||||
| 规则 | 说明 |
|
||||
|------|------|
|
||||
| **默认更新** | 只更新 `DayN.md` |
|
||||
| **按需更新** | `task_complete.md` 仅在用户**明确要求**时更新 |
|
||||
| **默认更新** | 更新 `DayN.md` 和 `TASK_COMPLETE.md` |
|
||||
| **按需更新** | 其他文档仅在内容变化涉及时更新 |
|
||||
| **智能修改** | 错误→替换,改进→追加(见下方详细规则) |
|
||||
| **先读后写** | 更新前先查看文件当前内容 |
|
||||
| **日内合并** | 同一天的多次小修改合并为最终版本 |
|
||||
@@ -23,12 +23,14 @@
|
||||
| 优先级 | 文件路径 | 检查重点 |
|
||||
| :---: | :--- | :--- |
|
||||
| 🔥 **High** | `Docs/DevLogs/DayN.md` | **(最新日志)** 详细记录变更、修复、代码片段 |
|
||||
| 🔥 **High** | `Docs/task_complete.md` | **(任务总览)** 更新 `[x]`、进度条、时间线 |
|
||||
| 🔥 **High** | `Docs/TASK_COMPLETE.md` | **(任务总览)** 更新 `[x]`、进度条、时间线 |
|
||||
| ⚡ **Med** | `README.md` | **(项目主页)** 功能特性、技术栈、最新截图 |
|
||||
| ⚡ **Med** | `Docs/DEPLOY_MANUAL.md` | **(部署手册)** 环境变量、依赖包、启动命令变更 |
|
||||
| ⚡ **Med** | `Docs/BACKEND_DEV.md` | **(后端规范)** 接口契约、模块划分、环境变量 |
|
||||
| ⚡ **Med** | `Docs/BACKEND_README.md` | **(后端文档)** 接口说明、架构设计 |
|
||||
| ⚡ **Med** | `Docs/FRONTEND_DEV.md` | **(前端规范)** API封装、日期格式化、新页面规范 |
|
||||
| 🧊 **Low** | `Docs/implementation_plan.md` | **(实施计划)** 核对计划与实际实现的差异 |
|
||||
| 🧊 **Low** | `frontend/README.md` | **(前端文档)** 新页面路由、组件用法、UI变更 |
|
||||
| ⚡ **Med** | `Docs/FRONTEND_README.md` | **(前端文档)** 功能说明、页面变更 |
|
||||
| 🧊 **Low** | `Docs/*_DEPLOY.md` | **(子系统部署)** LatentSync/CosyVoice/字幕等独立部署文档 |
|
||||
|
||||
---
|
||||
|
||||
@@ -93,7 +95,7 @@
|
||||
|
||||
### 必须执行的检查步骤
|
||||
|
||||
**1. 快速浏览全文**(使用 `view_file` 或 `grep_search`)
|
||||
**1. 快速浏览全文**(使用 `Read` 或 `Grep`)
|
||||
```markdown
|
||||
# 检查是否存在:
|
||||
- 同主题的旧章节?
|
||||
@@ -140,62 +142,41 @@
|
||||
|
||||
> **核心原则**:使用正确的工具,避免字符编码问题
|
||||
|
||||
### ✅ 推荐工具:replace_file_content
|
||||
### ✅ 推荐工具:Edit / Read / Grep
|
||||
|
||||
**使用场景**:
|
||||
- 追加新章节到文件末尾
|
||||
- 修改/替换现有章节内容
|
||||
- 更新状态标记(🔄 → ✅)
|
||||
- 修正错误内容
|
||||
|
||||
**优势**:
|
||||
- ✅ 自动处理字符编码(Windows CRLF)
|
||||
- ✅ 精确替换,不会误删其他内容
|
||||
- ✅ 有错误提示,方便调试
|
||||
- `Read`:更新前先查看文件当前内容
|
||||
- `Edit`:精确替换现有内容、追加新章节
|
||||
- `Grep`:搜索文件中是否已有相关章节
|
||||
- `Write`:创建新文件(如 Day{N+1}.md)
|
||||
|
||||
**注意事项**:
|
||||
```markdown
|
||||
1. **必须精确匹配**:TargetContent 必须与文件完全一致
|
||||
2. **处理换行符**:文件使用 \r\n,不要漏掉 \r
|
||||
3. **合理范围**:StartLine/EndLine 应覆盖目标内容
|
||||
4. **先读后写**:编辑前先 view_file 确认内容
|
||||
1. **先读后写**:编辑前先用 Read 确认内容
|
||||
2. **精确匹配**:Edit 的 old_string 必须与文件内容完全一致
|
||||
3. **避免重复**:编辑前用 Grep 检查是否已存在同主题章节
|
||||
```
|
||||
|
||||
### ❌ 禁止使用:命令行工具
|
||||
### ❌ 禁止使用:命令行工具修改文档
|
||||
|
||||
**禁止场景**:
|
||||
- ❌ 使用 `echo >>` 追加内容(编码问题)
|
||||
- ❌ 使用 PowerShell 直接修改文档(破坏格式)
|
||||
- ❌ 使用 sed/awk 等命令行工具
|
||||
- ❌ 使用 `echo >>` 追加内容
|
||||
- ❌ 使用 `sed` / `awk` 修改文档
|
||||
- ❌ 使用 `cat <<EOF` 写入内容
|
||||
|
||||
**原因**:
|
||||
- 容易破坏 UTF-8 编码
|
||||
- Windows CRLF vs Unix LF 混乱
|
||||
- 容易破坏 UTF-8 编码和中文字符
|
||||
- 难以追踪修改,容易出错
|
||||
|
||||
**唯一例外**:简单的全局文本替换(如批量更新日期),且必须使用 `-NoNewline` 参数
|
||||
- 无法精确匹配替换位置
|
||||
|
||||
### 📝 最佳实践示例
|
||||
|
||||
**追加新章节**:
|
||||
```python
|
||||
replace_file_content(
|
||||
TargetFile="path/to/DayN.md",
|
||||
TargetContent="## 🔗 相关文档\n\n...\n\n", # 文件末尾的内容
|
||||
ReplacementContent="## 🔗 相关文档\n\n...\n\n---\n\n## 🆕 新章节\n内容...",
|
||||
StartLine=280,
|
||||
EndLine=284
|
||||
)
|
||||
```
|
||||
**追加新章节**:使用 `Edit` 工具,`old_string` 匹配文件末尾内容,`new_string` 包含原内容 + 新章节。
|
||||
|
||||
**修改现有内容**:
|
||||
```python
|
||||
replace_file_content(
|
||||
TargetContent="**状态**:🔄 待修复",
|
||||
ReplacementContent="**状态**:✅ 已修复",
|
||||
StartLine=310,
|
||||
EndLine=310
|
||||
)
|
||||
**修改现有内容**:使用 `Edit` 工具精确替换。
|
||||
```markdown
|
||||
old_string: "**状态**:🔄 待修复"
|
||||
new_string: "**状态**:✅ 已修复"
|
||||
```
|
||||
|
||||
|
||||
@@ -204,12 +185,19 @@ replace_file_content(
|
||||
## 📁 文件结构
|
||||
|
||||
```
|
||||
ViGent/Docs/
|
||||
├── task_complete.md # 任务总览(仅按需更新)
|
||||
├── Doc_Rules.md # 本文件
|
||||
ViGent2/Docs/
|
||||
├── TASK_COMPLETE.md # 任务总览(仅按需更新)
|
||||
├── DOC_RULES.md # 本文件
|
||||
├── BACKEND_DEV.md # 后端开发规范
|
||||
├── BACKEND_README.md # 后端功能文档
|
||||
├── FRONTEND_DEV.md # 前端开发规范
|
||||
├── FRONTEND_README.md # 前端功能文档
|
||||
├── DEPLOY_MANUAL.md # 部署手册
|
||||
├── SUPABASE_DEPLOY.md # Supabase 部署文档
|
||||
├── LATENTSYNC_DEPLOY.md # LatentSync 部署文档
|
||||
├── COSYVOICE3_DEPLOY.md # 声音克隆部署文档
|
||||
├── ALIPAY_DEPLOY.md # 支付宝付费部署文档
|
||||
├── SUBTITLE_DEPLOY.md # 字幕系统部署文档
|
||||
└── DevLogs/
|
||||
├── Day1.md # 开发日志
|
||||
└── ...
|
||||
@@ -219,8 +207,16 @@ ViGent/Docs/
|
||||
|
||||
## 📅 DayN.md 更新规则(日常更新)
|
||||
|
||||
### 更新时机
|
||||
|
||||
> **边开发边记录,不要等到最后才写。**
|
||||
|
||||
- 每完成一个功能/修复后,**立即**追加到 DayN.md
|
||||
- 避免积攒到对话末尾一次性补写,容易遗漏变更
|
||||
- `TASK_COMPLETE.md` 同理,重要变更完成后及时同步
|
||||
|
||||
### 新建判断 (对话开始前)
|
||||
1. **回顾进度**:查看 `task_complete.md` 了解当前状态
|
||||
1. **回顾进度**:查看 `TASK_COMPLETE.md` 了解当前状态
|
||||
2. **检查日期**:查看最新 `DayN.md`
|
||||
- **今天 (与当前日期相同)** → 🚨 **绝对禁止创建新文件**,必须**追加**到现有 `DayN.md` 末尾!即使是完全不同的功能模块。
|
||||
- **之前 (昨天或更早)** → 创建 `Day{N+1}.md`
|
||||
@@ -252,6 +248,10 @@ ViGent/Docs/
|
||||
**状态**:✅ 已修复 / 🔄 待验证
|
||||
```
|
||||
|
||||
### ⚠️ 注意
|
||||
- **DayN.md 文件开头禁止使用 `---`**,避免被解析为 Front Matter。
|
||||
- 分隔线只用于章节之间,不作为文件第一行。
|
||||
|
||||
---
|
||||
|
||||
## 📏 内容简洁性规则
|
||||
@@ -272,17 +272,17 @@ ViGent/Docs/
|
||||
|
||||
---
|
||||
|
||||
## 📝 task_complete.md 更新规则(仅按需)
|
||||
## 📝 TASK_COMPLETE.md 更新规则
|
||||
|
||||
> ⚠️ **仅当用户明确要求更新 `task_complete.md` 时才更新**
|
||||
> 与 DayN.md 同步更新,记录重要变更时更新任务总览。
|
||||
|
||||
### 更新原则
|
||||
- **格式一致性**:直接参考 `task_complete.md` 现有格式追加内容。
|
||||
- **格式一致性**:直接参考 `TASK_COMPLETE.md` 现有格式追加内容。
|
||||
- **进度更新**:仅在阶段性里程碑时更新进度百分比。
|
||||
|
||||
### 🔍 完整性检查清单 (必做)
|
||||
|
||||
每次更新 `task_complete.md` 时,必须**逐一检查**以下所有板块:
|
||||
每次更新 `TASK_COMPLETE.md` 时,必须**逐一检查**以下所有板块:
|
||||
|
||||
1. **文件头部 & 导航**
|
||||
- [ ] `更新时间`:必须是当天日期
|
||||
@@ -305,4 +305,4 @@ ViGent/Docs/
|
||||
|
||||
---
|
||||
|
||||
**最后更新**:2026-01-23
|
||||
**最后更新**:2026-02-11
|
||||
|
||||
@@ -2,18 +2,75 @@
|
||||
|
||||
## 目录结构
|
||||
|
||||
采用轻量 FSD(Feature-Sliced Design)结构:
|
||||
|
||||
```
|
||||
frontend/src/
|
||||
├── app/ # Next.js App Router 页面
|
||||
│ ├── page.tsx # 首页(视频生成)
|
||||
│ ├── publish/ # 发布页面
|
||||
│ ├── admin/ # 管理员页面
|
||||
│ ├── login/ # 登录页面
|
||||
│ └── register/ # 注册页面
|
||||
├── lib/ # 公共工具函数
|
||||
│ ├── axios.ts # Axios 实例(含 401/403 拦截器)
|
||||
│ └── auth.ts # 认证相关函数
|
||||
└── proxy.ts # 路由代理(原 middleware)
|
||||
├── app/ # Next.js App Router 页面入口
|
||||
│ ├── page.tsx # 首页(视频生成)
|
||||
│ ├── publish/ # 发布管理页
|
||||
│ ├── admin/ # 管理员页面
|
||||
│ ├── login/ # 登录
|
||||
│ ├── register/ # 注册
|
||||
│ └── pay/ # 付费开通会员
|
||||
├── features/ # 功能模块(按业务拆分)
|
||||
│ ├── home/
|
||||
│ │ ├── model/ # 业务逻辑 hooks
|
||||
│ │ │ ├── useHomeController.ts # 主控制器
|
||||
│ │ │ ├── useHomePersistence.ts # 持久化管理
|
||||
│ │ │ ├── useBgm.ts
|
||||
│ │ │ ├── useGeneratedVideos.ts
|
||||
│ │ │ ├── useGeneratedAudios.ts
|
||||
│ │ │ ├── useMaterials.ts
|
||||
│ │ │ ├── useMediaPlayers.ts
|
||||
│ │ │ ├── useRefAudios.ts
|
||||
│ │ │ ├── useSavedScripts.ts
|
||||
│ │ │ ├── useTimelineEditor.ts
|
||||
│ │ │ └── useTitleSubtitleStyles.ts
|
||||
│ │ └── ui/ # UI 组件(纯 props + 回调)
|
||||
│ │ ├── HomePage.tsx
|
||||
│ │ ├── HomeHeader.tsx
|
||||
│ │ ├── MaterialSelector.tsx
|
||||
│ │ ├── ScriptEditor.tsx
|
||||
│ │ ├── ScriptExtractionModal.tsx
|
||||
│ │ ├── script-extraction/
|
||||
│ │ │ └── useScriptExtraction.ts
|
||||
│ │ ├── TitleSubtitlePanel.tsx
|
||||
│ │ ├── FloatingStylePreview.tsx
|
||||
│ │ ├── VoiceSelector.tsx
|
||||
│ │ ├── RefAudioPanel.tsx
|
||||
│ │ ├── GeneratedAudiosPanel.tsx
|
||||
│ │ ├── TimelineEditor.tsx
|
||||
│ │ ├── ClipTrimmer.tsx
|
||||
│ │ ├── BgmPanel.tsx
|
||||
│ │ ├── GenerateActionBar.tsx
|
||||
│ │ ├── PreviewPanel.tsx
|
||||
│ │ └── HistoryList.tsx
|
||||
│ └── publish/
|
||||
│ ├── model/
|
||||
│ │ └── usePublishController.ts
|
||||
│ └── ui/
|
||||
│ └── PublishPage.tsx
|
||||
├── shared/ # 跨功能共享
|
||||
│ ├── api/
|
||||
│ │ ├── axios.ts # Axios 实例(含 401/403 拦截器)
|
||||
│ │ └── types.ts # 统一响应类型
|
||||
│ ├── lib/
|
||||
│ │ ├── media.ts # API Base / URL / 日期等通用工具
|
||||
│ │ ├── auth.ts # 认证相关函数
|
||||
│ │ └── title.ts # 标题输入处理
|
||||
│ ├── hooks/
|
||||
│ │ ├── useTitleInput.ts
|
||||
│ │ └── usePublishPrefetch.ts
|
||||
│ ├── types/
|
||||
│ │ ├── user.ts # User 类型定义
|
||||
│ │ └── publish.ts # 发布相关类型
|
||||
│ └── contexts/ # 全局 Context(Auth、Task)
|
||||
│ ├── AuthContext.tsx
|
||||
│ └── TaskContext.tsx
|
||||
├── components/ # 遗留通用组件
|
||||
│ └── VideoPreviewModal.tsx
|
||||
└── proxy.ts # Next.js middleware(路由保护)
|
||||
```
|
||||
|
||||
---
|
||||
@@ -94,20 +151,47 @@ body {
|
||||
| `sm:` | ≥ 640px | 平板/桌面 |
|
||||
| `lg:` | ≥ 1024px | 大屏桌面 |
|
||||
|
||||
### embedded 组件模式
|
||||
|
||||
合并板块时,子组件通过 `embedded?: boolean` prop 控制是否渲染外层卡片容器和主标题。
|
||||
|
||||
```tsx
|
||||
// embedded=false(独立使用):渲染完整卡片
|
||||
<div className="bg-white/5 rounded-2xl p-6 border border-white/10">
|
||||
<h2>标题</h2>
|
||||
{content}
|
||||
</div>
|
||||
|
||||
// embedded=true(嵌入父卡片):只渲染内容
|
||||
{content}
|
||||
```
|
||||
|
||||
- 子标题使用 `<h3 className="text-sm font-medium text-gray-400">`
|
||||
- 分隔线使用 `<div className="border-t border-white/10 my-4" />`
|
||||
- 移动端标题行避免 `whitespace-nowrap`,长描述文字可用 `hidden sm:inline` 在移动端隐藏
|
||||
|
||||
### 按钮视觉层级
|
||||
|
||||
| 层级 | 样式 | 用途 |
|
||||
|------|------|------|
|
||||
| 主操作 | `px-4 py-2 text-sm font-medium bg-gradient-to-r from-purple-600 to-pink-600 shadow-sm` | 生成配音、立即发布 |
|
||||
| 辅助操作 | `px-2 py-1 text-xs bg-white/10 rounded` | 刷新、上传、语速 |
|
||||
| 触屏可见 | `opacity-40 group-hover:opacity-100` | 列表行内操作(编辑/删除) |
|
||||
|
||||
---
|
||||
|
||||
## API 请求规范
|
||||
|
||||
### 必须使用 `api` (axios 实例)
|
||||
|
||||
所有需要认证的 API 请求**必须**使用 `@/lib/axios` 导出的 axios 实例。该实例已配置:
|
||||
所有需要认证的 API 请求**必须**使用 `@/shared/api/axios` 导出的 axios 实例。该实例已配置:
|
||||
- 自动携带 `credentials: include`
|
||||
- 遇到 401/403 时自动清除 cookie 并跳转登录页
|
||||
|
||||
**使用方式:**
|
||||
|
||||
```typescript
|
||||
import api from '@/lib/axios';
|
||||
import api from '@/shared/api/axios';
|
||||
|
||||
// GET 请求
|
||||
const { data } = await api.get('/api/materials');
|
||||
@@ -136,7 +220,7 @@ await api.post('/api/materials', formData, {
|
||||
### SWR 配合使用
|
||||
|
||||
```typescript
|
||||
import api from '@/lib/axios';
|
||||
import api from '@/shared/api/axios';
|
||||
|
||||
// SWR fetcher 使用 axios
|
||||
const fetcher = (url: string) => api.get(url).then(res => res.data);
|
||||
@@ -146,6 +230,27 @@ const { data } = useSWR('/api/xxx', fetcher, { refreshInterval: 2000 });
|
||||
|
||||
---
|
||||
|
||||
## 通用工具函数 (media.ts)
|
||||
|
||||
### 统一 API Base / URL 解析
|
||||
使用 `@/shared/lib/media` 统一处理服务端/客户端 API Base 与资源地址,避免硬编码:
|
||||
|
||||
```typescript
|
||||
import { getApiBaseUrl, resolveMediaUrl, resolveAssetUrl, formatDate } from '@/shared/lib/media';
|
||||
|
||||
const apiBase = getApiBaseUrl(); // SSR: http://localhost:8006 / Client: ''
|
||||
const playableUrl = resolveMediaUrl(video.path); // 兼容签名 URL 与相对路径
|
||||
const fontUrl = resolveAssetUrl(`fonts/${fontFile}`);
|
||||
const timeText = formatDate(video.created_at);
|
||||
```
|
||||
|
||||
### 资源路径规则
|
||||
- 视频/音频:优先用 `resolveMediaUrl()`
|
||||
- 字体/BGM:使用 `resolveAssetUrl()`(自动编码中文路径)
|
||||
- 预览前若已有签名 URL,先用 `isAbsoluteUrl()` 判定,避免再次拼接
|
||||
|
||||
---
|
||||
|
||||
## 日期格式化规范
|
||||
|
||||
### 禁止使用 `toLocaleString()`
|
||||
@@ -161,22 +266,211 @@ new Date(timestamp * 1000).toLocaleString('zh-CN')
|
||||
**正确做法:**
|
||||
```typescript
|
||||
// ✅ 使用固定格式
|
||||
const formatDate = (timestamp: number) => {
|
||||
const d = new Date(timestamp * 1000);
|
||||
const year = d.getFullYear();
|
||||
const month = String(d.getMonth() + 1).padStart(2, '0');
|
||||
const day = String(d.getDate()).padStart(2, '0');
|
||||
const hour = String(d.getHours()).padStart(2, '0');
|
||||
const minute = String(d.getMinutes()).padStart(2, '0');
|
||||
return `${year}/${month}/${day} ${hour}:${minute}`;
|
||||
};
|
||||
import { formatDate } from '@/shared/lib/media';
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
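## Component Splitting Guidelines

When a page component exceeds 300-500 lines, split it by feature into `features/*/ui`:

- `page.tsx` only handles composition and layout
- Business logic lives in a Controller Hook under `features/*/model`
- UI components accept only props and callbacks and should avoid calling APIs directly
- Components split out of the home page all go in `features/home/ui/`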
|
||||
|
||||
---
|
||||
|
||||
## ⚡️ UX Optimization Guidelines

### Scroll to Top on Refresh (consistent behavior)

- Long pages (such as the home page and publish page) return to the top on first mount.
- The page-level `useEffect` **must** set `history.scrollRestoration = "manual"` to disable the browser's native scroll restoration.
- Call `window.scrollTo({ top: 0, left: 0, behavior: "auto" })` and repeat it after a 200ms delay as a safety net (in case an async effect overrides it).
- **List auto-scroll must be time-gated**: for 1 second after page load, all list auto-scroll effects are disabled (the `scrollEffectsEnabled` ref), which prevents persistence restore + async data loading from triggering `scrollIntoView` and making the page jump.
- Recommended pattern:

```typescript
// Page level (HomePage / PublishPage)
useEffect(() => {
  if (typeof window === "undefined") return;
  if ("scrollRestoration" in history) history.scrollRestoration = "manual";
  window.scrollTo({ top: 0, left: 0, behavior: "auto" });
  const timer = setTimeout(() => window.scrollTo({ top: 0, left: 0, behavior: "auto" }), 200);
  return () => clearTimeout(timer);
}, []);

// Controller level (time gate for list scrolling)
const scrollEffectsEnabled = useRef(false);
useEffect(() => {
  const timer = setTimeout(() => { scrollEffectsEnabled.current = true; }, 1000);
  return () => clearTimeout(timer);
}, []);

// List scroll effect (BGM / materials / videos, etc.)
useEffect(() => {
  if (!selectedId || !scrollEffectsEnabled.current) return;
  target?.scrollIntoView({ block: "nearest", behavior: "smooth" });
}, [selectedId, list]);
```

### Route Prefetching

- Use `router.prefetch("/publish")` when going from the home page to publish management
- Prefetch the route only; do not render publish-page components on the home page

### Publish-Page Data Prefetch Cache

- Use `sessionStorage` to store the most recent `accounts/videos`
- Cache TTL is 2 minutes; the publish page reads the cache first, then refreshes in the background

### Skeleton Screens

- Account, work, and material lists show skeletons while loading
- The number of skeleton items should be close to the previous data count (to avoid count jumps during loading)

### Preview Loading Optimization

- Preview `video` elements use `preload="metadata"`
- The publish-page preview button may do a brief `preload` prefetch
|
||||
|
||||
---
|
||||
|
||||
## Lightweight FSD Structure

- `app/`: page entry points; keep them light, composition and layout only
- `features/*/model`: business logic and state (Controller Hook + sub-hooks)
- `features/*/ui`: feature UI components (pure props + callbacks, no direct API calls)
- `shared/api`: Axios instance and unified response types
- `shared/lib`: general utilities (media.ts / auth.ts / title.ts)
- `shared/hooks`: cross-feature hooks
- `shared/types`: cross-feature entity types (User / PublishVideo, etc.)
- `shared/contexts`: global Contexts (AuthContext / TaskContext)
- `components/`: legacy shared components (VideoPreviewModal)

## Type Definition Guidelines

- Shared entity types (such as User, Account, Video) live in `src/shared/types/`.
- Feature-specific types go in types.ts or model under the feature directory.
- **Do not** define the User interface in multiple places; always import it with `import { User } from '@/shared/types/user';`.
|
||||
|
||||
---
|
||||
|
||||
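## User Preference Persistence

User preferences on the home page, such as styles and font sizes, must be persisted and restored after a refresh:

- **Must be persisted**:
  - Title style ID / subtitle style ID
  - Title font size / subtitle font size
  - Title display mode (`short` / `persistent`)
  - Background music selection / volume / on-off state
  - Output aspect ratio (`9:16` / `16:9`)
  - Material selection / past-work selection
  - Selected voiceover ID (`selectedAudioId`)
  - Speed (`speed`, voice-clone mode)
  - Timeline segment info (localStorage of `useTimelineEditor`)

### Script History (persisted independently)

The `useSavedScripts` hook manages localStorage persistence of script history on its own:
- key: `vigent_{storageKey}_savedScripts`
- Writes to localStorage only when the user manually saves or deletes; no auto-persist effect
- Fully independent of `useHomePersistence`; neither affects the other

### Implementation Rules
- Use `storageKey = userId || 'guest'` to isolate data per user.
- **Restore before save**: no writes until restoration completes (guarded by `isRestored`).
- Avoid overwriting user choices with defaults (read saved values first).
- Prefer `useHomePersistence` to centralize restore/save; avoid scattered localStorage reads and writes within pages.
- **Never use a signed URL as a persistence identifier**: Supabase Storage signed URLs change on every request, so use the stable `id` field returned by the backend.
- Any new persisted field must be added to the restore and save logic, and this section must be updated.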
|
||||
|
||||
---
|
||||
|
||||
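## Title Input Rules

- The opening title and the publish-info title are both limited to 15 characters.
- Do not truncate during IME composition; validate length only after composition ends.
- Changes to the home-page opening title are also written to `vigent_${storageKey}_publish_title`.
- Title display mode uses two fixed values, `short` / `persistent`; the default is `short` (shown briefly for 4 seconds).
- Avoid using `maxLength`, which force-truncates text in the IME composition state.
- Prefer `@/shared/hooks/useTitleInput` to handle input logic uniformly.

---

## Publish-Page Interaction Rules

- The publish button is disabled when no platform is selected
- Only "立即发布" (Publish Now) remains; the scheduled-publish UI and parameters are no longer provided
- **Work selection persistence**: use `video.id` (stable identifier) rather than `video.path` (signed URL) for selection, comparison, and localStorage storage. On publish, look up the corresponding `path` by `id` to send the request.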
|
||||
|
||||
---
|
||||
|
||||
## 新增页面 Checklist
|
||||
|
||||
1. [ ] 导入 `import api from '@/lib/axios'`
|
||||
1. [ ] 导入 `import api from '@/shared/api/axios'`
|
||||
2. [ ] 所有 API 请求使用 `api.get/post/delete()` 而非原生 `fetch`
|
||||
3. [ ] 日期格式化使用固定格式函数,不用 `toLocaleString()`
|
||||
4. [ ] 添加 `'use client'` 指令(如需客户端交互)
|
||||
3. [ ] 日期格式化使用 `@/shared/lib/media` 的 `formatDate`
|
||||
4. [ ] 资源 URL 使用 `resolveMediaUrl`/`resolveAssetUrl`
|
||||
5. [ ] 添加 `'use client'` 指令(如需客户端交互)
|
||||
|
||||
---
|
||||
|
||||
## 声音克隆 (Voice Clone) 功能
|
||||
|
||||
### API 端点
|
||||
|
||||
| 接口 | 方法 | 功能 |
|
||||
|------|------|------|
|
||||
| `/api/ref-audios` | POST | 上传参考音频 (multipart/form-data: file,ref_text 可选,后端自动 Whisper 转写) |
|
||||
| `/api/ref-audios` | GET | 列出用户的参考音频 |
|
||||
| `/api/ref-audios/{id}` | PUT | 重命名参考音频 |
|
||||
| `/api/ref-audios/{id}` | DELETE | 删除参考音频 (id 需 encodeURIComponent) |
|
||||
| `/api/ref-audios/{id}/retranscribe` | POST | 重新识别参考音频文字(Whisper 转写 + 超 10s 自动截取) |
|
||||
|
||||
### 视频生成 API 扩展
|
||||
|
||||
```typescript
|
||||
// EdgeTTS 模式 (默认)
|
||||
await api.post('/api/videos/generate', {
|
||||
material_path: '...',
|
||||
text: '口播文案',
|
||||
tts_mode: 'edgetts',
|
||||
voice: 'zh-CN-YunxiNeural',
|
||||
});
|
||||
|
||||
// 声音克隆模式
|
||||
await api.post('/api/videos/generate', {
|
||||
material_path: '...',
|
||||
text: '口播文案',
|
||||
tts_mode: 'voiceclone',
|
||||
ref_audio_id: 'user_id/timestamp_name.wav',
|
||||
ref_text: '参考音频对应文字', // 从参考音频 metadata 自动获取
|
||||
speed: 1.0, // 语速 (0.8-1.2)
|
||||
});
|
||||
```
|
||||
|
||||
### 在线录音
|
||||
|
||||
使用 `MediaRecorder` API 录制音频,格式为 `audio/webm`,上传后后端自动转换为 WAV (16kHz mono)。
|
||||
|
||||
```typescript
|
||||
// 录音需要用户授权麦克风
|
||||
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
|
||||
const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
|
||||
```
|
||||
|
||||
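### Automatic reference-audio processing

- **Auto-transcription**: when a reference audio is uploaded, the backend runs Whisper and stores the transcript as `ref_text`; the user does not have to type it.
- **Auto-trimming**: reference audio longer than 10 seconds is cut at a silence point so that only the first 10 seconds or less are kept (CosyVoice recommends 3-10 seconds).
- **Re-transcription**: older reference audio can be re-transcribed and re-trimmed through the `retranscribe` endpoint.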
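
The auto-trimming rule above can be sketched as a small selection function. This is only an illustration of "cut at a silence point within the first 10 seconds", not the backend's actual code: the function name `pick_trim_point`, the `(start, end)` silence format, and the 3-second lower bound (taken from the CosyVoice 3-10 second recommendation) are all assumptions.

```python
from typing import List, Tuple


def pick_trim_point(
    silences: List[Tuple[float, float]],
    max_len: float = 10.0,
    min_len: float = 3.0,
) -> float:
    """Return the second at which to cut an over-long reference clip.

    `silences` holds (start, end) pairs in seconds, as a silence detector
    might report them (hypothetical input format). The cut lands at the
    start of the latest silence that begins inside [min_len, max_len];
    if there is none, fall back to a hard cut at max_len.
    """
    candidates = [start for start, _end in silences if min_len <= start <= max_len]
    return max(candidates) if candidates else max_len
```

The caller is assumed to invoke this only for clips longer than `max_len`; shorter clips would be kept whole.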
|
||||
|
||||
### UI 结构
|
||||
|
||||
配音方式使用 Tab 切换:
|
||||
- **EdgeTTS 音色** - 预设音色 2x3 网格
|
||||
- **声音克隆** - 参考音频列表 + 在线录音 + 语速下拉菜单 (5 档: 较慢/稍慢/正常/稍快/较快)
|
||||
|
||||
149
Docs/FRONTEND_README.md
Normal file
@@ -0,0 +1,149 @@
|
||||
# ViGent2 Frontend
|
||||
|
||||
ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。
|
||||
|
||||
## ✨ 核心功能
|
||||
|
||||
### 1. 视频生成 (`/`)
|
||||
- **一、文案提取与编辑**: 文案输入/提取/翻译/保存。
|
||||
- **二、配音**: 配音方式(EdgeTTS/声音克隆)+ 配音列表(生成/试听/管理)合并为一个板块。
|
||||
- **三、素材编辑**: 视频素材(上传/选择/管理)+ 时间轴编辑(波形/色块/拖拽排序)合并为一个板块。
|
||||
- **四、标题与字幕**: 片头标题/副标题/字幕样式配置;短暂显示/常驻显示;样式预览使用视频片头帧作为真实背景 (Day 28)。
|
||||
- **五、背景音乐**: 试听 + 音量控制 + 选择持久化。
|
||||
- **六、作品**(右栏): 作品列表 + 作品预览合并为一个板块。
|
||||
- **进度追踪**: 实时显示视频生成进度 (10% -> 100%)。
|
||||
- **作品预览**: 生成完成后直接播放下载(作品预览 + 历史作品)。
|
||||
- **预览优化**: 预览视频 `metadata` 预取,首帧加载更快。
|
||||
- **本地保存**: 文案/标题/偏好由 `useHomePersistence` 统一持久化,刷新后恢复 (Day 14/17)。
|
||||
- **历史文案**: 手动保存/加载/删除历史文案,独立 localStorage 持久化 (Day 23)。
|
||||
- **选择持久化**: 首页/发布页作品选择均使用稳定 `id` 持久化,刷新保持用户选择;新视频生成后自动选中最新 (Day 21)。
|
||||
- **AI 多语言翻译**: 支持 9 种目标语言翻译文案 + 还原原文 (Day 22)。
|
||||
|
||||
### 2. 全自动发布 (`/publish`) [Day 7 新增]
|
||||
- **多平台管理**: 统一管理抖音、微信视频号、B站、小红书账号状态。
|
||||
- **扫码登录**:
|
||||
- 集成后端 Playwright 生成的 QR Code。
|
||||
- 实时检测扫码状态 (Wait/Success)。
|
||||
- Cookie 自动保存与状态同步。
|
||||
- **发布配置**: 设置视频标题、标签、简介。
|
||||
- **作品选择**: 卡片列表 + 搜索 + 预览弹窗。
|
||||
- **选择持久化**: 使用稳定 `video.id` 持久化选择,刷新保持;新视频生成自动选中最新 (Day 21)。
|
||||
- **预览兼容**: 签名 URL / 相对路径均可直接预览。
|
||||
- **发布方式**: 仅支持 "立即发布"。
|
||||
|
||||
### 3. 声音克隆 [Day 13 新增]
|
||||
- **TTS 模式选择**: EdgeTTS (预设音色) / 声音克隆 (自定义音色) 切换。
|
||||
- **参考音频管理**: 上传/列表/重命名/删除参考音频,上传后自动 Whisper 转写 ref_text + 超 10s 自动截取。
|
||||
- **重新识别**: 旧参考音频可重新转写并截取 (RotateCw 按钮)。
|
||||
- **一键克隆**: 选择参考音频后自动调用 CosyVoice 3.0 服务。
|
||||
- **语速控制**: 声音克隆模式下支持 5 档语速 (0.8-1.2),选择持久化 (Day 23)。
|
||||
- **多语言支持**: EdgeTTS 10 语言声音列表,声音克隆 language 透传 (Day 22)。
|
||||
|
||||
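### 4. Voiceover First + Timeline Arrangement [added Day 23]

- **Standalone voiceover generation**: generate the voiceover, select it, pick materials, then generate the video.
- **Voiceover panel**: generate / preview / rename / delete / select; generation is asynchronous with progress polling.
- **Timeline editor**: a wavesurfer.js audio waveform plus colored blocks that visualize material assignment; drag the dividers to adjust each segment's duration.
- **Clip trimming**: ClipTrimmer offers a dual-handle range slider with HTML5 video preview playback.
- **Drag-to-reorder**: timeline blocks support HTML5 Drag & Drop to swap material order.
- **Custom assignment**: the backend `custom_assignments` field accepts a user-defined material assignment plan, including `source_start/source_end` trim ranges.
- **Timeline alignment semantics**: when materials run past the audio, only the visible segments are kept, the last one is trimmed to the audio end, and the overflow takes no part in generation; when materials fall short of the audio, the last visible segment loops to fill the gap.
- **Aspect ratio control**: the top of the timeline offers a `9:16 / 16:9` output ratio selector; the setting is persisted and passed through to the backend.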
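
The timeline alignment rule in the list above (truncate on overflow, loop the last segment on shortfall) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `align_segments` is a hypothetical name, and looping is modeled simply by stretching the last segment's effective duration.

```python
from typing import List, Tuple


def align_segments(durations: List[float], audio_len: float) -> List[Tuple[int, float]]:
    """Fit timeline segments to the audio length.

    Returns (segment_index, effective_duration) pairs. Segments that would
    start after the audio ends are dropped, the segment that crosses the
    end is truncated, and if the segments fall short the last one is
    stretched (i.e. looped at render time) to cover the remainder.
    """
    aligned: List[Tuple[int, float]] = []
    elapsed = 0.0
    for index, duration in enumerate(durations):
        remaining = audio_len - elapsed
        if remaining <= 0:
            break  # this segment lies entirely past the audio: not rendered
        used = min(duration, remaining)
        aligned.append((index, used))
        elapsed += used
    if aligned and elapsed < audio_len:
        # Shortfall: extend the last visible segment to the audio end.
        last_index, last_duration = aligned[-1]
        aligned[-1] = (last_index, last_duration + (audio_len - elapsed))
    return aligned
```

For example, three 4-second segments against 10 seconds of audio keep the third segment for only 2 seconds, while two 3-second segments against the same audio stretch the second one to 7 seconds.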
|
||||
|
||||
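### 5. Subtitles and Titles [added Day 13]

- **Intro title**: optional, limited to 15 characters; supports "short display / persistent display", defaulting to short display (4 seconds); the mode applies to both the title and the secondary title.
- **Intro secondary title**: optional, limited to 20 characters; shown below the main title for supplementary notes or a teaser; has its own style settings (font / size / color / spacing); can be generated by AI together with the title; shares the title's display mode; appears only in the video frame and is not part of the publish title (Day 25).
- **Title sync**: edits to the home-page intro title are synced to the publish-info title.
- **Word-by-word highlighted subtitles**: karaoke effect, on by default, can be turned off.
- **Auto alignment**: character-level timestamps generated with faster-whisper.
- **Style presets**: title / subtitle / secondary-title style selection + preview + font-size adjustment (Day 16/25).
- **Default styles**: title 90px 站酷快乐体; subtitles 60px classic yellow + DingTalkJinBuTi (Day 17).
- **Style persistence**: title / subtitle / secondary-title styles and font sizes survive a refresh (Day 17/25).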
|
||||
|
||||
### 6. 背景音乐 [Day 16 新增]
|
||||
- **试听预览**: 点击试听即选中,音量滑块实时生效。
|
||||
- **混音控制**: 仅影响 BGM,配音保持原音量。
|
||||
|
||||
### 7. 账户设置 [Day 15 新增]
|
||||
- **手机号登录**: 11位中国手机号验证登录。
|
||||
- **账户下拉菜单**: 显示手机号(中间四位脱敏)+ 有效期 + 修改密码 + 安全退出。
|
||||
- **修改密码**: 弹窗输入当前密码与新密码,修改后强制重新登录。
|
||||
- **登录即时生效**: 登录成功后 AuthContext 立即写入用户数据,无需刷新即显示手机号。
|
||||
|
||||
### 8. 付费开通会员 (`/pay`)
|
||||
- **支付宝电脑网站支付**: 跳转支付宝官方收银台,支持扫码/账号登录/余额等多种支付方式。
|
||||
- **自动激活**: 支付成功后异步回调自动激活会员(有效期 1 年),前端轮询检测支付结果。
|
||||
- **到期续费**: 会员到期后登录自动跳转付费页续费,流程与首次开通一致。
|
||||
- **管理员激活**: 管理员手动激活功能并存,两种方式互不影响。
|
||||
|
||||
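### 9. Script Extraction Assistant (`ScriptExtractionModal`) [added Day 15]

- **Multi-source extraction**: supports drag-and-drop file upload and pasted URLs (B站 / 抖音 / TikTok).
- **AI rewriting**: integrates GLM-4.7-Flash to rewrite the text into a voiceover script automatically.
- **Custom prompt**: the rewrite prompt can be customized; leave it empty to use the default; the setting is persisted to localStorage (Day 25).
- **One-click fill**: the extracted result is filled straight into the video-generation input box.
- **Interaction**: live progress display and protection against accidental clicks.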
|
||||
|
||||
## 🛠️ 技术栈
|
||||
|
||||
- **框架**: Next.js 16 (App Router)
|
||||
- **样式**: TailwindCSS
|
||||
- **图标**: Lucide React
|
||||
- **组件**: 自定义现代化组件 (Glassmorphism 风格)
|
||||
- **音频波形**: wavesurfer.js (时间轴编辑器)
|
||||
- **API**: Axios 实例 `@/shared/api/axios` (对接后端 FastAPI :8006)
|
||||
|
||||
## 🚀 开发指南
|
||||
|
||||
### 安装依赖
|
||||
|
||||
```bash
|
||||
npm install
|
||||
```
|
||||
|
||||
### 启动开发服务器
|
||||
|
||||
默认运行在 **3002** 端口 (通过 `package.json` 配置):
|
||||
|
||||
```bash
|
||||
npm run dev
|
||||
# 访问: http://localhost:3002
|
||||
```
|
||||
|
||||
### 目录结构
|
||||
|
||||
```
|
||||
src/
|
||||
├── app/ # 页面入口 (轻量)
|
||||
│ ├── page.tsx # 视频生成主页
|
||||
│ ├── publish/ # 发布管理页
|
||||
│ │ └── page.tsx
|
||||
│ ├── pay/ # 付费开通会员页
|
||||
│ │ └── page.tsx
|
||||
│ └── layout.tsx # 全局布局 (导航栏)
|
||||
├── features/
|
||||
│ ├── home/
|
||||
│ │ ├── model/ # Home 业务逻辑 (hooks)
|
||||
│ │ └── ui/ # Home UI 组件
|
||||
│ └── publish/
|
||||
│ ├── model/ # Publish 业务逻辑 (hooks)
|
||||
│ └── ui/ # Publish UI 组件
|
||||
├── shared/
|
||||
│ ├── api/ # API 实例
|
||||
│ ├── hooks/ # 通用 hooks
|
||||
│ └── lib/ # 工具函数
|
||||
└── components/ # 跨页面复用 UI
|
||||
```
|
||||
|
||||
## 🔌 后端对接
|
||||
|
||||
- **Base URL**: `http://localhost:8006` (SSR) / 相对路径 (Client)
|
||||
- **URL 统一工具**: `@/shared/lib/media` 提供 `resolveMediaUrl` / `resolveAssetUrl`
|
||||
- **代理配置**: Next.js Rewrites (如需) 或直接 CORS。
|
||||
|
||||
## 🎨 设计规范
|
||||
|
||||
- **主色调**: 深紫/黑色系 (Dark Mode)
|
||||
- **交互**: 悬停微动画 (Hover Effects);操作按钮默认半透明可见 (opacity-40),hover 时全亮,兼顾触屏设备
|
||||
- **响应式**: 适配桌面端与移动端;发布页平台卡片响应式布局(移动端紧凑/桌面端宽松)
|
||||
- **滚动体验**: 列表滚动条统一隐藏 (hide-scrollbar);刷新后自动回到顶部(禁用浏览器滚动恢复 + 列表 scroll 时间门控)
|
||||
- **样式预览**: 浮动预览窗口,桌面端左上角 280px,移动端右下角 160px(不遮挡控件)
|
||||
- **输入辅助**: 标题/副标题输入框实时字数计数器,超限变红
|
||||
29
Docs/Logs.md
@@ -1,29 +0,0 @@
|
||||
rongye@r730-ubuntu:~/ProgramFiles/Supabase$ docker compose up -d
|
||||
[+] up 136/136
|
||||
✔ Image timberio/vector:0.28.1-alpine Pulled 63.3ss
|
||||
✔ Image supabase/storage-api:v1.33.0 Pulled 78.6ss
|
||||
✔ Image darthsim/imgproxy:v3.30.1 Pulled 151.9s
|
||||
✔ Image supabase/postgres-meta:v0.95.1 Pulled 87.5ss
|
||||
✔ Image supabase/logflare:1.27.0 Pulled 229.2s
|
||||
✔ Image supabase/postgres:15.8.1.085 Pulled 268.3s
|
||||
✔ Image supabase/supavisor:2.7.4 Pulled 101.6s
|
||||
✔ Image supabase/realtime:v2.68.0 Pulled 56.5ss
|
||||
✔ Image postgrest/postgrest:v14.1 Pulled 201.8s
|
||||
✔ Image supabase/edge-runtime:v1.69.28 Pulled 254.0s
|
||||
✔ Network supabase_default Created 0.1s
|
||||
✔ Volume supabase_db-config Created 0.1s
|
||||
✔ Container supabase-vector Healthy 16.9s
|
||||
✔ Container supabase-imgproxy Created 7.4s
|
||||
✔ Container supabase-db Healthy 20.6s
|
||||
✔ Container supabase-analytics Created 0.4s
|
||||
✔ Container supabase-edge-functions Created 1.8s
|
||||
✔ Container supabase-auth Created 1.7s
|
||||
✔ Container supabase-studio Created 2.0s
|
||||
✔ Container realtime-dev.supabase-realtime Created 1.7s
|
||||
✔ Container supabase-pooler Created 1.8s
|
||||
✔ Container supabase-kong Created 1.7s
|
||||
✔ Container supabase-meta Created 2.0s
|
||||
✔ Container supabase-rest Created 0.9s
|
||||
✔ Container supabase-storage Created 1.4s
|
||||
Error response from daemon: failed to set up container networking: driver failed programming external connectivity on endpoint supabase-analytics (2fd60a510a1f16bf29f8f5140f14ef457a284c5b65a2567b7be250a4f9708f34): failed to bind host port 0.0.0.0:4000/tcp: address already in use
|
||||
[ble: exit 1]
|
||||
252
Docs/MUSETALK_DEPLOY.md
Normal file
@@ -0,0 +1,252 @@
|
||||
# MuseTalk 部署指南
|
||||
|
||||
> **更新时间**:2026-02-27
|
||||
> **适用版本**:MuseTalk v1.5 (常驻服务模式)
|
||||
> **架构**:FastAPI 常驻服务 + PM2 进程管理
|
||||
|
||||
---
|
||||
|
||||
## Architecture Overview

MuseTalk serves as the long-video engine of the **hybrid lip-sync scheme**:

- **Short videos (<120s)** → LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=120s)** → MuseTalk 1.5 (GPU0, port 8011)
- The routing threshold is controlled by `LIPSYNC_DURATION_THRESHOLD`
- When MuseTalk is unavailable, requests fall back to LatentSync automatically
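
The routing rule above reduces to a single comparison plus an availability check. The sketch below only illustrates that rule; `choose_lipsync_engine` and its parameters are hypothetical names, and the real logic lives in `lipsync_service.py`, which may differ.

```python
def choose_lipsync_engine(
    duration_s: float,
    threshold_s: float = 120.0,
    musetalk_available: bool = True,
) -> str:
    """Pick the lip-sync engine for a clip of `duration_s` seconds.

    `threshold_s` stands in for LIPSYNC_DURATION_THRESHOLD. Clips at or
    above the threshold go to MuseTalk; everything else, and every clip
    when MuseTalk is down, goes to LatentSync.
    """
    if duration_s >= threshold_s and musetalk_available:
        return "musetalk"
    return "latentsync"
```

Note that the threshold is inclusive: a clip of exactly 120 seconds is routed to MuseTalk.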
|
||||
|
||||
---
|
||||
|
||||
## 硬件要求
|
||||
|
||||
| 配置 | 最低要求 | 推荐配置 |
|
||||
|------|----------|----------|
|
||||
| GPU | 8GB VRAM (RTX 3060) | 24GB VRAM (RTX 3090) |
|
||||
| 内存 | 32GB | 64GB |
|
||||
| CUDA | 11.7+ | 11.8 |
|
||||
|
||||
> MuseTalk fp16 推理约需 4-8GB 显存,可与 CosyVoice 共享 GPU0。
|
||||
|
||||
---
|
||||
|
||||
## 安装步骤
|
||||
|
||||
### 1. Conda 环境
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
conda create -n musetalk python=3.10 -y
|
||||
conda activate musetalk
|
||||
```
|
||||
|
||||
### 2. PyTorch 2.0.1 + CUDA 11.8
|
||||
|
||||
> 必须使用此版本,mmcv 预编译包依赖。
|
||||
|
||||
```bash
|
||||
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
|
||||
```
|
||||
|
||||
### 3. 依赖安装
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
|
||||
# MMLab 系列
|
||||
pip install --no-cache-dir -U openmim
|
||||
mim install mmengine
|
||||
mim install "mmcv==2.0.1"
|
||||
mim install "mmdet==3.1.0"
|
||||
pip install chumpy --no-build-isolation
|
||||
pip install "mmpose==1.1.0" --no-deps
|
||||
|
||||
# FastAPI 服务依赖
|
||||
pip install fastapi uvicorn httpx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 模型权重
|
||||
|
||||
### 目录结构
|
||||
|
||||
```
|
||||
models/MuseTalk/models/
|
||||
├── musetalk/ ← v1 基础模型
|
||||
│ ├── config.json -> musetalk.json (软链接)
|
||||
│ ├── musetalk.json
|
||||
│ ├── musetalkV15 -> ../musetalkV15 (软链接, 关键!)
|
||||
│ └── pytorch_model.bin (~3.2GB)
|
||||
├── musetalkV15/ ← v1.5 UNet 模型
|
||||
│ ├── musetalk.json
|
||||
│ └── unet.pth (~3.2GB)
|
||||
├── sd-vae/ ← Stable Diffusion VAE
|
||||
│ ├── config.json
|
||||
│ └── diffusion_pytorch_model.bin
|
||||
├── whisper/ ← OpenAI Whisper Tiny
|
||||
│ ├── config.json
|
||||
│ ├── pytorch_model.bin (~151MB)
|
||||
│ └── preprocessor_config.json
|
||||
├── dwpose/ ← DWPose 人体姿态检测
|
||||
│ └── dw-ll_ucoco_384.pth (~387MB)
|
||||
├── syncnet/ ← SyncNet 唇形同步评估
|
||||
│ └── latentsync_syncnet.pt
|
||||
└── face-parse-bisent/ ← 人脸解析模型
|
||||
├── 79999_iter.pth (~53MB)
|
||||
└── resnet18-5c106cde.pth (~45MB)
|
||||
```
|
||||
|
||||
### 下载方式
|
||||
|
||||
使用项目自带脚本:
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
conda activate musetalk
|
||||
bash download_weights.sh
|
||||
```
|
||||
|
||||
或手动 Python API 下载:
|
||||
|
||||
```bash
|
||||
conda activate musetalk
|
||||
export HF_ENDPOINT=https://hf-mirror.com
|
||||
python -c "
|
||||
from huggingface_hub import snapshot_download
|
||||
snapshot_download('TMElyralab/MuseTalk', local_dir='models',
|
||||
allow_patterns=['musetalk/*', 'musetalkV15/*'])
|
||||
snapshot_download('stabilityai/sd-vae-ft-mse', local_dir='models/sd-vae',
|
||||
allow_patterns=['config.json', 'diffusion_pytorch_model.bin'])
|
||||
snapshot_download('openai/whisper-tiny', local_dir='models/whisper',
|
||||
allow_patterns=['config.json', 'pytorch_model.bin', 'preprocessor_config.json'])
|
||||
snapshot_download('yzd-v/DWPose', local_dir='models/dwpose',
|
||||
allow_patterns=['dw-ll_ucoco_384.pth'])
|
||||
"
|
||||
```
|
||||
|
||||
### Create the required symlinks

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk/models/musetalk
ln -sf musetalk.json config.json
ln -sf ../musetalkV15 musetalkV15
```

> **Important**: if the `musetalk/musetalkV15` symlink is missing, weight detection fails (`weights: False`).
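
Before starting the service, the presence of the weight files and symlinks can be checked with a short script. This is a sketch, not the check that `server.py` performs: `missing_weights` is a hypothetical helper, and the path list is an assumption copied from the directory tree in this guide (the optional `syncnet/` entry is left out).

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory tree in this guide (assumed complete).
REQUIRED_WEIGHTS = [
    "musetalk/config.json",
    "musetalk/pytorch_model.bin",
    "musetalk/musetalkV15",
    "musetalkV15/musetalk.json",
    "musetalkV15/unet.pth",
    "sd-vae/config.json",
    "sd-vae/diffusion_pytorch_model.bin",
    "whisper/config.json",
    "whisper/pytorch_model.bin",
    "whisper/preprocessor_config.json",
    "dwpose/dw-ll_ucoco_384.pth",
    "face-parse-bisent/79999_iter.pth",
    "face-parse-bisent/resnet18-5c106cde.pth",
]


def missing_weights(models_dir: str) -> List[str]:
    """Return the required paths that do not exist under `models_dir`."""
    root = Path(models_dir)
    # Path.exists() follows symlinks, so a dangling link counts as missing.
    return [rel for rel in REQUIRED_WEIGHTS if not (root / rel).exists()]
```

Calling `missing_weights("models/MuseTalk/models")` from the project root would list whatever still needs to be downloaded or linked; an empty list means every expected path resolves.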
|
||||
|
||||
---
|
||||
|
||||
## 服务启动
|
||||
|
||||
### PM2 进程管理(推荐)
|
||||
|
||||
```bash
|
||||
# 首次注册
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
pm2 start run_musetalk.sh --name vigent2-musetalk
|
||||
pm2 save
|
||||
|
||||
# 日常管理
|
||||
pm2 restart vigent2-musetalk
|
||||
pm2 logs vigent2-musetalk
|
||||
pm2 stop vigent2-musetalk
|
||||
```
|
||||
|
||||
### 手动启动
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
|
||||
```
|
||||
|
||||
### 健康检查
|
||||
|
||||
```bash
|
||||
curl http://localhost:8011/health
|
||||
# {"status":"ok","model_loaded":true}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 后端配置
|
||||
|
||||
`backend/.env` 中的相关变量:
|
||||
|
||||
```ini
|
||||
# MuseTalk 配置
|
||||
MUSETALK_GPU_ID=0 # GPU 编号 (与 CosyVoice 共存)
|
||||
MUSETALK_API_URL=http://localhost:8011 # 常驻服务地址
|
||||
MUSETALK_BATCH_SIZE=32 # 推理批大小
|
||||
MUSETALK_VERSION=v15 # 模型版本
|
||||
MUSETALK_USE_FLOAT16=true # 半精度加速
|
||||
|
||||
# 混合唇形同步路由
|
||||
LIPSYNC_DURATION_THRESHOLD=120 # 秒, >=此值用 MuseTalk
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 相关文件
|
||||
|
||||
| 文件 | 说明 |
|
||||
|------|------|
|
||||
| `models/MuseTalk/scripts/server.py` | FastAPI 常驻服务 (端口 8011) |
|
||||
| `run_musetalk.sh` | PM2 启动脚本 |
|
||||
| `backend/app/services/lipsync_service.py` | 混合路由 + `_call_musetalk_server()` |
|
||||
| `backend/app/core/config.py` | `MUSETALK_*` 配置项 |
|
||||
|
||||
---
|
||||
|
||||
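## Performance Optimization (server.py v2)

The first long-video test (136s, 3404 frames) took 30 minutes. Profiling showed that the bottlenecks were face detection (28%), BiSeNet compositing (22%), and I/O (17%), rather than UNet inference (17%).

### Optimizations applied

| Optimization | Description |
|--------|------|
| `MUSETALK_BATCH_SIZE` 8→32 | The RTX 3090 has VRAM to spare; UNet inference speeds up ~3x |
| Read frames directly with cv2.VideoCapture | Skips the ffmpeg→PNG→imread chain |
| Face detection every 5 frames | DWPose + FaceAlignment run only on sampled frames; bboxes for the frames in between are linearly interpolated |
| BiSeNet mask cache (every 5 frames) | `get_image_prepare_material` runs every 5 frames; frames in between reuse the result through `get_image_blending` |
| Write directly with cv2.VideoWriter | Skips per-frame PNG writes and the ffmpeg re-encode |
| Per-stage timing | Seven stages are timed precisely to guide later tuning |

### Tuning parameters

Adjustable at the top of `models/MuseTalk/scripts/server.py`:

```python
DETECT_EVERY = 5        # face-detection sampling interval (frames)
BLEND_CACHE_EVERY = 5   # BiSeNet mask cache interval (frames)
```

> For talking-head videos (the face barely moves), the interpolation error at a 5-frame interval is negligible.
> For scenes with strong face motion, lower the values to 2-3.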
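
The reduced-rate face detection relies on linear interpolation of bounding boxes between sampled frames. The sketch below shows that interpolation in isolation, assuming boxes are `(x1, y1, x2, y2)` tuples; `interpolate_bboxes` is a hypothetical helper, not a function from `server.py`.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]


def interpolate_bboxes(keyframes: Dict[int, BBox], n_frames: int) -> List[BBox]:
    """Fill in a bbox for every frame from sparsely detected keyframes.

    `keyframes` maps a frame index to the bbox detected on that frame.
    Frames between two keyframes get a linear blend of their boxes;
    frames before the first or after the last keyframe reuse the nearest one.
    """
    if not keyframes:
        raise ValueError("at least one detected keyframe is required")
    indices = sorted(keyframes)
    result: List[BBox] = []
    for frame in range(n_frames):
        # Nearest keyframes on each side of this frame (clamped at the ends).
        prev_idx = max((i for i in indices if i <= frame), default=indices[0])
        next_idx = min((i for i in indices if i >= frame), default=indices[-1])
        if prev_idx == next_idx:
            result.append(keyframes[prev_idx])
            continue
        t = (frame - prev_idx) / (next_idx - prev_idx)
        a, b = keyframes[prev_idx], keyframes[next_idx]
        result.append(tuple(p + (q - p) * t for p, q in zip(a, b)))
    return result
```

With `DETECT_EVERY = 5`, the detector would populate `keyframes` at frames 0, 5, 10, and so on, and this function would supply the four frames in between each pair. The per-frame keyframe search is linear for clarity; a production version would walk the sorted indices once.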
|
||||
|
||||
---
|
||||
|
||||
## 常见问题
|
||||
|
||||
### huggingface-hub 版本冲突
|
||||
|
||||
```
|
||||
ImportError: huggingface-hub>=0.19.3,<1.0 is required
|
||||
```
|
||||
|
||||
**解决**:降级 huggingface-hub
|
||||
|
||||
```bash
|
||||
pip install "huggingface-hub>=0.19.3,<1.0"
|
||||
```
|
||||
|
||||
### mmcv 导入失败
|
||||
|
||||
```bash
|
||||
pip uninstall mmcv mmcv-full -y
|
||||
mim install "mmcv==2.0.1"
|
||||
```
|
||||
|
||||
### 音视频长度不匹配
|
||||
|
||||
已在 `musetalk/utils/audio_processor.py` 中修复(零填充逻辑),无需额外处理。
|
||||
@@ -1,13 +1,13 @@
|
||||
# Qwen3-TTS 0.6B 部署指南
|
||||
# Qwen3-TTS 1.7B 部署指南
|
||||
|
||||
> 本文档描述如何在 Ubuntu 服务器上部署 Qwen3-TTS 0.6B-Base 声音克隆模型。
|
||||
> 本文档描述如何在 Ubuntu 服务器上部署 Qwen3-TTS 1.7B-Base 声音克隆模型。
|
||||
|
||||
## 系统要求
|
||||
|
||||
| 要求 | 规格 |
|
||||
|------|------|
|
||||
| GPU | NVIDIA RTX 3090 24GB (或更高) |
|
||||
| VRAM | ≥ 4GB (推理), ≥ 8GB (带 flash-attn) |
|
||||
| VRAM | ≥ 8GB (推理), ≥ 12GB (带 flash-attn) |
|
||||
| CUDA | 12.1+ |
|
||||
| Python | 3.10.x |
|
||||
| 系统 | Ubuntu 20.04+ |
|
||||
@@ -18,7 +18,7 @@
|
||||
|
||||
| GPU | 服务 | 模型 |
|
||||
|-----|------|------|
|
||||
| GPU0 | **Qwen3-TTS** | 0.6B-Base (声音克隆) |
|
||||
| GPU0 | **Qwen3-TTS** | 1.7B-Base (声音克隆,更高质量) |
|
||||
| GPU1 | LatentSync | 1.6 (唇形同步) |
|
||||
|
||||
---
|
||||
@@ -55,9 +55,9 @@ pip install -e .
|
||||
conda install -y -c conda-forge sox
|
||||
```
|
||||
|
||||
### 可选: 安装 FlashAttention (推荐)
|
||||
### 可选: 安装 FlashAttention (强烈推荐)
|
||||
|
||||
FlashAttention 可以显著提升推理速度并减少显存占用:
|
||||
FlashAttention 可以显著提升推理速度 (加载时间减少 85%) 并减少显存占用:
|
||||
|
||||
```bash
|
||||
pip install -U flash-attn --no-build-isolation
|
||||
@@ -81,8 +81,8 @@ pip install modelscope
|
||||
# 下载 Tokenizer (651MB)
|
||||
modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./checkpoints/Tokenizer
|
||||
|
||||
# 下载 0.6B-Base 模型 (2.4GB)
|
||||
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base --local_dir ./checkpoints/0.6B-Base
|
||||
# 下载 1.7B-Base 模型 (6.8GB)
|
||||
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./checkpoints/1.7B-Base
|
||||
```
|
||||
|
||||
### 方式 B: HuggingFace
|
||||
@@ -91,7 +91,7 @@ modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base --local_dir ./checkpoi
|
||||
pip install -U "huggingface_hub[cli]"
|
||||
|
||||
huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./checkpoints/Tokenizer
|
||||
huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-Base --local-dir ./checkpoints/0.6B-Base
|
||||
huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./checkpoints/1.7B-Base
|
||||
```
|
||||
|
||||
下载完成后,目录结构应如下:
|
||||
@@ -102,7 +102,7 @@ checkpoints/
|
||||
│ ├── config.json
|
||||
│ ├── model.safetensors
|
||||
│ └── ...
|
||||
└── 0.6B-Base/ # ~2.4GB
|
||||
└── 1.7B-Base/ # ~6.8GB
|
||||
├── config.json
|
||||
├── model.safetensors
|
||||
└── ...
|
||||
@@ -136,7 +136,7 @@ from qwen_tts import Qwen3TTSModel
|
||||
|
||||
print("Loading Qwen3-TTS model on GPU:0...")
|
||||
model = Qwen3TTSModel.from_pretrained(
|
||||
"./checkpoints/0.6B-Base",
|
||||
"./checkpoints/1.7B-Base",
|
||||
device_map="cuda:0",
|
||||
dtype=torch.bfloat16,
|
||||
)
|
||||
@@ -169,24 +169,106 @@ python test_inference.py
|
||||
|
||||
---
|
||||
|
||||
## 步骤 6: 安装 HTTP 服务依赖
|
||||
|
||||
```bash
|
||||
conda activate qwen-tts
|
||||
pip install fastapi uvicorn python-multipart
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 步骤 7: 启动服务 (PM2 管理)
|
||||
|
||||
### 手动测试
|
||||
|
||||
```bash
|
||||
conda activate qwen-tts
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS
|
||||
python qwen_tts_server.py
|
||||
```
|
||||
|
||||
访问 http://localhost:8009/health 验证服务状态。
|
||||
|
||||
### PM2 常驻服务
|
||||
|
||||
> ⚠️ **注意**:启动脚本 `run_qwen_tts.sh` 位于项目**根目录**,而非 models/Qwen3-TTS 目录。
|
||||
|
||||
1. 使用启动脚本:
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
|
||||
pm2 save
|
||||
```
|
||||
|
||||
2. 查看日志:
|
||||
```bash
|
||||
pm2 logs vigent2-qwen-tts
|
||||
```
|
||||
|
||||
3. 重启服务:
|
||||
```bash
|
||||
pm2 restart vigent2-qwen-tts
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 目录结构
|
||||
|
||||
部署完成后,目录结构应如下:
|
||||
|
||||
```
|
||||
/home/rongye/ProgramFiles/ViGent2/models/Qwen3-TTS/
|
||||
├── checkpoints/
|
||||
│ ├── Tokenizer/ # 语音编解码器
|
||||
│ └── 0.6B-Base/ # 声音克隆模型
|
||||
├── qwen_tts/ # 源码
|
||||
│ ├── inference/
|
||||
│ ├── models/
|
||||
│ └── ...
|
||||
├── examples/
|
||||
│ └── myvoice.wav # 参考音频
|
||||
├── pyproject.toml
|
||||
├── requirements.txt
|
||||
└── test_inference.py # 测试脚本
|
||||
/home/rongye/ProgramFiles/ViGent2/
|
||||
├── run_qwen_tts.sh # PM2 启动脚本 (根目录)
|
||||
└── models/Qwen3-TTS/
|
||||
├── checkpoints/
|
||||
│ ├── Tokenizer/ # 语音编解码器
|
||||
│ └── 1.7B-Base/ # 声音克隆模型 (更高质量)
|
||||
├── qwen_tts/ # 源码
|
||||
│ ├── inference/
|
||||
│ ├── models/
|
||||
│ └── ...
|
||||
├── examples/
|
||||
│ └── myvoice.wav # 参考音频
|
||||
├── qwen_tts_server.py # HTTP 推理服务 (端口 8009)
|
||||
├── pyproject.toml
|
||||
├── requirements.txt
|
||||
└── test_inference.py # 测试脚本
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## API 参考
|
||||
|
||||
### 健康检查
|
||||
|
||||
```
|
||||
GET http://localhost:8009/health
|
||||
```
|
||||
|
||||
响应:
|
||||
```json
|
||||
{
|
||||
"service": "Qwen3-TTS Voice Clone",
|
||||
"model": "1.7B-Base",
|
||||
"ready": true,
|
||||
"gpu_id": 0
|
||||
}
|
||||
```
|
||||
|
||||
### 声音克隆生成
|
||||
|
||||
```
|
||||
POST http://localhost:8009/generate
|
||||
Content-Type: multipart/form-data
|
||||
|
||||
Fields:
|
||||
- ref_audio: 参考音频文件 (WAV)
|
||||
- text: 要合成的文本
|
||||
- ref_text: 参考音频的转写文字
|
||||
- language: 语言 (默认 Chinese)
|
||||
|
||||
Response: audio/wav 文件
|
||||
```
|
||||
|
||||
---
|
||||
@@ -199,7 +281,7 @@ python test_inference.py
|
||||
|------|------|------|
|
||||
| 0.6B-Base | 3秒快速声音克隆 | 2.4GB |
|
||||
| 0.6B-CustomVoice | 9种预设音色 | 2.4GB |
|
||||
| 1.7B-Base | 声音克隆 (更高质量) | 6.8GB |
|
||||
| **1.7B-Base** | **声音克隆 (更高质量)** ✅ 当前使用 | 6.8GB |
|
||||
| 1.7B-VoiceDesign | 自然语言描述生成声音 | 6.8GB |
|
||||
|
||||
### 支持语言
|
||||
@@ -216,25 +298,34 @@ python test_inference.py
|
||||
SoX could not be found!
|
||||
```
|
||||
|
||||
**解决**: 通过 conda 安装 sox:
|
||||
**解决**:
|
||||
|
||||
1. 通过 conda 安装 sox:
|
||||
|
||||
```bash
|
||||
conda install -y -c conda-forge sox
|
||||
```
|
||||
|
||||
2. 确保启动脚本 `run_qwen_tts.sh` 中已 export conda env bin 到 PATH(PM2 启动时系统 PATH 不含 conda 环境目录):
|
||||
|
||||
```bash
|
||||
export PATH="/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin:$PATH"
|
||||
```
|
||||
|
||||
### CUDA 内存不足
|
||||
|
||||
Qwen3-TTS 0.6B 通常只需要 4-6GB VRAM。如果遇到 OOM:
|
||||
Qwen3-TTS 1.7B 通常需要 8-10GB VRAM。如果遇到 OOM:
|
||||
|
||||
1. 确保 GPU0 没有运行其他程序
|
||||
2. 不使用 flash-attn (会增加显存占用)
|
||||
3. 使用更小的参考音频 (3-5秒)
|
||||
4. 如果显存仍不足,可降级使用 0.6B-Base 模型
|
||||
|
||||
### 模型加载失败
|
||||
|
||||
确保以下文件存在:
|
||||
- `checkpoints/0.6B-Base/config.json`
|
||||
- `checkpoints/0.6B-Base/model.safetensors`
|
||||
- `checkpoints/1.7B-Base/config.json`
|
||||
- `checkpoints/1.7B-Base/model.safetensors`
|
||||
|
||||
### 音频输出质量问题
|
||||
|
||||
@@ -244,6 +335,55 @@ Qwen3-TTS 0.6B 通常只需要 4-6GB VRAM。如果遇到 OOM:
|
||||
|
||||
---
|
||||
|
||||
## 后端 ViGent2 集成
|
||||
|
||||
### Voice clone service (`voice_clone_service.py`)

The backend calls the Qwen3-TTS service over HTTP:

```python
import aiohttp

QWEN_TTS_URL = "http://localhost:8009"


async def generate_cloned_audio(ref_audio_path: str, text: str, output_path: str) -> str:
    """Send the reference audio and target text to Qwen3-TTS and save the WAV reply."""
    async with aiohttp.ClientSession() as session:
        with open(ref_audio_path, "rb") as f:
            # Build the multipart/form-data body expected by POST /generate.
            data = aiohttp.FormData()
            data.add_field("ref_audio", f, filename="ref.wav")
            data.add_field("text", text)

            async with session.post(f"{QWEN_TTS_URL}/generate", data=data) as resp:
                # Fail loudly on HTTP errors instead of saving an error body as audio.
                resp.raise_for_status()
                audio_data = await resp.read()

    with open(output_path, "wb") as out:
        out.write(audio_data)
    return output_path
```
|
||||
|
||||
### 参考音频 Supabase Bucket
|
||||
|
||||
```sql
|
||||
-- 创建 ref-audios bucket
|
||||
INSERT INTO storage.buckets (id, name, public)
|
||||
VALUES ('ref-audios', 'ref-audios', true)
|
||||
ON CONFLICT (id) DO NOTHING;
|
||||
|
||||
-- RLS 策略
|
||||
CREATE POLICY "Allow public uploads" ON storage.objects
|
||||
FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 更新日志
|
||||
|
||||
| 日期 | 版本 | 说明 |
|
||||
|------|------|------|
|
||||
| 2026-02-09 | 1.2.0 | 修复 SoX PATH 问题(run_qwen_tts.sh export conda bin),每次生成后 empty_cache() |
|
||||
| 2026-01-30 | 1.1.0 | 明确默认模型升级为 1.7B-Base,替换旧版 0.6B 路径 |
|
||||
|
||||
---
|
||||
|
||||
## 参考链接
|
||||
|
||||
- [Qwen3-TTS GitHub](https://github.com/QwenLM/Qwen3-TTS)
|
||||
|
||||
296
Docs/SUBTITLE_DEPLOY.md
Normal file
@@ -0,0 +1,296 @@
|
||||
# ViGent2 字幕与标题功能部署指南
|
||||
|
||||
本文档介绍如何部署 ViGent2 的逐字高亮字幕和片头标题功能。
|
||||
|
||||
## 功能概述
|
||||
|
||||
| 功能 | 说明 |
|
||||
|------|------|
|
||||
| **逐字高亮字幕** | 使用 faster-whisper 生成字级别时间戳,Remotion 渲染卡拉OK效果 |
|
||||
| **片头标题** | 视频开头显示标题,带淡入淡出动画,几秒后消失 |
|
||||
|
||||
## 技术架构
|
||||
|
||||
```
|
||||
原有流程:
|
||||
文本 → EdgeTTS → 音频 → LatentSync → FFmpeg合成 → 最终视频
|
||||
|
||||
新流程 (单素材):
|
||||
文本 → EdgeTTS/CosyVoice/预生成配音 → 音频 ─┬→ LatentSync/MuseTalk → 唇形视频 ─┐
|
||||
└→ faster-whisper → 字幕JSON ─┴→ Remotion合成 → 最终视频
|
||||
|
||||
新流程 (多素材):
|
||||
音频 → 多素材按 custom_assignments 拼接 → LatentSync/MuseTalk (单次推理) → 唇形视频 ─┐
|
||||
音频 → faster-whisper → 字幕JSON ─────────────────────────────────────────────┴→ Remotion合成 → 最终视频
|
||||
```
|
||||
|
||||
> **唇形同步路由**: 短视频 (<120s) 用 LatentSync 1.6 (GPU1),长视频 (>=120s) 用 MuseTalk 1.5 (GPU0),由 `LIPSYNC_DURATION_THRESHOLD` 控制。
|
||||
|
||||
## 系统要求
|
||||
|
||||
| 组件 | 要求 |
|
||||
|------|------|
|
||||
| Node.js | 18+ |
|
||||
| Python | 3.10+ |
|
||||
| GPU 显存 | faster-whisper 需要约 3-4GB VRAM |
|
||||
| FFmpeg | 已安装 |
|
||||
|
||||
---
|
||||
|
||||
## 部署步骤
|
||||
|
||||
### 步骤 1: 安装 faster-whisper (Python)
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
source venv/bin/activate
|
||||
|
||||
# 安装 faster-whisper
|
||||
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
|
||||
```
|
||||
|
||||
> **注意**: 首次运行时,faster-whisper 会自动下载 `large-v3` Whisper 模型 (~3GB)
|
||||
|
||||
### 步骤 2: 安装 Remotion (Node.js)
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/remotion
|
||||
|
||||
# 安装依赖
|
||||
npm install
|
||||
|
||||
# 预编译渲染脚本 (生产环境必须)
|
||||
npm run build:render
|
||||
```
|
||||
|
||||
### 步骤 3: 重启后端服务
|
||||
|
||||
```bash
|
||||
pm2 restart vigent2-backend
|
||||
```
|
||||
|
||||
### 步骤 4: 验证安装
|
||||
|
||||
```bash
|
||||
# 检查 faster-whisper 是否安装成功
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
source venv/bin/activate
|
||||
python -c "from faster_whisper import WhisperModel; print('faster-whisper OK')"
|
||||
|
||||
# 检查 Remotion 是否安装成功
|
||||
cd /home/rongye/ProgramFiles/ViGent2/remotion
|
||||
npx remotion --version
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 文件结构
|
||||
|
||||
### 后端新增文件
|
||||
|
||||
| 文件 | 说明 |
|
||||
|------|------|
|
||||
| `backend/app/services/whisper_service.py` | 字幕对齐服务 (基于 faster-whisper) |
|
||||
| `backend/app/services/remotion_service.py` | Remotion 渲染服务 |
|
||||
|
||||
### Remotion 项目结构
|
||||
|
||||
```
|
||||
remotion/
|
||||
├── package.json # Node.js 依赖配置
|
||||
├── tsconfig.json # TypeScript 配置
|
||||
├── render.ts # 服务端渲染脚本
|
||||
└── src/
|
||||
├── index.ts # Remotion 入口
|
||||
├── Root.tsx # 根组件
|
||||
├── Video.tsx # 主视频组件
|
||||
├── components/
|
||||
│ ├── Title.tsx # 片头标题组件
|
||||
│ ├── Subtitles.tsx # 逐字高亮字幕组件
|
||||
│ └── VideoLayer.tsx # 视频图层组件
|
||||
├── utils/
|
||||
│ └── captions.ts # 字幕数据处理工具
|
||||
└── fonts/ # 字体文件目录 (可选)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## API 参数
|
||||
|
||||
视频生成 API (`POST /api/videos/generate`) 新增以下参数:
|
||||
|
||||
| 参数 | 类型 | 默认值 | 说明 |
|
||||
|------|------|--------|------|
|
||||
| `title` | string | null | 视频标题(片头显示,可选) |
|
||||
| `enable_subtitles` | boolean | true | 是否启用逐字高亮字幕 |
|
||||
|
||||
### 请求示例
|
||||
|
||||
```json
|
||||
{
|
||||
"material_path": "https://...",
|
||||
"text": "大家好,欢迎来到我的频道",
|
||||
"tts_mode": "edgetts",
|
||||
"voice": "zh-CN-YunxiNeural",
|
||||
"title": "今日分享",
|
||||
"enable_subtitles": true
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Video Generation Pipeline

Progress allocation in the new video generation pipeline:

| Stage | Progress | Description |
|------|------|------|
| Download material | 0% → 5% | Download the input video from Supabase |
| TTS voice generation | 5% → 25% | EdgeTTS / Qwen3-TTS / download of pre-generated voiceover |
| Lip sync | 25% → 80% | LatentSync / MuseTalk inference |
| Subtitle alignment | 80% → 85% | faster-whisper generates character-level timestamps |
| Remotion rendering | 85% → 95% | Composite subtitles and title |
| Upload result | 95% → 100% | Upload to Supabase Storage |
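
The progress table above amounts to mapping each stage's local completion onto a fixed slice of the 0-100 range. The sketch below illustrates that mapping; the stage keys and the `overall_progress` function are hypothetical names chosen for the example, and only the percentage ranges come from the table.

```python
# Percentage ranges copied from the table; the stage keys are illustrative.
STAGE_RANGES = {
    "download": (0, 5),
    "tts": (5, 25),
    "lipsync": (25, 80),
    "subtitles": (80, 85),
    "render": (85, 95),
    "upload": (95, 100),
}


def overall_progress(stage: str, fraction: float) -> float:
    """Convert a stage's local completion (0.0-1.0) into overall percent."""
    start, end = STAGE_RANGES[stage]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range inputs
    return start + (end - start) * fraction
```

Lip sync halfway done therefore reports 52.5%, which reflects how much of the total budget (55 points) that stage is given.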
|
||||
|
||||
---
|
||||
|
||||
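## Fallback Handling

The system degrades automatically so that core functionality keeps working:

| Scenario | Handling |
|------|----------|
| Subtitle alignment fails | Skip subtitles and continue generating the video |
| Remotion not installed | Composite directly with FFmpeg |
| Remotion rendering fails | Fall back to FFmpeg compositing |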
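
The three fallback rules in the table can be expressed as one small control-flow function. This is a sketch under stated assumptions, not the backend's code: `compose_with_fallback` and its callable parameters are hypothetical stand-ins for the Whisper, Remotion, and FFmpeg services.

```python
from typing import Callable, Optional


def compose_with_fallback(
    align_subtitles: Callable[[], dict],
    render_remotion: Callable[[Optional[dict]], str],
    compose_ffmpeg: Callable[[], str],
    remotion_available: bool,
) -> str:
    """Return the output video path, degrading step by step on failure."""
    try:
        captions: Optional[dict] = align_subtitles()
    except Exception:
        captions = None  # alignment failed: carry on without subtitles
    if not remotion_available:
        return compose_ffmpeg()  # Remotion not installed
    try:
        return render_remotion(captions)
    except Exception:
        return compose_ffmpeg()  # Remotion render failed
```

Passing the services in as callables keeps the fallback order testable without a GPU or a Node.js install; the broad `except Exception` is deliberate here because any failure in an optional stage should degrade rather than abort.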
|
||||
|
||||
---
|
||||
|
||||
## 配置说明
|
||||
|
||||
### 字幕服务配置
|
||||
|
||||
字幕服务位于 `backend/app/services/whisper_service.py`,默认配置:
|
||||
|
||||
| 参数 | 默认值 | 说明 |
|
||||
|------|--------|------|
|
||||
| `model_size` | large-v3 | Whisper 模型大小 |
|
||||
| `device` | cuda | 运行设备 |
|
||||
| `compute_type` | float16 | 计算精度 |
|
||||
|
||||
如需修改,可编辑 `whisper_service.py` 中的 `WhisperService` 初始化参数。
|
||||
|
||||
### Remotion 配置
|
||||
|
||||
Remotion 渲染参数在 `backend/app/services/remotion_service.py` 中配置:
|
||||
|
||||
| 参数 | 默认值 | 说明 |
|
||||
|------|--------|------|
|
||||
| `fps` | 25 | 输出帧率 |
|
||||
| `concurrency` | 16 | Remotion 并发渲染进程数(默认 16,可通过 `--concurrency` CLI 参数覆盖) |
|
||||
| `title_display_mode` | `short` | 标题显示模式(`short`=短暂显示;`persistent`=常驻显示) |
|
||||
| `title_duration` | 4.0 | 标题显示时长(秒,仅 `short` 模式生效) |
|
||||
|
||||
---
|
||||
|
||||
## 故障排除
|
||||
|
||||
### faster-whisper 相关
|
||||
|
||||
**问题**: `ModuleNotFoundError: No module named 'faster_whisper'`
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
source venv/bin/activate
|
||||
pip install "faster-whisper>=1.0.0" -i https://pypi.tuna.tsinghua.edu.cn/simple
|
||||
```
|
||||
|
||||
**问题**: GPU 显存不足
|
||||
|
||||
修改 `whisper_service.py`,使用较小的模型:
|
||||
```python
|
||||
WhisperService(model_size="medium", compute_type="int8")
|
||||
```
|
||||
|
||||
### Remotion 相关
|
||||
|
||||
**问题**: `node_modules not found`
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/remotion
|
||||
npm install
|
||||
```
|
||||
|
||||
**问题**: Remotion 渲染失败 - `fs` 模块错误
|
||||
|
||||
确保 `remotion/src/utils/captions.ts` 中没有使用 Node.js 的 `fs` 模块。Remotion 在浏览器环境打包,不支持 `fs`。
|
||||
|
||||
**问题**: Remotion 渲染失败 - 视频文件读取错误 (`file://` 协议)
|
||||
|
||||
确保 `render.ts` 使用 `publicDir` 选项指向视频所在目录,`VideoLayer.tsx` 使用 `staticFile()` 加载视频:
|
||||
|
||||
```typescript
|
||||
// render.ts
|
||||
const publicDir = path.dirname(path.resolve(options.videoPath));
|
||||
const bundleLocation = await bundle({
|
||||
entryPoint: path.resolve(__dirname, './src/index.ts'),
|
||||
publicDir, // 关键配置
|
||||
});
|
||||
|
||||
// VideoLayer.tsx
|
||||
const videoUrl = staticFile(videoSrc); // 使用 staticFile
|
||||
```
|
||||
|
||||
**问题**: Remotion 渲染失败
|
||||
|
||||
查看后端日志:
|
||||
```bash
|
||||
pm2 logs vigent2-backend
|
||||
```
|
||||
|
||||
### 查看服务健康状态
|
||||
|
||||
```bash
|
||||
# 字幕服务健康检查
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
source venv/bin/activate
|
||||
python -c "from app.services.whisper_service import whisper_service; import asyncio; print(asyncio.run(whisper_service.check_health()))"
|
||||
|
||||
# Remotion 健康检查
|
||||
python -c "from app.services.remotion_service import remotion_service; import asyncio; print(asyncio.run(remotion_service.check_health()))"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 可选优化
|
||||
|
||||
### 添加中文字体
|
||||
|
||||
为获得更好的字幕渲染效果,可添加中文字体:
|
||||
|
||||
```bash
|
||||
# 下载 Noto Sans SC 字体
|
||||
cd /home/rongye/ProgramFiles/ViGent2/remotion/src/fonts
|
||||
wget https://github.com/googlefonts/noto-cjk/raw/main/Sans/OTF/SimplifiedChinese/NotoSansSC-Regular.otf -O NotoSansSC.otf
|
||||
```
|
||||
|
||||
### Using GPU 0

faster-whisper uses GPU 0 by default and shares it with MuseTalk; LatentSync runs on GPU 1, so the two do not conflict. To pin a specific GPU:
|
||||
|
||||
```python
|
||||
# Change this in whisper_service.py
|
||||
WhisperService(device="cuda:0") # 或 "cuda:1"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
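## Changelog

| Date | Version | Notes |
|------|---------|-------|
| 2026-01-29 | 1.0.0 | Initial version: per-character highlighted subtitles and an intro title built with faster-whisper + Remotion |
| 2026-01-30 | 1.0.1 | Refined subtitle highlight style and title animation for clearer visuals |
| 2026-02-10 | 1.1.0 | Updated architecture diagram: multi-material concat-then-infer, pre-generated voiceover option |
| 2026-02-25 | 1.2.0 | Subtitle timestamps switched from linear interpolation to Whisper rhythm mapping, fixing subtitle drift in long videos |
| 2026-02-27 | 1.3.0 | Architecture diagram updated with MuseTalk hybrid routing; Remotion render concurrency raised from 8 to 16; GPU allocation notes updated |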
|
||||
@@ -1,405 +0,0 @@
|
||||
# 数字人口播视频生成系统 - 实现计划
|
||||
|
||||
## 项目目标
|
||||
|
||||
构建一个开源的数字人口播视频生成系统,功能包括:
|
||||
- 上传静态人物视频 → 生成口播视频(唇形同步)
|
||||
- TTS 配音或声音克隆
|
||||
- 字幕自动生成与渲染
|
||||
- 一键发布到多个社交平台
|
||||
|
||||
---
|
||||
|
||||
## 技术架构
|
||||
|
||||
```
┌─────────────────────────────────────────────────────────┐
│                   Frontend (Next.js)                    │
│    Materials | Video generation | Publishing | Tasks    │
└─────────────────────────────────────────────────────────┘
                             │ REST API
                             ▼
┌─────────────────────────────────────────────────────────┐
│                    Backend (FastAPI)                    │
├─────────────────────────────────────────────────────────┤
│  Async task queue (asyncio)                             │
│  ├── Video generation tasks                             │
│  ├── TTS voiceover tasks                                │
│  └── Auto-publish tasks                                 │
└─────────────────────────────────────────────────────────┘
          │                 │                 │
          ▼                 ▼                 ▼
     ┌──────────┐      ┌──────────┐      ┌──────────┐
     │LatentSync│      │  FFmpeg  │      │Playwright│
     │ Lip sync │      │ Compose  │      │Publishing│
     └──────────┘      └──────────┘      └──────────┘
```
|
||||
|
||||
---
|
||||
|
||||
## 技术选型
|
||||
|
||||
| 模块 | 技术选择 | 备选方案 |
|
||||
|------|----------|----------|
|
||||
| **前端框架** | Next.js 14 | Vue 3 + Vite |
|
||||
| **UI 组件库** | Tailwind + shadcn/ui | Ant Design |
|
||||
| **后端框架** | FastAPI | Flask |
|
||||
| **任务队列** | Celery + Redis | RQ / Dramatiq |
|
||||
| **唇形同步** | **LatentSync 1.6** | MuseTalk / Wav2Lip |
|
||||
| **TTS 配音** | EdgeTTS | CosyVoice |
|
||||
| **声音克隆** | GPT-SoVITS (可选) | - |
|
||||
| **视频处理** | FFmpeg | MoviePy |
|
||||
| **自动发布** | social-auto-upload | 自行实现 |
|
||||
| **数据库** | SQLite → PostgreSQL | MySQL |
|
||||
| **文件存储** | 本地 / MinIO | 阿里云 OSS |
|
||||
|
||||
---
|
||||
|
||||
## 分阶段实施计划
|
||||
|
||||
### 阶段一:核心功能验证 (MVP)
|
||||
|
||||
> **目标**:验证 MuseTalk + EdgeTTS 效果,跑通端到端流程
|
||||
|
||||
#### 1.1 环境搭建
|
||||
|
||||
```bash
|
||||
# 创建项目目录
|
||||
mkdir TalkingHeadAgent
|
||||
cd TalkingHeadAgent
|
||||
|
||||
# 克隆 MuseTalk
|
||||
git clone https://github.com/TMElyralab/MuseTalk.git
|
||||
|
||||
# 安装依赖
|
||||
cd MuseTalk
|
||||
pip install -r requirements.txt
|
||||
|
||||
# 下载模型权重 (按官方文档)
|
||||
```
|
||||
|
||||
#### 1.2 集成 EdgeTTS
|
||||
|
||||
```python
# tts_engine.py
import asyncio

import edge_tts


async def text_to_speech(text: str, voice: str = "zh-CN-YunxiNeural", output_path: str = "output.mp3") -> str:
    """Synthesize `text` with EdgeTTS and write the audio to `output_path`."""
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)
    return output_path


if __name__ == "__main__":
    asyncio.run(text_to_speech("你好,世界"))
```
|
||||
|
||||
#### 1.3 端到端测试脚本
|
||||
|
||||
```python
|
||||
# test_pipeline.py
|
||||
"""
|
||||
1. 文案 → EdgeTTS → 音频
|
||||
2. 静态视频 + 音频 → MuseTalk → 口播视频
|
||||
3. 添加字幕 → FFmpeg → 最终视频
|
||||
"""
|
||||
```
|
||||
|
||||
#### 1.4 验证标准
|
||||
- [ ] MuseTalk 能正常推理
|
||||
- [ ] 唇形与音频同步率 > 90%
|
||||
- [ ] 单个视频生成时间 < 2 分钟
|
||||
|
||||
---
|
||||
|
||||
### 阶段二:后端 API 开发
|
||||
|
||||
> **目标**:将核心功能封装为 API,支持异步任务
|
||||
|
||||
#### 2.1 项目结构
|
||||
|
||||
```
backend/
├── app/
│   ├── main.py                  # FastAPI entry point
│   ├── api/
│   │   ├── videos.py            # Video generation API
│   │   ├── materials.py         # Material management API
│   │   └── publish.py           # Publishing API
│   ├── services/
│   │   ├── tts_service.py       # TTS service
│   │   ├── lipsync_service.py   # Lip-sync service
│   │   └── video_service.py     # Video compositing service
│   ├── tasks/
│   │   └── celery_tasks.py      # Celery async tasks
│   ├── models/
│   │   └── schemas.py           # Pydantic models
│   └── core/
│       └── config.py            # Configuration
├── requirements.txt
└── docker-compose.yml           # Redis + API
```
|
||||
|
||||
#### 2.2 核心 API 设计
|
||||
|
||||
| Endpoint | Method | Description | Status |
|------|------|------|------|
| `/api/materials` | POST | Upload a material video | ✅ |
| `/api/materials` | GET | List materials | ✅ |
| `/api/videos/generate` | POST | Create a video generation task | ✅ |
| `/api/tasks/{id}` | GET | Query task status | ✅ |
| `/api/videos/{id}/download` | GET | Download the generated video | ✅ |
| `/api/publish` | POST | Publish to social platforms | ✅ |
|
||||
|
||||
#### 2.3 Celery 任务定义
|
||||
|
||||
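```python
# tasks/celery_tasks.py
from celery import Celery

# The app name and broker URL are assumptions; point the broker at your Redis instance.
celery = Celery("vigent", broker="redis://localhost:6379/0")


@celery.task
def generate_video_task(material_id: str, text: str, voice: str):
    # 1. Generate audio with TTS
    # 2. Lip-sync with MuseTalk
    # 3. Add subtitles with FFmpeg
    # 4. Save the result and return the video URL
    pass
```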
|
||||
|
||||
---
|
||||
|
||||
### 阶段三:前端 Web UI
|
||||
|
||||
> **目标**:提供用户友好的操作界面
|
||||
|
||||
#### 3.1 页面设计
|
||||
|
||||
| 页面 | 功能 |
|
||||
|------|------|
|
||||
| **素材库** | 上传/管理多场景素材视频 |
|
||||
| **生成视频** | 输入文案、选择素材、生成预览 |
|
||||
| **任务中心** | 查看生成进度、下载视频 |
|
||||
| **发布管理** | 绑定平台、一键发布、定时发布 |
|
||||
|
||||
#### 3.2 技术实现
|
||||
|
||||
```bash
|
||||
# 创建 Next.js 项目
|
||||
npx create-next-app@latest frontend --typescript --tailwind --app
|
||||
|
||||
# 安装依赖
|
||||
cd frontend
|
||||
npm install @tanstack/react-query axios
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 阶段四:社交媒体发布
|
||||
|
||||
> **目标**:集成 social-auto-upload,支持多平台发布
|
||||
|
||||
#### 4.1 复用 social-auto-upload
|
||||
|
||||
```bash
|
||||
# 复制模块
|
||||
cp -r SuperIPAgent/social-auto-upload backend/social_upload
|
||||
```
|
||||
|
||||
#### 4.2 Cookie 管理
|
||||
|
||||
```python
|
||||
# 用户通过浏览器登录 → 保存 Cookie → 后续自动发布
|
||||
```
|
||||
|
||||
#### 4.3 支持平台
|
||||
- 抖音
|
||||
- 小红书
|
||||
- 微信视频号
|
||||
- 快手
|
||||
|
||||
---
|
||||
|
||||
### 阶段五:优化与扩展
|
||||
|
||||
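| Feature | Approach | Status |
|------|----------|------|
| **Voice cloning** | Integrate GPT-SoVITS to use your own voice | |
| **Batch generation** | Upload Excel/CSV to generate videos in bulk | |
| **Subtitle editor** | Visually adjust subtitle style and position | |
| **Docker deployment** | One-command deployment to a cloud server | ✅ |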
|
||||
|
||||
---
|
||||
|
||||
### 阶段六:MuseTalk 服务器部署 (Day 2-3) ✅
|
||||
|
||||
> **目标**:在双显卡服务器上部署 MuseTalk 环境
|
||||
|
||||
- [x] Conda 环境配置 (musetalk)
|
||||
- [x] 模型权重下载 (~7GB)
|
||||
- [x] Subprocess 调用方式实现
|
||||
- [x] 健康检查功能
|
||||
|
||||
### 阶段七:MuseTalk 完整修复 (Day 4) ✅
|
||||
|
||||
> **目标**:解决推理脚本的各种兼容性问题
|
||||
|
||||
- [x] 权重检测路径修复 (软链接)
|
||||
- [x] 音视频长度不匹配修复
|
||||
- [x] 推理脚本错误日志增强
|
||||
- [x] 视频合成 MP4 生成验证
|
||||
|
||||
### 阶段八:前端功能增强 (Day 5) ✅
|
||||
|
||||
> **目标**:提升用户体验
|
||||
|
||||
- [x] Web 视频上传功能
|
||||
- [x] 上传进度显示
|
||||
- [x] 自动刷新素材列表
|
||||
|
||||
### 阶段九:唇形同步模型升级 (Day 6) ✅
|
||||
|
||||
> **目标**:从 MuseTalk 迁移到 LatentSync 1.6
|
||||
|
||||
- [x] MuseTalk → LatentSync 1.6 迁移
|
||||
- [x] 后端代码适配 (config.py, lipsync_service.py)
|
||||
- [x] Latent Diffusion 架构 (512x512 高清)
|
||||
- [x] 服务器端到端验证
|
||||
|
||||
### 阶段十:性能优化 (Day 6) ✅
|
||||
|
||||
> **目标**:提升系统响应速度和稳定性
|
||||
|
||||
- [x] 视频预压缩优化 (1080p → 720p 自动适配)
|
||||
- [x] 进度更新细化 (实时反馈)
|
||||
- [x] **常驻模型服务** (Persistent Server, 0s 加载)
|
||||
- [x] **GPU 并发控制** (串行队列防崩溃)
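
The "serial queue" idea behind the GPU concurrency control item can be sketched with a single-slot semaphore. This is a minimal illustration, not the project's actual code: `run_on_gpu` is a hypothetical name and the `asyncio.sleep` call stands in for the real inference.

```python
import asyncio


async def run_on_gpu(gpu_slot: asyncio.Semaphore, job_id: int, log: list) -> None:
    """Run one job while holding the only GPU slot, so jobs never overlap."""
    async with gpu_slot:
        log.append(("start", job_id))
        await asyncio.sleep(0.01)  # placeholder for the real inference call
        log.append(("end", job_id))


async def main() -> list:
    gpu_slot = asyncio.Semaphore(1)  # one inference at a time
    log: list = []
    # Three requests arrive together; the semaphore forces them to run one by one.
    await asyncio.gather(*(run_on_gpu(gpu_slot, i, log) for i in range(3)))
    return log


log = asyncio.run(main())
print(log)
```

Each "start" is immediately followed by its own "end", which is the property that keeps concurrent requests from exhausting GPU memory.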
|
||||
|
||||
### 阶段十一:社交媒体发布完善 (Day 7) ✅
|
||||
|
||||
> **目标**:实现全自动扫码登录和多平台发布
|
||||
|
||||
- [x] QR码自动登录 (Playwright headless + Stealth)
|
||||
- [x] 多平台上传器架构 (B站/抖音/小红书)
|
||||
- [x] Cookie 自动管理
|
||||
- [x] 定时发布功能
|
||||
|
||||
### 阶段十二:用户体验优化 (Day 8) ✅
|
||||
|
||||
> **目标**:提升文件管理和历史记录功能
|
||||
|
||||
- [x] 文件名保留 (时间戳前缀 + 原始名称)
|
||||
- [x] 视频持久化 (历史视频列表 API)
|
||||
- [x] 素材/视频删除功能
|
||||
|
||||
### 阶段十三:发布模块优化 (Day 9) ✅
|
||||
|
||||
> **目标**:代码质量优化 + 发布功能验证
|
||||
|
||||
- [x] B站/抖音登录+发布验证通过
|
||||
- [x] 资源清理保障 (try-finally)
|
||||
- [x] 超时保护 (消除无限循环)
|
||||
- [x] 完整类型提示
|
||||
|
||||
### 阶段十四:用户认证系统 (Day 9) ✅
|
||||
|
||||
> **目标**:实现安全、隔离的多用户认证体系
|
||||
|
||||
- [x] Supabase 云数据库集成 (本地自托管)
|
||||
- [x] JWT + HttpOnly Cookie 认证架构
|
||||
- [x] 用户表与权限表设计 (RLS 准备)
|
||||
- [x] 认证部署文档 (Docs/SUPABASE_DEPLOY.md)
|
||||
|
||||
### 阶段十五:部署稳定性优化 (Day 9) ✅
|
||||
|
||||
> **目标**:确保生产环境服务长期稳定
|
||||
|
||||
- [x] 依赖冲突修复 (bcrypt)
|
||||
- [x] 前端构建修复 (Production Build)
|
||||
- [x] PM2 进程守护配置
|
||||
- [x] 部署手册更新 (Docs/DEPLOY_MANUAL.md)
|
||||
|
||||
### 阶段十六:HTTPS 全栈部署 (Day 10) ✅
|
||||
|
||||
> **目标**:实现安全的公网 HTTPS 访问
|
||||
|
||||
- [x] 阿里云 Nginx 反向代理配置
|
||||
- [x] Let's Encrypt SSL 证书集成
|
||||
- [x] Supabase 自托管部署 (Docker)
|
||||
- [x] 端口冲突解决 (3003/8008/8444)
|
||||
- [x] Basic Auth 管理后台保护
|
||||
|
||||
---
|
||||
|
||||
## 项目目录结构 (最终)
|
||||
|
||||
```
TalkingHeadAgent/
├── frontend/                    # Next.js frontend
│   ├── app/
│   ├── components/
│   └── package.json
├── backend/                     # FastAPI backend
│   ├── app/
│   ├── MuseTalk/                # Lip-sync model
│   ├── social_upload/           # Social publishing module
│   └── requirements.txt
├── docker-compose.yml           # One-command deployment
└── README.md
```
|
||||
|
||||
---
|
||||
|
||||
## 开发时间估算
|
||||
|
||||
| 阶段 | 预计时间 | 说明 |
|
||||
|------|----------|------|
|
||||
| 阶段一 | 2-3 天 | 环境搭建 + 效果验证 |
|
||||
| 阶段二 | 3-4 天 | 后端 API 开发 |
|
||||
| 阶段三 | 3-4 天 | 前端 UI 开发 |
|
||||
| 阶段四 | 2 天 | 社交发布集成 |
|
||||
| 阶段五 | 按需 | 持续优化 |
|
||||
|
||||
**总计**:约 10-13 天可完成 MVP
|
||||
|
||||
---
|
||||
|
||||
## 验证计划
|
||||
|
||||
### 阶段一验证
|
||||
1. 运行 `test_pipeline.py` 脚本
|
||||
2. 检查生成视频的唇形同步效果
|
||||
3. 确认音画同步
|
||||
|
||||
### 阶段二验证
|
||||
1. 使用 Postman/curl 测试所有 API 端点
|
||||
2. 验证任务队列正常工作
|
||||
3. 检查视频生成完整流程
|
||||
|
||||
### 阶段三验证
|
||||
1. 在浏览器中完成完整操作流程
|
||||
2. 验证上传、生成、下载功能
|
||||
3. 检查响应式布局
|
||||
|
||||
### 阶段四验证
|
||||
1. 发布一个测试视频到抖音
|
||||
2. 验证定时发布功能
|
||||
3. 检查发布状态同步
|
||||
|
||||
---
|
||||
|
||||
## 硬件要求
|
||||
|
||||
| 配置 | 最低要求 | 推荐配置 |
|
||||
|------|----------|----------|
|
||||
| **GPU** | NVIDIA GTX 1060 6GB | RTX 3060 12GB+ |
|
||||
| **内存** | 16GB | 32GB |
|
||||
| **存储** | 100GB SSD | 500GB SSD |
|
||||
| **CUDA** | 11.7+ | 12.0+ |
|
||||
|
||||
---
|
||||
|
||||
## 下一步行动
|
||||
|
||||
1. **确认你的 GPU 配置** - MuseTalk 需要 NVIDIA GPU
|
||||
2. **选择开发起点** - 从阶段一开始验证效果
|
||||
3. **确定项目位置** - 在哪个目录创建项目
|
||||
|
||||
---
|
||||
|
||||
> [!IMPORTANT]
|
||||
> 请确认以上计划是否符合你的需求,有任何需要调整的地方请告诉我。
|
||||
@@ -1,360 +1,272 @@
|
||||
# ViGent 数字人口播系统 - 开发任务清单
|
||||
# ViGent2 开发任务清单 (Task Log)
|
||||
|
||||
**项目**:ViGent2 数字人口播视频生成系统
|
||||
**服务器**:Dell R730 (2× RTX 3090 24GB)
|
||||
**更新时间**:2026-01-28
|
||||
**整体进度**:100%(Day 12 iOS 兼容、移动端优化、Qwen3-TTS 部署)
|
||||
|
||||
## 📖 快速导航
|
||||
|
||||
| 章节 | 说明 |
|
||||
|------|------|
|
||||
| [已完成任务](#-已完成任务) | Day 1-12 完成的功能 |
|
||||
| [后续规划](#️-后续规划) | 待办项目 |
|
||||
| [进度统计](#-进度统计) | 各模块完成度 |
|
||||
| [里程碑](#-里程碑) | 关键节点 |
|
||||
| [时间线](#-时间线) | 开发历程 |
|
||||
|
||||
**相关文档**:
|
||||
- [Day 日志](file:///d:/CodingProjects/Antigravity/ViGent2/Docs/DevLogs/) (Day1-Day12)
|
||||
- [部署指南](file:///d:/CodingProjects/Antigravity/ViGent2/Docs/DEPLOY_MANUAL.md)
|
||||
- [Qwen3-TTS 部署](file:///d:/CodingProjects/Antigravity/ViGent2/Docs/QWEN3_TTS_DEPLOY.md)
|
||||
**项目**: ViGent2 数字人口播视频生成系统
|
||||
**进度**: 100% (Day 28 - CosyVoice FP16 加速 + 文档全面更新)
|
||||
**更新时间**: 2026-02-27
|
||||
|
||||
---
|
||||
|
||||
## ✅ 已完成任务
|
||||
## 📅 对话历史与开发日志
|
||||
|
||||
### 阶段一:核心功能验证
|
||||
- [x] EdgeTTS 配音集成
|
||||
- [x] FFmpeg 视频合成
|
||||
- [x] MuseTalk 唇形同步 (代码集成)
|
||||
- [x] 端到端流程验证
|
||||
> 这里记录了每一天的核心开发内容与 milestone。
|
||||
|
||||
### 阶段二:后端 API 开发
|
||||
- [x] FastAPI 项目搭建
|
||||
- [x] 视频生成 API
|
||||
- [x] 素材管理 API
|
||||
- [x] 文件存储管理
|
||||
### Day 28: CosyVoice FP16 加速 + 文档全面更新 (Current)
|
||||
- [x] **CosyVoice FP16 半精度加速**: `AutoModel()` 开启 `fp16=True`,LLM 推理和 Flow Matching 自动混合精度运行,预估提速 30-40%、显存降低 ~30%。
|
||||
- [x] **文档全面更新**: README.md / DEPLOY_MANUAL.md / SUBTITLE_DEPLOY.md / BACKEND_README.md 补充 MuseTalk 混合唇形同步方案、性能优化、Remotion 并发渲染等内容。
|
||||
|
||||
### 阶段三:前端 Web UI
|
||||
- [x] Next.js 项目初始化
|
||||
- [x] 视频生成页面
|
||||
- [x] 发布管理页面
|
||||
- [x] 任务状态展示
|
||||
### Day 27: Remotion 描边修复 + 字体样式扩展 + 混合唇形同步 + 性能优化
|
||||
- [x] **描边渲染修复**: 标题/副标题/字幕从 `textShadow` 4 方向模拟改为 CSS 原生 `-webkit-text-stroke` + `paint-order: stroke fill`,修复描边过粗和副标题重影问题。
|
||||
- [x] **字体样式扩展**: 标题样式 4→12 个(+庞门正道/优设标题圆/阿里数黑体/文道潮黑/无界黑/厚底黑/寒蝉半圆体/欣意吉祥宋),字幕样式 4→8 个(+少女粉/清新绿/金色隶书/楷体红字)。
|
||||
- [x] **描边参数优化**: 所有预设 `stroke_size` 从 8 降至 4~5,配合原生描边视觉更干净。
|
||||
- [x] **TypeScript 类型修复**: Root.tsx `Composition` 泛型与 `calculateMetadata` 参数类型对齐;Video.tsx `VideoProps` 添加索引签名兼容 `Record<string, unknown>`;VideoLayer.tsx 移除 `OffthreadVideo` 不支持的 `loop` prop。
|
||||
- [x] **进度条文案还原**: 进度条从显示后端推送消息改回固定 `正在AI生成中...`。
|
||||
- [x] **MuseTalk 混合唇形同步**: 部署 MuseTalk 1.5 常驻服务 (GPU0, 端口 8011),按音频时长自动路由 — 短视频 (<120s) 走 LatentSync,长视频 (>=120s) 走 MuseTalk,MuseTalk 不可用时自动回退。
|
||||
- [x] **MuseTalk 推理性能优化**: server.py v2 重写 — cv2 直读帧(跳过 ffmpeg→PNG)、人脸检测降频(每5帧)、BiSeNet mask 缓存(每5帧)、cv2.VideoWriter 直写(跳过 PNG 写盘)、batch_size 8→32,预估 30min→8-10min (~3x)。
|
||||
- [x] **Remotion 并发渲染优化**: render.ts 新增 concurrency 参数,从默认 8 提升到 16 (56核 CPU),预估 5min→2-3min。
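
The hybrid lip-sync routing rule from the Day 27 list can be written as a small pure function. This is a sketch under stated assumptions: the 120 s threshold comes from the entry above, while the function name, the backend labels, and the choice of LatentSync as the fallback target are illustrative guesses.

```python
LONG_VIDEO_THRESHOLD_S = 120  # threshold quoted in the Day 27 entry


def choose_lipsync_backend(audio_seconds: float, musetalk_available: bool) -> str:
    """Route short audio to LatentSync and long audio to MuseTalk.

    If MuseTalk is unavailable, long audio falls back to LatentSync
    (assumed fallback target; the source only says "falls back").
    """
    if audio_seconds >= LONG_VIDEO_THRESHOLD_S and musetalk_available:
        return "musetalk"
    return "latentsync"


print(choose_lipsync_backend(45.0, True))
print(choose_lipsync_backend(300.0, True))
print(choose_lipsync_backend(300.0, False))
```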
|
||||
|
||||
### 阶段四:社交媒体发布
|
||||
- [x] Playwright 自动化框架
|
||||
- [x] Cookie 管理功能
|
||||
- [x] 多平台发布 UI
|
||||
- [x] 定时发布功能 (Day 7)
|
||||
- [x] QR码自动登录 (Day 7)
|
||||
### Day 26: 前端优化:板块合并 + 序号标题 + UI 精细化
|
||||
- [x] **板块合并**: 首页 9 个独立板块合并为 5 个主板块(配音方式+配音列表→三、配音;视频素材+时间轴→四、素材编辑;历史作品+作品预览→六、作品)。
|
||||
- [x] **中文序号标题**: 一~十编号(首页一~六,发布页七~十),移除所有 emoji 图标。
|
||||
- [x] **embedded 模式**: 6 个组件支持 `embedded` prop,嵌入时不渲染外层卡片/标题。
|
||||
- [x] **配音列表两行布局**: embedded 模式第 1 行语速+生成配音(右对齐),第 2 行配音列表+刷新。
|
||||
- [x] **子组件自渲染子标题**: MaterialSelector/TimelineEditor embedded 时自渲染 h3 子标题+操作按钮同行。
|
||||
- [x] **下拉对齐**: TitleSubtitlePanel 标签统一 `w-20`,下拉 `w-1/3 min-w-[100px]`,垂直对齐。
|
||||
- [x] **参考音频文案简化**: 底部段落移至标题旁,简化为 `(上传3-10秒语音样本)`。
|
||||
- [x] **账户手机号显示**: AccountSettingsDropdown 新增手机号显示。
|
||||
- [x] **标题显示模式对副标题生效**: payload 条件修复 + UI 下拉上移至板块标题行。
|
||||
- [x] **登录后用户信息立即可用**: AuthContext 暴露 `setUser`,登录成功后立即写入用户数据,修复登录后显示"未知账户"的问题。
|
||||
- [x] **文案微调**: 素材描述改为"上传自拍视频,最多可选4个";显示模式选项加"标题"前缀。
|
||||
- [x] **UI/UX 体验优化**: 操作按钮移动端可见(opacity-40)、手机号脱敏、标题字数计数器、时间轴拖拽抓手图标、截取滑块放大。
|
||||
- [x] **代码质量修复**: 密码弹窗 success 清空、MaterialSelector useMemo + disabled 守卫、TimelineEditor useMemo。
|
||||
- [x] **发布页响应式布局**: 平台账号卡片单行布局,移动端紧凑(小图标/小按钮),桌面端宽松(与其他板块风格一致)。
|
||||
- [x] **移动端刷新回顶部**: `scrollRestoration = "manual"` + 列表 scroll 时间门控(`scrollEffectsEnabled` ref,1 秒内禁止自动滚动)+ 延迟兜底 `scrollTo(0,0)`。
|
||||
- [x] **移动端样式预览缩小**: FloatingStylePreview 移动端宽度缩至 160px,位置改为右下角,不遮挡样式调节控件。
|
||||
- [x] **列表滚动条统一隐藏**: 所有列表(BGM/配音/作品/素材/文案提取)滚动条改回 `hide-scrollbar`。
|
||||
- [x] **移动端配音/素材适配**: VoiceSelector 按钮移动端缩小(`px-2 sm:px-4`)修复克隆声音不可见;MaterialSelector 标题行移除 `whitespace-nowrap`,描述移动端隐藏,修复刷新按钮溢出。
|
||||
- [x] **生成配音按钮放大**: 从辅助尺寸(`text-xs px-2 py-1`)升级为主操作尺寸(`text-sm font-medium px-4 py-2`),新增阴影。
|
||||
- [x] **生成进度条位置调整**: 从"六、作品"卡片内部提取到右栏独立卡片,显示在作品卡片上方,更醒目。
|
||||
- [x] **LatentSync 超时修复**: httpx 超时从 1200s(20 分钟)改为 3600s(1 小时),修复 2 分钟以上视频口型推理超时回退问题。
|
||||
- [x] **字幕时间戳节奏映射**: `whisper_service.py` 从全程线性插值改为 Whisper 逐词节奏映射,修复长视频字幕漂移。
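
To make the "rhythm mapping" item concrete, here is one way such a mapping could work. It is a hedged sketch, not the code in `whisper_service.py`: the function name `rhythm_map`, the `(text, start, end)` word format, and the piecewise-linear scheme are all assumptions. The original script's characters are spread over the Whisper timeline so that pauses and fast passages in the audio are preserved, instead of spacing every character evenly from start to end.

```python
from bisect import bisect_right


def rhythm_map(original_text: str, words: list) -> list:
    """Assign (char, start, end) to each character of the original script.

    `words` is a list of (text, start, end) tuples from the recognizer.
    Timing follows the recognized words; the text comes from the script.
    """
    chars = [c for c in original_text if not c.isspace()]
    if not chars or not words:
        return []

    # Breakpoints: fraction of transcript characters consumed -> time.
    total = sum(max(len(text), 1) for text, _, _ in words)
    fracs, times = [0.0], [words[0][1]]
    seen = 0
    for text, _, end in words:
        seen += max(len(text), 1)
        fracs.append(seen / total)
        times.append(end)  # a pause before a word is absorbed into that word's segment

    def time_at(frac: float) -> float:
        i = min(bisect_right(fracs, frac) - 1, len(fracs) - 2)
        span = fracs[i + 1] - fracs[i]
        t = (frac - fracs[i]) / span if span else 0.0
        return times[i] + t * (times[i + 1] - times[i])

    n = len(chars)
    return [(c, time_at(k / n), time_at((k + 1) / n)) for k, c in enumerate(chars)]


words = [("你好", 0.0, 1.0), ("世界", 3.0, 4.0)]  # 2 s pause between the words
for item in rhythm_map("你好世界", words):
    print(item)
```

With plain linear interpolation the second character would end at 2.0 s; here it ends at 1.0 s, where the speaker actually finished the first word.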
|
||||
|
||||
### 阶段五:部署与文档
|
||||
- [x] 手动部署指南 (DEPLOY_MANUAL.md)
|
||||
- [x] 一键部署脚本 (deploy.sh)
|
||||
- [x] 环境配置模板 (.env.example)
|
||||
- [x] 项目文档 (README.md)
|
||||
- [x] 端口配置 (8006/3002)
|
||||
### Day 25: 文案提取修复 + 自定义提示词 + 片头副标题
|
||||
- [x] **抖音文案提取修复**: yt-dlp Fresh cookies 报错,重写 `_download_douyin_manual` 为移动端分享页 + 自动获取 ttwid 方案。
|
||||
- [x] **清理 DOUYIN_COOKIE**: 新方案不再需要手动维护 Cookie,从 `.env`/`config.py`/`service.py` 全面删除。
|
||||
- [x] **AI 智能改写自定义提示词**: 后端 `rewrite_script()` 支持 `custom_prompt` 参数;前端 checkbox 旁新增折叠式提示词编辑区,localStorage 持久化。
|
||||
- [x] **SSR 构建修复**: `useState` 初始化 `localStorage` 访问加 `typeof window` 守卫,修复 `npm run build` 报错。
|
||||
- [x] **片头副标题**: 新增 secondary_title(后端/Remotion/前端全链路),AI 同时生成,独立样式配置,20 字限制。
|
||||
- [x] **前端文案修正**: "AI 洗稿结果"→"AI 改写结果"。
|
||||
- [x] **yt-dlp 升级**: `2025.12.08` → `2026.2.21`。
|
||||
- [x] **参考音频中文文件名修复**: `sanitize_filename()` 将存储路径清洗为 ASCII 安全字符,纯中文名哈希兜底,原始名保留为展示名。
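
The filename fix in the last item can be illustrated as follows. The name `sanitize_filename` appears in the entry above, but this body is only a guess at the behavior it describes (ASCII-safe storage name, hash fallback for names with no ASCII characters); the allowed character set and the 12-character SHA-256 prefix are assumptions.

```python
import hashlib
import re
from pathlib import PurePosixPath


def sanitize_filename(original: str) -> str:
    """Return an ASCII-safe storage name; the original stays as the display name."""
    path = PurePosixPath(original)
    # Collapse every run of unsafe characters into a single underscore.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", path.stem).strip("._-")
    if not stem:
        # Nothing ASCII survived (e.g. a purely Chinese name): use a stable hash.
        stem = hashlib.sha256(path.stem.encode("utf-8")).hexdigest()[:12]
    suffix = re.sub(r"[^A-Za-z0-9.]", "", path.suffix).lower()
    return f"{stem}{suffix}"


print(sanitize_filename("my voice 01.WAV"))
print(sanitize_filename("参考音频.wav"))
```

Using a hash rather than a random ID keeps the mapping deterministic, so uploading the same name twice yields the same storage key.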
|
||||
|
||||
### 阶段六:MuseTalk 服务器部署 (Day 2-3)
|
||||
- [x] conda 环境配置 (musetalk)
|
||||
- [x] 模型权重下载 (~7GB)
|
||||
- [x] subprocess 调用方式实现
|
||||
- [x] 健康检查功能
|
||||
- [x] 实际推理调用验证 (Day 3 修复)
|
||||
### Day 24: 鉴权到期治理 + 多素材时间轴稳定性修复
|
||||
- [x] **会员到期请求时失效**: 登录与鉴权接口统一执行 `expires_at` 检查;到期后自动停用账号、清理 session,并返回“会员已到期,请续费”。
|
||||
- [x] **画面比例控制**: 时间轴新增 `9:16 / 16:9` 输出比例选择,前端持久化并透传后端,单素材/多素材统一按目标分辨率处理。
|
||||
- [x] **标题/字幕防溢出**: Remotion 与前端预览统一响应式缩放、自动换行、描边/字距/边距比例缩放,降低预览与成片差异。
|
||||
- [x] **标题显示模式**: 标题行新增“短暂显示/常驻显示”下拉;默认短暂显示(4 秒),用户选择持久化并透传至 Remotion 渲染链路。
|
||||
- [x] **MOV 方向归一化**: 新增旋转元数据解析与 orientation normalize,修复“编码横屏+旋转元数据”导致的竖屏判断偏差。
|
||||
- [x] **多素材拼接稳定性**: 片段 prepare 与 concat 统一 25fps/CFR,concat 增加 `+genpts`,缓解段切换处“画面冻结口型还动”。
|
||||
- [x] **时间轴语义对齐**: 打通 `source_end` 全链路;修复 `sourceStart>0 且 sourceEnd=0` 时长计算;生成时以时间轴可见段 assignments 为准,超出段不参与。
|
||||
- [x] **交互细节优化**: 页面刷新回顶部;素材/历史列表首轮自动滚动抑制,减少恢复状态时页面跳动。
|
||||
|
||||
### 阶段七:MuseTalk 完整修复 (Day 4)
|
||||
- [x] 权重检测路径修复 (软链接)
|
||||
- [x] 音视频长度不匹配修复 (audio_processor.py)
|
||||
- [x] 推理脚本错误日志增强 (inference.py)
|
||||
- [x] 视频合成 MP4 生成验证
|
||||
- [x] 端到端流程完整测试
|
||||
### Day 23: 配音前置重构 + 素材时间轴编排 + UI 体验优化 + 声音克隆增强
|
||||
|
||||
### 阶段八:前端功能增强 (Day 5)
|
||||
- [x] Web 视频上传功能
|
||||
- [x] 上传进度显示
|
||||
- [x] 自动刷新素材列表
|
||||
#### 第一阶段:配音前置
|
||||
- [x] **配音生成独立化**: 新增 `generated_audios` 后端模块(router/schemas/service),5 个 API 端点,复用现有 TTSService / voice_clone_service / task_store。
|
||||
- [x] **配音管理面板**: 前端新增 `useGeneratedAudios` hook + `GeneratedAudiosPanel` 组件,支持生成/试听/改名/删除/选中。
|
||||
- [x] **UI 面板重排序**: 文案 → 标题字幕 → 配音方式 → 配音列表 → 素材选择 → BGM → 生成视频。
|
||||
- [x] **素材区门控**: 未选中配音时素材区显示遮罩,选中后显示配音时长 + 素材均分信息。
|
||||
- [x] **视频生成对接**: workflow.py 新增预生成音频分支(`generated_audio_id`),跳过内联 TTS,向后兼容。
|
||||
- [x] **持久化**: selectedAudioId 加入 useHomePersistence,刷新页面恢复选中配音。
|
||||
|
||||
### 阶段九:唇形同步模型升级 (Day 6)
|
||||
- [x] MuseTalk → LatentSync 1.6 迁移
|
||||
- [x] 后端代码适配 (config.py, lipsync_service.py)
|
||||
- [x] Conda 环境配置 (latentsync)
|
||||
- [x] 模型权重部署指南
|
||||
- [x] 服务器端到端验证
|
||||
#### 第二阶段:素材时间轴编排
|
||||
- [x] **时间轴编辑器**: 新增 `TimelineEditor` 组件,wavesurfer.js 音频波形 + 色块可视化素材分配,拖拽分割线调整各段时长。
|
||||
- [x] **素材截取设置**: 新增 `ClipTrimmer` 模态框,HTML5 视频预览 + 双端滑块设置源视频截取起点/终点。
|
||||
- [x] **后端自定义分配**: 新增 `CustomAssignment` 模型,`prepare_segment` 支持 `source_start`,workflow 多素材/单素材流水线支持 `custom_assignments`。
|
||||
- [x] **循环截取修复**: `stream_loop + source_start` 改为两步处理(先裁剪再循环),确保从截取起点循环而非从视频 0s 开始。
|
||||
- [x] **MaterialSelector 精简**: 移除旧的时长信息栏和拖拽排序区(功能迁移到 TimelineEditor)。
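
The two-step "trim first, then loop" fix described in this list can be sketched as a pair of FFmpeg invocations. This is an illustration only: the helper name and the intermediate file names are hypothetical, and the project's real commands may add codec or frame-rate options. It relies on the standard FFmpeg options `-ss` (seek), `-stream_loop -1` (loop the input indefinitely), and `-t` (output duration).

```python
def build_trim_then_loop_commands(
    src: str,
    source_start: float,
    target_duration: float,
    trimmed: str = "trimmed.mp4",   # hypothetical intermediate file
    out: str = "segment.mp4",       # hypothetical output file
) -> list:
    """Build two ffmpeg argument lists: cut at source_start, then loop the cut clip."""
    # Step 1: drop everything before source_start.
    trim = ["ffmpeg", "-y", "-ss", f"{source_start:.3f}", "-i", src, trimmed]
    # Step 2: loop the trimmed clip until target_duration is filled.
    loop = [
        "ffmpeg", "-y", "-stream_loop", "-1", "-i", trimmed,
        "-t", f"{target_duration:.3f}", out,
    ]
    return [trim, loop]


commands = build_trim_then_loop_commands("clip.mp4", 2.5, 30.0)
for cmd in commands:
    print(" ".join(cmd))
```

Because the loop runs over the already trimmed file, every repetition restarts at the chosen start point rather than at 0 s of the source video. Each list can be passed to `subprocess.run(cmd, check=True)`.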
|
||||
|
||||
### 阶段十:性能优化 (Day 6)
|
||||
- [x] 视频预压缩优化 (高分辨率自动压缩到720p)
|
||||
- [x] 进度更新细化 (5% → 10% → 25% → ... → 100%)
|
||||
- [x] LipSync 服务单例缓存
|
||||
- [x] 健康检查缓存 (5分钟)
|
||||
- [x] 异步子进程修复 (subprocess.run → asyncio)
|
||||
- [x] 预加载模型服务 (常驻 Server + FastAPI)
|
||||
- [x] 批量队列处理 (GPU 并发控制)
|
||||
#### 第三阶段:UI 体验优化 + TTS 稳定性
|
||||
- [x] **TTS SoX PATH 修复**: `run_qwen_tts.sh` export conda env bin 到 PATH (Qwen3-TTS 已停用,已被 CosyVoice 3.0 替换)。
|
||||
- [x] **TTS 显存管理**: 每次生成后 `torch.cuda.empty_cache()`,asyncio.to_thread 避免阻塞事件循环 (CosyVoice 沿用相同机制)。
|
||||
- [x] **配音列表按钮统一**: Play/Edit/Delete 按钮右侧同组 hover 显示,与 RefAudioPanel 一致,移除文案摘要。
|
||||
- [x] **素材区解除配音门控**: 移除 MaterialSelector 的 selectedAudio 遮罩,素材随时可上传管理。
|
||||
- [x] **时间轴拖拽排序**: TimelineEditor 色块支持 HTML5 Drag & Drop 调换素材顺序。
|
||||
- [x] **截取设置 Range Slider**: ClipTrimmer 改为单轨道双手柄(紫色起点+粉色终点),替换两个独立滑块。
|
||||
- [x] **截取设置视频预览**: 视频区域可播放/暂停,从 sourceStart 到 sourceEnd 自动停止,拖拽手柄时实时 seek。
|
||||
|
||||
### 阶段十一:社交媒体发布完善 (Day 7)
|
||||
- [x] QR码自动登录 (Playwright headless)
|
||||
- [x] 多平台上传器架构 (B站/抖音/小红书)
|
||||
- [x] B站发布 (biliup官方库)
|
||||
- [x] 抖音/小红书发布 (Playwright)
|
||||
- [x] 定时发布功能
|
||||
- [x] 前端发布UI优化
|
||||
- [x] Cookie自动管理
|
||||
- [x] UI一致性修复 (导航栏对齐、滚动条隐藏)
|
||||
- [x] QR登录超时修复 (Stealth模式、多选择器fallback)
|
||||
- [x] 文档规则优化 (智能修改标准、工具使用规范)
|
||||
#### 第四阶段:历史文案 + Bug 修复
|
||||
- [x] **历史文案保存与加载**: 新增 `useSavedScripts` hook,手动保存/加载/删除历史文案,独立 localStorage 持久化。
|
||||
- [x] **时间轴拖拽修复**: `reorderSegments` 从属性交换改为数组移动(splice),修复拖拽后时长不跟随素材的 Bug。
|
||||
- [x] **按钮视觉统一**: 文案编辑区 4 个按钮统一为固定高度 `h-7`,移除多余 `<span>` 嵌套。
|
||||
- [x] **底部栏调整**: "保存文案"按钮移至底部右侧,移除预计时长显示。
|
||||
|
||||
### 阶段十二:用户体验优化 (Day 8)
|
||||
- [x] 文件名保留 (时间戳前缀 + 原始名称)
|
||||
- [x] 视频持久化 (从文件系统读取历史)
|
||||
- [x] 历史视频列表组件
|
||||
- [x] 素材/视频删除功能
|
||||
- [x] 登出功能 (Logout API + 前端按钮)
|
||||
- [x] 前端 SWR 轮询优化
|
||||
- [x] QR 登录状态检测修复
|
||||
#### 第五阶段:字幕语言不匹配 + 视频比例错位修复
|
||||
- [x] **字幕用原文替换 Whisper 转录**: `align()` 新增 `original_text` 参数,字幕文字永远用配音保存的原始文案。
|
||||
- [x] **Remotion 动态视频尺寸**: `calculateMetadata` 从 props 读取真实尺寸,修复标题/字幕比例错位。
|
||||
- [x] **英文空格丢失修复**: `split_word_to_chars` 遇到空格时 flush buffer + pending_space 标记。
|
||||
|
||||
### 阶段十三:发布模块优化 (Day 9)
|
||||
- [x] B站/抖音发布验证通过
|
||||
- [x] 资源清理保障 (try-finally)
|
||||
- [x] 超时保护 (消除无限循环)
|
||||
- [x] 小红书 headless 模式修复
|
||||
- [x] API 输入验证
|
||||
- [x] 完整类型提示
|
||||
- [x] 扫码登录等待界面 (加载动画)
|
||||
- [x] 抖音/B站登录策略优化 (Text优先)
|
||||
- [x] 发布成功审核提示
|
||||
#### 第六阶段:参考音频自动转写 + 语速控制
|
||||
- [x] **Whisper 自动转写 ref_text**: 上传参考音频时自动调用 Whisper 转写内容作为 ref_text,不再使用前端固定文字。
|
||||
- [x] **参考音频自动截取**: 超过 10 秒自动在静音点截取(ffmpeg silencedetect),末尾 0.1 秒淡出避免截断爆音。
|
||||
- [x] **重新识别功能**: 新增 `POST /ref-audios/{id}/retranscribe` 端点 + 前端 RotateCw 按钮,旧音频可重新转写并截取。
|
||||
- [x] **语速控制**: 全链路 speed 参数(前端选择器 → 持久化 → 后端 → CosyVoice `inference_zero_shot(speed=)`),5 档:较慢(0.8)/稍慢(0.9)/正常(1.0)/稍快(1.1)/较快(1.2)。
|
||||
- [x] **缺少参考音频门控**: 声音克隆模式下未选参考音频时,生成配音按钮禁用 + 黄色警告提示。
|
||||
- [x] **Whisper 语言自动检测**: `transcribe()` language 参数改为可选(默认 None = 自动检测),支持多语言参考音频。
|
||||
- [x] **前端清理**: 移除固定 ref_text 常量、朗读引导文字,简化为"上传任意语音样本,系统将自动识别内容并克隆声音"。
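
The automatic trimming of reference audio at a silence point could be decided along these lines. This is a sketch under assumptions: the silence start times would come from parsing `ffmpeg silencedetect` output (not shown), the 10 s limit is from the list above, and the 3 s lower bound, the "latest silence wins" rule, and the function name are illustrative choices.

```python
MAX_REF_SECONDS = 10.0  # limit quoted in the list above
MIN_REF_SECONDS = 3.0   # assumed lower bound so the sample is not cut too short


def pick_cut_point(duration: float, silence_starts: list) -> float:
    """Return the time (seconds) at which to cut the reference audio."""
    if duration <= MAX_REF_SECONDS:
        return duration  # short enough, keep the whole clip
    candidates = [t for t in silence_starts if MIN_REF_SECONDS <= t <= MAX_REF_SECONDS]
    # Prefer the latest silence inside the window; otherwise hard-cut at the limit.
    return max(candidates) if candidates else MAX_REF_SECONDS


print(pick_cut_point(8.0, [3.0]))
print(pick_cut_point(15.0, [2.1, 7.4, 11.0]))
print(pick_cut_point(15.0, [12.0]))
```

A hard cut at the limit is where the 0.1 s fade-out mentioned above matters most, since the cut may land mid-word.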
|
||||
|
||||
### 阶段十四:用户认证系统 (Day 9)
|
||||
- [x] Supabase 数据库表设计与部署
|
||||
- [x] JWT 认证 (HttpOnly Cookie)
|
||||
- [x] 用户注册/登录/登出 API
|
||||
- [x] 管理员权限控制 (is_active)
|
||||
- [x] 单设备登录限制 (Session Token)
|
||||
- [x] 防止 Supabase 暂停 (GitHub Actions/Crontab)
|
||||
- [x] 认证部署文档 (AUTH_DEPLOY.md)
|
||||
### Day 22: 多素材优化 + AI 翻译 + TTS 多语言
|
||||
- [x] **多素材 Bug 修复**: 6 个高优 Bug(边界溢出、单段 fallback、除零、duration 校验、Whisper 兜底、空列表检查)。
|
||||
- [x] **架构重构**: 多素材从"逐段 LatentSync"重构为"先拼接再推理",推理次数 N→1。
|
||||
- [x] **前端优化**: payload 安全、进度消息、上传自动选中、Material 接口统一、拖拽修复、素材上限 4 个。
|
||||
- [x] **AI 多语言翻译**: 新增 `/api/ai/translate` 接口,前端 9 种语言翻译 + 还原原文。
|
||||
- [x] **TTS 多语言**: EdgeTTS 10 语言声音列表、翻译自动切换声音、声音克隆 language 透传、textLang 持久化。
|
||||
|
||||
### 阶段十五:部署稳定性优化 (Day 9)
|
||||
- [x] 后端依赖修复 (bcrypt/email-validator)
|
||||
- [x] 前端生产环境构建修复 (npm run build)
|
||||
- [x] LatentSync 性能卡顿修复 (OMP_NUM_THREADS限制)
|
||||
- [x] 部署服务自愈 (PM2 配置优化)
|
||||
- [x] 部署手册全量更新 (DEPLOY_MANUAL.md)
|
||||
### Day 21: 缺陷修复 + 浮动预览 + 发布重构 + 架构优化 + 多素材生成
|
||||
- [x] **Remotion 崩溃容错**: 渲染进程 SIGABRT 退出时检查输出文件,避免误判失败导致标题/字幕丢失。
|
||||
- [x] **首页作品选择持久化**: 修复 `fetchGeneratedVideos` 无条件覆盖恢复值的问题,新增 `preferVideoId` 参数控制选中逻辑。
|
||||
- [x] **发布页作品选择持久化**: 根因为签名 URL 不稳定,全面改用 `video.id` 替代 `path` 进行选择/持久化/比较。
|
||||
- [x] **预取缓存补全**: 首页预取发布页数据时加入 `id` 字段,确保缓存数据可用于持久化匹配。
|
||||
- [x] **浮动样式预览窗口**: 标题字幕预览改为 `position: fixed` 浮动窗口,固定左上角,滚动时始终可见。
|
||||
- [x] **移动端适配**: ScriptEditor 按钮换行、预览默认比例改为 9:16 竖屏。
|
||||
- [x] **多平台发布重构**: 平台配置独立化(DOUYIN_*/WEIXIN_*)、用户隔离 Cookie 管理、抖音刷脸验证二维码、微信发布流程优化。
|
||||
- [x] **前端结构微调**: ScriptExtractionModal 迁移到 features/、contexts 迁移到 shared/contexts/、清理空目录。
|
||||
- [x] **后端模块分层**: materials/tools/ref_audios 三个模块补全 router+schemas+service 分层。
|
||||
- [x] **开发规范更新**: BACKEND_DEV.md 新增渐进原则、DOC_RULES.md 取消 TASK_COMPLETE.md 手动触发约束。
|
||||
- [x] **文档全面更新**: BACKEND_DEV/README、FRONTEND_DEV、DEPLOY_MANUAL、README.md 同步更新。
|
||||
- [x] **多素材视频生成(多机位效果)**: 支持多选素材 + 拖拽排序,按素材数量均分音频时长(对齐 Whisper 字边界)自动切换机位。逐段 LatentSync + FFmpeg 拼接。前端 @dnd-kit 拖拽排序 UI。
|
||||
- [x] **字幕开关移除**: 默认启用逐字高亮字幕,移除开关及相关死代码。
|
||||
- [x] **视频格式扩展**: 上传支持 mkv/webm/flv/wmv/m4v/ts/mts 等常见格式。
|
||||
- [x] **Watchdog 优化**: 健康检查阈值提高到 5 次,新增重启冷却期 120 秒,避免误重启。
|
||||
- [x] **多素材 Bug 修复**: 修复标点分句方案对无句末标点文案无效(改为均分方案)、音频时间偏移导致口型不对齐等缺陷。
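
The Watchdog tuning in this list (restart after 5 failed health checks, then a 120-second cooldown) can be expressed as a small state machine. The class below is a hypothetical sketch, not the project's Watchdog; the clock is injectable so the behavior can be checked without waiting.

```python
import time


class Watchdog:
    """Decide when a service should be restarted based on health-check results."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 120.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.last_restart = None

    def record(self, healthy: bool) -> bool:
        """Record one health check; return True if a restart should happen now."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.failure_threshold:
            return False
        now = self.clock()
        if self.last_restart is not None and now - self.last_restart < self.cooldown_s:
            return False  # still cooling down after the previous restart
        self.failures = 0
        self.last_restart = now
        return True


fake_now = [0.0]
watchdog = Watchdog(clock=lambda: fake_now[0])
first_round = [watchdog.record(False) for _ in range(5)]
print(first_round)
```

The cooldown prevents a slow-starting service from being restarted again before it has had time to load its models.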
|
||||
|
||||
### 阶段十六:HTTPS 部署与细节完善 (Day 10)
|
||||
- [x] 隧道访问修复 (StaticFiles 挂载 + Rewrite)
|
||||
- [x] 平台账号列表 500 错误修复 (paths.py)
|
||||
- [x] Nginx HTTPS 配置 (反向代理 + SSL)
|
||||
- [x] 浏览器标题修改 (ViGent)
|
||||
- [x] 代码自适应 HTTPS 验证
|
||||
- [x] **Supabase 自托管部署** (Docker, 3003/8008端口)
|
||||
- [x] **安全加固** (Basic Auth 保护后台)
|
||||
- [x] **端口冲突解决** (迁移 Analytics/Kong)
|
||||
### Day 20: 代码质量与安全优化
|
||||
- [x] **功能性修复**: LatentSync 回退逻辑、任务状态接口认证、User 类型统一。
|
||||
- [x] **性能优化**: N+1 查询修复、视频上传流式处理、httpx 异步替换、GLM 异步包装。
|
||||
- [x] **安全修复**: 硬编码 Cookie 配置化、日志敏感信息脱敏、ffprobe 安全调用、CORS 配置化。
|
||||
- [x] **配置优化**: 存储路径环境变量化、Remotion 预编译加速、LatentSync 绝对路径。
|
||||
- [x] **文档更新**: 更新 DOC_RULES.md 清单,补齐后端与部署文档;更新 SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md。
|
||||
- [x] **缺陷修复**: 修复 Remotion 路径解析、发布页持久化竞态、首页选中回归、素材闭包陷阱。
|
||||
|
||||
### 阶段十七:上传架构重构 (Day 11)
|
||||
- [x] **直传改造** (前端直接上传 Supabase,绕过后端代理)
|
||||
- [x] **后端适配** (Signed URL 签名生成)
|
||||
- [x] **RLS 策略部署** (SQL 脚本自动化权限配置)
|
||||
- [x] **超时问题根治** (彻底解决 Nginx/FRP 30s 限制)
|
||||
- [x] **前端依赖更新** (@supabase/supabase-js 集成)
|
||||
### Day 19: 自动发布稳定性与发布体验优化 🚀
|
||||
- [x] **抖音发布稳定性**: 上传入口、封面流程、发布重试、登录失效识别与网络失败快速返回全面增强。
|
||||
- [x] **视频号发布修复**: 标题+标签统一写入“视频描述”,`post_create` 成功信号快速判定,超时改为失败返回。
|
||||
- [x] **成功截图闭环**: 抖音/视频号发布成功截图接入前端,支持用户隔离存储与鉴权访问。
|
||||
- [x] **截图观感优化**: 成功截图延后 3 秒并改为视口截图,修复“截图内容仅占 1/3”问题。
|
||||
- [x] **调试能力开关化**: 新增视频号录屏配置,默认可按环境变量开关,失败排障更直观。
|
||||
- [x] **启动链路统一**: 合并为 `run_backend.sh`(xvfb + headful),统一端口 `8006`,减少多进程混淆。
|
||||
- [x] **发布页防误操作**: 发布中按钮提示“请勿刷新或关闭网页”,并启用刷新/关页二次确认拦截。
|
||||
- [ ] **后续优化**: 发布任务状态恢复机制(任务化 + 状态持久化 + 前端轮询恢复)。
|
||||
|
||||
### 阶段十八:用户隔离与存储优化 (Day 11)
|
||||
- [x] **用户数据隔离** (素材/视频/Cookie 按用户ID目录隔离)
|
||||
- [x] **Storage URL 修复** (SUPABASE_PUBLIC_URL 配置,修复 localhost 问题)
|
||||
- [x] **发布服务优化** (直接读取本地 Supabase Storage 文件,跳过 HTTP 下载)
|
||||
- [x] **Supabase Studio 配置** (公网访问配置)
|
||||
### Day 18: 后端模块化与规范完善
|
||||
- [x] **模块化迁移**: 路由透传 `modules/*`,业务逻辑集中到 service/workflow。
|
||||
- [x] **视频生成拆分**: 生成流程下沉 workflow,任务状态统一 TaskStore。
|
||||
- [x] **Redis 任务存储**: Redis 优先,不可用自动回退内存。
|
||||
- [x] **仓储层抽离**: Supabase 访问统一 `repositories/*`,deps/auth/admin 全面替换。
|
||||
- [x] **响应规范**: 统一 `success/message/data/code` + 全局异常处理。
|
||||
- [x] **素材重命名**: 新增重命名接口与 Storage `move_file`。
|
||||
- [x] **平台顺序调整**: 抖音/微信视频号/B站/小红书,移除快手。
|
||||
- [x] **后端开发规范**: 新增 `BACKEND_DEV.md`,README 同步模块化结构。
|
||||
- [x] **发布管理体验**: 首页预取路由 + 发布页骨架与缓存,进入更快。
|
||||
- [x] **素材加载优化**: 素材列表并发签名 URL,骨架数量动态。
|
||||
- [x] **预览加载优化**: `preload="metadata"` + hover 预取。
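
The "Redis first, fall back to memory" task store from this list might look roughly like the sketch below. All class names are hypothetical, and `UnavailableStore` merely simulates a Redis outage so the example runs with the standard library alone; switching permanently to the fallback after the first connection error is one possible policy, not necessarily the project's.

```python
class InMemoryTaskStore:
    """Process-local store used when Redis cannot be reached."""

    def __init__(self):
        self._tasks = {}

    def set(self, task_id: str, state: dict) -> None:
        self._tasks[task_id] = dict(state)

    def get(self, task_id: str):
        return self._tasks.get(task_id)


class UnavailableStore:
    """Simulates a Redis-backed store whose server is down."""

    def set(self, task_id: str, state: dict) -> None:
        raise ConnectionError("redis unavailable")

    def get(self, task_id: str):
        raise ConnectionError("redis unavailable")


class FallbackTaskStore:
    """Use the primary store until it fails, then stay on the fallback."""

    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        self.using_fallback = False

    def _call(self, method: str, *args):
        if not self.using_fallback:
            try:
                return getattr(self.primary, method)(*args)
            except ConnectionError:
                self.using_fallback = True  # primary is down: switch over
        return getattr(self.fallback, method)(*args)

    def set(self, task_id: str, state: dict) -> None:
        self._call("set", task_id, state)

    def get(self, task_id: str):
        return self._call("get", task_id)


store = FallbackTaskStore(UnavailableStore(), InMemoryTaskStore())
store.set("task-1", {"status": "pending", "progress": 0})
print(store.get("task-1"), store.using_fallback)
```

Note that an in-memory fallback loses task state on restart and is not shared between worker processes, which is why Redis is preferred when available.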
|
||||
|
||||
### 阶段十九:iOS 兼容与移动端 UI 优化 (Day 12)
|
||||
- [x] **Axios 全局拦截器** (401/403 自动跳转登录,防重复跳转)
|
||||
- [x] **iOS Safari 安全区域修复** (viewport-fit: cover, themeColor, 渐变背景统一)
|
||||
- [x] **移动端 Header 优化** (按钮紧凑布局,响应式间距)
|
||||
- [x] **发布页面 UI 重构** (立即发布/定时发布按钮分离,防误触设计)
|
||||
- [x] **Qwen3-TTS 0.6B 部署** (声音克隆模型,GPU0,3秒参考音频快速克隆)
|
||||
### Day 17: 前端重构与体验优化
|
||||
- [x] **UI 组件拆分**: 首页拆分为独立组件,降低 `page.tsx` 复杂度。
|
||||
- [x] **轻量 FSD 迁移**: `app` 页面轻量化,逻辑集中到 `features/*/model`,通用能力下沉 `shared/*`。
|
||||
- [x] **Controller Hooks**: Home/Publish 页面逻辑集中到 Controller Hook,Page 仅组合渲染。
|
||||
- [x] **通用工具抽取**: `media.ts` 统一 API Base / URL / 日期格式化。
|
||||
- [x] **交互优化**: 选择项持久化、列表内定位、刷新回顶部、最新作品优先预览。
|
||||
- [x] **发布页改造**: 作品列表卡片化 + 搜索 + 预览弹窗。
|
||||
- [x] **预览体验**: 预览弹窗统一头部样式与提示文案。
|
||||
- [x] **预览一致性**: 标题/字幕预览按素材分辨率缩放。
|
||||
- [x] **标题同步与限制**: 片头标题同步发布标题,输入法合成态兼容,限制 15 字。
|
||||
- [x] **样式默认与持久化**: 默认样式与字号调整,刷新保留用户选择。
|
||||
- [x] **性能微优化**: 列表渲染优化 + 并行请求 + localStorage 防抖。
|
||||
- [x] **资源能力**: 字体/BGM 资源库 + `/api/assets` 接入。
|
||||
- [x] **音频与字幕修复**: BGM 混音稳定性与字幕断句优化。
|
||||
- [x] **持久化修复**: 接入 `useHomePersistence`,恢复 `isRestored` 逻辑并通过构建。
|
||||
- [x] **预览与选择修复**: 发布预览兼容签名 URL,音频试听路径解析,素材/BGM 回退有效项。
|
||||
- [x] **体验细节优化**: 录音预览 URL 回收,预览弹窗滚动恢复,全局任务提示挂载。
|
||||
|
||||
### Day 16: 深度性能优化
|
||||
- [x] **Qwen-TTS 加速**: 集成 Flash Attention 2 (已停用,被 CosyVoice 3.0 替换)。
|
||||
- [x] **服务守护**: 开发 `Watchdog` 看门狗机制,自动监控并重启僵死服务。
|
||||
- [x] **LatentSync 性能确认**: 验证 DeepCache + 原生 Flash Attn 生效。
|
||||
- [x] **文档重构**: 全面更新 README、部署手册及后端文档。
|
||||
|
||||
### Day 15: 手机号认证迁移
|
||||
- [x] **认证系统升级**: 从邮箱迁移至 11 位手机号注册/登录。
|
||||
- [x] **账户管理**: 新增修改密码、有效期显示、安全退出功能。
|
||||
- [x] **AI 文案助手**: 升级 GLM-4.7-Flash,支持 B站/抖音链接提取与洗稿。
|
||||
|
||||
### Day 14: AI 增强与体验优化
|
||||
- [x] **AI titles/tags**: Integrated the GLM-4 API to auto-generate video metadata.
|
||||
- [x] **字幕升级**: Remotion 逐字高亮字幕 (卡拉OK效果) 及动画片头。
|
||||
- [x] **模型升级**: 声音克隆已迁移至 CosyVoice 3.0 (0.5B)。
|
||||
|
||||
### Day 13: 声音克隆集成
|
||||
- [x] **声音克隆微服务**: 封装 CosyVoice 3.0 为独立 API (8010端口,替换 Qwen3-TTS)。
|
||||
- [x] **参考音频管理**: Supabase 存储桶配置与管理接口。
|
||||
- [x] **多模态 TTS**: 前端支持 EdgeTTS / Clone Voice 切换。
|
||||
|
||||
### Day 12: 移动端适配
|
||||
- [x] **iOS 兼容**: 修复 Safari 安全区域、状态栏颜色、Cookie 拦截问题。
|
||||
- [x] **响应式 UI**: 移动端 Header 与发布页重构。
|
||||
|
||||
### Day 11: 上传架构重构
|
||||
- [x] **直传优化**: 前端直传 Supabase Storage,解决 Nginx 30s 超时问题。
|
||||
- [x] **数据隔离**: 用户素材/视频按 UserID 物理隔离。
|
||||
|
||||
### Day 10: HTTPS 与安全
|
||||
- [x] **HTTPS 部署**: 配置 SSL 证书与 Nginx 反向代理。
|
||||
- [x] **安全加固**: Supabase Studio 增加 Basic Auth 保护。
|
||||
|
||||
### Day 9: 认证系统与发布闭环
|
||||
- [x] **用户系统**: 基于 Supabase Auth 实现 JWT 认证。
|
||||
- [x] **发布闭环**: 验证 B站/抖音/小红书 自动发布流程。
|
||||
- [x] **服务自愈**: 配置 PM2 进程守护。
|
||||
|
||||
### Day 1-8: 核心功能构建
|
||||
- [x] **Day 8**: 历史记录持久化与文件管理。
|
||||
- [x] **Day 7**: 社交媒体自动登录与多平台发布。
|
||||
- [x] **Day 6**: **LatentSync 1.6** 升级与服务器部署。
|
||||
- [x] **Day 5**: 前端视频上传与进度反馈。
|
||||
- [x] **Day 4**: MuseTalk (旧版) 口型同步修复。
|
||||
- [x] **Day 3**: 服务器环境配置与模型权重下载。
|
||||
- [x] **Day 1-2**: 项目基础框架 (FastAPI + Next.js) 搭建。
|
||||
|
||||
---
|
||||
|
||||
## 🛤️ 后续规划
|
||||
## 🛤️ 后续规划 (Roadmap)
|
||||
|
||||
### 🔴 优先待办
|
||||
- [ ] **Qwen3-TTS 集成到 ViGent2** - 前端 UI + 后端服务集成
|
||||
- [ ] 批量视频生成架构设计
|
||||
|
||||
### 🟠 功能完善
|
||||
- [x] 定时发布功能 ✅ Day 7 完成
|
||||
- [ ] **后端定时发布** - 替代平台端定时,使用 APScheduler 实现任务调度
|
||||
- [ ] 批量视频生成
|
||||
- [ ] 字幕样式编辑器
|
||||
- [x] ~~**配音前置重构 — 第二阶段**: 素材片段截取 + 语音时间轴编排~~ ✅ Day 23 已完成
|
||||
- [ ] **批量生成架构**: 支持 Excel 导入,批量生产视频。
|
||||
- [ ] **定时任务后台化**: 迁移前端触发的定时发布到后端 APScheduler。
|
||||
- [ ] **发布任务恢复机制**: 发布任务化 + 状态持久化 + 前端断点恢复,解决刷新后状态丢失。
|
||||
|
||||
### 🔵 长期探索
|
||||
- [ ] Docker 容器化
|
||||
- [ ] Celery 分布式任务队列
|
||||
- [ ] **容器化交付**: 提供完整的 Docker Compose 一键部署包。
|
||||
- [ ] **分布式队列**: 引入 Celery + Redis 处理超高并发任务。
|
||||
|
||||
---
|
||||
|
||||
## 📊 进度统计
|
||||
|
||||
### 总体进度
|
||||
```
|
||||
████████████████████ 100%
|
||||
```
|
||||
|
||||
### 各模块进度
|
||||
## 📊 模块完成度
|
||||
|
||||
| 模块 | 进度 | 状态 |
|
||||
|------|------|------|
|
||||
| 后端 API | 100% | ✅ 完成 |
|
||||
| 前端 UI | 100% | ✅ 完成 |
|
||||
| TTS 配音 | 100% | ✅ 完成 |
|
||||
| 视频合成 | 100% | ✅ 完成 |
|
||||
| 唇形同步 | 100% | ✅ LatentSync 1.6 升级完成 |
|
||||
| 社交发布 | 100% | ✅ Day 9 验证通过 |
|
||||
| 用户认证 | 100% | ✅ Day 9 Supabase+JWT |
|
||||
| 服务器部署 | 100% | ✅ Day 9 稳定性优化完成 |
|
||||
| **核心 API** | 100% | ✅ 稳定 |
|
||||
| **Web UI** | 100% | ✅ 稳定 (移动端适配) |
|
||||
| **唇形同步** | 100% | ✅ LatentSync 1.6 |
|
||||
| **TTS 配音** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + 配音前置 + 时间轴编排 + 自动转写 + 语速控制 |
|
||||
| **自动发布** | 100% | ✅ 抖音/微信视频号/B站/小红书 |
|
||||
| **用户认证** | 100% | ✅ 手机号 + JWT |
|
||||
| **付费会员** | 100% | ✅ 支付宝电脑网站支付 + 自动激活 |
|
||||
| **部署运维** | 100% | ✅ PM2 + Watchdog |
|
||||
|
||||
---
|
||||
|
||||
## 🎯 里程碑
|
||||
|
||||
### Milestone 1: 项目框架搭建 ✅
|
||||
**完成时间**: Day 1
|
||||
**成果**:
|
||||
- FastAPI 后端 + Next.js 前端
|
||||
- EdgeTTS + FFmpeg 集成
|
||||
- 视频生成端到端验证
|
||||
|
||||
### Milestone 2: 服务器部署 ✅
|
||||
**完成时间**: Day 3
|
||||
**成果**:
|
||||
- PyTorch 2.0.1 + MMLab 环境修复
|
||||
- 模型目录重组与权重补全
|
||||
- MuseTalk 推理成功运行
|
||||
|
||||
### Milestone 3: 口型同步完整修复 ✅
|
||||
**完成时间**: Day 4
|
||||
**成果**:
|
||||
- 权重检测路径修复 (软链接)
|
||||
- 音视频长度不匹配修复
|
||||
- 视频合成 MP4 验证通过 (28MB → 3.8MB)
|
||||
|
||||
### Milestone 4: LatentSync 1.6 升级 ✅
|
||||
**完成时间**: Day 6
|
||||
**成果**:
|
||||
- MuseTalk → LatentSync 1.6 迁移
|
||||
- 512×512 高分辨率唇形同步
|
||||
- Latent Diffusion 架构升级
|
||||
- 性能优化 (视频预压缩、进度更新)
|
||||
|
||||
### Milestone 5: 用户认证系统 ✅
|
||||
**完成时间**: Day 9
|
||||
**成果**:
|
||||
- Supabase 云数据库集成
|
||||
- 安全的 JWT + HttpOnly Cookie 认证
|
||||
- 管理员后台与用户隔离
|
||||
- 完善的部署与保活方案
|
||||
|
||||
### Milestone 6: 生产环境部署稳定化 ✅
|
||||
**完成时间**: Day 9
|
||||
**成果**:
|
||||
- 修复了后端 (bcrypt) 和前端 (build) 的启动崩溃问题
|
||||
- 解决了 LatentSync 占用全量 CPU 导致服务器卡顿的严重问题
|
||||
- 完善了部署手册,记录了关键的 Troubleshooting 步骤
|
||||
- 实现了服务 Long-term 稳定运行 (Reset PM2 counter)
|
||||
|
||||
---
|
||||
|
||||
## 📅 时间线
|
||||
|
||||
Day 1: 项目初始化 + 核心功能 ✅ 完成
|
||||
- 后端 API 框架
|
||||
- 前端 UI
|
||||
- TTS + 视频合成
|
||||
- 社交发布框架
|
||||
- 部署文档
|
||||
|
||||
Day 2: 服务器部署 + MuseTalk ✅ 完成
|
||||
- 端口配置 (8006/3002)
|
||||
- MuseTalk conda 环境初始化
|
||||
- subprocess 调用实现
|
||||
- 健康检查验证
|
||||
|
||||
Day 3: 环境修复与验证 ✅ 完成
|
||||
- PyTorch 降级 (2.5 -> 2.0.1)
|
||||
- MMLab 依赖全量安装
|
||||
- 模型权重补全 (dwpose, syncnet)
|
||||
- 目录结构修复 (symlinks)
|
||||
- 推理脚本验证 (生成593帧)
|
||||
|
||||
Day 4: 口型同步完整修复 ✅ 完成
|
||||
- 权重检测路径修复 (软链接)
|
||||
- audio_processor.py 音视频长度修复
|
||||
- inference.py 错误日志增强
|
||||
- MP4 视频合成验证通过
|
||||
|
||||
Day 5: 前端功能增强 ✅ 完成
|
||||
- Web 视频上传功能
|
||||
- 上传进度显示
|
||||
- 自动刷新素材列表
|
||||
|
||||
Day 6: LatentSync 1.6 升级 ✅ 完成
|
||||
- MuseTalk → LatentSync 迁移
|
||||
- 后端代码适配
|
||||
- 模型部署指南
|
||||
- 服务器部署验证
|
||||
- 性能优化 (视频预压缩、进度更新)
|
||||
|
||||
Day 7: 社交媒体发布完善 ✅ 完成
|
||||
- QR码自动登录 (B站/抖音验证通过)
|
||||
- 智能定位策略 (CSS/Text并行)
|
||||
- 多平台发布 (B站/抖音/小红书)
|
||||
- UI 一致性优化
|
||||
- 文档规则体系优化
|
||||
|
||||
Day 8: 用户体验优化 ✅ 完成
|
||||
- 文件名保留 (时间戳前缀)
|
||||
- 视频持久化 (历史视频API)
|
||||
- 历史视频列表组件
|
||||
- 素材/视频删除功能
|
||||
|
||||
Day 9: 发布模块优化 ✅ 完成
|
||||
- B站/抖音登录+发布验证通过
|
||||
- 资源清理保障 (try-finally)
|
||||
- 超时保护 (消除无限循环)
|
||||
- 小红书 headless 模式修复
|
||||
- 扫码登录等待界面 (加载动画)
|
||||
- 抖音/B站登录策略优化 (Text优先)
|
||||
- 发布成功审核提示
|
||||
- 用户认证系统规划 (FastAPI+Supabase)
|
||||
- Supabase 表结构设计 (users/sessions)
|
||||
- 后端 JWT 认证实现 (auth.py/deps.py)
|
||||
- 数据库配置与 SQL 部署
|
||||
- 独立认证部署文档 (AUTH_DEPLOY.md)
|
||||
- 自动保活机制 (Crontab/Actions)
|
||||
- 部署稳定性优化 (Backend依赖修复)
|
||||
- 前端生产构建流程修复
|
||||
- LatentSync 严重卡顿修复 (线程数限制)
|
||||
- 部署手册全量更新
|
||||
|
||||
Day 10: HTTPS 部署与细节完善 ✅ 完成
|
||||
- 隧道访问视频修正 (挂载 uploads)
|
||||
- 账号列表 Bug 修复 (paths.py 白名单)
|
||||
- 阿里云 Nginx HTTPS 部署
|
||||
- UI 细节优化 (Title 更新)
|
||||
|
||||
Day 11: 上传架构重构 ✅ 完成
|
||||
- **核心修复**: Aliyun Nginx `client_max_body_size 0` 配置
|
||||
- 500 错误根治 (Direct Upload + Gateway Config)
|
||||
- Supabase RLS 权限策略部署
|
||||
- 前端集成 supabase-js
|
||||
- 彻底解决大文件上传超时 (30s 限制)
|
||||
- **用户数据隔离** (素材/视频/Cookie 按用户目录存储)
|
||||
- **Storage URL 修复** (SUPABASE_PUBLIC_URL 公网地址配置)
|
||||
- **发布服务优化** (本地文件直读,跳过 HTTP 下载)
|
||||
|
||||
Day 12: iOS 兼容与移动端优化 ✅ 完成
|
||||
- Axios 全局拦截器 (401/403 自动跳转登录)
|
||||
- iOS Safari 安全区域白边修复 (viewport-fit: cover)
|
||||
- themeColor 配置 (状态栏颜色适配)
|
||||
- 渐变背景统一 (body 全局渐变,消除分层)
|
||||
- 移动端 Header 响应式优化 (按钮紧凑布局)
|
||||
- 发布页面 UI 重构 (立即发布 3/4 + 定时 1/4)
|
||||
- **Qwen3-TTS 0.6B 部署** (声音克隆模型,GPU0)
|
||||
- **部署文档** (QWEN3_TTS_DEPLOY.md)
|
||||
## 📎 相关文档
|
||||
|
||||
- [详细开发日志 (DevLogs)](Docs/DevLogs/)
|
||||
- [部署手册 (DEPLOY_MANUAL)](Docs/DEPLOY_MANUAL.md)
|
||||
|
||||
232
README.md
@@ -1,34 +1,79 @@
|
||||
# ViGent2 - 数字人口播视频生成系统
|
||||
|
||||
基于 **LatentSync 1.6 + EdgeTTS** 的开源数字人口播视频生成系统。
|
||||
<div align="center">
|
||||
|
||||
> 📹 上传静态人物视频 → 🎙️ 输入口播文案 → 🎬 自动生成唇形同步视频
|
||||
> 📹 **上传人物** · 🎙️ **输入文案** · 🎬 **一键成片**
|
||||
|
||||
基于 **LatentSync 1.6 + MuseTalk 1.5 混合唇形同步** 的开源数字人口播视频生成系统。
|
||||
集成 **CosyVoice 3.0** 声音克隆与自动社交媒体发布功能。
|
||||
|
||||
[功能特性](#-功能特性) • [技术栈](#-技术栈) • [文档中心](#-文档中心) • [部署指南](Docs/DEPLOY_MANUAL.md)
|
||||
|
||||
</div>
|
||||
|
||||
---
|
||||
|
||||
## ✨ 功能特性
|
||||
|
||||
- 🎬 **唇形同步** - LatentSync 1.6 驱动,512×512 高分辨率 Diffusion 模型
|
||||
- 🎙️ **TTS 配音** - EdgeTTS 多音色支持(云溪、晓晓等)
|
||||
- 📱 **全自动发布** - 扫码登录 + Cookie持久化,支持多平台(B站/抖音/小红书)定时发布
|
||||
- 🖥️ **Web UI** - Next.js 现代化界面,iOS/Android 移动端适配
|
||||
- 🔐 **用户系统** - Supabase + JWT 认证,支持管理员后台、注册/登录
|
||||
- 👥 **多用户隔离** - 素材/视频/Cookie 按用户独立存储,数据完全隔离
|
||||
- 🚀 **性能优化** - 视频预压缩、常驻模型服务 (0s加载)、本地文件直读
|
||||
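### Core capabilities

- 🎬 **HD lip sync** - Hybrid approach: short videos (<120s) use LatentSync 1.6 (high-quality Latent Diffusion), long videos (>=120s) use MuseTalk 1.5 (real-time-class single-step inference), with automatic routing and fallback.
- 🎙️ **Multi-modal voiceover** - Supports **EdgeTTS** (Microsoft ultra-natural voices, 10 languages) and **CosyVoice 3.0** (fast voice cloning from 3 seconds of audio, 9 languages + 18 dialects, adjustable speed). Uploaded reference audio is transcribed automatically with Whisper and trimmed intelligently. Voice-first workflow: generate the voiceover, pick materials, then generate the video.
- 📝 **Smart subtitles** - Integrates faster-whisper + Remotion to generate word-by-word highlighted (karaoke-style) subtitles automatically.
- 🎨 **Style presets** - 12 title and 8 subtitle style presets, with preview, font-size adjustment, and a custom font library. Native CSS stroke rendering, crisp with no ghosting.
- 🏷️ **Title display mode** - The opening title supports `短暂显示` (brief) / `常驻显示` (persistent). The default is brief (4 seconds), and the user's preference is persisted automatically.
- 📌 **Opening secondary title** - Optional secondary title shown below the main title, with its own style configuration. AI can generate it together with the title. Limited to 20 characters.
- 🖼️ **Preview consistency** - Title/subtitle previews use the same responsive scaling and automatic line wrapping as the Remotion render, so they display reliably even on narrow canvases.
- 🎞️ **Multi-material, multi-camera** - Multi-select materials plus a timeline editor (wavesurfer.js waveform visualization): drag dividers to adjust durations, drag to reorder and switch camera angles, and trim clips by `source_start/source_end`.
- 📐 **Aspect ratio control** - One-click switch between `9:16 / 16:9` output on the timeline; the whole generation pipeline processes at the target ratio.
- 💾 **Persistent user preferences** - Home page state is restored and saved in one place, so a refresh continues from the last configuration. Script history can be saved and loaded manually.
- 🎵 **Background music** - Preview + volume control + mixing, keeping the voiceover volume stable.
- 🤖 **AI-assisted creation** - Built-in GLM-4.7-Flash: script extraction from Bilibili/Douyin links, AI rewriting (custom prompts supported), automatic title/tag generation, and translation into 9 languages.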
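
The hybrid lip-sync routing described in this list (LatentSync for short clips, MuseTalk for long ones, with fallback) can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the 120-second default mirrors the `LIPSYNC_DURATION_THRESHOLD` setting, while the function name `choose_lipsync_engines` and the fallback order are assumptions made for the example.

```python
from typing import List

# Assumed default, mirroring LIPSYNC_DURATION_THRESHOLD=120 in the example .env.
DEFAULT_THRESHOLD_SECONDS = 120.0


def choose_lipsync_engines(audio_seconds: float,
                           threshold: float = DEFAULT_THRESHOLD_SECONDS) -> List[str]:
    """Return the engines to try in order: preferred first, then the fallback.

    Hypothetical helper: audio shorter than the threshold prefers LatentSync,
    audio at or above the threshold prefers MuseTalk.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if audio_seconds >= threshold:
        return ["musetalk", "latentsync"]
    return ["latentsync", "musetalk"]


if __name__ == "__main__":
    for seconds in (30.0, 120.0, 300.0):
        print(seconds, choose_lipsync_engines(seconds))
```

Returning an ordered list rather than a single name keeps the fallback decision in one place: a caller can try each engine in turn and stop at the first one that succeeds.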
|
||||
|
||||
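### Platform features

- 📱 **Fully automated publishing** - Immediate publishing to Douyin / WeChat Channels / Bilibili / Xiaohongshu; QR-code login + cookie persistence.
- 🖥️ **Publish management preview** - Previews works via signed URLs or relative paths, so they can be played directly.
- 📸 **Visible publish results** - After a successful publish to Douyin / WeChat Channels, a screenshot is returned and can be viewed in the result card on the publish page.
- 🛡️ **Publish safeguards** - While publishing is in progress, a "do not refresh or close the page" notice is shown, and refresh/close attempts require a second confirmation.
- 💳 **Paid membership** - Alipay desktop website payment activates membership automatically; accounts are deactivated on expiry with a prompt to renew; manual activation by administrators remains available.
- 🔐 **Authentication and isolation** - Supabase-based user isolation, with phone-number registration/login and password management.
- 🛡️ **Service guard** - Built-in Watchdog that monitors and restarts hung services to keep them running 24/7.
- 🚀 **Performance optimizations** - Video pre-compression, resident model services (near-real-time loading), dual-GPU pipelined concurrency, reduced-frequency face detection + BiSeNet caching in MuseTalk, and 16-way concurrent Remotion rendering.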
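
The "deactivated on expiry" behaviour of paid membership comes down to comparing a stored expiry timestamp with the current time. The sketch below follows the same approach as the login handler's expiry check in this changeset (normalising a trailing `Z` before `datetime.fromisoformat`). The function name `is_membership_expired` and the decision to treat naive timestamps as UTC are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def is_membership_expired(expires_at: Optional[str],
                          now: Optional[datetime] = None) -> bool:
    """Return True when an ISO-8601 expiry timestamp lies in the past.

    A missing expiry (None or empty string) is treated as "never expires".
    """
    if not expires_at:
        return False
    # Older Python versions reject a trailing "Z", so normalise it to an offset.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Assumption: naive timestamps are stored in UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return current > expiry


if __name__ == "__main__":
    print(is_membership_expired("2020-01-01T00:00:00Z"))
```

Passing `now` explicitly makes the check easy to test without depending on the wall clock.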
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ 技术栈
|
||||
|
||||
| 模块 | 技术 |
|
||||
|------|------|
|
||||
| 前端 | Next.js 14 + TypeScript + TailwindCSS |
|
||||
| 后端 | FastAPI + Python 3.10 |
|
||||
| 数据库 | **Supabase** (PostgreSQL) 自托管 Docker |
|
||||
| 存储 | **Supabase Storage** (本地文件系统) |
|
||||
| 认证 | **JWT** + HttpOnly Cookie |
|
||||
| 唇形同步 | **LatentSync 1.6** (Latent Diffusion, 512×512) |
|
||||
| TTS | EdgeTTS |
|
||||
| 视频处理 | FFmpeg |
|
||||
| 自动发布 | Playwright |
|
||||
| Area | Core technology | Notes |
|------|----------|------|
| **Frontend** | Next.js 16 | TypeScript, TailwindCSS, SWR, wavesurfer.js |
| **Backend** | FastAPI | Python 3.12, AsyncIO, PM2 |
| **Database** | Supabase | PostgreSQL, Storage (local/S3), Auth |
| **Lip sync** | LatentSync 1.6 + MuseTalk 1.5 | Hybrid routing: high-quality Diffusion for short videos, single-step real-time inference for long videos |
| **Voice cloning** | CosyVoice 3.0 | 0.5B parameters, 9 languages + 18 dialects |
| **Automation** | Playwright | Headless browser automation for social media |
| **Deployment** | Docker & PM2 | Hybrid deployment architecture |
|
||||
|
||||
---
|
||||
|
||||
## 📖 文档中心
|
||||
|
||||
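We provide detailed development and deployment documentation:

### Deployment and operations
- **[Deployment manual (DEPLOY_MANUAL.md)](Docs/DEPLOY_MANUAL.md)** - 👈 **Start here for deployment!** Contains the complete environment setup steps.
- [Reference audio service deployment (COSYVOICE3_DEPLOY.md)](Docs/COSYVOICE3_DEPLOY.md) - Deployment guide for the voice-cloning model.
- [LatentSync deployment guide (LATENTSYNC_DEPLOY.md)](Docs/LATENTSYNC_DEPLOY.md) - Standalone deployment of the lip-sync model.
- [MuseTalk deployment guide (MUSETALK_DEPLOY.md)](Docs/MUSETALK_DEPLOY.md) - Deployment of the long-video lip-sync model.
- [Supabase deployment guide (SUPABASE_DEPLOY.md)](Docs/SUPABASE_DEPLOY.md) - Supabase and authentication system configuration.
- [Alipay deployment guide (ALIPAY_DEPLOY.md)](Docs/ALIPAY_DEPLOY.md) - Configuration for paid membership activation via Alipay.

### Development docs
- [Backend development guide (BACKEND_README.md)](Docs/BACKEND_README.md) - API conventions and development workflow.
- [Backend development standards (BACKEND_DEV.md)](Docs/BACKEND_DEV.md) - Layering conventions and development practices.
- [Frontend development guide (FRONTEND_DEV.md)](Docs/FRONTEND_DEV.md) - UI component and page conventions.
- [Frontend component docs (FRONTEND_README.md)](Docs/FRONTEND_README.md) - Component structure and section overview.
- [Remotion subtitle deployment (SUBTITLE_DEPLOY.md)](Docs/SUBTITLE_DEPLOY.md) - Deployment of the subtitle rendering service.
- [Development logs (DevLogs)](Docs/DevLogs/) - Daily development progress and technical decisions.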
|
||||
|
||||
---
|
||||
|
||||
@@ -36,136 +81,37 @@
|
||||
|
||||
```
|
||||
ViGent2/
|
||||
├── backend/ # FastAPI 后端
|
||||
│ ├── app/
|
||||
│ │ ├── api/ # API 路由
|
||||
│ │ ├── services/ # 核心服务 (TTS, LipSync, Video)
|
||||
│ │ └── core/ # 配置
|
||||
│ ├── requirements.txt
|
||||
│ └── .env.example
|
||||
├── frontend/ # Next.js 前端
|
||||
│ └── src/app/
|
||||
├── models/ # AI 模型
|
||||
│ └── LatentSync/ # 唇形同步模型
|
||||
│ └── DEPLOY.md # LatentSync 部署指南
|
||||
└── Docs/ # 文档
|
||||
├── DEPLOY_MANUAL.md # 部署手册
|
||||
├── AUTH_DEPLOY.md # 认证部署指南
|
||||
├── task_complete.md
|
||||
└── DevLogs/
|
||||
├── backend/ # FastAPI 后端服务
|
||||
│ ├── app/ # 核心业务逻辑
|
||||
│ ├── assets/ # 字体 / 样式 / BGM
|
||||
│ ├── user_data/ # 用户隔离数据 (Cookie 等)
|
||||
│ └── scripts/ # 运维脚本 (Watchdog 等)
|
||||
├── frontend/ # Next.js 前端应用
|
||||
├── remotion/ # Remotion 视频渲染 (标题/字幕合成)
|
||||
├── models/ # AI 模型仓库
|
||||
│ ├── LatentSync/ # 唇形同步服务 (GPU1, 短视频)
|
||||
│ ├── MuseTalk/ # 唇形同步服务 (GPU0, 长视频)
|
||||
│ └── CosyVoice/ # 声音克隆服务
|
||||
└── Docs/ # 项目文档
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚀 快速开始
|
||||
## 🌐 服务架构
|
||||
|
||||
### 1. 克隆项目
|
||||
系统采用微服务架构设计,各组件独立运行:
|
||||
|
||||
```bash
|
||||
git clone <仓库地址> /home/rongye/ProgramFiles/ViGent2
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
```
|
||||
|
||||
### 2. 安装后端
|
||||
|
||||
```bash
|
||||
cd backend
|
||||
python -m venv venv
|
||||
source venv/bin/activate # Windows: venv\Scripts\activate
|
||||
pip install -r requirements.txt
|
||||
cp .env.example .env
|
||||
```
|
||||
|
||||
### 3. 安装前端
|
||||
|
||||
```bash
|
||||
cd frontend
|
||||
npm install
|
||||
```
|
||||
|
||||
### 4. 安装 LatentSync (服务器)
|
||||
|
||||
详见 [models/LatentSync/DEPLOY.md](models/LatentSync/DEPLOY.md)
|
||||
|
||||
```bash
|
||||
# 创建独立 Conda 环境
|
||||
conda create -n latentsync python=3.10.13
|
||||
conda activate latentsync
|
||||
|
||||
# 安装依赖并下载权重
|
||||
cd models/LatentSync
|
||||
pip install -r requirements.txt
|
||||
huggingface-cli download ByteDance/LatentSync-1.6 --local-dir checkpoints
|
||||
```
|
||||
|
||||
### 5. 启动服务
|
||||
|
||||
```bash
|
||||
# 终端 1: 后端 (端口 8006)
|
||||
cd backend && source venv/bin/activate
|
||||
uvicorn app.main:app --host 0.0.0.0 --port 8006
|
||||
|
||||
# 终端 2: 前端 (端口 3002)
|
||||
cd frontend
|
||||
npm run dev -- -p 3002
|
||||
|
||||
# 终端 3: LatentSync 服务 (端口 8007, 推荐启动)
|
||||
cd models/LatentSync
|
||||
nohup python -m scripts.server > server.log 2>&1 &
|
||||
```
|
||||
| Service | Port | Purpose |
|----------|------|------|
| **Web UI** | 3002 | User entry point (Next.js) |
| **Backend API** | 8006 | Core business API (FastAPI) |
| **LatentSync** | 8007 | Lip-sync inference service (GPU1, short videos) |
| **MuseTalk** | 8011 | Lip-sync inference service (GPU0, long videos) |
| **CosyVoice 3.0** | 8010 | Voice-cloning inference service |
| **Supabase** | 8008 | Database and authentication gateway |
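
Because each service listens on its own port, a quick way to see which ones are up is to attempt a TCP connection to each port. The following is a minimal sketch using only the Python standard library. The port numbers come from the table above; the host `127.0.0.1` and the helper names `is_port_open` and `check_services` are assumptions for illustration. It only checks that something accepts connections and does not call any service-specific health endpoint.

```python
import socket
from typing import Dict

# Ports as listed in the service table; the host used below is an assumption.
SERVICE_PORTS: Dict[str, int] = {
    "Web UI": 3002,
    "Backend API": 8006,
    "LatentSync": 8007,
    "Supabase": 8008,
    "CosyVoice 3.0": 8010,
    "MuseTalk": 8011,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "127.0.0.1") -> Dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:<14} {'up' if up else 'down'}")
```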
|
||||
|
||||
---
|
||||
|
||||
## 🖥️ 服务器配置
|
||||
## ⚖️ License
|
||||
|
||||
**目标服务器**: Dell PowerEdge R730
|
||||
|
||||
| 配置 | 规格 |
|
||||
|------|------|
|
||||
| CPU | 2× Intel Xeon E5-2680 v4 (56 线程) |
|
||||
| 内存 | 192GB DDR4 |
|
||||
| GPU | 2× NVIDIA RTX 3090 24GB |
|
||||
| 存储 | 4.47TB |
|
||||
|
||||
**GPU 分配**:
|
||||
- GPU 0: 其他服务
|
||||
- GPU 1: **LatentSync** 唇形同步 (~18GB VRAM)
|
||||
|
||||
---
|
||||
|
||||
## 🌐 访问地址
|
||||
|
||||
| 服务 | 地址 | 说明 |
|
||||
|------|------|------|
|
||||
| **视频生成 (UI)** | `https://vigent.hbyrkj.top` | 用户访问入口 |
|
||||
| **API 服务** | `http://<服务器IP>:8006` | 后端 Swagger |
|
||||
| **认证管理 (Studio)** | `https://supabase.hbyrkj.top` | 需要 Basic Auth |
|
||||
| **认证 API (Kong)** | `https://api.hbyrkj.top` | Supabase 接口 |
|
||||
| **模型服务** | `http://<服务器IP>:8007` | LatentSync |
|
||||
|
||||
---
|
||||
|
||||
## 📖 文档
|
||||
|
||||
- [手动部署指南](Docs/DEPLOY_MANUAL.md)
|
||||
- [Supabase 部署指南](Docs/SUPABASE_DEPLOY.md)
|
||||
- [LatentSync 部署指南](models/LatentSync/DEPLOY.md)
|
||||
- [开发日志](Docs/DevLogs/)
|
||||
- [任务进度](Docs/task_complete.md)
|
||||
|
||||
---
|
||||
|
||||
## 🆚 与 ViGent 的区别
|
||||
|
||||
| 特性 | ViGent (v1) | ViGent2 |
|
||||
|------|-------------|---------|
|
||||
| 唇形同步模型 | MuseTalk v1.5 | **LatentSync 1.6** |
|
||||
| 分辨率 | 256×256 | **512×512** |
|
||||
| 架构 | GAN | **Latent Diffusion** |
|
||||
| 视频预处理 | 无 | **自动压缩优化** |
|
||||
|
||||
---
|
||||
|
||||
## 📄 License
|
||||
|
||||
MIT
|
||||
[MIT License](LICENSE) © 2026 ViGent Team
|
||||
|
||||
@@ -15,18 +15,17 @@ DEFAULT_TTS_VOICE=zh-CN-YunxiNeural
|
||||
# GPU 选择 (0=第一块GPU, 1=第二块GPU)
|
||||
LATENTSYNC_GPU_ID=1
|
||||
|
||||
# 使用本地模式 (true) 或远程 API (false)
|
||||
# 使用本地模式 (true) 或远程 API (false)
|
||||
LATENTSYNC_LOCAL=true
|
||||
|
||||
# 使用常驻服务 (Persistent Server) 加速
|
||||
LATENTSYNC_USE_SERVER=false
|
||||
LATENTSYNC_USE_SERVER=true
|
||||
|
||||
# 远程 API 地址 (常驻服务默认端口 8007)
|
||||
# LATENTSYNC_API_URL=http://localhost:8007
|
||||
|
||||
# 推理步数 (20-50, 越高质量越好,速度越慢)
|
||||
LATENTSYNC_INFERENCE_STEPS=20
|
||||
LATENTSYNC_INFERENCE_STEPS=16
|
||||
|
||||
# 引导系数 (1.0-3.0, 越高唇同步越准,但可能抖动)
|
||||
LATENTSYNC_GUIDANCE_SCALE=1.5
|
||||
@@ -37,6 +36,26 @@ LATENTSYNC_ENABLE_DEEPCACHE=true
|
||||
# 随机种子 (设为 -1 则随机)
|
||||
LATENTSYNC_SEED=1247
|
||||
|
||||
# =============== MuseTalk configuration ===============
# GPU selection (defaults to GPU0, shared with CosyVoice)
MUSETALK_GPU_ID=0

# Resident service URL (port 8011)
MUSETALK_API_URL=http://localhost:8011

# Inference batch size
MUSETALK_BATCH_SIZE=32

# Model version
MUSETALK_VERSION=v15

# Half-precision acceleration
MUSETALK_USE_FLOAT16=true
|
||||
|
||||
# =============== Hybrid lip-sync routing ===============
# Audio duration >= this threshold (seconds) uses MuseTalk; < this threshold uses LatentSync
LIPSYNC_DURATION_THRESHOLD=120
|
||||
|
||||
# =============== 上传配置 ===============
|
||||
# 最大上传文件大小 (MB)
|
||||
MAX_UPLOAD_SIZE_MB=500
|
||||
@@ -59,5 +78,21 @@ JWT_EXPIRE_HOURS=168
|
||||
|
||||
# =============== 管理员配置 ===============
|
||||
# 服务启动时自动创建的管理员账号
|
||||
ADMIN_EMAIL=lamnickdavid@gmail.com
|
||||
ADMIN_PHONE=15549380526
|
||||
ADMIN_PASSWORD=lam1988324
|
||||
|
||||
# =============== GLM AI 配置 ===============
|
||||
# 智谱 GLM API 配置 (用于生成标题和标签)
|
||||
GLM_API_KEY=32440cd3f3444d1f8fe721304acea8bd.YXNLrk7eIJMKcg4t
|
||||
GLM_MODEL=glm-4.7-flash
|
||||
|
||||
# =============== Supabase Storage 本地路径 ===============
|
||||
# 确保存储卷映射正确,避免硬编码路径
|
||||
SUPABASE_STORAGE_LOCAL_PATH=/home/rongye/ProgramFiles/Supabase/volumes/storage/stub/stub
|
||||
|
||||
# =============== 支付宝配置 ===============
|
||||
ALIPAY_APP_ID=2021006132600283
|
||||
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
|
||||
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
|
||||
ALIPAY_NOTIFY_URL=https://vigent.hbyrkj.top/api/payment/notify
|
||||
ALIPAY_RETURN_URL=https://vigent.hbyrkj.top/pay
|
||||
|
||||
@@ -1,223 +0,0 @@
|
||||
"""
|
||||
认证 API:注册、登录、登出
|
||||
"""
|
||||
from fastapi import APIRouter, HTTPException, Response, status, Request
|
||||
from pydantic import BaseModel, EmailStr
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.security import (
|
||||
get_password_hash,
|
||||
verify_password,
|
||||
create_access_token,
|
||||
generate_session_token,
|
||||
set_auth_cookie,
|
||||
clear_auth_cookie,
|
||||
decode_access_token
|
||||
)
|
||||
from loguru import logger
|
||||
from typing import Optional
|
||||
|
||||
router = APIRouter(prefix="/api/auth", tags=["认证"])
|
||||
|
||||
|
||||
class RegisterRequest(BaseModel):
|
||||
email: EmailStr
|
||||
password: str
|
||||
username: Optional[str] = None
|
||||
|
||||
|
||||
class LoginRequest(BaseModel):
|
||||
email: EmailStr
|
||||
password: str
|
||||
|
||||
|
||||
class UserResponse(BaseModel):
|
||||
id: str
|
||||
email: str
|
||||
username: Optional[str]
|
||||
role: str
|
||||
is_active: bool
|
||||
|
||||
|
||||
@router.post("/register")
|
||||
async def register(request: RegisterRequest):
|
||||
"""
|
||||
用户注册
|
||||
|
||||
注册后状态为 pending,需要管理员激活
|
||||
"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# 检查邮箱是否已存在
|
||||
existing = supabase.table("users").select("id").eq(
|
||||
"email", request.email
|
||||
).execute()
|
||||
|
||||
if existing.data:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="该邮箱已注册"
|
||||
)
|
||||
|
||||
# 创建用户
|
||||
password_hash = get_password_hash(request.password)
|
||||
|
||||
result = supabase.table("users").insert({
|
||||
"email": request.email,
|
||||
"password_hash": password_hash,
|
||||
"username": request.username or request.email.split("@")[0],
|
||||
"role": "pending",
|
||||
"is_active": False
|
||||
}).execute()
|
||||
|
||||
logger.info(f"新用户注册: {request.email}")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": "注册成功,请等待管理员审核激活"
|
||||
}
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"注册失败: {e}")
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||
detail="注册失败,请稍后重试"
|
||||
)
|
||||
|
||||
|
||||
@router.post("/login")
|
||||
async def login(request: LoginRequest, response: Response):
|
||||
"""
|
||||
用户登录
|
||||
|
||||
- 验证密码
|
||||
- 检查是否激活
|
||||
- 实现"后踢前"单设备登录
|
||||
"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# 查找用户
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"email", request.email
|
||||
).single().execute()
|
||||
|
||||
user = user_result.data
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="邮箱或密码错误"
|
||||
)
|
||||
|
||||
# 验证密码
|
||||
if not verify_password(request.password, user["password_hash"]):
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="邮箱或密码错误"
|
||||
)
|
||||
|
||||
# 检查是否激活
|
||||
if not user["is_active"]:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="账号未激活,请等待管理员审核"
|
||||
)
|
||||
|
||||
# 检查授权是否过期
|
||||
if user.get("expires_at"):
|
||||
from datetime import datetime, timezone
|
||||
expires_at = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
|
||||
if datetime.now(timezone.utc) > expires_at:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="授权已过期,请联系管理员续期"
|
||||
)
|
||||
|
||||
# 生成新的 session_token (后踢前)
|
||||
session_token = generate_session_token()
|
||||
|
||||
# 删除旧 session,插入新 session
|
||||
supabase.table("user_sessions").delete().eq(
|
||||
"user_id", user["id"]
|
||||
).execute()
|
||||
|
||||
supabase.table("user_sessions").insert({
|
||||
"user_id": user["id"],
|
||||
"session_token": session_token,
|
||||
"device_info": None # 可以从 request headers 获取
|
||||
}).execute()
|
||||
|
||||
# 生成 JWT Token
|
||||
token = create_access_token(user["id"], session_token)
|
||||
|
||||
# 设置 HttpOnly Cookie
|
||||
set_auth_cookie(response, token)
|
||||
|
||||
logger.info(f"用户登录: {request.email}")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": "登录成功",
|
||||
"user": UserResponse(
|
||||
id=user["id"],
|
||||
email=user["email"],
|
||||
username=user.get("username"),
|
||||
role=user["role"],
|
||||
is_active=user["is_active"]
|
||||
)
|
||||
}
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"登录失败: {e}")
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||
detail="登录失败,请稍后重试"
|
||||
)
|
||||
|
||||
|
||||
@router.post("/logout")
|
||||
async def logout(response: Response):
|
||||
"""用户登出"""
|
||||
clear_auth_cookie(response)
|
||||
return {"success": True, "message": "已登出"}
|
||||
|
||||
|
||||
@router.get("/me")
|
||||
async def get_me(request: Request):
|
||||
"""获取当前用户信息"""
|
||||
# 从 Cookie 获取用户
|
||||
token = request.cookies.get("access_token")
|
||||
if not token:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="未登录"
|
||||
)
|
||||
|
||||
token_data = decode_access_token(token)
|
||||
if not token_data:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="Token 无效"
|
||||
)
|
||||
|
||||
supabase = get_supabase()
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"id", token_data.user_id
|
||||
).single().execute()
|
||||
|
||||
user = user_result.data
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="用户不存在"
|
||||
)
|
||||
|
||||
return UserResponse(
|
||||
id=user["id"],
|
||||
email=user["email"],
|
||||
username=user.get("username"),
|
||||
role=user["role"],
|
||||
is_active=user["is_active"]
|
||||
)
|
||||
@@ -1,331 +0,0 @@
|
||||
from fastapi import APIRouter, UploadFile, File, HTTPException, Request, BackgroundTasks, Depends
|
||||
from app.core.config import settings
|
||||
from app.core.deps import get_current_user
|
||||
from app.services.storage import storage_service
|
||||
import re
|
||||
import time
|
||||
import traceback
|
||||
import os
|
||||
import aiofiles
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
def sanitize_filename(filename: str) -> str:
    # Replace characters that are not allowed in file names on common platforms.
    safe_name = re.sub(r'[<>:"/\\|?*]', '_', filename)
    # Cap the length at 100 characters while keeping the extension.
    if len(safe_name) > 100:
        ext = Path(safe_name).suffix
        safe_name = safe_name[:100 - len(ext)] + ext
    return safe_name
|
||||
|
||||
async def process_and_upload(temp_file_path: str, original_filename: str, content_type: str, user_id: str):
|
||||
"""Background task to strip multipart headers and upload to Supabase"""
|
||||
try:
|
||||
logger.info(f"Processing raw upload: {temp_file_path} for user {user_id}")
|
||||
|
||||
# 1. Analyze file to find actual video content (strip multipart boundaries)
|
||||
# This is a simplified manual parser for a SINGLE file upload.
|
||||
# Structure:
|
||||
# --boundary
|
||||
# Content-Disposition: form-data; name="file"; filename="..."
|
||||
# Content-Type: video/mp4
|
||||
# \r\n\r\n
|
||||
# [DATA]
|
||||
# \r\n--boundary--
|
||||
|
||||
# We need to read the first few KB to find the header end
|
||||
start_offset = 0
|
||||
end_offset = 0
|
||||
boundary = b""
|
||||
|
||||
file_size = os.path.getsize(temp_file_path)
|
||||
|
||||
with open(temp_file_path, 'rb') as f:
|
||||
# Read first 4KB to find header
|
||||
head = f.read(4096)
|
||||
|
||||
# Find boundary
|
||||
first_line_end = head.find(b'\r\n')
|
||||
if first_line_end == -1:
|
||||
raise Exception("Could not find boundary in multipart body")
|
||||
|
||||
boundary = head[:first_line_end] # e.g. --boundary123
|
||||
logger.info(f"Detected boundary: {boundary}")
|
||||
|
||||
# Find end of headers (\r\n\r\n)
|
||||
header_end = head.find(b'\r\n\r\n')
|
||||
if header_end == -1:
|
||||
raise Exception("Could not find end of multipart headers")
|
||||
|
||||
start_offset = header_end + 4
|
||||
logger.info(f"Video data starts at offset: {start_offset}")
|
||||
|
||||
# Find end boundary (read from end of file)
|
||||
# It should be \r\n + boundary + -- + \r\n
|
||||
# We seek to end-200 bytes
|
||||
            tail_start = max(0, file_size - 200)
            f.seek(tail_start)
            tail = f.read()

            # The closing boundary is usually --boundary--, so look for the
            # last occurrence of the boundary inside the tail buffer.
            last_boundary_pos = tail.rfind(boundary)
            if last_boundary_pos != -1:
                end_pos_in_tail = last_boundary_pos
                # The payload ends before the CRLF that precedes the boundary.
                if end_pos_in_tail >= 2 and tail[end_pos_in_tail - 2:end_pos_in_tail] == b'\r\n':
                    end_pos_in_tail -= 2
                # Convert the position inside the tail to an absolute file offset.
                end_offset = tail_start + end_pos_in_tail
            else:
                logger.warning("Could not find closing boundary, assuming EOF")
                end_offset = file_size
|
||||
|
||||
logger.info(f"Video data ends at offset: {end_offset}. Total video size: {end_offset - start_offset}")
|
||||
|
||||
# 2. Extract and Upload to Supabase
|
||||
# Since we have the file on disk, we can just pass the file object (seeked) to upload_file?
|
||||
# Or if upload_file expects bytes/path, checking storage.py...
|
||||
# It takes `file_data` (bytes) or file-like?
|
||||
# supabase-py's `upload` method handles parsing if we pass a file object.
|
||||
# But we need to pass ONLY the video slice.
|
||||
# So we create a generator or a sliced file object?
|
||||
# Simpler: Read the slice into memory if < 1GB? Or copy to new temp file?
|
||||
# Copying to new temp file is safer for memory.
|
||||
|
||||
video_path = temp_file_path + "_video.mp4"
|
||||
with open(temp_file_path, 'rb') as src, open(video_path, 'wb') as dst:
|
||||
src.seek(start_offset)
|
||||
# Copy in chunks
|
||||
bytes_to_copy = end_offset - start_offset
|
||||
copied = 0
|
||||
while copied < bytes_to_copy:
|
||||
chunk_size = min(1024*1024*10, bytes_to_copy - copied) # 10MB chunks
|
||||
chunk = src.read(chunk_size)
|
||||
if not chunk:
|
||||
break
|
||||
dst.write(chunk)
|
||||
copied += len(chunk)
|
||||
|
||||
logger.info(f"Extracted video content to {video_path}")
|
||||
|
||||
# 3. Upload to Supabase with user isolation
|
||||
timestamp = int(time.time())
|
||||
safe_name = re.sub(r'[^a-zA-Z0-9._-]', '', original_filename)
|
||||
# 使用 user_id 作为目录前缀实现隔离
|
||||
storage_path = f"{user_id}/{timestamp}_{safe_name}"
|
||||
|
||||
# Use storage service (this calls Supabase which might do its own http request)
|
||||
# We read the cleaned video file
|
||||
with open(video_path, 'rb') as f:
|
||||
file_content = f.read() # Still reading into memory for simple upload call, but server has 32GB RAM so ok for 500MB
|
||||
await storage_service.upload_file(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=storage_path,
|
||||
file_data=file_content,
|
||||
content_type=content_type
|
||||
)
|
||||
|
||||
logger.info(f"Upload to Supabase complete: {storage_path}")
|
||||
|
||||
# Cleanup
|
||||
os.remove(temp_file_path)
|
||||
os.remove(video_path)
|
||||
|
||||
return storage_path
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Background upload processing failed: {e}\n{traceback.format_exc()}")
|
||||
raise
|
||||
|
||||
|
||||
@router.post("")
|
||||
async def upload_material(
|
||||
request: Request,
|
||||
background_tasks: BackgroundTasks,
|
||||
current_user: dict = Depends(get_current_user)
|
||||
):
|
||||
user_id = current_user["id"]
|
||||
logger.info(f"ENTERED upload_material (Streaming Mode) for user {user_id}. Headers: {request.headers}")
|
||||
|
||||
filename = "unknown_video.mp4" # Fallback
|
||||
content_type = "video/mp4"
|
||||
|
||||
# Try to parse filename from header if possible (unreliable in raw stream)
|
||||
# We will rely on post-processing or client hint
|
||||
# Frontend sends standard multipart.
|
||||
|
||||
# Create temp file
|
||||
timestamp = int(time.time())
|
||||
temp_filename = f"upload_{timestamp}.raw"
|
||||
temp_path = os.path.join("/tmp", temp_filename) # Use /tmp on Linux
|
||||
# Ensure /tmp exists (it does) but verify paths
|
||||
if os.name == 'nt': # Local dev
|
||||
temp_path = f"d:/tmp/{temp_filename}"
|
||||
os.makedirs("d:/tmp", exist_ok=True)
|
||||
|
||||
try:
|
||||
total_size = 0
|
||||
last_log = 0
|
||||
|
||||
async with aiofiles.open(temp_path, 'wb') as f:
|
||||
async for chunk in request.stream():
|
||||
await f.write(chunk)
|
||||
total_size += len(chunk)
|
||||
|
||||
# Log progress every 20MB
|
||||
if total_size - last_log > 20 * 1024 * 1024:
|
||||
logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
|
||||
last_log = total_size
|
||||
|
||||
logger.info(f"Stream reception complete. Total size: {total_size} bytes. Saved to {temp_path}")
|
||||
|
||||
if total_size == 0:
|
||||
raise HTTPException(400, "Received empty body")
|
||||
|
||||
# Attempt to extract filename from the saved file's first bytes?
|
||||
# Or just accept it as "uploaded_video.mp4" for now to prove it works.
|
||||
# We can try to regex the header in the file content we just wrote.
|
||||
# Implemented in background task to return success immediately.
|
||||
|
||||
# Wait, if we return immediately, the user's UI might not show the file yet?
|
||||
# The prompt says "Wait for upload".
|
||||
# But to avoid User Waiting Timeout, maybe returning early is better?
|
||||
# NO, user expects the file to be in the list.
|
||||
# So we Must await the processing.
|
||||
# But "Processing" (Strip + Upload to Supabase) takes time.
|
||||
# Receiving took time.
|
||||
# If we await Supabase upload, does it timeout?
|
||||
# Supabase upload is outgoing. Usually faster/stable.
|
||||
|
||||
# Let's await the processing to ensure "List Materials" shows it.
|
||||
# We need to extract the filename for the list.
|
||||
|
||||
# Quick extract filename from first 4kb
|
||||
with open(temp_path, 'rb') as f:
|
||||
head = f.read(4096).decode('utf-8', errors='ignore')
|
||||
match = re.search(r'filename="([^"]+)"', head)
|
||||
if match:
|
||||
filename = match.group(1)
|
||||
logger.info(f"Extracted filename from body: {filename}")
|
||||
|
||||
# Run processing sync (in await)
|
||||
storage_path = await process_and_upload(temp_path, filename, content_type, user_id)
|
||||
|
||||
# Get signed URL (it exists now)
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=storage_path
|
||||
)
|
||||
|
||||
size_mb = total_size / (1024 * 1024) # Approximate (includes headers)
|
||||
|
||||
# 从 storage_path 提取显示名
|
||||
display_name = storage_path.split('/')[-1] # 去掉 user_id 前缀
|
||||
if '_' in display_name:
|
||||
parts = display_name.split('_', 1)
|
||||
if parts[0].isdigit():
|
||||
display_name = parts[1]
|
||||
|
||||
return {
|
||||
"id": storage_path,
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
"size_mb": size_mb,
|
||||
"type": "video"
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
error_msg = f"Streaming upload failed: {str(e)}"
|
||||
detail_msg = f"Exception: {repr(e)}\nArgs: {e.args}\n{traceback.format_exc()}"
|
||||
logger.error(error_msg + "\n" + detail_msg)
|
||||
|
||||
# Write to debug file
|
||||
try:
|
||||
with open("debug_upload.log", "a") as logf:
|
||||
logf.write(f"\n--- Error at {time.ctime()} ---\n")
|
||||
logf.write(detail_msg)
|
||||
logf.write("\n-----------------------------\n")
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
if os.path.exists(temp_path):
|
||||
try:
|
||||
os.remove(temp_path)
|
||||
except OSError:
|
||||
pass
|
||||
raise HTTPException(500, f"Upload failed. Check server logs. Error: {str(e)}")
|
||||
|
||||
|
||||
@router.get("")
|
||||
async def list_materials(current_user: dict = Depends(get_current_user)):
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
# 只列出当前用户目录下的文件
|
||||
files_obj = await storage_service.list_files(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=user_id
|
||||
)
|
||||
materials = []
|
||||
for f in files_obj:
|
||||
name = f.get('name')
|
||||
if not name or name == '.emptyFolderPlaceholder':
|
||||
continue
|
||||
display_name = name
|
||||
if '_' in name:
|
||||
parts = name.split('_', 1)
|
||||
if parts[0].isdigit():
|
||||
display_name = parts[1]
|
||||
# 完整路径包含 user_id
|
||||
full_path = f"{user_id}/{name}"
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=full_path
|
||||
)
|
||||
metadata = f.get('metadata', {})
|
||||
size = metadata.get('size', 0)
|
||||
# created_at 在顶层,是 ISO 字符串
|
||||
created_at_str = f.get('created_at', '')
|
||||
created_at = 0
|
||||
if created_at_str:
|
||||
from datetime import datetime
|
||||
try:
|
||||
dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
|
||||
created_at = int(dt.timestamp())
|
||||
except ValueError:
|
||||
pass
|
||||
materials.append({
|
||||
"id": full_path, # ID 使用完整路径
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
"size_mb": size / (1024 * 1024),
|
||||
"type": "video",
|
||||
"created_at": created_at
|
||||
})
|
||||
materials.sort(key=lambda x: x['id'], reverse=True)
|
||||
return {"materials": materials}
|
||||
except Exception as e:
|
||||
logger.error(f"List materials failed: {e}")
|
||||
return {"materials": []}
|
||||
|
||||
|
||||
@router.delete("/{material_id:path}")
|
||||
async def delete_material(material_id: str, current_user: dict = Depends(get_current_user)):
|
||||
user_id = current_user["id"]
|
||||
# Verify that material_id belongs to the current user
|
||||
if not material_id.startswith(f"{user_id}/"):
|
||||
raise HTTPException(403, "无权删除此素材")
|
||||
try:
|
||||
await storage_service.delete_file(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=material_id
|
||||
)
|
||||
return {"success": True, "message": "素材已删除"}
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"删除失败: {str(e)}")
|
||||
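The listing logic above strips a numeric `<digits>_` prefix from stored file names and converts the ISO `created_at` string into a Unix timestamp. A minimal standalone sketch of those two steps follows; the helper names `display_name_for` and `to_unix_timestamp` are hypothetical and do not appear in the diff.

```python
from datetime import datetime


def display_name_for(stored_name: str) -> str:
    """Drop a leading '<digits>_' prefix, if present (hypothetical helper)."""
    prefix, sep, rest = stored_name.partition("_")
    if sep and prefix.isdigit():
        return rest
    return stored_name


def to_unix_timestamp(created_at: str) -> int:
    """Parse an ISO 8601 string such as '2024-01-01T00:00:00Z'; return 0 on failure."""
    if not created_at:
        return 0
    try:
        # fromisoformat() on older Python versions rejects a trailing 'Z',
        # so it is mapped to an explicit UTC offset first.
        dt = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    except ValueError:
        return 0
    return int(dt.timestamp())
```

Catching `ValueError` instead of using a bare `except:` keeps unrelated failures visible while still tolerating malformed timestamps.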
@@ -1,299 +0,0 @@
|
||||
from fastapi import APIRouter, HTTPException, BackgroundTasks, Depends, Request
|
||||
from pydantic import BaseModel
|
||||
from typing import Optional
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
import uuid
|
||||
import traceback
|
||||
import time
|
||||
import httpx
|
||||
import os
|
||||
from app.services.tts_service import TTSService
|
||||
from app.services.video_service import VideoService
|
||||
from app.services.lipsync_service import LipSyncService
|
||||
from app.services.storage import storage_service
|
||||
from app.core.config import settings
|
||||
from app.core.deps import get_current_user
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
class GenerateRequest(BaseModel):
|
||||
text: str
|
||||
voice: str = "zh-CN-YunxiNeural"
|
||||
material_path: str
|
||||
|
||||
tasks = {} # In-memory task store
|
||||
|
||||
# Cached LipSync service instance and health status
|
||||
_lipsync_service: Optional[LipSyncService] = None
|
||||
_lipsync_ready: Optional[bool] = None
|
||||
_lipsync_last_check: float = 0
|
||||
|
||||
def _get_lipsync_service() -> LipSyncService:
|
||||
"""Get or create the LipSync service instance (singleton; avoids repeated initialization)."""
|
||||
global _lipsync_service
|
||||
if _lipsync_service is None:
|
||||
_lipsync_service = LipSyncService()
|
||||
return _lipsync_service
|
||||
|
||||
async def _check_lipsync_ready(force: bool = False) -> bool:
|
||||
"""Check whether LipSync is ready (cached; not re-checked within 5 minutes)."""
|
||||
global _lipsync_ready, _lipsync_last_check
|
||||
|
||||
now = time.time()
|
||||
# 5-minute cache
|
||||
if not force and _lipsync_ready is not None and (now - _lipsync_last_check) < 300:
|
||||
return _lipsync_ready
|
||||
|
||||
lipsync = _get_lipsync_service()
|
||||
health = await lipsync.check_health()
|
||||
_lipsync_ready = health.get("ready", False)
|
||||
_lipsync_last_check = now
|
||||
print(f"[LipSync] Health check: ready={_lipsync_ready}")
|
||||
return _lipsync_ready
|
||||
|
||||
async def _download_material(path_or_url: str, temp_path: Path):
|
||||
"""Download the material to a temporary file (streamed to save memory)."""
|
||||
if path_or_url.startswith("http"):
|
||||
# Download from URL
|
||||
timeout = httpx.Timeout(None) # Disable timeout for large files
|
||||
async with httpx.AsyncClient(timeout=timeout) as client:
|
||||
async with client.stream("GET", path_or_url) as resp:
|
||||
resp.raise_for_status()
|
||||
with open(temp_path, "wb") as f:
|
||||
async for chunk in resp.aiter_bytes():
|
||||
f.write(chunk)
|
||||
else:
|
||||
# Local file (legacy or absolute path)
|
||||
src = Path(path_or_url)
|
||||
if not src.is_absolute():
|
||||
src = settings.BASE_DIR.parent / path_or_url
|
||||
|
||||
if src.exists():
|
||||
import shutil
|
||||
shutil.copy(src, temp_path)
|
||||
else:
|
||||
raise FileNotFoundError(f"Material not found: {path_or_url}")
|
||||
|
||||
async def _process_video_generation(task_id: str, req: GenerateRequest, user_id: str):
|
||||
temp_files = [] # Track files to clean up
|
||||
try:
|
||||
start_time = time.time()
|
||||
|
||||
tasks[task_id]["status"] = "processing"
|
||||
tasks[task_id]["progress"] = 5
|
||||
tasks[task_id]["message"] = "正在下载素材..."
|
||||
|
||||
# Prepare temp dir
|
||||
temp_dir = settings.UPLOAD_DIR / "temp"
|
||||
temp_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# 0. Download Material
|
||||
input_material_path = temp_dir / f"{task_id}_input.mp4"
|
||||
temp_files.append(input_material_path)
|
||||
|
||||
await _download_material(req.material_path, input_material_path)
|
||||
|
||||
# 1. TTS - progress 5% -> 25%
|
||||
tasks[task_id]["message"] = "正在生成语音 (TTS)..."
|
||||
tasks[task_id]["progress"] = 10
|
||||
|
||||
tts = TTSService()
|
||||
audio_path = temp_dir / f"{task_id}_audio.mp3"
|
||||
temp_files.append(audio_path)
|
||||
await tts.generate_audio(req.text, req.voice, str(audio_path))
|
||||
|
||||
tts_time = time.time() - start_time
|
||||
print(f"[Pipeline] TTS completed in {tts_time:.1f}s")
|
||||
tasks[task_id]["progress"] = 25
|
||||
|
||||
# 2. LipSync - progress 25% -> 85%
|
||||
tasks[task_id]["message"] = "正在合成唇形 (LatentSync)..."
|
||||
tasks[task_id]["progress"] = 30
|
||||
|
||||
lipsync = _get_lipsync_service()
|
||||
lipsync_video_path = temp_dir / f"{task_id}_lipsync.mp4"
|
||||
temp_files.append(lipsync_video_path)
|
||||
|
||||
# Use the cached health-check result
|
||||
lipsync_start = time.time()
|
||||
is_ready = await _check_lipsync_ready()
|
||||
|
||||
if is_ready:
|
||||
print(f"[LipSync] Starting LatentSync inference...")
|
||||
tasks[task_id]["progress"] = 35
|
||||
tasks[task_id]["message"] = "正在运行 LatentSync 推理..."
|
||||
await lipsync.generate(str(input_material_path), str(audio_path), str(lipsync_video_path))
|
||||
else:
|
||||
# Skip lipsync if not available
|
||||
print(f"[LipSync] LatentSync not ready, copying original video")
|
||||
tasks[task_id]["message"] = "唇形同步不可用,使用原始视频..."
|
||||
import shutil
|
||||
shutil.copy(str(input_material_path), lipsync_video_path)
|
||||
|
||||
lipsync_time = time.time() - lipsync_start
|
||||
print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
|
||||
tasks[task_id]["progress"] = 85
|
||||
|
||||
# 3. Composition - progress 85% -> 100%
|
||||
tasks[task_id]["message"] = "正在合成最终视频..."
|
||||
tasks[task_id]["progress"] = 90
|
||||
|
||||
video = VideoService()
|
||||
final_output_local_path = temp_dir / f"{task_id}_output.mp4"
|
||||
temp_files.append(final_output_local_path)
|
||||
|
||||
await video.compose(str(lipsync_video_path), str(audio_path), str(final_output_local_path))
|
||||
|
||||
total_time = time.time() - start_time
|
||||
|
||||
# 4. Upload to Supabase with user isolation
|
||||
tasks[task_id]["message"] = "正在上传结果..."
|
||||
tasks[task_id]["progress"] = 95
|
||||
|
||||
# Use user_id as the directory prefix to isolate users
|
||||
storage_path = f"{user_id}/{task_id}_output.mp4"
|
||||
with open(final_output_local_path, "rb") as f:
|
||||
file_data = f.read()
|
||||
await storage_service.upload_file(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path,
|
||||
file_data=file_data,
|
||||
content_type="video/mp4"
|
||||
)
|
||||
|
||||
# Get Signed URL
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path
|
||||
)
|
||||
|
||||
print(f"[Pipeline] Total generation time: {total_time:.1f}s")
|
||||
|
||||
tasks[task_id]["status"] = "completed"
|
||||
tasks[task_id]["progress"] = 100
|
||||
tasks[task_id]["message"] = f"生成完成!耗时 {total_time:.0f} 秒"
|
||||
tasks[task_id]["output"] = storage_path
|
||||
tasks[task_id]["download_url"] = signed_url
|
||||
|
||||
except Exception as e:
|
||||
tasks[task_id]["status"] = "failed"
|
||||
tasks[task_id]["message"] = f"错误: {str(e)}"
|
||||
tasks[task_id]["error"] = traceback.format_exc()
|
||||
logger.error(f"Generate video failed: {e}")
|
||||
finally:
|
||||
# Cleanup temp files
|
||||
for f in temp_files:
|
||||
try:
|
||||
if f.exists():
|
||||
f.unlink()
|
||||
except Exception as e:
|
||||
print(f"Error cleaning up {f}: {e}")
|
||||
|
||||
@router.post("/generate")
|
||||
async def generate_video(
|
||||
req: GenerateRequest,
|
||||
background_tasks: BackgroundTasks,
|
||||
current_user: dict = Depends(get_current_user)
|
||||
):
|
||||
user_id = current_user["id"]
|
||||
task_id = str(uuid.uuid4())
|
||||
tasks[task_id] = {"status": "pending", "task_id": task_id, "progress": 0, "user_id": user_id}
|
||||
background_tasks.add_task(_process_video_generation, task_id, req, user_id)
|
||||
return {"task_id": task_id}
|
||||
|
||||
@router.get("/tasks/{task_id}")
|
||||
async def get_task(task_id: str):
|
||||
return tasks.get(task_id, {"status": "not_found"})
|
||||
|
||||
@router.get("/tasks")
|
||||
async def list_tasks():
|
||||
return {"tasks": list(tasks.values())}
|
||||
|
||||
@router.get("/lipsync/health")
|
||||
async def lipsync_health():
|
||||
"""Get the LipSync service health status."""
|
||||
lipsync = _get_lipsync_service()
|
||||
return await lipsync.check_health()
|
||||
|
||||
|
||||
@router.get("/generated")
|
||||
async def list_generated_videos(current_user: dict = Depends(get_current_user)):
|
||||
"""List the current user's generated videos from Storage."""
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
# List only files under the current user's directory
|
||||
files_obj = await storage_service.list_files(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=user_id
|
||||
)
|
||||
|
||||
videos = []
|
||||
for f in files_obj:
|
||||
name = f.get('name')
|
||||
if not name or name == '.emptyFolderPlaceholder':
|
||||
continue
|
||||
|
||||
# Skip files that are not *_output.mp4
|
||||
if not name.endswith("_output.mp4"):
|
||||
continue
|
||||
|
||||
# Derive the ID (the file name without its extension)
|
||||
video_id = Path(name).stem
|
||||
|
||||
# The full path includes user_id
|
||||
full_path = f"{user_id}/{name}"
|
||||
|
||||
# Get a signed URL
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=full_path
|
||||
)
|
||||
|
||||
metadata = f.get('metadata', {})
|
||||
size = metadata.get('size', 0)
|
||||
# created_at is a top-level ISO 8601 string; convert it to a Unix timestamp
|
||||
created_at_str = f.get('created_at', '')
|
||||
created_at = 0
|
||||
if created_at_str:
|
||||
from datetime import datetime
|
||||
try:
|
||||
dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
|
||||
created_at = int(dt.timestamp())
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
videos.append({
|
||||
"id": video_id,
|
||||
"name": name,
|
||||
"path": signed_url, # Direct playable URL
|
||||
"size_mb": size / (1024 * 1024),
|
||||
"created_at": created_at
|
||||
})
|
||||
|
||||
# Sort by created_at desc (newest first); created_at is a Unix timestamp (int), 0 when unknown
videos.sort(key=lambda x: x.get("created_at", 0), reverse=True)
|
||||
return {"videos": videos}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"List generated videos failed: {e}")
|
||||
return {"videos": []}
|
||||
|
||||
|
||||
@router.delete("/generated/{video_id}")
|
||||
async def delete_generated_video(video_id: str, current_user: dict = Depends(get_current_user)):
|
||||
"""Delete a generated video."""
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
# video_id is usually uuid_output; prepend user_id to build the full path
|
||||
storage_path = f"{user_id}/{video_id}.mp4"
|
||||
|
||||
await storage_service.delete_file(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path
|
||||
)
|
||||
return {"success": True, "message": "视频已删除"}
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"删除失败: {str(e)}")
|
||||
|
||||
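The `_check_lipsync_ready` function above caches the health-check result for five minutes using module-level globals. A minimal sketch of the same idea wrapped in a small class is shown below; `CachedHealthCheck` and the `probe` callable are hypothetical names, and `time.monotonic()` is swapped in for `time.time()` so the cache is unaffected by wall-clock adjustments.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional


class CachedHealthCheck:
    """Cache a boolean readiness probe for `ttl` seconds (hypothetical helper)."""

    def __init__(self, probe: Callable[[], Awaitable[bool]], ttl: float = 300.0):
        self._probe = probe
        self._ttl = ttl
        self._ready: Optional[bool] = None
        self._checked_at = 0.0

    async def is_ready(self, force: bool = False) -> bool:
        now = time.monotonic()
        fresh = self._ready is not None and (now - self._checked_at) < self._ttl
        if not force and fresh:
            return self._ready
        # Cache miss, expired entry, or forced refresh: run the real probe.
        self._ready = await self._probe()
        self._checked_at = now
        return self._ready


async def _demo() -> None:
    async def probe() -> bool:  # stand-in for a real health check
        return True

    check = CachedHealthCheck(probe)
    print(await check.is_ready())


if __name__ == "__main__":
    asyncio.run(_demo())
```

Keeping the state on an instance rather than in globals makes the cache easier to reset in tests, at the cost of having to share one instance across requests.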
@@ -6,10 +6,43 @@ class Settings(BaseSettings):
|
||||
BASE_DIR: Path = Path(__file__).resolve().parent.parent
|
||||
UPLOAD_DIR: Path = BASE_DIR.parent / "uploads"
|
||||
OUTPUT_DIR: Path = BASE_DIR.parent / "outputs"
|
||||
ASSETS_DIR: Path = BASE_DIR.parent / "assets"
|
||||
PUBLISH_SCREENSHOT_DIR: Path = BASE_DIR.parent / "private_outputs" / "publish_screenshots"
|
||||
|
||||
# Database / cache
|
||||
REDIS_URL: str = "redis://localhost:6379/0"
|
||||
DEBUG: bool = True
|
||||
|
||||
# Playwright configuration
|
||||
WEIXIN_HEADLESS_MODE: str = "headless-new"
|
||||
WEIXIN_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
|
||||
WEIXIN_LOCALE: str = "zh-CN"
|
||||
WEIXIN_TIMEZONE_ID: str = "Asia/Shanghai"
|
||||
WEIXIN_CHROME_PATH: str = "/usr/bin/google-chrome"
|
||||
WEIXIN_BROWSER_CHANNEL: str = ""
|
||||
WEIXIN_FORCE_SWIFTSHADER: bool = True
|
||||
WEIXIN_TRANSCODE_MODE: str = "reencode"
|
||||
WEIXIN_DEBUG_ARTIFACTS: bool = False
|
||||
WEIXIN_RECORD_VIDEO: bool = False
|
||||
WEIXIN_KEEP_SUCCESS_VIDEO: bool = False
|
||||
WEIXIN_RECORD_VIDEO_WIDTH: int = 1280
|
||||
WEIXIN_RECORD_VIDEO_HEIGHT: int = 720
|
||||
|
||||
# Douyin Playwright configuration
|
||||
DOUYIN_HEADLESS_MODE: str = "headless-new"
|
||||
DOUYIN_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
|
||||
DOUYIN_LOCALE: str = "zh-CN"
|
||||
DOUYIN_TIMEZONE_ID: str = "Asia/Shanghai"
|
||||
DOUYIN_CHROME_PATH: str = "/usr/bin/google-chrome"
|
||||
DOUYIN_BROWSER_CHANNEL: str = ""
|
||||
DOUYIN_FORCE_SWIFTSHADER: bool = True
|
||||
|
||||
# Douyin debug screen recording
|
||||
DOUYIN_DEBUG_ARTIFACTS: bool = False
|
||||
DOUYIN_RECORD_VIDEO: bool = False
|
||||
DOUYIN_KEEP_SUCCESS_VIDEO: bool = False
|
||||
DOUYIN_RECORD_VIDEO_WIDTH: int = 1280
|
||||
DOUYIN_RECORD_VIDEO_HEIGHT: int = 720
|
||||
|
||||
# TTS configuration
|
||||
DEFAULT_TTS_VOICE: str = "zh-CN-YunxiNeural"
|
||||
@@ -22,10 +55,19 @@ class Settings(BaseSettings):
|
||||
LATENTSYNC_INFERENCE_STEPS: int = 20  # inference steps [20-50]
|
||||
LATENTSYNC_GUIDANCE_SCALE: float = 1.5  # guidance scale [1.0-3.0]
|
||||
LATENTSYNC_ENABLE_DEEPCACHE: bool = True  # enable DeepCache acceleration
|
||||
LATENTSYNC_ENABLE_DEEPCACHE: bool = True  # enable DeepCache acceleration
|
||||
LATENTSYNC_SEED: int = 1247  # random seed (-1 means random)
|
||||
LATENTSYNC_USE_SERVER: bool = False  # use the persistent server for acceleration
|
||||
|
||||
LATENTSYNC_USE_SERVER: bool = True  # use the persistent server for acceleration
|
||||
|
||||
# MuseTalk configuration
MUSETALK_GPU_ID: int = 0  # GPU ID (GPU0 by default)
MUSETALK_API_URL: str = "http://localhost:8011"  # persistent server address
MUSETALK_BATCH_SIZE: int = 8  # inference batch size
MUSETALK_VERSION: str = "v15"  # model version
MUSETALK_USE_FLOAT16: bool = True  # half-precision acceleration

# Hybrid lip-sync routing
LIPSYNC_DURATION_THRESHOLD: float = 120.0  # seconds; MuseTalk is used at or above this value
|
||||
|
||||
# Supabase configuration
|
||||
SUPABASE_URL: str = ""
|
||||
SUPABASE_PUBLIC_URL: str = ""  # public address, used to build URLs the frontend can reach
|
||||
@@ -37,14 +79,35 @@ class Settings(BaseSettings):
|
||||
JWT_EXPIRE_HOURS: int = 24
|
||||
|
||||
# Administrator configuration
|
||||
ADMIN_EMAIL: str = ""
|
||||
ADMIN_PHONE: str = ""
|
||||
ADMIN_PASSWORD: str = ""
|
||||
|
||||
# GLM AI configuration
|
||||
GLM_API_KEY: str = ""
|
||||
GLM_MODEL: str = "glm-4.7-flash"
|
||||
|
||||
# Alipay configuration
ALIPAY_APP_ID: str = ""
ALIPAY_PRIVATE_KEY_PATH: str = ""  # path to the application private key PEM file
ALIPAY_PUBLIC_KEY_PATH: str = ""  # path to the Alipay public key PEM file
ALIPAY_NOTIFY_URL: str = ""  # asynchronous notification callback URL (must be publicly reachable)
ALIPAY_RETURN_URL: str = ""  # synchronous redirect URL after a successful payment
ALIPAY_SANDBOX: bool = False  # whether to use the sandbox environment
PAYMENT_AMOUNT: float = 999.00  # membership price (CNY)
PAYMENT_EXPIRE_DAYS: int = 365  # membership validity in days
|
||||
|
||||
# CORS configuration (comma-separated list of origins; * allows all)
|
||||
CORS_ORIGINS: str = "*"
|
||||
@property
|
||||
def LATENTSYNC_DIR(self) -> Path:
|
||||
"""LatentSync directory path (computed dynamically)."""
|
||||
return self.BASE_DIR.parent.parent / "models" / "LatentSync"
|
||||
|
||||
@property
|
||||
def MUSETALK_DIR(self) -> Path:
|
||||
"""MuseTalk directory path (computed dynamically)."""
|
||||
return self.BASE_DIR.parent.parent / "models" / "MuseTalk"
|
||||
|
||||
class Config:
|
||||
env_file = ".env"
|
||||
extra = "ignore"  # ignore unknown environment variables
|
||||
|
||||
@@ -1,10 +1,11 @@
|
||||
"""
|
||||
Dependency-injection module: authentication and user retrieval
|
||||
"""
|
||||
from typing import Optional
|
||||
from typing import Optional, Any, Dict, cast
|
||||
from fastapi import Request, HTTPException, Depends, status
|
||||
from app.core.security import decode_access_token, TokenData
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.security import decode_access_token
|
||||
from app.repositories.sessions import get_session, delete_sessions
|
||||
from app.repositories.users import get_user_by_id, deactivate_user_if_expired
|
||||
from loguru import logger
|
||||
|
||||
|
||||
@@ -15,7 +16,7 @@ async def get_token_from_cookie(request: Request) -> Optional[str]:
|
||||
|
||||
async def get_current_user_optional(
|
||||
request: Request
|
||||
) -> Optional[dict]:
|
||||
) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Get the current user (optional; returns None when not logged in)
|
||||
"""
|
||||
@@ -29,23 +30,21 @@ async def get_current_user_optional(
|
||||
|
||||
# Verify that session_token is valid (single-device login check)
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("user_sessions").select("*").eq(
|
||||
"user_id", token_data.user_id
|
||||
).eq(
|
||||
"session_token", token_data.session_token
|
||||
).execute()
|
||||
|
||||
if not result.data:
|
||||
session = get_session(token_data.user_id, token_data.session_token)
|
||||
if not session:
|
||||
logger.warning(f"Session token 无效: user_id={token_data.user_id}")
|
||||
return None
|
||||
|
||||
# Fetch the user record
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"id", token_data.user_id
|
||||
).single().execute()
|
||||
|
||||
return user_result.data
|
||||
|
||||
user = cast(Optional[Dict[str, Any]], get_user_by_id(token_data.user_id))
|
||||
if user and deactivate_user_if_expired(user):
|
||||
delete_sessions(token_data.user_id)
|
||||
return None
|
||||
|
||||
if user and not user.get("is_active"):
|
||||
delete_sessions(token_data.user_id)
|
||||
return None
|
||||
|
||||
return user
|
||||
except Exception as e:
|
||||
logger.error(f"获取用户信息失败: {e}")
|
||||
return None
|
||||
@@ -53,7 +52,7 @@ async def get_current_user_optional(
|
||||
|
||||
async def get_current_user(
|
||||
request: Request
|
||||
) -> dict:
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Get the current user (login required)
|
||||
|
||||
@@ -76,43 +75,35 @@ async def get_current_user(
|
||||
)
|
||||
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# Verify session_token (single-device login)
|
||||
session_result = supabase.table("user_sessions").select("*").eq(
|
||||
"user_id", token_data.user_id
|
||||
).eq(
|
||||
"session_token", token_data.session_token
|
||||
).execute()
|
||||
|
||||
if not session_result.data:
|
||||
session = get_session(token_data.user_id, token_data.session_token)
|
||||
if not session:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="会话已失效,请重新登录(可能已在其他设备登录)"
|
||||
)
|
||||
|
||||
# Fetch the user record
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"id", token_data.user_id
|
||||
).single().execute()
|
||||
|
||||
user = user_result.data
|
||||
|
||||
user = get_user_by_id(token_data.user_id)
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="用户不存在"
|
||||
)
|
||||
|
||||
# Check whether the authorization has expired
|
||||
if user.get("expires_at"):
|
||||
from datetime import datetime, timezone
|
||||
expires_at = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
|
||||
if datetime.now(timezone.utc) > expires_at:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="授权已过期,请联系管理员续期"
|
||||
)
|
||||
|
||||
user = cast(Dict[str, Any], user)
|
||||
|
||||
if deactivate_user_if_expired(user):
|
||||
delete_sessions(token_data.user_id)
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="会员已到期,请续费"
|
||||
)
|
||||
|
||||
if not user.get("is_active"):
|
||||
delete_sessions(token_data.user_id)
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="账号已停用"
|
||||
)
|
||||
|
||||
return user
|
||||
except HTTPException:
|
||||
raise
|
||||
|
||||
26
backend/app/core/response.py
Normal file
@@ -0,0 +1,26 @@
from typing import Any, Dict, Optional


def success_response(
    data: Any = None,
    message: str = "ok",
    code: int = 0,
    success: bool = True,
) -> Dict[str, Any]:
    return {
        "success": success,
        "message": message,
        "data": data,
        "code": code,
    }


def error_response(message: str, code: int, data: Optional[Any] = None) -> Dict[str, Any]:
    payload = {
        "success": False,
        "message": message,
        "code": code,
    }
    if data is not None:
        payload["data"] = data
    return payload
|
||||
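The two helpers above define a uniform envelope with `success`, `message`, `code`, and an optional `data` field. As an illustration of how a caller might consume that envelope, here is a minimal sketch; `ApiError` and `unwrap` are hypothetical names that are not part of this change.

```python
from typing import Any, Dict


class ApiError(Exception):
    """Raised when an envelope reports failure (hypothetical client-side type)."""

    def __init__(self, message: str, code: int, data: Any = None):
        super().__init__(message)
        self.code = code
        self.data = data


def unwrap(payload: Dict[str, Any]) -> Any:
    """Return `data` from a success envelope, or raise ApiError otherwise."""
    if payload.get("success"):
        return payload.get("data")
    # error_response() omits "data" when it is None, so .get() is used throughout.
    raise ApiError(payload.get("message", ""), payload.get("code", -1), payload.get("data"))
```

Because error envelopes may omit `data`, a consumer should not index it directly.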
@@ -110,3 +110,28 @@ def set_auth_cookie(response: Response, token: str) -> None:
|
||||
def clear_auth_cookie(response: Response) -> None:
|
||||
"""Clear the authentication cookie."""
|
||||
response.delete_cookie(key="access_token")
|
||||
|
||||
|
||||
def create_payment_token(user_id: str) -> str:
    """Generate a short-lived JWT used only for payment (valid for 30 minutes)."""
    payload = {
        "sub": user_id,
        "purpose": "payment",
        "exp": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
    return jwt.encode(payload, settings.JWT_SECRET_KEY, algorithm=settings.JWT_ALGORITHM)


def decode_payment_token(token: str) -> str | None:
    """Decode a payment token and return its user_id (valid only when purpose=payment)."""
    try:
        data = jwt.decode(
            token,
            settings.JWT_SECRET_KEY,
            algorithms=[settings.JWT_ALGORITHM],
        )
        if data.get("purpose") != "payment":
            return None
        return data.get("sub")
    except JWTError:
        return None
|
||||
|
||||
@@ -1,8 +1,22 @@
|
||||
from fastapi import FastAPI
|
||||
from fastapi import FastAPI, HTTPException
|
||||
from fastapi.staticfiles import StaticFiles
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from fastapi.responses import JSONResponse
|
||||
from app.core import config
|
||||
from app.api import materials, videos, publish, login_helper, auth, admin
|
||||
from app.core.response import error_response
|
||||
# Import routers directly from modules, removing the api forwarding layer
|
||||
from app.modules.materials.router import router as materials_router
|
||||
from app.modules.videos.router import router as videos_router
|
||||
from app.modules.publish.router import router as publish_router
|
||||
from app.modules.login_helper.router import router as login_helper_router
|
||||
from app.modules.auth.router import router as auth_router
|
||||
from app.modules.admin.router import router as admin_router
|
||||
from app.modules.ref_audios.router import router as ref_audios_router
|
||||
from app.modules.ai.router import router as ai_router
|
||||
from app.modules.tools.router import router as tools_router
|
||||
from app.modules.assets.router import router as assets_router
|
||||
from app.modules.generated_audios.router import router as generated_audios_router
|
||||
from app.modules.payment.router import router as payment_router
|
||||
from loguru import logger
|
||||
import os
|
||||
|
||||
@@ -11,15 +25,33 @@ settings = config.settings
|
||||
app = FastAPI(title="ViGent TalkingHead Agent")
|
||||
|
||||
from fastapi import Request
|
||||
from fastapi.exceptions import RequestValidationError
|
||||
from starlette.middleware.base import BaseHTTPMiddleware
|
||||
import time
|
||||
import traceback
|
||||
|
||||
class LoggingMiddleware(BaseHTTPMiddleware):
|
||||
# Names of sensitive headers (lowercase)
|
||||
SENSITIVE_HEADERS = {'authorization', 'cookie', 'set-cookie', 'x-api-key', 'api-key'}
|
||||
|
||||
def _sanitize_headers(self, headers: dict) -> dict:
|
||||
"""Sanitize request headers so that sensitive values are hidden."""
|
||||
sanitized = {}
|
||||
for key, value in headers.items():
|
||||
if key.lower() in self.SENSITIVE_HEADERS:
|
||||
# Show the first 8 characters plus a mask
|
||||
if len(value) > 8:
|
||||
sanitized[key] = value[:8] + "..." + f"[{len(value)} chars]"
|
||||
else:
|
||||
sanitized[key] = "[REDACTED]"
|
||||
else:
|
||||
sanitized[key] = value
|
||||
return sanitized
|
||||
|
||||
async def dispatch(self, request: Request, call_next):
|
||||
start_time = time.time()
|
||||
logger.info(f"START Request: {request.method} {request.url}")
|
||||
logger.info(f"HEADERS: {dict(request.headers)}")
|
||||
logger.debug(f"HEADERS: {self._sanitize_headers(dict(request.headers))}")
|
||||
try:
|
||||
response = await call_next(request)
|
||||
process_time = time.time() - start_time
|
||||
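The `_sanitize_headers` method above masks sensitive header values before they are logged. A standalone sketch of the same rule follows; the free function `sanitize_headers` is a hypothetical rewrite for illustration, using the header names from the diff.

```python
from typing import Dict, Mapping

SENSITIVE_HEADERS = frozenset(
    {"authorization", "cookie", "set-cookie", "x-api-key", "api-key"}
)


def sanitize_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` with sensitive values masked."""
    sanitized = {}
    for key, value in headers.items():
        if key.lower() not in SENSITIVE_HEADERS:
            sanitized[key] = value
        elif len(value) > 8:
            # Keep an 8-character prefix and report the total length.
            sanitized[key] = f"{value[:8]}...[{len(value)} chars]"
        else:
            # Too short to show a prefix without revealing most of the value.
            sanitized[key] = "[REDACTED]"
    return sanitized
```

Keeping a short prefix can help correlate requests in logs, at the cost of exposing part of the secret; a stricter variant would redact the whole value.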
@@ -32,10 +64,43 @@ class LoggingMiddleware(BaseHTTPMiddleware):
|
||||
|
||||
app.add_middleware(LoggingMiddleware)
|
||||
|
||||
|
||||
@app.exception_handler(RequestValidationError)
|
||||
async def validation_exception_handler(request: Request, exc: RequestValidationError):
|
||||
return JSONResponse(
|
||||
status_code=422,
|
||||
content=error_response("参数校验失败", 422, data=exc.errors()),
|
||||
)
|
||||
|
||||
|
||||
@app.exception_handler(HTTPException)
|
||||
async def http_exception_handler(request: Request, exc: HTTPException):
|
||||
detail = exc.detail
|
||||
message = detail if isinstance(detail, str) else "请求失败"
|
||||
data = detail if not isinstance(detail, str) else None
|
||||
return JSONResponse(
|
||||
status_code=exc.status_code,
|
||||
content=error_response(message, exc.status_code, data=data),
|
||||
headers=exc.headers,
|
||||
)
|
||||
|
||||
|
||||
@app.exception_handler(Exception)
|
||||
async def unhandled_exception_handler(request: Request, exc: Exception):
|
||||
return JSONResponse(
|
||||
status_code=500,
|
||||
content=error_response("服务器内部错误", 500),
|
||||
)
|
||||
|
||||
# CORS configuration: read the allowed origins from the environment
# The * wildcard cannot be used together with credentials
cors_origins = settings.CORS_ORIGINS.split(",") if settings.CORS_ORIGINS != "*" else ["*"]
allow_credentials = settings.CORS_ORIGINS != "*"  # credentials cannot be allowed with *
|
||||
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_origins=cors_origins,
|
||||
allow_credentials=allow_credentials,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
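The CORS change above derives both the origin list and the credentials flag from one comma-separated setting. A minimal sketch of that derivation is shown below; `parse_cors_origins` is a hypothetical helper, and unlike the code in the diff it also strips whitespace around each entry and drops empty ones.

```python
from typing import List, Tuple


def parse_cors_origins(raw: str) -> Tuple[List[str], bool]:
    """Return (allow_origins, allow_credentials) for a comma-separated setting."""
    if raw.strip() == "*":
        # Browsers reject credentialed responses whose
        # Access-Control-Allow-Origin is the wildcard.
        return ["*"], False
    origins = [origin.strip() for origin in raw.split(",") if origin.strip()]
    return origins, True
```

Stripping whitespace means a value such as `"https://a.example, https://b.example"` yields two clean origins rather than one with a leading space.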
@@ -44,17 +109,25 @@ app.add_middleware(
|
||||
settings.UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
|
||||
settings.OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
(settings.UPLOAD_DIR / "materials").mkdir(exist_ok=True)
|
||||
settings.ASSETS_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
app.mount("/outputs", StaticFiles(directory=str(settings.OUTPUT_DIR)), name="outputs")
|
||||
app.mount("/uploads", StaticFiles(directory=str(settings.UPLOAD_DIR)), name="uploads")
|
||||
app.mount("/assets", StaticFiles(directory=str(settings.ASSETS_DIR)), name="assets")
|
||||
|
||||
# Register routers
|
||||
app.include_router(materials.router, prefix="/api/materials", tags=["Materials"])
|
||||
app.include_router(videos.router, prefix="/api/videos", tags=["Videos"])
|
||||
app.include_router(publish.router, prefix="/api/publish", tags=["Publish"])
|
||||
app.include_router(login_helper.router, prefix="/api", tags=["LoginHelper"])
|
||||
app.include_router(auth.router) # /api/auth
|
||||
app.include_router(admin.router) # /api/admin
|
||||
app.include_router(materials_router, prefix="/api/materials", tags=["Materials"])
|
||||
app.include_router(videos_router, prefix="/api/videos", tags=["Videos"])
|
||||
app.include_router(publish_router, prefix="/api/publish", tags=["Publish"])
|
||||
app.include_router(login_helper_router, prefix="/api", tags=["LoginHelper"])
|
||||
app.include_router(auth_router) # /api/auth
|
||||
app.include_router(admin_router) # /api/admin
|
||||
app.include_router(ref_audios_router, prefix="/api/ref-audios", tags=["RefAudios"])
|
||||
app.include_router(ai_router) # /api/ai
|
||||
app.include_router(tools_router, prefix="/api/tools", tags=["Tools"])
|
||||
app.include_router(assets_router, prefix="/api/assets", tags=["Assets"])
|
||||
app.include_router(generated_audios_router, prefix="/api/generated-audios", tags=["GeneratedAudios"])
|
||||
app.include_router(payment_router) # /api/payment
|
||||
|
||||
|
||||
@app.on_event("startup")
|
||||
@@ -62,37 +135,31 @@ async def init_admin():
|
||||
"""
|
||||
Initialize the administrator account at service startup
|
||||
"""
|
||||
admin_email = settings.ADMIN_EMAIL
|
||||
admin_phone = settings.ADMIN_PHONE
|
||||
admin_password = settings.ADMIN_PASSWORD
|
||||
|
||||
if not admin_email or not admin_password:
|
||||
logger.warning("未配置 ADMIN_EMAIL 和 ADMIN_PASSWORD,跳过管理员初始化")
|
||||
if not admin_phone or not admin_password:
|
||||
logger.warning("未配置 ADMIN_PHONE 和 ADMIN_PASSWORD,跳过管理员初始化")
|
||||
return
|
||||
|
||||
try:
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.security import get_password_hash
|
||||
|
||||
supabase = get_supabase()
|
||||
|
||||
# Check whether the account already exists
|
||||
existing = supabase.table("users").select("id").eq("email", admin_email).execute()
|
||||
|
||||
if existing.data:
|
||||
logger.info(f"管理员账号已存在: {admin_email}")
|
||||
from app.repositories.users import create_user, user_exists_by_phone
|
||||
|
||||
if user_exists_by_phone(admin_phone):
|
||||
logger.info(f"管理员账号已存在: {admin_phone}")
|
||||
return
|
||||
|
||||
# Create the administrator
|
||||
supabase.table("users").insert({
|
||||
"email": admin_email,
|
||||
|
||||
create_user({
|
||||
"phone": admin_phone,
|
||||
"password_hash": get_password_hash(admin_password),
|
||||
"username": "Admin",
|
||||
"role": "admin",
|
||||
"is_active": True,
|
||||
"expires_at": None  # never expires
|
||||
}).execute()
|
||||
})
|
||||
|
||||
logger.success(f"管理员账号已创建: {admin_email}")
|
||||
logger.success(f"管理员账号已创建: {admin_phone}")
|
||||
except Exception as e:
|
||||
logger.error(f"初始化管理员失败: {e}")
|
||||
|
||||
|
||||
0
backend/app/modules/admin/__init__.py
Normal file
@@ -3,10 +3,12 @@
|
||||
"""
|
||||
from fastapi import APIRouter, HTTPException, Depends, status
|
||||
from pydantic import BaseModel
|
||||
from typing import Optional, List
|
||||
from typing import Optional, List, Any, cast
|
||||
from datetime import datetime, timezone, timedelta
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.deps import get_current_admin
|
||||
from app.core.deps import get_current_admin
|
||||
from app.core.response import success_response
|
||||
from app.repositories.sessions import delete_sessions
|
||||
from app.repositories.users import get_user_by_id, list_users as list_users_repo, update_user
|
||||
from loguru import logger
|
||||
|
||||
router = APIRouter(prefix="/api/admin", tags=["管理"])
|
||||
@@ -14,7 +16,7 @@ router = APIRouter(prefix="/api/admin", tags=["管理"])
|
||||
|
||||
class UserListItem(BaseModel):
|
||||
id: str
|
||||
email: str
|
||||
phone: str
|
||||
username: Optional[str]
|
||||
role: str
|
||||
is_active: bool
|
||||
@@ -26,25 +28,23 @@ class ActivateRequest(BaseModel):
|
||||
expires_days: Optional[int] = None  # authorization period in days; None means permanent
|
||||
|
||||
|
||||
@router.get("/users", response_model=List[UserListItem])
|
||||
async def list_users(admin: dict = Depends(get_current_admin)):
|
||||
@router.get("/users")
|
||||
async def list_users(admin: dict = Depends(get_current_admin)):
|
||||
"""Get the list of all users."""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").select("*").order("created_at", desc=True).execute()
|
||||
|
||||
return [
|
||||
UserListItem(
|
||||
id=u["id"],
|
||||
email=u["email"],
|
||||
username=u.get("username"),
|
||||
role=u["role"],
|
||||
is_active=u["is_active"],
|
||||
expires_at=u.get("expires_at"),
|
||||
created_at=u["created_at"]
|
||||
)
|
||||
for u in result.data
|
||||
]
|
||||
data = list_users_repo()
|
||||
return success_response([
|
||||
UserListItem(
|
||||
id=u["id"],
|
||||
phone=u["phone"],
|
||||
username=u.get("username"),
|
||||
role=u["role"],
|
||||
is_active=u["is_active"],
|
||||
expires_at=u.get("expires_at"),
|
||||
created_at=u["created_at"]
|
||||
).model_dump()
|
||||
for u in data
|
||||
])
|
||||
except Exception as e:
|
||||
logger.error(f"获取用户列表失败: {e}")
|
||||
raise HTTPException(
|
||||
@@ -67,32 +67,26 @@ async def activate_user(
|
||||
request.expires_days: authorization period in days (None means permanent)
|
||||
"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
# Compute the expiry time
|
||||
expires_at = None
|
||||
if request.expires_days:
|
||||
expires_at = (datetime.now(timezone.utc) + timedelta(days=request.expires_days)).isoformat()
|
||||
|
||||
result = update_user(user_id, {
|
||||
"is_active": True,
|
||||
"role": "user",
|
||||
"expires_at": expires_at
|
||||
})
|
||||
|
||||
if not result:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_404_NOT_FOUND,
|
||||
detail="用户不存在"
|
||||
)
|
||||
|
||||
# 计算过期时间
|
||||
expires_at = None
|
||||
if request.expires_days:
|
||||
expires_at = (datetime.now(timezone.utc) + timedelta(days=request.expires_days)).isoformat()
|
||||
logger.info(f"管理员 {admin['phone']} 激活用户 {user_id}, 有效期: {request.expires_days or '永久'} 天")
|
||||
|
||||
# 更新用户
|
||||
result = supabase.table("users").update({
|
||||
"is_active": True,
|
||||
"role": "user",
|
||||
"expires_at": expires_at
|
||||
}).eq("id", user_id).execute()
|
||||
|
||||
if not result.data:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_404_NOT_FOUND,
|
||||
detail="用户不存在"
|
||||
)
|
||||
|
||||
logger.info(f"管理员 {admin['email']} 激活用户 {user_id}, 有效期: {request.expires_days or '永久'} 天")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": f"用户已激活,有效期: {request.expires_days or '永久'} 天"
|
||||
}
|
||||
return success_response(message=f"用户已激活,有效期: {request.expires_days or '永久'} 天")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
@@ -110,27 +104,20 @@ async def deactivate_user(
|
||||
):
|
||||
"""停用用户"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
# 不能停用管理员
|
||||
user = cast(dict[str, Any], get_user_by_id(user_id) or {})
|
||||
if user.get("role") == "admin":
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="不能停用管理员账号"
|
||||
)
|
||||
|
||||
update_user(user_id, {"is_active": False})
|
||||
delete_sessions(user_id)
|
||||
|
||||
# 不能停用管理员
|
||||
user_result = supabase.table("users").select("role").eq("id", user_id).single().execute()
|
||||
if user_result.data and user_result.data["role"] == "admin":
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="不能停用管理员账号"
|
||||
)
|
||||
logger.info(f"管理员 {admin['phone']} 停用用户 {user_id}")
|
||||
|
||||
# 更新用户
|
||||
result = supabase.table("users").update({
|
||||
"is_active": False
|
||||
}).eq("id", user_id).execute()
|
||||
|
||||
# 清除用户 session
|
||||
supabase.table("user_sessions").delete().eq("user_id", user_id).execute()
|
||||
|
||||
logger.info(f"管理员 {admin['email']} 停用用户 {user_id}")
|
||||
|
||||
return {"success": True, "message": "用户已停用"}
|
||||
return success_response(message="用户已停用")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
@@ -149,15 +136,12 @@ async def extend_user(
|
||||
):
|
||||
"""延长用户授权期限"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
if not request.expires_days:
|
||||
# 设为永久
|
||||
expires_at = None
|
||||
else:
|
||||
# 获取当前过期时间
|
||||
user_result = supabase.table("users").select("expires_at").eq("id", user_id).single().execute()
|
||||
user = user_result.data
|
||||
if not request.expires_days:
|
||||
# 设为永久
|
||||
expires_at = None
|
||||
else:
|
||||
# 获取当前过期时间
|
||||
user = cast(dict[str, Any], get_user_by_id(user_id) or {})
|
||||
|
||||
if user and user.get("expires_at"):
|
||||
current_expires = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
|
||||
@@ -167,16 +151,11 @@ async def extend_user(
|
||||
|
||||
expires_at = (base_time + timedelta(days=request.expires_days)).isoformat()
|
||||
|
||||
result = supabase.table("users").update({
|
||||
"expires_at": expires_at
|
||||
}).eq("id", user_id).execute()
|
||||
update_user(user_id, {"expires_at": expires_at})
|
||||
|
||||
logger.info(f"管理员 {admin['email']} 延长用户 {user_id} 授权 {request.expires_days or '永久'} 天")
|
||||
logger.info(f"管理员 {admin['phone']} 延长用户 {user_id} 授权 {request.expires_days or '永久'} 天")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": f"授权已延长 {request.expires_days or '永久'} 天"
|
||||
}
|
||||
return success_response(message=f"授权已延长 {request.expires_days or '永久'} 天")
|
||||
except Exception as e:
|
||||
logger.error(f"延长授权失败: {e}")
|
||||
raise HTTPException(
|
||||
0
backend/app/modules/ai/__init__.py
Normal file
98
backend/app/modules/ai/router.py
Normal file
@@ -0,0 +1,98 @@
|
||||
"""
|
||||
AI-related API routes
|
||||
"""
|
||||
|
||||
from typing import Optional
|
||||
|
||||
from fastapi import APIRouter, HTTPException
|
||||
from pydantic import BaseModel
|
||||
from loguru import logger
|
||||
|
||||
from app.services.glm_service import glm_service
|
||||
from app.core.response import success_response
|
||||
|
||||
|
||||
router = APIRouter(prefix="/api/ai", tags=["AI"])
|
||||
|
||||
|
||||
class GenerateMetaRequest(BaseModel):
|
||||
"""生成标题标签请求"""
|
||||
text: str
|
||||
|
||||
|
||||
class GenerateMetaResponse(BaseModel):
|
||||
"""生成标题标签响应"""
|
||||
title: str
|
||||
secondary_title: str = ""
|
||||
tags: list[str]
|
||||
|
||||
|
||||
class RewriteRequest(BaseModel):
|
||||
"""改写请求"""
|
||||
text: str
|
||||
custom_prompt: Optional[str] = None
|
||||
|
||||
|
||||
class TranslateRequest(BaseModel):
|
||||
"""翻译请求"""
|
||||
text: str
|
||||
target_lang: str
|
||||
|
||||
|
||||
@router.post("/translate")
|
||||
async def translate_text(req: TranslateRequest):
|
||||
"""
|
||||
AI 翻译文案
|
||||
|
||||
将文案翻译为指定目标语言
|
||||
"""
|
||||
if not req.text or not req.text.strip():
|
||||
raise HTTPException(status_code=400, detail="文案不能为空")
|
||||
if not req.target_lang or not req.target_lang.strip():
|
||||
raise HTTPException(status_code=400, detail="目标语言不能为空")
|
||||
|
||||
try:
|
||||
logger.info(f"Translating text to {req.target_lang}: {req.text[:50]}...")
|
||||
translated = await glm_service.translate_text(req.text.strip(), req.target_lang.strip())
|
||||
return success_response({"translated_text": translated})
|
||||
except Exception as e:
|
||||
logger.error(f"Translate failed: {e}")
|
||||
raise HTTPException(status_code=500, detail=str(e))
|
||||
|
||||
|
||||
@router.post("/generate-meta")
|
||||
async def generate_meta(req: GenerateMetaRequest):
|
||||
"""
|
||||
AI 生成视频标题和标签
|
||||
|
||||
根据口播文案自动生成吸引人的标题和相关标签
|
||||
"""
|
||||
if not req.text or not req.text.strip():
|
||||
raise HTTPException(status_code=400, detail="口播文案不能为空")
|
||||
|
||||
try:
|
||||
logger.info(f"Generating meta for text: {req.text[:50]}...")
|
||||
result = await glm_service.generate_title_tags(req.text)
|
||||
return success_response(GenerateMetaResponse(
|
||||
title=result.get("title", ""),
|
||||
secondary_title=result.get("secondary_title", ""),
|
||||
tags=result.get("tags", [])
|
||||
).model_dump())
|
||||
except Exception as e:
|
||||
logger.error(f"Generate meta failed: {e}")
|
||||
raise HTTPException(status_code=500, detail=str(e))
|
||||
|
||||
|
||||
@router.post("/rewrite")
|
||||
async def rewrite_script(req: RewriteRequest):
|
||||
"""AI 改写文案"""
|
||||
if not req.text or not req.text.strip():
|
||||
raise HTTPException(status_code=400, detail="文案不能为空")
|
||||
|
||||
try:
|
||||
logger.info(f"Rewriting text: {req.text[:50]}...")
|
||||
rewritten = await glm_service.rewrite_script(req.text.strip(), req.custom_prompt)
|
||||
return success_response({"rewritten_text": rewritten})
|
||||
except Exception as e:
|
||||
logger.error(f"Rewrite failed: {e}")
|
||||
raise HTTPException(status_code=500, detail=str(e))
|
||||
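Review note: every router in this diff returns `success_response(...)` from `app.core.response`, whose body is not shown in this section. A rough sketch of what such a helper might look like, assuming it mirrors the `success` / `message` / `code` / `data` keys of the 403 payload built by hand in the auth login route. The default `message` and `code` values here are assumptions.

```python
from typing import Any


def success_response(data: Any = None, message: str = "ok", code: int = 0) -> dict:
    """Wrap a payload in a uniform envelope (hypothetical shape)."""
    return {"success": True, "message": message, "code": code, "data": data}
```

With one envelope for success and error bodies, the frontend can branch on `success` alone instead of inspecting per-endpoint shapes.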
0
backend/app/modules/assets/__init__.py
Normal file
23
backend/app/modules/assets/router.py
Normal file
@@ -0,0 +1,23 @@
from fastapi import APIRouter, Depends

from app.core.deps import get_current_user
from app.services.assets_service import list_styles, list_bgm
from app.core.response import success_response


router = APIRouter()


@router.get("/subtitle-styles")
async def list_subtitle_styles(current_user: dict = Depends(get_current_user)):
    return success_response({"styles": list_styles("subtitle")})


@router.get("/title-styles")
async def list_title_styles(current_user: dict = Depends(get_current_user)):
    return success_response({"styles": list_styles("title")})


@router.get("/bgm")
async def list_bgm_items(current_user: dict = Depends(get_current_user)):
    return success_response({"bgm": list_bgm()})
0
backend/app/modules/auth/__init__.py
Normal file
285
backend/app/modules/auth/router.py
Normal file
@@ -0,0 +1,285 @@
|
||||
"""
|
||||
Auth API: register, login, logout, change password
|
||||
"""
|
||||
from fastapi import APIRouter, HTTPException, Response, status, Request, Depends
|
||||
from fastapi.responses import JSONResponse
|
||||
from pydantic import BaseModel, field_validator
|
||||
from app.core.security import (
|
||||
get_password_hash,
|
||||
verify_password,
|
||||
create_access_token,
|
||||
generate_session_token,
|
||||
set_auth_cookie,
|
||||
clear_auth_cookie,
|
||||
decode_access_token,
|
||||
create_payment_token,
|
||||
)
|
||||
from app.repositories.sessions import create_session, delete_sessions
|
||||
from app.repositories.users import (
|
||||
create_user,
|
||||
get_user_by_id,
|
||||
get_user_by_phone,
|
||||
user_exists_by_phone,
|
||||
update_user,
|
||||
deactivate_user_if_expired,
|
||||
)
|
||||
from app.core.deps import get_current_user
|
||||
from app.core.response import success_response
|
||||
from loguru import logger
|
||||
from typing import Optional, Any, cast
|
||||
import re
|
||||
|
||||
router = APIRouter(prefix="/api/auth", tags=["认证"])
|
||||
|
||||
|
||||
class RegisterRequest(BaseModel):
|
||||
phone: str
|
||||
password: str
|
||||
username: Optional[str] = None
|
||||
|
||||
@field_validator('phone')
|
||||
@classmethod
|
||||
def validate_phone(cls, v):
|
||||
if not re.match(r'^\d{11}$', v):
|
||||
raise ValueError('手机号必须是11位数字')
|
||||
return v
|
||||
|
||||
|
||||
class LoginRequest(BaseModel):
|
||||
phone: str
|
||||
password: str
|
||||
|
||||
@field_validator('phone')
|
||||
@classmethod
|
||||
def validate_phone(cls, v):
|
||||
if not re.match(r'^\d{11}$', v):
|
||||
raise ValueError('手机号必须是11位数字')
|
||||
return v
|
||||
|
||||
|
||||
class ChangePasswordRequest(BaseModel):
|
||||
old_password: str
|
||||
new_password: str
|
||||
|
||||
@field_validator('new_password')
|
||||
@classmethod
|
||||
def validate_new_password(cls, v):
|
||||
if len(v) < 6:
|
||||
raise ValueError('新密码长度至少6位')
|
||||
return v
|
||||
|
||||
|
||||
class UserResponse(BaseModel):
|
||||
id: str
|
||||
phone: str
|
||||
username: Optional[str]
|
||||
role: str
|
||||
is_active: bool
|
||||
expires_at: Optional[str] = None
|
||||
|
||||
|
||||
@router.post("/register")
|
||||
async def register(request: RegisterRequest):
|
||||
"""
|
||||
用户注册
|
||||
|
||||
注册后状态为 pending,需要管理员激活
|
||||
"""
|
||||
try:
|
||||
if user_exists_by_phone(request.phone):
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="该手机号已注册"
|
||||
)
|
||||
|
||||
# 创建用户
|
||||
password_hash = get_password_hash(request.password)
|
||||
|
||||
create_user({
|
||||
"phone": request.phone,
|
||||
"password_hash": password_hash,
|
||||
"username": request.username or f"用户{request.phone[-4:]}",
|
||||
"role": "pending",
|
||||
"is_active": False
|
||||
})
|
||||
|
||||
logger.info(f"新用户注册: {request.phone}")
|
||||
|
||||
return success_response(message="注册成功,请等待管理员审核激活")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"注册失败: {e}")
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||
detail="注册失败,请稍后重试"
|
||||
)
|
||||
|
||||
|
||||
@router.post("/login")
|
||||
async def login(request: LoginRequest, response: Response):
|
||||
"""
|
||||
User login
|
||||
|
||||
- Verify the password
|
||||
- Check whether the account is activated
|
||||
- Enforce single-device login (a new login kicks out the previous session)
|
||||
"""
|
||||
try:
|
||||
user = cast(dict[str, Any], get_user_by_phone(request.phone) or {})
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="手机号或密码错误"
|
||||
)
|
||||
|
||||
# Verify the password
|
||||
if not verify_password(request.password, user["password_hash"]):
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="手机号或密码错误"
|
||||
)
|
||||
|
||||
# Auto-deactivate on expiry (note: only the DB is updated; the in-memory user dict is not modified)
|
||||
expired = deactivate_user_if_expired(user)
|
||||
if expired:
|
||||
delete_sessions(user["id"])
|
||||
|
||||
# Expired, or not yet activated (newly registered): return payment guidance
|
||||
if expired or not user["is_active"]:
|
||||
payment_token = create_payment_token(user["id"])
|
||||
return JSONResponse(
|
||||
status_code=403,
|
||||
content={
|
||||
"success": False,
|
||||
"message": "请付费开通会员",
|
||||
"code": 403,
|
||||
"data": {
|
||||
"reason": "PAYMENT_REQUIRED",
|
||||
"payment_token": payment_token,
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
# Generate a new session_token (the new login kicks out the old one)
|
||||
session_token = generate_session_token()
|
||||
|
||||
# Delete the old session, then insert the new one
|
||||
delete_sessions(user["id"])
|
||||
create_session(user["id"], session_token, None)
|
||||
|
||||
# Generate the JWT token
|
||||
token = create_access_token(user["id"], session_token)
|
||||
|
||||
# Set the HttpOnly cookie
|
||||
set_auth_cookie(response, token)
|
||||
|
||||
logger.info(f"用户登录: {request.phone}")
|
||||
|
||||
return success_response(
|
||||
data={
|
||||
"user": UserResponse(
|
||||
id=user["id"],
|
||||
phone=user["phone"],
|
||||
username=user.get("username"),
|
||||
role=user["role"],
|
||||
is_active=user["is_active"],
|
||||
expires_at=user.get("expires_at")
|
||||
).model_dump()
|
||||
},
|
||||
message="登录成功",
|
||||
)
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"登录失败: {e}")
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||
detail="登录失败,请稍后重试"
|
||||
)
|
||||
|
||||
|
||||
@router.post("/logout")
|
||||
async def logout(response: Response):
|
||||
"""用户登出"""
|
||||
clear_auth_cookie(response)
|
||||
return success_response(message="已登出")
|
||||
|
||||
|
||||
@router.post("/change-password")
|
||||
async def change_password(request: ChangePasswordRequest, req: Request, response: Response):
|
||||
"""
|
||||
修改密码
|
||||
|
||||
- 验证当前密码
|
||||
- 设置新密码
|
||||
- 重新生成 session token
|
||||
"""
|
||||
# 从 Cookie 获取用户
|
||||
token = req.cookies.get("access_token")
|
||||
if not token:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="未登录"
|
||||
)
|
||||
|
||||
token_data = decode_access_token(token)
|
||||
if not token_data:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="Token 无效"
|
||||
)
|
||||
|
||||
try:
|
||||
user = cast(dict[str, Any], get_user_by_id(token_data.user_id) or {})
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="用户不存在"
|
||||
)
|
||||
|
||||
# 验证当前密码
|
||||
if not verify_password(request.old_password, user["password_hash"]):
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="当前密码错误"
|
||||
)
|
||||
|
||||
# 更新密码
|
||||
new_password_hash = get_password_hash(request.new_password)
|
||||
update_user(user["id"], {"password_hash": new_password_hash})
|
||||
|
||||
# 生成新的 session token,使旧 token 失效
|
||||
new_session_token = generate_session_token()
|
||||
|
||||
delete_sessions(user["id"])
|
||||
create_session(user["id"], new_session_token, None)
|
||||
|
||||
# 生成新的 JWT Token
|
||||
new_token = create_access_token(user["id"], new_session_token)
|
||||
set_auth_cookie(response, new_token)
|
||||
|
||||
logger.info(f"用户修改密码: {user['phone']}")
|
||||
|
||||
return success_response(message="密码修改成功")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"修改密码失败: {e}")
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
||||
detail="修改密码失败,请稍后重试"
|
||||
)
|
||||
|
||||
|
||||
@router.get("/me")
|
||||
async def get_me(user: dict = Depends(get_current_user)):
|
||||
"""获取当前用户信息"""
|
||||
return success_response(UserResponse(
|
||||
id=user["id"],
|
||||
phone=user["phone"],
|
||||
username=user.get("username"),
|
||||
role=user["role"],
|
||||
is_active=user["is_active"],
|
||||
expires_at=user.get("expires_at")
|
||||
).model_dump())
|
||||
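Review note: the login and change-password routes both rely on the same "latest session wins" pattern: delete the user's sessions, store a fresh token, and embed that token in the JWT. A minimal in-memory sketch of that idea follows. The `_sessions` dict and the `is_session_valid` check are hypothetical stand-ins for the `user_sessions` repository and whatever `get_current_user` does, neither of which is shown in this section.

```python
import secrets
from typing import Dict

# Hypothetical in-memory stand-in for the user_sessions table.
_sessions: Dict[str, str] = {}


def login(user_id: str) -> str:
    """Issue a fresh session token, invalidating any earlier one."""
    token = secrets.token_urlsafe(32)
    _sessions.pop(user_id, None)  # plays the role of delete_sessions(user_id)
    _sessions[user_id] = token    # plays the role of create_session(user_id, token, ...)
    return token


def is_session_valid(user_id: str, token: str) -> bool:
    """A token is valid only if it is the most recently issued one."""
    current = _sessions.get(user_id)
    return current is not None and secrets.compare_digest(current, token)
```

Because the server keeps exactly one token per user, a JWT issued on an older device stops validating as soon as a newer login replaces the stored token.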
0
backend/app/modules/generated_audios/__init__.py
Normal file
77
backend/app/modules/generated_audios/router.py
Normal file
@@ -0,0 +1,77 @@
|
||||
"""Generated voice-over audio API"""
|
||||
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
|
||||
import uuid
|
||||
from loguru import logger
|
||||
|
||||
from app.core.deps import get_current_user
|
||||
from app.core.response import success_response
|
||||
from app.modules.videos.task_store import create_task, get_task
|
||||
from app.modules.generated_audios.schemas import GenerateAudioRequest, RenameAudioRequest
|
||||
from app.modules.generated_audios import service
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
|
||||
@router.post("/generate")
|
||||
async def generate_audio(
|
||||
req: GenerateAudioRequest,
|
||||
background_tasks: BackgroundTasks,
|
||||
user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""异步生成配音(返回 task_id)"""
|
||||
task_id = str(uuid.uuid4())
|
||||
create_task(task_id, user["id"])
|
||||
background_tasks.add_task(service.generate_audio_task, task_id, req, user["id"])
|
||||
return success_response({"task_id": task_id})
|
||||
|
||||
|
||||
@router.get("/tasks/{task_id}")
|
||||
async def get_audio_task(task_id: str, user: dict = Depends(get_current_user)):
|
||||
"""轮询配音生成进度"""
|
||||
task = get_task(task_id)
|
||||
if task.get("status") != "not_found" and task.get("user_id") != user["id"]:
|
||||
return success_response({"status": "not_found"})
|
||||
return success_response(task)
|
||||
|
||||
|
||||
@router.get("")
|
||||
async def list_audios(user: dict = Depends(get_current_user)):
|
||||
"""列出当前用户所有已生成配音"""
|
||||
try:
|
||||
result = await service.list_generated_audios(user["id"])
|
||||
return success_response(result)
|
||||
except Exception as e:
|
||||
logger.error(f"列出配音失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"获取列表失败: {str(e)}")
|
||||
|
||||
|
||||
@router.delete("/{audio_id:path}")
|
||||
async def delete_audio(audio_id: str, user: dict = Depends(get_current_user)):
|
||||
"""删除配音"""
|
||||
try:
|
||||
await service.delete_generated_audio(audio_id, user["id"])
|
||||
return success_response(message="删除成功")
|
||||
except PermissionError as e:
|
||||
raise HTTPException(status_code=403, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"删除配音失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"删除失败: {str(e)}")
|
||||
|
||||
|
||||
@router.put("/{audio_id:path}")
|
||||
async def rename_audio(
|
||||
audio_id: str,
|
||||
request: RenameAudioRequest,
|
||||
user: dict = Depends(get_current_user),
|
||||
):
|
||||
"""重命名配音"""
|
||||
try:
|
||||
result = await service.rename_generated_audio(audio_id, request.new_name, user["id"])
|
||||
return success_response(result, message="重命名成功")
|
||||
except PermissionError as e:
|
||||
raise HTTPException(status_code=403, detail=str(e))
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"重命名配音失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"重命名失败: {str(e)}")
|
||||
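Review note: `get_audio_task` answers `not_found` when the task belongs to another user, so a caller cannot tell whether a foreign `task_id` exists. The sketch below shows that ownership check in isolation. The `_tasks` dict and `get_task_for_user` are hypothetical and only approximate the real `task_store`, whose implementation is not part of this section.

```python
from typing import Any, Dict

# Hypothetical in-memory task store keyed by task_id.
_tasks: Dict[str, Dict[str, Any]] = {}

NOT_FOUND = {"status": "not_found"}


def create_task(task_id: str, user_id: str) -> None:
    """Register a pending task owned by user_id."""
    _tasks[task_id] = {"status": "pending", "user_id": user_id}


def get_task_for_user(task_id: str, user_id: str) -> Dict[str, Any]:
    """Return the task only to its owner; everyone else sees not_found."""
    task = _tasks.get(task_id)
    if task is None or task.get("user_id") != user_id:
        return dict(NOT_FOUND)  # same answer for "missing" and "not yours"
    return task
```

Returning the same body for both cases avoids leaking task existence, at the cost of a less specific error for the legitimate owner of a mistyped ID.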
31
backend/app/modules/generated_audios/schemas.py
Normal file
@@ -0,0 +1,31 @@
from pydantic import BaseModel
from typing import Optional, List


class GenerateAudioRequest(BaseModel):
    text: str
    tts_mode: str = "edgetts"
    voice: str = "zh-CN-YunxiNeural"
    ref_audio_id: Optional[str] = None
    ref_text: Optional[str] = None
    language: str = "zh-CN"
    speed: float = 1.0


class RenameAudioRequest(BaseModel):
    new_name: str


class GeneratedAudioItem(BaseModel):
    id: str
    name: str
    path: str
    duration_sec: float
    text: str
    tts_mode: str
    language: str
    created_at: int


class GeneratedAudioListResponse(BaseModel):
    items: List[GeneratedAudioItem]
264
backend/app/modules/generated_audios/service.py
Normal file
@@ -0,0 +1,264 @@
|
||||
"""Generated voice-over audio: business logic"""
|
||||
import re
|
||||
import json
|
||||
import time
|
||||
import asyncio
|
||||
import subprocess
|
||||
import tempfile
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import httpx
|
||||
from loguru import logger
|
||||
|
||||
from app.services.storage import storage_service
|
||||
from app.services.tts_service import TTSService
|
||||
from app.services.voice_clone_service import voice_clone_service
|
||||
from app.modules.videos.task_store import task_store
|
||||
from app.modules.generated_audios.schemas import (
|
||||
GenerateAudioRequest,
|
||||
GeneratedAudioItem,
|
||||
GeneratedAudioListResponse,
|
||||
)
|
||||
|
||||
BUCKET = "generated-audios"
|
||||
|
||||
|
||||
def _locale_to_tts_lang(locale: str) -> str:
|
||||
mapping = {"zh": "Chinese", "en": "English"}
|
||||
return mapping.get(locale.split("-")[0], "Auto")
|
||||
|
||||
|
||||
def _get_audio_duration(file_path: str) -> float:
|
||||
try:
|
||||
result = subprocess.run(
|
||||
['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
|
||||
'-of', 'csv=p=0', file_path],
|
||||
capture_output=True, text=True, timeout=10
|
||||
)
|
||||
return float(result.stdout.strip())
|
||||
except Exception as e:
|
||||
logger.warning(f"获取音频时长失败: {e}")
|
||||
return 0.0
|
||||
|
||||
|
||||
async def generate_audio_task(task_id: str, req: GenerateAudioRequest, user_id: str):
|
||||
"""Background task: generate the voice-over audio"""
|
||||
try:
|
||||
task_store.update(task_id, {"status": "processing", "progress": 10, "message": "正在生成配音..."})
|
||||
|
||||
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
|
||||
audio_path = tmp.name
|
||||
|
||||
try:
|
||||
if req.tts_mode == "voiceclone":
|
||||
if not req.ref_audio_id or not req.ref_text:
|
||||
raise ValueError("声音克隆模式需要提供参考音频和参考文字")
|
||||
|
||||
task_store.update(task_id, {"progress": 20, "message": "正在下载参考音频..."})
|
||||
|
||||
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_ref:
|
||||
ref_local = tmp_ref.name
|
||||
|
||||
try:
|
||||
ref_url = await storage_service.get_signed_url(
|
||||
bucket="ref-audios", path=req.ref_audio_id
|
||||
)
|
||||
timeout = httpx.Timeout(None)
|
||||
async with httpx.AsyncClient(timeout=timeout) as client:
|
||||
async with client.stream("GET", ref_url) as resp:
|
||||
resp.raise_for_status()
|
||||
with open(ref_local, "wb") as f:
|
||||
async for chunk in resp.aiter_bytes():
|
||||
f.write(chunk)
|
||||
|
||||
task_store.update(task_id, {"progress": 40, "message": "正在克隆声音..."})
|
||||
await voice_clone_service.generate_audio(
|
||||
text=req.text,
|
||||
ref_audio_path=ref_local,
|
||||
ref_text=req.ref_text,
|
||||
output_path=audio_path,
|
||||
language=_locale_to_tts_lang(req.language),
|
||||
speed=req.speed,
|
||||
)
|
||||
finally:
|
||||
if os.path.exists(ref_local):
|
||||
os.unlink(ref_local)
|
||||
else:
|
||||
task_store.update(task_id, {"progress": 30, "message": "正在生成语音..."})
|
||||
tts = TTSService()
|
||||
await tts.generate_audio(req.text, req.voice, audio_path)
|
||||
|
||||
task_store.update(task_id, {"progress": 70, "message": "正在上传配音..."})
|
||||
|
||||
duration = _get_audio_duration(audio_path)
|
||||
timestamp = int(time.time())
|
||||
audio_id = f"{user_id}/{timestamp}_audio.wav"
|
||||
meta_id = f"{user_id}/{timestamp}_audio.json"
|
||||
|
||||
# Build display_name
|
||||
now = time.strftime("%Y%m%d_%H%M", time.localtime(timestamp))
|
||||
display_name = f"配音_{now}"
|
||||
|
||||
with open(audio_path, "rb") as f:
|
||||
wav_data = f.read()
|
||||
|
||||
await storage_service.upload_file(
|
||||
bucket=BUCKET, path=audio_id,
|
||||
file_data=wav_data, content_type="audio/wav",
|
||||
)
|
||||
|
||||
metadata = {
|
||||
"display_name": display_name,
|
||||
"text": req.text,
|
||||
"tts_mode": req.tts_mode,
|
||||
"voice": req.voice if req.tts_mode == "edgetts" else None,
|
||||
"ref_audio_id": req.ref_audio_id,
|
||||
"language": req.language,
|
||||
"duration_sec": duration,
|
||||
"created_at": timestamp,
|
||||
}
|
||||
await storage_service.upload_file(
|
||||
bucket=BUCKET, path=meta_id,
|
||||
file_data=json.dumps(metadata, ensure_ascii=False).encode("utf-8"),
|
||||
content_type="application/json",
|
||||
)
|
||||
|
||||
signed_url = await storage_service.get_signed_url(BUCKET, audio_id)
|
||||
|
||||
task_store.update(task_id, {
|
||||
"status": "completed",
|
||||
"progress": 100,
|
||||
"message": f"配音生成完成 ({duration:.1f}s)",
|
||||
"output": {
|
||||
"audio_id": audio_id,
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
"duration_sec": duration,
|
||||
"text": req.text,
|
||||
"tts_mode": req.tts_mode,
|
||||
"language": req.language,
|
||||
"created_at": timestamp,
|
||||
},
|
||||
})
|
||||
finally:
|
||||
if os.path.exists(audio_path):
|
||||
os.unlink(audio_path)
|
||||
|
||||
except Exception as e:
|
||||
import traceback
|
||||
task_store.update(task_id, {
|
||||
"status": "failed",
|
||||
"message": f"配音生成失败: {str(e)}",
|
||||
"error": traceback.format_exc(),
|
||||
})
|
||||
logger.error(f"Generate audio failed: {e}")
|
||||
|
||||
|
||||
async def list_generated_audios(user_id: str) -> dict:
|
||||
"""List all generated voice-over audio files for the user"""
|
||||
files = await storage_service.list_files(BUCKET, user_id)
|
||||
wav_files = [f for f in files if f.get("name", "").endswith("_audio.wav")]
|
||||
|
||||
if not wav_files:
|
||||
return GeneratedAudioListResponse(items=[]).model_dump()
|
||||
|
||||
async def fetch_info(f):
|
||||
name = f.get("name", "")
|
||||
storage_path = f"{user_id}/{name}"
|
||||
meta_name = name.replace("_audio.wav", "_audio.json")
|
||||
meta_path = f"{user_id}/{meta_name}"
|
||||
|
||||
display_name = name
|
||||
text = ""
|
||||
tts_mode = "edgetts"
|
||||
language = "zh-CN"
|
||||
duration_sec = 0.0
|
||||
created_at = 0
|
||||
|
||||
try:
|
||||
meta_url = await storage_service.get_signed_url(BUCKET, meta_path)
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
resp = await client.get(meta_url)
|
||||
if resp.status_code == 200:
|
||||
meta = resp.json()
|
||||
display_name = meta.get("display_name", name)
|
||||
text = meta.get("text", "")
|
||||
tts_mode = meta.get("tts_mode", "edgetts")
|
||||
language = meta.get("language", "zh-CN")
|
||||
duration_sec = meta.get("duration_sec", 0.0)
|
||||
created_at = meta.get("created_at", 0)
|
||||
except Exception as e:
|
||||
logger.debug(f"读取配音 metadata 失败: {e}")
|
||||
try:
|
||||
created_at = int(name.split("_")[0])
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
signed_url = await storage_service.get_signed_url(BUCKET, storage_path)
|
||||
|
||||
return GeneratedAudioItem(
|
||||
id=storage_path,
|
||||
name=display_name,
|
||||
path=signed_url,
|
||||
duration_sec=duration_sec,
|
||||
text=text,
|
||||
tts_mode=tts_mode,
|
||||
language=language,
|
||||
created_at=created_at,
|
||||
)
|
||||
|
||||
items = await asyncio.gather(*[fetch_info(f) for f in wav_files])
|
||||
items = sorted(items, key=lambda x: x.created_at, reverse=True)
|
||||
return GeneratedAudioListResponse(items=items).model_dump()
|
||||
|
||||
|
||||
async def delete_generated_audio(audio_id: str, user_id: str) -> None:
|
||||
if not audio_id.startswith(f"{user_id}/"):
|
||||
raise PermissionError("无权删除此文件")
|
||||
|
||||
await storage_service.delete_file(BUCKET, audio_id)
|
||||
meta_path = audio_id.replace("_audio.wav", "_audio.json")
|
||||
try:
|
||||
await storage_service.delete_file(BUCKET, meta_path)
|
||||
except Exception:  # best effort: the metadata file may not exist
|
||||
pass
|
||||
|
||||
|
||||
async def rename_generated_audio(audio_id: str, new_name: str, user_id: str) -> dict:
|
||||
if not audio_id.startswith(f"{user_id}/"):
|
||||
raise PermissionError("无权修改此文件")
|
||||
|
||||
new_name = new_name.strip()
|
||||
if not new_name:
|
||||
raise ValueError("新名称不能为空")
|
||||
|
||||
meta_path = audio_id.replace("_audio.wav", "_audio.json")
|
||||
try:
|
||||
meta_url = await storage_service.get_signed_url(BUCKET, meta_path)
|
||||
async with httpx.AsyncClient() as client:
|
||||
resp = await client.get(meta_url)
|
||||
if resp.status_code == 200:
|
||||
metadata = resp.json()
|
||||
else:
|
||||
raise Exception(f"Failed to fetch metadata: {resp.status_code}")
|
||||
except Exception as e:
|
||||
logger.warning(f"无法读取配音元数据: {e}, 将创建新的")
|
||||
metadata = {
|
||||
"display_name": new_name,
|
||||
"text": "",
|
||||
"tts_mode": "edgetts",
|
||||
"language": "zh-CN",
|
||||
"duration_sec": 0.0,
|
||||
"created_at": int(time.time()),
|
||||
}
|
||||
|
||||
metadata["display_name"] = new_name
|
||||
await storage_service.upload_file(
|
||||
bucket=BUCKET,
|
||||
path=meta_path,
|
||||
file_data=json.dumps(metadata, ensure_ascii=False).encode("utf-8"),
|
||||
content_type="application/json",
|
||||
)
|
||||
return {"name": new_name}
|
||||
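Review note: the service derives three things from the storage key `"{user_id}/{timestamp}_audio.wav"`: ownership (prefix check), the metadata path (suffix swap), and a fallback `created_at` (leading timestamp). The sketch below collects these as small pure functions. The helper names are hypothetical, and `meta_path_for` deliberately swaps only the trailing suffix, a slightly stricter variant of the `str.replace` call used in the diff.

```python
AUDIO_SUFFIX = "_audio.wav"
META_SUFFIX = "_audio.json"


def owns_object(audio_id: str, user_id: str) -> bool:
    """True if the storage key lives under the user's own prefix."""
    return audio_id.startswith(f"{user_id}/")


def meta_path_for(audio_id: str) -> str:
    """Map '<user>/<ts>_audio.wav' to its sidecar '<user>/<ts>_audio.json'."""
    if not audio_id.endswith(AUDIO_SUFFIX):
        raise ValueError(f"not a generated-audio key: {audio_id!r}")
    return audio_id[: -len(AUDIO_SUFFIX)] + META_SUFFIX


def created_at_from_name(name: str) -> int:
    """Fallback timestamp parsed from '<ts>_audio.wav'; 0 if unparsable."""
    try:
        return int(name.split("_")[0])
    except ValueError:
        return 0
```

Including the trailing slash in the prefix check matters: without it, user `u1` would match keys that belong to `u10`.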
0
backend/app/modules/login_helper/__init__.py
Normal file
@@ -15,17 +15,19 @@ async def login_helper_page(platform: str, request: Request):
|
||||
登录后JavaScript自动提取Cookie并POST回服务器
|
||||
"""
|
||||
|
||||
platform_urls = {
|
||||
"bilibili": "https://www.bilibili.com/",
|
||||
"douyin": "https://creator.douyin.com/",
|
||||
"xiaohongshu": "https://creator.xiaohongshu.com/"
|
||||
}
|
||||
platform_urls = {
|
||||
"bilibili": "https://www.bilibili.com/",
|
||||
"douyin": "https://creator.douyin.com/",
|
||||
"xiaohongshu": "https://creator.xiaohongshu.com/",
|
||||
"weixin": "https://channels.weixin.qq.com/"
|
||||
}
|
||||
|
||||
platform_names = {
|
||||
"bilibili": "B站",
|
||||
"douyin": "抖音",
|
||||
"xiaohongshu": "小红书"
|
||||
}
|
||||
platform_names = {
|
||||
"bilibili": "B站",
|
||||
"douyin": "抖音",
|
||||
"xiaohongshu": "小红书",
|
||||
"weixin": "微信视频号"
|
||||
}
|
||||
|
||||
if platform not in platform_urls:
|
||||
return "<h1>不支持的平台</h1>"
|
||||
0
backend/app/modules/materials/__init__.py
Normal file
62
backend/app/modules/materials/router.py
Normal file
@@ -0,0 +1,62 @@
|
||||
from fastapi import APIRouter, HTTPException, Request, Depends
|
||||
from loguru import logger
|
||||
|
||||
from app.core.deps import get_current_user
|
||||
from app.core.response import success_response
|
||||
from app.modules.materials.schemas import RenameMaterialRequest
|
||||
from app.modules.materials import service
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
|
||||
@router.post("")
|
||||
async def upload_material(
|
||||
request: Request,
|
||||
current_user: dict = Depends(get_current_user)
|
||||
):
|
||||
user_id = current_user["id"]
|
||||
logger.info(f"Upload material request from user {user_id}")
|
||||
try:
|
||||
result = await service.upload_material(request, user_id)
|
||||
return success_response(result)
|
||||
except ValueError as e:
|
||||
raise HTTPException(400, str(e))
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"Upload failed. Error: {str(e)}")
|
||||
|
||||
|
||||
@router.get("")
|
||||
async def list_materials(current_user: dict = Depends(get_current_user)):
|
||||
user_id = current_user["id"]
|
||||
materials = await service.list_materials(user_id)
|
||||
return success_response({"materials": materials})
|
||||
|
||||
|
||||
@router.delete("/{material_id:path}")
|
||||
async def delete_material(material_id: str, current_user: dict = Depends(get_current_user)):
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
await service.delete_material(material_id, user_id)
|
||||
return success_response(message="素材已删除")
|
||||
except PermissionError as e:
|
||||
raise HTTPException(403, str(e))
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"删除失败: {str(e)}")
|
||||
|
||||
|
||||
@router.put("/{material_id:path}")
|
||||
async def rename_material(
|
||||
material_id: str,
|
||||
payload: RenameMaterialRequest,
|
||||
current_user: dict = Depends(get_current_user)
|
||||
):
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
result = await service.rename_material(material_id, payload.new_name, user_id)
|
||||
return success_response(result, message="重命名成功")
|
||||
except PermissionError as e:
|
||||
raise HTTPException(403, str(e))
|
||||
except ValueError as e:
|
||||
raise HTTPException(400, str(e))
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"重命名失败: {str(e)}")
|
||||
14
backend/app/modules/materials/schemas.py
Normal file
@@ -0,0 +1,14 @@
from pydantic import BaseModel


class RenameMaterialRequest(BaseModel):
    new_name: str


class MaterialItem(BaseModel):
    id: str
    name: str
    path: str
    size_mb: float
    type: str = "video"
    created_at: int = 0
|
||||
296
backend/app/modules/materials/service.py
Normal file
@@ -0,0 +1,296 @@
|
||||
import re
|
||||
import os
|
||||
import time
|
||||
import asyncio
|
||||
import traceback
|
||||
import aiofiles
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
|
||||
from app.services.storage import storage_service
|
||||
|
||||
|
||||
def sanitize_filename(filename: str) -> str:
|
||||
safe_name = re.sub(r'[<>:"/\\|?*]', '_', filename)
|
||||
if len(safe_name) > 100:
|
||||
ext = Path(safe_name).suffix
|
||||
safe_name = safe_name[:100 - len(ext)] + ext
|
||||
return safe_name
|
||||
|
||||
|
||||
def _extract_display_name(storage_name: str) -> str:
    """Extract the display name from a storage filename (strip the timestamp prefix)."""
    if '_' in storage_name:
        parts = storage_name.split('_', 1)
        if parts[0].isdigit():
            return parts[1]
    return storage_name
|
||||
|
||||
|
||||
async def _process_and_upload(temp_file_path: str, original_filename: str, content_type: str, user_id: str) -> str:
|
||||
"""Strip multipart headers and upload to Supabase, return storage_path"""
|
||||
try:
|
||||
logger.info(f"Processing raw upload: {temp_file_path} for user {user_id}")
|
||||
|
||||
file_size = os.path.getsize(temp_file_path)
|
||||
|
||||
with open(temp_file_path, 'rb') as f:
|
||||
head = f.read(4096)
|
||||
|
||||
first_line_end = head.find(b'\r\n')
|
||||
if first_line_end == -1:
|
||||
raise Exception("Could not find boundary in multipart body")
|
||||
|
||||
boundary = head[:first_line_end]
|
||||
logger.info(f"Detected boundary: {boundary}")
|
||||
|
||||
header_end = head.find(b'\r\n\r\n')
|
||||
if header_end == -1:
|
||||
raise Exception("Could not find end of multipart headers")
|
||||
|
||||
start_offset = header_end + 4
|
||||
logger.info(f"Video data starts at offset: {start_offset}")
|
||||
|
||||
f.seek(max(0, file_size - 200))
|
||||
tail = f.read()
|
||||
|
||||
last_boundary_pos = tail.rfind(boundary)
|
||||
if last_boundary_pos != -1:
|
||||
end_offset = (max(0, file_size - 200) + last_boundary_pos) - 2
|
||||
else:
|
||||
logger.warning("Could not find closing boundary, assuming EOF")
|
||||
end_offset = file_size
|
||||
|
||||
logger.info(f"Video data ends at offset: {end_offset}. Total video size: {end_offset - start_offset}")
|
||||
|
||||
video_path = temp_file_path + "_video.mp4"
|
||||
with open(temp_file_path, 'rb') as src, open(video_path, 'wb') as dst:
|
||||
src.seek(start_offset)
|
||||
bytes_to_copy = end_offset - start_offset
|
||||
copied = 0
|
||||
while copied < bytes_to_copy:
|
||||
chunk_size = min(1024 * 1024 * 10, bytes_to_copy - copied)
|
||||
chunk = src.read(chunk_size)
|
||||
if not chunk:
|
||||
break
|
||||
dst.write(chunk)
|
||||
copied += len(chunk)
|
||||
|
||||
logger.info(f"Extracted video content to {video_path}")
|
||||
|
||||
timestamp = int(time.time())
|
||||
safe_name = re.sub(r'[^a-zA-Z0-9._-]', '', original_filename)
|
||||
storage_path = f"{user_id}/{timestamp}_{safe_name}"
|
||||
|
||||
with open(video_path, 'rb') as f:
|
||||
file_content = f.read()
|
||||
await storage_service.upload_file(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=storage_path,
|
||||
file_data=file_content,
|
||||
content_type=content_type
|
||||
)
|
||||
|
||||
logger.info(f"Upload to Supabase complete: {storage_path}")
|
||||
|
||||
os.remove(temp_file_path)
|
||||
os.remove(video_path)
|
||||
|
||||
return storage_path
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Background upload processing failed: {e}\n{traceback.format_exc()}")
|
||||
raise
|
||||
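The header-stripping logic in `_process_and_upload` is easier to follow on an in-memory body. The sketch below is a simplified illustration, not the code in this change: `extract_single_part` is a hypothetical helper, and it assumes a single-part `multipart/form-data` body small enough to hold in memory. The real function works on a temp file with seeks and only scans the last 200 bytes for the closing boundary.

```python
def extract_single_part(body: bytes) -> bytes:
    """Return the payload of the first part of a multipart/form-data body."""
    # The first line of the body is the opening boundary, e.g. b"--XYZ".
    first_line_end = body.find(b"\r\n")
    if first_line_end == -1:
        raise ValueError("could not find boundary in multipart body")
    boundary = body[:first_line_end]

    # Part headers end at the first blank line (CRLF CRLF).
    header_end = body.find(b"\r\n\r\n")
    if header_end == -1:
        raise ValueError("could not find end of multipart headers")
    start = header_end + 4

    # The closing boundary is preceded by CRLF, so step back 2 bytes.
    # If only the opening boundary exists, fall back to end of body.
    closing = body.rfind(boundary)
    end = closing - 2 if closing > start else len(body)
    return body[start:end]


if __name__ == "__main__":
    sample = (
        b"--XYZ\r\n"
        b'Content-Disposition: form-data; name="file"; filename="a.mp4"\r\n'
        b"\r\n"
        b"DATA\r\n"
        b"--XYZ--\r\n"
    )
    print(extract_single_part(sample))
```

As in the original, a missing closing boundary is treated as "payload runs to EOF" rather than as an error.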
|
||||
|
||||
async def upload_material(request, user_id: str) -> dict:
|
||||
"""Receive a streamed upload, store it in Supabase, and return the material info."""
|
||||
filename = "unknown_video.mp4"
|
||||
content_type = "video/mp4"
|
||||
|
||||
timestamp = int(time.time())
|
||||
temp_filename = f"upload_{timestamp}.raw"
|
||||
temp_path = os.path.join("/tmp", temp_filename)
|
||||
if os.name == 'nt':
|
||||
temp_path = f"d:/tmp/{temp_filename}"
|
||||
os.makedirs("d:/tmp", exist_ok=True)
|
||||
|
||||
try:
|
||||
total_size = 0
|
||||
last_log = 0
|
||||
|
||||
async with aiofiles.open(temp_path, 'wb') as f:
|
||||
async for chunk in request.stream():
|
||||
await f.write(chunk)
|
||||
total_size += len(chunk)
|
||||
|
||||
if total_size - last_log > 20 * 1024 * 1024:
|
||||
logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
|
||||
last_log = total_size
|
||||
|
||||
logger.info(f"Stream reception complete. Total size: {total_size} bytes. Saved to {temp_path}")
|
||||
|
||||
if total_size == 0:
|
||||
raise ValueError("Received empty body")
|
||||
|
||||
with open(temp_path, 'rb') as f:
|
||||
head = f.read(4096).decode('utf-8', errors='ignore')
|
||||
match = re.search(r'filename="([^"]+)"', head)
|
||||
if match:
|
||||
filename = match.group(1)
|
||||
logger.info(f"Extracted filename from body: {filename}")
|
||||
|
||||
storage_path = await _process_and_upload(temp_path, filename, content_type, user_id)
|
||||
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=storage_path
|
||||
)
|
||||
|
||||
size_mb = total_size / (1024 * 1024)
|
||||
display_name = _extract_display_name(storage_path.split('/')[-1])
|
||||
|
||||
return {
|
||||
"id": storage_path,
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
"size_mb": size_mb,
|
||||
"type": "video"
|
||||
}
|
||||
|
||||
    except Exception as e:
        error_msg = f"Streaming upload failed: {str(e)}"
        detail_msg = f"Exception: {repr(e)}\nArgs: {e.args}\n{traceback.format_exc()}"
        logger.error(error_msg + "\n" + detail_msg)

        # Best-effort debug log; a logging failure must not mask the original error.
        try:
            with open("debug_upload.log", "a") as logf:
                logf.write(f"\n--- Error at {time.ctime()} ---\n")
                logf.write(detail_msg)
                logf.write("\n-----------------------------\n")
        except Exception:
            pass

        # Best-effort cleanup of the temporary file.
        if os.path.exists(temp_path):
            try:
                os.remove(temp_path)
            except OSError:
                pass
        raise
|
||||
|
||||
|
||||
async def list_materials(user_id: str) -> list[dict]:
|
||||
"""List all of the user's materials."""
|
||||
try:
|
||||
files_obj = await storage_service.list_files(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=user_id
|
||||
)
|
||||
semaphore = asyncio.Semaphore(8)
|
||||
|
||||
async def build_item(f):
|
||||
name = f.get('name')
|
||||
if not name or name == '.emptyFolderPlaceholder':
|
||||
return None
|
||||
display_name = _extract_display_name(name)
|
||||
full_path = f"{user_id}/{name}"
|
||||
async with semaphore:
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=full_path
|
||||
)
|
||||
metadata = f.get('metadata', {})
|
||||
size = metadata.get('size', 0)
|
||||
created_at_str = f.get('created_at', '')
|
||||
created_at = 0
|
||||
if created_at_str:
|
||||
from datetime import datetime
|
||||
try:
|
||||
dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
|
||||
created_at = int(dt.timestamp())
|
||||
except Exception:
|
||||
pass
|
||||
return {
|
||||
"id": full_path,
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
"size_mb": size / (1024 * 1024),
|
||||
"type": "video",
|
||||
"created_at": created_at
|
||||
}
|
||||
|
||||
tasks = [build_item(f) for f in files_obj]
|
||||
results = await asyncio.gather(*tasks, return_exceptions=True)
|
||||
|
||||
materials = []
|
||||
for item in results:
|
||||
if not item:
|
||||
continue
|
||||
if isinstance(item, Exception):
|
||||
logger.warning(f"Material signed url build failed: {item}")
|
||||
continue
|
||||
materials.append(item)
|
||||
materials.sort(key=lambda x: x['id'], reverse=True)
|
||||
return materials
|
||||
except Exception as e:
|
||||
logger.error(f"List materials failed: {e}")
|
||||
return []
|
||||
|
||||
|
||||
async def delete_material(material_id: str, user_id: str) -> None:
|
||||
"""Delete a material."""
|
||||
if not material_id.startswith(f"{user_id}/"):
|
||||
raise PermissionError("无权删除此素材")
|
||||
await storage_service.delete_file(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=material_id
|
||||
)
|
||||
|
||||
|
||||
async def rename_material(material_id: str, new_name_raw: str, user_id: str) -> dict:
|
||||
"""Rename a material and return its updated info."""
|
||||
if not material_id.startswith(f"{user_id}/"):
|
||||
raise PermissionError("无权重命名此素材")
|
||||
|
||||
new_name_raw = new_name_raw.strip() if new_name_raw else ""
|
||||
if not new_name_raw:
|
||||
raise ValueError("新名称不能为空")
|
||||
|
||||
old_name = material_id.split("/", 1)[1]
|
||||
old_ext = Path(old_name).suffix
|
||||
base_name = Path(new_name_raw).stem if Path(new_name_raw).suffix else new_name_raw
|
||||
safe_base = sanitize_filename(base_name).strip()
|
||||
if not safe_base:
|
||||
raise ValueError("新名称无效")
|
||||
|
||||
new_filename = f"{safe_base}{old_ext}"
|
||||
|
||||
prefix = None
|
||||
if "_" in old_name:
|
||||
maybe_prefix, _ = old_name.split("_", 1)
|
||||
if maybe_prefix.isdigit():
|
||||
prefix = maybe_prefix
|
||||
if prefix:
|
||||
new_filename = f"{prefix}_{new_filename}"
|
||||
|
||||
new_path = f"{user_id}/{new_filename}"
|
||||
|
||||
if new_path != material_id:
|
||||
await storage_service.move_file(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
from_path=material_id,
|
||||
to_path=new_path
|
||||
)
|
||||
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=new_path
|
||||
)
|
||||
|
||||
display_name = _extract_display_name(new_filename)
|
||||
|
||||
return {
|
||||
"id": new_path,
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
}
|
||||
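The naming rules used by `rename_material` and `_extract_display_name` (keep the numeric timestamp prefix and the original extension, hide the prefix from the display name) can be sketched as two pure functions. The helper names below are hypothetical, and the sketch deliberately leaves out the `sanitize_filename` step and all storage calls.

```python
from pathlib import Path


def renamed_storage_name(old_name: str, new_name: str) -> str:
    """Build the new storage filename, keeping the old prefix and extension."""
    old_ext = Path(old_name).suffix
    # Drop any extension the user typed; the original extension always wins.
    base = Path(new_name).stem if Path(new_name).suffix else new_name
    prefix, sep, _ = old_name.partition("_")
    if sep and prefix.isdigit():
        return f"{prefix}_{base}{old_ext}"
    return f"{base}{old_ext}"


def display_name(storage_name: str) -> str:
    """Strip a leading '<digits>_' timestamp prefix, if present."""
    prefix, sep, rest = storage_name.partition("_")
    return rest if sep and prefix.isdigit() else storage_name


if __name__ == "__main__":
    new_name = renamed_storage_name("1700000000_clip.mp4", "intro.mov")
    print(new_name, "->", display_name(new_name))
```

Keeping the prefix means the storage key stays unique per upload while the user only ever sees the part after the first underscore.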
0
backend/app/modules/payment/__init__.py
Normal file
52
backend/app/modules/payment/router.py
Normal file
@@ -0,0 +1,52 @@
|
||||
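"""
Payment API: create orders, receive asynchronous notifications, query status.

Follows the BACKEND_DEV.md convention: the router only validates parameters, calls the service, and returns the unified response.
"""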
|
||||
from fastapi import APIRouter, HTTPException, Request, status
|
||||
from fastapi.responses import PlainTextResponse
|
||||
|
||||
from app.core.response import success_response
|
||||
from .schemas import CreateOrderRequest, CreateOrderResponse, OrderStatusResponse
|
||||
from . import service
|
||||
|
||||
router = APIRouter(prefix="/api/payment", tags=["支付"])
|
||||
|
||||
|
||||
@router.post("/create-order")
|
||||
async def create_payment_order(request: CreateOrderRequest):
|
||||
"""Create an Alipay PC website payment order and return the cashier URL."""
|
||||
try:
|
||||
result = service.create_payment_order(request.payment_token)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail=str(e))
|
||||
except RuntimeError as e:
|
||||
raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(e))
|
||||
|
||||
return success_response(
|
||||
CreateOrderResponse(**result).model_dump()
|
||||
)
|
||||
|
||||
|
||||
@router.post("/notify")
|
||||
async def payment_notify(request: Request):
|
||||
"""
Alipay asynchronous notification callback.

Must return the plain text "success" (not JSON); otherwise Alipay keeps re-sending the notification.
"""
|
||||
form_data = await request.form()
|
||||
verified = service.handle_payment_notify(dict(form_data))
|
||||
return PlainTextResponse("success" if verified else "fail")
|
||||
|
||||
|
||||
@router.get("/status/{out_trade_no}")
|
||||
async def check_payment_status(out_trade_no: str):
|
||||
"""Query the payment status of an order (polled by the frontend)."""
|
||||
order_status = service.get_order_status(out_trade_no)
|
||||
if order_status is None:
|
||||
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="订单不存在")
|
||||
|
||||
return success_response(
|
||||
OrderStatusResponse(status=order_status).model_dump()
|
||||
)
|
||||
15
backend/app/modules/payment/schemas.py
Normal file
@@ -0,0 +1,15 @@
from pydantic import BaseModel


class CreateOrderRequest(BaseModel):
    payment_token: str


class CreateOrderResponse(BaseModel):
    pay_url: str
    out_trade_no: str
    amount: float


class OrderStatusResponse(BaseModel):
    status: str
|
||||
137
backend/app/modules/payment/service.py
Normal file
@@ -0,0 +1,137 @@
|
||||
"""
Payment business service.

Responsibilities: wrap the Alipay SDK, create orders, handle payment notifications, query status.
Follows the BACKEND_DEV.md "thin router + thick service" principle.
"""
|
||||
from datetime import datetime, timezone, timedelta
|
||||
import uuid
|
||||
|
||||
from alipay import AliPay
|
||||
from loguru import logger
|
||||
|
||||
from app.core.config import settings
|
||||
from app.core.security import decode_payment_token
|
||||
from app.repositories.orders import create_order, get_order_by_trade_no, update_order_status
|
||||
from app.repositories.users import update_user
|
||||
|
||||
# Alipay gateway URLs
|
||||
ALIPAY_GATEWAY = "https://openapi.alipay.com/gateway.do"
|
||||
ALIPAY_GATEWAY_SANDBOX = "https://openapi-sandbox.dl.alipaydev.com/gateway.do"
|
||||
|
||||
|
||||
def _get_alipay_client() -> AliPay:
    """Lazily initialize the Alipay client."""
    # Read the key files with context managers so the handles are closed promptly.
    with open(settings.ALIPAY_PRIVATE_KEY_PATH) as f:
        app_private_key = f.read()
    with open(settings.ALIPAY_PUBLIC_KEY_PATH) as f:
        alipay_public_key = f.read()
    return AliPay(
        appid=settings.ALIPAY_APP_ID,
        app_notify_url=settings.ALIPAY_NOTIFY_URL,
        app_private_key_string=app_private_key,
        alipay_public_key_string=alipay_public_key,
        sign_type="RSA2",
        debug=settings.ALIPAY_SANDBOX,
    )
|
||||
|
||||
|
||||
def _create_page_pay_url(out_trade_no: str, amount: float, subject: str) -> str | None:
|
||||
"""Call alipay.trade.page.pay and return the Alipay cashier URL."""
|
||||
client = _get_alipay_client()
|
||||
order_string = client.api_alipay_trade_page_pay(
|
||||
subject=subject,
|
||||
out_trade_no=out_trade_no,
|
||||
total_amount=amount,
|
||||
return_url=settings.ALIPAY_RETURN_URL,
|
||||
)
|
||||
if not order_string:
|
||||
logger.error(f"电脑网站支付下单失败: {out_trade_no}")
|
||||
return None
|
||||
|
||||
gateway = ALIPAY_GATEWAY_SANDBOX if settings.ALIPAY_SANDBOX else ALIPAY_GATEWAY
|
||||
pay_url = f"{gateway}?{order_string}"
|
||||
logger.info(f"电脑网站支付下单成功: {out_trade_no}")
|
||||
return pay_url
|
||||
|
||||
|
||||
def _verify_signature(data: dict, signature: str) -> bool:
|
||||
"""Verify the signature of an Alipay asynchronous notification."""
|
||||
client = _get_alipay_client()
|
||||
return client.verify(data, signature)
|
||||
|
||||
|
||||
def create_payment_order(payment_token: str) -> dict:
|
||||
"""
Full flow for creating a payment order.

Returns: {"pay_url": str, "out_trade_no": str, "amount": float}
Raises: ValueError (invalid token), RuntimeError (API failure)
"""
|
||||
user_id = decode_payment_token(payment_token)
|
||||
if not user_id:
|
||||
raise ValueError("付费凭证无效或已过期,请重新登录")
|
||||
|
||||
out_trade_no = f"VG_{int(datetime.now().timestamp())}_{uuid.uuid4().hex[:8]}"
|
||||
amount = settings.PAYMENT_AMOUNT
|
||||
|
||||
create_order(user_id, out_trade_no, amount)
|
||||
|
||||
pay_url = _create_page_pay_url(out_trade_no, amount, "IPAgent 会员开通")
|
||||
if not pay_url:
|
||||
raise RuntimeError("创建支付订单失败,请稍后重试")
|
||||
|
||||
logger.info(f"用户 {user_id} 创建支付订单: {out_trade_no}")
|
||||
|
||||
return {"pay_url": pay_url, "out_trade_no": out_trade_no, "amount": amount}
|
||||
|
||||
|
||||
def handle_payment_notify(form_data: dict) -> bool:
|
||||
"""
Full flow for handling an Alipay asynchronous notification.

Returns: True = signature verified, False = signature verification failed
"""
|
||||
data = dict(form_data)
|
||||
|
||||
signature = data.pop("sign", "")
|
||||
data.pop("sign_type", None)
|
||||
|
||||
if not _verify_signature(data, signature):
|
||||
logger.warning(f"支付宝通知验签失败: {data.get('out_trade_no')}")
|
||||
return False
|
||||
|
||||
out_trade_no = data.get("out_trade_no", "")
|
||||
trade_status = data.get("trade_status", "")
|
||||
trade_no = data.get("trade_no", "")
|
||||
|
||||
logger.info(f"收到支付宝通知: {out_trade_no}, status={trade_status}, trade_no={trade_no}")
|
||||
|
||||
if trade_status not in ("TRADE_SUCCESS", "TRADE_FINISHED"):
|
||||
return True
|
||||
|
||||
order = get_order_by_trade_no(out_trade_no)
|
||||
if not order:
|
||||
logger.warning(f"订单不存在: {out_trade_no}")
|
||||
return True
|
||||
|
||||
if order["status"] == "paid":
|
||||
logger.info(f"订单已处理过: {out_trade_no}")
|
||||
return True
|
||||
|
||||
update_order_status(out_trade_no, "paid", trade_no)
|
||||
|
||||
user_id = order["user_id"]
|
||||
expires_at = (datetime.now(timezone.utc) + timedelta(days=settings.PAYMENT_EXPIRE_DAYS)).isoformat()
|
||||
update_user(user_id, {
|
||||
"is_active": True,
|
||||
"role": "user",
|
||||
"expires_at": expires_at,
|
||||
})
|
||||
|
||||
logger.success(f"用户 {user_id} 支付成功,已激活,有效期至 {expires_at}")
|
||||
return True
|
||||
|
||||
|
||||
def get_order_status(out_trade_no: str) -> str | None:
|
||||
"""Query the payment status of an order."""
|
||||
order = get_order_by_trade_no(out_trade_no)
|
||||
if not order:
|
||||
return None
|
||||
return order["status"]
|
||||
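The decision order inside `handle_payment_notify` (verify, ignore non-final states, ignore unknown or already-paid orders, then activate) is what makes repeated notifications harmless. The sketch below restates that flow over plain dictionaries. `apply_notification`, the in-memory `orders` and `users` stores, and the 365-day default are assumptions for illustration; the real code uses the repository functions and `settings.PAYMENT_EXPIRE_DAYS`.

```python
from datetime import datetime, timedelta, timezone

SUCCESS_STATES = {"TRADE_SUCCESS", "TRADE_FINISHED"}


def apply_notification(orders, users, notice, verified, expire_days=365, now=None):
    """Return True when the gateway should receive "success", False for "fail"."""
    if not verified:
        return False  # bad signature: answer "fail"
    if notice.get("trade_status") not in SUCCESS_STATES:
        return True  # acknowledged, but nothing to activate yet
    order = orders.get(notice.get("out_trade_no", ""))
    if order is None or order["status"] == "paid":
        return True  # unknown order or duplicate notification: no side effects
    order["status"] = "paid"
    order["trade_no"] = notice.get("trade_no", "")
    now = now or datetime.now(timezone.utc)
    users[order["user_id"]] = {
        "is_active": True,
        "expires_at": (now + timedelta(days=expire_days)).isoformat(),
    }
    return True


if __name__ == "__main__":
    orders = {"VG_1": {"user_id": "u1", "status": "pending"}}
    users = {}
    notice = {"out_trade_no": "VG_1", "trade_status": "TRADE_SUCCESS", "trade_no": "T1"}
    print(apply_notification(orders, users, notice, verified=True), users)
```

Because the "already paid" check runs before any write, a notification that Alipay re-sends does not extend the membership a second time.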
0
backend/app/modules/publish/__init__.py
Normal file
@@ -1,13 +1,17 @@
|
||||
"""
|
||||
Publishing management API (with user authentication)
|
||||
"""
|
||||
from fastapi import APIRouter, HTTPException, BackgroundTasks, Depends, Request
|
||||
from pydantic import BaseModel
|
||||
from typing import List, Optional
|
||||
from datetime import datetime
|
||||
from loguru import logger
|
||||
from app.services.publish_service import PublishService
|
||||
from app.core.deps import get_current_user_optional
|
||||
from fastapi import APIRouter, HTTPException, BackgroundTasks, Depends, Request
|
||||
from fastapi.responses import FileResponse
|
||||
from pydantic import BaseModel
|
||||
from typing import List, Optional
|
||||
from datetime import datetime
|
||||
import re
|
||||
from loguru import logger
|
||||
from app.services.publish_service import PublishService
|
||||
from app.core.response import success_response
|
||||
from app.core.config import settings
|
||||
from app.core.deps import get_current_user
|
||||
|
||||
router = APIRouter()
|
||||
publish_service = PublishService()
|
||||
@@ -29,7 +33,7 @@ class PublishResponse(BaseModel):
|
||||
url: Optional[str] = None
|
||||
|
||||
# Supported platforms for validation
|
||||
SUPPORTED_PLATFORMS = {"bilibili", "douyin", "xiaohongshu"}
|
||||
SUPPORTED_PLATFORMS = {"bilibili", "douyin", "xiaohongshu", "weixin"}
|
||||
|
||||
|
||||
def _get_user_id(request: Request) -> Optional[str]:
|
||||
@@ -46,8 +50,8 @@ def _get_user_id(request: Request) -> Optional[str]:
|
||||
return None
|
||||
|
||||
|
||||
@router.post("", response_model=PublishResponse)
|
||||
async def publish_video(request: PublishRequest, req: Request, background_tasks: BackgroundTasks):
|
||||
@router.post("")
|
||||
async def publish_video(request: PublishRequest, req: Request, background_tasks: BackgroundTasks):
|
||||
"""Publish a video to the specified platform."""
|
||||
# Validate platform
|
||||
if request.platform not in SUPPORTED_PLATFORMS:
|
||||
@@ -69,27 +73,23 @@ async def publish_video(request: PublishRequest, req: Request, background_tasks:
|
||||
publish_time=request.publish_time,
|
||||
user_id=user_id
|
||||
)
|
||||
return PublishResponse(
|
||||
success=result.get("success", False),
|
||||
message=result.get("message", ""),
|
||||
platform=request.platform,
|
||||
url=result.get("url")
|
||||
)
|
||||
message = result.get("message", "")
|
||||
return success_response(result, message=message)
|
||||
except Exception as e:
|
||||
logger.error(f"发布失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=str(e))
|
||||
|
||||
@router.get("/platforms")
|
||||
async def list_platforms():
|
||||
return {"platforms": [{**pinfo, "id": pid} for pid, pinfo in publish_service.PLATFORMS.items()]}
|
||||
async def list_platforms():
|
||||
return success_response({"platforms": [{**pinfo, "id": pid} for pid, pinfo in publish_service.PLATFORMS.items()]})
|
||||
|
||||
@router.get("/accounts")
|
||||
async def list_accounts(req: Request):
|
||||
user_id = _get_user_id(req)
|
||||
return {"accounts": publish_service.get_accounts(user_id)}
|
||||
async def list_accounts(req: Request):
|
||||
user_id = _get_user_id(req)
|
||||
return success_response({"accounts": publish_service.get_accounts(user_id)})
|
||||
|
||||
@router.post("/login/{platform}")
|
||||
async def login_platform(platform: str, req: Request):
|
||||
async def login_platform(platform: str, req: Request):
|
||||
"""Trigger QR-code login for the platform."""
|
||||
if platform not in SUPPORTED_PLATFORMS:
|
||||
raise HTTPException(status_code=400, detail=f"不支持的平台: {platform}")
|
||||
@@ -97,32 +97,33 @@ async def login_platform(platform: str, req: Request):
|
||||
user_id = _get_user_id(req)
|
||||
result = await publish_service.login(platform, user_id)
|
||||
|
||||
if result.get("success"):
|
||||
return result
|
||||
else:
|
||||
raise HTTPException(status_code=400, detail=result.get("message"))
|
||||
message = result.get("message", "")
|
||||
return success_response(result, message=message)
|
||||
|
||||
@router.post("/logout/{platform}")
|
||||
async def logout_platform(platform: str, req: Request):
|
||||
async def logout_platform(platform: str, req: Request):
|
||||
"""Log out of the platform."""
|
||||
if platform not in SUPPORTED_PLATFORMS:
|
||||
raise HTTPException(status_code=400, detail=f"不支持的平台: {platform}")
|
||||
|
||||
user_id = _get_user_id(req)
|
||||
result = publish_service.logout(platform, user_id)
|
||||
return result
|
||||
result = publish_service.logout(platform, user_id)
|
||||
message = result.get("message", "")
|
||||
return success_response(result, message=message)
|
||||
|
||||
@router.get("/login/status/{platform}")
|
||||
async def get_login_status(platform: str, req: Request):
|
||||
async def get_login_status(platform: str, req: Request):
|
||||
"""Check login status (an active QR-scan session is checked first)."""
|
||||
if platform not in SUPPORTED_PLATFORMS:
|
||||
raise HTTPException(status_code=400, detail=f"不支持的平台: {platform}")
|
||||
|
||||
user_id = _get_user_id(req)
|
||||
return publish_service.get_login_session_status(platform, user_id)
|
||||
result = publish_service.get_login_session_status(platform, user_id)
|
||||
message = result.get("message", "")
|
||||
return success_response(result, message=message)
|
||||
|
||||
@router.post("/cookies/save/{platform}")
|
||||
async def save_platform_cookie(platform: str, cookie_data: dict, req: Request):
|
||||
@router.post("/cookies/save/{platform}")
|
||||
async def save_platform_cookie(platform: str, cookie_data: dict, req: Request):
|
||||
"""
|
||||
Save the cookie extracted from the client browser
|
||||
|
||||
@@ -140,7 +141,25 @@ async def save_platform_cookie(platform: str, cookie_data: dict, req: Request):
|
||||
user_id = _get_user_id(req)
|
||||
result = await publish_service.save_cookie_string(platform, cookie_string, user_id)
|
||||
|
||||
if result.get("success"):
|
||||
return result
|
||||
else:
|
||||
raise HTTPException(status_code=400, detail=result.get("message"))
|
||||
message = result.get("message", "")
|
||||
return success_response(result, message=message)
|
||||
|
||||
|
||||
@router.get("/screenshot/{filename}")
|
||||
async def get_publish_screenshot(
|
||||
filename: str,
|
||||
current_user: dict = Depends(get_current_user),
|
||||
):
|
||||
if not re.match(r"^[A-Za-z0-9_.-]+$", filename):
|
||||
raise HTTPException(status_code=400, detail="非法文件名")
|
||||
|
||||
user_id = str(current_user.get("id") or "")
|
||||
if not user_id:
|
||||
raise HTTPException(status_code=401, detail="未登录")
|
||||
|
||||
user_dir = re.sub(r"[^A-Za-z0-9_-]", "_", user_id)[:64] or "legacy"
|
||||
file_path = settings.PUBLISH_SCREENSHOT_DIR / user_dir / filename
|
||||
if not file_path.exists() or not file_path.is_file():
|
||||
raise HTTPException(status_code=404, detail="截图不存在")
|
||||
|
||||
return FileResponse(path=str(file_path), media_type="image/png")
|
||||
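The path handling in `get_publish_screenshot` combines a filename allow-list with a sanitized per-user directory. A minimal sketch of just that part follows; `screenshot_path` and the base directory used in the example are hypothetical, and the existence check and `FileResponse` from the route are left out.

```python
import re
from pathlib import Path

_FILENAME_RE = re.compile(r"^[A-Za-z0-9_.-]+$")


def screenshot_path(base_dir: Path, user_id: str, filename: str) -> Path:
    """Resolve a screenshot path under the user's own directory."""
    # Reject anything containing path separators or other unexpected characters.
    if not _FILENAME_RE.match(filename):
        raise ValueError("illegal filename")
    # Map the user id onto a safe directory name, capped at 64 characters.
    user_dir = re.sub(r"[^A-Za-z0-9_-]", "_", user_id)[:64] or "legacy"
    return base_dir / user_dir / filename


if __name__ == "__main__":
    print(screenshot_path(Path("/srv/shots"), "user/42", "a.png"))
```

Note that a name such as `..` matches the character class, so the pattern alone does not rule it out; in the route it is the `is_file()` check that rejects it.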
0
backend/app/modules/ref_audios/__init__.py
Normal file
88
backend/app/modules/ref_audios/router.py
Normal file
@@ -0,0 +1,88 @@
|
||||
"""Reference audio management API"""
|
||||
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
|
||||
from loguru import logger
|
||||
|
||||
from app.core.deps import get_current_user
|
||||
from app.core.response import success_response
|
||||
from app.modules.ref_audios.schemas import RenameRequest
|
||||
from app.modules.ref_audios import service
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
|
||||
@router.post("")
|
||||
async def upload_ref_audio(
|
||||
file: UploadFile = File(...),
|
||||
ref_text: str = Form(""),
|
||||
user: dict = Depends(get_current_user)
|
||||
):
|
||||
"""Upload a reference audio file."""
|
||||
try:
|
||||
result = await service.upload_ref_audio(file, ref_text, user["id"])
|
||||
return success_response(result)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"上传参考音频失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"上传失败: {str(e)}")
|
||||
|
||||
|
||||
@router.get("")
|
||||
async def list_ref_audios(user: dict = Depends(get_current_user)):
|
||||
"""List all reference audio files of the current user."""
|
||||
try:
|
||||
result = await service.list_ref_audios(user["id"])
|
||||
return success_response(result)
|
||||
except Exception as e:
|
||||
logger.error(f"列出参考音频失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"获取列表失败: {str(e)}")
|
||||
|
||||
|
||||
@router.delete("/{audio_id:path}")
|
||||
async def delete_ref_audio(audio_id: str, user: dict = Depends(get_current_user)):
|
||||
"""Delete a reference audio file."""
|
||||
try:
|
||||
await service.delete_ref_audio(audio_id, user["id"])
|
||||
return success_response(message="删除成功")
|
||||
except PermissionError as e:
|
||||
raise HTTPException(status_code=403, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"删除参考音频失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"删除失败: {str(e)}")
|
||||
|
||||
|
||||
@router.put("/{audio_id:path}")
|
||||
async def rename_ref_audio(
|
||||
audio_id: str,
|
||||
request: RenameRequest,
|
||||
user: dict = Depends(get_current_user)
|
||||
):
|
||||
"""Rename a reference audio file."""
|
||||
try:
|
||||
result = await service.rename_ref_audio(audio_id, request.new_name, user["id"])
|
||||
return success_response(result, message="重命名成功")
|
||||
except PermissionError as e:
|
||||
raise HTTPException(status_code=403, detail=str(e))
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"重命名失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"重命名失败: {str(e)}")
|
||||
|
||||
|
||||
@router.post("/{audio_id:path}/retranscribe")
|
||||
async def retranscribe_ref_audio(
|
||||
audio_id: str,
|
||||
user: dict = Depends(get_current_user)
|
||||
):
|
||||
"""Re-transcribe the text content of a reference audio file."""
|
||||
try:
|
||||
result = await service.retranscribe_ref_audio(audio_id, user["id"])
|
||||
return success_response(result, message="识别完成")
|
||||
except PermissionError as e:
|
||||
raise HTTPException(status_code=403, detail=str(e))
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"重新识别失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"识别失败: {str(e)}")
|
||||
19
backend/app/modules/ref_audios/schemas.py
Normal file
@@ -0,0 +1,19 @@
from pydantic import BaseModel
from typing import List


class RefAudioResponse(BaseModel):
    id: str
    name: str
    path: str
    ref_text: str
    duration_sec: float
    created_at: int


class RefAudioListResponse(BaseModel):
    items: List[RefAudioResponse]


class RenameRequest(BaseModel):
    new_name: str
|
||||
405
backend/app/modules/ref_audios/service.py
Normal file
@@ -0,0 +1,405 @@
|
||||
import re
|
||||
import os
|
||||
import time
|
||||
import json
|
||||
import hashlib
|
||||
import asyncio
|
||||
import subprocess
|
||||
import tempfile
|
||||
import unicodedata
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import httpx
|
||||
from loguru import logger
|
||||
|
||||
from app.services.storage import storage_service
|
||||
from app.modules.ref_audios.schemas import RefAudioResponse, RefAudioListResponse
|
||||
|
||||
ALLOWED_AUDIO_EXTENSIONS = {'.wav', '.mp3', '.m4a', '.webm', '.ogg', '.flac', '.aac'}
|
||||
BUCKET_REF_AUDIOS = "ref-audios"
|
||||
|
||||
|
||||
def sanitize_filename(filename: str) -> str:
|
||||
"""Sanitize a filename for use as a Storage key (keep only ASCII-safe characters)."""
|
||||
normalized = unicodedata.normalize("NFKD", filename)
|
||||
ascii_name = normalized.encode("ascii", "ignore").decode("ascii")
|
||||
safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name).strip("._-")
|
||||
|
||||
# Names made up only of Chinese characters, emoji, etc. end up empty; fall back to a stable hash to avoid InvalidKey
|
||||
if not safe_name:
|
||||
digest = hashlib.md5(filename.encode("utf-8")).hexdigest()[:12]
|
||||
safe_name = f"audio_{digest}"
|
||||
|
||||
if len(safe_name) > 50:
|
||||
ext = Path(safe_name).suffix
|
||||
safe_name = safe_name[:50 - len(ext)] + ext
|
||||
return safe_name
|
||||
|
||||
|
||||
def _get_audio_duration(file_path: str) -> float:
|
||||
"""Get the audio duration in seconds."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
|
||||
'-of', 'csv=p=0', file_path],
|
||||
capture_output=True, text=True, timeout=10
|
||||
)
|
||||
return float(result.stdout.strip())
|
||||
except Exception as e:
|
||||
logger.warning(f"获取音频时长失败: {e}")
|
||||
return 0.0
|
||||
|
||||
|
||||
def _find_silence_cut_point(file_path: str, max_duration: float) -> float:
    """Find a silence point near max_duration to cut at; fall back to max_duration if none is found."""
    try:
        # Use silencedetect to find all silent segments (threshold -30 dB, minimum 0.3 s)
        result = subprocess.run(
            ['ffmpeg', '-i', file_path, '-af',
             'silencedetect=noise=-30dB:d=0.3', '-f', 'null', '-'],
            capture_output=True, text=True, timeout=30
        )
        # Parse the silence_end timestamps (the module-level `re` import is sufficient)
        ends = [float(m) for m in re.findall(r'silence_end:\s*([\d.]+)', result.stderr)]
        # Pick the last silence end at or before max_duration (and at least 3 seconds in)
        candidates = [t for t in ends if 3.0 <= t <= max_duration]
        if candidates:
            cut = candidates[-1]
            logger.info(f"Found silence cut point at {cut:.1f}s (max={max_duration}s)")
            return cut
    except Exception as e:
        logger.warning(f"Silence detection failed: {e}")
    return max_duration
|
||||
|
||||
|
||||
def _convert_to_wav(input_path: str, output_path: str, max_duration: float = 0) -> bool:
    """Convert audio to WAV (16 kHz, mono); optionally keep only the first max_duration seconds and fade out."""
    try:
        cmd = ['ffmpeg', '-y', '-i', input_path]
        if max_duration > 0:
            cmd += ['-t', str(max_duration)]
            # Fade out over the last 0.1 s to avoid a click at the cut
            fade_start = max(0, max_duration - 0.1)
            cmd += ['-af', f'afade=t=out:st={fade_start}:d=0.1']
        cmd += ['-ar', '16000', '-ac', '1', '-acodec', 'pcm_s16le', output_path]
        subprocess.run(cmd, capture_output=True, timeout=60, check=True)
        return True
    except Exception as e:
        logger.error(f"Audio conversion failed: {e}")
        return False


async def upload_ref_audio(file, ref_text: str, user_id: str) -> dict:
    """Upload a reference audio: transcode it, measure its duration, and store it in Supabase."""
    if not file.filename:
        raise ValueError("文件名无效")
    filename = file.filename

    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_AUDIO_EXTENSIONS:
        raise ValueError(f"不支持的音频格式: {ext}。支持的格式: {', '.join(ALLOWED_AUDIO_EXTENSIONS)}")

    # Write the upload to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp_input:
        content = await file.read()
        tmp_input.write(content)
        tmp_input_path = tmp_input.name

    try:
        # Convert to WAV
        tmp_wav_path = tmp_input_path + ".wav"
        if not _convert_to_wav(tmp_input_path, tmp_wav_path):
            raise RuntimeError("音频格式转换失败")

        # Measure the audio duration
        duration = _get_audio_duration(tmp_wav_path)
        if duration < 1.0:
            raise ValueError("音频时长过短,至少需要 1 秒")

        # Clips longer than 10 seconds are trimmed at a silence point
        # (CosyVoice works best with 3-10 second references)
        MAX_REF_DURATION = 10.0
        if duration > MAX_REF_DURATION:
            cut_point = _find_silence_cut_point(tmp_wav_path, MAX_REF_DURATION)
            logger.info(f"Ref audio {duration:.1f}s > {MAX_REF_DURATION}s, trimming at {cut_point:.1f}s")
            trimmed_path = tmp_input_path + "_trimmed.wav"
            if not _convert_to_wav(tmp_wav_path, trimmed_path, max_duration=cut_point):
                raise RuntimeError("音频截取失败")
            os.unlink(tmp_wav_path)
            tmp_wav_path = trimmed_path
            duration = _get_audio_duration(tmp_wav_path)

        # Transcribe the reference audio automatically
        try:
            from app.services.whisper_service import whisper_service
            transcribed = await whisper_service.transcribe(tmp_wav_path)
            if transcribed.strip():
                ref_text = transcribed.strip()
                logger.info(f"Auto-transcribed ref audio: {ref_text[:50]}...")
        except Exception as e:
            logger.warning(f"Auto-transcribe failed: {e}")

        if not ref_text or not ref_text.strip():
            raise ValueError("无法识别音频内容,请确保音频包含清晰的语音")

        # Check for duplicate names. Stored objects are named
        # "<timestamp>_<safe_name>.wav", so compare against the sanitized stem
        # rather than the raw upload filename (which may be non-ASCII or non-WAV).
        safe_name = sanitize_filename(Path(filename).stem)
        existing_files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
        dup_count = 0
        search_suffix = f"_{safe_name}.wav"
        for f in existing_files:
            fname = f.get('name', '')
            if fname.endswith(search_suffix):
                dup_count += 1

        final_display_name = filename
        if dup_count > 0:
            name_stem = Path(filename).stem
            name_ext = Path(filename).suffix
            final_display_name = f"{name_stem}({dup_count}){name_ext}"

        # Build the storage path
        timestamp = int(time.time())
        storage_path = f"{user_id}/{timestamp}_{safe_name}.wav"

        # Upload the WAV file
        with open(tmp_wav_path, 'rb') as f:
            wav_data = f.read()

        await storage_service.upload_file(
            bucket=BUCKET_REF_AUDIOS,
            path=storage_path,
            file_data=wav_data,
            content_type="audio/wav"
        )

        # Upload the metadata JSON
        metadata = {
            "ref_text": ref_text.strip(),
            "original_filename": final_display_name,
            "duration_sec": duration,
            "created_at": timestamp
        }
        metadata_path = f"{user_id}/{timestamp}_{safe_name}.json"
        await storage_service.upload_file(
            bucket=BUCKET_REF_AUDIOS,
            path=metadata_path,
            file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
            content_type="application/json"
        )

        # Get a signed URL
        signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)

        return RefAudioResponse(
            id=storage_path,
            # Return the same display name that was written to the metadata
            name=final_display_name,
            path=signed_url,
            ref_text=ref_text.strip(),
            duration_sec=duration,
            created_at=timestamp
        ).model_dump()

    finally:
        # Remove every temporary file, including the trimmed copy when one was created
        for path in (tmp_input_path, tmp_input_path + ".wav", tmp_input_path + "_trimmed.wav"):
            if os.path.exists(path):
                os.unlink(path)


async def list_ref_audios(user_id: str) -> dict:
    """List all reference audios that belong to the user."""
    files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
    wav_files = [f for f in files if f.get("name", "").endswith(".wav")]

    if not wav_files:
        return RefAudioListResponse(items=[]).model_dump()

    async def fetch_audio_info(f):
        name = f.get("name", "")
        storage_path = f"{user_id}/{name}"
        metadata_name = name.replace(".wav", ".json")
        metadata_path = f"{user_id}/{metadata_name}"

        ref_text = ""
        duration_sec = 0.0
        created_at = 0
        original_filename = ""

        try:
            metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
            async with httpx.AsyncClient(timeout=5.0) as client:
                resp = await client.get(metadata_url)
                if resp.status_code == 200:
                    metadata = resp.json()
                    ref_text = metadata.get("ref_text", "")
                    duration_sec = metadata.get("duration_sec", 0.0)
                    created_at = metadata.get("created_at", 0)
                    original_filename = metadata.get("original_filename", "")
        except Exception as e:
            logger.debug(f"Failed to read metadata: {e}")
            # Fall back to the timestamp prefix of the object name
            try:
                created_at = int(name.split("_")[0])
            except ValueError:
                pass

        signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)

        display_name = original_filename if original_filename else name
        if not display_name or display_name == name:
            # Strip the "<timestamp>_" prefix when no display name was stored
            match = re.match(r'^\d+_(.+)$', name)
            if match:
                display_name = match.group(1)

        return RefAudioResponse(
            id=storage_path,
            name=display_name,
            path=signed_url,
            ref_text=ref_text,
            duration_sec=duration_sec,
            created_at=created_at
        )

    items = await asyncio.gather(*[fetch_audio_info(f) for f in wav_files])
    items = sorted(items, key=lambda x: x.created_at, reverse=True)

    return RefAudioListResponse(items=items).model_dump()


async def delete_ref_audio(audio_id: str, user_id: str) -> None:
    """Delete a reference audio and its metadata."""
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权删除此文件")

    await storage_service.delete_file(BUCKET_REF_AUDIOS, audio_id)

    metadata_path = audio_id.replace(".wav", ".json")
    try:
        await storage_service.delete_file(BUCKET_REF_AUDIOS, metadata_path)
    except Exception as e:
        # A missing metadata file is not fatal; the audio itself is already gone
        logger.debug(f"Failed to delete metadata {metadata_path}: {e}")


async def rename_ref_audio(audio_id: str, new_name: str, user_id: str) -> dict:
    """Rename a reference audio (updates the display name stored in its metadata)."""
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权修改此文件")

    new_name = new_name.strip()
    if not new_name:
        raise ValueError("新名称不能为空")

    if not Path(new_name).suffix:
        new_name += ".wav"

    # Download the existing metadata
    metadata_path = audio_id.replace(".wav", ".json")
    try:
        metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
        async with httpx.AsyncClient() as client:
            resp = await client.get(metadata_url)
            if resp.status_code == 200:
                metadata = resp.json()
            else:
                raise Exception(f"Failed to fetch metadata: {resp.status_code}")
    except Exception as e:
        logger.warning(f"Could not read metadata: {e}; creating new metadata")
        metadata = {
            "ref_text": "",
            "duration_sec": 0.0,
            "created_at": int(time.time()),
            "original_filename": new_name
        }

    # Update the name and overwrite the stored metadata
    metadata["original_filename"] = new_name
    await storage_service.upload_file(
        bucket=BUCKET_REF_AUDIOS,
        path=metadata_path,
        file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
        content_type="application/json"
    )

    return {"name": new_name}


async def retranscribe_ref_audio(audio_id: str, user_id: str) -> dict:
    """Re-transcribe the ref_text of a reference audio, trimming it to about 10 seconds and re-uploading it (used to migrate legacy data)."""
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权修改此文件")

    # Download the audio to a temporary file
    audio_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, audio_id)
    tmp_wav_path = None
    trimmed_path = None
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
            tmp_wav_path = tmp.name
            timeout = httpx.Timeout(None)
            async with httpx.AsyncClient(timeout=timeout) as client:
                async with client.stream("GET", audio_url) as resp:
                    resp.raise_for_status()
                    async for chunk in resp.aiter_bytes():
                        tmp.write(chunk)

        # Audio longer than 10 seconds is trimmed at a silence point and re-uploaded
        MAX_REF_DURATION = 10.0
        duration = _get_audio_duration(tmp_wav_path)
        transcribe_path = tmp_wav_path
        need_reupload = False

        if duration > MAX_REF_DURATION:
            cut_point = _find_silence_cut_point(tmp_wav_path, MAX_REF_DURATION)
            logger.info(f"Retranscribe: trimming {audio_id} from {duration:.1f}s at {cut_point:.1f}s")
            trimmed_path = tmp_wav_path + "_trimmed.wav"
            if _convert_to_wav(tmp_wav_path, trimmed_path, max_duration=cut_point):
                transcribe_path = trimmed_path
                duration = _get_audio_duration(trimmed_path)
                need_reupload = True

        # Transcribe with Whisper
        from app.services.whisper_service import whisper_service
        transcribed = await whisper_service.transcribe(transcribe_path)
        if not transcribed or not transcribed.strip():
            raise ValueError("无法识别音频内容")

        ref_text = transcribed.strip()
        logger.info(f"Re-transcribed ref audio {audio_id}: {ref_text[:50]}...")

        # Re-upload the trimmed audio, overwriting the original object
        if need_reupload and trimmed_path:
            with open(trimmed_path, "rb") as f:
                await storage_service.upload_file(
                    bucket=BUCKET_REF_AUDIOS, path=audio_id,
                    file_data=f.read(), content_type="audio/wav",
                )
            logger.info(f"Re-uploaded trimmed audio: {audio_id} ({duration:.1f}s)")

        # Update the metadata
        metadata_path = audio_id.replace(".wav", ".json")
        try:
            meta_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
            async with httpx.AsyncClient(timeout=5.0) as client:
                resp = await client.get(meta_url)
                if resp.status_code == 200:
                    metadata = resp.json()
                else:
                    raise Exception(f"status {resp.status_code}")
        except Exception:
            # No readable metadata: start from an empty record
            metadata = {}

        metadata["ref_text"] = ref_text
        metadata["duration_sec"] = duration
        await storage_service.upload_file(
            bucket=BUCKET_REF_AUDIOS,
            path=metadata_path,
            file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
            content_type="application/json"
        )

        return {"ref_text": ref_text, "duration_sec": duration}
    finally:
        if tmp_wav_path and os.path.exists(tmp_wav_path):
            os.unlink(tmp_wav_path)
        if trimmed_path and os.path.exists(trimmed_path):
            os.unlink(trimmed_path)
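The silence-based trimming in `_find_silence_cut_point` comes down to parsing `silence_end` timestamps from FFmpeg's `silencedetect` log and choosing the last one inside a window. The following is a minimal sketch of that selection step only, run against a hand-written log sample rather than real FFmpeg output. The helper name `pick_cut_point`, the `min_duration` parameter, and the sample text are illustrative assumptions and are not part of the module.

```python
import re


def pick_cut_point(stderr_text: str, max_duration: float, min_duration: float = 3.0) -> float:
    """Return the latest silence_end within [min_duration, max_duration], else max_duration."""
    # Collect every silence_end timestamp found in the log text
    ends = [float(m) for m in re.findall(r"silence_end:\s*([\d.]+)", stderr_text)]
    # Keep only the timestamps that fall inside the allowed window
    candidates = [t for t in ends if min_duration <= t <= max_duration]
    return max(candidates) if candidates else max_duration


# Hand-written sample in the general shape of silencedetect log lines (illustrative only)
SAMPLE_LOG = """
[silencedetect @ 0x1] silence_start: 1.8
[silencedetect @ 0x1] silence_end: 2.1 | silence_duration: 0.3
[silencedetect @ 0x1] silence_start: 9.2
[silencedetect @ 0x1] silence_end: 9.8 | silence_duration: 0.6
[silencedetect @ 0x1] silence_end: 12.3 | silence_duration: 0.4
"""

cut = pick_cut_point(SAMPLE_LOG, max_duration=10.0)
print(f"cut at {cut}s")
```

With this sample, 2.1 is rejected for being under the 3-second floor and 12.3 for exceeding the limit, so the cut lands on the silence that ends at 9.8 seconds. When no silence qualifies, the hard limit is used instead.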
0
backend/app/modules/tools/__init__.py
Normal file
33
backend/app/modules/tools/router.py
Normal file
@@ -0,0 +1,33 @@
from fastapi import APIRouter, UploadFile, File, Form, HTTPException
from typing import Optional
import traceback
from loguru import logger

from app.core.response import success_response
from app.modules.tools import service

router = APIRouter()


@router.post("/extract-script")
async def extract_script_tool(
    file: Optional[UploadFile] = File(None),
    url: Optional[str] = Form(None),
    rewrite: bool = Form(True),
    custom_prompt: Optional[str] = Form(None)
):
    """Standalone script-extraction tool."""
    try:
        result = await service.extract_script(file=file, url=url, rewrite=rewrite, custom_prompt=custom_prompt)
        return success_response(result)
    except ValueError as e:
        # Validation errors are reported to the client as 400
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Tool extract failed: {e}")
        logger.error(traceback.format_exc())
        msg = str(e)
        if "Fresh cookies" in msg:
            # Replace the downloader's anti-bot error with a user-facing hint
            msg = "下载失败:目标平台开启了反爬验证,请过段时间重试或直接上传视频文件。"
        raise HTTPException(500, f"提取失败: {msg}")
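The route above sorts failures into two groups: `ValueError` becomes a 400 carrying the original message, and anything else becomes a 500, with the downloader's "Fresh cookies" error swapped for a friendlier hint. A framework-free sketch of that mapping follows, assuming a hypothetical helper named `to_http_error`; the English messages are stand-ins for the route's own strings, and the real route raises `HTTPException` instead of returning a tuple.

```python
from typing import Tuple


def to_http_error(exc: Exception) -> Tuple[int, str]:
    """Map an exception to (status_code, detail), mirroring the route's except clauses."""
    if isinstance(exc, ValueError):
        # Validation problems are the caller's fault: 400 with the original message
        return 400, str(exc)
    msg = str(exc)
    if "Fresh cookies" in msg:
        # Hide the downloader's raw anti-bot error behind a readable hint
        msg = "download blocked by anti-bot verification; retry later or upload the file"
    return 500, f"extract failed: {msg}"


print(to_http_error(ValueError("a file or a video link is required")))
print(to_http_error(RuntimeError("Fresh cookies are needed")))
```

Keeping the mapping in one pure function like this would make it easy to unit-test without starting the web framework, which is the only reason the sketch returns a tuple.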
7
backend/app/modules/tools/schemas.py
Normal file
@@ -0,0 +1,7 @@
from pydantic import BaseModel
from typing import Optional


class ExtractScriptResponse(BaseModel):
    original_script: Optional[str] = None
    rewritten_script: Optional[str] = None
354
backend/app/modules/tools/service.py
Normal file
@@ -0,0 +1,354 @@
import asyncio
import os
import re
import json
import time
import shutil
import subprocess
import traceback
from pathlib import Path
from typing import Optional, Any
from urllib.parse import unquote

import httpx
from loguru import logger

from app.services.whisper_service import whisper_service
from app.services.glm_service import glm_service


async def extract_script(file=None, url: Optional[str] = None, rewrite: bool = True, custom_prompt: Optional[str] = None) -> dict:
    """
    Script extraction: uploaded file or video link -> Whisper transcription -> (optional) GLM rewrite
    """
    if not file and not url:
        raise ValueError("必须提供文件或视频链接")

    temp_path = None
    audio_path = None
    try:
        timestamp = int(time.time())
        temp_dir = Path("/tmp")
        if os.name == 'nt':
            temp_dir = Path("d:/tmp")
        temp_dir.mkdir(parents=True, exist_ok=True)

        # We are inside a coroutine, so the running loop is the right one to use
        loop = asyncio.get_running_loop()

        # 1. Obtain / save the file
        if file:
            filename = file.filename
            if not filename:
                raise ValueError("文件名无效")
            safe_filename = Path(filename).name.replace(" ", "_")
            temp_path = temp_dir / f"tool_extract_{timestamp}_{safe_filename}"

            def _save_upload() -> None:
                # Close the destination handle so the data is flushed before FFmpeg reads it
                with open(temp_path, "wb") as dst:
                    shutil.copyfileobj(file.file, dst)

            await loop.run_in_executor(None, _save_upload)
            logger.info(f"Tool processing upload file: {temp_path}")
        else:
            temp_path = await _download_video(url, temp_dir, timestamp)

        if not temp_path or not temp_path.exists():
            raise ValueError("文件获取失败")

        # 1.5 Safe conversion: always convert to WAV (16 kHz)
        audio_path = temp_dir / f"extract_audio_{timestamp}.wav"
        try:
            await loop.run_in_executor(None, lambda: _convert_to_wav(temp_path, audio_path))
            logger.info(f"Converted to WAV: {audio_path}")
        except ValueError as ve:
            if str(ve) == "HTML_DETECTED":
                raise ValueError("下载的文件是网页而非视频,请重试或手动上传。")
            else:
                raise ValueError("下载的文件已损坏或格式无法识别。")

        # 2. Extract the script (Whisper)
        script = await whisper_service.transcribe(str(audio_path))

        # 3. AI rewrite (GLM); on failure, degrade gracefully and return the original text
        rewritten = None
        if rewrite and script and len(script.strip()) > 0:
            logger.info("Rewriting script...")
            try:
                rewritten = await glm_service.rewrite_script(script, custom_prompt)
            except Exception as e:
                logger.warning(f"GLM rewrite failed, returning original script: {e}")
                rewritten = None

        return {
            "original_script": script,
            "rewritten_script": rewritten
        }

    finally:
        # Remove both the source file and the intermediate WAV
        for path in (temp_path, audio_path):
            if path and path.exists():
                try:
                    os.remove(path)
                    logger.info(f"Cleaned up temp file: {path}")
                except Exception as e:
                    logger.warning(f"Failed to cleanup temp file {path}: {e}")


def _convert_to_wav(input_path: Path, output_path: Path) -> None:
    """Convert the input to 16 kHz mono WAV with FFmpeg."""
    try:
        convert_cmd = [
            'ffmpeg',
            '-i', str(input_path),
            '-vn',
            '-acodec', 'pcm_s16le',
            '-ar', '16000',
            '-ac', '1',
            '-y',
            str(output_path)
        ]
        subprocess.run(convert_cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        error_log = e.stderr.decode('utf-8', errors='ignore') if e.stderr else str(e)
        logger.error(f"FFmpeg check/convert failed: {error_log}")
        # Peek at the first bytes to tell an HTML page apart from a broken media file
        head = b""
        try:
            with open(input_path, 'rb') as f:
                head = f.read(100)
        except OSError:
            pass
        if b'<!DOCTYPE html' in head or b'<html' in head:
            raise ValueError("HTML_DETECTED")
        raise ValueError("CONVERT_FAILED")


async def _download_video(url: str, temp_dir: Path, timestamp: int) -> Path:
    """Download a video (yt-dlp first, falling back to manual parsing on failure)."""
    url_value = url
    # Share text often wraps the link in other words; keep only the first URL
    url_match = re.search(r'https?://[^\s]+', url_value)
    if url_match:
        extracted_url = url_match.group(0)
        logger.info(f"Extracted URL from text: {extracted_url}")
        url_value = extracted_url

    logger.info(f"Tool downloading URL: {url_value}")
    loop = asyncio.get_running_loop()

    # Try yt-dlp first
    try:
        temp_path = await loop.run_in_executor(None, lambda: _download_yt_dlp(url_value, temp_dir, timestamp))
        logger.info(f"yt-dlp downloaded to: {temp_path}")
        return temp_path
    except Exception as e:
        logger.warning(f"yt-dlp download failed: {e}. Trying manual fallback...")

        if "douyin" in url_value:
            manual_path = await _download_douyin_manual(url_value, temp_dir, timestamp)
            if manual_path:
                return manual_path
            raise ValueError(f"视频下载失败。yt-dlp 报错: {str(e)}")
        elif "bilibili" in url_value:
            manual_path = await _download_bilibili_manual(url_value, temp_dir, timestamp)
            if manual_path:
                return manual_path
            raise ValueError(f"视频下载失败。yt-dlp 报错: {str(e)}")
        else:
            raise ValueError(f"视频下载失败: {str(e)}")


def _download_yt_dlp(url_value: str, temp_dir: Path, timestamp: int) -> Path:
    """Download with yt-dlp (blocking call; run it in a thread pool)."""
    import yt_dlp
    logger.info("Attempting download with yt-dlp...")

    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': str(temp_dir / f"tool_download_{timestamp}_%(id)s.%(ext)s"),
        'quiet': True,
        'no_warnings': True,
        'http_headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
            'Referer': 'https://www.douyin.com/',
        }
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url_value, download=True)
        if 'requested_downloads' in info:
            downloaded_file = info['requested_downloads'][0]['filepath']
        else:
            # Rebuild the path from the output template when yt-dlp does not report it
            ext = info.get('ext', 'mp4')
            vid_id = info.get('id')
            downloaded_file = str(temp_dir / f"tool_download_{timestamp}_{vid_id}.{ext}")

    return Path(downloaded_file)


async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
    """Manual Douyin download (fallback): obtain the playback URL from the mobile share page."""
    logger.info(f"[douyin-fallback] Starting download for: {url}")

    try:
        # 1. Resolve the short link and extract the video ID
        headers = {
            "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15"
        }

        async with httpx.AsyncClient(follow_redirects=True, timeout=10.0) as client:
            resp = await client.get(url, headers=headers)
            final_url = str(resp.url)

        logger.info(f"[douyin-fallback] Final URL: {final_url}")

        video_id = None
        match = re.search(r'/video/(\d+)', final_url)
        if match:
            video_id = match.group(1)

        if not video_id:
            logger.error("[douyin-fallback] Could not extract video_id")
            return None

        logger.info(f"[douyin-fallback] Extracted video_id: {video_id}")

        # 2. Fetch a fresh ttwid cookie
        ttwid = ""
        try:
            async with httpx.AsyncClient(timeout=10.0) as client:
                ttwid_resp = await client.post(
                    "https://ttwid.bytedance.com/ttwid/union/register/",
                    json={
                        "region": "cn", "aid": 6383, "needFid": False,
                        "service": "www.douyin.com",
                        "migrate_info": {"ticket": "", "source": "node"},
                        "cbUrlProtocol": "https", "union": True,
                    }
                )
                ttwid = ttwid_resp.cookies.get("ttwid", "")
                logger.info(f"[douyin-fallback] Got fresh ttwid (len={len(ttwid)})")
        except Exception as e:
            logger.warning(f"[douyin-fallback] Failed to get ttwid: {e}")

        # 3. Load the mobile share page to extract the playback URL
        page_headers = {
            "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
            "cookie": f"ttwid={ttwid}" if ttwid else "",
        }

        async with httpx.AsyncClient(follow_redirects=True, timeout=15.0) as client:
            page_resp = await client.get(
                f"https://m.douyin.com/share/video/{video_id}",
                headers=page_headers,
            )

        page_text = page_resp.text
        logger.info(f"[douyin-fallback] Mobile page length: {len(page_text)}")

        # 4. Extract play_addr
        addr_match = re.search(
            r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"',
            page_text,
        )
        if not addr_match:
            logger.error("[douyin-fallback] Could not find play_addr in mobile page")
            return None

        # The page embeds slashes as the literal text \u002F; restore them
        video_url = addr_match.group(2).replace(r"\u002F", "/")
        if video_url.startswith("//"):
            video_url = "https:" + video_url

        logger.info(f"[douyin-fallback] Found video URL: {video_url[:80]}...")

        # 5. Download the video
        temp_path = temp_dir / f"douyin_manual_{timestamp}.mp4"
        download_headers = {
            "Referer": "https://www.douyin.com/",
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
        }

        async with httpx.AsyncClient(timeout=120.0, follow_redirects=True) as client:
            async with client.stream("GET", video_url, headers=download_headers) as dl_resp:
                if dl_resp.status_code == 200:
                    with open(temp_path, "wb") as f:
                        async for chunk in dl_resp.aiter_bytes(chunk_size=8192):
                            f.write(chunk)

                    logger.info(f"[douyin-fallback] Downloaded successfully: {temp_path}")
                    return temp_path
                else:
                    logger.error(f"[douyin-fallback] Download failed: {dl_resp.status_code}")
                    return None

    except Exception as e:
        logger.error(f"[douyin-fallback] Logic failed: {e}")
        return None


async def _download_bilibili_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
    """Manual Bilibili download (Playwright fallback)."""
    from playwright.async_api import async_playwright

    logger.info(f"[Playwright] Starting Bilibili download for: {url}")

    playwright = None
    browser = None
    try:
        playwright = await async_playwright().start()
        browser = await playwright.chromium.launch(headless=True, args=['--no-sandbox', '--disable-setuid-sandbox'])

        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        )

        page = await context.new_page()

        logger.info("[Playwright] Navigating to Bilibili...")
        await page.goto(url, timeout=45000)

        try:
            await page.wait_for_selector('video', timeout=15000)
        except Exception:
            # Keep going: __playinfo__ may still be present without a <video> element
            logger.warning("[Playwright] Video selector timeout")

        playinfo = await page.evaluate("window.__playinfo__")

        audio_url = None

        if playinfo and "data" in playinfo and "dash" in playinfo["data"]:
            dash = playinfo["data"]["dash"]
            if "audio" in dash and dash["audio"]:
                audio_url = dash["audio"][0]["baseUrl"]
                logger.info(f"[Playwright] Found audio stream in __playinfo__: {audio_url[:50]}...")

        if not audio_url:
            logger.warning("[Playwright] Could not find audio in __playinfo__")
            return None

        temp_path = temp_dir / f"bilibili_audio_{timestamp}.m4s"

        try:
            api_request = context.request
            headers = {
                "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
                "Referer": "https://www.bilibili.com/"
            }

            logger.info("[Playwright] Downloading audio stream...")
            response = await api_request.get(audio_url, headers=headers)

            if response.status == 200:
                body = await response.body()
                with open(temp_path, 'wb') as f:
                    f.write(body)

                logger.info(f"[Playwright] Downloaded successfully: {temp_path}")
                return temp_path
            else:
                logger.error(f"[Playwright] API Request failed: {response.status}")
                return None

        except Exception as e:
            logger.error(f"[Playwright] Download logic error: {e}")
            return None

    except Exception as e:
        logger.error(f"[Playwright] Bilibili download failed: {e}")
        return None
    finally:
        if browser:
            await browser.close()
        if playwright:
            await playwright.stop()
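Two small regex steps carry much of the download logic above: pulling the first URL out of pasted share text, and extracting the numeric video ID from the resolved Douyin URL. A minimal sketch of both, using only the standard library; the helper names `first_url` and `douyin_video_id` are hypothetical, and the sample link and ID are made-up placeholders rather than real videos.

```python
import re
from typing import Optional


def first_url(text: str) -> Optional[str]:
    """Return the first http(s) URL embedded in free-form share text, if any."""
    match = re.search(r"https?://[^\s]+", text)
    return match.group(0) if match else None


def douyin_video_id(final_url: str) -> Optional[str]:
    """Return the numeric ID from a resolved URL containing /video/<digits>."""
    match = re.search(r"/video/(\d+)", final_url)
    return match.group(1) if match else None


# Placeholder inputs for illustration only
share_text = "Check this out https://v.douyin.com/abc123/ copy and open the app"
print(first_url(share_text))
print(douyin_video_id("https://www.douyin.com/video/7300000000000000000?previous_page=app"))
```

Because the URL pattern stops at the first whitespace character, any trailing share text is dropped, while punctuation glued directly to the link would still be captured.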
0
backend/app/modules/videos/__init__.py
Normal file
64
backend/app/modules/videos/router.py
Normal file
@@ -0,0 +1,64 @@
from fastapi import APIRouter, BackgroundTasks, Depends
import uuid

from app.core.deps import get_current_user
from app.core.response import success_response

from .schemas import GenerateRequest
from .task_store import create_task, get_task, list_tasks
from .workflow import process_video_generation, get_lipsync_health, get_voiceclone_health
from .service import list_generated_videos, delete_generated_video


router = APIRouter()


@router.post("/generate")
async def generate_video(
    req: GenerateRequest,
    background_tasks: BackgroundTasks,
    current_user: dict = Depends(get_current_user)
):
    user_id = current_user["id"]
    task_id = str(uuid.uuid4())
    create_task(task_id, user_id)
    background_tasks.add_task(process_video_generation, task_id, req, user_id)
    return success_response({"task_id": task_id})


@router.get("/tasks/{task_id}")
async def get_task_status(task_id: str, current_user: dict = Depends(get_current_user)):
    task = get_task(task_id)
    # Verify task ownership: users may only view their own tasks
    if task.get("status") != "not_found" and task.get("user_id") != current_user["id"]:
        return success_response({"status": "not_found"})
    return success_response(task)


@router.get("/tasks")
async def list_tasks_view(current_user: dict = Depends(get_current_user)):
    # Return only the current user's tasks
    all_tasks = list_tasks()
    user_tasks = [t for t in all_tasks if t.get("user_id") == current_user["id"]]
    return success_response({"tasks": user_tasks})


@router.get("/lipsync/health")
async def lipsync_health():
    return success_response(await get_lipsync_health())


@router.get("/voiceclone/health")
async def voiceclone_health():
    return success_response(await get_voiceclone_health())


@router.get("/generated")
async def list_generated(current_user: dict = Depends(get_current_user)):
    return success_response(await list_generated_videos(current_user["id"]))


@router.delete("/generated/{video_id}")
async def delete_generated(video_id: str, current_user: dict = Depends(get_current_user)):
    result = await delete_generated_video(current_user["id"], video_id)
    return success_response(result, message="视频已删除")
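The task routes above hide other users' tasks by answering `not_found` instead of an authorization error, so a caller cannot tell whether a foreign task ID exists. A framework-free sketch of that check, assuming hypothetical helpers `visible_task` and `tasks_for_user` and plain dictionaries in place of the real task store:

```python
from typing import Any, Dict, List

Task = Dict[str, Any]


def visible_task(task: Task, user_id: str) -> Task:
    """Return the task if the user owns it, otherwise a not_found placeholder."""
    if task.get("status") != "not_found" and task.get("user_id") != user_id:
        # Same answer as for a missing task, so existence is not leaked
        return {"status": "not_found"}
    return task


def tasks_for_user(all_tasks: List[Task], user_id: str) -> List[Task]:
    """Keep only the tasks that belong to the given user."""
    return [t for t in all_tasks if t.get("user_id") == user_id]


# Illustrative data only
store = [
    {"task_id": "t1", "user_id": "alice", "status": "processing"},
    {"task_id": "t2", "user_id": "bob", "status": "completed"},
]
print(visible_task(store[1], "alice"))
print(tasks_for_user(store, "alice"))
```

The first call prints the `not_found` placeholder because the task belongs to another user; the second prints only the caller's own task.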
40
backend/app/modules/videos/schemas.py
Normal file
@@ -0,0 +1,40 @@
from pydantic import BaseModel
from typing import Optional, List, Literal


class CustomAssignment(BaseModel):
    material_path: str
    start: float  # start point on the audio timeline
    end: float  # end point on the audio timeline
    source_start: float = 0.0  # clip start within the source video
    source_end: Optional[float] = None  # clip end within the source video (optional)


class GenerateRequest(BaseModel):
    text: str
    voice: str = "zh-CN-YunxiNeural"
    material_path: str
    material_paths: Optional[List[str]] = None
    tts_mode: str = "edgetts"
    ref_audio_id: Optional[str] = None
    ref_text: Optional[str] = None
    language: str = "zh-CN"
    generated_audio_id: Optional[str] = None  # ID of a pre-generated voice-over (inline TTS is skipped when set)
    title: Optional[str] = None
    title_display_mode: Literal["short", "persistent"] = "short"
    title_duration: float = 4.0
    enable_subtitles: bool = True
    subtitle_style_id: Optional[str] = None
    title_style_id: Optional[str] = None
    secondary_title: Optional[str] = None
    secondary_title_style_id: Optional[str] = None
    secondary_title_font_size: Optional[int] = None
    secondary_title_top_margin: Optional[int] = None
    subtitle_font_size: Optional[int] = None
    title_font_size: Optional[int] = None
    title_top_margin: Optional[int] = None
    subtitle_bottom_margin: Optional[int] = None
    bgm_id: Optional[str] = None
    bgm_volume: Optional[float] = 0.2
    custom_assignments: Optional[List[CustomAssignment]] = None
    output_aspect_ratio: Literal["9:16", "16:9"] = "9:16"
87
backend/app/modules/videos/service.py
Normal file
@@ -0,0 +1,87 @@
from fastapi import HTTPException
import asyncio
from pathlib import Path
from loguru import logger

from app.services.storage import storage_service


async def list_generated_videos(user_id: str) -> dict:
    """Read the current user's generated videos from Storage."""
    try:
        files_obj = await storage_service.list_files(
            bucket=storage_service.BUCKET_OUTPUTS,
            path=user_id
        )

        # Cap the number of concurrent signed-URL requests
        semaphore = asyncio.Semaphore(8)

        async def build_item(f):
            name = f.get("name")
            if not name or name == ".emptyFolderPlaceholder":
                return None

            if not name.endswith("_output.mp4"):
                return None

            video_id = Path(name).stem
            full_path = f"{user_id}/{name}"

            async with semaphore:
                signed_url = await storage_service.get_signed_url(
                    bucket=storage_service.BUCKET_OUTPUTS,
                    path=full_path
                )

            metadata = f.get("metadata", {})
            size = metadata.get("size", 0)
            created_at_str = f.get("created_at", "")
            created_at = 0
            if created_at_str:
                from datetime import datetime
                try:
                    dt = datetime.fromisoformat(created_at_str.replace("Z", "+00:00"))
                    created_at = int(dt.timestamp())
                except Exception:
                    pass

            return {
                "id": video_id,
                "name": name,
                "path": signed_url,
                "size_mb": size / (1024 * 1024),
                "created_at": created_at
            }

        tasks = [build_item(f) for f in files_obj]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        videos = []
        for item in results:
            if not item:
                continue
            if isinstance(item, Exception):
                logger.warning(f"Signed url build failed: {item}")
                continue
            videos.append(item)

        # created_at is an integer timestamp, so the fallback must be numeric as well
        videos.sort(key=lambda x: x.get("created_at", 0), reverse=True)
        return {"videos": videos}

    except Exception as e:
        logger.error(f"List generated videos failed: {e}")
        return {"videos": []}


async def delete_generated_video(user_id: str, video_id: str) -> dict:
|
||||
"""删除生成的视频"""
|
||||
try:
|
||||
storage_path = f"{user_id}/{video_id}.mp4"
|
||||
|
||||
await storage_service.delete_file(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path
|
||||
)
|
||||
return {"video_id": video_id}
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"删除失败: {str(e)}")
|
||||
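The listing function above bounds its concurrent signed-URL calls with a semaphore and collects per-item failures instead of aborting the whole batch. A minimal, self-contained sketch of that pattern follows. The names `sign_all` and `fake_sign`, and the `example.invalid` URL, are hypothetical stand-ins for the real storage call, not part of the project.

```python
import asyncio
from typing import List, Optional


async def sign_all(names: List[str], limit: int = 8) -> List[str]:
    """Sign every matching name, running at most `limit` signings at once."""
    semaphore = asyncio.Semaphore(limit)

    async def fake_sign(name: str) -> Optional[str]:
        # Hypothetical stand-in for a storage call that returns a signed URL.
        if not name.endswith("_output.mp4"):
            return None  # skipped items never take a semaphore slot
        async with semaphore:
            await asyncio.sleep(0)  # yield control, as a network call would
            if name.startswith("bad"):
                raise RuntimeError(f"cannot sign {name}")
            return f"https://example.invalid/{name}?sig=demo"

    # return_exceptions=True turns a failure into a result value,
    # so one bad file does not cancel the rest of the batch.
    results = await asyncio.gather(
        *(fake_sign(n) for n in names), return_exceptions=True
    )
    return [r for r in results if isinstance(r, str)]


if __name__ == "__main__":
    print(asyncio.run(sign_all(["a_output.mp4", "notes.txt", "bad_output.mp4"])))
```

Filtering on `isinstance(r, str)` drops both the skipped items (`None`) and the captured exceptions in one pass; the original code handles the two cases separately so that failures can be logged.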
118
backend/app/modules/videos/task_store.py
Normal file
@@ -0,0 +1,118 @@
from typing import Any, Dict, List
import json

from loguru import logger
from app.core.config import settings

try:
    import redis
except Exception:  # pragma: no cover - optional dependency
    redis = None


class InMemoryTaskStore:
    def __init__(self) -> None:
        self._tasks: Dict[str, Dict[str, Any]] = {}

    def create(self, task_id: str, user_id: str) -> Dict[str, Any]:
        task = {
            "status": "pending",
            "task_id": task_id,
            "progress": 0,
            "user_id": user_id,
        }
        self._tasks[task_id] = task
        return task

    def get(self, task_id: str) -> Dict[str, Any]:
        return self._tasks.get(task_id, {"status": "not_found"})

    def list(self) -> List[Dict[str, Any]]:
        return list(self._tasks.values())

    def update(self, task_id: str, updates: Dict[str, Any]) -> Dict[str, Any]:
        task = self._tasks.get(task_id)
        if not task:
            task = {"status": "pending", "task_id": task_id}
            self._tasks[task_id] = task
        task.update(updates)
        return task


class RedisTaskStore:
    def __init__(self, client: "redis.Redis") -> None:
        self._client = client
        self._index_key = "vigent:tasks:index"

    def _key(self, task_id: str) -> str:
        return f"vigent:tasks:{task_id}"

    def create(self, task_id: str, user_id: str) -> Dict[str, Any]:
        task = {
            "status": "pending",
            "task_id": task_id,
            "progress": 0,
            "user_id": user_id,
        }
        self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False))
        self._client.sadd(self._index_key, task_id)
        return task

    def get(self, task_id: str) -> Dict[str, Any]:
        raw = self._client.get(self._key(task_id))
        if not raw:
            return {"status": "not_found"}
        return json.loads(raw)

    def list(self) -> List[Dict[str, Any]]:
        task_ids = list(self._client.smembers(self._index_key) or [])
        if not task_ids:
            return []
        keys = [self._key(task_id) for task_id in task_ids]
        raw_items = self._client.mget(keys)
        tasks = []
        for raw in raw_items:
            if raw:
                try:
                    tasks.append(json.loads(raw))
                except Exception:
                    continue
        return tasks

    def update(self, task_id: str, updates: Dict[str, Any]) -> Dict[str, Any]:
        task = self.get(task_id)
        if task.get("status") == "not_found":
            task = {"status": "pending", "task_id": task_id}
        task.update(updates)
        self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False))
        self._client.sadd(self._index_key, task_id)
        return task


def _build_task_store():
    if redis is None:
        logger.warning("Redis not available, using in-memory task store")
        return InMemoryTaskStore()
    try:
        client = redis.Redis.from_url(settings.REDIS_URL, decode_responses=True)
        client.ping()
        logger.info("Using Redis task store")
        return RedisTaskStore(client)
    except Exception as e:
        logger.warning(f"Redis connection failed, using in-memory task store: {e}")
        return InMemoryTaskStore()


task_store = _build_task_store()


def create_task(task_id: str, user_id: str) -> Dict[str, Any]:
    return task_store.create(task_id, user_id)


def get_task(task_id: str) -> Dict[str, Any]:
    return task_store.get(task_id)


def list_tasks() -> List[Dict[str, Any]]:
    return task_store.list()
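Both stores above share one `update` contract: an unknown task id is upserted as a fresh `pending` record, and the Redis variant round-trips each record through JSON with `ensure_ascii=False` so Chinese progress messages stay readable in storage. The sketch below illustrates that contract only. `DictBackedStore` is a hypothetical class in which a plain dict plays the role of the Redis client; it is not the project's implementation.

```python
import json
from typing import Any, Dict


class DictBackedStore:
    """Hypothetical stand-in: a dict of JSON strings instead of a Redis client."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, task_id: str) -> Dict[str, Any]:
        raw = self._data.get(task_id)
        # A missing key yields a sentinel record rather than raising.
        return json.loads(raw) if raw else {"status": "not_found"}

    def update(self, task_id: str, updates: Dict[str, Any]) -> Dict[str, Any]:
        task = self.get(task_id)
        if task.get("status") == "not_found":
            # Upsert: an update for an unknown id starts a pending record.
            task = {"status": "pending", "task_id": task_id}
        task.update(updates)
        # ensure_ascii=False stores non-ASCII text as-is instead of \uXXXX escapes.
        self._data[task_id] = json.dumps(task, ensure_ascii=False)
        return task


if __name__ == "__main__":
    store = DictBackedStore()
    store.update("t1", {"progress": 5, "message": "正在下载素材..."})
    print(store.get("t1"))
```

Because every update rewrites the whole JSON value, two concurrent writers to the same task could overwrite each other; that is acceptable for progress reporting from a single worker, which is the usage shown in this diff.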
829
backend/app/modules/videos/workflow.py
Normal file
@@ -0,0 +1,829 @@
from typing import Optional, Any, List
from pathlib import Path
import asyncio
import time
import traceback
import httpx
from loguru import logger

from app.core.config import settings
from app.services.tts_service import TTSService
from app.services.video_service import VideoService
from app.services.lipsync_service import LipSyncService
from app.services.voice_clone_service import voice_clone_service
from app.services.assets_service import (
    get_style,
    get_default_style,
    resolve_bgm_path,
    prepare_style_for_remotion,
)
from app.services.storage import storage_service
from app.services.whisper_service import whisper_service
from app.services.remotion_service import remotion_service

from .schemas import GenerateRequest
from .task_store import task_store


def _locale_to_whisper_lang(locale: str) -> str:
    """'en-US' → 'en', 'zh-CN' → 'zh'"""
    return locale.split("-")[0] if "-" in locale else locale


def _locale_to_tts_lang(locale: str) -> str:
    """'zh-CN' → 'Chinese', 'en-US' → 'English', anything else → 'Auto'"""
    mapping = {"zh": "Chinese", "en": "English"}
    return mapping.get(locale.split("-")[0], "Auto")


_lipsync_service: Optional[LipSyncService] = None
_lipsync_ready: Optional[bool] = None
_lipsync_last_check: float = 0


def _get_lipsync_service() -> LipSyncService:
    """Get or create the LipSync service instance (singleton, avoids repeated initialization)."""
    global _lipsync_service
    if _lipsync_service is None:
        _lipsync_service = LipSyncService()
    return _lipsync_service


async def _check_lipsync_ready(force: bool = False) -> bool:
    """Check whether LipSync is ready (cached; no re-check within 5 minutes)."""
    global _lipsync_ready, _lipsync_last_check

    now = time.time()
    if not force and _lipsync_ready is not None and (now - _lipsync_last_check) < 300:
        return bool(_lipsync_ready)

    lipsync = _get_lipsync_service()
    health = await lipsync.check_health()
    _lipsync_ready = health.get("ready", False)
    _lipsync_last_check = now
    print(f"[LipSync] Health check: ready={_lipsync_ready}")
    return bool(_lipsync_ready)


async def _download_material(path_or_url: str, temp_path: Path):
    """Download a material to a temp file (streamed, to save memory)."""
    if path_or_url.startswith("http"):
        timeout = httpx.Timeout(None)
        async with httpx.AsyncClient(timeout=timeout) as client:
            async with client.stream("GET", path_or_url) as resp:
                resp.raise_for_status()
                with open(temp_path, "wb") as f:
                    async for chunk in resp.aiter_bytes():
                        f.write(chunk)
    else:
        src = Path(path_or_url)
        if not src.is_absolute():
            src = settings.BASE_DIR.parent / path_or_url

        if src.exists():
            import shutil
            shutil.copy(src, temp_path)
        else:
            raise FileNotFoundError(f"Material not found: {path_or_url}")


def _update_task(task_id: str, **updates: Any) -> None:
    task_store.update(task_id, updates)
# ── Multi-material helpers ──


def _split_equal(segments: List[dict], material_paths: List[str]) -> List[dict]:
    """Split the audio duration evenly across the materials, snapping each cut to the nearest Whisper word boundary.

    Args:
        segments: Segment list produced by Whisper; each segment contains words (character-level timestamps)
        material_paths: List of material paths

    Returns:
        [{"material_path": "...", "start": 0.0, "end": 5.2, "index": 0}, ...]
    """
    # Flatten all Whisper characters
    all_chars: List[dict] = []
    for seg in segments:
        for w in seg.get("words", []):
            all_chars.append(w)

    n = len(material_paths)

    if not all_chars or n == 0:
        return [{"material_path": material_paths[0] if material_paths else "",
                 "start": 0.0, "end": 99999.0, "index": 0}]

    # The material count cannot exceed the character count, otherwise boundaries would repeat
    if n > len(all_chars):
        logger.warning(f"[MultiMat] 素材数({n}) > 字符数({len(all_chars)}),裁剪为 {len(all_chars)}")
        n = len(all_chars)

    total_start = all_chars[0]["start"]
    total_end = all_chars[-1]["end"]
    seg_dur = (total_end - total_start) / n

    # Compute the N-1 split points, each snapped to the nearest character boundary
    boundaries = [0]  # the first segment starts at character 0
    for i in range(1, n):
        target_time = total_start + i * seg_dur
        # Find the character whose start time is closest to target_time
        best_idx = boundaries[-1] + 1  # at least one past the previous boundary
        best_diff = float("inf")
        for j in range(boundaries[-1] + 1, len(all_chars)):
            diff = abs(all_chars[j]["start"] - target_time)
            if diff < best_diff:
                best_diff = diff
                best_idx = j
            elif diff > best_diff:
                break  # times are increasing, so stop once the difference starts to grow
        boundaries.append(min(best_idx, len(all_chars) - 1))
    boundaries.append(len(all_chars))  # the last segment runs to the end

    # Build the assignments from the boundaries
    assignments: List[dict] = []
    for i in range(n):
        s_idx = boundaries[i]
        e_idx = boundaries[i + 1]
        if s_idx >= len(all_chars) or s_idx >= e_idx:
            continue
        assignments.append({
            "material_path": material_paths[i],
            "start": all_chars[s_idx]["start"],
            "end": all_chars[e_idx - 1]["end"],
            "text": "".join(c["word"] for c in all_chars[s_idx:e_idx]),
            "index": len(assignments),
        })

    if not assignments:
        return [{"material_path": material_paths[0], "start": 0.0, "end": 99999.0, "index": 0}]

    logger.info(f"[MultiMat] 均分 {len(all_chars)} 字为 {len(assignments)} 段")
    for a in assignments:
        dur = a["end"] - a["start"]
        logger.info(f" 段{a['index']}: [{a['start']:.2f}-{a['end']:.2f}s] ({dur:.1f}s) {a['text'][:20]}")

    return assignments
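The boundary search in `_split_equal` can be hard to picture from the loop alone. The following is a simplified sketch of the same idea under stated assumptions: it works on a bare list of word start times, and `snap_boundaries` is a hypothetical helper name. It uses a plain `min` scan rather than the early-exit loop of the original, and it rejects `n > len(starts)` instead of clipping `n`.

```python
from typing import List


def snap_boundaries(starts: List[float], total_end: float, n: int) -> List[int]:
    """Return n + 1 word indices that cut the words into n runs of similar duration."""
    if not starts or not 1 <= n <= len(starts):
        raise ValueError("need at least one word and 1 <= n <= len(starts)")
    total_start = starts[0]
    seg_dur = (total_end - total_start) / n
    bounds = [0]  # the first run starts at word 0
    for i in range(1, n):
        target = total_start + i * seg_dur  # ideal cut time for an exact equal split
        first = bounds[-1] + 1  # every cut must move past the previous one
        best = min(
            range(first, len(starts)),
            key=lambda j: abs(starts[j] - target),
            default=len(starts) - 1,  # no candidates left: reuse the last word
        )
        bounds.append(best)
    bounds.append(len(starts))  # the last run extends to the final word
    return bounds


if __name__ == "__main__":
    # Six words one second apart, three materials: cuts land on words 2 and 4.
    print(snap_boundaries([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], total_end=6.0, n=3))
```

Run `k` then covers the words from `bounds[k]` up to, but not including, `bounds[k + 1]`, which matches how the original turns consecutive boundary pairs into assignments.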
async def process_video_generation(task_id: str, req: GenerateRequest, user_id: str):
    temp_files = []
    try:
        start_time = time.time()

        # ── Determine the material list ──
        material_paths: List[str] = []
        if req.custom_assignments and len(req.custom_assignments) > 1:
            material_paths = [a.material_path for a in req.custom_assignments if a.material_path]
        elif req.material_paths and len(req.material_paths) > 1:
            material_paths = req.material_paths
        else:
            material_paths = [req.material_path]

        is_multi = len(material_paths) > 1
        target_resolution = (1080, 1920) if req.output_aspect_ratio == "9:16" else (1920, 1080)

        logger.info(
            f"[Render] 输出画面比例: {req.output_aspect_ratio}, "
            f"目标分辨率: {target_resolution[0]}x{target_resolution[1]}"
        )

        _update_task(task_id, status="processing", progress=5, message="正在下载素材...")

        temp_dir = settings.UPLOAD_DIR / "temp"
        temp_dir.mkdir(parents=True, exist_ok=True)
        video = VideoService()
        input_material_path: Optional[Path] = None

        # Single-material mode: download the main material
        if not is_multi:
            input_material_path = temp_dir / f"{task_id}_input.mp4"
            temp_files.append(input_material_path)
            await _download_material(material_paths[0], input_material_path)

            # Normalize rotation metadata (e.g. iPhone MOV 1920x1080 + rotation=-90)
            normalized_input_path = temp_dir / f"{task_id}_input_norm.mp4"
            normalized_result = video.normalize_orientation(
                str(input_material_path),
                str(normalized_input_path),
            )
            if normalized_result != str(input_material_path):
                temp_files.append(normalized_input_path)
                input_material_path = normalized_input_path
        _update_task(task_id, message="正在生成语音...", progress=10)

        audio_path = temp_dir / f"{task_id}_audio.wav"
        temp_files.append(audio_path)

        if req.generated_audio_id:
            # New flow: use the pre-generated voice-over
            _update_task(task_id, message="正在下载配音...", progress=12)
            audio_url = await storage_service.get_signed_url(
                bucket="generated-audios",
                path=req.generated_audio_id,
            )
            await _download_material(audio_url, audio_path)

            # Read the language from the voice-over metadata
            meta_path = req.generated_audio_id.replace("_audio.wav", "_audio.json")
            try:
                meta_url = await storage_service.get_signed_url(
                    bucket="generated-audios", path=meta_path,
                )
                # httpx is already imported at module level; no local alias is needed
                async with httpx.AsyncClient(timeout=5.0) as client:
                    resp = await client.get(meta_url)
                    if resp.status_code == 200:
                        meta = resp.json()
                        req.language = meta.get("language", req.language)
                        # Always override the script with the voice-over metadata so subtitles match the audio language
                        meta_text = meta.get("text", "")
                        if meta_text:
                            req.text = meta_text
            except Exception as e:
                logger.warning(f"读取配音元数据失败: {e}")

        elif req.tts_mode == "voiceclone":
            if not req.ref_audio_id or not req.ref_text:
                raise ValueError("声音克隆模式需要提供参考音频和参考文字")

            _update_task(task_id, message="正在下载参考音频...")

            ref_audio_local = temp_dir / f"{task_id}_ref.wav"
            temp_files.append(ref_audio_local)

            ref_audio_url = await storage_service.get_signed_url(
                bucket="ref-audios",
                path=req.ref_audio_id
            )
            await _download_material(ref_audio_url, ref_audio_local)

            _update_task(task_id, message="正在克隆声音...")
            await voice_clone_service.generate_audio(
                text=req.text,
                ref_audio_path=str(ref_audio_local),
                ref_text=req.ref_text,
                output_path=str(audio_path),
                language=_locale_to_tts_lang(req.language)
            )
        else:
            _update_task(task_id, message="正在生成语音 (EdgeTTS)...")
            tts = TTSService()
            await tts.generate_audio(req.text, req.voice, str(audio_path))

        tts_time = time.time() - start_time
        print(f"[Pipeline] TTS completed in {tts_time:.1f}s")

        lipsync = _get_lipsync_service()
        lipsync_video_path = temp_dir / f"{task_id}_lipsync.mp4"
        temp_files.append(lipsync_video_path)

        captions_path = None
        if is_multi:
            # ══════════════════════════════════════
            # Multi-material pipeline
            # ══════════════════════════════════════
            _update_task(task_id, progress=12, message="正在分配素材...")

            if req.custom_assignments and len(req.custom_assignments) == len(material_paths):
                # User-defined assignments: skip the Whisper-based equal split
                assignments = [
                    {
                        "material_path": a.material_path,
                        "start": a.start,
                        "end": a.end,
                        "source_start": a.source_start,
                        "source_end": a.source_end,
                        "index": i,
                    }
                    for i, a in enumerate(req.custom_assignments)
                ]
                # Whisper is still needed to generate subtitles (if enabled)
                captions_path = temp_dir / f"{task_id}_captions.json"
                temp_files.append(captions_path)
                if req.enable_subtitles:
                    _update_task(task_id, message="正在生成字幕 (Whisper)...")
                    try:
                        await whisper_service.align(
                            audio_path=str(audio_path),
                            text=req.text,
                            output_path=str(captions_path),
                            language=_locale_to_whisper_lang(req.language),
                            original_text=req.text,
                        )
                        print(f"[Pipeline] Whisper alignment completed (custom assignments)")
                    except Exception as e:
                        logger.warning(f"Whisper alignment failed: {e}")
                        captions_path = None
                else:
                    captions_path = None
            elif req.custom_assignments:
                logger.warning(
                    f"[MultiMat] custom_assignments 数量({len(req.custom_assignments)})"
                    f" 与素材数量({len(material_paths)})不一致,回退自动分配"
                )

                # Original logic: Whisper → _split_equal
                _update_task(task_id, message="正在生成字幕 (Whisper)...")

                captions_path = temp_dir / f"{task_id}_captions.json"
                temp_files.append(captions_path)

                try:
                    captions_data = await whisper_service.align(
                        audio_path=str(audio_path),
                        text=req.text,
                        output_path=str(captions_path),
                        language=_locale_to_whisper_lang(req.language),
                        original_text=req.text,
                    )
                    print(f"[Pipeline] Whisper alignment completed (multi-material)")
                except Exception as e:
                    logger.warning(f"Whisper alignment failed: {e}")
                    captions_data = None
                    captions_path = None

                _update_task(task_id, progress=15, message="正在分配素材...")

                if captions_data and captions_data.get("segments"):
                    assignments = _split_equal(captions_data["segments"], material_paths)
                else:
                    # Whisper failed → split evenly by duration (no character alignment needed)
                    logger.warning("[MultiMat] Whisper 无数据,按时长均分")
                    audio_dur = video._get_duration(str(audio_path))
                    if audio_dur <= 0:
                        audio_dur = 30.0  # safety fallback
                    seg_dur = audio_dur / len(material_paths)
                    assignments = [
                        {"material_path": material_paths[i], "start": i * seg_dur,
                         "end": (i + 1) * seg_dur, "index": i}
                        for i in range(len(material_paths))
                    ]

            else:
                # Original logic: Whisper → _split_equal
                _update_task(task_id, message="正在生成字幕 (Whisper)...")

                captions_path = temp_dir / f"{task_id}_captions.json"
                temp_files.append(captions_path)

                try:
                    captions_data = await whisper_service.align(
                        audio_path=str(audio_path),
                        text=req.text,
                        output_path=str(captions_path),
                        language=_locale_to_whisper_lang(req.language),
                        original_text=req.text,
                    )
                    print(f"[Pipeline] Whisper alignment completed (multi-material)")
                except Exception as e:
                    logger.warning(f"Whisper alignment failed: {e}")
                    captions_data = None
                    captions_path = None

                _update_task(task_id, progress=15, message="正在分配素材...")

                if captions_data and captions_data.get("segments"):
                    assignments = _split_equal(captions_data["segments"], material_paths)
                else:
                    # Whisper failed → split evenly by duration (no character alignment needed)
                    logger.warning("[MultiMat] Whisper 无数据,按时长均分")
                    audio_dur = video._get_duration(str(audio_path))
                    if audio_dur <= 0:
                        audio_dur = 30.0  # safety fallback
                    seg_dur = audio_dur / len(material_paths)
                    assignments = [
                        {"material_path": material_paths[i], "start": i * seg_dur,
                         "end": (i + 1) * seg_dur, "index": i}
                        for i in range(len(material_paths))
                    ]
            # Extend the segments to cover the full audio: the first starts at 0, the last runs to the end of the audio
            audio_duration = video._get_duration(str(audio_path))
            if assignments and audio_duration > 0:
                assignments[0]["start"] = 0.0
                assignments[-1]["end"] = audio_duration

            num_segments = len(assignments)
            print(f"[Pipeline] Multi-material: {num_segments} segments, {len(material_paths)} materials")

            if num_segments == 0:
                raise RuntimeError("Multi-material: no valid segments after splitting")

            lipsync_start = time.time()

            # ── Step 1: download all materials in parallel and detect their resolutions ──
            material_locals: List[Path] = []
            resolutions = []

            async def _download_and_normalize(i: int, assignment: dict):
                """Download one material and normalize its orientation."""
                material_local = temp_dir / f"{task_id}_material_{i}.mp4"
                temp_files.append(material_local)
                await _download_material(assignment["material_path"], material_local)

                normalized_material = temp_dir / f"{task_id}_material_{i}_norm.mp4"
                # Inside a coroutine, get_running_loop() is preferred over get_event_loop()
                loop = asyncio.get_running_loop()
                normalized_result = await loop.run_in_executor(
                    None,
                    video.normalize_orientation,
                    str(material_local),
                    str(normalized_material),
                )
                if normalized_result != str(material_local):
                    temp_files.append(normalized_material)
                    material_local = normalized_material

                res = video.get_resolution(str(material_local))
                return material_local, res

            download_tasks = [
                _download_and_normalize(i, assignment)
                for i, assignment in enumerate(assignments)
            ]
            download_results = await asyncio.gather(*download_tasks)
            for local, res in download_results:
                material_locals.append(local)
                resolutions.append(res)

            # Unify the resolution according to the aspect ratio chosen by the user
            base_res = target_resolution
            need_scale = any(r != base_res for r in resolutions)
            if need_scale:
                logger.info(f"[MultiMat] 素材分辨率不一致,统一到 {base_res[0]}x{base_res[1]}")

            # ── Step 2: trim each material segment to its duration, in parallel ──
            prepared_segments: List[Optional[Path]] = [None] * num_segments

            async def _prepare_one_segment(i: int, assignment: dict):
                """Trim or loop one material to its target duration."""
                seg_dur = assignment["end"] - assignment["start"]
                prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
                temp_files.append(prepared_path)

                loop = asyncio.get_running_loop()
                await loop.run_in_executor(
                    None,
                    video.prepare_segment,
                    str(material_locals[i]),
                    seg_dur,
                    str(prepared_path),
                    base_res,
                    assignment.get("source_start", 0.0),
                    assignment.get("source_end"),
                    25,
                )
                return i, prepared_path

            _update_task(
                task_id,
                progress=15,
                message=f"正在并行准备 {num_segments} 个素材片段..."
            )

            prepare_tasks = [
                _prepare_one_segment(i, assignment)
                for i, assignment in enumerate(assignments)
            ]
            prepare_results = await asyncio.gather(*prepare_tasks)
            for i, path in prepare_results:
                prepared_segments[i] = path

            # ── Step 3: concatenate all material segments ──
            _update_task(task_id, progress=50, message="正在拼接素材片段...")
            concat_path = temp_dir / f"{task_id}_concat.mp4"
            temp_files.append(concat_path)
            video.concat_videos(
                [str(p) for p in prepared_segments],
                str(concat_path),
                target_fps=25,
            )

            # ── Step 4: a single LatentSync inference pass ──
            is_ready = await _check_lipsync_ready()

            if is_ready:
                _update_task(task_id, progress=55, message="正在合成唇形 (LatentSync)...")
                print(f"[LipSync] Multi-material: single LatentSync on concatenated video")
                try:
                    await lipsync.generate(str(concat_path), str(audio_path), str(lipsync_video_path))
                except Exception as e:
                    logger.warning(f"[LipSync] Failed, fallback to concat without lipsync: {e}")
                    import shutil
                    shutil.copy(str(concat_path), str(lipsync_video_path))
            else:
                print(f"[LipSync] Not ready, using concatenated video without lipsync")
                import shutil
                shutil.copy(str(concat_path), str(lipsync_video_path))

            lipsync_time = time.time() - lipsync_start
            print(f"[Pipeline] Multi-material prepare + concat + LipSync completed in {lipsync_time:.1f}s")
            _update_task(task_id, progress=80)

            # If the user disabled subtitles, clear captions_path (Whisper was used only for sentence splitting)
            if not req.enable_subtitles:
                captions_path = None
        else:
            # ══════════════════════════════════════
            # Single-material pipeline (original logic)
            # ══════════════════════════════════════

            if input_material_path is None:
                raise RuntimeError("单素材流程缺少输入素材")

            # Single material: scale to the target resolution for the chosen aspect ratio and apply source_start
            single_source_start = 0.0
            single_source_end = None
            if req.custom_assignments and len(req.custom_assignments) == 1:
                single_source_start = req.custom_assignments[0].source_start
                single_source_end = req.custom_assignments[0].source_end

            _update_task(task_id, progress=20, message="正在准备素材片段...")
            audio_dur = video._get_duration(str(audio_path))
            if audio_dur <= 0:
                audio_dur = 30.0
            prepared_single_path = temp_dir / f"{task_id}_prepared_single.mp4"
            temp_files.append(prepared_single_path)
            video.prepare_segment(
                str(input_material_path),
                audio_dur,
                str(prepared_single_path),
                target_resolution=target_resolution,
                source_start=single_source_start,
                source_end=single_source_end,
            )
            input_material_path = prepared_single_path

            _update_task(task_id, progress=25)
            _update_task(task_id, message="正在合成唇形 (LatentSync)...", progress=30)

            lipsync_start = time.time()
            is_ready = await _check_lipsync_ready()

            if is_ready:
                print(f"[LipSync] Starting LatentSync inference...")
                _update_task(task_id, progress=35, message="正在运行 LatentSync 推理...")
                await lipsync.generate(str(input_material_path), str(audio_path), str(lipsync_video_path))
            else:
                print(f"[LipSync] LatentSync not ready, copying original video")
                _update_task(task_id, message="唇形同步不可用,使用原始视频...")
                import shutil
                shutil.copy(str(input_material_path), lipsync_video_path)

            lipsync_time = time.time() - lipsync_start
            print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
            _update_task(task_id, progress=80)

            # Single-material mode: Whisper is deferred and runs below, in parallel with BGM
            if not req.enable_subtitles:
                captions_path = None

        _update_task(task_id, progress=85)
        # ── Whisper subtitles + BGM mixing in parallel (both depend only on audio_path) ──
        final_audio_path = audio_path
        _whisper_task = None
        _bgm_task = None

        # In single-material mode Whisper has not run yet; start it here in parallel with BGM
        need_whisper = not is_multi and req.enable_subtitles and captions_path is None
        if need_whisper:
            captions_path = temp_dir / f"{task_id}_captions.json"
            temp_files.append(captions_path)
            _captions_path_str = str(captions_path)

            async def _run_whisper():
                _update_task(task_id, message="正在生成字幕 (Whisper)...", progress=82)
                try:
                    await whisper_service.align(
                        audio_path=str(audio_path),
                        text=req.text,
                        output_path=_captions_path_str,
                        language=_locale_to_whisper_lang(req.language),
                        original_text=req.text,
                    )
                    print(f"[Pipeline] Whisper alignment completed")
                    return True
                except Exception as e:
                    logger.warning(f"Whisper alignment failed, skipping subtitles: {e}")
                    return False

            _whisper_task = _run_whisper()

        if req.bgm_id:
            bgm_path = resolve_bgm_path(req.bgm_id)
            if bgm_path:
                mix_output_path = temp_dir / f"{task_id}_audio_mix.wav"
                temp_files.append(mix_output_path)
                volume = req.bgm_volume if req.bgm_volume is not None else 0.2
                volume = max(0.0, min(float(volume), 1.0))
                _mix_output = str(mix_output_path)
                _bgm_path = str(bgm_path)
                _voice_path = str(audio_path)
                _volume = volume

                async def _run_bgm():
                    _update_task(task_id, message="正在合成背景音乐...", progress=86)
                    # Inside a coroutine, get_running_loop() is preferred over get_event_loop()
                    loop = asyncio.get_running_loop()
                    try:
                        await loop.run_in_executor(
                            None,
                            video.mix_audio,
                            _voice_path,
                            _bgm_path,
                            _mix_output,
                            _volume,
                        )
                        return True
                    except Exception as e:
                        logger.warning(f"BGM mix failed, fallback to voice only: {e}")
                        return False

                _bgm_task = _run_bgm()
            else:
                logger.warning(f"BGM not found: {req.bgm_id}")

        # Wait for Whisper + BGM in parallel
        parallel_tasks = [t for t in (_whisper_task, _bgm_task) if t is not None]
        if parallel_tasks:
            results = await asyncio.gather(*parallel_tasks)
            result_idx = 0
            if _whisper_task is not None:
                if not results[result_idx]:
                    captions_path = None
                result_idx += 1
            if _bgm_task is not None:
                if results[result_idx]:
                    final_audio_path = mix_output_path
        use_remotion = (captions_path and captions_path.exists()) or req.title or req.secondary_title

        subtitle_style = None
        title_style = None
        secondary_title_style = None
        if req.enable_subtitles:
            subtitle_style = get_style("subtitle", req.subtitle_style_id) or get_default_style("subtitle")
        if req.title:
            title_style = get_style("title", req.title_style_id) or get_default_style("title")
        if req.secondary_title:
            secondary_title_style = get_style("title", req.secondary_title_style_id) or get_default_style("title")

        if req.subtitle_font_size and req.enable_subtitles:
            if subtitle_style is None:
                subtitle_style = {}
            subtitle_style["font_size"] = int(req.subtitle_font_size)

        if req.title_font_size and req.title:
            if title_style is None:
                title_style = {}
            title_style["font_size"] = int(req.title_font_size)

        if req.title_top_margin is not None and req.title:
            if title_style is None:
                title_style = {}
            title_style["top_margin"] = int(req.title_top_margin)

        if req.subtitle_bottom_margin is not None and req.enable_subtitles:
            if subtitle_style is None:
                subtitle_style = {}
            subtitle_style["bottom_margin"] = int(req.subtitle_bottom_margin)

        if req.secondary_title_font_size and req.secondary_title:
            if secondary_title_style is None:
                secondary_title_style = {}
            secondary_title_style["font_size"] = int(req.secondary_title_font_size)

        if req.secondary_title_top_margin is not None and req.secondary_title:
            if secondary_title_style is None:
                secondary_title_style = {}
            secondary_title_style["top_margin"] = int(req.secondary_title_top_margin)

        if use_remotion:
            subtitle_style = prepare_style_for_remotion(
                subtitle_style,
                temp_dir,
                f"{task_id}_subtitle_font"
            )
            title_style = prepare_style_for_remotion(
                title_style,
                temp_dir,
                f"{task_id}_title_font"
            )
            secondary_title_style = prepare_style_for_remotion(
                secondary_title_style,
                temp_dir,
                f"{task_id}_secondary_title_font"
            )
        final_output_local_path = temp_dir / f"{task_id}_output.mp4"
        temp_files.append(final_output_local_path)

        if use_remotion:
            _update_task(task_id, message="正在合成视频 (Remotion)...", progress=87)

            composed_video_path = temp_dir / f"{task_id}_composed.mp4"
            temp_files.append(composed_video_path)

            await video.compose(str(lipsync_video_path), str(final_audio_path), str(composed_video_path))

            remotion_health = await remotion_service.check_health()
            if remotion_health.get("ready"):
                try:
                    def on_remotion_progress(percent):
                        mapped = 87 + int(percent * 0.08)
                        _update_task(task_id, progress=mapped)

                    title_display_mode = (
                        req.title_display_mode
                        if req.title_display_mode in ("short", "persistent")
                        else "short"
                    )
                    title_duration = max(0.5, min(float(req.title_duration or 4.0), 30.0))

                    await remotion_service.render(
                        video_path=str(composed_video_path),
                        output_path=str(final_output_local_path),
                        captions_path=str(captions_path) if captions_path else None,
                        title=req.title,
                        title_duration=title_duration,
                        title_display_mode=title_display_mode,
                        fps=25,
                        enable_subtitles=req.enable_subtitles,
                        subtitle_style=subtitle_style,
                        title_style=title_style,
                        secondary_title=req.secondary_title,
                        secondary_title_style=secondary_title_style,
                        on_progress=on_remotion_progress
                    )
                    print(f"[Pipeline] Remotion render completed")
                except Exception as e:
                    logger.warning(f"Remotion render failed, using FFmpeg fallback: {e}")
                    import shutil
                    shutil.copy(str(composed_video_path), final_output_local_path)
            else:
                logger.warning(f"Remotion not ready: {remotion_health.get('error')}, using FFmpeg")
                import shutil
                shutil.copy(str(composed_video_path), final_output_local_path)
        else:
            _update_task(task_id, message="正在合成最终视频...", progress=90)

            await video.compose(str(lipsync_video_path), str(final_audio_path), str(final_output_local_path))

        total_time = time.time() - start_time

        _update_task(task_id, message="正在上传结果...", progress=95)

        storage_path = f"{user_id}/{task_id}_output.mp4"
        await storage_service.upload_file_from_path(
            bucket=storage_service.BUCKET_OUTPUTS,
            storage_path=storage_path,
            local_file_path=str(final_output_local_path),
            content_type="video/mp4"
        )

        signed_url = await storage_service.get_signed_url(
            bucket=storage_service.BUCKET_OUTPUTS,
            path=storage_path
        )

        print(f"[Pipeline] Total generation time: {total_time:.1f}s")
|
||||
|
||||
_update_task(
|
||||
task_id,
|
||||
status="completed",
|
||||
progress=100,
|
||||
message=f"生成完成!耗时 {total_time:.0f} 秒",
|
||||
output=storage_path,
|
||||
download_url=signed_url,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
_update_task(
|
||||
task_id,
|
||||
status="failed",
|
||||
message=f"错误: {str(e)}",
|
||||
error=traceback.format_exc(),
|
||||
)
|
||||
logger.error(f"Generate video failed: {e}")
|
||||
finally:
|
||||
for f in temp_files:
|
||||
try:
|
||||
if f.exists():
|
||||
f.unlink()
|
||||
except Exception as e:
|
||||
print(f"Error cleaning up {f}: {e}")
|
||||
|
||||
|
||||
async def get_lipsync_health():
|
||||
lipsync = _get_lipsync_service()
|
||||
return await lipsync.check_health()
|
||||
|
||||
|
||||
async def get_voiceclone_health():
|
||||
return await voice_clone_service.check_health()
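The Remotion branch above maps the renderer's 0-100 percentage into the 87-95 band of overall task progress (`87 + int(percent * 0.08)`) and clamps `title_duration` to 0.5-30 seconds. A minimal sketch of both calculations follows. The helper names `map_progress` and `clamp_title_duration` are hypothetical, and the clamping of `percent` is an added assumption that the original callback does not perform.

```python
def map_progress(percent: float, start: int = 87, end: int = 95) -> int:
    """Map a 0-100 sub-step percentage into the [start, end] band of overall progress."""
    # Assumption: out-of-range input is clamped; the original callback does not clamp.
    clamped = max(0.0, min(float(percent), 100.0))
    return start + int(clamped * (end - start) / 100)


def clamp_title_duration(value, default: float = 4.0) -> float:
    """Mirror the title_duration expression: fall back to default, then clamp to 0.5-30 s."""
    return max(0.5, min(float(value or default), 30.0))
```

Keeping the band limits as parameters would let other pipeline stages reuse the same helper for their own progress ranges.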
|
||||
0
backend/app/repositories/__init__.py
Normal file
34
backend/app/repositories/orders.py
Normal file
@@ -0,0 +1,34 @@
"""
Order data access layer
"""
from datetime import datetime, timezone
from typing import Any, Dict, Optional, cast

from app.core.supabase import get_supabase


def create_order(user_id: str, out_trade_no: str, amount: float) -> Dict[str, Any]:
    """Insert a new order in the "pending" state and return the created row."""
    supabase = get_supabase()
    result = supabase.table("orders").insert({
        "user_id": user_id,
        "out_trade_no": out_trade_no,
        "amount": amount,
        "status": "pending",
    }).execute()
    return cast(Dict[str, Any], (result.data or [{}])[0])


def get_order_by_trade_no(out_trade_no: str) -> Optional[Dict[str, Any]]:
    """Look up a single order by its merchant-side trade number."""
    supabase = get_supabase()
    result = supabase.table("orders").select("*").eq("out_trade_no", out_trade_no).single().execute()
    return cast(Optional[Dict[str, Any]], result.data or None)


def update_order_status(out_trade_no: str, status: str, trade_no: Optional[str] = None) -> None:
    """Update an order's status; record trade_no when given and paid_at when paid."""
    supabase = get_supabase()
    payload: Dict[str, Any] = {"status": status}
    if trade_no:
        payload["trade_no"] = trade_no
    if status == "paid":
        payload["paid_at"] = datetime.now(timezone.utc).isoformat()
    supabase.table("orders").update(payload).eq("out_trade_no", out_trade_no).execute()
31
backend/app/repositories/sessions.py
Normal file
@@ -0,0 +1,31 @@
from typing import Any, Dict, List, Optional, cast

from app.core.supabase import get_supabase


def get_session(user_id: str, session_token: str) -> Optional[Dict[str, Any]]:
    supabase = get_supabase()
    result = (
        supabase.table("user_sessions")
        .select("*")
        .eq("user_id", user_id)
        .eq("session_token", session_token)
        .execute()
    )
    data = cast(List[Dict[str, Any]], result.data or [])
    return data[0] if data else None


def delete_sessions(user_id: str) -> None:
    supabase = get_supabase()
    supabase.table("user_sessions").delete().eq("user_id", user_id).execute()


def create_session(user_id: str, session_token: str, device_info: Optional[str] = None) -> List[Dict[str, Any]]:
    supabase = get_supabase()
    result = supabase.table("user_sessions").insert({
        "user_id": user_id,
        "session_token": session_token,
        "device_info": device_info,
    }).execute()
    return cast(List[Dict[str, Any]], result.data or [])
70
backend/app/repositories/users.py
Normal file
@@ -0,0 +1,70 @@
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, cast

from app.core.supabase import get_supabase


def get_user_by_phone(phone: str) -> Optional[Dict[str, Any]]:
    supabase = get_supabase()
    result = supabase.table("users").select("*").eq("phone", phone).single().execute()
    return cast(Optional[Dict[str, Any]], result.data or None)


def get_user_by_id(user_id: str) -> Optional[Dict[str, Any]]:
    supabase = get_supabase()
    result = supabase.table("users").select("*").eq("id", user_id).single().execute()
    return cast(Optional[Dict[str, Any]], result.data or None)


def user_exists_by_phone(phone: str) -> bool:
    supabase = get_supabase()
    result = supabase.table("users").select("id").eq("phone", phone).execute()
    return bool(result.data)


def create_user(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    supabase = get_supabase()
    result = supabase.table("users").insert(payload).execute()
    return cast(List[Dict[str, Any]], result.data or [])


def list_users() -> List[Dict[str, Any]]:
    supabase = get_supabase()
    result = supabase.table("users").select("*").order("created_at", desc=True).execute()
    return cast(List[Dict[str, Any]], result.data or [])


def update_user(user_id: str, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    supabase = get_supabase()
    result = supabase.table("users").update(payload).eq("id", user_id).execute()
    return cast(List[Dict[str, Any]], result.data or [])


def _parse_expires_at(expires_at: Any) -> Optional[datetime]:
    try:
        # Rewrite a trailing "Z" as an explicit UTC offset before parsing.
        expires_at_dt = datetime.fromisoformat(str(expires_at).replace("Z", "+00:00"))
    except Exception:
        return None

    # Treat naive timestamps as UTC, then normalize everything to UTC.
    if expires_at_dt.tzinfo is None:
        expires_at_dt = expires_at_dt.replace(tzinfo=timezone.utc)
    return expires_at_dt.astimezone(timezone.utc)


def deactivate_user_if_expired(user: Dict[str, Any]) -> bool:
    """Return True if the membership has expired, deactivating the user if still active."""
    expires_at = user.get("expires_at")
    if not expires_at:
        return False

    expires_at_dt = _parse_expires_at(expires_at)
    if not expires_at_dt:
        return False

    if datetime.now(timezone.utc) <= expires_at_dt:
        return False

    user_id = user.get("id")
    if user.get("is_active") and user_id:
        update_user(cast(str, user_id), {"is_active": False})

    return True
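The expiry check in `_parse_expires_at` and `deactivate_user_if_expired` comes down to two steps: normalize whatever timestamp string is stored to an aware UTC datetime, then compare it against the current time. A self-contained sketch of that logic follows, without the Supabase update. The names `parse_utc` and `is_expired` and the injectable `now` argument are hypothetical, added here only for illustration and testability.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_utc(value: Any) -> Optional[datetime]:
    """Parse an ISO-8601 string into an aware UTC datetime, or None if unparseable."""
    try:
        # Rewrite a trailing "Z" as an explicit offset so fromisoformat accepts it.
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption (same as the original): naive timestamps are meant as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_expired(expires_at: Any, now: Optional[datetime] = None) -> bool:
    """True only when expires_at parses and lies strictly in the past."""
    deadline = parse_utc(expires_at)
    if deadline is None:
        return False
    return (now or datetime.now(timezone.utc)) > deadline
```

As in the original, an unparseable value is treated as "not expired", so malformed data never locks a user out; whether that is the desired default is a product decision.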
128
backend/app/services/assets_service.py
Normal file
@@ -0,0 +1,128 @@
import json
import shutil
from pathlib import Path
from typing import Optional, List, Dict, Any

from loguru import logger

from app.core.config import settings


BGM_EXTENSIONS = {".wav", ".mp3", ".m4a", ".aac", ".flac", ".ogg", ".webm"}


def _style_file_path(style_type: str) -> Path:
    return settings.ASSETS_DIR / "styles" / f"{style_type}.json"


def _load_style_file(style_type: str) -> List[Dict[str, Any]]:
    style_path = _style_file_path(style_type)
    if not style_path.exists():
        return []
    try:
        with open(style_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, list):
            return data
    except Exception as e:
        logger.error(f"Failed to load style file {style_path}: {e}")
    return []


def list_styles(style_type: str) -> List[Dict[str, Any]]:
    return _load_style_file(style_type)


def get_style(style_type: str, style_id: Optional[str]) -> Optional[Dict[str, Any]]:
    if not style_id:
        return None
    for item in _load_style_file(style_type):
        if item.get("id") == style_id:
            return item
    return None


def get_default_style(style_type: str) -> Optional[Dict[str, Any]]:
    styles = _load_style_file(style_type)
    if not styles:
        return None
    for item in styles:
        if item.get("is_default"):
            return item
    return styles[0]


def list_bgm() -> List[Dict[str, Any]]:
    bgm_root = settings.ASSETS_DIR / "bgm"
    if not bgm_root.exists():
        return []

    items: List[Dict[str, Any]] = []
    for path in bgm_root.rglob("*"):
        if not path.is_file():
            continue
        if path.suffix.lower() not in BGM_EXTENSIONS:
            continue
        rel = path.relative_to(bgm_root).as_posix()
        items.append({
            "id": rel,
            "name": path.stem,
            "ext": path.suffix.lower().lstrip(".")
        })

    items.sort(key=lambda x: x.get("name", ""))
    return items


def resolve_bgm_path(bgm_id: str) -> Optional[Path]:
    if not bgm_id:
        return None
    bgm_root = settings.ASSETS_DIR / "bgm"
    candidate = (bgm_root / bgm_id).resolve()
    try:
        candidate.relative_to(bgm_root.resolve())
    except ValueError:
        return None
    if candidate.exists() and candidate.is_file():
        return candidate
    return None


def prepare_style_for_remotion(
    style: Optional[Dict[str, Any]],
    temp_dir: Path,
    prefix: str
) -> Optional[Dict[str, Any]]:
    if not style:
        return None

    prepared = dict(style)
    font_file = prepared.get("font_file")
    if not font_file:
        return prepared

    source_font = (settings.ASSETS_DIR / "fonts" / font_file).resolve()
    try:
        source_font.relative_to((settings.ASSETS_DIR / "fonts").resolve())
    except ValueError:
        logger.warning(f"Font path outside assets: {font_file}")
        return prepared

    if not source_font.exists():
        logger.warning(f"Font file missing: {source_font}")
        return prepared

    temp_dir.mkdir(parents=True, exist_ok=True)
    ext = source_font.suffix.lower()
    target_name = f"{prefix}{ext}"
    target_path = temp_dir / target_name

    try:
        shutil.copy(source_font, target_path)
        prepared["font_file"] = target_name
        if not prepared.get("font_family"):
            prepared["font_family"] = prefix
    except Exception as e:
        logger.warning(f"Failed to copy font {source_font} -> {target_path}: {e}")

    return prepared
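Both `resolve_bgm_path` and `prepare_style_for_remotion` guard against path traversal the same way: resolve the candidate path, then require it to sit under the resolved root via `Path.relative_to`, which raises `ValueError` otherwise. A standalone sketch of that containment check follows, exercised against a temporary directory. The names `resolve_inside` and `demo` and the file layout are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_inside(root: Path, relative_id: str) -> Optional[Path]:
    """Return the file named by relative_id only if it resolves inside root."""
    if not relative_id:
        return None
    root_resolved = root.resolve()
    candidate = (root_resolved / relative_id).resolve()
    try:
        # relative_to raises ValueError when candidate is not under root_resolved.
        candidate.relative_to(root_resolved)
    except ValueError:
        return None
    return candidate if candidate.is_file() else None


def demo() -> dict:
    """Build a throwaway tree and probe it with safe and unsafe identifiers."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        root = base / "bgm"
        (root / "calm").mkdir(parents=True)
        (root / "calm" / "intro.mp3").write_bytes(b"")
        (base / "secret.txt").write_text("x")
        return {
            "inside": resolve_inside(root, "calm/intro.mp3") is not None,
            "traversal": resolve_inside(root, "../secret.txt") is None,
            "missing": resolve_inside(root, "calm/none.mp3") is None,
        }
```

Resolving both sides before comparing matters because `..` segments and symlinks are only collapsed by `resolve()`; a plain string-prefix comparison would not catch them.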
206
backend/app/services/glm_service.py
Normal file
@@ -0,0 +1,206 @@
|
||||
"""
|
||||
GLM AI 服务
|
||||
使用智谱 GLM 生成标题和标签
|
||||
"""
|
||||
|
||||
import json
|
||||
import re
|
||||
from loguru import logger
|
||||
from zai import ZhipuAiClient
|
||||
|
||||
from app.core.config import settings
|
||||
|
||||
|
||||
class GLMService:
|
||||
"""GLM AI 服务"""
|
||||
|
||||
def __init__(self):
|
||||
self.client = None
|
||||
|
||||
def _get_client(self):
|
||||
"""获取或创建 ZhipuAI 客户端"""
|
||||
if self.client is None:
|
||||
if not settings.GLM_API_KEY:
|
||||
raise Exception("GLM_API_KEY 未配置")
|
||||
self.client = ZhipuAiClient(api_key=settings.GLM_API_KEY)
|
||||
return self.client
|
||||
|
||||
async def generate_title_tags(self, text: str) -> dict:
|
||||
"""
|
||||
根据口播文案生成标题和标签
|
||||
|
||||
Args:
|
||||
text: 口播文案
|
||||
|
||||
Returns:
|
||||
{"title": "标题", "tags": ["标签1", "标签2", ...]}
|
||||
"""
|
||||
prompt = f"""根据以下口播文案,生成一个吸引人的短视频标题、副标题和3个相关标签。
|
||||
|
||||
口播文案:
|
||||
{text}
|
||||
|
||||
要求:
|
||||
1. 标题要简洁有力,能吸引观众点击,不超过10个字
|
||||
2. 副标题是对标题的补充说明或描述性文字,不超过20个字
|
||||
3. 标签要与内容相关,便于搜索和推荐,只要3个
|
||||
4. 标题、副标题和标签必须使用与口播文案相同的语言(如文案是英文就用英文,日文就用日文)
|
||||
|
||||
请严格按以下JSON格式返回(不要包含其他内容):
|
||||
{{"title": "标题", "secondary_title": "副标题", "tags": ["标签1", "标签2", "标签3"]}}"""
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
logger.info(f"Calling GLM API with model: {settings.GLM_MODEL}")
|
||||
|
||||
# 使用 asyncio.to_thread 包装同步 SDK 调用,避免阻塞事件循环
|
||||
import asyncio
|
||||
response = await asyncio.to_thread(
|
||||
client.chat.completions.create,
|
||||
model=settings.GLM_MODEL,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
thinking={"type": "disabled"}, # 禁用思考模式,加快响应
|
||||
max_tokens=500,
|
||||
temperature=0.7
|
||||
)
|
||||
|
||||
# 提取生成的内容
|
||||
content = response.choices[0].message.content
|
||||
logger.info(f"GLM response (model: {settings.GLM_MODEL}): {content}")
|
||||
|
||||
# 解析 JSON
|
||||
result = self._parse_json_response(content)
|
||||
return result
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"GLM service error: {e}")
|
||||
raise Exception(f"AI 生成失败: {str(e)}")
|
||||
|
||||
async def rewrite_script(self, text: str, custom_prompt: str = None) -> str:
|
||||
"""
|
||||
AI 改写文案
|
||||
|
||||
Args:
|
||||
text: 原始文案
|
||||
custom_prompt: 自定义提示词,为空则使用默认提示词
|
||||
|
||||
Returns:
|
||||
改写后的文案
|
||||
"""
|
||||
if custom_prompt and custom_prompt.strip():
|
||||
prompt = f"""{custom_prompt.strip()}
|
||||
|
||||
原始文案:
|
||||
{text}"""
|
||||
else:
|
||||
prompt = f"""请将以下视频文案进行改写。
|
||||
|
||||
原始文案:
|
||||
{text}
|
||||
|
||||
要求:
|
||||
1. 保持原意,但语气更加自然流畅
|
||||
2. 适合口播,读起来朗朗上口
|
||||
3. 字数与原文相当或略微精简
|
||||
4. 不要返回多余的解释,只返回改写后的正文"""
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
logger.info(f"Using GLM to rewrite script")
|
||||
|
||||
# 使用 asyncio.to_thread 包装同步 SDK 调用,避免阻塞事件循环
|
||||
import asyncio
|
||||
response = await asyncio.to_thread(
|
||||
client.chat.completions.create,
|
||||
model=settings.GLM_MODEL,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
thinking={"type": "disabled"},
|
||||
max_tokens=2000,
|
||||
temperature=0.8
|
||||
)
|
||||
|
||||
content = response.choices[0].message.content
|
||||
logger.info("GLM rewrite completed")
|
||||
return content.strip()
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"GLM rewrite error: {e}")
|
||||
raise Exception(f"AI 改写失败: {str(e)}")
|
||||
|
||||
|
||||
|
||||
async def translate_text(self, text: str, target_lang: str) -> str:
|
||||
"""
|
||||
将文案翻译为指定语言
|
||||
|
||||
Args:
|
||||
text: 原始文案
|
||||
target_lang: 目标语言(如 English, 日本語 等)
|
||||
|
||||
Returns:
|
||||
翻译后的文案
|
||||
"""
|
||||
prompt = f"""请将以下文案翻译为{target_lang}。
|
||||
|
||||
原文:
|
||||
{text}
|
||||
|
||||
要求:
|
||||
1. 只返回翻译后的文案,不要添加任何解释或说明
|
||||
2. 保持原文的语气和风格
|
||||
3. 翻译要自然流畅,符合目标语言的表达习惯"""
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
logger.info(f"Using GLM to translate text to {target_lang}")
|
||||
|
||||
import asyncio
|
||||
response = await asyncio.to_thread(
|
||||
client.chat.completions.create,
|
||||
model=settings.GLM_MODEL,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
thinking={"type": "disabled"},
|
||||
max_tokens=2000,
|
||||
temperature=0.3
|
||||
)
|
||||
|
||||
content = response.choices[0].message.content
|
||||
logger.info("GLM translation completed")
|
||||
return content.strip()
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"GLM translate error: {e}")
|
||||
raise Exception(f"AI 翻译失败: {str(e)}")
|
||||
|
||||
def _parse_json_response(self, content: str) -> dict:
|
||||
"""解析 GLM 返回的 JSON 内容"""
|
||||
# 尝试直接解析
|
||||
try:
|
||||
return json.loads(content)
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# 尝试提取 JSON 块
|
||||
json_match = re.search(r'\{[^{}]*"title"[^{}]*"tags"[^{}]*\}', content, re.DOTALL)
|
||||
if not json_match:
|
||||
json_match = re.search(r'\{[^{}]*"title"[^{}]*"secondary_title"[^{}]*"tags"[^{}]*\}', content, re.DOTALL)
|
||||
if json_match:
|
||||
try:
|
||||
return json.loads(json_match.group())
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
# 尝试提取 ```json 代码块
|
||||
code_match = re.search(r'```(?:json)?\s*(\{.*?\})\s*```', content, re.DOTALL)
|
||||
if code_match:
|
||||
try:
|
||||
return json.loads(code_match.group(1))
|
||||
except json.JSONDecodeError:
|
||||
pass
|
||||
|
||||
logger.error(f"Failed to parse GLM response: {content}")
|
||||
raise Exception("AI 返回格式解析失败")
|
||||
|
||||
|
||||
# 全局服务实例
|
||||
glm_service = GLMService()
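`_parse_json_response` tries progressively looser strategies to recover a JSON object from the model's reply: parse the whole reply, then look for an embedded object, then look inside a fenced code block. The sketch below shows the same idea in a compact form. It is an assumption-laden alternative, not the service's method: `extract_json_object` is a hypothetical name, it uses the outermost `{`...`}` span instead of the key-specific regular expressions, and it returns `None` instead of raising.

```python
import json
import re
from typing import Any, Dict, Optional

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this listing readable


def extract_json_object(content: str) -> Optional[Dict[str, Any]]:
    """Best-effort extraction of one JSON object from a model reply."""
    candidates = [content]
    # Strategy 2: the body of a fenced block, optionally tagged "json".
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, content, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))
    # Strategy 3: the widest brace-delimited span anywhere in the text.
    start, end = content.find("{"), content.rfind("}")
    if 0 <= start < end:
        candidates.append(content[start:end + 1])
    for text in candidates:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None
```

A caller that needs the original behaviour can raise when the function returns `None`, and should still validate that the expected keys (`title`, `tags`) are present.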
|
||||
@@ -1,7 +1,7 @@
|
||||
"""
|
||||
唇形同步服务
|
||||
通过 subprocess 调用 LatentSync conda 环境进行推理
|
||||
配置为使用 GPU1 (CUDA:1)
|
||||
混合方案: 短视频用 LatentSync (高质量), 长视频用 MuseTalk (高速度)
|
||||
路由阈值: LIPSYNC_DURATION_THRESHOLD (默认 120s)
|
||||
"""
|
||||
import os
|
||||
import shutil
|
||||
@@ -17,15 +17,18 @@ from app.core.config import settings
|
||||
|
||||
|
||||
class LipSyncService:
|
||||
"""唇形同步服务 - LatentSync 1.6 集成 (Subprocess 方式)"""
|
||||
|
||||
"""唇形同步服务 - LatentSync 1.6 + MuseTalk 1.5 混合方案"""
|
||||
|
||||
def __init__(self):
|
||||
self.use_local = settings.LATENTSYNC_LOCAL
|
||||
self.api_url = settings.LATENTSYNC_API_URL
|
||||
self.latentsync_dir = settings.LATENTSYNC_DIR
|
||||
self.gpu_id = settings.LATENTSYNC_GPU_ID
|
||||
self.use_server = settings.LATENTSYNC_USE_SERVER
|
||||
|
||||
|
||||
# MuseTalk 配置
|
||||
self.musetalk_api_url = settings.MUSETALK_API_URL
|
||||
|
||||
# GPU 并发锁 (Serial Queue)
|
||||
self._lock = asyncio.Lock()
|
||||
|
||||
@@ -73,7 +76,51 @@ class LipSyncService:
|
||||
logger.warning(f"⚠️ Conda Python 不存在: {self.conda_python}")
|
||||
return False
|
||||
return True
|
||||
|
||||
|
||||
def _get_media_duration(self, media_path: str) -> Optional[float]:
|
||||
"""获取音频或视频的时长(秒)"""
|
||||
try:
|
||||
cmd = [
|
||||
"ffprobe", "-v", "error",
|
||||
"-show_entries", "format=duration",
|
||||
"-of", "default=noprint_wrappers=1:nokey=1",
|
||||
media_path
|
||||
]
|
||||
result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
|
||||
if result.returncode == 0:
|
||||
return float(result.stdout.strip())
|
||||
except Exception as e:
|
||||
logger.warning(f"⚠️ 获取媒体时长失败: {e}")
|
||||
return None
|
||||
|
||||
def _loop_video_to_duration(self, video_path: str, output_path: str, target_duration: float) -> str:
|
||||
"""
|
||||
循环视频以匹配目标时长
|
||||
使用 FFmpeg stream_loop 实现无缝循环
|
||||
"""
|
||||
try:
|
||||
cmd = [
|
||||
"ffmpeg", "-y",
|
||||
"-stream_loop", "-1", # 无限循环
|
||||
"-i", video_path,
|
||||
"-t", str(target_duration), # 截取到目标时长
|
||||
"-c:v", "libx264",
|
||||
"-preset", "fast",
|
||||
"-crf", "23",
|
||||
"-an", # 去掉原音频
|
||||
output_path
|
||||
]
|
||||
result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
|
||||
if result.returncode == 0 and Path(output_path).exists():
|
||||
logger.info(f"✅ 视频循环完成: {target_duration:.1f}s")
|
||||
return output_path
|
||||
else:
|
||||
logger.warning(f"⚠️ 视频循环失败: {result.stderr[:200]}")
|
||||
return video_path
|
||||
except Exception as e:
|
||||
logger.warning(f"⚠️ 视频循环异常: {e}")
|
||||
return video_path
|
||||
|
||||
def _preprocess_video(self, video_path: str, output_path: str, target_height: int = 720) -> str:
|
||||
"""
|
||||
视频预处理:压缩视频以加速后续处理
|
||||
@@ -204,27 +251,46 @@ class LipSyncService:
|
||||
|
||||
logger.info("⏳ 等待 GPU 资源 (排队中)...")
|
||||
async with self._lock:
|
||||
if self.use_server:
|
||||
# 模式 A: 调用常驻服务 (加速模式)
|
||||
return await self._call_persistent_server(video_path, audio_path, output_path)
|
||||
|
||||
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
|
||||
|
||||
# 使用临时目录存放输出
|
||||
# 使用临时目录存放中间文件
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
tmpdir = Path(tmpdir)
|
||||
|
||||
# 获取音频和视频时长
|
||||
audio_duration = self._get_media_duration(audio_path)
|
||||
video_duration = self._get_media_duration(video_path)
|
||||
|
||||
# 如果音频比视频长,循环视频以匹配音频长度
|
||||
if audio_duration and video_duration and audio_duration > video_duration + 0.5:
|
||||
logger.info(f"🔄 音频({audio_duration:.1f}s) > 视频({video_duration:.1f}s),循环视频...")
|
||||
looped_video = tmpdir / "looped_input.mp4"
|
||||
actual_video_path = self._loop_video_to_duration(
|
||||
video_path,
|
||||
str(looped_video),
|
||||
audio_duration
|
||||
)
|
||||
else:
|
||||
actual_video_path = video_path
|
||||
|
||||
# 混合路由: 长视频走 MuseTalk,短视频走 LatentSync
|
||||
if audio_duration and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD:
|
||||
logger.info(
|
||||
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s,路由到 MuseTalk"
|
||||
)
|
||||
musetalk_result = await self._call_musetalk_server(
|
||||
actual_video_path, audio_path, output_path
|
||||
)
|
||||
if musetalk_result:
|
||||
return musetalk_result
|
||||
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync(长视频,会较慢)")
|
||||
|
||||
if self.use_server:
|
||||
# 模式 A: 调用常驻服务 (加速模式)
|
||||
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
|
||||
|
||||
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
|
||||
|
||||
temp_output = tmpdir / "output.mp4"
|
||||
|
||||
# 视频预处理:压缩高分辨率视频以加速处理
|
||||
# preprocessed_video = tmpdir / "preprocessed_input.mp4"
|
||||
# actual_video_path = self._preprocess_video(
|
||||
# video_path,
|
||||
# str(preprocessed_video),
|
||||
# target_height=720
|
||||
# )
|
||||
# 暂时禁用预处理以保持原始分辨率
|
||||
actual_video_path = video_path
|
||||
|
||||
# 构建命令
|
||||
cmd = [
|
||||
str(self.conda_python),
|
||||
@@ -285,7 +351,7 @@ class LipSyncService:
|
||||
return output_path
|
||||
|
||||
logger.info(f"LatentSync 输出:\n{stdout_text[-500:] if stdout_text else 'N/A'}")
|
||||
|
||||
|
||||
# 检查输出文件
|
||||
if temp_output.exists():
|
||||
shutil.copy(temp_output, output_path)
|
||||
@@ -301,6 +367,55 @@ class LipSyncService:
|
||||
shutil.copy(video_path, output_path)
|
||||
return output_path
|
||||
|
||||
async def _call_musetalk_server(
|
||||
self, video_path: str, audio_path: str, output_path: str
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
调用 MuseTalk 常驻服务。
|
||||
成功返回 output_path,不可用返回 None(信号上层回退到 LatentSync)。
|
||||
"""
|
||||
server_url = self.musetalk_api_url
|
||||
logger.info(f"⚡ 调用 MuseTalk 服务: {server_url}")
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=3600.0) as client:
|
||||
# 健康检查
|
||||
try:
|
||||
resp = await client.get(f"{server_url}/health", timeout=5.0)
|
||||
if resp.status_code != 200:
|
||||
logger.warning("⚠️ MuseTalk 健康检查失败")
|
||||
return None
|
||||
health = resp.json()
|
||||
if not health.get("model_loaded"):
|
||||
logger.warning("⚠️ MuseTalk 模型未加载")
|
||||
return None
|
||||
except Exception:
|
||||
logger.warning("⚠️ 无法连接 MuseTalk 服务")
|
||||
return None
|
||||
|
||||
# 发送推理请求
|
||||
payload = {
|
||||
"video_path": str(Path(video_path).resolve()),
|
||||
"audio_path": str(Path(audio_path).resolve()),
|
||||
"video_out_path": str(Path(output_path).resolve()),
|
||||
"batch_size": settings.MUSETALK_BATCH_SIZE,
|
||||
}
|
||||
|
||||
response = await client.post(f"{server_url}/lipsync", json=payload)
|
||||
|
||||
if response.status_code == 200:
|
||||
result = response.json()
|
||||
if Path(result["output_path"]).exists():
|
||||
logger.info(f"✅ MuseTalk 推理完成: {output_path}")
|
||||
return output_path
|
||||
|
||||
logger.error(f"❌ MuseTalk 服务报错: {response.text}")
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"❌ MuseTalk 调用失败: {e}")
|
||||
return None
|
||||
|
||||
async def _call_persistent_server(self, video_path: str, audio_path: str, output_path: str) -> str:
|
||||
"""调用本地常驻服务 (server.py)"""
|
||||
server_url = "http://localhost:8007"
|
||||
@@ -318,7 +433,7 @@ class LipSyncService:
|
||||
}
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=1200.0) as client:
|
||||
async with httpx.AsyncClient(timeout=3600.0) as client:
|
||||
# 先检查健康状态
|
||||
try:
|
||||
resp = await client.get(f"{server_url}/health", timeout=5.0)
|
||||
@@ -347,18 +462,23 @@ class LipSyncService:
|
||||
raise e
|
||||
|
||||
async def _local_generate_subprocess(self, video_path: str, audio_path: str, output_path: str) -> str:
|
||||
"""原有的 subprocess 逻辑提取为独立方法"""
|
||||
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
|
||||
# ... (此处仅为占位符提示,实际代码需要调整结构以避免重复,
|
||||
# 但鉴于原有 _local_generate 的结构,最简单的方法是在 _local_generate 内部做判断,
|
||||
# 如果 use_server 失败,可以 retry 或者 _local_generate 不做拆分,直接在里面写逻辑)
|
||||
# 为了最小化改动且保持安全,上面的 _call_persistent_server 如果失败,
|
||||
# 最好不要自动回退(可能导致双重资源消耗),而是直接报错让用户检查服务。
|
||||
# 但为了用户体验,我们可以允许回退。
|
||||
# *修正策略*:
|
||||
# 我将不拆分 _local_generate_subprocess,而是将 subprocess 逻辑保留在 _local_generate 的后半部分。
|
||||
# 如果 self.use_server 为 True,先尝试调用 server,成功则 return,失败则继续往下走。
|
||||
pass
|
||||
"""
|
||||
原有的 subprocess 回退逻辑
|
||||
|
||||
注意:subprocess 回退已被禁用,原因如下:
|
||||
1. subprocess 模式需要重新加载模型,消耗大量时间和显存
|
||||
2. 如果常驻服务不可用,应该让用户知道并修复服务,而非静默回退
|
||||
3. 避免双重资源消耗导致的 GPU OOM
|
||||
|
||||
如果常驻服务不可用,请检查:
|
||||
- 服务是否启动: python scripts/server.py (在 models/LatentSync 目录)
|
||||
- 端口是否被占用: lsof -i:8007
|
||||
- GPU 显存是否充足: nvidia-smi
|
||||
"""
|
||||
raise RuntimeError(
|
||||
"LatentSync 常驻服务不可用,无法进行唇形同步。"
|
||||
"请确保 LatentSync 服务已启动 (cd models/LatentSync && python scripts/server.py)"
|
||||
)
|
||||
|
||||
async def _remote_generate(
|
||||
self,
|
||||
@@ -421,8 +541,18 @@ class LipSyncService:
|
||||
except:
|
||||
pass
|
||||
|
||||
# 检查 MuseTalk 服务
|
||||
musetalk_ready = False
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
resp = await client.get(f"{self.musetalk_api_url}/health")
|
||||
if resp.status_code == 200:
|
||||
musetalk_ready = resp.json().get("model_loaded", False)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
return {
|
||||
"model": "LatentSync 1.6",
|
||||
"model": "LatentSync 1.6 + MuseTalk 1.5",
|
||||
"conda_env": conda_ok,
|
||||
"weights": weights_ok,
|
||||
"gpu": gpu_ok,
|
||||
@@ -430,5 +560,7 @@ class LipSyncService:
|
||||
"gpu_id": self.gpu_id,
|
||||
"inference_steps": settings.LATENTSYNC_INFERENCE_STEPS,
|
||||
"guidance_scale": settings.LATENTSYNC_GUIDANCE_SCALE,
|
||||
"ready": conda_ok and weights_ok and gpu_ok
|
||||
"ready": conda_ok and weights_ok and gpu_ok,
|
||||
"musetalk_ready": musetalk_ready,
|
||||
"lipsync_threshold": settings.LIPSYNC_DURATION_THRESHOLD,
|
||||
}
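The hybrid routing added to `LipSyncService` sends audio at or above `LIPSYNC_DURATION_THRESHOLD` to MuseTalk and falls back to LatentSync when MuseTalk returns nothing. The decision logic can be sketched on its own as below. This is a synchronous simplification (the service methods are `async`), and `route_lipsync`, `run_fast` and `run_quality` are hypothetical names standing in for the MuseTalk and LatentSync calls.

```python
from typing import Callable, Optional


def route_lipsync(
    audio_duration: Optional[float],
    threshold: float,
    run_fast: Callable[[], Optional[str]],
    run_quality: Callable[[], str],
) -> str:
    """Long audio tries the fast backend first; everything else uses the quality backend."""
    if audio_duration is not None and audio_duration >= threshold:
        result = run_fast()
        if result:
            return result
        # Fast backend unavailable: fall through to the slower, higher-quality one.
    return run_quality()
```

Passing the backends in as callables keeps the routing rule testable without a GPU or a running inference server.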
|
||||
|
||||
@@ -18,6 +18,7 @@ from app.services.storage import storage_service
|
||||
from .uploader.bilibili_uploader import BilibiliUploader
|
||||
from .uploader.douyin_uploader import DouyinUploader
|
||||
from .uploader.xiaohongshu_uploader import XiaohongshuUploader
|
||||
from .uploader.weixin_uploader import WeixinUploader
|
||||
|
||||
|
||||
class PublishService:
|
||||
@@ -25,11 +26,10 @@ class PublishService:
|
||||
|
||||
# 支持的平台配置
|
||||
PLATFORMS: Dict[str, Dict[str, Any]] = {
|
||||
"bilibili": {"name": "B站", "url": "https://member.bilibili.com/platform/upload/video/frame", "enabled": True},
|
||||
"douyin": {"name": "抖音", "url": "https://creator.douyin.com/", "enabled": True},
|
||||
"weixin": {"name": "微信视频号", "url": "https://channels.weixin.qq.com/", "enabled": True},
|
||||
"bilibili": {"name": "B站", "url": "https://member.bilibili.com/platform/upload/video/frame", "enabled": True},
|
||||
"xiaohongshu": {"name": "小红书", "url": "https://creator.xiaohongshu.com/", "enabled": True},
|
||||
"weixin": {"name": "微信视频号", "url": "https://channels.weixin.qq.com/", "enabled": False},
|
||||
"kuaishou": {"name": "快手", "url": "https://cp.kuaishou.com/", "enabled": False},
|
||||
}
|
||||
|
||||
def __init__(self) -> None:
|
||||
@@ -182,7 +182,8 @@ class PublishService:
|
||||
tags=tags,
|
||||
publish_date=publish_time,
|
||||
account_file=str(account_file),
|
||||
description=description
|
||||
description=description,
|
||||
user_id=user_id,
|
||||
)
|
||||
elif platform == "xiaohongshu":
|
||||
uploader = XiaohongshuUploader(
|
||||
@@ -193,6 +194,16 @@ class PublishService:
|
||||
account_file=str(account_file),
|
||||
description=description
|
||||
)
|
||||
elif platform == "weixin":
|
||||
uploader = WeixinUploader(
|
||||
title=title,
|
||||
file_path=local_video_path,
|
||||
tags=tags,
|
||||
publish_date=publish_time,
|
||||
account_file=str(account_file),
|
||||
description=description,
|
||||
user_id=user_id,
|
||||
)
|
||||
else:
|
||||
logger.warning(f"[发布] {platform} 上传功能尚未实现")
|
||||
return {
|
||||
@@ -225,30 +236,38 @@ class PublishService:
|
||||
async def login(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
|
||||
"""
|
||||
启动QR码登录流程
|
||||
|
||||
|
||||
Args:
|
||||
platform: 平台 ID
|
||||
user_id: 用户 ID (用于 Cookie 隔离)
|
||||
|
||||
|
||||
Returns:
|
||||
dict: 包含二维码base64图片
|
||||
"""
|
||||
if platform not in self.PLATFORMS:
|
||||
return {"success": False, "message": "不支持的平台"}
|
||||
|
||||
|
||||
try:
|
||||
from .qr_login_service import QRLoginService
|
||||
|
||||
|
||||
# 获取用户专属的 Cookie 目录
|
||||
cookies_dir = self._get_cookies_dir(user_id)
|
||||
|
||||
|
||||
# 清理旧的活跃会话(避免残留会话干扰新登录)
|
||||
session_key = self._get_session_key(platform, user_id)
|
||||
if session_key in self.active_login_sessions:
|
||||
old_service = self.active_login_sessions.pop(session_key)
|
||||
try:
|
||||
await old_service._cleanup()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# 创建QR登录服务
|
||||
qr_service = QRLoginService(platform, cookies_dir)
|
||||
|
||||
|
||||
# 存储活跃会话 (带用户隔离)
|
||||
session_key = self._get_session_key(platform, user_id)
|
||||
self.active_login_sessions[session_key] = qr_service
|
||||
|
||||
|
||||
# 启动登录并获取二维码
|
||||
result = await qr_service.start_login()
|
||||
|
||||
@@ -262,27 +281,28 @@ class PublishService:
|
||||
}
|
||||
|
||||
def get_login_session_status(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
|
||||
"""获取活跃登录会话的状态"""
|
||||
"""获取活跃登录会话的状态(仅用于扫码轮询)"""
|
||||
session_key = self._get_session_key(platform, user_id)
|
||||
|
||||
# 1. 如果有活跃的扫码会话,优先检查它
|
||||
|
||||
# 只检查活跃的扫码会话,不检查 Cookie 文件
|
||||
# Cookie 文件检查会导致"重新登录"时误判为已登录
|
||||
if session_key in self.active_login_sessions:
|
||||
qr_service = self.active_login_sessions[session_key]
|
||||
status = qr_service.get_login_status()
|
||||
|
||||
|
||||
# 如果登录成功且Cookie已保存,清理会话
|
||||
if status["success"] and status["cookies_saved"]:
|
||||
del self.active_login_sessions[session_key]
|
||||
return {"success": True, "message": "登录成功"}
|
||||
|
||||
return {"success": False, "message": "等待扫码..."}
|
||||
|
||||
# 2. 检查本地Cookie文件是否存在
|
||||
cookie_file = self._get_cookie_path(platform, user_id)
|
||||
if cookie_file.exists():
|
||||
return {"success": True, "message": "已登录 (历史状态)"}
|
||||
|
||||
return {"success": False, "message": "未登录"}
|
||||
|
||||
# Face verification: pass the new QR code to the frontend
|
||||
result: Dict[str, Any] = {"success": False, "message": "等待扫码..."}
|
||||
if status.get("face_verify_qr"):
|
||||
result["face_verify_qr"] = status["face_verify_qr"]
|
||||
return result
|
||||
|
||||
# No active session → return False (the frontend should not poll without a session)
|
||||
return {"success": False, "message": "无活跃登录会话"}
|
||||
|
||||
def logout(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
|
||||
"""
|
||||
|
||||
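The revised polling logic in the hunk above can be summarized with a small standalone sketch. It is not the project's code: `resolve_login_status` is a hypothetical function, the `sessions` dict stands in for `active_login_sessions`, and each value is assumed to be the status dict that `get_login_status()` would return. The English messages are illustrative only.

```python
from typing import Any, Dict


def resolve_login_status(sessions: Dict[str, Dict[str, Any]], session_key: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the polling method; never consults the cookie file."""
    status = sessions.get(session_key)
    if status is None:
        # No active session: the caller should not be polling at all.
        return {"success": False, "message": "no active login session"}

    if status.get("success") and status.get("cookies_saved"):
        # Login finished and cookies were persisted, so the session is dropped.
        del sessions[session_key]
        return {"success": True, "message": "login succeeded"}

    result: Dict[str, Any] = {"success": False, "message": "waiting for scan..."}
    if status.get("face_verify_qr"):
        # Forward the face-verification QR code so a frontend could display it.
        result["face_verify_qr"] = status["face_verify_qr"]
    return result
```

Because the cookie file is never read here, a leftover cookie from an earlier login cannot make a fresh re-login attempt look finished, which appears to be the motivation stated in the diff's comments.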
File diff suppressed because it is too large
194
backend/app/services/remotion_service.py
Normal file
@@ -0,0 +1,194 @@
|
||||
"""
|
||||
Remotion video rendering service
|
||||
Calls Node.js Remotion to compose the video (subtitles + title)
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
from collections.abc import Callable
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
from loguru import logger
|
||||
|
||||
|
||||
class RemotionService:
|
||||
"""Remotion video rendering service"""
|
||||
|
||||
def __init__(self, remotion_dir: Optional[str] = None):
|
||||
# Remotion project directory
|
||||
if remotion_dir:
|
||||
self.remotion_dir = Path(remotion_dir)
|
||||
else:
|
||||
# Defaults to the ViGent2/remotion directory
|
||||
self.remotion_dir = Path(__file__).parent.parent.parent.parent / "remotion"
|
||||
|
||||
async def render(
|
||||
self,
|
||||
video_path: str,
|
||||
output_path: str,
|
||||
captions_path: Optional[str] = None,
|
||||
title: Optional[str] = None,
|
||||
title_duration: float = 4.0,
|
||||
title_display_mode: str = "short",
|
||||
fps: int = 25,
|
||||
enable_subtitles: bool = True,
|
||||
subtitle_style: Optional[dict] = None,
|
||||
title_style: Optional[dict] = None,
|
||||
secondary_title: Optional[str] = None,
|
||||
secondary_title_style: Optional[dict] = None,
|
||||
on_progress: Optional[Callable[[int], None]] = None
|
||||
) -> str:
|
||||
"""
|
||||
Render the video with Remotion (adds subtitles and a title)
|
||||
|
||||
Args:
|
||||
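video_path: Path of the input video (the lip-synced video)
|
||||
output_path: Path of the output video
|
||||
captions_path: Path of the captions JSON file (generated by Whisper)
|
||||
title: Video title (optional)
|
||||
title_duration: How long the title is shown (seconds)
|
||||
title_display_mode: Title display mode (short/persistent)
|
||||
fps: Frame rate
|
||||
enable_subtitles: Whether subtitles are enabled
|
||||
on_progress: Progress callback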
|
||||
|
||||
Returns:
|
||||
Path of the output video
|
||||
"""
|
||||
# Build the command arguments
|
||||
# Prefer the pre-compiled JS file (faster); fall back to ts-node if it does not exist
|
||||
compiled_js = self.remotion_dir / "dist" / "render.js"
|
||||
if compiled_js.exists():
|
||||
cmd = ["node", "dist/render.js"]
|
||||
logger.info("Using pre-compiled render.js for faster startup")
|
||||
else:
|
||||
cmd = ["npx", "ts-node", "render.ts"]
|
||||
logger.warning("Using ts-node (slower). Run 'npm run build:render' to compile for faster startup.")
|
||||
|
||||
cmd.extend([
|
||||
"--video", str(video_path),
|
||||
"--output", str(output_path),
|
||||
"--fps", str(fps),
|
||||
"--enableSubtitles", str(enable_subtitles).lower()
|
||||
])
|
||||
|
||||
if captions_path:
|
||||
cmd.extend(["--captions", str(captions_path)])
|
||||
|
||||
if title:
|
||||
cmd.extend(["--title", title])
|
||||
cmd.extend(["--titleDuration", str(title_duration)])
|
||||
cmd.extend(["--titleDisplayMode", title_display_mode])
|
||||
|
||||
if subtitle_style:
|
||||
cmd.extend(["--subtitleStyle", json.dumps(subtitle_style, ensure_ascii=False)])
|
||||
|
||||
if title_style:
|
||||
cmd.extend(["--titleStyle", json.dumps(title_style, ensure_ascii=False)])
|
||||
|
||||
if secondary_title:
|
||||
cmd.extend(["--secondaryTitle", secondary_title])
|
||||
|
||||
if secondary_title_style:
|
||||
cmd.extend(["--secondaryTitleStyle", json.dumps(secondary_title_style, ensure_ascii=False)])
|
||||
|
||||
logger.info(f"Running Remotion render: {' '.join(cmd)}")
|
||||
|
||||
# Run the subprocess in a thread pool
|
||||
def _run_render():
|
||||
process = subprocess.Popen(
|
||||
cmd,
|
||||
cwd=str(self.remotion_dir),
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=subprocess.STDOUT,
|
||||
text=True,
|
||||
bufsize=1
|
||||
)
|
||||
|
||||
if process.stdout is None:
|
||||
raise RuntimeError("Remotion process stdout is unavailable")
|
||||
stdout = process.stdout
|
||||
|
||||
output_lines = []
|
||||
for line in iter(stdout.readline, ''):
|
||||
line = line.strip()
|
||||
if line:
|
||||
output_lines.append(line)
|
||||
logger.debug(f"[Remotion] {line}")
|
||||
|
||||
# Parse the progress
|
||||
if "Rendering:" in line and "%" in line:
|
||||
try:
|
||||
percent_str = line.split("Rendering:")[1].strip().replace("%", "")
|
||||
percent = int(percent_str)
|
||||
if on_progress:
|
||||
on_progress(percent)
|
||||
except (ValueError, IndexError):
|
||||
pass
|
||||
|
||||
process.wait()
|
||||
|
||||
if process.returncode != 0:
|
||||
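# The Remotion process may crash after it has finished writing the output (e.g. SIGABRT, code -6)
|
||||
# If the output file already exists and has a reasonable size, treat the render as successful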
|
||||
output_file = Path(output_path)
|
||||
if output_file.exists() and output_file.stat().st_size > 1024:
|
||||
logger.warning(
|
||||
f"Remotion process exited with code {process.returncode}, "
|
||||
f"but output file exists ({output_file.stat().st_size} bytes). Treating as success."
|
||||
)
|
||||
return output_path
|
||||
|
||||
error_msg = "\n".join(output_lines[-20:]) # last 20 lines
|
||||
raise RuntimeError(f"Remotion render failed (code {process.returncode}):\n{error_msg}")
|
||||
|
||||
return output_path
|
||||
|
||||
loop = asyncio.get_event_loop()
|
||||
result = await loop.run_in_executor(None, _run_render)
|
||||
|
||||
logger.info(f"Remotion render complete: {result}")
|
||||
return result
|
||||
|
||||
async def check_health(self) -> dict:
|
||||
"""Check the health of the Remotion service"""
|
||||
try:
|
||||
# Check that the remotion directory exists
|
||||
if not self.remotion_dir.exists():
|
||||
return {
|
||||
"ready": False,
|
||||
"error": f"Remotion directory not found: {self.remotion_dir}"
|
||||
}
|
||||
|
||||
# Check that package.json exists
|
||||
package_json = self.remotion_dir / "package.json"
|
||||
if not package_json.exists():
|
||||
return {
|
||||
"ready": False,
|
||||
"error": "package.json not found"
|
||||
}
|
||||
|
||||
# Check that node_modules exists
|
||||
node_modules = self.remotion_dir / "node_modules"
|
||||
if not node_modules.exists():
|
||||
return {
|
||||
"ready": False,
|
||||
"error": "node_modules not found, run 'npm install' first"
|
||||
}
|
||||
|
||||
return {
|
||||
"ready": True,
|
||||
"remotion_dir": str(self.remotion_dir)
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
return {
|
||||
"ready": False,
|
||||
"error": str(e)
|
||||
}
|
||||
|
||||
|
||||
# Global service instance
|
||||
remotion_service = RemotionService()
|
||||
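The progress parsing inside `_run_render` can be isolated into a small helper for testing. The sketch below mirrors the string handling shown in the diff; the function name `parse_render_progress` is hypothetical, and the `Rendering: NN%` line format is taken from the parsing code above rather than from Remotion's documentation.

```python
from typing import Optional


def parse_render_progress(line: str) -> Optional[int]:
    """Return the integer percentage from a 'Rendering: NN%' line, or None."""
    if "Rendering:" not in line or "%" not in line:
        return None
    try:
        # Same steps as the service: take the text after the marker, drop the '%'.
        percent_str = line.split("Rendering:")[1].strip().replace("%", "")
        return int(percent_str)
    except (ValueError, IndexError):
        # Non-integer values such as '12.5' are ignored, as in the original loop.
        return None
```

Keeping the parser separate from the subprocess loop makes it easy to confirm that malformed lines are skipped instead of interrupting the render.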
@@ -7,15 +7,39 @@ from pathlib import Path
|
||||
import asyncio
|
||||
import functools
|
||||
import os
|
||||
import shutil
|
||||
|
||||
# Root directory of the local Supabase Storage volume
|
||||
SUPABASE_STORAGE_LOCAL_PATH = Path("/home/rongye/ProgramFiles/Supabase/volumes/storage/stub/stub")
|
||||
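# Root directory of the local Supabase Storage volume (read from an environment variable to support different deployments)
|
||||
# Root directory of the local Supabase Storage volume (read from an environment variable to support different deployments)
|
||||
_default_storage_path = "/var/lib/supabase/storage" # default path for production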
|
||||
SUPABASE_STORAGE_LOCAL_PATH = Path(os.getenv("SUPABASE_STORAGE_LOCAL_PATH", _default_storage_path))
|
||||
|
||||
class StorageService:
|
||||
def __init__(self):
|
||||
self.supabase: Client = get_supabase()
|
||||
self.BUCKET_MATERIALS = "materials"
|
||||
self.BUCKET_OUTPUTS = "outputs"
|
||||
self.BUCKET_REF_AUDIOS = "ref-audios"
|
||||
self.BUCKET_GENERATED_AUDIOS = "generated-audios"
|
||||
# Make sure all buckets exist
|
||||
self._ensure_buckets()
|
||||
|
||||
def _ensure_buckets(self):
|
||||
"""Make sure all required buckets exist"""
|
||||
buckets = [self.BUCKET_MATERIALS, self.BUCKET_OUTPUTS, self.BUCKET_REF_AUDIOS, self.BUCKET_GENERATED_AUDIOS]
|
||||
try:
|
||||
existing = self.supabase.storage.list_buckets()
|
||||
existing_names = {b.name for b in existing} if existing else set()
|
||||
for bucket_name in buckets:
|
||||
if bucket_name not in existing_names:
|
||||
try:
|
||||
self.supabase.storage.create_bucket(bucket_name, options={"public": True})
|
||||
logger.info(f"Created bucket: {bucket_name}")
|
||||
except Exception as e:
|
||||
# The bucket may already exist; ignore the error
|
||||
logger.debug(f"Bucket {bucket_name} creation skipped: {e}")
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to ensure buckets: {e}")
|
||||
|
||||
def _convert_to_public_url(self, url: str) -> str:
|
||||
"""Convert an internal URL into a publicly reachable URL"""
|
||||
@@ -80,6 +104,45 @@ class StorageService:
|
||||
logger.error(f"Storage upload failed: {e}")
|
||||
raise e
|
||||
|
||||
async def upload_file_from_path(self, bucket: str, storage_path: str, local_file_path: str, content_type: str) -> str:
|
||||
"""
|
||||
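Upload a file from a local path to Supabase Storage
|
||||
|
||||
Reads the file in chunks to reduce peak memory and avoid reading a large file in a single call
|
||||
|
||||
Args:
|
||||
bucket: Name of the storage bucket
|
||||
storage_path: Target path inside Storage
|
||||
local_file_path: Absolute path of the local file
|
||||
content_type: MIME type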
|
||||
"""
|
||||
local_file = Path(local_file_path)
|
||||
if not local_file.exists():
|
||||
raise FileNotFoundError(f"本地文件不存在: {local_file_path}")
|
||||
|
||||
loop = asyncio.get_running_loop()
|
||||
file_size = local_file.stat().st_size
|
||||
|
||||
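# Read the file in chunks to avoid reading a large file in a single call
|
||||
# The chunks still have to be joined into one bytes object for the SDK, but chunked reads reduce I/O pressure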
|
||||
def read_file_chunked():
|
||||
chunks = []
|
||||
chunk_size = 10 * 1024 * 1024 # 10MB per chunk
|
||||
with open(local_file_path, "rb") as f:
|
||||
while True:
|
||||
chunk = f.read(chunk_size)
|
||||
if not chunk:
|
||||
break
|
||||
chunks.append(chunk)
|
||||
return b"".join(chunks)
|
||||
|
||||
if file_size > 50 * 1024 * 1024: # log when the file is larger than 50MB
|
||||
logger.info(f"大文件上传: {file_size / 1024 / 1024:.1f}MB")
|
||||
|
||||
file_data = await loop.run_in_executor(None, read_file_chunked)
|
||||
|
||||
return await self.upload_file(bucket, storage_path, file_data, content_type)
|
||||
|
||||
async def get_signed_url(self, bucket: str, path: str, expires_in: int = 3600) -> str:
|
||||
"""Get a signed access URL asynchronously"""
|
||||
try:
|
||||
@@ -132,6 +195,19 @@ class StorageService:
|
||||
logger.error(f"Delete file failed: {e}")
|
||||
pass
|
||||
|
||||
async def move_file(self, bucket: str, from_path: str, to_path: str):
|
||||
"""Move or rename a file asynchronously"""
|
||||
try:
|
||||
loop = asyncio.get_running_loop()
|
||||
await loop.run_in_executor(
|
||||
None,
|
||||
lambda: self.supabase.storage.from_(bucket).move(from_path, to_path)
|
||||
)
|
||||
logger.info(f"Moved file: {bucket}/{from_path} -> {to_path}")
|
||||
except Exception as e:
|
||||
logger.error(f"Move file failed: {e}")
|
||||
raise e
|
||||
|
||||
async def list_files(self, bucket: str, path: str) -> List[Any]:
|
||||
"""List files asynchronously"""
|
||||
try:
|
||||
|
||||
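One point about the `upload_file_from_path` hunk above is worth illustrating: reading in chunks and then joining them still materializes the whole file in memory, as the diff's own comment acknowledges. The sketch below separates the two steps. `iter_chunks` and `read_all_chunked` are hypothetical helper names, and an in-memory `io.BytesIO` stream stands in for a real file.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 10 * 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def read_all_chunked(stream: BinaryIO, chunk_size: int = 10 * 1024 * 1024) -> bytes:
    """Join all chunks; the result still holds the full payload in memory."""
    return b"".join(iter_chunks(stream, chunk_size))


# Small demonstration with a 25-byte payload and 10-byte chunks.
demo_stream = io.BytesIO(b"x" * 25)
chunk_sizes = [len(chunk) for chunk in iter_chunks(demo_stream, chunk_size=10)]
```

Peak memory only stays bounded if the consumer can accept the generator directly. Whether the Supabase SDK supports streamed uploads is not shown in this diff, so that part is left as an open assumption.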
@@ -4,6 +4,7 @@ Platform uploader base classes and utilities
|
||||
from .base_uploader import BaseUploader
|
||||
from .bilibili_uploader import BilibiliUploader
|
||||
from .douyin_uploader import DouyinUploader
|
||||
from .xiaohongshu_uploader import XiaohongshuUploader
|
||||
from .xiaohongshu_uploader import XiaohongshuUploader
|
||||
from .weixin_uploader import WeixinUploader
|
||||
|
||||
__all__ = ['BaseUploader', 'BilibiliUploader', 'DouyinUploader', 'XiaohongshuUploader']
|
||||
__all__ = ['BaseUploader', 'BilibiliUploader', 'DouyinUploader', 'XiaohongshuUploader', 'WeixinUploader']
|
||||
|
||||
File diff suppressed because it is too large
1429
backend/app/services/uploader/weixin_uploader.py
Normal file
File diff suppressed because it is too large
@@ -1,95 +1,405 @@
|
||||
"""
|
||||
Video composition service
|
||||
"""
|
||||
import os
|
||||
import subprocess
|
||||
import json
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
from typing import Optional
|
||||
|
||||
class VideoService:
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
def _run_ffmpeg(self, cmd: list) -> bool:
|
||||
cmd_str = ' '.join(f'"{c}"' if ' ' in c or '\\' in c else c for c in cmd)
|
||||
logger.debug(f"FFmpeg CMD: {cmd_str}")
|
||||
try:
|
||||
# Synchronous call for BackgroundTasks compatibility
|
||||
result = subprocess.run(
|
||||
cmd_str,
|
||||
shell=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding='utf-8',
|
||||
)
|
||||
if result.returncode != 0:
|
||||
logger.error(f"FFmpeg Error: {result.stderr}")
|
||||
return False
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.error(f"FFmpeg Exception: {e}")
|
||||
return False
|
||||
|
||||
def _get_duration(self, file_path: str) -> float:
|
||||
# Synchronous call for BackgroundTasks compatibility
|
||||
cmd = f'ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "{file_path}"'
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
shell=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
return float(result.stdout.strip())
|
||||
except Exception:
|
||||
return 0.0
|
||||
|
||||
async def compose(
|
||||
self,
|
||||
video_path: str,
|
||||
audio_path: str,
|
||||
output_path: str,
|
||||
subtitle_path: Optional[str] = None
|
||||
) -> str:
|
||||
"""Compose the video"""
|
||||
# Ensure output dir
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
video_duration = self._get_duration(video_path)
|
||||
audio_duration = self._get_duration(audio_path)
|
||||
|
||||
# Audio loop if needed
|
||||
loop_count = 1
|
||||
if audio_duration > video_duration and video_duration > 0:
|
||||
loop_count = int(audio_duration / video_duration) + 1
|
||||
|
||||
cmd = ["ffmpeg", "-y"]
|
||||
|
||||
# Input video (stream_loop must be before -i)
|
||||
if loop_count > 1:
|
||||
cmd.extend(["-stream_loop", str(loop_count)])
|
||||
cmd.extend(["-i", video_path])
|
||||
|
||||
# Input audio
|
||||
cmd.extend(["-i", audio_path])
|
||||
|
||||
# Filter complex
|
||||
filter_complex = []
|
||||
|
||||
# Subtitles (skip for now to mimic previous state or implement basic)
|
||||
# Previous state: subtitles disabled due to font issues
|
||||
# if subtitle_path: ...
|
||||
|
||||
# Audio map
|
||||
cmd.extend(["-c:v", "libx264", "-c:a", "aac", "-shortest"])
|
||||
# Use audio from input 1
|
||||
cmd.extend(["-map", "0:v", "-map", "1:a"])
|
||||
|
||||
cmd.append(output_path)
|
||||
|
||||
if self._run_ffmpeg(cmd):
|
||||
return output_path
|
||||
else:
|
||||
raise RuntimeError("FFmpeg composition failed")
|
||||
"""
|
||||
Video composition service
|
||||
"""
|
||||
import os
|
||||
import subprocess
|
||||
import json
|
||||
import shlex
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
from typing import Optional
|
||||
|
||||
class VideoService:
|
||||
def __init__(self):
|
||||
pass
|
||||
|
||||
def get_video_metadata(self, file_path: str) -> dict:
|
||||
"""Get video metadata (including the rotation angle and the effective display resolution)"""
|
||||
cmd = [
|
||||
"ffprobe", "-v", "error",
|
||||
"-select_streams", "v:0",
|
||||
"-show_entries", "stream=width,height:stream_side_data=rotation",
|
||||
"-of", "json",
|
||||
file_path,
|
||||
]
|
||||
default_info = {
|
||||
"width": 0,
|
||||
"height": 0,
|
||||
"rotation": 0,
|
||||
"effective_width": 0,
|
||||
"effective_height": 0,
|
||||
}
|
||||
|
||||
try:
|
||||
result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
|
||||
if result.returncode != 0:
|
||||
return default_info
|
||||
|
||||
payload = json.loads(result.stdout or "{}")
|
||||
streams = payload.get("streams") or []
|
||||
if not streams:
|
||||
return default_info
|
||||
|
||||
stream = streams[0]
|
||||
width = int(stream.get("width") or 0)
|
||||
height = int(stream.get("height") or 0)
|
||||
|
||||
rotation = 0
|
||||
for side_data in stream.get("side_data_list") or []:
|
||||
if not isinstance(side_data, dict):
|
||||
continue
|
||||
raw_rotation = side_data.get("rotation")
|
||||
if raw_rotation is None:
|
||||
continue
|
||||
try:
|
||||
rotation = int(round(float(str(raw_rotation))))
|
||||
except Exception:
|
||||
rotation = 0
|
||||
break
|
||||
|
||||
norm_rotation = rotation % 360
|
||||
if norm_rotation > 180:
|
||||
norm_rotation -= 360
|
||||
swap_wh = abs(norm_rotation) == 90
|
||||
|
||||
effective_width = height if swap_wh else width
|
||||
effective_height = width if swap_wh else height
|
||||
|
||||
return {
|
||||
"width": width,
|
||||
"height": height,
|
||||
"rotation": norm_rotation,
|
||||
"effective_width": effective_width,
|
||||
"effective_height": effective_height,
|
||||
}
|
||||
except Exception as e:
|
||||
logger.warning(f"获取视频元信息失败: {e}")
|
||||
return default_info
|
||||
|
||||
def normalize_orientation(self, video_path: str, output_path: str) -> str:
|
||||
"""Re-encode a video that carries rotation metadata into its physical orientation, so later steps cannot ignore the rotation."""
|
||||
info = self.get_video_metadata(video_path)
|
||||
rotation = int(info.get("rotation") or 0)
|
||||
if rotation == 0:
|
||||
return video_path
|
||||
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
logger.info(
|
||||
f"检测到旋转元数据 rotation={rotation},归一化方向: "
|
||||
f"{info.get('effective_width', 0)}x{info.get('effective_height', 0)}"
|
||||
)
|
||||
|
||||
cmd = [
|
||||
"ffmpeg", "-y",
|
||||
"-i", video_path,
|
||||
"-map", "0:v:0",
|
||||
"-map", "0:a?",
|
||||
"-c:v", "libx264",
|
||||
"-preset", "fast",
|
||||
"-crf", "23",
|
||||
"-c:a", "copy",
|
||||
"-movflags", "+faststart",
|
||||
output_path,
|
||||
]
|
||||
|
||||
if self._run_ffmpeg(cmd):
|
||||
normalized = self.get_video_metadata(output_path)
|
||||
logger.info(
|
||||
"视频方向归一化完成: "
|
||||
f"coded={normalized.get('width', 0)}x{normalized.get('height', 0)}, "
|
||||
f"rotation={normalized.get('rotation', 0)}"
|
||||
)
|
||||
return output_path
|
||||
|
||||
logger.warning("视频方向归一化失败,回退使用原视频")
|
||||
return video_path
|
||||
|
||||
def _run_ffmpeg(self, cmd: list) -> bool:
|
||||
cmd_str = ' '.join(shlex.quote(str(c)) for c in cmd)
|
||||
logger.debug(f"FFmpeg CMD: {cmd_str}")
|
||||
try:
|
||||
# Synchronous call for BackgroundTasks compatibility
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
shell=False,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
encoding='utf-8',
|
||||
)
|
||||
if result.returncode != 0:
|
||||
logger.error(f"FFmpeg Error: {result.stderr}")
|
||||
return False
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.error(f"FFmpeg Exception: {e}")
|
||||
return False
|
||||
|
||||
def _get_duration(self, file_path: str) -> float:
|
||||
# Synchronous call for BackgroundTasks compatibility
|
||||
# Use an argument list to avoid the command-injection risk of shell=True
|
||||
cmd = [
|
||||
'ffprobe', '-v', 'error',
|
||||
'-show_entries', 'format=duration',
|
||||
'-of', 'default=noprint_wrappers=1:nokey=1',
|
||||
file_path
|
||||
]
|
||||
try:
|
||||
result = subprocess.run(
|
||||
cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
return float(result.stdout.strip())
|
||||
except Exception:
|
||||
return 0.0
|
||||
|
||||
def mix_audio(
|
||||
self,
|
||||
voice_path: str,
|
||||
bgm_path: str,
|
||||
output_path: str,
|
||||
bgm_volume: float = 0.2
|
||||
) -> str:
|
||||
"""Mix the voice track with background music"""
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
volume = max(0.0, min(float(bgm_volume), 1.0))
|
||||
filter_complex = (
|
||||
f"[0:a]volume=1.0[a0];"
|
||||
f"[1:a]volume={volume}[a1];"
|
||||
f"[a0][a1]amix=inputs=2:duration=first:dropout_transition=2:normalize=0[aout]"
|
||||
)
|
||||
|
||||
cmd = [
|
||||
"ffmpeg", "-y",
|
||||
"-i", voice_path,
|
||||
"-stream_loop", "-1", "-i", bgm_path,
|
||||
"-filter_complex", filter_complex,
|
||||
"-map", "[aout]",
|
||||
"-c:a", "pcm_s16le",
|
||||
"-shortest",
|
||||
output_path,
|
||||
]
|
||||
|
||||
if self._run_ffmpeg(cmd):
|
||||
return output_path
|
||||
raise RuntimeError("FFmpeg audio mix failed")
|
||||
|
||||
async def compose(
|
||||
self,
|
||||
video_path: str,
|
||||
audio_path: str,
|
||||
output_path: str,
|
||||
subtitle_path: Optional[str] = None
|
||||
) -> str:
|
||||
"""Compose the video"""
|
||||
# Ensure output dir
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
video_duration = self._get_duration(video_path)
|
||||
audio_duration = self._get_duration(audio_path)
|
||||
|
||||
# Audio loop if needed
|
||||
loop_count = 1
|
||||
if audio_duration > video_duration and video_duration > 0:
|
||||
loop_count = int(audio_duration / video_duration) + 1
|
||||
|
||||
cmd = ["ffmpeg", "-y"]
|
||||
|
||||
# Input video (stream_loop must be before -i)
|
||||
if loop_count > 1:
|
||||
cmd.extend(["-stream_loop", str(loop_count)])
|
||||
cmd.extend(["-i", video_path])
|
||||
|
||||
# Input audio
|
||||
cmd.extend(["-i", audio_path])
|
||||
|
||||
# Filter complex
|
||||
filter_complex = []
|
||||
|
||||
# Subtitles (skip for now to mimic previous state or implement basic)
|
||||
# Previous state: subtitles disabled due to font issues
|
||||
# if subtitle_path: ...
|
||||
|
||||
# Audio map with high quality encoding
|
||||
cmd.extend([
|
||||
"-c:v", "libx264",
|
||||
"-preset", "medium", # balance between speed and compression efficiency
|
||||
"-crf", "20", # final output: high quality (visually lossless)
|
||||
"-c:a", "aac",
|
||||
"-b:a", "192k", # audio bitrate
|
||||
"-shortest"
|
||||
])
|
||||
# Use audio from input 1
|
||||
cmd.extend(["-map", "0:v", "-map", "1:a"])
|
||||
|
||||
cmd.append(output_path)
|
||||
|
||||
if self._run_ffmpeg(cmd):
|
||||
return output_path
|
||||
else:
|
||||
raise RuntimeError("FFmpeg composition failed")
|
||||
|
||||
def concat_videos(self, video_paths: list, output_path: str, target_fps: int = 25) -> str:
|
||||
"""Concatenate multiple video segments with the FFmpeg concat demuxer"""
|
||||
if not video_paths:
|
||||
raise ValueError("No video segments to concat")
|
||||
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Generate the concat list file
|
||||
list_path = Path(output_path).parent / f"{Path(output_path).stem}_concat.txt"
|
||||
with open(list_path, "w", encoding="utf-8") as f:
|
||||
for vp in video_paths:
|
||||
f.write(f"file '{vp}'\n")
|
||||
|
||||
cmd = [
|
||||
"ffmpeg", "-y",
|
||||
"-f", "concat",
|
||||
"-safe", "0",
|
||||
"-fflags", "+genpts",
|
||||
"-i", str(list_path),
|
||||
"-an",
|
||||
"-vsync", "cfr",
|
||||
"-r", str(target_fps),
|
||||
"-c:v", "libx264",
|
||||
"-preset", "fast",
|
||||
"-crf", "23",
|
||||
"-pix_fmt", "yuv420p",
|
||||
"-movflags", "+faststart",
|
||||
output_path,
|
||||
]
|
||||
|
||||
try:
|
||||
if self._run_ffmpeg(cmd):
|
||||
return output_path
|
||||
else:
|
||||
raise RuntimeError("FFmpeg concat failed")
|
||||
finally:
|
||||
try:
|
||||
list_path.unlink(missing_ok=True)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
def split_audio(self, audio_path: str, start: float, end: float, output_path: str) -> str:
|
||||
"""Split audio by time range with FFmpeg"""
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
duration = end - start
|
||||
if duration <= 0:
|
||||
raise ValueError(f"Invalid audio split range: start={start}, end={end}, duration={duration}")
|
||||
|
||||
cmd = [
|
||||
"ffmpeg", "-y",
|
||||
"-ss", str(start),
|
||||
"-t", str(duration),
|
||||
"-i", audio_path,
|
||||
"-c", "copy",
|
||||
output_path,
|
||||
]
|
||||
|
||||
if self._run_ffmpeg(cmd):
|
||||
return output_path
|
||||
raise RuntimeError(f"FFmpeg audio split failed: {start}-{end}")
|
||||
|
||||
def get_resolution(self, file_path: str) -> tuple[int, int]:
|
||||
"""Get the effective display resolution of a video (taking rotation metadata into account)."""
|
||||
info = self.get_video_metadata(file_path)
|
||||
return (
|
||||
int(info.get("effective_width") or 0),
|
||||
int(info.get("effective_height") or 0),
|
||||
)
|
||||
|
||||
def prepare_segment(self, video_path: str, target_duration: float, output_path: str,
|
||||
target_resolution: Optional[tuple] = None, source_start: float = 0.0,
|
||||
source_end: Optional[float] = None, target_fps: Optional[int] = None) -> str:
|
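||||
"""Trim or loop a source video to the target duration (no audio).
|
||||
target_resolution: (width, height); pass it to unify the resolution, otherwise the original resolution is kept.
|
||||
source_start: Start point in the source video (seconds), default 0.
|
||||
source_end: End point in the source video (seconds), defaults to the end of the source.
|
||||
target_fps: Output frame rate (optional), used to unify the time base before concatenating multiple sources.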
|
||||
"""
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
video_dur = self._get_duration(video_path)
|
||||
if video_dur <= 0:
|
||||
video_dur = target_duration
|
||||
|
||||
clip_end = video_dur
|
||||
if source_end is not None:
|
||||
try:
|
||||
source_end_value = float(source_end)
|
||||
if source_end_value > source_start:
|
||||
clip_end = min(source_end_value, video_dur)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Available duration = from source_start to the clip end (the end of the video unless source_end is set)
|
||||
available = max(clip_end - source_start, 0.1)
|
||||
needs_loop = target_duration > available
|
||||
needs_scale = target_resolution is not None
|
||||
needs_fps = bool(target_fps and target_fps > 0)
|
||||
has_source_end = clip_end < video_dur
|
||||
|
||||
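# When looping is needed and a trim range exists, cut the clip first and then loop the trimmed file
|
||||
# This prevents stream_loop from looping the whole video instead of the trimmed clip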
|
||||
actual_input = video_path
|
||||
trim_temp = None
|
||||
if needs_loop and (source_start > 0 or has_source_end):
|
||||
trim_temp = str(Path(output_path).parent / (Path(output_path).stem + "_trim_tmp.mp4"))
|
||||
trim_cmd = [
|
||||
"ffmpeg", "-y",
|
||||
"-ss", str(source_start),
|
||||
"-i", video_path,
|
||||
"-t", str(available),
|
||||
"-an",
|
||||
"-c:v", "libx264", "-preset", "fast", "-crf", "23",
|
||||
trim_temp,
|
||||
]
|
||||
if not self._run_ffmpeg(trim_cmd):
|
||||
raise RuntimeError(f"FFmpeg trim for loop failed: {video_path}")
|
||||
actual_input = trim_temp
|
||||
source_start = 0.0 # already trimmed, no further seek needed
|
||||
# Recompute the loop count (based on the trimmed file)
|
||||
available = self._get_duration(trim_temp) or available
|
||||
|
||||
loop_count = int(target_duration / available) + 1 if needs_loop else 0
|
||||
|
||||
cmd = ["ffmpeg", "-y"]
|
||||
if needs_loop:
|
||||
cmd.extend(["-stream_loop", str(loop_count)])
|
||||
if source_start > 0:
|
||||
cmd.extend(["-ss", str(source_start)])
|
||||
cmd.extend(["-i", actual_input, "-t", str(target_duration), "-an"])
|
||||
|
||||
filters = []
|
||||
if needs_fps:
|
||||
filters.append(f"fps={int(target_fps)}")
|
||||
if needs_scale:
|
||||
w, h = target_resolution
|
||||
filters.append(f"scale={w}:{h}:force_original_aspect_ratio=decrease,pad={w}:{h}:(ow-iw)/2:(oh-ih)/2")
|
||||
|
||||
if filters:
|
||||
cmd.extend(["-vf", ",".join(filters)])
|
||||
if needs_fps:
|
||||
cmd.extend(["-vsync", "cfr", "-r", str(int(target_fps))])
|
||||
|
||||
# Re-encoding is required when looping, scaling, or seeking to a start point; otherwise use stream copy to preserve the original quality
|
||||
if needs_loop or needs_scale or source_start > 0 or has_source_end or needs_fps:
|
||||
cmd.extend(["-c:v", "libx264", "-preset", "fast", "-crf", "23"])
|
||||
else:
|
||||
cmd.extend(["-c:v", "copy"])
|
||||
|
||||
cmd.append(output_path)
|
||||
|
||||
try:
|
||||
if self._run_ffmpeg(cmd):
|
||||
return output_path
|
||||
raise RuntimeError(f"FFmpeg prepare_segment failed: {video_path}")
|
||||
finally:
|
||||
# Clean up the temporary trimmed file
|
||||
if trim_temp:
|
||||
try:
|
||||
Path(trim_temp).unlink(missing_ok=True)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
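The rotation handling in `get_video_metadata` reduces to a small piece of arithmetic that can be checked on its own. The sketch below reproduces the normalization shown in the diff (fold the angle into the range -179..180, then swap width and height for quarter turns). The function name `effective_resolution` is hypothetical.

```python
from typing import Tuple


def effective_resolution(width: int, height: int, rotation: int) -> Tuple[int, int, int]:
    """Return (display_width, display_height, normalized_rotation)."""
    norm_rotation = rotation % 360
    if norm_rotation > 180:
        # Fold 181..359 into -179..-1 so that 270 becomes -90.
        norm_rotation -= 360
    if abs(norm_rotation) == 90:
        # A quarter turn swaps the coded width and height on display.
        return height, width, norm_rotation
    return width, height, norm_rotation
```

For example, a clip coded as 1920x1080 with a rotation of 270 would be displayed as 1080x1920, which is the case `normalize_orientation` re-encodes so that later steps do not need to read the side data.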
171
backend/app/services/voice_clone_service.py
Normal file
@@ -0,0 +1,171 @@
|
||||
"""
|
||||
Voice cloning service
|
||||
Calls the standalone CosyVoice 3.0 service over HTTP (port 8010)
|
||||
"""
|
||||
import asyncio
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import httpx
|
||||
from loguru import logger
|
||||
|
||||
# CosyVoice 3.0 service address
|
||||
VOICE_CLONE_URL = "http://localhost:8010"
|
||||
|
||||
|
||||
class VoiceCloneService:
|
||||
"""Voice cloning service - calls the CosyVoice 3.0 HTTP API"""
|
||||
|
||||
def __init__(self):
|
||||
self.base_url = VOICE_CLONE_URL
|
||||
# Health status cache
|
||||
self._health_cache: Optional[dict] = None
|
||||
self._health_cache_time: float = 0
|
||||
# GPU concurrency lock (serial queue)
|
||||
self._lock = asyncio.Lock()
|
||||
|
||||
async def _generate_once(
|
||||
self,
|
||||
*,
|
||||
text: str,
|
||||
ref_audio_data: bytes,
|
||||
ref_text: str,
|
||||
language: str,
|
||||
speed: float = 1.0,
|
||||
max_retries: int = 4,
|
||||
) -> bytes:
|
||||
timeout = httpx.Timeout(240.0)
|
||||
|
||||
for attempt in range(max_retries):
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=timeout) as client:
|
||||
response = await client.post(
|
||||
f"{self.base_url}/generate",
|
||||
files={"ref_audio": ("ref.wav", ref_audio_data, "audio/wav")},
|
||||
data={
|
||||
"text": text,
|
||||
"ref_text": ref_text,
|
||||
"language": language,
|
||||
"speed": str(speed),
|
||||
},
|
||||
)
|
||||
|
||||
retryable = False
|
||||
reason = ""
|
||||
|
||||
if response.status_code in (429, 502, 503, 504):
|
||||
retryable = True
|
||||
reason = f"HTTP {response.status_code}"
|
||||
elif response.status_code == 500 and (
|
||||
"生成超时" in response.text or "timeout" in response.text.lower()
|
||||
):
|
||||
retryable = True
|
||||
reason = "upstream timeout"
|
||||
|
||||
if retryable and attempt < max_retries - 1:
|
||||
wait = 8 * (attempt + 1)
|
||||
logger.warning(
|
||||
f"Voice clone retryable error ({reason}), retrying in {wait}s "
|
||||
f"(attempt {attempt + 1}/{max_retries})"
|
||||
)
|
||||
await asyncio.sleep(wait)
|
||||
continue
|
||||
|
||||
response.raise_for_status()
|
||||
return response.content
|
||||
|
||||
except httpx.HTTPStatusError as e:
|
||||
logger.error(f"Voice clone API error: {e.response.status_code} - {e.response.text}")
|
||||
raise RuntimeError(f"声音克隆服务错误: {e.response.text}")
|
||||
except httpx.RequestError as e:
|
||||
if attempt < max_retries - 1:
|
||||
wait = 6 * (attempt + 1)
|
||||
logger.warning(
|
||||
f"Voice clone connection error: {e}; retrying in {wait}s "
|
||||
f"(attempt {attempt + 1}/{max_retries})"
|
||||
)
|
||||
await asyncio.sleep(wait)
|
||||
continue
|
||||
logger.error(f"Voice clone connection error: {e}")
|
||||
raise RuntimeError("无法连接声音克隆服务,请检查服务是否启动")
|
||||
|
||||
raise RuntimeError("声音克隆服务繁忙,请稍后重试")
|
||||
|
||||
async def generate_audio(
|
||||
self,
|
||||
text: str,
|
||||
ref_audio_path: str,
|
||||
ref_text: str,
|
||||
output_path: str,
|
||||
language: str = "Chinese",
|
||||
speed: float = 1.0,
|
||||
) -> str:
|
||||
"""
|
||||
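Generate speech with voice cloning
|
||||
|
||||
Args:
|
||||
text: Text to synthesize
|
||||
ref_audio_path: Local path of the reference audio
|
||||
ref_text: Transcript of the reference audio
|
||||
output_path: Path of the output wav file
|
||||
language: Language (Chinese/English/Auto)
|
||||
|
||||
Returns:
|
||||
Path of the output file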
|
||||
"""
|
||||
# Use the lock to force serial execution and avoid running out of GPU memory
|
||||
async with self._lock:
|
||||
logger.info(f"🎤 Voice Clone: {text[:30]}... (language={language})")
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
text = text.strip()
|
||||
if not text:
|
||||
raise RuntimeError("文本为空,无法生成语音")
|
||||
|
||||
with open(ref_audio_path, "rb") as f:
|
||||
ref_audio_data = f.read()
|
||||
|
||||
# CosyVoice performs its own text_normalize segmentation internally, so no client-side splitting is needed
|
||||
audio_bytes = await self._generate_once(
|
||||
text=text,
|
||||
ref_audio_data=ref_audio_data,
|
||||
ref_text=ref_text,
|
||||
language=language,
|
||||
speed=speed,
|
||||
)
|
||||
with open(output_path, "wb") as f:
|
||||
f.write(audio_bytes)
|
||||
logger.info(f"✅ Voice clone saved: {output_path}")
|
||||
return output_path
|
||||
|
||||
async def check_health(self) -> dict:
|
||||
"""Health check"""
|
||||
import time
|
||||
|
||||
# 30-second cache
|
||||
now = time.time()
|
||||
cached = self._health_cache
|
||||
if cached is not None and (now - self._health_cache_time) < 30:
|
||||
return cached
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=5.0) as client:
|
||||
response = await client.get(f"{self.base_url}/health")
|
||||
response.raise_for_status()
|
||||
payload = response.json()
|
||||
self._health_cache = payload
|
||||
self._health_cache_time = now
|
||||
return payload
|
||||
except Exception as e:
|
||||
logger.warning(f"Voice clone health check failed: {e}")
|
||||
return {
|
||||
"service": "CosyVoice 3.0 Voice Clone",
|
||||
"model": "unknown",
|
||||
"ready": False,
|
||||
"gpu_id": 0,
|
||||
"error": str(e)
|
||||
}
|
||||
|
||||
|
||||
# Singleton
|
||||
voice_clone_service = VoiceCloneService()
|
||||
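The retry policy in `_generate_once` combines a status-code check with a linearly growing wait. The sketch below extracts both pieces as pure functions so the schedule can be inspected without making HTTP calls. `is_retryable` and `backoff_waits` are hypothetical names, and the 8-second step matches the HTTP-error branch of the diff (the connection-error branch uses 6 seconds).

```python
from typing import List

# Status codes the diff treats as transient.
RETRYABLE_STATUS = {429, 502, 503, 504}


def is_retryable(status_code: int, body: str = "") -> bool:
    """Mirror the retry decision for an HTTP response."""
    if status_code in RETRYABLE_STATUS:
        return True
    # A 500 is retried only when the body indicates an upstream timeout.
    return status_code == 500 and ("生成超时" in body or "timeout" in body.lower())


def backoff_waits(max_retries: int = 4, step: float = 8.0) -> List[float]:
    """Waits (seconds) between attempts; the final attempt is not followed by a wait."""
    return [step * (attempt + 1) for attempt in range(max_retries - 1)]
```

With the default of four attempts, a persistently failing request would wait 8, 16, and 24 seconds before the last attempt's error is surfaced to the caller.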
Some files were not shown because too many files have changed in this diff.