Compare commits
34 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | b289006844 | | |
| | 71b45852bf | | |
| | 23ff4ff86e | | |
| | 091f78174e | | |
| | 190fc2e590 | | |
| | 48bc78fe38 | | |
| | abf005f225 | | |
| | 9de2cb40b4 | | |
| | 29c67f629d | | |
| | 0e3502c6f0 | | |
| | a1604979f0 | | |
| | 08221e48de | | |
| | 42b5cc0c02 | | |
| | 1717635bfd | | |
| | 0a5a17402c | | |
| | bc0fe9326a | | |
| | 035ee29d72 | | |
| | a6cc919e5c | | |
| | 96a298e51c | | |
| | e33dfc3031 | | |
| | 3129d45b25 | | |
| | e226224119 | | |
| | ee342cc40f | | |
| | 1a291a03b8 | | |
| | 1e52346eb4 | | |
| | 945262a7fc | | |
| | be6a3436bb | | |
| | b2c1042c5c | | |
| | aaa8088c82 | | |
| | 31469ca01d | | |
| | 22ea3dd0db | | |
| | 8a5912c517 | | |
| | 74516dbcdb | | |
| | 5357d97012 | | |
18
.gitignore
vendored
@@ -20,11 +20,14 @@ node_modules/
out/
.turbo/

# ============ IDE ============
# ============ IDE / AI tools ============
.vscode/
.idea/
*.swp
*.swo
.agents/
.opencode/
.claude/

# ============ System files ============
.DS_Store
@@ -35,11 +38,22 @@ desktop.ini
backend/outputs/
backend/uploads/
backend/cookies/
backend/user_data/
backend/debug_screenshots/
backend/keys/
*_cookies.json

# ============ MuseTalk ============
# ============ Model weights ============
models/*/checkpoints/
models/MuseTalk/models/
models/MuseTalk/results/
models/LatentSync/temp/

# ============ Remotion build ============
remotion/dist/

# ============ Temporary files ============
Temp/

# ============ Logs ============
*.log

278
Docs/ALIPAY_DEPLOY.md
Normal file
@@ -0,0 +1,278 @@
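# Alipay Paid Membership Activation: Deployment Guide

This document covers the complete deployment process for the Alipay PC website payment feature. After registering, a user pays through Alipay and the membership is activated automatically, valid for 1 year.

---

## Prerequisites

- An Alipay enterprise or sole-proprietor merchant account
- An application created on the [Alipay Open Platform](https://open.alipay.com), with its APPID obtained
- The application has the **"PC Website Payment" (电脑网站支付)** product enabled (the `alipay.trade.page.pay` API)
- The server domain is configured with HTTPS (Alipay callbacks require a publicly reachable address)

---

## Part 1: Alipay Open Platform Configuration

### 1. Create an application

Log in to https://open.alipay.com → Console → Create Application (or use an existing application).

### 2. Enable the "PC Website Payment" product

Go to the application details → Product Binding / Product Management → add **"PC Website Payment"** → submit for review.

> **Note**: If this product is not enabled, requests fail with the `ACQ.ACCESS_FORBIDDEN` error.

### 3. Generate a key pair

Go to the application details → Development Settings → API Signing Method → select **RSA2 (SHA256)**:

1. Generate an RSA 2048-bit key pair with Alipay's official key tool
2. Upload the **application public key** to the Open Platform
3. After the upload, the platform displays the **Alipay public key** (`alipayPublicKey_RSA2`)

You end up with two things:
- **Application private key**: kept locally; the code uses it to sign requests
- **Alipay public key**: returned by the platform; the code uses it to verify callback signatures

> The application public key is only an intermediate artifact used for the upload; the code does not need it.

---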

## Part 2: Server Configuration

### 1. Place the key files

Save the keys in standard PEM format under the `backend/keys/` directory:

```bash
mkdir -p /home/rongye/ProgramFiles/ViGent2/backend/keys
```

**`backend/keys/app_private_key.pem`** (application private key):

```
-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASC...(your private key content)
...
-----END PRIVATE KEY-----
```

**`backend/keys/alipay_public_key.pem`** (Alipay public key):

```
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A...(Alipay public key content)
...
-----END PUBLIC KEY-----
```

#### PEM format requirements

The Alipay key tool exports a single line of plain text, which must be converted to standard PEM format:

- Header and footer markers are required (`-----BEGIN/END ...-----`)
- The key body is wrapped every 64 characters
- The private key header is `-----BEGIN PRIVATE KEY-----` (PKCS#8 format)
- The public key header is `-----BEGIN PUBLIC KEY-----`

If you have a bare single-line key, convert it with the following commands:

```bash
# Format the private key (assuming the raw key is in raw_private.txt)
{
  echo "-----BEGIN PRIVATE KEY-----"
  tr -d '\r\n' < raw_private.txt | fold -w 64
  echo   # fold leaves the last line unterminated
  echo "-----END PRIVATE KEY-----"
} > app_private_key.pem

# Format the public key
{
  echo "-----BEGIN PUBLIC KEY-----"
  tr -d '\r\n' < raw_public.txt | fold -w 64
  echo   # fold leaves the last line unterminated
  echo "-----END PUBLIC KEY-----"
} > alipay_public_key.pem
```

> The `backend/keys/` directory is listed in `.gitignore`, so it is not committed to the repository.
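
The same conversion can be scripted in Python when `fold` is not available. This is a minimal sketch, assuming the raw key is a single Base64 string; the helper name `wrap_pem` is hypothetical and is not part of the project.

```python
def wrap_pem(raw_key: str, label: str) -> str:
    """Wrap a bare Base64 key in PEM markers, 64 characters per line.

    `label` is the marker text, e.g. "PRIVATE KEY" (PKCS#8) or "PUBLIC KEY".
    """
    # Drop spaces and line breaks that may have been copied with the key.
    body = "".join(raw_key.split())
    if not body:
        raise ValueError("raw key is empty")
    # Split the body into 64-character lines, as the PEM requirements ask.
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    header = f"-----BEGIN {label}-----"
    footer = f"-----END {label}-----"
    return "\n".join([header, *lines, footer]) + "\n"
```

A call such as `wrap_pem(open("raw_private.txt").read(), "PRIVATE KEY")` would produce the text to write into `app_private_key.pem`.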

### 2. Configure environment variables

Add the following to `backend/.env`:

```ini
# =============== Alipay configuration ===============
ALIPAY_APP_ID=your_app_id
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
ALIPAY_NOTIFY_URL=https://vigent.hbyrkj.top/api/payment/notify
ALIPAY_RETURN_URL=https://vigent.hbyrkj.top/pay
```

| Variable | Description |
|------|------|
| `ALIPAY_APP_ID` | APPID of the application on the Alipay Open Platform |
| `ALIPAY_PRIVATE_KEY_PATH` | Absolute path to the application private key PEM file |
| `ALIPAY_PUBLIC_KEY_PATH` | Absolute path to the Alipay public key PEM file |
| `ALIPAY_NOTIFY_URL` | Asynchronous callback URL (server-to-server); must be reachable over public HTTPS |
| `ALIPAY_RETURN_URL` | Synchronous return URL (the page the browser is redirected to after the user pays) |

`config.py` also defines a few tunable parameters (they have defaults and normally do not need to be added to .env):

| Variable | Default | Description |
|------|--------|------|
| `ALIPAY_SANDBOX` | `false` | Whether to use the sandbox environment |
| `PAYMENT_AMOUNT` | `999.00` | Membership price (CNY) |
| `PAYMENT_EXPIRE_DAYS` | `365` | Membership validity in days |
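
To illustrate how these defaults interact with `.env` overrides, here is a small sketch of reading the three tunable values from the environment. It is not the project's `config.py`: the `PaymentSettings` class and `load_payment_settings` function are hypothetical, and `Decimal` is used for the amount purely as a choice of this sketch.

```python
import os
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class PaymentSettings:
    sandbox: bool
    amount: Decimal
    expire_days: int


def load_payment_settings(env: Optional[Mapping[str, str]] = None) -> PaymentSettings:
    """Read the tunable payment values, falling back to the documented defaults."""
    env = os.environ if env is None else env
    sandbox_raw = env.get("ALIPAY_SANDBOX", "false").strip().lower()
    return PaymentSettings(
        sandbox=sandbox_raw in ("1", "true", "yes"),
        amount=Decimal(env.get("PAYMENT_AMOUNT", "999.00")),
        expire_days=int(env.get("PAYMENT_EXPIRE_DAYS", "365")),
    )
```

Passing an explicit mapping instead of relying on `os.environ` keeps the function easy to exercise in tests.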

### 3. Create the database table

Run the following against the local Supabase instance through Docker:

```bash
docker exec -i supabase-db psql -U postgres -c "
CREATE TABLE IF NOT EXISTS orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    out_trade_no TEXT UNIQUE NOT NULL,
    amount DECIMAL(10, 2) NOT NULL DEFAULT 999.00,
    status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'paid', 'failed')),
    trade_no TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    paid_at TIMESTAMP WITH TIME ZONE
);

CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id);
CREATE INDEX IF NOT EXISTS idx_orders_out_trade_no ON orders(out_trade_no);
"
```

### 4. Install dependencies

```bash
# Backend (inside the venv)
cd /home/rongye/ProgramFiles/ViGent2/backend
venv/bin/pip install python-alipay-sdk
```

> The frontend needs no additional dependencies.

### 5. Nginx configuration

Make sure Nginx proxies `/api/payment/notify` to the backend. If the existing configuration already covers the `/api/` prefix, no further change is needed:

```nginx
location /api/ {
    proxy_pass http://localhost:8006;
    # ... existing configuration
}
```

### 6. Restart the services

```bash
# Build the frontend
cd /home/rongye/ProgramFiles/ViGent2/frontend
npx next build

# Restart
pm2 restart vigent2-backend
pm2 restart vigent2-frontend
```

---

## Part 3: Going Live

After testing passes, change the test amount in `backend/app/core/config.py` to the production price:

```python
PAYMENT_AMOUNT: float = 999.00  # production price
```

Or override it in `backend/.env`:

```ini
PAYMENT_AMOUNT=999.00
```

Then restart the backend:

```bash
pm2 restart vigent2-backend
```

---
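
## Payment Flow

```
User registers → logs in (password correct but is_active=false)
  → backend returns 403 + payment_token
  → frontend redirects to the /pay page
  → POST /api/payment/create-order → returns the Alipay cashier URL
  → frontend redirects to the Alipay cashier page (supports QR code, account login, balance, and other payment methods)
  → user completes the payment
  → Alipay sends the asynchronous callback POST /api/payment/notify
  → backend verifies the signature → updates the order → activates the user (is_active=true, expires_at=+365 days)
  → Alipay redirects the browser back to /pay?out_trade_no=xxx
  → frontend polls GET /api/payment/status/{out_trade_no}
  → poll returns paid → success message → redirect to the login page
  → user logs in again → enters the system
```

**PC Website Payment vs. Face-to-Face Payment (当面付)**: PC Website Payment (`alipay.trade.page.pay`) redirects to Alipay's official cashier page, where the user can pay by QR code, Alipay account login, balance, and other methods, which gives a better experience. Face-to-Face Payment (`alipay.trade.precreate`) only generates a QR code and supports scan-to-pay only.

Renewal after expiry follows the same flow: expiry is detected at login → PAYMENT_REQUIRED is returned → redirect to /pay.

Manual activation by an administrator is unaffected; both methods coexist.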
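
The "update the order, then activate the user" step has to tolerate repeated notifications, because Alipay retries the callback. The sketch below shows one way to keep that step idempotent. It assumes signature verification has already passed, uses in-memory dataclasses instead of the real `orders` and `users` tables, and every name in it (`Order`, `User`, `apply_paid_notification`) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

PAYMENT_EXPIRE_DAYS = 365  # default quoted in this guide


@dataclass
class Order:
    out_trade_no: str
    user_id: str
    status: str = "pending"
    trade_no: Optional[str] = None
    paid_at: Optional[datetime] = None


@dataclass
class User:
    id: str
    is_active: bool = False
    expires_at: Optional[datetime] = None


def apply_paid_notification(order: Order, user: User, trade_no: str, now: datetime) -> bool:
    """Mark the order paid and activate the user; return True if state changed."""
    if order.status == "paid":
        # A retried notification must not extend the membership a second time.
        return False
    order.status = "paid"
    order.trade_no = trade_no
    order.paid_at = now
    user.is_active = True
    user.expires_at = now + timedelta(days=PAYMENT_EXPIRE_DAYS)
    return True
```

In a real handler the status check and the update would need to happen in one database transaction so that two concurrent notifications cannot both see `pending`.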

---

## Files Involved

| File | Change | Description |
|------|---------|------|
| `backend/requirements.txt` | Modified | Adds `python-alipay-sdk` |
| `backend/database/schema.sql` | Modified | Adds the `orders` table |
| `backend/app/core/config.py` | Modified | Alipay configuration items |
| `backend/app/core/security.py` | Modified | payment_token functions |
| `backend/app/core/deps.py` | Modified | is_active safety fallback |
| `backend/app/repositories/orders.py` | New | Data layer for orders |
| `backend/app/modules/payment/__init__.py` | New | Module initialization |
| `backend/app/modules/payment/schemas.py` | New | Request/response models |
| `backend/app/modules/payment/service.py` | New | Payment business logic (PC Website Payment) |
| `backend/app/modules/payment/router.py` | New | 3 API endpoints |
| `backend/app/modules/auth/router.py` | Modified | Login returns PAYMENT_REQUIRED |
| `backend/app/main.py` | Modified | Registers payment_router |
| `backend/.env` | Modified | Alipay environment variables |
| `backend/keys/` | New | PEM key files |
| `frontend/src/shared/lib/auth.ts` | Modified | login() handles paymentToken |
| `frontend/src/shared/api/axios.ts` | Modified | Adds /pay to PUBLIC_PATHS |
| `frontend/src/app/login/page.tsx` | Modified | Redirect on paymentToken |
| `frontend/src/app/register/page.tsx` | Modified | Registration success message text |
| `frontend/src/app/pay/page.tsx` | New | Payment page (redirects to the Alipay cashier) |

---
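
## FAQ

### RSA key format is not supported

The key file is missing the PEM header/footer markers or is not wrapped at 64 characters. Reformat it as described in "PEM format requirements".

### ACQ.ACCESS_FORBIDDEN

The application has not enabled the "PC Website Payment" product. Add and enable it under Alipay Open Platform → application details → Product Management.

### The Alipay callback never arrives

1. Check that `ALIPAY_NOTIFY_URL` is reachable over public HTTPS
2. Check that Nginx proxies `/api/payment/notify` to the backend
3. When a callback times out (no response within 15 s), Alipay retries it, 8 times in total over 24 hours

### The page does not redirect back after payment

Check that `ALIPAY_RETURN_URL` is configured correctly. It must be the full URL of the frontend `/pay` page (for example `https://vigent.hbyrkj.top/pay`). After the user completes the payment, Alipay redirects the browser to this address with `out_trade_no` and other parameters appended.

### The frontend shows "network error" instead of the specific error

The API function was missing a try/catch around the axios exception. This has been fixed in `register()` and `login()` in `auth.ts`.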
248
Docs/BACKEND_DEV.md
Normal file
@@ -0,0 +1,248 @@
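# ViGent2 Backend Development Standards

This document defines the structural rules, API contracts, and implementation habits for backend development. The goal is for new features to follow one consistent pattern and for legacy logic to be extracted gradually as it is fixed.

## Document Scope

- This document only defines backend development standards and engineering constraints (layer responsibilities, contracts, workflow, coding habits).
- For API descriptions, deployment, and environment configuration examples, see `Docs/BACKEND_README.md`.
- Record historical changes in `Docs/DevLogs/` and `Docs/TASK_COMPLETE.md`, not in this standards document.

---

## 1. Modularization and Layering Principles

Each business feature lives in `app/modules/<feature>/` and is organized as "thin router + thick service/workflow".

- **router.py**: only parameter validation, permission checks, calls into service/workflow, and returning the unified response.
- **schemas.py**: Pydantic request/response models.
- **service.py**: business logic and integration logic (not long-running flows).
- **workflow.py**: orchestration of long-running or heavy tasks (video generation, rendering, asynchronous tasks).
- **__init__.py**: module marker.

Responsibilities of the other layers:

- **repositories/**: data reads and writes (Supabase), with no business logic.
- **services/**: external dependencies and base capabilities (TTS, Storage, Remotion, etc.).
- **core/**: configuration, security, dependency injection, unified response.

---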

## 2. Directory Structure (current convention)

```
backend/
├── app/
│   ├── core/                  # config, deps, security, response
│   ├── modules/               # business modules (routes + logic)
│   │   ├── videos/            # video generation tasks (router/schemas/service/workflow)
│   │   ├── materials/         # material management (router/schemas/service)
│   │   ├── publish/           # multi-platform publishing
│   │   ├── auth/              # authentication and sessions
│   │   ├── ai/                # AI features (title/tag generation, multilingual translation)
│   │   ├── assets/            # static assets (fonts/styles/BGM)
│   │   ├── ref_audios/        # voice-clone reference audio (router/schemas/service)
│   │   ├── generated_audios/  # pre-generated voice-over management (router/schemas/service)
│   │   ├── login_helper/      # QR-code login helper
│   │   ├── tools/             # tool endpoints (router/schemas/service)
│   │   ├── payment/           # Alipay paid activation (router/schemas/service)
│   │   └── admin/             # admin features
│   ├── repositories/          # Supabase data access
│   ├── services/              # external service integrations
│   │   ├── uploader/          # platform publishers (douyin/weixin/xiaohongshu/bilibili)
│   │   ├── qr_login_service.py
│   │   ├── publish_service.py
│   │   ├── remotion_service.py
│   │   ├── storage.py
│   │   └── ...
│   └── tests/
├── assets/                    # fonts / styles / bgm
├── user_data/                 # per-user isolated data (cookies, etc.)
├── scripts/
└── requirements.txt
```

---

## 3. API Contract (Unified Response)

All JSON APIs return one unified structure:

```json
{
  "success": true,
  "message": "ok",
  "data": { },
  "code": 0
}
```

- Normal responses use `success_response`.
- Errors are raised as `HTTPException`; the global exception handler returns them uniformly as `{success:false, message, code}`.
- `detail` is no longer used as the frontend error text (the frontend now reads `message`).
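
As a rough illustration of the contract above, the two response shapes can be built by helpers like these. The source names `success_response` but does not show its signature, so the parameters here are assumptions, and `error_body` is a hypothetical helper standing in for what the global exception handler returns.

```python
from typing import Any, Dict


def success_response(data: Any = None, message: str = "ok", code: int = 0) -> Dict[str, Any]:
    """Build the unified success envelope (signature assumed for this sketch)."""
    return {"success": True, "message": message, "data": data, "code": code}


def error_body(message: str, code: int) -> Dict[str, Any]:
    """Build the unified error envelope; note there is no `detail` field."""
    return {"success": False, "message": message, "code": code}
```

Keeping the error text in `message` matches the rule that the frontend no longer reads `detail`.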
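
### `/api/videos/generate` parameter contract (key conventions)

- Each item of `custom_assignments` uses `material_path/start/end/source_start/source_end?`, and the visible segments on the timeline are authoritative.
- `output_aspect_ratio` only allows `9:16` / `16:9`, default `9:16`.
- Title display mode parameters:
  - `title_display_mode`: `short` / `persistent` (default `short`; applies to both the main title and the subtitle heading)
  - `title_duration`: default `4.0` (seconds); only effective in `short` mode
- Opening secondary-title parameters:
  - `secondary_title`: secondary title text (optional, up to 20 characters); shown only in the video frame and not used in the published title
  - `secondary_title_style_id` / `secondary_title_font_size` / `secondary_title_top_margin`: secondary title style settings
- The workflow/remotion side must pass these fields through unchanged to avoid semantic drift between frontend and backend.

### `/api/videos/cleanup` behavior

- Only the current user's generated artifacts in Storage are cleaned:
  - the `outputs` bucket (generated videos)
  - the `generated-audios` bucket (pre-generated voice-overs, `.wav/.json`)
- The cleanup endpoint uses strict success semantics:
  - success is returned only when every deletion succeeds
  - if any deletion fails, an error is returned; the frontend should keep the cleanup dialog open and allow a retry
- Download endpoint convention: `GET /api/videos/generated/{video_id}/download` must return `Content-Disposition: attachment` so the frontend can offer one-click download and the browser does not switch to inline playback.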
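
The strict success semantics for cleanup can be sketched as "attempt every deletion, then fail if any of them failed". The code below is an illustration only: `cleanup_strict` and `CleanupError` are hypothetical names, and `delete_file` stands for any callable that raises on failure.

```python
from typing import Callable, Iterable, List, Tuple


class CleanupError(Exception):
    """Raised when at least one file could not be deleted."""

    def __init__(self, failed: List[Tuple[str, str]]):
        super().__init__(f"{len(failed)} file(s) could not be deleted")
        self.failed = failed


def cleanup_strict(paths: Iterable[str], delete_file: Callable[[str], None]) -> int:
    """Delete every path; return the count only if all deletions succeeded."""
    failed: List[Tuple[str, str]] = []
    deleted = 0
    for path in paths:
        try:
            delete_file(path)
            deleted += 1
        except Exception as exc:  # collect the failure and keep going
            failed.append((path, str(exc)))
    if failed:
        # Any failure turns the whole request into an error so the caller can retry.
        raise CleanupError(failed)
    return deleted
```

Continuing after a failure, instead of stopping at the first one, means a retry only has to deal with the files that actually remain.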

---

## 4. Authentication and Authorization

- Authentication method: **HttpOnly Cookie** (`access_token`).
- `get_current_user` / `get_current_user_optional` live in `core/deps.py`.
- Single-device session checks use `repositories/sessions.py`.
- High-cost endpoints such as AI/Tools must enforce authentication (`Depends(get_current_user)`); anonymous calls that consume external API quota are forbidden.
- Production requires `DEBUG=false` plus a non-default `JWT_SECRET_KEY`; in production mode the default key must prevent the service from starting.
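
The last rule amounts to a start-up guard. A minimal sketch follows; the project's real default secret is not shown in this document, so `DEFAULT_JWT_SECRET` below is a placeholder, and `validate_security_settings` is a hypothetical function name.

```python
DEFAULT_JWT_SECRET = "change-me"  # placeholder; the project's actual default is not shown here


def validate_security_settings(debug: bool, jwt_secret_key: str) -> None:
    """Refuse to start in production mode while the default JWT secret is in use."""
    if not debug and jwt_secret_key == DEFAULT_JWT_SECRET:
        raise RuntimeError("JWT_SECRET_KEY must be changed when DEBUG=false")
```

Calling such a check once during application start-up makes a misconfigured production deployment fail immediately instead of running with a known secret.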

---

## 5. Tasks and State

- Video generation tasks are orchestrated in one place, `modules/videos/workflow.py`.
- Task state is read and written through `modules/videos/task_store.py`; **do not maintain a global dict directly**.
- Redis (`REDIS_URL`) is used by default, with automatic fallback to in-memory storage when it is unavailable.

---
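
## 6. Files and Storage

- All file upload/download/delete/move operations go through `services/storage.py`.
- Use `move_file` when renaming; avoid reading or writing Storage directly.
- `delete_file` must propagate exceptions and must not swallow errors silently (so the cleanup endpoint cannot report a "false success").
- `list_files` is fault-tolerant by default and returns an empty list; strongly consistent scenarios such as cleanup should use `strict=True`.
- Every user-supplied file path or ID must be validated defensively:
  - `material_id` rejects `..` sequences to prevent path traversal
  - resource IDs such as `video_id` use a whitelist (for example `^[A-Za-z0-9_-]+$`)
- Upload and download paths must enforce size limits:
  - material uploads follow `MAX_UPLOAD_SIZE_MB`
  - reference audio is limited to 5MB
  - for the script extraction tool, both file uploads and URL download results are limited to 500MB
- Errors returned to the frontend use generic text by default; internal stack traces go only to the server log, so paths and implementation details are not leaked.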
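
The two ID checks above can be sketched with the standard library. The function names are hypothetical. One deliberate choice in this sketch: it uses `re.fullmatch` rather than `re.match` with `^...$`, because in Python `$` also matches just before a trailing newline.

```python
import re

# Same character class as the documented whitelist ^[A-Za-z0-9_-]+$
_RESOURCE_ID_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_resource_id(value: str) -> bool:
    """Whitelist check for IDs such as video_id."""
    return _RESOURCE_ID_RE.fullmatch(value) is not None


def is_safe_material_id(value: str) -> bool:
    """Reject empty values and any '..' sequence to block path traversal."""
    return bool(value) and ".." not in value
```

A router would typically answer an invalid ID with HTTP 400 before touching storage.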

### Cookie storage (per-user isolation)

Cookies produced by multi-platform QR-code login are stored separately for each user:

```
backend/user_data/{user_uuid}/cookies/
├── douyin_cookies.json
├── weixin_cookies.json
└── ...
```

- In `publish_service.py`, paths are resolved through `_get_cookies_dir(user_id)` / `_get_cookie_path(user_id, platform)`
- The session key format is `"{user_id}_{platform}"`, so concurrent logins by multiple users do not interfere with each other
- After a successful login the cookie is saved to its path automatically, and it is loaded automatically when publishing
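
The layout and key format above map naturally onto `pathlib`. This is a sketch under assumptions, not the code in `publish_service.py`: the root path constant and the public function names here are illustrative stand-ins for the private helpers the document mentions.

```python
from pathlib import Path

USER_DATA_ROOT = Path("backend/user_data")  # assumed root, taken from the tree above


def get_cookies_dir(user_id: str) -> Path:
    """Directory holding all platform cookies for one user."""
    return USER_DATA_ROOT / user_id / "cookies"


def get_cookie_path(user_id: str, platform: str) -> Path:
    """Cookie file for one user on one platform, e.g. douyin_cookies.json."""
    return get_cookies_dir(user_id) / f"{platform}_cookies.json"


def session_key(user_id: str, platform: str) -> str:
    """Key that separates concurrent login sessions per user and platform."""
    return f"{user_id}_{platform}"
```

Because `user_id` becomes part of a filesystem path, it should be validated before these helpers are called.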
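
---

## 7. Code Conventions

- Validation and response assembly happen only in the router.
- Business logic goes in service/workflow.
- Database access goes in repositories.
- Use `loguru` for all logging.
- GLM SDK calls are consolidated in `services/glm_service.py` (through a single entry method); do not re-assemble `chat.completions.create` calls inside modules.
- For scraping calls used by the script deep-learning feature, the router should pass `current_user.id` through to `creator_scraper` so the user's cookie context is reused and `analysis_id` stays isolated per user.

---

## 8. Suggested Development Workflow

- **New features**: create the module first; it **must** contain `router.py + schemas.py + service.py`. Router-only modules are not allowed.
- **Bug fixes**: while fixing, move the logic involved into the matching service/workflow (incremental refactoring).
- **Changing old modules**: split out only the part being changed; refactoring the whole file at once is not required.
- **Core flow changes**: always run a smoke test (login / generation / publishing).

> **Incremental principle**: new code meets the high standard, old code is improved gradually. Avoid large one-off refactors, which risk introducing regressions.

---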

## 9. Common Environment Variables

- `SUPABASE_URL` / `SUPABASE_KEY`
- `SUPABASE_PUBLIC_URL`
- `REDIS_URL`
- `GLM_API_KEY`
- `LATENTSYNC_*`
- `CORS_ORIGINS` (CORS whitelist, default *)

### MuseTalk / hybrid lip sync
- `MUSETALK_GPU_ID` (GPU index, default 0)
- `MUSETALK_API_URL` (resident service address, default http://localhost:8011)
- `MUSETALK_BATCH_SIZE` (inference batch size, default 32)
- `MUSETALK_VERSION` (v15)
- `MUSETALK_USE_FLOAT16` (half precision, default true)
- `LIPSYNC_DURATION_THRESHOLD` (seconds; durations at or above this value use MuseTalk; the code default is 120, and this repository's current `.env` sets 100)

### WeChat Channels (微信视频号)
- `WEIXIN_HEADLESS_MODE` (headful/headless-new)
- `WEIXIN_CHROME_PATH` / `WEIXIN_BROWSER_CHANNEL`
- `WEIXIN_USER_AGENT` / `WEIXIN_LOCALE` / `WEIXIN_TIMEZONE_ID`
- `WEIXIN_FORCE_SWIFTSHADER`
- `WEIXIN_TRANSCODE_MODE` (reencode/faststart/off)

### Douyin
- `DOUYIN_HEADLESS_MODE` (headful/headless-new, default headless-new)
- `DOUYIN_CHROME_PATH` / `DOUYIN_BROWSER_CHANNEL`
- `DOUYIN_USER_AGENT` (default Chrome/144)
- `DOUYIN_LOCALE` / `DOUYIN_TIMEZONE_ID`
- `DOUYIN_FORCE_SWIFTSHADER`
- `DOUYIN_DEBUG_ARTIFACTS` / `DOUYIN_RECORD_VIDEO` / `DOUYIN_KEEP_SUCCESS_VIDEO`

### Xiaohongshu
- `XIAOHONGSHU_HEADLESS_MODE` (headful/headless-new, default headless-new)
- `XIAOHONGSHU_CHROME_PATH` / `XIAOHONGSHU_BROWSER_CHANNEL`
- `XIAOHONGSHU_USER_AGENT`
- `XIAOHONGSHU_LOCALE` / `XIAOHONGSHU_TIMEZONE_ID`
- `XIAOHONGSHU_FORCE_SWIFTSHADER`
- `XIAOHONGSHU_DEBUG_ARTIFACTS`

### Alipay
- `ALIPAY_APP_ID` / `ALIPAY_PRIVATE_KEY_PATH` / `ALIPAY_PUBLIC_KEY_PATH`
- `ALIPAY_NOTIFY_URL` / `ALIPAY_RETURN_URL`
- `ALIPAY_SANDBOX` (sandbox mode, default false)
- `PAYMENT_AMOUNT` (membership price, default 999.00)
- `PAYMENT_EXPIRE_DAYS` (membership validity in days, default 365)

---

## 10. Debugging Playwright Publishing

- Diagnostic logs are written to disk: `backend/app/debug_screenshots/weixin_network.log` / `douyin_network.log`
- Screenshots of key failures: `backend/app/debug_screenshots/weixin_*.png` / `douyin_*.png` / `xiaohongshu_*.png`
- For WeChat Channels, headful + xvfb-run is recommended (avoids headless decoding and fingerprinting problems)
- Publishing-specific implementation details (login chain, success detection, troubleshooting) are maintained in `Docs/PUBLISH_DEPLOY.md`

---

## 11. Minimal New Module Example

```
app/modules/foo/
├── router.py
├── schemas.py
├── service.py
└── workflow.py
```

The router only calls service/workflow and returns `success_response`.
@@ -1,6 +1,12 @@
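# ViGent2 Backend Development Guide

This document gives backend developers an architecture overview, API conventions, and a development workflow guide.
This document provides an overview of the backend architecture, API descriptions, and runtime configuration.

## 📌 Document Scope

- This document describes the backend service capabilities, APIs, and how to deploy and run them (aimed at usage and integration).
- For development standards, layering constraints, and coding habits, see `Docs/BACKEND_DEV.md`.
- For historical changes and milestones, see `Docs/DevLogs/` and `Docs/TASK_COMPLETE.md`.

---
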
@@ -8,26 +14,36 @@

The backend uses the **FastAPI** framework on Python 3.10+. It is responsible for business logic, AI task scheduling, and interaction with the microservice components.

### Directory structure
### Directory structure (overview)

```
backend/
├── app/
│   ├── api/                  # API route definitions (endpoints)
│   ├── core/                 # core configuration (config.py, security.py)
│   ├── models/               # Pydantic data models (schemas)
│   ├── services/             # business logic service layer
│   │   ├── auth_service.py   # user authentication service
│   │   ├── glm_service.py    # GLM-4 large language model service
│   │   ├── lipsync_service.py # LatentSync lip sync
│   │   ├── publish_service.py # social media publishing
│   │   └── voice_clone_service.py# Qwen3-TTS voice cloning
│   ├── core/                 # core configuration (config.py, security.py, response.py)
│   ├── modules/              # business modules (router/service/workflow/schemas)
│   │   ├── videos/           # video generation tasks (router/schemas/service/workflow)
│   │   ├── materials/        # material management (router/schemas/service)
│   │   ├── publish/          # multi-platform publishing
│   │   ├── auth/             # authentication and sessions
│   │   ├── ai/               # AI features (title/tag generation, multilingual translation)
│   │   ├── assets/           # static assets (fonts/styles/BGM)
│   │   ├── ref_audios/       # voice-clone reference audio (router/schemas/service)
│   │   ├── generated_audios/ # pre-generated voice-over management (router/schemas/service)
│   │   ├── login_helper/     # QR-code login helper
│   │   ├── tools/            # tool endpoints (router/schemas/service)
│   │   ├── payment/          # Alipay paid activation (router/schemas/service)
│   │   └── admin/            # admin features
│   ├── repositories/         # Supabase data access
│   ├── services/             # external service integrations (TTS/Remotion/Storage/Uploader, etc.)
│   └── tests/                # unit and integration tests
├── scripts/                  # operations scripts (watchdog.py, init_db.py)
├── assets/                   # asset library (fonts, bgm, styles)
├── user_data/                # per-user isolated data (cookies, etc.)
└── requirements.txt          # dependency list
```

> For detailed layer responsibilities (router/service/workflow/repositories) and development constraints, see `Docs/BACKEND_DEV.md`.

---

## 🔌 API Conventions
@@ -35,7 +51,7 @@ backend/
The backend service runs on port `8006` by default.

- **Docs URL**: `http://localhost:8006/docs` (Swagger UI)
- **Authentication**: Bearer Token (JWT)
- **Authentication**: HttpOnly Cookie (JWT)

### Core modules

@@ -44,39 +60,158 @@ backend/
   * `POST /api/auth/register`: register a user
   * `GET /api/auth/me`: get the current user's information

> Authorization validity policy: at login and when authenticating protected endpoints, the backend checks `users.expires_at`. An expired account is automatically deactivated (`is_active=false`), its sessions are cleared, and the response is `403: membership expired, please renew`.

2. **Video generation (Videos)**
   * `POST /api/videos/generate`: submit a generation task
   * `GET /api/videos/tasks/{task_id}`: query task status
   * `GET/POST /api/videos/voice-preview`: generate a short voice preview clip (returns a binary audio stream)
   * `POST /api/videos/cleanup`: clean the current user's generated workspace artifacts (outputs + generated-audios)
   * `GET /api/videos/tasks/{task_id}`: query the status of a single task
   * `GET /api/videos/tasks`: list all of the user's tasks
   * `GET /api/videos/generated`: list historical videos
   * `GET /api/videos/generated/{video_id}/download`: download a historical video (`Content-Disposition: attachment`)
   * `DELETE /api/videos/generated/{video_id}`: delete a historical video

> **Correction (16:20)**: The task query and history list endpoints have been updated to `/api/videos/tasks/{task_id}` and `/api/videos/generated`.
> `POST /api/videos/cleanup` uses strict success semantics: it returns success only when every target file is deleted; if any deletion fails it returns an error and asks for a retry.

3. **Material management (Materials)**
   * `POST /api/materials/upload`: upload a material (Direct Upload to Supabase)
   * `POST /api/materials`: upload a material
   * `GET /api/materials`: list materials
   * `PUT /api/materials/{material_id}`: rename a material
   * `GET /api/materials/stream/{material_id}`: stream the material file from the same origin (used for frontend canvas frame capture, avoiding cross-origin CORS taint; the server rejects `..` paths)

4. **Social publishing (Publish)**
   * `POST /api/publish`: publish a video to Bilibili / Douyin / Xiaohongshu
   * `POST /api/publish`: publish a video to Douyin / WeChat Channels / Bilibili / Xiaohongshu
   * `POST /api/publish/login/{platform}`: get the platform QR code and start QR-code login
   * `GET /api/publish/login/status/{platform}`: poll login status (including the Douyin face-verification QR code)
   * `POST /api/publish/logout/{platform}`: log out of the platform (deletes the cookie)
   * `POST /api/publish/cookies/save/{platform}`: save a cookie extracted on the client
   * `GET /api/publish/accounts`: list logged-in accounts
   * `GET /api/publish/screenshot/{filename}`: get the publish-success screenshot (login required)

> Tip: for WeChat Channels and Douyin publishing, running the backend with headful + xvfb-run is recommended. Publishing-specific implementation and deployment notes are in `Docs/PUBLISH_DEPLOY.md`.

5. **Asset library (Assets)**
   * `GET /api/assets/subtitle-styles`: list subtitle styles
   * `GET /api/assets/title-styles`: list title styles
   * `GET /api/assets/bgm`: list background music

6. **Voice cloning (Ref Audios)**
   * `POST /api/ref-audios`: upload reference audio (multipart/form-data; ref_text is transcribed automatically with Whisper)
   * `GET /api/ref-audios`: list reference audio
   * `PUT /api/ref-audios/{id}`: rename reference audio
   * `DELETE /api/ref-audios/{id}`: delete reference audio
   * `POST /api/ref-audios/{id}/retranscribe`: re-recognize the reference audio text (Whisper transcription + automatic trimming beyond 10 s)

7. **AI features (AI)**
   * `POST /api/ai/generate-meta`: AI-generated title and tags (login required)
   * `POST /api/ai/translate`: AI multilingual translation (9 target languages supported, login required)
   * `POST /api/ai/rewrite`: AI script rewriting (login required)

8. **Pre-generated voice-overs (Generated Audios)**
   * `POST /api/generated-audios/generate`: generate a voice-over asynchronously (returns task_id)
   * `GET /api/generated-audios/tasks/{task_id}`: poll generation progress
   * `GET /api/generated-audios`: list all of the user's voice-overs
   * `DELETE /api/generated-audios/{audio_id}`: delete a voice-over
   * `PUT /api/generated-audios/{audio_id}`: rename a voice-over

9. **Tools (Tools)**
   * `POST /api/tools/extract-script`: extract the script from a video link (login required)
   * `POST /api/tools/analyze-creator`: analyze a creator's titles and return trending topics (login required)
   * `POST /api/tools/generate-topic-script`: generate a script from the selected topic (login required)

> Notes on the script deep-learning feature:
> - Platform support: Douyin / Bilibili creator profile links.
> - Scraping strategy: titles are currently scraped through a single Playwright main path (Douyin/Bilibili), combined with the user's logged-in cookie context to improve the success rate.
> - `analysis_id` is bound to `user_id` and has a TTL (default 20 minutes); it is used to read the title context safely in the later "generate script" stage.

10. **Health checks**
   * `GET /api/videos/lipsync/health`: lip-sync service health (covers LatentSync + MuseTalk + the hybrid routing threshold)
   * `GET /api/videos/voiceclone/health`: CosyVoice 3.0 service health

11. **Payment (Payment)**
   * `POST /api/payment/create-order`: create an Alipay PC Website Payment order (requires payment_token)
   * `POST /api/payment/notify`: Alipay asynchronous notification callback (returns plain text success/fail)
   * `GET /api/payment/status/{out_trade_no}`: query the order's payment status (polled by the frontend)

> If the account is not activated or has expired at login, the backend returns 403 + `payment_token` and the frontend redirects to the `/pay` page to complete payment. See the [Alipay deployment guide](ALIPAY_DEPLOY.md).

### Security baseline (production)

- `DEBUG` must be set to `false`: the auth cookie then carries `Secure` and is sent only over HTTPS.
- `JWT_SECRET_KEY` must be a strong random value and must not be the default; when `DEBUG=false` and the default is still in use, the backend refuses to start.
- Upload size limits:
  - `POST /api/materials`: limited by `MAX_UPLOAD_SIZE_MB` (default 500MB)
  - `POST /api/ref-audios`: 5MB
  - `POST /api/tools/extract-script`: file uploads and URL download results are both limited to 500MB
- `video_id` is checked against a whitelist (`^[A-Za-z0-9_-]+$`) in the download and delete endpoints; invalid values return 400 immediately.

### Unified response structure

```json
{
  "success": true,
  "message": "ok",
  "data": { },
  "code": 0
}
```

---

## 🎛️ Extended Video Generation Parameters

`POST /api/videos/generate` supports the following optional fields:

- `material_path`: video material path (single-material mode)
- `material_paths`: array of material paths (multi-camera mode; with 2 or more materials, the video switches automatically per sentence)
- `tts_mode`: TTS mode (`edgetts` / `voiceclone`)
- `voice`: EdgeTTS voice ID (edgetts mode)
- `ref_audio_id` / `ref_text`: reference audio ID and text (voiceclone mode)
- `generated_audio_id`: pre-generated voice-over ID (when present, inline TTS is skipped and the existing voice-over file is used)
- `speed`: speech rate (voice clone mode, default 1.0, range 0.8-1.2)
- `custom_assignments`: array of custom material assignments (each item has `material_path` / `start` / `end` / `source_start` / `source_end?`); when present, generation follows the visible timeline segments first
- `output_aspect_ratio`: output aspect ratio (`9:16` or `16:9`, default `9:16`)
- `lipsync_model`: lip-sync model routing mode (`default` / `fast` / `advanced`)
  - `default`: threshold routing (`LIPSYNC_DURATION_THRESHOLD`)
  - `fast`: force MuseTalk, falling back to LatentSync when it is unavailable
  - `advanced`: force LatentSync
- `language`: TTS locale (default `zh-CN`; mapped to Whisper's `zh/en/...` and CosyVoice's `Chinese/English/Auto`)
- `title`: opening title text
- `title_display_mode`: title display mode (`short` / `persistent`, default `short`; applies to both the main title and the secondary title)
- `title_duration`: title display duration (seconds, default `4.0`; effective in `short` mode)
- `subtitle_style_id`: subtitle style ID
- `title_style_id`: title style ID
- `subtitle_font_size`: subtitle font size (overrides the style default)
- `title_font_size`: title font size (overrides the style default)
- `title_top_margin`: title distance from the top, in pixels
- `secondary_title`: opening secondary title text (optional, up to 20 characters, shown only in the video frame)
- `secondary_title_style_id`: secondary title style ID
- `secondary_title_font_size`: secondary title font size
- `secondary_title_top_margin`: spacing between the secondary title and the main title
- `subtitle_bottom_margin`: subtitle distance from the bottom, in pixels
- `enable_subtitles`: whether subtitles are enabled
- `bgm_id`: background music ID
- `bgm_volume`: background music volume (0-1, default 0.2)
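
The three `lipsync_model` modes listed above can be read as a small routing function. The sketch below is an interpretation, not the project's code: the function name is hypothetical, the 120-second default is the code default quoted for `LIPSYNC_DURATION_THRESHOLD`, and falling back to LatentSync in `default` mode when MuseTalk is unavailable is an assumption of this sketch.

```python
def choose_lipsync_engine(
    mode: str,
    duration_seconds: float,
    threshold_seconds: float = 120.0,
    musetalk_available: bool = True,
) -> str:
    """Return "musetalk" or "latentsync" for one generation request."""
    if mode == "advanced":
        return "latentsync"  # always LatentSync
    if mode == "fast":
        # Prefer MuseTalk, fall back when it is unavailable.
        return "musetalk" if musetalk_available else "latentsync"
    if mode == "default":
        # Threshold routing: durations at or above the threshold go to MuseTalk.
        if duration_seconds >= threshold_seconds and musetalk_available:
            return "musetalk"
        return "latentsync"
    raise ValueError(f"unknown lipsync_model: {mode!r}")
```

Keeping the decision in one pure function makes the routing easy to unit-test separately from the GPU services.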
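
### Multi-material stability notes

- Multi-material segments are all re-encoded before concatenation and forced to `25fps + CFR`, which reduces stutter caused by mismatched time bases at segment boundaries.
- The concat step enables `+genpts` to rebuild timestamps, improving timeline continuity after concatenation.
- MOV materials with rotation metadata are orientation-normalized first, before resolution checks and the rest of the pipeline.
- The compose stage (merging the video and audio tracks) uses `-c:v copy` stream copy when the video **does not need to loop**; it re-encodes only when looping is required.
- FFmpeg subprocesses have timeout protection: `_run_ffmpeg()` 600 seconds and `_get_duration()` 30 seconds, which prevents malformed files from hanging forever.

### Global concurrency control

- The video generation entry point uses `asyncio.Semaphore(2)` to allow at most 2 tasks to run at the same time; tasks waiting in the queue show a "Queued..." status.
- Redis task keys have a TTL: 24 hours at creation and 2 hours in the completed/failed state; `list()` cleans up expired index entries automatically.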
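
The semaphore rule can be demonstrated in a few lines. This is a self-contained sketch rather than the project's workflow code; `run_limited`, `demo`, and the status strings are illustrative names.

```python
import asyncio
from typing import Awaitable, Callable, Dict

MAX_CONCURRENT_TASKS = 2  # matches asyncio.Semaphore(2) in the text


async def run_limited(
    semaphore: asyncio.Semaphore,
    job: Callable[[], Awaitable[None]],
    status: Dict[int, str],
    task_id: int,
) -> None:
    """Run one job, reporting 'queued' until a semaphore slot is free."""
    status[task_id] = "queued"
    async with semaphore:
        status[task_id] = "running"
        await job()
    status[task_id] = "completed"


async def demo(n_jobs: int = 5) -> int:
    """Run n_jobs dummy jobs and return the peak number running at once."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT_TASKS)
    running = 0
    peak = 0

    async def job() -> None:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for the real generation work
        running -= 1

    status: Dict[int, str] = {}
    await asyncio.gather(
        *(run_limited(semaphore, job, status, i) for i in range(n_jobs))
    )
    return peak
```

Running `asyncio.run(demo(5))` shows that no more than two jobs are ever inside the semaphore at once, while the rest wait in the queued state.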
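
### Subtitle timestamp optimization

- Whisper output is smoothed in three steps by `smooth_word_timestamps()`: enforcing monotonic increase, removing overlaps (midpoint split), and filling tiny gaps (<50ms).
- `original_text` rhythm mapping is supported: the characters of the original text are mapped proportionally onto the Whisper timestamps, which resolves mismatches between AI-rewritten or multilingual scripts and the transcription.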
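
The three smoothing steps can be sketched as follows. The source names `smooth_word_timestamps()` and its steps but does not show the implementation, so the word format (dicts with `start` and `end` in seconds) and every detail below are assumptions made for illustration.

```python
from typing import Dict, List


def smooth_word_timestamps(words: List[Dict], max_gap: float = 0.05) -> List[Dict]:
    """Return a smoothed copy of word-level timestamps (times in seconds)."""
    smoothed = [dict(w) for w in words]

    # Step 1: monotonic guarantee. Starts never go backwards, ends never precede starts.
    last_start = 0.0
    for w in smoothed:
        w["start"] = max(w["start"], last_start)
        w["end"] = max(w["end"], w["start"])
        last_start = w["start"]

    for prev, cur in zip(smoothed, smoothed[1:]):
        gap = cur["start"] - prev["end"]
        if gap < 0:
            # Step 2: overlap removal. Split the overlapping region at its midpoint.
            mid = (prev["end"] + cur["start"]) / 2
            prev["end"] = mid
            cur["start"] = mid
            cur["end"] = max(cur["end"], cur["start"])
        elif 0 < gap < max_gap:
            # Step 3: tiny-gap filling. Extend the previous word up to the next start.
            prev["end"] = cur["start"]
    return smoothed
```

Working on a copy keeps the raw Whisper output available for debugging or for a later remapping pass.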

## 📦 Asset Library and Static Assets

- Local asset directory: `backend/assets/{fonts,bgm,styles}`
@@ -106,18 +241,30 @@ pip install -r requirements.txt

### 3. Environment variable configuration

Copy `.env.example` to `.env` and configure the required keys:
This repository uses `backend/.env` as the baseline runtime configuration; replace the sensitive values for your environment and check the key items below (do not commit real secrets in production):

```ini
# Supabase
SUPABASE_URL=http://localhost:8008
SUPABASE_KEY=your_service_role_key

# GLM API (used for AI title generation)
# GLM API (used for AI titles / rewriting / translation / script deep learning)
GLM_API_KEY=your_glm_api_key

# LatentSync configuration
LATENTSYNC_GPU_ID=1

# MuseTalk configuration (lip sync for long videos)
MUSETALK_GPU_ID=0
MUSETALK_API_URL=http://localhost:8011
MUSETALK_BATCH_SIZE=32
LIPSYNC_DURATION_THRESHOLD=100

# MuseTalk tunable parameters (example)
MUSETALK_DETECT_EVERY=2
MUSETALK_BLEND_CACHE_EVERY=2
MUSETALK_ENCODE_CRF=14
MUSETALK_ENCODE_PRESET=slow
```

### 4. Start the service
@@ -129,43 +276,11 @@ uvicorn app.main:app --host 0.0.0.0 --port 8006 --reload

---

## 🧩 Service Integration Guide
## 🧩 Development Conventions and Testing

### Integrating a new model

To integrate a new AI model (for example a new TTS engine):

1. Create a new Service class under `app/services/` (such as `NewTTSService`).
2. Implement a `generate` method; it may call a subprocess or make an HTTP request.
3. **Important**: if the model occupies the GPU, be sure to use `asyncio.Lock` for concurrency control to prevent OOM.
4. Add the corresponding route call in `app/api/`.

### Adding scheduled tasks

**APScheduler** or **Crontab** is currently recommended for managing scheduled tasks.
Scheduled social media publishing currently relies on delayed execution in `playwright`; a migration to a Celery queue is planned.

---

## 🛡️ Error Handling

The whole project uses `Loguru` for logging.

```python
from fastapi import HTTPException
from loguru import logger

try:
    ...  # business logic
except Exception as e:
    logger.error(f"Operation failed: {str(e)}")
    raise HTTPException(status_code=500, detail="Internal server error")
```

---

## 🧪 Testing

Run the test suite:
- For new modules, layer responsibilities, the unified response, error handling, and debugging rules, see `Docs/BACKEND_DEV.md`.
- After changes to core flows, a basic smoke test is recommended: login, video generation, publishing.
- Test command:

```bash
pytest
```

224
Docs/COSYVOICE3_DEPLOY.md
Normal file
@@ -0,0 +1,224 @@
# CosyVoice 3.0 Deployment Guide

## Overview

| Item | Value |
|------|------|
| Model | Fun-CosyVoice3-0.5B-2512 (0.5B parameters) |
| Port | 8010 |
| GPU | 0 (CUDA_VISIBLE_DEVICES=0) |
| Inference precision | FP16 (automatic mixed precision) |
| PM2 name | vigent2-cosyvoice |
| Conda environment | cosyvoice (Python 3.10) |
| Startup script | `run_cosyvoice.sh` |
| Service script | `models/CosyVoice/cosyvoice_server.py` |
| Model load time | ~22-34 seconds |
| GPU memory usage | ~3-5 GB |

## Supported Languages

Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian, plus 18+ Chinese dialects

## Directory Structure

```
models/CosyVoice/
├── cosyvoice_server.py          # FastAPI service (port 8010)
├── cosyvoice/                   # CosyVoice source code
│   └── cli/cosyvoice.py         # AutoModel entry point
├── third_party/Matcha-TTS/      # submodule dependency
├── pretrained_models/
│   ├── Fun-CosyVoice3-0.5B/     # model files (~8.2GB)
│   │   ├── llm.pt               # LLM model (1.9GB)
│   │   ├── llm.rl.pt            # RL model (1.9GB, spare)
│   │   ├── flow.pt              # Flow model (1.3GB)
│   │   ├── hift.pt              # HiFT vocoder (80MB)
│   │   ├── campplus.onnx        # speaker embedding (27MB)
│   │   ├── speech_tokenizer_v3.onnx # speech tokenizer (925MB)
│   │   ├── cosyvoice3.yaml      # model configuration
│   │   └── CosyVoice-BlankEN/   # Qwen tokenizer
│   └── CosyVoice-ttsfrd/        # text normalization resources
│       ├── resource/            # extracted ttsfrd resources
│       └── resource.zip
run_cosyvoice.sh                 # PM2 startup script
```
|
||||
|
||||
## API

### GET /health

Health check. Returns:

```json
{
  "service": "CosyVoice 3.0 Voice Clone",
  "model": "Fun-CosyVoice3-0.5B",
  "ready": true,
  "gpu_id": 0
}
```

### POST /generate

Voice-cloning synthesis.

**Parameters (multipart/form-data):**

| Parameter | Type | Required | Description |
|------|------|------|------|
| ref_audio | File | Yes | Reference audio (WAV) |
| text | string | Yes | Text to synthesize |
| ref_text | string | Yes | Transcript of the reference audio |
| language | string | No | Language (default "Chinese"; CosyVoice detects it automatically) |
| speed | float | No | Speaking rate (default 1.0, range 0.5-2.0, recommended 0.8-1.2) |
| instruct_text | string | No | Tone instruction (default ""; when non-empty, switches to `inference_instruct2` mode) |

**Inference mode branches:**
- `instruct_text` empty → `inference_zero_shot(text, prompt_text, ref_audio)`: plain voice cloning
- `instruct_text` non-empty → `inference_instruct2(text, instruct_text, ref_audio)`: voice cloning with tone/emotion control
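
The branch above can be sketched as a small pure function. This is only an illustration, not the code in `cosyvoice_server.py`: the helper name `select_inference_mode` is hypothetical, and it returns the method name and argument order listed above instead of calling the model.

```python
def select_inference_mode(text, ref_text, ref_audio, instruct_text=""):
    """Return (method_name, positional_args) for a /generate request.

    Hypothetical helper: mirrors the documented branch without touching
    the real CosyVoice model object.
    """
    if instruct_text:
        # Non-empty instruction: voice cloning with tone/emotion control.
        return "inference_instruct2", (text, instruct_text, ref_audio)
    # Empty instruction: plain zero-shot voice cloning.
    return "inference_zero_shot", (text, ref_text, ref_audio)
```

A caller could then dispatch with `getattr(model, name)(*args)`, assuming the model object exposes both methods with these positional arguments.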
|
||||
|
||||
**Example tone instructions:**
```
"You are a helpful assistant. 请非常开心地说一句话。<|endofprompt|>"
"You are a helpful assistant. 请非常伤心地说一句话。<|endofprompt|>"
"You are a helpful assistant. 请非常生气地说一句话。<|endofprompt|>"
```

The Chinese portions ask the model to say a sentence very happily, very sadly, and very angrily, respectively.

**Returns:** a WAV audio file

**Status codes:**
- 200: success
- 429: GPU busy, retry
- 500: generation failed or timed out
- 503: model not loaded or service poisoned
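
Since 429 means "GPU busy, retry", a client needs some retry policy. The sketch below is one possible policy (exponential backoff), not something the source prescribes: `call_with_retry`, `max_attempts`, and `base_delay` are hypothetical names, and `send` stands for any callable that performs the request and returns `(status_code, body)`.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    Waits base_delay * 2**attempt seconds between attempts and returns the
    last (status, body) pair if every attempt was rejected with 429.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return status, body
```

Other statuses (500, 503) are returned to the caller unchanged, because the status list above only marks 429 as retryable.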
|
||||
|
||||
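## Safety Mechanisms

1. **GPU inference lock** (`asyncio.Lock`): prevents concurrent inference from corrupting GPU state
2. **429 rejection**: returns 429 immediately while the lock is held; the client retries
3. **Timeout protection**: `60 + len(text) * 2` seconds, capped at 300 seconds
4. **Poisoned flag**: after a timeout the service is marked as poisoned and the health check returns `ready: false`
5. **Forced exit**: 1.5 seconds after a timeout the process calls `os._exit(1)` and PM2 restarts it automatically
6. **Startup self-test**: on startup, runs one real inference on a short text to verify the GPU inference path; on failure it sets `_model_loaded = False` so the health check returns `ready: false`, avoiding false positives
7. **Automatic reference-audio trimming**: reference audio longer than 10 seconds is trimmed to its first 10 seconds (CosyVoice recommends 3-10 seconds) to avoid sampling anomalies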
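
Mechanisms 1 to 3 can be sketched together as follows. This is a simplified illustration under stated assumptions, not the server's actual handler: `guarded_generate` and `run_inference` are hypothetical names, and the poisoned flag and the forced `os._exit(1)` are deliberately left out.

```python
import asyncio


def inference_timeout_seconds(text: str) -> int:
    """Timeout rule from the list above: 60 + len(text) * 2, capped at 300."""
    return min(60 + len(text) * 2, 300)


async def guarded_generate(lock: asyncio.Lock, run_inference, text: str):
    """Run one inference under the GPU lock and return (status_code, result).

    run_inference is a hypothetical coroutine function taking the text.
    """
    if lock.locked():
        # Another request holds the GPU: reject immediately instead of queueing.
        return 429, None
    async with lock:
        try:
            result = await asyncio.wait_for(
                run_inference(text), timeout=inference_timeout_seconds(text)
            )
        except asyncio.TimeoutError:
            # The real service would also mark itself poisoned and exit here.
            return 500, None
    return 200, result
```

There is no `await` between the `lock.locked()` check and acquiring the lock, so two requests on the same event loop cannot both pass the check.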
|
||||
|
||||
## Operations Commands

```bash
# Start
pm2 start run_cosyvoice.sh --name vigent2-cosyvoice

# Restart
pm2 restart vigent2-cosyvoice

# View logs
pm2 logs vigent2-cosyvoice --lines 50

# Health check
curl http://localhost:8010/health

# Stop
pm2 stop vigent2-cosyvoice
```
|
||||
|
||||
## Deploying from Scratch

### 1. Clone the repository

```bash
cd /home/rongye/ProgramFiles/ViGent2/models
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice
git submodule update --init --recursive
```

### 2. Create the Conda environment

```bash
conda create -n cosyvoice -y python=3.10
conda activate cosyvoice
```
|
||||
|
||||
### 3. Install dependencies

Note: do not run `pip install -r requirements.txt` directly; there are version conflicts that must be handled first.

```bash
# Install PyTorch 2.3.1 (CUDA 12.1) first; the version requirement is strict
pip install torch==2.3.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121

# Core inference dependencies
pip install conformer==0.3.2 HyperPyYAML==1.2.2 inflect==7.3.1 \
  librosa==0.10.2 lightning==2.2.4 modelscope==1.20.0 omegaconf==2.3.0 \
  pydantic==2.7.0 soundfile==0.12.1 fastapi==0.115.6 uvicorn==0.30.0 \
  transformers==4.51.3 protobuf==4.25 hydra-core==1.3.2 \
  rich==13.7.1 diffusers==0.29.0 x-transformers==2.11.24 wetext==0.0.4

# onnxruntime-gpu
pip install onnxruntime-gpu==1.18.0 \
  --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/

# Other required dependencies
pip install gdown matplotlib pyarrow wget onnx python-multipart httpx

# openai-whisper needs setuptools < 71 (which provides pkg_resources)
pip install "setuptools<71"
pip install --no-build-isolation openai-whisper==20231117

# pyworld needs g++ and Cython
pip install Cython
PATH="/usr/bin:$PATH" pip install pyworld==0.3.4

# Critical version pins
pip install "numpy<2"             # onnxruntime-gpu is incompatible with numpy 2.x
pip install "ruamel.yaml<0.18"    # hyperpyyaml is incompatible with ruamel.yaml 0.19+
```

> **Important**: CosyVoice requires torch==2.3.1. torch 2.10+ causes CUBLAS_STATUS_INVALID_VALUE errors.
> torch 2.3.1+cu121 ships nvidia-cudnn-cu12, so the onnxruntime CUDAExecutionProvider works normally.
|
||||
|
||||
### 4. Download the models

```bash
# Uses huggingface_hub (hf-mirror.com for mainland China)
HF_ENDPOINT=https://hf-mirror.com python -c "
from huggingface_hub import snapshot_download
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
snapshot_download('FunAudioLLM/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
"
```

### 5. Install ttsfrd (optional, improves text normalization quality)

```bash
cd pretrained_models/CosyVoice-ttsfrd/
unzip resource.zip -d .
pip install ttsfrd_dependency-0.1-py3-none-any.whl
pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl
```

### 6. Register with PM2

```bash
pm2 start run_cosyvoice.sh --name vigent2-cosyvoice
pm2 save
```
|
||||
|
||||
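## Known Issues

1. **ttsfrd "prepare tts engine failed"**: an internal log from the ttsfrd C library; the Python layer initializes successfully, so it does not affect usage
2. **Sliding Window Attention warning**: a notice from the transformers library; it does not affect inference results
3. **onnxruntime Memcpy performance notice**: `Memcpy nodes are not supported by the CUDA EP` is only a performance-advice log and does not affect functionality

> Note: the libcudnn.so.8 issue is resolved in the torch 2.3.1+cu121 environment (it ships nvidia-cudnn-cu12), and the onnxruntime CUDAExecutionProvider loads normally.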
|
||||
|
||||
## Comparison with Qwen3-TTS

| Feature | Qwen3-TTS (retired) | CosyVoice 3.0 (current) |
|------|-----------|----------------|
| Port | 8009 | 8010 |
| Model size | 0.6B | 0.5B |
| Languages | Chinese/English/Japanese/Korean | 9 languages + 18 dialects |
| Cloning method | ref_audio + ref_text | ref_audio + ref_text |
| Prompt format | ref_text passed directly | `You are a helpful assistant.<\|endofprompt\|>` + ref_text |
| Built-in segmentation | None; the client must segment | Built-in text_normalize segments automatically |
| Status | Retired (PM2 stopped) | In production |
|
||||
@@ -7,8 +7,8 @@
|
||||
| 服务器 | Dell PowerEdge R730 |
|
||||
| CPU | 2× Intel Xeon E5-2680 v4 (56 线程) |
|
||||
| 内存 | 192GB DDR4 |
|
||||
| GPU 0 | NVIDIA RTX 3090 24GB |
|
||||
| GPU 1 | NVIDIA RTX 3090 24GB (用于 LatentSync) |
|
||||
| GPU 0 | NVIDIA RTX 3090 24GB (MuseTalk + CosyVoice) |
|
||||
| GPU 1 | NVIDIA RTX 3090 24GB (LatentSync) |
|
||||
| 部署路径 | `/home/rongye/ProgramFiles/ViGent2` |
|
||||
|
||||
---
|
||||
@@ -28,8 +28,17 @@ node --version
|
||||
# 检查 FFmpeg
|
||||
ffmpeg -version
|
||||
|
||||
# 检查 Chrome (视频号发布)
|
||||
google-chrome --version
|
||||
|
||||
# 检查 Xvfb
|
||||
xvfb-run --help
|
||||
|
||||
# 检查 pm2 (用于服务管理)
|
||||
pm2 --version
|
||||
|
||||
# 检查 Redis (任务状态存储,推荐)
|
||||
redis-server --version
|
||||
```
|
||||
|
||||
如果缺少依赖:
|
||||
@@ -37,8 +46,17 @@ pm2 --version
|
||||
sudo apt update
|
||||
sudo apt install ffmpeg
|
||||
|
||||
# 安装 Xvfb (视频号发布)
|
||||
sudo apt install xvfb
|
||||
|
||||
# 安装 pm2
|
||||
npm install -g pm2
|
||||
|
||||
# 安装 Chrome (视频号发布)
|
||||
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/google-linux-signing-keyring.gpg
|
||||
printf "deb [arch=amd64 signed-by=/usr/share/keyrings/google-linux-signing-keyring.gpg] http://dl.google.com/linux/chrome/deb/ stable main\n" | sudo tee /etc/apt/sources.list.d/google-chrome.list > /dev/null
|
||||
sudo apt update
|
||||
sudo apt install -y google-chrome-stable
|
||||
```
|
||||
|
||||
---
|
||||
@@ -54,7 +72,9 @@ cd /home/rongye/ProgramFiles/ViGent2
|
||||
|
||||
---
|
||||
|
||||
## 步骤 3: 部署 AI 模型 (LatentSync 1.6)
|
||||
## 步骤 3: 部署 AI 模型
|
||||
|
||||
### 3a. LatentSync 1.6 (短视频唇形同步, GPU1)
|
||||
|
||||
> ⚠️ **重要**:LatentSync 需要独立的 Conda 环境和 **~18GB VRAM**。请**不要**直接安装在后端环境中。
|
||||
|
||||
@@ -75,6 +95,26 @@ conda activate latentsync
|
||||
python -m scripts.server # 测试能否启动,Ctrl+C 退出
|
||||
```
|
||||
|
||||
### 3b. MuseTalk 1.5 (长视频唇形同步, GPU0)
|
||||
|
||||
> MuseTalk 是单步潜空间修复模型(非扩散模型),推理速度接近实时,适合达到路由阈值的长视频(本仓库当前 `.env` 示例为 >=100s)。与 CosyVoice 共享 GPU0,fp16 推理约需 4-8GB 显存。合成阶段已改为 FFmpeg rawvideo 管道直编码(`libx264` + 可配 CRF/preset)并保留 numpy blending,减少中间有损文件。
|
||||
|
||||
请参考详细的独立部署指南:
|
||||
**[MuseTalk 部署指南](MUSETALK_DEPLOY.md)**
|
||||
|
||||
简要步骤:
|
||||
1. 创建独立的 `musetalk` Conda 环境 (Python 3.10 + PyTorch 2.0.1 + CUDA 11.8)
|
||||
2. 安装 mmcv/mmdet/mmpose 等依赖
|
||||
3. 下载模型权重 (`download_weights.sh`)
|
||||
4. 创建必要的软链接 (`musetalk/config.json`, `musetalk/musetalkV15`)
|
||||
|
||||
**验证 MuseTalk 部署**:
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
|
||||
# 另一个终端: curl http://localhost:8011/health
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 步骤 4: 安装后端依赖
|
||||
@@ -96,14 +136,30 @@ pip install -r requirements.txt
|
||||
playwright install chromium
|
||||
```
|
||||
|
||||
> 提示:视频号发布建议使用系统 Chrome + xvfb-run(避免 headless 解码失败)。
|
||||
> 抖音发布同样建议 headful 模式 (`DOUYIN_HEADLESS_MODE=headful`)。
|
||||
> 四平台发布专项实现说明请见 `Docs/PUBLISH_DEPLOY.md`。
|
||||
|
||||
### 扫码登录注意事项
|
||||
|
||||
- **Cookie 按用户隔离**:每个用户的 Cookie 存储在 `backend/user_data/{uuid}/cookies/` 目录下,多用户并发登录互不干扰。
|
||||
- **抖音 QR 登录关键教训**:
|
||||
- 扫码后绝对**不能重新加载 QR 页面**,否则会销毁会话 token
|
||||
- 使用**新标签页**检测登录完成状态(检查 URL 包含 `creator-micro` + session cookies 存在)
|
||||
- 抖音可能弹出**刷脸验证**,后端会自动提取验证二维码返回给前端展示
|
||||
- **小红书 QR 登录关键点**:
|
||||
- 创作平台默认可能是短信登录视图,需先切换到扫码登录再抓取二维码
|
||||
- 扫码后可能跳转 `creator.xiaohongshu.com/new/home`,不一定命中旧 `publish` 成功指示 URL
|
||||
- **微信视频号发布**:标题、描述、标签统一写入"视频描述"字段
|
||||
|
||||
---
|
||||
|
||||
### 可选:AI 标题/标签生成
|
||||
### 可选:AI 标题/标签生成
|
||||
|
||||
> ✅ 如需启用“AI 标题/标签生成”功能,请确保后端可访问外网 API。
|
||||
|
||||
- 需要可访问 `https://open.bigmodel.cn`
|
||||
- API Key 配置在 `backend/app/services/glm_service.py`(建议替换为自己的密钥)
|
||||
- 需要可访问 `https://open.bigmodel.cn`
|
||||
- API Key 配置在 `backend/.env` 的 `GLM_API_KEY`
|
||||
|
||||
---
|
||||
|
||||
@@ -135,31 +191,77 @@ playwright install chromium
|
||||
CREATE POLICY "Allow public read" ON storage.objects FOR SELECT TO anon USING (bucket_id = 'materials' OR bucket_id = 'outputs');
|
||||
EOF
|
||||
```
|
||||
|
||||
> **注意**:后端启动时会自动创建额外的存储桶(`ref-audios`、`generated-audios`),无需手动创建。
|
||||
|
||||
---
|
||||
|
||||
## 步骤 7: 配置环境变量
|
||||
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
```
|
||||
|
||||
> 💡 **说明**:当前仓库直接使用 `backend/.env`。请按你的环境替换敏感值并确认以下参数。
|
||||
> 如需自定义,可编辑 `.env` 修改以下参数:
|
||||
|
||||
| 配置项 | 当前示例值 | 说明 |
|
||||
|--------|------------|------|
|
||||
| `SUPABASE_URL` | `http://localhost:8008` | Supabase API 内部地址 |
|
||||
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Supabase API 公网地址 (前端访问) |
|
||||
| `LATENTSYNC_GPU_ID` | 1 | GPU 选择 (0 或 1) |
|
||||
| `LATENTSYNC_USE_SERVER` | true | 设为 true 以启用常驻服务加速 |
|
||||
| `LATENTSYNC_INFERENCE_STEPS` | 30 | 推理步数 (16-50) |
|
||||
| `LATENTSYNC_GUIDANCE_SCALE` | 1.9 | 引导系数 (1.0-3.0) |
|
||||
| `LATENTSYNC_ENABLE_DEEPCACHE` | true | DeepCache 推理加速 |
|
||||
| `LATENTSYNC_SEED` | 1247 | 固定随机种子(可复现) |
|
||||
| `DEBUG` | false | 生产环境必须为 false(仅开发环境可设 true) |
|
||||
| `JWT_SECRET_KEY` | 强随机值 | 生产环境禁止默认值;默认值在 `DEBUG=false` 下会阻止后端启动 |
|
||||
| `REDIS_URL` | `redis://localhost:6379/0` | 任务状态存储(不可用时回退内存) |
|
||||
| `WEIXIN_HEADLESS_MODE` | headless-new | 视频号 Playwright 模式 (headful/headless-new) |
|
||||
| `WEIXIN_CHROME_PATH` | `/usr/bin/google-chrome` | 系统 Chrome 路径 |
|
||||
| `WEIXIN_BROWSER_CHANNEL` | | Chromium 通道 (可选) |
|
||||
| `WEIXIN_USER_AGENT` | Chrome 120 UA | 视频号浏览器指纹 UA |
|
||||
| `WEIXIN_LOCALE` | zh-CN | 视频号语言环境 |
|
||||
| `WEIXIN_TIMEZONE_ID` | Asia/Shanghai | 视频号时区 |
|
||||
| `WEIXIN_FORCE_SWIFTSHADER` | true | 强制软件 WebGL,避免 context lost |
|
||||
| `WEIXIN_TRANSCODE_MODE` | reencode | 上传前转码 (reencode/faststart/off) |
|
||||
| `DOUYIN_HEADLESS_MODE` | headless-new | 抖音 Playwright 模式 (headful/headless-new) |
|
||||
| `DOUYIN_CHROME_PATH` | `/usr/bin/google-chrome` | 抖音 Chrome 路径 |
|
||||
| `DOUYIN_BROWSER_CHANNEL` | | 抖音 Chromium 通道 (可选) |
|
||||
| `DOUYIN_USER_AGENT` | Chrome/144 UA | 抖音浏览器指纹 UA |
|
||||
| `DOUYIN_LOCALE` | zh-CN | 抖音语言环境 |
|
||||
| `DOUYIN_TIMEZONE_ID` | Asia/Shanghai | 抖音时区 |
|
||||
| `DOUYIN_FORCE_SWIFTSHADER` | true | 强制软件 WebGL |
|
||||
| `XIAOHONGSHU_HEADLESS_MODE` | headless-new | 小红书 Playwright 模式 (headful/headless-new) |
|
||||
| `XIAOHONGSHU_CHROME_PATH` | `/usr/bin/google-chrome` | 小红书 Chrome 路径 |
|
||||
| `XIAOHONGSHU_BROWSER_CHANNEL` | | 小红书 Chromium 通道 (可选) |
|
||||
| `XIAOHONGSHU_USER_AGENT` | Chrome/144 UA | 小红书浏览器指纹 UA |
|
||||
| `XIAOHONGSHU_LOCALE` | zh-CN | 小红书语言环境 |
|
||||
| `XIAOHONGSHU_TIMEZONE_ID` | Asia/Shanghai | 小红书时区 |
|
||||
| `XIAOHONGSHU_FORCE_SWIFTSHADER` | true | 强制软件 WebGL |
|
||||
| `DOUYIN_DEBUG_ARTIFACTS` | false | 保留调试截图 |
|
||||
| `DOUYIN_RECORD_VIDEO` | false | 录制浏览器操作视频 |
|
||||
| `DOUYIN_KEEP_SUCCESS_VIDEO` | false | 成功后保留录屏 |
|
||||
| `CORS_ORIGINS` | `*` | CORS 允许源 (生产环境建议白名单) |
|
||||
| `MUSETALK_GPU_ID` | 0 | MuseTalk GPU 编号 |
|
||||
| `MUSETALK_API_URL` | `http://localhost:8011` | MuseTalk 常驻服务地址 |
|
||||
| `MUSETALK_BATCH_SIZE` | 32 | MuseTalk 推理批大小 |
|
||||
| `MUSETALK_VERSION` | v15 | MuseTalk 模型版本 |
|
||||
| `MUSETALK_USE_FLOAT16` | true | MuseTalk 半精度加速 |
|
||||
| `LIPSYNC_DURATION_THRESHOLD` | 100 | 秒,>=此值用 MuseTalk,<此值用 LatentSync(代码默认 120,建议在 `.env` 显式配置) |
|
||||
| `ALIPAY_APP_ID` | 空 | 支付宝应用 APPID |
|
||||
| `ALIPAY_PRIVATE_KEY_PATH` | 空 | 应用私钥 PEM 文件路径 |
|
||||
| `ALIPAY_PUBLIC_KEY_PATH` | 空 | 支付宝公钥 PEM 文件路径 |
|
||||
| `ALIPAY_NOTIFY_URL` | 空 | 支付宝异步回调地址 (公网 HTTPS) |
|
||||
| `ALIPAY_RETURN_URL` | 空 | 支付完成后浏览器跳转地址 |
|
||||
| `PAYMENT_AMOUNT` | `999.00` | 会员价格 (元) |
|
||||
| `PAYMENT_EXPIRE_DAYS` | `365` | 会员有效天数 |
|
||||
|
||||
# 复制配置模板
|
||||
cp .env.example .env
|
||||
```
|
||||
|
||||
> 💡 **说明**:`.env.example` 已包含正确的默认配置,直接复制即可使用。
|
||||
> 如需自定义,可编辑 `.env` 修改以下参数:
|
||||
|
||||
| 配置项 | 默认值 | 说明 |
|
||||
|--------|--------|------|
|
||||
| `SUPABASE_URL` | `http://localhost:8008` | Supabase API 内部地址 |
|
||||
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Supabase API 公网地址 (前端访问) |
|
||||
| `LATENTSYNC_GPU_ID` | 1 | GPU 选择 (0 或 1) |
|
||||
| `LATENTSYNC_USE_SERVER` | false | 设为 true 以启用常驻服务加速 |
|
||||
| `LATENTSYNC_INFERENCE_STEPS` | 20 | 推理步数 (20-50) |
|
||||
| `LATENTSYNC_GUIDANCE_SCALE` | 1.5 | 引导系数 (1.0-3.0) |
|
||||
| `DEBUG` | true | 生产环境改为 false |
|
||||
> 支付宝完整配置步骤(密钥生成、PEM 格式、产品开通等)请参考 **[支付宝部署指南](ALIPAY_DEPLOY.md)**。
|
||||
|
||||
> 认证相关强约束:当 `DEBUG=false` 时,后端登录 Cookie 会带 `Secure`,前端必须通过 HTTPS 域名访问,HTTP 端口直连无法保持登录态。
|
||||
|
||||
---
|
||||
|
||||
@@ -189,6 +291,12 @@ source venv/bin/activate
|
||||
uvicorn app.main:app --host 0.0.0.0 --port 8006
|
||||
```
|
||||
|
||||
推荐使用项目脚本启动后端(已内置 xvfb + headful 发布环境):
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
./run_backend.sh # 默认 8006,可用 PORT 覆盖
|
||||
```
|
||||
|
||||
### 启动前端 (终端 2)
|
||||
|
||||
```bash
|
||||
@@ -203,12 +311,19 @@ cd /home/rongye/ProgramFiles/ViGent2/models/LatentSync
|
||||
conda activate latentsync
|
||||
python -m scripts.server
|
||||
```
|
||||
|
||||
### 验证
|
||||
|
||||
1. 访问 http://服务器IP:3002 查看前端
|
||||
2. 访问 http://服务器IP:8006/docs 查看 API 文档
|
||||
3. 上传测试视频,生成口播视频
|
||||
### 启动 MuseTalk (终端 4, 长视频唇形同步)
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
|
||||
```
|
||||
|
||||
### 验证
|
||||
|
||||
1. 访问 `https://你的前端域名` 查看前端(生产环境不要用 HTTP 端口直连)
|
||||
2. 访问 `http://服务器IP:8006/docs` 查看 API 文档(仅内网/运维调试)
|
||||
3. 上传测试视频,生成口播视频
|
||||
|
||||
---
|
||||
|
||||
@@ -223,9 +338,19 @@ python -m scripts.server
|
||||
1. 创建启动脚本 `run_backend.sh`:
|
||||
```bash
|
||||
cat > run_backend.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/backend
|
||||
./venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 8006
|
||||
#!/usr/bin/env bash
|
||||
set -e
|
||||
BASE_DIR="$(cd "$(dirname "$0")" && pwd)"
|
||||
export WEIXIN_HEADLESS_MODE=headful
|
||||
export DOUYIN_HEADLESS_MODE=headful
|
||||
export WEIXIN_DEBUG_ARTIFACTS=false
|
||||
export WEIXIN_RECORD_VIDEO=false
|
||||
export DOUYIN_DEBUG_ARTIFACTS=false
|
||||
export DOUYIN_RECORD_VIDEO=false
|
||||
PORT=${PORT:-8006}
|
||||
cd "$BASE_DIR/backend"
|
||||
exec xvfb-run --auto-servernum --server-args="-screen 0 1920x1080x24" \
|
||||
./venv/bin/uvicorn app.main:app --host 0.0.0.0 --port "$PORT"
|
||||
EOF
|
||||
chmod +x run_backend.sh
|
||||
```
|
||||
@@ -267,34 +392,48 @@ chmod +x run_latentsync.sh
|
||||
pm2 start ./run_latentsync.sh --name vigent2-latentsync
|
||||
```
|
||||
|
||||
### 4. 启动 Qwen3-TTS 声音克隆服务 (可选)
|
||||
### 4. 启动 CosyVoice 3.0 声音克隆服务 (可选)
|
||||
|
||||
> 如需使用声音克隆功能,需要启动此服务。
|
||||
> 如需使用声音克隆功能,需要启动此服务。详细部署步骤见 [CosyVoice 3.0 部署文档](COSYVOICE3_DEPLOY.md)。
|
||||
|
||||
1. 安装 HTTP 服务依赖:
|
||||
```bash
|
||||
conda activate qwen-tts
|
||||
pip install fastapi uvicorn python-multipart
|
||||
```
|
||||
1. 启动脚本位于项目根目录: `run_cosyvoice.sh`
|
||||
|
||||
2. 启动脚本位于项目根目录: `run_qwen_tts.sh`
|
||||
|
||||
3. 使用 pm2 启动:
|
||||
2. 使用 pm2 启动:
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
|
||||
pm2 start ./run_cosyvoice.sh --name vigent2-cosyvoice
|
||||
pm2 save
|
||||
```
|
||||
|
||||
4. 验证服务:
|
||||
3. 验证服务:
|
||||
```bash
|
||||
# 检查健康状态
|
||||
curl http://localhost:8009/health
|
||||
curl http://localhost:8010/health
|
||||
```
|
||||
|
||||
### 5. 启动服务看门狗 (Watchdog)
|
||||
### 5. 启动 MuseTalk 长视频唇形同步服务
|
||||
|
||||
> 🛡️ **推荐**:监控 Qwen-TTS 和 LatentSync 服务健康状态,卡死时自动重启。
|
||||
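> Videos that reach the threshold (>=100s in the current `.env` example) are routed to MuseTalk automatically. When MuseTalk is unavailable, the backend falls back to LatentSync.
> For detailed deployment steps, see the [MuseTalk deployment guide](MUSETALK_DEPLOY.md).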
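
The routing rule in the note above can be sketched as a small function. This is a minimal illustration, not the backend's routing code: `choose_lipsync_engine` is a hypothetical name, the default threshold mirrors the `.env` example value of 100 seconds, and the availability flag is assumed to come from a health check against the MuseTalk service.

```python
def choose_lipsync_engine(duration_s: float,
                          threshold_s: float = 100.0,
                          musetalk_available: bool = True) -> str:
    """Pick the lip-sync engine for a video of the given duration.

    Durations at or above the threshold go to MuseTalk; shorter videos,
    or any video while MuseTalk is down, go to LatentSync.
    """
    if duration_s >= threshold_s and musetalk_available:
        return "musetalk"
    return "latentsync"
```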
|
||||
|
||||
1. 启动脚本位于项目根目录: `run_musetalk.sh`
|
||||
|
||||
2. 使用 pm2 启动:
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
pm2 start ./run_musetalk.sh --name vigent2-musetalk
|
||||
pm2 save
|
||||
```
|
||||
|
||||
3. 验证服务:
|
||||
```bash
|
||||
curl http://localhost:8011/health
|
||||
# {"status":"ok","model_loaded":true}
|
||||
```
|
||||
|
||||
### 6. 启动服务看门狗 (Watchdog)
|
||||
|
||||
> 🛡️ **推荐**:监控 CosyVoice 和 LatentSync 服务健康状态,卡死时自动重启。
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
@@ -309,13 +448,16 @@ pm2 save
|
||||
pm2 startup
|
||||
```
|
||||
|
||||
> **提示**: 完整的 PM2 进程列表应包含 5-6 个服务: vigent2-backend, vigent2-frontend, vigent2-latentsync, vigent2-cosyvoice, vigent2-musetalk, vigent2-watchdog。
|
||||
|
||||
### pm2 常用命令
|
||||
|
||||
```bash
|
||||
pm2 status # 查看所有服务状态
|
||||
pm2 logs # 查看所有日志
|
||||
pm2 logs vigent2-backend # 查看后端日志
|
||||
pm2 logs vigent2-qwen-tts # 查看 Qwen3-TTS 日志
|
||||
pm2 logs vigent2-cosyvoice # 查看 CosyVoice 日志
|
||||
pm2 logs vigent2-musetalk # 查看 MuseTalk 日志
|
||||
pm2 restart all # 重启所有服务
|
||||
pm2 stop vigent2-latentsync # 停止 LatentSync 服务
|
||||
pm2 delete all # 删除所有服务
|
||||
@@ -401,8 +543,8 @@ server {
|
||||
GLM_API_KEY=your_zhipu_api_key
|
||||
```
|
||||
|
||||
3. **验证**:
|
||||
访问 `http://localhost:8006/docs`,测试 `/api/tools/extract-script` 接口。
|
||||
3. **验证**:
|
||||
访问 `http://localhost:8006/docs`,在已登录会话下测试 `/api/tools/extract-script`(该接口需认证)。
|
||||
|
||||
---
|
||||
|
||||
@@ -454,7 +596,8 @@ python3 -c "import torch; print(torch.cuda.is_available())"
|
||||
sudo lsof -i :8006
|
||||
sudo lsof -i :3002
|
||||
sudo lsof -i :8007
|
||||
sudo lsof -i :8009 # Qwen3-TTS
|
||||
sudo lsof -i :8010 # CosyVoice
|
||||
sudo lsof -i :8011 # MuseTalk
|
||||
```
|
||||
|
||||
### 查看日志
|
||||
@@ -464,7 +607,8 @@ sudo lsof -i :8009 # Qwen3-TTS
|
||||
pm2 logs vigent2-backend
|
||||
pm2 logs vigent2-frontend
|
||||
pm2 logs vigent2-latentsync
|
||||
pm2 logs vigent2-qwen-tts
|
||||
pm2 logs vigent2-cosyvoice
|
||||
pm2 logs vigent2-musetalk
|
||||
```
|
||||
|
||||
### SSH 连接卡顿 / 系统响应慢
|
||||
@@ -495,6 +639,7 @@ pm2 logs vigent2-qwen-tts
|
||||
| `playwright` | 社交媒体自动发布 |
|
||||
| `biliup` | B站视频上传 |
|
||||
| `loguru` | 日志管理 |
|
||||
| `python-alipay-sdk` | 支付宝支付集成 |
|
||||
|
||||
### 前端关键依赖
|
||||
|
||||
@@ -503,6 +648,7 @@ pm2 logs vigent2-qwen-tts
|
||||
| `next` | React 框架 |
|
||||
| `swr` | 数据请求与缓存 |
|
||||
| `tailwindcss` | CSS 样式 |
|
||||
| `wavesurfer.js` | 音频波形(时间轴编辑器) |
|
||||
|
||||
### LatentSync 关键依赖
|
||||
|
||||
|
||||
@@ -9,7 +9,7 @@
|
||||
### 背景
|
||||
统一处理 API 请求的认证失败场景,避免各页面重复处理 401/403 错误。
|
||||
|
||||
### 实现 (`frontend/src/lib/axios.ts`)
|
||||
### 实现 (`frontend/src/shared/api/axios.ts`)
|
||||
|
||||
```typescript
|
||||
import axios from 'axios';
|
||||
@@ -325,7 +325,7 @@ models/Qwen3-TTS/
|
||||
|
||||
| 文件 | 变更类型 | 说明 |
|
||||
|------|----------|------|
|
||||
| `frontend/src/lib/axios.ts` | 修改 | Axios 全局拦截器 (401/403 自动跳转) |
|
||||
| `frontend/src/shared/api/axios.ts` | 修改 | Axios 全局拦截器 (401/403 自动跳转) |
|
||||
| `frontend/src/app/layout.tsx` | 修改 | viewport 配置 + body 渐变背景 |
|
||||
| `frontend/src/app/globals.css` | 修改 | 安全区域 CSS 支持 |
|
||||
| `frontend/src/app/page.tsx` | 修改 | 移除独立渐变 + Header 响应式 |
|
||||
@@ -342,6 +342,6 @@ models/Qwen3-TTS/
|
||||
|
||||
## 🔗 相关文档
|
||||
|
||||
- [task_complete.md](../task_complete.md) - 任务总览
|
||||
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - 任务总览
|
||||
- [Day11.md](./Day11.md) - 上传架构重构
|
||||
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS 部署指南
|
||||
|
||||
@@ -273,7 +273,7 @@ pm2 logs vigent2-qwen-tts --lines 50
|
||||
|
||||
## 🔗 相关文档
|
||||
|
||||
- [task_complete.md](../task_complete.md) - 任务总览
|
||||
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - 任务总览
|
||||
- [Day12.md](./Day12.md) - iOS 兼容与 Qwen3-TTS 部署
|
||||
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS 部署指南
|
||||
- [SUBTITLE_DEPLOY.md](../SUBTITLE_DEPLOY.md) - 字幕功能部署指南
|
||||
|
||||
@@ -358,7 +358,7 @@ const storageKey = userId || 'guest';
|
||||
|
||||
### 解决方案
|
||||
|
||||
**文件**: `frontend/src/lib/axios.ts`
|
||||
**文件**: `frontend/src/shared/api/axios.ts`
|
||||
|
||||
在拦截器中对公开路由跳过重定向,仅在受保护页面触发登录跳转:
|
||||
|
||||
@@ -391,12 +391,12 @@ if ((status === 401 || status === 403) && !isRedirecting && !isPublicPath) {
|
||||
| `backend/app/main.py` | 修改 | 注册 ai 路由 |
|
||||
| `frontend/src/app/page.tsx` | 修改 | AI 生成按钮 + localStorage 修复 |
|
||||
| `frontend/src/app/publish/page.tsx` | 修改 | 恢复 AI 生成的标签 |
|
||||
| `frontend/src/lib/axios.ts` | 修改 | 公开路由跳过 401/403 登录重定向 |
|
||||
| `frontend/src/shared/api/axios.ts` | 修改 | 公开路由跳过 401/403 登录重定向 |
|
||||
|
||||
---
|
||||
|
||||
## 🔗 相关文档
|
||||
|
||||
- [task_complete.md](../task_complete.md) - 任务总览
|
||||
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - 任务总览
|
||||
- [Day13.md](./Day13.md) - 声音克隆功能集成 + 字幕功能
|
||||
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS 1.7B 部署指南
|
||||
|
||||
@@ -161,7 +161,7 @@ if (!/^\d{11}$/.test(phone)) {
|
||||
|
||||
### 3. Auth 工具函数 (`auth.ts`)
|
||||
|
||||
**文件**: `frontend/src/lib/auth.ts`
|
||||
**文件**: `frontend/src/shared/lib/auth.ts`
|
||||
|
||||
```typescript
|
||||
export interface User {
|
||||
@@ -304,7 +304,7 @@ pm2 restart vigent2-backend vigent2-frontend
|
||||
| `backend/.env` | 修改 | ADMIN_PHONE=15549380526 |
|
||||
| `frontend/src/app/login/page.tsx` | 修改 | 手机号登录 + 11位验证 |
|
||||
| `frontend/src/app/register/page.tsx` | 修改 | 手机号注册 + 11位验证 |
|
||||
| `frontend/src/lib/auth.ts` | 修改 | phone 参数 + changePassword 函数 |
|
||||
| `frontend/src/shared/lib/auth.ts` | 修改 | phone 参数 + changePassword 函数 |
|
||||
| `frontend/src/app/page.tsx` | 修改 | AccountSettingsDropdown 组件 |
|
||||
| `frontend/src/app/admin/page.tsx` | 修改 | 用户列表显示手机号 |
|
||||
| `frontend/src/contexts/AuthContext.tsx` | 修改 | 存储完整用户信息含 expires_at |
|
||||
@@ -342,7 +342,7 @@ pm2 restart vigent2-backend vigent2-frontend
|
||||
|
||||
## 🔗 相关文档
|
||||
|
||||
- [task_complete.md](../task_complete.md) - 任务总览
|
||||
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - 任务总览
|
||||
- [Day14.md](./Day14.md) - 模型升级 + AI 标题标签
|
||||
- [AUTH_DEPLOY.md](../AUTH_DEPLOY.md) - 认证系统部署指南
|
||||
|
||||
|
||||
@@ -1,5 +1,3 @@
|
||||
---
|
||||
|
||||
## 🔧 Qwen-TTS Flash Attention 优化 (10:00)
|
||||
|
||||
### 优化背景
|
||||
@@ -18,8 +16,6 @@ pip install -U flash-attn --no-build-isolation
|
||||
- **显存占用**: 显著降低,消除 OOM 风险
|
||||
- **代码变动**: 无代码变动,仅环境优化 (自动检测)
|
||||
|
||||
---
|
||||
|
||||
## 🛡️ 服务看门狗 Watchdog (10:30)
|
||||
|
||||
### 问题描述
|
||||
@@ -53,59 +49,86 @@ if service["failures"] >= service['threshold']:
|
||||
|
||||
---
|
||||
|
||||
## 🎨 UI 交互体验优化 (15:30)
|
||||
## 🎨 交互体验与视图优化 (14:20)
|
||||
|
||||
### 优化内容
|
||||
### 主页优化
|
||||
- 视频生成完成后,预览优先选中最新输出
|
||||
- 选择项持久化:素材 / 背景音乐 / 历史视频
|
||||
- 选择项持久化:素材 / 背景音乐 / 历史作品
|
||||
- 列表内滚动定位选中项,避免页面跳动
|
||||
- 刷新回顶部(首页 / 发布页)
|
||||
- 刷新回到顶部(首页)
|
||||
- 标题/字幕样式预览面板
|
||||
- 背景音乐试听即选中并自动开启,音量滑块实时影响试听
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/app/page.tsx`
|
||||
- `frontend/src/app/publish/page.tsx`
|
||||
### 发布页优化
|
||||
- 刷新回到顶部(发布页)
|
||||
|
||||
---
|
||||
|
||||
## 🎵 字体与背景音乐资源库接入 (15:50)
|
||||
## 🎵 背景音乐链路修复 (15:00)
|
||||
|
||||
### 资源库
|
||||
- `backend/assets/fonts/`(SuperIPAgent 字体全量导入)
|
||||
- `backend/assets/bgm/`(背景音乐素材)
|
||||
- `backend/assets/styles/{subtitle.json,title.json}`(样式预设)
|
||||
|
||||
### 服务能力
|
||||
- `/api/assets/subtitle-styles`、`/api/assets/title-styles`、`/api/assets/bgm`
|
||||
- `/assets` 静态挂载供前端预览与试听
|
||||
|
||||
### 生成链路调整
|
||||
- 先完成人声与唇形/字幕对齐,再混入 BGM
|
||||
- 修复 FFmpeg shell 解析导致的混音失败
|
||||
- 禁用 amix 归一化,保证配音音量不被压低
|
||||
### Fixes
- FFmpeg mixing now runs with `shell=False`, so the shell cannot misparse `filter_complex`
- `amix` normalization is disabled, so the voice track is not attenuated

### Key change
`backend/app/services/video_service.py`
|
||||
```python
|
||||
filter_complex = (
|
||||
"[0:a]volume=1.0[a0];"
|
||||
f"[1:a]volume={volume}[a1];"
|
||||
"[a0][a1]amix=inputs=2:duration=first:dropout_transition=2:normalize=0[aout]"
|
||||
)
|
||||
```
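
For context, here is a sketch of how such a filter string could be passed to FFmpeg as an argument list, which is what running without a shell means in practice. It is an assumption-laden illustration, not the code in `video_service.py`: the function names, the default volume, and the `-map` / `-c:v copy` options are hypothetical choices.

```python
import subprocess


def build_bgm_mix_command(video_path, bgm_path, output_path, volume=0.3):
    """Build an ffmpeg argument list that mixes BGM under the voice track."""
    filter_complex = (
        "[0:a]volume=1.0[a0];"
        f"[1:a]volume={volume}[a1];"
        "[a0][a1]amix=inputs=2:duration=first:dropout_transition=2:normalize=0[aout]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", bgm_path,
        "-filter_complex", filter_complex,  # passed as one argv entry, never shell-parsed
        "-map", "0:v", "-map", "[aout]",    # keep the video stream, use the mixed audio
        "-c:v", "copy",                     # assumption: the video stream is copied as-is
        output_path,
    ]


def mix_bgm(video_path, bgm_path, output_path, volume=0.3):
    """Run the mix; a list argument means subprocess uses shell=False."""
    subprocess.run(
        build_bgm_mix_command(video_path, bgm_path, output_path, volume),
        check=True,
    )
```

Because the filter graph travels as a single list element, characters such as `[`, `]`, and `;` reach FFmpeg untouched.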
|
||||
|
||||
---
|
||||
|
||||
## 🖼️ 标题/字幕样式预览 (16:10)
|
||||
## 🗣️ 字幕断句修复 (15:20)
|
||||
|
||||
### 前端
|
||||
- 样式选择 + 预览面板
|
||||
- 字号可调(覆盖样式默认值)
|
||||
- 字体文件动态加载
|
||||
### Details
- Subtitle splitting keeps English words whole, so mixed Chinese/English text is no longer cut mid-word
|
||||
|
||||
### Remotion
|
||||
- 样式参数透传到 `Subtitles` / `Title`
|
||||
- 渲染前临时复制字体到渲染目录
|
||||
### Files involved
- `backend/app/services/whisper_service.py`
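
One way to keep English words whole while splitting mixed Chinese/English subtitles is to tokenize first and only break between tokens. The sketch below illustrates that idea and is not the logic in `whisper_service.py`: `split_subtitle`, `TOKEN_RE`, and the `max_chars` default are hypothetical.

```python
import re

# A run of ASCII letters/digits/apostrophes is one token; whitespace runs and
# every other character (for example a single CJK character) are separate tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+|\s+|.", re.DOTALL)


def split_subtitle(text: str, max_chars: int = 12) -> list[str]:
    """Greedily pack tokens into lines of at most max_chars characters.

    A line is only broken between tokens, so an English word is never cut.
    A single token longer than max_chars gets a line of its own.
    """
    lines, current = [], ""
    for token in TOKEN_RE.findall(text):
        if current and len(current) + len(token) > max_chars:
            lines.append(current.strip())
            current = ""
        current += token
    if current.strip():
        lines.append(current.strip())
    return [line for line in lines if line]
```

For example, `split_subtitle("今天我们来聊聊machine learning的应用", max_chars=8)` breaks before `machine` and before `learning` and never inside either word.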
|
||||
|
||||
---
|
||||
|
||||
## 🧱 资源库与样式能力接入 (15:40)
|
||||
|
||||
### 内容
|
||||
- 字体库 / BGM 资源接入本地 assets
|
||||
- 新增样式配置文件(字幕/标题)
|
||||
- 新增资源 API 与静态挂载 `/assets`
|
||||
- Remotion 支持样式参数与字体加载
|
||||
|
||||
### 涉及文件
|
||||
- `backend/assets/fonts/`
|
||||
- `backend/assets/bgm/`
|
||||
- `backend/assets/styles/subtitle.json`
|
||||
- `backend/assets/styles/title.json`
|
||||
- `backend/app/services/assets_service.py`
|
||||
- `backend/app/api/assets.py`
|
||||
- `backend/app/main.py`
|
||||
- `backend/app/api/videos.py`
|
||||
- `backend/app/services/remotion_service.py`
|
||||
- `remotion/src/components/Subtitles.tsx`
|
||||
- `remotion/src/components/Title.tsx`
|
||||
- `remotion/src/Video.tsx`
|
||||
- `remotion/render.ts`
|
||||
- `frontend/src/app/page.tsx`
|
||||
- `frontend/next.config.ts`
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ 运维调整 (16:10)
|
||||
|
||||
### 内容
|
||||
- Watchdog 移除 LatentSync 监控,避免长推理误杀
|
||||
- LatentSync PM2 增加内存重启阈值(运行时配置)
|
||||
|
||||
---
|
||||
|
||||
## 🎯 前端按钮图标统一 (16:40)
|
||||
|
||||
### 内容
|
||||
- 首页与发布页按钮图标统一替换为 Lucide SVG
|
||||
- 交互按钮保持一致尺寸与对齐
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/features/home/ui/`
|
||||
- `frontend/src/app/publish/page.tsx`
|
||||
|
||||
---
|
||||
|
||||
@@ -113,7 +136,4 @@ filter_complex = (
|
||||
|
||||
- [x] `Docs/QWEN3_TTS_DEPLOY.md`: 添加 Flash Attention 安装指南
|
||||
- [x] `Docs/DEPLOY_MANUAL.md`: 添加 Watchdog 部署说明
|
||||
- [x] `Docs/task_complete.md`: 更新进度至 100% (Day 16)
|
||||
- [x] `README.md`: 新增样式与背景音乐能力说明
|
||||
- [x] `Docs/BACKEND_README.md`: 资产接口与混音链路说明
|
||||
- [x] `Docs/FRONTEND_README.md`: 新增样式预览与BGM试听说明
|
||||
- [x] `Docs/TASK_COMPLETE.md`: 更新进度至 100% (Day 16)
|
||||
|
||||
176
Docs/DevLogs/Day17.md
Normal file
@@ -0,0 +1,176 @@
|
||||
# Day 17 - 前端重构与体验优化
|
||||
|
||||
## 🧩 前端 UI 拆分 (09:10)
|
||||
|
||||
### 内容
|
||||
- 首页 `page.tsx` 拆分为独立 UI 组件,状态与逻辑仍集中在页面
|
||||
- 新增首页组件目录 `frontend/src/features/home/ui/`
|
||||
|
||||
### 组件列表
|
||||
- `HomeHeader`
|
||||
- `MaterialSelector`
|
||||
- `ScriptEditor`
|
||||
- `TitleSubtitlePanel`
|
||||
- `VoiceSelector`
|
||||
- `RefAudioPanel`
|
||||
- `BgmPanel`
|
||||
- `GenerateActionBar`
|
||||
- `PreviewPanel`
|
||||
- `HistoryList`
|
||||
|
||||
---
|
||||
|
||||
## 🧰 前端通用工具抽取 (09:30)
|
||||
|
||||
### 内容
|
||||
- 抽取 API Base / 资源 URL / 日期格式化等通用工具
|
||||
- 首页与发布页统一调用,消除重复逻辑
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/shared/lib/media.ts`
|
||||
- `frontend/src/app/page.tsx`
|
||||
- `frontend/src/app/publish/page.tsx`
|
||||
|
||||
---
|
||||
|
||||
## 📝 前端规范更新 (09:40)
|
||||
|
||||
### 内容
|
||||
- 更新 `FRONTEND_DEV.md` 以匹配最新目录结构
|
||||
- 新增 `media.ts` 使用规范与示例
|
||||
- 增加组件拆分规范与页面 checklist
|
||||
|
||||
### 涉及文件
|
||||
- `Docs/FRONTEND_DEV.md`
|
||||
|
||||
---
|
||||
|
||||
## 🎨 交互体验与视图优化 (10:00)
|
||||
|
||||
### 标题/字幕预览
|
||||
- 标题/字幕预览按素材分辨率缩放,字号更接近成片
|
||||
- 标题/字幕样式选择持久化,刷新不回默认
|
||||
- 默认样式更新:标题 90px 站酷快乐体,字幕 60px 经典黄字 + DingTalkJinBuTi
|
||||
|
||||
### 发布页优化
|
||||
- 选择作品改为卡片列表 + 搜索 + 预览弹窗
|
||||
|
||||
---
|
||||
|
||||
## ⚡ 性能微优化 (10:30)
|
||||
|
||||
### 内容
|
||||
- 列表渲染启用 `content-visibility`(素材/历史/参考音频/发布作品),BGM 列表保留滚动定位
|
||||
- 首屏数据请求并行化(`Promise.allSettled`)
|
||||
- localStorage 写入防抖(文本/标题/BGM 音量/发布表单)
|
||||
|
||||
---
|
||||
|
||||
## 🖼️ 预览弹窗增强 (11:10)
|
||||
|
||||
### 内容
|
||||
- 预览弹窗统一为可复用组件,支持标题与提示
|
||||
- 发布页预览与素材预览共享弹窗样式
|
||||
- 弹窗头部样式统一(图标 + 标题 + 关闭按钮)
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/components/VideoPreviewModal.tsx`
|
||||
- `frontend/src/app/page.tsx`
|
||||
- `frontend/src/app/publish/page.tsx`
|
||||
|
||||
---
|
||||
|
||||
## 🧭 术语统一 (11:20)
|
||||
|
||||
### 内容
|
||||
- “视频预览” → “作品预览”
|
||||
- “历史视频” → “历史作品”
|
||||
- “选择要发布的视频” → “选择要发布的作品”
|
||||
- “选择素材视频” → “视频素材”
|
||||
- “选择配音方式” → “配音方式”
|
||||
|
||||
---
|
||||
|
||||
## 🧱 Phase 2 Hook 抽取 (11:45)
|
||||
|
||||
### 内容
|
||||
- `useTitleSubtitleStyles`:标题/字幕样式获取与默认选择逻辑
|
||||
- `useMaterials`:素材列表/上传/删除逻辑抽取
|
||||
- `useRefAudios`:参考音频列表/上传/删除逻辑抽取
|
||||
- `useBgm`:背景音乐列表与加载状态抽取
|
||||
- `useMediaPlayers`:音频试听逻辑集中管理(参考音频/背景音乐)
|
||||
- `useGeneratedVideos`:历史作品列表获取 + 选择逻辑抽取
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/features/home/model/useTitleSubtitleStyles.ts`
|
||||
- `frontend/src/features/home/model/useMaterials.ts`
|
||||
- `frontend/src/features/home/model/useRefAudios.ts`
|
||||
- `frontend/src/features/home/model/useBgm.ts`
|
||||
- `frontend/src/features/home/model/useMediaPlayers.ts`
|
||||
- `frontend/src/features/home/model/useGeneratedVideos.ts`
|
||||
- `frontend/src/app/page.tsx`
|
||||
|
||||
---
|
||||
|
||||
## 🧩 首页持久化修复 (12:20)
|
||||
|
||||
### 内容
|
||||
- 接入 `useHomePersistence`,补齐 `isRestored` 恢复/保存逻辑
|
||||
- 修复首页刷新后选择项恢复链路,`npm run build` 通过
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/app/page.tsx`
|
||||
- `frontend/src/features/home/model/useHomePersistence.ts`
|
||||
|
||||
---
|
||||
|
||||
## 🧩 发布预览与播放修复 (14:10)
|
||||
|
||||
### 内容
|
||||
- 发布页作品预览兼容签名 URL 与相对路径
|
||||
- 参考音频试听统一走 `resolveMediaUrl`
|
||||
- 素材/BGM 选择在列表变化时自动回退有效项
|
||||
- 录音预览 URL 回收、预览弹窗滚动状态恢复、全局任务提示挂载
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/app/publish/page.tsx`
|
||||
- `frontend/src/features/home/model/useMediaPlayers.ts`
|
||||
- `frontend/src/features/home/model/useBgm.ts`
|
||||
- `frontend/src/features/home/model/useMaterials.ts`
|
||||
- `frontend/src/features/home/ui/RefAudioPanel.tsx`
|
||||
- `frontend/src/components/VideoPreviewModal.tsx`
|
||||
- `frontend/src/app/layout.tsx`
|
||||
|
||||
---
|
||||
|
||||
## 🧩 标题同步与长度限制 (15:30)
|
||||
|
||||
### 内容
|
||||
- 片头标题修改同步写入发布信息标题
|
||||
- 标题输入兼容中文输入法,限制 15 字(发布信息同规则)
|
||||
|
||||
### 涉及文件
|
||||
- `frontend/src/features/home/model/useHomeController.ts`
|
||||
- `frontend/src/features/home/ui/TitleSubtitlePanel.tsx`
|
||||
- `frontend/src/features/publish/model/usePublishController.ts`
|
||||
|
||||
---
|
||||
|
||||
## 🧱 Lightweight FSD Migration (16:20)

### Changes

- Slimmed pages: `app` keeps only entry components, and business logic is consolidated into Controller Hooks
- Introduced `features/*` layering: UI and model are separated, and Home/Publish are grouped by feature
- Shared capabilities moved down into `shared/*` (lib/hooks/api)

### Files Involved

- `frontend/src/features/home/ui/HomePage.tsx`
- `frontend/src/features/home/model/useHomeController.ts`
- `frontend/src/features/publish/ui/PublishPage.tsx`
- `frontend/src/features/publish/model/usePublishController.ts`
- `frontend/src/shared/lib/media.ts`
- `frontend/src/shared/lib/title.ts`
- `frontend/src/shared/api/axios.ts`
- `frontend/src/shared/hooks/useTitleInput.ts`
- `frontend/src/app/page.tsx`
- `frontend/src/app/publish/page.tsx`
|
||||
168
Docs/DevLogs/Day18.md
Normal file
@@ -0,0 +1,168 @@
|
||||
# Day 18 - Backend Modularization and Convention Updates
|
||||
|
||||
## 🧱 Backend Modular Refactor (10:10)

### Changes

- API routes uniformly delegate to `modules/*`; routes handle only parameters/permissions and responses
- Video generation logic moved down into `workflow`, and task state extracted into `task_store`
- `TaskStore` prefers Redis and automatically falls back to memory when Redis is unavailable
- Supabase access extracted into `repositories/*`; `deps/auth/admin` fully reworked

### Files Involved

- `backend/app/modules/videos/router.py`
- `backend/app/modules/videos/workflow.py`
- `backend/app/modules/videos/task_store.py`
- `backend/app/modules/videos/service.py`
- `backend/app/modules/*/router.py`
- `backend/app/repositories/users.py`
- `backend/app/repositories/sessions.py`
- `backend/app/core/deps.py`
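
The Redis-first behaviour with an in-memory fallback described above could look roughly like the sketch below. It is not the project's actual `TaskStore`: the injected client, the `task:` key prefix, and the `backend` property are assumptions made for illustration, and only the `ping`/`set`/`get` calls of a Redis-style client are relied on.

```python
import json
from typing import Any, Optional


class TaskStore:
    """Task-state store: tries Redis first, falls back to a process-local dict."""

    def __init__(self, redis_client: Optional[Any] = None) -> None:
        self._memory: dict[str, dict] = {}
        self._redis = None
        if redis_client is not None:
            try:
                redis_client.ping()  # probe once at start-up
                self._redis = redis_client
            except Exception:
                self._redis = None  # Redis unavailable: use memory

    @property
    def backend(self) -> str:
        return "redis" if self._redis is not None else "memory"

    def set(self, task_id: str, state: dict) -> None:
        if self._redis is not None:
            try:
                # "task:" prefix is a hypothetical key layout
                self._redis.set(f"task:{task_id}", json.dumps(state))
                return
            except Exception:
                self._redis = None  # degrade for the rest of the process
        self._memory[task_id] = state

    def get(self, task_id: str) -> Optional[dict]:
        if self._redis is not None:
            try:
                raw = self._redis.get(f"task:{task_id}")
                return json.loads(raw) if raw is not None else None
            except Exception:
                self._redis = None
        return self._memory.get(task_id)
```

One trade-off of this shape: state written to memory after a fallback is visible only to the current process, so it suits single-worker deployments or short outages.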
|
||||
|
||||
---
|
||||
|
||||
## ✅ Unified Responses and Exception Handling (11:00)

### Changes

- Unified JSON response structure: `success/message/data/code`
- The global exception handler converts `detail` into `message`

### Files Involved

- `backend/app/core/response.py`
- `backend/app/main.py`
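
As a rough illustration of the envelope above, the sketch below builds the `success/message/data/code` structure and maps a framework-style `detail` onto `message`. The function names are hypothetical, and using the HTTP status as `code` is an assumption; the log does not say what `code` carries.

```python
from typing import Any


def make_response(success: bool, message: str = "", data: Any = None, code: int = 0) -> dict:
    """Build the unified envelope: success / message / data / code."""
    return {"success": success, "message": message, "data": data, "code": code}


def error_from_detail(status_code: int, detail: Any) -> dict:
    """Convert an exception's `detail` into the envelope's `message` field."""
    message = detail if isinstance(detail, str) else str(detail)
    # Assumption: the HTTP status doubles as the business `code`.
    return make_response(False, message=message, data=None, code=status_code)
```

In a web framework, `error_from_detail` would be called from the global exception handler so that clients always read errors from `message`.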
|
||||
|
||||
---
|
||||
|
||||
## 🎞️ Material Renaming and Storage Operations (11:40)

### Changes

- Added a material rename endpoint: `PUT /api/materials/{material_id}`
- Storage gains `move_file` to support rename/move

### Files Involved

- `backend/app/modules/materials/router.py`
- `backend/app/services/storage.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧾 Platform List Adjustment (12:10)

### Changes

- Platform order changed to: Douyin → WeChat Channels → Bilibili → Xiaohongshu
- Removed the Kuaishou configuration

### Files Involved

- `backend/app/services/publish_service.py`
|
||||
|
||||
---
|
||||
|
||||
## 📘 Backend Development Conventions Added (12:30)

### Changes

- Added `BACKEND_DEV.md` as the backend conventions document
- `BACKEND_README.md` updated to reflect the modular structure and response format

### Files Involved

- `Docs/BACKEND_DEV.md`
- `Docs/BACKEND_README.md`
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Faster Entry into Publish Management (13:10)

### Changes

- The home page prefetches the `/publish` route, so entering publish management is faster
- The publish page reads prefetched data from `sessionStorage` for a faster first render
- The account and works lists gain skeleton screens to avoid blank waits

### Files Involved

- `frontend/src/features/home/ui/HomePage.tsx`
- `frontend/src/features/home/model/useHomeController.ts`
- `frontend/src/features/publish/model/usePublishController.ts`
- `frontend/src/features/publish/ui/PublishPage.tsx`
|
||||
|
||||
---
|
||||
|
||||
## 📁 Home Page Material Loading Optimization (13:30)

### Changes

- Signed URLs for the material list are generated concurrently (concurrency limit 8), shortening load time
- The material list gains a loading skeleton whose item count adapts to the previous material count

### Files Involved

- `backend/app/modules/materials/router.py`
- `frontend/src/features/home/model/useMaterials.ts`
- `frontend/src/features/home/model/useHomeController.ts`
- `frontend/src/features/home/ui/HomePage.tsx`
- `frontend/src/features/home/ui/MaterialSelector.tsx`
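
A bounded-concurrency pattern like the one mentioned above (limit 8) can be sketched with `asyncio.Semaphore` and `asyncio.gather`. The `sign` callable stands in for whatever storage call produces a signed URL; it and the `sign_all` name are assumptions, not the project's code.

```python
import asyncio
from typing import Awaitable, Callable


async def sign_all(
    paths: list[str],
    sign: Callable[[str], Awaitable[str]],
    limit: int = 8,
) -> list[str]:
    """Sign every path concurrently, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def sign_one(path: str) -> str:
        async with semaphore:  # blocks while `limit` signings are already running
            return await sign(path)

    # gather returns results in the same order as the input paths
    return list(await asyncio.gather(*(sign_one(p) for p in paths)))
```

Because `gather` preserves input order, the signed URLs can be zipped back onto the material records without extra bookkeeping.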
|
||||
|
||||
---
|
||||
|
||||
## 🎬 Preview Loading Experience (14:00)

### Changes

- Preview videos set `preload="metadata"` to shorten the wait for the first frame
- Hovering the preview button on the publish page prefetches the video resource

### Files Involved

- `frontend/src/components/VideoPreviewModal.tsx`
- `frontend/src/features/home/ui/PreviewPanel.tsx`
- `frontend/src/features/publish/ui/PublishPage.tsx`
|
||||
|
||||
---
|
||||
|
||||
## 📹 WeChat Channels Publishing Integration (16:30)

### Changes

- Added the Channels uploader `WeixinUploader`, covering the upload/title/description/tags/publish flow
- Completed the Channels QR-code login configuration (iframe QR scanning, candidate QR code filtering)
- Wired Channels into the publish platforms and routes
- Chinese error messages, plus screenshots at key steps saved to `debug_screenshots`

### Files Involved

- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/services/qr_login_service.py`
- `backend/app/services/publish_service.py`
- `backend/app/modules/publish/router.py`
- `backend/app/modules/login_helper/router.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧪 Channels Upload Stability Fixes (17:40)

### Changes

- Unified the browser fingerprint (UA/locale/timezone) and added support for system Chrome
- Added a headful + xvfb-run run mode to avoid headless detection and decoding failures
- Forced SwiftShader to fix WebGL context loss
- Videos are transcoded to a compatible MP4 (H.264 + AAC + faststart) before upload
- Strengthened upload state detection and added the debug log `weixin_network.log`

### Files Involved

- `backend/app/core/config.py`
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/services/qr_login_service.py`
- `run_backend.sh`
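
The pre-upload transcode mentioned above (H.264 + AAC + faststart) maps onto standard ffmpeg options. The sketch below only builds the argument list; the function name is hypothetical and the project may pass additional flags that the log does not record.

```python
def build_transcode_command(src: str, dst: str) -> list[str]:
    """Return an ffmpeg argv that produces a broadly compatible MP4."""
    return [
        "ffmpeg", "-y",             # overwrite the destination if it exists
        "-i", src,
        "-c:v", "libx264",          # H.264 video
        "-c:a", "aac",              # AAC audio
        "-movflags", "+faststart",  # move the moov atom to the front for progressive playback
        dst,
    ]
```

The list can be handed to `subprocess.run(cmd, check=True)` without a shell, which keeps file names with spaces or special characters safe.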
|
||||
|
||||
---
|
||||
|
||||
## 🧾 Publish Diagnostics Enhancements (18:10)

### Changes

- Douyin publishing gains network logs and failure screenshots to help locate upload/publish failures
- Channels upload failure screenshots and network logs are written to disk

### Files Involved

- `backend/app/services/uploader/douyin_uploader.py`
- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/debug_screenshots/*`
|
||||
|
||||
---
|
||||
|
||||
## 🧩 Publish Page Interaction Adjustments (18:20)

### Changes

- The publish button is disabled when no platform is selected
- Removed the scheduled publishing UI/parameters; only immediate publishing remains

### Files Involved

- `frontend/src/features/publish/ui/PublishPage.tsx`
- `frontend/src/features/publish/model/usePublishController.ts`
|
||||
485
Docs/DevLogs/Day19.md
Normal file
@@ -0,0 +1,485 @@
|
||||
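## 🛡️ Guard Against Accidental Refresh While Publishing (15:46, merged)

### Changes

- Publish button text unified as: `正在发布...请勿刷新或关闭网页` ("Publishing... please do not refresh or close the page")
- While publishing, the browser `beforeunload` guard is enabled; refreshing or closing the page triggers the native confirmation prompt
- Applies to all platforms on the publish management page (Douyin / WeChat Channels / Bilibili / Xiaohongshu)
- Follow-up logged: a publish task state recovery mechanism (task-based execution + state persistence + frontend polling recovery)

### Files Involved

- `frontend/src/features/publish/model/usePublishController.ts`
- `frontend/src/features/publish/ui/PublishPage.tsx`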
|
||||
|
||||
---
|
||||
|
||||
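## 🖼️ Publish Success Screenshot Stability (15:26, merged)

### Changes

- After success is determined, wait for the page to load and then an additional `3s` before taking the screenshot, to avoid capturing a half-loaded page
- For the "page content occupies only 1/3 of the screenshot" issue, the success screenshot changed from `full_page=True` to a viewport screenshot with `full_page=False`
- Channels restores `zoom=1.0` before the success screenshot so that zooming during the flow does not affect the final proportions
- Douyin success screenshots apply the same strategy for a consistent look in the frontend

### Files Involved

- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/services/uploader/douyin_uploader.py`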
|
||||
|
||||
---
|
||||
|
||||
## 🧪 视频号录屏 Debug 开关(15:12,已回收)
|
||||
|
||||
### 内容
|
||||
- 为视频号上传器新增 Playwright 录屏能力,开关受 `WEIXIN_DEBUG_ARTIFACTS && WEIXIN_RECORD_VIDEO` 控制
|
||||
- 新增视频号录屏配置项:
|
||||
- `WEIXIN_RECORD_VIDEO`
|
||||
- `WEIXIN_KEEP_SUCCESS_VIDEO`
|
||||
- `WEIXIN_RECORD_VIDEO_WIDTH`
|
||||
- `WEIXIN_RECORD_VIDEO_HEIGHT`
|
||||
- 上传流程在 `finally` 中统一保存录屏,失败必保留;成功录屏默认按开关清理
|
||||
- 排障阶段临时开启过视频号 debug/录屏;当前已回收为默认关闭(`run_backend.sh` 设为 `false`)
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/weixin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
- `run_backend.sh`
|
||||
- `Docs/DEPLOY_MANUAL.md`
|
||||
|
||||
---
|
||||
|
||||
## 🔁 后端启动脚本统一为 run_backend.sh (15:00)
|
||||
|
||||
### 内容
|
||||
- 删除旧脚本 `run_backend_xvfb.sh`
|
||||
- 将 `run_backend.sh` 统一为 xvfb + headful 启动逻辑(不再保留非 xvfb 版本)
|
||||
- 默认端口从 `8010` 统一为 `8006`
|
||||
- 启动脚本默认关闭微信/抖音 debug 产物
|
||||
- 更新部署手册中的启动与 pm2 示例,统一使用 `run_backend.sh`
|
||||
|
||||
### 涉及文件
|
||||
- `run_backend.sh`
|
||||
- `run_backend_xvfb.sh` (deleted)
|
||||
- `Docs/DEPLOY_MANUAL.md`
|
||||
|
||||
---
|
||||
|
||||
## 🧾 视频号卡顿与文案未写入修复 (14:52)
|
||||
|
||||
### 内容
|
||||
- 复盘日志确认视频号 `post_create` 请求已成功,但结果判定仅靠页面文案,导致长时间“等待发布结果”
|
||||
- 发布判定优化:`post_create` 成功且页面进入 `post/list` 时立即判定成功
|
||||
- 发布超时改为失败返回(不再 `success=true` 假成功)
|
||||
- “标题+标签写在视频描述”进一步加强:先按 `视频描述` 标签定位输入框,再做 placeholder 与 contenteditable 兜底
|
||||
- 视频号发布结果等待超时从 `180s` 收敛到 `90s`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/weixin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
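## 🚦 Channels Publish Stall: Root Cause and Fast Determination (14:45)

### Changes

- Root cause of the stall: the request had actually been submitted (`post_create` succeeded), but result detection was still polling for text prompts, causing a long wait
- Added a network success signal: once a successful `post/post_create` response is observed, the post is marked as submitted
- If the post is submitted and the page has returned to the content list (`/post/list`), success is determined immediately instead of waiting out the timeout
- Added a publish API failure signal: when `post_create` returns an error, the flow fails immediately

### Files Involved

- `backend/app/services/uploader/weixin_uploader.py`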
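
The decision rule above can be written as a small pure function, which makes it easy to test apart from the browser automation. This is a sketch under assumptions: `PublishOutcome`, `judge_publish`, and the tri-state `post_create_ok` flag are illustrative names, not identifiers from the uploader.

```python
from enum import Enum
from typing import Optional


class PublishOutcome(Enum):
    SUCCESS = "success"
    FAILED = "failed"
    PENDING = "pending"


def judge_publish(post_create_ok: Optional[bool], current_url: str) -> PublishOutcome:
    """Combine the network signal with the page URL.

    post_create_ok: None = no response seen yet, True = API succeeded,
    False = API returned an error.
    """
    if post_create_ok is False:
        return PublishOutcome.FAILED  # fail fast on an API error
    if post_create_ok and "/post/list" in current_url:
        return PublishOutcome.SUCCESS  # submitted and back on the content list
    return PublishOutcome.PENDING  # keep polling until the stage timeout
```

A polling loop would call `judge_publish` on each tick and stop as soon as the result is no longer `PENDING`.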
|
||||
|
||||
---
|
||||
|
||||
## 📸 视频号发布成功截图接入前端 (13:34)
|
||||
|
||||
### 内容
|
||||
- 为微信视频号新增“发布成功截图”能力:发布成功后直接对当前成功页截图
|
||||
- 截图存储沿用私有隔离目录:`private_outputs/publish_screenshots/{user_id}`
|
||||
- 返回前端的 `screenshot_url` 使用鉴权接口:`/api/publish/screenshot/{filename}`
|
||||
- 视频号上传器新增 `user_id` 透传,确保截图按用户隔离
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/weixin_uploader.py`
|
||||
- `backend/app/services/publish_service.py`
|
||||
|
||||
---
|
||||
|
||||
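## ✍️ Channels Description Filling Fix + Debug Artifacts Disabled (13:26)

### Changes

- Per the latest rule, the title and tags are both written into the "视频描述" (video description) input area
- Tags are normalized to the `#标签` (`#tag`) form and deduplicated
- If the "视频描述" input area is not found, the flow returns failure directly, to avoid "published successfully but title/tags are empty"
- Channels debug artifacts disabled: added `WEIXIN_DEBUG_ARTIFACTS=false`, which disables debug logs and screenshot output
- `run_backend.sh` adds `WEIXIN_DEBUG_ARTIFACTS=false`, forcing it off at the startup script level

### Files Involved

- `backend/app/services/uploader/weixin_uploader.py`
- `backend/app/core/config.py`
- `run_backend.sh`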
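
The tag normalization and deduplication rule above might be implemented along these lines. Both helper names are hypothetical, and joining the title and tags with a single space is an assumption; the log does not specify the separator.

```python
def normalize_tags(tags: list[str]) -> list[str]:
    """Return tags in `#tag` form, order-preserving and deduplicated."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in tags:
        name = raw.strip().lstrip("#").strip()  # accept "tag", "#tag", " #tag "
        if not name or name in seen:
            continue
        seen.add(name)
        result.append(f"#{name}")
    return result


def build_description(title: str, tags: list[str]) -> str:
    """Combine title and normalized tags into one description string."""
    parts = [title.strip(), " ".join(normalize_tags(tags))]
    return " ".join(part for part in parts if part)
```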
|
||||
|
||||
---
|
||||
|
||||
## 🚫 强制关闭抖音调试产物 (13:15)
|
||||
|
||||
### 内容
|
||||
- 进一步收紧为“默认不生成任何抖音 debug 截屏/日志/录屏”
|
||||
- 录屏开关改为依赖 `DOUYIN_DEBUG_ARTIFACTS && DOUYIN_RECORD_VIDEO`,避免单独误开
|
||||
- `run_backend.sh` 增加环境变量强制关闭:
|
||||
- `DOUYIN_DEBUG_ARTIFACTS=false`
|
||||
- `DOUYIN_RECORD_VIDEO=false`
|
||||
- 仅保留给用户看的发布成功截图(私有目录 + 鉴权访问)
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
- `run_backend.sh`
|
||||
|
||||
---
|
||||
|
||||
## 🧹 关闭调试截屏/录屏并清理历史文件 (13:08)
|
||||
|
||||
### 内容
|
||||
- 抖音调试产物默认关闭:
|
||||
- `DOUYIN_DEBUG_ARTIFACTS=false`
|
||||
- `DOUYIN_RECORD_VIDEO=false`
|
||||
- 保留功能信号监听(上传提交/封面生成/发布接口状态)用于流程判断,不依赖调试文件
|
||||
- 已删除现有抖音调试文件(`debug_screenshots` 下的 `douyin_*` 截图、日志与失败录屏)
|
||||
- 继续保留并展示“给用户看的发布成功截图”(用户隔离 + 鉴权访问)
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/core/config.py`
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/debug_screenshots/douyin_*` (deleted)
|
||||
- `backend/app/debug_screenshots/videos/douyin_*` (deleted)
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Per-User Isolation of Success Screenshots (12:58)

### Changes

- Publish success screenshots are now stored per user and no longer written to the public static directory
- Storage moved to a private path: `private_outputs/publish_screenshots/{user_id}`
- Added an authenticated endpoint: `GET /api/publish/screenshot/{filename}` (login required; users can access only their own screenshots)
- The `screenshot_url` returned to the frontend is now the authenticated endpoint address, preventing cross-user access by guessing paths

### Files Involved

- `backend/app/services/uploader/douyin_uploader.py`
- `backend/app/services/publish_service.py`
- `backend/app/modules/publish/router.py`
- `backend/app/core/config.py`
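
One way to enforce "only your own screenshots" on the file-system side is to resolve the requested name inside the caller's directory and reject anything that escapes it. The sketch below is an assumption about how such a check could look; `resolve_screenshot` is a hypothetical helper, and `user_id` is assumed to come from the authenticated session, never from the request path.

```python
from pathlib import Path
from typing import Optional


def resolve_screenshot(base_dir: Path, user_id: str, filename: str) -> Optional[Path]:
    """Return the screenshot path if it lies directly in the user's own directory."""
    user_dir = (base_dir / "publish_screenshots" / user_id).resolve()
    candidate = (user_dir / filename).resolve()
    if candidate.parent != user_dir:
        return None  # rejects "../" traversal, absolute paths, and nested paths
    return candidate if candidate.is_file() else None
```

An endpoint handler would return 404 when this helper yields `None`, so that existence of other users' files is not revealed.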
|
||||
|
||||
---
|
||||
|
||||
## 🎯 封面触发提速与审核中截图强化 (12:49)
|
||||
|
||||
### 内容
|
||||
- 修复“上传完成后长时间不进入封面”:当出现 `重新上传+预览` 且已收到视频提交信号时,立即进入封面步骤
|
||||
- 目标是减少“处理中”文案残留导致的额外等待
|
||||
- 成功截图逻辑强化为优先“真实点击审核中标签”,新增文本点击兜底,不再只用可见即通过
|
||||
- 若审核中列表未马上出现标题,自动刷新并再次进入审核中重查后再截图
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
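## 🔐 Stronger Login State Detection (Avoiding False Upload Failures) (12:41)

### Changes

- For false reports of "file chooser dialog not triggered", added login page detection:
  - URL keywords: `passport/login/check_qrconnect/sso`
  - Page text: `扫码登录/验证码登录/立即登录/抖音APP扫码登录` and similar
  - Login controls: phone number/verification code inputs, login button
- If the page is identified as a login page after the upload-stage retry, `Cookie 已失效,请重新登录` ("Cookie expired, please log in again") is returned directly
- Avoids misjudging "actually logged out" as "upload entry broken"

### Files Involved

- `backend/app/services/uploader/douyin_uploader.py`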
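
The URL and text checks listed above can be approximated by a pure function over the current URL and visible page text. This sketch omits the login-control check, which needs DOM access, and the constant and function names are assumptions rather than the uploader's real identifiers.

```python
# Keywords and markers are taken from the log entry above.
LOGIN_URL_KEYWORDS = ("passport", "login", "check_qrconnect", "sso")
LOGIN_TEXT_MARKERS = ("扫码登录", "验证码登录", "立即登录", "抖音APP扫码登录")


def looks_like_login_page(url: str, page_text: str) -> bool:
    """Heuristic: True when the URL or visible text suggests a login page."""
    lowered = url.lower()
    if any(keyword in lowered for keyword in LOGIN_URL_KEYWORDS):
        return True
    return any(marker in page_text for marker in LOGIN_TEXT_MARKERS)
```

Plain substring matching is deliberately loose; a short keyword such as `sso` can match unrelated URLs, so combining it with the text or control checks reduces false positives.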
|
||||
|
||||
---
|
||||
|
||||
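## ⏱️ Publish Stage Timeout and Fast Failure on Poor Network (12:30)

### Changes

- Added fast failure in the publish stage for the "stuck for a long time after a poor network" case
- A total timeout of `60s` (`POST_UPLOAD_STAGE_TIMEOUT`) applies from upload completion to the publish result; exceeding it fails directly
- HTTP errors (such as 403) from the publish API `create_v2` are detected and returned as failures immediately, instead of waiting 180 seconds
- Publish result detection adds matching on network failure text (`网络不佳/网络异常/请稍后重试`)
- The blocking dialog dismissal strategy adds `暂不设置`, so the "设置横封面获更多流量" dialog no longer blocks the publish click

### Files Involved

- `backend/app/services/uploader/douyin_uploader.py`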
|
||||
|
||||
---
|
||||
|
||||
## 🧯 封面已完成但误判失败修复 (12:22)
|
||||
|
||||
### 内容
|
||||
- 针对报错“封面为必填但未设置成功”新增页面态兜底,避免封面已完成却未点击发布
|
||||
- 新增 `_is_cover_configured_on_page()`:通过 `横封面/竖封面` + 封面预览图判断页面已配置封面
|
||||
- 当出现 `horizontal_switch_missed` 或 `no_cover_button` 时,若页面已配置封面则允许继续发布
|
||||
- 封面必填主流程增加 `configured_fallback_continue` 兜底,降低误杀
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧾 成功截图切到审核中视图 (11:26)
|
||||
|
||||
### 内容
|
||||
- 按需求将“发布成功截图”改为内容管理 `审核中/待审核` 视图,不再截“全部作品”
|
||||
- 发布成功后先进入内容管理并点击 `审核中`(或 `待审核`)再截图
|
||||
- 截图前额外尝试等待当前标题出现在审核中列表,便于确认是最新发布作品
|
||||
- 发布超时兜底验证也改为优先在审核中列表查找标题
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ✅ 封面步骤按指定顺序强约束 (11:18)
|
||||
|
||||
### 内容
|
||||
- 按确认流程收紧旧发布页封面链路:
|
||||
- 作品描述填完 → 点击 `选择封面` → 点击 `设置横封面` → 点击 `完成` → 等待封面效果检测通过 → 才允许发布
|
||||
- 新增 `require_horizontal` 约束:封面必填场景必须切换到横封面,否则直接失败重试
|
||||
- 新增封面效果检测通过等待:优先 `cover/gen` 新请求信号,其次页面“检测通过”文案
|
||||
- 避免因漏点 `设置横封面` 导致后续卡住或误发布
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧩 横封面点击漏判修复 (11:10)
|
||||
|
||||
### 内容
|
||||
- 根据复现反馈修复“未点击设置横封面导致封面流程卡住”问题
|
||||
- 新增 `_switch_to_horizontal_cover()`,扩展横封面入口选择器(`设置横封面/横封面/横版封面`)
|
||||
- 进入封面弹窗后先关闭阻塞弹窗再点击横封面,点击失败会重试一次
|
||||
- 若页面存在横封面入口但始终未切换成功,直接返回失败并重试,避免长时间假等待
|
||||
- 新增日志:`[douyin][cover] switched_horizontal ...`、`horizontal_switch_missed`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⚡ 横封面后直接完成优化 (11:03)
|
||||
|
||||
### 内容
|
||||
- 根据实测反馈,在点击 `设置横封面` 后新增一次“立即点击完成”快速路径
|
||||
- 若平台已自动选中横封面,将直接确认并退出弹窗,不再执行后续封面扫描
|
||||
- 新增日志:`[douyin][cover] fast_confirm_after_switch ...`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## ⚙️ 封面步骤提速优化 (10:58)
|
||||
|
||||
### 内容
|
||||
- 复盘日志确认旧发布页封面步骤存在明显耗时(示例:`required_by_text` 到 `cover selected` 约 35 秒)
|
||||
- 新增封面“快速确认”路径:若平台已默认选中封面,直接确认并跳过多余扫描
|
||||
- 收紧封面成功条件:仅“确认按钮点击成功”才算封面设置成功,避免误判
|
||||
- 缩短不必要等待并新增封面耗时日志:`[douyin][cover] fast_confirm/selected=... confirmed=... elapsed=...`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧾 发布成功截图前台展示 (10:48)
|
||||
|
||||
### 内容
|
||||
- 按需求删除 `run_backend_xvfb_live.sh`,不再提供实时直播脚本
|
||||
- 抖音发布成功时自动保存成功截图到 `outputs/publish_screenshots`
|
||||
- 发布接口返回 `screenshot_url`,前端发布结果卡片直接展示截图并支持点击查看大图
|
||||
- 发布结果不再 10 秒自动清空,方便用户确认“是否真正发布成功”
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `frontend/src/features/publish/model/usePublishController.ts`
|
||||
- `frontend/src/features/publish/ui/PublishPage.tsx`
|
||||
- `run_backend_xvfb_live.sh` (deleted)
|
||||
|
||||
---
|
||||
|
||||
## 🧬 抖音界面差异根因与环境对齐 (10:20)
|
||||
|
||||
### 内容
|
||||
- 定位到 Playwright 与手动 Win11 Chrome 的环境指纹不一致(Linux 平台 + 自动化上下文),可能触发不同灰度界面
|
||||
- 抖音上传器新增独立浏览器配置项,不再复用 `WEIXIN_*` 配置
|
||||
- 新增 `DOUYIN_*` 配置:`HEADLESS_MODE/USER_AGENT/LOCALE/TIMEZONE_ID/CHROME_PATH/BROWSER_CHANNEL/FORCE_SWIFTSHADER`
|
||||
- 上传器启动改为 `_build_launch_options()`,可直接切换到系统 Chrome + headful(推荐配合 xvfb)
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
|
||||
---
|
||||
|
||||
## 🪄 新旧发布页封面逻辑分流 (10:28)
|
||||
|
||||
### 内容
|
||||
- 依据页面结构自动分流:
|
||||
- 新版发布页(封面非必填):默认跳过封面设置
|
||||
- 旧版发布页(出现 `设置封面` + `必填`):强制先设置封面
|
||||
- 新增 `_is_cover_required()` 判断,避免在新页面做多余封面操作
|
||||
- 若判定为非必填但点击发布失败,会回退尝试设置封面后再重试发布
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 📺 虚拟屏实时观看方案 (10:36)
|
||||
|
||||
### 内容
|
||||
- 新增 `run_backend_xvfb_live.sh`,在 Xvfb 下同时启动后端与实时画面转码
|
||||
- 通过 ffmpeg 抓取虚拟屏并输出 HLS:`/outputs/live/live.m3u8`
|
||||
- 适用于“边跑自动发布边实时观看”,不依赖 VNC
|
||||
- 默认仍保留失败录屏,HLS 用于过程实时观察
|
||||
|
||||
### 涉及文件
|
||||
- `run_backend_xvfb_live.sh`
|
||||
|
||||
---
|
||||
|
||||
## 🎥 抖音后台录屏能力 (09:55)
|
||||
|
||||
### 内容
|
||||
- 新增抖音自动发布过程录屏能力,便于定位“卡住在哪一步”
|
||||
- 录屏文件保存目录:`backend/app/debug_screenshots/videos`
|
||||
- 默认开启录屏,默认只保留失败录屏(成功录屏自动清理)
|
||||
- 每次执行会在网络日志追加录屏保存记录(`[douyin][record]`)
|
||||
- 增加发布阶段关键标记日志:`publish_wait ready`、`publish_click try/clicked`
|
||||
- 新增配置项:`DOUYIN_RECORD_VIDEO`、`DOUYIN_KEEP_SUCCESS_VIDEO`、`DOUYIN_RECORD_VIDEO_WIDTH`、`DOUYIN_RECORD_VIDEO_HEIGHT`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
- `backend/app/core/config.py`
|
||||
|
||||
---
|
||||
|
||||
## 🚀 发布按钮等待逻辑修正 (10:00)
|
||||
|
||||
### 内容
|
||||
- 根据线上反馈,发布页不再做冗长前置等待,改为“尽快尝试点击发布”
|
||||
- 新增发布按钮定位策略(role + text 多选择器),避免 `exact role` 匹配失败导致假等待
|
||||
- 将发布按钮等待上限从上传超时(300s)独立为 `PUBLISH_BUTTON_TIMEOUT=60s`
|
||||
- 点击发布阶段统一走 `_click_publish_button`,并持续记录 `publish_wait/publish_click` 日志
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
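## 🧪 Stronger Upload Completion Detection (10:07)

### Changes

- Based on observed page features, completed the "uploading/upload complete" detection:
  - Uploading: `上传过程中请不要刷新`, `取消上传`, `已上传/当前速度/剩余时间`
  - Upload complete: `重新上传` + `预览视频/预览封面/标题`
- The publish click is allowed only after upload completion is confirmed, to avoid "publishing before the upload finishes"
- Added upload wait logging: `[douyin][upload_wait] ...`, which shows whether the flow is stuck uploading or waiting for the completion signal

### Files Involved

- `backend/app/services/uploader/douyin_uploader.py`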
|
||||
|
||||
---
|
||||
|
||||
## ⏸️ 上传完成后延时发布 (10:10)
|
||||
|
||||
### 内容
|
||||
- 根据实测反馈,增加“上传完成后固定等待 2 秒”再点发布
|
||||
- 避免刚出现完成信号就立即点击,给前端状态收敛留缓冲
|
||||
- 新增日志标记:`[douyin][upload_ready] wait_before_publish=2s`
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🖼️ 恢复封面设置流程 (10:14)
|
||||
|
||||
### 内容
|
||||
- 按实测需求恢复“上传完成后先设置封面,再发布”流程
|
||||
- 封面设置改为最多尝试 2 次,成功写入 `[douyin][cover] selected`
|
||||
- 若封面未设置成功则直接终止发布并保存截图 `cover_not_selected`
|
||||
- 避免出现“未设封面就点击发布”的情况
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
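## 🛠️ Douyin Publish Flow Fixes (09:20)

### Changes

- Per the latest page flow, first go to the home page and click `高清发布`, then enter the upload page
- Added handling for unpublished drafts: when `你还有上次未发布的视频` is detected, `放弃` is clicked automatically
- The upload strategy now prefers clicking `上传视频` and using the file chooser, falling back to multiple input selectors on failure
- The flow continues only after detecting publish-state signals such as `基础信息/作品描述/发布设置/重新上传`, to avoid a false "uploaded" judgment
- Fixed the temp file strategy for videos without an extension: prefer a hardlink, copy on failure, and drop the symlink fallback
- Adapted to the current smart cover flow: manual cover operations are skipped
- Topics are now appended in the intro/description area in `#标签` form

### Files Involved

- `backend/app/services/uploader/douyin_uploader.py`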
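
The "hardlink first, copy on failure" strategy for extension-less videos can be sketched with the standard library alone. The function name and the returned label are illustrative assumptions; the uploader's own helper is not shown in the log.

```python
import os
import shutil
from pathlib import Path


def stage_with_extension(src: Path, dst: Path) -> str:
    """Expose `src` under a new name `dst` (for example with a .mp4 suffix).

    Tries a hardlink first (no data copied); falls back to a full copy when
    linking fails, for instance across file systems. No symlink is used.
    """
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        shutil.copy2(src, dst)
        return "copy"
```

A hardlink gives the browser a real file with the right extension at zero copy cost, while the copy path keeps the flow working on file systems that do not support links.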
|
||||
|
||||
---
|
||||
|
||||
## ⚡ 抖音等待链路再收敛 (09:52)
|
||||
|
||||
### 内容
|
||||
- 根据“选完视频即进入发布页”流程,移除独立的上传完成轮询阶段
|
||||
- 改为在点击发布前统一等待“发布按钮可点击”,避免重复等待导致总时长偏长
|
||||
- 新增 `publish_wait` 调试日志,按秒记录按钮可点击等待时长
|
||||
- 超时文案改为明确提示“发布按钮长时间不可点击”
|
||||
- 上传入口改为严格 file chooser 流程:只走“点击上传视频 → 选择文件 → 进入发布页”链路
|
||||
- 移除直接 input 回退上传,避免绕开上传入口导致状态机异常
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
## 🧭 抖音卡慢环节定位与修复 (09:45)
|
||||
|
||||
### 内容
|
||||
- 通过 `douyin_network.log` 定位到卡慢发生在“上传完成判定”阶段,而非真正提交发布接口
|
||||
- 新增上传完成网络信号:`CommitUploadInner` 成功与封面生成成功信号写入日志
|
||||
- 收紧“上传完成”判定,移除 `publish_button_enabled` 这种过早放行条件
|
||||
- 仅在检测到 `重新上传/重新选择` 或上传提交信号后才进入下一步,降低误判导致的长等待
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
---
|
||||
|
||||
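## ✅ Douyin Publish Result Determination Fix (09:38)

### Changes

- Fixed "publish detection timeout still returns success=true"; timeouts now return `success=false`
- Improved the timeout message to state clearly that the publish status is unknown and must be confirmed in the platform's management console
- Retired the overly permissive management page fallback (the mere presence of `审核中` no longer counts as success)
- On timeout, even a same-title entry on the management page is not treated as success, to avoid false positives from older works with the same title

### Files Involved

- `backend/app/services/uploader/douyin_uploader.py`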
|
||||
|
||||
---
|
||||
|
||||
## ⏱️ 抖音上传完成判定优化 (09:34)
|
||||
|
||||
### 内容
|
||||
- 根据最新日志确认文件上传已开始并有分片上传请求成功,但流程长时间停留在“等待上传完成”
|
||||
- 扩展“上传完成”判定条件,不再只依赖单一 `long-card + 重新上传` 选择器
|
||||
- 新增上传完成信号:`重新上传/重新选择` 可见、发布按钮可用、`发布设置` 或 `预览视频` 可见
|
||||
- 上传等待日志增加耗时秒数,便于判断是否真实卡住
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/uploader/douyin_uploader.py`
|
||||
103
Docs/DevLogs/Day20.md
Normal file
@@ -0,0 +1,103 @@
|
||||
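## 🔧 Code Quality and Security Optimization (13:30)

### Overview

A full code review and optimization pass was carried out today: 27 optimization items were handled and 18 core fixes completed.

### Completed Optimizations

#### Functional Fixes

- [x] **P0-1**: LatentSync fallback logic was an empty implementation → changed to `raise RuntimeError`
- [x] **P1-1**: Task status endpoint had no user ownership check → added the user authentication dependency
- [x] **P1-2**: Duplicate frontend User type definitions → unified in `shared/types/user.ts`

#### Performance Optimizations

- [x] **P1-3**: Reference audio list N+1 queries → run concurrently with `asyncio.gather`
- [x] **P1-4**: Video upload read the whole file into memory → added `upload_file_from_path` for streaming
- [x] **P1-5**: Synchronous blocking inside async routes → `httpx.AsyncClient` replaces `requests`
- [x] **P2-2**: GLM service synchronous calls → wrapped with `asyncio.to_thread`
- [x] **P2-3**: Slow Remotion render start-up → precompiled JS + `build:render` script

#### Security Fixes

- [x] **P1-8**: Hard-coded Cookie → moved to the `DOUYIN_COOKIE` environment variable
- [x] **P1-9**: Request logs printed full headers → sensitive information is redacted
- [x] **P2-10**: ffprobe used `shell=True` → changed to an argument list
- [x] **P2-11**: CORS configured with `*` + credentials → read from the `CORS_ORIGINS` environment variable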
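
To illustrate the P2-10 change, the sketch below invokes ffprobe with an argument list instead of a shell string, so a file name is always passed as a single argument and never interpreted by a shell. The helper names and the choice of querying `format=duration` are assumptions; the log does not say which fields the project probes.

```python
import subprocess


def build_ffprobe_command(path: str) -> list[str]:
    """Return an ffprobe argv that prints the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,  # passed verbatim as one argument, no shell quoting needed
    ]


def probe_duration(path: str) -> float:
    """Run ffprobe without a shell and parse the duration."""
    result = subprocess.run(
        build_ffprobe_command(path),
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())
```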
|
||||
|
||||
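#### Configuration Optimizations

- [x] **P2-5**: Hard-coded storage service path → environment variable `SUPABASE_STORAGE_LOCAL_PATH`
- [x] **P3-3**: Remotion synchronous `execSync` call → asynchronous promisified `exec`
- [x] **P3-5**: LatentSync relative paths → absolute paths based on `__file__`

### Deferred (Limited Benefit)

- [~] **P1-6**: useHomeController is an oversized file (884 lines)
- [~] **P1-7**: Duplicated code between the Douyin/WeChat uploaders (the flows differ substantially)

### Low Priority (To Be Handled Later)

- [~] **P2-6~P2-9**: API forwarding shells, mixed frontend API clients, ESLint, duplicated logic
- [~] **P3-1~P3-4**: Blocking interactions, oversized Modal, style compatibility layer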
|
||||
|
||||
### 涉及文件
|
||||
- `backend/app/services/latentsync_service.py` - 回退逻辑
|
||||
- `backend/app/modules/videos/router.py` - 任务状态认证
|
||||
- `backend/app/modules/tools/router.py` - httpx 异步、Cookie 配置化
|
||||
- `backend/app/services/glm_service.py` - 异步包装
|
||||
- `backend/app/services/storage.py` - 流式上传、路径配置化
|
||||
- `backend/app/services/video_service.py` - ffprobe 安全调用
|
||||
- `backend/app/main.py` - CORS 配置、日志脱敏
|
||||
- `backend/app/core/config.py` - 新增配置项
|
||||
- `remotion/render.ts` - 异步 exec
|
||||
- `remotion/package.json` - build:render 脚本
|
||||
- `models/LatentSync/scripts/server.py` - 绝对路径
|
||||
- `frontend/src/shared/types/user.ts` - 统一类型定义
|
||||
|
||||
### New Environment Variables

```bash
# New .env settings (all have defaults; none are required)
CORS_ORIGINS=*                              # CORS allowlist
SUPABASE_STORAGE_LOCAL_PATH=/path/to/...    # local storage path
DOUYIN_COOKIE=...                           # Cookie for Douyin video downloads
```
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
pm2 restart vigent2-backend
|
||||
pm2 restart vigent2-latentsync
|
||||
# Remotion 已自动编译
|
||||
```
|
||||
|
||||
### 🎨 交互与体验优化 (17:00)
|
||||
|
||||
- [x] **UX-1**: PublishPage 图片加载优化 (`<img>` → `next/image`)
|
||||
- [x] **UX-2**: 按钮 Loading 状态统一 (提取脚本弹窗 + 发布页)
|
||||
- [x] **UX-3**: 骨架屏加载优化 (发布页加载中状态)
|
||||
- [x] **UX-4**: 全局快捷键支持 (ESC 关闭弹窗, Enter 确认)
|
||||
- [x] **UX-5**: 移除全局 GlobalTaskIndicator (视觉降噪)
|
||||
- [x] **UX-6**: 视频生成完成自动刷新列表并选中最新
|
||||
|
||||
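### 🐛 Bug Fixes and Regression Handling (17:30)

#### Critical Bug Fixes

- [x] **BUG-1**: Remotion render script path resolution error (caused missing titles and subtitles)
  - *Cause*: After precompilation, `render.js` used `__dirname` and failed to find the sources from inside the `dist` directory.
  - *Fix*: Changed `render.ts` to resolve paths dynamically with `process.cwd()`, then recompiled.

- [x] **BUG-2**: Publish page video selection persistence broken (Auth async race)
  - *Cause*: At page load `useAuth` had not yet returned the user ID, so the `guest` key was used, no record was found, and the selection was then overwritten by the default.
  - *Fix*: Introduced an `isVideoRestored` state machine that waits until Auth completes and the video list has loaded before running the restore logic.

#### Regression Handling

- [x] **REG-1**: Home page historical work ID restored, but content not shown
  - *Cause*: The persistence module restored the ID, but `useGeneratedVideos` did not watch ID changes to sync the URL.
  - *Fix*: Added a `useEffect` that watches `selectedVideoId` and syncs the `generatedVideo` URL.

- [x] **REG-2**: "Select the first item by default" logic lost on the home/publish pages
  - *Cause*: After the refactor removed the old logic, new users or users without a cache had no default selection on entering the page.
  - *Fix*: When `isRestored` is set and nothing is selected, a fallback automatically selects the first list item.

- [x] **REG-3**: Material selection persistence broken (stale closure)
  - *Cause*: The `useMaterials` load callback captured a stale `selectedMaterial` state and overwrote the restored value.
  - *Fix*: Switched to functional state updates (`setState(prev => ...)`) so decisions are based on the latest state.

- [x] **REF-1**: Site-wide consolidation and audit of persistence logic
  - *Optimization*: Removed redundant `localStorage` reads from `useBgm`, `useGeneratedVideos`, and `useTitleSubtitleStyles`; persistence is now managed solely by `useHomePersistence`.
  - *Audit*: Reviewed `useRefAudios`, `useTitleSubtitleStyles`, and other modules in depth; the logic was confirmed robust, with no similar regression risk.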
|
||||
449
Docs/DevLogs/Day21.md
Normal file
@@ -0,0 +1,449 @@
|
||||
## 🐛 Bug Fixes: Video Generation and Persistence Regressions (Day 21)

### Overview

Today fixed 3 regressions introduced by the Day 20 optimizations: Remotion render crash tolerance, home page work selection persistence, and publish page work selection persistence.

---

### Completed Fixes

#### BUG-1: Remotion render process crash causes missing titles/subtitles

- **Symptom**: Generated videos had no title or subtitles; the pipeline fell back to plain FFmpeg compositing.
- **Root cause**: The Remotion Node.js process exits with SIGABRT (code -6) after rendering completes (100%), and the Python side treated this as a failure.
- **Fix**: On a non-zero exit, `remotion_service.py` first checks whether the output file exists and has a plausible size (>1KB); if so, the run is treated as successful.
- **File**: `backend/app/services/remotion_service.py`

```python
if process.returncode != 0:
    output_file = Path(output_path)
    if output_file.exists() and output_file.stat().st_size > 1024:
        logger.warning(
            f"Remotion process exited with code {process.returncode}, "
            f"but output file exists ({output_file.stat().st_size} bytes). Treating as success."
        )
        return output_path
    raise RuntimeError(...)
```
|
||||
|
||||
#### BUG-2: Home page historical work selection not kept after refresh

- **Symptom**: After selecting a historical work and refreshing the page, the selection always returns to the first video.
- **Root cause**: `fetchGeneratedVideos()` unconditionally auto-selected the first video on initial load, overwriting the value restored by `useHomePersistence`.
- **Fix**: `fetchGeneratedVideos` gains a `preferVideoId` parameter and auto-selects only when one is explicitly given; a new `"__latest__"` sentinel selects the newest video after generation completes.
- **Files**: `frontend/src/features/home/model/useGeneratedVideos.ts`, `frontend/src/features/home/model/useHomeController.ts`

```typescript
// Task finished → auto-select the latest video
useEffect(() => {
  if (prevIsGenerating.current && !isGenerating) {
    if (currentTask?.status === "completed") {
      void fetchGeneratedVideos("__latest__");
    } else {
      void fetchGeneratedVideos();
    }
  }
  prevIsGenerating.current = isGenerating;
}, [isGenerating, currentTask, fetchGeneratedVideos]);
```
|
||||
|
||||
#### BUG-3: Publish page work selection not kept after refresh (root cause: unstable signed URLs)

- **Symptom**: After selecting a video on the publish management page and refreshing, the selection is lost (no video is selected).
- **Root cause**: The `path` returned by the backend `/api/videos/generated` is a Supabase signed URL that changes on every request. The publish page stored `path` in localStorage as the selection identifier, so after a refresh the new `path` never matched the saved value. The home page was unaffected because it uses the stable `video.id`.
- **Fix**: The publish page now uses `id` (stable identifier) instead of `path` (signed URL) for selection, persistence, and comparison.
- **Files**:
  - `frontend/src/shared/types/publish.ts`: `PublishVideo` gains an `id` field
  - `frontend/src/features/publish/model/usePublishController.ts`: `selectedVideo` stores `id`, and `path` is looked up by `id` when publishing
  - `frontend/src/features/publish/ui/PublishPage.tsx`: `key`/`onClick`/selection comparison use `v.id`
  - `frontend/src/features/home/model/useHomeController.ts`: the prefetch cache includes the `id` field

```typescript
// Type definition gains id
export interface PublishVideo {
  id: string;   // stable identifier
  name: string;
  path: string; // signed URL (used only for playback/publishing)
}

// On publish, look up path by id
const video = videos.find(v => v.id === selectedVideo);
await api.post('/api/publish', { video_path: video.path, ... });
```
|
||||
|
||||
---
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/services/remotion_service.py` | Remotion 崩溃容错 |
|
||||
| `frontend/src/features/home/model/useGeneratedVideos.ts` | 首页视频选择不自动覆盖 |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 任务完成监听 + 预取缓存加 id |
|
||||
| `frontend/src/shared/types/publish.ts` | PublishVideo 新增 id 字段 |
|
||||
| `frontend/src/features/publish/model/usePublishController.ts` | 选择/持久化/发布改用 id |
|
||||
| `frontend/src/features/publish/ui/PublishPage.tsx` | UI 选择比较改用 id |
|
||||
|
||||
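### Key Lesson

> **A signed URL must not be used as a persistent identifier.** Supabase Storage signed URLs include timestamp and signature parameters and differ on every request. Any identifier that has to survive across requests or refreshes must use the stable `id` field returned by the backend.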
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
pm2 restart vigent2-backend # Remotion 容错
|
||||
npm run build && pm2 restart vigent2-frontend # 前端持久化修复
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎨 浮动样式预览窗口优化 (Day 21)
|
||||
|
||||
### 概述
|
||||
标题与字幕面板中的预览区域原本是内联折叠的,展开后调节下方滑块时看不到预览效果。改为 `position: fixed` 浮动窗口,固定在视口左上角,滚动页面时预览始终可见,边调边看。
|
||||
|
||||
### 已完成优化
|
||||
|
||||
#### 1. 新建浮动预览组件 `FloatingStylePreview.tsx`
|
||||
- `createPortal(jsx, document.body)` 渲染到 body 层级,脱离面板 DOM 树
|
||||
- `position: fixed` + 左上角固定定位,滚动时不移动
|
||||
- `z-index: 150`(低于 VideoPreviewModal 的 200)
|
||||
- 顶部标题栏 + X 关闭按钮,ESC 键关闭
|
||||
- 桌面端固定宽度 280px,移动端自适应(最大 360px)
|
||||
- `previewScale = windowWidth / previewBaseWidth` 自行计算缩放
|
||||
- `maxHeight: calc(100dvh - 32px)` 防止超出视口
|
||||
|
||||
#### 2. 修改 `TitleSubtitlePanel.tsx`
|
||||
- 删除内联预览区域(`ref={previewContainerRef}` 整块 JSX)
|
||||
- 条件渲染 `<FloatingStylePreview />`,按钮文本保持"预览样式"/"收起预览"
|
||||
- 移除 `previewScale`、`previewAspectRatio`、`previewContainerRef` props
|
||||
- 保留 `previewBaseWidth/Height`(浮动窗口需要原始尺寸计算 scale)
|
||||
|
||||
#### 3. 清理 `useHomeController.ts`
|
||||
- 移除 `previewContainerWidth` 状态
|
||||
- 移除 `titlePreviewContainerRef` ref
|
||||
- 移除 ResizeObserver useEffect(浮动窗口自管尺寸,不再需要)
|
||||
|
||||
#### 4. 简化 `HomePage.tsx` 传参
|
||||
- 移除 `previewContainerWidth`、`titlePreviewContainerRef` 解构
|
||||
- 移除 `previewScale`、`previewAspectRatio`、`previewContainerRef` prop 传递
|
||||
|
||||
#### 5. 移动端适配
|
||||
- `ScriptEditor.tsx`:标题行改为 `flex-wrap`,"AI生成标题标签"按钮不再溢出
|
||||
- 预览默认比例从 1280×720 (16:9) 改为 1080×1920 (9:16),符合抖音竖屏视频
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | **新建** 浮动预览组件 |
|
||||
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 移除内联预览,渲染浮动组件 |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 移除 preview 容器相关状态和 ResizeObserver |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 简化 props 传递,默认比例改 9:16 |
|
||||
| `frontend/src/features/home/ui/ScriptEditor.tsx` | 移动端按钮换行适配 |
|
||||
|
||||
### 重启要求
|
||||
```bash
|
||||
npm run build && pm2 restart vigent2-frontend
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 多平台发布体系重构:用户隔离与抖音刷脸验证 (Day 21)
|
||||
|
||||
### 概述
|
||||
重构发布系统的两大核心问题:① 多用户场景下 Cookie/会话缺乏隔离,② 抖音登录新增刷脸验证步骤无法处理。同时修复了平台配置混用和微信视频号发布流程问题。
|
||||
|
||||
---
|
||||
|
||||
### 一、平台配置独立化
|
||||
|
||||
#### 问题
|
||||
所有平台(抖音、微信、B站、小红书)共用 WEIXIN_* 配置,导致 User-Agent、Headless 模式等设置不匹配。
|
||||
|
||||
#### 修复 — `config.py`
|
||||
- 新增 `DOUYIN_*` 独立配置项:`DOUYIN_HEADLESS_MODE`、`DOUYIN_USER_AGENT`(Chrome/144)、`DOUYIN_LOCALE`、`DOUYIN_TIMEZONE_ID`、`DOUYIN_CHROME_PATH`、`DOUYIN_FORCE_SWIFTSHADER`、调试开关等
|
||||
- 微信保持已有 `WEIXIN_*` 配置
|
||||
- B站/小红书使用通用默认值
|
||||
|
||||
#### 修复 — `qr_login_service.py` 平台配置映射
|
||||
```python
|
||||
# 之前:所有平台都用 WEIXIN 设置
|
||||
# 之后:每个平台独立配置
|
||||
PLATFORM_CONFIGS = {
|
||||
"douyin": { headless, user_agent, locale, timezone... },
|
||||
"weixin": { headless, user_agent, locale, timezone... },
|
||||
"bilibili": { 通用配置 },
|
||||
"xiaohongshu": { 通用配置 },
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 2. Per-User Cookie Management

#### Problem

Multiple users shared one set of Cookie files, so user A's login state could be overwritten by user B.

#### Fix: `publish_service.py`

- `_get_cookies_dir(user_id)` → `backend/user_data/{uuid}/cookies/`
- `_get_cookie_path(user_id, platform)` → returns a separate Cookie file path per user and platform
- `_get_session_key(user_id, platform)` → session key in `"{user_id}_{platform}"` format
- `user_id` is passed through the entire login/publish flow, and leftover sessions are cleaned up to avoid interference
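
The three helpers above could be shaped roughly as follows. This is a sketch, not the code in `publish_service.py`: the public function names, the `USER_DATA_ROOT` constant, and the `{platform}_cookies.json` file name are assumptions made for illustration.

```python
from pathlib import Path

# Assumed root, matching the `backend/user_data/{uuid}/cookies/` layout above.
USER_DATA_ROOT = Path("backend/user_data")


def get_cookies_dir(user_id: str) -> Path:
    """Directory that holds one user's cookie files."""
    return USER_DATA_ROOT / user_id / "cookies"


def get_cookie_path(user_id: str, platform: str) -> Path:
    """Cookie file for a user + platform pair (file name is hypothetical)."""
    return get_cookies_dir(user_id) / f"{platform}_cookies.json"


def get_session_key(user_id: str, platform: str) -> str:
    """Key for the in-memory login session of a user + platform pair."""
    return f"{user_id}_{platform}"
```

Keying both the file path and the session by `user_id` means two users logging in to the same platform at the same time never touch the same file or session entry.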
|
||||
|
||||
---
|
||||
|
||||
### 三、抖音刷脸验证二维码
|
||||
|
||||
#### 问题
|
||||
抖音扫码登录后可能弹出刷脸验证窗口,内含新的二维码需要用户再次扫描,前端无法感知和展示。
|
||||
|
||||
#### 修复 — 后端 `qr_login_service.py`
|
||||
- 扩展 QR 选择器:支持跨 iframe 搜索二维码元素
|
||||
- 抖音 API 拦截:监听 `check_qrconnect` 响应,检测 `redirect_url`
|
||||
- 检测 "完成验证" / "请前往APP完成验证" 文案
|
||||
- 在验证弹窗内找到正方形二维码(排除头像),截图返回给前端
|
||||
- API 确认后直接导航到 redirect_url(不重新加载 QR 页,避免销毁会话)
|
||||
|
||||
#### 修复 — 后端 `publish_service.py`
|
||||
- `get_login_session_status()` 新增 `face_verify_qr` 字段返回
|
||||
- 登录成功且 Cookie 保存后自动清理会话
|
||||
|
||||
#### 修复 — 前端
|
||||
- `usePublishController.ts`:新增 `faceVerifyQr` 状态,轮询时获取 `face_verify_qr` 字段
|
||||
- `PublishPage.tsx`:QR 弹窗优先展示刷脸验证二维码,附提示文案
|
||||
|
||||
```tsx
|
||||
{faceVerifyQr ? (
|
||||
<>
|
||||
<Image src={`data:image/png;base64,${faceVerifyQr}`} />
|
||||
<p>需要身份验证,请用抖音APP扫描上方二维码完成刷脸验证</p>
|
||||
</>
|
||||
) : /* 普通登录二维码 */ }
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
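### IV. WeChat Channels publishing flow

#### Fix: `weixin_uploader.py`

- Added `user_id` parameter support; publish screenshots are stored in per-user directories
- Listens for the `post_create` API response to determine publish success precisely
- Success criteria: the URL leaves the create page, or the API confirms the submission
- Title and tags are now written into the single "视频描述" (video description) field instead of separate title/tags inputs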
|
||||
|
||||
---
|
||||
|
||||
### Files changed

| File | Change |
|------|------|
| `backend/app/core/config.py` | Added standalone DOUYIN_* settings |
| `backend/app/services/qr_login_service.py` | Per-platform configuration, face-verification QR code, cross-iframe selectors |
| `backend/app/services/publish_service.py` | Per-user cookie management, face-verification status in the response |
| `backend/app/services/uploader/weixin_uploader.py` | user_id support, post_create API listener, merged description field |
| `frontend/src/features/publish/model/usePublishController.ts` | faceVerifyQr state |
| `frontend/src/features/publish/ui/PublishPage.tsx` | Face-verification QR code display |

### Restart required

```bash
pm2 restart vigent2-backend  # publishing service + QR login
npm run build && pm2 restart vigent2-frontend  # face-verification UI
```
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ Architecture cleanup: frontend structure tweaks + backend module layering (Day 21)

### Overview

Following the architecture audit, the frontend directory layout was normalized and the core backend modules were split into layers.

### I. Frontend structure tweaks

#### 1. Moving ScriptExtractionModal

- `components/ScriptExtractionModal.tsx` → `features/home/ui/ScriptExtractionModal.tsx`
- The `components/script-extraction/` directory moves with it to `features/home/ui/script-extraction/`
- Updated the import path in `HomePage.tsx`

#### 2. Merging the contexts/ directory

- `src/contexts/AuthContext.tsx` → `src/shared/contexts/AuthContext.tsx`
- `src/contexts/TaskContext.tsx` → `src/shared/contexts/TaskContext.tsx`
- Updated 6 imports (layout.tsx, useHomeController.ts, usePublishController.ts, AccountSettingsDropdown.tsx, GlobalTaskIndicator.tsx)
- Removed the now-empty `src/contexts/` directory

#### 3. Removing empty directories left by the refactor

- Removed `src/lib/`, `src/components/home/`, `src/hooks/`

### II. Backend module layering

Three router-only modules of 400+ lines each were split into `router.py + schemas.py + service.py`:

| Module | Before | Router after |
|------|--------|--------------|
| `materials/` | 416 lines | 63 lines |
| `tools/` | 417 lines | 33 lines |
| `ref_audios/` | 421 lines | 71 lines |

All business logic moved into `service.py`, data models are defined in `schemas.py`, and the router only validates parameters, calls the service, and returns the response.

### III. Development guideline update

Section 8 of `BACKEND_DEV.md` adds an incremental-adoption principle:

- New modules **must** include `router.py + schemas.py + service.py`
- When changing an old module, split the parts you touch along the way
- New code is held to the high standard; old code is migrated gradually

### Files changed

| File | Change |
|------|------|
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | Moved in from components/ |
| `frontend/src/features/home/ui/script-extraction/` | Moved in from components/ |
| `frontend/src/shared/contexts/AuthContext.tsx` | Moved in from contexts/ |
| `frontend/src/shared/contexts/TaskContext.tsx` | Moved in from contexts/ |
| `backend/app/modules/materials/schemas.py` | **New** |
| `backend/app/modules/materials/service.py` | **New** |
| `backend/app/modules/materials/router.py` | Reduced to a thin router |
| `backend/app/modules/tools/schemas.py` | **New** |
| `backend/app/modules/tools/service.py` | **New** |
| `backend/app/modules/tools/router.py` | Reduced to a thin router |
| `backend/app/modules/ref_audios/schemas.py` | **New** |
| `backend/app/modules/ref_audios/service.py` | **New** |
| `backend/app/modules/ref_audios/router.py` | Reduced to a thin router |
| `Docs/BACKEND_DEV.md` | Directory tree annotated with layers, incremental-adoption principle added |
| `Docs/BACKEND_README.md` | Directory tree annotated with layers |
| `Docs/FRONTEND_DEV.md` | Directory tree updated (contexts move, ScriptExtractionModal move) |

### Restart required

```bash
pm2 restart vigent2-backend
npm run build && pm2 restart vigent2-frontend
```
|
||||
|
||||
---
|
||||
|
||||
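## 🎬 Multi-material video generation (multi-camera effect)

### Overview

Users can upload several selfie videos shot from different angles. During generation the material switches automatically from one segment to the next, so the result looks like a multi-camera shoot. With a single material the original pipeline runs unchanged, at no extra cost.

### Core architecture

#### Pipeline changes

```
[Single material (unchanged)]
text → TTS → audio → LatentSync (1 material + full audio) → Whisper subtitles → Remotion → final video

[Multiple materials (new)]
text → TTS → audio → Whisper subtitles (moved earlier) → split duration equally by material count (aligned to character boundaries)
  → for each segment: slice audio + LatentSync(material[i] + audio slice[i])
  → FFmpeg concatenates all segments → Remotion (full subtitle timestamps) → final video
```

#### Material switching logic (equal split)

1. Whisper transcribes the full audio and yields character-level timestamps
2. The **total audio duration is divided equally** by the number of materials (`total_duration / N`)
3. Each split point is snapped to the nearest Whisper character boundary, so no character is cut in half
4. The first segment's start is extended to 0.0 and the last segment's end to the end of the audio, which guarantees full coverage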
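
The four steps above can be sketched as a small pure function. This is an illustration under stated assumptions rather than the project's `_split_equal()`: the input shape (a sorted list of `(start, end)` tuples, one per character) and the function signature are hypothetical, and the cap on the segment count is a defensive choice of this sketch.

```python
from bisect import bisect_left


def split_equal(char_times, n_materials, audio_duration):
    """Split [0, audio_duration] into contiguous segments snapped to character boundaries.

    char_times: sorted list of (start, end) tuples, one per transcribed character.
    Returns a list of (start, end) tuples that covers the whole audio.
    """
    if n_materials < 1 or audio_duration <= 0:
        raise ValueError("need at least one material and a positive duration")
    # Candidate cut points: the end time of every character except the last.
    boundaries = [end for _, end in char_times[:-1]]
    # Never ask for more segments than there are characters.
    n = min(n_materials, len(boundaries) + 1)
    cuts = []
    for i in range(1, n):
        target = audio_duration * i / n  # ideal equal-split point
        pos = bisect_left(boundaries, target)
        candidates = boundaries[max(pos - 1, 0):pos + 1]
        nearest = min(candidates, key=lambda t: abs(t - target))
        if not cuts or nearest > cuts[-1]:  # keep cut points strictly increasing
            cuts.append(nearest)
    # Force the first start to 0.0 and the last end to the audio duration.
    edges = [0.0] + cuts + [audio_duration]
    return list(zip(edges[:-1], edges[1:]))
```

For example, with four characters ending at 0.5, 1.0, 1.6 and 2.0 seconds and two materials, the ideal cut at 1.0 s coincides with a character boundary, so the segments are `(0.0, 1.0)` and `(1.0, 2.0)`.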
|
||||
|
||||
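> **Design decision**: The first approach split on sentence punctuation in the original script, but user scripts often contain no full stops (only commas), which produced a single segment. The equal-split approach does not depend on punctuation and splits any input correctly.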
|
||||
|
||||
---
|
||||
|
||||
### I. Backend changes

#### 1. `backend/app/modules/videos/schemas.py`

- Added the `material_paths: Optional[List[str]]` field
- Kept `material_path: str` for backward compatibility

#### 2. `backend/app/modules/videos/workflow.py` (core change)

**New function**:

- `_split_equal(segments, material_paths)`: divides the audio duration equally by the number of materials and snaps each split to the nearest Whisper character boundary

**Changes to `process_video_generation()`**:

- `is_multi = len(material_paths) > 1` selects the multi-material or single-material branch
- Multi-material branch: Whisper first → equal split → audio slicing → per-segment LatentSync → FFmpeg concat

#### 3. `backend/app/services/video_service.py`

- Added `concat_videos()`: joins video segments with the FFmpeg concat demuxer (`-c copy`)
- Added `split_audio()`: slices audio by time range with FFmpeg (`-ss` + `-t` + `-c copy`)
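
As a rough illustration of the two FFmpeg invocations named above, the sketch below only builds the argument lists and does not run anything. The helper names are hypothetical, and the argument order is one reasonable arrangement, not necessarily what `video_service.py` uses.

```python
def build_split_audio_cmd(src, dst, start, end):
    """Build an FFmpeg command that stream-copies the [start, end) range of an audio file."""
    duration = end - start
    if duration <= 0:
        raise ValueError(f"invalid range: start={start}, end={end}")
    # -ss/-t placed before -i act as input options (seek, then limit read duration).
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",
            "-i", str(src), "-c", "copy", str(dst)]


def concat_list_text(video_paths):
    """Render the list file consumed by the concat demuxer: one `file '<path>'` per line."""
    if not video_paths:
        raise ValueError("video_paths is empty")
    return "".join(f"file '{p}'\n" for p in video_paths)


def build_concat_cmd(list_file, dst):
    """Build an FFmpeg concat-demuxer command that joins segments without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(dst)]
```

Stream copy (`-c copy`) avoids a re-encode, which is why it only works cleanly when all segments share codec parameters.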
|
||||
|
||||
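#### 4. `backend/scripts/watchdog.py`

- Health-check failure threshold raised from 3 to 5 (a 2.5-minute tolerance window)
- Added a 120-second cooldown after a restart so a service is not flagged as failed while its model is loading
- All services get a 60-second initial cooldown at startup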
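
The threshold-plus-cooldown behaviour described above might look like the sketch below. The class and its injectable clock are hypothetical; only the numbers (5 failures, 120 s restart cooldown, 60 s startup cooldown) come from the log.

```python
import time


class ServiceHealth:
    """Tracks consecutive health-check failures, ignoring checks during cooldown."""

    def __init__(self, threshold=5, restart_cooldown=120.0,
                 startup_cooldown=60.0, now=time.monotonic):
        self._now = now  # injectable clock, so the logic can be tested
        self.threshold = threshold
        self.restart_cooldown = restart_cooldown
        self.failures = 0
        self.cooldown_until = now() + startup_cooldown

    def record(self, healthy):
        """Record one check result; return True when the service should be restarted."""
        if self._now() < self.cooldown_until:
            self.failures = 0  # results during cooldown do not count
            return False
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0
            self.cooldown_until = self._now() + self.restart_cooldown
            return True
        return False
```

Resetting the counter during cooldown means a slow model load cannot accumulate failures that trigger a second restart the moment the cooldown ends.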
|
||||
|
||||
---
|
||||
|
||||
### II. Frontend changes

#### 1. New dependencies

```bash
npm install @dnd-kit/core @dnd-kit/sortable @dnd-kit/utilities
```

#### 2. `frontend/src/features/home/model/useMaterials.ts`

- `selectedMaterial: string` → `selectedMaterials: string[]` (multi-select)
- Added `toggleMaterial(id)`: toggles selection (at least one item stays selected)
- Added `reorderMaterials(activeId, overId)`: drag-and-drop reordering
- More upload formats accepted: `.mkv/.webm/.flv/.wmv/.m4v/.ts/.mts`

#### 3. `frontend/src/features/home/ui/MaterialSelector.tsx` (rewritten)

- Each row in the material list gets a checkbox and an order badge (①②③)
- With 2 or more selected, a drag-to-reorder area appears (@dnd-kit `SortableContext`)
- Each sortable item: drag handle + order number + material name + remove button
- The HTML input `accept` attribute is now `video/*`

#### 4. `frontend/src/features/home/model/useHomeController.ts`

- Multi-material payload: a `material_paths` array plus `material_path` for backward compatibility
- `enable_subtitles` is hard-coded to `true` (the toggle was removed)
- Validation: at least one material must be selected

#### 5. `frontend/src/features/home/model/useHomePersistence.ts`

- Material persistence is now a JSON array, backward compatible with the old format (a single string)
- Removed `enableSubtitles` persistence

#### 6. `frontend/src/features/home/ui/TitleSubtitlePanel.tsx`

- Removed the "逐字高亮字幕" (per-character highlighted subtitles) toggle; the subtitle style section is always visible

#### 7. `frontend/src/features/home/ui/HomePage.tsx`

- Updated the props passed down (`selectedMaterials`, `toggleMaterial`, `reorderMaterials`)

---

### III. Bug fix log

#### BUG-1: Only the first video was used (punctuation-based sentence splitting failed)

- **Symptom**: Two materials were selected but the generated video used only the first; the log showed `Multi-material: 1 segments, 2 materials`.
- **Root cause v1**: Sentences were first split with the regex `[。!?!?]` on the Whisper output, but Whisper does not emit punctuation.
- **Fix v1**: Split on punctuation in the original script instead. User scripts, however, often contain only commas (,) and no sentence-ending punctuation (。!?), so the result still collapsed to one segment.
- **Final fix**: Dropped punctuation-based splitting altogether in favour of `_split_equal()`, which **divides the audio duration equally by the number of materials** and snaps to the nearest Whisper character boundary. It depends on no punctuation and works for every script.

#### BUG-2: Lip sync out of step (audio time offset)

- **Root cause**: `split_audio` sliced the audio using Whisper's start/end times (for example 0.11 to 7.21), while `compose()` mixed in the full original audio (0.0 to the end), which introduced a time offset.
- **Fix**: Force the first segment's start to 0.0 and the last segment's end to the actual audio duration, so the slices cover the whole audio.

#### BUG-3: min_segment_sec merged too aggressively (removed with the switch of approach)

- **Root cause**: In the old approach, when the second of two sentences was shorter than 3 seconds, the minimum-duration check merged them into one segment and multi-material degraded to single-material.
- **Status**: The equal-split approach does not have this problem; the related code has been removed.

---

### Files changed

| File | Type | Notes |
|------|----------|------|
| `backend/app/modules/videos/schemas.py` | Modified | Added the material_paths field |
| `backend/app/modules/videos/workflow.py` | Modified | Core multi-material pipeline logic + 3 bug fixes |
| `backend/app/services/video_service.py` | Modified | Added concat_videos / split_audio |
| `backend/scripts/watchdog.py` | Modified | Threshold tuning + cooldown mechanism |
| `frontend/package.json` | Modified | Added the @dnd-kit dependencies |
| `frontend/src/features/home/model/useMaterials.ts` | Modified | Multi-select + ordering state |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Rewritten | Multi-select checkboxes + drag-to-reorder UI |
| `frontend/src/features/home/model/useHomeController.ts` | Modified | Multi-material payload + subtitle toggle removed |
| `frontend/src/features/home/model/useHomePersistence.ts` | Modified | JSON array persistence |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | Modified | Subtitle toggle removed |
| `frontend/src/features/home/ui/HomePage.tsx` | Modified | Updated props |

### Restart required

```bash
pm2 restart vigent2-backend
npm run build && pm2 restart vigent2-frontend
```
|
||||
221
Docs/DevLogs/Day22.md
Normal file
@@ -0,0 +1,221 @@
|
||||
## 🔧 Multi-material generation: optimization and hardening (Day 22)

### Overview

A full review of the multi-material (multi-camera) video generation built on Day 21. It fixed 6 high-priority bugs, delivered 8 usability improvements, and restructured the multi-material pipeline from "LatentSync per segment" to "concatenate first, then infer", cutting the number of inference runs from N to 1.

---

### I. High-priority backend bug fixes

#### 1. `_split_equal()` boundary overflow when materials outnumber characters

- **Problem**: With 5 materials but only 2 Whisper characters, boundary indices repeated and some materials were skipped
- **Fix**: Added the cap `n = min(n, len(all_chars))`
- **File**: `backend/app/modules/videos/workflow.py`

#### 2. No fallback when one LatentSync segment fails in multi-material mode

- **Problem**: In single-material mode a LatentSync failure falls back to the original material, but multi-material mode raised an exception and the whole task failed
- **Fix**: Wrapped the multi-material loop body in try-except; on failure it falls back to the original material segment
- **File**: `backend/app/modules/videos/workflow.py`

#### 3. ZeroDivisionError when `num_segments == 0`

- **Problem**: Once every assignment had been skipped, `i / num_segments` divided by zero
- **Fix**: Added an `if num_segments == 0` check before the loop that raises an explicit error
- **File**: `backend/app/modules/videos/workflow.py`

#### 4. `split_audio` did not check duration > 0

- **Problem**: FFmpeg misbehaves when `end <= start`
- **Fix**: Added `if duration <= 0: raise ValueError(...)`
- **File**: `backend/app/services/video_service.py`

#### 5. Equal split by duration as a fallback when Whisper fails

- **Problem**: A Whisper failure degraded straight to single-material mode and the other materials went unused
- **Fix**: Split by `audio_duration / len(material_paths)`, without relying on character alignment
- **File**: `backend/app/modules/videos/workflow.py`
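
The duration-only fallback in item 5 is simple enough to sketch in a few lines. The function name and return shape are hypothetical; the only rule taken from the log is `audio_duration / len(material_paths)`.

```python
def split_by_duration(audio_duration, n_materials):
    """Divide [0, audio_duration] into n equal segments, ignoring character timestamps."""
    if n_materials < 1 or audio_duration <= 0:
        raise ValueError("need at least one material and a positive duration")
    step = audio_duration / n_materials
    # Use audio_duration itself as the final edge to avoid float drift at the end.
    edges = [i * step for i in range(n_materials)] + [audio_duration]
    return list(zip(edges[:-1], edges[1:]))
```

Unlike the character-aligned split, a cut here can land in the middle of a spoken character, which is an acceptable trade-off only because it runs when no timestamps are available.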
|
||||
|
||||
#### 6. `concat_videos` did not check for an empty list

- **Problem**: FFmpeg fails when `video_paths` is empty
- **Fix**: Added `if not video_paths: raise ValueError(...)`
- **File**: `backend/app/services/video_service.py`

---

### II. Frontend improvements

#### 1. Non-null assertion fix in payload construction

- `m!.path` → `m?.path` + `.filter(Boolean)`, which prevents a crash after a material has been deleted
- **File**: `frontend/src/features/home/model/useHomeController.ts`

#### 2. Generate button shows backend progress messages

- New `message` prop; during generation it shows text such as "(正在处理片段 2/3...)" ("processing segment 2/3...")
- **Files**: `frontend/src/features/home/ui/GenerateActionBar.tsx`, `HomePage.tsx`

#### 3. Newly uploaded materials are selected automatically

- After a successful upload the material lists before and after are compared, and new IDs are appended to `selectedMaterials`
- **File**: `frontend/src/features/home/model/useMaterials.ts`

#### 4. Unified Material interface

- Three duplicate `interface Material` definitions were extracted into `shared/types/material.ts`
- **Files**: `frontend/src/shared/types/material.ts` (new), `useMaterials.ts`, `useHomeController.ts`, `MaterialSelector.tsx`

#### 5. Drag-and-drop reordering fix

- Removed `DragOverlay` (`backdrop-blur` creates a new containing block, which broke positioning)
- Switched to native `useSortable` dragging + `CSS.Translate`; the dragged element is highlighted with a shadow
- **File**: `frontend/src/features/home/ui/MaterialSelector.tsx`

#### 6. Material selection capped at 4

- `toggleMaterial` enforces a new `MAX_MATERIALS = 4` limit
- Once the limit is reached, unselected items become semi-transparent and disabled; the hint now reads "可多选,最多4个" ("multi-select, up to 4")
- **Files**: `useMaterials.ts`, `MaterialSelector.tsx`

#### 7. Responsive sort area on mobile

- Material list `max-h-64` → `max-h-48 sm:max-h-64`
- **File**: `MaterialSelector.tsx`

#### 8. Multi-material duration hint

- With 2 or more materials selected, the text "多素材模式 (N 个机位),生成耗时较长" ("multi-material mode (N cameras), generation takes longer") appears below the generate button
- **Files**: `GenerateActionBar.tsx`, `HomePage.tsx`
|
||||
|
||||
---
|
||||
|
||||
### III. Core architecture refactor: concatenate first, then infer

#### V1 (Day 21): LatentSync per segment

```
material A → LatentSync(material A, audio slice 1) → lipsync_A
material B → LatentSync(material B, audio slice 2) → lipsync_B
FFmpeg concat(lipsync_A, lipsync_B) → final video
```

- Drawback: N materials = N LatentSync inference runs (about 30 s each)

#### V2 (Day 22): concatenate first, then infer

```
material A → prepare_segment(trim to 3.67s) → prepared_A
material B → prepare_segment(trim to 4.00s) → prepared_B
FFmpeg concat(prepared_A, prepared_B) → concat_video (7.67s)
LatentSync(concat_video, full audio) → final video
```

- Benefit: only **1** LatentSync inference run, so the time drops from N×30s to 1×30s

#### New `prepare_segment()` method

```python
def prepare_segment(self, video_path, target_duration, output_path, target_resolution=None):
    # Source longer than target: trim (-t)
    # Source shorter than target: loop (-stream_loop) + trim
    # Same resolution: -c copy, lossless (no re-encode)
    # Different resolution: scale + pad to the first material's resolution
    ...
```

#### Resolution handling

- New `get_resolution()` method detects each material's resolution
- When all materials share one resolution: lossless trim with `-c copy` (original quality preserved)
- When resolutions differ: everything is normalized to the first material's resolution with `force_original_aspect_ratio=decrease` + centered `pad`
- LatentSync only processes the 512×512 mouth region; the output keeps the original resolution
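
To make the trim/loop and copy/scale decisions above concrete, here is a sketch that builds the FFmpeg argument list for one segment. It is not the project's `prepare_segment()`: the function name and parameters are hypothetical, durations and resolutions are passed in rather than probed, and encoder settings are left to FFmpeg's defaults.

```python
def build_prepare_cmd(src, dst, src_duration, target_duration,
                      src_resolution, target_resolution=None):
    """Build an FFmpeg command that trims or loops `src` to target_duration seconds."""
    cmd = ["ffmpeg", "-y"]
    if src_duration < target_duration:
        # Loop the input indefinitely; the -t below cuts it to length.
        cmd += ["-stream_loop", "-1"]
    cmd += ["-i", str(src), "-t", f"{target_duration:.3f}"]
    if target_resolution is None or tuple(src_resolution) == tuple(target_resolution):
        # Same resolution: stream copy, no re-encode.
        cmd += ["-c", "copy"]
    else:
        # Different resolution: fit inside the target box, then pad to center.
        w, h = target_resolution
        vf = (f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
              f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2")
        cmd += ["-vf", vf]
    cmd.append(str(dst))
    return cmd
```

Scaling with `force_original_aspect_ratio=decrease` keeps the picture undistorted, and the `pad` filter fills the remaining area so every segment reaches the concat step with identical dimensions.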
|
||||
|
||||
#### Time alignment check

| Stage | Time base | Alignment |
|------|---------|---------|
| TTS audio | Original duration (7.67s) | Reference |
| Whisper subtitles | Based on the TTS audio | Timestamps aligned to the audio |
| Equal split | Total of assignments = audio duration | First start=0, last end=audio_duration |
| prepare per segment | `-t seg_dur` exact cut | Sum ≈ audio duration |
| LatentSync | concat_video + full audio | 0.5s internal tolerance |
| compose | lipsync_video + audio/BGM | `-shortest` keeps them in sync |
| Remotion | Renders subtitles from captions_path | Timestamps aligned to the audio |

---

### Files changed

| File | Type | Notes |
|------|----------|------|
| `backend/app/modules/videos/workflow.py` | Modified | 6 bug fixes + pipeline refactor (concatenate first, then infer) |
| `backend/app/services/video_service.py` | Modified | Added `prepare_segment()` and `get_resolution()`, `split_audio` validation, `concat_videos` empty-list check |
| `frontend/src/shared/types/material.ts` | New | Unified Material interface |
| `frontend/src/features/home/model/useMaterials.ts` | Modified | Auto-select on upload, cap of 4 materials |
| `frontend/src/features/home/model/useHomeController.ts` | Modified | Payload non-null assertion fix, Material interface import |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Modified | Drag fix, cap-of-4 UI, mobile responsiveness |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | Modified | Progress message display, multi-material duration hint |
| `frontend/src/features/home/ui/HomePage.tsx` | Modified | Passes the message and materialCount props |
|
||||
|
||||
---
|
||||
|
||||
### IV. AI multilingual translation

#### Feature

A new "AI多语言" (AI multilingual) button in the script editor translates a Chinese voiceover script into any of 9 languages in one click, and the original can be restored at any time.

#### Supported languages

English, Japanese (日本語), Korean (한국어), French (Français), German (Deutsch), Spanish (Español), Russian (Русский), Italian (Italiano), Portuguese (Português)

#### Implementation

##### Backend

- **`backend/app/services/glm_service.py`**: new `translate_text()` method that calls the Zhipu GLM API (temperature=0.3); the prompt asks for the translation only, preserving tone and style
- **`backend/app/modules/ai/router.py`**: new `POST /api/ai/translate` endpoint that accepts `{text, target_lang}` and returns `{translated_text}`

##### Frontend

- **`frontend/src/features/home/ui/ScriptEditor.tsx`**: new `LANGUAGES` list (9 languages), a language dropdown (closes on outside click), a loading state while translating, and a "还原原文" (restore original) button that appears at the top of the menu after a translation
- **`frontend/src/features/home/model/useHomeController.ts`**: new `handleTranslate` (calls the translation API and saves the original on the first translation), `originalText` state, and `handleRestoreOriginal` (restores the original)

#### Files changed

| File | Type | Notes |
|------|------|------|
| `backend/app/services/glm_service.py` | Modified | Added the `translate_text()` method |
| `backend/app/modules/ai/router.py` | Modified | Added the `/api/ai/translate` endpoint |
| `frontend/src/features/home/ui/ScriptEditor.tsx` | Modified | Language menu UI, translation loading state, restore-original button |
| `frontend/src/features/home/model/useHomeController.ts` | Modified | `handleTranslate`, `originalText`, `handleRestoreOriginal` |

---

### V. Multilingual TTS support

#### Background

With translation in place, users can translate a Chinese script into other languages. When generating a video afterwards, however, TTS still supported Chinese only:

- **EdgeTTS**: the voice list contained only 5 `zh-CN-*` Chinese voices
- **Voice cloning (Qwen3-TTS)**: the `language` parameter was hard-coded to `"Chinese"`

#### Approach

##### 1. Frontend: language-aware voice list

- `VOICES` grows from a flat array into `Record<string, VoiceOption[]>` covering 10 languages (zh-CN / en-US / ja-JP / ko-KR / fr-FR / de-DE / es-ES / ru-RU / it-IT / pt-BR), with 2 voices (male/female) per language
- New `LANG_TO_LOCALE` mapping: translation target language name → EdgeTTS locale (for example `"English" → "en-US"`)
- New `textLang` state tracks the current script language, default `"zh-CN"`

##### 2. Voice switches automatically on translation

- After `handleTranslate` succeeds: `textLang` is set from the target language, and in EdgeTTS mode `voice` switches to that language's default voice
- On `handleRestoreOriginal`: `textLang` resets to `"zh-CN"` and the default Chinese voice is restored
- `VoiceSelector` shows the voice list for the current `textLang`

##### 3. Passing the language through to voice cloning

- Frontend: new `LOCALE_TO_QWEN_LANG` mapping (`zh-CN→"Chinese"`, `en-US→"English"`, everything else→`"Auto"`)
- The generate request payload includes a `language` field (voice-cloning mode only)
- The backend `GenerateRequest` schema adds a `language: str = "Chinese"` field
- `workflow.py`: the hard-coded `language="Chinese"` becomes `language=req.language`
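
The locale-to-language mapping lives in the TypeScript frontend, but its logic is easy to show in Python for consistency with the other sketches here. The helper names and the `text` payload key are assumptions; the mapping values, the `"Auto"` default, and the rule that `language` is sent only in voice-cloning mode come from the description above.

```python
# Mirrors the frontend LOCALE_TO_QWEN_LANG mapping; other locales fall back to "Auto".
LOCALE_TO_QWEN_LANG = {"zh-CN": "Chinese", "en-US": "English"}


def get_qwen_language(locale: str) -> str:
    """Map an EdgeTTS-style locale to the language name the cloning model expects."""
    return LOCALE_TO_QWEN_LANG.get(locale, "Auto")


def build_generate_payload(text, tts_mode, text_lang, voice=None):
    """Assemble the TTS-related part of a generate request (illustrative keys)."""
    payload = {"text": text, "tts_mode": tts_mode}
    if tts_mode == "voiceclone":
        # Voice cloning needs the language name, not an EdgeTTS voice id.
        payload["language"] = get_qwen_language(text_lang)
    else:
        payload["voice"] = voice
    return payload
```

Keeping the backend default at `"Chinese"` means older clients that never send `language` behave exactly as before.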
|
||||
|
||||
##### 4. Bug fix: persisting textLang

- **Problem**: `voice` was persisted but `textLang` was not. After a page refresh `voice` came back as an English voice while `textLang` fell back to Chinese, so VoiceSelector showed the Chinese voice list with an English voice selected and no button highlighted
- **Fix**: `useHomePersistence` now reads and writes `textLang` in localStorage

#### Data flow

```
User translates to "English"
  → ScriptEditor.onTranslate("English")
  → LANG_TO_LOCALE["English"] = "en-US"
  → setTextLang("en-US"), setVoice("en-US-GuyNeural")
  → VoiceSelector shows VOICES["en-US"] = [Guy, Jenny]
  → On generate:
      EdgeTTS: payload.voice = "en-US-GuyNeural"
      Voice cloning: payload.language = "English" (via getQwenLanguage)
```

#### Files changed

| File | Type | Notes |
|------|------|------|
| `frontend/src/features/home/model/useHomeController.ts` | Modified | Multilingual VOICES Record, textLang state, LANG_TO_LOCALE / LOCALE_TO_QWEN_LANG mappings, automatic voice switch on translation |
| `frontend/src/features/home/model/useHomePersistence.ts` | Modified | textLang persistence |
| `backend/app/modules/videos/schemas.py` | Modified | GenerateRequest gains the `language` field |
| `backend/app/modules/videos/workflow.py` | Modified | Voice-cloning call uses `req.language` instead of the hard-coded value |
|
||||
856
Docs/DevLogs/Day23.md
Normal file
@@ -0,0 +1,856 @@
|
||||
## 🎙️ Voiceover-first refactor: phase 1 (Day 23)

### Overview

Voiceover generation is pulled out of the video generation pipeline, giving a new workflow: generate the voiceover → select it → pick materials → generate the video. Users can manage voiceovers on their own (generate, preview, rename, delete, select) and see the duration once one is selected, which provides the data that phase 2's material timeline needs.

**Old flow**: script + materials → one-click generate (inline TTS → Whisper → equal split → LipSync → compose)
**New flow**: script → voice mode → **generate voiceover** → select voiceover → pick materials → background music → generate video

---

### I. Backend: new `generated_audios` module

#### Module layout

```
backend/app/modules/generated_audios/
├── __init__.py
├── router.py        # 5 API endpoints
├── schemas.py       # request/response models
└── service.py       # generate / list / delete / rename
```

#### API endpoints

| Method | Path | Description |
|------|------|------|
| POST | `/api/generated-audios/generate` | Generate a voiceover asynchronously (returns task_id) |
| GET | `/api/generated-audios/tasks/{task_id}` | Poll generation progress |
| GET | `/api/generated-audios` | List all of the user's voiceovers |
| DELETE | `/api/generated-audios/{audio_id}` | Delete a voiceover |
| PUT | `/api/generated-audios/{audio_id}` | Rename |

#### Storage

- Supabase bucket: `generated-audios` (created automatically at startup)
- Audio file: `{user_id}/{timestamp}_audio.wav`
- Metadata file: `{user_id}/{timestamp}_audio.json` (contains display_name, text, tts_mode, duration_sec, and so on)
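
The storage layout above pairs every audio object with a metadata object under the same prefix. A minimal sketch of that naming scheme follows; the helper names are hypothetical, the timestamp format is left to the caller because the log does not specify it, and the metadata dict contains only the four fields the log names.

```python
import json


def audio_object_keys(user_id, timestamp):
    """Return the (wav_key, json_key) pair for one generated voiceover."""
    base = f"{user_id}/{timestamp}_audio"
    return f"{base}.wav", f"{base}.json"


def build_metadata(display_name, text, tts_mode, duration_sec):
    """Serialize the metadata stored next to the audio file."""
    meta = {
        "display_name": display_name,
        "text": text,
        "tts_mode": tts_mode,
        "duration_sec": duration_sec,
    }
    # ensure_ascii=False keeps Chinese script text readable in the stored JSON.
    return json.dumps(meta, ensure_ascii=False)
```

Sharing the `{timestamp}_audio` stem means a rename only has to rewrite the small JSON object, and a delete can remove both keys by prefix.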
|
||||
|
||||
#### Generation flow

Reuses the existing `TTSService` / `voice_clone_service` / `task_store`:

```
POST /generate → create task → BackgroundTask:
  1. edgetts → TTSService.generate_audio()
     voiceclone → download ref_audio → voice_clone_service.generate_audio()
  2. ffprobe reads the duration
  3. upload .wav + .json to the generated-audios bucket
  4. update task (status=completed, output={audio_id, duration_sec, ...})
```

---

### II. Backend: video generation workflow changes

#### New `GenerateRequest` field

```python
generated_audio_id: Optional[str] = None  # ID of a pre-generated voiceover (inline TTS is skipped when present)
```

#### New branch in the `workflow.py` TTS stage

```python
if req.generated_audio_id:
    # Download the pre-generated voiceover and read `language` from its metadata
elif req.tts_mode == "voiceclone":
    # Existing voice-cloning logic
else:
    # Existing EdgeTTS logic
```

Backward compatible: when `generated_audio_id` is not sent, the existing inline TTS flow is unaffected.

---

### III. Frontend: new voiceover list hook + panel

#### `useGeneratedAudios.ts`

- State: `generatedAudios[]`, `selectedAudio`, `isGeneratingAudio`, `audioTask`
- Methods: `fetchGeneratedAudios()`, `generateAudio()`, `deleteAudio()`, `renameAudio()`, `selectAudio()`
- Polling: after a generate request the task status is polled every 1 s; on completion the list refreshes and the newest voiceover is selected
- Independent of the video generation TaskContext (the two do not interfere)

#### `GeneratedAudiosPanel.tsx`

- Per voiceover: play/pause, name, duration, rename, delete
- Selected state: `border-purple-500 bg-purple-500/20`
- Inline progress bar (shown while generating)
- The selected voiceover's original script is shown at the bottom (truncated)
- Playback logic is self-contained in the panel (`new Audio()` + play/pause toggle)

---

### IV. Frontend: panel reordering

**Old order**: MaterialSelector → ScriptEditor → TitleSubtitle → VoiceSelector → BgmPanel → GenerateActionBar

**New order**:

1. ScriptEditor (script editing)
2. TitleSubtitlePanel (title and subtitle style)
3. VoiceSelector (voice mode)
4. **GeneratedAudiosPanel** (voiceover list) ← new
5. MaterialSelector (video materials) ← moved down, unlocked only after a voiceover is selected
6. BgmPanel (background music)
7. GenerateActionBar (generate video)

#### Material section gating

Until a voiceover is selected, the material section shows a semi-transparent overlay with the hint "请先生成并选中配音" ("generate and select a voiceover first"). Uploading, previewing, renaming, and deleting materials stay available; only the selection checkboxes are covered.

#### Duration info

Once a voiceover is selected, the top of MaterialSelector shows:

```
Current voiceover: 45.2 s | 3 materials selected (auto split, ~15.1 s each)
```

#### Updated generate button condition

```typescript
// Old condition
disabled={isGenerating || selectedMaterials.length === 0 || (ttsMode === "voiceclone" && !selectedRefAudio)}
// New condition
disabled={isGenerating || selectedMaterials.length === 0 || !selectedAudio}
```

---

### V. Persistence

`useHomePersistence` now reads and writes `selectedAudioId` in localStorage, so the selected voiceover is restored after a page refresh.

---

### Files changed

#### Backend, new

| File | Notes |
|------|------|
| `backend/app/modules/generated_audios/__init__.py` | Module marker |
| `backend/app/modules/generated_audios/router.py` | 5 API endpoints |
| `backend/app/modules/generated_audios/service.py` | Generate / list / delete / rename |
| `backend/app/modules/generated_audios/schemas.py` | Request/response models |

#### Backend, modified

| File | Change |
|------|------|
| `backend/app/main.py` | Registers the generated_audios router |
| `backend/app/services/storage.py` | Added `BUCKET_GENERATED_AUDIOS`; the bucket is created at startup |
| `backend/app/modules/videos/schemas.py` | `GenerateRequest` gains the `generated_audio_id` field |
| `backend/app/modules/videos/workflow.py` | TTS stage gains the pre-generated audio branch |

#### Frontend, new

| File | Notes |
|------|------|
| `frontend/src/features/home/model/useGeneratedAudios.ts` | Voiceover list hook |
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | Voiceover list panel |

#### Frontend, modified

| File | Change |
|------|------|
| `frontend/src/features/home/ui/HomePage.tsx` | Panel reordering + material section gating + GeneratedAudiosPanel inserted |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | New `selectedAudioDuration` prop + duration info |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | Disabled condition changed to `!selectedAudio` |
| `frontend/src/features/home/model/useHomeController.ts` | Integrates useGeneratedAudios, adds handleGenerateAudio, handleGenerate now uses generated_audio_id |
| `frontend/src/features/home/model/useHomePersistence.ts` | Added selectedAudioId persistence |
|
||||
|
||||
---
|
||||
|
||||
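## 🎞️ Material timeline editing: phase 2 (Day 23)

### Overview

Building on phase 1 (voiceover first), a new **timeline editor** lets users:

1. See how much time each material block gets, drawn over the audio waveform
2. Drag the dividers to change each segment's duration (segments always tile the audio; resizing one shrinks or grows its neighbour)
3. Set a **source clip start** for each segment (start anywhere in the video rather than always at the beginning)

**Old behaviour**: multiple materials were split equally (`_split_equal`), with no control over segment duration or source start
**New behaviour**: visual allocation in the timeline editor + drag to adjust + ClipTrimmer for clip settings

---

### I. Backend changes

#### 1.1 New `CustomAssignment` model

```python
# backend/app/modules/videos/schemas.py
class CustomAssignment(BaseModel):
    material_path: str
    start: float               # start on the audio timeline
    end: float                 # end on the audio timeline
    source_start: float = 0.0  # clip start within the source video
```

`GenerateRequest` gains `custom_assignments: Optional[List[CustomAssignment]] = None`. When present, the Whisper-based equal split is skipped and the user-defined allocation is used directly.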
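
Since the allocation now comes from the client, a server-side sanity check is a natural companion. The sketch below is an assumption, not something the log says exists: it uses a plain dataclass instead of the Pydantic model, and `validate_assignments` and its tolerance are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """Stand-in for CustomAssignment with the same four fields."""
    material_path: str
    start: float
    end: float
    source_start: float = 0.0


def validate_assignments(assignments, audio_duration, tol=0.05):
    """Raise ValueError unless the segments tile [0, audio_duration] in order."""
    if not assignments:
        raise ValueError("custom_assignments is empty")
    cursor = 0.0
    for i, a in enumerate(assignments):
        if a.source_start < 0:
            raise ValueError(f"segment {i}: negative source_start")
        if a.end <= a.start:
            raise ValueError(f"segment {i}: end must be greater than start")
        if abs(a.start - cursor) > tol:
            raise ValueError(f"segment {i}: gap or overlap at {a.start}")
        cursor = a.end
    if abs(cursor - audio_duration) > tol:
        raise ValueError("segments do not reach the end of the audio")
```

A check like this would catch a stale timeline (for example, one built for a different voiceover) before any FFmpeg work starts.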
|
||||
|
||||
#### 1.2 `prepare_segment` supports `source_start`

```python
def prepare_segment(self, video_path, target_duration, output_path,
                    target_resolution=None, source_start: float = 0.0):
    ...
```

Key logic:

- When `source_start > 0`, `-ss` is used for a fast seek and a re-encode is forced (stream copy cuts on keyframes and is not precise)
- When looping is needed and `source_start` is set, the span from `source_start` to the end of the video is cut out first and that file is looped (otherwise `stream_loop` restarts from 0s of the video)
- The temporary trimmed file is cleaned up in `finally`
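
The one-step versus two-step decision described above can be sketched as a planner that returns FFmpeg argument lists without running them. The function name, the temporary file name, and the choice to stream-copy in the looping step are assumptions of this sketch; omitting `-c copy` is how it expresses "re-encode", leaving the encoder to FFmpeg's defaults.

```python
def plan_segment_commands(src, dst, src_duration, target_duration,
                          source_start=0.0, tmp_path="segment_trim_tmp.mp4"):
    """Return the FFmpeg commands (as argument lists) needed for one segment."""
    available = src_duration - source_start
    if available <= 0:
        raise ValueError("source_start is beyond the end of the source video")
    length = ["-t", f"{target_duration:.3f}"]
    if source_start <= 0:
        # No seek: stream copy, looping only if the source is too short.
        loop = ["-stream_loop", "-1"] if available < target_duration else []
        return [["ffmpeg", "-y", *loop, "-i", src, *length, "-c", "copy", dst]]
    seek = ["-ss", f"{source_start:.3f}"]
    if available >= target_duration:
        # Enough footage after the seek point: seek, trim, and re-encode.
        return [["ffmpeg", "-y", *seek, "-i", src, *length, dst]]
    # Not enough footage: cut source_start..end first, then loop that file,
    # so the loop restarts at source_start rather than at 0s.
    trim = ["ffmpeg", "-y", *seek, "-i", src, tmp_path]
    loop = ["ffmpeg", "-y", "-stream_loop", "-1", "-i", tmp_path,
            *length, "-c", "copy", dst]
    return [trim, loop]
```

A caller would run the returned commands in order and delete `tmp_path` in a `finally` block, matching the cleanup rule above.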
|
||||
|
||||
#### 1.3 `workflow.py` supports `custom_assignments`

- **Multi-material mode**: when `custom_assignments` is present the user's allocation is used directly (Whisper still runs to produce subtitles), and every `prepare_segment` call receives `source_start`
- **Single-material mode**: when `custom_assignments` has one entry with `source_start > 0`, the clip is cut first and then passed to LatentSync
- **Backward compatible**: when `custom_assignments` is `None` the old path runs unchanged

---

### II. New frontend components

#### 2.1 `useTimelineEditor.ts`: timeline segment hook

```typescript
interface TimelineSegment {
  id: string;           // React key
  materialId: string;   // material ID
  materialName: string; // display name
  start: number;        // start on the audio timeline, in seconds
  end: number;          // end on the audio timeline, in seconds
  sourceStart: number;  // clip start within the source video (default 0)
  sourceEnd: number;    // clip end within the source video (0 = to the end)
  color: string;        // block colour
}
```

Core methods:

- `initSegments()`: when selectedMaterials changes, divides audioDuration equally by count
- `resizeSegment(id, newEnd)`: drags the right edge; each segment is at least 1s
- `setSourceRange(id, sourceStart, sourceEnd)`: sets the clip range
- `toCustomAssignments()`: converts to the backend `CustomAssignment[]` format
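
The hook itself is TypeScript, but the boundary-drag rule behind `resizeSegment` can be shown in Python to match the other sketches in this log. Everything here is illustrative: segments are plain dicts, the function takes an index rather than an id, and the clamping strategy is an assumption consistent with the "at least 1s per segment" constraint.

```python
MIN_SEGMENT_SEC = 1.0  # minimum duration of any segment, per the constraint above


def resize_segment(segments, index, new_end):
    """Move the boundary between segments[index] and segments[index + 1].

    segments: list of dicts with "start" and "end" keys, tiling the audio.
    Returns a new list; the input is left untouched.
    """
    if not 0 <= index < len(segments) - 1:
        raise IndexError("only a boundary between two segments can be moved")
    left, right = segments[index], segments[index + 1]
    lo = left["start"] + MIN_SEGMENT_SEC   # left segment keeps at least 1s
    hi = right["end"] - MIN_SEGMENT_SEC    # right segment keeps at least 1s
    out = [dict(s) for s in segments]
    if lo > hi:
        return out  # no room to move the boundary at all
    clamped = min(max(new_end, lo), hi)
    out[index]["end"] = clamped
    out[index + 1]["start"] = clamped
    return out
```

Because the left segment's end and the right segment's start are always set to the same value, the segments keep tiling the audio with no gap or overlap.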
|
||||
|
||||
#### 2.2 `TimelineEditor.tsx`: waveform + block timeline

- **wavesurfer.js** renders the audio waveform (display only, no playback)
- The block layer is laid out proportionally and shows material name + duration + clip marker
- Dividers between blocks are draggable (`onPointerDown/Move/Up` give continuous pixel-level dragging)
- Clicking a block opens ClipTrimmer

#### 2.3 `ClipTrimmer.tsx`: clip modal

- HTML5 `<video>` live preview; `video.currentTime` follows the slider while dragging
- Two-ended range slider (start/end) with an interlock of at least 0.5s
- Compares clip duration with allocated duration (hints for looping to fill or truncation)
- Reads the source video duration on `loadedmetadata`

---

### III. Frontend integration changes

#### 3.1 `useHomeController.ts`

- Integrates the `useTimelineEditor` hook
- New `clipTrimmerOpen` / `clipTrimmerSegmentId` state
- `handleGenerate` always sends `custom_assignments` for multiple materials, and also for a single material with `sourceStart > 0`
- Removed the unused `reorderMaterials` export

#### 3.2 `HomePage.tsx`

- TimelineEditor is inserted between MaterialSelector and BgmPanel (shown only when a voiceover is selected and materials are chosen)
- ClipTrimmer modal added at the bottom
- Removed the `reorderMaterials` and `selectedAudioDuration` props

#### 3.3 `MaterialSelector.tsx`

- Removed the voiceover duration bar (moved to TimelineEditor)
- Removed the drag-to-reorder area (SortableChip + @dnd-kit code)
- Removed the `onReorderMaterials` / `selectedAudioDuration` props

---

### IV. Bugs fixed during review

| # | Severity | Problem | Fix |
|---|---------|------|------|
| 1 | **Medium** | `prepare_segment` with `source_start > 0` + stream copy seeks imprecisely | Added `source_start > 0` to the re-encode condition |
| 2 | **High** | With `stream_loop + source_start`, the loop restarted from 0s of the video rather than from source_start | Two steps: cut the clip first, then loop the cut file |
| 3 | **Low** | `useHomeController` exported the obsolete `reorderMaterials` | Removed |

---

### Files changed

#### Backend, modified

| File | Change |
|------|------|
| `backend/app/modules/videos/schemas.py` | New `CustomAssignment` model; `GenerateRequest` gains the `custom_assignments` field |
| `backend/app/services/video_service.py` | `prepare_segment` gains the `source_start` parameter; loop + clip handled in two steps |
| `backend/app/modules/videos/workflow.py` | Multi- and single-material pipelines support `custom_assignments` and pass `source_start` |

#### Frontend, new

| File | Notes |
|------|------|
| `frontend/src/features/home/model/useTimelineEditor.ts` | Timeline segment hook |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | Waveform + block timeline component |
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | Clip modal |

#### Frontend, modified

| File | Change |
|------|------|
| `frontend/src/features/home/ui/HomePage.tsx` | TimelineEditor + ClipTrimmer inserted |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Duration info, drag-to-reorder area, and related props removed |
| `frontend/src/features/home/model/useHomeController.ts` | Integrates useTimelineEditor; handleGenerate sends custom_assignments |
| `frontend/package.json` | Added the `wavesurfer.js` dependency |
|
||||
|
||||
---
|
||||
|
||||
## 🎨 UI polish + TTS stability fixes: phase 3 (Day 23)

### Overview

Based on user feedback, six UI issues were fixed, along with the voice-cloning service's SoX path problem and its GPU memory cache handling.

> **Note**: Qwen3-TTS was later replaced by CosyVoice 3.0 (port 8010); what follows records the fixes as made at the time.

---

### I. Qwen3-TTS stability fixes (since replaced by CosyVoice 3.0)

#### 1.1 SoX PATH fix

**Problem**: When PM2 starts qwen-tts, the `sox` tool lives in the conda env's bin directory, which is not on the system PATH. Audio encoding and decoding therefore took the CPU-intensive fallback path, and the log showed the warning `SoX could not be found!`.

**Fix**: `run_qwen_tts.sh` exports the conda env's bin directory onto PATH:

```bash
export PATH="/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin:$PATH"
```

#### 1.2 CUDA cache cleanup

**Fix**: `qwen_tts_server.py` calls `torch.cuda.empty_cache()` after every generation, whether it succeeds or fails, to keep GPU memory fragmentation from building up. Inference runs in a thread pool via `asyncio.to_thread()`, so it no longer blocks the event loop and causes health-check timeouts.

> **Follow-up**: Qwen3-TTS has been retired; CosyVoice 3.0 keeps the same safeguards (GPU inference lock, timeout protection, GPU memory cleanup, startup self-check).
|
||||
|
||||
---
|
||||
|
||||
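### II. Unified button layout in the voiceover list (feedback #1 + #6)

**Problem**: The preview button in `GeneratedAudiosPanel` sat on the left, apart from Edit/Delete, which did not match the `RefAudioPanel` layout. The script summary at the bottom was also unnecessary.

**Fix**:

- Play/Edit/Delete are grouped on the right and shown on hover, in the order preview → rename → delete
- Removed the script summary for the selected voiceover
- Layout matches RefAudioPanel: name + duration on the left, action buttons on the right

---

### III. Material section no longer gated on a voiceover (feedback #2)

**Problem**: MaterialSelector was covered by the `!selectedAudio` overlay, so materials could not be used until a voiceover was selected.

**Fix**: Removed the disabled overlay `<div>` around MaterialSelector in `HomePage.tsx`. Materials can be uploaded, previewed, and managed at any time; only TimelineEditor requires a selected voiceover (it already has its own condition, `selectedAudio && selectedMaterials.length > 0`).

---

### IV. Drag-to-reorder on the timeline (feedback #3)

**Problem**: TimelineEditor could not swap the order of materials.

**Fix**:

- `useTimelineEditor` already had a `reorderSegments()` method (swaps the material info of two segments while keeping their time ranges)
- `reorderSegments` is exposed through `useHomeController` and passed to `TimelineEditor`
- Blocks support HTML5 Drag & Drop: `draggable` + `onDragStart/Over/Drop/End`
- While dragging: the source block is semi-transparent (`opacity-50`) and the target block gets a highlight ring (`ring-2 ring-purple-400 scale-[1.02]`)
- Cursor styles: `cursor-grab` / `active:cursor-grabbing`

---

### V. Two-handle range slider for clip settings (feedback #4)

**Problem**: ClipTrimmer used two separate `<input type="range">` sliders, so start and end were adjusted independently, which was not intuitive.

**Fix**: Replaced with a custom two-handle range slider:

- One track, with a purple round handle (start) and a pink round handle (end)
- Track background `bg-white/10`; the selected range is highlighted in the material's colour
- Dragging uses Pointer Events: `onPointerDown` captures the handle → `onPointerMove` updates the position → `onPointerUp` releases it
- Handle interlock: start cannot exceed end - 0.5s, and end cannot go below start + 0.5s
- Start (purple) and end (pink) time labels are shown underneath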
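
The handle interlock above boils down to two clamps. The sketch below states them in Python for consistency with the other sketches in this log (the real slider is a React component); the function names are hypothetical and the 0.5 s gap is the value quoted above.

```python
MIN_GAP_SEC = 0.5  # minimum distance kept between the two handles


def move_start(new_start, end):
    """Clamp the start handle to [0, end - MIN_GAP_SEC]."""
    return max(0.0, min(new_start, end - MIN_GAP_SEC))


def move_end(new_end, start, duration):
    """Clamp the end handle to [start + MIN_GAP_SEC, duration]."""
    return min(duration, max(new_end, start + MIN_GAP_SEC))
```

Clamping each handle against the other's current value, rather than swapping them, keeps the start and end handles from ever crossing while the pointer is dragged.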
|
||||
|
||||
---
|
||||
|
||||
### VI. Video preview in clip settings (feedback #5)

**Problem**: The video in ClipTrimmer could only be viewed as a still; the selected range could not be played back.

**Fix**:

- Clicking the video area plays or pauses (Play/Pause icon overlay)
- Playback range: plays from sourceStart and stops automatically at sourceEnd
- Returns to the start when playback ends
- `video.currentTime` follows the handle while dragging (seeks to the current position to show the frame)
- A playback progress marker (white vertical line) is overlaid on the range slider track
- `preload="auto"` preloads the video so seeking is fast while dragging

---

### Files changed

#### Backend, modified

| File | Change |
|------|------|
| `run_qwen_tts.sh` | Exports the conda env bin onto PATH, fixing the missing SoX (retired) |
| `models/Qwen3-TTS/qwen_tts_server.py` | `torch.cuda.empty_cache()` after every generation, asyncio.to_thread to avoid blocking (retired) |

#### Frontend, modified

| File | Change |
|------|------|
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | Unified button layout (Play/Edit/Delete grouped on the right), script summary removed |
| `frontend/src/features/home/ui/HomePage.tsx` | Removed the MaterialSelector voiceover overlay, passes onReorderSegment |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | New HTML5 Drag & Drop reordering, new onReorderSegment prop |
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | Two-handle range slider + video playback preview + playback progress marker |
| `frontend/src/features/home/model/useHomeController.ts` | Exposes the reorderSegments method |
|
||||
|
||||
---
|
||||
|
||||
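## 📝 Saved script history + timeline drag fix: Phase 4 (Day 23)

### Overview

Adds manual saving and loading of scripts, fixes the bug where material durations did not follow a drag reorder in the timeline, and unifies the visual style of the buttons.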
|
||||
|
||||
---
|
||||
|
||||
### 一、历史文案保存与加载
|
||||
|
||||
#### 功能
|
||||
|
||||
用户可手动保存当前文案到历史列表,随时从历史中加载恢复。只有手动保存的文案才出现在历史列表中,与自动保存(`useHomePersistence`)完全独立。
|
||||
|
||||
#### UI 布局
|
||||
|
||||
```
|
||||
按钮栏: [历史文案▼] [文案提取助手] [AI多语言▼] [AI生成标题标签]
|
||||
底部栏: 128 字 [保存文案]
|
||||
```
|
||||
|
||||
- **历史文案下拉**: 展示已保存列表(名称 + 日期 + 删除按钮),点击条目加载文案,空列表显示"暂无保存的文案"
|
||||
- **保存文案按钮**: 文案为空时 disabled,点击后 `toast.success("文案已保存")`
|
||||
- **预计时长已移除**: 底部栏只保留字数 + 保存按钮
|
||||
|
||||
#### 实现
|
||||
|
||||
##### `useSavedScripts.ts`(新建)
|
||||
|
||||
```typescript
|
||||
interface SavedScript { id: string; name: string; content: string; savedAt: number }
|
||||
```
|
||||
|
||||
- localStorage key: `vigent_{storageKey}_savedScripts`
|
||||
- `saveScript(content)`: 取前 15 字符自动命名,新条目插入列表头部,**直接写入 localStorage**
|
||||
- `deleteScript(id)`: 删除指定条目,直接写入 localStorage
|
||||
- `useEffect([lsKey])`: lsKey 变化时(guest → userId)重新从 localStorage 读取
|
||||
- **不使用自动持久化 effect**,避免 storageKey 切换时空数组覆盖已有数据
|
||||
|
||||
##### 数据流
|
||||
|
||||
```
|
||||
ScriptEditor (UI)
|
||||
↑ savedScripts / onSaveScript / onLoadScript / onDeleteScript (纯 props + callbacks)
|
||||
│
|
||||
useHomeController
|
||||
├── useSavedScripts(storageKey) → { savedScripts, saveScript, deleteScript }
|
||||
└── handleSaveScript() → saveScript(text) + toast
|
||||
│
|
||||
HomePage
|
||||
└── 传递 props 到 ScriptEditor
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 二、时间轴拖拽排序 Bug 修复
|
||||
|
||||
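#### Problem

After dragging to reorder materials, each material's duration did not move with it and stayed in its original slot. For example, with material 1 (3s) + material 2 (8s + 4s loop), dragging produced material 2 (3s) + material 1 (8s + 4s loop): the duration allocation was unchanged.

#### Root cause

`reorderSegments` used a **property swap**: it copied `materialId`, `sourceStart`, `sourceEnd`, and other properties one by one between the two slots, then called `recalcPositions` to recompute positions.

#### Fix

Switched to an **array move** (splice): the whole segment object is removed from its old position and inserted at the new one. The segment object carries all of its properties (materialId, sourceStart, sourceEnd, color, and so on) and moves as a unit, after which `recalcPositions` recomputes positions.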
|
||||
|
||||
```typescript
// Before: property swap
const fromMat = { materialId: next[fromIdx].materialId, ... };
const toMat = { materialId: next[toIdx].materialId, ... };
next[fromIdx] = { ...next[fromIdx], ...toMat };
next[toIdx] = { ...next[toIdx], ...fromMat };

// After: array move
const [moved] = next.splice(fromIdx, 1);
next.splice(toIdx, 0, moved);
```
|
||||
|
||||
附带优势:3+ 素材拖拽行为从"交换"变为"插入",更符合用户直觉。
|
||||
|
||||
---
|
||||
|
||||
### 三、按钮视觉统一
|
||||
|
||||
#### 问题
|
||||
|
||||
历史文案、文案提取助手、AI多语言、AI生成标题标签 4 个按钮高度不一致,AI 按钮的文本被 `<span>` 嵌套包裹导致内部布局差异。
|
||||
|
||||
#### 修复
|
||||
|
||||
- 4 个按钮统一为 `h-7 px-2.5 text-xs rounded inline-flex items-center gap-1`(固定高度 28px)
|
||||
- 移除 AI多语言 / AI生成标题标签 按钮内多余的 `<span>` 嵌套,改为 `<>...</>` fragment
|
||||
|
||||
---
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
#### 前端新增
|
||||
|
||||
| 文件 | 说明 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/model/useSavedScripts.ts` | 历史文案 hook(localStorage 持久化) |
|
||||
|
||||
#### 前端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/ui/ScriptEditor.tsx` | 历史文案下拉 + 保存按钮 + 移除预计时长 + 按钮高度统一 |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 集成 useSavedScripts,新增 handleSaveScript |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 传递 savedScripts / handleSaveScript / deleteSavedScript 到 ScriptEditor |
|
||||
| `frontend/src/features/home/model/useTimelineEditor.ts` | reorderSegments 从属性交换改为数组移动(splice) |
|
||||
|
||||
---
|
||||
|
||||
## 🔤 字幕语言不匹配 + 视频比例错位修复 — 第五阶段 (Day 23)
|
||||
|
||||
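### Overview

Fixes two video generation bugs:
1. **Subtitle language mismatch**: Chinese voiceover + English translated script → subtitles wrongly shown in English (Whisper transcribes independently and ignores the original text)
2. **Title/subtitle aspect ratio mismatch**: after generating a video from 9:16 portrait material, titles and subtitles were laid out for 16:9 landscape

Also fixes the loss of spaces in English text in `split_word_to_chars`, found during code review.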
|
||||
|
||||
---
|
||||
|
||||
### 一、字幕用原文替换 Whisper 转录文字
|
||||
|
||||
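#### Root cause

Whisper transcribes the audio independently and completely ignores the `text` argument passed in. When the voiceover language differs from the script in the editor (for example: the user writes a Chinese script → translates it to English → generates an English voiceover → switches the script back to Chinese), Whisper "hears" English speech and outputs English subtitles.

#### Approach

Whisper is only responsible for detecting the **overall speech time range** (`first_start` → `last_end`); subtitle text always comes from the original script saved with the voiceover.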
|
||||
|
||||
#### `whisper_service.py` — `align()` 新增 `original_text` 参数
|
||||
|
||||
```python
|
||||
async def align(self, audio_path, text, output_path=None,
|
||||
language="zh", original_text=None):
|
||||
```
|
||||
|
||||
当 `original_text` 非空时:
|
||||
1. 正常运行 Whisper 转录,记录 `whisper_first_start` 和 `whisper_last_end`
|
||||
2. 将 `original_text` 传入 `split_word_to_chars()` 在总时间范围上线性分布
|
||||
3. 用 `split_segment_to_lines()` 按标点和字数断行
|
||||
4. 替换 Whisper 的转录结果
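
The linear distribution in step 2 can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `distribute_tokens` and the `word/start/end` dictionary shape are assumptions, and the real `split_word_to_chars()` may weight tokens differently.

```python
from typing import Dict, List, Union

Timed = Dict[str, Union[str, float]]


def distribute_tokens(tokens: List[str], first_start: float, last_end: float) -> List[Timed]:
    """Spread tokens evenly over the speech range detected by Whisper.

    Hypothetical helper: every token gets an equal share of the
    [first_start, last_end] window, regardless of its length.
    """
    if not tokens:
        return []
    span = max(last_end - first_start, 0.0)
    step = span / len(tokens)
    timed: List[Timed] = []
    for index, token in enumerate(tokens):
        timed.append({
            "word": token,
            "start": first_start + index * step,
            "end": first_start + (index + 1) * step,
        })
    return timed
```

Equal shares keep the sketch simple; a real implementation could weight by character count so long tokens stay on screen longer.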
|
||||
|
||||
#### `workflow.py` — 配音元数据无条件覆盖 + 传入原文
|
||||
|
||||
```python
# Before (override only when the script is empty)
if not req.text.strip():
    req.text = meta.get("text", req.text)

# After (always override with the voiceover metadata)
meta_text = meta.get("text", "")
if meta_text:
    req.text = meta_text
```
|
||||
|
||||
所有 4 处 `whisper_service.align()` 调用添加 `original_text=req.text`。
|
||||
|
||||
---
|
||||
|
||||
### 二、Remotion 动态传入视频尺寸
|
||||
|
||||
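#### Root cause

`remotion/src/Root.tsx` hardcoded `width={1280} height={720}`. Although `render.ts` detects the real dimensions with ffprobe and overrides `composition.width/height`, by the `selectComposition` stage the component had already been initialized at 1280×720, so title and subtitle positioning was based on the wrong canvas size.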
|
||||
|
||||
#### 修复
|
||||
|
||||
##### `Root.tsx` — `calculateMetadata` 从 props 读取尺寸
|
||||
|
||||
```tsx
|
||||
<Composition
|
||||
id="ViGentVideo"
|
||||
component={Video}
|
||||
durationInFrames={300}
|
||||
fps={25}
|
||||
width={1080}
|
||||
height={1920}
|
||||
calculateMetadata={async ({ props }) => ({
|
||||
width: props.width || 1080,
|
||||
height: props.height || 1920,
|
||||
})}
|
||||
defaultProps={{
|
||||
videoSrc: '',
|
||||
width: 1080,
|
||||
height: 1920,
|
||||
// ...
|
||||
}}
|
||||
/>
|
||||
```
|
||||
|
||||
默认从 1280×720 改为 1080×1920(竖屏优先),`calculateMetadata` 确保 `selectComposition` 阶段使用 ffprobe 检测的真实尺寸。
|
||||
|
||||
##### `Video.tsx` — VideoProps 新增可选 `width/height`
|
||||
|
||||
仅供 `calculateMetadata` 访问,组件渲染不引用。
|
||||
|
||||
##### `render.ts` — inputProps 统一传入视频尺寸
|
||||
|
||||
```typescript
|
||||
const inputProps = {
|
||||
videoSrc: videoFileName,
|
||||
captions,
|
||||
title: options.title,
|
||||
// ...
|
||||
width: videoWidth, // ffprobe 检测值
|
||||
height: videoHeight, // ffprobe 检测值
|
||||
};
|
||||
```
|
||||
|
||||
`selectComposition` 和 `renderMedia` 使用同一个 `inputProps`。保留显式 `composition.width/height` 覆盖作为保险。
|
||||
|
||||
---
|
||||
|
||||
### 三、代码审查修复:英文空格丢失
|
||||
|
||||
#### Problem

`split_word_to_chars` was designed to handle a single Whisper word (such as `" Hello"`). When `original_text` passes in a whole passage, the spaces in the middle hit `continue` without flushing `ascii_buffer`, so `"Hello World"` becomes `"HelloWorld"`.
|
||||
|
||||
#### 执行路径追踪
|
||||
|
||||
```
Input: "Hello World"
H,e,l,l,o → ascii_buffer = "Hello"
' ' → continue (skipped, no flush!)
W,o,r,l,d → ascii_buffer = "HelloWorld"
Result: tokens = ["HelloWorld"] ← space lost
```
|
||||
|
||||
#### Fix

On whitespace, flush `ascii_buffer` and set a `pending_space` flag so the next token gets a leading space:
|
||||
|
||||
```python
|
||||
if not char.strip():
|
||||
if ascii_buffer:
|
||||
tokens.append(ascii_buffer)
|
||||
ascii_buffer = ""
|
||||
if tokens:
|
||||
pending_space = True
|
||||
continue
|
||||
```
|
||||
|
||||
修复后:`"Hello World"` → tokens = `["Hello", " World"]` → 字幕正确显示。中文不受影响。
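
For context, a complete tokenizer built around the `pending_space` idea might look like the sketch below. It is an assumption-laden reconstruction, not the real `split_word_to_chars`: the name `tokenize_for_subtitles` is hypothetical, and it assumes ASCII characters are buffered into words while every non-ASCII character (for example a CJK character) becomes its own token.

```python
from typing import List


def tokenize_for_subtitles(text: str) -> List[str]:
    """Split text into subtitle tokens while preserving word spacing.

    Hypothetical sketch: ASCII runs are grouped into words, non-ASCII
    characters are emitted one by one, and whitespace is carried over
    as a leading space on the next token.
    """
    tokens: List[str] = []
    ascii_buffer = ""
    pending_space = False

    def emit(token: str) -> None:
        nonlocal pending_space
        if pending_space:
            token = " " + token
            pending_space = False
        tokens.append(token)

    for char in text:
        if not char.strip():  # whitespace: flush the current word
            if ascii_buffer:
                emit(ascii_buffer)
                ascii_buffer = ""
            if tokens:  # never prefix the very first token
                pending_space = True
            continue
        if char.isascii():
            ascii_buffer += char
        else:
            if ascii_buffer:
                emit(ascii_buffer)
                ascii_buffer = ""
            emit(char)

    if ascii_buffer:
        emit(ascii_buffer)
    return tokens
```

Routing every append through `emit` means the pending space is also applied when a word is flushed by a later whitespace, so inputs with three or more words keep all of their separators.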
|
||||
|
||||
---
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
#### 后端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/services/whisper_service.py` | `align()` 新增 `original_text` 参数;`split_word_to_chars` 修复英文空格丢失 |
|
||||
| `backend/app/modules/videos/workflow.py` | 配音元数据无条件覆盖 text/language;4 处 `align()` 调用传入 `original_text` |
|
||||
|
||||
#### 前端修改(Remotion)
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `remotion/src/Root.tsx` | 默认尺寸改为 1080×1920,新增 `calculateMetadata` + width/height defaultProps |
|
||||
| `remotion/src/Video.tsx` | VideoProps 新增可选 `width`/`height` |
|
||||
| `remotion/render.ts` | inputProps 统一传入 `videoWidth`/`videoHeight`,selectComposition 和 renderMedia 共用 |
|
||||
|
||||
---
|
||||
|
||||
## 🎤 参考音频自动转写 + 语速控制 — 第六阶段 (Day 23)
|
||||
|
||||
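### Overview

Resolves the ref_text mismatch in voice cloning. The old approach used a fixed frontend string as ref_text, but CosyVoice zero-shot cloning requires ref_text to match what is actually said in the reference audio; on a mismatch the model "hallucinates" extra fragments at the start of the generated audio.

**Improvement**: when a reference audio is uploaded, Whisper is called automatically to transcribe it and the result is used as ref_text. A speech rate control is added as well.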
|
||||
|
||||
---
|
||||
|
||||
### 一、Whisper 自动转写参考音频
|
||||
|
||||
#### 1.1 `whisper_service.py` — 语言自动检测
|
||||
|
||||
`transcribe()` 方法原先硬编码 `language="zh"`,改为接受可选 `language` 参数(默认 `None` = 自动检测),支持多语言参考音频。
|
||||
|
||||
#### 1.2 `ref_audios/service.py` — 上传时自动转写
|
||||
|
||||
上传流程变更:转码 WAV → 检查时长(≥1s) → 超 10s 在静音点截取 → **Whisper 自动转写** → 验证非空 → 上传。
|
||||
|
||||
```python
|
||||
try:
|
||||
transcribed = await whisper_service.transcribe(tmp_wav_path)
|
||||
if transcribed.strip():
|
||||
ref_text = transcribed.strip()
|
||||
except Exception as e:
|
||||
logger.warning(f"Auto-transcribe failed: {e}")
|
||||
|
||||
if not ref_text or not ref_text.strip():
|
||||
raise ValueError("无法识别音频内容,请确保音频包含清晰的语音")
|
||||
```
|
||||
|
||||
#### 1.3 `ref_audios/router.py` — ref_text 改为可选
|
||||
|
||||
`ref_text: str = Form("")`(不再必填),前端不再发送固定文字。
|
||||
|
||||
---
|
||||
|
||||
### 二、参考音频智能截取(10 秒上限)
|
||||
|
||||
CosyVoice 对 3-10 秒参考音频效果最好。
|
||||
|
||||
#### 2.1 静音点检测
|
||||
|
||||
使用 ffmpeg `silencedetect` 找 10 秒内最后一个静音结束点(阈值 -30dB,最短 0.3s),避免在字词中间硬切:
|
||||
|
||||
```python
|
||||
def _find_silence_cut_point(file_path, max_duration):
|
||||
# silencedetect → 解析 silence_end → 找 3s~max_duration 内最后的静音点
|
||||
# 找不到则回退到 max_duration
|
||||
```
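
The selection logic described in the comments above can be sketched as a pure function over the text that ffmpeg's `silencedetect` filter logs. This is an illustration under assumptions: `pick_cut_point` is a hypothetical name, the log is assumed to come from a command along the lines of `ffmpeg -i in.wav -af silencedetect=noise=-30dB:d=0.3 -f null -`, and the real `_find_silence_cut_point` takes a file path and runs ffmpeg itself.

```python
import re

# silencedetect logs lines such as:
#   [silencedetect @ 0x...] silence_end: 7.25 | silence_duration: 0.4
SILENCE_END_RE = re.compile(r"silence_end:\s*([0-9]+(?:\.[0-9]+)?)")


def pick_cut_point(silencedetect_log: str, max_duration: float = 10.0,
                   min_keep: float = 3.0) -> float:
    """Return the last silence end inside [min_keep, max_duration].

    Falls back to max_duration (a hard cut) when no silence ends
    inside that window.
    """
    ends = [float(match.group(1)) for match in SILENCE_END_RE.finditer(silencedetect_log)]
    in_window = [value for value in ends if min_keep <= value <= max_duration]
    return max(in_window) if in_window else max_duration
```

Keeping the parsing separate from the subprocess call makes the cut-point rule easy to test without audio files.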
|
||||
|
||||
#### 2.2 淡出处理
|
||||
|
||||
截取时末尾 0.1 秒淡出(`afade=t=out`),避免截断爆音。
|
||||
|
||||
---
|
||||
|
||||
### 三、重新识别功能(旧数据迁移)
|
||||
|
||||
#### 3.1 新增 API
|
||||
|
||||
`POST /api/ref-audios/{audio_id}/retranscribe` — 下载音频 → 超 10s 截取 → Whisper 转写 → 重新上传音频和元数据。
|
||||
|
||||
#### 3.2 前端 UI
|
||||
|
||||
- RefAudioPanel 新增 RotateCw 按钮("重新识别文字"),转写中显示 `animate-spin`
|
||||
- 旧音频 ref_text 以固定文字开头时显示 ⚠ 黄色警告
|
||||
|
||||
---
|
||||
|
||||
### 四、语速控制(CosyVoice speed 参数)
|
||||
|
||||
#### 4.1 全链路传递
|
||||
|
||||
```
|
||||
前端 GeneratedAudiosPanel (速度选择器)
|
||||
→ useHomeController (speed state + persistence)
|
||||
→ useGeneratedAudios.generateAudio(params)
|
||||
→ POST /api/generated-audios/generate { speed: 1.0 }
|
||||
→ GenerateAudioRequest.speed (Pydantic)
|
||||
→ generate_audio_task → voice_clone_service.generate_audio(speed=)
|
||||
→ _generate_once → POST /generate { speed: "1.0" }
|
||||
→ cosyvoice_server → _model.inference_zero_shot(speed=speed)
|
||||
```
|
||||
|
||||
#### 4.2 前端 UI
|
||||
|
||||
声音克隆模式下,配音列表面板标题栏"生成配音"按钮左侧显示语速下拉菜单(`语速: 正常 ▼`):
|
||||
|
||||
| 标签 | speed 值 |
|
||||
|------|----------|
|
||||
| 较慢 | 0.8 |
|
||||
| 稍慢 | 0.9 |
|
||||
| 正常 | 1.0 (默认) |
|
||||
| 稍快 | 1.1 |
|
||||
| 较快 | 1.2 |
|
||||
|
||||
语速选择持久化到 localStorage(`vigent_{storageKey}_speed`)。
|
||||
|
||||
---
|
||||
|
||||
### 五、缺少参考音频门控
|
||||
|
||||
声音克隆模式下未选参考音频时:
|
||||
- "生成配音"按钮禁用 + title 提示"请先选择参考音频"
|
||||
- 面板内显示黄色警告条"声音克隆模式需要先选择参考音频"
|
||||
|
||||
---
|
||||
|
||||
### 六、前端清理
|
||||
|
||||
- 移除 `FIXED_REF_TEXT` 常量和 `fixedRefText` prop
|
||||
- 移除"请朗读以下内容"引导区块
|
||||
- 上传提示简化为"上传任意语音样本(3-10秒),系统将自动识别内容并克隆声音"
|
||||
- 录音区备注"建议 3-10 秒,超出将自动截取"
|
||||
|
||||
---
|
||||
|
||||
### 涉及文件汇总
|
||||
|
||||
#### 后端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/services/whisper_service.py` | `transcribe()` 增加可选 `language` 参数,默认 None (自动检测) |
|
||||
| `backend/app/modules/ref_audios/service.py` | 上传自动转写 + 静音点截取 + 淡出 + retranscribe 函数 |
|
||||
| `backend/app/modules/ref_audios/router.py` | `ref_text` 改为 Form(""),新增 retranscribe 端点 |
|
||||
| `backend/app/modules/generated_audios/schemas.py` | `GenerateAudioRequest` 新增 `speed: float = 1.0` |
|
||||
| `backend/app/modules/generated_audios/service.py` | 传递 `req.speed` 到 voice_clone_service |
|
||||
| `backend/app/services/voice_clone_service.py` | `generate_audio()` / `_generate_once()` 接受并传递 speed |
|
||||
| `models/CosyVoice/cosyvoice_server.py` | `/generate` 端点接受 `speed` 参数,传递到 `inference_zero_shot(speed=)` |
|
||||
|
||||
#### 前端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 新增 speed state,移除 FIXED_REF_TEXT,handleGenerateAudio 传 speed |
|
||||
| `frontend/src/features/home/model/useHomePersistence.ts` | 新增 speed 持久化 |
|
||||
| `frontend/src/features/home/model/useRefAudios.ts` | 移除 fixedRefText,新增 retranscribe |
|
||||
| `frontend/src/features/home/model/useGeneratedAudios.ts` | generateAudio params 新增 speed |
|
||||
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | 新增语速选择器 + 缺少参考音频门控 |
|
||||
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 移除朗读引导,新增重新识别按钮 + ⚠ 警告 |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 传递 speed/setSpeed/ttsMode 到 GeneratedAudiosPanel |
|
||||
185
Docs/DevLogs/Day24.md
Normal file
@@ -0,0 +1,185 @@
|
||||
## 🔧 鉴权到期治理 + 多素材时间轴稳定性修复 (Day 24)
|
||||
|
||||
### 概述
|
||||
|
||||
本日主要完成两条主线:
|
||||
|
||||
1. **账号与鉴权治理**:会员到期改为请求时自动失效(登录/鉴权接口触发),并统一返回续费提示。
|
||||
2. **视频生成稳定性**:围绕多素材时间轴、截取语义、拼接边界冻结、画面比例与字幕标题适配进行一轮端到端修复。
|
||||
|
||||
---
|
||||
|
||||
## 🔐 会员到期请求时失效 — 第一阶段 (Day 24)
|
||||
|
||||
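### Goal

Avoid depending on a scheduled job: expiry is evaluated, and the account deactivated, as soon as the user logs in or calls a protected endpoint.

### Behavior changes

- Expiry is determined from `users.expires_at`.
- Once a user is found to be expired:
  - `is_active` is automatically set to `false`
  - all of that user's sessions are deleted
  - the request returns `403` with the message `会员已到期,请续费` ("Membership has expired, please renew")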
|
||||
|
||||
### 实现点
|
||||
|
||||
- `users.py` 新增 `deactivate_user_if_expired()`,并补充 `_parse_expires_at()` 统一时区解析。
|
||||
- `deps.py` 在 `get_current_user` / `get_current_user_optional` 中统一接入到期检查。
|
||||
- `auth/router.py` 在登录路径增加到期停用逻辑;`/api/auth/me` 统一走 `Depends(get_current_user)`。
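
The request-time expiry check can be sketched as below. This is a simplified illustration, not the project's code: `parse_expires_at` and `is_expired` are hypothetical stand-ins for `_parse_expires_at()` and the check inside `deactivate_user_if_expired()`, and it assumes `expires_at` is an ISO 8601 string, that naive timestamps are in UTC, and that a missing value means "never expires".

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expires_at(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp into an aware UTC datetime."""
    if not value:
        return None
    text = value.strip()
    if text.endswith("Z"):  # older Python versions reject a trailing "Z"
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:  # assumption: naive values are UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_expired(expires_at: Optional[str], now: Optional[datetime] = None) -> bool:
    """Return True once the membership deadline has passed."""
    deadline = parse_expires_at(expires_at)
    if deadline is None:  # assumption: no deadline means no expiry
        return False
    current = now or datetime.now(timezone.utc)
    return current >= deadline
```

A caller would run this check before trusting the session, then deactivate the user, delete the sessions, and return `403` when it reports expiry.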
|
||||
|
||||
---
|
||||
|
||||
## 🖼️ 画面比例控制 + 字幕标题适配 — 第二阶段 (Day 24)
|
||||
|
||||
### 2.1 输出画面比例可配置
|
||||
|
||||
- 时间轴顶部新增“画面比例”下拉:`9:16` / `16:9`。
|
||||
- 默认值 `9:16`,并持久化到 localStorage。
|
||||
- 生成请求携带 `output_aspect_ratio`,后端在单素材与多素材流程中统一按目标分辨率处理。
|
||||
|
||||
### 2.2 标题/字幕在窄屏画布防溢出
|
||||
|
||||
为减少“预览正常、成片溢出”的差异,统一了预览与渲染策略:
|
||||
|
||||
- 根据 composition 宽度进行响应式缩放。
|
||||
- 开启可换行:`white-space: normal` + `word-break` + `overflow-wrap`。
|
||||
- 描边、字距、上下边距同步按比例缩放。
|
||||
|
||||
### 2.3 片头标题显示模式(短暂/常驻)
|
||||
|
||||
- 在“标题与字幕”面板的“片头标题”行尾新增下拉,支持:`短暂显示` / `常驻显示`。
|
||||
- 默认模式为 `短暂显示`,短暂模式默认时长为 4 秒。
|
||||
- 用户选择会持久化到 localStorage,刷新后保持上次配置。
|
||||
- 生成请求新增 `title_display_mode`,短暂模式透传 `title_duration=4.0`。
|
||||
- Remotion 端到端支持该参数:
|
||||
- `short`:标题在设定时长后淡出并结束渲染;
|
||||
- `persistent`:标题全程常驻(保留淡入动画,不执行淡出)。
|
||||
|
||||
---
|
||||
|
||||
## 🎥 方向归一化 + 多素材拼接稳定性 — 第三阶段 (Day 24)
|
||||
|
||||
### 3.1 MOV 旋转元数据导致横竖识别错误
|
||||
|
||||
问题场景:编码分辨率是横屏,但依赖 rotation side-data 才能正确显示为竖屏(常见于手机 MOV)。
|
||||
|
||||
修复方案:
|
||||
|
||||
- `get_video_metadata()` 扩展返回 `rotation/effective_width/effective_height`。
|
||||
- 新增 `normalize_orientation()`,在流程前对带旋转元数据素材做物理方向归一化。
|
||||
- 单素材和多素材下载后统一执行方向归一化,再做分辨率决策。
|
||||
|
||||
### 3.2 多素材“只看到第一段”与边界冻结
|
||||
|
||||
针对拼接可靠性补了两类保护:
|
||||
|
||||
- **分配保护**:`custom_assignments` 与素材数量不一致时,后端回退自动分配,避免异常输入导致仅首段生效。
|
||||
- **编码一致性**:
|
||||
- 片段准备阶段统一重编码;
|
||||
- concat 阶段不再走拷贝;
|
||||
- 进一步统一为 `25fps + CFR`,并在 concat 增加 `+genpts`,降低段边界时间基不连续导致的“画面冻结口型还动”风险。
|
||||
|
||||
---
|
||||
|
||||
## ⏱️ 时间轴截取语义对齐修复 — 第四阶段 (Day 24)
|
||||
|
||||
### 背景
|
||||
|
||||
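The intended timeline semantics are:

- each segment can set `sourceStart/sourceEnd`;
- when the total duration exceeds the audio, only the visible segments are kept and the last one is trimmed to the end of the audio;
- when the total duration falls short, the last visible segment loops to fill the gap.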
|
||||
|
||||
本日将前后端对齐到这一语义。
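
These rules can be expressed as a small placement function. The sketch below is illustrative only: `fit_segments` and its output shape are assumptions, and the real logic lives across `recalcPositions()` on the frontend and `prepare_segment()` on the backend.

```python
from typing import Dict, List, Union

Placed = Dict[str, Union[int, float, bool]]


def fit_segments(durations: List[float], audio_duration: float) -> List[Placed]:
    """Place segments on the timeline so they exactly cover the audio.

    Segments that would start after the audio ends are dropped, the
    last visible one is trimmed to the audio end, and if the segments
    fall short the last visible one is extended by looping.
    """
    placed: List[Placed] = []
    cursor = 0.0
    for index, duration in enumerate(durations):
        if cursor >= audio_duration:
            break  # this segment is not visible on the timeline
        play = min(duration, audio_duration - cursor)
        placed.append({"index": index, "start": cursor, "duration": play, "looped": False})
        cursor += play
    if placed and cursor < audio_duration:
        last = placed[-1]
        last["duration"] = float(last["duration"]) + (audio_duration - cursor)
        last["looped"] = True
    return placed
```

With durations of 3s and 8s against a 15s voiceover, the second segment plays its 8s and loops for a further 4s, for 12s in total.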
|
||||
|
||||
### 4.1 `source_end` 全链路打通
|
||||
|
||||
此前仅传 `source_start`,导致后端无法准确知道“截到哪里”。
|
||||
|
||||
本次改动:
|
||||
|
||||
- 前端 `toCustomAssignments()` 增加可选 `source_end`。
|
||||
- 后端 `CustomAssignment` schema 增加 `source_end`。
|
||||
- workflow 将 `source_end` 透传到 `prepare_segment()`(单素材/多素材均支持)。
|
||||
- `prepare_segment()` 增加 `source_end` 参数,按 `[source_start, source_end)` 计算可用片段,并在需要循环时先裁剪再循环,避免循环范围错位。
|
||||
|
||||
### 4.2 时间轴有效时长计算修复
|
||||
|
||||
Fixes the wrong effective duration when `sourceStart > 0` and `sourceEnd = 0` (open end):

- the old logic used the full material duration;
- the new logic uses `materialDuration - sourceStart`.
|
||||
|
||||
该修复同时用于:
|
||||
|
||||
- `recalcPositions()` 的段时长计算;
|
||||
- TimelineEditor 中“循环补足”可视化比例计算。
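
The corrected rule fits in a few lines. The sketch below assumes `source_end = 0` means "play to the end of the material"; the function name `effective_duration` is hypothetical, and the project implements this in TypeScript inside `useTimelineEditor`.

```python
def effective_duration(material_duration: float, source_start: float = 0.0,
                       source_end: float = 0.0) -> float:
    """Length of the clip actually used from a material.

    A source_end of 0 is treated as an open end, so the clip runs
    from source_start to the end of the material.
    """
    end = source_end if source_end > 0 else material_duration
    return max(end - source_start, 0.0)
```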
|
||||
|
||||
### 4.3 可见段分配优先级修复
|
||||
|
||||
修复“可见段数 < 已选素材数时,custom_assignments 被丢弃回退自动分配”的问题:
|
||||
|
||||
- 生成请求优先以时间轴可见段的 `assignments` 为准;
|
||||
- 超出时间轴的素材不参与本次生成。
|
||||
|
||||
### 4.4 单素材截取触发条件补齐
|
||||
|
||||
单素材模式下,若只改了终点(`sourceEnd > 0`)也会发送 `custom_assignments`,确保截取生效。
|
||||
|
||||
---
|
||||
|
||||
## 🧭 页面交互与体验细节 — 第五阶段 (Day 24)
|
||||
|
||||
- 页面刷新后自动回到顶部,避免从历史滚动位置进入页面。
|
||||
- 素材列表与历史视频列表滚动增加“跳过首次自动滚动”保护,减少恢复状态时页面跳动。
|
||||
- 时间轴比例区移除多余文案,保持信息简洁。
|
||||
|
||||
---
|
||||
|
||||
## 涉及文件汇总
|
||||
|
||||
### 后端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/repositories/users.py` | 新增 `deactivate_user_if_expired()` 与 `_parse_expires_at()` |
|
||||
| `backend/app/core/deps.py` | `get_current_user` / `get_current_user_optional` 接入到期失效检查 |
|
||||
| `backend/app/modules/auth/router.py` | 登录时到期停用 + `/api/auth/me` 统一鉴权依赖 |
|
||||
| `backend/app/modules/videos/schemas.py` | `CustomAssignment` 新增 `source_end`;保留 `output_aspect_ratio` |
|
||||
| `backend/app/modules/videos/workflow.py` | 多素材/单素材透传 `source_end`;多素材 prepare/concat 统一 25fps;标题显示模式参数透传 Remotion |
|
||||
| `backend/app/services/video_service.py` | 旋转元数据解析与方向归一化;`prepare_segment` 支持 `source_end/target_fps`;concat 强制 CFR + `+genpts` |
|
||||
| `backend/app/services/remotion_service.py` | render 支持 `title_display_mode/title_duration` 并传递到 render.ts |
|
||||
|
||||
### 前端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/model/useTimelineEditor.ts` | `CustomAssignment` 新增 `source_end`;修复 sourceStart 开放终点时长计算 |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 多素材以可见 assignments 为准发送;单素材截取触发条件补齐 |
|
||||
| `frontend/src/features/home/ui/TimelineEditor.tsx` | 画面比例下拉;循环比例按截取后有效时长计算 |
|
||||
| `frontend/src/features/home/model/useHomePersistence.ts` | `outputAspectRatio` 与 `titleDisplayMode` 持久化 |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 页面进入滚动到顶部;ClipTrimmer/Timeline 交互保持一致 |
|
||||
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | 标题/字幕样式预览与成片渲染策略对齐 |
|
||||
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 标题行新增“短暂显示/常驻显示”下拉 |
|
||||
|
||||
### Remotion 修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `remotion/src/components/Title.tsx` | 标题响应式缩放与自动换行;新增短暂/常驻显示模式控制 |
|
||||
| `remotion/src/components/Subtitles.tsx` | 字幕响应式缩放与自动换行,减少预览/成片差异 |
|
||||
| `remotion/src/Video.tsx` | 新增 `titleDisplayMode` 透传到标题组件 |
|
||||
| `remotion/src/Root.tsx` | 默认 props 增加 `titleDisplayMode='short'` 与 `titleDuration=4` |
|
||||
| `remotion/render.ts` | CLI 参数新增 `--titleDisplayMode`,inputProps 增加 `titleDisplayMode` |
|
||||
|
||||
---
|
||||
|
||||
## 验证记录
|
||||
|
||||
- 后端语法检查:`python -m py_compile backend/app/modules/videos/schemas.py backend/app/modules/videos/workflow.py backend/app/services/video_service.py backend/app/services/remotion_service.py`
|
||||
- 前端类型检查:`npx tsc --noEmit`
|
||||
- 前端 ESLint:`npx eslint src/features/home/model/useHomeController.ts src/features/home/model/useHomePersistence.ts src/features/home/ui/HomePage.tsx src/features/home/ui/TitleSubtitlePanel.tsx`
|
||||
- Remotion 渲染脚本构建:`npm run build:render`
|
||||
254
Docs/DevLogs/Day25.md
Normal file
@@ -0,0 +1,254 @@
|
||||
## 🔧 文案提取助手修复 — 抖音链接无法提取文案 (Day 25)
|
||||
|
||||
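### Overview

The script extraction assistant could not extract a script after a Douyin link was pasted: yt-dlp failed with `Fresh cookies are needed`, and the manual fallback also broke because Douyin's page structure had changed. This day's work delivers a complete fix and removes the `DOUYIN_COOKIE` setting that is no longer needed.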
|
||||
|
||||
---
|
||||
|
||||
## 🐛 问题诊断
|
||||
|
||||
### Error chain

1. **yt-dlp failure**: `ERROR: [Douyin] Fresh cookies (not necessarily logged in) are needed`
   - yt-dlp version `2025.12.08` was outdated
   - The Douyin API `aweme/v1/web/aweme/detail/` requires signed cookies (`s_v_web_id` and others); even upgrading yt-dlp to the latest version and passing cookies does not solve it, which is a known yt-dlp issue
2. **Manual fallback failure**: `Could not find RENDER_DATA in page`
   - The old approach went through the desktop user profile page + `modal_id`; Douyin's SSR no longer returns `videoDetail` data
3. **`DOUYIN_COOKIE` in `.env`**: timestamped December 2024, long expired
|
||||
|
||||
---
|
||||
|
||||
## ✅ 修复方案:移动端分享页 + 自动获取 ttwid
|
||||
|
||||
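### Core idea

Stop relying on yt-dlp to download Douyin videos and on manually maintained cookies. Instead:

1. Automatically fetch a fresh `ttwid` from ByteDance's public API (an anonymous token, not tied to an account)
2. Use the `ttwid` to request the mobile share page `m.douyin.com/share/video/{id}`
3. Extract the `play_addr` playback URL from the JSON embedded in the page and download it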
|
||||
|
||||
### 关键代码(`_download_douyin_manual` 重写)
|
||||
|
||||
```python
|
||||
# 1. 获取新鲜 ttwid
|
||||
ttwid_resp = await client.post(
|
||||
"https://ttwid.bytedance.com/ttwid/union/register/",
|
||||
json={"region": "cn", "aid": 6383, "service": "www.douyin.com", ...}
|
||||
)
|
||||
ttwid = ttwid_resp.cookies.get("ttwid", "")
|
||||
|
||||
# 2. 访问移动端分享页
|
||||
page_resp = await client.get(
|
||||
f"https://m.douyin.com/share/video/{video_id}",
|
||||
headers={"cookie": f"ttwid={ttwid}", ...}
|
||||
)
|
||||
|
||||
# 3. 提取 play_addr
|
||||
addr_match = re.search(r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"', page_text)
|
||||
video_url = addr_match.group(2).replace(r"\u002F", "/")
|
||||
```
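
Step 3 above calls `addr_match.group(2)` directly, which raises `AttributeError` if the page layout changes and the pattern no longer matches. A guarded version of the extraction might look like this sketch; `extract_play_url` is a hypothetical helper name, and the sample page text in any test is synthetic, not real Douyin output.

```python
import re
from typing import Optional

# Same pattern as in the snippet above, compiled once.
PLAY_ADDR_RE = re.compile(r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"')


def extract_play_url(page_text: str) -> Optional[str]:
    """Return the first play_addr URL embedded in the share page, if any.

    The embedded JSON escapes "/" as the literal six characters
    backslash-u-0-0-2-F, so they are replaced after matching.
    """
    match = PLAY_ADDR_RE.search(page_text)
    if match is None:
        return None  # layout changed or the video is unavailable
    return match.group(2).replace(r"\u002F", "/")
```

Returning `None` lets the caller raise a clear "could not find play_addr" error instead of failing on a missing match object.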
|
||||
|
||||
### 优势
|
||||
|
||||
- 不再依赖手动维护的 `DOUYIN_COOKIE`,ttwid 每次请求自动获取
|
||||
- 不受 yt-dlp 对抖音支持状况影响
|
||||
- 所有用户通用,不绑定特定账号
|
||||
|
||||
---
|
||||
|
||||
## 🧹 清理 DOUYIN_COOKIE 配置
|
||||
|
||||
`DOUYIN_COOKIE` 仅用于文案提取,新方案不再需要,已从以下位置删除:
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/.env` | 删除 `DOUYIN_COOKIE` 配置项及注释 |
|
||||
| `backend/app/core/config.py` | 删除 `DOUYIN_COOKIE: str = ""` 字段定义 |
|
||||
| `backend/app/modules/tools/service.py` | 删除 yt-dlp 传 cookie 逻辑和 `_write_netscape_cookies` 辅助函数 |
|
||||
|
||||
---
|
||||
|
||||
## 🔤 前端文案修正
|
||||
|
||||
将文案提取界面中的"AI 洗稿结果"改为"AI 改写结果"。
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | `AI 洗稿结果` → `AI 改写结果` |
|
||||
| `backend/app/modules/tools/service.py` | 注释中"洗稿"→"改写" |
|
||||
| `backend/app/services/glm_service.py` | docstring 中"洗稿"→"改写文案" |
|
||||
|
||||
---
|
||||
|
||||
## 📦 其他变更
|
||||
|
||||
- **yt-dlp 升级**:`2025.12.08` → `2026.2.21`
|
||||
- **yt-dlp 初始化修正**:改为 `YoutubeDL(ydl_opts)` 直接传参初始化(原先空初始化后 update params 不生效)
|
||||
- **User-Agent 更新**:yt-dlp 中 `Chrome/91` → `Chrome/131`
|
||||
|
||||
---
|
||||
|
||||
## 涉及文件汇总
|
||||
|
||||
### 后端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/modules/tools/service.py` | 重写 `_download_douyin_manual`(移动端分享页方案);修正 yt-dlp 初始化;清理 cookie 相关代码;注释改写 |
|
||||
| `backend/app/services/glm_service.py` | docstring "洗稿" → "改写文案" |
|
||||
| `backend/app/core/config.py` | 删除 `DOUYIN_COOKIE` 字段 |
|
||||
| `backend/.env` | 删除 `DOUYIN_COOKIE` 配置 |
|
||||
| `backend/requirements.txt` | yt-dlp 版本升级 |
|
||||
|
||||
### 前端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | "AI 洗稿结果" → "AI 改写结果" |
|
||||
|
||||
---
|
||||
|
||||
## ✏️ AI 智能改写 — 自定义提示词功能
|
||||
|
||||
### 概述
|
||||
|
||||
文案提取助手的"AI 智能改写"原先使用硬编码 prompt,用户无法定制改写风格。本次在 checkbox 右侧新增"自定义提示词"折叠区域,用户可编辑自定义 prompt,持久化到 localStorage,后端按需替换默认 prompt。
|
||||
|
||||
### 后端修改
|
||||
|
||||
**路由层** (`router.py`):`extract_script_tool` 新增可选 Form 参数 `custom_prompt: Optional[str] = Form(None)`,透传给 service。
|
||||
|
||||
**服务层** (`service.py`):`extract_script()` 签名新增 `custom_prompt`,透传给 `glm_service.rewrite_script(script, custom_prompt)`。
|
||||
|
||||
**AI 层** (`glm_service.py`):`rewrite_script(self, text, custom_prompt=None)`,若 `custom_prompt` 有值则用自定义 prompt + 原文拼接,否则保持原有默认 prompt。
|
||||
|
||||
```python
|
||||
if custom_prompt and custom_prompt.strip():
|
||||
prompt = f"""{custom_prompt.strip()}
|
||||
|
||||
原始文案:
|
||||
{text}"""
|
||||
else:
|
||||
prompt = f"""请将以下视频文案进行改写。...(原有默认)"""
|
||||
```
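
The branching above can be isolated into a small pure function for testing. In this sketch `build_rewrite_prompt` is a hypothetical name and `PLACEHOLDER_DEFAULT_INSTRUCTION` is a stand-in string, not the project's actual default prompt (which is elided in this log); only the custom-prompt branch mirrors the snippet.

```python
from typing import Optional

# Stand-in only: the real default prompt is not shown in this log.
PLACEHOLDER_DEFAULT_INSTRUCTION = "Rewrite the following video script."


def build_rewrite_prompt(text: str, custom_prompt: Optional[str] = None) -> str:
    """Combine the instruction and the original script into one prompt.

    A blank or whitespace-only custom prompt falls back to the default
    instruction, matching the "leave empty to use the default" UI hint.
    """
    if custom_prompt and custom_prompt.strip():
        instruction = custom_prompt.strip()
    else:
        instruction = PLACEHOLDER_DEFAULT_INSTRUCTION
    return f"{instruction}\n\n原始文案:\n{text}"
```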
|
||||
|
||||
### 前端修改
|
||||
|
||||
**Hook** (`useScriptExtraction.ts`):
|
||||
- 新增 `customPrompt` / `showCustomPrompt` 状态
|
||||
- 初始值从 `localStorage.getItem("vigent_rewriteCustomPrompt")` 恢复
|
||||
- `customPrompt` 变化时防抖 300ms 保存到 localStorage
|
||||
- `handleExtract()` 中若 `doRewrite && customPrompt.trim()` 有值,追加 `formData.append("custom_prompt", ...)`
|
||||
- modal 重置时不清空 customPrompt(持久化偏好)
|
||||
|
||||
**UI** (`ScriptExtractionModal.tsx`):
|
||||
- checkbox 同行右侧新增"自定义提示词 ▼"按钮(仅 `doRewrite` 时显示)
|
||||
- 点击展开 textarea 编辑区域,底部提示"留空则使用默认提示词"
|
||||
- 取消勾选 AI 智能改写时,自定义提示词区域自动隐藏
|
||||
|
||||
### 涉及文件
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/modules/tools/router.py` | 新增 `custom_prompt` Form 参数 |
|
||||
| `backend/app/modules/tools/service.py` | `extract_script()` 透传 `custom_prompt` |
|
||||
| `backend/app/services/glm_service.py` | `rewrite_script()` 支持自定义 prompt |
|
||||
| `frontend/.../useScriptExtraction.ts` | 新增状态、localStorage 持久化、FormData 传参 |
|
||||
| `frontend/.../ScriptExtractionModal.tsx` | UI 按钮 + 展开 textarea |
|
||||
|
||||
### 验证
|
||||
|
||||
- 后端 `python -m py_compile` 三个文件通过
|
||||
- 前端 `npx tsc --noEmit` 通过
|
||||
|
||||
---
|
||||
|
||||
## 🐛 SSR 构建修复 — localStorage is not defined
|
||||
|
||||
### Problem

`npm run build` failed with `ReferenceError: localStorage is not defined`, because the `useState` initializer in `useScriptExtraction.ts` also runs in the SSR (Node.js) environment, where `localStorage` does not exist.

### Fix

Guard the `useState` initializer with `typeof window !== "undefined"`:
|
||||
|
||||
```typescript
|
||||
const [customPrompt, setCustomPrompt] = useState(
|
||||
() => typeof window !== "undefined" ? localStorage.getItem(CUSTOM_PROMPT_KEY) || "" : ""
|
||||
);
|
||||
```
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/.../useScriptExtraction.ts` | `useState` 初始化增加 SSR 安全守卫 |
|
||||
|
||||
---
|
||||
|
||||
## 🎬 片头副标题功能
|
||||
|
||||
### 概述
|
||||
|
||||
新增片头副标题(secondary_title),显示在主标题下方,用于补充说明或悬念引导。副标题有独立的样式配置(字体、字号、颜色等),可由 AI 同时生成,20 字限制,仅在视频画面中显示,不参与发布标题。
|
||||
|
||||
命名约定:后端 `secondary_title`(snake_case),前端 `videoSecondaryTitle`(camelCase),用户界面"片头副标题"。
|
||||
|
||||
---
|
||||
|
||||
### 后端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/modules/videos/schemas.py` | `GenerateRequest` 新增 4 个可选字段:`secondary_title`、`secondary_title_style_id`、`secondary_title_font_size`、`secondary_title_top_margin` |
|
||||
| `backend/app/services/glm_service.py` | AI prompt 增加副标题生成要求(不超过20字),JSON 格式新增 `secondary_title` 字段 |
|
||||
| `backend/app/modules/ai/router.py` | `GenerateMetaResponse` 增加 `secondary_title: str = ""`,endpoint 返回时取 `result.get("secondary_title", "")` |
|
||||
| `backend/app/modules/videos/workflow.py` | `use_remotion` 条件增加 `or req.secondary_title`;副标题样式解析复用 `get_style("title", ...)`;字号/间距覆盖;`prepare_style_for_remotion` 处理副标题字体;`remotion_service.render()` 传入 `secondary_title` + `secondary_title_style` |
|
||||
| `backend/app/services/remotion_service.py` | `render()` 新增 `secondary_title` 和 `secondary_title_style` 参数,构建 CLI 参数 `--secondaryTitle` 和 `--secondaryTitleStyle` |
|
||||
|
||||
### Remotion 修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `remotion/render.ts` | `RenderOptions` 新增 `secondaryTitle?` + `secondaryTitleStyle?`;`parseArgs()` 新增 switch case;`inputProps` 新增两个字段 |
|
||||
| `remotion/src/components/Title.tsx` | `TitleProps` 新增 `secondaryTitle?` 和 `secondaryTitleStyle?`;`AbsoluteFill` 改为 `flexDirection: 'column'` 垂直堆叠;主标题 `<h1>` 后增加副标题 `<h2>`,独立样式(默认字号 48px、字重 700),共享淡入淡出动画;副标题字体使用独立 `@font-face`(`SecondaryTitleFont`)避免与主标题冲突 |
|
||||
| `remotion/src/Video.tsx` | `VideoProps` 新增 `secondaryTitle?` + `secondaryTitleStyle?`;传递给 `<Title>` 组件;渲染条件改为 `{(title \|\| secondaryTitle) && ...}` |
|
||||
| `remotion/src/Root.tsx` | `defaultProps` 新增 `secondaryTitle: undefined` + `secondaryTitleStyle: undefined` |
|
||||
|
||||
### 前端修改
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `frontend/src/shared/lib/title.ts` | 新增 `SECONDARY_TITLE_MAX_LENGTH = 20` 和 `clampSecondaryTitle()` |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 新增状态 `videoSecondaryTitle`、`selectedSecondaryTitleStyleId`、`secondaryTitleFontSize`、`secondaryTitleTopMargin`、`secondaryTitleSizeLocked`;新建 `secondaryTitleInput = useTitleInput({ maxLength: 20 })`(不 sync 到发布页);`handleGenerateMeta()` 接收并填充 `secondary_title`;`handleGenerate()` 构建 payload 增加副标题字段;return 暴露所有新状态 |
|
||||
| `frontend/src/features/home/model/useHomePersistence.ts` | 新增 localStorage key:`secondaryTitle`、`secondaryTitleStyle`、`secondaryTitleFontSize`、`secondaryTitleTopMargin`;对应的恢复和保存 effect |
|
||||
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | Props 新增副标题相关;主标题输入框下方添加"片头副标题(限制20个字)"输入框;副标题样式选择器(复用 titleStyles 预设)、字号滑块(30-100px)、间距滑块(0-100px) |
|
||||
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | 标题预览改为 flex column 布局;主标题下方增加副标题预览行,独立样式渲染 |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 从 `useHomeController` 解构新状态,传给 `TitleSubtitlePanel` |
|
||||
|
||||
---
|
||||
|
||||
## 🐛 参考音频上传 — 中文文件名 InvalidKey 修复
|
||||
|
||||
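### Problem

Uploading a reference audio file with a Chinese name (for example "我的声音.wav") caused Supabase Storage to return `InvalidKey`, because the storage path used the original Chinese filename directly.

### Fix

Added a `sanitize_filename()` function in `ref_audios/service.py` that cleans the filename used in the storage path down to ASCII-safe characters (only `A-Za-z0-9._-`):

- NFKD normalization → drop non-ASCII characters → replace illegal characters with `_`
- If a purely Chinese or emoji name is empty after cleaning, fall back to an MD5 hash (for example `audio_e924b1193007`)
- Filenames are capped at 50 characters
- The original Chinese filename is kept in metadata as the display name, so what the frontend shows is unaffected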
|
||||
|
||||
```
|
||||
修复前: cbbe.../1771915755_我的声音.wav → InvalidKey
|
||||
修复后: cbbe.../1771915755_audio_xxxxxxxx.wav → 上传成功
|
||||
```
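
A function with the behavior listed above could be written as in the sketch below. It is a reconstruction from the description, not the project's implementation: details such as hashing the full original name, taking 12 hex digits, and stripping leading or trailing `.` and `_` from the stem are assumptions.

```python
import hashlib
import os
import re
import unicodedata

_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def _to_ascii(part: str) -> str:
    """NFKD-normalize, drop non-ASCII, replace other unsafe characters."""
    normalized = unicodedata.normalize("NFKD", part)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    return _UNSAFE_CHARS.sub("_", ascii_only)


def sanitize_filename(filename: str, max_length: int = 50) -> str:
    """Return a storage-safe name limited to A-Za-z0-9._- characters."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    safe_stem = _to_ascii(stem).strip("._")
    safe_ext = _to_ascii(ext)
    if not safe_stem:
        # Nothing survived cleaning (e.g. a purely Chinese name):
        # fall back to a short, stable hash of the original name.
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()[:12]
        safe_stem = f"audio_{digest}"
    room = max(max_length - len(safe_ext), 1)
    return safe_stem[:room] + safe_ext
```

MD5 is used here only to derive a stable, short identifier, not for any security purpose.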
|
||||
|
||||
| 文件 | 变更 |
|
||||
|------|------|
|
||||
| `backend/app/modules/ref_audios/service.py` | 新增 `sanitize_filename()` 函数,上传路径使用清洗后文件名 |
|
||||
239
Docs/DevLogs/Day26.md
Normal file
@@ -0,0 +1,239 @@
|
||||
## 🎨 前端优化:板块合并 + 序号标题 + UI 精细化 (Day 26)
|
||||
|
||||
### 概述
|
||||
|
||||
首页原有 9 个独立板块(左栏 7 个 + 右栏 2 个),每个都有自己的卡片容器和标题,视觉碎片化严重。本次将相关板块合并为 5 个主板块,添加中文序号(一~十),移除 emoji 图标,并对多个子组件的布局和交互细节进行优化。
|
||||
|
||||
---
|
||||
|
||||
## ✅ 改动内容
|
||||
|
||||
### 1. 板块合并方案
|
||||
|
||||
**左栏(4 个主板块 + 2 个独立区域):**
|
||||
|
||||
| 序号 | 板块名 | 子板块 | 原组件 |
|
||||
|------|--------|--------|--------|
|
||||
| 一 | 文案提取与编辑 | — | ScriptEditor |
|
||||
| 二 | 标题与字幕 | — | TitleSubtitlePanel |
|
||||
| 三 | 配音 | 配音方式 / 配音列表 | VoiceSelector + GeneratedAudiosPanel |
|
||||
| 四 | 素材编辑 | 视频素材 / 时间轴编辑 | MaterialSelector + TimelineEditor |
|
||||
| 五 | 背景音乐 | — | BgmPanel |
|
||||
| — | 生成按钮 | — | GenerateActionBar(不编号) |
|
||||
|
||||
**右栏(1 个主板块):**
|
||||
|
||||
| 序号 | 板块名 | 子板块 | 原组件 |
|
||||
|------|--------|--------|--------|
|
||||
| 六 | 作品 | 作品列表 / 作品预览 | HistoryList + PreviewPanel |
|
||||
|
||||
**发布页(/publish):**
|
||||
|
||||
| 序号 | 板块名 |
|
||||
|------|--------|
|
||||
| 七 | 平台账号 |
|
||||
| 八 | 选择发布作品 |
|
||||
| 九 | 发布信息 |
|
||||
| 十 | 选择发布平台 |
|
||||
|
||||
### 2. embedded 模式
|
||||
|
||||
6 个组件新增 `embedded?: boolean` prop(默认 `false`):
|
||||
|
||||
- `VoiceSelector` — embedded 时不渲染外层卡片和主标题
|
||||
- `GeneratedAudiosPanel` — embedded 时两行布局:第 1 行(语速+生成配音右对齐)、第 2 行(配音列表+刷新)
|
||||
- `MaterialSelector` — embedded 时自渲染 h3 子标题"视频素材"+ 上传/刷新按钮同行
|
||||
- `TimelineEditor` — embedded 时自渲染 h3 子标题"时间轴编辑"+ 画面比例/播放控件同行
|
||||
- `PreviewPanel` — embedded 时不渲染外层卡片和标题
|
||||
- `HistoryList` — embedded 时不渲染外层卡片和标题(刷新按钮由 HomePage 提供)
|
||||
|
||||
### 3. 序号标题 + emoji 移除
|
||||
|
||||
所有编号板块移除 emoji 图标,使用纯中文序号:
|
||||
|
||||
- ScriptEditor: `✍️ 文案提取与编辑` → `一、文案提取与编辑`
|
||||
- TitleSubtitlePanel: `🎬 标题与字幕` → `二、标题与字幕`
|
||||
- BgmPanel: `🎵 背景音乐` → `五、背景音乐`
|
||||
- HomePage 右栏: `五、作品` → `六、作品`
|
||||
- PublishPage: `👤 平台账号` → `七、平台账号`、`📹 选择发布作品` → `八、选择发布作品`、`✍️ 发布信息` → `九、发布信息`、`📱 选择发布平台` → `十、选择发布平台`
|
||||
|
||||
### 4. 子标题与分隔样式
|
||||
|
||||
- **主标题**: `text-base sm:text-lg font-semibold text-white`
|
||||
- **子标题**: `text-sm font-medium text-gray-400`
|
||||
- **分隔线**: `<div className="border-t border-white/10 my-4" />`
|
||||
|
||||
### 5. 配音列表布局优化
|
||||
|
||||
GeneratedAudiosPanel embedded 模式下采用两行布局:
|
||||
- **第 1 行**:语速下拉 + 生成配音按钮(右对齐,`flex justify-end`)
|
||||
- **第 2 行**:`<h3>配音列表</h3>` + 刷新按钮(两端对齐)
|
||||
- 非 embedded 模式保持原单行布局
|
||||
|
||||
### 6. TitleSubtitlePanel 下拉对齐
|
||||
|
||||
- 标题样式/副标题样式/字幕样式三行标签统一 `w-20`(固定 80px),确保下拉菜单垂直对齐
|
||||
- 下拉菜单宽度 `w-1/3 min-w-[100px]`,避免过宽
|
||||
|
||||
### 7. RefAudioPanel 文案简化
|
||||
|
||||
- 原底部段落"上传任意语音样本(3-10秒)…" 移至 "我的参考音频" 标题旁,简化为 `(上传3-10秒语音样本)`
|
||||
|
||||
### 8. 账户下拉菜单添加手机号
|
||||
|
||||
- AccountSettingsDropdown 在账户有效期上方新增手机号显示区域
|
||||
- 显示 `user?.phone || '未知账户'`
|
||||
|
||||
### 9. 标题显示模式对副标题生效
|
||||
|
||||
- **payload 修复**: `useHomeController.ts` 中 `title_display_mode` 的发送条件从 `videoTitle.trim()` 改为 `videoTitle.trim() || videoSecondaryTitle.trim()`,确保仅有副标题时也能发送显示模式
|
||||
- **UI 调整**: 短暂显示/常驻显示下拉从片头标题输入行移至"二、标题与字幕"板块标题行(与预览样式按钮同行),明确表示该设置对标题和副标题同时生效
|
||||
- Remotion 端 `Title.tsx` 已支持(标题和副标题作为整体组件渲染,`displayMode` 统一控制)
|
||||
|
||||
### 10. 时间轴模糊遮罩
|
||||
|
||||
遮罩从外层 wrapper 移入"四、素材编辑"卡片内,仅覆盖时间轴子区域(`rounded-xl`)。
|
||||
|
||||
### 11. 登录后用户信息立即可用
|
||||
|
||||
- AuthContext 新增 `setUser` 方法暴露给消费者
|
||||
- 登录页成功后调用 `setUser(result.user)` 立即写入 Context,无需等页面刷新
|
||||
- 修复登录后账户下拉显示"未知账户"、刷新后才显示手机号的问题
|
||||
|
||||
### 12. 文案与选项微调
|
||||
|
||||
- MaterialSelector 描述 `(可多选,最多4个)` → `(上传自拍视频,最多可选4个)`
|
||||
- TitleSubtitlePanel 显示模式选项 `短暂显示/常驻显示` → `标题短暂显示/标题常驻显示`
|
||||
|
||||
### 13. UI/UX 体验优化(6 项)
|
||||
|
||||
- **操作按钮移动端可见**: 配音列表、作品列表、素材列表、参考音频、历史文案的操作按钮从 `opacity-0`(hover 才显示)改为 `opacity-40`(平时半透明可见,hover 全亮),解决触屏设备无法发现按钮的问题
|
||||
- **手机号脱敏**: AccountSettingsDropdown 手机号中间四位遮掩 `138****5678`
|
||||
- **标题字数计数器**: TitleSubtitlePanel 标题/副标题输入框右侧显示实时字数 `3/15`,超限变红
|
||||
- **列表滚动条提示**: ~~配音列表、作品列表、素材列表、BGM 列表从 `hide-scrollbar` 改为 `custom-scrollbar`~~ → 已全部改回 `hide-scrollbar` 隐藏滚动条(滚动功能不变)
|
||||
- **时间轴拖拽提示**: TimelineEditor 色块左上角新增 `GripVertical` 抓手图标,暗示可拖拽排序
|
||||
- **截取滑块放大**: ClipTrimmer 手柄从 16px 放大到 20px,触控区从 32px 放大到 40px
|
||||
|
||||
### 14. 代码质量修复(4 项)
|
||||
|
||||
- **AccountSettingsDropdown**: 关闭密码弹窗补齐 `setSuccess('')` 清空
|
||||
- **MaterialSelector**: `selectedSet` 加 `useMemo` 避免每次渲染重建
|
||||
- **TimelineEditor**: `visibleSegments`/`overflowSegments` 加 `useMemo`
|
||||
- **MaterialSelector**: 素材满 4 个时非选中项按钮加 `disabled`
|
||||
|
||||
### 15. 发布页平台账号响应式布局
|
||||
|
||||
- **单行布局**:图标+名称+状态在左,按钮在右(`flex items-center`)
|
||||
- **移动端紧凑**:图标 `h-6 w-6`、按钮 `text-xs px-2 py-1 rounded-md`、间距 `space-y-2 px-3 py-2.5`
|
||||
- **桌面端宽松**:`sm:h-7 sm:w-7`、`sm:text-sm sm:px-3 sm:py-1.5 sm:rounded-lg`、`sm:space-y-3 sm:px-4 sm:py-3.5`
|
||||
- 两端各自美观,风格与其他板块一致
|
||||
|
||||
### 16. 移动端刷新回顶部修复
|
||||
|
||||
- **问题**: 移动端刷新页面后不回到顶部,而是滚动到背景音乐板块
|
||||
- **根因**: 1) 浏览器原生滚动恢复覆盖 `scrollTo(0,0)`;2) 列表 scroll effect 有双依赖(`selectedId` + `list`),数据异步加载时第二次触发跳过了 ref 守卫,执行了 `scrollIntoView` 导致页面跳动
|
||||
- **修复**: 三管齐下 — ① `history.scrollRestoration = "manual"` 禁用浏览器原生恢复;② 时间门控 `scrollEffectsEnabled` ref(1 秒内禁止所有列表自动滚动)替代单次 ref 守卫;③ 200ms 延迟兜底 `scrollTo(0,0)`
|
||||
|
||||
### 17. 移动端样式预览窗口缩小
|
||||
|
||||
- **问题**: 移动端点击"预览样式"后窗口占满整屏(宽 358px,高约 636px),遮挡样式调节控件
|
||||
- **修复**: 移动端宽度从 `window.innerWidth - 32` 缩小到 **160px**;位置从左上角改为**右下角**(`right:12, bottom:12`),不遮挡上方控件;最大高度限制 `50dvh`
|
||||
- 桌面端保持不变(280px,左上角)
|
||||
|
||||
### 18. 列表滚动条统一隐藏
|
||||
|
||||
- 将 Day 26 早期改为 `custom-scrollbar`(细紫色滚动条)的 7 处全部改回 `hide-scrollbar`
|
||||
- 涉及:BgmPanel、GeneratedAudiosPanel、HistoryList、MaterialSelector(2处)、ScriptExtractionModal(2处)
|
||||
- 滚动功能不受影响,仅视觉上不显示滚动条
|
||||
|
||||
### 19. 配音按钮移动端适配
|
||||
|
||||
- VoiceSelector "选择声音/克隆声音" 按钮:内边距 `px-4` → `px-2 sm:px-4`,字号加 `text-sm sm:text-base`,图标加 `shrink-0`
|
||||
- 修复移动端窄屏下按钮被挤压导致"克隆声音"不可见的问题
|
||||
|
||||
### 20. 素材标题溢出修复
|
||||
|
||||
- MaterialSelector embedded 标题行移除 `whitespace-nowrap`
|
||||
- 描述文字 `(上传自拍视频,最多可选4个)` 在移动端隐藏(`hidden sm:inline`),桌面端正常显示
|
||||
- 修复移动端刷新按钮被推出容器外的问题
|
||||
|
||||
### 21. 生成配音按钮放大
|
||||
|
||||
- "生成配音" 作为核心操作按钮,从辅助尺寸升级为主操作尺寸
|
||||
- 内边距 `px-2/px-3 py-1/py-1.5` → `px-4 py-2`,字号 `text-xs` → `text-sm font-medium`
|
||||
- 图标 `h-3.5 w-3.5` → `h-4 w-4`,新增 `shadow-sm` + hover `shadow-md`
|
||||
- embedded 与非 embedded 模式统一放大
|
||||
|
||||
### 22. 生成进度条位置调整
|
||||
|
||||
- **问题**: 生成进度条在"六、作品"卡片内部(作品预览下方),不够醒目
|
||||
- **修复**: 进度条从 PreviewPanel 内部提取到 HomePage 右栏,作为独立卡片渲染在"六、作品"卡片**上方**
|
||||
- 使用紫色边框(`border-purple-500/30`)区分,显示任务消息和百分比
|
||||
- PreviewPanel embedded 模式下不再渲染进度条(传入 `currentTask={null}`)
|
||||
- 生成完成后进度卡片自动消失
|
||||
|
||||
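### 23. LatentSync timeout fix

- **Problem**: A roughly 2-minute video (3023 frames, 190 inference segments) was estimated to need 54 minutes of inference, but the httpx timeout was only 20 minutes, so the LatentSync call failed and the pipeline fell back to output without lip sync
- **Root cause**: `httpx.AsyncClient(timeout=1200.0)` in `lipsync_service.py` is too short to cover inference on long videos
- **Fix**: Raised the timeout from `1200s` (20 minutes) to `3600s` (1 hour), enough to cover inference for 2-3 minute videos

### 24. Subtitle timestamp rhythm mapping (fixes subtitle drift on long videos)

- **Problem**: On a 2-minute video the subtitles were clearly out of step with the speech, and the offset grew toward the end
- **Root cause**: The `original_text` handling in `whisper_service.py` discarded Whisper's per-word timestamps, kept only the overall time range, and interpolated linearly across the whole span. Every character received the same duration, ignoring changes in speaking rate and pauses entirely
- **Fix**: Keep Whisper's per-character timestamps as a speech-rhythm template and map the original-text characters proportionally onto that rhythm (rhythm mapping) instead of dividing the time evenly. The subtitle text is unchanged; only the timestamps follow the real speaking rate
- **Algorithm**: The i-th character of the original text maps to position `(i/N)*M` on the Whisper timeline (N = number of original-text characters, M = number of Whisper characters), with linear interpolation between the adjacent Whisper time points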
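
The mapping rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `whisper_service.py`: the function name `map_text_to_rhythm` and the layout of `whisper_times` (M start times plus one closing end time) are assumptions made for the example.

```python
def map_text_to_rhythm(original_text, whisper_times):
    """Assign a start time to each original character.

    whisper_times (assumed layout): the start time in seconds of each of the
    M Whisper characters, plus one final entry for the end of the last one.
    """
    n = len(original_text)
    m = len(whisper_times) - 1
    if n == 0 or m <= 0:
        return []
    mapped = []
    for i, ch in enumerate(original_text):
        pos = (i / n) * m            # fractional position on the Whisper timeline
        lo = int(pos)                # Whisper time point at or just before pos
        hi = min(lo + 1, m)          # next Whisper time point
        frac = pos - lo
        start = whisper_times[lo] + frac * (whisper_times[hi] - whisper_times[lo])
        mapped.append({"char": ch, "start": start})
    return mapped
```

With four original characters mapped onto two Whisper characters, the first two characters share the first Whisper interval and the last two share the second, so a slow second interval stretches only the characters that fall inside it.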
|
||||
|
||||
---
|
||||
|
||||
## 📁 修改文件清单
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `VoiceSelector.tsx` | 新增 embedded prop,移动端按钮适配(`px-2 sm:px-4`) |
|
||||
| `GeneratedAudiosPanel.tsx` | 新增 embedded prop,两行布局,操作按钮可见度,"生成配音"按钮放大 |
|
||||
| `MaterialSelector.tsx` | 新增 embedded prop,自渲染子标题+操作按钮,useMemo,disabled 守卫,操作按钮可见度,标题溢出修复 |
|
||||
| `TimelineEditor.tsx` | 新增 embedded prop,自渲染子标题+控件,useMemo,拖拽抓手图标 |
|
||||
| `PreviewPanel.tsx` | 新增 embedded prop |
|
||||
| `HistoryList.tsx` | 新增 embedded prop,操作按钮可见度 |
|
||||
| `ScriptEditor.tsx` | 标题加序号,移除 emoji,操作按钮可见度 |
|
||||
| `TitleSubtitlePanel.tsx` | 标题加序号,移除 emoji,下拉对齐,显示模式下拉上移,字数计数器 |
|
||||
| `BgmPanel.tsx` | 标题加序号 |
|
||||
| `HomePage.tsx` | 核心重构:合并板块、序号标题、生成配音按钮迁入、`scrollRestoration` + 延迟兜底修复刷新回顶部、生成进度条提取到作品卡片上方 |
|
||||
| `PublishPage.tsx` | 四个板块加序号(七~十),移除 emoji,平台卡片响应式单行布局 |
|
||||
| `RefAudioPanel.tsx` | 简化提示文案,操作按钮可见度 |
|
||||
| `AccountSettingsDropdown.tsx` | 新增手机号显示(脱敏),补齐 success 清空 |
|
||||
| `AuthContext.tsx` | 新增 `setUser` 方法,登录后立即更新用户状态 |
|
||||
| `login/page.tsx` | 登录成功后调用 `setUser` 写入用户数据 |
|
||||
| `useHomeController.ts` | titleDisplayMode 条件修复,列表 scroll 时间门控 `scrollEffectsEnabled` |
|
||||
| `FloatingStylePreview.tsx` | 移动端预览窗口缩小(160px)并移至右下角 |
|
||||
| `ScriptExtractionModal.tsx` | 滚动条改回隐藏 |
|
||||
| `ClipTrimmer.tsx` | 滑块手柄放大、触控区增高 |
|
||||
| `lipsync_service.py` | httpx 超时从 1200s 改为 3600s |
|
||||
| `whisper_service.py` | 字幕时间戳从线性插值改为 Whisper 节奏映射 |
|
||||
|
||||
---
|
||||
|
||||
## 🔍 验证
|
||||
|
||||
- `npm run build` — 零报错零警告
|
||||
- 合并后布局:各子板块分隔清晰、主标题有序号
|
||||
- 向后兼容:`embedded` 默认 `false`,组件独立使用不受影响
|
||||
- 配音列表两行布局:语速+生成配音在上,配音列表+刷新在下
|
||||
- 下拉菜单垂直对齐正确
|
||||
- 短暂显示/常驻显示对标题和副标题同时生效
|
||||
- 操作按钮在移动端(触屏)可见
|
||||
- 手机号脱敏显示
|
||||
- 标题字数计数器正常
|
||||
- 列表滚动条全部隐藏
|
||||
- 时间轴拖拽抓手图标显示
|
||||
- 发布页平台卡片:移动端紧凑、桌面端宽松,风格一致
|
||||
- 移动端刷新后回到顶部,不再滚动到背景音乐位置
|
||||
- 移动端样式预览窗口不遮挡控件
|
||||
- 移动端配音按钮(选择声音/克隆声音)均可见
|
||||
- 移动端素材标题行按钮不溢出
|
||||
- 生成配音按钮视觉层级高于辅助按钮
|
||||
- 生成进度条在作品卡片上方独立显示
|
||||
- LatentSync 长视频推理不再超时回退
|
||||
- 字幕时间戳与语音节奏同步,长视频不漂移
|
||||
231
Docs/DevLogs/Day27.md
Normal file
@@ -0,0 +1,231 @@
|
||||
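## Remotion Stroke Fix + Font Style Expansion + TypeScript Fixes (Day 27)

### Overview

Fixes the stroke rendering of titles and subtitles (strokes too thick, ghosting on the secondary title), expands the font style options (titles 4→12, subtitles 4→8), and fixes TypeScript type errors in the Remotion project.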
|
||||
|
||||
---
|
||||
|
||||
## ✅ 改动内容
|
||||
|
||||
### 1. 描边渲染修复(标题 + 字幕)
|
||||
|
||||
- **问题**: 标题黑色描边过粗,副标题出现重影/鬼影
|
||||
- **根因**: `buildTextShadow` 用 4 方向 `textShadow` 模拟描边 — 对角线叠加导致描边视觉上比实际 `stroke_size` 更粗;4 角方向在中间有间隙和叠加,造成重影
|
||||
- **修复**: 改用 CSS 原生描边 `-webkit-text-stroke` + `paint-order: stroke fill`(Remotion 用 Chromium 渲染,完美支持)
|
||||
- **旧方案**:
|
||||
```javascript
|
||||
textShadow: `-8px -8px 0 #000, 8px -8px 0 #000, -8px 8px 0 #000, 8px 8px 0 #000, 0 0 16px rgba(0,0,0,0.5), 0 2px 4px rgba(0,0,0,0.3)`
|
||||
```
|
||||
- **新方案**:
|
||||
```javascript
|
||||
WebkitTextStroke: `5px #000000`,
|
||||
paintOrder: 'stroke fill',
|
||||
textShadow: `0 2px 4px rgba(0,0,0,0.3)`,
|
||||
```
|
||||
- 同时将所有预设样式的 `stroke_size` 从 8 降到 5,配合原生描边视觉更干净
|
||||
|
||||
### 2. 字体样式扩展
|
||||
|
||||
**标题样式**: 4 个 → 12 个(+8)
|
||||
|
||||
| ID | 样式名 | 字体 | 配色 |
|
||||
|----|--------|------|------|
|
||||
| title_pangmen | 庞门正道 | 庞门正道标题体3.0 | 白字黑描 |
|
||||
| title_round | 优设标题圆 | 优设标题圆 | 白字紫描 |
|
||||
| title_alibaba | 阿里数黑体 | 阿里巴巴数黑体 | 白字黑描 |
|
||||
| title_chaohei | 文道潮黑 | 文道潮黑 | 青蓝字深蓝描 |
|
||||
| title_wujie | 无界黑 | 标小智无界黑 | 白字深灰描 |
|
||||
| title_houdi | 厚底黑 | Aa厚底黑 | 红字深黑描 |
|
||||
| title_banyuan | 寒蝉半圆体 | 寒蝉半圆体 | 白字黑描 |
|
||||
| title_jixiang | 欣意吉祥宋 | 字体圈欣意吉祥宋 | 金字棕描 |
|
||||
|
||||
**字幕样式**: 4 个 → 8 个(+4)
|
||||
|
||||
| ID | 样式名 | 字体 | 高亮色 |
|
||||
|----|--------|------|--------|
|
||||
| subtitle_pink | 少女粉 | DingTalk JinBuTi | 粉色 #FF69B4 |
|
||||
| subtitle_lime | 清新绿 | DingTalk Sans | 荧光绿 #76FF03 |
|
||||
| subtitle_gold | 金色隶书 | 阿里妈妈刀隶体 | 金色 #FDE68A |
|
||||
| subtitle_kai | 楷体红字 | SimKai | 红色 #FF4444 |
|
||||
|
||||
### 3. TypeScript 类型错误修复
|
||||
|
||||
- **Root.tsx**: `Composition` 泛型类型与 `calculateMetadata` 参数类型不匹配 — 内联 `calculateMetadata` 并显式标注参数类型,`defaultProps` 使用 `satisfies VideoProps` 约束
|
||||
- **Video.tsx**: `VideoProps` 接口添加 `[key: string]: unknown` 索引签名,兼容 Remotion 要求的 `Record<string, unknown>` 约束
|
||||
- **VideoLayer.tsx**: `OffthreadVideo` 组件不支持 `loop` prop — 移除(该 prop 原本就被忽略)
|
||||
|
||||
### 4. 进度条文案还原
|
||||
|
||||
- **问题**: 进度条显示后端推送的详细阶段消息(如"正在合成唇型"),用户希望只显示"正在AI生成中..."
|
||||
- **修复**: `HomePage.tsx` 进度条文案从 `{currentTask.message || "正在AI生成中..."}` 改为固定 `正在AI生成中...`
|
||||
|
||||
---
|
||||
|
||||
## 📁 修改文件清单
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `remotion/src/components/Title.tsx` | `buildTextShadow` → `buildStrokeStyle`(CSS 原生描边),标题+副标题同时生效 |
|
||||
| `remotion/src/components/Subtitles.tsx` | `buildTextShadow` → `buildStrokeStyle`(CSS 原生描边) |
|
||||
| `remotion/src/Root.tsx` | 修复 `Composition` 泛型类型、`calculateMetadata` 参数类型 |
|
||||
| `remotion/src/Video.tsx` | `VideoProps` 添加索引签名 |
|
||||
| `remotion/src/components/VideoLayer.tsx` | 移除 `OffthreadVideo` 不支持的 `loop` prop |
|
||||
| `backend/assets/styles/title.json` | 标题样式从 4 个扩展到 12 个,`stroke_size` 8→5 |
|
||||
| `backend/assets/styles/subtitle.json` | 字幕样式从 4 个扩展到 8 个 |
|
||||
| `frontend/.../HomePage.tsx` | 进度条文案还原为固定"正在AI生成中..." |
|
||||
|
||||
---
|
||||
|
||||
## 🔍 验证
|
||||
|
||||
- `npx tsc --noEmit` — 零错误
|
||||
- `npm run build:render` — 渲染脚本编译成功
|
||||
- `npm run build`(前端)— 零报错
|
||||
- 描边:标题/副标题/字幕使用 CSS 原生描边,无重影、无虚胖
|
||||
- 样式选择:前端下拉可加载全部 12 个标题 + 8 个字幕样式
|
||||
|
||||
---
|
||||
|
||||
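## Video Generation Pipeline Performance Optimization

### Overview

A broad performance pass over the video generation pipeline, covering FFmpeg encoding parameters, LatentSync inference parameters, parallel handling of multiple materials, and parallel post-processing. Estimated effect: a 15s single-material video drops from ~280s to ~190s (about 32% faster), and a 30s two-material video from ~400s to ~240s (40% faster).

**Server configuration**: 2x RTX 3090 (24GB), 2x Xeon E5-2680 v4 (56 cores), 192GB RAM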
|
||||
|
||||
### 第一阶段:FFmpeg 编码优化
|
||||
|
||||
**最终合成 preset `slow` → `medium`**
|
||||
- 合成阶段从 ~50s 降到 ~25s,质量几乎无变化
|
||||
|
||||
**中间文件 CRF 18 → 23**
|
||||
- 中间产物(trim、prepare_segment、concat、loop、normalize_orientation)不是最终输出,不需要高质量编码
|
||||
- 每个中间步骤快 3-8 秒
|
||||
|
||||
**最终合成 CRF 18 → 20**
|
||||
- 15 秒口播视频 CRF 18 vs 20 肉眼无法区分
|
||||
|
||||
### 第二阶段:LatentSync 推理参数调优
|
||||
|
||||
**inference_steps 20 → 16**
|
||||
- 推理时间线性减少 20%(~180s → ~144s)
|
||||
|
||||
**guidance_scale 2.0 → 1.5**
|
||||
- classifier-free guidance 权重降低,每步计算量微降(5-10%)
|
||||
|
||||
> ⚠️ 两项需重启 LatentSync 服务后测试唇形质量,确认可接受再保留。如质量不佳可回退 .env 参数。
|
||||
|
||||
### Stage 3: Parallelizing the multi-material pipeline

**Material download + orientation normalization in parallel**

- Replaced the serial `for` loop with `asyncio.gather()`; `normalize_orientation` runs in the thread pool via `run_in_executor`
- N materials go from N×5s in series to ~5s

**Segment preprocessing in parallel**

- Replaced one-by-one `prepare_segment` calls with `asyncio.gather()` + `run_in_executor`
- 2 materials: ~90s → ~50s; 4 materials: ~180s → ~60s
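
The `asyncio.gather()` + `run_in_executor` pattern described above can be illustrated as follows. This is a sketch under stated assumptions: the `prepare_segment` here is a stand-in that only sleeps (the real one runs FFmpeg), and `prepare_all` is a hypothetical helper name.

```python
import asyncio
import time


def prepare_segment(name):
    """Stand-in for a blocking FFmpeg call (hypothetical; it only sleeps)."""
    time.sleep(0.3)
    return f"{name}.prepared"


async def prepare_all(names):
    loop = asyncio.get_running_loop()
    # Hand each blocking call to the default thread pool, then await them together
    jobs = [loop.run_in_executor(None, prepare_segment, n) for n in names]
    return await asyncio.gather(*jobs)


started = time.perf_counter()
results = asyncio.run(prepare_all(["a", "b", "c", "d"]))
elapsed = time.perf_counter() - started
```

`asyncio.gather()` returns results in the order the jobs were passed in, so downstream code can still rely on segment order even though the work ran concurrently.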
|
||||
|
||||
### 第四阶段:流水线交叠
|
||||
|
||||
**Whisper 字幕对齐 与 BGM 混音 并行**
|
||||
- 两者互不依赖(都只依赖 audio_path),用 `asyncio.gather()` 并行执行
|
||||
- 单素材模式下 Whisper 从 LatentSync 之后的串行步骤移至与 BGM 并行
|
||||
- 不开 BGM 或不开字幕时行为不变,只有同时启用时才并行
|
||||
|
||||
### 修改文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `backend/app/services/video_service.py` | compose: preset slow→medium, CRF 18→20; normalize_orientation/prepare_segment/concat: CRF 18→23 |
|
||||
| `backend/app/services/lipsync_service.py` | _loop_video_to_duration: CRF 18→23 |
|
||||
| `backend/.env` | LATENTSYNC_INFERENCE_STEPS=16, LATENTSYNC_GUIDANCE_SCALE=1.5 |
|
||||
| `backend/app/modules/videos/workflow.py` | import asyncio; 素材下载/归一化并行; 片段预处理并行; Whisper+BGM 并行 |
|
||||
|
||||
### 回退方案
|
||||
|
||||
- FFmpeg 参数:如画质不满意,将最终 CRF 改回 18、preset 改回 slow
|
||||
- LatentSync:如唇形质量下降,将 .env 中 `INFERENCE_STEPS` 改回 20、`GUIDANCE_SCALE` 改回 2.0
|
||||
- 并行化:纯架构优化,无质量影响,无需回退
|
||||
|
||||
---
|
||||
|
||||
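## MuseTalk + LatentSync Hybrid Lip-Sync Scheme

### Overview

LatentSync 1.6 produces high quality but is very slow at inference (~78% of total time); long videos (>= 2 min) take 20-60 minutes, which is unacceptable. MuseTalk 1.5 uses single-step latent-space inpainting (it is not a diffusion model), and its per-frame inference runs close to real time (30fps+ on a V100), which suits long videos. The hybrid scheme routes automatically by audio duration: short videos use LatentSync for quality, long videos use MuseTalk for speed.

### Architecture

- **Routing threshold**: `LIPSYNC_DURATION_THRESHOLD` (default 120s)
- **Short videos (<120s)**: LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=120s)**: MuseTalk 1.5 (GPU0, port 8011)
- **Fallback**: automatically falls back to LatentSync when MuseTalk is unavailable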
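
The routing rule above reduces to a small decision function. The sketch below is illustrative only: `choose_lipsync_backend` and its `musetalk_available` flag are hypothetical names, and the real logic lives in `lipsync_service.py`.

```python
LIPSYNC_DURATION_THRESHOLD = 120.0  # seconds, the default quoted above


def choose_lipsync_backend(audio_duration, musetalk_available,
                           threshold=LIPSYNC_DURATION_THRESHOLD):
    """Return the name of the service that should handle a clip of this length."""
    if audio_duration >= threshold and musetalk_available:
        return "musetalk"
    # Short clips, and long clips while MuseTalk is down, go to LatentSync
    return "latentsync"
```

Keeping the fallback inside the same function means a MuseTalk outage degrades speed on long videos rather than failing the job.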
|
||||
|
||||
### 改动文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `models/MuseTalk/` | 从 Temp/MuseTalk 复制代码 + 下载权重 |
|
||||
| `models/MuseTalk/scripts/server.py` | 新建 FastAPI 常驻服务 (端口 8011, GPU0) |
|
||||
| `backend/app/core/config.py` | 新增 MUSETALK_* 和 LIPSYNC_DURATION_THRESHOLD |
|
||||
| `backend/.env` | 新增对应环境变量 |
|
||||
| `backend/app/services/lipsync_service.py` | 新增 `_call_musetalk_server()` + 混合路由逻辑 + 扩展 `check_health()` |
|
||||
|
||||
---
|
||||
|
||||
## MuseTalk Inference Performance Optimization (server.py v2)

### Overview

The first long-video test of MuseTalk (136s, 3404 frames) took 1799s (~30 minutes). Profiling showed the bottlenecks were face detection (28%), BiSeNet compositing (22%), and I/O (17%), rather than UNet inference itself (17%). Six optimizations are estimated to bring this down to 8-10 minutes (~3x speedup).

### Bottleneck analysis (before optimization, 1799s)

| Stage | Time | Share | Cause |
|------|------|------|---------|
| DWPose + face detection | ~510s | 28% | `batch_size_fa=1`, 2 neural networks per frame, fully serial |
| Compositing + BiSeNet face parsing | ~400s | 22% | BiSeNet run and PNG written to disk for every frame |
| UNet inference | ~300s | 17% | batch_size=8 is too small |
| I/O (PNG read/write + FFmpeg) | ~300s | 17% | Slow PNG compression, ffmpeg→PNG→imread chain |
| VAE encoding | ~100s | 6% | Encoded frame by frame, not batched |

### The 6 optimizations

| # | Optimization | Details |
|---|--------|------|
| 1 | **batch_size 8→32** | Changed in `.env`; the RTX 3090 has ample VRAM |
| 2 | **Read frames directly with cv2.VideoCapture** | Skips the ffmpeg→PNG→imread chain, saving 3404 PNG encode/decode round trips |
| 3 | **Subsampled face detection (every 5 frames)** | Runs DWPose + FaceAlignment every 5 frames; bboxes for in-between frames are linearly interpolated |
| 4 | **BiSeNet mask caching (every 5 frames)** | Runs `get_image_prepare_material` every 5 frames; in-between frames reuse the cached mask via `get_image_blending` |
| 5 | **Write directly with cv2.VideoWriter** | Skips per-frame PNG writes + ffmpeg re-encoding; VideoWriter writes the mp4 directly |
| 6 | **Per-stage timing** | Precise timing for 7 stages, to guide further tuning |
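
Optimization 3 (detect on every 5th frame, interpolate the rest) can be sketched as below. This is not the `_detect_faces_subsampled()` implementation; `interpolate_bboxes` is a hypothetical helper, and holding the last detected box for trailing frames is an assumption.

```python
def interpolate_bboxes(keyframes, total_frames):
    """Fill in per-frame boxes from boxes detected on a subset of frames.

    keyframes: {frame_index: (x1, y1, x2, y2)} for the frames that were detected.
    """
    idxs = sorted(keyframes)
    boxes = []
    for f in range(total_frames):
        # Nearest detected frame at or before f, and at or after f
        left = max((k for k in idxs if k <= f), default=idxs[0])
        right = min((k for k in idxs if k >= f), default=idxs[-1])
        if left == right:
            # On a detected frame, or outside the detected range: reuse that box
            boxes.append(tuple(float(v) for v in keyframes[left]))
            continue
        t = (f - left) / (right - left)
        a, b = keyframes[left], keyframes[right]
        boxes.append(tuple(a[j] + t * (b[j] - a[j]) for j in range(4)))
    return boxes
```

Linear interpolation is a reasonable fit for talking-head footage, where the face moves little between detections five frames apart.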
|
||||
|
||||
### 修改文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `models/MuseTalk/scripts/server.py` | 完全重写 `_run_inference()`, 新增 `_detect_faces_subsampled()` |
|
||||
| `backend/.env` | `MUSETALK_BATCH_SIZE` 8→32 |
|
||||
|
||||
---
|
||||
|
||||
## Remotion 并发渲染优化
|
||||
|
||||
### 概述
|
||||
|
||||
Remotion 渲染在 56 核服务器上默认只用 8 并发 (`min(8, cores/2)`),改为 16 并发,预估从 ~5 分钟降到 ~2-3 分钟。
|
||||
|
||||
### 改动
|
||||
|
||||
- `remotion/render.ts`: `renderMedia()` 新增 `concurrency` 参数 (默认 16), 支持 `--concurrency` CLI 参数覆盖
|
||||
- `remotion/dist/render.js`: 重新编译
|
||||
|
||||
### 修改文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `remotion/render.ts` | `RenderOptions` 新增 `concurrency` 字段, `renderMedia()` 传入 `concurrency` |
|
||||
| `remotion/dist/render.js` | TypeScript 重新编译 |
|
||||
263
Docs/DevLogs/Day28.md
Normal file
@@ -0,0 +1,263 @@
|
||||
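## CosyVoice FP16 Acceleration + Docs Update + AI Rewrite UI Refactor + Title/Subtitle Panel Reorder with Video Frame Preview (Day 28)

### Overview

The CosyVoice 3.0 voice-cloning service now runs FP16 half-precision inference, with an estimated 30-40% speedup. Four project documents were updated alongside it. The AI script-rewrite UI was refactored (a two-step RewriteModal flow, plus logic extraction for ScriptExtractionModal). On the frontend, the "标题与字幕" (Title and Subtitles) panel moved from step 2 to step 4 (after material editing), and the style preview window's background changed from a purple-pink gradient to a screenshot of the video's opening frame, so the preview is WYSIWYG.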
|
||||
|
||||
---
|
||||
|
||||
## ✅ 改动内容
|
||||
|
||||
### 1. CosyVoice FP16 半精度加速
|
||||
|
||||
- **问题**: CosyVoice 3.0 以 FP32 全精度运行,RTF (Real-Time Factor) 约 0.9-1.35x,生成 2 分钟音频需要约 2 分钟
|
||||
- **根因**: `AutoModel()` 初始化时未传入 `fp16=True`,LLM 推理和 Flow Matching (DiT) 均在 FP32 下运行
|
||||
- **修复**: 一行改动开启 FP16 自动混合精度
|
||||
|
||||
```python
|
||||
# 旧: _model = AutoModel(model_dir=str(MODEL_DIR))
|
||||
# 新:
|
||||
_model = AutoModel(model_dir=str(MODEL_DIR), fp16=True)
|
||||
```
|
||||
|
||||
- **生效机制**: `CosyVoice3Model` 在 `llm_job()` 和 `token2wav()` 中通过 `torch.cuda.amp.autocast(self.fp16)` 自动将计算转为 FP16
|
||||
- **预期效果**:
|
||||
- 推理速度提升 30-40%
|
||||
- 显存占用降低 ~30%
|
||||
- 语音质量基本无损(0.5B 模型 FP16 精度充足)
|
||||
- **验证**: 服务重启后自检通过,健康检查 `ready: true`
|
||||
|
||||
### 2. 文档全面更新 (4 个文件)
|
||||
|
||||
补充 Day 27 新增的 MuseTalk 混合唇形同步方案、性能优化、Remotion 并发渲染等内容到所有相关文档。
|
||||
|
||||
#### README.md
|
||||
- 项目描述更新为 "LatentSync 1.6 + MuseTalk 1.5 混合唇形同步"
|
||||
- 唇形同步功能描述改为混合方案(短视频 LatentSync,长视频 MuseTalk)
|
||||
- 技术栈表新增 MuseTalk 1.5
|
||||
- 项目结构新增 `models/MuseTalk/`
|
||||
- 服务架构表新增 MuseTalk (端口 8011)
|
||||
- 文档中心新增 MuseTalk 部署指南链接
|
||||
- 性能优化描述新增降频检测 + Remotion 16 并发
|
||||
|
||||
#### DEPLOY_MANUAL.md
|
||||
- GPU 分配说明更新 (GPU0=MuseTalk+CosyVoice, GPU1=LatentSync)
|
||||
- 步骤 3 拆分为 3a (LatentSync) + 3b (MuseTalk)
|
||||
- 环境变量表新增 7 个 MuseTalk 变量,移除过时的 `DOUYIN_COOKIE`
|
||||
- LatentSync 推理步数默认值 20→16
|
||||
- 测试运行新增 MuseTalk 启动终端
|
||||
- PM2 管理新增 MuseTalk 服务(第 5 项)
|
||||
- 端口检查、日志查看命令新增 8011/vigent2-musetalk
|
||||
|
||||
#### SUBTITLE_DEPLOY.md
|
||||
- 技术架构图更新为 LatentSync/MuseTalk 混合路由
|
||||
- 新增唇形同步路由说明
|
||||
- Remotion 配置表新增 `concurrency` 参数 (默认 16)
|
||||
- GPU 分配说明更新
|
||||
- 更新日志新增 v1.3.0 条目
|
||||
|
||||
#### BACKEND_README.md
|
||||
- 健康检查接口描述更新为含 LatentSync + MuseTalk + 混合路由阈值
|
||||
- 环境变量配置新增 MuseTalk 相关变量
|
||||
- 服务集成指南新增"唇形同步混合路由"章节
|
||||
|
||||
---
|
||||
|
||||
### 3. AI 改写文案界面重构
|
||||
|
||||
#### RewriteModal 重构
|
||||
|
||||
将 AI 改写弹窗改为两步式流程,提升交互体验:
|
||||
|
||||
**第一步 — 配置与触发**:
|
||||
- 自定义提示词输入(可选),自动持久化到 localStorage
|
||||
- "开始改写"按钮触发 `/api/ai/rewrite` 请求
|
||||
|
||||
**第二步 — 结果对比与选择**:
|
||||
- 上方:AI 改写结果 + "使用此结果"按钮(紫粉渐变色,醒目)
|
||||
- 下方:原文对比 + "保留原文"按钮(灰色低调)
|
||||
- 底部:可"重新改写"(重回第一步,保留自定义提示词)
|
||||
- ESC 快捷键关闭
|
||||
|
||||
#### ScriptExtractionModal 逻辑抽取
|
||||
|
||||
将文案提取模态框的全部业务逻辑抽取到独立 hook `useScriptExtraction`:
|
||||
|
||||
- **useScriptExtraction.ts** (新建): 管理 URL/文件双模式输入、拖拽上传、提取请求、步骤状态机 (config → processing → result)、剪贴板复制
|
||||
- **ScriptExtractionModal.tsx**: 纯展示组件,消费 hook 返回值,新增 ESC/Enter 快捷键
|
||||
|
||||
#### ScriptEditor 工具栏调整
|
||||
|
||||
- 按钮组右对齐 (`justify-end`),统一高度 `h-7` 和圆角
|
||||
- "历史文案"按钮用灰色 (bg-gray-600) 区分辅助功能
|
||||
- "文案提取助手"用紫色 (bg-purple-600) 表示主功能
|
||||
- "AI多语言"用绿渐变 (emerald-teal),"AI生成标题标签"用蓝渐变 (blue-cyan)
|
||||
- "AI智能改写"和"保存文案"移至文本框下方状态栏
|
||||
|
||||
---
|
||||
|
||||
### 4. 标题字幕面板重排 + 视频帧背景预览
|
||||
|
||||
#### 面板顺序重排
|
||||
|
||||
将 `<TitleSubtitlePanel>` 从第二步移至第四步(素材编辑之后),使用户在设置标题字幕样式时已经完成了素材选择和时间轴编排。
|
||||
|
||||
新顺序:
|
||||
```
|
||||
一、文案提取与编辑(不变)
|
||||
二、配音(原三)
|
||||
三、素材编辑(原四)
|
||||
四、标题与字幕(原二)→ 移到素材编辑之后
|
||||
```
|
||||
|
||||
#### 新建 useVideoFrameCapture hook
|
||||
|
||||
从视频 URL 截取 0.1s 处帧画面,返回 JPEG data URL:
|
||||
|
||||
- 创建 `<video>` 元素,设置 `crossOrigin="anonymous"`(素材存储在 Supabase Storage 跨域地址)
|
||||
- 先绑定 `loadedmetadata` / `canplay` / `seeked` / `error` 事件监听,再设 src(避免事件丢失)
|
||||
- `loadedmetadata` 或 `canplay` 触发后 seek 到 0.1s,`seeked` 回调中用 canvas `drawImage` 截帧
|
||||
- canvas 缩放到 480px 宽再编码(预览窗口最大 280px,节省内存)
|
||||
- `canvas.toDataURL("image/jpeg", 0.7)` 导出
|
||||
- 防御 `videoWidth/videoHeight` 为 0 的边界情况
|
||||
- try-catch 防 canvas taint,失败返回 null(降级渐变)
|
||||
- `isActive` 标志 + `seeked` 去重标志防止 stale 和重复更新
|
||||
- 截图完成后清理 video 元素释放内存
|
||||
|
||||
#### 按需截取(性能优化)
|
||||
|
||||
只在样式预览窗口打开时才触发截取:
|
||||
|
||||
```typescript
|
||||
const materialPosterUrl = useVideoFrameCapture(
|
||||
showStylePreview ? firstTimelineMaterialUrl : null
|
||||
);
|
||||
```
|
||||
|
||||
截取源优先使用**时间轴第一段素材**(用户拖拽排序后的真实片头),回退到 `selectedMaterials[0]`(未生成配音、时间轴为空时)。
|
||||
|
||||
#### 预览背景替换
|
||||
|
||||
`FloatingStylePreview` 有视频帧时直接显示原始画面(不加半透明,保证颜色真实),文字靠描边保证可读性;无视频帧时降级为原紫粉渐变背景。
|
||||
|
||||
#### 踩坑记录
|
||||
|
||||
1. **CORS tainted canvas**: 素材文件存储在 Supabase Storage (`api.hbyrkj.top`),是跨域签名链接。必须设 `video.crossOrigin = "anonymous"` 才能让 canvas `toDataURL` 不被 SecurityError 拦截
|
||||
2. **时间轴为空**: `useTimelineEditor` 在 `audioDuration <= 0`(未选配音)时返回空数组,需回退到 `selectedMaterials[0]`
|
||||
3. **事件监听顺序**: 必须先绑定事件监听再设 `video.src`,否则快速加载时事件可能丢失
|
||||
|
||||
---
|
||||
|
||||
## 📁 修改文件清单
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `models/CosyVoice/cosyvoice_server.py` | `AutoModel()` 新增 `fp16=True` 参数 |
|
||||
| `README.md` | 混合唇形同步描述、技术栈、服务架构、项目结构更新 |
|
||||
| `Docs/DEPLOY_MANUAL.md` | MuseTalk 部署步骤、环境变量、PM2 管理、端口检查 |
|
||||
| `Docs/SUBTITLE_DEPLOY.md` | 架构图、Remotion concurrency、GPU 分配、更新日志 |
|
||||
| `Docs/BACKEND_README.md` | 健康检查、环境变量、混合路由章节 |
|
||||
| `frontend/.../RewriteModal.tsx` | 两步式改写流程(自定义提示词 → 结果对比) |
|
||||
| `frontend/.../script-extraction/useScriptExtraction.ts` | **新建** — 文案提取逻辑 hook |
|
||||
| `frontend/.../ScriptExtractionModal.tsx` | 纯展示组件,消费 hook,新增快捷键 |
|
||||
| `frontend/.../ScriptEditor.tsx` | 工具栏右对齐 + 按钮分色 + 改写/保存移至底部 |
|
||||
| `frontend/.../useVideoFrameCapture.ts` | **新建** — 视频帧截取 hook,crossOrigin + canvas 缩放 |
|
||||
| `frontend/.../useHomeController.ts` | 新增 useMemo 计算素材 URL,调用帧截取 hook,showStylePreview 门控 |
|
||||
| `frontend/.../HomePage.tsx` | 面板重排(二↔四互换),编号更新,透传 materialPosterUrl |
|
||||
| `frontend/.../TitleSubtitlePanel.tsx` | 编号"二"→"四",新增 previewBackgroundUrl prop |
|
||||
| `frontend/.../FloatingStylePreview.tsx` | 新增 previewBackgroundUrl prop,条件渲染视频帧/渐变背景 |
|
||||
|
||||
---
|
||||
|
||||
## 🔍 验证
|
||||
|
||||
- CosyVoice 重启成功,健康检查 `{"ready": true}`
|
||||
- 自检推理通过(7.2s for "你好")
|
||||
- FP16 通过 `torch.cuda.amp.autocast(self.fp16)` 在 LLM 和 Flow Matching 阶段生效
|
||||
- `npx tsc --noEmit` — 零错误
|
||||
- AI 改写:自定义提示词持久化 → 改写结果 + 原文对比 → "使用此结果"/"保留原文"
|
||||
- 文案提取:URL / 文件双模式 → 处理中动画 → 结果填入
|
||||
- 面板顺序:一→文案、二→配音、三→素材编辑、四→标题与字幕
|
||||
- 样式预览背景:有素材时显示真实视频片头帧,无素材降级紫粉渐变
|
||||
- 预览关闭时不触发截取,不浪费资源
|
||||
|
||||
---
|
||||
|
||||
## 💡 CosyVoice 性能分析备注
|
||||
|
||||
### 当前性能基线 (FP32, 优化前)
|
||||
|
||||
| 文本长度 | 音频时长 | 推理耗时 | RTF |
|
||||
|----------|----------|----------|-----|
|
||||
| 42 字 | 9.8s | 13.2s | 1.35x |
|
||||
| 89 字 | 18.2s | 20.3s | 1.12x |
|
||||
| ~530 字 | 115.8s | 107.7s | 0.93x |
|
||||
| ~670 字 | 143.5s | 131.6s | 0.92x |
|
||||
|
||||
### 未来可选优化(收益递减,暂不实施)
|
||||
|
||||
| 优化项 | 预期提升 | 复杂度 |
|
||||
|--------|----------|--------|
|
||||
| TensorRT (DiT 模块) | +20-30% | 需编译 .plan 引擎 |
|
||||
| torch.compile() | +10-20% | 一行代码,但首次编译慢 |
|
||||
| vLLM (LLM 模块) | +10-15% | 额外依赖 |
|
||||
|
||||
---
|
||||
|
||||
## MuseTalk 合成阶段性能优化
|
||||
|
||||
### 概述
|
||||
|
||||
MuseTalk v2 优化后总耗时从 1799s 降到 819s(2.2x),但合成阶段(Phase 6)仍占 462.2s (56.4%),是最大单一瓶颈。本次优化两个方向:纯 numpy blending 替代 PIL 转换、FFmpeg pipe + NVENC GPU 硬编码替代双重编码。
|
||||
|
||||
### 1. 纯 numpy blending 替代 PIL(blending.py)
|
||||
|
||||
- **问题**: `get_image_blending` 每帧做 3 次 numpy↔PIL 转换 + BGR↔RGB 通道翻转,纯粹浪费
|
||||
- **方案**: 新增 `get_image_blending_fast()` 函数
|
||||
- 全程保持 BGR numpy 数组,不做 PIL 转换和通道翻转
|
||||
- mask 混合用 numpy 向量化广播 `mask * (1/255)` 替代 `PIL.paste with mask`
|
||||
- 原 `get_image_blending` 保留作为 fallback
|
||||
- **降级链**: `blending_fast` → `blending`(PIL)→ `get_image`(完整重算)
|
||||
|
||||
### 2. FFmpeg pipe + NVENC 硬编码替代双重编码(server.py)
|
||||
|
||||
**优化前(双重编码)**:
|
||||
```
|
||||
Phase 6: 逐帧 → cv2.VideoWriter (mp4v CPU 软编码) → temp_raw.mp4
|
||||
Phase 7: FFmpeg 读 temp_raw.mp4 → H.264 CPU 重编码 + 合并音频 → output.mp4
|
||||
```
|
||||
|
||||
**优化后(单次 GPU 编码)**:
|
||||
```
|
||||
Phase 6: 逐帧 → FFmpeg stdin pipe (rawvideo → h264_nvenc GPU 编码) → temp_raw.mp4
|
||||
Phase 7: FFmpeg 只做音频合并 (-c:v copy -c:a copy) → output.mp4 (秒级)
|
||||
```
|
||||
|
||||
- NVENC 参数: `-c:v h264_nvenc -preset p4 -cq 20 -pix_fmt yuv420p`
|
||||
- RTX 3090 NVENC 专用芯片编码,不占 CUDA 核心,编码速度 >500fps
|
||||
|
||||
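### 3. Hardened FFmpeg process resource management

- The frame-writing loop is wrapped in `try/finally` so that `proc.stdin.close()` runs even when an exception is raised
- stderr is read after `proc.wait()` and only then closed, to avoid a buffer deadlock
- stderr decoding uses `errors="ignore"` so non-UTF-8 output cannot crash it

### 4. `run_ffmpeg` safety improvements

- Dropped `shell=True` in favor of passing arguments as a list, so special characters in paths cannot cause command injection
- The Phase 7 FFmpeg command is now built as an argument list instead of by string concatenation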
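
The pipe-handling points in sections 3 and 4 can be shown with a small, self-contained sketch. To keep it runnable without FFmpeg, the child process here is a stand-in Python one-liner that counts bytes on stdin; `CHILD` and `pipe_frames` are hypothetical names, not the code in `server.py`.

```python
import subprocess
import sys

# Stand-in consumer (assumption): a child Python process that counts stdin bytes.
# In the setup described above, the child would be ffmpeg reading raw frames.
CHILD = [sys.executable, "-c",
         "import sys; n = len(sys.stdin.buffer.read()); sys.stderr.write(f'read {n} bytes')"]


def pipe_frames(frames, cmd=CHILD):
    # Arguments as a list, no shell=True: odd characters in a path stay one argument
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        for frame in frames:
            proc.stdin.write(frame)
    finally:
        proc.stdin.close()          # always signal end-of-stream, even on error
    proc.wait()
    # Mirrors the order described above; fine here because the child logs very little
    log = proc.stderr.read().decode("utf-8", errors="ignore")
    proc.stderr.close()
    return proc.returncode, log


code, log = pipe_frames([b"\x00" * 1024] * 3)
```

If the child could write a large amount to stderr, draining it while the process runs (for example with a reader thread) would be the safer variant.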
|
||||
|
||||
### 调优过程
|
||||
|
||||
| 版本 | Phase 6 | Phase 7 | 总计 | 结论 |
|
||||
|------|---------|---------|------|------|
|
||||
| Day27 基线 | 462s | 38s | 819s | — |
|
||||
| v1: libx264 -preset medium | 548s | 0.3s | 854s | CPU 编码背压,反而更慢 |
|
||||
| v2: h264_nvenc(当前) | 待测 | 待测 | 待测 | NVENC 零背压,预估 Phase 6 < 200s |
|
||||
|
||||
### 修改文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `models/MuseTalk/musetalk/utils/blending.py` | 新增 `get_image_blending_fast()` 纯 numpy 函数 |
|
||||
| `models/MuseTalk/scripts/server.py` | Phase 6: FFmpeg pipe + NVENC + blending_fast;Phase 7: -c:v copy;`run_ffmpeg` 去掉 shell=True |
|
||||
283
Docs/DevLogs/Day29.md
Normal file
@@ -0,0 +1,283 @@
|
||||
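## Subtitle Sync Fix + Lip-Sync Parameter Tuning + Full Video Pipeline Optimization + Preview Background Fix + CosyVoice Tone Control (Day 29)

### Overview

This round is a full review and optimization of the video generation pipeline: fixing subtitles that were out of sync with speech (Whisper timestamp smoothing + original-text rhythm mapping), tuning the LatentSync lip-sync parameters, using stream copy in compose to avoid a redundant re-encode, FFmpeg timeout protection, a global concurrency limit, Redis task TTLs, temp file cleanup, and dead code removal. It also fixes the style preview background, whose CORS setup broke after the frontend domain migration, and adds CosyVoice tone control: in voice-clone mode, emotional delivery such as happy, sad, or angry is now supported (based on `inference_instruct2`).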
|
||||
|
||||
---
|
||||
|
||||
## ✅ 改动内容
|
||||
|
||||
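### 1. Subtitle sync fix (Whisper timestamps + original-text rhythm mapping)

- **Problem**: Subtitle highlighting was out of sync with the speech: subtitles ran ahead or behind, and the highlight skipped
- **Root cause**: Whisper's per-character timestamps contain small jitter (one character's end later than the next character's start), and gaps between characters make the highlight "flicker"

#### whisper_service.py: timestamp post-processing

Added a `smooth_word_timestamps()` function that smooths in three steps:

1. **Monotonic ordering**: a character's start is never earlier than the previous character's start
2. **Overlap removal**: when two characters overlap in time, split at the midpoint
3. **Gap filling**: when the gap between characters is < 50ms, join them directly so the highlight does not skip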
|
||||
|
||||
```python
def smooth_word_timestamps(words):
    # Excerpt: shows the overlap and gap steps (the monotonic-start step is not shown)
    for i in range(1, len(words)):
        prev, w = words[i - 1], words[i]
        # Overlap -> split at the midpoint
        if w["start"] < prev["end"]:
            mid = (prev["end"] + w["start"]) / 2
            prev["end"] = mid
            w["start"] = mid
        # Tiny gap (< 50 ms) -> join directly
        gap = w["start"] - prev["end"]
        if 0 < gap < 0.05:
            prev["end"] = w["start"]
    return words
```
|
||||
|
||||
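#### whisper_service.py: original-text rhythm mapping

- **Problem**: AI-rewritten or multilingual scripts do not match Whisper's transcription, so using Whisper's text directly yields garbled subtitles
- **Approach**: When the `original_text` parameter is non-empty, replace Whisper's text with the original-text characters but keep Whisper's speech-rhythm timestamps
- Implementation: map the N original-text characters proportionally onto the M Whisper timestamps (linear interpolation)
- Character-count ratio check (warns when the ratio is >1.5x or <0.67x)
- Per-character duration clamp: 40ms to 800ms, to prevent extreme drift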
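
The two guards in the list above (the ratio check and the duration clamp) might look roughly like this. The helper names are hypothetical, and taking the ratio as original characters divided by Whisper characters is an assumption; the source only gives the thresholds.

```python
MIN_CHAR_SEC = 0.04   # 40 ms
MAX_CHAR_SEC = 0.8    # 800 ms


def ratio_is_suspicious(n_original, n_whisper):
    """True when the character counts differ by more than the stated bounds."""
    if n_whisper == 0:
        return True
    ratio = n_original / n_whisper   # direction of the ratio is an assumption
    return ratio > 1.5 or ratio < 0.67


def clamp_duration(start, end):
    """Return an end time so the character lasts between 40 ms and 800 ms."""
    duration = min(max(end - start, MIN_CHAR_SEC), MAX_CHAR_SEC)
    return start + duration
```

The clamp bounds how long any single character can stay highlighted, which limits the visible damage when the proportional mapping lands a character on a long pause.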
|
||||
|
||||
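#### captions.ts: subtitle lookup on the Remotion side

Added `getCurrentSegment()` and `getCurrentWordIndex()` functions:

- Look up exactly which subtitle segment to show and which character to highlight for the current frame time
- Handle gaps between characters (between two characters, return the previous character's index so the highlight stays continuous)
- After the last character's end time, return the last character (avoids flicker at the end)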
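
The lookup behaviour described above is shown here in Python for consistency with the other sketches; the real `getCurrentWordIndex()` is TypeScript in `captions.ts`. The function name `current_word_index` and the choice to return -1 before the first character starts are assumptions.

```python
def current_word_index(words, t):
    """Index of the character to highlight at time t, or -1 before the first one."""
    index = -1
    for i, w in enumerate(words):
        if w["start"] <= t:
            index = i      # latest character that has already started
        else:
            break
    return index
```

Picking the latest character whose start has passed covers both rules at once: inside a gap the previous character stays highlighted, and after the final end time the last character is returned.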
|
||||
|
||||
---
|
||||
|
||||
### 2. LatentSync 嘴型参数调优
|
||||
|
||||
| 参数 | Day28 值 | Day29 值 | 说明 |
|
||||
|------|----------|----------|------|
|
||||
| `LATENTSYNC_INFERENCE_STEPS` | 16 | 20 | 适当增加步数提升嘴型质量 |
|
||||
| `LATENTSYNC_GUIDANCE_SCALE` | (默认) | 2.0 | 平衡嘴型贴合度与自然感 |
|
||||
| `LATENTSYNC_ENABLE_DEEPCACHE` | (默认) | true | DeepCache 加速推理 |
|
||||
| `LATENTSYNC_SEED` | (默认) | 1247 | 固定种子保证可复现 |
|
||||
| Remotion concurrency | 16 | 4 | 降低并发防止资源争抢 |
|
||||
|
||||
---
|
||||
|
||||
### 3. compose() 流复制替代冗余重编码(高优先级)
|
||||
|
||||
**文件**: `video_service.py`
|
||||
|
||||
- **问题**: `compose()` 只是合并视频轨+音频轨(mux),却每次用 `libx264 -preset medium -crf 20` 做完整重编码,耗时数分钟。整条流水线一个视频最多被 x264 编码 5 次
|
||||
- **方案**: 不需要循环时(`loop_count == 1`)用 `-c:v copy` 流复制,几乎瞬间完成;需要循环时仍用 libx264
|
||||
|
||||
```python
|
||||
if loop_count > 1:
|
||||
cmd.extend(["-c:v", "libx264", "-preset", "fast", "-crf", "23"])
|
||||
else:
|
||||
cmd.extend(["-c:v", "copy"])
|
||||
```
|
||||
|
||||
- compose 是中间产物(Remotion 会再次编码),流复制省一次编码且无质量损失
|
||||
|
||||
---
|
||||
|
||||
### 4. FFmpeg timeout protection (high priority)

**File**: `video_service.py`

- `_run_ffmpeg()`: added `timeout=600` (10 minutes) and catches `subprocess.TimeoutExpired`
- `_get_duration()`: added `timeout=30`
- Prevents a malformed video from hanging FFmpeg forever and blocking background tasks
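
A minimal sketch of the timeout pattern above follows. It is an illustration rather than the actual `_run_ffmpeg()`; `run_with_timeout` is a hypothetical name, and the child processes are Python stand-ins so the example runs without FFmpeg installed.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd; return True on success, False on a non-zero exit or a timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so nothing is left hanging
        return False
    return result.returncode == 0


# Hypothetical stand-ins for ffmpeg: one quick child process and one that stalls
quick = run_with_timeout([sys.executable, "-c", "pass"], timeout=30)
stalled = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], timeout=1)
```

Returning a plain boolean keeps the caller's failure path the same for a bad exit code and a timeout; a real service would likely also log which of the two occurred.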
|
||||
|
||||
---
|
||||
|
||||
### 5. 全局任务并发限制(高优先级)
|
||||
|
||||
**文件**: `workflow.py`
|
||||
|
||||
- 模块级 `asyncio.Semaphore(2)`,`process_video_generation()` 入口 acquire
|
||||
- 排队中的任务显示"排队中..."状态
|
||||
- 防止多个请求同时跑 FFmpeg + Remotion 导致 CPU/内存爆炸
|
||||
|
||||
```python
|
||||
_generation_semaphore = asyncio.Semaphore(2)
|
||||
|
||||
async def process_video_generation(task_id, req, user_id):
|
||||
_update_task(task_id, message="排队中...")
|
||||
async with _generation_semaphore:
|
||||
await _process_video_generation_inner(task_id, req, user_id)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 6. Redis task TTL + index cleanup (medium priority)

**File**: `task_store.py`

- `create()`: sets a 24-hour TTL (`ex=86400`)
- `update()`: sets a 2-hour TTL (`ex=7200`) for completed/failed tasks, 24 hours otherwise
- `list()`: removes expired index entries (`srem`) while iterating
- Fixes task keys piling up in Redis indefinitely
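
The TTL policy and index pruning above can be expressed without a Redis connection. The sketch below only models the decisions; `ttl_for_status` and `prune_index` are hypothetical helpers, and the `"processing"` status used in the usage note is an illustrative value.

```python
TTL_ACTIVE = 86400    # 24 hours
TTL_FINISHED = 7200   # 2 hours


def ttl_for_status(status):
    """Expiry in seconds to pass as `ex` when writing a task record."""
    return TTL_FINISHED if status in ("completed", "failed") else TTL_ACTIVE


def prune_index(index_ids, live_ids):
    """Split index entries into those still backed by a key and those to drop."""
    keep = [i for i in index_ids if i in live_ids]
    stale = [i for i in index_ids if i not in live_ids]
    return keep, stale
```

For example, `ttl_for_status("processing")` gives the 24-hour value, and the `stale` list from `prune_index` holds the ids that would be passed to `srem`.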
|
||||
|
||||
---
|
||||
|
||||
### 7. 临时字体文件清理(中优先级)
|
||||
|
||||
**文件**: `workflow.py`
|
||||
|
||||
- `prepare_style_for_remotion()` 复制字体到 temp_dir,但未加入清理列表
|
||||
- 现在遍历三组前缀(subtitle/title/secondary_title)× 四种扩展名(.ttf/.otf/.woff/.woff2),将存在的字体文件加入 `temp_files`
|
||||
|
||||
---
|
||||
|
||||
### 8. Whisper+split 逻辑去重(低优先级)
|
||||
|
||||
**文件**: `workflow.py`
|
||||
|
||||
- 两个分支(custom_assignments 不匹配 vs 默认)的 Whisper→_split_equal 代码 100% 相同(36 行重复)
|
||||
- 提取为内部函数 `_whisper_and_split()`,两个分支共用
|
||||
|
||||
---
|
||||
|
||||
### 9. LipSync 死代码清理(低优先级)
|
||||
|
||||
**文件**: `lipsync_service.py`
|
||||
|
||||
- 删除 `_preprocess_video()` 方法(92 行),全项目无任何调用
|
||||
|
||||
---
|
||||
|
||||
### 10. 标题字幕预览背景 CORS 修复
|
||||
|
||||
- **问题**: 前端域名从 `vigent.hbyrkj.top` 迁移到 `ipagent.ai-labz.cn` 后,素材签名 URL(`api.hbyrkj.top`)与新前端域名完全不同根域,Supabase Kong 网关的 CORS 不覆盖新域名 → `<video crossOrigin="anonymous">` 加载失败 → canvas 截帧失败 → 回退渐变背景
|
||||
- **根因**: Day28 实现依赖 Supabase 返回 `Access-Control-Allow-Origin` 头,换域名后此依赖断裂
|
||||
|
||||
**修复方案 — 同源代理(彻底绕开 CORS)**:
|
||||
|
||||
| 组件 | 改动 |
|
||||
|------|------|
|
||||
| `materials/router.py` | 新增 `GET /api/materials/stream/{material_id}` 端点,通过 `get_local_file_path()` 从本地磁盘直读,返回 `FileResponse` |
|
||||
| `useHomeController.ts` | 帧截取 URL 改为 `/api/materials/stream/${mat.id}`(同源),不再用跨域签名 URL |
|
||||
| `useVideoFrameCapture.ts` | 移除 `crossOrigin = "anonymous"`,同源请求不需要 |
|
||||
|
||||
链路:`用户点预览 → /api/materials/stream/xxx → Next.js rewrite → FastAPI FileResponse → 同源 <video> → canvas 截帧成功`
|
||||
|
||||
---
|
||||
|
||||
### 11. 支付宝回调域名更新
|
||||
|
||||
**文件**: `.env`
|
||||
|
||||
```
|
||||
ALIPAY_NOTIFY_URL=https://ipagent.ai-labz.cn/api/payment/notify
|
||||
ALIPAY_RETURN_URL=https://ipagent.ai-labz.cn/pay
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📁 Modified files

| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | Timestamp smoothing + original-text rhythm mapping + per-character duration clamping |
| `remotion/src/utils/captions.ts` | Added `getCurrentSegment` / `getCurrentWordIndex` |
| `backend/app/services/video_service.py` | compose stream copy + FFmpeg timeout protection |
| `backend/app/modules/videos/workflow.py` | Semaphore(2) concurrency limit + font cleanup + Whisper logic deduplication |
| `backend/app/modules/videos/task_store.py` | Redis TTL + expired-index cleanup |
| `backend/app/services/lipsync_service.py` | Removed the dead `_preprocess_video()` code |
| `backend/app/services/remotion_service.py` | concurrency 16 → 4 |
| `remotion/render.ts` | Added support for a concurrency parameter |
| `backend/app/modules/materials/router.py` | Added the `/stream/{material_id}` same-origin proxy endpoint |
| `frontend/.../useVideoFrameCapture.ts` | Removed crossOrigin |
| `frontend/.../useHomeController.ts` | Frame-capture URL switched to the same-origin proxy |
| `backend/.env` | Mouth-shape parameters + Alipay domain update |

---

### 12. CosyVoice tone control

- **Feature**: Voice-clone mode gains a "Tone" dropdown (正常 Normal / 欢快 Cheerful / 低沉 Low / 严肃 Serious) that uses CosyVoice3's `inference_instruct2()` method to control tone and emotion through natural-language instructions
- **Default behavior unchanged**: Selecting "Normal" still goes through `inference_zero_shot()`, exactly as before this change

#### Data flow

```
User selects a tone → setEmotion("happy") → persisted to localStorage
  → generate voice-over → emotion mapped to instruct_text
  → POST /api/generated-audios/generate { instruct_text }
  → voice_clone_service → POST localhost:8010/generate { instruct_text }
  → instruct_text non-empty ? inference_instruct2() : inference_zero_shot()
```

#### CosyVoice service: `cosyvoice_server.py`

- The `/generate` endpoint gains an `instruct_text: str = Form("")` parameter
- Inference branch: empty → `inference_zero_shot()`; non-empty → `inference_instruct2(text, instruct_text, ref_audio_path, ...)`
- `inference_instruct2` does not need `prompt_text`; it takes `instruct_text` + `prompt_wav` directly

#### Backend pass-through

- `schemas.py`: `GenerateAudioRequest` gains `instruct_text: Optional[str] = None`
- `service.py`: the voiceclone branch of `generate_audio_task()` passes `instruct_text=req.instruct_text or ""`
- `voice_clone_service.py`: `_generate_once()` and `generate_audio()` gain an `instruct_text` parameter

#### Frontend

- `useHomeController.ts`: new `emotion` state + `emotionToInstruct` mapping table
- `useHomePersistence.ts`: tone selection persisted to localStorage
- `useGeneratedAudios.ts`: `generateAudio` params gain `instruct_text`
- `GeneratedAudiosPanel.tsx`: tone dropdown (left of the speed button), reusing the speed dropdown styling, visible only in voiceclone mode
- `HomePage.tsx`: passes through `emotion`/`onEmotionChange`

#### instruct_text format (from the CosyVoice3 instruct_list)

```
Normal (正常): "" (uses inference_zero_shot)
Cheerful (欢快): "You are a helpful assistant. 请非常开心地说一句话。<|endofprompt|>"
Low (低沉): "You are a helpful assistant. 请非常伤心地说一句话。<|endofprompt|>"
Serious (严肃): "You are a helpful assistant. 请非常生气地说一句话。<|endofprompt|>"
```
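
The mapping and the empty/non-empty branch described above can be sketched in a few lines of Python. This is only an illustration: the real mapping lives in the TypeScript `emotionToInstruct` table, the function names `resolve_instruct_text` and `select_inference_method` are hypothetical, and every key other than `"happy"` is an assumed name.

```python
from typing import Optional

# Tone key -> instruct_text. The strings are the ones listed in this log;
# every key except "happy" is a hypothetical name chosen for this sketch.
EMOTION_TO_INSTRUCT = {
    "normal": "",
    "happy": "You are a helpful assistant. 请非常开心地说一句话。<|endofprompt|>",
    "sad": "You are a helpful assistant. 请非常伤心地说一句话。<|endofprompt|>",
    "angry": "You are a helpful assistant. 请非常生气地说一句话。<|endofprompt|>",
}


def resolve_instruct_text(emotion: Optional[str]) -> str:
    """Unknown or missing tones fall back to "" (plain zero-shot cloning)."""
    return EMOTION_TO_INSTRUCT.get(emotion or "normal", "")


def select_inference_method(instruct_text: Optional[str]) -> str:
    """Return the name of the CosyVoice method the server branch would pick."""
    return "inference_instruct2" if instruct_text else "inference_zero_shot"
```

Keeping the "Normal" entry as an empty string means the default path never touches the instruct branch, which is what keeps the pre-change behavior intact.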

---

## 📁 Modified files

| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | Timestamp smoothing + original-text rhythm mapping + per-character duration clamping |
| `remotion/src/utils/captions.ts` | Added `getCurrentSegment` / `getCurrentWordIndex` |
| `backend/app/services/video_service.py` | compose stream copy + FFmpeg timeout protection |
| `backend/app/modules/videos/workflow.py` | Semaphore(2) concurrency limit + font cleanup + Whisper logic deduplication |
| `backend/app/modules/videos/task_store.py` | Redis TTL + expired-index cleanup |
| `backend/app/services/lipsync_service.py` | Removed the dead `_preprocess_video()` code |
| `backend/app/services/remotion_service.py` | concurrency 16 → 4 |
| `remotion/render.ts` | Added support for a concurrency parameter |
| `backend/app/modules/materials/router.py` | Added the `/stream/{material_id}` same-origin proxy endpoint |
| `frontend/.../useVideoFrameCapture.ts` | Removed crossOrigin |
| `frontend/.../useHomeController.ts` | Frame-capture URL switched to the same-origin proxy + emotion state + emotionToInstruct mapping |
| `backend/.env` | Mouth-shape parameters + Alipay domain update |
| `models/CosyVoice/cosyvoice_server.py` | `/generate` gains `instruct_text`; branches between `inference_instruct2` / `inference_zero_shot` |
| `backend/app/services/voice_clone_service.py` | `_generate_once` / `generate_audio` pass `instruct_text` through |
| `backend/app/modules/generated_audios/schemas.py` | `GenerateAudioRequest` gains the `instruct_text` field |
| `backend/app/modules/generated_audios/service.py` | voiceclone branch passes `instruct_text` |
| `frontend/.../useGeneratedAudios.ts` | `generateAudio` params gain `instruct_text` |
| `frontend/.../useHomePersistence.ts` | emotion persistence (localStorage) |
| `frontend/.../GeneratedAudiosPanel.tsx` | Tone dropdown UI (embedded + standalone) |
| `frontend/.../HomePage.tsx` | Passes through emotion / onEmotionChange |

---

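## 🔍 Verification

1. **Subtitle sync**: generate a video and watch the per-character highlight; it should not run ahead, lag, or skip
2. **compose stream copy**: the compose step in the FFmpeg log should show `-c:v copy`, and its duration should drop from minutes to seconds
3. **FFmpeg timeout**: confirm in code that the timeout parameter has been added
4. **Concurrency limit**: submit 3 tasks in a row; the third should show "排队中" (queued) and start only after the first 2 finish
5. **Redis TTL**: `redis-cli TTL vigent:tasks:<id>` should report an expiry time
6. **Font cleanup**: no font files should remain in the temp directory after a video is generated
7. **Preview background**: select a material → click "预览样式" (Preview style); the first frame of the video should be shown (not the gradient)
8. **Alipay**: after a payment is initiated, the callback and redirect addresses use the new domain
9. **Tone control**: in voice-clone mode, select "欢快" (Cheerful) or "严肃" (Serious) and generate a voice-over; the CosyVoice log shows `🎭 Instruct mode` and the audio tone changes noticeably
10. **Tone default**: selecting "正常" (Normal) behaves exactly as before the change (uses `inference_zero_shot`)
11. **Tone persistence**: after switching tone and refreshing the page, the dropdown restores the last selection
12. **Tone visibility**: the tone dropdown appears only in voiceclone mode, not in edgetts mode
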
Docs/DevLogs/Day30.md (new file, 405 lines)
@@ -0,0 +1,405 @@

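## Remotion cache fix + encoding pipeline quality optimization + lip-sync fault tolerance + unified dropdown interaction (Day 30)

### Overview

This round consolidates into five areas: (1) a severe bug in which the Remotion bundle cache caused titles and subtitles to go missing; (2) a full optimization of the LatentSync + MuseTalk dual-engine encoding pipeline, eliminating redundant lossy encodes; (3) more robust LatentSync, so inference continues instead of aborting the task when some frames of the material have no detectable face; (4) end-to-end pass-through of the lip-sync model selection (default / fast / advanced); (5) home and publish page selectors unified on the SelectPopover interaction, with fixes for occlusion, positioning, and preview layering.

---

## ✅ Changes

### 1. Remotion bundle cache 404 fix (severe bug)

- **Problem**: generated videos had no title or subtitles; after the Remotion render failed, the pipeline silently fell back to FFmpeg (which cannot overlay text)
- **Root cause**: Remotion's bundle cache copies `publicDir` (the directory holding videos and fonts) only on the first bundle. Once the code stabilizes the cache keeps hitting, so newly generated video and font files are missing from the old cache's `public/` directory → Remotion's HTTP server returns 404 → the render fails
- **First attempt**: `fs.symlinkSync` symlinks, but Remotion's internal HTTP server does not follow symlinks
- **Final fix**: `fs.linkSync` hard links (zero-copy on the same filesystem, fully transparent to the application), automatically falling back to `fs.copyFileSync` across filesystems

**File**: `remotion/render.ts`

```typescript
function ensureInCachedPublic(cachedPublicDir, srcAbsPath, fileName) {
  const cachedPath = path.join(cachedPublicDir, fileName);
  // (elided) check whether the file already exists and is the same inode
  // Prefer a hard link (zero-copy); fall back to a copy across filesystems
  try {
    fs.linkSync(srcAbsPath, cachedPath);
  } catch {
    fs.copyFileSync(srcAbsPath, cachedPath);
  }
}
```

When the cached bundle is used, the files needed for the current render (video + fonts) are automatically hard-linked into the cached `public/` directory:

- Video file (`videoFileName`)
- Font files (extracted from the `font_file` field of `subtitleStyle` / `titleStyle` / `secondaryTitleStyle`)

---
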
### 2. Video encoding pipeline quality optimization

A full review of the pipeline found that, from material upload to final output, a video goes through up to **5-6 lossy re-encodes**, whereas the official LatentSync demo has only 1-2.

#### Encoding chain before optimization

| # | Stage | CRF | Issue |
|---|------|-----|------|
| 1 | Orientation normalization | 23 | Conditional |
| 2 | `prepare_segment` scale + duration | 23 | Always runs; quality on the low side |
| 3 | LatentSync `read_video` FPS conversion | 18 | **Re-encodes even when the input is already 25fps** |
| 4 | LatentSync `imageio` frame write | 13 | Model output |
| 5 | LatentSync final mux | 18 | **Re-encoded at CRF 18 right after being written at CRF 13** |
| 6 | compose | copy | Already optimized on Day29 |
| 7 | Multi-material concat | 23 | **Segment parameters are already unified; no re-encode needed** |
| 8 | Remotion render | ~18 | Always runs (text overlay) |

#### Optimizations

##### 2a. LatentSync `read_video` skips the redundant FPS re-encode

**File**: `models/LatentSync/latentsync/utils/util.py`

- The original code ran `ffmpeg -r 25 -crf 18` unconditionally, even when the input was already 25fps
- Added an FPS check: when `abs(current_fps - 25.0) < 0.5`, the original file is used directly
- Our `prepare_segment` already outputs a uniform 25fps, so this step was entirely redundant

```python
cap = cv2.VideoCapture(video_path)
current_fps = cap.get(cv2.CAP_PROP_FPS)
cap.release()

if abs(current_fps - 25.0) < 0.5:
    print(f"Video already at {current_fps:.1f}fps, skipping FPS conversion")
    target_video_path = video_path
else:
    # Re-encode only when the input is not 25fps
    command = f"ffmpeg ... -r 25 -crf 18 ..."
```

##### 2b. LatentSync final mux: stream copy instead of re-encode

**File**: `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py`

- Original code: after `imageio` wrote the frames at a high-quality CRF 13, the final mux fully re-encoded them with `libx264 -crf 18`
- Fix: switched to `-c:v copy` stream copy, muxing only the audio track, with zero video loss

```diff
- ffmpeg ... -c:v libx264 -crf 18 -c:a aac -q:v 0 -q:a 0
+ ffmpeg ... -c:v copy -c:a aac -q:a 0
```

##### 2c. `prepare_segment` + `normalize_orientation` CRF 23 → 18

**File**: `backend/app/services/video_service.py`

- `normalize_orientation`: CRF 23 → 18
- `prepare_segment` trim temp file: CRF 23 → 18
- `prepare_segment` main command: CRF 23 → 18
- CRF 18 is the "high quality" tier and matches LatentSync's internal standard

##### 2d. Multi-material concat stream copy

**File**: `backend/app/services/video_service.py`

- The original code re-encoded the concatenation with `libx264 -crf 23`
- All segments have already been unified by `prepare_segment` to the same resolution, frame rate, and encoding parameters
- Switched to `-c:v copy` stream copy, removing one full re-encode

```diff
- -vsync cfr -r 25 -c:v libx264 -preset fast -crf 23 -pix_fmt yuv420p
+ -c:v copy
```

#### Encoding chain after optimization

| # | Stage | CRF | Status |
|---|------|-----|------|
| 1 | Orientation normalization | **18** | Higher quality (conditional) |
| 2 | `prepare_segment` | **18** | Higher quality (always runs) |
| 3 | ~~LatentSync FPS conversion~~ | - | **Eliminated** |
| 4 | LatentSync model output | 13 | Unchanged (unavoidable) |
| 5 | ~~LatentSync final mux~~ | - | **Eliminated (copy)** |
| 6 | compose | copy | Unchanged |
| 7 | ~~Multi-material concat~~ | - | **Eliminated (copy)** |
| 8 | Remotion render | ~18 | Unchanged (unavoidable) |

**Total: 5-6 lossy encodes → 3** (`prepare_segment` → LatentSync model output → Remotion), cutting the number of lossy generations roughly in half.

---

## 📁 Modified files

| File | Change |
|------|------|
| `remotion/render.ts` | When the bundle cache is used, hard-link video + fonts into the public directory |
| `models/LatentSync/latentsync/utils/util.py` | `read_video` checks FPS and skips the re-encode at 25fps |
| `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py` | final mux `-c:v copy`; no-face frame tolerance (affine_transform + restore_video) |
| `backend/app/services/video_service.py` | `normalize_orientation` CRF 23→18; `prepare_segment` CRF 23→18; `concat_videos` `-c:v copy` |
| `backend/app/modules/videos/workflow.py` | Single-material path falls back to the original video when LatentSync raises |

---

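### 3. LatentSync no-face frame tolerance

- **Problem**: if some frames of the material have no detectable face (head turned, occlusion, empty shot), `affine_transform` raises and the whole inference task fails
- **Changes**:
  - `affine_transform_video`: a per-frame exception is caught and the frame is filled with the face/box/affine_matrix of the nearest valid frame (keeping the tensor batch dimension intact); it still raises when every frame is faceless
  - `restore_video`: new `valid_face_flags` parameter; faceless frames keep the original picture (no mouth replacement)
  - `loop_video`: `valid_face_flags` follows the looping and flipping
  - `workflow.py`: on the single-material path, if `lipsync.generate()` fails as a whole, the original video is copied and the flow continues, so the task does not fail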
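
The "fill with the nearest valid frame and remember which frames were real" idea can be sketched independently of LatentSync. The helper below is a hypothetical stand-in, not the actual `affine_transform_video` code: detections are modeled as a list in which `None` marks a faceless frame.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fill_missing_faces(detections: Sequence[Optional[T]]) -> Tuple[List[T], List[bool]]:
    """Replace None entries with the nearest valid detection.

    Returns the filled list plus per-frame flags (True = a real detection),
    so a later restore step can leave the flagged-False frames untouched.
    """
    valid_idx = [i for i, det in enumerate(detections) if det is not None]
    if not valid_idx:
        # Mirrors the documented behavior: still fail when no frame has a face.
        raise RuntimeError("no face detected in any frame")

    filled: List[T] = []
    flags: List[bool] = []
    for i, det in enumerate(detections):
        if det is not None:
            filled.append(det)
            flags.append(True)
        else:
            # Ties go to the earlier frame because valid_idx is ascending.
            nearest = min(valid_idx, key=lambda j: abs(j - i))
            filled.append(detections[nearest])
            flags.append(False)
    return filled, flags
```

Filling rather than dropping frames keeps the batch length equal to the frame count, which is the property the log calls out for the tensor batch dimension.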

---

### 4. MuseTalk encoding chain optimization

#### 4a. Direct encoding through an FFmpeg rawvideo pipe (no lossy intermediate file)

**File**: `models/MuseTalk/scripts/server.py`

- **Old flow**: UNet inference frames → `cv2.VideoWriter(mp4v)` writes an intermediate file (lossy) → FFmpeg re-encode + audio mux (lossy again)
- **New flow**: UNet inference frames → FFmpeg rawvideo stdin pipe → a single libx264 encode + audio mux

```python
ffmpeg_cmd = [
    "ffmpeg", "-y", "-v", "warning",
    "-f", "rawvideo", "-pix_fmt", "bgr24",
    "-s", f"{w}x{h}", "-r", str(fps),
    "-i", "-",  # raw frames arrive on stdin
    "-i", audio_path,
    "-c:v", "libx264", "-preset", ENCODE_PRESET, "-crf", str(ENCODE_CRF),
    "-pix_fmt", "yuv420p",
    "-c:a", "copy", "-shortest",
    output_vid_path,
]
ffmpeg_proc = subprocess.Popen(ffmpeg_cmd, stdin=subprocess.PIPE, ...)
# Each frame is written directly: pipe_in.write(frame.tobytes())
```

Key implementation details:

- `-pix_fmt bgr24` matches OpenCV's native frame format, so no conversion is needed
- `np.ascontiguousarray` ensures frame memory is contiguous
- Catching `BrokenPipeError` plus a return-code check covers the failure paths
- `pipe_in.close()` runs before `ffmpeg_proc.wait()` so EOF is sent correctly
- Compositing fallbacks (resize failure, mask failure, blending failure) all output the original frame through `_write_pipe_frame`

#### 4b. MuseTalk parameters moved to environment variables + quality-first profile

**File**: `models/MuseTalk/scripts/server.py` + `backend/.env`

All inference and encoding parameters moved from hard-coded values to `.env` configuration; the "quality-first" profile is currently in use:

| Parameter | Previous default | Quality-first value | Effect |
|------|----------|-----------|------|
| `MUSETALK_DETECT_EVERY` | 5 | **2** | Face detection 2.5x more frequent; steadier tracking |
| `MUSETALK_BLEND_CACHE_EVERY` | 5 | **2** | Mask refreshed more often; cleaner blending at face edges |
| `MUSETALK_EXTRA_MARGIN` | 15 | **14** | Fine-tunes the chin region |
| `MUSETALK_BLEND_MODE` | auto | **jaw** | Explicit jaw mode for v1.5 |
| `MUSETALK_ENCODE_CRF` | 18 | **14** | Near visually lossless (the output is encoded again by Remotion) |
| `MUSETALK_ENCODE_PRESET` | medium | **slow** | Better compression efficiency at the same CRF |
| `MUSETALK_AUDIO_PADDING` | 2/2 | 2/2 | Unchanged |
| `MUSETALK_FACEPARSING_CHEEK` | 90/90 | 90/90 | Unchanged |

Full list of newly configurable parameters: `DETECT_EVERY`, `BLEND_CACHE_EVERY`, `AUDIO_PADDING_LEFT/RIGHT`, `EXTRA_MARGIN`, `DELAY_FRAME`, `BLEND_MODE`, `FACEPARSING_LEFT/RIGHT_CHEEK_WIDTH`, `ENCODE_CRF`, `ENCODE_PRESET`.
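
As a rough illustration of reading such parameters with safe defaults, the sketch below loads a few of the variables from the table. The helper names (`env_int`, `load_musetalk_settings`) are hypothetical and the defaults are the "previous default" column above; the real `server.py` may parse its configuration differently.

```python
import os
from typing import Dict, Mapping, Union


def env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting; empty or malformed values fall back to the default."""
    raw = env.get(name, "").strip()
    try:
        return int(raw) if raw else default
    except ValueError:
        return default


def load_musetalk_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[int, str]]:
    """Collect a subset of the MuseTalk settings from the environment."""
    return {
        "detect_every": env_int(env, "MUSETALK_DETECT_EVERY", 5),
        "blend_cache_every": env_int(env, "MUSETALK_BLEND_CACHE_EVERY", 5),
        "extra_margin": env_int(env, "MUSETALK_EXTRA_MARGIN", 15),
        "blend_mode": env.get("MUSETALK_BLEND_MODE", "").strip() or "auto",
        "encode_crf": env_int(env, "MUSETALK_ENCODE_CRF", 18),
        "encode_preset": env.get("MUSETALK_ENCODE_PRESET", "").strip() or "medium",
    }
```

Passing the environment in as a mapping keeps the loader easy to exercise without mutating `os.environ`.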

---

### 5. Workflow async non-blocking + compose skip optimization

#### 5a. Blocking calls moved to the thread pool

**File**: `backend/app/modules/videos/workflow.py`

Several synchronous FFmpeg calls in the workflow blocked the asyncio event loop, so other API requests (health checks, task status queries) could not be answered. A general helper, `_run_blocking()`, now routes all blocking calls through the thread pool:

```python
async def _run_blocking(func, *args):
    """Run a blocking function in the thread pool so the event loop is not stalled."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)
```

Blocking call sites converted so far:

| Call | Location | Notes |
|------|------|------|
| `video.normalize_orientation()` | Single-material rotation normalization | FFmpeg rotate/transcode |
| `video.prepare_segment()` | Multi-material segment preparation | FFmpeg scale + duration trim, run in parallel across segments with `asyncio.gather` |
| `video.concat_videos()` | Multi-material concatenation | FFmpeg concat |
| `video.prepare_segment()` | Single-material prepare | FFmpeg scale + duration trim |
| `video.mix_audio()` | BGM mixing | FFmpeg audio mix |
| `video._get_duration()` | Audio/video duration probing (3 places) | ffprobe subprocess |
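
How the thread-pool helper combines with `asyncio.gather` for parallel segment preparation can be shown with a stand-in workload. In this sketch `fake_prepare_segment` is a hypothetical placeholder that sleeps instead of invoking FFmpeg, and `run_blocking` mirrors the helper shown above.

```python
import asyncio
import time
from typing import List, Sequence


async def run_blocking(func, *args):
    """Run a blocking function in the default thread pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)


def fake_prepare_segment(index: int, seconds: float) -> str:
    """Stand-in for a synchronous FFmpeg call (hypothetical)."""
    time.sleep(seconds)
    return f"segment_{index}.mp4"


async def prepare_all(durations: Sequence[float]) -> List[str]:
    """Prepare every segment concurrently; gather keeps the input order."""
    tasks = [run_blocking(fake_prepare_segment, i, d) for i, d in enumerate(durations)]
    return list(await asyncio.gather(*tasks))
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the segments stay aligned with their inputs even when a later one finishes first.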

#### 5b. `prepare_segment` skips scale when the resolution already matches

**File**: `backend/app/modules/videos/workflow.py`

Previously `target_resolution` was always passed to `prepare_segment`, whether or not the material already matched the target, which triggered the scale filter + a libx264 re-encode. Resolution is now compared per material:

- **Multiple materials**: decided per segment; segments whose resolution matches pass `None` (`prepare_target_res = None if res == base_res else base_res`) and take the `-c:v copy` branch
- **Single material**: compare via `get_resolution` first and pass `None` on a match

When the resolution matches and there is no trimming, no looping, and no frame-rate change, `prepare_segment` uses `-c:v copy` internally, so the video is not re-encoded at all.

#### 5c. `_get_duration()` moved to the thread pool

**File**: `backend/app/modules/videos/workflow.py`

The 3 synchronous ffprobe calls to `video._get_duration()` became `await _run_blocking(video._get_duration, ...)`, so they no longer block the event loop.

#### 5d. Unified CRF for the compose loop case

**File**: `backend/app/services/video_service.py`

When `compose()` has to loop the video, its encode changed from CRF 23 to CRF 18 (higher quality), matching the pipeline-wide quality standard.

#### 5e. Multi-material segment validation

**File**: `backend/app/modules/videos/workflow.py`

After multi-material `prepare_segment` completes, a segment-count consistency check prevents empty segments from reaching concat and causing errors.

#### 5f. Non-blocking `compose()` internals

**File**: `backend/app/services/video_service.py`

`compose()` became `async def`; its internal `_get_duration()` and `_run_ffmpeg()` calls both run in the thread pool via `loop.run_in_executor`.

#### 5g. Pass-through when no second compose is needed

**File**: `backend/app/modules/videos/workflow.py`

When there is no BGM (`final_audio_path == audio_path`), the LatentSync/MuseTalk output already contains the correct audio track, so the redundant compose step is skipped:

```python
needs_audio_compose = str(final_audio_path) != str(audio_path)
```

- **Remotion path**: if the audio is unchanged, skip the pre-compose and feed the lipsync output straight into Remotion
- **Non-Remotion path**: if the audio is unchanged, `shutil.copy` passes the lipsync output through directly instead of running compose

---

### 6. End-to-end lip-sync model selection

A model selector was added to the right of the frontend "生成视频" (Generate video) button; the dropdown value is passed through the whole chain to the backend route and the inference service.

#### Model options

| Option | Value | Routing logic |
|------|------|------|
| Default model | `default` | Keeps threshold routing (`LIPSYNC_DURATION_THRESHOLD`, currently recommended 100s) |
| Fast model | `fast` | Forces MuseTalk; falls back to LatentSync when unavailable |
| Advanced model | `advanced` | Forces LatentSync |

#### Final UI

- The model button was upgraded from a native `<select>` to the unified `SelectPopover`
- Trigger text now uses business semantics (`默认模型 / 快速模型 / 高级模型` + `按时长智能路由 / 速度优先 / 质量优先`, i.e. default / fast / advanced model + smart routing by duration / speed first / quality first)
- The selection is persisted via `useHomePersistence` (`lipsyncModelMode`)

#### Data flow

```
Frontend SelectPopover → setLipsyncModelMode("fast") → persisted to localStorage
  ↓
User clicks "生成视频" → handleGenerate()
  → payload.lipsync_model = lipsyncModelMode
  → POST /api/videos/generate { ..., lipsync_model: "fast" }
  → workflow: req.lipsync_model passed to lipsync.generate(model_mode=...)
  → lipsync_service.generate(): routes by model_mode
    → fast: force MuseTalk → fall back to LatentSync
    → advanced: force LatentSync
    → default: threshold policy
```
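
The three-way routing can be summarized as a small pure function. This is a sketch under stated assumptions, not the actual `lipsync_service.generate()` code: the function name and parameters are hypothetical, and the direction of the default threshold policy (audio at or above the threshold goes to MuseTalk, shorter audio to LatentSync) is an assumption, since this log only says the default mode keeps "threshold routing".

```python
def choose_lipsync_engine(
    model_mode: str,
    audio_duration: float,
    threshold: float = 100.0,
    musetalk_available: bool = True,
) -> str:
    """Pick "musetalk" or "latentsync" for a request (illustrative only)."""
    if model_mode == "advanced":
        return "latentsync"
    if model_mode == "fast":
        # Documented behavior: force MuseTalk, fall back when it is unavailable.
        return "musetalk" if musetalk_available else "latentsync"
    # "default" (and any unknown value): threshold routing.
    # ASSUMPTION: long audio goes to the faster engine, short audio to LatentSync.
    if audio_duration >= threshold and musetalk_available:
        return "musetalk"
    return "latentsync"
```

Treating unknown values like `default` keeps older clients that never send `lipsync_model` on the existing routing path.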
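
---

### 7. Unified dropdown interaction on the home and publish pages (SelectPopover)

#### 7a. Scope of the migration

Business selectors on the home and publish pages were all migrated to `SelectPopover`:

- Home page: voice, reference audio, voice-over list, material selection, BGM selection, work selection, title display mode, title / secondary title / subtitle styles, timeline aspect ratio, lip-sync model
- Publish page: work to publish (search + preview)

Exception: per product requirements, the "历史文案 / AI多语言" (script history / AI multilingual) menus in `ScriptEditor` were restored to their original lightweight menu and are not forced onto the unified component.

#### 7b. Key interaction fixes

- **Occlusion fix**: on desktop the panel now uses `Portal + fixed`, escaping the local stacking context so cards can no longer cover it
- **Adaptive open direction**: opens upward when there is not enough space below, so the menu is never cut off
- **Same width**: the panel width matches the trigger
- **Consistent styling**: more opaque panel background; the scrollbar is hidden but scrolling still works
- **Scroll to selection**: reopening a dropdown scrolls to the selected item (`data-popover-selected="true"`)
- **Preview coordination**:
  - Clicking "预览" (Preview) inside a dropdown does not force it closed, so items can be previewed one after another
  - The video preview modal is layered above the dropdown so it is not covered
  - While the preview modal is open, outside clicks and Esc do not close the dropdown by mistake; it remains usable after the preview closes

#### 7c. BGM panel consolidation

- BGM now uses the same selector as "publish work" (search + list + audition + selected state)
- The home page BGM volume slider was removed per product requirements
- Generation requests always use a fixed `bgm_volume=0.2`

---
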
## 📁 Full list of modified files

| File | Change |
|------|------|
| `remotion/render.ts` | When the bundle cache is used, hard-link video + fonts into the public directory |
| `models/LatentSync/latentsync/utils/util.py` | `read_video` checks FPS and skips the re-encode at 25fps |
| `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py` | final mux `-c:v copy`; no-face frame tolerance |
| `backend/app/services/video_service.py` | CRF 23→18; `concat_videos` copy; async `compose()` + loop CRF 18 |
| `backend/app/modules/videos/workflow.py` | Thread-pool offloading; skip scale at matching resolution; compose skip; segment validation; model selection pass-through |
| `backend/app/modules/videos/schemas.py` | Added the `lipsync_model` field |
| `backend/app/services/lipsync_service.py` | `generate()` gains three-way `model_mode` routing |
| `models/MuseTalk/scripts/server.py` | FFmpeg rawvideo pipe; parameters moved to environment variables |
| `backend/.env` | Configurable MuseTalk inference/blending/encoding parameters; routing threshold and quality profile tuning |
| `frontend/src/shared/ui/SelectPopover.tsx` | New unified selector: Portal + fixed, occlusion-proof, adaptive open direction, same width, hidden scrollbar, scroll to selection, preview coordination |
| `frontend/src/features/home/ui/HomePage.tsx` | Voice-over card layering fix; passes the unified dropdown state |
| `frontend/src/features/home/model/useHomeController.ts` | `lipsyncModelMode` pass-through; BGM fixed at `bgm_volume=0.2` |
| `frontend/src/features/home/model/useHomePersistence.ts` | Persistence for the model mode and other new fields |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | Model selection switched to SelectPopover (speed/quality wording) |
| `frontend/src/features/home/ui/VoiceSelector.tsx` | Voice selection unified on SelectPopover (voice name + language) |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | Reference audio selection unified on SelectPopover (with audition / rename / delete / re-recognize) |
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | Voice-over list, speed, and tone unified on SelectPopover |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Material selection switched to the publish-page style dropdown (search / multi-select / preview / rename / delete) |
| `frontend/src/features/home/ui/BgmPanel.tsx` | BGM selection switched to the publish-page style dropdown (search + audition); volume slider removed |
| `frontend/src/features/home/ui/HistoryList.tsx` | Home page work selection switched to a dropdown (search + delete + selected state) |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | Title display mode and style selection unified on SelectPopover |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | Aspect ratio selection unified on SelectPopover (single-row button) |
| `frontend/src/features/publish/ui/PublishPage.tsx` | Publish work selection switched to SelectPopover; dropdown stays open during preview |
| `frontend/src/components/VideoPreviewModal.tsx` | Raised z-order and added a preview marker to coordinate with dropdowns |
| `frontend/src/features/home/ui/ScriptEditor.tsx` | Script history / AI multilingual restored to the original lightweight menu (product exception) |
| `Docs/FRONTEND_DEV.md` | Added SelectPopover conventions, preview layering conventions, and revised persistence fields |

---

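## 🔍 Verification

1. **Titles and subtitles restored**: generated videos should have a title and per-character highlighted subtitles (Remotion render succeeds, no FFmpeg fallback)
2. **Remotion log**: should show `Hardlinked into cached bundle:` or `Copied into cached bundle:` rather than 404
3. **LatentSync FPS skip**: the log should show `Video already at 25.0fps, skipping FPS conversion`
4. **LatentSync mux**: the final mux in the FFmpeg log should use `-c:v copy`
5. **Quality comparison**: for the same material + audio, the mouth region (teeth in particular) should be sharper than before the optimization
6. **Multi-material concat**: the concat step should be a stream copy, dropping from seconds to milliseconds
7. **No-face frame tolerance**: material containing head-turn or occluded frames no longer fails the task; faceless frames keep the original picture
8. **MuseTalk pipe encoding**: no intermediate mp4v file should appear in the log; compositing writes straight into the pipe
9. **MuseTalk quality parameters**: `curl localhost:8011/health` confirms the service is online; mouth edges in the generated video are sharper
10. **Event loop not blocked**: while a video is being generated, endpoints such as `/api/tasks/{id}` should respond normally without timing out
11. **compose skip**: with no BGM the log should show `Audio unchanged, skip pre-Remotion compose`
12. **Skip scale at matching resolution**: when the material is already at the target resolution, `prepare_segment` should use `-c:v copy` (no scale filter in the log)
13. **compose loop CRF**: the loop case should encode at CRF 18 (not 23)
14. **Model selection UI**: a default / fast / advanced model dropdown should appear to the right of the generate button
15. **Model selection persistence**: after switching models and refreshing the page, the dropdown should restore the last selection
16. **Fast model routing**: with "快速模型" (Fast model) selected, the backend log should show `强制快速模型:MuseTalk`
17. **Advanced model routing**: with "高级模型" (Advanced model) selected, the backend log should show `强制高级模型:LatentSync`
18. **Default model unchanged**: with "默认模型" (Default model) selected, behavior is exactly as before the change (threshold routing)
19. **Unified dropdown styling**: business selectors on the home and publish pages all use the same SelectPopover (trigger + panel + selected state)
20. **Adaptive open direction**: a dropdown opened near the bottom of the page should open upward and not be cut off
21. **Scroll to selection**: any dropdown reopened should jump to the selected item rather than the top of the list
22. **Preview layering**: the video preview modal should always sit above the dropdown and never be covered by the menu
23. **Consecutive previews**: after clicking preview inside a dropdown the menu stays open, and other items can be previewed once the preview is closed
24. **BGM behavior**: the home page BGM no longer shows a volume slider; generation requests use a fixed `bgm_volume=0.2`
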
Docs/DevLogs/Day31.md (new file, 526 lines)
@@ -0,0 +1,526 @@

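## Documentation layering cleanup + voice preview fix + recording modal refactor + unified modal system (Day 31)

### Overview

Today's work focused on four things:

1. Cleaning up and consolidating the root-level docs (README/DEV responsibilities, archiving historical content, aligning parameter descriptions with the code)
2. Completing "one-click preview" for the EdgeTTS voice list and fixing the browser-side preview failure
3. Reworking the voice-clone recording interaction: the recording entry moved to the bottom right of the reference audio area, and the flow moved into a modal
4. Extracting a unified modal base, `AppModal`, and migrating the main modals to the same visual and interaction conventions

---

## ✅ 1) Documentation structure and content consistency

### 1.1 Clear README / DEV boundary

- Added a "文档定位" (document scope) section to `FRONTEND_README.md`, `BACKEND_README.md`, `FRONTEND_DEV.md`, and `BACKEND_DEV.md`
- README keeps only stable descriptions (features, APIs, running); DEV keeps the conventions (constraints, layering, checklist)
- Log-style content in the READMEs (such as Day annotations) was rewritten as stable statements

### 1.2 Deployment and parameter docs aligned with the current code

- Lip-sync routing wording unified as threshold-driven, using the current `.env` example value `100` as the reference
- Corrected outdated encoding descriptions (MuseTalk compositing is now described as a rawvideo pipe + `libx264`)
- Fixed references to a nonexistent `.env.example`; the docs now describe `backend/.env`
- Marked the Qwen3-TTS docs as "历史归档(已停用)" (historical archive, discontinued) and pointed them to CosyVoice 3.0

---
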
## ✅ 2) Voice preview: implementation and bug fix

### 2.1 Implementation

- Voice dropdown items gain a preview button (play / pause / loading states)
- New backend preview endpoint: `/api/videos/voice-preview`
- The preview text is chosen automatically from fixed samples by the voice's locale (9 languages + a Chinese fallback)

### 2.2 Compatibility and stability adjustments

- `POST /api/videos/voice-preview` is retained (for compatibility)
- Added `GET /api/videos/voice-preview?voice=...`; the frontend now plays the GET audio stream directly, which reduces interference from browser autoplay policies

```python
@router.get("/voice-preview")
async def preview_voice_get(voice: str, current_user: dict = Depends(get_current_user)):
    voice_value = voice.strip()
    if not voice_value:
        raise HTTPException(status_code=400, detail="voice 不能为空")
    text = _get_preview_text_for_voice(voice_value)
    return await _render_voice_preview(voice=voice_value, text=text)
```
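
The locale-based text selection behind `_get_preview_text_for_voice` is not shown in this log. One plausible shape, sketched here purely as an assumption, keys the sample table on the language prefix of EdgeTTS-style voice names (for example `en-US-JennyNeural`); the function name, the table contents, and the sample sentences below are all hypothetical and cover only part of the 9 languages mentioned.

```python
# Hypothetical sample sentences keyed by language prefix (illustrative subset).
PREVIEW_TEXTS = {
    "zh": "你好,这是一段音色试听。",
    "en": "Hello, this is a short voice preview.",
    "ja": "こんにちは、これは音声のプレビューです。",
    "fr": "Bonjour, ceci est un court aperçu de la voix.",
}


def get_preview_text_for_voice(voice: str) -> str:
    """Pick the sample text by the voice's language prefix, defaulting to Chinese."""
    language = voice.strip().split("-", 1)[0].lower()
    return PREVIEW_TEXTS.get(language, PREVIEW_TEXTS["zh"])
```

A fixed table keeps every preview for the same voice identical, which would also make the rendered audio straightforward to cache.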
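
### 2.3 Conclusion on this production issue (fixed)

- Symptom: preview requests from the browser returned 404
- Root cause: after the GET route was added the backend process was not restarted, so the running code was still the old version
- Fix: the route took effect after `pm2 restart vigent2-backend`
- Note: a 401 from `curl` (no auth cookie) is expected; same-origin browser requests carry the cookie automatically

---
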
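## ✅ 3) Recording interaction refactor (voice cloning)

### 3.1 Entry point rearranged

- Removed the large standalone recording block inside the reference audio panel
- Placed the "上传音频 / 录音" (Upload audio / Record) entry at the bottom right of the "我的参考音频" (My reference audio) area

### 3.2 Recording flow moved into a modal

- The recording modal supports: start recording / stop recording / status timer / playback
- "使用此录音" (Use this recording) and "弃用本次录音" (Discard this recording) are kept and strengthened
- If recording is still in progress when the modal is closed, recording is stopped first and then the modal closes
- Fixed the modal mount point: instead of rendering inside the local component, `AppModal` portals to `document.body`, so it behaves as a full-page modal
- Reference audio button label updated: `录音` -> `在线录音`

### 3.4 Unified button styling in the script area

- Unified button size and corner radius in the "文案提取与编辑" (Script extraction and editing) area (`px-3 py-1.5 text-xs rounded-lg`)
- The `AI智能改写` and `保存文案` buttons now use the same visual spec as upload / online recording
- Icon sizes and disabled states were unified as well, fixing the "bottom buttons look too small" issue

### 3.5 Recording playback bar restyled

- The native white `<audio controls>` shown after recording was replaced with a custom playback bar in the project's dark style
- The new bar includes a play/pause button, a draggable progress bar, and current/total time
- Colors now match the page (dark base + green accent), so the bar no longer clashes with the rest of the UI

### 3.6 Modal close timing on recording upload

- Old logic: after clicking "使用此录音", the modal closed only once upload + recognition finished (felt sluggish)
- New logic: the modal closes immediately on click, and upload/recognition continue in the background
- Status feedback still appears in the reference audio area (upload/recognition in progress + error message on failure)

---
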
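## ✅ 5) Fix for "cannot get QR code" on Douyin login in publish management

### Diagnosis

- Symptom: clicking Douyin login in publish management made the frontend report that the QR code could not be fetched
- The backend log shows the root cause:
  - `Page.goto: Timeout 30000ms exceeded`
  - Navigation target: `https://creator.douyin.com/`
  - Wait condition: `wait_until="networkidle"`

### Fix

- The Douyin login page now uses the same, more robust strategy as WeChat: `wait_until="domcontentloaded"`
- Douyin navigation timeouts are tolerated: even if `goto` times out, the QR extraction flow continues (avoiding false failures caused by long-lived connections)

### Verification

- Local API smoke test: `POST /api/publish/login/douyin` returns `success=true` and includes `qr_code`
- Backend process restarted so the fix takes effect: `pm2 restart vigent2-backend`

### 3.3 State logic completed

- Added `discardRecording()`: clears the current recording and timer
- The old recording is cleared before a new one starts, so no stale state remains

---
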
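## ✅ 4) Unified modal UI/UX (AppModal)

New unified modal base: `frontend/src/shared/ui/AppModal.tsx`

- Unified overlay: `bg-black/80 + backdrop-blur-sm`
- Unified container: dark translucent background, `border-white/10`, `rounded-2xl`, heavy shadow
- Unified header: title / subtitle / close button
- Unified behavior: ESC to close, background scroll lock, optional close on overlay click
- Unified mounting: rendered through a Portal to `document.body`, avoiding the stacking issue where the modal "appears to pop up only inside the voice-over area"
- Unified accessibility: added `role="dialog"` + `aria-modal="true"`
- Unified focus management: focus moves into the modal on open and returns to the previously focused element on close
- Unified scroll-lock counting: supports several modals at once, so closing one modal does not restore page scrolling too early

Modals migrated so far:

- Video preview (`VideoPreviewModal`)
- Script extraction (`ScriptExtractionModal`)
- AI rewrite (`RewriteModal`)
- Trim settings (`ClipTrimmer`)
- Recording modal (inside `RefAudioPanel`)
- Change password modal (`AccountSettingsDropdown`)
- Publish management QR login modal (the QR login modal inside `PublishPage`)

---
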
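## ✅ 6) WeChat Channels login QR code appearance ("scannable but looks cropped")

### Symptom

- The WeChat Channels login QR code scanned successfully but looked as if its edges were incomplete or cut off

### Fix

- Stronger QR extraction strategy on the backend (`qr_login_service.py`):
  - Prefer exporting the QR code's original PNG data (`canvas.toDataURL('image/png')` / `img[data:image/png]`), reducing edge loss from a second screenshot
  - The WeChat screenshot fallback now crops with padding around the QR bounding box, so the image is no longer cut flush to its edges
  - Only PNG data URLs are accepted, so non-PNG content (such as SVG fragments) is never returned as the QR code with broken corners
- Frontend QR modal display (`PublishPage.tsx`):
  - The rounded-corner clipping on the QR image itself was removed in favor of an outer white container with padding (simulating the quiet zone)
  - QR display width and border were adjusted to match, so the code looks complete and consistent

### Verification

- Local API smoke test: `POST /api/publish/login/weixin` returns `success=true` and includes `qr_code`
- The decoded image is `1000x1000` and still scans correctly
- Frontend and backend processes restarted so the fix takes effect:
  - `pm2 restart vigent2-frontend`
  - `pm2 restart vigent2-backend`

---
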
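## ✅ 7) Publish flow performance and log readability (two-platform publishing)

### 7.1 Concurrent publish requests (frontend)

- Old logic: the publish page ran platforms serially with `for...of await`, so total time was the sum of every platform's time
- New logic: bounded concurrent execution (concurrency = 2), so two platforms publish in parallel and the total wait is much shorter
- The result list is still filled in the order of the user's platform selection, so concurrent completion does not shuffle it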
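
The actual change lives in the TypeScript publish page; the Python sketch below only illustrates the same idea (bounded concurrency with results kept in selection order) and is not a port of that code. `publish_all` and the `publish_one` callback are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Sequence


async def publish_all(
    platforms: Sequence[str],
    publish_one: Callable[[str], Awaitable[Dict[str, Any]]],
    limit: int = 2,
) -> List[Dict[str, Any]]:
    """Publish to every platform with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(platform: str) -> Dict[str, Any]:
        async with semaphore:
            try:
                return await publish_one(platform)
            except Exception as exc:
                # One failing platform should not hide the other results.
                return {"platform": platform, "success": False, "error": str(exc)}

    # gather returns results in argument order, i.e. the user's selection order.
    return list(await asyncio.gather(*(guarded(p) for p in platforms)))
```

Collecting results by position instead of by completion time is what keeps the displayed list stable regardless of which platform responds first.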

### 7.2 Less noise in WeChat upload logs (backend)

- Old logic: if `input.files[0]` could not be read immediately after `set_input_files`, a warning was logged right away: `[weixin][file_input] empty`
- New logic: first poll to check whether the page has already entered the uploading state, then decide whether to warn; every retry except the last logs at info level, and only the last one warns
- Effect: fewer false alarms (no warning spam when the upload has in fact started) and cleaner logs for troubleshooting

### Verification

- `python -m py_compile backend/app/services/uploader/weixin_uploader.py` ✅
- `npm run build` (frontend) ✅
- Service restart: `pm2 restart vigent2-frontend && pm2 restart vigent2-backend` ✅

---

## ✅ 8) Xiaohongshu publish chain alignment (launch mode / cookie format / success screenshot)

### 8.1 Launch mode and anti-detection parameters aligned

- Added Xiaohongshu Playwright settings to `config.py`:
  - `XIAOHONGSHU_HEADLESS_MODE` (default `headless-new`)
  - `XIAOHONGSHU_USER_AGENT / LOCALE / TIMEZONE_ID`
  - `XIAOHONGSHU_CHROME_PATH / BROWSER_CHANNEL`
  - `XIAOHONGSHU_FORCE_SWIFTSHADER / DEBUG_ARTIFACTS`
- `xiaohongshu_uploader.py` now uses the same configurable launch strategy as Douyin/WeChat and keeps the basic anti-detection flag (`--disable-blink-features=AutomationControlled`)

### 8.2 Xiaohongshu uploader refactor

- Rewrote the main uploader flow (modeled on Douyin/WeChat):
  - Multi-selector fallback for the upload entry / file input
  - Polling for uploading / success / failure states
  - Fault-tolerant filling of the title and body/topics
  - Multi-selector publish button lookup with a clickability check
- Publish success detection strengthened from "URL only" to a combination of signals:
  - URL redirect check
  - Success/failure text on the page
  - Listening for publish API responses (`publish` / `note create` style endpoints)
- After a successful publish, a screenshot is taken and returned as `screenshot_url` (same path format as Douyin/WeChat):
  - `/api/publish/screenshot/{filename}`

### 8.3 Unified cookie storage format

- `publish_service.save_cookie_string()` changes:
  - `bilibili` keeps the original simplified cookie dict (compatible with the existing upload library)
  - Every platform other than `bilibili` is saved as a Playwright `storage_state`:
    - `{"cookies": [...], "origins": []}`
- Added per-platform default domains (Douyin/WeChat/Xiaohongshu) so the cookie file can be used directly with `browser.new_context(storage_state=...)`
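
A conversion from a raw cookie string to the `storage_state` shape might look like the sketch below. The function name and the domain values are assumptions made for illustration (the log does not list the actual defaults); the per-cookie fields follow Playwright's storage-state cookie format.

```python
from typing import Any, Dict, List

# ASSUMPTION: illustrative default domains; the project's real values may differ.
DEFAULT_DOMAINS = {
    "douyin": ".douyin.com",
    "weixin": ".weixin.qq.com",
    "xiaohongshu": ".xiaohongshu.com",
}


def cookie_string_to_storage_state(platform: str, cookie_string: str) -> Dict[str, List[Any]]:
    """Turn "a=1; b=2" into a Playwright-style storage_state dict."""
    domain = DEFAULT_DOMAINS[platform]
    cookies = []
    for part in cookie_string.split(";"):
        name, sep, value = part.strip().partition("=")
        if not sep or not name:
            continue  # skip fragments that are not name=value pairs
        cookies.append({
            "name": name,
            "value": value,
            "domain": domain,
            "path": "/",
            "expires": -1,  # -1 marks a session cookie
            "httpOnly": False,
            "secure": False,
            "sameSite": "Lax",
        })
    return {"cookies": cookies, "origins": []}
```

Writing the result with `json.dump` would give a file whose path can be passed as the `storage_state` argument mentioned above.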

### 8.4 Verification and rollout

- `python -m py_compile backend/app/core/config.py backend/app/services/publish_service.py backend/app/services/uploader/xiaohongshu_uploader.py` ✅
- `pm2 restart vigent2-backend` ✅

---

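## ✅ 9) Xiaohongshu login QR code fix (default SMS login must be switched first)

### Symptom

- The Xiaohongshu creator platform `https://creator.xiaohongshu.com/` lands on the "短信登录" (SMS login) view by default
- The QR code appears only after clicking the switch icon in the top right corner, so the backend failed when it tried to grab the QR code by selector directly

### Fix (`qr_login_service.py`)

- Added `_ensure_xiaohongshu_qr_mode()`:
  - First detects whether the page is in SMS login (`input[placeholder*='手机号']`)
  - Automatically clicks the switch icon in the top right corner of the login card (stable selectors first, geometric position as a fallback)
  - Waits for the QR code to render after switching, then enters the extraction flow
- Extended the Xiaohongshu QR selector set:
  - Added selectors for the QR image inside the login card (covering the current page structure)
  - Kept the generic `img[src*='qr'/'qrcode']` fallback
- Raised the Xiaohongshu candidate filter threshold (`min_side=120`) to avoid picking the small switch icon in the top right corner
- Added Xiaohongshu keywords to the text strategy (such as `APP扫一扫登录`)

### Verification

- Local API smoke test: `POST /api/publish/login/xiaohongshu` returns `success=true` with a non-empty `qr_code`
- The backend log confirms the fix path is active:
  - `已点击登录方式切换,等待二维码渲染`
  - `策略1(CSS): 匹配成功`

---
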
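## ✅ 10) Xiaohongshu upload-stage fix ("发布笔记 - 上传视频" scenario)

### Symptom

- Xiaohongshu publishing failed at the "上传视频" (upload video) stage; the page stayed on the publish page and the frontend reported a publish failure
- The backend log showed `set_input_files` firing successfully, but no upload state was detected within a short time, which led to repeated upload triggers and a false failure
- Further investigation found that the uploaded file is actually a Supabase local object file (no extension); `file_input type=` was empty in the log, so the platform may fail to identify the video MIME type

### Fix (`xiaohongshu_uploader.py`)

- Added an upload start detection window, `UPLOAD_SIGNAL_TIMEOUT=12s`:
  - Gives the upload state time to start after `set_input_files` succeeds
  - Moves on to the upload polling as soon as a signal such as "上传中/处理中/转码中" (uploading / processing / transcoding) is detected
  - If no clear signal appears within the window, the upload is no longer failed immediately; it moves into the main upload monitoring phase and keeps waiting
- Corrected the failure keywords:
  - Removed `重新上传` from the failure keywords (on Xiaohongshu pages this text often appears as a normal state or action entry and must not be treated as a failure)
- Added upload file diagnostics to the log:
  - Logs the selected file's name/size/type from `file_input`, to confirm the file really was injected into the input
  - Logs an explicit warning when an upload failure is matched, for fast diagnosis in production
- Added a fallback for video files without an extension:
  - If the original file has no extension and its parent directory name has one (such as `xxx.mp4/<uuid>`), a same-named temp file is created automatically in `/tmp/vigent_uploads` (hard link / symlink / copy fallback)
  - The upload uses the temp file with the extension, making the site's MIME detection more reliable
  - The temp upload file is cleaned up automatically when the task ends

### 10.1 Second round of diagnosis and hardening (after the hang was reproduced)

- The reproduction log showed that even with a temp path that had an extension, `file_input` still contained an extension-less file name and the upload stayed at `等待上传状态...` for a long time
- Root cause confirmed: in cross-device cases the earlier code fell back to a `symlink`, and the browser picked up the original target file name (no extension), so the site failed to recognize the file
- Hardening:
  - Removed the `symlink` fallback, keeping only `hardlink -> copy`, so the uploaded file name always ends in `.mp4`
  - Added a suffix consistency check on the `file_input` file name: on a mismatch with the expected suffix it retries immediately and, on final failure, returns early (no more pointless long waits)
  - Added an upload idle timeout (`UPLOAD_IDLE_TIMEOUT=90s`): when there is no valid upload signal for a long time, the task fails early and keeps a debug screenshot, so the frontend no longer "looks frozen"
  - Improved the failure message to "未能触发有效视频上传,请确认发布页状态及视频文件格式"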
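
The final `hardlink -> copy` staging step can be sketched with the standard library alone. The helper below is hypothetical, and naming the staged file after the parent directory (the `xxx.mp4` part of `xxx.mp4/<uuid>`) is an assumption about what "same-named" means in the notes above.

```python
import os
import shutil
from pathlib import Path


def stage_upload_with_suffix(src: Path, staging_dir: Path) -> Path:
    """Return a path whose name carries the video suffix.

    Files that already have a suffix are returned unchanged. Otherwise the
    suffix is borrowed from the parent directory name and the file is hard
    linked into staging_dir, with a plain copy as the cross-device fallback.
    No symlink is used, so the browser never sees the suffix-less name.
    """
    if src.suffix or not src.parent.suffix:
        return src
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / src.parent.name  # ASSUMPTION: reuse "xxx.mp4" as the file name
    if staged.exists():
        staged.unlink()
    try:
        os.link(src, staged)  # zero-copy when on the same filesystem
    except OSError:
        shutil.copy2(src, staged)  # e.g. cross-device link error
    return staged
```

A caller would remove the staged file once the task ends, matching the cleanup step described above.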

### 10.2 Live publish verification (after the fix)

- Re-ran `POST /api/publish` (Xiaohongshu); the backend completed upload + publish end to end and the API returned `200`
- This run took about `45.77s`, a normal duration given the upload and publish waits
- The success screenshot is reachable: `GET /api/publish/screenshot/xiaohongshu_success_20260303_115944_633.png` returns `200`
- Key log sequence: `正在上传` -> `已设置上传文件` -> `等待发布结果` -> `Cookie 更新完毕`

### Verification

- `python -m py_compile backend/app/services/uploader/xiaohongshu_uploader.py` ✅
- `pm2 restart vigent2-backend` ✅
- `curl http://127.0.0.1:8006/health` returns `{"status":"ok"}` ✅

---

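## ✅ 11) Home page "AI生成标题标签" button moved (into section 4, Title and Subtitles)

### Design decisions

- Moved `AI生成标题标签` (AI-generate title and tags) from "一、文案提取与编辑" (1. Script extraction and editing) to "四、标题与字幕" (4. Title and subtitles)
- The title area now has two rows:
  - Row 1: the `四、标题与字幕` heading + `AI生成标题标签` on the right
  - Row 2: right-aligned `标题短暂显示/常驻显示` (title shown briefly / always shown) + `预览样式` (preview style)
- Display semantics clarified: `标题短暂显示/常驻显示` applies to both the main title and the secondary title (always shown = both stay on screen)
- No extra hint text was added, keeping the interface simple
- `AI生成标题标签` now matches the corner radius and size of the `在线录音` button (`rounded-lg` + same button size) and keeps its original blue gradient

### Result

- Title-related actions are gathered in one section, so users no longer jump between sections 1 and 4
- Clearer hierarchy within the rows: the AI action sits on the heading row, with settings and preview on the next row
- The AI button's corners and size are softer while the original blue gradient is kept, giving a more consistent look

### Verification

- `npm run build` (frontend) ✅
- `pm2 restart vigent2-frontend` ✅

---
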
## ✅ 12) 文案编辑框右下角扩展角标(弹出大编辑器)
|
||||
|
||||
### 设计与实现
|
||||
|
||||
- 在「一、文案提取与编辑」主输入框右下角新增角标按钮(点击后打开扩展编辑器)
|
||||
- 扩展编辑器使用 `AppModal`,提供更大编辑空间(高约 `66vh`)
|
||||
- 主输入框与弹窗内输入框共享同一份 `text` 状态,双向实时同步
|
||||
- 为避免角标遮挡正文,主输入框增加右下内边距(`pr-6 pb-6`)
|
||||
- 角标样式进一步极简化:仅保留双箭头图标,去掉外框容器并贴近输入框边缘
|
||||
- 角标位置微调为更协调的“上移+右移”:`right-0.5 bottom-2`,并固定点击区域 `h-5 w-5`
|
||||
- 修复扩展编辑输入焦点丢失:`AppModal` 改为使用 `onCloseRef` 处理 ESC,避免父组件重渲染时 effect 误清理导致 textarea 失焦
|
||||
- 移除扩展编辑输入框紫色聚焦边框,改为中性边框高亮(`focus:border-white/25`)
|
||||
|
||||
### 验证
|
||||
|
||||
- `npm run build`(frontend)✅
|
||||
- `pm2 restart vigent2-frontend` ✅
|
||||
|
||||
---
|
||||
|
||||
## ✅ 13) 站点 Icon 替换(使用 `Temp/video.png`)
|
||||
|
||||
### 变更
|
||||
|
||||
- 将提供的 `Temp/video.png` 转换并替换为站点图标资源
|
||||
- 新增 `frontend/src/app/icon.png`(Next App Router icon 资源)
|
||||
- 更新 `frontend/src/app/favicon.ico`(16/32/48/64 多尺寸)
|
||||
|
||||
### 验证
|
||||
|
||||
- `npm run build`(frontend)✅
|
||||
- 构建产物包含 `/icon.png` 路由 ✅
|
||||
- `pm2 restart vigent2-frontend` ✅
|
||||
|
||||
---
|
||||
|
||||
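## ✅ 14) Post-publish workspace cleanup hardening (CleanupContext + `/api/videos/cleanup`)

### 14.1 Feature rollout

- The publish page gains a cleanup prompt that runs after every platform publishes successfully:
  - All platforms succeed: trigger `CleanupModal`.
  - Any platform fails: keep the existing inline result display.
- `CleanupModal` shows the list of successful platforms, the success screenshots, a video backup download, and one-click cleanup.
- The cleanup state `cleanup_pending` is persisted to localStorage and restored after a refresh or navigation.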
|
||||
|
||||
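### 14.2 Stability and anti-lockout improvements

- Backend deletion now raises exceptions instead of swallowing errors silently, so the frontend cannot mistake a failed cleanup for success.
- The cleanup endpoint uses strict success semantics:
  - It returns success only when both the video and the voice-over deletions succeed.
  - Any failed deletion returns an error; the frontend keeps the modal open and allows a retry.
- The frontend cleanup runs "backend first, local second":
  - Backend failure: keep local data and keep the modal open.
  - Backend success: then clear the local input fields and close the modal.
- After a successful backend cleanup, the frontend dispatches a `vigent:workspace-cleared` event and the publish page resets its title/tag inputs in place (no manual refresh needed).
- After 3 consecutive failures, a "暂不清理,继续使用" (skip cleanup and keep working) option appears, which prevents a permanent block in a faulty environment.
- The cleanup modal expires after 24h, so stale state does not linger across days.
- Cleanup state is reset on user switch or logout, so state from a previous account does not leak.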
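
The strict success semantics above can be illustrated with a small sketch. It assumes a `delete_file` callable that raises on failure (as `delete_file()` in `storage.py` now does) and uses hypothetical helper names `delete_all` and `cleanup`; the real route and service signatures may differ.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def delete_all(
    paths: Iterable[str], delete_file: Callable[[str], None]
) -> Tuple[List[str], List[str]]:
    """Try to delete every path and report (deleted, failed)."""
    deleted: List[str] = []
    failed: List[str] = []
    for path in paths:
        try:
            delete_file(path)  # assumed to raise when deletion fails
            deleted.append(path)
        except Exception:
            failed.append(path)
    return deleted, failed


def cleanup(
    video_paths: Iterable[str],
    audio_paths: Iterable[str],
    delete_file: Callable[[str], None],
) -> Dict[str, object]:
    """Report success only if every video and audio deletion succeeded."""
    _, failed_videos = delete_all(video_paths, delete_file)
    _, failed_audios = delete_all(audio_paths, delete_file)
    failed = failed_videos + failed_audios
    # Any single failure turns the whole cleanup into an error response,
    # so the caller can keep its modal open and offer a retry.
    return {"success": not failed, "failed": failed}
```

Collecting failures instead of stopping at the first one lets a retry target only what is still left, which is one possible reading of the `(deleted, failed)` return value mentioned in the file table.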
|
||||
|
||||
### 14.3 Cleanup scope

- Only input content fields are cleared:
  - Home page: script / title / subtitle
  - Publish page: title / tags
- User preference fields are kept (style, font size, margins, model, BGM, etc.).
|
||||
|
||||
### 验证
|
||||
|
||||
- `python -m py_compile backend/app/services/storage.py backend/app/modules/videos/service.py backend/app/modules/generated_audios/service.py backend/app/modules/videos/router.py` ✅
|
||||
- `npm run build`(frontend)✅
|
||||
- `pm2 restart vigent2-backend && pm2 restart vigent2-frontend` ✅
|
||||
- `curl http://127.0.0.1:8006/health` 返回 `{"status":"ok"}` ✅
|
||||
|
||||
---
|
||||
|
||||
## 📁 今日主要修改文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `backend/app/modules/videos/router.py` | 新增/增强 `voice-preview` GET+POST,试听文本 locale 路由,临时文件清理;新增 `POST /api/videos/cleanup` 严格成功语义 |
|
||||
| `backend/app/modules/videos/service.py` | 新增批量删除生成视频能力;返回 `(deleted, failed)` 供 cleanup 路由判定 |
|
||||
| `backend/app/modules/generated_audios/service.py` | 新增批量删除预生成配音能力;返回 `(deleted, failed)` 供 cleanup 路由判定 |
|
||||
| `backend/app/services/storage.py` | `delete_file()` 改为异常上抛,避免删除失败静默吞错造成“假成功” |
|
||||
| `backend/app/modules/videos/schemas.py` | 新增 `VoicePreviewRequest` |
|
||||
| `frontend/src/features/home/ui/VoiceSelector.tsx` | 音色下拉增加试听按钮,改为 GET 音频流播放 |
|
||||
| `frontend/src/features/home/model/useHomeController.ts` | 录音状态重置、`discardRecording` |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 透传录音弃用动作;将 `AI生成标题标签` 事件改为传入 `TitleSubtitlePanel` |
|
||||
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 上传/录音入口重排;录音改弹窗;使用/弃用流程 |
|
||||
| `frontend/src/features/home/ui/ScriptEditor.tsx` | 文案编辑区按钮视觉统一;移除 `AI生成标题标签`(职责回归标题板块);新增输入框右下角扩展角标与大编辑弹窗;角标改为双箭头极简贴边样式并微调到 `right-0.5 bottom-2`;输入框去除紫色聚焦边框 |
|
||||
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 标题区改为“首行标题+AI、次行右对齐设置+预览”;AI按钮外观对齐在线录音按钮(软圆角) |
|
||||
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 录音完成试听条改为自定义深色播放器(替换原生白色控制条) |
|
||||
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 使用录音后弹窗立即关闭,上传识别后台进行(提升交互流畅度) |
|
||||
| `frontend/src/features/publish/model/usePublishController.ts` | 发布改为受限并发(并发度=2);全平台发布成功时触发 `triggerCleanup()`,失败保持内联结果;监听 `workspace-cleared` 事件就地清空发布输入态 |
|
||||
| `frontend/src/shared/contexts/CleanupContext.tsx` | 新增发布后清理弹窗与持久化状态;失败不关闭/不清本地、3 次失败可跳过、24h 过期、用户切换复位;清理范围收敛为输入内容字段;成功清理后派发 `workspace-cleared` 事件 |
|
||||
| `frontend/src/app/layout.tsx` | 在 `TaskProvider` 内挂载 `CleanupProvider`,确保全局可触发发布后清理弹窗 |
|
||||
| `backend/app/core/config.py` | 新增小红书 Playwright 配置(headless/UA/locale/timezone/chrome/debug) |
|
||||
| `backend/app/services/uploader/xiaohongshu_uploader.py` | 按抖音/微信模式重构;补充上传启动容错窗口、无后缀文件兜底(hardlink/copy)、后缀一致性校验、空转超时保护与上传诊断日志 |
|
||||
| `backend/app/services/publish_service.py` | `save_cookie_string` 非 bilibili 统一存储为 Playwright `storage_state`;小红书 uploader 透传 `user_id` |
|
||||
| `backend/app/services/qr_login_service.py` | 抖音导航超时容错 + 微信二维码提取增强 + 小红书登录自动切换到扫码模式并提取二维码 |
|
||||
| `backend/app/services/uploader/weixin_uploader.py` | `file_input empty` 告警策略优化:先检测上传信号,非最后一次重试降级为 info |
|
||||
| `frontend/src/shared/ui/AppModal.tsx` | 统一弹窗组件 + 无障碍语义 + 焦点管理 + 多弹窗滚动锁计数;新增 `onCloseRef` 避免回调引用变化引发的意外失焦 |
|
||||
| `frontend/src/components/VideoPreviewModal.tsx` | 迁移到 `AppModal` |
|
||||
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | 迁移到 `AppModal` |
|
||||
| `frontend/src/features/home/ui/RewriteModal.tsx` | 迁移到 `AppModal` |
|
||||
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | 迁移到 `AppModal` |
|
||||
| `frontend/src/components/AccountSettingsDropdown.tsx` | 修改密码弹窗迁移到 `AppModal` |
|
||||
| `frontend/src/app/icon.png` | 新增站点 icon 资源(来自 `Temp/video.png`) |
|
||||
| `frontend/src/app/favicon.ico` | 替换站点 favicon(由 `video.png` 转换为多尺寸 ico) |
|
||||
| `frontend/src/features/publish/ui/PublishPage.tsx` | 扫码登录(QR)弹窗迁移到 `AppModal` + 二维码白底留白容器优化(避免边缘观感被裁) |
|
||||
| `Docs/FRONTEND_DEV.md` | 新增统一弹窗规范(AppModal)和录音交互规范;补充文案扩展编辑也统一走 AppModal;新增 CleanupContext 清理策略规范 |
|
||||
| `Docs/FRONTEND_README.md` | 增补录音入口与弹窗交互说明;明确“标题常驻显示”对主/副标题同时生效;补充文案输入框扩展编辑器说明;补充发布后清理弹窗失败兜底说明 |
|
||||
| `Docs/BACKEND_README.md` | 增补 `voice-preview` 接口说明;更新发布 API 路径(`/login/{platform}` 等)并链接发布专项文档;补充 `title_display_mode` 对主/副标题统一生效说明;新增 `/api/videos/cleanup` 接口说明 |
|
||||
| `Docs/BACKEND_DEV.md` | 更新后端规范中的发布器覆盖范围与小红书配置项;补充发布专项文档指引;补充 `title_display_mode` 主/副标题统一生效约定;新增 cleanup 严格成功语义约定 |
|
||||
| `Docs/PUBLISH_DEPLOY.md` | 新增多平台发布专项文档(登录实现、自动化发布流程、部署要点与排障);补充“发布成功后清理联动”说明 |
|
||||
| `Docs/DEPLOY_MANUAL.md` | 部署参数与扫码说明补充小红书要点;新增发布专项文档入口 |
|
||||
| `README.md` | 文档中心新增 `PUBLISH_DEPLOY.md` 入口;发布结果可视化描述补齐小红书;补充发布成功后工作区清理引导说明 |
|
||||
| `Docs/TASK_COMPLETE.md` | 新增 Day31 任务汇总,更新 Current 标签与更新时间;补充发布后清理链路加固条目 |
|
||||
| `Docs/DOC_RULES.md` | 增补“发布相关三检”(路由真值/专项文档/入口回写)、敏感信息处理规范,更新工具规范为 `Read/Grep/apply_patch`,并对齐 TASK_COMPLETE 检查清单 |
|
||||
| `Docs/SUBTITLE_DEPLOY.md` | 与当前阈值/参数说明对齐 |
|
||||
| `Docs/LATENTSYNC_DEPLOY.md` | 与当前阈值/参数说明对齐 |
|
||||
| `Docs/COSYVOICE3_DEPLOY.md` | TTS 部署说明与当前运行路径对齐 |
|
||||
| `Docs/QWEN3_TTS_DEPLOY.md` | 标注为历史归档并指向 CosyVoice 3.0 |
|
||||
|
||||
---
|
||||
|
||||
## 🔍 验证记录
|
||||
|
||||
- `python -m py_compile backend/app/modules/videos/router.py backend/app/modules/videos/schemas.py` ✅
|
||||
- `python -m py_compile backend/app/services/qr_login_service.py` ✅
|
||||
- `python -m py_compile backend/app/services/uploader/weixin_uploader.py` ✅
|
||||
- `python -m py_compile backend/app/core/config.py backend/app/services/publish_service.py backend/app/services/uploader/xiaohongshu_uploader.py` ✅
|
||||
- `POST /api/publish/login/xiaohongshu` 冒烟返回 `success=true` + `qr_code` ✅
|
||||
- `python -m py_compile backend/app/services/uploader/xiaohongshu_uploader.py`(上传阶段修复后)✅
|
||||
- `pm2 restart vigent2-backend`(上传阶段修复后)✅
|
||||
- `curl http://127.0.0.1:8006/health` 返回 `{"status":"ok"}` ✅
|
||||
- `backend/venv/bin/python` 本地探针验证 `_prepare_upload_file()`:临时文件非 symlink、后缀 `.mp4`、清理成功 ✅
|
||||
- 小红书发布实测:`POST /api/publish` 返回 `200`(`Duration: 45.77s`)且成功截图接口返回 `200` ✅
|
||||
- 新增 `Docs/PUBLISH_DEPLOY.md`(抖音/微信/B站/小红书登录与发布实现说明)✅
|
||||
- `npm run build`(frontend)✅
|
||||
- 站点 icon 替换后构建通过,产物包含 `/icon.png` 路由 ✅
|
||||
- `pm2 restart vigent2-frontend`(icon 替换后)✅
|
||||
- `python -m py_compile backend/app/services/storage.py backend/app/modules/videos/service.py backend/app/modules/generated_audios/service.py backend/app/modules/videos/router.py`(cleanup 链路加固后)✅
|
||||
- `npm run build`(CleanupContext 优化后)✅
|
||||
- `pm2 restart vigent2-backend && pm2 restart vigent2-frontend`(cleanup 链路加固后)✅
|
||||
- `curl http://127.0.0.1:8006/health` 返回 `{"status":"ok"}`(cleanup 链路加固后)✅
|
||||
- `POST /api/publish/login/weixin` 冒烟返回 `success=true` + `qr_code` ✅
|
||||
- `npx eslint` 定向检查以下文件通过:
|
||||
- `VoiceSelector.tsx`
|
||||
- `RefAudioPanel.tsx`
|
||||
- `HomePage.tsx`
|
||||
- `useHomeController.ts`
|
||||
- `AppModal.tsx`
|
||||
- `VideoPreviewModal.tsx`
|
||||
- `ScriptExtractionModal.tsx`
|
||||
- `RewriteModal.tsx`
|
||||
- `AccountSettingsDropdown.tsx`
|
||||
- `ClipTrimmer.tsx` 仍有仓库既有 lint 规则项(`react-hooks/set-state-in-effect`),与本次弹窗风格迁移无关
|
||||
- 音色试听线上问题经后端重启后已恢复可用(浏览器同源携带 cookie)
|
||||
|
||||
---
|
||||
|
||||
## ☑️ Day31 覆盖核对(今日新增补充)
|
||||
|
||||
已对照今天新增改动做二次核对,以下内容已写入本日志:
|
||||
|
||||
- `AppModal` 的可访问性与焦点/滚动锁稳健性增强
|
||||
- 微信视频号二维码“观感不完整”问题的后端提取修复
|
||||
- 发布页二维码展示样式优化(白底留白、去除本体圆角裁切)
|
||||
- 小红书 uploader 对齐重构(启动参数、发布判定、成功截图)
|
||||
- 小红书“上传阶段卡住”二次定位与加固(文件名后缀一致性 + 空转超时)并完成实测发布成功
|
||||
- 形成发布专项文档 `Docs/PUBLISH_DEPLOY.md`,沉淀四平台登录与自动化发布实现
|
||||
- 回写 `Docs/BACKEND_README.md` / `Docs/BACKEND_DEV.md` / `Docs/DEPLOY_MANUAL.md`,统一发布 API 与部署说明口径
|
||||
- 回写 `Docs/FRONTEND_README.md` / `Docs/FRONTEND_DEV.md` / `Docs/PUBLISH_DEPLOY.md`,补齐发布后清理弹窗与 cleanup 接口联动说明
|
||||
- 回写 `README.md`,补充发布专项文档入口与小红书发布成功截图能力描述
|
||||
- 回写 `Docs/TASK_COMPLETE.md`,补齐 Day31 任务完成记录
|
||||
- 回写 `Docs/DOC_RULES.md`,同步文档更新规则到当前文档结构与工具链
|
||||
- 首页「AI生成标题标签」按钮迁移到「四、标题与字幕」并固定标题同层最右;显示方式与预览下沉到下一行右侧
|
||||
- 文案输入框右下角新增扩展角标,支持弹出大编辑器进行长文案编辑
|
||||
- 站点 icon 已替换为 `Temp/video.png` 对应资源(`app/icon.png` + `app/favicon.ico`)
|
||||
- 发布后工作区清理链路落地(CleanupModal + `/api/videos/cleanup`)并补齐失败兜底(失败不关弹窗、不清本地)
|
||||
- 清理链路防锁死优化:3 次失败可跳过、24h 过期、用户切换复位
|
||||
- 文档补充:`标题短暂显示/常驻显示` 对主标题与副标题统一生效(常驻=主/副标题全程显示)
|
||||
- 非 bilibili 平台 cookie 保存为 `storage_state` 格式
|
||||
- 小红书登录二维码自动切换(短信登录 -> 扫码登录)与提取修复
|
||||
- 对应构建/重启/冒烟验证记录
|
||||
- 今日运行期产物(`backend/user_data/**/cookies/*.json`、`watchdog.log`)为会话副产物,不属于代码/文档变更项
|
||||
159
Docs/DevLogs/Day32.md
Normal file
@@ -0,0 +1,159 @@
|
||||
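## Same-origin video download fix + first batch of security fixes (Day 32)

### Overview

Today's work focused on four things:

1. Fixed downloads on the home page and in the publish-success modal being treated by the browser as online playback (opening a new tab).
2. Moved the work done after the download fix began from `Day31` to `Day32`, so the logs stay cleanly archived by day.
3. Implemented the first batch of 6 security fixes with no functional risk, based on the security audit report (`Temp/安全审计报告.md`).
4. Unified modal close behavior (close policy only): clicking the overlay closes a modal by default, while the publish-success cleanup modal stays mandatory.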
|
||||
|
||||
---
|
||||
|
||||
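## ✅ 1) Video download fix (avoid playback in a new tab)

### Symptoms

- In some browsers, "下载视频" (download video) on the home page and "下载视频备份" (download video backup) in the publish-success modal opened the video in a new tab instead of triggering a download.
- Root cause: with cross-origin signed URLs, browsers may ignore `<a download>`.

### Fix

- The backend adds a same-origin download endpoint: `GET /api/videos/generated/{video_id}/download`
  - Returns the local video file via `FileResponse`.
  - Explicitly sets `Content-Disposition: attachment`.
  - The browser goes straight to the save-file flow.
- The publish-success modal download now passes `videoId` and no longer depends on a signed URL.
- The home page preview download uses the same endpoint, so its behavior matches the publish modal.
- Backward compatibility: `CleanupContext` parses `videoId` out of the legacy persisted `videoDownloadUrl` field and backfills it.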
|
||||
|
||||
---
|
||||
|
||||
## ✅ 2) 配套调整与文档拆分
|
||||
|
||||
### 前端联动
|
||||
|
||||
- `CleanupContext` 继续沿用“清理失败不关弹窗、不清本地”的逻辑,下载链路仅替换为同源接口。
|
||||
- 首页 `PreviewPanel` 支持传入 `generatedVideoId`,下载按钮优先走 `/api/videos/generated/{id}/download`。
|
||||
|
||||
### 日志归档
|
||||
|
||||
- 将“下载修复开始后的内容”从 `Day31` 移出并归档到 `Day32`。
|
||||
- `Day31` 保留 Day31 当日核心内容(到 cleanup 链路加固为止)。
|
||||
|
||||
---
|
||||
|
||||
## ✅ 3) First batch of security fixes (6 items, no functional risk)

Based on the security audit report, this batch implements 6 hardening items that could be fixed directly.

### 3.1 Startup block on the default JWT secret

- **File**: `backend/app/main.py`
- Added a `check_jwt_secret` startup event (runs before `init_admin`).
- When `JWT_SECRET_KEY` is still the default value `"your-secret-key-change-in-production"`:
  - **Production** (`DEBUG=False`): `raise RuntimeError`, which blocks service startup.
  - **Development** (`DEBUG=True`): log a `CRITICAL` warning without blocking startup.
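
A minimal sketch of the startup check described above. The default secret string comes from the log; the function signature and the way `secret` and `debug` are passed in are assumptions for the example, since the real handler presumably reads them from the application settings.

```python
import logging

DEFAULT_JWT_SECRET = "your-secret-key-change-in-production"

logger = logging.getLogger("startup")


def check_jwt_secret(secret: str, debug: bool) -> None:
    """Refuse to start in production while the default JWT secret is in use."""
    if secret != DEFAULT_JWT_SECRET:
        return  # a custom secret is configured, nothing to do

    if not debug:
        # Production: fail fast so a forgeable token secret never goes live.
        raise RuntimeError(
            "JWT_SECRET_KEY is still the default value; refusing to start"
        )

    # Development: keep the service usable but make the problem loud.
    logger.critical("JWT_SECRET_KEY is still the default value; change it")
```

Registered as a startup hook before any other initialization, a check like this would stop the process before it serves a single request.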
|
||||
|
||||
### 3.2 Authentication on AI / Tools endpoints

- **Files**: `backend/app/modules/ai/router.py`, `backend/app/modules/tools/router.py`
- The 3 AI endpoints (`/translate`, `/generate-meta`, `/rewrite`) now require `Depends(get_current_user)`.
- The 1 Tools endpoint (`/extract-script`) now requires `Depends(get_current_user)`.
- The frontend axios client already sets `withCredentials: true` and redirects to the login page on 401, so no frontend change is needed.
|
||||
|
||||
### 3.3 Material path traversal fix

- **Files**: `backend/app/modules/materials/router.py`, `backend/app/modules/materials/service.py`
- `stream`, `delete_material`, and `rename_material` now reject `..` before the `startswith(user_id)` check.
- A `material_id` containing `..` returns 400 directly.
- The `delete_material` route adds `except ValueError` → 400 (previously only `PermissionError` was caught, so a `ValueError` fell through to the generic `Exception` handler and returned 500).
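
The ordering of the two checks matters because a prefix check alone can be bypassed. The sketch below shows the idea with a hypothetical `validate_material_id` helper; the exception types mirror the ones named above, but the function itself is not taken from the codebase.

```python
def validate_material_id(material_id: str, user_id: str) -> None:
    """Reject traversal attempts first, then check ownership by prefix."""
    # "u1/../u2/a.mp4" starts with "u1" but resolves into another user's
    # directory, so ".." has to be refused before the prefix check.
    if ".." in material_id:
        raise ValueError("invalid material_id")  # mapped to HTTP 400

    if not material_id.startswith(user_id):
        raise PermissionError("material does not belong to this user")
```

A route handler would then translate `ValueError` into a 400 response and `PermissionError` into its existing permission error response.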
|
||||
|
||||
### 3.4 video_id allowlist validation

- **File**: `backend/app/modules/videos/router.py`
- The `download_generated` and `delete_generated` endpoints validate `video_id` with a regex at the top of the function.
- Only `^[A-Za-z0-9_-]+$` is allowed; anything else returns 400.
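
As a rough illustration of the allowlist check, the pattern from the log can be applied like this. The helper name `is_valid_video_id` is an assumption, and `fullmatch` is used here as a conservative choice because `$` on its own also matches just before a trailing newline.

```python
import re

# Pattern taken from the log; only letters, digits, "_" and "-" are allowed.
VIDEO_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def is_valid_video_id(video_id: str) -> bool:
    """Return True only if the whole string matches the allowlist."""
    return VIDEO_ID_PATTERN.fullmatch(video_id) is not None
```

An endpoint would call this first and return 400 when it is false, before the ID is used to build any file path.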
|
||||
|
||||
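### 3.5 Upload/download size limits

- **materials/service.py** (streaming upload): check `MAX_UPLOAD_SIZE_MB` (default 500MB) after accumulating each chunk, and raise `ValueError` when the limit is exceeded.
- **ref_audios/service.py** (reference audio): check the 5MB limit after `await file.read()`.
- **tools/service.py** (script-extraction file upload): replaced `shutil.copyfileobj` with a chunked copy plus a 500MB limit.
- **tools/service.py** (URL download branch): check the file size after `_download_video` returns; above 500MB, delete the temporary file and reject the request.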
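
The chunked copy with a size cap can be sketched as below. The function name `copy_with_limit`, the chunk size, and the synchronous file-object interface are assumptions for the example; the real services work with upload objects and settings such as `MAX_UPLOAD_SIZE_MB`.

```python
from typing import BinaryIO


def copy_with_limit(
    src: BinaryIO, dst: BinaryIO, max_bytes: int, chunk_size: int = 1024 * 1024
) -> int:
    """Copy src to dst in chunks, raising ValueError past max_bytes."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total  # end of stream, report bytes written
        total += len(chunk)
        if total > max_bytes:
            # Stop before writing the chunk that crosses the limit; the
            # caller is expected to delete the partial destination file.
            raise ValueError(f"upload exceeds {max_bytes} bytes")
        dst.write(chunk)
```

Unlike `shutil.copyfileobj`, this stops reading as soon as the limit is crossed, so an oversized upload never has to be fully written to disk.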
|
||||
|
||||
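### 3.6 Generic error messages

- **ai/router.py**: the 3 occurrences of `detail=str(e)` become "翻译服务暂时不可用" (translation service temporarily unavailable), "生成标题标签失败" (failed to generate title and tags), and "改写服务暂时不可用" (rewrite service temporarily unavailable) respectively.
- **tools/router.py**: the specific "Fresh cookies" branch hint is kept; the fallback becomes "文案提取失败,请稍后重试" (script extraction failed, please retry later).
- **generated_audios/service.py**: the task failure `error` field changes from `traceback.format_exc()` to `str(e)`; the traceback is written to the server log only.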
|
||||
|
||||
---
|
||||
|
||||
## ✅ 4) 弹窗关闭策略统一(UX)
|
||||
|
||||
### 目标
|
||||
|
||||
- 保持统一交互预期:业务弹窗默认可通过 `X` 与点击遮罩关闭。
|
||||
- 保留关键流程保护:发布成功清理弹窗继续禁止遮罩关闭,避免误触导致流程中断。
|
||||
- 说明:按钮位置与视觉样式统一属于 Day33 范畴,本日志仅记录关闭策略统一。
|
||||
|
||||
### 调整内容
|
||||
|
||||
- 文案提取弹窗(`ScriptExtractionModal`)支持点击遮罩关闭。
|
||||
- AI 改写弹窗(`RewriteModal`)支持点击遮罩关闭。
|
||||
- 发布页扫码登录弹窗支持点击遮罩关闭。
|
||||
- 修改密码弹窗支持点击遮罩关闭。
|
||||
- 录音弹窗采用动态策略:`closeOnOverlay={!isRecording}`
|
||||
- 未录音:允许遮罩关闭
|
||||
- 录音中:禁止遮罩关闭(防误触);`X` 关闭仍可用,且会先停止录音再关闭
|
||||
- 发布成功清理弹窗维持 `closeOnOverlay=false`,并且不提供 `onClose`(无右上角关闭按钮)。
|
||||
|
||||
---
|
||||
|
||||
## 📁 今日主要修改文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `backend/app/modules/videos/router.py` | 新增 `GET /api/videos/generated/{video_id}/download`,返回 `attachment` 下载响应;新增 `video_id` 白名单正则校验(`^[A-Za-z0-9_-]+$`) |
|
||||
| `frontend/src/features/publish/model/usePublishController.ts` | 发布成功后 `triggerCleanup()` 传 `video.id`(替换签名 URL) |
|
||||
| `frontend/src/shared/contexts/CleanupContext.tsx` | 下载字段改为 `videoId`;兼容旧 `videoDownloadUrl` 回填;下载按钮改同源路径 |
|
||||
| `frontend/src/features/home/ui/PreviewPanel.tsx` | 首页下载改为同源下载接口 |
|
||||
| `frontend/src/features/home/ui/HomePage.tsx` | 透传 `generatedVideoId` 给 `PreviewPanel` |
|
||||
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | 弹窗支持点击遮罩关闭(`closeOnOverlay`) |
|
||||
| `frontend/src/features/home/ui/RewriteModal.tsx` | 弹窗支持点击遮罩关闭(`closeOnOverlay`) |
|
||||
| `frontend/src/features/publish/ui/PublishPage.tsx` | 扫码登录弹窗支持点击遮罩关闭 |
|
||||
| `frontend/src/components/AccountSettingsDropdown.tsx` | 修改密码弹窗支持点击遮罩关闭 |
|
||||
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 录音弹窗改为 `closeOnOverlay={!isRecording}`(录音中禁遮罩关闭) |
|
||||
| `Docs/DevLogs/Day31.md` | 移除下载修复章节与对应验证/覆盖项(迁入 Day32) |
|
||||
| `Docs/TASK_COMPLETE.md` | 当日新增 Day32 区块并接棒 Current(后续由 Day33 接棒 Current) |
|
||||
| `Docs/BACKEND_README.md` | 补充 `/api/videos/generated/{video_id}/download` 接口说明 |
|
||||
| `Docs/BACKEND_DEV.md` | 补充下载接口 `attachment` 约定 |
|
||||
| `Docs/FRONTEND_README.md` | 补充首页/发布弹窗下载统一同源接口说明 |
|
||||
| `Docs/FRONTEND_DEV.md` | 补充 CleanupContext 下载策略规范 |
|
||||
| `Docs/PUBLISH_DEPLOY.md` | 补充发布成功后同源下载联动说明 |
|
||||
| `README.md` | 补充”一键下载直达(同源 attachment)”能力描述 |
|
||||
| `backend/app/main.py` | `check_jwt_secret` startup 事件:生产环境(`DEBUG=False`)强拦截启动,开发环境 `CRITICAL` 告警 |
|
||||
| `backend/app/modules/ai/router.py` | 3 个端点加 `Depends(get_current_user)` 认证;错误返回改为通用消息 |
|
||||
| `backend/app/modules/tools/router.py` | `extract-script` 端点加 `Depends(get_current_user)` 认证;错误返回改为通用消息 |
|
||||
| `backend/app/modules/materials/router.py` | `stream` 端点新增 `..` 路径穿越拒绝;`delete` 端点补充 `except ValueError` → 400 |
|
||||
| `backend/app/modules/materials/service.py` | `delete_material` / `rename_material` 新增 `..` 路径穿越拒绝;流式上传增加 `MAX_UPLOAD_SIZE_MB` 大小限制 |
|
||||
| `backend/app/modules/ref_audios/service.py` | 参考音频上传增加 5MB 大小限制 |
|
||||
| `backend/app/modules/tools/service.py` | 文案提取文件上传替换为限大小分块拷贝(500MB);URL 下载分支增加下载后体积检查(500MB) |
|
||||
| `backend/app/modules/generated_audios/service.py` | 任务失败错误字段从 `traceback.format_exc()` 改为 `str(e)`,避免泄露内部路径 |
|
||||
|
||||
---
|
||||
|
||||
## 🔍 验证记录
|
||||
|
||||
- `python -m py_compile backend/app/modules/videos/router.py` ✅
|
||||
- `npm run build`(frontend)✅
|
||||
- `npm run build`(frontend,弹窗关闭策略调整后复验)✅
|
||||
- `pm2 restart vigent2-frontend` ✅
|
||||
- `pm2 restart vigent2-backend` ✅
|
||||
- `curl http://127.0.0.1:8006/health` 返回 `{"status":"ok"}` ✅
|
||||
- 安全修复第一批语法验证:`python -m py_compile backend/app/main.py backend/app/modules/materials/router.py backend/app/modules/tools/service.py backend/app/modules/ai/router.py backend/app/modules/tools/router.py backend/app/modules/materials/service.py backend/app/modules/ref_audios/service.py backend/app/modules/videos/router.py backend/app/modules/generated_audios/service.py` ✅
|
||||
- 未登录调用 `/api/ai/translate` → 返回 401 ✅
|
||||
- 未登录调用 `/api/tools/extract-script` → 返回 401 ✅
|
||||
- 收尾三刀语法验证:`python -m py_compile backend/app/main.py backend/app/modules/materials/router.py backend/app/modules/tools/service.py` ✅
|
||||
290
Docs/DevLogs/Day33.md
Normal file
@@ -0,0 +1,290 @@
|
||||
## Douyin short-link script extraction robustness fix (Day 33)

### Overview

Today focused on fixing intermittent failures in the script extraction assistant (「文案提取助手」) when it is given a Douyin share short link or share text, and on supporting the various landing URL shapes Douyin produces.

---

## ✅ 1) Problem recap

### Symptoms

- Pasting Douyin share text (containing a `v.douyin.com` short link) intermittently failed extraction.
- Pasting the address-bar link directly (e.g. `jingxuan?modal_id=...`) succeeded.

### Root cause

- `_download_douyin_manual` in `backend/app/modules/tools/service.py` previously extracted the video ID only from `/video/{id}`.
- The short-link redirect target is not always `/video/{id}`; other common forms include:
  - `/share/video/{id}`
  - `/user/...?...&vid={id}`
  - `/follow/search?...&modal_id={id}`
- Landing on one of these forms produced `Could not extract video_id`, so the fallback failed.
|
||||
|
||||
---
|
||||
|
||||
## ✅ 2) Fix

### 2.1 A unified parsing function

- Added `_extract_douyin_video_id(candidate_url)`, which parses these ID forms:
  - Path: `/video/{id}`, `/share/video/{id}`
  - Query parameters: `modal_id`, `vid`, `video_id`, `aweme_id`, `item_id`
  - Fallback regex match over the fully decoded URL string

### 2.2 Stronger fallback extraction

- `_download_douyin_manual` now:
  1. Extracts `video_id` from the post-redirect `final_url` first.
  2. On failure, extracts `video_id` from the original input `url`.
- The rest of the download path is unchanged: visit `m.douyin.com/share/video/{video_id}`, extract `play_addr`, and download.
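
The three-stage lookup (path, query parameters, decoded-string fallback) might look roughly like the sketch below. It is an illustration only, not the actual `_extract_douyin_video_id`: the regexes are guesses, and it assumes video IDs are purely numeric, which the log does not state.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, unquote, urlparse

# Assumption for this sketch: video IDs consist of digits only.
_PATH_RE = re.compile(r"/(?:share/)?video/(\d+)")
_QUERY_KEYS = ("modal_id", "vid", "video_id", "aweme_id", "item_id")
_FALLBACK_RE = re.compile(r"(?:modal_id|vid|video_id|aweme_id|item_id)=(\d+)")


def extract_douyin_video_id(candidate_url: str) -> Optional[str]:
    """Try path, then query parameters, then a regex over the decoded URL."""
    parsed = urlparse(candidate_url)

    # 1. Path forms: /video/{id} and /share/video/{id}
    match = _PATH_RE.search(parsed.path)
    if match:
        return match.group(1)

    # 2. Known query parameters, checked in a fixed priority order
    query = parse_qs(parsed.query)
    for key in _QUERY_KEYS:
        for value in query.get(key, []):
            if value.isdigit():
                return value

    # 3. Fallback: the ID may sit inside a percent-encoded nested URL
    match = _FALLBACK_RE.search(unquote(candidate_url))
    return match.group(1) if match else None
```

A caller following section 2.2 would try this on `final_url` first and on the original `url` only if that returns `None`.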
|
||||
|
||||
---
|
||||
|
||||
## 📁 今日修改文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `backend/app/modules/tools/service.py` | 新增 `_extract_douyin_video_id`;增强抖音 fallback 的 `video_id` 提取策略(兼容 `share/video`、`modal_id`、`vid` 等) |
|
||||
| `Docs/DevLogs/Day33.md` | 新增 Day33 开发日志,记录问题、根因、修复与验证 |
|
||||
|
||||
---
|
||||
|
||||
## 🔍 验证记录
|
||||
|
||||
- `python -m py_compile backend/app/modules/tools/service.py` ✅
|
||||
- URL 解析冒烟(函数级):
|
||||
- `jingxuan?modal_id=...` 可提取 ✅
|
||||
- `user?...&vid=...` 可提取 ✅
|
||||
- `follow/search?...&modal_id=...` 可提取 ✅
|
||||
- 下载链路冒烟(服务级):
|
||||
- 用户提供的短链口令文本可成功下载临时视频 ✅
|
||||
- 历史失败样例 `user?...&vid=...` 可成功走通 fallback 下载 ✅
|
||||
|
||||
---
|
||||
|
||||
## ✅ 3) 文案深度学习:抖音抓取 Playwright 降级增强
|
||||
|
||||
### 3.1 问题复盘
|
||||
|
||||
- 在「文案深度学习」博主分析链路里,抖音用户页有时返回 JS 壳页(含 `byted_acrawler`),静态 HTML 提取拿不到 `desc`。
|
||||
- 表现为:短链可解析 `sec_uid`,但标题抓取报错“页面结构可能已变更”。
|
||||
|
||||
### 3.2 修复方案
|
||||
|
||||
- 在 `backend/app/services/creator_scraper.py` 中新增 Playwright 降级抓取:
|
||||
1. 保留原 HTTP + `ttwid` 抓取作为首选(轻量、快)。
|
||||
2. 当 HTTP 提取不到标题时,自动切换 Playwright。
|
||||
3. 监听页面网络响应,定向捕获:
|
||||
- `/aweme/v1/web/aweme/post/`
|
||||
- `/aweme/v1/web/user/profile/other/`
|
||||
4. 解析响应 JSON 中 `desc` 作为视频标题来源,并提取博主昵称。
|
||||
- 仅在确实失败时返回更准确提示:
|
||||
- `抖音触发风控验证,暂时无法抓取标题,请稍后重试`
|
||||
|
||||
### 3.3 结果
|
||||
|
||||
- 给定短链 `https://v.douyin.com/hmFXdx5PvzQ/` 可稳定识别并完成标题抓取。
|
||||
- 抓取结果可获得有效博主昵称与约 50 条标题(受平台返回数据影响)。
|
||||
|
||||
### 3.4 本次新增/更新文件
|
||||
|
||||
| 文件 | 改动 |
|
||||
|------|------|
|
||||
| `backend/app/services/creator_scraper.py` | 新增抖音 Playwright 降级抓取、网络响应采集、标题/昵称解析优化、错误提示优化 |
|
||||
| `Docs/DevLogs/Day33.md` | 增补文案深度学习抖音抓取增强记录 |
|
||||
|
||||
### 3.5 验证记录
|
||||
|
||||
- `python -m py_compile backend/app/services/creator_scraper.py` ✅
|
||||
- 冒烟验证:
|
||||
- 短链重定向 + `sec_uid` 提取 ✅
|
||||
- HTTP 首选链路失败时自动切换 Playwright ✅
|
||||
- Playwright 网络响应中抓取到 `aweme/post` 数据并提取标题 ✅
|
||||
|
||||
---
|
||||
|
||||
## ✅ 4) First version of script deep learning (文案深度学习)

### 4.1 Backend

- New creator scraping service: `backend/app/services/creator_scraper.py`
  - `scrape_creator_titles(url)`: single entry point for platform detection and title scraping
  - `validate_url(url)`: enforced `https`, domain allowlist, public-address check on all DNS records, per-hop redirect validation
  - `cache_titles(titles, user_id)` / `get_cached_titles(analysis_id, user_id)`: 20-minute TTL, bound to the user
- GLM service extensions: `backend/app/services/glm_service.py`
  - `analyze_topics(titles)`: derive trending topics from titles (≤10)
  - `generate_script_from_topic(topic, word_count, titles)`: generate a script by topic and style
- New tool routes: `backend/app/modules/tools/router.py`
  - `POST /api/tools/analyze-creator`
  - `POST /api/tools/generate-topic-script`
  - Pydantic JSON request models + login check + unified `success_response`
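
One way a 20-minute, user-bound title cache could work is sketched below. The two function names match the ones listed above, but the in-memory dictionary, the `uuid`-based `analysis_id`, and the extra `now` parameter (added so expiry can be exercised without waiting) are assumptions of this example rather than the project's implementation.

```python
import time
import uuid
from typing import Dict, List, Optional, Tuple

TITLE_TTL_SECONDS = 20 * 60  # 20-minute TTL, as stated in the log

# analysis_id -> (owner user_id, expiry timestamp, titles)
_cache: Dict[str, Tuple[str, float, List[str]]] = {}


def cache_titles(
    titles: List[str], user_id: str, now: Optional[float] = None
) -> str:
    """Store titles for one user and return an opaque analysis_id."""
    current = time.monotonic() if now is None else now
    analysis_id = uuid.uuid4().hex
    _cache[analysis_id] = (user_id, current + TITLE_TTL_SECONDS, list(titles))
    return analysis_id


def get_cached_titles(
    analysis_id: str, user_id: str, now: Optional[float] = None
) -> Optional[List[str]]:
    """Return the titles only to their owner and only before expiry."""
    current = time.monotonic() if now is None else now
    entry = _cache.get(analysis_id)
    if entry is None:
        return None
    owner, expires_at, titles = entry
    if current >= expires_at:
        _cache.pop(analysis_id, None)  # drop the expired entry
        return None
    if owner != user_id:
        return None  # another user's analysis_id is treated as a miss
    return titles
```

Binding the entry to `user_id` means a leaked or guessed `analysis_id` cannot be used from a different account.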
|
||||
|
||||
### 4.2 前端实现
|
||||
|
||||
- 新增状态逻辑 Hook:`frontend/src/features/home/ui/script-learning/useScriptLearning.ts`
|
||||
- 流程状态:`input -> analyzing -> topics -> generating -> result`
|
||||
- 管理分析请求、生成请求、错误态、复制、重新生成
|
||||
- 新增弹窗组件:`frontend/src/features/home/ui/ScriptLearningModal.tsx`
|
||||
- 步骤式 UI:输入链接、话题单选、字数输入、结果展示、填入文案/复制
|
||||
- 接入首页交互:
|
||||
- `frontend/src/features/home/ui/ScriptEditor.tsx`:新增「文案深度学习」按钮
|
||||
- `frontend/src/features/home/model/useHomeController.ts`:新增 `learningModalOpen` 状态
|
||||
- `frontend/src/features/home/ui/HomePage.tsx`:挂载弹窗并支持回填主编辑器
|
||||
|
||||
### 4.3 交互位置与规则
|
||||
|
||||
- 按钮位置已按约定落位:
|
||||
- `历史文案` → `文案提取助手` → `文案深度学习` → `AI多语言`
|
||||
- 弹窗遵循当前统一策略:支持遮罩点击关闭(非关键流程弹窗)。
|
||||
|
||||
### 4.4 验证记录
|
||||
|
||||
- 后端语法检查:
|
||||
- `python -m py_compile backend/app/services/creator_scraper.py backend/app/services/glm_service.py backend/app/modules/tools/router.py` ✅
|
||||
- 前端构建:
|
||||
- `cd frontend && npm run build` ✅
|
||||
- 抖音短链样例联调:
|
||||
- `https://v.douyin.com/hmFXdx5PvzQ/` 可解析、可抓取标题(触发降级时可自动走 Playwright)✅
|
||||
|
||||
---
|
||||
|
||||
## ✅ 5) 抖音 Cookie 依赖澄清与 B站频率限制增强
|
||||
|
||||
### 5.1 抖音 Cookie 依赖澄清
|
||||
|
||||
- 文案深度学习的抖音抓取**不依赖发布管理页登录 Cookie**。
|
||||
- 当前链路使用:
|
||||
- 短链解析 + `sec_uid` 提取
|
||||
- 公共访问链路(`ttwid` + 页面/接口抓取)
|
||||
- 必要时 Playwright 降级
|
||||
- 因此用户即使未登录抖音,也可使用该功能(但仍可能受平台风控影响)。
|
||||
|
||||
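### 5.2 Bilibili "请求过于频繁" (too many requests) improvements

- Hardened Bilibili scraping in `backend/app/services/creator_scraper.py`:
  - Automatic retry on rate limiting (exponential backoff + random jitter)
  - Rate-limit detection (HTTP 412/429, error codes / error text)
  - Automatic switch to the Playwright fallback after the HTTP path fails
  - Final error message unified into a more understandable hint
  - `mid` extraction supports both the root path and sub-paths (e.g. `/upload/video`)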
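
Exponential backoff with random jitter, as mentioned above, can be sketched like this. The exception class, the parameter defaults, and the injectable `sleep` and `rng` arguments are all assumptions of the example; the real scraper is likely asynchronous and detects rate limiting from HTTP 412/429 or error payloads.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical marker for a rate-limited response (e.g. HTTP 412/429)."""


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call fetch(), retrying rate-limited attempts with growing delays."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter of up to 50% spreads out retries from parallel tasks.
            sleep(delay + rng() * delay * 0.5)
    raise RuntimeError("max_attempts must be at least 1")
```

Only the rate-limit error is retried here; any other exception propagates immediately, which keeps real failures from being hidden behind repeated waits.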
|
||||
|
||||
### 5.3 验证记录
|
||||
|
||||
- B站样例联调:`https://space.bilibili.com/8047632` 可抓取 50 条标题 ✅
|
||||
- 抖音短链复测:`https://v.douyin.com/hmFXdx5PvzQ/` 仍可抓取 50 条标题 ✅
|
||||
|
||||
---
|
||||
|
||||
## ✅ 6) 抖音 + B站 抓取可靠性二次增强
|
||||
|
||||
### 6.1 抖音增强
|
||||
|
||||
- `backend/app/services/creator_scraper.py`
|
||||
- `scrape_creator_titles(..., user_id)` 透传用户 ID,支持读取用户已登录平台 Cookie 作为增强上下文。
|
||||
- 抖音抓取新增可选用户 Cookie 注入(HTTP 请求 + Playwright 上下文)。
|
||||
- Playwright 降级抓取轮次从 4 次提升到 8 次,目标改为尽量补齐 `MAX_TITLES=50`。
|
||||
- 保留网络响应抓取主链路(`aweme/post` + `profile/other`),优先 `desc` 提取标题。
|
||||
|
||||
### 6.2 B站增强
|
||||
|
||||
- 新增 WBI 签名链路(主链路):
|
||||
- 获取 `wbi_img` key(兼容 `nav` 返回 `-101` 但携带 `wbi_img` 的场景)
|
||||
- 计算 `w_rid/wts` 后调用 `x/space/wbi/arc/search`
|
||||
- 多页拉取(分页累加)+ 标题去重,尽量补齐 50 条
|
||||
- 新增 B站会话预热:
|
||||
- `x/frontend/finger/spi` 获取并注入 `buvid3/buvid4`
|
||||
- 支持读取用户已登录 B站 Cookie(若存在)提升命中率
|
||||
- Playwright 降级增强:
|
||||
- 监听 `x/space/*/arc/search` 响应并解析有效 payload
|
||||
- 对捕获的 arc URL 进行 `context.request` 二次回放尝试
|
||||
|
||||
### 6.3 路由联动
|
||||
|
||||
- `backend/app/modules/tools/router.py`
|
||||
- `/api/tools/analyze-creator` 调用抓取时传入 `current_user.id`,用于平台 Cookie 增强。
|
||||
|
||||
### 6.4 结果说明
|
||||
|
||||
- 抖音:短链场景稳定性进一步提升,风控页下优先走 Playwright 降级抓取。
|
||||
- B站:已补齐签名链路与降级链路;但在平台强风控窗口仍可能返回“请求过于频繁/风控校验失败”,属于平台侧限制。
|
||||
|
||||
---
|
||||
|
||||
## ✅ 7) 抓取策略最终调整:抖音/B站改为 Playwright 直连
|
||||
|
||||
根据产品决策,将文案深度学习的博主标题抓取策略统一为 **Playwright 直连主链路**,不再使用“HTTP 主链路 + Playwright 降级”。
|
||||
|
||||
### 7.1 调整内容
|
||||
|
||||
- `backend/app/services/creator_scraper.py`
|
||||
- `_scrape_douyin()` 改为直接调用 `_scrape_douyin_with_playwright()`。
|
||||
- `_scrape_bilibili()` 改为直接调用 `_scrape_bilibili_with_playwright()`。
|
||||
- 两个平台均保留 2 次 Playwright 抓取重试。
|
||||
- 支持优先读取用户隔离 Cookie,若缺失再尝试旧版全局 Cookie。
|
||||
- `backend/app/modules/tools/router.py`
|
||||
- `analyze-creator` 继续传入 `current_user.id`,用于匹配用户 Cookie 上下文。
|
||||
|
||||
### 7.2 影响评估
|
||||
|
||||
- 影响范围仅限「文案深度学习」抓取链路。
|
||||
- **不影响**:视频自动化发布、文案提取助手(extract-script)现有流程。
|
||||
|
||||
### 7.3 验证
|
||||
|
||||
- 抖音短链样例:`https://v.douyin.com/hmFXdx5PvzQ/` 抓取成功,50 条。
|
||||
- B站样例:
|
||||
- `https://space.bilibili.com/256237759?spm_id_from=...` 抓取成功,40 条。
|
||||
- `https://space.bilibili.com/1140672573` 抓取成功,40 条。
|
||||
|
||||
---
|
||||
|
||||
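## ✅ 8) Unified GLM call path and better timeout experience

### 8.1 Symptom

- "生成文案" (generate script) in script deep learning intermittently failed in the frontend with `timeout of 30000ms exceeded`.

### 8.2 Cause

- The main cause was a frontend request timeout that was too short (30s), easily exceeded while the model was queueing or generating long text.
- Although the backend goes through `glm_service` everywhere, each method repeated the SDK call code internally, which raised the maintenance cost.

### 8.3 Changes

- Frontend: the `generate-topic-script` timeout goes from 30s to 90s, with a clearer timeout message.
- Backend: `backend/app/services/glm_service.py`
  - Added `_call_glm(...)` as the single call entry point (unified model / thinking / to_thread / timeout).
  - `generate_title_tags / rewrite_script / analyze_topics / generate_script_from_topic / translate_text` all reuse this entry point.
  - `settings.GLM_MODEL` stays the single configuration point, so calls are not scattered across the code.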
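
The `to_thread` + timeout combination named above can be sketched with the standard library alone. The helper `call_blocking` is hypothetical and stands in for a wrapper around a blocking SDK call; it does not reproduce the real `_call_glm` signature or any GLM SDK API.

```python
import asyncio
from typing import Any, Callable


async def call_blocking(
    func: Callable[..., Any], *args: Any, timeout: float = 60.0, **kwargs: Any
) -> Any:
    """Run a blocking call in a worker thread, bounded by a timeout."""
    # asyncio.to_thread keeps the event loop responsive while func blocks.
    worker = asyncio.to_thread(func, *args, **kwargs)
    # wait_for raises a timeout error after `timeout` seconds. The worker
    # thread is not interrupted; it finishes in the background.
    return await asyncio.wait_for(worker, timeout=timeout)
```

Routing every model call through one helper like this means the timeout and threading policy can be adjusted in a single place, which is the stated goal of the refactor.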
|
||||
|
||||
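### 8.4 Result

- GLM calls now follow one standard, so later parameter changes touch a single place.
- Frontend timeout errors dropped significantly; when a real timeout does occur, the user sees an understandable message.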
|
||||
|
||||
---
|
||||
|
||||
## ✅ 9) 三个文案弹窗操作按钮统一
|
||||
|
||||
### 9.1 目标
|
||||
|
||||
- 统一「文案提取助手」「AI 智能改写」「文案深度学习」结果页操作按钮的位置、样式与主次关系。
|
||||
|
||||
### 9.2 调整
|
||||
|
||||
- `frontend/src/features/home/ui/ScriptExtractionModal.tsx`
|
||||
- 结果页按钮从“分散在标题右侧 + 底部单独按钮”改为统一底部 Action Grid。
|
||||
- 按钮统一为:`填入文案`、`复制`、`提取下一个`、`关闭`。
|
||||
- `frontend/src/features/home/ui/RewriteModal.tsx`
|
||||
- 结果页按钮改为统一底部 Action Grid。
|
||||
- 新增复制按钮(含 clipboard fallback)。
|
||||
- 按钮统一为:`填入文案`、`复制`、`重新生成`、`保留原文`。
|
||||
- `frontend/src/features/home/ui/ScriptLearningModal.tsx`
|
||||
- 维持同一 Action Grid 风格:`填入文案`、`复制`、`重新生成`、`换个话题`。
|
||||
|
||||
### 9.3 验证
|
||||
|
||||
- `cd frontend && npm run build` ✅
|
||||
@@ -389,7 +389,7 @@ if not qr_element:
|
||||
|
||||
## 📋 文档规则优化 (16:42 - 17:10)
|
||||
|
||||
**问题**:Doc_Rules需要优化,避免误删历史内容、规范工具使用、防止任务清单遗漏
|
||||
**问题**:DOC_RULES需要优化,避免误删历史内容、规范工具使用、防止任务清单遗漏
|
||||
|
||||
**优化内容(最终版)**:
|
||||
|
||||
@@ -411,7 +411,7 @@ if not qr_element:
|
||||
- 移除无关项目组件
|
||||
|
||||
**修改文件**:
|
||||
- `Docs/Doc_Rules.md` - 包含检查清单的最终完善版
|
||||
- `Docs/DOC_RULES.md` - 包含检查清单的最终完善版
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -228,7 +228,7 @@ else:
|
||||
|
||||
| 文件 | 说明 | 状态 |
|
||||
|------|------|------|
|
||||
| `src/lib/auth.ts` | 认证工具函数 | ✅ |
|
||||
| `src/shared/lib/auth.ts` | 认证工具函数 | ✅ |
|
||||
| `src/app/login/page.tsx` | 登录页 | ✅ |
|
||||
| `src/app/register/page.tsx` | 注册页 | ✅ |
|
||||
| `src/app/admin/page.tsx` | 管理后台 | ✅ |
|
||||
|
||||
@@ -6,13 +6,14 @@
|
||||
|
||||
## ⚡ 核心原则
|
||||
|
||||
| 规则 | 说明 |
|
||||
|------|------|
|
||||
| **默认更新** | 只更新 `DayN.md` |
|
||||
| **按需更新** | `task_complete.md` 仅在用户**明确要求**时更新 |
|
||||
| **智能修改** | 错误→替换,改进→追加(见下方详细规则) |
|
||||
| **先读后写** | 更新前先查看文件当前内容 |
|
||||
| **日内合并** | 同一天的多次小修改合并为最终版本 |
|
||||
| 规则 | 说明 |
|
||||
|------|------|
|
||||
| **默认更新** | 更新 `DayN.md` 和 `TASK_COMPLETE.md` |
|
||||
| **按需更新** | 其他文档仅在内容变化涉及时更新 |
|
||||
| **链路对齐** | 新增/重构文档后,回写入口文档(`README.md` 或对应 `*_README.md`) |
|
||||
| **智能修改** | 错误→替换,改进→追加(见下方详细规则) |
|
||||
| **先读后写** | 更新前先查看文件当前内容 |
|
||||
| **日内合并** | 同一天的多次小修改合并为最终版本 |
|
||||
|
||||
---
|
||||
|
||||
@@ -20,15 +21,19 @@
|
||||
|
||||
> **每次提交重要变更时,请核对以下文件是否需要同步:**
|
||||
|
||||
| 优先级 | 文件路径 | 检查重点 |
|
||||
| :---: | :--- | :--- |
|
||||
| 🔥 **High** | `Docs/DevLogs/DayN.md` | **(最新日志)** 详细记录变更、修复、代码片段 |
|
||||
| 🔥 **High** | `Docs/task_complete.md` | **(任务总览)** 更新 `[x]`、进度条、时间线 |
|
||||
| ⚡ **Med** | `README.md` | **(项目主页)** 功能特性、技术栈、最新截图 |
|
||||
| ⚡ **Med** | `Docs/DEPLOY_MANUAL.md` | **(部署手册)** 环境变量、依赖包、启动命令变更 |
|
||||
| ⚡ **Med** | `Docs/FRONTEND_DEV.md` | **(前端规范)** API封装、日期格式化、新页面规范 |
|
||||
| 🧊 **Low** | `Docs/implementation_plan.md` | **(实施计划)** 核对计划与实际实现的差异 |
|
||||
| 🧊 **Low** | `frontend/README.md` | **(前端文档)** 新页面路由、组件用法、UI变更 |
|
||||
| 优先级 | 文件路径 | 检查重点 |
|
||||
| :---: | :--- | :--- |
|
||||
| 🔥 **High** | `Docs/DevLogs/DayN.md` | **(最新日志)** 详细记录变更、修复、代码片段 |
|
||||
| 🔥 **High** | `Docs/TASK_COMPLETE.md` | **(任务总览)** 更新 Day Current、`[x]` 与更新时间 |
|
||||
| ⚡ **Med** | `README.md` | **(项目主页)** 功能特性、技术栈、最新截图 |
|
||||
| ⚡ **Med** | `Docs/DEPLOY_MANUAL.md` | **(部署手册)** 环境变量、依赖包、启动命令变更 |
|
||||
| ⚡ **Med** | `Docs/PUBLISH_DEPLOY.md` | **(发布专项)** 四平台登录/发布实现、排障、验收流程 |
|
||||
| ⚡ **Med** | `Docs/BACKEND_DEV.md` | **(后端规范)** 接口契约、模块划分、环境变量 |
|
||||
| ⚡ **Med** | `Docs/BACKEND_README.md` | **(后端文档)** 接口说明、架构设计 |
|
||||
| ⚡ **Med** | `Docs/FRONTEND_DEV.md` | **(前端规范)** API封装、日期格式化、新页面规范 |
|
||||
| ⚡ **Med** | `Docs/FRONTEND_README.md` | **(前端文档)** 功能说明、页面变更 |
|
||||
| 🧊 **Low** | `Docs/DOC_RULES.md` | **(规则文档)** 文档结构变化或流程变化时同步更新 |
|
||||
| 🧊 **Low** | `Docs/*_DEPLOY.md` | **(子系统部署)** LatentSync/CosyVoice/字幕等独立部署文档 |
|
||||
|
||||
---
|
||||
|
||||
@@ -87,13 +92,13 @@
|
||||
|
||||
---
|
||||
|
||||
## 🔍 更新前检查清单
|
||||
## 🔍 更新前检查清单
|
||||
|
||||
> **核心原则**:追加前先查找,避免重复和遗漏
|
||||
|
||||
### 必须执行的检查步骤
|
||||
|
||||
**1. 快速浏览全文**(使用 `view_file` 或 `grep_search`)
|
||||
**1. 快速浏览全文**(使用 `Read` 或 `Grep`)
|
||||
```markdown
|
||||
# 检查是否存在:
|
||||
- 同主题的旧章节?
|
||||
@@ -110,12 +115,20 @@
|
||||
| **有待验证状态** | 更新状态标记 |
|
||||
| **全新独立内容** | 追加到末尾 |
|
||||
|
||||
**3. 必须更新的内容**
|
||||
**3. 必须更新的内容**
|
||||
|
||||
- ✅ **状态标记**:`🔄 待验证` → `✅ 已修复` / `❌ 失败`
|
||||
- ✅ **进度百分比**:更新为最新值
|
||||
- ✅ **文件修改列表**:补充新修改的文件
|
||||
- ❌ **禁止**:创建重复的章节标题
|
||||
- ✅ **文件修改列表**:补充新修改的文件
|
||||
- ❌ **禁止**:创建重复的章节标题
|
||||
|
||||
### 发布相关变更的三检(新增)
|
||||
|
||||
若涉及抖音/微信/B站/小红书发布或扫码登录,额外执行:
|
||||
|
||||
1. **路由真值检查**:以 `backend/app/modules/publish/router.py` 为准校验 API 路径,避免文档写成旧路径(例如 `/screenshots/`)。
|
||||
2. **专项文档对齐**:更新 `Docs/PUBLISH_DEPLOY.md` 中对应平台章节(登录、发布判定、排障)。
|
||||
3. **入口文档回写**:至少回写一处入口文档(`README.md` 或 `Docs/BACKEND_README.md` / `Docs/DEPLOY_MANUAL.md`)。
|
||||
|
||||
### 示例场景
|
||||
|
||||
@@ -136,67 +149,47 @@
|
||||
|
||||
---
|
||||
|
||||
## ️ 工具使用规范
|
||||
## ️ 工具使用规范
|
||||
|
||||
> **核心原则**:使用正确的工具,避免字符编码问题
|
||||
|
||||
### ✅ 推荐工具:replace_file_content
|
||||
### ✅ 推荐工具:Read / Grep / apply_patch
|
||||
|
||||
**使用场景**:
|
||||
- 追加新章节到文件末尾
|
||||
- 修改/替换现有章节内容
|
||||
- 更新状态标记(🔄 → ✅)
|
||||
- 修正错误内容
|
||||
|
||||
**优势**:
|
||||
- ✅ 自动处理字符编码(Windows CRLF)
|
||||
- ✅ 精确替换,不会误删其他内容
|
||||
- ✅ 有错误提示,方便调试
|
||||
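**When to use each tool**:
- `Read`: view the file's current content before updating it
- `apply_patch`: replace existing content precisely, or append a new section
- `Grep`: search the file for an existing section on the same topic
- `Write`: create a new file (such as Day{N+1}.md)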
|
||||
|
||||
**注意事项**:
|
||||
```markdown
|
||||
1. **必须精确匹配**:TargetContent 必须与文件完全一致
|
||||
2. **处理换行符**:文件使用 \r\n,不要漏掉 \r
|
||||
3. **合理范围**:StartLine/EndLine 应覆盖目标内容
|
||||
4. **先读后写**:编辑前先 view_file 确认内容
|
||||
1. **先读后写**:编辑前先用 Read 确认内容
|
||||
2. **精确匹配**:`apply_patch` 的上下文必须与文件内容一致
|
||||
3. **避免重复**:编辑前用 Grep 检查是否已存在同主题章节
|
||||
```
|
||||
|
||||
### ❌ 禁止使用:命令行工具
|
||||
### ❌ 禁止使用:命令行工具修改文档
|
||||
|
||||
**禁止场景**:
|
||||
- ❌ 使用 `echo >>` 追加内容(编码问题)
|
||||
- ❌ 使用 PowerShell 直接修改文档(破坏格式)
|
||||
- ❌ 使用 sed/awk 等命令行工具
|
||||
- ❌ 使用 `echo >>` 追加内容
|
||||
- ❌ 使用 `sed` / `awk` 修改文档
|
||||
- ❌ 使用 `cat <<EOF` 写入内容
|
||||
|
||||
**原因**:
|
||||
- 容易破坏 UTF-8 编码
|
||||
- Windows CRLF vs Unix LF 混乱
|
||||
- 容易破坏 UTF-8 编码和中文字符
|
||||
- 难以追踪修改,容易出错
|
||||
|
||||
**唯一例外**:简单的全局文本替换(如批量更新日期),且必须使用 `-NoNewline` 参数
|
||||
- 无法精确匹配替换位置
|
||||
|
||||
### 📝 最佳实践示例
|
||||
|
||||
**追加新章节**:
|
||||
```python
|
||||
replace_file_content(
|
||||
TargetFile="path/to/DayN.md",
|
||||
TargetContent="## 🔗 相关文档\n\n...\n\n", # 文件末尾的内容
|
||||
ReplacementContent="## 🔗 相关文档\n\n...\n\n---\n\n## 🆕 新章节\n内容...",
|
||||
StartLine=280,
|
||||
EndLine=284
|
||||
)
|
||||
```
|
||||
|
||||
**修改现有内容**:
|
||||
```python
|
||||
replace_file_content(
|
||||
TargetContent="**状态**:🔄 待修复",
|
||||
ReplacementContent="**状态**:✅ 已修复",
|
||||
StartLine=310,
|
||||
EndLine=310
|
||||
)
|
||||
```
|
||||
**追加新章节**:使用 `apply_patch`,以文件末尾稳定上下文为锚点追加。
|
||||
|
||||
**修改现有内容**:使用 `apply_patch` 精确替换。
|
||||
```markdown
|
||||
@@
|
||||
-**状态**:🔄 待修复
|
||||
+**状态**:✅ 已修复
|
||||
```
|
||||
|
||||
|
||||
---
|
||||
@@ -204,12 +197,20 @@ replace_file_content(
|
||||
## 📁 文件结构
|
||||
|
||||
```
|
||||
ViGent/Docs/
|
||||
├── task_complete.md # 任务总览(仅按需更新)
|
||||
├── Doc_Rules.md # 本文件
|
||||
ViGent2/Docs/
|
||||
├── TASK_COMPLETE.md # 任务总览(仅按需更新)
|
||||
├── DOC_RULES.md # 本文件
|
||||
├── BACKEND_DEV.md # 后端开发规范
|
||||
├── BACKEND_README.md # 后端功能文档
|
||||
├── FRONTEND_DEV.md # 前端开发规范
|
||||
├── DEPLOY_MANUAL.md # 部署手册
|
||||
├── SUPABASE_DEPLOY.md # Supabase 部署文档
|
||||
├── FRONTEND_README.md # 前端功能文档
|
||||
├── DEPLOY_MANUAL.md # 部署手册
|
||||
├── PUBLISH_DEPLOY.md # 多平台发布专项文档
|
||||
├── SUPABASE_DEPLOY.md # Supabase 部署文档
|
||||
├── LATENTSYNC_DEPLOY.md # LatentSync 部署文档
|
||||
├── COSYVOICE3_DEPLOY.md # 声音克隆部署文档
|
||||
├── ALIPAY_DEPLOY.md # 支付宝付费部署文档
|
||||
├── SUBTITLE_DEPLOY.md # 字幕系统部署文档
|
||||
└── DevLogs/
|
||||
├── Day1.md # 开发日志
|
||||
└── ...
|
||||
@@ -219,8 +220,16 @@ ViGent/Docs/
|
||||
|
||||
## 📅 DayN.md 更新规则(日常更新)
|
||||
|
||||
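### When to update

> **Record as you develop; do not leave it until the end.**

- After finishing each feature or fix, append it to DayN.md **immediately**
- Do not batch everything up for the end of the conversation, where changes are easily missed
- The same applies to `TASK_COMPLETE.md`: sync it promptly once an important change is done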
|
||||
|
||||
### 新建判断 (对话开始前)
|
||||
1. **回顾进度**:查看 `task_complete.md` 了解当前状态
|
||||
1. **回顾进度**:查看 `TASK_COMPLETE.md` 了解当前状态
|
||||
2. **检查日期**:查看最新 `DayN.md`
|
||||
- **今天 (与当前日期相同)** → 🚨 **绝对禁止创建新文件**,必须**追加**到现有 `DayN.md` 末尾!即使是完全不同的功能模块。
|
||||
- **之前 (昨天或更早)** → 创建 `Day{N+1}.md`
|
||||
@@ -252,18 +261,27 @@ ViGent/Docs/
|
||||
**状态**:✅ 已修复 / 🔄 待验证
|
||||
```
|
||||
|
||||
### ⚠️ 注意
|
||||
- **DayN.md 文件开头禁止使用 `---`**,避免被解析为 Front Matter。
|
||||
- 分隔线只用于章节之间,不作为文件第一行。
|
||||
|
||||
---
|
||||
|
||||
## 📏 内容简洁性规则
|
||||
## 📏 内容简洁性规则
|
||||
|
||||
### 代码示例长度控制
|
||||
- **原则**:只展示关键代码片段(10-20行以内)
|
||||
- **超长代码**:使用 `// ... 省略 ...` 或仅列出文件名+行号
|
||||
- **完整代码**:引用文件链接,而非粘贴全文
|
||||
|
||||
### 调试信息处理
|
||||
- **临时调试**:验证后删除(如调试日志、测试截图)
|
||||
- **有价值信息**:保留(如错误日志、性能数据)
|
||||
### 调试信息处理
|
||||
- **临时调试**:验证后删除(如调试日志、测试截图)
|
||||
- **有价值信息**:保留(如错误日志、性能数据)
|
||||
|
||||
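### Handling sensitive information
- **Never write to disk**: cookie values, tokens, keys, full phone numbers, payment credentials.
- **Quoting logs**: record only the necessary keywords and the conclusion; avoid pasting large chunks of raw logs.
- **Quoting paths**: prefer relative paths and file names; do not record unrelated personal directory information.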
|
||||
|
||||
### 状态标记更新
|
||||
- **🔄 待验证** → 验证后更新为 **✅ 已修复** 或 **❌ 失败**
|
||||
@@ -272,37 +290,37 @@ ViGent/Docs/
|
||||
|
||||
---
|
||||
|
||||
## 📝 task_complete.md 更新规则(仅按需)
|
||||
## 📝 TASK_COMPLETE.md 更新规则
|
||||
|
||||
> ⚠️ **仅当用户明确要求更新 `task_complete.md` 时才更新**
|
||||
> 与 DayN.md 同步更新,记录重要变更时更新任务总览。
|
||||
|
||||
### 更新原则
|
||||
- **格式一致性**:直接参考 `task_complete.md` 现有格式追加内容。
|
||||
- **格式一致性**:直接参考 `TASK_COMPLETE.md` 现有格式追加内容。
|
||||
- **进度更新**:仅在阶段性里程碑时更新进度百分比。
|
||||
|
||||
### 🔍 完整性检查清单 (必做)
|
||||
|
||||
每次更新 `task_complete.md` 时,必须**逐一检查**以下所有板块:
|
||||
|
||||
1. **文件头部 & 导航**
|
||||
- [ ] `更新时间`:必须是当天日期
|
||||
- [ ] `整体进度`:简述当前状态
|
||||
- [ ] `快速导航`:Day 范围与文档一致
|
||||
|
||||
2. **核心任务区**
|
||||
- [ ] `已完成任务`:添加新的 [x] 项目
|
||||
- [ ] `后续规划`:管理三色板块 (优先/债务/未来)
|
||||
|
||||
3. **统计与回顾**
|
||||
- [ ] `进度统计`:更新对应模块状态和百分比
|
||||
- [ ] `里程碑`:若有重大进展,追加 `## Milestone N`
|
||||
|
||||
4. **底部链接**
|
||||
- [ ] `时间线`:追加今日概括
|
||||
- [ ] `相关文档`:更新 DayLog 链接范围
|
||||
|
||||
> **口诀**:头尾时间要对齐,任务规划两手抓,里程碑上别落下。
|
||||
### 🔍 完整性检查清单 (必做)
|
||||
|
||||
每次更新 `TASK_COMPLETE.md` 时,必须**逐一检查**以下板块:
|
||||
|
||||
1. **文件头部**
|
||||
- [ ] `更新时间`:必须是当天日期
|
||||
- [ ] `整体进度`:与当前 Day 状态一致(例如 Day31)
|
||||
|
||||
2. **当日 Current 区块**
|
||||
- [ ] 新增/更新 `Day N (Current)` 标题
|
||||
- [ ] 关键任务以 `[x]` 列出(避免仅写结论)
|
||||
- [ ] 前一天 Day 标题取消 `(Current)` 标记
|
||||
|
||||
3. **Roadmap 与模块状态**
|
||||
- [ ] 如有已完成长期事项,及时从待办迁移到已完成
|
||||
- [ ] 模块完成度有变化时同步更新
|
||||
|
||||
4. **相关文档链接**
|
||||
- [ ] 新增的核心文档(如 `PUBLISH_DEPLOY.md`)要在相关位置可追溯
|
||||
- [ ] 若 DayN 记录了“文档回写”,`TASK_COMPLETE.md` 的当日条目也要体现
|
||||
|
||||
> **口诀**:头部日期、当日 Current、模块状态、链接可追溯。
|
||||
|
||||
---
|
||||
|
||||
**最后更新**:2026-01-23
|
||||
**最后更新**:2026-03-03
|
||||
|
||||
@@ -1,19 +1,90 @@
|
||||
# 前端开发规范
|
||||
|
||||
## 文档定位
|
||||
|
||||
- 本文档只定义前端开发规范与约束(结构、交互、持久化、接口调用、Checklist)。
|
||||
- 功能说明与启动方式请查看 `Docs/FRONTEND_README.md`。
|
||||
- 历史变更请记录在 `Docs/DevLogs/` 与 `Docs/TASK_COMPLETE.md`,不要写入本规范文档。
|
||||
|
||||
## 目录结构
|
||||
|
||||
采用轻量 FSD(Feature-Sliced Design)结构:
|
||||
|
||||
```
|
||||
frontend/src/
|
||||
├── app/ # Next.js App Router 页面
|
||||
│ ├── page.tsx # 首页(视频生成)
|
||||
│ ├── publish/ # 发布页面
|
||||
│ ├── admin/ # 管理员页面
|
||||
│ ├── login/ # 登录页面
|
||||
│ └── register/ # 注册页面
|
||||
├── lib/ # 公共工具函数
|
||||
│ ├── axios.ts # Axios 实例(含 401/403 拦截器)
|
||||
│ └── auth.ts # 认证相关函数
|
||||
└── proxy.ts # 路由代理(原 middleware)
|
||||
├── app/ # Next.js App Router 页面入口
|
||||
│ ├── page.tsx # 首页(视频生成)
|
||||
│ ├── publish/ # 发布管理页
|
||||
│ ├── admin/ # 管理员页面
|
||||
│ ├── login/ # 登录
|
||||
│ ├── register/ # 注册
|
||||
│ └── pay/ # 付费开通会员
|
||||
├── features/ # 功能模块(按业务拆分)
|
||||
│ ├── home/
|
||||
│ │ ├── model/ # 业务逻辑 hooks
|
||||
│ │ │ ├── useHomeController.ts # 主控制器
|
||||
│ │ │ ├── useHomePersistence.ts # 持久化管理
|
||||
│ │ │ ├── useBgm.ts
|
||||
│ │ │ ├── useGeneratedVideos.ts
|
||||
│ │ │ ├── useGeneratedAudios.ts
|
||||
│ │ │ ├── useMaterials.ts
|
||||
│ │ │ ├── useMediaPlayers.ts
|
||||
│ │ │ ├── useRefAudios.ts
|
||||
│ │ │ ├── useSavedScripts.ts
|
||||
│ │ │ ├── useTimelineEditor.ts
|
||||
│ │ │ └── useTitleSubtitleStyles.ts
|
||||
│ │ └── ui/ # UI 组件(纯 props + 回调)
|
||||
│ │ ├── HomePage.tsx
|
||||
│ │ ├── HomeHeader.tsx
|
||||
│ │ ├── MaterialSelector.tsx
|
||||
│ │ ├── ScriptEditor.tsx
|
||||
│ │ ├── ScriptExtractionModal.tsx
|
||||
│ │ ├── RewriteModal.tsx
|
||||
│ │ ├── ScriptLearningModal.tsx
|
||||
│ │ ├── script-extraction/
|
||||
│ │ │ └── useScriptExtraction.ts
|
||||
│ │ ├── script-learning/
|
||||
│ │ │ └── useScriptLearning.ts
|
||||
│ │ ├── TitleSubtitlePanel.tsx
|
||||
│ │ ├── FloatingStylePreview.tsx
|
||||
│ │ ├── VoiceSelector.tsx
|
||||
│ │ ├── RefAudioPanel.tsx
|
||||
│ │ ├── GeneratedAudiosPanel.tsx
|
||||
│ │ ├── TimelineEditor.tsx
|
||||
│ │ ├── ClipTrimmer.tsx
|
||||
│ │ ├── BgmPanel.tsx
|
||||
│ │ ├── GenerateActionBar.tsx
|
||||
│ │ ├── PreviewPanel.tsx
|
||||
│ │ └── HistoryList.tsx
|
||||
│ └── publish/
|
||||
│ ├── model/
|
||||
│ │ └── usePublishController.ts
|
||||
│ └── ui/
|
||||
│ └── PublishPage.tsx
|
||||
├── shared/ # 跨功能共享
|
||||
│ ├── api/
|
||||
│ │ ├── axios.ts # Axios 实例(含 401/403 拦截器)
|
||||
│ │ └── types.ts # 统一响应类型
|
||||
│ ├── lib/
|
||||
│ │ ├── media.ts # API Base / URL / 日期等通用工具
|
||||
│ │ ├── auth.ts # 认证相关函数
|
||||
│ │ └── title.ts # 标题输入处理
|
||||
│ ├── hooks/
|
||||
│ │ ├── useTitleInput.ts
|
||||
│ │ └── usePublishPrefetch.ts
|
||||
│ ├── ui/
|
||||
│ │ ├── SelectPopover.tsx # 统一下拉/BottomSheet 选择器
|
||||
│ │ └── AppModal.tsx # 统一弹窗基座
|
||||
│ ├── types/
|
||||
│ │ ├── user.ts # User 类型定义
|
||||
│ │ └── publish.ts # 发布相关类型
|
||||
│ └── contexts/ # 全局 Context(Auth、Task、Cleanup)
|
||||
│ ├── AuthContext.tsx
|
||||
│ ├── TaskContext.tsx
|
||||
│ └── CleanupContext.tsx
|
||||
├── components/ # 遗留通用组件
|
||||
│ └── VideoPreviewModal.tsx
|
||||
└── proxy.ts # Next.js middleware(路由保护)
|
||||
```
|
||||
|
||||
---
|
||||
@@ -94,20 +165,113 @@ body {
|
||||
| `sm:` | ≥ 640px | 平板/桌面 |
|
||||
| `lg:` | ≥ 1024px | 大屏桌面 |
|
||||
|
||||
### embedded 组件模式
|
||||
|
||||
合并板块时,子组件通过 `embedded?: boolean` prop 控制是否渲染外层卡片容器和主标题。
|
||||
|
||||
```tsx
|
||||
// embedded=false(独立使用):渲染完整卡片
|
||||
<div className="bg-white/5 rounded-2xl p-6 border border-white/10">
|
||||
<h2>标题</h2>
|
||||
{content}
|
||||
</div>
|
||||
|
||||
// embedded=true(嵌入父卡片):只渲染内容
|
||||
{content}
|
||||
```
|
||||
|
||||
- 子标题使用 `<h3 className="text-sm font-medium text-gray-400">`
|
||||
- 分隔线使用 `<div className="border-t border-white/10 my-4" />`
|
||||
- 移动端标题行避免 `whitespace-nowrap`,长描述文字可用 `hidden sm:inline` 在移动端隐藏
|
||||
|
||||
### 按钮视觉层级
|
||||
|
||||
| 层级 | 样式 | 用途 |
|
||||
|------|------|------|
|
||||
| 主操作 | `px-4 py-2 text-sm font-medium bg-gradient-to-r from-purple-600 to-pink-600 shadow-sm` | 生成配音、立即发布 |
|
||||
| 辅助操作 | `px-2 py-1 text-xs bg-white/10 rounded` | 刷新、上传、语速 |
|
||||
| 触屏可见 | `opacity-40 group-hover:opacity-100` | 列表行内操作(编辑/删除) |
|
||||
|
||||
---
|
||||
|
||||
## 统一下拉选择器规范 (SelectPopover)
|
||||
|
||||
首页/发布页的业务选择项(音色、参考音频、配音、素材、BGM、作品、样式、模型、画面比例)统一使用 `@/shared/ui/SelectPopover`:
|
||||
|
||||
- 桌面端使用 Popover,移动端自动切换 BottomSheet
|
||||
- 触发器与面板风格统一:`border-white/10 + bg-black/25`(或同级变体)
|
||||
- 下拉项选中态统一:`border-purple-500 bg-purple-500/20`
|
||||
- 选中项需添加 `data-popover-selected="true"`,确保再次打开时自动滚动定位到已选项
|
||||
- 底部空间不足时自动上拉;滚动条隐藏但保留滚动能力
|
||||
|
||||
### 视频预览与下拉层级
|
||||
|
||||
- 下拉菜单层级应低于视频预览弹窗,避免遮挡预览内容
|
||||
- 在下拉内点击“预览”时,不强制关闭下拉(便于连续预览)
|
||||
- 关闭预览后,用户可继续在下拉内操作;点击外部时下拉正常收起
|
||||
|
||||
### 例外说明
|
||||
|
||||
- `ScriptEditor` 的“历史文案 / AI多语言”保持原有轻量菜单样式,不强制迁移到 `SelectPopover`
|
||||
|
||||
---
|
||||
|
||||
## 统一弹窗规范 (AppModal)
|
||||
|
||||
所有居中弹窗(如视频预览、文案提取、AI 改写、文案深度学习、文案扩展编辑、录音、密码修改)统一使用 `@/shared/ui/AppModal` + `AppModalHeader`:
|
||||
|
||||
- 统一遮罩与层级:`fixed inset-0` + `bg-black/80` + `backdrop-blur-sm` + 明确 `z-index`
|
||||
- 统一挂载位置:通过 Portal 挂载到 `document.body`,避免局部容器/层叠上下文影响,确保是全页面弹窗
|
||||
- 统一容器风格:`border-white/10`、深色半透明背景、圆角 `rounded-2xl`、重阴影
|
||||
- 统一关闭行为:支持 `ESC`;是否允许点击遮罩关闭通过 `closeOnOverlay` 显式配置
|
||||
- 默认策略:除关键流程外,`closeOnOverlay` 默认应为 `true`,并通过 `AppModalHeader onClose` 提供右上角 `X` 关闭入口
|
||||
- 关键流程例外:发布成功清理弹窗(`CleanupContext`)必须保持 `closeOnOverlay=false`,且不提供右上角关闭按钮
|
||||
- 录音弹窗例外:使用 `closeOnOverlay={!isRecording}`,录音中禁止遮罩关闭,避免误触中断
|
||||
- 统一滚动策略:弹窗打开时锁定背景滚动(`lockBodyScroll`),内容区自行滚动
|
||||
- 特殊层级场景(例如视频预览压过下拉)使用更高 `z-index`(如 `z-[320]`)
|
||||
|
||||
### 文案类弹窗结果操作栏规范
|
||||
|
||||
适用组件:
|
||||
- `ScriptExtractionModal`
|
||||
- `RewriteModal`
|
||||
- `ScriptLearningModal`
|
||||
|
||||
统一要求:
|
||||
- 结果页操作按钮统一放在内容底部(Action Grid),避免“标题右上角按钮 + 底部按钮”混排。
|
||||
- 主按钮统一为高亮渐变(如「填入文案」),其余按钮统一次级样式(`bg-white/10`)。
|
||||
- 动作文案尽量统一:`填入文案` / `复制` / `重新生成`(或与当前流程等价的返回动作)。
|
||||
- 按钮尺寸、圆角、间距保持一致(推荐 `py-2.5 px-3 rounded-lg text-sm`)。
|
||||
|
||||
---
|
||||
|
||||
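## Post-publish cleanup modal rules (CleanupContext)

On the publish page, `CleanupContext` owns the cleanup prompt shown after publishing has succeeded on every platform. The rules are:

- Trigger: the modal opens only when **every** platform in this publish run succeeded; if any platform failed, the original inline result display is used.
- Persistence and recovery: `cleanup_pending` is written to localStorage so the modal can be restored after a refresh or navigation; the entry carries `createdAt` and expires automatically after 24 hours.
- Cleanup order: `POST /api/videos/cleanup` must be called first; local input fields are cleared and the modal closed only after the request succeeds.
- State sync: after a successful cleanup, dispatch the `vigent:workspace-cleared` event so the input state of the current publish page resets in place (avoiding "localStorage is already empty but the page still shows the old values").
- Failure handling: when the request fails, keep the modal and the input data and allow a retry; once consecutive failures reach the threshold, show the "暂不清理,继续使用" (skip cleanup for now and keep working) option.
- Local cleanup scope: input content only (script / title / secondary title / publish title / tags); user preferences (style, font size, margins, model, BGM, etc.) are not cleared.
- Download policy: the modal's "下载视频备份" (download video backup) button must use the same-origin download endpoint (`/api/videos/generated/{id}/download`); do not use the signed URL directly as `href`.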
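
The persistence rule above (`cleanup_pending` with `createdAt` and a 24-hour expiry) can be sketched roughly as follows. This is a minimal illustration, not the project's `CleanupContext`: the `videoId` and `platforms` fields, the helper names, and the in-memory store are assumptions made for the example; only the key name, `createdAt`, and the 24-hour window come from the rules.

```typescript
// Hypothetical payload shape: only `createdAt` is named in the rules above.
interface CleanupPending {
  videoId: string; // assumed field
  platforms: string[]; // assumed field
  createdAt: number; // epoch milliseconds
}

// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const CLEANUP_KEY = "cleanup_pending";
const CLEANUP_TTL_MS = 24 * 60 * 60 * 1000;

function savePending(store: KeyValueStore, pending: CleanupPending): void {
  store.setItem(CLEANUP_KEY, JSON.stringify(pending));
}

// Returns the pending entry, or null when it is missing, corrupt, or older than 24 hours.
function loadPending(store: KeyValueStore, now: number = Date.now()): CleanupPending | null {
  const raw = store.getItem(CLEANUP_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as CleanupPending;
    const expired =
      typeof parsed.createdAt !== "number" || now - parsed.createdAt >= CLEANUP_TTL_MS;
    if (expired) {
      store.removeItem(CLEANUP_KEY);
      return null;
    }
    return parsed;
  } catch {
    // Corrupt JSON (or a non-object value) is treated like an expired entry.
    store.removeItem(CLEANUP_KEY);
    return null;
  }
}

// In-memory stand-in for localStorage, used for tests or server-side rendering.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => {
      data.set(key, value);
    },
    removeItem: (key) => {
      data.delete(key);
    },
  };
}
```

In a browser, `window.localStorage` satisfies `KeyValueStore` and could be passed in directly; taking `now` as a parameter keeps the expiry check testable without faking the clock.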
|
||||
|
||||
---
|
||||
|
||||
## API 请求规范
|
||||
|
||||
### 必须使用 `api` (axios 实例)
|
||||
|
||||
所有需要认证的 API 请求**必须**使用 `@/lib/axios` 导出的 axios 实例。该实例已配置:
|
||||
所有需要认证的 API 请求**必须**使用 `@/shared/api/axios` 导出的 axios 实例。该实例已配置:
|
||||
- 自动携带 `credentials: include`
|
||||
- 遇到 401/403 时自动清除 cookie 并跳转登录页
|
||||
- AI/Tools 接口(如 `/api/ai/*`、`/api/tools/extract-script`、`/api/tools/analyze-creator`、`/api/tools/generate-topic-script`)现为强制鉴权,禁止匿名 `fetch` 直调
|
||||
|
||||
**使用方式:**
|
||||
|
||||
```typescript
|
||||
import api from '@/lib/axios';
|
||||
import api from '@/shared/api/axios';
|
||||
|
||||
// GET 请求
|
||||
const { data } = await api.get('/api/materials');
|
||||
@@ -136,7 +300,7 @@ await api.post('/api/materials', formData, {
|
||||
### SWR 配合使用
|
||||
|
||||
```typescript
|
||||
import api from '@/lib/axios';
|
||||
import api from '@/shared/api/axios';
|
||||
|
||||
// SWR fetcher 使用 axios
|
||||
const fetcher = (url: string) => api.get(url).then(res => res.data);
|
||||
@@ -146,6 +310,27 @@ const { data } = useSWR('/api/xxx', fetcher, { refreshInterval: 2000 });
|
||||
|
||||
---
|
||||
|
||||
## Shared utility functions (media.ts)

### Unified API base / URL resolution
Use `@/shared/lib/media` to resolve the server-side and client-side API base and resource URLs in one place, instead of hard-coding them:
|
||||
|
||||
```typescript
|
||||
import { getApiBaseUrl, resolveMediaUrl, resolveAssetUrl, formatDate } from '@/shared/lib/media';
|
||||
|
||||
const apiBase = getApiBaseUrl(); // SSR: http://localhost:8006 / Client: ''
|
||||
const playableUrl = resolveMediaUrl(video.path); // accepts both signed URLs and relative paths
|
||||
const fontUrl = resolveAssetUrl(`fonts/${fontFile}`);
|
||||
const timeText = formatDate(video.created_at);
|
||||
```
|
||||
|
||||
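### Resource path rules
- Video/audio: prefer `resolveMediaUrl()`
- Fonts/BGM: use `resolveAssetUrl()` (it automatically encodes paths that contain Chinese characters)
- Before previewing, if a signed URL is already available, check it with `isAbsoluteUrl()` first so the base is not prepended a second time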
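
To make the path rules above concrete, here is a small sketch of the checks they describe. It is illustrative only and is not the code in `@/shared/lib/media`: `joinUrl` and `encodeAssetPath` are hypothetical names, and this `isAbsoluteUrl` is an assumed re-implementation that only recognizes `http(s)` URLs.

```typescript
// Assumed behaviour: a URL counts as absolute when it starts with http:// or https://.
function isAbsoluteUrl(value: string): boolean {
  return /^https?:\/\//i.test(value);
}

// Hypothetical helper: prepend the API base unless the path is already absolute
// (for example, a signed storage URL), so the base is never applied twice.
function joinUrl(base: string, path: string): string {
  if (isAbsoluteUrl(path)) return path;
  const trimmedBase = base.replace(/\/+$/, "");
  const trimmedPath = path.replace(/^\/+/, "");
  return `${trimmedBase}/${trimmedPath}`;
}

// Hypothetical helper: percent-encode each path segment (covers Chinese file names)
// while keeping the "/" separators intact.
function encodeAssetPath(path: string): string {
  return path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}
```

With an empty client-side base, `joinUrl("", "outputs/a.mp4")` yields the root-relative `/outputs/a.mp4`, which matches the "SSR uses an absolute base, the client uses relative paths" convention shown in the snippet earlier in this section.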
|
||||
|
||||
---
|
||||
|
||||
## 日期格式化规范
|
||||
|
||||
### 禁止使用 `toLocaleString()`
|
||||
@@ -161,25 +346,161 @@ new Date(timestamp * 1000).toLocaleString('zh-CN')
|
||||
**正确做法:**
|
||||
```typescript
|
||||
// ✅ 使用固定格式
|
||||
const formatDate = (timestamp: number) => {
|
||||
const d = new Date(timestamp * 1000);
|
||||
const year = d.getFullYear();
|
||||
const month = String(d.getMonth() + 1).padStart(2, '0');
|
||||
const day = String(d.getDate()).padStart(2, '0');
|
||||
const hour = String(d.getHours()).padStart(2, '0');
|
||||
const minute = String(d.getMinutes()).padStart(2, '0');
|
||||
return `${year}/${month}/${day} ${hour}:${minute}`;
|
||||
};
|
||||
import { formatDate } from '@/shared/lib/media';
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 组件拆分规范
|
||||
|
||||
当页面组件超过 300-500 行,建议按功能拆分到 `features/*/ui`:
|
||||
|
||||
- `page.tsx` 仅做组合与布局
|
||||
- 业务逻辑集中在 `features/*/model` 的 Controller Hook
|
||||
- UI 组件只接受 props 与回调,尽量不直接发 API
|
||||
- 首页拆分组件统一放在 `features/home/ui/`
|
||||
|
||||
---
|
||||
|
||||
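## ⚡️ UX optimization rules

### Scroll to top on refresh (consistent behavior)

- Long pages (such as the home page and the publish page) scroll back to the top on first mount.
- The page-level `useEffect` **must** set `history.scrollRestoration = "manual"` to disable the browser's native scroll restoration.
- Call `window.scrollTo({ top: 0, left: 0, behavior: "auto" })` and repeat it after a 200 ms delay as a fallback (so an asynchronous effect cannot override it).
- **List auto-scroll must be time-gated**: for 1 second after page load, every list auto-scroll effect is disabled (the `scrollEffectsEnabled` ref), so that persistence restore combined with asynchronous data loading cannot trigger `scrollIntoView` and make the page jump.
- Recommended pattern: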
|
||||
|
||||
```typescript
|
||||
// Page level (HomePage / PublishPage)
|
||||
useEffect(() => {
|
||||
if (typeof window === "undefined") return;
|
||||
if ("scrollRestoration" in history) history.scrollRestoration = "manual";
|
||||
window.scrollTo({ top: 0, left: 0, behavior: "auto" });
|
||||
const timer = setTimeout(() => window.scrollTo({ top: 0, left: 0, behavior: "auto" }), 200);
|
||||
return () => clearTimeout(timer);
|
||||
}, []);
|
||||
|
||||
// Controller level (time gate for list scrolling)
|
||||
const scrollEffectsEnabled = useRef(false);
|
||||
useEffect(() => {
|
||||
const timer = setTimeout(() => { scrollEffectsEnabled.current = true; }, 1000);
|
||||
return () => clearTimeout(timer);
|
||||
}, []);
|
||||
|
||||
// List scroll effect (BGM / materials / videos, etc.)
|
||||
useEffect(() => {
|
||||
if (!selectedId || !scrollEffectsEnabled.current) return;
|
||||
target?.scrollIntoView({ block: "nearest", behavior: "smooth" });
|
||||
}, [selectedId, list]);
|
||||
```
|
||||
|
||||
### 路由预取
|
||||
|
||||
- 首页进入发布管理时使用 `router.prefetch("/publish")`
|
||||
- 只预取路由,不在首页渲染发布页组件
|
||||
|
||||
### 发布页数据预取缓存
|
||||
|
||||
- 使用 `sessionStorage` 保存最近的 `accounts/videos`
|
||||
- 缓存 TTL 2 分钟,进入发布页先读缓存,随后后台刷新
|
||||
|
||||
### 骨架屏
|
||||
|
||||
- 账号列表、作品列表、素材列表在加载时显示骨架
|
||||
- 骨架数量应与历史数据数量相近(避免加载时数量跳变)
|
||||
|
||||
### 预览加载优化
|
||||
|
||||
- 预览 `video` 使用 `preload="metadata"`
|
||||
- 发布页预览按钮可进行短时 `preload` 预取
|
||||
|
||||
---
|
||||
|
||||
## 轻量 FSD 结构
|
||||
|
||||
- `app/`:页面入口,保持轻量,只做组合与布局
|
||||
- `features/*/model`:业务逻辑与状态(Controller Hook + 子 Hook)
|
||||
- `features/*/ui`:功能 UI 组件(纯 props + 回调,不直接发 API)
|
||||
- `shared/api`:Axios 实例与统一响应类型
|
||||
- `shared/lib`:通用工具函数(media.ts / auth.ts / title.ts)
|
||||
- `shared/hooks`:跨功能通用 hooks
|
||||
- `shared/ui`:跨功能通用 UI(如 SelectPopover)
|
||||
- `shared/types`:跨功能实体类型(User / PublishVideo 等)
|
||||
- `shared/contexts`:全局 Context(AuthContext / TaskContext / CleanupContext)
|
||||
- `components/`:遗留通用组件(VideoPreviewModal)
|
||||
|
||||
## 类型定义规范
|
||||
|
||||
- 通用实体类型(如 User, Account, Video)统一放置在 `src/shared/types/`。
|
||||
- 特定业务类型放在 feature 目录下的 types.ts 或 model 中。
|
||||
- **禁止**在多个地方重复定义 User 接口,统一引用 `import { User } from '@/shared/types/user';`。
|
||||
|
||||
---
|
||||
|
||||
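## User preference persistence

User preferences on the home page, such as styles and font sizes, must be persisted and restored after a refresh:

- **Must be persisted**:
  - Title style ID / subtitle style ID
  - Title font size / subtitle font size
  - Title display mode (`short` / `persistent`)
  - Lip-sync model mode (`default` / `fast` / `advanced`)
  - Background music selection / on-off state (the frontend currently has no volume slider; generation uses a fixed volume)
  - Output aspect ratio (`9:16` / `16:9`)
  - Material selection / past-video selection
  - Selected voice-over ID (`selectedAudioId`)
  - Selected reference audio ID (the id behind `selectedRefAudio`)
  - Speed (`speed`, voice-clone mode)
  - Tone (`emotion`, voice-clone mode)
  - Timeline segment data (the localStorage managed by `useTimelineEditor`)

### Saved scripts (persisted independently)

The `useSavedScripts` hook manages localStorage persistence for saved scripts on its own:
- key: `vigent_{storageKey}_savedScripts`
- It writes to localStorage only when the user manually saves or deletes a script; no automatic persistence effect is used
- It is fully independent of `useHomePersistence`; neither affects the other

### Implementation rules
- Use `storageKey = userId || 'guest'` to isolate data per user.
- **Restore before save**: no writes are allowed until restoration has finished (guarded by `isRestored`).
- Do not let default values overwrite the user's choice (read the saved value first).
- Prefer `useHomePersistence` for centralized restore/save; avoid scattered localStorage reads and writes inside pages.
- **Never use a signed URL as a persistence identifier**: Supabase Storage signed URLs change on every request, so the stable `id` field returned by the backend must be used.
- When adding a persisted field, add it to both the restore and the save logic, and update this section.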
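
The "restore before save" rule is easiest to see in a framework-free form. The sketch below is a hypothetical helper and not the project's `useHomePersistence`: `PreferenceSlot`, `PrefStore`, and `buildKey` are names invented for the example, and the `vigent_{storageKey}_{field}` key layout is inferred from the two keys quoted in this document.

```typescript
// Minimal storage interface; window.localStorage satisfies it in a browser.
interface PrefStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Per-user key isolation: falls back to "guest" when no user is logged in.
function buildKey(userId: string | null, field: string): string {
  const storageKey = userId || "guest";
  return `vigent_${storageKey}_${field}`;
}

// One persisted preference. Writes are ignored until restore() has run,
// so a default value set during startup can never overwrite the saved one.
class PreferenceSlot<T> {
  private readonly store: PrefStore;
  private readonly key: string;
  private value: T;
  private restored = false;

  constructor(store: PrefStore, key: string, defaultValue: T) {
    this.store = store;
    this.key = key;
    this.value = defaultValue;
  }

  restore(): T {
    const raw = this.store.getItem(this.key);
    if (raw !== null) {
      try {
        this.value = JSON.parse(raw) as T;
      } catch {
        // Corrupt entry: keep the default value.
      }
    }
    this.restored = true;
    return this.value;
  }

  set(next: T): void {
    this.value = next;
    if (!this.restored) return; // the isRestored-style guard
    this.store.setItem(this.key, JSON.stringify(next));
  }

  get(): T {
    return this.value;
  }
}
```

In a React hook the same guard is usually a boolean state or ref that the save effect checks before writing; the class form is used here only so the behavior can be exercised without a renderer.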
|
||||
|
||||
---
|
||||
|
||||
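## Title input rules

- The opening title and the publish-info title are both limited to 15 characters.
- Do not truncate while the Chinese IME is composing; validate the length only after composition ends.
- Editing the opening title on the home page also writes to `vigent_${storageKey}_publish_title`.
- The title display mode uses two fixed values, `short` / `persistent`; the default is `short` (shown briefly for 4 seconds).
- Avoid `maxLength`, which forcibly truncates text while the IME is still composing.
- Prefer `@/shared/hooks/useTitleInput` to handle the input logic in one place.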
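
The composition rule above can be reduced to one pure function. This is a sketch under assumptions and not the contents of `useTitleInput`: `clampTitle` and `nextTitleValue` are hypothetical names, and counting the limit in Unicode code points (rather than UTF-16 units) is an assumption about what "15 characters" means.

```typescript
const TITLE_MAX_LENGTH = 15;

// Truncate by code point, so an emoji or a rare CJK character is never cut in half.
function clampTitle(value: string, maxLength: number = TITLE_MAX_LENGTH): string {
  const chars = Array.from(value);
  return chars.length <= maxLength ? value : chars.slice(0, maxLength).join("");
}

// While the IME is composing, pass the raw text through untouched;
// enforce the limit only once composition has ended.
function nextTitleValue(
  raw: string,
  isComposing: boolean,
  maxLength: number = TITLE_MAX_LENGTH,
): string {
  return isComposing ? raw : clampTitle(raw, maxLength);
}
```

In a component, `isComposing` would typically be set to `true` on the input's `compositionstart` event and back to `false` on `compositionend`, with `nextTitleValue` called from both the change handler and the `compositionend` handler.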
|
||||
|
||||
---
|
||||
|
||||
## 发布页交互规则
|
||||
|
||||
- 发布按钮在未选择任何平台时禁用
|
||||
- 仅保留"立即发布",不再提供定时发布 UI/参数
|
||||
- **作品选择持久化**:使用 `video.id`(稳定标识)而非 `video.path`(签名 URL)进行选择、比较和 localStorage 存储。发布时根据 `id` 查找对应 `path` 发送请求。
|
||||
- **新作品优先级**:检测到“刚生成的新视频”时,页面首次恢复优先选中最新视频;之后用户手动改选会继续按持久化值恢复。
|
||||
|
||||
---
|
||||
|
||||
## 新增页面 Checklist
|
||||
|
||||
1. [ ] 导入 `import api from '@/lib/axios'`
|
||||
1. [ ] 导入 `import api from '@/shared/api/axios'`
|
||||
2. [ ] 所有 API 请求使用 `api.get/post/delete()` 而非原生 `fetch`
|
||||
3. [ ] 日期格式化使用固定格式函数,不用 `toLocaleString()`
|
||||
4. [ ] 添加 `'use client'` 指令(如需客户端交互)
|
||||
3. [ ] 日期格式化使用 `@/shared/lib/media` 的 `formatDate`
|
||||
4. [ ] 资源 URL 使用 `resolveMediaUrl`/`resolveAssetUrl`
|
||||
5. [ ] 添加 `'use client'` 指令(如需客户端交互)
|
||||
|
||||
---
|
||||
|
||||
@@ -189,9 +510,11 @@ const formatDate = (timestamp: number) => {
|
||||
|
||||
| Endpoint | Method | Description |
|------|------|------|
| `/api/ref-audios` | POST | Upload reference audio (multipart/form-data: file + ref_text) |
| `/api/ref-audios` | POST | Upload reference audio (multipart/form-data: file; ref_text is optional, the backend transcribes it automatically with Whisper) |
| `/api/ref-audios` | GET | List the user's reference audio |
| `/api/ref-audios/{id}` | PUT | Rename reference audio |
| `/api/ref-audios/{id}` | DELETE | Delete reference audio (id must be passed through encodeURIComponent) |
| `/api/ref-audios/{id}/retranscribe` | POST | Re-recognize the reference audio text (Whisper transcription + automatic trimming when longer than 10s) |
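
Because a reference audio id may itself contain a slash (the request example in this document uses an id of the form `user_id/timestamp_name.wav`), the `{id}` segment has to be encoded before it is placed in the path. A minimal sketch, where `refAudioUrl` is a hypothetical helper name and not part of the project:

```typescript
// Build the path for a single reference audio, optionally with the retranscribe action.
// encodeURIComponent turns "/" into "%2F", so the id stays a single path segment.
function refAudioUrl(id: string, action?: "retranscribe"): string {
  const base = `/api/ref-audios/${encodeURIComponent(id)}`;
  return action ? `${base}/${action}` : base;
}
```

The result would then be passed to the shared axios instance, for example as the argument of `api.delete(...)` or `api.post(...)`.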
|
||||
|
||||
### 视频生成 API 扩展
|
||||
|
||||
@@ -210,7 +533,8 @@ await api.post('/api/videos/generate', {
|
||||
text: '口播文案',
|
||||
tts_mode: 'voiceclone',
|
||||
ref_audio_id: 'user_id/timestamp_name.wav',
|
||||
ref_text: '参考音频对应文字',
|
||||
ref_text: '参考音频对应文字', // 从参考音频 metadata 自动获取
|
||||
speed: 1.0, // 语速 (0.8-1.2)
|
||||
});
|
||||
```
|
||||
|
||||
@@ -218,14 +542,25 @@ await api.post('/api/videos/generate', {
|
||||
|
||||
使用 `MediaRecorder` API 录制音频,格式为 `audio/webm`,上传后后端自动转换为 WAV (16kHz mono)。
|
||||
|
||||
- 录音入口放在“我的参考音频”区域底部右侧(与“上传音频”并排)。
|
||||
- 录音交互使用弹窗:开始/停止 -> 试听 -> 使用此录音 / 弃用本次录音。
|
||||
- 关闭录音弹窗时如仍在录制,会先停止录音再关闭。
|
||||
- 录音中禁止点击遮罩关闭(`closeOnOverlay={!isRecording}`);未录音时允许遮罩关闭。
|
||||
|
||||
```typescript
|
||||
// 录音需要用户授权麦克风
|
||||
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
|
||||
const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
|
||||
```
|
||||
|
||||
### 参考音频自动处理
|
||||
|
||||
- **自动转写**: 上传参考音频时后端自动调用 Whisper 转写内容作为 `ref_text`,无需用户手动输入
|
||||
- **自动截取**: 参考音频超过 10 秒时自动在静音点截取前 10 秒(CosyVoice 建议 3-10 秒)
|
||||
- **重新识别**: 旧参考音频可通过 retranscribe 端点重新转写并截取
|
||||
|
||||
### UI 结构
|
||||
|
||||
配音方式使用 Tab 切换:
|
||||
- **EdgeTTS 音色** - 预设音色 2x3 网格
|
||||
- **声音克隆** - 参考音频列表 + 在线录音 + 参考文字输入
|
||||
- **EdgeTTS 音色** - 统一下拉选择(显示“音色名 + 语言”)
|
||||
- **声音克隆** - 参考音频选择器(含试听/重命名/删除/重识别)+ 底部右侧上传/录音入口(录音弹窗)+ 语速/语气下拉
|
||||
|
||||
@@ -1,64 +1,113 @@
|
||||
# ViGent2 Frontend
|
||||
|
||||
The ViGent2 frontend, built with Next.js 14 + TailwindCSS.
|
||||
The ViGent2 frontend, built with Next.js 16 + TailwindCSS.
|
||||
|
||||
## 📌 文档定位
|
||||
|
||||
- 本文档用于说明前端功能、运行方式与目录概览(面向使用与协作)。
|
||||
- 开发规范与实现约束请查看 `Docs/FRONTEND_DEV.md`。
|
||||
- 历史变更与里程碑请查看 `Docs/DevLogs/` 与 `Docs/TASK_COMPLETE.md`。
|
||||
|
||||
## ✨ 核心功能
|
||||
|
||||
### 1. 视频生成 (`/`)
|
||||
- **素材管理**: 拖拽上传人物视频,实时预览。
|
||||
- **文案配音**: 集成 EdgeTTS,支持多音色选择 (云溪 / 晓晓)。
|
||||
- **AI 标题/标签**: 一键生成视频标题与标签 (Day 14)。
|
||||
- **标题/字幕样式**: 样式选择 + 预览 + 字号调节 (Day 16)。
|
||||
- **背景音乐**: 试听 + 音量控制 + 选择持久化 (Day 16)。
|
||||
- **交互优化**: 选择项持久化、列表内定位、刷新回顶部 (Day 16)。
|
||||
- **一、文案提取与编辑**: 文案输入/提取/翻译/保存;输入框右下角支持一键扩展到大编辑器。
|
||||
- **二、配音**: 配音方式(EdgeTTS/声音克隆)+ 配音列表(生成/试听/管理)合并为一个板块。
|
||||
- **三、素材编辑**: 视频素材(上传/选择/管理)+ 时间轴编辑(波形/色块/拖拽排序)合并为一个板块。
|
||||
- **四、标题与字幕**: 片头标题/副标题/字幕样式配置;短暂显示/常驻显示;样式预览使用视频片头帧作为真实背景。
|
||||
- **五、背景音乐**: 试听 + 搜索选择 + 选择持久化(无音量滑杆,生成时固定混音系数)。
|
||||
- **六、作品**(右栏): 作品列表 + 作品预览合并为一个板块。
|
||||
- **进度追踪**: 实时显示视频生成进度 (10% -> 100%)。
|
||||
- **结果预览**: 生成完成后直接播放下载。
|
||||
- **本地保存**: 文案/标题自动保存,刷新后恢复 (Day 14)。
|
||||
- **作品预览**: 生成完成后直接播放下载(作品预览 + 历史作品)。
|
||||
- **下载直达**: 首页作品下载与发布成功弹窗下载统一走同源下载接口(`/api/videos/generated/{id}/download`),避免新标签页在线播放。
|
||||
- **预览优化**: 预览视频 `metadata` 预取,首帧加载更快。
|
||||
- **本地保存**: 文案/标题/偏好由 `useHomePersistence` 统一持久化,刷新后恢复。
|
||||
- **历史文案**: 手动保存/加载/删除历史文案,独立 localStorage 持久化。
|
||||
- **选择持久化**: 首页/发布页作品选择均使用稳定 `id` 持久化;新视频生成后优先选中最新,后续用户手动选择持续持久化恢复。
|
||||
- **统一下拉交互**: 首页/发布页业务选择器统一为 SelectPopover(支持自动上拉、已选定位、移动端 BottomSheet);`ScriptEditor` 的“历史文案 / AI多语言”为产品例外,保留原轻量菜单。
|
||||
- **AI 多语言翻译**: 支持 9 种目标语言翻译文案 + 还原原文。
|
||||
|
||||
### 2. 全自动发布 (`/publish`) [Day 7 新增]
|
||||
- **多平台管理**: 统一管理 B站、抖音、小红书账号状态。
|
||||
### 2. 全自动发布 (`/publish`)
|
||||
- **多平台管理**: 统一管理抖音、微信视频号、B站、小红书账号状态。
|
||||
- **扫码登录**:
|
||||
- 集成后端 Playwright 生成的 QR Code。
|
||||
- 实时检测扫码状态 (Wait/Success)。
|
||||
- Cookie 自动保存与状态同步。
|
||||
- **发布配置**: 设置视频标题、标签、简介。
|
||||
- **定时任务**: 支持 "立即发布" 或 "定时发布"。
|
||||
- **作品选择**: SelectPopover 下拉 + 搜索 + 预览弹窗(下拉内可连续预览,不强制收起)。
|
||||
- **选择持久化**: 使用稳定 `video.id` 持久化选择,刷新保持;新视频生成自动选中最新。
|
||||
- **预览兼容**: 签名 URL / 相对路径均可直接预览。
|
||||
- **发布方式**: 仅支持 "立即发布"。
|
||||
- **发布成功清理弹窗**: 全平台发布成功后触发 `CleanupModal`(展示成功平台、截图、下载备份、清理按钮),刷新/跳转后可恢复。
|
||||
- **清理失败兜底**: 清理接口失败时弹窗不关闭且不清本地输入;连续失败达到阈值后可“暂不清理,继续使用”。
|
||||
- **清理范围**: 仅清理输入内容字段(文案/标题/副标题/发布标题/标签),保留样式、字号、边距、模型等用户偏好。
|
||||
|
||||
### 3. 声音克隆 [Day 13 新增]
|
||||
- **TTS 模式选择**: EdgeTTS (预设音色) / 声音克隆 (自定义音色) 切换。
|
||||
- **参考音频管理**: 上传/列表/删除参考音频 (3-20秒 WAV)。
|
||||
- **一键克隆**: 选择参考音频后自动调用 Qwen3-TTS 服务。
|
||||
### 3. 声音克隆
|
||||
- **TTS 模式选择**: EdgeTTS / 声音克隆切换,音色选择统一下拉(显示音色名 + 语言)。
|
||||
- **音色试听**: EdgeTTS 音色列表支持一键试听,按音色 locale 自动选择对应语言的固定示例文案。
|
||||
- **参考音频管理**: 上传/列表/重命名/删除参考音频,上传后自动 Whisper 转写 ref_text + 超 10s 自动截取。
|
||||
- **录音入口**: 参考音频区域底部右侧提供“上传音频 / 录音”入口;录音采用弹窗流程(录制 -> 试听 -> 使用/弃用)。
|
||||
- **录音防误触**: 录音中禁用遮罩关闭(避免误触中断);未录音时可点空白关闭。
|
||||
- **重新识别**: 旧参考音频可重新转写并截取 (RotateCw 按钮)。
|
||||
- **一键克隆**: 选择参考音频后自动调用 CosyVoice 3.0 服务。
|
||||
- **语速控制**: 声音克隆模式下支持 5 档语速 (0.8-1.2),统一下拉,选择持久化。
|
||||
- **语气控制**: 声音克隆模式下支持 4 种语气 (正常/欢快/低沉/严肃),统一下拉,选择持久化。
|
||||
- **多语言支持**: EdgeTTS 10 语言声音列表,声音克隆 language 透传。
|
||||
|
||||
### 4. 字幕与标题 [Day 13 新增]
|
||||
- **片头标题**: 可选输入,视频开头显示 3 秒淡入淡出标题。
|
||||
- **逐字高亮字幕**: 卡拉OK效果,默认开启,可关闭。
|
||||
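### 4. Voice-over first + timeline arrangement
- **Independent voice-over generation**: generate the voice-over → select it → pick materials → generate the video.
- **Voice-over management panel**: generate / preview / rename / delete / select, with asynchronous generation and progress polling.
- **Timeline editor**: a wavesurfer.js audio waveform plus color blocks that visualize material assignment; drag the dividers to adjust each segment's duration.
- **Material trimming**: ClipTrimmer dual-handle range slider + HTML5 video preview playback.
- **Drag-and-drop reordering**: timeline color blocks support HTML5 Drag & Drop to swap the order of materials.
- **Custom assignment**: the backend `custom_assignments` supports user-defined material assignment (including the `source_start/source_end` trim range).
- **Timeline semantics**: when the segments run past the audio, only the visible segments are kept and the last one is trimmed to the end of the audio; segments beyond the audio take no part in generation. When the segments fall short of the audio, the last visible segment loops automatically to fill the gap.
- **Aspect ratio control**: the top of the timeline offers a `9:16 / 16:9` output ratio selector; the setting is persisted and passed through to the backend.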
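
One possible reading of the timeline semantics in the list above, written as a pure function. This is an interpretation for illustration, not the project's algorithm: the `Segment` and `AlignedSegment` shapes, the `loops` flag, and the function name are all assumptions.

```typescript
// Hypothetical input: materials in timeline order, each with a requested duration in seconds.
interface Segment {
  materialId: string;
  duration: number;
}

// Hypothetical output: where each kept segment sits on the audio timeline.
interface AlignedSegment {
  materialId: string;
  start: number;
  end: number;
  loops: boolean; // true when the segment must loop to cover the remaining audio
}

function alignToAudio(segments: Segment[], audioDuration: number): AlignedSegment[] {
  const aligned: AlignedSegment[] = [];
  let cursor = 0;
  for (const seg of segments) {
    if (cursor >= audioDuration) break; // lies fully beyond the audio: dropped
    if (seg.duration <= 0) continue; // ignore empty segments
    const end = Math.min(cursor + seg.duration, audioDuration); // trim the last visible segment
    aligned.push({ materialId: seg.materialId, start: cursor, end, loops: false });
    cursor = end;
  }
  const last = aligned[aligned.length - 1];
  if (last && cursor < audioDuration) {
    // Segments fall short of the audio: stretch the last one and mark it as looping.
    last.end = audioDuration;
    last.loops = true;
  }
  return aligned;
}
```

For example, three 4-second segments against 6 seconds of audio keep the first two (the second trimmed to 2 seconds), while segments of 3 and 2 seconds against 10 seconds of audio leave the second one looping from 3s to 10s.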
|
||||
|
||||
### 5. 字幕与标题
|
||||
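- **Opening title**: optional, limited to 15 characters; supports "brief display / persistent display" and defaults to brief display (4 seconds); with persistent display, both the main title and the secondary title stay on screen for the whole video.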
|
||||
- **片头副标题**: 可选输入,限制 20 字;显示在主标题下方,用于补充说明或悬念引导;独立样式配置(字体/字号/颜色/间距),可由 AI 同时生成;与标题共享显示模式设定;仅在视频画面中显示,不参与发布标题。
|
||||
- **标题同步**: 首页片头标题修改会同步到发布信息标题。
|
||||
- **逐字高亮字幕**: 卡拉OK效果,默认开启。
|
||||
- **自动对齐**: 基于 faster-whisper 生成字级别时间戳。
|
||||
- **样式预设**: 标题/字幕样式选择 + 预览 + 字号调节 (Day 16)。
|
||||
- **样式预设**: 标题/字幕/副标题样式选择 + 预览 + 字号调节。
|
||||
- **默认样式**: 标题 90px 站酷快乐体;字幕 60px 经典黄字 + DingTalkJinBuTi。
|
||||
- **样式持久化**: 标题/字幕/副标题样式与字号刷新保留。
|
||||
|
||||
### 5. 背景音乐 [Day 16 新增]
|
||||
- **试听预览**: 点击试听即选中,音量滑块实时生效。
|
||||
- **混音控制**: 仅影响 BGM,配音保持原音量。
|
||||
### 6. 背景音乐
|
||||
- **试听预览**: 下拉列表内可直接试听。
|
||||
- **选择体验**: 发布页同款搜索选择器,打开时自动定位到当前已选。
|
||||
- **混音控制**: 当前前端不提供音量滑杆,生成时固定 `bgm_volume=0.2`,保持配音音量稳定。
|
||||
|
||||
### 6. 账户设置 [Day 15 新增]
|
||||
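### 7. Account settings
- **Phone login**: sign in with a verified 11-digit mainland China mobile number.
- **Account dropdown menu**: shows validity period + change password + secure sign-out.
- **Account dropdown menu**: shows the phone number (middle four digits masked) + validity period + change password + secure sign-out.
- **Change password**: a modal asks for the current and the new password; a fresh login is required after the change.
- **Login takes effect immediately**: after a successful login, AuthContext writes the user data right away, so the phone number appears without a refresh.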
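
The "middle four digits masked" display can be sketched in a few lines. `maskPhone` is a hypothetical name, and returning non-11-digit input unchanged is an assumption made for the example, not documented behavior.

```typescript
// Mask digits 4-7 of an 11-digit number: 3 leading digits, "****", 4 trailing digits.
// Anything that is not exactly 11 digits is returned unchanged.
function maskPhone(phone: string): string {
  return /^\d{11}$/.test(phone) ? `${phone.slice(0, 3)}****${phone.slice(7)}` : phone;
}

console.log(maskPhone("13812345678")); // prints "138****5678"
```

Masking at display time keeps the full number out of rendered markup and screenshots.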
|
||||
|
||||
### 7. 文案提取助手 (`ScriptExtractionModal`) [Day 15 新增]
|
||||
- **多源提取**: 支持文件拖拽上传与 URL 粘贴 (B站/抖音/TikTok)。
|
||||
- **AI 洗稿**: 集成 GLM-4.7-Flash,自动改写为口播文案。
|
||||
- **一键填入**: 提取结果直接填充至视频生成输入框。
|
||||
- **智能交互**: 实时进度展示,防误触设计。
|
||||
### 8. 付费开通会员 (`/pay`)
|
||||
- **支付宝电脑网站支付**: 跳转支付宝官方收银台,支持扫码/账号登录/余额等多种支付方式。
|
||||
- **自动激活**: 支付成功后异步回调自动激活会员(有效期 1 年),前端轮询检测支付结果。
|
||||
- **到期续费**: 会员到期后登录自动跳转付费页续费,流程与首次开通一致。
|
||||
- **管理员激活**: 管理员手动激活功能并存,两种方式互不影响。
|
||||
|
||||
### 9. 文案创作助手(3 个弹窗)
|
||||
- **文案提取助手** (`ScriptExtractionModal`): 支持文件上传与 URL 提取(需登录),提取结果可一键填入主编辑器。
|
||||
- **AI 智能改写** (`RewriteModal`): 基于 GLM-4.7-Flash 改写文案,支持自定义提示词持久化。
|
||||
- **文案深度学习** (`ScriptLearningModal`): 输入抖音/B站博主主页,分析热门话题并生成口播文案(需登录)。
|
||||
- **统一结果操作栏**: 三个弹窗结果页统一底部 Action Grid 风格,主按钮为「填入文案」,次按钮统一「复制 / 重新生成(或等价返回操作)」。
|
||||
- **登录鉴权**: 依赖受保护接口(`/api/tools/*`、`/api/ai/*`),未登录会触发全局 401 跳转登录。
|
||||
|
||||
## 🛠️ 技术栈
|
||||
|
||||
- **框架**: Next.js 14 (App Router)
|
||||
- **框架**: Next.js 16 (App Router)
|
||||
- **样式**: TailwindCSS
|
||||
- **图标**: Lucide React
|
||||
- **组件**: 自定义现代化组件 (Glassmorphism 风格)
|
||||
- **API**: Axios 实例 `@/lib/axios` (对接后端 FastAPI :8006)
|
||||
- **音频波形**: wavesurfer.js (时间轴编辑器)
|
||||
- **API**: Axios 实例 `@/shared/api/axios` (对接后端 FastAPI :8006)
|
||||
|
||||
## 🚀 开发指南
|
||||
## 🚀 快速开始
|
||||
|
||||
### 安装依赖
|
||||
|
||||
@@ -79,25 +128,39 @@ npm run dev
|
||||
|
||||
```
|
||||
src/
|
||||
├── app/
|
||||
├── app/ # 页面入口 (轻量)
|
||||
│ ├── page.tsx # 视频生成主页
|
||||
│ ├── publish/ # 发布管理页
|
||||
│ │ └── page.tsx
|
||||
│ ├── pay/ # 付费开通会员页
|
||||
│ │ └── page.tsx
|
||||
│ └── layout.tsx # 全局布局 (导航栏)
|
||||
├── components/ # UI 组件
|
||||
│ ├── VideoUploader.tsx # 视频上传
|
||||
│ ├── StatusBadge.tsx # 状态徽章
|
||||
│ └── ...
|
||||
└── lib/ # 工具函数
|
||||
├── features/
|
||||
│ ├── home/
|
||||
│ │ ├── model/ # Home 业务逻辑 (hooks)
|
||||
│ │ └── ui/ # Home UI 组件
|
||||
│ └── publish/
|
||||
│ ├── model/ # Publish 业务逻辑 (hooks)
|
||||
│ └── ui/ # Publish UI 组件
|
||||
├── shared/
|
||||
│ ├── api/ # API 实例
|
||||
│ ├── hooks/ # 通用 hooks
|
||||
│ └── lib/ # 工具函数
|
||||
└── components/ # 跨页面复用 UI
|
||||
```
|
||||
|
||||
## 🔌 后端对接
|
||||
|
||||
- **Base URL**: `http://localhost:8006`
|
||||
- **Base URL**: `http://localhost:8006` (SSR) / 相对路径 (Client)
|
||||
- **URL 统一工具**: `@/shared/lib/media` 提供 `resolveMediaUrl` / `resolveAssetUrl`
|
||||
- **代理配置**: Next.js Rewrites (如需) 或直接 CORS。
|
||||
|
||||
## 🎨 设计规范
|
||||
## 🎨 UI 说明(概览)
|
||||
|
||||
- **主色调**: 深紫/黑色系 (Dark Mode)
|
||||
- **交互**: 悬停微动画 (Hover Effects)
|
||||
- **响应式**: 适配桌面端大屏操作
|
||||
- 业务选择器统一使用 `SelectPopover`(桌面 Popover / 移动端 BottomSheet);`ScriptEditor` 的“历史文案 / AI多语言”保留原轻量菜单。
|
||||
- 业务弹窗统一使用 `AppModal`(统一遮罩、头部、关闭行为与滚动策略)。
|
||||
- 弹窗关闭策略:默认支持 `ESC` / `X` / 点击空白关闭;仅发布成功清理弹窗为强制流程(不允许空白关闭,也不显示 `X`)。
|
||||
- 文案类弹窗结果页按钮统一:底部 Action Grid、主次按钮层级一致、文案动作命名一致(填入/复制/重新生成)。
|
||||
- 视频预览弹窗层级高于下拉菜单;下拉内支持连续预览。
|
||||
- 页面同时适配桌面端与移动端;长列表统一隐藏滚动条。
|
||||
- 详细 UI 规范、持久化规范与交互约束请查看 `Docs/FRONTEND_DEV.md`。
|
||||
|
||||
@@ -137,11 +137,9 @@ CUDA_VISIBLE_DEVICES=1 python -m scripts.inference \
|
||||
└── DEPLOY.md
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
## 步骤 7: 性能优化 (预加载模型服务)
|
||||
---
|
||||
|
||||
## 步骤 6: 性能优化(预加载模型服务)
|
||||
|
||||
为了消除每次生成视频时 30-40秒 的模型加载时间,建议运行常驻服务。
|
||||
|
||||
@@ -201,6 +199,29 @@ LatentSync 1.6 需要 ~18GB VRAM。如果遇到 OOM 错误:
|
||||
- `inference_steps`: 增加到 30-50 可提高质量
|
||||
- `guidance_scale`: 增加可改善唇同步,但过高可能导致抖动
|
||||
|
||||
### 编码流水线优化(当前实现)
|
||||
|
||||
LatentSync 内部默认流程有两处冗余编码已优化:
|
||||
|
||||
1. **`read_video` FPS 转换**: 原代码无条件 `ffmpeg -r 25 -crf 18`,现已改为检测 FPS,25fps 时跳过(我们的 `prepare_segment` 已输出 25fps)
|
||||
2. **final mux 双重编码**: 原代码 `imageio` CRF 13 写帧后又用 `libx264 -crf 18` 重编码做 mux,现已改为 `-c:v copy` 流复制
|
||||
|
||||
这两项优化位于:
|
||||
- `latentsync/utils/util.py` — `read_video()` 函数
|
||||
- `latentsync/pipelines/lipsync_pipeline.py` — final mux 命令
|
||||
|
||||
---
|
||||
|
||||
### 无脸帧容错(当前实现)
|
||||
|
||||
素材中部分帧检测不到人脸(转头、遮挡、空镜头)时,不再中断整次推理:
|
||||
|
||||
- `affine_transform_video`: 单帧异常时用最近有效帧填充,全部帧无脸时仍报错
|
||||
- `restore_video`: 无脸帧保留原画面,不做嘴型替换
|
||||
- 后端 `workflow.py`: LatentSync 整体异常时自动回退原视频,任务不会失败
|
||||
|
||||
改动位于 `latentsync/pipelines/lipsync_pipeline.py`。
|
||||
|
||||
---
|
||||
|
||||
## 参考链接
|
||||
|
||||
285
Docs/MUSETALK_DEPLOY.md
Normal file
@@ -0,0 +1,285 @@
|
||||
# MuseTalk deployment guide

> **Updated**: 2026-03-02
> **Applies to**: MuseTalk v1.5 (resident service mode)
> **Architecture**: FastAPI resident service + PM2 process management
|
||||
|
||||
---
|
||||
|
||||
## Architecture overview

MuseTalk serves as the long-video engine of the **hybrid lip-sync setup**:

- **Short videos (<100s, per the current `.env` example)** → LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=100s, per the current `.env` example)** → MuseTalk 1.5 (GPU0, port 8011)
- The routing threshold is controlled by `LIPSYNC_DURATION_THRESHOLD`
- When MuseTalk is unavailable, requests fall back to LatentSync automatically
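
The routing rule above boils down to a threshold comparison plus a fallback. The sketch below is written in TypeScript only to match the other examples in these docs; the real router lives in the Python backend and may differ. `pickLipsyncEngine` and `parseThreshold` are hypothetical names, and the fallback value of 100 seconds simply mirrors the `.env` example mentioned above.

```typescript
type LipsyncEngine = "latentsync" | "musetalk";

// Read LIPSYNC_DURATION_THRESHOLD-style input; fall back when it is missing or invalid.
function parseThreshold(raw: string | undefined, fallback: number = 100): number {
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : fallback;
}

// Long audio goes to MuseTalk when it is available; everything else goes to LatentSync.
function pickLipsyncEngine(
  durationSec: number,
  thresholdSec: number,
  museTalkAvailable: boolean,
): LipsyncEngine {
  if (durationSec >= thresholdSec && museTalkAvailable) return "musetalk";
  return "latentsync";
}
```

Keeping the availability check inside the same function makes the fallback explicit: a 300-second job with MuseTalk down resolves to `"latentsync"` rather than failing.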
|
||||
|
||||
---
|
||||
|
||||
## Hardware requirements

| Item | Minimum | Recommended |
|------|----------|----------|
| GPU | 8GB VRAM (RTX 3060) | 24GB VRAM (RTX 3090) |
| RAM | 32GB | 64GB |
| CUDA | 11.7+ | 11.8 |

> MuseTalk fp16 inference needs roughly 4-8GB of VRAM and can share GPU0 with CosyVoice.
|
||||
|
||||
---
|
||||
|
||||
## 安装步骤
|
||||
|
||||
### 1. Conda 环境
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
conda create -n musetalk python=3.10 -y
|
||||
conda activate musetalk
|
||||
```
|
||||
|
||||
### 2. PyTorch 2.0.1 + CUDA 11.8
|
||||
|
||||
> 必须使用此版本,mmcv 预编译包依赖。
|
||||
|
||||
```bash
|
||||
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
|
||||
```
|
||||
|
||||
### 3. 依赖安装
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
|
||||
# MMLab 系列
|
||||
pip install --no-cache-dir -U openmim
|
||||
mim install mmengine
|
||||
mim install "mmcv==2.0.1"
|
||||
mim install "mmdet==3.1.0"
|
||||
pip install chumpy --no-build-isolation
|
||||
pip install "mmpose==1.1.0" --no-deps
|
||||
|
||||
# FastAPI 服务依赖
|
||||
pip install fastapi uvicorn httpx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 模型权重
|
||||
|
||||
### 目录结构
|
||||
|
||||
```
|
||||
models/MuseTalk/models/
|
||||
├── musetalk/ ← v1 基础模型
|
||||
│ ├── config.json -> musetalk.json (软链接)
|
||||
│ ├── musetalk.json
|
||||
│ ├── musetalkV15 -> ../musetalkV15 (软链接, 关键!)
|
||||
│ └── pytorch_model.bin (~3.2GB)
|
||||
├── musetalkV15/ ← v1.5 UNet 模型
|
||||
│ ├── musetalk.json
|
||||
│ └── unet.pth (~3.2GB)
|
||||
├── sd-vae/ ← Stable Diffusion VAE
|
||||
│ ├── config.json
|
||||
│ └── diffusion_pytorch_model.bin
|
||||
├── whisper/ ← OpenAI Whisper Tiny
|
||||
│ ├── config.json
|
||||
│ ├── pytorch_model.bin (~151MB)
|
||||
│ └── preprocessor_config.json
|
||||
├── dwpose/ ← DWPose 人体姿态检测
|
||||
│ └── dw-ll_ucoco_384.pth (~387MB)
|
||||
├── syncnet/ ← SyncNet 唇形同步评估
|
||||
│ └── latentsync_syncnet.pt
|
||||
└── face-parse-bisent/ ← 人脸解析模型
|
||||
├── 79999_iter.pth (~53MB)
|
||||
└── resnet18-5c106cde.pth (~45MB)
|
||||
```
|
||||
|
||||
### 下载方式
|
||||
|
||||
使用项目自带脚本:
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
conda activate musetalk
|
||||
bash download_weights.sh
|
||||
```
|
||||
|
||||
或手动 Python API 下载:
|
||||
|
||||
```bash
|
||||
conda activate musetalk
|
||||
export HF_ENDPOINT=https://hf-mirror.com
|
||||
python -c "
|
||||
from huggingface_hub import snapshot_download
|
||||
snapshot_download('TMElyralab/MuseTalk', local_dir='models',
|
||||
allow_patterns=['musetalk/*', 'musetalkV15/*'])
|
||||
snapshot_download('stabilityai/sd-vae-ft-mse', local_dir='models/sd-vae',
|
||||
allow_patterns=['config.json', 'diffusion_pytorch_model.bin'])
|
||||
snapshot_download('openai/whisper-tiny', local_dir='models/whisper',
|
||||
allow_patterns=['config.json', 'pytorch_model.bin', 'preprocessor_config.json'])
|
||||
snapshot_download('yzd-v/DWPose', local_dir='models/dwpose',
|
||||
allow_patterns=['dw-ll_ucoco_384.pth'])
|
||||
"
|
||||
```
|
||||
|
||||
### 创建必要的软链接
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk/models/musetalk
|
||||
ln -sf musetalk.json config.json
|
||||
ln -sf ../musetalkV15 musetalkV15
|
||||
```
|
||||
|
||||
> **Important**: a missing `musetalk/musetalkV15` symlink causes weight detection to fail (`weights: False`).
|
||||
|
||||
---
|
||||
|
||||
## 服务启动
|
||||
|
||||
### PM2 进程管理(推荐)
|
||||
|
||||
```bash
|
||||
# 首次注册
|
||||
cd /home/rongye/ProgramFiles/ViGent2
|
||||
pm2 start run_musetalk.sh --name vigent2-musetalk
|
||||
pm2 save
|
||||
|
||||
# 日常管理
|
||||
pm2 restart vigent2-musetalk
|
||||
pm2 logs vigent2-musetalk
|
||||
pm2 stop vigent2-musetalk
|
||||
```
|
||||
|
||||
### 手动启动
|
||||
|
||||
```bash
|
||||
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
|
||||
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
|
||||
```
|
||||
|
||||
### Health check
|
||||
|
||||
```bash
|
||||
curl http://localhost:8011/health
|
||||
# {"status":"ok","model_loaded":true}
|
||||
```
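For scripted checks, the same probe can be done from Python. This is a minimal sketch that assumes the response body has the shape shown in the comment above (`status` and `model_loaded`); the helper names `is_model_ready` and `check_health` are hypothetical.

```python
import json
import urllib.request


def is_model_ready(payload: str) -> bool:
    """Interpret a /health response body like {"status":"ok","model_loaded":true}."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return data.get("status") == "ok" and data.get("model_loaded") is True


def check_health(url: str = "http://localhost:8011/health", timeout: float = 5.0) -> bool:
    """Return True if the service answers and reports a loaded model."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_model_ready(resp.read().decode("utf-8"))
    except OSError:  # covers connection errors and HTTP errors
        return False
```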
|
||||
|
||||
---
|
||||
|
||||
## 后端配置
|
||||
|
||||
`backend/.env` 中的相关变量:
|
||||
|
||||
```ini
|
||||
# MuseTalk 基础配置
|
||||
MUSETALK_GPU_ID=0 # GPU 编号 (与 CosyVoice 共存)
|
||||
MUSETALK_API_URL=http://localhost:8011 # 常驻服务地址
|
||||
MUSETALK_BATCH_SIZE=32 # 推理批大小
|
||||
MUSETALK_VERSION=v15 # 模型版本
|
||||
MUSETALK_USE_FLOAT16=true # 半精度加速
|
||||
|
||||
# 推理质量参数
|
||||
MUSETALK_DETECT_EVERY=2 # 人脸检测降频间隔 (帧,越小越准但更慢)
|
||||
MUSETALK_BLEND_CACHE_EVERY=2 # BiSeNet mask 缓存更新间隔 (帧)
|
||||
MUSETALK_AUDIO_PADDING_LEFT=2 # Whisper 时序上下文 (左)
|
||||
MUSETALK_AUDIO_PADDING_RIGHT=2 # Whisper 时序上下文 (右)
|
||||
MUSETALK_EXTRA_MARGIN=14 # v1.5 下巴区域扩展像素
|
||||
MUSETALK_DELAY_FRAME=0 # 音频-口型对齐偏移 (帧)
|
||||
MUSETALK_BLEND_MODE=jaw # 融合模式: auto / jaw / raw
|
||||
MUSETALK_FACEPARSING_LEFT_CHEEK_WIDTH=90 # 面颊宽度 (仅 v1.5)
|
||||
MUSETALK_FACEPARSING_RIGHT_CHEEK_WIDTH=90
|
||||
|
||||
# 编码质量参数
|
||||
MUSETALK_ENCODE_CRF=14 # CRF 越小越清晰 (14≈接近视觉无损)
|
||||
MUSETALK_ENCODE_PRESET=slow # x264 preset (slow=高压缩效率)
|
||||
|
||||
# 混合唇形同步路由
|
||||
LIPSYNC_DURATION_THRESHOLD=100 # 秒, >=此值用 MuseTalk
|
||||
```
|
||||
|
||||
> **Parameter presets for reference**:
> - Speed first: `DETECT_EVERY=5, BLEND_CACHE_EVERY=5, ENCODE_CRF=18, ENCODE_PRESET=medium`
> - Quality first (current): `DETECT_EVERY=2, BLEND_CACHE_EVERY=2, ENCODE_CRF=14, ENCODE_PRESET=slow`
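To switch between the two presets without editing values by hand, the lines for `backend/.env` could be generated. This is a hypothetical helper, not part of the project; it assumes the shorthand names in the note map to the `MUSETALK_*` variables listed in the configuration block.

```python
PRESETS = {
    "speed": {
        "MUSETALK_DETECT_EVERY": "5",
        "MUSETALK_BLEND_CACHE_EVERY": "5",
        "MUSETALK_ENCODE_CRF": "18",
        "MUSETALK_ENCODE_PRESET": "medium",
    },
    "quality": {
        "MUSETALK_DETECT_EVERY": "2",
        "MUSETALK_BLEND_CACHE_EVERY": "2",
        "MUSETALK_ENCODE_CRF": "14",
        "MUSETALK_ENCODE_PRESET": "slow",
    },
}


def render_env_lines(preset: str) -> str:
    """Render KEY=VALUE lines for the chosen preset ('speed' or 'quality')."""
    return "\n".join(f"{key}={value}" for key, value in PRESETS[preset].items())
```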
|
||||
|
||||
---
|
||||
|
||||
## 相关文件
|
||||
|
||||
| 文件 | 说明 |
|
||||
|------|------|
|
||||
| `models/MuseTalk/scripts/server.py` | FastAPI 常驻服务 (端口 8011) |
|
||||
| `run_musetalk.sh` | PM2 启动脚本 |
|
||||
| `backend/app/services/lipsync_service.py` | 混合路由 + `_call_musetalk_server()` |
|
||||
| `backend/app/core/config.py` | `MUSETALK_*` 配置项 |
|
||||
|
||||
---
|
||||
|
||||
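## Performance optimization (server.py v2)

The first long-video test (136s, 3404 frames) took 30 minutes. Profiling showed the bottlenecks were face detection (28%), BiSeNet compositing (22%), and I/O (17%), rather than UNet inference (17%).

### Optimizations implemented

| Optimization | Description |
|--------|------|
| `MUSETALK_BATCH_SIZE` 8→32 | The RTX 3090 has ample VRAM; UNet inference runs ~3x faster |
| Direct frame reads via cv2.VideoCapture | Skips the ffmpeg→PNG→imread chain |
| Face detection subsampling (every N frames) | DWPose + FaceAlignment run only on sampled frames; bboxes for in-between frames are linearly interpolated |
| BiSeNet mask caching (every N frames) | `get_image_prepare_material` runs every N frames; in-between frames reuse the cached result |
| FFmpeg rawvideo pipe encoding | The lossy intermediate file from `cv2.VideoWriter(mp4v)` is replaced by writing directly to a stdin pipe, removing one redundant lossy encode |
| Parameters via environment variables | All inference/encoding parameters are read from `.env`, allowing quick switching between speed-first and quality-first |
| Per-stage timing | Seven stages are timed precisely to support later tuning |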
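The bbox interpolation used for face detection subsampling can be sketched as follows. This is an illustration only: the function name `interpolate_bboxes`, the `(x1, y1, x2, y2)` box layout, and the clamping at the ends are assumptions, not the code in `server.py`.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def interpolate_bboxes(keyframes: Dict[int, BBox], num_frames: int) -> List[BBox]:
    """Linearly interpolate boxes between sampled frames; clamp outside the sampled range."""
    if not keyframes:
        raise ValueError("at least one detected bbox is required")
    indices = sorted(keyframes)
    result: List[BBox] = []
    for frame in range(num_frames):
        if frame <= indices[0]:
            result.append(keyframes[indices[0]])
            continue
        if frame >= indices[-1]:
            result.append(keyframes[indices[-1]])
            continue
        # Locate the sampled frames on either side of the current frame.
        right_pos = next(p for p, idx in enumerate(indices) if idx >= frame)
        left, right = indices[right_pos - 1], indices[right_pos]
        t = (frame - left) / (right - left)
        a, b = keyframes[left], keyframes[right]
        result.append(tuple(x + (y - x) * t for x, y in zip(a, b)))
    return result
```

With detection every 2 frames, only every second frame needs a detector call and the frames in between cost a few multiplications each.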
|
||||
|
||||
### Encoding chain

```
UNet inference frames (raw BGR24)
  → FFmpeg rawvideo stdin pipe
  → single libx264 encode (CRF 14, preset slow) + audio mux
  → final output .mp4
```

Compared with the old flow: the lossy intermediate file from `cv2.VideoWriter(mp4v)` is gone, and the number of encodes drops from 2 to 1.
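A command line for this kind of single-pass pipe encode might be assembled as in the sketch below. The function name `build_rawvideo_encode_cmd` is hypothetical, and the output pixel format (`yuv420p`), audio codec (`aac`), and `-shortest` are assumptions added for illustration; the source does not show the exact flags used by `server.py`.

```python
from typing import List


def build_rawvideo_encode_cmd(
    width: int,
    height: int,
    fps: int,
    audio_path: str,
    output_path: str,
    crf: int = 14,
    preset: str = "slow",
) -> List[str]:
    """Build an ffmpeg command that reads raw BGR24 frames from stdin and muxes audio."""
    return [
        "ffmpeg", "-y",
        # Input 0: raw frames piped through stdin.
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",
        # Input 1: the audio track to mux in.
        "-i", audio_path,
        # One libx264 encode with the configured quality settings.
        "-c:v", "libx264", "-crf", str(crf), "-preset", preset,
        "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",
        output_path,
    ]
```

The command would typically be started with `subprocess.Popen(cmd, stdin=subprocess.PIPE)`, with each frame written to the pipe as `frame.tobytes()` and stdin closed once the last frame has been sent.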
|
||||
|
||||
### 调优参数
|
||||
|
||||
所有参数通过 `backend/.env` 配置(修改后需重启 MuseTalk 服务生效):
|
||||
|
||||
```ini
|
||||
MUSETALK_DETECT_EVERY=2 # 人脸检测降频间隔 (帧),质量优先用 2,速度优先用 5
|
||||
MUSETALK_BLEND_CACHE_EVERY=2 # BiSeNet mask 缓存间隔 (帧)
|
||||
MUSETALK_ENCODE_CRF=14 # 编码质量 (14≈接近视觉无损,18=高质量)
|
||||
MUSETALK_ENCODE_PRESET=slow # 编码速度 (slow=高压缩效率,medium=平衡)
|
||||
```
|
||||
|
||||
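> For talking-head videos (where the face barely moves), the interpolation error at detect_every=5 is negligible.
> If the face moves a lot or you want the best quality, use detect_every=2.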
|
||||
|
||||
---
|
||||
|
||||
## 常见问题
|
||||
|
||||
### huggingface-hub 版本冲突
|
||||
|
||||
```
|
||||
ImportError: huggingface-hub>=0.19.3,<1.0 is required
|
||||
```
|
||||
|
||||
**解决**:降级 huggingface-hub
|
||||
|
||||
```bash
|
||||
pip install "huggingface-hub>=0.19.3,<1.0"
|
||||
```
|
||||
|
||||
### mmcv 导入失败
|
||||
|
||||
```bash
|
||||
pip uninstall mmcv mmcv-full -y
|
||||
mim install "mmcv==2.0.1"
|
||||
```
|
||||
|
||||
### Audio/video length mismatch

Already fixed in `musetalk/utils/audio_processor.py` (zero-padding logic); no further action is needed.
|
||||
215
Docs/PUBLISH_DEPLOY.md
Normal file
@@ -0,0 +1,215 @@
|
||||
# 多平台发布部署与实现说明(抖音 / 微信视频号 / B站 / 小红书)
|
||||
|
||||
## 1. 目标
|
||||
|
||||
本文件用于集中说明以下内容:
|
||||
|
||||
- 平台登录(扫码)如何实现
|
||||
- 自动化发布链路如何实现
|
||||
- 部署时必须具备的运行环境与配置
|
||||
- 常见故障如何快速定位
|
||||
|
||||
适用代码范围:`backend/app/modules/publish`、`backend/app/services/publish_service.py`、`backend/app/services/qr_login_service.py`、`backend/app/services/uploader/*`。
|
||||
|
||||
---
|
||||
|
||||
## 2. 总体架构
|
||||
|
||||
### 2.1 API 入口
|
||||
|
||||
- `POST /api/publish`:执行发布
|
||||
- `POST /api/publish/login/{platform}`:获取二维码并启动登录会话
|
||||
- `GET /api/publish/login/status/{platform}`:轮询扫码状态
|
||||
- `POST /api/publish/logout/{platform}`:注销并删除对应 Cookie
|
||||
- `POST /api/publish/cookies/save/{platform}`:手动保存浏览器 `document.cookie`
|
||||
- `GET /api/publish/accounts`:查询各平台是否已登录
|
||||
- `GET /api/publish/screenshot/{filename}`:读取发布成功截图(需登录)
|
||||
- `POST /api/videos/cleanup`:清理当前用户工作区生成产物(发布成功后前端触发)
|
||||
|
||||
核心路由文件:`backend/app/modules/publish/router.py`。
|
||||
|
||||
### 2.2 服务分层
|
||||
|
||||
- `PublishService`:平台路由、账号隔离、视频路径处理、调用具体 uploader
|
||||
- `QRLoginService`:Playwright 获取二维码、监控扫码结果、保存 Cookie
|
||||
- `*Uploader`:平台发布自动化(抖音/微信/小红书基于 Playwright,B站基于 biliup)
|
||||
|
||||
### 2.3 发布成功后的清理联动
|
||||
|
||||
- 前端 `CleanupContext` 在“本次所选平台全部发布成功”时触发清理弹窗。
|
||||
- 用户点击清理时先调用 `POST /api/videos/cleanup`,仅接口成功后才清本地输入并关闭弹窗。
|
||||
- 清理成功后前端派发 `vigent:workspace-cleared` 事件,当前发布页会就地重置标题/标签输入态。
|
||||
- 接口失败时弹窗保持打开并允许重试;连续失败达到阈值后可“暂不清理,继续使用”。
|
||||
- 弹窗“下载视频备份”走同源下载接口:`GET /api/videos/generated/{video_id}/download`,确保浏览器直接保存文件而非新标签页播放。
|
||||
|
||||
---
|
||||
|
||||
## 3. Cookies and account isolation

### 3.1 Storage paths

- Per-user isolated path: `backend/user_data/{user_uuid}/cookies/{platform}_cookies.json`
- Legacy-compatible path: `backend/app/cookies/{platform}_cookies.json`

Path management file: `backend/app/core/paths.py`.

### 3.2 Cookie format

- `bilibili`: simplified dict format (`SESSDATA` / `bili_jct` / `DedeUserID` / `DedeUserID__ckMd5`)
- `douyin` / `weixin` / `xiaohongshu`: Playwright `storage_state` format (`cookies + origins`)

Relevant logic: `backend/app/services/publish_service.py` and `backend/app/services/qr_login_service.py`.
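When debugging a cookie file, it helps to tell the two formats apart quickly. The sketch below is a hypothetical helper, not code from `publish_service.py`; it assumes the file has already been loaded with `json.load` and relies only on the field names listed above.

```python
from typing import Any


def detect_cookie_format(data: Any) -> str:
    """Classify a loaded cookie file as 'storage_state', 'bilibili_simple', or 'unknown'."""
    if isinstance(data, dict):
        # Playwright storage_state: a 'cookies' list plus an 'origins' entry.
        if isinstance(data.get("cookies"), list) and "origins" in data:
            return "storage_state"
        # Simplified Bilibili dict: core session fields at the top level.
        if "SESSDATA" in data and "bili_jct" in data:
            return "bilibili_simple"
    return "unknown"
```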
|
||||
|
||||
---
|
||||
|
||||
## 4. 运行与部署要求
|
||||
|
||||
### 4.1 系统依赖
|
||||
|
||||
- Python 3.10+
|
||||
- Node.js 18+
|
||||
- Playwright Chromium(`playwright install chromium`)
|
||||
- 系统 Chrome(建议)
|
||||
- Xvfb(建议,尤其抖音/微信 headful)
|
||||
|
||||
### 4.2 启动建议
|
||||
|
||||
- 推荐使用根目录脚本启动后端:`./run_backend.sh`
|
||||
- 脚本内置 `xvfb-run`,适合无物理桌面服务器场景
|
||||
|
||||
脚本:`run_backend.sh`。
|
||||
|
||||
### 4.3 环境变量(核心)
|
||||
|
||||
统一在 `backend/.env` 配置,配置定义见 `backend/app/core/config.py`。
|
||||
|
||||
- 抖音:`DOUYIN_HEADLESS_MODE`、`DOUYIN_CHROME_PATH`、`DOUYIN_USER_AGENT`、`DOUYIN_LOCALE`、`DOUYIN_TIMEZONE_ID`
|
||||
- 微信:`WEIXIN_HEADLESS_MODE`、`WEIXIN_CHROME_PATH`、`WEIXIN_USER_AGENT`、`WEIXIN_LOCALE`、`WEIXIN_TIMEZONE_ID`、`WEIXIN_TRANSCODE_MODE`
|
||||
- 小红书:`XIAOHONGSHU_HEADLESS_MODE`、`XIAOHONGSHU_CHROME_PATH`、`XIAOHONGSHU_USER_AGENT`、`XIAOHONGSHU_LOCALE`、`XIAOHONGSHU_TIMEZONE_ID`
|
||||
- 发布截图目录:`PUBLISH_SCREENSHOT_DIR`
|
||||
|
||||
说明:小红书这些配置当前用于发布 uploader;扫码登录服务里抖音/微信使用独立配置,B站/小红书登录走通用默认浏览器参数。
|
||||
|
||||
---
|
||||
|
||||
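## 5. Login implementation (QR code)

Handled uniformly by `QRLoginService`:

1. Open the platform login page and extract the QR code (multiple CSS/text strategies)
2. The frontend shows the QR code for the user to scan
3. The backend monitors URL and session cookie changes
4. After a successful login, the cookie file is saved

Key file: `backend/app/services/qr_login_service.py`.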
|
||||
|
||||
### 5.1 抖音
|
||||
|
||||
- 登录页:`https://creator.douyin.com/`
|
||||
- 额外能力:监听 `check_qrconnect` 接口,支持识别 `redirect_url`
|
||||
- 特殊场景:若触发刷脸验证,会提取验证二维码 `face_verify_qr` 返回前端
|
||||
|
||||
### 5.2 微信视频号
|
||||
|
||||
- 登录页:`https://channels.weixin.qq.com/platform/`
|
||||
- 二维码提取支持 `img/canvas/svg` 等兜底选择器
|
||||
|
||||
### 5.3 小红书
|
||||
|
||||
- 登录页:`https://creator.xiaohongshu.com/`
|
||||
- 关键修复:默认可能落在短信登录页,先自动切换到扫码模式再提取二维码
|
||||
- 成功判定支持 `/new/home`,避免仅依赖旧 `success_indicator`
|
||||
|
||||
### 5.4 B站
|
||||
|
||||
- 登录页:`https://passport.bilibili.com/login`
|
||||
- 扫码成功后保存 B站所需核心 Cookie 字段
|
||||
|
||||
---
|
||||
|
||||
## 6. 自动化发布实现
|
||||
|
||||
### 6.1 抖音(Playwright)
|
||||
|
||||
文件:`backend/app/services/uploader/douyin_uploader.py`
|
||||
|
||||
- 使用 `storage_state` 打开浏览器上下文
|
||||
- 自动进入上传页,触发 file chooser 上传
|
||||
- 上传完成后填写标题/简介/话题,必要时处理封面
|
||||
- 发布成功判定:页面跳转、接口信号、管理页核验
|
||||
- 成功后回写 Cookie,并保存发布成功截图
|
||||
|
||||
### 6.2 微信视频号(Playwright)
|
||||
|
||||
文件:`backend/app/services/uploader/weixin_uploader.py`
|
||||
|
||||
- 进入视频号创作平台,自动定位上传入口
|
||||
- 标题/描述/标签按当前产品规则统一写入“视频描述”字段
|
||||
- 发布成功判定:`post_create` API 或页面离开创建页
|
||||
- 成功后回写 Cookie,并保存发布成功截图
|
||||
|
||||
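### 6.3 Xiaohongshu (Playwright)

File: `backend/app/services/uploader/xiaohongshu_uploader.py`

- Navigates to the publish page and triggers the upload automatically
- Upload-stage hardening:
  - `UPLOAD_SIGNAL_TIMEOUT` startup probe window
  - For video files without an extension, a temporary file with an extension is prepared automatically (`hardlink/copy`)
  - Filename extension consistency check
  - `UPLOAD_IDLE_TIMEOUT` idle-timeout protection, avoiding long "apparently stuck" uploads
- Publish success is determined by: URL redirect + success message + publish API signal
- On success, the cookie is written back and the success screenshot URL is returned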
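The hardlink-or-copy fallback for files without an extension can be sketched like this. The function name `ensure_suffixed_file`, the `tmp_dir` argument, and the `.mp4` default are assumptions for illustration; the actual uploader may choose the extension differently.

```python
import os
import shutil
from pathlib import Path


def ensure_suffixed_file(src: Path, tmp_dir: Path, default_suffix: str = ".mp4") -> Path:
    """Return a path that has an extension, hard-linking or copying if src lacks one."""
    if src.suffix:
        return src  # already has an extension, use it as-is
    tmp_dir.mkdir(parents=True, exist_ok=True)
    dst = tmp_dir / (src.name + default_suffix)
    if dst.exists():
        dst.unlink()
    try:
        os.link(src, dst)       # hard link: no extra disk usage
    except OSError:
        shutil.copy2(src, dst)  # fallback, e.g. across filesystems
    return dst
```

Trying the hard link first avoids duplicating large video files; the copy only happens where linking is not possible.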
|
||||
|
||||
### 6.4 B站(biliup)
|
||||
|
||||
文件:`backend/app/services/uploader/bilibili_uploader.py`
|
||||
|
||||
- 使用 biliup SDK,不依赖 Playwright 发布流程
|
||||
- 读取 B站 Cookie,调用 biliup 上传并提交
|
||||
- 返回 `bvid/aid` 对应链接(若 API 返回)
|
||||
|
||||
---
|
||||
|
||||
## 7. 调试与排障
|
||||
|
||||
### 7.1 后端日志
|
||||
|
||||
- PM2 输出日志:`~/.pm2/logs/vigent2-backend-out.log`
|
||||
- PM2 错误日志:`~/.pm2/logs/vigent2-backend-error.log`
|
||||
|
||||
### 7.2 常见问题
|
||||
|
||||
- 现象:登录二维码拿不到
|
||||
- 优先检查平台登录页是否改版(selector 失效)
|
||||
- 小红书需确认是否仍停留短信登录视图
|
||||
|
||||
- 现象:发布看起来卡住
|
||||
- 检查是否长期停留“等待上传状态/等待发布结果”
|
||||
- 小红书优先检查上传文件名后缀与 MIME 识别
|
||||
|
||||
- 现象:突然要求重新登录
|
||||
- 通常为 Cookie 失效或平台风控,需要重新扫码
|
||||
|
||||
### 7.3 调试产物
|
||||
|
||||
- 开启对应 `*_DEBUG_ARTIFACTS` 可输出调试截图/网络日志
|
||||
- 成功截图通过 `/api/publish/screenshot/{filename}` 回传前端
|
||||
|
||||
---
|
||||
|
||||
## 8. 建议的验收流程(每次部署后)
|
||||
|
||||
1. 健康检查:`curl http://127.0.0.1:8006/health`
|
||||
2. 登录检查:分别触发 4 个平台扫码登录并确认状态轮询可达成功
|
||||
3. 发布检查:四个平台各发 1 条测试视频(或最少覆盖当日变更平台)
|
||||
4. 截图检查:确认成功截图可通过 `/api/publish/screenshot/{filename}` 拉取
|
||||
5. 日志检查:确认无持续重试、无长时间空转、无明显 selector 失败风暴
|
||||
|
||||
---
|
||||
|
||||
## 9. 关联文档
|
||||
|
||||
- 总部署文档:`Docs/DEPLOY_MANUAL.md`
|
||||
- 后端说明:`Docs/BACKEND_README.md`
|
||||
- 当日变更记录:`Docs/DevLogs/Day31.md`
|
||||
@@ -1,6 +1,10 @@
|
||||
# Qwen3-TTS 1.7B 部署指南
|
||||
|
||||
> 本文档描述如何在 Ubuntu 服务器上部署 Qwen3-TTS 1.7B-Base 声音克隆模型。
|
||||
>
|
||||
> ⚠️ **状态:历史归档(已停用)**
|
||||
> 当前项目生产环境已切换到 CosyVoice 3.0,请优先参考 `Docs/COSYVOICE3_DEPLOY.md`。
|
||||
> 本文档仅保留用于回溯旧方案,不建议新部署继续使用。
|
||||
|
||||
## 系统要求
|
||||
|
||||
@@ -298,12 +302,20 @@ Response: audio/wav 文件
|
||||
SoX could not be found!
|
||||
```
|
||||
|
||||
**解决**: 通过 conda 安装 sox:
|
||||
**解决**:
|
||||
|
||||
1. 通过 conda 安装 sox:
|
||||
|
||||
```bash
|
||||
conda install -y -c conda-forge sox
|
||||
```
|
||||
|
||||
2. 确保启动脚本 `run_qwen_tts.sh` 中已 export conda env bin 到 PATH(PM2 启动时系统 PATH 不含 conda 环境目录):
|
||||
|
||||
```bash
|
||||
export PATH="/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin:$PATH"
|
||||
```
|
||||
|
||||
### CUDA 内存不足
|
||||
|
||||
Qwen3-TTS 1.7B 通常需要 8-10GB VRAM。如果遇到 OOM:
|
||||
@@ -371,6 +383,7 @@ FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');
|
||||
|
||||
| 日期 | 版本 | 说明 |
|
||||
|------|------|------|
|
||||
| 2026-02-09 | 1.2.0 | 修复 SoX PATH 问题(run_qwen_tts.sh export conda bin),每次生成后 empty_cache() |
|
||||
| 2026-01-30 | 1.1.0 | 明确默认模型升级为 1.7B-Base,替换旧版 0.6B 路径 |
|
||||
|
||||
---
|
||||
|
||||
@@ -15,11 +15,17 @@
|
||||
原有流程:
|
||||
文本 → EdgeTTS → 音频 → LatentSync → FFmpeg合成 → 最终视频
|
||||
|
||||
新流程:
|
||||
文本 → EdgeTTS → 音频 ─┬→ LatentSync → 唇形视频 ─┐
|
||||
└→ faster-whisper → 字幕JSON ─┴→ Remotion合成 → 最终视频
|
||||
新流程 (单素材):
|
||||
文本 → EdgeTTS/CosyVoice/预生成配音 → 音频 ─┬→ LatentSync/MuseTalk → 唇形视频 ─┐
|
||||
└→ faster-whisper → 字幕JSON ─┴→ Remotion合成 → 最终视频
|
||||
|
||||
新流程 (多素材):
|
||||
音频 → 多素材按 custom_assignments 拼接 → LatentSync/MuseTalk (单次推理) → 唇形视频 ─┐
|
||||
音频 → faster-whisper → 字幕JSON ─────────────────────────────────────────────┴→ Remotion合成 → 最终视频
|
||||
```
|
||||
|
||||
> **唇形同步路由**: 短视频 (<100s,按当前 `.env` 示例) 用 LatentSync 1.6 (GPU1),长视频 (>=100s,按当前 `.env` 示例) 用 MuseTalk 1.5 (GPU0),由 `LIPSYNC_DURATION_THRESHOLD` 控制。
|
||||
|
||||
## 系统要求
|
||||
|
||||
| 组件 | 要求 |
|
||||
@@ -52,6 +58,9 @@ cd /home/rongye/ProgramFiles/ViGent2/remotion
|
||||
|
||||
# 安装依赖
|
||||
npm install
|
||||
|
||||
# 预编译渲染脚本 (生产环境必须)
|
||||
npm run build:render
|
||||
```
|
||||
|
||||
### 步骤 3: 重启后端服务
|
||||
@@ -137,8 +146,8 @@ remotion/
|
||||
| 阶段 | 进度 | 说明 |
|
||||
|------|------|------|
|
||||
| 下载素材 | 0% → 5% | 从 Supabase 下载输入视频 |
|
||||
| TTS 语音生成 | 5% → 25% | EdgeTTS 或 Qwen3-TTS 生成音频 |
|
||||
| 唇形同步 | 25% → 80% | LatentSync 推理 |
|
||||
| TTS 语音生成 | 5% → 25% | EdgeTTS / CosyVoice / 预生成配音下载 |
|
||||
| 唇形同步 | 25% → 80% | LatentSync / MuseTalk(按阈值路由) |
|
||||
| 字幕对齐 | 80% → 85% | faster-whisper 生成字级别时间戳 |
|
||||
| Remotion 渲染 | 85% → 95% | 合成字幕和标题 |
|
||||
| 上传结果 | 95% → 100% | 上传到 Supabase Storage |
|
||||
@@ -178,7 +187,9 @@ Remotion 渲染参数在 `backend/app/services/remotion_service.py` 中配置:
|
||||
| 参数 | 默认值 | 说明 |
|
||||
|------|--------|------|
|
||||
| `fps` | 25 | 输出帧率 |
|
||||
| `title_duration` | 3.0 | 标题显示时长(秒) |
|
||||
| `concurrency` | 4 | Remotion 并发渲染进程数(默认 4,可通过 `--concurrency` CLI 参数覆盖) |
|
||||
| `title_display_mode` | `short` | 标题显示模式(`short`=短暂显示;`persistent`=常驻显示) |
|
||||
| `title_duration` | 4.0 | 标题显示时长(秒,仅 `short` 模式生效) |
|
||||
|
||||
---
|
||||
|
||||
@@ -230,6 +241,15 @@ const bundleLocation = await bundle({
|
||||
const videoUrl = staticFile(videoSrc); // 使用 staticFile
|
||||
```
|
||||
|
||||
**问题**: Remotion 渲染失败 - 404 视频文件找不到(bundle 缓存问题)
|
||||
|
||||
Remotion 使用 bundle 缓存加速打包。缓存命中时,新生成的视频/字体文件需要硬链接到缓存的 `public/` 目录。如果出现 404 错误,清除缓存重试:
|
||||
|
||||
```bash
|
||||
rm -rf /home/rongye/ProgramFiles/ViGent2/remotion/.remotion-bundle-cache
|
||||
pm2 restart vigent2-backend
|
||||
```
|
||||
|
||||
**问题**: Remotion 渲染失败
|
||||
|
||||
查看后端日志:
|
||||
@@ -265,7 +285,7 @@ wget https://github.com/googlefonts/noto-cjk/raw/main/Sans/OTF/SimplifiedChinese
|
||||
|
||||
### 使用 GPU 0
|
||||
|
||||
faster-whisper 默认使用 GPU 0,与 LatentSync (GPU 1) 分开,避免显存冲突。如需指定 GPU:
|
||||
faster-whisper 默认使用 GPU 0,与 MuseTalk 共享 GPU 0;LatentSync 使用 GPU 1,互不冲突。如需指定 GPU:
|
||||
|
||||
```python
|
||||
# 在 whisper_service.py 中修改
|
||||
@@ -279,4 +299,10 @@ WhisperService(device="cuda:0") # 或 "cuda:1"
|
||||
| 日期 | 版本 | 说明 |
|
||||
|------|------|------|
|
||||
| 2026-01-29 | 1.0.0 | 初始版本,使用 faster-whisper + Remotion 实现逐字高亮字幕和片头标题 |
|
||||
| 2026-02-10 | 1.1.0 | 更新架构图:多素材 concat-then-infer、预生成配音选项 |
|
||||
| 2026-01-30 | 1.0.1 | 字幕高亮样式与标题动画优化,视觉表现更清晰 |
|
||||
| 2026-02-25 | 1.2.0 | 字幕时间戳从线性插值改为 Whisper 节奏映射,修复长视频字幕漂移 |
|
||||
| 2026-02-27 | 1.3.0 | 架构图更新 MuseTalk 混合路由;Remotion 并发渲染从 8 提升到 16;GPU 分配说明更新 |
|
||||
| 2026-02-28 | 1.3.1 | MuseTalk 合成阶段优化:纯 numpy blending + FFmpeg pipe NVENC GPU 硬编码替代双重编码 |
|
||||
| 2026-02-28 | 1.4.0 | compose 流复制替代重编码;FFmpeg 超时保护 (600s/30s);Remotion 并发 16→4;Whisper 时间戳平滑 + 原文节奏映射;全局视频生成 Semaphore(2);Redis 任务 TTL |
|
||||
| 2026-03-02 | 1.5.0 | Remotion bundle 缓存修复(硬链接视频/字体到 cached public 目录);编码流水线优化 prepare_segment/normalize CRF 23→18;多素材 concat 改为流复制;MuseTalk 合成改为 rawvideo 管道 + `libx264`(可配 CRF/preset) |
|
||||
|
||||
@@ -1,424 +0,0 @@
|
||||
# 数字人口播视频生成系统 - 实现计划
|
||||
|
||||
## 项目目标
|
||||
|
||||
构建一个开源的数字人口播视频生成系统,功能包括:
|
||||
- 上传静态人物视频 → 生成口播视频(唇形同步)
|
||||
- TTS 配音或声音克隆
|
||||
- 字幕自动生成与渲染
|
||||
- AI 自动生成标题与标签
|
||||
- 一键发布到多个社交平台
|
||||
|
||||
---
|
||||
|
||||
## 技术架构
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ 前端 (Next.js) │
|
||||
│ 素材管理 | 视频生成 | 发布管理 | 任务状态 │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
│ REST API
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────┐
|
||||
│ 后端 (FastAPI) │
|
||||
├─────────────────────────────────────────────────────────┤
|
||||
│ 异步任务队列 (asyncio) │
|
||||
│ ├── 视频生成任务 │
|
||||
│ ├── TTS 配音任务 │
|
||||
│ └── 自动发布任务 │
|
||||
└─────────────────────────────────────────────────────────┘
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌──────────┐ ┌──────────┐ ┌──────────┐
|
||||
│LatentSync│ │ FFmpeg │ │Playwright│
|
||||
│ 唇形同步 │ │ 视频合成 │ │ 自动发布 │
|
||||
└──────────┘ └──────────┘ └──────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 技术选型
|
||||
|
||||
| 模块 | 技术选择 | 备选方案 |
|
||||
|------|----------|----------|
|
||||
| **前端框架** | Next.js 14 | Vue 3 + Vite |
|
||||
| **UI 组件库** | Tailwind + shadcn/ui | Ant Design |
|
||||
| **后端框架** | FastAPI | Flask |
|
||||
| **任务队列** | Celery + Redis | RQ / Dramatiq |
|
||||
| **唇形同步** | **LatentSync 1.6** | MuseTalk / Wav2Lip |
|
||||
| **TTS 配音** | EdgeTTS | CosyVoice |
|
||||
| **声音克隆** | **Qwen3-TTS 1.7B** ✅ | GPT-SoVITS |
|
||||
| **视频处理** | FFmpeg | MoviePy |
|
||||
| **自动发布** | social-auto-upload | 自行实现 |
|
||||
| **数据库** | SQLite → PostgreSQL | MySQL |
|
||||
| **文件存储** | 本地 / MinIO | 阿里云 OSS |
|
||||
|
||||
---
|
||||
|
||||
## 分阶段实施计划
|
||||
|
||||
### 阶段一:核心功能验证 (MVP)
|
||||
|
||||
> **目标**:验证 MuseTalk + EdgeTTS 效果,跑通端到端流程
|
||||
|
||||
#### 1.1 环境搭建
|
||||
|
||||
```bash
|
||||
# 创建项目目录
|
||||
mkdir TalkingHeadAgent
|
||||
cd TalkingHeadAgent
|
||||
|
||||
# 克隆 MuseTalk
|
||||
git clone https://github.com/TMElyralab/MuseTalk.git
|
||||
|
||||
# 安装依赖
|
||||
cd MuseTalk
|
||||
pip install -r requirements.txt
|
||||
|
||||
# 下载模型权重 (按官方文档)
|
||||
```
|
||||
|
||||
#### 1.2 集成 EdgeTTS
|
||||
|
||||
```python
|
||||
# tts_engine.py
|
||||
import edge_tts
|
||||
import asyncio
|
||||
|
||||
async def text_to_speech(text: str, voice: str = "zh-CN-YunxiNeural", output_path: str = "output.mp3"):
|
||||
communicate = edge_tts.Communicate(text, voice)
|
||||
await communicate.save(output_path)
|
||||
return output_path
|
||||
```
|
||||
|
||||
#### 1.3 端到端测试脚本
|
||||
|
||||
```python
|
||||
# test_pipeline.py
|
||||
"""
|
||||
1. 文案 → EdgeTTS → 音频
|
||||
2. 静态视频 + 音频 → MuseTalk → 口播视频
|
||||
3. 添加字幕 → FFmpeg → 最终视频
|
||||
"""
|
||||
```
|
||||
|
||||
#### 1.4 验证标准
|
||||
- [ ] MuseTalk 能正常推理
|
||||
- [ ] 唇形与音频同步率 > 90%
|
||||
- [ ] 单个视频生成时间 < 2 分钟
|
||||
|
||||
---
|
||||
|
||||
### 阶段二:后端 API 开发
|
||||
|
||||
> **目标**:将核心功能封装为 API,支持异步任务
|
||||
|
||||
#### 2.1 项目结构
|
||||
|
||||
```
|
||||
backend/
|
||||
├── app/
|
||||
│ ├── main.py # FastAPI 入口
|
||||
│ ├── api/
|
||||
│ │ ├── videos.py # 视频生成 API
|
||||
│ │ ├── materials.py # 素材管理 API
|
||||
│ │ └── publish.py # 发布管理 API
|
||||
│ ├── services/
|
||||
│ │ ├── tts_service.py # TTS 服务
|
||||
│ │ ├── lipsync_service.py # 唇形同步服务
|
||||
│ │ └── video_service.py # 视频合成服务
|
||||
│ ├── tasks/
|
||||
│ │ └── celery_tasks.py # Celery 异步任务
|
||||
│ ├── models/
|
||||
│ │ └── schemas.py # Pydantic 模型
|
||||
│ └── core/
|
||||
│ └── config.py # 配置管理
|
||||
├── requirements.txt
|
||||
└── docker-compose.yml # Redis + API
|
||||
```
|
||||
|
||||
#### 2.2 核心 API 设计
|
||||
|
||||
| 端点 | 方法 | 功能 |
|
||||
|------|------|------|
|
||||
| `/api/materials` | POST | 上传素材视频 | ✅ |
|
||||
| `/api/materials` | GET | 获取素材列表 | ✅ |
|
||||
| `/api/videos/generate` | POST | 创建视频生成任务 | ✅ |
|
||||
| `/api/tasks/{id}` | GET | 查询任务状态 | ✅ |
|
||||
| `/api/videos/{id}/download` | GET | 下载生成的视频 | ✅ |
|
||||
| `/api/publish` | POST | 发布到社交平台 | ✅ |
|
||||
|
||||
#### 2.3 Celery 任务定义
|
||||
|
||||
```python
|
||||
# tasks/celery_tasks.py
|
||||
@celery.task
|
||||
def generate_video_task(material_id: str, text: str, voice: str):
|
||||
# 1. TTS 生成音频
|
||||
# 2. MuseTalk 唇形同步
|
||||
# 3. FFmpeg 添加字幕
|
||||
# 4. 保存并返回视频 URL
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 阶段三:前端 Web UI
|
||||
|
||||
> **目标**:提供用户友好的操作界面
|
||||
|
||||
#### 3.1 页面设计
|
||||
|
||||
| 页面 | 功能 |
|
||||
|------|------|
|
||||
| **素材库** | 上传/管理多场景素材视频 |
|
||||
| **生成视频** | 输入文案、选择素材、生成预览 |
|
||||
| **任务中心** | 查看生成进度、下载视频 |
|
||||
| **发布管理** | 绑定平台、一键发布、定时发布 |
|
||||
|
||||
#### 3.2 技术实现
|
||||
|
||||
```bash
|
||||
# 创建 Next.js 项目
|
||||
npx create-next-app@latest frontend --typescript --tailwind --app
|
||||
|
||||
# 安装依赖
|
||||
cd frontend
|
||||
npm install @tanstack/react-query axios
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 阶段四:社交媒体发布
|
||||
|
||||
> **目标**:集成 social-auto-upload,支持多平台发布
|
||||
|
||||
#### 4.1 复用 social-auto-upload
|
||||
|
||||
```bash
|
||||
# 复制模块
|
||||
cp -r SuperIPAgent/social-auto-upload backend/social_upload
|
||||
```
|
||||
|
||||
#### 4.2 Cookie 管理
|
||||
|
||||
```python
|
||||
# 用户通过浏览器登录 → 保存 Cookie → 后续自动发布
|
||||
```
|
||||
|
||||
#### 4.3 支持平台
|
||||
- 抖音
|
||||
- 小红书
|
||||
- 微信视频号
|
||||
- 快手
|
||||
|
||||
---
|
||||
|
||||
### 阶段五:优化与扩展
|
||||
|
||||
| 功能 | 实现方式 |
|
||||
|------|----------|
|
||||
| **声音克隆** | 集成 GPT-SoVITS,用自己的声音 |
|
||||
| **AI 标题/标签生成** | 调用大模型 API 自动生成标题与标签 ✅ |
|
||||
| **批量生成** | 上传 Excel/CSV,批量生成视频 |
|
||||
| **字幕编辑器** | 可视化调整字幕样式、位置 |
|
||||
| **Docker 部署** | 一键部署到云服务器 | ✅ |
|
||||
|
||||
---
|
||||
|
||||
### 阶段六:MuseTalk 服务器部署 (Day 2-3) ✅
|
||||
|
||||
> **目标**:在双显卡服务器上部署 MuseTalk 环境
|
||||
|
||||
- [x] Conda 环境配置 (musetalk)
|
||||
- [x] 模型权重下载 (~7GB)
|
||||
- [x] Subprocess 调用方式实现
|
||||
- [x] 健康检查功能
|
||||
|
||||
### 阶段七:MuseTalk 完整修复 (Day 4) ✅
|
||||
|
||||
> **目标**:解决推理脚本的各种兼容性问题
|
||||
|
||||
- [x] 权重检测路径修复 (软链接)
|
||||
- [x] 音视频长度不匹配修复
|
||||
- [x] 推理脚本错误日志增强
|
||||
- [x] 视频合成 MP4 生成验证
|
||||
|
||||
### 阶段八:前端功能增强 (Day 5) ✅
|
||||
|
||||
> **目标**:提升用户体验
|
||||
|
||||
- [x] Web 视频上传功能
|
||||
- [x] 上传进度显示
|
||||
- [x] 自动刷新素材列表
|
||||
|
||||
### 阶段九:唇形同步模型升级 (Day 6) ✅
|
||||
|
||||
> **目标**:从 MuseTalk 迁移到 LatentSync 1.6
|
||||
|
||||
- [x] MuseTalk → LatentSync 1.6 迁移
|
||||
- [x] 后端代码适配 (config.py, lipsync_service.py)
|
||||
- [x] Latent Diffusion 架构 (512x512 高清)
|
||||
- [x] 服务器端到端验证
|
||||
|
||||
### 阶段十:性能优化 (Day 6) ✅
|
||||
|
||||
> **目标**:提升系统响应速度和稳定性
|
||||
|
||||
- [x] 视频预压缩优化 (1080p → 720p 自动适配)
|
||||
- [x] 进度更新细化 (实时反馈)
|
||||
- [x] **常驻模型服务** (Persistent Server, 0s 加载)
|
||||
- [x] **GPU 并发控制** (串行队列防崩溃)
|
||||
|
||||
### 阶段十一:社交媒体发布完善 (Day 7) ✅
|
||||
|
||||
> **目标**:实现全自动扫码登录和多平台发布
|
||||
|
||||
- [x] QR码自动登录 (Playwright headless + Stealth)
|
||||
- [x] 多平台上传器架构 (B站/抖音/小红书)
|
||||
- [x] Cookie 自动管理
|
||||
- [x] 定时发布功能
|
||||
|
||||
### 阶段十二:用户体验优化 (Day 8) ✅
|
||||
|
||||
> **目标**:提升文件管理和历史记录功能
|
||||
|
||||
- [x] 文件名保留 (时间戳前缀 + 原始名称)
|
||||
- [x] 视频持久化 (历史视频列表 API)
|
||||
- [x] 素材/视频删除功能
|
||||
|
||||
### 阶段十三:发布模块优化 (Day 9) ✅
|
||||
|
||||
> **目标**:代码质量优化 + 发布功能验证
|
||||
|
||||
- [x] B站/抖音登录+发布验证通过
|
||||
- [x] 资源清理保障 (try-finally)
|
||||
- [x] 超时保护 (消除无限循环)
|
||||
- [x] 完整类型提示
|
||||
|
||||
### 阶段十四:用户认证系统 (Day 9) ✅
|
||||
|
||||
> **目标**:实现安全、隔离的多用户认证体系
|
||||
|
||||
- [x] Supabase 云数据库集成 (本地自托管)
|
||||
- [x] JWT + HttpOnly Cookie 认证架构
|
||||
- [x] 用户表与权限表设计 (RLS 准备)
|
||||
- [x] 认证部署文档 (Docs/SUPABASE_DEPLOY.md)
|
||||
|
||||
### 阶段十五:部署稳定性优化 (Day 9) ✅
|
||||
|
||||
> **目标**:确保生产环境服务长期稳定
|
||||
|
||||
- [x] 依赖冲突修复 (bcrypt)
|
||||
- [x] 前端构建修复 (Production Build)
|
||||
- [x] PM2 进程守护配置
|
||||
- [x] 部署手册更新 (Docs/DEPLOY_MANUAL.md)
|
||||
|
||||
### 阶段十六:HTTPS 全栈部署 (Day 10) ✅
|
||||
|
||||
> **目标**:实现安全的公网 HTTPS 访问
|
||||
|
||||
- [x] 阿里云 Nginx 反向代理配置
|
||||
- [x] Let's Encrypt SSL 证书集成
|
||||
- [x] Supabase 自托管部署 (Docker)
|
||||
- [x] 端口冲突解决 (3003/8008/8444)
|
||||
- [x] Basic Auth 管理后台保护
|
||||
|
||||
### 阶段十七:声音克隆功能集成 (Day 13) ✅
|
||||
|
||||
> **目标**:实现用户自定义声音克隆能力
|
||||
|
||||
- [x] Qwen3-TTS HTTP 服务 (独立 FastAPI,端口 8009)
|
||||
- [x] 声音克隆服务封装 (voice_clone_service.py)
|
||||
- [x] 参考音频管理 API (上传/列表/删除)
|
||||
- [x] 前端 TTS 模式选择 UI
|
||||
- [x] Supabase ref-audios Bucket 配置
|
||||
- [x] 端到端测试验证
|
||||
|
||||
### 阶段十八:手机号登录迁移 (Day 15) ✅
|
||||
|
||||
> **目标**:将认证系统从邮箱迁移到手机号
|
||||
|
||||
- [x] 数据库 Schema 迁移 (email → phone)
|
||||
- [x] 后端 API 适配 (auth.py/admin.py)
|
||||
- [x] 11位手机号校验 (正则验证)
|
||||
- [x] 修改密码功能 (/api/auth/change-password)
|
||||
- [x] 账户设置下拉菜单 (修改密码 + 有效期显示 + 退出)
|
||||
- [x] 前端登录/注册页面更新
|
||||
- [x] 数据库迁移脚本 (migrate_to_phone.sql)
|
||||
|
||||
### 阶段十九:深度性能优化与服务守护 (Day 16) ✅
|
||||
|
||||
> **目标**:提升系统响应速度与服务稳定性
|
||||
|
||||
- [x] Flash Attention 2 集成 (Qwen3-TTS 加速 5x)
|
||||
- [x] LatentSync 性能调优 (OMP 线程限制 + 原生 Flash Attn)
|
||||
- [x] Watchdog 服务守护 (自动重启僵死服务)
|
||||
- [x] 文档体系更新 (部署手册与运维指南)
|
||||
|
||||
---
|
||||
|
||||
## 项目目录结构 (最终)
|
||||
|
||||
---
|
||||
|
||||
## 开发时间估算
|
||||
|
||||
| 阶段 | 预计时间 | 说明 |
|
||||
|------|----------|------|
|
||||
| 阶段一 | 2-3 天 | 环境搭建 + 效果验证 |
|
||||
| 阶段二 | 3-4 天 | 后端 API 开发 |
|
||||
| 阶段三 | 3-4 天 | 前端 UI 开发 |
|
||||
| 阶段四 | 2 天 | 社交发布集成 |
|
||||
| 阶段五 | 按需 | 持续优化 |
|
||||
|
||||
**总计**:约 10-13 天可完成 MVP
|
||||
|
||||
---
|
||||
|
||||
## 验证计划
|
||||
|
||||
### 阶段一验证
|
||||
1. 运行 `test_pipeline.py` 脚本
|
||||
2. 检查生成视频的唇形同步效果
|
||||
3. 确认音画同步
|
||||
|
||||
### 阶段二验证
|
||||
1. 使用 Postman/curl 测试所有 API 端点
|
||||
2. 验证任务队列正常工作
|
||||
3. 检查视频生成完整流程
|
||||
|
||||
### 阶段三验证
|
||||
1. 在浏览器中完成完整操作流程
|
||||
2. 验证上传、生成、下载功能
|
||||
3. 检查响应式布局
|
||||
|
||||
### 阶段四验证
|
||||
1. 发布一个测试视频到抖音
|
||||
2. 验证定时发布功能
|
||||
3. 检查发布状态同步
|
||||
|
||||
---
|
||||
|
||||
## 硬件要求
|
||||
|
||||
| 配置 | 最低要求 | 推荐配置 |
|
||||
|------|----------|----------|
|
||||
| **GPU** | NVIDIA GTX 1060 6GB | RTX 3060 12GB+ |
|
||||
| **内存** | 16GB | 32GB |
|
||||
| **存储** | 100GB SSD | 500GB SSD |
|
||||
| **CUDA** | 11.7+ | 12.0+ |
|
||||
|
||||
---
|
||||
|
||||
## 下一步行动
|
||||
|
||||
1. **确认你的 GPU 配置** - MuseTalk 需要 NVIDIA GPU
|
||||
2. **选择开发起点** - 从阶段一开始验证效果
|
||||
3. **确定项目位置** - 在哪个目录创建项目
|
||||
|
||||
---
|
||||
|
||||
> [!IMPORTANT]
|
||||
> 请确认以上计划是否符合你的需求,有任何需要调整的地方请告诉我。
|
||||
@@ -1,8 +1,8 @@
|
||||
# ViGent2 开发任务清单 (Task Log)
|
||||
|
||||
**项目**: ViGent2 数字人口播视频生成系统
|
||||
**进度**: 100% (Day 16 - 深度优化完成)
|
||||
**更新时间**: 2026-02-03
|
||||
**项目**: ViGent2 数字人口播视频生成系统
|
||||
**进度**: 100% (Day 33 - 文案深度学习落地 + 抓取稳定性增强 + 弹窗操作统一)
|
||||
**更新时间**: 2026-03-05
|
||||
|
||||
---
|
||||
|
||||
@@ -10,15 +10,266 @@
|
||||
|
||||
> 这里记录了每一天的核心开发内容与 milestone。
|
||||
|
||||
### Day 16: 深度性能优化 (Current) 🚀
|
||||
- [x] **Qwen-TTS 加速**: 集成 Flash Attention 2,模型加载速度提升至 8.9s。
|
||||
- [x] **服务守护**: 开发 `Watchdog` 看门狗机制,自动监控并重启僵死服务。
|
||||
- [x] **LatentSync 性能确认**: 验证 DeepCache + 原生 Flash Attn 生效。
|
||||
- [x] **文档重构**: 全面更新 README、部署手册及后端文档。
|
||||
- [x] **UI 交互优化**: 选择项持久化、列表内定位、刷新回顶部。
|
||||
- [x] **样式与预览**: 标题/字幕样式选择 + 预览 + 字号调节。
|
||||
- [x] **背景音乐**: 试听 + 音量控制 + 混音稳定性修复。
|
||||
- [x] **资产库接入**: 字体/BGM 资源库 + `/api/assets` 资源接口。
|
||||
### Day 33: 文案深度学习落地 + 抓取稳定性增强 + 交互统一 (Current)
|
||||
- [x] **文案深度学习功能上线**: 新增 `ScriptLearningModal`(输入主页链接 -> 话题分析 -> 生成文案 -> 填入编辑器)与首页入口接入。
|
||||
- [x] **Tools 新接口**: 新增 `POST /api/tools/analyze-creator` 与 `POST /api/tools/generate-topic-script`,并接入登录鉴权。
|
||||
- [x] **抖音/B站抓取增强**: 博主标题抓取统一升级为 Playwright 直连主链路,支持用户 Cookie 上下文增强与失败重试。
|
||||
- [x] **GLM 调用统一收口**: `glm_service` 新增统一调用入口,标题生成/改写/翻译/话题分析/话题文案生成全部复用,减少重复代码。
|
||||
- [x] **超时体验优化**: 文案深度学习“生成文案”前端超时从 30s 提升到 90s,并补充超时提示文案。
|
||||
- [x] **文案弹窗交互统一**: 文案提取/AI 改写/文案深度学习结果页按钮统一为底部 Action Grid,主次按钮层级与文案动作统一。
|
||||
- [x] **依赖升级**: 后端 venv 升级 `yt-dlp`、`playwright`、`biliup` 并完成兼容性冒烟验证。
|
||||
- [x] **文档同步**: 回写 `Day33`、`FRONTEND_README`、`FRONTEND_DEV`、`BACKEND_README`、`BACKEND_DEV`、`TASK_COMPLETE`。
|
||||
|
||||
### Day 32: 视频下载同源修复 + 安全整改第一批 + Day 日志拆分归档
|
||||
- [x] **下载链路修复**: 新增 `GET /api/videos/generated/{video_id}/download`,统一返回 `Content-Disposition: attachment`,修复“点击下载却新开标签页播放”问题。
|
||||
- [x] **发布成功弹窗下载改造**: `CleanupContext` 从传 URL 改为传 `videoId`,下载按钮改走同源接口,去掉 `target="_blank"`。
|
||||
- [x] **首页下载改造**: `PreviewPanel` 同步切换到同源下载接口,首页与发布页下载行为一致。
|
||||
- [x] **兼容旧持久化状态**: `CleanupContext` 对旧 `videoDownloadUrl` 做 `videoId` 解析回填,避免旧 pending 状态失效。
|
||||
- [x] **文档拆分归档**: 将“下载修复开始后的今日内容”归档到 `Docs/DevLogs/Day32.md`,并从 `Day31.md` 移除对应章节与验证记录。
|
||||
- [x] **安全第一批修复**: JWT 默认密钥生产拦截、AI/Tools 接口强制鉴权、materials 路径穿越拦截、video_id 白名单、上传体积限制、错误信息通用化。
|
||||
- [x] **安全收尾三刀**: `delete_material` 的 `ValueError -> 400`、`tools` URL 下载分支 500MB 限制、`DEBUG=false` 下默认 JWT 密钥阻断启动。
|
||||
- [x] **弹窗关闭策略收敛**: 默认支持 `ESC/X/遮罩` 关闭;发布成功清理弹窗保持强制流程不允许遮罩关闭;录音弹窗录音中禁遮罩关闭(防误触)。
|
||||
|
||||
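### Day 31: Documentation Consolidation + Voice Preview + Recording Modal Refactor + Publish Login Stability Fixes
- [x] **Documentation consolidation**: README/DEV responsibilities are clearly separated, deployment parameters are aligned with the code, and the Qwen3-TTS docs are archived as historical.
- [x] **Voice preview**: Added and enabled `GET/POST /api/videos/voice-preview`; the front end now plays the GET audio stream directly, fixing the production 404 (takes effect after a backend restart).
- [x] **Recording interaction refactor**: The recording entry moved to the bottom of the reference-audio area and the flow became a modal; the modal can be closed right after recording while upload and recognition continue in the background.
- [x] **Unified modal system**: Extracted `AppModal` to unify overlay/focus/scroll lock/Portal and fill in accessibility; the main modals are migrated (preview, extraction, rewrite, trimming, recording, password change, publish login).
- [x] **Douyin QR login fix**: The login page wait strategy changed to `domcontentloaded`, with tolerance for navigation timeouts, avoiding "unable to get QR code".
- [x] **WeChat QR code improvement**: The backend prefers exporting the original PNG and the front end shows it in a white padded container, fixing the impression that the QR code edges were cropped.
- [x] **Publish performance**: The publish page now uses bounded concurrency (concurrency 2), noticeably reducing total wait time for multi-platform publishing.
- [x] **WeChat upload log noise reduction**: The `file_input empty` warning is now signal-driven; non-final retries are downgraded to info, reducing false alarms.
- [x] **Xiaohongshu publish refactor**: Aligned with the Douyin/WeChat upload architecture; added launch configuration, multi-signal upload/publish detection, a success screenshot, and `screenshot_url` in the response.
- [x] **Unified Cookie format**: All non-Bilibili platforms save cookies as Playwright `storage_state`, so the uploader can load the context directly.
- [x] **Xiaohongshu QR login fix**: Automatically switches from SMS login to the QR page and extracts the QR code; login success detection now also covers the `/new/home` path.
- [x] **Xiaohongshu "upload stuck" fix**: Added a temporary-file fallback for videos without an extension (hardlink/copy), a filename extension consistency check, and an idle-upload timeout guard (90s).
- [x] **End-to-end verification**: Xiaohongshu `POST /api/publish` succeeded in a live test (45.77s) and the success screenshot endpoint is reachable.
- [x] **Docs added**: Added `Docs/PUBLISH_DEPLOY.md` and updated `README.md`, `BACKEND_README.md`, `BACKEND_DEV.md`, and `DEPLOY_MANUAL.md`.
- [x] **Doc rules aligned**: Updated `Docs/DOC_RULES.md` with publish-related "three checks" and sensitive-information handling rules, added a `PUBLISH_DEPLOY.md` check item, changed the tool convention to `Read/Grep/apply_patch`, and aligned the TASK_COMPLETE checklist.
- [x] **Home page interaction tweaks**: The `AI生成标题标签` (AI-generate title and tags) button moved to the far right of the "4. Title and Subtitles" heading row; `标题显示方式 + 预览样式` (title display mode + preview style) moved to the right side of the next row; the AI button's radius and size now match "Online recording" while keeping the original blue gradient; the docs state that `title_display_mode` applies to both the main title and the secondary title.
- [x] **Script editor expansion**: Added an expand badge at the bottom-right of the script input; clicking it opens a large editor that stays in real-time sync with the main box; the badge was restyled as a minimal edge-hugging double arrow and nudged to `right-0.5 bottom-2`; fixed focus loss after typing in the expanded input and removed the purple focus border.
- [x] **Site icon update**: Replaced the site icon with `Temp/video.png`; generated and updated `frontend/src/app/icon.png` and the multi-size `frontend/src/app/favicon.ico`.
- [x] **Post-publish cleanup hardening**: Added/improved the full `CleanupContext` + `/api/videos/cleanup` path; the backend no longer swallows deletion errors and the cleanup endpoint has strict success semantics; on failure the front end neither clears local state nor closes the modal; after 3 failures the user may skip cleanup for now; cleanup state expires after 24h and resets when the user changes; the cleanup scope is narrowed to input content fields and user preferences are kept.
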
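### Day 30: Remotion Cache Fix + Encoding Pipeline Quality + Lip-Sync Fault Tolerance + Unified Dropdown Interaction
- [x] **Remotion cache 404 fix**: On a bundle cache hit, newly generated video/font files were missing from the old cache's `public/` directory -> 404 -> fallback to FFmpeg (no title or subtitles). The files needed for the current render are now hard-linked (`fs.linkSync`) into the cache directory.
- [x] **LatentSync `read_video` skips redundant FPS re-encoding**: Detects the input FPS and skips the `ffmpeg -r 25 -crf 18` re-encode when the input is already 25fps.
- [x] **LatentSync final mux stream copy**: The mux step after `imageio` writes frames at CRF 13 changed from `libx264 -crf 18` to `-c:v copy`, removing a redundant double encode.
- [x] **`prepare_segment` + `normalize_orientation` CRF quality bump**: CRF 23 → 18, matching LatentSync's internal quality standard.
- [x] **Multi-material concat stream copy**: Segment parameters are already unified, so `concat_videos` changed from `libx264 -crf 23` to `-c:v copy`.
- [x] **Total encode count**: Reduced from 5-6 lossy encodes to 3 (prepare_segment → LatentSync/MuseTalk model output → Remotion).
- [x] **LatentSync faceless-frame tolerance**: Inference no longer aborts when some frames have no detectable face; faceless frames keep the original picture, and a single failing material falls back to the original video.
- [x] **MuseTalk direct pipe encoding**: The intermediate lossy `cv2.VideoWriter(mp4v)` file was replaced by an FFmpeg rawvideo stdin pipe, removing one redundant lossy encode.
- [x] **MuseTalk parameters as environment variables**: Inference and encoding parameters (detect_every/blend_cache/CRF/preset, etc.) moved from hard-coded values to `backend/.env`; the current profile favors quality (CRF 14, preset slow, detect_every 2, blend_cache_every 2).
- [x] **Non-blocking async workflow**: Added the `_run_blocking()` thread-pool helper; 5 synchronous FFmpeg calls (rotation normalization/prepare_segment/concat/BGM mixing) now use `await _run_blocking()`, so the event loop is no longer blocked.
- [x] **compose skip**: Without BGM, `final_audio_path == audio_path`, so the unnecessary compose step is skipped; the Remotion path uses the lipsync output directly and the non-Remotion path passes it through with `shutil.copy`.
- [x] **Async compose()**: `compose()` is now `async def`; the internal `_get_duration` and `_run_ffmpeg` go through `run_in_executor`.
- [x] **Skip scale at same resolution**: Multi-material runs compare resolution per segment and pass `None` for matches to take the copy branch; single-material runs do the same. This avoids pointless re-encoding when the input is already at the target resolution.
- [x] **`_get_duration()` in the thread pool**: 3 synchronous ffprobe calls in the workflow now use `await _run_blocking()`.
- [x] **Unified CRF for looping compose**: CRF 23 → 18 in the looping case, consistent with the rest of the pipeline.
- [x] **Multi-material segment validation**: After prepare finishes, the segment count is validated to keep empty segments out of concat.
- [x] **Lip-sync model selection in the front end**: A model dropdown (default/fast/advanced) was added to the right of the generate button, passing `lipsync_model` through to the backend route. Default keeps the threshold strategy, fast forces MuseTalk, advanced forces LatentSync, and all three modes fall back to LatentSync. The choice persists in localStorage.
- [x] **Unified business dropdown component**: Added `SelectPopover` (desktop Popover + mobile BottomSheet), covering the main selectors on the home and publish pages (voice, reference audio, voice-over, material, BGM, work, style, model, aspect ratio).
- [x] **Dropdown experience fixes**: Unified handling of occlusion (Portal + fixed), automatic upward opening, trigger-width matching, opaque background, hidden scrollbar, and scrolling to the selected item on reopen.
- [x] **Preview linkage fix**: Clicking a video preview inside a dropdown no longer forces the menu to close; the preview modal is layered above the dropdown; after closing the preview the user can keep previewing from the menu.
- [x] **BGM interaction**: BGM selection now matches the publish page (search + list + audition); per product requirements the home page volume slider was removed and generation requests use a fixed `bgm_volume=0.2`.
- [x] **Exception rollback**: The "script history / AI multilingual" menus in `ScriptEditor` are back to the original lightweight style (SelectPopover is not forced there).
- [x] **Docs sync**: Day30 / TASK_COMPLETE / FRONTEND_DEV / FRONTEND_README / README / BACKEND_README updated to the final implementation.
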
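
The thread-pool offloading described above can be sketched as follows. The log names `_run_blocking()` but does not show its body, so the use of `asyncio.to_thread` here is an assumption, and `slow_step` is a hypothetical stand-in for a synchronous FFmpeg call.

```python
import asyncio
import time


async def _run_blocking(func, *args, **kwargs):
    """Run a blocking callable in the default thread pool and await its result."""
    return await asyncio.to_thread(func, *args, **kwargs)


def slow_step(name: str, seconds: float) -> str:
    """Hypothetical stand-in for a blocking call such as subprocess.run(["ffmpeg", ...])."""
    time.sleep(seconds)
    return f"{name} done"


async def demo() -> list[str]:
    # Both steps run in worker threads, so the event loop stays free
    # to serve other requests while they sleep.
    return await asyncio.gather(
        _run_blocking(slow_step, "normalize", 0.05),
        _run_blocking(slow_step, "concat", 0.05),
    )
```

Calling `asyncio.run(demo())` runs both steps concurrently; in a FastAPI handler the same helper would simply be awaited.
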
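### Day 29: Video Pipeline Optimization + CosyVoice Tone Control
- [x] **Subtitle sync fix**: Three-step smoothing of Whisper timestamps (monotonic increase + overlap removal + gap filling) plus rhythm mapping onto the original text (linear interpolation + per-character duration clamping).
- [x] **LatentSync lip-shape tuning**: inference_steps 16→20, guidance_scale 2.0, DeepCache enabled, Remotion concurrency 16→4.
- [x] **compose stream copy**: When not looping, `-c:v copy` replaces libx264 re-encoding, cutting compose time from minutes to seconds.
- [x] **FFmpeg timeout protection**: `_run_ffmpeg()` timeout=600, `_get_duration()` timeout=30.
- [x] **Global concurrency limit**: `asyncio.Semaphore(2)` limits the number of generation tasks running at once.
- [x] **Redis task TTL**: create 24h, completed/failed 2h; list automatically cleans up expired index entries.
- [x] **Temporary font cleanup**: Font files are added to the temp_files cleanup list.
- [x] **Preview background CORS fix**: The same-origin material proxy `/api/materials/stream/{id}` avoids cross-origin requests entirely.
- [x] **CosyVoice tone control**: Voice-clone mode gained a tone dropdown (normal/cheerful/low/serious) that controls emotion through natural-language instructions via `inference_instruct2()`; instruct_text is passed through end to end, and the default "normal" behavior is unchanged.
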
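
One plausible reading of the three-step timestamp smoothing above is sketched below. The log only names the steps, so the exact rules (clamping a word's start to the previous end, extending the previous word to fill a gap) and the function name `smooth_timestamps` are assumptions for illustration.

```python
def smooth_timestamps(words: list[dict]) -> list[dict]:
    """Return a copy of `words` whose 'start'/'end' times are monotonic,
    non-overlapping, and gap-free. Each item needs 'start' and 'end' in seconds."""
    out: list[dict] = []
    prev_end = 0.0
    for w in words:
        start, end = float(w["start"]), float(w["end"])
        # Steps 1 and 2: enforce monotonic order and remove overlap
        # by never starting before the previous word ended.
        start = max(start, prev_end)
        end = max(end, start)
        # Step 3: fill a gap by extending the previous word up to this start.
        if out and start > prev_end:
            out[-1]["end"] = start
        out.append({**w, "start": start, "end": end})
        prev_end = end
    return out
```

The input list is left untouched because every output item is a fresh dict.
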
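### Day 28: CosyVoice FP16 Acceleration + Full Documentation Update
- [x] **CosyVoice FP16 half-precision acceleration**: `AutoModel()` now sets `fp16=True`; LLM inference and Flow Matching run with automatic mixed precision, with an estimated 30-40% speedup and ~30% lower VRAM usage.
- [x] **Full documentation update**: README.md / DEPLOY_MANUAL.md / SUBTITLE_DEPLOY.md / BACKEND_README.md now cover the MuseTalk hybrid lip-sync scheme, performance optimizations, and Remotion concurrent rendering.
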
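### Day 27: Remotion Stroke Fix + More Font Styles + Hybrid Lip Sync + Performance Optimization
- [x] **Stroke rendering fix**: Title/secondary title/subtitle strokes changed from a 4-direction `textShadow` simulation to native CSS `-webkit-text-stroke` + `paint-order: stroke fill`, fixing overly thick strokes and ghosting on the secondary title.
- [x] **More font styles**: Title styles 4→12 (+庞门正道/优设标题圆/阿里数黑体/文道潮黑/无界黑/厚底黑/寒蝉半圆体/欣意吉祥宋), subtitle styles 4→8 (+girlish pink/fresh green/golden clerical script/red regular script).
- [x] **Stroke parameter tuning**: All presets lowered `stroke_size` from 8 to 4~5, which looks cleaner with native strokes.
- [x] **TypeScript type fixes**: Root.tsx aligns the `Composition` generic with the `calculateMetadata` parameter type; Video.tsx adds an index signature to `VideoProps` for `Record<string, unknown>` compatibility; VideoLayer.tsx removes the `loop` prop that `OffthreadVideo` does not support.
- [x] **Progress bar text restored**: The progress bar shows the fixed text `正在AI生成中...` (AI generation in progress) again instead of messages pushed by the backend.
- [x] **MuseTalk hybrid lip sync**: Deployed MuseTalk 1.5 as a resident service (GPU0, port 8011) with automatic routing by audio duration (controlled by `LIPSYNC_DURATION_THRESHOLD`; this repository's current `.env` uses 100): short videos go to LatentSync, long videos go to MuseTalk, with automatic fallback when MuseTalk is unavailable.
- [x] **MuseTalk inference performance**: server.py v2 rewrite: read frames directly with cv2 (skipping ffmpeg→PNG), run face detection less often (every 5 frames), cache the BiSeNet mask (every 5 frames), write directly with cv2.VideoWriter (skipping PNG writes), batch_size 8→32; estimated 30min→8-10min (~3x).
- [x] **Remotion concurrent rendering**: render.ts gained a concurrency parameter, raised from the default 8 to 16 (56-core CPU); estimated 5min→2-3min.
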
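
The duration-based routing described in the hybrid lip-sync item can be sketched as below. `LIPSYNC_DURATION_THRESHOLD` comes from the project's `.env`; the function name `pick_lipsync_engine`, the returned strings, and the `musetalk_available` flag are hypothetical names chosen for this sketch.

```python
import os


def pick_lipsync_engine(audio_seconds: float, musetalk_available: bool = True) -> str:
    """Route by audio duration: at or above the threshold use MuseTalk,
    below it use LatentSync; fall back to LatentSync if MuseTalk is down."""
    threshold = float(os.environ.get("LIPSYNC_DURATION_THRESHOLD", "100"))
    if audio_seconds >= threshold and musetalk_available:
        return "musetalk"
    return "latentsync"
```

Reading the threshold on every call keeps the sketch simple; a real service would more likely load it once into its settings object.
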
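### Day 26: Front-End Optimization: Section Merging + Numbered Headings + UI Polish
- [x] **Section merging**: The home page's 9 separate sections were merged into 5 main sections (voice mode + voice-over list → 3. Voice-over; video materials + timeline → 4. Material editing; work history + work preview → 6. Works).
- [x] **Chinese ordinal headings**: Numbered 1 to 10 (home page 1 to 6, publish page 7 to 10); all emoji icons removed.
- [x] **embedded mode**: 6 components support the `embedded` prop and skip the outer card/title when embedded.
- [x] **Two-row voice-over list layout**: In embedded mode, row 1 holds speed + generate voice-over (right-aligned) and row 2 holds the voice-over list + refresh.
- [x] **Child components render their own subheadings**: When embedded, MaterialSelector/TimelineEditor render their own h3 subheading on the same row as the action buttons.
- [x] **Dropdown alignment**: TitleSubtitlePanel labels all use `w-20`, dropdowns use `w-1/3 min-w-[100px]`, vertically aligned.
- [x] **Simpler reference-audio copy**: The bottom paragraph moved next to the heading and was shortened to `(上传3-10秒语音样本)` (upload a 3-10 second voice sample).
- [x] **Account phone number**: AccountSettingsDropdown now shows the phone number.
- [x] **Title display mode applies to the secondary title**: Fixed the payload condition and moved the UI dropdown up to the section heading row.
- [x] **User info available right after login**: AuthContext exposes `setUser`; user data is written immediately after a successful login, fixing the "unknown account" display after login.
- [x] **Copy tweaks**: The material description became "Upload selfie videos, up to 4 can be selected"; display mode options gained a "Title" prefix.
- [x] **UI/UX improvements**: Action buttons visible on mobile (opacity-40), masked phone number, title character counter, drag-handle icon on the timeline, larger trim slider.
- [x] **Code quality fixes**: Password modal clears on success, MaterialSelector useMemo + disabled guard, TimelineEditor useMemo.
- [x] **Responsive publish page layout**: Platform account cards use a single-row layout, compact on mobile (small icons/buttons) and roomier on desktop (consistent with other sections).
- [x] **Scroll to top on mobile refresh**: `scrollRestoration = "manual"` + time-gated list scrolling (`scrollEffectsEnabled` ref, auto-scroll disabled for 1 second) + a delayed fallback `scrollTo(0,0)`.
- [x] **Smaller style preview on mobile**: FloatingStylePreview shrinks to 160px wide on mobile and moves to the bottom-right corner so it does not cover the style controls.
- [x] **List scrollbars hidden consistently**: All lists (BGM/voice-over/works/materials/script extraction) use `hide-scrollbar` again.
- [x] **Mobile voice-over/material adaptation**: VoiceSelector buttons shrink on mobile (`px-2 sm:px-4`), fixing the invisible clone-voice option; the MaterialSelector heading row drops `whitespace-nowrap` and hides the description on mobile, fixing the overflowing refresh button.
- [x] **Larger generate voice-over button**: Upgraded from auxiliary size (`text-xs px-2 py-1`) to primary action size (`text-sm font-medium px-4 py-2`), with an added shadow.
- [x] **Generation progress bar position**: Pulled out of the "6. Works" card into a separate card in the right column, shown above the works card for better visibility.
- [x] **LatentSync timeout fix**: The httpx timeout changed from 1200s (20 minutes) to 3600s (1 hour), fixing timeout fallbacks during lip-sync inference for videos longer than 2 minutes.
- [x] **Subtitle timestamp rhythm mapping**: `whisper_service.py` changed from end-to-end linear interpolation to per-word Whisper rhythm mapping, fixing subtitle drift in long videos.
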
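### Day 25: Script Extraction Fix + Custom Prompts + Opening Secondary Title
- [x] **Douyin script extraction fix**: yt-dlp failed with a "Fresh cookies" error; `_download_douyin_manual` was rewritten to use the mobile share page and fetch ttwid automatically.
- [x] **DOUYIN_COOKIE cleanup**: The new approach no longer needs a manually maintained Cookie, so it was removed everywhere from `.env`/`config.py`/`service.py`.
- [x] **Custom prompt for AI rewriting**: The backend `rewrite_script()` supports a `custom_prompt` parameter; the front end adds a collapsible prompt editor next to the checkbox, persisted in localStorage.
- [x] **SSR build fix**: `localStorage` access during `useState` initialization is guarded with `typeof window`, fixing the `npm run build` error.
- [x] **Opening secondary title**: Added secondary_title (backend/Remotion/front end, end to end), generated by AI together with the title, with its own style configuration and a 20-character limit.
- [x] **Front-end copy fix**: "AI spinning result" → "AI rewrite result".
- [x] **yt-dlp upgrade**: `2025.12.08` → `2026.2.21`.
- [x] **Chinese filename fix for reference audio**: `sanitize_filename()` cleans storage paths down to ASCII-safe characters, falls back to a hash for purely Chinese names, and keeps the original name as the display name.
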
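
The ASCII-safe filename cleanup with a hash fallback can be sketched as follows. This is not the project's `sanitize_filename()`; the allowed character set, the use of SHA-256, and the 12-character digest length are assumptions made for the example.

```python
import hashlib
import re
from pathlib import Path


def sanitize_filename(name: str) -> str:
    """Return an ASCII-only storage name; hash the original when nothing survives."""
    path = Path(name)
    # Collapse every run of disallowed characters in the stem into one underscore.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", path.stem).strip("._")
    ext = re.sub(r"[^A-Za-z0-9.]", "", path.suffix)
    if not stem:
        # Purely non-ASCII names (e.g. all Chinese) fall back to a short hash.
        stem = hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]
    return f"{stem}{ext}"
```

The original name would be stored separately as the display name, so the user still sees what they uploaded.
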
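### Day 24: Auth Expiry Handling + Multi-Material Timeline Stability Fixes
- [x] **Membership expires at request time**: Login and authenticated endpoints all check `expires_at`; on expiry the account is deactivated automatically, the session is cleared, and "Membership expired, please renew" is returned.
- [x] **Aspect ratio control**: The timeline gained a `9:16 / 16:9` output ratio selector, persisted in the front end and passed to the backend; single-material and multi-material runs both process at the target resolution.
- [x] **Title/subtitle overflow prevention**: Remotion and the front-end preview share responsive scaling, automatic line wrapping, and proportional scaling of stroke/letter spacing/margins, reducing differences between preview and final video.
- [x] **Title display mode**: The title row gained a "show briefly / always show" dropdown; the default is brief (4 seconds), and the user's choice is persisted and passed to the Remotion render path.
- [x] **MOV orientation normalization**: Added rotation metadata parsing and orientation normalization, fixing portrait detection errors caused by "landscape encoding + rotation metadata".
- [x] **Multi-material concat stability**: Segment prepare and concat both use 25fps/CFR, and concat adds `+genpts`, easing the "frozen picture while lips keep moving" effect at segment switches.
- [x] **Timeline semantics aligned**: `source_end` is wired through end to end; fixed the duration calculation when `sourceStart>0 and sourceEnd=0`; generation uses the assignments of the visible timeline segments, and segments beyond them are excluded.
- [x] **Interaction details**: Page refresh returns to the top; the first automatic scroll of material/history lists is suppressed, reducing page jumps while state is restored.
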
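
The request-time expiry check from the first item above can be sketched like this. Only the `expires_at` field name comes from the log; the function name and the choice to treat a missing `expires_at` as "never expires" are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_membership_expired(
    expires_at: Optional[datetime], now: Optional[datetime] = None
) -> bool:
    """Return True once `now` has reached `expires_at` (both timezone-aware)."""
    if expires_at is None:
        # Assumption: no expiry recorded means the account is not time-limited.
        return False
    current = now or datetime.now(timezone.utc)
    return current >= expires_at
```

A login or auth dependency would call this on every request and, when it returns True, deactivate the account and clear the session before responding.
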
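### Day 23: Voice-Over First Refactor + Material Timeline Editing + UI Improvements + Voice Cloning Enhancements

#### Phase 1: Voice-Over First
- [x] **Standalone voice-over generation**: Added the `generated_audios` backend module (router/schemas/service) with 5 API endpoints, reusing the existing TTSService / voice_clone_service / task_store.
- [x] **Voice-over management panel**: The front end added the `useGeneratedAudios` hook + `GeneratedAudiosPanel` component, supporting generate/audition/rename/delete/select.
- [x] **UI panel reordering**: Script → title and subtitles → voice mode → voice-over list → material selection → BGM → generate video.
- [x] **Material area gating**: The material area shows an overlay until a voice-over is selected, then shows the voice-over duration + even split across materials.
- [x] **Video generation integration**: workflow.py added a pre-generated audio branch (`generated_audio_id`) that skips inline TTS and stays backward compatible.
- [x] **Persistence**: selectedAudioId was added to useHomePersistence, so the selected voice-over is restored after a refresh.

#### Phase 2: Material Timeline Editing
- [x] **Timeline editor**: Added the `TimelineEditor` component with a wavesurfer.js audio waveform + colored blocks that visualize material assignment; dragging the dividers adjusts each segment's duration.
- [x] **Material trim settings**: Added the `ClipTrimmer` modal with an HTML5 video preview + dual sliders for the source video's trim start/end.
- [x] **Custom assignment in the backend**: Added the `CustomAssignment` model; `prepare_segment` supports `source_start`; the multi-material and single-material workflow pipelines support `custom_assignments`.
- [x] **Loop trimming fix**: `stream_loop + source_start` became a two-step process (trim first, then loop), so looping starts at the trim start instead of at 0s of the video.
- [x] **MaterialSelector simplified**: Removed the old duration info bar and drag-sort area (the functionality moved to TimelineEditor).

#### Phase 3: UI Improvements + TTS Stability
- [x] **TTS SoX PATH fix**: `run_qwen_tts.sh` exports the conda env bin to PATH (Qwen3-TTS is retired and replaced by CosyVoice 3.0).
- [x] **TTS VRAM management**: `torch.cuda.empty_cache()` after each generation, and asyncio.to_thread to avoid blocking the event loop (CosyVoice uses the same mechanism).
- [x] **Unified voice-over list buttons**: Play/Edit/Delete buttons appear as one group on the right on hover, consistent with RefAudioPanel; the script summary was removed.
- [x] **Material area no longer gated by voice-over**: Removed the selectedAudio overlay from MaterialSelector, so materials can be uploaded and managed at any time.
- [x] **Timeline drag sorting**: TimelineEditor blocks support HTML5 Drag & Drop to reorder materials.
- [x] **Trim settings Range Slider**: ClipTrimmer now uses a single track with two handles (purple start + pink end) in place of two separate sliders.
- [x] **Trim settings video preview**: The video area supports play/pause, stops automatically at sourceEnd after starting from sourceStart, and seeks in real time while a handle is dragged.

#### Phase 4: Script History + Bug Fixes
- [x] **Save and load script history**: Added the `useSavedScripts` hook for manually saving/loading/deleting past scripts, with its own localStorage persistence.
- [x] **Timeline drag fix**: `reorderSegments` changed from swapping properties to moving array items (splice), fixing the bug where durations did not follow their materials after a drag.
- [x] **Unified button visuals**: The 4 buttons in the script editing area share a fixed height of `h-7`; redundant `<span>` nesting was removed.
- [x] **Bottom bar adjustment**: The "Save script" button moved to the bottom right and the estimated duration display was removed.

#### Phase 5: Subtitle Language Mismatch + Video Aspect Misalignment Fixes
- [x] **Subtitles use the original text instead of the Whisper transcript**: `align()` gained an `original_text` parameter; subtitle text always uses the original script saved with the voice-over.
- [x] **Dynamic Remotion video size**: `calculateMetadata` reads the real dimensions from props, fixing misaligned title/subtitle proportions.
- [x] **Lost English spaces fix**: `split_word_to_chars` flushes the buffer on a space and sets a pending_space flag.

#### Phase 6: Automatic Reference Audio Transcription + Speed Control
- [x] **Whisper auto-transcribes ref_text**: Uploading reference audio automatically calls Whisper to transcribe the content as ref_text, instead of using fixed text from the front end.
- [x] **Automatic reference audio trimming**: Audio longer than 10 seconds is trimmed automatically at a silence point (ffmpeg silencedetect), with a 0.1-second fade-out at the end to avoid a click at the cut.
- [x] **Re-recognition**: Added the `POST /ref-audios/{id}/retranscribe` endpoint + a RotateCw button in the front end, so older audio can be re-transcribed and trimmed.
- [x] **Speed control**: An end-to-end speed parameter (front-end selector → persistence → backend → CosyVoice `inference_zero_shot(speed=)`) with 5 levels: slower (0.8) / slightly slower (0.9) / normal (1.0) / slightly faster (1.1) / faster (1.2).
- [x] **Missing reference audio gating**: In voice-clone mode, when no reference audio is selected the generate voice-over button is disabled and a yellow warning is shown.
- [x] **Whisper automatic language detection**: The `transcribe()` language parameter is now optional (default None = auto-detect), supporting multilingual reference audio.
- [x] **Front-end cleanup**: Removed the fixed ref_text constant and the read-aloud guide text; the copy is now simply "Upload any voice sample; the system will recognize the content and clone the voice automatically".
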
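### Day 22: Multi-Material Optimization + AI Translation + Multilingual TTS
- [x] **Multi-material bug fixes**: 6 high-priority bugs (boundary overflow, single-segment fallback, division by zero, duration validation, Whisper fallback, empty-list check).
- [x] **Architecture refactor**: Multi-material changed from "LatentSync per segment" to "concatenate first, then infer", reducing inference runs from N to 1.
- [x] **Front-end improvements**: Payload safety, progress messages, auto-select after upload, unified Material interface, drag fix, material limit of 4.
- [x] **AI multilingual translation**: Added the `/api/ai/translate` endpoint; the front end supports translation into 9 languages + restoring the original text.
- [x] **Multilingual TTS**: EdgeTTS voice lists for 10 languages, automatic voice switching on translation, language passed through for voice cloning, textLang persisted.
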
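### Day 21: Bug Fixes + Floating Preview + Publish Refactor + Architecture Improvements + Multi-Material Generation
- [x] **Remotion crash tolerance**: When the render process exits with SIGABRT, the output file is checked first, avoiding a false failure that would drop the title/subtitles.
- [x] **Home page work selection persistence**: Fixed `fetchGeneratedVideos` unconditionally overwriting the restored value; added the `preferVideoId` parameter to control selection.
- [x] **Publish page work selection persistence**: The root cause was unstable signed URLs; selection/persistence/comparison now use `video.id` instead of `path` everywhere.
- [x] **Prefetch cache completed**: The home page includes the `id` field when prefetching publish page data, so cached data can be matched for persistence.
- [x] **Floating style preview window**: The title/subtitle preview became a `position: fixed` floating window pinned to the top-left corner, always visible while scrolling.
- [x] **Mobile adaptation**: ScriptEditor buttons wrap, and the default preview ratio changed to 9:16 portrait.
- [x] **Multi-platform publish refactor**: Independent platform configuration (DOUYIN_*/WEIXIN_*), per-user isolated Cookie management, QR code for Douyin face verification, and an improved WeChat publish flow.
- [x] **Front-end structure tweaks**: ScriptExtractionModal moved to features/, contexts moved to shared/contexts/, empty directories cleaned up.
- [x] **Backend module layering**: The materials/tools/ref_audios modules were completed with router+schemas+service layers.
- [x] **Development rules updated**: BACKEND_DEV.md added the incremental principle; DOC_RULES.md dropped the manual-trigger constraint for TASK_COMPLETE.md.
- [x] **Full documentation update**: BACKEND_DEV/README, FRONTEND_DEV, DEPLOY_MANUAL, and README.md were updated together.
- [x] **Multi-material video generation (multi-camera effect)**: Supports selecting multiple materials + drag sorting, splits the audio duration evenly by material count (aligned to Whisper character boundaries), and switches cameras automatically. LatentSync per segment + FFmpeg concat. The front end uses @dnd-kit for the drag-sort UI.
- [x] **Subtitle toggle removed**: Word-by-word highlighted subtitles are on by default; the toggle and related dead code were removed.
- [x] **More video formats**: Uploads support common formats such as mkv/webm/flv/wmv/m4v/ts/mts.
- [x] **Watchdog tuning**: The health-check threshold was raised to 5 failures and a 120-second restart cooldown was added to avoid false restarts.
- [x] **Multi-material bug fixes**: Fixed the punctuation-based sentence split not working for scripts without sentence-ending punctuation (replaced by an even split) and lip misalignment caused by an audio time offset, among other defects.
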
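### Day 20: Code Quality and Security Improvements
- [x] **Functional fixes**: LatentSync fallback logic, authentication on the task status endpoint, unified User type.
- [x] **Performance**: N+1 query fix, streamed video upload handling, async httpx replacement, async GLM wrapper.
- [x] **Security fixes**: Hard-coded Cookie moved to configuration, sensitive data masked in logs, safe ffprobe invocation, configurable CORS.
- [x] **Configuration**: Storage paths as environment variables, Remotion precompilation for faster startup, absolute paths for LatentSync.
- [x] **Documentation**: Updated the DOC_RULES.md checklist and filled in backend and deployment docs; updated SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md.
- [x] **Bug fixes**: Fixed Remotion path resolution, a publish page persistence race, a home page selection regression, and a material closure trap.
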
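### Day 19: Auto-Publish Stability and Publish Experience Improvements 🚀
- [x] **Douyin publish stability**: Upload entry, cover flow, publish retry, login-expiry detection, and fast return on network failure were all strengthened.
- [x] **WeChat Channels publish fix**: Title + tags are written together into the "video description"; the `post_create` success signal is detected quickly, and a timeout now returns failure.
- [x] **Success screenshots end to end**: Douyin/WeChat Channels publish-success screenshots are shown in the front end, with per-user isolated storage and authenticated access.
- [x] **Better-looking screenshots**: The success screenshot is delayed by 3 seconds and captures the viewport, fixing the issue where the content filled only 1/3 of the screenshot.
- [x] **Debugging behind a switch**: Added WeChat Channels screen recording configuration, toggled by environment variables, making failures easier to diagnose.
- [x] **Unified startup**: Merged into `run_backend.sh` (xvfb + headful) with a single port `8006`, reducing confusion from multiple processes.
- [x] **Publish page misoperation guard**: While publishing, the button shows "Do not refresh or close this page", and refresh/close triggers a confirmation prompt.
- [ ] **Follow-up**: Publish task state recovery (task-based + persisted state + front-end polling recovery).
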
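### Day 18: Backend Modularization and Conventions
- [x] **Modular migration**: Routers delegate to `modules/*`; business logic is concentrated in service/workflow.
- [x] **Video generation split**: The generation flow moved down into workflow; task state is unified in TaskStore.
- [x] **Redis task storage**: Redis first, with automatic fallback to memory when it is unavailable.
- [x] **Repository layer extracted**: Supabase access is unified under `repositories/*`; deps/auth/admin were fully switched over.
- [x] **Response convention**: Unified `success/message/data/code` + global exception handling.
- [x] **Material renaming**: Added a rename endpoint and Storage `move_file`.
- [x] **Platform order**: Douyin/WeChat Channels/Bilibili/Xiaohongshu; Kuaishou removed.
- [x] **Backend development rules**: Added `BACKEND_DEV.md`; the README reflects the modular structure.
- [x] **Publish management experience**: Home page route prefetch + publish page skeleton and cache for faster entry.
- [x] **Material loading**: Signed URLs for the material list are generated concurrently; the skeleton count is dynamic.
- [x] **Preview loading**: `preload="metadata"` + prefetch on hover.
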
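
The unified `success/message/data/code` response shape mentioned above might look like the sketch below. Only the four field names come from the log; the helper names `ok` and `fail` and the convention that `code` is `0` on success are assumptions.

```python
from typing import Any


def ok(data: Any = None, message: str = "ok", code: int = 0) -> dict[str, Any]:
    """Build a success envelope with the four agreed fields."""
    return {"success": True, "message": message, "data": data, "code": code}


def fail(message: str, code: int, data: Any = None) -> dict[str, Any]:
    """Build an error envelope; a global exception handler could return this."""
    return {"success": False, "message": message, "data": data, "code": code}
```

Keeping every endpoint on the same envelope lets the front end check `success` once in a shared fetch wrapper instead of per call.
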
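### Day 17: Front-End Refactor and Experience Improvements
- [x] **UI component split**: The home page was split into separate components, reducing the complexity of `page.tsx`.
- [x] **Lightweight FSD migration**: `app` pages are thin, logic is concentrated in `features/*/model`, and shared capabilities moved down to `shared/*`.
- [x] **Controller Hooks**: Home/Publish page logic lives in Controller Hooks; the Page only composes and renders.
- [x] **Shared utilities extracted**: `media.ts` unifies API Base / URL / date formatting.
- [x] **Interaction improvements**: Persisted selections, in-list positioning, scroll to top on refresh, latest work previewed first.
- [x] **Publish page rework**: Card-style work list + search + preview modal.
- [x] **Preview experience**: Preview modals share a header style and hint text.
- [x] **Preview consistency**: Title/subtitle preview scales to the material's resolution.
- [x] **Title sync and limit**: The opening title syncs with the publish title, is compatible with IME composition, and is limited to 15 characters.
- [x] **Style defaults and persistence**: Adjusted default styles and font sizes; user choices survive a refresh.
- [x] **Minor performance work**: List rendering optimization + parallel requests + debounced localStorage writes.
- [x] **Asset capability**: Font/BGM asset library + `/api/assets` integration.
- [x] **Audio and subtitle fixes**: BGM mixing stability and better subtitle line breaks.
- [x] **Persistence fix**: Integrated `useHomePersistence`, restored the `isRestored` logic, and the build passes.
- [x] **Preview and selection fixes**: Publish preview works with signed URLs, audio audition path resolution, and material/BGM fall back to a valid item.
- [x] **Experience details**: Recording preview URLs are revoked, scroll position is restored after closing the preview modal, and the global task notice is mounted.
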
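### Day 16: Deep Performance Optimization
- [x] **Qwen-TTS acceleration**: Integrated Flash Attention 2 (retired, replaced by CosyVoice 3.0).
- [x] **Service guard**: Built the `Watchdog` mechanism to monitor and automatically restart hung services.
- [x] **LatentSync performance confirmed**: Verified that DeepCache + native Flash Attn are in effect.
- [x] **Documentation refactor**: Fully updated the README, deployment manual, and backend docs.
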
### Day 15: Phone Number Authentication Migration
- [x] **Auth system upgrade**: Migrated from email to 11-digit phone number registration/login.
@@ -28,10 +279,10 @@
|
||||
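### Day 14: AI Enhancements and Experience Improvements
- [x] **AI title/tags**: Integrated the GLM-4 API to generate video metadata automatically.
- [x] **Subtitle upgrade**: Remotion word-by-word highlighted subtitles (karaoke effect) and an animated opening.
- [x] **Model upgrade**: Qwen3-TTS upgraded to the 1.7B-Base version.
- [x] **Model upgrade**: Voice cloning migrated to CosyVoice 3.0 (0.5B).

### Day 13: Voice Cloning Integration
- [x] **Voice cloning microservice**: Wrapped Qwen3-TTS as a standalone API (port 8009).
- [x] **Voice cloning microservice**: Wrapped CosyVoice 3.0 as a standalone API (port 8010, replacing Qwen3-TTS).
- [x] **Reference audio management**: Supabase storage bucket configuration and management endpoints.
- [x] **Multi-mode TTS**: The front end supports switching between EdgeTTS and Clone Voice.
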
@@ -66,8 +317,10 @@
|
||||
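## 🛤️ Roadmap

### 🔴 Priority To-Dos
- [x] ~~**Voice-over first refactor, phase 2**: Material clip trimming + voice timeline editing~~ ✅ Completed on Day 23
- [ ] **Batch generation architecture**: Support Excel import for batch video production.
- [ ] **Move scheduled tasks to the backend**: Migrate front-end-triggered scheduled publishing to the backend APScheduler.
- [ ] **Publish task recovery**: Task-based publishing + persisted state + front-end resume, fixing state loss after a refresh.

### 🔵 Long-Term Exploration
- [ ] **Containerized delivery**: Provide a complete one-click Docker Compose deployment package.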
@@ -82,14 +335,15 @@
|
||||
| **Core API** | 100% | ✅ Stable |
| **Web UI** | 100% | ✅ Stable (mobile adapted) |
| **Lip sync** | 100% | ✅ LatentSync 1.6 |
| **TTS voice-over** | 100% | ✅ EdgeTTS + Qwen3 |
| **Auto publish** | 100% | ✅ Bilibili/Douyin/Xiaohongshu |
| **TTS voice-over** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + voice-over first + timeline editing + auto transcription + speed control + tone control |
| **Auto publish** | 100% | ✅ Douyin/WeChat Channels/Bilibili/Xiaohongshu |
| **User auth** | 100% | ✅ Phone number + JWT |
| **Paid membership** | 100% | ✅ Alipay PC web payment + automatic activation |
| **Deployment and ops** | 100% | ✅ PM2 + Watchdog |

---

## 📎 Related Documents

- [Detailed development logs (DevLogs)](file:///d:/CodingProjects/Antigravity/ViGent2/Docs/DevLogs/)
- [Deployment manual (DEPLOY_MANUAL)](file:///d:/CodingProjects/Antigravity/ViGent2/Docs/DEPLOY_MANUAL.md)
- [Detailed development logs (DevLogs)](Docs/DevLogs/)
- [Deployment manual (DEPLOY_MANUAL)](Docs/DEPLOY_MANUAL.md)

82
README.md
@@ -4,8 +4,8 @@
|
||||
|
||||
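> 📹 **Upload a person** · 🎙️ **Enter a script** · 🎬 **One-click video**

An open-source digital human talking-head video generation system based on **LatentSync 1.6 + EdgeTTS**.
Integrates **Qwen3-TTS** voice cloning and automatic social media publishing.
An open-source digital human talking-head video generation system based on **LatentSync 1.6 + MuseTalk 1.5 hybrid lip sync**.
Integrates **CosyVoice 3.0** voice cloning and automatic social media publishing.

[Features](#-功能特性) • [Tech Stack](#-技术栈) • [Documentation](#-文档中心) • [Deployment Guide](Docs/DEPLOY_MANUAL.md)
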
@@ -15,19 +15,33 @@
|
||||
|
||||
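## ✨ Features

### Core Capabilities
- 🎬 **HD lip sync** - Powered by LatentSync 1.6, a 512×512 high-resolution Latent Diffusion model.
- 🎙️ **Multi-mode voice-over** - Supports **EdgeTTS** (Microsoft natural-sounding voices) and **Qwen3-TTS** (voice cloning from 3 seconds of audio).
- 📝 **Smart subtitles** - Integrates faster-whisper + Remotion to generate word-by-word highlighted (karaoke-style) subtitles automatically.
- 🎨 **Style presets** - Title/subtitle style selection + preview + font-size adjustment, with custom font library support.
- 🎵 **Background music** - Audition + volume control + mixing, keeping the voice-over volume stable.
- 🤖 **AI-assisted creation** - Built-in GLM-4.7-Flash; supports script extraction from Bilibili/Douyin links, AI spinning, and automatic title/tag generation.
### Core Capabilities
- 🎬 **HD lip sync** - Hybrid scheme: short videos (threshold 100s in this repository's current `.env`, configurable) use LatentSync 1.6 (high-quality Latent Diffusion), long videos use MuseTalk 1.5 (real-time-class single-step inference), with automatic routing + fallback. The front end offers a model choice: default (threshold-based routing) / fast (speed first) / advanced (quality first).
- 🎙️ **Multi-mode voice-over** - Supports **EdgeTTS** (Microsoft natural-sounding voices, 10 languages) and **CosyVoice 3.0** (voice cloning from 3 seconds of audio, 9 languages + 18 dialects, adjustable speed/tone). Uploaded reference audio is transcribed by Whisper and trimmed automatically. Voice-over first workflow: generate voice-over → choose materials → generate video.
- 📝 **Smart subtitles** - Integrates faster-whisper + Remotion to generate word-by-word highlighted (karaoke-style) subtitles automatically.
- 🎨 **Style presets** - 12 title + 8 subtitle style presets, with preview + font-size adjustment + custom font library. Native CSS stroke rendering, crisp with no ghosting.
- 🏷️ **Title display mode** - The opening title supports `短暂显示` (show briefly) / `常驻显示` (always show); the default is brief (4 seconds), and the user's preference is persisted automatically.
- 📌 **Opening secondary title** - An optional secondary title below the main title, with its own style configuration; AI can generate it at the same time; 20-character limit.
- 🖼️ **Preview consistency** - The title/subtitle preview and the Remotion output share responsive scaling and automatic line wrapping, staying stable even on narrow canvases.
- 🎞️ **Multi-material, multi-camera** - Supports selecting multiple materials + a timeline editor (wavesurfer.js waveform visualization): drag dividers to adjust duration, drag to reorder cameras, and trim clips by `source_start/source_end`.
- 📐 **Aspect ratio control** - Switch the output ratio between `9:16 / 16:9` on the timeline in one click; the whole generation path processes at the target ratio.
- 💾 **Persistent user preferences** - Home page state is restored/saved in one place and continues from the last configuration after a refresh; a newly generated work is selected first, and later manual selections keep being persisted.
- 🎵 **Background music** - Audition + search and select + mixing (the front end currently uses a fixed mix coefficient to keep the voice-over volume stable).
- 🧩 **Unified selector interaction** - Business selectors on the home/publish pages all use SelectPopover (desktop Popover / mobile BottomSheet), with automatic upward opening, scrolling to the selected item, and continuous preview.
- 🤖 **AI-assisted creation** - Built-in GLM-4.7-Flash; supports script extraction from Bilibili/Douyin links, AI rewriting (with custom prompts), script deep learning (creator topic analysis + script generation), automatic title/tag generation, and translation into 9 languages.

### Platform Features
- 📱 **Fully automatic publishing** - Supports scheduled publishing to Bilibili, Douyin, and Xiaohongshu, with QR login + Cookie persistence.
- 🔐 **Enterprise-grade auth** - A complete user isolation system (Supabase), supporting phone number registration/login and password management.
- 🛡️ **Service guard** - Built-in Watchdog that monitors and automatically restarts hung services for stable 24/7 operation.
- 🚀 **High performance** - Video pre-compression, resident model services (0s load), dual-GPU pipeline concurrency.
- 📱 **Fully automatic publishing** - Supports immediate publishing to Douyin/WeChat Channels/Bilibili/Xiaohongshu; QR login + Cookie persistence.
- 🖥️ **Publish management preview** - Supports previewing works via signed URLs / relative paths, so they can be played directly.
- 📸 **Visible publish results** - Douyin/WeChat Channels/Xiaohongshu return a screenshot after a successful publish, viewable directly in the publish page result card.
- 🧹 **Post-publish workspace cleanup** - After all platforms publish successfully, a cleanup modal that cannot be dismissed by accident appears (failures can be retried; after a threshold the user may skip for now); only input content is cleared and user preferences are kept.
- ⬇️ **Direct one-click download** - Downloads from the home page and the publish-success modal both use the same-origin `attachment` endpoint and no longer open the video in a new tab.
- 🛡️ **Publish misoperation guard** - While publishing, the page shows "Do not refresh or close this page" and intercepts refresh/close with a confirmation prompt.
- 💳 **Paid membership** - Alipay PC web payment activates membership automatically; accounts are deactivated on expiry with a renewal prompt; manual activation by an administrator is also available.
- 🔐 **Auth and isolation** - Supabase-based user isolation, supporting phone number registration/login and password management.
- 🛡️ **Security baseline** - Mandatory login on AI/Tools endpoints, size limits on key upload paths, and startup blocked in production when the default secret is used.
- 🛡️ **Service guard** - Built-in Watchdog that monitors and automatically restarts hung services for stable 24/7 operation.
- 🚀 **Performance** - Encoding pipeline trimmed from 5-6 lossy encodes to 3 (prepare_segment → model output → Remotion), compose stream copy without re-encoding, scale skipped at the same resolution, FFmpeg timeout protection, a global limit on concurrent video generation (Semaphore(2)), Remotion rendering with concurrency 4, MuseTalk direct rawvideo pipe encoding (no intermediate lossy file), resident model services, dual-GPU pipeline concurrency, automatic Redis task TTL cleanup, and thread-pooled blocking calls in the workflow.

---
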
@@ -35,11 +49,11 @@
|
||||
|
||||
| Area | Core Technology | Notes |
|------|----------|------|
| **Front end** | Next.js 14 | TypeScript, TailwindCSS, SWR |
| **Backend** | FastAPI | Python 3.10, AsyncIO, PM2 |
| **Front end** | Next.js 16 | TypeScript, TailwindCSS, SWR, wavesurfer.js |
| **Backend** | FastAPI | Python 3.12, AsyncIO, PM2 |
| **Database** | Supabase | PostgreSQL, Storage (local/S3), Auth |
| **Lip sync** | LatentSync 1.6 | PyTorch 2.5, Diffusers, DeepCache |
| **Voice cloning** | Qwen3-TTS | 1.7B parameters, accelerated with Flash Attention 2 |
| **Lip sync** | LatentSync 1.6 + MuseTalk 1.5 | Hybrid routing: high-quality Diffusion for short videos, single-step real-time inference for long videos |
| **Voice cloning** | CosyVoice 3.0 | 0.5B parameters, 9 languages + 18 dialects |
| **Automation** | Playwright | Headless browser automation for social media |
| **Deployment** | Docker & PM2 | Hybrid deployment architecture |

@@ -50,14 +64,20 @@
|
||||
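We provide detailed development and deployment documentation:

### Deployment and Operations
- **[Deployment manual (DEPLOY_MANUAL.md)](Docs/DEPLOY_MANUAL.md)** - 👈 **Start here for deployment**! Contains the complete environment setup steps.
- [Reference audio service deployment (QWEN3_TTS_DEPLOY.md)](Docs/QWEN3_TTS_DEPLOY.md) - Voice cloning model deployment guide.
- [LatentSync deployment guide](models/LatentSync/DEPLOY.md) - Standalone deployment of the lip-sync model.
- [User auth deployment (AUTH_DEPLOY.md)](Docs/AUTH_DEPLOY.md) - Supabase and Auth system configuration.
- **[Deployment manual (DEPLOY_MANUAL.md)](Docs/DEPLOY_MANUAL.md)** - 👈 **Start here for deployment**! Contains the complete environment setup steps.
- [Multi-platform publish deployment (PUBLISH_DEPLOY.md)](Docs/PUBLISH_DEPLOY.md) - Dedicated document for Douyin/WeChat Channels/Bilibili/Xiaohongshu login and automated publishing.
- [Reference audio service deployment (COSYVOICE3_DEPLOY.md)](Docs/COSYVOICE3_DEPLOY.md) - Voice cloning model deployment guide.
- [LatentSync deployment guide (LATENTSYNC_DEPLOY.md)](Docs/LATENTSYNC_DEPLOY.md) - Standalone deployment of the lip-sync model.
- [MuseTalk deployment guide (MUSETALK_DEPLOY.md)](Docs/MUSETALK_DEPLOY.md) - Deployment of the long-video lip-sync model.
- [Supabase deployment guide (SUPABASE_DEPLOY.md)](Docs/SUPABASE_DEPLOY.md) - Supabase and auth system configuration.
- [Alipay deployment guide (ALIPAY_DEPLOY.md)](Docs/ALIPAY_DEPLOY.md) - Configuration for paid membership via Alipay.

### Development Docs
- [Backend development guide](Docs/BACKEND_README.md) - API conventions and development workflow.
- [Front-end development guide](Docs/FRONTEND_DEV.md) - UI component and page conventions.
- [Backend development guide (BACKEND_README.md)](Docs/BACKEND_README.md) - API conventions and development workflow.
- [Backend development rules (BACKEND_DEV.md)](Docs/BACKEND_DEV.md) - Layering conventions and development practices.
- [Front-end development guide (FRONTEND_DEV.md)](Docs/FRONTEND_DEV.md) - UI component and page conventions.
- [Front-end component docs (FRONTEND_README.md)](Docs/FRONTEND_README.md) - Component structure and section descriptions.
- [Remotion subtitle deployment (SUBTITLE_DEPLOY.md)](Docs/SUBTITLE_DEPLOY.md) - Subtitle rendering service deployment.
- [Development logs (DevLogs)](Docs/DevLogs/) - Daily development progress and technical decisions.

---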
@@ -68,12 +88,15 @@
|
||||
ViGent2/
├── backend/ # FastAPI backend service
│ ├── app/ # Core business logic
│ ├── scripts/ # Ops scripts (Watchdog, etc.)
│ └── tests/ # Test cases
│ ├── assets/ # Fonts / styles / BGM
│ ├── user_data/ # Per-user isolated data (Cookies, etc.)
│ └── scripts/ # Ops scripts (Watchdog, etc.)
├── frontend/ # Next.js front-end app
├── remotion/ # Remotion video rendering (title/subtitle compositing)
├── models/ # AI model repository
│ ├── LatentSync/ # Lip-sync service
│ └── Qwen3-TTS/ # Voice cloning service
│ ├── LatentSync/ # Lip-sync service (GPU1, short videos)
│ ├── MuseTalk/ # Lip-sync service (GPU0, long videos)
│ └── CosyVoice/ # Voice cloning service
└── Docs/ # Project documentation
```

@@ -87,8 +110,9 @@ ViGent2/
|
||||
|----------|------|------|
| **Web UI** | 3002 | User entry point (Next.js) |
| **Backend API** | 8006 | Core business API (FastAPI) |
| **LatentSync** | 8007 | Lip-sync inference service |
| **Qwen3-TTS** | 8009 | Voice cloning inference service |
| **LatentSync** | 8007 | Lip-sync inference service (GPU1, short videos) |
| **MuseTalk** | 8011 | Lip-sync inference service (GPU0, long videos) |
| **CosyVoice 3.0** | 8010 | Voice cloning inference service |
| **Supabase** | 8008 | Database and auth gateway |

---
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
# Copy this file to .env and fill in real values

# Debug mode
DEBUG=true
DEBUG=false

# Redis configuration (Celery task queue)
REDIS_URL=redis://localhost:6379/0
@@ -15,7 +15,6 @@ DEFAULT_TTS_VOICE=zh-CN-YunxiNeural
|
||||
# GPU selection (0 = first GPU, 1 = second GPU)
LATENTSYNC_GPU_ID=1

# Use local mode (true) or remote API (false)
# Use local mode (true) or remote API (false)
LATENTSYNC_LOCAL=true

@@ -26,10 +25,10 @@ LATENTSYNC_USE_SERVER=true
|
||||
# LATENTSYNC_API_URL=http://localhost:8007

# Inference steps (20-50; higher means better quality but slower)
LATENTSYNC_INFERENCE_STEPS=40
LATENTSYNC_INFERENCE_STEPS=30

# Guidance scale (1.0-3.0; higher means more accurate lip sync but possible jitter)
LATENTSYNC_GUIDANCE_SCALE=2.0
LATENTSYNC_GUIDANCE_SCALE=1.9

# Enable DeepCache acceleration (recommended)
LATENTSYNC_ENABLE_DEEPCACHE=true
@@ -37,6 +36,53 @@ LATENTSYNC_ENABLE_DEEPCACHE=true
|
||||
# Random seed (set to -1 for random)
LATENTSYNC_SEED=1247

# =============== MuseTalk configuration ===============
# GPU selection (default GPU0, shared with CosyVoice)
MUSETALK_GPU_ID=0

# Resident service address (port 8011)
MUSETALK_API_URL=http://localhost:8011

# Inference batch size
MUSETALK_BATCH_SIZE=32

# Model version
MUSETALK_VERSION=v15

# Half-precision acceleration
MUSETALK_USE_FLOAT16=true

# Face detection interval (frames; smaller is more stable but slower)
MUSETALK_DETECT_EVERY=2

# BiSeNet mask cache refresh interval (frames; smaller is more stable but slower)
MUSETALK_BLEND_CACHE_EVERY=2

# Whisper temporal context (larger is smoother, but lip response gets duller)
MUSETALK_AUDIO_PADDING_LEFT=2
MUSETALK_AUDIO_PADDING_RIGHT=2

# v1.5 chin region extension in pixels (larger shows more of the lower lip/teeth but edges are less stable)
MUSETALK_EXTRA_MARGIN=14

# Audio-to-lip alignment offset (frames; positive = lips later, negative = lips earlier)
MUSETALK_DELAY_FRAME=0

# Blend mode: auto (chosen by version) / jaw / raw
MUSETALK_BLEND_MODE=jaw

# FaceParsing cheek width (v1.5 only; affects the blend mask area)
MUSETALK_FACEPARSING_LEFT_CHEEK_WIDTH=90
MUSETALK_FACEPARSING_RIGHT_CHEEK_WIDTH=90

# Final encoding quality (lower CRF is sharper but larger)
MUSETALK_ENCODE_CRF=14
MUSETALK_ENCODE_PRESET=slow

# =============== Hybrid lip-sync routing ===============
# Audio duration >= this threshold (seconds) uses MuseTalk; < this threshold uses LatentSync
LIPSYNC_DURATION_THRESHOLD=100

# =============== Upload configuration ===============
# Maximum upload file size (MB)
MAX_UPLOAD_SIZE_MB=500
@@ -66,3 +112,14 @@ ADMIN_PASSWORD=lam1988324
|
||||
# Zhipu GLM API configuration (used to generate titles and tags)
GLM_API_KEY=32440cd3f3444d1f8fe721304acea8bd.YXNLrk7eIJMKcg4t
|
||||
GLM_MODEL=glm-4.7-flash
|
||||
|
||||
# =============== Supabase Storage local path ===============
# Make sure the storage volume mapping is correct and avoid hard-coded paths
SUPABASE_STORAGE_LOCAL_PATH=/home/rongye/ProgramFiles/Supabase/volumes/storage/stub/stub

# =============== Alipay configuration ===============
ALIPAY_APP_ID=2021006132600283
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
ALIPAY_NOTIFY_URL=https://ipagent.ai-labz.cn/api/payment/notify
ALIPAY_RETURN_URL=https://ipagent.ai-labz.cn/pay

@@ -1,45 +0,0 @@
|
||||
"""
AI-related API routes
"""

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from loguru import logger

from app.services.glm_service import glm_service


router = APIRouter(prefix="/api/ai", tags=["AI"])


class GenerateMetaRequest(BaseModel):
    """Request body for generating a title and tags"""
    text: str


class GenerateMetaResponse(BaseModel):
    """Response body for generating a title and tags"""
    title: str
    tags: list[str]


@router.post("/generate-meta", response_model=GenerateMetaResponse)
async def generate_meta(req: GenerateMetaRequest):
    """
    Generate a video title and tags with AI

    Produces an engaging title and relevant tags from the spoken script
    """
    if not req.text or not req.text.strip():
        # The detail message means "the spoken script must not be empty"
        raise HTTPException(status_code=400, detail="口播文案不能为空")

    try:
        logger.info(f"Generating meta for text: {req.text[:50]}...")
        result = await glm_service.generate_title_tags(req.text)
        return GenerateMetaResponse(
            title=result.get("title", ""),
            tags=result.get("tags", [])
        )
    except Exception as e:
        logger.error(f"Generate meta failed: {e}")
        raise HTTPException(status_code=500, detail=str(e))
@@ -1,338 +0,0 @@
|
||||
from fastapi import APIRouter, UploadFile, File, HTTPException, Request, BackgroundTasks, Depends
|
||||
from app.core.config import settings
|
||||
from app.core.deps import get_current_user
|
||||
from app.services.storage import storage_service
|
||||
import re
|
||||
import time
|
||||
import traceback
|
||||
import os
|
||||
import aiofiles
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
from pydantic import BaseModel
|
||||
from typing import Optional
|
||||
import httpx
|
||||
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
def sanitize_filename(filename: str) -> str:
|
||||
safe_name = re.sub(r'[<>:"/\\|?*]', '_', filename)
|
||||
if len(safe_name) > 100:
|
||||
ext = Path(safe_name).suffix
|
||||
safe_name = safe_name[:100 - len(ext)] + ext
|
||||
return safe_name
|
||||
|
||||
async def process_and_upload(temp_file_path: str, original_filename: str, content_type: str, user_id: str):
|
||||
"""Background task to strip multipart headers and upload to Supabase"""
|
||||
try:
|
||||
logger.info(f"Processing raw upload: {temp_file_path} for user {user_id}")
|
||||
|
||||
# 1. Analyze file to find actual video content (strip multipart boundaries)
|
||||
# This is a simplified manual parser for a SINGLE file upload.
|
||||
# Structure:
|
||||
# --boundary
|
||||
# Content-Disposition: form-data; name="file"; filename="..."
|
||||
# Content-Type: video/mp4
|
||||
# \r\n\r\n
|
||||
# [DATA]
|
||||
# \r\n--boundary--
|
||||
|
||||
# We need to read the first few KB to find the header end
|
||||
start_offset = 0
|
||||
end_offset = 0
|
||||
boundary = b""
|
||||
|
||||
file_size = os.path.getsize(temp_file_path)
|
||||
|
||||
with open(temp_file_path, 'rb') as f:
|
||||
# Read first 4KB to find header
|
||||
head = f.read(4096)
|
||||
|
||||
# Find boundary
|
||||
first_line_end = head.find(b'\r\n')
|
||||
if first_line_end == -1:
|
||||
raise Exception("Could not find boundary in multipart body")
|
||||
|
||||
boundary = head[:first_line_end] # e.g. --boundary123
|
||||
logger.info(f"Detected boundary: {boundary}")
|
||||
|
||||
# Find end of headers (\r\n\r\n)
|
||||
header_end = head.find(b'\r\n\r\n')
|
||||
if header_end == -1:
|
||||
raise Exception("Could not find end of multipart headers")
|
||||
|
||||
start_offset = header_end + 4
|
||||
logger.info(f"Video data starts at offset: {start_offset}")
|
||||
|
||||
# Find end boundary (read from end of file)
|
||||
# It should be \r\n + boundary + -- + \r\n
|
||||
# We seek to end-200 bytes
|
||||
f.seek(max(0, file_size - 200))
|
||||
tail = f.read()
|
||||
|
||||
# The closing boundary is usually --boundary--
|
||||
# We look for the last occurrence of the boundary
|
||||
last_boundary_pos = tail.rfind(boundary)
|
||||
if last_boundary_pos != -1:
|
||||
# The data ends before \r\n + boundary
|
||||
# The tail buffer relative position needs to be converted to absolute
|
||||
end_pos_in_tail = last_boundary_pos
|
||||
# We also need to check for the preceding \r\n
|
||||
if end_pos_in_tail >= 2 and tail[end_pos_in_tail-2:end_pos_in_tail] == b'\r\n':
|
||||
end_pos_in_tail -= 2
|
||||
|
||||
# Absolute end offset: the tail's start position plus the CRLF-adjusted boundary position
end_offset = max(0, file_size - 200) + end_pos_in_tail
|
||||
else:
|
||||
logger.warning("Could not find closing boundary, assuming EOF")
|
||||
end_offset = file_size
|
||||
|
||||
logger.info(f"Video data ends at offset: {end_offset}. Total video size: {end_offset - start_offset}")
|
||||
|
||||
# 2. Extract the video slice into a new temp file.
# Copying in chunks keeps memory use flat, unlike reading the whole slice into memory.
|
||||
|
||||
video_path = temp_file_path + "_video.mp4"
|
||||
with open(temp_file_path, 'rb') as src, open(video_path, 'wb') as dst:
|
||||
src.seek(start_offset)
|
||||
# Copy in chunks
|
||||
bytes_to_copy = end_offset - start_offset
|
||||
copied = 0
|
||||
while copied < bytes_to_copy:
|
||||
chunk_size = min(1024*1024*10, bytes_to_copy - copied) # 10MB chunks
|
||||
chunk = src.read(chunk_size)
|
||||
if not chunk:
|
||||
break
|
||||
dst.write(chunk)
|
||||
copied += len(chunk)
|
||||
|
||||
logger.info(f"Extracted video content to {video_path}")
|
||||
|
||||
# 3. Upload to Supabase with user isolation
|
||||
timestamp = int(time.time())
|
||||
safe_name = re.sub(r'[^a-zA-Z0-9._-]', '', original_filename)
|
||||
# Use user_id as the directory prefix to isolate each user's files
|
||||
storage_path = f"{user_id}/{timestamp}_{safe_name}"
|
||||
|
||||
# Use storage service (this calls Supabase which might do its own http request)
|
||||
# We read the cleaned video file
|
||||
with open(video_path, 'rb') as f:
|
||||
file_content = f.read() # Still reading into memory for simple upload call, but server has 32GB RAM so ok for 500MB
|
||||
await storage_service.upload_file(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=storage_path,
|
||||
file_data=file_content,
|
||||
content_type=content_type
|
||||
)
|
||||
|
||||
logger.info(f"Upload to Supabase complete: {storage_path}")
|
||||
|
||||
# Cleanup
|
||||
os.remove(temp_file_path)
|
||||
os.remove(video_path)
|
||||
|
||||
return storage_path
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Background upload processing failed: {e}\n{traceback.format_exc()}")
|
||||
raise
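The boundary handling in `process_and_upload` can be sketched on an in-memory body. This is a minimal illustration, assuming a body with exactly one part; `slice_single_part` is a hypothetical helper name, not part of the original module, and it works on `bytes` rather than on a temp file.

```python
def slice_single_part(body: bytes) -> bytes:
    """Return the payload of a multipart body that holds exactly one part."""
    first_line_end = body.find(b"\r\n")
    if first_line_end == -1:
        raise ValueError("no boundary line found")
    boundary = body[:first_line_end]  # e.g. b"--boundary123"

    header_end = body.find(b"\r\n\r\n")
    if header_end == -1:
        raise ValueError("no end of part headers found")
    start = header_end + 4  # skip the blank line after the part headers

    end = body.rfind(boundary)
    if end <= first_line_end:
        end = len(body)  # only the opening boundary exists: assume EOF
    elif body[end - 2:end] == b"\r\n":
        end -= 2  # drop the CRLF that precedes the closing boundary
    return body[start:end]


raw = (
    b"--boundary123\r\n"
    b'Content-Disposition: form-data; name="file"; filename="clip.mp4"\r\n'
    b"Content-Type: video/mp4\r\n"
    b"\r\n"
    b"FAKE-VIDEO-BYTES"
    b"\r\n--boundary123--\r\n"
)
payload = slice_single_part(raw)
```

The CRLF before the closing boundary is removed only when it is actually present, so a body without it does not lose two payload bytes.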
|
||||
|
||||
|
||||
@router.post("")
|
||||
async def upload_material(
|
||||
request: Request,
|
||||
background_tasks: BackgroundTasks,
|
||||
current_user: dict = Depends(get_current_user)
|
||||
):
|
||||
user_id = current_user["id"]
|
||||
logger.info(f"ENTERED upload_material (Streaming Mode) for user {user_id}") # request headers are not logged because they may carry credentials
|
||||
|
||||
filename = "unknown_video.mp4" # Fallback
|
||||
content_type = "video/mp4"
|
||||
|
||||
# Try to parse filename from header if possible (unreliable in raw stream)
|
||||
# We will rely on post-processing or client hint
|
||||
# Frontend sends standard multipart.
|
||||
|
||||
# Create temp file
|
||||
timestamp = int(time.time())
|
||||
temp_filename = f"upload_{time.time_ns()}.raw" # nanosecond resolution makes collisions between concurrent uploads unlikely
|
||||
temp_path = os.path.join("/tmp", temp_filename) # Use /tmp on Linux
|
||||
# Ensure /tmp exists (it does) but verify paths
|
||||
if os.name == 'nt': # Local dev
|
||||
temp_path = f"d:/tmp/{temp_filename}"
|
||||
os.makedirs("d:/tmp", exist_ok=True)
|
||||
|
||||
try:
|
||||
total_size = 0
|
||||
last_log = 0
|
||||
|
||||
async with aiofiles.open(temp_path, 'wb') as f:
|
||||
async for chunk in request.stream():
|
||||
await f.write(chunk)
|
||||
total_size += len(chunk)
|
||||
|
||||
# Log progress every 20MB
|
||||
if total_size - last_log > 20 * 1024 * 1024:
|
||||
logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
|
||||
last_log = total_size
|
||||
|
||||
logger.info(f"Stream reception complete. Total size: {total_size} bytes. Saved to {temp_path}")
|
||||
|
||||
if total_size == 0:
|
||||
raise HTTPException(400, "Received empty body")
|
||||
|
||||
# The filename is read from the multipart header below. Processing is awaited
# (rather than run as a background task) so the file is already in storage
# when the client next lists materials.
|
||||
|
||||
# Quick extract filename from first 4kb
|
||||
with open(temp_path, 'rb') as f:
|
||||
head = f.read(4096).decode('utf-8', errors='ignore')
|
||||
match = re.search(r'filename="([^"]+)"', head)
|
||||
if match:
|
||||
filename = match.group(1)
|
||||
logger.info(f"Extracted filename from body: {filename}")
|
||||
|
||||
# Run processing sync (in await)
|
||||
storage_path = await process_and_upload(temp_path, filename, content_type, user_id)
|
||||
|
||||
# Get signed URL (it exists now)
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=storage_path
|
||||
)
|
||||
|
||||
size_mb = total_size / (1024 * 1024) # Approximate (includes headers)
|
||||
|
||||
# Derive the display name from storage_path
|
||||
display_name = storage_path.split('/')[-1] # strip the user_id prefix
|
||||
if '_' in display_name:
|
||||
parts = display_name.split('_', 1)
|
||||
if parts[0].isdigit():
|
||||
display_name = parts[1]
|
||||
|
||||
return {
|
||||
"id": storage_path,
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
"size_mb": size_mb,
|
||||
"type": "video"
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
error_msg = f"Streaming upload failed: {str(e)}"
|
||||
detail_msg = f"Exception: {repr(e)}\nArgs: {e.args}\n{traceback.format_exc()}"
|
||||
logger.error(error_msg + "\n" + detail_msg)
|
||||
|
||||
# Write to debug file
|
||||
try:
|
||||
with open("debug_upload.log", "a") as logf:
|
||||
logf.write(f"\n--- Error at {time.ctime()} ---\n")
|
||||
logf.write(detail_msg)
|
||||
logf.write("\n-----------------------------\n")
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
if os.path.exists(temp_path):
|
||||
try:
|
||||
os.remove(temp_path)
|
||||
except OSError:
|
||||
pass
|
||||
raise HTTPException(500, f"Upload failed. Check server logs. Error: {str(e)}")
|
||||
|
||||
|
||||
@router.get("")
|
||||
async def list_materials(current_user: dict = Depends(get_current_user)):
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
# List only files under the current user's directory
|
||||
files_obj = await storage_service.list_files(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=user_id
|
||||
)
|
||||
materials = []
|
||||
for f in files_obj:
|
||||
name = f.get('name')
|
||||
if not name or name == '.emptyFolderPlaceholder':
|
||||
continue
|
||||
display_name = name
|
||||
if '_' in name:
|
||||
parts = name.split('_', 1)
|
||||
if parts[0].isdigit():
|
||||
display_name = parts[1]
|
||||
# The full path includes user_id
|
||||
full_path = f"{user_id}/{name}"
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=full_path
|
||||
)
|
||||
metadata = f.get('metadata', {})
|
||||
size = metadata.get('size', 0)
|
||||
# created_at is a top-level field holding an ISO 8601 string
|
||||
created_at_str = f.get('created_at', '')
|
||||
created_at = 0
|
||||
if created_at_str:
|
||||
from datetime import datetime
|
||||
try:
|
||||
dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
|
||||
created_at = int(dt.timestamp())
|
||||
except ValueError:
|
||||
pass
|
||||
materials.append({
|
||||
"id": full_path, # the ID is the full storage path
|
||||
"name": display_name,
|
||||
"path": signed_url,
|
||||
"size_mb": size / (1024 * 1024),
|
||||
"type": "video",
|
||||
"created_at": created_at
|
||||
})
|
||||
materials.sort(key=lambda x: x['id'], reverse=True)
|
||||
return {"materials": materials}
|
||||
except Exception as e:
|
||||
logger.error(f"List materials failed: {e}")
|
||||
return {"materials": []}
|
||||
|
||||
|
||||
@router.delete("/{material_id:path}")
|
||||
async def delete_material(material_id: str, current_user: dict = Depends(get_current_user)):
|
||||
user_id = current_user["id"]
|
||||
# Verify that material_id belongs to the current user
|
||||
if not material_id.startswith(f"{user_id}/"):
|
||||
raise HTTPException(403, "无权删除此素材")
|
||||
try:
|
||||
await storage_service.delete_file(
|
||||
bucket=storage_service.BUCKET_MATERIALS,
|
||||
path=material_id
|
||||
)
|
||||
return {"success": True, "message": "素材已删除"}
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"删除失败: {str(e)}")
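Both `upload_material` and `list_materials` derive a display name by dropping the user directory and a numeric timestamp prefix. A minimal sketch of that rule as one shared function follows; `display_name_from_storage_path` is a hypothetical name introduced here for illustration only.

```python
def display_name_from_storage_path(storage_path: str) -> str:
    """Strip the user_id directory and a leading '<digits>_' timestamp prefix."""
    name = storage_path.split("/")[-1]
    prefix, sep, rest = name.partition("_")
    # Only a purely numeric prefix counts as a timestamp.
    if sep and prefix.isdigit():
        return rest
    return name


with_prefix = display_name_from_storage_path("u1/1700000000_my_clip.mp4")
without_prefix = display_name_from_storage_path("u1/my_clip.mp4")
```

Splitting on the first underscore only means underscores inside the original filename survive.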
|
||||
|
||||
|
||||
|
||||
@@ -1,411 +0,0 @@
|
||||
"""
|
||||
Reference audio management API
|
||||
Supports uploading, listing, and deleting reference audio used for Qwen3-TTS voice cloning.
|
||||
"""
|
||||
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
|
||||
from pydantic import BaseModel
|
||||
from typing import List, Optional
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
import time
|
||||
import json
|
||||
import subprocess
|
||||
import tempfile
|
||||
import os
|
||||
import re
|
||||
|
||||
from app.core.deps import get_current_user
|
||||
from app.services.storage import storage_service
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
# Supported audio formats
|
||||
ALLOWED_AUDIO_EXTENSIONS = {'.wav', '.mp3', '.m4a', '.webm', '.ogg', '.flac', '.aac'}
|
||||
|
||||
# Storage bucket for reference audio
|
||||
BUCKET_REF_AUDIOS = "ref-audios"
|
||||
|
||||
|
||||
class RefAudioResponse(BaseModel):
|
||||
id: str
|
||||
name: str
|
||||
path: str # signed URL for playback
|
||||
ref_text: str
|
||||
duration_sec: float
|
||||
created_at: int
|
||||
|
||||
|
||||
class RefAudioListResponse(BaseModel):
|
||||
items: List[RefAudioResponse]
|
||||
|
||||
|
||||
def sanitize_filename(filename: str) -> str:
|
||||
"""Sanitize a filename by replacing special characters and whitespace with underscores."""
|
||||
safe_name = re.sub(r'[<>:"/\\|?*\s]', '_', filename)
|
||||
if len(safe_name) > 50:
|
||||
ext = Path(safe_name).suffix
|
||||
safe_name = safe_name[:50 - len(ext)] + ext
|
||||
return safe_name
|
||||
|
||||
|
||||
def get_audio_duration(file_path: str) -> float:
|
||||
"""Return the audio duration in seconds (0.0 if ffprobe fails)."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
|
||||
'-of', 'csv=p=0', file_path],
|
||||
capture_output=True, text=True, timeout=10
|
||||
)
|
||||
return float(result.stdout.strip())
|
||||
except Exception as e:
|
||||
logger.warning(f"获取音频时长失败: {e}")
|
||||
return 0.0
|
||||
|
||||
|
||||
def convert_to_wav(input_path: str, output_path: str) -> bool:
|
||||
"""Convert audio to WAV (16 kHz, mono, 16-bit PCM)."""
|
||||
try:
|
||||
subprocess.run([
|
||||
'ffmpeg', '-y', '-i', input_path,
|
||||
'-ar', '16000', # 16 kHz sample rate
|
||||
'-ac', '1', # mono
|
||||
'-acodec', 'pcm_s16le', # 16-bit PCM
|
||||
output_path
|
||||
], capture_output=True, timeout=60, check=True)
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.error(f"音频转换失败: {e}")
|
||||
return False
|
||||
|
||||
|
||||
@router.post("", response_model=RefAudioResponse)
|
||||
async def upload_ref_audio(
|
||||
file: UploadFile = File(...),
|
||||
ref_text: str = Form(...),
|
||||
user: dict = Depends(get_current_user)
|
||||
):
|
||||
"""
|
||||
Upload a reference audio clip.

- file: audio file (wav, mp3, m4a, webm, and other supported formats)
- ref_text: transcript of the reference audio (required)
|
||||
"""
|
||||
user_id = user["id"]
|
||||
|
||||
# Validate the file extension (file.filename can be None for some clients)
ext = Path(file.filename or "").suffix.lower()
|
||||
if ext not in ALLOWED_AUDIO_EXTENSIONS:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"不支持的音频格式: {ext}。支持的格式: {', '.join(sorted(ALLOWED_AUDIO_EXTENSIONS))}"
|
||||
)
|
||||
|
||||
# Validate ref_text
|
||||
if not ref_text or len(ref_text.strip()) < 2:
|
||||
raise HTTPException(status_code=400, detail="参考文字不能为空")
|
||||
|
||||
try:
|
||||
# Write the upload to a temporary file
|
||||
with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp_input:
|
||||
content = await file.read()
|
||||
tmp_input.write(content)
|
||||
tmp_input_path = tmp_input.name
|
||||
|
||||
# Always normalize to 16 kHz mono WAV, even when the input is already a .wav file
tmp_wav_path = tmp_input_path + ".wav"
if not convert_to_wav(tmp_input_path, tmp_wav_path):
raise HTTPException(status_code=500, detail="音频格式转换失败")
|
||||
|
||||
# Check the audio duration
|
||||
duration = get_audio_duration(tmp_wav_path)
|
||||
if duration < 1.0:
|
||||
raise HTTPException(status_code=400, detail="音频时长过短,至少需要 1 秒")
|
||||
if duration > 60.0:
|
||||
raise HTTPException(status_code=400, detail="音频时长过长,最多 60 秒")
|
||||
|
||||
|
||||
# Handle duplicate names (friendly display name)
|
||||
original_name = file.filename
|
||||
|
||||
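# List the user's existing reference audio files to detect name collisions.
# Note: listing scales poorly with many files, but a single user has few reference clips, so it is acceptable for now.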
|
||||
existing_files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
|
||||
existing_names = set()
|
||||
|
||||
# Display names live only in the per-file metadata JSON, and there is no database,
# so collecting every existing display name would mean downloading every JSON file.
# Storage paths are already unique thanks to the timestamp prefix, so as a lightweight
# compromise we only count stored files that share this base name (e.g. 123_abc.wav
# and 456_abc.wav both count as "abc.wav") and append "(n)" to the new display name.
|
||||
|
||||
dup_count = 0
|
||||
search_suffix = f"_{sanitize_filename(Path(original_name).stem)}.wav" # stored objects are named {timestamp}_{safe_name}.wav
|
||||
|
||||
for f in existing_files:
|
||||
fname = f.get('name', '')
|
||||
if fname.endswith(search_suffix):
|
||||
dup_count += 1
|
||||
|
||||
final_display_name = original_name
|
||||
if dup_count > 0:
|
||||
name_stem = Path(original_name).stem
|
||||
name_ext = Path(original_name).suffix
|
||||
final_display_name = f"{name_stem}({dup_count}){name_ext}"
|
||||
|
||||
# Build the storage path (unique ID)
|
||||
timestamp = int(time.time())
|
||||
safe_name = sanitize_filename(Path(file.filename).stem)
|
||||
storage_path = f"{user_id}/{timestamp}_{safe_name}.wav"
|
||||
|
||||
# Upload the WAV file to Supabase
|
||||
with open(tmp_wav_path, 'rb') as f:
|
||||
wav_data = f.read()
|
||||
|
||||
await storage_service.upload_file(
|
||||
bucket=BUCKET_REF_AUDIOS,
|
||||
path=storage_path,
|
||||
file_data=wav_data,
|
||||
content_type="audio/wav"
|
||||
)
|
||||
|
||||
# Upload the metadata JSON
|
||||
metadata = {
|
||||
"ref_text": ref_text.strip(),
|
||||
"original_filename": final_display_name, # gets a "(n)" suffix when the name already exists
|
||||
"duration_sec": duration,
|
||||
"created_at": timestamp
|
||||
}
|
||||
metadata_path = f"{user_id}/{timestamp}_{safe_name}.json"
|
||||
await storage_service.upload_file(
|
||||
bucket=BUCKET_REF_AUDIOS,
|
||||
path=metadata_path,
|
||||
file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
|
||||
content_type="application/json"
|
||||
)
|
||||
|
||||
# Get a signed URL
|
||||
signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)
|
||||
|
||||
# Clean up temporary files
|
||||
os.unlink(tmp_input_path)
|
||||
if os.path.exists(tmp_wav_path):
|
||||
os.unlink(tmp_wav_path)
|
||||
|
||||
return RefAudioResponse(
|
||||
id=storage_path,
|
||||
name=final_display_name,
|
||||
path=signed_url,
|
||||
ref_text=ref_text.strip(),
|
||||
duration_sec=duration,
|
||||
created_at=timestamp
|
||||
)
|
||||
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error(f"上传参考音频失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"上传失败: {str(e)}")
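The duplicate-name rule in `upload_ref_audio` can be sketched as a pure function. This is an illustration under two assumptions: stored objects are named `{timestamp}_{stem}.wav` as in the upload code, and the stem needs no further sanitizing. `next_display_name` is a hypothetical helper, and the regex match is stricter than the `endswith` check used in the handler.

```python
import re
from pathlib import Path
from typing import Iterable


def next_display_name(original_name: str, stored_names: Iterable[str]) -> str:
    """Append "(n)" when n stored WAV files already share the same base name."""
    stem = Path(original_name).stem
    ext = Path(original_name).suffix
    # Match "<digits>_<stem>.wav" exactly, so "3_my_test.wav" does not count for "test".
    pattern = re.compile(rf"\d+_{re.escape(stem)}\.wav")
    dup_count = sum(1 for name in stored_names if pattern.fullmatch(name))
    if dup_count == 0:
        return original_name
    return f"{stem}({dup_count}){ext}"


existing = ["1_test.wav", "1_test.json", "2_test.wav", "3_my_test.wav"]
renamed = next_display_name("test.mp3", existing)
```

Metadata sidecars (`.json`) are ignored by the pattern, so each upload is counted once.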
|
||||
|
||||
|
||||
@router.get("", response_model=RefAudioListResponse)
|
||||
async def list_ref_audios(user: dict = Depends(get_current_user)):
|
||||
"""List all reference audio for the current user."""
|
||||
user_id = user["id"]
|
||||
|
||||
try:
|
||||
# List the files under the user's directory
|
||||
files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
|
||||
|
||||
# Keep only .wav files and load the matching metadata for each
|
||||
items = []
|
||||
for f in files:
|
||||
name = f.get("name", "")
|
||||
if not name.endswith(".wav"):
|
||||
continue
|
||||
|
||||
storage_path = f"{user_id}/{name}"
|
||||
|
||||
# Try to read the metadata sidecar
|
||||
metadata_name = name[:-len(".wav")] + ".json" # swap only the trailing extension
|
||||
metadata_path = f"{user_id}/{metadata_name}"
|
||||
|
||||
ref_text = ""
|
||||
duration_sec = 0.0
|
||||
created_at = 0
|
||||
original_filename = ""
|
||||
|
||||
try:
|
||||
# Fetch the metadata content
|
||||
metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
|
||||
import httpx
|
||||
async with httpx.AsyncClient() as client:
|
||||
resp = await client.get(metadata_url)
|
||||
if resp.status_code == 200:
|
||||
metadata = resp.json()
|
||||
ref_text = metadata.get("ref_text", "")
|
||||
duration_sec = metadata.get("duration_sec", 0.0)
|
||||
created_at = metadata.get("created_at", 0)
|
||||
original_filename = metadata.get("original_filename", "")
|
||||
except Exception as e:
|
||||
logger.warning(f"读取 metadata 失败: {e}")
|
||||
# Fall back to the timestamp in the filename
|
||||
try:
|
||||
created_at = int(name.split("_")[0])
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
# Get a signed URL for the audio
|
||||
signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)
|
||||
|
||||
# Prefer the original filename (without the timestamp prefix)
|
||||
display_name = original_filename if original_filename else name
|
||||
# If the original filename is missing, strip the timestamp prefix with a regex
|
||||
if not display_name or display_name == name:
|
||||
# Matches "1234567890_filename.wav"
|
||||
match = re.match(r'^\d+_(.+)$', name)
|
||||
if match:
|
||||
display_name = match.group(1)
|
||||
|
||||
items.append(RefAudioResponse(
|
||||
id=storage_path,
|
||||
name=display_name,
|
||||
path=signed_url,
|
||||
ref_text=ref_text,
|
||||
duration_sec=duration_sec,
|
||||
created_at=created_at
|
||||
))
|
||||
|
||||
# Sort by creation time, newest first
|
||||
items.sort(key=lambda x: x.created_at, reverse=True)
|
||||
|
||||
return RefAudioListResponse(items=items)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"列出参考音频失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"获取列表失败: {str(e)}")
|
||||
|
||||
|
||||
@router.delete("/{audio_id:path}")
|
||||
async def delete_ref_audio(audio_id: str, user: dict = Depends(get_current_user)):
|
||||
"""Delete a reference audio clip."""
|
||||
user_id = user["id"]
|
||||
|
||||
# Security check: users may only delete their own files
|
||||
if not audio_id.startswith(f"{user_id}/"):
|
||||
raise HTTPException(status_code=403, detail="无权删除此文件")
|
||||
|
||||
try:
|
||||
# Delete the WAV file
|
||||
await storage_service.delete_file(BUCKET_REF_AUDIOS, audio_id)
|
||||
|
||||
# Delete the metadata JSON
metadata_path = (audio_id[:-len(".wav")] if audio_id.endswith(".wav") else audio_id) + ".json"
|
||||
try:
|
||||
await storage_service.delete_file(BUCKET_REF_AUDIOS, metadata_path)
|
||||
except Exception:
|
||||
pass # the metadata file may not exist
|
||||
|
||||
return {"success": True, "message": "删除成功"}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"删除参考音频失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"删除失败: {str(e)}")
|
||||
|
||||
|
||||
class RenameRequest(BaseModel):
|
||||
new_name: str
|
||||
|
||||
|
||||
@router.put("/{audio_id:path}")
|
||||
async def rename_ref_audio(
|
||||
audio_id: str,
|
||||
request: RenameRequest,
|
||||
user: dict = Depends(get_current_user)
|
||||
):
|
||||
"""Rename a reference audio clip (updates the display name stored in its metadata)."""
|
||||
user_id = user["id"]
|
||||
|
||||
# Security check
|
||||
if not audio_id.startswith(f"{user_id}/"):
|
||||
raise HTTPException(status_code=403, detail="无权修改此文件")
|
||||
|
||||
new_name = request.new_name.strip()
|
||||
if not new_name:
|
||||
raise HTTPException(status_code=400, detail="新名称不能为空")
|
||||
|
||||
# Make sure the new name has an extension (keep the given one, or append .wav)
|
||||
if not Path(new_name).suffix:
|
||||
new_name += ".wav"
|
||||
|
||||
try:
|
||||
# 1. Download the existing metadata
metadata_path = (audio_id[:-len(".wav")] if audio_id.endswith(".wav") else audio_id) + ".json"
|
||||
try:
|
||||
# Fetch the existing JSON
|
||||
import httpx
|
||||
metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
|
||||
if not metadata_url:
|
||||
# If the JSON does not exist, a minimal one is created in the fallback below
|
||||
raise Exception("Metadata not found")
|
||||
|
||||
async with httpx.AsyncClient() as client:
|
||||
resp = await client.get(metadata_url)
|
||||
if resp.status_code == 200:
|
||||
metadata = resp.json()
|
||||
else:
|
||||
raise Exception(f"Failed to fetch metadata: {resp.status_code}")
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"无法读取元数据: {e}, 将创建新的元数据")
|
||||
# Fallback: build minimal metadata when reading fails
|
||||
metadata = {
|
||||
"ref_text": "", # may be lost
|
||||
"duration_sec": 0.0,
|
||||
"created_at": int(time.time()),
|
||||
"original_filename": new_name
|
||||
}
|
||||
|
||||
# 2. Update original_filename
|
||||
metadata["original_filename"] = new_name
|
||||
|
||||
# 3. Upload the metadata, overwriting the old file
|
||||
await storage_service.upload_file(
|
||||
bucket=BUCKET_REF_AUDIOS,
|
||||
path=metadata_path,
|
||||
file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
|
||||
content_type="application/json"
|
||||
)
|
||||
|
||||
return {"success": True, "name": new_name}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"重命名失败: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"重命名失败: {str(e)}")
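Renaming touches only the JSON sidecar stored next to each WAV file. The sketch below shows the two steps without any storage calls; `metadata_path_for` and `rename_in_metadata` are hypothetical helpers, and the metadata keys are the ones written by `upload_ref_audio`.

```python
import json


def metadata_path_for(audio_id: str) -> str:
    """Map '<user>/<ts>_<name>.wav' to its '.json' sidecar, touching only the final extension."""
    base = audio_id[:-len(".wav")] if audio_id.endswith(".wav") else audio_id
    return base + ".json"


def rename_in_metadata(raw_json: bytes, new_name: str) -> bytes:
    """Return the metadata with original_filename replaced and all other keys kept."""
    metadata = json.loads(raw_json.decode("utf-8"))
    metadata["original_filename"] = new_name
    return json.dumps(metadata, ensure_ascii=False).encode("utf-8")


original = json.dumps(
    {"ref_text": "hello", "original_filename": "old.wav",
     "duration_sec": 3.5, "created_at": 1700000000}
).encode("utf-8")
updated = json.loads(rename_in_metadata(original, "新名字.wav"))
sidecar = metadata_path_for("u1/1700000000_my.wav_take.wav")
```

Slicing off the trailing extension avoids rewriting a `.wav` that appears earlier in the name, which `str.replace` would also change.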
|
||||
@@ -1,398 +0,0 @@
|
||||
from fastapi import APIRouter, UploadFile, File, Form, HTTPException
|
||||
from typing import Optional
|
||||
import asyncio
import shutil
|
||||
import os
|
||||
import time
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
import traceback
|
||||
import re
|
||||
import json
|
||||
import requests
|
||||
from urllib.parse import unquote
|
||||
|
||||
from app.services.whisper_service import whisper_service
|
||||
from app.services.glm_service import glm_service
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
@router.post("/extract-script")
|
||||
async def extract_script_tool(
|
||||
file: Optional[UploadFile] = File(None),
|
||||
url: Optional[str] = Form(None),
|
||||
rewrite: bool = Form(True)
|
||||
):
|
||||
"""
|
||||
Standalone script extraction tool.
|
||||
Accepts an uploaded video/audio file OR a video URL -> transcribes the speech -> (optionally) rewrites the script with AI.
|
||||
"""
|
||||
if not file and not url:
|
||||
raise HTTPException(400, "必须提供文件或视频链接")
|
||||
|
||||
temp_path = None
|
||||
try:
|
||||
timestamp = int(time.time())
|
||||
temp_dir = Path("/tmp")
|
||||
if os.name == 'nt':
|
||||
temp_dir = Path("d:/tmp")
|
||||
temp_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# 1. Obtain and save the file
loop = asyncio.get_running_loop()
|
||||
|
||||
if file:
|
||||
safe_filename = Path(file.filename).name.replace(" ", "_")
|
||||
temp_path = temp_dir / f"tool_extract_{timestamp}_{safe_filename}"
|
||||
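# Run the blocking file copy in the thread pool; the with-block closes the output handle
def _save_upload():
with open(temp_path, "wb") as out:
shutil.copyfileobj(file.file, out)
await loop.run_in_executor(None, _save_upload)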
|
||||
logger.info(f"Tool processing upload file: {temp_path}")
|
||||
else:
|
||||
# URL download logic
# Extract the first link from pasted share text (Douyin, Bilibili, and similar platforms)
|
||||
url_match = re.search(r'https?://[^\s]+', url)
|
||||
if url_match:
|
||||
extracted_url = url_match.group(0)
|
||||
logger.info(f"Extracted URL from text: {extracted_url}")
|
||||
url = extracted_url
|
||||
|
||||
logger.info(f"Tool downloading URL: {url}")
|
||||
|
||||
# Wrap the yt-dlp download in a blocking helper
|
||||
def _download_yt_dlp():
|
||||
import yt_dlp
|
||||
logger.info("Attempting download with yt-dlp...")
|
||||
|
||||
ydl_opts = {
|
||||
'format': 'bestaudio/best',
|
||||
'outtmpl': str(temp_dir / f"tool_download_{timestamp}_%(id)s.%(ext)s"),
|
||||
'quiet': True,
|
||||
'no_warnings': True,
|
||||
'http_headers': {
|
||||
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
|
||||
'Referer': 'https://www.douyin.com/',
|
||||
}
|
||||
}
|
||||
|
||||
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
|
||||
info = ydl.extract_info(url, download=True)
|
||||
if 'requested_downloads' in info:
|
||||
downloaded_file = info['requested_downloads'][0]['filepath']
|
||||
else:
|
||||
ext = info.get('ext', 'mp4')
|
||||
video_id = info.get('id')
downloaded_file = str(temp_dir / f"tool_download_{timestamp}_{video_id}.{ext}")
|
||||
|
||||
return Path(downloaded_file)
|
||||
|
||||
# Try yt-dlp first (run in the executor)
|
||||
try:
|
||||
temp_path = await loop.run_in_executor(None, _download_yt_dlp)
|
||||
logger.info(f"yt-dlp downloaded to: {temp_path}")
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"yt-dlp download failed: {e}. Trying manual Douyin fallback...")
|
||||
|
||||
# On failure, fall back to manual parsing (Douyin or Bilibili)
|
||||
if "douyin" in url:
|
||||
manual_path = await download_douyin_manual(url, temp_dir, timestamp)
|
||||
if manual_path:
|
||||
temp_path = manual_path
|
||||
logger.info(f"Manual Douyin fallback successful: {temp_path}")
|
||||
else:
|
||||
raise HTTPException(400, f"视频下载失败。yt-dlp 报错: {str(e)}")
|
||||
elif "bilibili" in url:
|
||||
manual_path = await download_bilibili_manual(url, temp_dir, timestamp)
|
||||
if manual_path:
|
||||
temp_path = manual_path
|
||||
logger.info(f"Manual Bilibili fallback successful: {temp_path}")
|
||||
else:
|
||||
raise HTTPException(400, f"视频下载失败。yt-dlp 报错: {str(e)}")
|
||||
else:
|
||||
raise HTTPException(400, f"视频下载失败: {str(e)}")
|
||||
|
||||
if not temp_path or not temp_path.exists():
|
||||
raise HTTPException(400, "文件获取失败")
|
||||
|
||||
# 1.5 Safe conversion: always transcode to 16 kHz WAV
|
||||
import subprocess
|
||||
audio_path = temp_dir / f"extract_audio_{timestamp}.wav"
|
||||
|
||||
def _convert_audio():
|
||||
try:
|
||||
convert_cmd = [
|
||||
'ffmpeg',
|
||||
'-i', str(temp_path),
|
||||
'-vn', # drop the video stream
'-acodec', 'pcm_s16le',
'-ar', '16000', # sample rate recommended for Whisper
'-ac', '1', # mono
'-y', # overwrite the output file
|
||||
str(audio_path)
|
||||
]
|
||||
# Capture stderr for error reporting
|
||||
subprocess.run(convert_cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
return True
|
||||
except subprocess.CalledProcessError as e:
|
||||
error_log = e.stderr.decode('utf-8', errors='ignore') if e.stderr else str(e)
|
||||
logger.error(f"FFmpeg check/convert failed: {error_log}")
|
||||
# Check whether the downloaded file is actually an HTML page
|
||||
head = b""
|
||||
try:
|
||||
with open(temp_path, 'rb') as f:
|
||||
head = f.read(100)
|
||||
except OSError: pass
|
||||
if b'<!DOCTYPE html' in head or b'<html' in head:
|
||||
raise ValueError("HTML_DETECTED")
|
||||
raise ValueError("CONVERT_FAILED")
|
||||
|
||||
# Run the conversion (in the executor)
|
||||
try:
|
||||
await loop.run_in_executor(None, _convert_audio)
|
||||
logger.info(f"Converted to WAV: {audio_path}")
|
||||
target_path = audio_path
|
||||
except ValueError as ve:
|
||||
if str(ve) == "HTML_DETECTED":
|
||||
raise HTTPException(400, "下载的文件是网页而非视频,请重试或手动上传。")
|
||||
else:
|
||||
raise HTTPException(400, "下载的文件已损坏或格式无法识别。")
|
||||
|
||||
# 2. Transcribe the script (Whisper)
|
||||
script = await whisper_service.transcribe(str(target_path))
|
||||
|
||||
# 3. AI rewrite (GLM)
|
||||
rewritten = None
|
||||
if rewrite:
|
||||
if script and len(script.strip()) > 0:
|
||||
logger.info("Rewriting script...")
|
||||
rewritten = await glm_service.rewrite_script(script)
|
||||
else:
|
||||
logger.warning("No script extracted, skipping rewrite")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"original_script": script,
|
||||
"rewritten_script": rewritten
|
||||
}
|
||||
|
||||
except HTTPException as he:
|
||||
raise he
|
||||
except Exception as e:
|
||||
logger.error(f"Tool extract failed: {e}")
|
||||
logger.error(traceback.format_exc())
|
||||
|
||||
# Friendly error message
|
||||
msg = str(e)
|
||||
if "Fresh cookies" in msg:
|
||||
msg = "下载失败:目标平台开启了反爬验证,请过段时间重试或直接上传视频文件。"
|
||||
|
||||
raise HTTPException(500, f"提取失败: {msg}")
|
||||
finally:
|
||||
# Clean up temporary files
|
||||
if temp_path and temp_path.exists():
|
||||
try:
|
||||
os.remove(temp_path)
|
||||
logger.info(f"Cleaned up temp file: {temp_path}")
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to cleanup temp file {temp_path}: {e}")
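Two small checks in `extract_script_tool` are easy to show in isolation: pulling the first link out of pasted share text, and spotting an HTML page that was downloaded instead of media. The helper names below are hypothetical; the regex and the byte markers are the ones used in the handler. Unlike the handler, which keeps the input unchanged when nothing matches, this sketch returns `None`.

```python
import re
from typing import Optional


def extract_first_url(share_text: str) -> Optional[str]:
    """Return the first http(s) link in the text, or None when there is none."""
    match = re.search(r"https?://[^\s]+", share_text)
    return match.group(0) if match else None


def looks_like_html(head: bytes) -> bool:
    """Detect an HTML page from the first bytes of a downloaded file."""
    return b"<!DOCTYPE html" in head or b"<html" in head


share = "7.43 复制打开抖音 https://v.douyin.com/abc123/ 看看这个视频"
link = extract_first_url(share)
```

Because the pattern stops at the first whitespace character, any text after the link in the share message is ignored.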
|
||||
|
||||
|
||||
async def download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
|
||||
"""
|
||||
Manually download a Douyin video (fallback logic, ported from SuperIPAgent/douyinDownloader).
|
||||
Uses a specific user profile URL and a hardcoded cookie to get past anti-scraping checks.
|
||||
"""
|
||||
logger.info(f"[SuperIPAgent] Starting download for: {url}")
|
||||
|
||||
try:
|
||||
# 1. Extract the modal ID (follows short-link redirects)
|
||||
headers = {
|
||||
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
|
||||
}
|
||||
|
||||
# Follow redirects in case the URL is a short link
|
||||
resp = requests.get(url, headers=headers, allow_redirects=True, timeout=10)
|
||||
final_url = resp.url
|
||||
logger.info(f"[SuperIPAgent] Final URL: {final_url}")
|
||||
|
||||
modal_id = None
|
||||
match = re.search(r'/video/(\d+)', final_url)
|
||||
if match:
|
||||
modal_id = match.group(1)
|
||||
|
||||
if not modal_id:
|
||||
logger.error("[SuperIPAgent] Could not extract modal_id")
|
||||
return None
|
||||
|
||||
logger.info(f"[SuperIPAgent] Extracted modal_id: {modal_id}")
|
||||
|
||||
# 2. Build the request URL (copied from SuperIPAgent)
|
||||
# Uses a specific user's profile page plus the modal_id parameter, together with a specific cookie
|
||||
target_url = f"https://www.douyin.com/user/MS4wLjABAAAAN_s_hups7LD0N4qnrM3o2gI0vuG3pozNaEolz2_py3cHTTrpVr1Z4dukFD9SOlwY?from_tab_name=main&modal_id={modal_id}"
|
||||
|
||||
# 3. Use the hard-coded cookie (copied from SuperIPAgent)
|
||||
headers_with_cookie = {
|
||||
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
|
||||
"cookie": "douyin.com; device_web_cpu_core=10; device_web_memory_size=8; __ac_nonce=06760391f00b9b51264ae; __ac_signature=_02B4Z6wo00f019a5ceAAAIDAhEZR-X3jjWfWmXVAAJLXd4; ttwid=1%7C7MTKBSMsP4eOv9h5NAh8p0E-NYIud09ftNmB0mjLpWc%7C1734359327%7C8794abeabbd47447e1f56e5abc726be089f2a0344d6343b5f75f23e7b0f0028f; UIFID_TEMP=0de8750d2b188f4235dbfd208e44abbb976428f0720eb983255afefa45d39c0c6532e1d4768dd8587bf919f866ff1396912bcb2af71efee56a14a2a9f37b74010d0a0413795262f6d4afe02a032ac7ab; s_v_web_id=verify_m4r4ribr_c7krmY1z_WoeI_43po_ATpO_I4o8U1bex2D7; hevc_supported=true; home_can_add_dy_2_desktop=%220%22; dy_swidth=2560; dy_sheight=1440; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A2560%2C%5C%22screen_height%5C%22%3A1440%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A10%2C%5C%22device_memory%5C%22%3A8%2C%5C%22downlink%5C%22%3A10%2C%5C%22effective_type%5C%22%3A%5C%224g%5C%22%2C%5C%22round_trip_time%5C%22%3A50%7D%22; strategyABtestKey=%221734359328.577%22; csrf_session_id=2f53aed9aa6974e83aa9a1014180c3a4; fpk1=U2FsdGVkX1/IpBh0qdmlKAVhGyYHgur4/VtL9AReZoeSxadXn4juKvsakahRGqjxOPytHWspYoBogyhS/V6QSw==; fpk2=0845b309c7b9b957afd9ecf775a4c21f; passport_csrf_token=d80e0c5b2fa2328219856be5ba7e671e; passport_csrf_token_default=d80e0c5b2fa2328219856be5ba7e671e; odin_tt=3c891091d2eb0f4718c1d5645bc4a0017032d4d5aa989decb729e9da2ad570918cbe5e9133dc6b145fa8c758de98efe32ff1f81aa0d611e838cc73ab08ef7d3f6adf66ab4d10e8372ddd628f94f16b8e; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Afalse%2C%22volume%22%3A0.5%7D; bd_ticket_guard_client_web_domain=2; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; 
UIFID=0de8750d2b188f4235dbfd208e44abbb976428f0720eb983255afefa45d39c0c6532e1d4768dd8587bf919f866ff139655a3c2b735923234f371c699560c657923fd3d6c5b63ab7bb9b83423b6cb4787e2ce66a7fbc4ecb24c8570f520fe6de068bbb95115023c0c6c1b6ee31b49fb7e3996fb8349f43a3fd8b7a61cd9e18e8fe65eb6a7c13de4c0960d84e344b644725db3eb2fa6b7caf821de1b50527979f2; is_dash_user=1; biz_trace_id=b57a241f; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCTEo2R0lDalVoWW1XcHpGOFdrN0Vrc0dXcCtaUzNKY1g4NGNGY2k0TTl1TEowNjdUb21mbFU5aDdvWVBGamhNRWNRQWtKdnN1MnM3RmpTWnlJQXpHMjA9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoyfQ%3D%3D; download_guide=%221%2F20241216%2F0%22; sdk_source_info=7e276470716a68645a606960273f276364697660272927676c715a6d6069756077273f276364697660272927666d776a68605a607d71606b766c6a6b5a7666776c7571273f275e58272927666a6b766a69605a696c6061273f27636469766027292762696a6764695a7364776c6467696076273f275e5827292771273f273d33323131333c3036313632342778; bit_env=RiOY4jzzpxZoVCl6zdVSVhVRjdwHRTxqcqWdqMBZLPGjMdB4Tax1kAELHNTVAAh72KuhumewE4Lq6f0-VJ2UpJrkrhSxoPw9LUb3zQrq1OSwbeSPHkRlRgRQvO89sItdGUyq1oFr0XyRCnMYG87KSeWyc4x0czGR0o50hTDoDLG5rJVoRcdQOLvjiAegsqyytKF59sPX_QM9qffK2SqYsg0hCggURc_AI6kguDDE5DvG0bnyz1utw4z1eEnIoLrkGDqzqBZj4dOAr0BVU6ofbsS-pOQ2u2PM1dLP9FlBVBlVaqYVgHJeSLsR5k76BRTddUjTb4zEilVIEwAMJWGN4I1BxVt6fC9B5tBQpuT0lj3n3eKXCKXZsd8FrEs5_pbfDsxV-e_WMiXI2ff4qxiTC0U73sfo9OpicKICtZjdq8qsHxJuu6wVR36zvXeL2Wch5C6MzprNvkivv0l8nbh2mSgy1nabZr3dmU6NcR-Bg3Q3xTWUlR9aAUmpopC-cNuXjgLpT-Lw1AYGilSUnCvosth1Gfypq-b0MpgmdSDgTrQ%3D; gulu_source_res=eyJwX2luIjoiMDhjOGQ3ZTJiODQyNjZkZWI5Y2VkMGJiODNlNmY1ZWY0ZjMyNTE2ZmYyZjAzNDMzZjI0OWU1Y2Q1NTczNTk5NyJ9; passport_auth_mix_state=hp9bc3dgb1tm5wd8p82zawus27g0e3ue; IsDouyinActive=false",
|
||||
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
|
||||
}
|
||||
|
||||
logger.info(f"[SuperIPAgent] Requesting page with Cookie...")
|
||||
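# Original note: verify=False is needed or some environments raise errors; the call below does not pass it, so TLS verification stays on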
|
||||
response = requests.get(target_url, headers=headers_with_cookie, timeout=10)
|
||||
|
||||
# 4. Parse RENDER_DATA
|
||||
content_match = re.findall(r'<script id="RENDER_DATA" type="application/json">(.*?)</script>', response.text)
|
||||
if not content_match:
|
||||
# Possibly search again after decoding, or the page structure has changed
|
||||
# Fall back to SSR_HYDRATED_DATA
|
||||
if "SSR_HYDRATED_DATA" in response.text:
|
||||
content_match = re.findall(r'<script id="SSR_HYDRATED_DATA" type="application/json">(.*?)</script>', response.text)
|
||||
|
||||
if not content_match:
|
||||
logger.error(f"[SuperIPAgent] Could not find RENDER_DATA in page (len={len(response.text)})")
|
||||
return None
|
||||
|
||||
content = unquote(content_match[0])
|
||||
try:
|
||||
data = json.loads(content)
|
||||
except ValueError:  # json.JSONDecodeError subclasses ValueError
|
||||
logger.error("[SuperIPAgent] JSON decode failed")
|
||||
return None
|
||||
|
||||
# 5. Extract the video stream
|
||||
video_url = None
|
||||
try:
|
||||
# The usual path is: app -> videoDetail -> video -> bitRateList -> playAddr -> src
|
||||
if "app" in data and "videoDetail" in data["app"]:
|
||||
info = data["app"]["videoDetail"]["video"]
|
||||
if "bitRateList" in info and info["bitRateList"]:
|
||||
video_url = info["bitRateList"][0]["playAddr"][0]["src"]
|
||||
elif "playAddr" in info and info["playAddr"]:
|
||||
video_url = info["playAddr"][0]["src"]
|
||||
except Exception as e:
|
||||
logger.error(f"[SuperIPAgent] Path extraction failed: {e}")
|
||||
|
||||
if not video_url:
|
||||
logger.error("[SuperIPAgent] No video_url found")
|
||||
return None
|
||||
|
||||
if video_url.startswith("//"):
|
||||
video_url = "https:" + video_url
|
||||
|
||||
logger.info(f"[SuperIPAgent] Found video URL: {video_url[:50]}...")
|
||||
|
||||
# 6. Download (with headers)
|
||||
temp_path = temp_dir / f"douyin_manual_{timestamp}.mp4"
|
||||
download_headers = {
|
||||
'Referer': 'https://www.douyin.com/',
|
||||
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
|
||||
}
|
||||
|
||||
dl_resp = requests.get(video_url, headers=download_headers, stream=True, timeout=60)
|
||||
if dl_resp.status_code == 200:
|
||||
with open(temp_path, 'wb') as f:
|
||||
for chunk in dl_resp.iter_content(chunk_size=1024):
|
||||
f.write(chunk)
|
||||
|
||||
logger.info(f"[SuperIPAgent] Downloaded successfully: {temp_path}")
|
||||
return temp_path
|
||||
else:
|
||||
logger.error(f"[SuperIPAgent] Download failed: {dl_resp.status_code}")
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"[SuperIPAgent] Logic failed: {e}")
|
||||
return None
|
||||
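Steps 1 and 4 of `download_douyin_manual` above (ID extraction and `RENDER_DATA` decoding) can be sketched in isolation, without network access. The helper names `extract_modal_id` and `parse_render_data`, the sample payload, and the `example.com` URL are illustrative assumptions and are not part of this commit.

```python
import json
import re
from typing import Optional
from urllib.parse import quote, unquote


def extract_modal_id(final_url: str) -> Optional[str]:
    """Return the numeric ID from a /video/<id> URL, or None if absent."""
    match = re.search(r"/video/(\d+)", final_url)
    return match.group(1) if match else None


def parse_render_data(html: str) -> Optional[dict]:
    """Decode the URL-encoded JSON embedded in the RENDER_DATA script tag."""
    found = re.findall(
        r'<script id="RENDER_DATA" type="application/json">(.*?)</script>', html
    )
    if not found:
        return None
    try:
        return json.loads(unquote(found[0]))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Hypothetical sample page, built only for this illustration.
payload = {
    "app": {"videoDetail": {"video": {"playAddr": [{"src": "//example.com/v.mp4"}]}}}
}
sample_html = (
    '<script id="RENDER_DATA" type="application/json">'
    + quote(json.dumps(payload))
    + "</script>"
)
```

Keeping these two steps as pure functions would make the fragile parts (the URL pattern and the embedded JSON layout) testable without hitting the site.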
|
||||
async def download_bilibili_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
|
||||
"""
|
||||
Manually download a Bilibili video (fallback logic, Playwright version).
|
||||
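Bilibili usually serves audio and video as separate streams; only the audio is fetched here because only the transcript is needed.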
|
||||
"""
|
||||
from playwright.async_api import async_playwright
|
||||
|
||||
logger.info(f"[Playwright] Starting Bilibili download for: {url}")
|
||||
|
||||
playwright = None
|
||||
browser = None
|
||||
try:
|
||||
playwright = await async_playwright().start()
|
||||
# Launch browser (ensure chromium is installed: playwright install chromium)
|
||||
browser = await playwright.chromium.launch(headless=True, args=['--no-sandbox', '--disable-setuid-sandbox'])
|
||||
|
||||
# Mobile User Agent often gives single stream?
|
||||
# But Bilibili mobile web is tricky. Desktop is fine.
|
||||
context = await browser.new_context(
|
||||
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
page = await context.new_page()
|
||||
|
||||
# Intercept audio responses?
|
||||
# Bilibili streams are usually .m4s
|
||||
# But finding the initial state is easier.
|
||||
|
||||
logger.info("[Playwright] Navigating to Bilibili...")
|
||||
await page.goto(url, timeout=45000)
|
||||
|
||||
# Wait for video element (triggers loading)
|
||||
try:
|
||||
await page.wait_for_selector('video', timeout=15000)
|
||||
except Exception:
|
||||
logger.warning("[Playwright] Video selector timeout")
|
||||
|
||||
# 1. Try extracting from __playinfo__
|
||||
# window.__playinfo__ contains dash streams
|
||||
playinfo = await page.evaluate("window.__playinfo__")
|
||||
|
||||
audio_url = None
|
||||
|
||||
if playinfo and "data" in playinfo and "dash" in playinfo["data"]:
|
||||
dash = playinfo["data"]["dash"]
|
||||
if "audio" in dash and dash["audio"]:
|
||||
audio_url = dash["audio"][0]["baseUrl"]
|
||||
logger.info(f"[Playwright] Found audio stream in __playinfo__: {audio_url[:50]}...")
|
||||
|
||||
# 2. If playinfo fails, try extracting video src (sometimes it's a blob, which we can't fetch easily without interception)
|
||||
# But interception is complex. Let's try requests with Referer if we have URL.
|
||||
|
||||
if not audio_url:
|
||||
logger.warning("[Playwright] Could not find audio in __playinfo__")
|
||||
return None
|
||||
|
||||
# Download the audio stream
|
||||
temp_path = temp_dir / f"bilibili_audio_{timestamp}.m4s" # usually m4s
|
||||
|
||||
try:
|
||||
api_request = context.request
|
||||
headers = {
|
||||
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
|
||||
"Referer": "https://www.bilibili.com/"
|
||||
}
|
||||
|
||||
logger.info(f"[Playwright] Downloading audio stream...")
|
||||
response = await api_request.get(audio_url, headers=headers)
|
||||
|
||||
if response.status == 200:
|
||||
body = await response.body()
|
||||
with open(temp_path, 'wb') as f:
|
||||
f.write(body)
|
||||
|
||||
logger.info(f"[Playwright] Downloaded successfully: {temp_path}")
|
||||
return temp_path
|
||||
else:
|
||||
logger.error(f"[Playwright] API Request failed: {response.status}")
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"[Playwright] Download logic error: {e}")
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"[Playwright] Bilibili download failed: {e}")
|
||||
return None
|
||||
finally:
|
||||
if browser:
|
||||
await browser.close()
|
||||
if playwright:
|
||||
await playwright.stop()
|
||||
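The `__playinfo__` lookup in `download_bilibili_manual` above can be expressed as a small pure function. This is a minimal sketch that assumes the object has the `data -> dash -> audio[0].baseUrl` shape the code reads; the helper name `pick_audio_url` and the sample URL are hypothetical.

```python
from typing import Any, Optional


def pick_audio_url(playinfo: Any) -> Optional[str]:
    """Return the first DASH audio baseUrl from a __playinfo__-shaped dict."""
    if not isinstance(playinfo, dict):
        return None
    data = playinfo.get("data")
    if not isinstance(data, dict):
        return None
    dash = data.get("dash")
    if not isinstance(dash, dict):
        return None
    audio = dash.get("audio") or []
    if not audio or not isinstance(audio[0], dict):
        return None
    return audio[0].get("baseUrl")


# Hypothetical sample object, shaped like the structure the code above reads.
sample_playinfo = {"data": {"dash": {"audio": [{"baseUrl": "https://example.com/a.m4s"}]}}}
```

Returning `None` for every malformed shape keeps the caller's "no audio found" branch as the single failure path.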
@@ -1,478 +0,0 @@
|
||||
from fastapi import APIRouter, HTTPException, BackgroundTasks, Depends, Request
|
||||
from pydantic import BaseModel
|
||||
from typing import Optional
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
import uuid
|
||||
import traceback
|
||||
import time
|
||||
import httpx
|
||||
import os
|
||||
from app.services.tts_service import TTSService
|
||||
from app.services.video_service import VideoService
|
||||
from app.services.lipsync_service import LipSyncService
|
||||
from app.services.voice_clone_service import voice_clone_service
|
||||
from app.services.assets_service import (
|
||||
get_style,
|
||||
get_default_style,
|
||||
resolve_bgm_path,
|
||||
prepare_style_for_remotion,
|
||||
)
|
||||
from app.services.storage import storage_service
|
||||
from app.services.whisper_service import whisper_service
|
||||
from app.services.remotion_service import remotion_service
|
||||
from app.core.config import settings
|
||||
from app.core.deps import get_current_user
|
||||
|
||||
router = APIRouter()
|
||||
|
||||
class GenerateRequest(BaseModel):
|
||||
text: str
|
||||
voice: str = "zh-CN-YunxiNeural"
|
||||
material_path: str
|
||||
# Fields added for voice-clone mode
|
||||
tts_mode: str = "edgetts" # "edgetts" | "voiceclone"
|
||||
ref_audio_id: Optional[str] = None # Storage path of the reference audio
|
||||
ref_text: Optional[str] = None # Transcript of the reference audio
|
||||
# Subtitle and title options
|
||||
title: Optional[str] = None # Video title (shown in the intro)
|
||||
enable_subtitles: bool = True # Whether to enable per-character highlighted subtitles
|
||||
subtitle_style_id: Optional[str] = None # Subtitle style ID
|
||||
title_style_id: Optional[str] = None # Title style ID
|
||||
subtitle_font_size: Optional[int] = None # Subtitle font size (overrides the style)
|
||||
title_font_size: Optional[int] = None # Title font size (overrides the style)
|
||||
bgm_id: Optional[str] = None # Background music ID
|
||||
bgm_volume: Optional[float] = 0.2 # Background music volume (0-1)
|
||||
|
||||
tasks = {} # In-memory task store
|
||||
|
||||
# Cached LipSync service instance and health status
|
||||
_lipsync_service: Optional[LipSyncService] = None
|
||||
_lipsync_ready: Optional[bool] = None
|
||||
_lipsync_last_check: float = 0
|
||||
|
||||
def _get_lipsync_service() -> LipSyncService:
|
||||
"""Get or create the LipSync service instance (singleton, avoids repeated initialization)."""
|
||||
global _lipsync_service
|
||||
if _lipsync_service is None:
|
||||
_lipsync_service = LipSyncService()
|
||||
return _lipsync_service
|
||||
|
||||
async def _check_lipsync_ready(force: bool = False) -> bool:
|
||||
"""Check whether LipSync is ready (cached; not re-checked within 5 minutes)."""
|
||||
global _lipsync_ready, _lipsync_last_check
|
||||
|
||||
now = time.time()
|
||||
# 5-minute cache
|
||||
if not force and _lipsync_ready is not None and (now - _lipsync_last_check) < 300:
|
||||
return bool(_lipsync_ready)
|
||||
|
||||
lipsync = _get_lipsync_service()
|
||||
health = await lipsync.check_health()
|
||||
_lipsync_ready = health.get("ready", False)
|
||||
_lipsync_last_check = now
|
||||
print(f"[LipSync] Health check: ready={_lipsync_ready}")
|
||||
return bool(_lipsync_ready)
|
||||
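The five-minute caching in `_check_lipsync_ready` above can be sketched as a reusable class. This is an illustration under stated assumptions: the class name `CachedHealthCheck` and the injectable `clock` parameter are hypothetical, and it uses `time.monotonic` where the code above uses `time.time`, so the cache is not affected by wall-clock adjustments.

```python
import time
from typing import Any, Awaitable, Callable, Dict, Optional


class CachedHealthCheck:
    """Cache the boolean result of an async health probe for ttl_seconds."""

    def __init__(
        self,
        probe: Callable[[], Awaitable[Dict[str, Any]]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,  # swapped in for time.time
    ) -> None:
        self._probe = probe
        self._ttl = ttl_seconds
        self._clock = clock
        self._ready: Optional[bool] = None
        self._last_check = 0.0

    async def is_ready(self, force: bool = False) -> bool:
        now = self._clock()
        fresh = self._ready is not None and (now - self._last_check) < self._ttl
        if fresh and not force:
            # Serve the cached answer while it is still within the TTL.
            return bool(self._ready)
        health = await self._probe()
        self._ready = bool(health.get("ready", False))
        self._last_check = now
        return self._ready
```

Holding the state on an instance instead of module-level globals lets a test pass in a fake clock and probe.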
|
||||
async def _download_material(path_or_url: str, temp_path: Path):
|
||||
"""Download the material to a temporary file (streamed to save memory)."""
|
||||
if path_or_url.startswith("http"):
|
||||
# Download from URL
|
||||
timeout = httpx.Timeout(None) # Disable timeout for large files
|
||||
async with httpx.AsyncClient(timeout=timeout) as client:
|
||||
async with client.stream("GET", path_or_url) as resp:
|
||||
resp.raise_for_status()
|
||||
with open(temp_path, "wb") as f:
|
||||
async for chunk in resp.aiter_bytes():
|
||||
f.write(chunk)
|
||||
else:
|
||||
# Local file (legacy or absolute path)
|
||||
src = Path(path_or_url)
|
||||
if not src.is_absolute():
|
||||
src = settings.BASE_DIR.parent / path_or_url
|
||||
|
||||
if src.exists():
|
||||
import shutil
|
||||
shutil.copy(src, temp_path)
|
||||
else:
|
||||
raise FileNotFoundError(f"Material not found: {path_or_url}")
|
||||
|
||||
async def _process_video_generation(task_id: str, req: GenerateRequest, user_id: str):
|
||||
temp_files = [] # Track files to clean up
|
||||
try:
|
||||
start_time = time.time()
|
||||
|
||||
tasks[task_id]["status"] = "processing"
|
||||
tasks[task_id]["progress"] = 5
|
||||
tasks[task_id]["message"] = "正在下载素材..."
|
||||
|
||||
# Prepare temp dir
|
||||
temp_dir = settings.UPLOAD_DIR / "temp"
|
||||
temp_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# 0. Download Material
|
||||
input_material_path = temp_dir / f"{task_id}_input.mp4"
|
||||
temp_files.append(input_material_path)
|
||||
|
||||
await _download_material(req.material_path, input_material_path)
|
||||
|
||||
# 1. TTS - progress 5% -> 25%
|
||||
tasks[task_id]["message"] = "正在生成语音..."
|
||||
tasks[task_id]["progress"] = 10
|
||||
|
||||
audio_path = temp_dir / f"{task_id}_audio.wav"
|
||||
temp_files.append(audio_path)
|
||||
|
||||
if req.tts_mode == "voiceclone":
|
||||
# Voice-clone mode
|
||||
if not req.ref_audio_id or not req.ref_text:
|
||||
raise ValueError("声音克隆模式需要提供参考音频和参考文字")
|
||||
|
||||
tasks[task_id]["message"] = "正在下载参考音频..."
|
||||
|
||||
# Download the reference audio from Supabase
|
||||
ref_audio_local = temp_dir / f"{task_id}_ref.wav"
|
||||
temp_files.append(ref_audio_local)
|
||||
|
||||
ref_audio_url = await storage_service.get_signed_url(
|
||||
bucket="ref-audios",
|
||||
path=req.ref_audio_id
|
||||
)
|
||||
await _download_material(ref_audio_url, ref_audio_local)
|
||||
|
||||
tasks[task_id]["message"] = "正在克隆声音 (Qwen3-TTS)..."
|
||||
await voice_clone_service.generate_audio(
|
||||
text=req.text,
|
||||
ref_audio_path=str(ref_audio_local),
|
||||
ref_text=req.ref_text,
|
||||
output_path=str(audio_path),
|
||||
language="Chinese"
|
||||
)
|
||||
else:
|
||||
# EdgeTTS mode (default)
|
||||
tasks[task_id]["message"] = "正在生成语音 (EdgeTTS)..."
|
||||
tts = TTSService()
|
||||
await tts.generate_audio(req.text, req.voice, str(audio_path))
|
||||
|
||||
tts_time = time.time() - start_time
|
||||
print(f"[Pipeline] TTS completed in {tts_time:.1f}s")
|
||||
tasks[task_id]["progress"] = 25
|
||||
|
||||
# 2. LipSync - progress 25% -> 80%
|
||||
tasks[task_id]["message"] = "正在合成唇形 (LatentSync)..."
|
||||
tasks[task_id]["progress"] = 30
|
||||
|
||||
lipsync = _get_lipsync_service()
|
||||
lipsync_video_path = temp_dir / f"{task_id}_lipsync.mp4"
|
||||
temp_files.append(lipsync_video_path)
|
||||
|
||||
# Use the cached health-check result
|
||||
lipsync_start = time.time()
|
||||
is_ready = await _check_lipsync_ready()
|
||||
|
||||
if is_ready:
|
||||
print(f"[LipSync] Starting LatentSync inference...")
|
||||
tasks[task_id]["progress"] = 35
|
||||
tasks[task_id]["message"] = "正在运行 LatentSync 推理..."
|
||||
await lipsync.generate(str(input_material_path), str(audio_path), str(lipsync_video_path))
|
||||
else:
|
||||
# Skip lipsync if not available
|
||||
print(f"[LipSync] LatentSync not ready, copying original video")
|
||||
tasks[task_id]["message"] = "唇形同步不可用,使用原始视频..."
|
||||
import shutil
|
||||
shutil.copy(str(input_material_path), lipsync_video_path)
|
||||
|
||||
lipsync_time = time.time() - lipsync_start
|
||||
print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
|
||||
tasks[task_id]["progress"] = 80
|
||||
|
||||
# 3. WhisperX subtitle alignment - progress 80% -> 85%
|
||||
captions_path = None
|
||||
if req.enable_subtitles:
|
||||
tasks[task_id]["message"] = "正在生成字幕 (Whisper)..."
|
||||
tasks[task_id]["progress"] = 82
|
||||
|
||||
captions_path = temp_dir / f"{task_id}_captions.json"
|
||||
temp_files.append(captions_path)
|
||||
|
||||
try:
|
||||
await whisper_service.align(
|
||||
audio_path=str(audio_path),
|
||||
text=req.text,
|
||||
output_path=str(captions_path)
|
||||
)
|
||||
print(f"[Pipeline] Whisper alignment completed")
|
||||
except Exception as e:
|
||||
logger.warning(f"Whisper alignment failed, skipping subtitles: {e}")
|
||||
captions_path = None
|
||||
|
||||
tasks[task_id]["progress"] = 85
|
||||
|
||||
# 3.5 Background music mixing (does not affect lip-sync or subtitle alignment)
|
||||
video = VideoService()
|
||||
final_audio_path = audio_path
|
||||
if req.bgm_id:
|
||||
tasks[task_id]["message"] = "正在合成背景音乐..."
|
||||
tasks[task_id]["progress"] = 86
|
||||
|
||||
bgm_path = resolve_bgm_path(req.bgm_id)
|
||||
if bgm_path:
|
||||
mix_output_path = temp_dir / f"{task_id}_audio_mix.wav"
|
||||
temp_files.append(mix_output_path)
|
||||
volume = req.bgm_volume if req.bgm_volume is not None else 0.2
|
||||
volume = max(0.0, min(float(volume), 1.0))
|
||||
try:
|
||||
video.mix_audio(
|
||||
voice_path=str(audio_path),
|
||||
bgm_path=str(bgm_path),
|
||||
output_path=str(mix_output_path),
|
||||
bgm_volume=volume
|
||||
)
|
||||
final_audio_path = mix_output_path
|
||||
except Exception as e:
|
||||
logger.warning(f"BGM mix failed, fallback to voice only: {e}")
|
||||
else:
|
||||
logger.warning(f"BGM not found: {req.bgm_id}")
|
||||
|
||||
# 4. Remotion video composition (subtitles + title) - progress 85% -> 95%
|
||||
# Decide whether Remotion is needed (used when there are subtitles or a title)
|
||||
use_remotion = (captions_path and captions_path.exists()) or req.title
|
||||
|
||||
subtitle_style = None
|
||||
title_style = None
|
||||
if req.enable_subtitles:
|
||||
subtitle_style = get_style("subtitle", req.subtitle_style_id) or get_default_style("subtitle")
|
||||
if req.title:
|
||||
title_style = get_style("title", req.title_style_id) or get_default_style("title")
|
||||
|
||||
if req.subtitle_font_size and req.enable_subtitles:
|
||||
if subtitle_style is None:
|
||||
subtitle_style = {}
|
||||
subtitle_style["font_size"] = int(req.subtitle_font_size)
|
||||
|
||||
if req.title_font_size and req.title:
|
||||
if title_style is None:
|
||||
title_style = {}
|
||||
title_style["font_size"] = int(req.title_font_size)
|
||||
|
||||
if use_remotion:
|
||||
subtitle_style = prepare_style_for_remotion(
|
||||
subtitle_style,
|
||||
temp_dir,
|
||||
f"{task_id}_subtitle_font"
|
||||
)
|
||||
title_style = prepare_style_for_remotion(
|
||||
title_style,
|
||||
temp_dir,
|
||||
f"{task_id}_title_font"
|
||||
)
|
||||
|
||||
final_output_local_path = temp_dir / f"{task_id}_output.mp4"
|
||||
temp_files.append(final_output_local_path)
|
||||
|
||||
if use_remotion:
|
||||
tasks[task_id]["message"] = "正在合成视频 (Remotion)..."
|
||||
tasks[task_id]["progress"] = 87
|
||||
|
||||
# Compose audio and video with FFmpeg first (Remotion needs a video that already has audio)
|
||||
composed_video_path = temp_dir / f"{task_id}_composed.mp4"
|
||||
temp_files.append(composed_video_path)
|
||||
|
||||
await video.compose(str(lipsync_video_path), str(final_audio_path), str(composed_video_path))
|
||||
|
||||
# Check whether Remotion is available
|
||||
remotion_health = await remotion_service.check_health()
|
||||
if remotion_health.get("ready"):
|
||||
try:
|
||||
def on_remotion_progress(percent):
|
||||
# Map Remotion progress onto the 87-95% range
|
||||
mapped = 87 + int(percent * 0.08)
|
||||
tasks[task_id]["progress"] = mapped
|
||||
|
||||
await remotion_service.render(
|
||||
video_path=str(composed_video_path),
|
||||
output_path=str(final_output_local_path),
|
||||
captions_path=str(captions_path) if captions_path else None,
|
||||
title=req.title,
|
||||
title_duration=3.0,
|
||||
fps=25,
|
||||
enable_subtitles=req.enable_subtitles,
|
||||
subtitle_style=subtitle_style,
|
||||
title_style=title_style,
|
||||
on_progress=on_remotion_progress
|
||||
)
|
||||
print(f"[Pipeline] Remotion render completed")
|
||||
except Exception as e:
|
||||
logger.warning(f"Remotion render failed, using FFmpeg fallback: {e}")
|
||||
# Fall back to the FFmpeg-composed video
|
||||
import shutil
|
||||
shutil.copy(str(composed_video_path), final_output_local_path)
|
||||
else:
|
||||
logger.warning(f"Remotion not ready: {remotion_health.get('error')}, using FFmpeg")
|
||||
import shutil
|
||||
shutil.copy(str(composed_video_path), final_output_local_path)
|
||||
else:
|
||||
# No subtitles or title needed; compose directly with FFmpeg
|
||||
tasks[task_id]["message"] = "正在合成最终视频..."
|
||||
tasks[task_id]["progress"] = 90
|
||||
|
||||
await video.compose(str(lipsync_video_path), str(final_audio_path), str(final_output_local_path))
|
||||
|
||||
total_time = time.time() - start_time
|
||||
|
||||
# 5. Upload to Supabase with user isolation
|
||||
tasks[task_id]["message"] = "正在上传结果..."
|
||||
tasks[task_id]["progress"] = 95
|
||||
|
||||
# Use user_id as the directory prefix for isolation
|
||||
storage_path = f"{user_id}/{task_id}_output.mp4"
|
||||
with open(final_output_local_path, "rb") as f:
|
||||
file_data = f.read()
|
||||
await storage_service.upload_file(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path,
|
||||
file_data=file_data,
|
||||
content_type="video/mp4"
|
||||
)
|
||||
|
||||
# Get Signed URL
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path
|
||||
)
|
||||
|
||||
print(f"[Pipeline] Total generation time: {total_time:.1f}s")
|
||||
|
||||
tasks[task_id]["status"] = "completed"
|
||||
tasks[task_id]["progress"] = 100
|
||||
tasks[task_id]["message"] = f"生成完成!耗时 {total_time:.0f} 秒"
|
||||
tasks[task_id]["output"] = storage_path
|
||||
tasks[task_id]["download_url"] = signed_url
|
||||
|
||||
except Exception as e:
|
||||
tasks[task_id]["status"] = "failed"
|
||||
tasks[task_id]["message"] = f"错误: {str(e)}"
|
||||
tasks[task_id]["error"] = traceback.format_exc()
|
||||
logger.error(f"Generate video failed: {e}")
|
||||
finally:
|
||||
# Cleanup temp files
|
||||
for f in temp_files:
|
||||
try:
|
||||
if f.exists():
|
||||
f.unlink()
|
||||
except Exception as e:
|
||||
print(f"Error cleaning up {f}: {e}")
|
||||
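Two small pieces of arithmetic in `_process_video_generation` above are easy to get wrong: mapping the Remotion render percentage into the 87-95% band of the overall pipeline, and clamping the background music volume to the 0-1 range. A minimal sketch of both follows; the function names `map_remotion_progress` and `clamp_bgm_volume` are hypothetical and the logic mirrors the expressions in the code above.

```python
from typing import Optional


def map_remotion_progress(percent: float) -> int:
    """Map a 0-100 render percentage onto the 87-95 pipeline range."""
    return 87 + int(percent * 0.08)


def clamp_bgm_volume(volume: Optional[float], default: float = 0.2) -> float:
    """Fall back to the default when unset, then clamp into [0.0, 1.0]."""
    if volume is None:
        volume = default
    return max(0.0, min(float(volume), 1.0))
```

Because `int()` truncates, the mapped value moves up only once per 12.5 percentage points of render progress.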
|
||||
@router.post("/generate")
|
||||
async def generate_video(
|
||||
req: GenerateRequest,
|
||||
background_tasks: BackgroundTasks,
|
||||
current_user: dict = Depends(get_current_user)
|
||||
):
|
||||
user_id = current_user["id"]
|
||||
task_id = str(uuid.uuid4())
|
||||
tasks[task_id] = {"status": "pending", "task_id": task_id, "progress": 0, "user_id": user_id}
|
||||
background_tasks.add_task(_process_video_generation, task_id, req, user_id)
|
||||
return {"task_id": task_id}
|
||||
|
||||
@router.get("/tasks/{task_id}")
|
||||
async def get_task(task_id: str):
|
||||
return tasks.get(task_id, {"status": "not_found"})
|
||||
|
||||
@router.get("/tasks")
|
||||
async def list_tasks():
|
||||
return {"tasks": list(tasks.values())}
|
||||
|
||||
@router.get("/lipsync/health")
|
||||
async def lipsync_health():
|
||||
"""Return the health status of the LipSync service."""
|
||||
lipsync = _get_lipsync_service()
|
||||
return await lipsync.check_health()
|
||||
|
||||
|
||||
@router.get("/voiceclone/health")
|
||||
async def voiceclone_health():
|
||||
"""Return the health status of the voice-clone service."""
|
||||
return await voice_clone_service.check_health()
|
||||
|
||||
|
||||
@router.get("/generated")
|
||||
async def list_generated_videos(current_user: dict = Depends(get_current_user)):
|
||||
"""List the current user's generated videos from Storage."""
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
# List only files under the current user's directory
|
||||
files_obj = await storage_service.list_files(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=user_id
|
||||
)
|
||||
|
||||
videos = []
|
||||
for f in files_obj:
|
||||
name = f.get('name')
|
||||
if not name or name == '.emptyFolderPlaceholder':
|
||||
continue
|
||||
|
||||
# Skip files that are not *_output.mp4
|
||||
if not name.endswith("_output.mp4"):
|
||||
continue
|
||||
|
||||
# Derive the ID (the file name without its extension)
|
||||
video_id = Path(name).stem
|
||||
|
||||
# The full path includes user_id
|
||||
full_path = f"{user_id}/{name}"
|
||||
|
||||
# Get a signed URL
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=full_path
|
||||
)
|
||||
|
||||
metadata = f.get('metadata', {})
|
||||
size = metadata.get('size', 0)
|
||||
# created_at is a top-level ISO string; convert it to a Unix timestamp
|
||||
created_at_str = f.get('created_at', '')
|
||||
created_at = 0
|
||||
if created_at_str:
|
||||
from datetime import datetime
|
||||
try:
|
||||
dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
|
||||
created_at = int(dt.timestamp())
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
videos.append({
|
||||
"id": video_id,
|
||||
"name": name,
|
||||
"path": signed_url, # Direct playable URL
|
||||
"size_mb": size / (1024 * 1024),
|
||||
"created_at": created_at
|
||||
})
|
||||
|
||||
# Sort by created_at desc (newest first)
|
||||
# created_at was converted to an integer Unix timestamp above, so a numeric sort applies
|
||||
videos.sort(key=lambda x: x.get("created_at", 0), reverse=True)
|
||||
return {"videos": videos}
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"List generated videos failed: {e}")
|
||||
return {"videos": []}
|
||||
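The timestamp handling in `list_generated_videos` above converts an ISO 8601 string with a trailing `Z` into Unix seconds. A minimal sketch of that conversion as a standalone function follows; the name `iso_to_unix` is hypothetical, and the `Z` replacement is kept because `datetime.fromisoformat` only accepts the `Z` suffix from Python 3.11 onward.

```python
from datetime import datetime


def iso_to_unix(created_at_str: str) -> int:
    """Convert an ISO 8601 timestamp to Unix seconds; return 0 if unparsable."""
    if not created_at_str:
        return 0
    try:
        # Rewrite the "Z" suffix so older Python versions can parse it.
        dt = datetime.fromisoformat(created_at_str.replace("Z", "+00:00"))
    except ValueError:
        return 0
    return int(dt.timestamp())
```

Since the result is an integer, a list of such records can be sorted numerically with the newest first by passing `reverse=True`.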
|
||||
|
||||
@router.delete("/generated/{video_id}")
|
||||
async def delete_generated_video(video_id: str, current_user: dict = Depends(get_current_user)):
|
||||
"""Delete a generated video."""
|
||||
user_id = current_user["id"]
|
||||
try:
|
||||
# video_id is usually <uuid>_output; the full path needs the user_id prefix
|
||||
storage_path = f"{user_id}/{video_id}.mp4"
|
||||
|
||||
await storage_service.delete_file(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path
|
||||
)
|
||||
return {"success": True, "message": "视频已删除"}
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"删除失败: {str(e)}")
|
||||
|
||||
@@ -3,14 +3,56 @@ from pathlib import Path
|
||||
|
||||
class Settings(BaseSettings):
|
||||
# Base path configuration
|
||||
BASE_DIR: Path = Path(__file__).resolve().parent.parent
|
||||
UPLOAD_DIR: Path = BASE_DIR.parent / "uploads"
|
||||
OUTPUT_DIR: Path = BASE_DIR.parent / "outputs"
|
||||
ASSETS_DIR: Path = BASE_DIR.parent / "assets"
|
||||
BASE_DIR: Path = Path(__file__).resolve().parent.parent
|
||||
UPLOAD_DIR: Path = BASE_DIR.parent / "uploads"
|
||||
OUTPUT_DIR: Path = BASE_DIR.parent / "outputs"
|
||||
ASSETS_DIR: Path = BASE_DIR.parent / "assets"
|
||||
PUBLISH_SCREENSHOT_DIR: Path = BASE_DIR.parent / "private_outputs" / "publish_screenshots"
|
||||
|
||||
# Database / cache
|
||||
REDIS_URL: str = "redis://localhost:6379/0"
|
||||
DEBUG: bool = True
|
||||
|
||||
# Playwright configuration
|
||||
WEIXIN_HEADLESS_MODE: str = "headless-new"
|
||||
WEIXIN_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
|
||||
WEIXIN_LOCALE: str = "zh-CN"
|
||||
WEIXIN_TIMEZONE_ID: str = "Asia/Shanghai"
|
||||
WEIXIN_CHROME_PATH: str = "/usr/bin/google-chrome"
|
||||
WEIXIN_BROWSER_CHANNEL: str = ""
|
||||
WEIXIN_FORCE_SWIFTSHADER: bool = True
|
||||
WEIXIN_TRANSCODE_MODE: str = "reencode"
|
||||
WEIXIN_DEBUG_ARTIFACTS: bool = False
|
||||
WEIXIN_RECORD_VIDEO: bool = False
|
||||
WEIXIN_KEEP_SUCCESS_VIDEO: bool = False
|
||||
WEIXIN_RECORD_VIDEO_WIDTH: int = 1280
|
||||
WEIXIN_RECORD_VIDEO_HEIGHT: int = 720
|
||||
|
||||
# Douyin Playwright configuration
|
||||
DOUYIN_HEADLESS_MODE: str = "headless-new"
|
||||
DOUYIN_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
|
||||
DOUYIN_LOCALE: str = "zh-CN"
|
||||
DOUYIN_TIMEZONE_ID: str = "Asia/Shanghai"
|
||||
DOUYIN_CHROME_PATH: str = "/usr/bin/google-chrome"
|
||||
DOUYIN_BROWSER_CHANNEL: str = ""
|
||||
DOUYIN_FORCE_SWIFTSHADER: bool = True
|
||||
|
||||
# Douyin debug screen recording
|
||||
DOUYIN_DEBUG_ARTIFACTS: bool = False
|
||||
DOUYIN_RECORD_VIDEO: bool = False
|
||||
DOUYIN_KEEP_SUCCESS_VIDEO: bool = False
|
||||
DOUYIN_RECORD_VIDEO_WIDTH: int = 1280
|
||||
DOUYIN_RECORD_VIDEO_HEIGHT: int = 720
|
||||
|
||||
# Xiaohongshu Playwright configuration
|
||||
XIAOHONGSHU_HEADLESS_MODE: str = "headless-new"
|
||||
XIAOHONGSHU_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
|
||||
XIAOHONGSHU_LOCALE: str = "zh-CN"
|
||||
XIAOHONGSHU_TIMEZONE_ID: str = "Asia/Shanghai"
|
||||
XIAOHONGSHU_CHROME_PATH: str = "/usr/bin/google-chrome"
|
||||
XIAOHONGSHU_BROWSER_CHANNEL: str = ""
|
||||
XIAOHONGSHU_FORCE_SWIFTSHADER: bool = True
|
||||
XIAOHONGSHU_DEBUG_ARTIFACTS: bool = False
|
||||
|
||||
# TTS configuration
|
||||
DEFAULT_TTS_VOICE: str = "zh-CN-YunxiNeural"
|
||||
@@ -25,7 +67,17 @@ class Settings(BaseSettings):
|
||||
LATENTSYNC_ENABLE_DEEPCACHE: bool = True # Enable DeepCache acceleration
|
||||
LATENTSYNC_SEED: int = 1247 # Random seed (-1 for random)
|
||||
LATENTSYNC_USE_SERVER: bool = True # Use the persistent server for speed
|
||||
|
||||
|
||||
# MuseTalk configuration
|
||||
MUSETALK_GPU_ID: int = 0 # GPU ID (defaults to GPU0)
|
||||
MUSETALK_API_URL: str = "http://localhost:8011" # Persistent service address
|
||||
MUSETALK_BATCH_SIZE: int = 8 # Inference batch size
|
||||
MUSETALK_VERSION: str = "v15" # Model version
|
||||
MUSETALK_USE_FLOAT16: bool = True # Half-precision acceleration
|
||||
|
||||
# Hybrid lip-sync routing
|
||||
LIPSYNC_DURATION_THRESHOLD: float = 120.0 # Seconds; MuseTalk is used at or above this value
|
||||
|
||||
# Supabase configuration
|
||||
SUPABASE_URL: str = ""
|
||||
SUPABASE_PUBLIC_URL: str = "" # Public URL, used to build links the frontend can reach
|
||||
@@ -44,11 +96,28 @@ class Settings(BaseSettings):
|
||||
GLM_API_KEY: str = ""
|
||||
GLM_MODEL: str = "glm-4.7-flash"
|
||||
|
||||
# Alipay configuration
|
||||
ALIPAY_APP_ID: str = ""
|
||||
ALIPAY_PRIVATE_KEY_PATH: str = "" # Path to the app private key PEM file
|
||||
ALIPAY_PUBLIC_KEY_PATH: str = "" # Path to the Alipay public key PEM file
|
||||
ALIPAY_NOTIFY_URL: str = "" # Async notification callback URL (must be publicly reachable)
|
||||
ALIPAY_RETURN_URL: str = "" # Synchronous redirect URL after successful payment
|
||||
ALIPAY_SANDBOX: bool = False # Whether to use the sandbox environment
|
||||
PAYMENT_AMOUNT: float = 999.00 # Membership price (CNY)
|
||||
PAYMENT_EXPIRE_DAYS: int = 365 # Membership validity in days
|
||||
|
||||
# CORS configuration (comma-separated list of origins; * allows all)
|
||||
CORS_ORIGINS: str = "*"
|
||||
@property
|
||||
def LATENTSYNC_DIR(self) -> Path:
|
||||
"""Path to the LatentSync directory (computed dynamically)."""
|
||||
return self.BASE_DIR.parent.parent / "models" / "LatentSync"
|
||||
|
||||
@property
|
||||
def MUSETALK_DIR(self) -> Path:
|
||||
"""Path to the MuseTalk directory (computed dynamically)."""
|
||||
return self.BASE_DIR.parent.parent / "models" / "MuseTalk"
|
||||
|
||||
class Config:
|
||||
env_file = ".env"
|
||||
extra = "ignore" # Ignore unknown environment variables
|
||||
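The `LIPSYNC_DURATION_THRESHOLD` comment above says MuseTalk is used at or above the threshold. A minimal sketch of that routing rule follows. The function name `choose_lipsync_backend` and the backend labels are hypothetical, and routing shorter clips to LatentSync is an assumption, since the settings shown here do not name the other branch.

```python
def choose_lipsync_backend(duration_seconds: float, threshold: float = 120.0) -> str:
    """Pick a lip-sync backend from the audio duration.

    At or above the threshold the comment in Settings says MuseTalk is used;
    sending shorter clips to LatentSync is an assumption of this sketch.
    """
    if duration_seconds >= threshold:
        return "musetalk"
    return "latentsync"
```

The comparison is inclusive, so a clip of exactly 120 seconds goes to MuseTalk under the default threshold.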
|
||||
@@ -1,10 +1,11 @@
|
||||
"""
|
||||
Dependency injection module: authentication and user lookup
|
||||
"""
|
||||
from typing import Optional
|
||||
from typing import Optional, Any, Dict, cast
|
||||
from fastapi import Request, HTTPException, Depends, status
|
||||
from app.core.security import decode_access_token, TokenData
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.security import decode_access_token
|
||||
from app.repositories.sessions import get_session, delete_sessions
|
||||
from app.repositories.users import get_user_by_id, deactivate_user_if_expired
|
||||
from loguru import logger
|
||||
|
||||
|
||||
@@ -15,7 +16,7 @@ async def get_token_from_cookie(request: Request) -> Optional[str]:
|
||||
|
||||
async def get_current_user_optional(
|
||||
request: Request
|
||||
) -> Optional[dict]:
|
||||
) -> Optional[Dict[str, Any]]:
|
||||
"""
|
||||
Get the current user (optional; returns None when not logged in)
|
||||
"""
|
||||
@@ -29,23 +30,21 @@ async def get_current_user_optional(
|
||||
|
||||
# Check that the session_token is still valid (single-device login check)
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("user_sessions").select("*").eq(
|
||||
"user_id", token_data.user_id
|
||||
).eq(
|
||||
"session_token", token_data.session_token
|
||||
).execute()
|
||||
|
||||
if not result.data:
|
||||
session = get_session(token_data.user_id, token_data.session_token)
|
||||
if not session:
|
||||
logger.warning(f"Session token 无效: user_id={token_data.user_id}")
|
||||
return None
|
||||
|
||||
# Fetch the user record
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"id", token_data.user_id
|
||||
).single().execute()
|
||||
|
||||
return user_result.data
|
||||
|
||||
user = cast(Optional[Dict[str, Any]], get_user_by_id(token_data.user_id))
|
||||
if user and deactivate_user_if_expired(user):
|
||||
delete_sessions(token_data.user_id)
|
||||
return None
|
||||
|
||||
if user and not user.get("is_active"):
|
||||
delete_sessions(token_data.user_id)
|
||||
return None
|
||||
|
||||
return user
|
||||
except Exception as e:
|
||||
logger.error(f"获取用户信息失败: {e}")
|
||||
return None
|
||||
@@ -53,7 +52,7 @@ async def get_current_user_optional(
|
||||
|
||||
async def get_current_user(
|
||||
request: Request
|
||||
) -> dict:
|
||||
) -> Dict[str, Any]:
|
||||
"""
|
||||
Get the current user (login required)
|
||||
|
||||
@@ -76,43 +75,35 @@ async def get_current_user(
|
||||
)
|
||||
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# Validate the session_token (single-device login)
|
||||
session_result = supabase.table("user_sessions").select("*").eq(
|
||||
"user_id", token_data.user_id
|
||||
).eq(
|
||||
"session_token", token_data.session_token
|
||||
).execute()
|
||||
|
||||
if not session_result.data:
|
||||
session = get_session(token_data.user_id, token_data.session_token)
|
||||
if not session:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="会话已失效,请重新登录(可能已在其他设备登录)"
|
||||
)
|
||||
|
||||
# Fetch the user record
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"id", token_data.user_id
|
||||
).single().execute()
|
||||
|
||||
user = user_result.data
|
||||
|
||||
user = get_user_by_id(token_data.user_id)
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
detail="用户不存在"
|
||||
)
|
||||
|
||||
# Check whether the license has expired
|
||||
if user.get("expires_at"):
|
||||
from datetime import datetime, timezone
|
||||
expires_at = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
|
||||
if datetime.now(timezone.utc) > expires_at:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="授权已过期,请联系管理员续期"
|
||||
)
|
||||
|
||||
user = cast(Dict[str, Any], user)
|
||||
|
||||
if deactivate_user_if_expired(user):
|
||||
delete_sessions(token_data.user_id)
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="会员已到期,请续费"
|
||||
)
|
||||
|
||||
if not user.get("is_active"):
|
||||
delete_sessions(token_data.user_id)
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="账号已停用"
|
||||
)
|
||||
|
||||
return user
|
||||
except HTTPException:
|
||||
raise
|
||||
|
||||
26
backend/app/core/response.py
Normal file
@@ -0,0 +1,26 @@
from typing import Any, Dict, Optional


def success_response(
    data: Any = None,
    message: str = "ok",
    code: int = 0,
    success: bool = True,
) -> Dict[str, Any]:
    return {
        "success": success,
        "message": message,
        "data": data,
        "code": code,
    }


def error_response(message: str, code: int, data: Optional[Any] = None) -> Dict[str, Any]:
    payload = {
        "success": False,
        "message": message,
        "code": code,
    }
    if data is not None:
        payload["data"] = data
    return payload
||||
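The new `response.py` gives every endpoint one JSON envelope. A minimal sketch of the payloads it produces follows, with the two helpers restated compactly so the snippet runs on its own; the sample data and the 404 code are illustrative assumptions, not values taken from the diff.

```python
from typing import Any, Dict, Optional


def success_response(data: Any = None, message: str = "ok", code: int = 0,
                     success: bool = True) -> Dict[str, Any]:
    # Same shape as the helper in backend/app/core/response.py
    return {"success": success, "message": message, "data": data, "code": code}


def error_response(message: str, code: int, data: Optional[Any] = None) -> Dict[str, Any]:
    payload: Dict[str, Any] = {"success": False, "message": message, "code": code}
    if data is not None:
        # "data" is left out entirely when there is nothing to attach
        payload["data"] = data
    return payload


ok = success_response({"styles": []})   # illustrative payload
err = error_response("not found", 404)  # illustrative error
```

Note the asymmetry: a success envelope always carries a `data` key (possibly `None`), while an error envelope only carries it when extra detail is supplied, so clients should not assume `data` exists on failures.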
@@ -110,3 +110,28 @@ def set_auth_cookie(response: Response, token: str) -> None:
def clear_auth_cookie(response: Response) -> None:
    """Clear the auth cookie"""
    response.delete_cookie(key="access_token")


def create_payment_token(user_id: str) -> str:
    """Create a short-lived, payment-only JWT (valid for 30 minutes)"""
    payload = {
        "sub": user_id,
        "purpose": "payment",
        "exp": datetime.now(timezone.utc) + timedelta(minutes=30),
    }
    return jwt.encode(payload, settings.JWT_SECRET_KEY, algorithm=settings.JWT_ALGORITHM)


def decode_payment_token(token: str) -> str | None:
    """Decode a payment_token and return the user_id (only valid when purpose=payment)"""
    try:
        data = jwt.decode(
            token,
            settings.JWT_SECRET_KEY,
            algorithms=[settings.JWT_ALGORITHM],
        )
        if data.get("purpose") != "payment":
            return None
        return data.get("sub")
    except JWTError:
        return None

||||
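The `purpose` claim is what keeps a payment token from being usable where a different kind of token is expected, and vice versa. The sketch below isolates that claim logic on already-decoded claims using only the standard library. `build_payment_claims` and `user_id_from_claims` are hypothetical names, and the explicit expiry comparison stands in for the check that the JWT library performs during `jwt.decode` in the real code.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def build_payment_claims(user_id: str, now: datetime) -> Dict[str, Any]:
    # Mirrors the payload in create_payment_token: subject, purpose, 30-minute expiry.
    return {"sub": user_id, "purpose": "payment", "exp": now + timedelta(minutes=30)}


def user_id_from_claims(claims: Dict[str, Any], now: datetime) -> Optional[str]:
    # Hypothetical helper: signature verification is out of scope for this sketch.
    exp = claims.get("exp")
    if exp is None or exp <= now:
        return None  # expired or malformed
    if claims.get("purpose") != "payment":
        return None  # a token issued for some other purpose is rejected
    return claims.get("sub")


issued_at = datetime(2025, 1, 1, tzinfo=timezone.utc)
claims = build_payment_claims("user-1", issued_at)
fresh = user_id_from_claims(claims, issued_at + timedelta(minutes=5))
stale = user_id_from_claims(claims, issued_at + timedelta(minutes=31))
wrong_purpose = user_id_from_claims(
    {"sub": "user-1", "exp": issued_at + timedelta(hours=1)}, issued_at
)
```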
@@ -1,8 +1,22 @@
|
||||
from fastapi import FastAPI
|
||||
from fastapi import FastAPI, HTTPException
|
||||
from fastapi.staticfiles import StaticFiles
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from fastapi.responses import JSONResponse
|
||||
from app.core import config
|
||||
from app.api import materials, videos, publish, login_helper, auth, admin, ref_audios, ai, tools, assets
|
||||
from app.core.response import error_response
|
||||
# Import routers directly from modules, removing the api forwarding layer
|
||||
from app.modules.materials.router import router as materials_router
|
||||
from app.modules.videos.router import router as videos_router
|
||||
from app.modules.publish.router import router as publish_router
|
||||
from app.modules.login_helper.router import router as login_helper_router
|
||||
from app.modules.auth.router import router as auth_router
|
||||
from app.modules.admin.router import router as admin_router
|
||||
from app.modules.ref_audios.router import router as ref_audios_router
|
||||
from app.modules.ai.router import router as ai_router
|
||||
from app.modules.tools.router import router as tools_router
|
||||
from app.modules.assets.router import router as assets_router
|
||||
from app.modules.generated_audios.router import router as generated_audios_router
|
||||
from app.modules.payment.router import router as payment_router
|
||||
from loguru import logger
|
||||
import os
|
||||
|
||||
@@ -11,15 +25,33 @@ settings = config.settings
|
||||
app = FastAPI(title="ViGent TalkingHead Agent")
|
||||
|
||||
from fastapi import Request
|
||||
from fastapi.exceptions import RequestValidationError
|
||||
from starlette.middleware.base import BaseHTTPMiddleware
|
||||
import time
|
||||
import traceback
|
||||
|
||||
class LoggingMiddleware(BaseHTTPMiddleware):
|
||||
# Names of sensitive headers (lowercase)
|
||||
SENSITIVE_HEADERS = {'authorization', 'cookie', 'set-cookie', 'x-api-key', 'api-key'}
|
||||
|
||||
def _sanitize_headers(self, headers: dict) -> dict:
|
||||
"""Mask request headers to hide sensitive values"""
|
||||
sanitized = {}
|
||||
for key, value in headers.items():
|
||||
if key.lower() in self.SENSITIVE_HEADERS:
|
||||
# Show the first 8 characters plus a mask
|
||||
if len(value) > 8:
|
||||
sanitized[key] = value[:8] + "..." + f"[{len(value)} chars]"
|
||||
else:
|
||||
sanitized[key] = "[REDACTED]"
|
||||
else:
|
||||
sanitized[key] = value
|
||||
return sanitized
|
||||
|
||||
async def dispatch(self, request: Request, call_next):
|
||||
start_time = time.time()
|
||||
logger.info(f"START Request: {request.method} {request.url}")
|
||||
logger.info(f"HEADERS: {dict(request.headers)}")
|
||||
logger.debug(f"HEADERS: {self._sanitize_headers(dict(request.headers))}")
|
||||
try:
|
||||
response = await call_next(request)
|
||||
process_time = time.time() - start_time
|
||||
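To make the masking rule in `_sanitize_headers` concrete, here is a standalone sketch of the same logic as a plain function (the class wrapper is dropped; the sample header values are made up for illustration).

```python
from typing import Dict

SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key", "api-key"}


def sanitize_headers(headers: Dict[str, str]) -> Dict[str, str]:
    sanitized: Dict[str, str] = {}
    for key, value in headers.items():
        if key.lower() in SENSITIVE_HEADERS:
            if len(value) > 8:
                # Keep the first 8 characters, then report only the total length
                sanitized[key] = value[:8] + "..." + f"[{len(value)} chars]"
            else:
                # Short values are hidden completely
                sanitized[key] = "[REDACTED]"
        else:
            sanitized[key] = value
    return sanitized


masked = sanitize_headers({
    "Authorization": "Bearer abcdef123456",  # made-up token
    "Cookie": "a=1",
    "Accept": "*/*",
})
```

The first 8 characters of a long sensitive value are still written to the log, so this reduces exposure rather than eliminating it; that is one reason the diff also moves the header log line from `info` to `debug`.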
@@ -32,35 +64,84 @@ class LoggingMiddleware(BaseHTTPMiddleware):
|
||||
|
||||
app.add_middleware(LoggingMiddleware)
|
||||
|
||||
|
||||
@app.exception_handler(RequestValidationError)
|
||||
async def validation_exception_handler(request: Request, exc: RequestValidationError):
|
||||
return JSONResponse(
|
||||
status_code=422,
|
||||
content=error_response("参数校验失败", 422, data=exc.errors()),
|
||||
)
|
||||
|
||||
|
||||
@app.exception_handler(HTTPException)
|
||||
async def http_exception_handler(request: Request, exc: HTTPException):
|
||||
detail = exc.detail
|
||||
message = detail if isinstance(detail, str) else "请求失败"
|
||||
data = detail if not isinstance(detail, str) else None
|
||||
return JSONResponse(
|
||||
status_code=exc.status_code,
|
||||
content=error_response(message, exc.status_code, data=data),
|
||||
headers=exc.headers,
|
||||
)
|
||||
|
||||
|
||||
@app.exception_handler(Exception)
|
||||
async def unhandled_exception_handler(request: Request, exc: Exception):
|
||||
return JSONResponse(
|
||||
status_code=500,
|
||||
content=error_response("服务器内部错误", 500),
|
||||
)
|
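The `HTTPException` handler above splits `exc.detail` two ways: a string becomes the envelope's `message`, anything else is moved under `data` with a generic message. A small sketch of that mapping, independent of FastAPI; `envelope_from_detail` is a hypothetical name and the sample details are illustrative.

```python
from typing import Any, Dict


def envelope_from_detail(detail: Any, status_code: int) -> Dict[str, Any]:
    # String details are shown as the message; structured details go under "data".
    message = detail if isinstance(detail, str) else "请求失败"  # generic "request failed" text from the diff
    data = detail if not isinstance(detail, str) else None
    payload: Dict[str, Any] = {"success": False, "message": message, "code": status_code}
    if data is not None:
        payload["data"] = data
    return payload


plain = envelope_from_detail("用户不存在", 401)
structured = envelope_from_detail({"field": "phone"}, 400)
```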
||||
|
||||
# CORS settings: read the allowed origins from the environment
|
||||
# The * wildcard cannot be used together with credentials
|
||||
cors_origins = settings.CORS_ORIGINS.split(",") if settings.CORS_ORIGINS != "*" else ["*"]
|
||||
allow_credentials = settings.CORS_ORIGINS != "*"  # allow_credentials must be off when * is used
|
||||
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=["*"],
|
||||
allow_credentials=True,
|
||||
allow_origins=cors_origins,
|
||||
allow_credentials=allow_credentials,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
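The CORS change derives both middleware arguments from the single `CORS_ORIGINS` string. The sketch below reproduces that derivation as a pure function so the edge cases are easy to see; `cors_settings` is a hypothetical name and the example origins are placeholders.

```python
from typing import List, Tuple


def cors_settings(cors_origins: str) -> Tuple[List[str], bool]:
    # Same rule as the diff: a bare "*" allows every origin but disables credentials.
    origins = cors_origins.split(",") if cors_origins != "*" else ["*"]
    allow_credentials = cors_origins != "*"
    return origins, allow_credentials


wildcard = cors_settings("*")
listed = cors_settings("https://a.example,https://b.example")
spaced = cors_settings("https://a.example, https://b.example")
```

Because the value is split on commas without trimming, a space after a comma ends up inside the origin string (see `spaced`), and such an entry would not match a browser's `Origin` header. Writing the variable without spaces avoids this.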
||||
|
||||
# Create dirs
|
||||
settings.UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
|
||||
settings.OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
(settings.UPLOAD_DIR / "materials").mkdir(exist_ok=True)
|
||||
settings.ASSETS_DIR.mkdir(parents=True, exist_ok=True)
|
||||
settings.UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
|
||||
settings.OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
||||
(settings.UPLOAD_DIR / "materials").mkdir(exist_ok=True)
|
||||
settings.ASSETS_DIR.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
app.mount("/outputs", StaticFiles(directory=str(settings.OUTPUT_DIR)), name="outputs")
|
||||
app.mount("/uploads", StaticFiles(directory=str(settings.UPLOAD_DIR)), name="uploads")
|
||||
app.mount("/assets", StaticFiles(directory=str(settings.ASSETS_DIR)), name="assets")
|
||||
app.mount("/outputs", StaticFiles(directory=str(settings.OUTPUT_DIR)), name="outputs")
|
||||
app.mount("/uploads", StaticFiles(directory=str(settings.UPLOAD_DIR)), name="uploads")
|
||||
app.mount("/assets", StaticFiles(directory=str(settings.ASSETS_DIR)), name="assets")
|
||||
|
||||
# Register routers
|
||||
app.include_router(materials.router, prefix="/api/materials", tags=["Materials"])
|
||||
app.include_router(videos.router, prefix="/api/videos", tags=["Videos"])
|
||||
app.include_router(publish.router, prefix="/api/publish", tags=["Publish"])
|
||||
app.include_router(login_helper.router, prefix="/api", tags=["LoginHelper"])
|
||||
app.include_router(auth.router) # /api/auth
|
||||
app.include_router(admin.router) # /api/admin
|
||||
app.include_router(ref_audios.router, prefix="/api/ref-audios", tags=["RefAudios"])
|
||||
app.include_router(ai.router) # /api/ai
|
||||
app.include_router(tools.router, prefix="/api/tools", tags=["Tools"])
|
||||
app.include_router(assets.router, prefix="/api/assets", tags=["Assets"])
|
||||
app.include_router(materials_router, prefix="/api/materials", tags=["Materials"])
|
||||
app.include_router(videos_router, prefix="/api/videos", tags=["Videos"])
|
||||
app.include_router(publish_router, prefix="/api/publish", tags=["Publish"])
|
||||
app.include_router(login_helper_router, prefix="/api", tags=["LoginHelper"])
|
||||
app.include_router(auth_router) # /api/auth
|
||||
app.include_router(admin_router) # /api/admin
|
||||
app.include_router(ref_audios_router, prefix="/api/ref-audios", tags=["RefAudios"])
|
||||
app.include_router(ai_router) # /api/ai
|
||||
app.include_router(tools_router, prefix="/api/tools", tags=["Tools"])
|
||||
app.include_router(assets_router, prefix="/api/assets", tags=["Assets"])
|
||||
app.include_router(generated_audios_router, prefix="/api/generated-audios", tags=["GeneratedAudios"])
|
||||
app.include_router(payment_router) # /api/payment
|
||||
|
||||
|
||||
@app.on_event("startup")
|
||||
async def check_jwt_secret():
|
||||
if settings.JWT_SECRET_KEY == "your-secret-key-change-in-production":
|
||||
if not settings.DEBUG:
|
||||
raise RuntimeError(
|
||||
"JWT_SECRET_KEY is still the default value! "
|
||||
"Set a strong random secret in .env before running in production (DEBUG=False)."
|
||||
)
|
||||
logger.critical(
|
||||
"JWT_SECRET_KEY is still the default value! "
|
||||
"Set a strong random secret in .env for production."
|
||||
)
|
||||
|
||||
|
||||
@app.on_event("startup")
|
||||
@@ -76,27 +157,21 @@ async def init_admin():
|
||||
return
|
||||
|
||||
try:
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.security import get_password_hash
|
||||
|
||||
supabase = get_supabase()
|
||||
|
||||
# Check whether the account already exists
|
||||
existing = supabase.table("users").select("id").eq("phone", admin_phone).execute()
|
||||
|
||||
if existing.data:
|
||||
from app.repositories.users import create_user, user_exists_by_phone
|
||||
|
||||
if user_exists_by_phone(admin_phone):
|
||||
logger.info(f"管理员账号已存在: {admin_phone}")
|
||||
return
|
||||
|
||||
# Create the admin account
|
||||
supabase.table("users").insert({
|
||||
|
||||
create_user({
|
||||
"phone": admin_phone,
|
||||
"password_hash": get_password_hash(admin_password),
|
||||
"username": "Admin",
|
||||
"role": "admin",
|
||||
"is_active": True,
|
||||
"expires_at": None  # Never expires
|
||||
}).execute()
|
||||
})
|
||||
|
||||
logger.success(f"管理员账号已创建: {admin_phone}")
|
||||
except Exception as e:
|
||||
|
||||
0
backend/app/modules/admin/__init__.py
Normal file
@@ -3,10 +3,12 @@
|
||||
"""
|
||||
from fastapi import APIRouter, HTTPException, Depends, status
|
||||
from pydantic import BaseModel
|
||||
from typing import Optional, List
|
||||
from typing import Optional, List, Any, cast
|
||||
from datetime import datetime, timezone, timedelta
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.deps import get_current_admin
|
||||
from app.core.deps import get_current_admin
|
||||
from app.core.response import success_response
|
||||
from app.repositories.sessions import delete_sessions
|
||||
from app.repositories.users import get_user_by_id, list_users as list_users_repo, update_user
|
||||
from loguru import logger
|
||||
|
||||
router = APIRouter(prefix="/api/admin", tags=["管理"])
|
||||
@@ -26,25 +28,23 @@ class ActivateRequest(BaseModel):
|
||||
expires_days: Optional[int] = None  # License length in days; None means permanent
|
||||
|
||||
|
||||
@router.get("/users", response_model=List[UserListItem])
|
||||
async def list_users(admin: dict = Depends(get_current_admin)):
|
||||
@router.get("/users")
|
||||
async def list_users(admin: dict = Depends(get_current_admin)):
|
||||
"""List all users"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").select("*").order("created_at", desc=True).execute()
|
||||
|
||||
return [
|
||||
UserListItem(
|
||||
id=u["id"],
|
||||
phone=u["phone"],
|
||||
username=u.get("username"),
|
||||
role=u["role"],
|
||||
is_active=u["is_active"],
|
||||
expires_at=u.get("expires_at"),
|
||||
created_at=u["created_at"]
|
||||
)
|
||||
for u in result.data
|
||||
]
|
||||
data = list_users_repo()
|
||||
return success_response([
|
||||
UserListItem(
|
||||
id=u["id"],
|
||||
phone=u["phone"],
|
||||
username=u.get("username"),
|
||||
role=u["role"],
|
||||
is_active=u["is_active"],
|
||||
expires_at=u.get("expires_at"),
|
||||
created_at=u["created_at"]
|
||||
).model_dump()
|
||||
for u in data
|
||||
])
|
||||
except Exception as e:
|
||||
logger.error(f"获取用户列表失败: {e}")
|
||||
raise HTTPException(
|
||||
@@ -67,32 +67,26 @@ async def activate_user(
|
||||
request.expires_days: license length in days (None means permanent)
|
||||
"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# Compute the expiry time
|
||||
expires_at = None
|
||||
if request.expires_days:
|
||||
expires_at = (datetime.now(timezone.utc) + timedelta(days=request.expires_days)).isoformat()
|
||||
|
||||
# Update the user
|
||||
result = supabase.table("users").update({
|
||||
"is_active": True,
|
||||
"role": "user",
|
||||
"expires_at": expires_at
|
||||
}).eq("id", user_id).execute()
|
||||
|
||||
if not result.data:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_404_NOT_FOUND,
|
||||
detail="用户不存在"
|
||||
)
|
||||
# Compute the expiry time
|
||||
expires_at = None
|
||||
if request.expires_days:
|
||||
expires_at = (datetime.now(timezone.utc) + timedelta(days=request.expires_days)).isoformat()
|
||||
|
||||
result = update_user(user_id, {
|
||||
"is_active": True,
|
||||
"role": "user",
|
||||
"expires_at": expires_at
|
||||
})
|
||||
|
||||
if not result:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_404_NOT_FOUND,
|
||||
detail="用户不存在"
|
||||
)
|
||||
|
||||
logger.info(f"管理员 {admin['phone']} 激活用户 {user_id}, 有效期: {request.expires_days or '永久'} 天")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": f"用户已激活,有效期: {request.expires_days or '永久'} 天"
|
||||
}
|
||||
return success_response(message=f"用户已激活,有效期: {request.expires_days or '永久'} 天")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
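The expiry rule in `activate_user` is small enough to state as a pure function: a falsy `expires_days` means a permanent grant, otherwise the expiry is "now plus N days" as an ISO 8601 string in UTC. A sketch under that reading; `compute_expires_at` is a hypothetical name, and the clock is passed in so the result is deterministic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def compute_expires_at(expires_days: Optional[int], now: datetime) -> Optional[str]:
    # None (or 0) means a permanent grant, matching `if request.expires_days:` in the handler.
    if not expires_days:
        return None
    return (now + timedelta(days=expires_days)).isoformat()


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
one_year = compute_expires_at(365, now)
permanent = compute_expires_at(None, now)
```

One consequence of the truthiness check is that `expires_days=0` is treated the same as `None`, that is, as permanent rather than as "expires immediately".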
@@ -110,27 +104,20 @@ async def deactivate_user(
|
||||
):
|
||||
"""Deactivate a user"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# Admin accounts cannot be deactivated
|
||||
user_result = supabase.table("users").select("role").eq("id", user_id).single().execute()
|
||||
if user_result.data and user_result.data["role"] == "admin":
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="不能停用管理员账号"
|
||||
)
|
||||
|
||||
# Update the user
|
||||
result = supabase.table("users").update({
|
||||
"is_active": False
|
||||
}).eq("id", user_id).execute()
|
||||
|
||||
# Clear the user's sessions
|
||||
supabase.table("user_sessions").delete().eq("user_id", user_id).execute()
|
||||
# Admin accounts cannot be deactivated
|
||||
user = cast(dict[str, Any], get_user_by_id(user_id) or {})
|
||||
if user.get("role") == "admin":
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="不能停用管理员账号"
|
||||
)
|
||||
|
||||
update_user(user_id, {"is_active": False})
|
||||
delete_sessions(user_id)
|
||||
|
||||
logger.info(f"管理员 {admin['phone']} 停用用户 {user_id}")
|
||||
|
||||
return {"success": True, "message": "用户已停用"}
|
||||
return success_response(message="用户已停用")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
@@ -149,15 +136,12 @@ async def extend_user(
|
||||
):
|
||||
"""Extend a user's license period"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
if not request.expires_days:
|
||||
# Make the license permanent
|
||||
expires_at = None
|
||||
else:
|
||||
# Read the current expiry time
|
||||
user_result = supabase.table("users").select("expires_at").eq("id", user_id).single().execute()
|
||||
user = user_result.data
|
||||
if not request.expires_days:
|
||||
# Make the license permanent
|
||||
expires_at = None
|
||||
else:
|
||||
# Read the current expiry time
|
||||
user = cast(dict[str, Any], get_user_by_id(user_id) or {})
|
||||
|
||||
if user and user.get("expires_at"):
|
||||
current_expires = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
|
||||
@@ -167,16 +151,11 @@ async def extend_user(
|
||||
|
||||
expires_at = (base_time + timedelta(days=request.expires_days)).isoformat()
|
||||
|
||||
result = supabase.table("users").update({
|
||||
"expires_at": expires_at
|
||||
}).eq("id", user_id).execute()
|
||||
update_user(user_id, {"expires_at": expires_at})
|
||||
|
||||
logger.info(f"管理员 {admin['phone']} 延长用户 {user_id} 授权 {request.expires_days or '永久'} 天")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": f"授权已延长 {request.expires_days or '永久'} 天"
|
||||
}
|
||||
return success_response(message=f"授权已延长 {request.expires_days or '永久'} 天")
|
||||
except Exception as e:
|
||||
logger.error(f"延长授权失败: {e}")
|
||||
raise HTTPException(
|
||||
0
backend/app/modules/ai/__init__.py
Normal file
99
backend/app/modules/ai/router.py
Normal file
@@ -0,0 +1,99 @@
"""
AI-related API routes
"""

from typing import Optional

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from loguru import logger

from app.services.glm_service import glm_service
from app.core.deps import get_current_user
from app.core.response import success_response


router = APIRouter(prefix="/api/ai", tags=["AI"])


class GenerateMetaRequest(BaseModel):
    """Request for title and tag generation"""
    text: str


class GenerateMetaResponse(BaseModel):
    """Response for title and tag generation"""
    title: str
    secondary_title: str = ""
    tags: list[str]


class RewriteRequest(BaseModel):
    """Rewrite request"""
    text: str
    custom_prompt: Optional[str] = None


class TranslateRequest(BaseModel):
    """Translation request"""
    text: str
    target_lang: str


@router.post("/translate")
async def translate_text(req: TranslateRequest, current_user: dict = Depends(get_current_user)):
    """
    AI script translation

    Translate the script into the specified target language
    """
    if not req.text or not req.text.strip():
        raise HTTPException(status_code=400, detail="文案不能为空")
    if not req.target_lang or not req.target_lang.strip():
        raise HTTPException(status_code=400, detail="目标语言不能为空")

    try:
        logger.info(f"Translating text to {req.target_lang}: {req.text[:50]}...")
        translated = await glm_service.translate_text(req.text.strip(), req.target_lang.strip())
        return success_response({"translated_text": translated})
    except Exception as e:
        logger.error(f"Translate failed: {e}")
        raise HTTPException(status_code=500, detail="翻译服务暂时不可用,请稍后重试")


@router.post("/generate-meta")
async def generate_meta(req: GenerateMetaRequest, current_user: dict = Depends(get_current_user)):
    """
    AI generation of video titles and tags

    Generate an engaging title and relevant tags from the voice-over script
    """
    if not req.text or not req.text.strip():
        raise HTTPException(status_code=400, detail="口播文案不能为空")

    try:
        logger.info(f"Generating meta for text: {req.text[:50]}...")
        result = await glm_service.generate_title_tags(req.text)
        return success_response(GenerateMetaResponse(
            title=result.get("title", ""),
            secondary_title=result.get("secondary_title", ""),
            tags=result.get("tags", [])
        ).model_dump())
    except Exception as e:
        logger.error(f"Generate meta failed: {e}")
        raise HTTPException(status_code=500, detail="生成标题标签失败,请稍后重试")


@router.post("/rewrite")
async def rewrite_script(req: RewriteRequest, current_user: dict = Depends(get_current_user)):
    """AI script rewriting"""
    if not req.text or not req.text.strip():
        raise HTTPException(status_code=400, detail="文案不能为空")

    try:
        logger.info(f"Rewriting text: {req.text[:50]}...")
        rewritten = await glm_service.rewrite_script(req.text.strip(), req.custom_prompt)
        return success_response({"rewritten_text": rewritten})
    except Exception as e:
        logger.error(f"Rewrite failed: {e}")
        raise HTTPException(status_code=500, detail="改写服务暂时不可用,请稍后重试")
0
backend/app/modules/assets/__init__.py
Normal file
@@ -2,6 +2,7 @@ from fastapi import APIRouter, Depends
|
||||
|
||||
from app.core.deps import get_current_user
|
||||
from app.services.assets_service import list_styles, list_bgm
|
||||
from app.core.response import success_response
|
||||
|
||||
|
||||
router = APIRouter()
|
||||
@@ -9,14 +10,14 @@ router = APIRouter()
|
||||
|
||||
@router.get("/subtitle-styles")
|
||||
async def list_subtitle_styles(current_user: dict = Depends(get_current_user)):
|
||||
return {"styles": list_styles("subtitle")}
|
||||
return success_response({"styles": list_styles("subtitle")})
|
||||
|
||||
|
||||
@router.get("/title-styles")
|
||||
async def list_title_styles(current_user: dict = Depends(get_current_user)):
|
||||
return {"styles": list_styles("title")}
|
||||
return success_response({"styles": list_styles("title")})
|
||||
|
||||
|
||||
@router.get("/bgm")
|
||||
async def list_bgm_items(current_user: dict = Depends(get_current_user)):
|
||||
return {"bgm": list_bgm()}
|
||||
return success_response({"bgm": list_bgm()})
|
||||
0
backend/app/modules/auth/__init__.py
Normal file
@@ -1,9 +1,9 @@
|
||||
"""
|
||||
Auth API: register, log in, log out, change password
|
||||
"""
|
||||
from fastapi import APIRouter, HTTPException, Response, status, Request
|
||||
from fastapi import APIRouter, HTTPException, Response, status, Request, Depends
|
||||
from fastapi.responses import JSONResponse
|
||||
from pydantic import BaseModel, field_validator
|
||||
from app.core.supabase import get_supabase
|
||||
from app.core.security import (
|
||||
get_password_hash,
|
||||
verify_password,
|
||||
@@ -11,10 +11,22 @@ from app.core.security import (
|
||||
generate_session_token,
|
||||
set_auth_cookie,
|
||||
clear_auth_cookie,
|
||||
decode_access_token
|
||||
decode_access_token,
|
||||
create_payment_token,
|
||||
)
|
||||
from app.repositories.sessions import create_session, delete_sessions
|
||||
from app.repositories.users import (
|
||||
create_user,
|
||||
get_user_by_id,
|
||||
get_user_by_phone,
|
||||
user_exists_by_phone,
|
||||
update_user,
|
||||
deactivate_user_if_expired,
|
||||
)
|
||||
from app.core.deps import get_current_user
|
||||
from app.core.response import success_response
|
||||
from loguru import logger
|
||||
from typing import Optional
|
||||
from typing import Optional, Any, cast
|
||||
import re
|
||||
|
||||
router = APIRouter(prefix="/api/auth", tags=["认证"])
|
||||
@@ -74,14 +86,7 @@ async def register(request: RegisterRequest):
|
||||
After registration the status is pending and an administrator must activate the account
|
||||
"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# Check whether the phone number is already registered
|
||||
existing = supabase.table("users").select("id").eq(
|
||||
"phone", request.phone
|
||||
).execute()
|
||||
|
||||
if existing.data:
|
||||
if user_exists_by_phone(request.phone):
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_400_BAD_REQUEST,
|
||||
detail="该手机号已注册"
|
||||
@@ -90,20 +95,17 @@ async def register(request: RegisterRequest):
|
||||
# Create the user
|
||||
password_hash = get_password_hash(request.password)
|
||||
|
||||
result = supabase.table("users").insert({
|
||||
create_user({
|
||||
"phone": request.phone,
|
||||
"password_hash": password_hash,
|
||||
"username": request.username or f"用户{request.phone[-4:]}",
|
||||
"role": "pending",
|
||||
"is_active": False
|
||||
}).execute()
|
||||
})
|
||||
|
||||
logger.info(f"新用户注册: {request.phone}")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": "注册成功,请等待管理员审核激活"
|
||||
}
|
||||
return success_response(message="注册成功,请等待管理员审核激活")
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
@@ -124,14 +126,7 @@ async def login(request: LoginRequest, response: Response):
|
||||
- Enforces single-device login: a new login evicts the previous session
|
||||
"""
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# Look up the user
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"phone", request.phone
|
||||
).single().execute()
|
||||
|
||||
user = user_result.data
|
||||
user = cast(dict[str, Any], get_user_by_phone(request.phone) or {})
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
@@ -145,36 +140,33 @@ async def login(request: LoginRequest, response: Response):
|
||||
detail="手机号或密码错误"
|
||||
)
|
||||
|
||||
# Check whether the account is active
|
||||
if not user["is_active"]:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="账号未激活,请等待管理员审核"
|
||||
# Auto-deactivate on expiry (note: only the DB is updated; the in-memory user dict is not modified)
|
||||
expired = deactivate_user_if_expired(user)
|
||||
if expired:
|
||||
delete_sessions(user["id"])
|
||||
|
||||
# Expired or not yet activated (newly registered) → return the payment prompt
|
||||
if expired or not user["is_active"]:
|
||||
payment_token = create_payment_token(user["id"])
|
||||
return JSONResponse(
|
||||
status_code=403,
|
||||
content={
|
||||
"success": False,
|
||||
"message": "请付费开通会员",
|
||||
"code": 403,
|
||||
"data": {
|
||||
"reason": "PAYMENT_REQUIRED",
|
||||
"payment_token": payment_token,
|
||||
}
|
||||
}
|
||||
)
|
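The condition `expired or not user["is_active"]` checks both flags because, per the comment in the diff, the expiry helper updates only the database and leaves the in-memory `user` dict untouched. The sketch below illustrates why both checks are needed. `is_expired` is an assumed stand-in for `deactivate_user_if_expired` without the database write (its real implementation is not shown in this part of the diff), and `login_outcome` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def is_expired(user: Dict[str, Any], now: datetime) -> bool:
    # Assumed stand-in for deactivate_user_if_expired, minus the database update.
    expires_at = user.get("expires_at")
    if not expires_at:
        return False  # no expiry recorded means a permanent account
    return now > datetime.fromisoformat(expires_at.replace("Z", "+00:00"))


def login_outcome(user: Dict[str, Any], now: datetime) -> str:
    # Expired or never-activated accounts are sent to the payment flow.
    if is_expired(user, now) or not user["is_active"]:
        return "PAYMENT_REQUIRED"
    return "OK"


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
lapsed = {"is_active": True, "expires_at": "2025-01-01T00:00:00Z"}   # stale in-memory flag
new_user = {"is_active": False, "expires_at": None}                  # just registered
member = {"is_active": True, "expires_at": "2026-01-01T00:00:00Z"}
outcomes = [login_outcome(u, now) for u in (lapsed, new_user, member)]
```

The `lapsed` case is the one the comment warns about: the dict still says `is_active: True`, so relying on that flag alone would let an expired member through.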
||||
|
||||
# Check whether the license has expired
|
||||
if user.get("expires_at"):
|
||||
from datetime import datetime, timezone
|
||||
expires_at = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
|
||||
if datetime.now(timezone.utc) > expires_at:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_403_FORBIDDEN,
|
||||
detail="授权已过期,请联系管理员续期"
|
||||
)
|
||||
|
||||
# Generate a new session_token (the new login evicts the old one)
|
||||
session_token = generate_session_token()
|
||||
|
||||
# Delete the old session and insert the new one
|
||||
supabase.table("user_sessions").delete().eq(
|
||||
"user_id", user["id"]
|
||||
).execute()
|
||||
|
||||
supabase.table("user_sessions").insert({
|
||||
"user_id": user["id"],
|
||||
"session_token": session_token,
|
||||
"device_info": None  # Could be taken from the request headers
|
||||
}).execute()
|
||||
delete_sessions(user["id"])
|
||||
create_session(user["id"], session_token, None)
|
||||
|
||||
# Generate the JWT token
|
||||
token = create_access_token(user["id"], session_token)
|
||||
@@ -184,18 +176,19 @@ async def login(request: LoginRequest, response: Response):
|
||||
|
||||
logger.info(f"用户登录: {request.phone}")
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": "登录成功",
|
||||
"user": UserResponse(
|
||||
id=user["id"],
|
||||
phone=user["phone"],
|
||||
username=user.get("username"),
|
||||
role=user["role"],
|
||||
is_active=user["is_active"],
|
||||
expires_at=user.get("expires_at")
|
||||
)
|
||||
}
|
||||
return success_response(
|
||||
data={
|
||||
"user": UserResponse(
|
||||
id=user["id"],
|
||||
phone=user["phone"],
|
||||
username=user.get("username"),
|
||||
role=user["role"],
|
||||
is_active=user["is_active"],
|
||||
expires_at=user.get("expires_at")
|
||||
).model_dump()
|
||||
},
|
||||
message="登录成功",
|
||||
)
|
||||
except HTTPException:
|
||||
raise
|
||||
except Exception as e:
|
||||
@@ -210,7 +203,7 @@ async def login(request: LoginRequest, response: Response):
|
||||
async def logout(response: Response):
|
||||
"""Log the user out"""
|
||||
clear_auth_cookie(response)
|
||||
return {"success": True, "message": "已登出"}
|
||||
return success_response(message="已登出")
|
||||
|
||||
|
||||
@router.post("/change-password")
|
||||
@@ -238,14 +231,7 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
|
||||
)
|
||||
|
||||
try:
|
||||
supabase = get_supabase()
|
||||
|
||||
# Fetch the user record
|
||||
user_result = supabase.table("users").select("*").eq(
|
||||
"id", token_data.user_id
|
||||
).single().execute()
|
||||
|
||||
user = user_result.data
|
||||
user = cast(dict[str, Any], get_user_by_id(token_data.user_id) or {})
|
||||
if not user:
|
||||
raise HTTPException(
|
||||
status_code=status.HTTP_401_UNAUTHORIZED,
|
||||
@@ -261,22 +247,13 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
|
||||
|
||||
# Update the password
|
||||
new_password_hash = get_password_hash(request.new_password)
|
||||
supabase.table("users").update({
|
||||
"password_hash": new_password_hash
|
||||
}).eq("id", user["id"]).execute()
|
||||
update_user(user["id"], {"password_hash": new_password_hash})
|
||||
|
||||
# 生成新的 session token,使旧 token 失效
|
||||
new_session_token = generate_session_token()
|
||||
|
||||
supabase.table("user_sessions").delete().eq(
|
||||
"user_id", user["id"]
|
||||
).execute()
|
||||
|
||||
supabase.table("user_sessions").insert({
|
||||
"user_id": user["id"],
|
||||
"session_token": new_session_token,
|
||||
"device_info": None
|
||||
}).execute()
|
||||
delete_sessions(user["id"])
|
||||
create_session(user["id"], new_session_token, None)
|
||||
|
||||
# 生成新的 JWT Token
|
||||
new_token = create_access_token(user["id"], new_session_token)
|
||||
@@ -284,10 +261,7 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
 
         logger.info(f"用户修改密码: {user['phone']}")
 
-        return {
-            "success": True,
-            "message": "密码修改成功"
-        }
+        return success_response(message="密码修改成功")
     except HTTPException:
         raise
     except Exception as e:
@@ -299,40 +273,13 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
 
 
 @router.get("/me")
-async def get_me(request: Request):
+async def get_me(user: dict = Depends(get_current_user)):
     """Return the current user's profile."""
-    # Read the user from the cookie
-    token = request.cookies.get("access_token")
-    if not token:
-        raise HTTPException(
-            status_code=status.HTTP_401_UNAUTHORIZED,
-            detail="未登录"
-        )
-
-    token_data = decode_access_token(token)
-    if not token_data:
-        raise HTTPException(
-            status_code=status.HTTP_401_UNAUTHORIZED,
-            detail="Token 无效"
-        )
-
-    supabase = get_supabase()
-    user_result = supabase.table("users").select("*").eq(
-        "id", token_data.user_id
-    ).single().execute()
-
-    user = user_result.data
-    if not user:
-        raise HTTPException(
-            status_code=status.HTTP_401_UNAUTHORIZED,
-            detail="用户不存在"
-        )
-
-    return UserResponse(
+    return success_response(UserResponse(
         id=user["id"],
         phone=user["phone"],
         username=user.get("username"),
         role=user["role"],
         is_active=user["is_active"],
         expires_at=user.get("expires_at")
-    )
+    ).model_dump())
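The auth hunks above swap hand-built `{"success": True, ...}` dictionaries for calls to `success_response(...)`. The helper itself lives in `app.core.response` and is not shown in this diff, so the following is only a minimal sketch of what such an envelope helper might look like. The field names `success`, `message`, and `data` and the default message are assumptions inferred from the removed dictionaries, not the project's actual implementation.

```python
from typing import Any, Optional


def success_response(data: Optional[Any] = None, message: str = "ok") -> dict:
    """Hypothetical envelope helper: wrap a payload in one uniform success shape.

    The keys used here are assumed, not taken from app.core.response.
    """
    return {"success": True, "message": message, "data": data}
```

With a helper of this shape, every endpoint returns the same top-level keys, so a client can read `data` without special-casing individual routes.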
0
backend/app/modules/generated_audios/__init__.py
Normal file
77
backend/app/modules/generated_audios/router.py
Normal file
@@ -0,0 +1,77 @@
"""Generated voice-over API"""
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
import uuid
from loguru import logger

from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.videos.task_store import create_task, get_task
from app.modules.generated_audios.schemas import GenerateAudioRequest, RenameAudioRequest
from app.modules.generated_audios import service

router = APIRouter()


@router.post("/generate")
async def generate_audio(
    req: GenerateAudioRequest,
    background_tasks: BackgroundTasks,
    user: dict = Depends(get_current_user),
):
    """Generate a voice-over asynchronously (returns a task_id)."""
    task_id = str(uuid.uuid4())
    create_task(task_id, user["id"])
    background_tasks.add_task(service.generate_audio_task, task_id, req, user["id"])
    return success_response({"task_id": task_id})


@router.get("/tasks/{task_id}")
async def get_audio_task(task_id: str, user: dict = Depends(get_current_user)):
    """Poll voice-over generation progress."""
    task = get_task(task_id)
    if task.get("status") != "not_found" and task.get("user_id") != user["id"]:
        return success_response({"status": "not_found"})
    return success_response(task)


@router.get("")
async def list_audios(user: dict = Depends(get_current_user)):
    """List all generated voice-overs for the current user."""
    try:
        result = await service.list_generated_audios(user["id"])
        return success_response(result)
    except Exception as e:
        logger.error(f"列出配音失败: {e}")
        raise HTTPException(status_code=500, detail=f"获取列表失败: {str(e)}")


@router.delete("/{audio_id:path}")
async def delete_audio(audio_id: str, user: dict = Depends(get_current_user)):
    """Delete a voice-over."""
    try:
        await service.delete_generated_audio(audio_id, user["id"])
        return success_response(message="删除成功")
    except PermissionError as e:
        raise HTTPException(status_code=403, detail=str(e))
    except Exception as e:
        logger.error(f"删除配音失败: {e}")
        raise HTTPException(status_code=500, detail=f"删除失败: {str(e)}")


@router.put("/{audio_id:path}")
async def rename_audio(
    audio_id: str,
    request: RenameAudioRequest,
    user: dict = Depends(get_current_user),
):
    """Rename a voice-over."""
    try:
        result = await service.rename_generated_audio(audio_id, request.new_name, user["id"])
        return success_response(result, message="重命名成功")
    except PermissionError as e:
        raise HTTPException(status_code=403, detail=str(e))
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        logger.error(f"重命名配音失败: {e}")
        raise HTTPException(status_code=500, detail=f"重命名失败: {str(e)}")
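The `/tasks/{task_id}` handler in this router answers `not_found` both when a task is missing and when it belongs to another user, so a caller cannot probe for other users' task IDs. A minimal sketch of that masking follows. The in-memory `_TASKS` dictionary, the `"pending"` initial status, and the `get_task_for_user` helper are hypothetical stand-ins; the real store is `app.modules.videos.task_store`, which this diff does not show.

```python
from typing import Dict

# Hypothetical in-memory store standing in for app.modules.videos.task_store.
_TASKS: Dict[str, dict] = {}


def create_task(task_id: str, user_id: str) -> dict:
    """Register a task and remember which user owns it."""
    task = {"status": "pending", "user_id": user_id}
    _TASKS[task_id] = task
    return task


def get_task(task_id: str) -> dict:
    """Return the task, or a 'not_found' marker when the ID is unknown."""
    return _TASKS.get(task_id, {"status": "not_found"})


def get_task_for_user(task_id: str, user_id: str) -> dict:
    """Return the task only to its owner; otherwise report it as missing."""
    task = get_task(task_id)
    if task.get("status") != "not_found" and task.get("user_id") != user_id:
        return {"status": "not_found"}
    return task
```

Returning the same payload for "missing" and "not yours" keeps the two cases indistinguishable from the outside.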
32
backend/app/modules/generated_audios/schemas.py
Normal file
@@ -0,0 +1,32 @@
from pydantic import BaseModel
from typing import Optional, List


class GenerateAudioRequest(BaseModel):
    text: str
    tts_mode: str = "edgetts"
    voice: str = "zh-CN-YunxiNeural"
    ref_audio_id: Optional[str] = None
    ref_text: Optional[str] = None
    language: str = "zh-CN"
    speed: float = 1.0
    instruct_text: Optional[str] = None


class RenameAudioRequest(BaseModel):
    new_name: str


class GeneratedAudioItem(BaseModel):
    id: str
    name: str
    path: str
    duration_sec: float
    text: str
    tts_mode: str
    language: str
    created_at: int


class GeneratedAudioListResponse(BaseModel):
    items: List[GeneratedAudioItem]
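`GenerateAudioRequest` declares `ref_audio_id` and `ref_text` as optional, while the service in the same module rejects a `voiceclone` request that lacks either of them. A minimal sketch of that rule as a standalone check is shown below. The function name `check_voiceclone_fields` and the English error message are hypothetical; in the diff the check sits inline in `generate_audio_task`.

```python
from typing import Optional


def check_voiceclone_fields(
    tts_mode: str,
    ref_audio_id: Optional[str],
    ref_text: Optional[str],
) -> None:
    """Raise ValueError when voice-clone mode lacks its reference audio or text."""
    if tts_mode == "voiceclone" and (not ref_audio_id or not ref_text):
        raise ValueError("voiceclone mode requires ref_audio_id and ref_text")
```

Running such a check before the background task is queued would turn the failure into an immediate 400 instead of a failed task, though that is a design option rather than what the diff does.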
289
backend/app/modules/generated_audios/service.py
Normal file
@@ -0,0 +1,289 @@
"""Generated voice-overs: business logic"""
import re
import json
import time
import asyncio
import subprocess
import tempfile
import os
from pathlib import Path
from typing import Optional

import httpx
from loguru import logger

from app.services.storage import storage_service
from app.services.tts_service import TTSService
from app.services.voice_clone_service import voice_clone_service
from app.modules.videos.task_store import task_store
from app.modules.generated_audios.schemas import (
    GenerateAudioRequest,
    GeneratedAudioItem,
    GeneratedAudioListResponse,
)

BUCKET = "generated-audios"


def _locale_to_tts_lang(locale: str) -> str:
    mapping = {"zh": "Chinese", "en": "English"}
    return mapping.get(locale.split("-")[0], "Auto")


def _get_audio_duration(file_path: str) -> float:
    try:
        result = subprocess.run(
            ['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
             '-of', 'csv=p=0', file_path],
            capture_output=True, text=True, timeout=10
        )
        return float(result.stdout.strip())
    except Exception as e:
        logger.warning(f"获取音频时长失败: {e}")
        return 0.0


async def generate_audio_task(task_id: str, req: GenerateAudioRequest, user_id: str):
    """Background task: generate a voice-over."""
    try:
        task_store.update(task_id, {"status": "processing", "progress": 10, "message": "正在生成配音..."})

        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
            audio_path = tmp.name

        try:
            if req.tts_mode == "voiceclone":
                if not req.ref_audio_id or not req.ref_text:
                    raise ValueError("声音克隆模式需要提供参考音频和参考文字")

                task_store.update(task_id, {"progress": 20, "message": "正在下载参考音频..."})

                with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_ref:
                    ref_local = tmp_ref.name

                try:
                    ref_url = await storage_service.get_signed_url(
                        bucket="ref-audios", path=req.ref_audio_id
                    )
                    timeout = httpx.Timeout(None)
                    async with httpx.AsyncClient(timeout=timeout) as client:
                        async with client.stream("GET", ref_url) as resp:
                            resp.raise_for_status()
                            with open(ref_local, "wb") as f:
                                async for chunk in resp.aiter_bytes():
                                    f.write(chunk)

                    task_store.update(task_id, {"progress": 40, "message": "正在克隆声音..."})
                    await voice_clone_service.generate_audio(
                        text=req.text,
                        ref_audio_path=ref_local,
                        ref_text=req.ref_text,
                        output_path=audio_path,
                        language=_locale_to_tts_lang(req.language),
                        speed=req.speed,
                        instruct_text=req.instruct_text or "",
                    )
                finally:
                    if os.path.exists(ref_local):
                        os.unlink(ref_local)
            else:
                task_store.update(task_id, {"progress": 30, "message": "正在生成语音..."})
                tts = TTSService()
                await tts.generate_audio(req.text, req.voice, audio_path)

            task_store.update(task_id, {"progress": 70, "message": "正在上传配音..."})

            duration = _get_audio_duration(audio_path)
            timestamp = int(time.time())
            audio_id = f"{user_id}/{timestamp}_audio.wav"
            meta_id = f"{user_id}/{timestamp}_audio.json"

            # Build the display_name
            now = time.strftime("%Y%m%d_%H%M", time.localtime(timestamp))
            display_name = f"配音_{now}"

            with open(audio_path, "rb") as f:
                wav_data = f.read()

            await storage_service.upload_file(
                bucket=BUCKET, path=audio_id,
                file_data=wav_data, content_type="audio/wav",
            )

            metadata = {
                "display_name": display_name,
                "text": req.text,
                "tts_mode": req.tts_mode,
                "voice": req.voice if req.tts_mode == "edgetts" else None,
                "ref_audio_id": req.ref_audio_id,
                "language": req.language,
                "duration_sec": duration,
                "created_at": timestamp,
            }
            await storage_service.upload_file(
                bucket=BUCKET, path=meta_id,
                file_data=json.dumps(metadata, ensure_ascii=False).encode("utf-8"),
                content_type="application/json",
            )

            signed_url = await storage_service.get_signed_url(BUCKET, audio_id)

            task_store.update(task_id, {
                "status": "completed",
                "progress": 100,
                "message": f"配音生成完成 ({duration:.1f}s)",
                "output": {
                    "audio_id": audio_id,
                    "name": display_name,
                    "path": signed_url,
                    "duration_sec": duration,
                    "text": req.text,
                    "tts_mode": req.tts_mode,
                    "language": req.language,
                    "created_at": timestamp,
                },
            })
        finally:
            if os.path.exists(audio_path):
                os.unlink(audio_path)

    except Exception as e:
        import traceback
        task_store.update(task_id, {
            "status": "failed",
            "message": f"配音生成失败: {str(e)}",
            "error": str(e),
        })
        logger.error(f"Generate audio failed: {e}\n{traceback.format_exc()}")


async def list_generated_audios(user_id: str) -> dict:
    """List all generated voice-overs for the user."""
    files = await storage_service.list_files(BUCKET, user_id)
    wav_files = [f for f in files if f.get("name", "").endswith("_audio.wav")]

    if not wav_files:
        return GeneratedAudioListResponse(items=[]).model_dump()

    async def fetch_info(f):
        name = f.get("name", "")
        storage_path = f"{user_id}/{name}"
        meta_name = name.replace("_audio.wav", "_audio.json")
        meta_path = f"{user_id}/{meta_name}"

        display_name = name
        text = ""
        tts_mode = "edgetts"
        language = "zh-CN"
        duration_sec = 0.0
        created_at = 0

        try:
            meta_url = await storage_service.get_signed_url(BUCKET, meta_path)
            async with httpx.AsyncClient(timeout=5.0) as client:
                resp = await client.get(meta_url)
                if resp.status_code == 200:
                    meta = resp.json()
                    display_name = meta.get("display_name", name)
                    text = meta.get("text", "")
                    tts_mode = meta.get("tts_mode", "edgetts")
                    language = meta.get("language", "zh-CN")
                    duration_sec = meta.get("duration_sec", 0.0)
                    created_at = meta.get("created_at", 0)
        except Exception as e:
            logger.debug(f"读取配音 metadata 失败: {e}")
            try:
                created_at = int(name.split("_")[0])
            except Exception:
                pass

        signed_url = await storage_service.get_signed_url(BUCKET, storage_path)

        return GeneratedAudioItem(
            id=storage_path,
            name=display_name,
            path=signed_url,
            duration_sec=duration_sec,
            text=text,
            tts_mode=tts_mode,
            language=language,
            created_at=created_at,
        )

    items = await asyncio.gather(*[fetch_info(f) for f in wav_files])
    items = sorted(items, key=lambda x: x.created_at, reverse=True)
    return GeneratedAudioListResponse(items=items).model_dump()


async def delete_all_generated_audios(user_id: str) -> tuple[int, int]:
    """Delete all of the user's generated voice-overs (.wav + .json); returns (deleted count, failed count)."""
    try:
        files = await storage_service.list_files(BUCKET, user_id, strict=True)
        deleted_count = 0
        failed_count = 0
        for f in files:
            name = f.get("name", "")
            if not name or name == ".emptyFolderPlaceholder":
                continue
            if name.endswith("_audio.wav") or name.endswith("_audio.json"):
                full_path = f"{user_id}/{name}"
                try:
                    await storage_service.delete_file(BUCKET, full_path)
                    deleted_count += 1
                except Exception as e:
                    failed_count += 1
                    logger.warning(f"Delete audio file failed: {full_path}, {e}")
        return deleted_count, failed_count
    except Exception as e:
        logger.error(f"Delete all generated audios failed: {e}")
        return 0, 1


async def delete_generated_audio(audio_id: str, user_id: str) -> None:
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权删除此文件")

    await storage_service.delete_file(BUCKET, audio_id)
    meta_path = audio_id.replace("_audio.wav", "_audio.json")
    try:
        await storage_service.delete_file(BUCKET, meta_path)
    except Exception:
        pass


async def rename_generated_audio(audio_id: str, new_name: str, user_id: str) -> dict:
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权修改此文件")

    new_name = new_name.strip()
    if not new_name:
        raise ValueError("新名称不能为空")

    meta_path = audio_id.replace("_audio.wav", "_audio.json")
    try:
        meta_url = await storage_service.get_signed_url(BUCKET, meta_path)
        async with httpx.AsyncClient() as client:
            resp = await client.get(meta_url)
            if resp.status_code == 200:
                metadata = resp.json()
            else:
                raise Exception(f"Failed to fetch metadata: {resp.status_code}")
    except Exception as e:
        logger.warning(f"无法读取配音元数据: {e}, 将创建新的")
        metadata = {
            "display_name": new_name,
            "text": "",
            "tts_mode": "edgetts",
            "language": "zh-CN",
            "duration_sec": 0.0,
            "created_at": int(time.time()),
        }

    metadata["display_name"] = new_name
    await storage_service.upload_file(
        bucket=BUCKET,
        path=meta_path,
        file_data=json.dumps(metadata, ensure_ascii=False).encode("utf-8"),
        content_type="application/json",
    )
    return {"name": new_name}
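The delete and rename functions in this service authorize a request by checking that the storage key starts with `"{user_id}/"`, and they derive the metadata sidecar path by swapping `_audio.wav` for `_audio.json`. The sketch below isolates those two steps. The helper names `is_owned_by` and `metadata_path_for` are hypothetical, the rejection of `..` segments is an added assumption (the service shown does not perform it), and the suffix-only rewrite replaces the service's `str.replace`, which would also touch a match in the middle of a key.

```python
def is_owned_by(audio_id: str, user_id: str) -> bool:
    """True when the storage key sits under the user's own prefix.

    The '..' segment check is an extra hardening assumption, not part of the diff.
    """
    return audio_id.startswith(f"{user_id}/") and ".." not in audio_id.split("/")


def metadata_path_for(audio_id: str) -> str:
    """Map '<user>/<ts>_audio.wav' to its '<user>/<ts>_audio.json' sidecar."""
    suffix = "_audio.wav"
    if not audio_id.endswith(suffix):
        raise ValueError(f"not a generated-audio key: {audio_id!r}")
    # Rewrite only the trailing suffix, never an occurrence inside the key.
    return audio_id[: -len(suffix)] + "_audio.json"
```

Because the route parameter is declared as `{audio_id:path}`, the key can contain slashes, which is why the prefix comparison includes the trailing `/`.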
0
backend/app/modules/login_helper/__init__.py
Normal file
@@ -15,17 +15,19 @@ async def login_helper_page(platform: str, request: Request):
     After login, JavaScript automatically extracts the cookies and POSTs them back to the server
     """
 
-    platform_urls = {
-        "bilibili": "https://www.bilibili.com/",
-        "douyin": "https://creator.douyin.com/",
-        "xiaohongshu": "https://creator.xiaohongshu.com/"
-    }
+    platform_urls = {
+        "bilibili": "https://www.bilibili.com/",
+        "douyin": "https://creator.douyin.com/",
+        "xiaohongshu": "https://creator.xiaohongshu.com/",
+        "weixin": "https://channels.weixin.qq.com/"
+    }
 
-    platform_names = {
-        "bilibili": "B站",
-        "douyin": "抖音",
-        "xiaohongshu": "小红书"
-    }
+    platform_names = {
+        "bilibili": "B站",
+        "douyin": "抖音",
+        "xiaohongshu": "小红书",
+        "weixin": "微信视频号"
+    }
 
     if platform not in platform_urls:
         return "<h1>不支持的平台</h1>"
0
backend/app/modules/materials/__init__.py
Normal file
80
backend/app/modules/materials/router.py
Normal file
@@ -0,0 +1,80 @@
from fastapi import APIRouter, HTTPException, Request, Depends
from fastapi.responses import FileResponse
from loguru import logger

from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.materials.schemas import RenameMaterialRequest
from app.modules.materials import service
from app.services.storage import storage_service

router = APIRouter()


@router.get("/stream/{material_id:path}")
async def stream_material(material_id: str, current_user: dict = Depends(get_current_user)):
    """Stream the material file directly (same origin, avoids CORS canvas taint)."""
    if ".." in material_id:
        raise HTTPException(400, "非法素材ID")
    user_id = current_user["id"]
    if not material_id.startswith(f"{user_id}/"):
        raise HTTPException(403, "无权访问此素材")
    local_path = storage_service.get_local_file_path("materials", material_id)
    if not local_path:
        raise HTTPException(404, "素材文件不存在")
    return FileResponse(local_path, media_type="video/mp4")


@router.post("")
async def upload_material(
    request: Request,
    current_user: dict = Depends(get_current_user)
):
    user_id = current_user["id"]
    logger.info(f"Upload material request from user {user_id}")
    try:
        result = await service.upload_material(request, user_id)
        return success_response(result)
    except ValueError as e:
        raise HTTPException(400, str(e))
    except Exception as e:
        raise HTTPException(500, f"Upload failed. Error: {str(e)}")


@router.get("")
async def list_materials(current_user: dict = Depends(get_current_user)):
    user_id = current_user["id"]
    materials = await service.list_materials(user_id)
    return success_response({"materials": materials})


@router.delete("/{material_id:path}")
async def delete_material(material_id: str, current_user: dict = Depends(get_current_user)):
    user_id = current_user["id"]
    try:
        await service.delete_material(material_id, user_id)
        return success_response(message="素材已删除")
    except ValueError as e:
        raise HTTPException(400, str(e))
    except PermissionError as e:
        raise HTTPException(403, str(e))
    except Exception as e:
        raise HTTPException(500, f"删除失败: {str(e)}")


@router.put("/{material_id:path}")
async def rename_material(
    material_id: str,
    payload: RenameMaterialRequest,
    current_user: dict = Depends(get_current_user)
):
    user_id = current_user["id"]
    try:
        result = await service.rename_material(material_id, payload.new_name, user_id)
        return success_response(result, message="重命名成功")
    except PermissionError as e:
        raise HTTPException(403, str(e))
    except ValueError as e:
        raise HTTPException(400, str(e))
    except Exception as e:
        raise HTTPException(500, f"重命名失败: {str(e)}")
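Each handler in this router repeats the same translation from service-layer exceptions to HTTP status codes: `ValueError` becomes 400, `PermissionError` becomes 403, and anything else becomes 500. A minimal sketch of that mapping as one function is given below. The name `status_for_exception` is hypothetical and does not appear in the diff, which keeps the `except` ladders inline.

```python
def status_for_exception(exc: Exception) -> int:
    """Map a service-layer exception to the HTTP status the routers use."""
    if isinstance(exc, PermissionError):
        return 403  # caller does not own the resource
    if isinstance(exc, ValueError):
        return 400  # malformed ID or empty name
    return 500      # unexpected failure
```

`PermissionError` derives from `OSError`, not `ValueError`, so the order of the two checks does not change the result; it only mirrors the order used in `rename_material`.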
14
backend/app/modules/materials/schemas.py
Normal file
@@ -0,0 +1,14 @@
from pydantic import BaseModel


class RenameMaterialRequest(BaseModel):
    new_name: str


class MaterialItem(BaseModel):
    id: str
    name: str
    path: str
    size_mb: float
    type: str = "video"
    created_at: int = 0
304
backend/app/modules/materials/service.py
Normal file
@@ -0,0 +1,304 @@
import re
import os
import time
import asyncio
import traceback
import aiofiles
from pathlib import Path
from loguru import logger

from app.core.config import settings as app_settings
from app.services.storage import storage_service


def sanitize_filename(filename: str) -> str:
    safe_name = re.sub(r'[<>:"/\\|?*]', '_', filename)
    if len(safe_name) > 100:
        ext = Path(safe_name).suffix
        safe_name = safe_name[:100 - len(ext)] + ext
    return safe_name


def _extract_display_name(storage_name: str) -> str:
    """Extract the display name from a storage file name (drops the timestamp prefix)."""
    if '_' in storage_name:
        parts = storage_name.split('_', 1)
        if parts[0].isdigit():
            return parts[1]
    return storage_name


async def _process_and_upload(temp_file_path: str, original_filename: str, content_type: str, user_id: str) -> str:
    """Strip multipart headers and upload to Supabase, return storage_path"""
    try:
        logger.info(f"Processing raw upload: {temp_file_path} for user {user_id}")

        file_size = os.path.getsize(temp_file_path)

        with open(temp_file_path, 'rb') as f:
            head = f.read(4096)

            first_line_end = head.find(b'\r\n')
            if first_line_end == -1:
                raise Exception("Could not find boundary in multipart body")

            boundary = head[:first_line_end]
            logger.info(f"Detected boundary: {boundary}")

            header_end = head.find(b'\r\n\r\n')
            if header_end == -1:
                raise Exception("Could not find end of multipart headers")

            start_offset = header_end + 4
            logger.info(f"Video data starts at offset: {start_offset}")

            f.seek(max(0, file_size - 200))
            tail = f.read()

            last_boundary_pos = tail.rfind(boundary)
            if last_boundary_pos != -1:
                end_offset = (max(0, file_size - 200) + last_boundary_pos) - 2
            else:
                logger.warning("Could not find closing boundary, assuming EOF")
                end_offset = file_size

        logger.info(f"Video data ends at offset: {end_offset}. Total video size: {end_offset - start_offset}")

        video_path = temp_file_path + "_video.mp4"
        with open(temp_file_path, 'rb') as src, open(video_path, 'wb') as dst:
            src.seek(start_offset)
            bytes_to_copy = end_offset - start_offset
            copied = 0
            while copied < bytes_to_copy:
                chunk_size = min(1024 * 1024 * 10, bytes_to_copy - copied)
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                dst.write(chunk)
                copied += len(chunk)

        logger.info(f"Extracted video content to {video_path}")

        timestamp = int(time.time())
        safe_name = re.sub(r'[^a-zA-Z0-9._-]', '', original_filename)
        storage_path = f"{user_id}/{timestamp}_{safe_name}"

        with open(video_path, 'rb') as f:
            file_content = f.read()
            await storage_service.upload_file(
                bucket=storage_service.BUCKET_MATERIALS,
                path=storage_path,
                file_data=file_content,
                content_type=content_type
            )

        logger.info(f"Upload to Supabase complete: {storage_path}")

        os.remove(temp_file_path)
        os.remove(video_path)

        return storage_path

    except Exception as e:
        logger.error(f"Background upload processing failed: {e}\n{traceback.format_exc()}")
        raise


async def upload_material(request, user_id: str) -> dict:
    """Receive a streamed upload, store it in Supabase, and return the material info."""
    filename = "unknown_video.mp4"
    content_type = "video/mp4"

    timestamp = int(time.time())
    temp_filename = f"upload_{timestamp}.raw"
    temp_path = os.path.join("/tmp", temp_filename)
    if os.name == 'nt':
        temp_path = f"d:/tmp/{temp_filename}"
        os.makedirs("d:/tmp", exist_ok=True)

    try:
        total_size = 0
        last_log = 0

        async with aiofiles.open(temp_path, 'wb') as f:
            async for chunk in request.stream():
                await f.write(chunk)
                total_size += len(chunk)
                max_bytes = app_settings.MAX_UPLOAD_SIZE_MB * 1024 * 1024
                if total_size > max_bytes:
                    raise ValueError(f"文件大小超过限制 ({app_settings.MAX_UPLOAD_SIZE_MB}MB)")

                if total_size - last_log > 20 * 1024 * 1024:
                    logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
                    last_log = total_size

        logger.info(f"Stream reception complete. Total size: {total_size} bytes. Saved to {temp_path}")

        if total_size == 0:
            raise ValueError("Received empty body")

        with open(temp_path, 'rb') as f:
            head = f.read(4096).decode('utf-8', errors='ignore')
            match = re.search(r'filename="([^"]+)"', head)
            if match:
                filename = match.group(1)
                logger.info(f"Extracted filename from body: {filename}")

        storage_path = await _process_and_upload(temp_path, filename, content_type, user_id)

        signed_url = await storage_service.get_signed_url(
            bucket=storage_service.BUCKET_MATERIALS,
            path=storage_path
        )

        size_mb = total_size / (1024 * 1024)
        display_name = _extract_display_name(storage_path.split('/')[-1])

        return {
            "id": storage_path,
            "name": display_name,
            "path": signed_url,
            "size_mb": size_mb,
            "type": "video"
        }

    except Exception as e:
        error_msg = f"Streaming upload failed: {str(e)}"
        detail_msg = f"Exception: {repr(e)}\nArgs: {e.args}\n{traceback.format_exc()}"
        logger.error(error_msg + "\n" + detail_msg)

        try:
            with open("debug_upload.log", "a") as logf:
                logf.write(f"\n--- Error at {time.ctime()} ---\n")
                logf.write(detail_msg)
                logf.write("\n-----------------------------\n")
        except Exception:
            pass

        if os.path.exists(temp_path):
            try:
                os.remove(temp_path)
            except Exception:
                pass
        raise


async def list_materials(user_id: str) -> list[dict]:
    """List all of the user's materials."""
    try:
        files_obj = await storage_service.list_files(
            bucket=storage_service.BUCKET_MATERIALS,
            path=user_id
        )
        semaphore = asyncio.Semaphore(8)

        async def build_item(f):
            name = f.get('name')
            if not name or name == '.emptyFolderPlaceholder':
                return None
            display_name = _extract_display_name(name)
            full_path = f"{user_id}/{name}"
            async with semaphore:
                signed_url = await storage_service.get_signed_url(
                    bucket=storage_service.BUCKET_MATERIALS,
                    path=full_path
                )
            metadata = f.get('metadata', {})
            size = metadata.get('size', 0)
            created_at_str = f.get('created_at', '')
            created_at = 0
            if created_at_str:
                from datetime import datetime
                try:
                    dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
                    created_at = int(dt.timestamp())
                except Exception:
                    pass
            return {
                "id": full_path,
                "name": display_name,
                "path": signed_url,
                "size_mb": size / (1024 * 1024),
                "type": "video",
                "created_at": created_at
            }

        tasks = [build_item(f) for f in files_obj]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        materials = []
        for item in results:
            if not item:
                continue
            if isinstance(item, Exception):
                logger.warning(f"Material signed url build failed: {item}")
                continue
            materials.append(item)
        materials.sort(key=lambda x: x['id'], reverse=True)
        return materials
    except Exception as e:
        logger.error(f"List materials failed: {e}")
        return []


async def delete_material(material_id: str, user_id: str) -> None:
    """Delete a material."""
    if ".." in material_id:
        raise ValueError("非法素材ID")
    if not material_id.startswith(f"{user_id}/"):
        raise PermissionError("无权删除此素材")
    await storage_service.delete_file(
        bucket=storage_service.BUCKET_MATERIALS,
        path=material_id
    )


async def rename_material(material_id: str, new_name_raw: str, user_id: str) -> dict:
    """Rename a material and return the updated material info."""
    if ".." in material_id:
        raise ValueError("非法素材ID")
    if not material_id.startswith(f"{user_id}/"):
        raise PermissionError("无权重命名此素材")

    new_name_raw = new_name_raw.strip() if new_name_raw else ""
    if not new_name_raw:
        raise ValueError("新名称不能为空")

    old_name = material_id.split("/", 1)[1]
    old_ext = Path(old_name).suffix
    base_name = Path(new_name_raw).stem if Path(new_name_raw).suffix else new_name_raw
    safe_base = sanitize_filename(base_name).strip()
    if not safe_base:
        raise ValueError("新名称无效")

    new_filename = f"{safe_base}{old_ext}"

    prefix = None
    if "_" in old_name:
        maybe_prefix, _ = old_name.split("_", 1)
        if maybe_prefix.isdigit():
            prefix = maybe_prefix
    if prefix:
        new_filename = f"{prefix}_{new_filename}"

    new_path = f"{user_id}/{new_filename}"

    if new_path != material_id:
        await storage_service.move_file(
            bucket=storage_service.BUCKET_MATERIALS,
            from_path=material_id,
            to_path=new_path
        )

    signed_url = await storage_service.get_signed_url(
        bucket=storage_service.BUCKET_MATERIALS,
        path=new_path
    )

    display_name = _extract_display_name(new_filename)

    return {
        "id": new_path,
        "name": display_name,
        "path": signed_url,
    }
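`_process_and_upload` in this file recovers the video bytes from a raw `multipart/form-data` body by hand: the first line is the boundary, the payload starts after the first blank line, and it ends two bytes (the CRLF) before the last occurrence of the boundary. The sketch below applies the same offsets to an in-memory `bytes` object so the arithmetic is easy to follow. The function name `extract_single_part` is hypothetical, it assumes a body with exactly one part, and it reads the whole body at once where the service seeks within a temp file.

```python
def extract_single_part(raw: bytes) -> bytes:
    """Return the payload of the only part in a multipart/form-data body."""
    first_line_end = raw.find(b"\r\n")
    if first_line_end == -1:
        raise ValueError("no boundary line found")
    boundary = raw[:first_line_end]  # e.g. b"--abc123"

    header_end = raw.find(b"\r\n\r\n")
    if header_end == -1:
        raise ValueError("no end of part headers found")
    start = header_end + 4  # skip the blank line that ends the part headers

    closing = raw.rfind(boundary)
    # If only the opening boundary is present, treat the body as unterminated.
    # Otherwise stop 2 bytes early to drop the CRLF in front of the boundary.
    end = closing - 2 if closing > start else len(raw)
    return raw[start:end]
```

For example, a body whose part headers are followed by `VIDEO-BYTES` and a closing `--abc123--` line yields exactly `b"VIDEO-BYTES"`. A general-purpose multipart parser would be the safer choice for bodies with several parts.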
0
backend/app/modules/payment/__init__.py
Normal file
52
backend/app/modules/payment/router.py
Normal file
@@ -0,0 +1,52 @@
"""
Payment API: order creation, asynchronous notification, status query

Follows the BACKEND_DEV.md convention: the router only validates parameters, calls the service, and returns a unified response
"""
from fastapi import APIRouter, HTTPException, Request, status
from fastapi.responses import PlainTextResponse

from app.core.response import success_response
from .schemas import CreateOrderRequest, CreateOrderResponse, OrderStatusResponse
from . import service

router = APIRouter(prefix="/api/payment", tags=["支付"])


@router.post("/create-order")
async def create_payment_order(request: CreateOrderRequest):
    """Create an Alipay desktop-website payment order and return the cashier URL."""
    try:
        result = service.create_payment_order(request.payment_token)
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail=str(e))
    except RuntimeError as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(e))

    return success_response(
        CreateOrderResponse(**result).model_dump()
    )


@router.post("/notify")
async def payment_notify(request: Request):
    """
    Alipay asynchronous notification callback

    Must return the plain text "success" (not JSON); otherwise Alipay keeps re-sending the notification.
    """
    form_data = await request.form()
    verified = service.handle_payment_notify(dict(form_data))
    return PlainTextResponse("success" if verified else "fail")


@router.get("/status/{out_trade_no}")
async def check_payment_status(out_trade_no: str):
    """Query the payment status of an order (polled by the frontend)."""
    order_status = service.get_order_status(out_trade_no)
    if order_status is None:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="订单不存在")

    return success_response(
        OrderStatusResponse(status=order_status).model_dump()
    )
15
backend/app/modules/payment/schemas.py
Normal file
@@ -0,0 +1,15 @@
from pydantic import BaseModel


class CreateOrderRequest(BaseModel):
    payment_token: str


class CreateOrderResponse(BaseModel):
    pay_url: str
    out_trade_no: str
    amount: float


class OrderStatusResponse(BaseModel):
    status: str
137
backend/app/modules/payment/service.py
Normal file
@@ -0,0 +1,137 @@
"""
Payment business service.

Responsibilities: Alipay SDK wrapper, order creation, payment notification handling, status query.
Follows the "thin router + thick service" principle from BACKEND_DEV.md.
"""
from datetime import datetime, timezone, timedelta
from pathlib import Path
import uuid

from alipay import AliPay
from loguru import logger

from app.core.config import settings
from app.core.security import decode_payment_token
from app.repositories.orders import create_order, get_order_by_trade_no, update_order_status
from app.repositories.users import update_user

# Alipay gateway URLs
ALIPAY_GATEWAY = "https://openapi.alipay.com/gateway.do"
ALIPAY_GATEWAY_SANDBOX = "https://openapi-sandbox.dl.alipaydev.com/gateway.do"


def _get_alipay_client() -> AliPay:
    """Build the Alipay client lazily, on each call."""
    return AliPay(
        appid=settings.ALIPAY_APP_ID,
        app_notify_url=settings.ALIPAY_NOTIFY_URL,
        # read_text() closes the key files instead of leaving handles open
        app_private_key_string=Path(settings.ALIPAY_PRIVATE_KEY_PATH).read_text(),
        alipay_public_key_string=Path(settings.ALIPAY_PUBLIC_KEY_PATH).read_text(),
        sign_type="RSA2",
        debug=settings.ALIPAY_SANDBOX,
    )


def _create_page_pay_url(out_trade_no: str, amount: float, subject: str) -> str | None:
    """Call alipay.trade.page.pay and return the Alipay cashier URL."""
    client = _get_alipay_client()
    order_string = client.api_alipay_trade_page_pay(
        subject=subject,
        out_trade_no=out_trade_no,
        total_amount=amount,
        return_url=settings.ALIPAY_RETURN_URL,
    )
    if not order_string:
        logger.error(f"电脑网站支付下单失败: {out_trade_no}")
        return None

    gateway = ALIPAY_GATEWAY_SANDBOX if settings.ALIPAY_SANDBOX else ALIPAY_GATEWAY
    pay_url = f"{gateway}?{order_string}"
    logger.info(f"电脑网站支付下单成功: {out_trade_no}")
    return pay_url


def _verify_signature(data: dict, signature: str) -> bool:
    """Verify the signature of an Alipay asynchronous notification."""
    client = _get_alipay_client()
    return client.verify(data, signature)


def create_payment_order(payment_token: str) -> dict:
    """
    Full flow for creating a payment order.

    Returns: {"pay_url": str, "out_trade_no": str, "amount": float}
    Raises: ValueError (invalid token), RuntimeError (API failure)
    """
    user_id = decode_payment_token(payment_token)
    if not user_id:
        raise ValueError("付费凭证无效或已过期,请重新登录")

    out_trade_no = f"VG_{int(datetime.now().timestamp())}_{uuid.uuid4().hex[:8]}"
    amount = settings.PAYMENT_AMOUNT

    create_order(user_id, out_trade_no, amount)

    pay_url = _create_page_pay_url(out_trade_no, amount, "IPAgent 会员开通")
    if not pay_url:
        raise RuntimeError("创建支付订单失败,请稍后重试")

    logger.info(f"用户 {user_id} 创建支付订单: {out_trade_no}")

    return {"pay_url": pay_url, "out_trade_no": out_trade_no, "amount": amount}


def handle_payment_notify(form_data: dict) -> bool:
    """
    Full flow for handling an Alipay asynchronous notification.

    Returns: True = signature verified, False = signature verification failed
    """
    data = dict(form_data)

    signature = data.pop("sign", "")
    data.pop("sign_type", None)

    if not _verify_signature(data, signature):
        logger.warning(f"支付宝通知验签失败: {data.get('out_trade_no')}")
        return False

    out_trade_no = data.get("out_trade_no", "")
    trade_status = data.get("trade_status", "")
    trade_no = data.get("trade_no", "")

    logger.info(f"收到支付宝通知: {out_trade_no}, status={trade_status}, trade_no={trade_no}")

    if trade_status not in ("TRADE_SUCCESS", "TRADE_FINISHED"):
        return True

    order = get_order_by_trade_no(out_trade_no)
    if not order:
        logger.warning(f"订单不存在: {out_trade_no}")
        return True

    if order["status"] == "paid":
        logger.info(f"订单已处理过: {out_trade_no}")
        return True

    update_order_status(out_trade_no, "paid", trade_no)

    user_id = order["user_id"]
    expires_at = (datetime.now(timezone.utc) + timedelta(days=settings.PAYMENT_EXPIRE_DAYS)).isoformat()
    update_user(user_id, {
        "is_active": True,
        "role": "user",
        "expires_at": expires_at,
    })

    logger.success(f"用户 {user_id} 支付成功,已激活,有效期至 {expires_at}")
    return True


def get_order_status(out_trade_no: str) -> str | None:
    """Query the payment status of an order."""
    order = get_order_by_trade_no(out_trade_no)
    if not order:
        return None
    return order["status"]
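Because Alipay may deliver the same notification more than once, `handle_payment_notify` has to be idempotent: a replay must not extend the membership again. The sketch below reproduces that decision flow against plain dictionaries so it can be exercised without the SDK or a database. `process_notification`, the in-memory `orders` / `users` stores, and the `signature_ok` flag are all hypothetical stand-ins; in the real service the signature check is `client.verify` and the stores are the repository functions.

```python
from datetime import datetime, timedelta, timezone

SUCCESS_STATUSES = ("TRADE_SUCCESS", "TRADE_FINISHED")


def process_notification(orders, users, data, signature_ok, expire_days=365, now=None):
    """Return the text the callback should answer with: 'success' or 'fail'."""
    if not signature_ok:
        return "fail"  # unverified payloads are never acted on

    if data.get("trade_status") not in SUCCESS_STATUSES:
        return "success"  # acknowledged, but nothing to activate yet

    order = orders.get(data.get("out_trade_no", ""))
    if order is None or order["status"] == "paid":
        return "success"  # unknown order or a replay: acknowledge without side effects

    order["status"] = "paid"
    order["trade_no"] = data.get("trade_no", "")

    now = now or datetime.now(timezone.utc)
    users[order["user_id"]] = {
        "is_active": True,
        "role": "user",
        "expires_at": (now + timedelta(days=expire_days)).isoformat(),
    }
    return "success"
```

The order status acts as the idempotency key: the first successful notification flips it to `paid`, and every later copy takes the early return. A production version would presumably make the status update and the user update atomic, which this sketch does not attempt.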
0
backend/app/modules/publish/__init__.py
Normal file
@@ -1,13 +1,17 @@
 """
 Publish management API (with user authentication)
 """
-from fastapi import APIRouter, HTTPException, BackgroundTasks, Depends, Request
-from pydantic import BaseModel
-from typing import List, Optional
-from datetime import datetime
-from loguru import logger
-from app.services.publish_service import PublishService
-from app.core.deps import get_current_user_optional
+from fastapi import APIRouter, HTTPException, BackgroundTasks, Depends, Request
+from fastapi.responses import FileResponse
+from pydantic import BaseModel
+from typing import List, Optional
+from datetime import datetime
+import re
+from loguru import logger
+from app.services.publish_service import PublishService
+from app.core.response import success_response
+from app.core.config import settings
+from app.core.deps import get_current_user
 
 router = APIRouter()
 publish_service = PublishService()
@@ -29,7 +33,7 @@ class PublishResponse(BaseModel):
     url: Optional[str] = None
 
 # Supported platforms for validation
-SUPPORTED_PLATFORMS = {"bilibili", "douyin", "xiaohongshu"}
+SUPPORTED_PLATFORMS = {"bilibili", "douyin", "xiaohongshu", "weixin"}
 
 
 def _get_user_id(request: Request) -> Optional[str]:
@@ -46,8 +50,8 @@ def _get_user_id(request: Request) -> Optional[str]:
     return None
 
 
-@router.post("", response_model=PublishResponse)
-async def publish_video(request: PublishRequest, req: Request, background_tasks: BackgroundTasks):
+@router.post("")
+async def publish_video(request: PublishRequest, req: Request, background_tasks: BackgroundTasks):
     """Publish a video to the specified platform."""
     # Validate platform
     if request.platform not in SUPPORTED_PLATFORMS:
@@ -69,27 +73,23 @@ async def publish_video(request: PublishRequest, req: Request, background_tasks:
             publish_time=request.publish_time,
             user_id=user_id
         )
-        return PublishResponse(
-            success=result.get("success", False),
-            message=result.get("message", ""),
-            platform=request.platform,
-            url=result.get("url")
-        )
+        message = result.get("message", "")
+        return success_response(result, message=message)
     except Exception as e:
         logger.error(f"发布失败: {e}")
         raise HTTPException(status_code=500, detail=str(e))
 
 @router.get("/platforms")
-async def list_platforms():
-    return {"platforms": [{**pinfo, "id": pid} for pid, pinfo in publish_service.PLATFORMS.items()]}
+async def list_platforms():
+    return success_response({"platforms": [{**pinfo, "id": pid} for pid, pinfo in publish_service.PLATFORMS.items()]})
 
 @router.get("/accounts")
-async def list_accounts(req: Request):
-    user_id = _get_user_id(req)
-    return {"accounts": publish_service.get_accounts(user_id)}
+async def list_accounts(req: Request):
+    user_id = _get_user_id(req)
+    return success_response({"accounts": publish_service.get_accounts(user_id)})
 
 @router.post("/login/{platform}")
-async def login_platform(platform: str, req: Request):
+async def login_platform(platform: str, req: Request):
     """Trigger QR-code login for the platform."""
     if platform not in SUPPORTED_PLATFORMS:
         raise HTTPException(status_code=400, detail=f"不支持的平台: {platform}")
@@ -97,32 +97,33 @@ async def login_platform(platform: str, req: Request):
     user_id = _get_user_id(req)
     result = await publish_service.login(platform, user_id)
 
-    if result.get("success"):
-        return result
-    else:
-        raise HTTPException(status_code=400, detail=result.get("message"))
+    message = result.get("message", "")
+    return success_response(result, message=message)
 
 @router.post("/logout/{platform}")
-async def logout_platform(platform: str, req: Request):
+async def logout_platform(platform: str, req: Request):
     """Log out of the platform."""
     if platform not in SUPPORTED_PLATFORMS:
         raise HTTPException(status_code=400, detail=f"不支持的平台: {platform}")
 
     user_id = _get_user_id(req)
-    result = publish_service.logout(platform, user_id)
-    return result
+    result = publish_service.logout(platform, user_id)
+    message = result.get("message", "")
+    return success_response(result, message=message)
 
 @router.get("/login/status/{platform}")
-async def get_login_status(platform: str, req: Request):
+async def get_login_status(platform: str, req: Request):
     """Check login status (an active QR-scan session is checked first)."""
     if platform not in SUPPORTED_PLATFORMS:
         raise HTTPException(status_code=400, detail=f"不支持的平台: {platform}")
 
     user_id = _get_user_id(req)
-    return publish_service.get_login_session_status(platform, user_id)
+    result = publish_service.get_login_session_status(platform, user_id)
+    message = result.get("message", "")
+    return success_response(result, message=message)
 
-@router.post("/cookies/save/{platform}")
-async def save_platform_cookie(platform: str, cookie_data: dict, req: Request):
+@router.post("/cookies/save/{platform}")
+async def save_platform_cookie(platform: str, cookie_data: dict, req: Request):
     """
     Save cookies extracted from the client browser.
 
@@ -140,7 +141,25 @@ async def save_platform_cookie(platform: str, cookie_data: dict, req: Request):
     user_id = _get_user_id(req)
     result = await publish_service.save_cookie_string(platform, cookie_string, user_id)
 
-    if result.get("success"):
-        return result
-    else:
-        raise HTTPException(status_code=400, detail=result.get("message"))
+    message = result.get("message", "")
+    return success_response(result, message=message)
+
+
+@router.get("/screenshot/{filename}")
+async def get_publish_screenshot(
+    filename: str,
+    current_user: dict = Depends(get_current_user),
+):
+    if not re.match(r"^[A-Za-z0-9_.-]+$", filename):
+        raise HTTPException(status_code=400, detail="非法文件名")
+
+    user_id = str(current_user.get("id") or "")
+    if not user_id:
+        raise HTTPException(status_code=401, detail="未登录")
+
+    user_dir = re.sub(r"[^A-Za-z0-9_-]", "_", user_id)[:64] or "legacy"
+    file_path = settings.PUBLISH_SCREENSHOT_DIR / user_dir / filename
+    if not file_path.exists() or not file_path.is_file():
+        raise HTTPException(status_code=404, detail="截图不存在")
+
+    return FileResponse(path=str(file_path), media_type="image/png")
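The screenshot endpoint confines each user to a per-user directory by whitelisting filename characters and sanitizing the user ID. A slightly stricter variant of that path resolution is sketched below. `resolve_screenshot_path` is a hypothetical helper, and two details differ from the diff on purpose: it uses `re.fullmatch` (in Python, `$` in `re.match` also matches just before a trailing newline), and it adds a parent-directory check so names such as `..` are rejected outright instead of falling through to the "not a file" case.

```python
import re
from pathlib import Path

FILENAME_RE = re.compile(r"[A-Za-z0-9_.-]+")


def resolve_screenshot_path(base_dir: Path, user_id: str, filename: str) -> Path:
    """Map (user_id, filename) to a path that must sit directly inside the user's directory."""
    if not FILENAME_RE.fullmatch(filename):
        raise ValueError("illegal filename")

    # Same sanitizing expression as in the diff: unsafe characters become '_'.
    user_dir = re.sub(r"[^A-Za-z0-9_-]", "_", user_id)[:64] or "legacy"
    user_root = (base_dir / user_dir).resolve()
    candidate = (user_root / filename).resolve()

    # Extra guard (an assumption, not in the diff): '.' and '..' pass the regex
    # but do not resolve to a direct child of the user's directory.
    if candidate.parent != user_root:
        raise ValueError("illegal filename")
    return candidate
```

The existence check (`exists()` / `is_file()`) and the `FileResponse` would follow as in the endpoint; they are left out here so the sketch has no filesystem or framework dependency.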
0
backend/app/modules/ref_audios/__init__.py
Normal file
88
backend/app/modules/ref_audios/router.py
Normal file
@@ -0,0 +1,88 @@
"""Reference audio management API"""
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
from loguru import logger

from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.ref_audios.schemas import RenameRequest
from app.modules.ref_audios import service

router = APIRouter()


@router.post("")
async def upload_ref_audio(
    file: UploadFile = File(...),
    ref_text: str = Form(""),
    user: dict = Depends(get_current_user)
):
    """Upload a reference audio file."""
    try:
        result = await service.upload_ref_audio(file, ref_text, user["id"])
        return success_response(result)
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        logger.error(f"上传参考音频失败: {e}")
        raise HTTPException(status_code=500, detail=f"上传失败: {str(e)}")


@router.get("")
async def list_ref_audios(user: dict = Depends(get_current_user)):
    """List all reference audio files of the current user."""
    try:
        result = await service.list_ref_audios(user["id"])
        return success_response(result)
    except Exception as e:
        logger.error(f"列出参考音频失败: {e}")
        raise HTTPException(status_code=500, detail=f"获取列表失败: {str(e)}")


@router.delete("/{audio_id:path}")
async def delete_ref_audio(audio_id: str, user: dict = Depends(get_current_user)):
    """Delete a reference audio file."""
    try:
        await service.delete_ref_audio(audio_id, user["id"])
        return success_response(message="删除成功")
    except PermissionError as e:
        raise HTTPException(status_code=403, detail=str(e))
    except Exception as e:
        logger.error(f"删除参考音频失败: {e}")
        raise HTTPException(status_code=500, detail=f"删除失败: {str(e)}")


@router.put("/{audio_id:path}")
async def rename_ref_audio(
    audio_id: str,
    request: RenameRequest,
    user: dict = Depends(get_current_user)
):
    """Rename a reference audio file."""
    try:
        result = await service.rename_ref_audio(audio_id, request.new_name, user["id"])
        return success_response(result, message="重命名成功")
    except PermissionError as e:
        raise HTTPException(status_code=403, detail=str(e))
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        logger.error(f"重命名失败: {e}")
        raise HTTPException(status_code=500, detail=f"重命名失败: {str(e)}")


@router.post("/{audio_id:path}/retranscribe")
async def retranscribe_ref_audio(
    audio_id: str,
    user: dict = Depends(get_current_user)
):
    """Re-run speech recognition on a reference audio file."""
    try:
        result = await service.retranscribe_ref_audio(audio_id, user["id"])
        return success_response(result, message="识别完成")
    except PermissionError as e:
        raise HTTPException(status_code=403, detail=str(e))
    except ValueError as e:
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        logger.error(f"重新识别失败: {e}")
        raise HTTPException(status_code=500, detail=f"识别失败: {str(e)}")
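Every handler in this router repeats the same translation from service exceptions to HTTP status codes: `PermissionError` becomes 403, `ValueError` becomes 400, and anything else becomes 500. The mapping can be expressed once, as in the sketch below. `http_status_for` is a hypothetical helper and is not part of the diff; the router keeps the explicit `except` chains.

```python
def http_status_for(exc: Exception) -> int:
    """Map a service-layer exception to the status code the router uses."""
    if isinstance(exc, PermissionError):
        return 403  # the audio_id does not belong to the caller
    if isinstance(exc, ValueError):
        return 400  # bad input: empty name, unsupported format, no recognizable speech
    return 500  # anything unexpected, including RuntimeError from failed transcoding


if __name__ == "__main__":
    for error in (PermissionError("no access"), ValueError("bad name"), RuntimeError("ffmpeg failed")):
        print(type(error).__name__, http_status_for(error))
```

A shared helper like this, or a FastAPI exception handler built on the same idea, would remove the duplicated `except` blocks, at the cost of making each endpoint's error behavior less visible at the call site.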
19
backend/app/modules/ref_audios/schemas.py
Normal file
@@ -0,0 +1,19 @@
from pydantic import BaseModel
from typing import List


class RefAudioResponse(BaseModel):
    id: str
    name: str
    path: str
    ref_text: str
    duration_sec: float
    created_at: int


class RefAudioListResponse(BaseModel):
    items: List[RefAudioResponse]


class RenameRequest(BaseModel):
    new_name: str
407
backend/app/modules/ref_audios/service.py
Normal file
@@ -0,0 +1,407 @@
import re
import os
import time
import json
import hashlib
import asyncio
import subprocess
import tempfile
import unicodedata
from pathlib import Path
from typing import Optional

import httpx
from loguru import logger

from app.services.storage import storage_service
from app.modules.ref_audios.schemas import RefAudioResponse, RefAudioListResponse

ALLOWED_AUDIO_EXTENSIONS = {'.wav', '.mp3', '.m4a', '.webm', '.ogg', '.flac', '.aac'}
BUCKET_REF_AUDIOS = "ref-audios"


def sanitize_filename(filename: str) -> str:
    """Clean a filename for use as a Storage key (only ASCII-safe characters are kept)."""
    normalized = unicodedata.normalize("NFKD", filename)
    ascii_name = normalized.encode("ascii", "ignore").decode("ascii")
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name).strip("._-")

    # Names made only of Chinese characters, emoji, etc. end up empty;
    # fall back to a stable hash to avoid an InvalidKey error
    if not safe_name:
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()[:12]
        safe_name = f"audio_{digest}"

    if len(safe_name) > 50:
        ext = Path(safe_name).suffix
        safe_name = safe_name[:50 - len(ext)] + ext
    return safe_name
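To make the behavior of `sanitize_filename` concrete, the function is restated below as a standalone script with a few sample inputs. The logic is copied from the diff; only the sample filenames in the `__main__` block are made up for illustration.

```python
import hashlib
import re
import unicodedata
from pathlib import Path


def sanitize_filename(filename: str) -> str:
    """Reduce a filename to ASCII-safe characters, with a hash fallback."""
    normalized = unicodedata.normalize("NFKD", filename)
    ascii_name = normalized.encode("ascii", "ignore").decode("ascii")
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name).strip("._-")

    if not safe_name:
        # Nothing survived (for example a purely Chinese name): derive a stable key.
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()[:12]
        safe_name = f"audio_{digest}"

    if len(safe_name) > 50:
        # Truncate the stem but keep the extension.
        ext = Path(safe_name).suffix
        safe_name = safe_name[:50 - len(ext)] + ext
    return safe_name


if __name__ == "__main__":
    for sample in ("my voice (1).wav", "café.mp3", "录音", "a" * 80 + ".wav"):
        print(repr(sample), "->", sanitize_filename(sample))
```

Accented Latin letters survive as their base letters because NFKD splits off the combining marks before the ASCII encoding step drops them. The MD5 digest is used here only as a deterministic name, not for security, so the same non-ASCII filename always maps to the same key.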


def _get_audio_duration(file_path: str) -> float:
    """Return the audio duration in seconds."""
    try:
        result = subprocess.run(
            ['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
             '-of', 'csv=p=0', file_path],
            capture_output=True, text=True, timeout=10
        )
        return float(result.stdout.strip())
    except Exception as e:
        logger.warning(f"获取音频时长失败: {e}")
        return 0.0


def _find_silence_cut_point(file_path: str, max_duration: float) -> float:
    """Find a silence point near max_duration to cut at; fall back to max_duration if none is found."""
    try:
        # Use silencedetect to find all silent segments (threshold -30 dB, at least 0.3 s long)
        result = subprocess.run(
            ['ffmpeg', '-i', file_path, '-af',
             'silencedetect=noise=-30dB:d=0.3', '-f', 'null', '-'],
            capture_output=True, text=True, timeout=30
        )
        # Parse the silence_end timestamps
        ends = [float(m) for m in re.findall(r'silence_end:\s*([\d.]+)', result.stderr)]
        # Take the last silence end before max_duration (at least 3 seconds in)
        candidates = [t for t in ends if 3.0 <= t <= max_duration]
        if candidates:
            cut = candidates[-1]
            logger.info(f"Found silence cut point at {cut:.1f}s (max={max_duration}s)")
            return cut
    except Exception as e:
        logger.warning(f"Silence detection failed: {e}")
    return max_duration
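The cut-point selection in `_find_silence_cut_point` is easiest to follow when separated from the `ffmpeg` call. The sketch below applies the same regular expression and the same 3-second lower bound to a captured log. `pick_cut_point` is a hypothetical name, the log lines in the usage block are illustrative rather than real output, and `max(candidates)` is used instead of `candidates[-1]`, which gives the same result as long as ffmpeg reports silences in chronological order.

```python
import re


def pick_cut_point(stderr_text: str, max_duration: float, min_duration: float = 3.0) -> float:
    """Choose the latest silence end within [min_duration, max_duration], else max_duration."""
    ends = [float(m) for m in re.findall(r"silence_end:\s*([\d.]+)", stderr_text)]
    candidates = [t for t in ends if min_duration <= t <= max_duration]
    return max(candidates) if candidates else max_duration


if __name__ == "__main__":
    # Illustrative lines shaped like silencedetect output (not captured from a real run).
    sample_log = "\n".join([
        "[silencedetect @ 0x1] silence_start: 0.9",
        "[silencedetect @ 0x1] silence_end: 1.2 | silence_duration: 0.3",
        "[silencedetect @ 0x1] silence_end: 4.5 | silence_duration: 0.4",
        "[silencedetect @ 0x1] silence_end: 8.9 | silence_duration: 0.5",
        "[silencedetect @ 0x1] silence_end: 12.3 | silence_duration: 0.6",
    ])
    print(pick_cut_point(sample_log, max_duration=10.0))
```

Cutting at the end of a pause rather than at a fixed 10 seconds avoids chopping a word in half, which matters because the trimmed clip is transcribed afterwards.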


def _convert_to_wav(input_path: str, output_path: str, max_duration: float = 0) -> bool:
    """Convert audio to WAV (16 kHz, mono); optionally keep only the first max_duration seconds with a fade-out."""
    try:
        cmd = ['ffmpeg', '-y', '-i', input_path]
        if max_duration > 0:
            cmd += ['-t', str(max_duration)]
            # Fade out over the last 0.1 s to avoid a click at the cut
            fade_start = max(0, max_duration - 0.1)
            cmd += ['-af', f'afade=t=out:st={fade_start}:d=0.1']
        cmd += ['-ar', '16000', '-ac', '1', '-acodec', 'pcm_s16le', output_path]
        subprocess.run(cmd, capture_output=True, timeout=60, check=True)
        return True
    except Exception as e:
        logger.error(f"音频转换失败: {e}")
        return False
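The command line assembled by `_convert_to_wav` can be inspected without running `ffmpeg` by splitting out the argument construction. In the sketch below, `build_wav_cmd` is a hypothetical helper; it uses the same flags as the diff, with one deliberate difference: the fade start is formatted to three decimals so floating-point noise never leaks into the filter string.

```python
def build_wav_cmd(input_path: str, output_path: str, max_duration: float = 0) -> list[str]:
    """Build the ffmpeg argument list for a 16 kHz mono PCM WAV conversion."""
    cmd = ["ffmpeg", "-y", "-i", input_path]
    if max_duration > 0:
        # Keep only the first max_duration seconds.
        cmd += ["-t", str(max_duration)]
        # Fade out over the final 0.1 s so the cut does not click.
        fade_start = max(0.0, max_duration - 0.1)
        cmd += ["-af", f"afade=t=out:st={fade_start:.3f}:d=0.1"]
    cmd += ["-ar", "16000", "-ac", "1", "-acodec", "pcm_s16le", output_path]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_wav_cmd("in.mp3", "out.wav")))
    print(" ".join(build_wav_cmd("in.mp3", "out.wav", max_duration=8.5)))
```

Passing the arguments as a list, as the original does, means `subprocess.run` never goes through a shell, so filenames with spaces or shell metacharacters need no quoting.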


async def upload_ref_audio(file, ref_text: str, user_id: str) -> dict:
    """Upload a reference audio file: transcode, measure duration, store in Supabase."""
    if not file.filename:
        raise ValueError("文件名无效")
    filename = file.filename

    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_AUDIO_EXTENSIONS:
        raise ValueError(f"不支持的音频格式: {ext}。支持的格式: {', '.join(ALLOWED_AUDIO_EXTENSIONS)}")

    # Check the size before creating the temp file so a rejected upload leaves nothing on disk
    content = await file.read()
    if len(content) > 5 * 1024 * 1024:
        raise ValueError("参考音频文件大小不能超过 5MB")

    # Create the temporary input file
    with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp_input:
        tmp_input.write(content)
        tmp_input_path = tmp_input.name

    try:
        # Convert to WAV
        tmp_wav_path = tmp_input_path + ".wav"
        if not _convert_to_wav(tmp_input_path, tmp_wav_path):
            raise RuntimeError("音频格式转换失败")

        # Measure the audio duration
        duration = _get_audio_duration(tmp_wav_path)
        if duration < 1.0:
            raise ValueError("音频时长过短,至少需要 1 秒")

        # Longer than 10 seconds: trim at a silence point (CosyVoice works best with 3-10 seconds)
        MAX_REF_DURATION = 10.0
        if duration > MAX_REF_DURATION:
            cut_point = _find_silence_cut_point(tmp_wav_path, MAX_REF_DURATION)
            logger.info(f"Ref audio {duration:.1f}s > {MAX_REF_DURATION}s, trimming at {cut_point:.1f}s")
            trimmed_path = tmp_input_path + "_trimmed.wav"
            if not _convert_to_wav(tmp_wav_path, trimmed_path, max_duration=cut_point):
                raise RuntimeError("音频截取失败")
            os.unlink(tmp_wav_path)
            tmp_wav_path = trimmed_path
            duration = _get_audio_duration(tmp_wav_path)

        # Transcribe the reference audio automatically
        try:
            from app.services.whisper_service import whisper_service
            transcribed = await whisper_service.transcribe(tmp_wav_path)
            if transcribed.strip():
                ref_text = transcribed.strip()
                logger.info(f"Auto-transcribed ref audio: {ref_text[:50]}...")
        except Exception as e:
            logger.warning(f"Auto-transcribe failed: {e}")

        if not ref_text or not ref_text.strip():
            raise ValueError("无法识别音频内容,请确保音频包含清晰的语音")

        # Check for duplicate names
        existing_files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
        dup_count = 0
        search_suffix = f"_{filename}"
        for f in existing_files:
            fname = f.get('name', '')
            if fname.endswith(search_suffix):
                dup_count += 1

        final_display_name = filename
        if dup_count > 0:
            name_stem = Path(filename).stem
            name_ext = Path(filename).suffix
            final_display_name = f"{name_stem}({dup_count}){name_ext}"

        # Build the storage path
        timestamp = int(time.time())
        safe_name = sanitize_filename(Path(filename).stem)
        storage_path = f"{user_id}/{timestamp}_{safe_name}.wav"

        # Upload the WAV file
        with open(tmp_wav_path, 'rb') as f:
            wav_data = f.read()

        await storage_service.upload_file(
            bucket=BUCKET_REF_AUDIOS,
            path=storage_path,
            file_data=wav_data,
            content_type="audio/wav"
        )

        # Upload the metadata JSON
        metadata = {
            "ref_text": ref_text.strip(),
            "original_filename": final_display_name,
            "duration_sec": duration,
            "created_at": timestamp
        }
        metadata_path = f"{user_id}/{timestamp}_{safe_name}.json"
        await storage_service.upload_file(
            bucket=BUCKET_REF_AUDIOS,
            path=metadata_path,
            file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
            content_type="application/json"
        )

        # Get a signed URL
        signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)

        return RefAudioResponse(
            id=storage_path,
            name=filename,
            path=signed_url,
            ref_text=ref_text.strip(),
            duration_sec=duration,
            created_at=timestamp
        ).model_dump()

    finally:
        # Remove the input file and whichever intermediate WAV files still exist
        os.unlink(tmp_input_path)
        for leftover in (tmp_input_path + ".wav", tmp_input_path + "_trimmed.wav"):
            if os.path.exists(leftover):
                os.unlink(leftover)


async def list_ref_audios(user_id: str) -> dict:
    """List all reference audio files of a user."""
    files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
    wav_files = [f for f in files if f.get("name", "").endswith(".wav")]

    if not wav_files:
        return RefAudioListResponse(items=[]).model_dump()

    async def fetch_audio_info(f):
        name = f.get("name", "")
        storage_path = f"{user_id}/{name}"
        metadata_name = name.replace(".wav", ".json")
        metadata_path = f"{user_id}/{metadata_name}"

        ref_text = ""
        duration_sec = 0.0
        created_at = 0
        original_filename = ""

        try:
            metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
            async with httpx.AsyncClient(timeout=5.0) as client:
                resp = await client.get(metadata_url)
                if resp.status_code == 200:
                    metadata = resp.json()
                    ref_text = metadata.get("ref_text", "")
                    duration_sec = metadata.get("duration_sec", 0.0)
                    created_at = metadata.get("created_at", 0)
                    original_filename = metadata.get("original_filename", "")
        except Exception as e:
            logger.debug(f"读取 metadata 失败: {e}")
            try:
                # Fall back to the timestamp prefix of the storage name
                created_at = int(name.split("_")[0])
            except ValueError:
                pass

        signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)

        display_name = original_filename if original_filename else name
        if not display_name or display_name == name:
            match = re.match(r'^\d+_(.+)$', name)
            if match:
                display_name = match.group(1)

        return RefAudioResponse(
            id=storage_path,
            name=display_name,
            path=signed_url,
            ref_text=ref_text,
            duration_sec=duration_sec,
            created_at=created_at
        )

    items = await asyncio.gather(*[fetch_audio_info(f) for f in wav_files])
    items = sorted(items, key=lambda x: x.created_at, reverse=True)

    return RefAudioListResponse(items=items).model_dump()


async def delete_ref_audio(audio_id: str, user_id: str) -> None:
    """Delete a reference audio file and its metadata."""
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权删除此文件")

    await storage_service.delete_file(BUCKET_REF_AUDIOS, audio_id)

    metadata_path = audio_id.replace(".wav", ".json")
    try:
        await storage_service.delete_file(BUCKET_REF_AUDIOS, metadata_path)
    except Exception:
        # The metadata file may be missing; the audio itself is already gone
        pass


async def rename_ref_audio(audio_id: str, new_name: str, user_id: str) -> dict:
    """Rename a reference audio file (updates the display name stored in the metadata)."""
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权修改此文件")

    new_name = new_name.strip()
    if not new_name:
        raise ValueError("新名称不能为空")

    if not Path(new_name).suffix:
        new_name += ".wav"

    # Download the existing metadata
    metadata_path = audio_id.replace(".wav", ".json")
    try:
        metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
        async with httpx.AsyncClient() as client:
            resp = await client.get(metadata_url)
            if resp.status_code == 200:
                metadata = resp.json()
            else:
                raise Exception(f"Failed to fetch metadata: {resp.status_code}")
    except Exception as e:
        logger.warning(f"无法读取元数据: {e}, 将创建新的元数据")
        metadata = {
            "ref_text": "",
            "duration_sec": 0.0,
            "created_at": int(time.time()),
            "original_filename": new_name
        }

    # Update the metadata and overwrite the stored copy
    metadata["original_filename"] = new_name
    await storage_service.upload_file(
        bucket=BUCKET_REF_AUDIOS,
        path=metadata_path,
        file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
        content_type="application/json"
    )

    return {"name": new_name}


async def retranscribe_ref_audio(audio_id: str, user_id: str) -> dict:
    """Re-transcribe the ref_text of a reference audio file, trimming to the first 10 seconds and re-uploading (used to migrate old data)."""
    if not audio_id.startswith(f"{user_id}/"):
        raise PermissionError("无权修改此文件")

    # Download the audio to a temporary file
    audio_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, audio_id)
    tmp_wav_path = None
    trimmed_path = None
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
            tmp_wav_path = tmp.name
            timeout = httpx.Timeout(None)
            async with httpx.AsyncClient(timeout=timeout) as client:
                async with client.stream("GET", audio_url) as resp:
                    resp.raise_for_status()
                    async for chunk in resp.aiter_bytes():
                        tmp.write(chunk)

        # Longer than 10 seconds: trim to the first 10 seconds and re-upload the audio
        MAX_REF_DURATION = 10.0
        duration = _get_audio_duration(tmp_wav_path)
        transcribe_path = tmp_wav_path
        need_reupload = False

        if duration > MAX_REF_DURATION:
            cut_point = _find_silence_cut_point(tmp_wav_path, MAX_REF_DURATION)
            logger.info(f"Retranscribe: trimming {audio_id} from {duration:.1f}s at {cut_point:.1f}s")
            trimmed_path = tmp_wav_path + "_trimmed.wav"
            if _convert_to_wav(tmp_wav_path, trimmed_path, max_duration=cut_point):
                transcribe_path = trimmed_path
                duration = _get_audio_duration(trimmed_path)
                need_reupload = True

        # Whisper transcription
        from app.services.whisper_service import whisper_service
        transcribed = await whisper_service.transcribe(transcribe_path)
        if not transcribed or not transcribed.strip():
            raise ValueError("无法识别音频内容")

        ref_text = transcribed.strip()
        logger.info(f"Re-transcribed ref audio {audio_id}: {ref_text[:50]}...")

        # Re-upload the trimmed audio over the original file
        if need_reupload and trimmed_path:
            with open(trimmed_path, "rb") as f:
                await storage_service.upload_file(
                    bucket=BUCKET_REF_AUDIOS, path=audio_id,
                    file_data=f.read(), content_type="audio/wav",
                )
            logger.info(f"Re-uploaded trimmed audio: {audio_id} ({duration:.1f}s)")

        # Update the metadata
        metadata_path = audio_id.replace(".wav", ".json")
        try:
            meta_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
            async with httpx.AsyncClient(timeout=5.0) as client:
                resp = await client.get(meta_url)
                if resp.status_code == 200:
                    metadata = resp.json()
                else:
                    raise Exception(f"status {resp.status_code}")
        except Exception:
            metadata = {}

        metadata["ref_text"] = ref_text
        metadata["duration_sec"] = duration
        await storage_service.upload_file(
            bucket=BUCKET_REF_AUDIOS,
            path=metadata_path,
            file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
            content_type="application/json"
        )

        return {"ref_text": ref_text, "duration_sec": duration}
    finally:
        if tmp_wav_path and os.path.exists(tmp_wav_path):
            os.unlink(tmp_wav_path)
        if trimmed_path and os.path.exists(trimmed_path):
            os.unlink(trimmed_path)
0
backend/app/modules/tools/__init__.py
Normal file
126
backend/app/modules/tools/router.py
Normal file
@@ -0,0 +1,126 @@
from fastapi import APIRouter, Depends, UploadFile, File, Form, HTTPException
from typing import Optional
from urllib.parse import urlparse
import traceback
from loguru import logger
from pydantic import BaseModel, Field, field_validator

from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.tools import service
from app.services import creator_scraper
from app.services.creator_scraper import ALLOWED_INPUT_DOMAINS
from app.services.glm_service import glm_service

router = APIRouter()


class AnalyzeCreatorRequest(BaseModel):
    url: str = Field(..., description="博主主页链接(仅支持抖音/B站 https 链接)")

    @field_validator("url")
    @classmethod
    def validate_url_format(cls, value: str) -> str:
        candidate = value.strip()
        if len(candidate) > 500:
            raise ValueError("链接过长")

        parsed = urlparse(candidate)
        if parsed.scheme != "https":
            raise ValueError("仅支持 https 链接")

        hostname = (parsed.hostname or "").lower()
        if hostname not in ALLOWED_INPUT_DOMAINS:
            raise ValueError(f"不支持的域名: {hostname},仅支持抖音和B站")

        return candidate


class GenerateTopicScriptRequest(BaseModel):
    analysis_id: str = Field(..., min_length=8, max_length=80, description="分析结果ID")
    topic: str = Field(..., min_length=2, max_length=30, description="选中的话题(2-30字)")
    word_count: int = Field(..., ge=80, le=1000, description="目标字数(80-1000)")


@router.post("/extract-script")
async def extract_script_tool(
    file: Optional[UploadFile] = File(None),
    url: Optional[str] = Form(None),
    rewrite: bool = Form(True),
    custom_prompt: Optional[str] = Form(None),
    current_user: dict = Depends(get_current_user),
):
    """Standalone script-extraction tool."""
    try:
        result = await service.extract_script(file=file, url=url, rewrite=rewrite, custom_prompt=custom_prompt)
        return success_response(result)
    except ValueError as e:
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Tool extract failed: {e}")
        logger.error(traceback.format_exc())
        msg = str(e)
        if "Fresh cookies" in msg:
            raise HTTPException(500, "下载失败:目标平台开启了反爬验证,请过段时间重试或直接上传视频文件。")
        raise HTTPException(500, "文案提取失败,请稍后重试")


@router.post("/analyze-creator")
async def analyze_creator(
    req: AnalyzeCreatorRequest,
    current_user: dict = Depends(get_current_user),
):
    """Analyze a creator's content and return trending topics."""
    try:
        user_id = str(current_user.get("id") or "").strip()
        if not user_id:
            raise HTTPException(401, "登录状态无效,请重新登录")

        creator_result = await creator_scraper.scrape_creator_titles(req.url, user_id=user_id)
        titles = creator_result.get("titles") or []
        topics = await glm_service.analyze_topics(titles)

        analysis_id = creator_scraper.cache_titles(titles, user_id)

        return success_response({
            "platform": creator_result.get("platform", ""),
            "creator_name": creator_result.get("creator_name", ""),
            "topics": topics,
            "analysis_id": analysis_id,
            "fetched_count": creator_result.get("fetched_count", len(titles)),
        })
    except ValueError as e:
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Analyze creator failed: {e}")
        logger.error(traceback.format_exc())
        raise HTTPException(500, "博主内容分析失败,请稍后重试")


@router.post("/generate-topic-script")
async def generate_topic_script(
    req: GenerateTopicScriptRequest,
    current_user: dict = Depends(get_current_user),
):
    """Generate a script from the selected topic."""
    try:
        user_id = str(current_user.get("id") or "").strip()
        if not user_id:
            raise HTTPException(401, "登录状态无效,请重新登录")

        titles = creator_scraper.get_cached_titles(req.analysis_id, user_id)
        script = await glm_service.generate_script_from_topic(req.topic, req.word_count, titles)

        return success_response({"script": script})
    except ValueError as e:
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Generate topic script failed: {e}")
        logger.error(traceback.format_exc())
        raise HTTPException(500, "文案生成失败,请稍后重试")
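Review note: the `validate_url_format` validator in `router.py` allowlists hosts by exact match on the parsed hostname, after enforcing a length cap and the `https` scheme. The sketch below restates that check as a standalone function so it can be exercised without FastAPI. The allowlist contents and the name `validate_creator_url` are assumptions for illustration; the real `ALLOWED_INPUT_DOMAINS` lives in `creator_scraper` and is not shown in this diff.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration only; the real ALLOWED_INPUT_DOMAINS
# is defined in app.services.creator_scraper and may differ.
ALLOWED_INPUT_DOMAINS = {"www.douyin.com", "v.douyin.com", "space.bilibili.com"}


def validate_creator_url(value: str) -> str:
    """Return the stripped URL, or raise ValueError if it fails the checks."""
    candidate = value.strip()
    if len(candidate) > 500:
        raise ValueError("URL too long")

    parsed = urlparse(candidate)
    if parsed.scheme != "https":
        raise ValueError("only https links are supported")

    # Exact hostname match: a look-alike such as "www.douyin.com.evil.example"
    # is rejected because it is not literally in the allowlist.
    hostname = (parsed.hostname or "").lower()
    if hostname not in ALLOWED_INPUT_DOMAINS:
        raise ValueError(f"unsupported domain: {hostname}")

    return candidate
```

Matching on `parsed.hostname` rather than on a substring of the raw URL is what keeps look-alike hosts and credentials embedded in the URL from slipping through.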
7
backend/app/modules/tools/schemas.py
Normal file
@@ -0,0 +1,7 @@
from pydantic import BaseModel
from typing import Optional


class ExtractScriptResponse(BaseModel):
    original_script: Optional[str] = None
    rewritten_script: Optional[str] = None
406
backend/app/modules/tools/service.py
Normal file
@@ -0,0 +1,406 @@
import asyncio
import os
import re
import json
import time
import shutil
import subprocess
import traceback
from pathlib import Path
from typing import Optional, Any
from urllib.parse import unquote, parse_qs, urlparse

import httpx
from loguru import logger

from app.services.whisper_service import whisper_service
from app.services.glm_service import glm_service


async def extract_script(file=None, url: Optional[str] = None, rewrite: bool = True, custom_prompt: Optional[str] = None) -> dict:
    """
    Script extraction: uploaded file or video link -> Whisper transcription -> (optional) GLM rewrite
    """
    if not file and not url:
        raise ValueError("必须提供文件或视频链接")

    temp_path = None
    try:
        timestamp = int(time.time())
        temp_dir = Path("/tmp")
        if os.name == 'nt':
            temp_dir = Path("d:/tmp")
        temp_dir.mkdir(parents=True, exist_ok=True)

        # We are inside a coroutine, so the running loop is the one we want
        loop = asyncio.get_running_loop()

        # 1. Fetch / save the file
        if file:
            filename = file.filename
            if not filename:
                raise ValueError("文件名无效")
            safe_filename = Path(filename).name.replace(" ", "_")
            temp_path = temp_dir / f"tool_extract_{timestamp}_{safe_filename}"
            max_bytes = 500 * 1024 * 1024  # 500MB
            total_written = 0
            with open(temp_path, "wb") as dst:
                while True:
                    chunk = file.file.read(1024 * 1024)
                    if not chunk:
                        break
                    total_written += len(chunk)
                    if total_written > max_bytes:
                        dst.close()
                        os.remove(temp_path)
                        raise ValueError("上传文件大小不能超过 500MB")
                    dst.write(chunk)
            logger.info(f"Tool processing upload file: {temp_path}")
        else:
            temp_path = await _download_video(url, temp_dir, timestamp)

        if not temp_path or not temp_path.exists():
            raise ValueError("文件获取失败")

        # Check the downloaded file size (500 MB limit)
        max_download_bytes = 500 * 1024 * 1024
        file_size = temp_path.stat().st_size
        if file_size > max_download_bytes:
            os.remove(temp_path)
            raise ValueError(f"下载的文件过大({file_size / (1024*1024):.0f}MB),上限 500MB")

        # 1.5 Safe conversion: force-convert to 16 kHz WAV
        audio_path = temp_dir / f"extract_audio_{timestamp}.wav"
        try:
            await loop.run_in_executor(None, lambda: _convert_to_wav(temp_path, audio_path))
            logger.info(f"Converted to WAV: {audio_path}")
        except ValueError as ve:
            if str(ve) == "HTML_DETECTED":
                raise ValueError("下载的文件是网页而非视频,请重试或手动上传。")
            else:
                raise ValueError("下载的文件已损坏或格式无法识别。")

        # 2. Extract the script (Whisper)
        script = await whisper_service.transcribe(str(audio_path))

        # 3. AI rewrite (GLM); on failure, fall back to returning the original text
        rewritten = None
        if rewrite and script and len(script.strip()) > 0:
            logger.info("Rewriting script...")
            try:
                rewritten = await glm_service.rewrite_script(script, custom_prompt)
            except Exception as e:
                logger.warning(f"GLM rewrite failed, returning original script: {e}")
                rewritten = None

        return {
            "original_script": script,
            "rewritten_script": rewritten
        }

    finally:
        if temp_path and temp_path.exists():
            try:
                os.remove(temp_path)
                logger.info(f"Cleaned up temp file: {temp_path}")
            except Exception as e:
                logger.warning(f"Failed to cleanup temp file {temp_path}: {e}")


def _convert_to_wav(input_path: Path, output_path: Path) -> None:
    """Convert to 16 kHz mono WAV with FFmpeg."""
    try:
        convert_cmd = [
            'ffmpeg',
            '-i', str(input_path),
            '-vn',
            '-acodec', 'pcm_s16le',
            '-ar', '16000',
            '-ac', '1',
            '-y',
            str(output_path)
        ]
        subprocess.run(convert_cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        error_log = e.stderr.decode('utf-8', errors='ignore') if e.stderr else str(e)
        logger.error(f"FFmpeg check/convert failed: {error_log}")
        # Peek at the first bytes to tell an HTML error page from a broken media file
        head = b""
        try:
            with open(input_path, 'rb') as f:
                head = f.read(100)
        except OSError:
            pass
        if b'<!DOCTYPE html' in head or b'<html' in head:
            raise ValueError("HTML_DETECTED")
        raise ValueError("CONVERT_FAILED")


async def _download_video(url: str, temp_dir: Path, timestamp: int) -> Path:
    """Download a video (yt-dlp first; fall back to manual parsing on failure)."""
    url_value = url
    url_match = re.search(r'https?://[^\s]+', url_value)
    if url_match:
        extracted_url = url_match.group(0)
        logger.info(f"Extracted URL from text: {extracted_url}")
        url_value = extracted_url

    logger.info(f"Tool downloading URL: {url_value}")
    loop = asyncio.get_running_loop()

    # Try yt-dlp first
    try:
        temp_path = await loop.run_in_executor(None, lambda: _download_yt_dlp(url_value, temp_dir, timestamp))
        logger.info(f"yt-dlp downloaded to: {temp_path}")
        return temp_path
    except Exception as e:
        logger.warning(f"yt-dlp download failed: {e}. Trying manual fallback...")

        if "douyin" in url_value:
            manual_path = await _download_douyin_manual(url_value, temp_dir, timestamp)
            if manual_path:
                return manual_path
            raise ValueError(f"视频下载失败。yt-dlp 报错: {str(e)}")
        elif "bilibili" in url_value:
            manual_path = await _download_bilibili_manual(url_value, temp_dir, timestamp)
            if manual_path:
                return manual_path
            raise ValueError(f"视频下载失败。yt-dlp 报错: {str(e)}")
        else:
            raise ValueError(f"视频下载失败: {str(e)}")


def _download_yt_dlp(url_value: str, temp_dir: Path, timestamp: int) -> Path:
    """Download with yt-dlp (blocking call; run it in a thread pool)."""
    import yt_dlp
    logger.info("Attempting download with yt-dlp...")

    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': str(temp_dir / f"tool_download_{timestamp}_%(id)s.%(ext)s"),
        'quiet': True,
        'no_warnings': True,
        'http_headers': {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
            'Referer': 'https://www.douyin.com/',
        }
    }

    with yt_dlp.YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url_value, download=True)
        if 'requested_downloads' in info:
            downloaded_file = info['requested_downloads'][0]['filepath']
        else:
            ext = info.get('ext', 'mp4')
            vid_id = info.get('id')
            downloaded_file = str(temp_dir / f"tool_download_{timestamp}_{vid_id}.{ext}")

        return Path(downloaded_file)


async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
    """Manually download a Douyin video (fallback): get the playback URL from the mobile share page."""
    logger.info(f"[douyin-fallback] Starting download for: {url}")

    try:
        # 1. Resolve the short link and extract the video ID
        headers = {
            "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15"
        }

        async with httpx.AsyncClient(follow_redirects=True, timeout=10.0) as client:
            resp = await client.get(url, headers=headers)
            final_url = str(resp.url)

        logger.info(f"[douyin-fallback] Final URL: {final_url}")

        video_id = _extract_douyin_video_id(final_url)
        if not video_id:
            video_id = _extract_douyin_video_id(url)

        if not video_id:
            logger.error("[douyin-fallback] Could not extract video_id")
            return None

        logger.info(f"[douyin-fallback] Extracted video_id: {video_id}")

        # 2. Obtain a fresh ttwid
        ttwid = ""
        try:
            async with httpx.AsyncClient(timeout=10.0) as client:
                ttwid_resp = await client.post(
                    "https://ttwid.bytedance.com/ttwid/union/register/",
                    json={
                        "region": "cn", "aid": 6383, "needFid": False,
                        "service": "www.douyin.com",
                        "migrate_info": {"ticket": "", "source": "node"},
                        "cbUrlProtocol": "https", "union": True,
                    }
                )
                fresh_ttwid = ttwid_resp.cookies.get("ttwid")
                ttwid = str(fresh_ttwid) if fresh_ttwid else ""
                logger.info(f"[douyin-fallback] Got fresh ttwid (len={len(ttwid)})")
        except Exception as e:
            logger.warning(f"[douyin-fallback] Failed to get ttwid: {e}")

        # 3. Fetch the mobile share page to extract the playback URL
        page_headers = {
            "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
            "cookie": f"ttwid={ttwid}" if ttwid else "",
        }

        async with httpx.AsyncClient(follow_redirects=True, timeout=15.0) as client:
            page_resp = await client.get(
                f"https://m.douyin.com/share/video/{video_id}",
                headers=page_headers,
            )

        page_text = page_resp.text
        logger.info(f"[douyin-fallback] Mobile page length: {len(page_text)}")

        # 4. Extract play_addr
        addr_match = re.search(
            r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"',
            page_text,
        )
        if not addr_match:
            logger.error("[douyin-fallback] Could not find play_addr in mobile page")
            return None

        video_url = addr_match.group(2).replace(r"\u002F", "/")
        if video_url.startswith("//"):
            video_url = "https:" + video_url

        logger.info(f"[douyin-fallback] Found video URL: {video_url[:80]}...")

        # 5. Download the video
        temp_path = temp_dir / f"douyin_manual_{timestamp}.mp4"
        download_headers = {
            "Referer": "https://www.douyin.com/",
            "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
        }

        async with httpx.AsyncClient(timeout=120.0, follow_redirects=True) as client:
            async with client.stream("GET", video_url, headers=download_headers) as dl_resp:
                if dl_resp.status_code == 200:
                    with open(temp_path, "wb") as f:
                        async for chunk in dl_resp.aiter_bytes(chunk_size=8192):
                            f.write(chunk)

                    logger.info(f"[douyin-fallback] Downloaded successfully: {temp_path}")
                    return temp_path
                else:
                    logger.error(f"[douyin-fallback] Download failed: {dl_resp.status_code}")
                    return None

    except Exception as e:
        logger.error(f"[douyin-fallback] Logic failed: {e}")
        return None


def _extract_douyin_video_id(candidate_url: str) -> Optional[str]:
    """Extract the video ID from a Douyin URL; handles the video, share/video, modal_id, vid and similar forms."""
    if not candidate_url:
        return None

    decoded_url = unquote(candidate_url)
    parsed = urlparse(decoded_url)

    for source in (decoded_url, parsed.path):
        for pattern in (r"/video/(\d+)", r"/share/video/(\d+)"):
            match = re.search(pattern, source)
            if match:
                return match.group(1)

    id_keys = ("modal_id", "vid", "video_id", "aweme_id", "item_id")
    for pairs in (parse_qs(parsed.query), parse_qs(parsed.fragment)):
        for key in id_keys:
            values = pairs.get(key, [])
            for value in values:
                match = re.search(r"(\d+)", value)
                if match:
                    return match.group(1)

    inline_match = re.search(
        r"(?:[?&#](?:modal_id|vid|video_id|aweme_id|item_id)=)(\d+)",
        decoded_url,
    )
    if inline_match:
        return inline_match.group(1)

    return None


async def _download_bilibili_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
    """Manually download a Bilibili video (Playwright fallback)."""
    from playwright.async_api import async_playwright

    logger.info(f"[Playwright] Starting Bilibili download for: {url}")

    playwright = None
    browser = None
    try:
        playwright = await async_playwright().start()
        browser = await playwright.chromium.launch(headless=True, args=['--no-sandbox', '--disable-setuid-sandbox'])

        context = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        )

        page = await context.new_page()

        logger.info("[Playwright] Navigating to Bilibili...")
        await page.goto(url, timeout=45000)

        try:
            await page.wait_for_selector('video', timeout=15000)
        except Exception:
            logger.warning("[Playwright] Video selector timeout")

        playinfo = await page.evaluate("window.__playinfo__")

        audio_url = None

        if playinfo and "data" in playinfo and "dash" in playinfo["data"]:
            dash = playinfo["data"]["dash"]
            if "audio" in dash and dash["audio"]:
                audio_url = dash["audio"][0]["baseUrl"]
                logger.info(f"[Playwright] Found audio stream in __playinfo__: {audio_url[:50]}...")

        if not audio_url:
            logger.warning("[Playwright] Could not find audio in __playinfo__")
            return None

        temp_path = temp_dir / f"bilibili_audio_{timestamp}.m4s"

        try:
            api_request = context.request
            headers = {
                "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
                "Referer": "https://www.bilibili.com/"
            }

            logger.info("[Playwright] Downloading audio stream...")
            response = await api_request.get(audio_url, headers=headers)

            if response.status == 200:
                body = await response.body()
                with open(temp_path, 'wb') as f:
                    f.write(body)

                logger.info(f"[Playwright] Downloaded successfully: {temp_path}")
                return temp_path
            else:
                logger.error(f"[Playwright] API Request failed: {response.status}")
                return None

        except Exception as e:
            logger.error(f"[Playwright] Download logic error: {e}")
            return None

    except Exception as e:
        logger.error(f"[Playwright] Bilibili download failed: {e}")
        return None
    finally:
        if browser:
            await browser.close()
        if playwright:
            await playwright.stop()
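Review note: the upload branch of `extract_script` copies the file in 1 MB chunks and aborts once a 500 MB cap is exceeded. The sketch below isolates that pattern as a helper so the cap can be tested with small inputs. The name `copy_with_limit` and its parameters are hypothetical and not part of this change; it also removes the partial file only after the `with` block has closed it, which is one possible way to avoid the explicit `dst.close()` call.

```python
import io
import os
import tempfile


def copy_with_limit(src, dst_path, max_bytes, chunk_size=1024 * 1024):
    """Copy a binary stream to dst_path, raising ValueError past max_bytes.

    Hypothetical helper for illustration. Returns the number of bytes written.
    """
    total = 0
    try:
        with open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
                if total > max_bytes:
                    raise ValueError("file exceeds size limit")
                dst.write(chunk)
    except ValueError:
        # The file is already closed here, so removal also works on Windows.
        if os.path.exists(dst_path):
            os.remove(dst_path)
        raise
    return total


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "upload.bin")
    written = copy_with_limit(io.BytesIO(b"x" * 10), target, max_bytes=10, chunk_size=4)
    print(written)
```

Counting bytes before writing each chunk means the file on disk never grows past the cap, at the cost of discarding the chunk that crossed it.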
0
backend/app/modules/videos/__init__.py
Normal file
206
backend/app/modules/videos/router.py
Normal file
@@ -0,0 +1,206 @@
import os
import re
import tempfile
import uuid

from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
from fastapi.responses import FileResponse
from loguru import logger
from starlette.background import BackgroundTask

from app.core.deps import get_current_user
from app.core.response import success_response
from app.services.tts_service import TTSService

from .schemas import GenerateRequest, VoicePreviewRequest
from .task_store import create_task, get_task, list_tasks
from .workflow import process_video_generation, get_lipsync_health, get_voiceclone_health
from .service import list_generated_videos, delete_generated_video, delete_all_generated_videos
from app.modules.generated_audios.service import delete_all_generated_audios
from app.services.storage import storage_service


router = APIRouter()

PREVIEW_TEXTS = {
    "zh-CN": "你好,请选择你喜欢的音色吧。",
    "en-US": "Hello, please choose the voice you like.",
    "ja-JP": "こんにちは。お好きな音声を選んでください。",
    "ko-KR": "안녕하세요, 마음에 드는 음성을 선택해 주세요.",
    "fr-FR": "Bonjour, veuillez choisir la voix que vous preferez.",
    "de-DE": "Hallo, bitte waehlen Sie die Stimme, die Ihnen gefaellt.",
    "es-ES": "Hola, por favor elige la voz que mas te guste.",
    "ru-RU": "Zdravstvuite, pozhaluista, vyberite golos, kotoryi vam nravitsya.",
    "it-IT": "Ciao, scegli la voce che preferisci.",
    "pt-BR": "Ola, escolha a voz de que voce mais gosta.",
}


def _cleanup_temp_file(path: str) -> None:
    try:
        os.unlink(path)
    except Exception:
        pass


def _get_voice_locale(voice: str) -> str:
    parts = voice.split("-")
    if len(parts) >= 2:
        return f"{parts[0]}-{parts[1]}"
    return "zh-CN"


def _get_preview_text_for_voice(voice: str) -> str:
    locale = _get_voice_locale(voice)
    return PREVIEW_TEXTS.get(locale, PREVIEW_TEXTS["zh-CN"])


async def _render_voice_preview(voice: str, text: str) -> FileResponse:
    tmp_file = tempfile.NamedTemporaryFile(prefix="voice_preview_", suffix=".mp3", delete=False)
    output_path = tmp_file.name
    tmp_file.close()

    tts = TTSService()
    try:
        await tts.generate_audio(text=text, voice=voice, output_path=output_path)
    except Exception as e:
        _cleanup_temp_file(output_path)
        logger.error(f"音色试听生成失败: voice={voice}, error={e}")
        raise HTTPException(status_code=500, detail="音色试听生成失败,请稍后重试")

    return FileResponse(
        path=output_path,
        media_type="audio/mpeg",
        filename="voice_preview.mp3",
        background=BackgroundTask(_cleanup_temp_file, output_path),
    )


@router.post("/generate")
async def generate_video(
    req: GenerateRequest,
    background_tasks: BackgroundTasks,
    current_user: dict = Depends(get_current_user)
):
    user_id = current_user["id"]
    task_id = str(uuid.uuid4())
    create_task(task_id, user_id)
    background_tasks.add_task(process_video_generation, task_id, req, user_id)
    return success_response({"task_id": task_id})


@router.get("/tasks/{task_id}")
async def get_task_status(task_id: str, current_user: dict = Depends(get_current_user)):
    task = get_task(task_id)
    # Verify task ownership: users can only view their own tasks
    if task.get("status") != "not_found" and task.get("user_id") != current_user["id"]:
        return success_response({"status": "not_found"})
    return success_response(task)


@router.get("/tasks")
async def list_tasks_view(current_user: dict = Depends(get_current_user)):
    # Return only the current user's tasks
    all_tasks = list_tasks()
    user_tasks = [t for t in all_tasks if t.get("user_id") == current_user["id"]]
    return success_response({"tasks": user_tasks})


@router.get("/lipsync/health")
async def lipsync_health():
    return success_response(await get_lipsync_health())


@router.get("/voiceclone/health")
async def voiceclone_health():
    return success_response(await get_voiceclone_health())


@router.post("/cleanup")
async def cleanup_workspace(current_user: dict = Depends(get_current_user)):
    user_id = current_user["id"]

    videos_deleted, videos_failed = await delete_all_generated_videos(user_id)
    audios_deleted, audios_failed = await delete_all_generated_audios(user_id)

    if videos_failed > 0 or audios_failed > 0:
        raise HTTPException(
            status_code=500,
            detail=(
                f"工作区清理不完整:视频删除失败 {videos_failed} 个,"
                f"配音删除失败 {audios_failed} 个,请重试"
            ),
        )

    return success_response({
        "videos_deleted": videos_deleted,
        "audios_deleted": audios_deleted,
    }, message="工作区已清理")


@router.get("/generated")
async def list_generated(current_user: dict = Depends(get_current_user)):
    return success_response(await list_generated_videos(current_user["id"]))


@router.get("/generated/{video_id}/download")
async def download_generated(video_id: str, current_user: dict = Depends(get_current_user)):
    if not re.match(r'^[A-Za-z0-9_-]+$', video_id):
        raise HTTPException(status_code=400, detail="非法 video_id")
    user_id = current_user["id"]
    storage_path = f"{user_id}/{video_id}.mp4"
    local_path = storage_service.get_local_file_path(
        bucket=storage_service.BUCKET_OUTPUTS,
        path=storage_path,
    )
    if not local_path or not os.path.exists(local_path):
        raise HTTPException(status_code=404, detail="视频文件不存在")
    return FileResponse(
        path=local_path,
        media_type="video/mp4",
        filename=f"{video_id}.mp4",
        headers={"Content-Disposition": f'attachment; filename="{video_id}.mp4"'},
    )


@router.delete("/generated/{video_id}")
async def delete_generated(video_id: str, current_user: dict = Depends(get_current_user)):
    if not re.match(r'^[A-Za-z0-9_-]+$', video_id):
        raise HTTPException(status_code=400, detail="非法 video_id")
    result = await delete_generated_video(current_user["id"], video_id)
    return success_response(result, message="视频已删除")


@router.post("/voice-preview")
async def preview_voice_post(
    req: VoicePreviewRequest,
    current_user: dict = Depends(get_current_user),
):
    # Reuse the shared auth dependency; the endpoint itself does not need user_id
    _ = current_user

    voice = req.voice.strip()
    text = req.text.strip()

    if not voice:
        raise HTTPException(status_code=400, detail="voice 不能为空")
    if not text:
        raise HTTPException(status_code=400, detail="text 不能为空")

    return await _render_voice_preview(voice=voice, text=text)


@router.get("/voice-preview")
async def preview_voice_get(
    voice: str,
    current_user: dict = Depends(get_current_user),
):
    # Reuse the shared auth dependency; the endpoint itself does not need user_id
    _ = current_user

    voice_value = voice.strip()
    if not voice_value:
        raise HTTPException(status_code=400, detail="voice 不能为空")

    text = _get_preview_text_for_voice(voice_value)
    return await _render_voice_preview(voice=voice_value, text=text)
46
backend/app/modules/videos/schemas.py
Normal file
@@ -0,0 +1,46 @@
from pydantic import BaseModel, Field
from typing import Optional, List, Literal


class CustomAssignment(BaseModel):
    material_path: str
    start: float  # Start on the audio timeline
    end: float  # End on the audio timeline
    source_start: float = 0.0  # Clip start in the source video
    source_end: Optional[float] = None  # Clip end in the source video (optional)


class GenerateRequest(BaseModel):
    text: str
    voice: str = "zh-CN-YunxiNeural"
    material_path: str
    material_paths: Optional[List[str]] = None
    tts_mode: str = "edgetts"
    ref_audio_id: Optional[str] = None
    ref_text: Optional[str] = None
    language: str = "zh-CN"
    generated_audio_id: Optional[str] = None  # Pre-generated voiceover ID (inline TTS is skipped when present)
    title: Optional[str] = None
    title_display_mode: Literal["short", "persistent"] = "short"
    title_duration: float = 4.0
    enable_subtitles: bool = True
    subtitle_style_id: Optional[str] = None
    title_style_id: Optional[str] = None
    secondary_title: Optional[str] = None
    secondary_title_style_id: Optional[str] = None
    secondary_title_font_size: Optional[int] = None
    secondary_title_top_margin: Optional[int] = None
    subtitle_font_size: Optional[int] = None
    title_font_size: Optional[int] = None
    title_top_margin: Optional[int] = None
    subtitle_bottom_margin: Optional[int] = None
    bgm_id: Optional[str] = None
    bgm_volume: Optional[float] = 0.2
    custom_assignments: Optional[List[CustomAssignment]] = None
    output_aspect_ratio: Literal["9:16", "16:9"] = "9:16"
    lipsync_model: Literal["default", "fast", "advanced"] = "default"


class VoicePreviewRequest(BaseModel):
    voice: str
    text: str = Field(..., min_length=1, max_length=120)
117
backend/app/modules/videos/service.py
Normal file
@@ -0,0 +1,117 @@
from fastapi import HTTPException
import asyncio
from pathlib import Path
from loguru import logger

from app.services.storage import storage_service


async def list_generated_videos(user_id: str) -> dict:
    """List the current user's generated videos from Storage."""
    try:
        files_obj = await storage_service.list_files(
            bucket=storage_service.BUCKET_OUTPUTS,
            path=user_id
        )

        semaphore = asyncio.Semaphore(8)

        async def build_item(f):
            name = f.get("name")
            if not name or name == ".emptyFolderPlaceholder":
                return None

            if not name.endswith("_output.mp4"):
                return None

            video_id = Path(name).stem
            full_path = f"{user_id}/{name}"

            async with semaphore:
                signed_url = await storage_service.get_signed_url(
                    bucket=storage_service.BUCKET_OUTPUTS,
                    path=full_path
                )

            metadata = f.get("metadata", {})
            size = metadata.get("size", 0)
            created_at_str = f.get("created_at", "")
            created_at = 0
            if created_at_str:
                from datetime import datetime
                try:
                    dt = datetime.fromisoformat(created_at_str.replace("Z", "+00:00"))
                    created_at = int(dt.timestamp())
                except Exception:
                    pass

            return {
                "id": video_id,
                "name": name,
                "path": signed_url,
                "size_mb": size / (1024 * 1024),
                "created_at": created_at
            }

        tasks = [build_item(f) for f in files_obj]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        videos = []
        for item in results:
            if not item:
                continue
            if isinstance(item, Exception):
                logger.warning(f"Signed url build failed: {item}")
                continue
            videos.append(item)

        # created_at is an int timestamp, so the fallback must be an int as well
        videos.sort(key=lambda x: x.get("created_at", 0), reverse=True)
        return {"videos": videos}

    except Exception as e:
        logger.error(f"List generated videos failed: {e}")
        return {"videos": []}


async def delete_all_generated_videos(user_id: str) -> tuple[int, int]:
    """Delete all of the user's generated videos; returns (deleted count, failed count)."""
    try:
        files = await storage_service.list_files(
            bucket=storage_service.BUCKET_OUTPUTS,
            path=user_id,
            strict=True,
        )
        deleted_count = 0
        failed_count = 0
        for f in files:
            name = f.get("name")
            if not name or name == ".emptyFolderPlaceholder":
                continue
            full_path = f"{user_id}/{name}"
            try:
                await storage_service.delete_file(
                    bucket=storage_service.BUCKET_OUTPUTS,
                    path=full_path
                )
                deleted_count += 1
            except Exception as e:
                failed_count += 1
                logger.warning(f"Delete file failed: {full_path}, {e}")
        return deleted_count, failed_count
    except Exception as e:
        logger.error(f"Delete all generated videos failed: {e}")
        return 0, 1


async def delete_generated_video(user_id: str, video_id: str) -> dict:
|
||||
"""删除生成的视频"""
|
||||
try:
|
||||
storage_path = f"{user_id}/{video_id}.mp4"
|
||||
|
||||
await storage_service.delete_file(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path
|
||||
)
|
||||
return {"video_id": video_id}
|
||||
except Exception as e:
|
||||
raise HTTPException(500, f"删除失败: {str(e)}")
|
||||
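The listing function above signs URLs concurrently while capping the number of in-flight requests with a semaphore. A minimal standalone sketch of that pattern follows; `fetch_all` and `fake_sign` are hypothetical names introduced for illustration only, and `fake_sign` merely stands in for a signed-URL request rather than calling any real storage API.

```python
import asyncio


async def fetch_all(keys, worker, limit=8):
    """Run worker(key) for every key, with at most `limit` calls in flight.

    With return_exceptions=True a failure is returned as an exception object
    in that key's slot, so one bad key does not abort the whole batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(key):
        async with semaphore:
            return await worker(key)

    return await asyncio.gather(*(guarded(k) for k in keys), return_exceptions=True)


async def fake_sign(key):
    # Hypothetical stand-in for a signed-URL request.
    await asyncio.sleep(0)
    if key == "bad":
        raise ValueError("cannot sign")
    return f"https://example.invalid/{key}"


results = asyncio.run(fetch_all(["a.mp4", "bad", "b.mp4"], fake_sign, limit=2))
# Keep only the successful results, in their original order.
urls = [r for r in results if not isinstance(r, Exception)]
```

Because `asyncio.gather` preserves input order, the caller can still pair each result with the key it came from.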
124
backend/app/modules/videos/task_store.py
Normal file
@@ -0,0 +1,124 @@
|
||||
from typing import Any, Dict, List
import json

from loguru import logger
from app.core.config import settings

try:
    import redis
except Exception:  # pragma: no cover - optional dependency
    redis = None


class InMemoryTaskStore:
    """Process-local fallback store, used when Redis is unavailable."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Dict[str, Any]] = {}

    def create(self, task_id: str, user_id: str) -> Dict[str, Any]:
        task = {
            "status": "pending",
            "task_id": task_id,
            "progress": 0,
            "user_id": user_id,
        }
        self._tasks[task_id] = task
        return task

    def get(self, task_id: str) -> Dict[str, Any]:
        return self._tasks.get(task_id, {"status": "not_found"})

    def list(self) -> List[Dict[str, Any]]:
        return list(self._tasks.values())

    def update(self, task_id: str, updates: Dict[str, Any]) -> Dict[str, Any]:
        task = self._tasks.get(task_id)
        if not task:
            task = {"status": "pending", "task_id": task_id}
            self._tasks[task_id] = task
        task.update(updates)
        return task


class RedisTaskStore:
    """Redis-backed store: one JSON value per task plus a set that indexes task IDs."""

    def __init__(self, client: "redis.Redis") -> None:
        self._client = client
        self._index_key = "vigent:tasks:index"

    def _key(self, task_id: str) -> str:
        return f"vigent:tasks:{task_id}"

    def create(self, task_id: str, user_id: str) -> Dict[str, Any]:
        task = {
            "status": "pending",
            "task_id": task_id,
            "progress": 0,
            "user_id": user_id,
        }
        # New tasks expire after 24 hours.
        self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False), ex=86400)
        self._client.sadd(self._index_key, task_id)
        return task

    def get(self, task_id: str) -> Dict[str, Any]:
        raw = self._client.get(self._key(task_id))
        if not raw:
            return {"status": "not_found"}
        return json.loads(raw)

    def list(self) -> List[Dict[str, Any]]:
        task_ids = list(self._client.smembers(self._index_key) or [])
        if not task_ids:
            return []
        keys = [self._key(task_id) for task_id in task_ids]
        raw_items = self._client.mget(keys)
        tasks = []
        expired = []
        for task_id, raw in zip(task_ids, raw_items):
            if raw is None:
                # The task key has expired; remember the ID so the index can be pruned.
                expired.append(task_id)
                continue
            try:
                tasks.append(json.loads(raw))
            except Exception:
                continue
        if expired:
            self._client.srem(self._index_key, *expired)
        return tasks

    def update(self, task_id: str, updates: Dict[str, Any]) -> Dict[str, Any]:
        task = self.get(task_id)
        if task.get("status") == "not_found":
            task = {"status": "pending", "task_id": task_id}
        task.update(updates)
        # Finished tasks are kept for 2 hours; unfinished ones for 24 hours.
        ttl = 7200 if task.get("status") in ("completed", "failed") else 86400
        self._client.set(self._key(task_id), json.dumps(task, ensure_ascii=False), ex=ttl)
        self._client.sadd(self._index_key, task_id)
        return task


def _build_task_store():
    """Return a Redis-backed store when Redis is reachable, otherwise the in-memory fallback."""
    if redis is None:
        logger.warning("Redis not available, using in-memory task store")
        return InMemoryTaskStore()
    try:
        client = redis.Redis.from_url(settings.REDIS_URL, decode_responses=True)
        client.ping()
        logger.info("Using Redis task store")
        return RedisTaskStore(client)
    except Exception as e:
        logger.warning(f"Redis connection failed, using in-memory task store: {e}")
        return InMemoryTaskStore()


task_store = _build_task_store()


def create_task(task_id: str, user_id: str) -> Dict[str, Any]:
    return task_store.create(task_id, user_id)


def get_task(task_id: str) -> Dict[str, Any]:
    return task_store.get(task_id)


def list_tasks() -> List[Dict[str, Any]]:
    return task_store.list()
|
||||
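`RedisTaskStore.list` prunes its index lazily: task keys expire on their own TTL, and IDs whose key has disappeared are removed from the index set the next time the list is read. The sketch below illustrates that idea without a Redis server. `FakeKV` and `list_tasks_pruning` are hypothetical names, `FakeKV` implements only the few commands the sketch needs, and expiry is simulated by deleting a key by hand.

```python
import json


class FakeKV:
    """Tiny in-memory stand-in for the few Redis commands used below (no real TTL)."""

    def __init__(self):
        self.values = {}
        self.sets = {}

    def set(self, key, value):
        self.values[key] = value

    def mget(self, keys):
        return [self.values.get(k) for k in keys]

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def smembers(self, key):
        return set(self.sets.get(key, set()))

    def srem(self, key, *members):
        self.sets.get(key, set()).difference_update(members)


def list_tasks_pruning(kv, index_key="tasks:index"):
    """Return live tasks and drop index entries whose value no longer exists."""
    ids = sorted(kv.smembers(index_key))
    raws = kv.mget([f"tasks:{i}" for i in ids])
    alive, expired = [], []
    for task_id, raw in zip(ids, raws):
        if raw is None:
            expired.append(task_id)
        else:
            alive.append(json.loads(raw))
    if expired:
        kv.srem(index_key, *expired)
    return alive


kv = FakeKV()
for tid in ("t1", "t2"):
    kv.set(f"tasks:{tid}", json.dumps({"task_id": tid, "status": "pending"}))
    kv.sadd("tasks:index", tid)

del kv.values["tasks:t1"]  # simulate t1's key expiring
tasks = list_tasks_pruning(kv)
```

The trade-off of this design is that the index can temporarily hold stale IDs between reads, in exchange for not needing a background cleanup job.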
846
backend/app/modules/videos/workflow.py
Normal file
@@ -0,0 +1,846 @@
|
||||
from typing import Optional, Any, List
|
||||
from pathlib import Path
|
||||
import asyncio
|
||||
import time
|
||||
import traceback
|
||||
import httpx
|
||||
from loguru import logger
|
||||
|
||||
from app.core.config import settings
|
||||
from app.services.tts_service import TTSService
|
||||
from app.services.video_service import VideoService
|
||||
from app.services.lipsync_service import LipSyncService
|
||||
from app.services.voice_clone_service import voice_clone_service
|
||||
from app.services.assets_service import (
|
||||
get_style,
|
||||
get_default_style,
|
||||
resolve_bgm_path,
|
||||
prepare_style_for_remotion,
|
||||
)
|
||||
from app.services.storage import storage_service
|
||||
from app.services.whisper_service import whisper_service
|
||||
from app.services.remotion_service import remotion_service
|
||||
|
||||
from .schemas import GenerateRequest
|
||||
from .task_store import task_store
|
||||
|
||||
# Global concurrency limit: run at most 2 video generation tasks at the same time
|
||||
_generation_semaphore = asyncio.Semaphore(2)
|
||||
|
||||
|
||||
def _locale_to_whisper_lang(locale: str) -> str:
|
||||
"""'en-US' → 'en', 'zh-CN' → 'zh'"""
|
||||
return locale.split("-")[0] if "-" in locale else locale
|
||||
|
||||
|
||||
def _locale_to_tts_lang(locale: str) -> str:
    """'zh-CN' → 'Chinese', 'en-US' → 'English', anything else → 'Auto'"""
    mapping = {"zh": "Chinese", "en": "English"}
    return mapping.get(locale.split("-")[0], "Auto")
|
||||
|
||||
|
||||
_lipsync_service: Optional[LipSyncService] = None
_lipsync_ready: Optional[bool] = None
_lipsync_last_check: float = 0


def _get_lipsync_service() -> LipSyncService:
    """Get or create the LipSync service instance (singleton, avoids repeated initialization)."""
    global _lipsync_service
    if _lipsync_service is None:
        _lipsync_service = LipSyncService()
    return _lipsync_service


async def _check_lipsync_ready(force: bool = False) -> bool:
    """Check whether LipSync is ready (cached; not re-checked within 5 minutes)."""
    global _lipsync_ready, _lipsync_last_check

    now = time.time()
    if not force and _lipsync_ready is not None and (now - _lipsync_last_check) < 300:
        return bool(_lipsync_ready)

    lipsync = _get_lipsync_service()
    health = await lipsync.check_health()
    _lipsync_ready = health.get("ready", False)
    _lipsync_last_check = now
    print(f"[LipSync] Health check: ready={_lipsync_ready}")
    return bool(_lipsync_ready)
|
||||
|
||||
|
||||
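The readiness check above caches its last result for 300 seconds and can be bypassed with `force=True`. The same idea can be sketched without module-level globals; `CachedCheck` is a hypothetical helper introduced for illustration, and the probe is a dummy coroutine, not the real LipSync health call.

```python
import asyncio
import time


class CachedCheck:
    """Cache the boolean result of an async probe for `ttl` seconds."""

    def __init__(self, probe, ttl=300.0, clock=time.monotonic):
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._value = None      # None means "never checked"
        self._checked_at = 0.0

    async def get(self, force=False):
        now = self._clock()
        fresh = self._value is not None and (now - self._checked_at) < self._ttl
        if fresh and not force:
            return self._value
        self._value = bool(await self._probe())
        self._checked_at = now
        return self._value


async def demo():
    calls = []

    async def probe():
        # Dummy probe: records that it ran and reports "ready".
        calls.append(1)
        return True

    check = CachedCheck(probe, ttl=300.0)
    first = await check.get()             # runs the probe
    second = await check.get()            # served from the cache
    third = await check.get(force=True)   # bypasses the cache
    return first, second, third, len(calls)


outcome = asyncio.run(demo())
```

Using a monotonic clock here is a deliberate choice for the sketch: it is unaffected by wall-clock adjustments, whereas the original code uses `time.time()`.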
async def _download_material(path_or_url: str, temp_path: Path):
    """Download a material to a temporary file (streamed, to save memory)."""
    if path_or_url.startswith("http"):
        timeout = httpx.Timeout(None)
        async with httpx.AsyncClient(timeout=timeout) as client:
            async with client.stream("GET", path_or_url) as resp:
                resp.raise_for_status()
                with open(temp_path, "wb") as f:
                    async for chunk in resp.aiter_bytes():
                        f.write(chunk)
    else:
        src = Path(path_or_url)
        if not src.is_absolute():
            src = settings.BASE_DIR.parent / path_or_url

        if src.exists():
            import shutil
            shutil.copy(src, temp_path)
        else:
            raise FileNotFoundError(f"Material not found: {path_or_url}")
|
||||
|
||||
|
||||
def _update_task(task_id: str, **updates: Any) -> None:
|
||||
task_store.update(task_id, updates)
|
||||
|
||||
|
||||
async def _run_blocking(func, *args):
    """Run a blocking function in the thread pool so it does not stall the event loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)
|
||||
|
||||
|
||||
# ── Multi-material helpers ──


def _split_equal(segments: List[dict], material_paths: List[str]) -> List[dict]:
    """Split the audio duration evenly across the materials, snapping each cut to the nearest Whisper word boundary.

    Args:
        segments: Segment list produced by Whisper; each segment contains words (word-level timestamps).
        material_paths: List of material paths.

    Returns:
        [{"material_path": "...", "start": 0.0, "end": 5.2, "index": 0}, ...]
    """
    # Flatten all Whisper words
    all_chars: List[dict] = []
    for seg in segments:
        for w in seg.get("words", []):
            all_chars.append(w)

    n = len(material_paths)

    if not all_chars or n == 0:
        return [{"material_path": material_paths[0] if material_paths else "",
                 "start": 0.0, "end": 99999.0, "index": 0}]

    # There cannot be more materials than words, otherwise boundaries would repeat
    if n > len(all_chars):
        logger.warning(f"[MultiMat] 素材数({n}) > 字符数({len(all_chars)}),裁剪为 {len(all_chars)}")
        n = len(all_chars)

    total_start = all_chars[0]["start"]
    total_end = all_chars[-1]["end"]
    seg_dur = (total_end - total_start) / n

    # Compute the N-1 split points, each snapped to the nearest word boundary
    boundaries = [0]  # the first segment starts at word 0
    for i in range(1, n):
        target_time = total_start + i * seg_dur
        # Find the word whose start time is closest to target_time
        best_idx = boundaries[-1] + 1  # at least one word past the previous boundary
        best_diff = float("inf")
        for j in range(boundaries[-1] + 1, len(all_chars)):
            diff = abs(all_chars[j]["start"] - target_time)
            if diff < best_diff:
                best_diff = diff
                best_idx = j
            elif diff > best_diff:
                break  # times are increasing, so stop once the difference starts to grow
        boundaries.append(min(best_idx, len(all_chars) - 1))
    boundaries.append(len(all_chars))  # the last segment runs to the end

    # Build the assignments from the boundaries
    assignments: List[dict] = []
    for i in range(n):
        s_idx = boundaries[i]
        e_idx = boundaries[i + 1]
        if s_idx >= len(all_chars) or s_idx >= e_idx:
            continue
        assignments.append({
            "material_path": material_paths[i],
            "start": all_chars[s_idx]["start"],
            "end": all_chars[e_idx - 1]["end"],
            "text": "".join(c["word"] for c in all_chars[s_idx:e_idx]),
            "index": len(assignments),
        })

    if not assignments:
        return [{"material_path": material_paths[0], "start": 0.0, "end": 99999.0, "index": 0}]

    logger.info(f"[MultiMat] 均分 {len(all_chars)} 字为 {len(assignments)} 段")
    for a in assignments:
        dur = a["end"] - a["start"]
        logger.info(f" 段{a['index']}: [{a['start']:.2f}-{a['end']:.2f}s] ({dur:.1f}s) {a['text'][:20]}")

    return assignments
|
||||
|
||||
|
||||
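To make the boundary snapping in `_split_equal` concrete, here is a reduced sketch that works on word start times only. `snap_boundaries` is a hypothetical helper, not part of the module. It assumes start times are sorted in ascending order and that `n` does not exceed the number of words.

```python
def snap_boundaries(word_starts, total_end, n):
    """Return n + 1 word indexes that cut the words into n near-equal time spans.

    Each ideal cut time (an equal share of the total duration) is snapped to the
    word whose start time is closest, always moving at least one word forward.
    """
    total_start = word_starts[0]
    seg_dur = (total_end - total_start) / n
    boundaries = [0]
    for i in range(1, n):
        target = total_start + i * seg_dur
        lo = boundaries[-1] + 1
        # min() returns the first index on ties, like a strict "<" comparison would.
        best = min(
            range(lo, len(word_starts)),
            key=lambda j: abs(word_starts[j] - target),
            default=lo,
        )
        boundaries.append(min(best, len(word_starts) - 1))
    boundaries.append(len(word_starts))
    return boundaries


# Seven words over 3.0 s of audio, split across three materials.
starts = [0.0, 0.2, 0.4, 1.1, 1.3, 2.1, 2.8]
boundaries = snap_boundaries(starts, total_end=3.0, n=3)
```

With ideal cuts at 1.0 s and 2.0 s, the nearest word starts are 1.1 s (index 3) and 2.1 s (index 5), so the three segments cover words 0-2, 3-4 and 5-6.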
async def process_video_generation(task_id: str, req: GenerateRequest, user_id: str):
|
||||
_update_task(task_id, message="排队中...")
|
||||
async with _generation_semaphore:
|
||||
await _process_video_generation_inner(task_id, req, user_id)
|
||||
|
||||
|
||||
async def _process_video_generation_inner(task_id: str, req: GenerateRequest, user_id: str):
|
||||
temp_files = []
|
||||
try:
|
||||
start_time = time.time()
|
||||
|
||||
# ── Determine the material list ──
|
||||
material_paths: List[str] = []
|
||||
if req.custom_assignments and len(req.custom_assignments) > 1:
|
||||
material_paths = [a.material_path for a in req.custom_assignments if a.material_path]
|
||||
elif req.material_paths and len(req.material_paths) > 1:
|
||||
material_paths = req.material_paths
|
||||
else:
|
||||
material_paths = [req.material_path]
|
||||
|
||||
is_multi = len(material_paths) > 1
|
||||
target_resolution = (1080, 1920) if req.output_aspect_ratio == "9:16" else (1920, 1080)
|
||||
|
||||
logger.info(
|
||||
f"[Render] 输出画面比例: {req.output_aspect_ratio}, "
|
||||
f"目标分辨率: {target_resolution[0]}x{target_resolution[1]}"
|
||||
)
|
||||
|
||||
_update_task(task_id, status="processing", progress=5, message="正在下载素材...")
|
||||
|
||||
temp_dir = settings.UPLOAD_DIR / "temp"
|
||||
temp_dir.mkdir(parents=True, exist_ok=True)
|
||||
video = VideoService()
|
||||
input_material_path: Optional[Path] = None
|
||||
|
||||
# Single-material mode: download the primary material
|
||||
if not is_multi:
|
||||
input_material_path = temp_dir / f"{task_id}_input.mp4"
|
||||
temp_files.append(input_material_path)
|
||||
await _download_material(material_paths[0], input_material_path)
|
||||
|
||||
# Normalize rotation metadata (e.g. iPhone MOV 1920x1080 + rotation=-90)
|
||||
normalized_input_path = temp_dir / f"{task_id}_input_norm.mp4"
|
||||
normalized_result = await _run_blocking(
|
||||
video.normalize_orientation,
|
||||
str(input_material_path),
|
||||
str(normalized_input_path),
|
||||
)
|
||||
if normalized_result != str(input_material_path):
|
||||
temp_files.append(normalized_input_path)
|
||||
input_material_path = normalized_input_path
|
||||
|
||||
_update_task(task_id, message="正在生成语音...", progress=10)
|
||||
|
||||
audio_path = temp_dir / f"{task_id}_audio.wav"
|
||||
temp_files.append(audio_path)
|
||||
|
||||
if req.generated_audio_id:
|
||||
# New flow: use the pre-generated voice-over
|
||||
_update_task(task_id, message="正在下载配音...", progress=12)
|
||||
audio_url = await storage_service.get_signed_url(
|
||||
bucket="generated-audios",
|
||||
path=req.generated_audio_id,
|
||||
)
|
||||
await _download_material(audio_url, audio_path)
|
||||
|
||||
# Read the language from the metadata
|
||||
meta_path = req.generated_audio_id.replace("_audio.wav", "_audio.json")
|
||||
try:
|
||||
meta_url = await storage_service.get_signed_url(
|
||||
bucket="generated-audios", path=meta_path,
|
||||
)
|
||||
import httpx as _httpx
|
||||
async with _httpx.AsyncClient(timeout=5.0) as client:
|
||||
resp = await client.get(meta_url)
|
||||
if resp.status_code == 200:
|
||||
meta = resp.json()
|
||||
req.language = meta.get("language", req.language)
|
||||
# Always overwrite the script with the voice-over metadata text so the subtitles match the voice-over language
|
||||
meta_text = meta.get("text", "")
|
||||
if meta_text:
|
||||
req.text = meta_text
|
||||
except Exception as e:
|
||||
logger.warning(f"读取配音元数据失败: {e}")
|
||||
|
||||
elif req.tts_mode == "voiceclone":
|
||||
if not req.ref_audio_id or not req.ref_text:
|
||||
raise ValueError("声音克隆模式需要提供参考音频和参考文字")
|
||||
|
||||
_update_task(task_id, message="正在下载参考音频...")
|
||||
|
||||
ref_audio_local = temp_dir / f"{task_id}_ref.wav"
|
||||
temp_files.append(ref_audio_local)
|
||||
|
||||
ref_audio_url = await storage_service.get_signed_url(
|
||||
bucket="ref-audios",
|
||||
path=req.ref_audio_id
|
||||
)
|
||||
await _download_material(ref_audio_url, ref_audio_local)
|
||||
|
||||
_update_task(task_id, message="正在克隆声音...")
|
||||
await voice_clone_service.generate_audio(
|
||||
text=req.text,
|
||||
ref_audio_path=str(ref_audio_local),
|
||||
ref_text=req.ref_text,
|
||||
output_path=str(audio_path),
|
||||
language=_locale_to_tts_lang(req.language)
|
||||
)
|
||||
else:
|
||||
_update_task(task_id, message="正在生成语音 (EdgeTTS)...")
|
||||
tts = TTSService()
|
||||
await tts.generate_audio(req.text, req.voice, str(audio_path))
|
||||
|
||||
tts_time = time.time() - start_time
|
||||
print(f"[Pipeline] TTS completed in {tts_time:.1f}s")
|
||||
|
||||
lipsync = _get_lipsync_service()
|
||||
lipsync_video_path = temp_dir / f"{task_id}_lipsync.mp4"
|
||||
temp_files.append(lipsync_video_path)
|
||||
|
||||
captions_path = None
|
||||
|
||||
async def _whisper_and_split():
"""Whisper alignment → _split_equal divides the audio across materials (shared logic)."""
|
||||
_update_task(task_id, message="正在生成字幕 (Whisper)...")
|
||||
_captions_path = temp_dir / f"{task_id}_captions.json"
|
||||
temp_files.append(_captions_path)
|
||||
captions_data = None
|
||||
try:
|
||||
captions_data = await whisper_service.align(
|
||||
audio_path=str(audio_path),
|
||||
text=req.text,
|
||||
output_path=str(_captions_path),
|
||||
language=_locale_to_whisper_lang(req.language),
|
||||
original_text=req.text,
|
||||
)
|
||||
print(f"[Pipeline] Whisper alignment completed (multi-material)")
|
||||
except Exception as e:
|
||||
logger.warning(f"Whisper alignment failed: {e}")
|
||||
_captions_path = None
|
||||
|
||||
_update_task(task_id, progress=15, message="正在分配素材...")
|
||||
|
||||
if captions_data and captions_data.get("segments"):
|
||||
result = _split_equal(captions_data["segments"], material_paths)
|
||||
else:
|
||||
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
|
||||
audio_dur = await _run_blocking(video._get_duration, str(audio_path))
|
||||
if audio_dur <= 0:
|
||||
audio_dur = 30.0
|
||||
seg_dur = audio_dur / len(material_paths)
|
||||
result = [
|
||||
{"material_path": material_paths[i], "start": i * seg_dur,
|
||||
"end": (i + 1) * seg_dur, "index": i}
|
||||
for i in range(len(material_paths))
|
||||
]
|
||||
return result, _captions_path
|
||||
|
||||
if is_multi:
|
||||
# ══════════════════════════════════════
|
||||
# Multi-material pipeline
|
||||
# ══════════════════════════════════════
|
||||
_update_task(task_id, progress=12, message="正在分配素材...")
|
||||
|
||||
if req.custom_assignments and len(req.custom_assignments) == len(material_paths):
|
||||
# User-defined assignments: skip the Whisper-based equal split
|
||||
assignments = [
|
||||
{
|
||||
"material_path": a.material_path,
|
||||
"start": a.start,
|
||||
"end": a.end,
|
||||
"source_start": a.source_start,
|
||||
"source_end": a.source_end,
|
||||
"index": i,
|
||||
}
|
||||
for i, a in enumerate(req.custom_assignments)
|
||||
]
|
||||
# Whisper is still needed to generate subtitles (if enabled)
|
||||
captions_path = temp_dir / f"{task_id}_captions.json"
|
||||
temp_files.append(captions_path)
|
||||
if req.enable_subtitles:
|
||||
_update_task(task_id, message="正在生成字幕 (Whisper)...")
|
||||
try:
|
||||
await whisper_service.align(
|
||||
audio_path=str(audio_path),
|
||||
text=req.text,
|
||||
output_path=str(captions_path),
|
||||
language=_locale_to_whisper_lang(req.language),
|
||||
original_text=req.text,
|
||||
)
|
||||
print(f"[Pipeline] Whisper alignment completed (custom assignments)")
|
||||
except Exception as e:
|
||||
logger.warning(f"Whisper alignment failed: {e}")
|
||||
captions_path = None
|
||||
else:
|
||||
captions_path = None
|
||||
elif req.custom_assignments:
|
||||
logger.warning(
|
||||
f"[MultiMat] custom_assignments 数量({len(req.custom_assignments)})"
|
||||
f" 与素材数量({len(material_paths)})不一致,回退自动分配"
|
||||
)
|
||||
|
||||
assignments, captions_path = await _whisper_and_split()
|
||||
|
||||
else:
|
||||
assignments, captions_path = await _whisper_and_split()
|
||||
|
||||
# Stretch the segments to cover the full audio: the first starts at 0, the last ends at the end of the audio
|
||||
audio_duration = await _run_blocking(video._get_duration, str(audio_path))
|
||||
if assignments and audio_duration > 0:
|
||||
assignments[0]["start"] = 0.0
|
||||
assignments[-1]["end"] = audio_duration
|
||||
|
||||
num_segments = len(assignments)
|
||||
print(f"[Pipeline] Multi-material: {num_segments} segments, {len(material_paths)} materials")
|
||||
|
||||
if num_segments == 0:
|
||||
raise RuntimeError("Multi-material: no valid segments after splitting")
|
||||
|
||||
lipsync_start = time.time()
|
||||
|
||||
# ── Step 1: download all materials in parallel and detect their resolution ──
|
||||
material_locals: List[Path] = []
|
||||
resolutions = []
|
||||
|
||||
async def _download_and_normalize(i: int, assignment: dict):
"""Download a single material and normalize its orientation."""
|
||||
material_local = temp_dir / f"{task_id}_material_{i}.mp4"
|
||||
temp_files.append(material_local)
|
||||
await _download_material(assignment["material_path"], material_local)
|
||||
|
||||
normalized_material = temp_dir / f"{task_id}_material_{i}_norm.mp4"
|
||||
normalized_result = await _run_blocking(
|
||||
video.normalize_orientation,
|
||||
str(material_local),
|
||||
str(normalized_material),
|
||||
)
|
||||
if normalized_result != str(material_local):
|
||||
temp_files.append(normalized_material)
|
||||
material_local = normalized_material
|
||||
|
||||
res = await _run_blocking(video.get_resolution, str(material_local))
|
||||
return material_local, res
|
||||
|
||||
download_tasks = [
|
||||
_download_and_normalize(i, assignment)
|
||||
for i, assignment in enumerate(assignments)
|
||||
]
|
||||
download_results = await asyncio.gather(*download_tasks)
|
||||
for local, res in download_results:
|
||||
material_locals.append(local)
|
||||
resolutions.append(res)
|
||||
|
||||
# Unify the resolution according to the aspect ratio chosen by the user
|
||||
base_res = target_resolution
|
||||
need_scale = any(r != base_res for r in resolutions)
|
||||
if need_scale:
|
||||
logger.info(f"[MultiMat] 素材分辨率不一致,统一到 {base_res[0]}x{base_res[1]}")
|
||||
|
||||
# ── Step 2: trim each material segment to its target duration in parallel ──
|
||||
prepared_segments: List[Optional[Path]] = [None] * num_segments
|
||||
|
||||
async def _prepare_one_segment(i: int, assignment: dict):
"""Trim or loop a single material to its target duration."""
|
||||
seg_dur = assignment["end"] - assignment["start"]
|
||||
prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
|
||||
temp_files.append(prepared_path)
|
||||
prepare_target_res = None if resolutions[i] == base_res else base_res
|
||||
|
||||
await _run_blocking(
|
||||
video.prepare_segment,
|
||||
str(material_locals[i]),
|
||||
seg_dur,
|
||||
str(prepared_path),
|
||||
prepare_target_res,
|
||||
assignment.get("source_start", 0.0),
|
||||
assignment.get("source_end"),
|
||||
25,
|
||||
)
|
||||
return i, prepared_path
|
||||
|
||||
_update_task(
|
||||
task_id,
|
||||
progress=15,
|
||||
message=f"正在并行准备 {num_segments} 个素材片段..."
|
||||
)
|
||||
|
||||
prepare_tasks = [
|
||||
_prepare_one_segment(i, assignment)
|
||||
for i, assignment in enumerate(assignments)
|
||||
]
|
||||
prepare_results = await asyncio.gather(*prepare_tasks)
|
||||
for i, path in prepare_results:
|
||||
prepared_segments[i] = path
|
||||
|
||||
# ── Step 3: concatenate all material segments ──
|
||||
_update_task(task_id, progress=50, message="正在拼接素材片段...")
|
||||
concat_path = temp_dir / f"{task_id}_concat.mp4"
|
||||
temp_files.append(concat_path)
|
||||
prepared_segment_paths = [str(p) for p in prepared_segments if p is not None]
|
||||
if len(prepared_segment_paths) != num_segments:
|
||||
raise RuntimeError("Multi-material: prepared segments mismatch")
|
||||
await _run_blocking(
|
||||
video.concat_videos,
|
||||
prepared_segment_paths,
|
||||
str(concat_path),
|
||||
25,
|
||||
)
|
||||
|
||||
# ── Step 4: a single LatentSync inference pass ──
|
||||
is_ready = await _check_lipsync_ready()
|
||||
|
||||
if is_ready:
|
||||
_update_task(task_id, progress=55, message="正在合成唇形 (LatentSync)...")
|
||||
print(f"[LipSync] Multi-material: single LatentSync on concatenated video")
|
||||
try:
|
||||
await lipsync.generate(
|
||||
str(concat_path),
|
||||
str(audio_path),
|
||||
str(lipsync_video_path),
|
||||
model_mode=req.lipsync_model,
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"[LipSync] Failed, fallback to concat without lipsync: {e}")
|
||||
import shutil
|
||||
shutil.copy(str(concat_path), str(lipsync_video_path))
|
||||
else:
|
||||
print(f"[LipSync] Not ready, using concatenated video without lipsync")
|
||||
import shutil
|
||||
shutil.copy(str(concat_path), str(lipsync_video_path))
|
||||
|
||||
lipsync_time = time.time() - lipsync_start
|
||||
print(f"[Pipeline] Multi-material prepare + concat + LipSync completed in {lipsync_time:.1f}s")
|
||||
_update_task(task_id, progress=80)
|
||||
|
||||
# If the user disabled subtitles, clear captions_path (Whisper was only used for sentence splitting)
|
||||
if not req.enable_subtitles:
|
||||
captions_path = None
|
||||
|
||||
else:
|
||||
# ══════════════════════════════════════
|
||||
# Single-material pipeline (original logic)
|
||||
# ══════════════════════════════════════
|
||||
|
||||
if input_material_path is None:
|
||||
raise RuntimeError("单素材流程缺少输入素材")
|
||||
|
||||
# Single material: scale to the target resolution for the chosen aspect ratio and apply source_start
|
||||
single_source_start = 0.0
|
||||
single_source_end = None
|
||||
if req.custom_assignments and len(req.custom_assignments) == 1:
|
||||
single_source_start = req.custom_assignments[0].source_start
|
||||
single_source_end = req.custom_assignments[0].source_end
|
||||
|
||||
_update_task(task_id, progress=20, message="正在准备素材片段...")
|
||||
audio_dur = await _run_blocking(video._get_duration, str(audio_path))
|
||||
if audio_dur <= 0:
|
||||
audio_dur = 30.0
|
||||
single_res = await _run_blocking(video.get_resolution, str(input_material_path))
|
||||
single_target_res = None if single_res == target_resolution else target_resolution
|
||||
prepared_single_path = temp_dir / f"{task_id}_prepared_single.mp4"
|
||||
temp_files.append(prepared_single_path)
|
||||
await _run_blocking(
|
||||
video.prepare_segment,
|
||||
str(input_material_path),
|
||||
audio_dur,
|
||||
str(prepared_single_path),
|
||||
single_target_res,
|
||||
single_source_start,
|
||||
single_source_end,
|
||||
None,
|
||||
)
|
||||
input_material_path = prepared_single_path
|
||||
|
||||
_update_task(task_id, progress=25)
|
||||
_update_task(task_id, message="正在合成唇形 (LatentSync)...", progress=30)
|
||||
|
||||
lipsync_start = time.time()
|
||||
is_ready = await _check_lipsync_ready()
|
||||
|
||||
if is_ready:
|
||||
print(f"[LipSync] Starting LatentSync inference...")
|
||||
_update_task(task_id, progress=35, message="正在运行 LatentSync 推理...")
|
||||
try:
|
||||
await lipsync.generate(
|
||||
str(input_material_path),
|
||||
str(audio_path),
|
||||
str(lipsync_video_path),
|
||||
model_mode=req.lipsync_model,
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"[LipSync] Failed on single-material, fallback to prepared video: {e}")
|
||||
_update_task(task_id, message="唇形同步失败,使用原始视频...")
|
||||
import shutil
|
||||
shutil.copy(str(input_material_path), str(lipsync_video_path))
|
||||
else:
|
||||
print(f"[LipSync] LatentSync not ready, copying original video")
|
||||
_update_task(task_id, message="唇形同步不可用,使用原始视频...")
|
||||
import shutil
|
||||
shutil.copy(str(input_material_path), lipsync_video_path)
|
||||
|
||||
lipsync_time = time.time() - lipsync_start
|
||||
print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
|
||||
_update_task(task_id, progress=80)
|
||||
|
||||
# Single-material mode: Whisper is deferred and runs below in parallel with the BGM mix
|
||||
if not req.enable_subtitles:
|
||||
captions_path = None
|
||||
|
||||
_update_task(task_id, progress=85)
|
||||
|
||||
# ── Whisper subtitles + BGM mixing in parallel (both depend only on audio_path) ──
|
||||
final_audio_path = audio_path
|
||||
_whisper_task = None
|
||||
_bgm_task = None
|
||||
mix_output_path: Optional[Path] = None
|
||||
|
||||
# In single-material mode Whisper has not run yet; start it here in parallel with the BGM mix
|
||||
need_whisper = not is_multi and req.enable_subtitles and captions_path is None
|
||||
if need_whisper:
|
||||
captions_path = temp_dir / f"{task_id}_captions.json"
|
||||
temp_files.append(captions_path)
|
||||
_captions_path_str = str(captions_path)
|
||||
|
||||
async def _run_whisper():
|
||||
_update_task(task_id, message="正在生成字幕 (Whisper)...", progress=82)
|
||||
try:
|
||||
await whisper_service.align(
|
||||
audio_path=str(audio_path),
|
||||
text=req.text,
|
||||
output_path=_captions_path_str,
|
||||
language=_locale_to_whisper_lang(req.language),
|
||||
original_text=req.text,
|
||||
)
|
||||
print(f"[Pipeline] Whisper alignment completed")
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.warning(f"Whisper alignment failed, skipping subtitles: {e}")
|
||||
return False
|
||||
|
||||
_whisper_task = _run_whisper()
|
||||
|
||||
if req.bgm_id:
|
||||
bgm_path = resolve_bgm_path(req.bgm_id)
|
||||
if bgm_path:
|
||||
mix_output_path = temp_dir / f"{task_id}_audio_mix.wav"
|
||||
temp_files.append(mix_output_path)
|
||||
volume = req.bgm_volume if req.bgm_volume is not None else 0.2
|
||||
volume = max(0.0, min(float(volume), 1.0))
|
||||
_mix_output = str(mix_output_path)
|
||||
_bgm_path = str(bgm_path)
|
||||
_voice_path = str(audio_path)
|
||||
_volume = volume
|
||||
|
||||
async def _run_bgm():
|
||||
_update_task(task_id, message="正在合成背景音乐...", progress=86)
|
||||
try:
|
||||
await _run_blocking(
|
||||
video.mix_audio,
|
||||
_voice_path,
|
||||
_bgm_path,
|
||||
_mix_output,
|
||||
_volume,
|
||||
)
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.warning(f"BGM mix failed, fallback to voice only: {e}")
|
||||
return False
|
||||
|
||||
_bgm_task = _run_bgm()
|
||||
else:
|
||||
logger.warning(f"BGM not found: {req.bgm_id}")
|
||||
|
||||
# Wait for Whisper and BGM in parallel
|
||||
parallel_tasks = [t for t in (_whisper_task, _bgm_task) if t is not None]
|
||||
if parallel_tasks:
|
||||
results = await asyncio.gather(*parallel_tasks)
|
||||
result_idx = 0
|
||||
if _whisper_task is not None:
|
||||
if not results[result_idx]:
|
||||
captions_path = None
|
||||
result_idx += 1
|
||||
if _bgm_task is not None:
|
||||
if results[result_idx] and mix_output_path is not None:
|
||||
final_audio_path = mix_output_path
|
||||
|
||||
|
||||
use_remotion = (captions_path and captions_path.exists()) or req.title or req.secondary_title
|
||||
|
||||
subtitle_style = None
|
||||
title_style = None
|
||||
secondary_title_style = None
|
||||
if req.enable_subtitles:
|
||||
subtitle_style = get_style("subtitle", req.subtitle_style_id) or get_default_style("subtitle")
|
||||
if req.title:
|
||||
title_style = get_style("title", req.title_style_id) or get_default_style("title")
|
||||
if req.secondary_title:
|
||||
secondary_title_style = get_style("title", req.secondary_title_style_id) or get_default_style("title")
|
||||
|
||||
if req.subtitle_font_size and req.enable_subtitles:
|
||||
if subtitle_style is None:
|
||||
subtitle_style = {}
|
||||
subtitle_style["font_size"] = int(req.subtitle_font_size)
|
||||
|
||||
if req.title_font_size and req.title:
|
||||
if title_style is None:
|
||||
title_style = {}
|
||||
title_style["font_size"] = int(req.title_font_size)
|
||||
|
||||
if req.title_top_margin is not None and req.title:
|
||||
if title_style is None:
|
||||
title_style = {}
|
||||
title_style["top_margin"] = int(req.title_top_margin)
|
||||
|
||||
if req.subtitle_bottom_margin is not None and req.enable_subtitles:
|
||||
if subtitle_style is None:
|
||||
subtitle_style = {}
|
||||
subtitle_style["bottom_margin"] = int(req.subtitle_bottom_margin)
|
||||
|
||||
if req.secondary_title_font_size and req.secondary_title:
|
||||
if secondary_title_style is None:
|
||||
secondary_title_style = {}
|
||||
secondary_title_style["font_size"] = int(req.secondary_title_font_size)
|
||||
|
||||
if req.secondary_title_top_margin is not None and req.secondary_title:
|
||||
if secondary_title_style is None:
|
||||
secondary_title_style = {}
|
||||
secondary_title_style["top_margin"] = int(req.secondary_title_top_margin)
|
||||
|
||||
if use_remotion:
|
||||
subtitle_style = prepare_style_for_remotion(
|
||||
subtitle_style,
|
||||
temp_dir,
|
||||
f"{task_id}_subtitle_font"
|
||||
)
|
||||
title_style = prepare_style_for_remotion(
|
||||
title_style,
|
||||
temp_dir,
|
||||
f"{task_id}_title_font"
|
||||
)
|
||||
secondary_title_style = prepare_style_for_remotion(
|
||||
secondary_title_style,
|
||||
temp_dir,
|
||||
f"{task_id}_secondary_title_font"
|
||||
)
|
||||
|
||||
# Queue the temporary font files for cleanup
|
||||
for prefix in [f"{task_id}_subtitle_font", f"{task_id}_title_font", f"{task_id}_secondary_title_font"]:
|
||||
for ext in [".ttf", ".otf", ".woff", ".woff2"]:
|
||||
font_tmp = temp_dir / f"{prefix}{ext}"
|
||||
if font_tmp.exists():
|
||||
temp_files.append(font_tmp)
|
||||
|
||||
final_output_local_path = temp_dir / f"{task_id}_output.mp4"
|
||||
temp_files.append(final_output_local_path)
|
||||
needs_audio_compose = str(final_audio_path) != str(audio_path)
|
||||
|
||||
if use_remotion:
|
||||
_update_task(task_id, message="正在合成视频 (Remotion)...", progress=87)
|
||||
remotion_input_path = lipsync_video_path
|
||||
|
||||
if needs_audio_compose:
|
||||
composed_video_path = temp_dir / f"{task_id}_composed.mp4"
|
||||
temp_files.append(composed_video_path)
|
||||
await video.compose(str(lipsync_video_path), str(final_audio_path), str(composed_video_path))
|
||||
remotion_input_path = composed_video_path
|
||||
else:
|
||||
logger.info("[Pipeline] Audio unchanged, skip pre-Remotion compose")
|
||||
|
||||
remotion_health = await remotion_service.check_health()
|
||||
if remotion_health.get("ready"):
|
||||
try:
|
||||
def on_remotion_progress(percent):
|
||||
mapped = 87 + int(percent * 0.08)
|
||||
_update_task(task_id, progress=mapped)
|
||||
|
||||
title_display_mode = (
|
||||
req.title_display_mode
|
||||
if req.title_display_mode in ("short", "persistent")
|
||||
else "short"
|
||||
)
|
||||
title_duration = max(0.5, min(float(req.title_duration or 4.0), 30.0))
|
||||
|
||||
await remotion_service.render(
|
||||
video_path=str(remotion_input_path),
|
||||
output_path=str(final_output_local_path),
|
||||
captions_path=str(captions_path) if captions_path else None,
|
||||
title=req.title,
|
||||
title_duration=title_duration,
|
||||
title_display_mode=title_display_mode,
|
||||
fps=25,
|
||||
enable_subtitles=req.enable_subtitles,
|
||||
subtitle_style=subtitle_style,
|
||||
title_style=title_style,
|
||||
secondary_title=req.secondary_title,
|
||||
secondary_title_style=secondary_title_style,
|
||||
on_progress=on_remotion_progress
|
||||
)
|
||||
logger.info("[Pipeline] Remotion render completed")
|
||||
except Exception as e:
|
||||
logger.warning(f"Remotion render failed, falling back to the un-rendered video (no Remotion overlays): {e}")
|
||||
import shutil
|
||||
shutil.copy(str(remotion_input_path), str(final_output_local_path))
|
||||
else:
|
||||
logger.warning(f"Remotion not ready: {remotion_health.get('error')}, falling back to the un-rendered video (no Remotion overlays)")
|
||||
import shutil
|
||||
shutil.copy(str(remotion_input_path), str(final_output_local_path))
|
||||
else:
|
||||
_update_task(task_id, message="正在合成最终视频...", progress=90)
|
||||
if needs_audio_compose:
|
||||
await video.compose(str(lipsync_video_path), str(final_audio_path), str(final_output_local_path))
|
||||
else:
|
||||
import shutil
|
||||
shutil.copy(str(lipsync_video_path), str(final_output_local_path))
|
||||
|
||||
total_time = time.time() - start_time
|
||||
|
||||
_update_task(task_id, message="正在上传结果...", progress=95)
|
||||
|
||||
storage_path = f"{user_id}/{task_id}_output.mp4"
|
||||
await storage_service.upload_file_from_path(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
storage_path=storage_path,
|
||||
local_file_path=str(final_output_local_path),
|
||||
content_type="video/mp4"
|
||||
)
|
||||
|
||||
signed_url = await storage_service.get_signed_url(
|
||||
bucket=storage_service.BUCKET_OUTPUTS,
|
||||
path=storage_path
|
||||
)
|
||||
|
||||
logger.info(f"[Pipeline] Total generation time: {total_time:.1f}s")
|
||||
|
||||
_update_task(
|
||||
task_id,
|
||||
status="completed",
|
||||
progress=100,
|
||||
message=f"生成完成!耗时 {total_time:.0f} 秒",
|
||||
output=storage_path,
|
||||
download_url=signed_url,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
_update_task(
|
||||
task_id,
|
||||
status="failed",
|
||||
message=f"错误: {str(e)}",
|
||||
error=traceback.format_exc(),
|
||||
)
|
||||
logger.error(f"Generate video failed: {e}")
|
||||
finally:
|
||||
for f in temp_files:
|
||||
try:
|
||||
if f.exists():
|
||||
f.unlink()
|
||||
except Exception as e:
|
||||
logger.warning(f"Error cleaning up {f}: {e}")
|
||||
|
||||
|
||||
async def get_lipsync_health():
|
||||
lipsync = _get_lipsync_service()
|
||||
return await lipsync.check_health()
|
||||
|
||||
|
||||
async def get_voiceclone_health():
|
||||
return await voice_clone_service.check_health()
|
||||
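The title-duration clamp and the Remotion progress mapping in the pipeline above can be isolated as two small pure functions. This is only an illustrative sketch: the helper names `clamp_title_duration` and `map_remotion_progress` are hypothetical and do not appear in the diff, while the constants (0.5 to 30 seconds, a 4.0 second default, progress base 87 with a 0.08 scale) are copied from it.

```python
def clamp_title_duration(raw, default=4.0, lo=0.5, hi=30.0):
    """Mirror of `max(0.5, min(float(req.title_duration or 4.0), 30.0))`."""
    # `raw or default` treats None and 0 alike: both fall back to the default.
    return max(lo, min(float(raw or default), hi))


def map_remotion_progress(percent, base=87, scale=0.08):
    """Map a 0-100 render percentage onto the 87-95 slice of overall progress."""
    return base + int(percent * scale)
```

Because `0` is falsy, a request with `title_duration=0` ends up with the 4-second default rather than the 0.5-second minimum, which may or may not be the intended behavior.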
0
backend/app/repositories/__init__.py
Normal file
34
backend/app/repositories/orders.py
Normal file
@@ -0,0 +1,34 @@
|
||||
"""
|
||||
Order data access layer
|
||||
"""
|
||||
from datetime import datetime, timezone
|
||||
from typing import Any, Dict, Optional, cast
|
||||
|
||||
from app.core.supabase import get_supabase
|
||||
|
||||
|
||||
def create_order(user_id: str, out_trade_no: str, amount: float) -> Dict[str, Any]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("orders").insert({
|
||||
"user_id": user_id,
|
||||
"out_trade_no": out_trade_no,
|
||||
"amount": amount,
|
||||
"status": "pending",
|
||||
}).execute()
|
||||
return cast(Dict[str, Any], (result.data or [{}])[0])
|
||||
|
||||
|
||||
def get_order_by_trade_no(out_trade_no: str) -> Optional[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("orders").select("*").eq("out_trade_no", out_trade_no).single().execute()
|
||||
return cast(Optional[Dict[str, Any]], result.data or None)
|
||||
|
||||
|
||||
def update_order_status(out_trade_no: str, status: str, trade_no: str | None = None) -> None:
|
||||
supabase = get_supabase()
|
||||
payload: Dict[str, Any] = {"status": status}
|
||||
if trade_no:
|
||||
payload["trade_no"] = trade_no
|
||||
if status == "paid":
|
||||
payload["paid_at"] = datetime.now(timezone.utc).isoformat()
|
||||
supabase.table("orders").update(payload).eq("out_trade_no", out_trade_no).execute()
|
||||
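The payload logic in `update_order_status` can be exercised without a database by pulling it into a pure function. The sketch below assumes a hypothetical helper `build_order_update` with an injectable `now` parameter (neither is in the diff); the field names `status`, `trade_no`, and `paid_at` are taken from the code above.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_order_update(status: str, trade_no: Optional[str] = None,
                       now: Optional[datetime] = None) -> Dict[str, Any]:
    """Build the update payload; `now` is injectable so tests are deterministic."""
    payload: Dict[str, Any] = {"status": status}
    if trade_no:
        # The platform trade number is only written when one was supplied.
        payload["trade_no"] = trade_no
    if status == "paid":
        # Stamp the payment time in UTC, as the repository function does.
        moment = now or datetime.now(timezone.utc)
        payload["paid_at"] = moment.isoformat()
    return payload
```

Keeping the timestamp timezone-aware means `paid_at` carries an explicit `+00:00` offset.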
31
backend/app/repositories/sessions.py
Normal file
@@ -0,0 +1,31 @@
|
||||
from typing import Any, Dict, List, Optional, cast
|
||||
|
||||
from app.core.supabase import get_supabase
|
||||
|
||||
|
||||
def get_session(user_id: str, session_token: str) -> Optional[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = (
|
||||
supabase.table("user_sessions")
|
||||
.select("*")
|
||||
.eq("user_id", user_id)
|
||||
.eq("session_token", session_token)
|
||||
.execute()
|
||||
)
|
||||
data = cast(List[Dict[str, Any]], result.data or [])
|
||||
return data[0] if data else None
|
||||
|
||||
|
||||
def delete_sessions(user_id: str) -> None:
|
||||
supabase = get_supabase()
|
||||
supabase.table("user_sessions").delete().eq("user_id", user_id).execute()
|
||||
|
||||
|
||||
def create_session(user_id: str, session_token: str, device_info: Optional[str] = None) -> List[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("user_sessions").insert({
|
||||
"user_id": user_id,
|
||||
"session_token": session_token,
|
||||
"device_info": device_info,
|
||||
}).execute()
|
||||
return cast(List[Dict[str, Any]], result.data or [])
|
||||
70
backend/app/repositories/users.py
Normal file
@@ -0,0 +1,70 @@
|
||||
from datetime import datetime, timezone
|
||||
from typing import Any, Dict, List, Optional, cast
|
||||
|
||||
from app.core.supabase import get_supabase
|
||||
|
||||
|
||||
def get_user_by_phone(phone: str) -> Optional[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").select("*").eq("phone", phone).single().execute()
|
||||
return cast(Optional[Dict[str, Any]], result.data or None)
|
||||
|
||||
|
||||
def get_user_by_id(user_id: str) -> Optional[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").select("*").eq("id", user_id).single().execute()
|
||||
return cast(Optional[Dict[str, Any]], result.data or None)
|
||||
|
||||
|
||||
def user_exists_by_phone(phone: str) -> bool:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").select("id").eq("phone", phone).execute()
|
||||
return bool(result.data)
|
||||
|
||||
|
||||
def create_user(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").insert(payload).execute()
|
||||
return cast(List[Dict[str, Any]], result.data or [])
|
||||
|
||||
|
||||
def list_users() -> List[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").select("*").order("created_at", desc=True).execute()
|
||||
return cast(List[Dict[str, Any]], result.data or [])
|
||||
|
||||
|
||||
def update_user(user_id: str, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
|
||||
supabase = get_supabase()
|
||||
result = supabase.table("users").update(payload).eq("id", user_id).execute()
|
||||
return cast(List[Dict[str, Any]], result.data or [])
|
||||
|
||||
|
||||
def _parse_expires_at(expires_at: Any) -> Optional[datetime]:
|
||||
try:
|
||||
expires_at_dt = datetime.fromisoformat(str(expires_at).replace("Z", "+00:00"))
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
if expires_at_dt.tzinfo is None:
|
||||
expires_at_dt = expires_at_dt.replace(tzinfo=timezone.utc)
|
||||
return expires_at_dt.astimezone(timezone.utc)
|
||||
|
||||
|
||||
def deactivate_user_if_expired(user: Dict[str, Any]) -> bool:
|
||||
expires_at = user.get("expires_at")
|
||||
if not expires_at:
|
||||
return False
|
||||
|
||||
expires_at_dt = _parse_expires_at(expires_at)
|
||||
if not expires_at_dt:
|
||||
return False
|
||||
|
||||
if datetime.now(timezone.utc) <= expires_at_dt:
|
||||
return False
|
||||
|
||||
user_id = user.get("id")
|
||||
if user.get("is_active") and user_id:
|
||||
update_user(cast(str, user_id), {"is_active": False})
|
||||
|
||||
return True
|
||||
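The expiry check above depends on normalizing whatever the database returns into a timezone-aware UTC datetime. A minimal standalone sketch of that normalization follows; the names `parse_expires_at` and `is_expired` are hypothetical, and unlike the repository code this version takes `now` as a parameter and performs no database update.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_expires_at(value: Any) -> Optional[datetime]:
    """Parse an ISO-8601 string into an aware UTC datetime, or None if invalid."""
    try:
        # Older Python versions reject a trailing "Z", so rewrite it as an offset.
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Naive timestamps are assumed to already be in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_expired(value: Any, now: datetime) -> bool:
    """True only when a parseable expiry exists and `now` is strictly after it."""
    expires_at = parse_expires_at(value) if value else None
    return expires_at is not None and now > expires_at
```

As in the diff, a missing or unparseable `expires_at` is treated as "not expired", so malformed data keeps an account active rather than locking it.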
1301
backend/app/services/creator_scraper.py
Normal file
File diff suppressed because it is too large
@@ -3,8 +3,10 @@ GLM AI 服务
|
||||
Uses Zhipu GLM to generate titles and tags
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import re
|
||||
from typing import Any, Optional, cast
|
||||
from loguru import logger
|
||||
from zai import ZhipuAiClient
|
||||
|
||||
@@ -25,6 +27,48 @@ class GLMService:
|
||||
self.client = ZhipuAiClient(api_key=settings.GLM_API_KEY)
|
||||
return self.client
|
||||
|
||||
async def _call_glm(
|
||||
self,
|
||||
*,
|
||||
prompt: str,
|
||||
max_tokens: int,
|
||||
temperature: float,
|
||||
action: str,
|
||||
timeout_seconds: float = 85.0,
|
||||
) -> str:
|
||||
"""Single entry point for GLM calls, avoiding duplicated call code."""
|
||||
client = self._get_client()
|
||||
logger.info(
|
||||
f"{action} | model={settings.GLM_MODEL} | max_tokens={max_tokens} | temperature={temperature}"
|
||||
)
|
||||
|
||||
try:
|
||||
response = await asyncio.wait_for(
|
||||
asyncio.to_thread(
|
||||
client.chat.completions.create,
|
||||
model=settings.GLM_MODEL,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
thinking={"type": "disabled"},
|
||||
max_tokens=max_tokens,
|
||||
temperature=temperature,
|
||||
),
|
||||
timeout=timeout_seconds,
|
||||
)
|
||||
except asyncio.TimeoutError as exc:
|
||||
raise Exception("GLM 请求超时,请稍后重试") from exc
|
||||
|
||||
completion = cast(Any, response)
|
||||
choices = getattr(completion, "choices", None)
|
||||
if not choices:
|
||||
raise Exception("AI 返回内容为空")
|
||||
|
||||
message = getattr(choices[0], "message", None)
|
||||
content = getattr(message, "content", "")
|
||||
text = content.strip() if isinstance(content, str) else str(content or "").strip()
|
||||
if not text:
|
||||
raise Exception("AI 返回内容为空")
|
||||
return text
|
||||
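`_call_glm` combines `asyncio.to_thread` with `asyncio.wait_for` so that a blocking SDK call cannot stall the event loop past a deadline. The sketch below shows the same pattern in isolation. `call_blocking_with_timeout` and `slow_echo` are hypothetical stand-ins rather than part of the diff, and `RuntimeError` is used here where the diff raises a plain `Exception`.

```python
import asyncio
import time


async def call_blocking_with_timeout(func, *args, timeout_seconds: float, **kwargs):
    """Run a blocking callable in a worker thread and give up after a deadline."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(func, *args, **kwargs),
            timeout=timeout_seconds,
        )
    except asyncio.TimeoutError as exc:
        raise RuntimeError("blocking call timed out") from exc


def slow_echo(value: str, delay: float) -> str:
    """Stand-in for a blocking SDK call."""
    time.sleep(delay)
    return value
```

One caveat of this pattern: the timeout only stops the coroutine from waiting. The worker thread is not interrupted and keeps running until the blocking call returns, so a timed-out request may still consume a thread and an upstream API call.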
|
||||
async def generate_title_tags(self, text: str) -> dict:
|
||||
"""
|
||||
根据口播文案生成标题和标签
|
||||
@@ -35,32 +79,28 @@ class GLMService:
|
||||
Returns:
|
||||
{"title": "标题", "tags": ["标签1", "标签2", ...]}
|
||||
"""
|
||||
prompt = f"""根据以下口播文案,生成一个吸引人的短视频标题和3个相关标签。
|
||||
prompt = f"""根据以下口播文案,生成一个吸引人的短视频标题、副标题和3个相关标签。
|
||||
|
||||
口播文案:
|
||||
{text}
|
||||
|
||||
要求:
|
||||
1. 标题要简洁有力,能吸引观众点击,不超过10个字
|
||||
2. 标签要与内容相关,便于搜索和推荐,只要3个
|
||||
2. 副标题是对标题的补充说明或描述性文字,不超过20个字
|
||||
3. 标签要与内容相关,便于搜索和推荐,只要3个
|
||||
4. 标题、副标题和标签必须使用与口播文案相同的语言(如文案是英文就用英文,日文就用日文)
|
||||
|
||||
请严格按以下JSON格式返回(不要包含其他内容):
|
||||
{{"title": "标题", "tags": ["标签1", "标签2", "标签3"]}}"""
|
||||
{{"title": "标题", "secondary_title": "副标题", "tags": ["标签1", "标签2", "标签3"]}}"""
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
logger.info(f"Calling GLM API with model: {settings.GLM_MODEL}")
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model=settings.GLM_MODEL,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
thinking={"type": "disabled"}, # 禁用思考模式,加快响应
|
||||
content = await self._call_glm(
|
||||
prompt=prompt,
|
||||
max_tokens=500,
|
||||
temperature=0.7
|
||||
temperature=0.7,
|
||||
action="生成标题与标签",
|
||||
timeout_seconds=75.0,
|
||||
)
|
||||
|
||||
# 提取生成的内容
|
||||
content = response.choices[0].message.content
|
||||
logger.info(f"GLM response (model: {settings.GLM_MODEL}): {content}")
|
||||
|
||||
# 解析 JSON
|
||||
@@ -71,17 +111,24 @@ class GLMService:
|
||||
logger.error(f"GLM service error: {e}")
|
||||
raise Exception(f"AI 生成失败: {str(e)}")
|
||||
|
||||
async def rewrite_script(self, text: str) -> str:
|
||||
async def rewrite_script(self, text: str, custom_prompt: Optional[str] = None) -> str:
|
||||
"""
|
||||
AI 洗稿(文案改写)
|
||||
AI 改写文案
|
||||
|
||||
Args:
|
||||
text: 原始文案
|
||||
custom_prompt: 自定义提示词,为空则使用默认提示词
|
||||
|
||||
Returns:
|
||||
改写后的文案
|
||||
"""
|
||||
prompt = f"""请将以下视频文案进行改写。
|
||||
if custom_prompt and custom_prompt.strip():
|
||||
prompt = f"""{custom_prompt.strip()}
|
||||
|
||||
原始文案:
|
||||
{text}"""
|
||||
else:
|
||||
prompt = f"""请将以下视频文案进行改写。
|
||||
|
||||
原始文案:
|
||||
{text}
|
||||
@@ -93,26 +140,163 @@ class GLMService:
|
||||
4. 不要返回多余的解释,只返回改写后的正文"""
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
logger.info(f"Using GLM to rewrite script")
|
||||
|
||||
response = client.chat.completions.create(
|
||||
model=settings.GLM_MODEL,
|
||||
messages=[{"role": "user", "content": prompt}],
|
||||
thinking={"type": "disabled"},
|
||||
content = await self._call_glm(
|
||||
prompt=prompt,
|
||||
max_tokens=2000,
|
||||
temperature=0.8
|
||||
temperature=0.8,
|
||||
action="改写文案",
|
||||
timeout_seconds=85.0,
|
||||
)
|
||||
|
||||
content = response.choices[0].message.content
|
||||
logger.info("GLM rewrite completed")
|
||||
return content.strip()
|
||||
return content
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"GLM rewrite error: {e}")
|
||||
raise Exception(f"AI 改写失败: {str(e)}")
|
||||
|
||||
async def analyze_topics(self, titles: list[str]) -> list[str]:
|
||||
"""
|
||||
Analyze a list of video titles and summarize the hottest topics (at most 10)
|
||||
"""
|
||||
cleaned_titles = [str(title).strip() for title in titles if str(title).strip()]
|
||||
if not cleaned_titles:
|
||||
raise Exception("标题列表为空")
|
||||
|
||||
limited_titles = cleaned_titles[:50]
|
||||
titles_text = "\n".join(f"{idx + 1}. {title}" for idx, title in enumerate(limited_titles))
|
||||
|
||||
prompt = f"""以下是某短视频博主最近发布的视频标题列表:
|
||||
|
||||
{titles_text}
|
||||
|
||||
请分析这些标题,归纳总结出该博主内容中最热门的话题方向。
|
||||
|
||||
要求:
|
||||
1. 提取不超过10个话题方向
|
||||
2. 每个话题用简短短语描述(建议 5-15 字)
|
||||
3. 按热门程度排序(出现频率高的在前)
|
||||
4. 只返回话题列表,每行一个,不要编号、解释或多余内容"""
|
||||
|
||||
try:
|
||||
content = await self._call_glm(
|
||||
prompt=prompt,
|
||||
max_tokens=500,
|
||||
temperature=0.5,
|
||||
action="分析博主话题",
|
||||
timeout_seconds=85.0,
|
||||
)
|
||||
topics = self._parse_topic_lines(content)
|
||||
if not topics:
|
||||
raise Exception("未识别到有效话题")
|
||||
|
||||
logger.info(f"GLM topic analysis completed: {len(topics)} topics")
|
||||
return topics[:10]
|
||||
except Exception as e:
|
||||
logger.error(f"GLM topic analysis error: {e}")
|
||||
raise Exception(f"话题分析失败: {str(e)}")
|
||||
|
||||
async def generate_script_from_topic(self, topic: str, word_count: int, titles: list[str]) -> str:
|
||||
"""
|
||||
Generate a script from the selected topic in the creator's title style
|
||||
"""
|
||||
topic_value = str(topic or "").strip()
|
||||
if not topic_value:
|
||||
raise Exception("话题不能为空")
|
||||
|
||||
cleaned_titles = [str(title).strip() for title in titles if str(title).strip()]
|
||||
if not cleaned_titles:
|
||||
raise Exception("参考标题为空")
|
||||
|
||||
word_count_value = max(80, min(int(word_count), 1000))
|
||||
sample_titles = "\n".join(f"{idx + 1}. {title}" for idx, title in enumerate(cleaned_titles[:10]))
|
||||
|
||||
prompt = f"""请围绕「{topic_value}」这个话题,生成一段短视频口播文案。
|
||||
|
||||
参考该博主的标题风格:
|
||||
{sample_titles}
|
||||
|
||||
要求:
|
||||
1. 文案字数约 {word_count_value} 字
|
||||
2. 适合短视频口播,语气自然、有吸引力
|
||||
3. 开头要有钩子吸引观众
|
||||
4. 只返回文案正文,不要标题和其他说明"""
|
||||
|
||||
try:
|
||||
content = await self._call_glm(
|
||||
prompt=prompt,
|
||||
max_tokens=min(word_count_value * 3, 4000),
|
||||
temperature=0.8,
|
||||
action=f"按话题生成文案(topic={topic_value})",
|
||||
timeout_seconds=88.0,
|
||||
)
|
||||
|
||||
logger.info("GLM topic script generation completed")
|
||||
return content
|
||||
except Exception as e:
|
||||
logger.error(f"GLM topic script generation error: {e}")
|
||||
raise Exception(f"文案生成失败: {str(e)}")
|
||||
|
||||
def _parse_topic_lines(self, content: str) -> list[str]:
|
||||
lines = [line.strip() for line in str(content or "").splitlines()]
|
||||
topics: list[str] = []
|
||||
seen: set[str] = set()
|
||||
|
||||
for line in lines:
|
||||
if not line:
|
||||
continue
|
||||
|
||||
cleaned = re.sub(r"^\s*(?:[-*•]+|\d+[.)、\s]+)", "", line).strip()
|
||||
cleaned = cleaned.strip('"“”')
|
||||
if not cleaned:
|
||||
continue
|
||||
|
||||
if cleaned in seen:
|
||||
continue
|
||||
seen.add(cleaned)
|
||||
topics.append(cleaned)
|
||||
|
||||
if len(topics) >= 10:
|
||||
break
|
||||
|
||||
return topics
|
||||
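The line-cleaning rules in `_parse_topic_lines` are easier to see on sample input. Below is a standalone sketch of the same logic as a free function (the name `parse_topic_lines` and the `limit` parameter are assumptions); the regular expression is copied from the diff.

```python
import re
from typing import List

# Leading bullets ("-", "*", "•") or numbering such as "1.", "2)", "3、", "4 ".
_PREFIX = re.compile(r"^\s*(?:[-*•]+|\d+[.)、\s]+)")


def parse_topic_lines(content: str, limit: int = 10) -> List[str]:
    """Strip list markers and quotes, drop blanks and duplicates, keep order."""
    topics: List[str] = []
    seen = set()
    for line in str(content or "").splitlines():
        cleaned = _PREFIX.sub("", line.strip()).strip().strip('"“”')
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        topics.append(cleaned)
        if len(topics) >= limit:
            break
    return topics
```

Since `\d+[.)、\s]+` also matches digits followed by a space, a topic that genuinely starts with a number and a space (for example "10 tips for vlogs") loses its leading number. Topics such as "2024年总结" are unaffected because no separator follows the digits.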
|
||||
|
||||
|
||||
async def translate_text(self, text: str, target_lang: str) -> str:
|
||||
"""
|
||||
Translate the script into the specified language

Args:
text: the original script
target_lang: target language (e.g. English, 日本語)

Returns:
The translated script
|
||||
"""
|
||||
prompt = f"""请将以下文案翻译为{target_lang}。
|
||||
|
||||
原文:
|
||||
{text}
|
||||
|
||||
要求:
|
||||
1. 只返回翻译后的文案,不要添加任何解释或说明
|
||||
2. 保持原文的语气和风格
|
||||
3. 翻译要自然流畅,符合目标语言的表达习惯"""
|
||||
|
||||
try:
|
||||
content = await self._call_glm(
|
||||
prompt=prompt,
|
||||
max_tokens=2000,
|
||||
temperature=0.3,
|
||||
action=f"翻译文案(target={target_lang})",
|
||||
timeout_seconds=75.0,
|
||||
)
|
||||
logger.info("GLM translation completed")
|
||||
return content
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"GLM translate error: {e}")
|
||||
raise Exception(f"AI 翻译失败: {str(e)}")
|
||||
|
||||
def _parse_json_response(self, content: str) -> dict:
|
||||
"""Parse the JSON content returned by GLM"""
|
||||
@@ -124,6 +308,8 @@ class GLMService:
|
||||
|
||||
# Try to extract a JSON block
|
||||
json_match = re.search(r'\{[^{}]*"title"[^{}]*"tags"[^{}]*\}', content, re.DOTALL)
|
||||
if not json_match:
|
||||
json_match = re.search(r'\{[^{}]*"title"[^{}]*"secondary_title"[^{}]*"tags"[^{}]*\}', content, re.DOTALL)
|
||||
if json_match:
|
||||
try:
|
||||
return json.loads(json_match.group())
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
"""
|
||||
Lip-sync service
Runs inference by calling the LatentSync conda environment via subprocess
Configured to use GPU1 (CUDA:1)
Hybrid approach: LatentSync for short videos (higher quality), MuseTalk for long videos (higher speed)
Routing threshold: LIPSYNC_DURATION_THRESHOLD (default 120s)
|
||||
"""
|
||||
import os
|
||||
import shutil
|
||||
@@ -11,21 +11,24 @@ import asyncio
|
||||
import httpx
|
||||
from pathlib import Path
|
||||
from loguru import logger
|
||||
from typing import Optional
|
||||
from typing import Optional, Literal
|
||||
|
||||
from app.core.config import settings
|
||||
|
||||
|
||||
class LipSyncService:
|
||||
"""Lip-sync service - LatentSync 1.6 integration (subprocess mode)"""
|
||||
|
||||
class LipSyncService:
|
||||
"""Lip-sync service - LatentSync 1.6 + MuseTalk 1.5 hybrid approach"""
|
||||
|
||||
def __init__(self):
|
||||
self.use_local = settings.LATENTSYNC_LOCAL
|
||||
self.api_url = settings.LATENTSYNC_API_URL
|
||||
self.latentsync_dir = settings.LATENTSYNC_DIR
|
||||
self.gpu_id = settings.LATENTSYNC_GPU_ID
|
||||
self.use_server = settings.LATENTSYNC_USE_SERVER
|
||||
|
||||
|
||||
# MuseTalk configuration
|
||||
self.musetalk_api_url = settings.MUSETALK_API_URL
|
||||
|
||||
# GPU concurrency lock (serial queue)
|
||||
self._lock = asyncio.Lock()
|
||||
|
||||
@@ -103,7 +106,7 @@ class LipSyncService:
|
||||
"-t", str(target_duration), # 截取到目标时长
|
||||
"-c:v", "libx264",
|
||||
"-preset", "fast",
|
||||
"-crf", "18",
|
||||
"-crf", "23",
|
||||
"-an", # 去掉原音频
|
||||
output_path
|
||||
]
|
||||
@@ -118,139 +121,43 @@ class LipSyncService:
|
||||
logger.warning(f"⚠️ 视频循环异常: {e}")
|
||||
return video_path
|
||||
|
||||
def _preprocess_video(self, video_path: str, output_path: str, target_height: int = 720) -> str:
|
||||
"""
|
||||
视频预处理:压缩视频以加速后续处理
|
||||
- 限制最大高度为 target_height (默认720p)
|
||||
- 保持宽高比
|
||||
- 使用快速编码预设
|
||||
|
||||
Returns: 预处理后的视频路径
|
||||
"""
|
||||
import subprocess
|
||||
import json
|
||||
|
||||
# 获取视频信息 (使用 JSON 格式更可靠)
|
||||
probe_cmd = [
|
||||
"ffprobe", "-v", "error",
|
||||
"-select_streams", "v:0",
|
||||
"-show_entries", "stream=height,width",
|
||||
"-of", "json",
|
||||
video_path
|
||||
]
|
||||
try:
|
||||
result = subprocess.run(probe_cmd, capture_output=True, text=True, timeout=10)
|
||||
if result.returncode != 0:
|
||||
logger.warning(f"⚠️ ffprobe 失败: {result.stderr[:100]}")
|
||||
return video_path
|
||||
|
||||
probe_data = json.loads(result.stdout)
|
||||
streams = probe_data.get("streams", [])
|
||||
if not streams:
|
||||
logger.warning("⚠️ 无法获取视频流信息,跳过预处理")
|
||||
return video_path
|
||||
|
||||
current_height = streams[0].get("height", 0)
|
||||
current_width = streams[0].get("width", 0)
|
||||
|
||||
if current_height == 0:
|
||||
logger.warning("⚠️ 视频高度为 0,跳过预处理")
|
||||
return video_path
|
||||
|
||||
logger.info(f"📹 原始视频分辨率: {current_width}×{current_height}")
|
||||
|
||||
except json.JSONDecodeError as e:
|
||||
logger.warning(f"⚠️ ffprobe 输出解析失败: {e}")
|
||||
return video_path
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.warning("⚠️ ffprobe 超时,跳过预处理")
|
||||
return video_path
|
||||
except Exception as e:
|
||||
logger.warning(f"⚠️ 获取视频信息失败: {e}")
|
||||
return video_path
|
||||
|
||||
# 如果视频已经足够小,跳过压缩
|
||||
if current_height <= target_height:
|
||||
logger.info(f"📹 视频高度 {current_height}p <= {target_height}p,无需压缩")
|
||||
return video_path
|
||||
|
||||
logger.info(f"📹 预处理视频: {current_height}p → {target_height}p")
|
||||
|
||||
# 使用 FFmpeg 压缩
|
||||
compress_cmd = [
|
||||
"ffmpeg", "-y",
|
||||
"-i", video_path,
|
||||
"-vf", f"scale=-2:{target_height}", # 保持宽高比,高度设为 target_height
|
||||
"-c:v", "libx264",
|
||||
"-preset", "ultrafast", # 最快编码速度
|
||||
"-crf", "23", # 质量因子
|
||||
"-c:a", "copy", # 音频直接复制
|
||||
output_path
|
||||
]
|
||||
|
||||
try:
|
||||
result = subprocess.run(
|
||||
compress_cmd,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
timeout=120 # 增加超时时间到2分钟
|
||||
)
|
||||
if result.returncode == 0 and Path(output_path).exists():
|
||||
original_size = Path(video_path).stat().st_size / 1024 / 1024
|
||||
new_size = Path(output_path).stat().st_size / 1024 / 1024
|
||||
logger.info(f"✅ 视频压缩完成: {original_size:.1f}MB → {new_size:.1f}MB")
|
||||
return output_path
|
||||
else:
|
||||
logger.warning(f"⚠️ 视频压缩失败: {result.stderr[:200]}")
|
||||
return video_path
|
||||
except subprocess.TimeoutExpired:
|
||||
logger.warning("⚠️ 视频压缩超时,使用原始视频")
|
||||
return video_path
|
||||
except Exception as e:
|
||||
logger.warning(f"⚠️ 视频压缩异常: {e}")
|
||||
return video_path
|
||||
async def generate(
|
||||
self,
|
||||
video_path: str,
|
||||
audio_path: str,
|
||||
output_path: str,
|
||||
fps: int = 25,
|
||||
model_mode: Literal["default", "fast", "advanced"] = "default",
|
||||
) -> str:
|
||||
"""生成唇形同步视频"""
|
||||
logger.info(f"🎬 唇形同步任务: {Path(video_path).name} + {Path(audio_path).name}")
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
normalized_mode: Literal["default", "fast", "advanced"] = model_mode
|
||||
if normalized_mode not in ("default", "fast", "advanced"):
|
||||
normalized_mode = "default"
|
||||
logger.info(f"🧠 Lipsync 模式: {normalized_mode}")
|
||||
|
||||
if self.use_local:
|
||||
return await self._local_generate(video_path, audio_path, output_path, fps, normalized_mode)
|
||||
else:
|
||||
return await self._remote_generate(video_path, audio_path, output_path, fps, normalized_mode)
|
||||
|
||||
async def generate(
|
||||
self,
|
||||
video_path: str,
|
||||
audio_path: str,
|
||||
output_path: str,
|
||||
fps: int = 25
|
||||
) -> str:
|
||||
"""生成唇形同步视频"""
|
||||
logger.info(f"🎬 唇形同步任务: {Path(video_path).name} + {Path(audio_path).name}")
|
||||
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
if self.use_local:
|
||||
return await self._local_generate(video_path, audio_path, output_path, fps)
|
||||
else:
|
||||
return await self._remote_generate(video_path, audio_path, output_path, fps)
|
||||
|
||||
async def _local_generate(
|
||||
self,
|
||||
video_path: str,
|
||||
audio_path: str,
|
||||
output_path: str,
|
||||
fps: int
|
||||
) -> str:
|
||||
"""使用 subprocess 调用 LatentSync conda 环境"""
|
||||
|
||||
# 检查前置条件
|
||||
if not self._check_conda_env():
|
||||
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
|
||||
shutil.copy(video_path, output_path)
|
||||
return output_path
|
||||
|
||||
if not self._check_weights():
|
||||
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
|
||||
shutil.copy(video_path, output_path)
|
||||
return output_path
|
||||
|
||||
logger.info("⏳ 等待 GPU 资源 (排队中)...")
|
||||
async with self._lock:
|
||||
# 使用临时目录存放中间文件
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
tmpdir = Path(tmpdir)
|
||||
async def _local_generate(
|
||||
self,
|
||||
video_path: str,
|
||||
audio_path: str,
|
||||
output_path: str,
|
||||
fps: int,
|
||||
model_mode: Literal["default", "fast", "advanced"],
|
||||
) -> str:
|
||||
"""使用 subprocess 调用 LatentSync conda 环境"""
|
||||
|
||||
logger.info("⏳ 等待 GPU 资源 (排队中)...")
|
||||
async with self._lock:
|
||||
# 使用临时目录存放中间文件
|
||||
with tempfile.TemporaryDirectory() as tmpdir:
|
||||
tmpdir = Path(tmpdir)
|
||||
|
||||
# 获取音频和视频时长
|
||||
audio_duration = self._get_media_duration(audio_path)
|
||||
@@ -265,12 +172,53 @@ class LipSyncService:
|
||||
str(looped_video),
|
||||
audio_duration
|
||||
)
|
||||
else:
|
||||
actual_video_path = video_path
|
||||
|
||||
if self.use_server:
|
||||
# 模式 A: 调用常驻服务 (加速模式)
|
||||
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
|
||||
else:
|
||||
actual_video_path = video_path
|
||||
|
||||
# Model routing
|
||||
force_musetalk = model_mode == "fast"
|
||||
force_latentsync = model_mode == "advanced"
|
||||
auto_to_musetalk = (
|
||||
model_mode == "default"
|
||||
and audio_duration is not None
|
||||
and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD
|
||||
)
|
||||
|
||||
if force_musetalk:
|
||||
logger.info("⚡ 强制快速模型:MuseTalk")
|
||||
musetalk_result = await self._call_musetalk_server(
|
||||
actual_video_path, audio_path, output_path
|
||||
)
|
||||
if musetalk_result:
|
||||
return musetalk_result
|
||||
logger.warning("⚠️ MuseTalk 不可用,快速模型回退到 LatentSync")
|
||||
elif auto_to_musetalk:
|
||||
logger.info(
|
||||
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s,路由到 MuseTalk"
|
||||
)
|
||||
musetalk_result = await self._call_musetalk_server(
|
||||
actual_video_path, audio_path, output_path
|
||||
)
|
||||
if musetalk_result:
|
||||
return musetalk_result
|
||||
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync(长视频,会较慢)")
|
||||
elif force_latentsync:
|
||||
logger.info("🎯 强制高级模型:LatentSync")
|
||||
|
||||
# Check LatentSync prerequisites (only when falling back to or using LatentSync)
|
||||
if not self._check_conda_env():
|
||||
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
|
||||
shutil.copy(video_path, output_path)
|
||||
return output_path
|
||||
|
||||
if not self._check_weights():
|
||||
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
|
||||
shutil.copy(video_path, output_path)
|
||||
return output_path
|
||||
|
||||
if self.use_server:
|
||||
# 模式 A: 调用常驻服务 (加速模式)
|
||||
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
|
||||
|
||||
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
|
||||
|
||||
@@ -352,6 +300,55 @@ class LipSyncService:
|
||||
shutil.copy(video_path, output_path)
|
||||
return output_path
|
||||
|
||||
async def _call_musetalk_server(
|
||||
self, video_path: str, audio_path: str, output_path: str
|
||||
) -> Optional[str]:
|
||||
"""
|
||||
Call the persistent MuseTalk server.
Returns output_path on success, or None when unavailable (signalling the caller to fall back to LatentSync).
|
||||
"""
|
||||
server_url = self.musetalk_api_url
|
||||
logger.info(f"⚡ 调用 MuseTalk 服务: {server_url}")
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=3600.0) as client:
|
||||
# Health check
|
||||
try:
|
||||
resp = await client.get(f"{server_url}/health", timeout=5.0)
|
||||
if resp.status_code != 200:
|
||||
logger.warning("⚠️ MuseTalk 健康检查失败")
|
||||
return None
|
||||
health = resp.json()
|
||||
if not health.get("model_loaded"):
|
||||
logger.warning("⚠️ MuseTalk 模型未加载")
|
||||
return None
|
||||
except Exception:
|
||||
logger.warning("⚠️ 无法连接 MuseTalk 服务")
|
||||
return None
|
||||
|
||||
# Send the inference request
|
||||
payload = {
|
||||
"video_path": str(Path(video_path).resolve()),
|
||||
"audio_path": str(Path(audio_path).resolve()),
|
||||
"video_out_path": str(Path(output_path).resolve()),
|
||||
"batch_size": settings.MUSETALK_BATCH_SIZE,
|
||||
}
|
||||
|
||||
response = await client.post(f"{server_url}/lipsync", json=payload)
|
||||
|
||||
if response.status_code == 200:
|
||||
result = response.json()
|
||||
if Path(result["output_path"]).exists():
|
||||
logger.info(f"✅ MuseTalk 推理完成: {output_path}")
|
||||
return output_path
|
||||
|
||||
logger.error(f"❌ MuseTalk 服务报错: {response.text}")
|
||||
return None
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"❌ MuseTalk 调用失败: {e}")
|
||||
return None
|
||||
|
||||
async def _call_persistent_server(self, video_path: str, audio_path: str, output_path: str) -> str:
|
||||
"""调用本地常驻服务 (server.py)"""
|
||||
server_url = "http://localhost:8007"
|
||||
@@ -369,7 +366,7 @@ class LipSyncService:
|
||||
}
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(timeout=1200.0) as client:
|
||||
async with httpx.AsyncClient(timeout=3600.0) as client:
|
||||
# 先检查健康状态
|
||||
try:
|
||||
resp = await client.get(f"{server_url}/health", timeout=5.0)
|
||||
@@ -398,28 +395,36 @@ class LipSyncService:
|
||||
raise e
|
||||
|
||||
async def _local_generate_subprocess(self, video_path: str, audio_path: str, output_path: str) -> str:
|
||||
"""原有的 subprocess 逻辑提取为独立方法"""
|
||||
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
|
||||
# ... (此处仅为占位符提示,实际代码需要调整结构以避免重复,
|
||||
# 但鉴于原有 _local_generate 的结构,最简单的方法是在 _local_generate 内部做判断,
|
||||
# 如果 use_server 失败,可以 retry 或者 _local_generate 不做拆分,直接在里面写逻辑)
|
||||
# 为了最小化改动且保持安全,上面的 _call_persistent_server 如果失败,
|
||||
# 最好不要自动回退(可能导致双重资源消耗),而是直接报错让用户检查服务。
|
||||
# 但为了用户体验,我们可以允许回退。
|
||||
# *修正策略*:
|
||||
# 我将不拆分 _local_generate_subprocess,而是将 subprocess 逻辑保留在 _local_generate 的后半部分。
|
||||
# 如果 self.use_server 为 True,先尝试调用 server,成功则 return,失败则继续往下走。
|
||||
pass
|
||||
"""
|
||||
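Legacy subprocess fallback logic

Note: the subprocess fallback has been disabled, for the following reasons:
1. Subprocess mode has to reload the model, which costs a lot of time and GPU memory
2. If the persistent server is unavailable, the user should find out and fix it rather than get a silent fallback
3. It avoids GPU OOM caused by double resource consumption

If the persistent server is unavailable, check:
- Whether the server is running: python scripts/server.py (in the models/LatentSync directory)
- Whether the port is in use: lsof -i:8007
- Whether there is enough GPU memory: nvidia-smi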
|
||||
"""
|
||||
raise RuntimeError(
|
||||
"LatentSync 常驻服务不可用,无法进行唇形同步。"
|
||||
"请确保 LatentSync 服务已启动 (cd models/LatentSync && python scripts/server.py)"
|
||||
)
|
||||
|
||||
    async def _remote_generate(
        self,
        video_path: str,
        audio_path: str,
        output_path: str,
        fps: int
    ) -> str:
        """Call the remote LatentSync API service"""
        logger.info(f"📡 调用远程 API: {self.api_url}")
    async def _remote_generate(
        self,
        video_path: str,
        audio_path: str,
        output_path: str,
        fps: int,
        model_mode: Literal["default", "fast", "advanced"],
    ) -> str:
        """Call the remote LatentSync API service"""
        if model_mode == "fast":
            logger.warning("⚠️ 远程模式未接入 MuseTalk,快速模型将使用远程 LatentSync")
        logger.info(f"📡 调用远程 API: {self.api_url}")

        try:
            async with httpx.AsyncClient(timeout=600.0) as client:
@@ -472,8 +477,18 @@ class LipSyncService:
        except:
            pass

        # Check the MuseTalk service
        musetalk_ready = False
        try:
            async with httpx.AsyncClient(timeout=5.0) as client:
                resp = await client.get(f"{self.musetalk_api_url}/health")
                if resp.status_code == 200:
                    musetalk_ready = resp.json().get("model_loaded", False)
        except Exception:
            pass

        return {
            "model": "LatentSync 1.6",
            "model": "LatentSync 1.6 + MuseTalk 1.5",
            "conda_env": conda_ok,
            "weights": weights_ok,
            "gpu": gpu_ok,
@@ -481,5 +496,7 @@ class LipSyncService:
            "gpu_id": self.gpu_id,
            "inference_steps": settings.LATENTSYNC_INFERENCE_STEPS,
            "guidance_scale": settings.LATENTSYNC_GUIDANCE_SCALE,
            "ready": conda_ok and weights_ok and gpu_ok
            "ready": conda_ok and weights_ok and gpu_ok,
            "musetalk_ready": musetalk_ready,
            "lipsync_threshold": settings.LIPSYNC_DURATION_THRESHOLD,
        }

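The MuseTalk readiness probe in the hunk above treats every failure (connection error, non-200 status, missing `model_loaded` field) as "not ready". A minimal, stdlib-only sketch of that rule follows. `probe_ready` and the injected `fetch` callable are hypothetical names used for illustration; they stand in for the `httpx` call in the diff and are not part of the project.

```python
import json
from typing import Callable, Tuple

# Hypothetical type: a callable that takes a URL and returns (status_code, body_text).
Fetch = Callable[[str], Tuple[int, str]]


def probe_ready(fetch: Fetch, base_url: str) -> bool:
    """Return True only if GET {base_url}/health answers 200 with a truthy model_loaded."""
    try:
        status, body = fetch(f"{base_url}/health")
        if status == 200:
            return bool(json.loads(body).get("model_loaded", False))
    except Exception:
        # Any error (refused connection, timeout, malformed JSON) counts as "not ready".
        pass
    return False
```

Injecting `fetch` keeps the decision logic testable without a running service; in the real service the same rule is applied directly to the `httpx` response.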
@@ -18,19 +18,25 @@ from app.services.storage import storage_service
from .uploader.bilibili_uploader import BilibiliUploader
from .uploader.douyin_uploader import DouyinUploader
from .uploader.xiaohongshu_uploader import XiaohongshuUploader
from .uploader.weixin_uploader import WeixinUploader


class PublishService:
    """Social media publishing service (with user isolation)"""
class PublishService:
    """Social media publishing service (with user isolation)"""

    # Supported platform configuration
    PLATFORMS: Dict[str, Dict[str, Any]] = {
        "bilibili": {"name": "B站", "url": "https://member.bilibili.com/platform/upload/video/frame", "enabled": True},
        "douyin": {"name": "抖音", "url": "https://creator.douyin.com/", "enabled": True},
        "xiaohongshu": {"name": "小红书", "url": "https://creator.xiaohongshu.com/", "enabled": True},
        "weixin": {"name": "微信视频号", "url": "https://channels.weixin.qq.com/", "enabled": False},
        "kuaishou": {"name": "快手", "url": "https://cp.kuaishou.com/", "enabled": False},
    }
    PLATFORMS: Dict[str, Dict[str, Any]] = {
        "douyin": {"name": "抖音", "url": "https://creator.douyin.com/", "enabled": True},
        "weixin": {"name": "微信视频号", "url": "https://channels.weixin.qq.com/", "enabled": True},
        "bilibili": {"name": "B站", "url": "https://member.bilibili.com/platform/upload/video/frame", "enabled": True},
        "xiaohongshu": {"name": "小红书", "url": "https://creator.xiaohongshu.com/", "enabled": True},
    }

    COOKIE_DOMAINS: Dict[str, str] = {
        "douyin": ".douyin.com",
        "weixin": ".weixin.qq.com",
        "xiaohongshu": ".xiaohongshu.com",
    }

    def __init__(self) -> None:
        # Store active login sessions, used to track login status
@@ -182,16 +188,28 @@ class PublishService:
                    tags=tags,
                    publish_date=publish_time,
                    account_file=str(account_file),
                    description=description
                    description=description,
                    user_id=user_id,
                )
            elif platform == "xiaohongshu":
                uploader = XiaohongshuUploader(
            elif platform == "xiaohongshu":
                uploader = XiaohongshuUploader(
                    title=title,
                    file_path=local_video_path,
                    tags=tags,
                    publish_date=publish_time,
                    account_file=str(account_file),
                    description=description,
                    user_id=user_id,
                )
            elif platform == "weixin":
                uploader = WeixinUploader(
                    title=title,
                    file_path=local_video_path,
                    tags=tags,
                    publish_date=publish_time,
                    account_file=str(account_file),
                    description=description
                    description=description,
                    user_id=user_id,
                )
            else:
                logger.warning(f"[发布] {platform} 上传功能尚未实现")
@@ -225,30 +243,38 @@ class PublishService:
    async def login(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
        """
        Start the QR-code login flow

        Args:
            platform: platform ID
            user_id: user ID (used for cookie isolation)

        Returns:
            dict: contains the QR code as a base64 image
        """
        if platform not in self.PLATFORMS:
            return {"success": False, "message": "不支持的平台"}

        try:
            from .qr_login_service import QRLoginService

            # Get the user's dedicated cookie directory
            cookies_dir = self._get_cookies_dir(user_id)

            # Clean up any old active session (so a stale session does not interfere with the new login)
            session_key = self._get_session_key(platform, user_id)
            if session_key in self.active_login_sessions:
                old_service = self.active_login_sessions.pop(session_key)
                try:
                    await old_service._cleanup()
                except Exception:
                    pass

            # Create the QR login service
            qr_service = QRLoginService(platform, cookies_dir)

            # Store the active session (with user isolation)
            session_key = self._get_session_key(platform, user_id)
            self.active_login_sessions[session_key] = qr_service

            # Start the login and fetch the QR code
            result = await qr_service.start_login()

@@ -262,27 +288,28 @@ class PublishService:
            }

    def get_login_session_status(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
        """Get the status of the active login session"""
        """Get the status of the active login session (used only for QR-scan polling)"""
        session_key = self._get_session_key(platform, user_id)

        # 1. If there is an active QR-scan session, check it first

        # Only check the active QR-scan session; do not check the cookie file
        # Checking the cookie file would misreport "already logged in" during a re-login
        if session_key in self.active_login_sessions:
            qr_service = self.active_login_sessions[session_key]
            status = qr_service.get_login_status()

            # If login succeeded and the cookie has been saved, clean up the session
            if status["success"] and status["cookies_saved"]:
                del self.active_login_sessions[session_key]
                return {"success": True, "message": "登录成功"}

            return {"success": False, "message": "等待扫码..."}

        # 2. Check whether a local cookie file exists
        cookie_file = self._get_cookie_path(platform, user_id)
        if cookie_file.exists():
            return {"success": True, "message": "已登录 (历史状态)"}

        return {"success": False, "message": "未登录"}

            # Face verification: pass the new QR code to the frontend
            result: Dict[str, Any] = {"success": False, "message": "等待扫码..."}
            if status.get("face_verify_qr"):
                result["face_verify_qr"] = status["face_verify_qr"]
            return result

        # No active session → return False (the frontend should not poll without a session)
        return {"success": False, "message": "无活跃登录会话"}

    def logout(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
        """
@@ -310,48 +337,88 @@ class PublishService:
            logger.exception(f"[登出] 失败: {e}")
            return {"success": False, "message": f"注销失败: {str(e)}"}

    async def save_cookie_string(self, platform: str, cookie_string: str, user_id: Optional[str] = None) -> Dict[str, Any]:
        """
        Save the cookie string extracted from the client browser
    async def save_cookie_string(self, platform: str, cookie_string: str, user_id: Optional[str] = None) -> Dict[str, Any]:
        """
        Save the cookie string extracted from the client browser

        Args:
            platform: platform ID
            cookie_string: cookie string in document.cookie format
            user_id: user ID (used for cookie isolation)
        """
        try:
            account_file = self._get_cookie_path(platform, user_id)

            # Parse the cookie string
            cookie_dict = {}
            for item in cookie_string.split('; '):
                if '=' in item:
                    name, value = item.split('=', 1)
                    cookie_dict[name] = value

            # Special handling for Bilibili
            if platform == "bilibili":
                bilibili_cookies = {}
                required_fields = ['SESSDATA', 'bili_jct', 'DedeUserID', 'DedeUserID__ckMd5']
        """
        try:
            if platform not in self.PLATFORMS:
                return {
                    "success": False,
                    "message": f"不支持的平台: {platform}",
                }

            account_file = self._get_cookie_path(platform, user_id)

            # Parse the cookie string
            cookie_dict: Dict[str, str] = {}
            for item in cookie_string.split(';'):
                item = item.strip()
                if not item:
                    continue
                if '=' in item:
                    name, value = item.split('=', 1)
                    cookie_dict[name.strip()] = value.strip()

            if not cookie_dict:
                return {
                    "success": False,
                    "message": "Cookie 为空,请确认已完成登录",
                }

            # Special handling for Bilibili
            if platform == "bilibili":
                bilibili_cookies = {}
                required_fields = ['SESSDATA', 'bili_jct', 'DedeUserID', 'DedeUserID__ckMd5']

                for field in required_fields:
                    if field in cookie_dict:
                        bilibili_cookies[field] = cookie_dict[field]

                if len(bilibili_cookies) < 3:
                    return {
                        "success": False,
                        "message": "Cookie不完整,请确保已登录"
                    }

                cookie_dict = bilibili_cookies

            # Make sure the directory exists
            account_file.parent.mkdir(parents=True, exist_ok=True)

            # Save the cookie
            with open(account_file, 'w', encoding='utf-8') as f:
                json.dump(cookie_dict, f, indent=2)
                if len(bilibili_cookies) < 3:
                    return {
                        "success": False,
                        "message": "Cookie不完整,请确保已登录"
                    }
                payload: Any = bilibili_cookies
            else:
                cookie_domain = self.COOKIE_DOMAINS.get(platform, "")
                if not cookie_domain:
                    platform_url = self.PLATFORMS.get(platform, {}).get("url", "")
                    host = re.sub(r"^https?://", "", platform_url).strip("/")
                    cookie_domain = f".{host}" if host else ""

                storage_cookies = []
                for name, value in cookie_dict.items():
                    if not name:
                        continue
                    storage_cookies.append({
                        "name": name,
                        "value": value,
                        "domain": cookie_domain,
                        "path": "/",
                        "httpOnly": False,
                        "secure": True,
                        "sameSite": "Lax",
                        "expires": -1,
                    })

                payload = {
                    "cookies": storage_cookies,
                    "origins": [],
                }

            # Make sure the directory exists
            account_file.parent.mkdir(parents=True, exist_ok=True)

            # Save the cookie
            with open(account_file, 'w', encoding='utf-8') as f:
                json.dump(payload, f, indent=2)

            logger.success(f"[登录] {platform} Cookie已保存 (user: {user_id or 'legacy'})")

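The `save_cookie_string` hunk above changes two things: cookie parsing now tolerates `;` separators with or without surrounding whitespace, and non-Bilibili cookies are wrapped in a `{"cookies": [...], "origins": []}` payload. The sketch below isolates those two steps as pure functions so they can be exercised without touching the filesystem. `parse_cookie_string` and `to_storage_state` are hypothetical helper names, not functions from the project, and the sketch drops empty cookie names during parsing rather than in the later loop. The payload layout resembles a browser-automation storage-state file, but the diff does not name its consumer, so treat that as an assumption.

```python
from typing import Any, Dict, List


def parse_cookie_string(cookie_string: str) -> Dict[str, str]:
    """Parse a document.cookie-style string into a name -> value mapping."""
    cookies: Dict[str, str] = {}
    for item in cookie_string.split(";"):
        item = item.strip()
        # Skip empty fragments (e.g. from ";;") and fragments without "=".
        if not item or "=" not in item:
            continue
        name, value = item.split("=", 1)  # values may themselves contain "="
        name = name.strip()
        if name:
            cookies[name] = value.strip()
    return cookies


def to_storage_state(cookies: Dict[str, str], domain: str) -> Dict[str, Any]:
    """Wrap parsed cookies in the {"cookies": [...], "origins": []} layout from the diff."""
    entries: List[Dict[str, Any]] = [
        {
            "name": name,
            "value": value,
            "domain": domain,
            "path": "/",
            "httpOnly": False,
            "secure": True,
            "sameSite": "Lax",
            "expires": -1,
        }
        for name, value in cookies.items()
    ]
    return {"cookies": entries, "origins": []}
```

Splitting on `";"` and stripping each fragment is what lets the new code accept input that the old `split('; ')` version would have mis-parsed, for example a string copied without the space after each semicolon.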
File diff suppressed because it is too large
Some files were not shown because too many files have changed in this diff