Compare commits

14 Commits
v2.6.1 ... main

Author      SHA1        Message  Date
Kevin Wong  0e3502c6f0  Update   2026-02-27 16:11:34 +08:00
Kevin Wong  a1604979f0  Update   2026-02-26 11:13:03 +08:00
Kevin Wong  08221e48de  Update   2026-02-26 10:49:22 +08:00
Kevin Wong  42b5cc0c02  Update   2026-02-26 10:14:41 +08:00
Kevin Wong  1717635bfd  Update   2026-02-25 17:51:58 +08:00
Kevin Wong  0a5a17402c  Update   2026-02-24 16:55:29 +08:00
Kevin Wong  bc0fe9326a  Update   2026-02-11 17:48:38 +08:00
Kevin Wong  035ee29d72  Update   2026-02-11 14:33:05 +08:00
Kevin Wong  a6cc919e5c  Update   2026-02-11 13:57:41 +08:00
Kevin Wong  96a298e51c  Update   2026-02-11 13:48:45 +08:00
Kevin Wong  e33dfc3031  Update   2026-02-10 13:31:29 +08:00
Kevin Wong  3129d45b25  Update   2026-02-09 14:47:19 +08:00
Kevin Wong  e226224119  Update   2026-02-08 19:54:11 +08:00
Kevin Wong  ee342cc40f  Update   2026-02-08 16:23:39 +08:00
462 changed files with 219322 additions and 3566 deletions

18
.gitignore vendored

@@ -20,11 +20,14 @@ node_modules/
out/
.turbo/
# ============ IDE ============
# ============ IDE / AI tools ============
.vscode/
.idea/
*.swp
*.swo
.agents/
.opencode/
.claude/
# ============ System files ============
.DS_Store
@@ -35,11 +38,22 @@ desktop.ini
backend/outputs/
backend/uploads/
backend/cookies/
backend/user_data/
backend/debug_screenshots/
backend/keys/
*_cookies.json
# ============ MuseTalk ============
# ============ Model weights ============
models/*/checkpoints/
models/MuseTalk/models/
models/MuseTalk/results/
models/LatentSync/temp/
# ============ Remotion build ============
remotion/dist/
# ============ Temporary files ============
Temp/
# ============ Logs ============
*.log

278
Docs/ALIPAY_DEPLOY.md Normal file

@@ -0,0 +1,278 @@
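# Alipay Paid Membership Activation: Deployment Guide
This document covers the full deployment process for Alipay's PC Website Payment (电脑网站支付) product. After registering, a user pays through Alipay and the membership is activated automatically, valid for 1 year.
---
## Prerequisites
- An Alipay enterprise or individual-merchant account
- An application created on the [Alipay Open Platform](https://open.alipay.com), with its APPID
- The application has the **PC Website Payment (电脑网站支付)** product enabled (the `alipay.trade.page.pay` API)
- The server domain is served over HTTPS (Alipay callbacks require a publicly reachable address)
---
## Part 1: Alipay Open Platform Configuration
### 1. Create an application
Log in at https://open.alipay.com → Console → Create Application (or use an existing application).
### 2. Enable the PC Website Payment product
Open the application details → Product Binding / Product Management → add **PC Website Payment (电脑网站支付)** → submit for review.
> **Note**: If this product is not enabled, calls fail with `ACQ.ACCESS_FORBIDDEN`.
### 3. Generate a key pair
Open the application details → Development Settings → API Signing Method → choose **RSA2 (SHA256)**:
1. Generate an RSA 2048-bit key pair with Alipay's official key tool
2. Upload the **application public key** to the Open Platform
3. After the upload, the platform displays the **Alipay public key** (`alipayPublicKey_RSA2`)
You end up with two things:
- **Application private key**: kept locally; the code uses it to sign requests
- **Alipay public key**: returned by the platform; the code uses it to verify callback signatures
> The application public key is only an intermediate artifact for the upload; the code does not need it.
---
## Part 2: Server Configuration
### 1. Place the key files
Save the keys in standard PEM format under the `backend/keys/` directory: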
```bash
mkdir -p /home/rongye/ProgramFiles/ViGent2/backend/keys
```
**`backend/keys/app_private_key.pem`** (application private key):
```
-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASC...(your private key content)
...
-----END PRIVATE KEY-----
```
**`backend/keys/alipay_public_key.pem`** (Alipay public key):
```
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8A...(Alipay public key content)
...
-----END PUBLIC KEY-----
```
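#### PEM format requirements
Alipay's key tool exports a single line of plain text, which must be converted to standard PEM format:
- Header and footer markers are required (`-----BEGIN/END ...-----`)
- The key body is wrapped at 64 characters per line
- The private key header is `-----BEGIN PRIVATE KEY-----` (PKCS#8 format)
- The public key header is `-----BEGIN PUBLIC KEY-----`
If you have a bare single-line key, convert it with the following commands:
```bash
# Format the private key (assuming the bare key is in raw_private.txt)
{
  echo "-----BEGIN PRIVATE KEY-----"
  tr -d ' \r\n' < raw_private.txt | fold -w 64
  echo  # fold does not emit a final newline, so add one before the footer
  echo "-----END PRIVATE KEY-----"
} > app_private_key.pem
# Format the public key (assuming the bare key is in raw_public.txt)
{
  echo "-----BEGIN PUBLIC KEY-----"
  tr -d ' \r\n' < raw_public.txt | fold -w 64
  echo
  echo "-----END PUBLIC KEY-----"
} > alipay_public_key.pem
```
> The `backend/keys/` directory is listed in `.gitignore` and will not be committed to the repository.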
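Where `fold` is not available, the same wrapping rule (header, 64-character lines, footer) can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the project: the helper name `wrap_pem` is hypothetical, and it only reflows text without validating that the input is a real key.

```python
def wrap_pem(raw_key: str, label: str) -> str:
    """Wrap a bare base64 key in PEM markers with 64-character lines.

    `label` is the marker text, e.g. "PRIVATE KEY" or "PUBLIC KEY".
    Hypothetical helper for illustration only; it does not validate the key.
    """
    # Remove spaces and line breaks that may have been copied along with the key.
    body = "".join(raw_key.split())
    # Slice the body into 64-character lines, as the PEM format expects.
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join(
        [f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]
    ) + "\n"
```

A call such as `wrap_pem(open("raw_private.txt").read(), "PRIVATE KEY")` would return the text to write to `app_private_key.pem`.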
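### 2. Configure environment variables
Add the following to `backend/.env`:
```ini
# =============== Alipay configuration ===============
ALIPAY_APP_ID=your_app_id
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
ALIPAY_NOTIFY_URL=https://vigent.hbyrkj.top/api/payment/notify
ALIPAY_RETURN_URL=https://vigent.hbyrkj.top/pay
```
| Variable | Description |
|------|------|
| `ALIPAY_APP_ID` | Alipay Open Platform application APPID |
| `ALIPAY_PRIVATE_KEY_PATH` | Absolute path to the application private key PEM file |
| `ALIPAY_PUBLIC_KEY_PATH` | Absolute path to the Alipay public key PEM file |
| `ALIPAY_NOTIFY_URL` | Asynchronous callback URL (server-to-server); must be reachable over public HTTPS |
| `ALIPAY_RETURN_URL` | Synchronous return URL (the page the browser is redirected to after the user pays) |
`config.py` also defines a few tunable parameters (they already have defaults, so they normally do not need to be added to `.env`):
| Variable | Default | Description |
|------|--------|------|
| `ALIPAY_SANDBOX` | `false` | Whether to use the sandbox environment |
| `PAYMENT_AMOUNT` | `999.00` | Membership price (CNY) |
| `PAYMENT_EXPIRE_DAYS` | `365` | Membership validity in days |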
### 3. Create the database table
Run the following against the local Supabase instance through Docker:
```bash
docker exec -i supabase-db psql -U postgres -c "
CREATE TABLE IF NOT EXISTS orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
out_trade_no TEXT UNIQUE NOT NULL,
amount DECIMAL(10, 2) NOT NULL DEFAULT 999.00,
status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'paid', 'failed')),
trade_no TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
paid_at TIMESTAMP WITH TIME ZONE
);
CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id);
CREATE INDEX IF NOT EXISTS idx_orders_out_trade_no ON orders(out_trade_no);
"
```
### 4. Install dependencies
```bash
# Backend (inside the venv)
cd /home/rongye/ProgramFiles/ViGent2/backend
venv/bin/pip install python-alipay-sdk
```
> The frontend needs no additional dependencies.
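### 5. Nginx configuration
Make sure Nginx proxies `/api/payment/notify` to the backend. If the existing configuration already covers the `/api/` prefix, no further change is needed:
```nginx
location /api/ {
proxy_pass http://localhost:8006;
# ... existing configuration
}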
```
### 6. Restart the services
```bash
# Build the frontend
cd /home/rongye/ProgramFiles/ViGent2/frontend
npx next build
# Restart
pm2 restart vigent2-backend
pm2 restart vigent2-frontend
```
---
## Part 3: Going Live
Once testing passes, change the test amount in `backend/app/core/config.py` to the production price:
```python
PAYMENT_AMOUNT: float = 999.00 # production price
```
Or override it in `backend/.env`:
```ini
PAYMENT_AMOUNT=999.00
```
Then restart the backend:
```bash
pm2 restart vigent2-backend
```
---
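## Payment Flow
```
User registers → logs in (password correct but is_active=false)
→ backend returns 403 + payment_token
→ frontend redirects to the /pay page
→ POST /api/payment/create-order → returns the Alipay cashier URL
→ frontend redirects to the Alipay cashier page (QR scan, account login, balance, and other payment methods)
→ user completes payment
→ Alipay asynchronous callback: POST /api/payment/notify
→ backend verifies the signature → updates the order → activates the user (is_active=true, expires_at=+365 days)
→ Alipay synchronous redirect back to /pay?out_trade_no=xxx
→ frontend polls GET /api/payment/status/{out_trade_no}
→ poll returns paid → show success → redirect to the login page
→ user logs in again → enters the system
```
**PC Website Payment vs. Face-to-Face Payment**: PC Website Payment (`alipay.trade.page.pay`) redirects to Alipay's official cashier page, where the user can pay by QR scan, Alipay account login, balance, and other methods, which gives a better experience. Face-to-Face Payment (`alipay.trade.precreate`) only generates a QR code and supports scan-to-pay only.
Renewal after expiry follows the same flow: expiry is detected at login → `PAYMENT_REQUIRED` is returned → redirect to /pay.
Manual activation by an administrator is unaffected; both methods coexist.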
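The callback step of this flow can be sketched as a pure decision function. This is an assumption about how a handler might choose its reply, not the project's actual `service.py`: the name `handle_notify` and its return shape are hypothetical, signature verification is assumed to have been done by the caller, and treating an already paid order as a no-op is an assumed idempotency choice (Alipay retries callbacks). `TRADE_SUCCESS` and `TRADE_FINISHED` are the Alipay `trade_status` values for a completed payment.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Alipay trade_status values that indicate the money has been received.
PAID_STATUSES = {"TRADE_SUCCESS", "TRADE_FINISHED"}


def handle_notify(
    signature_ok: bool,
    trade_status: str,
    order_status: str,
    now: datetime,
    expire_days: int = 365,
) -> Tuple[str, str, Optional[datetime]]:
    """Return (reply_text, new_order_status, new_expires_at).

    Hypothetical sketch: reply_text is the plain-text body sent back to Alipay.
    new_expires_at is None when the membership should not be changed.
    """
    if not signature_ok:
        # Never trust an unverified callback; "fail" asks Alipay to retry.
        return "fail", order_status, None
    if order_status == "paid":
        # Repeated callback for an order already handled: acknowledge only.
        return "success", "paid", None
    if trade_status in PAID_STATUSES:
        return "success", "paid", now + timedelta(days=expire_days)
    # Other statuses (for example a still-unpaid trade) leave the order as is.
    return "success", order_status, None
```

Keeping the decision separate from database and HTTP code makes the rule easy to test, for example `handle_notify(True, "TRADE_SUCCESS", "pending", datetime.now(timezone.utc))`.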
---
## Files Involved
| File | Change | Description |
|------|---------|------|
| `backend/requirements.txt` | Modified | Adds `python-alipay-sdk` |
| `backend/database/schema.sql` | Modified | Adds the `orders` table |
| `backend/app/core/config.py` | Modified | Alipay settings |
| `backend/app/core/security.py` | Modified | payment_token functions |
| `backend/app/core/deps.py` | Modified | is_active safety fallback |
| `backend/app/repositories/orders.py` | New | orders data layer |
| `backend/app/modules/payment/__init__.py` | New | Module initialization |
| `backend/app/modules/payment/schemas.py` | New | Request/response models |
| `backend/app/modules/payment/service.py` | New | Payment business logic (PC Website Payment) |
| `backend/app/modules/payment/router.py` | New | 3 API endpoints |
| `backend/app/modules/auth/router.py` | Modified | Login returns PAYMENT_REQUIRED |
| `backend/app/main.py` | Modified | Registers payment_router |
| `backend/.env` | Modified | Alipay environment variables |
| `backend/keys/` | New | PEM key files |
| `frontend/src/shared/lib/auth.ts` | Modified | login() handles paymentToken |
| `frontend/src/shared/api/axios.ts` | Modified | Adds /pay to PUBLIC_PATHS |
| `frontend/src/app/login/page.tsx` | Modified | paymentToken redirect |
| `frontend/src/app/register/page.tsx` | Modified | Registration success message text |
| `frontend/src/app/pay/page.tsx` | New | Payment page (redirects to the Alipay cashier) |
---
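## FAQ
### RSA key format is not supported
The key file is missing the PEM header/footer markers or is not wrapped at 64 characters. Reformat it following "PEM format requirements".
### ACQ.ACCESS_FORBIDDEN
The application has not enabled the PC Website Payment (电脑网站支付) product. Add and enable it under Alipay Open Platform → Application Details → Product Management.
### Alipay callbacks are not arriving
1. Check that `ALIPAY_NOTIFY_URL` is reachable over public HTTPS
2. Check that Nginx proxies `/api/payment/notify` to the backend
3. If a callback times out (no response within 15 s), Alipay retries it, 8 times in total over 24 hours
### The page does not redirect back after payment
Check that `ALIPAY_RETURN_URL` is configured correctly; it must be the full URL of the frontend `/pay` page (e.g. `https://vigent.hbyrkj.top/pay`). After the user pays, Alipay redirects the browser to this address and appends parameters such as `out_trade_no`.
### The frontend shows "Network error" instead of the specific error
The API functions were missing a try/catch around the axios exception. This has been fixed in `register()` and `login()` in `auth.ts`.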


@@ -29,15 +29,29 @@ backend/
├── app/
│ ├── core/ # config, deps, security, response
│ ├── modules/ # business modules (routes + logic)
│ │ ├── videos/
│ │ ├── materials/
│ │ ├── publish/
│ │ ├── auth/
│ │ └── ...
│ │ ├── videos/ # video generation tasks (router/schemas/service/workflow)
│ │ ├── materials/ # material management (router/schemas/service)
│ │ ├── publish/ # multi-platform publishing
│ │ ├── auth/ # authentication and sessions
│ │ ├── ai/ # AI features (title/tag generation, multilingual translation)
│ │ ├── assets/ # static assets (fonts/styles/BGM)
│ │ ├── ref_audios/ # voice-clone reference audio (router/schemas/service)
│ │ ├── generated_audios/ # pre-generated voiceover management (router/schemas/service)
│ │ ├── login_helper/ # QR-code login helper
│ │ ├── tools/ # tool endpoints (router/schemas/service)
│ │ ├── payment/ # Alipay paid activation (router/schemas/service)
│ │ └── admin/ # administrator features
│ ├── repositories/ # Supabase data access
│ ├── services/ # external service integrations
│ │ ├── uploader/ # platform publishers (douyin/weixin)
│ │ ├── qr_login_service.py
│ │ ├── publish_service.py
│ │ ├── remotion_service.py
│ │ ├── storage.py
│ │ └── ...
│ └── tests/
├── assets/ # fonts / styles / bgm
├── user_data/ # per-user isolated data (cookies, etc.)
├── scripts/
└── requirements.txt
```
@@ -61,6 +75,18 @@ backend/
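- Errors are raised via `HTTPException` and returned uniformly by the global exception handler as `{success:false, message, code}`.
- `detail` is no longer used as the frontend error text (the frontend now reads `message`).
### `/api/videos/generate` parameter contract (key conventions)
- Each `custom_assignments` item uses `material_path/start/end/source_start/source_end?`, and the visible timeline segment is authoritative.
- `output_aspect_ratio` accepts only `9:16` / `16:9`, default `9:16`.
- Title display mode parameters:
  - `title_display_mode`: `short` / `persistent` (default `short`)
  - `title_duration`: default `4.0` (seconds); only takes effect in `short` mode
- Opening secondary-title parameters:
  - `secondary_title`: secondary title text (optional, at most 20 characters); shown only in the video frame, not part of the publish title
  - `secondary_title_style_id` / `secondary_title_font_size` / `secondary_title_top_margin`: secondary title style settings
- The workflow/remotion side must pass these fields through unchanged to avoid semantic drift between frontend and backend.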
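The constraints in this contract can be sketched as a small validated model. This is an illustration using only the standard library; the class name `GenerateParams` is hypothetical, it covers only the four fields whose rules are stated above, and the project's real `schemas.py` may be structured differently.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_ASPECT_RATIOS = {"9:16", "16:9"}
ALLOWED_TITLE_MODES = {"short", "persistent"}
SECONDARY_TITLE_MAX_LEN = 20


@dataclass
class GenerateParams:
    """Hypothetical subset of the /api/videos/generate request fields."""

    output_aspect_ratio: str = "9:16"
    title_display_mode: str = "short"
    title_duration: float = 4.0  # seconds; only meaningful in "short" mode
    secondary_title: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the documented sets instead of guessing.
        if self.output_aspect_ratio not in ALLOWED_ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.output_aspect_ratio}")
        if self.title_display_mode not in ALLOWED_TITLE_MODES:
            raise ValueError(f"unsupported title mode: {self.title_display_mode}")
        if (
            self.secondary_title is not None
            and len(self.secondary_title) > SECONDARY_TITLE_MAX_LEN
        ):
            raise ValueError("secondary_title is limited to 20 characters")
```

Validating once at the API boundary means the workflow and Remotion layers can pass the fields through without re-checking them.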
---
## 4. Authentication and Permissions
@@ -84,6 +110,21 @@ backend/
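- All file upload/download/delete/move operations go through `services/storage.py`.
- To rename a file, use `move_file`; avoid reading or writing Storage directly.
### Cookie storage (per-user isolation)
Cookies produced by multi-platform QR-code login are stored separately for each user: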
```
backend/user_data/{user_uuid}/cookies/
├── douyin_cookies.json
├── weixin_cookies.json
└── ...
```
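- `publish_service.py` locates them via `_get_cookies_dir(user_id)` / `_get_cookie_path(user_id, platform)`
- Session key format: `"{user_id}_{platform}"`, so concurrent logins by multiple users do not interfere with each other
- After a successful login the cookies are saved automatically to the corresponding path and loaded automatically when publishing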
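The layout above can be sketched as three small helpers. These are illustrative stand-ins, not the private functions in `publish_service.py`: the names, the `USER_DATA_ROOT` constant, and the relative root path are assumptions; only the directory layout and the session key format come from the text.

```python
from pathlib import Path

# Assumed root; the real service may resolve this from its settings.
USER_DATA_ROOT = Path("backend/user_data")


def get_cookies_dir(user_id: str) -> Path:
    """Directory holding every platform's cookies for one user."""
    return USER_DATA_ROOT / user_id / "cookies"


def get_cookie_path(user_id: str, platform: str) -> Path:
    """Cookie file for one user on one platform, e.g. douyin_cookies.json."""
    return get_cookies_dir(user_id) / f"{platform}_cookies.json"


def session_key(user_id: str, platform: str) -> str:
    """Key for the in-progress login session of one user on one platform."""
    return f"{user_id}_{platform}"
```

Because both the path and the session key include the user ID, two users logging in to the same platform at the same time never share a file or a session entry.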
---
## 7. Code Conventions
@@ -97,10 +138,13 @@ backend/
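## 8. Recommended Development Workflow
- **New features**: create the module first, then write router/service/workflow.
- **Bug fixes**: while you are there, move the affected logic into the corresponding service/workflow.
- **New features**: create the module first; it **must** include `router.py + schemas.py + service.py`. Router-only modules are not allowed.
- **Bug fixes**: while you are there, move the affected logic into the corresponding service/workflow (incremental refactoring).
- **Changing old modules**: split out only the part you touch; refactoring the whole file at once is not required.
- **Core flow changes**: smoke tests are mandatory (login/generate/publish).
> **Incremental principle**: hold new code to a high standard and improve old code gradually. Avoid large one-off refactors so as not to introduce regressions.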
---
## 9. Common Environment Variables
@@ -110,14 +154,37 @@ backend/
- `REDIS_URL`
- `GLM_API_KEY`
- `LATENTSYNC_*`
- `CORS_ORIGINS` (CORS allowlist, default *)
### MuseTalk / hybrid lip sync
- `MUSETALK_GPU_ID` (GPU index, default 0)
- `MUSETALK_API_URL` (resident service URL, default http://localhost:8011)
- `MUSETALK_BATCH_SIZE` (inference batch size, default 32)
- `MUSETALK_VERSION` (v15)
- `MUSETALK_USE_FLOAT16` (half precision, default true)
- `LIPSYNC_DURATION_THRESHOLD` (seconds; durations >= this value use MuseTalk; default 120)
### WeChat Channels
- `WEIXIN_HEADLESS_MODE` (headful/headless-new)
- `WEIXIN_CHROME_PATH` / `WEIXIN_BROWSER_CHANNEL`
- `WEIXIN_USER_AGENT` / `WEIXIN_LOCALE` / `WEIXIN_TIMEZONE_ID`
- `WEIXIN_FORCE_SWIFTSHADER`
- `WEIXIN_TRANSCODE_MODE` (reencode/faststart/off)
- `CORS_ORIGINS` (CORS allowlist, default *)
- `SUPABASE_STORAGE_LOCAL_PATH` (local storage path)
- `DOUYIN_COOKIE` (cookie for Douyin video download)
### Douyin
- `DOUYIN_HEADLESS_MODE` (headful/headless-new; default headless-new)
- `DOUYIN_CHROME_PATH` / `DOUYIN_BROWSER_CHANNEL`
- `DOUYIN_USER_AGENT` (default Chrome/144)
- `DOUYIN_LOCALE` / `DOUYIN_TIMEZONE_ID`
- `DOUYIN_FORCE_SWIFTSHADER`
- `DOUYIN_DEBUG_ARTIFACTS` / `DOUYIN_RECORD_VIDEO` / `DOUYIN_KEEP_SUCCESS_VIDEO`
### Alipay
- `ALIPAY_APP_ID` / `ALIPAY_PRIVATE_KEY_PATH` / `ALIPAY_PUBLIC_KEY_PATH`
- `ALIPAY_NOTIFY_URL` / `ALIPAY_RETURN_URL`
- `ALIPAY_SANDBOX` (sandbox mode, default false)
- `PAYMENT_AMOUNT` (membership price, default 999.00)
- `PAYMENT_EXPIRE_DAYS` (membership validity in days, default 365)
---


@@ -15,11 +15,24 @@ backend/
├── app/
│ ├── core/ # core configuration (config.py, security.py, response.py)
│ ├── modules/ # business modules (router/service/workflow/schemas)
│ │ ├── videos/ # video generation tasks (router/schemas/service/workflow)
│ │ ├── materials/ # material management (router/schemas/service)
│ │ ├── publish/ # multi-platform publishing
│ │ ├── auth/ # authentication and sessions
│ │ ├── ai/ # AI features (title/tag generation, multilingual translation)
│ │ ├── assets/ # static assets (fonts/styles/BGM)
│ │ ├── ref_audios/ # voice-clone reference audio (router/schemas/service)
│ │ ├── generated_audios/ # pre-generated voiceover management (router/schemas/service)
│ │ ├── login_helper/ # QR-code login helper
│ │ ├── tools/ # tool endpoints (router/schemas/service)
│ │ ├── payment/ # Alipay paid activation (router/schemas/service)
│ │ └── admin/ # administrator features
│ ├── repositories/ # Supabase data access
│ ├── services/ # external service integrations (TTS/Remotion/Storage, etc.)
│ ├── services/ # external service integrations (TTS/Remotion/Storage/Uploader, etc.)
│ └── tests/ # unit and integration tests
├── scripts/ # ops scripts (watchdog.py, init_db.py)
├── assets/ # asset library (fonts, bgm, styles)
├── user_data/ # per-user isolated data (cookies, etc.)
└── requirements.txt # dependency list
```
@@ -39,14 +52,15 @@ backend/
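* `POST /api/auth/register`: Register a user
* `GET /api/auth/me`: Get the current user's information
> Authorization validity policy: at login and when authenticating protected endpoints, the backend checks `users.expires_at`. An expired account is deactivated automatically (`is_active=false`), its sessions are cleared, and the response is `403: 会员已到期,请续费` ("Membership expired, please renew").
2. **Video Generation (Videos)**
* `POST /api/videos/generate`: Submit a generation task
* `GET /api/videos/tasks/{task_id}`: Query task status
* `GET /api/videos/tasks/{task_id}`: Query a single task's status
* `GET /api/videos/tasks`: List all of the user's tasks
* `GET /api/videos/generated`: List historical videos
* `DELETE /api/videos/generated/{video_id}`: Delete a historical video
> **Correction (16:20)**: The task query and history list endpoints have been updated to `/api/videos/tasks/{task_id}` and `/api/videos/generated`.
3. **Material Management (Materials)**
* `POST /api/materials`: Upload a material
* `GET /api/materials`: List materials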
@@ -54,14 +68,49 @@ backend/
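4. **Social Publishing (Publish)**
* `POST /api/publish`: Publish a video to Douyin / WeChat Channels / Bilibili / Xiaohongshu
* `POST /api/publish/login`: Log in to a platform by QR code
* `GET /api/publish/login/status`: Query login status (including the face-verification QR code)
* `GET /api/publish/accounts`: List logged-in accounts
> Tip: for WeChat Channels publishing, run the backend in headful mode under xvfb-run to avoid headless decoding failures
> Tip: for WeChat Channels / Douyin publishing, run the backend in headful mode under xvfb-run.
5. **Asset Library (Assets)**
* `GET /api/assets/subtitle-styles`: List subtitle styles
* `GET /api/assets/title-styles`: List title styles
* `GET /api/assets/bgm`: List background music
6. **Voice Cloning (Ref Audios)**
* `POST /api/ref-audios`: Upload reference audio (multipart/form-data; ref_text is transcribed automatically with Whisper)
* `GET /api/ref-audios`: List reference audio
* `PUT /api/ref-audios/{id}`: Rename reference audio
* `DELETE /api/ref-audios/{id}`: Delete reference audio
* `POST /api/ref-audios/{id}/retranscribe`: Re-transcribe the reference audio text (Whisper transcription + automatic trimming when longer than 10 s)
7. **AI Features (AI)**
* `POST /api/ai/generate-meta`: Generate a title and tags with AI
* `POST /api/ai/translate`: AI multilingual translation (9 target languages supported)
8. **Pre-generated Voiceovers (Generated Audios)**
* `POST /api/generated-audios/generate`: Generate a voiceover asynchronously (returns task_id)
* `GET /api/generated-audios/tasks/{task_id}`: Poll generation progress
* `GET /api/generated-audios`: List all of the user's voiceovers
* `DELETE /api/generated-audios/{audio_id}`: Delete a voiceover
* `PUT /api/generated-audios/{audio_id}`: Rename a voiceover
9. **Tools**
* `POST /api/tools/extract-script`: Extract the script from a video link
10. **Health Checks**
* `GET /api/lipsync/health`: Lip-sync service health (covers LatentSync + MuseTalk + the hybrid routing threshold)
* `GET /api/voiceclone/health`: CosyVoice 3.0 service health
11. **Payment**
* `POST /api/payment/create-order`: Create an Alipay PC Website Payment order (requires payment_token)
* `POST /api/payment/notify`: Alipay asynchronous notification callback (returns plain-text success/fail)
* `GET /api/payment/status/{out_trade_no}`: Query order payment status (polled by the frontend)
> If the account is inactive or expired at login, the backend returns 403 + `payment_token` and the frontend redirects to the `/pay` page to complete payment. See the [Alipay Deployment Guide](ALIPAY_DEPLOY.md).
### Unified Response Structure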
```json
@@ -79,13 +128,39 @@ backend/
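`POST /api/videos/generate` supports the following optional fields:
- `material_path`: video material path (single-material mode)
- `material_paths`: array of material paths (multi-camera mode; with ≥2 materials, switches automatically per sentence)
- `tts_mode`: TTS mode (`edgetts` / `voiceclone`)
- `voice`: EdgeTTS voice ID (edgetts mode)
- `ref_audio_id` / `ref_text`: reference audio ID and text (voiceclone mode)
- `generated_audio_id`: pre-generated voiceover ID (when present, inline TTS is skipped and the existing voiceover file is used)
- `speed`: speech rate (voice clone mode, default 1.0, range 0.8-1.2)
- `custom_assignments`: array of custom material assignments (each item has `material_path` / `start` / `end` / `source_start` / `source_end?`); when present, generation follows the visible timeline segments first
- `output_aspect_ratio`: output aspect ratio (`9:16` or `16:9`, default `9:16`)
- `language`: TTS language (auto-detected by default; passed through to CosyVoice 3.0 for voice cloning)
- `title`: opening title text
- `title_display_mode`: title display mode (`short` / `persistent`, default `short`)
- `title_duration`: title display duration (seconds, default `4.0`; only takes effect in `short` mode)
- `subtitle_style_id`: subtitle style ID
- `title_style_id`: title style ID
- `subtitle_font_size`: subtitle font size (overrides the style default)
- `title_font_size`: title font size (overrides the style default)
- `title_top_margin`: title distance from the top, in pixels
- `secondary_title`: opening secondary title text (optional, at most 20 characters, shown only in the video frame)
- `secondary_title_style_id`: secondary title style ID
- `secondary_title_font_size`: secondary title font size
- `secondary_title_top_margin`: gap between the secondary title and the main title
- `subtitle_bottom_margin`: subtitle distance from the bottom, in pixels
- `enable_subtitles`: whether subtitles are enabled
- `bgm_id`: background music ID
- `bgm_volume`: background music volume (0-1, default 0.2)
### Multi-material stability notes
- Multi-material clips are uniformly re-encoded before concatenation and forced to `25fps + CFR`, which reduces stutter caused by inconsistent time bases at segment boundaries.
- The concat step enables `+genpts` to rebuild timestamps, improving timeline continuity after concatenation.
- MOV materials that carry rotation metadata are orientation-normalized first, before the resolution check and later steps.
## 📦 Asset Library and Static Assets
- Local asset directory: `backend/assets/{fonts,bgm,styles}`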
@@ -127,6 +202,12 @@ GLM_API_KEY=your_glm_api_key
# LatentSync configuration
LATENTSYNC_GPU_ID=1
# MuseTalk configuration (long-video lip sync)
MUSETALK_GPU_ID=0
MUSETALK_API_URL=http://localhost:8011
MUSETALK_BATCH_SIZE=32
LIPSYNC_DURATION_THRESHOLD=120
```
### 4. Start the service
@@ -149,6 +230,14 @@ uvicorn app.main:app --host 0.0.0.0 --port 8006 --reload
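3. **Important**: if the model occupies a GPU, be sure to use `asyncio.Lock` for concurrency control to prevent OOM.
4. Create the corresponding module under `app/modules/`, add router/service/schemas, and register the route in `main.py`.
### Hybrid lip-sync routing
`lipsync_service.py` implements hybrid LatentSync + MuseTalk routing:
- Short videos (< `LIPSYNC_DURATION_THRESHOLD` s) → LatentSync 1.6 (GPU1, port 8007)
- Long videos (>= threshold) → MuseTalk 1.5 (GPU0, port 8011)
- Falls back to LatentSync automatically when MuseTalk is unavailable
- The routing logic is fully transparent to the workflow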
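The routing rule above can be sketched as a single function. This is a simplified illustration, not the code in `lipsync_service.py`: the function name and the `musetalk_available` flag are hypothetical, and how availability is actually detected (for example through a health check) is not shown.

```python
def pick_lipsync_backend(
    duration_s: float,
    threshold_s: float = 120.0,
    musetalk_available: bool = True,
) -> str:
    """Choose the lip-sync backend for a video of the given duration.

    Hypothetical sketch: threshold_s mirrors LIPSYNC_DURATION_THRESHOLD.
    """
    # Long videos go to MuseTalk, but only if the service is reachable.
    if duration_s >= threshold_s and musetalk_available:
        return "musetalk"
    # Short videos, and long videos when MuseTalk is down, use LatentSync.
    return "latentsync"
```

A video of exactly 120 seconds is treated as long, matching the `>=` comparison in the list above.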
### Adding scheduled tasks
The current recommendation is to manage scheduled tasks with **APScheduler** or **Crontab**.

212
Docs/COSYVOICE3_DEPLOY.md Normal file

@@ -0,0 +1,212 @@
# CosyVoice 3.0 Deployment Guide
## Overview
| Item | Value |
|------|------|
| Model | Fun-CosyVoice3-0.5B-2512 (0.5B parameters) |
| Port | 8010 |
| GPU | 0 (CUDA_VISIBLE_DEVICES=0) |
| Inference precision | FP16 (automatic mixed precision) |
| PM2 name | vigent2-cosyvoice (id=15) |
| Conda environment | cosyvoice (Python 3.10) |
| Startup script | `run_cosyvoice.sh` |
| Service script | `models/CosyVoice/cosyvoice_server.py` |
| Model load time | ~22-34 s |
| VRAM usage | ~3-5 GB |
## Supported Languages
Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian, plus 18+ Chinese dialects
## Directory Structure
```
models/CosyVoice/
├── cosyvoice_server.py # FastAPI service (port 8010)
├── cosyvoice/ # CosyVoice source code
│ └── cli/cosyvoice.py # AutoModel entry point
├── third_party/Matcha-TTS/ # submodule dependency
├── pretrained_models/
│ ├── Fun-CosyVoice3-0.5B/ # model files (~8.2GB)
│ │ ├── llm.pt # LLM model (1.9GB)
│ │ ├── llm.rl.pt # RL model (1.9GB, spare)
│ │ ├── flow.pt # Flow model (1.3GB)
│ │ ├── hift.pt # HiFT vocoder (80MB)
│ │ ├── campplus.onnx # speaker embedding (27MB)
│ │ ├── speech_tokenizer_v3.onnx # speech tokenizer (925MB)
│ │ ├── cosyvoice3.yaml # model configuration
│ │ └── CosyVoice-BlankEN/ # Qwen tokenizer
│ └── CosyVoice-ttsfrd/ # text normalization resources
│ ├── resource/ # unpacked ttsfrd resources
│ └── resource.zip
run_cosyvoice.sh # PM2 startup script
```
## API Endpoints
### GET /health
Health check. Returns:
```json
{
"service": "CosyVoice 3.0 Voice Clone",
"model": "Fun-CosyVoice3-0.5B",
"ready": true,
"gpu_id": 0
}
```
### POST /generate
Generates speech in the cloned voice.
**Parameters (multipart/form-data):**
| Parameter | Type | Required | Description |
|------|------|------|------|
| ref_audio | File | Yes | Reference audio (WAV) |
| text | string | Yes | Text to synthesize |
| ref_text | string | Yes | Transcript of the reference audio |
| language | string | No | Language (default "Chinese"; CosyVoice detects it automatically) |
| speed | float | No | Speech rate (default 1.0, range 0.5-2.0, recommended 0.8-1.2) |
**Returns:** WAV audio file
**Status codes:**
- 200: success
- 429: GPU busy, retry
- 500: generation failed or timed out
- 503: model not loaded or service poisoned
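## Safety Mechanisms
1. **GPU inference lock** (`asyncio.Lock`): prevents concurrent inference from corrupting GPU state
2. **429 rejection**: when the lock is held, the service returns 429 immediately and the client retries
3. **Timeout protection**: `60 + len(text) * 2` seconds, capped at 300 seconds
4. **Poisoned flag**: after a timeout the service is marked as poisoned and the health check returns `ready: false`
5. **Forced exit**: 1.5 seconds after a timeout the process calls `os._exit(1)` and PM2 restarts it automatically
6. **Startup self-test**: at startup a short text is run through real inference to verify that the GPU inference path works; on failure `_model_loaded = False` and the health check returns `ready: false`, which avoids false positives
7. **Automatic reference-audio trimming**: reference audio longer than 10 seconds is trimmed to its first 10 seconds (CosyVoice recommends 3-10 seconds) to avoid sampling anomalies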
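Mechanisms 1 to 3 can be sketched together as follows. This is a simplified illustration, not the code in `cosyvoice_server.py`: the names `inference_timeout` and `guarded_generate` are hypothetical, `infer` stands in for the real model call, and the poisoned flag and forced exit (mechanisms 4 and 5) are deliberately left out.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple


def inference_timeout(text: str) -> int:
    """Timeout in seconds: 60 + 2 per character, capped at 300."""
    return min(60 + len(text) * 2, 300)


async def guarded_generate(
    lock: asyncio.Lock,
    text: str,
    infer: Callable[[str], Awaitable[bytes]],
) -> Tuple[int, Optional[bytes]]:
    """Return (status_code, audio) using the lock, 429, and timeout rules."""
    if lock.locked():
        # Another request is using the GPU: reject at once instead of queueing.
        return 429, None
    async with lock:
        try:
            audio = await asyncio.wait_for(
                infer(text), timeout=inference_timeout(text)
            )
        except asyncio.TimeoutError:
            # The real service would also mark itself poisoned at this point.
            return 500, None
        return 200, audio
```

Rejecting with 429 rather than waiting on the lock keeps request latency predictable and leaves the retry policy to the client.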
## Operations Commands
```bash
# Start
pm2 start run_cosyvoice.sh --name vigent2-cosyvoice
# Restart
pm2 restart vigent2-cosyvoice
# View logs
pm2 logs vigent2-cosyvoice --lines 50
# Health check
curl http://localhost:8010/health
# Stop
pm2 stop vigent2-cosyvoice
```
## Deploying from Scratch
### 1. Clone the repository
```bash
cd /home/rongye/ProgramFiles/ViGent2/models
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice
git submodule update --init --recursive
```
### 2. Create the Conda environment
```bash
conda create -n cosyvoice -y python=3.10
conda activate cosyvoice
```
### 3. Install dependencies
Note: do not run `pip install -r requirements.txt` directly; there are version conflicts that need handling.
```bash
# Install PyTorch 2.3.1 (CUDA 12.1); it must be installed first and the version is strict
pip install torch==2.3.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
# Core inference dependencies
pip install conformer==0.3.2 HyperPyYAML==1.2.2 inflect==7.3.1 \
librosa==0.10.2 lightning==2.2.4 modelscope==1.20.0 omegaconf==2.3.0 \
pydantic==2.7.0 soundfile==0.12.1 fastapi==0.115.6 uvicorn==0.30.0 \
transformers==4.51.3 protobuf==4.25 hydra-core==1.3.2 \
rich==13.7.1 diffusers==0.29.0 x-transformers==2.11.24 wetext==0.0.4
# onnxruntime-gpu
pip install onnxruntime-gpu==1.18.0 \
--extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
# Other required dependencies
pip install gdown matplotlib pyarrow wget onnx python-multipart httpx
# openai-whisper needs setuptools < 71 (which provides pkg_resources)
pip install "setuptools<71"
pip install --no-build-isolation openai-whisper==20231117
# pyworld needs g++ and Cython
pip install Cython
PATH="/usr/bin:$PATH" pip install pyworld==0.3.4
# Key version pins
pip install "numpy<2" # onnxruntime-gpu is incompatible with numpy 2.x
pip install "ruamel.yaml<0.18" # hyperpyyaml is incompatible with ruamel.yaml 0.19+
```
> **Important**: CosyVoice requires torch==2.3.1. torch 2.10+ causes CUBLAS_STATUS_INVALID_VALUE errors.
> torch 2.3.1+cu121 bundles nvidia-cudnn-cu12, so the onnxruntime CUDAExecutionProvider works normally.
### 4. Download the models
```bash
# Use huggingface_hub (use hf-mirror.com from mainland China)
HF_ENDPOINT=https://hf-mirror.com python -c "
from huggingface_hub import snapshot_download
snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
snapshot_download('FunAudioLLM/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
"
```
### 5. Install ttsfrd (optional; improves text normalization quality)
```bash
cd pretrained_models/CosyVoice-ttsfrd/
unzip resource.zip -d .
pip install ttsfrd_dependency-0.1-py3-none-any.whl
pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl
```
### 6. Register with PM2
```bash
pm2 start run_cosyvoice.sh --name vigent2-cosyvoice
pm2 save
```
## Known Issues
1. **ttsfrd "prepare tts engine failed"**: an internal log line from the ttsfrd C library; the Python layer initializes successfully and usage is unaffected
2. **Sliding Window Attention warning**: a notice from the transformers library; inference results are unaffected
3. **onnxruntime Memcpy performance notice**: `Memcpy nodes are not supported by the CUDA EP`; this is only a performance-advice log and functionality is unaffected
> The libcudnn.so.8 issue is resolved in the torch 2.3.1+cu121 environment (which bundles nvidia-cudnn-cu12); the onnxruntime CUDAExecutionProvider loads normally.
## Comparison with Qwen3-TTS
| Feature | Qwen3-TTS (retired) | CosyVoice 3.0 (current) |
|------|-----------|----------------|
| Port | 8009 | 8010 |
| Model size | 0.6B | 0.5B |
| Languages | Chinese/English/Japanese/Korean | 9 languages + 18 dialects |
| Cloning method | ref_audio + ref_text | ref_audio + ref_text |
| Prompt format | ref_text passed directly | `You are a helpful assistant.<\|endofprompt\|>` + ref_text |
| Built-in segmentation | None; the client must segment | Built-in text_normalize segments automatically |
| Status | Retired (PM2 stopped) | In production |


@@ -7,8 +7,8 @@
| Server | Dell PowerEdge R730 |
| CPU | 2× Intel Xeon E5-2680 v4 (56 threads) |
| Memory | 192GB DDR4 |
| GPU 0 | NVIDIA RTX 3090 24GB |
| GPU 1 | NVIDIA RTX 3090 24GB (used for LatentSync) |
| GPU 0 | NVIDIA RTX 3090 24GB (MuseTalk + CosyVoice) |
| GPU 1 | NVIDIA RTX 3090 24GB (LatentSync) |
| Deployment path | `/home/rongye/ProgramFiles/ViGent2` |
---
@@ -72,7 +72,9 @@ cd /home/rongye/ProgramFiles/ViGent2
---
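## Step 3: Deploy the AI Model (LatentSync 1.6)
## Step 3: Deploy the AI Models
### 3a. LatentSync 1.6 (short-video lip sync, GPU1)
> ⚠️ **Important**: LatentSync needs its own Conda environment and **~18GB VRAM**. Do **not** install it directly into the backend environment.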
@@ -93,6 +95,26 @@ conda activate latentsync
python -m scripts.server # check that it starts; press Ctrl+C to exit
```
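### 3b. MuseTalk 1.5 (long-video lip sync, GPU0)
> MuseTalk is a single-step latent-space inpainting model (not a diffusion model). Its inference speed is close to real time, which suits long videos of >=120s. It shares GPU0 with CosyVoice; fp16 inference needs about 4-8GB of VRAM.
See the detailed standalone deployment guide:
**[MuseTalk Deployment Guide](MUSETALK_DEPLOY.md)**
Summary of steps:
1. Create a dedicated `musetalk` Conda environment (Python 3.10 + PyTorch 2.0.1 + CUDA 11.8)
2. Install mmcv/mmdet/mmpose and other dependencies
3. Download the model weights (`download_weights.sh`)
4. Create the required symlinks (`musetalk/config.json`, `musetalk/musetalkV15`)
**Verify the MuseTalk deployment**:
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
# In another terminal: curl http://localhost:8011/health
```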
---
## Step 4: Install Backend Dependencies
@@ -115,6 +137,16 @@ playwright install chromium
```
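> Tip: for WeChat Channels publishing, use the system Chrome + xvfb-run (to avoid headless decoding failures).
> Douyin publishing likewise works best in headful mode (`DOUYIN_HEADLESS_MODE=headful`).
### Notes on QR-code login
- **Cookies are isolated per user**: each user's cookies are stored under `backend/user_data/{uuid}/cookies/`, so concurrent logins by multiple users do not interfere with each other.
- **Key lessons from Douyin QR login**:
  - After the scan, **never reload the QR page**; doing so destroys the session token
  - Use a **new tab** to detect that login has completed (check that the URL contains `creator-micro` and that session cookies exist)
  - Douyin may show a **face verification** prompt; the backend extracts the verification QR code automatically and returns it to the frontend for display
- **WeChat Channels publishing**: the title, description, and tags are all written into the "video description" field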
---
@@ -155,6 +187,8 @@ playwright install chromium
CREATE POLICY "Allow public read" ON storage.objects FOR SELECT TO anon USING (bucket_id = 'materials' OR bucket_id = 'outputs');
EOF
```
> **Note**: on startup the backend creates the additional storage buckets (`ref-audios`, `generated-audios`) automatically; there is no need to create them by hand.
---
@@ -177,7 +211,7 @@ cp .env.example .env
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Supabase API public URL (frontend access) |
| `LATENTSYNC_GPU_ID` | 1 | GPU selection (0 or 1) |
| `LATENTSYNC_USE_SERVER` | false | Set to true to enable the resident service for faster inference |
| `LATENTSYNC_INFERENCE_STEPS` | 20 | Inference steps (20-50) |
| `LATENTSYNC_INFERENCE_STEPS` | 16 | Inference steps (16-50) |
| `LATENTSYNC_GUIDANCE_SCALE` | 1.5 | Guidance scale (1.0-3.0) |
| `DEBUG` | true | Set to false in production |
| `REDIS_URL` | `redis://localhost:6379/0` | Task state storage (falls back to memory when unavailable) |
@@ -189,9 +223,32 @@ cp .env.example .env
| `WEIXIN_TIMEZONE_ID` | Asia/Shanghai | WeChat Channels time zone |
| `WEIXIN_FORCE_SWIFTSHADER` | true | Force software WebGL to avoid context lost |
| `WEIXIN_TRANSCODE_MODE` | reencode | Transcode before upload (reencode/faststart/off) |
| `DOUYIN_HEADLESS_MODE` | headless-new | Douyin Playwright mode (headful/headless-new) |
| `DOUYIN_CHROME_PATH` | `/usr/bin/google-chrome` | Douyin Chrome path |
| `DOUYIN_BROWSER_CHANNEL` | | Douyin Chromium channel (optional) |
| `DOUYIN_USER_AGENT` | Chrome/144 UA | Douyin browser-fingerprint UA |
| `DOUYIN_LOCALE` | zh-CN | Douyin locale |
| `DOUYIN_TIMEZONE_ID` | Asia/Shanghai | Douyin time zone |
| `DOUYIN_FORCE_SWIFTSHADER` | true | Force software WebGL |
| `DOUYIN_DEBUG_ARTIFACTS` | false | Keep debug screenshots |
| `DOUYIN_RECORD_VIDEO` | false | Record a video of the browser actions |
| `DOUYIN_KEEP_SUCCESS_VIDEO` | false | Keep the recording after success |
| `CORS_ORIGINS` | `*` | CORS allowed origins (an allowlist is recommended in production) |
| `SUPABASE_STORAGE_LOCAL_PATH` | default path | Supabase local storage path |
| `DOUYIN_COOKIE` | empty | Cookie for Douyin video download (script extraction feature) |
| `MUSETALK_GPU_ID` | 0 | MuseTalk GPU index |
| `MUSETALK_API_URL` | `http://localhost:8011` | MuseTalk resident service URL |
| `MUSETALK_BATCH_SIZE` | 32 | MuseTalk inference batch size |
| `MUSETALK_VERSION` | v15 | MuseTalk model version |
| `MUSETALK_USE_FLOAT16` | true | MuseTalk half-precision acceleration |
| `LIPSYNC_DURATION_THRESHOLD` | 120 | Seconds; durations >= this value use MuseTalk, shorter ones use LatentSync |
| `ALIPAY_APP_ID` | empty | Alipay application APPID |
| `ALIPAY_PRIVATE_KEY_PATH` | empty | Path to the application private key PEM file |
| `ALIPAY_PUBLIC_KEY_PATH` | empty | Path to the Alipay public key PEM file |
| `ALIPAY_NOTIFY_URL` | empty | Alipay asynchronous callback URL (public HTTPS) |
| `ALIPAY_RETURN_URL` | empty | URL the browser is redirected to after payment |
| `PAYMENT_AMOUNT` | `999.00` | Membership price (CNY) |
| `PAYMENT_EXPIRE_DAYS` | `365` | Membership validity in days |
> For the full Alipay setup (key generation, PEM format, product enablement, etc.), see the **[Alipay Deployment Guide](ALIPAY_DEPLOY.md)**.
---
@@ -241,6 +298,13 @@ cd /home/rongye/ProgramFiles/ViGent2/models/LatentSync
conda activate latentsync
python -m scripts.server
```
### Start MuseTalk (terminal 4, long-video lip sync)
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
```
### Verification
@@ -265,6 +329,7 @@ cat > run_backend.sh << 'EOF'
set -e
BASE_DIR="$(cd "$(dirname "$0")" && pwd)"
export WEIXIN_HEADLESS_MODE=headful
export DOUYIN_HEADLESS_MODE=headful
export WEIXIN_DEBUG_ARTIFACTS=false
export WEIXIN_RECORD_VIDEO=false
export DOUYIN_DEBUG_ARTIFACTS=false
@@ -314,34 +379,48 @@ chmod +x run_latentsync.sh
pm2 start ./run_latentsync.sh --name vigent2-latentsync
```
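### 4. Start the Qwen3-TTS voice cloning service (optional)
### 4. Start the CosyVoice 3.0 voice cloning service (optional)
> This service must be running to use the voice cloning feature.
> This service must be running to use the voice cloning feature. For detailed deployment steps, see the [CosyVoice 3.0 Deployment Guide](COSYVOICE3_DEPLOY.md).
1. Install the HTTP service dependencies:
```bash
conda activate qwen-tts
pip install fastapi uvicorn python-multipart
```
1. The startup script is in the project root: `run_cosyvoice.sh`
2. The startup script is in the project root: `run_qwen_tts.sh`
3. Start it with pm2:
2. Start it with pm2:
```bash
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_qwen_tts.sh --name vigent2-qwen-tts
pm2 start ./run_cosyvoice.sh --name vigent2-cosyvoice
pm2 save
```
4. Verify the service:
3. Verify the service:
```bash
# Check health status
curl http://localhost:8009/health
curl http://localhost:8010/health
```
### 5. Start the service watchdog (Watchdog)
### 5. Start the MuseTalk long-video lip-sync service
> 🛡️ **Recommended**: monitors the health of the Qwen-TTS and LatentSync services and restarts them automatically when they hang.
> Long videos (>=120s) are routed to MuseTalk automatically. When MuseTalk is unavailable, the backend falls back to LatentSync.
> For detailed deployment steps, see the [MuseTalk Deployment Guide](MUSETALK_DEPLOY.md).
1. The startup script is in the project root: `run_musetalk.sh`
2. Start it with pm2:
```bash
cd /home/rongye/ProgramFiles/ViGent2
pm2 start ./run_musetalk.sh --name vigent2-musetalk
pm2 save
```
3. Verify the service:
```bash
curl http://localhost:8011/health
# {"status":"ok","model_loaded":true}
```
### 6. Start the service watchdog (Watchdog)
> 🛡️ **Recommended**: monitors the health of the CosyVoice and LatentSync services and restarts them automatically when they hang.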
```bash
cd /home/rongye/ProgramFiles/ViGent2
@@ -356,13 +435,16 @@ pm2 save
pm2 startup
```
> **Tip**: the complete PM2 process list should contain 5-6 services: vigent2-backend, vigent2-frontend, vigent2-latentsync, vigent2-cosyvoice, vigent2-musetalk, vigent2-watchdog.
### Common pm2 commands
```bash
pm2 status # show the status of all services
pm2 logs # show all logs
pm2 logs vigent2-backend # show backend logs
pm2 logs vigent2-qwen-tts # show Qwen3-TTS logs
pm2 logs vigent2-cosyvoice # show CosyVoice logs
pm2 logs vigent2-musetalk # show MuseTalk logs
pm2 restart all # restart all services
pm2 stop vigent2-latentsync # stop the LatentSync service
pm2 delete all # delete all services
@@ -501,7 +583,8 @@ python3 -c "import torch; print(torch.cuda.is_available())"
sudo lsof -i :8006
sudo lsof -i :3002
sudo lsof -i :8007
sudo lsof -i :8009 # Qwen3-TTS
sudo lsof -i :8010 # CosyVoice
sudo lsof -i :8011 # MuseTalk
```
### View logs
@@ -511,7 +594,8 @@ sudo lsof -i :8009 # Qwen3-TTS
pm2 logs vigent2-backend
pm2 logs vigent2-frontend
pm2 logs vigent2-latentsync
pm2 logs vigent2-qwen-tts
pm2 logs vigent2-cosyvoice
pm2 logs vigent2-musetalk
```
### SSH connection lag / slow system response
@@ -542,6 +626,7 @@ pm2 logs vigent2-qwen-tts
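| `playwright` | Automated social media publishing |
| `biliup` | Bilibili video upload |
| `loguru` | Logging |
| `python-alipay-sdk` | Alipay payment integration |
### Key frontend dependencies
@@ -550,6 +635,7 @@ pm2 logs vigent2-qwen-tts
| `next` | React framework |
| `swr` | Data fetching and caching |
| `tailwindcss` | CSS styling |
| `wavesurfer.js` | Audio waveform (timeline editor) |
### Key LatentSync dependencies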


@@ -342,6 +342,6 @@ models/Qwen3-TTS/
## 🔗 Related Documents
- [task_complete.md](../task_complete.md) - Task overview
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day11.md](./Day11.md) - Upload architecture refactor
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide


@@ -273,7 +273,7 @@ pm2 logs vigent2-qwen-tts --lines 50
## 🔗 Related Documents
- [task_complete.md](../task_complete.md) - Task overview
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day12.md](./Day12.md) - iOS compatibility and Qwen3-TTS deployment
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS deployment guide
- [SUBTITLE_DEPLOY.md](../SUBTITLE_DEPLOY.md) - Subtitle feature deployment guide


@@ -397,6 +397,6 @@ if ((status === 401 || status === 403) && !isRedirecting && !isPublicPath) {
## 🔗 Related Documents
- [task_complete.md](../task_complete.md) - Task overview
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day13.md](./Day13.md) - Voice cloning integration + subtitle feature
- [QWEN3_TTS_DEPLOY.md](../QWEN3_TTS_DEPLOY.md) - Qwen3-TTS 1.7B deployment guide


@@ -342,7 +342,7 @@ pm2 restart vigent2-backend vigent2-frontend
## 🔗 Related Documents
- [task_complete.md](../task_complete.md) - Task overview
- [TASK_COMPLETE.md](../TASK_COMPLETE.md) - Task overview
- [Day14.md](./Day14.md) - Model upgrade + AI titles and tags
- [AUTH_DEPLOY.md](../AUTH_DEPLOY.md) - Authentication system deployment guide


@@ -136,4 +136,4 @@ if service["failures"] >= service['threshold']:
- [x] `Docs/QWEN3_TTS_DEPLOY.md`: added the Flash Attention installation guide
- [x] `Docs/DEPLOY_MANUAL.md`: added Watchdog deployment notes
- [x] `Docs/task_complete.md`: updated progress to 100% (Day 16)
- [x] `Docs/TASK_COMPLETE.md`: updated progress to 100% (Day 16)


@@ -65,6 +65,15 @@ pm2 restart vigent2-latentsync
# Remotion has already been compiled automatically
```
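### 🎨 Interaction and UX Improvements (17:00)
- [x] **UX-1**: PublishPage image loading optimization (`<img>` → `next/image`)
- [x] **UX-2**: Unified button loading states (script extraction dialog + publish page)
- [x] **UX-3**: Skeleton-screen loading (publish page loading state)
- [x] **UX-4**: Global keyboard shortcuts (ESC closes dialogs, Enter confirms)
- [x] **UX-5**: Removed the global GlobalTaskIndicator (less visual noise)
- [x] **UX-6**: When video generation completes, the list refreshes automatically and the newest item is selected
### 🐛 Bug Fixes and Regression Control (17:30)
#### Critical bug fixes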
@@ -85,6 +94,10 @@ pm2 restart vigent2-latentsync
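- *Cause*: after the refactor removed the old logic, new users and users without a cache had no default selection when entering the page.
- *Fix*: when `isRestored` is true and nothing is selected, a fallback automatically selects the first item in the list.
- [x] **REF-1**: Site-wide consolidation of persistence logic
- *Improvement*: removed redundant `localStorage` reads from `useBgm`, `useGeneratedVideos`, `useTitleSubtitleStyles`
- *Improvement*: fixed the closure trap in `useMaterials` (using functional updates) so the restored state is not overwritten.
- [x] **REG-3**: Material selection persistence broken (closure trap)
- *Cause*: the load callback in `useMaterials` captured the stale `selectedMaterial` state and overwrote the restored value
- *Fix*: switched to functional state updates (`setState(prev => ...)`) so the check is based on the latest state.
- [x] **REF-1**: Site-wide consolidation and audit of persistence logic
- *Improvement*: removed redundant `localStorage` reads from `useBgm`, `useGeneratedVideos`, `useTitleSubtitleStyles`; these are now managed centrally by `useHomePersistence`.
- *Audit*: reviewed `useRefAudios`, `useTitleSubtitleStyles`, and other modules in depth and confirmed the logic is robust, with no similar regression risk.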

View File

@@ -90,3 +90,360 @@ await api.post('/api/publish', { video_path: video.path, ... });
pm2 restart vigent2-backend # Remotion 容错
npm run build && pm2 restart vigent2-frontend # 前端持久化修复
```
---
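## 🎨 Floating style preview window (Day 21)
### Overview
The preview area in the title and subtitle panel used to be an inline collapsible block, so once it was expanded the preview scrolled out of view while adjusting the sliders below it. It is now a `position: fixed` floating window pinned to the top-left of the viewport. The preview stays visible while the page scrolls, so changes can be seen as they are made.
### Completed changes
#### 1. New floating preview component `FloatingStylePreview.tsx`
- Rendered at body level with `createPortal(jsx, document.body)`, outside the panel's DOM tree
- `position: fixed`, pinned to the top-left corner; it does not move on scroll
- `z-index: 150` (below VideoPreviewModal's 200)
- Title bar at the top with an X close button; ESC also closes it
- Fixed width of 280px on desktop; adaptive on mobile, up to 360px
- Computes its own scale: `previewScale = windowWidth / previewBaseWidth`
- `maxHeight: calc(100dvh - 32px)` keeps it inside the viewport
#### 2. Changes to `TitleSubtitlePanel.tsx`
- Removed the inline preview area (the whole JSX block with `ref={previewContainerRef}`)
- Conditionally renders `<FloatingStylePreview />`; the button text remains "预览样式" / "收起预览" (show preview / hide preview)
- Removed the `previewScale`, `previewAspectRatio`, and `previewContainerRef` props
- Kept `previewBaseWidth/Height` (the floating window needs the original size to compute its scale)
#### 3. Cleanup in `useHomeController.ts`
- Removed the `previewContainerWidth` state
- Removed the `titlePreviewContainerRef` ref
- Removed the ResizeObserver useEffect (the floating window manages its own size, so it is no longer needed)
#### 4. Simplified props in `HomePage.tsx`
- Removed the `previewContainerWidth` and `titlePreviewContainerRef` destructuring
- Stopped passing the `previewScale`, `previewAspectRatio`, and `previewContainerRef` props
#### 5. Mobile adaptation
- `ScriptEditor.tsx`: the title row now uses `flex-wrap`, so the "AI生成标题标签" (AI-generated title and tags) button no longer overflows
- Default preview ratio changed from 1280×720 (16:9) to 1080×1920 (9:16), matching Douyin's portrait videos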
### 涉及文件汇总
| 文件 | 变更 |
|------|------|
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | **新建** 浮动预览组件 |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 移除内联预览,渲染浮动组件 |
| `frontend/src/features/home/model/useHomeController.ts` | 移除 preview 容器相关状态和 ResizeObserver |
| `frontend/src/features/home/ui/HomePage.tsx` | 简化 props 传递,默认比例改 9:16 |
| `frontend/src/features/home/ui/ScriptEditor.tsx` | 移动端按钮换行适配 |
### 重启要求
```bash
npm run build && pm2 restart vigent2-frontend
```
---
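## 🔧 Multi-platform publishing refactor: user isolation and Douyin face verification (Day 21)
### Overview
This refactor addresses two core problems in the publishing system: (1) cookies and sessions were not isolated between users in multi-user scenarios, and (2) the Douyin login flow added a face-verification step that the system could not handle. It also fixes mixed-up platform configuration and problems in the WeChat Channels publishing flow.
---
### I. Independent per-platform configuration
#### Problem
All platforms (Douyin, WeChat, Bilibili, Xiaohongshu) shared the `WEIXIN_*` settings, so the User-Agent, headless mode, and other options did not match the platform in use.
#### Fix: `config.py`
- Added independent `DOUYIN_*` settings: `DOUYIN_HEADLESS_MODE`, `DOUYIN_USER_AGENT` (Chrome/144), `DOUYIN_LOCALE`, `DOUYIN_TIMEZONE_ID`, `DOUYIN_CHROME_PATH`, `DOUYIN_FORCE_SWIFTSHADER`, debug switches, and others
- WeChat keeps its existing `WEIXIN_*` settings
- Bilibili and Xiaohongshu use generic defaults
#### Fix: `qr_login_service.py` platform configuration mapping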
```python
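# Before: every platform used the WEIXIN settings
# After: each platform has its own configuration (schematic, values elided)
PLATFORM_CONFIGS = {
    "douyin": { headless, user_agent, locale, timezone... },
    "weixin": { headless, user_agent, locale, timezone... },
    "bilibili": { generic defaults },
    "xiaohongshu": { generic defaults },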
}
```
---
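### II. Per-user cookie management
#### Problem
Multiple users shared the same set of cookie files, so user A's login state could be overwritten by user B.
#### Fix: `publish_service.py`
- `_get_cookies_dir(user_id)` → `backend/user_data/{uuid}/cookies/`
- `_get_cookie_path(user_id, platform)` → returns a separate cookie file path per user and platform
- `_get_session_key(user_id, platform)` → session key in the format `"{user_id}_{platform}"`
- `user_id` is passed through the whole login and publish flow, and leftover sessions are cleaned up to avoid interference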
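
The three helpers listed above could look roughly like the sketch below. It is not the project's actual code: the root directory constant, the `{platform}_cookies.json` file name, and the `user_id` sanity check are assumptions added for illustration; only the directory layout and the session key format come from the log.

```python
from pathlib import Path

# Assumed root; the log only gives the layout backend/user_data/{uuid}/cookies/.
USER_DATA_ROOT = Path("backend/user_data")


def get_cookies_dir(user_id: str, root: Path = USER_DATA_ROOT) -> Path:
    """Return the cookie directory that belongs to a single user."""
    if not user_id or "/" in user_id or "\\" in user_id or user_id in {".", ".."}:
        # Assumed safety check: reject values that could escape the user directory.
        raise ValueError(f"invalid user_id: {user_id!r}")
    return root / user_id / "cookies"


def get_cookie_path(user_id: str, platform: str, root: Path = USER_DATA_ROOT) -> Path:
    """Return the cookie file for one user on one platform (file name is assumed)."""
    return get_cookies_dir(user_id, root) / f"{platform}_cookies.json"


def get_session_key(user_id: str, platform: str) -> str:
    """Build the "{user_id}_{platform}" session key described above."""
    return f"{user_id}_{platform}"
```

Because the user ID is part of both the path and the session key, two users logging in to the same platform never touch the same file or session entry.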
---
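### III. Douyin face-verification QR code
#### Problem
After a QR-code login, Douyin may pop up a face-verification dialog containing a new QR code that the user must scan again. The frontend had no way to detect or display it.
#### Fix: backend `qr_login_service.py`
- Extended QR selectors: QR elements are now searched across iframes
- Douyin API interception: listens for `check_qrconnect` responses and checks for `redirect_url`
- Detects the prompts "完成验证" (complete verification) and "请前往APP完成验证" (please complete verification in the app)
- Finds the square QR code inside the verification dialog (excluding avatars) and returns a screenshot to the frontend
- After the API confirms, navigates directly to `redirect_url` (without reloading the QR page, which would destroy the session)
#### Fix: backend `publish_service.py`
- `get_login_session_status()` now returns a `face_verify_qr` field
- The session is cleaned up automatically once login succeeds and the cookies are saved
#### Fix: frontend
- `usePublishController.ts`: added the `faceVerifyQr` state; polling reads the `face_verify_qr` field
- `PublishPage.tsx`: the QR dialog shows the face-verification QR code first, with a hint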
```tsx
{faceVerifyQr ? (
<>
<Image src={`data:image/png;base64,${faceVerifyQr}`} />
<p>APP扫描上方二维码完成刷脸验证</p>
</>
) : /* 普通登录二维码 */ }
```
---
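### IV. WeChat Channels publishing flow
#### Fix: `weixin_uploader.py`
- Added a `user_id` parameter; publish screenshots go to per-user directories
- Added a listener for the `post_create` API response to determine publish success precisely
- Success criteria: the URL leaves the create page, or the API confirms the submission
- Title and tags are now written together into the "视频描述" (video description) field instead of being filled in as separate title/tags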
---
### 涉及文件汇总
| 文件 | 变更 |
|------|------|
| `backend/app/core/config.py` | 新增 DOUYIN_* 独立配置项 |
| `backend/app/services/qr_login_service.py` | 平台配置拆分、刷脸验证二维码、跨 iframe 选择器 |
| `backend/app/services/publish_service.py` | 用户隔离 Cookie 管理、刷脸验证状态返回 |
| `backend/app/services/uploader/weixin_uploader.py` | user_id 支持、post_create API 监听、描述字段合并 |
| `frontend/src/features/publish/model/usePublishController.ts` | faceVerifyQr 状态 |
| `frontend/src/features/publish/ui/PublishPage.tsx` | 刷脸验证二维码展示 |
### 重启要求
```bash
pm2 restart vigent2-backend # 发布服务 + QR登录
npm run build && pm2 restart vigent2-frontend # 刷脸验证UI
```
---
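## 🏗️ Architecture cleanup: frontend structure tweaks + backend module layering (Day 21)
### Overview
Following the architecture audit, the frontend directory layout was normalized and the core backend modules were given their missing layers.
### I. Frontend structure tweaks
#### 1. ScriptExtractionModal move
- `components/ScriptExtractionModal.tsx` → `features/home/ui/ScriptExtractionModal.tsx`
- The `components/script-extraction/` directory moved with it to `features/home/ui/script-extraction/`
- Updated the import path in `HomePage.tsx`
#### 2. Merged the contexts/ directory
- `src/contexts/AuthContext.tsx` → `src/shared/contexts/AuthContext.tsx`
- `src/contexts/TaskContext.tsx` → `src/shared/contexts/TaskContext.tsx`
- Updated 6 imports (layout.tsx, useHomeController.ts, usePublishController.ts, AccountSettingsDropdown.tsx, GlobalTaskIndicator.tsx)
- Deleted the empty `src/contexts/` directory
#### 3. Removed empty directories left over from refactoring
- Deleted `src/lib/`, `src/components/home/`, and `src/hooks/`
### II. Backend module layering
Three router-only modules of 400+ lines each were split into `router.py + schemas.py + service.py`:
| Module | Before | Router after |
|------|--------|--------------|
| `materials/` | 416 lines | 63 lines |
| `tools/` | 417 lines | 33 lines |
| `ref_audios/` | 421 lines | 71 lines |
All business logic was extracted into `service.py`, data models are defined in `schemas.py`, and the router only validates parameters, calls the service, and returns the response.
### III. Development guideline update
Section 8 of `BACKEND_DEV.md` adds an incremental principle:
- New modules **must** contain `router.py + schemas.py + service.py`
- When changing an old module, split the parts you touch along the way
- High standards for new code, gradual improvement for old code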
### 涉及文件汇总
| 文件 | 变更 |
|------|------|
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | 从 components/ 迁入 |
| `frontend/src/features/home/ui/script-extraction/` | 从 components/ 迁入 |
| `frontend/src/shared/contexts/AuthContext.tsx` | 从 contexts/ 迁入 |
| `frontend/src/shared/contexts/TaskContext.tsx` | 从 contexts/ 迁入 |
| `backend/app/modules/materials/schemas.py` | **新建** |
| `backend/app/modules/materials/service.py` | **新建** |
| `backend/app/modules/materials/router.py` | 精简为薄路由 |
| `backend/app/modules/tools/schemas.py` | **新建** |
| `backend/app/modules/tools/service.py` | **新建** |
| `backend/app/modules/tools/router.py` | 精简为薄路由 |
| `backend/app/modules/ref_audios/schemas.py` | **新建** |
| `backend/app/modules/ref_audios/service.py` | **新建** |
| `backend/app/modules/ref_audios/router.py` | 精简为薄路由 |
| `Docs/BACKEND_DEV.md` | 目录结构标注分层、新增渐进原则 |
| `Docs/BACKEND_README.md` | 目录结构标注分层 |
| `Docs/FRONTEND_DEV.md` | 更新目录结构contexts 迁移、ScriptExtractionModal 迁移) |
### 重启要求
```bash
pm2 restart vigent2-backend
npm run build && pm2 restart vigent2-frontend
```
---
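## 🎬 Multi-material video generation (multi-camera effect)
### Overview
Users can upload several selfie videos shot from different angles. During generation the material switches automatically between sentences, so the result resembles a multi-camera shoot. With a single material the original flow runs with no extra overhead.
### Core architecture
#### Pipeline changes
```
[Single material (unchanged)]
text → TTS → audio → LatentSync(1 material + full audio) → Whisper subtitles → Remotion → final video
[Multi-material (new)]
text → TTS → audio → Whisper subtitles (moved earlier) → split duration evenly by material count (aligned to character boundaries)
→ for each segment: split audio + LatentSync(material[i] + audio segment[i])
→ FFmpeg concat of all segments → Remotion (full subtitle timestamps) → final video
```
#### Material switching logic (equal-split scheme)
1. Whisper transcribes the full audio and produces character-level timestamps
2. The **total audio duration is split evenly** by material count (`total_duration / N`)
3. Each split point is aligned to the nearest Whisper character boundary, to avoid cutting in the middle of a character
4. The first segment's start is extended to 0.0 and the last segment's end to the end of the audio, so coverage is complete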
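
The four steps above can be sketched as follows. This is an illustration under assumptions, not the project's `_split_equal()`: the `(text, start, end)` tuple layout for Whisper characters and the function signature are made up for the example, and the cap on the segment count mirrors the guard described in the Day 22 log.

```python
from typing import List, Tuple

# Assumed shape of one character-level Whisper entry: (text, start_sec, end_sec).
Char = Tuple[str, float, float]


def split_equal(chars: List[Char], n_materials: int,
                audio_duration: float) -> List[Tuple[float, float]]:
    """Return (start, end) ranges that tile 0.0..audio_duration."""
    if n_materials < 1 or not chars or audio_duration <= 0:
        raise ValueError("need a material, at least one character and a positive duration")
    # Never create more segments than there are characters to align to.
    n = min(n_materials, len(chars))
    # Candidate cut points are the boundaries between consecutive characters.
    boundaries = [end for _, _, end in chars[:-1]]
    cuts: List[float] = []
    for i in range(1, n):
        ideal = audio_duration * i / n
        lower = cuts[-1] if cuts else 0.0
        # Only boundaries after the previous cut keep the segments ordered.
        candidates = [b for b in boundaries if lower < b < audio_duration]
        if not candidates:
            break
        cuts.append(min(candidates, key=lambda b: abs(b - ideal)))
    # First segment starts at 0.0 and the last one ends at the audio end.
    edges = [0.0] + cuts + [audio_duration]
    return list(zip(edges[:-1], edges[1:]))
```

Forcing the outer edges to `0.0` and `audio_duration` is what keeps the split audio aligned with the full audio used later in composition.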
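> **Design decision**: The first scheme split sentences on punctuation in the original script, but user scripts often contain no full stops (only commas), which produced a single segment. The equal-split scheme does not depend on script punctuation and splits any input correctly.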
---
### I. Backend changes
#### 1. `backend/app/modules/videos/schemas.py`
- Added the `material_paths: Optional[List[str]]` field
- Kept `material_path: str` for backward compatibility
#### 2. `backend/app/modules/videos/workflow.py` (core change)
**New function**
- `_split_equal(segments, material_paths)`: splits the audio duration evenly by material count and aligns each cut to the nearest Whisper character boundary
**Changes to `process_video_generation()`**
- `is_multi = len(material_paths) > 1` selects the multi-material or single-material branch
- Multi-material branch: Whisper first → equal split → audio split → per-segment LatentSync → FFmpeg concat
#### 3. `backend/app/services/video_service.py`
- Added `concat_videos()`: joins video segments with the FFmpeg concat demuxer (`-c copy`)
- Added `split_audio()`: cuts audio by time range with FFmpeg (`-ss` + `-t` + `-c copy`)
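
As a rough illustration of the two FFmpeg calls named above, the sketch below only builds the argument lists; it does not run FFmpeg. The function names and the list-file approach are assumptions, while the flags (`-f concat`, `-safe 0`, `-ss`, `-t`, `-c copy`) are standard FFmpeg options. The input checks mirror the hardening described in the Day 22 log.

```python
from typing import List, Sequence


def build_concat_list(video_paths: Sequence[str]) -> str:
    """Text of the list file read by FFmpeg's concat demuxer."""
    if not video_paths:
        raise ValueError("video_paths must not be empty")
    lines = []
    for path in video_paths:
        # A single quote inside a path is written as '\'' in the list file.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output_path: str) -> List[str]:
    """Join the listed segments without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output_path]


def split_audio_command(audio_path: str, start: float, end: float,
                        output_path: str) -> List[str]:
    """Cut the [start, end) range out of an audio file with stream copy."""
    duration = end - start
    if duration <= 0:
        raise ValueError(f"end ({end}) must be greater than start ({start})")
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{duration:.3f}",
            "-i", audio_path, "-c", "copy", output_path]
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` after writing the list file to disk.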
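#### 4. `backend/scripts/watchdog.py`
- Health-check failure threshold raised from 3 to 5 (a tolerance window of 2.5 minutes)
- Added a 120-second cooldown after a restart, so a service is not flagged as failed while its model loads
- All services get a 60-second initial cooldown at startup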
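
The threshold and cooldown rules above could be modeled as below. This is a minimal sketch, not the actual `watchdog.py`: the `ServiceState` class, its field names, and the injectable clock are assumptions; only the numbers (5 failures, 120 s restart cooldown, 60 s initial cooldown) come from the log.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ServiceState:
    threshold: int = 5               # consecutive failures before a restart
    restart_cooldown: float = 120.0  # seconds ignored after a restart
    initial_cooldown: float = 60.0   # seconds ignored after watchdog startup
    clock: Callable[[], float] = time.monotonic
    failures: int = 0
    cooldown_until: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.cooldown_until = self.clock() + self.initial_cooldown

    def record(self, healthy: bool) -> bool:
        """Record one health check; return True when a restart is due."""
        now = self.clock()
        if now < self.cooldown_until:
            return False  # service may still be loading its model
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0
            self.cooldown_until = now + self.restart_cooldown
            return True
        return False
```

Results that arrive during a cooldown are ignored entirely, so a slow model load cannot accumulate failures toward the next restart.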
---
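### II. Frontend changes
#### 1. New dependencies
```bash
npm install @dnd-kit/core @dnd-kit/sortable @dnd-kit/utilities
```
#### 2. `frontend/src/features/home/model/useMaterials.ts`
- `selectedMaterial: string` → `selectedMaterials: string[]` (multi-select)
- Added `toggleMaterial(id)`: toggles selection, keeping at least 1 selected
- Added `reorderMaterials(activeId, overId)`: drag-and-drop ordering
- More upload formats: added `.mkv/.webm/.flv/.wmv/.m4v/.ts/.mts`
#### 3. `frontend/src/features/home/ui/MaterialSelector.tsx` (rewritten)
- Each row in the material list gets a checkbox and an order badge (①②③)
- A drag-to-sort area (@dnd-kit `SortableContext`) appears when 2 or more materials are selected
- Each sortable item: drag handle + order number + material name + remove button
- The HTML input `accept` is now `video/*`
#### 4. `frontend/src/features/home/model/useHomeController.ts`
- Multi-material payload: `material_paths` array, plus `material_path` for backward compatibility
- `enable_subtitles` is hard-coded to `true` (the toggle was removed)
- Validation: at least 1 material must be selected
#### 5. `frontend/src/features/home/model/useHomePersistence.ts`
- Material selection is persisted as a JSON array, backward compatible with the old single-string format
- Removed `enableSubtitles` persistence
#### 6. `frontend/src/features/home/ui/TitleSubtitlePanel.tsx`
- Removed the "逐字高亮字幕" (word-by-word highlighted subtitles) toggle; the subtitle style area is always shown
#### 7. `frontend/src/features/home/ui/HomePage.tsx`
- Updated the props passed down (`selectedMaterials`, `toggleMaterial`, `reorderMaterials`)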
---
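### III. Bug fix log
#### BUG-1: Multi-material generation used only the first video (punctuation-based splitting failed)
- **Symptom**: Two materials were selected but the generated video used only the first one; the log showed `Multi-material: 1 segments, 2 materials`
- **Root cause v1**: Sentences were first split in the Whisper output with the regex `[。!?!?]`, but Whisper does not output punctuation.
- **Fix v1**: Split on punctuation in the original script instead. User scripts often contain only commas (,) and no sentence-ending punctuation (。!?), so the result still degraded to one segment.
- **Final fix**: Dropped punctuation-based splitting entirely. `_split_equal()` **splits the audio duration evenly by material count** and aligns cuts to the nearest Whisper character boundary. It depends on no punctuation and works for every script.
#### BUG-2: Lips out of sync (audio time offset)
- **Root cause**: `split_audio` cut the audio using Whisper's start/end times (for example 0.11~7.21), while `compose()` used the full original audio (0.0 to the end), which introduced a time offset.
- **Fix**: Force the first segment's start to 0.0 and the last segment's end to the actual audio duration, so the split audio covers everything.
#### BUG-3: `min_segment_sec` over-merging caused degradation (removed with the scheme change)
- **Root cause**: In the old scheme, with two sentences where the second was shorter than 3 seconds, the minimum-duration check merged them into one segment, so multi-material degraded to single-material.
- **Status**: The equal-split scheme does not have this problem, and the related code has been removed.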
---
### 涉及文件汇总
| 文件 | 变更类型 | 说明 |
|------|----------|------|
| `backend/app/modules/videos/schemas.py` | 修改 | 新增 material_paths 字段 |
| `backend/app/modules/videos/workflow.py` | 修改 | 多素材流水线核心逻辑 + 3个 Bug 修复 |
| `backend/app/services/video_service.py` | 修改 | 新增 concat_videos / split_audio |
| `backend/scripts/watchdog.py` | 修改 | 阈值优化 + 冷却期机制 |
| `frontend/package.json` | 修改 | 新增 @dnd-kit 依赖 |
| `frontend/src/features/home/model/useMaterials.ts` | 修改 | 多选 + 排序状态管理 |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | 重写 | 多选复选框 + 拖拽排序 UI |
| `frontend/src/features/home/model/useHomeController.ts` | 修改 | 多素材 payload + 移除字幕开关 |
| `frontend/src/features/home/model/useHomePersistence.ts` | 修改 | JSON 数组持久化 |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 修改 | 移除字幕开关 |
| `frontend/src/features/home/ui/HomePage.tsx` | 修改 | 更新 props 传递 |
### 重启要求
```bash
pm2 restart vigent2-backend
npm run build && pm2 restart vigent2-frontend
```

221
Docs/DevLogs/Day22.md Normal file
View File

@@ -0,0 +1,221 @@
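## 🔧 Multi-material generation: optimization and hardening (Day 22)
### Overview
A full review of the multi-material (multi-camera) video generation built on Day 21. It fixed 6 high-priority bugs, completed 8 UX improvements, and refactored the multi-material pipeline from "LatentSync per segment" to "concatenate first, then infer", cutting inference runs from N to 1.
---
### I. High-priority backend bug fixes
#### 1. `_split_equal()` boundary overflow when materials outnumber characters
- **Problem**: With 5 materials but only 2 Whisper characters, boundary indices repeated and some materials were skipped
- **Fix**: Added the cap `n = min(n, len(all_chars))`
- **File**: `backend/app/modules/videos/workflow.py`
#### 2. No fallback when one LatentSync segment fails in multi-material mode
- **Problem**: In single-material mode a LatentSync failure falls back to the original material, but multi-material mode raised the exception and the whole task failed
- **Fix**: Added try-except inside the multi-material loop; on failure it falls back to the original material segment
- **File**: `backend/app/modules/videos/workflow.py`
#### 3. ZeroDivisionError when `num_segments == 0`
- **Problem**: When every assignment was skipped, `i / num_segments` divided by zero
- **Fix**: Added an `if num_segments == 0` check before the loop that raises a clear error
- **File**: `backend/app/modules/videos/workflow.py`
#### 4. `split_audio` did not validate duration > 0
- **Problem**: FFmpeg behaves abnormally when `end <= start`
- **Fix**: Added `if duration <= 0: raise ValueError(...)`
- **File**: `backend/app/services/video_service.py`
#### 5. Equal split by duration as a fallback when Whisper fails
- **Problem**: After a Whisper failure the flow degraded to single-material and the other materials were wasted
- **Fix**: Split evenly by `audio_duration / len(material_paths)`, without character alignment
- **File**: `backend/app/modules/videos/workflow.py`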
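
A minimal sketch of the duration-only fallback described in item 5, assuming each assignment is a `(material_path, start, end)` tuple. The function name and return shape are illustrative, not taken from the codebase.

```python
from typing import List, Tuple


def split_by_duration(audio_duration: float,
                      material_paths: List[str]) -> List[Tuple[str, float, float]]:
    """Assign each material an equal slice of the audio, ignoring Whisper."""
    if not material_paths or audio_duration <= 0:
        raise ValueError("need at least one material and a positive duration")
    count = len(material_paths)
    step = audio_duration / count
    assignments = []
    for i, path in enumerate(material_paths):
        start = i * step
        # Pin the last end to the real duration to avoid float drift.
        end = audio_duration if i == count - 1 else (i + 1) * step
        assignments.append((path, start, end))
    return assignments
```

Without character timestamps a cut may land mid-word, which is acceptable here because the alternative was discarding every material except the first.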
#### 6. `concat_videos` did not check for an empty list
- **Problem**: FFmpeg failed when an empty `video_paths` was passed in
- **Fix**: Added `if not video_paths: raise ValueError(...)`
- **File**: `backend/app/services/video_service.py`
---
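### II. Frontend improvements
#### 1. Non-null assertion fix in payload construction
- `m!.path` → `m?.path` + `.filter(Boolean)`, which prevents a crash after a material has been deleted
- **File**: `frontend/src/features/home/model/useHomeController.ts`
#### 2. Generate button shows backend progress messages
- Added a `message` prop; during generation it shows text such as "(正在处理片段 2/3...)" (processing segment 2/3)
- **Files**: `frontend/src/features/home/ui/GenerateActionBar.tsx`, `HomePage.tsx`
#### 3. Newly uploaded materials are selected automatically
- After a successful upload, the material lists before and after are compared and new IDs are appended to `selectedMaterials`
- **File**: `frontend/src/features/home/model/useMaterials.ts`
#### 4. Unified Material interface
- Three duplicate `interface Material` definitions were extracted to `shared/types/material.ts`
- **Files**: `frontend/src/shared/types/material.ts` (new), `useMaterials.ts`, `useHomeController.ts`, `MaterialSelector.tsx`
#### 5. Drag-and-drop ordering fix
- Removed `DragOverlay` (`backdrop-blur` creates a new containing block, which broke positioning)
- Switched to native `useSortable` dragging + `CSS.Translate`; the dragged element is highlighted with a shadow
- **File**: `frontend/src/features/home/ui/MaterialSelector.tsx`
#### 6. Selection limit of 4 materials
- `toggleMaterial` enforces a new `MAX_MATERIALS = 4` limit
- Once the limit is reached, unselected items become semi-transparent and disabled, and the hint reads "可多选最多4个" (multi-select, up to 4)
- **Files**: `useMaterials.ts`, `MaterialSelector.tsx`
#### 7. Responsive sort area on mobile
- Material list `max-h-64` → `max-h-48 sm:max-h-64`
- **File**: `MaterialSelector.tsx`
#### 8. Multi-material duration hint
- With 2 or more materials selected, the text "多素材模式 (N 个机位),生成耗时较长" (multi-material mode with N camera angles, generation takes longer) appears below the generate button
- **Files**: `GenerateActionBar.tsx`, `HomePage.tsx`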
---
### III. Core architecture refactor: concatenate first, then infer
#### V1 (Day 21): LatentSync per segment
```
Material A → LatentSync(material A, audio segment 1) → lipsync_A
Material B → LatentSync(material B, audio segment 2) → lipsync_B
FFmpeg concat(lipsync_A, lipsync_B) → final video
```
- Drawback: N materials = N LatentSync inference runs (about 30s each)
#### V2 (Day 22): concatenate first, then infer
```
Material A → prepare_segment(trim to 3.67s) → prepared_A
Material B → prepare_segment(trim to 4.00s) → prepared_B
FFmpeg concat(prepared_A, prepared_B) → concat_video (7.67s)
LatentSync(concat_video, full audio) → final video
```
- Benefit: only **1** LatentSync inference run is needed; time drops from N×30s to 1×30s
#### New `prepare_segment()` method
```python
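def prepare_segment(self, video_path, target_duration, output_path, target_resolution=None):
    # Material longer than the target: trim (-t)
    # Material shorter than the target: loop (-stream_loop) + trim
    # Same resolution: -c copy, lossless (no re-encode)
    # Different resolution: scale + pad to the first material's resolution
    ...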
```
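#### Resolution handling
- Added a `get_resolution()` method to detect each material's resolution
- When all materials share a resolution: lossless `-c copy` trim (original quality preserved)
- When resolutions differ: everything is normalized to the first material's resolution, using `force_original_aspect_ratio=decrease` + `pad` to center the picture
- LatentSync only processes the 512×512 mouth region, and the output keeps the original resolution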
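
To make the trim, loop, and resolution rules concrete, here is a sketch that builds the FFmpeg arguments for one segment. It is an approximation under assumptions: the function name, the explicit `source_duration` and `source_resolution` parameters, and the choice of `libx264` for the re-encode path are not from the log; the filters and flags themselves are standard FFmpeg.

```python
from typing import List, Optional, Tuple


def prepare_segment_command(
    video_path: str,
    source_duration: float,
    target_duration: float,
    output_path: str,
    source_resolution: Tuple[int, int],
    target_resolution: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Build an FFmpeg command that fits one material to its time slot."""
    if target_duration <= 0:
        raise ValueError("target_duration must be positive")
    cmd = ["ffmpeg", "-y"]
    if source_duration < target_duration:
        # Input option: repeat the input; -t below bounds the output length.
        cmd += ["-stream_loop", "-1"]
    cmd += ["-i", video_path, "-t", f"{target_duration:.3f}"]
    if target_resolution is None or target_resolution == source_resolution:
        # Same resolution: stream copy keeps the original quality.
        cmd += ["-c", "copy"]
    else:
        width, height = target_resolution
        # Fit inside the target frame, then pad so the picture is centered.
        vf = (f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
              f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2")
        # libx264 is an assumed encoder choice for the re-encode path.
        cmd += ["-vf", vf, "-c:v", "libx264", "-c:a", "copy"]
    cmd.append(output_path)
    return cmd
```

Scaling with `decrease` before padding preserves the aspect ratio, so a landscape clip placed in a portrait frame gets bars instead of being stretched.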
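#### Time alignment check
| Stage | Time base | Alignment |
|------|---------|---------|
| TTS audio | Original duration (7.67s) | Baseline |
| Whisper subtitles | Based on the TTS audio | Timestamps aligned to the audio |
| Equal split | Total assignment duration = audio duration | First start=0, last end=audio_duration |
| prepare per segment | Exact cut with `-t seg_dur` | Sum ≈ audio duration |
| LatentSync | concat_video + full audio | 0.5s internal tolerance |
| compose | lipsync_video + audio/BGM | `-shortest` keeps them in sync |
| Remotion | Renders subtitles from captions_path | Timestamps aligned to the audio |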
---
### Files changed
| File | Change type | Notes |
|------|----------|------|
| `backend/app/modules/videos/workflow.py` | Modified | 6 bug fixes + pipeline refactor (concatenate first, then infer) |
| `backend/app/services/video_service.py` | Modified | Added `prepare_segment()` and `get_resolution()`, `split_audio` validation, `concat_videos` empty-list check |
| `frontend/src/shared/types/material.ts` | New | Unified Material interface |
| `frontend/src/features/home/model/useMaterials.ts` | Modified | Auto-select on upload, limit of 4 materials |
| `frontend/src/features/home/model/useHomeController.ts` | Modified | Payload non-null assertion fix, shared Material interface |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Modified | Drag fix, 4-item limit UI, mobile responsiveness |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | Modified | Progress message display, multi-material duration hint |
| `frontend/src/features/home/ui/HomePage.tsx` | Modified | Passes the message and materialCount props |
---
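### IV. AI multilingual translation
#### Feature
A new "AI多语言" (AI multilingual) button in the script editor translates a Chinese voice-over script into 9 languages with one click, and the original text can be restored at any time.
#### Supported languages
English, Japanese (日本語), Korean (한국어), French (Français), German (Deutsch), Spanish (Español), Russian (Русский), Italian (Italiano), Portuguese (Português)
#### Implementation
##### Backend
- **`backend/app/services/glm_service.py`**: added a `translate_text()` method that calls the Zhipu GLM API with temperature=0.3; the prompt asks for the translation only and for the tone and style to be preserved
- **`backend/app/modules/ai/router.py`**: added the `POST /api/ai/translate` endpoint, which accepts `{text, target_lang}` and returns `{translated_text}`
##### Frontend
- **`frontend/src/features/home/ui/ScriptEditor.tsx`**: added the `LANGUAGES` list (9 languages), a language dropdown (closes on outside click), a loading state during translation, and a "还原原文" (restore original) button that appears at the top of the menu after a translation
- **`frontend/src/features/home/model/useHomeController.ts`**: added `handleTranslate` (calls the translation API and saves the original text on the first translation), the `originalText` state, and `handleRestoreOriginal` (restores the original text)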
#### Files changed
| File | Change | Notes |
|------|------|------|
| `backend/app/services/glm_service.py` | Modified | Added the `translate_text()` method |
| `backend/app/modules/ai/router.py` | Modified | Added the `/api/ai/translate` endpoint |
| `frontend/src/features/home/ui/ScriptEditor.tsx` | Modified | Language menu UI, translation loading state, restore-original button |
| `frontend/src/features/home/model/useHomeController.ts` | Modified | `handleTranslate`, `originalText`, `handleRestoreOriginal` |
---
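### V. Multilingual TTS
#### Background
With translation in place, users can translate a Chinese script into other languages. TTS, however, still supported only Chinese when generating the video:
- **EdgeTTS**: the voice list had only 5 `zh-CN-*` Chinese voices
- **Voice cloning (Qwen3-TTS)**: the `language` parameter was hard-coded to `"Chinese"`
#### Implementation
##### 1. Frontend: language-aware voice list
- `VOICES` grew from a flat array into a `Record<string, VoiceOption[]>` covering 10 languages (zh-CN / en-US / ja-JP / ko-KR / fr-FR / de-DE / es-ES / ru-RU / it-IT / pt-BR), with 2 voices (male/female) per language
- Added the `LANG_TO_LOCALE` mapping: translation target language name → EdgeTTS locale (for example `"English" → "en-US"`)
- Added the `textLang` state, which tracks the current script language and defaults to `"zh-CN"`
##### 2. Automatic voice switch on translation
- After `handleTranslate` succeeds: `textLang` is set from the target language, and in EdgeTTS mode `voice` switches to the target language's default voice
- When `handleRestoreOriginal` restores the text: `textLang` resets to `"zh-CN"` and the default Chinese voice is restored
- `VoiceSelector` shows the voice list for the current `textLang`
##### 3. Language pass-through for voice cloning
- Frontend: added the `LOCALE_TO_QWEN_LANG` mapping (`zh-CN→"Chinese"`, `en-US→"English"`, others→`"Auto"`)
- The generate request payload includes a `language` field (voice cloning mode only)
- The backend `GenerateRequest` schema gains a `language: str = "Chinese"` field
- In `workflow.py`, the hard-coded `language="Chinese"` becomes `language=req.language`
##### 4. Bug fix: textLang persistence
- **Problem**: `voice` was persisted but `textLang` was not. After a page refresh `voice` was restored to an English voice while `textLang` fell back to Chinese, so VoiceSelector showed the Chinese voice list with an English voice selected and no button highlighted
- **Fix**: `useHomePersistence` now reads and writes `textLang` in localStorage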
#### 数据流
```
用户翻译 "English"
→ ScriptEditor.onTranslate("English")
→ LANG_TO_LOCALE["English"] = "en-US"
→ setTextLang("en-US"), setVoice("en-US-GuyNeural")
→ VoiceSelector 显示 VOICES["en-US"] = [Guy, Jenny]
→ 生成时:
EdgeTTS: payload.voice = "en-US-GuyNeural"
声音克隆: payload.language = "English" (via getQwenLanguage)
```
#### 涉及文件
| 文件 | 变更 | 说明 |
|------|------|------|
| `frontend/src/features/home/model/useHomeController.ts` | 修改 | VOICES 多语言 Record、textLang 状态、LANG_TO_LOCALE / LOCALE_TO_QWEN_LANG 映射、翻译自动切换 voice |
| `frontend/src/features/home/model/useHomePersistence.ts` | 修改 | textLang 持久化读写 |
| `backend/app/modules/videos/schemas.py` | 修改 | GenerateRequest 加 `language` 字段 |
| `backend/app/modules/videos/workflow.py` | 修改 | 声音克隆调用处用 `req.language` 替代硬编码 |

856
Docs/DevLogs/Day23.md Normal file
View File

@@ -0,0 +1,856 @@
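## 🎙️ Voice-over first refactor: phase 1 (Day 23)
### Overview
Voice-over generation is separated from the video generation flow, giving a new workflow: generate the voice-over → select it → choose materials → generate the video. Users can manage voice-overs independently (generate, preview, rename, delete, select) and see the duration of the selected one, which provides the data needed for the phase 2 material timeline.
**Old flow**: script + materials → one-click generation (inline TTS → Whisper → equal split → LipSync → compose)
**New flow**: script → voice mode → **generate voice-over** → select voice-over → select materials → background music → generate video
---
### I. Backend: new `generated_audios` module
#### Module layout
```
backend/app/modules/generated_audios/
├── __init__.py
├── router.py # 5 API endpoints
├── schemas.py # request/response models
└── service.py # generate/list/delete/rename
```
#### API endpoints
| Method | Path | Description |
|------|------|------|
| POST | `/api/generated-audios/generate` | Generate a voice-over asynchronously (returns task_id) |
| GET | `/api/generated-audios/tasks/{task_id}` | Poll generation progress |
| GET | `/api/generated-audios` | List all of the user's voice-overs |
| DELETE | `/api/generated-audios/{audio_id}` | Delete a voice-over |
| PUT | `/api/generated-audios/{audio_id}` | Rename |
#### Storage
- Supabase storage bucket: `generated-audios` (created automatically at startup)
- Audio file: `{user_id}/{timestamp}_audio.wav`
- Metadata file: `{user_id}/{timestamp}_audio.json` (contains display_name, text, tts_mode, duration_sec, and more)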
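
The storage layout above pairs every `.wav` with a `.json` sidecar under the same prefix. A small sketch of how the two object keys and the metadata document might be built follows; the function names, the integer timestamp, and the rounding of `duration_sec` are assumptions, and only the key pattern and the four listed metadata fields come from the log.

```python
import json
from typing import Tuple


def build_audio_keys(user_id: str, timestamp: int) -> Tuple[str, str]:
    """Return the (audio_key, metadata_key) pair for one generated voice-over."""
    base = f"{user_id}/{timestamp}_audio"
    return f"{base}.wav", f"{base}.json"


def build_metadata(display_name: str, text: str, tts_mode: str,
                   duration_sec: float) -> str:
    """Serialize the sidecar metadata; only fields named in the log are included."""
    meta = {
        "display_name": display_name,
        "text": text,
        "tts_mode": tts_mode,
        "duration_sec": round(duration_sec, 2),
    }
    # ensure_ascii=False keeps Chinese script text readable in the stored file.
    return json.dumps(meta, ensure_ascii=False)
```

Keeping the script text in the sidecar is what later lets the video workflow recover the original wording and language of a voice-over from its ID alone.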
#### Generation flow
Reuses the existing `TTSService` / `voice_clone_service` / `task_store`:
```
POST /generate → create task → BackgroundTask:
1. edgetts → TTSService.generate_audio()
voiceclone → download ref_audio → voice_clone_service.generate_audio()
2. ffprobe reads the duration
3. upload .wav + .json to the generated-audios bucket
4. update task(status=completed, output={audio_id, duration_sec, ...})
```
---
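### II. Backend: changes to the video generation workflow
#### New `GenerateRequest` field
```python
generated_audio_id: Optional[str] = None # ID of a pre-generated voice-over; when set, inline TTS is skipped
```
#### New branch in the `workflow.py` TTS stage
```python
if req.generated_audio_id:
    ...  # download the pre-generated voice-over and read language from its metadata
elif req.tts_mode == "voiceclone":
    ...  # existing voice cloning logic
else:
    ...  # existing EdgeTTS logic
```
Backward compatible: when `generated_audio_id` is not passed, the existing inline TTS flow is unaffected.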
---
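### III. Frontend: new voice-over list hook and panel
#### `useGeneratedAudios.ts`
- State: `generatedAudios[]`, `selectedAudio`, `isGeneratingAudio`, `audioTask`
- Methods: `fetchGeneratedAudios()`, `generateAudio()`, `deleteAudio()`, `renameAudio()`, `selectAudio()`
- Polling: after a generation request, the task status is polled every 1s; on completion the list refreshes and the newest voice-over is selected
- Independent of the video generation TaskContext, so the two do not interfere
#### `GeneratedAudiosPanel.tsx`
- Per voice-over: play/pause, name, duration, rename, delete
- Selected state: `border-purple-500 bg-purple-500/20`
- Inline progress bar (shown while generating)
- The original script of the selected voice-over is shown at the bottom (truncated)
- Playback logic is self-contained in the panel (`new Audio()` + play/pause toggle)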
---
### 四、前端UI 面板重排序
**旧顺序**: MaterialSelector → ScriptEditor → TitleSubtitle → VoiceSelector → BgmPanel → GenerateActionBar
**新顺序**:
1. ScriptEditor文案编辑
2. TitleSubtitlePanel标题与字幕样式
3. VoiceSelector配音方式
4. **GeneratedAudiosPanel**(配音列表)← 新增
5. MaterialSelector视频素材← 后移,需选中配音才解锁
6. BgmPanel背景音乐
7. GenerateActionBar生成视频
#### 素材区门控
未选中配音时,素材区显示半透明遮罩 + "请先生成并选中配音"提示。素材上传/预览/改名/删除始终可用,仅选择勾选被遮罩。
#### 时长信息
选中配音后MaterialSelector 顶部显示:
```
当前配音: 45.2 秒 | 已选 3 个素材(自动均分每段 ~15.1 秒)
```
#### 生成按钮条件更新
```typescript
// 旧条件
disabled={isGenerating || selectedMaterials.length === 0 || (ttsMode === "voiceclone" && !selectedRefAudio)}
// 新条件
disabled={isGenerating || selectedMaterials.length === 0 || !selectedAudio}
```
---
### 五、持久化
`useHomePersistence` 新增 `selectedAudioId` 的 localStorage 读写,刷新页面后恢复选中的配音。
---
### 涉及文件汇总
#### 后端新增
| 文件 | 说明 |
|------|------|
| `backend/app/modules/generated_audios/__init__.py` | 模块标记 |
| `backend/app/modules/generated_audios/router.py` | 5 个 API 端点 |
| `backend/app/modules/generated_audios/service.py` | 生成/列表/删除/改名 |
| `backend/app/modules/generated_audios/schemas.py` | 请求/响应模型 |
#### 后端修改
| 文件 | 变更 |
|------|------|
| `backend/app/main.py` | 注册 generated_audios 路由 |
| `backend/app/services/storage.py` | 新增 `BUCKET_GENERATED_AUDIOS`,启动时自动创建桶 |
| `backend/app/modules/videos/schemas.py` | `GenerateRequest` 新增 `generated_audio_id` 字段 |
| `backend/app/modules/videos/workflow.py` | TTS 阶段新增预生成音频分支 |
#### 前端新增
| 文件 | 说明 |
|------|------|
| `frontend/src/features/home/model/useGeneratedAudios.ts` | 配音列表 hook |
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | 配音列表面板 |
#### 前端修改
| 文件 | 变更 |
|------|------|
| `frontend/src/features/home/ui/HomePage.tsx` | 面板重排序 + 素材区门控 + 插入 GeneratedAudiosPanel |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | 新增 `selectedAudioDuration` prop + 时长信息显示 |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | 禁用条件改为 `!selectedAudio` |
| `frontend/src/features/home/model/useHomeController.ts` | 集成 useGeneratedAudios、新增 handleGenerateAudio、修改 handleGenerate 使用 generated_audio_id |
| `frontend/src/features/home/model/useHomePersistence.ts` | 新增 selectedAudioId 持久化 |
---
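## 🎞️ Material timeline editing: phase 2 (Day 23)
### Overview
Building on the phase 1 voice-over-first flow, a new **timeline editor** lets users:
1. See how much time each material block is allocated, drawn over the audio waveform
2. Drag dividers to change each material's duration (blocks tile seamlessly; changing one compresses or extends its neighbor)
3. Set a **source video start offset** for each material (start anywhere in the video instead of always at the beginning)
**Old behavior**: multiple materials were split evenly (`_split_equal`), with no control over each segment's duration or source start point
**New behavior**: visual allocation in the timeline editor + drag adjustment + trim settings in ClipTrimmer
---
### I. Backend changes
#### 1.1 New `CustomAssignment` model
```python
# backend/app/modules/videos/schemas.py
from pydantic import BaseModel

class CustomAssignment(BaseModel):
    material_path: str
    start: float  # start on the audio timeline
    end: float  # end on the audio timeline
    source_start: float = 0.0  # start offset in the source video
```
`GenerateRequest` gains `custom_assignments: Optional[List[CustomAssignment]] = None`. When present, the Whisper-based equal split is skipped and the user-defined assignments are used directly.
#### 1.2 `prepare_segment` supports `source_start`
```python
def prepare_segment(self, video_path, target_duration, output_path,
                    target_resolution=None, source_start: float = 0.0):
```
Key logic:
- When `source_start > 0`, `-ss` is used for a fast seek and re-encoding is forced (stream copy cannot cut precisely between keyframes)
- When looping is needed and `source_start` is set, the clip from `source_start` to the end of the video is cut first and the cut file is then looped (otherwise `stream_loop` would loop from 0s of the video)
- The temporary cut file is cleaned up automatically in `finally`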
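
The two-step handling described above can be sketched as a function that returns the FFmpeg commands to run in order. This is an illustration only: the function name, the `tmp_path` default, and the `libx264` encoder are assumptions, and resolution handling is left out to keep the focus on `source_start`.

```python
from typing import List


def prepare_with_source_start(
    video_path: str,
    source_duration: float,
    target_duration: float,
    output_path: str,
    source_start: float = 0.0,
    tmp_path: str = "trimmed_tmp.mp4",  # hypothetical temporary file name
) -> List[List[str]]:
    """Return the FFmpeg commands needed to fill one slot from source_start."""
    remaining = source_duration - source_start
    if remaining <= 0 or target_duration <= 0:
        raise ValueError("source_start must be inside the video and the target positive")
    ss = f"{source_start:.3f}"
    t = f"{target_duration:.3f}"
    # A non-zero start forces a re-encode so the cut does not snap to a keyframe.
    codec = ["-c:v", "libx264"] if source_start > 0 else ["-c", "copy"]
    if remaining >= target_duration:
        return [["ffmpeg", "-y", "-ss", ss, "-i", video_path, "-t", t, *codec, output_path]]
    if source_start > 0:
        # Step 1 cuts source_start..end; step 2 loops that cut file.
        return [
            ["ffmpeg", "-y", "-ss", ss, "-i", video_path, *codec, tmp_path],
            ["ffmpeg", "-y", "-stream_loop", "-1", "-i", tmp_path, "-t", t,
             "-c", "copy", output_path],
        ]
    return [["ffmpeg", "-y", "-stream_loop", "-1", "-i", video_path, "-t", t,
             "-c", "copy", output_path]]
```

A caller would run the commands in sequence and delete `tmp_path` in a `finally` block, matching the cleanup rule in the list above.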
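#### 1.3 `workflow.py` supports `custom_assignments`
- **Multi-material mode**: when `custom_assignments` is present, the user's assignments are used directly (Whisper still runs to produce subtitles), and every `prepare_segment` call receives `source_start`
- **Single-material mode**: when `custom_assignments` has 1 entry with `source_start > 0`, the clip is cut first and then passed to LatentSync
- **Backward compatible**: when `custom_assignments` is `None`, the old path is used unchanged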
---
### 二、前端新增组件
#### 2.1 `useTimelineEditor.ts` — 时间轴段管理 hook
```typescript
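interface TimelineSegment {
  id: string; // React key
  materialId: string; // material ID
  materialName: string; // display name
  start: number; // start on the audio timeline, in seconds
  end: number; // end on the audio timeline, in seconds
  sourceStart: number; // start offset in the source video (default 0)
  sourceEnd: number; // end offset in the source video (0 = to the end)
  color: string; // block color
}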
```
核心方法:
- `initSegments()`: selectedMaterials 变化时按数量均分 audioDuration
- `resizeSegment(id, newEnd)`: 拖拽右边界,约束每段最小 1s
- `setSourceRange(id, sourceStart, sourceEnd)`: 设置截取范围
- `toCustomAssignments()`: 转为后端 `CustomAssignment[]` 格式
#### 2.2 `TimelineEditor.tsx` — 波形 + 色块时间轴
- **wavesurfer.js** 渲染音频波形(仅展示,不播放)
- 色块层按比例排列,显示素材名 + 时长 + 截取标记
- 色块间分割线可拖拽(`onPointerDown/Move/Up` 实现连续像素拖拽)
- 点击色块打开 ClipTrimmer
#### 2.3 `ClipTrimmer.tsx` — 素材截取模态框
- HTML5 `<video>` 实时预览,拖拽滑块时 `video.currentTime` 跟随
- 双端 Range Slider起点/终点),互锁约束 ≥ 0.5s
- 显示截取时长 vs 分配时长对比(循环补足/截断提示)
- `loadedmetadata` 获取源视频时长
---
### 三、前端整合改动
#### 3.1 `useHomeController.ts`
- 集成 `useTimelineEditor` hook
- 新增 `clipTrimmerOpen` / `clipTrimmerSegmentId` 状态
- `handleGenerate` 多素材时始终发送 `custom_assignments`;单素材 + `sourceStart > 0` 时也发送
- 移除不再使用的 `reorderMaterials` 导出
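#### 3.2 `HomePage.tsx`
- Inserted TimelineEditor between MaterialSelector and BgmPanel (shown only when a voice-over is selected and materials are chosen)
- Added the ClipTrimmer modal at the bottom
- Stopped passing the `reorderMaterials` and `selectedAudioDuration` props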
#### 3.3 `MaterialSelector.tsx`
- 移除配音时长信息栏(功能迁至 TimelineEditor
- 移除拖拽排序区SortableChip + @dnd-kit 相关代码)
- 移除 `onReorderMaterials` / `selectedAudioDuration` prop
---
### 四、审查修复的 Bug
| # | 严重程度 | 问题 | 修复 |
|---|---------|------|------|
| 1 | **中** | `prepare_segment` 使用 `source_start > 0` + stream copy 时 seek 不精确 | 添加 `source_start > 0` 到重编码条件 |
| 2 | **高** | `stream_loop + source_start` 循环时从视频 0s 开始而非从 source_start 循环 | 改为两步:先裁剪片段再循环裁剪后的文件 |
| 3 | **低** | `useHomeController` 导出已废弃的 `reorderMaterials` | 移除 |
---
### 涉及文件汇总
#### 后端修改
| 文件 | 变更 |
|------|------|
| `backend/app/modules/videos/schemas.py` | Added the `CustomAssignment` model; `GenerateRequest` gains the `custom_assignments` field |
| `backend/app/services/video_service.py` | `prepare_segment` 新增 `source_start` 参数,循环+截取两步处理 |
| `backend/app/modules/videos/workflow.py` | 多素材/单素材流水线支持 `custom_assignments`,传递 `source_start` |
#### 前端新增
| 文件 | 说明 |
|------|------|
| `frontend/src/features/home/model/useTimelineEditor.ts` | 时间轴段管理 hook |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | 波形 + 色块时间轴组件 |
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | 素材截取模态框 |
#### 前端修改
| 文件 | 变更 |
|------|------|
| `frontend/src/features/home/ui/HomePage.tsx` | 插入 TimelineEditor + ClipTrimmer |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | 移除时长信息 + 拖拽排序区 + 相关 prop |
| `frontend/src/features/home/model/useHomeController.ts` | Integrates useTimelineEditor; handleGenerate sends custom_assignments |
| `frontend/package.json` | 新增 `wavesurfer.js` 依赖 |
---
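## 🎨 UI improvements + TTS stability fixes: phase 3 (Day 23)
### Overview
Based on user feedback, this phase fixes 6 UI issues, and also fixes the SoX path problem and GPU memory cache handling in the voice cloning service.
> **Note**: Qwen3-TTS was later replaced by CosyVoice 3.0 (port 8010). The notes below record the fixes made at the time.
---
### I. Qwen3-TTS stability fixes (since replaced by CosyVoice 3.0)
#### 1.1 SoX PATH fix
**Problem**: When PM2 started qwen-tts, the `sox` tool was installed in the conda env's bin directory and was not on the system PATH. Audio encoding and decoding fell back to a CPU-intensive path, and the log showed the warning `SoX could not be found!`.
**Fix**: `run_qwen_tts.sh` exports the conda env's bin directory to PATH:
```bash
export PATH="/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin:$PATH"
```
#### 1.2 CUDA cache cleanup
**Fix**: `qwen_tts_server.py` calls `torch.cuda.empty_cache()` after every generation, whether it succeeded or failed, to keep GPU memory fragmentation from accumulating. Inference runs in a thread pool via `asyncio.to_thread()`, so it does not block the event loop and cause health-check timeouts.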
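
The pattern described in 1.2 (run blocking inference off the event loop, and always release the cache afterwards) can be sketched generically as below. The `run_inference` helper and its callable parameters are assumptions for illustration; in the real server the cleanup callable would be `torch.cuda.empty_cache`, which is left out here so the sketch runs without a GPU.

```python
import asyncio
from typing import Callable, TypeVar

T = TypeVar("T")


async def run_inference(infer: Callable[[], T],
                        release_cache: Callable[[], None]) -> T:
    """Run blocking inference in a worker thread and always clean up afterwards."""
    try:
        # to_thread keeps the event loop free, so health checks still get answered.
        return await asyncio.to_thread(infer)
    finally:
        # Runs on success and on failure, e.g. torch.cuda.empty_cache in production.
        release_cache()
```

Putting the cleanup in `finally` is the important part: a failed generation is exactly when fragmented GPU memory is most likely to be left behind.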
> **Follow-up**: Qwen3-TTS has been retired. CosyVoice 3.0 kept the same safeguards (GPU inference lock, timeout protection, GPU memory cleanup, startup self-check).
---
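### II. Unified button layout in the voice-over list (feedback #1 + #6)
**Problem**: In `GeneratedAudiosPanel` the preview button sat on the left (separate from Edit/Delete), which was inconsistent with the `RefAudioPanel` layout. The script summary at the bottom was also unnecessary.
**Fix**:
- Play/Edit/Delete are grouped on the right and shown on hover, in the order preview → rename → delete
- Removed the script summary for the selected voice-over
- Layout now matches RefAudioPanel: name + duration on the left, action buttons on the right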
---
### 三、视频素材区域移除配音依赖遮罩 (反馈 #2)
**问题**: MaterialSelector 被 `!selectedAudio` 遮罩覆盖,必须先选配音才能操作素材。
**修复**: 移除 `HomePage.tsx` 中 MaterialSelector 外层的 disabled overlay `<div>`。素材随时可上传/预览/管理,仅 TimelineEditor 需要选中配音才显示(已有独立条件 `selectedAudio && selectedMaterials.length > 0`)。
---
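### IV. Timeline drag-to-reorder (feedback #3)
**Problem**: TimelineEditor did not support changing the order of materials.
**Fix**:
- `useTimelineEditor` already had a `reorderSegments()` method (it swaps the material info of two segments but keeps their time ranges)
- `useHomeController` exposes `reorderSegments` and passes it to `TimelineEditor`
- Blocks support HTML5 Drag & Drop (`draggable` + `onDragStart/Over/Drop/End`)
- While dragging: the source block is semi-transparent (`opacity-50`) and the target block gets a highlight ring (`ring-2 ring-purple-400 scale-[1.02]`)
- Cursor styles: `cursor-grab` / `active:cursor-grabbing`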
---
### 五、截取设置双手柄 Range Slider (反馈 #4)
**问题**: ClipTrimmer 使用两个独立的 `<input type="range">` 滑块,起点和终点分开操作,体验不直观。
**修复**: 改为自定义双手柄 range slider
- 单条轨道,紫色圆形手柄(起点)+ 粉色圆形手柄(终点)
- 轨道底色 `bg-white/10`,选中范围用素材对应颜色高亮
- Pointer Events 实现拖拽:`onPointerDown` 捕获手柄 → `onPointerMove` 更新位置 → `onPointerUp` 释放
- 手柄互锁约束:起点不超过终点 - 0.5s,终点不低于起点 + 0.5s
- 底部显示起点(紫色)和终点(粉色)时间标签
---
### 六、截取设置视频预览 (反馈 #5)
**问题**: ClipTrimmer 的视频只能静态查看,无法播放预览截取范围。
**修复**:
- 视频区域点击可播放/暂停Play/Pause 图标覆盖层)
- 播放范围:从 sourceStart 播放到 sourceEnd 自动停止
- 播放结束后回到起点
- 拖拽手柄时 `video.currentTime` 实时跟随seek 到当前位置查看画面)
- 播放进度条(白色竖线)叠加在 range slider 轨道上
- `preload="auto"` 预加载视频,确保拖拽时快速 seek
---
### 涉及文件汇总
#### 后端修改
| 文件 | 变更 |
|------|------|
| `run_qwen_tts.sh` | export conda env bin 到 PATH修复 SoX 找不到问题 (已停用) |
| `models/Qwen3-TTS/qwen_tts_server.py` | 每次生成后 `torch.cuda.empty_cache()`asyncio.to_thread 避免阻塞 (已停用) |
#### 前端修改
| 文件 | 变更 |
|------|------|
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | 按钮布局统一Play/Edit/Delete 右侧同组),移除文案摘要 |
| `frontend/src/features/home/ui/HomePage.tsx` | 移除 MaterialSelector 配音遮罩,传入 onReorderSegment |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | 新增 HTML5 Drag & Drop 排序,新增 onReorderSegment prop |
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | 双手柄 range slider + 视频播放预览 + 播放进度指示 |
| `frontend/src/features/home/model/useHomeController.ts` | 暴露 reorderSegments 方法 |
---
## 📝 历史文案保存 + 时间轴拖拽修复 — 第四阶段 (Day 23)
### 概述
新增文案手动保存与加载功能,修复时间轴拖拽排序后素材时长不跟随的 Bug统一按钮视觉规范。
---
### 一、历史文案保存与加载
#### 功能
用户可手动保存当前文案到历史列表,随时从历史中加载恢复。只有手动保存的文案才出现在历史列表中,与自动保存(`useHomePersistence`)完全独立。
#### UI 布局
```
按钮栏: [历史文案▼] [文案提取助手] [AI多语言▼] [AI生成标题标签]
底部栏: 128 字 [保存文案]
```
- **历史文案下拉**: 展示已保存列表(名称 + 日期 + 删除按钮),点击条目加载文案,空列表显示"暂无保存的文案"
- **保存文案按钮**: 文案为空时 disabled点击后 `toast.success("文案已保存")`
- **预计时长已移除**: 底部栏只保留字数 + 保存按钮
#### 实现
##### `useSavedScripts.ts`(新建)
```typescript
interface SavedScript { id: string; name: string; content: string; savedAt: number }
```
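- localStorage key: `vigent_{storageKey}_savedScripts`
- `saveScript(content)`: names the entry automatically from the first 15 characters, inserts it at the head of the list, and **writes straight to localStorage**
- `deleteScript(id)`: removes the given entry and writes straight to localStorage
- `useEffect([lsKey])`: when lsKey changes (guest → userId), the list is read from localStorage again
- **No automatic persistence effect is used**, which avoids overwriting existing data with an empty array when storageKey switches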
##### 数据流
```
ScriptEditor (UI)
↑ savedScripts / onSaveScript / onLoadScript / onDeleteScript (纯 props + callbacks)
useHomeController
├── useSavedScripts(storageKey) → { savedScripts, saveScript, deleteScript }
└── handleSaveScript() → saveScript(text) + toast
HomePage
└── 传递 props 到 ScriptEditor
```
---
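### II. Timeline drag-reorder bug fix
#### Problem
After dragging to reorder materials, each material's duration did not move with it and stayed in the original slot. For example, with material 1 (3s) + material 2 (8s + 4s loop), dragging produced material 2 (3s) + material 1 (8s + 4s loop): the duration allocation was unchanged.
#### Root cause
`reorderSegments` used **property swapping**: it copied `materialId`, `sourceStart`, `sourceEnd`, and other properties one by one between the two slots, then called `recalcPositions` to recompute positions.
#### Fix
Switched to an **array move** (splice): the whole segment object is removed from its old position and inserted at the new one. The segment carries all of its properties (materialId, sourceStart, sourceEnd, color, and so on) as a unit, and `recalcPositions` then recomputes positions.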
```typescript
// 修复前:属性交换
const fromMat = { materialId: next[fromIdx].materialId, ... };
const toMat = { materialId: next[toIdx].materialId, ... };
next[fromIdx] = { ...next[fromIdx], ...toMat };
next[toIdx] = { ...next[toIdx], ...fromMat };
// 修复后:数组移动
const [moved] = next.splice(fromIdx, 1);
next.splice(toIdx, 0, moved);
```
附带优势3+ 素材拖拽行为从"交换"变为"插入",更符合用户直觉。
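
The difference between the two strategies is easiest to see with three segments. The Python sketch below only illustrates the semantics (the real code is the TypeScript above); `swap_items` and `move_item` are hypothetical helper names, and plain strings stand in for segment objects.

```python
from typing import List


def swap_items(items: List[str], from_idx: int, to_idx: int) -> List[str]:
    """Old behaviour: exchange the contents of two slots."""
    result = list(items)
    result[from_idx], result[to_idx] = result[to_idx], result[from_idx]
    return result


def move_item(items: List[str], from_idx: int, to_idx: int) -> List[str]:
    """New behaviour: take one item out and insert it at the target index."""
    result = list(items)
    moved = result.pop(from_idx)   # same role as splice(fromIdx, 1)
    result.insert(to_idx, moved)   # same role as splice(toIdx, 0, moved)
    return result


segments = ["A", "B", "C"]
print(swap_items(segments, 0, 2))  # ['C', 'B', 'A']
print(move_item(segments, 0, 2))   # ['B', 'C', 'A']
```

With only two segments both functions give the same order, which is why the difference shows up only with three or more materials.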
---
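### 3. Unified Button Styling
#### Problem
The four buttons (Script History, Script Extraction Assistant, AI Multilingual, AI Title & Tag Generation) had inconsistent heights; the text of the AI buttons was wrapped in nested `<span>` elements, which caused internal layout differences.
#### Fix
- All four buttons use `h-7 px-2.5 text-xs rounded inline-flex items-center gap-1` (fixed height of 28px)
- Removed the redundant `<span>` nesting inside the AI Multilingual / AI Title & Tag Generation buttons, replacing it with a `<>...</>` fragment
---
### Files Changed
#### Frontend: Added
| File | Description |
|------|------|
| `frontend/src/features/home/model/useSavedScripts.ts` | Script history hook (localStorage persistence) |
#### Frontend: Modified
| File | Change |
|------|------|
| `frontend/src/features/home/ui/ScriptEditor.tsx` | Script history dropdown + save button + estimated duration removed + unified button height |
| `frontend/src/features/home/model/useHomeController.ts` | Integrates useSavedScripts; adds handleSaveScript |
| `frontend/src/features/home/ui/HomePage.tsx` | Passes savedScripts / handleSaveScript / deleteSavedScript to ScriptEditor |
| `frontend/src/features/home/model/useTimelineEditor.ts` | reorderSegments changed from property swap to array move (splice) |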
---
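## 🔤 Subtitle Language Mismatch + Video Aspect Ratio Misalignment Fixes: Phase 5 (Day 23)
### Overview
Fixes two video generation bugs:
1. **Subtitle language mismatch**: Chinese voiceover + English translated script → subtitles wrongly shown in English (Whisper transcribes independently and ignores the original text)
2. **Title/subtitle layout mismatch**: after generating a video from 9:16 portrait material, titles and subtitles were laid out for a 16:9 landscape canvas
Also fixes an English space-loss issue in `split_word_to_chars` found during code review.
---
### 1. Subtitles Use the Original Text Instead of the Whisper Transcript
#### Root Cause
Whisper transcribes the audio independently and completely ignores the `text` argument passed in. When the voiceover language differs from the script language in the editor (for example: the user writes a Chinese script → translates it to English → generates an English voiceover → switches the script back to Chinese), Whisper "hears" English speech and outputs English subtitles.
#### Approach
Whisper is only responsible for detecting the **overall speech time range** (`first_start` to `last_end`); the subtitle text always comes from the original script saved with the voiceover.
#### `whisper_service.py`: `align()` gains an `original_text` parameter
```python
async def align(self, audio_path, text, output_path=None,
                language="zh", original_text=None):
```
When `original_text` is non-empty:
1. Run the Whisper transcription as usual and record `whisper_first_start` and `whisper_last_end`
2. Pass `original_text` to `split_word_to_chars()` to distribute it linearly over the overall time range
3. Use `split_segment_to_lines()` to break lines by punctuation and character count
4. Replace Whisper's transcription result
#### `workflow.py`: voiceover metadata overrides unconditionally + original text passed in
```python
# Before (override only when the script is empty)
if not req.text.strip():
    req.text = meta.get("text", req.text)
# After (always override with the voiceover metadata)
meta_text = meta.get("text", "")
if meta_text:
    req.text = meta_text
```
All 4 `whisper_service.align()` call sites now pass `original_text=req.text`.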
---
### 2. Passing Video Dimensions to Remotion Dynamically
#### Root Cause
`remotion/src/Root.tsx` hardcoded `width={1280} height={720}`. Although `render.ts` detects the real dimensions with ffprobe and then overrides `composition.width/height`, the component had already been initialized at 1280×720 during the `selectComposition` stage, so title and subtitle positioning was based on the wrong canvas size.
#### Fix
##### `Root.tsx`: `calculateMetadata` reads dimensions from props
```tsx
<Composition
  id="ViGentVideo"
  component={Video}
  durationInFrames={300}
  fps={25}
  width={1080}
  height={1920}
  calculateMetadata={async ({ props }) => ({
    width: props.width || 1080,
    height: props.height || 1920,
  })}
  defaultProps={{
    videoSrc: '',
    width: 1080,
    height: 1920,
    // ...
  }}
/>
```
The default changes from 1280×720 to 1080×1920 (portrait first); `calculateMetadata` ensures the `selectComposition` stage uses the real dimensions detected by ffprobe.
##### `Video.tsx`: VideoProps gains optional `width/height`
These exist only so that `calculateMetadata` can read them; the component's rendering does not reference them.
##### `render.ts`: inputProps always carries the video dimensions
```typescript
const inputProps = {
  videoSrc: videoFileName,
  captions,
  title: options.title,
  // ...
  width: videoWidth,   // value detected by ffprobe
  height: videoHeight, // value detected by ffprobe
};
```
`selectComposition` and `renderMedia` use the same `inputProps`. The explicit `composition.width/height` override is kept as a safeguard.
---
### 3. Code Review Fix: Lost Spaces in English Text
#### Problem
`split_word_to_chars` was originally designed for single Whisper words (such as `" Hello"`). When `original_text` passes in a whole passage, interior spaces were skipped with `continue` without flushing `ascii_buffer`, so `"Hello World"` became `"HelloWorld"`.
#### Execution Trace
```
Input: "Hello World"
  H,e,l,l,o → ascii_buffer = "Hello"
  ' ' → continue (skipped, no flush)
  W,o,r,l,d → ascii_buffer = "HelloWorld"
Result: tokens = ["HelloWorld"] ← space lost
```
#### Fix
On a space, flush `ascii_buffer` and set a `pending_space` flag so that the next token is prefixed with a space:
```python
if not char.strip():
    if ascii_buffer:
        tokens.append(ascii_buffer)
        ascii_buffer = ""
    if tokens:
        pending_space = True
    continue
```
After the fix: `"Hello World"` → tokens = `["Hello", " World"]` → subtitles display correctly. Chinese text is unaffected.
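
To make the flush-and-flag behavior concrete, here is a self-contained sketch of a tokenizer built around the same idea. It is not the project's `split_word_to_chars`; the function name `tokenize_mixed_text` and the rule "ASCII letters and digits accumulate into words, every other non-space character is its own token" are assumptions made for illustration.

```python
from typing import List


def tokenize_mixed_text(text: str) -> List[str]:
    """Split text into tokens, keeping a leading space on tokens that follow whitespace."""
    tokens: List[str] = []
    ascii_buffer = ""
    pending_space = False

    def emit(token: str) -> None:
        nonlocal pending_space
        tokens.append((" " if pending_space else "") + token)
        pending_space = False

    for char in text:
        if not char.strip():            # whitespace: flush, then remember the space
            if ascii_buffer:
                emit(ascii_buffer)
                ascii_buffer = ""
            if tokens:
                pending_space = True
            continue
        if char.isascii() and char.isalnum():
            ascii_buffer += char        # keep building the current ASCII word
            continue
        if ascii_buffer:                # CJK or punctuation ends the ASCII word
            emit(ascii_buffer)
            ascii_buffer = ""
        emit(char)

    if ascii_buffer:
        emit(ascii_buffer)
    return tokens


print(tokenize_mixed_text("Hello World"))  # ['Hello', ' World']
print(tokenize_mixed_text("你好"))          # ['你', '好']
```

Leading whitespace produces no pending space because the `if tokens` check is false until the first token exists, which mirrors the guard in the snippet above.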
---
### Files Changed
#### Backend: Modified
| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | `align()` gains the `original_text` parameter; `split_word_to_chars` fixes lost English spaces |
| `backend/app/modules/videos/workflow.py` | Voiceover metadata unconditionally overrides text/language; 4 `align()` call sites pass `original_text` |
#### Frontend: Modified (Remotion)
| File | Change |
|------|------|
| `remotion/src/Root.tsx` | Default size changed to 1080×1920; adds `calculateMetadata` + width/height defaultProps |
| `remotion/src/Video.tsx` | VideoProps gains optional `width`/`height` |
| `remotion/render.ts` | inputProps always carries `videoWidth`/`videoHeight`; shared by selectComposition and renderMedia |
---
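## 🎤 Automatic Reference Audio Transcription + Speech Rate Control: Phase 6 (Day 23)
### Overview
Resolves the ref_text mismatch problem in voice cloning. The old approach used a fixed frontend string as ref_text, but CosyVoice zero-shot cloning requires ref_text to match the actual content of the reference audio; on a mismatch the model "hallucinates" extra fragments at the start of the generated audio.
**Improvement**: when reference audio is uploaded, Whisper is called automatically and its transcript is used as ref_text. Speech rate control is added as well.
---
### 1. Whisper Auto-Transcription of Reference Audio
#### 1.1 `whisper_service.py`: automatic language detection
The `transcribe()` method used to hardcode `language="zh"`. It now accepts an optional `language` parameter (default `None` = auto-detect), which supports multilingual reference audio.
#### 1.2 `ref_audios/service.py`: auto-transcribe on upload
New upload flow: transcode to WAV → check duration (≥1s) → if longer than 10s, cut at a silence point → **Whisper auto-transcription** → verify non-empty → upload.
```python
try:
    transcribed = await whisper_service.transcribe(tmp_wav_path)
    if transcribed.strip():
        ref_text = transcribed.strip()
except Exception as e:
    logger.warning(f"Auto-transcribe failed: {e}")
if not ref_text or not ref_text.strip():
    raise ValueError("无法识别音频内容，请确保音频包含清晰的语音")
```
#### 1.3 `ref_audios/router.py`: ref_text becomes optional
`ref_text: str = Form("")` (no longer required); the frontend no longer sends a fixed string.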
---
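### 2. Smart Trimming of Reference Audio (10-Second Cap)
CosyVoice works best with reference audio of 3-10 seconds.
#### 2.1 Silence Point Detection
ffmpeg `silencedetect` is used to find the last silence end point within the first 10 seconds (threshold -30dB, minimum 0.3s), so the cut does not land in the middle of a word:
```python
def _find_silence_cut_point(file_path, max_duration):
    # silencedetect → parse silence_end → pick the last silence point within 3s~max_duration
    # fall back to max_duration if none is found
    ...
```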
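
The selection step in the outline above can be sketched as a pure function over the text that `silencedetect` writes to stderr. This is an illustration, not the project's `_find_silence_cut_point`: the name `pick_cut_point` and the 3-second lower bound as a default argument are assumptions, and the sample log only mimics the `silence_end: <seconds>` lines ffmpeg prints.

```python
import re

# silencedetect reports lines such as:
#   [silencedetect @ 0x...] silence_end: 8.4 | silence_duration: 0.5
SILENCE_END_RE = re.compile(r"silence_end:\s*(\d+(?:\.\d+)?)")


def pick_cut_point(silencedetect_log: str, max_duration: float,
                   min_duration: float = 3.0) -> float:
    """Return the last silence end inside [min_duration, max_duration], else max_duration."""
    candidates = [float(m.group(1)) for m in SILENCE_END_RE.finditer(silencedetect_log)]
    in_range = [t for t in candidates if min_duration <= t <= max_duration]
    return max(in_range) if in_range else max_duration


sample_log = """
[silencedetect @ 0x1] silence_start: 2.1
[silencedetect @ 0x1] silence_end: 2.6 | silence_duration: 0.5
[silencedetect @ 0x1] silence_start: 7.9
[silencedetect @ 0x1] silence_end: 8.4 | silence_duration: 0.5
[silencedetect @ 0x1] silence_start: 11.0
[silencedetect @ 0x1] silence_end: 11.7 | silence_duration: 0.7
"""
print(pick_cut_point(sample_log, max_duration=10.0))  # 8.4
```

In a real pipeline the log would come from running ffmpeg with a filter such as `silencedetect=noise=-30dB:d=0.3` and capturing stderr; keeping the parsing separate from the subprocess call makes the choice of cut point easy to test.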
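#### 2.2 Fade-Out
When trimming, the last 0.1 seconds are faded out (`afade=t=out`) to avoid a click at the cut.
---
### 3. Re-Transcription (Migrating Old Data)
#### 3.1 New API
`POST /api/ref-audios/{audio_id}/retranscribe`: download audio → trim if longer than 10s → Whisper transcription → re-upload audio and metadata.
#### 3.2 Frontend UI
- RefAudioPanel adds a RotateCw button ("重新识别文字", re-transcribe text) that shows `animate-spin` while transcribing
- Old audio whose ref_text starts with the fixed string shows a yellow ⚠ warning
---
### 4. Speech Rate Control (CosyVoice speed Parameter)
#### 4.1 End-to-End Propagation
```
Frontend GeneratedAudiosPanel (speed selector)
  → useHomeController (speed state + persistence)
  → useGeneratedAudios.generateAudio(params)
  → POST /api/generated-audios/generate { speed: 1.0 }
  → GenerateAudioRequest.speed (Pydantic)
  → generate_audio_task → voice_clone_service.generate_audio(speed=)
  → _generate_once → POST /generate { speed: "1.0" }
  → cosyvoice_server → _model.inference_zero_shot(speed=speed)
```
#### 4.2 Frontend UI
In voice cloning mode, a speech rate dropdown (`语速: 正常 ▼`) appears to the left of the "Generate Voiceover" button in the title bar of the voiceover list panel.
| Label | speed value |
|------|----------|
| 较慢 (slower) | 0.8 |
| 稍慢 (slightly slower) | 0.9 |
| 正常 (normal) | 1.0 (default) |
| 稍快 (slightly faster) | 1.1 |
| 较快 (faster) | 1.2 |
The selected speech rate is persisted to localStorage (`vigent_{storageKey}_speed`).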
---
### 5. Gating When Reference Audio Is Missing
In voice cloning mode, when no reference audio is selected:
- The "Generate Voiceover" button is disabled, with the title hint "请先选择参考音频" (select a reference audio first)
- The panel shows a yellow warning bar: "声音克隆模式需要先选择参考音频" (voice cloning mode requires a reference audio)
---
### 6. Frontend Cleanup
- Removed the `FIXED_REF_TEXT` constant and the `fixedRefText` prop
- Removed the "请朗读以下内容" (please read the following aloud) guidance block
- Upload hint simplified to: upload any voice sample (3-10 seconds); the system recognizes the content automatically and clones the voice
- Recording area note: 3-10 seconds recommended; longer recordings are trimmed automatically
---
### Files Changed
#### Backend: Modified
| File | Change |
|------|------|
| `backend/app/services/whisper_service.py` | `transcribe()` gains an optional `language` parameter, default None (auto-detect) |
| `backend/app/modules/ref_audios/service.py` | Auto-transcription on upload + silence-point trimming + fade-out + retranscribe function |
| `backend/app/modules/ref_audios/router.py` | `ref_text` changed to Form(""); adds the retranscribe endpoint |
| `backend/app/modules/generated_audios/schemas.py` | `GenerateAudioRequest` adds `speed: float = 1.0` |
| `backend/app/modules/generated_audios/service.py` | Passes `req.speed` to voice_clone_service |
| `backend/app/services/voice_clone_service.py` | `generate_audio()` / `_generate_once()` accept and forward speed |
| `models/CosyVoice/cosyvoice_server.py` | `/generate` endpoint accepts `speed` and passes it to `inference_zero_shot(speed=)` |
#### Frontend: Modified
| File | Change |
|------|------|
| `frontend/src/features/home/model/useHomeController.ts` | Adds speed state; removes FIXED_REF_TEXT; handleGenerateAudio passes speed |
| `frontend/src/features/home/model/useHomePersistence.ts` | Adds speed persistence |
| `frontend/src/features/home/model/useRefAudios.ts` | Removes fixedRefText; adds retranscribe |
| `frontend/src/features/home/model/useGeneratedAudios.ts` | generateAudio params add speed |
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | Adds the speech rate selector + missing-reference-audio gating |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | Removes the read-aloud guidance; adds the re-transcribe button + ⚠ warning |
| `frontend/src/features/home/ui/HomePage.tsx` | Passes speed/setSpeed/ttsMode to GeneratedAudiosPanel |

185
Docs/DevLogs/Day24.md Normal file

@@ -0,0 +1,185 @@
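## 🔧 Auth Expiry Handling + Multi-Material Timeline Stability Fixes (Day 24)
### Overview
Two main threads were completed today:
1. **Account and auth handling**: membership expiry now takes effect at request time (triggered by login/auth endpoints), with a unified renewal prompt.
2. **Video generation stability**: an end-to-end round of fixes covering the multi-material timeline, trim semantics, freezes at concat boundaries, aspect ratio, and subtitle/title adaptation.
---
## 🔐 Request-Time Membership Expiry: Phase 1 (Day 24)
### Goal
Avoid relying on a scheduled job: expiry is evaluated and the account deactivated as soon as the user logs in or hits a protected endpoint.
### Behavior Changes
- Expiry is judged by `users.expires_at`
- Once the account is found to be expired:
  - `is_active` is automatically set to `false`
  - all of the user's sessions are deleted
  - the response is `403` with the message `会员已到期，请续费` (membership expired, please renew)
### Implementation Notes
- `users.py` adds `deactivate_user_if_expired()`, plus `_parse_expires_at()` for consistent timezone parsing.
- `deps.py` wires the expiry check into both `get_current_user` and `get_current_user_optional`.
- `auth/router.py` adds expiry deactivation on the login path; `/api/auth/me` now goes through `Depends(get_current_user)`.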
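
A request-time expiry check mostly comes down to parsing the stored timestamp with a consistent timezone rule and comparing it with "now". The sketch below shows one way to do that; `parse_expires_at` and `is_expired` are hypothetical stand-ins (not the project's `_parse_expires_at()` / `deactivate_user_if_expired()`), and treating naive timestamps as UTC and a missing value as "never expires" are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expires_at(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string into an aware UTC datetime."""
    if not value:
        return None
    normalized = value.strip()
    if normalized.endswith("Z"):
        # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
        normalized = normalized[:-1] + "+00:00"
    parsed = datetime.fromisoformat(normalized)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed.astimezone(timezone.utc)


def is_expired(expires_at: Optional[str], now: Optional[datetime] = None) -> bool:
    """True when the membership deadline has passed."""
    deadline = parse_expires_at(expires_at)
    if deadline is None:
        return False  # assumption: no deadline stored means no expiry
    current = now or datetime.now(timezone.utc)
    return current >= deadline


noon_utc = datetime(2026, 2, 24, 12, 0, tzinfo=timezone.utc)
print(is_expired("2026-02-24T16:55:29+08:00", noon_utc))  # True
print(is_expired("2026-02-25T00:00:00Z", noon_utc))       # False
```

A dependency such as `get_current_user` would call a check like this first and, on `True`, perform the deactivation steps listed above before returning `403`.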
---
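## 🖼️ Aspect Ratio Control + Subtitle/Title Adaptation: Phase 2 (Day 24)
### 2.1 Configurable Output Aspect Ratio
- A new "aspect ratio" dropdown at the top of the timeline: `9:16` / `16:9`.
- Defaults to `9:16` and is persisted to localStorage.
- The generate request carries `output_aspect_ratio`; the backend processes both single-material and multi-material flows at the target resolution.
### 2.2 Preventing Title/Subtitle Overflow on Narrow Canvases
To reduce cases where the preview looks fine but the final video overflows, the preview and render strategies were unified:
- Responsive scaling based on the composition width.
- Wrapping enabled: `white-space: normal` + `word-break` + `overflow-wrap`.
- Stroke, letter spacing, and top/bottom margins scale proportionally.
### 2.3 Opening Title Display Mode (Brief/Persistent)
- A dropdown at the end of the opening title row in the titles and subtitles panel offers `短暂显示` (brief) / `常驻显示` (persistent).
- The default mode is `短暂显示`, with a default duration of 4 seconds.
- The user's choice is persisted to localStorage and survives a refresh.
- The generate request adds `title_display_mode`; brief mode passes `title_duration=4.0`.
- Remotion supports the parameter end to end:
  - `short`: the title fades out after the set duration and stops rendering;
  - `persistent`: the title stays for the whole video (fade-in kept, no fade-out).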
---
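## 🎥 Orientation Normalization + Multi-Material Concat Stability: Phase 3 (Day 24)
### 3.1 MOV Rotation Metadata Causing Wrong Landscape/Portrait Detection
Scenario: the encoded resolution is landscape, but the video relies on rotation side-data to display correctly as portrait (common with phone MOV files).
Fix:
- `get_video_metadata()` now also returns `rotation/effective_width/effective_height`.
- New `normalize_orientation()` physically normalizes the orientation of materials with rotation metadata before the pipeline runs.
- After download, both single-material and multi-material flows normalize orientation first, then decide the resolution.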
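
The relationship between encoded size, rotation, and displayed size can be written down in a few lines. This is a sketch of the rule only; `effective_size` is a hypothetical helper, not the project's `get_video_metadata()`, and it assumes rotation is reported in whole degrees as a multiple of 90.

```python
from typing import Tuple


def effective_size(width: int, height: int, rotation: int) -> Tuple[int, int]:
    """Return the displayed (width, height) after applying rotation metadata."""
    if abs(rotation) % 180 == 90:   # 90, -90, 270: the axes are swapped on display
        return height, width
    return width, height            # 0, 180: same axes


print(effective_size(1920, 1080, 90))   # (1080, 1920) -> portrait
print(effective_size(1920, 1080, 0))    # (1920, 1080) -> landscape
```

Deciding the target resolution from the effective size rather than the encoded size is what keeps a phone recording from being classified as landscape.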
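### 3.2 Multi-Material "Only the First Segment Shows" and Boundary Freezes
Two kinds of safeguards were added for concat reliability:
- **Assignment guard**: when `custom_assignments` does not match the number of materials, the backend falls back to automatic assignment, so malformed input cannot leave only the first segment in effect.
- **Encoding consistency**:
  - all segments are re-encoded during preparation;
  - the concat stage no longer uses stream copy;
  - everything is further unified to `25fps + CFR`, and `+genpts` is added at concat, lowering the risk of the picture freezing while the lips keep moving because of timebase discontinuities at segment boundaries.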
---
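## ⏱️ Aligning Timeline Trim Semantics: Phase 4 (Day 24)
### Background
The timeline's intended semantics are:
- each segment can set `sourceStart/sourceEnd`;
- when the total duration exceeds the audio, only visible segments are kept and the last one is cut to match the audio;
- when the total duration falls short, the last visible segment loops to fill the gap.
Today the frontend and backend were aligned to these semantics.
### 4.1 `source_end` Wired Through End to End
Previously only `source_start` was sent, so the backend could not know exactly where the trim ends.
Changes:
- Frontend `toCustomAssignments()` adds an optional `source_end`.
- Backend `CustomAssignment` schema adds `source_end`.
- The workflow passes `source_end` through to `prepare_segment()` (both single- and multi-material).
- `prepare_segment()` adds a `source_end` parameter, computes the usable clip as `[source_start, source_end)`, and when looping is needed trims first and then loops, so the loop range cannot drift.
### 4.2 Timeline Effective Duration Fix
Fixes the wrong effective duration when `sourceStart > 0` and `sourceEnd = 0`:
- the old logic used the full material duration;
- the new logic uses `materialDuration - sourceStart`.
The fix applies to both:
- the segment duration calculation in `recalcPositions()`;
- the "loop fill" proportion visualized in TimelineEditor.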
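
The effective-duration rule in 4.2 is small enough to state as a function. The sketch below is illustrative Python rather than the frontend TypeScript; `effective_duration` is a hypothetical name, `source_end == 0` is read as "open end" as described above, and clamping an over-long `source_end` to the material length is an added assumption.

```python
def effective_duration(material_duration: float,
                       source_start: float = 0.0,
                       source_end: float = 0.0) -> float:
    """Length of the usable clip [source_start, source_end) within a material."""
    # source_end == 0 means "no explicit end": use the end of the material
    end = source_end if source_end > 0 else material_duration
    end = min(end, material_duration)       # assumption: never read past the material
    return max(0.0, end - source_start)


print(effective_duration(10.0, source_start=3.0))                  # 7.0 (old logic gave 10.0)
print(effective_duration(10.0, source_start=2.0, source_end=6.0))  # 4.0
```

Using the same function for layout and for the loop-fill visualization keeps the two views from disagreeing about how long a trimmed segment is.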
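### 4.3 Visible-Segment Assignment Priority Fix
Fixes the case where, with fewer visible segments than selected materials, `custom_assignments` was discarded and automatic assignment took over:
- the generate request now follows the `assignments` of the visible timeline segments first;
- materials that fall outside the timeline do not take part in this generation.
### 4.4 Single-Material Trim Trigger Completed
In single-material mode, changing only the end point (`sourceEnd > 0`) now also sends `custom_assignments`, so the trim takes effect.
---
## 🧭 Page Interaction and UX Details: Phase 5 (Day 24)
- The page scrolls back to the top after a refresh instead of restoring the previous scroll position.
- Material list and video history list scrolling gain a "skip the first auto-scroll" guard, reducing page jumps while state is restored.
- Redundant copy was removed from the timeline aspect-ratio area to keep it concise.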
---
## Files Changed
### Backend: Modified
| File | Change |
|------|------|
| `backend/app/repositories/users.py` | Adds `deactivate_user_if_expired()` and `_parse_expires_at()` |
| `backend/app/core/deps.py` | `get_current_user` / `get_current_user_optional` wired to the expiry check |
| `backend/app/modules/auth/router.py` | Expiry deactivation on login + `/api/auth/me` uses the unified auth dependency |
| `backend/app/modules/videos/schemas.py` | `CustomAssignment` adds `source_end`; keeps `output_aspect_ratio` |
| `backend/app/modules/videos/workflow.py` | Passes `source_end` through for multi-/single-material; multi-material prepare/concat unified to 25fps; title display mode passed to Remotion |
| `backend/app/services/video_service.py` | Rotation metadata parsing and orientation normalization; `prepare_segment` supports `source_end/target_fps`; concat forces CFR + `+genpts` |
| `backend/app/services/remotion_service.py` | render supports `title_display_mode/title_duration` and forwards them to render.ts |
### Frontend: Modified
| File | Change |
|------|------|
| `frontend/src/features/home/model/useTimelineEditor.ts` | `CustomAssignment` adds `source_end`; fixes the duration calculation for sourceStart with an open end |
| `frontend/src/features/home/model/useHomeController.ts` | Multi-material requests follow visible assignments; single-material trim trigger completed |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | Aspect ratio dropdown; loop proportion computed from the trimmed effective duration |
| `frontend/src/features/home/model/useHomePersistence.ts` | Persists `outputAspectRatio` and `titleDisplayMode` |
| `frontend/src/features/home/ui/HomePage.tsx` | Scrolls to top on page entry; ClipTrimmer/Timeline interactions kept consistent |
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | Title/subtitle style preview aligned with the final render strategy |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | Title row adds the brief/persistent dropdown |
### Remotion: Modified
| File | Change |
|------|------|
| `remotion/src/components/Title.tsx` | Responsive scaling and auto-wrapping for titles; adds brief/persistent display mode control |
| `remotion/src/components/Subtitles.tsx` | Responsive scaling and auto-wrapping for subtitles, reducing preview/final differences |
| `remotion/src/Video.tsx` | Passes the new `titleDisplayMode` through to the title component |
| `remotion/src/Root.tsx` | Default props add `titleDisplayMode='short'` and `titleDuration=4` |
| `remotion/render.ts` | CLI adds `--titleDisplayMode`; inputProps adds `titleDisplayMode` |
---
## Verification Log
- Backend syntax check: `python -m py_compile backend/app/modules/videos/schemas.py backend/app/modules/videos/workflow.py backend/app/services/video_service.py backend/app/services/remotion_service.py`
- Frontend type check: `npx tsc --noEmit`
- Frontend ESLint: `npx eslint src/features/home/model/useHomeController.ts src/features/home/model/useHomePersistence.ts src/features/home/ui/HomePage.tsx src/features/home/ui/TitleSubtitlePanel.tsx`
- Remotion render script build: `npm run build:render`

254
Docs/DevLogs/Day25.md Normal file

@@ -0,0 +1,254 @@
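## 🔧 Script Extraction Assistant Fix: Douyin Links Failed to Extract (Day 25)
### Overview
After pasting a Douyin link, the Script Extraction Assistant could not extract the script: yt-dlp failed with `Fresh cookies are needed`, and the manual fallback had also stopped working because Douyin's page structure changed. Today's work delivers a complete fix and removes the no-longer-needed `DOUYIN_COOKIE` setting.
---
## 🐛 Diagnosis
### Failure Chain
1. **yt-dlp fails**: `ERROR: [Douyin] Fresh cookies (not necessarily logged in) are needed`
   - yt-dlp version `2025.12.08` was outdated
   - The Douyin API `aweme/v1/web/aweme/detail/` requires signed cookies (`s_v_web_id`, etc.); even upgrading yt-dlp to the latest version and passing cookies did not help, which is a known yt-dlp issue
2. **Manual fallback fails**: `Could not find RENDER_DATA in page`
   - The old approach went through the desktop user profile page + `modal_id`; Douyin SSR no longer returns `videoDetail` data
3. **`DOUYIN_COOKIE` in `.env`**: timestamped December 2024, long expired
---
## ✅ Fix: Mobile Share Page + Automatic ttwid Retrieval
### Core Idea
Stop depending on yt-dlp to download Douyin videos and on manually maintained cookies. Instead:
1. Automatically fetch a fresh `ttwid` (an anonymous token, not bound to an account) from ByteDance's public API
2. Use the `ttwid` to request the mobile share page `m.douyin.com/share/video/{id}`
3. Extract the `play_addr` playback URL from the JSON embedded in the page and download it
### Key Code (`_download_douyin_manual` rewrite)
```python
# 1. Fetch a fresh ttwid
ttwid_resp = await client.post(
    "https://ttwid.bytedance.com/ttwid/union/register/",
    json={"region": "cn", "aid": 6383, "service": "www.douyin.com", ...}
)
ttwid = ttwid_resp.cookies.get("ttwid", "")
# 2. Request the mobile share page
page_resp = await client.get(
    f"https://m.douyin.com/share/video/{video_id}",
    headers={"cookie": f"ttwid={ttwid}", ...}
)
# 3. Extract play_addr
addr_match = re.search(r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"', page_text)
video_url = addr_match.group(2).replace(r"\u002F", "/")
```
### Advantages
- No more reliance on a manually maintained `DOUYIN_COOKIE`; the ttwid is fetched automatically on each request
- Independent of how well yt-dlp supports Douyin
- Works for all users; not bound to a specific account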
---
## 🧹 Removing the DOUYIN_COOKIE Setting
`DOUYIN_COOKIE` was used only for script extraction. The new approach does not need it, so it was removed from:
| File | Change |
|------|------|
| `backend/.env` | Removed the `DOUYIN_COOKIE` entry and its comment |
| `backend/app/core/config.py` | Removed the `DOUYIN_COOKIE: str = ""` field definition |
| `backend/app/modules/tools/service.py` | Removed the yt-dlp cookie-passing logic and the `_write_netscape_cookies` helper |
---
## 🔤 Frontend Copy Correction
In the script extraction UI, "AI 洗稿结果" was changed to "AI 改写结果" (AI rewrite result), replacing the term 洗稿 with the neutral 改写 (rewrite).
| File | Change |
|------|------|
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | `AI 洗稿结果` → `AI 改写结果` |
| `backend/app/modules/tools/service.py` | "洗稿" → "改写" in comments |
| `backend/app/services/glm_service.py` | "洗稿" → "改写文案" in the docstring |
---
## 📦 Other Changes
- **yt-dlp upgrade**: `2025.12.08` → `2026.2.21`
- **yt-dlp initialization fix**: options are now passed directly at construction, `YoutubeDL(ydl_opts)` (previously an empty initialization followed by a params update did not take effect)
- **User-Agent update**: in yt-dlp, `Chrome/91` → `Chrome/131`
---
## Files Changed
### Backend: Modified
| File | Change |
|------|------|
| `backend/app/modules/tools/service.py` | Rewrites `_download_douyin_manual` (mobile share page approach); fixes yt-dlp initialization; removes cookie-related code; comment wording |
| `backend/app/services/glm_service.py` | docstring "洗稿" → "改写文案" |
| `backend/app/core/config.py` | Removes the `DOUYIN_COOKIE` field |
| `backend/.env` | Removes the `DOUYIN_COOKIE` setting |
| `backend/requirements.txt` | yt-dlp version upgrade |
### Frontend: Modified
| File | Change |
|------|------|
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | "AI 洗稿结果" → "AI 改写结果" |
---
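## ✏️ AI Smart Rewrite: Custom Prompt
### Overview
The Script Extraction Assistant's "AI Smart Rewrite" used a hardcoded prompt, so users could not customize the rewriting style. This change adds a collapsible "custom prompt" area to the right of the checkbox: users can edit a custom prompt, which is persisted to localStorage, and the backend substitutes it for the default prompt when one is provided.
### Backend Changes
**Router layer** (`router.py`): `extract_script_tool` gains an optional Form parameter `custom_prompt: Optional[str] = Form(None)`, passed through to the service.
**Service layer** (`service.py`): the `extract_script()` signature gains `custom_prompt`, passed through to `glm_service.rewrite_script(script, custom_prompt)`.
**AI layer** (`glm_service.py`): `rewrite_script(self, text, custom_prompt=None)`; if `custom_prompt` has a value, the custom prompt is concatenated with the original text, otherwise the existing default prompt is kept.
```python
if custom_prompt and custom_prompt.strip():
    prompt = f"""{custom_prompt.strip()}
原始文案:
{text}"""
else:
    prompt = f"""请将以下视频文案进行改写。...(原有默认)"""
```
### Frontend Changes
**Hook** (`useScriptExtraction.ts`):
- Adds `customPrompt` / `showCustomPrompt` state
- Initial value restored from `localStorage.getItem("vigent_rewriteCustomPrompt")`
- Changes to `customPrompt` are saved to localStorage with a 300ms debounce
- In `handleExtract()`, when `doRewrite && customPrompt.trim()` is truthy, appends `formData.append("custom_prompt", ...)`
- customPrompt is not cleared when the modal resets (persisted preference)
**UI** (`ScriptExtractionModal.tsx`):
- A "自定义提示词 ▼" (custom prompt) button on the same row as the checkbox, to its right (shown only when `doRewrite` is on)
- Clicking it expands a textarea editor, with the hint "留空则使用默认提示词" (leave blank to use the default prompt) below
- Unchecking AI Smart Rewrite hides the custom prompt area automatically
### Files Changed
| File | Change |
|------|------|
| `backend/app/modules/tools/router.py` | Adds the `custom_prompt` Form parameter |
| `backend/app/modules/tools/service.py` | `extract_script()` passes `custom_prompt` through |
| `backend/app/services/glm_service.py` | `rewrite_script()` supports a custom prompt |
| `frontend/.../useScriptExtraction.ts` | New state, localStorage persistence, FormData parameter |
| `frontend/.../ScriptExtractionModal.tsx` | UI button + expandable textarea |
### Verification
- Backend: `python -m py_compile` passes for the three files
- Frontend: `npx tsc --noEmit` passes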
---
## 🐛 SSR Build Fix: localStorage is not defined
### Problem
`npm run build` failed with `ReferenceError: localStorage is not defined`: the `useState` initializer in `useScriptExtraction.ts` also runs in the SSR (Node.js) environment, where `localStorage` does not exist.
### Fix
Guard the `useState` initializer with `typeof window !== "undefined"`:
```typescript
const [customPrompt, setCustomPrompt] = useState(
  () => typeof window !== "undefined" ? localStorage.getItem(CUSTOM_PROMPT_KEY) || "" : ""
);
```
| File | Change |
|------|------|
| `frontend/.../useScriptExtraction.ts` | `useState` initializer gains an SSR safety guard |
---
## 🎬 Opening Secondary Title
### Overview
Adds an opening secondary title (secondary_title) displayed below the main title, for supplementary explanation or a teaser. The secondary title has its own style settings (font, size, color, etc.), can be generated by AI together with the title, is limited to 20 characters, and appears only in the video frame; it is not part of the publish title.
Naming convention: backend `secondary_title` (snake_case), frontend `videoSecondaryTitle` (camelCase), UI label "片头副标题".
---
### Backend Changes
| File | Change |
|------|------|
| `backend/app/modules/videos/schemas.py` | `GenerateRequest` adds 4 optional fields: `secondary_title`, `secondary_title_style_id`, `secondary_title_font_size`, `secondary_title_top_margin` |
| `backend/app/services/glm_service.py` | AI prompt adds a secondary title requirement (at most 20 characters); JSON format adds a `secondary_title` field |
| `backend/app/modules/ai/router.py` | `GenerateMetaResponse` adds `secondary_title: str = ""`; the endpoint returns `result.get("secondary_title", "")` |
| `backend/app/modules/videos/workflow.py` | `use_remotion` condition adds `or req.secondary_title`; secondary title style resolution reuses `get_style("title", ...)`; font size/margin overrides; `prepare_style_for_remotion` handles the secondary title font; `remotion_service.render()` receives `secondary_title` + `secondary_title_style` |
| `backend/app/services/remotion_service.py` | `render()` adds `secondary_title` and `secondary_title_style` parameters and builds the CLI arguments `--secondaryTitle` and `--secondaryTitleStyle` |
### Remotion Changes
| File | Change |
|------|------|
| `remotion/render.ts` | `RenderOptions` adds `secondaryTitle?` + `secondaryTitleStyle?`; `parseArgs()` adds switch cases; `inputProps` adds both fields |
| `remotion/src/components/Title.tsx` | `TitleProps` adds `secondaryTitle?` and `secondaryTitleStyle?`; `AbsoluteFill` switches to `flexDirection: 'column'` for vertical stacking; a secondary title `<h2>` follows the main title `<h1>` with its own style (default font size 48px, weight 700); both share the fade-in/fade-out animation; the secondary title font uses a separate `@font-face` (`SecondaryTitleFont`) to avoid clashing with the main title |
| `remotion/src/Video.tsx` | `VideoProps` adds `secondaryTitle?` + `secondaryTitleStyle?`; both are passed to the `<Title>` component; the render condition becomes `{(title \|\| secondaryTitle) && ...}` |
| `remotion/src/Root.tsx` | `defaultProps` adds `secondaryTitle: undefined` + `secondaryTitleStyle: undefined` |
### Frontend Changes
| File | Change |
|------|------|
| `frontend/src/shared/lib/title.ts` | Adds `SECONDARY_TITLE_MAX_LENGTH = 20` and `clampSecondaryTitle()` |
| `frontend/src/features/home/model/useHomeController.ts` | Adds state `videoSecondaryTitle`, `selectedSecondaryTitleStyleId`, `secondaryTitleFontSize`, `secondaryTitleTopMargin`, `secondaryTitleSizeLocked`; creates `secondaryTitleInput = useTitleInput({ maxLength: 20 })` (not synced to the publish page); `handleGenerateMeta()` receives and fills `secondary_title`; `handleGenerate()` adds the secondary title fields to the payload; the return value exposes all new state |
| `frontend/src/features/home/model/useHomePersistence.ts` | Adds localStorage keys `secondaryTitle`, `secondaryTitleStyle`, `secondaryTitleFontSize`, `secondaryTitleTopMargin`, with matching restore and save effects |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | Props add secondary-title fields; a secondary title input (max 20 characters) below the main title input; secondary title style selector (reusing the titleStyles presets), font size slider (30-100px), margin slider (0-100px) |
| `frontend/src/features/home/ui/FloatingStylePreview.tsx` | Title preview switches to a flex column layout; a secondary title preview row below the main title, rendered with its own style |
| `frontend/src/features/home/ui/HomePage.tsx` | Destructures the new state from `useHomeController` and passes it to `TitleSubtitlePanel` |
---
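## 🐛 Reference Audio Upload: InvalidKey Fix for Chinese Filenames
### Problem
Uploading a reference audio file with a Chinese name (such as "我的声音.wav") made Supabase Storage return `InvalidKey`, because the storage path used the original Chinese filename directly.
### Fix
`ref_audios/service.py` adds a `sanitize_filename()` function that reduces the filename in the storage path to ASCII-safe characters (only `A-Za-z0-9._-`):
- NFKD normalization → drop non-ASCII → replace illegal characters with `_`
- If a purely Chinese/emoji name is empty after cleaning, fall back to an MD5 hash (for example `audio_e924b1193007`)
- Filename capped at 50 characters
- The original Chinese filename is kept in metadata as the display name, so the frontend display is unaffected
```
Before: cbbe.../1771915755_我的声音.wav → InvalidKey
After:  cbbe.../1771915755_audio_xxxxxxxx.wav → upload succeeds
```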
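
The cleaning steps listed above can be sketched as follows. This is a reconstruction from the description, not the project's actual `sanitize_filename()`: hashing the original filename, taking the first 12 hex digits, and handling the extension separately are assumptions.

```python
import hashlib
import re
import unicodedata
from pathlib import PurePosixPath

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def _to_ascii(text: str) -> str:
    """NFKD-normalize, drop non-ASCII characters, replace anything unsafe with '_'."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return _UNSAFE.sub("_", ascii_only)


def sanitize_filename(filename: str, max_length: int = 50) -> str:
    """Return an ASCII-safe filename, falling back to an MD5-based name."""
    path = PurePosixPath(filename)
    stem = _to_ascii(path.stem).strip("._")
    suffix = _to_ascii(path.suffix)
    if not stem:
        # Nothing survived cleaning (e.g. pure Chinese or emoji): use a hash instead
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()[:12]
        stem = f"audio_{digest}"
    return stem[: max_length - len(suffix)] + suffix


print(sanitize_filename("my voice.wav"))   # my_voice.wav
print(sanitize_filename("café.mp3"))       # cafe.mp3
print(sanitize_filename("我的声音.wav"))     # audio_<12 hex digits>.wav
```

Truncating the stem rather than the whole name keeps the extension intact when a long filename hits the length cap.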
| File | Change |
|------|------|
| `backend/app/modules/ref_audios/service.py` | Adds the `sanitize_filename()` function; the upload path uses the cleaned filename |

239
Docs/DevLogs/Day26.md Normal file

@@ -0,0 +1,239 @@
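## 🎨 Frontend Polish: Section Merging + Numbered Headings + UI Refinement (Day 26)
### Overview
The home page had 9 separate sections (7 in the left column + 2 in the right), each with its own card container and heading, which fragmented the layout. This change merges related sections into 5 main sections, adds Chinese ordinal numbers (一 to 十), removes emoji icons, and refines the layout and interaction details of several child components.
---
## ✅ Changes
### 1. Section Merge Plan
**Left column (4 main sections + 2 standalone areas):**
| No. | Section | Sub-sections | Original component |
|------|--------|--------|--------|
| 一 | Script extraction and editing | - | ScriptEditor |
| 二 | Titles and subtitles | - | TitleSubtitlePanel |
| 三 | Voiceover | Voiceover mode / Voiceover list | VoiceSelector + GeneratedAudiosPanel |
| 四 | Material editing | Video materials / Timeline editing | MaterialSelector + TimelineEditor |
| 五 | Background music | - | BgmPanel |
| - | Generate button | - | GenerateActionBar (unnumbered) |
**Right column (1 main section):**
| No. | Section | Sub-sections | Original component |
|------|--------|--------|--------|
| 六 | Works | Works list / Works preview | HistoryList + PreviewPanel |
**Publish page (/publish):**
| No. | Section |
|------|--------|
| 七 | Platform accounts |
| 八 | Select work to publish |
| 九 | Publish info |
| 十 | Select publish platforms |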
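### 2. embedded Mode
Six components gain an `embedded?: boolean` prop (default `false`):
- `VoiceSelector`: when embedded, the outer card and main heading are not rendered
- `GeneratedAudiosPanel`: when embedded, a two-row layout: row 1 (speech rate + Generate Voiceover, right-aligned), row 2 (voiceover list + refresh)
- `MaterialSelector`: when embedded, renders its own h3 sub-heading "视频素材" with the upload/refresh buttons on the same row
- `TimelineEditor`: when embedded, renders its own h3 sub-heading "时间轴编辑" with the aspect ratio/playback controls on the same row
- `PreviewPanel`: when embedded, the outer card and heading are not rendered
- `HistoryList`: when embedded, the outer card and heading are not rendered (the refresh button is provided by HomePage)
### 3. Numbered Headings + Emoji Removal
All numbered sections drop their emoji icons and use plain Chinese ordinals:
- ScriptEditor: `✍️ 文案提取与编辑` → `一、文案提取与编辑`
- TitleSubtitlePanel: `🎬 标题与字幕` → `二、标题与字幕`
- BgmPanel: `🎵 背景音乐` → `五、背景音乐`
- HomePage right column: `五、作品` → `六、作品`
- PublishPage: `👤 平台账号` → `七、平台账号`, `📹 选择发布作品` → `八、选择发布作品`, `✍️ 发布信息` → `九、发布信息`, `📱 选择发布平台` → `十、选择发布平台`
### 4. Sub-heading and Divider Styles
- **Main heading**: `text-base sm:text-lg font-semibold text-white`
- **Sub-heading**: `text-sm font-medium text-gray-400`
- **Divider**: `<div className="border-t border-white/10 my-4" />`
### 5. Voiceover List Layout
In embedded mode, GeneratedAudiosPanel uses a two-row layout:
- **Row 1**: speech rate dropdown + Generate Voiceover button (right-aligned, `flex justify-end`)
- **Row 2**: `<h3>配音列表</h3>` + refresh button (justified to both ends)
- Non-embedded mode keeps the original single-row layout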
- 标题样式/副标题样式/字幕样式三行标签统一 `w-20`(固定 80px确保下拉菜单垂直对齐
- 下拉菜单宽度 `w-1/3 min-w-[100px]`,避免过宽
### 7. RefAudioPanel 文案简化
- 原底部段落"上传任意语音样本3-10秒…" 移至 "我的参考音频" 标题旁,简化为 `(上传3-10秒语音样本)`
### 8. 账户下拉菜单添加手机号
- AccountSettingsDropdown 在账户有效期上方新增手机号显示区域
- 显示 `user?.phone || '未知账户'`
### 9. 标题显示模式对副标题生效
- **payload 修复**: `useHomeController.ts``title_display_mode` 的发送条件从 `videoTitle.trim()` 改为 `videoTitle.trim() || videoSecondaryTitle.trim()`,确保仅有副标题时也能发送显示模式
- **UI 调整**: 短暂显示/常驻显示下拉从片头标题输入行移至"二、标题与字幕"板块标题行(与预览样式按钮同行),明确表示该设置对标题和副标题同时生效
- Remotion 端 `Title.tsx` 已支持(标题和副标题作为整体组件渲染,`displayMode` 统一控制)
### 10. 时间轴模糊遮罩
遮罩从外层 wrapper 移入"四、素材编辑"卡片内,仅覆盖时间轴子区域(`rounded-xl`)。
### 11. 登录后用户信息立即可用
- AuthContext 新增 `setUser` 方法暴露给消费者
- 登录页成功后调用 `setUser(result.user)` 立即写入 Context无需等页面刷新
- 修复登录后账户下拉显示"未知账户"、刷新后才显示手机号的问题
### 12. 文案与选项微调
- MaterialSelector 描述 `(可多选最多4个)``(上传自拍视频最多可选4个)`
- TitleSubtitlePanel 显示模式选项 `短暂显示/常驻显示``标题短暂显示/标题常驻显示`
### 13. UI/UX 体验优化6 项)
- **操作按钮移动端可见**: 配音列表、作品列表、素材列表、参考音频、历史文案的操作按钮从 `opacity-0`hover 才显示)改为 `opacity-40`平时半透明可见hover 全亮),解决触屏设备无法发现按钮的问题
- **手机号脱敏**: AccountSettingsDropdown 手机号中间四位遮掩 `138****5678`
- **标题字数计数器**: TitleSubtitlePanel 标题/副标题输入框右侧显示实时字数 `3/15`,超限变红
- **列表滚动条提示**: ~~配音列表、作品列表、素材列表、BGM 列表从 `hide-scrollbar` 改为 `custom-scrollbar`~~ → 已全部改回 `hide-scrollbar` 隐藏滚动条(滚动功能不变)
- **时间轴拖拽提示**: TimelineEditor 色块左上角新增 `GripVertical` 抓手图标,暗示可拖拽排序
- **截取滑块放大**: ClipTrimmer 手柄从 16px 放大到 20px触控区从 32px 放大到 40px
### 14. 代码质量修复4 项)
- **AccountSettingsDropdown**: 关闭密码弹窗补齐 `setSuccess('')` 清空
- **MaterialSelector**: `selectedSet``useMemo` 避免每次渲染重建
- **TimelineEditor**: `visibleSegments`/`overflowSegments``useMemo`
- **MaterialSelector**: 素材满 4 个时非选中项按钮加 `disabled`
### 15. 发布页平台账号响应式布局
- **单行布局**:图标+名称+状态在左,按钮在右(`flex items-center`
- **移动端紧凑**:图标 `h-6 w-6`、按钮 `text-xs px-2 py-1 rounded-md`、间距 `space-y-2 px-3 py-2.5`
- **桌面端宽松**`sm:h-7 sm:w-7``sm:text-sm sm:px-3 sm:py-1.5 sm:rounded-lg``sm:space-y-3 sm:px-4 sm:py-3.5`
- 两端各自美观,风格与其他板块一致
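### 16. Mobile Refresh Scroll-to-Top Fix
- **Problem**: on mobile, refreshing the page did not return to the top but scrolled to the background music section
- **Root cause**: 1) the browser's native scroll restoration overrode `scrollTo(0,0)`; 2) the list scroll effect had two dependencies (`selectedId` + `list`), and when data loaded asynchronously the second trigger bypassed the ref guard and ran `scrollIntoView`, making the page jump
- **Fix**: three measures: ① `history.scrollRestoration = "manual"` disables native restoration; ② a time gate, the `scrollEffectsEnabled` ref (all list auto-scrolling is blocked for the first second), replaces the one-shot ref guard; ③ a fallback `scrollTo(0,0)` after a 200ms delay
### 17. Smaller Style Preview Window on Mobile
- **Problem**: on mobile, tapping "预览样式" (preview style) opened a window that filled the screen (358px wide, about 636px tall) and covered the style controls
- **Fix**: the mobile width shrinks from `window.innerWidth - 32` to **160px**; the position moves from the top-left to the **bottom-right** (`right:12, bottom:12`), clear of the controls above; max height is limited to `50dvh`
- Desktop is unchanged (280px, top-left)
### 18. List Scrollbars Hidden Everywhere
- All 7 places switched to `custom-scrollbar` (thin purple scrollbar) earlier on Day 26 are reverted to `hide-scrollbar`
- Affected: BgmPanel, GeneratedAudiosPanel, HistoryList, MaterialSelector (2 places), ScriptExtractionModal (2 places)
- Scrolling is unaffected; the scrollbar is simply not shown
### 19. Voiceover Buttons Adapted for Mobile
- VoiceSelector "选择声音/克隆声音" (select voice / clone voice) buttons: padding `px-4` → `px-2 sm:px-4`, font size gains `text-sm sm:text-base`, icons gain `shrink-0`
- Fixes buttons being squeezed on narrow mobile screens so that "克隆声音" was not visible
### 20. Material Heading Overflow Fix
- The MaterialSelector embedded heading row drops `whitespace-nowrap`
- The description `(上传自拍视频最多可选4个)` is hidden on mobile (`hidden sm:inline`) and shown normally on desktop
- Fixes the refresh button being pushed outside the container on mobile
### 21. Larger Generate Voiceover Button
- As a core action, "生成配音" (Generate Voiceover) moves from secondary to primary button sizing
- Padding `px-2/px-3 py-1/py-1.5` → `px-4 py-2`; font `text-xs` → `text-sm font-medium`
- Icon `h-3.5 w-3.5` → `h-4 w-4`; adds `shadow-sm` + hover `shadow-md`
- Applied to both embedded and non-embedded modes
### 22. Generation Progress Bar Relocated
- **Problem**: the progress bar sat inside the "六、作品" card (below the works preview) and was not prominent enough
- **Fix**: the progress bar is lifted out of PreviewPanel into the HomePage right column and rendered as a separate card **above** the "六、作品" card
- A purple border (`border-purple-500/30`) sets it apart; it shows the task message and percentage
- In embedded mode PreviewPanel no longer renders the progress bar (`currentTask={null}` is passed)
- The progress card disappears automatically once generation finishes
### 23. LatentSync Timeout Fix
- **Problem**: a roughly 2-minute video (3023 frames, 190 inference segments) needed an estimated 54 minutes of inference, but the httpx timeout was only 20 minutes, so the LatentSync call failed and fell back to no lip sync
- **Root cause**: `httpx.AsyncClient(timeout=1200.0)` in `lipsync_service.py` could not cover long-video inference time
- **Fix**: the timeout is raised from `1200s` (20 minutes) to `3600s` (1 hour) to cover inference for 2-3 minute videos
### 24. Subtitle Timestamp Rhythm Mapping (Fixes Subtitle Drift in Long Videos)
- **Problem**: subtitles in a 2-minute video were clearly out of sync with the speech, with the error growing toward the end
- **Root cause**: the `original_text` path in `whisper_service.py` discarded Whisper's per-word timestamps, kept only the overall time range, and interpolated linearly across it, giving every character the same duration and ignoring speech rate changes and pauses
- **Fix**: keep Whisper's per-character timestamps as a rhythm template and map the original text's characters proportionally onto that rhythm (rhythm mapping) instead of dividing the time evenly. The subtitle text is unchanged; only the timestamps follow the real speech rate
- **Algorithm**: the i-th character of the original text maps to position `(i/N)*M` on the Whisper timeline (N = number of original characters, M = number of Whisper characters), with linear interpolation between adjacent Whisper time points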
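
The mapping described in the algorithm bullet can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `whisper_service.py`: the function name `map_text_to_rhythm`, the `(start, end)` tuple format, and the choice to use each Whisper character's start time (plus the final end time) as the interpolation points are all assumptions.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds for one Whisper character


def map_text_to_rhythm(original_text: str,
                       whisper_spans: List[Span]) -> List[Tuple[str, float, float]]:
    """Assign each original character a time span that follows Whisper's rhythm."""
    n = len(original_text)
    m = len(whisper_spans)
    if n == 0 or m == 0:
        return []
    # Time points 0..M: the start of every Whisper character, then the final end.
    points = [start for start, _ in whisper_spans] + [whisper_spans[-1][1]]

    def time_at(position: float) -> float:
        """Linearly interpolate the timeline at a fractional position in [0, M]."""
        index = min(int(position), m - 1)
        fraction = position - index
        return points[index] + fraction * (points[index + 1] - points[index])

    return [
        (char, time_at(i * m / n), time_at((i + 1) * m / n))
        for i, char in enumerate(original_text)
    ]


# Whisper heard 2 characters: a fast one (1s) and a slow one (2s).
rhythm = [(0.0, 1.0), (1.0, 3.0)]
for item in map_text_to_rhythm("abcd", rhythm):
    print(item)
# ('a', 0.0, 0.5), ('b', 0.5, 1.0), ('c', 1.0, 2.0), ('d', 2.0, 3.0)
```

Even linear division would give each of the four characters 0.75s; here the last two are stretched because the second half of the audio was spoken more slowly. When N equals M, every original character simply inherits the timing of the Whisper character at the same index.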
---
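## 📁 Modified Files
| File | Change |
|------|------|
| `VoiceSelector.tsx` | Adds embedded prop; mobile button adaptation (`px-2 sm:px-4`) |
| `GeneratedAudiosPanel.tsx` | Adds embedded prop; two-row layout; action button visibility; larger Generate Voiceover button |
| `MaterialSelector.tsx` | Adds embedded prop; renders its own sub-heading + action buttons; useMemo; disabled guard; action button visibility; heading overflow fix |
| `TimelineEditor.tsx` | Adds embedded prop; renders its own sub-heading + controls; useMemo; drag grip icon |
| `PreviewPanel.tsx` | Adds embedded prop |
| `HistoryList.tsx` | Adds embedded prop; action button visibility |
| `ScriptEditor.tsx` | Numbered heading; emoji removed; action button visibility |
| `TitleSubtitlePanel.tsx` | Numbered heading; emoji removed; dropdown alignment; display mode dropdown moved up; character counter |
| `BgmPanel.tsx` | Numbered heading |
| `HomePage.tsx` | Core refactor: merged sections, numbered headings, Generate Voiceover button moved in, `scrollRestoration` + delayed fallback for scroll-to-top on refresh, progress bar lifted above the works card |
| `PublishPage.tsx` | Four sections numbered (七 to 十); emoji removed; responsive single-row platform cards |
| `RefAudioPanel.tsx` | Simplified hint copy; action button visibility |
| `AccountSettingsDropdown.tsx` | Adds phone number display (masked); clears success state on close |
| `AuthContext.tsx` | Adds the `setUser` method; user state updated immediately after login |
| `login/page.tsx` | Calls `setUser` after a successful login to store user data |
| `useHomeController.ts` | titleDisplayMode condition fix; list scroll time gate `scrollEffectsEnabled` |
| `FloatingStylePreview.tsx` | Mobile preview window shrunk (160px) and moved to the bottom-right |
| `ScriptExtractionModal.tsx` | Scrollbars reverted to hidden |
| `ClipTrimmer.tsx` | Larger slider handles; taller touch targets |
| `lipsync_service.py` | httpx timeout changed from 1200s to 3600s |
| `whisper_service.py` | Subtitle timestamps changed from linear interpolation to Whisper rhythm mapping |
---
## 🔍 Verification
- `npm run build`: zero errors, zero warnings
- Merged layout: sub-sections clearly separated, main headings numbered
- Backward compatibility: `embedded` defaults to `false`; standalone component use is unaffected
- Voiceover list two-row layout: speech rate + Generate Voiceover on top, voiceover list + refresh below
- Dropdowns align vertically
- Brief/persistent display applies to both the title and the secondary title
- Action buttons are visible on mobile (touch screens)
- Phone number is masked
- Title character counter works
- List scrollbars are all hidden
- Timeline drag grip icon is shown
- Publish page platform cards: compact on mobile, roomier on desktop, consistent style
- Mobile refresh returns to the top and no longer scrolls to the background music section
- Mobile style preview window does not cover the controls
- Mobile voiceover buttons (select voice / clone voice) are both visible
- Buttons in the mobile material heading row do not overflow
- Generate Voiceover button has a higher visual weight than secondary buttons
- Generation progress bar renders separately above the works card
- LatentSync long-video inference no longer times out and falls back
- Subtitle timestamps follow the speech rhythm; long videos no longer drift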

231
Docs/DevLogs/Day27.md Normal file

@@ -0,0 +1,231 @@
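## Remotion Stroke Fix + More Font Styles + TypeScript Fixes (Day 27)
### Overview
Fixes title/subtitle stroke rendering (strokes too thick + ghosting on the secondary title), expands the font style options (titles 4→12, subtitles 4→8), and fixes TypeScript type errors in the Remotion project.
---
## ✅ Changes
### 1. Stroke Rendering Fix (Titles + Subtitles)
- **Problem**: the black title stroke was too thick, and the secondary title showed ghosting
- **Root cause**: `buildTextShadow` simulated the stroke with a 4-direction `textShadow`; the diagonal offsets stacked up and made the stroke look thicker than the actual `stroke_size`, and the four corner offsets left gaps and overlaps in between, which produced the ghosting
- **Fix**: switch to the native CSS stroke, `-webkit-text-stroke` + `paint-order: stroke fill` (Remotion renders with Chromium, which supports both)
- **Old approach**:
```javascript
textShadow: `-8px -8px 0 #000, 8px -8px 0 #000, -8px 8px 0 #000, 8px 8px 0 #000, 0 0 16px rgba(0,0,0,0.5), 0 2px 4px rgba(0,0,0,0.3)`
```
- **New approach**:
```javascript
WebkitTextStroke: `5px #000000`,
paintOrder: 'stroke fill',
textShadow: `0 2px 4px rgba(0,0,0,0.3)`,
```
- All preset styles also lower `stroke_size` from 8 to 5, which looks cleaner with the native stroke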
### 2. Font Style Expansion
**Title styles**: 4 → 12 (+8)
| ID | Style name | Font | Colors |
|----|--------|------|------|
| title_pangmen | 庞门正道 | 庞门正道标题体3.0 | White text, black stroke |
| title_round | 优设标题圆 | 优设标题圆 | White text, purple stroke |
| title_alibaba | 阿里数黑体 | 阿里巴巴数黑体 | White text, black stroke |
| title_chaohei | 文道潮黑 | 文道潮黑 | Cyan-blue text, dark blue stroke |
| title_wujie | 无界黑 | 标小智无界黑 | White text, dark gray stroke |
| title_houdi | 厚底黑 | Aa厚底黑 | Red text, deep black stroke |
| title_banyuan | 寒蝉半圆体 | 寒蝉半圆体 | White text, black stroke |
| title_jixiang | 欣意吉祥宋 | 字体圈欣意吉祥宋 | Gold text, brown stroke |
**Subtitle styles**: 4 → 8 (+4)
| ID | Style name | Font | Highlight color |
|----|--------|------|--------|
| subtitle_pink | 少女粉 | DingTalk JinBuTi | Pink #FF69B4 |
| subtitle_lime | 清新绿 | DingTalk Sans | Neon green #76FF03 |
| subtitle_gold | 金色隶书 | 阿里妈妈刀隶体 | Gold #FDE68A |
| subtitle_kai | 楷体红字 | SimKai | Red #FF4444 |
### 3. TypeScript Type Error Fixes
- **Root.tsx**: the `Composition` generic type did not match the `calculateMetadata` parameter type; `calculateMetadata` is now inlined with explicit parameter types, and `defaultProps` is constrained with `satisfies VideoProps`
- **Video.tsx**: the `VideoProps` interface adds a `[key: string]: unknown` index signature to satisfy the `Record<string, unknown>` constraint that Remotion requires
- **VideoLayer.tsx**: the `OffthreadVideo` component does not support the `loop` prop, so it was removed (the prop was being ignored anyway)
### 4. Progress Bar Copy Reverted
- **Problem**: the progress bar showed the detailed stage messages pushed by the backend (such as "正在合成唇型"); the desired behavior is to show only "正在AI生成中..."
- **Fix**: in `HomePage.tsx`, the progress bar text changes from `{currentTask.message || "正在AI生成中..."}` to the fixed string `正在AI生成中...`
---
## 📁 Modified Files
| File | Change |
|------|------|
| `remotion/src/components/Title.tsx` | `buildTextShadow` → `buildStrokeStyle` (native CSS stroke), applied to both title and secondary title |
| `remotion/src/components/Subtitles.tsx` | `buildTextShadow` → `buildStrokeStyle` (native CSS stroke) |
| `remotion/src/Root.tsx` | Fixes the `Composition` generic type and the `calculateMetadata` parameter type |
| `remotion/src/Video.tsx` | `VideoProps` adds an index signature |
| `remotion/src/components/VideoLayer.tsx` | Removes the `loop` prop that `OffthreadVideo` does not support |
| `backend/assets/styles/title.json` | Title styles expanded from 4 to 12; `stroke_size` 8→5 |
| `backend/assets/styles/subtitle.json` | Subtitle styles expanded from 4 to 8 |
| `frontend/.../HomePage.tsx` | Progress bar text reverted to the fixed "正在AI生成中..." |
---
## 🔍 Verification
- `npx tsc --noEmit`: zero errors
- `npm run build:render`: render script compiles
- `npm run build` (frontend): zero errors
- Stroke: titles, secondary titles, and subtitles use the native CSS stroke, with no ghosting or bloated outlines
- Style selection: the frontend dropdowns load all 12 title + 8 subtitle styles
---
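## Video generation pipeline performance optimization
### Overview
A broad performance pass over the video generation pipeline, covering FFmpeg encoding parameters, LatentSync inference parameters, multi-material parallelization, and post-processing parallelization. Estimated effect: a 15s single-material video drops from ~280s to ~190s (32%), and a 30s dual-material video drops from ~400s to ~240s (40%).
**Server configuration**: 2x RTX 3090 (24GB), 2x Xeon E5-2680 v4 (56 logical cores), 192GB RAM
### Stage 1: FFmpeg encoding optimization
**Final composition preset `slow` → `medium`**
- The composition stage drops from ~50s to ~25s, with almost no change in quality
**Intermediate files CRF 18 → 23**
- Intermediate artifacts (trim, prepare_segment, concat, loop, normalize_orientation) are not the final output and do not need high-quality encoding
- Each intermediate step is 3-8 seconds faster
**Final composition CRF 18 → 20**
- For a 15-second talking-head video, CRF 18 and CRF 20 are visually indistinguishable
### Stage 2: LatentSync inference parameter tuning
**inference_steps 20 → 16**
- Inference time drops linearly by 20% (~180s → ~144s)
**guidance_scale 2.0 → 1.5**
- Lowers the classifier-free guidance weight; per-step compute drops slightly (5-10%)
> ⚠️ Both changes require restarting the LatentSync service and then testing lip-sync quality; keep them only if the quality is acceptable. If quality is poor, revert the .env parameters.
### Stage 3: Multi-material pipeline parallelization
**Material download + normalization in parallel**
- The serial `for` loop becomes `asyncio.gather()`, with `normalize_orientation` run in the thread pool through `run_in_executor`
- N materials: from serial N×5s to ~5s
**Segment preprocessing in parallel**
- Per-segment `prepare_segment` calls become `asyncio.gather()` + `run_in_executor`
- 2 materials: ~90s → ~50s; 4 materials: ~180s → ~60s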
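The `asyncio.gather()` + `run_in_executor` pattern described in this stage can be sketched as follows. This is a minimal illustration, not the project's `workflow.py`: the `normalize_orientation` body here is a hypothetical stand-in that only sleeps, and `normalize_all` and `run_demo` are names invented for the example.

```python
import asyncio
import time


def normalize_orientation(path: str) -> str:
    """Hypothetical stand-in for a blocking FFmpeg call (here it only sleeps)."""
    time.sleep(0.2)
    return f"{path}.normalized"


async def normalize_all(paths: list[str]) -> list[str]:
    """Run the blocking step for every material concurrently in the default thread pool."""
    loop = asyncio.get_running_loop()
    tasks = [loop.run_in_executor(None, normalize_orientation, p) for p in paths]
    # gather() preserves input order, so results line up with `paths`.
    return list(await asyncio.gather(*tasks))


def run_demo() -> tuple[list[str], float]:
    """Process three materials and report the wall-clock time taken."""
    paths = ["a.mp4", "b.mp4", "c.mp4"]
    start = time.perf_counter()
    results = asyncio.run(normalize_all(paths))
    return results, time.perf_counter() - start
```

With three materials the wall-clock time stays close to one call's duration rather than three, which is the N×5s → ~5s effect claimed above. The thread pool helps here because the real work happens in an FFmpeg subprocess, not in Python bytecode.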
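### Stage 4: Pipeline overlap
**Whisper subtitle alignment and BGM mixing run in parallel**
- The two are independent (both depend only on audio_path), so they run in parallel with `asyncio.gather()`
- In single-material mode, Whisper moves from a serial step after LatentSync to running in parallel with BGM
- Behavior is unchanged when BGM or subtitles are disabled; parallel execution happens only when both are enabled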
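A rough sketch of the "parallel only when both are enabled" rule follows. The coroutines `align_subtitles` and `mix_bgm` are hypothetical placeholders for the real Whisper and BGM steps, and `post_process` is an illustrative name rather than a function from the codebase.

```python
import asyncio
from typing import Optional


async def align_subtitles(audio_path: str) -> dict:
    """Placeholder for Whisper alignment; depends only on the audio file."""
    await asyncio.sleep(0.01)
    return {"source": audio_path, "words": []}


async def mix_bgm(audio_path: str) -> str:
    """Placeholder for BGM mixing; depends only on the audio file."""
    await asyncio.sleep(0.01)
    return f"{audio_path}.bgm.wav"


async def post_process(
    audio_path: str, enable_subtitles: bool, enable_bgm: bool
) -> tuple[Optional[dict], Optional[str]]:
    if enable_subtitles and enable_bgm:
        # Both steps are enabled and independent, so overlap them.
        captions, mixed = await asyncio.gather(
            align_subtitles(audio_path), mix_bgm(audio_path)
        )
        return captions, mixed
    # Otherwise keep the original serial behaviour for whichever step is on.
    captions = (await align_subtitles(audio_path)) if enable_subtitles else None
    mixed = (await mix_bgm(audio_path)) if enable_bgm else None
    return captions, mixed
```

Keeping the serial branch separate means runs with only one feature enabled take exactly the same code path as before the change.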
### Modified files
| File | Change |
|------|------|
| `backend/app/services/video_service.py` | compose: preset slow→medium, CRF 18→20; normalize_orientation/prepare_segment/concat: CRF 18→23 |
| `backend/app/services/lipsync_service.py` | _loop_video_to_duration: CRF 18→23 |
| `backend/.env` | LATENTSYNC_INFERENCE_STEPS=16, LATENTSYNC_GUIDANCE_SCALE=1.5 |
| `backend/app/modules/videos/workflow.py` | import asyncio; parallel material download/normalization; parallel segment preprocessing; Whisper + BGM in parallel |
### Rollback plan
- FFmpeg parameters: if image quality is unsatisfactory, set the final CRF back to 18 and the preset back to slow
- LatentSync: if lip-sync quality degrades, set `LATENTSYNC_INFERENCE_STEPS` back to 20 and `LATENTSYNC_GUIDANCE_SCALE` back to 2.0 in .env
- Parallelization: a purely architectural optimization with no quality impact; no rollback needed
---
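## MuseTalk + LatentSync hybrid lip-sync scheme
### Overview
LatentSync 1.6 produces high quality but is extremely slow at inference (~78% of total time); for long videos (>=2min) a runtime of 20-60 minutes is unacceptable. MuseTalk 1.5 is a single-step latent-space inpainting model (not a diffusion model) that infers frame by frame at near real-time speed (30fps+ on V100), which suits long videos. The hybrid scheme routes automatically by audio duration: short videos use LatentSync for quality, long videos use MuseTalk for speed.
### Architecture
- **Routing threshold**: `LIPSYNC_DURATION_THRESHOLD` (default 120s)
- **Short videos (<120s)**: LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=120s)**: MuseTalk 1.5 (GPU0, port 8011)
- **Fallback**: automatically falls back to LatentSync when MuseTalk is unavailable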
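The routing rule above reduces to a small decision function. The sketch below is an assumption about its shape, not the code in `lipsync_service.py`; `choose_engine` and the `musetalk_healthy` flag are hypothetical names.

```python
LIPSYNC_DURATION_THRESHOLD = 120.0  # seconds; mirrors the documented default


def choose_engine(
    audio_duration_s: float,
    musetalk_healthy: bool,
    threshold_s: float = LIPSYNC_DURATION_THRESHOLD,
) -> str:
    """Pick the lip-sync engine by audio duration, falling back when MuseTalk is down."""
    if audio_duration_s >= threshold_s and musetalk_healthy:
        return "musetalk"
    # Short audio, or MuseTalk unavailable: use LatentSync.
    return "latentsync"
```

Note that the boundary is inclusive on the long side: exactly 120s goes to MuseTalk, matching the `>=120s` wording above.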
### Changed files
| File | Change |
|------|------|
| `models/MuseTalk/` | Code copied from Temp/MuseTalk + weights downloaded |
| `models/MuseTalk/scripts/server.py` | New resident FastAPI service (port 8011, GPU0) |
| `backend/app/core/config.py` | Added MUSETALK_* and LIPSYNC_DURATION_THRESHOLD |
| `backend/.env` | Added the corresponding environment variables |
| `backend/app/services/lipsync_service.py` | Added `_call_musetalk_server()` + hybrid routing logic + extended `check_health()` |
---
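## MuseTalk inference performance optimization (server.py v2)
### Overview
The first long-video test of MuseTalk (136s, 3404 frames) took 1799s (~30 minutes). Profiling showed that the bottlenecks were face detection (28%), BiSeNet compositing (22%), and I/O (17%), rather than UNet inference itself (17%). Six optimizations are estimated to bring this down to 8-10 minutes (~3x speedup).
### Bottleneck analysis (before optimization, 1799s)
| Stage | Time | Share | Bottleneck cause |
|------|------|------|---------|
| DWPose + face detection | ~510s | 28% | `batch_size_fa=1`, 2 neural networks per frame, fully serial |
| Compositing + BiSeNet face parsing | ~400s | 22% | BiSeNet runs on every frame + PNG writes to disk |
| UNet inference | ~300s | 17% | batch_size=8 is too small |
| I/O (PNG read/write + FFmpeg) | ~300s | 17% | slow PNG compression, ffmpeg→PNG→imread chain |
| VAE encoding | ~100s | 6% | per-frame encoding, not batched |
### Six optimizations
| # | Optimization | Details |
|---|--------|------|
| 1 | **batch_size 8→32** | changed in `.env`; the RTX 3090 has ample VRAM |
| 2 | **Read frames directly with cv2.VideoCapture** | skips the ffmpeg→PNG→imread chain, saving 3404 PNG encode/decode rounds |
| 3 | **Subsampled face detection (every 5 frames)** | run DWPose + FaceAlignment every 5 frames and linearly interpolate the bbox for in-between frames |
| 4 | **BiSeNet mask caching (every 5 frames)** | run `get_image_prepare_material` every 5 frames; in-between frames reuse the cached mask through `get_image_blending` |
| 5 | **Write directly with cv2.VideoWriter** | skips per-frame PNG writes + ffmpeg re-encoding; VideoWriter writes the mp4 directly |
| 6 | **Per-stage timing** | precise timing of 7 stages to support further tuning |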
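Optimization 3 (detect on every Nth frame, interpolate in between) can be illustrated with a small self-contained sketch. It is not the `_detect_faces_subsampled()` implementation from `server.py`: the function name `detect_subsampled`, the `detect` callback, and the choice to always detect on the last frame are assumptions made for the example.

```python
from typing import Callable

BBox = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def detect_subsampled(
    num_frames: int, detect: Callable[[int], BBox], every: int = 5
) -> list[BBox]:
    """Run `detect` on every `every`-th frame and linearly interpolate the rest."""
    if num_frames <= 0:
        return []
    key_frames = list(range(0, num_frames, every))
    if key_frames[-1] != num_frames - 1:
        # Also detect on the last frame so every gap has two endpoints.
        key_frames.append(num_frames - 1)
    detected = {i: detect(i) for i in key_frames}

    boxes: list[BBox] = []
    for lo, hi in zip(key_frames, key_frames[1:]):
        for f in range(lo, hi):
            t = (f - lo) / (hi - lo)  # 0.0 at the left key frame
            boxes.append(
                tuple(a + (b - a) * t for a, b in zip(detected[lo], detected[hi]))
            )
    boxes.append(detected[key_frames[-1]])
    return boxes
```

For an 11-frame clip with `every=5`, the detector runs on frames 0, 5, and 10 only, so detector calls drop to roughly one fifth while every frame still gets a bounding box.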
### Modified files
| File | Change |
|------|------|
| `models/MuseTalk/scripts/server.py` | Fully rewrote `_run_inference()`; added `_detect_faces_subsampled()` |
| `backend/.env` | `MUSETALK_BATCH_SIZE` 8→32 |
---
## Remotion concurrent rendering optimization
### Overview
On the server with 56 logical cores, Remotion rendering uses only 8 concurrent workers by default (`min(8, cores/2)`). Raising this to 16 is estimated to cut rendering from ~5 minutes to ~2-3 minutes.
### Changes
- `remotion/render.ts`: `renderMedia()` gets a new `concurrency` parameter (default 16), overridable with the `--concurrency` CLI argument
- `remotion/dist/render.js`: recompiled
### Modified files
| File | Change |
|------|------|
| `remotion/render.ts` | `RenderOptions` gets a new `concurrency` field, passed into `renderMedia()` |
| `remotion/dist/render.js` | Recompiled from TypeScript |

203
Docs/DevLogs/Day28.md Normal file

@@ -0,0 +1,203 @@
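## CosyVoice FP16 acceleration + documentation update + AI rewrite UI refactor + title/subtitle panel reorder and video frame preview (Day 28)
### Overview
The CosyVoice 3.0 voice-cloning service now runs FP16 half-precision inference, with an estimated 30-40% speedup. Four project documents were updated in sync. The AI script rewrite UI was refactored (a two-step flow in RewriteModal + logic extraction for ScriptExtractionModal). On the frontend, the "Title & Subtitles" panel moved from step 2 to step 4 (after material editing), and the style preview window's background changed from a purple-pink gradient to a screenshot of the video's opening frame, so the preview is WYSIWYG.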
---
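## ✅ Changes
### 1. CosyVoice FP16 half-precision acceleration
- **Problem**: CosyVoice 3.0 ran in full FP32 precision with an RTF (Real-Time Factor) of about 0.9-1.35x, so generating 2 minutes of audio took about 2 minutes
- **Root cause**: `AutoModel()` was initialized without `fp16=True`, so both LLM inference and Flow Matching (DiT) ran in FP32
- **Fix**: a one-line change enables FP16 automatic mixed precision
```python
# Old: _model = AutoModel(model_dir=str(MODEL_DIR))
# New:
_model = AutoModel(model_dir=str(MODEL_DIR), fp16=True)
```
- **How it takes effect**: `llm_job()` and `token2wav()` in `CosyVoice3Model` automatically switch computation to FP16 through `torch.cuda.amp.autocast(self.fp16)`
- **Expected effect**:
- 30-40% faster inference
- ~30% lower VRAM usage
- Essentially lossless speech quality (FP16 precision is sufficient for a 0.5B model)
- **Verification**: self-check passed after the service restart; health check reports `ready: true`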
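### 2. Full documentation update (4 files)
The MuseTalk hybrid lip-sync scheme, performance optimizations, and Remotion concurrent rendering added on Day 27 were propagated to all related documents.
#### README.md
- Project description updated to "LatentSync 1.6 + MuseTalk 1.5 hybrid lip sync"
- Lip-sync feature description changed to the hybrid scheme (LatentSync for short videos, MuseTalk for long videos)
- Tech stack table adds MuseTalk 1.5
- Project structure adds `models/MuseTalk/`
- Service architecture table adds MuseTalk (port 8011)
- Documentation hub adds a link to the MuseTalk deployment guide
- Performance optimization description adds subsampled detection + Remotion 16-way concurrency
#### DEPLOY_MANUAL.md
- GPU allocation notes updated (GPU0=MuseTalk+CosyVoice, GPU1=LatentSync)
- Step 3 split into 3a (LatentSync) + 3b (MuseTalk)
- Environment variable table adds 7 MuseTalk variables and removes the obsolete `DOUYIN_COOKIE`
- LatentSync inference steps default 20→16
- Test run adds a terminal for starting MuseTalk
- PM2 management adds the MuseTalk service (item 5)
- Port check and log viewing commands add 8011/vigent2-musetalk
#### SUBTITLE_DEPLOY.md
- Architecture diagram updated to LatentSync/MuseTalk hybrid routing
- Added a description of lip-sync routing
- Remotion configuration table adds the `concurrency` parameter (default 16)
- GPU allocation notes updated
- Changelog adds a v1.3.0 entry
#### BACKEND_README.md
- Health check endpoint description updated to include LatentSync + MuseTalk + the hybrid routing threshold
- Environment variable configuration adds MuseTalk-related variables
- Service integration guide adds a "Hybrid lip-sync routing" section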
---
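### 3. AI script rewrite UI refactor
#### RewriteModal refactor
The AI rewrite dialog became a two-step flow to improve the interaction:
**Step 1: configure and trigger**
- Custom prompt input (optional), automatically persisted to localStorage
- The "Start rewrite" (开始改写) button triggers a request to `/api/ai/rewrite`
**Step 2: compare results and choose**
- Top: AI rewrite result + "Use this result" (使用此结果) button (purple-pink gradient, prominent)
- Bottom: original text for comparison + "Keep original" (保留原文) button (gray, low-key)
- Footer: "Rewrite again" (重新改写), which returns to step 1 and keeps the custom prompt
- ESC shortcut closes the dialog
#### ScriptExtractionModal logic extraction
All business logic of the script extraction modal was extracted into a standalone hook, `useScriptExtraction`:
- **useScriptExtraction.ts** (new): manages URL/file dual-mode input, drag-and-drop upload, extraction requests, the step state machine (config → processing → result), and clipboard copy
- **ScriptExtractionModal.tsx**: a purely presentational component that consumes the hook's return value; adds ESC/Enter shortcuts
#### ScriptEditor toolbar adjustments
- Button group right-aligned (`justify-end`), with uniform height `h-7` and rounded corners
- The "History scripts" (历史文案) button uses gray (bg-gray-600) to mark an auxiliary function
- "Script extraction assistant" (文案提取助手) uses purple (bg-purple-600) to mark the main function
- "AI multilingual" (AI多语言) uses a green gradient (emerald-teal); "AI generate title and tags" (AI生成标题标签) uses a blue gradient (blue-cyan)
- "AI smart rewrite" (AI智能改写) and "Save script" (保存文案) moved to the status bar below the text box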
---
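### 4. Title/subtitle panel reorder + video frame background preview
#### Panel reorder
`<TitleSubtitlePanel>` moved from step 2 to step 4 (after material editing), so that by the time users set title and subtitle styles they have already finished material selection and timeline arrangement.
New order:
```
1. Script extraction and editing (unchanged)
2. Voiceover (was 3)
3. Material editing (was 4)
4. Title & Subtitles (was 2) → moved after material editing
```
#### New useVideoFrameCapture hook
Captures the frame at 0.1s from a video URL and returns a JPEG data URL:
- Creates a `<video>` element with `crossOrigin="anonymous"` (materials are stored at a cross-origin Supabase Storage address)
- Binds the `loadedmetadata` / `canplay` / `seeked` / `error` listeners before setting src, to avoid missed events
- Seeks to 0.1s once `loadedmetadata` or `canplay` fires; captures the frame with canvas `drawImage` in the `seeked` callback
- Scales the canvas to 480px wide before encoding (the preview window is at most 280px), which saves memory
- Exports with `canvas.toDataURL("image/jpeg", 0.7)`
- Guards against the edge case where `videoWidth/videoHeight` is 0
- try-catch guards against canvas taint; on failure it returns null and the UI falls back to the gradient
- An `isActive` flag + a `seeked` dedup flag prevent stale and duplicate updates
- Cleans up the video element after capture to free memory
#### On-demand capture (performance optimization)
Capture is triggered only while the style preview window is open:
```typescript
const materialPosterUrl = useVideoFrameCapture(
  showStylePreview ? firstTimelineMaterialUrl : null
);
```
The capture source prefers the **first timeline segment's material** (the real opening clip after the user's drag-and-drop reordering) and falls back to `selectedMaterials[0]` (when no voiceover has been generated and the timeline is empty).
#### Preview background replacement
When a video frame is available, `FloatingStylePreview` shows the original frame directly (no translucent overlay, so colors stay true) and the text relies on its stroke for legibility; without a video frame it falls back to the original purple-pink gradient background.
#### Pitfalls
1. **CORS tainted canvas**: material files are stored in Supabase Storage (`api.hbyrkj.top`) as cross-origin signed URLs. `video.crossOrigin = "anonymous"` must be set so that canvas `toDataURL` is not blocked by a SecurityError
2. **Empty timeline**: `useTimelineEditor` returns an empty array when `audioDuration <= 0` (no voiceover selected), so a fallback to `selectedMaterials[0]` is needed
3. **Listener order**: event listeners must be bound before setting `video.src`; otherwise events may be missed when the video loads quickly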
---
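## 📁 Modified files
| File | Change |
|------|------|
| `models/CosyVoice/cosyvoice_server.py` | `AutoModel()` adds the `fp16=True` parameter |
| `README.md` | Hybrid lip-sync description, tech stack, service architecture, project structure updates |
| `Docs/DEPLOY_MANUAL.md` | MuseTalk deployment steps, environment variables, PM2 management, port checks |
| `Docs/SUBTITLE_DEPLOY.md` | Architecture diagram, Remotion concurrency, GPU allocation, changelog |
| `Docs/BACKEND_README.md` | Health check, environment variables, hybrid routing section |
| `frontend/.../RewriteModal.tsx` | Two-step rewrite flow (custom prompt → result comparison) |
| `frontend/.../script-extraction/useScriptExtraction.ts` | **New**: script extraction logic hook |
| `frontend/.../ScriptExtractionModal.tsx` | Purely presentational component consuming the hook; new shortcuts |
| `frontend/.../ScriptEditor.tsx` | Right-aligned toolbar + color-coded buttons + rewrite/save moved to the bottom |
| `frontend/.../useVideoFrameCapture.ts` | **New**: video frame capture hook (crossOrigin + canvas scaling) |
| `frontend/.../useHomeController.ts` | New useMemo computes the material URL; calls the frame capture hook, gated by showStylePreview |
| `frontend/.../HomePage.tsx` | Panel reorder (former step 2 moved to step 4; former steps 3 and 4 shift up), numbering updated, materialPosterUrl passed through |
| `frontend/.../TitleSubtitlePanel.tsx` | Section number changed from "二" (2) to "四" (4); new previewBackgroundUrl prop |
| `frontend/.../FloatingStylePreview.tsx` | New previewBackgroundUrl prop; conditionally renders the video frame or the gradient background |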
---
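## 🔍 Verification
- CosyVoice restarted successfully; health check returns `{"ready": true}`
- Self-check inference passed (7.2s for "你好")
- FP16 takes effect in the LLM and Flow Matching stages through `torch.cuda.amp.autocast(self.fp16)`
- `npx tsc --noEmit`: zero errors
- AI rewrite: custom prompt persisted → rewrite result + original comparison → "Use this result" / "Keep original"
- Script extraction: URL / file dual mode → processing animation → result filled in
- Panel order: 1 → script, 2 → voiceover, 3 → material editing, 4 → title & subtitles
- Style preview background: shows the real opening frame when material is available; falls back to the purple-pink gradient otherwise
- No capture is triggered while the preview is closed, so no resources are wasted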
---
## 💡 CosyVoice performance analysis notes
### Current baseline (FP32, before optimization)
| Text length | Audio duration | Inference time | RTF |
|----------|----------|----------|-----|
| 42 characters | 9.8s | 13.2s | 1.35x |
| 89 characters | 18.2s | 20.3s | 1.12x |
| ~530 characters | 115.8s | 107.7s | 0.93x |
| ~670 characters | 143.5s | 131.6s | 0.92x |
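The RTF column is inference time divided by audio duration, so values above 1.0 mean slower than real time. The short check below recomputes it from the table's own figures; `rtf` and `rtf_table` are illustrative helper names, not project code.

```python
def rtf(inference_s: float, audio_s: float) -> float:
    """Real-Time Factor: seconds of compute per second of generated audio."""
    return inference_s / audio_s


# (audio duration, inference time) pairs copied from the baseline table.
BASELINE = [(9.8, 13.2), (18.2, 20.3), (115.8, 107.7), (143.5, 131.6)]


def rtf_table() -> list[float]:
    """Recompute the RTF column, rounded to two decimals like the table."""
    return [round(rtf(inference_s, audio_s), 2) for audio_s, inference_s in BASELINE]
```

The recomputed values match the table (1.35, 1.12, 0.93, 0.92) and show RTF improving as the text gets longer, which is consistent with a fixed per-request overhead.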
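### Possible future optimizations (diminishing returns, not planned for now)
| Optimization | Expected gain | Complexity |
|--------|----------|--------|
| TensorRT (DiT module) | +20-30% | requires compiling a .plan engine |
| torch.compile() | +10-20% | one line of code, but slow first compile |
| vLLM (LLM module) | +10-15% | extra dependency |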


@@ -389,7 +389,7 @@ if not qr_element:
## 📋 文档规则优化 (16:42 - 17:10)
**问题**Doc_Rules需要优化,避免误删历史内容、规范工具使用、防止任务清单遗漏
**问题**DOC_RULES需要优化,避免误删历史内容、规范工具使用、防止任务清单遗漏
**优化内容(最终版)**
@@ -411,7 +411,7 @@ if not qr_element:
- 移除无关项目组件
**修改文件**
- `Docs/Doc_Rules.md` - 包含检查清单的最终完善版
- `Docs/DOC_RULES.md` - 包含检查清单的最终完善版
---


@@ -8,8 +8,8 @@
| 规则 | 说明 |
|------|------|
| **默认更新** | 更新 `DayN.md` |
| **按需更新** | `task_complete.md` 仅在用户**明确要求**时更新 |
| **默认更新** | 更新 `DayN.md``TASK_COMPLETE.md` |
| **按需更新** | 其他文档仅在内容变化涉及时更新 |
| **智能修改** | 错误→替换,改进→追加(见下方详细规则) |
| **先读后写** | 更新前先查看文件当前内容 |
| **日内合并** | 同一天的多次小修改合并为最终版本 |
@@ -23,16 +23,14 @@
| 优先级 | 文件路径 | 检查重点 |
| :---: | :--- | :--- |
| 🔥 **High** | `Docs/DevLogs/DayN.md` | **(最新日志)** 详细记录变更、修复、代码片段 |
| 🔥 **High** | `Docs/task_complete.md` | **(任务总览)** 更新 `[x]`、进度条、时间线 |
| 🔥 **High** | `Docs/TASK_COMPLETE.md` | **(任务总览)** 更新 `[x]`、进度条、时间线 |
| ⚡ **Med** | `README.md` | **(项目主页)** 功能特性、技术栈、最新截图 |
| ⚡ **Med** | `Docs/DEPLOY_MANUAL.md` | **(部署手册)** 环境变量、依赖包、启动命令变更 |
| ⚡ **Med** | `Docs/BACKEND_DEV.md` | **(后端规范)** 接口契约、模块划分、环境变量 |
| ⚡ **Med** | `Docs/BACKEND_README.md` | **(后端文档)** 接口说明、架构设计 |
| ⚡ **Med** | `Docs/FRONTEND_DEV.md` | **(前端规范)** API封装、日期格式化、新页面规范 |
| ⚡ **Med** | `Docs/FRONTEND_README.md` | **(前端文档)** 功能说明、页面变更 |
| 🧊 **Low** | `Docs/*_DEPLOY.md` | **(子系统部署)** LatentSync/Qwen3/字幕等独立部署文档 |
| 🧊 **Low** | `Docs/implementation_plan.md` | **(实施计划)** 核对计划与实际实现的差异 |
| 🧊 **Low** | `Docs/architecture_plan.md` | **(前端架构)** 拆分计划与阶段目标 |
| 🧊 **Low** | `Docs/*_DEPLOY.md` | **(子系统部署)** LatentSync/CosyVoice/字幕等独立部署文档 |
---
@@ -97,7 +95,7 @@
### 必须执行的检查步骤
**1. 快速浏览全文**(使用 `view_file``grep_search`
**1. 快速浏览全文**(使用 `Read``Grep`
```markdown
# 检查是否存在:
- 同主题的旧章节?
@@ -144,66 +142,41 @@
> **核心原则**:使用正确的工具,避免字符编码问题
### ✅ 推荐工具:apply_patch
### ✅ 推荐工具:Edit / Read / Grep
**使用场景**
- 追加新章节到文件末尾
- 修改/替换现有章节内容
- 更新状态标记(🔄 → ✅)
- 修正错误内容
**优势**
- ✅ 自动处理字符编码Windows CRLF
- ✅ 精确替换,不会误删其他内容
- ✅ 有错误提示,方便调试
- `Read`:更新前先查看文件当前内容
- `Edit`:精确替换现有内容、追加新章节
- `Grep`:搜索文件中是否已有相关章节
- `Write`:创建新文件(如 Day{N+1}.md
**注意事项**
```markdown
1. **必须精确匹配**TargetContent 必须与文件完全一致
2. **处理换行符**文件使用 \r\n不要漏掉 \r
3. **合理范围**StartLine/EndLine 应覆盖目标内容
4. **先读后写**:编辑前先 view_file 确认内容
1. **先读后写**:编辑前先用 Read 确认内容
2. **精确匹配**Edit 的 old_string 必须与文件内容完全一致
3. **避免重复**编辑前用 Grep 检查是否已存在同主题章节
```
### ❌ 禁止使用:命令行工具
### ❌ 禁止使用:命令行工具修改文档
**禁止场景**
- ❌ 使用 `echo >>` 追加内容(编码问题)
- ❌ 使用 PowerShell 直接修改文档(破坏格式)
- ❌ 使用 sed/awk 等命令行工具
- ❌ 使用 `echo >>` 追加内容
- ❌ 使用 `sed` / `awk` 修改文档
- ❌ 使用 `cat <<EOF` 写入内容
**原因**
- 容易破坏 UTF-8 编码
- Windows CRLF vs Unix LF 混乱
- 容易破坏 UTF-8 编码和中文字符
- 难以追踪修改,容易出错
**唯一例外**:简单的全局文本替换(如批量更新日期),且必须使用 `-NoNewline` 参数
- 无法精确匹配替换位置
### 📝 最佳实践示例
**追加新章节**
```diff
*** Begin Patch
*** Update File: Docs/DevLogs/DayN.md
@@
## 🔗 相关文档
...
---
**追加新章节**使用 `Edit` 工具,`old_string` 匹配文件末尾内容,`new_string` 包含原内容 + 新章节。
## 🆕 新章节
内容...
*** End Patch
```
**修改现有内容**
```diff
*** Begin Patch
*** Update File: Docs/DevLogs/DayN.md
@@
-**状态**:🔄 待修复
+**状态**:✅ 已修复
*** End Patch
**修改现有内容**:使用 `Edit` 工具精确替换。
```markdown
old_string: "**状态**:🔄 待修复"
new_string: "**状态**:✅ 已修复"
```
@@ -213,18 +186,17 @@
```
ViGent2/Docs/
├── task_complete.md # 任务总览(仅按需更新)
├── Doc_Rules.md # 本文件
├── TASK_COMPLETE.md # 任务总览(仅按需更新)
├── DOC_RULES.md # 本文件
├── BACKEND_DEV.md # 后端开发规范
├── BACKEND_README.md # 后端功能文档
├── FRONTEND_DEV.md # 前端开发规范
├── FRONTEND_README.md # 前端功能文档
├── architecture_plan.md # 前端拆分计划
├── implementation_plan.md # 实施计划
├── DEPLOY_MANUAL.md # 部署手册
├── SUPABASE_DEPLOY.md # Supabase 部署文档
├── LatentSync_DEPLOY.md # LatentSync 部署文档
├── QWEN3_TTS_DEPLOY.md # 声音克隆部署文档
├── LATENTSYNC_DEPLOY.md # LatentSync 部署文档
├── COSYVOICE3_DEPLOY.md # 声音克隆部署文档
├── ALIPAY_DEPLOY.md # 支付宝付费部署文档
├── SUBTITLE_DEPLOY.md # 字幕系统部署文档
└── DevLogs/
├── Day1.md # 开发日志
@@ -235,8 +207,16 @@ ViGent2/Docs/
## 📅 DayN.md 更新规则(日常更新)
### 更新时机
> **边开发边记录,不要等到最后才写。**
- 每完成一个功能/修复后,**立即**追加到 DayN.md
- 避免积攒到对话末尾一次性补写,容易遗漏变更
- `TASK_COMPLETE.md` 同理,重要变更完成后及时同步
### 新建判断 (对话开始前)
1. **回顾进度**:查看 `task_complete.md` 了解当前状态
1. **回顾进度**:查看 `TASK_COMPLETE.md` 了解当前状态
2. **检查日期**:查看最新 `DayN.md`
- **今天 (与当前日期相同)** → 🚨 **绝对禁止创建新文件**,必须**追加**到现有 `DayN.md` 末尾!即使是完全不同的功能模块。
- **之前 (昨天或更早)** → 创建 `Day{N+1}.md`
@@ -292,17 +272,17 @@ ViGent2/Docs/
---
## 📝 task_complete.md 更新规则(仅按需)
## 📝 TASK_COMPLETE.md 更新规则
> ⚠️ **仅当用户明确要求更新 `task_complete.md` 时才更新**
> 与 DayN.md 同步更新,记录重要变更时更新任务总览。
### 更新原则
- **格式一致性**:直接参考 `task_complete.md` 现有格式追加内容。
- **格式一致性**:直接参考 `TASK_COMPLETE.md` 现有格式追加内容。
- **进度更新**:仅在阶段性里程碑时更新进度百分比。
### 🔍 完整性检查清单 (必做)
每次更新 `task_complete.md` 时,必须**逐一检查**以下所有板块:
每次更新 `TASK_COMPLETE.md` 时,必须**逐一检查**以下所有板块:
1. **文件头部 & 导航**
- [ ] `更新时间`:必须是当天日期
@@ -325,4 +305,4 @@ ViGent2/Docs/
---
**最后更新**2026-02-07
**最后更新**2026-02-11


@@ -2,22 +2,75 @@
## 目录结构
采用轻量 FSDFeature-Sliced Design结构
```
frontend/src/
├── app/ # Next.js App Router 页面
│ ├── page.tsx # 首页(视频生成)
│ ├── publish/ # 发布页
│ ├── admin/ # 管理员页面
│ ├── login/ # 登录页面
── register/ # 注册页面
├── components/ # 可复用组件
│ ├── home/ # 首页拆分组件
── ...
├── lib/ # 公共工具函数
├── axios.ts # Axios 实例(含 401/403 拦截器)
├── auth.ts # 认证相关函数(统一使用 axios
└── media.ts # API Base / URL / 日期等通用工具
└── proxy.ts # 路由代理(原 middleware
├── app/ # Next.js App Router 页面入口
│ ├── page.tsx # 首页(视频生成)
│ ├── publish/ # 发布管理
│ ├── admin/ # 管理员页面
│ ├── login/ # 登录
── register/ # 注册
│ └── pay/ # 付费开通会员
├── features/ # 功能模块(按业务拆分)
── home/
│ │ ├── model/ # 业务逻辑 hooks
│ │ ├── useHomeController.ts # 主控制器
│ │ ├── useHomePersistence.ts # 持久化管理
├── useBgm.ts
├── useGeneratedVideos.ts
│ │ │ ├── useGeneratedAudios.ts
│ │ │ ├── useMaterials.ts
│ │ │ ├── useMediaPlayers.ts
│ │ │ ├── useRefAudios.ts
│ │ │ ├── useSavedScripts.ts
│ │ │ ├── useTimelineEditor.ts
│ │ │ └── useTitleSubtitleStyles.ts
│ │ └── ui/ # UI 组件(纯 props + 回调)
│ │ ├── HomePage.tsx
│ │ ├── HomeHeader.tsx
│ │ ├── MaterialSelector.tsx
│ │ ├── ScriptEditor.tsx
│ │ ├── ScriptExtractionModal.tsx
│ │ ├── script-extraction/
│ │ │ └── useScriptExtraction.ts
│ │ ├── TitleSubtitlePanel.tsx
│ │ ├── FloatingStylePreview.tsx
│ │ ├── VoiceSelector.tsx
│ │ ├── RefAudioPanel.tsx
│ │ ├── GeneratedAudiosPanel.tsx
│ │ ├── TimelineEditor.tsx
│ │ ├── ClipTrimmer.tsx
│ │ ├── BgmPanel.tsx
│ │ ├── GenerateActionBar.tsx
│ │ ├── PreviewPanel.tsx
│ │ └── HistoryList.tsx
│ └── publish/
│ ├── model/
│ │ └── usePublishController.ts
│ └── ui/
│ └── PublishPage.tsx
├── shared/ # 跨功能共享
│ ├── api/
│ │ ├── axios.ts # Axios 实例(含 401/403 拦截器)
│ │ └── types.ts # 统一响应类型
│ ├── lib/
│ │ ├── media.ts # API Base / URL / 日期等通用工具
│ │ ├── auth.ts # 认证相关函数
│ │ └── title.ts # 标题输入处理
│ ├── hooks/
│ │ ├── useTitleInput.ts
│ │ └── usePublishPrefetch.ts
│ ├── types/
│ │ ├── user.ts # User 类型定义
│ │ └── publish.ts # 发布相关类型
│ └── contexts/ # 全局 ContextAuth、Task
│ ├── AuthContext.tsx
│ └── TaskContext.tsx
├── components/ # 遗留通用组件
│ └── VideoPreviewModal.tsx
└── proxy.ts # Next.js middleware路由保护
```
---
@@ -98,6 +151,33 @@ body {
| `sm:` | ≥ 640px | 平板/桌面 |
| `lg:` | ≥ 1024px | 大屏桌面 |
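### embedded component pattern
When sections are merged, child components use an `embedded?: boolean` prop to control whether the outer card container and the main heading are rendered.
```tsx
// embedded=false (standalone use): render the full card
<div className="bg-white/5 rounded-2xl p-6 border border-white/10">
  <h2></h2>
  {content}
</div>
// embedded=true (embedded in a parent card): render only the content
{content}
```
- Subheadings use `<h3 className="text-sm font-medium text-gray-400">`
- Dividers use `<div className="border-t border-white/10 my-4" />`
- Mobile heading rows avoid `whitespace-nowrap`; long descriptive text can use `hidden sm:inline` to hide on mobile
### Button visual hierarchy
| Level | Style | Use |
|------|------|------|
| Primary action | `px-4 py-2 text-sm font-medium bg-gradient-to-r from-purple-600 to-pink-600 shadow-sm` | Generate voiceover, publish now |
| Secondary action | `px-2 py-1 text-xs bg-white/10 rounded` | Refresh, upload, speech rate |
| Touch-visible | `opacity-40 group-hover:opacity-100` | In-row list actions (edit/delete) |
---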
## API 请求规范
@@ -204,6 +284,38 @@ import { formatDate } from '@/shared/lib/media';
## ⚡️ 体验优化规范
### Scroll to top on refresh (consistent experience)
- Long pages (such as the home page and the publish page) scroll back to the top on first mount.
- **Must** set `history.scrollRestoration = "manual"` in a page-level `useEffect` to disable the browser's native scroll restoration.
- Call `window.scrollTo({ top: 0, left: 0, behavior: "auto" })` and add a 200ms delayed fallback (to keep async effects from overriding it).
- **List auto-scroll must be time-gated**: for 1 second after page load, all list auto-scroll effects are disabled (the `scrollEffectsEnabled` ref). This prevents persistence restore + async data loading from triggering `scrollIntoView` and making the page jump.
- Recommended pattern:
```typescript
// Page level (HomePage / PublishPage)
useEffect(() => {
  if (typeof window === "undefined") return;
  if ("scrollRestoration" in history) history.scrollRestoration = "manual";
  window.scrollTo({ top: 0, left: 0, behavior: "auto" });
  const timer = setTimeout(() => window.scrollTo({ top: 0, left: 0, behavior: "auto" }), 200);
  return () => clearTimeout(timer);
}, []);
// Controller level (time gate for list scrolling)
const scrollEffectsEnabled = useRef(false);
useEffect(() => {
  const timer = setTimeout(() => { scrollEffectsEnabled.current = true; }, 1000);
  return () => clearTimeout(timer);
}, []);
// List scroll effect (BGM / materials / videos, etc.)
useEffect(() => {
  if (!selectedId || !scrollEffectsEnabled.current) return;
  target?.scrollIntoView({ block: "nearest", behavior: "smooth" });
}, [selectedId, list]);
```
### 路由预取
- 首页进入发布管理时使用 `router.prefetch("/publish")`
@@ -228,10 +340,15 @@ import { formatDate } from '@/shared/lib/media';
## 轻量 FSD 结构
- `app/`:页面入口,保持轻量
- `features/*/model`:业务逻辑与状态 (hooks)
- `features/*/ui`:功能 UI 组件
- `shared/`:通用工具、通用 hooks、API 实例
- `app/`:页面入口,保持轻量,只做组合与布局
- `features/*/model`:业务逻辑与状态Controller Hook + 子 Hook
- `features/*/ui`:功能 UI 组件(纯 props + 回调,不直接发 API
- `shared/api`Axios 实例与统一响应类型
- `shared/lib`通用工具函数media.ts / auth.ts / title.ts
- `shared/hooks`:跨功能通用 hooks
- `shared/types`跨功能实体类型User / PublishVideo 等)
- `shared/contexts`:全局 ContextAuthContext / TaskContext
- `components/`遗留通用组件VideoPreviewModal
## 类型定义规范
@@ -248,8 +365,20 @@ import { formatDate } from '@/shared/lib/media';
- **必须持久化**
- 标题样式 ID / 字幕样式 ID
- 标题字号 / 字幕字号
- 标题显示模式(`short` / `persistent`
- 背景音乐选择 / 音量 / 开关状态
- 输出画面比例(`9:16` / `16:9`
- 素材选择 / 历史作品选择
- 选中配音 ID (`selectedAudioId`)
- 语速 (`speed`,声音克隆模式)
- 时间轴段信息 (`useTimelineEditor` 的 localStorage)
### 历史文案(独立持久化)
`useSavedScripts` hook 独立管理历史文案的 localStorage 持久化:
- key: `vigent_{storageKey}_savedScripts`
- 仅在用户手动保存/删除时写入 localStorage不使用自动持久化 effect
-`useHomePersistence` 完全独立,互不影响
### 实施规范
- 使用 `storageKey = userId || 'guest'`,按用户隔离。
@@ -266,6 +395,7 @@ import { formatDate } from '@/shared/lib/media';
- 片头标题与发布信息标题统一限制 15 字。
- 中文输入法合成阶段不截断,合成结束后才校验长度。
- 首页片头标题修改会同步写入 `vigent_${storageKey}_publish_title`
- 标题显示模式使用 `short` / `persistent` 两个固定值;默认 `short`(短暂显示 4 秒)。
- 避免使用 `maxLength` 强制截断输入法合成态。
- 推荐使用 `@/shared/hooks/useTitleInput` 统一处理输入逻辑。
@@ -295,9 +425,11 @@ import { formatDate } from '@/shared/lib/media';
| 接口 | 方法 | 功能 |
|------|------|------|
| `/api/ref-audios` | POST | 上传参考音频 (multipart/form-data: file + ref_text) |
| `/api/ref-audios` | POST | 上传参考音频 (multipart/form-data: fileref_text 可选,后端自动 Whisper 转写) |
| `/api/ref-audios` | GET | 列出用户的参考音频 |
| `/api/ref-audios/{id}` | PUT | 重命名参考音频 |
| `/api/ref-audios/{id}` | DELETE | 删除参考音频 (id 需 encodeURIComponent) |
| `/api/ref-audios/{id}/retranscribe` | POST | 重新识别参考音频文字Whisper 转写 + 超 10s 自动截取) |
### 视频生成 API 扩展
@@ -316,7 +448,8 @@ await api.post('/api/videos/generate', {
text: '口播文案',
tts_mode: 'voiceclone',
ref_audio_id: 'user_id/timestamp_name.wav',
ref_text: '参考音频对应文字',
ref_text: '参考音频对应文字', // 从参考音频 metadata 自动获取
speed: 1.0, // 语速 (0.8-1.2)
});
```
@@ -330,8 +463,14 @@ const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
```
### 参考音频自动处理
- **自动转写**: 上传参考音频时后端自动调用 Whisper 转写内容作为 `ref_text`,无需用户手动输入
- **自动截取**: 参考音频超过 10 秒时自动在静音点截取前 10 秒CosyVoice 建议 3-10 秒)
- **重新识别**: 旧参考音频可通过 retranscribe 端点重新转写并截取
### UI 结构
配音方式使用 Tab 切换:
- **EdgeTTS 音色** - 预设音色 2x3 网格
- **声音克隆** - 参考音频列表 + 在线录音 + 参考文字输入
- **声音克隆** - 参考音频列表 + 在线录音 + 语速下拉菜单 (5 档: 较慢/稍慢/正常/稍快/较快)


@@ -5,19 +5,19 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。
## ✨ 核心功能
### 1. 视频生成 (`/`)
- **素材管理**: 拖拽上传人物视频,实时预览
- **素材重命名**: 支持在列表中直接重命名素材
- **文案配音**: 集成 EdgeTTS支持多音色选择 (云溪 / 晓晓)
- **AI 标题/标签**: 一键生成视频标题与标签 (Day 14)。
- **标题/字幕样式**: 样式选择 + 预览 + 字号调节 (Day 16)
- **背景音乐**: 试听 + 音量控制 + 选择持久化 (Day 16)
- **交互优化**: 选择项持久化、列表内定位、刷新回顶部 (Day 16)。
- **预览一致性**: 标题/字幕预览按素材分辨率缩放,效果更接近成片 (Day 17)。
- **一、文案提取与编辑**: 文案输入/提取/翻译/保存
- **二、配音**: 配音方式EdgeTTS/声音克隆)+ 配音列表(生成/试听/管理)合并为一个板块
- **三、素材编辑**: 视频素材(上传/选择/管理)+ 时间轴编辑(波形/色块/拖拽排序)合并为一个板块
- **四、标题与字幕**: 片头标题/副标题/字幕样式配置;短暂显示/常驻显示;样式预览使用视频片头帧作为真实背景 (Day 28)。
- **五、背景音乐**: 试听 + 音量控制 + 选择持久化
- **六、作品**(右栏): 作品列表 + 作品预览合并为一个板块
- **进度追踪**: 实时显示视频生成进度 (10% -> 100%)。
- **作品预览**: 生成完成后直接播放下载(作品预览 + 历史作品)。
- **预览优化**: 预览视频 `metadata` 预取,首帧加载更快。
- **本地保存**: 文案/标题/偏好由 `useHomePersistence` 统一持久化,刷新后恢复 (Day 14/17)。
- **历史文案**: 手动保存/加载/删除历史文案,独立 localStorage 持久化 (Day 23)。
- **选择持久化**: 首页/发布页作品选择均使用稳定 `id` 持久化,刷新保持用户选择;新视频生成后自动选中最新 (Day 21)。
- **AI 多语言翻译**: 支持 9 种目标语言翻译文案 + 还原原文 (Day 22)。
### 2. 全自动发布 (`/publish`) [Day 7 新增]
- **多平台管理**: 统一管理抖音、微信视频号、B站、小红书账号状态。
@@ -33,30 +33,52 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。
### 3. 声音克隆 [Day 13 新增]
- **TTS 模式选择**: EdgeTTS (预设音色) / 声音克隆 (自定义音色) 切换。
- **参考音频管理**: 上传/列表/删除参考音频 (3-20秒 WAV)
- **一键克隆**: 选择参考音频后自动调用 Qwen3-TTS 服务
- **参考音频管理**: 上传/列表/重命名/删除参考音频,上传后自动 Whisper 转写 ref_text + 超 10s 自动截取
- **重新识别**: 参考音频可重新转写并截取 (RotateCw 按钮)
- **一键克隆**: 选择参考音频后自动调用 CosyVoice 3.0 服务。
- **语速控制**: 声音克隆模式下支持 5 档语速 (0.8-1.2),选择持久化 (Day 23)。
- **多语言支持**: EdgeTTS 10 语言声音列表,声音克隆 language 透传 (Day 22)。
### 4. 字幕与标题 [Day 13 新增]
- **片头标题**: 可选输入,限制 15 字,视频开头显示 3 秒淡入淡出标题
### 4. 配音前置 + 时间轴编排 [Day 23 新增]
- **配音独立生成**: 先生成配音 → 选中配音 → 再选素材 → 生成视频
- **配音管理面板**: 生成/试听/改名/删除/选中,异步生成 + 进度轮询。
- **时间轴编辑器**: wavesurfer.js 音频波形 + 色块可视化素材分配,拖拽分割线调整各段时长。
- **素材截取设置**: ClipTrimmer 双手柄 range slider + HTML5 视频预览播放。
- **拖拽排序**: 时间轴色块支持 HTML5 Drag & Drop 调换素材顺序。
- **自定义分配**: 后端 `custom_assignments` 支持用户定义的素材分配方案(含 `source_start/source_end` 截取区间)。
- **时间轴语义对齐**: 超出音频时仅保留可见段并截齐末段,超出段不参与生成;不足音频时最后可见段自动循环补齐。
- **画面比例控制**: 时间轴顶部支持 `9:16 / 16:9` 输出比例选择,设置持久化并透传后端。
### 5. 字幕与标题 [Day 13 新增]
- **片头标题**: 可选输入,限制 15 字;支持”短暂显示 / 常驻显示”默认短暂显示4 秒),对标题和副标题同时生效。
- **片头副标题**: 可选输入,限制 20 字;显示在主标题下方,用于补充说明或悬念引导;独立样式配置(字体/字号/颜色/间距),可由 AI 同时生成;与标题共享显示模式设定;仅在视频画面中显示,不参与发布标题 (Day 25)。
- **标题同步**: 首页片头标题修改会同步到发布信息标题。
- **逐字高亮字幕**: 卡拉OK效果默认开启可关闭。
- **自动对齐**: 基于 faster-whisper 生成字级别时间戳。
- **样式预设**: 标题/字幕样式选择 + 预览 + 字号调节 (Day 16)。
- **样式预设**: 标题/字幕/副标题样式选择 + 预览 + 字号调节 (Day 16/25)。
- **默认样式**: 标题 90px 站酷快乐体;字幕 60px 经典黄字 + DingTalkJinBuTi (Day 17)。
- **样式持久化**: 标题/字幕样式与字号刷新保留 (Day 17)。
- **样式持久化**: 标题/字幕/副标题样式与字号刷新保留 (Day 17/25)。
### 5. 背景音乐 [Day 16 新增]
### 6. 背景音乐 [Day 16 新增]
- **试听预览**: 点击试听即选中,音量滑块实时生效。
- **混音控制**: 仅影响 BGM配音保持原音量。
### 6. 账户设置 [Day 15 新增]
### 7. 账户设置 [Day 15 新增]
- **手机号登录**: 11位中国手机号验证登录。
- **账户下拉菜单**: 显示有效期 + 修改密码 + 安全退出。
- **账户下拉菜单**: 显示手机号(中间四位脱敏)+ 有效期 + 修改密码 + 安全退出。
- **修改密码**: 弹窗输入当前密码与新密码,修改后强制重新登录。
- **登录即时生效**: 登录成功后 AuthContext 立即写入用户数据,无需刷新即显示手机号。
### 7. 文案提取助手 (`ScriptExtractionModal`) [Day 15 新增]
### 8. 付费开通会员 (`/pay`)
- **支付宝电脑网站支付**: 跳转支付宝官方收银台,支持扫码/账号登录/余额等多种支付方式。
- **自动激活**: 支付成功后异步回调自动激活会员(有效期 1 年),前端轮询检测支付结果。
- **到期续费**: 会员到期后登录自动跳转付费页续费,流程与首次开通一致。
- **管理员激活**: 管理员手动激活功能并存,两种方式互不影响。
### 8. 文案提取助手 (`ScriptExtractionModal`) [Day 15 新增]
- **多源提取**: 支持文件拖拽上传与 URL 粘贴 (B站/抖音/TikTok)。
- **AI 洗稿**: 集成 GLM-4.7-Flash自动改写为口播文案。
- **AI 智能改写**: 集成 GLM-4.7-Flash自动改写为口播文案。
- **自定义提示词**: 可自定义改写提示词,留空使用默认;设置持久化到 localStorage (Day 25)。
- **一键填入**: 提取结果直接填充至视频生成输入框。
- **智能交互**: 实时进度展示,防误触设计。
@@ -66,6 +88,7 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。
- **样式**: TailwindCSS
- **图标**: Lucide React
- **组件**: 自定义现代化组件 (Glassmorphism 风格)
- **音频波形**: wavesurfer.js (时间轴编辑器)
- **API**: Axios 实例 `@/shared/api/axios` (对接后端 FastAPI :8006)
## 🚀 开发指南
@@ -93,6 +116,8 @@ src/
│ ├── page.tsx # 视频生成主页
│ ├── publish/ # 发布管理页
│ │ └── page.tsx
│ ├── pay/ # 付费开通会员页
│ │ └── page.tsx
│ └── layout.tsx # 全局布局 (导航栏)
├── features/
│ ├── home/
@@ -117,5 +142,8 @@ src/
## 🎨 设计规范
- **主色调**: 深紫/黑色系 (Dark Mode)
- **交互**: 悬停微动画 (Hover Effects)
- **响应式**: 适配桌面端大屏操作
- **交互**: 悬停微动画 (Hover Effects);操作按钮默认半透明可见 (opacity-40)hover 时全亮,兼顾触屏设备
- **响应式**: 适配桌面端与移动端;发布页平台卡片响应式布局(移动端紧凑/桌面端宽松)
- **滚动体验**: 列表滚动条统一隐藏 (hide-scrollbar);刷新后自动回到顶部(禁用浏览器滚动恢复 + 列表 scroll 时间门控)
- **样式预览**: 浮动预览窗口,桌面端左上角 280px移动端右下角 160px不遮挡控件
- **输入辅助**: 标题/副标题输入框实时字数计数器,超限变红

252
Docs/MUSETALK_DEPLOY.md Normal file

@@ -0,0 +1,252 @@
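# MuseTalk Deployment Guide
> **Updated**: 2026-02-27
> **Applies to**: MuseTalk v1.5 (resident service mode)
> **Architecture**: resident FastAPI service + PM2 process management
---
## Architecture overview
MuseTalk serves as the long-video engine of the **hybrid lip-sync scheme**:
- **Short videos (<120s)** → LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=120s)** → MuseTalk 1.5 (GPU0, port 8011)
- The routing threshold is controlled by `LIPSYNC_DURATION_THRESHOLD`
- Falls back to LatentSync automatically when MuseTalk is unavailable
---
## Hardware requirements
| Component | Minimum | Recommended |
|------|----------|----------|
| GPU | 8GB VRAM (RTX 3060) | 24GB VRAM (RTX 3090) |
| RAM | 32GB | 64GB |
| CUDA | 11.7+ | 11.8 |
> MuseTalk fp16 inference needs about 4-8GB of VRAM and can share GPU0 with CosyVoice.
---
## Installation
### 1. Conda environment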
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
conda create -n musetalk python=3.10 -y
conda activate musetalk
```
### 2. PyTorch 2.0.1 + CUDA 11.8
> This exact version is required; the prebuilt mmcv package depends on it.
```bash
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```
### 3. Install dependencies
```bash
pip install -r requirements.txt
# MMLab packages
pip install --no-cache-dir -U openmim
mim install mmengine
mim install "mmcv==2.0.1"
mim install "mmdet==3.1.0"
pip install chumpy --no-build-isolation
pip install "mmpose==1.1.0" --no-deps
# FastAPI service dependencies
pip install fastapi uvicorn httpx
```
---
## Model weights
### Directory layout
```
models/MuseTalk/models/
├── musetalk/ ← v1 base model
│ ├── config.json -> musetalk.json (symlink)
│ ├── musetalk.json
│ ├── musetalkV15 -> ../musetalkV15 (symlink, critical!)
│ └── pytorch_model.bin (~3.2GB)
├── musetalkV15/ ← v1.5 UNet model
│ ├── musetalk.json
│ └── unet.pth (~3.2GB)
├── sd-vae/ ← Stable Diffusion VAE
│ ├── config.json
│ └── diffusion_pytorch_model.bin
├── whisper/ ← OpenAI Whisper Tiny
│ ├── config.json
│ ├── pytorch_model.bin (~151MB)
│ └── preprocessor_config.json
├── dwpose/ ← DWPose human pose detection
│ └── dw-ll_ucoco_384.pth (~387MB)
├── syncnet/ ← SyncNet lip-sync evaluation
│ └── latentsync_syncnet.pt
└── face-parse-bisent/ ← face parsing model
├── 79999_iter.pth (~53MB)
└── resnet18-5c106cde.pth (~45MB)
```
### Download
Use the script bundled with the project:
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
conda activate musetalk
bash download_weights.sh
```
Or download manually through the Python API:
```bash
conda activate musetalk
export HF_ENDPOINT=https://hf-mirror.com
python -c "
from huggingface_hub import snapshot_download
snapshot_download('TMElyralab/MuseTalk', local_dir='models',
allow_patterns=['musetalk/*', 'musetalkV15/*'])
snapshot_download('stabilityai/sd-vae-ft-mse', local_dir='models/sd-vae',
allow_patterns=['config.json', 'diffusion_pytorch_model.bin'])
snapshot_download('openai/whisper-tiny', local_dir='models/whisper',
allow_patterns=['config.json', 'pytorch_model.bin', 'preprocessor_config.json'])
snapshot_download('yzd-v/DWPose', local_dir='models/dwpose',
allow_patterns=['dw-ll_ucoco_384.pth'])
"
```
### Create the required symlinks
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk/models/musetalk
ln -sf musetalk.json config.json
ln -sf ../musetalkV15 musetalkV15
```
> **Critical**: a missing `musetalk/musetalkV15` symlink causes weight detection to fail (`weights: False`).
---
## Starting the service
### PM2 process management (recommended)
```bash
# First-time registration
cd /home/rongye/ProgramFiles/ViGent2
pm2 start run_musetalk.sh --name vigent2-musetalk
pm2 save
# Day-to-day management
pm2 restart vigent2-musetalk
pm2 logs vigent2-musetalk
pm2 stop vigent2-musetalk
```
### Manual start
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
```
### Health check
```bash
curl http://localhost:8011/health
# {"status":"ok","model_loaded":true}
```
---
## Backend configuration
Relevant variables in `backend/.env`:
```ini
# MuseTalk configuration
MUSETALK_GPU_ID=0 # GPU index (shared with CosyVoice)
MUSETALK_API_URL=http://localhost:8011 # resident service URL
MUSETALK_BATCH_SIZE=32 # inference batch size
MUSETALK_VERSION=v15 # model version
MUSETALK_USE_FLOAT16=true # half-precision acceleration
# Hybrid lip-sync routing
LIPSYNC_DURATION_THRESHOLD=120 # seconds; MuseTalk is used at or above this value
```
---
## Related files
| File | Description |
|------|------|
| `models/MuseTalk/scripts/server.py` | Resident FastAPI service (port 8011) |
| `run_musetalk.sh` | PM2 startup script |
| `backend/app/services/lipsync_service.py` | Hybrid routing + `_call_musetalk_server()` |
| `backend/app/core/config.py` | `MUSETALK_*` settings |
---
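## Performance optimization (server.py v2)
The first long-video test (136s, 3404 frames) took 30 minutes. Profiling showed that the bottlenecks were face detection (28%), BiSeNet compositing (22%), and I/O (17%), rather than UNet inference (17%).
### Implemented optimizations
| Optimization | Description |
|--------|------|
| `MUSETALK_BATCH_SIZE` 8→32 | The RTX 3090 has ample VRAM; UNet inference speeds up ~3x |
| Read frames directly with cv2.VideoCapture | Skips the ffmpeg→PNG→imread chain |
| Subsampled face detection (every 5 frames) | DWPose + FaceAlignment run only on sampled frames; the bbox is linearly interpolated for in-between frames |
| BiSeNet mask caching (every 5 frames) | `get_image_prepare_material` runs every 5 frames; in-between frames reuse the mask through `get_image_blending` |
| Write directly with cv2.VideoWriter | Skips per-frame PNG writes + ffmpeg re-encoding |
| Per-stage timing | Precise timing of 7 stages to support later tuning |
### Tuning parameters
Adjustable at the top of `models/MuseTalk/scripts/server.py`:
```python
DETECT_EVERY = 5 # face detection subsampling interval (frames)
BLEND_CACHE_EVERY = 5 # BiSeNet mask cache interval (frames)
```
> For talking-head videos (the face barely moves), the interpolation error at a 5-frame interval is negligible.
> For scenes with vigorous face motion, lower the interval to 2-3.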
---
## Common issues
### huggingface-hub version conflict
```
ImportError: huggingface-hub>=0.19.3,<1.0 is required
```
**Fix**: downgrade huggingface-hub
```bash
pip install "huggingface-hub>=0.19.3,<1.0"
```
### mmcv import failure
```bash
pip uninstall mmcv mmcv-full -y
mim install "mmcv==2.0.1"
```
### Audio/video length mismatch
Already fixed in `musetalk/utils/audio_processor.py` (zero-padding logic); no further action is needed.


@@ -298,12 +298,20 @@ Response: audio/wav 文件
SoX could not be found!
```
**解决**: 通过 conda 安装 sox
**解决**:
1. 通过 conda 安装 sox
```bash
conda install -y -c conda-forge sox
```
2. 确保启动脚本 `run_qwen_tts.sh` 中已 export conda env bin 到 PATHPM2 启动时系统 PATH 不含 conda 环境目录):
```bash
export PATH="/home/rongye/ProgramFiles/miniconda3/envs/qwen-tts/bin:$PATH"
```
### CUDA 内存不足
Qwen3-TTS 1.7B 通常需要 8-10GB VRAM。如果遇到 OOM
@@ -371,6 +379,7 @@ FOR INSERT TO anon WITH CHECK (bucket_id = 'ref-audios');
| 日期 | 版本 | 说明 |
|------|------|------|
| 2026-02-09 | 1.2.0 | 修复 SoX PATH 问题run_qwen_tts.sh export conda bin每次生成后 empty_cache() |
| 2026-01-30 | 1.1.0 | 明确默认模型升级为 1.7B-Base替换旧版 0.6B 路径 |
---


@@ -15,11 +15,17 @@
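Original pipeline:
Text → EdgeTTS → audio → LatentSync → FFmpeg compositing → final video
New pipeline:
Text → EdgeTTS → audio ─┬→ LatentSync → lip-synced video ─┐
└→ faster-whisper → subtitle JSON ─┴→ Remotion compositing → final video
New pipeline (single asset):
Text → EdgeTTS/CosyVoice/pre-generated voiceover → audio ─┬→ LatentSync/MuseTalk → lip-synced video ─┐
└→ faster-whisper → subtitle JSON ─┴→ Remotion compositing → final video
New pipeline (multiple assets):
Audio → assets concatenated per custom_assignments → LatentSync/MuseTalk (single inference) → lip-synced video ─┐
Audio → faster-whisper → subtitle JSON ─────────────────────────────────────────────┴→ Remotion compositing → final video
```
> **Lip-sync routing**: short videos (<120s) use LatentSync 1.6 (GPU1); long videos (>=120s) use MuseTalk 1.5 (GPU0), controlled by `LIPSYNC_DURATION_THRESHOLD`.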
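The routing rule can be sketched as below. This is a minimal illustration, assuming the audio duration is already known in seconds; `choose_lipsync_engine` and the `musetalk_available` flag are hypothetical names, and only the 120-second default mirrors `LIPSYNC_DURATION_THRESHOLD`.

```python
LIPSYNC_DURATION_THRESHOLD = 120.0  # seconds; mirrors the documented default


def choose_lipsync_engine(
    audio_duration: float,
    musetalk_available: bool = True,
    threshold: float = LIPSYNC_DURATION_THRESHOLD,
) -> str:
    """Return which engine should handle a clip of `audio_duration` seconds.

    Clips shorter than the threshold go to LatentSync; clips at or above it
    go to MuseTalk, falling back to LatentSync when MuseTalk is unavailable.
    """
    if audio_duration < 0:
        raise ValueError("audio_duration must be non-negative")
    if audio_duration >= threshold and musetalk_available:
        return "musetalk"
    return "latentsync"
```

Keeping the threshold as a parameter makes the boundary easy to test and lets the environment variable override the default without touching the routing logic.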
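## System Requirements
| Component | Requirement |
@@ -140,7 +146,7 @@ remotion/
| Stage | Progress | Description |
|------|------|------|
| Download assets | 0% → 5% | Download the input video from Supabase |
| TTS speech generation | 5% → 25% | EdgeTTS / Qwen3-TTS generates the audio |
| TTS speech generation | 5% → 25% | EdgeTTS / Qwen3-TTS / pre-generated voiceover download |
| Lip sync | 25% → 80% | LatentSync inference |
| Subtitle alignment | 80% → 85% | faster-whisper generates character-level timestamps |
| Remotion rendering | 85% → 95% | Composites subtitles and title |
@@ -181,7 +187,9 @@ Remotion rendering parameters are configured in `backend/app/services/remotion_service.py`:
| Parameter | Default | Description |
|------|--------|------|
| `fps` | 25 | Output frame rate |
| `title_duration` | 3.0 | Title display duration (seconds) |
| `concurrency` | 16 | Number of concurrent Remotion render processes (default 16; can be overridden with the `--concurrency` CLI parameter) |
| `title_display_mode` | `short` | Title display mode (`short` = brief display; `persistent` = always shown) |
| `title_duration` | 4.0 | Title display duration (seconds; only takes effect in `short` mode) |
---
@@ -268,7 +276,7 @@ wget https://github.com/googlefonts/noto-cjk/raw/main/Sans/OTF/SimplifiedChinese
### Using GPU 0
faster-whisper uses GPU 0 by default, separate from LatentSync (GPU 1), to avoid VRAM conflicts. To specify a GPU:
faster-whisper uses GPU 0 by default; MuseTalk shares GPU 0 and LatentSync uses GPU 1, so they do not conflict. To specify a GPU: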
```python
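# Modify in whisper_service.py
@@ -282,4 +290,7 @@ WhisperService(device="cuda:0") # or "cuda:1"
| Date | Version | Description |
|------|------|------|
| 2026-01-29 | 1.0.0 | Initial version: per-character highlighted subtitles and an opening title implemented with faster-whisper + Remotion |
| 2026-02-10 | 1.1.0 | Updated the architecture diagram: multi-asset concat-then-infer, pre-generated voiceover option |
| 2026-01-30 | 1.0.1 | Refined subtitle highlight styles and title animation for a clearer visual result |
| 2026-02-25 | 1.2.0 | Subtitle timestamps changed from linear interpolation to Whisper rhythm mapping, fixing subtitle drift in long videos |
| 2026-02-27 | 1.3.0 | Architecture diagram updated with MuseTalk hybrid routing; Remotion concurrent rendering raised from 8 to 16; GPU allocation notes updated |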

View File

@@ -1,427 +0,0 @@
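# Digital Human Talking-Head Video Generation System - Implementation Plan
## Project Goals
Build an open-source digital human talking-head video generation system. Features include:
- Upload a static video of a person → generate a talking-head video (lip sync)
- TTS voiceover or voice cloning
- Automatic subtitle generation and rendering
- AI-generated titles and tags
- One-click publishing to multiple social platforms
---
## Technical Architecture
```
┌────────────────────────────────────────────────────────────┐
│                     Frontend (Next.js)                     │
│    Assets | Video Generation | Publishing | Task Status    │
└────────────────────────────────────────────────────────────┘
                               │ REST API
┌────────────────────────────────────────────────────────────┐
│                     Backend (FastAPI)                      │
├────────────────────────────────────────────────────────────┤
│  Async task queue (asyncio)                                │
│  ├── Video generation tasks                                │
│  ├── TTS voiceover tasks                                   │
│  └── Auto-publishing tasks                                 │
└────────────────────────────────────────────────────────────┘
          │                    │                    │
          ▼                    ▼                    ▼
    ┌──────────┐         ┌──────────┐         ┌──────────┐
    │LatentSync│         │  FFmpeg  │         │Playwright│
    │ Lip sync │         │Composite │         │Publishing│
    └──────────┘         └──────────┘         └──────────┘
```
---
## Technology Choices
| Module | Chosen technology | Alternative |
|------|----------|----------|
| **Frontend framework** | Next.js 16 | Vue 3 + Vite |
| **UI component library** | TailwindCSS (custom components) | Ant Design |
| **Backend framework** | FastAPI | Flask |
| **Task queue** | FastAPI BackgroundTasks (asyncio) | Celery + Redis |
| **Lip sync** | **LatentSync 1.6** | MuseTalk / Wav2Lip |
| **TTS voiceover** | EdgeTTS | CosyVoice |
| **Voice cloning** | **Qwen3-TTS 1.7B** ✅ | GPT-SoVITS |
| **Video processing** | FFmpeg | MoviePy |
| **Auto-publishing** | Playwright | Custom implementation |
| **Database** | Supabase (PostgreSQL) | MySQL |
| **File storage** | Supabase Storage | Alibaba Cloud OSS |
> **Correction (18:10)**: The current implementation uses Next.js 16, FastAPI BackgroundTasks, and Supabase Storage/Auth; auto-publishing is based on Playwright.
---
## ✅ Current Status Addendum (Day 17)
- The frontend has been split into a component-based structure (`features/home/ui/`), with the main page logic centralized.
- The shared utility `media.ts` handles API base / resource URLs / date formatting in one place.
- The work preview modal uses a unified style and is reused for asset and publish previews.
- Title/subtitle previews scale with the asset resolution, so the result is closer to the final video.
---
## Phased Implementation Plan
### Phase 1: Core Feature Validation (MVP)
> **Goal**: Validate LatentSync + EdgeTTS quality and get the end-to-end flow working
#### 1.1 Environment Setup
Follow `models/LatentSync/DEPLOY.md` to deploy the LatentSync environment and weights.
#### 1.2 Integrate EdgeTTS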
```python
# tts_engine.py
import asyncio

import edge_tts


async def text_to_speech(text: str, voice: str = "zh-CN-YunxiNeural", output_path: str = "output.mp3"):
    # Synthesize `text` with the given voice and save the audio to `output_path`.
    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(output_path)
    return output_path
```
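#### 1.3 End-to-End Test Script
```python
# test_pipeline.py
"""
1. Script text → EdgeTTS → audio
2. Static video + audio → LatentSync → talking-head video
3. Add subtitles → FFmpeg → final video
"""
```
#### 1.4 Acceptance Criteria
- [ ] LatentSync runs inference correctly
- [ ] Lip-to-audio sync rate > 90%
- [ ] Generation time per video < 2 minutes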
---
### Phase 2: Backend API Development
> **Goal**: Wrap the core features as APIs with asynchronous task support
#### 2.1 Project Structure
```
backend/
├── app/
│   ├── main.py                  # FastAPI entry point
│   ├── api/
│   │   ├── videos.py            # Video generation API
│   │   ├── materials.py         # Asset management API
│   │   └── publish.py           # Publishing API
│   ├── services/
│   │   ├── tts_service.py       # TTS service
│   │   ├── lipsync_service.py   # Lip-sync service
│   │   └── video_service.py     # Video compositing service
│   ├── tasks/
│   │   └── celery_tasks.py      # Celery async tasks
│   ├── models/
│   │   └── schemas.py           # Pydantic models
│   └── core/
│       └── config.py            # Configuration management
├── requirements.txt
└── docker-compose.yml           # Redis + API
```
#### 2.2 Core API Design
| Endpoint | Method | Function | Status |
|------|------|------|------|
| `/api/materials` | POST | Upload a video asset | ✅ |
| `/api/materials` | GET | List assets | ✅ |
| `/api/videos/generate` | POST | Create a video generation task | ✅ |
| `/api/videos/tasks/{id}` | GET | Query task status | ✅ |
| `/api/videos/generated` | GET | List previously generated works | ✅ |
| `/api/publish` | POST | Publish to social platforms | ✅ |
#### 2.3 BackgroundTasks Task Definition
```python
# app/api/videos.py
background_tasks.add_task(_process_video_generation, task_id, req, user_id)
```
---
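### Phase 3: Frontend Web UI
> **Goal**: Provide a user-friendly interface
#### 3.1 Page Design
| Page | Function |
|------|------|
| **Asset library** | Upload/manage multi-scene video assets |
| **Generate video** | Enter a script, select assets, generate a preview |
| **Task center** | View generation progress, download videos |
| **Publishing** | Bind platforms, one-click publishing, scheduled publishing |
#### 3.2 Technical Implementation
```bash
# Create the Next.js project
npx create-next-app@latest frontend --typescript --tailwind --app
# Install dependencies
cd frontend
npm install axios swr
```
---
### Phase 4: Social Media Publishing
> **Goal**: Integrate social-auto-upload to support multi-platform publishing
#### 4.1 Reuse social-auto-upload
```bash
# Copy the module
cp -r SuperIPAgent/social-auto-upload backend/social_upload
```
#### 4.2 Cookie Management
```python
# User logs in via the browser → save the cookie → later publishing is automatic
```
#### 4.3 Supported Platforms
- Douyin
- Xiaohongshu
- WeChat Channels
- Kuaishou
---
### Phase 5: Optimization and Extension
| Feature | Implementation | Status |
|------|----------|------|
| **Voice cloning** | Integrate GPT-SoVITS to use your own voice | |
| **AI title/tag generation** | Call an LLM API to generate titles and tags automatically | ✅ |
| **Batch generation** | Upload Excel/CSV to generate videos in batch | |
| **Subtitle editor** | Visually adjust subtitle style and position | |
| **Docker deployment** | One-click deployment to a cloud server | ✅ |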
---
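### Phase 6: MuseTalk Server Deployment (Day 2-3) ✅
> **Goal**: Deploy the MuseTalk environment on a dual-GPU server
- [x] Conda environment setup (musetalk)
- [x] Model weights download (~7GB)
- [x] Subprocess invocation implemented
- [x] Health check feature
### Phase 7: Complete MuseTalk Fixes (Day 4) ✅
> **Goal**: Resolve assorted compatibility issues in the inference script
- [x] Weight detection path fix (symlink)
- [x] Audio/video length mismatch fix
- [x] Enhanced error logging in the inference script
- [x] Verified MP4 generation in video compositing
### Phase 8: Frontend Enhancements (Day 5) ✅
> **Goal**: Improve the user experience
- [x] Web video upload
- [x] Upload progress display
- [x] Auto-refresh of the asset list
### Phase 9: Lip-Sync Model Upgrade (Day 6) ✅
> **Goal**: Migrate from MuseTalk to LatentSync 1.6
- [x] MuseTalk → LatentSync 1.6 migration
- [x] Backend code adaptation (config.py, lipsync_service.py)
- [x] Latent Diffusion architecture (512x512 HD)
- [x] Server-side end-to-end verification
### Phase 10: Performance Optimization (Day 6) ✅
> **Goal**: Improve system responsiveness and stability
- [x] Video pre-compression (1080p → 720p auto-adaptation)
- [x] Finer-grained progress updates (real-time feedback)
- [x] **Persistent model server** (Persistent Server, 0s load)
- [x] **GPU concurrency control** (serial queue to prevent crashes)
### Phase 11: Social Media Publishing Completion (Day 7) ✅
> **Goal**: Fully automated QR-code login and multi-platform publishing
- [x] Automatic QR-code login (Playwright headless + Stealth)
- [x] Multi-platform uploader architecture (Bilibili/Douyin/Xiaohongshu)
- [x] Automatic cookie management
- [x] Scheduled publishing
### Phase 12: User Experience Optimization (Day 8) ✅
> **Goal**: Improve file management and history
- [x] Filename preservation (timestamp prefix + original name)
- [x] Video persistence (history video list API)
- [x] Asset/video deletion
### Phase 13: Publishing Module Optimization (Day 9) ✅
> **Goal**: Code quality improvements + publishing verification
- [x] Bilibili/Douyin login + publishing verified
- [x] Guaranteed resource cleanup (try-finally)
- [x] Timeout protection (eliminates infinite loops)
- [x] Complete type hints
### Phase 14: User Authentication System (Day 9) ✅
> **Goal**: A secure, isolated multi-user authentication system
- [x] Supabase cloud database integration (local self-hosting)
- [x] JWT + HttpOnly Cookie authentication architecture
- [x] User and permission table design (RLS-ready)
- [x] Authentication deployment doc (Docs/SUPABASE_DEPLOY.md)
### Phase 15: Deployment Stability (Day 9) ✅
> **Goal**: Keep production services stable over the long term
- [x] Dependency conflict fix (bcrypt)
- [x] Frontend build fix (Production Build)
- [x] PM2 process supervision configuration
- [x] Deployment manual update (Docs/DEPLOY_MANUAL.md)
### Phase 16: Full-Stack HTTPS Deployment (Day 10) ✅
> **Goal**: Secure public HTTPS access
- [x] Alibaba Cloud Nginx reverse proxy configuration
- [x] Let's Encrypt SSL certificate integration
- [x] Self-hosted Supabase deployment (Docker)
- [x] Port conflict resolution (3003/8008/8444)
- [x] Basic Auth protection for the admin console
### Phase 17: Voice Cloning Integration (Day 13) ✅
> **Goal**: User-defined voice cloning
- [x] Qwen3-TTS HTTP service (standalone FastAPI, port 8009)
- [x] Voice cloning service wrapper (voice_clone_service.py)
- [x] Reference audio management API (upload/list/delete)
- [x] Frontend TTS mode selection UI
- [x] Supabase ref-audios bucket configuration
- [x] End-to-end test verification
### Phase 18: Phone-Number Login Migration (Day 15) ✅
> **Goal**: Migrate authentication from email to phone number
- [x] Database schema migration (email → phone)
- [x] Backend API adaptation (auth.py/admin.py)
- [x] 11-digit phone number validation (regex)
- [x] Change-password feature (/api/auth/change-password)
- [x] Account settings dropdown (change password + expiry display + logout)
- [x] Frontend login/registration page update
- [x] Database migration script (migrate_to_phone.sql)
### Phase 19: Deep Performance Optimization and Service Supervision (Day 16) ✅
> **Goal**: Improve system responsiveness and service stability
- [x] Flash Attention 2 integration (Qwen3-TTS 5x speedup)
- [x] LatentSync performance tuning (OMP thread limit + native Flash Attn)
- [x] Watchdog service supervision (auto-restart of hung services)
- [x] Documentation update (deployment manual and operations guide)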
---
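## Project Directory Structure (Final)
---
## Development Time Estimate
| Phase | Estimated time | Notes |
|------|----------|------|
| Phase 1 | 2-3 days | Environment setup + quality validation |
| Phase 2 | 3-4 days | Backend API development |
| Phase 3 | 3-4 days | Frontend UI development |
| Phase 4 | 2 days | Social publishing integration |
| Phase 5 | As needed | Continuous optimization |
**Total**: the MVP can be completed in about 10-13 days
---
### Phase 20: Code Quality and Security Optimization (Day 20) ✅
> **Goal**: Comprehensively improve code robustness, security, and configuration flexibility
- [x] **Security fixes**: removed hard-coded Cookie/Key, safe ffprobe invocation, log redaction
- [x] **Configuration**: storage paths, CORS, and the screen-recording switch are all driven by environment variables
- [x] **Performance**: async API refactor (httpx/asyncio), streaming upload for large files
- [x] **Build**: Remotion precompilation, unified startup script `run_backend.sh`
---
## Verification Plan
### Phase 1 Verification
1. Run the `test_pipeline.py` script
2. Check the lip-sync quality of the generated video
3. Confirm audio/video sync
### Phase 2 Verification
1. Test all API endpoints with Postman/curl
2. Verify the task queue works correctly
3. Check the full video generation flow
### Phase 3 Verification
1. Complete the full workflow in a browser
2. Verify upload, generation, and download
3. Check the responsive layout
### Phase 4 Verification
1. Publish a test video to Douyin
2. Verify scheduled publishing
3. Check publishing status sync
---
## Hardware Requirements
| Item | Minimum | Recommended |
|------|----------|----------|
| **GPU** | NVIDIA GTX 1060 6GB | RTX 3060 12GB+ |
| **RAM** | 16GB | 32GB |
| **Storage** | 100GB SSD | 500GB SSD |
| **CUDA** | 11.7+ | 12.0+ |
---
## Next Steps
1. **Confirm your GPU configuration** - MuseTalk requires an NVIDIA GPU
2. **Choose a starting point** - start from Phase 1 to validate quality
3. **Decide the project location** - which directory to create the project in
---
> [!IMPORTANT]
> Please confirm whether the plan above meets your needs, and let me know if anything needs adjusting.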

View File

@@ -1,8 +1,8 @@
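# ViGent2 Development Task List (Task Log)
**Project**: ViGent2 digital human talking-head video generation system
**Progress**: 100% (Day 21 - Bug fixes and persistence regression cleanup)
**Last updated**: 2026-02-08
**Progress**: 100% (Day 28 - CosyVoice FP16 acceleration + full documentation update)
**Last updated**: 2026-02-27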
---
@@ -10,18 +10,141 @@
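> This section records each day's core development work and milestones.
### Day 21: Bug Fixes and Persistence Regression Cleanup (Current)
### Day 28: CosyVoice FP16 Acceleration + Full Documentation Update (Current)
- [x] **CosyVoice FP16 half-precision acceleration**: `AutoModel()` enables `fp16=True`; LLM inference and Flow Matching run in automatic mixed precision, with an estimated 30-40% speedup and ~30% lower VRAM use.
- [x] **Full documentation update**: README.md / DEPLOY_MANUAL.md / SUBTITLE_DEPLOY.md / BACKEND_README.md now cover the MuseTalk hybrid lip-sync scheme, performance optimizations, Remotion concurrent rendering, and related topics.
### Day 27: Remotion Stroke Fix + Font Style Expansion + Hybrid Lip Sync + Performance Optimization
- [x] **Stroke rendering fix**: titles, secondary titles, and subtitles switch from a 4-direction `textShadow` simulation to native CSS `-webkit-text-stroke` + `paint-order: stroke fill`, fixing overly thick strokes and ghosting on secondary titles.
- [x] **Font style expansion**: title styles 4→12 (+庞门正道/优设标题圆/阿里数黑体/文道潮黑/无界黑/厚底黑/寒蝉半圆体/欣意吉祥宋); subtitle styles 4→8 (+少女粉/清新绿/金色隶书/楷体红字).
- [x] **Stroke parameter tuning**: every preset's `stroke_size` drops from 8 to 4~5; combined with native strokes the result looks cleaner.
- [x] **TypeScript type fixes**: in Root.tsx the `Composition` generics are aligned with the `calculateMetadata` parameter types; in Video.tsx `VideoProps` gains an index signature for compatibility with `Record<string, unknown>`; VideoLayer.tsx removes the `loop` prop, which `OffthreadVideo` does not support.
- [x] **Progress bar text restored**: the progress bar goes back from showing backend-pushed messages to the fixed text `正在AI生成中...` ("AI generation in progress...").
- [x] **MuseTalk hybrid lip sync**: deployed a persistent MuseTalk 1.5 service (GPU0, port 8011) with automatic routing by audio duration: short videos (<120s) go to LatentSync and long videos (>=120s) go to MuseTalk, with automatic fallback when MuseTalk is unavailable.
- [x] **MuseTalk inference performance**: server.py v2 rewrite with direct cv2 frame reads (skipping ffmpeg→PNG), reduced face detection frequency (every 5 frames), BiSeNet mask cache (every 5 frames), direct cv2.VideoWriter output (skipping PNG writes), and batch_size 8→32; estimated 30min→8-10min (~3x).
- [x] **Remotion concurrent rendering**: render.ts adds a concurrency parameter, raised from the default 8 to 16 (56-core CPU); estimated 5min→2-3min.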
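### Day 26: Frontend Optimization: Section Merging + Numbered Headings + UI Refinement
- [x] **Section merging**: the home page's 9 separate sections merge into 5 main sections (voiceover mode + voiceover list → "三、配音" (3. Voiceover); video assets + timeline → "四、素材编辑" (4. Asset Editing); history + work preview → "六、作品" (6. Works)).
- [x] **Chinese numeral headings**: sections are numbered 一~十 (home page 一~六, publish page 七~十); all emoji icons are removed.
- [x] **embedded mode**: 6 components support an `embedded` prop; when embedded they do not render the outer card/title.
- [x] **Two-row voiceover list layout**: in embedded mode, row 1 holds speed + generate voiceover (right-aligned) and row 2 holds voiceover list + refresh.
- [x] **Sub-components render their own subheadings**: MaterialSelector/TimelineEditor, when embedded, render their own h3 subheading with action buttons on the same row.
- [x] **Dropdown alignment**: TitleSubtitlePanel labels unified to `w-20`, dropdowns `w-1/3 min-w-[100px]`, vertically aligned.
- [x] **Simplified reference audio text**: the bottom paragraph moves next to the title and is shortened to `(上传3-10秒语音样本)` ("upload a 3-10 second voice sample").
- [x] **Account phone number display**: AccountSettingsDropdown now shows the phone number.
- [x] **Title display mode applies to the secondary title**: payload condition fix + the UI dropdown moves up to the section title row.
- [x] **User info available immediately after login**: AuthContext exposes `setUser`; user data is written right after a successful login, fixing the "未知账户" ("unknown account") display after login.
- [x] **Copy tweaks**: the asset description changes to "上传自拍视频最多可选4个" ("upload selfie videos, up to 4 selectable"); display mode options gain a "标题" ("title") prefix.
- [x] **UI/UX improvements**: action buttons visible on mobile (opacity-40), phone number masking, title character counter, timeline drag-handle icon, larger trim slider.
- [x] **Code quality fixes**: password dialog clears on success, MaterialSelector useMemo + disabled guard, TimelineEditor useMemo.
- [x] **Responsive publish page layout**: platform account cards use a single-row layout, compact on mobile (small icons/buttons) and roomy on desktop (consistent with other sections).
- [x] **Scroll to top on mobile refresh**: `scrollRestoration = "manual"` + time-gated list scrolling (`scrollEffectsEnabled` ref; auto-scroll disabled for the first 1 second) + a delayed fallback `scrollTo(0,0)`.
- [x] **Smaller style preview on mobile**: FloatingStylePreview shrinks to 160px wide on mobile and moves to the bottom-right corner so it does not cover the style controls.
- [x] **List scrollbars hidden uniformly**: all lists (BGM/voiceover/works/assets/script extraction) go back to `hide-scrollbar`.
- [x] **Mobile voiceover/asset adaptation**: VoiceSelector buttons shrink on mobile (`px-2 sm:px-4`), fixing the invisible clone-voice option; the MaterialSelector title row drops `whitespace-nowrap` and hides the description on mobile, fixing refresh button overflow.
- [x] **Larger generate-voiceover button**: upgraded from auxiliary size (`text-xs px-2 py-1`) to primary action size (`text-sm font-medium px-4 py-2`), with an added shadow.
- [x] **Generation progress bar moved**: extracted from inside the "六、作品" card into a standalone card in the right column, shown above the works card for better visibility.
- [x] **LatentSync timeout fix**: the httpx timeout changes from 1200s (20 minutes) to 3600s (1 hour), fixing timeout fallback during lip-sync inference for videos over 2 minutes.
- [x] **Subtitle timestamp rhythm mapping**: `whisper_service.py` switches from whole-clip linear interpolation to Whisper per-word rhythm mapping, fixing subtitle drift in long videos.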
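### Day 25: Script Extraction Fix + Custom Prompts + Opening Secondary Title
- [x] **Douyin script extraction fix**: yt-dlp failed with a Fresh cookies error; `_download_douyin_manual` is rewritten to use the mobile share page + automatic ttwid retrieval.
- [x] **DOUYIN_COOKIE cleanup**: the new approach no longer needs a manually maintained cookie; it is removed from `.env`/`config.py`/`service.py`.
- [x] **Custom prompts for AI rewriting**: the backend `rewrite_script()` supports a `custom_prompt` parameter; the frontend adds a collapsible prompt editor next to the checkbox, persisted in localStorage.
- [x] **SSR build fix**: `localStorage` access in `useState` initializers gains a `typeof window` guard, fixing the `npm run build` error.
- [x] **Opening secondary title**: new secondary_title across the full backend/Remotion/frontend chain; AI generates it alongside the title; independent style configuration; 20-character limit.
- [x] **Frontend copy fix**: "AI 洗稿结果"→"AI 改写结果" ("AI rewrite result").
- [x] **yt-dlp upgrade**: `2025.12.08` → `2026.2.21`.
- [x] **Chinese filename fix for reference audio**: `sanitize_filename()` cleans the storage path to ASCII-safe characters, with a hash fallback for purely Chinese names; the original name is kept as the display name.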
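The filename cleanup in the last item might look roughly like the sketch below. It is not the project's `sanitize_filename()`; the function name `sanitize_filename_sketch`, the 12-character SHA-256 prefix, and the set of allowed characters are all assumptions made for illustration.

```python
import hashlib
import re
from pathlib import PurePosixPath


def sanitize_filename_sketch(original_name: str) -> str:
    """Return an ASCII-safe storage name for `original_name` (illustrative).

    Runs of characters outside [A-Za-z0-9._-] become a single underscore.
    If nothing usable remains (for example a purely Chinese name), a short
    hash of the original name is used as the stem instead.
    """
    path = PurePosixPath(original_name)
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", path.stem).strip("._-")
    if not stem:
        # Hash fallback: deterministic, so the same upload name maps to the same key.
        stem = hashlib.sha256(original_name.encode("utf-8")).hexdigest()[:12]
    # Keep the extension only when it is plain ASCII alphanumerics.
    suffix = path.suffix if re.fullmatch(r"\.[A-Za-z0-9]+", path.suffix) else ""
    return f"{stem}{suffix}"
```

The original name would then be stored separately (for example in a metadata column) and used only for display, so the storage key never contains non-ASCII characters.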
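### Day 24: Auth Expiry Handling + Multi-Asset Timeline Stability Fixes
- [x] **Membership expires at request time**: login and auth endpoints uniformly check `expires_at`; on expiry the account is deactivated automatically, sessions are cleared, and “会员已到期请续费” ("membership expired, please renew") is returned.
- [x] **Aspect ratio control**: the timeline adds a `9:16 / 16:9` output ratio selector; the frontend persists it and passes it to the backend, and single- and multi-asset flows both process at the target resolution.
- [x] **Title/subtitle overflow prevention**: Remotion and the frontend preview share responsive scaling, automatic line wrapping, and proportional scaling of stroke/letter spacing/margins, reducing differences between preview and final video.
- [x] **Title display mode**: the title row adds a “短暂显示/常驻显示” (brief/persistent) dropdown, defaulting to brief (4 seconds); the user's choice is persisted and passed through to the Remotion render chain.
- [x] **MOV orientation normalization**: adds rotation metadata parsing and orientation normalization, fixing portrait misdetection caused by "landscape encoding + rotation metadata".
- [x] **Multi-asset concat stability**: segment prepare and concat are unified at 25fps/CFR, and concat adds `+genpts`, mitigating "frozen picture while lips keep moving" at segment transitions.
- [x] **Timeline semantics alignment**: `source_end` is wired through the full chain; fixes duration calculation when `sourceStart>0 and sourceEnd=0`; generation follows the assignments of the visible timeline segments, and segments beyond them are excluded.
- [x] **Interaction polish**: page refresh scrolls to top; first-round auto-scroll of asset/history lists is suppressed, reducing page jumps during state restore.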
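The request-time `expires_at` check from the first item can be sketched as follows. This is an illustration under stated assumptions: `expires_at` is taken to be an ISO-8601 string, a missing value is treated as "never expires", and the helper name `is_membership_expired` is hypothetical rather than the backend's actual function.

```python
from datetime import datetime, timezone
from typing import Optional


def is_membership_expired(expires_at: Optional[str], now: Optional[datetime] = None) -> bool:
    """Return True if the ISO-8601 `expires_at` timestamp is not in the future.

    Assumptions: a missing `expires_at` means no expiry, and timestamps
    without an offset are interpreted as UTC. `now` must be timezone-aware.
    """
    if not expires_at:
        return False
    expiry = datetime.fromisoformat(expires_at)
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return current >= expiry
```

A caller would run this on every authenticated request and, when it returns True, deactivate the account and clear its sessions before returning the renewal message.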
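### Day 23: Voiceover-First Refactor + Asset Timeline Arrangement + UI Improvements + Voice Cloning Enhancements
#### Stage 1: Voiceover First
- [x] **Standalone voiceover generation**: new `generated_audios` backend module (router/schemas/service) with 5 API endpoints, reusing the existing TTSService / voice_clone_service / task_store.
- [x] **Voiceover management panel**: the frontend adds a `useGeneratedAudios` hook + `GeneratedAudiosPanel` component supporting generate/preview/rename/delete/select.
- [x] **UI panel reordering**: script → title and subtitles → voiceover mode → voiceover list → asset selection → BGM → generate video.
- [x] **Asset area gating**: the asset area shows an overlay when no voiceover is selected; once one is selected, it shows the voiceover duration + even-split asset info.
- [x] **Video generation integration**: workflow.py adds a pre-generated audio branch (`generated_audio_id`) that skips inline TTS; backward compatible.
- [x] **Persistence**: selectedAudioId joins useHomePersistence, so the selected voiceover is restored after a refresh.
#### Stage 2: Asset Timeline Arrangement
- [x] **Timeline editor**: new `TimelineEditor` component with a wavesurfer.js audio waveform + color blocks visualizing asset assignment; drag the dividers to adjust segment durations.
- [x] **Asset trim settings**: new `ClipTrimmer` modal with HTML5 video preview + dual-ended sliders for the source video trim start/end.
- [x] **Backend custom assignments**: new `CustomAssignment` model; `prepare_segment` supports `source_start`; the workflow's multi-asset/single-asset pipelines support `custom_assignments`.
- [x] **Loop trim fix**: `stream_loop + source_start` becomes a two-step process (trim first, then loop), so looping starts from the trim start rather than from 0s of the video.
- [x] **Slimmer MaterialSelector**: removed the old duration info bar and drag-sort area (moved to TimelineEditor).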
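The two-step "trim first, then loop" fix can be illustrated by building the two ffmpeg argument lists separately. This is only a sketch: the function name `build_trim_then_loop_commands`, the intermediate file, and the choice to drop audio with `-an` are assumptions, not the project's `prepare_segment` implementation.

```python
from typing import List


def build_trim_then_loop_commands(
    src: str,
    trimmed: str,
    out: str,
    source_start: float,
    source_end: float,
    target_duration: float,
) -> List[List[str]]:
    """Build two ffmpeg invocations: trim the clip, then loop the trimmed clip.

    Looping the already-trimmed file means every repetition starts at
    `source_start` instead of at 0s of the original video.
    """
    if source_end <= source_start:
        raise ValueError("source_end must be greater than source_start")
    clip_len = source_end - source_start
    trim_cmd = [
        "ffmpeg", "-y",
        "-ss", f"{source_start:.3f}",    # seek to the trim start
        "-i", src,
        "-t", f"{clip_len:.3f}",         # keep only the selected range
        "-an",                           # assumption: audio is handled elsewhere
        trimmed,
    ]
    loop_cmd = [
        "ffmpeg", "-y",
        "-stream_loop", "-1",            # input option: repeat the trimmed clip
        "-i", trimmed,
        "-t", f"{target_duration:.3f}",  # stop once the target length is reached
        "-an",
        out,
    ]
    return [trim_cmd, loop_cmd]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order; keeping the commands as data makes the ordering of `-stream_loop` before `-i` easy to verify in a unit test.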
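#### Stage 3: UI Improvements + TTS Stability
- [x] **TTS SoX PATH fix**: `run_qwen_tts.sh` exports the conda env bin directory to PATH (Qwen3-TTS is retired, replaced by CosyVoice 3.0).
- [x] **TTS VRAM management**: `torch.cuda.empty_cache()` after each generation; asyncio.to_thread avoids blocking the event loop (CosyVoice uses the same mechanism).
- [x] **Unified voiceover list buttons**: Play/Edit/Delete buttons appear on hover as one group on the right, consistent with RefAudioPanel; the script summary is removed.
- [x] **Voiceover gating removed from the asset area**: the selectedAudio overlay on MaterialSelector is removed, so assets can be uploaded and managed at any time.
- [x] **Timeline drag sorting**: TimelineEditor color blocks support HTML5 Drag & Drop to reorder assets.
- [x] **Trim settings range slider**: ClipTrimmer switches to a single track with two handles (purple start + pink end), replacing two separate sliders.
- [x] **Trim settings video preview**: the video area can play/pause, plays from sourceStart and stops automatically at sourceEnd, and seeks in real time while a handle is dragged.
#### Stage 4: Saved Scripts + Bug Fixes
- [x] **Saving and loading scripts**: new `useSavedScripts` hook for manually saving/loading/deleting past scripts, with separate localStorage persistence.
- [x] **Timeline drag fix**: `reorderSegments` changes from property swapping to array moves (splice), fixing the bug where durations did not follow the asset after dragging.
- [x] **Unified button visuals**: the 4 buttons in the script editor use a fixed height `h-7`, and redundant `<span>` nesting is removed.
- [x] **Bottom bar adjustment**: the "保存文案" ("save script") button moves to the bottom right; the estimated duration display is removed.
#### Stage 5: Subtitle Language Mismatch + Video Aspect Misalignment Fixes
- [x] **Subtitles use the original text instead of the Whisper transcript**: `align()` adds an `original_text` parameter; subtitle text always uses the original script saved with the voiceover.
- [x] **Dynamic video size in Remotion**: `calculateMetadata` reads the real dimensions from props, fixing title/subtitle scale misalignment.
- [x] **Lost English spaces fix**: `split_word_to_chars` flushes the buffer and sets a pending_space flag when it encounters a space.
#### Stage 6: Automatic Reference Audio Transcription + Speed Control
- [x] **Whisper auto-transcription of ref_text**: uploading reference audio automatically calls Whisper and uses the transcript as ref_text, instead of fixed text from the frontend.
- [x] **Automatic reference audio trimming**: audio over 10 seconds is cut automatically at a silence point (ffmpeg silencedetect), with a 0.1-second fade-out at the end to avoid a pop from the cut.
- [x] **Re-transcribe feature**: new `POST /ref-audios/{id}/retranscribe` endpoint + frontend RotateCw button, so old audio can be re-transcribed and trimmed.
- [x] **Speed control**: a speed parameter across the full chain (frontend selector → persistence → backend → CosyVoice `inference_zero_shot(speed=)`), with 5 levels: slower (0.8) / slightly slow (0.9) / normal (1.0) / slightly fast (1.1) / faster (1.2).
- [x] **Missing reference audio gating**: in voice cloning mode, when no reference audio is selected, the generate-voiceover button is disabled and a yellow warning is shown.
- [x] **Whisper automatic language detection**: the `transcribe()` language parameter becomes optional (default None = auto-detect), supporting multilingual reference audio.
- [x] **Frontend cleanup**: removed the fixed ref_text constant and read-aloud guidance, simplified to "上传任意语音样本,系统将自动识别内容并克隆声音" ("upload any voice sample; the system recognizes the content and clones the voice automatically").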
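### Day 22: Multi-Asset Optimization + AI Translation + Multilingual TTS
- [x] **Multi-asset bug fixes**: 6 high-priority bugs (boundary overflow, single-segment fallback, division by zero, duration validation, Whisper fallback, empty list check).
- [x] **Architecture refactor**: multi-asset processing changes from "per-segment LatentSync" to "concatenate first, then infer", reducing inference runs N→1.
- [x] **Frontend optimization**: payload safety, progress messages, auto-select on upload, unified Material interface, drag fix, asset limit of 4.
- [x] **AI multilingual translation**: new `/api/ai/translate` endpoint; the frontend supports translation into 9 languages + restore original.
- [x] **Multilingual TTS**: EdgeTTS voice list for 10 languages, automatic voice switching on translation, language pass-through for voice cloning, textLang persistence.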
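### Day 21: Bug Fixes + Floating Preview + Publishing Refactor + Architecture Optimization + Multi-Asset Generation
- [x] **Remotion crash tolerance**: when the render process exits with SIGABRT, the output file is checked, avoiding a false failure that drops the title/subtitles.
- [x] **Home page work selection persistence**: fixes `fetchGeneratedVideos` unconditionally overwriting the restored value; adds a `preferVideoId` parameter to control selection.
- [x] **Publish page work selection persistence**: the root cause was unstable signed URLs; selection/persistence/comparison now use `video.id` instead of `path` throughout.
- [x] **Prefetch cache completion**: home-page prefetch of publish-page data now includes the `id` field so cached data can be matched for persistence.
- [x] **Floating style preview window**: the title/subtitle preview becomes a `position: fixed` floating window pinned to the top-left corner, always visible while scrolling.
- [x] **Mobile adaptation**: ScriptEditor buttons wrap; the default preview ratio becomes 9:16 portrait.
- [x] **Multi-platform publishing refactor**: per-platform configuration (DOUYIN_*/WEIXIN_*), user-isolated cookie management, QR code for Douyin face verification, WeChat publishing flow improvements.
- [x] **Frontend structure tweaks**: ScriptExtractionModal moves to features/, contexts move to shared/contexts/, empty directories cleaned up.
- [x] **Backend module layering**: the materials/tools/ref_audios modules are completed with router+schemas+service layers.
- [x] **Development rules update**: BACKEND_DEV.md adds the incremental principle; DOC_RULES.md drops the manual-trigger constraint for TASK_COMPLETE.md.
- [x] **Full documentation update**: BACKEND_DEV/README, FRONTEND_DEV, DEPLOY_MANUAL, and README.md updated in sync.
- [x] **Multi-asset video generation (multi-camera effect)**: supports multi-select assets + drag sorting; audio duration is split evenly by asset count (aligned to Whisper character boundaries) to switch camera angles automatically. Per-segment LatentSync + FFmpeg concatenation. Frontend @dnd-kit drag-sort UI.
- [x] **Subtitle toggle removed**: per-character highlighted subtitles are enabled by default; the toggle and related dead code are removed.
- [x] **More video formats**: upload supports common formats such as mkv/webm/flv/wmv/m4v/ts/mts.
- [x] **Watchdog optimization**: the health check threshold rises to 5 failures, and a 120-second restart cooldown is added to avoid false restarts.
- [x] **Multi-asset bug fixes**: fixes the punctuation-based sentence splitting failing on scripts without sentence-ending punctuation (switched to even splitting), lip-sync misalignment caused by audio time offset, and other defects.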
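The Watchdog rule above (5 consecutive failed checks, then a 120-second restart cooldown) could be modelled as in the sketch below. The class name `RestartPolicy`, its injectable clock, and the choice to reset the failure counter after a restart are assumptions for illustration, not the project's Watchdog code.

```python
import time
from typing import Callable, Optional


class RestartPolicy:
    """Decide when a service should be restarted (illustrative only)."""

    def __init__(
        self,
        failure_threshold: int = 5,
        cooldown_seconds: float = 120.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._consecutive_failures = 0
        self._last_restart: Optional[float] = None

    def record(self, healthy: bool) -> bool:
        """Record one health-check result; return True if a restart is due."""
        if healthy:
            self._consecutive_failures = 0
            return False
        self._consecutive_failures += 1
        if self._consecutive_failures < self.failure_threshold:
            return False
        now = self._clock()
        in_cooldown = (
            self._last_restart is not None
            and now - self._last_restart < self.cooldown_seconds
        )
        if in_cooldown:
            return False  # restarted recently; give the service time to come up
        self._last_restart = now
        self._consecutive_failures = 0
        return True
```

The cooldown matters for model servers that take a while to load weights: without it, a slow startup would be read as a new failure streak and trigger a restart loop.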
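### Day 20: Code Quality and Security Optimization
- [x] **Functional fixes**: LatentSync fallback logic, authentication on the task status endpoint, unified User type.
- [x] **Performance**: N+1 query fix, streaming video upload, httpx async replacement, GLM async wrapper.
- [x] **Security fixes**: hard-coded cookie made configurable, sensitive log data redacted, safe ffprobe invocation, configurable CORS.
- [x] **Configuration**: storage paths via environment variables, Remotion precompilation speedup, LatentSync absolute paths.
- [x] **Documentation update**: updated the Doc_Rules.md checklist and filled in backend and deployment docs; updated SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md.
- [x] **Documentation update**: updated the DOC_RULES.md checklist and filled in backend and deployment docs; updated SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md.
- [x] **Bug fixes**: fixed Remotion path resolution, a publish-page persistence race, a home-page selection regression, and an asset closure trap.
### Day 19: Auto-Publishing Stability and Publishing Experience Optimization 🚀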
@@ -66,7 +189,7 @@
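- [x] **Experience polish**: recording preview URL revocation, scroll restoration for the preview modal, global task notification mounted.
### Day 16: Deep Performance Optimization
- [x] **Qwen-TTS acceleration**: integrated Flash Attention 2; model load time reduced to 8.9s
- [x] **Qwen-TTS acceleration**: integrated Flash Attention 2 (retired, replaced by CosyVoice 3.0)
- [x] **Service supervision**: built a `Watchdog` mechanism that automatically monitors and restarts hung services.
- [x] **LatentSync performance confirmed**: verified that DeepCache + native Flash Attn are in effect.
- [x] **Documentation restructuring**: fully updated the README, deployment manual, and backend docs.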
@@ -79,10 +202,10 @@
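### Day 14: AI Enhancements and Experience Optimization
- [x] **AI titles/tags**: integrated the GLM-4 API to generate video metadata automatically.
- [x] **Subtitle upgrade**: Remotion per-character highlighted subtitles (karaoke effect) and an animated opening title.
- [x] **Model upgrade**: Qwen3-TTS upgraded to the 1.7B-Base version
- [x] **Model upgrade**: voice cloning migrated to CosyVoice 3.0 (0.5B)
### Day 13: Voice Cloning Integration
- [x] **Voice cloning microservice**: wrapped Qwen3-TTS as a standalone API (port 8009).
- [x] **Voice cloning microservice**: wrapped CosyVoice 3.0 as a standalone API (port 8010, replacing Qwen3-TTS).
- [x] **Reference audio management**: Supabase storage bucket configuration and management endpoints.
- [x] **Multi-mode TTS**: the frontend supports switching between EdgeTTS / Clone Voice.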
@@ -117,6 +240,7 @@
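## 🛤️ Future Plans (Roadmap)
### 🔴 Priority To-Do
- [x] ~~**Voiceover-first refactor, stage 2**: asset clip trimming + voice timeline arrangement~~ ✅ Completed on Day 23
- [ ] **Batch generation architecture**: support Excel import for batch video production.
- [ ] **Move scheduled tasks to the backend**: migrate frontend-triggered scheduled publishing to backend APScheduler.
- [ ] **Publishing task recovery**: task-based publishing + state persistence + frontend resume, solving state loss after refresh.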
@@ -134,9 +258,10 @@
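| **Core API** | 100% | ✅ Stable |
| **Web UI** | 100% | ✅ Stable (mobile adapted) |
| **Lip sync** | 100% | ✅ LatentSync 1.6 |
| **TTS voiceover** | 100% | ✅ EdgeTTS + Qwen3 |
| **TTS voiceover** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + voiceover-first + timeline arrangement + auto-transcription + speed control |
| **Auto-publishing** | 100% | ✅ Douyin/WeChat Channels/Bilibili/Xiaohongshu |
| **User authentication** | 100% | ✅ Phone number + JWT |
| **Paid membership** | 100% | ✅ Alipay web payment + automatic activation |
| **Deployment & ops** | 100% | ✅ PM2 + Watchdog |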
---

View File

@@ -4,8 +4,8 @@
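> 📹 **Upload a person** · 🎙️ **Enter a script** · 🎬 **One-click video**
An open-source digital human talking-head video generation system based on **LatentSync 1.6 + EdgeTTS**.
Integrates **Qwen3-TTS** voice cloning and automatic social media publishing.
An open-source digital human talking-head video generation system based on **LatentSync 1.6 + MuseTalk 1.5 hybrid lip sync**.
Integrates **CosyVoice 3.0** voice cloning and automatic social media publishing.
[Features](#-features) • [Tech Stack](#-技术栈) • [Documentation](#-文档中心) • [Deployment Guide](Docs/DEPLOY_MANUAL.md)
@@ -15,24 +15,29 @@
## ✨ Features
### Core Capabilities
- 🎬 **HD lip sync** - Powered by LatentSync 1.6, a 512×512 high-resolution Latent Diffusion model
- 🎙️ **Multi-mode voiceover** - Supports **EdgeTTS** (Microsoft ultra-natural voices) and **Qwen3-TTS** (fast voice cloning from 3 seconds of audio)
- 📝 **Smart subtitles** - Integrates faster-whisper + Remotion to generate per-character highlighted (karaoke-style) subtitles automatically.
- 🎨 **Style presets** - Title/subtitle style selection + preview + font size adjustment, with support for a custom font library.
- 🖼 **Consistent work preview** - Title/subtitle previews scale with the asset resolution, so the result is closer to the final video
- 💾 **User preference persistence** - Home page state is restored/saved uniformly; settings carry over after a refresh
- 🎵 **Background music** - Preview + volume control + mixing, keeping the voiceover volume stable
- 🤖 **AI-assisted creation** - Built-in GLM-4.7-Flash; supports script extraction from Bilibili/Douyin links, AI rewriting, and automatic title/tag generation
### Core Capabilities
- 🎬 **HD lip sync** - Hybrid scheme: short videos (<120s) use LatentSync 1.6 (high-quality Latent Diffusion), long videos (>=120s) use MuseTalk 1.5 (real-time-class single-step inference), with automatic routing + fallback
- 🎙️ **Multi-mode voiceover** - Supports **EdgeTTS** (Microsoft ultra-natural voices, 10 languages) and **CosyVoice 3.0** (fast voice cloning from 3 seconds of audio, 9 languages + 18 dialects, adjustable speed). Uploaded reference audio is transcribed by Whisper and trimmed automatically. Voiceover-first workflow: generate the voiceover → select assets → generate the video
- 📝 **Smart subtitles** - Integrates faster-whisper + Remotion to generate per-character highlighted (karaoke-style) subtitles automatically.
- 🎨 **Style presets** - 12 title + 8 subtitle style presets, with preview + font size adjustment + a custom font library. Native CSS stroke rendering, crisp with no ghosting.
- 🏷 **Title display mode** - The opening title supports `短暂显示` / `常驻显示` (brief / persistent), defaulting to brief (4 seconds); the preference is persisted automatically
- 📌 **Opening secondary title** - An optional secondary title shown below the main title, with independent style configuration; AI can generate it alongside the title; 20-character limit
- 🖼️ **Consistent work preview** - The title/subtitle preview and the Remotion output share responsive scaling and automatic line wrapping, staying stable even on narrow canvases
- 🎞️ **Multi-asset, multi-camera** - Supports multi-select assets + a timeline editor (wavesurfer.js waveform visualization): drag dividers to adjust durations, drag to reorder camera angles, and trim clips by `source_start/source_end`
- 📐 **Aspect ratio control** - Switch the output ratio between `9:16 / 16:9` from the timeline with one click; the generation chain processes at the target ratio throughout.
- 💾 **User preference persistence** - Home page state is restored/saved uniformly; settings carry over after a refresh. Past scripts can be saved and loaded manually.
- 🎵 **Background music** - Preview + volume control + mixing, keeping the voiceover volume stable.
- 🤖 **AI-assisted creation** - Built-in GLM-4.7-Flash; supports script extraction from Bilibili/Douyin links, AI rewriting (with custom prompts), automatic title/tag generation, and translation into 9 languages.
### Platform Features
- 📱 **Fully automated publishing** - Supports immediate publishing to Douyin/WeChat Channels/Bilibili/Xiaohongshu; QR-code login + cookie persistence.
- 🖥️ **Publish management preview** - Supports previewing works via signed URLs / relative paths, ensuring direct playback.
- 📸 **Publishing result visualization** - A screenshot is returned after a successful Douyin/WeChat Channels publish and can be viewed directly in the result card on the publish page.
- 🛡️ **Publishing mis-operation guard** - While publishing is in progress, a “请勿刷新或关闭网页” ("do not refresh or close the page") prompt is shown, and refresh/close requires a second confirmation.
- 🔐 **Authentication and isolation** - Supabase-based user isolation; supports phone number registration/login and password management
- 🛡️ **Service supervision** - Built-in Watchdog mechanism that monitors and restarts hung services automatically, for stable 7x24h operation
- 🚀 **Performance** - Video pre-compression, persistent model server (near-instant loading), dual-GPU pipeline concurrency
### Platform Features
- 📱 **Fully automated publishing** - Supports immediate publishing to Douyin/WeChat Channels/Bilibili/Xiaohongshu; QR-code login + cookie persistence.
- 🖥️ **Publish management preview** - Supports previewing works via signed URLs / relative paths, ensuring direct playback.
- 📸 **Publishing result visualization** - A screenshot is returned after a successful Douyin/WeChat Channels publish and can be viewed directly in the result card on the publish page.
- 🛡️ **Publishing mis-operation guard** - While publishing is in progress, a “请勿刷新或关闭网页” ("do not refresh or close the page") prompt is shown, and refresh/close requires a second confirmation.
- 💳 **Paid membership** - Membership is activated automatically via Alipay web payment and deactivated automatically on expiry with a renewal prompt; manual activation by administrators remains available
- 🔐 **Authentication and isolation** - Supabase-based user isolation; supports phone number registration/login and password management
- 🛡️ **Service supervision** - Built-in Watchdog mechanism that monitors and restarts hung services automatically, for stable 7x24h operation
- 🚀 **Performance** - Video pre-compression, persistent model server (near-instant loading), dual-GPU pipeline concurrency, MuseTalk reduced-frequency face detection + BiSeNet caching, Remotion rendering with concurrency 16.
---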
@@ -40,11 +45,11 @@
| Area | Core technology | Notes |
|------|----------|------|
| **Frontend** | Next.js 16 | TypeScript, TailwindCSS, SWR |
| **Backend** | FastAPI | Python 3.10, AsyncIO, PM2 |
| **Frontend** | Next.js 16 | TypeScript, TailwindCSS, SWR, wavesurfer.js |
| **Backend** | FastAPI | Python 3.12, AsyncIO, PM2 |
| **Database** | Supabase | PostgreSQL, Storage (local/S3), Auth |
| **Lip sync** | LatentSync 1.6 | PyTorch 2.5, Diffusers, DeepCache |
| **Voice cloning** | Qwen3-TTS | 1.7B parameters, Flash Attention 2 acceleration |
| **Lip sync** | LatentSync 1.6 + MuseTalk 1.5 | Hybrid routing: high-quality diffusion for short videos, single-step real-time inference for long videos |
| **Voice cloning** | CosyVoice 3.0 | 0.5B parameters, 9 languages + 18 dialects |
| **Automation** | Playwright | Headless browser automation for social media |
| **Deployment** | Docker & PM2 | Hybrid deployment architecture |
@@ -56,14 +61,18 @@
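### Deployment & Operations
- **[Deployment Manual (DEPLOY_MANUAL.md)](Docs/DEPLOY_MANUAL.md)** - 👈 **Start here for deployment**! Contains the complete environment setup steps.
- [Reference Audio Service Deployment (QWEN3_TTS_DEPLOY.md)](Docs/QWEN3_TTS_DEPLOY.md) - Voice cloning model deployment guide.
- [LatentSync Deployment Guide](models/LatentSync/DEPLOY.md) - Standalone lip-sync model deployment.
- [User Authentication Deployment (AUTH_DEPLOY.md)](Docs/AUTH_DEPLOY.md) - Supabase and Auth system configuration
- [Reference Audio Service Deployment (COSYVOICE3_DEPLOY.md)](Docs/COSYVOICE3_DEPLOY.md) - Voice cloning model deployment guide.
- [LatentSync Deployment Guide (LATENTSYNC_DEPLOY.md)](Docs/LATENTSYNC_DEPLOY.md) - Standalone lip-sync model deployment.
- [MuseTalk Deployment Guide (MUSETALK_DEPLOY.md)](Docs/MUSETALK_DEPLOY.md) - Long-video lip-sync model deployment
- [Supabase Deployment Guide (SUPABASE_DEPLOY.md)](Docs/SUPABASE_DEPLOY.md) - Supabase and authentication system configuration.
- [Alipay Deployment Guide (ALIPAY_DEPLOY.md)](Docs/ALIPAY_DEPLOY.md) - Configuration for paid membership via Alipay.
### Development Docs
- [Backend Development Guide](Docs/BACKEND_README.md) - API conventions and development workflow.
- [Backend Development Rules](Docs/BACKEND_DEV.md) - Layering conventions and development habits.
- [Frontend Development Guide](Docs/FRONTEND_DEV.md) - UI components and page conventions.
### Development Docs
- [Backend Development Guide (BACKEND_README.md)](Docs/BACKEND_README.md) - API conventions and development workflow.
- [Backend Development Rules (BACKEND_DEV.md)](Docs/BACKEND_DEV.md) - Layering conventions and development habits.
- [Frontend Development Guide (FRONTEND_DEV.md)](Docs/FRONTEND_DEV.md) - UI components and page conventions.
- [Frontend Component Docs (FRONTEND_README.md)](Docs/FRONTEND_README.md) - Component structure and section descriptions.
- [Remotion Subtitle Deployment (SUBTITLE_DEPLOY.md)](Docs/SUBTITLE_DEPLOY.md) - Subtitle rendering service deployment.
- [Development Logs (DevLogs)](Docs/DevLogs/) - Daily development progress and technical decisions.
---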
@@ -74,12 +83,15 @@
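ViGent2/
├── backend/ # FastAPI backend service
│ ├── app/ # Core business logic
│ ├── scripts/ # Ops scripts (Watchdog, etc.)
│ └── tests/ # Test cases
│ ├── assets/ # Fonts / styles / BGM
│ ├── user_data/ # Per-user isolated data (cookies, etc.)
│ └── scripts/ # Ops scripts (Watchdog, etc.)
├── frontend/ # Next.js frontend app
├── remotion/ # Remotion video rendering (title/subtitle compositing)
├── models/ # AI model repository
│ ├── LatentSync/ # Lip-sync service
│ └── Qwen3-TTS/ # Voice cloning service
│ ├── LatentSync/ # Lip-sync service (GPU1, short videos)
│ ├── MuseTalk/ # Lip-sync service (GPU0, long videos)
│ └── CosyVoice/ # Voice cloning service
└── Docs/ # Project documentation
```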
@@ -93,8 +105,9 @@ ViGent2/
|----------|------|------|
| **Web UI** | 3002 | User entry point (Next.js) |
| **Backend API** | 8006 | Core business API (FastAPI) |
| **LatentSync** | 8007 | Lip-sync inference service |
| **Qwen3-TTS** | 8009 | Voice cloning inference service |
| **LatentSync** | 8007 | Lip-sync inference service (GPU1, short videos) |
| **MuseTalk** | 8011 | Lip-sync inference service (GPU0, long videos) |
| **CosyVoice 3.0** | 8010 | Voice cloning inference service |
| **Supabase** | 8008 | Database and auth gateway |
---

View File

@@ -25,10 +25,10 @@ LATENTSYNC_USE_SERVER=true
# LATENTSYNC_API_URL=http://localhost:8007
# Inference steps (20-50; higher means better quality but slower)
LATENTSYNC_INFERENCE_STEPS=40
LATENTSYNC_INFERENCE_STEPS=16
# Guidance scale (1.0-3.0; higher means more accurate lip sync but possible jitter)
LATENTSYNC_GUIDANCE_SCALE=2.0
LATENTSYNC_GUIDANCE_SCALE=1.5
# Enable DeepCache acceleration (recommended)
LATENTSYNC_ENABLE_DEEPCACHE=true
@@ -36,6 +36,26 @@ LATENTSYNC_ENABLE_DEEPCACHE=true
# Random seed (set to -1 for random)
LATENTSYNC_SEED=1247
# =============== MuseTalk configuration ===============
# GPU selection (defaults to GPU0, shared with CosyVoice)
MUSETALK_GPU_ID=0
# Persistent service address (port 8011)
MUSETALK_API_URL=http://localhost:8011
# Inference batch size
MUSETALK_BATCH_SIZE=32
# Model version
MUSETALK_VERSION=v15
# Half-precision acceleration
MUSETALK_USE_FLOAT16=true
# =============== Hybrid lip-sync routing ===============
# Audio duration >= this threshold (seconds) uses MuseTalk; < this threshold uses LatentSync
LIPSYNC_DURATION_THRESHOLD=120
# =============== Upload configuration ===============
# Maximum upload file size (MB)
MAX_UPLOAD_SIZE_MB=500
@@ -70,6 +90,9 @@ GLM_MODEL=glm-4.7-flash
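# Make sure the storage volume mapping is correct; avoid hard-coded paths
SUPABASE_STORAGE_LOCAL_PATH=/home/rongye/ProgramFiles/Supabase/volumes/storage/stub/stub
# =============== Douyin video download cookie ===============
# Used for extracting video scripts from Douyin URLs; it expires and needs periodic updating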
DOUYIN_COOKIE=douyin.com; device_web_cpu_core=10; device_web_memory_size=8; __ac_nonce=06760391f00b9b51264ae; __ac_signature=_02B4Z6wo00f019a5ceAAAIDAhEZR-X3jjWfWmXVAAJLXd4; ttwid=1%7C7MTKBSMsP4eOv9h5NAh8p0E-NYIud09ftNmB0mjLpWc%7C1734359327%7C8794abeabbd47447e1f56e5abc726be089f2a0344d6343b5f75f23e7b0f0028f; UIFID_TEMP=0de8750d2b188f4235dbfd208e44abbb976428f0720eb983255afefa45d39c0c6532e1d4768dd8587bf919f866ff1396912bcb2af71efee56a14a2a9f37b74010d0a0413795262f6d4afe02a032ac7ab; s_v_web_id=verify_m4r4ribr_c7krmY1z_WoeI_43po_ATpO_I4o8U1bex2D7; hevc_supported=true; home_can_add_dy_2_desktop=%220%22; dy_swidth=2560; dy_sheight=1440; stream_recommend_feed_params=%22%7B%5C%22cookie_enabled%5C%22%3Atrue%2C%5C%22screen_width%5C%22%3A2560%2C%5C%22screen_height%5C%22%3A1440%2C%5C%22browser_online%5C%22%3Atrue%2C%5C%22cpu_core_num%5C%22%3A10%2C%5C%22device_memory%5C%22%3A8%2C%5C%22downlink%5C%22%3A10%2C%5C%22effective_type%5C%22%3A%5C%224g%5C%22%2C%5C%22round_trip_time%5C%22%3A50%7D%22; strategyABtestKey=%221734359328.577%22; csrf_session_id=2f53aed9aa6974e83aa9a1014180c3a4; fpk1=U2FsdGVkX1/IpBh0qdmlKAVhGyYHgur4/VtL9AReZoeSxadXn4juKvsakahRGqjxOPytHWspYoBogyhS/V6QSw==; fpk2=0845b309c7b9b957afd9ecf775a4c21f; passport_csrf_token=d80e0c5b2fa2328219856be5ba7e671e; passport_csrf_token_default=d80e0c5b2fa2328219856be5ba7e671e; odin_tt=3c891091d2eb0f4718c1d5645bc4a0017032d4d5aa989decb729e9da2ad570918cbe5e9133dc6b145fa8c758de98efe32ff1f81aa0d611e838cc73ab08ef7d3f6adf66ab4d10e8372ddd628f94f16b8e; volume_info=%7B%22isUserMute%22%3Afalse%2C%22isMute%22%3Afalse%2C%22volume%22%3A0.5%7D; bd_ticket_guard_client_web_domain=2; FORCE_LOGIN=%7B%22videoConsumedRemainSeconds%22%3A180%7D; 
UIFID=0de8750d2b188f4235dbfd208e44abbb976428f0720eb983255afefa45d39c0c6532e1d4768dd8587bf919f866ff139655a3c2b735923234f371c699560c657923fd3d6c5b63ab7bb9b83423b6cb4787e2ce66a7fbc4ecb24c8570f520fe6de068bbb95115023c0c6c1b6ee31b49fb7e3996fb8349f43a3fd8b7a61cd9e18e8fe65eb6a7c13de4c0960d84e344b644725db3eb2fa6b7caf821de1b50527979f2; is_dash_user=1; biz_trace_id=b57a241f; bd_ticket_guard_client_data=eyJiZC10aWNrZXQtZ3VhcmQtdmVyc2lvbiI6MiwiYmQtdGlja2V0LWd1YXJkLWl0ZXJhdGlvbi12ZXJzaW9uIjoxLCJiZC10aWNrZXQtZ3VhcmQtcmVlLXB1YmxpYy1rZXkiOiJCTEo2R0lDalVoWW1XcHpGOFdrN0Vrc0dXcCtaUzNKY1g4NGNGY2k0TTl1TEowNjdUb21mbFU5aDdvWVBGamhNRWNRQWtKdnN1MnM3RmpTWnlJQXpHMjA9IiwiYmQtdGlja2V0LWd1YXJkLXdlYi12ZXJzaW9uIjoyfQ%3D%3D; download_guide=%221%2F20241216%2F0%22; sdk_source_info=7e276470716a68645a606960273f276364697660272927676c715a6d6069756077273f276364697660272927666d776a68605a607d71606b766c6a6b5a7666776c7571273f275e58272927666a6b766a69605a696c6061273f27636469766027292762696a6764695a7364776c6467696076273f275e5827292771273f273d33323131333c3036313632342778; bit_env=RiOY4jzzpxZoVCl6zdVSVhVRjdwHRTxqcqWdqMBZLPGjMdB4Tax1kAELHNTVAAh72KuhumewE4Lq6f0-VJ2UpJrkrhSxoPw9LUb3zQrq1OSwbeSPHkRlRgRQvO89sItdGUyq1oFr0XyRCnMYG87KSeWyc4x0czGR0o50hTDoDLG5rJVoRcdQOLvjiAegsqyytKF59sPX_QM9qffK2SqYsg0hCggURc_AI6kguDDE5DvG0bnyz1utw4z1eEnIoLrkGDqzqBZj4dOAr0BVU6ofbsS-pOQ2u2PM1dLP9FlBVBlVaqYVgHJeSLsR5k76BRTddUjTb4zEilVIEwAMJWGN4I1BxVt6fC9B5tBQpuT0lj3n3eKXCKXZsd8FrEs5_pbfDsxV-e_WMiXI2ff4qxiTC0U73sfo9OpicKICtZjdq8qsHxJuu6wVR36zvXeL2Wch5C6MzprNvkivv0l8nbh2mSgy1nabZr3dmU6NcR-Bg3Q3xTWUlR9aAUmpopC-cNuXjgLpT-Lw1AYGilSUnCvosth1Gfypq-b0MpgmdSDgTrQ%3D; gulu_source_res=eyJwX2luIjoiMDhjOGQ3ZTJiODQyNjZkZWI5Y2VkMGJiODNlNmY1ZWY0ZjMyNTE2ZmYyZjAzNDMzZjI0OWU1Y2Q1NTczNTk5NyJ9; passport_auth_mix_state=hp9bc3dgb1tm5wd8p82zawus27g0e3ue; IsDouyinActive=false
# =============== Alipay configuration ===============
ALIPAY_APP_ID=2021006132600283
ALIPAY_PRIVATE_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/app_private_key.pem
ALIPAY_PUBLIC_KEY_PATH=/home/rongye/ProgramFiles/ViGent2/backend/keys/alipay_public_key.pem
ALIPAY_NOTIFY_URL=https://vigent.hbyrkj.top/api/payment/notify
ALIPAY_RETURN_URL=https://vigent.hbyrkj.top/pay

View File

@@ -30,7 +30,7 @@ class Settings(BaseSettings):
# Douyin Playwright configuration
DOUYIN_HEADLESS_MODE: str = "headless-new"
DOUYIN_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
DOUYIN_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
DOUYIN_LOCALE: str = "zh-CN"
DOUYIN_TIMEZONE_ID: str = "Asia/Shanghai"
DOUYIN_CHROME_PATH: str = "/usr/bin/google-chrome"
@@ -57,7 +57,17 @@ class Settings(BaseSettings):
LATENTSYNC_ENABLE_DEEPCACHE: bool = True # Enable DeepCache acceleration
LATENTSYNC_SEED: int = 1247 # Random seed (-1 for random)
LATENTSYNC_USE_SERVER: bool = True # Use the persistent server for faster inference
# MuseTalk configuration
MUSETALK_GPU_ID: int = 0 # GPU ID (defaults to GPU0)
MUSETALK_API_URL: str = "http://localhost:8011" # Persistent service address
MUSETALK_BATCH_SIZE: int = 8 # Inference batch size
MUSETALK_VERSION: str = "v15" # Model version
MUSETALK_USE_FLOAT16: bool = True # Half-precision acceleration
# Hybrid lip-sync routing
LIPSYNC_DURATION_THRESHOLD: float = 120.0 # Seconds; MuseTalk is used at or above this value
# Supabase configuration
SUPABASE_URL: str = ""
SUPABASE_PUBLIC_URL: str = "" # Public address, used to generate URLs the frontend can access
@@ -76,17 +86,28 @@ class Settings(BaseSettings):
GLM_API_KEY: str = ""
GLM_MODEL: str = "glm-4.7-flash"
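# Alipay configuration
ALIPAY_APP_ID: str = ""
ALIPAY_PRIVATE_KEY_PATH: str = "" # Path to the application private key PEM file
ALIPAY_PUBLIC_KEY_PATH: str = "" # Path to the Alipay public key PEM file
ALIPAY_NOTIFY_URL: str = "" # Asynchronous notification callback URL (must be publicly reachable)
ALIPAY_RETURN_URL: str = "" # Synchronous redirect URL after a successful payment
ALIPAY_SANDBOX: bool = False # Whether to use the sandbox environment
PAYMENT_AMOUNT: float = 999.00 # Membership price (CNY)
PAYMENT_EXPIRE_DAYS: int = 365 # Membership validity in days
# CORS configuration (comma-separated list of origins; * allows all)
CORS_ORIGINS: str = "*"
# Douyin cookie (used for video downloads; it expires and must be refreshed periodically)
DOUYIN_COOKIE: str = ""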
@property
def LATENTSYNC_DIR(self) -> Path:
"""LatentSync directory path (computed dynamically)"""
return self.BASE_DIR.parent.parent / "models" / "LatentSync"
@property
def MUSETALK_DIR(self) -> Path:
"""MuseTalk directory path (computed dynamically)"""
return self.BASE_DIR.parent.parent / "models" / "MuseTalk"
class Config:
env_file = ".env"
extra = "ignore" # Ignore unknown environment variables
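The `LIPSYNC_DURATION_THRESHOLD` comment above says clips at or above the threshold go to MuseTalk. A minimal sketch of that routing rule follows. The function name `choose_lipsync_engine` and the returned labels are hypothetical, and sending shorter clips to LatentSync is an assumption based on the neighbouring settings, not something this diff shows.

```python
def choose_lipsync_engine(duration_sec: float, threshold_sec: float = 120.0) -> str:
    """Pick a lip-sync engine label for a clip of the given duration.

    Hypothetical helper: the real routing code is not part of this diff.
    """
    # At or above the threshold -> MuseTalk, as stated in the config comment.
    # Below the threshold -> LatentSync (assumed to be the other engine).
    return "musetalk" if duration_sec >= threshold_sec else "latentsync"
```

The threshold is inclusive, so a clip of exactly 120 seconds would go to MuseTalk under this reading.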

View File

@@ -1,11 +1,11 @@
"""
Dependency-injection module: authentication and current-user lookup
"""
from typing import Optional, Any, Dict, cast
from typing import Optional, Any, Dict, cast
from fastapi import Request, HTTPException, Depends, status
from app.core.security import decode_access_token, TokenData
from app.repositories.sessions import get_session
from app.repositories.users import get_user_by_id
from app.core.security import decode_access_token
from app.repositories.sessions import get_session, delete_sessions
from app.repositories.users import get_user_by_id, deactivate_user_if_expired
from loguru import logger
@@ -14,9 +14,9 @@ async def get_token_from_cookie(request: Request) -> Optional[str]:
return request.cookies.get("access_token")
async def get_current_user_optional(
request: Request
) -> Optional[Dict[str, Any]]:
async def get_current_user_optional(
request: Request
) -> Optional[Dict[str, Any]]:
"""
Get the current user (optional; returns None when not logged in)
"""
@@ -29,22 +29,30 @@ async def get_current_user_optional(
return None
# Verify that the session_token is still valid (single-device login check)
try:
session = get_session(token_data.user_id, token_data.session_token)
if not session:
logger.warning(f"Session token 无效: user_id={token_data.user_id}")
return None
user = get_user_by_id(token_data.user_id)
return cast(Optional[Dict[str, Any]], user)
except Exception as e:
logger.error(f"获取用户信息失败: {e}")
return None
try:
session = get_session(token_data.user_id, token_data.session_token)
if not session:
logger.warning(f"Session token 无效: user_id={token_data.user_id}")
return None
user = cast(Optional[Dict[str, Any]], get_user_by_id(token_data.user_id))
if user and deactivate_user_if_expired(user):
delete_sessions(token_data.user_id)
return None
if user and not user.get("is_active"):
delete_sessions(token_data.user_id)
return None
return user
except Exception as e:
logger.error(f"获取用户信息失败: {e}")
return None
async def get_current_user(
request: Request
) -> Dict[str, Any]:
async def get_current_user(
request: Request
) -> Dict[str, Any]:
"""
Get the current user (login required)
@@ -66,40 +74,45 @@ async def get_current_user(
detail="Token 无效或已过期"
)
try:
session = get_session(token_data.user_id, token_data.session_token)
if not session:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="会话已失效,请重新登录(可能已在其他设备登录)"
)
user = get_user_by_id(token_data.user_id)
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="用户不存在"
)
user = cast(Dict[str, Any], user)
if user.get("expires_at"):
from datetime import datetime, timezone
expires_at = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
if datetime.now(timezone.utc) > expires_at:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="授权已过期,请联系管理员续期"
)
return user
except HTTPException:
raise
except Exception as e:
logger.error(f"获取用户信息失败: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="服务器错误"
)
try:
session = get_session(token_data.user_id, token_data.session_token)
if not session:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="会话已失效,请重新登录(可能已在其他设备登录)"
)
user = get_user_by_id(token_data.user_id)
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="用户不存在"
)
user = cast(Dict[str, Any], user)
if deactivate_user_if_expired(user):
delete_sessions(token_data.user_id)
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="会员已到期,请续费"
)
if not user.get("is_active"):
delete_sessions(token_data.user_id)
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="账号已停用"
)
return user
except HTTPException:
raise
except Exception as e:
logger.error(f"获取用户信息失败: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="服务器错误"
)
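The new code delegates the expiry decision to `deactivate_user_if_expired`, whose body is not part of this diff. The sketch below shows only the timestamp comparison, modelled on the removed inline check (`fromisoformat` after replacing a trailing `Z`). The name `is_membership_expired` and the `now` parameter are hypothetical. Treating a missing `expires_at` as "not expired" mirrors the removed code and is an assumption about the new helper.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def is_membership_expired(user: Dict[str, Any], now: Optional[datetime] = None) -> bool:
    """Return True when the user's expires_at timestamp lies in the past."""
    expires_at = user.get("expires_at")
    if not expires_at:
        # No expiry recorded: treated as not expired (assumption).
        return False
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    parsed = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Naive timestamps are interpreted as UTC (assumption).
        parsed = parsed.replace(tzinfo=timezone.utc)
    current = now if now is not None else datetime.now(timezone.utc)
    return current > parsed
```

Passing `now` explicitly keeps the check deterministic in tests. The DB update and session deletion done by the real helper are deliberately left out.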
async def get_current_admin(

View File

@@ -110,3 +110,28 @@ def set_auth_cookie(response: Response, token: str) -> None:
def clear_auth_cookie(response: Response) -> None:
"""Clear the authentication cookie"""
response.delete_cookie(key="access_token")
def create_payment_token(user_id: str) -> str:
"""Generate a short-lived, payment-only JWT (valid for 30 minutes)"""
payload = {
"sub": user_id,
"purpose": "payment",
"exp": datetime.now(timezone.utc) + timedelta(minutes=30),
}
return jwt.encode(payload, settings.JWT_SECRET_KEY, algorithm=settings.JWT_ALGORITHM)
def decode_payment_token(token: str) -> str | None:
"""Decode a payment_token and return the user_id (accepted only when purpose=payment)"""
try:
data = jwt.decode(
token,
settings.JWT_SECRET_KEY,
algorithms=[settings.JWT_ALGORITHM],
)
if data.get("purpose") != "payment":
return None
return data.get("sub")
except JWTError:
return None
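The two functions above scope a token to one use through a `purpose` claim, so an ordinary access token cannot be replayed against the payment endpoints. The sketch below illustrates the same idea with a hand-rolled HS256 token built from the standard library. It is a stand-in for illustration only and not the project's JWT-library-based implementation. `SECRET`, `make_token` and `read_payment_token` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical key; the project reads its key from settings


def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, as used by JWTs.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    return _b64(hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest())


def make_token(user_id: str, purpose: str, ttl_sec: int, now: Optional[float] = None) -> str:
    """Build an HS256-signed token carrying sub, purpose and exp claims."""
    issued = time.time() if now is None else now
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    claims = {"sub": user_id, "purpose": purpose, "exp": int(issued + ttl_sec)}
    payload = _b64(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.{_sign(f'{header}.{payload}')}"


def read_payment_token(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the user id only for a valid, unexpired token with purpose=payment."""
    try:
        header, payload, signature = token.split(".")
        if not hmac.compare_digest(signature, _sign(f"{header}.{payload}")):
            return None
        claims = json.loads(_unb64(payload))
    except (ValueError, TypeError):
        # Malformed token, bad base64, bad JSON, or non-ASCII input.
        return None
    current = time.time() if now is None else now
    if claims.get("exp", 0) < current:
        return None
    if claims.get("purpose") != "payment":
        return None
    return claims.get("sub")
```

The purpose check runs only after the signature and expiry checks pass, which matches the order in `decode_payment_token`, where the library validates the token before the claim is inspected.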

View File

@@ -15,6 +15,8 @@ from app.modules.ref_audios.router import router as ref_audios_router
from app.modules.ai.router import router as ai_router
from app.modules.tools.router import router as tools_router
from app.modules.assets.router import router as assets_router
from app.modules.generated_audios.router import router as generated_audios_router
from app.modules.payment.router import router as payment_router
from loguru import logger
import os
@@ -124,6 +126,8 @@ app.include_router(ref_audios_router, prefix="/api/ref-audios", tags=["RefAudios
app.include_router(ai_router) # /api/ai
app.include_router(tools_router, prefix="/api/tools", tags=["Tools"])
app.include_router(assets_router, prefix="/api/assets", tags=["Assets"])
app.include_router(generated_audios_router, prefix="/api/generated-audios", tags=["GeneratedAudios"])
app.include_router(payment_router) # /api/payment
@app.on_event("startup")

View File

@@ -2,6 +2,8 @@
AI-related API routes
"""
from typing import Optional
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from loguru import logger
@@ -21,9 +23,43 @@ class GenerateMetaRequest(BaseModel):
class GenerateMetaResponse(BaseModel):
"""Response for title and tag generation"""
title: str
secondary_title: str = ""
tags: list[str]
class RewriteRequest(BaseModel):
"""Rewrite request"""
text: str
custom_prompt: Optional[str] = None
class TranslateRequest(BaseModel):
"""Translation request"""
text: str
target_lang: str
@router.post("/translate")
async def translate_text(req: TranslateRequest):
"""
AI script translation
Translates the script into the specified target language
"""
if not req.text or not req.text.strip():
raise HTTPException(status_code=400, detail="文案不能为空")
if not req.target_lang or not req.target_lang.strip():
raise HTTPException(status_code=400, detail="目标语言不能为空")
try:
logger.info(f"Translating text to {req.target_lang}: {req.text[:50]}...")
translated = await glm_service.translate_text(req.text.strip(), req.target_lang.strip())
return success_response({"translated_text": translated})
except Exception as e:
logger.error(f"Translate failed: {e}")
raise HTTPException(status_code=500, detail=str(e))
@router.post("/generate-meta")
async def generate_meta(req: GenerateMetaRequest):
"""
@@ -39,8 +75,24 @@ async def generate_meta(req: GenerateMetaRequest):
result = await glm_service.generate_title_tags(req.text)
return success_response(GenerateMetaResponse(
title=result.get("title", ""),
secondary_title=result.get("secondary_title", ""),
tags=result.get("tags", [])
).model_dump())
except Exception as e:
logger.error(f"Generate meta failed: {e}")
raise HTTPException(status_code=500, detail=str(e))
@router.post("/rewrite")
async def rewrite_script(req: RewriteRequest):
"""AI script rewriting"""
if not req.text or not req.text.strip():
raise HTTPException(status_code=400, detail="文案不能为空")
try:
logger.info(f"Rewriting text: {req.text[:50]}...")
rewritten = await glm_service.rewrite_script(req.text.strip(), req.custom_prompt)
return success_response({"rewritten_text": rewritten})
except Exception as e:
logger.error(f"Rewrite failed: {e}")
raise HTTPException(status_code=500, detail=str(e))

View File

@@ -1,22 +1,32 @@
"""
Authentication API: registration, login, logout, and password change
"""
from fastapi import APIRouter, HTTPException, Response, status, Request
from fastapi import APIRouter, HTTPException, Response, status, Request, Depends
from fastapi.responses import JSONResponse
from pydantic import BaseModel, field_validator
from app.core.security import (
get_password_hash,
verify_password,
create_access_token,
generate_session_token,
set_auth_cookie,
clear_auth_cookie,
decode_access_token
)
from app.repositories.sessions import create_session, delete_sessions
from app.repositories.users import create_user, get_user_by_id, get_user_by_phone, user_exists_by_phone, update_user
from app.core.response import success_response
from app.core.security import (
get_password_hash,
verify_password,
create_access_token,
generate_session_token,
set_auth_cookie,
clear_auth_cookie,
decode_access_token,
create_payment_token,
)
from app.repositories.sessions import create_session, delete_sessions
from app.repositories.users import (
create_user,
get_user_by_id,
get_user_by_phone,
user_exists_by_phone,
update_user,
deactivate_user_if_expired,
)
from app.core.deps import get_current_user
from app.core.response import success_response
from loguru import logger
from typing import Optional, Any, cast
from typing import Optional, Any, cast
import re
router = APIRouter(prefix="/api/auth", tags=["认证"])
@@ -76,26 +86,26 @@ async def register(request: RegisterRequest):
After registration the status is pending; an administrator must activate the account
"""
try:
if user_exists_by_phone(request.phone):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="该手机号已注册"
)
if user_exists_by_phone(request.phone):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="该手机号已注册"
)
# Create the user
password_hash = get_password_hash(request.password)
create_user({
"phone": request.phone,
"password_hash": password_hash,
"username": request.username or f"用户{request.phone[-4:]}",
"role": "pending",
"is_active": False
})
create_user({
"phone": request.phone,
"password_hash": password_hash,
"username": request.username or f"用户{request.phone[-4:]}",
"role": "pending",
"is_active": False
})
logger.info(f"新用户注册: {request.phone}")
return success_response(message="注册成功,请等待管理员审核激活")
return success_response(message="注册成功,请等待管理员审核激活")
except HTTPException:
raise
except Exception as e:
@@ -116,12 +126,12 @@ async def login(request: LoginRequest, response: Response):
- Implements single-device login where the newest login evicts the previous one
"""
try:
user = cast(dict[str, Any], get_user_by_phone(request.phone) or {})
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="手机号或密码错误"
)
user = cast(dict[str, Any], get_user_by_phone(request.phone) or {})
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="手机号或密码错误"
)
# Verify the password
if not verify_password(request.password, user["password_hash"]):
@@ -130,29 +140,33 @@ async def login(request: LoginRequest, response: Response):
detail="手机号或密码错误"
)
# Check whether the account is activated
if not user["is_active"]:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="账号未激活,请等待管理员审核"
# Auto-deactivate on expiry (note: only the DB is updated; the in-memory user dict is not modified)
expired = deactivate_user_if_expired(user)
if expired:
delete_sessions(user["id"])
# Expired or not yet activated (newly registered) -> return payment instructions
if expired or not user["is_active"]:
payment_token = create_payment_token(user["id"])
return JSONResponse(
status_code=403,
content={
"success": False,
"message": "请付费开通会员",
"code": 403,
"data": {
"reason": "PAYMENT_REQUIRED",
"payment_token": payment_token,
}
}
)
# Check whether the authorization has expired
if user.get("expires_at"):
from datetime import datetime, timezone
expires_at = datetime.fromisoformat(user["expires_at"].replace("Z", "+00:00"))
if datetime.now(timezone.utc) > expires_at:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN,
detail="授权已过期,请联系管理员续期"
)
# Generate a new session_token (the newest login evicts the previous one)
session_token = generate_session_token()
# Delete the old session, then insert the new one
delete_sessions(user["id"])
create_session(user["id"], session_token, None)
delete_sessions(user["id"])
create_session(user["id"], session_token, None)
# Generate the JWT token
token = create_access_token(user["id"], session_token)
@@ -162,19 +176,19 @@ async def login(request: LoginRequest, response: Response):
logger.info(f"用户登录: {request.phone}")
return success_response(
data={
"user": UserResponse(
id=user["id"],
phone=user["phone"],
username=user.get("username"),
role=user["role"],
is_active=user["is_active"],
expires_at=user.get("expires_at")
).model_dump()
},
message="登录成功",
)
return success_response(
data={
"user": UserResponse(
id=user["id"],
phone=user["phone"],
username=user.get("username"),
role=user["role"],
is_active=user["is_active"],
expires_at=user.get("expires_at")
).model_dump()
},
message="登录成功",
)
except HTTPException:
raise
except Exception as e:
@@ -186,10 +200,10 @@ async def login(request: LoginRequest, response: Response):
@router.post("/logout")
async def logout(response: Response):
"""Log the user out"""
clear_auth_cookie(response)
return success_response(message="已登出")
async def logout(response: Response):
"""Log the user out"""
clear_auth_cookie(response)
return success_response(message="已登出")
@router.post("/change-password")
@@ -217,12 +231,12 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
)
try:
user = cast(dict[str, Any], get_user_by_id(token_data.user_id) or {})
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="用户不存在"
)
user = cast(dict[str, Any], get_user_by_id(token_data.user_id) or {})
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="用户不存在"
)
# Verify the current password
if not verify_password(request.old_password, user["password_hash"]):
@@ -233,13 +247,13 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
# Update the password
new_password_hash = get_password_hash(request.new_password)
update_user(user["id"], {"password_hash": new_password_hash})
update_user(user["id"], {"password_hash": new_password_hash})
# Generate a new session token so that the old token becomes invalid
new_session_token = generate_session_token()
delete_sessions(user["id"])
create_session(user["id"], new_session_token, None)
delete_sessions(user["id"])
create_session(user["id"], new_session_token, None)
# Generate a new JWT token
new_token = create_access_token(user["id"], new_session_token)
@@ -247,7 +261,7 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
logger.info(f"用户修改密码: {user['phone']}")
return success_response(message="密码修改成功")
return success_response(message="密码修改成功")
except HTTPException:
raise
except Exception as e:
@@ -259,35 +273,13 @@ async def change_password(request: ChangePasswordRequest, req: Request, response
@router.get("/me")
async def get_me(request: Request):
async def get_me(user: dict = Depends(get_current_user)):
"""Get the current user's information"""
# Read the user from the cookie
token = request.cookies.get("access_token")
if not token:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="未登录"
)
token_data = decode_access_token(token)
if not token_data:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Token 无效"
)
user = cast(dict[str, Any], get_user_by_id(token_data.user_id) or {})
if not user:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="用户不存在"
)
return success_response(UserResponse(
id=user["id"],
phone=user["phone"],
username=user.get("username"),
role=user["role"],
is_active=user["is_active"],
expires_at=user.get("expires_at")
).model_dump())
return success_response(UserResponse(
id=user["id"],
phone=user["phone"],
username=user.get("username"),
role=user["role"],
is_active=user["is_active"],
expires_at=user.get("expires_at")
).model_dump())
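For expired or not-yet-activated accounts, the login handler in this file now answers with HTTP 403 and a body whose `data.reason` is `PAYMENT_REQUIRED`, alongside a `payment_token`. A client could separate that case from other 403 responses roughly as sketched below. `extract_payment_token` is a hypothetical helper, and the body shape is taken only from the `JSONResponse` content shown in the diff.

```python
from typing import Any, Dict, Optional


def extract_payment_token(status_code: int, body: Dict[str, Any]) -> Optional[str]:
    """Return the payment token when the login response asks for payment."""
    if status_code != 403:
        return None
    # Other 403 responses (raised via HTTPException) carry no "data" object.
    data = body.get("data") or {}
    if data.get("reason") != "PAYMENT_REQUIRED":
        return None
    return data.get("payment_token")
```

Checking `reason` as well as the status code matters because the dependency layer also returns 403 for invalidated sessions and disabled accounts.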

View File

@@ -0,0 +1,77 @@
"""Voice-over generation API"""
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
import uuid
from loguru import logger
from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.videos.task_store import create_task, get_task
from app.modules.generated_audios.schemas import GenerateAudioRequest, RenameAudioRequest
from app.modules.generated_audios import service
router = APIRouter()
@router.post("/generate")
async def generate_audio(
req: GenerateAudioRequest,
background_tasks: BackgroundTasks,
user: dict = Depends(get_current_user),
):
"""Generate a voice-over asynchronously (returns task_id)"""
task_id = str(uuid.uuid4())
create_task(task_id, user["id"])
background_tasks.add_task(service.generate_audio_task, task_id, req, user["id"])
return success_response({"task_id": task_id})
@router.get("/tasks/{task_id}")
async def get_audio_task(task_id: str, user: dict = Depends(get_current_user)):
"""Poll voice-over generation progress"""
task = get_task(task_id)
if task.get("status") != "not_found" and task.get("user_id") != user["id"]:
return success_response({"status": "not_found"})
return success_response(task)
@router.get("")
async def list_audios(user: dict = Depends(get_current_user)):
"""List all voice-overs generated by the current user"""
try:
result = await service.list_generated_audios(user["id"])
return success_response(result)
except Exception as e:
logger.error(f"列出配音失败: {e}")
raise HTTPException(status_code=500, detail=f"获取列表失败: {str(e)}")
@router.delete("/{audio_id:path}")
async def delete_audio(audio_id: str, user: dict = Depends(get_current_user)):
"""Delete a voice-over"""
try:
await service.delete_generated_audio(audio_id, user["id"])
return success_response(message="删除成功")
except PermissionError as e:
raise HTTPException(status_code=403, detail=str(e))
except Exception as e:
logger.error(f"删除配音失败: {e}")
raise HTTPException(status_code=500, detail=f"删除失败: {str(e)}")
@router.put("/{audio_id:path}")
async def rename_audio(
audio_id: str,
request: RenameAudioRequest,
user: dict = Depends(get_current_user),
):
"""Rename a voice-over"""
try:
result = await service.rename_generated_audio(audio_id, request.new_name, user["id"])
return success_response(result, message="重命名成功")
except PermissionError as e:
raise HTTPException(status_code=403, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
logger.error(f"重命名配音失败: {e}")
raise HTTPException(status_code=500, detail=f"重命名失败: {str(e)}")
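The router above returns a `task_id` from `/generate` and exposes `/tasks/{task_id}` for polling. A client-side polling loop might look like the sketch below. `poll_task` and its injected `fetch` callable are hypothetical, and `fetch` is assumed to return the already-unwrapped task dict. The statuses `processing`, `completed`, `failed` and `not_found` come from the code in this diff. Any other status, such as an initial one, is treated as "keep waiting".

```python
import time
from typing import Any, Callable, Dict

# Statuses after which polling should stop (taken from the router and service code).
TERMINAL_STATUSES = {"completed", "failed", "not_found"}


def poll_task(
    fetch: Callable[[], Dict[str, Any]],
    interval_sec: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the task reaches a terminal status or attempts run out."""
    last: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        last = fetch()
        if last.get("status") in TERMINAL_STATUSES:
            return last
        if attempt < max_attempts - 1:
            sleep(interval_sec)
    # Attempts exhausted: return the last observed state for the caller to inspect.
    return last
```

Injecting `fetch` and `sleep` keeps the loop independent of any HTTP client and makes it easy to test without waiting.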

View File

@@ -0,0 +1,31 @@
from pydantic import BaseModel
from typing import Optional, List
class GenerateAudioRequest(BaseModel):
text: str
tts_mode: str = "edgetts"
voice: str = "zh-CN-YunxiNeural"
ref_audio_id: Optional[str] = None
ref_text: Optional[str] = None
language: str = "zh-CN"
speed: float = 1.0
class RenameAudioRequest(BaseModel):
new_name: str
class GeneratedAudioItem(BaseModel):
id: str
name: str
path: str
duration_sec: float
text: str
tts_mode: str
language: str
created_at: int
class GeneratedAudioListResponse(BaseModel):
items: List[GeneratedAudioItem]

View File

@@ -0,0 +1,264 @@
"""Voice-over generation: business logic"""
import re
import json
import time
import asyncio
import subprocess
import tempfile
import os
from pathlib import Path
from typing import Optional
import httpx
from loguru import logger
from app.services.storage import storage_service
from app.services.tts_service import TTSService
from app.services.voice_clone_service import voice_clone_service
from app.modules.videos.task_store import task_store
from app.modules.generated_audios.schemas import (
GenerateAudioRequest,
GeneratedAudioItem,
GeneratedAudioListResponse,
)
BUCKET = "generated-audios"
def _locale_to_tts_lang(locale: str) -> str:
mapping = {"zh": "Chinese", "en": "English"}
return mapping.get(locale.split("-")[0], "Auto")
def _get_audio_duration(file_path: str) -> float:
try:
result = subprocess.run(
['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
'-of', 'csv=p=0', file_path],
capture_output=True, text=True, timeout=10
)
return float(result.stdout.strip())
except Exception as e:
logger.warning(f"获取音频时长失败: {e}")
return 0.0
async def generate_audio_task(task_id: str, req: GenerateAudioRequest, user_id: str):
"""Background task: generate a voice-over"""
try:
task_store.update(task_id, {"status": "processing", "progress": 10, "message": "正在生成配音..."})
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
audio_path = tmp.name
try:
if req.tts_mode == "voiceclone":
if not req.ref_audio_id or not req.ref_text:
raise ValueError("声音克隆模式需要提供参考音频和参考文字")
task_store.update(task_id, {"progress": 20, "message": "正在下载参考音频..."})
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp_ref:
ref_local = tmp_ref.name
try:
ref_url = await storage_service.get_signed_url(
bucket="ref-audios", path=req.ref_audio_id
)
timeout = httpx.Timeout(None)
async with httpx.AsyncClient(timeout=timeout) as client:
async with client.stream("GET", ref_url) as resp:
resp.raise_for_status()
with open(ref_local, "wb") as f:
async for chunk in resp.aiter_bytes():
f.write(chunk)
task_store.update(task_id, {"progress": 40, "message": "正在克隆声音..."})
await voice_clone_service.generate_audio(
text=req.text,
ref_audio_path=ref_local,
ref_text=req.ref_text,
output_path=audio_path,
language=_locale_to_tts_lang(req.language),
speed=req.speed,
)
finally:
if os.path.exists(ref_local):
os.unlink(ref_local)
else:
task_store.update(task_id, {"progress": 30, "message": "正在生成语音..."})
tts = TTSService()
await tts.generate_audio(req.text, req.voice, audio_path)
task_store.update(task_id, {"progress": 70, "message": "正在上传配音..."})
duration = _get_audio_duration(audio_path)
timestamp = int(time.time())
audio_id = f"{user_id}/{timestamp}_audio.wav"
meta_id = f"{user_id}/{timestamp}_audio.json"
# Build the display_name
now = time.strftime("%Y%m%d_%H%M", time.localtime(timestamp))
display_name = f"配音_{now}"
with open(audio_path, "rb") as f:
wav_data = f.read()
await storage_service.upload_file(
bucket=BUCKET, path=audio_id,
file_data=wav_data, content_type="audio/wav",
)
metadata = {
"display_name": display_name,
"text": req.text,
"tts_mode": req.tts_mode,
"voice": req.voice if req.tts_mode == "edgetts" else None,
"ref_audio_id": req.ref_audio_id,
"language": req.language,
"duration_sec": duration,
"created_at": timestamp,
}
await storage_service.upload_file(
bucket=BUCKET, path=meta_id,
file_data=json.dumps(metadata, ensure_ascii=False).encode("utf-8"),
content_type="application/json",
)
signed_url = await storage_service.get_signed_url(BUCKET, audio_id)
task_store.update(task_id, {
"status": "completed",
"progress": 100,
"message": f"配音生成完成 ({duration:.1f}s)",
"output": {
"audio_id": audio_id,
"name": display_name,
"path": signed_url,
"duration_sec": duration,
"text": req.text,
"tts_mode": req.tts_mode,
"language": req.language,
"created_at": timestamp,
},
})
finally:
if os.path.exists(audio_path):
os.unlink(audio_path)
except Exception as e:
import traceback
task_store.update(task_id, {
"status": "failed",
"message": f"配音生成失败: {str(e)}",
"error": traceback.format_exc(),
})
logger.error(f"Generate audio failed: {e}")
async def list_generated_audios(user_id: str) -> dict:
"""List all voice-overs generated by the user"""
files = await storage_service.list_files(BUCKET, user_id)
wav_files = [f for f in files if f.get("name", "").endswith("_audio.wav")]
if not wav_files:
return GeneratedAudioListResponse(items=[]).model_dump()
async def fetch_info(f):
name = f.get("name", "")
storage_path = f"{user_id}/{name}"
meta_name = name.replace("_audio.wav", "_audio.json")
meta_path = f"{user_id}/{meta_name}"
display_name = name
text = ""
tts_mode = "edgetts"
language = "zh-CN"
duration_sec = 0.0
created_at = 0
try:
meta_url = await storage_service.get_signed_url(BUCKET, meta_path)
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(meta_url)
if resp.status_code == 200:
meta = resp.json()
display_name = meta.get("display_name", name)
text = meta.get("text", "")
tts_mode = meta.get("tts_mode", "edgetts")
language = meta.get("language", "zh-CN")
duration_sec = meta.get("duration_sec", 0.0)
created_at = meta.get("created_at", 0)
except Exception as e:
logger.debug(f"读取配音 metadata 失败: {e}")
try:
created_at = int(name.split("_")[0])
except (ValueError, IndexError):
pass
signed_url = await storage_service.get_signed_url(BUCKET, storage_path)
return GeneratedAudioItem(
id=storage_path,
name=display_name,
path=signed_url,
duration_sec=duration_sec,
text=text,
tts_mode=tts_mode,
language=language,
created_at=created_at,
)
items = await asyncio.gather(*[fetch_info(f) for f in wav_files])
items = sorted(items, key=lambda x: x.created_at, reverse=True)
return GeneratedAudioListResponse(items=items).model_dump()
async def delete_generated_audio(audio_id: str, user_id: str) -> None:
if not audio_id.startswith(f"{user_id}/"):
raise PermissionError("无权删除此文件")
await storage_service.delete_file(BUCKET, audio_id)
meta_path = audio_id.replace("_audio.wav", "_audio.json")
try:
await storage_service.delete_file(BUCKET, meta_path)
except Exception:
pass
async def rename_generated_audio(audio_id: str, new_name: str, user_id: str) -> dict:
if not audio_id.startswith(f"{user_id}/"):
raise PermissionError("无权修改此文件")
new_name = new_name.strip()
if not new_name:
raise ValueError("新名称不能为空")
meta_path = audio_id.replace("_audio.wav", "_audio.json")
try:
meta_url = await storage_service.get_signed_url(BUCKET, meta_path)
async with httpx.AsyncClient() as client:
resp = await client.get(meta_url)
if resp.status_code == 200:
metadata = resp.json()
else:
raise Exception(f"Failed to fetch metadata: {resp.status_code}")
except Exception as e:
logger.warning(f"无法读取配音元数据: {e}, 将创建新的")
metadata = {
"display_name": new_name,
"text": "",
"tts_mode": "edgetts",
"language": "zh-CN",
"duration_sec": 0.0,
"created_at": int(time.time()),
}
metadata["display_name"] = new_name
await storage_service.upload_file(
bucket=BUCKET,
path=meta_path,
file_data=json.dumps(metadata, ensure_ascii=False).encode("utf-8"),
content_type="application/json",
)
return {"name": new_name}
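The service functions above rely on three naming conventions: objects live under a `{user_id}/` prefix, each `*_audio.wav` has a sibling `*_audio.json` metadata object, and the file name starts with a Unix timestamp. The sketch below gathers those conventions into small helpers. The helper names are hypothetical. `metadata_path` swaps only the trailing suffix, which is slightly stricter than the `str.replace` call in the diff.

```python
from typing import Optional

AUDIO_SUFFIX = "_audio.wav"
META_SUFFIX = "_audio.json"


def owns_object(object_id: str, user_id: str) -> bool:
    """True when the storage path sits under the user's own prefix."""
    return object_id.startswith(f"{user_id}/")


def metadata_path(audio_id: str) -> str:
    """Map '<user>/<ts>_audio.wav' to its '<user>/<ts>_audio.json' sibling."""
    if not audio_id.endswith(AUDIO_SUFFIX):
        raise ValueError(f"not a generated-audio path: {audio_id}")
    # Replace only the trailing suffix, never an earlier occurrence in the path.
    return audio_id[: -len(AUDIO_SUFFIX)] + META_SUFFIX


def created_at_from_name(name: str) -> Optional[int]:
    """Recover the Unix timestamp prefix from '<ts>_audio.wav', if present."""
    try:
        return int(name.split("_")[0])
    except ValueError:
        return None
```

The prefix check compares against `user_id` followed by a slash, so a user whose id is a prefix of another user's id cannot reach the other user's objects.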

View File

@@ -1,416 +1,62 @@
from fastapi import APIRouter, UploadFile, File, HTTPException, Request, BackgroundTasks, Depends
from app.core.config import settings
from app.core.deps import get_current_user
from app.core.response import success_response
from app.services.storage import storage_service
import re
import time
import traceback
import os
import aiofiles
from pathlib import Path
from loguru import logger
import asyncio
from pydantic import BaseModel
from typing import Optional
import httpx
from fastapi import APIRouter, HTTPException, Request, Depends
from loguru import logger
from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.materials.schemas import RenameMaterialRequest
from app.modules.materials import service
router = APIRouter()
class RenameMaterialRequest(BaseModel):
new_name: str
def sanitize_filename(filename: str) -> str:
safe_name = re.sub(r'[<>:"/\\|?*]', '_', filename)
if len(safe_name) > 100:
ext = Path(safe_name).suffix
safe_name = safe_name[:100 - len(ext)] + ext
return safe_name
async def process_and_upload(temp_file_path: str, original_filename: str, content_type: str, user_id: str):
"""Background task to strip multipart headers and upload to Supabase"""
try:
logger.info(f"Processing raw upload: {temp_file_path} for user {user_id}")
# 1. Analyze file to find actual video content (strip multipart boundaries)
# This is a simplified manual parser for a SINGLE file upload.
# Structure:
# --boundary
# Content-Disposition: form-data; name="file"; filename="..."
# Content-Type: video/mp4
# \r\n\r\n
# [DATA]
# \r\n--boundary--
# We need to read the first few KB to find the header end
start_offset = 0
end_offset = 0
boundary = b""
file_size = os.path.getsize(temp_file_path)
with open(temp_file_path, 'rb') as f:
# Read first 4KB to find header
head = f.read(4096)
# Find boundary
first_line_end = head.find(b'\r\n')
if first_line_end == -1:
raise Exception("Could not find boundary in multipart body")
boundary = head[:first_line_end] # e.g. --boundary123
logger.info(f"Detected boundary: {boundary}")
# Find end of headers (\r\n\r\n)
header_end = head.find(b'\r\n\r\n')
if header_end == -1:
raise Exception("Could not find end of multipart headers")
start_offset = header_end + 4
logger.info(f"Video data starts at offset: {start_offset}")
# Find end boundary (read from end of file)
# It should be \r\n + boundary + -- + \r\n
# We seek to end-200 bytes
f.seek(max(0, file_size - 200))
tail = f.read()
# The closing boundary is usually --boundary--
# We look for the last occurrence of the boundary
last_boundary_pos = tail.rfind(boundary)
if last_boundary_pos != -1:
# The data ends before \r\n + boundary
# The tail buffer relative position needs to be converted to absolute
end_pos_in_tail = last_boundary_pos
# We also need to check for the preceding \r\n
if end_pos_in_tail >= 2 and tail[end_pos_in_tail-2:end_pos_in_tail] == b'\r\n':
end_pos_in_tail -= 2
# Absolute end offset
end_offset = (file_size - 200) + last_boundary_pos
# Correction for CRLF before boundary
# Actually, simply: read until (file_size - len(tail) + last_boundary_pos) - 2
end_offset = (max(0, file_size - 200) + last_boundary_pos) - 2
else:
logger.warning("Could not find closing boundary, assuming EOF")
end_offset = file_size
logger.info(f"Video data ends at offset: {end_offset}. Total video size: {end_offset - start_offset}")
# 2. Extract and Upload to Supabase
# Since we have the file on disk, we can just pass the file object (seeked) to upload_file?
# Or if upload_file expects bytes/path, checking storage.py...
# It takes `file_data` (bytes) or file-like?
# supabase-py's `upload` method handles parsing if we pass a file object.
# But we need to pass ONLY the video slice.
# So we create a generator or a sliced file object?
# Simpler: Read the slice into memory if < 1GB? Or copy to new temp file?
# Copying to new temp file is safer for memory.
video_path = temp_file_path + "_video.mp4"
with open(temp_file_path, 'rb') as src, open(video_path, 'wb') as dst:
src.seek(start_offset)
# Copy in chunks
bytes_to_copy = end_offset - start_offset
copied = 0
while copied < bytes_to_copy:
chunk_size = min(1024*1024*10, bytes_to_copy - copied) # 10MB chunks
chunk = src.read(chunk_size)
if not chunk:
break
dst.write(chunk)
copied += len(chunk)
logger.info(f"Extracted video content to {video_path}")
# 3. Upload to Supabase with user isolation
timestamp = int(time.time())
safe_name = re.sub(r'[^a-zA-Z0-9._-]', '', original_filename)
# Use user_id as the directory prefix to isolate each user's files
storage_path = f"{user_id}/{timestamp}_{safe_name}"
# Use storage service (this calls Supabase which might do its own http request)
# We read the cleaned video file
with open(video_path, 'rb') as f:
file_content = f.read() # Still reading into memory for simple upload call, but server has 32GB RAM so ok for 500MB
await storage_service.upload_file(
bucket=storage_service.BUCKET_MATERIALS,
path=storage_path,
file_data=file_content,
content_type=content_type
)
logger.info(f"Upload to Supabase complete: {storage_path}")
# Cleanup
os.remove(temp_file_path)
os.remove(video_path)
return storage_path
except Exception as e:
logger.error(f"Background upload processing failed: {e}\n{traceback.format_exc()}")
raise
router = APIRouter()
@router.post("")
async def upload_material(
request: Request,
background_tasks: BackgroundTasks,
current_user: dict = Depends(get_current_user)
):
user_id = current_user["id"]
logger.info(f"ENTERED upload_material (Streaming Mode) for user {user_id}. Headers: {request.headers}")
filename = "unknown_video.mp4" # Fallback
content_type = "video/mp4"
# Try to parse filename from header if possible (unreliable in raw stream)
# We will rely on post-processing or client hint
# Frontend sends standard multipart.
# Create temp file
timestamp = int(time.time())
temp_filename = f"upload_{timestamp}.raw"
temp_path = os.path.join("/tmp", temp_filename) # Use /tmp on Linux
# Ensure /tmp exists (it does) but verify paths
if os.name == 'nt': # Local dev
temp_path = f"d:/tmp/{temp_filename}"
os.makedirs("d:/tmp", exist_ok=True)
logger.info(f"Upload material request from user {user_id}")
try:
total_size = 0
last_log = 0
async with aiofiles.open(temp_path, 'wb') as f:
async for chunk in request.stream():
await f.write(chunk)
total_size += len(chunk)
# Log progress every 20MB
if total_size - last_log > 20 * 1024 * 1024:
logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
last_log = total_size
logger.info(f"Stream reception complete. Total size: {total_size} bytes. Saved to {temp_path}")
if total_size == 0:
raise HTTPException(400, "Received empty body")
# The client expects the new file to appear in the materials list right away,
# so processing (strip the multipart framing, then upload to Supabase) is awaited
# here instead of being deferred to a background task.
# The filename needed for the list is taken from the saved multipart headers.
# Quick extract filename from first 4kb
with open(temp_path, 'rb') as f:
head = f.read(4096).decode('utf-8', errors='ignore')
match = re.search(r'filename="([^"]+)"', head)
if match:
filename = match.group(1)
logger.info(f"Extracted filename from body: {filename}")
# Run processing sync (in await)
storage_path = await process_and_upload(temp_path, filename, content_type, user_id)
# Get signed URL (it exists now)
signed_url = await storage_service.get_signed_url(
bucket=storage_service.BUCKET_MATERIALS,
path=storage_path
)
size_mb = total_size / (1024 * 1024) # Approximate (includes headers)
# Derive the display name from storage_path
display_name = storage_path.split('/')[-1] # Drop the user_id prefix
if '_' in display_name:
parts = display_name.split('_', 1)
if parts[0].isdigit():
display_name = parts[1]
return success_response({
"id": storage_path,
"name": display_name,
"path": signed_url,
"size_mb": size_mb,
"type": "video"
})
result = await service.upload_material(request, user_id)
return success_response(result)
except ValueError as e:
raise HTTPException(400, str(e))
except Exception as e:
error_msg = f"Streaming upload failed: {str(e)}"
detail_msg = f"Exception: {repr(e)}\nArgs: {e.args}\n{traceback.format_exc()}"
logger.error(error_msg + "\n" + detail_msg)
# Write to debug file
try:
with open("debug_upload.log", "a") as logf:
logf.write(f"\n--- Error at {time.ctime()} ---\n")
logf.write(detail_msg)
logf.write("\n-----------------------------\n")
except Exception:
pass
if os.path.exists(temp_path):
try:
os.remove(temp_path)
except Exception:
pass
raise HTTPException(500, f"Upload failed. Check server logs. Error: {str(e)}")
raise HTTPException(500, f"Upload failed. Error: {str(e)}")
@router.get("")
async def list_materials(current_user: dict = Depends(get_current_user)):
user_id = current_user["id"]
try:
# List only the files under the current user's directory
files_obj = await storage_service.list_files(
bucket=storage_service.BUCKET_MATERIALS,
path=user_id
)
semaphore = asyncio.Semaphore(8)
async def build_item(f):
name = f.get('name')
if not name or name == '.emptyFolderPlaceholder':
return None
display_name = name
if '_' in name:
parts = name.split('_', 1)
if parts[0].isdigit():
display_name = parts[1]
full_path = f"{user_id}/{name}"
async with semaphore:
signed_url = await storage_service.get_signed_url(
bucket=storage_service.BUCKET_MATERIALS,
path=full_path
)
metadata = f.get('metadata', {})
size = metadata.get('size', 0)
created_at_str = f.get('created_at', '')
created_at = 0
if created_at_str:
from datetime import datetime
try:
dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
created_at = int(dt.timestamp())
except Exception:
pass
return {
"id": full_path,
"name": display_name,
"path": signed_url,
"size_mb": size / (1024 * 1024),
"type": "video",
"created_at": created_at
}
tasks = [build_item(f) for f in files_obj]
results = await asyncio.gather(*tasks, return_exceptions=True)
materials = []
for item in results:
if not item:
continue
if isinstance(item, Exception):
logger.warning(f"Material signed url build failed: {item}")
continue
materials.append(item)
materials.sort(key=lambda x: x['id'], reverse=True)
return success_response({"materials": materials})
except Exception as e:
logger.error(f"List materials failed: {e}")
return success_response({"materials": []}, message="获取素材失败")
materials = await service.list_materials(user_id)
return success_response({"materials": materials})
@router.delete("/{material_id:path}")
async def delete_material(material_id: str, current_user: dict = Depends(get_current_user)):
@router.delete("/{material_id:path}")
async def delete_material(material_id: str, current_user: dict = Depends(get_current_user)):
user_id = current_user["id"]
# Verify that material_id belongs to the current user
if not material_id.startswith(f"{user_id}/"):
raise HTTPException(403, "无权删除此素材")
try:
await storage_service.delete_file(
bucket=storage_service.BUCKET_MATERIALS,
path=material_id
)
return success_response(message="素材已删除")
except Exception as e:
raise HTTPException(500, f"删除失败: {str(e)}")
@router.put("/{material_id:path}")
async def rename_material(
material_id: str,
payload: RenameMaterialRequest,
current_user: dict = Depends(get_current_user)
):
user_id = current_user["id"]
if not material_id.startswith(f"{user_id}/"):
raise HTTPException(403, "无权重命名此素材")
new_name_raw = payload.new_name.strip() if payload.new_name else ""
if not new_name_raw:
raise HTTPException(400, "新名称不能为空")
old_name = material_id.split("/", 1)[1]
old_ext = Path(old_name).suffix
base_name = Path(new_name_raw).stem if Path(new_name_raw).suffix else new_name_raw
safe_base = sanitize_filename(base_name).strip()
if not safe_base:
raise HTTPException(400, "新名称无效")
new_filename = f"{safe_base}{old_ext}"
prefix = None
if "_" in old_name:
maybe_prefix, _ = old_name.split("_", 1)
if maybe_prefix.isdigit():
prefix = maybe_prefix
if prefix:
new_filename = f"{prefix}_{new_filename}"
new_path = f"{user_id}/{new_filename}"
try:
if new_path != material_id:
await storage_service.move_file(
bucket=storage_service.BUCKET_MATERIALS,
from_path=material_id,
to_path=new_path
)
signed_url = await storage_service.get_signed_url(
bucket=storage_service.BUCKET_MATERIALS,
path=new_path
)
display_name = new_filename
if "_" in new_filename:
parts = new_filename.split("_", 1)
if parts[0].isdigit():
display_name = parts[1]
return success_response({
"id": new_path,
"name": display_name,
"path": signed_url,
}, message="重命名成功")
except Exception as e:
raise HTTPException(500, f"重命名失败: {str(e)}")
await service.delete_material(material_id, user_id)
return success_response(message="素材已删除")
except PermissionError as e:
raise HTTPException(403, str(e))
except Exception as e:
raise HTTPException(500, f"删除失败: {str(e)}")
@router.put("/{material_id:path}")
async def rename_material(
material_id: str,
payload: RenameMaterialRequest,
current_user: dict = Depends(get_current_user)
):
user_id = current_user["id"]
try:
result = await service.rename_material(material_id, payload.new_name, user_id)
return success_response(result, message="重命名成功")
except PermissionError as e:
raise HTTPException(403, str(e))
except ValueError as e:
raise HTTPException(400, str(e))
except Exception as e:
raise HTTPException(500, f"重命名失败: {str(e)}")
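The refactored handlers above all translate service-layer exceptions into HTTP status codes in the same way: `PermissionError` becomes 403, `ValueError` becomes 400, and anything else becomes 500. A minimal sketch of that mapping as a single function follows. `status_for` is a hypothetical helper that does not appear in the diff, and it uses only the standard library so it can be tried without FastAPI.

```python
def status_for(exc: Exception) -> int:
    """Map a service-layer exception to an HTTP status code (hypothetical helper)."""
    # Ownership checks in the service raise PermissionError.
    if isinstance(exc, PermissionError):
        return 403
    # Input validation in the service raises ValueError.
    if isinstance(exc, ValueError):
        return 400
    # Everything else is treated as an internal error.
    return 500
```

A router could then call `raise HTTPException(status_for(e), str(e))` in one `except` branch, although the explicit branches in the diff keep the per-endpoint error messages.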


@@ -0,0 +1,14 @@
from pydantic import BaseModel
class RenameMaterialRequest(BaseModel):
new_name: str
class MaterialItem(BaseModel):
id: str
name: str
path: str
size_mb: float
type: str = "video"
created_at: int = 0
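`MaterialItem.created_at` is an integer, while the listing code receives the creation time from storage as an ISO 8601 string. A minimal sketch of that conversion, assuming the string ends in `Z` or carries an explicit UTC offset; `iso_to_epoch` is a hypothetical name introduced only for illustration.

```python
from datetime import datetime


def iso_to_epoch(created_at_str: str) -> int:
    """Convert an ISO 8601 timestamp to epoch seconds; return 0 if it cannot be parsed."""
    if not created_at_str:
        return 0
    try:
        # Older Python versions do not accept a trailing "Z", so rewrite it as an offset.
        dt = datetime.fromisoformat(created_at_str.replace("Z", "+00:00"))
    except ValueError:
        return 0
    return int(dt.timestamp())
```

Returning 0 on failure matches the schema default, so a malformed timestamp degrades the sort order instead of failing the whole listing.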


@@ -0,0 +1,296 @@
import re
import os
import time
import asyncio
import traceback
import aiofiles
from pathlib import Path
from loguru import logger
from app.services.storage import storage_service
def sanitize_filename(filename: str) -> str:
safe_name = re.sub(r'[<>:"/\\|?*]', '_', filename)
if len(safe_name) > 100:
ext = Path(safe_name).suffix
safe_name = safe_name[:100 - len(ext)] + ext
return safe_name
def _extract_display_name(storage_name: str) -> str:
"""Extract the display name from a storage filename (drop the timestamp prefix)."""
if '_' in storage_name:
parts = storage_name.split('_', 1)
if parts[0].isdigit():
return parts[1]
return storage_name
async def _process_and_upload(temp_file_path: str, original_filename: str, content_type: str, user_id: str) -> str:
"""Strip multipart headers and upload to Supabase, return storage_path"""
try:
logger.info(f"Processing raw upload: {temp_file_path} for user {user_id}")
file_size = os.path.getsize(temp_file_path)
with open(temp_file_path, 'rb') as f:
head = f.read(4096)
first_line_end = head.find(b'\r\n')
if first_line_end == -1:
raise Exception("Could not find boundary in multipart body")
boundary = head[:first_line_end]
logger.info(f"Detected boundary: {boundary}")
header_end = head.find(b'\r\n\r\n')
if header_end == -1:
raise Exception("Could not find end of multipart headers")
start_offset = header_end + 4
logger.info(f"Video data starts at offset: {start_offset}")
f.seek(max(0, file_size - 200))
tail = f.read()
last_boundary_pos = tail.rfind(boundary)
if last_boundary_pos != -1:
end_offset = (max(0, file_size - 200) + last_boundary_pos) - 2
else:
logger.warning("Could not find closing boundary, assuming EOF")
end_offset = file_size
logger.info(f"Video data ends at offset: {end_offset}. Total video size: {end_offset - start_offset}")
video_path = temp_file_path + "_video.mp4"
with open(temp_file_path, 'rb') as src, open(video_path, 'wb') as dst:
src.seek(start_offset)
bytes_to_copy = end_offset - start_offset
copied = 0
while copied < bytes_to_copy:
chunk_size = min(1024 * 1024 * 10, bytes_to_copy - copied)
chunk = src.read(chunk_size)
if not chunk:
break
dst.write(chunk)
copied += len(chunk)
logger.info(f"Extracted video content to {video_path}")
timestamp = int(time.time())
safe_name = re.sub(r'[^a-zA-Z0-9._-]', '', original_filename)
storage_path = f"{user_id}/{timestamp}_{safe_name}"
with open(video_path, 'rb') as f:
file_content = f.read()
await storage_service.upload_file(
bucket=storage_service.BUCKET_MATERIALS,
path=storage_path,
file_data=file_content,
content_type=content_type
)
logger.info(f"Upload to Supabase complete: {storage_path}")
os.remove(temp_file_path)
os.remove(video_path)
return storage_path
except Exception as e:
logger.error(f"Background upload processing failed: {e}\n{traceback.format_exc()}")
raise
async def upload_material(request, user_id: str) -> dict:
"""Receive a streamed upload, store it in Supabase, and return the material info."""
filename = "unknown_video.mp4"
content_type = "video/mp4"
timestamp = int(time.time())
temp_filename = f"upload_{timestamp}.raw"
temp_path = os.path.join("/tmp", temp_filename)
if os.name == 'nt':
temp_path = f"d:/tmp/{temp_filename}"
os.makedirs("d:/tmp", exist_ok=True)
try:
total_size = 0
last_log = 0
async with aiofiles.open(temp_path, 'wb') as f:
async for chunk in request.stream():
await f.write(chunk)
total_size += len(chunk)
if total_size - last_log > 20 * 1024 * 1024:
logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
last_log = total_size
logger.info(f"Stream reception complete. Total size: {total_size} bytes. Saved to {temp_path}")
if total_size == 0:
raise ValueError("Received empty body")
with open(temp_path, 'rb') as f:
head = f.read(4096).decode('utf-8', errors='ignore')
match = re.search(r'filename="([^"]+)"', head)
if match:
filename = match.group(1)
logger.info(f"Extracted filename from body: {filename}")
storage_path = await _process_and_upload(temp_path, filename, content_type, user_id)
signed_url = await storage_service.get_signed_url(
bucket=storage_service.BUCKET_MATERIALS,
path=storage_path
)
size_mb = total_size / (1024 * 1024)
display_name = _extract_display_name(storage_path.split('/')[-1])
return {
"id": storage_path,
"name": display_name,
"path": signed_url,
"size_mb": size_mb,
"type": "video"
}
except Exception as e:
error_msg = f"Streaming upload failed: {str(e)}"
detail_msg = f"Exception: {repr(e)}\nArgs: {e.args}\n{traceback.format_exc()}"
logger.error(error_msg + "\n" + detail_msg)
try:
with open("debug_upload.log", "a") as logf:
logf.write(f"\n--- Error at {time.ctime()} ---\n")
logf.write(detail_msg)
logf.write("\n-----------------------------\n")
except Exception:
pass
if os.path.exists(temp_path):
try:
os.remove(temp_path)
except Exception:
pass
raise
async def list_materials(user_id: str) -> list[dict]:
"""List all of the user's materials"""
try:
files_obj = await storage_service.list_files(
bucket=storage_service.BUCKET_MATERIALS,
path=user_id
)
semaphore = asyncio.Semaphore(8)
async def build_item(f):
name = f.get('name')
if not name or name == '.emptyFolderPlaceholder':
return None
display_name = _extract_display_name(name)
full_path = f"{user_id}/{name}"
async with semaphore:
signed_url = await storage_service.get_signed_url(
bucket=storage_service.BUCKET_MATERIALS,
path=full_path
)
metadata = f.get('metadata', {})
size = metadata.get('size', 0)
created_at_str = f.get('created_at', '')
created_at = 0
if created_at_str:
from datetime import datetime
try:
dt = datetime.fromisoformat(created_at_str.replace('Z', '+00:00'))
created_at = int(dt.timestamp())
except Exception:
pass
return {
"id": full_path,
"name": display_name,
"path": signed_url,
"size_mb": size / (1024 * 1024),
"type": "video",
"created_at": created_at
}
tasks = [build_item(f) for f in files_obj]
results = await asyncio.gather(*tasks, return_exceptions=True)
materials = []
for item in results:
if not item:
continue
if isinstance(item, Exception):
logger.warning(f"Material signed url build failed: {item}")
continue
materials.append(item)
materials.sort(key=lambda x: x['id'], reverse=True)
return materials
except Exception as e:
logger.error(f"List materials failed: {e}")
return []
async def delete_material(material_id: str, user_id: str) -> None:
"""Delete a material"""
if not material_id.startswith(f"{user_id}/"):
raise PermissionError("无权删除此素材")
await storage_service.delete_file(
bucket=storage_service.BUCKET_MATERIALS,
path=material_id
)
async def rename_material(material_id: str, new_name_raw: str, user_id: str) -> dict:
"""Rename a material and return the updated material info"""
if not material_id.startswith(f"{user_id}/"):
raise PermissionError("无权重命名此素材")
new_name_raw = new_name_raw.strip() if new_name_raw else ""
if not new_name_raw:
raise ValueError("新名称不能为空")
old_name = material_id.split("/", 1)[1]
old_ext = Path(old_name).suffix
base_name = Path(new_name_raw).stem if Path(new_name_raw).suffix else new_name_raw
safe_base = sanitize_filename(base_name).strip()
if not safe_base:
raise ValueError("新名称无效")
new_filename = f"{safe_base}{old_ext}"
prefix = None
if "_" in old_name:
maybe_prefix, _ = old_name.split("_", 1)
if maybe_prefix.isdigit():
prefix = maybe_prefix
if prefix:
new_filename = f"{prefix}_{new_filename}"
new_path = f"{user_id}/{new_filename}"
if new_path != material_id:
await storage_service.move_file(
bucket=storage_service.BUCKET_MATERIALS,
from_path=material_id,
to_path=new_path
)
signed_url = await storage_service.get_signed_url(
bucket=storage_service.BUCKET_MATERIALS,
path=new_path
)
display_name = _extract_display_name(new_filename)
return {
"id": new_path,
"name": display_name,
"path": signed_url,
}
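The offset arithmetic used by `_process_and_upload` to strip the multipart framing can be sketched on an in-memory body. This is a simplified illustration under stated assumptions, not the module's code: it assumes a single-part body, `payload_bounds` is a hypothetical name, and unlike the original it ignores a boundary match that falls before the payload start.

```python
def payload_bounds(body: bytes, tail_window: int = 200) -> tuple[int, int]:
    """Return (start, end) offsets of the file payload in a single-part multipart body."""
    first_line_end = body.find(b"\r\n")
    if first_line_end == -1:
        raise ValueError("no boundary line found")
    boundary = body[:first_line_end]  # first line of the body, e.g. b"--XYZ"

    header_end = body.find(b"\r\n\r\n")
    if header_end == -1:
        raise ValueError("no end of part headers found")
    start = header_end + 4  # skip the blank line that ends the part headers

    # Look for the closing boundary only near the end, as the original does.
    tail_start = max(0, len(body) - tail_window)
    pos = body[tail_start:].rfind(boundary)
    if pos != -1 and tail_start + pos > start:
        # The CRLF before the closing boundary is framing, not payload.
        end = tail_start + pos - 2
    else:
        end = len(body)  # no closing boundary found: assume the payload runs to EOF
    return start, end
```

Scanning only the last few hundred bytes keeps the search cheap for large videos, at the cost of assuming that nothing substantial follows the file part.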


@@ -0,0 +1,52 @@
"""
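Payment API: create order, asynchronous notification, status query
Follows the BACKEND_DEV.md convention: the router only validates parameters, calls the service, and returns a unified response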
"""
from fastapi import APIRouter, HTTPException, Request, status
from fastapi.responses import PlainTextResponse
from app.core.response import success_response
from .schemas import CreateOrderRequest, CreateOrderResponse, OrderStatusResponse
from . import service
router = APIRouter(prefix="/api/payment", tags=["支付"])
@router.post("/create-order")
async def create_payment_order(request: CreateOrderRequest):
"""Create an Alipay PC web payment order and return the cashier URL"""
try:
result = service.create_payment_order(request.payment_token)
except ValueError as e:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail=str(e))
except RuntimeError as e:
raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=str(e))
return success_response(
CreateOrderResponse(**result).model_dump()
)
@router.post("/notify")
async def payment_notify(request: Request):
"""
Alipay asynchronous notification callback
Must return the plain text "success" (not JSON); otherwise Alipay will keep resending the notification.
"""
form_data = await request.form()
verified = service.handle_payment_notify(dict(form_data))
return PlainTextResponse("success" if verified else "fail")
@router.get("/status/{out_trade_no}")
async def check_payment_status(out_trade_no: str):
"""Query the order payment status (polled by the frontend)"""
order_status = service.get_order_status(out_trade_no)
if order_status is None:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="订单不存在")
return success_response(
OrderStatusResponse(status=order_status).model_dump()
)


@@ -0,0 +1,15 @@
from pydantic import BaseModel
class CreateOrderRequest(BaseModel):
payment_token: str
class CreateOrderResponse(BaseModel):
pay_url: str
out_trade_no: str
amount: float
class OrderStatusResponse(BaseModel):
status: str


@@ -0,0 +1,137 @@
"""
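Payment business service
Responsibilities: Alipay SDK wrapper, order creation, payment notification handling, status query
Follows the BACKEND_DEV.md "thin router + thick service" principle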
"""
from datetime import datetime, timezone, timedelta
import uuid
from alipay import AliPay
from loguru import logger
from app.core.config import settings
from app.core.security import decode_payment_token
from app.repositories.orders import create_order, get_order_by_trade_no, update_order_status
from app.repositories.users import update_user
# Alipay gateway URLs
ALIPAY_GATEWAY = "https://openapi.alipay.com/gateway.do"
ALIPAY_GATEWAY_SANDBOX = "https://openapi-sandbox.dl.alipaydev.com/gateway.do"
def _get_alipay_client() -> AliPay:
"""Lazily initialize the Alipay client"""
return AliPay(
appid=settings.ALIPAY_APP_ID,
app_notify_url=settings.ALIPAY_NOTIFY_URL,
app_private_key_string=open(settings.ALIPAY_PRIVATE_KEY_PATH).read(),
alipay_public_key_string=open(settings.ALIPAY_PUBLIC_KEY_PATH).read(),
sign_type="RSA2",
debug=settings.ALIPAY_SANDBOX,
)
def _create_page_pay_url(out_trade_no: str, amount: float, subject: str) -> str | None:
"""Call alipay.trade.page.pay and return the Alipay cashier URL"""
client = _get_alipay_client()
order_string = client.api_alipay_trade_page_pay(
subject=subject,
out_trade_no=out_trade_no,
total_amount=amount,
return_url=settings.ALIPAY_RETURN_URL,
)
if not order_string:
logger.error(f"电脑网站支付下单失败: {out_trade_no}")
return None
gateway = ALIPAY_GATEWAY_SANDBOX if settings.ALIPAY_SANDBOX else ALIPAY_GATEWAY
pay_url = f"{gateway}?{order_string}"
logger.info(f"电脑网站支付下单成功: {out_trade_no}")
return pay_url
def _verify_signature(data: dict, signature: str) -> bool:
"""Verify the signature of an Alipay asynchronous notification"""
client = _get_alipay_client()
return client.verify(data, signature)
def create_payment_order(payment_token: str) -> dict:
"""
Full flow for creating a payment order
Returns: {"pay_url": str, "out_trade_no": str, "amount": float}
Raises: ValueError (invalid token), RuntimeError (API failure)
"""
user_id = decode_payment_token(payment_token)
if not user_id:
raise ValueError("付费凭证无效或已过期,请重新登录")
out_trade_no = f"VG_{int(datetime.now().timestamp())}_{uuid.uuid4().hex[:8]}"
amount = settings.PAYMENT_AMOUNT
create_order(user_id, out_trade_no, amount)
pay_url = _create_page_pay_url(out_trade_no, amount, "IPAgent 会员开通")
if not pay_url:
raise RuntimeError("创建支付订单失败,请稍后重试")
logger.info(f"用户 {user_id} 创建支付订单: {out_trade_no}")
return {"pay_url": pay_url, "out_trade_no": out_trade_no, "amount": amount}
def handle_payment_notify(form_data: dict) -> bool:
"""
Full flow for handling an Alipay asynchronous notification
Returns: True = signature verified, False = signature verification failed
"""
data = dict(form_data)
signature = data.pop("sign", "")
data.pop("sign_type", None)
if not _verify_signature(data, signature):
logger.warning(f"支付宝通知验签失败: {data.get('out_trade_no')}")
return False
out_trade_no = data.get("out_trade_no", "")
trade_status = data.get("trade_status", "")
trade_no = data.get("trade_no", "")
logger.info(f"收到支付宝通知: {out_trade_no}, status={trade_status}, trade_no={trade_no}")
if trade_status not in ("TRADE_SUCCESS", "TRADE_FINISHED"):
return True
order = get_order_by_trade_no(out_trade_no)
if not order:
logger.warning(f"订单不存在: {out_trade_no}")
return True
if order["status"] == "paid":
logger.info(f"订单已处理过: {out_trade_no}")
return True
update_order_status(out_trade_no, "paid", trade_no)
user_id = order["user_id"]
expires_at = (datetime.now(timezone.utc) + timedelta(days=settings.PAYMENT_EXPIRE_DAYS)).isoformat()
update_user(user_id, {
"is_active": True,
"role": "user",
"expires_at": expires_at,
})
logger.success(f"用户 {user_id} 支付成功,已激活,有效期至 {expires_at}")
return True
def get_order_status(out_trade_no: str) -> str | None:
"""Query the order payment status"""
order = get_order_by_trade_no(out_trade_no)
if not order:
return None
return order["status"]


@@ -1,240 +1,27 @@
"""
Reference audio management API
Supports uploading, listing, and deleting reference audio for Qwen3-TTS voice cloning
"""
"""Reference audio management API"""
from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Depends
from pydantic import BaseModel
from typing import List, Optional
from pathlib import Path
from loguru import logger
import time
import json
import subprocess
import tempfile
import os
import re
from app.core.deps import get_current_user
from app.services.storage import storage_service
from app.core.response import success_response
from app.modules.ref_audios.schemas import RenameRequest
from app.modules.ref_audios import service
router = APIRouter()
# Supported audio formats
ALLOWED_AUDIO_EXTENSIONS = {'.wav', '.mp3', '.m4a', '.webm', '.ogg', '.flac', '.aac'}
# Reference audio bucket
BUCKET_REF_AUDIOS = "ref-audios"
class RefAudioResponse(BaseModel):
id: str
name: str
path: str # signed URL for playback
ref_text: str
duration_sec: float
created_at: int
class RefAudioListResponse(BaseModel):
items: List[RefAudioResponse]
def sanitize_filename(filename: str) -> str:
"""Sanitize a filename by replacing special characters"""
safe_name = re.sub(r'[<>:"/\\|?*\s]', '_', filename)
if len(safe_name) > 50:
ext = Path(safe_name).suffix
safe_name = safe_name[:50 - len(ext)] + ext
return safe_name
def get_audio_duration(file_path: str) -> float:
"""Get the audio duration (seconds)"""
try:
result = subprocess.run(
['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
'-of', 'csv=p=0', file_path],
capture_output=True, text=True, timeout=10
)
return float(result.stdout.strip())
except Exception as e:
logger.warning(f"获取音频时长失败: {e}")
return 0.0
def convert_to_wav(input_path: str, output_path: str) -> bool:
"""Convert audio to WAV format (16 kHz, mono)"""
try:
subprocess.run([
'ffmpeg', '-y', '-i', input_path,
'-ar', '16000', # 16 kHz sample rate
'-ac', '1', # Mono
'-acodec', 'pcm_s16le', # 16-bit PCM
output_path
], capture_output=True, timeout=60, check=True)
return True
except Exception as e:
logger.error(f"音频转换失败: {e}")
return False
@router.post("")
async def upload_ref_audio(
file: UploadFile = File(...),
ref_text: str = Form(...),
ref_text: str = Form(""),
user: dict = Depends(get_current_user)
):
"""
Upload a reference audio
- file: audio file (wav, mp3, m4a, webm, etc. are supported)
- ref_text: transcript of the reference audio (required)
"""
user_id = user["id"]
if not file.filename:
raise HTTPException(status_code=400, detail="文件名无效")
filename = file.filename
# Validate the file extension
ext = Path(filename).suffix.lower()
if ext not in ALLOWED_AUDIO_EXTENSIONS:
raise HTTPException(
status_code=400,
detail=f"不支持的音频格式: {ext}。支持的格式: {', '.join(ALLOWED_AUDIO_EXTENSIONS)}"
)
# Validate ref_text
if not ref_text or len(ref_text.strip()) < 2:
raise HTTPException(status_code=400, detail="参考文字不能为空")
"""Upload a reference audio"""
try:
# Create a temporary file
with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp_input:
content = await file.read()
tmp_input.write(content)
tmp_input_path = tmp_input.name
# Convert to WAV format
tmp_wav_path = tmp_input_path + ".wav"
if ext != '.wav':
if not convert_to_wav(tmp_input_path, tmp_wav_path):
raise HTTPException(status_code=500, detail="音频格式转换失败")
else:
# Normalize the format even when the input is already WAV
convert_to_wav(tmp_input_path, tmp_wav_path)
# Get the audio duration
duration = get_audio_duration(tmp_wav_path)
if duration < 1.0:
raise HTTPException(status_code=400, detail="音频时长过短,至少需要 1 秒")
if duration > 60.0:
raise HTTPException(status_code=400, detail="音频时长过长,最多 60 秒")
# 3. Handle duplicate names (friendly display name)
original_name = filename
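# Fetch all of the user's existing reference audios (to check for filename conflicts)
# Note: listing everything scales poorly with many files, but each user has few reference audios, so it is acceptable for now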
existing_files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
existing_names = set()
# Loading every existing display name would mean fetching each metadata JSON,
# because list_files returns only storage names. The proper fix is a database
# table for metadata, but the current design has no database, so a full check
# on every upload is too expensive.
#
# Compromise: only storage keys are guaranteed to be unique. Now that the
# timestamp is hidden in the list, two uploads of "TEST.wav" would both show
# as "TEST.wav" (with different times), which is acceptable.
#
# To still give duplicates a numbered name, run a lightweight check: count the
# stored files named *_{original_name}. If 123_abc.wav and 456_abc.wav already
# exist, treat abc.wav as taken.
dup_count = 0
search_suffix = f"_{original_name}" # 比如 _test.wav
for f in existing_files:
fname = f.get('name', '')
if fname.endswith(search_suffix):
dup_count += 1
final_display_name = original_name
if dup_count > 0:
name_stem = Path(original_name).stem
name_ext = Path(original_name).suffix
final_display_name = f"{name_stem}({dup_count}){name_ext}"
# Build the storage path (unique ID)
timestamp = int(time.time())
safe_name = sanitize_filename(Path(filename).stem)
storage_path = f"{user_id}/{timestamp}_{safe_name}.wav"
# Upload the WAV file to Supabase
with open(tmp_wav_path, 'rb') as f:
wav_data = f.read()
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS,
path=storage_path,
file_data=wav_data,
content_type="audio/wav"
)
# Upload the metadata JSON
metadata = {
"ref_text": ref_text.strip(),
"original_filename": final_display_name, # Duplicate names get a numeric suffix such as (1)
"duration_sec": duration,
"created_at": timestamp
}
metadata_path = f"{user_id}/{timestamp}_{safe_name}.json"
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS,
path=metadata_path,
file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
content_type="application/json"
)
# Get the signed URL
signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)
# Clean up temporary files
os.unlink(tmp_input_path)
if os.path.exists(tmp_wav_path):
os.unlink(tmp_wav_path)
return success_response(RefAudioResponse(
id=storage_path,
name=filename,
path=signed_url,
ref_text=ref_text.strip(),
duration_sec=duration,
created_at=timestamp
).model_dump())
except HTTPException:
raise
result = await service.upload_ref_audio(file, ref_text, user["id"])
return success_response(result)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
logger.error(f"上传参考音频失败: {e}")
raise HTTPException(status_code=500, detail=f"上传失败: {str(e)}")
@@ -243,81 +30,9 @@ async def upload_ref_audio(
@router.get("")
async def list_ref_audios(user: dict = Depends(get_current_user)):
"""List all reference audios of the current user"""
user_id = user["id"]
try:
# List the files under the user's directory
files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
# Keep only .wav files
wav_files = [f for f in files if f.get("name", "").endswith(".wav")]
if not wav_files:
return success_response(RefAudioListResponse(items=[]).model_dump())
# Fetch all metadata and signed URLs concurrently
async def fetch_audio_info(f):
"""Fetch the info for one audio: metadata + signed URL"""
name = f.get("name", "")
storage_path = f"{user_id}/{name}"
metadata_name = name.replace(".wav", ".json")
metadata_path = f"{user_id}/{metadata_name}"
ref_text = ""
duration_sec = 0.0
created_at = 0
original_filename = ""
try:
# Fetch the metadata content
metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
import httpx
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(metadata_url)
if resp.status_code == 200:
metadata = resp.json()
ref_text = metadata.get("ref_text", "")
duration_sec = metadata.get("duration_sec", 0.0)
created_at = metadata.get("created_at", 0)
original_filename = metadata.get("original_filename", "")
except Exception as e:
logger.debug(f"读取 metadata 失败: {e}")
# Fall back to the timestamp in the filename
try:
created_at = int(name.split("_")[0])
except Exception:
pass
# Get the signed URL for the audio
signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)
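# Prefer the original filename (without the timestamp prefix)
display_name = original_filename if original_filename else name
# If the original filename is missing, strip the timestamp from the stored filename with a regex
if not display_name or display_name == name:
# Matches "1234567890_filename.wav"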
match = re.match(r'^\d+_(.+)$', name)
if match:
display_name = match.group(1)
return RefAudioResponse(
id=storage_path,
name=display_name,
path=signed_url,
ref_text=ref_text,
duration_sec=duration_sec,
created_at=created_at
)
# Use asyncio.gather to fetch all audio info concurrently
import asyncio
items = await asyncio.gather(*[fetch_audio_info(f) for f in wav_files])
# Sort by creation time, newest first
items = sorted(items, key=lambda x: x.created_at, reverse=True)
return success_response(RefAudioListResponse(items=items).model_dump())
result = await service.list_ref_audios(user["id"])
return success_response(result)
except Exception as e:
logger.error(f"列出参考音频失败: {e}")
raise HTTPException(status_code=500, detail=f"获取列表失败: {str(e)}")
@@ -326,96 +41,48 @@ async def list_ref_audios(user: dict = Depends(get_current_user)):
@router.delete("/{audio_id:path}")
async def delete_ref_audio(audio_id: str, user: dict = Depends(get_current_user)):
"""Delete a reference audio"""
user_id = user["id"]
# Security check: users may only delete their own files
if not audio_id.startswith(f"{user_id}/"):
raise HTTPException(status_code=403, detail="无权删除此文件")
try:
# Delete the WAV file
await storage_service.delete_file(BUCKET_REF_AUDIOS, audio_id)
# Delete the metadata JSON
metadata_path = audio_id.replace(".wav", ".json")
try:
await storage_service.delete_file(BUCKET_REF_AUDIOS, metadata_path)
except Exception:
pass # The metadata file may not exist
await service.delete_ref_audio(audio_id, user["id"])
return success_response(message="删除成功")
except PermissionError as e:
raise HTTPException(status_code=403, detail=str(e))
except Exception as e:
logger.error(f"删除参考音频失败: {e}")
raise HTTPException(status_code=500, detail=f"删除失败: {str(e)}")
class RenameRequest(BaseModel):
new_name: str
@router.put("/{audio_id:path}")
async def rename_ref_audio(
audio_id: str,
request: RenameRequest,
user: dict = Depends(get_current_user)
):
"""Rename a reference audio (updates the display name in the metadata)"""
user_id = user["id"]
# Security check
if not audio_id.startswith(f"{user_id}/"):
raise HTTPException(status_code=403, detail="无权修改此文件")
new_name = request.new_name.strip()
if not new_name:
raise HTTPException(status_code=400, detail="新名称不能为空")
# Make sure the new name has an extension (keep the given one or append .wav)
if not Path(new_name).suffix:
new_name += ".wav"
"""Rename a reference audio"""
try:
# 1. Download the existing metadata
metadata_path = audio_id.replace(".wav", ".json")
try:
# Fetch the existing JSON
import httpx
metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
if not metadata_url:
# If the JSON does not exist, a basic one has to be created
raise Exception("Metadata not found")
async with httpx.AsyncClient() as client:
resp = await client.get(metadata_url)
if resp.status_code == 200:
metadata = resp.json()
else:
raise Exception(f"Failed to fetch metadata: {resp.status_code}")
except Exception as e:
logger.warning(f"无法读取元数据: {e}, 将创建新的元数据")
# Fallback: build minimal metadata if reading fails
metadata = {
"ref_text": "", # May be lost
"duration_sec": 0.0,
"created_at": int(time.time()),
"original_filename": new_name
}
# 2. Update original_filename
metadata["original_filename"] = new_name
# 3. Overwrite the uploaded metadata
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS,
path=metadata_path,
file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
content_type="application/json"
)
return success_response({"name": new_name}, message="重命名成功")
result = await service.rename_ref_audio(audio_id, request.new_name, user["id"])
return success_response(result, message="重命名成功")
except PermissionError as e:
raise HTTPException(status_code=403, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
logger.error(f"重命名失败: {e}")
raise HTTPException(status_code=500, detail=f"重命名失败: {str(e)}")
@router.post("/{audio_id:path}/retranscribe")
async def retranscribe_ref_audio(
audio_id: str,
user: dict = Depends(get_current_user)
):
"""重新识别参考音频的文字内容"""
try:
result = await service.retranscribe_ref_audio(audio_id, user["id"])
return success_response(result, message="识别完成")
except PermissionError as e:
raise HTTPException(status_code=403, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
except Exception as e:
logger.error(f"重新识别失败: {e}")
raise HTTPException(status_code=500, detail=f"识别失败: {str(e)}")
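The handlers above all translate service-layer exceptions into HTTP status codes the same way: `PermissionError` becomes 403, `ValueError` becomes 400, and anything else becomes 500. A minimal sketch of that mapping as a single helper follows. The name `to_http_error` and the English detail string are hypothetical and do not appear in this diff; the helper only returns a `(status, detail)` pair so it can run without FastAPI.

```python
from typing import Tuple


def to_http_error(exc: Exception, action: str) -> Tuple[int, str]:
    """Map a service-layer exception to an (HTTP status, detail) pair.

    Hypothetical helper: it mirrors the except-chains in the router,
    but is not part of the code in this diff.
    """
    if isinstance(exc, PermissionError):
        # The caller does not own the object.
        return 403, str(exc)
    if isinstance(exc, ValueError):
        # Invalid input, e.g. an empty new name.
        return 400, str(exc)
    # Anything unexpected is reported as a server error.
    return 500, f"{action} failed: {exc}"
```

A route could then wrap its service call in one `try`/`except Exception` and raise `HTTPException(*to_http_error(e, "rename"))`, assuming that call shape fits the project's conventions.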


@@ -0,0 +1,19 @@
from pydantic import BaseModel
from typing import List
class RefAudioResponse(BaseModel):
id: str
name: str
path: str
ref_text: str
duration_sec: float
created_at: int
class RefAudioListResponse(BaseModel):
items: List[RefAudioResponse]
class RenameRequest(BaseModel):
new_name: str
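The fields of `RefAudioResponse` are filled from a JSON metadata sidecar stored next to each WAV object. The sketch below shows one way that mapping can look, using stdlib dataclasses instead of pydantic so it runs on its own. `RefAudioItem` and `build_item` are hypothetical names, and the timestamp fallback is applied whenever `created_at` is missing, which is a simplification of the service code.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RefAudioItem:
    """Stdlib stand-in for RefAudioResponse (same field names)."""
    id: str
    name: str
    path: str
    ref_text: str
    duration_sec: float
    created_at: int


def build_item(user_id: str, object_name: str,
               metadata: Dict[str, Any], signed_url: str) -> RefAudioItem:
    """Build a list item from an object name and its (possibly empty) metadata."""
    # Prefer the stored display name; otherwise strip the "<timestamp>_" prefix.
    display_name = metadata.get("original_filename") or object_name
    if display_name == object_name:
        match = re.match(r"^\d+_(.+)$", object_name)
        if match:
            display_name = match.group(1)

    # Fall back to the timestamp prefix of the object name when metadata lacks it.
    created_at = metadata.get("created_at", 0)
    if not created_at:
        prefix = object_name.split("_")[0]
        created_at = int(prefix) if prefix.isdigit() else 0

    return RefAudioItem(
        id=f"{user_id}/{object_name}",
        name=display_name,
        path=signed_url,
        ref_text=metadata.get("ref_text", ""),
        duration_sec=float(metadata.get("duration_sec", 0.0)),
        created_at=int(created_at),
    )
```

Keeping the display name in metadata rather than in the storage key is what lets a rename touch only the JSON sidecar.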


@@ -0,0 +1,405 @@
import re
import os
import time
import json
import hashlib
import asyncio
import subprocess
import tempfile
import unicodedata
from pathlib import Path
from typing import Optional
import httpx
from loguru import logger
from app.services.storage import storage_service
from app.modules.ref_audios.schemas import RefAudioResponse, RefAudioListResponse
ALLOWED_AUDIO_EXTENSIONS = {'.wav', '.mp3', '.m4a', '.webm', '.ogg', '.flac', '.aac'}
BUCKET_REF_AUDIOS = "ref-audios"
def sanitize_filename(filename: str) -> str:
    """Sanitize a filename for use as a Storage key (keep only ASCII-safe characters)."""
    normalized = unicodedata.normalize("NFKD", filename)
    ascii_name = normalized.encode("ascii", "ignore").decode("ascii")
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", ascii_name).strip("._-")
    # Names made up entirely of Chinese characters, emoji, etc. end up empty;
    # fall back to a stable hash to avoid an InvalidKey error.
    if not safe_name:
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()[:12]
        safe_name = f"audio_{digest}"
    if len(safe_name) > 50:
        ext = Path(safe_name).suffix
        safe_name = safe_name[:50 - len(ext)] + ext
    return safe_name
def _get_audio_duration(file_path: str) -> float:
    """Return the audio duration in seconds, or 0.0 if ffprobe fails."""
    try:
        result = subprocess.run(
            ['ffprobe', '-v', 'quiet', '-show_entries', 'format=duration',
             '-of', 'csv=p=0', file_path],
            capture_output=True, text=True, timeout=10
        )
        return float(result.stdout.strip())
    except Exception as e:
        logger.warning(f"获取音频时长失败: {e}")
        return 0.0
def _find_silence_cut_point(file_path: str, max_duration: float) -> float:
    """Find a silence point at or before max_duration to cut at; fall back to max_duration if none is found."""
    try:
        # Use silencedetect to find every silent segment (threshold -30 dB, minimum 0.3 s)
        result = subprocess.run(
            ['ffmpeg', '-i', file_path, '-af',
             'silencedetect=noise=-30dB:d=0.3', '-f', 'null', '-'],
            capture_output=True, text=True, timeout=30
        )
        # Parse the silence_end timestamps (ffmpeg writes them to stderr)
        ends = [float(m) for m in re.findall(r'silence_end:\s*([\d.]+)', result.stderr)]
        # Take the last silence end before max_duration (at least 3 seconds in)
        candidates = [t for t in ends if 3.0 <= t <= max_duration]
        if candidates:
            cut = candidates[-1]
            logger.info(f"Found silence cut point at {cut:.1f}s (max={max_duration}s)")
            return cut
    except Exception as e:
        logger.warning(f"Silence detection failed: {e}")
    return max_duration
def _convert_to_wav(input_path: str, output_path: str, max_duration: float = 0) -> bool:
    """Convert audio to WAV (16 kHz, mono), optionally keeping only the first max_duration seconds with a fade-out."""
    try:
        cmd = ['ffmpeg', '-y', '-i', input_path]
        if max_duration > 0:
            cmd += ['-t', str(max_duration)]
            # Fade out over the last 0.1 s to avoid a click at the cut
            fade_start = max(0, max_duration - 0.1)
            cmd += ['-af', f'afade=t=out:st={fade_start}:d=0.1']
        cmd += ['-ar', '16000', '-ac', '1', '-acodec', 'pcm_s16le', output_path]
        subprocess.run(cmd, capture_output=True, timeout=60, check=True)
        return True
    except Exception as e:
        logger.error(f"音频转换失败: {e}")
        return False
async def upload_ref_audio(file, ref_text: str, user_id: str) -> dict:
"""Upload a reference audio: transcode it, measure its duration, and store it in Supabase."""
if not file.filename:
raise ValueError("文件名无效")
filename = file.filename
ext = Path(filename).suffix.lower()
if ext not in ALLOWED_AUDIO_EXTENSIONS:
raise ValueError(f"不支持的音频格式: {ext}。支持的格式: {', '.join(ALLOWED_AUDIO_EXTENSIONS)}")
# 创建临时文件
with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp_input:
content = await file.read()
tmp_input.write(content)
tmp_input_path = tmp_input.name
try:
# 转换为 WAV 格式
tmp_wav_path = tmp_input_path + ".wav"
if not _convert_to_wav(tmp_input_path, tmp_wav_path):
raise RuntimeError("音频格式转换失败")
# 获取音频时长
duration = _get_audio_duration(tmp_wav_path)
if duration < 1.0:
raise ValueError("音频时长过短,至少需要 1 秒")
# Clips longer than 10 seconds are trimmed automatically at a silence point (CosyVoice works best with 3-10 seconds)
MAX_REF_DURATION = 10.0
if duration > MAX_REF_DURATION:
cut_point = _find_silence_cut_point(tmp_wav_path, MAX_REF_DURATION)
logger.info(f"Ref audio {duration:.1f}s > {MAX_REF_DURATION}s, trimming at {cut_point:.1f}s")
trimmed_path = tmp_input_path + "_trimmed.wav"
if not _convert_to_wav(tmp_wav_path, trimmed_path, max_duration=cut_point):
raise RuntimeError("音频截取失败")
os.unlink(tmp_wav_path)
tmp_wav_path = trimmed_path
duration = _get_audio_duration(tmp_wav_path)
# 自动转写参考音频内容
try:
from app.services.whisper_service import whisper_service
transcribed = await whisper_service.transcribe(tmp_wav_path)
if transcribed.strip():
ref_text = transcribed.strip()
logger.info(f"Auto-transcribed ref audio: {ref_text[:50]}...")
except Exception as e:
logger.warning(f"Auto-transcribe failed: {e}")
if not ref_text or not ref_text.strip():
raise ValueError("无法识别音频内容,请确保音频包含清晰的语音")
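# Check for duplicate names. Stored objects are named "{timestamp}_{safe_name}.wav",
# so match on the sanitized stem rather than the raw upload filename.
existing_files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
dup_count = 0
search_suffix = f"_{sanitize_filename(Path(filename).stem)}.wav"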
for f in existing_files:
fname = f.get('name', '')
if fname.endswith(search_suffix):
dup_count += 1
final_display_name = filename
if dup_count > 0:
name_stem = Path(filename).stem
name_ext = Path(filename).suffix
final_display_name = f"{name_stem}({dup_count}){name_ext}"
# 生成存储路径
timestamp = int(time.time())
safe_name = sanitize_filename(Path(filename).stem)
storage_path = f"{user_id}/{timestamp}_{safe_name}.wav"
# 上传 WAV 文件
with open(tmp_wav_path, 'rb') as f:
wav_data = f.read()
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS,
path=storage_path,
file_data=wav_data,
content_type="audio/wav"
)
# 上传元数据 JSON
metadata = {
"ref_text": ref_text.strip(),
"original_filename": final_display_name,
"duration_sec": duration,
"created_at": timestamp
}
metadata_path = f"{user_id}/{timestamp}_{safe_name}.json"
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS,
path=metadata_path,
file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
content_type="application/json"
)
# 获取签名 URL
signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)
return RefAudioResponse(
id=storage_path,
name=final_display_name,
path=signed_url,
ref_text=ref_text.strip(),
duration_sec=duration,
created_at=timestamp
).model_dump()
finally:
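# Remove every temp file this request may have created, including the trimmed copy.
for tmp_path in (tmp_input_path, tmp_input_path + ".wav", tmp_input_path + "_trimmed.wav"):
if os.path.exists(tmp_path):
os.unlink(tmp_path)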
async def list_ref_audios(user_id: str) -> dict:
"""List all reference audios that belong to the user."""
files = await storage_service.list_files(BUCKET_REF_AUDIOS, user_id)
wav_files = [f for f in files if f.get("name", "").endswith(".wav")]
if not wav_files:
return RefAudioListResponse(items=[]).model_dump()
async def fetch_audio_info(f):
name = f.get("name", "")
storage_path = f"{user_id}/{name}"
metadata_name = name.replace(".wav", ".json")
metadata_path = f"{user_id}/{metadata_name}"
ref_text = ""
duration_sec = 0.0
created_at = 0
original_filename = ""
try:
metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(metadata_url)
if resp.status_code == 200:
metadata = resp.json()
ref_text = metadata.get("ref_text", "")
duration_sec = metadata.get("duration_sec", 0.0)
created_at = metadata.get("created_at", 0)
original_filename = metadata.get("original_filename", "")
except Exception as e:
logger.debug(f"读取 metadata 失败: {e}")
try:
created_at = int(name.split("_")[0])
except ValueError:
pass
signed_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, storage_path)
display_name = original_filename if original_filename else name
if not display_name or display_name == name:
match = re.match(r'^\d+_(.+)$', name)
if match:
display_name = match.group(1)
return RefAudioResponse(
id=storage_path,
name=display_name,
path=signed_url,
ref_text=ref_text,
duration_sec=duration_sec,
created_at=created_at
)
items = await asyncio.gather(*[fetch_audio_info(f) for f in wav_files])
items = sorted(items, key=lambda x: x.created_at, reverse=True)
return RefAudioListResponse(items=items).model_dump()
async def delete_ref_audio(audio_id: str, user_id: str) -> None:
"""Delete a reference audio together with its metadata."""
if not audio_id.startswith(f"{user_id}/"):
raise PermissionError("无权删除此文件")
await storage_service.delete_file(BUCKET_REF_AUDIOS, audio_id)
metadata_path = audio_id.replace(".wav", ".json")
try:
await storage_service.delete_file(BUCKET_REF_AUDIOS, metadata_path)
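except Exception as e:
# A missing metadata file is not fatal; log it instead of silently swallowing the error.
logger.debug(f"Failed to delete metadata {metadata_path}: {e}")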
async def rename_ref_audio(audio_id: str, new_name: str, user_id: str) -> dict:
"""Rename a reference audio (updates the display name stored in its metadata)."""
if not audio_id.startswith(f"{user_id}/"):
raise PermissionError("无权修改此文件")
new_name = new_name.strip()
if not new_name:
raise ValueError("新名称不能为空")
if not Path(new_name).suffix:
new_name += ".wav"
# 下载现有 metadata
metadata_path = audio_id.replace(".wav", ".json")
try:
metadata_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
async with httpx.AsyncClient() as client:
resp = await client.get(metadata_url)
if resp.status_code == 200:
metadata = resp.json()
else:
raise Exception(f"Failed to fetch metadata: {resp.status_code}")
except Exception as e:
logger.warning(f"无法读取元数据: {e}, 将创建新的元数据")
metadata = {
"ref_text": "",
"duration_sec": 0.0,
"created_at": int(time.time()),
"original_filename": new_name
}
# 更新并覆盖上传
metadata["original_filename"] = new_name
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS,
path=metadata_path,
file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
content_type="application/json"
)
return {"name": new_name}
async def retranscribe_ref_audio(audio_id: str, user_id: str) -> dict:
"""Re-transcribe the ref_text of a reference audio, trimming it to the first 10 seconds and re-uploading it (used to migrate legacy data)."""
if not audio_id.startswith(f"{user_id}/"):
raise PermissionError("无权修改此文件")
# 下载音频到临时文件
audio_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, audio_id)
tmp_wav_path = None
trimmed_path = None
try:
with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
tmp_wav_path = tmp.name
timeout = httpx.Timeout(None)
async with httpx.AsyncClient(timeout=timeout) as client:
async with client.stream("GET", audio_url) as resp:
resp.raise_for_status()
async for chunk in resp.aiter_bytes():
tmp.write(chunk)
# 超过 10 秒则截取前 10 秒并重新上传音频
MAX_REF_DURATION = 10.0
duration = _get_audio_duration(tmp_wav_path)
transcribe_path = tmp_wav_path
need_reupload = False
if duration > MAX_REF_DURATION:
cut_point = _find_silence_cut_point(tmp_wav_path, MAX_REF_DURATION)
logger.info(f"Retranscribe: trimming {audio_id} from {duration:.1f}s at {cut_point:.1f}s")
trimmed_path = tmp_wav_path + "_trimmed.wav"
if _convert_to_wav(tmp_wav_path, trimmed_path, max_duration=cut_point):
transcribe_path = trimmed_path
duration = _get_audio_duration(trimmed_path)
need_reupload = True
# Whisper 转写
from app.services.whisper_service import whisper_service
transcribed = await whisper_service.transcribe(transcribe_path)
if not transcribed or not transcribed.strip():
raise ValueError("无法识别音频内容")
ref_text = transcribed.strip()
logger.info(f"Re-transcribed ref audio {audio_id}: {ref_text[:50]}...")
# 截取过的音频重新上传覆盖原文件
if need_reupload and trimmed_path:
with open(trimmed_path, "rb") as f:
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS, path=audio_id,
file_data=f.read(), content_type="audio/wav",
)
logger.info(f"Re-uploaded trimmed audio: {audio_id} ({duration:.1f}s)")
# 更新 metadata
metadata_path = audio_id.replace(".wav", ".json")
try:
meta_url = await storage_service.get_signed_url(BUCKET_REF_AUDIOS, metadata_path)
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(meta_url)
if resp.status_code == 200:
metadata = resp.json()
else:
raise Exception(f"status {resp.status_code}")
except Exception:
metadata = {}
metadata["ref_text"] = ref_text
metadata["duration_sec"] = duration
await storage_service.upload_file(
bucket=BUCKET_REF_AUDIOS,
path=metadata_path,
file_data=json.dumps(metadata, ensure_ascii=False).encode('utf-8'),
content_type="application/json"
)
return {"ref_text": ref_text, "duration_sec": duration}
finally:
if tmp_wav_path and os.path.exists(tmp_wav_path):
os.unlink(tmp_wav_path)
if trimmed_path and os.path.exists(trimmed_path):
os.unlink(trimmed_path)
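The trimming step in this module relies on parsing `silence_end` timestamps from ffmpeg's `silencedetect` output and choosing the last one that falls between 3 seconds and the maximum duration. Below is a minimal sketch of just that selection logic, separated from the subprocess call so it can be exercised on its own. The function name `pick_cut_point` and the sample log text are illustrative assumptions, not part of the diff.

```python
import re


def pick_cut_point(ffmpeg_stderr: str, max_duration: float,
                   min_duration: float = 3.0) -> float:
    """Pick the latest silence end within [min_duration, max_duration].

    Falls back to max_duration when no silence ends inside that window,
    mirroring the fallback in _find_silence_cut_point.
    """
    ends = [float(m) for m in re.findall(r"silence_end:\s*([\d.]+)", ffmpeg_stderr)]
    candidates = [t for t in ends if min_duration <= t <= max_duration]
    return max(candidates) if candidates else max_duration


# Illustrative log text; real ffmpeg output carries more detail per line.
SAMPLE_LOG = """\
[silencedetect] silence_start: 1.2
[silencedetect] silence_end: 1.9 | silence_duration: 0.7
[silencedetect] silence_end: 8.45 | silence_duration: 0.5
[silencedetect] silence_end: 12.3 | silence_duration: 0.4
"""

if __name__ == "__main__":
    print(pick_cut_point(SAMPLE_LOG, max_duration=10.0))
```

Cutting at a silence boundary rather than at a hard 10-second mark keeps the reference clip from ending mid-word; the 0.1-second fade-out in `_convert_to_wav` then smooths whatever remains at the cut.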


@@ -1,417 +1,33 @@
from fastapi import APIRouter, UploadFile, File, Form, HTTPException
from typing import Optional, Any, cast
import asyncio
import shutil
import os
import time
from pathlib import Path
from loguru import logger
from typing import Optional
import traceback
import re
import json
import requests
from urllib.parse import unquote
from loguru import logger
from app.services.whisper_service import whisper_service
from app.services.glm_service import glm_service
from app.core.response import success_response
from app.modules.tools import service
router = APIRouter()
@router.post("/extract-script")
async def extract_script_tool(
file: Optional[UploadFile] = File(None),
url: Optional[str] = Form(None),
rewrite: bool = Form(True)
rewrite: bool = Form(True),
custom_prompt: Optional[str] = Form(None)
):
"""
独立文案提取工具
支持上传视频/音频 OR 输入视频链接 -> 提取文字 -> (可选) AI洗稿
"""
if not file and not url:
raise HTTPException(400, "必须提供文件或视频链接")
temp_path = None
"""独立文案提取工具"""
try:
timestamp = int(time.time())
temp_dir = Path("/tmp")
if os.name == 'nt':
temp_dir = Path("d:/tmp")
temp_dir.mkdir(parents=True, exist_ok=True)
# 1. 获取/保存文件
loop = asyncio.get_event_loop()
if file:
filename = file.filename
if not filename:
raise HTTPException(400, "文件名无效")
safe_filename = Path(filename).name.replace(" ", "_")
temp_path = temp_dir / f"tool_extract_{timestamp}_{safe_filename}"
# 文件 I/O 放入线程池
await loop.run_in_executor(None, lambda: shutil.copyfileobj(file.file, open(temp_path, "wb")))
logger.info(f"Tool processing upload file: {temp_path}")
else:
if not url:
raise HTTPException(400, "必须提供视频链接")
url_value: str = url
# URL 下载逻辑
# 自动提取文案中的链接 (支持 Douyin/Bilibili 等分享文案)
url_match = re.search(r'https?://[^\s]+', url_value)
if url_match:
extracted_url = url_match.group(0)
logger.info(f"Extracted URL from text: {extracted_url}")
url_value = extracted_url
logger.info(f"Tool downloading URL: {url_value}")
# 封装 yt-dlp 下载函数 (Blocking)
def _download_yt_dlp():
import yt_dlp
logger.info("Attempting download with yt-dlp...")
ydl_opts = {
'format': 'bestaudio/best',
'outtmpl': str(temp_dir / f"tool_download_{timestamp}_%(id)s.%(ext)s"),
'quiet': True,
'no_warnings': True,
'http_headers': {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
'Referer': 'https://www.douyin.com/',
}
}
with yt_dlp.YoutubeDL() as ydl_raw:
ydl: Any = ydl_raw
ydl.params.update(ydl_opts)
info = ydl.extract_info(url_value, download=True)
if 'requested_downloads' in info:
downloaded_file = info['requested_downloads'][0]['filepath']
else:
ext = info.get('ext', 'mp4')
id = info.get('id')
downloaded_file = str(temp_dir / f"tool_download_{timestamp}_{id}.{ext}")
return Path(downloaded_file)
# 先尝试 yt-dlp (Run in Executor)
try:
temp_path = await loop.run_in_executor(None, _download_yt_dlp)
logger.info(f"yt-dlp downloaded to: {temp_path}")
except Exception as e:
logger.warning(f"yt-dlp download failed: {e}. Trying manual Douyin fallback...")
# 失败则尝试手动解析 (Douyin Fallback)
if "douyin" in url_value:
manual_path = await download_douyin_manual(url_value, temp_dir, timestamp)
if manual_path:
temp_path = manual_path
logger.info(f"Manual Douyin fallback successful: {temp_path}")
else:
raise HTTPException(400, f"视频下载失败。yt-dlp 报错: {str(e)}")
elif "bilibili" in url_value:
manual_path = await download_bilibili_manual(url_value, temp_dir, timestamp)
if manual_path:
temp_path = manual_path
logger.info(f"Manual Bilibili fallback successful: {temp_path}")
else:
raise HTTPException(400, f"视频下载失败。yt-dlp 报错: {str(e)}")
else:
raise HTTPException(400, f"视频下载失败: {str(e)}")
if not temp_path or not temp_path.exists():
raise HTTPException(400, "文件获取失败")
# 1.5 安全转换: 强制转为 WAV (16k)
import subprocess
audio_path = temp_dir / f"extract_audio_{timestamp}.wav"
def _convert_audio():
try:
convert_cmd = [
'ffmpeg',
'-i', str(temp_path),
'-vn', # 忽略视频
'-acodec', 'pcm_s16le',
'-ar', '16000', # Whisper 推荐采样率
'-ac', '1', # 单声道
'-y', # 覆盖
str(audio_path)
]
# 捕获 stderr
subprocess.run(convert_cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
return True
except subprocess.CalledProcessError as e:
error_log = e.stderr.decode('utf-8', errors='ignore') if e.stderr else str(e)
logger.error(f"FFmpeg check/convert failed: {error_log}")
# 检查是否为 HTML
head = b""
try:
with open(temp_path, 'rb') as f:
head = f.read(100)
except: pass
if b'<!DOCTYPE html' in head or b'<html' in head:
raise ValueError("HTML_DETECTED")
raise ValueError("CONVERT_FAILED")
# 执行转换 (Run in Executor)
try:
await loop.run_in_executor(None, _convert_audio)
logger.info(f"Converted to WAV: {audio_path}")
target_path = audio_path
except ValueError as ve:
if str(ve) == "HTML_DETECTED":
raise HTTPException(400, "下载的文件是网页而非视频,请重试或手动上传。")
else:
raise HTTPException(400, "下载的文件已损坏或格式无法识别。")
# 2. 提取文案 (Whisper)
script = await whisper_service.transcribe(str(target_path))
# 3. AI 洗稿 (GLM)
rewritten = None
if rewrite:
if script and len(script.strip()) > 0:
logger.info("Rewriting script...")
rewritten = await glm_service.rewrite_script(script)
else:
logger.warning("No script extracted, skipping rewrite")
return success_response({
"original_script": script,
"rewritten_script": rewritten
})
except HTTPException as he:
raise he
result = await service.extract_script(file=file, url=url, rewrite=rewrite, custom_prompt=custom_prompt)
return success_response(result)
except ValueError as e:
raise HTTPException(400, str(e))
except HTTPException:
raise
except Exception as e:
logger.error(f"Tool extract failed: {e}")
logger.error(traceback.format_exc())
# Friendly error message
msg = str(e)
if "Fresh cookies" in msg:
msg = "下载失败:目标平台开启了反爬验证,请过段时间重试或直接上传视频文件。"
raise HTTPException(500, f"提取失败: {msg}")
finally:
# 清理临时文件
if temp_path and temp_path.exists():
try:
os.remove(temp_path)
logger.info(f"Cleaned up temp file: {temp_path}")
except Exception as e:
logger.warning(f"Failed to cleanup temp file {temp_path}: {e}")
async def download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
"""
手动下载抖音视频 (Fallback logic - Ported from SuperIPAgent/douyinDownloader)
使用特定的 User Profile URL 和硬编码 Cookie 绕过反爬
"""
import httpx
logger.info(f"[SuperIPAgent] Starting download for: {url}")
try:
# 1. 提取 Modal ID (支持短链跳转)
headers = {
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
}
# 如果是短链或重定向 - 使用异步 httpx
async with httpx.AsyncClient(follow_redirects=True, timeout=10.0) as client:
resp = await client.get(url, headers=headers)
final_url = str(resp.url)
logger.info(f"[SuperIPAgent] Final URL: {final_url}")
modal_id = None
match = re.search(r'/video/(\d+)', final_url)
if match:
modal_id = match.group(1)
if not modal_id:
logger.error("[SuperIPAgent] Could not extract modal_id")
return None
logger.info(f"[SuperIPAgent] Extracted modal_id: {modal_id}")
# 2. 构造特定请求 URL (Copy from SuperIPAgent)
# 使用特定用户的 Profile 页 + modal_id 参数,配合特定 Cookie
target_url = f"https://www.douyin.com/user/MS4wLjABAAAAN_s_hups7LD0N4qnrM3o2gI0vuG3pozNaEolz2_py3cHTTrpVr1Z4dukFD9SOlwY?from_tab_name=main&modal_id={modal_id}"
# 3. 使用配置的 Cookie (从环境变量 DOUYIN_COOKIE 读取)
from app.core.config import settings
if not settings.DOUYIN_COOKIE:
logger.warning("[SuperIPAgent] DOUYIN_COOKIE 未配置,视频下载可能失败")
headers_with_cookie = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"cookie": settings.DOUYIN_COOKIE,
"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36",
}
logger.info(f"[SuperIPAgent] Requesting page with Cookie...")
async with httpx.AsyncClient(timeout=10.0) as client:
response = await client.get(target_url, headers=headers_with_cookie)
# 4. 解析 RENDER_DATA
content_match = re.findall(r'<script id="RENDER_DATA" type="application/json">(.*?)</script>', response.text)
if not content_match:
# 尝试解码后再查找?或者结构变了
# 再尝试找 SSR_HYDRATED_DATA
if "SSR_HYDRATED_DATA" in response.text:
content_match = re.findall(r'<script id="SSR_HYDRATED_DATA" type="application/json">(.*?)</script>', response.text)
if not content_match:
logger.error(f"[SuperIPAgent] Could not find RENDER_DATA in page (len={len(response.text)})")
return None
content = unquote(content_match[0])
try:
data = json.loads(content)
except:
logger.error("[SuperIPAgent] JSON decode failed")
return None
# 5. 提取视频流
video_url = None
try:
# 路径通常是: app -> videoDetail -> video -> bitRateList -> playAddr -> src
if "app" in data and "videoDetail" in data["app"]:
info = data["app"]["videoDetail"]["video"]
if "bitRateList" in info and info["bitRateList"]:
video_url = info["bitRateList"][0]["playAddr"][0]["src"]
elif "playAddr" in info and info["playAddr"]:
video_url = info["playAddr"][0]["src"]
except Exception as e:
logger.error(f"[SuperIPAgent] Path extraction failed: {e}")
if not video_url:
logger.error("[SuperIPAgent] No video_url found")
return None
if video_url.startswith("//"):
video_url = "https:" + video_url
logger.info(f"[SuperIPAgent] Found video URL: {video_url[:50]}...")
# 6. 下载 (带 Header) - 使用异步 httpx
temp_path = temp_dir / f"douyin_manual_{timestamp}.mp4"
download_headers = {
'Referer': 'https://www.douyin.com/',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
}
async with httpx.AsyncClient(timeout=60.0) as client:
async with client.stream("GET", video_url, headers=download_headers) as dl_resp:
if dl_resp.status_code == 200:
with open(temp_path, 'wb') as f:
async for chunk in dl_resp.aiter_bytes(chunk_size=8192):
f.write(chunk)
logger.info(f"[SuperIPAgent] Downloaded successfully: {temp_path}")
return temp_path
else:
logger.error(f"[SuperIPAgent] Download failed: {dl_resp.status_code}")
return None
except Exception as e:
logger.error(f"[SuperIPAgent] Logic failed: {e}")
return None
async def download_bilibili_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
"""
手动下载 Bilibili 视频 (Fallback logic - Playwright Version)
B站通常音视频分离这里只提取音频即可因为只需要文案
"""
from playwright.async_api import async_playwright
logger.info(f"[Playwright] Starting Bilibili download for: {url}")
playwright = None
browser = None
try:
playwright = await async_playwright().start()
# Launch browser (ensure chromium is installed: playwright install chromium)
browser = await playwright.chromium.launch(headless=True, args=['--no-sandbox', '--disable-setuid-sandbox'])
# Mobile User Agent often gives single stream?
# But Bilibili mobile web is tricky. Desktop is fine.
context = await browser.new_context(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
page = await context.new_page()
# Intercept audio responses?
# Bilibili streams are usually .m4s
# But finding the initial state is easier.
logger.info("[Playwright] Navigating to Bilibili...")
await page.goto(url, timeout=45000)
# Wait for video element (triggers loading)
try:
await page.wait_for_selector('video', timeout=15000)
except:
logger.warning("[Playwright] Video selector timeout")
# 1. Try extracting from __playinfo__
# window.__playinfo__ contains dash streams
playinfo = await page.evaluate("window.__playinfo__")
audio_url = None
if playinfo and "data" in playinfo and "dash" in playinfo["data"]:
dash = playinfo["data"]["dash"]
if "audio" in dash and dash["audio"]:
audio_url = dash["audio"][0]["baseUrl"]
logger.info(f"[Playwright] Found audio stream in __playinfo__: {audio_url[:50]}...")
# 2. If playinfo fails, try extracting video src (sometimes it's a blob, which we can't fetch easily without interception)
# But interception is complex. Let's try requests with Referer if we have URL.
if not audio_url:
logger.warning("[Playwright] Could not find audio in __playinfo__")
return None
# Download the audio stream
temp_path = temp_dir / f"bilibili_audio_{timestamp}.m4s" # usually m4s
try:
api_request = context.request
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Referer": "https://www.bilibili.com/"
}
logger.info(f"[Playwright] Downloading audio stream...")
response = await api_request.get(audio_url, headers=headers)
if response.status == 200:
body = await response.body()
with open(temp_path, 'wb') as f:
f.write(body)
logger.info(f"[Playwright] Downloaded successfully: {temp_path}")
return temp_path
else:
logger.error(f"[Playwright] API Request failed: {response.status}")
return None
except Exception as e:
logger.error(f"[Playwright] Download logic error: {e}")
return None
except Exception as e:
logger.error(f"[Playwright] Bilibili download failed: {e}")
return None
finally:
if browser:
await browser.close()
if playwright:
await playwright.stop()
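The URL branch above accepts whole share messages, not just bare links, and pulls the first `http(s)` URL out of the pasted text before downloading. A minimal sketch of that extraction is shown here. `extract_first_url` is a hypothetical name and the sample share text uses a placeholder domain.

```python
import re


def extract_first_url(text: str) -> str:
    """Return the first http(s) URL found in text, or text unchanged if none.

    Uses the same pattern as the handler: the URL runs until the next
    whitespace character.
    """
    match = re.search(r"https?://[^\s]+", text)
    return match.group(0) if match else text


if __name__ == "__main__":
    share_text = "7.43 复制打开 https://example.invalid/v/abc123/ 看看这个视频"
    print(extract_first_url(share_text))
```

Because the pattern stops only at whitespace, any text glued directly to the link without a space would be captured as part of the URL; share messages copied from the apps normally put a space after the link.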


@@ -0,0 +1,7 @@
from pydantic import BaseModel
from typing import Optional
class ExtractScriptResponse(BaseModel):
original_script: Optional[str] = None
rewritten_script: Optional[str] = None


@@ -0,0 +1,354 @@
import asyncio
import os
import re
import json
import time
import shutil
import subprocess
import traceback
from pathlib import Path
from typing import Optional, Any
from urllib.parse import unquote
import httpx
from loguru import logger
from app.services.whisper_service import whisper_service
from app.services.glm_service import glm_service
async def extract_script(file=None, url: Optional[str] = None, rewrite: bool = True, custom_prompt: Optional[str] = None) -> dict:
"""
Script extraction: uploaded file or video link -> Whisper transcription -> (optional) GLM rewrite
"""
if not file and not url:
raise ValueError("必须提供文件或视频链接")
temp_path = None
try:
timestamp = int(time.time())
temp_dir = Path("/tmp")
if os.name == 'nt':
temp_dir = Path("d:/tmp")
temp_dir.mkdir(parents=True, exist_ok=True)
loop = asyncio.get_running_loop()
# 1. 获取/保存文件
if file:
filename = file.filename
if not filename:
raise ValueError("文件名无效")
safe_filename = Path(filename).name.replace(" ", "_")
temp_path = temp_dir / f"tool_extract_{timestamp}_{safe_filename}"
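def _save_upload() -> None:
# Close the destination handle deterministically instead of leaking it.
with open(temp_path, "wb") as out:
shutil.copyfileobj(file.file, out)
await loop.run_in_executor(None, _save_upload)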
logger.info(f"Tool processing upload file: {temp_path}")
else:
temp_path = await _download_video(url, temp_dir, timestamp)
if not temp_path or not temp_path.exists():
raise ValueError("文件获取失败")
# 1.5 安全转换: 强制转为 WAV (16k)
audio_path = temp_dir / f"extract_audio_{timestamp}.wav"
try:
await loop.run_in_executor(None, lambda: _convert_to_wav(temp_path, audio_path))
logger.info(f"Converted to WAV: {audio_path}")
except ValueError as ve:
if str(ve) == "HTML_DETECTED":
raise ValueError("下载的文件是网页而非视频,请重试或手动上传。")
else:
raise ValueError("下载的文件已损坏或格式无法识别。")
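# 2. Transcribe with Whisper, then remove the intermediate WAV so it does not pile up in temp_dir
try:
script = await whisper_service.transcribe(str(audio_path))
finally:
audio_path.unlink(missing_ok=True)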
# 3. AI 改写 (GLM) — 失败时降级返回原文
rewritten = None
if rewrite and script and len(script.strip()) > 0:
logger.info("Rewriting script...")
try:
rewritten = await glm_service.rewrite_script(script, custom_prompt)
except Exception as e:
logger.warning(f"GLM rewrite failed, returning original script: {e}")
rewritten = None
return {
"original_script": script,
"rewritten_script": rewritten
}
finally:
if temp_path and temp_path.exists():
try:
os.remove(temp_path)
logger.info(f"Cleaned up temp file: {temp_path}")
except Exception as e:
logger.warning(f"Failed to cleanup temp file {temp_path}: {e}")
def _convert_to_wav(input_path: Path, output_path: Path) -> None:
"""Convert the input to 16 kHz mono WAV with FFmpeg."""
try:
convert_cmd = [
'ffmpeg',
'-i', str(input_path),
'-vn',
'-acodec', 'pcm_s16le',
'-ar', '16000',
'-ac', '1',
'-y',
str(output_path)
]
subprocess.run(convert_cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
except subprocess.CalledProcessError as e:
error_log = e.stderr.decode('utf-8', errors='ignore') if e.stderr else str(e)
logger.error(f"FFmpeg check/convert failed: {error_log}")
head = b""
try:
with open(input_path, 'rb') as f:
head = f.read(100)
except OSError:
pass
if b'<!DOCTYPE html' in head or b'<html' in head:
raise ValueError("HTML_DETECTED")
raise ValueError("CONVERT_FAILED")
async def _download_video(url: str, temp_dir: Path, timestamp: int) -> Path:
"""Download a video (yt-dlp first, falling back to manual parsing on failure)."""
url_value = url
url_match = re.search(r'https?://[^\s]+', url_value)
if url_match:
extracted_url = url_match.group(0)
logger.info(f"Extracted URL from text: {extracted_url}")
url_value = extracted_url
logger.info(f"Tool downloading URL: {url_value}")
loop = asyncio.get_running_loop()
# Try yt-dlp first
try:
temp_path = await loop.run_in_executor(None, lambda: _download_yt_dlp(url_value, temp_dir, timestamp))
logger.info(f"yt-dlp downloaded to: {temp_path}")
return temp_path
except Exception as e:
logger.warning(f"yt-dlp download failed: {e}. Trying manual fallback...")
if "douyin" in url_value:
manual_path = await _download_douyin_manual(url_value, temp_dir, timestamp)
if manual_path:
return manual_path
raise ValueError(f"视频下载失败。yt-dlp 报错: {str(e)}")
elif "bilibili" in url_value:
manual_path = await _download_bilibili_manual(url_value, temp_dir, timestamp)
if manual_path:
return manual_path
raise ValueError(f"视频下载失败。yt-dlp 报错: {str(e)}")
else:
raise ValueError(f"视频下载失败: {str(e)}")
def _download_yt_dlp(url_value: str, temp_dir: Path, timestamp: int) -> Path:
"""Download with yt-dlp (blocking call; run it in a thread pool)."""
import yt_dlp
logger.info("Attempting download with yt-dlp...")
ydl_opts = {
'format': 'bestaudio/best',
'outtmpl': str(temp_dir / f"tool_download_{timestamp}_%(id)s.%(ext)s"),
'quiet': True,
'no_warnings': True,
'http_headers': {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
'Referer': 'https://www.douyin.com/',
}
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
info = ydl.extract_info(url_value, download=True)
if 'requested_downloads' in info:
downloaded_file = info['requested_downloads'][0]['filepath']
else:
ext = info.get('ext', 'mp4')
vid_id = info.get('id')
downloaded_file = str(temp_dir / f"tool_download_{timestamp}_{vid_id}.{ext}")
return Path(downloaded_file)
async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
"""手动下载抖音视频 (Fallback) — 通过移动端分享页获取播放地址"""
logger.info(f"[douyin-fallback] Starting download for: {url}")
try:
# 1. 解析短链接,提取视频 ID
headers = {
"user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15"
}
async with httpx.AsyncClient(follow_redirects=True, timeout=10.0) as client:
resp = await client.get(url, headers=headers)
final_url = str(resp.url)
logger.info(f"[douyin-fallback] Final URL: {final_url}")
video_id = None
match = re.search(r'/video/(\d+)', final_url)
if match:
video_id = match.group(1)
if not video_id:
logger.error("[douyin-fallback] Could not extract video_id")
return None
logger.info(f"[douyin-fallback] Extracted video_id: {video_id}")
# 2. 获取新鲜 ttwid
ttwid = ""
try:
async with httpx.AsyncClient(timeout=10.0) as client:
ttwid_resp = await client.post(
"https://ttwid.bytedance.com/ttwid/union/register/",
json={
"region": "cn", "aid": 6383, "needFid": False,
"service": "www.douyin.com",
"migrate_info": {"ticket": "", "source": "node"},
"cbUrlProtocol": "https", "union": True,
}
)
ttwid = ttwid_resp.cookies.get("ttwid", "")
logger.info(f"[douyin-fallback] Got fresh ttwid (len={len(ttwid)})")
except Exception as e:
logger.warning(f"[douyin-fallback] Failed to get ttwid: {e}")
# 3. Load the mobile share page and extract the play URL
page_headers = {
"user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
"cookie": f"ttwid={ttwid}" if ttwid else "",
}
async with httpx.AsyncClient(follow_redirects=True, timeout=15.0) as client:
page_resp = await client.get(
f"https://m.douyin.com/share/video/{video_id}",
headers=page_headers,
)
page_text = page_resp.text
logger.info(f"[douyin-fallback] Mobile page length: {len(page_text)}")
# 4. Extract play_addr
addr_match = re.search(
r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"',
page_text,
)
if not addr_match:
logger.error("[douyin-fallback] Could not find play_addr in mobile page")
return None
video_url = addr_match.group(2).replace(r"\u002F", "/")
if video_url.startswith("//"):
video_url = "https:" + video_url
logger.info(f"[douyin-fallback] Found video URL: {video_url[:80]}...")
# 5. Download the video
temp_path = temp_dir / f"douyin_manual_{timestamp}.mp4"
download_headers = {
"Referer": "https://www.douyin.com/",
"User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15",
}
async with httpx.AsyncClient(timeout=120.0, follow_redirects=True) as client:
async with client.stream("GET", video_url, headers=download_headers) as dl_resp:
if dl_resp.status_code == 200:
with open(temp_path, "wb") as f:
async for chunk in dl_resp.aiter_bytes(chunk_size=8192):
f.write(chunk)
logger.info(f"[douyin-fallback] Downloaded successfully: {temp_path}")
return temp_path
else:
logger.error(f"[douyin-fallback] Download failed: {dl_resp.status_code}")
return None
except Exception as e:
logger.error(f"[douyin-fallback] Logic failed: {e}")
return None
async def _download_bilibili_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
"""Manually download a Bilibili video (Playwright fallback)"""
from playwright.async_api import async_playwright
logger.info(f"[Playwright] Starting Bilibili download for: {url}")
playwright = None
browser = None
try:
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless=True, args=['--no-sandbox', '--disable-setuid-sandbox'])
context = await browser.new_context(
user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
page = await context.new_page()
logger.info("[Playwright] Navigating to Bilibili...")
await page.goto(url, timeout=45000)
try:
await page.wait_for_selector('video', timeout=15000)
except Exception:
logger.warning("[Playwright] Video selector timeout")
playinfo = await page.evaluate("window.__playinfo__")
audio_url = None
if playinfo and "data" in playinfo and "dash" in playinfo["data"]:
dash = playinfo["data"]["dash"]
if "audio" in dash and dash["audio"]:
audio_url = dash["audio"][0]["baseUrl"]
logger.info(f"[Playwright] Found audio stream in __playinfo__: {audio_url[:50]}...")
if not audio_url:
logger.warning("[Playwright] Could not find audio in __playinfo__")
return None
temp_path = temp_dir / f"bilibili_audio_{timestamp}.m4s"
try:
api_request = context.request
headers = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Referer": "https://www.bilibili.com/"
}
logger.info("[Playwright] Downloading audio stream...")
response = await api_request.get(audio_url, headers=headers)
if response.status == 200:
body = await response.body()
with open(temp_path, 'wb') as f:
f.write(body)
logger.info(f"[Playwright] Downloaded successfully: {temp_path}")
return temp_path
else:
logger.error(f"[Playwright] API Request failed: {response.status}")
return None
except Exception as e:
logger.error(f"[Playwright] Download logic error: {e}")
return None
except Exception as e:
logger.error(f"[Playwright] Bilibili download failed: {e}")
return None
finally:
if browser:
await browser.close()
if playwright:
await playwright.stop()
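
The `play_addr` extraction in step 4 of the Douyin fallback can be exercised in isolation. The following is a minimal sketch, not the project's code: `extract_play_url` and the sample payload are hypothetical, and the page layout is assumed to match the regular expression used in the diff above.

```python
import re
from typing import Optional

# Same pattern as in the fallback: capture the uri and the first url_list entry.
PLAY_ADDR_RE = re.compile(r'"play_addr":\{"uri":"([^"]+)","url_list":\["([^"]+)"')


def extract_play_url(page_text: str) -> Optional[str]:
    """Return the first play URL embedded in the share page, or None (hypothetical helper)."""
    match = PLAY_ADDR_RE.search(page_text)
    if not match:
        return None
    # The embedded JSON escapes "/" as the literal six characters \u002F.
    url = match.group(2).replace(r"\u002F", "/")
    # Protocol-relative URLs need an explicit scheme before downloading.
    if url.startswith("//"):
        url = "https:" + url
    return url


# Made-up payload for illustration only (raw string keeps \u002F literal).
sample = r'{"play_addr":{"uri":"v0200abc","url_list":["\u002F\u002Fexample.com\u002Fplay\u002F?video_id=v0200abc"]}}'
print(extract_play_url(sample))
```

Keeping the parsing in a pure function like this would make it testable without network access, which may help when the page format changes.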

View File

@@ -1,19 +1,40 @@
from pydantic import BaseModel
-from typing import Optional
+from typing import Optional, List, Literal
class CustomAssignment(BaseModel):
material_path: str
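start: float # start on the audio timeline
end: float # end on the audio timeline
source_start: float = 0.0 # start of the clip taken from the source video
source_end: Optional[float] = None # end of the clip taken from the source video (optional)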
class GenerateRequest(BaseModel):
text: str
voice: str = "zh-CN-YunxiNeural"
material_path: str
material_paths: Optional[List[str]] = None
tts_mode: str = "edgetts"
ref_audio_id: Optional[str] = None
ref_text: Optional[str] = None
language: str = "zh-CN"
generated_audio_id: Optional[str] = None # ID of a pre-generated voiceover; when present, inline TTS is skipped
title: Optional[str] = None
title_display_mode: Literal["short", "persistent"] = "short"
title_duration: float = 4.0
enable_subtitles: bool = True
subtitle_style_id: Optional[str] = None
title_style_id: Optional[str] = None
secondary_title: Optional[str] = None
secondary_title_style_id: Optional[str] = None
secondary_title_font_size: Optional[int] = None
secondary_title_top_margin: Optional[int] = None
subtitle_font_size: Optional[int] = None
title_font_size: Optional[int] = None
title_top_margin: Optional[int] = None
subtitle_bottom_margin: Optional[int] = None
bgm_id: Optional[str] = None
bgm_volume: Optional[float] = 0.2
custom_assignments: Optional[List[CustomAssignment]] = None
output_aspect_ratio: Literal["9:16", "16:9"] = "9:16"
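
The relationship between the `start`/`end` fields (audio timeline) and `source_start`/`source_end` (clip within the source video) can be checked before a request reaches the pipeline. The sketch below is an assumption, not part of the diff: it mirrors `CustomAssignment` with a stdlib dataclass so it runs without pydantic, and `find_assignment_problems` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assignment:
    """Stdlib stand-in for the CustomAssignment model (hypothetical mirror)."""
    material_path: str
    start: float                        # start on the audio timeline
    end: float                          # end on the audio timeline
    source_start: float = 0.0           # clip start in the source video
    source_end: Optional[float] = None  # clip end in the source video


def find_assignment_problems(items: List[Assignment]) -> List[str]:
    """Return human-readable problems; an empty list means the timeline looks sane."""
    problems: List[str] = []
    prev_end = 0.0
    for i, a in enumerate(items):
        if a.end <= a.start:
            problems.append(f"#{i}: end must be greater than start")
        if a.start < prev_end:
            problems.append(f"#{i}: overlaps the previous segment")
        if a.source_end is not None and a.source_end <= a.source_start:
            problems.append(f"#{i}: source_end must be greater than source_start")
        prev_end = max(prev_end, a.end)
    return problems


good = [Assignment("a.mp4", 0.0, 4.0), Assignment("b.mp4", 4.0, 9.5, 2.0, 7.5)]
bad = [Assignment("a.mp4", 0.0, 4.0), Assignment("b.mp4", 3.0, 9.5)]
print(find_assignment_problems(good))
print(find_assignment_problems(bad))
```

In a pydantic model the same checks could live in a validator; whether the project wants to reject or silently repair such input is a design choice the diff does not state.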

View File

@@ -1,5 +1,6 @@
-from typing import Optional, Any
+from typing import Optional, Any, List
from pathlib import Path
import asyncio
import time
import traceback
import httpx
@@ -24,6 +25,17 @@ from .schemas import GenerateRequest
from .task_store import task_store
def _locale_to_whisper_lang(locale: str) -> str:
"""'en-US' → 'en', 'zh-CN' → 'zh'"""
return locale.split("-")[0] if "-" in locale else locale
def _locale_to_tts_lang(locale: str) -> str:
"""'zh-CN' → 'Chinese', 'en-US' → 'English', anything else → 'Auto'"""
mapping = {"zh": "Chinese", "en": "English"}
return mapping.get(locale.split("-")[0], "Auto")
_lipsync_service: Optional[LipSyncService] = None
_lipsync_ready: Optional[bool] = None
_lipsync_last_check: float = 0
@@ -79,26 +91,162 @@ def _update_task(task_id: str, **updates: Any) -> None:
task_store.update(task_id, updates)
# ── Multi-material helpers ──
def _split_equal(segments: List[dict], material_paths: List[str]) -> List[dict]:
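"""Split the audio duration evenly across the materials, snapping each cut to the nearest Whisper word boundary.
Args:
segments: list of segments produced by Whisper, each containing words (word-level timestamps)
material_paths: list of material paths
Returns:
[{"material_path": "...", "start": 0.0, "end": 5.2, "index": 0}, ...]
"""
# Flatten all Whisper characters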
all_chars: List[dict] = []
for seg in segments:
for w in seg.get("words", []):
all_chars.append(w)
n = len(material_paths)
if not all_chars or n == 0:
return [{"material_path": material_paths[0] if material_paths else "",
"start": 0.0, "end": 99999.0, "index": 0}]
# The number of materials cannot exceed the number of characters, otherwise boundaries would repeat
if n > len(all_chars):
logger.warning(f"[MultiMat] 素材数({n}) > 字符数({len(all_chars)}),裁剪为 {len(all_chars)}")
n = len(all_chars)
total_start = all_chars[0]["start"]
total_end = all_chars[-1]["end"]
seg_dur = (total_end - total_start) / n
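# Compute N-1 split points, each snapped to the nearest word boundary
boundaries = [0] # the first segment starts at word 0
for i in range(1, n):
target_time = total_start + i * seg_dur
# Find the word whose start time is closest to target_time
best_idx = boundaries[-1] + 1 # at least one position after the previous boundary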
best_diff = float("inf")
for j in range(boundaries[-1] + 1, len(all_chars)):
diff = abs(all_chars[j]["start"] - target_time)
if diff < best_diff:
best_diff = diff
best_idx = j
elif diff > best_diff:
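break # times are increasing, so stop once the difference starts to grow
boundaries.append(min(best_idx, len(all_chars) - 1))
boundaries.append(len(all_chars)) # the last segment runs to the end
# Build the assignments from the boundaries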
assignments: List[dict] = []
for i in range(n):
s_idx = boundaries[i]
e_idx = boundaries[i + 1]
if s_idx >= len(all_chars) or s_idx >= e_idx:
continue
assignments.append({
"material_path": material_paths[i],
"start": all_chars[s_idx]["start"],
"end": all_chars[e_idx - 1]["end"],
"text": "".join(c["word"] for c in all_chars[s_idx:e_idx]),
"index": len(assignments),
})
if not assignments:
return [{"material_path": material_paths[0], "start": 0.0, "end": 99999.0, "index": 0}]
logger.info(f"[MultiMat] 均分 {len(all_chars)} 字为 {len(assignments)}")
for a in assignments:
dur = a["end"] - a["start"]
logger.info(f"{a['index']}: [{a['start']:.2f}-{a['end']:.2f}s] ({dur:.1f}s) {a['text'][:20]}")
return assignments
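
The boundary snapping performed by `_split_equal` can be shown on a small input. This is a simplified sketch under stated assumptions, not the function above: `snap_boundaries` is a hypothetical name, it works on word start times only, and unlike the original it reserves one word for every remaining part so that no part can end up empty.

```python
from typing import List


def snap_boundaries(word_starts: List[float], total_end: float, parts: int) -> List[int]:
    """Return the start index of each part, snapped to the nearest word start."""
    if not word_starts or parts <= 0:
        return []
    parts = min(parts, len(word_starts))  # cannot have more parts than words
    seg = (total_end - word_starts[0]) / parts
    bounds = [0]
    for i in range(1, parts):
        target = word_starts[0] + i * seg
        lo = bounds[-1] + 1                    # strictly after the previous cut
        hi = len(word_starts) - (parts - i)    # leave a word for each later part
        best = min(range(lo, hi + 1), key=lambda j: abs(word_starts[j] - target))
        bounds.append(best)
    return bounds


starts = [0.0, 0.4, 1.1, 1.9, 2.6, 3.2]
print(snap_boundaries(starts, total_end=3.6, parts=3))
```

With three parts over 3.6 s the ideal cuts are at 1.2 s and 2.4 s; the closest word starts are 1.1 s and 2.6 s, so the parts begin at words 0, 2 and 4.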
async def process_video_generation(task_id: str, req: GenerateRequest, user_id: str):
temp_files = []
try:
start_time = time.time()
# ── Determine the material list ──
material_paths: List[str] = []
if req.custom_assignments and len(req.custom_assignments) > 1:
material_paths = [a.material_path for a in req.custom_assignments if a.material_path]
elif req.material_paths and len(req.material_paths) > 1:
material_paths = req.material_paths
else:
material_paths = [req.material_path]
is_multi = len(material_paths) > 1
target_resolution = (1080, 1920) if req.output_aspect_ratio == "9:16" else (1920, 1080)
logger.info(
f"[Render] 输出画面比例: {req.output_aspect_ratio}, "
f"目标分辨率: {target_resolution[0]}x{target_resolution[1]}"
)
_update_task(task_id, status="processing", progress=5, message="正在下载素材...")
temp_dir = settings.UPLOAD_DIR / "temp"
temp_dir.mkdir(parents=True, exist_ok=True)
video = VideoService()
input_material_path: Optional[Path] = None
input_material_path = temp_dir / f"{task_id}_input.mp4"
temp_files.append(input_material_path)
# Single-material mode: download the primary material
if not is_multi:
input_material_path = temp_dir / f"{task_id}_input.mp4"
temp_files.append(input_material_path)
await _download_material(material_paths[0], input_material_path)
await _download_material(req.material_path, input_material_path)
# Normalize rotation metadata (e.g. iPhone MOV 1920x1080 + rotation=-90)
normalized_input_path = temp_dir / f"{task_id}_input_norm.mp4"
normalized_result = video.normalize_orientation(
str(input_material_path),
str(normalized_input_path),
)
if normalized_result != str(input_material_path):
temp_files.append(normalized_input_path)
input_material_path = normalized_input_path
_update_task(task_id, message="正在生成语音...", progress=10)
audio_path = temp_dir / f"{task_id}_audio.wav"
temp_files.append(audio_path)
if req.tts_mode == "voiceclone":
if req.generated_audio_id:
# New flow: use the pre-generated voiceover
_update_task(task_id, message="正在下载配音...", progress=12)
audio_url = await storage_service.get_signed_url(
bucket="generated-audios",
path=req.generated_audio_id,
)
await _download_material(audio_url, audio_path)
# Read language from the metadata
meta_path = req.generated_audio_id.replace("_audio.wav", "_audio.json")
try:
meta_url = await storage_service.get_signed_url(
bucket="generated-audios", path=meta_path,
)
import httpx as _httpx
async with _httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(meta_url)
if resp.status_code == 200:
meta = resp.json()
req.language = meta.get("language", req.language)
# Always overwrite the script with the voiceover metadata so that subtitles match the voiceover language
meta_text = meta.get("text", "")
if meta_text:
req.text = meta_text
except Exception as e:
logger.warning(f"读取配音元数据失败: {e}")
elif req.tts_mode == "voiceclone":
if not req.ref_audio_id or not req.ref_text:
raise ValueError("声音克隆模式需要提供参考音频和参考文字")
@@ -113,13 +261,13 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
)
await _download_material(ref_audio_url, ref_audio_local)
-_update_task(task_id, message="正在克隆声音 (Qwen3-TTS)...")
+_update_task(task_id, message="正在克隆声音...")
await voice_clone_service.generate_audio(
text=req.text,
ref_audio_path=str(ref_audio_local),
ref_text=req.ref_text,
output_path=str(audio_path),
-language="Chinese"
+language=_locale_to_tts_lang(req.language)
)
else:
_update_task(task_id, message="正在生成语音 (EdgeTTS)...")
@@ -128,83 +276,403 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
tts_time = time.time() - start_time
print(f"[Pipeline] TTS completed in {tts_time:.1f}s")
_update_task(task_id, progress=25)
_update_task(task_id, message="正在合成唇形 (LatentSync)...", progress=30)
lipsync = _get_lipsync_service()
lipsync_video_path = temp_dir / f"{task_id}_lipsync.mp4"
temp_files.append(lipsync_video_path)
lipsync_start = time.time()
is_ready = await _check_lipsync_ready()
if is_ready:
print(f"[LipSync] Starting LatentSync inference...")
_update_task(task_id, progress=35, message="正在运行 LatentSync 推理...")
await lipsync.generate(str(input_material_path), str(audio_path), str(lipsync_video_path))
else:
print(f"[LipSync] LatentSync not ready, copying original video")
_update_task(task_id, message="唇形同步不可用,使用原始视频...")
import shutil
shutil.copy(str(input_material_path), lipsync_video_path)
lipsync_time = time.time() - lipsync_start
print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
_update_task(task_id, progress=80)
captions_path = None
if req.enable_subtitles:
_update_task(task_id, message="正在生成字幕 (Whisper)...", progress=82)
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
if is_multi:
# ══════════════════════════════════════
# Multi-material pipeline
# ══════════════════════════════════════
_update_task(task_id, progress=12, message="正在分配素材...")
try:
await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path)
if req.custom_assignments and len(req.custom_assignments) == len(material_paths):
# User-defined assignments: skip the Whisper-based equal split
assignments = [
{
"material_path": a.material_path,
"start": a.start,
"end": a.end,
"source_start": a.source_start,
"source_end": a.source_end,
"index": i,
}
for i, a in enumerate(req.custom_assignments)
]
# Whisper is still needed to generate subtitles (if enabled)
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
if req.enable_subtitles:
_update_task(task_id, message="正在生成字幕 (Whisper)...")
try:
await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (custom assignments)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
captions_path = None
else:
captions_path = None
elif req.custom_assignments:
logger.warning(
f"[MultiMat] custom_assignments 数量({len(req.custom_assignments)})"
f" 与素材数量({len(material_paths)})不一致,回退自动分配"
)
print(f"[Pipeline] Whisper alignment completed")
except Exception as e:
logger.warning(f"Whisper alignment failed, skipping subtitles: {e}")
# Original logic: Whisper → _split_equal
_update_task(task_id, message="正在生成字幕 (Whisper)...")
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
captions_data = None
captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
assignments = _split_equal(captions_data["segments"], material_paths)
else:
# Whisper failed → split evenly by duration (no character alignment needed)
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0 # safe fallback
seg_dur = audio_dur / len(material_paths)
assignments = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
else:
# Original logic: Whisper → _split_equal
_update_task(task_id, message="正在生成字幕 (Whisper)...")
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
try:
captions_data = await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=str(captions_path),
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed (multi-material)")
except Exception as e:
logger.warning(f"Whisper alignment failed: {e}")
captions_data = None
captions_path = None
_update_task(task_id, progress=15, message="正在分配素材...")
if captions_data and captions_data.get("segments"):
assignments = _split_equal(captions_data["segments"], material_paths)
else:
# Whisper failed → split evenly by duration (no character alignment needed)
logger.warning("[MultiMat] Whisper 无数据,按时长均分")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0 # safe fallback
seg_dur = audio_dur / len(material_paths)
assignments = [
{"material_path": material_paths[i], "start": i * seg_dur,
"end": (i + 1) * seg_dur, "index": i}
for i in range(len(material_paths))
]
# Extend the segments to cover the full audio range: the first starts at 0 and the last ends at the end of the audio
audio_duration = video._get_duration(str(audio_path))
if assignments and audio_duration > 0:
assignments[0]["start"] = 0.0
assignments[-1]["end"] = audio_duration
num_segments = len(assignments)
print(f"[Pipeline] Multi-material: {num_segments} segments, {len(material_paths)} materials")
if num_segments == 0:
raise RuntimeError("Multi-material: no valid segments after splitting")
lipsync_start = time.time()
# ── Step 1: download all materials in parallel and detect their resolution ──
material_locals: List[Path] = []
resolutions = []
async def _download_and_normalize(i: int, assignment: dict):
"""Download a single material and normalize its orientation"""
material_local = temp_dir / f"{task_id}_material_{i}.mp4"
temp_files.append(material_local)
await _download_material(assignment["material_path"], material_local)
normalized_material = temp_dir / f"{task_id}_material_{i}_norm.mp4"
loop = asyncio.get_running_loop()
normalized_result = await loop.run_in_executor(
None,
video.normalize_orientation,
str(material_local),
str(normalized_material),
)
if normalized_result != str(material_local):
temp_files.append(normalized_material)
material_local = normalized_material
res = video.get_resolution(str(material_local))
return material_local, res
download_tasks = [
_download_and_normalize(i, assignment)
for i, assignment in enumerate(assignments)
]
download_results = await asyncio.gather(*download_tasks)
for local, res in download_results:
material_locals.append(local)
resolutions.append(res)
# Unify the resolution according to the aspect ratio chosen by the user
base_res = target_resolution
need_scale = any(r != base_res for r in resolutions)
if need_scale:
logger.info(f"[MultiMat] 素材分辨率不一致,统一到 {base_res[0]}x{base_res[1]}")
# ── Step 2: trim each material to its segment duration in parallel ──
prepared_segments: List[Path] = [None] * num_segments
async def _prepare_one_segment(i: int, assignment: dict):
"""Trim or loop a single material to its segment duration"""
seg_dur = assignment["end"] - assignment["start"]
prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
temp_files.append(prepared_path)
loop = asyncio.get_running_loop()
await loop.run_in_executor(
None,
video.prepare_segment,
str(material_locals[i]),
seg_dur,
str(prepared_path),
base_res,
assignment.get("source_start", 0.0),
assignment.get("source_end"),
25,
)
return i, prepared_path
_update_task(
task_id,
progress=15,
message=f"正在并行准备 {num_segments} 个素材片段..."
)
prepare_tasks = [
_prepare_one_segment(i, assignment)
for i, assignment in enumerate(assignments)
]
prepare_results = await asyncio.gather(*prepare_tasks)
for i, path in prepare_results:
prepared_segments[i] = path
# ── Step 3: concatenate all material segments ──
_update_task(task_id, progress=50, message="正在拼接素材片段...")
concat_path = temp_dir / f"{task_id}_concat.mp4"
temp_files.append(concat_path)
video.concat_videos(
[str(p) for p in prepared_segments],
str(concat_path),
target_fps=25,
)
# ── Step 4: a single LatentSync inference pass ──
is_ready = await _check_lipsync_ready()
if is_ready:
_update_task(task_id, progress=55, message="正在合成唇形 (LatentSync)...")
print(f"[LipSync] Multi-material: single LatentSync on concatenated video")
try:
await lipsync.generate(str(concat_path), str(audio_path), str(lipsync_video_path))
except Exception as e:
logger.warning(f"[LipSync] Failed, fallback to concat without lipsync: {e}")
import shutil
shutil.copy(str(concat_path), str(lipsync_video_path))
else:
print(f"[LipSync] Not ready, using concatenated video without lipsync")
import shutil
shutil.copy(str(concat_path), str(lipsync_video_path))
lipsync_time = time.time() - lipsync_start
print(f"[Pipeline] Multi-material prepare + concat + LipSync completed in {lipsync_time:.1f}s")
_update_task(task_id, progress=80)
# If the user disabled subtitles, clear captions_path (Whisper was only used for sentence splitting)
if not req.enable_subtitles:
captions_path = None
else:
# ══════════════════════════════════════
# Single-material pipeline (original logic)
# ══════════════════════════════════════
if input_material_path is None:
raise RuntimeError("单素材流程缺少输入素材")
# Single material: scale to the target resolution for the chosen aspect ratio and apply source_start
single_source_start = 0.0
single_source_end = None
if req.custom_assignments and len(req.custom_assignments) == 1:
single_source_start = req.custom_assignments[0].source_start
single_source_end = req.custom_assignments[0].source_end
_update_task(task_id, progress=20, message="正在准备素材片段...")
audio_dur = video._get_duration(str(audio_path))
if audio_dur <= 0:
audio_dur = 30.0
prepared_single_path = temp_dir / f"{task_id}_prepared_single.mp4"
temp_files.append(prepared_single_path)
video.prepare_segment(
str(input_material_path),
audio_dur,
str(prepared_single_path),
target_resolution=target_resolution,
source_start=single_source_start,
source_end=single_source_end,
)
input_material_path = prepared_single_path
_update_task(task_id, progress=25)
_update_task(task_id, message="正在合成唇形 (LatentSync)...", progress=30)
lipsync_start = time.time()
is_ready = await _check_lipsync_ready()
if is_ready:
print(f"[LipSync] Starting LatentSync inference...")
_update_task(task_id, progress=35, message="正在运行 LatentSync 推理...")
await lipsync.generate(str(input_material_path), str(audio_path), str(lipsync_video_path))
else:
print(f"[LipSync] LatentSync not ready, copying original video")
_update_task(task_id, message="唇形同步不可用,使用原始视频...")
import shutil
shutil.copy(str(input_material_path), lipsync_video_path)
lipsync_time = time.time() - lipsync_start
print(f"[Pipeline] LipSync completed in {lipsync_time:.1f}s")
_update_task(task_id, progress=80)
# Single-material mode: Whisper is deferred and runs in parallel with BGM below
if not req.enable_subtitles:
captions_path = None
_update_task(task_id, progress=85)
video = VideoService()
# ── Whisper subtitles + BGM mixing in parallel (both depend only on audio_path) ──
final_audio_path = audio_path
if req.bgm_id:
_update_task(task_id, message="正在合成背景音乐...", progress=86)
_whisper_task = None
_bgm_task = None
# In single-material mode Whisper has not run yet; start it here in parallel with BGM
need_whisper = not is_multi and req.enable_subtitles and captions_path is None
if need_whisper:
captions_path = temp_dir / f"{task_id}_captions.json"
temp_files.append(captions_path)
_captions_path_str = str(captions_path)
async def _run_whisper():
_update_task(task_id, message="正在生成字幕 (Whisper)...", progress=82)
try:
await whisper_service.align(
audio_path=str(audio_path),
text=req.text,
output_path=_captions_path_str,
language=_locale_to_whisper_lang(req.language),
original_text=req.text,
)
print(f"[Pipeline] Whisper alignment completed")
return True
except Exception as e:
logger.warning(f"Whisper alignment failed, skipping subtitles: {e}")
return False
_whisper_task = _run_whisper()
if req.bgm_id:
bgm_path = resolve_bgm_path(req.bgm_id)
if bgm_path:
mix_output_path = temp_dir / f"{task_id}_audio_mix.wav"
temp_files.append(mix_output_path)
volume = req.bgm_volume if req.bgm_volume is not None else 0.2
volume = max(0.0, min(float(volume), 1.0))
try:
video.mix_audio(
voice_path=str(audio_path),
bgm_path=str(bgm_path),
output_path=str(mix_output_path),
bgm_volume=volume
)
final_audio_path = mix_output_path
except Exception as e:
logger.warning(f"BGM mix failed, fallback to voice only: {e}")
_mix_output = str(mix_output_path)
_bgm_path = str(bgm_path)
_voice_path = str(audio_path)
_volume = volume
async def _run_bgm():
_update_task(task_id, message="正在合成背景音乐...", progress=86)
loop = asyncio.get_running_loop()
try:
await loop.run_in_executor(
None,
video.mix_audio,
_voice_path,
_bgm_path,
_mix_output,
_volume,
)
return True
except Exception as e:
logger.warning(f"BGM mix failed, fallback to voice only: {e}")
return False
_bgm_task = _run_bgm()
else:
logger.warning(f"BGM not found: {req.bgm_id}")
use_remotion = (captions_path and captions_path.exists()) or req.title
# Await Whisper + BGM in parallel
parallel_tasks = [t for t in (_whisper_task, _bgm_task) if t is not None]
if parallel_tasks:
results = await asyncio.gather(*parallel_tasks)
result_idx = 0
if _whisper_task is not None:
if not results[result_idx]:
captions_path = None
result_idx += 1
if _bgm_task is not None:
if results[result_idx]:
final_audio_path = mix_output_path
use_remotion = (captions_path and captions_path.exists()) or req.title or req.secondary_title
subtitle_style = None
title_style = None
secondary_title_style = None
if req.enable_subtitles:
subtitle_style = get_style("subtitle", req.subtitle_style_id) or get_default_style("subtitle")
if req.title:
title_style = get_style("title", req.title_style_id) or get_default_style("title")
if req.secondary_title:
secondary_title_style = get_style("title", req.secondary_title_style_id) or get_default_style("title")
if req.subtitle_font_size and req.enable_subtitles:
if subtitle_style is None:
@@ -216,6 +684,26 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
title_style = {}
title_style["font_size"] = int(req.title_font_size)
if req.title_top_margin is not None and req.title:
if title_style is None:
title_style = {}
title_style["top_margin"] = int(req.title_top_margin)
if req.subtitle_bottom_margin is not None and req.enable_subtitles:
if subtitle_style is None:
subtitle_style = {}
subtitle_style["bottom_margin"] = int(req.subtitle_bottom_margin)
if req.secondary_title_font_size and req.secondary_title:
if secondary_title_style is None:
secondary_title_style = {}
secondary_title_style["font_size"] = int(req.secondary_title_font_size)
if req.secondary_title_top_margin is not None and req.secondary_title:
if secondary_title_style is None:
secondary_title_style = {}
secondary_title_style["top_margin"] = int(req.secondary_title_top_margin)
if use_remotion:
subtitle_style = prepare_style_for_remotion(
subtitle_style,
@@ -227,6 +715,11 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
temp_dir,
f"{task_id}_title_font"
)
secondary_title_style = prepare_style_for_remotion(
secondary_title_style,
temp_dir,
f"{task_id}_secondary_title_font"
)
final_output_local_path = temp_dir / f"{task_id}_output.mp4"
temp_files.append(final_output_local_path)
@@ -246,16 +739,26 @@ async def process_video_generation(task_id: str, req: GenerateRequest, user_id:
mapped = 87 + int(percent * 0.08)
_update_task(task_id, progress=mapped)
title_display_mode = (
req.title_display_mode
if req.title_display_mode in ("short", "persistent")
else "short"
)
title_duration = max(0.5, min(float(req.title_duration or 4.0), 30.0))
await remotion_service.render(
video_path=str(composed_video_path),
output_path=str(final_output_local_path),
captions_path=str(captions_path) if captions_path else None,
title=req.title,
-title_duration=3.0,
+title_duration=title_duration,
title_display_mode=title_display_mode,
fps=25,
enable_subtitles=req.enable_subtitles,
subtitle_style=subtitle_style,
title_style=title_style,
secondary_title=req.secondary_title,
secondary_title_style=secondary_title_style,
on_progress=on_remotion_progress
)
print(f"[Pipeline] Remotion render completed")
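
The "Whisper + BGM in parallel" step awaits whichever of the two optional coroutines exist and then maps results back by position. A keyed variant avoids the index bookkeeping. This is only a sketch of that idea: `run_optional`, `_ok` and `_fail` are hypothetical names, and the stand-in coroutines merely simulate success and failure.

```python
import asyncio
from typing import Awaitable, Dict, Optional


async def run_optional(tasks: Dict[str, Optional[Awaitable[bool]]]) -> Dict[str, bool]:
    """Await only the tasks that exist and return their results keyed by name."""
    active = {name: task for name, task in tasks.items() if task is not None}
    results = await asyncio.gather(*active.values())
    return dict(zip(active.keys(), results))


async def _ok() -> bool:
    await asyncio.sleep(0)  # stand-in for real work
    return True


async def _fail() -> bool:
    try:
        raise RuntimeError("simulated failure")
    except Exception:
        return False  # mirror the pipeline: report failure instead of raising


async def main() -> Dict[str, bool]:
    return await run_optional({"whisper": _fail(), "bgm": _ok(), "extra": None})


outcome = asyncio.run(main())
print(outcome)
```

A missing key means the step was never scheduled, `False` means it ran and failed, so the caller can distinguish "subtitles disabled" from "alignment failed" without tracking a running index.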

View File

@@ -0,0 +1,34 @@
"""
Order data access layer
"""
from datetime import datetime, timezone
from typing import Any, Dict, Optional, cast
from app.core.supabase import get_supabase
def create_order(user_id: str, out_trade_no: str, amount: float) -> Dict[str, Any]:
supabase = get_supabase()
result = supabase.table("orders").insert({
"user_id": user_id,
"out_trade_no": out_trade_no,
"amount": amount,
"status": "pending",
}).execute()
return cast(Dict[str, Any], (result.data or [{}])[0])
def get_order_by_trade_no(out_trade_no: str) -> Optional[Dict[str, Any]]:
supabase = get_supabase()
result = supabase.table("orders").select("*").eq("out_trade_no", out_trade_no).single().execute()
return cast(Optional[Dict[str, Any]], result.data or None)
def update_order_status(out_trade_no: str, status: str, trade_no: str | None = None) -> None:
supabase = get_supabase()
payload: Dict[str, Any] = {"status": status}
if trade_no:
payload["trade_no"] = trade_no
if status == "paid":
payload["paid_at"] = datetime.now(timezone.utc).isoformat()
supabase.table("orders").update(payload).eq("out_trade_no", out_trade_no).execute()
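
The payload that `update_order_status` sends can be built by a pure function, which makes the `paid_at` rule easy to verify without a database. A minimal sketch, assuming the same column names as above; `build_status_payload` and its `now` parameter are hypothetical and exist only for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_status_payload(
    status: str,
    trade_no: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the update payload; paid_at is stamped only for paid orders."""
    payload: Dict[str, Any] = {"status": status}
    if trade_no:
        payload["trade_no"] = trade_no
    if status == "paid":
        moment = now or datetime.now(timezone.utc)  # injectable clock for tests
        payload["paid_at"] = moment.isoformat()
    return payload


fixed = datetime(2026, 2, 27, 8, 0, tzinfo=timezone.utc)
print(build_status_payload("paid", "T1", fixed))
print(build_status_payload("closed"))
```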

View File

@@ -1,3 +1,4 @@
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, cast
from app.core.supabase import get_supabase
@@ -37,3 +38,33 @@ def update_user(user_id: str, payload: Dict[str, Any]) -> List[Dict[str, Any]]:
supabase = get_supabase()
result = supabase.table("users").update(payload).eq("id", user_id).execute()
return cast(List[Dict[str, Any]], result.data or [])
def _parse_expires_at(expires_at: Any) -> Optional[datetime]:
try:
expires_at_dt = datetime.fromisoformat(str(expires_at).replace("Z", "+00:00"))
except Exception:
return None
if expires_at_dt.tzinfo is None:
expires_at_dt = expires_at_dt.replace(tzinfo=timezone.utc)
return expires_at_dt.astimezone(timezone.utc)
def deactivate_user_if_expired(user: Dict[str, Any]) -> bool:
expires_at = user.get("expires_at")
if not expires_at:
return False
expires_at_dt = _parse_expires_at(expires_at)
if not expires_at_dt:
return False
if datetime.now(timezone.utc) <= expires_at_dt:
return False
user_id = user.get("id")
if user.get("is_active") and user_id:
update_user(cast(str, user_id), {"is_active": False})
return True
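
The expiry check relies on normalizing `expires_at` to an aware UTC datetime before comparing. The sketch below restates that logic with stdlib only so the edge cases can be tried directly; `parse_expires_at` follows `_parse_expires_at` above, while `is_expired` and its explicit `now` argument are hypothetical additions for testability.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_expires_at(value: Any) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp into an aware UTC datetime, or None if invalid."""
    try:
        parsed = datetime.fromisoformat(str(value).replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Naive timestamps are assumed to be UTC, as in the code above.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def is_expired(value: Any, now: datetime) -> bool:
    """True only when the value parses and lies strictly before now."""
    parsed = parse_expires_at(value)
    return parsed is not None and now > parsed


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(is_expired("2026-02-28T16:00:00Z", now))
print(is_expired("2026-03-01T08:00:00+08:00", now))
print(is_expired("not a date", now))
```

The second call uses an offset timestamp equal to `now` in UTC, so it is not yet expired; an unparseable value is treated as "no expiry", matching the behavior in the diff.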

View File

@@ -35,17 +35,19 @@ class GLMService:
Returns:
{"title": "标题", "tags": ["标签1", "标签2", ...]}
"""
-prompt = f"""根据以下口播文案生成一个吸引人的短视频标题和3个相关标签。
+prompt = f"""根据以下口播文案,生成一个吸引人的短视频标题、副标题和3个相关标签。
口播文案:
{text}
要求:
1. 标题要简洁有力能吸引观众点击不超过10个字
2. 标签要与内容相关便于搜索和推荐只要3个
2. 副标题是对标题的补充说明或描述性文字不超过20个字
3. 标签要与内容相关便于搜索和推荐只要3个
4. 标题、副标题和标签必须使用与口播文案相同的语言(如文案是英文就用英文,日文就用日文)
请严格按以下JSON格式返回不要包含其他内容
-{{"title": "标题", "tags": ["标签1", "标签2", "标签3"]}}"""
+{{"title": "标题", "secondary_title": "副标题", "tags": ["标签1", "标签2", "标签3"]}}"""
try:
client = self._get_client()
@@ -74,17 +76,24 @@ class GLMService:
logger.error(f"GLM service error: {e}")
raise Exception(f"AI 生成失败: {str(e)}")
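-async def rewrite_script(self, text: str) -> str:
+async def rewrite_script(self, text: str, custom_prompt: str = None) -> str:
"""
-AI script spinning (copy rewriting)
+Rewrite a script with AI
Args:
text: the original script
custom_prompt: custom prompt; when empty, the default prompt is used
Returns:
the rewritten script
"""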
prompt = f"""请将以下视频文案进行改写。
if custom_prompt and custom_prompt.strip():
prompt = f"""{custom_prompt.strip()}
原始文案:
{text}"""
else:
prompt = f"""请将以下视频文案进行改写。
原始文案:
{text}
@@ -120,6 +129,49 @@ class GLMService:
async def translate_text(self, text: str, target_lang: str) -> str:
"""
Translate the script into the given language
Args:
text: the original script
target_lang: target language (e.g. English, 日本語)
Returns:
the translated script
"""
prompt = f"""请将以下文案翻译为{target_lang}
原文:
{text}
要求:
1. 只返回翻译后的文案,不要添加任何解释或说明
2. 保持原文的语气和风格
3. 翻译要自然流畅,符合目标语言的表达习惯"""
try:
client = self._get_client()
logger.info(f"Using GLM to translate text to {target_lang}")
import asyncio
response = await asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"},
max_tokens=2000,
temperature=0.3
)
content = response.choices[0].message.content
logger.info("GLM translation completed")
return content.strip()
except Exception as e:
logger.error(f"GLM translate error: {e}")
raise Exception(f"AI 翻译失败: {str(e)}")
def _parse_json_response(self, content: str) -> dict:
"""Parse the JSON content returned by GLM"""
# Try parsing directly first
@@ -130,6 +182,8 @@ class GLMService:
# Try extracting a JSON block
json_match = re.search(r'\{[^{}]*"title"[^{}]*"tags"[^{}]*\}', content, re.DOTALL)
if not json_match:
json_match = re.search(r'\{[^{}]*"title"[^{}]*"secondary_title"[^{}]*"tags"[^{}]*\}', content, re.DOTALL)
if json_match:
try:
return json.loads(json_match.group())
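
The two-stage parsing in `_parse_json_response` (direct `json.loads`, then a regex fallback) can be tried on a reply that wraps the JSON in extra prose. This is a minimal sketch, not the service code: `parse_title_payload` is a hypothetical name and the reply text is made up. In this sketch the single pattern also matches replies that contain `secondary_title`, because `[^{}]*` spans any keys between `"title"` and `"tags"`.

```python
import json
import re
from typing import Any, Optional

# Flat (non-nested) JSON object that mentions "title" and later "tags".
TITLE_JSON_RE = re.compile(r'\{[^{}]*"title"[^{}]*"tags"[^{}]*\}', re.DOTALL)


def parse_title_payload(content: str) -> Optional[Any]:
    """Parse a model reply: direct JSON first, then the first matching JSON block."""
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        pass
    match = TITLE_JSON_RE.search(content)
    if not match:
        return None
    try:
        return json.loads(match.group())
    except json.JSONDecodeError:
        return None


reply = 'Sure, here is the result: {"title": "A", "secondary_title": "B", "tags": ["x", "y", "z"]} Hope it helps.'
print(parse_title_payload(reply))
```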

View File

@@ -1,7 +1,7 @@
"""
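Lip-sync service
-Runs inference by invoking the LatentSync conda environment via subprocess
-Configured to use GPU1 (CUDA:1)
+Hybrid approach: LatentSync for short videos (higher quality), MuseTalk for long videos (higher speed)
+Routing threshold: LIPSYNC_DURATION_THRESHOLD (default 120s)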
"""
import os
import shutil
@@ -17,15 +17,18 @@ from app.core.config import settings
class LipSyncService:
-"""Lip-sync service - LatentSync 1.6 integration (subprocess mode)"""
+"""Lip-sync service - LatentSync 1.6 + MuseTalk 1.5 hybrid approach"""
def __init__(self):
self.use_local = settings.LATENTSYNC_LOCAL
self.api_url = settings.LATENTSYNC_API_URL
self.latentsync_dir = settings.LATENTSYNC_DIR
self.gpu_id = settings.LATENTSYNC_GPU_ID
self.use_server = settings.LATENTSYNC_USE_SERVER
# MuseTalk configuration
self.musetalk_api_url = settings.MUSETALK_API_URL
# GPU concurrency lock (serial queue)
self._lock = asyncio.Lock()
@@ -103,7 +106,7 @@ class LipSyncService:
"-t", str(target_duration), # trim to the target duration
"-c:v", "libx264",
"-preset", "fast",
"-crf", "18",
"-crf", "23",
"-an", # drop the original audio
output_path
]
@@ -268,6 +271,18 @@ class LipSyncService:
else:
actual_video_path = video_path
# Hybrid routing: long videos go to MuseTalk, short videos go to LatentSync
if audio_duration and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD:
logger.info(
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s路由到 MuseTalk"
)
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync长视频会较慢")
if self.use_server:
# Mode A: call the persistent server (accelerated mode)
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
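The routing rule in this hunk can be sketched as a pure function. The function name and the `musetalk_available` flag are assumptions for illustration; the default threshold of 120 seconds is taken from the module docstring.

```python
from typing import Optional


def pick_lipsync_backend(
    audio_duration: Optional[float],
    threshold: float = 120.0,
    musetalk_available: bool = True,
) -> str:
    """Return "musetalk" for long audio when MuseTalk is reachable, else "latentsync"."""
    # A missing or zero duration never takes the long-video path, mirroring the
    # `audio_duration and audio_duration >= threshold` guard in the diff.
    if audio_duration and audio_duration >= threshold and musetalk_available:
        return "musetalk"
    return "latentsync"
```

Keeping the decision separate from the HTTP call makes the boundary case (exactly `threshold` seconds) easy to check in isolation.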
@@ -352,6 +367,55 @@ class LipSyncService:
shutil.copy(video_path, output_path)
return output_path
async def _call_musetalk_server(
self, video_path: str, audio_path: str, output_path: str
) -> Optional[str]:
"""
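Call the persistent MuseTalk server.
Returns output_path on success; returns None when the server is unavailable, signalling the caller to fall back to LatentSync.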
"""
server_url = self.musetalk_api_url
logger.info(f"⚡ 调用 MuseTalk 服务: {server_url}")
try:
async with httpx.AsyncClient(timeout=3600.0) as client:
# Health check
try:
resp = await client.get(f"{server_url}/health", timeout=5.0)
if resp.status_code != 200:
logger.warning("⚠️ MuseTalk 健康检查失败")
return None
health = resp.json()
if not health.get("model_loaded"):
logger.warning("⚠️ MuseTalk 模型未加载")
return None
except Exception:
logger.warning("⚠️ 无法连接 MuseTalk 服务")
return None
# Send the inference request
payload = {
"video_path": str(Path(video_path).resolve()),
"audio_path": str(Path(audio_path).resolve()),
"video_out_path": str(Path(output_path).resolve()),
"batch_size": settings.MUSETALK_BATCH_SIZE,
}
response = await client.post(f"{server_url}/lipsync", json=payload)
if response.status_code == 200:
result = response.json()
if Path(result["output_path"]).exists():
logger.info(f"✅ MuseTalk 推理完成: {output_path}")
return output_path
logger.error(f"❌ MuseTalk 服务报错: {response.text}")
return None
except Exception as e:
logger.error(f"❌ MuseTalk 调用失败: {e}")
return None
async def _call_persistent_server(self, video_path: str, audio_path: str, output_path: str) -> str:
"""Call the local persistent server (server.py)"""
server_url = "http://localhost:8007"
@@ -369,7 +433,7 @@ class LipSyncService:
}
try:
async with httpx.AsyncClient(timeout=1200.0) as client:
async with httpx.AsyncClient(timeout=3600.0) as client:
# Check health status first
try:
resp = await client.get(f"{server_url}/health", timeout=5.0)
@@ -477,8 +541,18 @@ class LipSyncService:
except Exception:
pass
# Check the MuseTalk server
musetalk_ready = False
try:
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(f"{self.musetalk_api_url}/health")
if resp.status_code == 200:
musetalk_ready = resp.json().get("model_loaded", False)
except Exception:
pass
return {
"model": "LatentSync 1.6",
"model": "LatentSync 1.6 + MuseTalk 1.5",
"conda_env": conda_ok,
"weights": weights_ok,
"gpu": gpu_ok,
@@ -486,5 +560,7 @@ class LipSyncService:
"gpu_id": self.gpu_id,
"inference_steps": settings.LATENTSYNC_INFERENCE_STEPS,
"guidance_scale": settings.LATENTSYNC_GUIDANCE_SCALE,
"ready": conda_ok and weights_ok and gpu_ok
"ready": conda_ok and weights_ok and gpu_ok,
"musetalk_ready": musetalk_ready,
"lipsync_threshold": settings.LIPSYNC_DURATION_THRESHOLD,
}
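The MuseTalk call in this file returns `None` rather than raising when the server is unavailable, and the caller then continues with LatentSync. A minimal sketch of that contract follows, with hypothetical stub coroutines standing in for the two backends. Treating exceptions as "unavailable" is an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_with_fallback(
    primary: Callable[[], Awaitable[Optional[str]]],
    fallback: Callable[[], Awaitable[str]],
) -> str:
    """Use the primary backend's result unless it returns None or raises."""
    try:
        result = await primary()
    except Exception:
        result = None  # treat errors the same as "unavailable"
    if result is not None:
        return result
    return await fallback()


# Hypothetical stubs standing in for the real backends.
async def musetalk_ok() -> Optional[str]:
    return "musetalk.mp4"


async def musetalk_down() -> Optional[str]:
    return None


async def musetalk_broken() -> Optional[str]:
    raise RuntimeError("connection refused")


async def latentsync() -> str:
    return "latentsync.mp4"
```

Using `None` as the "unavailable" signal keeps the routing code free of exception handling, at the cost of hiding the failure reason from the caller; the diff logs it inside the callee instead.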


@@ -17,20 +17,20 @@ from app.services.storage import storage_service
# Import platform uploaders
from .uploader.bilibili_uploader import BilibiliUploader
from .uploader.douyin_uploader import DouyinUploader
from .uploader.xiaohongshu_uploader import XiaohongshuUploader
from .uploader.weixin_uploader import WeixinUploader
from .uploader.xiaohongshu_uploader import XiaohongshuUploader
from .uploader.weixin_uploader import WeixinUploader
class PublishService:
"""Social media publishing service (with user isolation)"""
# Supported platform configuration
PLATFORMS: Dict[str, Dict[str, Any]] = {
"douyin": {"name": "抖音", "url": "https://creator.douyin.com/", "enabled": True},
"weixin": {"name": "微信视频号", "url": "https://channels.weixin.qq.com/", "enabled": True},
"bilibili": {"name": "B站", "url": "https://member.bilibili.com/platform/upload/video/frame", "enabled": True},
"xiaohongshu": {"name": "小红书", "url": "https://creator.xiaohongshu.com/", "enabled": True},
}
PLATFORMS: Dict[str, Dict[str, Any]] = {
"douyin": {"name": "抖音", "url": "https://creator.douyin.com/", "enabled": True},
"weixin": {"name": "微信视频号", "url": "https://channels.weixin.qq.com/", "enabled": True},
"bilibili": {"name": "B站", "url": "https://member.bilibili.com/platform/upload/video/frame", "enabled": True},
"xiaohongshu": {"name": "小红书", "url": "https://creator.xiaohongshu.com/", "enabled": True},
}
def __init__(self) -> None:
# Store active login sessions, used to track login state
@@ -175,36 +175,36 @@ class PublishService:
tid=kwargs.get('tid', 122),
copyright=kwargs.get('copyright', 1)
)
elif platform == "douyin":
uploader = DouyinUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description,
user_id=user_id,
)
elif platform == "xiaohongshu":
uploader = XiaohongshuUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description
)
elif platform == "weixin":
uploader = WeixinUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description,
user_id=user_id,
)
else:
elif platform == "douyin":
uploader = DouyinUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description,
user_id=user_id,
)
elif platform == "xiaohongshu":
uploader = XiaohongshuUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description
)
elif platform == "weixin":
uploader = WeixinUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description,
user_id=user_id,
)
else:
logger.warning(f"[发布] {platform} 上传功能尚未实现")
return {
"success": False,
@@ -236,30 +236,38 @@ class PublishService:
async def login(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
"""
Start the QR-code login flow
Args:
platform: platform ID
user_id: user ID (used for cookie isolation)
Returns:
dict: contains the QR code as a base64 image
"""
if platform not in self.PLATFORMS:
return {"success": False, "message": "不支持的平台"}
try:
from .qr_login_service import QRLoginService
# Get the user's dedicated cookie directory
cookies_dir = self._get_cookies_dir(user_id)
# Clean up any old active session (so a leftover session cannot interfere with the new login)
session_key = self._get_session_key(platform, user_id)
if session_key in self.active_login_sessions:
old_service = self.active_login_sessions.pop(session_key)
try:
await old_service._cleanup()
except Exception:
pass
# Create the QR login service
qr_service = QRLoginService(platform, cookies_dir)
# Store the active session (with user isolation)
session_key = self._get_session_key(platform, user_id)
self.active_login_sessions[session_key] = qr_service
# Start the login and get the QR code
result = await qr_service.start_login()
@@ -273,27 +281,28 @@ class PublishService:
}
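def get_login_session_status(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
"""Get the status of the active login session"""
"""Get the status of the active login session (only used for QR-scan polling)"""
session_key = self._get_session_key(platform, user_id)
# 1. If there is an active QR-scan session, check it first
# Only check the active QR-scan session; do not check the cookie file
# Checking the cookie file would wrongly report "logged in" during a re-login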
if session_key in self.active_login_sessions:
qr_service = self.active_login_sessions[session_key]
status = qr_service.get_login_status()
# If login succeeded and the cookies were saved, clean up the session
if status["success"] and status["cookies_saved"]:
del self.active_login_sessions[session_key]
return {"success": True, "message": "登录成功"}
return {"success": False, "message": "等待扫码..."}
# 2. Check whether a local cookie file exists
cookie_file = self._get_cookie_path(platform, user_id)
if cookie_file.exists():
return {"success": True, "message": "已登录 (历史状态)"}
return {"success": False, "message": "未登录"}
# Face verification: pass the new QR code on to the frontend
result: Dict[str, Any] = {"success": False, "message": "等待扫码..."}
if status.get("face_verify_qr"):
result["face_verify_qr"] = status["face_verify_qr"]
return result
# No active session → return False (the frontend should not poll without a session)
return {"success": False, "message": "无活跃登录会话"}
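The revised polling logic can be sketched with a plain dictionary standing in for the session table. In this sketch the table maps a session key straight to a status dict, which is a simplification: in the service it holds `QRLoginService` objects. The English messages are placeholders.

```python
from typing import Any, Dict


def poll_login_status(
    sessions: Dict[str, Dict[str, Any]], session_key: str
) -> Dict[str, Any]:
    """Answer a polling request using only the in-memory session table."""
    status = sessions.get(session_key)
    if status is None:
        # No active session: the cookie file is deliberately not consulted.
        return {"success": False, "message": "no active login session"}
    if status.get("success") and status.get("cookies_saved"):
        del sessions[session_key]  # finished sessions are dropped
        return {"success": True, "message": "login succeeded"}
    result: Dict[str, Any] = {"success": False, "message": "waiting for scan..."}
    if status.get("face_verify_qr"):
        # Forward the second QR code so the frontend can display it.
        result["face_verify_qr"] = status["face_verify_qr"]
    return result
```

Because a finished session is removed on the first successful poll, a second poll with the same key reports "no active login session" rather than a stale success.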
def logout(self, platform: str, user_id: Optional[str] = None) -> Dict[str, Any]:
"""


@@ -1,59 +1,67 @@
"""
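Automatic QR-code login service
The backend obtains the QR code with headless Playwright; after the user scans it on the frontend, the cookies are saved automatically
"""
"""
Automatic QR-code login service
The backend obtains the QR code with headless Playwright; after the user scans it on the frontend, the cookies are saved automatically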
"""
import asyncio
import time
import base64
import json
from pathlib import Path
import base64
import json
from pathlib import Path
from typing import Optional, Dict, Any, List, Sequence, Mapping, Union
from playwright.async_api import async_playwright, Page, Frame, BrowserContext, Browser, Playwright as PW
from loguru import logger
from app.core.config import settings
class QRLoginService:
"""QR-code login service"""
# Login monitoring timeout (seconds)
LOGIN_TIMEOUT = 120
"""QR-code login service"""
# Login monitoring timeout (seconds)
LOGIN_TIMEOUT = 180
def __init__(self, platform: str, cookies_dir: Path) -> None:
self.platform = platform
self.cookies_dir = cookies_dir
self.qr_code_image: Optional[str] = None
self.login_success: bool = False
self.cookies_data: Optional[Dict[str, Any]] = None
# Playwright resources (lifecycle managed manually)
self.playwright: Optional[PW] = None
self.browser: Optional[Browser] = None
self.context: Optional[BrowserContext] = None
# Each platform uses multiple selectors (joined with commas so Playwright waits for all of them at once)
self.cookies_dir = cookies_dir
self.qr_code_image: Optional[str] = None
self.login_success: bool = False
self.cookies_data: Optional[Dict[str, Any]] = None
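# Playwright resources (lifecycle managed manually)
self.playwright: Optional[PW] = None
self.browser: Optional[Browser] = None
self.context: Optional[BrowserContext] = None
# Douyin check_qrconnect API response interception
self._qr_api_confirmed: bool = False
self._qr_redirect_url: Optional[str] = None
self._douyin_needs_verify: bool = False # APP verification required
# Face-verification QR code (after clicking face verification the page shows a new QR code, which the frontend must display to the user again)
self._face_verify_qr: Optional[str] = None # base64 screenshot
# Each platform uses multiple selectors (joined with commas so Playwright waits for all of them at once)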
self.platform_configs = {
"bilibili": {
"url": "https://passport.bilibili.com/login",
"qr_selectors": [
"div[class*='qrcode'] canvas", # common canvas QR code
"div[class*='qrcode'] img", # common image QR code
".qrcode-img img", # legacy layout
".login-scan-box img", # scan box
"div[class*='scan'] img"
],
"success_indicator": "https://www.bilibili.com/"
},
"douyin": {
"url": "https://creator.douyin.com/",
"qr_selectors": [
".qrcode img", # tried first
"img[alt='qrcode']",
"canvas[class*='qr']",
"img[src*='qr']"
],
"success_indicator": "https://creator.douyin.com/creator-micro"
},
"bilibili": {
"url": "https://passport.bilibili.com/login",
"qr_selectors": [
"div[class*='qrcode'] canvas", # common canvas QR code
"div[class*='qrcode'] img", # common image QR code
".qrcode-img img", # legacy layout
".login-scan-box img", # scan box
"div[class*='scan'] img"
],
"success_indicator": "https://www.bilibili.com/"
},
"douyin": {
"url": "https://creator.douyin.com/",
"qr_selectors": [
".qrcode img", # tried first
"img[alt='qrcode']",
"canvas[class*='qr']",
"img[src*='qr']"
],
"success_indicator": "https://creator.douyin.com/creator-micro"
},
"xiaohongshu": {
"url": "https://creator.xiaohongshu.com/",
"qr_selectors": [
@@ -79,10 +87,15 @@ class QRLoginService:
}
def _resolve_headless_mode(self) -> str:
if self.platform != "weixin":
return "headless"
mode = (settings.WEIXIN_HEADLESS_MODE or "").strip().lower()
return mode or "headful"
# Douyin and Weixin use headful mode (xvfb virtual display) to avoid anti-bot detection
# Other platforms use headless-new
if self.platform == "douyin":
mode = (settings.DOUYIN_HEADLESS_MODE or "").strip().lower()
return mode or "headful"
if self.platform == "weixin":
mode = (settings.WEIXIN_HEADLESS_MODE or "").strip().lower()
return mode or "headful"
return "headless-new"
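The mode resolution above can be sketched as a pure function that takes the per-platform overrides as a mapping instead of reading `settings`. The function name and the `overrides` parameter are assumptions for illustration.

```python
from typing import Mapping, Optional

# Platforms that default to a visible browser (run under a virtual display).
HEADFUL_PLATFORMS = ("douyin", "weixin")


def resolve_headless_mode(
    platform: str, overrides: Mapping[str, Optional[str]]
) -> str:
    """Return the launch mode for a platform, honoring a configured override."""
    if platform in HEADFUL_PLATFORMS:
        # Blank or missing overrides fall back to "headful".
        mode = (overrides.get(platform) or "").strip().lower()
        return mode or "headful"
    return "headless-new"
```

Overrides are normalized with `strip().lower()`, so a value such as `" Headless-New "` still resolves to a recognizable mode string.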
def _is_square_bbox(self, bbox: Optional[Dict[str, float]], min_side: int = 100) -> bool:
if not bbox:
@@ -158,20 +171,20 @@ class QRLoginService:
except Exception:
continue
return None
async def start_login(self) -> Dict[str, Any]:
"""
Start the login flow
Returns:
dict: contains the base64 QR code and the status
"""
if self.platform not in self.platform_configs:
return {"success": False, "message": "不支持的平台"}
config = self.platform_configs[self.platform]
try:
async def start_login(self) -> Dict[str, Any]:
"""
Start the login flow
Returns:
dict: contains the base64 QR code and the status
"""
if self.platform not in self.platform_configs:
return {"success": False, "message": "不支持的平台"}
config = self.platform_configs[self.platform]
try:
# 1. Start Playwright (without async with; the lifecycle is managed manually)
self.playwright = await async_playwright().start()
@@ -180,46 +193,66 @@ class QRLoginService:
launch_args = [
'--disable-blink-features=AutomationControlled',
'--no-sandbox',
'--disable-dev-shm-usage'
'--disable-dev-shm-usage',
]
if headless and mode in ("new", "headless-new", "headless_new"):
launch_args.append("--headless=new")
if not headless:
# In headful mode xvfb has no GPU, so software rendering is required
launch_args.extend([
'--use-gl=swiftshader',
'--disable-gpu',
])
# Launch the browser in stealth mode
launch_options: Dict[str, Any] = {
"headless": headless,
"args": launch_args,
}
if self.platform == "weixin":
# Pick the browser configuration for the platform
if self.platform == "douyin":
chrome_path = (settings.DOUYIN_CHROME_PATH or "").strip()
browser_channel = (settings.DOUYIN_BROWSER_CHANNEL or "").strip()
user_agent = settings.DOUYIN_USER_AGENT
locale = settings.DOUYIN_LOCALE
timezone_id = settings.DOUYIN_TIMEZONE_ID
elif self.platform == "weixin":
chrome_path = (settings.WEIXIN_CHROME_PATH or "").strip()
if chrome_path:
if Path(chrome_path).exists():
launch_options["executable_path"] = chrome_path
else:
logger.warning(f"[weixin] WEIXIN_CHROME_PATH not found: {chrome_path}")
else:
channel = (settings.WEIXIN_BROWSER_CHANNEL or "").strip()
if channel:
launch_options["channel"] = channel
browser_channel = (settings.WEIXIN_BROWSER_CHANNEL or "").strip()
user_agent = settings.WEIXIN_USER_AGENT
locale = settings.WEIXIN_LOCALE
timezone_id = settings.WEIXIN_TIMEZONE_ID
else:
# Bilibili, Xiaohongshu, etc. use generic defaults
chrome_path = (settings.WEIXIN_CHROME_PATH or "").strip()
browser_channel = ""
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
locale = "zh-CN"
timezone_id = "Asia/Shanghai"
if chrome_path and Path(chrome_path).exists():
launch_options["executable_path"] = chrome_path
elif browser_channel:
launch_options["channel"] = browser_channel
self.browser = await self.playwright.chromium.launch(**launch_options)
# Configure realistic browser characteristics
self.context = await self.browser.new_context(
viewport={'width': 1920, 'height': 1080},
user_agent=settings.WEIXIN_USER_AGENT,
locale=settings.WEIXIN_LOCALE,
timezone_id=settings.WEIXIN_TIMEZONE_ID
user_agent=user_agent,
locale=locale,
timezone_id=timezone_id
)
page = await self.context.new_page()
# Inject stealth.js
stealth_path = Path(__file__).parent / 'uploader' / 'stealth.min.js'
if stealth_path.exists():
await page.add_init_script(path=str(stealth_path))
logger.debug(f"[{self.platform}] Stealth模式已启用")
page = await self.context.new_page()
# Inject stealth.js
stealth_path = Path(__file__).parent / 'uploader' / 'stealth.min.js'
if stealth_path.exists():
await page.add_init_script(path=str(stealth_path))
logger.debug(f"[{self.platform}] Stealth模式已启用")
urls_to_try = [config["url"]]
if self.platform == "weixin":
urls_to_try = [
@@ -228,6 +261,60 @@ class QRLoginService:
]
qr_image = None
# Douyin: intercept QR-login API responses to detect a successful login
if self.platform == "douyin":
async def _on_douyin_qr_response(response):
try:
url = response.url or ""
if "check_qrconnect" not in url.lower():
return
body = None
try:
body = await response.json()
except Exception:
try:
text = await response.text()
import re as _re
m = _re.search(r'\{.*\}', text, _re.DOTALL)
if m:
body = json.loads(m.group())
except Exception:
pass
if not body:
return
data = body.get("data", {})
redirect_url = data.get("redirect_url", "")
status_val = data.get("status", "")
desc = data.get("description", body.get("description", ""))
logger.info(
f"[douyin][qr-poll] status={status_val} "
f"desc={desc[:60]} redirect={'yes' if redirect_url else 'no'}"
)
# Detect that APP verification is required
if "完成验证" in desc or "验证后" in desc:
self._douyin_needs_verify = True
logger.warning("[douyin] 需要APP验证")
if self._qr_api_confirmed:
return
# Detect login success: a redirect_url appears
if redirect_url:
self._qr_redirect_url = redirect_url
self._qr_api_confirmed = True
logger.success(f"[douyin] 登录确认redirect_url={redirect_url[:120]}")
except Exception as e:
logger.debug(f"[douyin][qr-poll] error: {e}")
page.on("response", _on_douyin_qr_response)
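The decision made inside the response handler can be sketched separately from Playwright. The field names (`data`, `redirect_url`, `description`) and the two marker phrases are the ones the handler above reads; nothing else about the endpoint's response shape is assumed, and the function and tuple names are hypothetical.

```python
from typing import Any, Dict, NamedTuple, Optional


class QrPollResult(NamedTuple):
    needs_verify: bool
    redirect_url: Optional[str]


def interpret_qr_poll(body: Dict[str, Any]) -> QrPollResult:
    """Classify one parsed polling response body."""
    data = body.get("data") or {}
    desc = data.get("description") or body.get("description") or ""
    # These phrases are matched verbatim against the server's description text.
    needs_verify = "完成验证" in desc or "验证后" in desc
    # A non-empty redirect_url is treated as the login confirmation signal.
    redirect_url = data.get("redirect_url") or None
    return QrPollResult(needs_verify, redirect_url)
```

Unlike the handler, this sketch tolerates `"data": null` by falling back to an empty dict instead of relying on the surrounding `try/except`.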
for url in urls_to_try:
logger.info(f"[{self.platform}] 打开登录页: {url}")
wait_until = "domcontentloaded" if self.platform == "weixin" else "networkidle"
@@ -240,72 +327,94 @@ class QRLoginService:
qr_image = await self._extract_qr_code(page, config["qr_selectors"])
if qr_image:
break
if not qr_image:
await self._cleanup()
return {"success": False, "message": "未找到二维码"}
logger.info(f"[{self.platform}] 二维码已获取,等待扫码...")
# Start the background monitoring task (the browser stays open)
asyncio.create_task(
self._monitor_login_status(page, config["success_indicator"])
)
return {
"success": True,
"qr_code": qr_image,
"message": "请扫码登录"
}
except Exception as e:
logger.exception(f"[{self.platform}] 启动登录失败: {e}")
await self._cleanup()
return {"success": False, "message": f"启动失败: {str(e)}"}
async def _extract_qr_code(self, page: Page, selectors: List[str]) -> Optional[str]:
"""
Extract the QR code image (optimized strategy order)
Log analysis shows the Text strategy has the highest success rate on Douyin and Bilibili
"""
qr_element = None
# For Douyin and Bilibili, prefer the Text strategy (highest success rate, fastest)
if self.platform in ("douyin", "bilibili"):
# Try at most 2 times (first attempt + 1 retry)
for attempt in range(2):
if attempt > 0:
logger.info(f"[{self.platform}] 等待页面加载后重试...")
await asyncio.sleep(2)
# Strategy 1: Text (preferred, highest success rate)
qr_element = await self._try_text_strategy(page)
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[{self.platform}] Text策略截图失败: {e}")
qr_element = None
# Strategy 2: CSS (fallback)
if not qr_element:
try:
combined_selector = ", ".join(selectors)
logger.debug(f"[{self.platform}] 策略2(CSS): 开始等待...")
# Raise the timeout to 5 seconds (the Douyin page loads slowly)
el = await page.wait_for_selector(combined_selector, state="visible", timeout=5000)
if el:
logger.info(f"[{self.platform}] 策略2(CSS): 匹配成功")
screenshot = await el.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[{self.platform}] 策略2(CSS) 失败: {e}")
# If it already succeeded, leave the loop
if qr_element:
break
else:
if not qr_image:
await self._cleanup()
return {"success": False, "message": "未找到二维码"}
logger.info(f"[{self.platform}] 二维码已获取,等待扫码...")
# Start the background monitoring task (the browser stays open)
asyncio.create_task(
self._monitor_login_status(page, config["success_indicator"])
)
return {
"success": True,
"qr_code": qr_image,
"message": "请扫码登录"
}
except Exception as e:
logger.exception(f"[{self.platform}] 启动登录失败: {e}")
await self._cleanup()
return {"success": False, "message": f"启动失败: {str(e)}"}
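The monitoring task above is started with `asyncio.create_task` and its return value is discarded. The asyncio documentation advises keeping a reference to such tasks, since the event loop itself only holds weak references. One common way to do that is sketched below; `spawn_background` and `_background_tasks` are hypothetical names, not part of this service.

```python
import asyncio
from typing import Any, Coroutine, Set

# Strong references keep fire-and-forget tasks alive until they finish.
_background_tasks: Set["asyncio.Task[Any]"] = set()


def spawn_background(coro: Coroutine[Any, Any, Any]) -> "asyncio.Task[Any]":
    """Schedule a coroutine and hold a reference until it completes."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    # Drop the reference automatically once the task is done.
    task.add_done_callback(_background_tasks.discard)
    return task


async def demo() -> int:
    async def monitor() -> int:
        await asyncio.sleep(0.01)
        return 42

    task = spawn_background(monitor())
    assert task in _background_tasks  # referenced while running
    return await task
```

In a service object, storing the task on the instance (for example as an attribute that `_cleanup` can cancel) would serve the same purpose.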
async def _extract_qr_code(self, page: Page, selectors: List[str]) -> Optional[str]:
"""
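Extract the QR code image (optimized strategy order)
Douyin: CSS first (the Text strategy times out after 15 seconds on every attempt)
Bilibili: Text first
Others: CSS -> Text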
"""
qr_element = None
if self.platform == "douyin":
# Douyin: CSS first, Text as fallback (CSS has a high success rate and is fast)
for attempt in range(2):
if attempt > 0:
logger.info(f"[{self.platform}] 等待页面加载后重试...")
await asyncio.sleep(2)
# Strategy 1: CSS (fast)
try:
combined_selector = ", ".join(selectors)
logger.debug(f"[{self.platform}] 策略CSS: 开始等待...")
el = await page.wait_for_selector(combined_selector, state="visible", timeout=5000)
if el:
logger.info(f"[{self.platform}] 策略CSS: 匹配成功")
screenshot = await el.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[{self.platform}] 策略CSS 失败: {e}")
# Strategy 2: Text (fallback)
qr_element = await self._try_text_strategy(page)
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[{self.platform}] Text策略截图失败: {e}")
elif self.platform == "bilibili":
# Bilibili: Text first
for attempt in range(2):
if attempt > 0:
logger.info(f"[{self.platform}] 等待页面加载后重试...")
await asyncio.sleep(2)
qr_element = await self._try_text_strategy(page)
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[{self.platform}] Text策略截图失败: {e}")
qr_element = None
if not qr_element:
try:
combined_selector = ", ".join(selectors)
logger.debug(f"[{self.platform}] 策略CSS: 开始等待...")
el = await page.wait_for_selector(combined_selector, state="visible", timeout=5000)
if el:
logger.info(f"[{self.platform}] 策略CSS: 匹配成功")
screenshot = await el.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[{self.platform}] 策略CSS 失败: {e}")
else:
# Other platforms (Xiaohongshu, Weixin, etc.): keep the original order CSS -> Text
# Strategy 1: CSS selectors
try:
@@ -328,36 +437,31 @@ class QRLoginService:
logger.info(f"[{self.platform}] 策略1(CSS): 匹配成功")
except Exception as e:
logger.warning(f"[{self.platform}] 策略1(CSS) 失败: {e}")
# Strategy 2: Text
# Strategy 2: Text
if not qr_element:
qr_element = await self._try_text_strategy(page)
if not qr_element and self.platform == "weixin":
qr_element = await self._try_text_strategy_in_frames(page)
# If an element was found, screenshot it and return
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.error(f"[{self.platform}] 截图失败: {e}")
# All strategies failed
logger.error(f"[{self.platform}] 所有QR码提取策略失败")
# Save a debug screenshot
debug_dir = Path(__file__).parent.parent.parent / 'debug_screenshots'
debug_dir.mkdir(exist_ok=True)
await page.screenshot(path=str(debug_dir / f"{self.platform}_debug.png"))
return None
# If an element was found, screenshot it and return
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.error(f"[{self.platform}] 截图失败: {e}")
# All strategies failed
logger.error(f"[{self.platform}] 所有QR码提取策略失败")
return None
async def _try_text_strategy(self, page: Union[Page, Frame]) -> Optional[Any]:
"""Find the QR code image based on nearby text"""
try:
logger.debug(f"[{self.platform}] 策略Text: 开始搜索...")
"""Find the QR code image based on nearby text"""
try:
logger.debug(f"[{self.platform}] 策略Text: 开始搜索...")
keywords = [
"扫码登录",
"二维码",
@@ -368,138 +472,265 @@ class QRLoginService:
"请使用微信扫码",
"视频号"
]
for kw in keywords:
try:
text_el = page.get_by_text(kw, exact=False).first
await text_el.wait_for(state="visible", timeout=2000)
# Walk up the ancestors looking for an image
parent = text_el
for _ in range(5):
parent = parent.locator("..")
for kw in keywords:
try:
text_el = page.get_by_text(kw, exact=False).first
await text_el.wait_for(state="visible", timeout=2000)
# Walk up the ancestors looking for an image
parent = text_el
for _ in range(5):
parent = parent.locator("..")
candidates = parent.locator("img, canvas")
min_side = 120 if self.platform == "weixin" else 100
best = await self._pick_best_candidate(candidates, min_side=min_side)
if best:
logger.info(f"[{self.platform}] 策略Text: 成功")
return best
except Exception:
continue
except Exception as e:
logger.warning(f"[{self.platform}] 策略Text 失败: {e}")
return None
async def _monitor_login_status(self, page: Page, success_url: str):
"""Monitor login status"""
try:
logger.info(f"[{self.platform}] 开始监控登录状态...")
except Exception:
continue
except Exception as e:
logger.warning(f"[{self.platform}] 策略Text 失败: {e}")
return None
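async def _monitor_login_status(self, page: Page, success_url: str):
"""Monitor login status (simplified version)
Strategy:
1. Watch for page URL changes and the appearance of a session cookie (generic, works for every platform)
2. Douyin only: if the API interception captured a redirect_url, navigate there directly to obtain the cookie
3. Douyin only: if APP verification is required and the JS polling has stopped, wait for the user to finish verifying,
then revisit the home page with page.goto so the server assigns a session
"""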
try:
logger.info(f"[{self.platform}] 开始监控登录状态...")
key_cookies = {
"bilibili": ["SESSDATA"],
"douyin": ["sessionid"],
"douyin": ["sessionid", "sessionid_ss", "sid_guard", "sid_tt", "uid_tt"],
"xiaohongshu": ["web_session"],
"weixin": [
"wxuin",
"wxsid",
"pass_ticket",
"webwx_data_ticket",
"uin",
"skey",
"p_uin",
"p_skey",
"pac_uid",
],
"weixin": ["wxuin", "wxsid", "pass_ticket", "uin", "skey",
"p_uin", "p_skey", "pac_uid"],
}
target_cookies = key_cookies.get(self.platform, [])
for i in range(self.LOGIN_TIMEOUT):
await asyncio.sleep(1)
try:
if not self.context: break # guard against an unexpected close
cookies = [dict(cookie) for cookie in await self.context.cookies()]
current_url = page.url
has_cookie = any((c.get('name') in target_cookies) for c in cookies) if target_cookies else False
if i % 5 == 0:
logger.debug(f"[{self.platform}] 等待登录... HasCookie: {has_cookie}")
if success_url in current_url or has_cookie:
logger.success(f"[{self.platform}] 登录成功!")
self.login_success = True
await asyncio.sleep(2) # settle time
# Save cookies
final_cookies = [dict(cookie) for cookie in await self.context.cookies()]
await self._save_cookies(final_cookies)
break
except Exception as e:
logger.warning(f"[{self.platform}] 监控循环警告: {e}")
break
if not self.login_success:
logger.warning(f"[{self.platform}] 登录超时")
except Exception as e:
logger.error(f"[{self.platform}] 监控异常: {e}")
finally:
await self._cleanup()
async def _cleanup(self) -> None:
"""Release resources"""
if self.context:
try:
await self.context.close()
except Exception:
pass
self.context = None
if self.browser:
try:
await self.browser.close()
except Exception:
pass
self.browser = None
if self.playwright:
try:
await self.playwright.stop()
except Exception:
pass
self.playwright = None
initial_url = page.url
_verify_detected_at: Optional[int] = None # loop counter value when the verification requirement was first detected
for i in range(self.LOGIN_TIMEOUT):
await asyncio.sleep(1)
if not self.context:
break
try:
# ── Check for a session cookie ──
cookies = [dict(c) for c in await self.context.cookies()]
cookie_names = [c.get("name") for c in cookies]
has_session = any(n in cookie_names for n in target_cookies) if target_cookies else False
current_url = page.url
# Log once every 10 seconds
if i % 10 == 0:
logger.info(
f"[{self.platform}] 等待登录... i={i} "
f"URL={current_url[:80]} session={has_session} "
f"cookies={len(cookies)}"
)
# ── Success condition: a session cookie is present ──
if has_session:
logger.success(f"[{self.platform}] 登录成功检测到session cookie")
self.login_success = True
await asyncio.sleep(2)
final = [dict(c) for c in await self.context.cookies()]
await self._save_cookies(final)
break
# ── Success condition: the URL has moved to the target page ──
if success_url in current_url:
logger.success(f"[{self.platform}] 登录成功URL={current_url[:80]}")
self.login_success = True
await asyncio.sleep(2)
final = [dict(c) for c in await self.context.cookies()]
await self._save_cookies(final)
break
# ── Douyin: the API interception captured a redirect_url → navigate to it directly ──
if self.platform == "douyin" and self._qr_api_confirmed and self._qr_redirect_url:
logger.info("[douyin] 导航到 redirect_url...")
try:
await page.goto(self._qr_redirect_url, wait_until="domcontentloaded", timeout=30000)
except Exception:
pass
await asyncio.sleep(3)
# Reset; the next loop iteration will check the cookies
self._qr_api_confirmed = False
self._qr_redirect_url = None
continue
# ── Douyin requires APP verification: click the "手机刷脸验证" (phone face verification) option ──
if self.platform == "douyin" and self._douyin_needs_verify:
if _verify_detected_at is None:
_verify_detected_at = i
logger.info("[douyin] 检测到身份验证弹窗,将点击手机刷脸验证...")
elapsed = i - _verify_detected_at
# First step: click the "手机刷脸验证" option
if elapsed == 2:
try:
clicked = await page.evaluate("""() => {
// 查找身份验证弹窗中的选项
const allEls = document.querySelectorAll('div, span, p, a, li');
for (const el of allEls) {
const text = (el.textContent || '').trim();
// 点击"手机刷脸验证"
if (text.includes('刷脸验证') && text.length < 30) {
el.click();
return '刷脸验证';
}
}
return null;
}""")
if clicked:
logger.info(f"[douyin] 已点击验证选项: {clicked}")
else:
logger.warning("[douyin] 未找到验证选项")
except Exception as e:
logger.warning(f"[douyin] 点击验证选项异常: {e}")
# After the click, wait for the new QR code to appear and screenshot the QR code inside the dialog
if elapsed == 5 and not self._face_verify_qr:
try:
# Use JS to find the largest square img inside the "刷脸验证" dialog (the QR code, skipping avatars)
qr_selector = await page.evaluate("""() => {
// 找到包含"刷脸验证"文字的弹窗
const allEls = document.querySelectorAll('div, h2, h3, span, p');
let modal = null;
for (const el of allEls) {
const text = (el.textContent || '').trim();
if (text.includes('刷脸验证') && text.length < 20) {
modal = el;
for (let i = 0; i < 8; i++) {
if (!modal.parentElement) break;
modal = modal.parentElement;
if (modal.offsetWidth > 250 && modal.offsetHeight > 250) break;
}
break;
}
}
if (!modal) return null;
// 用 offsetWidth/Height显示尺寸而非 naturalWidth源文件可能很大
const imgs = modal.querySelectorAll('img');
let best = null;
let bestArea = 0;
for (const img of imgs) {
const w = img.offsetWidth;
const h = img.offsetHeight;
if (w < 80 || h < 80) continue;
const ratio = Math.abs(w - h) / Math.max(w, h);
if (ratio > 0.3) continue;
const area = w * h;
if (area > bestArea) {
bestArea = area;
best = img;
}
}
if (best) {
best.setAttribute('data-face-qr', 'true');
return 'img[data-face-qr="true"]';
}
return null;
}""")
if qr_selector:
qr_el = page.locator(qr_selector).first
if await qr_el.is_visible():
screenshot = await qr_el.screenshot()
self._face_verify_qr = base64.b64encode(screenshot).decode()
logger.info("[douyin] 刷脸弹窗内二维码截图已捕获")
else:
logger.warning("[douyin] 二维码元素不可见")
if not self._face_verify_qr:
# Fallback: full-page screenshot
logger.warning("[douyin] 未在弹窗内找到二维码,使用全页截图")
screenshot = await page.screenshot()
self._face_verify_qr = base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[douyin] 截取刷脸二维码异常: {e}")
# After that, log once every 10 seconds
if elapsed > 0 and elapsed % 10 == 0:
logger.info(f"[douyin] 等待用户完成手机验证... ({elapsed}s)")
except Exception as e:
logger.warning(f"[{self.platform}] 监控异常: {e}")
if not self.login_success:
logger.warning(f"[{self.platform}] 登录超时")
except Exception as e:
logger.error(f"[{self.platform}] 监控异常: {e}")
finally:
await self._cleanup()
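The in-page JavaScript above chooses the QR code by picking the largest roughly square image in the dialog. The same filter is sketched here in Python over plain `(width, height)` pairs. The 80-pixel minimum and 0.3 skew limit are the values used in the script, while the function name is an assumption.

```python
from typing import Iterable, Optional, Tuple


def pick_qr_candidate(
    sizes: Iterable[Tuple[int, int]],
    min_side: int = 80,
    max_skew: float = 0.3,
) -> Optional[int]:
    """Return the index of the largest roughly square box, or None if none qualify."""
    best_index: Optional[int] = None
    best_area = 0
    for index, (w, h) in enumerate(sizes):
        if w < min_side or h < min_side:
            continue  # too small, likely an icon or avatar
        if abs(w - h) / max(w, h) > max_skew:
            continue  # not square enough to be a QR code
        area = w * h
        if area > best_area:
            best_index, best_area = index, area
    return best_index
```

The sizes here correspond to displayed dimensions (`offsetWidth`/`offsetHeight` in the script), not the intrinsic size of the image file.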
async def _cleanup(self) -> None:
"""Release resources"""
if self.context:
try:
await self.context.close()
except Exception:
pass
self.context = None
if self.browser:
try:
await self.browser.close()
except Exception:
pass
self.browser = None
if self.playwright:
try:
await self.playwright.stop()
except Exception:
pass
self.playwright = None
async def _save_cookies(self, cookies: Sequence[Mapping[str, Any]]) -> None:
"""Save cookies to a file"""
try:
cookie_file = self.cookies_dir / f"{self.platform}_cookies.json"
if self.platform == "bilibili":
# Bilibili uses a simple format (required by the biliup library)
"""Save cookies to a file"""
try:
cookie_file = self.cookies_dir / f"{self.platform}_cookies.json"
if self.platform == "bilibili":
# Bilibili uses a simple format (required by the biliup library)
cookie_dict = {c.get('name'): c.get('value') for c in cookies if c.get('name')}
required = ['SESSDATA', 'bili_jct', 'DedeUserID', 'DedeUserID__ckMd5']
cookie_dict = {k: v for k, v in cookie_dict.items() if k in required}
with open(cookie_file, 'w', encoding='utf-8') as f:
json.dump(cookie_dict, f, indent=2)
self.cookies_data = cookie_dict
else:
# Douyin/Xiaohongshu use the full Playwright storage_state format
# so the file can be passed directly to browser.new_context(storage_state=file)
storage_state = {
"cookies": cookies,
"origins": []
}
with open(cookie_file, 'w', encoding='utf-8') as f:
json.dump(storage_state, f, indent=2)
self.cookies_data = storage_state
logger.success(f"[{self.platform}] Cookie已保存")
except Exception as e:
logger.error(f"[{self.platform}] 保存Cookie失败: {e}")
def get_login_status(self) -> Dict[str, Any]:
"""Get the login status"""
return {
"success": self.login_success,
"cookies_saved": self.cookies_data is not None
}
required = ['SESSDATA', 'bili_jct', 'DedeUserID', 'DedeUserID__ckMd5']
cookie_dict = {k: v for k, v in cookie_dict.items() if k in required}
with open(cookie_file, 'w', encoding='utf-8') as f:
json.dump(cookie_dict, f, indent=2)
self.cookies_data = cookie_dict
else:
# Douyin/Xiaohongshu use the full Playwright storage_state format
# so the file can be passed directly to browser.new_context(storage_state=file)
storage_state = {
"cookies": cookies,
"origins": []
}
with open(cookie_file, 'w', encoding='utf-8') as f:
json.dump(storage_state, f, indent=2)
self.cookies_data = storage_state
logger.success(f"[{self.platform}] Cookie已保存")
except Exception as e:
logger.error(f"[{self.platform}] 保存Cookie失败: {e}")
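The two on-disk cookie formats written above can be sketched as a pure function that builds the payload before it is dumped to JSON. The function name is an assumption; the four Bilibili cookie names are the ones listed in the code.

```python
from typing import Any, Dict, Mapping, Sequence

BILIBILI_REQUIRED = ("SESSDATA", "bili_jct", "DedeUserID", "DedeUserID__ckMd5")


def build_cookie_payload(
    platform: str, cookies: Sequence[Mapping[str, Any]]
) -> Dict[str, Any]:
    """Build the JSON-serializable structure saved for a platform."""
    if platform == "bilibili":
        # Flat name -> value mapping restricted to the required cookies.
        flat = {c.get("name"): c.get("value") for c in cookies if c.get("name")}
        return {k: v for k, v in flat.items() if k in BILIBILI_REQUIRED}
    # Other platforms: storage_state layout with cookies and (empty) origins.
    return {"cookies": [dict(c) for c in cookies], "origins": []}
```

The `origins` list is left empty, as in the diff, so local-storage entries are not restored when the file is loaded as a storage state.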
def get_login_status(self) -> Dict[str, Any]:
"""Get the login status"""
result: Dict[str, Any] = {
"success": self.login_success,
"cookies_saved": self.cookies_data is not None
}
# Face verification: return the new QR code screenshot for the frontend to display
if self._face_verify_qr:
result["face_verify_qr"] = self._face_verify_qr
return result


@@ -7,6 +7,7 @@ import asyncio
import json
import os
import subprocess
from collections.abc import Callable
from pathlib import Path
from typing import Optional
from loguru import logger
@@ -29,12 +30,15 @@ class RemotionService:
output_path: str,
captions_path: Optional[str] = None,
title: Optional[str] = None,
title_duration: float = 3.0,
title_duration: float = 4.0,
title_display_mode: str = "short",
fps: int = 25,
enable_subtitles: bool = True,
subtitle_style: Optional[dict] = None,
title_style: Optional[dict] = None,
on_progress: Optional[callable] = None
secondary_title: Optional[str] = None,
secondary_title_style: Optional[dict] = None,
on_progress: Optional[Callable[[int], None]] = None
) -> str:
"""
Render the video with Remotion (adds subtitles and a title)
@@ -45,6 +49,7 @@ class RemotionService:
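captions_path: path to the subtitle JSON file (generated by Whisper)
title: video title (optional)
title_duration: how long the title is displayed, in seconds
title_display_mode: title display mode (short/persistent)
fps: frame rate
enable_subtitles: whether subtitles are enabled
on_progress: progress callback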
@@ -75,6 +80,7 @@ class RemotionService:
if title:
cmd.extend(["--title", title])
cmd.extend(["--titleDuration", str(title_duration)])
cmd.extend(["--titleDisplayMode", title_display_mode])
if subtitle_style:
cmd.extend(["--subtitleStyle", json.dumps(subtitle_style, ensure_ascii=False)])
@@ -82,6 +88,12 @@ class RemotionService:
if title_style:
cmd.extend(["--titleStyle", json.dumps(title_style, ensure_ascii=False)])
if secondary_title:
cmd.extend(["--secondaryTitle", secondary_title])
if secondary_title_style:
cmd.extend(["--secondaryTitleStyle", json.dumps(secondary_title_style, ensure_ascii=False)])
logger.info(f"Running Remotion render: {' '.join(cmd)}")
# Run the subprocess in a thread pool
@@ -95,8 +107,12 @@ class RemotionService:
bufsize=1
)
if process.stdout is None:
raise RuntimeError("Remotion process stdout is unavailable")
stdout = process.stdout
output_lines = []
for line in iter(process.stdout.readline, ''):
for line in iter(stdout.readline, ''):
line = line.strip()
if line:
output_lines.append(line)
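
The optional CLI flags assembled in this hunk can be sketched as a standalone builder. The flag names come from the diff; the function name `build_optional_args` is hypothetical, and nesting `--titleDuration` and `--titleDisplayMode` under the title check is an assumption, because indentation is not visible in this view.

```python
import json
from typing import List, Optional


def build_optional_args(
    title: Optional[str] = None,
    title_duration: float = 4.0,
    title_display_mode: str = "short",
    subtitle_style: Optional[dict] = None,
    title_style: Optional[dict] = None,
    secondary_title: Optional[str] = None,
    secondary_title_style: Optional[dict] = None,
) -> List[str]:
    """Build the optional Remotion render flags as an argument list (sketch)."""
    args: List[str] = []
    if title:
        # Assumption: duration and display mode are only passed together with a title.
        args += ["--title", title]
        args += ["--titleDuration", str(title_duration)]
        args += ["--titleDisplayMode", title_display_mode]
    if subtitle_style:
        args += ["--subtitleStyle", json.dumps(subtitle_style, ensure_ascii=False)]
    if title_style:
        args += ["--titleStyle", json.dumps(title_style, ensure_ascii=False)]
    if secondary_title:
        args += ["--secondaryTitle", secondary_title]
    if secondary_title_style:
        args += ["--secondaryTitleStyle", json.dumps(secondary_title_style, ensure_ascii=False)]
    return args
```

Passing the styles as JSON strings in a list (rather than a shell string) keeps quoting out of the picture, which matches the argument-list style used for the subprocess call.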


@@ -20,12 +20,13 @@ class StorageService:
self.BUCKET_MATERIALS = "materials"
self.BUCKET_OUTPUTS = "outputs"
self.BUCKET_REF_AUDIOS = "ref-audios"
self.BUCKET_GENERATED_AUDIOS = "generated-audios"
# Make sure every bucket exists
self._ensure_buckets()
def _ensure_buckets(self):
"""Make sure every required bucket exists."""
buckets = [self.BUCKET_MATERIALS, self.BUCKET_OUTPUTS, self.BUCKET_REF_AUDIOS]
buckets = [self.BUCKET_MATERIALS, self.BUCKET_OUTPUTS, self.BUCKET_REF_AUDIOS, self.BUCKET_GENERATED_AUDIOS]
try:
existing = self.supabase.storage.list_buckets()
existing_names = {b.name for b in existing} if existing else set()


@@ -127,6 +127,22 @@ class WeixinUploader(BaseUploader):
return False
def _attach_debug_listeners(self, page) -> None:
# Always register the post_create response listener (independent of the debug switch)
def log_post_create(response):
try:
url = response.url or ""
if "/post/post_create" in url:
if response.status < 400:
self._post_create_submitted = True
logger.info("[weixin][publish] post_create API ok")
else:
self._publish_api_error = f"发布请求失败HTTP {response.status}"
logger.warning(f"[weixin][publish] post_create_failed status={response.status}")
except Exception:
pass
page.on("response", log_post_create)
if not self._debug_artifacts_enabled():
return
@@ -1210,15 +1226,7 @@ class WeixinUploader(BaseUploader):
return False
async def _wait_for_publish_result(self, page):
success_texts = [
"\u53d1\u5e03\u6210\u529f",
"\u53d1\u5e03\u5b8c\u6210",
"\u5df2\u53d1\u5e03",
"\u5ba1\u6838\u4e2d",
"\u5f85\u5ba1\u6838",
"\u63d0\u4ea4\u6210\u529f",
"\u5df2\u63d0\u4ea4",
]
"""After clicking Publish, wait for the result: leaving the create page counts as success."""
failure_texts = [
"\u53d1\u5e03\u5931\u8d25",
"\u53d1\u5e03\u5f02\u5e38",
@@ -1229,38 +1237,33 @@ class WeixinUploader(BaseUploader):
"\u7f51\u7edc\u5f02\u5e38",
]
# Record the URL at the moment Publish is clicked, used to detect navigation
create_url = page.url
start_time = time.time()
last_capture = -1
while time.time() - start_time < self.PUBLISH_TIMEOUT:
current_url = page.url
# API-level error → fail immediately
if self._publish_api_error:
return False, self._publish_api_error, False
if self._post_create_submitted and (
"/post/list" in current_url
or "/platform/post/list" in current_url
):
return True, "发布成功:已进入内容列表", False
# Core check: the URL has left the create page (list page or any other page) → publish succeeded
if current_url != create_url and "/post/create" not in current_url:
logger.info(f"[weixin] page navigated away from create page: {current_url}")
return True, "发布成功:页面已跳转", False
if "channels.weixin.qq.com/platform" in current_url:
for text in success_texts:
if await self._is_text_visible(page, text, exact=False):
return True, f"发布成功:{text}", False
# post_create API already confirmed → also treated as success
if self._post_create_submitted:
logger.info("[weixin] post_create API confirmed success")
return True, "发布成功:API 已确认", False
# Check the page for failure messages
for text in failure_texts:
if await self._is_text_visible(page, text, exact=False):
return False, f"发布失败:{text}", False
for text in success_texts:
if await self._is_text_visible(page, text, exact=False):
return True, f"发布成功:{text}", False
logger.info("[weixin] waiting for publish result...")
elapsed = int(time.time() - start_time)
if elapsed % 20 == 0 and elapsed != last_capture:
last_capture = elapsed
await self._save_debug_screenshot(page, "publish_waiting")
await asyncio.sleep(self.POLL_INTERVAL)
return False, "发布超时", True
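
The order of checks in the new polling loop can be sketched as a pure decision function. This is an illustrative sketch only: `judge_publish` is a hypothetical name, the result messages are placeholders, and the on-page failure-text check is left out because it needs a live page.

```python
from typing import Optional, Tuple


def judge_publish(
    create_url: str,
    current_url: str,
    api_error: Optional[str],
    post_create_submitted: bool,
) -> Optional[Tuple[bool, str]]:
    """Return (success, reason) once a verdict is possible, else None to keep polling."""
    # 1. An API-level error fails the publish immediately.
    if api_error:
        return False, api_error
    # 2. Leaving the create page counts as success.
    if current_url != create_url and "/post/create" not in current_url:
        return True, "page navigated away from create page"
    # 3. A confirmed post_create response also counts as success.
    if post_create_submitted:
        return True, "post_create API confirmed"
    # 4. No verdict yet: the caller keeps polling until the timeout.
    return None
```

Ordering the API error first means a failed request is never masked by a later redirect.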


@@ -1,140 +1,405 @@
"""
Video composition service
"""
import os
import subprocess
import json
import shlex
from pathlib import Path
from loguru import logger
from typing import Optional
class VideoService:
def __init__(self):
pass
def _run_ffmpeg(self, cmd: list) -> bool:
cmd_str = ' '.join(shlex.quote(str(c)) for c in cmd)
logger.debug(f"FFmpeg CMD: {cmd_str}")
try:
# Synchronous call for BackgroundTasks compatibility
result = subprocess.run(
cmd,
shell=False,
capture_output=True,
text=True,
encoding='utf-8',
)
if result.returncode != 0:
logger.error(f"FFmpeg Error: {result.stderr}")
return False
return True
except Exception as e:
logger.error(f"FFmpeg Exception: {e}")
return False
def _get_duration(self, file_path: str) -> float:
# Synchronous call for BackgroundTasks compatibility
# Pass the command as an argument list to avoid the shell=True command-injection risk
cmd = [
'ffprobe', '-v', 'error',
'-show_entries', 'format=duration',
'-of', 'default=noprint_wrappers=1:nokey=1',
file_path
]
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
)
return float(result.stdout.strip())
except Exception:
return 0.0
def mix_audio(
self,
voice_path: str,
bgm_path: str,
output_path: str,
bgm_volume: float = 0.2
) -> str:
"""Mix the voice track with background music."""
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
volume = max(0.0, min(float(bgm_volume), 1.0))
filter_complex = (
f"[0:a]volume=1.0[a0];"
f"[1:a]volume={volume}[a1];"
f"[a0][a1]amix=inputs=2:duration=first:dropout_transition=2:normalize=0[aout]"
)
cmd = [
"ffmpeg", "-y",
"-i", voice_path,
"-stream_loop", "-1", "-i", bgm_path,
"-filter_complex", filter_complex,
"-map", "[aout]",
"-c:a", "pcm_s16le",
"-shortest",
output_path,
]
if self._run_ffmpeg(cmd):
return output_path
raise RuntimeError("FFmpeg audio mix failed")
async def compose(
self,
video_path: str,
audio_path: str,
output_path: str,
subtitle_path: Optional[str] = None
) -> str:
"""Compose the final video."""
# Ensure output dir
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
video_duration = self._get_duration(video_path)
audio_duration = self._get_duration(audio_path)
# Loop the video if the audio is longer
loop_count = 1
if audio_duration > video_duration and video_duration > 0:
loop_count = int(audio_duration / video_duration) + 1
cmd = ["ffmpeg", "-y"]
# Input video (stream_loop must be before -i)
if loop_count > 1:
cmd.extend(["-stream_loop", str(loop_count)])
cmd.extend(["-i", video_path])
# Input audio
cmd.extend(["-i", audio_path])
# Filter complex
filter_complex = []
# Subtitles (skip for now to mimic previous state or implement basic)
# Previous state: subtitles disabled due to font issues
# if subtitle_path: ...
# Audio map with high quality encoding
cmd.extend([
"-c:v", "libx264",
"-preset", "slow", # slow preset for better compression efficiency
"-crf", "18", # high quality (matches LatentSync)
"-c:a", "aac",
"-b:a", "192k", # audio bitrate
"-shortest"
])
# Use audio from input 1
cmd.extend(["-map", "0:v", "-map", "1:a"])
cmd.append(output_path)
if self._run_ffmpeg(cmd):
return output_path
else:
raise RuntimeError("FFmpeg composition failed")
"""
Video composition service
"""
import os
import subprocess
import json
import shlex
from pathlib import Path
from loguru import logger
from typing import Optional
class VideoService:
def __init__(self):
pass
def get_video_metadata(self, file_path: str) -> dict:
"""Return video metadata (including the rotation angle and effective display resolution)."""
cmd = [
"ffprobe", "-v", "error",
"-select_streams", "v:0",
"-show_entries", "stream=width,height:stream_side_data=rotation",
"-of", "json",
file_path,
]
default_info = {
"width": 0,
"height": 0,
"rotation": 0,
"effective_width": 0,
"effective_height": 0,
}
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
if result.returncode != 0:
return default_info
payload = json.loads(result.stdout or "{}")
streams = payload.get("streams") or []
if not streams:
return default_info
stream = streams[0]
width = int(stream.get("width") or 0)
height = int(stream.get("height") or 0)
rotation = 0
for side_data in stream.get("side_data_list") or []:
if not isinstance(side_data, dict):
continue
raw_rotation = side_data.get("rotation")
if raw_rotation is None:
continue
try:
rotation = int(round(float(str(raw_rotation))))
except Exception:
rotation = 0
break
norm_rotation = rotation % 360
if norm_rotation > 180:
norm_rotation -= 360
swap_wh = abs(norm_rotation) == 90
effective_width = height if swap_wh else width
effective_height = width if swap_wh else height
return {
"width": width,
"height": height,
"rotation": norm_rotation,
"effective_width": effective_width,
"effective_height": effective_height,
}
except Exception as e:
logger.warning(f"获取视频元信息失败: {e}")
return default_info
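
The rotation handling in `get_video_metadata` reduces to two small steps: fold the angle into the range (-180, 180], then swap width and height for quarter turns. A minimal sketch of just that arithmetic, with hypothetical helper names:

```python
from typing import Tuple


def normalize_rotation(rotation: int) -> int:
    """Fold any rotation angle into the range (-180, 180]."""
    norm = rotation % 360
    if norm > 180:
        norm -= 360
    return norm


def effective_size(width: int, height: int, rotation: int) -> Tuple[int, int]:
    """Return the displayed (width, height) once the rotation is applied."""
    if abs(normalize_rotation(rotation)) == 90:
        # A quarter turn swaps the coded width and height.
        return height, width
    return width, height
```

For example, a 1920x1080 stream tagged with a rotation of -90 or 270 is displayed as 1080x1920, which is the value downstream steps should plan with.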
def normalize_orientation(self, video_path: str, output_path: str) -> str:
"""Re-encode a video that carries rotation metadata into its physical orientation, so later steps cannot ignore the rotation."""
info = self.get_video_metadata(video_path)
rotation = int(info.get("rotation") or 0)
if rotation == 0:
return video_path
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
logger.info(
f"检测到旋转元数据 rotation={rotation},归一化方向: "
f"{info.get('effective_width', 0)}x{info.get('effective_height', 0)}"
)
cmd = [
"ffmpeg", "-y",
"-i", video_path,
"-map", "0:v:0",
"-map", "0:a?",
"-c:v", "libx264",
"-preset", "fast",
"-crf", "23",
"-c:a", "copy",
"-movflags", "+faststart",
output_path,
]
if self._run_ffmpeg(cmd):
normalized = self.get_video_metadata(output_path)
logger.info(
"视频方向归一化完成: "
f"coded={normalized.get('width', 0)}x{normalized.get('height', 0)}, "
f"rotation={normalized.get('rotation', 0)}"
)
return output_path
logger.warning("视频方向归一化失败,回退使用原视频")
return video_path
def _run_ffmpeg(self, cmd: list) -> bool:
cmd_str = ' '.join(shlex.quote(str(c)) for c in cmd)
logger.debug(f"FFmpeg CMD: {cmd_str}")
try:
# Synchronous call for BackgroundTasks compatibility
result = subprocess.run(
cmd,
shell=False,
capture_output=True,
text=True,
encoding='utf-8',
)
if result.returncode != 0:
logger.error(f"FFmpeg Error: {result.stderr}")
return False
return True
except Exception as e:
logger.error(f"FFmpeg Exception: {e}")
return False
def _get_duration(self, file_path: str) -> float:
# Synchronous call for BackgroundTasks compatibility
# Pass the command as an argument list to avoid the shell=True command-injection risk
cmd = [
'ffprobe', '-v', 'error',
'-show_entries', 'format=duration',
'-of', 'default=noprint_wrappers=1:nokey=1',
file_path
]
try:
result = subprocess.run(
cmd,
capture_output=True,
text=True,
)
return float(result.stdout.strip())
except Exception:
return 0.0
def mix_audio(
self,
voice_path: str,
bgm_path: str,
output_path: str,
bgm_volume: float = 0.2
) -> str:
"""Mix the voice track with background music."""
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
volume = max(0.0, min(float(bgm_volume), 1.0))
filter_complex = (
f"[0:a]volume=1.0[a0];"
f"[1:a]volume={volume}[a1];"
f"[a0][a1]amix=inputs=2:duration=first:dropout_transition=2:normalize=0[aout]"
)
cmd = [
"ffmpeg", "-y",
"-i", voice_path,
"-stream_loop", "-1", "-i", bgm_path,
"-filter_complex", filter_complex,
"-map", "[aout]",
"-c:a", "pcm_s16le",
"-shortest",
output_path,
]
if self._run_ffmpeg(cmd):
return output_path
raise RuntimeError("FFmpeg audio mix failed")
async def compose(
self,
video_path: str,
audio_path: str,
output_path: str,
subtitle_path: Optional[str] = None
) -> str:
"""Compose the final video."""
# Ensure output dir
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
video_duration = self._get_duration(video_path)
audio_duration = self._get_duration(audio_path)
# Loop the video if the audio is longer
loop_count = 1
if audio_duration > video_duration and video_duration > 0:
loop_count = int(audio_duration / video_duration) + 1
cmd = ["ffmpeg", "-y"]
# Input video (stream_loop must be before -i)
if loop_count > 1:
cmd.extend(["-stream_loop", str(loop_count)])
cmd.extend(["-i", video_path])
# Input audio
cmd.extend(["-i", audio_path])
# Filter complex
filter_complex = []
# Subtitles (skip for now to mimic previous state or implement basic)
# Previous state: subtitles disabled due to font issues
# if subtitle_path: ...
# Audio map with high quality encoding
cmd.extend([
"-c:v", "libx264",
"-preset", "medium", # balance speed and compression efficiency
"-crf", "20", # final output: high quality (visually lossless)
"-c:a", "aac",
"-b:a", "192k", # audio bitrate
"-shortest"
])
# Use audio from input 1
cmd.extend(["-map", "0:v", "-map", "1:a"])
cmd.append(output_path)
if self._run_ffmpeg(cmd):
return output_path
else:
raise RuntimeError("FFmpeg composition failed")
def concat_videos(self, video_paths: list, output_path: str, target_fps: int = 25) -> str:
"""Concatenate multiple video segments with the FFmpeg concat demuxer."""
if not video_paths:
raise ValueError("No video segments to concat")
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
# Write the concat list file
list_path = Path(output_path).parent / f"{Path(output_path).stem}_concat.txt"
with open(list_path, "w", encoding="utf-8") as f:
for vp in video_paths:
f.write(f"file '{vp}'\n")
cmd = [
"ffmpeg", "-y",
"-f", "concat",
"-safe", "0",
"-fflags", "+genpts",
"-i", str(list_path),
"-an",
"-vsync", "cfr",
"-r", str(target_fps),
"-c:v", "libx264",
"-preset", "fast",
"-crf", "23",
"-pix_fmt", "yuv420p",
"-movflags", "+faststart",
output_path,
]
try:
if self._run_ffmpeg(cmd):
return output_path
else:
raise RuntimeError("FFmpeg concat failed")
finally:
try:
list_path.unlink(missing_ok=True)
except Exception:
pass
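
The concat list written by `concat_videos` wraps each path in single quotes. A path that itself contains a single quote would end the quoted string early, so a defensive variant can escape it using FFmpeg's quoting rule (close the quote, emit an escaped quote, reopen). The helper name `concat_list_text` is hypothetical, and this is a sketch rather than the project's code:

```python
from typing import Iterable


def concat_list_text(video_paths: Iterable[str]) -> str:
    """Build the text of an FFmpeg concat-demuxer list file (sketch)."""
    lines = []
    for vp in video_paths:
        # Inside single quotes, a literal ' is written as '\'' in FFmpeg's quoting syntax.
        escaped = str(vp).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"
```

If every segment path is generated by the service itself and never contains quotes, the unescaped form behaves the same; the escape only matters for user-influenced file names.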
def split_audio(self, audio_path: str, start: float, end: float, output_path: str) -> str:
"""Cut the audio to a time range with FFmpeg."""
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
duration = end - start
if duration <= 0:
raise ValueError(f"Invalid audio split range: start={start}, end={end}, duration={duration}")
cmd = [
"ffmpeg", "-y",
"-ss", str(start),
"-t", str(duration),
"-i", audio_path,
"-c", "copy",
output_path,
]
if self._run_ffmpeg(cmd):
return output_path
raise RuntimeError(f"FFmpeg audio split failed: {start}-{end}")
def get_resolution(self, file_path: str) -> tuple[int, int]:
"""Return the effective display resolution of the video (taking rotation metadata into account)."""
info = self.get_video_metadata(file_path)
return (
int(info.get("effective_width") or 0),
int(info.get("effective_height") or 0),
)
def prepare_segment(self, video_path: str, target_duration: float, output_path: str,
target_resolution: Optional[tuple] = None, source_start: float = 0.0,
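source_end: Optional[float] = None, target_fps: Optional[int] = None) -> str:
"""Trim or loop a source video to the requested duration (no audio).
target_resolution: pass (width, height) to force a uniform resolution; otherwise the original resolution is kept.
source_start: start offset in the source video (seconds), default 0.
source_end: end offset in the source video (seconds), defaults to the end of the clip.
target_fps: output frame rate (optional), used to unify the time base before concatenating multiple clips.
"""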
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
video_dur = self._get_duration(video_path)
if video_dur <= 0:
video_dur = target_duration
clip_end = video_dur
if source_end is not None:
try:
source_end_value = float(source_end)
if source_end_value > source_start:
clip_end = min(source_end_value, video_dur)
except Exception:
pass
# Available duration = from source_start to the clip end (clip_end)
available = max(clip_end - source_start, 0.1)
needs_loop = target_duration > available
needs_scale = target_resolution is not None
needs_fps = bool(target_fps and target_fps > 0)
has_source_end = clip_end < video_dur
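# When looping is needed and a source range is set, trim the clip first and then loop the trimmed file,
# so that stream_loop does not loop the whole video instead of the trimmed clip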
actual_input = video_path
trim_temp = None
if needs_loop and (source_start > 0 or has_source_end):
trim_temp = str(Path(output_path).parent / (Path(output_path).stem + "_trim_tmp.mp4"))
trim_cmd = [
"ffmpeg", "-y",
"-ss", str(source_start),
"-i", video_path,
"-t", str(available),
"-an",
"-c:v", "libx264", "-preset", "fast", "-crf", "23",
trim_temp,
]
if not self._run_ffmpeg(trim_cmd):
raise RuntimeError(f"FFmpeg trim for loop failed: {video_path}")
actual_input = trim_temp
source_start = 0.0 # already trimmed, no further seek needed
# Recompute the loop count (based on the trimmed file)
available = self._get_duration(trim_temp) or available
loop_count = int(target_duration / available) + 1 if needs_loop else 0
cmd = ["ffmpeg", "-y"]
if needs_loop:
cmd.extend(["-stream_loop", str(loop_count)])
if source_start > 0:
cmd.extend(["-ss", str(source_start)])
cmd.extend(["-i", actual_input, "-t", str(target_duration), "-an"])
filters = []
if needs_fps:
filters.append(f"fps={int(target_fps)}")
if needs_scale:
w, h = target_resolution
filters.append(f"scale={w}:{h}:force_original_aspect_ratio=decrease,pad={w}:{h}:(ow-iw)/2:(oh-ih)/2")
if filters:
cmd.extend(["-vf", ",".join(filters)])
if needs_fps:
cmd.extend(["-vsync", "cfr", "-r", str(int(target_fps))])
# Looping, scaling, or a custom start point requires re-encoding; otherwise use stream copy to keep the original quality
if needs_loop or needs_scale or source_start > 0 or has_source_end or needs_fps:
cmd.extend(["-c:v", "libx264", "-preset", "fast", "-crf", "23"])
else:
cmd.extend(["-c:v", "copy"])
cmd.append(output_path)
try:
if self._run_ffmpeg(cmd):
return output_path
raise RuntimeError(f"FFmpeg prepare_segment failed: {video_path}")
finally:
# Clean up the temporary trimmed file
if trim_temp:
try:
Path(trim_temp).unlink(missing_ok=True)
except Exception:
pass
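
The planning part of `prepare_segment` (how much source material is available, whether to loop, and whether a pre-trim pass is needed) can be isolated from the FFmpeg calls. The sketch below mirrors the arithmetic in the diff; `plan_segment` and the returned dict keys are hypothetical, and it skips the step where the real code re-measures the trimmed file's duration.

```python
from typing import Any, Dict, Optional


def plan_segment(
    video_dur: float,
    target_duration: float,
    source_start: float = 0.0,
    source_end: Optional[float] = None,
) -> Dict[str, Any]:
    """Decide looping and pre-trimming for one segment (sketch of the logic above)."""
    if video_dur <= 0:
        # Unknown duration: assume the clip is exactly as long as needed.
        video_dur = target_duration
    clip_end = video_dur
    if source_end is not None and source_end > source_start:
        clip_end = min(source_end, video_dur)
    available = max(clip_end - source_start, 0.1)
    needs_loop = target_duration > available
    has_source_end = clip_end < video_dur
    # Looping a sub-range requires trimming first, otherwise the whole file is looped.
    pre_trim = needs_loop and (source_start > 0 or has_source_end)
    loop_count = int(target_duration / available) + 1 if needs_loop else 0
    return {
        "available": available,
        "needs_loop": needs_loop,
        "pre_trim": pre_trim,
        "loop_count": loop_count,
    }
```

For a 10 s clip restricted to the 2 s to 6 s range and a 25 s target, 4 s are available, so the plan is to pre-trim and loop the trimmed clip.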


@@ -1,37 +1,104 @@
"""
Voice cloning service
Calls the standalone Qwen3-TTS service over HTTP (port 8009)
Calls the standalone CosyVoice 3.0 service over HTTP (port 8010)
"""
import httpx
import asyncio
from pathlib import Path
from typing import Optional
import httpx
from loguru import logger
from app.core.config import settings
# Qwen3-TTS service URL
QWEN_TTS_URL = "http://localhost:8009"
# CosyVoice 3.0 service URL
VOICE_CLONE_URL = "http://localhost:8010"
class VoiceCloneService:
"""Voice cloning service, calls the Qwen3-TTS HTTP API"""
"""Voice cloning service, calls the CosyVoice 3.0 HTTP API"""
def __init__(self):
self.base_url = QWEN_TTS_URL
self.base_url = VOICE_CLONE_URL
# Health status cache
self._health_cache: Optional[dict] = None
self._health_cache_time: float = 0
# GPU concurrency lock (serial queue)
self._lock = asyncio.Lock()
async def _generate_once(
self,
*,
text: str,
ref_audio_data: bytes,
ref_text: str,
language: str,
speed: float = 1.0,
max_retries: int = 4,
) -> bytes:
timeout = httpx.Timeout(240.0)
for attempt in range(max_retries):
try:
async with httpx.AsyncClient(timeout=timeout) as client:
response = await client.post(
f"{self.base_url}/generate",
files={"ref_audio": ("ref.wav", ref_audio_data, "audio/wav")},
data={
"text": text,
"ref_text": ref_text,
"language": language,
"speed": str(speed),
},
)
retryable = False
reason = ""
if response.status_code in (429, 502, 503, 504):
retryable = True
reason = f"HTTP {response.status_code}"
elif response.status_code == 500 and (
"生成超时" in response.text or "timeout" in response.text.lower()
):
retryable = True
reason = "upstream timeout"
if retryable and attempt < max_retries - 1:
wait = 8 * (attempt + 1)
logger.warning(
f"Voice clone retryable error ({reason}), retrying in {wait}s "
f"(attempt {attempt + 1}/{max_retries})"
)
await asyncio.sleep(wait)
continue
response.raise_for_status()
return response.content
except httpx.HTTPStatusError as e:
logger.error(f"Voice clone API error: {e.response.status_code} - {e.response.text}")
raise RuntimeError(f"声音克隆服务错误: {e.response.text}")
except httpx.RequestError as e:
if attempt < max_retries - 1:
wait = 6 * (attempt + 1)
logger.warning(
f"Voice clone connection error: {e}; retrying in {wait}s "
f"(attempt {attempt + 1}/{max_retries})"
)
await asyncio.sleep(wait)
continue
logger.error(f"Voice clone connection error: {e}")
raise RuntimeError("无法连接声音克隆服务,请检查服务是否启动")
raise RuntimeError("声音克隆服务繁忙,请稍后重试")
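
The retry policy in `_generate_once` has two parts: deciding whether a response is worth retrying, and computing a linearly growing wait. A minimal sketch of both, with hypothetical helper names; the status codes, the timeout markers, and the 8 s and 6 s step sizes are taken from the diff.

```python
from typing import List


def retry_reason(status_code: int, body: str) -> str:
    """Return a short retry reason, or an empty string when the response is not retryable."""
    if status_code in (429, 502, 503, 504):
        return f"HTTP {status_code}"
    if status_code == 500 and ("生成超时" in body or "timeout" in body.lower()):
        return "upstream timeout"
    return ""


def wait_schedule(max_retries: int = 4, step: int = 8) -> List[int]:
    """Waits (seconds) between attempts: step * (attempt + 1), none after the last attempt."""
    return [step * (attempt + 1) for attempt in range(max_retries - 1)]
```

With the defaults this gives waits of 8, 16 and 24 seconds for HTTP-level retries, and 6, 12 and 18 seconds when called with `step=6` for connection errors.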
async def generate_audio(
self,
text: str,
ref_audio_path: str,
ref_text: str,
output_path: str,
language: str = "Chinese"
language: str = "Chinese",
speed: float = 1.0,
) -> str:
"""
Generate speech with voice cloning
@@ -48,63 +115,52 @@ class VoiceCloneService:
"""
# Hold the lock so requests run serially and GPU memory is not exhausted
async with self._lock:
logger.info(f"🎤 Voice Clone: {text[:30]}...")
logger.info(f"🎤 Voice Clone: {text[:30]}... (language={language})")
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
# Read the reference audio
text = text.strip()
if not text:
raise RuntimeError("文本为空,无法生成语音")
with open(ref_audio_path, "rb") as f:
ref_audio_data = f.read()
# Call the Qwen3-TTS service
timeout = httpx.Timeout(300.0) # 5-minute timeout
async with httpx.AsyncClient(timeout=timeout) as client:
try:
response = await client.post(
f"{self.base_url}/generate",
files={"ref_audio": ("ref.wav", ref_audio_data, "audio/wav")},
data={
"text": text,
"ref_text": ref_text,
"language": language
}
)
response.raise_for_status()
# Save the returned audio
with open(output_path, "wb") as f:
f.write(response.content)
logger.info(f"✅ Voice clone saved: {output_path}")
return output_path
except httpx.HTTPStatusError as e:
logger.error(f"Qwen3-TTS API error: {e.response.status_code} - {e.response.text}")
raise RuntimeError(f"声音克隆服务错误: {e.response.text}")
except httpx.RequestError as e:
logger.error(f"Qwen3-TTS connection error: {e}")
raise RuntimeError("无法连接声音克隆服务,请检查服务是否启动")
# CosyVoice segments text internally via text_normalize, so no client-side splitting is needed
audio_bytes = await self._generate_once(
text=text,
ref_audio_data=ref_audio_data,
ref_text=ref_text,
language=language,
speed=speed,
)
with open(output_path, "wb") as f:
f.write(audio_bytes)
logger.info(f"✅ Voice clone saved: {output_path}")
return output_path
async def check_health(self) -> dict:
"""Health check."""
import time
# 5-minute cache
# 30-second cache
now = time.time()
if self._health_cache and (now - self._health_cache_time) < 300:
return self._health_cache
cached = self._health_cache
if cached is not None and (now - self._health_cache_time) < 30:
return cached
try:
async with httpx.AsyncClient(timeout=5.0) as client:
response = await client.get(f"{self.base_url}/health")
response.raise_for_status()
self._health_cache = response.json()
payload = response.json()
self._health_cache = payload
self._health_cache_time = now
return self._health_cache
return payload
except Exception as e:
logger.warning(f"Qwen3-TTS health check failed: {e}")
logger.warning(f"Voice clone health check failed: {e}")
return {
"service": "Qwen3-TTS Voice Clone",
"model": "0.6B-Base",
"service": "CosyVoice 3.0 Voice Clone",
"model": "unknown",
"ready": False,
"gpu_id": 0,
"error": str(e)


@@ -20,24 +20,41 @@ MAX_CHARS_PER_LINE = 12
def split_word_to_chars(word: str, start: float, end: float) -> list:
"""
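Split a word into single characters, with linearly interpolated timestamps
Split a word into single characters, with linearly interpolated timestamps.
Keeps the space before English words (Whisper emits tokens such as " Hello") so English subtitles can be rebuilt correctly.
Args:
word: word text
word: word text (may contain a leading space)
start: word start time
end: word end time
Returns:
a list of single characters, each with word/start/end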
"""
# Keep the leading space (English Whisper output commonly looks like " Hello")
leading_space = ""
if word and not word[0].strip():
leading_space = " "
word = word.lstrip()
tokens = []
ascii_buffer = ""
pending_space = False # tracks a pending space (used for spacing between English words)
for char in word:
if not char.strip():
# Whitespace: flush ascii_buffer and mark that the next token needs a leading space
if ascii_buffer:
tokens.append(ascii_buffer)
ascii_buffer = ""
if tokens: # only mark when a token already exists (avoids a duplicate leading space)
pending_space = True
continue
if char.isascii() and char.isalnum():
if pending_space and not ascii_buffer:
ascii_buffer = " " # prepend the space to the new English word
pending_space = False
ascii_buffer += char
continue
@@ -45,7 +62,9 @@ def split_word_to_chars(word: str, start: float, end: float) -> list:
tokens.append(ascii_buffer)
ascii_buffer = ""
tokens.append(char)
prefix = " " if pending_space else ""
pending_space = False
tokens.append(prefix + char)
if ascii_buffer:
tokens.append(ascii_buffer)
@@ -54,7 +73,8 @@ def split_word_to_chars(word: str, start: float, end: float) -> list:
return []
if len(tokens) == 1:
return [{"word": tokens[0], "start": start, "end": end}]
w = leading_space + tokens[0] if leading_space else tokens[0]
return [{"word": w, "start": start, "end": end}]
# Linearly interpolate the timestamps
duration = end - start
@@ -64,8 +84,11 @@ def split_word_to_chars(word: str, start: float, end: float) -> list:
for i, token in enumerate(tokens):
token_start = start + i * token_duration
token_end = start + (i + 1) * token_duration
w = token
if i == 0 and leading_space:
w = leading_space + w
result.append({
"word": token,
"word": w,
"start": round(token_start, 3),
"end": round(token_end, 3)
})
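
The timestamp interpolation at the end of `split_word_to_chars` can be sketched on its own. This is a simplified sketch: the tokenization step is skipped, `interpolate_tokens` is a hypothetical name, and dividing the word duration evenly by the token count is an assumption, since that line falls outside the hunks shown.

```python
from typing import Dict, List, Union

Token = Dict[str, Union[str, float]]


def interpolate_tokens(
    tokens: List[str], start: float, end: float, leading_space: str = ""
) -> List[Token]:
    """Spread tokens evenly over [start, end]; the first token keeps the leading space."""
    if not tokens:
        return []
    if len(tokens) == 1:
        return [{"word": leading_space + tokens[0], "start": start, "end": end}]
    token_duration = (end - start) / len(tokens)  # assumption: equal share per token
    result: List[Token] = []
    for i, token in enumerate(tokens):
        word = leading_space + token if i == 0 else token
        result.append({
            "word": word,
            "start": round(start + i * token_duration, 3),
            "end": round(start + (i + 1) * token_duration, 3),
        })
    return result
```

Each token's end equals the next token's start, so the highlighted character advances without gaps inside a word.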
@@ -108,7 +131,7 @@ def split_segment_to_lines(words: List[dict], max_chars: int = MAX_CHARS_PER_LIN
if should_break and current_words:
segments.append({
"text": current_text,
"text": current_text.strip(),
"start": current_words[0]["start"],
"end": current_words[-1]["end"],
"words": current_words.copy()
@@ -119,7 +142,7 @@ def split_segment_to_lines(words: List[dict], max_chars: int = MAX_CHARS_PER_LIN
# Handle the remaining characters
if current_words:
segments.append({
"text": current_text,
"text": current_text.strip(),
"start": current_words[0]["start"],
"end": current_words[-1]["end"],
"words": current_words.copy()
@@ -162,7 +185,9 @@ class WhisperService:
self,
audio_path: str,
text: str,
output_path: Optional[str] = None
output_path: Optional[str] = None,
language: str = "zh",
original_text: Optional[str] = None,
) -> dict:
"""
Transcribe the audio and produce character-level timestamps
@@ -171,12 +196,18 @@ class WhisperService:
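audio_path: path to the audio file
text: source text (for reference only; the Whisper transcription is what is actually used)
output_path: optional path for the output JSON file
language: language code (zh/en, etc.)
original_text: original script. When non-empty, Whisper is used only to detect the overall time range,
and the subtitle text is replaced with this script (fixes language mismatches)
Returns:
a dict containing character-level timestamps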
"""
import asyncio
# English and other Western scripts need a larger per-line character limit
max_chars = 40 if language != "zh" else MAX_CHARS_PER_LINE
def _do_transcribe():
model = self._load_model()
@@ -185,22 +216,26 @@ class WhisperService:
# Transcribe and collect character-level timestamps
segments_iter, info = model.transcribe(
audio_path,
language="zh",
language=language,
word_timestamps=True, # enable word-level timestamps
vad_filter=True, # enable VAD to filter out silence
)
logger.info(f"Detected language: {info.language} (prob: {info.language_probability:.2f})")
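# Collect the Whisper transcription (always needed, to obtain the time range)
all_segments = []
whisper_first_start = None
whisper_last_end = None
for segment in segments_iter:
# Extract each word's timestamps and split the words into single characters
all_words = []
if segment.words:
for word_info in segment.words:
word_text = word_info.word.strip()
if word_text:
# Split the word into single characters, with linearly interpolated timestamps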
word_text = word_info.word
if word_text.strip():
if whisper_first_start is None:
whisper_first_start = word_info.start
whisper_last_end = word_info.end
chars = split_word_to_chars(
word_text,
word_info.start,
@@ -208,11 +243,72 @@ class WhisperService:
)
all_words.extend(chars)
# Split long segments into multiple lines by punctuation and character count
if all_words:
line_segments = split_segment_to_lines(all_words, MAX_CHARS_PER_LINE)
line_segments = split_segment_to_lines(all_words, max_chars)
all_segments.extend(line_segments)
# If original_text is provided, replace the Whisper text with the original script while keeping the speech rhythm
if original_text and original_text.strip() and whisper_first_start is not None:
# Collect Whisper's per-character timestamps (preserves the real speech rhythm)
whisper_chars = []
for seg in all_segments:
whisper_chars.extend(seg.get("words", []))
# Build new timestamps from the original characters plus the Whisper rhythm
orig_chars = split_word_to_chars(
original_text.strip(),
whisper_first_start,
whisper_last_end
)
if orig_chars and len(whisper_chars) >= 2:
# Map the original characters proportionally onto Whisper's timing rhythm
n_w = len(whisper_chars)
n_o = len(orig_chars)
w_starts = [c["start"] for c in whisper_chars]
w_final_end = whisper_chars[-1]["end"]
logger.info(
f"Using original_text for subtitles (len={len(original_text)}), "
f"rhythm-mapping {n_o} orig chars onto {n_w} Whisper chars, "
f"time range: {whisper_first_start:.2f}-{whisper_last_end:.2f}s"
)
remapped = []
for i, oc in enumerate(orig_chars):
# Position of the i-th original character on the Whisper timeline
pos = (i / n_o) * n_w
idx = min(int(pos), n_w - 1)
frac = pos - idx
t_start = (
w_starts[idx] + frac * (w_starts[idx + 1] - w_starts[idx])
if idx < n_w - 1
else w_starts[idx] + frac * (w_final_end - w_starts[idx])
)
# End time = start time of the next character
pos_next = ((i + 1) / n_o) * n_w
idx_n = min(int(pos_next), n_w - 1)
frac_n = pos_next - idx_n
t_end = (
w_starts[idx_n] + frac_n * (w_starts[idx_n + 1] - w_starts[idx_n])
if idx_n < n_w - 1
else w_starts[idx_n] + frac_n * (w_final_end - w_starts[idx_n])
)
remapped.append({
"word": oc["word"],
"start": round(t_start, 3),
"end": round(t_end, 3),
})
all_segments = split_segment_to_lines(remapped, max_chars)
logger.info(f"Rebuilt {len(all_segments)} subtitle segments (rhythm-mapped)")
elif orig_chars:
# Too few Whisper characters, fall back to linear interpolation
all_segments = split_segment_to_lines(orig_chars, max_chars)
logger.info(f"Rebuilt {len(all_segments)} subtitle segments (linear fallback)")
logger.info(f"Generated {len(all_segments)} subtitle segments")
return {"segments": all_segments}
@@ -230,12 +326,13 @@ class WhisperService:
return result
async def transcribe(self, audio_path: str) -> str:
async def transcribe(self, audio_path: str, language: str | None = None) -> str:
"""
Transcribe text only (used to extract the script)
Args:
audio_path: path to the audio/video file
language: language code; None means auto-detect
Returns:
plain text content
@@ -249,7 +346,7 @@ class WhisperService:
# Transcribe (no word-level timestamps needed)
segments_iter, _ = model.transcribe(
audio_path,
language="zh",
language=language,
word_timestamps=False,
vad_filter=True,
)
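
The rhythm mapping used when `original_text` is supplied places each original token at a fractional position on the Whisper character timeline and interpolates between neighbouring start times. A standalone sketch of that mapping follows; `remap_to_rhythm` and `_time_at` are hypothetical names, and the inputs are plain lists rather than the service's real objects.

```python
from typing import Dict, List, Union

Token = Dict[str, Union[str, float]]


def _time_at(pos: float, w_starts: List[float], w_final_end: float) -> float:
    """Interpolate a timestamp at fractional index `pos` on the Whisper timeline."""
    n_w = len(w_starts)
    idx = min(int(pos), n_w - 1)
    frac = pos - idx
    # The last character interpolates toward the final end time instead of a next start.
    nxt = w_starts[idx + 1] if idx < n_w - 1 else w_final_end
    return w_starts[idx] + frac * (nxt - w_starts[idx])


def remap_to_rhythm(orig_tokens: List[str], whisper_chars: List[Token]) -> List[Token]:
    """Give each original token a time slot that follows the Whisper speech rhythm."""
    if not orig_tokens or len(whisper_chars) < 2:
        raise ValueError("need at least one token and two Whisper characters")
    n_w, n_o = len(whisper_chars), len(orig_tokens)
    w_starts = [float(c["start"]) for c in whisper_chars]
    w_final_end = float(whisper_chars[-1]["end"])
    remapped: List[Token] = []
    for i, token in enumerate(orig_tokens):
        t_start = _time_at((i / n_o) * n_w, w_starts, w_final_end)
        # A token ends where the next one starts.
        t_end = _time_at(((i + 1) / n_o) * n_w, w_starts, w_final_end)
        remapped.append({"word": token, "start": round(t_start, 3), "end": round(t_end, 3)})
    return remapped
```

Because positions are proportional, pauses detected by Whisper stretch the tokens that fall on them, which a purely linear split over the whole range would not do.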


@@ -54,5 +54,61 @@
"letter_spacing": 1,
"bottom_margin": 72,
"is_default": false
},
{
"id": "subtitle_pink",
"label": "少女粉",
"font_file": "DingTalk JinBuTi.ttf",
"font_family": "DingTalkJinBuTi",
"font_size": 56,
"highlight_color": "#FF69B4",
"normal_color": "#FFFFFF",
"stroke_color": "#1A0010",
"stroke_size": 3,
"letter_spacing": 2,
"bottom_margin": 80,
"is_default": false
},
{
"id": "subtitle_lime",
"label": "清新绿",
"font_file": "DingTalk Sans.ttf",
"font_family": "DingTalkSans",
"font_size": 50,
"highlight_color": "#76FF03",
"normal_color": "#FFFFFF",
"stroke_color": "#001A00",
"stroke_size": 3,
"letter_spacing": 1,
"bottom_margin": 78,
"is_default": false
},
{
"id": "subtitle_gold",
"label": "金色隶书",
"font_file": "阿里妈妈刀隶体.ttf",
"font_family": "AliMamaDaoLiTi",
"font_size": 56,
"highlight_color": "#FDE68A",
"normal_color": "#E8D5B0",
"stroke_color": "#2B1B00",
"stroke_size": 3,
"letter_spacing": 3,
"bottom_margin": 80,
"is_default": false
},
{
"id": "subtitle_kai",
"label": "楷体红字",
"font_file": "simkai.ttf",
"font_family": "SimKai",
"font_size": 54,
"highlight_color": "#FF4444",
"normal_color": "#FFFFFF",
"stroke_color": "#000000",
"stroke_size": 3,
"letter_spacing": 2,
"bottom_margin": 80,
"is_default": false
}
]
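
Preset lists like the one above are easy to break with a duplicated `id` or a mistyped color. A small validation sketch, assuming the field names shown in this file; the function name and the choice of checks are illustrative and not part of the project.

```python
import re
from typing import List

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")
COLOR_KEYS = ("highlight_color", "normal_color", "stroke_color")


def validate_subtitle_styles(styles: List[dict]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the presets look sane."""
    problems: List[str] = []
    seen = set()
    for style in styles:
        sid = style.get("id")
        if not sid:
            problems.append("entry without id")
            continue
        if sid in seen:
            problems.append(f"duplicate id: {sid}")
        seen.add(sid)
        for key in COLOR_KEYS:
            value = str(style.get(key, ""))
            if not HEX_COLOR.match(value):
                problems.append(f"{sid}: invalid {key} {value!r}")
    return problems
```

Running such a check when the JSON is loaded would surface a bad preset at startup instead of at render time.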


@@ -7,7 +7,7 @@
"font_size": 90,
"color": "#FFFFFF",
"stroke_color": "#000000",
"stroke_size": 8,
"stroke_size": 5,
"letter_spacing": 5,
"top_margin": 62,
"font_weight": 900,
@@ -21,7 +21,7 @@
"font_size": 72,
"color": "#FFFFFF",
"stroke_color": "#000000",
"stroke_size": 8,
"stroke_size": 5,
"letter_spacing": 4,
"top_margin": 60,
"font_weight": 900,
@@ -35,7 +35,7 @@
"font_size": 70,
"color": "#FDE68A",
"stroke_color": "#2B1B00",
"stroke_size": 8,
"stroke_size": 5,
"letter_spacing": 3,
"top_margin": 58,
"font_weight": 800,
@@ -49,10 +49,122 @@
"font_size": 72,
"color": "#FFFFFF",
"stroke_color": "#1F0A00",
"stroke_size": 8,
"stroke_size": 5,
"letter_spacing": 4,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_pangmen",
"label": "庞门正道",
"font_file": "title/庞门正道标题体3.0.ttf",
"font_family": "PangMenZhengDao",
"font_size": 80,
"color": "#FFFFFF",
"stroke_color": "#000000",
"stroke_size": 5,
"letter_spacing": 5,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_round",
"label": "优设标题圆",
"font_file": "title/优设标题圆.otf",
"font_family": "YouSheBiaoTiYuan",
"font_size": 78,
"color": "#FFFFFF",
"stroke_color": "#4A1A6B",
"stroke_size": 5,
"letter_spacing": 4,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_alibaba",
"label": "阿里数黑体",
"font_file": "title/阿里巴巴数黑体.ttf",
"font_family": "AlibabaShuHeiTi",
"font_size": 72,
"color": "#FFFFFF",
"stroke_color": "#000000",
"stroke_size": 4,
"letter_spacing": 3,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_chaohei",
"label": "文道潮黑",
"font_file": "title/文道潮黑.ttf",
"font_family": "WenDaoChaoHei",
"font_size": 76,
"color": "#00E5FF",
"stroke_color": "#001A33",
"stroke_size": 5,
"letter_spacing": 4,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_wujie",
"label": "无界黑",
"font_file": "title/标小智无界黑.otf",
"font_family": "BiaoXiaoZhiWuJieHei",
"font_size": 74,
"color": "#FFFFFF",
"stroke_color": "#1A1A1A",
"stroke_size": 4,
"letter_spacing": 3,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_houdi",
"label": "厚底黑",
"font_file": "title/Aa厚底黑.ttf",
"font_family": "AaHouDiHei",
"font_size": 76,
"color": "#FF6B6B",
"stroke_color": "#1A0000",
"stroke_size": 5,
"letter_spacing": 4,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_banyuan",
"label": "寒蝉半圆体",
"font_file": "title/寒蝉半圆体.otf",
"font_family": "HanChanBanYuan",
"font_size": 78,
"color": "#FFFFFF",
"stroke_color": "#000000",
"stroke_size": 5,
"letter_spacing": 4,
"top_margin": 60,
"font_weight": 900,
"is_default": false
},
{
"id": "title_jixiang",
"label": "欣意吉祥宋",
"font_file": "title/字体圈欣意吉祥宋.ttf",
"font_family": "XinYiJiXiangSong",
"font_size": 70,
"color": "#FDE68A",
"stroke_color": "#2B1B00",
"stroke_size": 5,
"letter_spacing": 3,
"top_margin": 58,
"font_weight": 800,
"is_default": false
}
]

View File

@@ -71,3 +71,18 @@ CREATE TRIGGER users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_updated_at();
-- 8. Orders table (Alipay payments)
CREATE TABLE IF NOT EXISTS orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id) ON DELETE CASCADE,
out_trade_no TEXT UNIQUE NOT NULL,
amount DECIMAL(10, 2) NOT NULL DEFAULT 999.00,
status TEXT DEFAULT 'pending' CHECK (status IN ('pending', 'paid', 'failed')),
trade_no TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
paid_at TIMESTAMP WITH TIME ZONE
);
CREATE INDEX IF NOT EXISTS idx_orders_user_id ON orders(user_id);
CREATE INDEX IF NOT EXISTS idx_orders_out_trade_no ON orders(out_trade_no);
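
The status column above only admits 'pending', 'paid' and 'failed'. As a rough illustration of how a payment notification might move a row from 'pending' to 'paid' without double-processing repeated notifications, here is a minimal in-memory sketch. The `Order` class and the `mark_paid` helper are hypothetical names introduced for illustration; they are not taken from the backend code, which is not shown in this diff.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional

# Mirrors the CHECK constraint on orders.status.
VALID_STATUSES = {"pending", "paid", "failed"}


@dataclass
class Order:
    """Hypothetical in-memory stand-in for one row of the orders table."""
    out_trade_no: str
    amount: Decimal = Decimal("999.00")  # same default as the column
    status: str = "pending"
    trade_no: Optional[str] = None
    paid_at: Optional[datetime] = None


def mark_paid(order: Order, trade_no: str, now: Optional[datetime] = None) -> bool:
    """Move a pending order to 'paid'; return True only on the first transition."""
    if order.status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {order.status}")
    if order.status != "pending":
        # Already paid (duplicate notification) or failed: leave the row untouched.
        return False
    order.status = "paid"
    order.trade_no = trade_no
    order.paid_at = now or datetime.now(timezone.utc)
    return True
```

Returning False for a repeated notification keeps the operation idempotent, which matters if the payment gateway retries its callback. How a 'failed' order should be treated is a policy choice; this sketch simply leaves it unchanged.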

31
backend/package-lock.json generated Normal file
View File

@@ -0,0 +1,31 @@
{
"name": "backend",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"dependencies": {
"qrcode.react": "^4.2.0"
}
},
"node_modules/qrcode.react": {
"version": "4.2.0",
"resolved": "https://registry.npmjs.org/qrcode.react/-/qrcode.react-4.2.0.tgz",
"integrity": "sha512-QpgqWi8rD9DsS9EP3z7BT+5lY5SFhsqGjpgW5DY/i3mK4M9DTBNz3ErMi8BWYEfI3L0d8GIbGmcdFAS1uIRGjA==",
"license": "ISC",
"peerDependencies": {
"react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0"
}
},
"node_modules/react": {
"version": "19.2.4",
"resolved": "https://registry.npmjs.org/react/-/react-19.2.4.tgz",
"integrity": "sha512-9nfp2hYpCwOjAN+8TZFGhtWEwgvWHXqESH8qT89AT/lWklpLON22Lc8pEtnpsZz7VmawabSU0gCjnj8aC0euHQ==",
"license": "MIT",
"peer": true,
"engines": {
"node": ">=0.10.0"
}
}
}
}

5
backend/package.json Normal file
View File

@@ -0,0 +1,5 @@
{
"dependencies": {
"qrcode.react": "^4.2.0"
}
}

View File

@@ -29,6 +29,9 @@ python-jose[cryptography]>=3.3.0
passlib[bcrypt]>=1.7.4
bcrypt==4.0.1
# Alipay payments
python-alipay-sdk>=3.6.0
# Subtitle alignment
faster-whisper>=1.0.0

View File

@@ -20,62 +20,81 @@ logger = logging.getLogger("Watchdog")
# Service configuration
SERVICES = [
{
"name": "vigent2-qwen-tts",
"url": "http://localhost:8009/health",
"name": "vigent2-cosyvoice",
"url": "http://localhost:8010/health",
"failures": 0,
"threshold": 3,
"threshold": 3, # restart only after 3 consecutive failures (3 × 15 s ≈ 45 s of tolerance)
"timeout": 10.0,
"restart_cmd": ["pm2", "restart", "vigent2-qwen-tts"]
"restart_cmd": ["pm2", "restart", "vigent2-cosyvoice"],
"cooldown_until": 0, # timestamp until which checks are skipped after a restart
"cooldown_sec": 45, # wait 45 s after a restart before checking again
}
]
async def check_service(service):
"""Check the health of a single service."""
# Skip checks during the cooldown period
now = time.time()
if now < service.get("cooldown_until", 0):
remaining = int(service["cooldown_until"] - now)
logger.debug(f"⏳ 服务 {service['name']} 冷却中,剩余 {remaining}s")
return True
try:
timeout = service.get("timeout", 10.0)
async with httpx.AsyncClient(timeout=timeout) as client:
response = await client.get(service["url"])
if response.status_code == 200:
# Success
if service["failures"] > 0:
logger.info(f"✅ 服务 {service['name']} 已恢复正常")
service["failures"] = 0
return True
ready = True
try:
payload = response.json()
ready = bool(payload.get("ready", True))
except Exception:
payload = {}
if ready:
if service["failures"] > 0:
logger.info(f"✅ 服务 {service['name']} 已恢复正常")
service["failures"] = 0
return True
logger.warning(f"⚠️ 服务 {service['name']} ready=false健康检查未通过: {payload}")
else:
logger.warning(f"⚠️ 服务 {service['name']} 返回状态码 {response.status_code}")
except Exception as e:
logger.warning(f"⚠️ 无法连接服务 {service['name']}: {str(e)}")
# Failure handling
service["failures"] += 1
logger.warning(f"❌ 服务 {service['name']} 连续失败 {service['failures']}/{service['threshold']}")
if service["failures"] >= service['threshold']:
logger.error(f"🚨 服务 {service['name']} 已达到失败阈值,正在重启...")
try:
subprocess.run(service["restart_cmd"], check=True)
logger.info(f"♻️ 服务 {service['name']} 重启命令已发送")
# After a restart, allow a grace period (e.g. 60 s) without checks while the service starts
service["failures"] = 0 # reset the counter
return "restarting"
service["failures"] = 0
# Set a cooldown so the service can finish starting up and loading its models
service["cooldown_until"] = time.time() + service.get("cooldown_sec", 120)
return "restarting"
except Exception as restart_error:
logger.error(f"💥 重启服务 {service['name']} 失败: {restart_error}")
return False
async def main():
logger.info("🛡️ ViGent2 服务看门狗 (Watchdog) 已启动")
# Give every service an initial cooldown at startup so it is not marked as failed before it is up
for service in SERVICES:
service["cooldown_until"] = time.time() + 60
while True:
# Check every service in turn (each check is awaited sequentially)
for service in SERVICES:
result = await check_service(service)
if result == "restarting":
# If a service was restarted, wait extra to cover its startup time
pass
# Check every 30 seconds
await asyncio.sleep(30)
await check_service(service)
# Check every 15 seconds
await asyncio.sleep(15)
if __name__ == "__main__":
try:
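
The failure counter and the post-restart cooldown in check_service can be read as a small state machine. The sketch below isolates that bookkeeping in a pure function with an injectable clock so it can be exercised without HTTP calls or pm2. The function name `record_result` and its return labels are hypothetical and introduced only for illustration; the dictionary keys and the 120-second fallback match the ones used in the diff.

```python
import time
from typing import Optional


def record_result(service: dict, healthy: bool, now: Optional[float] = None) -> str:
    """Update failure/cooldown state for one check.

    Returns 'cooldown', 'ok', 'failing' or 'restart' (hypothetical labels).
    """
    now = time.time() if now is None else now

    # Inside the cooldown window the check is skipped entirely.
    if now < service.get("cooldown_until", 0):
        return "cooldown"

    if healthy:
        service["failures"] = 0
        return "ok"

    service["failures"] += 1
    if service["failures"] >= service["threshold"]:
        # Threshold reached: the caller would issue the restart command here.
        service["failures"] = 0
        service["cooldown_until"] = now + service.get("cooldown_sec", 120)
        return "restart"
    return "failing"
```

With threshold 3 and a 15-second loop, a service has to fail three consecutive checks before 'restart' is returned, and no further failures are counted until the cooldown expires.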

View File

@@ -8,14 +8,19 @@
"name": "frontend",
"version": "0.1.0",
"dependencies": {
"@dnd-kit/core": "^6.3.1",
"@dnd-kit/sortable": "^10.0.0",
"@dnd-kit/utilities": "^3.2.2",
"@supabase/supabase-js": "^2.93.1",
"axios": "^1.13.4",
"lucide-react": "^0.563.0",
"next": "16.1.1",
"qrcode.react": "^4.2.0",
"react": "19.2.3",
"react-dom": "19.2.3",
"sonner": "^2.0.7",
"swr": "^2.3.8"
"swr": "^2.3.8",
"wavesurfer.js": "^7.12.1"
},
"devDependencies": {
"@tailwindcss/postcss": "^4",
@@ -281,6 +286,59 @@
"node": ">=6.9.0"
}
},
"node_modules/@dnd-kit/accessibility": {
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/@dnd-kit/accessibility/-/accessibility-3.1.1.tgz",
"integrity": "sha512-2P+YgaXF+gRsIihwwY1gCsQSYnu9Zyj2py8kY5fFvUM1qm2WA2u639R6YNVfU4GWr+ZM5mqEsfHZZLoRONbemw==",
"license": "MIT",
"dependencies": {
"tslib": "^2.0.0"
},
"peerDependencies": {
"react": ">=16.8.0"
}
},
"node_modules/@dnd-kit/core": {
"version": "6.3.1",
"resolved": "https://registry.npmjs.org/@dnd-kit/core/-/core-6.3.1.tgz",
"integrity": "sha512-xkGBRQQab4RLwgXxoqETICr6S5JlogafbhNsidmrkVv2YRs5MLwpjoF2qpiGjQt8S9AoxtIV603s0GIUpY5eYQ==",
"license": "MIT",
"dependencies": {
"@dnd-kit/accessibility": "^3.1.1",
"@dnd-kit/utilities": "^3.2.2",
"tslib": "^2.0.0"
},
"peerDependencies": {
"react": ">=16.8.0",
"react-dom": ">=16.8.0"
}
},
"node_modules/@dnd-kit/sortable": {
"version": "10.0.0",
"resolved": "https://registry.npmjs.org/@dnd-kit/sortable/-/sortable-10.0.0.tgz",
"integrity": "sha512-+xqhmIIzvAYMGfBYYnbKuNicfSsk4RksY2XdmJhT+HAC01nix6fHCztU68jooFiMUB01Ky3F0FyOvhG/BZrWkg==",
"license": "MIT",
"dependencies": {
"@dnd-kit/utilities": "^3.2.2",
"tslib": "^2.0.0"
},
"peerDependencies": {
"@dnd-kit/core": "^6.3.0",
"react": ">=16.8.0"
}
},
"node_modules/@dnd-kit/utilities": {
"version": "3.2.2",
"resolved": "https://registry.npmjs.org/@dnd-kit/utilities/-/utilities-3.2.2.tgz",
"integrity": "sha512-+MKAJEOfaBe5SmV6t34p80MMKhjvUz0vRrvVJbPT0WElzaOJ/1xs+D+KDv+tD/NE5ujfrChEcshd4fLn0wpiqg==",
"license": "MIT",
"dependencies": {
"tslib": "^2.0.0"
},
"peerDependencies": {
"react": ">=16.8.0"
}
},
"node_modules/@emnapi/core": {
"version": "1.8.1",
"resolved": "https://registry.npmjs.org/@emnapi/core/-/core-1.8.1.tgz",
@@ -5561,6 +5619,15 @@
"node": ">=6"
}
},
"node_modules/qrcode.react": {
"version": "4.2.0",
"resolved": "https://registry.npmjs.org/qrcode.react/-/qrcode.react-4.2.0.tgz",
"integrity": "sha512-QpgqWi8rD9DsS9EP3z7BT+5lY5SFhsqGjpgW5DY/i3mK4M9DTBNz3ErMi8BWYEfI3L0d8GIbGmcdFAS1uIRGjA==",
"license": "ISC",
"peerDependencies": {
"react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0"
}
},
"node_modules/queue-microtask": {
"version": "1.2.3",
"resolved": "https://registry.npmjs.org/queue-microtask/-/queue-microtask-1.2.3.tgz",
@@ -6611,6 +6678,12 @@
"react": "^16.8.0 || ^17.0.0 || ^18.0.0 || ^19.0.0"
}
},
"node_modules/wavesurfer.js": {
"version": "7.12.1",
"resolved": "https://registry.npmjs.org/wavesurfer.js/-/wavesurfer.js-7.12.1.tgz",
"integrity": "sha512-NswPjVHxk0Q1F/VMRemCPUzSojjuHHisQrBqQiRXg7MVbe3f5vQ6r0rTTXA/a/neC/4hnOEC4YpXca4LpH0SUg==",
"license": "BSD-3-Clause"
},
"node_modules/which": {
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz",

View File

@@ -9,14 +9,19 @@
"lint": "eslint"
},
"dependencies": {
"@dnd-kit/core": "^6.3.1",
"@dnd-kit/sortable": "^10.0.0",
"@dnd-kit/utilities": "^3.2.2",
"@supabase/supabase-js": "^2.93.1",
"axios": "^1.13.4",
"lucide-react": "^0.563.0",
"next": "16.1.1",
"qrcode.react": "^4.2.0",
"react": "19.2.3",
"react-dom": "19.2.3",
"sonner": "^2.0.7",
"swr": "^2.3.8"
"swr": "^2.3.8",
"wavesurfer.js": "^7.12.1"
},
"devDependencies": {
"@tailwindcss/postcss": "^4",

View File

@@ -1,8 +1,8 @@
import type { Metadata, Viewport } from "next";
import { Geist, Geist_Mono } from "next/font/google";
import "./globals.css";
import { AuthProvider } from "@/contexts/AuthContext";
import { TaskProvider } from "@/contexts/TaskContext";
import { AuthProvider } from "@/shared/contexts/AuthContext";
import { TaskProvider } from "@/shared/contexts/TaskContext";
import { Toaster } from "sonner";
@@ -46,7 +46,6 @@ export default function RootLayout({
<Toaster
position="top-center"
richColors
closeButton
toastOptions={{
duration: 3000,
className: "text-sm",

View File

@@ -3,9 +3,11 @@
import { useState } from 'react';
import { useRouter } from 'next/navigation';
import { login } from "@/shared/lib/auth";
import { useAuth } from "@/shared/contexts/AuthContext";
export default function LoginPage() {
const router = useRouter();
const { setUser } = useAuth();
const [phone, setPhone] = useState('');
const [password, setPassword] = useState('');
const [error, setError] = useState('');
@@ -25,7 +27,11 @@ export default function LoginPage() {
try {
const result = await login(phone, password);
if (result.success) {
if (result.paymentToken) {
sessionStorage.setItem('payment_token', result.paymentToken);
router.push('/pay');
} else if (result.success) {
if (result.user) setUser(result.user);
router.push('/');
} else {
setError(result.message || '登录失败');

View File

@@ -0,0 +1,160 @@
'use client';
import { Suspense, useState, useEffect, useRef } from 'react';
import { useRouter, useSearchParams } from 'next/navigation';
import api from '@/shared/api/axios';
type PageStatus = 'loading' | 'redirecting' | 'checking' | 'success' | 'error';
function PayContent() {
const router = useRouter();
const searchParams = useSearchParams();
const [status, setStatus] = useState<PageStatus>('loading');
const [errorMsg, setErrorMsg] = useState('');
const pollRef = useRef<ReturnType<typeof setInterval> | null>(null);
useEffect(() => {
const outTradeNo = searchParams.get('out_trade_no');
if (outTradeNo) {
setStatus('checking');
startPolling(outTradeNo);
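// Return the cleanup here as well: the one at the end of the effect is never reached on this path
return () => {
if (pollRef.current) clearInterval(pollRef.current);
};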
}
const token = sessionStorage.getItem('payment_token');
if (!token) {
router.replace('/login');
return;
}
createOrder(token);
return () => {
if (pollRef.current) clearInterval(pollRef.current);
};
}, []);
const createOrder = async (token: string) => {
try {
const { data } = await api.post('/api/payment/create-order', { payment_token: token });
const { pay_url } = data.data;
setStatus('redirecting');
window.location.href = pay_url;
} catch (err: any) {
setStatus('error');
setErrorMsg(err.response?.data?.message || '创建订单失败,请重新登录');
}
};
const startPolling = (tradeNo: string) => {
checkStatus(tradeNo);
pollRef.current = setInterval(() => checkStatus(tradeNo), 3000);
};
const checkStatus = async (tradeNo: string) => {
try {
const { data } = await api.get(`/api/payment/status/${tradeNo}`);
if (data.data.status === 'paid') {
if (pollRef.current) clearInterval(pollRef.current);
setStatus('success');
sessionStorage.removeItem('payment_token');
setTimeout(() => router.replace('/login'), 3000);
}
} catch {
// ignore polling errors
}
};
return (
<div className="w-full max-w-md p-8 bg-white/10 backdrop-blur-lg rounded-2xl shadow-2xl border border-white/20">
{(status === 'loading' || status === 'redirecting') && (
<div className="text-center">
<div className="mb-6">
<svg className="animate-spin h-12 w-12 mx-auto text-purple-400" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4"></circle>
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
</div>
<p className="text-gray-300">
{status === 'loading' ? '正在创建订单...' : '正在跳转到支付宝...'}
</p>
</div>
)}
{status === 'checking' && (
<div className="text-center">
<h1 className="text-2xl font-bold text-white mb-6"></h1>
<div className="flex items-center justify-center gap-2 text-purple-300 mb-4">
<svg className="animate-spin h-5 w-5" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4"></circle>
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
...
</div>
<p className="text-gray-400 text-sm"></p>
</div>
)}
{status === 'success' && (
<div className="text-center">
<div className="mb-6">
<svg className="w-16 h-16 mx-auto text-green-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
</div>
<h2 className="text-2xl font-bold text-white mb-4"></h2>
<p className="text-gray-300 mb-2">...</p>
<p className="text-gray-500 text-sm">使</p>
</div>
)}
{status === 'error' && (
<div className="text-center">
<div className="mb-6">
<svg className="w-16 h-16 mx-auto text-red-400" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 8v4m0 4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
</div>
<h2 className="text-2xl font-bold text-white mb-4"></h2>
<p className="text-red-300 mb-6">{errorMsg}</p>
<button
onClick={() => router.replace('/login')}
className="py-3 px-6 bg-gradient-to-r from-purple-600 to-pink-600 text-white font-semibold rounded-lg"
>
</button>
</div>
)}
{status === 'checking' && (
<div className="mt-6 text-center">
<button
onClick={() => {
if (pollRef.current) clearInterval(pollRef.current);
router.replace('/login');
}}
className="text-purple-300 hover:text-purple-200 text-sm"
>
</button>
</div>
)}
</div>
);
}
export default function PayPage() {
return (
<div className="min-h-dvh flex items-center justify-center">
<Suspense fallback={
<div className="w-full max-w-md p-8 bg-white/10 backdrop-blur-lg rounded-2xl shadow-2xl border border-white/20 text-center">
<svg className="animate-spin h-12 w-12 mx-auto text-purple-400" xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 24 24">
<circle className="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" strokeWidth="4"></circle>
<path className="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4zm2 5.291A7.962 7.962 0 014 12H0c0 3.042 1.135 5.824 3 7.938l3-2.647z"></path>
</svg>
</div>
}>
<PayContent />
</Suspense>
</div>
);
}
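
The page above checks the order status once immediately and then every 3 seconds, ignoring request errors, until the status is 'paid'. The same loop can be sketched in a language-neutral way as follows. This is an illustration in Python, not the page's code: `poll_until_paid`, `fetch_status` and the optional `max_attempts` bound are hypothetical, and the page itself polls without an upper bound.

```python
import time
from typing import Callable, Optional


def poll_until_paid(
    fetch_status: Callable[[], str],
    interval_sec: float = 3.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll fetch_status until it returns 'paid'.

    Returns True once the order is paid, or False if max_attempts is
    exhausted first. Errors raised by fetch_status are swallowed, in the
    same spirit as the page's checkStatus.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            if fetch_status() == "paid":
                return True
        except Exception:
            pass  # transient error: keep polling
        sleep(interval_sec)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real delays. A bound such as `max_attempts` is one way to avoid polling forever when a payment is abandoned.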

View File

@@ -61,7 +61,7 @@ export default function RegisterPage() {
</div>
<h2 className="text-2xl font-bold text-white mb-4"></h2>
<p className="text-gray-300 mb-6">
</p>
<a
href="/login"

View File

@@ -1,7 +1,7 @@
"use client";
import { useState, useEffect, useRef } from "react";
import { useAuth } from "@/contexts/AuthContext";
import { useAuth } from "@/shared/contexts/AuthContext";
import api from "@/shared/api/axios";
import { ApiResponse } from "@/shared/api/types";
@@ -106,6 +106,10 @@ export default function AccountSettingsDropdown() {
{/* Dropdown menu */}
{isOpen && (
<div className="absolute right-0 mt-2 bg-gray-800 border border-white/10 rounded-lg shadow-xl z-[160] overflow-hidden whitespace-nowrap">
{/* Account name */}
<div className="px-3 py-2 border-b border-white/10 text-center">
<div className="text-sm text-white font-medium">{user?.phone ? `${user.phone.slice(0, 3)}****${user.phone.slice(-4)}` : '未知账户'}</div>
</div>
{/* Validity period display */}
<div className="px-3 py-2 border-b border-white/10 text-center">
<div className="text-xs text-gray-400"></div>
@@ -188,6 +192,7 @@ export default function AccountSettingsDropdown() {
onClick={() => {
setShowPasswordModal(false);
setError('');
setSuccess('');
setOldPassword('');
setNewPassword('');
setConfirmPassword('');

View File

@@ -1,6 +1,6 @@
"use client";
import { useTask } from "@/contexts/TaskContext";
import { useTask } from "@/shared/contexts/TaskContext";
import Link from "next/link";
import { usePathname } from "next/navigation";

View File

@@ -0,0 +1,193 @@
import { useCallback, useEffect, useRef, useState } from "react";
import api from "@/shared/api/axios";
import { ApiResponse, unwrap } from "@/shared/api/types";
import { toast } from "sonner";
export interface GeneratedAudio {
id: string;
name: string;
path: string;
duration_sec: number;
text: string;
tts_mode: string;
language: string;
created_at: number;
}
interface AudioTask {
status: string;
progress?: number;
message?: string;
output?: GeneratedAudio & { audio_id: string };
}
interface UseGeneratedAudiosOptions {
selectedAudioId: string | null;
setSelectedAudioId: React.Dispatch<React.SetStateAction<string | null>>;
}
export const useGeneratedAudios = ({
selectedAudioId,
setSelectedAudioId,
}: UseGeneratedAudiosOptions) => {
const [generatedAudios, setGeneratedAudios] = useState<GeneratedAudio[]>([]);
const [selectedAudio, setSelectedAudio] = useState<GeneratedAudio | null>(null);
const [isGeneratingAudio, setIsGeneratingAudio] = useState(false);
const [audioTaskId, setAudioTaskId] = useState<string | null>(null);
const [audioTask, setAudioTask] = useState<AudioTask | null>(null);
const pollRef = useRef<NodeJS.Timeout | null>(null);
const fetchGeneratedAudios = useCallback(async (selectId?: string) => {
try {
const { data: res } = await api.get<ApiResponse<{ items: GeneratedAudio[] }>>(
"/api/generated-audios"
);
const payload = unwrap(res);
const items: GeneratedAudio[] = payload.items || [];
setGeneratedAudios(items);
if (selectId && items.length > 0) {
if (selectId === "__latest__") {
setSelectedAudioId(items[0].id);
setSelectedAudio(items[0]);
} else {
const found = items.find((a) => a.id === selectId);
if (found) {
setSelectedAudioId(found.id);
setSelectedAudio(found);
}
}
}
} catch (error) {
console.error("获取配音列表失败:", error);
}
}, [setSelectedAudioId]);
// Sync selectedAudio when selectedAudioId changes externally (e.g. from persistence)
useEffect(() => {
if (!selectedAudioId || generatedAudios.length === 0) return;
const found = generatedAudios.find((a) => a.id === selectedAudioId);
if (found) {
setSelectedAudio(found);
}
}, [selectedAudioId, generatedAudios]);
const stopPolling = useCallback(() => {
if (pollRef.current) {
clearInterval(pollRef.current);
pollRef.current = null;
}
}, []);
const startPolling = useCallback((taskId: string) => {
stopPolling();
pollRef.current = setInterval(async () => {
try {
const { data: res } = await api.get<ApiResponse<AudioTask>>(
`/api/generated-audios/tasks/${taskId}`
);
const task = unwrap(res);
setAudioTask(task);
if (task.status === "completed") {
stopPolling();
setIsGeneratingAudio(false);
setAudioTaskId(null);
// Refresh list and select the new audio
await fetchGeneratedAudios("__latest__");
toast.success(task.message || "配音生成完成");
} else if (task.status === "failed") {
stopPolling();
setIsGeneratingAudio(false);
setAudioTaskId(null);
toast.error(task.message || "配音生成失败");
} else if (task.status === "not_found") {
stopPolling();
setIsGeneratingAudio(false);
setAudioTaskId(null);
setAudioTask(null);
toast.error("任务已丢失(服务可能已重启),请重新生成");
}
} catch {
// Network error, keep polling
}
}, 1000);
}, [stopPolling, fetchGeneratedAudios]);
// Cleanup on unmount
useEffect(() => {
return () => stopPolling();
}, [stopPolling]);
const generateAudio = useCallback(async (params: {
text: string;
tts_mode: string;
voice?: string;
ref_audio_id?: string;
ref_text?: string;
language: string;
speed?: number;
}) => {
setIsGeneratingAudio(true);
setAudioTask({ status: "pending", progress: 0, message: "正在提交..." });
try {
const { data: res } = await api.post<ApiResponse<{ task_id: string }>>(
"/api/generated-audios/generate",
params
);
const { task_id } = unwrap(res);
setAudioTaskId(task_id);
startPolling(task_id);
} catch (err: unknown) {
setIsGeneratingAudio(false);
setAudioTask(null);
const axiosErr = err as { response?: { data?: { message?: string } }; message?: string };
const errorMsg = axiosErr.response?.data?.message || axiosErr.message || String(err);
toast.error(`配音生成失败: ${errorMsg}`);
}
}, [startPolling]);
const deleteAudio = useCallback(async (audioId: string) => {
if (!confirm("确定要删除这个配音吗?")) return;
try {
await api.delete(`/api/generated-audios/${encodeURIComponent(audioId)}`);
if (selectedAudioId === audioId) {
setSelectedAudioId(null);
setSelectedAudio(null);
}
fetchGeneratedAudios();
} catch (error) {
toast.error("删除失败: " + error);
}
}, [fetchGeneratedAudios, selectedAudioId, setSelectedAudioId]);
const renameAudio = useCallback(async (audioId: string, newName: string) => {
try {
await api.put(`/api/generated-audios/${encodeURIComponent(audioId)}`, {
new_name: newName,
});
fetchGeneratedAudios();
} catch (err: unknown) {
toast.error("重命名失败: " + String(err));
}
}, [fetchGeneratedAudios]);
const selectAudio = useCallback((audio: GeneratedAudio) => {
setSelectedAudioId(audio.id);
setSelectedAudio(audio);
}, [setSelectedAudioId]);
return {
generatedAudios,
selectedAudio,
selectedAudioId,
isGeneratingAudio,
audioTask,
fetchGeneratedAudios,
generateAudio,
deleteAudio,
renameAudio,
selectAudio,
};
};

View File

@@ -12,7 +12,7 @@ interface GeneratedVideo {
}
interface UseGeneratedVideosOptions {
storageKey: string;
selectedVideoId: string | null;
setSelectedVideoId: React.Dispatch<React.SetStateAction<string | null>>;
setGeneratedVideo: React.Dispatch<React.SetStateAction<string | null>>;
@@ -20,7 +20,7 @@ interface UseGeneratedVideosOptions {
}
export const useGeneratedVideos = ({
storageKey,
selectedVideoId,
setSelectedVideoId,
setGeneratedVideo,
@@ -45,6 +45,8 @@ export const useGeneratedVideos = ({
if (preferVideoId === "__latest__") {
setSelectedVideoId(videos[0].id);
setGeneratedVideo(resolveMediaUrl(videos[0].path));
// Write a cross-page marker so the other page can also pick up the latest generated video
localStorage.setItem(`vigent_${storageKey}_latestGeneratedVideoId`, videos[0].id);
} else {
const found = videos.find(v => v.id === preferVideoId);
if (found) {

View File

@@ -1,4 +1,4 @@
import { useEffect, useRef, useState } from "react";
import { useEffect, useMemo, useRef, useState } from "react";
import api from "@/shared/api/axios";
import {
buildTextShadow,
@@ -9,35 +9,89 @@ import {
resolveBgmUrl,
resolveMediaUrl,
} from "@/shared/lib/media";
import { clampTitle } from "@/shared/lib/title";
import { clampTitle, clampSecondaryTitle, SECONDARY_TITLE_MAX_LENGTH } from "@/shared/lib/title";
import { useTitleInput } from "@/shared/hooks/useTitleInput";
import { useAuth } from "@/contexts/AuthContext";
import { useTask } from "@/contexts/TaskContext";
import { useAuth } from "@/shared/contexts/AuthContext";
import { useTask } from "@/shared/contexts/TaskContext";
import { toast } from "sonner";
import { usePublishPrefetch } from "@/shared/hooks/usePublishPrefetch";
import { PublishAccount } from "@/shared/types/publish";
import { useBgm } from "@/features/home/model/useBgm";
import { useGeneratedVideos } from "@/features/home/model/useGeneratedVideos";
import { useGeneratedAudios } from "@/features/home/model/useGeneratedAudios";
import { useHomePersistence } from "@/features/home/model/useHomePersistence";
import { useMaterials } from "@/features/home/model/useMaterials";
import { useMediaPlayers } from "@/features/home/model/useMediaPlayers";
import { useRefAudios } from "@/features/home/model/useRefAudios";
import { useTitleSubtitleStyles } from "@/features/home/model/useTitleSubtitleStyles";
import { useTimelineEditor } from "@/features/home/model/useTimelineEditor";
import { useSavedScripts } from "@/features/home/model/useSavedScripts";
import { useVideoFrameCapture } from "@/features/home/model/useVideoFrameCapture";
import { ApiResponse, unwrap } from "@/shared/api/types";
const VOICES = [
{ id: "zh-CN-YunxiNeural", name: "云溪 (男声-年轻)" },
{ id: "zh-CN-YunjianNeural", name: "云健 (男声-新闻)" },
{ id: "zh-CN-YunyangNeural", name: "云扬 (男声-专业)" },
{ id: "zh-CN-XiaoxiaoNeural", name: "晓晓 (女声-活泼)" },
{ id: "zh-CN-XiaoyiNeural", name: "晓伊 (女声-温柔)" },
];
const VOICES: Record<string, { id: string; name: string }[]> = {
"zh-CN": [
{ id: "zh-CN-YunxiNeural", name: "云溪 (男声-年轻)" },
{ id: "zh-CN-YunjianNeural", name: "云健 (男声-新闻)" },
{ id: "zh-CN-YunyangNeural", name: "云扬 (男声-专业)" },
{ id: "zh-CN-XiaoxiaoNeural", name: "晓晓 (女声-活泼)" },
{ id: "zh-CN-XiaoyiNeural", name: "晓伊 (女声-温柔)" },
],
"en-US": [
{ id: "en-US-GuyNeural", name: "Guy (Male)" },
{ id: "en-US-JennyNeural", name: "Jenny (Female)" },
],
"ja-JP": [
{ id: "ja-JP-KeitaNeural", name: "圭太 (男声)" },
{ id: "ja-JP-NanamiNeural", name: "七海 (女声)" },
],
"ko-KR": [
{ id: "ko-KR-InJoonNeural", name: "인준 (男声)" },
{ id: "ko-KR-SunHiNeural", name: "선히 (女声)" },
],
"fr-FR": [
{ id: "fr-FR-HenriNeural", name: "Henri (Male)" },
{ id: "fr-FR-DeniseNeural", name: "Denise (Female)" },
],
"de-DE": [
{ id: "de-DE-ConradNeural", name: "Conrad (Male)" },
{ id: "de-DE-KatjaNeural", name: "Katja (Female)" },
],
"es-ES": [
{ id: "es-ES-AlvaroNeural", name: "Álvaro (Male)" },
{ id: "es-ES-ElviraNeural", name: "Elvira (Female)" },
],
"ru-RU": [
{ id: "ru-RU-DmitryNeural", name: "Дмитрий (Male)" },
{ id: "ru-RU-SvetlanaNeural", name: "Светлана (Female)" },
],
"it-IT": [
{ id: "it-IT-DiegoNeural", name: "Diego (Male)" },
{ id: "it-IT-ElsaNeural", name: "Elsa (Female)" },
],
"pt-BR": [
{ id: "pt-BR-AntonioNeural", name: "Antonio (Male)" },
{ id: "pt-BR-FranciscaNeural", name: "Francisca (Female)" },
],
};
const LANG_TO_LOCALE: Record<string, string> = {
"中文": "zh-CN",
"English": "en-US",
"日本語": "ja-JP",
"한국어": "ko-KR",
"Français": "fr-FR",
"Deutsch": "de-DE",
"Español": "es-ES",
"Русский": "ru-RU",
"Italiano": "it-IT",
"Português": "pt-BR",
};
const DEFAULT_SHORT_TITLE_DURATION = 4;
const FIXED_REF_TEXT =
"其实生活中有许多美好的瞬间,比如清晨的阳光,或者一杯温热的清茶。希望这次生成的音色能够自然、流畅,完美还原出我最真实的声音状态。";
const scrollContainerToItem = (container: HTMLDivElement, item: HTMLDivElement) => {
const containerRect = container.getBoundingClientRect();
const itemRect = item.getBoundingClientRect();
@@ -70,22 +124,17 @@ interface RefAudio {
created_at: number;
}
interface Material {
id: string;
name: string;
path: string;
size_mb: number;
scene?: string;
}
import type { Material } from "@/shared/types/material";
export const useHomeController = () => {
const apiBase = getApiBaseUrl();
const [selectedMaterial, setSelectedMaterial] = useState<string>("");
const [selectedMaterials, setSelectedMaterials] = useState<string[]>([]);
const [previewMaterial, setPreviewMaterial] = useState<string | null>(null);
const [text, setText] = useState<string>("");
const [voice, setVoice] = useState<string>("zh-CN-YunxiNeural");
const [textLang, setTextLang] = useState<string>("zh-CN");
// Use the global task state
const { currentTask, isGenerating, startTask } = useTask();
@@ -96,16 +145,26 @@ export const useHomeController = () => {
// Subtitle and title state
const [videoTitle, setVideoTitle] = useState<string>("");
const [enableSubtitles, setEnableSubtitles] = useState<boolean>(true);
const [selectedSubtitleStyleId, setSelectedSubtitleStyleId] = useState<string>("");
const [selectedTitleStyleId, setSelectedTitleStyleId] = useState<string>("");
const [subtitleFontSize, setSubtitleFontSize] = useState<number>(60);
const [titleFontSize, setTitleFontSize] = useState<number>(90);
const [subtitleFontSize, setSubtitleFontSize] = useState<number>(80);
const [titleFontSize, setTitleFontSize] = useState<number>(120);
const [subtitleSizeLocked, setSubtitleSizeLocked] = useState<boolean>(false);
const [titleSizeLocked, setTitleSizeLocked] = useState<boolean>(false);
const [titleTopMargin, setTitleTopMargin] = useState<number>(62);
const [titleDisplayMode, setTitleDisplayMode] = useState<"short" | "persistent">("short");
const [subtitleBottomMargin, setSubtitleBottomMargin] = useState<number>(80);
const [outputAspectRatio, setOutputAspectRatio] = useState<"9:16" | "16:9">("9:16");
const [showStylePreview, setShowStylePreview] = useState<boolean>(false);
const [materialDimensions, setMaterialDimensions] = useState<{ width: number; height: number } | null>(null);
const [previewContainerWidth, setPreviewContainerWidth] = useState<number>(0);
// Secondary title state
const [videoSecondaryTitle, setVideoSecondaryTitle] = useState<string>("");
const [selectedSecondaryTitleStyleId, setSelectedSecondaryTitleStyleId] = useState<string>("");
const [secondaryTitleFontSize, setSecondaryTitleFontSize] = useState<number>(48);
const [secondaryTitleTopMargin, setSecondaryTitleTopMargin] = useState<number>(12);
const [secondaryTitleSizeLocked, setSecondaryTitleSizeLocked] = useState<boolean>(false);
// Background music state
const [selectedBgmId, setSelectedBgmId] = useState<string>("");
@@ -115,7 +174,17 @@ export const useHomeController = () => {
// Voice cloning state
const [ttsMode, setTtsMode] = useState<"edgetts" | "voiceclone">("edgetts");
const [selectedRefAudio, setSelectedRefAudio] = useState<RefAudio | null>(null);
const [refText, setRefText] = useState(FIXED_REF_TEXT);
const [refText, setRefText] = useState("");
// ID of the selected pre-generated voice-over
const [selectedAudioId, setSelectedAudioId] = useState<string | null>(null);
// Speech rate control
const [speed, setSpeed] = useState<number>(1.0);
// ClipTrimmer modal state
const [clipTrimmerOpen, setClipTrimmerOpen] = useState(false);
const [clipTrimmerSegmentId, setClipTrimmerSegmentId] = useState<string | null>(null);
// Audio preview and rename state
const [editingAudioId, setEditingAudioId] = useState<string | null>(null);
@@ -124,7 +193,7 @@ export const useHomeController = () => {
const [editMaterialName, setEditMaterialName] = useState("");
const bgmItemRefs = useRef<Record<string, HTMLDivElement | null>>({});
const bgmListContainerRef = useRef<HTMLDivElement | null>(null);
const titlePreviewContainerRef = useRef<HTMLDivElement | null>(null);
const materialItemRefs = useRef<Record<string, HTMLDivElement | null>>({});
const videoItemRefs = useRef<Record<string, HTMLDivElement | null>>({});
@@ -179,8 +248,8 @@ export const useHomeController = () => {
{ new_name: editMaterialName.trim() }
);
const payload = unwrap(res);
if (selectedMaterial === materialId && payload?.id) {
setSelectedMaterial(payload.id);
if (selectedMaterials.includes(materialId) && payload?.id) {
setSelectedMaterials((prev) => prev.map((x) => (x === materialId ? payload.id : x)));
}
setEditingMaterialId(null);
setEditMaterialName("");
@@ -195,6 +264,10 @@ export const useHomeController = () => {
// AI-generated title and tags
const [isGeneratingMeta, setIsGeneratingMeta] = useState(false);
// AI multilingual translation
const [isTranslating, setIsTranslating] = useState(false);
const [originalText, setOriginalText] = useState<string | null>(null);
// In-browser recording
const [isRecording, setIsRecording] = useState(false);
const [recordedBlob, setRecordedBlob] = useState<Blob | null>(null);
@@ -208,6 +281,9 @@ export const useHomeController = () => {
// 文案提取模态框
const [extractModalOpen, setExtractModalOpen] = useState(false);
// AI rewrite modal
const [rewriteModalOpen, setRewriteModalOpen] = useState(false);
// Storage key prefix (userId for logged-in users, "guest" otherwise)
const storageKey = userId || "guest";
@@ -224,11 +300,12 @@ export const useHomeController = () => {
uploadError,
setUploadError,
fetchMaterials,
toggleMaterial,
deleteMaterial,
handleUpload,
} = useMaterials({
selectedMaterial,
setSelectedMaterial,
selectedMaterials,
setSelectedMaterials,
});
const {
@@ -251,8 +328,9 @@ export const useHomeController = () => {
fetchRefAudios,
uploadRefAudio,
deleteRefAudio,
retranscribeRefAudio,
retranscribingId,
} = useRefAudios({
fixedRefText: FIXED_REF_TEXT,
selectedRefAudio,
setSelectedRefAudio,
setRefText,
@@ -287,13 +365,52 @@ export const useHomeController = () => {
fetchGeneratedVideos,
deleteVideo,
} = useGeneratedVideos({
storageKey,
selectedVideoId,
setSelectedVideoId,
setGeneratedVideo,
resolveMediaUrl,
});
const {
generatedAudios,
selectedAudio,
isGeneratingAudio,
audioTask,
fetchGeneratedAudios,
generateAudio,
deleteAudio,
renameAudio,
selectAudio,
} = useGeneratedAudios({
selectedAudioId,
setSelectedAudioId,
});
const {
segments: timelineSegments,
reorderSegments,
setSourceRange,
toCustomAssignments,
} = useTimelineEditor({
audioDuration: selectedAudio?.duration_sec ?? 0,
materials,
selectedMaterials,
storageKey,
});
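// Video URL of the first timeline segment's material, used for the frame-capture preview
// Use the first timeline segment when one exists; otherwise (e.g. no voiceover selected) fall back to selectedMaterials[0]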
const firstTimelineMaterialUrl = useMemo(() => {
const firstSeg = timelineSegments[0];
const matId = firstSeg?.materialId ?? selectedMaterials[0];
if (!matId) return null;
const mat = materials.find((m) => m.id === matId);
return mat?.path ? resolveMediaUrl(mat.path) : null;
}, [materials, timelineSegments, selectedMaterials]);
const materialPosterUrl = useVideoFrameCapture(showStylePreview ? firstTimelineMaterialUrl : null);
useEffect(() => {
if (isAuthLoading || !userId) return;
let active = true;
@@ -336,24 +453,41 @@ export const useHomeController = () => {
setText,
videoTitle,
setVideoTitle,
enableSubtitles,
setEnableSubtitles,
videoSecondaryTitle,
setVideoSecondaryTitle,
ttsMode,
setTtsMode,
voice,
setVoice,
selectedMaterial,
setSelectedMaterial,
textLang,
setTextLang,
selectedMaterials,
setSelectedMaterials,
selectedSubtitleStyleId,
setSelectedSubtitleStyleId,
selectedTitleStyleId,
setSelectedTitleStyleId,
selectedSecondaryTitleStyleId,
setSelectedSecondaryTitleStyleId,
subtitleFontSize,
setSubtitleFontSize,
titleFontSize,
setTitleFontSize,
secondaryTitleFontSize,
setSecondaryTitleFontSize,
setSubtitleSizeLocked,
setTitleSizeLocked,
setSecondaryTitleSizeLocked,
titleTopMargin,
setTitleTopMargin,
secondaryTitleTopMargin,
setSecondaryTitleTopMargin,
titleDisplayMode,
setTitleDisplayMode,
subtitleBottomMargin,
setSubtitleBottomMargin,
outputAspectRatio,
setOutputAspectRatio,
selectedBgmId,
setSelectedBgmId,
bgmVolume,
@@ -363,8 +497,20 @@ export const useHomeController = () => {
selectedVideoId,
setSelectedVideoId,
selectedRefAudio,
selectedAudioId,
setSelectedAudioId,
speed,
setSpeed,
});
const { savedScripts, saveScript, deleteScript: deleteSavedScript } = useSavedScripts(storageKey);
const handleSaveScript = () => {
if (!text.trim()) return;
saveScript(text);
toast.success("文案已保存");
};
const syncTitleToPublish = (value: string) => {
if (typeof window !== "undefined") {
localStorage.setItem(`vigent_${storageKey}_publish_title`, value);
@@ -377,6 +523,12 @@ export const useHomeController = () => {
onCommit: syncTitleToPublish,
});
const secondaryTitleInput = useTitleInput({
value: videoSecondaryTitle,
onChange: setVideoSecondaryTitle,
maxLength: SECONDARY_TITLE_MAX_LENGTH,
});
// Load the material list and video history
useEffect(() => {
if (isAuthLoading) return;
@@ -384,6 +536,7 @@ export const useHomeController = () => {
fetchMaterials(),
fetchGeneratedVideos(),
fetchRefAudios(),
fetchGeneratedAudios(),
refreshSubtitleStyles(),
refreshTitleStyles(),
fetchBgmList(),
@@ -404,7 +557,8 @@ export const useHomeController = () => {
}, [isGenerating, currentTask, fetchGeneratedVideos]);
useEffect(() => {
const material = materials.find((item) => item.id === selectedMaterial);
const firstSelected = selectedMaterials[0];
const material = materials.find((item) => item.id === firstSelected);
if (!material?.path) {
setMaterialDimensions(null);
return;
@@ -417,7 +571,6 @@ export const useHomeController = () => {
let isActive = true;
const video = document.createElement("video");
video.crossOrigin = "anonymous";
video.preload = "metadata";
video.src = url;
video.load();
@@ -444,27 +597,8 @@ export const useHomeController = () => {
video.removeEventListener("loadedmetadata", handleLoaded);
video.removeEventListener("error", handleError);
};
}, [materials, selectedMaterial]);
}, [materials, selectedMaterials]);
useEffect(() => {
if (!showStylePreview) return;
const container = titlePreviewContainerRef.current;
if (!container) return;
setPreviewContainerWidth(container.getBoundingClientRect().width);
const resizeObserver = new ResizeObserver((entries) => {
for (const entry of entries) {
setPreviewContainerWidth(entry.contentRect.width);
}
});
resizeObserver.observe(container);
return () => {
resizeObserver.disconnect();
};
}, [showStylePreview]);
useEffect(() => {
if (subtitleSizeLocked || subtitleStyles.length === 0) return;
@@ -486,11 +620,32 @@ export const useHomeController = () => {
}
}, [titleStyles, selectedTitleStyleId, titleSizeLocked]);
useEffect(() => {
if (secondaryTitleSizeLocked || titleStyles.length === 0) return;
const active = titleStyles.find((s) => s.id === selectedSecondaryTitleStyleId)
|| titleStyles.find((s) => s.is_default)
|| titleStyles[0];
if (active?.font_size) {
setSecondaryTitleFontSize(active.font_size);
}
}, [titleStyles, selectedSecondaryTitleStyleId, secondaryTitleSizeLocked]);
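// Removed the duplicate BGM persistence-restore logic (now consolidated in useHomePersistence)
// useEffect(() => { ... })
// Time gate: disable all list auto-scroll effects during the first second after page load
// Prevents persistence restore plus async data loading from triggering scrollIntoView and making the page jump on mobile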
const scrollEffectsEnabled = useRef(false);
useEffect(() => {
if (!selectedBgmId) return;
const timer = setTimeout(() => {
scrollEffectsEnabled.current = true;
}, 1000);
return () => clearTimeout(timer);
}, []);
// Scroll the BGM list
useEffect(() => {
if (!selectedBgmId || !scrollEffectsEnabled.current) return;
const container = bgmListContainerRef.current;
const target = bgmItemRefs.current[selectedBgmId];
if (container && target) {
@@ -498,13 +653,16 @@ export const useHomeController = () => {
}
}, [selectedBgmId, bgmList]);
// Scroll the material list
useEffect(() => {
if (!selectedMaterial) return;
const target = materialItemRefs.current[selectedMaterial];
const firstSelected = selectedMaterials[0];
if (!firstSelected || !scrollEffectsEnabled.current) return;
const target = materialItemRefs.current[firstSelected];
if (target) {
target.scrollIntoView({ block: "nearest", behavior: "smooth" });
}
}, [selectedMaterial, materials]);
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [selectedMaterials.length]);
// [Fix] Default selection for the video history
// Once persistence has been restored and the list has loaded, select the first video if none is selected
@@ -514,7 +672,7 @@ export const useHomeController = () => {
setSelectedVideoId(firstId);
setGeneratedVideo(resolveMediaUrl(generatedVideos[0].path));
}
}, [isRestored, generatedVideos, selectedVideoId, setSelectedVideoId, setGeneratedVideo, resolveMediaUrl]);
}, [isRestored, generatedVideos, selectedVideoId, setSelectedVideoId, setGeneratedVideo]);
// [Fix] Default BGM selection
useEffect(() => {
@@ -523,8 +681,9 @@ export const useHomeController = () => {
}
}, [isRestored, bgmList, selectedBgmId, enableBgm, setSelectedBgmId]);
// Scroll the video list
useEffect(() => {
if (!selectedVideoId) return;
if (!selectedVideoId || !scrollEffectsEnabled.current) return;
const target = videoItemRefs.current[selectedVideoId];
if (target) {
target.scrollIntoView({ block: "nearest", behavior: "smooth" });
@@ -630,7 +789,7 @@ export const useHomeController = () => {
setIsGeneratingMeta(true);
try {
const { data: res } = await api.post<ApiResponse<{ title?: string; tags?: string[] }>>(
const { data: res } = await api.post<ApiResponse<{ title?: string; secondary_title?: string; tags?: string[] }>>(
"/api/ai/generate-meta",
{ text: text.trim() }
);
@@ -640,6 +799,10 @@ export const useHomeController = () => {
const nextTitle = clampTitle(payload.title || "");
titleInput.commitValue(nextTitle);
// Update the secondary title
const nextSecondaryTitle = clampSecondaryTitle(payload.secondary_title || "");
secondaryTitleInput.commitValue(nextSecondaryTitle);
// Sync to the publish page via localStorage
localStorage.setItem(`vigent_${storageKey}_publish_tags`, JSON.stringify(payload.tags || []));
} catch (err: unknown) {
@@ -652,19 +815,88 @@ export const useHomeController = () => {
}
};
// AI multilingual translation
const handleTranslate = async (targetLang: string) => {
if (!text.trim()) {
toast.error("请先输入口播文案");
return;
}
// Save the original text on the first translation
if (originalText === null) {
setOriginalText(text);
}
setIsTranslating(true);
try {
const { data: res } = await api.post<ApiResponse<{ translated_text: string }>>(
"/api/ai/translate",
{ text: text.trim(), target_lang: targetLang }
);
const payload = unwrap(res);
setText(payload.translated_text || "");
// Update textLang to match the target language and switch the voice automatically
const locale = LANG_TO_LOCALE[targetLang] || "zh-CN";
setTextLang(locale);
if (ttsMode === "edgetts") {
const langVoices = VOICES[locale] || VOICES["zh-CN"];
setVoice(langVoices[0].id);
}
} catch (err: unknown) {
console.error("AI translate failed:", err);
const axiosErr = err as { response?: { data?: { message?: string } }; message?: string };
const errorMsg = axiosErr.response?.data?.message || axiosErr.message || String(err);
toast.error(`AI 翻译失败: ${errorMsg}`);
} finally {
setIsTranslating(false);
}
};
const handleRestoreOriginal = () => {
if (originalText !== null) {
setText(originalText);
setOriginalText(null);
setTextLang("zh-CN");
if (ttsMode === "edgetts") {
setVoice(VOICES["zh-CN"][0].id);
}
}
};
// Generate the voiceover
const handleGenerateAudio = async () => {
if (!text.trim()) {
toast.error("请先输入文案");
return;
}
if (ttsMode === "voiceclone" && !selectedRefAudio) {
toast.error("请选择参考音频");
return;
}
const params = {
text: text.trim(),
tts_mode: ttsMode,
voice: ttsMode === "edgetts" ? voice : undefined,
ref_audio_id: ttsMode === "voiceclone" ? selectedRefAudio!.id : undefined,
ref_text: ttsMode === "voiceclone" ? refText : undefined,
language: textLang,
speed: ttsMode === "voiceclone" ? speed : undefined,
};
await generateAudio(params);
};
// Generate the video
const handleGenerate = async () => {
if (!selectedMaterial || !text.trim()) {
if (selectedMaterials.length === 0 || !text.trim()) {
toast.error("请先选择素材并填写文案");
return;
}
// Validation for voice-clone mode
if (ttsMode === "voiceclone") {
if (!selectedRefAudio) {
toast.error("请选择或上传参考音频");
return;
}
if (!selectedAudio) {
toast.error("请先生成并选中配音");
return;
}
if (enableBgm && !selectedBgmId) {
@@ -676,26 +908,81 @@ export const useHomeController = () => {
try {
// Look up the selected material object to get its path
const materialObj = materials.find((m) => m.id === selectedMaterial);
if (!materialObj) {
const firstMaterialObj = materials.find((m) => m.id === selectedMaterials[0]);
if (!firstMaterialObj) {
toast.error("素材数据异常");
return;
}
// Build the request payload
// Build the request payload, using the pre-generated voiceover
const payload: Record<string, unknown> = {
material_path: materialObj.path,
text: text,
tts_mode: ttsMode,
material_path: firstMaterialObj.path,
text: selectedAudio.text || text,
generated_audio_id: selectedAudio.id,
language: selectedAudio.language || textLang,
title: videoTitle.trim() || undefined,
enable_subtitles: enableSubtitles,
enable_subtitles: true,
output_aspect_ratio: outputAspectRatio,
};
if (enableSubtitles && selectedSubtitleStyleId) {
// Multiple materials
if (selectedMaterials.length > 1) {
const timelineOrderedIds = timelineSegments
.map((seg) => seg.materialId)
.filter((id, index, arr) => arr.indexOf(id) === index);
const orderedMaterialIds = [
...timelineOrderedIds.filter((id) => selectedMaterials.includes(id)),
...selectedMaterials.filter((id) => !timelineOrderedIds.includes(id)),
];
const materialPaths = orderedMaterialIds
.map((id) => materials.find((x) => x.id === id)?.path)
.filter((path): path is string => !!path);
if (materialPaths.length === 0) {
toast.error("多素材解析失败,请刷新素材后重试");
return;
}
payload.material_paths = materialPaths;
payload.material_path = materialPaths[0];
// Send the custom timeline assignments
const assignments = toCustomAssignments();
if (assignments.length > 0) {
const assignmentPaths = assignments
.map((a) => a.material_path)
.filter((path): path is string => !!path);
if (assignmentPaths.length === assignments.length) {
// The visible timeline segments are authoritative: materials that fall outside the timeline are excluded from this generation
payload.material_paths = assignmentPaths;
payload.material_path = assignmentPaths[0];
}
payload.custom_assignments = assignments;
} else {
console.warn(
"[Timeline] custom_assignments 为空,回退后端自动分配",
{ materials: materialPaths.length }
);
}
}
// Single material with a trimmed source range
const singleSeg = timelineSegments[0];
if (
selectedMaterials.length === 1
&& singleSeg
&& (singleSeg.sourceStart > 0 || singleSeg.sourceEnd > 0)
) {
payload.custom_assignments = toCustomAssignments();
}
if (selectedSubtitleStyleId) {
payload.subtitle_style_id = selectedSubtitleStyleId;
}
if (enableSubtitles && subtitleFontSize) {
if (subtitleFontSize) {
payload.subtitle_font_size = Math.round(subtitleFontSize);
}
@@ -707,18 +994,35 @@ export const useHomeController = () => {
payload.title_font_size = Math.round(titleFontSize);
}
if (videoTitle.trim() || videoSecondaryTitle.trim()) {
payload.title_display_mode = titleDisplayMode;
if (titleDisplayMode === "short") {
payload.title_duration = DEFAULT_SHORT_TITLE_DURATION;
}
}
if (videoTitle.trim()) {
payload.title_top_margin = Math.round(titleTopMargin);
}
if (videoSecondaryTitle.trim()) {
payload.secondary_title = videoSecondaryTitle.trim();
if (selectedSecondaryTitleStyleId) {
payload.secondary_title_style_id = selectedSecondaryTitleStyleId;
}
if (secondaryTitleFontSize) {
payload.secondary_title_font_size = Math.round(secondaryTitleFontSize);
}
payload.secondary_title_top_margin = Math.round(secondaryTitleTopMargin);
}
payload.subtitle_bottom_margin = Math.round(subtitleBottomMargin);
if (enableBgm && selectedBgmId) {
payload.bgm_id = selectedBgmId;
payload.bgm_volume = bgmVolume;
}
if (ttsMode === "edgetts") {
payload.voice = voice;
} else {
payload.ref_audio_id = selectedRefAudio!.id;
payload.ref_text = refText;
}
// Create the generation task
const { data: res } = await api.post<ApiResponse<{ task_id: string }>>(
"/api/videos/generate",
@@ -779,8 +1083,8 @@ export const useHomeController = () => {
fetchMaterials,
deleteMaterial,
handleUpload,
selectedMaterial,
setSelectedMaterial,
selectedMaterials,
toggleMaterial,
handlePreviewMaterial,
editingMaterialId,
editMaterialName,
@@ -792,8 +1096,17 @@ export const useHomeController = () => {
setText,
extractModalOpen,
setExtractModalOpen,
rewriteModalOpen,
setRewriteModalOpen,
handleGenerateMeta,
isGeneratingMeta,
handleTranslate,
isTranslating,
originalText,
handleRestoreOriginal,
savedScripts,
handleSaveScript,
deleteSavedScript,
showStylePreview,
setShowStylePreview,
videoTitle,
@@ -804,25 +1117,40 @@ export const useHomeController = () => {
titleFontSize,
setTitleFontSize,
setTitleSizeLocked,
videoSecondaryTitle,
secondaryTitleInput,
selectedSecondaryTitleStyleId,
setSelectedSecondaryTitleStyleId,
secondaryTitleFontSize,
setSecondaryTitleFontSize,
setSecondaryTitleSizeLocked,
secondaryTitleTopMargin,
setSecondaryTitleTopMargin,
subtitleStyles,
selectedSubtitleStyleId,
setSelectedSubtitleStyleId,
subtitleFontSize,
setSubtitleFontSize,
setSubtitleSizeLocked,
enableSubtitles,
setEnableSubtitles,
titleTopMargin,
setTitleTopMargin,
titleDisplayMode,
setTitleDisplayMode,
subtitleBottomMargin,
setSubtitleBottomMargin,
outputAspectRatio,
setOutputAspectRatio,
resolveAssetUrl,
getFontFormat,
buildTextShadow,
previewContainerWidth,
materialDimensions,
titlePreviewContainerRef,
materialPosterUrl,
ttsMode,
setTtsMode,
voices: VOICES,
voices: VOICES[textLang] || VOICES["zh-CN"],
voice,
setVoice,
textLang,
refAudios,
selectedRefAudio,
handleSelectRefAudio,
@@ -840,6 +1168,8 @@ export const useHomeController = () => {
saveEditing,
cancelEditing,
deleteRefAudio,
retranscribeRefAudio,
retranscribingId,
recordedBlob,
isRecording,
recordingTime,
@@ -847,7 +1177,6 @@ export const useHomeController = () => {
stopRecording,
useRecording,
formatRecordingTime,
fixedRefText: FIXED_REF_TEXT,
bgmList,
bgmLoading,
bgmError,
@@ -873,5 +1202,24 @@ export const useHomeController = () => {
fetchGeneratedVideos,
registerVideoRef,
formatDate,
generatedAudios,
selectedAudio,
selectedAudioId,
isGeneratingAudio,
audioTask,
fetchGeneratedAudios,
handleGenerateAudio,
deleteAudio,
renameAudio,
selectAudio,
speed,
setSpeed,
timelineSegments,
reorderSegments,
setSourceRange,
clipTrimmerOpen,
setClipTrimmerOpen,
clipTrimmerSegmentId,
setClipTrimmerSegmentId,
};
};
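
The multi-material branch of `handleGenerate` above orders material IDs by their first appearance on the timeline, then appends any selected materials that have no timeline segment. The following is a minimal sketch of that ordering as a pure function; the name `orderMaterialIds` and its signature are hypothetical and do not appear in the diff.

```typescript
// Hypothetical helper, for illustration only (not part of the diff).
function orderMaterialIds(timelineIds: string[], selected: string[]): string[] {
  // Drop repeated ids, keeping the first occurrence so timeline order is preserved.
  const uniqueTimelineIds = timelineIds.filter(
    (id, index, arr) => arr.indexOf(id) === index
  );
  return [
    // Timeline-ordered ids that are still selected come first.
    ...uniqueTimelineIds.filter((id) => selected.includes(id)),
    // Selected ids that never appear on the timeline keep their selection order.
    ...selected.filter((id) => !uniqueTimelineIds.includes(id)),
  ];
}

// "b" appears twice on the timeline and "x" is no longer selected.
const ordered = orderMaterialIds(["b", "a", "b", "x"], ["a", "b", "c"]);
console.log(ordered); // ["b", "a", "c"]
```

Keeping this step free of React state would make it easy to unit-test separately from the payload construction.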


@@ -1,5 +1,5 @@
import { useEffect, useState } from "react";
import { clampTitle } from "@/shared/lib/title";
import { clampTitle, clampSecondaryTitle } from "@/shared/lib/title";
interface RefAudio {
id: string;
@@ -17,24 +17,41 @@ interface UseHomePersistenceOptions {
setText: React.Dispatch<React.SetStateAction<string>>;
videoTitle: string;
setVideoTitle: React.Dispatch<React.SetStateAction<string>>;
enableSubtitles: boolean;
setEnableSubtitles: React.Dispatch<React.SetStateAction<boolean>>;
videoSecondaryTitle: string;
setVideoSecondaryTitle: React.Dispatch<React.SetStateAction<string>>;
ttsMode: 'edgetts' | 'voiceclone';
setTtsMode: React.Dispatch<React.SetStateAction<'edgetts' | 'voiceclone'>>;
voice: string;
setVoice: React.Dispatch<React.SetStateAction<string>>;
selectedMaterial: string;
setSelectedMaterial: React.Dispatch<React.SetStateAction<string>>;
textLang: string;
setTextLang: React.Dispatch<React.SetStateAction<string>>;
selectedMaterials: string[];
setSelectedMaterials: React.Dispatch<React.SetStateAction<string[]>>;
selectedSubtitleStyleId: string;
setSelectedSubtitleStyleId: React.Dispatch<React.SetStateAction<string>>;
selectedTitleStyleId: string;
setSelectedTitleStyleId: React.Dispatch<React.SetStateAction<string>>;
selectedSecondaryTitleStyleId: string;
setSelectedSecondaryTitleStyleId: React.Dispatch<React.SetStateAction<string>>;
subtitleFontSize: number;
setSubtitleFontSize: React.Dispatch<React.SetStateAction<number>>;
titleFontSize: number;
setTitleFontSize: React.Dispatch<React.SetStateAction<number>>;
secondaryTitleFontSize: number;
setSecondaryTitleFontSize: React.Dispatch<React.SetStateAction<number>>;
setSubtitleSizeLocked: React.Dispatch<React.SetStateAction<boolean>>;
setTitleSizeLocked: React.Dispatch<React.SetStateAction<boolean>>;
setSecondaryTitleSizeLocked: React.Dispatch<React.SetStateAction<boolean>>;
titleTopMargin: number;
setTitleTopMargin: React.Dispatch<React.SetStateAction<number>>;
secondaryTitleTopMargin: number;
setSecondaryTitleTopMargin: React.Dispatch<React.SetStateAction<number>>;
titleDisplayMode: 'short' | 'persistent';
setTitleDisplayMode: React.Dispatch<React.SetStateAction<'short' | 'persistent'>>;
subtitleBottomMargin: number;
setSubtitleBottomMargin: React.Dispatch<React.SetStateAction<number>>;
outputAspectRatio: '9:16' | '16:9';
setOutputAspectRatio: React.Dispatch<React.SetStateAction<'9:16' | '16:9'>>;
selectedBgmId: string;
setSelectedBgmId: React.Dispatch<React.SetStateAction<string>>;
bgmVolume: number;
@@ -44,6 +61,10 @@ interface UseHomePersistenceOptions {
selectedVideoId: string | null;
setSelectedVideoId: React.Dispatch<React.SetStateAction<string | null>>;
selectedRefAudio: RefAudio | null;
selectedAudioId: string | null;
setSelectedAudioId: React.Dispatch<React.SetStateAction<string | null>>;
speed: number;
setSpeed: React.Dispatch<React.SetStateAction<number>>;
}
export const useHomePersistence = ({
@@ -53,24 +74,41 @@ export const useHomePersistence = ({
setText,
videoTitle,
setVideoTitle,
enableSubtitles,
setEnableSubtitles,
videoSecondaryTitle,
setVideoSecondaryTitle,
ttsMode,
setTtsMode,
voice,
setVoice,
selectedMaterial,
setSelectedMaterial,
textLang,
setTextLang,
selectedMaterials,
setSelectedMaterials,
selectedSubtitleStyleId,
setSelectedSubtitleStyleId,
selectedTitleStyleId,
setSelectedTitleStyleId,
selectedSecondaryTitleStyleId,
setSelectedSecondaryTitleStyleId,
subtitleFontSize,
setSubtitleFontSize,
titleFontSize,
setTitleFontSize,
secondaryTitleFontSize,
setSecondaryTitleFontSize,
setSubtitleSizeLocked,
setTitleSizeLocked,
setSecondaryTitleSizeLocked,
titleTopMargin,
setTitleTopMargin,
secondaryTitleTopMargin,
setSecondaryTitleTopMargin,
titleDisplayMode,
setTitleDisplayMode,
subtitleBottomMargin,
setSubtitleBottomMargin,
outputAspectRatio,
setOutputAspectRatio,
selectedBgmId,
setSelectedBgmId,
bgmVolume,
@@ -80,6 +118,10 @@ export const useHomePersistence = ({
selectedVideoId,
setSelectedVideoId,
selectedRefAudio,
selectedAudioId,
setSelectedAudioId,
speed,
setSpeed,
}: UseHomePersistenceOptions) => {
const [isRestored, setIsRestored] = useState(false);
@@ -88,28 +130,53 @@ export const useHomePersistence = ({
const savedText = localStorage.getItem(`vigent_${storageKey}_text`);
const savedTitle = localStorage.getItem(`vigent_${storageKey}_title`);
const savedSubtitles = localStorage.getItem(`vigent_${storageKey}_subtitles`);
const savedSecondaryTitle = localStorage.getItem(`vigent_${storageKey}_secondaryTitle`);
const savedTtsMode = localStorage.getItem(`vigent_${storageKey}_ttsMode`);
const savedVoice = localStorage.getItem(`vigent_${storageKey}_voice`);
const savedTextLang = localStorage.getItem(`vigent_${storageKey}_textLang`);
const savedMaterial = localStorage.getItem(`vigent_${storageKey}_material`);
const savedSubtitleStyle = localStorage.getItem(`vigent_${storageKey}_subtitleStyle`);
const savedTitleStyle = localStorage.getItem(`vigent_${storageKey}_titleStyle`);
const savedSecondaryTitleStyle = localStorage.getItem(`vigent_${storageKey}_secondaryTitleStyle`);
const savedSubtitleFontSize = localStorage.getItem(`vigent_${storageKey}_subtitleFontSize`);
const savedTitleFontSize = localStorage.getItem(`vigent_${storageKey}_titleFontSize`);
const savedSecondaryTitleFontSize = localStorage.getItem(`vigent_${storageKey}_secondaryTitleFontSize`);
const savedBgmId = localStorage.getItem(`vigent_${storageKey}_bgmId`);
const savedSelectedVideoId = localStorage.getItem(`vigent_${storageKey}_selectedVideoId`);
const savedSelectedVideoId = localStorage.getItem(`vigent_${storageKey}_latestGeneratedVideoId`)
|| localStorage.getItem(`vigent_${storageKey}_selectedVideoId`);
const savedSelectedAudioId = localStorage.getItem(`vigent_${storageKey}_selectedAudioId`);
const savedBgmVolume = localStorage.getItem(`vigent_${storageKey}_bgmVolume`);
const savedEnableBgm = localStorage.getItem(`vigent_${storageKey}_enableBgm`);
const savedTitleTopMargin = localStorage.getItem(`vigent_${storageKey}_titleTopMargin`);
const savedSecondaryTitleTopMargin = localStorage.getItem(`vigent_${storageKey}_secondaryTitleTopMargin`);
const savedTitleDisplayMode = localStorage.getItem(`vigent_${storageKey}_titleDisplayMode`);
const savedSubtitleBottomMargin = localStorage.getItem(`vigent_${storageKey}_subtitleBottomMargin`);
const savedOutputAspectRatio = localStorage.getItem(`vigent_${storageKey}_outputAspectRatio`);
const savedSpeed = localStorage.getItem(`vigent_${storageKey}_speed`);
setText(savedText || "大家好,欢迎来到我的频道,今天给大家分享一些有趣的内容。");
setVideoTitle(savedTitle ? clampTitle(savedTitle) : "");
setEnableSubtitles(savedSubtitles !== null ? savedSubtitles === 'true' : true);
setVideoSecondaryTitle(savedSecondaryTitle ? clampSecondaryTitle(savedSecondaryTitle) : "");
setTtsMode((savedTtsMode as 'edgetts' | 'voiceclone') || 'edgetts');
setVoice(savedVoice || "zh-CN-YunxiNeural");
if (savedTextLang) setTextLang(savedTextLang);
if (savedMaterial) setSelectedMaterial(savedMaterial);
if (savedMaterial) {
try {
const parsed = JSON.parse(savedMaterial);
if (Array.isArray(parsed)) {
setSelectedMaterials(parsed);
} else {
setSelectedMaterials([savedMaterial]);
}
} catch {
// Legacy format: a single string
setSelectedMaterials([savedMaterial]);
}
}
if (savedSubtitleStyle) setSelectedSubtitleStyleId(savedSubtitleStyle);
if (savedTitleStyle) setSelectedTitleStyleId(savedTitleStyle);
if (savedSecondaryTitleStyle) setSelectedSecondaryTitleStyleId(savedSecondaryTitleStyle);
if (savedSubtitleFontSize) {
const parsed = parseInt(savedSubtitleFontSize, 10);
@@ -127,10 +194,46 @@ export const useHomePersistence = ({
}
}
if (savedSecondaryTitleFontSize) {
const parsed = parseInt(savedSecondaryTitleFontSize, 10);
if (!Number.isNaN(parsed)) {
setSecondaryTitleFontSize(parsed);
setSecondaryTitleSizeLocked(true);
}
}
if (savedBgmId) setSelectedBgmId(savedBgmId);
if (savedBgmVolume) setBgmVolume(parseFloat(savedBgmVolume));
if (savedEnableBgm !== null) setEnableBgm(savedEnableBgm === 'true');
if (savedSelectedVideoId) setSelectedVideoId(savedSelectedVideoId);
// Clear the cross-page shared marker once consumed, so it does not keep overriding the selection
localStorage.removeItem(`vigent_${storageKey}_latestGeneratedVideoId`);
if (savedSelectedAudioId) setSelectedAudioId(savedSelectedAudioId);
if (savedTitleTopMargin) {
const parsed = parseInt(savedTitleTopMargin, 10);
if (!Number.isNaN(parsed)) setTitleTopMargin(parsed);
}
if (savedSecondaryTitleTopMargin) {
const parsed = parseInt(savedSecondaryTitleTopMargin, 10);
if (!Number.isNaN(parsed)) setSecondaryTitleTopMargin(parsed);
}
if (savedTitleDisplayMode === 'short' || savedTitleDisplayMode === 'persistent') {
setTitleDisplayMode(savedTitleDisplayMode);
}
if (savedSubtitleBottomMargin) {
const parsed = parseInt(savedSubtitleBottomMargin, 10);
if (!Number.isNaN(parsed)) setSubtitleBottomMargin(parsed);
}
if (savedOutputAspectRatio === '9:16' || savedOutputAspectRatio === '16:9') {
setOutputAspectRatio(savedOutputAspectRatio);
}
if (savedSpeed) {
const parsed = parseFloat(savedSpeed);
if (!Number.isNaN(parsed)) setSpeed(parsed);
}
// eslint-disable-next-line react-hooks/set-state-in-effect
setIsRestored(true);
@@ -138,19 +241,30 @@ export const useHomePersistence = ({
isAuthLoading,
setBgmVolume,
setEnableBgm,
setEnableSubtitles,
setSelectedBgmId,
setSelectedMaterial,
setSelectedMaterials,
setSelectedSubtitleStyleId,
setSelectedTitleStyleId,
setSelectedSecondaryTitleStyleId,
setSelectedVideoId,
setSelectedAudioId,
setSpeed,
setSubtitleFontSize,
setSubtitleSizeLocked,
setText,
setTextLang,
setTitleFontSize,
setTitleSizeLocked,
setSecondaryTitleFontSize,
setSecondaryTitleSizeLocked,
setTitleTopMargin,
setSecondaryTitleTopMargin,
setTitleDisplayMode,
setSubtitleBottomMargin,
setOutputAspectRatio,
setTtsMode,
setVideoTitle,
setVideoSecondaryTitle,
setVoice,
storageKey,
]);
@@ -172,8 +286,12 @@ export const useHomePersistence = ({
}, [videoTitle, storageKey, isRestored]);
useEffect(() => {
if (isRestored) localStorage.setItem(`vigent_${storageKey}_subtitles`, String(enableSubtitles));
}, [enableSubtitles, storageKey, isRestored]);
if (!isRestored) return;
const timeout = setTimeout(() => {
localStorage.setItem(`vigent_${storageKey}_secondaryTitle`, videoSecondaryTitle);
}, 300);
return () => clearTimeout(timeout);
}, [videoSecondaryTitle, storageKey, isRestored]);
useEffect(() => {
if (isRestored) localStorage.setItem(`vigent_${storageKey}_ttsMode`, ttsMode);
@@ -184,10 +302,14 @@ export const useHomePersistence = ({
}, [voice, storageKey, isRestored]);
useEffect(() => {
if (isRestored && selectedMaterial) {
localStorage.setItem(`vigent_${storageKey}_material`, selectedMaterial);
if (isRestored) localStorage.setItem(`vigent_${storageKey}_textLang`, textLang);
}, [textLang, storageKey, isRestored]);
useEffect(() => {
if (isRestored && selectedMaterials.length > 0) {
localStorage.setItem(`vigent_${storageKey}_material`, JSON.stringify(selectedMaterials));
}
}, [selectedMaterial, storageKey, isRestored]);
}, [selectedMaterials, storageKey, isRestored]);
useEffect(() => {
if (isRestored && selectedSubtitleStyleId) {
@@ -201,6 +323,12 @@ export const useHomePersistence = ({
}
}, [selectedTitleStyleId, storageKey, isRestored]);
useEffect(() => {
if (isRestored && selectedSecondaryTitleStyleId) {
localStorage.setItem(`vigent_${storageKey}_secondaryTitleStyle`, selectedSecondaryTitleStyleId);
}
}, [selectedSecondaryTitleStyleId, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_subtitleFontSize`, String(subtitleFontSize));
@@ -213,6 +341,42 @@ export const useHomePersistence = ({
}
}, [titleFontSize, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_secondaryTitleFontSize`, String(secondaryTitleFontSize));
}
}, [secondaryTitleFontSize, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_titleTopMargin`, String(titleTopMargin));
}
}, [titleTopMargin, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_secondaryTitleTopMargin`, String(secondaryTitleTopMargin));
}
}, [secondaryTitleTopMargin, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_titleDisplayMode`, titleDisplayMode);
}
}, [titleDisplayMode, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_subtitleBottomMargin`, String(subtitleBottomMargin));
}
}, [subtitleBottomMargin, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_outputAspectRatio`, outputAspectRatio);
}
}, [outputAspectRatio, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_bgmId`, selectedBgmId);
@@ -242,11 +406,26 @@ export const useHomePersistence = ({
}
}, [selectedVideoId, storageKey, isRestored]);
useEffect(() => {
if (!isRestored) return;
if (selectedAudioId) {
localStorage.setItem(`vigent_${storageKey}_selectedAudioId`, selectedAudioId);
} else {
localStorage.removeItem(`vigent_${storageKey}_selectedAudioId`);
}
}, [selectedAudioId, storageKey, isRestored]);
useEffect(() => {
if (isRestored && selectedRefAudio) {
localStorage.setItem(`vigent_${storageKey}_refAudioId`, selectedRefAudio.id);
}
}, [selectedRefAudio, storageKey, isRestored]);
useEffect(() => {
if (isRestored) {
localStorage.setItem(`vigent_${storageKey}_speed`, String(speed));
}
}, [speed, storageKey, isRestored]);
return { isRestored };
};
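
The restore logic above reads the `_material` key in two formats: a JSON array written by the new code, and a bare ID string written by older versions. A minimal sketch of that backward-compatible parse is shown below as a standalone function. The name `parseSavedMaterials` is hypothetical, and filtering the array down to string entries is an added assumption that the diff itself does not make.

```typescript
// Hypothetical helper, for illustration only (not part of the diff).
function parseSavedMaterials(raw: string | null): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    if (Array.isArray(parsed)) {
      // Assumption: keep only string entries; the diff stores the array as-is.
      return parsed.filter((x): x is string => typeof x === "string");
    }
    // Valid JSON but not an array (e.g. a numeric-looking id): treat as legacy.
    return [raw];
  } catch {
    // Legacy format: a bare id string that is not valid JSON.
    return [raw];
  }
}

console.log(parseSavedMaterials('["m1","m2"]')); // ["m1", "m2"]
console.log(parseSavedMaterials("m1"));          // ["m1"]
console.log(parseSavedMaterials(null));          // []
```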


@@ -2,23 +2,44 @@ import { useCallback, useState } from "react";
import api from "@/shared/api/axios";
import { ApiResponse, unwrap } from "@/shared/api/types";
import { toast } from "sonner";
import { resolveMediaUrl } from "@/shared/lib/media";
import type { Material } from "@/shared/types/material";
interface Material {
id: string;
name: string;
scene: string;
size_mb: number;
path: string;
/** Probe video duration from a URL using <video> element */
function probeVideoDuration(url: string): Promise<number> {
return new Promise((resolve) => {
const video = document.createElement("video");
video.preload = "metadata";
video.crossOrigin = "anonymous";
const cleanup = () => {
video.removeEventListener("loadedmetadata", onMeta);
video.removeEventListener("error", onError);
video.src = "";
};
const onMeta = () => {
const dur = video.duration;
cleanup();
resolve(Number.isFinite(dur) ? dur : 0);
};
const onError = () => {
cleanup();
resolve(0);
};
video.addEventListener("loadedmetadata", onMeta);
video.addEventListener("error", onError);
video.src = url;
video.load();
});
}
interface UseMaterialsOptions {
selectedMaterial: string;
setSelectedMaterial: React.Dispatch<React.SetStateAction<string>>;
selectedMaterials: string[];
setSelectedMaterials: React.Dispatch<React.SetStateAction<string[]>>;
}
export const useMaterials = ({
selectedMaterial,
setSelectedMaterial,
selectedMaterials,
setSelectedMaterials,
}: UseMaterialsOptions) => {
const [materials, setMaterials] = useState<Material[]>([]);
const [fetchError, setFetchError] = useState<string | null>(null);
@@ -41,12 +62,25 @@ export const useMaterials = ({
setMaterials(nextMaterials);
setLastMaterialCount(nextMaterials.length);
setSelectedMaterial((prev) => {
// Keep the current selection if the selected material still exists in the list
const exists = nextMaterials.some((item) => item.id === prev);
if (exists) return prev;
// Probe video durations in background
if (nextMaterials.length > 0) {
Promise.all(
nextMaterials.map(async (m) => {
const url = resolveMediaUrl(m.path);
if (!url) return m;
const dur = await probeVideoDuration(url);
return { ...m, duration_sec: dur };
})
).then((enriched) => setMaterials(enriched));
}
setSelectedMaterials((prev) => {
// Keep the selected items that still exist
const existingIds = new Set(nextMaterials.map((m) => m.id));
const kept = prev.filter((id) => existingIds.has(id));
if (kept.length > 0) return kept;
// Otherwise select the first one by default
return nextMaterials[0]?.id || "";
return nextMaterials[0]?.id ? [nextMaterials[0].id] : [];
});
} catch (error) {
console.error("获取素材失败:", error);
@@ -54,29 +88,58 @@ export const useMaterials = ({
} finally {
setIsFetching(false);
}
}, [setSelectedMaterial]);
}, [setSelectedMaterials]);
const MAX_MATERIALS = 4;
const toggleMaterial = useCallback((id: string) => {
setSelectedMaterials((prev) => {
if (prev.includes(id)) {
// The last remaining selection cannot be deselected
if (prev.length <= 1) return prev;
return prev.filter((x) => x !== id);
}
if (prev.length >= MAX_MATERIALS) return prev;
return [...prev, id];
});
}, [setSelectedMaterials]);
const reorderMaterials = useCallback((activeId: string, overId: string) => {
setSelectedMaterials((prev) => {
const oldIndex = prev.indexOf(activeId);
const newIndex = prev.indexOf(overId);
if (oldIndex === -1 || newIndex === -1) return prev;
const next = [...prev];
next.splice(oldIndex, 1);
next.splice(newIndex, 0, activeId);
return next;
});
}, [setSelectedMaterials]);
const deleteMaterial = useCallback(async (materialId: string) => {
if (!confirm("确定要删除这个素材吗?")) return;
try {
await api.delete(`/api/materials/${materialId}`);
fetchMaterials();
if (selectedMaterial === materialId) {
setSelectedMaterial("");
if (selectedMaterials.includes(materialId)) {
setSelectedMaterials((prev) => {
const next = prev.filter((id) => id !== materialId);
return next.length > 0 ? next : [];
});
}
} catch (error) {
toast.error("删除失败: " + error);
}
}, [fetchMaterials, selectedMaterial, setSelectedMaterial]);
}, [fetchMaterials, selectedMaterials, setSelectedMaterials]);
const handleUpload = useCallback(async (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (!file) return;
const validTypes = ['.mp4', '.mov', '.avi'];
const validTypes = ['.mp4', '.mov', '.avi', '.mkv', '.webm', '.flv', '.wmv', '.m4v', '.ts', '.mts'];
const ext = file.name.toLowerCase().slice(file.name.lastIndexOf('.'));
if (!validTypes.includes(ext)) {
setUploadError('支持 MP4、MOV、AVI 格式');
setUploadError('支持的视频格式');
return;
}
@@ -100,7 +163,37 @@ export const useMaterials = ({
setUploadProgress(100);
setIsUploading(false);
fetchMaterials();
// After upload, refetch the list and auto-select the new material
const { data: res } = await api.get<ApiResponse<{ materials: Material[] }>>(
`/api/materials?t=${new Date().getTime()}`
);
const payload = unwrap(res);
const nextMaterials = payload.materials || [];
setMaterials(nextMaterials);
setLastMaterialCount(nextMaterials.length);
// Probe video durations in background
if (nextMaterials.length > 0) {
Promise.all(
nextMaterials.map(async (m) => {
const url = resolveMediaUrl(m.path);
if (!url) return m;
const dur = await probeVideoDuration(url);
return { ...m, duration_sec: dur };
})
).then((enriched) => setMaterials(enriched));
}
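// Find the newly added materials and select only the newly uploaded item by default, to avoid accidentally triggering multi-material mode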
const oldIds = new Set(materials.map((m) => m.id));
const newIds = nextMaterials.filter((m) => !oldIds.has(m.id)).map((m) => m.id);
if (newIds.length > 0) {
setSelectedMaterials([newIds[0]]);
} else if (nextMaterials[0]?.id) {
// Fallback: even if no new item was detected, keep a single-material selection defaulting to the latest one
setSelectedMaterials([nextMaterials[0].id]);
}
} catch (err: unknown) {
console.error("Upload failed:", err);
setIsUploading(false);
@@ -110,7 +203,7 @@ export const useMaterials = ({
}
e.target.value = '';
}, [fetchMaterials]);
}, [materials, setSelectedMaterials]);
return {
materials,
@@ -122,6 +215,8 @@ export const useMaterials = ({
uploadError,
setUploadError,
fetchMaterials,
toggleMaterial,
reorderMaterials,
deleteMaterial,
handleUpload,
};
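The selection rules inside `toggleMaterial` and `reorderMaterials` can be read in isolation as pure array transforms. The sketch below restates them outside React; `toggleSelection` and `reorderSelection` are hypothetical names introduced here for illustration, not functions from this change, and the cap of 4 is copied from `MAX_MATERIALS` in the diff.

```typescript
// Hypothetical pure helpers mirroring the state updaters in useMaterials.
const MAX_MATERIALS = 4;

function toggleSelection(prev: string[], id: string, max: number = MAX_MATERIALS): string[] {
  if (prev.includes(id)) {
    // The last remaining selection cannot be removed.
    if (prev.length <= 1) return prev;
    return prev.filter((x) => x !== id);
  }
  // Adding beyond the cap is ignored.
  if (prev.length >= max) return prev;
  return [...prev, id];
}

function reorderSelection(prev: string[], activeId: string, overId: string): string[] {
  const oldIndex = prev.indexOf(activeId);
  const newIndex = prev.indexOf(overId);
  if (oldIndex === -1 || newIndex === -1) return prev;
  const next = [...prev];
  // Remove the dragged id, then insert it at the index the drop target had.
  next.splice(oldIndex, 1);
  next.splice(newIndex, 0, activeId);
  return next;
}

console.log(toggleSelection(["a"], "b"));
console.log(reorderSelection(["a", "b", "c"], "a", "c"));
```

Returning `prev` unchanged in the no-op cases presumably lets React skip a re-render, since the state reference stays the same.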

View File

@@ -13,14 +13,12 @@ interface RefAudio {
}
interface UseRefAudiosOptions {
fixedRefText: string;
selectedRefAudio: RefAudio | null;
setSelectedRefAudio: React.Dispatch<React.SetStateAction<RefAudio | null>>;
setRefText: React.Dispatch<React.SetStateAction<string>>;
}
export const useRefAudios = ({
fixedRefText,
selectedRefAudio,
setSelectedRefAudio,
setRefText,
@@ -28,6 +26,7 @@ export const useRefAudios = ({
const [refAudios, setRefAudios] = useState<RefAudio[]>([]);
const [isUploadingRef, setIsUploadingRef] = useState(false);
const [uploadRefError, setUploadRefError] = useState<string | null>(null);
const [retranscribingId, setRetranscribingId] = useState<string | null>(null);
const fetchRefAudios = useCallback(async () => {
try {
@@ -42,15 +41,12 @@ export const useRefAudios = ({
}, []);
const uploadRefAudio = useCallback(async (file: File) => {
const refTextInput = fixedRefText;
setIsUploadingRef(true);
setUploadRefError(null);
try {
const formData = new FormData();
formData.append('file', file);
formData.append('ref_text', refTextInput);
const { data: res } = await api.post<ApiResponse<RefAudio>>('/api/ref-audios', formData, {
headers: { 'Content-Type': 'multipart/form-data' },
@@ -68,7 +64,7 @@ export const useRefAudios = ({
const errorMsg = axiosErr.response?.data?.message || axiosErr.message || String(err);
setUploadRefError(`上传失败: ${errorMsg}`);
}
}, [fetchRefAudios, fixedRefText, setRefText, setSelectedRefAudio]);
}, [fetchRefAudios, setRefText, setSelectedRefAudio]);
const deleteRefAudio = useCallback(async (audioId: string) => {
if (!confirm("确定要删除这个参考音频吗?")) return;
@@ -84,6 +80,28 @@ export const useRefAudios = ({
}
}, [fetchRefAudios, selectedRefAudio, setRefText, setSelectedRefAudio]);
const retranscribeRefAudio = useCallback(async (audioId: string) => {
setRetranscribingId(audioId);
try {
const { data: res } = await api.post<ApiResponse<{ ref_text: string }>>(
`/api/ref-audios/${encodeURIComponent(audioId)}/retranscribe`
);
const payload = unwrap(res);
toast.success("识别完成");
// Refresh the list and the current selection
await fetchRefAudios();
if (selectedRefAudio?.id === audioId) {
setRefText(payload.ref_text);
}
} catch (err: unknown) {
const axiosErr = err as { response?: { data?: { message?: string } }; message?: string };
const errorMsg = axiosErr.response?.data?.message || axiosErr.message || String(err);
toast.error(`识别失败: ${errorMsg}`);
} finally {
setRetranscribingId(null);
}
}, [fetchRefAudios, selectedRefAudio, setRefText]);
return {
refAudios,
isUploadingRef,
@@ -92,5 +110,7 @@ export const useRefAudios = ({
fetchRefAudios,
uploadRefAudio,
deleteRefAudio,
retranscribeRefAudio,
retranscribingId,
};
};
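Both `uploadRefAudio` and `retranscribeRefAudio` derive the user-facing error text with the same fallback chain: server message first, then the error's own message, then a string conversion. A minimal sketch of that chain as a standalone helper follows; `extractErrorMessage` and `AxiosLikeError` are hypothetical names, and the optional chaining on the root value is an addition so that `null` does not throw.

```typescript
// Hypothetical helper; the shape mirrors the inline cast used in the hook.
type AxiosLikeError = {
  response?: { data?: { message?: string } };
  message?: string;
};

function extractErrorMessage(err: unknown): string {
  const axiosErr = err as AxiosLikeError | null | undefined;
  // Prefer the server-provided message, then the Error message, then String(err).
  return axiosErr?.response?.data?.message || axiosErr?.message || String(err);
}

console.log(extractErrorMessage({ response: { data: { message: "quota exceeded" } } }));
console.log(extractErrorMessage(new Error("network down")));
```

Sharing one helper would keep the two call sites from drifting apart, assuming both should always show the same message.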

View File

@@ -0,0 +1,51 @@
import { useState, useEffect, useRef } from "react";
export interface SavedScript {
id: string;
name: string;
content: string;
savedAt: number;
}
export function useSavedScripts(storageKey: string) {
const lsKey = `vigent_${storageKey}_savedScripts`;
const lsKeyRef = useRef(lsKey);
lsKeyRef.current = lsKey;
const [savedScripts, setSavedScripts] = useState<SavedScript[]>([]);
// Re-read from localStorage whenever lsKey changes (e.g. guest → userId)
useEffect(() => {
try {
const raw = localStorage.getItem(lsKey);
setSavedScripts(raw ? JSON.parse(raw) : []);
} catch {
setSavedScripts([]);
}
}, [lsKey]);
const saveScript = (content: string) => {
const name = content.slice(0, 15).replace(/\n/g, " ") || "未命名";
const entry: SavedScript = {
id: Date.now().toString(36) + Math.random().toString(36).slice(2, 6),
name,
content,
savedAt: Date.now(),
};
setSavedScripts((prev) => {
const next = [entry, ...prev];
localStorage.setItem(lsKeyRef.current, JSON.stringify(next));
return next;
});
};
const deleteScript = (id: string) => {
setSavedScripts((prev) => {
const next = prev.filter((s) => s.id !== id);
localStorage.setItem(lsKeyRef.current, JSON.stringify(next));
return next;
});
};
return { savedScripts, saveScript, deleteScript };
}
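The list handling in `useSavedScripts` comes down to three small steps: derive a display name from the first 15 characters, prepend the new entry, and filter by id on delete. The sketch below restates those steps as pure functions so they can be checked without `localStorage`; `deriveScriptName`, `prependScript` and `removeScript` are hypothetical names, and the timestamp and random suffix are passed in explicitly (an assumption made here to keep the example deterministic).

```typescript
interface SavedScript {
  id: string;
  name: string;
  content: string;
  savedAt: number;
}

// First 15 characters with newlines flattened; empty content falls back to "未命名" ("Untitled").
function deriveScriptName(content: string): string {
  return content.slice(0, 15).replace(/\n/g, " ") || "未命名";
}

// Newest entry goes first, matching the order used by saveScript in the hook.
function prependScript(prev: SavedScript[], content: string, now: number, suffix: string): SavedScript[] {
  const entry: SavedScript = {
    id: now.toString(36) + suffix,
    name: deriveScriptName(content),
    content,
    savedAt: now,
  };
  return [entry, ...prev];
}

function removeScript(prev: SavedScript[], id: string): SavedScript[] {
  return prev.filter((s) => s.id !== id);
}

const list = prependScript([], "hello\nworld", 36, "ab");
console.log(list[0].id, list[0].name);
```

In the hook itself the suffix comes from `Math.random().toString(36).slice(2, 6)` and the timestamp from `Date.now()`.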

View File

@@ -0,0 +1,256 @@
import { useCallback, useEffect, useRef, useState } from "react";
import type { Material } from "@/shared/types/material";
export interface TimelineSegment {
id: string;
materialId: string;
materialName: string;
start: number;
end: number;
sourceStart: number;
sourceEnd: number;
color: string;
}
export interface CustomAssignment {
material_path: string;
start: number;
end: number;
source_start: number;
source_end?: number;
}
const COLORS = ["#8b5cf6", "#ec4899", "#06b6d4", "#f59e0b", "#10b981", "#f97316"];
/** Serializable subset for localStorage */
interface SegmentSnapshot {
materialId: string;
start: number;
end: number;
sourceStart: number;
sourceEnd: number;
}
/** Get effective duration of a segment (clipped range or full material duration) */
function getEffectiveDuration(
seg: { sourceStart: number; sourceEnd: number; materialId: string },
mats: Material[]
): number {
const mat = mats.find((m) => m.id === seg.materialId);
const matDur = mat?.duration_sec ?? 0;
if (seg.sourceEnd > seg.sourceStart) return seg.sourceEnd - seg.sourceStart;
if (seg.sourceStart > 0) return Math.max(matDur - seg.sourceStart, 0);
return matDur;
}
/**
* Recalculate segment start/end positions based on effective durations.
* - Segments placed sequentially by effective duration
* - Segments exceeding audioDuration keep their positions (overflow, start >= duration)
* - Last visible segment is capped/extended to exactly audioDuration (loop fill)
*/
function recalcPositions(
segs: TimelineSegment[],
mats: Material[],
duration: number
): TimelineSegment[] {
if (segs.length === 0 || duration <= 0) return segs;
const fallbackDur = duration / segs.length;
let cursor = 0;
const result = segs.map((seg) => {
const effDur = getEffectiveDuration(seg, mats);
const dur = effDur > 0 ? effDur : fallbackDur;
const newSeg = { ...seg, start: cursor, end: cursor + dur };
cursor += dur;
return newSeg;
});
// Find last segment that starts before audioDuration
let lastVisibleIdx = -1;
for (let i = result.length - 1; i >= 0; i--) {
if (result[i].start < duration) {
lastVisibleIdx = i;
break;
}
}
// Cap/extend last visible segment to exactly audioDuration
if (lastVisibleIdx >= 0) {
result[lastVisibleIdx] = { ...result[lastVisibleIdx], end: duration };
}
return result;
}
interface UseTimelineEditorOptions {
audioDuration: number;
materials: Material[];
selectedMaterials: string[];
storageKey?: string;
}
export const useTimelineEditor = ({
audioDuration,
materials,
selectedMaterials,
storageKey,
}: UseTimelineEditorOptions) => {
const [segments, setSegments] = useState<TimelineSegment[]>([]);
const prevKey = useRef("");
const restoredRef = useRef(false);
// Refs for stable callbacks (avoid recreating on every materials/duration change)
const materialsRef = useRef(materials);
const audioDurationRef = useRef(audioDuration);
useEffect(() => {
materialsRef.current = materials;
}, [materials]);
useEffect(() => {
audioDurationRef.current = audioDuration;
}, [audioDuration]);
// Build a durationsKey so segments re-init when material durations become available
const durationsKey = selectedMaterials
.map((id) => materials.find((m) => m.id === id)?.duration_sec ?? 0)
.join(",");
// Build a cache key from materials + duration
const cacheKey = `${selectedMaterials.join(",")}_${audioDuration.toFixed(1)}`;
const lsKey = storageKey ? `vigent_${storageKey}_timeline` : null;
const initSegments = useCallback(() => {
if (selectedMaterials.length === 0 || audioDuration <= 0) {
setSegments([]);
return;
}
// Try restore from localStorage
if (lsKey) {
try {
const raw = localStorage.getItem(lsKey);
if (raw) {
const saved = JSON.parse(raw) as { key: string; segments: SegmentSnapshot[] };
if (saved.key === cacheKey && saved.segments.length === selectedMaterials.length) {
const allMatch = saved.segments.every(
(s, i) => s.materialId === selectedMaterials[i] || saved.segments.some((ss) => ss.materialId === selectedMaterials[i])
);
if (allMatch) {
const restored: TimelineSegment[] = saved.segments.map((s, i) => {
const mat = materials.find((m) => m.id === s.materialId);
return {
id: `seg-${i}-${Date.now()}`,
materialId: s.materialId,
materialName: mat?.scene || mat?.name || s.materialId,
start: 0,
end: 0,
sourceStart: s.sourceStart,
sourceEnd: s.sourceEnd,
color: COLORS[i % COLORS.length],
};
});
setSegments(recalcPositions(restored, materials, audioDuration));
restoredRef.current = true;
return;
}
}
}
} catch {
// ignore parse errors
}
}
// Create fresh segments — positions derived by recalcPositions
const newSegments: TimelineSegment[] = selectedMaterials.map((matId, i) => {
const mat = materials.find((m) => m.id === matId);
return {
id: `seg-${i}-${Date.now()}`,
materialId: matId,
materialName: mat?.scene || mat?.name || matId,
start: 0,
end: 0,
sourceStart: 0,
sourceEnd: 0,
color: COLORS[i % COLORS.length],
};
});
setSegments(recalcPositions(newSegments, materials, audioDuration));
}, [audioDuration, materials, selectedMaterials, lsKey, cacheKey]);
// Auto-init when selectedMaterials, audioDuration, or material durations change
useEffect(() => {
const key = `${selectedMaterials.join(",")}_${audioDuration}_${durationsKey}`;
if (key !== prevKey.current) {
prevKey.current = key;
initSegments();
}
}, [selectedMaterials, audioDuration, durationsKey, initSegments]);
// Persist segments to localStorage on change (debounced)
useEffect(() => {
if (!lsKey || segments.length === 0) return;
const timeout = setTimeout(() => {
const snapshots: SegmentSnapshot[] = segments.map((s) => ({
materialId: s.materialId,
start: s.start,
end: s.end,
sourceStart: s.sourceStart,
sourceEnd: s.sourceEnd,
}));
localStorage.setItem(lsKey, JSON.stringify({ key: cacheKey, segments: snapshots }));
}, 300);
return () => clearTimeout(timeout);
}, [segments, lsKey, cacheKey]);
const reorderSegments = useCallback(
(fromIdx: number, toIdx: number) => {
setSegments((prev) => {
if (fromIdx < 0 || toIdx < 0 || fromIdx >= prev.length || toIdx >= prev.length) return prev;
if (fromIdx === toIdx) return prev;
const next = [...prev];
// Move the segment: remove from old position, insert at new position
const [moved] = next.splice(fromIdx, 1);
next.splice(toIdx, 0, moved);
return recalcPositions(next, materialsRef.current, audioDurationRef.current);
});
},
[]
);
const setSourceRange = useCallback(
(id: string, sourceStart: number, sourceEnd: number) => {
setSegments((prev) => {
const updated = prev.map((s) => (s.id === id ? { ...s, sourceStart, sourceEnd } : s));
return recalcPositions(updated, materialsRef.current, audioDurationRef.current);
});
},
[]
);
const toCustomAssignments = useCallback((): CustomAssignment[] => {
const duration = audioDurationRef.current;
return segments
.filter((seg) => seg.start < duration)
.map((seg) => {
const mat = materialsRef.current.find((m) => m.id === seg.materialId);
return {
material_path: mat?.path || seg.materialId,
start: seg.start,
end: seg.end,
source_start: seg.sourceStart,
source_end: seg.sourceEnd > seg.sourceStart ? seg.sourceEnd : undefined,
};
});
}, [segments]);
return {
segments,
initSegments,
reorderSegments,
setSourceRange,
toCustomAssignments,
};
};
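The placement rule documented above `recalcPositions` is easier to follow with concrete numbers. The sketch below is a simplified restatement that works on plain durations instead of `TimelineSegment` objects; `effectiveDuration` and `layoutSegments` are hypothetical names, and unlike the hook this version returns an empty array (not the input) when there is nothing to place.

```typescript
interface Placed {
  start: number;
  end: number;
}

// Trimmed range if set, else the remainder after sourceStart, else the full material.
function effectiveDuration(sourceStart: number, sourceEnd: number, materialDuration: number): number {
  if (sourceEnd > sourceStart) return sourceEnd - sourceStart;
  if (sourceStart > 0) return Math.max(materialDuration - sourceStart, 0);
  return materialDuration;
}

function layoutSegments(durations: number[], audioDuration: number): Placed[] {
  if (durations.length === 0 || audioDuration <= 0) return [];
  // Unknown (zero) durations get an equal share of the audio.
  const fallback = audioDuration / durations.length;
  let cursor = 0;
  const result: Placed[] = durations.map((d) => {
    const dur = d > 0 ? d : fallback;
    const seg = { start: cursor, end: cursor + dur };
    cursor += dur;
    return seg;
  });
  // The last segment starting before the audio ends is stretched or cut to end exactly there.
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i].start < audioDuration) {
      result[i] = { ...result[i], end: audioDuration };
      break;
    }
  }
  return result;
}

// Three 6s clips over 10s of audio: the second is cut at 10, the third overflows.
console.log(layoutSegments([6, 6, 6], 10));
// → [ {start: 0, end: 6}, {start: 6, end: 10}, {start: 12, end: 18} ]
```

Overflow segments keep `start >= audioDuration`, which is the condition `toCustomAssignments` uses to drop them from the payload.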

View File

@@ -0,0 +1,94 @@
import { useEffect, useState } from "react";
/** The preview window is at most 280px wide, so captured frames never need to exceed this width */
const MAX_CAPTURE_WIDTH = 480;
/**
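* Captures the frame at 0.1s from a video URL and returns it as a JPEG data URL.
* Returns null on failure (the caller falls back to the gradient background).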
*/
export function useVideoFrameCapture(videoUrl: string | null): string | null {
const [frameUrl, setFrameUrl] = useState<string | null>(null);
useEffect(() => {
if (!videoUrl) {
setFrameUrl(null);
return;
}
let isActive = true;
const video = document.createElement("video");
video.crossOrigin = "anonymous";
video.muted = true;
video.preload = "auto";
video.playsInline = true;
const cleanup = () => {
video.removeEventListener("loadedmetadata", onLoaded);
video.removeEventListener("canplay", onLoaded);
video.removeEventListener("seeked", onSeeked);
video.removeEventListener("error", onError);
video.src = "";
video.load();
};
const onSeeked = () => {
if (!isActive) return;
try {
const vw = video.videoWidth;
const vh = video.videoHeight;
if (!vw || !vh) {
if (isActive) setFrameUrl(null);
cleanup();
return;
}
const scale = Math.min(1, MAX_CAPTURE_WIDTH / vw);
const cw = Math.round(vw * scale);
const ch = Math.round(vh * scale);
const canvas = document.createElement("canvas");
canvas.width = cw;
canvas.height = ch;
const ctx = canvas.getContext("2d");
if (!ctx) {
if (isActive) setFrameUrl(null);
cleanup();
return;
}
ctx.drawImage(video, 0, 0, cw, ch);
const dataUrl = canvas.toDataURL("image/jpeg", 0.7);
if (isActive) setFrameUrl(dataUrl);
} catch {
if (isActive) setFrameUrl(null);
}
cleanup();
};
let seeked = false;
const onLoaded = () => {
if (!isActive || seeked) return;
seeked = true;
video.currentTime = 0.1;
};
const onError = () => {
if (isActive) setFrameUrl(null);
cleanup();
};
// Attach the listeners first, then set src
video.addEventListener("loadedmetadata", onLoaded);
video.addEventListener("canplay", onLoaded);
video.addEventListener("seeked", onSeeked);
video.addEventListener("error", onError);
video.src = videoUrl;
return () => {
isActive = false;
cleanup();
};
}, [videoUrl]);
return frameUrl;
}
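The canvas sizing in `onSeeked` only ever scales down: the factor is capped at 1, so small videos are captured at native size. A minimal sketch of that arithmetic follows, assuming the same 480px cap; `captureSize` is a hypothetical name used for illustration.

```typescript
const MAX_CAPTURE_WIDTH = 480;

// Returns the canvas size for a frame capture, or null when the video has no dimensions yet.
function captureSize(
  videoWidth: number,
  videoHeight: number,
  maxWidth: number = MAX_CAPTURE_WIDTH
): { width: number; height: number } | null {
  if (!videoWidth || !videoHeight) return null;
  // Never upscale: the factor is at most 1.
  const scale = Math.min(1, maxWidth / videoWidth);
  return {
    width: Math.round(videoWidth * scale),
    height: Math.round(videoHeight * scale),
  };
}

console.log(captureSize(1920, 1080)); // → { width: 480, height: 270 }
console.log(captureSize(320, 240));   // → { width: 320, height: 240 }
```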

View File

@@ -43,7 +43,7 @@ export function BgmPanel({
return (
<div className="bg-white/5 rounded-2xl p-6 border border-white/10 backdrop-blur-sm">
<div className="flex items-center justify-between mb-4">
<h2 className="text-lg font-semibold text-white flex items-center gap-2">🎵 </h2>
<h2 className="text-lg font-semibold text-white flex items-center gap-2"></h2>
<div className="flex items-center gap-2">
<button
onClick={onRefresh}

View File

@@ -0,0 +1,293 @@
import { useCallback, useEffect, useRef, useState } from "react";
import { X, Play, Pause } from "lucide-react";
import type { TimelineSegment } from "@/features/home/model/useTimelineEditor";
interface ClipTrimmerProps {
isOpen: boolean;
segment: TimelineSegment | null;
materialUrl: string | null;
onConfirm: (sourceStart: number, sourceEnd: number) => void;
onClose: () => void;
}
function formatSec(sec: number): string {
const m = Math.floor(sec / 60);
const s = sec % 60;
return `${String(m).padStart(2, "0")}:${s.toFixed(1).padStart(4, "0")}`;
}
export function ClipTrimmer({
isOpen,
segment,
materialUrl,
onConfirm,
onClose,
}: ClipTrimmerProps) {
const videoRef = useRef<HTMLVideoElement>(null);
const trackRef = useRef<HTMLDivElement>(null);
const [duration, setDuration] = useState(0);
const [sourceStart, setSourceStart] = useState(0);
const [sourceEnd, setSourceEnd] = useState(0);
const [currentTime, setCurrentTime] = useState(0);
const [isPlaying, setIsPlaying] = useState(false);
const [dragging, setDragging] = useState<"start" | "end" | null>(null);
const animRef = useRef<number>(0);
// Reset state when segment changes
useEffect(() => {
if (segment && isOpen) {
setSourceStart(segment.sourceStart);
setSourceEnd(segment.sourceEnd);
setCurrentTime(segment.sourceStart);
setIsPlaying(false);
}
}, [segment, isOpen]);
// Track currentTime during playback
useEffect(() => {
if (!isPlaying || !videoRef.current) return;
const tick = () => {
if (!videoRef.current) return;
const t = videoRef.current.currentTime;
const end = sourceEnd || duration;
if (t >= end) {
videoRef.current.pause();
videoRef.current.currentTime = sourceStart;
setCurrentTime(sourceStart);
setIsPlaying(false);
return;
}
setCurrentTime(t);
animRef.current = requestAnimationFrame(tick);
};
animRef.current = requestAnimationFrame(tick);
return () => cancelAnimationFrame(animRef.current);
}, [isPlaying, sourceStart, sourceEnd, duration]);
// Seek video when not playing and currentTime changes
useEffect(() => {
if (videoRef.current && !isPlaying) {
videoRef.current.currentTime = currentTime;
}
}, [currentTime, isPlaying]);
const handleLoadedMetadata = useCallback(() => {
if (videoRef.current) {
const dur = videoRef.current.duration;
setDuration(dur);
if (sourceEnd === 0) {
setSourceEnd(dur);
}
}
}, [sourceEnd]);
const togglePlay = useCallback(() => {
if (!videoRef.current || duration === 0) return;
if (isPlaying) {
videoRef.current.pause();
setIsPlaying(false);
} else {
const end = sourceEnd || duration;
if (videoRef.current.currentTime >= end || videoRef.current.currentTime < sourceStart) {
videoRef.current.currentTime = sourceStart;
setCurrentTime(sourceStart);
}
videoRef.current.play().catch(() => {});
setIsPlaying(true);
}
}, [isPlaying, sourceStart, sourceEnd, duration]);
// --- Dual-handle slider logic ---
const getPositionFromEvent = useCallback(
(clientX: number) => {
if (!trackRef.current || duration === 0) return 0;
const rect = trackRef.current.getBoundingClientRect();
const ratio = Math.max(0, Math.min(1, (clientX - rect.left) / rect.width));
return ratio * duration;
},
[duration]
);
const handleThumbPointerDown = useCallback(
(which: "start" | "end", e: React.PointerEvent) => {
e.preventDefault();
e.stopPropagation();
setDragging(which);
(e.target as HTMLElement).setPointerCapture(e.pointerId);
},
[]
);
const handleTrackPointerMove = useCallback(
(e: React.PointerEvent) => {
if (!dragging) return;
const pos = getPositionFromEvent(e.clientX);
const minGap = 0.5;
if (dragging === "start") {
const clamped = Math.max(0, Math.min(pos, (sourceEnd || duration) - minGap));
setSourceStart(clamped);
setCurrentTime(clamped);
} else {
const clamped = Math.min(duration, Math.max(pos, sourceStart + minGap));
setSourceEnd(clamped);
}
},
[dragging, getPositionFromEvent, sourceStart, sourceEnd, duration]
);
const handleTrackPointerUp = useCallback(() => {
setDragging(null);
}, []);
const handleConfirm = () => {
onConfirm(sourceStart, sourceEnd >= duration ? 0 : sourceEnd);
};
if (!isOpen || !segment) return null;
const assignedDur = segment.end - segment.start;
const effectiveEnd = sourceEnd || duration;
const clipDur = effectiveEnd - sourceStart;
const startPct = duration > 0 ? (sourceStart / duration) * 100 : 0;
const endPct = duration > 0 ? (effectiveEnd / duration) * 100 : 100;
const playheadPct = duration > 0 ? (currentTime / duration) * 100 : 0;
return (
<div className="fixed inset-0 z-50 flex items-center justify-center bg-black/60 backdrop-blur-sm" onClick={onClose}>
<div
className="bg-gray-900 border border-white/10 rounded-2xl w-full max-w-lg mx-4 overflow-hidden"
onClick={(e) => e.stopPropagation()}
>
{/* Header */}
<div className="flex items-center justify-between px-5 py-3 border-b border-white/10">
<h3 className="text-white font-semibold text-sm">
- {segment.materialName}
</h3>
<button onClick={onClose} className="text-gray-400 hover:text-white">
<X className="h-4 w-4" />
</button>
</div>
{/* Video preview */}
<div className="px-5 pt-4">
<div className="relative bg-black rounded-lg overflow-hidden aspect-video group">
{materialUrl ? (
<video
ref={videoRef}
src={materialUrl}
className="w-full h-full object-contain"
onLoadedMetadata={handleLoadedMetadata}
onEnded={() => setIsPlaying(false)}
preload="auto"
muted
/>
) : (
<div className="flex items-center justify-center h-full text-gray-500 text-sm">
</div>
)}
{/* Play/Pause overlay */}
{materialUrl && (
<button
onClick={togglePlay}
className="absolute inset-0 flex items-center justify-center bg-black/0 hover:bg-black/30 transition-colors"
>
<div className={`p-3 rounded-full bg-black/60 text-white transition-opacity ${isPlaying ? "opacity-0 group-hover:opacity-100" : "opacity-100"}`}>
{isPlaying ? <Pause className="h-6 w-6" /> : <Play className="h-6 w-6" />}
</div>
</button>
)}
<div className="absolute bottom-2 right-2 bg-black/70 text-white text-[10px] px-2 py-0.5 rounded pointer-events-none">
{formatSec(currentTime)}
</div>
</div>
</div>
{/* Dual-handle range slider */}
<div className="px-5 py-4 space-y-3">
<div className="text-xs text-gray-400 flex justify-between">
<span>: {duration > 0 ? formatSec(duration) : "加载中..."}</span>
</div>
{/* Custom range track */}
<div
ref={trackRef}
className="relative h-10 cursor-pointer select-none touch-none"
onPointerMove={handleTrackPointerMove}
onPointerUp={handleTrackPointerUp}
onPointerLeave={handleTrackPointerUp}
>
{/* Background track */}
<div className="absolute top-1/2 -translate-y-1/2 left-0 right-0 h-2 bg-white/10 rounded-full" />
{/* Selected range */}
<div
className="absolute top-1/2 -translate-y-1/2 h-2 rounded-full"
style={{
left: `${startPct}%`,
width: `${endPct - startPct}%`,
backgroundColor: segment.color + "88",
}}
/>
{/* Playhead indicator */}
{duration > 0 && (
<div
className="absolute top-1/2 -translate-y-1/2 w-0.5 h-4 bg-white/60 rounded-full pointer-events-none"
style={{ left: `${playheadPct}%` }}
/>
)}
{/* Start thumb */}
<div
onPointerDown={(e) => handleThumbPointerDown("start", e)}
className="absolute top-1/2 -translate-y-1/2 -translate-x-1/2 w-5 h-5 rounded-full bg-purple-500 border-2 border-white shadow-lg cursor-grab active:cursor-grabbing hover:scale-110 transition-transform z-10"
style={{ left: `${startPct}%` }}
title={`起点: ${formatSec(sourceStart)}`}
/>
{/* End thumb */}
<div
onPointerDown={(e) => handleThumbPointerDown("end", e)}
className="absolute top-1/2 -translate-y-1/2 -translate-x-1/2 w-5 h-5 rounded-full bg-pink-500 border-2 border-white shadow-lg cursor-grab active:cursor-grabbing hover:scale-110 transition-transform z-10"
style={{ left: `${endPct}%` }}
title={`终点: ${formatSec(effectiveEnd)}`}
/>
</div>
{/* Time labels */}
<div className="flex justify-between text-xs text-gray-400">
<span className="text-purple-400">{formatSec(sourceStart)}</span>
<span className="text-pink-400">{formatSec(effectiveEnd)}</span>
</div>
{/* Info */}
<div className="text-[11px] text-gray-500 flex items-center gap-2 flex-wrap">
<span>: {clipDur.toFixed(1)}s</span>
<span className="text-gray-600">|</span>
<span>: {assignedDur.toFixed(1)}s</span>
{clipDur < assignedDur && <span className="text-amber-500">()</span>}
{clipDur > assignedDur && <span className="text-cyan-500">()</span>}
</div>
</div>
{/* Actions */}
<div className="flex justify-end gap-2 px-5 pb-4">
<button
onClick={onClose}
className="px-4 py-1.5 text-xs bg-white/10 hover:bg-white/20 rounded-lg text-gray-300 transition-colors"
>
</button>
<button
onClick={handleConfirm}
className="px-4 py-1.5 text-xs bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white rounded-lg transition-colors"
>
</button>
</div>
</div>
</div>
);
}
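Two pieces of `ClipTrimmer` are plain arithmetic and can be checked on their own: the `mm:ss.s` label and the clamping that keeps the two handles at least 0.5s apart. The sketch below restates both; `clampStart`, `clampEnd` and `normalizeEnd` are hypothetical names, and as in the component a `sourceEnd` of 0 is taken to mean "play to the end of the material".

```typescript
function formatSec(sec: number): string {
  const m = Math.floor(sec / 60);
  const s = sec % 60;
  return `${String(m).padStart(2, "0")}:${s.toFixed(1).padStart(4, "0")}`;
}

const MIN_GAP = 0.5;

// Start handle: stays within [0, end - gap]; sourceEnd === 0 falls back to the full duration.
function clampStart(pos: number, sourceEnd: number, duration: number): number {
  return Math.max(0, Math.min(pos, (sourceEnd || duration) - MIN_GAP));
}

// End handle: stays within [start + gap, duration].
function clampEnd(pos: number, sourceStart: number, duration: number): number {
  return Math.min(duration, Math.max(pos, sourceStart + MIN_GAP));
}

// On confirm, an end at (or past) the material's end is stored as 0.
function normalizeEnd(sourceEnd: number, duration: number): number {
  return sourceEnd >= duration ? 0 : sourceEnd;
}

console.log(formatSec(75.5));         // → "01:15.5"
console.log(clampStart(9.9, 0, 10));  // → 9.5
console.log(clampEnd(1, 2, 10));      // → 2.5
```

Storing 0 for "to the end" matches `toCustomAssignments`, which only sends `source_end` when it is greater than `source_start`.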

View File

@@ -0,0 +1,303 @@
import { useEffect } from "react";
import { createPortal } from "react-dom";
import { X } from "lucide-react";
interface SubtitleStyleOption {
id: string;
label: string;
font_family?: string;
font_file?: string;
font_size?: number;
highlight_color?: string;
normal_color?: string;
stroke_color?: string;
stroke_size?: number;
letter_spacing?: number;
bottom_margin?: number;
is_default?: boolean;
}
interface TitleStyleOption {
id: string;
label: string;
font_family?: string;
font_file?: string;
font_size?: number;
color?: string;
stroke_color?: string;
stroke_size?: number;
letter_spacing?: number;
font_weight?: number;
top_margin?: number;
is_default?: boolean;
}
interface FloatingStylePreviewProps {
onClose: () => void;
videoTitle: string;
videoSecondaryTitle: string;
titleStyles: TitleStyleOption[];
selectedTitleStyleId: string;
titleFontSize: number;
selectedSecondaryTitleStyleId: string;
secondaryTitleFontSize: number;
secondaryTitleTopMargin: number;
subtitleStyles: SubtitleStyleOption[];
selectedSubtitleStyleId: string;
subtitleFontSize: number;
titleTopMargin: number;
subtitleBottomMargin: number;
enableSubtitles: boolean;
resolveAssetUrl: (path?: string | null) => string | null;
getFontFormat: (fontFile?: string) => string;
buildTextShadow: (color: string, size: number) => string;
previewBaseWidth: number;
previewBaseHeight: number;
previewBackgroundUrl?: string | null;
}
const DESKTOP_WIDTH = 280;
const MOBILE_WIDTH = 160;
export function FloatingStylePreview({
onClose,
videoTitle,
videoSecondaryTitle,
titleStyles,
selectedTitleStyleId,
titleFontSize,
selectedSecondaryTitleStyleId,
secondaryTitleFontSize,
secondaryTitleTopMargin,
subtitleStyles,
selectedSubtitleStyleId,
subtitleFontSize,
titleTopMargin,
subtitleBottomMargin,
enableSubtitles,
resolveAssetUrl,
getFontFormat,
buildTextShadow,
previewBaseWidth,
previewBaseHeight,
previewBackgroundUrl,
}: FloatingStylePreviewProps) {
const isMobile = typeof window !== "undefined" && window.innerWidth < 640;
const windowWidth = isMobile ? MOBILE_WIDTH : DESKTOP_WIDTH;
useEffect(() => {
const handleKeyDown = (e: KeyboardEvent) => {
if (e.key === "Escape") onClose();
};
window.addEventListener("keydown", handleKeyDown);
return () => window.removeEventListener("keydown", handleKeyDown);
}, [onClose]);
const previewScale = windowWidth / previewBaseWidth;
const previewHeight = previewBaseHeight * previewScale;
const widthScale = Math.min(1, previewBaseWidth / 1080);
const responsiveScale = Math.max(0.55, widthScale);
const activeSubtitleStyle = subtitleStyles.find((s) => s.id === selectedSubtitleStyleId)
|| subtitleStyles.find((s) => s.is_default)
|| subtitleStyles[0];
const activeTitleStyle = titleStyles.find((s) => s.id === selectedTitleStyleId)
|| titleStyles.find((s) => s.is_default)
|| titleStyles[0];
const previewTitleText = videoTitle.trim() || "这里是标题预览";
const subtitleHighlightText = "最近一个叫Cloudbot";
const subtitleNormalText = "的开源项目在GitHub上彻底火了";
const subtitleHighlightColor = activeSubtitleStyle?.highlight_color || "#FFE600";
const subtitleNormalColor = activeSubtitleStyle?.normal_color || "#FFFFFF";
const subtitleStrokeColor = activeSubtitleStyle?.stroke_color || "#000000";
const subtitleStrokeSize = Math.max(1, Math.round((activeSubtitleStyle?.stroke_size ?? 3) * responsiveScale));
const subtitleLetterSpacing = Math.max(0, (activeSubtitleStyle?.letter_spacing ?? 2) * responsiveScale);
const subtitleFontFamilyName = `SubtitlePreview-${activeSubtitleStyle?.id || "default"}`;
const subtitleFontUrl = activeSubtitleStyle?.font_file
? resolveAssetUrl(`fonts/${activeSubtitleStyle.font_file}`)
: null;
const titleColor = activeTitleStyle?.color || "#FFFFFF";
const titleStrokeColor = activeTitleStyle?.stroke_color || "#000000";
const titleStrokeSize = Math.max(1, Math.round((activeTitleStyle?.stroke_size ?? 8) * responsiveScale));
const titleLetterSpacing = Math.max(0, (activeTitleStyle?.letter_spacing ?? 4) * responsiveScale);
const titleFontWeight = activeTitleStyle?.font_weight ?? 900;
const titleFontFamilyName = `TitlePreview-${activeTitleStyle?.id || "default"}`;
const titleFontUrl = activeTitleStyle?.font_file
? resolveAssetUrl(`fonts/${activeTitleStyle.font_file}`)
: null;
const scaledTitleFontSize = Math.max(36, Math.round(titleFontSize * responsiveScale));
const scaledSubtitleFontSize = Math.max(28, Math.round(subtitleFontSize * responsiveScale));
const scaledTitleTopMargin = Math.max(0, Math.round(titleTopMargin * responsiveScale));
const scaledSubtitleBottomMargin = Math.max(0, Math.round(subtitleBottomMargin * responsiveScale));
// Secondary title style
const activeSecondaryTitleStyle = titleStyles.find((s) => s.id === selectedSecondaryTitleStyleId)
|| activeTitleStyle;
const stColor = activeSecondaryTitleStyle?.color || "#FFFFFF";
const stStrokeColor = activeSecondaryTitleStyle?.stroke_color || "#000000";
const stStrokeSize = Math.max(1, Math.round((activeSecondaryTitleStyle?.stroke_size ?? 6) * responsiveScale));
const stLetterSpacing = Math.max(0, (activeSecondaryTitleStyle?.letter_spacing ?? 2) * responsiveScale);
const stFontWeight = activeSecondaryTitleStyle?.font_weight ?? 700;
const stFontFamilyName = `SecondaryTitlePreview-${activeSecondaryTitleStyle?.id || "default"}`;
const stFontUrl = activeSecondaryTitleStyle?.font_file
? resolveAssetUrl(`fonts/${activeSecondaryTitleStyle.font_file}`)
: null;
const scaledSecondaryTitleFontSize = Math.max(24, Math.round(secondaryTitleFontSize * responsiveScale));
const scaledSecondaryTitleTopMargin = Math.max(0, Math.round(secondaryTitleTopMargin * responsiveScale));
const previewSecondaryTitleText = videoSecondaryTitle.trim() || "";
const content = (
<div
style={{
position: "fixed",
...(isMobile
? { right: "12px", bottom: "12px" }
: { left: "16px", top: "16px" }),
width: `${windowWidth}px`,
zIndex: 150,
maxHeight: isMobile ? "calc(50dvh)" : "calc(100dvh - 32px)",
overflow: "hidden",
}}
className="rounded-xl border border-white/20 bg-gray-900/95 backdrop-blur-md shadow-2xl"
>
{/* Title bar */}
<div
className="flex items-center justify-between px-3 py-2 border-b border-white/10 select-none"
>
<div className="flex items-center gap-2 text-sm text-gray-300">
<span></span>
</div>
<button
onClick={onClose}
className="p-1 rounded hover:bg-white/10 text-gray-400 hover:text-white transition-colors"
>
<X className="h-4 w-4" />
</button>
</div>
{/* Preview content */}
<div
className="relative overflow-hidden rounded-b-xl"
style={{ height: `${previewHeight}px` }}
>
{(titleFontUrl || subtitleFontUrl || stFontUrl) && (
<style>{`
${titleFontUrl ? `@font-face { font-family: '${titleFontFamilyName}'; src: url('${titleFontUrl}') format('${getFontFormat(activeTitleStyle?.font_file)}'); font-weight: 400; font-style: normal; }` : ''}
${stFontUrl && stFontUrl !== titleFontUrl ? `@font-face { font-family: '${stFontFamilyName}'; src: url('${stFontUrl}') format('${getFontFormat(activeSecondaryTitleStyle?.font_file)}'); font-weight: 400; font-style: normal; }` : ''}
${subtitleFontUrl ? `@font-face { font-family: '${subtitleFontFamilyName}'; src: url('${subtitleFontUrl}') format('${getFontFormat(activeSubtitleStyle?.font_file)}'); font-weight: 400; font-style: normal; }` : ''}
`}</style>
)}
{previewBackgroundUrl ? (
<img src={previewBackgroundUrl} alt="" className="absolute inset-0 w-full h-full object-cover" />
) : (
<div className="absolute inset-0 opacity-20 bg-gradient-to-br from-purple-500/40 via-transparent to-pink-500/30" />
)}
<div
className="absolute top-0 left-0"
style={{
width: `${previewBaseWidth}px`,
height: `${previewBaseHeight}px`,
transform: `scale(${previewScale})`,
transformOrigin: 'top left',
}}
>
<div
className="w-full text-center"
style={{
position: 'absolute',
top: `${scaledTitleTopMargin}px`,
left: 0,
right: 0,
display: 'flex',
flexDirection: 'column',
alignItems: 'center',
padding: '0 5%',
boxSizing: 'border-box',
}}
>
<div
style={{
color: titleColor,
fontSize: `${scaledTitleFontSize}px`,
fontWeight: titleFontWeight,
fontFamily: titleFontUrl
? `'${titleFontFamilyName}', "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "Noto Sans SC", sans-serif`
: '"PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "Noto Sans SC", sans-serif',
textShadow: buildTextShadow(titleStrokeColor, titleStrokeSize),
letterSpacing: `${titleLetterSpacing}px`,
lineHeight: 1.2,
whiteSpace: 'normal',
wordBreak: 'break-word',
overflowWrap: 'anywhere',
opacity: videoTitle.trim() ? 1 : 0.7,
}}
>
{previewTitleText}
</div>
{previewSecondaryTitleText && (
<div
style={{
marginTop: `${scaledSecondaryTitleTopMargin}px`,
color: stColor,
fontSize: `${scaledSecondaryTitleFontSize}px`,
fontWeight: stFontWeight,
fontFamily: stFontUrl && stFontUrl !== titleFontUrl
? `'${stFontFamilyName}', "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "Noto Sans SC", sans-serif`
: titleFontUrl
? `'${titleFontFamilyName}', "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "Noto Sans SC", sans-serif`
: '"PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "Noto Sans SC", sans-serif',
textShadow: buildTextShadow(stStrokeColor, stStrokeSize),
letterSpacing: `${stLetterSpacing}px`,
lineHeight: 1.2,
whiteSpace: 'normal',
wordBreak: 'break-word',
overflowWrap: 'anywhere',
}}
>
{previewSecondaryTitleText}
</div>
)}
</div>
<div
className="w-full text-center"
style={{
position: 'absolute',
bottom: `${scaledSubtitleBottomMargin}px`,
left: 0,
right: 0,
fontSize: `${scaledSubtitleFontSize}px`,
fontFamily: subtitleFontUrl
? `'${subtitleFontFamilyName}', "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "Noto Sans SC", sans-serif`
: '"PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", "Noto Sans SC", sans-serif',
textShadow: buildTextShadow(subtitleStrokeColor, subtitleStrokeSize),
letterSpacing: `${subtitleLetterSpacing}px`,
lineHeight: 1.35,
whiteSpace: 'normal',
wordBreak: 'break-word',
overflowWrap: 'anywhere',
boxSizing: 'border-box',
padding: '0 6%',
}}
>
{enableSubtitles ? (
<>
<span style={{ color: subtitleHighlightColor }}>{subtitleHighlightText}</span>
<span style={{ color: subtitleNormalColor }}>{subtitleNormalText}</span>
</>
) : (
<span className="text-gray-400 text-sm"></span>
)}
</div>
</div>
</div>
</div>
);
return createPortal(content, document.body);
}
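
The preview component above multiplies each style value by `responsiveScale`, rounds it, and clamps it to a per-field minimum (36 for the title, 28 for subtitles, 1 for stroke width, 0 for margins). A minimal sketch of that pattern, assuming a hypothetical helper named `scalePx` (not part of the diff) and an illustrative scale of 0.5:

```typescript
// Hypothetical helper, not in the diff: scale a pixel value, round it,
// and never let it fall below the given minimum.
function scalePx(base: number, scale: number, min: number): number {
  return Math.max(min, Math.round(base * scale));
}

// Assumed value for illustration; the real component derives it from layout.
const responsiveScale = 0.5;

// Mirrors `Math.max(36, Math.round(titleFontSize * responsiveScale))`.
const scaledTitle = scalePx(72, responsiveScale, 36);

// A small base value is clamped up to the minimum instead of shrinking further.
const clampedTitle = scalePx(40, responsiveScale, 36);

// Mirrors the stroke-size line, which uses a minimum of 1.
const scaledStroke = scalePx(8, responsiveScale, 1);

console.log(scaledTitle, clampedTitle, scaledStroke);
```

Note that the letter-spacing lines in the diff clamp to 0 but skip the rounding step, so a helper like this would not be a drop-in replacement for them.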

@@ -4,6 +4,7 @@ interface GenerateActionBarProps {
isGenerating: boolean;
progress: number;
disabled: boolean;
materialCount?: number;
onGenerate: () => void;
}
@@ -11,43 +12,51 @@ export function GenerateActionBar({
isGenerating,
progress,
disabled,
materialCount = 1,
onGenerate,
}: GenerateActionBarProps) {
return (
<button
onClick={onGenerate}
disabled={disabled}
className={`w-full py-4 rounded-xl font-bold text-lg transition-all ${disabled
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white shadow-lg hover:shadow-purple-500/25"
}`}
>
{isGenerating ? (
<span className="flex items-center justify-center gap-3">
<svg className="animate-spin h-5 w-5" viewBox="0 0 24 24">
<circle
className="opacity-25"
cx="12"
cy="12"
r="10"
stroke="currentColor"
strokeWidth="4"
fill="none"
/>
<path
className="opacity-75"
fill="currentColor"
d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"
/>
</svg>
... {progress}%
</span>
) : (
<span className="flex items-center justify-center gap-2">
<Rocket className="h-5 w-5" />
</span>
<div>
<button
onClick={onGenerate}
disabled={disabled}
className={`w-full py-4 rounded-xl font-bold text-lg transition-all ${disabled
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white shadow-lg hover:shadow-purple-500/25"
}`}
>
{isGenerating ? (
<span className="flex items-center justify-center gap-3">
<svg className="animate-spin h-5 w-5" viewBox="0 0 24 24">
<circle
className="opacity-25"
cx="12"
cy="12"
r="10"
stroke="currentColor"
strokeWidth="4"
fill="none"
/>
<path
className="opacity-75"
fill="currentColor"
d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"
/>
</svg>
... {progress}%
</span>
) : (
<span className="flex items-center justify-center gap-2">
<Rocket className="h-5 w-5" />
</span>
)}
</button>
{!isGenerating && materialCount >= 2 && (
<p className="text-xs text-gray-400 text-center mt-1.5">
({materialCount} )
</p>
)}
</button>
</div>
);
}

@@ -0,0 +1,362 @@
import { useState, useRef, useCallback, useEffect } from "react";
import { Play, Pause, Pencil, Trash2, Check, X, RefreshCw, Mic, ChevronDown } from "lucide-react";
import type { GeneratedAudio } from "@/features/home/model/useGeneratedAudios";
interface AudioTask {
status: string;
progress?: number;
message?: string;
}
interface GeneratedAudiosPanelProps {
generatedAudios: GeneratedAudio[];
selectedAudioId: string | null;
isGeneratingAudio: boolean;
audioTask: AudioTask | null;
onGenerateAudio: () => void;
onRefresh: () => void;
onSelectAudio: (audio: GeneratedAudio) => void;
onDeleteAudio: (id: string) => void;
onRenameAudio: (id: string, newName: string) => void;
hasText: boolean;
missingRefAudio?: boolean;
speed: number;
onSpeedChange: (speed: number) => void;
ttsMode: string;
embedded?: boolean;
}
export function GeneratedAudiosPanel({
generatedAudios,
selectedAudioId,
isGeneratingAudio,
audioTask,
onGenerateAudio,
onRefresh,
onSelectAudio,
onDeleteAudio,
onRenameAudio,
hasText,
missingRefAudio = false,
speed,
onSpeedChange,
ttsMode,
embedded = false,
}: GeneratedAudiosPanelProps) {
const [editingId, setEditingId] = useState<string | null>(null);
const [editName, setEditName] = useState("");
const [playingId, setPlayingId] = useState<string | null>(null);
const [speedOpen, setSpeedOpen] = useState(false);
const audioRef = useRef<HTMLAudioElement | null>(null);
const speedRef = useRef<HTMLDivElement>(null);
const stopPlaying = useCallback(() => {
if (audioRef.current) {
audioRef.current.pause();
audioRef.current.currentTime = 0;
audioRef.current = null;
}
setPlayingId(null);
}, []);
// Cleanup on unmount
useEffect(() => {
return () => {
if (audioRef.current) {
audioRef.current.pause();
audioRef.current = null;
}
};
}, []);
// Close speed dropdown on click outside
useEffect(() => {
const handler = (e: MouseEvent) => {
if (speedRef.current && !speedRef.current.contains(e.target as Node)) {
setSpeedOpen(false);
}
};
if (speedOpen) document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, [speedOpen]);
const togglePlay = (audio: GeneratedAudio, e: React.MouseEvent) => {
e.stopPropagation();
if (playingId === audio.id) {
stopPlaying();
return;
}
stopPlaying();
const player = new Audio(audio.path);
player.onended = () => setPlayingId(null);
player.play().catch(() => {});
audioRef.current = player;
setPlayingId(audio.id);
};
const startEditing = (audio: GeneratedAudio, e: React.MouseEvent) => {
e.stopPropagation();
setEditingId(audio.id);
setEditName(audio.name);
};
const saveEditing = (audioId: string, e: React.MouseEvent) => {
e.stopPropagation();
if (!editName.trim()) return;
onRenameAudio(audioId, editName.trim());
setEditingId(null);
setEditName("");
};
const cancelEditing = (e: React.MouseEvent) => {
e.stopPropagation();
setEditingId(null);
setEditName("");
};
const canGenerate = hasText && !missingRefAudio;
const speedOptions = [
{ value: 0.8, label: "较慢" },
{ value: 0.9, label: "稍慢" },
{ value: 1.0, label: "正常" },
{ value: 1.1, label: "稍快" },
{ value: 1.2, label: "较快" },
] as const;
const currentSpeedLabel = speedOptions.find((o) => o.value === speed)?.label ?? "正常";
const content = (
<>
{embedded ? (
<>
{/* Row 1: speed + generate voiceover (right-aligned) */}
<div className="flex justify-end items-center gap-1.5 mb-3">
{ttsMode === "voiceclone" && (
<div ref={speedRef} className="relative">
<button
onClick={() => setSpeedOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentSpeedLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${speedOpen ? "rotate-180" : ""}`} />
</button>
{speedOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{speedOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onSpeedChange(opt.value); setSpeedOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
speed === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
<button
onClick={onGenerateAudio}
disabled={isGeneratingAudio || !canGenerate}
title={missingRefAudio ? "请先选择参考音频" : !hasText ? "请先输入文案" : ""}
className={`px-4 py-2 text-sm font-medium rounded-lg transition-all whitespace-nowrap flex items-center gap-1.5 shadow-sm ${
isGeneratingAudio || !canGenerate
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white hover:shadow-md"
}`}
>
<Mic className="h-4 w-4" />
</button>
</div>
{/* Row 2: voiceover list + refresh */}
<div className="flex justify-between items-center mb-3">
<h3 className="text-sm font-medium text-gray-400"></h3>
<button
onClick={onRefresh}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1"
>
<RefreshCw className="h-3.5 w-3.5" />
</button>
</div>
</>
) : (
<div className="flex justify-between items-center gap-2 mb-4">
<h2 className="text-base sm:text-lg font-semibold text-white flex items-center gap-2 whitespace-nowrap">
<Mic className="h-4 w-4 text-purple-400" />
</h2>
<div className="flex gap-1.5">
{ttsMode === "voiceclone" && (
<div ref={speedRef} className="relative">
<button
onClick={() => setSpeedOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentSpeedLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${speedOpen ? "rotate-180" : ""}`} />
</button>
{speedOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{speedOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onSpeedChange(opt.value); setSpeedOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
speed === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
<button
onClick={onGenerateAudio}
disabled={isGeneratingAudio || !canGenerate}
title={missingRefAudio ? "请先选择参考音频" : !hasText ? "请先输入文案" : ""}
className={`px-4 py-2 text-sm font-medium rounded-lg transition-all whitespace-nowrap flex items-center gap-1.5 shadow-sm ${
isGeneratingAudio || !canGenerate
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white hover:shadow-md"
}`}
>
<Mic className="h-4 w-4" />
</button>
<button
onClick={onRefresh}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1"
>
<RefreshCw className="h-3.5 w-3.5" />
</button>
</div>
</div>
)}
{/* Missing reference audio hint */}
{missingRefAudio && (
<div className="mb-3 px-3 py-2 bg-yellow-500/10 border border-yellow-500/30 rounded-lg text-yellow-300 text-xs">
</div>
)}
{/* Generation progress */}
{isGeneratingAudio && audioTask && (
<div className="mb-4 p-3 bg-purple-500/10 rounded-xl border border-purple-500/30">
<div className="flex justify-between text-sm text-purple-300 mb-2">
<span>{audioTask.message || "生成中..."}</span>
<span>{audioTask.progress || 0}%</span>
</div>
<div className="h-2 bg-black/30 rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-purple-500 to-pink-500 transition-all duration-300"
style={{ width: `${audioTask.progress || 0}%` }}
/>
</div>
</div>
)}
{/* Voiceover list */}
{generatedAudios.length === 0 ? (
<div className="text-center py-6 text-gray-400">
<p className="text-sm"></p>
<p className="text-xs mt-1 text-gray-500"></p>
</div>
) : (
<div className="space-y-2 max-h-48 sm:max-h-56 overflow-y-auto hide-scrollbar">
{generatedAudios.map((audio) => {
const isSelected = selectedAudioId === audio.id;
return (
<div
key={audio.id}
onClick={() => onSelectAudio(audio)}
className={`p-3 rounded-lg border transition-all cursor-pointer flex items-center justify-between group ${
isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{editingId === audio.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
value={editName}
onChange={(e) => setEditName(e.target.value)}
className="flex-1 bg-black/40 border border-white/20 rounded-md px-2 py-1 text-xs text-white"
autoFocus
onKeyDown={(e) => {
if (e.key === "Enter") saveEditing(audio.id, e as unknown as React.MouseEvent);
if (e.key === "Escape") cancelEditing(e as unknown as React.MouseEvent);
}}
/>
<button onClick={(e) => saveEditing(audio.id, e)} className="p-1 text-green-400 hover:text-green-300" title="保存">
<Check className="h-4 w-4" />
</button>
<button onClick={cancelEditing} className="p-1 text-gray-400 hover:text-white" title="取消">
<X className="h-4 w-4" />
</button>
</div>
) : (
<>
<div className="min-w-0 flex-1">
<div className="text-white text-sm truncate">{audio.name}</div>
<div className="text-gray-400 text-xs">{audio.duration_sec.toFixed(1)}s</div>
</div>
<div className="flex items-center gap-1 pl-2 opacity-40 group-hover:opacity-100 transition-opacity">
<button
onClick={(e) => togglePlay(audio, e)}
className="p-1 text-gray-500 hover:text-purple-400 transition-colors"
title={playingId === audio.id ? "暂停" : "播放"}
>
{playingId === audio.id ? (
<Pause className="h-3.5 w-3.5" />
) : (
<Play className="h-3.5 w-3.5" />
)}
</button>
<button
onClick={(e) => startEditing(audio, e)}
className="p-1 text-gray-500 hover:text-white transition-colors"
title="重命名"
>
<Pencil className="h-3.5 w-3.5" />
</button>
<button
onClick={(e) => {
e.stopPropagation();
onDeleteAudio(audio.id);
}}
className="p-1 text-gray-500 hover:text-red-400 transition-colors"
title="删除"
>
<Trash2 className="h-3.5 w-3.5" />
</button>
</div>
</>
)}
</div>
);
})}
</div>
)}
</>
);
if (embedded) return content;
return (
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm relative z-10">
{content}
</div>
);
}
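
Two small pieces of logic in `GeneratedAudiosPanel` can be read in isolation: the speed label lookup with its fallback, and the order in which the generate button picks a tooltip. The sketch below restates both as plain functions. The function names `speedLabel` and `generateBlockedReason` are hypothetical, and the English strings are assumed translations of the Chinese UI strings in the diff, not the shipped text.

```typescript
// Speed presets as in the diff; labels here are assumed English renderings.
const speedOptions = [
  { value: 0.8, label: "slower" },
  { value: 0.9, label: "slightly slower" },
  { value: 1.0, label: "normal" },
  { value: 1.1, label: "slightly faster" },
  { value: 1.2, label: "faster" },
] as const;

// Hypothetical helper: exact-match lookup, falling back to "normal"
// when the current speed is not one of the presets.
function speedLabel(speed: number): string {
  return speedOptions.find((o) => o.value === speed)?.label ?? "normal";
}

// Hypothetical helper: a missing reference audio takes priority over
// missing text, matching the nested ternary on the button's `title`.
function generateBlockedReason(hasText: boolean, missingRefAudio: boolean): string {
  if (missingRefAudio) return "Select a reference audio first";
  if (!hasText) return "Enter a script first";
  return "";
}

console.log(speedLabel(1.1), speedLabel(1.05));
console.log(generateBlockedReason(false, true));
```

Because the lookup uses strict equality on floating-point values, it only matches speeds that were set from the same preset list, which is how `onSpeedChange(opt.value)` is called in the panel.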

@@ -16,6 +16,7 @@ interface HistoryListProps {
onRefresh: () => void;
registerVideoRef: (id: string, element: HTMLDivElement | null) => void;
formatDate: (timestamp: number) => string;
embedded?: boolean;
}
export function HistoryList({
@@ -26,19 +27,22 @@ export function HistoryList({
onRefresh,
registerVideoRef,
formatDate,
embedded = false,
}: HistoryListProps) {
return (
<div className="bg-white/5 rounded-2xl p-6 border border-white/10 backdrop-blur-sm">
<div className="flex justify-between items-center mb-4">
<h2 className="text-lg font-semibold text-white flex items-center gap-2">📂 </h2>
<button
onClick={onRefresh}
className="px-3 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 flex items-center gap-1"
>
<RefreshCw className="h-3.5 w-3.5" />
</button>
</div>
const content = (
<>
{!embedded && (
<div className="flex justify-between items-center mb-4">
<h2 className="text-lg font-semibold text-white flex items-center gap-2"></h2>
<button
onClick={onRefresh}
className="px-3 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 flex items-center gap-1"
>
<RefreshCw className="h-3.5 w-3.5" />
</button>
</div>
)}
{generatedVideos.length === 0 ? (
<div className="text-center py-4 text-gray-500">
<p></p>
@@ -66,7 +70,7 @@ export function HistoryList({
e.stopPropagation();
onDeleteVideo(v.id);
}}
className="p-1 text-gray-500 hover:text-red-400 opacity-0 group-hover:opacity-100 transition-opacity"
className="p-1 text-gray-500 hover:text-red-400 opacity-40 group-hover:opacity-100 transition-opacity"
title="删除视频"
>
<Trash2 className="h-4 w-4" />
@@ -75,6 +79,14 @@ export function HistoryList({
))}
</div>
)}
</>
);
if (embedded) return content;
return (
<div className="bg-white/5 rounded-2xl p-6 border border-white/10 backdrop-blur-sm">
{content}
</div>
);
}

@@ -1,20 +1,26 @@
"use client";
import { useEffect } from "react";
import { useEffect, useMemo } from "react";
import { useRouter } from "next/navigation";
import { RefreshCw } from "lucide-react";
import VideoPreviewModal from "@/components/VideoPreviewModal";
import ScriptExtractionModal from "@/components/ScriptExtractionModal";
import ScriptExtractionModal from "./ScriptExtractionModal";
import RewriteModal from "./RewriteModal";
import { useHomeController } from "@/features/home/model/useHomeController";
import { resolveMediaUrl } from "@/shared/lib/media";
import { BgmPanel } from "@/features/home/ui/BgmPanel";
import { GenerateActionBar } from "@/features/home/ui/GenerateActionBar";
import { HistoryList } from "@/features/home/ui/HistoryList";
import { HomeHeader } from "@/features/home/ui/HomeHeader";
import { MaterialSelector } from "@/features/home/ui/MaterialSelector";
import { TimelineEditor } from "@/features/home/ui/TimelineEditor";
import { ClipTrimmer } from "@/features/home/ui/ClipTrimmer";
import { PreviewPanel } from "@/features/home/ui/PreviewPanel";
import { RefAudioPanel } from "@/features/home/ui/RefAudioPanel";
import { ScriptEditor } from "@/features/home/ui/ScriptEditor";
import { TitleSubtitlePanel } from "@/features/home/ui/TitleSubtitlePanel";
import { VoiceSelector } from "@/features/home/ui/VoiceSelector";
import { GeneratedAudiosPanel } from "@/features/home/ui/GeneratedAudiosPanel";
export function HomePage() {
const router = useRouter();
@@ -34,8 +40,8 @@ export function HomePage() {
fetchMaterials,
deleteMaterial,
handleUpload,
selectedMaterial,
setSelectedMaterial,
selectedMaterials,
toggleMaterial,
handlePreviewMaterial,
editingMaterialId,
editMaterialName,
@@ -47,8 +53,17 @@ export function HomePage() {
setText,
extractModalOpen,
setExtractModalOpen,
rewriteModalOpen,
setRewriteModalOpen,
handleGenerateMeta,
isGeneratingMeta,
handleTranslate,
isTranslating,
originalText,
handleRestoreOriginal,
savedScripts,
handleSaveScript,
deleteSavedScript,
showStylePreview,
setShowStylePreview,
videoTitle,
@@ -59,20 +74,32 @@ export function HomePage() {
titleFontSize,
setTitleFontSize,
setTitleSizeLocked,
videoSecondaryTitle,
secondaryTitleInput,
selectedSecondaryTitleStyleId,
setSelectedSecondaryTitleStyleId,
secondaryTitleFontSize,
setSecondaryTitleFontSize,
setSecondaryTitleSizeLocked,
secondaryTitleTopMargin,
setSecondaryTitleTopMargin,
subtitleStyles,
selectedSubtitleStyleId,
setSelectedSubtitleStyleId,
subtitleFontSize,
setSubtitleFontSize,
setSubtitleSizeLocked,
enableSubtitles,
setEnableSubtitles,
titleTopMargin,
setTitleTopMargin,
subtitleBottomMargin,
setSubtitleBottomMargin,
titleDisplayMode,
setTitleDisplayMode,
outputAspectRatio,
setOutputAspectRatio,
resolveAssetUrl,
getFontFormat,
buildTextShadow,
previewContainerWidth,
materialDimensions,
titlePreviewContainerRef,
ttsMode,
setTtsMode,
voices,
@@ -95,6 +122,8 @@ export function HomePage() {
saveEditing,
cancelEditing,
deleteRefAudio,
retranscribeRefAudio,
retranscribingId,
recordedBlob,
isRecording,
recordingTime,
@@ -102,7 +131,6 @@ export function HomePage() {
stopRecording,
useRecording,
formatRecordingTime,
fixedRefText,
bgmList,
bgmLoading,
bgmError,
@@ -128,12 +156,56 @@ export function HomePage() {
fetchGeneratedVideos,
registerVideoRef,
formatDate,
generatedAudios,
selectedAudio,
selectedAudioId,
isGeneratingAudio,
audioTask,
fetchGeneratedAudios,
handleGenerateAudio,
deleteAudio,
renameAudio,
selectAudio,
speed,
setSpeed,
timelineSegments,
reorderSegments,
setSourceRange,
clipTrimmerOpen,
setClipTrimmerOpen,
clipTrimmerSegmentId,
setClipTrimmerSegmentId,
materialPosterUrl,
} = useHomeController();
useEffect(() => {
router.prefetch("/publish");
}, [router]);
useEffect(() => {
if (typeof window === "undefined") return;
if ("scrollRestoration" in history) {
history.scrollRestoration = "manual";
}
window.scrollTo({ top: 0, left: 0, behavior: "auto" });
// Fallback: once all restore effects and async data loads have settled, force a scroll back to the top again
const timer = setTimeout(() => {
window.scrollTo({ top: 0, left: 0, behavior: "auto" });
}, 200);
return () => clearTimeout(timer);
}, []);
const clipTrimmerSegment = useMemo(
() => timelineSegments.find((s) => s.id === clipTrimmerSegmentId) ?? null,
[timelineSegments, clipTrimmerSegmentId]
);
const clipTrimmerMaterialUrl = useMemo(() => {
if (!clipTrimmerSegment) return null;
const mat = materials.find((m) => m.id === clipTrimmerSegment.materialId);
return mat?.path ? resolveMediaUrl(mat.path) : null;
}, [clipTrimmerSegment, materials]);
return (
<div className="min-h-dvh">
<HomeHeader />
@@ -142,42 +214,145 @@ export function HomePage() {
<div className="grid grid-cols-1 lg:grid-cols-2 gap-8">
{/* Left: input area */}
<div className="space-y-6">
{/* Material selection */}
<MaterialSelector
materials={materials}
selectedMaterial={selectedMaterial}
isFetching={isFetching}
lastMaterialCount={lastMaterialCount}
editingMaterialId={editingMaterialId}
editMaterialName={editMaterialName}
isUploading={isUploading}
uploadProgress={uploadProgress}
uploadError={uploadError}
fetchError={fetchError}
apiBase={apiBase}
onUploadChange={handleUpload}
onRefresh={fetchMaterials}
onSelectMaterial={setSelectedMaterial}
onPreviewMaterial={handlePreviewMaterial}
onStartEditing={startMaterialEditing}
onEditNameChange={setEditMaterialName}
onSaveEditing={saveMaterialEditing}
onCancelEditing={cancelMaterialEditing}
onDeleteMaterial={deleteMaterial}
onClearUploadError={() => setUploadError(null)}
registerMaterialRef={registerMaterialRef}
/>
{/* Script input */}
{/* 1. Script extraction and editing */}
<ScriptEditor
text={text}
onChangeText={setText}
onOpenExtractModal={() => setExtractModalOpen(true)}
onOpenRewriteModal={() => setRewriteModalOpen(true)}
onGenerateMeta={handleGenerateMeta}
isGeneratingMeta={isGeneratingMeta}
onTranslate={handleTranslate}
isTranslating={isTranslating}
hasOriginalText={originalText !== null}
onRestoreOriginal={handleRestoreOriginal}
savedScripts={savedScripts}
onSaveScript={handleSaveScript}
onLoadScript={setText}
onDeleteScript={deleteSavedScript}
/>
{/* Title and subtitle settings */}
{/* 2. Voiceover */}
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
<h2 className="text-base sm:text-lg font-semibold text-white mb-4">
</h2>
<h3 className="text-sm font-medium text-gray-400 mb-3"></h3>
<VoiceSelector
embedded
ttsMode={ttsMode}
onSelectTtsMode={setTtsMode}
voices={voices}
voice={voice}
onSelectVoice={setVoice}
voiceCloneSlot={(
<RefAudioPanel
refAudios={refAudios}
selectedRefAudio={selectedRefAudio}
onSelectRefAudio={handleSelectRefAudio}
isUploadingRef={isUploadingRef}
uploadRefError={uploadRefError}
onClearUploadRefError={() => setUploadRefError(null)}
onUploadRefAudio={uploadRefAudio}
onFetchRefAudios={fetchRefAudios}
playingAudioId={playingAudioId}
onTogglePlayPreview={togglePlayPreview}
editingAudioId={editingAudioId}
editName={editName}
onEditNameChange={setEditName}
onStartEditing={startEditing}
onSaveEditing={saveEditing}
onCancelEditing={cancelEditing}
onDeleteRefAudio={deleteRefAudio}
onRetranscribe={retranscribeRefAudio}
retranscribingId={retranscribingId}
recordedBlob={recordedBlob}
isRecording={isRecording}
recordingTime={recordingTime}
onStartRecording={startRecording}
onStopRecording={stopRecording}
onUseRecording={useRecording}
formatRecordingTime={formatRecordingTime}
/>
)}
/>
<div className="border-t border-white/10 my-4" />
<GeneratedAudiosPanel
embedded
generatedAudios={generatedAudios}
selectedAudioId={selectedAudioId}
isGeneratingAudio={isGeneratingAudio}
audioTask={audioTask}
onGenerateAudio={handleGenerateAudio}
onRefresh={() => fetchGeneratedAudios()}
onSelectAudio={selectAudio}
onDeleteAudio={deleteAudio}
onRenameAudio={renameAudio}
hasText={!!text.trim()}
missingRefAudio={ttsMode === "voiceclone" && !selectedRefAudio}
speed={speed}
onSpeedChange={setSpeed}
ttsMode={ttsMode}
/>
</div>
{/* 3. Material editing */}
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
<h2 className="text-base sm:text-lg font-semibold text-white mb-4">
</h2>
<MaterialSelector
embedded
materials={materials}
selectedMaterials={selectedMaterials}
isFetching={isFetching}
lastMaterialCount={lastMaterialCount}
editingMaterialId={editingMaterialId}
editMaterialName={editMaterialName}
isUploading={isUploading}
uploadProgress={uploadProgress}
uploadError={uploadError}
fetchError={fetchError}
apiBase={apiBase}
onUploadChange={handleUpload}
onRefresh={fetchMaterials}
onToggleMaterial={toggleMaterial}
onPreviewMaterial={handlePreviewMaterial}
onStartEditing={startMaterialEditing}
onEditNameChange={setEditMaterialName}
onSaveEditing={saveMaterialEditing}
onCancelEditing={cancelMaterialEditing}
onDeleteMaterial={deleteMaterial}
onClearUploadError={() => setUploadError(null)}
registerMaterialRef={registerMaterialRef}
/>
<div className="border-t border-white/10 my-4" />
<div className="relative">
{(!selectedAudio || selectedMaterials.length === 0) && (
<div className="absolute inset-0 bg-black/50 backdrop-blur-sm rounded-xl flex items-center justify-center z-10">
<p className="text-gray-400">
{!selectedAudio ? "请先生成并选中配音" : "请先选择素材"}
</p>
</div>
)}
<TimelineEditor
embedded
audioDuration={selectedAudio?.duration_sec ?? 0}
audioUrl={selectedAudio ? (resolveMediaUrl(selectedAudio.path) || "") : ""}
segments={timelineSegments}
materials={materials}
outputAspectRatio={outputAspectRatio}
onOutputAspectRatioChange={setOutputAspectRatio}
onReorderSegment={reorderSegments}
onClickSegment={(seg) => {
setClipTrimmerSegmentId(seg.id);
setClipTrimmerOpen(true);
}}
/>
</div>
</div>
{/* 4. Titles and subtitles */}
<TitleSubtitlePanel
showStylePreview={showStylePreview}
onTogglePreview={() => setShowStylePreview((prev) => !prev)}
@@ -185,6 +360,10 @@ export function HomePage() {
onTitleChange={titleInput.handleChange}
onTitleCompositionStart={titleInput.handleCompositionStart}
onTitleCompositionEnd={titleInput.handleCompositionEnd}
videoSecondaryTitle={videoSecondaryTitle}
onSecondaryTitleChange={secondaryTitleInput.handleChange}
onSecondaryTitleCompositionStart={secondaryTitleInput.handleCompositionStart}
onSecondaryTitleCompositionEnd={secondaryTitleInput.handleCompositionEnd}
titleStyles={titleStyles}
selectedTitleStyleId={selectedTitleStyleId}
onSelectTitleStyle={setSelectedTitleStyleId}
@@ -193,6 +372,15 @@ export function HomePage() {
setTitleFontSize(value);
setTitleSizeLocked(true);
}}
selectedSecondaryTitleStyleId={selectedSecondaryTitleStyleId}
onSelectSecondaryTitleStyle={setSelectedSecondaryTitleStyleId}
secondaryTitleFontSize={secondaryTitleFontSize}
onSecondaryTitleFontSizeChange={(value) => {
setSecondaryTitleFontSize(value);
setSecondaryTitleSizeLocked(true);
}}
secondaryTitleTopMargin={secondaryTitleTopMargin}
onSecondaryTitleTopMarginChange={setSecondaryTitleTopMargin}
subtitleStyles={subtitleStyles}
selectedSubtitleStyleId={selectedSubtitleStyleId}
onSelectSubtitleStyle={setSelectedSubtitleStyleId}
@@ -201,61 +389,21 @@ export function HomePage() {
setSubtitleFontSize(value);
setSubtitleSizeLocked(true);
}}
enableSubtitles={enableSubtitles}
onToggleSubtitles={setEnableSubtitles}
titleTopMargin={titleTopMargin}
onTitleTopMarginChange={setTitleTopMargin}
subtitleBottomMargin={subtitleBottomMargin}
onSubtitleBottomMarginChange={setSubtitleBottomMargin}
titleDisplayMode={titleDisplayMode}
onTitleDisplayModeChange={setTitleDisplayMode}
resolveAssetUrl={resolveAssetUrl}
getFontFormat={getFontFormat}
buildTextShadow={buildTextShadow}
previewScale={previewContainerWidth && (materialDimensions?.width || 1280)
? previewContainerWidth / (materialDimensions?.width || 1280)
: 1}
previewAspectRatio={materialDimensions
? `${materialDimensions.width} / ${materialDimensions.height}`
: "16 / 9"}
previewBaseWidth={materialDimensions?.width || 1280}
previewBaseHeight={materialDimensions?.height || 720}
previewContainerRef={titlePreviewContainerRef}
previewBaseWidth={outputAspectRatio === "16:9" ? 1920 : 1080}
previewBaseHeight={outputAspectRatio === "16:9" ? 1080 : 1920}
previewBackgroundUrl={materialPosterUrl}
/>
{/* Voiceover method selection */}
<VoiceSelector
ttsMode={ttsMode}
onSelectTtsMode={setTtsMode}
voices={voices}
voice={voice}
onSelectVoice={setVoice}
voiceCloneSlot={(
<RefAudioPanel
refAudios={refAudios}
selectedRefAudio={selectedRefAudio}
onSelectRefAudio={handleSelectRefAudio}
isUploadingRef={isUploadingRef}
uploadRefError={uploadRefError}
onClearUploadRefError={() => setUploadRefError(null)}
onUploadRefAudio={uploadRefAudio}
onFetchRefAudios={fetchRefAudios}
playingAudioId={playingAudioId}
onTogglePlayPreview={togglePlayPreview}
editingAudioId={editingAudioId}
editName={editName}
onEditNameChange={setEditName}
onStartEditing={startEditing}
onSaveEditing={saveEditing}
onCancelEditing={cancelEditing}
onDeleteRefAudio={deleteRefAudio}
recordedBlob={recordedBlob}
isRecording={isRecording}
recordingTime={recordingTime}
onStartRecording={startRecording}
onStopRecording={stopRecording}
onUseRecording={useRecording}
formatRecordingTime={formatRecordingTime}
fixedRefText={fixedRefText}
/>
)}
/>
{/* Background music */}
{/* Background music (unnumbered) */}
<BgmPanel
bgmList={bgmList}
bgmLoading={bgmLoading}
@@ -273,32 +421,69 @@ export function HomePage() {
registerBgmItemRef={registerBgmItemRef}
/>
{/* Generate button */}
{/* Generate button (unnumbered) */}
<GenerateActionBar
isGenerating={isGenerating}
progress={currentTask?.progress || 0}
disabled={isGenerating || !selectedMaterial || (ttsMode === "voiceclone" && !selectedRefAudio)}
materialCount={selectedMaterials.length}
disabled={isGenerating || selectedMaterials.length === 0 || !selectedAudio}
onGenerate={handleGenerate}
/>
</div>
{/* Right: preview area */}
{/* Right: works area */}
<div className="space-y-6">
<PreviewPanel
currentTask={currentTask}
isGenerating={isGenerating}
generatedVideo={generatedVideo}
/>
<HistoryList
generatedVideos={generatedVideos}
selectedVideoId={selectedVideoId}
onSelectVideo={handleSelectVideo}
onDeleteVideo={deleteVideo}
onRefresh={() => fetchGeneratedVideos()}
registerVideoRef={registerVideoRef}
formatDate={formatDate}
/>
{/* Generation progress (shown above the works card) */}
{currentTask && isGenerating && (
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-purple-500/30 backdrop-blur-sm">
<div className="space-y-3">
<div className="flex justify-between text-sm text-purple-300 mb-1">
<span>AI生成中...</span>
<span>{currentTask.progress || 0}%</span>
</div>
<div className="h-3 bg-black/30 rounded-full overflow-hidden">
<div
className="h-full bg-gradient-to-r from-purple-500 to-pink-500 transition-all duration-300"
style={{ width: `${currentTask.progress || 0}%` }}
/>
</div>
</div>
</div>
)}
{/* 6. Works */}
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
<h2 className="text-base sm:text-lg font-semibold text-white mb-4">
</h2>
<div className="flex justify-between items-center mb-3">
<h3 className="text-sm font-medium text-gray-400"></h3>
<button
onClick={() => fetchGeneratedVideos()}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 flex items-center gap-1"
>
<RefreshCw className="h-3.5 w-3.5" />
</button>
</div>
<HistoryList
embedded
generatedVideos={generatedVideos}
selectedVideoId={selectedVideoId}
onSelectVideo={handleSelectVideo}
onDeleteVideo={deleteVideo}
onRefresh={() => fetchGeneratedVideos()}
registerVideoRef={registerVideoRef}
formatDate={formatDate}
/>
<div className="border-t border-white/10 my-4" />
<h3 className="text-sm font-medium text-gray-400 mb-3"></h3>
<PreviewPanel
embedded
currentTask={null}
isGenerating={false}
generatedVideo={generatedVideo}
/>
</div>
</div>
</div>
</main>
@@ -313,6 +498,26 @@ export function HomePage() {
onClose={() => setExtractModalOpen(false)}
onApply={(nextText) => setText(nextText)}
/>
<RewriteModal
isOpen={rewriteModalOpen}
onClose={() => setRewriteModalOpen(false)}
originalText={text}
onApply={(newText) => setText(newText)}
/>
<ClipTrimmer
isOpen={clipTrimmerOpen}
segment={clipTrimmerSegment}
materialUrl={clipTrimmerMaterialUrl}
onConfirm={(sourceStart, sourceEnd) => {
if (clipTrimmerSegmentId) {
setSourceRange(clipTrimmerSegmentId, sourceStart, sourceEnd);
}
setClipTrimmerOpen(false);
}}
onClose={() => setClipTrimmerOpen(false)}
/>
</div>
);
}
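
As a rough illustration of the new `disabled` condition passed to `GenerateActionBar` in the hunk above, the check can be read as a small pure predicate. This is a sketch, not code from the repository: `GenerateState`, `isGenerateDisabled`, and the shape of `selectedAudio` are assumptions made for the example.

```typescript
// Hypothetical state shape; only the three fields used by the `disabled`
// expression in the diff are modelled here.
interface GenerateState {
  isGenerating: boolean;
  selectedMaterials: readonly string[];
  // Assumption: any non-null object means an audio source has been chosen.
  selectedAudio: object | null;
}

// Mirrors: isGenerating || selectedMaterials.length === 0 || !selectedAudio
function isGenerateDisabled(state: GenerateState): boolean {
  return (
    state.isGenerating ||
    state.selectedMaterials.length === 0 ||
    !state.selectedAudio
  );
}
```

Keeping the rule in one function would make it possible to unit-test the button state without rendering the page, which may or may not suit how the project is organized.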


@@ -1,17 +1,10 @@
import type { ChangeEvent, MouseEvent } from "react";
import { type ChangeEvent, type MouseEvent, useMemo } from "react";
import { Upload, RefreshCw, Eye, Trash2, X, Pencil, Check } from "lucide-react";
interface Material {
id: string;
name: string;
scene: string;
size_mb: number;
path: string;
}
import type { Material } from "@/shared/types/material";
interface MaterialSelectorProps {
materials: Material[];
selectedMaterial: string;
selectedMaterials: string[];
isFetching: boolean;
lastMaterialCount: number;
editingMaterialId: string | null;
@@ -23,7 +16,7 @@ interface MaterialSelectorProps {
apiBase: string;
onUploadChange: (event: ChangeEvent<HTMLInputElement>) => void;
onRefresh: () => void;
onSelectMaterial: (id: string) => void;
onToggleMaterial: (id: string) => void;
onPreviewMaterial: (path: string) => void;
onStartEditing: (material: Material, event: MouseEvent) => void;
onEditNameChange: (value: string) => void;
@@ -32,11 +25,12 @@ interface MaterialSelectorProps {
onDeleteMaterial: (id: string) => void;
onClearUploadError: () => void;
registerMaterialRef: (id: string, element: HTMLDivElement | null) => void;
embedded?: boolean;
}
export function MaterialSelector({
materials,
selectedMaterial,
selectedMaterials,
isFetching,
lastMaterialCount,
editingMaterialId,
@@ -48,7 +42,7 @@ export function MaterialSelector({
apiBase,
onUploadChange,
onRefresh,
onSelectMaterial,
onToggleMaterial,
onPreviewMaterial,
onStartEditing,
onEditNameChange,
@@ -57,21 +51,32 @@ export function MaterialSelector({
onDeleteMaterial,
onClearUploadError,
registerMaterialRef,
embedded = false,
}: MaterialSelectorProps) {
return (
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
const selectedSet = useMemo(() => new Set(selectedMaterials), [selectedMaterials]);
const isFull = selectedMaterials.length >= 4;
const content = (
<>
<div className="flex justify-between items-center gap-2 mb-4">
<h2 className="text-base sm:text-lg font-semibold text-white flex items-center gap-2 whitespace-nowrap">
📹
<span className="ml-1 text-[11px] sm:text-xs text-gray-400/90 font-normal">
()
</span>
</h2>
{!embedded ? (
<h2 className="text-base sm:text-lg font-semibold text-white flex items-center gap-2 min-w-0">
<span className="shrink-0"></span>
<span className="text-[11px] sm:text-xs text-gray-400/90 font-normal truncate">
(4)
</span>
</h2>
) : (
<h3 className="text-sm font-medium text-gray-400 min-w-0">
<span className="shrink-0"></span>
<span className="ml-1 text-[11px] text-gray-400/90 font-normal hidden sm:inline">(4)</span>
</h3>
)}
<div className="flex gap-1.5">
<input
type="file"
id="video-upload"
accept=".mp4,.mov,.avi"
accept="video/*"
onChange={onUploadChange}
className="hidden"
/>
@@ -98,7 +103,7 @@ export function MaterialSelector({
{isUploading && (
<div className="mb-4 p-4 bg-purple-500/10 rounded-xl border border-purple-500/30">
<div className="flex justify-between text-sm text-purple-300 mb-2">
<span>📤 ...</span>
<span>...</span>
<span>{uploadProgress}%</span>
</div>
<div className="h-2 bg-black/30 rounded-full overflow-hidden">
@@ -112,7 +117,7 @@ export function MaterialSelector({
{uploadError && (
<div className="mb-4 p-4 bg-red-500/20 text-red-200 rounded-xl text-sm flex justify-between items-center">
<span> {uploadError}</span>
<span>{uploadError}</span>
<button onClick={onClearUploadError} className="text-red-300 hover:text-white">
<X className="h-3.5 w-3.5" />
</button>
@@ -126,7 +131,7 @@ export function MaterialSelector({
API: {apiBase}/api/materials/
</div>
) : isFetching && materials.length === 0 ? (
<div className="space-y-2 max-h-64 overflow-y-auto hide-scrollbar" style={{ contentVisibility: 'auto' }}>
<div className="space-y-2 max-h-48 sm:max-h-64 overflow-y-auto hide-scrollbar" style={{ contentVisibility: 'auto' }}>
{Array.from({ length: Math.min(4, Math.max(1, lastMaterialCount || 1)) }).map((_, index) => (
<div
key={`material-skeleton-${index}`}
@@ -142,89 +147,113 @@ export function MaterialSelector({
<div className="text-5xl mb-4">📁</div>
<p></p>
<p className="text-sm mt-2">
📤
</p>
</div>
) : (
<div
className="space-y-2 max-h-64 overflow-y-auto hide-scrollbar"
className="space-y-2 max-h-48 sm:max-h-64 overflow-y-auto hide-scrollbar"
style={{ contentVisibility: 'auto' }}
>
{materials.map((m) => (
<div
key={m.id}
ref={(el) => registerMaterialRef(m.id, el)}
className={`p-3 rounded-lg border transition-all flex items-center justify-between group ${selectedMaterial === m.id
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{editingMaterialId === m.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
value={editMaterialName}
onChange={(e) => onEditNameChange(e.target.value)}
className="flex-1 bg-black/40 border border-white/20 rounded-md px-2 py-1 text-xs text-white"
autoFocus
/>
<button
onClick={(e) => onSaveEditing(m.id, e)}
className="p-1 text-green-400 hover:text-green-300"
title="保存"
>
<Check className="h-4 w-4" />
</button>
<button
onClick={onCancelEditing}
className="p-1 text-gray-400 hover:text-white"
title="取消"
>
<X className="h-4 w-4" />
</button>
</div>
) : (
<button onClick={() => onSelectMaterial(m.id)} className="flex-1 text-left">
<div className="text-white text-sm truncate">{m.scene || m.name}</div>
<div className="text-gray-400 text-xs">{m.size_mb.toFixed(1)} MB</div>
</button>
)}
<div className="flex items-center gap-2 pl-2">
<button
onClick={(e) => {
e.stopPropagation();
if (m.path) {
onPreviewMaterial(m.path);
}
}}
className="p-1 text-gray-500 hover:text-white opacity-0 group-hover:opacity-100 transition-opacity"
title="预览视频"
>
<Eye className="h-4 w-4" />
</button>
{editingMaterialId !== m.id && (
<button
onClick={(e) => onStartEditing(m, e)}
className="p-1 text-gray-500 hover:text-white opacity-0 group-hover:opacity-100 transition-opacity"
title="重命名"
>
<Pencil className="h-4 w-4" />
{materials.map((m) => {
const isSelected = selectedSet.has(m.id);
return (
<div
key={m.id}
ref={(el) => registerMaterialRef(m.id, el)}
className={`p-3 rounded-lg border transition-all flex items-center justify-between group ${isSelected
? "border-purple-500 bg-purple-500/20"
: isFull
? "border-white/5 bg-white/[0.02] opacity-50 cursor-not-allowed"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{editingMaterialId === m.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
value={editMaterialName}
onChange={(e) => onEditNameChange(e.target.value)}
className="flex-1 bg-black/40 border border-white/20 rounded-md px-2 py-1 text-xs text-white"
autoFocus
/>
<button
onClick={(e) => onSaveEditing(m.id, e)}
className="p-1 text-green-400 hover:text-green-300"
title="保存"
>
<Check className="h-4 w-4" />
</button>
<button
onClick={onCancelEditing}
className="p-1 text-gray-400 hover:text-white"
title="取消"
>
<X className="h-4 w-4" />
</button>
</div>
) : (
<button onClick={() => onToggleMaterial(m.id)} disabled={isFull && !isSelected} className="flex-1 text-left flex items-center gap-2">
{/* Checkbox */}
<span
className={`flex-shrink-0 w-4 h-4 rounded border flex items-center justify-center text-[10px] ${isSelected
? "border-purple-500 bg-purple-500 text-white"
: "border-white/30 text-transparent"
}`}
>
{isSelected ? "✓" : ""}
</span>
<div className="min-w-0">
<div className="text-white text-sm truncate">{m.scene || m.name}</div>
<div className="text-gray-400 text-xs">{m.size_mb.toFixed(1)} MB</div>
</div>
</button>
)}
<button
onClick={(e) => {
e.stopPropagation();
onDeleteMaterial(m.id);
}}
className="p-1 text-gray-500 hover:text-red-400 opacity-0 group-hover:opacity-100 transition-opacity"
title="删除素材"
>
<Trash2 className="h-4 w-4" />
</button>
<div className="flex items-center gap-2 pl-2">
<button
onClick={(e) => {
e.stopPropagation();
if (m.path) {
onPreviewMaterial(m.path);
}
}}
className="p-1 text-gray-500 hover:text-white opacity-40 group-hover:opacity-100 transition-opacity"
title="预览视频"
>
<Eye className="h-4 w-4" />
</button>
{editingMaterialId !== m.id && (
<button
onClick={(e) => onStartEditing(m, e)}
className="p-1 text-gray-500 hover:text-white opacity-40 group-hover:opacity-100 transition-opacity"
title="重命名"
>
<Pencil className="h-4 w-4" />
</button>
)}
<button
onClick={(e) => {
e.stopPropagation();
onDeleteMaterial(m.id);
}}
className="p-1 text-gray-500 hover:text-red-400 opacity-40 group-hover:opacity-100 transition-opacity"
title="删除素材"
>
<Trash2 className="h-4 w-4" />
</button>
</div>
</div>
</div>
))}
);
})}
</div>
)}
</>
);
if (embedded) return content;
return (
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
{content}
</div>
);
}
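
The diff shows `MaterialSelector` switching from a single `selectedMaterial` to a `selectedMaterials` array with `isFull = selectedMaterials.length >= 4`, but the parent's `onToggleMaterial` handler is not visible in this view. A minimal sketch of what such a handler might do, assuming the same cap of four, follows. `toggleMaterial` and `MAX_SELECTED_MATERIALS` are hypothetical names, not taken from the repository.

```typescript
// Assumed to match the `selectedMaterials.length >= 4` check in the component.
const MAX_SELECTED_MATERIALS = 4;

// Returns a new selection array; the input is never mutated, so the result
// can be passed directly to a React state setter.
function toggleMaterial(selected: readonly string[], id: string): string[] {
  if (selected.includes(id)) {
    // Already selected: deselecting is always allowed, even when full.
    return selected.filter((existing) => existing !== id);
  }
  if (selected.length >= MAX_SELECTED_MATERIALS) {
    // Cap reached: ignore the request, matching the disabled row in the UI.
    return [...selected];
  }
  // Append so that selection order is preserved.
  return [...selected, id];
}
```

With React state this could be wired as `onToggleMaterial={(id) => setSelectedMaterials((prev) => toggleMaterial(prev, id))}`, where `setSelectedMaterials` is likewise an assumed setter name. Enforcing the cap in the handler as well as in the button's `disabled` attribute guards against callers that bypass the UI.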

Some files were not shown because too many files have changed in this diff