Compare commits (4 commits)

| Author | SHA1 | Date |
|---|---|---|
| | 0939d81e9f | |
| | f879fb0001 | |
| | b289006844 | |
| | 71b45852bf | |
@@ -110,6 +110,8 @@ backend/

- Authentication: **HttpOnly cookie** (`access_token`).
- `get_current_user` / `get_current_user_optional` live in `core/deps.py`.
- Single-device session validation uses `repositories/sessions.py`.
- High-cost endpoints such as AI/Tools must enforce authentication (`Depends(get_current_user)`); anonymous calls that burn external API quota are forbidden.
- Production requires `DEBUG=false` plus a non-default `JWT_SECRET_KEY`; the default key must block service startup in production mode.
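The forced-authentication rule above can be sketched framework-agnostically; `require_user` and `User` here are illustrative names, not the project's actual `get_current_user` dependency:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: str

def require_user(user: Optional[User]) -> User:
    """Reject anonymous callers before any external API quota is spent."""
    if user is None:
        # In FastAPI this would surface as an HTTPException with status 401.
        raise PermissionError("authentication required")
    return user
```

An authenticated caller passes through unchanged; an anonymous caller is rejected up front, before any downstream call can consume quota.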
---
@@ -127,6 +129,14 @@ backend/

- Use `move_file` for renames; never read/write Storage directly.
- `delete_file` must propagate exceptions upward; silently swallowing errors is not allowed (it makes cleanup endpoints report false success).
- `list_files` defaults to fault-tolerant behavior and returns an empty list; strongly consistent scenarios such as cleanup should use `strict=True`.
- Every user-supplied file path/ID must be defensively validated:
  - `material_id` rejects `..` sequences to prevent path traversal
  - resource IDs such as `video_id` use a whitelist (e.g. `^[A-Za-z0-9_-]+$`)
- Upload/download paths must enforce size caps:
  - material uploads follow `MAX_UPLOAD_SIZE_MB`
  - reference audio is capped at 5MB
  - the script-extraction tool caps both file uploads and URL-download results at 500MB
- Errors returned to the frontend use generic copy by default; internal stack traces go only to server logs, so paths and implementation details are not leaked.
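The two validation rules above can be sketched as plain helper functions; the function names are illustrative, not the project's actual helpers:

```python
import re

# Whitelist from the rules above: letters, digits, underscore, hyphen.
RESOURCE_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")

def is_safe_material_id(material_id: str) -> bool:
    """Reject any '..' sequence to block path traversal."""
    return ".." not in material_id

def is_valid_resource_id(resource_id: str) -> bool:
    """Strict whitelist check for IDs such as video_id."""
    return bool(RESOURCE_ID_RE.fullmatch(resource_id))
```

Rejections should map to an immediate 400 response, consistent with the endpoint rules documented below.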
### Cookie storage (per-user isolation)

@@ -151,6 +161,8 @@ backend/user_data/{user_uuid}/cookies/

- Business logic lives in service/workflow.
- Database access lives in repositories.
- Use `loguru` for all logging.
- GLM SDK calls are funneled through `services/glm_service.py` (via a single entry method), so `chat.completions.create` call code is not duplicated across modules.
- For deep script-learning scrape calls, the router side should pass `current_user.id` through to `creator_scraper`, so the user's cookie context is reused and `analysis_id` stays user-isolated.

---
@@ -182,6 +194,15 @@ backend/user_data/{user_uuid}/cookies/

- `MUSETALK_USE_FLOAT16` (half precision, default true)
- `LIPSYNC_DURATION_THRESHOLD` (seconds; at or above this value use MuseTalk; code default is 120, this repo's current `.env` sets 100)

### Small-face lip-sync quality compensation (local lip-sync path)

- `LIPSYNC_SMALL_FACE_ENHANCE` (master switch, default false)
- `LIPSYNC_SMALL_FACE_THRESHOLD` (trigger threshold, default 256)
- `LIPSYNC_SMALL_FACE_UPSCALER` (`gfpgan` / `codeformer`)
- `LIPSYNC_SMALL_FACE_GPU_ID` (upscaling GPU, default 0)
- `LIPSYNC_SMALL_FACE_FAIL_OPEN` (fall back on failure, default true)

> See `Docs/FACEENHANCE_DEPLOY.md` for deployment and verification details.

### WeChat Channels

- `WEIXIN_HEADLESS_MODE` (headful/headless-new)
- `WEIXIN_CHROME_PATH` / `WEIXIN_BROWSER_CHANNEL`
@@ -78,7 +78,7 @@ backend/

* `POST /api/materials`: upload a material
* `GET /api/materials`: list materials
* `PUT /api/materials/{material_id}`: rename a material
* `GET /api/materials/stream/{material_id}`: stream the material file same-origin (used for frontend canvas frame capture, avoiding cross-origin CORS taint; the server rejects `..` paths)

4. **Social publishing (Publish)**

* `POST /api/publish`: publish a video to Douyin / WeChat Channels / Bilibili / Xiaohongshu
@@ -104,8 +104,9 @@ backend/

* `POST /api/ref-audios/{id}/retranscribe`: re-transcribe a reference audio (Whisper transcription + automatic truncation beyond 10s)

7. **AI features (AI)**

* `POST /api/ai/generate-meta`: AI-generated title and tags (login required)
* `POST /api/ai/translate`: AI multilingual translation (9 target languages, login required)
* `POST /api/ai/rewrite`: AI script rewriting (login required)
8. **Pre-generated voiceovers (Generated Audios)**

* `POST /api/generated-audios/generate`: generate a voiceover asynchronously (returns task_id)

@@ -115,11 +116,20 @@ backend/

* `PUT /api/generated-audios/{audio_id}`: rename a voiceover

9. **Tools**

* `POST /api/tools/extract-script`: extract a script from a video link (login required)
* `POST /api/tools/analyze-creator`: analyze a creator's titles and return trending topics (login required)
* `POST /api/tools/generate-topic-script`: generate a script from a selected topic (login required)

> Notes on deep script learning:
> - Platform support: Douyin / Bilibili creator profile links.
> - Scrape strategy: titles are currently fetched through a unified Playwright main path (Douyin/Bilibili), combined with the user's logged-in cookie context to improve the success rate.
> - `analysis_id` is bound to `user_id` and has a TTL (20 minutes by default); it is used to safely read the title context in the later "generate script" step.
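The `analysis_id` binding described above can be sketched with a minimal in-memory TTL cache; the class and method names are illustrative, not the project's actual implementation:

```python
import time
import uuid
from typing import Dict, List, Optional, Tuple

class AnalysisCache:
    """In-memory title cache: each analysis_id is bound to a user_id and expires after a TTL."""

    def __init__(self, ttl_seconds: float = 20 * 60):
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[str, float, List[str]]] = {}

    def cache_titles(self, titles: List[str], user_id: str) -> str:
        analysis_id = uuid.uuid4().hex
        self._store[analysis_id] = (user_id, time.monotonic() + self.ttl, titles)
        return analysis_id

    def get_cached_titles(self, analysis_id: str, user_id: str) -> Optional[List[str]]:
        entry = self._store.get(analysis_id)
        if entry is None:
            return None
        owner, expires_at, titles = entry
        # Expired entries and cross-user reads both come back empty.
        if time.monotonic() > expires_at or owner != user_id:
            return None
        return titles
```

A cross-user read returns nothing, which is the user-isolation property the note above calls out.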
10. **Health checks**

* `GET /api/videos/lipsync/health`: lip-sync service health (covers LatentSync + MuseTalk + the hybrid routing threshold + `data.small_face_enhance`)
* `GET /api/videos/voiceclone/health`: CosyVoice 3.0 service health

> Health fields for the small-face quality-compensation path: `data.small_face_enhance.enabled` (master switch), `threshold` (trigger threshold), `detector_loaded` (whether SCRFD has been lazy-loaded).
11. **Payment**

* `POST /api/payment/create-order`: create an Alipay desktop-website payment order (requires payment_token)

@@ -128,6 +138,16 @@ backend/

> If an account is inactive or expired at login, the API returns 403 plus a `payment_token`, and the frontend redirects to the `/pay` page to complete payment. See the [Alipay deployment guide](ALIPAY_DEPLOY.md).
### Security baseline (production)

- `DEBUG` must be `false`: the auth cookie then carries `Secure` and is sent only over HTTPS.
- `JWT_SECRET_KEY` must be a strong random value, never the default; when `DEBUG=false` and the key is still the default, the backend refuses to start.
- Upload size limits:
  - `POST /api/materials`: bounded by `MAX_UPLOAD_SIZE_MB` (default 500MB)
  - `POST /api/ref-audios`: 5MB
  - `POST /api/tools/extract-script`: both file uploads and URL-download results are capped at 500MB
- `video_id` is whitelist-validated (`^[A-Za-z0-9_-]+$`) in the download/delete endpoints; invalid values return 400 immediately.
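The streaming upload cap above can be sketched as a running chunk counter; the helper name and the exception type are illustrative, not the project's actual code:

```python
from typing import Iterable, Iterator

MAX_UPLOAD_SIZE_MB = 500  # assumed default, per the baseline above

def limit_stream(chunks: Iterable[bytes], max_mb: int = MAX_UPLOAD_SIZE_MB) -> Iterator[bytes]:
    """Re-yield upload chunks, aborting as soon as the running total exceeds the cap."""
    limit = max_mb * 1024 * 1024
    total = 0
    for chunk in chunks:
        total += len(chunk)
        if total > limit:
            # Mirrors the documented behavior: reject mid-stream
            # instead of buffering the whole body first.
            raise ValueError(f"upload exceeds {max_mb}MB limit")
        yield chunk
```

Because the check runs per chunk, an oversized upload is rejected before the full body is ever written to disk.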
### Unified response shape
@@ -230,7 +250,7 @@ pip install -r requirements.txt

```bash
SUPABASE_URL=http://localhost:8008
SUPABASE_KEY=your_service_role_key

# GLM API (used for AI titles / rewriting / translation / deep script learning)
GLM_API_KEY=your_glm_api_key

# LatentSync configuration
```

@@ -242,6 +262,13 @@ MUSETALK_API_URL=http://localhost:8011

```bash
MUSETALK_BATCH_SIZE=32
LIPSYNC_DURATION_THRESHOLD=100

# Small-face quality compensation (off by default; enable via gradual rollout)
LIPSYNC_SMALL_FACE_ENHANCE=false
LIPSYNC_SMALL_FACE_THRESHOLD=256
LIPSYNC_SMALL_FACE_UPSCALER=gfpgan
LIPSYNC_SMALL_FACE_GPU_ID=0
LIPSYNC_SMALL_FACE_FAIL_OPEN=true

# MuseTalk tunables (example)
MUSETALK_DETECT_EVERY=2
MUSETALK_BLEND_CACHE_EVERY=2
```

@@ -249,6 +276,8 @@ MUSETALK_ENCODE_CRF=14

```bash
MUSETALK_ENCODE_PRESET=slow
```

> Deployment, weights, and rollback notes for the small-face quality-compensation path are in `Docs/FACEENHANCE_DEPLOY.md` (only the local `_local_generate()` path is wired up; remote mode is not yet).
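The duration routing implied by `LIPSYNC_DURATION_THRESHOLD` can be sketched in a few lines; the function name is illustrative:

```python
def choose_lipsync_backend(duration_s: float, threshold_s: float = 100.0) -> str:
    """Route by clip duration: at or above the threshold use MuseTalk,
    shorter clips go to LatentSync (threshold default here matches this
    repo's .env example of 100s; the code default is 120s)."""
    return "musetalk" if duration_s >= threshold_s else "latentsync"
```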
### 4. Start the services

**Development mode (hot reload)**:

@@ -99,8 +99,11 @@ python -m scripts.server # smoke-test startup, exit with Ctrl+C

> MuseTalk is a single-step latent-space inpainting model (not a diffusion model); inference runs near real time, making it a good fit for long videos that reach the routing threshold (this repo's current `.env` example is >=100s). It shares GPU0 with CosyVoice; fp16 inference needs roughly 4-8GB of VRAM. The compositing stage now encodes directly through an FFmpeg rawvideo pipe (`libx264` + configurable CRF/preset) while keeping numpy blending, reducing lossy intermediate files.

See the detailed standalone deployment guide:
**[MuseTalk deployment guide](MUSETALK_DEPLOY.md)**

Deployment and verification for the optional small-face quality compensation:
**[Small-face quality-compensation deployment guide](FACEENHANCE_DEPLOY.md)**

Brief steps:
1. Create a dedicated `musetalk` Conda environment (Python 3.10 + PyTorch 2.0.1 + CUDA 11.8)
@@ -154,12 +157,12 @@ playwright install chromium

---

### Optional: AI title/tag generation

> ✅ To enable AI title/tag generation, make sure the backend can reach external APIs.

- Requires access to `https://open.bigmodel.cn`
- The API key is configured as `GLM_API_KEY` in `backend/.env`

---
@@ -214,10 +217,11 @@ cd /home/rongye/ProgramFiles/ViGent2/backend

| `LATENTSYNC_USE_SERVER` | true | Set to true to enable the resident-service speedup |
| `LATENTSYNC_INFERENCE_STEPS` | 30 | Inference steps (16-50) |
| `LATENTSYNC_GUIDANCE_SCALE` | 1.9 | Guidance scale (1.0-3.0) |
| `LATENTSYNC_ENABLE_DEEPCACHE` | true | DeepCache inference speedup |
| `LATENTSYNC_SEED` | 1247 | Fixed random seed (reproducible) |
| `DEBUG` | false | Must be false in production (true only in development) |
| `JWT_SECRET_KEY` | strong random value | Default value forbidden in production; with `DEBUG=false` the default blocks backend startup |
| `REDIS_URL` | `redis://localhost:6379/0` | Task-state store (falls back to memory when unavailable) |
| `WEIXIN_HEADLESS_MODE` | headless-new | WeChat Channels Playwright mode (headful/headless-new) |
| `WEIXIN_CHROME_PATH` | `/usr/bin/google-chrome` | System Chrome path |
| `WEIXIN_BROWSER_CHANNEL` | | Chromium channel (optional) |
@@ -247,9 +251,14 @@ cd /home/rongye/ProgramFiles/ViGent2/backend

| `MUSETALK_GPU_ID` | 0 | MuseTalk GPU index |
| `MUSETALK_API_URL` | `http://localhost:8011` | MuseTalk resident-service address |
| `MUSETALK_BATCH_SIZE` | 32 | MuseTalk inference batch size |
| `MUSETALK_VERSION` | v15 | MuseTalk model version |
| `MUSETALK_USE_FLOAT16` | true | MuseTalk half-precision speedup |
| `LIPSYNC_DURATION_THRESHOLD` | 100 | Seconds; >= this value uses MuseTalk, below it LatentSync (code default 120; set it explicitly in `.env`) |
| `LIPSYNC_SMALL_FACE_ENHANCE` | false | Master switch for small-face quality compensation (keep off until validated in a gradual rollout) |
| `LIPSYNC_SMALL_FACE_THRESHOLD` | 256 | Small-face trigger threshold (pixels) |
| `LIPSYNC_SMALL_FACE_UPSCALER` | gfpgan | Upscaling model (`gfpgan` / `codeformer`) |
| `LIPSYNC_SMALL_FACE_GPU_ID` | 0 | GPU for the compensation upscaler (same card as MuseTalk recommended) |
| `LIPSYNC_SMALL_FACE_FAIL_OPEN` | true | Whether the compensation path falls back to the original flow on failure |
| `ALIPAY_APP_ID` | empty | Alipay application APPID |
| `ALIPAY_PRIVATE_KEY_PATH` | empty | Path to the application private-key PEM file |
| `ALIPAY_PUBLIC_KEY_PATH` | empty | Path to the Alipay public-key PEM file |
@@ -258,7 +267,9 @@ cd /home/rongye/ProgramFiles/ViGent2/backend

| `PAYMENT_AMOUNT` | `999.00` | Membership price (CNY) |
| `PAYMENT_EXPIRE_DAYS` | `365` | Membership validity in days |

> For the full Alipay configuration steps (key generation, PEM format, product activation, etc.), see the **[Alipay deployment guide](ALIPAY_DEPLOY.md)**.

> Hard auth constraint: when `DEBUG=false`, the backend login cookie carries `Secure`, so the frontend must be accessed via an HTTPS domain; direct HTTP-port access cannot keep a login session.

---
@@ -316,11 +327,11 @@ cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk

```bash
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
```

### Verification

1. Visit `https://<your-frontend-domain>` to check the frontend (do not use direct HTTP-port access in production)
2. Visit `http://<server-ip>:8006/docs` for the API docs (internal/ops debugging only)
3. Upload a test video and generate a talking-head video

---
@@ -540,8 +551,8 @@ server {

```bash
GLM_API_KEY=your_zhipu_api_key
```

3. **Verify**:
   Visit `http://localhost:8006/docs` and test `/api/tools/extract-script` from a logged-in session (the endpoint requires authentication).

---
@@ -1,11 +1,13 @@

## Same-Origin Video Download Fix + First Batch of Security Fixes (Day 32)

### Overview

Today's work covered four things:

1. Fixed downloads from the home page and the publish-success dialog being treated by the browser as inline playback (opening a new tab).
2. Split the development notes written after the download fix began out of `Day31` into `Day32`, keeping the logs cleanly archived per day.
3. Implemented the first batch of 6 zero-functional-risk security fixes from the security audit report (`Temp/安全审计报告.md`).
4. Unified the dialog close interaction (close policy only): clicking the overlay closes by default, while the publish-success cleanup dialog stays force-retained.

---
@@ -42,23 +44,104 @@

---

## ✅ 3) First Batch of Security Fixes (6 items, no functional risk)

Implemented the first batch of 6 directly fixable hardening items from the security audit report.

### 3.1 Startup block on the default JWT secret

- **File**: `backend/app/main.py`
- Added a `check_jwt_secret` startup event (before `init_admin`)
- When `JWT_SECRET_KEY` is still the default `"your-secret-key-change-in-production"`:
  - **Production** (`DEBUG=False`): `raise RuntimeError` blocks service startup outright
  - **Development** (`DEBUG=True`): logs a `CRITICAL` warning without blocking startup
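The check above can be sketched as a plain function; the default-key string is the one quoted in the log entry, while the function signature and the `print` stand-in for `logger.critical` are illustrative:

```python
DEFAULT_JWT_SECRET = "your-secret-key-change-in-production"

def check_jwt_secret(secret_key: str, debug: bool) -> None:
    """Startup guard: refuse to boot in production with the default JWT secret."""
    if secret_key != DEFAULT_JWT_SECRET:
        return
    if not debug:
        # Production: hard stop before the app starts serving requests.
        raise RuntimeError("JWT_SECRET_KEY is still the default value; refusing to start")
    # Development: warn loudly but keep going (loguru's logger.critical in the real code).
    print("CRITICAL: JWT_SECRET_KEY is the default value; change it before deploying")
```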
### 3.2 Authentication on AI / Tools endpoints

- **Files**: `backend/app/modules/ai/router.py`, `backend/app/modules/tools/router.py`
- All 3 AI endpoints (`/translate`, `/generate-meta`, `/rewrite`) gained `Depends(get_current_user)`
- The Tools endpoint (`/extract-script`) gained `Depends(get_current_user)`
- The frontend axios client already has `withCredentials: true` and auto-redirects to login on 401, so no frontend change was needed

### 3.3 Material path-traversal fix

- **Files**: `backend/app/modules/materials/router.py`, `backend/app/modules/materials/service.py`
- `stream`, `delete_material`, and `rename_material` now reject `..` before the `startswith(user_id)` check
- A `material_id` containing `..` returns 400 immediately
- The `delete_material` route added `except ValueError` → 400 (previously only `PermissionError` was caught, so `ValueError` fell through the `Exception` catch-all and returned 500)

### 3.4 `video_id` whitelist validation

- **File**: `backend/app/modules/videos/router.py`
- The `download_generated` and `delete_generated` endpoints added a regex check at the top of the function
- Only `^[A-Za-z0-9_-]+$` is allowed; anything else returns 400 immediately
### 3.5 Upload/download size limits

- **materials/service.py** (streaming upload): checks `MAX_UPLOAD_SIZE_MB` (default 500MB) as chunks accumulate; exceeding it raises `ValueError`
- **ref_audios/service.py** (reference audio): checks the 5MB cap after `await file.read()`
- **tools/service.py** (script-extraction file upload): replaced `shutil.copyfileobj` with a chunked copy + 500MB limit
- **tools/service.py** (URL download branch): checks file size after `_download_video` returns; beyond 500MB the temp file is deleted and the request rejected

### 3.6 Generic error messages

- **ai/router.py**: 3 occurrences of `detail=str(e)` replaced with generic copy ("翻译服务暂时不可用" / "生成标题标签失败" / "改写服务暂时不可用", i.e. service-unavailable/failure messages)
- **tools/router.py**: the specific "Fresh cookies" branch message is kept; the fallback changed to a generic "script extraction failed, please retry later" ("文案提取失败,请稍后重试")
- **generated_audios/service.py**: the task-failure `error` field changed from `traceback.format_exc()` to `str(e)`; the traceback goes only to server logs

---
## ✅ 4) Unified Dialog Close Policy (UX)

### Goal

- Keep interaction expectations consistent: business dialogs close via `X` and overlay click by default.
- Keep critical-flow protection: the publish-success cleanup dialog still forbids overlay close, so an accidental tap cannot interrupt the flow.
- Note: unifying button placement and visual style belongs to Day33; this log only records the close-policy unification.

### Changes

- The script-extraction dialog (`ScriptExtractionModal`) closes on overlay click.
- The AI rewrite dialog (`RewriteModal`) closes on overlay click.
- The publish-page QR-login dialog closes on overlay click.
- The change-password dialog closes on overlay click.
- The recording dialog uses a dynamic policy: `closeOnOverlay={!isRecording}`
  - Not recording: overlay close allowed
  - Recording: overlay close disabled (prevents accidental taps); `X` still works and stops the recording before closing
- The publish-success cleanup dialog keeps `closeOnOverlay=false` and provides no `onClose` (no top-right close button).

---
## 📁 Today's Main Changed Files

| File | Change |
|------|------|
| `backend/app/modules/videos/router.py` | Added `GET /api/videos/generated/{video_id}/download` returning an `attachment` download response; added `video_id` whitelist regex validation (`^[A-Za-z0-9_-]+$`) |
| `frontend/src/features/publish/model/usePublishController.ts` | After a successful publish, `triggerCleanup()` passes `video.id` (replacing the signed URL) |
| `frontend/src/shared/contexts/CleanupContext.tsx` | Download field changed to `videoId`; backfills the legacy `videoDownloadUrl` for compatibility; download button switched to the same-origin path |
| `frontend/src/features/home/ui/PreviewPanel.tsx` | Home-page download switched to the same-origin download endpoint |
| `frontend/src/features/home/ui/HomePage.tsx` | Passes `generatedVideoId` through to `PreviewPanel` |
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | Dialog closes on overlay click (`closeOnOverlay`) |
| `frontend/src/features/home/ui/RewriteModal.tsx` | Dialog closes on overlay click (`closeOnOverlay`) |
| `frontend/src/features/publish/ui/PublishPage.tsx` | QR-login dialog closes on overlay click |
| `frontend/src/components/AccountSettingsDropdown.tsx` | Change-password dialog closes on overlay click |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | Recording dialog changed to `closeOnOverlay={!isRecording}` (overlay close disabled while recording) |
| `Docs/DevLogs/Day31.md` | Removed the download-fix section and its verification/coverage items (moved into Day32) |
| `Docs/TASK_COMPLETE.md` | Added the Day32 block and handed over Current to it (Day33 takes over Current later); Day31 is no longer Current |
| `Docs/BACKEND_README.md` | Documented the `/api/videos/generated/{video_id}/download` endpoint |
| `Docs/BACKEND_DEV.md` | Documented the download endpoint's `attachment` convention |
| `Docs/FRONTEND_README.md` | Documented the unified same-origin download endpoint for home/publish dialogs |
| `Docs/FRONTEND_DEV.md` | Documented the CleanupContext download-policy convention |
| `Docs/PUBLISH_DEPLOY.md` | Documented the same-origin download hookup after a successful publish |
| `README.md` | Documented the "one-click direct download (same-origin attachment)" capability |
| `backend/app/main.py` | `check_jwt_secret` startup event: hard startup block in production (`DEBUG=False`), `CRITICAL` warning in development |
| `backend/app/modules/ai/router.py` | 3 endpoints gained `Depends(get_current_user)`; error responses changed to generic messages |
| `backend/app/modules/tools/router.py` | The `extract-script` endpoint gained `Depends(get_current_user)`; error responses changed to generic messages |
| `backend/app/modules/materials/router.py` | `stream` endpoint rejects `..` path traversal; `delete` endpoint added `except ValueError` → 400 |
| `backend/app/modules/materials/service.py` | `delete_material` / `rename_material` reject `..` path traversal; streaming upload enforces `MAX_UPLOAD_SIZE_MB` |
| `backend/app/modules/ref_audios/service.py` | Reference-audio upload enforces a 5MB limit |
| `backend/app/modules/tools/service.py` | Script-extraction file upload replaced with a size-limited chunked copy (500MB); URL-download branch checks size after download (500MB) |
| `backend/app/modules/generated_audios/service.py` | Task-failure error field changed from `traceback.format_exc()` to `str(e)` to avoid leaking internal paths |

---
@@ -66,6 +149,11 @@

- `python -m py_compile backend/app/modules/videos/router.py` ✅
- `npm run build` (frontend, re-verified after the dialog close-policy change) ✅
- `pm2 restart vigent2-frontend` ✅
- `pm2 restart vigent2-backend` ✅
- `curl http://127.0.0.1:8006/health` returns `{"status":"ok"}` ✅
- Syntax check for the first security batch: `python -m py_compile backend/app/main.py backend/app/modules/materials/router.py backend/app/modules/tools/service.py backend/app/modules/ai/router.py backend/app/modules/tools/router.py backend/app/modules/materials/service.py backend/app/modules/ref_audios/service.py backend/app/modules/videos/router.py backend/app/modules/generated_audios/service.py` ✅
- Unauthenticated call to `/api/ai/translate` → returns 401 ✅
- Unauthenticated call to `/api/tools/extract-script` → returns 401 ✅
- Wrap-up syntax check for the three final fixes: `python -m py_compile backend/app/main.py backend/app/modules/materials/router.py backend/app/modules/tools/service.py` ✅
Docs/DevLogs/Day33.md (new file, 290 lines)

@@ -0,0 +1,290 @@
## Douyin Short-Link Script-Extraction Robustness Fix (Day 33)

### Overview

Today focused on fixing intermittent extraction failures in the Script Extraction Assistant for Douyin share short links / share-code text, and on adding compatibility for the various Douyin landing-URL shapes.

---

## ✅ 1) Problem Review

### Symptoms

- Copying Douyin share-code text (containing a `v.douyin.com` short link) made script extraction fail intermittently.
- Pasting the address-bar link directly (e.g. `jingxuan?modal_id=...`) extracted successfully.

### Root cause

- `_download_douyin_manual` in `backend/app/modules/tools/service.py` originally extracted the video ID only from `/video/{id}`.
- Short-link redirects do not always land on `/video/{id}`; common shapes also include:
  - `/share/video/{id}`
  - `/user/...?...&vid={id}`
  - `/follow/search?...&modal_id={id}`
- Landing on any of those shapes produced `Could not extract video_id`, so the fallback failed.

---
## ✅ 2) Fix

### 2.1 A unified parsing function

- Added `_extract_douyin_video_id(candidate_url)`, which parses all of the following ID shapes in one place:
  - Paths: `/video/{id}`, `/share/video/{id}`
  - Query parameters: `modal_id`, `vid`, `video_id`, `aweme_id`, `item_id`
  - A regex fallback over the whole decoded URL
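A minimal sketch of such a parser, assuming numeric video IDs; this is an illustrative reconstruction, not the project's actual `_extract_douyin_video_id`:

```python
import re
from typing import Optional
from urllib.parse import urlparse, parse_qs, unquote

# Query parameters that may carry the video ID, per the list above.
_ID_PARAMS = ("modal_id", "vid", "video_id", "aweme_id", "item_id")
_PATH_RE = re.compile(r"/(?:share/)?video/(\d+)")

def extract_douyin_video_id(candidate_url: str) -> Optional[str]:
    parsed = urlparse(candidate_url)
    # 1) Path shapes: /video/{id} and /share/video/{id}
    match = _PATH_RE.search(parsed.path)
    if match:
        return match.group(1)
    # 2) Known query parameters
    query = parse_qs(parsed.query)
    for key in _ID_PARAMS:
        values = query.get(key)
        if values and values[0].isdigit():
            return values[0]
    # 3) Fallback: a long digit run anywhere in the decoded URL
    match = re.search(r"\d{15,}", unquote(candidate_url))
    return match.group(0) if match else None
```

Each landing-URL shape listed in the root-cause section resolves through one of the three stages, so the fallback download path always receives a usable ID when one is present.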

### 2.2 Fallback extraction chain hardening

- `_download_douyin_manual` now:
  1. First tries to extract `video_id` from the post-redirect `final_url`
  2. On failure, tries again with the original input `url`
- The downstream download path is unchanged: visit `m.douyin.com/share/video/{video_id}`, extract `play_addr`, and download.

---
## 📁 Today's Changed Files

| File | Change |
|------|------|
| `backend/app/modules/tools/service.py` | Added `_extract_douyin_video_id`; hardened the Douyin fallback's `video_id` extraction (compatible with `share/video`, `modal_id`, `vid`, etc.) |
| `Docs/DevLogs/Day33.md` | Added the Day33 dev log recording the problem, root cause, fix, and verification |

---

## 🔍 Verification

- `python -m py_compile backend/app/modules/tools/service.py` ✅
- URL-parsing smoke tests (function level):
  - `jingxuan?modal_id=...` extracts ✅
  - `user?...&vid=...` extracts ✅
  - `follow/search?...&modal_id=...` extracts ✅
- Download-chain smoke tests (service level):
  - User-provided short-link share-code text downloads a temp video successfully ✅
  - The historically failing `user?...&vid=...` sample now goes through the fallback download ✅

---
## ✅ 3) Deep Script Learning: Playwright Fallback for Douyin Scraping

### 3.1 Problem review

- In the deep-script-learning creator-analysis path, Douyin user pages sometimes return a JS shell page (containing `byted_acrawler`), so static HTML extraction cannot find `desc`.
- Symptom: the short link resolves to a `sec_uid`, but title scraping fails with "page structure may have changed".

### 3.2 Fix

- Added a Playwright fallback scrape in `backend/app/services/creator_scraper.py`:
  1. The original HTTP + `ttwid` scrape remains the first choice (lightweight, fast).
  2. When HTTP extraction yields no titles, it automatically switches to Playwright.
  3. Page network responses are monitored to capture, specifically:
     - `/aweme/v1/web/aweme/post/`
     - `/aweme/v1/web/user/profile/other/`
  4. `desc` in the response JSON is parsed as the video-title source, and the creator's nickname is extracted.
- A more accurate message is returned only on genuine failure:
  - "抖音触发风控验证,暂时无法抓取标题,请稍后重试" (Douyin triggered anti-bot verification; titles cannot be scraped right now, please retry later)

### 3.3 Result

- The given short link `https://v.douyin.com/hmFXdx5PvzQ/` is reliably recognized and title scraping completes.
- Scraping yields a valid creator nickname and roughly 50 titles (subject to what the platform returns).

### 3.4 Files added/updated

| File | Change |
|------|------|
| `backend/app/services/creator_scraper.py` | Added Douyin Playwright fallback scraping, network-response capture, improved title/nickname parsing, better error messages |
| `Docs/DevLogs/Day33.md` | Added the deep-script-learning Douyin scraping notes |

### 3.5 Verification

- `python -m py_compile backend/app/services/creator_scraper.py` ✅
- Smoke tests:
  - Short-link redirect + `sec_uid` extraction ✅
  - Automatic switch to Playwright when the primary HTTP path fails ✅
  - `aweme/post` data captured from Playwright network responses and titles extracted ✅

---
## ✅ 4) First Version of Deep Script Learning Shipped

### 4.1 Backend

- New creator-scraping service: `backend/app/services/creator_scraper.py`
  - `scrape_creator_titles(url)`: unified entry for platform detection + title scraping
  - `validate_url(url)`: enforced `https`, domain whitelist, public-network validation of all DNS records, per-hop redirect validation
  - `cache_titles(titles, user_id)` / `get_cached_titles(analysis_id, user_id)`: 20-minute TTL + user binding
- GLM service extensions: `backend/app/services/glm_service.py`
  - `analyze_topics(titles)`: distills trending topics from titles (≤10)
  - `generate_script_from_topic(topic, word_count, titles)`: generates a script for the chosen topic and style
- New tool routes: `backend/app/modules/tools/router.py`
  - `POST /api/tools/analyze-creator`
  - `POST /api/tools/generate-topic-script`
  - Pydantic JSON request models + login check + unified `success_response`

### 4.2 Frontend

- New state-logic hook: `frontend/src/features/home/ui/script-learning/useScriptLearning.ts`
  - Flow states: `input -> analyzing -> topics -> generating -> result`
  - Manages the analyze request, generate request, error state, copy, and regenerate
- New dialog component: `frontend/src/features/home/ui/ScriptLearningModal.tsx`
  - Step-based UI: link input, single-choice topic, word-count input, result view, fill-in/copy
- Home-page wiring:
  - `frontend/src/features/home/ui/ScriptEditor.tsx`: new "Deep Script Learning" button
  - `frontend/src/features/home/model/useHomeController.ts`: new `learningModalOpen` state
  - `frontend/src/features/home/ui/HomePage.tsx`: mounts the dialog and backfills the main editor

### 4.3 Placement and rules

- Button placement follows the agreed order:
  - `历史文案` → `文案提取助手` → `文案深度学习` → `AI多语言`
- The dialog follows the current unified policy: overlay click closes (non-critical-flow dialog).

### 4.4 Verification

- Backend syntax checks:
  - `python -m py_compile backend/app/services/creator_scraper.py backend/app/services/glm_service.py backend/app/modules/tools/router.py` ✅
- Frontend build:
  - `cd frontend && npm run build` ✅
- Douyin short-link sample integration test:
  - `https://v.douyin.com/hmFXdx5PvzQ/` resolves and titles scrape (falling back to Playwright automatically when triggered) ✅

---
## ✅ 5) Douyin Cookie Dependency Clarified + Bilibili Rate-Limit Hardening

### 5.1 Douyin cookie dependency clarified

- Deep-script-learning Douyin scraping does **not** depend on the publish-management login cookie.
- The current chain uses:
  - short-link resolution + `sec_uid` extraction
  - the public access path (`ttwid` + page/API scraping)
  - Playwright fallback when necessary
- So users can use the feature without being logged in to Douyin (though platform anti-bot measures may still apply).

### 5.2 Bilibili "too many requests" mitigation

- Hardened Bilibili scraping in `backend/app/services/creator_scraper.py`:
  - automatic retries on rate limiting (exponential backoff + random jitter)
  - rate-limit detection (HTTP 412/429, error codes/messages)
  - automatic switch to the Playwright fallback after the HTTP path fails
  - final error copy unified into a more understandable message
  - `mid` extraction now accepts both the root path and subpaths (e.g. `/upload/video`)
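The retry policy above can be sketched as a small helper; the function name, delays, and attempt count are illustrative rather than the project's actual values:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimited(Exception):
    """Stand-in for a detected HTTP 412/429 or rate-limit error payload."""

def retry_with_backoff(fetch: Callable[[], T], attempts: int = 4, base_delay: float = 1.0) -> T:
    """Retry only on rate limiting: sleep base_delay * 2**i plus random jitter."""
    for i in range(attempts):
        try:
            return fetch()
        except RateLimited:
            if i == attempts - 1:
                raise  # out of attempts: surface the rate limit to the caller
            time.sleep(base_delay * (2 ** i) + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")
```

Only the rate-limit exception triggers a retry; any other failure propagates immediately, which keeps genuine errors visible.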

### 5.3 Verification

- Bilibili sample: `https://space.bilibili.com/8047632` scrapes 50 titles ✅
- Douyin short-link retest: `https://v.douyin.com/hmFXdx5PvzQ/` still scrapes 50 titles ✅

---
## ✅ 6) Second Round of Douyin + Bilibili Scraping Reliability Hardening

### 6.1 Douyin

- `backend/app/services/creator_scraper.py`
  - `scrape_creator_titles(..., user_id)` passes the user ID through, so the user's logged-in platform cookies can be read as an enhancement context.
  - Douyin scraping gained optional user-cookie injection (HTTP requests + the Playwright context).
  - Playwright fallback rounds raised from 4 to 8, now aiming to fill up to `MAX_TITLES=50`.
  - The network-response capture main path is kept (`aweme/post` + `profile/other`), preferring `desc` for titles.

### 6.2 Bilibili

- New WBI signing path (main path):
  - fetch the `wbi_img` key (handles `nav` returning `-101` while still carrying `wbi_img`)
  - compute `w_rid`/`wts`, then call `x/space/wbi/arc/search`
  - multi-page fetching (accumulating pages) + title dedup, trying to fill 50 titles
- New Bilibili session warm-up:
  - `x/frontend/finger/spi` fetches and injects `buvid3`/`buvid4`
  - the user's logged-in Bilibili cookies are read, if present, to raise the hit rate
- Playwright fallback hardening:
  - listen for `x/space/*/arc/search` responses and parse valid payloads
  - replay captured arc URLs once via `context.request`

### 6.3 Route wiring

- `backend/app/modules/tools/router.py`
  - `/api/tools/analyze-creator` passes `current_user.id` into the scraper for platform-cookie enhancement.

### 6.4 Outcome

- Douyin: short-link stability further improved; anti-bot pages now go straight to the Playwright fallback.
- Bilibili: the signing path and fallback path are in place, but during heavy platform anti-bot windows it can still return "too many requests / anti-bot check failed", which is a platform-side limit.

---
## ✅ 7) Final Scraping Strategy: Douyin/Bilibili Switched to Playwright-Direct

Per product decision, creator-title scraping for deep script learning now uses a **Playwright-direct main path** for both platforms, replacing "HTTP main path + Playwright fallback".

### 7.1 Changes

- `backend/app/services/creator_scraper.py`
  - `_scrape_douyin()` now calls `_scrape_douyin_with_playwright()` directly.
  - `_scrape_bilibili()` now calls `_scrape_bilibili_with_playwright()` directly.
  - Both platforms keep 2 Playwright scrape retries.
  - User-isolated cookies are read first; if missing, the legacy global cookies are tried.
- `backend/app/modules/tools/router.py`
  - `analyze-creator` keeps passing `current_user.id` to match the user's cookie context.

### 7.2 Impact

- Scope is limited to the deep-script-learning scrape path.
- **Not affected**: automated video publishing and the existing script-extraction (extract-script) flow.

### 7.3 Verification

- Douyin short-link sample: `https://v.douyin.com/hmFXdx5PvzQ/` scrapes successfully, 50 titles.
- Bilibili samples:
  - `https://space.bilibili.com/256237759?spm_id_from=...` scrapes successfully, 40 titles.
  - `https://space.bilibili.com/1140672573` scrapes successfully, 40 titles.

---
## ✅ 8) Unified GLM Call Chain + Timeout UX Improvements

### 8.1 Symptom

- Deep script learning's "generate script" step intermittently failed on the frontend with `timeout of 30000ms exceeded`.

### 8.2 Cause

- Mainly a too-short frontend request timeout (30s), easily exceeded while the model queues or generates long text.
- The backend already routed everything through `glm_service`, but each method still duplicated the SDK call code, raising maintenance cost.

### 8.3 Changes

- Frontend: the `generate-topic-script` timeout was raised from 30s to 90s, with clearer timeout copy.
- Backend: `backend/app/services/glm_service.py`
  - Added `_call_glm(...)` as the single call entry point (unified model / thinking / to_thread / timeout)
  - `generate_title_tags / rewrite_script / analyze_topics / generate_script_from_topic / translate_text` all reuse that entry point
  - `settings.GLM_MODEL` stays a single point of configuration, avoiding scattered call sites
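The shape of such a single entry point can be sketched as follows; the signature, parameter names, model string, and injected client callable are illustrative, not the real `_call_glm`:

```python
import asyncio
from typing import Callable, List

async def call_glm(
    create_completion: Callable[..., str],  # stand-in for the blocking SDK call
    messages: List[dict],
    model: str = "glm-4",      # assumed; the real code reads settings.GLM_MODEL
    timeout_s: float = 90.0,
) -> str:
    """Single funnel for GLM calls: run the blocking SDK call off the event
    loop with one shared timeout, so every caller gets identical behavior."""
    return await asyncio.wait_for(
        asyncio.to_thread(create_completion, model=model, messages=messages),
        timeout=timeout_s,
    )
```

With one funnel like this, changing the model, the timeout, or the threading strategy touches a single place instead of every calling method.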

### 8.4 Result

- GLM calls now follow one standard; future parameter changes are made in one place.
- Frontend timeout errors dropped markedly; a genuine timeout now shows an understandable message.

---
## ✅ 9) Unified Action Buttons Across the Three Script Dialogs

### 9.1 Goal

- Unify the placement, style, and primary/secondary hierarchy of result-page action buttons across the Script Extraction Assistant, AI Rewrite, and Deep Script Learning dialogs.

### 9.2 Changes

- `frontend/src/features/home/ui/ScriptExtractionModal.tsx`
  - Result-page buttons moved from "scattered beside the title plus a lone bottom button" to a unified bottom action grid.
  - Buttons unified as: `填入文案`, `复制`, `提取下一个`, `关闭`.
- `frontend/src/features/home/ui/RewriteModal.tsx`
  - Result-page buttons moved to the unified bottom action grid.
  - Added a copy button (with a clipboard fallback).
  - Buttons unified as: `填入文案`, `复制`, `重新生成`, `保留原文`.
- `frontend/src/features/home/ui/ScriptLearningModal.tsx`
  - Keeps the same action-grid style: `填入文案`, `复制`, `重新生成`, `换个话题`.

### 9.3 Verification

- `cd frontend && npm run build` ✅

Docs/DevLogs/Day34.md (new file, 244 lines)

@@ -0,0 +1,244 @@
## Multi-Camera Timeline System Rework (Day 34)

### Overview

Reworked the timeline system from an "equal-split sequential segments" model to a "primary material + inserted shots" multi-camera model. The primary material loops continuously to fill the whole timeline, and users can overlay inserted shots at any position for a multi-camera switching effect. Single-material behavior is completely unchanged. Also fixed accidental closes of the Deep Script Learning dialog.

---

## ✅ 1) Core Architecture Change

### 1.1 Old model vs. new model

| | Old model | New model |
|---|---|---|
| Timeline structure | N equal segments, one material each | Primary material plays continuously + floating insert blocks |
| Primary material | No such concept | `selectedMaterials[0]`, looping to fill the full audio duration |
| Other materials | Duration split evenly | Insert candidates, freely added anywhere on the timeline |
| Segment boundaries | Fixed equal splits | User drags to reposition; click opens a dialog to edit duration |
| Max materials | 4 (equal split) | 4 (1 primary + up to 3 insert candidates), each candidate insertable multiple times |

### 1.2 The `buildAssignments()` core algorithm

In multi-material mode, `toCustomAssignments()` builds the `custom_assignments` array:

1. Sort insert blocks by `start`
2. Gaps between insert blocks are filled by the primary material
3. The primary material tracks its cumulative playback position via `primaryAccum`, looping seamlessly
4. Each gap is **boundary-split** by the primary material's effective clip length, so no sub-segment crosses a loop boundary
5. The backend's `prepare_segment` then only does simple trims, avoiding the frame-duplicating "trim first, then loop" path
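The steps above can be sketched in Python (the real implementation is TypeScript in `useTimelineEditor.ts`); field names and the segment dict shape are illustrative, and a positive primary clip length is assumed:

```python
from typing import Dict, List

def build_assignments(
    audio_duration: float,
    primary_len: float,   # effective primary clip length (one loop), must be > 0
    primary_path: str,
    inserts: List[Dict],  # each: {"path", "start", "end", "source_start"}
) -> List[Dict]:
    """Fill gaps between sorted inserts with the looping primary material,
    splitting each gap so no segment crosses a primary loop boundary."""
    inserts = sorted(inserts, key=lambda s: s["start"])
    out: List[Dict] = []
    cursor = 0.0         # current timeline position
    primary_accum = 0.0  # cumulative primary playback across the whole timeline

    def fill_gap(until: float) -> None:
        nonlocal cursor, primary_accum
        while cursor < until - 1e-9:
            offset = primary_accum % primary_len                # position inside the current loop
            piece = min(until - cursor, primary_len - offset)   # stop at the loop boundary
            out.append({
                "material_path": primary_path,
                "start": cursor, "end": cursor + piece,
                "source_start": offset, "source_end": offset + piece,
            })
            cursor += piece
            primary_accum += piece

    for ins in inserts:
        fill_gap(ins["start"])
        out.append({
            "material_path": ins["path"],
            "start": ins["start"], "end": ins["end"],
            "source_start": ins["source_start"],
            "source_end": ins["source_start"] + (ins["end"] - ins["start"]),
        })
        cursor = ins["end"]
    fill_gap(audio_duration)
    return out
```

Every emitted primary segment stays inside one loop, so the backend's `prepare_segment` can trim without ever re-looping.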

---

## ✅ 2) Frontend Changes

### 2.1 New files

**`frontend/src/shared/types/timeline.ts`**

```typescript
export interface InsertSegment {
  id: string;
  materialId: string;
  materialName: string;
  start: number;
  end: number;
  sourceStart: number;
  sourceEnd: number;
  color: string;
}
```

A cross-module shared type used by `useTimelineEditor`, `TimelineEditor`, and `useHomeController`.

### 2.2 `useTimelineEditor.ts` — full rewrite

The core hook was rewritten from the equal-split model to the primary+insert model:

- **New API**: `addInsert` (returns `AddInsertResult: "ok" | "limit" | "no_space"`), `removeInsert`, `moveInsert`, `resizeInsert`, `setInsertSourceRange`, `setPrimarySourceRange`, `toCustomAssignments`
- **`MultiCamCache`** interface: standalone localStorage persistence (`vigent_${storageKey}_multicam`) saving inserts + primarySourceStart/End
- Auto-cleanup: when the selected-materials list changes, insert blocks referencing removed materials are dropped
- The primary material's source range resets automatically when switching between single and multi mode

### 2.3 `TimelineEditor.tsx` — full rewrite

The visual component now matches the new model:

- Primary-material background bar: purple base + loop-stripe pattern (shown when loopCount > 1)
- Floating insert blocks: colored translucent rectangles; drag the center to move, click to open ClipTrimmer to edit the source range and duration
- Insert-candidate bar: `selectedMaterials[1:]` shown as `+` buttons that add to the timeline
- Mobile adaptation: 40px minimum height, 12px drag edges, always-visible delete button
- Removed the unused `TimelineSegment` import

### 2.4 `useHomeController.ts` — adapted to the new API

- Replaced the old timeline destructuring with the new API (`inserts`, `addInsert`, `removeInsert`, etc.)
- Rewrote the multi-material branch of `handleGenerate()`: calls `toCustomAssignments()` to build assignments, and splits the payload into `material_path` (primary) and `material_paths` (all deduplicated paths)
- The single-material branch also calls `toCustomAssignments()` to handle trim ranges
- Renaming a material syncs `materialName` inside inserts
- New `handleSetPrimary` callback: promotes a given material to `selectedMaterials[0]`
- New `insertCandidates` computed value: the Material objects for `selectedMaterials[1:]`

### 2.5 `MaterialSelector.tsx` — enhancements

- Added a `Crown` icon and an `onSetPrimary` callback prop
- In multi-material mode, role badges are shown: `selectedMaterials[0]` gets a purple "primary" badge, the rest a gray "insertable" badge
- Non-primary rows show the Crown button; clicking it sets that material as primary

### 2.6 `HomePage.tsx` — adaptation

- Rewrote `clipTrimmerSegment`: routes both the `"primary"` ID (primary-material trim) and insert-block IDs
- `TimelineEditor` receives the full set of new props
- `ClipTrimmer`'s `onConfirm` routes to `setPrimarySourceRange` or `setInsertSourceRange` based on the segment ID
- `MaterialSelector` receives `onSetPrimary`

---
## ✅ 3) Backend Changes

### 3.1 `workflow.py` — multi-camera support fixes

Four key fixes:

**(a) material_paths source**

```python
# Old: inferred from custom_assignments (doesn't work for multi-camera)
# New: trust the frontend-provided req.material_paths first
if req.material_paths and len(req.material_paths) >= 1:
    material_paths = req.material_paths
else:
    material_paths = [req.material_path]
```

**(b) custom_assignments validation**

```python
# Old: len(custom_assignments) == len(material_paths)
# New: >= 1, plus a hard cap of 50 and a path-subset check
if len(req.custom_assignments) > 50:
    raise ValueError(...)
unknown = [a.material_path for a in req.custom_assignments if a.material_path not in known_paths]
if unknown:
    raise ValueError(...)
```

**(c) Download dedup + concurrency control**

```python
# Old: each assignment downloaded independently (the same material downloaded repeatedly)
# New: download once per unique path, with a path_to_local mapping
_segment_sem = asyncio.Semaphore(4)  # created per call, not module-level
unique_paths = list(dict.fromkeys(a["material_path"] for a in assignments))
path_to_local: dict = {}
```

The semaphore is created inside each `generate_video()` call: 2 concurrent tasks × 4 = a peak of 8 ffmpeg processes.

**(d) First/last segment capping guard**

```python
# Align first/last segments only outside custom_assignments mode
if not req.custom_assignments and assignments and audio_duration > 0:
    assignments[0]["start"] = 0.0
    assignments[-1]["end"] = audio_duration
```

---
## ✅ 4) Deep Script Learning Dialog: Accidental-Close Protection

### 4.1 Problem

- The Deep Script Learning dialog closed on overlay click and `ESC` by default, so users viewing a generated result could close it accidentally and lose the generated content on reopen.

### 4.2 Fix

- `frontend/src/shared/ui/AppModal.tsx`
  - Added a `closeOnEsc?: boolean` option, defaulting to `true`, so existing dialogs behave unchanged.
- `frontend/src/features/home/ui/ScriptLearningModal.tsx`
  - Sets `closeOnOverlay={false}` and `closeOnEsc={false}`, disabling overlay/ESC close.
  - The bottom button on the input page changed from "Cancel" to "Clear", which only clears the link input without closing the dialog.
  - Close paths narrowed to: the top-right `X`, or "fill in script" on the result page.

---
## ✅ 5) Code Review 修复

### 5.1 UX:统一时长编辑入口

- **问题**:时间轴插入块同时支持右边缘拖拽调时长和点击弹窗编辑,拖拽操作每次都误触弹窗
- **修复**:
  - 移除 `TimelineEditor` 右侧 resize handle
  - 引入 `dragMovedRef` + 5px 像素阈值区分拖拽与点击
  - `ClipTrimmer` onConfirm 新增 `resizeInsert()` 同步,确认截取后自动更新时间轴块时长
  - 帮助文字更新:"点击插入块设置截取/时长"
### 5.2 Lint 修复

- `useTimelineEditor.ts`:3 处 `react-hooks/set-state-in-effect`,用 `eslint-disable-next-line` 标注(初始化和清理场景)
- `useTimelineEditor.ts`:render-time ref 访问改为 `useState` 模式(`prevPrimaryId`)
- `HomePage.tsx`:移除未使用的 `reorderMaterials` 解构
- `TimelineEditor.tsx`:移除未使用的 `useMemo` import 和 `materials`/`onResizeInsert` props
### 5.3 P1:多片段 assignment 退化

- **问题**:`selectedMaterials.length > 1` 但时间轴无插入块时,`is_multi=False`,后端走单素材路径丢弃非主素材
- **修复**(`workflow.py`):

```python
is_multi = len(material_paths) > 1 or (
    req.custom_assignments is not None and len(req.custom_assignments) > 1
)
```
### 5.4 P1:主素材 trim range 泄漏

- **问题**:切换主素材(“设为主素材”)时,旧主素材的 `primarySourceStart/End` 保留给新主素材,导致截取范围错误
- **原因**:仅按 `selectedMaterials.length` 变化重置,切换主素材时长度不变
- **修复**(`useTimelineEditor.ts`):改用 identity 追踪

```typescript
const [prevPrimaryId, setPrevPrimaryId] = useState(selectedMaterials[0]);
if (selectedMaterials[0] !== prevPrimaryId) {
  setPrevPrimaryId(selectedMaterials[0]);
  setPrimarySourceStart(0);
  setPrimarySourceEnd(0);
}
```
---

## 📁 今日修改文件

| 文件 | 改动 |
|------|------|
| `frontend/src/shared/types/timeline.ts` | **新增**:`InsertSegment` 接口定义 |
| `frontend/src/features/home/model/useTimelineEditor.ts` | **重写**:等分模型 → 主素材+插入模型 |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | **重写**:可视化组件适配新模型 |
| `frontend/src/features/home/model/useHomeController.ts` | 适配新 timeline API、生成 payload 重写 |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | 主素材/可插入标签、设为主素材按钮 |
| `frontend/src/features/home/ui/HomePage.tsx` | ClipTrimmer 路由、TimelineEditor 新 props |
| `backend/app/modules/videos/workflow.py` | material_paths 来源、校验、下载去重、capping 保护 |
| `frontend/src/shared/ui/AppModal.tsx` | 新增 `closeOnEsc` 配置,支持按弹窗粒度控制 ESC 关闭行为 |
| `frontend/src/features/home/ui/ScriptLearningModal.tsx` | 禁用遮罩/ESC 关闭;输入页“取消”改为“清空” |
---

## 🔍 验证记录

- TypeScript 编译检查:`npx tsc --noEmit` ✅ 无错误
- Python 语法检查:`python -c "import ast; ast.parse(open(...).read())"` ✅
- 前端 lint(本次补充修复):`npm run lint -- src/shared/ui/AppModal.tsx src/features/home/ui/ScriptLearningModal.tsx` ✅
- 代码审查(前端 + 后端各一轮 subagent review):
  - 前端:逻辑正确,无 bug,仅 1 处未使用 import(已清理)
  - 后端:校验逻辑、下载去重、并发控制均正确
- 单素材模式向后兼容:`toCustomAssignments()` 在单素材时正确生成带裁剪范围的单段 assignment
---

## ⚠️ 已知限制

- `prepare_segment` 的“先裁后循环”路径(`needs_loop && source_start > 0`)仍存在,但前端的边界分割算法确保永远不会触发该路径
- 插入块最多 10 个(`useTimelineEditor` 内 `MAX_INSERTS=10`),超出时返回 `"limit"`
- 插入块最小时长 0.5s,低于此值的操作会被忽略

165
Docs/DevLogs/Day35.md
Normal file
@@ -0,0 +1,165 @@
## 小脸口型质量补偿落地 + 部署验证 (Day 35)

### 概述

完成「小脸口型质量补偿(Small-Face LipSync Compensation)」后端落地与部署收口。核心目标是在不改变用户模型选择语义(`default/fast/advanced`)的前提下,对远景小脸素材增加质量补偿链路(检测 -> 裁切 -> 稀疏超分 -> 模型推理 -> 贴回),并保持默认关闭、失败回退(fail-open)、线上可快速回滚。

---
## ✅ 1) 后端能力落地

### 1.1 配置与开关

新增 5 个配置项(默认保守):

- `LIPSYNC_SMALL_FACE_ENHANCE`(默认 `false`)
- `LIPSYNC_SMALL_FACE_THRESHOLD`(默认 `256`)
- `LIPSYNC_SMALL_FACE_UPSCALER`(`gfpgan | codeformer`)
- `LIPSYNC_SMALL_FACE_GPU_ID`(默认 `0`)
- `LIPSYNC_SMALL_FACE_FAIL_OPEN`(默认 `true`)

对应代码入口:`backend/app/core/config.py`、`backend/.env`。
### 1.2 新增小脸增强服务

新增 `backend/app/services/small_face_enhance_service.py`,实现完整补偿链路:

1. **小脸判定**(CPU)
   - SCRFD(`det_10g.onnx`,复用 LatentSync 权重)
   - 从视频 10%-30% 区间均匀采样 24 帧
   - 用最大脸宽中位数与阈值比较触发
2. **裁切与轨迹**(CPU)
   - 每 8 帧检测一次,其余帧前向填充 + EMA 平滑
   - bbox 外扩 `padding=0.28`
3. **稀疏超分**(GPU0)
   - 检测帧走 GFPGAN/CodeFormer
   - 非检测帧走 bicubic resize
   - 目标尺寸 `512x512`
4. **贴回融合**(CPU)
   - 口型局部 mask(起点 68% + 侧边留白 16%)+ 高斯羽化(15px)
   - `cv2.seamlessClone`,失败回退 alpha blend
5. **帧数保护**
   - 贴回前校验 `lipsync_frames <= original_frames`
   - 仅当 `lipsync_frames > original_frames` 时报错(异常),其余按 lipsync 帧数正常贴回
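其中第 1 步的触发逻辑(采样帧取最大脸宽、对中位数与阈值比较)可以用一段最小示意表达(`should_enhance` 为演示用的假设函数名,非仓库真实接口):

```python
from statistics import median


def should_enhance(face_widths: list, threshold: int = 256) -> bool:
    """小脸判定示意:face_widths 为各采样帧检测到的最大脸宽(px)。

    取有效脸宽的中位数与阈值比较;全程无脸时不触发增强。
    """
    valid = [w for w in face_widths if w > 0]
    if not valid:
        return False
    return median(valid) < threshold
```

用中位数而不是均值,可以抵抗个别帧检测抖动或误检带来的极端值。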
---

## ✅ 2) LipSyncService 集成

`backend/app/services/lipsync_service.py` 关键改造:

- 在 `_local_generate()` 内按顺序执行:
  - `video looping` -> `small face enhance` -> `model infer` -> `blend back`
- 抽取 `_run_selected_model()` 统一模型路由(MuseTalk / LatentSync server / LatentSync subprocess)
- 小脸增强分支全链路 `try/except`,受 `LIPSYNC_SMALL_FACE_FAIL_OPEN` 控制
- `check_health()` 新增 `small_face_enhance` 状态字段

语义保持:

- 前端与 API 协议不变
- 用户选择模型优先,不因小脸强制换模型
- 仅本地路径(`_local_generate`)接入;远程路径暂不接入
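fail-open 包裹的形态大致如下(最小示意,`enhance` 为假设的增强函数签名,仅演示回退语义):

```python
import logging

logger = logging.getLogger("lipsync")


def run_with_fail_open(enhance, video_path: str, fail_open: bool = True) -> str:
    """增强链任一步骤抛异常时:fail_open=True 回退原视频,否则上抛。"""
    try:
        return enhance(video_path)
    except Exception as exc:
        if fail_open:
            # 回退:后续模型推理直接使用未增强的原视频
            logger.warning("小脸增强失败,回退原流程: %s", exc)
            return video_path
        raise
```

开关关闭或增强失败时,主流程拿到的仍是原视频路径,行为与旧版一致。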
---

## ✅ 3) 依赖与权重

### 3.1 依赖

`backend/requirements.txt` 新增:

- `opencv-python-headless>=4.8.0`
- `gfpgan>=1.3.8`

### 3.2 权重

- `models/FaceEnhance/GFPGANv1.4.pth`(新增目录与权重)
- `models/LatentSync/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx`(复用)

---
## ✅ 4) 稳定性修复(部署后补丁)

为解决实际部署中的依赖兼容、帧数估算偏差、贴回误判与输出质量问题,补充九处修复:

1. **懒加载 + 守卫**
   - `cv2/numpy` 改为 `try/except` 导入
   - 用 `_CV2_AVAILABLE` 守卫增强入口
   - 缺依赖时跳过增强,不影响主流程
2. **类型注解与 torchvision 兼容补丁**
   - 增加 `from __future__ import annotations`,避免 `np.ndarray` 在缺依赖场景下导入期报错
   - 在 `_ensure_upscaler()` 中注入 `sys.modules['torchvision.transforms.functional_tensor']`,兼容 `torchvision>=0.20` 与 `gfpgan/basicsr` 旧引用
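`sys.modules` 注入式 shim 的机制可以这样演示(这里用占位模块说明注入本身;真实补丁把别名指向 `torchvision.transforms.functional`,以下函数名为演示用假设):

```python
import sys
import types


def ensure_functional_tensor_shim() -> None:
    """为 basicsr/gfpgan 的旧 import 路径注入别名模块(幂等)。

    torchvision>=0.20 移除了 functional_tensor;注入后旧的
    `import torchvision.transforms.functional_tensor` 不再报 ModuleNotFoundError。
    """
    name = "torchvision.transforms.functional_tensor"
    if name in sys.modules:
        return
    # 演示用占位模块;真实实现指向 torchvision.transforms.functional
    sys.modules[name] = types.ModuleType(name)
```

注入发生在 `_ensure_upscaler()` 内、GFPGAN 初始化之前,因此对其余代码完全透明。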
3. **ffprobe 帧率与帧数估算修复**
   - `_get_video_info()` 从 `csv` 切到 `json` 字段访问,避免 `nb_frames` 缺失导致字段错位
   - fps 取值改为优先 `avg_frame_rate`,`r_frame_rate` 仅作为 fallback
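fps 选取逻辑可以用下面的最小示意概括(`pick_fps` 为演示用的假设函数名,解析对象是 `ffprobe -print_format json` 的输出):

```python
import json
from fractions import Fraction


def pick_fps(probe_json: str, fallback: float = 25.0) -> float:
    """优先 avg_frame_rate,r_frame_rate 仅作 fallback;分母为 0 或 streams 为空时回退 25。"""
    streams = json.loads(probe_json).get("streams") or []
    if not streams:
        return fallback
    stream = streams[0]
    for key in ("avg_frame_rate", "r_frame_rate"):
        try:
            rate = Fraction(stream.get(key, ""))  # 形如 "30000/1001"
            if rate > 0:
                return float(rate)
        except (ValueError, ZeroDivisionError):
            continue
    return fallback
```

按字段名访问 JSON 后,`nb_frames` 缺失只是少一个键,不会再把相邻字段错位成帧数。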
4. **轨迹帧数与贴回检查修复**
   - `_build_face_track()` 记录 ffmpeg 实际读帧数,覆盖估算 `nb_frames`
   - `blend_back()` 放宽检查为 `lipsync <= original` 正常贴回,仅 `>` 报错
5. **空输出防护**
   - `blend_back()` 增加 `ls_frames <= 0` 异常分支
   - 由外层 `FAIL_OPEN` 捕获并回退常规路径,避免写出空视频
6. **时基对齐修复(慢动作/重影)**
   - `_crop_and_upscale_video()` 输出 fps 改为跟随源视频 fps,避免增强视频时间轴拉伸
   - `blend_back()` 按 `orig_fps/ls_fps` 映射原始帧索引,避免只贴回前段帧导致动作变慢/重影
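按时间轴映射帧索引的计算只有一行核心逻辑,最小示意如下(`map_original_index` 为演示用的假设函数名):

```python
def map_original_index(ls_index: int, orig_fps: float, ls_fps: float,
                       orig_frames: int) -> int:
    """把 lipsync 第 ls_index 帧映射到时间上对应的原视频帧号。

    直接用同帧号贴回时,fps 不一致会只消费原视频前段帧(动作变慢/重影);
    按 orig_fps/ls_fps 缩放后再 clamp 到原视频末帧即可对齐时间轴。
    """
    idx = round(ls_index * orig_fps / ls_fps)
    return min(idx, orig_frames - 1)
```

例如原视频 30fps、lipsync 输出 25fps 时,lipsync 的第 25 帧(1 秒处)应贴回原视频第 30 帧,而不是第 25 帧。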
7. **无声视频修复**
   - 小脸贴回成功后新增音轨封装(mux)步骤
   - 强制将当前任务 `audio_path` 封装回贴回视频,防止增强路径无声音
8. **眼部重影修复**
   - 口型 mask 起点进一步下移到 68%,并增加左右 16% 留白,减少眼周/鼻翼参与融合
   - `seamlessClone` 后对结果做 mask 限域二次融合,抑制 Poisson 扩散到眼部上方
9. **畸形规避(运行侧)**
   - `LIPSYNC_SMALL_FACE_THRESHOLD=9999` 仅用于链路冒烟,不用于质量评估
   - 质量验证前统一恢复 `LIPSYNC_SMALL_FACE_THRESHOLD=256`

---
## ✅ 5) 部署文档与验证

新增并回写部署文档:`Docs/FACEENHANCE_DEPLOY.md`。

文档修正点:

- 健康检查地址修正为:`/api/videos/lipsync/health`
- 响应示例补齐 `success/data` 外层包装

实际验证要点:

- `GET /api/videos/lipsync/health` 返回 `data.small_face_enhance`
- 默认 `enabled=false`,开关关闭时行为与旧版一致
- `detector_loaded=false`(懒加载)符合预期

---
## 📁 今日修改文件

| 文件 | 改动 |
|------|------|
| `backend/app/core/config.py` | 新增 `LIPSYNC_SMALL_FACE_*` 配置项(5 个) |
| `backend/.env` | 增加小脸增强开关与参数 |
| `backend/app/services/small_face_enhance_service.py` | 新增:检测/裁切/超分/贴回主服务;后续补丁含懒加载与兼容修复 |
| `backend/app/services/lipsync_service.py` | 集成增强链路、抽取 `_run_selected_model`、health 增强状态 |
| `backend/requirements.txt` | 新增 `opencv-python-headless`、`gfpgan` |
| `models/FaceEnhance/GFPGANv1.4.pth` | 新增超分权重 |
| `Docs/FACEENHANCE_DEPLOY.md` | 新增部署文档并修正健康检查路径/返回示例 |

---
## ⚠️ 已知限制

- 仅本地唇形路径接入(`_local_generate()`);远程模式未接入小脸补偿
- 多镜头场景当前仍为全局判定,暂不做逐段小脸判定
- v1 优先单人自拍稳定性,多人脸切换策略后续再补

428
Docs/FACEENHANCE_DEPLOY.md
Normal file
@@ -0,0 +1,428 @@
# 小脸口型质量补偿链路部署指南

> **更新时间**:2026-03-10 v1.4
> **适用版本**:SmallFaceEnhance v1.4(内嵌于 Backend 进程)
> **架构**:LipSyncService 内部模块,无独立进程

---
## 架构概览

小脸口型质量补偿链路(简称“小脸增强”)作为 `LipSyncService._local_generate()` 的**前处理分支**,在 lipsync 推理前自动检测小脸并增强输入质量:

```
原视频 + 音频
  → video looping (已有逻辑)
  → 小脸检测 (SCRFD, CPU)
    → [非小脸] 直接用用户所选模型推理 (现有路径)
    → [小脸]
        A. 裁切主脸区域 (带 padding)
        B. 稀疏关键帧超分到 512px (GFPGAN, GPU0)
        C. 用用户所选模型推理 (MuseTalk 或 LatentSync)
        D. 下半脸 mask 羽化 + seamlessClone 贴回原帧
  → 进入现有后续流程 (字幕/BGM/上传)
```

**关键约束**:

- 不改前端、不改 API 协议
- 模型选择权归用户,不因小脸自动换模型
- 默认 fail-open:增强链任何一步失败,自动回退原流程
- 无独立进程/PM2,跟随 `vigent2-backend` 运行

---
## 硬件要求

| 配置 | 说明 |
|------|------|
| 检测器 | SCRFD (det_10g.onnx),CPU 推理,无额外 GPU 开销 |
| 超分 | GFPGAN,GPU0(与 MuseTalk 同卡,顺序执行),约 2-3GB 显存 |
| 内存 | 流式 ffmpeg pipe 逐帧处理,不额外占用大量内存 |

> 超分与 MuseTalk 共享 GPU0,顺序执行不会同时占用显存。

---
## 依赖安装

### 1. pip 依赖

已在 `backend/requirements.txt` 中添加:

```
opencv-python-headless>=4.8.0
gfpgan>=1.3.8
```

安装:

```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
pip install opencv-python-headless gfpgan
```

> `gfpgan` 会自动拉取 `basicsr`、`facexlib` 等依赖。
> `onnxruntime` 需单独确认已安装(LatentSync 环境中已有 1.23.2)。
> 如果 backend 虚拟环境中缺少 onnxruntime,需额外安装:`pip install onnxruntime`。

### 2. 系统依赖

- `ffmpeg` / `ffprobe`:已有(视频处理必需)

---
## 模型权重

### 目录结构

```
models/
├── FaceEnhance/
│   └── GFPGANv1.4.pth              ← 超分权重 (~333MB)
└── LatentSync/checkpoints/auxiliary/
    └── models/buffalo_l/
        └── det_10g.onnx            ← 人脸检测权重 (~16MB, 复用已有)
```

### 下载方式

**GFPGAN 权重**(已下载):

```bash
cd /home/rongye/ProgramFiles/ViGent2/models/FaceEnhance
wget -O GFPGANv1.4.pth "https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth"
```

**SCRFD 检测器权重**:

复用 LatentSync 已有的 `det_10g.onnx`,无需额外下载。代码自动引用路径:
`models/LatentSync/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx`

> 权重缺失时自动 fail-open 跳过增强,不会导致任务失败。

---
## 后端配置

`backend/.env` 中的相关变量:

```ini
# =============== 小脸口型质量补偿链路 ===============
LIPSYNC_SMALL_FACE_ENHANCE=false      # 总开关 (true/false)
LIPSYNC_SMALL_FACE_THRESHOLD=256      # 触发阈值 (像素,脸宽 < 此值触发)
LIPSYNC_SMALL_FACE_UPSCALER=gfpgan    # 超分模型: gfpgan | codeformer
LIPSYNC_SMALL_FACE_GPU_ID=0           # 超分 GPU (与 MuseTalk 同卡)
LIPSYNC_SMALL_FACE_FAIL_OPEN=true     # 失败回退 (true=回退原流程, false=报错)
```

`backend/app/core/config.py` 中的默认值:

```python
LIPSYNC_SMALL_FACE_ENHANCE: bool = False
LIPSYNC_SMALL_FACE_THRESHOLD: int = 256
LIPSYNC_SMALL_FACE_UPSCALER: str = "codeformer"
LIPSYNC_SMALL_FACE_GPU_ID: int = 0
LIPSYNC_SMALL_FACE_FAIL_OPEN: bool = True
```

> `.env` 优先于 `config.py` 默认值。`config.py` 仅在 `.env` 未设置时生效。
### 模块内部常量

以下参数固定为代码常量(`small_face_enhance_service.py`),暂不走 env:

| 常量 | 值 | 说明 |
|------|-----|------|
| `PADDING` | 0.28 | bbox 外扩比例 |
| `DETECT_EVERY` | 8 | 每 N 帧检测,中间帧 EMA 插值 |
| `TARGET_SIZE` | 512 | 超分目标尺寸 |
| `MASK_FEATHER` | 15 | 下半脸 mask 羽化像素 |
| `MASK_UPPER_RATIO` | 0.68 | 口型 mask 起始位置(crop 高度的 68%,仅覆盖嘴部/下巴) |
| `MASK_SIDE_MARGIN` | 0.16 | 左右留白比例,避免改动面颊/鼻翼 |
| `SAMPLE_FRAMES` | 24 | 小脸判定采样帧数 |
| `SAMPLE_WINDOW` | (0.10, 0.30) | 采样窗口(视频 10%~30%) |
| `ENCODE_FPS` | 25 | 中间视频编码帧率 fallback(优先跟随源视频 fps,源 fps 不可用时回退 25) |
| `ENCODE_CRF` | 18 | 中间视频编码质量 |
| `EMA_ALPHA` | 0.3 | bbox EMA 平滑系数 |

---
## 启用与验证

### 1. 开启小脸口型质量补偿链路

```bash
# 编辑 backend/.env
LIPSYNC_SMALL_FACE_ENHANCE=true
```

重启后端:

```bash
pm2 restart vigent2-backend
```

### 2. 强制触发测试

设置极大阈值,使任何视频都触发增强:

```ini
LIPSYNC_SMALL_FACE_THRESHOLD=9999
```

> 仅用于链路冒烟测试,不用于质量评估。`9999` 会强制大脸素材进入增强分支,可能出现中脸变形/鼻翼细节异常。

提交一个视频任务,检查日志:

```bash
pm2 logs vigent2-backend --lines 50
```

应看到类似输出:

```
小脸增强: face_w=320px < threshold=9999px, 触发增强
✅ SCRFD 检测器已加载
✅ 超分器已加载: gfpgan
小脸增强: face_w=320px threshold=9999px enhanced=True upscaler=gfpgan time=12.3s
✅ 小脸增强 + 唇形同步完成: /path/to/output.mp4
```
### 3. 调回正常阈值

验证通过后,改回合理阈值:

```ini
LIPSYNC_SMALL_FACE_THRESHOLD=256
```

并重启 backend:`pm2 restart vigent2-backend`。
### 4. 健康检查

```bash
curl http://localhost:8006/api/videos/lipsync/health | python3 -m json.tool
```

应包含 `data.small_face_enhance`:

```json
{
  "success": true,
  "data": {
    "small_face_enhance": {
      "enabled": true,
      "threshold": 256,
      "detector_loaded": true
    }
  }
}
```

---
## 相关文件

| 文件 | 说明 |
|------|------|
| `backend/app/services/small_face_enhance_service.py` | 小脸增强主服务(检测 + 裁切 + 超分 + 贴回) |
| `backend/app/services/lipsync_service.py` | 混合路由 + 小脸增强集成 + `_run_selected_model()` |
| `backend/app/core/config.py` | `LIPSYNC_SMALL_FACE_*` 配置项 |
| `models/FaceEnhance/GFPGANv1.4.pth` | GFPGAN 超分权重 |
| `models/LatentSync/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx` | SCRFD 检测器权重(复用) |
| `Temp/小脸增强分支-实施计划.md` | 详细方案文档 |

---
## 处理流程详解

### 1. 检测阶段 (CPU)

- 从视频 10%~30% 区间均匀采 24 帧
- SCRFD (det_10g.onnx) 检测最大脸,取中位数脸宽
- `脸宽 < THRESHOLD` 时触发增强
### 2. 裁切 + 轨迹 (CPU)

- 每 8 帧检测人脸 bbox,中间帧 EMA 插值平滑
- bbox + 0.28 padding 外扩,clamp 到帧边界
- 实际读取帧数回写 `track.frame_count`,修正 ffprobe 估算偏差
- ffmpeg pipe 流式裁切,输出 512x512 视频
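“每 N 帧检测 + 前向填充 + EMA 平滑”的轨迹构建可以用一段纯逻辑示意(`smooth_track` 为演示用的假设函数名;稀疏检测帧之间的帧传 `None`):

```python
def smooth_track(raw_boxes: list, alpha: float = 0.3) -> list:
    """bbox 轨迹平滑示意:None 帧前向填充,每帧与上一平滑值做 EMA。

    raw_boxes 为 (x0, y0, x1, y1) 或 None(非检测帧)的序列;
    EMA 降低裁切窗口逐帧抖动,alpha 越小越平滑。
    """
    smoothed, prev = [], None
    for box in raw_boxes:
        if box is None:
            box = prev  # 前向填充:沿用上一帧的平滑 bbox
        if prev is not None and box is not None:
            box = tuple(alpha * b + (1 - alpha) * p for b, p in zip(box, prev))
        if box is not None:
            prev = box
        smoothed.append(prev)
    return smoothed
```

平滑后的 bbox 再做 0.28 外扩并 clamp 到帧边界,即为实际裁切窗口。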
### 3. 超分 (GPU0)

- 检测帧(每 8 帧):GFPGAN 全量超分
- 非检测帧:bicubic resize 到 512x512
- 增强视频输出 fps 跟随源视频 fps(不再固定写 25fps),避免时基拉伸
- 推理后自动 `torch.cuda.empty_cache()`
### 4. Lipsync 推理

- 用户选择的模型 (fast/default/advanced) 对增强后的人脸视频推理
- 模型选择语义不变
### 5. 贴回 (CPU)

- 口型局部 mask(从 68% 高度开始 + 左右留白 16%)+ 高斯羽化 15px(仅覆盖嘴部/下巴)
- `cv2.seamlessClone(NORMAL_CLONE)` 贴回原帧
- 对 seamlessClone 结果再按 mask 区域做二次 alpha 限域,避免融合扩散到眼部上方
- seamlessClone 失败时 fallback alpha 混合
- 贴回按时间轴映射原始帧索引(`orig_fps/ls_fps`),避免只使用前段帧导致动作变慢/重影
- 帧数保护:lipsync 按音频时长输出,帧数通常 <= 原始 looped 视频;仅 `lipsync帧数 > 原始帧数` 时报错,`<=` 时正常贴回
- 空输出保护:`lipsync帧数 <= 0` 直接抛异常,外层 `FAIL_OPEN` 回退原流程,避免写出空视频
- 音轨封装:贴回后强制复用 `audio_path` 重新 mux 音轨,避免增强路径出现无声视频
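口型 mask 的矩形区域由 `MASK_UPPER_RATIO` 与 `MASK_SIDE_MARGIN` 两个比例直接算出,最小示意如下(`mouth_mask_rect` 为演示用的假设函数名;真实实现在此矩形上再做 15px 高斯羽化):

```python
def mouth_mask_rect(crop_w: int, crop_h: int,
                    upper_ratio: float = 0.68,
                    side_margin: float = 0.16) -> tuple:
    """从 crop 高度 68% 处起、左右各留 16% 空白,返回 (x0, y0, x1, y1)。

    起点下移 + 侧边留白使 mask 只覆盖嘴部/下巴,避免眼周、鼻翼参与融合。
    """
    x0 = int(crop_w * side_margin)
    x1 = int(crop_w * (1 - side_margin))
    y0 = int(crop_h * upper_ratio)
    return (x0, y0, x1, crop_h)
```

对 512x512 的 crop,该矩形约为宽 349px、高 164px 的下半脸条带。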
---
## 回滚方案

**一级回滚(秒级)**:

```ini
LIPSYNC_SMALL_FACE_ENHANCE=false
```

重启 backend 即可,所有任务走原流程。

**二级回滚(版本级)**:

回退 `lipsync_service.py` 增强接入提交,配置项保留但不生效。

---
## 常见问题

### onnxruntime 未安装

```
⚠️ SCRFD 初始化失败: No module named 'onnxruntime'
```

**解决**:

```bash
pip install onnxruntime
```
### GFPGAN 权重缺失

```
⚠️ GFPGAN 权重不存在: .../models/FaceEnhance/GFPGANv1.4.pth
```

**解决**:参考上方“模型权重”章节下载。权重缺失时超分自动降级为 bicubic resize。
### 帧数异常导致 fail-open

```
⚠️ 小脸贴回失败,回退原流程: 帧数异常: lipsync=300 > original=250
```

**说明**:v1.1 已放宽帧数检查。lipsync 模型按音频时长输出帧数,通常 <= looped 视频帧数,此时正常贴回。仅当 lipsync 输出帧数**大于**原始帧数时才报错(异常情况)。

> v1.0 使用严格一致性检查(`lipsync != original` 即失败),在 looped 视频帧数远大于音频帧数时会误判失败。v1.1 已修复。

### lipsync 输出为空导致回退

```
⚠️ 小脸贴回失败,回退原流程: lipsync 输出帧数为 0,跳过贴回
```

**说明**:v1.2 新增空输出保护。`ls_frames <= 0` 时立即抛错,由外层 fail-open 回退到常规唇形路径,避免生成空视频文件。

### 增强后动作变慢 / 眼睛重影

**原因**:原视频与 lipsync 输出 fps 不一致时,若按同帧号直接贴回,可能出现时间轴错位(只贴回前段帧)。

**修复**:v1.3 已改为按 `orig_fps/ls_fps` 做时间轴映射,贴回阶段使用时间对应帧而非同索引帧,同时增强视频输出 fps 跟随源 fps。

**进一步修复(v1.4)**:

- mask 起点进一步下移到 68%,并增加左右 16% 留白,减少眼周/鼻翼参与融合
- 对 seamlessClone 输出增加 mask 限域,防止 Poisson 扩散造成眼部上方重影

### 增强后脸部畸形(鼻翼/中脸异常)

**高概率原因**:使用了测试阈值 `LIPSYNC_SMALL_FACE_THRESHOLD=9999`,把本不需要增强的大脸素材强制送入补偿链路。

**建议处理**:

- 先改回 `LIPSYNC_SMALL_FACE_THRESHOLD=256` 并重启 backend。
- 如仍有异常,临时关闭 `LIPSYNC_SMALL_FACE_ENHANCE=false` 做 A/B 对比,再继续调参。

### 增强后无声音

**原因**:贴回阶段 rawvideo 写出默认不带音轨。

**修复**:v1.3 已在贴回后强制执行音轨封装(mux),使用当前任务 `audio_path` 写回音频。
### 增强后口型有偏移

检查 `PADDING` 常量是否合理。过小的 padding 可能导致裁切区域不够,过大会引入太多背景。当前默认 0.28 (28%) 适用于大多数单人自拍场景。
### torchvision 兼容性 (functional_tensor)

```
No module named 'torchvision.transforms.functional_tensor'
```

**原因**:torchvision >= 0.20 移除了 `functional_tensor` 模块,但 `basicsr`(gfpgan 依赖)仍引用。

**解决**:代码已内置兼容 shim(`_ensure_upscaler()` 中自动注入 `sys.modules`),无需手动处理。如仍出现,检查 `_ensure_upscaler` 方法是否正常执行。
### cv2/numpy 未安装

```
⚠️ cv2 未安装,小脸增强不可用
```

**说明**:`cv2` 和 `numpy` 为 lazy import(`try/except`),缺失时小脸增强自动禁用,不影响后端启动和其他功能。安装 `opencv-python-headless` 即可恢复。

---
## 已知限制 (v1.4)

- 仅覆盖本地 lipsync 路径(`_local_generate()`),远程模式(`_remote_generate()`)暂不接入
- 多镜头仅全局判定,不做逐段小脸检测
- 仅保证单人(主脸)场景稳定,不做多人脸切换
- CodeFormer 超分需额外安装 `basicsr`,当前推荐使用 GFPGAN

---
## v1.3 → v1.4 变更记录

| 修复项 | 说明 |
|--------|------|
| 眼部重影修复 | mask 起点下移到 68% + 左右 16% 留白,减少上半脸与鼻翼参与融合 |
| Poisson 扩散抑制 | seamlessClone 后按 mask 二次限域,避免眼部上方 ghosting |

---
## v1.2 → v1.3 变更记录

| 修复项 | 说明 |
|--------|------|
| 时基修复 | `_crop_and_upscale_video()` 输出 fps 跟随源视频 fps,避免增强视频时间轴被拉伸 |
| 贴回对齐修复 | `blend_back()` 改为按 `orig_fps/ls_fps` 映射原始帧索引,减少动作变慢与重影 |
| 音轨修复 | 贴回成功后新增音轨封装(mux),避免增强路径无声音 |

---
## v1.1 → v1.2 变更记录

| 修复项 | 说明 |
|--------|------|
| 空输出保护 | `blend_back()` 新增 `ls_frames <= 0` 判断,直接抛错并由外层 fail-open 回退,避免写出空视频 |

---
## v1.0 → v1.1 变更记录

| 修复项 | 说明 |
|--------|------|
| ffprobe 解析 | CSV → JSON 格式,按字段名访问,不再受 `nb_frames` 缺失导致的字段错位影响 |
| fps 选取 | 优先 `avg_frame_rate`(真实平均帧率),`r_frame_rate` 作为 fallback;避免 `60/1` 等 timebase 倍数导致帧数估算偏大 |
| 实际帧数回写 | `_build_face_track()` 用 ffmpeg 实际读到的帧数覆盖估算值,`track.frame_count` 更准确 |
| 贴回帧数检查 | 放宽为 `lipsync <= original` 时正常贴回,仅 `>` 时报错;适配 MuseTalk/LatentSync 按音频时长输出的行为 |
| 边界防护 | `streams` 为空时 return None;`r_frame_rate` 分母为 0 时 fallback 25fps |
| torchvision 兼容 | `_ensure_upscaler()` 中注入 `functional_tensor` shim,兼容 torchvision >= 0.20 |
| lazy import | `cv2`/`numpy` 包装在 `try/except`,缺失时增强自动禁用不影响后端启动 |
| 类型注解 | `from __future__ import annotations` 避免依赖缺失时 `np.ndarray` 等注解触发 NameError |
@@ -39,8 +39,12 @@ frontend/src/

│ │ ├── MaterialSelector.tsx
│ │ ├── ScriptEditor.tsx
│ │ ├── ScriptExtractionModal.tsx
│ │ ├── RewriteModal.tsx
│ │ ├── ScriptLearningModal.tsx
│ │ ├── script-extraction/
│ │ │ └── useScriptExtraction.ts
│ │ ├── script-learning/
│ │ │ └── useScriptLearning.ts
│ │ ├── TitleSubtitlePanel.tsx
│ │ ├── FloatingStylePreview.tsx
│ │ ├── VoiceSelector.tsx
@@ -69,7 +73,8 @@ frontend/src/

│ │ ├── useTitleInput.ts
│ │ └── usePublishPrefetch.ts
│ ├── ui/
│ │ ├── SelectPopover.tsx # 统一下拉/BottomSheet 选择器
│ │ └── AppModal.tsx # 统一弹窗基座
│ ├── types/
│ │ ├── user.ts # User 类型定义
│ │ └── publish.ts # 发布相关类型
@@ -213,15 +218,31 @@ body {

## 统一弹窗规范 (AppModal)

所有居中弹窗(如视频预览、文案提取、AI 改写、文案深度学习、文案扩展编辑、录音、密码修改)统一使用 `@/shared/ui/AppModal` + `AppModalHeader`:

- 统一遮罩与层级:`fixed inset-0` + `bg-black/80` + `backdrop-blur-sm` + 明确 `z-index`
- 统一挂载位置:通过 Portal 挂载到 `document.body`,避免局部容器/层叠上下文影响,确保是全页面弹窗
- 统一容器风格:`border-white/10`、深色半透明背景、圆角 `rounded-2xl`、重阴影
- 统一关闭行为:支持 `ESC`;是否允许点击遮罩关闭通过 `closeOnOverlay` 显式配置
- 默认策略:除关键流程外,`closeOnOverlay` 默认应为 `true`,并通过 `AppModalHeader onClose` 提供右上角 `X` 关闭入口
- 关键流程例外:发布成功清理弹窗(`CleanupContext`)必须保持 `closeOnOverlay=false`,且不提供右上角关闭按钮
- 录音弹窗例外:使用 `closeOnOverlay={!isRecording}`,录音中禁止遮罩关闭,避免误触中断
- 统一滚动策略:弹窗打开时锁定背景滚动(`lockBodyScroll`),内容区自行滚动
- 特殊层级场景(例如视频预览压过下拉)使用更高 `z-index`(如 `z-[320]`)

### 文案类弹窗结果操作栏规范

适用组件:

- `ScriptExtractionModal`
- `RewriteModal`
- `ScriptLearningModal`

统一要求:

- 结果页操作按钮统一放在内容底部(Action Grid),避免“标题右上角按钮 + 底部按钮”混排。
- 主按钮统一为高亮渐变(如「填入文案」),其余按钮统一次级样式(`bg-white/10`)。
- 动作文案尽量统一:`填入文案` / `复制` / `重新生成`(或与当前流程等价的返回动作)。
- 按钮尺寸、圆角、间距保持一致(推荐 `py-2.5 px-3 rounded-lg text-sm`)。

---
## 发布后清理弹窗规范 (CleanupContext)

@@ -245,6 +266,7 @@ body {

所有需要认证的 API 请求**必须**使用 `@/shared/api/axios` 导出的 axios 实例。该实例已配置:

- 自动携带 `credentials: include`
- 遇到 401/403 时自动清除 cookie 并跳转登录页
- AI/Tools 接口(如 `/api/ai/*`、`/api/tools/extract-script`、`/api/tools/analyze-creator`、`/api/tools/generate-topic-script`)现为强制鉴权,禁止匿名 `fetch` 直调

**使用方式:**
@@ -523,6 +545,7 @@ await api.post('/api/videos/generate', {

- 录音入口放在“我的参考音频”区域底部右侧(与“上传音频”并排)。
- 录音交互使用弹窗:开始/停止 -> 试听 -> 使用此录音 / 弃用本次录音。
- 关闭录音弹窗时如仍在录制,会先停止录音再关闭。
- 录音中禁止点击遮罩关闭(`closeOnOverlay={!isRecording}`);未录音时允许遮罩关闭。

```typescript
// 录音需要用户授权麦克风
@@ -47,6 +47,7 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。

- **音色试听**: EdgeTTS 音色列表支持一键试听,按音色 locale 自动选择对应语言的固定示例文案。
- **参考音频管理**: 上传/列表/重命名/删除参考音频,上传后自动 Whisper 转写 ref_text + 超 10s 自动截取。
- **录音入口**: 参考音频区域底部右侧提供“上传音频 / 录音”入口;录音采用弹窗流程(录制 -> 试听 -> 使用/弃用)。
- **录音防误触**: 录音中禁用遮罩关闭(避免误触中断);未录音时可点空白关闭。
- **重新识别**: 旧参考音频可重新转写并截取(RotateCw 按钮)。
- **一键克隆**: 选择参考音频后自动调用 CosyVoice 3.0 服务。
- **语速控制**: 声音克隆模式下支持 5 档语速 (0.8-1.2),统一下拉,选择持久化。
@@ -56,9 +57,9 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。

### 4. 配音前置 + 时间轴编排

- **配音独立生成**: 先生成配音 → 选中配音 → 再选素材 → 生成视频。
- **配音管理面板**: 生成/试听/改名/删除/选中,异步生成 + 进度轮询。
- **时间轴编辑器**: wavesurfer.js 音频波形 + 主素材连续播放背景 + 浮动插入镜头块,拖拽移动位置,点击弹窗编辑截取范围与时长。
- **素材截取设置**: ClipTrimmer 双手柄 range slider + HTML5 视频预览播放(主素材与插入块统一入口)。
- **多镜头模型**: 主素材循环填满音频时长,其余素材作为插入候选可多次添加到时间轴任意位置;支持“设为主素材”切换。
- **自定义分配**: 后端 `custom_assignments` 支持用户定义的素材分配方案(含 `source_start/source_end` 截取区间)。
- **时间轴语义对齐**: 超出音频时仅保留可见段并截齐末段,超出段不参与生成;不足音频时最后可见段自动循环补齐。
- **画面比例控制**: 时间轴顶部支持 `9:16 / 16:9` 输出比例选择,设置持久化并透传后端。
@@ -90,12 +91,12 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。

- **到期续费**: 会员到期后登录自动跳转付费页续费,流程与首次开通一致。
- **管理员激活**: 管理员手动激活功能并存,两种方式互不影响。

### 9. 文案创作助手(3 个弹窗)

- **文案提取助手** (`ScriptExtractionModal`): 支持文件上传与 URL 提取(需登录),提取结果可一键填入主编辑器。
- **AI 智能改写** (`RewriteModal`): 基于 GLM-4.7-Flash 改写文案,支持自定义提示词持久化。
- **文案深度学习** (`ScriptLearningModal`): 输入抖音/B站博主主页,分析热门话题并生成口播文案(需登录)。
- **统一结果操作栏**: 三个弹窗结果页统一底部 Action Grid 风格,主按钮为「填入文案」,次按钮统一「复制 / 重新生成(或等价返回操作)」。
- **登录鉴权**: 依赖受保护接口(`/api/tools/*`、`/api/ai/*`),未登录会触发全局 401 跳转登录。

## 🛠️ 技术栈
@@ -158,6 +159,8 @@ src/

- 业务选择器统一使用 `SelectPopover`(桌面 Popover / 移动端 BottomSheet);`ScriptEditor` 的“历史文案 / AI多语言”保留原轻量菜单。
- 业务弹窗统一使用 `AppModal`(统一遮罩、头部、关闭行为与滚动策略)。
- 弹窗关闭策略:默认支持 `ESC` / `X` / 点击空白关闭;仅发布成功清理弹窗为强制流程(不允许空白关闭,也不显示 `X`)。
- 文案类弹窗结果页按钮统一:底部 Action Grid、主次按钮层级一致、文案动作命名一致(填入/复制/重新生成)。
- 视频预览弹窗层级高于下拉菜单;下拉内支持连续预览。
- 页面同时适配桌面端与移动端;长列表统一隐藏滚动条。
- 详细 UI 规范、持久化规范与交互约束请查看 `Docs/FRONTEND_DEV.md`。
@@ -1,21 +1,65 @@

# ViGent2 开发任务清单 (Task Log)

**项目**: ViGent2 数字人口播视频生成系统
**进度**: 100% (Day 35 - 小脸口型质量补偿落地 + 部署验证)
**更新时间**: 2026-03-10

---

## 📅 对话历史与开发日志

> 这里记录了每一天的核心开发内容与 milestone。

### Day 35: 小脸口型质量补偿落地 + 部署验证 + 稳定性补丁 (Current)
- [x] **小脸口型质量补偿落地**: 新增 `small_face_enhance_service.py`,实现 SCRFD 小脸检测(10%-30% 采样)-> 裁切轨迹(每 8 帧检测 + EMA)-> 稀疏关键帧超分(GFPGAN/CodeFormer)-> 下半脸贴回(seamlessClone/alpha fallback)完整链路。
- [x] **后端集成完成**: `lipsync_service.py` 在 `_local_generate()` 内完成 looping 后插入增强,抽取 `_run_selected_model()` 统一模型路由,增强失败按 `FAIL_OPEN` 自动回退原流程。
- [x] **配置与依赖**: 新增 5 个 `LIPSYNC_SMALL_FACE_*` 配置项;`requirements.txt` 增加 `opencv-python-headless`、`gfpgan`;新增 `models/FaceEnhance/GFPGANv1.4.pth` 权重目录。
- [x] **部署文档新增**: 新增并回写 `Docs/FACEENHANCE_DEPLOY.md`,补齐部署、权重、开关、验证、回滚说明。
- [x] **线上稳定性修复**:
  - `small_face_enhance_service.py` 增加 `cv2/numpy` 懒加载守卫,缺依赖时跳过增强不影响主流程。
  - 增加 `from __future__ import annotations`,避免 `np.ndarray` 注解在缺依赖场景导入期报错。
  - 增加 `torchvision.transforms.functional_tensor` shim,修复 `torchvision>=0.20` 下 GFPGAN 初始化失败。
  - `_get_video_info()` 改为 JSON 字段解析并优先 `avg_frame_rate`,修复 `nb_frames` 缺失导致的帧数估算偏差。
  - `_build_face_track()` 回写实际读帧数;`blend_back()` 帧数校验放宽为 `lipsync <= original` 正常贴回,仅 `>` 报错。
  - `blend_back()` 新增 `ls_frames <= 0` 空输出保护,异常时由 `FAIL_OPEN` 回退常规路径,避免写出空视频。
  - 时基修复:增强视频输出 fps 跟随源视频 fps;贴回按 `orig_fps/ls_fps` 映射原始帧索引,修复动作变慢与重影。
  - 音轨修复:贴回成功后新增 mux 音轨步骤,确保小脸增强路径输出视频包含声音。
  - 眼部重影修复:mask 起点下移到 68% 并增加左右 16% 留白,对 seamlessClone 结果做 mask 限域二次融合,减少眼部上方 ghosting。
  - 运行策略收口:`LIPSYNC_SMALL_FACE_THRESHOLD=9999` 仅用于链路冒烟,质量验证与日常运行统一回归 `256`。
- [x] **部署校验通过**: `GET /api/videos/lipsync/health` 已返回 `data.small_face_enhance`;默认 `enabled=false`,开关关闭下行为与原流程一致。
### Day 34: 多镜头时间轴重构 + 文案深度学习弹窗防误触关闭 + Code Review 修复

- [x] **时间轴模型重构**: 多素材从“等分顺序片段”升级为“主素材连续播放 + 插入镜头块”,支持自由插入、拖拽移动。
- [x] **前端链路落地**: 重写 `useTimelineEditor` 与 `TimelineEditor`,新增主素材/插入候选语义,`useHomeController` / `HomePage` / `MaterialSelector` 全链路适配。
- [x] **后端生成链路适配**: `workflow.py` 完成 `material_paths` 来源修正、`custom_assignments` 新校验、素材下载去重与段处理并发限制,保持单素材兼容。
- [x] **文案深度学习防误触关闭**: `ScriptLearningModal` 禁用遮罩和 `ESC` 关闭,仅允许右上角 `X` 或“填入文案”关闭;输入页“取消”改为“清空”。
- [x] **Code Review 修复**:
  - UX: 移除时间轴 resize handle,统一用 ClipTrimmer 弹窗编辑时长;引入拖拽/点击像素阈值区分。
  - Lint: 修复 `useTimelineEditor` 3 处 set-state-in-effect、`HomePage` 未使用解构、`TimelineEditor` 未使用 import/props。
  - P1: `workflow.py` `is_multi` 补充 `custom_assignments` 条件,防止多片段 assignment 退化为单素材路径。
  - P1: 主素材 trim range 改为按 identity(非 count)重置,修复切换主素材时截取范围泄漏。
  - ClipTrimmer onConfirm 同步调用 `resizeInsert()` 更新时间轴块时长。
- [x] **文档同步**: 回写 `Day34` 与 `TASK_COMPLETE`,并更新 Current 指向。
### Day 33: 文案深度学习落地 + 抓取稳定性增强 + 交互统一

- [x] **文案深度学习功能上线**: 新增 `ScriptLearningModal`(输入主页链接 -> 话题分析 -> 生成文案 -> 填入编辑器)与首页入口接入。
- [x] **Tools 新接口**: 新增 `POST /api/tools/analyze-creator` 与 `POST /api/tools/generate-topic-script`,并接入登录鉴权。
- [x] **抖音/B站抓取增强**: 博主标题抓取统一升级为 Playwright 直连主链路,支持用户 Cookie 上下文增强与失败重试。
- [x] **GLM 调用统一收口**: `glm_service` 新增统一调用入口,标题生成/改写/翻译/话题分析/话题文案生成全部复用,减少重复代码。
- [x] **超时体验优化**: 文案深度学习“生成文案”前端超时从 30s 提升到 90s,并补充超时提示文案。
- [x] **文案弹窗交互统一**: 文案提取/AI 改写/文案深度学习结果页按钮统一为底部 Action Grid,主次按钮层级与文案动作统一。
- [x] **依赖升级**: 后端 venv 升级 `yt-dlp`、`playwright`、`biliup` 并完成兼容性冒烟验证。
- [x] **文档同步**: 回写 `Day33`、`FRONTEND_README`、`FRONTEND_DEV`、`BACKEND_README`、`BACKEND_DEV`、`TASK_COMPLETE`。
### Day 32: 视频下载同源修复 + 安全整改第一批 + Day 日志拆分归档

- [x] **下载链路修复**: 新增 `GET /api/videos/generated/{video_id}/download`,统一返回 `Content-Disposition: attachment`,修复“点击下载却新开标签页播放”问题。
- [x] **发布成功弹窗下载改造**: `CleanupContext` 从传 URL 改为传 `videoId`,下载按钮改走同源接口,去掉 `target="_blank"`。
- [x] **首页下载改造**: `PreviewPanel` 同步切换到同源下载接口,首页与发布页下载行为一致。
- [x] **兼容旧持久化状态**: `CleanupContext` 对旧 `videoDownloadUrl` 做 `videoId` 解析回填,避免旧 pending 状态失效。
- [x] **文档拆分归档**: 将“下载修复开始后的今日内容”归档到 `Docs/DevLogs/Day32.md`,并从 `Day31.md` 移除对应章节与验证记录。
- [x] **安全第一批修复**: JWT 默认密钥生产拦截、AI/Tools 接口强制鉴权、materials 路径穿越拦截、video_id 白名单、上传体积限制、错误信息通用化。
- [x] **安全收尾三刀**: `delete_material` 的 `ValueError -> 400`、`tools` URL 下载分支 500MB 限制、`DEBUG=false` 下默认 JWT 密钥阻断启动。
- [x] **弹窗关闭策略收敛**: 默认支持 `ESC/X/遮罩` 关闭;发布成功清理弹窗保持强制流程不允许遮罩关闭;录音弹窗录音中禁遮罩关闭(防误触)。
### Day 31: 文档体系收敛 + 音色试听 + 录音弹窗重构 + 发布登录稳定性修复

- [x] **文档体系收敛**: README/DEV 职责边界明确,部署参数与代码对齐,Qwen3-TTS 文档归档至历史状态。

@@ -39,21 +83,21 @@

- [x] **发布后清理链路加固**: 新增/优化 `CleanupContext` + `/api/videos/cleanup` 全链路;后端删除异常不再吞错、清理接口严格成功语义;前端失败不清本地/不关弹窗,3 次失败可暂不清理,清理状态 24h 过期并支持用户切换复位;清理范围收敛为输入内容字段并保留用户偏好。
### Day 30: Remotion Cache Fix + Encoding-Pipeline Quality + Lip-Sync Fault Tolerance + Unified Dropdowns

- [x] **Remotion cache 404 fix**: on a bundle cache hit, newly generated video/font files were missing from the cached `public/` directory → 404 → fallback to FFmpeg (no title/subtitles). Now hard-links (`fs.linkSync`) the files needed by the current render into the cache directory.
- [x] **LatentSync `read_video` skips redundant FPS re-encode**: detects the input FPS and skips the `ffmpeg -r 25 -crf 18` re-encode when the input is already 25fps.
- [x] **LatentSync final mux stream copy**: the mux step after `imageio` writes frames at CRF 13 changed from `libx264 -crf 18` to `-c:v copy`, removing a redundant double encode.
- [x] **`prepare_segment` + `normalize_orientation` CRF bump**: CRF 23 → 18, matching LatentSync's internal quality standard.
- [x] **Multi-material concat stream copy**: with all segment parameters already unified, `concat_videos` changed from `libx264 -crf 23` to `-c:v copy`.
- [x] **Total encode count**: lossy encodes reduced from 5-6 to 3 (prepare_segment → LatentSync/MuseTalk model output → Remotion).
- [x] **LatentSync faceless-frame tolerance**: frames without a detected face no longer abort inference; faceless frames keep the original image, and a failing segment falls back to the source video.
- [x] **MuseTalk piped encoding**: replaced the lossy `cv2.VideoWriter(mp4v)` intermediate file with an FFmpeg rawvideo stdin pipe, removing one redundant lossy encode.
- [x] **MuseTalk parameters via env vars**: inference/encoding parameters (detect_every/blend_cache/CRF/preset, etc.) moved from hardcoded values into `backend/.env`; currently on the quality-first profile (CRF 14, preset slow, detect_every 2, blend_cache_every 2).
- [x] **Workflow async non-blocking**: added a `_run_blocking()` thread-pool helper; 5 synchronous FFmpeg calls (orientation normalize / prepare_segment / concat / BGM mix) now go through `await _run_blocking()`, so the event loop is never blocked.
- [x] **compose skip optimization**: with no BGM, `final_audio_path == audio_path`, so the extra compose step is skipped; the Remotion path uses the lipsync output directly and the non-Remotion path passes it through via `shutil.copy`.
- [x] **compose() made async**: `compose()` is now `async def`; its internal `_get_duration` and `_run_ffmpeg` run via `run_in_executor`.
- [x] **Skip scale at matching resolution**: per-segment resolution comparison for multi-material; matching segments pass `None` and take the copy branch (same for single material), avoiding pointless re-encodes when already at the target resolution.
- [x] **`_get_duration()` thread-pooled**: 3 synchronous ffprobe probes in the workflow now use `await _run_blocking()`.
- [x] **compose loop CRF unified**: looping scenario CRF 23 → 18, consistent with the pipeline-wide quality standard.
- [x] **Multi-material segment validation**: after prepare, segment counts are validated for consistency so empty segments never reach concat.
- [x] **Frontend lip-sync model selection**: a model dropdown (default/fast/advanced) next to the generate button, with `lipsync_model` passed through to the backend routes. Default keeps the threshold policy, fast forces MuseTalk, advanced forces LatentSync, and all three fall back to LatentSync. The selection persists in localStorage.
- [x] **Unified business dropdowns**: added `SelectPopover` (desktop Popover + mobile BottomSheet), covering the main selectors on the home/publish pages (voice, reference audio, voiceover, material, BGM, work, style, model, aspect ratio).
- [x] **BGM interaction consolidation**: BGM selection now matches the publish page (search + list + preview); per product request the home-page volume slider was removed and generation requests fix `bgm_volume=0.2`.
- [x] **Deliberate exception**: `ScriptEditor`'s "saved scripts / AI multilingual" menus keep their original lightweight style (not forced onto `SelectPopover`).
- [x] **Docs sync**: Day30 / TASK_COMPLETE / FRONTEND_DEV / FRONTEND_README / README / BACKEND_README updated to the final implementation.

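The `_run_blocking()` thread-pool helper described in the Day 30 items above can be sketched roughly as follows; the real signature in workflow.py may differ.

```python
import asyncio
from functools import partial

async def _run_blocking(func, *args, **kwargs):
    """Run a blocking callable off the event loop; sketch of the helper above."""
    loop = asyncio.get_running_loop()
    # The default executor is a thread pool, so the event loop keeps serving
    # other coroutines while e.g. a synchronous FFmpeg call runs.
    return await loop.run_in_executor(None, partial(func, *args, **kwargs))

async def demo() -> int:
    # In the workflow this would wrap calls like
    # subprocess.run(["ffmpeg", ...], check=True); here a trivial stand-in:
    return await _run_blocking(sum, [1, 2, 3])

print(asyncio.run(demo()))  # 6
```

`functools.partial` is used because `run_in_executor` only accepts positional arguments for the callable.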
### Day 29: Video-Pipeline Optimization + CosyVoice Tone Control

- [x] **Subtitle sync fix**: three-step Whisper timestamp smoothing (monotonic increase + overlap removal + gap filling) plus original-text rhythm mapping (linear interpolation + per-character duration clamping).
- [x] **LatentSync mouth-shape tuning**: inference_steps 16→20, guidance_scale 2.0, DeepCache enabled, Remotion concurrency 16→4.
- [x] **compose stream copy**: `-c:v copy` replaces libx264 re-encoding when not looping, cutting compose time from minutes to seconds.
- [x] **FFmpeg timeout guards**: `_run_ffmpeg()` timeout=600, `_get_duration()` timeout=30.
- [x] **Global concurrency cap**: `asyncio.Semaphore(2)` limits concurrently running generation tasks.
- [x] **Redis task TTLs**: create 24h, completed/failed 2h, with list auto-cleaning expired indexes.
- [x] **Temp font cleanup**: font files added to the temp_files cleanup list.
- [x] **Preview background CORS fix**: the same-origin material proxy `/api/materials/stream/{id}` sidesteps cross-origin issues entirely.
- [x] **CosyVoice tone control**: voice-clone mode gains a tone dropdown (normal/cheerful/deep/serious), driven by natural-language instructions via `inference_instruct2()`; instruct_text is passed through end to end, and the default "normal" behavior is unchanged.

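The three-step timestamp smoothing above (monotonic increase, overlap removal, gap filling) can be sketched as a pure function over word timings. This is illustrative; the actual whisper_service.py implementation may differ in details such as the gap threshold.

```python
def smooth_timestamps(words, gap_limit=0.3):
    """words: ordered list of (text, start, end) tuples in seconds."""
    out = []
    prev_end = 0.0
    for text, start, end in words:
        # Steps 1+2: enforce monotonic increase and remove overlap with the
        # previous word by clamping start/end forward.
        start = max(start, prev_end)
        end = max(end, start)
        # Step 3: fill small gaps by pulling the word back to the previous end,
        # so highlighted subtitles never stall between words.
        if out and 0 < start - prev_end <= gap_limit:
            start = prev_end
        out.append((text, start, end))
        prev_end = end
    return out
```

The output is guaranteed non-overlapping and non-decreasing, which is what a karaoke-style highlighter needs.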
### Day 28: CosyVoice FP16 Acceleration + Full Doc Refresh

- [x] **CosyVoice FP16 half-precision**: `AutoModel()` enables `fp16=True`; LLM inference and Flow Matching run in automatic mixed precision, estimated 30-40% faster with ~30% less VRAM.
- [x] **Full doc refresh**: README.md / DEPLOY_MANUAL.md / SUBTITLE_DEPLOY.md / BACKEND_README.md now cover the MuseTalk hybrid lip-sync scheme, performance optimizations, and Remotion concurrent rendering.

### Day 27: Remotion Stroke Fix + Font Style Expansion + Hybrid Lip-Sync + Performance

- [x] **Stroke rendering fix**: title/secondary-title/subtitles switched from a 4-direction `textShadow` simulation to native CSS `-webkit-text-stroke` + `paint-order: stroke fill`, fixing over-thick strokes and secondary-title ghosting.
- [x] **Font style expansion**: title styles 4→12 (+庞门正道/优设标题圆/阿里数黑体/文道潮黑/无界黑/厚底黑/寒蝉半圆体/欣意吉祥宋), subtitle styles 4→8 (+少女粉/清新绿/金色隶书/楷体红字).
- [x] **Stroke parameter tuning**: all presets lowered `stroke_size` from 8 to 4~5, which looks cleaner with native strokes.
- [x] **TypeScript type fixes**: aligned Root.tsx's `Composition` generic with the `calculateMetadata` parameter type; added an index signature to Video.tsx's `VideoProps` for `Record<string, unknown>` compatibility; removed the `loop` prop VideoLayer.tsx passed to `OffthreadVideo`, which does not support it.
- [x] **Progress-bar copy reverted**: the progress bar shows the fixed "AI generating..." text again instead of backend-pushed messages.
- [x] **MuseTalk hybrid lip-sync**: deployed MuseTalk 1.5 as a resident service (GPU0, port 8011), with automatic routing by audio duration (controlled by `LIPSYNC_DURATION_THRESHOLD`; this repo's `.env` currently sets 100); short videos use LatentSync, long videos use MuseTalk, with automatic fallback when MuseTalk is unavailable.
- [x] **MuseTalk inference optimization**: server.py v2 rewrite: direct cv2 frame reads (skipping ffmpeg→PNG), face detection every 5 frames, BiSeNet mask caching every 5 frames, direct cv2.VideoWriter output (no PNG writes), batch_size 8→32; estimated 30min→8-10min (~3x).
- [x] **Remotion concurrent rendering**: render.ts gains a concurrency parameter, raised from the default 8 to 16 (56-core CPU), estimated 5min→2-3min.

### Day 26: Frontend Polish: Merged Panels + Numbered Headings + UI Refinement

- [x] **Panel merge**: 9 independent home-page panels merged into 5 main panels (voiceover mode + voiceover list → III. Voiceover; materials + timeline → IV. Material Editing; history + preview → VI. Works).
- [x] **Chinese numeral headings**: panels numbered 一~十 (home 一~六, publish 七~十), with all emoji icons removed.
- [x] **embedded mode**: 6 components accept an `embedded` prop and skip the outer card/heading when embedded.
- [x] **Two-row voiceover list layout**: in embedded mode, row 1 holds speed + generate (right-aligned), row 2 the voiceover list + refresh.
- [x] **Self-rendered sub-headings**: MaterialSelector/TimelineEditor render their own h3 sub-heading with action buttons inline when embedded.
- [x] **Dropdown alignment**: TitleSubtitlePanel labels fixed at `w-20`, dropdowns `w-1/3 min-w-[100px]`, vertically aligned.
- [x] **Reference-audio copy simplified**: the bottom paragraph moved next to the heading, shortened to "(upload a 3-10s voice sample)".
- [x] **Account phone display**: AccountSettingsDropdown now shows the phone number.
- [x] **Title display mode applies to the secondary title**: payload condition fixed + the dropdown moved up to the panel heading row.
- [x] **User info available right after login**: AuthContext exposes `setUser`; login writes user data immediately, fixing the "unknown account" display after login.
- [x] **Copy tweaks**: material description now reads "upload selfie videos, up to 4 selectable"; display-mode options gain a "Title" prefix.
- [x] **UI/UX polish**: action buttons visible on mobile (opacity-40), phone-number masking, title character counter, timeline drag-handle icons, larger trim sliders.
- [x] **Code-quality fixes**: password dialog clears success state; MaterialSelector useMemo + disabled guard; TimelineEditor useMemo.
- [x] **Responsive publish page**: platform account cards use a single-row layout, compact on mobile (small icons/buttons), roomy on desktop (matching other panels).
- [x] **Mobile refresh returns to top**: `scrollRestoration = "manual"` + time-gated list scrolling (a `scrollEffectsEnabled` ref blocks auto-scroll for 1 second) + a delayed `scrollTo(0,0)` fallback.
- [x] **Smaller mobile style preview**: FloatingStylePreview shrinks to 160px wide on mobile and moves to the bottom-right so it no longer covers the style controls.
- [x] **Scrollbars hidden consistently**: all lists (BGM/voiceovers/works/materials/script extraction) back to `hide-scrollbar`.
- [x] **Mobile voiceover/material fit**: VoiceSelector buttons shrink on mobile (`px-2 sm:px-4`), fixing the hidden clone-voice button; MaterialSelector's heading drops `whitespace-nowrap` and hides the description on mobile, fixing refresh-button overflow.
- [x] **Bigger generate-voiceover button**: promoted from secondary size (`text-xs px-2 py-1`) to primary-action size (`text-sm font-medium px-4 py-2`), with a shadow added.
- [x] **Progress-bar placement**: moved out of the "VI. Works" card into a standalone right-column card above it, where it is more visible.
- [x] **LatentSync timeout fix**: httpx timeout raised from 1200s (20 min) to 3600s (1 hour), fixing timeout fallbacks on lip-sync inference for videos over 2 minutes.
- [x] **Subtitle timestamp rhythm mapping**: `whisper_service.py` switched from full-range linear interpolation to per-word Whisper rhythm mapping, fixing subtitle drift on long videos.

### Day 25: Script-Extraction Fix + Custom Prompts + Secondary Title

- [x] **Douyin script-extraction fix**: yt-dlp failed with "Fresh cookies" errors; `_download_douyin_manual` was rewritten around the mobile share page with automatic ttwid acquisition.
- [x] **DOUYIN_COOKIE removal**: the new approach needs no manually maintained cookie, so it was removed from `.env`/`config.py`/`service.py`.
- [x] **Custom prompt for AI rewriting**: backend `rewrite_script()` accepts a `custom_prompt` parameter; the frontend adds a collapsible prompt editor next to the checkbox, persisted in localStorage.
- [x] **SSR build fix**: `localStorage` access in `useState` initializers guarded with `typeof window`, fixing the `npm run build` error.
- [x] **Secondary title**: new secondary_title across the backend/Remotion/frontend chain, generated together by the AI, with independent styling and a 20-character limit.
- [x] **Frontend copy fix**: "AI spin result" → "AI rewrite result".
- [x] **yt-dlp upgrade**: `2025.12.08` → `2026.2.21`.
- [x] **Chinese reference-audio filename fix**: `sanitize_filename()` scrubs storage paths to ASCII-safe characters, with a hash fallback for purely Chinese names; the original name is kept as the display name.

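The `sanitize_filename()` behavior above can be sketched like this: keep only ASCII-safe characters for the stored path, and fall back to a deterministic hash when nothing survives (e.g. a purely Chinese name). The regex and hash choice here are assumptions, not the project's exact implementation.

```python
import hashlib
import re

def sanitize_filename(name: str) -> str:
    """Return an ASCII-safe storage name; the original stays as the display name."""
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""
    # Keep only characters that are safe in storage paths and URLs.
    safe = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    if not safe:
        # Hash fallback: a purely non-ASCII name still maps to a stable path.
        safe = hashlib.md5(stem.encode("utf-8")).hexdigest()[:12]
    return f"{safe}.{ext}" if ext else safe
```

Determinism matters here: re-uploading the same file must resolve to the same storage path.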
### Day 24: Expiry Enforcement + Multi-Material Timeline Stability

- [x] **Membership expiry at request time**: login and authenticated endpoints uniformly check `expires_at`; on expiry the account is deactivated, sessions are cleared, and "Membership expired, please renew" is returned.
- [x] **Aspect-ratio control**: the timeline gains a `9:16 / 16:9` output-ratio selector, persisted on the frontend and passed through to the backend; single- and multi-material paths both normalize to the target resolution.
- [x] **Title/subtitle overflow protection**: Remotion and the frontend preview share responsive scaling, auto-wrapping, and proportional stroke/letter-spacing/margin scaling, reducing preview-vs-output differences.
- [x] **Title display mode**: the title row gains a "brief/persistent" dropdown; defaults to brief (4 seconds), with the choice persisted and passed through to the Remotion render chain.
- [x] **MOV orientation normalization**: added rotation-metadata parsing and orientation normalization, fixing portrait misdetection on "landscape-encoded + rotation metadata" files.
- [x] **Multi-material concat stability**: segment prepare and concat unified at 25fps/CFR, with `+genpts` added to concat, mitigating "frozen frame with moving lips" at segment transitions.
- [x] **Timeline semantics alignment**: `source_end` wired end to end; fixed duration math for `sourceStart>0 with sourceEnd=0`; generation uses only the timeline's visible-segment assignments, excluding anything beyond them.
- [x] **Interaction details**: refresh returns to the top; first-round auto-scroll suppression on material/history lists reduces page jumps during state restoration.

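The orientation-normalization fix above boils down to one rule: a stream encoded landscape but carrying ±90° rotation metadata is really portrait. A minimal sketch, assuming the rotation angle has already been read from ffprobe's stream metadata:

```python
def effective_dimensions(width: int, height: int, rotation: int) -> tuple[int, int]:
    """Display dimensions after applying the container's rotation metadata."""
    if rotation % 180 != 0:
        # A 90/270-degree rotation swaps the displayed width and height.
        return height, width
    return width, height

def is_portrait(width: int, height: int, rotation: int = 0) -> bool:
    # Deciding portrait vs landscape on the *effective* dimensions is exactly
    # what fixes the misdetection described above.
    w, h = effective_dimensions(width, height, rotation)
    return h > w
```

Probing the rotation itself (e.g. from `side_data_list` or the legacy `rotate` tag) varies by ffprobe version, so that part is deliberately left out of the sketch.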
### Day 23: Voiceover-First Rework + Material Timeline + UX Polish + Voice-Clone Enhancements


#### Phase 1: Voiceover First

- [x] **Standalone voiceover generation**: new `generated_audios` backend module (router/schemas/service) with 5 API endpoints, reusing the existing TTSService / voice_clone_service / task_store.
- [x] **Voiceover management panel**: new `useGeneratedAudios` hook + `GeneratedAudiosPanel` component supporting generate/preview/rename/delete/select.
- [x] **Panel reordering**: script → title & subtitles → voiceover mode → voiceover list → materials → BGM → generate video.
- [x] **Material-panel gating**: the material panel shows an overlay until a voiceover is selected, then displays the voiceover duration and even-split info.
- [x] **Video-generation hookup**: workflow.py gains a pre-generated audio branch (`generated_audio_id`) that skips inline TTS, backward compatible.
- [x] **Persistence**: selectedAudioId added to useHomePersistence, so the selected voiceover is restored after refresh.


#### Phase 2: Material Timeline

- [x] **Timeline editor**: new `TimelineEditor` component: a wavesurfer.js waveform plus color blocks visualizing material assignments, with draggable dividers to resize segments.
- [x] **Clip trimming**: new `ClipTrimmer` modal with an HTML5 video preview and dual-handle slider for the source clip's start/end.
- [x] **Backend custom assignments**: new `CustomAssignment` model; `prepare_segment` supports `source_start`; both multi- and single-material pipelines accept `custom_assignments`.
- [x] **Loop-trim fix**: `stream_loop + source_start` split into two steps (trim first, then loop), so looping starts at the trim point rather than at 0s.
- [x] **MaterialSelector slimmed**: removed the old duration bar and drag-sort area (now handled by TimelineEditor).


#### Phase 3: UX Polish + TTS Stability

- [x] **TTS SoX PATH fix**: `run_qwen_tts.sh` exports the conda env bin onto PATH (Qwen3-TTS is retired, replaced by CosyVoice 3.0).
- [x] **TTS VRAM management**: `torch.cuda.empty_cache()` after each generation, with asyncio.to_thread to avoid blocking the event loop (CosyVoice reuses the same mechanism).
- [x] **Voiceover list buttons unified**: Play/Edit/Delete shown as a hover group on the right, matching RefAudioPanel; the script excerpt was removed.
- [x] **Material panel ungated**: removed MaterialSelector's selectedAudio overlay; materials can be uploaded and managed at any time.
- [x] **Timeline drag sorting**: TimelineEditor blocks support HTML5 Drag & Drop reordering of materials.
- [x] **Trim range slider**: ClipTrimmer switched to a single track with dual handles (purple start + pink end), replacing two separate sliders.
- [x] **Trim video preview**: the preview plays/pauses, auto-stops from sourceStart to sourceEnd, and seeks live while dragging handles.


#### Phase 4: Saved Scripts + Bug Fixes

- [x] **Saved scripts**: new `useSavedScripts` hook for manually saving/loading/deleting scripts, persisted independently in localStorage.
- [x] **Timeline drag fix**: `reorderSegments` changed from swapping properties to moving array items (splice), fixing durations not following materials after a drag.
- [x] **Button visual consistency**: the 4 script-editor buttons unified at fixed height `h-7`, with redundant `<span>` nesting removed.
- [x] **Bottom-bar tweaks**: "Save script" moved to the bottom right; the estimated-duration display removed.


#### Phase 5: Subtitle Language Mismatch + Aspect Misalignment Fixes

- [x] **Subtitles use the original text, not the Whisper transcript**: `align()` gains an `original_text` parameter; subtitle text always comes from the script saved with the voiceover.
- [x] **Remotion dynamic video size**: `calculateMetadata` reads real dimensions from props, fixing title/subtitle proportion misalignment.
- [x] **English space-loss fix**: `split_word_to_chars` flushes its buffer and sets a pending_space flag on spaces.


#### Phase 6: Auto-Transcribed Reference Audio + Speed Control

- [x] **Whisper auto-transcribes ref_text**: uploaded reference audio is transcribed by Whisper to produce ref_text, replacing the fixed frontend text.
- [x] **Automatic reference-audio trimming**: audio over 10 seconds is cut at a silence point (ffmpeg silencedetect), with a 0.1s fade-out at the end to avoid clipping pops.
- [x] **Re-transcription**: new `POST /ref-audios/{id}/retranscribe` endpoint + a frontend RotateCw button, so older audio can be re-transcribed and trimmed.
- [x] **Speed control**: a full-chain speed parameter (frontend selector → persistence → backend → CosyVoice `inference_zero_shot(speed=)`), 5 levels: slower (0.8) / slow (0.9) / normal (1.0) / fast (1.1) / faster (1.2).
- [x] **Missing-reference gating**: in voice-clone mode with no reference audio selected, the generate button is disabled with a yellow warning.
- [x] **Whisper language auto-detection**: the `transcribe()` language parameter is now optional (default None = auto-detect), supporting multilingual reference audio.
- [x] **Frontend cleanup**: removed the fixed ref_text constant and reading prompt, simplified to "upload any voice sample; the system will transcribe it and clone the voice".

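The silence-point trimming in Phase 6 above implies parsing ffmpeg's `silencedetect` log and choosing a cut point at or before the 10-second cap. A hedged sketch of that selection step, assuming the filter's stderr output has already been captured as text (the real service may parse and choose differently):

```python
import re

# silencedetect reports lines like "[silencedetect @ ...] silence_start: 4.25".
_SILENCE_RE = re.compile(r"silence_start:\s*([0-9.]+)")

def pick_cut_point(ffmpeg_log: str, max_duration: float = 10.0) -> float:
    """Return the last silence start within max_duration, else max_duration."""
    candidates = [
        float(m.group(1))
        for m in _SILENCE_RE.finditer(ffmpeg_log)
        if float(m.group(1)) <= max_duration
    ]
    # Falling back to a hard cut at max_duration keeps the pipeline moving even
    # when the sample contains no detectable silence.
    return max(candidates) if candidates else max_duration
```

The chosen point would then feed an `ffmpeg -t <cut>` trim with a short fade-out, as the item above describes.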
### Day 22: Multi-Material Fixes + AI Translation + Multilingual TTS

- [x] **Multi-material bug fixes**: 6 high-priority bugs (boundary overflow, single-segment fallback, divide-by-zero, duration validation, Whisper fallback, empty-list check).
- [x] **Architecture rework**: multi-material changed from "LatentSync per segment" to "concat first, then infer", reducing inference runs from N to 1.
- [x] **Frontend improvements**: payload safety, progress messages, auto-select after upload, unified Material interface, drag fix, material cap of 4.
- [x] **AI multilingual translation**: new `/api/ai/translate` endpoint; the frontend offers 9-language translation plus restore-original.
- [x] **Multilingual TTS**: EdgeTTS voice lists for 10 languages, automatic voice switching on translation, voice-clone language passthrough, textLang persistence.

### Day 21: Fixes + Floating Preview + Publish Rework + Architecture + Multi-Material Generation

- [x] **Remotion crash tolerance**: when the render process exits with SIGABRT, the output file is checked before declaring failure, avoiding false failures that dropped titles/subtitles.
- [x] **Home-page work-selection persistence**: fixed `fetchGeneratedVideos` unconditionally overwriting the restored value; a new `preferVideoId` parameter controls selection.
- [x] **Publish-page work-selection persistence**: root cause was unstable signed URLs; selection/persistence/comparison now use `video.id` instead of `path` throughout.
- [x] **Prefetch cache completion**: home-page prefetch of publish-page data now includes the `id` field so cached data works for persistence matching.
- [x] **Floating style preview**: the title/subtitle preview is now a `position: fixed` floating window pinned to the top-left, visible while scrolling.
- [x] **Mobile fit**: ScriptEditor buttons wrap; the default preview ratio changed to 9:16 portrait.
- [x] **Multi-platform publish rework**: per-platform config (DOUYIN_*/WEIXIN_*), user-isolated cookie management, Douyin face-verification QR code, and a smoother WeChat publish flow.
- [x] **Frontend structure tweaks**: ScriptExtractionModal moved to features/, contexts moved to shared/contexts/, empty directories removed.
- [x] **Backend module layering**: the materials/tools/ref_audios modules completed with router+schemas+service layers.
- [x] **Dev-guideline updates**: BACKEND_DEV.md adds the incremental principle; DOC_RULES.md drops the manual-trigger constraint for TASK_COMPLETE.md.
- [x] **Doc refresh**: BACKEND_DEV/README, FRONTEND_DEV, DEPLOY_MANUAL, and README.md updated in sync.
- [x] **Multi-material video generation (multi-camera effect)**: multi-select materials with drag sorting; audio duration split evenly by material count (aligned to Whisper word boundaries) for automatic camera switches; per-segment LatentSync + FFmpeg concat; @dnd-kit drag-sort UI on the frontend.
- [x] **Subtitle toggle removed**: per-character highlighted subtitles are always on; the toggle and related dead code were removed.
- [x] **Video format expansion**: uploads now accept mkv/webm/flv/wmv/m4v/ts/mts and other common formats.
- [x] **Watchdog tuning**: health-check threshold raised to 5 failures, plus a 120-second restart cooldown to avoid spurious restarts.
- [x] **Multi-material bug fixes**: fixed punctuation-based sentence splitting failing on scripts without terminal punctuation (switched to even splitting) and audio time offsets breaking lip alignment.

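The even split "aligned to Whisper word boundaries" described above can be sketched as snapping each N-way split point to the nearest word end, so camera cuts never land mid-word. Function and parameter names here are illustrative, not the workflow's actual code.

```python
def split_points(total: float, n: int, word_ends: list[float]) -> list[float]:
    """Cut points for splitting `total` seconds across n materials,
    each snapped to the nearest Whisper word-end timestamp."""
    points = []
    for i in range(1, n):
        target = total * i / n
        # Snap the ideal even-split point to the closest word boundary; with no
        # word data, fall back to the raw even split.
        snapped = min(word_ends, key=lambda t: abs(t - target)) if word_ends else target
        points.append(snapped)
    return points
```

Each consecutive pair of points (plus 0 and `total`) then defines one material's segment in the concat.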
### Day 20: Code Quality and Security

- [x] **Functional fixes**: LatentSync fallback logic, task-status endpoint authentication, unified User type.
- [x] **Performance**: N+1 query fix, streaming video upload, async httpx replacement, async GLM wrapper.
- [x] **Security**: hardcoded cookies moved to config, sensitive-log redaction, safe ffprobe invocation, configurable CORS.
- [x] **Configuration**: storage paths via env vars, Remotion precompilation speedup, absolute LatentSync paths.
- [x] **Docs**: updated the DOC_RULES.md checklist, completed backend and deployment docs; updated SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md.
- [x] **Defect fixes**: Remotion path resolution, publish-page persistence race, home-page selection regression, material closure trap.

### Day 19: Auto-Publish Stability and Publish UX 🚀

- [x] **Douyin publish stability**: upload entry, cover flow, publish retries, login-expiry detection, and fast returns on network failure all hardened.
- [x] **WeChat Channels publish fix**: title + tags written together into the "video description" field; the `post_create` success signal is judged quickly; timeouts now return failure.
- [x] **Success-screenshot loop**: Douyin/Channels success screenshots surfaced on the frontend, with user-isolated storage and authenticated access.
- [x] **Screenshot quality**: success screenshots delayed 3 seconds and switched to viewport capture, fixing "content occupies only 1/3 of the screenshot".
- [x] **Debug toggles**: new Channels screen-recording config, switchable per environment variable, making failure triage more direct.
- [x] **Unified startup**: merged into `run_backend.sh` (xvfb + headful) on port `8006`, reducing multi-process confusion.
- [x] **Publish-page guardrails**: while publishing, the button warns "do not refresh or close the page" and refresh/close confirmation prompts are enabled.
- [ ] **Follow-up**: publish-task recovery (task-ification + persisted state + frontend polling recovery).

### Day 18: Backend Modularization and Conventions

- [x] **Modular migration**: routes delegate to `modules/*`; business logic lives in service/workflow.
- [x] **Video-generation split**: the generation flow moved into workflow, with task state unified under TaskStore.
- [x] **Redis task storage**: Redis first, with automatic in-memory fallback when unavailable.
- [x] **Repository layer**: Supabase access unified under `repositories/*`; deps/auth/admin fully migrated.
- [x] **Response convention**: unified `success/message/data/code` plus global exception handling.
- [x] **Material renaming**: new rename endpoint backed by Storage `move_file`.
- [x] **Platform order**: Douyin / WeChat Channels / Bilibili / Xiaohongshu; Kuaishou removed.
- [x] **Backend dev guide**: new `BACKEND_DEV.md`; README synced to the modular structure.
- [x] **Publish-page experience**: home-page route prefetch plus publish-page skeletons and caching for faster entry.
- [x] **Material loading**: concurrent signed-URL fetches for the material list, with dynamic skeleton counts.
- [x] **Preview loading**: `preload="metadata"` + hover prefetch.

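The unified `success/message/data/code` response convention above can be sketched as a tiny envelope helper; the field names follow the convention listed, but the helper names are illustrative.

```python
def api_response(data=None, message: str = "ok", code: int = 0,
                 success: bool = True) -> dict:
    """Uniform envelope so every endpoint returns the same four fields."""
    return {"success": success, "message": message, "data": data, "code": code}

def api_error(message: str, code: int) -> dict:
    # Errors reuse the same shape, so the frontend can branch on `success`
    # alone instead of inspecting HTTP status codes case by case.
    return api_response(data=None, message=message, code=code, success=False)
```

A global exception handler would then funnel uncaught errors through `api_error`, which is what keeps internal stack traces out of client-facing messages.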
### Day 17: Frontend Refactor and UX

- [x] **UI component split**: the home page broken into standalone components, reducing `page.tsx` complexity.
- [x] **Lightweight FSD migration**: `app` pages slimmed; logic concentrated in `features/*/model`; shared capabilities moved down to `shared/*`.
- [x] **Controller hooks**: Home/Publish page logic concentrated in controller hooks; pages only compose and render.
- [x] **Shared utilities**: `media.ts` unifies API base / URL / date formatting.
- [x] **Interaction polish**: selection persistence, in-list positioning, scroll-to-top on refresh, newest-work-first preview.
- [x] **Publish-page rework**: card-based work list + search + preview dialog.
- [x] **Preview experience**: preview dialogs share a unified header style and hint copy.
- [x] **Preview consistency**: title/subtitle preview scales to the material resolution.
- [x] **Title sync and limits**: the intro title syncs with the publish title, IME composition is handled, 15-character limit.
- [x] **Style defaults and persistence**: default styles and font sizes adjusted; user choices survive refresh.
- [x] **Micro performance**: list-rendering optimizations + parallel requests + debounced localStorage.
- [x] **Asset capabilities**: font/BGM asset library served via `/api/assets`.
- [x] **Audio and subtitle fixes**: BGM mixing stability and subtitle sentence-splitting improvements.
- [x] **Persistence fix**: wired `useHomePersistence`, restored the `isRestored` logic, and passed the build.
- [x] **Preview and selection fixes**: publish preview handles signed URLs, audio-preview path resolution, material/BGM fallback to valid items.
- [x] **Detail polish**: recording preview URL cleanup, preview-dialog scroll restoration, global task-toast mounting.

### Day 16: Deep Performance Optimization

- [x] **Qwen-TTS acceleration**: integrated Flash Attention 2 (retired; replaced by CosyVoice 3.0).
- [x] **Service guarding**: built a `Watchdog` that monitors and automatically restarts hung services.
- [x] **LatentSync performance confirmed**: verified DeepCache + native Flash Attn are active.
- [x] **Doc overhaul**: fully updated the README, deployment manual, and backend docs.

### Day 15: Phone-Number Auth Migration

- [x] **Auth upgrade**: migrated from email to 11-digit phone-number registration/login.
- [x] **Account management**: added password change, validity-period display, and secure logout.
- [x] **AI script assistant**: upgraded to GLM-4.7-Flash, supporting Bilibili/Douyin link extraction and rewriting.

### Day 14: AI Enhancements and UX

- [x] **AI titles/tags**: integrated the GLM-4 API to auto-generate video metadata.
- [x] **Subtitle upgrade**: Remotion per-character highlighted subtitles (karaoke effect) and an animated intro.
- [x] **Model upgrade**: voice cloning migrated to CosyVoice 3.0 (0.5B).

### Day 13: Voice-Clone Integration

- [x] **Voice-clone microservice**: CosyVoice 3.0 wrapped as a standalone API (port 8010, replacing Qwen3-TTS).
- [x] **Reference-audio management**: Supabase storage-bucket setup and management endpoints.
- [x] **Multi-modal TTS**: frontend switching between EdgeTTS and cloned voices.

### Day 12: Mobile Fit

- [x] **iOS compatibility**: fixed Safari safe areas, status-bar color, and cookie blocking.
- [x] **Responsive UI**: mobile header and publish-page rework.

### Day 11: Upload Architecture Rework

- [x] **Direct upload**: the frontend uploads straight to Supabase Storage, solving the Nginx 30s timeout.
- [x] **Data isolation**: user materials/videos physically isolated by UserID.

### Day 10: HTTPS and Security

- [x] **HTTPS deployment**: SSL certificates with an Nginx reverse proxy.
- [x] **Hardening**: Basic Auth protection added in front of Supabase Studio.

### Day 9: Auth System and Publish Loop

- [x] **User system**: JWT authentication built on Supabase Auth.
- [x] **Publish loop**: verified the Bilibili/Douyin/Xiaohongshu auto-publish flow.
- [x] **Self-healing**: PM2 process guarding configured.

### Day 1-8: Core Feature Build

- [x] **Day 8**: history persistence and file management.
- [x] **Day 7**: social-media auto-login and multi-platform publishing.
- [x] **Day 6**: **LatentSync 1.6** upgrade and server deployment.
- [x] **Day 5**: frontend video upload with progress feedback.
- [x] **Day 4**: MuseTalk (legacy) lip-sync fixes.
- [x] **Day 3**: server environment setup and model-weight downloads.
- [x] **Day 1-2**: project scaffolding (FastAPI + Next.js).

---

## 🛤️ Roadmap


### 🔴 Priority Backlog

- [x] ~~**Voiceover-first rework, phase 2**: clip trimming + voice timeline~~ ✅ completed on Day 23
- [ ] **Batch generation**: Excel import for producing videos in bulk.
- [ ] **Backend scheduled tasks**: migrate frontend-triggered scheduled publishing to backend APScheduler.
- [ ] **Publish-task recovery**: task-ification + persisted state + frontend resume, so state survives a page refresh.


### 🔵 Long-Term Exploration

- [ ] **Containerized delivery**: a complete one-command Docker Compose deployment bundle.
- [ ] **Distributed queue**: Celery + Redis for very high task concurrency.


---

## 📊 Module Completion

| Module | Progress | Status |
|------|------|------|
| **Core API** | 100% | ✅ Stable |
| **Web UI** | 100% | ✅ Stable (mobile-ready) |
| **Lip sync** | 100% | ✅ LatentSync 1.6 |
| **TTS voiceover** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + voiceover-first + timeline editing + auto-transcription + speed control + tone control |
| **Auto publish** | 100% | ✅ Douyin / WeChat Channels / Bilibili / Xiaohongshu |
| **User auth** | 100% | ✅ Phone number + JWT |
| **Paid membership** | 100% | ✅ Alipay web payment + auto activation |
| **Deployment/ops** | 100% | ✅ PM2 + Watchdog |


---

## 📎 Related Docs

- [Detailed dev logs (DevLogs)](Docs/DevLogs/)
- [Deployment manual (DEPLOY_MANUAL)](Docs/DEPLOY_MANUAL.md)
- [x] **MuseTalk 推理性能优化**: server.py v2 重写 — cv2 直读帧(跳过 ffmpeg→PNG)、人脸检测降频(每5帧)、BiSeNet mask 缓存(每5帧)、cv2.VideoWriter 直写(跳过 PNG 写盘)、batch_size 8→32,预估 30min→8-10min (~3x)。
|
||||
- [x] **Remotion 并发渲染优化**: render.ts 新增 concurrency 参数,从默认 8 提升到 16 (56核 CPU),预估 5min→2-3min。
|
||||
|
||||
### Day 26: 前端优化:板块合并 + 序号标题 + UI 精细化
|
||||
- [x] **板块合并**: 首页 9 个独立板块合并为 5 个主板块(配音方式+配音列表→三、配音;视频素材+时间轴→四、素材编辑;历史作品+作品预览→六、作品)。
|
||||
- [x] **中文序号标题**: 一~十编号(首页一~六,发布页七~十),移除所有 emoji 图标。
|
||||
- [x] **embedded 模式**: 6 个组件支持 `embedded` prop,嵌入时不渲染外层卡片/标题。
|
||||
- [x] **配音列表两行布局**: embedded 模式第 1 行语速+生成配音(右对齐),第 2 行配音列表+刷新。
|
||||
- [x] **子组件自渲染子标题**: MaterialSelector/TimelineEditor embedded 时自渲染 h3 子标题+操作按钮同行。
|
||||
- [x] **下拉对齐**: TitleSubtitlePanel 标签统一 `w-20`,下拉 `w-1/3 min-w-[100px]`,垂直对齐。
|
||||
- [x] **参考音频文案简化**: 底部段落移至标题旁,简化为 `(上传3-10秒语音样本)`。
|
||||
- [x] **账户手机号显示**: AccountSettingsDropdown 新增手机号显示。
|
||||
- [x] **标题显示模式对副标题生效**: payload 条件修复 + UI 下拉上移至板块标题行。
|
||||
- [x] **登录后用户信息立即可用**: AuthContext 暴露 `setUser`,登录成功后立即写入用户数据,修复登录后显示"未知账户"的问题。
|
||||
- [x] **文案微调**: 素材描述改为"上传自拍视频,最多可选4个";显示模式选项加"标题"前缀。
|
||||
- [x] **UI/UX 体验优化**: 操作按钮移动端可见(opacity-40)、手机号脱敏、标题字数计数器、时间轴拖拽抓手图标、截取滑块放大。
|
||||
- [x] **代码质量修复**: 密码弹窗 success 清空、MaterialSelector useMemo + disabled 守卫、TimelineEditor useMemo。
|
||||
- [x] **发布页响应式布局**: 平台账号卡片单行布局,移动端紧凑(小图标/小按钮),桌面端宽松(与其他板块风格一致)。
|
||||
- [x] **移动端刷新回顶部**: `scrollRestoration = "manual"` + 列表 scroll 时间门控(`scrollEffectsEnabled` ref,1 秒内禁止自动滚动)+ 延迟兜底 `scrollTo(0,0)`。
|
||||
- [x] **移动端样式预览缩小**: FloatingStylePreview 移动端宽度缩至 160px,位置改为右下角,不遮挡样式调节控件。
|
||||
- [x] **列表滚动条统一隐藏**: 所有列表(BGM/配音/作品/素材/文案提取)滚动条改回 `hide-scrollbar`。
|
||||
- [x] **移动端配音/素材适配**: VoiceSelector 按钮移动端缩小(`px-2 sm:px-4`)修复克隆声音不可见;MaterialSelector 标题行移除 `whitespace-nowrap`,描述移动端隐藏,修复刷新按钮溢出。
|
||||
- [x] **生成配音按钮放大**: 从辅助尺寸(`text-xs px-2 py-1`)升级为主操作尺寸(`text-sm font-medium px-4 py-2`),新增阴影。
|
||||
- [x] **生成进度条位置调整**: 从"六、作品"卡片内部提取到右栏独立卡片,显示在作品卡片上方,更醒目。
|
||||
- [x] **LatentSync 超时修复**: httpx 超时从 1200s(20 分钟)改为 3600s(1 小时),修复 2 分钟以上视频口型推理超时回退问题。
|
||||
- [x] **字幕时间戳节奏映射**: `whisper_service.py` 从全程线性插值改为 Whisper 逐词节奏映射,修复长视频字幕漂移。
|
||||
|
||||
### Day 25: 文案提取修复 + 自定义提示词 + 片头副标题
|
||||
- [x] **抖音文案提取修复**: yt-dlp Fresh cookies 报错,重写 `_download_douyin_manual` 为移动端分享页 + 自动获取 ttwid 方案。
|
||||
- [x] **清理 DOUYIN_COOKIE**: 新方案不再需要手动维护 Cookie,从 `.env`/`config.py`/`service.py` 全面删除。
|
||||
- [x] **AI 智能改写自定义提示词**: 后端 `rewrite_script()` 支持 `custom_prompt` 参数;前端 checkbox 旁新增折叠式提示词编辑区,localStorage 持久化。
|
||||
- [x] **SSR 构建修复**: `useState` 初始化 `localStorage` 访问加 `typeof window` 守卫,修复 `npm run build` 报错。
|
||||
- [x] **片头副标题**: 新增 secondary_title(后端/Remotion/前端全链路),AI 同时生成,独立样式配置,20 字限制。
|
||||
- [x] **前端文案修正**: "AI 洗稿结果"→"AI 改写结果"。
|
||||
- [x] **yt-dlp 升级**: `2025.12.08` → `2026.2.21`。
|
||||
- [x] **参考音频中文文件名修复**: `sanitize_filename()` 将存储路径清洗为 ASCII 安全字符,纯中文名哈希兜底,原始名保留为展示名。
|
||||
|
||||
### Day 24: 鉴权到期治理 + 多素材时间轴稳定性修复
|
||||
- [x] **会员到期请求时失效**: 登录与鉴权接口统一执行 `expires_at` 检查;到期后自动停用账号、清理 session,并返回“会员已到期,请续费”。
|
||||
- [x] **画面比例控制**: 时间轴新增 `9:16 / 16:9` 输出比例选择,前端持久化并透传后端,单素材/多素材统一按目标分辨率处理。
|
||||
- [x] **标题/字幕防溢出**: Remotion 与前端预览统一响应式缩放、自动换行、描边/字距/边距比例缩放,降低预览与成片差异。
|
||||
- [x] **标题显示模式**: 标题行新增“短暂显示/常驻显示”下拉;默认短暂显示(4 秒),用户选择持久化并透传至 Remotion 渲染链路。
|
||||
- [x] **MOV 方向归一化**: 新增旋转元数据解析与 orientation normalize,修复“编码横屏+旋转元数据”导致的竖屏判断偏差。
|
||||
- [x] **多素材拼接稳定性**: 片段 prepare 与 concat 统一 25fps/CFR,concat 增加 `+genpts`,缓解段切换处“画面冻结口型还动”。
|
||||
- [x] **时间轴语义对齐**: 打通 `source_end` 全链路;修复 `sourceStart>0 且 sourceEnd=0` 时长计算;生成时以时间轴可见段 assignments 为准,超出段不参与。
|
||||
- [x] **交互细节优化**: 页面刷新回顶部;素材/历史列表首轮自动滚动抑制,减少恢复状态时页面跳动。
|
||||
|
||||
### Day 23: 配音前置重构 + 素材时间轴编排 + UI 体验优化 + 声音克隆增强
|
||||
|
||||
#### 第一阶段:配音前置
|
||||
- [x] **配音生成独立化**: 新增 `generated_audios` 后端模块(router/schemas/service),5 个 API 端点,复用现有 TTSService / voice_clone_service / task_store。
|
||||
- [x] **配音管理面板**: 前端新增 `useGeneratedAudios` hook + `GeneratedAudiosPanel` 组件,支持生成/试听/改名/删除/选中。
|
||||
- [x] **UI 面板重排序**: 文案 → 标题字幕 → 配音方式 → 配音列表 → 素材选择 → BGM → 生成视频。
|
||||
- [x] **素材区门控**: 未选中配音时素材区显示遮罩,选中后显示配音时长 + 素材均分信息。
|
||||
- [x] **视频生成对接**: workflow.py 新增预生成音频分支(`generated_audio_id`),跳过内联 TTS,向后兼容。
|
||||
- [x] **持久化**: selectedAudioId 加入 useHomePersistence,刷新页面恢复选中配音。
|
||||
|
||||
#### 第二阶段:素材时间轴编排
|
||||
- [x] **时间轴编辑器**: 新增 `TimelineEditor` 组件,wavesurfer.js 音频波形 + 色块可视化素材分配,拖拽分割线调整各段时长。
|
||||
- [x] **素材截取设置**: 新增 `ClipTrimmer` 模态框,HTML5 视频预览 + 双端滑块设置源视频截取起点/终点。
|
||||
- [x] **后端自定义分配**: 新增 `CustomAssignment` 模型,`prepare_segment` 支持 `source_start`,workflow 多素材/单素材流水线支持 `custom_assignments`。
|
||||
- [x] **循环截取修复**: `stream_loop + source_start` 改为两步处理(先裁剪再循环),确保从截取起点循环而非从视频 0s 开始。
|
||||
- [x] **MaterialSelector 精简**: 移除旧的时长信息栏和拖拽排序区(功能迁移到 TimelineEditor)。
|
||||
|
||||
#### 第三阶段:UI 体验优化 + TTS 稳定性
|
||||
- [x] **TTS SoX PATH 修复**: `run_qwen_tts.sh` export conda env bin 到 PATH (Qwen3-TTS 已停用,已被 CosyVoice 3.0 替换)。
|
||||
- [x] **TTS 显存管理**: 每次生成后 `torch.cuda.empty_cache()`,asyncio.to_thread 避免阻塞事件循环 (CosyVoice 沿用相同机制)。
|
||||
- [x] **配音列表按钮统一**: Play/Edit/Delete 按钮右侧同组 hover 显示,与 RefAudioPanel 一致,移除文案摘要。
|
||||
- [x] **素材区解除配音门控**: 移除 MaterialSelector 的 selectedAudio 遮罩,素材随时可上传管理。
|
||||
- [x] **时间轴拖拽排序**: TimelineEditor 色块支持 HTML5 Drag & Drop 调换素材顺序。
|
||||
- [x] **截取设置 Range Slider**: ClipTrimmer 改为单轨道双手柄(紫色起点+粉色终点),替换两个独立滑块。
|
||||
- [x] **截取设置视频预览**: 视频区域可播放/暂停,从 sourceStart 到 sourceEnd 自动停止,拖拽手柄时实时 seek。
|
||||
|
||||
#### 第四阶段:历史文案 + Bug 修复
|
||||
- [x] **历史文案保存与加载**: 新增 `useSavedScripts` hook,手动保存/加载/删除历史文案,独立 localStorage 持久化。
|
||||
- [x] **时间轴拖拽修复**: `reorderSegments` 从属性交换改为数组移动(splice),修复拖拽后时长不跟随素材的 Bug。
|
||||
- [x] **按钮视觉统一**: 文案编辑区 4 个按钮统一为固定高度 `h-7`,移除多余 `<span>` 嵌套。
|
||||
- [x] **底部栏调整**: "保存文案"按钮移至底部右侧,移除预计时长显示。
|
||||
|
||||
#### 第五阶段:字幕语言不匹配 + 视频比例错位修复
|
||||
- [x] **字幕用原文替换 Whisper 转录**: `align()` 新增 `original_text` 参数,字幕文字永远用配音保存的原始文案。
|
||||
- [x] **Remotion 动态视频尺寸**: `calculateMetadata` 从 props 读取真实尺寸,修复标题/字幕比例错位。
|
||||
- [x] **英文空格丢失修复**: `split_word_to_chars` 遇到空格时 flush buffer + pending_space 标记。
|
||||
|
||||
#### 第六阶段:参考音频自动转写 + 语速控制
|
||||
- [x] **Whisper 自动转写 ref_text**: 上传参考音频时自动调用 Whisper 转写内容作为 ref_text,不再使用前端固定文字。
|
||||
- [x] **参考音频自动截取**: 超过 10 秒自动在静音点截取(ffmpeg silencedetect),末尾 0.1 秒淡出避免截断爆音。
|
||||
- [x] **重新识别功能**: 新增 `POST /ref-audios/{id}/retranscribe` 端点 + 前端 RotateCw 按钮,旧音频可重新转写并截取。
|
||||
- [x] **语速控制**: 全链路 speed 参数(前端选择器 → 持久化 → 后端 → CosyVoice `inference_zero_shot(speed=)`),5 档:较慢(0.8)/稍慢(0.9)/正常(1.0)/稍快(1.1)/较快(1.2)。
|
||||
- [x] **缺少参考音频门控**: 声音克隆模式下未选参考音频时,生成配音按钮禁用 + 黄色警告提示。
|
||||
- [x] **Whisper 语言自动检测**: `transcribe()` language 参数改为可选(默认 None = 自动检测),支持多语言参考音频。
|
||||
- [x] **前端清理**: 移除固定 ref_text 常量、朗读引导文字,简化为"上传任意语音样本,系统将自动识别内容并克隆声音"。
|
||||
|
||||
### Day 22: 多素材优化 + AI 翻译 + TTS 多语言
|
||||
- [x] **多素材 Bug 修复**: 6 个高优 Bug(边界溢出、单段 fallback、除零、duration 校验、Whisper 兜底、空列表检查)。
|
||||
- [x] **架构重构**: 多素材从"逐段 LatentSync"重构为"先拼接再推理",推理次数 N→1。
|
||||
- [x] **前端优化**: payload 安全、进度消息、上传自动选中、Material 接口统一、拖拽修复、素材上限 4 个。
|
||||
- [x] **AI 多语言翻译**: 新增 `/api/ai/translate` 接口,前端 9 种语言翻译 + 还原原文。
|
||||
- [x] **TTS 多语言**: EdgeTTS 10 语言声音列表、翻译自动切换声音、声音克隆 language 透传、textLang 持久化。
|
||||
|
||||
### Day 21: 缺陷修复 + 浮动预览 + 发布重构 + 架构优化 + 多素材生成
|
||||
- [x] **Remotion 崩溃容错**: 渲染进程 SIGABRT 退出时检查输出文件,避免误判失败导致标题/字幕丢失。
|
||||
- [x] **首页作品选择持久化**: 修复 `fetchGeneratedVideos` 无条件覆盖恢复值的问题,新增 `preferVideoId` 参数控制选中逻辑。
|
||||
- [x] **发布页作品选择持久化**: 根因为签名 URL 不稳定,全面改用 `video.id` 替代 `path` 进行选择/持久化/比较。
|
||||
- [x] **预取缓存补全**: 首页预取发布页数据时加入 `id` 字段,确保缓存数据可用于持久化匹配。
|
||||
- [x] **浮动样式预览窗口**: 标题字幕预览改为 `position: fixed` 浮动窗口,固定左上角,滚动时始终可见。
|
||||
- [x] **移动端适配**: ScriptEditor 按钮换行、预览默认比例改为 9:16 竖屏。
|
||||
- [x] **多平台发布重构**: 平台配置独立化(DOUYIN_*/WEIXIN_*)、用户隔离 Cookie 管理、抖音刷脸验证二维码、微信发布流程优化。
|
||||
- [x] **前端结构微调**: ScriptExtractionModal 迁移到 features/、contexts 迁移到 shared/contexts/、清理空目录。
|
||||
- [x] **后端模块分层**: materials/tools/ref_audios 三个模块补全 router+schemas+service 分层。
|
||||
- [x] **开发规范更新**: BACKEND_DEV.md 新增渐进原则、DOC_RULES.md 取消 TASK_COMPLETE.md 手动触发约束。
|
||||
- [x] **文档全面更新**: BACKEND_DEV/README、FRONTEND_DEV、DEPLOY_MANUAL、README.md 同步更新。
|
||||
- [x] **多素材视频生成(多机位效果)**: 支持多选素材 + 拖拽排序,按素材数量均分音频时长(对齐 Whisper 字边界)自动切换机位。逐段 LatentSync + FFmpeg 拼接。前端 @dnd-kit 拖拽排序 UI。
|
||||
- [x] **字幕开关移除**: 默认启用逐字高亮字幕,移除开关及相关死代码。
|
||||
- [x] **视频格式扩展**: 上传支持 mkv/webm/flv/wmv/m4v/ts/mts 等常见格式。
|
||||
- [x] **Watchdog 优化**: 健康检查阈值提高到 5 次,新增重启冷却期 120 秒,避免误重启。
|
||||
- [x] **多素材 Bug 修复**: 修复标点分句方案对无句末标点文案无效(改为均分方案)、音频时间偏移导致口型不对齐等缺陷。
|
||||
|
||||
### Day 20: 代码质量与安全优化
|
||||
- [x] **功能性修复**: LatentSync 回退逻辑、任务状态接口认证、User 类型统一。
|
||||
- [x] **性能优化**: N+1 查询修复、视频上传流式处理、httpx 异步替换、GLM 异步包装。
|
||||
- [x] **安全修复**: 硬编码 Cookie 配置化、日志敏感信息脱敏、ffprobe 安全调用、CORS 配置化。
|
||||
- [x] **配置优化**: 存储路径环境变量化、Remotion 预编译加速、LatentSync 绝对路径。
|
||||
- [x] **文档更新**: 更新 DOC_RULES.md 清单,补齐后端与部署文档;更新 SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md。
|
||||
- [x] **缺陷修复**: 修复 Remotion 路径解析、发布页持久化竞态、首页选中回归、素材闭包陷阱。
|
||||
|
||||
### Day 19: 自动发布稳定性与发布体验优化 🚀
|
||||
- [x] **抖音发布稳定性**: 上传入口、封面流程、发布重试、登录失效识别与网络失败快速返回全面增强。
|
||||
- [x] **视频号发布修复**: 标题+标签统一写入“视频描述”,`post_create` 成功信号快速判定,超时改为失败返回。
|
||||
- [x] **成功截图闭环**: 抖音/视频号发布成功截图接入前端,支持用户隔离存储与鉴权访问。
|
||||
- [x] **截图观感优化**: 成功截图延后 3 秒并改为视口截图,修复“截图内容仅占 1/3”问题。
|
||||
- [x] **调试能力开关化**: 新增视频号录屏配置,默认可按环境变量开关,失败排障更直观。
|
||||
- [x] **启动链路统一**: 合并为 `run_backend.sh`(xvfb + headful),统一端口 `8006`,减少多进程混淆。
|
||||
- [x] **发布页防误操作**: 发布中按钮提示“请勿刷新或关闭网页”,并启用刷新/关页二次确认拦截。
|
||||
- [ ] **后续优化**: 发布任务状态恢复机制(任务化 + 状态持久化 + 前端轮询恢复)。
|
||||
|
||||
### Day 18: 后端模块化与规范完善
|
||||
- [x] **模块化迁移**: 路由透传 `modules/*`,业务逻辑集中到 service/workflow。
|
||||
- [x] **视频生成拆分**: 生成流程下沉 workflow,任务状态统一 TaskStore。
|
||||
- [x] **Redis 任务存储**: Redis 优先,不可用自动回退内存。
|
||||
- [x] **仓储层抽离**: Supabase 访问统一 `repositories/*`,deps/auth/admin 全面替换。
|
||||
- [x] **响应规范**: 统一 `success/message/data/code` + 全局异常处理。
|
||||
- [x] **素材重命名**: 新增重命名接口与 Storage `move_file`。
|
||||
- [x] **平台顺序调整**: 抖音/微信视频号/B站/小红书,移除快手。
|
||||
- [x] **后端开发规范**: 新增 `BACKEND_DEV.md`,README 同步模块化结构。
|
||||
- [x] **发布管理体验**: 首页预取路由 + 发布页骨架与缓存,进入更快。
|
||||
- [x] **素材加载优化**: 素材列表并发签名 URL,骨架数量动态。
|
||||
- [x] **预览加载优化**: `preload="metadata"` + hover 预取。
|
||||
|
||||
### Day 17: 前端重构与体验优化

- [x] **UI 组件拆分**: 首页拆分为独立组件,降低 `page.tsx` 复杂度。
- [x] **轻量 FSD 迁移**: `app` 页面轻量化,逻辑集中到 `features/*/model`,通用能力下沉 `shared/*`。
- [x] **Controller Hooks**: Home/Publish 页面逻辑集中到 Controller Hook,Page 仅组合渲染。
- [x] **通用工具抽取**: `media.ts` 统一 API Base / URL / 日期格式化。
- [x] **交互优化**: 选择项持久化、列表内定位、刷新回顶部、最新作品优先预览。
- [x] **发布页改造**: 作品列表卡片化 + 搜索 + 预览弹窗。
- [x] **预览体验**: 预览弹窗统一头部样式与提示文案。
- [x] **预览一致性**: 标题/字幕预览按素材分辨率缩放。
- [x] **标题同步与限制**: 片头标题同步发布标题,输入法合成态兼容,限制 15 字。
- [x] **样式默认与持久化**: 默认样式与字号调整,刷新保留用户选择。
- [x] **性能微优化**: 列表渲染优化 + 并行请求 + localStorage 防抖。
- [x] **资源能力**: 字体/BGM 资源库 + `/api/assets` 接入。
- [x] **音频与字幕修复**: BGM 混音稳定性与字幕断句优化。
- [x] **持久化修复**: 接入 `useHomePersistence`,恢复 `isRestored` 逻辑并通过构建。
- [x] **预览与选择修复**: 发布预览兼容签名 URL,音频试听路径解析,素材/BGM 回退有效项。
- [x] **体验细节优化**: 录音预览 URL 回收,预览弹窗滚动恢复,全局任务提示挂载。

### Day 16: 深度性能优化

- [x] **Qwen-TTS 加速**: 集成 Flash Attention 2(已停用,被 CosyVoice 3.0 替换)。
- [x] **服务守护**: 开发 `Watchdog` 看门狗机制,自动监控并重启僵死服务。
- [x] **LatentSync 性能确认**: 验证 DeepCache + 原生 Flash Attn 生效。
- [x] **文档重构**: 全面更新 README、部署手册及后端文档。

### Day 15: 手机号认证迁移

- [x] **认证系统升级**: 从邮箱迁移至 11 位手机号注册/登录。
- [x] **账户管理**: 新增修改密码、有效期显示、安全退出功能。
- [x] **AI 文案助手**: 升级 GLM-4.7-Flash,支持 B站/抖音链接提取与洗稿。

### Day 14: AI 增强与体验优化

- [x] **AI 标题/标签**: 集成 GLM-4 API 自动生成视频元数据。
- [x] **字幕升级**: Remotion 逐字高亮字幕(卡拉OK效果)及动画片头。
- [x] **模型升级**: 声音克隆已迁移至 CosyVoice 3.0 (0.5B)。

### Day 13: 声音克隆集成

- [x] **声音克隆微服务**: 封装 CosyVoice 3.0 为独立 API(8010 端口,替换 Qwen3-TTS)。
- [x] **参考音频管理**: Supabase 存储桶配置与管理接口。
- [x] **多模态 TTS**: 前端支持 EdgeTTS / Clone Voice 切换。

### Day 12: 移动端适配

- [x] **iOS 兼容**: 修复 Safari 安全区域、状态栏颜色、Cookie 拦截问题。
- [x] **响应式 UI**: 移动端 Header 与发布页重构。

### Day 11: 上传架构重构

- [x] **直传优化**: 前端直传 Supabase Storage,解决 Nginx 30s 超时问题。
- [x] **数据隔离**: 用户素材/视频按 UserID 物理隔离。

### Day 10: HTTPS 与安全

- [x] **HTTPS 部署**: 配置 SSL 证书与 Nginx 反向代理。
- [x] **安全加固**: Supabase Studio 增加 Basic Auth 保护。

### Day 9: 认证系统与发布闭环

- [x] **用户系统**: 基于 Supabase Auth 实现 JWT 认证。
- [x] **发布闭环**: 验证 B站/抖音/小红书自动发布流程。
- [x] **服务自愈**: 配置 PM2 进程守护。

### Day 1-8: 核心功能构建

- [x] **Day 8**: 历史记录持久化与文件管理。
- [x] **Day 7**: 社交媒体自动登录与多平台发布。
- [x] **Day 6**: **LatentSync 1.6** 升级与服务器部署。
- [x] **Day 5**: 前端视频上传与进度反馈。
- [x] **Day 4**: MuseTalk(旧版)口型同步修复。
- [x] **Day 3**: 服务器环境配置与模型权重下载。
- [x] **Day 1-2**: 项目基础框架 (FastAPI + Next.js) 搭建。
||||
---

## 🛤️ 后续规划 (Roadmap)

### 🔴 优先待办

- [x] ~~**配音前置重构 — 第二阶段**: 素材片段截取 + 语音时间轴编排~~ ✅ Day 23 已完成
- [ ] **批量生成架构**: 支持 Excel 导入,批量生产视频。
- [ ] **定时任务后台化**: 迁移前端触发的定时发布到后端 APScheduler。
- [ ] **发布任务恢复机制**: 发布任务化 + 状态持久化 + 前端断点恢复,解决刷新后状态丢失。

### 🔵 长期探索

- [ ] **容器化交付**: 提供完整的 Docker Compose 一键部署包。
- [ ] **分布式队列**: 引入 Celery + Redis 处理超高并发任务。

---
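“定时任务后台化”目前只是规划项。下面是一个不依赖具体调度库的纯标准库 sketch,演示“后端持久化计划任务 + 周期扫描到期任务”的思路;真实实现计划使用 APScheduler,`run_at`、`payload` 等字段名均为假设:

```python
# 示意性 sketch:定时发布任务的持久化与到期扫描。
# 用 JSON 文件代替真实存储,仅用于说明思路,并非仓库实现。
import json
import time
from pathlib import Path


def schedule_publish(store: Path, task_id: str, run_at: float, payload: dict):
    """把一条定时发布任务写入持久化存储(JSON 文件示意)"""
    tasks = json.loads(store.read_text()) if store.exists() else {}
    tasks[task_id] = {"run_at": run_at, "payload": payload, "done": False}
    store.write_text(json.dumps(tasks))


def pop_due_tasks(store: Path, now: float) -> list[dict]:
    """取出所有到期且未执行的任务并标记完成(后端周期调用)"""
    if not store.exists():
        return []
    tasks = json.loads(store.read_text())
    due = []
    for tid, t in tasks.items():
        if not t["done"] and t["run_at"] <= now:
            t["done"] = True  # 标记已触发,避免重复发布
            due.append({"task_id": tid, **t["payload"]})
    store.write_text(json.dumps(tasks))
    return due
```

因为任务落盘,后端重启或前端刷新都不会丢失计划;这正是该项要解决的“前端触发不可靠”问题。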
## 📊 模块完成度

| 模块 | 进度 | 状态 |
|------|------|------|
| **核心 API** | 100% | ✅ 稳定 |
| **Web UI** | 100% | ✅ 稳定 (移动端适配) |
| **唇形同步** | 100% | ✅ LatentSync 1.6 |
| **TTS 配音** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + 配音前置 + 时间轴编排 + 自动转写 + 语速控制 + 语气控制 |
| **自动发布** | 100% | ✅ 抖音/微信视频号/B站/小红书 |
| **用户认证** | 100% | ✅ 手机号 + JWT |
| **付费会员** | 100% | ✅ 支付宝电脑网站支付 + 自动激活 |
| **部署运维** | 100% | ✅ PM2 + Watchdog |

---
## 📎 相关文档

- [详细开发日志 (DevLogs)](Docs/DevLogs/)
- [部署手册 (DEPLOY_MANUAL)](Docs/DEPLOY_MANUAL.md)
README.md
@@ -17,28 +17,30 @@

### 核心能力
- 🎬 **高清唇形同步** - 混合方案:短视频(本仓库当前 `.env` 阈值 100s,可配)用 LatentSync 1.6(高质量 Latent Diffusion),长视频用 MuseTalk 1.5(实时级单步推理),自动路由 + 回退。前端可选模型:默认模型(阈值自动路由)/ 快速模型(速度优先)/ 高级模型(质量优先)。
- 🧠 **小脸口型质量补偿(可选)** - 本地唇形路径支持小脸检测 + 裁切 + 稀疏关键帧超分 + 下半脸贴回补偿链路;默认关闭(`LIPSYNC_SMALL_FACE_ENHANCE=false`),失败自动回退原流程(fail-open)。
- 🎙️ **多模态配音** - 支持 **EdgeTTS**(微软超自然语音, 10 语言)和 **CosyVoice 3.0**(3 秒极速声音克隆, 9 语言 + 18 方言, 语速/语气可调)。上传参考音频自动 Whisper 转写 + 智能截取。配音前置工作流:先生成配音 → 选素材 → 生成视频。
- 📝 **智能字幕** - 集成 faster-whisper + Remotion,自动生成逐字高亮(卡拉OK效果)字幕。
- 🎨 **样式预设** - 12 种标题 + 8 种字幕样式预设,支持预览 + 字号调节 + 自定义字体库。CSS 原生描边渲染,清晰无重影。
- 🏷️ **标题显示模式** - 片头标题支持 `短暂显示` / `常驻显示`,默认短暂显示(4秒),用户偏好自动持久化。
- 📌 **片头副标题** - 可选副标题显示在主标题下方,独立样式配置,AI 可同时生成,20 字限制。
- 🖼️ **作品预览一致性** - 标题/字幕预览与 Remotion 成片统一响应式缩放和自动换行,窄屏画布也稳定显示。
- 🎞️ **多素材多机位** - 支持多选素材 + 时间轴编辑器 (wavesurfer.js 波形可视化),主素材连续循环播放 + 浮动插入镜头块自由叠加,拖拽移动位置、ClipTrimmer 统一编辑截取范围与时长,支持"设为主素材"切换。
- 📐 **画面比例控制** - 时间轴一键切换 `9:16 / 16:9` 输出比例,生成链路全程按目标比例处理。
- 💾 **用户偏好持久化** - 首页状态统一恢复/保存,刷新后延续上次配置;新作品生成后优先选中最新,后续用户手动选择持续持久化。
- 🎵 **背景音乐** - 试听 + 搜索选择 + 混音(当前前端固定混音系数,保持配音音量稳定)。
- 🧩 **统一选择器交互** - 首页/发布页业务选择项统一 SelectPopover(桌面 Popover / 移动端 BottomSheet),支持自动上拉、已选定位与连续预览。
- 🤖 **AI 辅助创作** - 内置 GLM-4.7-Flash,支持 B站/抖音链接文案提取、AI 智能改写(支持自定义提示词)、文案深度学习(博主话题分析+文案生成)、标题/标签自动生成、9 语言翻译。

### 平台化功能
- 📱 **全自动发布** - 支持抖音/微信视频号/B站/小红书立即发布;扫码登录 + Cookie 持久化。
- 🖥️ **发布管理预览** - 支持签名 URL / 相对路径作品预览,确保可直接播放。
- 📸 **发布结果可视化** - 抖音/微信视频号/小红书发布成功后返回截图,发布页结果卡片可直接查看。
- 🧹 **发布后工作区清理引导** - 全平台发布成功后弹出不可误关清理弹窗(失败可重试,达到阈值可暂不清理),仅清输入内容并保留用户偏好。
- ⬇️ **一键下载直达** - 首页与发布成功弹窗下载统一走同源 `attachment` 接口,不再新开标签页播放视频。
- 🛡️ **发布防误操作** - 发布进行中自动提示“请勿刷新或关闭网页”,并拦截刷新/关页二次确认。
- 💳 **付费会员** - 支付宝电脑网站支付自动开通会员,到期自动停用并引导续费,管理员手动激活并存。
- 🔐 **认证与隔离** - 基于 Supabase 的用户隔离,支持手机号注册/登录、密码管理。
- 🛡️ **安全基线** - AI/Tools 接口强制登录鉴权、关键上传链路体积限制、生产环境默认密钥启动拦截。
- 🛡️ **服务守护** - 内置 Watchdog 看门狗机制,自动监控并重启僵死服务,确保 7x24h 稳定运行。
- 🚀 **性能优化** - 编码流水线从 5-6 次有损编码精简至 3 次(prepare_segment → 模型输出 → Remotion)、compose 流复制免重编码、同分辨率跳过 scale、FFmpeg 超时保护、全局视频生成并发限制 (Semaphore(2))、Remotion 4 并发渲染、MuseTalk rawvideo 管道直编码(消除中间有损文件)、模型常驻服务、双 GPU 流水线并发、Redis 任务 TTL 自动清理、workflow 阻塞调用线程池化。
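上面的“阈值自动路由 + 前端模型选择”逻辑可以浓缩成一个小函数来理解。以下仅为示意 sketch,函数名、返回值(`"latentsync"` / `"musetalk"`)均为假设,并非仓库真实接口:

```python
# 示意性 sketch:按音频时长在 LatentSync / MuseTalk 之间路由。
# 阈值对应 LIPSYNC_DURATION_THRESHOLD(本仓库 .env 当前为 100 秒)。


def pick_lipsync_engine(audio_duration_s: float,
                        threshold_s: float = 100.0,
                        model_choice: str = "default") -> str:
    """default 模式按阈值自动路由;fast/quality 显式指定引擎"""
    if model_choice == "fast":
        return "musetalk"      # 快速模型:实时级单步推理,速度优先
    if model_choice == "quality":
        return "latentsync"    # 高级模型:Latent Diffusion,质量优先
    # 默认模型:>= 阈值的长视频走 MuseTalk,短视频走 LatentSync
    return "musetalk" if audio_duration_s >= threshold_s else "latentsync"
```

实际链路在路由之外还有失败回退(一种引擎异常时退回另一条路径),此处从略。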
@@ -63,12 +65,13 @@

我们提供了详尽的开发与部署文档:

### 部署运维
- **[部署手册 (DEPLOY_MANUAL.md)](Docs/DEPLOY_MANUAL.md)** - 👈 **部署请看这里**!包含完整的环境搭建步骤。
- [多平台发布部署说明 (PUBLISH_DEPLOY.md)](Docs/PUBLISH_DEPLOY.md) - 抖音/微信视频号/B站/小红书登录与自动化发布专项文档。
- [参考音频服务部署 (COSYVOICE3_DEPLOY.md)](Docs/COSYVOICE3_DEPLOY.md) - 声音克隆模型部署指南。
- [LatentSync 部署指南 (LATENTSYNC_DEPLOY.md)](Docs/LATENTSYNC_DEPLOY.md) - 唇形同步模型独立部署。
- [MuseTalk 部署指南 (MUSETALK_DEPLOY.md)](Docs/MUSETALK_DEPLOY.md) - 长视频唇形同步模型部署。
- [小脸口型质量补偿链路部署指南 (FACEENHANCE_DEPLOY.md)](Docs/FACEENHANCE_DEPLOY.md) - 小脸口型质量补偿链路部署与验证。
- [Supabase 部署指南 (SUPABASE_DEPLOY.md)](Docs/SUPABASE_DEPLOY.md) - Supabase 与认证系统配置。
- [支付宝部署指南 (ALIPAY_DEPLOY.md)](Docs/ALIPAY_DEPLOY.md) - 支付宝付费开通会员配置。
### 开发文档

@@ -2,7 +2,7 @@
# 复制此文件为 .env 并填入实际值

# 调试模式
DEBUG=false

# Redis 配置 (Celery 任务队列)
REDIS_URL=redis://localhost:6379/0

@@ -83,6 +83,13 @@ MUSETALK_ENCODE_PRESET=slow
# 音频时长 >= 此阈值(秒)用 MuseTalk,< 此阈值用 LatentSync
LIPSYNC_DURATION_THRESHOLD=100

# =============== 小脸口型质量补偿链路 ===============
LIPSYNC_SMALL_FACE_ENHANCE=true
LIPSYNC_SMALL_FACE_THRESHOLD=256
LIPSYNC_SMALL_FACE_UPSCALER=gfpgan
LIPSYNC_SMALL_FACE_GPU_ID=0
LIPSYNC_SMALL_FACE_FAIL_OPEN=true

# =============== 上传配置 ===============
# 最大上传文件大小 (MB)
MAX_UPLOAD_SIZE_MB=500
@@ -37,22 +37,22 @@ class Settings(BaseSettings):
    DOUYIN_BROWSER_CHANNEL: str = ""
    DOUYIN_FORCE_SWIFTSHADER: bool = True

    # Douyin 调试录屏
    DOUYIN_DEBUG_ARTIFACTS: bool = False
    DOUYIN_RECORD_VIDEO: bool = False
    DOUYIN_KEEP_SUCCESS_VIDEO: bool = False
    DOUYIN_RECORD_VIDEO_WIDTH: int = 1280
    DOUYIN_RECORD_VIDEO_HEIGHT: int = 720

    # Xiaohongshu Playwright 配置
    XIAOHONGSHU_HEADLESS_MODE: str = "headless-new"
    XIAOHONGSHU_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
    XIAOHONGSHU_LOCALE: str = "zh-CN"
    XIAOHONGSHU_TIMEZONE_ID: str = "Asia/Shanghai"
    XIAOHONGSHU_CHROME_PATH: str = "/usr/bin/google-chrome"
    XIAOHONGSHU_BROWSER_CHANNEL: str = ""
    XIAOHONGSHU_FORCE_SWIFTSHADER: bool = True
    XIAOHONGSHU_DEBUG_ARTIFACTS: bool = False

    # TTS 配置
    DEFAULT_TTS_VOICE: str = "zh-CN-YunxiNeural"

@@ -78,6 +78,13 @@ class Settings(BaseSettings):
    # 混合唇形同步路由
    LIPSYNC_DURATION_THRESHOLD: float = 120.0  # 秒,>=此值用 MuseTalk

    # 小脸口型质量补偿链路
    LIPSYNC_SMALL_FACE_ENHANCE: bool = False
    LIPSYNC_SMALL_FACE_THRESHOLD: int = 256
    LIPSYNC_SMALL_FACE_UPSCALER: str = "codeformer"
    LIPSYNC_SMALL_FACE_GPU_ID: int = 0
    LIPSYNC_SMALL_FACE_FAIL_OPEN: bool = True

    # Supabase 配置
    SUPABASE_URL: str = ""
    SUPABASE_PUBLIC_URL: str = ""  # 公网访问地址,用于生成前端可访问的 URL
@@ -130,6 +130,20 @@ app.include_router(generated_audios_router, prefix="/api/generated-audios", tags
app.include_router(payment_router)  # /api/payment


@app.on_event("startup")
async def check_jwt_secret():
    if settings.JWT_SECRET_KEY == "your-secret-key-change-in-production":
        if not settings.DEBUG:
            raise RuntimeError(
                "JWT_SECRET_KEY is still the default value! "
                "Set a strong random secret in .env before running in production (DEBUG=False)."
            )
        logger.critical(
            "JWT_SECRET_KEY is still the default value! "
            "Set a strong random secret in .env for production."
        )


@app.on_event("startup")
async def init_admin():
    """
@@ -4,11 +4,12 @@ AI 相关 API 路由

from typing import Optional

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from loguru import logger

from app.services.glm_service import glm_service
from app.core.deps import get_current_user
from app.core.response import success_response


@@ -40,7 +41,7 @@ class TranslateRequest(BaseModel):


@router.post("/translate")
async def translate_text(req: TranslateRequest, current_user: dict = Depends(get_current_user)):
    """
    AI 翻译文案

@@ -57,11 +58,11 @@ async def translate_text(req: TranslateRequest):
        return success_response({"translated_text": translated})
    except Exception as e:
        logger.error(f"Translate failed: {e}")
        raise HTTPException(status_code=500, detail="翻译服务暂时不可用,请稍后重试")


@router.post("/generate-meta")
async def generate_meta(req: GenerateMetaRequest, current_user: dict = Depends(get_current_user)):
    """
    AI 生成视频标题和标签

@@ -80,11 +81,11 @@ async def generate_meta(req: GenerateMetaRequest):
        ).model_dump())
    except Exception as e:
        logger.error(f"Generate meta failed: {e}")
        raise HTTPException(status_code=500, detail="生成标题标签失败,请稍后重试")


@router.post("/rewrite")
async def rewrite_script(req: RewriteRequest, current_user: dict = Depends(get_current_user)):
    """AI 改写文案"""
    if not req.text or not req.text.strip():
        raise HTTPException(status_code=400, detail="文案不能为空")
@@ -95,4 +96,4 @@ async def rewrite_script(req: RewriteRequest):
        return success_response({"rewritten_text": rewritten})
    except Exception as e:
        logger.error(f"Rewrite failed: {e}")
        raise HTTPException(status_code=500, detail="改写服务暂时不可用,请稍后重试")
@@ -152,9 +152,9 @@ async def generate_audio_task(task_id: str, req: GenerateAudioRequest, user_id:
        task_store.update(task_id, {
            "status": "failed",
            "message": f"配音生成失败: {str(e)}",
            "error": str(e),
        })
        logger.error(f"Generate audio failed: {e}\n{traceback.format_exc()}")


async def list_generated_audios(user_id: str) -> dict:
@@ -215,28 +215,28 @@ async def list_generated_audios(user_id: str) -> dict:
    return GeneratedAudioListResponse(items=items).model_dump()


async def delete_all_generated_audios(user_id: str) -> tuple[int, int]:
    """删除用户所有生成的配音(.wav + .json),返回 (删除数量, 失败数量)"""
    try:
        files = await storage_service.list_files(BUCKET, user_id, strict=True)
        deleted_count = 0
        failed_count = 0
        for f in files:
            name = f.get("name", "")
            if not name or name == ".emptyFolderPlaceholder":
                continue
            if name.endswith("_audio.wav") or name.endswith("_audio.json"):
                full_path = f"{user_id}/{name}"
                try:
                    await storage_service.delete_file(BUCKET, full_path)
                    deleted_count += 1
                except Exception as e:
                    failed_count += 1
                    logger.warning(f"Delete audio file failed: {full_path}, {e}")
        return deleted_count, failed_count
    except Exception as e:
        logger.error(f"Delete all generated audios failed: {e}")
        return 0, 1


async def delete_generated_audio(audio_id: str, user_id: str) -> None:
@@ -14,6 +14,8 @@ router = APIRouter()
@router.get("/stream/{material_id:path}")
async def stream_material(material_id: str, current_user: dict = Depends(get_current_user)):
    """直接流式返回素材文件(同源,避免 CORS canvas taint)"""
    if ".." in material_id:
        raise HTTPException(400, "非法素材ID")
    user_id = current_user["id"]
    if not material_id.startswith(f"{user_id}/"):
        raise HTTPException(403, "无权访问此素材")
@@ -52,6 +54,8 @@ async def delete_material(material_id: str, current_user: dict = Depends(get_cur
    try:
        await service.delete_material(material_id, user_id)
        return success_response(message="素材已删除")
    except ValueError as e:
        raise HTTPException(400, str(e))
    except PermissionError as e:
        raise HTTPException(403, str(e))
    except Exception as e:
@@ -7,6 +7,7 @@ import aiofiles
from pathlib import Path
from loguru import logger

from app.core.config import settings as app_settings
from app.services.storage import storage_service


@@ -123,6 +124,9 @@ async def upload_material(request, user_id: str) -> dict:
        async for chunk in request.stream():
            await f.write(chunk)
            total_size += len(chunk)
            max_bytes = app_settings.MAX_UPLOAD_SIZE_MB * 1024 * 1024
            if total_size > max_bytes:
                raise ValueError(f"文件大小超过限制 ({app_settings.MAX_UPLOAD_SIZE_MB}MB)")

            if total_size - last_log > 20 * 1024 * 1024:
                logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
@@ -239,6 +243,8 @@ async def list_materials(user_id: str) -> list[dict]:

async def delete_material(material_id: str, user_id: str) -> None:
    """删除素材"""
    if ".." in material_id:
        raise ValueError("非法素材ID")
    if not material_id.startswith(f"{user_id}/"):
        raise PermissionError("无权删除此素材")
    await storage_service.delete_file(
@@ -249,6 +255,8 @@ async def delete_material(material_id: str, user_id: str) -> None:

async def rename_material(material_id: str, new_name_raw: str, user_id: str) -> dict:
    """重命名素材,返回更新后的素材信息"""
    if ".." in material_id:
        raise ValueError("非法素材ID")
    if not material_id.startswith(f"{user_id}/"):
        raise PermissionError("无权重命名此素材")
@@ -104,6 +104,8 @@ async def upload_ref_audio(file, ref_text: str, user_id: str) -> dict:
    # 创建临时文件
    with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp_input:
        content = await file.read()
        if len(content) > 5 * 1024 * 1024:
            raise ValueError("参考音频文件大小不能超过 5MB")
        tmp_input.write(content)
        tmp_input_path = tmp_input.name
@@ -1,20 +1,54 @@
from fastapi import APIRouter, Depends, UploadFile, File, Form, HTTPException
from typing import Optional
from urllib.parse import urlparse
import traceback
from loguru import logger
from pydantic import BaseModel, Field, field_validator

from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.tools import service
from app.services import creator_scraper
from app.services.creator_scraper import ALLOWED_INPUT_DOMAINS
from app.services.glm_service import glm_service

router = APIRouter()


class AnalyzeCreatorRequest(BaseModel):
    url: str = Field(..., description="博主主页链接(仅支持抖音/B站 https 链接)")

    @field_validator("url")
    @classmethod
    def validate_url_format(cls, value: str) -> str:
        candidate = value.strip()
        if len(candidate) > 500:
            raise ValueError("链接过长")

        parsed = urlparse(candidate)
        if parsed.scheme != "https":
            raise ValueError("仅支持 https 链接")

        hostname = (parsed.hostname or "").lower()
        if hostname not in ALLOWED_INPUT_DOMAINS:
            raise ValueError(f"不支持的域名: {hostname},仅支持抖音和B站")

        return candidate


class GenerateTopicScriptRequest(BaseModel):
    analysis_id: str = Field(..., min_length=8, max_length=80, description="分析结果ID")
    topic: str = Field(..., min_length=2, max_length=30, description="选中的话题(2-30字)")
    word_count: int = Field(..., ge=80, le=1000, description="目标字数(80-1000)")


@router.post("/extract-script")
async def extract_script_tool(
    file: Optional[UploadFile] = File(None),
    url: Optional[str] = Form(None),
    rewrite: bool = Form(True),
    custom_prompt: Optional[str] = Form(None),
    current_user: dict = Depends(get_current_user),
):
    """独立文案提取工具"""
    try:
@@ -29,5 +63,64 @@ async def extract_script_tool(
        logger.error(traceback.format_exc())
        msg = str(e)
        if "Fresh cookies" in msg:
            raise HTTPException(500, "下载失败:目标平台开启了反爬验证,请过段时间重试或直接上传视频文件。")
        raise HTTPException(500, "文案提取失败,请稍后重试")


@router.post("/analyze-creator")
async def analyze_creator(
    req: AnalyzeCreatorRequest,
    current_user: dict = Depends(get_current_user),
):
    """分析博主内容并返回热门话题"""
    try:
        user_id = str(current_user.get("id") or "").strip()
        if not user_id:
            raise HTTPException(401, "登录状态无效,请重新登录")

        creator_result = await creator_scraper.scrape_creator_titles(req.url, user_id=user_id)
        titles = creator_result.get("titles") or []
        topics = await glm_service.analyze_topics(titles)

        analysis_id = creator_scraper.cache_titles(titles, user_id)

        return success_response({
            "platform": creator_result.get("platform", ""),
            "creator_name": creator_result.get("creator_name", ""),
            "topics": topics,
            "analysis_id": analysis_id,
            "fetched_count": creator_result.get("fetched_count", len(titles)),
        })
    except ValueError as e:
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Analyze creator failed: {e}")
        logger.error(traceback.format_exc())
        raise HTTPException(500, "博主内容分析失败,请稍后重试")


@router.post("/generate-topic-script")
async def generate_topic_script(
    req: GenerateTopicScriptRequest,
    current_user: dict = Depends(get_current_user),
):
    """根据话题生成文案"""
    try:
        user_id = str(current_user.get("id") or "").strip()
        if not user_id:
            raise HTTPException(401, "登录状态无效,请重新登录")

        titles = creator_scraper.get_cached_titles(req.analysis_id, user_id)
        script = await glm_service.generate_script_from_topic(req.topic, req.word_count, titles)

        return success_response({"script": script})
    except ValueError as e:
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Generate topic script failed: {e}")
        logger.error(traceback.format_exc())
        raise HTTPException(500, "文案生成失败,请稍后重试")
@@ -8,7 +8,7 @@ import subprocess
import traceback
from pathlib import Path
from typing import Optional, Any
from urllib.parse import unquote, parse_qs, urlparse

import httpx
from loguru import logger
@@ -41,7 +41,19 @@ async def extract_script(file=None, url: Optional[str] = None, rewrite: bool = T
            raise ValueError("文件名无效")
        safe_filename = Path(filename).name.replace(" ", "_")
        temp_path = temp_dir / f"tool_extract_{timestamp}_{safe_filename}"
        max_bytes = 500 * 1024 * 1024  # 500MB
        total_written = 0
        with open(temp_path, "wb") as dst:
            while True:
                chunk = file.file.read(1024 * 1024)
                if not chunk:
                    break
                total_written += len(chunk)
                if total_written > max_bytes:
                    dst.close()
                    os.remove(temp_path)
                    raise ValueError("上传文件大小不能超过 500MB")
                dst.write(chunk)
        logger.info(f"Tool processing upload file: {temp_path}")
    else:
        temp_path = await _download_video(url, temp_dir, timestamp)
@@ -49,6 +61,13 @@ async def extract_script(file=None, url: Optional[str] = None, rewrite: bool = T
    if not temp_path or not temp_path.exists():
        raise ValueError("文件获取失败")

    # 下载文件体积检查(500MB 上限)
    max_download_bytes = 500 * 1024 * 1024
    file_size = temp_path.stat().st_size
    if file_size > max_download_bytes:
        os.remove(temp_path)
        raise ValueError(f"下载的文件过大({file_size / (1024*1024):.0f}MB),上限 500MB")

    # 1.5 安全转换: 强制转为 WAV (16k)
    audio_path = temp_dir / f"extract_audio_{timestamp}.wav"
    try:
@@ -193,10 +212,9 @@ async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> O

    logger.info(f"[douyin-fallback] Final URL: {final_url}")

    video_id = _extract_douyin_video_id(final_url)
    if not video_id:
        video_id = _extract_douyin_video_id(url)

    if not video_id:
        logger.error("[douyin-fallback] Could not extract video_id")
@@ -217,7 +235,8 @@ async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> O
                "cbUrlProtocol": "https", "union": True,
            }
        )
        fresh_ttwid = ttwid_resp.cookies.get("ttwid")
        ttwid = str(fresh_ttwid) if fresh_ttwid else ""
        logger.info(f"[douyin-fallback] Got fresh ttwid (len={len(ttwid)})")
    except Exception as e:
        logger.warning(f"[douyin-fallback] Failed to get ttwid: {e}")
@@ -277,6 +296,39 @@ async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> O
    return None


def _extract_douyin_video_id(candidate_url: str) -> Optional[str]:
    """从抖音 URL 中提取视频 ID,兼容 video/share/video/modal_id/vid 等形态"""
    if not candidate_url:
        return None

    decoded_url = unquote(candidate_url)
    parsed = urlparse(decoded_url)

    for source in (decoded_url, parsed.path):
        for pattern in (r"/video/(\d+)", r"/share/video/(\d+)"):
            match = re.search(pattern, source)
            if match:
                return match.group(1)

    id_keys = ("modal_id", "vid", "video_id", "aweme_id", "item_id")
    for pairs in (parse_qs(parsed.query), parse_qs(parsed.fragment)):
        for key in id_keys:
            values = pairs.get(key, [])
            for value in values:
                match = re.search(r"(\d+)", value)
                if match:
                    return match.group(1)

    inline_match = re.search(
        r"(?:[?&#](?:modal_id|vid|video_id|aweme_id|item_id)=)(\d+)",
        decoded_url,
    )
    if inline_match:
        return inline_match.group(1)

    return None


async def _download_bilibili_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
    """手动下载 Bilibili 视频 (Playwright Fallback)"""
    from playwright.async_api import async_playwright
@@ -1,4 +1,5 @@
import os
import re
import tempfile
import uuid

@@ -144,6 +145,8 @@ async def list_generated(current_user: dict = Depends(get_current_user)):

@router.get("/generated/{video_id}/download")
async def download_generated(video_id: str, current_user: dict = Depends(get_current_user)):
    if not re.match(r'^[A-Za-z0-9_-]+$', video_id):
        raise HTTPException(status_code=400, detail="非法 video_id")
    user_id = current_user["id"]
    storage_path = f"{user_id}/{video_id}.mp4"
    local_path = storage_service.get_local_file_path(
@@ -162,6 +165,8 @@ async def download_generated(video_id: str, current_user: dict = Depends(get_cur

@router.delete("/generated/{video_id}")
async def delete_generated(video_id: str, current_user: dict = Depends(get_current_user)):
    if not re.match(r'^[A-Za-z0-9_-]+$', video_id):
        raise HTTPException(status_code=400, detail="非法 video_id")
    result = await delete_generated_video(current_user["id"], video_id)
    return success_response(result, message="视频已删除")
@@ -188,16 +188,16 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us
    try:
        start_time = time.time()

        # ── 确定素材列表(优先信任 req.material_paths 去重列表)──
        material_paths: List[str] = []
        if req.material_paths and len(req.material_paths) >= 1:
            material_paths = req.material_paths
        else:
            material_paths = [req.material_path]

        is_multi = len(material_paths) > 1 or (
            req.custom_assignments is not None and len(req.custom_assignments) > 1
        )
        target_resolution = (1080, 1920) if req.output_aspect_ratio == "9:16" else (1920, 1080)

        logger.info(
@@ -341,8 +341,18 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us
        # ══════════════════════════════════════
        _update_task(task_id, progress=12, message="正在分配素材...")

        if req.custom_assignments and len(req.custom_assignments) >= 1:
            # 硬上限校验
            if len(req.custom_assignments) > 50:
                raise ValueError(f"custom_assignments 数量超限: {len(req.custom_assignments)}")
            # 校验所有 assignment 的 material_path 都在前端声明的 material_paths 中
            known_paths = set(material_paths)
            unknown = [a.material_path for a in req.custom_assignments if a.material_path not in known_paths]
            if unknown:
                logger.warning(f"[MultiMat] custom_assignments 包含未知素材路径: {unknown[:3]},终止生成")
                raise ValueError(f"素材路径校验失败: 包含 {len(unknown)} 个未知路径")

            # 用户自定义分配(多镜头模式:主素材可重复出现)
            assignments = [
                {
                    "material_path": a.material_path,
@@ -373,20 +383,13 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us
                captions_path = None
            else:
                captions_path = None
        else:
            assignments, captions_path = await _whisper_and_split()

        # 扩展段覆盖完整音频范围(仅自动均分时执行,自定义分配已精确计算)
        audio_duration = await _run_blocking(video._get_duration, str(audio_path))
        if not req.custom_assignments and assignments and audio_duration > 0:
            assignments[0]["start"] = 0.0
            assignments[-1]["end"] = audio_duration

@@ -398,65 +401,73 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us

        lipsync_start = time.time()

        # ── 第一步:去重下载所有素材并检测分辨率 ──
        unique_paths = list(dict.fromkeys(a["material_path"] for a in assignments))
        path_to_local: dict = {}  # material_path → 本地文件
        path_to_res: dict = {}    # material_path → 分辨率

        # 并发限流(每个任务独立 Semaphore,峰值 2×4=8 个 ffmpeg 进程)
        _segment_sem = asyncio.Semaphore(4)

        async def _download_unique(mat_path: str, idx: int):
            """去重下载单个素材并归一化方向"""
            async with _segment_sem:
                material_local = temp_dir / f"{task_id}_material_{idx}.mp4"
                temp_files.append(material_local)
                await _download_material(mat_path, material_local)

                normalized_material = temp_dir / f"{task_id}_material_{idx}_norm.mp4"
                normalized_result = await _run_blocking(
                    video.normalize_orientation,
                    str(material_local),
                    str(normalized_material),
|
||||
)
|
||||
if normalized_result != str(material_local):
|
||||
temp_files.append(normalized_material)
|
||||
material_local = normalized_material
|
||||
|
||||
download_tasks = [
|
||||
_download_and_normalize(i, assignment)
|
||||
for i, assignment in enumerate(assignments)
|
||||
]
|
||||
download_results = await asyncio.gather(*download_tasks)
|
||||
for local, res in download_results:
|
||||
material_locals.append(local)
|
||||
resolutions.append(res)
|
||||
res = video.get_resolution(str(material_local))
|
||||
return mat_path, material_local, res
|
||||
|
||||
download_results = await asyncio.gather(*[
|
||||
_download_unique(p, i) for i, p in enumerate(unique_paths)
|
||||
])
|
||||
for mat_path, local, res in download_results:
|
||||
path_to_local[mat_path] = local
|
||||
path_to_res[mat_path] = res
|
||||
|
||||
logger.info(f"[MultiMat] 去重下载 {len(unique_paths)} 个素材(共 {num_segments} 个段)")
|
||||
|
||||
# 按用户选择的画面比例统一分辨率
|
||||
base_res = target_resolution
|
||||
need_scale = any(r != base_res for r in resolutions)
|
||||
need_scale = any(r != base_res for r in path_to_res.values())
|
||||
if need_scale:
|
||||
logger.info(f"[MultiMat] 素材分辨率不一致,统一到 {base_res[0]}x{base_res[1]}")
|
||||
|
||||
# ── 第二步:并行裁剪每段素材到对应时长 ──
|
||||
# ── 第二步:并行裁剪每段素材到对应时长(通过映射找到已下载文件)──
|
||||
prepared_segments: List[Optional[Path]] = [None] * num_segments
|
||||
|
||||
async def _prepare_one_segment(i: int, assignment: dict):
|
||||
"""将单个素材裁剪/循环到对应时长"""
|
||||
seg_dur = assignment["end"] - assignment["start"]
|
||||
prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
|
||||
temp_files.append(prepared_path)
|
||||
prepare_target_res = None if resolutions[i] == base_res else base_res
|
||||
async with _segment_sem:
|
||||
seg_dur = assignment["end"] - assignment["start"]
|
||||
prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
|
||||
temp_files.append(prepared_path)
|
||||
mat_local = path_to_local[assignment["material_path"]]
|
||||
mat_res = path_to_res[assignment["material_path"]]
|
||||
prepare_target_res = None if mat_res == base_res else base_res
|
||||
|
||||
await _run_blocking(
|
||||
video.prepare_segment,
|
||||
str(material_locals[i]),
|
||||
seg_dur,
|
||||
str(prepared_path),
|
||||
prepare_target_res,
|
||||
assignment.get("source_start", 0.0),
|
||||
assignment.get("source_end"),
|
||||
25,
|
||||
)
|
||||
return i, prepared_path
|
||||
await _run_blocking(
|
||||
video.prepare_segment,
|
||||
str(mat_local),
|
||||
seg_dur,
|
||||
str(prepared_path),
|
||||
prepare_target_res,
|
||||
assignment.get("source_start", 0.0),
|
||||
assignment.get("source_end"),
|
||||
25,
|
||||
)
|
||||
return i, prepared_path
|
||||
|
||||
_update_task(
|
||||
task_id,
|
||||
|
||||
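The dedup-then-fan-out pattern in the hunk above (order-preserving dedup via `dict.fromkeys`, plus a per-task `asyncio.Semaphore` capping concurrent workers) can be sketched in isolation. This is a minimal illustration, not the repo's code: `fetch_all` and the fake download are hypothetical stand-ins.

```python
import asyncio

async def fetch_all(paths: list[str], limit: int = 4) -> dict[str, str]:
    # Order-preserving dedup: repeated segments share one download
    unique = list(dict.fromkeys(paths))
    sem = asyncio.Semaphore(limit)  # cap concurrent workers per task

    async def fetch(p: str) -> tuple[str, str]:
        async with sem:
            await asyncio.sleep(0)  # stand-in for the real download
            return p, f"/tmp/{abs(hash(p))}.mp4"

    results = await asyncio.gather(*(fetch(p) for p in unique))
    return dict(results)

# "a.mp4" appears twice but is fetched only once
mapping = asyncio.run(fetch_all(["a.mp4", "b.mp4", "a.mp4"]))
```

Each segment then looks up its local file through the returned mapping, which is why a material reused across shots costs only one download.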
backend/app/services/creator_scraper.py (new file, 1301 lines)
File diff suppressed because it is too large
@@ -3,8 +3,10 @@ GLM AI 服务
使用智谱 GLM 生成标题和标签
"""

import asyncio
import json
import re
from typing import Any, Optional, cast
from loguru import logger
from zai import ZhipuAiClient

@@ -25,6 +27,48 @@ class GLMService:
self.client = ZhipuAiClient(api_key=settings.GLM_API_KEY)
return self.client

async def _call_glm(
self,
*,
prompt: str,
max_tokens: int,
temperature: float,
action: str,
timeout_seconds: float = 85.0,
) -> str:
"""统一 GLM 调用入口,避免重复调用代码"""
client = self._get_client()
logger.info(
f"{action} | model={settings.GLM_MODEL} | max_tokens={max_tokens} | temperature={temperature}"
)

try:
response = await asyncio.wait_for(
asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"},
max_tokens=max_tokens,
temperature=temperature,
),
timeout=timeout_seconds,
)
except asyncio.TimeoutError as exc:
raise Exception("GLM 请求超时,请稍后重试") from exc

completion = cast(Any, response)
choices = getattr(completion, "choices", None)
if not choices:
raise Exception("AI 返回内容为空")

message = getattr(choices[0], "message", None)
content = getattr(message, "content", "")
text = content.strip() if isinstance(content, str) else str(content or "").strip()
if not text:
raise Exception("AI 返回内容为空")
return text

async def generate_title_tags(self, text: str) -> dict:
"""
根据口播文案生成标题和标签
@@ -50,22 +94,13 @@ class GLMService:
{{"title": "标题", "secondary_title": "副标题", "tags": ["标签1", "标签2", "标签3"]}}"""

try:
client = self._get_client()
logger.info(f"Calling GLM API with model: {settings.GLM_MODEL}")

# 使用 asyncio.to_thread 包装同步 SDK 调用,避免阻塞事件循环
import asyncio
response = await asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"}, # 禁用思考模式,加快响应
content = await self._call_glm(
prompt=prompt,
max_tokens=500,
temperature=0.7
temperature=0.7,
action="生成标题与标签",
timeout_seconds=75.0,
)

# 提取生成的内容
content = response.choices[0].message.content
logger.info(f"GLM response (model: {settings.GLM_MODEL}): {content}")

# 解析 JSON
@@ -76,7 +111,7 @@ class GLMService:
logger.error(f"GLM service error: {e}")
raise Exception(f"AI 生成失败: {str(e)}")

async def rewrite_script(self, text: str, custom_prompt: str = None) -> str:
async def rewrite_script(self, text: str, custom_prompt: Optional[str] = None) -> str:
"""
AI 改写文案

@@ -105,28 +140,126 @@ class GLMService:
4. 不要返回多余的解释,只返回改写后的正文"""

try:
client = self._get_client()
logger.info(f"Using GLM to rewrite script")

# 使用 asyncio.to_thread 包装同步 SDK 调用,避免阻塞事件循环
import asyncio
response = await asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"},
content = await self._call_glm(
prompt=prompt,
max_tokens=2000,
temperature=0.8
temperature=0.8,
action="改写文案",
timeout_seconds=85.0,
)

content = response.choices[0].message.content
logger.info("GLM rewrite completed")
return content.strip()
return content

except Exception as e:
logger.error(f"GLM rewrite error: {e}")
raise Exception(f"AI 改写失败: {str(e)}")

async def analyze_topics(self, titles: list[str]) -> list[str]:
"""
分析视频标题列表并归纳热门话题(最多 10 个)
"""
cleaned_titles = [str(title).strip() for title in titles if str(title).strip()]
if not cleaned_titles:
raise Exception("标题列表为空")

limited_titles = cleaned_titles[:50]
titles_text = "\n".join(f"{idx + 1}. {title}" for idx, title in enumerate(limited_titles))

prompt = f"""以下是某短视频博主最近发布的视频标题列表:

{titles_text}

请分析这些标题,归纳总结出该博主内容中最热门的话题方向。

要求:
1. 提取不超过10个话题方向
2. 每个话题用简短短语描述(建议 5-15 字)
3. 按热门程度排序(出现频率高的在前)
4. 只返回话题列表,每行一个,不要编号、解释或多余内容"""

try:
content = await self._call_glm(
prompt=prompt,
max_tokens=500,
temperature=0.5,
action="分析博主话题",
timeout_seconds=85.0,
)
topics = self._parse_topic_lines(content)
if not topics:
raise Exception("未识别到有效话题")

logger.info(f"GLM topic analysis completed: {len(topics)} topics")
return topics[:10]
except Exception as e:
logger.error(f"GLM topic analysis error: {e}")
raise Exception(f"话题分析失败: {str(e)}")

async def generate_script_from_topic(self, topic: str, word_count: int, titles: list[str]) -> str:
"""
根据选中话题与博主标题风格生成文案
"""
topic_value = str(topic or "").strip()
if not topic_value:
raise Exception("话题不能为空")

cleaned_titles = [str(title).strip() for title in titles if str(title).strip()]
if not cleaned_titles:
raise Exception("参考标题为空")

word_count_value = max(80, min(int(word_count), 1000))
sample_titles = "\n".join(f"{idx + 1}. {title}" for idx, title in enumerate(cleaned_titles[:10]))

prompt = f"""请围绕「{topic_value}」这个话题,生成一段短视频口播文案。

参考该博主的标题风格:
{sample_titles}

要求:
1. 文案字数约 {word_count_value} 字
2. 适合短视频口播,语气自然、有吸引力
3. 开头要有钩子吸引观众
4. 只返回文案正文,不要标题和其他说明"""

try:
content = await self._call_glm(
prompt=prompt,
max_tokens=min(word_count_value * 3, 4000),
temperature=0.8,
action=f"按话题生成文案(topic={topic_value})",
timeout_seconds=88.0,
)

logger.info("GLM topic script generation completed")
return content
except Exception as e:
logger.error(f"GLM topic script generation error: {e}")
raise Exception(f"文案生成失败: {str(e)}")

def _parse_topic_lines(self, content: str) -> list[str]:
lines = [line.strip() for line in str(content or "").splitlines()]
topics: list[str] = []
seen: set[str] = set()

for line in lines:
if not line:
continue

cleaned = re.sub(r"^\s*(?:[-*•]+|\d+[.)、\s]+)", "", line).strip()
cleaned = cleaned.strip('"“”')
if not cleaned:
continue

if cleaned in seen:
continue
seen.add(cleaned)
topics.append(cleaned)

if len(topics) >= 10:
break

return topics

async def translate_text(self, text: str, target_lang: str) -> str:
@@ -151,22 +284,15 @@ class GLMService:
3. 翻译要自然流畅,符合目标语言的表达习惯"""

try:
client = self._get_client()
logger.info(f"Using GLM to translate text to {target_lang}")

import asyncio
response = await asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"},
content = await self._call_glm(
prompt=prompt,
max_tokens=2000,
temperature=0.3
temperature=0.3,
action=f"翻译文案(target={target_lang})",
timeout_seconds=75.0,
)

content = response.choices[0].message.content
logger.info("GLM translation completed")
return content.strip()
return content

except Exception as e:
logger.error(f"GLM translate error: {e}")
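The `_call_glm` entry point above combines two things: the blocking SDK call is pushed to a worker thread with `asyncio.to_thread`, and the whole call is bounded by `asyncio.wait_for` so a hung request cannot stall the event loop. The shape of that pattern, with a hypothetical stand-in for the SDK call, is:

```python
import asyncio

def slow_sdk_call(prompt: str) -> str:
    # Stand-in for a blocking SDK method such as chat.completions.create
    return f"reply:{prompt}"

async def call_with_timeout(prompt: str, timeout_seconds: float = 85.0) -> str:
    # Run the sync call in a thread so the loop stays responsive,
    # and bound it so a hung request surfaces as a clear error
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(slow_sdk_call, prompt),
            timeout=timeout_seconds,
        )
    except asyncio.TimeoutError as exc:
        raise RuntimeError("request timed out") from exc

result = asyncio.run(call_with_timeout("hi", timeout_seconds=5.0))
```

Note that `wait_for` cancels only the awaiting task; the thread running the sync call keeps going until the SDK returns, so the timeout protects callers rather than freeing the worker.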
@@ -11,12 +11,13 @@ import asyncio
import httpx
from pathlib import Path
from loguru import logger
from typing import Optional, Literal

from app.core.config import settings
from app.services.small_face_enhance_service import SmallFaceEnhanceService


class LipSyncService:
"""唇形同步服务 - LatentSync 1.6 + MuseTalk 1.5 混合方案"""

def __init__(self):
@@ -38,6 +39,9 @@ class LipSyncService:

# 运行时检测
self._weights_available: Optional[bool] = None

# 小脸增强
self._face_enhance = SmallFaceEnhanceService()

def _check_weights(self) -> bool:
"""检查模型权重是否存在"""
@@ -93,7 +97,7 @@ class LipSyncService:
logger.warning(f"⚠️ 获取媒体时长失败: {e}")
return None

def _loop_video_to_duration(self, video_path: str, output_path: str, target_duration: float) -> str:
"""
循环视频以匹配目标时长
使用 FFmpeg stream_loop 实现无缝循环
@@ -117,47 +121,70 @@ class LipSyncService:
else:
logger.warning(f"⚠️ 视频循环失败: {result.stderr[:200]}")
return video_path
except Exception as e:
logger.warning(f"⚠️ 视频循环异常: {e}")
return video_path

def _mux_audio_to_video(self, video_path: str, audio_path: str, output_path: str) -> bool:
"""将音轨封装到视频,避免增强路径出现无声输出。"""
try:
cmd = [
"ffmpeg", "-y",
"-i", video_path,
"-i", audio_path,
"-map", "0:v:0",
"-map", "1:a:0",
"-c:v", "copy",
"-c:a", "aac",
"-shortest",
output_path,
]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if result.returncode == 0 and Path(output_path).exists():
return True
logger.warning(f"⚠️ 音轨封装失败: {result.stderr[:200]}")
return False
except Exception as e:
logger.warning(f"⚠️ 音轨封装异常: {e}")
return False

async def generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int = 25,
model_mode: Literal["default", "fast", "advanced"] = "default",
) -> str:
"""生成唇形同步视频"""
logger.info(f"🎬 唇形同步任务: {Path(video_path).name} + {Path(audio_path).name}")
Path(output_path).parent.mkdir(parents=True, exist_ok=True)

normalized_mode: Literal["default", "fast", "advanced"] = model_mode
if normalized_mode not in ("default", "fast", "advanced"):
normalized_mode = "default"
logger.info(f"🧠 Lipsync 模式: {normalized_mode}")

if self.use_local:
return await self._local_generate(video_path, audio_path, output_path, fps, normalized_mode)
else:
return await self._remote_generate(video_path, audio_path, output_path, fps, normalized_mode)

async def _local_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""使用 subprocess 调用 LatentSync conda 环境"""

logger.info("⏳ 等待 GPU 资源 (排队中)...")
async with self._lock:
# 使用临时目录存放中间文件
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)

# 获取音频和视频时长
audio_duration = self._get_media_duration(audio_path)
@@ -172,133 +199,206 @@ class LipSyncService:
str(looped_video),
audio_duration
)
else:
actual_video_path = video_path

# 模型路由
force_musetalk = model_mode == "fast"
force_latentsync = model_mode == "advanced"
auto_to_musetalk = (
model_mode == "default"
and audio_duration is not None
and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD
)

if force_musetalk:
logger.info("⚡ 强制快速模型:MuseTalk")
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,快速模型回退到 LatentSync")
elif auto_to_musetalk:
logger.info(
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s,路由到 MuseTalk"
)
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync(长视频,会较慢)")
elif force_latentsync:
logger.info("🎯 强制高级模型:LatentSync")

# 检查 LatentSync 前置条件(仅在需要回退或使用 LatentSync 时)
if not self._check_conda_env():
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path

if not self._check_weights():
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path

if self.use_server:
# 模式 A: 调用常驻服务 (加速模式)
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
else:
actual_video_path = video_path

logger.info("🔄 调用 LatentSync 推理 (subprocess)...")

temp_output = tmpdir / "output.mp4"

# 构建命令
cmd = [
str(self.conda_python),
"-m", "scripts.inference",
"--unet_config_path", "configs/unet/stage2_512.yaml",
"--inference_ckpt_path", "checkpoints/latentsync_unet.pt",
"--inference_steps", str(settings.LATENTSYNC_INFERENCE_STEPS),
"--guidance_scale", str(settings.LATENTSYNC_GUIDANCE_SCALE),
"--video_path", str(actual_video_path), # 使用预处理后的视频
"--audio_path", str(audio_path),
"--video_out_path", str(temp_output),
"--seed", str(settings.LATENTSYNC_SEED),
"--temp_dir", str(tmpdir / "cache"),
]

if settings.LATENTSYNC_ENABLE_DEEPCACHE:
cmd.append("--enable_deepcache")

# 设置环境变量
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = str(self.gpu_id)

logger.info(f"🖥️ 执行命令: {' '.join(cmd[:8])}...")
logger.info(f"🖥️ GPU: CUDA_VISIBLE_DEVICES={self.gpu_id}")

# ── 小脸增强 ──
enhance_result = None
try:
# 使用 asyncio subprocess 实现真正的异步执行
# 这样事件循环可以继续处理其他请求(如进度查询)
process = await asyncio.create_subprocess_exec(
*cmd,
cwd=str(self.latentsync_dir),
env=env,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
enhance_result = self._face_enhance.enhance_if_needed(
video_path=str(actual_video_path),
tmpdir=tmpdir,
gpu_id=settings.LIPSYNC_SMALL_FACE_GPU_ID,
)

# 等待进程完成,带超时
try:
stdout, stderr = await asyncio.wait_for(
process.communicate(),
timeout=900 # 15分钟超时
)
except asyncio.TimeoutError:
process.kill()
await process.wait()
logger.error("⏰ LatentSync 推理超时 (15分钟)")
shutil.copy(video_path, output_path)
return output_path

stdout_text = stdout.decode() if stdout else ""
stderr_text = stderr.decode() if stderr else ""

if process.returncode != 0:
logger.error(f"LatentSync 推理失败:\n{stderr_text}")
logger.error(f"stdout:\n{stdout_text[-1000:] if stdout_text else 'N/A'}")
# Fallback
shutil.copy(video_path, output_path)
return output_path

logger.info(f"LatentSync 输出:\n{stdout_text[-500:] if stdout_text else 'N/A'}")

# 检查输出文件
if temp_output.exists():
shutil.copy(temp_output, output_path)
logger.info(f"✅ 唇形同步完成: {output_path}")
return output_path
else:
logger.warning("⚠️ 未找到输出文件,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path

except Exception as e:
logger.error(f"❌ 推理异常: {e}")
shutil.copy(video_path, output_path)
return output_path
if settings.LIPSYNC_SMALL_FACE_FAIL_OPEN:
logger.warning(f"⚠️ 小脸增强失败,跳过: {e}")
else:
raise

if enhance_result and enhance_result.was_enhanced:
track = enhance_result.track
if track is None:
raise RuntimeError("小脸增强轨迹缺失")

# 增强路径:模型推理增强后的人脸视频 → 贴回原视频
temp_sync = tmpdir / "face_sync.mp4"
await self._run_selected_model(
video_path=enhance_result.video_path,
audio_path=audio_path,
output_path=str(temp_sync),
tmpdir=tmpdir,
model_mode=model_mode,
audio_duration=audio_duration,
original_video_path=video_path,
)

try:
blended = self._face_enhance.blend_back(
original_video=str(actual_video_path),
lipsync_video=str(temp_sync),
track=track,
tmpdir=tmpdir,
)
blended_with_audio = tmpdir / "blended_with_audio.mp4"
if not self._mux_audio_to_video(
video_path=str(blended),
audio_path=audio_path,
output_path=str(blended_with_audio),
):
raise RuntimeError("贴回视频音轨封装失败")

shutil.copy(str(blended_with_audio), output_path)
logger.info(f"✅ 小脸增强 + 唇形同步完成: {output_path}")
return output_path
except Exception as e:
if settings.LIPSYNC_SMALL_FACE_FAIL_OPEN:
logger.warning(f"⚠️ 小脸贴回失败,回退原流程: {e}")
else:
raise

# 常规路径(未增强或增强失败)
return await self._run_selected_model(
video_path=str(actual_video_path),
audio_path=audio_path,
output_path=output_path,
tmpdir=tmpdir,
model_mode=model_mode,
audio_duration=audio_duration,
original_video_path=video_path,
)

async def _run_selected_model(
self,
video_path: str,
audio_path: str,
output_path: str,
tmpdir: Path,
model_mode: Literal["default", "fast", "advanced"],
audio_duration: Optional[float],
original_video_path: str,
) -> str:
"""模型路由 + 执行(MuseTalk / LatentSync 常驻服务 / LatentSync subprocess)"""

# 模型路由
force_musetalk = model_mode == "fast"
force_latentsync = model_mode == "advanced"
auto_to_musetalk = (
model_mode == "default"
and audio_duration is not None
and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD
)

if force_musetalk:
logger.info("⚡ 强制快速模型:MuseTalk")
musetalk_result = await self._call_musetalk_server(
video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,快速模型回退到 LatentSync")
elif auto_to_musetalk:
logger.info(
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s,路由到 MuseTalk"
)
musetalk_result = await self._call_musetalk_server(
video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync(长视频,会较慢)")
elif force_latentsync:
logger.info("🎯 强制高级模型:LatentSync")

# 检查 LatentSync 前置条件
if not self._check_conda_env():
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
shutil.copy(original_video_path, output_path)
return output_path

if not self._check_weights():
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
shutil.copy(original_video_path, output_path)
return output_path

if self.use_server:
# 模式 A: 调用常驻服务 (加速模式)
return await self._call_persistent_server(video_path, audio_path, output_path)

logger.info("🔄 调用 LatentSync 推理 (subprocess)...")

temp_output = tmpdir / "output.mp4"

# 构建命令
cmd = [
str(self.conda_python),
"-m", "scripts.inference",
"--unet_config_path", "configs/unet/stage2_512.yaml",
"--inference_ckpt_path", "checkpoints/latentsync_unet.pt",
"--inference_steps", str(settings.LATENTSYNC_INFERENCE_STEPS),
"--guidance_scale", str(settings.LATENTSYNC_GUIDANCE_SCALE),
"--video_path", str(video_path),
"--audio_path", str(audio_path),
"--video_out_path", str(temp_output),
"--seed", str(settings.LATENTSYNC_SEED),
"--temp_dir", str(tmpdir / "cache"),
]

if settings.LATENTSYNC_ENABLE_DEEPCACHE:
cmd.append("--enable_deepcache")

# 设置环境变量
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = str(self.gpu_id)

logger.info(f"🖥️ 执行命令: {' '.join(cmd[:8])}...")
logger.info(f"🖥️ GPU: CUDA_VISIBLE_DEVICES={self.gpu_id}")

try:
process = await asyncio.create_subprocess_exec(
*cmd,
cwd=str(self.latentsync_dir),
env=env,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)

try:
stdout, stderr = await asyncio.wait_for(
process.communicate(),
timeout=900 # 15分钟超时
)
except asyncio.TimeoutError:
process.kill()
await process.wait()
logger.error("⏰ LatentSync 推理超时 (15分钟)")
shutil.copy(original_video_path, output_path)
return output_path

stdout_text = stdout.decode() if stdout else ""
stderr_text = stderr.decode() if stderr else ""

if process.returncode != 0:
logger.error(f"LatentSync 推理失败:\n{stderr_text}")
logger.error(f"stdout:\n{stdout_text[-1000:] if stdout_text else 'N/A'}")
shutil.copy(original_video_path, output_path)
return output_path

logger.info(f"LatentSync 输出:\n{stdout_text[-500:] if stdout_text else 'N/A'}")

if temp_output.exists():
shutil.copy(temp_output, output_path)
logger.info(f"✅ 唇形同步完成: {output_path}")
return output_path
else:
logger.warning("⚠️ 未找到输出文件,使用 Fallback")
shutil.copy(original_video_path, output_path)
return output_path

except Exception as e:
logger.error(f"❌ 推理异常: {e}")
shutil.copy(original_video_path, output_path)
return output_path

async def _call_musetalk_server(
self, video_path: str, audio_path: str, output_path: str
@@ -413,18 +513,18 @@ class LipSyncService:
"请确保 LatentSync 服务已启动 (cd models/LatentSync && python scripts/server.py)"
)

async def _remote_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""调用远程 LatentSync API 服务"""
if model_mode == "fast":
logger.warning("⚠️ 远程模式未接入 MuseTalk,快速模型将使用远程 LatentSync")
logger.info(f"📡 调用远程 API: {self.api_url}")

try:
async with httpx.AsyncClient(timeout=600.0) as client:
@@ -499,4 +599,9 @@ class LipSyncService:
"ready": conda_ok and weights_ok and gpu_ok,
"musetalk_ready": musetalk_ready,
"lipsync_threshold": settings.LIPSYNC_DURATION_THRESHOLD,
"small_face_enhance": {
"enabled": settings.LIPSYNC_SMALL_FACE_ENHANCE,
"threshold": settings.LIPSYNC_SMALL_FACE_THRESHOLD,
"detector_loaded": self._face_enhance._detector_session is not None,
},
}
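The `_mux_audio_to_video` helper added above hinges on the ffmpeg argument list: the video stream is stream-copied (no re-encode), the audio is re-encoded to AAC, and `-shortest` stops at the shorter input. A minimal sketch of building that command (the helper name here is illustrative, not the repo's API):

```python
def build_mux_cmd(video: str, audio: str, out: str) -> list[str]:
    # Copy the video stream untouched, re-encode audio to AAC,
    # and stop at the shorter of the two inputs
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", audio,
        "-map", "0:v:0",   # video from input 0
        "-map", "1:a:0",   # audio from input 1
        "-c:v", "copy",
        "-c:a", "aac",
        "-shortest",
        out,
    ]

cmd = build_mux_cmd("blended.mp4", "voice.wav", "final.mp4")
```

Stream copy keeps the mux step fast and lossless for the video track, which matters here because the blended frames were already encoded once by the blend-back step.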
backend/app/services/small_face_enhance_service.py (new file, 872 lines)
@@ -0,0 +1,872 @@
|
||||
"""
Small-face enhancement service.

For wide shots with small faces: crop + super-resolve -> lipsync inference -> paste back,
raising the quality of the lipsync input.

Single file, single class; consumed by LipSyncService.
"""
from __future__ import annotations

import subprocess
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple, List

from loguru import logger

from app.core.config import settings

try:
    import cv2
    import numpy as np
    _CV2_AVAILABLE = True
except ImportError:
    _CV2_AVAILABLE = False

# ── Module constants ──
PADDING = 0.28           # bbox expansion ratio
DETECT_EVERY = 8         # run detection every N frames
TARGET_SIZE = 512        # super-resolution target size
MASK_FEATHER = 15        # feather radius (px)
MASK_UPPER_RATIO = 0.68  # where the mouth mask starts (covers mouth/chin only)
MASK_SIDE_MARGIN = 0.16  # side margin ratio; avoids touching cheeks/nostrils
SAMPLE_FRAMES = 24       # number of sampled frames
SAMPLE_WINDOW = (0.10, 0.30)  # sampling window (10%-30%)
ENCODE_FPS = 25          # encode frame rate
ENCODE_CRF = 18          # encode quality
EMA_ALPHA = 0.3          # EMA smoothing factor

# Detection filters
MIN_FACE_WIDTH = 50
FACE_ASPECT_MIN = 0.2
FACE_ASPECT_MAX = 1.5
DET_SCORE_THRESH = 0.5
NMS_IOU_THRESH = 0.4

# Weight paths
_PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent.parent
DET_MODEL_PATH = (
    _PROJECT_ROOT
    / "models" / "LatentSync" / "checkpoints"
    / "auxiliary" / "models" / "buffalo_l" / "det_10g.onnx"
)
# ── Data structures ──

@dataclass
class FaceTrack:
    """Per-frame face tracking data (used for cropping + paste-back)."""
    crop_boxes: List[Tuple[int, int, int, int]]  # per-frame (x1, y1, x2, y2)
    face_width_median: float
    frame_count: int
    frame_w: int
    frame_h: int


@dataclass
class EnhanceResult:
    """Return value of enhance_if_needed."""
    video_path: str
    was_enhanced: bool
    track: Optional[FaceTrack] = None
    face_width: float = 0.0


class SmallFaceEnhanceService:
    """Small-face enhancement: detect -> crop -> super-resolve -> (lipsync) -> paste back."""

    def __init__(self):
        self._detector_session = None
        self._sr_model = None
        self._sr_type: Optional[str] = None

    # ================================================================
    # SCRFD face detection (det_10g.onnx, CPU inference)
    # ================================================================

    def _ensure_detector(self) -> bool:
        if self._detector_session is not None:
            return True
        if not DET_MODEL_PATH.exists():
            logger.warning(f"⚠️ SCRFD 权重不存在: {DET_MODEL_PATH}")
            return False
        try:
            import onnxruntime as ort
            self._detector_session = ort.InferenceSession(
                str(DET_MODEL_PATH),
                providers=["CPUExecutionProvider"],
            )
            logger.info("✅ SCRFD 检测器已加载")
            return True
        except Exception as e:
            logger.warning(f"⚠️ SCRFD 初始化失败: {e}")
            return False
    def _detect_faces(self, img_bgr: np.ndarray) -> List[Tuple[np.ndarray, float]]:
        """
        Detect faces with SCRFD.
        Returns: [(bbox_xyxy, score), ...] sorted by area, descending.
        """
        if self._detector_session is None:
            return []

        h, w = img_bgr.shape[:2]
        input_h, input_w = 640, 640

        # ── Preprocess ──
        ratio = min(input_h / h, input_w / w)
        new_h, new_w = int(h * ratio), int(w * ratio)
        resized = cv2.resize(img_bgr, (new_w, new_h))

        padded = np.full((input_h, input_w, 3), 127.5, dtype=np.float32)
        padded[:new_h, :new_w] = resized.astype(np.float32)

        # BGR -> RGB -> normalize
        blob = padded[:, :, ::-1].copy()
        blob = (blob - 127.5) / 128.0
        blob = blob.transpose(2, 0, 1)[np.newaxis].astype(np.float32)

        # ── Inference ──
        input_name = self._detector_session.get_inputs()[0].name
        outputs = self._detector_session.run(None, {input_name: blob})

        # det_10g outputs: [scores_s8, scores_s16, scores_s32,
        #                   bbox_s8, bbox_s16, bbox_s32,
        #                   kps_s8, kps_s16, kps_s32]
        strides = [8, 16, 32]
        all_bboxes = []
        all_scores = []

        for i, stride in enumerate(strides):
            scores = outputs[i].flatten()
            bboxes = outputs[i + 3].reshape(-1, 4)

            # Generate anchor centers
            feat_h = input_h // stride
            feat_w = input_w // stride
            anchors = []
            for y in range(feat_h):
                for x in range(feat_w):
                    cx, cy = x * stride, y * stride
                    anchors.append([cx, cy])
                    anchors.append([cx, cy])  # 2 anchors per cell
            anchors = np.array(anchors, dtype=np.float32)

            # Confidence filter
            mask = scores > DET_SCORE_THRESH
            if not mask.any():
                continue

            f_scores = scores[mask]
            f_bboxes = bboxes[mask]
            f_anchors = anchors[mask]

            # Decode: distance * stride -> xyxy
            decoded = np.empty_like(f_bboxes)
            decoded[:, 0] = f_anchors[:, 0] - f_bboxes[:, 0] * stride
            decoded[:, 1] = f_anchors[:, 1] - f_bboxes[:, 1] * stride
            decoded[:, 2] = f_anchors[:, 0] + f_bboxes[:, 2] * stride
            decoded[:, 3] = f_anchors[:, 1] + f_bboxes[:, 3] * stride

            # Scale back to original image coordinates
            decoded /= ratio

            all_bboxes.append(decoded)
            all_scores.append(f_scores)

        if not all_bboxes:
            return []

        bboxes_cat = np.concatenate(all_bboxes)
        scores_cat = np.concatenate(all_scores)

        # NMS
        keep = self._nms(bboxes_cat, scores_cat, NMS_IOU_THRESH)

        # Size + aspect-ratio filter
        results = []
        for idx in keep:
            bbox = bboxes_cat[idx]
            score = float(scores_cat[idx])
            bw = bbox[2] - bbox[0]
            bh = bbox[3] - bbox[1]
            if bw < MIN_FACE_WIDTH or bh < MIN_FACE_WIDTH:
                continue
            aspect = bw / max(bh, 1)
            if aspect < FACE_ASPECT_MIN or aspect > FACE_ASPECT_MAX:
                continue
            results.append((bbox.copy(), score))

        results.sort(key=lambda x: (x[0][2] - x[0][0]) * (x[0][3] - x[0][1]), reverse=True)
        return results

    @staticmethod
    def _nms(bboxes: np.ndarray, scores: np.ndarray, threshold: float) -> List[int]:
        x1 = bboxes[:, 0]
        y1 = bboxes[:, 1]
        x2 = bboxes[:, 2]
        y2 = bboxes[:, 3]
        areas = (x2 - x1) * (y2 - y1)
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(int(i))
            if order.size == 1:
                break
            xx1 = np.maximum(x1[i], x1[order[1:]])
            yy1 = np.maximum(y1[i], y1[order[1:]])
            xx2 = np.minimum(x2[i], x2[order[1:]])
            yy2 = np.minimum(y2[i], y2[order[1:]])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-6)
            inds = np.where(iou <= threshold)[0]
            order = order[inds + 1]
        return keep
    # ================================================================
    # Video utilities
    # ================================================================

    @staticmethod
    def _get_video_info(video_path: str) -> Optional[Tuple[int, int, int, float]]:
        """Returns (width, height, frame_count, fps)."""
        try:
            import json as _json
            cmd = [
                "ffprobe", "-v", "error",
                "-select_streams", "v:0",
                "-show_entries", "stream=width,height,nb_frames,r_frame_rate,avg_frame_rate",
                "-of", "json",
                video_path,
            ]
            r = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
            if r.returncode != 0:
                return None
            info = _json.loads(r.stdout)
            streams = info.get("streams")
            if not streams:
                return None
            stream = streams[0]
            w, h = int(stream["width"]), int(stream["height"])
            # nb_frames may be "N/A" or missing
            nb_raw = stream.get("nb_frames", "N/A")
            nb = int(nb_raw) if nb_raw not in ("N/A", "") else 0

            def _parse_fps(s: str) -> float:
                if "/" in s:
                    num, den = s.split("/")
                    return float(num) / float(den) if float(den) != 0 else 0.0
                return float(s) if s else 0.0

            # Prefer avg_frame_rate (the true average); r_frame_rate can be a timebase multiple
            avg_fps = _parse_fps(stream.get("avg_frame_rate", "0/0"))
            r_fps = _parse_fps(stream.get("r_frame_rate", "25/1"))
            fps = avg_fps if avg_fps > 0 else (r_fps if r_fps > 0 else 25.0)

            if nb == 0:
                cmd2 = [
                    "ffprobe", "-v", "error",
                    "-show_entries", "format=duration",
                    "-of", "default=noprint_wrappers=1:nokey=1",
                    video_path,
                ]
                r2 = subprocess.run(cmd2, capture_output=True, text=True, timeout=10)
                if r2.returncode == 0 and r2.stdout.strip():
                    nb = int(float(r2.stdout.strip()) * fps)
            return w, h, nb, fps
        except Exception as e:
            logger.warning(f"⚠️ 获取视频信息失败: {e}")
            return None

    @staticmethod
    def _open_video_reader(video_path: str, w: int, h: int,
                           seek_sec: float = 0, duration_sec: float = 0):
        """Open an ffmpeg rawvideo read pipe."""
        cmd = ["ffmpeg"]
        if seek_sec > 0:
            cmd += ["-ss", f"{seek_sec:.3f}"]
        cmd += ["-i", video_path]
        if duration_sec > 0:
            cmd += ["-t", f"{duration_sec:.3f}"]
        cmd += ["-f", "rawvideo", "-pix_fmt", "bgr24", "-v", "quiet", "-"]
        return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

    @staticmethod
    def _read_one_frame(proc, w: int, h: int) -> Optional[np.ndarray]:
        raw = proc.stdout.read(w * h * 3)
        if len(raw) < w * h * 3:
            return None
        return np.frombuffer(raw, dtype=np.uint8).reshape(h, w, 3).copy()

    @staticmethod
    def _open_video_writer(output_path: str, w: int, h: int,
                           fps: int = ENCODE_FPS, crf: int = ENCODE_CRF):
        """Open an ffmpeg rawvideo write pipe."""
        cmd = [
            "ffmpeg", "-y",
            "-f", "rawvideo", "-pix_fmt", "bgr24",
            "-s", f"{w}x{h}", "-r", str(fps), "-i", "-",
            "-c:v", "libx264", "-crf", str(crf),
            "-preset", "fast", "-pix_fmt", "yuv420p",
            output_path,
        ]
        return subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.DEVNULL)
    # ================================================================
    # Phase 2: face size detection
    # ================================================================

    def _detect_face_size(self, video_path: str) -> Optional[float]:
        """
        Uniformly sample the 10%-30% span of the video and return the median
        width of the largest face. None means no face found or detector unavailable.
        """
        if not self._ensure_detector():
            return None

        info = self._get_video_info(video_path)
        if info is None:
            return None
        w, h, nb_frames, fps = info
        if nb_frames < 1 or fps <= 0:
            return None

        # Compute the sampling span
        start_frame = int(nb_frames * SAMPLE_WINDOW[0])
        end_frame = int(nb_frames * SAMPLE_WINDOW[1])
        end_frame = max(end_frame, start_frame + 1)
        n_sample = min(SAMPLE_FRAMES, end_frame - start_frame)
        if n_sample <= 0:
            return None

        step = max(1, (end_frame - start_frame) // n_sample)
        sample_indices = set(range(start_frame, end_frame, step))

        # Seek ffmpeg to the sampling start
        seek_sec = start_frame / fps
        duration_sec = (end_frame - start_frame) / fps + 0.5  # headroom

        proc = self._open_video_reader(video_path, w, h, seek_sec, duration_sec)
        face_widths = []
        try:
            for local_idx in range(end_frame - start_frame + 1):
                frame = self._read_one_frame(proc, w, h)
                if frame is None:
                    break
                global_idx = start_frame + local_idx
                if global_idx not in sample_indices:
                    continue
                faces = self._detect_faces(frame)
                if faces:
                    bbox = faces[0][0]  # largest face
                    face_widths.append(float(bbox[2] - bbox[0]))
        finally:
            proc.stdout.close()
            proc.terminate()
            proc.wait()

        if not face_widths:
            return None

        face_widths.sort()
        mid = len(face_widths) // 2
        if len(face_widths) % 2 == 0:
            return (face_widths[mid - 1] + face_widths[mid]) / 2
        return face_widths[mid]
    # ================================================================
    # Phase 3: cropping + tracking
    # ================================================================

    def _build_face_track(self, video_path: str,
                          w: int, h: int, nb_frames: int) -> Optional[FaceTrack]:
        """
        Frame-by-frame face tracking: detect every DETECT_EVERY frames,
        EMA-interpolate in between. Returns FaceTrack or None on failure.
        """
        if not self._ensure_detector():
            return None

        detect_set = set(range(0, nb_frames, DETECT_EVERY))

        # First pass: keyframe detection
        proc = self._open_video_reader(video_path, w, h)
        keyframe_bboxes = {}
        actual_frames = 0
        try:
            for idx in range(nb_frames):
                frame = self._read_one_frame(proc, w, h)
                if frame is None:
                    break
                actual_frames = idx + 1
                if idx not in detect_set:
                    continue
                faces = self._detect_faces(frame)
                if faces:
                    keyframe_bboxes[idx] = faces[0][0].copy()
        finally:
            proc.stdout.close()
            proc.terminate()
            proc.wait()

        if not keyframe_bboxes:
            return None

        # Use the frame count actually read, avoiding _get_video_info estimation drift
        nb_frames = actual_frames

        # Forward-fill + EMA smoothing
        sorted_keys = sorted(keyframe_bboxes.keys())
        raw_bboxes: List[np.ndarray] = [None] * nb_frames  # type: ignore

        for k in sorted_keys:
            raw_bboxes[k] = keyframe_bboxes[k]

        prev = keyframe_bboxes[sorted_keys[0]]
        for i in range(nb_frames):
            if raw_bboxes[i] is not None:
                prev = raw_bboxes[i]
            else:
                raw_bboxes[i] = prev.copy()

        # EMA smoothing
        smoothed = [raw_bboxes[0].copy()]
        for i in range(1, nb_frames):
            s = EMA_ALPHA * raw_bboxes[i] + (1 - EMA_ALPHA) * smoothed[-1]
            smoothed.append(s)

        # Padded crop boxes (clamped to frame bounds)
        crop_boxes = []
        for bbox in smoothed:
            x1, y1, x2, y2 = bbox
            bw, bh = x2 - x1, y2 - y1
            pad_w, pad_h = bw * PADDING, bh * PADDING
            cx1 = max(0, int(x1 - pad_w))
            cy1 = max(0, int(y1 - pad_h))
            cx2 = min(w, int(x2 + pad_w))
            cy2 = min(h, int(y2 + pad_h))
            crop_boxes.append((cx1, cy1, cx2, cy2))

        # Median face width
        widths = sorted(float(b[2] - b[0]) for b in smoothed)
        median_w = widths[len(widths) // 2]

        return FaceTrack(
            crop_boxes=crop_boxes,
            face_width_median=median_w,
            frame_count=nb_frames,
            frame_w=w,
            frame_h=h,
        )
    # ================================================================
    # Phase 3: super-resolution
    # ================================================================

    def _ensure_upscaler(self, upscaler: str, gpu_id: int) -> bool:
        """Lazily load the super-resolution model."""
        if self._sr_model is not None and self._sr_type == upscaler:
            return True
        try:
            import sys
            import torch

            # torchvision >= 0.20 removed functional_tensor, but basicsr still imports it
            if "torchvision.transforms.functional_tensor" not in sys.modules:
                try:
                    import torchvision.transforms.functional as _F
                    sys.modules["torchvision.transforms.functional_tensor"] = _F
                except ImportError:
                    pass

            device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")

            if upscaler == "gfpgan":
                from gfpgan import GFPGANer
                model_path = _PROJECT_ROOT / "models" / "FaceEnhance" / "GFPGANv1.4.pth"
                if not model_path.exists():
                    logger.warning(f"⚠️ GFPGAN 权重不存在: {model_path}")
                    return False
                self._sr_model = GFPGANer(
                    model_path=str(model_path),
                    upscale=2,
                    arch="clean",
                    channel_multiplier=2,
                    bg_upsampler=None,
                    device=device,
                )
            elif upscaler == "codeformer":
                from basicsr.archs.codeformer_arch import CodeFormer as CodeFormerArch
                model_path = _PROJECT_ROOT / "models" / "FaceEnhance" / "codeformer.pth"
                if not model_path.exists():
                    logger.warning(f"⚠️ CodeFormer 权重不存在: {model_path}")
                    # Try falling back to gfpgan
                    return self._ensure_upscaler("gfpgan", gpu_id)
                net = CodeFormerArch(
                    dim_embd=512, codebook_size=1024, n_head=8, n_layers=9,
                    connect_list=["32", "64", "128", "256"],
                ).to(device)
                ckpt = torch.load(str(model_path), map_location=device, weights_only=False)
                net.load_state_dict(ckpt.get("params_ema", ckpt.get("params", ckpt)))
                net.eval()
                self._sr_model = net
                self._sr_device = device
            else:
                logger.warning(f"⚠️ 未知超分器: {upscaler}")
                return False

            self._sr_type = upscaler
            logger.info(f"✅ 超分器已加载: {upscaler}")
            return True
        except Exception as e:
            logger.warning(f"⚠️ 超分器初始化失败 ({upscaler}): {e}")
            return False

    def _upscale_face(self, face_img: np.ndarray, target_size: int) -> np.ndarray:
        """Enhance a single frame with the loaded SR model; fall back to bicubic on failure."""
        try:
            if self._sr_type == "gfpgan":
                _, _, output = self._sr_model.enhance(
                    face_img, paste_back=False, has_aligned=False,
                )
                if output is not None:
                    return cv2.resize(
                        output, (target_size, target_size),
                        interpolation=cv2.INTER_LANCZOS4,
                    )
            elif self._sr_type == "codeformer":
                import torch
                img = cv2.resize(face_img, (512, 512))
                img_t = (
                    torch.from_numpy(img.astype(np.float32) / 255.0)
                    .permute(2, 0, 1)
                    .unsqueeze(0)
                    .to(self._sr_device)
                )
                with torch.no_grad():
                    out = self._sr_model(img_t, w=0.7)[0]
                out_np = (
                    out.squeeze().permute(1, 2, 0).cpu().numpy() * 255
                ).clip(0, 255).astype(np.uint8)
                return cv2.resize(
                    out_np, (target_size, target_size),
                    interpolation=cv2.INTER_LANCZOS4,
                )
        except Exception as e:
            logger.debug(f"超分失败,回退 bicubic: {e}")

        return cv2.resize(
            face_img, (target_size, target_size),
            interpolation=cv2.INTER_CUBIC,
        )
    # ================================================================
    # Phase 3: crop + super-resolve -> enhanced video
    # ================================================================

    def _crop_and_upscale_video(
        self,
        video_path: str,
        track: FaceTrack,
        tmpdir: Path,
        gpu_id: int,
        source_fps: float,
    ) -> str:
        """
        Crop the face region -> super-resolve sparse keyframes -> emit a
        TARGET_SIZE video. Streamed, so memory use stays bounded.
        """
        output_path = str(tmpdir / "enhanced_face.mp4")
        w, h = track.frame_w, track.frame_h

        upscaler = settings.LIPSYNC_SMALL_FACE_UPSCALER
        sr_available = self._ensure_upscaler(upscaler, gpu_id)
        detect_set = set(range(0, track.frame_count, DETECT_EVERY))

        reader = self._open_video_reader(video_path, w, h)
        out_fps = max(1, int(round(source_fps))) if source_fps > 0 else ENCODE_FPS
        writer = self._open_video_writer(output_path, TARGET_SIZE, TARGET_SIZE, fps=out_fps)

        try:
            for idx in range(track.frame_count):
                frame = self._read_one_frame(reader, w, h)
                if frame is None:
                    break

                cx1, cy1, cx2, cy2 = track.crop_boxes[idx]
                cropped = frame[cy1:cy2, cx1:cx2]

                if sr_available and idx in detect_set:
                    enhanced = self._upscale_face(cropped, TARGET_SIZE)
                else:
                    enhanced = cv2.resize(
                        cropped, (TARGET_SIZE, TARGET_SIZE),
                        interpolation=cv2.INTER_CUBIC,
                    )

                writer.stdin.write(enhanced.tobytes())
        finally:
            reader.stdout.close()
            reader.terminate()
            reader.wait()
            writer.stdin.close()
            writer.wait()

        if not Path(output_path).exists():
            raise RuntimeError("增强视频写入失败")

        return output_path
    # ================================================================
    # Phase 3: paste back
    # ================================================================

    def blend_back(
        self,
        original_video: str,
        lipsync_video: str,
        track: FaceTrack,
        tmpdir,
    ) -> str:
        """
        Paste the lipsync result back onto the original video.
        Lower-face mask + Gaussian feathering + seamlessClone.
        """
        tmpdir = Path(tmpdir)
        output_path = str(tmpdir / "blended_output.mp4")
        w, h = track.frame_w, track.frame_h

        # Read the lipsync video dimensions
        ls_info = self._get_video_info(lipsync_video)
        if ls_info is None:
            raise RuntimeError("无法读取 lipsync 视频信息")
        ls_w, ls_h, ls_frames, ls_fps = ls_info

        if ls_fps <= 0:
            ls_fps = ENCODE_FPS

        # Frame-count guard: the lipsync model emits by audio duration,
        # normally <= the (looped) original video
        if ls_frames <= 0:
            raise RuntimeError(f"lipsync 输出帧数为 {ls_frames},跳过贴回")
        if ls_frames > track.frame_count:
            raise RuntimeError(
                f"帧数异常: lipsync={ls_frames} > original={track.frame_count}"
            )
        blend_count = ls_frames

        orig_info = self._get_video_info(original_video)
        orig_fps = orig_info[3] if orig_info is not None else 0.0
        if orig_fps <= 0:
            orig_fps = ls_fps

        orig_reader = self._open_video_reader(original_video, w, h)
        ls_reader = self._open_video_reader(lipsync_video, ls_w, ls_h)
        writer = self._open_video_writer(
            output_path,
            w,
            h,
            fps=max(1, int(round(ls_fps))),
        )

        current_orig_idx = -1
        current_orig_frame = None

        try:
            for idx in range(blend_count):
                target_orig_idx = min(
                    track.frame_count - 1,
                    int(round((idx / ls_fps) * orig_fps)),
                )

                while current_orig_idx < target_orig_idx:
                    frame = self._read_one_frame(orig_reader, w, h)
                    if frame is None:
                        current_orig_frame = None
                        break
                    current_orig_idx += 1
                    current_orig_frame = frame

                orig_frame = current_orig_frame
                ls_frame = self._read_one_frame(ls_reader, ls_w, ls_h)
                if orig_frame is None or ls_frame is None:
                    break

                cx1, cy1, cx2, cy2 = track.crop_boxes[target_orig_idx]
                crop_w, crop_h = cx2 - cx1, cy2 - cy1

                # Resize the lipsync output to the crop-region size
                ls_resized = cv2.resize(
                    ls_frame, (crop_w, crop_h),
                    interpolation=cv2.INTER_LANCZOS4,
                )

                # Localized mouth mask (covers lips/chin only; keeps nose/eyes untouched)
                mask = np.zeros((crop_h, crop_w), dtype=np.uint8)
                upper = int(crop_h * MASK_UPPER_RATIO)
                left = int(crop_w * MASK_SIDE_MARGIN)
                right = int(crop_w * (1.0 - MASK_SIDE_MARGIN))
                if right - left < 8:
                    left, right = 0, crop_w

                mask[upper:, left:right] = 255

                # A central ellipse boosts the mouth-region weight
                ellipse_center = (crop_w // 2, int(crop_h * 0.82))
                ellipse_axes = (max(8, int(crop_w * 0.22)), max(8, int(crop_h * 0.13)))
                cv2.ellipse(mask, ellipse_center, ellipse_axes, 0, 0, 360, 255, -1)
                mask = cv2.GaussianBlur(mask, (0, 0), MASK_FEATHER)

                # Blend
                blended = self._blend_face_region(
                    orig_frame, ls_resized, mask, cx1, cy1, cx2, cy2,
                )
                writer.stdin.write(blended.tobytes())
        finally:
            for p in (orig_reader, ls_reader):
                p.stdout.close()
                p.terminate()
                p.wait()
            writer.stdin.close()
            writer.wait()

        if not Path(output_path).exists():
            raise RuntimeError("融合视频写入失败")
        return output_path
    @staticmethod
    def _blend_face_region(
        orig: np.ndarray,
        face: np.ndarray,
        mask: np.ndarray,
        x1: int, y1: int, x2: int, y2: int,
    ) -> np.ndarray:
        """Paste back via seamlessClone; fall back to alpha blending on failure."""
        result = orig.copy()
        crop_h, crop_w = face.shape[:2]

        # Try seamlessClone
        try:
            center_x = (x1 + x2) // 2
            center_y = int(y1 + (y2 - y1) * 0.7)
            center_x = max(1, min(center_x, orig.shape[1] - 2))
            center_y = max(1, min(center_y, orig.shape[0] - 2))

            src = np.zeros_like(orig)
            src[y1:y2, x1:x2] = face

            full_mask = np.zeros(orig.shape[:2], dtype=np.uint8)
            full_mask[y1:y2, x1:x2] = mask

            if full_mask.max() > 0:
                cloned = cv2.seamlessClone(
                    src, orig, full_mask, (center_x, center_y), cv2.NORMAL_CLONE,
                )

                # Confine the blend to the mask region so Poisson diffusion
                # cannot ghost the area above the eyes
                alpha = mask.astype(np.float32) / 255.0
                alpha_3ch = np.stack([alpha] * 3, axis=-1)
                roi_orig = orig[y1:y2, x1:x2].astype(np.float32)
                roi_clone = cloned[y1:y2, x1:x2].astype(np.float32)
                blended_roi = roi_orig * (1 - alpha_3ch) + roi_clone * alpha_3ch

                result = orig.copy()
                result[y1:y2, x1:x2] = blended_roi.astype(np.uint8)
                return result
        except Exception:
            pass

        # Fallback: alpha blending
        alpha = mask.astype(np.float32) / 255.0
        alpha_3ch = np.stack([alpha] * 3, axis=-1)
        crop_region = result[y1:y2, x1:x2].astype(np.float32)
        blended = crop_region * (1 - alpha_3ch) + face.astype(np.float32) * alpha_3ch
        result[y1:y2, x1:x2] = blended.astype(np.uint8)
        return result
    # ================================================================
    # Main entry point
    # ================================================================

    def enhance_if_needed(
        self,
        video_path: str,
        tmpdir,
        gpu_id: int,
    ) -> EnhanceResult:
        """
        Main entry: detect a small face -> crop + super-resolve -> return the
        enhanced result. Returns was_enhanced=False when no enhancement is needed.
        """
        if not settings.LIPSYNC_SMALL_FACE_ENHANCE:
            return EnhanceResult(video_path=video_path, was_enhanced=False)

        if not _CV2_AVAILABLE:
            logger.warning("⚠️ opencv/numpy 未安装,小脸增强不可用")
            return EnhanceResult(video_path=video_path, was_enhanced=False)

        start = time.time()
        tmpdir = Path(tmpdir)
        face_dir = tmpdir / "face_enhance"
        face_dir.mkdir(exist_ok=True)

        # ── Detection ──
        face_width = self._detect_face_size(video_path)
        if face_width is None:
            logger.info("小脸增强: 未检测到人脸,跳过")
            return EnhanceResult(video_path=video_path, was_enhanced=False)

        threshold = settings.LIPSYNC_SMALL_FACE_THRESHOLD
        if face_width >= threshold:
            logger.info(
                f"小脸增强: face_w={face_width:.0f}px >= threshold={threshold}px, 跳过"
            )
            return EnhanceResult(
                video_path=video_path, was_enhanced=False, face_width=face_width,
            )

        logger.info(
            f"小脸增强: face_w={face_width:.0f}px < threshold={threshold}px, 触发增强"
        )

        # ── Build tracking ──
        info = self._get_video_info(video_path)
        if info is None:
            raise RuntimeError("无法读取视频信息")
        w, h, nb_frames, fps = info

        track = self._build_face_track(video_path, w, h, nb_frames)
        if track is None:
            raise RuntimeError("人脸追踪失败")

        # ── Crop + super-resolve ──
        enhanced_path = self._crop_and_upscale_video(
            video_path,
            track,
            face_dir,
            gpu_id,
            source_fps=fps,
        )

        # Release GPU cache
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass

        elapsed = time.time() - start
        logger.info(
            f"小脸增强: face_w={face_width:.0f}px threshold={threshold}px "
            f"enhanced=True upscaler={settings.LIPSYNC_SMALL_FACE_UPSCALER} "
            f"time={elapsed:.1f}s"
        )

        return EnhanceResult(
            video_path=enhanced_path,
            was_enhanced=True,
            track=track,
            face_width=face_width,
        )
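The trigger above reduces to "median sampled face width below `LIPSYNC_SMALL_FACE_THRESHOLD`". A minimal illustrative sketch of that decision, separate from the diff; `median_width` and `should_enhance` are hypothetical helper names, not part of the service:

```python
def median_width(widths):
    """Median of sampled face widths, matching the even/odd handling in _detect_face_size."""
    s = sorted(widths)
    mid = len(s) // 2
    if len(s) % 2 == 0:
        return (s[mid - 1] + s[mid]) / 2
    return s[mid]

def should_enhance(widths, threshold=256):
    """Trigger enhancement only when a face was found and it is small."""
    if not widths:
        return False  # no face detected -> skip, mirroring the fail-open path
    return median_width(widths) < threshold

print(should_enhance([240.0, 250.0, 260.0]))  # median 250 < 256 -> True
print(should_enhance([300.0, 310.0]))         # median 305 -> False
```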
@@ -38,3 +38,7 @@ faster-whisper>=1.0.0
# Copy extraction and AI generation
yt-dlp>=2023.0.0
zai-sdk>=0.2.0

# Small-face enhancement
opencv-python-headless>=4.8.0
gfpgan>=1.3.8
@@ -152,7 +152,7 @@ export default function AccountSettingsDropdown() {
        onClose={closePasswordModal}
        zIndexClassName="z-[200]"
        panelClassName="w-full max-w-md rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden"
        closeOnOverlay={false}
        closeOnOverlay
      >
        <AppModalHeader
          title="修改密码"
@@ -1,4 +1,4 @@
import { useEffect, useMemo, useRef, useState } from "react";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import api from "@/shared/api/axios";
import {
  buildTextShadow,
@@ -256,6 +256,14 @@ export const useHomeController = () => {
      const payload = unwrap(res);
      if (selectedMaterials.includes(materialId) && payload?.id) {
        setSelectedMaterials((prev) => prev.map((x) => (x === materialId ? payload.id : x)));
        // Sync inserts: update materialId and name when rename changes the ID
        if (payload.id !== materialId) {
          setInserts((prev) => prev.map((ins) =>
            ins.materialId === materialId
              ? { ...ins, materialId: payload.id, materialName: editMaterialName.trim() }
              : ins
          ));
        }
      }
      setEditingMaterialId(null);
      setEditMaterialName("");
@@ -287,6 +295,9 @@ export const useHomeController = () => {
  // Copy-extraction modal
  const [extractModalOpen, setExtractModalOpen] = useState(false);

  // Copy deep-learning modal
  const [learningModalOpen, setLearningModalOpen] = useState(false);

  // AI rewrite modal
  const [rewriteModalOpen, setRewriteModalOpen] = useState(false);
@@ -307,6 +318,7 @@
    setUploadError,
    fetchMaterials,
    toggleMaterial,
    reorderMaterials,
    deleteMaterial,
    handleUpload,
  } = useMaterials({
@@ -394,9 +406,17 @@
  });

  const {
    segments: timelineSegments,
    reorderSegments,
    setSourceRange,
    inserts,
    setInserts,
    primaryMaterial: timelinePrimaryMaterial,
    primarySourceStart,
    primarySourceEnd,
    addInsert,
    removeInsert,
    moveInsert,
    resizeInsert,
    setInsertSourceRange,
    setPrimarySourceRange,
    toCustomAssignments,
  } = useTimelineEditor({
    audioDuration: selectedAudio?.duration_sec ?? 0,
@@ -405,16 +425,15 @@
    storageKey,
  });

  // Video URL of the first timeline material (for frame-capture preview)
  // Video URL of the primary material (for frame-capture preview)
  // Uses the backend proxy URL (same-origin) to avoid CORS canvas taint
  const firstTimelineMaterialUrl = useMemo(() => {
    const firstSeg = timelineSegments[0];
    const matId = firstSeg?.materialId ?? selectedMaterials[0];
    const matId = selectedMaterials[0];
    if (!matId) return null;
    const mat = materials.find((m) => m.id === matId);
    if (!mat) return null;
    return `/api/materials/stream/${mat.id}`;
  }, [materials, timelineSegments, selectedMaterials]);
  }, [materials, selectedMaterials]);

  const materialPosterUrl = useVideoFrameCapture(showStylePreview ? firstTimelineMaterialUrl : null);
@@ -952,57 +971,36 @@
      output_aspect_ratio: outputAspectRatio,
    };

    // Multiple materials
    // Multiple materials (multi-shot mode)
    if (selectedMaterials.length > 1) {
      const timelineOrderedIds = timelineSegments
        .map((seg) => seg.materialId)
        .filter((id, index, arr) => arr.indexOf(id) === index);
      const orderedMaterialIds = [
        ...timelineOrderedIds.filter((id) => selectedMaterials.includes(id)),
        ...selectedMaterials.filter((id) => !timelineOrderedIds.includes(id)),
      ];

      const materialPaths = orderedMaterialIds
        .map((id) => materials.find((x) => x.id === id)?.path)
        .filter((path): path is string => !!path);

      if (materialPaths.length === 0) {
        toast.error("多素材解析失败,请刷新素材后重试");
        return;
      }

      payload.material_paths = materialPaths;
      payload.material_path = materialPaths[0];

      // Send the custom timeline assignments
      const assignments = toCustomAssignments();
      if (assignments.length > 0) {
        const assignmentPaths = assignments
          .map((a) => a.material_path)
          .filter((path): path is string => !!path);

        if (assignmentPaths.length === assignments.length) {
          // Visible timeline segments are authoritative: materials beyond the timeline are excluded
          payload.material_paths = assignmentPaths;
          payload.material_path = assignmentPaths[0];
        // Client-side segment-count estimate (aligned with the backend hard cap of 50)
        if (assignments.length > 50) {
          toast.error(`时间轴段数过多(${assignments.length}),请减少插入或使用更长的主素材`);
          return;
        }
        // Primary material path (always selectedMaterials[0])
        const primaryPath = firstMaterialObj.path;
        // De-duplicated material path list; the primary is guaranteed first
        const otherPaths = [...new Set(
          assignments.map((a) => a.material_path).filter((p) => p !== primaryPath)
        )];
        payload.material_path = primaryPath;
        payload.material_paths = [primaryPath, ...otherPaths];
        payload.custom_assignments = assignments;
      } else {
        console.warn(
          "[Timeline] custom_assignments 为空,回退后端自动分配",
          { materials: materialPaths.length }
        );
        // No inserts and no primary trim: degrade to single material
        payload.material_path = firstMaterialObj.path;
      }
    }

    // Single material + trim range
    const singleSeg = timelineSegments[0];
    if (
      selectedMaterials.length === 1
      && singleSeg
      && (singleSeg.sourceStart > 0 || singleSeg.sourceEnd > 0)
    ) {
      payload.custom_assignments = toCustomAssignments();
    if (selectedMaterials.length === 1) {
      const assignments = toCustomAssignments();
      if (assignments.length > 0) {
        payload.custom_assignments = assignments;
      }
    }

    if (selectedSubtitleStyleId) {
@@ -1094,6 +1092,21 @@ export const useHomeController = () => {
|
||||
videoItemRefs.current[id] = el;
|
||||
};
|
||||
|
||||
// 设为主素材:将目标素材移到 selectedMaterials[0]
|
||||
const handleSetPrimary = useCallback((materialId: string) => {
|
||||
setSelectedMaterials((prev) => {
|
||||
const filtered = prev.filter((id) => id !== materialId);
|
||||
return [materialId, ...filtered];
|
||||
});
|
||||
}, [setSelectedMaterials]);
|
||||
|
||||
// 多镜头:插入候选素材(selectedMaterials[1:])
|
||||
const insertCandidates = useMemo(() => {
|
||||
return selectedMaterials.slice(1)
|
||||
.map((id) => materials.find((m) => m.id === id))
|
||||
.filter((m): m is Material => !!m);
|
||||
}, [selectedMaterials, materials]);
|
||||
|
||||
return {
|
||||
apiBase,
|
||||
registerMaterialRef,
|
||||
@@ -1123,6 +1136,8 @@ export const useHomeController = () => {
|
||||
setText,
|
||||
extractModalOpen,
|
||||
setExtractModalOpen,
|
||||
learningModalOpen,
|
||||
setLearningModalOpen,
|
||||
rewriteModalOpen,
|
||||
setRewriteModalOpen,
|
||||
handleGenerateMeta,
|
||||
@@ -1246,9 +1261,20 @@ export const useHomeController = () => {
|
||||
setSpeed,
|
||||
emotion,
|
||||
setEmotion,
|
||||
timelineSegments,
|
||||
reorderSegments,
|
||||
setSourceRange,
|
||||
// Multi-camera timeline
|
||||
inserts,
|
||||
timelinePrimaryMaterial,
|
||||
primarySourceStart,
|
||||
primarySourceEnd,
|
||||
insertCandidates,
|
||||
addInsert,
|
||||
removeInsert,
|
||||
moveInsert,
|
||||
resizeInsert,
|
||||
setInsertSourceRange,
|
||||
setPrimarySourceRange,
|
||||
handleSetPrimary,
|
||||
reorderMaterials,
|
||||
clipTrimmerOpen,
|
||||
setClipTrimmerOpen,
|
||||
clipTrimmerSegmentId,
|
||||
|
||||
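The multi-camera branch above ultimately ships `material_path`, `material_paths`, and `custom_assignments` together in one request. A hypothetical sketch of the resulting payload shape — field names follow the frontend code above, but the paths and timings are invented for illustration:

```typescript
// Hypothetical multi-camera payload; paths and timings are invented.
// custom_assignments covers the audio duration in timeline order, and
// material_paths keeps the primary material in first position (deduped).
const payload = {
  material_path: "/materials/primary.mp4",
  material_paths: ["/materials/primary.mp4", "/materials/insert_a.mp4"],
  custom_assignments: [
    { material_path: "/materials/primary.mp4", start: 0, end: 3, source_start: 0 },
    { material_path: "/materials/insert_a.mp4", start: 3, end: 5, source_start: 0, source_end: 2 },
    { material_path: "/materials/primary.mp4", start: 5, end: 10, source_start: 3 },
  ],
};

// Invariants the code above maintains before sending:
const contiguous = payload.custom_assignments.every(
  (a, i, arr) => i === 0 || arr[i - 1].end === a.start
);
const withinLimit = payload.custom_assignments.length <= 50; // backend hard cap
const primaryFirst = payload.material_paths[0] === payload.material_path;
```

When `custom_assignments` comes back empty, the code falls back to `material_path` only and lets the backend auto-allocate, as the `console.warn` branch shows.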
@@ -1,5 +1,9 @@
import { useCallback, useEffect, useRef, useState } from "react";
import type { Material } from "@/shared/types/material";
import type { InsertSegment } from "@/shared/types/timeline";

// Re-export for downstream consumers (ClipTrimmer, etc.)
export type { InsertSegment };

export interface TimelineSegment {
  id: string;
@@ -12,18 +16,23 @@ export interface TimelineSegment {
  color: string;
}

export interface CustomAssignment {
  material_path: string;
  start: number;
  end: number;
  source_start: number;
  source_end?: number;
}
export interface CustomAssignment {
  material_path: string;
  start: number;
  end: number;
  source_start: number;
  source_end?: number;
}

const COLORS = ["#8b5cf6", "#ec4899", "#06b6d4", "#f59e0b", "#10b981", "#f97316"];
const MAX_INSERTS = 10;
const DEFAULT_INSERT_DURATION = 3;
const MIN_GAP = 0.5;

export type AddInsertResult = "ok" | "limit" | "no_space";

/** Serializable subset for localStorage */
interface SegmentSnapshot {
interface InsertSnapshot {
  materialId: string;
  start: number;
  end: number;
@@ -31,56 +40,11 @@ interface SegmentSnapshot {
  sourceEnd: number;
}

/** Get effective duration of a segment (clipped range or full material duration) */
function getEffectiveDuration(
  seg: { sourceStart: number; sourceEnd: number; materialId: string },
  mats: Material[]
): number {
  const mat = mats.find((m) => m.id === seg.materialId);
  const matDur = mat?.duration_sec ?? 0;
  if (seg.sourceEnd > seg.sourceStart) return seg.sourceEnd - seg.sourceStart;
  if (seg.sourceStart > 0) return Math.max(matDur - seg.sourceStart, 0);
  return matDur;
}

/**
 * Recalculate segment start/end positions based on effective durations.
 * - Segments placed sequentially by effective duration
 * - Segments exceeding audioDuration keep their positions (overflow, start >= duration)
 * - Last visible segment is capped/extended to exactly audioDuration (loop fill)
 */
function recalcPositions(
  segs: TimelineSegment[],
  mats: Material[],
  duration: number
): TimelineSegment[] {
  if (segs.length === 0 || duration <= 0) return segs;

  const fallbackDur = duration / segs.length;
  let cursor = 0;
  const result = segs.map((seg) => {
    const effDur = getEffectiveDuration(seg, mats);
    const dur = effDur > 0 ? effDur : fallbackDur;
    const newSeg = { ...seg, start: cursor, end: cursor + dur };
    cursor += dur;
    return newSeg;
  });

  // Find last segment that starts before audioDuration
  let lastVisibleIdx = -1;
  for (let i = result.length - 1; i >= 0; i--) {
    if (result[i].start < duration) {
      lastVisibleIdx = i;
      break;
    }
  }

  // Cap/extend last visible segment to exactly audioDuration
  if (lastVisibleIdx >= 0) {
    result[lastVisibleIdx] = { ...result[lastVisibleIdx], end: duration };
  }

  return result;
interface MultiCamCache {
  key: string;
  inserts: InsertSnapshot[];
  primarySourceStart: number;
  primarySourceEnd: number;
}

interface UseTimelineEditorOptions {
@@ -96,34 +60,40 @@ export const useTimelineEditor = ({
  selectedMaterials,
  storageKey,
}: UseTimelineEditorOptions) => {
  const [segments, setSegments] = useState<TimelineSegment[]>([]);
  const [inserts, setInserts] = useState<InsertSegment[]>([]);
  const [primarySourceStart, setPrimarySourceStart] = useState(0);
  const [primarySourceEnd, setPrimarySourceEnd] = useState(0);
  const prevKey = useRef("");
  const restoredRef = useRef(false);
  const [prevPrimaryId, setPrevPrimaryId] = useState(selectedMaterials[0]);

  // Refs for stable callbacks (avoid recreating on every materials/duration change)
  const materialsRef = useRef(materials);
  const audioDurationRef = useRef(audioDuration);

  useEffect(() => {
    materialsRef.current = materials;
  }, [materials]);

  useEffect(() => {
    audioDurationRef.current = audioDuration;
  }, [audioDuration]);
  // Refs for stable callbacks
  const materialsRef = useRef(materials);
  const audioDurationRef = useRef(audioDuration);
  const selectedMaterialsRef = useRef(selectedMaterials);

  // Build a durationsKey so segments re-init when material durations become available
  const durationsKey = selectedMaterials
    .map((id) => materials.find((m) => m.id === id)?.duration_sec ?? 0)
    .join(",");
  useEffect(() => { materialsRef.current = materials; }, [materials]);
  useEffect(() => { audioDurationRef.current = audioDuration; }, [audioDuration]);
  useEffect(() => { selectedMaterialsRef.current = selectedMaterials; }, [selectedMaterials]);

  // Build a cache key from materials + duration
  // Computed: primary material
  const primaryMaterial = materials.find((m) => m.id === selectedMaterials[0]);

  // Cache key
  const cacheKey = `${selectedMaterials.join(",")}_${audioDuration.toFixed(1)}`;
  const lsKey = storageKey ? `vigent_${storageKey}_timeline` : null;
  const lsKey = storageKey ? `vigent_${storageKey}_multicam` : null;

  const initSegments = useCallback(() => {
    if (selectedMaterials.length === 0 || audioDuration <= 0) {
      setSegments([]);
  // Reset primary source range when primary material identity changes
  // (React render-time state adjustment pattern for derived state)
  if (selectedMaterials[0] !== prevPrimaryId) {
    setPrevPrimaryId(selectedMaterials[0]);
    setPrimarySourceStart(0);
    setPrimarySourceEnd(0);
  }

  // Initialize / restore from localStorage
  const initInserts = useCallback(() => {
    if (selectedMaterials.length <= 1 || audioDuration <= 0) {
      setInserts([]);
      return;
    }

@@ -132,27 +102,28 @@ export const useTimelineEditor = ({
    try {
      const raw = localStorage.getItem(lsKey);
      if (raw) {
        const saved = JSON.parse(raw) as { key: string; segments: SegmentSnapshot[] };
        if (saved.key === cacheKey && saved.segments.length === selectedMaterials.length) {
          const allMatch = saved.segments.every(
            (s, i) => s.materialId === selectedMaterials[i] || saved.segments.some((ss) => ss.materialId === selectedMaterials[i])
          );
          if (allMatch) {
            const restored: TimelineSegment[] = saved.segments.map((s, i) => {
        const saved: MultiCamCache = JSON.parse(raw);
        if (saved.key === cacheKey) {
          // Validate all insert materialIds still exist
          const existingIds = new Set(materials.map((m) => m.id));
          const validInserts = saved.inserts.filter((s) => existingIds.has(s.materialId));
          if (validInserts.length === saved.inserts.length) {
            const restored: InsertSegment[] = validInserts.map((s, i) => {
              const mat = materials.find((m) => m.id === s.materialId);
              return {
                id: `seg-${i}-${Date.now()}`,
                id: `ins-${i}-${Date.now()}`,
                materialId: s.materialId,
                materialName: mat?.scene || mat?.name || s.materialId,
                start: 0,
                end: 0,
                start: s.start,
                end: s.end,
                sourceStart: s.sourceStart,
                sourceEnd: s.sourceEnd,
                color: COLORS[i % COLORS.length],
              };
            });
            setSegments(recalcPositions(restored, materials, audioDuration));
            restoredRef.current = true;
            setInserts(restored);
            setPrimarySourceStart(saved.primarySourceStart || 0);
            setPrimarySourceEnd(saved.primarySourceEnd || 0);
            return;
          }
        }
@@ -162,95 +133,315 @@ export const useTimelineEditor = ({
      }
    }

    // Create fresh segments — positions derived by recalcPositions
    const newSegments: TimelineSegment[] = selectedMaterials.map((matId, i) => {
      const mat = materials.find((m) => m.id === matId);
      return {
        id: `seg-${i}-${Date.now()}`,
        materialId: matId,
        materialName: mat?.scene || mat?.name || matId,
        start: 0,
        end: 0,
        sourceStart: 0,
        sourceEnd: 0,
        color: COLORS[i % COLORS.length],
      };
    });

    setSegments(recalcPositions(newSegments, materials, audioDuration));
    // Start fresh
    setInserts([]);
    setPrimarySourceStart(0);
    setPrimarySourceEnd(0);
  }, [audioDuration, materials, selectedMaterials, lsKey, cacheKey]);

  // Auto-init when selectedMaterials, audioDuration, or material durations change
  // Auto-init when inputs change
  useEffect(() => {
    const durationsKey = selectedMaterials
      .map((id) => materials.find((m) => m.id === id)?.duration_sec ?? 0)
      .join(",");
    const key = `${selectedMaterials.join(",")}_${audioDuration}_${durationsKey}`;
    if (key !== prevKey.current) {
      prevKey.current = key;
      initSegments();
      // eslint-disable-next-line react-hooks/set-state-in-effect -- initialization on input change
      initInserts();
    }
  }, [selectedMaterials, audioDuration, durationsKey, initSegments]);
  }, [selectedMaterials, audioDuration, materials, initInserts]);

  // Persist segments to localStorage on change (debounced)
  // Persist to localStorage (debounced)
  useEffect(() => {
    if (!lsKey || segments.length === 0) return;
    if (!lsKey || selectedMaterials.length <= 1) return;
    const timeout = setTimeout(() => {
      const snapshots: SegmentSnapshot[] = segments.map((s) => ({
      const snapshots: InsertSnapshot[] = inserts.map((s) => ({
        materialId: s.materialId,
        start: s.start,
        end: s.end,
        sourceStart: s.sourceStart,
        sourceEnd: s.sourceEnd,
      }));
      localStorage.setItem(lsKey, JSON.stringify({ key: cacheKey, segments: snapshots }));
      const cache: MultiCamCache = {
        key: cacheKey,
        inserts: snapshots,
        primarySourceStart,
        primarySourceEnd,
      };
      localStorage.setItem(lsKey, JSON.stringify(cache));
    }, 300);
    return () => clearTimeout(timeout);
  }, [segments, lsKey, cacheKey]);
  }, [inserts, primarySourceStart, primarySourceEnd, lsKey, cacheKey, selectedMaterials.length]);

  const reorderSegments = useCallback(
    (fromIdx: number, toIdx: number) => {
      setSegments((prev) => {
        if (fromIdx < 0 || toIdx < 0 || fromIdx >= prev.length || toIdx >= prev.length) return prev;
        if (fromIdx === toIdx) return prev;
        const next = [...prev];
        // Move the segment: remove from old position, insert at new position
        const [moved] = next.splice(fromIdx, 1);
        next.splice(toIdx, 0, moved);
        return recalcPositions(next, materialsRef.current, audioDurationRef.current);
      });
    },
    []
  );
  // Clean up inserts referencing removed materials
  useEffect(() => {
    const existingIds = new Set(selectedMaterials.slice(1));
    // eslint-disable-next-line react-hooks/set-state-in-effect -- cleanup stale references
    setInserts((prev) => {
      const filtered = prev.filter((ins) => existingIds.has(ins.materialId));
      return filtered.length !== prev.length ? filtered : prev;
    });
  }, [selectedMaterials]);

  const setSourceRange = useCallback(
    (id: string, sourceStart: number, sourceEnd: number) => {
      setSegments((prev) => {
        const updated = prev.map((s) => (s.id === id ? { ...s, sourceStart, sourceEnd } : s));
        return recalcPositions(updated, materialsRef.current, audioDurationRef.current);
      });
    },
    []
  );
  // ── Operations ──

  const addInsert = useCallback((materialId: string): AddInsertResult => {
    const currentInserts = inserts;
    const duration = audioDurationRef.current;

    if (currentInserts.length >= MAX_INSERTS) return "limit";
    if (duration <= 0) return "no_space";

    const mat = materialsRef.current.find((m) => m.id === materialId);
    const sorted = [...currentInserts].sort((a, b) => a.start - b.start);

    // Find first gap that can fit DEFAULT_INSERT_DURATION
    let bestStart = -1;
    let prevEnd = 0;
    for (const ins of sorted) {
      if (ins.start - prevEnd >= DEFAULT_INSERT_DURATION + MIN_GAP) {
        bestStart = prevEnd + MIN_GAP;
        break;
      }
      prevEnd = ins.end;
    }
    // Check trailing gap
    if (bestStart < 0 && duration - prevEnd >= DEFAULT_INSERT_DURATION + MIN_GAP) {
      bestStart = prevEnd + MIN_GAP;
    }

    if (bestStart < 0) return "no_space";

    const newInsert: InsertSegment = {
      id: `ins-${Date.now()}-${Math.random().toString(36).slice(2, 6)}`,
      materialId,
      materialName: mat?.scene || mat?.name || materialId,
      start: bestStart,
      end: Math.min(bestStart + DEFAULT_INSERT_DURATION, duration),
      sourceStart: 0,
      sourceEnd: 0,
      color: COLORS[currentInserts.length % COLORS.length],
    };

    setInserts((prev) => [...prev, newInsert]);
    return "ok";
  }, [inserts]);

  const removeInsert = useCallback((id: string) => {
    setInserts((prev) => prev.filter((ins) => ins.id !== id));
  }, []);

  const moveInsert = useCallback((id: string, newStart: number) => {
    setInserts((prev) => {
      const duration = audioDurationRef.current;
      const target = prev.find((ins) => ins.id === id);
      if (!target) return prev;

      const len = target.end - target.start;
      let clampedStart = Math.max(0, Math.min(newStart, duration - len));
      let clampedEnd = clampedStart + len;

      // Prevent overlap with other inserts
      const others = prev.filter((ins) => ins.id !== id).sort((a, b) => a.start - b.start);
      for (const other of others) {
        if (clampedEnd > other.start && clampedStart < other.end) {
          // Try pushing to right of blocker
          const rightStart = other.end + 0.1;
          if (rightStart + len <= duration) {
            clampedStart = rightStart;
            clampedEnd = clampedStart + len;
          } else {
            // Try pushing to left of blocker
            const leftEnd = other.start - 0.1;
            if (leftEnd - len >= 0) {
              clampedEnd = leftEnd;
              clampedStart = clampedEnd - len;
            }
          }
        }
      }

      return prev.map((ins) =>
        ins.id === id ? { ...ins, start: clampedStart, end: clampedEnd } : ins
      );
    });
  }, []);

  const resizeInsert = useCallback((id: string, newEnd: number) => {
    setInserts((prev) => {
      const duration = audioDurationRef.current;
      const target = prev.find((ins) => ins.id === id);
      if (!target) return prev;

      const minLen = 0.5;
      let clamped = Math.max(target.start + minLen, Math.min(newEnd, duration));

      // Prevent overlap with next insert
      const others = prev.filter((ins) => ins.id !== id).sort((a, b) => a.start - b.start);
      for (const other of others) {
        if (other.start > target.start && clamped > other.start - 0.1) {
          clamped = other.start - 0.1;
        }
      }

      return prev.map((ins) =>
        ins.id === id ? { ...ins, end: Math.max(clamped, target.start + minLen) } : ins
      );
    });
  }, []);

  const setInsertSourceRange = useCallback((id: string, sourceStart: number, sourceEnd: number) => {
    setInserts((prev) =>
      prev.map((ins) => (ins.id === id ? { ...ins, sourceStart, sourceEnd } : ins))
    );
  }, []);

  const setPrimarySourceRange = useCallback((sourceStart: number, sourceEnd: number) => {
    setPrimarySourceStart(sourceStart);
    setPrimarySourceEnd(sourceEnd);
  }, []);

  // ── Serialization ──

  const toCustomAssignments = useCallback((): CustomAssignment[] => {
    const mats = materialsRef.current;
    const selMats = selectedMaterialsRef.current;
    const duration = audioDurationRef.current;
    return segments
      .filter((seg) => seg.start < duration)
      .map((seg) => {
        const mat = materialsRef.current.find((m) => m.id === seg.materialId);
        return {
          material_path: mat?.path || seg.materialId,
          start: seg.start,
          end: seg.end,
          source_start: seg.sourceStart,
          source_end: seg.sourceEnd > seg.sourceStart ? seg.sourceEnd : undefined,
        };
      });
  }, [segments]);

    if (duration <= 0 || selMats.length === 0) return [];

    const primaryMat = mats.find((m) => m.id === selMats[0]);
    if (!primaryMat) return [];
    const primaryPath = primaryMat.path;
    const primaryDuration = primaryMat.duration_sec ?? 0;

    // Single material mode: only emit assignment if user has set a source range
    if (selMats.length === 1) {
      if (primarySourceStart > 0 || primarySourceEnd > 0) {
        return [{
          material_path: primaryPath,
          start: 0,
          end: duration,
          source_start: primarySourceStart,
          source_end: primarySourceEnd > primarySourceStart ? primarySourceEnd : undefined,
        }];
      }
      return [];
    }

    // Multi-camera mode: build assignments with gap splitting
    return buildAssignments(
      primaryPath,
      primaryDuration,
      primarySourceStart,
      primarySourceEnd,
      inserts,
      duration,
      mats,
    );
  }, [inserts, primarySourceStart, primarySourceEnd]);

  return {
    segments,
    initSegments,
    reorderSegments,
    setSourceRange,
    // State
    inserts,
    setInserts,
    primaryMaterial,
    primarySourceStart,
    primarySourceEnd,
    // Operations
    addInsert,
    removeInsert,
    moveInsert,
    resizeInsert,
    setInsertSourceRange,
    setPrimarySourceRange,
    // Serialization
    toCustomAssignments,
  };
};

// ── buildAssignments: gap-filling + boundary-splitting ──

function buildAssignments(
  primaryPath: string,
  primaryDuration: number,
  pSourceStart: number,
  pSourceEnd: number,
  inserts: InsertSegment[],
  audioDuration: number,
  materials: Material[],
): CustomAssignment[] {
  const assignments: CustomAssignment[] = [];
  const sorted = [...inserts].sort((a, b) => a.start - b.start);

  // Primary material effective play range
  const clipStart = pSourceStart;
  const clipEnd = pSourceEnd > pSourceStart ? pSourceEnd : primaryDuration;
  const effective = clipEnd - clipStart;

  let cursor = 0;
  let primaryAccum = 0;

  function addPrimaryGap(gapStart: number, gapEnd: number) {
    if (gapEnd - gapStart < 0.05) return;

    // No valid effective range: single segment from 0 (graceful degradation)
    if (effective <= 0) {
      assignments.push({
        material_path: primaryPath,
        start: gapStart,
        end: gapEnd,
        source_start: 0,
      });
      return;
    }

    let remaining = gapEnd - gapStart;
    let segStart = gapStart;
    const EPSILON = 0.01;

    while (remaining > 0.05) {
      const posInClip = primaryAccum % effective;
      const sourceStart = clipStart + posInClip;
      const availableInClip = effective - posInClip;
      const segDuration = Math.min(remaining, availableInClip);

      if (segDuration < EPSILON) break;

      assignments.push({
        material_path: primaryPath,
        start: segStart,
        end: segStart + segDuration,
        source_start: sourceStart,
        source_end: pSourceEnd > pSourceStart ? pSourceEnd : undefined,
      });

      primaryAccum += segDuration;
      segStart += segDuration;
      remaining -= segDuration;
    }
  }

  for (const insert of sorted) {
    // Primary gap before this insert
    addPrimaryGap(cursor, insert.start);

    // Insert segment
    const mat = materials.find((m) => m.id === insert.materialId);
    assignments.push({
      material_path: mat?.path || insert.materialId,
      start: insert.start,
      end: insert.end,
      source_start: insert.sourceStart,
      source_end: insert.sourceEnd > insert.sourceStart ? insert.sourceEnd : undefined,
    });

    cursor = insert.end;
  }

  // Trailing primary gap
  addPrimaryGap(cursor, audioDuration);

  return assignments;
}

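`buildAssignments` above interleaves fixed insert windows with primary footage that loops over its trimmed source range. A minimal standalone sketch of that gap-filling idea (simplified: no epsilon handling, no `source_end`; names like `fillGaps` and `Seg` are illustrative, not the real module's API, and the primary length is assumed positive in the happy path):

```typescript
// Simplified sketch of the gap-filling in buildAssignments: inserts occupy
// fixed windows on the timeline; each gap between them is filled with the
// primary clip, which loops over its effective (trimmed) range.
interface Seg { path: string; start: number; end: number; sourceStart: number }

function fillGaps(
  primaryPath: string,
  primaryLen: number,   // effective (trimmed) primary duration, assumed > 0
  inserts: Seg[],       // non-overlapping, sorted by start
  audioDuration: number,
): Seg[] {
  const out: Seg[] = [];
  let cursor = 0;
  let accum = 0; // how much primary footage has been consumed so far

  const pushPrimary = (gapStart: number, gapEnd: number) => {
    if (primaryLen <= 0) {
      // Graceful degradation, mirroring the real code: one segment from 0
      if (gapEnd > gapStart) out.push({ path: primaryPath, start: gapStart, end: gapEnd, sourceStart: 0 });
      return;
    }
    let remaining = gapEnd - gapStart;
    let segStart = gapStart;
    while (remaining > 1e-6) {
      const pos = accum % primaryLen;                 // loop back over the clip
      const take = Math.min(remaining, primaryLen - pos);
      out.push({ path: primaryPath, start: segStart, end: segStart + take, sourceStart: pos });
      accum += take; segStart += take; remaining -= take;
    }
  };

  for (const ins of inserts) {
    pushPrimary(cursor, ins.start); // primary gap before this insert
    out.push(ins);
    cursor = ins.end;
  }
  pushPrimary(cursor, audioDuration); // trailing primary gap
  return out;
}

// 10s audio, 4s primary clip, one insert at [3, 5): primary fills [0, 3) and
// [5, 10), splitting at the point where its 4s of footage run out and looping
// back to source position 0.
const segs = fillGaps("primary.mp4", 4, [{ path: "b.mp4", start: 3, end: 5, sourceStart: 0 }], 10);
```

The real function additionally skips sub-0.05s gaps and carries `source_end` through so the backend knows where the trimmed clip stops.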
@@ -5,9 +5,11 @@ import { useRouter } from "next/navigation";
import { RefreshCw } from "lucide-react";
import VideoPreviewModal from "@/components/VideoPreviewModal";
import ScriptExtractionModal from "./ScriptExtractionModal";
import ScriptLearningModal from "./ScriptLearningModal";
import RewriteModal from "./RewriteModal";
import { useHomeController } from "@/features/home/model/useHomeController";
import { resolveMediaUrl } from "@/shared/lib/media";
import { toast } from "sonner";
import { BgmPanel } from "@/features/home/ui/BgmPanel";
import { GenerateActionBar } from "@/features/home/ui/GenerateActionBar";
import { HistoryList } from "@/features/home/ui/HistoryList";
@@ -53,6 +55,8 @@ export function HomePage() {
    setText,
    extractModalOpen,
    setExtractModalOpen,
    learningModalOpen,
    setLearningModalOpen,
    rewriteModalOpen,
    setRewriteModalOpen,
    handleGenerateMeta,
@@ -171,9 +175,19 @@ export function HomePage() {
    setSpeed,
    emotion,
    setEmotion,
    timelineSegments,
    reorderSegments,
    setSourceRange,
    // Multi-camera timeline
    inserts,
    timelinePrimaryMaterial,
    primarySourceStart,
    primarySourceEnd,
    insertCandidates,
    addInsert,
    removeInsert,
    moveInsert,
    resizeInsert,
    setInsertSourceRange,
    setPrimarySourceRange,
    handleSetPrimary,
    clipTrimmerOpen,
    setClipTrimmerOpen,
    clipTrimmerSegmentId,
@@ -198,10 +212,27 @@ export function HomePage() {
    return () => clearTimeout(timer);
  }, []);

  const clipTrimmerSegment = useMemo(
    () => timelineSegments.find((s) => s.id === clipTrimmerSegmentId) ?? null,
    [timelineSegments, clipTrimmerSegmentId]
  );
  // ClipTrimmer: construct segment from either primary or an insert
  const clipTrimmerSegment = useMemo(() => {
    if (!clipTrimmerSegmentId) return null;
    // Check if it's the primary material
    if (clipTrimmerSegmentId === "primary" && timelinePrimaryMaterial) {
      return {
        id: "primary",
        materialId: timelinePrimaryMaterial.id,
        materialName: timelinePrimaryMaterial.scene || timelinePrimaryMaterial.name || "",
        start: 0,
        end: selectedAudio?.duration_sec ?? 0,
        sourceStart: primarySourceStart,
        sourceEnd: primarySourceEnd,
        color: "#8b5cf6",
      };
    }
    // Check inserts
    const insert = inserts.find((i) => i.id === clipTrimmerSegmentId);
    if (insert) return insert;
    return null;
  }, [clipTrimmerSegmentId, timelinePrimaryMaterial, inserts, selectedAudio, primarySourceStart, primarySourceEnd]);

  const clipTrimmerMaterialUrl = useMemo(() => {
    if (!clipTrimmerSegment) return null;
@@ -222,6 +253,7 @@ export function HomePage() {
          text={text}
          onChangeText={setText}
          onOpenExtractModal={() => setExtractModalOpen(true)}
          onOpenLearningModal={() => setLearningModalOpen(true)}
          onOpenRewriteModal={() => setRewriteModalOpen(true)}
          onTranslate={handleTranslate}
          isTranslating={isTranslating}
@@ -329,6 +361,7 @@ export function HomePage() {
          onDeleteMaterial={deleteMaterial}
          onClearUploadError={() => setUploadError(null)}
          registerMaterialRef={registerMaterialRef}
          onSetPrimary={handleSetPrimary}
        />
        <div className="border-t border-white/10 my-4" />
        <div className="relative">
@@ -343,15 +376,28 @@ export function HomePage() {
            embedded
            audioDuration={selectedAudio?.duration_sec ?? 0}
            audioUrl={selectedAudio ? (resolveMediaUrl(selectedAudio.path) || "") : ""}
            segments={timelineSegments}
            materials={materials}
            outputAspectRatio={outputAspectRatio}
            onOutputAspectRatioChange={setOutputAspectRatio}
            onReorderSegment={reorderSegments}
            onClickSegment={(seg) => {
              setClipTrimmerSegmentId(seg.id);
            primaryMaterial={timelinePrimaryMaterial}
            inserts={inserts}
            insertCandidates={insertCandidates}
            onAddInsert={(materialId) => {
              const result = addInsert(materialId);
              if (result === "limit") toast.error("最多添加 10 个插入片段");
              else if (result === "no_space") toast.error("时间轴空间不足,无法再添加插入");
            }}
            onRemoveInsert={removeInsert}
            onMoveInsert={moveInsert}
            onClickInsert={(insert) => {
              setClipTrimmerSegmentId(insert.id);
              setClipTrimmerOpen(true);
            }}
            onClickPrimary={() => {
              setClipTrimmerSegmentId("primary");
              setClipTrimmerOpen(true);
            }}
            primarySourceStart={primarySourceStart}
            primarySourceEnd={primarySourceEnd}
            outputAspectRatio={outputAspectRatio}
            onOutputAspectRatioChange={setOutputAspectRatio}
          />
        </div>
      </div>
@@ -514,13 +560,28 @@ export function HomePage() {
        onApply={(newText) => setText(newText)}
      />

      <ScriptLearningModal
        isOpen={learningModalOpen}
        onClose={() => setLearningModalOpen(false)}
        onApply={(nextText) => setText(nextText)}
      />

      <ClipTrimmer
        isOpen={clipTrimmerOpen}
        segment={clipTrimmerSegment}
        materialUrl={clipTrimmerMaterialUrl}
        onConfirm={(sourceStart, sourceEnd) => {
          if (clipTrimmerSegmentId) {
            setSourceRange(clipTrimmerSegmentId, sourceStart, sourceEnd);
          if (clipTrimmerSegmentId === "primary") {
            setPrimarySourceRange(sourceStart, sourceEnd);
          } else if (clipTrimmerSegmentId) {
            setInsertSourceRange(clipTrimmerSegmentId, sourceStart, sourceEnd);
            // Sync timeline duration to match trimmed source duration
            if (sourceEnd > sourceStart) {
              const ins = inserts.find((i) => i.id === clipTrimmerSegmentId);
              if (ins) {
                resizeInsert(clipTrimmerSegmentId, ins.start + (sourceEnd - sourceStart));
              }
            }
          }
          setClipTrimmerOpen(false);
        }}

@@ -1,5 +1,5 @@
import { type ChangeEvent, type MouseEvent, useCallback, useMemo, useRef, useState } from "react";
-import { Upload, RefreshCw, Eye, Trash2, X, Pencil, Check, Search, ChevronDown } from "lucide-react";
+import { Upload, RefreshCw, Eye, Trash2, X, Pencil, Check, Search, ChevronDown, Crown } from "lucide-react";
import type { Material } from "@/shared/types/material";
import { SelectPopover } from "@/shared/ui/SelectPopover";

@@ -26,6 +26,7 @@ interface MaterialSelectorProps {
onDeleteMaterial: (id: string) => void;
onClearUploadError: () => void;
registerMaterialRef: (id: string, element: HTMLDivElement | null) => void;
+onSetPrimary?: (materialId: string) => void;
embedded?: boolean;
}

@@ -52,6 +53,7 @@ export function MaterialSelector({
onDeleteMaterial,
onClearUploadError,
registerMaterialRef,
+onSetPrimary,
embedded = false,
}: MaterialSelectorProps) {
const [materialFilter, setMaterialFilter] = useState("");
@@ -280,12 +282,33 @@
disabled={isFull && !isSelected}
className="min-w-0 flex-1 text-left"
>
-<span className="block truncate text-sm text-white">{m.scene || m.name}</span>
+<span className="flex items-center gap-1.5">
+<span className="block truncate text-sm text-white">{m.scene || m.name}</span>
+{isSelected && selectedMaterials[0] === m.id && selectedMaterials.length > 1 && (
+<span className="shrink-0 text-[9px] px-1 py-0.5 rounded bg-purple-500/30 text-purple-300 border border-purple-500/40">主素材</span>
+)}
+{isSelected && selectedMaterials[0] !== m.id && (
+<span className="shrink-0 text-[9px] px-1 py-0.5 rounded bg-white/10 text-gray-400 border border-white/10">可插入</span>
+)}
+</span>
<span className="mt-0.5 block text-xs text-gray-400">{m.size_mb.toFixed(1)} MB</span>
</button>
)}

<div className="flex items-center gap-2 pl-2">
+{isSelected && selectedMaterials[0] !== m.id && onSetPrimary && (
+<button
+type="button"
+onClick={(e) => {
+e.stopPropagation();
+onSetPrimary(m.id);
+}}
+className="p-1 text-gray-400 hover:text-amber-300"
+title="设为主素材"
+>
+<Crown className="h-4 w-4" />
+</button>
+)}
<button
type="button"
onClick={(e) => {
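In the MaterialSelector hunk above, the *order* of `selectedMaterials` encodes the roles: index 0 renders the 主素材 badge, later entries render 可插入, and the Crown button promotes an entry to the front. A minimal sketch of that convention (helper names here are illustrative, not from the PR — the component derives this inline):

```typescript
// Sketch: selection order encodes roles — index 0 is the primary
// material, everything after it is an insertable cut-away.
type MaterialId = string;

function roleOf(selected: MaterialId[], id: MaterialId): "primary" | "insert" | "none" {
  const idx = selected.indexOf(id);
  if (idx === -1) return "none";
  return idx === 0 ? "primary" : "insert";
}

// What a `设为主素材` (set-primary) action would do to the array:
// move `id` to the front, preserving the order of the remaining ids.
function setPrimary(selected: MaterialId[], id: MaterialId): MaterialId[] {
  if (!selected.includes(id)) return selected;
  return [id, ...selected.filter((m) => m !== id)];
}
```

Keeping the role implicit in array order avoids a second piece of state that could drift out of sync with the selection.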
@@ -398,7 +398,7 @@ export function RefAudioPanel({
isOpen={recordingModalOpen}
onClose={closeRecordingModal}
panelClassName="w-full max-w-lg rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden"
-closeOnOverlay={false}
+closeOnOverlay={!isRecording}
>
<AppModalHeader
title="🎤 在线录音"
@@ -3,6 +3,7 @@ import { Loader2, Sparkles } from "lucide-react";
import api from "@/shared/api/axios";
import { ApiResponse, unwrap } from "@/shared/api/types";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
+import { toast } from "sonner";

const CUSTOM_PROMPT_KEY = "vigent_rewriteCustomPrompt";

@@ -78,10 +79,52 @@ export default function RewriteModal({
onClose();
};

const handleRetry = () => {
setRewrittenText("");
setError(null);
};

+const fallbackCopyTextToClipboard = useCallback((text: string) => {
+const textArea = document.createElement("textarea");
+textArea.value = text;
+textArea.style.top = "0";
+textArea.style.left = "0";
+textArea.style.position = "fixed";
+textArea.style.opacity = "0";
+
+document.body.appendChild(textArea);
+textArea.focus();
+textArea.select();
+
+try {
+const successful = document.execCommand("copy");
+if (successful) {
+toast.success("已复制到剪贴板");
+} else {
+toast.error("复制失败,请手动复制");
+}
+} catch {
+toast.error("复制失败,请手动复制");
+}
+
+document.body.removeChild(textArea);
+}, []);
+
+const handleCopy = useCallback((text: string) => {
+if (!text.trim()) return;
+if (navigator.clipboard && window.isSecureContext) {
+navigator.clipboard
+.writeText(text)
+.then(() => {
+toast.success("已复制到剪贴板");
+})
+.catch(() => {
+fallbackCopyTextToClipboard(text);
+});
+} else {
+fallbackCopyTextToClipboard(text);
+}
+}, [fallbackCopyTextToClipboard]);

if (!isOpen) return null;
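The `handleCopy` / fallback pair added above follows the standard clipboard pattern: prefer the async Clipboard API when the browser exposes it *and* the page is a secure context, otherwise fall back to a hidden textarea plus `document.execCommand("copy")`. The branch decision can be isolated as a pure function (names here are illustrative):

```typescript
// Sketch of the strategy choice inside handleCopy: navigator.clipboard
// is only usable when the API exists AND the page is a secure context
// (https or localhost) — over plain http it is undefined.
type CopyStrategy = "clipboard-api" | "textarea-fallback";

function chooseCopyStrategy(hasClipboardApi: boolean, isSecureContext: boolean): CopyStrategy {
  return hasClipboardApi && isSecureContext ? "clipboard-api" : "textarea-fallback";
}
```

Keeping the fallback matters for tools like this that are often deployed on a LAN over plain http, where `navigator.clipboard` is simply absent.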
@@ -90,7 +133,7 @@
isOpen={isOpen}
onClose={onClose}
panelClassName="w-full max-w-2xl max-h-[90vh] rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
-closeOnOverlay={false}
+closeOnOverlay
>
<AppModalHeader
title="AI 智能改写"
@@ -143,56 +186,63 @@
</div>
)}

{/* Rewritten result */}
{rewrittenText && (
<>
<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-purple-300 flex items-center gap-2">
<Sparkles className="h-4 w-4" />
AI 改写结果
</h4>
<button
onClick={handleApply}
className="text-xs bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white px-3 py-1.5 rounded-lg transition-colors shadow-sm"
>
使用此结果
</button>
</div>
<div className="bg-purple-900/10 border border-purple-500/20 rounded-xl p-4 max-h-60 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{rewrittenText}
</p>
</div>
</div>

<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-gray-400 flex items-center gap-2">
📝 原文对比
</h4>
<button
onClick={onClose}
className="text-xs bg-white/10 hover:bg-white/20 text-white px-3 py-1.5 rounded-lg transition-colors"
>
保留原文
</button>
</div>
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-40 overflow-y-auto hide-scrollbar">
<p className="text-gray-400 text-sm leading-relaxed whitespace-pre-wrap">
{originalText}
</p>
</div>
</div>

<button
onClick={handleRetry}
className="w-full py-2.5 px-4 bg-white/10 hover:bg-white/20 text-white rounded-xl transition-colors"
>
重新改写
</button>
</>
)}
{/* Rewritten result */}
{rewrittenText && (
<>
<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-purple-300 flex items-center gap-2">
<Sparkles className="h-4 w-4" />
AI 改写结果
</h4>
<span className="text-xs text-gray-400">{rewrittenText.length} 字</span>
</div>
<div className="bg-purple-900/10 border border-purple-500/20 rounded-xl p-4 max-h-60 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{rewrittenText}
</p>
</div>
</div>

<div className="space-y-2">
<h4 className="font-semibold text-gray-400 flex items-center gap-2">
📝 原文对比
</h4>
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-40 overflow-y-auto hide-scrollbar">
<p className="text-gray-400 text-sm leading-relaxed whitespace-pre-wrap">
{originalText}
</p>
</div>
</div>

<div className="grid grid-cols-2 sm:grid-cols-4 gap-2">
<button
onClick={handleApply}
className="py-2.5 px-3 bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white rounded-lg transition-colors text-sm"
>
填入文案
</button>
<button
onClick={() => handleCopy(rewrittenText)}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
复制
</button>
<button
onClick={handleRetry}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
重新生成
</button>
<button
onClick={onClose}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
保留原文
</button>
</div>
</>
)}
</div>
</AppModal>
);
@@ -1,5 +1,5 @@
import { useCallback, useEffect, useRef, useState } from "react";
-import { FileText, History, Languages, Loader2, Maximize2, RotateCcw, Save, Sparkles, Trash2 } from "lucide-react";
+import { FileText, GraduationCap, History, Languages, Loader2, Maximize2, RotateCcw, Save, Sparkles, Trash2 } from "lucide-react";
import type { SavedScript } from "@/features/home/model/useSavedScripts";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";

@@ -19,6 +19,7 @@ interface ScriptEditorProps {
text: string;
onChangeText: (value: string) => void;
onOpenExtractModal: () => void;
+onOpenLearningModal: () => void;
onOpenRewriteModal: () => void;
onTranslate: (targetLang: string) => void;
isTranslating: boolean;
@@ -34,6 +35,7 @@ export function ScriptEditor({
text,
onChangeText,
onOpenExtractModal,
+onOpenLearningModal,
onOpenRewriteModal,
onTranslate,
isTranslating,
@@ -146,6 +148,13 @@
<FileText className="h-3.5 w-3.5" />
文案提取助手
</button>
+<button
+onClick={onOpenLearningModal}
+className={`${actionBtnBase} bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 text-white`}
+>
+<GraduationCap className="h-3.5 w-3.5" />
+文案深度学习
+</button>
<div className="relative" ref={langMenuRef}>
<button
onClick={() => setShowLangMenu((prev) => !prev)}
@@ -71,7 +71,7 @@ export default function ScriptExtractionModal({
isOpen={isOpen}
onClose={onClose}
panelClassName="w-full max-w-2xl max-h-[90vh] rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
-closeOnOverlay={false}
+closeOnOverlay
>
<AppModalHeader title="📜 文案提取助手" onClose={onClose} />

@@ -228,43 +228,47 @@
)}

{step === "result" && (
<div className="space-y-6">
<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-gray-300 flex items-center gap-2">
🎙️ 识别结果
</h4>
<div className="flex items-center gap-2">
{onApply && (
<button
onClick={() => handleApplyAndClose(script)}
className="text-xs bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white px-3 py-1.5 rounded-lg transition-colors flex items-center gap-1 shadow-sm"
>
📥 填入
</button>
)}
<button
onClick={() => copyToClipboard(script)}
className="text-xs bg-white/10 hover:bg-white/20 text-white px-3 py-1.5 rounded-lg transition-colors"
>
复制
</button>
</div>
</div>
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-60 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{script}
</p>
</div>
<div className="space-y-5">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-gray-300 flex items-center gap-2">
🎙️ 识别结果
</h4>
<span className="text-xs text-gray-400">{script.length} 字</span>
</div>

<div className="flex justify-center pt-4">
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-72 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{script}
</p>
</div>

<div className={`grid ${onApply ? "grid-cols-2 sm:grid-cols-4" : "grid-cols-2 sm:grid-cols-3"} gap-2`}>
{onApply && (
<button
onClick={() => handleApplyAndClose(script)}
className="py-2.5 px-3 bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white rounded-lg transition-colors text-sm"
>
填入文案
</button>
)}
<button
onClick={() => copyToClipboard(script)}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
复制
</button>
<button
onClick={handleExtractNext}
-className="px-6 py-2 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors"
+className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
提取下一个
</button>
<button
onClick={onClose}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
关闭
</button>
</div>
</div>
)}
frontend/src/features/home/ui/ScriptLearningModal.tsx (new file, 242 lines)
@@ -0,0 +1,242 @@
"use client";

import { BookOpen, Sparkles } from "lucide-react";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
import { useScriptLearning } from "./script-learning/useScriptLearning";

interface ScriptLearningModalProps {
isOpen: boolean;
onClose: () => void;
onApply?: (text: string) => void;
}

const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;

export default function ScriptLearningModal({ isOpen, onClose, onApply }: ScriptLearningModalProps) {
const {
step,
inputUrl,
setInputUrl,
topics,
selectedTopic,
setSelectedTopic,
wordCount,
setWordCount,
generatedScript,
error,
analysisId,
handleAnalyze,
handleGenerate,
handleRegenerate,
backToInput,
backToTopics,
copyToClipboard,
} = useScriptLearning({ isOpen });

if (!isOpen) return null;

const wordCountNum = Number(wordCount);
const wordCountValid = Number.isInteger(wordCountNum)
&& wordCountNum >= WORD_COUNT_MIN
&& wordCountNum <= WORD_COUNT_MAX;
const canGenerate = !!analysisId && !!selectedTopic && wordCountValid;

const handleApplyAndClose = () => {
if (!generatedScript.trim()) return;
onApply?.(generatedScript);
onClose();
};

return (
<AppModal
isOpen={isOpen}
onClose={onClose}
panelClassName="w-full max-w-2xl max-h-[90vh] rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
closeOnOverlay={false}
closeOnEsc={false}
>
<AppModalHeader
title="文案深度学习"
icon={<BookOpen className="h-5 w-5 text-cyan-300" />}
subtitle="分析博主近期选题风格并快速生成文案"
onClose={onClose}
/>

<div className="flex-1 overflow-y-auto p-6">
{step === "input" && (
<div className="space-y-5">
<div className="space-y-2">
<label className="text-sm text-gray-300">博主主页链接</label>
<input
type="text"
value={inputUrl}
onChange={(event) => setInputUrl(event.target.value)}
placeholder="请粘贴抖音或B站博主主页链接..."
className="w-full bg-black/20 border border-white/10 rounded-xl px-4 py-3 text-white placeholder-gray-500 focus:outline-none focus:border-cyan-500 transition-colors"
/>
<p className="text-xs text-gray-500">仅支持 https 链接,建议使用主页地址(非单条视频链接)</p>
</div>

{error && (
<div className="bg-red-500/10 border border-red-500/30 rounded-xl p-4">
<p className="text-red-400 text-sm">{error}</p>
</div>
)}

<div className="flex gap-3 pt-1">
<button
type="button"
onClick={() => setInputUrl("")}
className="flex-1 py-3 px-4 bg-white/10 hover:bg-white/20 text-white rounded-xl transition-colors"
>
清空
</button>
<button
type="button"
onClick={() => void handleAnalyze()}
disabled={!inputUrl.trim()}
className="flex-1 py-3 px-4 bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 disabled:opacity-50 disabled:cursor-not-allowed text-white rounded-xl transition-all font-medium shadow-lg"
>
开始分析
</button>
</div>
</div>
)}

{(step === "analyzing" || step === "generating") && (
<div className="flex flex-col items-center justify-center py-20">
<div className="relative w-20 h-20 mb-6">
<div className="absolute inset-0 border-4 border-cyan-500/30 rounded-full" />
<div className="absolute inset-0 border-4 border-t-cyan-500 rounded-full animate-spin" />
</div>
<h4 className="text-xl font-medium text-white mb-2">
{step === "analyzing" ? "正在分析中..." : "正在生成中..."}
</h4>
</div>
)}

{step === "topics" && (
<div className="space-y-5">
<div className="bg-cyan-500/10 border border-cyan-500/30 rounded-xl p-3">
<p className="text-cyan-200 text-sm">已完成深度学习,请选择热门话题。</p>
</div>

<div className="space-y-2">
<p className="text-sm text-gray-300">请选择一个话题</p>
<div className="grid grid-cols-1 sm:grid-cols-2 gap-2">
{topics.map((topic) => {
const active = selectedTopic === topic;
return (
<button
key={topic}
type="button"
onClick={() => setSelectedTopic(topic)}
className={`text-left rounded-lg border px-3 py-2.5 text-sm transition-colors ${
active
? "border-cyan-400 bg-cyan-500/20 text-cyan-100"
: "border-white/10 bg-white/5 text-gray-200 hover:border-white/20 hover:bg-white/10"
}`}
>
{topic}
</button>
);
})}
</div>
</div>

<div className="space-y-2">
<label className="text-sm text-gray-300">目标字数</label>
<input
type="number"
min={WORD_COUNT_MIN}
max={WORD_COUNT_MAX}
value={wordCount}
onChange={(event) => setWordCount(event.target.value)}
placeholder="请输入目标字数(80-1000),如 300"
className="w-full bg-black/20 border border-white/10 rounded-xl px-4 py-3 text-white placeholder-gray-500 focus:outline-none focus:border-cyan-500 transition-colors"
/>
</div>

{error && (
<div className="bg-red-500/10 border border-red-500/30 rounded-xl p-4">
<p className="text-red-400 text-sm">{error}</p>
</div>
)}

<div className="flex gap-3 pt-1">
<button
type="button"
onClick={backToInput}
className="flex-1 py-3 px-4 bg-white/10 hover:bg-white/20 text-white rounded-xl transition-colors"
>
返回
</button>
<button
type="button"
onClick={() => void handleGenerate()}
disabled={!canGenerate}
className="flex-1 py-3 px-4 bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 disabled:opacity-50 disabled:cursor-not-allowed text-white rounded-xl transition-all font-medium shadow-lg"
>
生成文案
</button>
</div>
</div>
)}

{step === "result" && (
<div className="space-y-5">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-cyan-200 flex items-center gap-2">
<Sparkles className="h-4 w-4" />
生成结果
</h4>
<span className="text-xs text-gray-400">{generatedScript.length} 字</span>
</div>

<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-72 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">{generatedScript}</p>
</div>

<div className="grid grid-cols-2 sm:grid-cols-4 gap-2">
<button
type="button"
onClick={handleApplyAndClose}
className="py-2.5 px-3 bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 text-white rounded-lg transition-colors text-sm"
>
填入文案
</button>
<button
type="button"
onClick={() => copyToClipboard(generatedScript)}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
复制
</button>
<button
type="button"
onClick={() => void handleRegenerate()}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
重新生成
</button>
<button
type="button"
onClick={backToTopics}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
换个话题
</button>
</div>

{error && (
<div className="bg-red-500/10 border border-red-500/30 rounded-xl p-4">
<p className="text-red-400 text-sm">{error}</p>
</div>
)}
</div>
)}
</div>
</AppModal>
);
}
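ScriptLearningModal above gates generation on the target word count being an integer inside `[WORD_COUNT_MIN, WORD_COUNT_MAX]`. The same check as a standalone predicate (a sketch mirroring the component's logic; the raw `<input>` value is a string, so it is coerced with `Number()` first):

```typescript
const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;

// Mirrors the modal's wordCountValid computation: coerce the string
// input, then require an integer within the inclusive range.
function isWordCountValid(raw: string): boolean {
  const n = Number(raw);
  return Number.isInteger(n) && n >= WORD_COUNT_MIN && n <= WORD_COUNT_MAX;
}
```

Note that `Number("")` is `0`, so an empty field fails the lower bound rather than needing a separate emptiness check.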
@@ -1,19 +1,28 @@
-import { useEffect, useRef, useCallback, useState, useMemo } from "react";
+import { useEffect, useRef, useCallback, useState } from "react";
import WaveSurfer from "wavesurfer.js";
-import { ChevronDown, GripVertical, Check } from "lucide-react";
-import type { TimelineSegment } from "@/features/home/model/useTimelineEditor";
+import { ChevronDown, Check, X, Plus } from "lucide-react";
+import type { InsertSegment } from "@/shared/types/timeline";
import type { Material } from "@/shared/types/material";
import { SelectPopover } from "@/shared/ui/SelectPopover";

interface TimelineEditorProps {
audioDuration: number;
audioUrl: string;
segments: TimelineSegment[];
materials: Material[];
// Multi-camera props
primaryMaterial: Material | undefined;
inserts: InsertSegment[];
insertCandidates: Material[];
onAddInsert: (materialId: string) => void;
onRemoveInsert: (id: string) => void;
onMoveInsert: (id: string, newStart: number) => void;
onClickInsert: (insert: InsertSegment) => void;
onClickPrimary: () => void;
// Single material: for ClipTrimmer compat, pass a synthetic TimelineSegment
primarySourceStart: number;
primarySourceEnd: number;
// Shared
outputAspectRatio: "9:16" | "16:9";
onOutputAspectRatioChange: (ratio: "9:16" | "16:9") => void;
onReorderSegment: (fromIdx: number, toIdx: number) => void;
onClickSegment: (segment: TimelineSegment) => void;
embedded?: boolean;
}

@@ -26,12 +35,18 @@ function formatTime(sec: number): string {
export function TimelineEditor({
audioDuration,
audioUrl,
segments,
materials,
primaryMaterial,
inserts,
insertCandidates,
onAddInsert,
onRemoveInsert,
onMoveInsert,
onClickInsert,
onClickPrimary,
primarySourceStart,
primarySourceEnd,
outputAspectRatio,
onOutputAspectRatioChange,
onReorderSegment,
onClickSegment,
embedded = false,
}: TimelineEditorProps) {
const waveRef = useRef<HTMLDivElement>(null);
@@ -39,18 +54,25 @@
const [waveReady, setWaveReady] = useState(false);
const [isPlaying, setIsPlaying] = useState(false);

-// Refs for high-frequency DOM updates (avoid 60fps re-renders)
+// Refs for high-frequency DOM updates
const playheadRef = useRef<HTMLDivElement>(null);
const timeRef = useRef<HTMLSpanElement>(null);
const audioDurationRef = useRef(audioDuration);
+const timelineBarRef = useRef<HTMLDivElement>(null);

useEffect(() => {
audioDurationRef.current = audioDuration;
}, [audioDuration]);

-// Drag-to-reorder state
-const [dragFromIdx, setDragFromIdx] = useState<number | null>(null);
-const [dragOverIdx, setDragOverIdx] = useState<number | null>(null);
+// Drag state for insert blocks (move only; duration editing unified to ClipTrimmer)
+const [dragId, setDragId] = useState<string | null>(null);
+const dragStartXRef = useRef(0);
+const dragStartValRef = useRef(0);
+const dragMovedRef = useRef(false);
+const DRAG_THRESHOLD = 5;
+
+const isMultiCam = insertCandidates.length > 0 || inserts.length > 0;
+const hasPrimary = !!primaryMaterial;

// Aspect ratio options
const ratioOptions = [
@@ -60,14 +82,21 @@
const currentRatioLabel =
ratioOptions.find((opt) => opt.value === outputAspectRatio)?.label ?? "竖屏 9:16";

+// Primary material loop info
+const primaryDuration = primaryMaterial?.duration_sec ?? 0;
+const primaryEffective = primarySourceEnd > primarySourceStart
+? primarySourceEnd - primarySourceStart
+: primaryDuration;
+const loopCount = primaryEffective > 0 && audioDuration > 0
+? (audioDuration / primaryEffective)
+: 0;

// Create / recreate wavesurfer when audioUrl changes
useEffect(() => {
if (!waveRef.current || !audioUrl) return;

const playheadEl = playheadRef.current;
const timeEl = timeRef.current;

// Destroy previous instance
if (wsRef.current) {
wsRef.current.destroy();
wsRef.current = null;
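The `primaryEffective` / `loopCount` lines added above determine how many times the trimmed primary clip must repeat to cover the narration audio. The same formula in isolation (function name is illustrative; the component computes this inline):

```typescript
// How many times the effective (trimmed) primary clip loops to fill
// the audio. When no trim range is set (sourceEnd <= sourceStart),
// the full clip duration is used instead.
function loopCount(
  audioDuration: number,
  clipDuration: number,
  sourceStart: number,
  sourceEnd: number,
): number {
  const effective = sourceEnd > sourceStart ? sourceEnd - sourceStart : clipDuration;
  return effective > 0 && audioDuration > 0 ? audioDuration / effective : 0;
}
```

A value above 1 is what triggers the "×N 循环" label and the stripe pattern rendered further down in this diff.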
@@ -87,7 +116,6 @@
normalize: true,
});

-// Click waveform → seek + auto-play
ws.on("interaction", () => ws.play());
ws.on("play", () => setIsPlaying(true));
ws.on("pause", () => setIsPlaying(false));
@@ -95,7 +123,6 @@
setIsPlaying(false);
if (playheadRef.current) playheadRef.current.style.display = "none";
});
-// High-frequency: update playhead + time via refs (no React re-render)
ws.on("timeupdate", (time: number) => {
const dur = audioDurationRef.current;
if (playheadRef.current && dur > 0) {
@@ -119,7 +146,6 @@
};
}, [audioUrl, waveReady]);

-// Callback ref to detect when waveRef div mounts
const waveCallbackRef = useCallback((node: HTMLDivElement | null) => {
(waveRef as React.MutableRefObject<HTMLDivElement | null>).current = node;
setWaveReady(!!node);
@@ -129,43 +155,45 @@
wsRef.current?.playPause();
}, []);

-// Drag-to-reorder handlers
-const handleDragStart = useCallback((idx: number, e: React.DragEvent) => {
-setDragFromIdx(idx);
-e.dataTransfer.effectAllowed = "move";
-e.dataTransfer.setData("text/plain", String(idx));
-}, []);
+// ── Insert block pointer handlers (move only) ──

-const handleDragOver = useCallback((idx: number, e: React.DragEvent) => {
+const getTimeFromClientX = useCallback((clientX: number): number => {
+if (!timelineBarRef.current || audioDuration <= 0) return 0;
+const rect = timelineBarRef.current.getBoundingClientRect();
+const ratio = Math.max(0, Math.min(1, (clientX - rect.left) / rect.width));
+return ratio * audioDuration;
+}, [audioDuration]);

+const handleInsertPointerDown = useCallback((
+id: string,
+e: React.PointerEvent
+) => {
+e.preventDefault();
-e.dataTransfer.dropEffect = "move";
-setDragOverIdx(idx);
-}, []);
+e.stopPropagation();
+setDragId(id);
+dragStartXRef.current = e.clientX;
+dragMovedRef.current = false;
+const ins = inserts.find((i) => i.id === id);
+dragStartValRef.current = ins?.start ?? 0;
+(e.target as HTMLElement).setPointerCapture(e.pointerId);
+}, [inserts]);

-const handleDragLeave = useCallback(() => {
-setDragOverIdx(null);
-}, []);

-const handleDrop = useCallback((toIdx: number, e: React.DragEvent) => {
-e.preventDefault();
-const fromIdx = parseInt(e.dataTransfer.getData("text/plain"), 10);
-if (!isNaN(fromIdx) && fromIdx !== toIdx) {
-onReorderSegment(fromIdx, toIdx);
+const handlePointerMove = useCallback((e: React.PointerEvent) => {
+if (!dragId) return;
+if (!dragMovedRef.current) {
+const dx = Math.abs(e.clientX - dragStartXRef.current);
+if (dx < DRAG_THRESHOLD) return;
+dragMovedRef.current = true;
+}
-setDragFromIdx(null);
-setDragOverIdx(null);
-}, [onReorderSegment]);
+const currentTime = getTimeFromClientX(e.clientX);
+const startTime = getTimeFromClientX(dragStartXRef.current);
+onMoveInsert(dragId, dragStartValRef.current + (currentTime - startTime));
+}, [dragId, getTimeFromClientX, onMoveInsert]);

-const handleDragEnd = useCallback(() => {
-setDragFromIdx(null);
-setDragOverIdx(null);
+const handlePointerUp = useCallback(() => {
+setDragId(null);
}, []);

-// Filter visible vs overflow segments
-const visibleSegments = useMemo(() => segments.filter((s) => s.start < audioDuration), [segments, audioDuration]);
-const overflowSegments = useMemo(() => segments.filter((s) => s.start >= audioDuration), [segments, audioDuration]);
-const hasSegments = visibleSegments.length > 0;

const content = (
<>
<div className="flex items-center justify-between mb-3">
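The `getTimeFromClientX` / `DRAG_THRESHOLD` pair added above is the usual pointer-drag mapping: clamp the pointer's x position to the timeline rect, convert the ratio to seconds, and ignore movements under a few pixels so a plain click still reaches the click handler (which opens the ClipTrimmer). The pure part of that mapping (parameter names assumed for the sketch):

```typescript
// Convert a pointer's clientX into a timeline position in seconds,
// clamped to [0, duration] — the same math as getTimeFromClientX.
function clientXToTime(clientX: number, rectLeft: number, rectWidth: number, duration: number): number {
  if (rectWidth <= 0 || duration <= 0) return 0;
  const ratio = Math.max(0, Math.min(1, (clientX - rectLeft) / rectWidth));
  return ratio * duration;
}

// A gesture only counts as a drag once it moves past the threshold;
// smaller jitters fall through to the click handler.
function isDrag(startX: number, currentX: number, threshold = 5): boolean {
  return Math.abs(currentX - startX) >= threshold;
}
```

Computing the move as `startValue + (currentTime - startTime)`, as the handler does, keeps the block anchored under the pointer instead of snapping its left edge to the cursor.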
@@ -239,109 +267,149 @@ export function TimelineEditor({
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Waveform — always rendered so ref stays mounted */}
|
||||
{/* Waveform */}
|
||||
<div className="relative mb-1">
|
||||
<div ref={waveCallbackRef} className="rounded-lg overflow-hidden bg-black/20 cursor-pointer" style={{ minHeight: 56 }} />
|
||||
</div>
|
||||
|
||||
{/* Segment blocks or empty placeholder */}
|
||||
{hasSegments ? (
|
||||
{/* Timeline visualization */}
|
||||
{hasPrimary && audioDuration > 0 ? (
|
||||
<>
|
||||
<div className="relative h-14 flex select-none">
|
||||
{/* Playhead — syncs with audio playback */}
|
||||
<div
|
||||
ref={timelineBarRef}
|
||||
className="relative select-none touch-none"
|
||||
style={{ minHeight: isMultiCam ? 80 : 56 }}
|
||||
onPointerMove={handlePointerMove}
|
||||
onPointerUp={handlePointerUp}
|
||||
onPointerLeave={handlePointerUp}
|
||||
>
|
||||
{/* Playhead */}
|
||||
<div
|
||||
ref={playheadRef}
|
||||
className="absolute top-0 h-full w-0.5 bg-fuchsia-400 z-10 pointer-events-none"
|
||||
className="absolute top-0 h-full w-0.5 bg-fuchsia-400 z-20 pointer-events-none"
|
||||
style={{ display: "none", left: "0%" }}
|
||||
/>
|
||||
{visibleSegments.map((seg, i) => {
|
||||
const left = (seg.start / audioDuration) * 100;
|
||||
const width = ((seg.end - seg.start) / audioDuration) * 100;
|
||||
const segDur = seg.end - seg.start;
|
||||
const isDragTarget = dragOverIdx === i && dragFromIdx !== i;
|
||||
|
||||
// Compute loop portion for the last visible segment
|
||||
const isLastVisible = i === visibleSegments.length - 1;
|
||||
let loopPercent = 0;
|
||||
if (isLastVisible && audioDuration > 0) {
|
||||
const mat = materials.find((m) => m.id === seg.materialId);
|
||||
const matDur = mat?.duration_sec ?? 0;
|
||||
const effDur = (seg.sourceEnd > seg.sourceStart)
|
||||
? (seg.sourceEnd - seg.sourceStart)
|
||||
: Math.max(matDur - seg.sourceStart, 0);
|
||||
if (effDur > 0 && segDur > effDur + 0.1) {
|
||||
loopPercent = ((segDur - effDur) / segDur) * 100;
|
||||
}
|
||||
}
{/* Primary material background bar */}
<button
onClick={onClickPrimary}
className="absolute inset-0 rounded-lg overflow-hidden border border-purple-500/30 hover:border-purple-500/50 transition-colors cursor-pointer"
style={{ backgroundColor: "#8b5cf620" }}
title={`主素材: ${primaryMaterial?.scene || primaryMaterial?.name || ""}${
loopCount > 1 ? ` (${primaryEffective.toFixed(1)}s ×${loopCount.toFixed(1)} 循环)` : ""
}\n点击设置截取范围`}
>
{/* Loop stripe pattern */}
{loopCount > 1 && (
<div
className="absolute inset-0 pointer-events-none"
style={{
background: `repeating-linear-gradient(-45deg, transparent, transparent 6px, rgba(139,92,246,0.06) 6px, rgba(139,92,246,0.06) 12px)`,
}}
/>
)}
<div className="absolute inset-0 flex items-center px-3">
<span className="text-[11px] text-purple-300/80 truncate">
主素材: {primaryMaterial?.scene || primaryMaterial?.name || ""}
{loopCount > 1 && (
<span className="text-purple-400/60 ml-1">
({primaryEffective.toFixed(1)}s ×{loopCount.toFixed(1)} 循环)
</span>
)}
{primarySourceStart > 0 && (
<span className="text-amber-400/80 ml-1">✂ {primarySourceStart.toFixed(1)}s</span>
)}
</span>
</div>
</button>

{/* Insert blocks floating above primary */}
{inserts.map((ins) => {
const left = (ins.start / audioDuration) * 100;
const width = ((ins.end - ins.start) / audioDuration) * 100;
const insDur = ins.end - ins.start;
const isDragging = dragId === ins.id;

return (
<div key={seg.id} className="absolute top-0 h-full" style={{ left: `${left}%`, width: `${width}%` }}>
<div
key={ins.id}
className={`absolute group min-h-[40px] ${isDragging ? "z-30" : "z-10"}`}
style={{
left: `${left}%`,
width: `${width}%`,
top: isMultiCam ? 12 : 4,
bottom: isMultiCam ? 12 : 4,
}}
>
{/* Main block body — move on drag, click opens ClipTrimmer */}
<button
draggable
onDragStart={(e) => handleDragStart(i, e)}
onDragOver={(e) => handleDragOver(i, e)}
onDragLeave={handleDragLeave}
onDrop={(e) => handleDrop(i, e)}
onDragEnd={handleDragEnd}
onClick={() => onClickSegment(seg)}
className={`relative w-full h-full rounded-lg flex flex-col items-center justify-center overflow-hidden cursor-grab active:cursor-grabbing transition-all border ${
isDragTarget
? "ring-2 ring-purple-400 border-purple-400 scale-[1.02]"
: dragFromIdx === i
? "opacity-50 border-white/10"
: "hover:opacity-90 border-white/10"
className={`w-full h-full rounded-lg flex flex-col items-center justify-center overflow-hidden cursor-grab active:cursor-grabbing transition-all border ${
isDragging
? "ring-2 ring-white/40 scale-[1.02]"
: "hover:brightness-110"
}`}
style={{ backgroundColor: seg.color + "33", borderColor: isDragTarget ? undefined : seg.color + "66" }}
title={`拖拽可调换顺序 · 点击设置截取范围\n${seg.materialName}\n${segDur.toFixed(1)}s${loopPercent > 0 ? ` (含循环 ${(segDur * loopPercent / 100).toFixed(1)}s)` : ""}`}
style={{
backgroundColor: ins.color + "55",
borderColor: ins.color + "88",
}}
onPointerDown={(e) => handleInsertPointerDown(ins.id, e)}
onClick={() => {
if (!dragMovedRef.current) onClickInsert(ins);
}}
title={`${ins.materialName} ${insDur.toFixed(1)}s\n点击设置截取范围`}
>
<GripVertical className="absolute top-0.5 left-0.5 h-3 w-3 text-white/30 z-[1]" />
<span className="text-[11px] text-white/90 truncate max-w-full px-1 leading-tight z-[1]">
{seg.materialName}
{ins.materialName}
</span>
<span className="text-[10px] text-white/60 leading-tight z-[1]">
{segDur.toFixed(1)}s
{insDur.toFixed(1)}s
</span>
{seg.sourceStart > 0 && (
{ins.sourceStart > 0 && (
<span className="text-[9px] text-amber-400/80 leading-tight z-[1]">
✂ {seg.sourceStart.toFixed(1)}s
✂ {ins.sourceStart.toFixed(1)}s
</span>
)}
{/* Loop fill stripe overlay */}
{loopPercent > 0 && (
<div
className="absolute top-0 right-0 h-full pointer-events-none flex items-center justify-center"
style={{
width: `${loopPercent}%`,
background: `repeating-linear-gradient(-45deg, transparent, transparent 3px, rgba(255,255,255,0.07) 3px, rgba(255,255,255,0.07) 6px)`,
borderLeft: "1px dashed rgba(255,255,255,0.25)",
}}
>
<span className="text-[9px] text-white/30">循环</span>
</div>
)}
</button>

{/* Delete button */}
<button
className="absolute -top-1.5 -right-1.5 w-5 h-5 rounded-full bg-red-500/80 hover:bg-red-500 flex items-center justify-center opacity-40 sm:opacity-0 sm:group-hover:opacity-100 transition-opacity z-20"
onClick={(e) => {
e.stopPropagation();
onRemoveInsert(ins.id);
}}
title="删除此插入"
>
<X className="w-3 h-3 text-white" />
</button>
</div>
);
})}
</div>

{/* Overflow segments — shown as gray chips */}
{overflowSegments.length > 0 && (
<div className="flex flex-wrap items-center gap-1.5 mt-1.5">
<span className="text-[10px] text-gray-500">未使用:</span>
{overflowSegments.map((seg) => (
<span
key={seg.id}
className="text-[10px] text-gray-500 bg-white/5 border border-white/10 rounded px-1.5 py-0.5"
{/* Insert candidates bar (multi-cam only) */}
{isMultiCam && insertCandidates.length > 0 && (
<div className="flex flex-wrap items-center gap-1.5 mt-2">
<span className="text-[10px] text-gray-500">可插入:</span>
{insertCandidates.map((mat) => (
<button
key={mat.id}
className="flex items-center gap-0.5 text-[10px] text-gray-300 bg-white/5 border border-white/10 hover:border-white/30 rounded px-1.5 py-0.5 transition-colors"
onClick={() => onAddInsert(mat.id)}
title={`添加 "${mat.scene || mat.name}" 到时间轴`}
>
{seg.materialName}
</span>
<Plus className="w-2.5 h-2.5" />
{mat.scene || mat.name}
</button>
))}
</div>
)}

<p className="text-[10px] text-gray-500 mt-1.5">
点击波形定位播放 · 拖拽色块调换顺序 · 点击色块设置截取范围
{isMultiCam
? "点击主素材设置截取范围 · 拖拽插入块调整位置 · 点击插入块设置截取/时长"
: "点击波形定位播放 · 点击素材条设置截取范围"
}
</p>
</>
) : (
@@ -0,0 +1,239 @@
import { useCallback, useEffect, useState } from "react";
import api from "@/shared/api/axios";
import { ApiResponse, unwrap } from "@/shared/api/types";
import { toast } from "sonner";

export type ScriptLearningStep = "input" | "analyzing" | "topics" | "generating" | "result";

const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;
const DEFAULT_WORD_COUNT = "300";

interface UseScriptLearningOptions {
  isOpen: boolean;
}

interface AnalyzeCreatorPayload {
  topics: string[];
  analysis_id: string;
  fetched_count: number;
}

interface GenerateTopicScriptPayload {
  script: string;
}

export const useScriptLearning = ({ isOpen }: UseScriptLearningOptions) => {
  const [step, setStep] = useState<ScriptLearningStep>("input");
  const [inputUrl, setInputUrl] = useState("");
  const [topics, setTopics] = useState<string[]>([]);
  const [selectedTopic, setSelectedTopic] = useState<string | null>(null);
  const [wordCount, setWordCount] = useState(DEFAULT_WORD_COUNT);
  const [generatedScript, setGeneratedScript] = useState("");
  const [error, setError] = useState<string | null>(null);
  const [analysisId, setAnalysisId] = useState<string | null>(null);
  const [fetchedCount, setFetchedCount] = useState(0);

  const resetAll = useCallback(() => {
    setStep("input");
    setInputUrl("");
    setTopics([]);
    setSelectedTopic(null);
    setWordCount(DEFAULT_WORD_COUNT);
    setGeneratedScript("");
    setError(null);
    setAnalysisId(null);
    setFetchedCount(0);
  }, []);

  useEffect(() => {
    if (isOpen) {
      resetAll();
    }
  }, [isOpen, resetAll]);

  const parseWordCount = useCallback((value: string): number | null => {
    const num = Number(value);
    if (!Number.isInteger(num)) {
      return null;
    }
    if (num < WORD_COUNT_MIN || num > WORD_COUNT_MAX) {
      return null;
    }
    return num;
  }, []);

  const handleAnalyze = useCallback(async () => {
    const urlValue = inputUrl.trim();
    if (!urlValue) {
      setError("请先输入博主主页链接");
      return;
    }

    setError(null);
    setStep("analyzing");

    try {
      const { data: res } = await api.post<ApiResponse<AnalyzeCreatorPayload>>(
        "/api/tools/analyze-creator",
        { url: urlValue },
        { timeout: 60000 }
      );
      const payload = unwrap(res);
      const topicList = payload.topics || [];

      if (topicList.length === 0) {
        throw new Error("未识别到可用话题,请更换链接重试");
      }

      setTopics(topicList);
      setSelectedTopic(topicList[0]);
      setAnalysisId(payload.analysis_id || null);
      setFetchedCount(payload.fetched_count || 0);
      setGeneratedScript("");
      setStep("topics");
    } catch (err: unknown) {
      const axiosErr = err as {
        response?: { data?: { message?: string } };
        message?: string;
      };
      const msg = axiosErr.response?.data?.message || axiosErr.message || "分析失败,请稍后重试";
      setError(msg);
      setStep("input");
    }
  }, [inputUrl]);

  const handleGenerate = useCallback(async () => {
    if (!analysisId) {
      setError("分析结果已失效,请重新分析");
      setStep("input");
      return;
    }
    if (!selectedTopic) {
      setError("请先选择一个话题");
      return;
    }

    const count = parseWordCount(wordCount.trim());
    if (count === null) {
      setError(`目标字数需在 ${WORD_COUNT_MIN}-${WORD_COUNT_MAX} 之间`);
      return;
    }

    setError(null);
    setStep("generating");

    try {
      const { data: res } = await api.post<ApiResponse<GenerateTopicScriptPayload>>(
        "/api/tools/generate-topic-script",
        {
          analysis_id: analysisId,
          topic: selectedTopic,
          word_count: count,
        },
        { timeout: 90000 }
      );
      const payload = unwrap(res);

      const script = (payload.script || "").trim();
      if (!script) {
        throw new Error("生成内容为空,请重试");
      }

      setGeneratedScript(script);
      setStep("result");
    } catch (err: unknown) {
      const axiosErr = err as {
        response?: { data?: { message?: string } };
        message?: string;
        code?: string;
      };
      let msg = axiosErr.response?.data?.message || axiosErr.message || "生成失败,请稍后重试";
      if (axiosErr.code === "ECONNABORTED" || /timeout/i.test(axiosErr.message || "")) {
        msg = "生成超时,请稍后重试(可适当减少目标字数)";
      }
      setError(msg);
      setStep("topics");
    }
  }, [analysisId, parseWordCount, selectedTopic, wordCount]);

  const handleRegenerate = useCallback(async () => {
    await handleGenerate();
  }, [handleGenerate]);

  const backToInput = useCallback(() => {
    setError(null);
    setStep("input");
  }, []);

  const backToTopics = useCallback(() => {
    setError(null);
    setStep("topics");
  }, []);

  const fallbackCopyTextToClipboard = useCallback((text: string) => {
    const textArea = document.createElement("textarea");
    textArea.value = text;
    textArea.style.top = "0";
    textArea.style.left = "0";
    textArea.style.position = "fixed";
    textArea.style.opacity = "0";

    document.body.appendChild(textArea);
    textArea.focus();
    textArea.select();

    try {
      const successful = document.execCommand("copy");
      if (successful) {
        toast.success("已复制到剪贴板");
      } else {
        toast.error("复制失败,请手动复制");
      }
    } catch {
      toast.error("复制失败,请手动复制");
    }

    document.body.removeChild(textArea);
  }, []);

  const copyToClipboard = useCallback(
    (text: string) => {
      if (navigator.clipboard && window.isSecureContext) {
        navigator.clipboard
          .writeText(text)
          .then(() => {
            toast.success("已复制到剪贴板");
          })
          .catch(() => {
            fallbackCopyTextToClipboard(text);
          });
      } else {
        fallbackCopyTextToClipboard(text);
      }
    },
    [fallbackCopyTextToClipboard]
  );

  return {
    step,
    inputUrl,
    setInputUrl,
    topics,
    selectedTopic,
    setSelectedTopic,
    wordCount,
    setWordCount,
    generatedScript,
    error,
    analysisId,
    fetchedCount,
    handleAnalyze,
    handleGenerate,
    handleRegenerate,
    backToInput,
    backToTopics,
    resetAll,
    copyToClipboard,
  };
};
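The word-count validation in the hook can be exercised in isolation. The standalone function below mirrors the hook's `parseWordCount` (constants copied from the file above); note that `Number("")` is `0`, which the range check rejects, so an empty input also yields `null`:

```typescript
const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;

// Mirrors the hook's parseWordCount: accepts only integers in
// [WORD_COUNT_MIN, WORD_COUNT_MAX]; anything else (non-numeric,
// fractional, out of range, empty) returns null.
function parseWordCount(value: string): number | null {
  const num = Number(value);
  if (!Number.isInteger(num)) {
    return null;
  }
  if (num < WORD_COUNT_MIN || num > WORD_COUNT_MAX) {
    return null;
  }
  return num;
}
```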
@@ -67,7 +67,7 @@ export function PublishPage()
   isOpen={Boolean(qrPlatform)}
   onClose={closeQrModal}
   panelClassName="w-full max-w-md rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden"
-  closeOnOverlay={false}
+  closeOnOverlay
 >
   <AppModalHeader
     title={`🔐 扫码登录 ${qrPlatform}`}
frontend/src/shared/types/timeline.ts (new file, 10 lines)
@@ -0,0 +1,10 @@
export interface InsertSegment {
  id: string;
  materialId: string;
  materialName: string;
  start: number;
  end: number;
  sourceStart: number;
  sourceEnd: number;
  color: string;
}
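Given the interface above, the timeline component positions each insert block by converting its start/end times into percentages of the audio duration. A minimal sketch of that arithmetic (the helper name is hypothetical; the math matches the component's `left`/`width` expressions):

```typescript
interface InsertSegment {
  id: string;
  materialId: string;
  materialName: string;
  start: number; // seconds on the timeline
  end: number; // seconds on the timeline
  sourceStart: number; // trim start within the source material
  sourceEnd: number; // trim end within the source material (0 = until end)
  color: string;
}

// CSS left/width (as percentage strings) for an insert block on a
// timeline whose full width represents `audioDuration` seconds.
function insertBlockStyle(ins: InsertSegment, audioDuration: number) {
  const left = (ins.start / audioDuration) * 100;
  const width = ((ins.end - ins.start) / audioDuration) * 100;
  return { left: `${left}%`, width: `${width}%` };
}
```

For a 20s audio track, an insert spanning 5s–10s occupies the quarter of the track starting at 25% from the left.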
@@ -11,6 +11,7 @@ interface AppModalProps {
   zIndexClassName?: string;
   panelClassName?: string;
   closeOnOverlay?: boolean;
+  closeOnEsc?: boolean;
   lockBodyScroll?: boolean;
 }

@@ -21,6 +22,7 @@ export function AppModal({
   zIndexClassName = "z-[220]",
   panelClassName = "w-full max-w-2xl rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden",
   closeOnOverlay = true,
+  closeOnEsc = true,
   lockBodyScroll = true,
 }: AppModalProps) {
   const containerRef = useRef<HTMLDivElement | null>(null);

@@ -34,7 +36,7 @@
   if (!isOpen) return;

   const handleEsc = (event: KeyboardEvent) => {
-    if (event.key === "Escape") onCloseRef.current();
+    if (closeOnEsc && event.key === "Escape") onCloseRef.current();
   };

   const previousActiveElement = document.activeElement as HTMLElement | null;

@@ -69,7 +71,7 @@

     previousActiveElement?.focus?.();
   };
-  }, [isOpen, lockBodyScroll]);
+  }, [closeOnEsc, isOpen, lockBodyScroll]);

   if (!isOpen || typeof document === "undefined") return null;