Compare commits

...

7 Commits

Author      SHA1        Message  Date
Kevin Wong  0939d81e9f  Update   2026-03-10 10:59:38 +08:00
Kevin Wong  f879fb0001  Update   2026-03-09 10:18:14 +08:00
Kevin Wong  b289006844  Update   2026-03-05 17:23:22 +08:00
Kevin Wong  71b45852bf  Update   2026-03-04 17:35:59 +08:00
Kevin Wong  23ff4ff86e  Update   2026-03-04 14:07:54 +08:00
Kevin Wong  091f78174e  Update   2026-03-03 15:16:38 +08:00
Kevin Wong  190fc2e590  Update   2026-03-03 12:23:49 +08:00
75 changed files with 11545 additions and 2711 deletions


@@ -2,6 +2,12 @@
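This document defines the structural conventions, interface contracts, and implementation habits for backend development. The goal is for new features to follow one consistent pattern, and for legacy logic to be extracted gradually as it gets fixed.
## Document Scope
- This document defines only backend development conventions and engineering constraints (layer responsibilities, contracts, workflows, coding habits).
- For API descriptions, deployment, and sample environment configuration, see `Docs/BACKEND_README.md`.
- Record historical changes in `Docs/DevLogs/` and `Docs/TASK_COMPLETE.md`; do not write them into this conventions document.
---
## 1. Modularization and Layering Principles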
@@ -43,7 +49,7 @@ backend/
│ │ └── admin/ # Admin features
│ ├── repositories/ # Supabase data access
│ ├── services/ # External service integrations
│ │ ├── uploader/ # Platform uploaders (douyin/weixin)
│ │ ├── uploader/ # Platform uploaders (douyin/weixin/xiaohongshu/bilibili)
│ │ ├── qr_login_service.py
│ │ ├── publish_service.py
│ │ ├── remotion_service.py
@@ -80,13 +86,23 @@ backend/
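- Each `custom_assignments` item uses `material_path/start/end/source_start/source_end?`, and the visible timeline segments are authoritative.
- `output_aspect_ratio` accepts only `9:16` / `16:9`; the default is `9:16`.
- Title display mode parameters:
  - `title_display_mode`: `short` / `persistent` (default `short`)
  - `title_display_mode`: `short` / `persistent` (default `short`; applies to both the main title and the secondary title)
  - `title_duration`: default `4.0` (seconds); takes effect only in `short` mode
- Opening secondary-title parameters:
  - `secondary_title`: secondary title text (optional, at most 20 characters); shown only in the video frame and not part of the published title
  - `secondary_title_style_id` / `secondary_title_font_size` / `secondary_title_top_margin`: secondary title style settings
- The workflow/remotion side must pass these fields through unchanged, to avoid semantic drift between frontend and backend.
### `/api/videos/cleanup` Behavior Contract
- Only cleans up the current user's generated artifacts in Storage:
  - `outputs` bucket (generated videos)
  - `generated-audios` bucket (pre-generated voice-over `.wav/.json`)
- The cleanup endpoint uses strict success semantics:
  - Returns success only when every deletion succeeds
  - Returns an error if any deletion fails; the frontend should keep the cleanup dialog open and allow a retry
- Download endpoint contract: `GET /api/videos/generated/{video_id}/download` must return `Content-Disposition: attachment` so the frontend can offer one-click download and the browser does not switch to inline playback.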
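The strict success semantics above can be sketched as follows. This is a minimal illustration, not the project's actual cleanup code: `cleanup_strict`, `CleanupError`, and the injected `delete_file` callable are hypothetical names, and the only behavior taken from the text is "attempt every deletion, succeed only if none failed".

```python
from typing import Callable, Iterable


class CleanupError(Exception):
    """Raised when at least one deletion failed (hypothetical error type)."""

    def __init__(self, failed: list[str]):
        super().__init__(f"failed to delete {len(failed)} file(s)")
        self.failed = failed


def cleanup_strict(paths: Iterable[str], delete_file: Callable[[str], None]) -> int:
    """Try every deletion, then report success only if none failed."""
    deleted = 0
    failed: list[str] = []
    for path in paths:
        try:
            delete_file(path)  # assumed to raise on failure, never to swallow it
            deleted += 1
        except Exception:
            failed.append(path)
    if failed:
        # Any single failure turns the whole request into an error
        raise CleanupError(failed)
    return deleted
```

Collecting the failed paths instead of stopping at the first error lets a retry from the dialog target only what is left, which is one possible reading of "allow a retry".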
---
## 4. 认证与权限
@@ -94,6 +110,8 @@ backend/
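- Authentication method: **HttpOnly Cookie** (`access_token`).
- `get_current_user` / `get_current_user_optional` live in `core/deps.py`.
- Single-device session validation uses `repositories/sessions.py`.
- High-cost endpoints such as AI/Tools must enforce authentication (`Depends(get_current_user)`); anonymous calls that consume external API quota are forbidden.
- Production requires `DEBUG=false` plus a non-default `JWT_SECRET_KEY`; in production mode the default key must block service startup.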
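A startup guard for the last rule might look like the sketch below. The function name and the `DEFAULT_JWT_SECRET` placeholder are assumptions made for illustration; the real default value and the place where the project performs this check are not shown in this document.

```python
DEFAULT_JWT_SECRET = "change-me"  # hypothetical stand-in for the shipped default


def check_security_baseline(debug: bool, jwt_secret_key: str) -> None:
    """Refuse to start in production mode with the default JWT secret."""
    if not debug and jwt_secret_key == DEFAULT_JWT_SECRET:
        raise RuntimeError(
            "JWT_SECRET_KEY still has its default value while DEBUG=false; "
            "set a strong random secret before starting the service."
        )
```

Calling such a check once during application startup, before any route is served, would make a misconfigured deployment fail fast instead of running with a guessable key.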
---
@@ -109,6 +127,16 @@ backend/
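- All file uploads, downloads, deletions, and moves go through `services/storage.py`.
- Use `move_file` when renaming; avoid reading or writing Storage directly.
- `delete_file` must propagate exceptions upward and must not swallow errors silently (this prevents "false success" in the cleanup endpoint).
- `list_files` is fault-tolerant by default and returns an empty list; strongly consistent scenarios such as cleanup should use `strict=True`.
- All user-supplied file paths and IDs must be validated defensively:
  - `material_id` rejects `..` sequences to prevent path traversal
  - Resource IDs such as `video_id` use a whitelist (for example `^[A-Za-z0-9_-]+$`)
- Upload and download paths must enforce size limits:
  - Material uploads follow `MAX_UPLOAD_SIZE_MB`
  - Reference audio is capped at 5MB
  - For the script extraction tool, both file uploads and URL download results are capped at 500MB
- Error responses to the frontend use generic messages by default; internal stack traces go only to server logs, to avoid leaking paths or implementation details.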
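The two ID rules above can be sketched as small pure functions. The function names are hypothetical; the `..` rejection and the `^[A-Za-z0-9_-]+$` whitelist come from the text.

```python
import re

VIDEO_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]+$")


def is_safe_material_id(material_id: str) -> bool:
    """Reject empty values and any '..' sequence (path traversal)."""
    return bool(material_id) and ".." not in material_id


def is_valid_video_id(video_id: str) -> bool:
    """Accept only IDs made of letters, digits, '_' and '-'."""
    # fullmatch avoids the case where '$' would tolerate a trailing newline
    return VIDEO_ID_PATTERN.fullmatch(video_id) is not None
```

A router would typically map a failed check to an HTTP 400 response before touching Storage.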
### Cookie 存储(用户隔离)
@@ -133,6 +161,8 @@ backend/user_data/{user_uuid}/cookies/
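- Business logic lives in service/workflow.
- Database access lives in repositories.
- Use `loguru` for all logging.
- GLM SDK calls are consolidated in `services/glm_service.py` (through a single entry method), so modules do not re-assemble `chat.completions.create` calls themselves.
- For scraping calls used by deep script learning, the router should pass `current_user.id` through to `creator_scraper`, so the user's Cookie context can be reused and `analysis_id` stays isolated per user.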
---
@@ -162,7 +192,16 @@ backend/user_data/{user_uuid}/cookies/
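- `MUSETALK_BATCH_SIZE` (inference batch size, default 32)
- `MUSETALK_VERSION` (v15)
- `MUSETALK_USE_FLOAT16` (half precision, default true)
- `LIPSYNC_DURATION_THRESHOLD` (seconds; MuseTalk is used at or above this value; default 120)
- `LIPSYNC_DURATION_THRESHOLD` (seconds; MuseTalk is used at or above this value; code default 120, this repository's current `.env` sets 100)
### Small-Face Lip-Sync Quality Compensation (local lip-sync path)
- `LIPSYNC_SMALL_FACE_ENHANCE` (master switch, default false)
- `LIPSYNC_SMALL_FACE_THRESHOLD` (trigger threshold, default 256)
- `LIPSYNC_SMALL_FACE_UPSCALER` (`gfpgan` / `codeformer`)
- `LIPSYNC_SMALL_FACE_GPU_ID` (upscaling GPU, default 0)
- `LIPSYNC_SMALL_FACE_FAIL_OPEN` (fall back on failure, default true)
> For deployment and verification details, see `Docs/FACEENHANCE_DEPLOY.md`.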
### 微信视频号
- `WEIXIN_HEADLESS_MODE` (headful/headless-new)
@@ -179,6 +218,14 @@ backend/user_data/{user_uuid}/cookies/
- `DOUYIN_FORCE_SWIFTSHADER`
- `DOUYIN_DEBUG_ARTIFACTS` / `DOUYIN_RECORD_VIDEO` / `DOUYIN_KEEP_SUCCESS_VIDEO`
### Xiaohongshu
- `XIAOHONGSHU_HEADLESS_MODE` (headful/headless-new, default headless-new)
- `XIAOHONGSHU_CHROME_PATH` / `XIAOHONGSHU_BROWSER_CHANNEL`
- `XIAOHONGSHU_USER_AGENT`
- `XIAOHONGSHU_LOCALE` / `XIAOHONGSHU_TIMEZONE_ID`
- `XIAOHONGSHU_FORCE_SWIFTSHADER`
- `XIAOHONGSHU_DEBUG_ARTIFACTS`
### Alipay
- `ALIPAY_APP_ID` / `ALIPAY_PRIVATE_KEY_PATH` / `ALIPAY_PUBLIC_KEY_PATH`
- `ALIPAY_NOTIFY_URL` / `ALIPAY_RETURN_URL`
@@ -191,8 +238,9 @@ backend/user_data/{user_uuid}/cookies/
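## 10. Playwright Publishing Debugging
- Diagnostic logs are written to `backend/app/debug_screenshots/weixin_network.log` / `douyin_network.log`
- Key failure screenshots: `backend/app/debug_screenshots/weixin_*.png` / `douyin_*.png`
- Key failure screenshots: `backend/app/debug_screenshots/weixin_*.png` / `douyin_*.png` / `xiaohongshu_*.png`
- For WeChat Channels, headful + xvfb-run is recommended (avoids headless decoding and fingerprinting problems)
- Publishing-specific implementation details (login flow, success detection, troubleshooting) are maintained in `Docs/PUBLISH_DEPLOY.md`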
---


@@ -1,6 +1,12 @@
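# ViGent2 Backend Development Guide
This document provides a backend architecture overview and API conventions. For development conventions and layering rules, see `Docs/BACKEND_DEV.md`.
This document provides a backend architecture overview, API descriptions, and runtime configuration.
## 📌 Document Scope
- This document describes the backend service capabilities, APIs, and how to deploy and run it (aimed at usage and integration testing).
- For development conventions, layering constraints, and implementation habits, see `Docs/BACKEND_DEV.md`.
- For historical changes and milestones, see `Docs/DevLogs/` and `Docs/TASK_COMPLETE.md`.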
---
@@ -8,7 +14,7 @@
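The backend uses the **FastAPI** framework and is built on Python 3.10+. It is mainly responsible for business logic, AI task scheduling, and interaction with the individual microservice components.
### Directory Structure
### Directory Structure (Overview)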
```
backend/
@@ -36,6 +42,8 @@ backend/
└── requirements.txt # 依赖清单
```
> For detailed layer responsibilities (router/service/workflow/repositories) and development constraints, see `Docs/BACKEND_DEV.md`.
---
## 🔌 API Conventions
@@ -56,24 +64,32 @@ backend/
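2. **Video Generation (Videos)**
   * `POST /api/videos/generate`: submit a generation task
   * `GET/POST /api/videos/voice-preview`: generate a short voice preview clip (returns a binary audio stream)
   * `POST /api/videos/cleanup`: clean up the current user's generated workspace artifacts (outputs + generated-audios)
   * `GET /api/videos/tasks/{task_id}`: query the status of a single task
   * `GET /api/videos/tasks`: list all of the user's tasks
   * `GET /api/videos/generated`: list historical videos
   * `GET /api/videos/generated/{video_id}/download`: download a historical video (`Content-Disposition: attachment`)
   * `DELETE /api/videos/generated/{video_id}`: delete a historical video
> `POST /api/videos/cleanup` uses strict success semantics: it returns success only when all target files are deleted; if any deletion fails, it returns an error and prompts a retry.
3. **Material Management (Materials)**
   * `POST /api/materials`: upload a material
   * `GET /api/materials`: list materials
   * `PUT /api/materials/{material_id}`: rename a material
   * `GET /api/materials/stream/{material_id}`: stream the material file from the same origin (used for frontend canvas frame capture, avoiding cross-origin CORS taint)
   * `GET /api/materials/stream/{material_id}`: stream the material file from the same origin (used for frontend canvas frame capture, avoiding cross-origin CORS taint; the server rejects `..` paths)
4. **Social Publishing (Publish)**
   * `POST /api/publish`: publish a video to Douyin / WeChat Channels / Bilibili / Xiaohongshu
   * `POST /api/publish/login`: log in to a platform by QR code
   * `GET /api/publish/login/status`: poll login status (including the face-verification QR code)
   * `POST /api/publish/login/{platform}`: fetch the platform QR code and start QR login
   * `GET /api/publish/login/status/{platform}`: poll login status (including the Douyin face-verification QR code)
   * `POST /api/publish/logout/{platform}`: log out of the platform (deletes the Cookie)
   * `POST /api/publish/cookies/save/{platform}`: save a Cookie extracted on the client
   * `GET /api/publish/accounts`: list logged-in accounts
   * `GET /api/publish/screenshot/{filename}`: fetch the publish-success screenshot (login required)
> Tip: for WeChat Channels/Douyin publishing, running the backend with headful + xvfb-run is recommended.
> Tip: for WeChat Channels/Douyin publishing, running the backend with headful + xvfb-run is recommended. For publishing-specific implementation and deployment notes, see `Docs/PUBLISH_DEPLOY.md`.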
5. **资源库 (Assets)**
* `GET /api/assets/subtitle-styles`: 字幕样式列表
@@ -88,8 +104,9 @@ backend/
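   * `POST /api/ref-audios/{id}/retranscribe`: re-recognize the reference audio text (Whisper transcription + automatic trimming beyond 10s)
7. **AI Features (AI)**
   * `POST /api/ai/generate-meta`: AI-generated titles and tags
   * `POST /api/ai/translate`: AI multilingual translation (9 target languages supported)
   * `POST /api/ai/generate-meta`: AI-generated titles and tags (login required)
   * `POST /api/ai/translate`: AI multilingual translation (9 target languages supported, login required)
   * `POST /api/ai/rewrite`: AI script rewriting (login required)
8. **Pre-generated Voice-overs (Generated Audios)**
   * `POST /api/generated-audios/generate`: generate a voice-over asynchronously (returns task_id)
@@ -99,11 +116,20 @@ backend/
   * `PUT /api/generated-audios/{audio_id}`: rename a voice-over
9. **Tools**
   * `POST /api/tools/extract-script`: extract a script from a video link
   * `POST /api/tools/extract-script`: extract a script from a video link (login required)
   * `POST /api/tools/analyze-creator`: analyze a creator's titles and return trending topics (login required)
   * `POST /api/tools/generate-topic-script`: generate a script from the selected topic (login required)
> Deep script learning notes:
> - Supported platforms: Douyin / Bilibili creator profile links.
> - Scraping strategy: titles are currently scraped uniformly through the Playwright main path (Douyin/Bilibili), combined with the user's logged-in Cookie context to improve the success rate.
> - `analysis_id` is bound to `user_id` and has a TTL (default 20 minutes); it is used to read the title context safely in the later "generate script" stage.
10. **Health Checks**
   * `GET /api/lipsync/health`: lip-sync service health (includes LatentSync + MuseTalk + hybrid routing threshold)
   * `GET /api/voiceclone/health`: CosyVoice 3.0 service health
   * `GET /api/videos/lipsync/health`: lip-sync service health (includes LatentSync + MuseTalk + hybrid routing threshold + `data.small_face_enhance`)
   * `GET /api/videos/voiceclone/health`: CosyVoice 3.0 service health
> Health fields for the small-face lip-sync quality compensation pipeline: `data.small_face_enhance.enabled` (master switch), `threshold` (trigger threshold), `detector_loaded` (whether SCRFD has been lazily loaded).
11. **Payment**
   * `POST /api/payment/create-order`: create an Alipay desktop-website payment order (requires payment_token)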
@@ -112,6 +138,16 @@ backend/
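> If the account is not activated or has expired at login, the backend returns 403 + `payment_token` and the frontend redirects to the `/pay` page to complete payment. See the [Alipay Deployment Guide](ALIPAY_DEPLOY.md).
### Security Baseline (Production)
- `DEBUG` must be set to `false`: the auth Cookie then carries `Secure` and is sent only over HTTPS.
- `JWT_SECRET_KEY` must be a strong random value and must not use the default; when `DEBUG=false` and it is still the default, the backend refuses to start.
- Upload size limits:
  - `POST /api/materials`: limited by `MAX_UPLOAD_SIZE_MB` (default 500MB)
  - `POST /api/ref-audios`: 5MB
  - `POST /api/tools/extract-script`: both file uploads and URL download results are limited to 500MB
- `video_id` is whitelist-validated in the download/delete endpoints (`^[A-Za-z0-9_-]+$`); invalid values return 400 directly.
### Unified Response Structure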
```json
@@ -138,9 +174,13 @@ backend/
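- `speed`: speech rate (voice-clone mode; default 1.0, range 0.8-1.2)
- `custom_assignments`: array of custom material assignments (each item has `material_path` / `start` / `end` / `source_start` / `source_end?`); when present, generation follows the visible timeline segments first
- `output_aspect_ratio`: output aspect ratio (`9:16` or `16:9`, default `9:16`)
- `language`: TTS language (auto-detected by default; passed through to CosyVoice 3.0 in voice-clone mode)
- `lipsync_model`: lip-sync model routing mode (`default` / `fast` / `advanced`)
  - `default`: threshold routing (`LIPSYNC_DURATION_THRESHOLD`)
  - `fast`: force MuseTalk (falls back to LatentSync when unavailable)
  - `advanced`: force LatentSync
- `language`: TTS locale (default `zh-CN`; mapped to Whisper's `zh/en/...` and CosyVoice's `Chinese/English/Auto`)
- `title`: opening title text
- `title_display_mode`: title display mode (`short` / `persistent`, default `short`)
- `title_display_mode`: title display mode (`short` / `persistent`, default `short`; the mode applies to both the main title and the secondary title)
- `title_duration`: title display duration (seconds, default `4.0`; takes effect only in `short` mode)
- `subtitle_style_id`: subtitle style ID
- `title_style_id`: title style ID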
@@ -161,7 +201,7 @@ backend/
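- Multi-material segments are uniformly re-encoded before concatenation and forced to `25fps + CFR`, which reduces stutter caused by inconsistent time bases at segment boundaries.
- The concat step enables `+genpts` to rebuild timestamps, improving timeline continuity after concatenation.
- MOV materials with rotation metadata are first normalized for orientation, then proceed to resolution checks and later steps.
- The compose stage (merging video and audio tracks) uses `-c:v copy` stream copy instead of re-encoding and finishes almost instantly
- The compose stage (merging video and audio tracks) uses `-c:v copy` stream copy **when the video does not need looping**; it re-encodes only when looping is required
- FFmpeg subprocesses have timeout protection: `_run_ffmpeg()` 600 seconds and `_get_duration()` 30 seconds, which prevents malformed files from causing a permanent hang.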
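The timeout protection in the last bullet can be illustrated with the standard library alone. This is a sketch, not the project's `_run_ffmpeg()`: the helper name `run_with_timeout` is hypothetical, and a Python one-liner stands in for an FFmpeg command so the example runs without FFmpeg installed.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout_s: float) -> subprocess.CompletedProcess:
    """Run a child process and fail loudly if it exceeds timeout_s."""
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired
        return subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"command timed out after {timeout_s}s: {cmd[0]}") from exc


if __name__ == "__main__":
    # Stand-in for an ffmpeg invocation so the sketch runs anywhere
    result = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30)
    print(result.stdout.strip())
```

With the limits quoted above, an encode call would pass `timeout_s=600` and a duration probe `timeout_s=30`.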
### 全局并发控制
@@ -203,14 +243,14 @@ pip install -r requirements.txt
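### 3. Environment Variable Configuration
Copy `.env.example` to `.env` and configure the required keys:
The repository currently uses `backend/.env` as the baseline runtime configuration; replace sensitive values for your environment and verify the key items below (do not commit real secrets in production):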
```ini
# Supabase
SUPABASE_URL=http://localhost:8008
SUPABASE_KEY=your_service_role_key
# GLM API (used for AI title generation)
# GLM API (used for AI titles / rewriting / translation / deep script learning)
GLM_API_KEY=your_glm_api_key
# LatentSync configuration
@@ -220,9 +260,24 @@ LATENTSYNC_GPU_ID=1
MUSETALK_GPU_ID=0
MUSETALK_API_URL=http://localhost:8011
MUSETALK_BATCH_SIZE=32
LIPSYNC_DURATION_THRESHOLD=120
LIPSYNC_DURATION_THRESHOLD=100
# Small-face lip-sync quality compensation (off by default; a gradual rollout is recommended)
LIPSYNC_SMALL_FACE_ENHANCE=false
LIPSYNC_SMALL_FACE_THRESHOLD=256
LIPSYNC_SMALL_FACE_UPSCALER=gfpgan
LIPSYNC_SMALL_FACE_GPU_ID=0
LIPSYNC_SMALL_FACE_FAIL_OPEN=true
# MuseTalk tunable parameters (example)
MUSETALK_DETECT_EVERY=2
MUSETALK_BLEND_CACHE_EVERY=2
MUSETALK_ENCODE_CRF=14
MUSETALK_ENCODE_PRESET=slow
```
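> For deployment, weights, and rollback of the small-face lip-sync quality compensation pipeline, see `Docs/FACEENHANCE_DEPLOY.md` (integrated only on the local `_local_generate()` path; remote mode is not integrated yet).
### 4. Start the Service
**Development mode (hot reload)**: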
@@ -232,51 +287,11 @@ uvicorn app.main:app --host 0.0.0.0 --port 8006 --reload
---
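## 🧩 Service Integration Guide
## 🧩 Development Conventions and Testing
### Integrating a New Model
To integrate a new AI model (for example, a new TTS engine):
1. Create a new Service class under `app/services/` (such as `NewTTSService`).
2. Implement a `generate` method; it can invoke a subprocess or make HTTP requests.
3. **Important**: if the model occupies a GPU, be sure to use `asyncio.Lock` for concurrency control to prevent OOM.
4. Create the corresponding module under `app/modules/`, add router/service/schemas, and register the router in `main.py`.
### Hybrid Lip-Sync Routing
`lipsync_service.py` implements hybrid LatentSync + MuseTalk routing:
- Short videos (<`LIPSYNC_DURATION_THRESHOLD`s) → LatentSync 1.6 (GPU1, port 8007)
- Long videos (>= threshold) → MuseTalk 1.5 (GPU0, port 8011)
- Automatically falls back to LatentSync when MuseTalk is unavailable
- The routing logic is fully transparent to the workflow
### Adding Scheduled Tasks
**APScheduler** or **Crontab** is currently recommended for managing scheduled tasks.
Scheduled publishing for social media currently relies on delayed execution in `playwright`; a migration to a Celery queue is planned.
---
## 🛡️ Error Handling
The whole project uses `Loguru` for logging.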
```python
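from fastapi import HTTPException
from loguru import logger

try:
    ...  # business logic goes here
except Exception as e:
    # Log the detail server-side; return only a generic message to the client
    logger.error(f"Operation failed: {e}")
    raise HTTPException(status_code=500, detail="Internal server error")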
```
---
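## 🧪 Testing
Run the test suite:
- For new modules, layer responsibilities, unified responses, error handling, and debugging conventions, see `Docs/BACKEND_DEV.md`.
- After changes to core flows, a basic smoke test is recommended: login, video generation, publishing.
- Test command: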
```bash
pytest


@@ -8,7 +8,7 @@
| 端口 | 8010 |
| GPU | 0 (CUDA_VISIBLE_DEVICES=0) |
| 推理精度 | FP16 (自动混合精度) |
| PM2 名称 | vigent2-cosyvoice (id=15) |
| PM2 名称 | vigent2-cosyvoice |
| Conda 环境 | cosyvoice (Python 3.10) |
| 启动脚本 | `run_cosyvoice.sh` |
| 服务脚本 | `models/CosyVoice/cosyvoice_server.py` |


@@ -97,10 +97,13 @@ python -m scripts.server # 测试能否启动Ctrl+C 退出
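### 3b. MuseTalk 1.5 (long-video lip sync, GPU0)
> MuseTalk is a single-step latent-space inpainting model (not a diffusion model). Its inference speed is close to real time, which suits long videos of >=120s. It shares GPU0 with CosyVoice; fp16 inference needs roughly 4-8GB of VRAM. The compositing stage uses NVENC GPU hardware encoding (h264_nvenc) plus pure numpy blending, avoiding double encoding and PIL conversion overhead.
> MuseTalk is a single-step latent-space inpainting model (not a diffusion model). Its inference speed is close to real time, which suits long videos that reach the routing threshold (this repository's current `.env` example is >=100s). It shares GPU0 with CosyVoice; fp16 inference needs roughly 4-8GB of VRAM. The compositing stage now encodes directly through an FFmpeg rawvideo pipe (`libx264` with configurable CRF/preset) and keeps numpy blending, reducing lossy intermediate files.
See the detailed standalone deployment guide:
**[MuseTalk Deployment Guide](MUSETALK_DEPLOY.md)**
See the detailed standalone deployment guide:
**[MuseTalk Deployment Guide](MUSETALK_DEPLOY.md)**
Deployment and verification of the optional small-face lip-sync quality compensation:
**[Small-Face Lip-Sync Quality Compensation Deployment Guide](FACEENHANCE_DEPLOY.md)**
Brief steps:
1. Create a dedicated `musetalk` Conda environment (Python 3.10 + PyTorch 2.0.1 + CUDA 11.8)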
@@ -136,26 +139,30 @@ pip install -r requirements.txt
playwright install chromium
```
> 提示:视频号发布建议使用系统 Chrome + xvfb-run避免 headless 解码失败)。
> 抖音发布同样建议 headful 模式 (`DOUYIN_HEADLESS_MODE=headful`)。
> 提示:视频号发布建议使用系统 Chrome + xvfb-run避免 headless 解码失败)。
> 抖音发布同样建议 headful 模式 (`DOUYIN_HEADLESS_MODE=headful`)。
> 四平台发布专项实现说明请见 `Docs/PUBLISH_DEPLOY.md`。
### 扫码登录注意事项
- **Cookie 按用户隔离**:每个用户的 Cookie 存储在 `backend/user_data/{uuid}/cookies/` 目录下,多用户并发登录互不干扰。
- **抖音 QR 登录关键教训**
- 扫码后绝对**不能重新加载 QR 页面**,否则会销毁会话 token
- 使用**新标签页**检测登录完成状态(检查 URL 包含 `creator-micro` + session cookies 存在)
- 抖音可能弹出**刷脸验证**,后端会自动提取验证二维码返回给前端展示
- **微信视频号发布**:标题、描述、标签统一写入"视频描述"字段
- **抖音 QR 登录关键教训**
- 扫码后绝对**不能重新加载 QR 页面**,否则会销毁会话 token
- 使用**新标签页**检测登录完成状态(检查 URL 包含 `creator-micro` + session cookies 存在)
- 抖音可能弹出**刷脸验证**,后端会自动提取验证二维码返回给前端展示
- **小红书 QR 登录关键点**
- 创作平台默认可能是短信登录视图,需先切换到扫码登录再抓取二维码
- 扫码后可能跳转 `creator.xiaohongshu.com/new/home`,不一定命中旧 `publish` 成功指示 URL
- **微信视频号发布**:标题、描述、标签统一写入"视频描述"字段
---
### 可选AI 标题/标签生成
### 可选AI 标题/标签生成
> ✅ 如需启用“AI 标题/标签生成”功能,请确保后端可访问外网 API。
- 需要可访问 `https://open.bigmodel.cn`
- API Key 配置在 `backend/app/services/glm_service.py`(建议替换为自己的密钥)
- 需要可访问 `https://open.bigmodel.cn`
- API Key 配置在 `backend/.env``GLM_API_KEY`
---
@@ -195,28 +202,26 @@ playwright install chromium
## 步骤 7: 配置环境变量
```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
# 复制配置模板
cp .env.example .env
```
> 💡 **说明**`.env.example` 已包含正确的默认配置,直接复制即可使用。
> 如需自定义,可编辑 `.env` 修改以下参数:
| 配置项 | 默认值 | 说明 |
|--------|--------|------|
| `SUPABASE_URL` | `http://localhost:8008` | Supabase API 内部地址 |
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Supabase API 公网地址 (前端访问) |
| `LATENTSYNC_GPU_ID` | 1 | GPU 选择 (0 或 1) |
| `LATENTSYNC_USE_SERVER` | false | 设为 true 以启用常驻服务加速 |
| `LATENTSYNC_INFERENCE_STEPS` | 20 | 推理步数 (16-50) |
| `LATENTSYNC_GUIDANCE_SCALE` | 2.0 | 引导系数 (1.0-3.0) |
| `LATENTSYNC_ENABLE_DEEPCACHE` | true | DeepCache 推理加速 |
| `LATENTSYNC_SEED` | 1247 | 固定随机种子(可复现 |
| `DEBUG` | true | 生产环境改为 false |
| `REDIS_URL` | `redis://localhost:6379/0` | 任务状态存储(不可用时回退内存) |
```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
```
> 💡 **说明**:当前仓库直接使用 `backend/.env`。请按你的环境替换敏感值并确认以下参数。
> 如需自定义,可编辑 `.env` 修改以下参数:
| 配置项 | 当前示例值 | 说明 |
|--------|------------|------|
| `SUPABASE_URL` | `http://localhost:8008` | Supabase API 内部地址 |
| `SUPABASE_PUBLIC_URL` | `https://api.hbyrkj.top` | Supabase API 公网地址 (前端访问) |
| `LATENTSYNC_GPU_ID` | 1 | GPU 选择 (0 或 1) |
| `LATENTSYNC_USE_SERVER` | true | 设为 true 以启用常驻服务加速 |
| `LATENTSYNC_INFERENCE_STEPS` | 30 | 推理步数 (16-50) |
| `LATENTSYNC_GUIDANCE_SCALE` | 1.9 | 引导系数 (1.0-3.0) |
| `LATENTSYNC_ENABLE_DEEPCACHE` | true | DeepCache 推理加速 |
| `LATENTSYNC_SEED` | 1247 | 固定随机种子(可复现) |
| `DEBUG` | false | 生产环境必须为 false仅开发环境可设 true |
| `JWT_SECRET_KEY` | 强随机值 | 生产环境禁止默认值;默认值在 `DEBUG=false` 下会阻止后端启动 |
| `REDIS_URL` | `redis://localhost:6379/0` | 任务状态存储(不可用时回退内存 |
| `WEIXIN_HEADLESS_MODE` | headless-new | 视频号 Playwright 模式 (headful/headless-new) |
| `WEIXIN_CHROME_PATH` | `/usr/bin/google-chrome` | 系统 Chrome 路径 |
| `WEIXIN_BROWSER_CHANNEL` | | Chromium 通道 (可选) |
@@ -229,19 +234,31 @@ cp .env.example .env
| `DOUYIN_CHROME_PATH` | `/usr/bin/google-chrome` | 抖音 Chrome 路径 |
| `DOUYIN_BROWSER_CHANNEL` | | 抖音 Chromium 通道 (可选) |
| `DOUYIN_USER_AGENT` | Chrome/144 UA | 抖音浏览器指纹 UA |
| `DOUYIN_LOCALE` | zh-CN | 抖音语言环境 |
| `DOUYIN_TIMEZONE_ID` | Asia/Shanghai | 抖音时区 |
| `DOUYIN_FORCE_SWIFTSHADER` | true | 强制软件 WebGL |
| `DOUYIN_DEBUG_ARTIFACTS` | false | 保留调试截图 |
| `DOUYIN_RECORD_VIDEO` | false | 录制浏览器操作视频 |
| `DOUYIN_KEEP_SUCCESS_VIDEO` | false | 成功后保留录屏 |
| `DOUYIN_LOCALE` | zh-CN | 抖音语言环境 |
| `DOUYIN_TIMEZONE_ID` | Asia/Shanghai | 抖音时区 |
| `DOUYIN_FORCE_SWIFTSHADER` | true | 强制软件 WebGL |
| `XIAOHONGSHU_HEADLESS_MODE` | headless-new | 小红书 Playwright 模式 (headful/headless-new) |
| `XIAOHONGSHU_CHROME_PATH` | `/usr/bin/google-chrome` | 小红书 Chrome 路径 |
| `XIAOHONGSHU_BROWSER_CHANNEL` | | 小红书 Chromium 通道 (可选) |
| `XIAOHONGSHU_USER_AGENT` | Chrome/144 UA | 小红书浏览器指纹 UA |
| `XIAOHONGSHU_LOCALE` | zh-CN | 小红书语言环境 |
| `XIAOHONGSHU_TIMEZONE_ID` | Asia/Shanghai | 小红书时区 |
| `XIAOHONGSHU_FORCE_SWIFTSHADER` | true | 强制软件 WebGL |
| `DOUYIN_DEBUG_ARTIFACTS` | false | 保留调试截图 |
| `DOUYIN_RECORD_VIDEO` | false | 录制浏览器操作视频 |
| `DOUYIN_KEEP_SUCCESS_VIDEO` | false | 成功后保留录屏 |
| `CORS_ORIGINS` | `*` | CORS 允许源 (生产环境建议白名单) |
| `MUSETALK_GPU_ID` | 0 | MuseTalk GPU 编号 |
| `MUSETALK_API_URL` | `http://localhost:8011` | MuseTalk 常驻服务地址 |
| `MUSETALK_BATCH_SIZE` | 32 | MuseTalk 推理批大小 |
| `MUSETALK_VERSION` | v15 | MuseTalk 模型版本 |
| `MUSETALK_USE_FLOAT16` | true | MuseTalk 半精度加速 |
| `LIPSYNC_DURATION_THRESHOLD` | 120 | 秒,>=此值用 MuseTalk<此值用 LatentSync |
| `MUSETALK_VERSION` | v15 | MuseTalk 模型版本 |
| `MUSETALK_USE_FLOAT16` | true | MuseTalk 半精度加速 |
| `LIPSYNC_DURATION_THRESHOLD` | 100 | 秒,>=此值用 MuseTalk<此值用 LatentSync(代码默认 120建议在 `.env` 显式配置) |
| `LIPSYNC_SMALL_FACE_ENHANCE` | false | 小脸口型质量补偿总开关(建议先关闭,灰度验证后开启) |
| `LIPSYNC_SMALL_FACE_THRESHOLD` | 256 | 小脸触发阈值(像素) |
| `LIPSYNC_SMALL_FACE_UPSCALER` | gfpgan | 超分模型(`gfpgan` / `codeformer` |
| `LIPSYNC_SMALL_FACE_GPU_ID` | 0 | 小脸补偿超分 GPU建议与 MuseTalk 同卡) |
| `LIPSYNC_SMALL_FACE_FAIL_OPEN` | true | 补偿链路失败时是否自动回退原流程 |
| `ALIPAY_APP_ID` | 空 | 支付宝应用 APPID |
| `ALIPAY_PRIVATE_KEY_PATH` | 空 | 应用私钥 PEM 文件路径 |
| `ALIPAY_PUBLIC_KEY_PATH` | 空 | 支付宝公钥 PEM 文件路径 |
@@ -250,7 +267,9 @@ cp .env.example .env
| `PAYMENT_AMOUNT` | `999.00` | 会员价格 (元) |
| `PAYMENT_EXPIRE_DAYS` | `365` | 会员有效天数 |
> 支付宝完整配置步骤密钥生成、PEM 格式、产品开通等)请参考 **[支付宝部署指南](ALIPAY_DEPLOY.md)**。
> 支付宝完整配置步骤密钥生成、PEM 格式、产品开通等)请参考 **[支付宝部署指南](ALIPAY_DEPLOY.md)**。
> 认证相关强约束:当 `DEBUG=false` 时,后端登录 Cookie 会带 `Secure`,前端必须通过 HTTPS 域名访问HTTP 端口直连无法保持登录态。
---
@@ -308,11 +327,11 @@ cd /home/rongye/ProgramFiles/ViGent2/models/MuseTalk
/home/rongye/ProgramFiles/miniconda3/envs/musetalk/bin/python scripts/server.py
```
### 验证
1. 访问 http://服务器IP:3002 查看前端
2. 访问 http://服务器IP:8006/docs 查看 API 文档
3. 上传测试视频,生成口播视频
### 验证
1. 访问 `https://你的前端域名` 查看前端(生产环境不要用 HTTP 端口直连)
2. 访问 `http://服务器IP:8006/docs` 查看 API 文档(仅内网/运维调试)
3. 上传测试视频,生成口播视频
---
@@ -402,7 +421,7 @@ curl http://localhost:8010/health
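### 5. Start the MuseTalk Long-Video Lip-Sync Service
> Long videos (>=120s) are routed to MuseTalk automatically. When MuseTalk is unavailable, the request falls back to LatentSync automatically.
> Videos that reach the threshold (the current `.env` example is >=100s) are routed to MuseTalk automatically. When MuseTalk is unavailable, the request falls back to LatentSync automatically.
> For detailed deployment steps, see the [MuseTalk Deployment Guide](MUSETALK_DEPLOY.md).
1. The startup script is in the project root: `run_musetalk.sh`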
@@ -532,8 +551,8 @@ server {
GLM_API_KEY=your_zhipu_api_key
```
3. **验证**:
访问 `http://localhost:8006/docs`,测试 `/api/tools/extract-script` 接口
3. **验证**:
访问 `http://localhost:8006/docs`在已登录会话下测试 `/api/tools/extract-script`(该接口需认证)
---


@@ -1,8 +1,8 @@
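## Remotion Cache Fix + Encoding Pipeline Quality Optimization + Lip-Sync Fault Tolerance + Model Selection (Day 30)
## Remotion Cache Fix + Encoding Pipeline Quality Optimization + Lip-Sync Fault Tolerance + Unified Dropdown Interaction (Day 30)
### Overview
This round addresses four areas: (1) a severe bug where the Remotion bundle cache caused titles/subtitles to go missing; (2) a full optimization of the LatentSync + MuseTalk dual-engine encoding pipeline that removes redundant lossy encodes; (3) improved LatentSync robustness, allowing inference to continue rather than abort the task when some frames of the material have no detectable face; (4) frontend lip-sync model selection, letting users switch between default/fast/advanced models as needed
This round was finally consolidated into five areas: (1) a severe bug where the Remotion bundle cache caused titles/subtitles to go missing; (2) a full optimization of the LatentSync + MuseTalk dual-engine encoding pipeline that removes redundant lossy encodes; (3) improved LatentSync robustness, allowing inference to continue rather than abort the task when some frames of the material have no detectable face; (4) lip-sync model selection passed through the whole chain (default/fast/advanced); (5) home and publish page selectors unified on the SelectPopover interaction, with fixes for occlusion, positioning, and preview layering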
---
@@ -278,66 +278,102 @@ needs_audio_compose = str(final_audio_path) != str(audio_path)
---
### 6. 唇形模型前端选择
前端生成按钮右侧新增模型下拉,用户可按需选择唇形同步引擎,全链路透传到后端路由。
#### 模型选项
| 选项 | 值 | 路由逻辑 |
|------|------|------|
| 默认模型 | `default` | 保持现有阈值策略`LIPSYNC_DURATION_THRESHOLD` 分水岭,短视频 LatentSync长视频 MuseTalk |
| 快速模型 | `fast` | 强制 MuseTalk不可用时回退 LatentSync |
| 高级模型 | `advanced` | 强制 LatentSync,跳过 MuseTalk |
三种模式最终都有 LatentSync 兜底,不会出现无模型可用的情况。
#### 数据流
```
前端 select → setLipsyncModelMode("fast") → localStorage 持久化
用户点击"生成视频" → handleGenerate()
→ payload.lipsync_model = lipsyncModelMode
→ POST /api/videos/generate { ..., lipsync_model: "fast" }
→ workflow: req.lipsync_model 透传给 lipsync.generate(model_mode=...)
→ lipsync_service.generate(): 按 model_mode 路由
→ fast: 强制 MuseTalk → 回退 LatentSync
→ advanced: 强制 LatentSync
→ default: 阈值策略
```
#### 改动文件
| 文件 | 改动 |
|------|------|
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | 生成按钮右侧新增模型 `<select>` 下拉 |
| `frontend/src/features/home/ui/HomePage.tsx` | 透传 `modelMode` / `onModelModeChange` |
| `frontend/src/features/home/model/useHomeController.ts` | `lipsyncModelMode` state + payload 透传 |
| `frontend/src/features/home/model/useHomePersistence.ts` | 读/校验/写三步持久化 |
| `backend/app/modules/videos/schemas.py` | `lipsync_model: Literal["default", "fast", "advanced"]` |
| `backend/app/modules/videos/workflow.py` | 多素材/单素材两处 `model_mode=req.lipsync_model` 透传 |
| `backend/app/services/lipsync_service.py` | `generate()` 新增 `model_mode` 参数,三路分支路由 |
---
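### 6. End-to-End Lip-Sync Model Selection
A model selector was added to the right of the frontend "Generate Video" button; the selected value is passed through the whole chain to backend routing and the inference service.
#### Model Options
| Option | Value | Routing logic |
|------|------|------|
| Default model | `default` | Keeps threshold routing (`LIPSYNC_DURATION_THRESHOLD`, currently recommended 100s) |
| Fast model | `fast` | Forces MuseTalk; falls back to LatentSync when unavailable |
| Advanced model | `advanced` | Forces LatentSync |
#### Final UI Form
- The model button was upgraded from a native `<select>` to the unified `SelectPopover`
- Trigger labels now use business terms (`Default model / Fast model / Advanced model` + `Smart routing by duration / Speed first / Quality first`)
- The selection is persisted to `lipsyncModelMode` in `useHomePersistence`
#### Data Flow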
```
Frontend SelectPopover → setLipsyncModelMode("fast") → persisted to localStorage
User clicks "Generate Video" → handleGenerate()
  → payload.lipsync_model = lipsyncModelMode
  → POST /api/videos/generate { ..., lipsync_model: "fast" }
  → workflow: req.lipsync_model is passed to lipsync.generate(model_mode=...)
  → lipsync_service.generate(): route by model_mode
    → fast: force MuseTalk → fall back to LatentSync
    → advanced: force LatentSync
    → default: threshold strategy
```
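The three-way routing at the end of this data flow can be sketched as a pure function. This is an illustration under stated assumptions, not the code in `lipsync_service.py`: the function name, the string return values, and the `musetalk_available` flag are hypothetical, while the mode names, the `>=` threshold comparison, the code default of 120 seconds, and the LatentSync fallback come from the docs.

```python
def choose_lipsync_engine(
    model_mode: str,
    duration_s: float,
    threshold_s: float = 120.0,
    musetalk_available: bool = True,
) -> str:
    """Return "musetalk" or "latentsync" for one generation request."""
    if model_mode == "advanced":
        return "latentsync"  # forced, regardless of duration
    if model_mode == "fast":
        wants_musetalk = True  # forced, regardless of duration
    elif model_mode == "default":
        wants_musetalk = duration_s >= threshold_s  # threshold routing
    else:
        raise ValueError(f"unknown model_mode: {model_mode!r}")
    # MuseTalk falls back to LatentSync when it is unavailable
    return "musetalk" if wants_musetalk and musetalk_available else "latentsync"
```

Keeping the decision free of I/O makes each row of the options table easy to check in a unit test.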
---
### 7. 首页/发布页统一下拉交互SelectPopover
#### 7a. 统一改造范围
首页与发布页的业务选择项统一迁移到 `SelectPopover`
- 首页音色、参考音频、配音列表、素材选择、BGM 选择、作品选择、标题显示模式、标题/副标题/字幕样式、时间轴画面比例、唇形模型
- 发布页:选择发布作品(搜索 + 预览)
例外:`ScriptEditor` 的“历史文案 / AI多语言”按产品要求恢复为原有轻量菜单不强制统一。
#### 7b. 关键交互修复
- **遮挡修复**:桌面端面板改为 `Portal + fixed`,脱离局部 stacking context彻底解决被卡片遮挡
- **上拉/下拉自适应**:底部空间不足时自动上拉,避免菜单显示不全
- **同宽展示**:面板宽度与触发器保持一致
- **风格统一**:面板背景加实(高不透明度),滚动条隐藏但可滚动
- **已选定位**:再次打开下拉时自动滚动到已选项(`data-popover-selected="true"`
- **预览协同**
- 下拉内点“预览”不强制关闭,支持连续预览
- 视频预览弹窗层级高于下拉,避免被遮挡
- 预览弹窗打开时,下拉不会因外部点击/Esc被误关闭关闭预览后仍可继续操作
#### 7c. BGM 面板收敛
- BGM 改为与“发布作品”同款选择器(搜索 + 列表 + 试听 + 选中态)
- 按产品要求移除首页 BGM 音量滑杆
- 生成请求统一使用固定 `bgm_volume=0.2`
---
## 📁 总修改文件清单
| 文件 | 改动 |
|------|------|
| `remotion/render.ts` | bundle 缓存使用时硬链接视频+字体到 public 目录 |
| `models/LatentSync/latentsync/utils/util.py` | `read_video` 检测 FPS25fps 时跳过重编码 |
| `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py` | final mux `-c:v copy`;无脸帧容错 |
| `backend/app/services/video_service.py` | CRF 23→18`concat_videos` copy`compose()` 异步化 + 循环 CRF 18 |
| `backend/app/modules/videos/workflow.py` | 线程池化;同分辨率跳过 scalecompose 跳过;片段校验;模型选择透传 |
| `backend/app/modules/videos/schemas.py` | 新增 `lipsync_model` 字段 |
| `backend/app/services/lipsync_service.py` | `generate()` 新增 `model_mode` 三路分支路由 |
| `models/MuseTalk/scripts/server.py` | FFmpeg rawvideo 管道;参数环境变量化 |
| `backend/.env` | 新增 MuseTalk 质量优先参数 |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | 模型下拉 UI |
| `frontend/src/features/home/ui/HomePage.tsx` | 模型状态透传 |
| `frontend/src/features/home/model/useHomeController.ts` | `lipsyncModelMode` state + payload |
| `frontend/src/features/home/model/useHomePersistence.ts` | 模型选择持久化 |
| 文件 | 改动 |
|------|------|
| `remotion/render.ts` | bundle 缓存使用时硬链接视频+字体到 public 目录 |
| `models/LatentSync/latentsync/utils/util.py` | `read_video` 检测 FPS25fps 时跳过重编码 |
| `models/LatentSync/latentsync/pipelines/lipsync_pipeline.py` | final mux `-c:v copy`;无脸帧容错 |
| `backend/app/services/video_service.py` | CRF 23→18`concat_videos` copy`compose()` 异步化 + 循环 CRF 18 |
| `backend/app/modules/videos/workflow.py` | 线程池化;同分辨率跳过 scalecompose 跳过;片段校验;模型选择透传 |
| `backend/app/modules/videos/schemas.py` | 新增 `lipsync_model` 字段 |
| `backend/app/services/lipsync_service.py` | `generate()` 新增 `model_mode` 三路分支路由 |
| `models/MuseTalk/scripts/server.py` | FFmpeg rawvideo 管道;参数环境变量化 |
| `backend/.env` | MuseTalk 推理/融合/编码参数可配;路由阈值与质量档调优 |
| `frontend/src/shared/ui/SelectPopover.tsx` | 新增统一选择器Portal+fixed、防遮挡、上拉/下拉自适应、同宽、隐藏滚动条、已选定位、预览协同 |
| `frontend/src/features/home/ui/HomePage.tsx` | 配音卡层级修复;传递统一下拉状态 |
| `frontend/src/features/home/model/useHomeController.ts` | `lipsyncModelMode` 透传BGM 固定 `bgm_volume=0.2` |
| `frontend/src/features/home/model/useHomePersistence.ts` | 模型模式等新增字段持久化 |
| `frontend/src/features/home/ui/GenerateActionBar.tsx` | 模型选择改为 SelectPopover速度/质量语义文案) |
| `frontend/src/features/home/ui/VoiceSelector.tsx` | 音色选择统一为 SelectPopover音色名+语言) |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 参考音频选择统一为 SelectPopover含试听/重命名/删除/重识别) |
| `frontend/src/features/home/ui/GeneratedAudiosPanel.tsx` | 配音列表、语速、语气统一为 SelectPopover |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | 素材选择改为发布页同款下拉(搜索/多选/预览/重命名/删除) |
| `frontend/src/features/home/ui/BgmPanel.tsx` | BGM 选择改为发布页同款下拉(搜索+试听),移除音量滑杆 |
| `frontend/src/features/home/ui/HistoryList.tsx` | 首页作品选择改为下拉(搜索+删除+选中态) |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 标题显示模式与样式选择统一为 SelectPopover |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | 画面比例选择统一为 SelectPopover单行按钮 |
| `frontend/src/features/publish/ui/PublishPage.tsx` | 发布作品选择改为 SelectPopover预览时下拉保持打开 |
| `frontend/src/components/VideoPreviewModal.tsx` | 提升层级并添加预览标记,与下拉联动 |
| `frontend/src/features/home/ui/ScriptEditor.tsx` | 历史文案/AI多语言恢复原轻量菜单产品例外 |
| `Docs/FRONTEND_DEV.md` | 新增 SelectPopover 规范、预览层级规范、持久化字段修订 |
---
@@ -358,6 +394,12 @@ needs_audio_compose = str(final_audio_path) != str(audio_path)
13. **compose 循环 CRF**: 循环场景编码应为 CRF 18非 23
14. **模型选择 UI**: 生成按钮右侧应出现默认模型/快速模型/高级模型下拉
15. **模型选择持久化**: 切换模型后刷新页面,下拉应恢复上次选择
16. **快速模型路由**: 选择"快速模型"时,后端日志应出现 `强制快速模型MuseTalk`
17. **高级模型路由**: 选择"高级模型"时,后端日志应出现 `强制高级模型LatentSync`
18. **默认模型不变**: 选择"默认模型"时行为与改动前完全一致(阈值路由)
16. **快速模型路由**: 选择"快速模型"时,后端日志应出现 `强制快速模型MuseTalk`
17. **高级模型路由**: 选择"高级模型"时,后端日志应出现 `强制高级模型LatentSync`
18. **默认模型不变**: 选择"默认模型"时行为与改动前完全一致(阈值路由)
19. **统一下拉样式**: 首页/发布页业务选择项均为同款 SelectPopover触发器 + 面板 + 选中态)
20. **上拉自适应**: 页面底部打开下拉时应自动上拉,不出现被截断
21. **已选定位**: 任意下拉再次打开时应自动定位到已选项,而非列表顶端
22. **预览层级**: 视频预览弹窗应始终覆盖在下拉之上,不被菜单遮挡
23. **连续预览**: 下拉内点击预览后菜单保持打开,关闭预览后可继续点击其他预览项
24. **BGM 行为**: 首页 BGM 不再显示音量滑杆,生成请求固定 `bgm_volume=0.2`

Docs/DevLogs/Day31.md (new file, 526 lines)

@@ -0,0 +1,526 @@
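## Documentation Layering Cleanup + Voice Preview Fix + Recording Dialog Rework + Unified Dialog System (Day 31)
### Overview
Today's work focused on four things:
1. Cleaning up and consolidating the root-level docs (README/DEV responsibility boundaries, archiving historical content, aligning parameter descriptions with the code)
2. Shipping "one-click preview" for the EdgeTTS voice list and fixing preview failures in the browser
3. Reworking the voice-clone recording interaction: the recording entry moves to the bottom right of the reference audio area, and the flow becomes a dialog
4. Extracting a unified dialog base, `AppModal`, and migrating the main dialogs to the same visual and interaction conventions
---
## ✅ 1) Documentation System and Content Consistency
### 1.1 Clear README / DEV Boundary
- Added a "Document Scope" section to `FRONTEND_README.md`, `BACKEND_README.md`, `FRONTEND_DEV.md`, and `BACKEND_DEV.md`
- README keeps only stable descriptions (features, APIs, running); DEV keeps conventions (constraints, layering, checklist)
- Log-style content in the READMEs (such as Day annotations) was rewritten as stable statements
### 1.2 Deployment and Parameter Docs Aligned with Current Code
- Lip-sync routing threshold wording is now uniformly threshold-driven, using the current `.env` example value `100` as the reference
- Corrected outdated encoding descriptions (MuseTalk compositing is now described as a rawvideo pipe + `libx264`)
- Fixed references to a non-existent `.env.example`; the docs now describe `backend/.env`
- Marked the Qwen3-TTS doc as "historical archive (discontinued)" and pointed it to CosyVoice 3.0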
---
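## ✅ 2) Voice Preview Delivery and Bug Fix
### 2.1 Implementation
- Voice dropdown items gained a preview button (play / pause / loading states)
- New backend preview endpoint: `/api/videos/voice-preview`
- Preview text is chosen automatically from fixed sample scripts by voice locale (9 languages + Chinese fallback)
### 2.2 Compatibility and Stability Adjustments
- Kept `POST /api/videos/voice-preview` (for compatibility)
- Added `GET /api/videos/voice-preview?voice=...`; the frontend now plays the GET audio stream directly, which reduces interference from browser autoplay policies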
```python
@router.get("/voice-preview")
async def preview_voice_get(voice: str, current_user: dict = Depends(get_current_user)):
    voice_value = voice.strip()
    if not voice_value:
        # detail text means "voice must not be empty"
        raise HTTPException(status_code=400, detail="voice 不能为空")
    text = _get_preview_text_for_voice(voice_value)
    return await _render_voice_preview(voice=voice_value, text=text)
```
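### 2.3 Production Issue Conclusion (fixed)
- Symptom: preview requests from the browser returned 404
- Root cause: after the GET route was added, the backend process was not restarted, so the running code was still the old version
- Fix: the route took effect after `pm2 restart vigent2-backend`
- Note: a 401 from `curl` (no auth cookie) is expected; same-origin browser requests carry the cookie automatically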
---
## ✅ 3) 录音交互重构(声音克隆)
### 3.1 入口重排
- 去掉参考音频面板内的独立录音大块区域
- 将「上传音频 / 录音」入口放到「我的参考音频」区域底部右侧
### 3.2 录音流程改为弹窗
- 录音弹窗支持:开始录音 / 停止录音 / 状态计时 / 试听
- 保留并强化「使用此录音」和「弃用本次录音」
- 关闭弹窗时若仍在录音,会先停止录音再关闭
- 修正弹窗挂载位置:从局部组件渲染改为 `AppModal` Portal 到 `document.body`,确保是全页面弹窗体验
- 参考音频区按钮文案更新:`录音` -> `在线录音`
### 3.4 文案区按钮视觉统一
- 统一「文案提取与编辑」区按钮尺寸与圆角(`px-3 py-1.5 text-xs rounded-lg`
-`AI智能改写``保存文案` 按钮改为与上传/在线录音同等级的视觉规格
- 同步统一图标尺寸与禁用态样式,消除“底部按钮偏小”问题
### 3.5 录音试听条 UI 美化
- 将录音完成后的原生白色 `<audio controls>` 替换为项目深色风格的自定义试听条
- 新试听条包含:播放/暂停按钮、进度拖拽、当前时长/总时长显示
- 统一配色到当前页面(深色底 + 绿色强调),避免与整体 UI 风格割裂
### 3.6 录音上传关闭时机优化
- 原逻辑:点击「使用此录音」后,需等待上传+识别完成才关闭弹窗(体感卡顿)
- 新逻辑:点击后立即关闭弹窗,上传/识别在后台继续进行
- 状态反馈仍在参考音频区域显示(上传识别中的提示 + 失败错误提示)
---
## ✅ 5) 发布管理抖音登录「无法获取二维码」修复
### 问题定位
- 现象:发布管理中点击抖音登录,前端提示无法获取二维码
- 后端日志显示根因:
- `Page.goto: Timeout 30000ms exceeded`
- 导航目标:`https://creator.douyin.com/`
- 等待条件:`wait_until="networkidle"`
### 修复方案
- 抖音登录页改为与微信一致的更稳策略:`wait_until="domcontentloaded"`
- 对抖音导航超时增加容错:即使 `goto` 超时,也继续执行二维码提取流程(避免长连接导致误失败)
### 验证
- 本地接口冒烟:`POST /api/publish/login/douyin` 返回 `success=true` 且包含 `qr_code`
- 已重启后端进程使修复生效:`pm2 restart vigent2-backend`
### 3.3 状态逻辑补齐
- 新增 `discardRecording()`:清空本次录音与计时
- 开始新录音前先清空旧录音,避免旧状态残留
---
## ✅ 4) 弹窗 UI/UX 统一AppModal
新增统一弹窗基座:`frontend/src/shared/ui/AppModal.tsx`
- 统一遮罩:`bg-black/80 + backdrop-blur-sm`
- 统一容器:深色半透明背景、`border-white/10``rounded-2xl`、重阴影
- 统一 Header标题/副标题/关闭按钮
- 统一行为ESC 关闭、背景滚动锁定、按需控制 overlay 点击关闭
- 统一挂载:通过 Portal 渲染到 `document.body`,避免出现“看起来只在配音区弹出”的层叠问题
- 统一可访问性:补齐 `role="dialog"` + `aria-modal="true"`
- 统一焦点管理:打开弹窗自动聚焦,关闭后恢复到打开前焦点元素
- 统一滚动锁计数:支持多弹窗并存,避免一个弹窗关闭后提前恢复页面滚动
已迁移弹窗:
- 视频预览(`VideoPreviewModal`
- 文案提取(`ScriptExtractionModal`
- AI 改写(`RewriteModal`
- 截取设置(`ClipTrimmer`
- 录音弹窗(`RefAudioPanel` 内)
- 修改密码弹窗(`AccountSettingsDropdown`
- 发布管理扫码登录弹窗(`PublishPage` 内 QR 登录弹窗)
---
## ✅ 6) 微信视频号登录二维码观感优化(“能扫但像被截断”)
### 问题现象
- 微信视频号登录二维码可扫码成功,但视觉上像“边缘不完整/被切掉”,观感不佳
### 修复方案
- 后端二维码提取策略增强(`qr_login_service.py`
- 优先导出二维码原始 PNG 数据(`canvas.toDataURL('image/png')` / `img[data:image/png]`),减少二次截图导致的边缘损失
- 微信回退截图时改为“按二维码 bbox 外扩留白裁剪”,避免贴边截取带来的不完整感
- 仅接受 PNG Data URL避免把非 PNG如 SVG 片段)直接当二维码返回造成边角异常
- 前端扫码弹窗展示优化(`PublishPage.tsx`
- 取消二维码图片本体圆角裁切,改为外层白底容器 + 内边距(模拟 quiet zone
- 同步调整二维码显示宽度与边框,提升完整感与观感一致性
### 验证
- 本地接口冒烟:`POST /api/publish/login/weixin` 返回 `success=true` 且包含 `qr_code`
- 解码后图片尺寸为 `1000x1000`,扫码仍正常
- 前后端进程已重启使修复生效:
- `pm2 restart vigent2-frontend`
- `pm2 restart vigent2-backend`
---
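## ✅ 7) Publish Flow Performance and Log Readability (dual-platform publishing)
### 7.1 Concurrent Publish Requests (frontend)
- Before: the publish page ran platforms serially with `for...of await`, so total time for multiple platforms was the sum of each platform's time
- After: bounded concurrent execution (concurrency = 2) lets two platforms publish in parallel, which noticeably shortens total wait time
- The result list is still filled in the order the user selected the platforms, so concurrent returns do not shuffle the order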
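The frontend change itself is TypeScript and is not shown here. As a language-neutral illustration of the same idea (bounded concurrency with results kept in selection order), here is a Python `asyncio` sketch; `publish_all` and the injected `publish_one` coroutine are hypothetical names, and the per-platform error dict is an assumed shape.

```python
import asyncio
from typing import Awaitable, Callable


async def publish_all(
    platforms: list[str],
    publish_one: Callable[[str], Awaitable[dict]],
    limit: int = 2,
) -> list[dict]:
    """Publish to every platform with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(platform: str) -> dict:
        async with semaphore:
            try:
                return await publish_one(platform)
            except Exception as exc:  # one failure must not cancel the others
                return {"platform": platform, "success": False, "error": str(exc)}

    # gather() returns results in argument order, not completion order
    return await asyncio.gather(*(guarded(p) for p in platforms))
```

Because `gather()` preserves argument order, no extra bookkeeping is needed to put results back in the order the user selected.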
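### 7.2 WeChat Upload Log Noise Reduction (backend)
- Before: if `input.files[0]` could not be read immediately after `set_input_files`, a warning was logged right away (`[weixin][file_input] empty`)
- After: first poll to confirm whether the page has entered the uploading state, then decide whether to warn; non-final retries log at info level and only the last retry logs a warning
- Effect: fewer false alarms (no more warning spam when the upload has actually started) and cleaner troubleshooting logs
### Verification
- `python -m py_compile backend/app/services/uploader/weixin_uploader.py`
- `npm run build` (frontend)
- Service restart: `pm2 restart vigent2-frontend && pm2 restart vigent2-backend`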
---
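## ✅ 8) Xiaohongshu publish pipeline alignment (launch mode / cookie format / success screenshot)
### 8.1 Launch mode and anti-detection parameters
- Added Xiaohongshu Playwright settings to `config.py`:
  - `XIAOHONGSHU_HEADLESS_MODE` (default `headless-new`)
  - `XIAOHONGSHU_USER_AGENT / LOCALE / TIMEZONE_ID`
  - `XIAOHONGSHU_CHROME_PATH / BROWSER_CHANNEL`
  - `XIAOHONGSHU_FORCE_SWIFTSHADER / DEBUG_ARTIFACTS`
- `xiaohongshu_uploader.py` now uses the same configurable launch strategy as Douyin/WeChat and keeps the basic anti-detection flag (`--disable-blink-features=AutomationControlled`).
### 8.2 Xiaohongshu uploader refactor
- Rewrote the main uploader flow, following the Douyin/WeChat pattern:
  - Multiple selector fallbacks for the upload entry and file input
  - Polling for uploading / success / failure states
  - Fault-tolerant filling of title, body, and topics
  - Multiple selectors and a clickability check for the publish button
- Publish success detection went from "URL only" to a combination of signals:
  - URL navigation
  - Success/failure text on the page
  - Listening to publish API responses (`publish` / `note create` style endpoints)
- After a successful publish, a screenshot is taken and `screenshot_url` is returned (same path format as Douyin/WeChat):
  - `/api/publish/screenshot/{filename}`
### 8.3 Unified cookie storage format
- Changes to `publish_service.save_cookie_string()`:
  - `bilibili` keeps the existing simplified cookie dict (compatible with the current upload library).
  - All non-`bilibili` platforms are saved as a Playwright `storage_state`:
    - `{"cookies": [...], "origins": []}`
- Added default domains per platform (Douyin/WeChat/Xiaohongshu), so the cookie file can be passed directly to `browser.new_context(storage_state=...)`.
### 8.4 Verification and rollout
- `python -m py_compile backend/app/core/config.py backend/app/services/publish_service.py backend/app/services/uploader/xiaohongshu_uploader.py` ✅
- `pm2 restart vigent2-backend` ✅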
---
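## ✅ 9) Xiaohongshu login QR code fix (page defaults to SMS login and must be switched first)
### Symptom
- The Xiaohongshu creator platform `https://creator.xiaohongshu.com/` opens on the SMS login view by default.
- The QR code only appears after clicking the switch icon in the top-right corner, so the backend failed when it looked for the QR selector directly.
### Fix (`qr_login_service.py`)
- Added `_ensure_xiaohongshu_qr_mode()`:
  - Detect whether the page is in SMS login mode (`input[placeholder*='手机号']`).
  - Click the switch icon in the top-right of the login card automatically (stable selectors first, geometric position as a fallback).
  - Wait for the QR code to render after switching, then continue with extraction.
- Extended the Xiaohongshu QR selector set:
  - Added selectors for the QR image inside the login card (covering the current page structure).
  - Kept the generic `img[src*='qr'/'qrcode']` fallback.
- Raised the candidate filter threshold for Xiaohongshu (`min_side=120`) to avoid picking the small switch icon in the top-right corner.
- Added Xiaohongshu keywords to the text strategy (for example `APP扫一扫登录`).
### Verification
- Local API smoke test: `POST /api/publish/login/xiaohongshu` returns `success=true` with a non-empty `qr_code`.
- Backend logs confirm the fixed path is taken:
  - `已点击登录方式切换，等待二维码渲染`
  - `策略1(CSS): 匹配成功`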
---
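## ✅ 10) Xiaohongshu upload-stage fix (the "publish note: upload video" step)
### Symptom
- Xiaohongshu publishing failed at the "upload video" step: the page stayed on the publish page and the frontend reported a publish failure.
- Backend logs showed `set_input_files` succeeded, but no upload state was detected shortly afterwards, so the upload was triggered repeatedly and misjudged as failed.
- Further investigation showed the uploaded file was a local Supabase object file with no extension. The log showed an empty `file_input type=`, so the platform may not have recognized the video MIME type.
### Fix (`xiaohongshu_uploader.py`)
- Added an upload start detection window, `UPLOAD_SIGNAL_TIMEOUT=12s`:
  - Gives the upload state time to appear after `set_input_files` succeeds.
  - As soon as a signal such as "uploading / processing / transcoding" appears, the flow moves on to upload polling.
  - If no clear signal appears inside the window, the upload is no longer failed immediately; the flow moves on to the main upload monitoring stage and keeps waiting.
- Corrected the failure keywords:
  - Removed `重新上传` ("re-upload") from the failure keywords. On Xiaohongshu this text often appears as a normal state or action entry and cannot be treated as a failure by itself.
- Added upload diagnostics logging:
  - Logs the name, size, and type of the file selected in `file_input`, to confirm the file was really injected into the input.
  - Logs an explicit warning when an upload failure is matched, to speed up production troubleshooting.
- Added a fallback for video files with no extension:
  - If the original file has no extension and its parent directory name has one (for example `xxx.mp4/<uuid>`), a temporary file with that name is created under `/tmp/vigent_uploads` (hard link / symlink / copy fallback).
  - The upload uses the temporary file with the extension, which makes the site's MIME detection more reliable.
  - The temporary upload file is removed automatically when the task ends.
### 10.1 Second investigation and hardening (after the hang reproduced)
- Reproduction logs showed that even with a temporary path carrying an extension, `file_input` still reported a filename without one, and the flow stayed on `等待上传状态...` for a long time.
- Root cause confirmed: in cross-device cases the code fell back to `symlink`, and the browser picked up the original target's filename (no extension), so the site failed to recognize the file.
- Hardening:
  - Removed the `symlink` fallback and kept only `hardlink -> copy`, so the uploaded filename always ends with `.mp4`.
  - Added a suffix consistency check on the `file_input` filename: if it does not match the expected suffix, retry immediately and return early on the final failure instead of waiting pointlessly.
  - Added an upload idle timeout (`UPLOAD_IDLE_TIMEOUT=90s`): if no valid upload signal appears for a long time, fail early and keep a debug screenshot, so the frontend does not look frozen.
  - Changed the failure message to "未能触发有效视频上传，请确认发布页状态及视频文件格式" (no valid video upload was triggered; check the publish page state and the video file format).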
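
The `hardlink -> copy` fallback described in 10.1 can be sketched roughly as below. This is a minimal illustration, not the project's `_prepare_upload_file()`: the function names, the per-call unique subdirectory, and the cleanup helper are all assumptions made for the example.

```python
import os
import shutil
import uuid
from pathlib import Path


def prepare_upload_file(src: Path, tmp_root: Path) -> Path:
    """Return a path whose filename carries a video suffix (hypothetical helper).

    If `src` has no suffix but its parent directory does (e.g. xxx.mp4/<uuid>),
    create a temporary file named after the parent directory.
    """
    if src.suffix:
        return src  # already has an extension, nothing to do
    parent_name = src.parent.name
    if not Path(parent_name).suffix:
        return src  # no suffix to borrow, leave the file as-is
    target_dir = tmp_root / uuid.uuid4().hex  # unique dir avoids name clashes
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / parent_name
    try:
        os.link(src, target)  # hard link: no data copied, real filename
    except OSError:
        shutil.copy2(src, target)  # e.g. cross-device link, fall back to a copy
    return target


def cleanup_upload_file(path: Path, tmp_root: Path) -> None:
    """Remove a temporary upload file, but only if it lives under tmp_root."""
    if tmp_root in path.parents:
        shutil.rmtree(path.parent, ignore_errors=True)
```

No symlink is used anywhere, which matches the finding above that a symlink can expose the original extension-less filename to the browser.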
### 10.2 Live publish verification (after the fix)
- Re-ran `POST /api/publish` (Xiaohongshu). The backend completed upload and publish end to end, and the API returned `200`.
- This run took about `45.77s`, which is normal for the upload and publish waiting window.
- The success screenshot is reachable: `GET /api/publish/screenshot/xiaohongshu_success_20260303_115944_633.png` returns `200`.
- Key log sequence: `正在上传` -> `已设置上传文件` -> `等待发布结果` -> `Cookie 更新完毕`.
### Verification
- `python -m py_compile backend/app/services/uploader/xiaohongshu_uploader.py` ✅
- `pm2 restart vigent2-backend` ✅
- `curl http://127.0.0.1:8006/health` returns `{"status":"ok"}` ✅
---
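## ✅ 11) Home page `AI生成标题标签` (AI title/tag generation) button moved to "四、标题与字幕" (Section 4, Title & Subtitles)
### Design decisions
- Move `AI生成标题标签` from "一、文案提取与编辑" (Section 1, Script Extraction & Editing) to "四、标题与字幕".
- The section header becomes two rows:
  - Row 1: the `四、标题与字幕` heading, with `AI生成标题标签` on the right
  - Row 2: right-aligned `标题短暂显示/常驻显示` (short/persistent title display) + `预览样式` (preview style)
- Display semantics: `标题短暂显示/常驻显示` applies to both the main title and the secondary title (persistent means both stay on screen).
- No extra hint text is added, to keep the UI clean.
- `AI生成标题标签` matches the corner radius and size of the `在线录音` (online recording) button (`rounded-lg` + the same button size) and keeps its original blue gradient.
### Result
- Title-related actions now sit in one section, so users no longer jump between Section 1 and Section 4.
- The row hierarchy is clearer: the AI action sits on the heading row, settings and preview on the next row.
- The AI button has softer corners and size while keeping the original blue gradient, so it looks more consistent.
### Verification
- `npm run build` (frontend) ✅
- `pm2 restart vigent2-frontend` ✅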
---
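## ✅ 12) Expand corner button on the script editor (opens a large editor)
### Design and implementation
- Added a corner button at the bottom-right of the main input in "一、文案提取与编辑"; clicking it opens the expanded editor.
- The expanded editor uses `AppModal` and offers more editing space (about `66vh` tall).
- The main input and the modal input share the same `text` state and stay in sync in both directions.
- To keep the corner button from covering text, the main input gets extra bottom-right padding (`pr-6 pb-6`).
- The corner button is minimal: only a double-arrow icon, with no outer frame, placed close to the input's edge.
- Its position was nudged up and to the right for a better fit: `right-0.5 bottom-2`, with a fixed `h-5 w-5` click area.
- Fixed focus loss in the expanded editor: `AppModal` now handles ESC through `onCloseRef`, so a parent re-render no longer causes an effect cleanup that blurs the textarea.
- Removed the purple focus border on the expanded editor input and replaced it with a neutral highlight (`focus:border-white/25`).
### Verification
- `npm run build` (frontend) ✅
- `pm2 restart vigent2-frontend` ✅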
---
## ✅ 13) Site icon replacement (using `Temp/video.png`)
### Changes
- Converted the supplied `Temp/video.png` and used it as the site icon.
- Added `frontend/src/app/icon.png` (Next App Router icon resource).
- Updated `frontend/src/app/favicon.ico` (multi-size: 16/32/48/64).
### Verification
- `npm run build` (frontend) ✅
- Build output includes the `/icon.png` route ✅
- `pm2 restart vigent2-frontend` ✅
---
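## ✅ 14) Post-publish workspace cleanup hardening (CleanupContext + `/api/videos/cleanup`)
### 14.1 Feature rollout
- The publish page gains a "cleanup prompt after all platforms succeed" flow:
  - All platforms succeed: `CleanupModal` is shown.
  - Any platform fails: the existing inline result display is kept.
- `CleanupModal` can show the list of successful platforms, success screenshots, a video backup download, and one-click cleanup.
- The cleanup state `cleanup_pending` is persisted to localStorage and restored after a refresh or navigation.
### 14.2 Stability and lock-out prevention
- Backend delete functions now raise exceptions instead of swallowing them, so the frontend cannot mistake a failed cleanup for a success.
- The cleanup API uses strict success semantics:
  - It returns success only when both videos and voice-overs are deleted.
  - Any delete failure returns an error; the frontend keeps the modal open and allows a retry.
- The frontend cleanup runs "backend first, local second":
  - Backend fails: local state is not cleared and the modal stays open.
  - Backend succeeds: local input fields are cleared and the modal closes.
- After a successful backend cleanup, the frontend dispatches a `vigent:workspace-cleared` event, and the publish page resets its title/tag inputs in place (no manual refresh needed).
- After a threshold of consecutive failures (3), a "暂不清理，继续使用" (skip cleanup and continue) option appears, so a broken environment cannot block the user forever.
- The cleanup modal expires after 24h, so stale state does not carry over to the next day.
- Cleanup state resets on user switch or logout, so one account's state never leaks into another.
### 14.3 Cleanup scope
- Only input content fields are cleared:
  - Home page: script / title / secondary title
  - Publish page: title / tags
- User preference fields are kept (style, font size, margins, model, BGM, and so on).
### Verification
- `python -m py_compile backend/app/services/storage.py backend/app/modules/videos/service.py backend/app/modules/generated_audios/service.py backend/app/modules/videos/router.py` ✅
- `npm run build` (frontend) ✅
- `pm2 restart vigent2-backend && pm2 restart vigent2-frontend` ✅
- `curl http://127.0.0.1:8006/health` returns `{"status":"ok"}` ✅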
---
## 📁 今日主要修改文件
| 文件 | 改动 |
|------|------|
| `backend/app/modules/videos/router.py` | 新增/增强 `voice-preview` GET+POST试听文本 locale 路由,临时文件清理;新增 `POST /api/videos/cleanup` 严格成功语义 |
| `backend/app/modules/videos/service.py` | 新增批量删除生成视频能力;返回 `(deleted, failed)` 供 cleanup 路由判定 |
| `backend/app/modules/generated_audios/service.py` | 新增批量删除预生成配音能力;返回 `(deleted, failed)` 供 cleanup 路由判定 |
| `backend/app/services/storage.py` | `delete_file()` 改为异常上抛,避免删除失败静默吞错造成“假成功” |
| `backend/app/modules/videos/schemas.py` | 新增 `VoicePreviewRequest` |
| `frontend/src/features/home/ui/VoiceSelector.tsx` | 音色下拉增加试听按钮,改为 GET 音频流播放 |
| `frontend/src/features/home/model/useHomeController.ts` | 录音状态重置、`discardRecording` |
| `frontend/src/features/home/ui/HomePage.tsx` | 透传录音弃用动作;将 `AI生成标题标签` 事件改为传入 `TitleSubtitlePanel` |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 上传/录音入口重排;录音改弹窗;使用/弃用流程 |
| `frontend/src/features/home/ui/ScriptEditor.tsx` | 文案编辑区按钮视觉统一;移除 `AI生成标题标签`(职责回归标题板块);新增输入框右下角扩展角标与大编辑弹窗;角标改为双箭头极简贴边样式并微调到 `right-0.5 bottom-2`;输入框去除紫色聚焦边框 |
| `frontend/src/features/home/ui/TitleSubtitlePanel.tsx` | 标题区改为“首行标题+AI、次行右对齐设置+预览”AI按钮外观对齐在线录音按钮软圆角 |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 录音完成试听条改为自定义深色播放器(替换原生白色控制条) |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 使用录音后弹窗立即关闭,上传识别后台进行(提升交互流畅度) |
| `frontend/src/features/publish/model/usePublishController.ts` | 发布改为受限并发(并发度=2全平台发布成功时触发 `triggerCleanup()`,失败保持内联结果;监听 `workspace-cleared` 事件就地清空发布输入态 |
| `frontend/src/shared/contexts/CleanupContext.tsx` | 新增发布后清理弹窗与持久化状态;失败不关闭/不清本地、3 次失败可跳过、24h 过期、用户切换复位;清理范围收敛为输入内容字段;成功清理后派发 `workspace-cleared` 事件 |
| `frontend/src/app/layout.tsx` | 在 `TaskProvider` 内挂载 `CleanupProvider`,确保全局可触发发布后清理弹窗 |
| `backend/app/core/config.py` | 新增小红书 Playwright 配置headless/UA/locale/timezone/chrome/debug |
| `backend/app/services/uploader/xiaohongshu_uploader.py` | 按抖音/微信模式重构补充上传启动容错窗口、无后缀文件兜底hardlink/copy、后缀一致性校验、空转超时保护与上传诊断日志 |
| `backend/app/services/publish_service.py` | `save_cookie_string` 非 bilibili 统一存储为 Playwright `storage_state`;小红书 uploader 透传 `user_id` |
| `backend/app/services/qr_login_service.py` | 抖音导航超时容错 + 微信二维码提取增强 + 小红书登录自动切换到扫码模式并提取二维码 |
| `backend/app/services/uploader/weixin_uploader.py` | `file_input empty` 告警策略优化:先检测上传信号,非最后一次重试降级为 info |
| `frontend/src/shared/ui/AppModal.tsx` | 统一弹窗组件 + 无障碍语义 + 焦点管理 + 多弹窗滚动锁计数;新增 `onCloseRef` 避免回调引用变化引发的意外失焦 |
| `frontend/src/components/VideoPreviewModal.tsx` | 迁移到 `AppModal` |
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | 迁移到 `AppModal` |
| `frontend/src/features/home/ui/RewriteModal.tsx` | 迁移到 `AppModal` |
| `frontend/src/features/home/ui/ClipTrimmer.tsx` | 迁移到 `AppModal` |
| `frontend/src/components/AccountSettingsDropdown.tsx` | 修改密码弹窗迁移到 `AppModal` |
| `frontend/src/app/icon.png` | 新增站点 icon 资源(来自 `Temp/video.png` |
| `frontend/src/app/favicon.ico` | 替换站点 favicon`video.png` 转换为多尺寸 ico |
| `frontend/src/features/publish/ui/PublishPage.tsx` | 扫码登录QR弹窗迁移到 `AppModal` + 二维码白底留白容器优化(避免边缘观感被裁) |
| `Docs/FRONTEND_DEV.md` | 新增统一弹窗规范AppModal和录音交互规范补充文案扩展编辑也统一走 AppModal新增 CleanupContext 清理策略规范 |
| `Docs/FRONTEND_README.md` | 增补录音入口与弹窗交互说明;明确“标题常驻显示”对主/副标题同时生效;补充文案输入框扩展编辑器说明;补充发布后清理弹窗失败兜底说明 |
| `Docs/BACKEND_README.md` | 增补 `voice-preview` 接口说明;更新发布 API 路径(`/login/{platform}` 等)并链接发布专项文档;补充 `title_display_mode` 对主/副标题统一生效说明;新增 `/api/videos/cleanup` 接口说明 |
| `Docs/BACKEND_DEV.md` | 更新后端规范中的发布器覆盖范围与小红书配置项;补充发布专项文档指引;补充 `title_display_mode` 主/副标题统一生效约定;新增 cleanup 严格成功语义约定 |
| `Docs/PUBLISH_DEPLOY.md` | 新增多平台发布专项文档(登录实现、自动化发布流程、部署要点与排障);补充“发布成功后清理联动”说明 |
| `Docs/DEPLOY_MANUAL.md` | 部署参数与扫码说明补充小红书要点;新增发布专项文档入口 |
| `README.md` | 文档中心新增 `PUBLISH_DEPLOY.md` 入口;发布结果可视化描述补齐小红书;补充发布成功后工作区清理引导说明 |
| `Docs/TASK_COMPLETE.md` | 新增 Day31 任务汇总,更新 Current 标签与更新时间;补充发布后清理链路加固条目 |
| `Docs/DOC_RULES.md` | 增补“发布相关三检”(路由真值/专项文档/入口回写)、敏感信息处理规范,更新工具规范为 `Read/Grep/apply_patch`,并对齐 TASK_COMPLETE 检查清单 |
| `Docs/SUBTITLE_DEPLOY.md` | 与当前阈值/参数说明对齐 |
| `Docs/LATENTSYNC_DEPLOY.md` | 与当前阈值/参数说明对齐 |
| `Docs/COSYVOICE3_DEPLOY.md` | TTS 部署说明与当前运行路径对齐 |
| `Docs/QWEN3_TTS_DEPLOY.md` | 标注为历史归档并指向 CosyVoice 3.0 |
---
## 🔍 验证记录
- `python -m py_compile backend/app/modules/videos/router.py backend/app/modules/videos/schemas.py`
- `python -m py_compile backend/app/services/qr_login_service.py`
- `python -m py_compile backend/app/services/uploader/weixin_uploader.py`
- `python -m py_compile backend/app/core/config.py backend/app/services/publish_service.py backend/app/services/uploader/xiaohongshu_uploader.py`
- `POST /api/publish/login/xiaohongshu` 冒烟返回 `success=true` + `qr_code`
- `python -m py_compile backend/app/services/uploader/xiaohongshu_uploader.py`(上传阶段修复后)✅
- `pm2 restart vigent2-backend`(上传阶段修复后)✅
- `curl http://127.0.0.1:8006/health` 返回 `{"status":"ok"}`
- `backend/venv/bin/python` 本地探针验证 `_prepare_upload_file()`:临时文件非 symlink、后缀 `.mp4`、清理成功 ✅
- 小红书发布实测:`POST /api/publish` 返回 `200``Duration: 45.77s`)且成功截图接口返回 `200`
- 新增 `Docs/PUBLISH_DEPLOY.md`(抖音/微信/B站/小红书登录与发布实现说明)✅
- `npm run build`frontend
- 站点 icon 替换后构建通过,产物包含 `/icon.png` 路由 ✅
- `pm2 restart vigent2-frontend`icon 替换后)✅
- `python -m py_compile backend/app/services/storage.py backend/app/modules/videos/service.py backend/app/modules/generated_audios/service.py backend/app/modules/videos/router.py`cleanup 链路加固后)✅
- `npm run build`CleanupContext 优化后)✅
- `pm2 restart vigent2-backend && pm2 restart vigent2-frontend`cleanup 链路加固后)✅
- `curl http://127.0.0.1:8006/health` 返回 `{"status":"ok"}`cleanup 链路加固后)✅
- `POST /api/publish/login/weixin` 冒烟返回 `success=true` + `qr_code`
- `npx eslint` 定向检查以下文件通过:
- `VoiceSelector.tsx`
- `RefAudioPanel.tsx`
- `HomePage.tsx`
- `useHomeController.ts`
- `AppModal.tsx`
- `VideoPreviewModal.tsx`
- `ScriptExtractionModal.tsx`
- `RewriteModal.tsx`
- `AccountSettingsDropdown.tsx`
- `ClipTrimmer.tsx` 仍有仓库既有 lint 规则项(`react-hooks/set-state-in-effect`),与本次弹窗风格迁移无关
- 音色试听线上问题经后端重启后已恢复可用(浏览器同源携带 cookie
---
## ☑️ Day31 覆盖核对(今日新增补充)
已对照今天新增改动做二次核对,以下内容已写入本日志:
- `AppModal` 的可访问性与焦点/滚动锁稳健性增强
- 微信视频号二维码“观感不完整”问题的后端提取修复
- 发布页二维码展示样式优化(白底留白、去除本体圆角裁切)
- 小红书 uploader 对齐重构(启动参数、发布判定、成功截图)
- 小红书“上传阶段卡住”二次定位与加固(文件名后缀一致性 + 空转超时)并完成实测发布成功
- 形成发布专项文档 `Docs/PUBLISH_DEPLOY.md`,沉淀四平台登录与自动化发布实现
- 回写 `Docs/BACKEND_README.md` / `Docs/BACKEND_DEV.md` / `Docs/DEPLOY_MANUAL.md`,统一发布 API 与部署说明口径
- 回写 `Docs/FRONTEND_README.md` / `Docs/FRONTEND_DEV.md` / `Docs/PUBLISH_DEPLOY.md`,补齐发布后清理弹窗与 cleanup 接口联动说明
- 回写 `README.md`,补充发布专项文档入口与小红书发布成功截图能力描述
- 回写 `Docs/TASK_COMPLETE.md`,补齐 Day31 任务完成记录
- 回写 `Docs/DOC_RULES.md`,同步文档更新规则到当前文档结构与工具链
- 首页「AI生成标题标签」按钮迁移到「四、标题与字幕」并固定标题同层最右显示方式与预览下沉到下一行右侧
- 文案输入框右下角新增扩展角标,支持弹出大编辑器进行长文案编辑
- 站点 icon 已替换为 `Temp/video.png` 对应资源(`app/icon.png` + `app/favicon.ico`
- 发布后工作区清理链路落地CleanupModal + `/api/videos/cleanup`)并补齐失败兜底(失败不关弹窗、不清本地)
- 清理链路防锁死优化3 次失败可跳过、24h 过期、用户切换复位
- 文档补充:`标题短暂显示/常驻显示` 对主标题与副标题统一生效(常驻=主/副标题全程显示)
- 非 bilibili 平台 cookie 保存为 `storage_state` 格式
- 小红书登录二维码自动切换(短信登录 -> 扫码登录)与提取修复
- 对应构建/重启/冒烟验证记录
- 今日运行期产物(`backend/user_data/**/cookies/*.json``watchdog.log`)为会话副产物,不属于代码/文档变更项

159
Docs/DevLogs/Day32.md Normal file

@@ -0,0 +1,159 @@
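## Same-origin video download fix + first batch of security fixes (Day 32)
### Overview
Today's work covers four items:
1. Fix the download buttons on the home page and in the publish-success modal, which the browser treated as online playback (opening a new tab).
2. Split the work done after the download fix began out of `Day31` into `Day32`, so the logs stay cleanly organized by day.
3. Implement the first batch of 6 security fixes with no functional risk, based on the security audit report (`Temp/安全审计报告.md`).
4. Unify modal closing behavior (closing policy only): clicking the overlay closes a modal by default, while the publish-success cleanup modal stays mandatory.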
---
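## ✅ 1) Video download fix (avoid playback in a new tab)
### Symptom
- In some browsers, "下载视频" (download video) on the home page and "下载视频备份" (download video backup) in the publish-success modal opened a new tab and played the video instead of downloading it.
- The root cause is that with a cross-origin signed URL, the browser may ignore `<a download>`.
### Fix
- Added a same-origin download endpoint on the backend: `GET /api/videos/generated/{video_id}/download`
  - Returns the local video file with `FileResponse`.
  - Explicitly sets `Content-Disposition: attachment`.
  - The browser goes straight to its save-file flow.
- The publish-success modal now passes `videoId` for downloads and no longer depends on the signed URL.
- The home page preview download also uses the same-origin endpoint, so it behaves the same as the publish modal.
- Backward compatibility for old cleanup state: `CleanupContext` parses the `videoId` out of the old persisted `videoDownloadUrl` field and backfills it.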
---
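## ✅ 2) Supporting changes and log split
### Frontend wiring
- `CleanupContext` keeps the "on cleanup failure, do not close the modal and do not clear local state" logic; only the download path is switched to the same-origin endpoint.
- The home page `PreviewPanel` accepts `generatedVideoId`, and the download button prefers `/api/videos/generated/{id}/download`.
### Log archiving
- Content from the start of the download fix onward was moved out of `Day31` and archived in `Day32`.
- `Day31` keeps that day's core content (up to the cleanup hardening).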
---
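## ✅ 3) First batch of security fixes (6 items, no functional risk)
Based on the security audit report, the first batch of 6 hardening items that could be fixed directly was implemented.
### 3.1 Block startup on the default JWT secret
- **File**: `backend/app/main.py`
- Added a `check_jwt_secret` startup event (before `init_admin`).
- When `JWT_SECRET_KEY` still equals the default `"your-secret-key-change-in-production"`:
  - **Production** (`DEBUG=False`): `raise RuntimeError` and stop the service from starting.
  - **Development** (`DEBUG=True`): log a `CRITICAL` warning and allow startup.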
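
A minimal sketch of this startup check is shown below. It is written as a plain function that takes the secret and debug flag as arguments so it can run standalone; how the real `check_jwt_secret` reads its settings and registers with the application is not shown in this log, so that part is left out rather than guessed.

```python
import logging

DEFAULT_JWT_SECRET = "your-secret-key-change-in-production"
logger = logging.getLogger("startup")


def check_jwt_secret(secret: str, debug: bool) -> None:
    """Refuse to start in production while the JWT secret is the default."""
    if secret != DEFAULT_JWT_SECRET:
        return  # a custom secret is configured
    if not debug:
        # Production: fail fast so the service never runs with a known secret
        raise RuntimeError("JWT_SECRET_KEY is still the default value")
    # Development: loud warning only, startup continues
    logger.critical("JWT_SECRET_KEY is still the default value; change it before deploying")
```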
### 3.2 Authentication on AI / Tools endpoints
- **Files**: `backend/app/modules/ai/router.py`, `backend/app/modules/tools/router.py`
- All 3 AI endpoints (`/translate`, `/generate-meta`, `/rewrite`) now use `Depends(get_current_user)`.
- The 1 Tools endpoint (`/extract-script`) now uses `Depends(get_current_user)`.
- The frontend axios client already sets `withCredentials: true` and redirects to the login page on 401, so no frontend change is needed.
### 3.3 Material path traversal fix
- **Files**: `backend/app/modules/materials/router.py`, `backend/app/modules/materials/service.py`
- `stream`, `delete_material`, and `rename_material` now reject `..` before the `startswith(user_id)` check.
- A `material_id` containing `..` returns 400 immediately.
- The `delete_material` route adds `except ValueError` → 400 (it previously caught only `PermissionError`, so a `ValueError` fell through to the generic `Exception` handler and returned 500).
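
The order of checks in 3.3 can be illustrated with a small sketch. The helper name and error messages are hypothetical; the two exception types mirror the mapping described above (`ValueError` → 400, `PermissionError` handled separately).

```python
def validate_material_id(material_id: str, user_id: str) -> None:
    """Reject path traversal first, then check ownership (hypothetical helper)."""
    if ".." in material_id:
        # Checked before the prefix test: "<user_id>/../other/file" would
        # otherwise pass startswith(user_id) and escape the user's directory.
        raise ValueError("invalid material_id")
    if not material_id.startswith(user_id):
        raise PermissionError("material does not belong to the current user")
```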
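### 3.4 video_id allowlist validation
- **File**: `backend/app/modules/videos/router.py`
- The `download_generated` and `delete_generated` endpoints validate `video_id` with a regex at the top of the function.
- Only `^[A-Za-z0-9_-]+$` is allowed; anything else returns 400.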
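
As a small sketch of the allowlist check, the pattern below is the one quoted above; the helper name `is_valid_video_id` is an assumption. Using `fullmatch` also rejects a trailing newline, which a bare `$` with `re.match` would accept.

```python
import re

_VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")


def is_valid_video_id(video_id: str) -> bool:
    """Return True only for IDs made of letters, digits, '_' and '-'."""
    return _VIDEO_ID_RE.fullmatch(video_id) is not None
```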
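### 3.5 Upload/download size limits
- **materials/service.py** (streaming upload): checks `MAX_UPLOAD_SIZE_MB` (default 500MB) after accumulating each chunk and raises `ValueError` when exceeded.
- **ref_audios/service.py** (reference audio): checks a 5MB limit after `await file.read()`.
- **tools/service.py** (script extraction file upload): replaced `shutil.copyfileobj` with a chunked copy that enforces a 500MB limit.
- **tools/service.py** (URL download branch): checks the file size after `_download_video` returns; files over 500MB are deleted and rejected.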
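
A chunked copy with a size cap, as a replacement for `shutil.copyfileobj`, might look like the sketch below. The function name, the default chunk size, and the error message are assumptions for illustration.

```python
from typing import BinaryIO


def copy_with_limit(src: BinaryIO, dst: BinaryIO, max_bytes: int,
                    chunk_size: int = 1024 * 1024) -> int:
    """Copy src to dst in chunks; raise ValueError once max_bytes is exceeded.

    Returns the number of bytes copied. The caller is expected to delete the
    partially written destination when ValueError is raised.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            # Stop early instead of buffering an oversized upload
            raise ValueError(f"file exceeds limit of {max_bytes} bytes")
        dst.write(chunk)
```

Checking the running total per chunk means an oversized upload is rejected after reading at most one chunk past the limit, rather than after the whole file has been written.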
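### 3.6 Generic error messages
- **ai/router.py**: 3 occurrences of `detail=str(e)` were replaced with "翻译服务暂时不可用" (translation service temporarily unavailable), "生成标题标签失败" (title/tag generation failed), and "改写服务暂时不可用" (rewrite service temporarily unavailable).
- **tools/router.py**: keeps the specific "Fresh cookies" branch message; the fallback becomes "文案提取失败，请稍后重试" (script extraction failed, please retry later).
- **generated_audios/service.py**: the task failure `error` field changes from `traceback.format_exc()` to `str(e)`; the traceback is written only to server logs.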
---
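## ✅ 4) Unified modal closing policy (UX)
### Goals
- Keep interaction expectations consistent: business modals can be closed by default with `X` or by clicking the overlay.
- Keep protection for critical flows: the publish-success cleanup modal still cannot be closed via the overlay, to avoid interrupting the flow by accident.
- Note: unifying button position and visual style belongs to Day33; this log covers only the closing policy.
### Changes
- The script extraction modal (`ScriptExtractionModal`) can be closed by clicking the overlay.
- The AI rewrite modal (`RewriteModal`) can be closed by clicking the overlay.
- The publish page QR login modal can be closed by clicking the overlay.
- The change-password modal can be closed by clicking the overlay.
- The recording modal uses a dynamic policy: `closeOnOverlay={!isRecording}`
  - Not recording: overlay close is allowed.
  - Recording: overlay close is disabled (to prevent accidents); `X` still works and stops the recording before closing.
- The publish-success cleanup modal keeps `closeOnOverlay=false` and provides no `onClose` (no close button in the top-right corner).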
---
## 📁 今日主要修改文件
| 文件 | 改动 |
|------|------|
| `backend/app/modules/videos/router.py` | 新增 `GET /api/videos/generated/{video_id}/download`,返回 `attachment` 下载响应;新增 `video_id` 白名单正则校验(`^[A-Za-z0-9_-]+$` |
| `frontend/src/features/publish/model/usePublishController.ts` | 发布成功后 `triggerCleanup()``video.id`(替换签名 URL |
| `frontend/src/shared/contexts/CleanupContext.tsx` | 下载字段改为 `videoId`;兼容旧 `videoDownloadUrl` 回填;下载按钮改同源路径 |
| `frontend/src/features/home/ui/PreviewPanel.tsx` | 首页下载改为同源下载接口 |
| `frontend/src/features/home/ui/HomePage.tsx` | 透传 `generatedVideoId``PreviewPanel` |
| `frontend/src/features/home/ui/ScriptExtractionModal.tsx` | 弹窗支持点击遮罩关闭(`closeOnOverlay` |
| `frontend/src/features/home/ui/RewriteModal.tsx` | 弹窗支持点击遮罩关闭(`closeOnOverlay` |
| `frontend/src/features/publish/ui/PublishPage.tsx` | 扫码登录弹窗支持点击遮罩关闭 |
| `frontend/src/components/AccountSettingsDropdown.tsx` | 修改密码弹窗支持点击遮罩关闭 |
| `frontend/src/features/home/ui/RefAudioPanel.tsx` | 录音弹窗改为 `closeOnOverlay={!isRecording}`(录音中禁遮罩关闭) |
| `Docs/DevLogs/Day31.md` | 移除下载修复章节与对应验证/覆盖项(迁入 Day32 |
| `Docs/TASK_COMPLETE.md` | 当日新增 Day32 区块并接棒 Current后续由 Day33 接棒 Current |
| `Docs/BACKEND_README.md` | 补充 `/api/videos/generated/{video_id}/download` 接口说明 |
| `Docs/BACKEND_DEV.md` | 补充下载接口 `attachment` 约定 |
| `Docs/FRONTEND_README.md` | 补充首页/发布弹窗下载统一同源接口说明 |
| `Docs/FRONTEND_DEV.md` | 补充 CleanupContext 下载策略规范 |
| `Docs/PUBLISH_DEPLOY.md` | 补充发布成功后同源下载联动说明 |
| `README.md` | 补充”一键下载直达(同源 attachment”能力描述 |
| `backend/app/main.py` | `check_jwt_secret` startup 事件:生产环境(`DEBUG=False`)强拦截启动,开发环境 `CRITICAL` 告警 |
| `backend/app/modules/ai/router.py` | 3 个端点加 `Depends(get_current_user)` 认证;错误返回改为通用消息 |
| `backend/app/modules/tools/router.py` | `extract-script` 端点加 `Depends(get_current_user)` 认证;错误返回改为通用消息 |
| `backend/app/modules/materials/router.py` | `stream` 端点新增 `..` 路径穿越拒绝;`delete` 端点补充 `except ValueError` → 400 |
| `backend/app/modules/materials/service.py` | `delete_material` / `rename_material` 新增 `..` 路径穿越拒绝;流式上传增加 `MAX_UPLOAD_SIZE_MB` 大小限制 |
| `backend/app/modules/ref_audios/service.py` | 参考音频上传增加 5MB 大小限制 |
| `backend/app/modules/tools/service.py` | 文案提取文件上传替换为限大小分块拷贝500MBURL 下载分支增加下载后体积检查500MB |
| `backend/app/modules/generated_audios/service.py` | 任务失败错误字段从 `traceback.format_exc()` 改为 `str(e)`,避免泄露内部路径 |
---
## 🔍 Verification log
- `python -m py_compile backend/app/modules/videos/router.py` ✅
- `npm run build` (frontend) ✅
- `npm run build` (frontend, re-verified after the modal closing policy change) ✅
- `pm2 restart vigent2-frontend` ✅
- `pm2 restart vigent2-backend` ✅
- `curl http://127.0.0.1:8006/health` returns `{"status":"ok"}` ✅
- Syntax check for the first batch of security fixes: `python -m py_compile backend/app/main.py backend/app/modules/materials/router.py backend/app/modules/tools/service.py backend/app/modules/ai/router.py backend/app/modules/tools/router.py backend/app/modules/materials/service.py backend/app/modules/ref_audios/service.py backend/app/modules/videos/router.py backend/app/modules/generated_audios/service.py` ✅
- Calling `/api/ai/translate` without logging in → returns 401 ✅
- Calling `/api/tools/extract-script` without logging in → returns 401 ✅
- Syntax check for the final three touch-ups: `python -m py_compile backend/app/main.py backend/app/modules/materials/router.py backend/app/modules/tools/service.py` ✅

290
Docs/DevLogs/Day33.md Normal file

@@ -0,0 +1,290 @@
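## Robustness fix for script extraction from Douyin short links (Day 33)
### Overview
Today focuses on the intermittent extraction failures in the "文案提取助手" (script extraction assistant) for Douyin share short links and share-text snippets, and on supporting the various URL forms Douyin can land on.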
---
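## ✅ 1) Problem review
### Symptom
- Script extraction failed intermittently when the user pasted Douyin share text (containing a `v.douyin.com` short link).
- Extraction succeeded when the address-bar URL (such as `jingxuan?modal_id=...`) was pasted directly.
### Root cause
- `_download_douyin_manual` in `backend/app/modules/tools/service.py` extracted the video ID only from `/video/{id}`.
- The short link does not always redirect to `/video/{id}`. Other common forms include:
  - `/share/video/{id}`
  - `/user/...?...&vid={id}`
  - `/follow/search?...&modal_id={id}`
- Landing on any of these produced `Could not extract video_id`, and the fallback failed.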
---
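## ✅ 2) Fix
### 2.1 A single parsing function
- Added `_extract_douyin_video_id(candidate_url)`, which handles these ID forms in one place:
  - Path: `/video/{id}`, `/share/video/{id}`
  - Query parameters: `modal_id`, `vid`, `video_id`, `aweme_id`, `item_id`
  - A fallback regex match against the whole URL after decoding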
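
The three-stage lookup above (path, query parameters, decoded-URL fallback) can be sketched as follows. This is not the project's `_extract_douyin_video_id`; the regexes, the lookup order inside each stage, and the assumption that IDs are purely numeric are all illustrative.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, unquote, urlparse

_PATH_RE = re.compile(r"/(?:share/)?video/(\d+)")
_QUERY_KEYS = ("modal_id", "vid", "video_id", "aweme_id", "item_id")
_FALLBACK_RE = re.compile(r"(?:modal_id|vid|video_id|aweme_id|item_id)=(\d+)")


def extract_douyin_video_id(candidate_url: str) -> Optional[str]:
    """Try path, then query parameters, then a regex over the decoded URL."""
    parsed = urlparse(candidate_url)

    # 1) Path forms: /video/{id} and /share/video/{id}
    match = _PATH_RE.search(parsed.path)
    if match:
        return match.group(1)

    # 2) Known query parameters (assumes numeric IDs)
    query = parse_qs(parsed.query)
    for key in _QUERY_KEYS:
        for value in query.get(key, []):
            if value.isdigit():
                return value

    # 3) Fallback: scan the fully decoded URL, e.g. for a nested redirect target
    decoded = unquote(candidate_url)
    match = _PATH_RE.search(decoded) or _FALLBACK_RE.search(decoded)
    return match.group(1) if match else None
```

A caller following section 2.2 would try the redirected URL first and the original input second, for example `extract_douyin_video_id(final_url) or extract_douyin_video_id(url)`.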
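### 2.2 Stronger fallback extraction
- `_download_douyin_manual` now:
  1. Extracts `video_id` from the redirected `final_url` first.
  2. If that fails, extracts `video_id` from the original input `url`.
- The rest of the download path is unchanged: visit `m.douyin.com/share/video/{video_id}`, extract `play_addr`, and download.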
---
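## 📁 Files changed today
| File | Change |
|------|------|
| `backend/app/modules/tools/service.py` | Added `_extract_douyin_video_id`; strengthened `video_id` extraction in the Douyin fallback (supports `share/video`, `modal_id`, `vid`, and others) |
| `Docs/DevLogs/Day33.md` | Added the Day33 dev log covering the problem, root cause, fix, and verification |
---
## 🔍 Verification log
- `python -m py_compile backend/app/modules/tools/service.py` ✅
- URL parsing smoke test (function level):
  - `jingxuan?modal_id=...` extracts ✅
  - `user?...&vid=...` extracts ✅
  - `follow/search?...&modal_id=...` extracts ✅
- Download smoke test (service level):
  - The user-supplied share text with a short link downloads a temporary video ✅
  - The previously failing sample `user?...&vid=...` completes the fallback download ✅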
---
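## ✅ 3) Script deep learning: Playwright fallback for Douyin scraping
### 3.1 Problem review
- In the creator analysis flow of "文案深度学习" (script deep learning), the Douyin user page sometimes returns a JS shell page (containing `byted_acrawler`), so static HTML extraction cannot find `desc`.
- The result: the short link resolves to a `sec_uid`, but title scraping fails with "页面结构可能已变更" (the page structure may have changed).
### 3.2 Fix
- Added a Playwright fallback in `backend/app/services/creator_scraper.py`:
  1. Keep the original HTTP + `ttwid` scraping as the first choice (light and fast).
  2. Switch to Playwright automatically when HTTP extraction finds no titles.
  3. Listen to the page's network responses and capture:
     - `/aweme/v1/web/aweme/post/`
     - `/aweme/v1/web/user/profile/other/`
  4. Use `desc` from the response JSON as the video title source, and extract the creator's nickname.
- A more precise message is returned only on a real failure:
  - `抖音触发风控验证，暂时无法抓取标题，请稍后重试`
### 3.3 Result
- The sample short link `https://v.douyin.com/hmFXdx5PvzQ/` is recognized and its titles are scraped reliably.
- The result contains a valid creator nickname and about 50 titles (depending on what the platform returns).
### 3.4 Files added/updated
| File | Change |
|------|------|
| `backend/app/services/creator_scraper.py` | Added the Douyin Playwright fallback, network response capture, improved title/nickname parsing, and clearer error messages |
| `Docs/DevLogs/Day33.md` | Added notes on the Douyin scraping improvements for script deep learning |
### 3.5 Verification log
- `python -m py_compile backend/app/services/creator_scraper.py` ✅
- Smoke tests:
  - Short-link redirect + `sec_uid` extraction ✅
  - Automatic switch to Playwright when the HTTP path fails ✅
  - `aweme/post` data captured from Playwright network responses and titles extracted ✅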
---
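## ✅ 4) First release of the script deep learning feature
### 4.1 Backend
- New creator scraping service: `backend/app/services/creator_scraper.py`
  - `scrape_creator_titles(url)`: single entry point for platform detection and title scraping
  - `validate_url(url)`: enforces `https`, a domain allowlist, a check that every DNS record resolves to a public address, and validation of each redirect hop
  - `cache_titles(titles, user_id)` / `get_cached_titles(analysis_id, user_id)`: 20-minute TTL, bound to the user
- GLM service extensions: `backend/app/services/glm_service.py`
  - `analyze_topics(titles)`: derives hot topics from the titles (at most 10)
  - `generate_script_from_topic(topic, word_count, titles)`: generates a script for a topic and style
- New tool routes: `backend/app/modules/tools/router.py`
  - `POST /api/tools/analyze-creator`
  - `POST /api/tools/generate-topic-script`
  - Uses Pydantic JSON request models, login checks, and the shared `success_response`
### 4.2 Frontend
- New state hook: `frontend/src/features/home/ui/script-learning/useScriptLearning.ts`
  - Flow states: `input -> analyzing -> topics -> generating -> result`
  - Manages the analysis request, generation request, error state, copy, and regenerate
- New modal component: `frontend/src/features/home/ui/ScriptLearningModal.tsx`
  - Step-based UI: link input, single-choice topic, word count input, result display, fill into script / copy
- Home page integration:
  - `frontend/src/features/home/ui/ScriptEditor.tsx`: added the "文案深度学习" button
  - `frontend/src/features/home/model/useHomeController.ts`: added the `learningModalOpen` state
  - `frontend/src/features/home/ui/HomePage.tsx`: mounts the modal and supports filling the result back into the main editor
### 4.3 Placement and rules
- The button is placed in the agreed order:
  - `历史文案`, `文案提取助手`, `文案深度学习`, `AI多语言`
- The modal follows the current unified policy: overlay click closes it (it is not a critical-flow modal).
### 4.4 Verification log
- Backend syntax check:
  - `python -m py_compile backend/app/services/creator_scraper.py backend/app/services/glm_service.py backend/app/modules/tools/router.py` ✅
- Frontend build:
  - `cd frontend && npm run build` ✅
- Integration test with the Douyin short-link sample:
  - `https://v.douyin.com/hmFXdx5PvzQ/` resolves and its titles are scraped (Playwright takes over automatically when the fallback triggers) ✅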
---
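## ✅ 5) Douyin cookie dependency clarified; Bilibili rate-limit handling improved
### 5.1 Douyin cookie dependency
- Douyin scraping for script deep learning **does not depend on the login cookie from the publish management page**.
- The current path uses:
  - Short-link resolution + `sec_uid` extraction
  - Public access (`ttwid` + page/API scraping)
  - Playwright fallback when needed
- So the feature works even if the user has not logged in to Douyin (platform risk control may still interfere).
### 5.2 Bilibili "请求过于频繁" (too many requests) handling
- Improved Bilibili scraping robustness in `backend/app/services/creator_scraper.py`:
  - Automatic retries for rate-limit cases (exponential backoff + random jitter)
  - Rate-limit detection (HTTP 412/429, error codes, error text)
  - Automatic switch to the Playwright fallback after the HTTP path fails
  - A single, easier-to-understand final error message
- `mid` extraction supports both the root path and sub-paths (such as `/upload/video`).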
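
Retrying with exponential backoff and jitter, as described in 5.2, could be sketched like this. The exception class, function names, and all numeric defaults are assumptions for illustration; the log does not state the actual retry count or delays.

```python
import asyncio
import random


class RateLimited(Exception):
    """Raised by the fetch callable on HTTP 412/429 or a rate-limit error code."""


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 8.0,
                  jitter: float = 0.5, rng=random.random) -> float:
    """Delay before retry number `attempt` (0-based): capped 2^n growth + jitter."""
    return min(cap, base * (2 ** attempt)) + rng() * jitter


async def fetch_with_retry(fetch, max_attempts: int = 3, sleep=asyncio.sleep):
    """Call `fetch()` until it succeeds or the attempts are used up."""
    for attempt in range(max_attempts):
        try:
            return await fetch()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # let the caller switch to the Playwright fallback
            await sleep(backoff_delay(attempt))
```

The jitter spreads retries from concurrent users so they do not hit the platform at the same instant after a shared rate-limit window.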
### 5.3 Verification log
- Bilibili sample: `https://space.bilibili.com/8047632` yields 50 titles ✅
- Douyin short-link re-test: `https://v.douyin.com/hmFXdx5PvzQ/` still yields 50 titles ✅
---
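## ✅ 6) Second round of reliability improvements for Douyin + Bilibili scraping
### 6.1 Douyin
- `backend/app/services/creator_scraper.py`
  - `scrape_creator_titles(..., user_id)` passes the user ID through, so the user's logged-in platform cookie can be used as extra context.
  - Douyin scraping can optionally inject the user's cookie (HTTP requests + Playwright context).
  - The Playwright fallback now runs up to 8 rounds instead of 4, aiming to reach `MAX_TITLES=50` where possible.
  - Network response capture remains the main path (`aweme/post` + `profile/other`), with `desc` preferred as the title.
### 6.2 Bilibili
- Added a WBI signing path (main path):
  - Fetch the `wbi_img` key (handles the case where `nav` returns `-101` but still carries `wbi_img`)
  - Compute `w_rid/wts`, then call `x/space/wbi/arc/search`
  - Fetch multiple pages (accumulating results) and deduplicate titles, aiming for 50
- Added Bilibili session warm-up:
  - Fetch and inject `buvid3/buvid4` via `x/frontend/finger/spi`
  - Use the user's logged-in Bilibili cookie when present, to improve the hit rate
- Playwright fallback improvements:
  - Listen to `x/space/*/arc/search` responses and parse valid payloads
  - Replay captured arc URLs through `context.request` as a second attempt
### 6.3 Route wiring
- `backend/app/modules/tools/router.py`
  - `/api/tools/analyze-creator` passes `current_user.id` to the scraper for platform cookie enhancement.
### 6.4 Notes on results
- Douyin: short-link handling is more stable, and the Playwright fallback takes priority on risk-control pages.
- Bilibili: both the signing path and the fallback path are in place, but during strict platform risk-control windows it may still return "请求过于频繁/风控校验失败" (too many requests / risk-control check failed). This is a platform-side limit.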
---
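## ✅ 7) Final scraping strategy: Douyin/Bilibili go straight to Playwright
Following a product decision, creator title scraping for script deep learning now uses **Playwright as the direct main path**, replacing "HTTP main path + Playwright fallback".
### 7.1 Changes
- `backend/app/services/creator_scraper.py`
  - `_scrape_douyin()` now calls `_scrape_douyin_with_playwright()` directly.
  - `_scrape_bilibili()` now calls `_scrape_bilibili_with_playwright()` directly.
  - Both platforms keep 2 Playwright scraping retries.
  - The per-user cookie is read first; if it is missing, the legacy global cookie is tried.
- `backend/app/modules/tools/router.py`
  - `analyze-creator` still passes `current_user.id`, used to match the user's cookie context.
### 7.2 Impact
- The change is limited to the script deep learning scraping path.
- It **does not affect** automated video publishing or the existing script extraction assistant (extract-script) flow.
### 7.3 Verification
- Douyin short-link sample: `https://v.douyin.com/hmFXdx5PvzQ/` scraped successfully, 50 titles.
- Bilibili samples:
  - `https://space.bilibili.com/256237759?spm_id_from=...` scraped successfully, 40 titles.
  - `https://space.bilibili.com/1140672573` scraped successfully, 40 titles.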
---
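## ✅ 8) Unified GLM call path and better timeout experience
### 8.1 Symptom
- "生成文案" (generate script) in script deep learning occasionally failed on the frontend with `timeout of 30000ms exceeded`.
### 8.2 Cause
- Mainly the frontend request timeout was too short (30s), so requests timed out when the model was queued or generating long text.
- Although the backend already went through `glm_service`, each method repeated its own SDK call code, which was costly to maintain.
### 8.3 Changes
- Frontend: the `generate-topic-script` timeout was raised from 30s to 90s, with a clearer timeout message.
- Backend: `backend/app/services/glm_service.py`
  - Added `_call_glm(...)` as the single call entry point (unifying model / thinking / to_thread / timeout).
  - `generate_title_tags / rewrite_script / analyze_topics / generate_script_from_topic / translate_text` all reuse that entry point.
  - `settings.GLM_MODEL` stays the single configuration point, so calls are not scattered across the code.
### 8.4 Result
- GLM calls follow one standard, and later parameter changes need only one edit.
- Frontend timeout errors dropped noticeably; when a timeout does happen, the user sees an understandable message.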
---
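## ✅ 9) Unified action buttons across the three script modals
### 9.1 Goal
- Unify the position, style, and primary/secondary hierarchy of the result-page buttons in "文案提取助手" (script extraction assistant), "AI 智能改写" (AI rewrite), and "文案深度学习" (script deep learning).
### 9.2 Changes
- `frontend/src/features/home/ui/ScriptExtractionModal.tsx`
  - Result-page buttons moved from "scattered to the right of the title + a separate bottom button" to a single bottom Action Grid.
  - Buttons are now: `填入文案`, `复制`, `提取下一个`, `关闭`.
- `frontend/src/features/home/ui/RewriteModal.tsx`
  - Result-page buttons moved to the same bottom Action Grid.
  - Added a copy button (with clipboard fallback).
  - Buttons are now: `填入文案`, `复制`, `重新生成`, `保留原文`.
- `frontend/src/features/home/ui/ScriptLearningModal.tsx`
  - Keeps the same Action Grid style: `填入文案`, `复制`, `重新生成`, `换个话题`.
### 9.3 Verification
- `cd frontend && npm run build` ✅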

244
Docs/DevLogs/Day34.md Normal file

@@ -0,0 +1,244 @@
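## Multi-camera timeline system refactor (Day 34)
### Overview
The timeline system was refactored from an "equal sequential segments" model into a multi-camera "primary material + inserted shots" model. The primary material loops continuously to fill the whole timeline, and the user can overlay inserted shots at any position to get a multi-camera switching effect. Single-material behavior is unchanged. This day also fixes accidental closing of the "文案深度学习" (script deep learning) modal.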
---
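## ✅ 1) Core architecture changes
### 1.1 Old model vs new model
| Aspect | Old model | New model |
|---|---|---|
| Timeline structure | N equal segments, one material each | Primary material plays continuously + floating insert blocks |
| Primary material | No such concept | `selectedMaterials[0]`, looped to fill the full audio duration |
| Other materials | Duration split evenly | Insert candidates that can be added anywhere on the timeline |
| Segment boundaries | Fixed equal split | User drags to reposition; clicking opens a modal to edit duration |
| Max materials | 4 (equal split) | 4 (1 primary + up to 3 insert candidates); each candidate can be inserted multiple times |
### 1.2 Core `buildAssignments()` algorithm
In multi-material mode, `toCustomAssignments()` is called to build the `custom_assignments` array:
1. Sort insert blocks by `start`.
2. Fill the gaps between insert blocks with the primary material.
3. Track the primary material's cumulative playback position with `primaryAccum`, which gives seamless looping.
4. **Split each gap at loop boundaries** using the length of the primary material's effective clip, so that no sub-segment crosses a loop boundary.
5. The backend `prepare_segment` then only needs a simple trim, which avoids the "trim then loop" path that repeats frames.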
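
The gap-filling steps above can be sketched in Python as follows (the real implementation is the TypeScript `toCustomAssignments()`). The function signature is hypothetical, and one point is an explicit assumption: the log does not say whether `primaryAccum` keeps advancing while an insert is on screen, so this sketch advances it only by the primary time actually emitted.

```python
EPS = 1e-6  # tolerance for float comparisons


def to_custom_assignments(inserts, primary_path, clip_start, clip_end, total):
    """Build assignments: inserts as given, gaps filled by the looping primary clip.

    Each assignment is a dict with material_path/start/end/source_start/source_end.
    No primary segment crosses a loop boundary of [clip_start, clip_end).
    """
    clip_len = clip_end - clip_start
    if clip_len <= EPS:
        raise ValueError("primary clip must have positive length")

    out = []
    primary_accum = 0.0  # primary seconds emitted so far (assumption, see lead-in)

    def fill(gap_start, gap_end):
        nonlocal primary_accum
        t = gap_start
        while gap_end - t > EPS:
            offset = primary_accum % clip_len
            if clip_len - offset < EPS:
                offset = 0.0  # snap to the loop start to avoid a zero-length piece
            piece = min(gap_end - t, clip_len - offset)  # stop at the loop boundary
            out.append({
                "material_path": primary_path,
                "start": t,
                "end": t + piece,
                "source_start": clip_start + offset,
                "source_end": clip_start + offset + piece,
            })
            t += piece
            primary_accum += piece

    cursor = 0.0
    for ins in sorted(inserts, key=lambda s: s["start"]):
        fill(cursor, ins["start"])  # primary fills the gap before this insert
        out.append(dict(ins))
        cursor = max(cursor, ins["end"])
    fill(cursor, total)  # primary fills the tail
    return out
```

With a 4-second primary clip, a 10-second timeline, and one insert at 3 to 5 seconds, the primary contributes three segments: source 0 to 3, then 3 to 4 (the rest of the first loop), then 0 to 4 (a fresh loop).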
---
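## ✅ 2) Frontend changes
### 2.1 New file
**`frontend/src/shared/types/timeline.ts`**
```typescript
export interface InsertSegment {
  id: string;
  materialId: string;
  materialName: string;
  start: number;
  end: number;
  sourceStart: number;
  sourceEnd: number;
  color: string;
}
```
A cross-module shared type used by `useTimelineEditor`, `TimelineEditor`, and `useHomeController`.
### 2.2 `useTimelineEditor.ts`: full rewrite
The core hook was rewritten from the equal-split model to the primary + insert model:
- **New API**: `addInsert` (returns `AddInsertResult: "ok" | "limit" | "no_space"`), `removeInsert`, `moveInsert`, `resizeInsert`, `setInsertSourceRange`, `setPrimarySourceRange`, `toCustomAssignments`
- **`MultiCamCache`** interface: separate localStorage persistence (`vigent_${storageKey}_multicam`) that stores inserts + primarySourceStart/End
- Automatic cleanup: when the selected material list changes, insert blocks that reference removed materials are dropped
- The primary material's source range resets automatically when switching between single and multi mode
### 2.3 `TimelineEditor.tsx`: full rewrite
The visual component now matches the new model:
- Primary material background bar: purple base + a loop stripe pattern (shown when loopCount > 1)
- Floating insert blocks: colored semi-transparent rectangles that can be dragged (from the center); clicking opens ClipTrimmer to edit the clip range and duration
- Insert candidate bar: `selectedMaterials[1:]` shown as `+` buttons; clicking adds a block to the timeline
- Mobile support: 40px minimum height, 12px drag edge, always-visible delete button
- Removed the unused `TimelineSegment` import
### 2.4 `useHomeController.ts`: adapted to the new API
- Replaced the old timeline destructuring with the new API (`inserts`, `addInsert`, `removeInsert`, and so on)
- Rewrote the multi-material branch of `handleGenerate()`: it calls `toCustomAssignments()` to build the assignments and splits the payload into `material_path` (primary) and `material_paths` (all deduplicated paths)
- The single-material branch also calls `toCustomAssignments()` to apply the trim range
- Renaming a material also updates `materialName` in inserts
- Added the `handleSetPrimary` callback: promotes a given material to `selectedMaterials[0]`
- Added the computed `insertCandidates`: the Material objects for `selectedMaterials[1:]`
### 2.5 `MaterialSelector.tsx`: enhancements
- Added the `Crown` icon and the `onSetPrimary` callback prop
- In multi-material mode, role badges are shown: `selectedMaterials[0]` gets a purple "主素材" (primary) badge and the rest get a gray "可插入" (insertable) badge
- Non-primary rows show a Crown button that sets the material as primary
### 2.6 `HomePage.tsx`: adaptation
- Rewrote `clipTrimmerSegment`: it routes on either the `"primary"` ID (primary trim) or an insert block ID
- `TimelineEditor` receives all the new props
- `ClipTrimmer`'s `onConfirm` routes to `setPrimarySourceRange` or `setInsertSourceRange` based on the segment ID
- `MaterialSelector` receives `onSetPrimary`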
---
## ✅ 3) Backend changes
### 3.1 `workflow.py`: multi-camera support fixes
Four key fixes:
**(a) Source of material_paths**
```python
# Old: inferred from custom_assignments (does not work for multi-camera)
# New: trust req.material_paths from the frontend first
if req.material_paths and len(req.material_paths) >= 1:
    material_paths = req.material_paths
else:
    material_paths = [req.material_path]
```
**(b) custom_assignments validation**
```python
# Old: len(custom_assignments) == len(material_paths)
# New: >= 1, hard cap of 50, and every path must be a known material path
if len(req.custom_assignments) > 50:
    raise ValueError(...)
unknown = [a.material_path for a in req.custom_assignments if a.material_path not in known_paths]
if unknown:
    raise ValueError(...)
```
**(c) Download deduplication + concurrency control**
```python
# Old: each assignment downloaded separately (same material fetched repeatedly)
# New: download once per unique path, tracked in a path_to_local map
_segment_sem = asyncio.Semaphore(4)  # created inside each call, not at module level
unique_paths = list(dict.fromkeys(a["material_path"] for a in assignments))
path_to_local: dict = {}
```
The Semaphore is created inside each `generate_video()` call; with 2 concurrent tasks × 4, the peak is 8 ffmpeg processes.
**(d) First/last segment capping guard**
```python
# Align the first and last segments only when custom_assignments is not used
if not req.custom_assignments and assignments and audio_duration > 0:
    assignments[0]["start"] = 0.0
    assignments[-1]["end"] = audio_duration
```
---
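## ✅ 4) Preventing accidental close of the script deep learning modal
### 4.1 Problem
- The script deep learning modal could be closed via the overlay and `ESC` by default. Users viewing a generated result often closed it by accident, and the generated content was gone when they reopened it.
### 4.2 Fix
- `frontend/src/shared/ui/AppModal.tsx`
  - Added the `closeOnEsc?: boolean` option, default `true`, so existing modals behave as before.
- `frontend/src/features/home/ui/ScriptLearningModal.tsx`
  - Sets `closeOnOverlay={false}` and `closeOnEsc={false}` to disable overlay/ESC closing.
  - The bottom button on the input page changes from "取消" (cancel) to "清空" (clear); it only clears the link input and does not close the modal.
  - The only ways to close are now the top-right `X` or "填入文案" (fill into script) on the result page.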
---
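## ✅ 5) Code review fixes
### 5.1 UX: a single entry point for editing duration
- **Problem**: timeline insert blocks supported both right-edge drag resizing and click-to-edit in a modal, and every drag also triggered the modal by mistake.
- **Fix**:
  - Removed the right-side resize handle from `TimelineEditor`.
  - Introduced `dragMovedRef` + a 5px threshold to tell a drag from a click.
  - `ClipTrimmer`'s onConfirm now also calls `resizeInsert()`, so confirming a clip updates the block's duration on the timeline.
  - Updated the help text: "点击插入块设置截取/时长" (click an insert block to set clip range/duration).
### 5.2 Lint fixes
- `useTimelineEditor.ts`: 3 `react-hooks/set-state-in-effect` findings, marked with `eslint-disable-next-line` (initialization and cleanup cases)
- `useTimelineEditor.ts`: render-time ref access changed to a `useState` pattern (`prevPrimaryId`)
- `HomePage.tsx`: removed the unused `reorderMaterials` destructuring
- `TimelineEditor.tsx`: removed the unused `useMemo` import and the `materials`/`onResizeInsert` props
### 5.3 P1: multi-segment assignment degradation
- **Problem**: when `selectedMaterials.length > 1` but the timeline has no insert blocks, `is_multi=False` and the backend takes the single-material path, discarding the non-primary materials.
- **Fix** (`workflow.py`):
```python
is_multi = len(material_paths) > 1 or (
    req.custom_assignments is not None and len(req.custom_assignments) > 1
)
```
### 5.4 P1: primary material trim range leak
- **Problem**: when switching the primary material ("设为主素材"), the old primary's `primarySourceStart/End` carried over to the new primary, giving a wrong clip range.
- **Cause**: the reset was keyed only on changes to `selectedMaterials.length`, and the length does not change when the primary is switched.
- **Fix** (`useTimelineEditor.ts`): track identity instead
```typescript
const [prevPrimaryId, setPrevPrimaryId] = useState(selectedMaterials[0]);
if (selectedMaterials[0] !== prevPrimaryId) {
  setPrevPrimaryId(selectedMaterials[0]);
  setPrimarySourceStart(0);
  setPrimarySourceEnd(0);
}
```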
---
## 📁 Files changed today
| File | Change |
|------|------|
| `frontend/src/shared/types/timeline.ts` | **New**: `InsertSegment` interface definition |
| `frontend/src/features/home/model/useTimelineEditor.ts` | **Rewrite**: equal-split model → primary + insert model |
| `frontend/src/features/home/ui/TimelineEditor.tsx` | **Rewrite**: visual component adapted to the new model |
| `frontend/src/features/home/model/useHomeController.ts` | Adapted to the new timeline API; generate payload rewritten |
| `frontend/src/features/home/ui/MaterialSelector.tsx` | Primary/insertable badges; set-as-primary button |
| `frontend/src/features/home/ui/HomePage.tsx` | ClipTrimmer routing; new TimelineEditor props |
| `backend/app/modules/videos/workflow.py` | material_paths source, validation, download deduplication, capping guard |
| `frontend/src/shared/ui/AppModal.tsx` | Added the `closeOnEsc` option for per-modal control of ESC closing |
| `frontend/src/features/home/ui/ScriptLearningModal.tsx` | Disabled overlay/ESC closing; input page "取消" changed to "清空" |
---
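## 🔍 Verification Log
- TypeScript compile check: `npx tsc --noEmit` ✅ no errors
- Python syntax check: `python -c "import ast; ast.parse(open(...).read())"`
- Frontend lint (follow-up fixes in this round): `npm run lint -- src/shared/ui/AppModal.tsx src/features/home/ui/ScriptLearningModal.tsx`
- Code review (one subagent review round each for frontend and backend):
  - Frontend: logic correct, no bugs; only one unused import, already removed
  - Backend: validation logic, download deduplication, and concurrency control all correct
- Single-material mode stays backward compatible: `toCustomAssignments()` produces a single-segment assignment with the trim range when only one material is selected
---
## ⚠️ Known Limitations
- The "trim first, then loop" path in `prepare_segment` (`needs_loop && source_start > 0`) still exists, but the frontend boundary-splitting algorithm ensures it is never triggered
- At most 10 insert blocks (`MAX_INSERTS=10` in `useTimelineEditor`); exceeding the limit returns `"limit"`
- The minimum insert block duration is 0.5s; operations that would go below it are ignored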

Docs/DevLogs/Day35.md Normal file

@@ -0,0 +1,165 @@
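## Small-Face Lip-Sync Quality Compensation: Rollout + Deployment Verification (Day 35)
### Overview
Completed the backend rollout and deployment wrap-up for Small-Face LipSync Compensation. The core goal is to add a quality compensation chain for distant, small-face footage (detect -> crop -> sparse super-resolution -> model inference -> blend back) without changing the semantics of the user's model choice (`default/fast/advanced`), while keeping the feature off by default, fail-open on failure, and quickly reversible in production.
---
## ✅ 1) Backend Capability
### 1.1 Configuration and switches
Five new config options (conservative defaults):
- `LIPSYNC_SMALL_FACE_ENHANCE` (default `false`)
- `LIPSYNC_SMALL_FACE_THRESHOLD` (default `256`)
- `LIPSYNC_SMALL_FACE_UPSCALER` (`gfpgan | codeformer`)
- `LIPSYNC_SMALL_FACE_GPU_ID` (default `0`)
- `LIPSYNC_SMALL_FACE_FAIL_OPEN` (default `true`)
Code entry points: `backend/app/core/config.py`, `backend/.env`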
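### 1.2 New small-face enhancement service
Added `backend/app/services/small_face_enhance_service.py`, which implements the full compensation chain:
1. **Small-face decision** (CPU)
   - SCRFD (`det_10g.onnx`, reusing the LatentSync weights)
   - Samples 24 frames evenly from the 10%-30% span of the video
   - Triggers by comparing the median of the per-frame largest face width against the threshold
2. **Cropping and tracking** (CPU)
   - Detects every 8 frames; other frames are forward-filled and EMA-smoothed
   - bbox expanded with `padding=0.28`
3. **Sparse super-resolution** (GPU0)
   - Detection frames go through GFPGAN/CodeFormer
   - Non-detection frames use bicubic resize
   - Target size `512x512`
4. **Blend-back** (CPU)
   - Local mouth mask (starting at 68% + 16% side margins) with Gaussian feathering (15px)
   - `cv2.seamlessClone`, falling back to alpha blend on failure
5. **Frame-count guard**
   - Before blending back, checks `lipsync_frames <= original_frames`
   - Raises an error only when `lipsync_frames > original_frames` (abnormal); otherwise blends back using the lipsync frame count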
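The small-face decision in step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual code: the function names are hypothetical, face widths are assumed to arrive as one value per sampled frame (with `None` when no face was found), and treating "no face at all" as "do not trigger" is an assumption.

```python
from statistics import median


def sample_frame_indices(total_frames, count=24, window=(0.10, 0.30)):
    """Return up to `count` evenly spaced frame indices inside the window."""
    start = int(total_frames * window[0])
    end = max(start, int(total_frames * window[1]) - 1)
    if count <= 1 or end == start:
        return [start]
    step = (end - start) / (count - 1)
    # A set removes duplicates that appear when the window is shorter than `count`.
    return sorted({start + round(i * step) for i in range(count)})


def is_small_face(max_face_widths, threshold=256):
    """Compare the median of the per-frame widest face against the threshold."""
    widths = [w for w in max_face_widths if w is not None and w > 0]
    if not widths:
        # Assumption: with no detected face the enhancement branch is skipped.
        return False
    return median(widths) < threshold
```

Using the median rather than the mean keeps a few outlier frames (a brief close-up, a missed detection) from flipping the decision.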
---
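## ✅ 2) LipSyncService Integration
Key changes in `backend/app/services/lipsync_service.py`:
- Inside `_local_generate()`, the steps run in this order:
  - `video looping` -> `small face enhance` -> `model infer` -> `blend back`
- Extracted `_run_selected_model()` to unify model routing (MuseTalk / LatentSync server / LatentSync subprocess)
- The entire small-face enhancement branch is wrapped in `try/except` and governed by `LIPSYNC_SMALL_FACE_FAIL_OPEN`
- `check_health()` gains a `small_face_enhance` status field
Semantics preserved:
- No change to the frontend or the API protocol
- The user's model choice takes priority; a small face never forces a different model
- Only the local path (`_local_generate`) is wired in; the remote path is not wired in yet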
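The fail-open behavior described above can be sketched like this. It is a minimal illustration only: `run_with_fail_open` and its callable parameters are hypothetical names, and the real branch in `_local_generate()` may be structured differently.

```python
def run_with_fail_open(enhanced_path, plain_path, fail_open=True, log=print):
    """Try the enhanced branch; on any error either fall back or re-raise."""
    try:
        return enhanced_path()
    except Exception as exc:
        if not fail_open:
            # Strict mode: surface the failure to the caller.
            raise
        log(f"small-face enhancement failed, falling back: {exc}")
        return plain_path()
```

With `fail_open=True` a broken enhancement step costs only the time already spent, since the task still completes through the regular lip-sync path.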
---
## ✅ 3) Dependencies and Weights
### 3.1 Dependencies
Added to `backend/requirements.txt`:
- `opencv-python-headless>=4.8.0`
- `gfpgan>=1.3.8`
### 3.2 Weights
- `models/FaceEnhance/GFPGANv1.4.pth` (new directory and weights)
- `models/LatentSync/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx` (reused)
---
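## ✅ 4) Stability Fixes (post-deployment patches)
To address dependency compatibility, frame-count estimation drift, false blend-back failures, and output quality issues seen in real deployment, nine fixes were added:
1. **Lazy loading + guard**
   - `cv2/numpy` are now imported inside `try/except`
   - The enhancement entry point is guarded by `_CV2_AVAILABLE`
   - When the dependencies are missing, enhancement is skipped and the main flow is unaffected
2. **Type annotations and torchvision compatibility patch**
   - Added `from __future__ import annotations` so that `np.ndarray` annotations do not fail at import time when the dependency is missing
   - `_ensure_upscaler()` injects `sys.modules['torchvision.transforms.functional_tensor']` for compatibility between `torchvision>=0.20` and the legacy import in `gfpgan/basicsr`
3. **ffprobe frame-rate and frame-count estimation fix**
   - `_get_video_info()` switched from `csv` output to `json` field access, so a missing `nb_frames` no longer shifts the fields
   - fps now prefers `avg_frame_rate`; `r_frame_rate` is only a fallback
4. **Track frame count and blend-back check fix**
   - `_build_face_track()` records the number of frames ffmpeg actually read and overwrites the estimated `nb_frames`
   - `blend_back()` check relaxed: `lipsync <= original` blends back normally, only `>` raises
5. **Empty-output protection**
   - `blend_back()` raises when `ls_frames <= 0`
   - The outer `FAIL_OPEN` handler catches it and falls back to the regular path, so no empty video is written
6. **Time-base alignment fix (slow motion / ghosting)**
   - `_crop_and_upscale_video()` output fps now follows the source video fps, so the enhanced video's timeline is not stretched
   - `blend_back()` maps original frame indices by `orig_fps/ls_fps`, so it no longer blends only onto the leading frames, which slowed motion and caused ghosting
7. **Silent-video fix**
   - Added an audio mux step after a successful small-face blend-back
   - The current task's `audio_path` is always muxed back into the blended video, so the enhancement path does not produce silent output
8. **Eye ghosting fix**
   - Mouth mask start moved further down to 68%, plus 16% left/right margins, so the eye area and nose wings take less part in the blend
   - After `seamlessClone`, the result is blended a second time restricted to the mask, suppressing Poisson diffusion above the eyes
9. **Distortion avoidance (operations side)**
   - `LIPSYNC_SMALL_FACE_THRESHOLD=9999` is for pipeline smoke tests only, not for quality evaluation
   - Restore `LIPSYNC_SMALL_FACE_THRESHOLD=256` before any quality verification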
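Fix 3 (prefer `avg_frame_rate`, fall back to `r_frame_rate`, then to 25fps) can be sketched as below. This is an illustrative sketch rather than the project's `_get_video_info()`: the helper names are hypothetical, and the input is assumed to be the JSON text printed by `ffprobe -of json -show_streams`.

```python
import json
from fractions import Fraction


def parse_rate(text):
    """Parse an ffprobe rate such as '30000/1001'; return None when unusable."""
    try:
        value = Fraction(text)
    except (ValueError, ZeroDivisionError, TypeError):
        # Covers 'N/A', a zero denominator such as '0/0', and a missing field.
        return None
    return float(value) if value > 0 else None


def pick_fps(ffprobe_json, fallback=25.0):
    """Return the fps of the first stream, or None when there are no streams."""
    streams = json.loads(ffprobe_json).get("streams") or []
    if not streams:
        return None
    stream = streams[0]
    return (
        parse_rate(stream.get("avg_frame_rate"))
        or parse_rate(stream.get("r_frame_rate"))
        or fallback
    )
```

Reading named JSON fields is what removes the field-shift problem: a missing `nb_frames` simply yields no key instead of moving every later CSV column one position to the left.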
---
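## ✅ 5) Deployment Doc and Verification
Added and back-filled the deployment doc: `Docs/FACEENHANCE_DEPLOY.md`
Corrections made in the doc:
- Health-check URL corrected to `/api/videos/lipsync/health`
- Response example now includes the outer `success/data` wrapper
Verification highlights:
- `GET /api/videos/lipsync/health` returns `data.small_face_enhance`
- Default `enabled=false`; with the switch off, behavior matches the previous version
- `detector_loaded=false` (lazy loading) is expected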
---
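## 📁 Files Modified Today
| File | Change |
|------|------|
| `backend/app/core/config.py` | Added the `LIPSYNC_SMALL_FACE_*` config options (5) |
| `backend/.env` | Added the small-face enhancement switch and parameters |
| `backend/app/services/small_face_enhance_service.py` | New: main service for detection / cropping / super-resolution / blend-back; later patches add lazy loading and compatibility fixes |
| `backend/app/services/lipsync_service.py` | Integrated the enhancement chain, extracted `_run_selected_model`, added enhancement status to health |
| `backend/requirements.txt` | Added `opencv-python-headless` and `gfpgan` |
| `models/FaceEnhance/GFPGANv1.4.pth` | New super-resolution weights |
| `Docs/FACEENHANCE_DEPLOY.md` | New deployment doc; health-check path and response example corrected |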
---
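## ⚠️ Known Limitations
- Only the local lip-sync path is wired in (`_local_generate()`); remote mode has no small-face compensation
- Multi-shot footage still gets a single global decision; there is no per-segment small-face decision yet
- v1 prioritizes stability for single-person selfie footage; a strategy for switching between multiple faces will follow later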


@@ -6,13 +6,14 @@
## ⚡ 核心原则
| 规则 | 说明 |
|------|------|
| **默认更新** | 更新 `DayN.md``TASK_COMPLETE.md` |
| **按需更新** | 其他文档仅在内容变化涉及时更新 |
| **智能修改** | 错误→替换,改进→追加(见下方详细规则 |
| **先读后写** | 更新前先查看文件当前内容 |
| **日内合并** | 同一天的多次小修改合并为最终版本 |
| 规则 | 说明 |
|------|------|
| **默认更新** | 更新 `DayN.md``TASK_COMPLETE.md` |
| **按需更新** | 其他文档仅在内容变化涉及时更新 |
| **链路对齐** | 新增/重构文档后,回写入口文档(`README.md` 或对应 `*_README.md` |
| **智能修改** | 错误→替换,改进→追加(见下方详细规则) |
| **先读后写** | 更新前先查看文件当前内容 |
| **日内合并** | 同一天的多次小修改合并为最终版本 |
---
@@ -20,17 +21,19 @@
> **每次提交重要变更时,请核对以下文件是否需要同步:**
| 优先级 | 文件路径 | 检查重点 |
| :---: | :--- | :--- |
| 🔥 **High** | `Docs/DevLogs/DayN.md` | **(最新日志)** 详细记录变更、修复、代码片段 |
| 🔥 **High** | `Docs/TASK_COMPLETE.md` | **(任务总览)** 更新 `[x]`、进度条、时间线 |
| ⚡ **Med** | `README.md` | **(项目主页)** 功能特性、技术栈、最新截图 |
| ⚡ **Med** | `Docs/DEPLOY_MANUAL.md` | **(部署手册)** 环境变量、依赖包、启动命令变更 |
| ⚡ **Med** | `Docs/BACKEND_DEV.md` | **(后端规范)** 接口契约、模块划分、环境变量 |
| ⚡ **Med** | `Docs/BACKEND_README.md` | **(后端文档)** 接口说明、架构设计 |
| ⚡ **Med** | `Docs/FRONTEND_DEV.md` | **(前端规范)** API封装、日期格式化、新页面规范 |
| ⚡ **Med** | `Docs/FRONTEND_README.md` | **(前端文档)** 功能说明、页面变更 |
| 🧊 **Low** | `Docs/*_DEPLOY.md` | **(子系统部署)** LatentSync/CosyVoice/字幕等独立部署文档 |
| 优先级 | 文件路径 | 检查重点 |
| :---: | :--- | :--- |
| 🔥 **High** | `Docs/DevLogs/DayN.md` | **(最新日志)** 详细记录变更、修复、代码片段 |
| 🔥 **High** | `Docs/TASK_COMPLETE.md` | **(任务总览)** 更新 Day Current、`[x]` 与更新时间 |
| ⚡ **Med** | `README.md` | **(项目主页)** 功能特性、技术栈、最新截图 |
| ⚡ **Med** | `Docs/DEPLOY_MANUAL.md` | **(部署手册)** 环境变量、依赖包、启动命令变更 |
| ⚡ **Med** | `Docs/PUBLISH_DEPLOY.md` | **(发布专项)** 四平台登录/发布实现、排障、验收流程 |
| ⚡ **Med** | `Docs/BACKEND_DEV.md` | **(后端规范)** 接口契约、模块划分、环境变量 |
| ⚡ **Med** | `Docs/BACKEND_README.md` | **(后端文档)** 接口说明、架构设计 |
| ⚡ **Med** | `Docs/FRONTEND_DEV.md` | **(前端规范)** API封装、日期格式化、新页面规范 |
| ⚡ **Med** | `Docs/FRONTEND_README.md` | **(前端文档)** 功能说明、页面变更 |
| 🧊 **Low** | `Docs/DOC_RULES.md` | **(规则文档)** 文档结构变化或流程变化时同步更新 |
| 🧊 **Low** | `Docs/*_DEPLOY.md` | **(子系统部署)** LatentSync/CosyVoice/字幕等独立部署文档 |
---
@@ -89,7 +92,7 @@
---
## 🔍 更新前检查清单
## 🔍 更新前检查清单
> **核心原则**:追加前先查找,避免重复和遗漏
@@ -112,12 +115,20 @@
| **有待验证状态** | 更新状态标记 |
| **全新独立内容** | 追加到末尾 |
**3. 必须更新的内容**
**3. 必须更新的内容**
-**状态标记**`🔄 待验证``✅ 已修复` / `❌ 失败`
-**进度百分比**:更新为最新值
-**文件修改列表**:补充新修改的文件
-**禁止**:创建重复的章节标题
-**文件修改列表**:补充新修改的文件
-**禁止**:创建重复的章节标题
### 发布相关变更的三检(新增)
若涉及抖音/微信/B站/小红书发布或扫码登录,额外执行:
1. **路由真值检查**:以 `backend/app/modules/publish/router.py` 为准校验 API 路径,避免文档写成旧路径(例如 `/screenshots/`)。
2. **专项文档对齐**:更新 `Docs/PUBLISH_DEPLOY.md` 中对应平台章节(登录、发布判定、排障)。
3. **入口文档回写**:至少回写一处入口文档(`README.md``Docs/BACKEND_README.md` / `Docs/DEPLOY_MANUAL.md`)。
### 示例场景
@@ -138,23 +149,23 @@
---
## 工具使用规范
## 工具使用规范
> **核心原则**:使用正确的工具,避免字符编码问题
### ✅ 推荐工具:Edit / Read / Grep
### ✅ 推荐工具Read / Grep / apply_patch
**使用场景**
- `Read`:更新前先查看文件当前内容
- `Edit`:精确替换现有内容、追加新章节
- `Grep`:搜索文件中是否已有相关章节
- `Write`:创建新文件(如 Day{N+1}.md
**使用场景**
- `Read`:更新前先查看文件当前内容
- `apply_patch`:精确替换现有内容、追加新章节
- `Grep`:搜索文件中是否已有相关章节
- `Write`:创建新文件(如 Day{N+1}.md
**注意事项**
```markdown
1. **先读后写**:编辑前先用 Read 确认内容
2. **精确匹配**Edit 的 old_string 必须与文件内容完全一致
3. **避免重复**:编辑前用 Grep 检查是否已存在同主题章节
1. **先读后写**:编辑前先用 Read 确认内容
2. **精确匹配**`apply_patch` 的上下文必须与文件内容一致
3. **避免重复**:编辑前用 Grep 检查是否已存在同主题章节
```
### ❌ 禁止使用:命令行工具修改文档
@@ -171,13 +182,14 @@
### 📝 最佳实践示例
**追加新章节**:使用 `Edit` 工具,`old_string` 匹配文件末尾内容,`new_string` 包含原内容 + 新章节
**修改现有内容**:使用 `Edit` 工具精确替换。
```markdown
old_string: "**状态**:🔄 待修复"
new_string: "**状态**✅ 已修复"
```
**追加新章节**:使用 `apply_patch`,以文件末尾稳定上下文为锚点追加
**修改现有内容**:使用 `apply_patch` 精确替换。
```markdown
@@
-**状态**🔄 待修复
+**状态**:✅ 已修复
```
---
@@ -191,11 +203,12 @@ ViGent2/Docs/
├── BACKEND_DEV.md # 后端开发规范
├── BACKEND_README.md # 后端功能文档
├── FRONTEND_DEV.md # 前端开发规范
├── FRONTEND_README.md # 前端功能文档
├── DEPLOY_MANUAL.md # 部署手册
├── SUPABASE_DEPLOY.md # Supabase 部署文档
├── LATENTSYNC_DEPLOY.md # LatentSync 部署文档
├── COSYVOICE3_DEPLOY.md # 声音克隆部署文档
├── FRONTEND_README.md # 前端功能文档
├── DEPLOY_MANUAL.md # 部署手册
├── PUBLISH_DEPLOY.md # 多平台发布专项文档
├── SUPABASE_DEPLOY.md # Supabase 部署文档
├── LATENTSYNC_DEPLOY.md # LatentSync 部署文档
├── COSYVOICE3_DEPLOY.md # 声音克隆部署文档
├── ALIPAY_DEPLOY.md # 支付宝付费部署文档
├── SUBTITLE_DEPLOY.md # 字幕系统部署文档
└── DevLogs/
@@ -254,16 +267,21 @@ ViGent2/Docs/
---
## 📏 内容简洁性规则
## 📏 内容简洁性规则
### 代码示例长度控制
- **原则**只展示关键代码片段10-20行以内
- **超长代码**:使用 `// ... 省略 ...` 或仅列出文件名+行号
- **完整代码**:引用文件链接,而非粘贴全文
### 调试信息处理
- **临时调试**:验证后删除(如调试日志、测试截图)
- **有价值信息**:保留(如错误日志、性能数据)
### 调试信息处理
- **临时调试**:验证后删除(如调试日志、测试截图)
- **有价值信息**:保留(如错误日志、性能数据)
### 敏感信息处理
- **禁止落盘**Cookie 值、Token、密钥、完整手机号、支付凭证。
- **日志引用**:仅记录必要关键词与结论,避免粘贴大段原始日志。
- **路径引用**:优先给相对路径与文件名,不记录无关个人目录信息。
### 状态标记更新
- **🔄 待验证** → 验证后更新为 **✅ 已修复** 或 **❌ 失败**
@@ -280,29 +298,29 @@ ViGent2/Docs/
- **格式一致性**:直接参考 `TASK_COMPLETE.md` 现有格式追加内容。
- **进度更新**:仅在阶段性里程碑时更新进度百分比。
### 🔍 完整性检查清单 (必做)
每次更新 `TASK_COMPLETE.md` 时,必须**逐一检查**以下所有板块:
1. **文件头部 & 导航**
- [ ] `更新时间`:必须是当天日期
- [ ] `整体进度`简述当前状态
- [ ] `快速导航`Day 范围与文档一致
2. **核心任务区**
- [ ] `已完成任务`:添加新的 [x] 项目
- [ ] `后续规划`:管理三色板块 (优先/债务/未来)
3. **统计与回顾**
- [ ] `进度统计`:更新对应模块状态和百分比
- [ ] `里程碑`:若有重大进展,追加 `## Milestone N`
4. **底部链接**
- [ ] `时间线`:追加今日概括
- [ ] `相关文档`:更新 DayLog 链接范围
> **口诀**:头尾时间要对齐,任务规划两手抓,里程碑上别落下
### 🔍 完整性检查清单 (必做)
每次更新 `TASK_COMPLETE.md` 时,必须**逐一检查**以下板块:
1. **文件头部**
- [ ] `更新时间`:必须是当天日期
- [ ] `整体进度`与当前 Day 状态一致(例如 Day31
2. **当日 Current 区块**
- [ ] 新增/更新 `Day N (Current)` 标题
- [ ] 关键任务以 `[x]` 列出(避免仅写结论)
- [ ] 前一天 Day 标题取消 `(Current)` 标记
3. **Roadmap 与模块状态**
- [ ] 如有已完成长期事项,及时从待办迁移到已完成
- [ ] 模块完成度有变化时同步更新
4. **相关文档链接**
- [ ] 新增的核心文档(如 `PUBLISH_DEPLOY.md`)要在相关位置可追溯
- [ ] 若 DayN 记录了“文档回写”,`TASK_COMPLETE.md` 的当日条目也要体现
> **口诀**:头部日期、当日 Current、模块状态、链接可追溯
---
**最后更新**2026-02-11
**最后更新**2026-03-03

Docs/FACEENHANCE_DEPLOY.md Normal file

@@ -0,0 +1,428 @@
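# Small-Face Lip-Sync Quality Compensation Chain: Deployment Guide
> **Updated**: 2026-03-10 v1.4
> **Applies to**: SmallFaceEnhance v1.4 (embedded in the Backend process)
> **Architecture**: internal module of LipSyncService, no separate process
---
## Architecture Overview
The small-face lip-sync quality compensation chain ("small-face enhancement" for short) is a **pre-processing branch** of `LipSyncService._local_generate()`. Before lipsync inference it detects small faces automatically and improves the input quality:
```
source video + audio
→ video looping (existing logic)
→ small-face detection (SCRFD, CPU)
→ [not a small face] run the user-selected model directly (existing path)
→ [small face]
A. crop the main face region (with padding)
B. sparse keyframe super-resolution to 512px (GFPGAN, GPU0)
C. run the user-selected model (MuseTalk or LatentSync)
D. lower-face mask feathering + seamlessClone blend back onto the original frame
→ continue with the existing downstream flow (subtitles / BGM / upload)
```
**Key constraints**:
- No frontend changes and no API protocol changes
- The model choice belongs to the user; a small face never switches the model automatically
- Fail-open by default: if any step of the enhancement chain fails, the original flow is used
- No separate process / PM2 entry; it runs inside `vigent2-backend`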
---
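## Hardware Requirements
| Item | Notes |
|------|------|
| Detector | SCRFD (det_10g.onnx), CPU inference, no extra GPU cost |
| Super-resolution | GFPGAN on GPU0 (same card as MuseTalk, run sequentially), about 2-3GB VRAM |
| Memory | Frames are streamed one at a time through an ffmpeg pipe, so there is no large extra memory footprint |
> Super-resolution shares GPU0 with MuseTalk; they run sequentially and do not hold VRAM at the same time.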
---
## 依赖安装
### 1. pip 依赖
已在 `backend/requirements.txt` 中添加:
```
opencv-python-headless>=4.8.0
gfpgan>=1.3.8
```
安装:
```bash
cd /home/rongye/ProgramFiles/ViGent2/backend
pip install opencv-python-headless gfpgan
```
> `gfpgan` 会自动拉取 `basicsr`、`facexlib` 等依赖。
> `onnxruntime` 需单独确认已安装LatentSync 环境中已有 1.23.2)。
> 如果 backend 虚拟环境中缺少 onnxruntime需额外安装`pip install onnxruntime`
### 2. 系统依赖
- `ffmpeg` / `ffprobe`:已有(视频处理必需)
---
## 模型权重
### 目录结构
```
models/
├── FaceEnhance/
│ └── GFPGANv1.4.pth ← 超分权重 (~333MB)
└── LatentSync/checkpoints/auxiliary/
└── models/buffalo_l/
└── det_10g.onnx ← 人脸检测权重 (~16MB, 复用已有)
```
### 下载方式
**GFPGAN 权重**(已下载):
```bash
cd /home/rongye/ProgramFiles/ViGent2/models/FaceEnhance
wget -O GFPGANv1.4.pth "https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth"
```
**SCRFD 检测器权重**
复用 LatentSync 已有的 `det_10g.onnx`,无需额外下载。代码自动引用路径:
`models/LatentSync/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx`
> 权重缺失时自动 fail-open 跳过增强,不会导致任务失败。
---
## 后端配置
`backend/.env` 中的相关变量:
```ini
# =============== 小脸口型质量补偿链路 ===============
LIPSYNC_SMALL_FACE_ENHANCE=false # 总开关 (true/false)
LIPSYNC_SMALL_FACE_THRESHOLD=256 # 触发阈值 (像素,脸宽 < 此值触发)
LIPSYNC_SMALL_FACE_UPSCALER=gfpgan # 超分模型: gfpgan | codeformer
LIPSYNC_SMALL_FACE_GPU_ID=0 # 超分 GPU (与 MuseTalk 同卡)
LIPSYNC_SMALL_FACE_FAIL_OPEN=true # 失败回退 (true=回退原流程, false=报错)
```
`backend/app/core/config.py` 中的默认值:
```python
LIPSYNC_SMALL_FACE_ENHANCE: bool = False
LIPSYNC_SMALL_FACE_THRESHOLD: int = 256
LIPSYNC_SMALL_FACE_UPSCALER: str = "codeformer"
LIPSYNC_SMALL_FACE_GPU_ID: int = 0
LIPSYNC_SMALL_FACE_FAIL_OPEN: bool = True
```
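> Values in `.env` take precedence over the defaults in `config.py`; a `config.py` default applies only when the variable is not set in `.env`. Note that the two snippets above differ for `LIPSYNC_SMALL_FACE_UPSCALER` (`gfpgan` in `.env`, `codeformer` in `config.py`), so the effective upscaler depends on whether `.env` sets it.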
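The precedence rule can be illustrated with a small standard-library sketch. It assumes the `.env` values have already been loaded into the process environment; `DEFAULTS` and `load_settings` are hypothetical names, and the project's real `config.py` may rely on a settings library instead.

```python
import os

# Defaults copied from the config.py snippet above (upscaler omitted on purpose).
DEFAULTS = {
    "LIPSYNC_SMALL_FACE_ENHANCE": False,
    "LIPSYNC_SMALL_FACE_THRESHOLD": 256,
    "LIPSYNC_SMALL_FACE_GPU_ID": 0,
    "LIPSYNC_SMALL_FACE_FAIL_OPEN": True,
}


def load_settings(env=None):
    """Return settings where an environment value overrides the default."""
    env = os.environ if env is None else env
    settings = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name)
        if raw is None or raw.strip() == "":
            settings[name] = default  # not set: keep the code default
        elif isinstance(default, bool):
            # bool is checked first because bool is a subclass of int.
            settings[name] = raw.strip().lower() in {"1", "true", "yes", "on"}
        else:
            settings[name] = int(raw)
    return settings
```

Passing a plain dict as `env` makes the override logic easy to check without touching the real environment.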
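### Module-internal constants
The following parameters are fixed as code constants (`small_face_enhance_service.py`) and are not yet exposed through env:
| Constant | Value | Description |
|------|-----|------|
| `PADDING` | 0.28 | bbox expansion ratio |
| `DETECT_EVERY` | 8 | Detect every N frames; intermediate frames use EMA interpolation |
| `TARGET_SIZE` | 512 | Super-resolution target size |
| `MASK_FEATHER` | 15 | Lower-face mask feathering, in pixels |
| `MASK_UPPER_RATIO` | 0.68 | Mouth mask start position (68% of the crop height; covers only mouth/chin) |
| `MASK_SIDE_MARGIN` | 0.16 | Left/right margin ratio, to avoid altering cheeks/nose wings |
| `SAMPLE_FRAMES` | 24 | Number of frames sampled for the small-face decision |
| `SAMPLE_WINDOW` | (0.10, 0.30) | Sampling window (10%~30% of the video) |
| `ENCODE_FPS` | 25 | Fallback fps for the intermediate video (the source fps is preferred; 25 is used only when it is unavailable) |
| `ENCODE_CRF` | 18 | Encoding quality of the intermediate video |
| `EMA_ALPHA` | 0.3 | bbox EMA smoothing factor |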
---
## 启用与验证
### 1. 开启小脸口型质量补偿链路
```bash
# 编辑 backend/.env
LIPSYNC_SMALL_FACE_ENHANCE=true
```
重启后端:
```bash
pm2 restart vigent2-backend
```
### 2. 强制触发测试
设置极大阈值,使任何视频都触发增强:
```ini
LIPSYNC_SMALL_FACE_THRESHOLD=9999
```
> 仅用于链路冒烟测试,不用于质量评估。`9999` 会强制大脸素材进入增强分支,可能出现中脸变形/鼻翼细节异常。
提交一个视频任务,检查日志:
```bash
pm2 logs vigent2-backend --lines 50
```
应看到类似输出:
```
小脸增强: face_w=320px < threshold=9999px, 触发增强
✅ SCRFD 检测器已加载
✅ 超分器已加载: gfpgan
小脸增强: face_w=320px threshold=9999px enhanced=True upscaler=gfpgan time=12.3s
✅ 小脸增强 + 唇形同步完成: /path/to/output.mp4
```
### 3. 调回正常阈值
验证通过后,改回合理阈值:
```ini
LIPSYNC_SMALL_FACE_THRESHOLD=256
```
并重启 backend`pm2 restart vigent2-backend`
### 4. 健康检查
```bash
curl http://localhost:8006/api/videos/lipsync/health | python3 -m json.tool
```
应包含 `data.small_face_enhance`
```json
{
"success": true,
"data": {
"small_face_enhance": {
"enabled": true,
"threshold": 256,
"detector_loaded": true
}
}
}
```
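A deployment script could validate that response shape with something like the sketch below. `small_face_status` is a hypothetical helper, and only the fields shown in the example response are assumed to exist.

```python
import json


def small_face_status(body):
    """Extract data.small_face_enhance from a health-check response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise ValueError("health endpoint reported failure")
    status = (payload.get("data") or {}).get("small_face_enhance")
    if status is None:
        raise KeyError("data.small_face_enhance is missing")
    return status
```

The body passed in would be the raw text returned by the `curl` command above.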
---
## 相关文件
| 文件 | 说明 |
|------|------|
| `backend/app/services/small_face_enhance_service.py` | 小脸增强主服务 (检测 + 裁切 + 超分 + 贴回) |
| `backend/app/services/lipsync_service.py` | 混合路由 + 小脸增强集成 + `_run_selected_model()` |
| `backend/app/core/config.py` | `LIPSYNC_SMALL_FACE_*` 配置项 |
| `models/FaceEnhance/GFPGANv1.4.pth` | GFPGAN 超分权重 |
| `models/LatentSync/checkpoints/auxiliary/models/buffalo_l/det_10g.onnx` | SCRFD 检测器权重 (复用) |
| `Temp/小脸增强分支-实施计划.md` | 详细方案文档 |
---
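## Processing Flow in Detail
### 1. Detection (CPU)
- Sample 24 frames evenly from the 10%~30% span of the video
- SCRFD (det_10g.onnx) detects the largest face per frame; the median face width is used
- Enhancement triggers when `face width < THRESHOLD`
### 2. Cropping + tracking (CPU)
- Detect the face bbox every 8 frames; intermediate frames are smoothed by EMA interpolation
- Expand the bbox by 0.28 padding and clamp it to the frame bounds
- The number of frames actually read is written back to `track.frame_count`, correcting the ffprobe estimate
- Crop as a stream through an ffmpeg pipe and output a 512x512 video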
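The tracking step (sparse detection, forward fill, EMA smoothing, padded and clamped bbox) can be sketched as follows. The function names and the `detections` dictionary layout are hypothetical, and the real `_build_face_track()` may order these operations differently.

```python
def smooth_track(detections, frame_count, detect_every=8, alpha=0.3):
    """Build one bbox per frame from sparse detections.

    `detections` maps a frame index to (x1, y1, x2, y2). Frames without a
    fresh detection reuse the last one (forward fill), and every frame is
    pulled toward the latest detection with an exponential moving average.
    """
    track, last, smoothed = [], None, None
    for i in range(frame_count):
        if i % detect_every == 0 and i in detections:
            last = detections[i]
        if last is None:
            track.append(None)  # no face seen yet
            continue
        if smoothed is None:
            smoothed = tuple(float(v) for v in last)
        else:
            smoothed = tuple(alpha * n + (1 - alpha) * s for n, s in zip(last, smoothed))
        track.append(smoothed)
    return track


def expand_bbox(box, frame_w, frame_h, padding=0.28):
    """Grow the bbox by `padding` of its size on each side, clamped to the frame."""
    x1, y1, x2, y2 = box
    pad_x, pad_y = (x2 - x1) * padding, (y2 - y1) * padding
    return (
        max(0.0, x1 - pad_x),
        max(0.0, y1 - pad_y),
        min(float(frame_w), x2 + pad_x),
        min(float(frame_h), y2 + pad_y),
    )
```

A lower `alpha` gives a steadier crop window at the cost of lagging behind fast head movement.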
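### 3. Super-resolution (GPU0)
- Detection frames (every 8th frame): full GFPGAN super-resolution
- Non-detection frames: bicubic resize to 512x512
- The enhanced video's output fps follows the source fps (no longer hard-coded to 25fps), avoiding time-base stretching
- `torch.cuda.empty_cache()` runs automatically after inference
### 4. Lipsync inference
- The user-selected model (fast/default/advanced) runs on the enhanced face video
- Model selection semantics are unchanged
### 5. Blend-back (CPU)
- Local mouth mask (starting at 68% of the height + 16% left/right margins) + 15px Gaussian feathering (covers only mouth and chin)
- `cv2.seamlessClone(NORMAL_CLONE)` blends onto the original frame
- The seamlessClone result is then alpha-limited a second time to the mask region, so the blend does not spread above the eyes
- Falls back to alpha blending when seamlessClone fails
- Blend-back maps original frame indices along the timeline (`orig_fps/ls_fps`), so it no longer uses only the leading frames, which slowed motion and caused ghosting
- Frame-count guard: lipsync outputs frames for the audio duration, so its frame count is normally <= the looped source video; an error is raised only when `lipsync frames > original frames`, and `<=` blends back normally
- Empty-output guard: `lipsync frames <= 0` raises immediately; the outer `FAIL_OPEN` handler falls back to the original flow, so no empty video is written
- Audio mux: after blend-back, the audio track is always re-muxed from `audio_path`, so the enhancement path does not produce silent video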
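The timeline mapping and the two frame-count guards can be sketched as below. The names are hypothetical, and rounding to the nearest frame with clamping at the last frame is an assumption, not a statement of how `blend_back()` rounds.

```python
def check_frame_counts(ls_frames, orig_frames):
    """Raise on the two abnormal cases described above; otherwise do nothing."""
    if ls_frames <= 0:
        raise ValueError("lipsync output has no frames")
    if ls_frames > orig_frames:
        raise ValueError(f"frame count anomaly: lipsync={ls_frames} > original={orig_frames}")


def map_to_original_index(ls_index, ls_fps, orig_fps, orig_frames):
    """Map a lipsync frame index to the original frame shown at the same time."""
    if ls_fps <= 0 or orig_fps <= 0 or orig_frames <= 0:
        raise ValueError("fps and frame count must be positive")
    # Frame ls_index is displayed at ls_index / ls_fps seconds.
    idx = int(round(ls_index * orig_fps / ls_fps))
    return min(idx, orig_frames - 1)
```

For example, with a 25fps lipsync output and a 50fps source, lipsync frame 10 maps to source frame 20 rather than frame 10, which is the misalignment the v1.3 fix addresses.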
---
## 回滚方案
**一级回滚 (秒级)**
```ini
LIPSYNC_SMALL_FACE_ENHANCE=false
```
重启 backend 即可,所有任务走原流程。
**二级回滚 (版本级)**
回退 `lipsync_service.py` 增强接入提交,配置项保留但不生效。
---
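## FAQ
### onnxruntime is not installed
```
⚠️ SCRFD 初始化失败: No module named 'onnxruntime'
```
**Fix**:
```bash
pip install onnxruntime
```
### GFPGAN weights are missing
```
⚠️ GFPGAN 权重不存在: .../models/FaceEnhance/GFPGANv1.4.pth
```
**Fix**: download the weights as described in the "模型权重" (model weights) section above. When the weights are missing, super-resolution degrades automatically to bicubic resize.
### Frame-count anomaly triggers fail-open
```
⚠️ 小脸贴回失败,回退原流程: 帧数异常: lipsync=300 > original=250
```
**Note**: v1.1 relaxed the frame-count check. The lipsync model outputs frames for the audio duration, which is normally <= the looped video's frame count, and in that case blend-back proceeds normally. An error is raised only when the lipsync output has **more** frames than the original (abnormal case).
> v1.0 used a strict equality check (`lipsync != original` counted as failure), which produced false failures when the looped video had far more frames than the audio. Fixed in v1.1.
### Empty lipsync output triggers fallback
```
⚠️ 小脸贴回失败,回退原流程: lipsync 输出帧数为 0跳过贴回
```
**Note**: v1.2 added empty-output protection. When `ls_frames <= 0` an error is raised immediately, and the outer fail-open handler falls back to the regular lip-sync path, so no empty video file is produced.
### Slowed motion / eye ghosting after enhancement
**Cause**: when the source video and the lipsync output have different fps, blending back by identical frame number can misalign the timeline (only the leading frames get blended).
**Fix**: v1.3 maps the timeline by `orig_fps/ls_fps`, so blend-back uses the time-matched frame rather than the same-index frame, and the enhanced video's output fps follows the source fps.
**Further fix (v1.4)**:
- The mask start is moved further down to 68%, with 16% left/right margins, so the eye area and nose wings take less part in the blend
- The seamlessClone output is restricted to the mask, preventing Poisson diffusion from causing ghosting above the eyes
### Facial distortion after enhancement (nose wings / mid-face)
**Most likely cause**: the test threshold `LIPSYNC_SMALL_FACE_THRESHOLD=9999` is still in use, forcing large-face footage that needs no enhancement into the compensation chain.
**Suggested handling**:
- First set `LIPSYNC_SMALL_FACE_THRESHOLD=256` again and restart the backend.
- If the problem persists, temporarily set `LIPSYNC_SMALL_FACE_ENHANCE=false` for an A/B comparison before tuning further.
### No audio after enhancement
**Cause**: the rawvideo written during blend-back carries no audio track by default.
**Fix**: v1.3 forces an audio mux after blend-back, writing the audio from the current task's `audio_path`.
### Lip shape is offset after enhancement
Check whether the `PADDING` constant is reasonable. Too little padding may leave the crop region too tight; too much brings in too much background. The current default of 0.28 (28%) suits most single-person selfie footage.
### torchvision compatibility (functional_tensor)
```
No module named 'torchvision.transforms.functional_tensor'
```
**Cause**: torchvision >= 0.20 no longer includes the `functional_tensor` module, but `basicsr` (a gfpgan dependency) still imports it.
**Fix**: the code ships a compatibility shim (`_ensure_upscaler()` injects it into `sys.modules` automatically), so no manual action is needed. If the error still appears, check that `_ensure_upscaler` actually runs.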
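The general shape of such a `sys.modules` shim is sketched below. `alias_missing_module` is a hypothetical helper; in the real case the missing name would be `torchvision.transforms.functional_tensor`, and the module registered in its place is not named in this document, so the replacement here is left as a parameter.

```python
import importlib
import sys


def alias_missing_module(missing_name, replacement_name):
    """Register `replacement_name` under `missing_name` if the latter cannot be imported.

    Returns True when an alias was installed, False when the module already imports.
    """
    try:
        importlib.import_module(missing_name)
        return False
    except ImportError:
        # Later `import missing_name` statements find this entry in sys.modules first.
        sys.modules[missing_name] = importlib.import_module(replacement_name)
        return True
```

The alias has to be installed before the third-party package that performs the legacy import is itself imported, which is why the source places it inside `_ensure_upscaler()`.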
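### cv2/numpy is not installed
```
⚠️ cv2 未安装,小脸增强不可用
```
**Note**: `cv2` and `numpy` are lazy imports (`try/except`). When they are missing, small-face enhancement is disabled automatically, and backend startup and other features are unaffected. Installing `opencv-python-headless` restores it.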
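A guarded optional import of this kind can be sketched as follows. `optional_import` and `maybe_enhance` are hypothetical names; only the `_CV2_AVAILABLE` flag name comes from the source.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


cv2 = optional_import("cv2")
np = optional_import("numpy")
_CV2_AVAILABLE = cv2 is not None and np is not None


def maybe_enhance(video_path, enhance, available=None):
    """Run `enhance` only when the optional dependencies are present."""
    if available is None:
        available = _CV2_AVAILABLE
    if not available:
        return video_path  # skip enhancement, keep the main flow working
    return enhance(video_path)
```

Because the guard sits at the entry point, nothing deeper in the enhancement code has to check for the missing modules again.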
---
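## Known Limitations (v1.4)
- Covers only the local lipsync path (`_local_generate()`); remote mode (`_remote_generate()`) is not wired in yet
- Multi-shot footage gets a single global decision; there is no per-segment small-face detection
- Stability is only ensured for single-person (main face) footage; switching between multiple faces is not handled
- CodeFormer super-resolution requires installing `basicsr` separately; GFPGAN is currently recommended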
---
## v1.3 → v1.4 变更记录
| 修复项 | 说明 |
|--------|------|
| 眼部重影修复 | mask 起点下移到 68% + 左右 16% 留白,减少上半脸与鼻翼参与融合 |
| Poisson 扩散抑制 | seamlessClone 后按 mask 二次限域,避免眼部上方 ghosting |
---
## v1.2 → v1.3 变更记录
| 修复项 | 说明 |
|--------|------|
| 时基修复 | `_crop_and_upscale_video()` 输出 fps 跟随源视频 fps避免增强视频时间轴被拉伸 |
| 贴回对齐修复 | `blend_back()` 改为按 `orig_fps/ls_fps` 映射原始帧索引,减少动作变慢与重影 |
| 音轨修复 | 贴回成功后新增音轨封装mux避免增强路径无声音 |
---
## v1.1 → v1.2 变更记录
| 修复项 | 说明 |
|--------|------|
| 空输出保护 | `blend_back()` 新增 `ls_frames <= 0` 判断,直接抛错并由外层 fail-open 回退,避免写出空视频 |
---
## v1.0 → v1.1 变更记录
| 修复项 | 说明 |
|--------|------|
| ffprobe 解析 | CSV → JSON 格式,字段名访问,不再受 `nb_frames` 缺失导致的字段错位影响 |
| fps 选取 | 优先 `avg_frame_rate`(真实平均帧率),`r_frame_rate` 作为 fallback避免 `60/1` 等 timebase 倍数导致帧数估算偏大 |
| 实际帧数回写 | `_build_face_track()` 用 ffmpeg 实际读到的帧数覆盖估算值,`track.frame_count` 更准确 |
| 贴回帧数检查 | 放宽为 `lipsync <= original` 时正常贴回,仅 `>` 时报错;适配 MuseTalk/LatentSync 按音频时长输出的行为 |
| 边界防护 | `streams` 为空时 return None`r_frame_rate` 分母为 0 时 fallback 25fps |
| torchvision 兼容 | `_ensure_upscaler()` 中注入 `functional_tensor` shim兼容 torchvision >= 0.20 |
| lazy import | `cv2`/`numpy` 包装在 `try/except`,缺失时增强自动禁用不影响后端启动 |
| 类型注解 | `from __future__ import annotations` 避免依赖缺失时 `np.ndarray` 等注解触发 NameError |


@@ -1,5 +1,11 @@
# 前端开发规范
## 文档定位
- 本文档只定义前端开发规范与约束结构、交互、持久化、接口调用、Checklist
- 功能说明与启动方式请查看 `Docs/FRONTEND_README.md`
- 历史变更请记录在 `Docs/DevLogs/``Docs/TASK_COMPLETE.md`,不要写入本规范文档。
## 目录结构
采用轻量 FSDFeature-Sliced Design结构
@@ -33,8 +39,12 @@ frontend/src/
│ │ ├── MaterialSelector.tsx
│ │ ├── ScriptEditor.tsx
│ │ ├── ScriptExtractionModal.tsx
│ │ ├── RewriteModal.tsx
│ │ ├── ScriptLearningModal.tsx
│ │ ├── script-extraction/
│ │ │ └── useScriptExtraction.ts
│ │ ├── script-learning/
│ │ │ └── useScriptLearning.ts
│ │ ├── TitleSubtitlePanel.tsx
│ │ ├── FloatingStylePreview.tsx
│ │ ├── VoiceSelector.tsx
@@ -62,12 +72,16 @@ frontend/src/
│ ├── hooks/
│ │ ├── useTitleInput.ts
│ │ └── usePublishPrefetch.ts
│ ├── ui/
│ │ ├── SelectPopover.tsx # 统一下拉/BottomSheet 选择器
│ │ └── AppModal.tsx # 统一弹窗基座
│ ├── types/
│ │ ├── user.ts # User 类型定义
│ │ └── publish.ts # 发布相关类型
│ └── contexts/ # 全局 ContextAuth、Task
│ └── contexts/ # 全局 ContextAuth、Task、Cleanup
│ ├── AuthContext.tsx
│ └── TaskContext.tsx
│ ├── TaskContext.tsx
│ └── CleanupContext.tsx
├── components/ # 遗留通用组件
│ └── VideoPreviewModal.tsx
└── proxy.ts # Next.js middleware路由保护
@@ -180,6 +194,71 @@ body {
---
## 统一下拉选择器规范 (SelectPopover)
首页/发布页的业务选择项音色、参考音频、配音、素材、BGM、作品、样式、模型、画面比例统一使用 `@/shared/ui/SelectPopover`
- 桌面端使用 Popover移动端自动切换 BottomSheet
- 触发器与面板风格统一:`border-white/10 + bg-black/25`(或同级变体)
- 下拉项选中态统一:`border-purple-500 bg-purple-500/20`
- 选中项需添加 `data-popover-selected="true"`,确保再次打开时自动滚动定位到已选项
- 底部空间不足时自动上拉;滚动条隐藏但保留滚动能力
### 视频预览与下拉层级
- 下拉菜单层级应低于视频预览弹窗,避免遮挡预览内容
- 在下拉内点击“预览”时,不强制关闭下拉(便于连续预览)
- 关闭预览后,用户可继续在下拉内操作;点击外部时下拉正常收起
### 例外说明
- `ScriptEditor` 的“历史文案 / AI多语言”保持原有轻量菜单样式不强制迁移到 `SelectPopover`
---
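## Unified Modal Guidelines (AppModal)
All centered modals (e.g. video preview, script extraction, AI rewrite, script deep learning, expanded script editor, recording, password change) use `@/shared/ui/AppModal` + `AppModalHeader`:
- Unified overlay and layering: `fixed inset-0` + `bg-black/80` + `backdrop-blur-sm` + an explicit `z-index`
- Unified mount point: mounted on `document.body` through a Portal, so local containers and stacking contexts cannot interfere and the modal always covers the full page
- Unified container style: `border-white/10`, dark semi-transparent background, `rounded-2xl` corners, heavy shadow
- Unified close behavior: `ESC` is supported; whether clicking the overlay closes the modal is configured explicitly via `closeOnOverlay`
- Default policy: outside critical flows, `closeOnOverlay` should default to `true`, and `AppModalHeader onClose` provides the top-right `X` close button
- Critical-flow exception: the post-publish cleanup modal (`CleanupContext`) must keep `closeOnOverlay=false` and must not show the top-right close button
- Recording modal exception: uses `closeOnOverlay={!isRecording}`; overlay close is disabled while recording to avoid accidental interruption
- Unified scroll policy: background scroll is locked while a modal is open (`lockBodyScroll`); the content area scrolls on its own
- Special layering cases (e.g. video preview above a dropdown) use a higher `z-index` (e.g. `z-[320]`)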
### 文案类弹窗结果操作栏规范
适用组件:
- `ScriptExtractionModal`
- `RewriteModal`
- `ScriptLearningModal`
统一要求:
- 结果页操作按钮统一放在内容底部Action Grid避免“标题右上角按钮 + 底部按钮”混排。
- 主按钮统一为高亮渐变(如「填入文案」),其余按钮统一次级样式(`bg-white/10`)。
- 动作文案尽量统一:`填入文案` / `复制` / `重新生成`(或与当前流程等价的返回动作)。
- 按钮尺寸、圆角、间距保持一致(推荐 `py-2.5 px-3 rounded-lg text-sm`)。
---
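## Post-Publish Cleanup Modal Guidelines (CleanupContext)
On the publish page, `CleanupContext` owns the cleanup prompt shown after publishing succeeds on all platforms. Rules:
- Trigger: the modal appears only when **every** platform in this publish run succeeded; if any platform failed, the original inline result display is used.
- Persistence and recovery: `cleanup_pending` is written to localStorage so the modal can be restored after a refresh or navigation; it carries `createdAt` and expires automatically after 24 hours.
- Cleanup order: call `POST /api/videos/cleanup` first; clear local input fields and close the modal only after the API call succeeds.
- State sync: after a successful cleanup, dispatch the `vigent:workspace-cleared` event; the current publish page must reset its input state in place (avoiding "localStorage is cleared but the page still shows old values").
- Failure handling: on API failure keep the modal and the input data and allow a retry; after consecutive failures reach the threshold, show "暂不清理,继续使用" (skip cleanup for now and keep working).
- Local cleanup scope: input content only (script / title / secondary title / publish title / tags); user preferences are not cleared (style, font size, margins, model, BGM, etc.).
- Download policy: the modal's "下载视频备份" (download video backup) action must use the same-origin download endpoint (`/api/videos/generated/{id}/download`); do not use a signed URL directly as the `href`.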
---
## API 请求规范
### 必须使用 `api` (axios 实例)
@@ -187,6 +266,7 @@ body {
所有需要认证的 API 请求**必须**使用 `@/shared/api/axios` 导出的 axios 实例。该实例已配置:
- 自动携带 `credentials: include`
- 遇到 401/403 时自动清除 cookie 并跳转登录页
- AI/Tools 接口(如 `/api/ai/*``/api/tools/extract-script``/api/tools/analyze-creator``/api/tools/generate-topic-script`)现为强制鉴权,禁止匿名 `fetch` 直调
**使用方式:**
@@ -346,8 +426,9 @@ useEffect(() => {
- `shared/api`Axios 实例与统一响应类型
- `shared/lib`通用工具函数media.ts / auth.ts / title.ts
- `shared/hooks`:跨功能通用 hooks
- `shared/ui`:跨功能通用 UI如 SelectPopover
- `shared/types`跨功能实体类型User / PublishVideo 等)
- `shared/contexts`:全局 ContextAuthContext / TaskContext
- `shared/contexts`:全局 ContextAuthContext / TaskContext / CleanupContext
- `components/`遗留通用组件VideoPreviewModal
## 类型定义规范
@@ -366,11 +447,14 @@ useEffect(() => {
- 标题样式 ID / 字幕样式 ID
- 标题字号 / 字幕字号
- 标题显示模式(`short` / `persistent`
- 背景音乐选择 / 音量 / 开关状态
- 唇形模型模式(`default` / `fast` / `advanced`
- 背景音乐选择 / 开关状态(当前前端不提供音量滑杆,生成时使用固定音量)
- 输出画面比例(`9:16` / `16:9`
- 素材选择 / 历史作品选择
- 选中配音 ID (`selectedAudioId`)
- 选中参考音频 ID (`selectedRefAudio` 对应 id)
- 语速 (`speed`,声音克隆模式)
- 语气 (`emotion`,声音克隆模式)
- 时间轴段信息 (`useTimelineEditor` 的 localStorage)
### 历史文案(独立持久化)
@@ -406,6 +490,7 @@ useEffect(() => {
- 发布按钮在未选择任何平台时禁用
- 仅保留"立即发布",不再提供定时发布 UI/参数
- **作品选择持久化**:使用 `video.id`(稳定标识)而非 `video.path`(签名 URL进行选择、比较和 localStorage 存储。发布时根据 `id` 查找对应 `path` 发送请求。
- **新作品优先级**:检测到“刚生成的新视频”时,页面首次恢复优先选中最新视频;之后用户手动改选会继续按持久化值恢复。
---
@@ -457,6 +542,11 @@ await api.post('/api/videos/generate', {
使用 `MediaRecorder` API 录制音频,格式为 `audio/webm`,上传后后端自动转换为 WAV (16kHz mono)。
- 录音入口放在“我的参考音频”区域底部右侧(与“上传音频”并排)。
- 录音交互使用弹窗:开始/停止 -> 试听 -> 使用此录音 / 弃用本次录音。
- 关闭录音弹窗时如仍在录制,会先停止录音再关闭。
- 录音中禁止点击遮罩关闭(`closeOnOverlay={!isRecording}`);未录音时允许遮罩关闭。
```typescript
// 录音需要用户授权麦克风
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
@@ -472,5 +562,5 @@ const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
### UI 结构
配音方式使用 Tab 切换:
- **EdgeTTS 音色** - 预设音色 2x3 网格
- **声音克隆** - 参考音频列表 + 在线录音 + 语速下拉菜单 (5 档: 较慢/稍慢/正常/稍快/较快)
- **EdgeTTS 音色** - 统一下拉选择(显示“音色名 + 语言”)
- **声音克隆** - 参考音频选择器(含试听/重命名/删除/重识别)+ 底部右侧上传/录音入口(录音弹窗)+ 语速/语气下拉


@@ -2,69 +2,84 @@
ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。
## 📌 文档定位
- 本文档用于说明前端功能、运行方式与目录概览(面向使用与协作)。
- 开发规范与实现约束请查看 `Docs/FRONTEND_DEV.md`
- 历史变更与里程碑请查看 `Docs/DevLogs/``Docs/TASK_COMPLETE.md`
## ✨ 核心功能
### 1. 视频生成 (`/`)
- **一、文案提取与编辑**: 文案输入/提取/翻译/保存。
- **一、文案提取与编辑**: 文案输入/提取/翻译/保存;输入框右下角支持一键扩展到大编辑器
- **二、配音**: 配音方式EdgeTTS/声音克隆)+ 配音列表(生成/试听/管理)合并为一个板块。
- **三、素材编辑**: 视频素材(上传/选择/管理)+ 时间轴编辑(波形/色块/拖拽排序)合并为一个板块。
- **四、标题与字幕**: 片头标题/副标题/字幕样式配置;短暂显示/常驻显示;样式预览使用视频片头帧作为真实背景 (Day 28)
- **五、背景音乐**: 试听 + 音量控制 + 选择持久化。
- **四、标题与字幕**: 片头标题/副标题/字幕样式配置;短暂显示/常驻显示;样式预览使用视频片头帧作为真实背景。
- **五、背景音乐**: 试听 + 搜索选择 + 选择持久化(无音量滑杆,生成时固定混音系数)
- **六、作品**(右栏): 作品列表 + 作品预览合并为一个板块。
- **进度追踪**: 实时显示视频生成进度 (10% -> 100%)。
- **作品预览**: 生成完成后直接播放下载(作品预览 + 历史作品)。
- **下载直达**: 首页作品下载与发布成功弹窗下载统一走同源下载接口(`/api/videos/generated/{id}/download`),避免新标签页在线播放。
- **预览优化**: 预览视频 `metadata` 预取,首帧加载更快。
- **本地保存**: 文案/标题/偏好由 `useHomePersistence` 统一持久化,刷新后恢复 (Day 14/17)
- **历史文案**: 手动保存/加载/删除历史文案,独立 localStorage 持久化 (Day 23)
- **选择持久化**: 首页/发布页作品选择均使用稳定 `id` 持久化,刷新保持用户选择;新视频生成后自动选中最新 (Day 21)
- **AI 多语言翻译**: 支持 9 种目标语言翻译文案 + 还原原文 (Day 22)
- **本地保存**: 文案/标题/偏好由 `useHomePersistence` 统一持久化,刷新后恢复。
- **历史文案**: 手动保存/加载/删除历史文案,独立 localStorage 持久化。
- **选择持久化**: 首页/发布页作品选择均使用稳定 `id` 持久化;新视频生成后优先选中最新,后续用户手动选择持续持久化恢复
- **统一下拉交互**: 首页/发布页业务选择器统一为 SelectPopover支持自动上拉、已选定位、移动端 BottomSheet`ScriptEditor` 的“历史文案 / AI多语言”为产品例外保留原轻量菜单
- **AI 多语言翻译**: 支持 9 种目标语言翻译文案 + 还原原文。
### 2. 全自动发布 (`/publish`) [Day 7 新增]
### 2. 全自动发布 (`/publish`)
- **多平台管理**: 统一管理抖音、微信视频号、B站、小红书账号状态。
- **扫码登录**:
- 集成后端 Playwright 生成的 QR Code。
- 实时检测扫码状态 (Wait/Success)。
- Cookie 自动保存与状态同步。
- **发布配置**: 设置视频标题、标签、简介。
- **作品选择**: 卡片列表 + 搜索 + 预览弹窗
- **选择持久化**: 使用稳定 `video.id` 持久化选择,刷新保持;新视频生成自动选中最新 (Day 21)
- **作品选择**: SelectPopover 下拉 + 搜索 + 预览弹窗(下拉内可连续预览,不强制收起)
- **选择持久化**: 使用稳定 `video.id` 持久化选择,刷新保持;新视频生成自动选中最新。
- **预览兼容**: 签名 URL / 相对路径均可直接预览。
- **发布方式**: 仅支持 "立即发布"。
- **发布成功清理弹窗**: 全平台发布成功后触发 `CleanupModal`(展示成功平台、截图、下载备份、清理按钮),刷新/跳转后可恢复。
- **清理失败兜底**: 清理接口失败时弹窗不关闭且不清本地输入;连续失败达到阈值后可“暂不清理,继续使用”。
- **清理范围**: 仅清理输入内容字段(文案/标题/副标题/发布标题/标签),保留样式、字号、边距、模型等用户偏好。
### 3. 声音克隆 [Day 13 新增]
- **TTS 模式选择**: EdgeTTS (预设音色) / 声音克隆 (自定义音色) 切换
### 3. 声音克隆
- **TTS 模式选择**: EdgeTTS / 声音克隆切换,音色选择统一下拉(显示音色名 + 语言)
- **音色试听**: EdgeTTS 音色列表支持一键试听,按音色 locale 自动选择对应语言的固定示例文案。
- **参考音频管理**: 上传/列表/重命名/删除参考音频,上传后自动 Whisper 转写 ref_text + 超 10s 自动截取。
- **录音入口**: 参考音频区域底部右侧提供“上传音频 / 录音”入口;录音采用弹窗流程(录制 -> 试听 -> 使用/弃用)。
- **录音防误触**: 录音中禁用遮罩关闭(避免误触中断);未录音时可点空白关闭。
- **重新识别**: 旧参考音频可重新转写并截取 (RotateCw 按钮)。
- **一键克隆**: 选择参考音频后自动调用 CosyVoice 3.0 服务。
- **语速控制**: 声音克隆模式下支持 5 档语速 (0.8-1.2),选择持久化 (Day 23)
- **语气控制**: 声音克隆模式下支持 4 种语气 (正常/欢快/低沉/严肃)基于 CosyVoice3 `inference_instruct2`,选择持久化 (Day 29)
- **多语言支持**: EdgeTTS 10 语言声音列表,声音克隆 language 透传 (Day 22)
- **语速控制**: 声音克隆模式下支持 5 档语速 (0.8-1.2)统一下拉,选择持久化。
- **语气控制**: 声音克隆模式下支持 4 种语气 (正常/欢快/低沉/严肃)统一下拉,选择持久化
- **多语言支持**: EdgeTTS 10 语言声音列表,声音克隆 language 透传。
### 4. 配音前置 + 时间轴编排 [Day 23 新增]
### 4. 配音前置 + 时间轴编排
- **配音独立生成**: 先生成配音 → 选中配音 → 再选素材 → 生成视频。
- **配音管理面板**: 生成/试听/改名/删除/选中,异步生成 + 进度轮询。
- **时间轴编辑器**: wavesurfer.js 音频波形 + 色块可视化素材分配,拖拽分割线调整各段时长。
- **素材截取设置**: ClipTrimmer 双手柄 range slider + HTML5 视频预览播放。
- **拖拽排序**: 时间轴色块支持 HTML5 Drag & Drop 调换素材顺序
- **时间轴编辑器**: wavesurfer.js 音频波形 + 主素材连续播放背景 + 浮动插入镜头块,拖拽移动位置,点击弹窗编辑截取范围与时长。
- **素材截取设置**: ClipTrimmer 双手柄 range slider + HTML5 视频预览播放(主素材与插入块统一入口)
- **多镜头模型**: 主素材循环填满音频时长,其余素材作为插入候选可多次添加到时间轴任意位置;支持"设为主素材"切换
- **自定义分配**: 后端 `custom_assignments` 支持用户定义的素材分配方案(含 `source_start/source_end` 截取区间)。
- **时间轴语义对齐**: 超出音频时仅保留可见段并截齐末段,超出段不参与生成;不足音频时最后可见段自动循环补齐。
- **画面比例控制**: 时间轴顶部支持 `9:16 / 16:9` 输出比例选择,设置持久化并透传后端。
### 5. 字幕与标题 [Day 13 新增]
- **片头标题**: 可选输入,限制 15 字;支持”短暂显示 / 常驻显示”默认短暂显示4 秒),对标题副标题同时生效
- **片头副标题**: 可选输入,限制 20 字;显示在主标题下方,用于补充说明或悬念引导;独立样式配置(字体/字号/颜色/间距),可由 AI 同时生成;与标题共享显示模式设定;仅在视频画面中显示,不参与发布标题 (Day 25)
### 5. 字幕与标题
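- **Opening title**: optional, up to 15 characters; supports the "短暂显示 / 常驻显示" (brief / persistent) display modes (default brief, 4 seconds); in `常驻显示` mode both the main title and the secondary title stay on screen for the whole video.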
- **片头副标题**: 可选输入,限制 20 字;显示在主标题下方,用于补充说明或悬念引导;独立样式配置(字体/字号/颜色/间距),可由 AI 同时生成;与标题共享显示模式设定;仅在视频画面中显示,不参与发布标题。
- **标题同步**: 首页片头标题修改会同步到发布信息标题。
- **逐字高亮字幕**: 卡拉OK效果默认开启,可关闭
- **逐字高亮字幕**: 卡拉OK效果默认开启。
- **自动对齐**: 基于 faster-whisper 生成字级别时间戳。
- **样式预设**: 标题/字幕/副标题样式选择 + 预览 + 字号调节 (Day 16/25)
- **默认样式**: 标题 90px 站酷快乐体;字幕 60px 经典黄字 + DingTalkJinBuTi (Day 17)
- **样式持久化**: 标题/字幕/副标题样式与字号刷新保留 (Day 17/25)
- **样式预设**: 标题/字幕/副标题样式选择 + 预览 + 字号调节。
- **默认样式**: 标题 90px 站酷快乐体;字幕 60px 经典黄字 + DingTalkJinBuTi。
- **样式持久化**: 标题/字幕/副标题样式与字号刷新保留。
### 6. 背景音乐 [Day 16 新增]
- **试听预览**: 点击试听即选中,音量滑块实时生效
- **混音控制**: 仅影响 BGM配音保持原音量
### 6. 背景音乐
- **试听预览**: 下拉列表内可直接试听
- **选择体验**: 发布页同款搜索选择器,打开时自动定位到当前已选
- **混音控制**: 当前前端不提供音量滑杆,生成时固定 `bgm_volume=0.2`,保持配音音量稳定。
### 7. 账户设置 [Day 15 新增]
### 7. 账户设置
- **手机号登录**: 11位中国手机号验证登录。
- **账户下拉菜单**: 显示手机号(中间四位脱敏)+ 有效期 + 修改密码 + 安全退出。
- **修改密码**: 弹窗输入当前密码与新密码,修改后强制重新登录。
@@ -76,12 +91,12 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。
- **到期续费**: 会员到期后登录自动跳转付费页续费,流程与首次开通一致。
- **管理员激活**: 管理员手动激活功能并存,两种方式互不影响。
### 8. 文案提取助手 (`ScriptExtractionModal`) [Day 15 新增]
- **多源提取**: 支持文件拖拽上传与 URL 粘贴 (B站/抖音/TikTok)
- **AI 智能改写**: 集成 GLM-4.7-Flash自动改写为口播文案
- **自定义提示词**: 可自定义改写提示词,留空使用默认;设置持久化到 localStorage (Day 25)
- **一键填入**: 提取结果直接填充至视频生成输入框
- **智能交互**: 实时进度展示,防误触设计
### 9. 文案创作助手3 个弹窗)
- **文案提取助手** (`ScriptExtractionModal`): 支持文件上传与 URL 提取(需登录),提取结果可一键填入主编辑器
- **AI 智能改写** (`RewriteModal`): 基于 GLM-4.7-Flash 改写文案,支持自定义提示词持久化
- **文案深度学习** (`ScriptLearningModal`): 输入抖音/B站博主主页分析热门话题并生成口播文案需登录
- **统一结果操作栏**: 三个弹窗结果页统一底部 Action Grid 风格,主按钮为「填入文案」,次按钮统一「复制 / 重新生成(或等价返回操作)」
- **登录鉴权**: 依赖受保护接口(`/api/tools/*``/api/ai/*`),未登录会触发全局 401 跳转登录
## 🛠️ 技术栈
@@ -92,7 +107,7 @@ ViGent2 的前端界面,采用 Next.js 16 + TailwindCSS 构建。
- **音频波形**: wavesurfer.js (时间轴编辑器)
- **API**: Axios 实例 `@/shared/api/axios` (对接后端 FastAPI :8006)
## 🚀 开发指南
## 🚀 快速开始
### 安装依赖
@@ -140,11 +155,12 @@ src/
- **URL 统一工具**: `@/shared/lib/media` 提供 `resolveMediaUrl` / `resolveAssetUrl`
- **代理配置**: Next.js Rewrites (如需) 或直接 CORS。
## 🎨 设计规范
## 🎨 UI 说明(概览)
- **主色调**: 深紫/黑色系 (Dark Mode)
- **交互**: 悬停微动画 (Hover Effects);操作按钮默认半透明可见 (opacity-40)hover 时全亮,兼顾触屏设备
- **响应式**: 适配桌面端与移动端;发布页平台卡片响应式布局(移动端紧凑/桌面端宽松)
- **滚动体验**: 列表滚动条统一隐藏 (hide-scrollbar);刷新后自动回到顶部(禁用浏览器滚动恢复 + 列表 scroll 时间门控)
- **样式预览**: 浮动预览窗口,桌面端左上角 280px移动端右下角 160px不遮挡控件
- **输入辅助**: 标题/副标题输入框实时字数计数器,超限变红
- 业务选择器统一使用 `SelectPopover`(桌面 Popover / 移动端 BottomSheet`ScriptEditor` 的“历史文案 / AI多语言”保留原轻量菜单。
- 业务弹窗统一使用 `AppModal`(统一遮罩、头部、关闭行为与滚动策略)。
- 弹窗关闭策略:默认支持 `ESC` / `X` / 点击空白关闭;仅发布成功清理弹窗为强制流程(不允许空白关闭,也不显示 `X`)。
- 文案类弹窗结果页按钮统一:底部 Action Grid、主次按钮层级一致、文案动作命名一致填入/复制/重新生成)。
- 视频预览弹窗层级高于下拉菜单;下拉内支持连续预览。
- 页面同时适配桌面端与移动端;长列表统一隐藏滚动条。
- 详细 UI 规范、持久化规范与交互约束请查看 `Docs/FRONTEND_DEV.md`


@@ -137,11 +137,9 @@ CUDA_VISIBLE_DEVICES=1 python -m scripts.inference \
└── DEPLOY.md
```
---
---
## 步骤 7: 性能优化 (预加载模型服务)
---
## 步骤 6: 性能优化(预加载模型服务)
为了消除每次生成视频时 30-40秒 的模型加载时间,建议运行常驻服务。
@@ -201,7 +199,7 @@ LatentSync 1.6 需要 ~18GB VRAM。如果遇到 OOM 错误:
- `inference_steps`: 增加到 30-50 可提高质量
- `guidance_scale`: 增加可改善唇同步,但过高可能导致抖动
### 编码流水线优化(Day 30
### 编码流水线优化(当前实现
LatentSync 内部默认流程有两处冗余编码已优化:
@@ -214,7 +212,7 @@ LatentSync 内部默认流程有两处冗余编码已优化:
---
### 无脸帧容错(Day 30
### 无脸帧容错(当前实现
素材中部分帧检测不到人脸(转头、遮挡、空镜头)时,不再中断整次推理:


@@ -10,8 +10,8 @@
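MuseTalk serves as the long-video engine of the **hybrid lip-sync setup**:
- **Short videos (<120s)** → LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=120s)** → MuseTalk 1.5 (GPU0, port 8011)
- **Short videos (<100s, per the current `.env` example)** → LatentSync 1.6 (GPU1, port 8007)
- **Long videos (>=100s, per the current `.env` example)** → MuseTalk 1.5 (GPU0, port 8011)
- The routing threshold is controlled by `LIPSYNC_DURATION_THRESHOLD`
- Falls back to LatentSync automatically when MuseTalk is unavailable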
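The duration rule and the fallback can be sketched as below. `pick_lipsync_engine` is a hypothetical name and the sketch covers only the duration-based default routing; how an explicit user model choice interacts with it is not shown here.

```python
def pick_lipsync_engine(duration_s, threshold_s=100.0, musetalk_available=True):
    """Route by duration: >= threshold goes to MuseTalk, otherwise LatentSync."""
    if duration_s >= threshold_s and musetalk_available:
        return "musetalk"
    # Short videos, or long videos while MuseTalk is unavailable.
    return "latentsync"
```

In deployment the threshold would come from `LIPSYNC_DURATION_THRESHOLD` rather than the default argument.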
@@ -196,7 +196,7 @@ MUSETALK_ENCODE_CRF=14 # CRF 越小越清晰 (14≈接近视
MUSETALK_ENCODE_PRESET=slow # x264 preset (slow = higher compression efficiency)
# Hybrid lip-sync routing
LIPSYNC_DURATION_THRESHOLD=120 # seconds; values >= this use MuseTalk
LIPSYNC_DURATION_THRESHOLD=100 # seconds; values >= this use MuseTalk
```
> **Parameter tier reference**

Docs/PUBLISH_DEPLOY.md Normal file

@@ -0,0 +1,215 @@
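# Multi-Platform Publishing: Deployment and Implementation Notes (Douyin / WeChat Channels / Bilibili / Xiaohongshu)
## 1. Goals
This document brings together the following:
- How platform login (QR-code scan) is implemented
- How the automated publishing pipeline is implemented
- The runtime environment and configuration required for deployment
- How to quickly locate common failures
Code in scope: `backend/app/modules/publish`, `backend/app/services/publish_service.py`, `backend/app/services/qr_login_service.py`, `backend/app/services/uploader/*`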
---
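## 2. Overall Architecture
### 2.1 API Entry Points
- `POST /api/publish`: run a publish
- `POST /api/publish/login/{platform}`: fetch the QR code and start a login session
- `GET /api/publish/login/status/{platform}`: poll the scan status
- `POST /api/publish/logout/{platform}`: log out and delete the corresponding cookie
- `POST /api/publish/cookies/save/{platform}`: manually save the browser's `document.cookie`
- `GET /api/publish/accounts`: query whether each platform is logged in
- `GET /api/publish/screenshot/{filename}`: read a publish-success screenshot (login required)
- `POST /api/videos/cleanup`: clean up generated artifacts in the current user's workspace (triggered by the front end after a successful publish)
Core router file: `backend/app/modules/publish/router.py`
### 2.2 Service Layers
- `PublishService`: platform routing, account isolation, video path handling, and calling the concrete uploader
- `QRLoginService`: fetches the QR code with Playwright, monitors the scan result, and saves cookies
- `*Uploader`: platform publishing automation (Douyin/WeChat/Xiaohongshu are Playwright-based; Bilibili is biliup-based)
### 2.3 Cleanup After a Successful Publish
- The front-end `CleanupContext` shows the cleanup dialog when every platform selected for this run has published successfully.
- When the user clicks clean up, `POST /api/videos/cleanup` is called first; local inputs are cleared and the dialog is closed only after the endpoint succeeds.
- After a successful cleanup the front end dispatches the `vigent:workspace-cleared` event, and the current publish page resets its title/tag input state in place.
- If the endpoint fails, the dialog stays open and allows a retry; once consecutive failures reach the threshold, the user may choose "skip cleanup for now and keep working".
- The dialog's "Download video backup" action uses the same-origin download endpoint `GET /api/videos/generated/{video_id}/download`, so the browser saves the file directly instead of playing it in a new tab.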
---
## 3. Cookies and Account Isolation
### 3.1 Storage Paths
- Per-user path: `backend/user_data/{user_uuid}/cookies/{platform}_cookies.json`
- Legacy-compatible path: `backend/app/cookies/{platform}_cookies.json`
Path management file: `backend/app/core/paths.py`
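As a rough illustration of how the two locations could be combined, the sketch below prefers the per-user file and falls back to the legacy shared file for reading. The function name `resolve_cookie_file`, the `base_dir` parameter, and the lookup order are assumptions made for this example; the real logic lives in `backend/app/core/paths.py` and may differ.

```python
from pathlib import Path


def resolve_cookie_file(base_dir: Path, user_uuid: str, platform: str) -> Path:
    """Return the cookie file to use for one user and platform.

    Assumed order (illustrative only): per-user file first, then the legacy file.
    """
    name = f"{platform}_cookies.json"
    user_file = base_dir / "user_data" / user_uuid / "cookies" / name
    legacy_file = base_dir / "app" / "cookies" / name
    if user_file.exists():
        return user_file
    if legacy_file.exists():
        return legacy_file
    # Nothing saved yet: new cookies would be written to the per-user path.
    return user_file
```

Here `base_dir` stands for the `backend/` directory, so `resolve_cookie_file(Path("backend"), user_uuid, "douyin")` yields paths of the shape listed above.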
### 3.2 Cookie Formats
- `bilibili`: simplified dictionary format (`SESSDATA` / `bili_jct` / `DedeUserID` / `DedeUserID__ckMd5`)
- `douyin` / `weixin` / `xiaohongshu`: Playwright `storage_state` format (`cookies + origins`)
Related logic: `backend/app/services/publish_service.py`, `backend/app/services/qr_login_service.py`
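To make the two shapes concrete, here is a small sketch that tells them apart after a cookie file has been parsed as JSON. The helper name `detect_cookie_format` and its return labels are hypothetical; only the key names come from the descriptions above.

```python
from typing import Any

# Keys of the simplified Bilibili dictionary, as listed in this section.
BILIBILI_KEYS = {"SESSDATA", "bili_jct", "DedeUserID", "DedeUserID__ckMd5"}


def detect_cookie_format(data: Any) -> str:
    """Classify parsed cookie JSON as 'storage_state', 'bilibili_dict', or 'unknown'."""
    if not isinstance(data, dict):
        return "unknown"
    # Playwright storage_state: a list of cookies plus a list of origins.
    if isinstance(data.get("cookies"), list) and isinstance(data.get("origins"), list):
        return "storage_state"
    # Simplified Bilibili format: a flat name -> value mapping.
    if BILIBILI_KEYS.issubset(data.keys()):
        return "bilibili_dict"
    return "unknown"
```

A check like this could be useful when loading legacy files, since an uploader that expects `storage_state` cannot use a flat dictionary directly.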
---
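## 4. Runtime and Deployment Requirements
### 4.1 System Dependencies
- Python 3.10+
- Node.js 18+
- Playwright Chromium (`playwright install chromium`)
- System Chrome (recommended)
- Xvfb (recommended, especially for headful Douyin/WeChat)
### 4.2 Startup Recommendations
- Start the backend with the root-level script: `./run_backend.sh`
- The script wraps `xvfb-run`, which suits servers without a physical desktop
Script: `run_backend.sh`
### 4.3 Environment Variables (Core)
All settings live in `backend/.env`; their definitions are in `backend/app/core/config.py`.
- Douyin: `DOUYIN_HEADLESS_MODE`, `DOUYIN_CHROME_PATH`, `DOUYIN_USER_AGENT`, `DOUYIN_LOCALE`, `DOUYIN_TIMEZONE_ID`
- WeChat: `WEIXIN_HEADLESS_MODE`, `WEIXIN_CHROME_PATH`, `WEIXIN_USER_AGENT`, `WEIXIN_LOCALE`, `WEIXIN_TIMEZONE_ID`, `WEIXIN_TRANSCODE_MODE`
- Xiaohongshu: `XIAOHONGSHU_HEADLESS_MODE`, `XIAOHONGSHU_CHROME_PATH`, `XIAOHONGSHU_USER_AGENT`, `XIAOHONGSHU_LOCALE`, `XIAOHONGSHU_TIMEZONE_ID`
- Publish screenshot directory: `PUBLISH_SCREENSHOT_DIR`
Note: the Xiaohongshu settings are currently used by the publishing uploader. In the QR-login service, Douyin and WeChat use their own dedicated settings, while Bilibili and Xiaohongshu logins use the generic default browser parameters.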
---
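## 5. Login Implementation (QR-Code Scan)
All logins are handled by `QRLoginService`:
1. Open the platform login page and extract the QR code (multiple strategies: CSS/Text)
2. The front end shows the QR code for the user to scan
3. The backend monitors changes in the URL and session cookies
4. After a successful login, the cookie file is saved
Key file: `backend/app/services/qr_login_service.py`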
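The monitoring step in the flow above is essentially a bounded polling loop. The sketch below shows one way to structure it; it is not the code in `qr_login_service.py`. The function name `wait_for_login`, the status strings `"success"` / `"failed"`, and the default timings are all assumptions for illustration, and the status check is injected as a callable so the loop does not depend on any browser API.

```python
import time
from typing import Callable


def wait_for_login(
    check_status: Callable[[], str],          # hypothetical: returns "waiting", "success", or "failed"
    timeout_s: float = 120.0,                 # assumed overall scan timeout
    interval_s: float = 2.0,                  # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the login succeeds, fails, or the timeout elapses."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = check_status()
        if status == "success":
            return True
        if status == "failed":
            return False
        # Still waiting for the scan: pause before the next check.
        sleep(interval_s)
    return False
```

Injecting `sleep` and `clock` keeps the loop testable with a fake clock; in production the defaults would be used and `check_status` would inspect the page URL and session cookies.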
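### 5.1 Douyin
- Login page: `https://creator.douyin.com/`
- Extra capability: listens to the `check_qrconnect` endpoint and can recognize `redirect_url`
- Special case: if face verification is triggered, the verification QR code `face_verify_qr` is extracted and returned to the front end
### 5.2 WeChat Channels
- Login page: `https://channels.weixin.qq.com/platform/`
- QR-code extraction supports fallback selectors such as `img/canvas/svg`
### 5.3 Xiaohongshu
- Login page: `https://creator.xiaohongshu.com/`
- Key fix: the page may land on SMS login by default, so it first switches to QR-scan mode automatically and then extracts the QR code
- Success detection supports `/new/home`, so it no longer relies solely on the old `success_indicator`
### 5.4 Bilibili
- Login page: `https://passport.bilibili.com/login`
- After a successful scan, the core cookie fields Bilibili requires are saved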
---
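## 6. Automated Publishing Implementation
### 6.1 Douyin (Playwright)
File: `backend/app/services/uploader/douyin_uploader.py`
- Opens the browser context with `storage_state`
- Navigates to the upload page automatically and triggers the file chooser to upload
- After the upload finishes, fills in the title/description/topics and handles the cover when needed
- Publish-success detection: page redirect, API signal, and verification on the management page
- On success, writes cookies back and saves a publish-success screenshot
### 6.2 WeChat Channels (Playwright)
File: `backend/app/services/uploader/weixin_uploader.py`
- Enters the Channels creator platform and locates the upload entry automatically
- Per current product rules, the title, description, and tags are all written into the "video description" field
- Publish-success detection: the `post_create` API or the page leaving the creation page
- On success, writes cookies back and saves a publish-success screenshot
### 6.3 Xiaohongshu (Playwright)
File: `backend/app/services/uploader/xiaohongshu_uploader.py`
- Enters the publish page automatically and triggers the upload
- Upload-stage hardening:
  - `UPLOAD_SIGNAL_TIMEOUT`: startup detection window
  - For video files without a suffix, a temporary file with a suffix is prepared automatically (`hardlink/copy`)
  - File-name suffix consistency check
  - `UPLOAD_IDLE_TIMEOUT`: idle-timeout protection against long "fake stalls"
- Publish-success detection: URL redirect + success text + publish API signal
- On success, writes cookies back and returns the success screenshot URL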
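The "temporary file with a suffix (`hardlink/copy`)" step can be pictured with the sketch below. It is a minimal illustration rather than the uploader's real code: the helper name `ensure_video_suffix` and the default `.mp4` suffix are assumptions, and the actual implementation may pick the suffix and temp location differently.

```python
import os
import shutil
import tempfile
from pathlib import Path


def ensure_video_suffix(src: Path, suffix: str = ".mp4") -> Path:
    """Return a path that carries a file suffix.

    If `src` already has one it is returned unchanged; otherwise a temporary
    sibling named `<name><suffix>` is created by hardlink, or by copy when
    hardlinking is not possible.
    """
    if src.suffix:
        return src
    tmp_dir = Path(tempfile.mkdtemp(prefix="upload_"))
    dst = tmp_dir / (src.name + suffix)
    try:
        # Cheap path: no data is copied, but both paths must share a filesystem.
        os.link(src, dst)
    except OSError:
        # Fallback for cross-filesystem or unsupported hardlinks.
        shutil.copy2(src, dst)
    return dst
```

The caller would be responsible for deleting the temporary directory once the upload finishes; that cleanup is omitted here to keep the sketch short.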
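### 6.4 Bilibili (biliup)
File: `backend/app/services/uploader/bilibili_uploader.py`
- Uses the biliup SDK; the publish flow does not depend on Playwright
- Reads the Bilibili cookies, then calls biliup to upload and submit
- Returns the link for `bvid/aid` (if the API returns it)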
---
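## 7. Debugging and Troubleshooting
### 7.1 Backend Logs
- PM2 output log: `~/.pm2/logs/vigent2-backend-out.log`
- PM2 error log: `~/.pm2/logs/vigent2-backend-error.log`
### 7.2 Common Issues
- Symptom: the login QR code cannot be obtained
  - First check whether the platform login page has been redesigned (selectors no longer match)
  - For Xiaohongshu, confirm whether the page is still on the SMS login view
- Symptom: publishing appears stuck
  - Check whether it has stayed for a long time at "waiting for upload status / waiting for publish result"
  - For Xiaohongshu, check the uploaded file's name suffix and MIME detection first
- Symptom: a re-login is suddenly required
  - Usually the cookie has expired or the platform's risk control has kicked in; scan the QR code again
### 7.3 Debug Artifacts
- Enabling the corresponding `*_DEBUG_ARTIFACTS` setting outputs debug screenshots and network logs
- Success screenshots are returned to the front end via `/api/publish/screenshot/{filename}`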
---
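## 8. Recommended Acceptance Checklist (After Every Deployment)
1. Health check: `curl http://127.0.0.1:8006/health`
2. Login check: trigger QR login for each of the 4 platforms and confirm that status polling can reach success
3. Publish check: publish one test video to each of the four platforms (or at least cover the platforms changed that day)
4. Screenshot check: confirm the success screenshot can be fetched via `/api/publish/screenshot/{filename}`
5. Log check: confirm there are no persistent retries, no long idle periods, and no obvious storm of selector failures
---
## 9. Related Documents
- Main deployment guide: `Docs/DEPLOY_MANUAL.md`
- Backend guide: `Docs/BACKEND_README.md`
- Change log for the day: `Docs/DevLogs/Day31.md`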


@@ -1,6 +1,10 @@
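# Qwen3-TTS 1.7B Deployment Guide
> This document describes how to deploy the Qwen3-TTS 1.7B-Base voice-cloning model on an Ubuntu server.
>
> ⚠️ **Status: historical archive (retired)**
> The project's production environment has switched to CosyVoice 3.0; refer to `Docs/COSYVOICE3_DEPLOY.md` first.
> This document is kept only for tracing the old approach and is not recommended for new deployments.
## System Requirements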


@@ -24,7 +24,7 @@
音频 → faster-whisper → 字幕JSON ─────────────────────────────────────────────┴→ Remotion合成 → 最终视频
```
> **Lip-sync routing**: short videos (<120s) use LatentSync 1.6 (GPU1), long videos (>=120s) use MuseTalk 1.5 (GPU0), controlled by `LIPSYNC_DURATION_THRESHOLD`.
> **Lip-sync routing**: short videos (<100s, per the current `.env` example) use LatentSync 1.6 (GPU1), long videos (>=100s, per the current `.env` example) use MuseTalk 1.5 (GPU0), controlled by `LIPSYNC_DURATION_THRESHOLD`.
## System Requirements
@@ -146,8 +146,8 @@ remotion/
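| Stage | Progress | Description |
|------|------|------|
| Download footage | 0% → 5% | Download the input video from Supabase |
| TTS speech generation | 5% → 25% | EdgeTTS / Qwen3-TTS / download of pre-generated voice-over |
| Lip sync | 25% → 80% | LatentSync inference |
| TTS speech generation | 5% → 25% | EdgeTTS / CosyVoice / download of pre-generated voice-over |
| Lip sync | 25% → 80% | LatentSync / MuseTalk (routed by threshold) |
| Subtitle alignment | 80% → 85% | faster-whisper generates character-level timestamps |
| Remotion rendering | 85% → 95% | Composite subtitles and titles |
| Upload result | 95% → 100% | Upload to Supabase Storage |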
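@@ -305,4 +305,4 @@ WhisperService(device="cuda:0") # or "cuda:1"
| 2026-02-27 | 1.3.0 | Architecture diagram updated for MuseTalk hybrid routing; Remotion render concurrency raised from 8 to 16; GPU allocation notes updated |
| 2026-02-28 | 1.3.1 | MuseTalk compositing optimization: pure numpy blending + FFmpeg pipe with NVENC GPU hardware encoding, replacing double encoding |
| 2026-02-28 | 1.4.0 | compose uses stream copy instead of re-encoding; FFmpeg timeout protection (600s/30s); Remotion concurrency 16→4; Whisper timestamp smoothing + original-text rhythm mapping; global video-generation Semaphore(2); Redis task TTL |
| 2026-03-02 | 1.5.0 | Remotion bundle cache fix (hardlink videos/fonts into the cached public directory); encoding pipeline optimization: prepare_segment/normalize CRF 23→18; multi-clip concat switched to stream copy |
| 2026-03-02 | 1.5.0 | Remotion bundle cache fix (hardlink videos/fonts into the cached public directory); encoding pipeline optimization: prepare_segment/normalize CRF 23→18; multi-clip concat switched to stream copy; MuseTalk compositing switched to a rawvideo pipe + `libx264` (configurable CRF/preset) |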


@@ -1,302 +1,380 @@
# ViGent2 开发任务清单 (Task Log)
**项目**: ViGent2 数字人口播视频生成系统
**进度**: 100% (Day 30 - Remotion 缓存修复 + 编码流水线质量优化)
**更新时间**: 2026-03-02
---
## 📅 对话历史与开发日志
> 这里记录了每一天的核心开发内容与 milestone。
### Day 30: Remotion 缓存修复 + 编码流水线质量优化 + 唇形同步容错 (Current)
- [x] **Remotion 缓存 404 修复**: bundle 缓存命中时,新生成的视频/字体文件不在旧缓存 `public/` 目录 → 404 → 回退 FFmpeg无标题字幕。改为硬链接`fs.linkSync`)当前渲染所需文件到缓存目录
- [x] **LatentSync `read_video` 跳过冗余 FPS 重编码**: 检测输入 FPS已是 25fps 时跳过 `ffmpeg -r 25 -crf 18` 重编码
- [x] **LatentSync final mux 流复制**: `imageio` CRF 13 写帧后的 mux 步骤从 `libx264 -crf 18` 改为 `-c:v copy`,消除冗余双重编码
- [x] **`prepare_segment` + `normalize_orientation` CRF 提质**: CRF 23 → 18与 LatentSync 内部质量标准统一
- [x] **多素材 concat 流复制**: 各段参数已统一,`concat_videos``libx264 -crf 23` 改为 `-c:v copy`
- [x] **编码次数总计**: 从 5-6 次有损编码降至 3 次prepare_segment → LatentSync/MuseTalk 模型输出 → Remotion
- [x] **LatentSync 无脸帧容错**: 素材部分帧检测不到人脸时不再中断推理,无脸帧保留原画面,单素材异常时回退原视频
- [x] **MuseTalk 管道直编码**: `cv2.VideoWriter(mp4v)` 中间有损文件改为 FFmpeg rawvideo stdin 管道,消除一次冗余有损编码
- [x] **MuseTalk 参数环境变量化**: 推理与编码参数detect_every/blend_cache/CRF/preset 等)从硬编码迁移到 `backend/.env`当前使用质量优先档CRF 14, preset slow, detect_every 2, blend_cache_every 2
- [x] **Workflow 异步防阻塞**: 新增 `_run_blocking()` 线程池辅助5 处同步 FFmpeg 调用(旋转归一化/prepare_segment/concat/BGM 混音)改为 `await _run_blocking()`,事件循环不再被阻塞
- [x] **compose 跳过优化**: 无 BGM 时 `final_audio_path == audio_path`,跳过多余的 compose 步骤Remotion 路径直接用 lipsync 输出,非 Remotion 路径 `shutil.copy` 透传
- [x] **compose() 异步化**: `compose()` 改为 `async def`,内部 `_get_duration``_run_ffmpeg``run_in_executor`
- [x] **同分辨率跳过 scale**: 多素材逐段比对分辨率,匹配的传 `None` 走 copy 分支;单素材同理。避免已是目标分辨率时的无效重编码
- [x] **`_get_duration()` 线程池化**: workflow 中 3 处同步 ffprobe 探测改为 `await _run_blocking()`
- [x] **compose 循环 CRF 统一**: 循环场景 CRF 23 → 18与全流水线质量标准一致
- [x] **多素材片段校验**: prepare 完成后校验片段数量一致,防止空片段进入 concat
- [x] **唇形模型前端选择**: 生成按钮右侧新增模型下拉(默认模型/快速模型/高级模型),全链路透传 `lipsync_model` 到后端路由。默认保持阈值策略,快速强制 MuseTalk高级强制 LatentSync三种模式均有 LatentSync 兜底。选择 localStorage 持久化。
### Day 29: 视频流水线优化 + CosyVoice 语气控制
- [x] **字幕同步修复**: Whisper 时间戳三步平滑(单调递增+重叠消除+间隙填补)+ 原文节奏映射(线性插值 + 单字时长钳位)
- [x] **LatentSync 嘴型参数调优**: inference_steps 16→20, guidance_scale 2.0, DeepCache 启用, Remotion concurrency 16→4
- [x] **compose 流复制**: 不循环时 `-c:v copy` 替代 libx264 重编码compose 耗时从分钟级降到秒级
- [x] **FFmpeg 超时保护**: `_run_ffmpeg()` timeout=600, `_get_duration()` timeout=30。
- [x] **全局并发限制**: `asyncio.Semaphore(2)` 控制同时运行的生成任务数
- [x] **Redis 任务 TTL**: create 24h, completed/failed 2h, list 自动清理过期索引
- [x] **临时字体清理**: 字体文件加入 temp_files 清理列表
- [x] **预览背景 CORS 修复**: 素材同源代理 `/api/materials/stream/{id}` 彻底绕开跨域
- [x] **CosyVoice 语气控制**: 声音克隆模式新增语气下拉(正常/欢快/低沉/严肃),基于 `inference_instruct2()` 自然语言指令控制情绪,全链路透传 instruct_text默认"正常"行为不变
### Day 28: CosyVoice FP16 加速 + 文档全面更新
- [x] **CosyVoice FP16 半精度加速**: `AutoModel()` 开启 `fp16=True`LLM 推理和 Flow Matching 自动混合精度运行,预估提速 30-40%、显存降低 ~30%。
- [x] **文档全面更新**: README.md / DEPLOY_MANUAL.md / SUBTITLE_DEPLOY.md / BACKEND_README.md 补充 MuseTalk 混合唇形同步方案、性能优化、Remotion 并发渲染等内容
### Day 27: Remotion 描边修复 + 字体样式扩展 + 混合唇形同步 + 性能优化
- [x] **描边渲染修复**: 标题/副标题/字幕从 `textShadow` 4 方向模拟改为 CSS 原生 `-webkit-text-stroke` + `paint-order: stroke fill`,修复描边过粗和副标题重影问题
- [x] **字体样式扩展**: 标题样式 4→12 个(+庞门正道/优设标题圆/阿里数黑体/文道潮黑/无界黑/厚底黑/寒蝉半圆体/欣意吉祥宋),字幕样式 4→8 个(+少女粉/清新绿/金色隶书/楷体红字)
- [x] **描边参数优化**: 所有预设 `stroke_size` 从 8 降至 4~5配合原生描边视觉更干净
- [x] **TypeScript 类型修复**: Root.tsx `Composition` 泛型与 `calculateMetadata` 参数类型对齐Video.tsx `VideoProps` 添加索引签名兼容 `Record<string, unknown>`VideoLayer.tsx 移除 `OffthreadVideo` 不支持的 `loop` prop
- [x] **进度条文案还原**: 进度条从显示后端推送消息改回固定 `正在AI生成中...`
- [x] **MuseTalk 混合唇形同步**: 部署 MuseTalk 1.5 常驻服务 (GPU0, 端口 8011),按音频时长自动路由 — 短视频 (<120s) 走 LatentSync长视频 (>=120s) 走 MuseTalkMuseTalk 不可用时自动回退。
- [x] **MuseTalk 推理性能优化**: server.py v2 重写 — cv2 直读帧(跳过 ffmpeg→PNG)、人脸检测降频(每5帧)、BiSeNet mask 缓存(每5帧)、cv2.VideoWriter 直写(跳过 PNG 写盘)、batch_size 8→32预估 30min→8-10min (~3x)。
- [x] **Remotion 并发渲染优化**: render.ts 新增 concurrency 参数,从默认 8 提升到 16 (56核 CPU),预估 5min→2-3min
### Day 26: 前端优化:板块合并 + 序号标题 + UI 精细化
- [x] **板块合并**: 首页 9 个独立板块合并为 5 个主板块(配音方式+配音列表→三、配音;视频素材+时间轴→四、素材编辑;历史作品+作品预览→六、作品)
- [x] **中文序号标题**: 一~十编号(首页一~六,发布页七~十),移除所有 emoji 图标
- [x] **embedded 模式**: 6 个组件支持 `embedded` prop嵌入时不渲染外层卡片/标题
- [x] **配音列表两行布局**: embedded 模式第 1 行语速+生成配音(右对齐),第 2 行配音列表+刷新
- [x] **子组件自渲染子标题**: MaterialSelector/TimelineEditor embedded 时自渲染 h3 子标题+操作按钮同行
- [x] **下拉对齐**: TitleSubtitlePanel 标签统一 `w-20`,下拉 `w-1/3 min-w-[100px]`,垂直对齐。
- [x] **参考音频文案简化**: 底部段落移至标题旁,简化为 `(上传3-10秒语音样本)`
- [x] **账户手机号显示**: AccountSettingsDropdown 新增手机号显示
- [x] **标题显示模式对副标题生效**: payload 条件修复 + UI 下拉上移至板块标题行
- [x] **登录后用户信息立即可用**: AuthContext 暴露 `setUser`,登录成功后立即写入用户数据,修复登录后显示"未知账户"的问题
- [x] **文案微调**: 素材描述改为"上传自拍视频最多可选4个";显示模式选项加"标题"前缀
- [x] **UI/UX 体验优化**: 操作按钮移动端可见opacity-40、手机号脱敏、标题字数计数器、时间轴拖拽抓手图标、截取滑块放大
- [x] **代码质量修复**: 密码弹窗 success 清空、MaterialSelector useMemo + disabled 守卫、TimelineEditor useMemo
- [x] **发布页响应式布局**: 平台账号卡片单行布局,移动端紧凑(小图标/小按钮),桌面端宽松(与其他板块风格一致)
- [x] **移动端刷新回顶部**: `scrollRestoration = "manual"` + 列表 scroll 时间门控(`scrollEffectsEnabled` ref1 秒内禁止自动滚动)+ 延迟兜底 `scrollTo(0,0)`
- [x] **移动端样式预览缩小**: FloatingStylePreview 移动端宽度缩至 160px位置改为右下角不遮挡样式调节控件
- [x] **列表滚动条统一隐藏**: 所有列表BGM/配音/作品/素材/文案提取)滚动条改回 `hide-scrollbar`
- [x] **移动端配音/素材适配**: VoiceSelector 按钮移动端缩小(`px-2 sm:px-4`修复克隆声音不可见MaterialSelector 标题行移除 `whitespace-nowrap`,描述移动端隐藏,修复刷新按钮溢出
- [x] **生成配音按钮放大**: 从辅助尺寸(`text-xs px-2 py-1`)升级为主操作尺寸(`text-sm font-medium px-4 py-2`),新增阴影
- [x] **生成进度条位置调整**: 从"六、作品"卡片内部提取到右栏独立卡片,显示在作品卡片上方,更醒目
- [x] **LatentSync 超时修复**: httpx 超时从 1200s20 分钟)改为 3600s1 小时),修复 2 分钟以上视频口型推理超时回退问题
- [x] **字幕时间戳节奏映射**: `whisper_service.py` 从全程线性插值改为 Whisper 逐词节奏映射,修复长视频字幕漂移
### Day 25: 文案提取修复 + 自定义提示词 + 片头副标题
- [x] **抖音文案提取修复**: yt-dlp Fresh cookies 报错,重写 `_download_douyin_manual` 为移动端分享页 + 自动获取 ttwid 方案
- [x] **清理 DOUYIN_COOKIE**: 新方案不再需要手动维护 Cookie`.env`/`config.py`/`service.py` 全面删除
- [x] **AI 智能改写自定义提示词**: 后端 `rewrite_script()` 支持 `custom_prompt` 参数;前端 checkbox 旁新增折叠式提示词编辑区localStorage 持久化。
- [x] **SSR 构建修复**: `useState` 初始化 `localStorage` 访问加 `typeof window` 守卫,修复 `npm run build` 报错。
- [x] **片头副标题**: 新增 secondary_title后端/Remotion/前端全链路AI 同时生成独立样式配置20 字限制
- [x] **前端文案修正**: "AI 洗稿结果"→"AI 改写结果"
- [x] **yt-dlp 升级**: `2025.12.08``2026.2.21`
- [x] **参考音频中文文件名修复**: `sanitize_filename()` 将存储路径清洗为 ASCII 安全字符,纯中文名哈希兜底,原始名保留为展示名
### Day 24: 鉴权到期治理 + 多素材时间轴稳定性修复
- [x] **会员到期请求时失效**: 登录与鉴权接口统一执行 `expires_at` 检查;到期后自动停用账号、清理 session并返回“会员已到期请续费”
- [x] **画面比例控制**: 时间轴新增 `9:16 / 16:9` 输出比例选择,前端持久化并透传后端,单素材/多素材统一按目标分辨率处理
- [x] **标题/字幕防溢出**: Remotion 与前端预览统一响应式缩放、自动换行、描边/字距/边距比例缩放,降低预览与成片差异
- [x] **标题显示模式**: 标题行新增“短暂显示/常驻显示”下拉默认短暂显示4 秒),用户选择持久化并透传至 Remotion 渲染链路
- [x] **MOV 方向归一化**: 新增旋转元数据解析与 orientation normalize修复“编码横屏+旋转元数据”导致的竖屏判断偏差
- [x] **多素材拼接稳定性**: 片段 prepare 与 concat 统一 25fps/CFRconcat 增加 `+genpts`,缓解段切换处“画面冻结口型还动”
- [x] **时间轴语义对齐**: 打通 `source_end` 全链路;修复 `sourceStart>0 且 sourceEnd=0` 时长计算;生成时以时间轴可见段 assignments 为准,超出段不参与
- [x] **交互细节优化**: 页面刷新回顶部;素材/历史列表首轮自动滚动抑制,减少恢复状态时页面跳动
### Day 23: 配音前置重构 + 素材时间轴编排 + UI 体验优化 + 声音克隆增强
#### 第一阶段:配音前置
- [x] **配音生成独立化**: 新增 `generated_audios` 后端模块router/schemas/service5 个 API 端点,复用现有 TTSService / voice_clone_service / task_store
- [x] **配音管理面板**: 前端新增 `useGeneratedAudios` hook + `GeneratedAudiosPanel` 组件,支持生成/试听/改名/删除/选中
- [x] **UI 面板重排序**: 文案 → 标题字幕 → 配音方式 → 配音列表 → 素材选择 → BGM → 生成视频
- [x] **素材区门控**: 未选中配音时素材区显示遮罩,选中后显示配音时长 + 素材均分信息
- [x] **视频生成对接**: workflow.py 新增预生成音频分支(`generated_audio_id`),跳过内联 TTS向后兼容
- [x] **持久化**: selectedAudioId 加入 useHomePersistence刷新页面恢复选中配音。
#### 第二阶段:素材时间轴编排
- [x] **时间轴编辑器**: 新增 `TimelineEditor` 组件wavesurfer.js 音频波形 + 色块可视化素材分配,拖拽分割线调整各段时长
- [x] **素材截取设置**: 新增 `ClipTrimmer` 模态框HTML5 视频预览 + 双端滑块设置源视频截取起点/终点
- [x] **后端自定义分配**: 新增 `CustomAssignment` 模型,`prepare_segment` 支持 `source_start`workflow 多素材/单素材流水线支持 `custom_assignments`
- [x] **循环截取修复**: `stream_loop + source_start` 改为两步处理(先裁剪再循环),确保从截取起点循环而非从视频 0s 开始
- [x] **MaterialSelector 精简**: 移除旧的时长信息栏和拖拽排序区(功能迁移到 TimelineEditor
#### 第三阶段UI 体验优化 + TTS 稳定性
- [x] **TTS SoX PATH 修复**: `run_qwen_tts.sh` export conda env bin 到 PATH (Qwen3-TTS 已停用,已被 CosyVoice 3.0 替换)
- [x] **TTS 显存管理**: 每次生成后 `torch.cuda.empty_cache()`asyncio.to_thread 避免阻塞事件循环 (CosyVoice 沿用相同机制)。
- [x] **配音列表按钮统一**: Play/Edit/Delete 按钮右侧同组 hover 显示,与 RefAudioPanel 一致,移除文案摘要。
- [x] **素材区解除配音门控**: 移除 MaterialSelector 的 selectedAudio 遮罩,素材随时可上传管理
- [x] **时间轴拖拽排序**: TimelineEditor 色块支持 HTML5 Drag & Drop 调换素材顺序
- [x] **截取设置 Range Slider**: ClipTrimmer 改为单轨道双手柄(紫色起点+粉色终点),替换两个独立滑块。
- [x] **截取设置视频预览**: 视频区域可播放/暂停,从 sourceStart 到 sourceEnd 自动停止,拖拽手柄时实时 seek。
#### 第四阶段:历史文案 + Bug 修复
- [x] **历史文案保存与加载**: 新增 `useSavedScripts` hook手动保存/加载/删除历史文案,独立 localStorage 持久化
- [x] **时间轴拖拽修复**: `reorderSegments` 从属性交换改为数组移动splice修复拖拽后时长不跟随素材的 Bug
- [x] **按钮视觉统一**: 文案编辑区 4 个按钮统一为固定高度 `h-7`,移除多余 `<span>` 嵌套
- [x] **底部栏调整**: "保存文案"按钮移至底部右侧,移除预计时长显示
#### 第五阶段:字幕语言不匹配 + 视频比例错位修复
- [x] **字幕用原文替换 Whisper 转录**: `align()` 新增 `original_text` 参数,字幕文字永远用配音保存的原始文案。
- [x] **Remotion 动态视频尺寸**: `calculateMetadata` 从 props 读取真实尺寸,修复标题/字幕比例错位。
- [x] **英文空格丢失修复**: `split_word_to_chars` 遇到空格时 flush buffer + pending_space 标记
#### 第六阶段:参考音频自动转写 + 语速控制
- [x] **Whisper 自动转写 ref_text**: 上传参考音频时自动调用 Whisper 转写内容作为 ref_text不再使用前端固定文字
- [x] **参考音频自动截取**: 超过 10 秒自动在静音点截取ffmpeg silencedetect末尾 0.1 秒淡出避免截断爆音
- [x] **重新识别功能**: 新增 `POST /ref-audios/{id}/retranscribe` 端点 + 前端 RotateCw 按钮,旧音频可重新转写并截取
- [x] **语速控制**: 全链路 speed 参数(前端选择器 → 持久化 → 后端 → CosyVoice `inference_zero_shot(speed=)`5 档:较慢(0.8)/稍慢(0.9)/正常(1.0)/稍快(1.1)/较快(1.2)
- [x] **缺少参考音频门控**: 声音克隆模式下未选参考音频时,生成配音按钮禁用 + 黄色警告提示。
- [x] **Whisper 语言自动检测**: `transcribe()` language 参数改为可选(默认 None = 自动检测),支持多语言参考音频
- [x] **前端清理**: 移除固定 ref_text 常量、朗读引导文字,简化为"上传任意语音样本,系统将自动识别内容并克隆声音"
### Day 22: 多素材优化 + AI 翻译 + TTS 多语言
- [x] **多素材 Bug 修复**: 6 个高优 Bug边界溢出、单段 fallback、除零、duration 校验、Whisper 兜底、空列表检查)
- [x] **架构重构**: 多素材从"逐段 LatentSync"重构为"先拼接再推理",推理次数 N→1
- [x] **前端优化**: payload 安全、进度消息、上传自动选中、Material 接口统一、拖拽修复、素材上限 4 个
- [x] **AI 多语言翻译**: 新增 `/api/ai/translate` 接口,前端 9 种语言翻译 + 还原原文
- [x] **TTS 多语言**: EdgeTTS 10 语言声音列表、翻译自动切换声音、声音克隆 language 透传、textLang 持久化
### Day 21: 缺陷修复 + 浮动预览 + 发布重构 + 架构优化 + 多素材生成
- [x] **Remotion 崩溃容错**: 渲染进程 SIGABRT 退出时检查输出文件,避免误判失败导致标题/字幕丢失
- [x] **首页作品选择持久化**: 修复 `fetchGeneratedVideos` 无条件覆盖恢复值的问题,新增 `preferVideoId` 参数控制选中逻辑
- [x] **发布页作品选择持久化**: 根因为签名 URL 不稳定,全面改用 `video.id` 替代 `path` 进行选择/持久化/比较
- [x] **预取缓存补全**: 首页预取发布页数据时加入 `id` 字段,确保缓存数据可用于持久化匹配。
- [x] **浮动样式预览窗口**: 标题字幕预览改为 `position: fixed` 浮动窗口,固定左上角,滚动时始终可见。
- [x] **移动端适配**: ScriptEditor 按钮换行、预览默认比例改为 9:16 竖屏
- [x] **多平台发布重构**: 平台配置独立化DOUYIN_*/WEIXIN_*)、用户隔离 Cookie 管理、抖音刷脸验证二维码、微信发布流程优化
- [x] **前端结构微调**: ScriptExtractionModal 迁移到 features/、contexts 迁移到 shared/contexts/、清理空目录
- [x] **后端模块分层**: materials/tools/ref_audios 三个模块补全 router+schemas+service 分层
- [x] **开发规范更新**: BACKEND_DEV.md 新增渐进原则、DOC_RULES.md 取消 TASK_COMPLETE.md 手动触发约束
- [x] **文档全面更新**: BACKEND_DEV/README、FRONTEND_DEV、DEPLOY_MANUAL、README.md 同步更新
- [x] **多素材视频生成(多机位效果)**: 支持多选素材 + 拖拽排序,按素材数量均分音频时长(对齐 Whisper 字边界)自动切换机位。逐段 LatentSync + FFmpeg 拼接。前端 @dnd-kit 拖拽排序 UI
- [x] **字幕开关移除**: 默认启用逐字高亮字幕,移除开关及相关死代码
- [x] **视频格式扩展**: 上传支持 mkv/webm/flv/wmv/m4v/ts/mts 等常见格式。
- [x] **Watchdog 优化**: 健康检查阈值提高到 5 次,新增重启冷却期 120 秒,避免误重启。
- [x] **多素材 Bug 修复**: 修复标点分句方案对无句末标点文案无效(改为均分方案)、音频时间偏移导致口型不对齐等缺陷
### Day 20: 代码质量与安全优化
- [x] **功能性修复**: LatentSync 回退逻辑、任务状态接口认证、User 类型统一
- [x] **性能优化**: N+1 查询修复、视频上传流式处理、httpx 异步替换、GLM 异步包装
- [x] **安全修复**: 硬编码 Cookie 配置化、日志敏感信息脱敏、ffprobe 安全调用、CORS 配置化
- [x] **配置优化**: 存储路径环境变量化、Remotion 预编译加速、LatentSync 绝对路径
- [x] **文档更新**: 更新 DOC_RULES.md 清单,补齐后端与部署文档;更新 SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md
- [x] **缺陷修复**: 修复 Remotion 路径解析、发布页持久化竞态、首页选中回归、素材闭包陷阱。
### Day 19: 自动发布稳定性与发布体验优化 🚀
- [x] **抖音发布稳定性**: 上传入口、封面流程、发布重试、登录失效识别与网络失败快速返回全面增强。
- [x] **视频号发布修复**: 标题+标签统一写入“视频描述”,`post_create` 成功信号快速判定,超时改为失败返回
- [x] **成功截图闭环**: 抖音/视频号发布成功截图接入前端,支持用户隔离存储与鉴权访问
- [x] **截图观感优化**: 成功截图延后 3 秒并改为视口截图,修复“截图内容仅占 1/3”问题
- [x] **调试能力开关化**: 新增视频号录屏配置,默认可按环境变量开关,失败排障更直观
- [x] **启动链路统一**: 合并为 `run_backend.sh`xvfb + headful统一端口 `8006`,减少多进程混淆
- [x] **发布页防误操作**: 发布中按钮提示“请勿刷新或关闭网页”,并启用刷新/关页二次确认拦截
- [ ] **后续优化**: 发布任务状态恢复机制(任务化 + 状态持久化 + 前端轮询恢复)。
### Day 18: 后端模块化与规范完善
- [x] **模块化迁移**: 路由透传 `modules/*`,业务逻辑集中到 service/workflow
- [x] **视频生成拆分**: 生成流程下沉 workflow任务状态统一 TaskStore
- [x] **Redis 任务存储**: Redis 优先,不可用自动回退内存
- [x] **仓储层抽离**: Supabase 访问统一 `repositories/*`deps/auth/admin 全面替换
- [x] **响应规范**: 统一 `success/message/data/code` + 全局异常处理。
- [x] **素材重命名**: 新增重命名接口与 Storage `move_file`
- [x] **平台顺序调整**: 抖音/微信视频号/B站/小红书,移除快手
- [x] **后端开发规范**: 新增 `BACKEND_DEV.md`README 同步模块化结构
- [x] **发布管理体验**: 首页预取路由 + 发布页骨架与缓存,进入更快
- [x] **素材加载优化**: 素材列表并发签名 URL骨架数量动态
- [x] **预览加载优化**: `preload="metadata"` + hover 预取
### Day 17: 前端重构与体验优化
- [x] **UI 组件拆分**: 首页拆分为独立组件,降低 `page.tsx` 复杂度。
- [x] **轻量 FSD 迁移**: `app` 页面轻量化,逻辑集中到 `features/*/model`,通用能力下沉 `shared/*`
- [x] **Controller Hooks**: Home/Publish 页面逻辑集中到 Controller HookPage 仅组合渲染
- [x] **通用工具抽取**: `media.ts` 统一 API Base / URL / 日期格式化
- [x] **交互优化**: 选择项持久化、列表内定位、刷新回顶部、最新作品优先预览
- [x] **发布页改造**: 作品列表卡片化 + 搜索 + 预览弹窗
- [x] **预览体验**: 预览弹窗统一头部样式与提示文案。
- [x] **预览一致性**: 标题/字幕预览按素材分辨率缩放。
- [x] **标题同步与限制**: 片头标题同步发布标题,输入法合成态兼容,限制 15 字
- [x] **样式默认与持久化**: 默认样式与字号调整,刷新保留用户选择
- [x] **性能微优化**: 列表渲染优化 + 并行请求 + localStorage 防抖
- [x] **资源能力**: 字体/BGM 资源库 + `/api/assets` 接入。
- [x] **音频与字幕修复**: BGM 混音稳定性与字幕断句优化。
- [x] **持久化修复**: 接入 `useHomePersistence`,恢复 `isRestored` 逻辑并通过构建
- [x] **预览与选择修复**: 发布预览兼容签名 URL音频试听路径解析素材/BGM 回退有效项
- [x] **体验细节优化**: 录音预览 URL 回收,预览弹窗滚动恢复,全局任务提示挂载
### Day 16: 深度性能优化
- [x] **Qwen-TTS 加速**: 集成 Flash Attention 2 (已停用,被 CosyVoice 3.0 替换)
- [x] **服务守护**: 开发 `Watchdog` 看门狗机制,自动监控并重启僵死服务
- [x] **LatentSync 性能确认**: 验证 DeepCache + 原生 Flash Attn 生效。
- [x] **文档重构**: 全面更新 README、部署手册及后端文档。
### Day 15: 手机号认证迁移
- [x] **认证系统升级**: 从邮箱迁移至 11 位手机号注册/登录
- [x] **账户管理**: 新增修改密码、有效期显示、安全退出功能
- [x] **AI 文案助手**: 升级 GLM-4.7-Flash支持 B站/抖音链接提取与洗稿
### Day 14: AI 增强与体验优化
- [x] **AI 标题/标签**: 集成 GLM-4API 自动生成视频元数据
- [x] **字幕升级**: Remotion 逐字高亮字幕 (卡拉OK效果) 及动画片头
- [x] **模型升级**: 声音克隆已迁移至 CosyVoice 3.0 (0.5B)
### Day 13: 声音克隆集成
- [x] **声音克隆微服务**: 封装 CosyVoice 3.0 为独立 API (8010端口替换 Qwen3-TTS)
- [x] **参考音频管理**: Supabase 存储桶配置与管理接口
- [x] **多模态 TTS**: 前端支持 EdgeTTS / Clone Voice 切换
### Day 12: 移动端适配
- [x] **iOS 兼容**: 修复 Safari 安全区域、状态栏颜色、Cookie 拦截问题
- [x] **响应式 UI**: 移动端 Header 与发布页重构
### Day 11: 上传架构重构
- [x] **直传优化**: 前端直传 Supabase Storage解决 Nginx 30s 超时问题
- [x] **数据隔离**: 用户素材/视频按 UserID 物理隔离
### Day 10: HTTPS 与安全
- [x] **HTTPS 部署**: 配置 SSL 证书与 Nginx 反向代理
- [x] **安全加固**: Supabase Studio 增加 Basic Auth 保护
### Day 9: 认证系统与发布闭环
- [x] **用户系统**: 基于 Supabase Auth 实现 JWT 认证
- [x] **发布闭环**: 验证 B站/抖音/小红书 自动发布流程
- [x] **服务自愈**: 配置 PM2 进程守护。
### Day 1-8: 核心功能构建
- [x] **Day 8**: 历史记录持久化与文件管理
- [x] **Day 7**: 社交媒体自动登录与多平台发布
- [x] **Day 6**: **LatentSync 1.6** 升级与服务器部署
- [x] **Day 5**: 前端视频上传与进度反馈
- [x] **Day 4**: MuseTalk (旧版) 口型同步修复
- [x] **Day 3**: 服务器环境配置与模型权重下载
- [x] **Day 1-2**: 项目基础框架 (FastAPI + Next.js) 搭建
---
## 🛤️ 后续规划 (Roadmap)
### 🔴 优先待办
- [x] ~~**配音前置重构 — 第二阶段**: 素材片段截取 + 语音时间轴编排~~ ✅ Day 23 已完成
- [ ] **批量生成架构**: 支持 Excel 导入,批量生产视频
- [ ] **定时任务后台化**: 迁移前端触发的定时发布到后端 APScheduler
- [ ] **发布任务恢复机制**: 发布任务化 + 状态持久化 + 前端断点恢复,解决刷新后状态丢失
### 🔵 长期探索
- [ ] **容器化交付**: 提供完整的 Docker Compose 一键部署包
- [ ] **分布式队列**: 引入 Celery + Redis 处理超高并发任务。
---
## 📊 模块完成度
| 模块 | 进度 | 状态 |
|------|------|------|
| **核心 API** | 100% | ✅ 稳定 |
| **Web UI** | 100% | ✅ 稳定 (移动端适配) |
| **唇形同步** | 100% | ✅ LatentSync 1.6 |
| **TTS 配音** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + 配音前置 + 时间轴编排 + 自动转写 + 语速控制 + 语气控制 |
| **自动发布** | 100% | ✅ 抖音/微信视频号/B站/小红书 |
| **用户认证** | 100% | ✅ 手机号 + JWT |
| **付费会员** | 100% | ✅ 支付宝电脑网站支付 + 自动激活 |
| **部署运维** | 100% | ✅ PM2 + Watchdog |
---
## 📎 相关文档
- [详细开发日志 (DevLogs)](Docs/DevLogs/)
- [部署手册 (DEPLOY_MANUAL)](Docs/DEPLOY_MANUAL.md)
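# ViGent2 Development Task List (Task Log)
**Project**: ViGent2 digital-human talking-head video generation system
**Progress**: 100% (Day 35 - small-face lip quality compensation shipped + deployment verification)
**Last updated**: 2026-03-10
---
## 📅 Conversation History and Development Log
> The core development work and milestones of each day are recorded here.
### Day 35: Small-Face Lip Quality Compensation + Deployment Verification + Stability Patches (Current)
- [x] **Small-face lip quality compensation shipped**: added `small_face_enhance_service.py`, implementing the full chain: SCRFD small-face detection (10%-30% sampling) -> crop track (detection every 8 frames + EMA) -> sparse keyframe super-resolution (GFPGAN/CodeFormer) -> lower-face paste-back (seamlessClone / alpha fallback)
- [x] **Backend integration complete**: inside `_local_generate()` in `lipsync_service.py`, enhancement is inserted after looping completes; `_run_selected_model()` was extracted to unify model routing; when enhancement fails, `FAIL_OPEN` falls back to the original flow automatically
- [x] **Configuration and dependencies**: added 5 `LIPSYNC_SMALL_FACE_*` settings; `requirements.txt` adds `opencv-python-headless` and `gfpgan`; added the weights location `models/FaceEnhance/GFPGANv1.4.pth`
- [x] **New deployment doc**: added and back-filled `Docs/FACEENHANCE_DEPLOY.md`, covering deployment, weights, switches, verification, and rollback
- [x] **Production stability fixes**:
  - `small_face_enhance_service.py` adds lazy-load guards for `cv2/numpy`; when the dependencies are missing, enhancement is skipped without affecting the main flow
  - Added `from __future__ import annotations` so that `np.ndarray` annotations do not raise at import time when the dependency is missing
  - Added a `torchvision.transforms.functional_tensor` shim, fixing GFPGAN initialization failure under `torchvision>=0.20`
  - `_get_video_info()` now parses JSON fields and prefers `avg_frame_rate`, fixing frame-count estimation errors caused by a missing `nb_frames`
  - `_build_face_track()` writes back the actual number of frames read; the `blend_back()` frame-count check is relaxed so that `lipsync <= original` pastes back normally and only `>` raises an error
  - `blend_back()` adds an `ls_frames <= 0` empty-output guard; on anomalies `FAIL_OPEN` falls back to the regular path, so no empty video is written
  - Timebase fix: the enhanced video's output fps follows the source video's fps; paste-back maps original frame indices by `orig_fps/ls_fps`, fixing slowed motion and ghosting
  - Audio fix: after a successful paste-back, an audio mux step is added so the small-face enhancement path outputs video with sound
  - Eye ghosting fix: the mask start is moved down to 68% with 16% margins on the left and right, and the seamlessClone result gets a second, mask-limited blend, reducing ghosting above the eyes
  - Runtime policy: `LIPSYNC_SMALL_FACE_THRESHOLD=9999` is only for pipeline smoke tests; quality validation and daily runs go back to `256`
- [x] **Deployment check passed**: `GET /api/videos/lipsync/health` now returns `data.small_face_enhance`; the default is `enabled=false`, and with the switch off the behavior matches the original flow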
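The timebase fix above (mapping frame indices by `orig_fps/ls_fps`) can be illustrated with a small sketch. This is an assumption about how such a mapping could look, not the code in `small_face_enhance_service.py`: the function name, the rounding choice, and the clamping to the last frame are all hypothetical.

```python
def map_to_original_index(
    ls_index: int,
    orig_fps: float,
    ls_fps: float,
    orig_frame_count: int,
) -> int:
    """Map a lip-sync output frame index to the matching source frame index.

    Frame `ls_index` is shown at time ls_index / ls_fps; the source frame at
    that same time is roughly (ls_index / ls_fps) * orig_fps.
    """
    if orig_fps <= 0 or ls_fps <= 0:
        raise ValueError("fps values must be positive")
    if orig_frame_count <= 0:
        raise ValueError("orig_frame_count must be positive")
    idx = int(round(ls_index * orig_fps / ls_fps))
    # Clamp so rounding at the tail never reads past the last source frame.
    return min(max(idx, 0), orig_frame_count - 1)
```

For instance, with a 30 fps source and a 25 fps lip-sync output, output frame 25 (one second in) maps to source frame 30. When both rates are equal the mapping is the identity, apart from the clamp.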
### Day 34: 多镜头时间轴重构 + 文案深度学习弹窗防误触关闭 + Code Review 修复
- [x] **时间轴模型重构**: 多素材从”等分顺序片段”升级为”主素材连续播放 + 插入镜头块”,支持自由插入、拖拽移动。
- [x] **前端链路落地**: 重写 `useTimelineEditor``TimelineEditor`,新增主素材/插入候选语义,`useHomeController` / `HomePage` / `MaterialSelector` 全链路适配
- [x] **后端生成链路适配**: `workflow.py` 完成 `material_paths` 来源修正、`custom_assignments` 新校验、素材下载去重与段处理并发限制,保持单素材兼容
- [x] **文案深度学习防误触关闭**: `ScriptLearningModal` 禁用遮罩和 `ESC` 关闭,仅允许右上角 `X` 或”填入文案”关闭;输入页”取消”改为”清空”
- [x] **Code Review 修复**:
- UX: 移除时间轴 resize handle统一用 ClipTrimmer 弹窗编辑时长;引入拖拽/点击像素阈值区分
- Lint: 修复 `useTimelineEditor` 3 处 set-state-in-effect、`HomePage` 未使用解构、`TimelineEditor` 未使用 import/props
- P1: `workflow.py` `is_multi` 补充 `custom_assignments` 条件,防止多片段 assignment 退化为单素材路径
- P1: 素材 trim range 改为按 identity非 count重置修复切换主素材时截取范围泄漏
- ClipTrimmer onConfirm 同步调用 `resizeInsert()` 更新时间轴块时长
- [x] **文档同步**: 回写 `Day34``TASK_COMPLETE`,并更新 Current 指向。
### Day 33: 文案深度学习落地 + 抓取稳定性增强 + 交互统一
- [x] **文案深度学习功能上线**: 新增 `ScriptLearningModal`(输入主页链接 -> 话题分析 -> 生成文案 -> 填入编辑器)与首页入口接入
- [x] **Tools 新接口**: 新增 `POST /api/tools/analyze-creator``POST /api/tools/generate-topic-script`,并接入登录鉴权。
- [x] **抖音/B站抓取增强**: 博主标题抓取统一升级为 Playwright 直连主链路,支持用户 Cookie 上下文增强与失败重试。
- [x] **GLM 调用统一收口**: `glm_service` 新增统一调用入口,标题生成/改写/翻译/话题分析/话题文案生成全部复用,减少重复代码
- [x] **超时体验优化**: 文案深度学习“生成文案”前端超时从 30s 提升到 90s并补充超时提示文案
- [x] **文案弹窗交互统一**: 文案提取/AI 改写/文案深度学习结果页按钮统一为底部 Action Grid主次按钮层级与文案动作统一
- [x] **依赖升级**: 后端 venv 升级 `yt-dlp``playwright``biliup` 并完成兼容性冒烟验证
- [x] **文档同步**: 回写 `Day33``FRONTEND_README``FRONTEND_DEV``BACKEND_README``BACKEND_DEV``TASK_COMPLETE`
### Day 32: 视频下载同源修复 + 安全整改第一批 + Day 日志拆分归档
- [x] **下载链路修复**: 新增 `GET /api/videos/generated/{video_id}/download`,统一返回 `Content-Disposition: attachment`,修复“点击下载却新开标签页播放”问题
- [x] **发布成功弹窗下载改造**: `CleanupContext` 从传 URL 改为传 `videoId`,下载按钮改走同源接口,去掉 `target="_blank"`
- [x] **首页下载改造**: `PreviewPanel` 同步切换到同源下载接口,首页与发布页下载行为一致。
- [x] **兼容旧持久化状态**: `CleanupContext` 对旧 `videoDownloadUrl``videoId` 解析回填,避免旧 pending 状态失效
- [x] **文档拆分归档**: 将“下载修复开始后的今日内容”归档到 `Docs/DevLogs/Day32.md`,并从 `Day31.md` 移除对应章节与验证记录
- [x] **安全第一批修复**: JWT 默认密钥生产拦截、AI/Tools 接口强制鉴权、materials 路径穿越拦截、video_id 白名单、上传体积限制、错误信息通用化
- [x] **安全收尾三刀**: `delete_material``ValueError -> 400``tools` URL 下载分支 500MB 限制、`DEBUG=false` 下默认 JWT 密钥阻断启动
- [x] **弹窗关闭策略收敛**: 默认支持 `ESC/X/遮罩` 关闭;发布成功清理弹窗保持强制流程不允许遮罩关闭;录音弹窗录音中禁遮罩关闭(防误触)
### Day 31: 文档体系收敛 + 音色试听 + 录音弹窗重构 + 发布登录稳定性修复
- [x] **文档体系收敛**: README/DEV 职责边界明确部署参数与代码对齐Qwen3-TTS 文档归档至历史状态
- [x] **音色试听能力**: 新增并启用 `GET/POST /api/videos/voice-preview`,前端改为直接播放 GET 音频流,修复线上 404重启后端生效
- [x] **录音交互重构**: 录音入口迁移到参考音频区底部,流程改为弹窗;支持录音后即时关闭弹窗、后台上传识别
- [x] **弹窗系统统一**: 抽离 `AppModal`,统一遮罩/焦点/滚动锁/Portal可访问性补齐主要弹窗完成迁移预览、提取、改写、截取、录音、改密、发布登录
- [x] **抖音扫码修复**: 登录页等待策略改为 `domcontentloaded`,并对导航超时容错,避免“无法获取二维码”
- [x] **微信二维码优化**: 后端优先导出原始 PNG前端展示加入白底留白容器修复“二维码边缘像被截断”的观感问题
- [x] **发布性能优化**: 发布页改为受限并发(并发度 2多平台发布总等待时长明显下降
- [x] **微信上传日志降噪**: `file_input empty` 告警改为信号驱动,非最终重试降级为 info减少误报警
- [x] **小红书发布重构**: 对齐抖音/微信上传架构,补齐启动配置、上传/发布多信号判定、成功截图与 `screenshot_url` 回传
- [x] **Cookie 格式统一**: 非 B 站平台统一保存为 Playwright `storage_state`,支持 uploader 直接加载上下文
- [x] **小红书扫码修复**: 自动从短信登录切换到扫码页并提取二维码,登录成功判定补齐 `/new/home` 路径
- [x] **小红书“上传卡住”修复**: 新增无后缀视频临时文件兜底hardlink/copy、文件名后缀一致性校验、上传空转超时保护90s
- [x] **实测闭环**: 小红书 `POST /api/publish` 实测成功45.77s)并可访问成功截图接口
- [x] **文档补齐**: 新增 `Docs/PUBLISH_DEPLOY.md`,并回写 `README.md``BACKEND_README.md``BACKEND_DEV.md``DEPLOY_MANUAL.md`
- [x] **文档规则对齐**: 更新 `Docs/DOC_RULES.md`,补充发布相关“三检”与敏感信息处理规范,加入 `PUBLISH_DEPLOY.md` 检查项,工具规范改为 `Read/Grep/apply_patch`,并对齐 TASK_COMPLETE 检查清单
- [x] **首页交互微调**: `AI生成标题标签` 按钮迁移到“四、标题与字幕”标题同层最右;`标题显示方式 + 预览样式` 下沉到下一行右侧AI按钮圆角/尺寸对齐“在线录音”,配色保留原蓝色渐变;文档明确 `title_display_mode` 对主/副标题统一生效。
- [x] **文案编辑扩展**: 文案输入框右下角新增扩展角标,点击后弹出大编辑器,主框与弹窗内文案实时同步;角标样式改为双箭头极简贴边并微调到 `right-0.5 bottom-2`;修复扩展输入框打字后失焦问题,移除紫色聚焦边框。
- [x] **站点图标更新**: 使用 `Temp/video.png` 替换网站 icon生成并更新 `frontend/src/app/icon.png` 与多尺寸 `frontend/src/app/favicon.ico`
- [x] **发布后清理链路加固**: 新增/优化 `CleanupContext` + `/api/videos/cleanup` 全链路;后端删除异常不再吞错、清理接口严格成功语义;前端失败不清本地/不关弹窗3 次失败可暂不清理,清理状态 24h 过期并支持用户切换复位;清理范围收敛为输入内容字段并保留用户偏好
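### Day 30: Remotion Cache Fix + Encoding Pipeline Quality Optimization + Lip-Sync Fault Tolerance + Unified Dropdown Interaction
- [x] **Remotion cache 404 fix**: on a bundle cache hit, newly generated video/font files were missing from the old cache's `public/` directory → 404 → fallback to FFmpeg (no title or subtitles). The files needed for the current render are now hardlinked (`fs.linkSync`) into the cache directory
- [x] **LatentSync `read_video` skips redundant FPS re-encoding**: the input FPS is detected, and when it is already 25fps the `ffmpeg -r 25 -crf 18` re-encode is skipped
- [x] **LatentSync final mux stream copy**: the mux step after `imageio` writes frames at CRF 13 changed from `libx264 -crf 18` to `-c:v copy`, removing a redundant double encode
- [x] **`prepare_segment` + `normalize_orientation` CRF quality bump**: CRF 23 → 18, aligned with LatentSync's internal quality standard
- [x] **Multi-clip concat stream copy**: segment parameters are already unified, so `concat_videos` changed from `libx264 -crf 23` to `-c:v copy`
- [x] **Total encode count**: lossy encodes reduced from 5-6 to 3 (prepare_segment → LatentSync/MuseTalk model output → Remotion)
- [x] **LatentSync faceless-frame tolerance**: inference is no longer aborted when some frames have no detectable face; faceless frames keep the original picture, and a single-clip exception falls back to the original video
- [x] **MuseTalk direct pipe encoding**: the lossy intermediate file from `cv2.VideoWriter(mp4v)` is replaced by an FFmpeg rawvideo stdin pipe, removing one redundant lossy encode
- [x] **MuseTalk parameters moved to environment variables**: inference and encoding parameters (detect_every/blend_cache/CRF/preset, etc.) moved from hard-coded values to `backend/.env`; the quality-first tier is currently used (CRF 14, preset slow, detect_every 2, blend_cache_every 2)
- [x] **Workflow async anti-blocking**: added the `_run_blocking()` thread-pool helper; 5 synchronous FFmpeg calls (rotation normalization/prepare_segment/concat/BGM mixing) now use `await _run_blocking()`, so the event loop is no longer blocked
- [x] **compose skip optimization**: without BGM, `final_audio_path == audio_path` and the redundant compose step is skipped; the Remotion path uses the lipsync output directly, and the non-Remotion path passes it through with `shutil.copy`
- [x] **compose() made async**: `compose()` became `async def`; internally `_get_duration` and `_run_ffmpeg` go through `run_in_executor`
- [x] **Skip scale at the same resolution**: multi-clip compares resolution segment by segment and passes `None` for matches so they take the copy branch; single clip does the same. This avoids pointless re-encoding when the footage is already at the target resolution
- [x] **`_get_duration()` moved to the thread pool**: 3 synchronous ffprobe probes in the workflow now use `await _run_blocking()`
- [x] **compose loop CRF unified**: CRF 23 → 18 for the looping case, consistent with the pipeline-wide quality standard.
- [x] **Multi-clip segment validation**: after prepare, the segment count is checked for consistency, preventing empty segments from entering concat.
- [x] **Front-end lip model selection**: a model dropdown (default / fast / advanced) was added to the right of the generate button, and `lipsync_model` is passed through the full chain to backend routing. Default keeps the threshold strategy, fast forces MuseTalk, advanced forces LatentSync; all three modes keep LatentSync as the fallback. The selection is persisted in localStorage.
- [x] **Business dropdowns unified into one component**: added `SelectPopover` (desktop Popover + mobile BottomSheet), covering the main business selectors on the home and publish pages (voice, reference audio, voice-over, footage, BGM, works, style, model, aspect ratio)
- [x] **Dropdown UX fixes**: unified handling of occlusion (Portal + fixed), automatic flip-up, trigger-width matching, opaque background, hidden scrollbar, and scrolling to the selected item on reopen
- [x] **Preview linkage fix**: clicking a video preview inside a dropdown no longer forces the menu closed; the preview dialog stacks above the dropdown; after closing the preview, previews can continue from within the menu
- [x] **BGM interaction tightened**: BGM selection now matches the publish page (search + list + audition); per product requirements the home-page volume slider is removed and generate requests use a fixed `bgm_volume=0.2`
- [x] **Exception rollback**: the "Saved Scripts / AI Multilingual" menus in `ScriptEditor` revert to the original lightweight menu style and are not forced onto SelectPopover
- [x] **Docs sync**: Day30 / TASK_COMPLETE / FRONTEND_DEV / FRONTEND_README / README / BACKEND_README synced to the final implementation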
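The thread-pool helper idea from the "Workflow async anti-blocking" item can be sketched as below. This is a generic illustration of the pattern, not the project's `_run_blocking()`: the name `run_blocking` and its signature are assumptions, and the default executor is used only for brevity.

```python
import asyncio
import functools
from typing import Any, Callable, TypeVar

T = TypeVar("T")


async def run_blocking(func: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Run a synchronous callable in the default thread pool and await its result.

    While the worker thread runs (for example a long FFmpeg subprocess call),
    the event loop stays free to serve other requests.
    """
    loop = asyncio.get_running_loop()
    # run_in_executor takes positional args only, so bind kwargs with partial.
    return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))
```

A typical call site would look like `await run_blocking(subprocess.run, cmd, check=True, timeout=600)`, which keeps the timeout protection on the blocking call itself while moving it off the event loop.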
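### Day 29: Video Pipeline Optimization + CosyVoice Tone Control
- [x] **Subtitle sync fix**: three-step smoothing of Whisper timestamps (monotonic increase + overlap removal + gap filling) + original-text rhythm mapping (linear interpolation + per-character duration clamping).
- [x] **LatentSync lip parameter tuning**: inference_steps 16→20, guidance_scale 2.0, DeepCache enabled, Remotion concurrency 16→4
- [x] **compose stream copy**: when not looping, `-c:v copy` replaces libx264 re-encoding; compose time drops from minutes to seconds
- [x] **FFmpeg timeout protection**: `_run_ffmpeg()` timeout=600, `_get_duration()` timeout=30
- [x] **Global concurrency limit**: `asyncio.Semaphore(2)` caps the number of generation tasks running at once
- [x] **Redis task TTL**: create 24h, completed/failed 2h; list automatically cleans up expired indexes
- [x] **Temporary font cleanup**: font files are added to the temp_files cleanup list.
- [x] **Preview background CORS fix**: the same-origin footage proxy `/api/materials/stream/{id}` bypasses cross-origin issues entirely.
- [x] **CosyVoice tone control**: voice-clone mode adds a tone dropdown (normal / cheerful / low / serious), driven by natural-language instructions via `inference_instruct2()`; instruct_text is passed through the full chain, and the default "normal" behavior is unchanged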
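The three-step timestamp smoothing named in the subtitle sync item (monotonic increase, overlap removal, gap filling) could look roughly like the sketch below. It is one possible reading, not the code in `whisper_service.py`: the function name, the `(start, end)` tuple layout, and especially the `max_gap` threshold for closing small gaps are assumptions.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds for one word or character


def smooth_timestamps(words: List[Span], max_gap: float = 0.2) -> List[Span]:
    """Return spans that never go backwards, never overlap, and have small gaps closed."""
    smoothed: List[Span] = []
    for start, end in words:
        if smoothed:
            prev_start, prev_end = smoothed[-1]
            # Step 1: start times must not move backwards.
            start = max(start, prev_start)
            if start < prev_end:
                # Step 2: overlap, so begin where the previous span ends.
                start = prev_end
            elif start - prev_end <= max_gap:
                # Step 3: small gap, so stretch the previous span up to this start.
                smoothed[-1] = (prev_start, start)
        # Keep durations non-negative after any adjustment.
        end = max(end, start)
        smoothed.append((start, end))
    return smoothed
```

Gaps larger than `max_gap` are left alone in this sketch, on the assumption that they represent real pauses in the speech.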
### Day 28: CosyVoice FP16 加速 + 文档全面更新
- [x] **CosyVoice FP16 半精度加速**: `AutoModel()` 开启 `fp16=True`LLM 推理和 Flow Matching 自动混合精度运行,预估提速 30-40%、显存降低 ~30%
- [x] **文档全面更新**: README.md / DEPLOY_MANUAL.md / SUBTITLE_DEPLOY.md / BACKEND_README.md 补充 MuseTalk 混合唇形同步方案、性能优化、Remotion 并发渲染等内容
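### Day 27: Remotion Stroke Fix + Font Style Expansion + Hybrid Lip Sync + Performance Optimization
- [x] **Stroke rendering fix**: Title / secondary title / subtitles switched from a 4-direction `textShadow` simulation to native CSS `-webkit-text-stroke` + `paint-order: stroke fill`, fixing overly thick strokes and ghosting on the secondary title.
- [x] **Font style expansion**: Title styles 4→12 (+庞门正道/优设标题圆/阿里数黑体/文道潮黑/无界黑/厚底黑/寒蝉半圆体/欣意吉祥宋), subtitle styles 4→8 (+少女粉/清新绿/金色隶书/楷体红字).
- [x] **Stroke parameter tuning**: `stroke_size` reduced from 8 to 4~5 in all presets, which looks cleaner with native stroke rendering.
- [x] **TypeScript type fixes**: Root.tsx aligns the `Composition` generic with the `calculateMetadata` parameter type; Video.tsx adds an index signature to `VideoProps` for compatibility with `Record<string, unknown>`; VideoLayer.tsx removes the `loop` prop, which `OffthreadVideo` does not support.
- [x] **Progress bar copy restored**: The progress bar shows the fixed text `正在AI生成中...` again instead of backend-pushed messages.
- [x] **MuseTalk hybrid lip sync**: Deployed MuseTalk 1.5 as a resident service (GPU0, port 8011). Jobs are routed automatically by audio duration (controlled by `LIPSYNC_DURATION_THRESHOLD`; this repo's current `.env` uses 100): short videos use LatentSync, long videos use MuseTalk, with automatic fallback when MuseTalk is unavailable.
- [x] **MuseTalk inference performance**: server.py v2 rewrite: cv2 reads frames directly (skipping ffmpeg→PNG), face detection runs every 5 frames, the BiSeNet mask is cached (every 5 frames), cv2.VideoWriter writes directly (skipping PNG writes), batch_size 8→32; estimated 30min→8-10min (~3x).
- [x] **Remotion concurrent rendering**: render.ts adds a concurrency parameter, raised from the default 8 to 16 (56-core CPU), estimated 5min→2-3min.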
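The duration-based routing described for the hybrid lip-sync setup can be sketched as a small pure function. This is a sketch under stated assumptions: the function name, the engine strings, and the mode names `default` / `fast` / `advanced` are hypothetical labels; only the rule itself (duration at or above the threshold goes to MuseTalk, LatentSync is the fallback) is taken from the changelog and the `.env` comment.

```python
def choose_lipsync_engine(duration_s, threshold_s=100.0, mode="default",
                          musetalk_available=True):
    """Pick a lip-sync engine name for one job (illustrative names only).

    mode: "default" (threshold routing), "fast" (force MuseTalk),
    or "advanced" (force LatentSync).
    """
    if mode == "advanced":
        wanted = "latentsync"
    elif mode == "fast":
        wanted = "musetalk"
    elif mode == "default":
        # Duration >= threshold goes to MuseTalk, shorter audio to LatentSync.
        wanted = "musetalk" if duration_s >= threshold_s else "latentsync"
    else:
        raise ValueError(f"unknown lipsync mode: {mode}")
    # LatentSync is the fallback whenever MuseTalk cannot be used.
    if wanted == "musetalk" and not musetalk_available:
        return "latentsync"
    return wanted
```

Keeping the decision in a side-effect-free function makes the threshold boundary easy to unit test separately from the GPU services.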
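### Day 26: Frontend Optimization: Section Merging + Numbered Headings + UI Refinement
- [x] **Section merging**: The home page's 9 independent sections were merged into 5 main sections (voiceover mode + voiceover list → 三、配音 (3. Voiceover); video materials + timeline → 四、素材编辑 (4. Material Editing); history + preview → 六、作品 (6. Works)).
- [x] **Chinese numeral headings**: Sections are numbered 一 to 十 (home page 一 to 六, publish page 七 to 十), and all emoji icons were removed.
- [x] **embedded mode**: 6 components support the `embedded` prop and skip the outer card/heading when embedded.
- [x] **Two-row voiceover list layout**: In embedded mode, row 1 holds speed + generate voiceover (right-aligned) and row 2 holds the voiceover list + refresh.
- [x] **Child components render their own subheadings**: When embedded, MaterialSelector/TimelineEditor render an h3 subheading on the same row as the action buttons.
- [x] **Dropdown alignment**: TitleSubtitlePanel labels use a uniform `w-20`, dropdowns use `w-1/3 min-w-[100px]`, vertically aligned.
- [x] **Reference audio copy simplified**: The bottom paragraph moved next to the heading and was shortened to `(上传3-10秒语音样本)`.
- [x] **Account phone number display**: AccountSettingsDropdown now shows the phone number.
- [x] **Title display mode applies to the secondary title**: Fixed the payload condition and moved the UI dropdown up to the section heading row.
- [x] **User info available immediately after login**: AuthContext exposes `setUser`; user data is written right after a successful login, fixing the "unknown account" label shown after login.
- [x] **Copy tweaks**: Material description changed to "Upload selfie videos, up to 4 can be selected"; display mode options gain a "title" prefix.
- [x] **UI/UX improvements**: Action buttons visible on mobile (opacity-40), phone number masking, title character counter, timeline drag-handle icon, enlarged trim sliders.
- [x] **Code quality fixes**: Password modal `success` state reset, MaterialSelector useMemo + disabled guard, TimelineEditor useMemo.
- [x] **Responsive publish page layout**: Platform account cards use a single-row layout, compact on mobile (small icons/buttons) and roomy on desktop (consistent with other sections).
- [x] **Mobile refresh returns to top**: `scrollRestoration = "manual"` + time-gated list scrolling (`scrollEffectsEnabled` ref, auto-scroll disabled for the first 1 second) + delayed fallback `scrollTo(0,0)`.
- [x] **Smaller mobile style preview**: FloatingStylePreview width reduced to 160px on mobile and moved to the bottom-right corner so it does not cover the style controls.
- [x] **List scrollbars uniformly hidden**: All lists (BGM / voiceover / works / materials / script extraction) use `hide-scrollbar` again.
- [x] **Mobile voiceover/material adaptation**: VoiceSelector buttons shrink on mobile (`px-2 sm:px-4`), fixing the invisible clone-voice option; the MaterialSelector heading row drops `whitespace-nowrap` and hides the description on mobile, fixing refresh button overflow.
- [x] **Larger generate-voiceover button**: Upgraded from secondary size (`text-xs px-2 py-1`) to primary size (`text-sm font-medium px-4 py-2`), with a shadow added.
- [x] **Generation progress bar repositioned**: Moved out of the 六、作品 (6. Works) card into a standalone card in the right column, shown above the works card so it is more prominent.
- [x] **LatentSync timeout fix**: httpx timeout raised from 1200s (20 minutes) to 3600s (1 hour), fixing the timeout fallback during lip-sync inference for videos longer than 2 minutes.
- [x] **Subtitle timestamp rhythm mapping**: `whisper_service.py` switched from global linear interpolation to per-word Whisper rhythm mapping, fixing subtitle drift in long videos.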
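### Day 25: Script Extraction Fix + Custom Prompts + Opening Secondary Title
- [x] **Douyin script extraction fix**: yt-dlp failed with a "Fresh cookies" error; `_download_douyin_manual` was rewritten to use the mobile share page plus an automatically fetched ttwid.
- [x] **DOUYIN_COOKIE cleanup**: The new approach no longer needs a manually maintained Cookie, so it was removed from `.env`/`config.py`/`service.py`.
- [x] **Custom prompt for AI rewriting**: Backend `rewrite_script()` accepts a `custom_prompt` parameter; the frontend adds a collapsible prompt editor next to the checkbox, persisted in localStorage.
- [x] **SSR build fix**: `localStorage` access in `useState` initializers is guarded with `typeof window`, fixing the `npm run build` error.
- [x] **Opening secondary title**: New secondary_title across backend/Remotion/frontend; AI generates it at the same time; independent style settings; 20-character limit.
- [x] **Frontend copy fix**: "AI 洗稿结果" → "AI 改写结果" (label for the AI rewrite result).
- [x] **yt-dlp upgrade**: `2025.12.08` → `2026.2.21`.
- [x] **Chinese filename fix for reference audio**: `sanitize_filename()` cleans storage paths to ASCII-safe characters, falls back to a hash for purely Chinese names, and keeps the original name as the display name.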
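The filename cleaning described in the last item can be sketched as follows. This is not the project's `sanitize_filename()`; the helper name, the allowed character set, and the 12-character SHA-256 prefix are assumptions chosen only to show the "ASCII-safe stem, hash fallback, original kept for display" idea.

```python
import hashlib
import re
from pathlib import PurePosixPath


def sanitize_storage_name(original_name):
    """Return (storage_name, display_name) for an uploaded file.

    Hypothetical helper: keeps ASCII letters, digits, '-' and '_' in the stem
    and falls back to a short hash when nothing ASCII-safe is left.
    """
    path = PurePosixPath(original_name)
    stem, suffix = path.stem, path.suffix.lower()
    safe_stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    if not safe_stem:
        # Purely non-ASCII stems (for example all-Chinese) get a stable hash.
        safe_stem = hashlib.sha256(stem.encode("utf-8")).hexdigest()[:12]
    safe_suffix = re.sub(r"[^a-z0-9.]", "", suffix)
    # The untouched original name is returned separately for display.
    return f"{safe_stem}{safe_suffix}", original_name
```

Hashing the stem rather than generating a random name keeps the mapping deterministic, so the same upload name always yields the same storage key in this sketch.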
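### Day 24: Auth Expiry Handling + Multi-Material Timeline Stability Fixes
- [x] **Membership expiry enforced at request time**: Login and auth endpoints all check `expires_at`; on expiry the account is deactivated, the session is cleared, and a "membership expired, please renew" message is returned.
- [x] **Aspect ratio control**: The timeline adds `9:16 / 16:9` output ratio selection, persisted in the frontend and passed to the backend; single- and multi-material flows both process at the target resolution.
- [x] **Title/subtitle overflow prevention**: Remotion and the frontend preview share responsive scaling, automatic wrapping, and proportional scaling of stroke/letter spacing/margins, reducing differences between preview and final video.
- [x] **Title display mode**: The title row adds a "short / persistent" dropdown (default short, 4 seconds); the user's choice is persisted and passed through to the Remotion render chain.
- [x] **MOV orientation normalization**: Added rotation metadata parsing and orientation normalization, fixing portrait misdetection caused by "landscape encoding + rotation metadata".
- [x] **Multi-material concat stability**: Segment prepare and concat are unified to 25fps/CFR; concat adds `+genpts`, mitigating the "frozen picture while lips keep moving" effect at segment switches.
- [x] **Timeline semantics alignment**: `source_end` is wired through the whole chain; fixed the duration calculation when `sourceStart>0` and `sourceEnd=0`; generation uses the timeline's visible-segment assignments, and out-of-range segments are excluded.
- [x] **Interaction details**: Page refresh returns to top; the first-round auto-scroll of material/history lists is suppressed, reducing page jumps when restoring state.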
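The `sourceStart>0` with `sourceEnd=0` case mentioned above is easy to get wrong, so here is a hedged sketch of one way to compute the trimmed length. It assumes that `source_end == 0` means "play to the end of the material"; the changelog does not state that semantics explicitly, and the function name is hypothetical.

```python
def visible_clip_duration(source_start, source_end, material_duration):
    """Length in seconds of the trimmed source range.

    Assumption (not confirmed by the changelog): source_end == 0 means
    "until the end of the material".
    """
    if source_start < 0 or material_duration <= 0:
        raise ValueError("invalid trim range")
    if source_end <= 0:
        end = material_duration          # open-ended trim
    else:
        end = min(source_end, material_duration)
    if end <= source_start:
        raise ValueError("trim range is empty")
    return end - source_start
```

A naive `source_end - source_start` would return a negative value for the open-ended case, which is the kind of error this guard avoids.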
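### Day 23: Voiceover-First Refactor + Material Timeline Arrangement + UI Improvements + Voice Cloning Enhancements
#### Phase 1: Voiceover First
- [x] **Standalone voiceover generation**: New `generated_audios` backend module (router/schemas/service) with 5 API endpoints, reusing the existing TTSService / voice_clone_service / task_store.
- [x] **Voiceover management panel**: The frontend adds a `useGeneratedAudios` hook + `GeneratedAudiosPanel` component, supporting generate/audition/rename/delete/select.
- [x] **UI panel reordering**: Script → title & subtitles → voiceover mode → voiceover list → material selection → BGM → generate video.
- [x] **Material area gating**: The material area shows a mask until a voiceover is selected; once selected it shows voiceover duration + even material split info.
- [x] **Video generation integration**: workflow.py adds a pre-generated audio branch (`generated_audio_id`) that skips inline TTS; backward compatible.
- [x] **Persistence**: selectedAudioId is added to useHomePersistence, so the selected voiceover is restored after refresh.
#### Phase 2: Material Timeline Arrangement
- [x] **Timeline editor**: New `TimelineEditor` component: wavesurfer.js audio waveform + color blocks that visualize material allocation; drag dividers to adjust segment durations.
- [x] **Clip trim settings**: New `ClipTrimmer` modal: HTML5 video preview + dual sliders for the source trim start/end.
- [x] **Backend custom assignments**: New `CustomAssignment` model; `prepare_segment` supports `source_start`; the multi- and single-material workflow pipelines support `custom_assignments`.
- [x] **Loop trim fix**: `stream_loop + source_start` changed to two steps (trim first, then loop), so looping starts from the trim start rather than from 0s.
- [x] **MaterialSelector slimmed down**: Removed the old duration info bar and drag-sort area (moved to TimelineEditor).
#### Phase 3: UI Improvements + TTS Stability
- [x] **TTS SoX PATH fix**: `run_qwen_tts.sh` exports the conda env bin to PATH (Qwen3-TTS is retired and has been replaced by CosyVoice 3.0).
- [x] **TTS VRAM management**: `torch.cuda.empty_cache()` after each generation; asyncio.to_thread avoids blocking the event loop (CosyVoice uses the same mechanism).
- [x] **Unified voiceover list buttons**: Play/Edit/Delete buttons are grouped on the right and shown on hover, consistent with RefAudioPanel; the script summary was removed.
- [x] **Material area ungated**: Removed the selectedAudio mask from MaterialSelector; materials can be uploaded and managed at any time.
- [x] **Timeline drag sorting**: TimelineEditor color blocks support HTML5 Drag & Drop to reorder materials.
- [x] **Range Slider for trim settings**: ClipTrimmer uses a single track with two handles (purple start + pink end), replacing two separate sliders.
- [x] **Video preview in trim settings**: The video area can play/pause, plays from sourceStart and stops automatically at sourceEnd, and seeks in real time while handles are dragged.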
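#### Phase 4: Script History + Bug Fixes
- [x] **Saving and loading script history**: New `useSavedScripts` hook; manual save/load/delete of past scripts, with separate localStorage persistence.
- [x] **Timeline drag fix**: `reorderSegments` changed from swapping properties to moving array items (splice), fixing the bug where durations did not follow materials after dragging.
- [x] **Consistent button visuals**: The 4 buttons in the script editor use a fixed height `h-7`; redundant `<span>` nesting was removed.
- [x] **Bottom bar adjustment**: The "Save script" button moved to the bottom right; the estimated duration display was removed.
#### Phase 5: Subtitle Language Mismatch + Video Aspect Misalignment Fixes
- [x] **Subtitles use the original text instead of the Whisper transcript**: `align()` adds an `original_text` parameter; subtitle text always uses the original script saved with the voiceover.
- [x] **Dynamic Remotion video size**: `calculateMetadata` reads the real dimensions from props, fixing title/subtitle scale misalignment.
- [x] **Lost English spaces fix**: `split_word_to_chars` flushes the buffer when it meets a space and sets a pending_space flag.
#### Phase 6: Automatic Reference Audio Transcription + Speed Control
- [x] **Whisper auto-transcribes ref_text**: When reference audio is uploaded, Whisper transcribes it and the result is used as ref_text, replacing the fixed frontend text.
- [x] **Automatic reference audio trimming**: Audio longer than 10 seconds is cut at a silence point (ffmpeg silencedetect), with a 0.1-second fade-out at the end to avoid a pop from truncation.
- [x] **Re-transcribe**: New `POST /ref-audios/{id}/retranscribe` endpoint + frontend RotateCw button; old audio can be re-transcribed and trimmed.
- [x] **Speed control**: The speed parameter runs through the whole chain (frontend selector → persistence → backend → CosyVoice `inference_zero_shot(speed=)`), with 5 levels: slower (0.8) / slightly slower (0.9) / normal (1.0) / slightly faster (1.1) / faster (1.2).
- [x] **Missing reference audio gating**: In voice-clone mode with no reference audio selected, the generate voiceover button is disabled and a yellow warning is shown.
- [x] **Whisper language auto-detection**: The `transcribe()` language parameter is now optional (default None = auto-detect), supporting multilingual reference audio.
- [x] **Frontend cleanup**: Removed the fixed ref_text constant and the read-aloud guidance; simplified to "Upload any voice sample; the system will recognize its content and clone the voice automatically".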
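The silence-point trimming in Phase 6 can be sketched as a function that reads `silencedetect` log output and picks a cut time. This is a sketch under stated assumptions: the 10-second cap is from the changelog, while the 3-second lower bound, the function name, and the "latest silence start wins" policy are hypothetical choices.

```python
import re

# silencedetect logs lines such as "silence_start: 8.75" on stderr.
SILENCE_START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")


def pick_cut_point(silencedetect_log, max_seconds=10.0, min_seconds=3.0):
    """Choose where to cut a reference clip that is longer than max_seconds.

    Picks the latest silence_start within [min_seconds, max_seconds] and
    falls back to a hard cut at max_seconds when no silence qualifies.
    """
    starts = [float(m.group(1)) for m in SILENCE_START_RE.finditer(silencedetect_log)]
    candidates = [t for t in starts if min_seconds <= t <= max_seconds]
    return max(candidates) if candidates else max_seconds
```

The returned time would then feed the actual trim and the short fade-out; those ffmpeg invocations are left out here because their exact arguments are not given in the source.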
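### Day 22: Multi-Material Optimization + AI Translation + Multilingual TTS
- [x] **Multi-material bug fixes**: 6 high-priority bugs (boundary overflow, single-segment fallback, division by zero, duration validation, Whisper fallback, empty list check).
- [x] **Architecture refactor**: Multi-material generation changed from "per-segment LatentSync" to "concatenate first, then infer", reducing inference runs from N to 1.
- [x] **Frontend improvements**: Payload safety, progress messages, auto-select after upload, unified Material interface, drag fix, material cap of 4.
- [x] **AI multilingual translation**: New `/api/ai/translate` endpoint; the frontend offers translation into 9 languages + restore original.
- [x] **Multilingual TTS**: EdgeTTS voice list for 10 languages, automatic voice switch on translation, language passthrough for voice cloning, textLang persistence.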
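### Day 21: Bug Fixes + Floating Preview + Publishing Refactor + Architecture Improvements + Multi-Material Generation
- [x] **Remotion crash tolerance**: When the render process exits with SIGABRT, the output file is checked, avoiding a false failure that would drop the title/subtitles.
- [x] **Home page work selection persistence**: Fixed `fetchGeneratedVideos` unconditionally overwriting the restored value; added a `preferVideoId` parameter to control selection.
- [x] **Publish page work selection persistence**: The root cause was unstable signed URLs; selection/persistence/comparison now use `video.id` instead of `path` throughout.
- [x] **Prefetch cache completion**: When the home page prefetches publish page data it now includes the `id` field, so cached data can be matched for persistence.
- [x] **Floating style preview window**: The title/subtitle preview became a `position: fixed` floating window pinned to the top-left corner, always visible while scrolling.
- [x] **Mobile adaptation**: ScriptEditor buttons wrap; the default preview ratio changed to 9:16 portrait.
- [x] **Multi-platform publishing refactor**: Per-platform configuration (DOUYIN_*/WEIXIN_*), user-isolated Cookie management, QR code for Douyin face verification, improved WeChat publishing flow.
- [x] **Frontend structure tweaks**: ScriptExtractionModal moved to features/, contexts moved to shared/contexts/, empty directories cleaned up.
- [x] **Backend module layering**: The materials/tools/ref_audios modules now have the full router+schemas+service layering.
- [x] **Development rules update**: BACKEND_DEV.md adds the incremental principle; DOC_RULES.md drops the manual-trigger constraint for TASK_COMPLETE.md.
- [x] **Full documentation update**: BACKEND_DEV/README, FRONTEND_DEV, DEPLOY_MANUAL, and README.md updated together.
- [x] **Multi-material video generation (multi-camera effect)**: Supports multi-select materials + drag sorting; audio duration is split evenly by material count (aligned to Whisper word boundaries) and the camera switches automatically. Per-segment LatentSync + FFmpeg concat. The frontend drag-sort UI uses @dnd-kit.
- [x] **Subtitle toggle removed**: Word-by-word highlighted subtitles are on by default; the toggle and related dead code were removed.
- [x] **More video formats**: Uploads support common formats such as mkv/webm/flv/wmv/m4v/ts/mts.
- [x] **Watchdog tuning**: Health check threshold raised to 5 and a 120-second restart cooldown added, avoiding false restarts.
- [x] **Multi-material bug fixes**: Fixed the punctuation-based sentence split failing for scripts without sentence-ending punctuation (switched to an even split), lip misalignment caused by audio time offsets, and other defects.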
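The "even split aligned to word boundaries" idea from the multi-material item can be sketched like this. The function name and the snapping window (half a segment on either side of the ideal cut) are assumptions; the changelog only states that the audio is divided evenly by material count and aligned to Whisper word boundaries.

```python
def split_points(total_duration, segment_count, word_end_times):
    """Return segment_count - 1 cut times for switching materials.

    Each ideal cut (an even share of the audio) is snapped to the nearest
    word end time within half a segment, so a switch does not land in the
    middle of a word. Hypothetical helper for illustration.
    """
    if segment_count < 1 or total_duration <= 0:
        raise ValueError("need a positive duration and at least one segment")
    half = total_duration / (2 * segment_count)
    cuts = []
    for i in range(1, segment_count):
        ideal = total_duration * i / segment_count
        # Windows around consecutive ideal cuts do not overlap, so the
        # resulting cuts stay strictly increasing.
        nearby = [t for t in word_end_times if abs(t - ideal) < half]
        cuts.append(min(nearby, key=lambda t: abs(t - ideal)) if nearby else ideal)
    return cuts
```

When no word timing is available near an ideal cut, the sketch simply keeps the even split, which matches the fallback behaviour described in the bug-fix item.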
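### Day 20: Code Quality and Security Improvements
- [x] **Functional fixes**: LatentSync fallback logic, authentication on the task status endpoint, unified User type.
- [x] **Performance**: N+1 query fix, streaming video upload handling, async httpx replacement, async wrapper for GLM.
- [x] **Security fixes**: Hard-coded Cookie made configurable, sensitive log data masked, safe ffprobe invocation, CORS made configurable.
- [x] **Configuration**: Storage paths via environment variables, Remotion precompilation speedup, absolute paths for LatentSync.
- [x] **Documentation**: Updated the DOC_RULES.md checklist and filled in backend and deployment docs; updated SUBTITLE_DEPLOY.md, FRONTEND_DEV.md, implementation_plan.md.
- [x] **Bug fixes**: Fixed Remotion path resolution, the publish page persistence race, the home page selection regression, and a material closure trap.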
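### Day 19: Auto-Publish Stability and Publishing Experience 🚀
- [x] **Douyin publishing stability**: Upload entry, cover flow, publish retry, login-expiry detection, and fast return on network failure were all hardened.
- [x] **WeChat Channels publishing fix**: Title + tags are written together into the "video description" field; the `post_create` success signal is detected quickly; timeouts now return failure.
- [x] **Success screenshot loop**: Douyin / WeChat Channels success screenshots are wired into the frontend, with user-isolated storage and authenticated access.
- [x] **Screenshot appearance**: The success screenshot is delayed by 3 seconds and captures the viewport, fixing the "content fills only 1/3 of the screenshot" problem.
- [x] **Debugging behind a switch**: Added screen recording configuration for WeChat Channels, toggled by environment variable, which makes failures easier to diagnose.
- [x] **Unified startup**: Merged into `run_backend.sh` (xvfb + headful) on a single port `8006`, reducing multi-process confusion.
- [x] **Publish page mis-operation protection**: While publishing, the button shows "Do not refresh or close the page" and refresh/close triggers a confirmation prompt.
- [ ] **Follow-up**: Publish task state recovery (task-based publishing + state persistence + frontend polling recovery).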
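### Day 18: Backend Modularization and Standards
- [x] **Modular migration**: Routes pass through to `modules/*`; business logic is concentrated in service/workflow.
- [x] **Video generation split**: The generation flow moved down into workflow; task state is unified in TaskStore.
- [x] **Redis task storage**: Redis first, with automatic fallback to memory when it is unavailable.
- [x] **Repository layer extraction**: Supabase access is unified under `repositories/*`; deps/auth/admin fully switched over.
- [x] **Response convention**: Unified `success/message/data/code` + global exception handling.
- [x] **Material renaming**: Added a rename endpoint and Storage `move_file`.
- [x] **Platform order**: Douyin / WeChat Channels / Bilibili / Xiaohongshu; Kuaishou removed.
- [x] **Backend development rules**: Added `BACKEND_DEV.md`; the README reflects the modular structure.
- [x] **Publish management experience**: The home page prefetches the route, and the publish page uses skeletons and caching for faster entry.
- [x] **Material loading**: The material list signs URLs concurrently; the skeleton count is dynamic.
- [x] **Preview loading**: `preload="metadata"` + prefetch on hover.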
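The unified `success/message/data/code` response convention can be illustrated with a minimal sketch. The helper names and the specific `code` values below are hypothetical; the project's own `success_response` helper may differ in signature and defaults.

```python
from typing import Any, Optional


def make_response(success: bool, message: str = "", data: Optional[Any] = None,
                  code: int = 0) -> dict:
    """Build the success/message/data/code envelope as a plain dict."""
    return {"success": success, "message": message, "data": data, "code": code}


def ok(data: Optional[Any] = None, message: str = "ok") -> dict:
    """Envelope for a successful call (code 0 is an assumed convention)."""
    return make_response(True, message, data, code=0)


def fail(message: str, code: int = 1) -> dict:
    """Envelope for a failed call; data is always None in this sketch."""
    return make_response(False, message, None, code)
```

With every endpoint returning the same four keys, the frontend can branch on `success` first and only then read `data`.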
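### Day 17: Frontend Refactor and UX Improvements
- [x] **UI component split**: The home page was split into independent components, reducing the complexity of `page.tsx`.
- [x] **Lightweight FSD migration**: `app` pages are thin, logic lives in `features/*/model`, and shared capabilities moved down to `shared/*`.
- [x] **Controller Hooks**: Home/Publish page logic is concentrated in Controller Hooks; the Page only composes and renders.
- [x] **Shared utilities**: `media.ts` unifies API Base / URL / date formatting.
- [x] **Interaction**: Persisted selections, scroll-to-item in lists, refresh returns to top, latest work previewed first.
- [x] **Publish page redesign**: Work list as cards + search + preview modal.
- [x] **Preview experience**: Preview modals share a header style and hint copy.
- [x] **Preview consistency**: Title/subtitle preview scales with the material resolution.
- [x] **Title sync and limit**: The opening title syncs to the publish title, works with IME composition, and is limited to 15 characters.
- [x] **Style defaults and persistence**: Default style and font size adjusted; user choices survive refresh.
- [x] **Minor performance work**: List rendering optimization + parallel requests + debounced localStorage.
- [x] **Asset capabilities**: Font/BGM asset library + `/api/assets` integration.
- [x] **Audio and subtitle fixes**: BGM mixing stability and subtitle sentence splitting.
- [x] **Persistence fix**: Integrated `useHomePersistence`, restored the `isRestored` logic, and the build passes.
- [x] **Preview and selection fixes**: Publish preview works with signed URLs; audio audition path resolution; material/BGM fall back to a valid item.
- [x] **UX details**: Recording preview URLs are revoked, preview modal scroll is restored, and the global task notice is mounted.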
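### Day 16: Deep Performance Optimization
- [x] **Qwen-TTS acceleration**: Integrated Flash Attention 2 (retired, replaced by CosyVoice 3.0).
- [x] **Service guard**: Built the `Watchdog` mechanism, which monitors and restarts hung services automatically.
- [x] **LatentSync performance check**: Verified that DeepCache + native Flash Attn are in effect.
- [x] **Documentation refactor**: Fully updated the README, deployment manual, and backend docs.
### Day 15: Phone Number Authentication Migration
- [x] **Auth system upgrade**: Migrated from email to 11-digit phone number registration/login.
- [x] **Account management**: Added password change, expiry display, and secure logout.
- [x] **AI script assistant**: Upgraded to GLM-4.7-Flash, supporting script extraction and rewriting from Bilibili/Douyin links.
### Day 14: AI Enhancements and UX Improvements
- [x] **AI titles/tags**: Integrated the GLM-4 API to generate video metadata automatically.
- [x] **Subtitle upgrade**: Remotion word-by-word highlighted subtitles (karaoke effect) and animated opening.
- [x] **Model upgrade**: Voice cloning has been migrated to CosyVoice 3.0 (0.5B).
### Day 13: Voice Cloning Integration
- [x] **Voice cloning microservice**: Wrapped CosyVoice 3.0 as a standalone API (port 8010, replacing Qwen3-TTS).
- [x] **Reference audio management**: Supabase storage bucket configuration and management endpoints.
- [x] **Multi-mode TTS**: The frontend supports switching between EdgeTTS / Clone Voice.
### Day 12: Mobile Adaptation
- [x] **iOS compatibility**: Fixed Safari safe area, status bar color, and Cookie blocking issues.
- [x] **Responsive UI**: Reworked the mobile Header and the publish page.
### Day 11: Upload Architecture Refactor
- [x] **Direct upload**: The frontend uploads directly to Supabase Storage, solving the Nginx 30s timeout.
- [x] **Data isolation**: User materials/videos are physically isolated by UserID.
### Day 10: HTTPS and Security
- [x] **HTTPS deployment**: Configured the SSL certificate and Nginx reverse proxy.
- [x] **Security hardening**: Supabase Studio is protected with Basic Auth.
### Day 9: Authentication System and Publishing Loop
- [x] **User system**: JWT authentication based on Supabase Auth.
- [x] **Publishing loop**: Verified the Bilibili / Douyin / Xiaohongshu auto-publish flows.
- [x] **Service self-healing**: Configured PM2 process supervision.
### Day 1-8: Core Feature Build
- [x] **Day 8**: History persistence and file management.
- [x] **Day 7**: Automatic social media login and multi-platform publishing.
- [x] **Day 6**: **LatentSync 1.6** upgrade and server deployment.
- [x] **Day 5**: Frontend video upload and progress feedback.
- [x] **Day 4**: MuseTalk (old version) lip-sync fix.
- [x] **Day 3**: Server environment setup and model weight download.
- [x] **Day 1-2**: Project skeleton (FastAPI + Next.js).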
---
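## 🛤️ Roadmap
### 🔴 Priority To-Do
- [x] ~~**Voiceover-first refactor, phase 2**: material clip trimming + voice timeline arrangement~~ ✅ Completed on Day 23
- [ ] **Batch generation architecture**: Support Excel import to produce videos in batches.
- [ ] **Move scheduled tasks to the backend**: Migrate frontend-triggered scheduled publishing to backend APScheduler.
- [ ] **Publish task recovery**: Task-based publishing + state persistence + frontend resume, solving state loss after refresh.
### 🔵 Long-Term Exploration
- [ ] **Containerized delivery**: Provide a complete one-click Docker Compose deployment package.
- [ ] **Distributed queue**: Introduce Celery + Redis to handle very high task concurrency.
---
## 📊 Module Completion
| Module | Progress | Status |
|------|------|------|
| **Core API** | 100% | ✅ Stable |
| **Web UI** | 100% | ✅ Stable (mobile adapted) |
| **Lip sync** | 100% | ✅ LatentSync 1.6 |
| **TTS voiceover** | 100% | ✅ EdgeTTS + CosyVoice 3.0 + voiceover first + timeline arrangement + auto transcription + speed control + tone control |
| **Auto publishing** | 100% | ✅ Douyin / WeChat Channels / Bilibili / Xiaohongshu |
| **User auth** | 100% | ✅ Phone number + JWT |
| **Paid membership** | 100% | ✅ Alipay PC website payment + automatic activation |
| **Deployment & ops** | 100% | ✅ PM2 + Watchdog |
---
## 📎 Related Documents
- [Detailed development logs (DevLogs)](Docs/DevLogs/)
- [Deployment manual (DEPLOY_MANUAL)](Docs/DEPLOY_MANUAL.md)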

View File

@@ -16,26 +16,31 @@
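## ✨ Features
### Core Capabilities
- 🎬 **HD lip sync** - Hybrid scheme: short videos (<120s) use LatentSync 1.6 (high-quality Latent Diffusion), long videos (>=120s) use MuseTalk 1.5 (real-time single-step inference), with automatic routing + fallback. Frontend model options: default (threshold auto-routing) / fast (force MuseTalk) / advanced (force LatentSync).
- 🎬 **HD lip sync** - Hybrid scheme: short videos (threshold 100s in this repo's current `.env`, configurable) use LatentSync 1.6 (high-quality Latent Diffusion), long videos use MuseTalk 1.5 (real-time single-step inference), with automatic routing + fallback. Frontend model options: default (threshold auto-routing) / fast (speed first) / advanced (quality first).
- 🧠 **Small-face lip-sync quality compensation (optional)** - The local lip-sync path supports a compensation chain of small-face detection + cropping + sparse keyframe super-resolution + lower-face paste-back; off by default (`LIPSYNC_SMALL_FACE_ENHANCE=false`), and failures fall back to the original flow automatically (fail-open).
- 🎙️ **Multi-mode voiceover** - Supports **EdgeTTS** (Microsoft natural-sounding voices, 10 languages) and **CosyVoice 3.0** (3-second fast voice cloning, 9 languages + 18 dialects, adjustable speed/tone). Uploaded reference audio is transcribed by Whisper and trimmed automatically. Voiceover-first workflow: generate voiceover → choose materials → generate video.
- 📝 **Smart subtitles** - Integrates faster-whisper + Remotion to generate word-by-word highlighted (karaoke-style) subtitles automatically.
- 🎨 **Style presets** - 12 title + 8 subtitle style presets, with preview + font size adjustment + custom font library. Native CSS stroke rendering, crisp with no ghosting.
- 🏷️ **Title display mode** - The opening title supports `短暂显示` (short) / `常驻显示` (persistent); the default is short (4 seconds), and the user preference is persisted automatically.
- 📌 **Opening secondary title** - Optional secondary title shown below the main title, with its own style settings; AI can generate it at the same time; 20-character limit.
- 🖼️ **Preview consistency** - The title/subtitle preview and the Remotion output share responsive scaling and automatic line wrapping, so narrow canvases display reliably too.
- 🎞️ **Multi-material multi-camera** - Multi-select materials + timeline editor (wavesurfer.js waveform visualization); drag dividers to adjust durations, drag to reorder and switch camera angles, trim clips by `source_start/source_end`.
- 🎞️ **Multi-material multi-camera** - Multi-select materials + timeline editor (wavesurfer.js waveform visualization); the main material loops continuously and floating insert-shot blocks can be overlaid freely; drag to move positions, ClipTrimmer edits trim range and duration in one place, and "Set as main material" switching is supported.
- 📐 **Aspect ratio control** - One-click switch between `9:16 / 16:9` output ratio on the timeline; the whole generation chain processes at the target ratio.
- 💾 **User preference persistence** - Home page state is restored/saved in one place, so the last configuration carries over after refresh. Script history is saved and loaded manually.
- 🎵 **Background music** - Audition + volume control + mixing, keeping the voiceover volume stable.
- 🤖 **AI-assisted creation** - Built-in GLM-4.7-Flash; supports script extraction from Bilibili/Douyin links, AI rewriting (with custom prompts), automatic title/tag generation, and translation into 9 languages.
- 💾 **User preference persistence** - Home page state is restored/saved in one place, so the last configuration carries over after refresh; a newly generated work is selected first, and later manual selections continue to be persisted.
- 🎵 **Background music** - Audition + search selection + mixing (the frontend currently uses a fixed mix coefficient to keep the voiceover volume stable).
- 🧩 **Unified selector interaction** - Selectors on the home/publish pages share SelectPopover (desktop Popover / mobile BottomSheet), supporting automatic upward opening, scroll-to-selected, and continuous preview.
- 🤖 **AI-assisted creation** - Built-in GLM-4.7-Flash; supports script extraction from Bilibili/Douyin links, AI rewriting (with custom prompts), script deep learning (creator topic analysis + script generation), automatic title/tag generation, and translation into 9 languages.
### Platform Features
- 📱 **Fully automated publishing** - Immediate publishing to Douyin / WeChat Channels / Bilibili / Xiaohongshu; QR-code login + Cookie persistence.
- 🖥️ **Publish management preview** - Supports previewing works via signed URLs / relative paths, ensuring direct playback.
- 📸 **Publish result visualization** - After a successful Douyin / WeChat Channels publish a screenshot is returned, viewable directly in the publish page result card.
- 📸 **Publish result visualization** - After a successful Douyin / WeChat Channels / Xiaohongshu publish a screenshot is returned, viewable directly in the publish page result card.
- 🧹 **Post-publish workspace cleanup prompt** - After all platforms publish successfully, a cleanup dialog that cannot be dismissed by accident appears (failures can be retried; after a threshold is reached cleanup can be skipped for now); only input content is cleared and user preferences are kept.
- ⬇️ **One-click direct download** - Downloads from the home page and the publish-success dialog both go through the same-origin `attachment` endpoint instead of opening the video in a new tab.
- 🛡️ **Publish mis-operation protection** - While publishing, a "Do not refresh or close the page" notice is shown and refresh/close triggers a confirmation prompt.
- 💳 **Paid membership** - Alipay PC website payment activates membership automatically; accounts are deactivated on expiry with a renewal prompt; manual activation by admins is also available.
- 🔐 **Auth and isolation** - Supabase-based user isolation, with phone-number registration/login and password management.
- 🛡️ **Security baseline** - AI/Tools endpoints require login, key upload paths enforce size limits, and production startup is blocked when the default secret is still in use.
- 🛡️ **Service guard** - The built-in Watchdog monitors and restarts hung services automatically for stable 7x24h operation.
- 🚀 **Performance optimization** - Encoding pipeline reduced from 5-6 lossy encodes to 3 (prepare_segment → model output → Remotion), compose stream copy without re-encoding, scale skipped at equal resolution, FFmpeg timeout protection, global video generation concurrency limit (Semaphore(2)), Remotion rendering with concurrency 4, MuseTalk rawvideo pipe direct encoding (no intermediate lossy files), resident model services, dual-GPU pipeline concurrency, automatic Redis task TTL cleanup, thread-pooled blocking workflow calls.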
@@ -61,10 +66,12 @@
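### Deployment & Operations
- **[Deployment manual (DEPLOY_MANUAL.md)](Docs/DEPLOY_MANUAL.md)** - 👈 **Start here for deployment!** Contains the complete environment setup steps.
- [Multi-platform publishing deployment (PUBLISH_DEPLOY.md)](Docs/PUBLISH_DEPLOY.md) - Dedicated guide for Douyin / WeChat Channels / Bilibili / Xiaohongshu login and automated publishing.
- [Reference audio service deployment (COSYVOICE3_DEPLOY.md)](Docs/COSYVOICE3_DEPLOY.md) - Voice cloning model deployment guide.
- [LatentSync deployment guide (LATENTSYNC_DEPLOY.md)](Docs/LATENTSYNC_DEPLOY.md) - Standalone deployment of the lip-sync model.
- [MuseTalk deployment guide (MUSETALK_DEPLOY.md)](Docs/MUSETALK_DEPLOY.md) - Long-video lip-sync model deployment.
- [Supabase deployment guide (SUPABASE_DEPLOY.md)](Docs/SUPABASE_DEPLOY.md) - Supabase and authentication system configuration
- [LatentSync deployment guide (LATENTSYNC_DEPLOY.md)](Docs/LATENTSYNC_DEPLOY.md) - Standalone deployment of the lip-sync model.
- [MuseTalk deployment guide (MUSETALK_DEPLOY.md)](Docs/MUSETALK_DEPLOY.md) - Long-video lip-sync model deployment.
- [Small-face lip-sync quality compensation deployment guide (FACEENHANCE_DEPLOY.md)](Docs/FACEENHANCE_DEPLOY.md) - Deployment and verification of the small-face compensation chain.
- [Supabase deployment guide (SUPABASE_DEPLOY.md)](Docs/SUPABASE_DEPLOY.md) - Supabase and authentication system configuration.
- [Alipay deployment guide (ALIPAY_DEPLOY.md)](Docs/ALIPAY_DEPLOY.md) - Alipay paid membership configuration.
### Development Docs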

View File

@@ -2,7 +2,7 @@
# Copy this file to .env and fill in real values
# Debug mode
DEBUG=true
DEBUG=false
# Redis configuration (Celery task queue)
REDIS_URL=redis://localhost:6379/0
@@ -83,6 +83,13 @@ MUSETALK_ENCODE_PRESET=slow
# Audio duration >= this threshold (seconds) uses MuseTalk; < this threshold uses LatentSync
LIPSYNC_DURATION_THRESHOLD=100
# =============== Small-face lip-sync quality compensation ===============
LIPSYNC_SMALL_FACE_ENHANCE=true
LIPSYNC_SMALL_FACE_THRESHOLD=256
LIPSYNC_SMALL_FACE_UPSCALER=gfpgan
LIPSYNC_SMALL_FACE_GPU_ID=0
LIPSYNC_SMALL_FACE_FAIL_OPEN=true
# =============== Upload configuration ===============
# Maximum upload file size (MB)
MAX_UPLOAD_SIZE_MB=500

View File

@@ -43,6 +43,16 @@ class Settings(BaseSettings):
DOUYIN_KEEP_SUCCESS_VIDEO: bool = False
DOUYIN_RECORD_VIDEO_WIDTH: int = 1280
DOUYIN_RECORD_VIDEO_HEIGHT: int = 720
# Xiaohongshu Playwright 配置
XIAOHONGSHU_HEADLESS_MODE: str = "headless-new"
XIAOHONGSHU_USER_AGENT: str = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/144.0.0.0 Safari/537.36"
XIAOHONGSHU_LOCALE: str = "zh-CN"
XIAOHONGSHU_TIMEZONE_ID: str = "Asia/Shanghai"
XIAOHONGSHU_CHROME_PATH: str = "/usr/bin/google-chrome"
XIAOHONGSHU_BROWSER_CHANNEL: str = ""
XIAOHONGSHU_FORCE_SWIFTSHADER: bool = True
XIAOHONGSHU_DEBUG_ARTIFACTS: bool = False
# TTS 配置
DEFAULT_TTS_VOICE: str = "zh-CN-YunxiNeural"
@@ -68,6 +78,13 @@ class Settings(BaseSettings):
# 混合唇形同步路由
LIPSYNC_DURATION_THRESHOLD: float = 120.0 # 秒,>=此值用 MuseTalk
# 小脸口型质量补偿链路
LIPSYNC_SMALL_FACE_ENHANCE: bool = False
LIPSYNC_SMALL_FACE_THRESHOLD: int = 256
LIPSYNC_SMALL_FACE_UPSCALER: str = "codeformer"
LIPSYNC_SMALL_FACE_GPU_ID: int = 0
LIPSYNC_SMALL_FACE_FAIL_OPEN: bool = True
# Supabase 配置
SUPABASE_URL: str = ""
SUPABASE_PUBLIC_URL: str = "" # 公网访问地址,用于生成前端可访问的 URL

View File

@@ -130,6 +130,20 @@ app.include_router(generated_audios_router, prefix="/api/generated-audios", tags
app.include_router(payment_router) # /api/payment
@app.on_event("startup")
async def check_jwt_secret():
    if settings.JWT_SECRET_KEY == "your-secret-key-change-in-production":
        if not settings.DEBUG:
            raise RuntimeError(
                "JWT_SECRET_KEY is still the default value! "
                "Set a strong random secret in .env before running in production (DEBUG=False)."
            )
        logger.critical(
            "JWT_SECRET_KEY is still the default value! "
            "Set a strong random secret in .env for production."
        )
@app.on_event("startup")
async def init_admin():
"""

View File

@@ -4,11 +4,12 @@ AI 相关 API 路由
from typing import Optional
from fastapi import APIRouter, HTTPException
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from loguru import logger
from app.services.glm_service import glm_service
from app.core.deps import get_current_user
from app.core.response import success_response
@@ -40,7 +41,7 @@ class TranslateRequest(BaseModel):
@router.post("/translate")
async def translate_text(req: TranslateRequest):
async def translate_text(req: TranslateRequest, current_user: dict = Depends(get_current_user)):
"""
AI 翻译文案
@@ -57,11 +58,11 @@ async def translate_text(req: TranslateRequest):
return success_response({"translated_text": translated})
except Exception as e:
logger.error(f"Translate failed: {e}")
raise HTTPException(status_code=500, detail=str(e))
raise HTTPException(status_code=500, detail="翻译服务暂时不可用,请稍后重试")
@router.post("/generate-meta")
async def generate_meta(req: GenerateMetaRequest):
async def generate_meta(req: GenerateMetaRequest, current_user: dict = Depends(get_current_user)):
"""
AI 生成视频标题和标签
@@ -80,11 +81,11 @@ async def generate_meta(req: GenerateMetaRequest):
).model_dump())
except Exception as e:
logger.error(f"Generate meta failed: {e}")
raise HTTPException(status_code=500, detail=str(e))
raise HTTPException(status_code=500, detail="生成标题标签失败,请稍后重试")
@router.post("/rewrite")
async def rewrite_script(req: RewriteRequest):
async def rewrite_script(req: RewriteRequest, current_user: dict = Depends(get_current_user)):
"""AI 改写文案"""
if not req.text or not req.text.strip():
raise HTTPException(status_code=400, detail="文案不能为空")
@@ -95,4 +96,4 @@ async def rewrite_script(req: RewriteRequest):
return success_response({"rewritten_text": rewritten})
except Exception as e:
logger.error(f"Rewrite failed: {e}")
raise HTTPException(status_code=500, detail=str(e))
raise HTTPException(status_code=500, detail="改写服务暂时不可用,请稍后重试")

View File

@@ -152,9 +152,9 @@ async def generate_audio_task(task_id: str, req: GenerateAudioRequest, user_id:
task_store.update(task_id, {
"status": "failed",
"message": f"配音生成失败: {str(e)}",
"error": traceback.format_exc(),
"error": str(e),
})
logger.error(f"Generate audio failed: {e}")
logger.error(f"Generate audio failed: {e}\n{traceback.format_exc()}")
async def list_generated_audios(user_id: str) -> dict:
@@ -215,6 +215,30 @@ async def list_generated_audios(user_id: str) -> dict:
return GeneratedAudioListResponse(items=items).model_dump()
async def delete_all_generated_audios(user_id: str) -> tuple[int, int]:
    """Delete all of the user's generated voiceovers (.wav + .json); return (deleted count, failed count)."""
    try:
        files = await storage_service.list_files(BUCKET, user_id, strict=True)
        deleted_count = 0
        failed_count = 0
        for f in files:
            name = f.get("name", "")
            if not name or name == ".emptyFolderPlaceholder":
                continue
            if name.endswith("_audio.wav") or name.endswith("_audio.json"):
                full_path = f"{user_id}/{name}"
                try:
                    await storage_service.delete_file(BUCKET, full_path)
                    deleted_count += 1
                except Exception as e:
                    failed_count += 1
                    logger.warning(f"Delete audio file failed: {full_path}, {e}")
        return deleted_count, failed_count
    except Exception as e:
        logger.error(f"Delete all generated audios failed: {e}")
        return 0, 1
async def delete_generated_audio(audio_id: str, user_id: str) -> None:
if not audio_id.startswith(f"{user_id}/"):
raise PermissionError("无权删除此文件")

View File

@@ -14,6 +14,8 @@ router = APIRouter()
@router.get("/stream/{material_id:path}")
async def stream_material(material_id: str, current_user: dict = Depends(get_current_user)):
"""直接流式返回素材文件(同源,避免 CORS canvas taint"""
if ".." in material_id:
raise HTTPException(400, "非法素材ID")
user_id = current_user["id"]
if not material_id.startswith(f"{user_id}/"):
raise HTTPException(403, "无权访问此素材")
@@ -52,6 +54,8 @@ async def delete_material(material_id: str, current_user: dict = Depends(get_cur
try:
await service.delete_material(material_id, user_id)
return success_response(message="素材已删除")
except ValueError as e:
raise HTTPException(400, str(e))
except PermissionError as e:
raise HTTPException(403, str(e))
except Exception as e:
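
The `..` check added in this hunk matters because a prefix test alone can be bypassed by an ID such as `u1/../u2/clip.mp4`. A minimal sketch of the combined check follows; the function name and the English messages are hypothetical, and only the two conditions are taken from the diff.

```python
def check_material_id(material_id: str, user_id: str) -> None:
    """Raise if material_id is unsafe or belongs to another user.

    Hypothetical helper mirroring the two guards in the diff.
    """
    # Reject any parent-directory segment before trusting the prefix.
    if ".." in material_id:
        raise ValueError("invalid material id")
    # Only IDs under the caller's own folder are allowed.
    if not material_id.startswith(f"{user_id}/"):
        raise PermissionError("material belongs to another user")
```

Raising `ValueError` for malformed IDs and `PermissionError` for foreign ones lets the router map them to 400 and 403 respectively, as the hunk above does.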

View File

@@ -7,6 +7,7 @@ import aiofiles
from pathlib import Path
from loguru import logger
from app.core.config import settings as app_settings
from app.services.storage import storage_service
@@ -123,6 +124,9 @@ async def upload_material(request, user_id: str) -> dict:
async for chunk in request.stream():
await f.write(chunk)
total_size += len(chunk)
max_bytes = app_settings.MAX_UPLOAD_SIZE_MB * 1024 * 1024
if total_size > max_bytes:
raise ValueError(f"文件大小超过限制 ({app_settings.MAX_UPLOAD_SIZE_MB}MB)")
if total_size - last_log > 20 * 1024 * 1024:
logger.info(f"Receiving stream... Processed {total_size / (1024*1024):.2f} MB")
@@ -239,6 +243,8 @@ async def list_materials(user_id: str) -> list[dict]:
async def delete_material(material_id: str, user_id: str) -> None:
"""删除素材"""
if ".." in material_id:
raise ValueError("非法素材ID")
if not material_id.startswith(f"{user_id}/"):
raise PermissionError("无权删除此素材")
await storage_service.delete_file(
@@ -249,6 +255,8 @@ async def delete_material(material_id: str, user_id: str) -> None:
async def rename_material(material_id: str, new_name_raw: str, user_id: str) -> dict:
"""重命名素材,返回更新后的素材信息"""
if ".." in material_id:
raise ValueError("非法素材ID")
if not material_id.startswith(f"{user_id}/"):
raise PermissionError("无权重命名此素材")

View File

@@ -104,6 +104,8 @@ async def upload_ref_audio(file, ref_text: str, user_id: str) -> dict:
# 创建临时文件
with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp_input:
content = await file.read()
if len(content) > 5 * 1024 * 1024:
raise ValueError("参考音频文件大小不能超过 5MB")
tmp_input.write(content)
tmp_input_path = tmp_input.name

View File

@@ -1,20 +1,54 @@
from fastapi import APIRouter, UploadFile, File, Form, HTTPException
from fastapi import APIRouter, Depends, UploadFile, File, Form, HTTPException
from typing import Optional
from urllib.parse import urlparse
import traceback
from loguru import logger
from pydantic import BaseModel, Field, field_validator
from app.core.deps import get_current_user
from app.core.response import success_response
from app.modules.tools import service
from app.services import creator_scraper
from app.services.creator_scraper import ALLOWED_INPUT_DOMAINS
from app.services.glm_service import glm_service
router = APIRouter()
class AnalyzeCreatorRequest(BaseModel):
    url: str = Field(..., description="博主主页链接(仅支持抖音/B站 https 链接)")

    @field_validator("url")
    @classmethod
    def validate_url_format(cls, value: str) -> str:
        candidate = value.strip()
        if len(candidate) > 500:
            raise ValueError("链接过长")
        parsed = urlparse(candidate)
        if parsed.scheme != "https":
            raise ValueError("仅支持 https 链接")
        hostname = (parsed.hostname or "").lower()
        if hostname not in ALLOWED_INPUT_DOMAINS:
            raise ValueError(f"不支持的域名: {hostname}仅支持抖音和B站")
        return candidate


class GenerateTopicScriptRequest(BaseModel):
    analysis_id: str = Field(..., min_length=8, max_length=80, description="分析结果ID")
    topic: str = Field(..., min_length=2, max_length=30, description="选中的话题2-30字")
    word_count: int = Field(..., ge=80, le=1000, description="目标字数80-1000")
@router.post("/extract-script")
async def extract_script_tool(
file: Optional[UploadFile] = File(None),
url: Optional[str] = Form(None),
rewrite: bool = Form(True),
custom_prompt: Optional[str] = Form(None)
custom_prompt: Optional[str] = Form(None),
current_user: dict = Depends(get_current_user),
):
"""独立文案提取工具"""
try:
@@ -29,5 +63,64 @@ async def extract_script_tool(
logger.error(traceback.format_exc())
msg = str(e)
if "Fresh cookies" in msg:
msg = "下载失败:目标平台开启了反爬验证,请过段时间重试或直接上传视频文件。"
raise HTTPException(500, f"提取失败: {msg}")
raise HTTPException(500, "下载失败:目标平台开启了反爬验证,请过段时间重试或直接上传视频文件。")
raise HTTPException(500, "文案提取失败,请稍后重试")
@router.post("/analyze-creator")
async def analyze_creator(
    req: AnalyzeCreatorRequest,
    current_user: dict = Depends(get_current_user),
):
    """分析博主内容并返回热门话题"""
    try:
        user_id = str(current_user.get("id") or "").strip()
        if not user_id:
            raise HTTPException(401, "登录状态无效,请重新登录")
        creator_result = await creator_scraper.scrape_creator_titles(req.url, user_id=user_id)
        titles = creator_result.get("titles") or []
        topics = await glm_service.analyze_topics(titles)
        analysis_id = creator_scraper.cache_titles(titles, user_id)
        return success_response({
            "platform": creator_result.get("platform", ""),
            "creator_name": creator_result.get("creator_name", ""),
            "topics": topics,
            "analysis_id": analysis_id,
            "fetched_count": creator_result.get("fetched_count", len(titles)),
        })
    except ValueError as e:
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Analyze creator failed: {e}")
        logger.error(traceback.format_exc())
        raise HTTPException(500, "博主内容分析失败,请稍后重试")


@router.post("/generate-topic-script")
async def generate_topic_script(
    req: GenerateTopicScriptRequest,
    current_user: dict = Depends(get_current_user),
):
    """根据话题生成文案"""
    try:
        user_id = str(current_user.get("id") or "").strip()
        if not user_id:
            raise HTTPException(401, "登录状态无效,请重新登录")
        titles = creator_scraper.get_cached_titles(req.analysis_id, user_id)
        script = await glm_service.generate_script_from_topic(req.topic, req.word_count, titles)
        return success_response({"script": script})
    except ValueError as e:
        raise HTTPException(400, str(e))
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Generate topic script failed: {e}")
        logger.error(traceback.format_exc())
        raise HTTPException(500, "文案生成失败,请稍后重试")

View File

@@ -8,7 +8,7 @@ import subprocess
import traceback
from pathlib import Path
from typing import Optional, Any
from urllib.parse import unquote
from urllib.parse import unquote, parse_qs, urlparse
import httpx
from loguru import logger
@@ -41,7 +41,19 @@ async def extract_script(file=None, url: Optional[str] = None, rewrite: bool = T
raise ValueError("文件名无效")
safe_filename = Path(filename).name.replace(" ", "_")
temp_path = temp_dir / f"tool_extract_{timestamp}_{safe_filename}"
await loop.run_in_executor(None, lambda: shutil.copyfileobj(file.file, open(temp_path, "wb")))
max_bytes = 500 * 1024 * 1024 # 500MB
total_written = 0
with open(temp_path, "wb") as dst:
while True:
chunk = file.file.read(1024 * 1024)
if not chunk:
break
total_written += len(chunk)
if total_written > max_bytes:
dst.close()
os.remove(temp_path)
raise ValueError("上传文件大小不能超过 500MB")
dst.write(chunk)
logger.info(f"Tool processing upload file: {temp_path}")
else:
temp_path = await _download_video(url, temp_dir, timestamp)
@@ -49,6 +61,13 @@ async def extract_script(file=None, url: Optional[str] = None, rewrite: bool = T
if not temp_path or not temp_path.exists():
raise ValueError("文件获取失败")
# 下载文件体积检查500MB 上限)
max_download_bytes = 500 * 1024 * 1024
file_size = temp_path.stat().st_size
if file_size > max_download_bytes:
os.remove(temp_path)
raise ValueError(f"下载的文件过大({file_size / (1024*1024):.0f}MB上限 500MB")
# 1.5 安全转换: 强制转为 WAV (16k)
audio_path = temp_dir / f"extract_audio_{timestamp}.wav"
try:
@@ -193,10 +212,9 @@ async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> O
logger.info(f"[douyin-fallback] Final URL: {final_url}")
video_id = None
match = re.search(r'/video/(\d+)', final_url)
if match:
video_id = match.group(1)
video_id = _extract_douyin_video_id(final_url)
if not video_id:
video_id = _extract_douyin_video_id(url)
if not video_id:
logger.error("[douyin-fallback] Could not extract video_id")
@@ -217,7 +235,8 @@ async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> O
"cbUrlProtocol": "https", "union": True,
}
)
ttwid = ttwid_resp.cookies.get("ttwid", "")
fresh_ttwid = ttwid_resp.cookies.get("ttwid")
ttwid = str(fresh_ttwid) if fresh_ttwid else ""
logger.info(f"[douyin-fallback] Got fresh ttwid (len={len(ttwid)})")
except Exception as e:
logger.warning(f"[douyin-fallback] Failed to get ttwid: {e}")
@@ -277,6 +296,39 @@ async def _download_douyin_manual(url: str, temp_dir: Path, timestamp: int) -> O
return None
def _extract_douyin_video_id(candidate_url: str) -> Optional[str]:
    """从抖音 URL 中提取视频 ID兼容 video/share/video/modal_id/vid 等形态"""
    if not candidate_url:
        return None
    decoded_url = unquote(candidate_url)
    parsed = urlparse(decoded_url)
    for source in (decoded_url, parsed.path):
        for pattern in (r"/video/(\d+)", r"/share/video/(\d+)"):
            match = re.search(pattern, source)
            if match:
                return match.group(1)
    id_keys = ("modal_id", "vid", "video_id", "aweme_id", "item_id")
    for pairs in (parse_qs(parsed.query), parse_qs(parsed.fragment)):
        for key in id_keys:
            values = pairs.get(key, [])
            for value in values:
                match = re.search(r"(\d+)", value)
                if match:
                    return match.group(1)
    inline_match = re.search(
        r"(?:[?&#](?:modal_id|vid|video_id|aweme_id|item_id)=)(\d+)",
        decoded_url,
    )
    if inline_match:
        return inline_match.group(1)
    return None
async def _download_bilibili_manual(url: str, temp_dir: Path, timestamp: int) -> Optional[Path]:
"""手动下载 Bilibili 视频 (Playwright Fallback)"""
from playwright.async_api import async_playwright

View File

@@ -1,17 +1,80 @@
from fastapi import APIRouter, BackgroundTasks, Depends
import os
import re
import tempfile
import uuid
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException
from fastapi.responses import FileResponse
from loguru import logger
from starlette.background import BackgroundTask
from app.core.deps import get_current_user
from app.core.response import success_response
from app.services.tts_service import TTSService
from .schemas import GenerateRequest
from .schemas import GenerateRequest, VoicePreviewRequest
from .task_store import create_task, get_task, list_tasks
from .workflow import process_video_generation, get_lipsync_health, get_voiceclone_health
from .service import list_generated_videos, delete_generated_video
from .service import list_generated_videos, delete_generated_video, delete_all_generated_videos
from app.modules.generated_audios.service import delete_all_generated_audios
from app.services.storage import storage_service
router = APIRouter()
PREVIEW_TEXTS = {
"zh-CN": "你好,请选择你喜欢的音色吧。",
"en-US": "Hello, please choose the voice you like.",
"ja-JP": "こんにちは。お好きな音声を選んでください。",
"ko-KR": "안녕하세요, 마음에 드는 음성을 선택해 주세요.",
"fr-FR": "Bonjour, veuillez choisir la voix que vous preferez.",
"de-DE": "Hallo, bitte waehlen Sie die Stimme, die Ihnen gefaellt.",
"es-ES": "Hola, por favor elige la voz que mas te guste.",
"ru-RU": "Zdravstvuite, pozhaluista, vyberite golos, kotoryi vam nravitsya.",
"it-IT": "Ciao, scegli la voce che preferisci.",
"pt-BR": "Ola, escolha a voz de que voce mais gosta.",
}
def _cleanup_temp_file(path: str) -> None:
try:
os.unlink(path)
except Exception:
pass
def _get_voice_locale(voice: str) -> str:
parts = voice.split("-")
if len(parts) >= 2:
return f"{parts[0]}-{parts[1]}"
return "zh-CN"
def _get_preview_text_for_voice(voice: str) -> str:
locale = _get_voice_locale(voice)
return PREVIEW_TEXTS.get(locale, PREVIEW_TEXTS["zh-CN"])
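The locale fallback used for voice previews can be illustrated on its own. This sketch assumes voice identifiers shaped like `en-US-GuyNeural` (the exact naming scheme is not shown in the diff) and keeps only two entries of the preview table.

```python
# Trimmed copy of the preview table; the full table in the diff has ten locales.
PREVIEW_TEXTS = {
    "zh-CN": "你好,请选择你喜欢的音色吧。",
    "en-US": "Hello, please choose the voice you like.",
}


def get_voice_locale(voice: str) -> str:
    """Take the first two dash-separated parts as the locale, else default to zh-CN."""
    parts = voice.split("-")
    if len(parts) >= 2:
        return f"{parts[0]}-{parts[1]}"
    return "zh-CN"


def get_preview_text_for_voice(voice: str) -> str:
    """Look up the preview sentence; unknown locales fall back to the zh-CN text."""
    return PREVIEW_TEXTS.get(get_voice_locale(voice), PREVIEW_TEXTS["zh-CN"])
```

A voice whose locale is missing from the table receives the Chinese sentence, which a non-Chinese voice may pronounce poorly; whether that matters depends on which voices the frontend actually exposes.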
async def _render_voice_preview(voice: str, text: str) -> FileResponse:
tmp_file = tempfile.NamedTemporaryFile(prefix="voice_preview_", suffix=".mp3", delete=False)
output_path = tmp_file.name
tmp_file.close()
tts = TTSService()
try:
await tts.generate_audio(text=text, voice=voice, output_path=output_path)
except Exception as e:
_cleanup_temp_file(output_path)
logger.error(f"音色试听生成失败: voice={voice}, error={e}")
raise HTTPException(status_code=500, detail="音色试听生成失败,请稍后重试")
return FileResponse(
path=output_path,
media_type="audio/mpeg",
filename="voice_preview.mp3",
background=BackgroundTask(_cleanup_temp_file, output_path),
)
@router.post("/generate")
async def generate_video(
@@ -53,12 +116,91 @@ async def voiceclone_health():
return success_response(await get_voiceclone_health())
@router.post("/cleanup")
async def cleanup_workspace(current_user: dict = Depends(get_current_user)):
user_id = current_user["id"]
videos_deleted, videos_failed = await delete_all_generated_videos(user_id)
audios_deleted, audios_failed = await delete_all_generated_audios(user_id)
if videos_failed > 0 or audios_failed > 0:
raise HTTPException(
status_code=500,
detail=(
f"工作区清理不完整:视频删除失败 {videos_failed} 个,"
f"配音删除失败 {audios_failed} 个,请重试"
),
)
return success_response({
"videos_deleted": videos_deleted,
"audios_deleted": audios_deleted,
}, message="工作区已清理")
@router.get("/generated")
async def list_generated(current_user: dict = Depends(get_current_user)):
return success_response(await list_generated_videos(current_user["id"]))
@router.get("/generated/{video_id}/download")
async def download_generated(video_id: str, current_user: dict = Depends(get_current_user)):
if not re.match(r'^[A-Za-z0-9_-]+$', video_id):
raise HTTPException(status_code=400, detail="非法 video_id")
user_id = current_user["id"]
storage_path = f"{user_id}/{video_id}.mp4"
local_path = storage_service.get_local_file_path(
bucket=storage_service.BUCKET_OUTPUTS,
path=storage_path,
)
if not local_path or not os.path.exists(local_path):
raise HTTPException(status_code=404, detail="视频文件不存在")
return FileResponse(
path=local_path,
media_type="video/mp4",
filename=f"{video_id}.mp4",
headers={"Content-Disposition": f'attachment; filename="{video_id}.mp4"'},
)
@router.delete("/generated/{video_id}")
async def delete_generated(video_id: str, current_user: dict = Depends(get_current_user)):
if not re.match(r'^[A-Za-z0-9_-]+$', video_id):
raise HTTPException(status_code=400, detail="非法 video_id")
result = await delete_generated_video(current_user["id"], video_id)
return success_response(result, message="视频已删除")
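The `video_id` checks in the download and delete handlers rely on `re.match` with a `$` anchor. In Python, `$` also matches just before a trailing newline, so a value ending in `\n` passes that check. The sketch below compares it with `re.fullmatch`, which may be the stricter option if such input can reach the handler; the helper names are hypothetical.

```python
import re

VIDEO_ID_PATTERN = r"[A-Za-z0-9_-]+"


def is_valid_with_match(video_id: str) -> bool:
    """Validation as written in the diff: anchored with ^ and $."""
    return re.match(r"^[A-Za-z0-9_-]+$", video_id) is not None


def is_valid_with_fullmatch(video_id: str) -> bool:
    """Stricter variant: the whole string must consist of allowed characters."""
    return re.fullmatch(VIDEO_ID_PATTERN, video_id) is not None
```

Both helpers agree on ordinary IDs and on path-traversal attempts such as `../x`; they differ only for a trailing newline.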
@router.post("/voice-preview")
async def preview_voice_post(
req: VoicePreviewRequest,
current_user: dict = Depends(get_current_user),
):
# Reuse the shared auth dependency; the endpoint itself does not need user_id
_ = current_user
voice = req.voice.strip()
text = req.text.strip()
if not voice:
raise HTTPException(status_code=400, detail="voice 不能为空")
if not text:
raise HTTPException(status_code=400, detail="text 不能为空")
return await _render_voice_preview(voice=voice, text=text)
@router.get("/voice-preview")
async def preview_voice_get(
voice: str,
current_user: dict = Depends(get_current_user),
):
# Reuse the shared auth dependency; the endpoint itself does not need user_id
_ = current_user
voice_value = voice.strip()
if not voice_value:
raise HTTPException(status_code=400, detail="voice 不能为空")
text = _get_preview_text_for_voice(voice_value)
return await _render_voice_preview(voice=voice_value, text=text)

View File

@@ -1,4 +1,4 @@
from pydantic import BaseModel
from pydantic import BaseModel, Field
from typing import Optional, List, Literal
@@ -39,3 +39,8 @@ class GenerateRequest(BaseModel):
custom_assignments: Optional[List[CustomAssignment]] = None
output_aspect_ratio: Literal["9:16", "16:9"] = "9:16"
lipsync_model: Literal["default", "fast", "advanced"] = "default"
class VoicePreviewRequest(BaseModel):
voice: str
text: str = Field(..., min_length=1, max_length=120)

View File

@@ -73,6 +73,36 @@ async def list_generated_videos(user_id: str) -> dict:
return {"videos": []}
async def delete_all_generated_videos(user_id: str) -> tuple[int, int]:
"""Delete all of the user's generated videos; returns (deleted_count, failed_count)."""
try:
files = await storage_service.list_files(
bucket=storage_service.BUCKET_OUTPUTS,
path=user_id,
strict=True,
)
deleted_count = 0
failed_count = 0
for f in files:
name = f.get("name")
if not name or name == ".emptyFolderPlaceholder":
continue
full_path = f"{user_id}/{name}"
try:
await storage_service.delete_file(
bucket=storage_service.BUCKET_OUTPUTS,
path=full_path
)
deleted_count += 1
except Exception as e:
failed_count += 1
logger.warning(f"Delete file failed: {full_path}, {e}")
return deleted_count, failed_count
except Exception as e:
logger.error(f"Delete all generated videos failed: {e}")
return 0, 1
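The `(deleted, failed)` counting behind the strict cleanup semantics can be exercised without Supabase. This is a sketch against a hypothetical in-memory `FakeStorage`; the real `storage_service` signatures (`bucket=`, `strict=`) are not reproduced here.

```python
import asyncio
from typing import Dict, List, Set, Tuple


class FakeStorage:
    """Hypothetical in-memory stand-in for the storage service."""

    def __init__(self, files: List[str], broken: Set[str]) -> None:
        self.files = list(files)
        self.broken = set(broken)

    async def list_files(self, prefix: str) -> List[Dict[str, str]]:
        # Return a fresh list so deletions during iteration are safe.
        return [{"name": name} for name in self.files]

    async def delete_file(self, path: str) -> None:
        name = path.split("/", 1)[1]
        if name in self.broken:
            raise RuntimeError(f"cannot delete {path}")
        self.files.remove(name)


async def delete_all(storage: FakeStorage, user_id: str) -> Tuple[int, int]:
    """Delete every file under the user's prefix and count successes and failures."""
    deleted = failed = 0
    for entry in await storage.list_files(user_id):
        name = entry.get("name")
        # Skip the folder placeholder object, as the diff does.
        if not name or name == ".emptyFolderPlaceholder":
            continue
        try:
            await storage.delete_file(f"{user_id}/{name}")
            deleted += 1
        except Exception:
            failed += 1
    return deleted, failed
```

A caller that wants the strict semantics described for `/api/videos/cleanup` would treat any `failed > 0` as an error and let the user retry.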
async def delete_generated_video(user_id: str, video_id: str) -> dict:
"""Delete a generated video."""
try:

View File

@@ -188,16 +188,16 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us
try:
start_time = time.time()
# ── Determine the material list ──
# ── Determine the material list (prefer the deduplicated req.material_paths list) ──
material_paths: List[str] = []
if req.custom_assignments and len(req.custom_assignments) > 1:
material_paths = [a.material_path for a in req.custom_assignments if a.material_path]
elif req.material_paths and len(req.material_paths) > 1:
if req.material_paths and len(req.material_paths) >= 1:
material_paths = req.material_paths
else:
material_paths = [req.material_path]
is_multi = len(material_paths) > 1
is_multi = len(material_paths) > 1 or (
req.custom_assignments is not None and len(req.custom_assignments) > 1
)
target_resolution = (1080, 1920) if req.output_aspect_ratio == "9:16" else (1920, 1080)
logger.info(
@@ -341,8 +341,18 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us
# ══════════════════════════════════════
_update_task(task_id, progress=12, message="正在分配素材...")
if req.custom_assignments and len(req.custom_assignments) == len(material_paths):
# User-defined assignments; skip the Whisper-based even split
if req.custom_assignments and len(req.custom_assignments) >= 1:
# Enforce the hard upper limit
if len(req.custom_assignments) > 50:
raise ValueError(f"custom_assignments 数量超限: {len(req.custom_assignments)}")
# Verify that every assignment's material_path is among the material_paths declared by the frontend
known_paths = set(material_paths)
unknown = [a.material_path for a in req.custom_assignments if a.material_path not in known_paths]
if unknown:
logger.warning(f"[MultiMat] custom_assignments 包含未知素材路径: {unknown[:3]},终止生成")
raise ValueError(f"素材路径校验失败: 包含 {len(unknown)} 个未知路径")
# User-defined assignments (multi-shot mode: the primary material may appear more than once)
assignments = [
{
"material_path": a.material_path,
@@ -373,20 +383,13 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us
captions_path = None
else:
captions_path = None
elif req.custom_assignments:
logger.warning(
f"[MultiMat] custom_assignments 数量({len(req.custom_assignments)})"
f" 与素材数量({len(material_paths)})不一致,回退自动分配"
)
assignments, captions_path = await _whisper_and_split()
else:
assignments, captions_path = await _whisper_and_split()
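# Extend the segments to cover the full audio range: the first segment starts at 0 and the last ends at the end of the audio
# Extend the segments to cover the full audio range (auto-split only; custom assignments are already computed precisely)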
audio_duration = await _run_blocking(video._get_duration, str(audio_path))
if assignments and audio_duration > 0:
if not req.custom_assignments and assignments and audio_duration > 0:
assignments[0]["start"] = 0.0
assignments[-1]["end"] = audio_duration
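The `custom_assignments` checks earlier in this file's diff (a cap of 50 entries, and every `material_path` having to appear in the declared `material_paths`) can be sketched as a pure function. The function name and error messages are illustrative assumptions; only the cap and the membership rule come from the diff.

```python
from typing import Iterable, List

MAX_ASSIGNMENTS = 50  # hard upper limit taken from the diff


def validate_assignment_paths(
    assignment_paths: List[str], material_paths: Iterable[str]
) -> None:
    """Raise ValueError if there are too many assignments or any path is undeclared."""
    if len(assignment_paths) > MAX_ASSIGNMENTS:
        raise ValueError(f"too many custom assignments: {len(assignment_paths)}")

    known_paths = set(material_paths)
    unknown = [path for path in assignment_paths if path not in known_paths]
    if unknown:
        raise ValueError(f"{len(unknown)} assignment path(s) are not in material_paths")
```

Repeated paths are allowed on purpose: in multi-shot mode the same material may back several segments.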
@@ -398,65 +401,73 @@ async def _process_video_generation_inner(task_id: str, req: GenerateRequest, us
lipsync_start = time.time()
# ── Step 1: download all materials in parallel and detect their resolutions ──
material_locals: List[Path] = []
resolutions = []
# Concurrency limit (one Semaphore per task; peak of 2×4=8 ffmpeg processes)
_segment_sem = asyncio.Semaphore(4)
async def _download_and_normalize(i: int, assignment: dict):
"""Download a single material and normalize its orientation."""
material_local = temp_dir / f"{task_id}_material_{i}.mp4"
temp_files.append(material_local)
await _download_material(assignment["material_path"], material_local)
# ── Step 1: download each distinct material once and detect its resolution ──
unique_paths = list(dict.fromkeys(a["material_path"] for a in assignments))
path_to_local: dict = {} # material_path → local file
path_to_res: dict = {} # material_path → resolution
normalized_material = temp_dir / f"{task_id}_material_{i}_norm.mp4"
normalized_result = await _run_blocking(
video.normalize_orientation,
str(material_local),
str(normalized_material),
)
if normalized_result != str(material_local):
temp_files.append(normalized_material)
material_local = normalized_material
async def _download_unique(mat_path: str, idx: int):
"""Download one distinct material and normalize its orientation."""
async with _segment_sem:
material_local = temp_dir / f"{task_id}_material_{idx}.mp4"
temp_files.append(material_local)
await _download_material(mat_path, material_local)
res = video.get_resolution(str(material_local))
return material_local, res
normalized_material = temp_dir / f"{task_id}_material_{idx}_norm.mp4"
normalized_result = await _run_blocking(
video.normalize_orientation,
str(material_local),
str(normalized_material),
)
if normalized_result != str(material_local):
temp_files.append(normalized_material)
material_local = normalized_material
download_tasks = [
_download_and_normalize(i, assignment)
for i, assignment in enumerate(assignments)
]
download_results = await asyncio.gather(*download_tasks)
for local, res in download_results:
material_locals.append(local)
resolutions.append(res)
res = video.get_resolution(str(material_local))
return mat_path, material_local, res
download_results = await asyncio.gather(*[
_download_unique(p, i) for i, p in enumerate(unique_paths)
])
for mat_path, local, res in download_results:
path_to_local[mat_path] = local
path_to_res[mat_path] = res
logger.info(f"[MultiMat] 去重下载 {len(unique_paths)} 个素材(共 {num_segments} 个段)")
# Unify the resolution to the aspect ratio chosen by the user
base_res = target_resolution
need_scale = any(r != base_res for r in resolutions)
need_scale = any(r != base_res for r in path_to_res.values())
if need_scale:
logger.info(f"[MultiMat] 素材分辨率不一致,统一到 {base_res[0]}x{base_res[1]}")
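# ── Step 2: trim each segment's material to its duration in parallel ──
# ── Step 2: trim each segment's material to its duration in parallel (look up the downloaded file through the mapping) ──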
prepared_segments: List[Optional[Path]] = [None] * num_segments
async def _prepare_one_segment(i: int, assignment: dict):
"""Trim or loop a single material to the segment's duration."""
seg_dur = assignment["end"] - assignment["start"]
prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
temp_files.append(prepared_path)
prepare_target_res = None if resolutions[i] == base_res else base_res
async with _segment_sem:
seg_dur = assignment["end"] - assignment["start"]
prepared_path = temp_dir / f"{task_id}_prepared_{i}.mp4"
temp_files.append(prepared_path)
mat_local = path_to_local[assignment["material_path"]]
mat_res = path_to_res[assignment["material_path"]]
prepare_target_res = None if mat_res == base_res else base_res
await _run_blocking(
video.prepare_segment,
str(material_locals[i]),
seg_dur,
str(prepared_path),
prepare_target_res,
assignment.get("source_start", 0.0),
assignment.get("source_end"),
25,
)
return i, prepared_path
await _run_blocking(
video.prepare_segment,
str(mat_local),
seg_dur,
str(prepared_path),
prepare_target_res,
assignment.get("source_start", 0.0),
assignment.get("source_end"),
25,
)
return i, prepared_path
_update_task(
task_id,
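The change from per-segment downloads to deduplicated downloads rests on two small techniques: `dict.fromkeys` for order-preserving deduplication, and a shared `asyncio.Semaphore` to cap concurrency. The sketch below shows both with a caller-supplied `fetch` coroutine standing in for the real download and normalization steps, which is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def download_unique(
    assignments: List[dict],
    fetch: Callable[[str, int], Awaitable[str]],
    limit: int = 4,
) -> Dict[str, str]:
    """Fetch each distinct material_path once and map it to its local result."""
    semaphore = asyncio.Semaphore(limit)
    # dict.fromkeys keeps first-seen order while dropping repeats.
    unique_paths = list(dict.fromkeys(a["material_path"] for a in assignments))

    async def fetch_one(path: str, index: int):
        async with semaphore:  # at most `limit` fetches run at the same time
            return path, await fetch(path, index)

    results = await asyncio.gather(
        *[fetch_one(path, index) for index, path in enumerate(unique_paths)]
    )
    return dict(results)
```

Segments then look up their source through the returned mapping, so a material reused by several segments is downloaded once.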

File diff suppressed because it is too large.

View File

@@ -3,8 +3,10 @@ GLM AI 服务
Uses Zhipu GLM to generate titles and tags
"""
import asyncio
import json
import re
from typing import Any, Optional, cast
from loguru import logger
from zai import ZhipuAiClient
@@ -25,6 +27,48 @@ class GLMService:
self.client = ZhipuAiClient(api_key=settings.GLM_API_KEY)
return self.client
async def _call_glm(
self,
*,
prompt: str,
max_tokens: int,
temperature: float,
action: str,
timeout_seconds: float = 85.0,
) -> str:
"""Single entry point for GLM calls, to avoid duplicated call code."""
client = self._get_client()
logger.info(
f"{action} | model={settings.GLM_MODEL} | max_tokens={max_tokens} | temperature={temperature}"
)
try:
response = await asyncio.wait_for(
asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"},
max_tokens=max_tokens,
temperature=temperature,
),
timeout=timeout_seconds,
)
except asyncio.TimeoutError as exc:
raise Exception("GLM 请求超时,请稍后重试") from exc
completion = cast(Any, response)
choices = getattr(completion, "choices", None)
if not choices:
raise Exception("AI 返回内容为空")
message = getattr(choices[0], "message", None)
content = getattr(message, "content", "")
text = content.strip() if isinstance(content, str) else str(content or "").strip()
if not text:
raise Exception("AI 返回内容为空")
return text
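`_call_glm` combines `asyncio.to_thread` (to keep a synchronous SDK call off the event loop) with `asyncio.wait_for` (to bound the wait). A minimal sketch of that combination follows, with `slow_sdk_call` as a hypothetical stand-in for the SDK; it requires Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio
import time


def slow_sdk_call(delay: float) -> str:
    """Hypothetical blocking call that stands in for a synchronous SDK request."""
    time.sleep(delay)
    return "ok"


async def call_with_timeout(delay: float, timeout_seconds: float) -> str:
    """Run the blocking call in a worker thread and stop waiting after the timeout."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(slow_sdk_call, delay),
            timeout=timeout_seconds,
        )
    except asyncio.TimeoutError as exc:
        raise RuntimeError("request timed out") from exc
```

One caveat worth keeping in mind: the timeout only stops the coroutine from waiting. The worker thread cannot be interrupted and keeps running until the blocking call returns, so a timed-out request may still consume a thread and an upstream connection for a while.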
async def generate_title_tags(self, text: str) -> dict:
"""
Generate a title and tags from the voice-over script
@@ -50,22 +94,13 @@ class GLMService:
{{"title": "标题", "secondary_title": "副标题", "tags": ["标签1", "标签2", "标签3"]}}"""
try:
client = self._get_client()
logger.info(f"Calling GLM API with model: {settings.GLM_MODEL}")
# Wrap the synchronous SDK call in asyncio.to_thread to avoid blocking the event loop
import asyncio
response = await asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"}, # disable thinking mode for faster responses
content = await self._call_glm(
prompt=prompt,
max_tokens=500,
temperature=0.7
temperature=0.7,
action="生成标题与标签",
timeout_seconds=75.0,
)
# Extract the generated content
content = response.choices[0].message.content
logger.info(f"GLM response (model: {settings.GLM_MODEL}): {content}")
# Parse the JSON
@@ -76,7 +111,7 @@ class GLMService:
logger.error(f"GLM service error: {e}")
raise Exception(f"AI 生成失败: {str(e)}")
async def rewrite_script(self, text: str, custom_prompt: str = None) -> str:
async def rewrite_script(self, text: str, custom_prompt: Optional[str] = None) -> str:
"""
Rewrite the script with AI
@@ -105,28 +140,126 @@ class GLMService:
4. 不要返回多余的解释,只返回改写后的正文"""
try:
client = self._get_client()
logger.info(f"Using GLM to rewrite script")
# Wrap the synchronous SDK call in asyncio.to_thread to avoid blocking the event loop
import asyncio
response = await asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"},
content = await self._call_glm(
prompt=prompt,
max_tokens=2000,
temperature=0.8
temperature=0.8,
action="改写文案",
timeout_seconds=85.0,
)
content = response.choices[0].message.content
logger.info("GLM rewrite completed")
return content.strip()
return content
except Exception as e:
logger.error(f"GLM rewrite error: {e}")
raise Exception(f"AI 改写失败: {str(e)}")
async def analyze_topics(self, titles: list[str]) -> list[str]:
"""
Analyze a list of video titles and summarize the trending topics (at most 10)
"""
cleaned_titles = [str(title).strip() for title in titles if str(title).strip()]
if not cleaned_titles:
raise Exception("标题列表为空")
limited_titles = cleaned_titles[:50]
titles_text = "\n".join(f"{idx + 1}. {title}" for idx, title in enumerate(limited_titles))
prompt = f"""以下是某短视频博主最近发布的视频标题列表:
{titles_text}
请分析这些标题,归纳总结出该博主内容中最热门的话题方向。
要求:
1. 提取不超过10个话题方向
2. 每个话题用简短短语描述(建议 5-15 字)
3. 按热门程度排序(出现频率高的在前)
4. 只返回话题列表,每行一个,不要编号、解释或多余内容"""
try:
content = await self._call_glm(
prompt=prompt,
max_tokens=500,
temperature=0.5,
action="分析博主话题",
timeout_seconds=85.0,
)
topics = self._parse_topic_lines(content)
if not topics:
raise Exception("未识别到有效话题")
logger.info(f"GLM topic analysis completed: {len(topics)} topics")
return topics[:10]
except Exception as e:
logger.error(f"GLM topic analysis error: {e}")
raise Exception(f"话题分析失败: {str(e)}")
async def generate_script_from_topic(self, topic: str, word_count: int, titles: list[str]) -> str:
"""
Generate a script from the selected topic and the creator's title style
"""
topic_value = str(topic or "").strip()
if not topic_value:
raise Exception("话题不能为空")
cleaned_titles = [str(title).strip() for title in titles if str(title).strip()]
if not cleaned_titles:
raise Exception("参考标题为空")
word_count_value = max(80, min(int(word_count), 1000))
sample_titles = "\n".join(f"{idx + 1}. {title}" for idx, title in enumerate(cleaned_titles[:10]))
prompt = f"""请围绕「{topic_value}」这个话题,生成一段短视频口播文案。
参考该博主的标题风格:
{sample_titles}
要求:
1. 文案字数约 {word_count_value}
2. 适合短视频口播,语气自然、有吸引力
3. 开头要有钩子吸引观众
4. 只返回文案正文,不要标题和其他说明"""
try:
content = await self._call_glm(
prompt=prompt,
max_tokens=min(word_count_value * 3, 4000),
temperature=0.8,
action=f"按话题生成文案(topic={topic_value})",
timeout_seconds=88.0,
)
logger.info("GLM topic script generation completed")
return content
except Exception as e:
logger.error(f"GLM topic script generation error: {e}")
raise Exception(f"文案生成失败: {str(e)}")
def _parse_topic_lines(self, content: str) -> list[str]:
lines = [line.strip() for line in str(content or "").splitlines()]
topics: list[str] = []
seen: set[str] = set()
for line in lines:
if not line:
continue
cleaned = re.sub(r"^\s*(?:[-*•]+|\d+[.)、\s]+)", "", line).strip()
cleaned = cleaned.strip('"“”')
if not cleaned:
continue
if cleaned in seen:
continue
seen.add(cleaned)
topics.append(cleaned)
if len(topics) >= 10:
break
return topics
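The cleanup rules in `_parse_topic_lines` (strip list markers, strip quotes, drop duplicates, cap at 10) are easiest to read next to sample input. This is a standalone copy of that logic as a free function; the English sample lines are made up for illustration.

```python
import re
from typing import List, Set


def parse_topic_lines(content: str) -> List[str]:
    """Turn a model reply with one topic per line into a clean, deduplicated list."""
    topics: List[str] = []
    seen: Set[str] = set()
    for line in str(content or "").splitlines():
        line = line.strip()
        if not line:
            continue
        # Remove a leading bullet ("-", "*", "•") or a leading number such as "1." / "2)".
        cleaned = re.sub(r"^\s*(?:[-*•]+|\d+[.)、\s]+)", "", line).strip()
        cleaned = cleaned.strip('"“”')
        if not cleaned or cleaned in seen:
            continue
        seen.add(cleaned)
        topics.append(cleaned)
        if len(topics) >= 10:
            break
    return topics
```

A side effect of the numbering rule is that a topic which itself begins with a number followed by a space loses that number: `10 tips for sleep` becomes `tips for sleep`. Whether this matters depends on how often the model returns such topics.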
async def translate_text(self, text: str, target_lang: str) -> str:
@@ -151,22 +284,15 @@ class GLMService:
3. 翻译要自然流畅,符合目标语言的表达习惯"""
try:
client = self._get_client()
logger.info(f"Using GLM to translate text to {target_lang}")
import asyncio
response = await asyncio.to_thread(
client.chat.completions.create,
model=settings.GLM_MODEL,
messages=[{"role": "user", "content": prompt}],
thinking={"type": "disabled"},
content = await self._call_glm(
prompt=prompt,
max_tokens=2000,
temperature=0.3
temperature=0.3,
action=f"翻译文案(target={target_lang})",
timeout_seconds=75.0,
)
content = response.choices[0].message.content
logger.info("GLM translation completed")
return content.strip()
return content
except Exception as e:
logger.error(f"GLM translate error: {e}")

View File

@@ -11,12 +11,13 @@ import asyncio
import httpx
from pathlib import Path
from loguru import logger
from typing import Optional, Literal
from typing import Optional, Literal
from app.core.config import settings
from app.services.small_face_enhance_service import SmallFaceEnhanceService
class LipSyncService:
class LipSyncService:
"""Lip-sync service: LatentSync 1.6 + MuseTalk 1.5 hybrid approach"""
def __init__(self):
@@ -38,6 +39,9 @@ class LipSyncService:
# Runtime detection
self._weights_available: Optional[bool] = None
# Small-face enhancement
self._face_enhance = SmallFaceEnhanceService()
def _check_weights(self) -> bool:
"""Check whether the model weights exist."""
@@ -93,7 +97,7 @@ class LipSyncService:
logger.warning(f"⚠️ 获取媒体时长失败: {e}")
return None
def _loop_video_to_duration(self, video_path: str, output_path: str, target_duration: float) -> str:
def _loop_video_to_duration(self, video_path: str, output_path: str, target_duration: float) -> str:
"""
Loop the video to match the target duration
Uses FFmpeg stream_loop for seamless looping
@@ -117,47 +121,70 @@ class LipSyncService:
else:
logger.warning(f"⚠️ 视频循环失败: {result.stderr[:200]}")
return video_path
except Exception as e:
logger.warning(f"⚠️ 视频循环异常: {e}")
return video_path
except Exception as e:
logger.warning(f"⚠️ 视频循环异常: {e}")
return video_path
def _mux_audio_to_video(self, video_path: str, audio_path: str, output_path: str) -> bool:
"""Mux the audio track into the video so the enhancement path does not produce silent output."""
try:
cmd = [
"ffmpeg", "-y",
"-i", video_path,
"-i", audio_path,
"-map", "0:v:0",
"-map", "1:a:0",
"-c:v", "copy",
"-c:a", "aac",
"-shortest",
output_path,
]
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
if result.returncode == 0 and Path(output_path).exists():
return True
logger.warning(f"⚠️ 音轨封装失败: {result.stderr[:200]}")
return False
except Exception as e:
logger.warning(f"⚠️ 音轨封装异常: {e}")
return False
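The ffmpeg invocation in `_mux_audio_to_video` takes the video stream from the first input and the audio stream from the second, copies the video without re-encoding, and encodes the audio as AAC. Building the argument list in a separate pure function, as sketched below, makes that mapping easy to inspect and test; the function name is an assumption, while the flags are the ones in the diff.

```python
from typing import List


def build_mux_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Return the ffmpeg argument list that muxes audio_path into video_path."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input 0: blended video (video stream only is used)
        "-i", audio_path,        # input 1: generated speech audio
        "-map", "0:v:0",         # first video stream of input 0
        "-map", "1:a:0",         # first audio stream of input 1
        "-c:v", "copy",          # keep the video stream as is
        "-c:a", "aac",           # encode audio to AAC for MP4
        "-shortest",             # stop at the end of the shorter stream
        output_path,
    ]
```

The list can be passed directly to `subprocess.run`, as the service does.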
async def generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int = 25,
model_mode: Literal["default", "fast", "advanced"] = "default",
) -> str:
"""Generate a lip-synced video."""
logger.info(f"🎬 唇形同步任务: {Path(video_path).name} + {Path(audio_path).name}")
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
normalized_mode: Literal["default", "fast", "advanced"] = model_mode
if normalized_mode not in ("default", "fast", "advanced"):
normalized_mode = "default"
logger.info(f"🧠 Lipsync 模式: {normalized_mode}")
if self.use_local:
return await self._local_generate(video_path, audio_path, output_path, fps, normalized_mode)
else:
return await self._remote_generate(video_path, audio_path, output_path, fps, normalized_mode)
async def generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int = 25,
model_mode: Literal["default", "fast", "advanced"] = "default",
) -> str:
"""Generate a lip-synced video."""
logger.info(f"🎬 唇形同步任务: {Path(video_path).name} + {Path(audio_path).name}")
Path(output_path).parent.mkdir(parents=True, exist_ok=True)
normalized_mode: Literal["default", "fast", "advanced"] = model_mode
if normalized_mode not in ("default", "fast", "advanced"):
normalized_mode = "default"
logger.info(f"🧠 Lipsync 模式: {normalized_mode}")
if self.use_local:
return await self._local_generate(video_path, audio_path, output_path, fps, normalized_mode)
else:
return await self._remote_generate(video_path, audio_path, output_path, fps, normalized_mode)
async def _local_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""Invoke the LatentSync conda environment through a subprocess."""
logger.info("⏳ 等待 GPU 资源 (排队中)...")
async with self._lock:
# Keep intermediate files in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)
async def _local_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""Invoke the LatentSync conda environment through a subprocess."""
logger.info("⏳ 等待 GPU 资源 (排队中)...")
async with self._lock:
# Keep intermediate files in a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
tmpdir = Path(tmpdir)
# Get the audio and video durations
audio_duration = self._get_media_duration(audio_path)
@@ -172,133 +199,206 @@ class LipSyncService:
str(looped_video),
audio_duration
)
else:
actual_video_path = video_path
# Model routing
force_musetalk = model_mode == "fast"
force_latentsync = model_mode == "advanced"
auto_to_musetalk = (
model_mode == "default"
and audio_duration is not None
and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD
)
if force_musetalk:
logger.info("⚡ 强制快速模型MuseTalk")
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,快速模型回退到 LatentSync")
elif auto_to_musetalk:
logger.info(
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s路由到 MuseTalk"
)
musetalk_result = await self._call_musetalk_server(
actual_video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync长视频会较慢")
elif force_latentsync:
logger.info("🎯 强制高级模型LatentSync")
# Check the LatentSync prerequisites (only when falling back to or using LatentSync)
if not self._check_conda_env():
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path
if not self._check_weights():
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path
if self.use_server:
# Mode A: call the persistent server (accelerated mode)
return await self._call_persistent_server(actual_video_path, audio_path, output_path)
else:
actual_video_path = video_path
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
temp_output = tmpdir / "output.mp4"
# Build the command
cmd = [
str(self.conda_python),
"-m", "scripts.inference",
"--unet_config_path", "configs/unet/stage2_512.yaml",
"--inference_ckpt_path", "checkpoints/latentsync_unet.pt",
"--inference_steps", str(settings.LATENTSYNC_INFERENCE_STEPS),
"--guidance_scale", str(settings.LATENTSYNC_GUIDANCE_SCALE),
"--video_path", str(actual_video_path), # use the preprocessed video
"--audio_path", str(audio_path),
"--video_out_path", str(temp_output),
"--seed", str(settings.LATENTSYNC_SEED),
"--temp_dir", str(tmpdir / "cache"),
]
if settings.LATENTSYNC_ENABLE_DEEPCACHE:
cmd.append("--enable_deepcache")
# Set environment variables
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = str(self.gpu_id)
logger.info(f"🖥️ 执行命令: {' '.join(cmd[:8])}...")
logger.info(f"🖥️ GPU: CUDA_VISIBLE_DEVICES={self.gpu_id}")
# ── Small-face enhancement ──
enhance_result = None
try:
# Use an asyncio subprocess for truly asynchronous execution,
# so the event loop can keep serving other requests (such as progress queries)
process = await asyncio.create_subprocess_exec(
*cmd,
cwd=str(self.latentsync_dir),
env=env,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
enhance_result = self._face_enhance.enhance_if_needed(
video_path=str(actual_video_path),
tmpdir=tmpdir,
gpu_id=settings.LIPSYNC_SMALL_FACE_GPU_ID,
)
# Wait for the process to finish, with a timeout
try:
stdout, stderr = await asyncio.wait_for(
process.communicate(),
timeout=900 # 15-minute timeout
)
except asyncio.TimeoutError:
process.kill()
await process.wait()
logger.error("⏰ LatentSync 推理超时 (15分钟)")
shutil.copy(video_path, output_path)
return output_path
stdout_text = stdout.decode() if stdout else ""
stderr_text = stderr.decode() if stderr else ""
if process.returncode != 0:
logger.error(f"LatentSync 推理失败:\n{stderr_text}")
logger.error(f"stdout:\n{stdout_text[-1000:] if stdout_text else 'N/A'}")
# Fallback
shutil.copy(video_path, output_path)
return output_path
logger.info(f"LatentSync 输出:\n{stdout_text[-500:] if stdout_text else 'N/A'}")
# Check the output file
if temp_output.exists():
shutil.copy(temp_output, output_path)
logger.info(f"✅ 唇形同步完成: {output_path}")
return output_path
else:
logger.warning("⚠️ 未找到输出文件,使用 Fallback")
shutil.copy(video_path, output_path)
return output_path
except Exception as e:
logger.error(f"❌ 推理异常: {e}")
shutil.copy(video_path, output_path)
return output_path
if settings.LIPSYNC_SMALL_FACE_FAIL_OPEN:
logger.warning(f"⚠️ 小脸增强失败,跳过: {e}")
else:
raise
if enhance_result and enhance_result.was_enhanced:
track = enhance_result.track
if track is None:
raise RuntimeError("小脸增强轨迹缺失")
# Enhancement path: run the model on the enhanced face video, then blend the result back into the original video
temp_sync = tmpdir / "face_sync.mp4"
await self._run_selected_model(
video_path=enhance_result.video_path,
audio_path=audio_path,
output_path=str(temp_sync),
tmpdir=tmpdir,
model_mode=model_mode,
audio_duration=audio_duration,
original_video_path=video_path,
)
try:
blended = self._face_enhance.blend_back(
original_video=str(actual_video_path),
lipsync_video=str(temp_sync),
track=track,
tmpdir=tmpdir,
)
blended_with_audio = tmpdir / "blended_with_audio.mp4"
if not self._mux_audio_to_video(
video_path=str(blended),
audio_path=audio_path,
output_path=str(blended_with_audio),
):
raise RuntimeError("贴回视频音轨封装失败")
shutil.copy(str(blended_with_audio), output_path)
logger.info(f"✅ 小脸增强 + 唇形同步完成: {output_path}")
return output_path
except Exception as e:
if settings.LIPSYNC_SMALL_FACE_FAIL_OPEN:
logger.warning(f"⚠️ 小脸贴回失败,回退原流程: {e}")
else:
raise
# Regular path (not enhanced, or enhancement failed)
return await self._run_selected_model(
video_path=str(actual_video_path),
audio_path=audio_path,
output_path=output_path,
tmpdir=tmpdir,
model_mode=model_mode,
audio_duration=audio_duration,
original_video_path=video_path,
)
async def _run_selected_model(
self,
video_path: str,
audio_path: str,
output_path: str,
tmpdir: Path,
model_mode: Literal["default", "fast", "advanced"],
audio_duration: Optional[float],
original_video_path: str,
) -> str:
"""Model routing and execution: MuseTalk / LatentSync persistent server / LatentSync subprocess."""
# Model routing
force_musetalk = model_mode == "fast"
force_latentsync = model_mode == "advanced"
auto_to_musetalk = (
model_mode == "default"
and audio_duration is not None
and audio_duration >= settings.LIPSYNC_DURATION_THRESHOLD
)
if force_musetalk:
logger.info("⚡ 强制快速模型MuseTalk")
musetalk_result = await self._call_musetalk_server(
video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,快速模型回退到 LatentSync")
elif auto_to_musetalk:
logger.info(
f"🔄 音频 {audio_duration:.1f}s >= {settings.LIPSYNC_DURATION_THRESHOLD}s路由到 MuseTalk"
)
musetalk_result = await self._call_musetalk_server(
video_path, audio_path, output_path
)
if musetalk_result:
return musetalk_result
logger.warning("⚠️ MuseTalk 不可用,回退到 LatentSync长视频会较慢")
elif force_latentsync:
logger.info("🎯 强制高级模型LatentSync")
# Check the LatentSync prerequisites
if not self._check_conda_env():
logger.warning("⚠️ Conda 环境不可用,使用 Fallback")
shutil.copy(original_video_path, output_path)
return output_path
if not self._check_weights():
logger.warning("⚠️ 模型权重不存在,使用 Fallback")
shutil.copy(original_video_path, output_path)
return output_path
if self.use_server:
# Mode A: call the persistent server (accelerated mode)
return await self._call_persistent_server(video_path, audio_path, output_path)
logger.info("🔄 调用 LatentSync 推理 (subprocess)...")
temp_output = tmpdir / "output.mp4"
# Build the command
cmd = [
str(self.conda_python),
"-m", "scripts.inference",
"--unet_config_path", "configs/unet/stage2_512.yaml",
"--inference_ckpt_path", "checkpoints/latentsync_unet.pt",
"--inference_steps", str(settings.LATENTSYNC_INFERENCE_STEPS),
"--guidance_scale", str(settings.LATENTSYNC_GUIDANCE_SCALE),
"--video_path", str(video_path),
"--audio_path", str(audio_path),
"--video_out_path", str(temp_output),
"--seed", str(settings.LATENTSYNC_SEED),
"--temp_dir", str(tmpdir / "cache"),
]
if settings.LATENTSYNC_ENABLE_DEEPCACHE:
cmd.append("--enable_deepcache")
# Set environment variables
env = os.environ.copy()
env["CUDA_VISIBLE_DEVICES"] = str(self.gpu_id)
logger.info(f"🖥️ 执行命令: {' '.join(cmd[:8])}...")
logger.info(f"🖥️ GPU: CUDA_VISIBLE_DEVICES={self.gpu_id}")
try:
process = await asyncio.create_subprocess_exec(
*cmd,
cwd=str(self.latentsync_dir),
env=env,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
try:
stdout, stderr = await asyncio.wait_for(
process.communicate(),
timeout=900 # 15-minute timeout
)
except asyncio.TimeoutError:
process.kill()
await process.wait()
logger.error("⏰ LatentSync 推理超时 (15分钟)")
shutil.copy(original_video_path, output_path)
return output_path
stdout_text = stdout.decode() if stdout else ""
stderr_text = stderr.decode() if stderr else ""
if process.returncode != 0:
logger.error(f"LatentSync 推理失败:\n{stderr_text}")
logger.error(f"stdout:\n{stdout_text[-1000:] if stdout_text else 'N/A'}")
shutil.copy(original_video_path, output_path)
return output_path
logger.info(f"LatentSync 输出:\n{stdout_text[-500:] if stdout_text else 'N/A'}")
if temp_output.exists():
shutil.copy(temp_output, output_path)
logger.info(f"✅ 唇形同步完成: {output_path}")
return output_path
else:
logger.warning("⚠️ 未找到输出文件,使用 Fallback")
shutil.copy(original_video_path, output_path)
return output_path
except Exception as e:
logger.error(f"❌ 推理异常: {e}")
shutil.copy(original_video_path, output_path)
return output_path
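The routing at the top of `_run_selected_model` decides which model is tried first; LatentSync remains the fallback whenever MuseTalk is unavailable. That first decision can be written as a pure function, sketched below. The function name, the return labels, and the 120-second threshold used in the usage note are assumptions; the real threshold comes from `settings.LIPSYNC_DURATION_THRESHOLD`.

```python
from typing import Optional


def first_choice_model(
    model_mode: str, audio_duration: Optional[float], threshold: float
) -> str:
    """Return which model the router tries first: 'musetalk' or 'latentsync'."""
    if model_mode == "fast":
        return "musetalk"  # forced fast model
    if (
        model_mode == "default"
        and audio_duration is not None
        and audio_duration >= threshold
    ):
        return "musetalk"  # long audio is routed to the faster model
    return "latentsync"  # 'advanced', short audio, or unknown duration
```

With a threshold of 120 seconds, for example, a 30-second clip in `default` mode goes to LatentSync and a 300-second clip goes to MuseTalk first.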
async def _call_musetalk_server(
self, video_path: str, audio_path: str, output_path: str
@@ -413,18 +513,18 @@ class LipSyncService:
"请确保 LatentSync 服务已启动 (cd models/LatentSync && python scripts/server.py)"
)
async def _remote_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""Call the remote LatentSync API service."""
if model_mode == "fast":
logger.warning("⚠️ 远程模式未接入 MuseTalk快速模型将使用远程 LatentSync")
logger.info(f"📡 调用远程 API: {self.api_url}")
async def _remote_generate(
self,
video_path: str,
audio_path: str,
output_path: str,
fps: int,
model_mode: Literal["default", "fast", "advanced"],
) -> str:
"""Call the remote LatentSync API service."""
if model_mode == "fast":
logger.warning("⚠️ 远程模式未接入 MuseTalk快速模型将使用远程 LatentSync")
logger.info(f"📡 调用远程 API: {self.api_url}")
try:
async with httpx.AsyncClient(timeout=600.0) as client:
@@ -499,4 +599,9 @@ class LipSyncService:
"ready": conda_ok and weights_ok and gpu_ok,
"musetalk_ready": musetalk_ready,
"lipsync_threshold": settings.LIPSYNC_DURATION_THRESHOLD,
"small_face_enhance": {
"enabled": settings.LIPSYNC_SMALL_FACE_ENHANCE,
"threshold": settings.LIPSYNC_SMALL_FACE_THRESHOLD,
"detector_loaded": self._face_enhance._detector_session is not None,
},
}
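
The local inference path in this file's first hunk follows a common pattern: start a subprocess, wait with a deadline, and kill and reap the child on timeout so the caller can fall back. A minimal sketch of that pattern in isolation is below; `run_with_timeout` is a hypothetical helper name, not part of `LipSyncService`, and returning `None` on timeout is an assumption made for illustration (the service instead copies the original video to the output path).

```python
import asyncio
import sys
from typing import Optional, Sequence, Tuple


async def run_with_timeout(
    cmd: Sequence[str], timeout: float
) -> Optional[Tuple[int, str, str]]:
    """Run cmd and return (returncode, stdout, stderr), or None on timeout."""
    process = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await asyncio.wait_for(
            process.communicate(), timeout=timeout
        )
    except asyncio.TimeoutError:
        # Kill the child and wait for it so no zombie process is left behind.
        process.kill()
        await process.wait()
        return None
    # Decode leniently: tool output is not guaranteed to be valid UTF-8.
    return (
        process.returncode,
        stdout.decode(errors="replace"),
        stderr.decode(errors="replace"),
    )
```

A caller would typically treat both `None` and a non-zero return code as "use the fallback", which matches how the hunk above handles the timeout and failure branches.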

View File

@@ -21,16 +21,22 @@ from .uploader.xiaohongshu_uploader import XiaohongshuUploader
from .uploader.weixin_uploader import WeixinUploader
class PublishService:
"""Social media publishing service (with user isolation)"""
# Supported platform configuration
PLATFORMS: Dict[str, Dict[str, Any]] = {
"douyin": {"name": "抖音", "url": "https://creator.douyin.com/", "enabled": True},
"weixin": {"name": "微信视频号", "url": "https://channels.weixin.qq.com/", "enabled": True},
"bilibili": {"name": "B站", "url": "https://member.bilibili.com/platform/upload/video/frame", "enabled": True},
"xiaohongshu": {"name": "小红书", "url": "https://creator.xiaohongshu.com/", "enabled": True},
}
COOKIE_DOMAINS: Dict[str, str] = {
"douyin": ".douyin.com",
"weixin": ".weixin.qq.com",
"xiaohongshu": ".xiaohongshu.com",
}
def __init__(self) -> None:
# Active login sessions, used to track login state
@@ -185,15 +191,16 @@ class PublishService:
description=description,
user_id=user_id,
)
elif platform == "xiaohongshu":
uploader = XiaohongshuUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description
)
elif platform == "xiaohongshu":
uploader = XiaohongshuUploader(
title=title,
file_path=local_video_path,
tags=tags,
publish_date=publish_time,
account_file=str(account_file),
description=description,
user_id=user_id,
)
elif platform == "weixin":
uploader = WeixinUploader(
title=title,
@@ -330,48 +337,88 @@ class PublishService:
logger.exception(f"[logout] Failed: {e}")
return {"success": False, "message": f"注销失败: {str(e)}"}
async def save_cookie_string(self, platform: str, cookie_string: str, user_id: Optional[str] = None) -> Dict[str, Any]:
"""
Save a cookie string extracted from the client's browser.
Args:
platform: platform ID
cookie_string: cookie string in document.cookie format
user_id: user ID (used for cookie isolation)
"""
try:
account_file = self._get_cookie_path(platform, user_id)
# Parse the cookie string
cookie_dict = {}
for item in cookie_string.split('; '):
if '=' in item:
name, value = item.split('=', 1)
cookie_dict[name] = value
# Bilibili needs special handling
if platform == "bilibili":
bilibili_cookies = {}
required_fields = ['SESSDATA', 'bili_jct', 'DedeUserID', 'DedeUserID__ckMd5']
"""
try:
if platform not in self.PLATFORMS:
return {
"success": False,
"message": f"不支持的平台: {platform}",
}
account_file = self._get_cookie_path(platform, user_id)
# Parse the cookie string
cookie_dict: Dict[str, str] = {}
for item in cookie_string.split(';'):
item = item.strip()
if not item:
continue
if '=' in item:
name, value = item.split('=', 1)
cookie_dict[name.strip()] = value.strip()
if not cookie_dict:
return {
"success": False,
"message": "Cookie 为空,请确认已完成登录",
}
# Bilibili needs special handling
if platform == "bilibili":
bilibili_cookies = {}
required_fields = ['SESSDATA', 'bili_jct', 'DedeUserID', 'DedeUserID__ckMd5']
for field in required_fields:
if field in cookie_dict:
bilibili_cookies[field] = cookie_dict[field]
if len(bilibili_cookies) < 3:
return {
"success": False,
"message": "Cookie不完整请确保已登录"
}
cookie_dict = bilibili_cookies
# Make sure the directory exists
account_file.parent.mkdir(parents=True, exist_ok=True)
# Save the cookies
with open(account_file, 'w', encoding='utf-8') as f:
json.dump(cookie_dict, f, indent=2)
if len(bilibili_cookies) < 3:
return {
"success": False,
"message": "Cookie不完整请确保已登录"
}
payload: Any = bilibili_cookies
else:
cookie_domain = self.COOKIE_DOMAINS.get(platform, "")
if not cookie_domain:
platform_url = self.PLATFORMS.get(platform, {}).get("url", "")
# Keep only the host part: drop the scheme and any path that follows the host.
host = re.sub(r"^https?://", "", platform_url).split("/", 1)[0]
cookie_domain = f".{host}" if host else ""
storage_cookies = []
for name, value in cookie_dict.items():
if not name:
continue
storage_cookies.append({
"name": name,
"value": value,
"domain": cookie_domain,
"path": "/",
"httpOnly": False,
"secure": True,
"sameSite": "Lax",
"expires": -1,
})
payload = {
"cookies": storage_cookies,
"origins": [],
}
# Make sure the directory exists
account_file.parent.mkdir(parents=True, exist_ok=True)
# Save the cookies
with open(account_file, 'w', encoding='utf-8') as f:
json.dump(payload, f, indent=2)
logger.success(f"[login] {platform} cookie saved (user: {user_id or 'legacy'})")
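
The new `save_cookie_string` path converts a `document.cookie` style string into a dictionary with `cookies` and `origins` keys, which is the shape Playwright uses for a storage state file. The sketch below restates those two steps as standalone functions so the transformation is easy to check by hand. The function names `parse_cookie_string` and `to_storage_state` are hypothetical, and the per-cookie defaults simply mirror the values used in the hunk above.

```python
from typing import Any, Dict, List


def parse_cookie_string(cookie_string: str) -> Dict[str, str]:
    """Split 'a=1; b=2' into {'a': '1', 'b': '2'}, tolerating stray spaces."""
    cookies: Dict[str, str] = {}
    for item in cookie_string.split(";"):
        item = item.strip()
        if not item or "=" not in item:
            continue
        # Split only on the first '=' because values may contain '=' themselves.
        name, value = item.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


def to_storage_state(cookies: Dict[str, str], domain: str) -> Dict[str, Any]:
    """Wrap name/value pairs in a storage-state style dictionary."""
    entries: List[Dict[str, Any]] = [
        {
            "name": name,
            "value": value,
            "domain": domain,
            "path": "/",
            "httpOnly": False,
            "secure": True,
            "sameSite": "Lax",
            "expires": -1,  # -1 marks a session cookie
        }
        for name, value in cookies.items()
        if name
    ]
    return {"cookies": entries, "origins": []}
```

Note that attributes such as `httpOnly` cannot be recovered from `document.cookie`, so the fixed defaults are a best-effort approximation rather than the real cookie metadata.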

View File

@@ -8,7 +8,8 @@ import base64
import json
from pathlib import Path
from typing import Optional, Dict, Any, List, Sequence, Mapping, Union
from playwright.async_api import async_playwright, Page, Frame, BrowserContext, Browser, Playwright as PW
from urllib.parse import unquote_to_bytes
from playwright.async_api import async_playwright, Page, Frame, BrowserContext, Browser, Playwright as PW, TimeoutError as PlaywrightTimeoutError
from loguru import logger
from app.core.config import settings
@@ -65,10 +66,16 @@ class QRLoginService:
"xiaohongshu": {
"url": "https://creator.xiaohongshu.com/",
"qr_selectors": [
".login-box-container img.css-1lhmg90",
".login-box-container .css-dvxtzn img",
".login-box-container img",
"div[class*='login-box'] img",
".qrcode img",
"img[alt*='二维码']",
"canvas.qr-code",
"img[class*='qr']"
"img[class*='qr']",
"img[src*='qrcode']",
"img[src*='qr']"
],
"success_indicator": "https://creator.xiaohongshu.com/publish"
},
@@ -109,6 +116,103 @@ class QRLoginService:
ratio = width / height
return 0.75 <= ratio <= 1.33
def _data_url_to_base64(self, data_url: str) -> Optional[str]:
if not data_url or "," not in data_url:
return None
header, payload = data_url.split(",", 1)
header_lower = header.lower()
if not header_lower.startswith("data:image/png"):
return None
if ";base64" in header_lower:
return payload
try:
raw = unquote_to_bytes(payload)
return base64.b64encode(raw).decode()
except Exception:
return None
async def _try_export_qr_data_url(self, qr_element) -> Optional[str]:
"""Prefer exporting the element's original image, to avoid the scaling/cropping loss of a screenshot."""
try:
data_url = await qr_element.evaluate("""async (el) => {
const tag = (el.tagName || '').toLowerCase();
if (tag === 'canvas') {
try {
return el.toDataURL('image/png');
} catch {
return null;
}
}
if (tag === 'img') {
const src = el.currentSrc || el.src || '';
if (!src) return null;
if (src.startsWith('data:image/png')) {
return src;
}
if (src.startsWith('blob:')) {
try {
const resp = await fetch(src);
const blob = await resp.blob();
return await new Promise((resolve) => {
const reader = new FileReader();
reader.onloadend = () => resolve(typeof reader.result === 'string' ? reader.result : null);
reader.onerror = () => resolve(null);
reader.readAsDataURL(blob);
});
} catch {
return null;
}
}
return null;
}
return null;
}""")
if not data_url:
return None
return self._data_url_to_base64(data_url)
except Exception:
return None
async def _screenshot_qr_base64(self, page: Page, qr_element) -> Optional[str]:
try:
if self.platform == "weixin":
bbox = await qr_element.bounding_box()
viewport = page.viewport_size or {"width": 1920, "height": 1080}
if bbox:
pad = max(16, int(min(bbox.get("width", 0), bbox.get("height", 0)) * 0.08))
x = max(0.0, bbox.get("x", 0.0) - pad)
y = max(0.0, bbox.get("y", 0.0) - pad)
max_width = float(viewport.get("width", 1920))
max_height = float(viewport.get("height", 1080))
width = min(max_width - x, bbox.get("width", 0.0) + pad * 2)
height = min(max_height - y, bbox.get("height", 0.0) + pad * 2)
if width > 8 and height > 8:
clipped = await page.screenshot(
clip={"x": x, "y": y, "width": width, "height": height},
type="png",
)
return base64.b64encode(clipped).decode()
screenshot = await qr_element.screenshot(type="png")
return base64.b64encode(screenshot).decode()
except Exception as e:
logger.warning(f"[{self.platform}] QR screenshot failed: {e}")
return None
async def _capture_qr_base64(self, page: Page, qr_element) -> Optional[str]:
data_url_base64 = await self._try_export_qr_data_url(qr_element)
if data_url_base64:
return data_url_base64
return await self._screenshot_qr_base64(page, qr_element)
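
The capture helpers above first try to export the QR element as a data URL and only then fall back to a screenshot. The data URL step can be illustrated without a browser. The sketch below is a standalone restatement of the conversion in `_data_url_to_base64`; the function name `png_data_url_to_base64` is hypothetical, and it compares the lower-cased header throughout, which is an assumption about the intended behavior.

```python
import base64
from typing import Optional
from urllib.parse import unquote_to_bytes


def png_data_url_to_base64(data_url: str) -> Optional[str]:
    """Return the base64 payload of a PNG data URL, or None if it is not one."""
    if not data_url or "," not in data_url:
        return None
    header, payload = data_url.split(",", 1)
    header = header.lower()
    if not header.startswith("data:image/png"):
        return None
    if ";base64" in header:
        # Already base64 encoded: pass the payload through unchanged.
        return payload
    # Otherwise the payload is percent-encoded bytes; decode, then re-encode.
    return base64.b64encode(unquote_to_bytes(payload)).decode()
```

Only PNG is accepted here, as in the hunk, so a JPEG data URL yields `None` and the caller falls back to the screenshot path.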
async def _pick_best_candidate(self, locator, min_side: int = 100):
best = None
best_area = 0
@@ -160,6 +264,88 @@ class QRLoginService:
return await self._find_qr_in_frames(page, selectors, min_side=min_side)
async def _ensure_xiaohongshu_qr_mode(self, page: Page) -> None:
"""The Xiaohongshu login page defaults to SMS login, so switch to QR-code login first."""
if self.platform != "xiaohongshu":
return
try:
for _ in range(3):
sms_mode = False
try:
sms_mode = await page.locator("input[placeholder*='手机号']").first.is_visible(timeout=800)
except Exception:
sms_mode = False
if not sms_mode:
return
clicked = False
# Try the stable selectors first
switch_selectors = [
"img.css-wemwzq",
".login-box-container img[style*='cursor: pointer']",
]
for selector in switch_selectors:
try:
locator = page.locator(selector)
count = await locator.count()
for i in range(count):
candidate = locator.nth(i)
if not await candidate.is_visible():
continue
bbox = await candidate.bounding_box()
if not bbox:
continue
if bbox.get("width", 0) < 24 or bbox.get("width", 0) > 96:
continue
if bbox.get("height", 0) < 24 or bbox.get("height", 0) > 96:
continue
try:
await candidate.click(timeout=1200)
except Exception:
await candidate.evaluate("el => el.click()")
clicked = True
break
if clicked:
break
except Exception:
continue
if not clicked:
# Fallback: look for a small clickable icon in the top-right corner of the login card
clicked = bool(await page.evaluate("""() => {
const phoneInput = Array.from(document.querySelectorAll('input'))
.find((el) => (el.placeholder || '').includes('手机号'));
const card = document.querySelector('.login-box-container') || phoneInput?.closest('div');
if (!card) return false;
const cardRect = card.getBoundingClientRect();
const imgs = Array.from(card.querySelectorAll('img'));
for (const img of imgs) {
const r = img.getBoundingClientRect();
if (r.width < 24 || r.width > 96 || r.height < 24 || r.height > 96) continue;
if (r.right < cardRect.right - 90) continue;
if (r.top > cardRect.top + 90) continue;
const style = getComputedStyle(img);
if (style.cursor !== 'pointer') continue;
img.click();
return true;
}
return false;
}"""))
if not clicked:
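logger.warning("[xiaohongshu] Login-mode switch button not found; continuing with QR extraction")
return
logger.info("[xiaohongshu] Clicked the login-mode switch; waiting for the QR code to render")
await asyncio.sleep(1.5)
except Exception as e:
logger.warning(f"[xiaohongshu] Failed to switch to QR-code login mode: {e}")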
async def _try_text_strategy_in_frames(self, page: Page):
for frame in page.frames:
if frame == page.main_frame:
@@ -317,12 +503,22 @@ class QRLoginService:
for url in urls_to_try:
logger.info(f"[{self.platform}] Opening login page: {url}")
wait_until = "domcontentloaded" if self.platform == "weixin" else "networkidle"
await page.goto(url, wait_until=wait_until)
wait_until = "domcontentloaded" if self.platform in ("weixin", "douyin") else "networkidle"
try:
await page.goto(url, wait_until=wait_until, timeout=30000)
except PlaywrightTimeoutError as nav_err:
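# The Douyin page keeps long-lived connections and occasionally never satisfies the wait condition; after a timeout, keep trying to extract the QR code
if self.platform == "douyin":
logger.warning(f"[douyin] Page load timed out; continuing with QR extraction: {nav_err}")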
else:
raise
# Wait for the page to load
await asyncio.sleep(1 if self.platform == "weixin" else 2)
if self.platform == "xiaohongshu":
await self._ensure_xiaohongshu_qr_mode(page)
# Extract the QR code (parallel strategies)
qr_image = await self._extract_qr_code(page, config["qr_selectors"])
if qr_image:
@@ -373,8 +569,9 @@ class QRLoginService:
el = await page.wait_for_selector(combined_selector, state="visible", timeout=5000)
if el:
logger.info(f"[{self.platform}] CSS strategy: matched")
screenshot = await el.screenshot()
return base64.b64encode(screenshot).decode()
qr_base64 = await self._capture_qr_base64(page, el)
if qr_base64:
return qr_base64
except Exception as e:
logger.warning(f"[{self.platform}] CSS strategy failed: {e}")
@@ -382,8 +579,9 @@ class QRLoginService:
qr_element = await self._try_text_strategy(page)
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
qr_base64 = await self._capture_qr_base64(page, qr_element)
if qr_base64:
return qr_base64
except Exception as e:
logger.warning(f"[{self.platform}] Text strategy capture failed: {e}")
@@ -397,8 +595,9 @@ class QRLoginService:
qr_element = await self._try_text_strategy(page)
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
qr_base64 = await self._capture_qr_base64(page, qr_element)
if qr_base64:
return qr_base64
except Exception as e:
logger.warning(f"[{self.platform}] Text strategy capture failed: {e}")
qr_element = None
@@ -410,12 +609,16 @@ class QRLoginService:
el = await page.wait_for_selector(combined_selector, state="visible", timeout=5000)
if el:
logger.info(f"[{self.platform}] CSS strategy: matched")
screenshot = await el.screenshot()
return base64.b64encode(screenshot).decode()
qr_base64 = await self._capture_qr_base64(page, el)
if qr_base64:
return qr_base64
except Exception as e:
logger.warning(f"[{self.platform}] CSS strategy failed: {e}")
else:
# Other platforms (Xiaohongshu, WeChat, etc.): keep the original order, CSS -> Text
if self.platform == "xiaohongshu":
await self._ensure_xiaohongshu_qr_mode(page)
# Strategy 1: CSS selectors
try:
combined_selector = ", ".join(selectors)
@@ -432,7 +635,8 @@ class QRLoginService:
else:
await page.wait_for_selector(combined_selector, state="visible", timeout=5000)
locator = page.locator(combined_selector)
qr_element = await self._pick_best_candidate(locator, min_side=100)
min_side = 120 if self.platform == "xiaohongshu" else 100
qr_element = await self._pick_best_candidate(locator, min_side=min_side)
if qr_element:
logger.info(f"[{self.platform}] Strategy 1 (CSS): matched")
except Exception as e:
@@ -448,8 +652,9 @@ class QRLoginService:
# If an element was found, capture it and return
if qr_element:
try:
screenshot = await qr_element.screenshot()
return base64.b64encode(screenshot).decode()
qr_base64 = await self._capture_qr_base64(page, qr_element)
if qr_base64:
return qr_base64
except Exception as e:
logger.error(f"[{self.platform}] Capture failed: {e}")
@@ -465,6 +670,8 @@ class QRLoginService:
keywords = [
"扫码登录",
"二维码",
"APP扫一扫登录",
"可用小红书扫码",
"打开抖音",
"抖音APP",
"使用APP扫码",
@@ -483,7 +690,7 @@ class QRLoginService:
for _ in range(5):
parent = parent.locator("..")
candidates = parent.locator("img, canvas")
min_side = 120 if self.platform == "weixin" else 100
min_side = 120 if self.platform in ("weixin", "xiaohongshu") else 100
best = await self._pick_best_candidate(candidates, min_side=min_side)
if best:
logger.info(f"[{self.platform}] Text strategy: succeeded")
@@ -554,6 +761,22 @@ class QRLoginService:
await self._save_cookies(final)
break
# ── Xiaohongshu special case: after scanning, the page often redirects to /new/home and may not match success_indicator ──
if self.platform == "xiaohongshu":
lowered_url = current_url.lower()
xhs_logged_in = (
lowered_url.startswith("https://creator.xiaohongshu.com/new/")
or "/publish/publish" in lowered_url
or "/publish/success" in lowered_url
) and "/login" not in lowered_url
if xhs_logged_in:
logger.success(f"[xiaohongshu] Login succeeded, URL={current_url[:120]}")
self.login_success = True
await asyncio.sleep(2)
final = [dict(c) for c in await self.context.cookies()]
await self._save_cookies(final)
break
# ── Douyin: the API interception captured redirect_url → navigate directly ──
if self.platform == "douyin" and self._qr_api_confirmed and self._qr_redirect_url:
logger.info("[douyin] Navigating to redirect_url...")
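
The Xiaohongshu branch earlier in this hunk decides that login succeeded purely from the current URL. Pulling that condition into a small pure function makes the precedence of `and` over `or` explicit and easy to test. This is a sketch of the same predicate; the function name `xhs_logged_in` is hypothetical, and the sample URLs in the note are illustrative rather than taken from real traffic.

```python
def xhs_logged_in(current_url: str) -> bool:
    """Return True when the URL looks like a post-login Xiaohongshu creator page."""
    lowered = current_url.lower()
    on_logged_in_page = (
        lowered.startswith("https://creator.xiaohongshu.com/new/")
        or "/publish/publish" in lowered
        or "/publish/success" in lowered
    )
    # Any URL that still contains /login is treated as not logged in.
    return on_logged_in_page and "/login" not in lowered
```

For example, `https://creator.xiaohongshu.com/new/home` is accepted, while any URL that still contains `/login` is rejected even if it starts with the `/new/` prefix.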

View File

@@ -0,0 +1,872 @@
"""
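Small-face enhancement service.
For distant shots with small faces: crop + super-resolve -> lipsync inference -> paste back, to improve input quality.
Single file, single class; called by LipSyncService.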
"""
from __future__ import annotations
import subprocess
import time
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional, Tuple, List
from loguru import logger
from app.core.config import settings
try:
import cv2
import numpy as np
_CV2_AVAILABLE = True
except ImportError:
_CV2_AVAILABLE = False
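# ── Module constants ──
PADDING = 0.28 # bbox expansion ratio
DETECT_EVERY = 8 # run detection every N frames
TARGET_SIZE = 512 # super-resolution target size (pixels)
MASK_FEATHER = 15 # feathering size in pixels
MASK_UPPER_RATIO = 0.68 # where the mouth region starts (covers only mouth/chin)
MASK_SIDE_MARGIN = 0.16 # left/right margin ratio, to leave cheeks and the sides of the nose untouched
SAMPLE_FRAMES = 24 # number of sampled frames
SAMPLE_WINDOW = (0.10, 0.30) # sampling window (10%~30% of the video)
ENCODE_FPS = 25 # encoding frame rate
ENCODE_CRF = 18 # encoding quality (CRF)
EMA_ALPHA = 0.3 # EMA smoothing factor
# Detection filters
MIN_FACE_WIDTH = 50
FACE_ASPECT_MIN = 0.2
FACE_ASPECT_MAX = 1.5
DET_SCORE_THRESH = 0.5
NMS_IOU_THRESH = 0.4
# Weights path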
_PROJECT_ROOT = Path(__file__).resolve().parent.parent.parent.parent
DET_MODEL_PATH = (
_PROJECT_ROOT
/ "models" / "LatentSync" / "checkpoints"
/ "auxiliary" / "models" / "buffalo_l" / "det_10g.onnx"
)
# ── Data structures ──
@dataclass
class FaceTrack:
"""Per-frame face tracking data (used for cropping and pasting back)."""
crop_boxes: List[Tuple[int, int, int, int]] # per frame (x1,y1,x2,y2)
face_width_median: float
frame_count: int
frame_w: int
frame_h: int
@dataclass
class EnhanceResult:
"""Return value of enhance_if_needed."""
video_path: str
was_enhanced: bool
track: Optional[FaceTrack] = None
face_width: float = 0.0
class SmallFaceEnhanceService:
"""Small-face enhancement service: detect → crop → super-resolve → (lipsync) → paste back."""
def __init__(self):
self._detector_session = None
self._sr_model = None
self._sr_type: Optional[str] = None
# ================================================================
# SCRFD face detection (det_10g.onnx, CPU inference)
# ================================================================
def _ensure_detector(self) -> bool:
if self._detector_session is not None:
return True
if not DET_MODEL_PATH.exists():
logger.warning(f"⚠️ SCRFD weights not found: {DET_MODEL_PATH}")
return False
try:
import onnxruntime as ort
self._detector_session = ort.InferenceSession(
str(DET_MODEL_PATH),
providers=["CPUExecutionProvider"],
)
logger.info("✅ SCRFD detector loaded")
return True
except Exception as e:
logger.warning(f"⚠️ SCRFD initialization failed: {e}")
return False
def _detect_faces(self, img_bgr: np.ndarray) -> List[Tuple[np.ndarray, float]]:
"""
Detect faces with SCRFD.
Returns: [(bbox_xyxy, score), ...] sorted by area, descending.
"""
if self._detector_session is None:
return []
h, w = img_bgr.shape[:2]
input_h, input_w = 640, 640
# ── Preprocess ──
ratio = min(input_h / h, input_w / w)
new_h, new_w = int(h * ratio), int(w * ratio)
resized = cv2.resize(img_bgr, (new_w, new_h))
padded = np.full((input_h, input_w, 3), 127.5, dtype=np.float32)
padded[:new_h, :new_w] = resized.astype(np.float32)
# BGR → RGB → normalize
blob = padded[:, :, ::-1].copy()
blob = (blob - 127.5) / 128.0
blob = blob.transpose(2, 0, 1)[np.newaxis].astype(np.float32)
# ── Inference ──
input_name = self._detector_session.get_inputs()[0].name
outputs = self._detector_session.run(None, {input_name: blob})
# det_10g outputs: [scores_s8, scores_s16, scores_s32,
# bbox_s8, bbox_s16, bbox_s32,
# kps_s8, kps_s16, kps_s32]
strides = [8, 16, 32]
all_bboxes = []
all_scores = []
for i, stride in enumerate(strides):
scores = outputs[i].flatten()
bboxes = outputs[i + 3].reshape(-1, 4)
# Generate the anchor centers
feat_h = input_h // stride
feat_w = input_w // stride
anchors = []
for y in range(feat_h):
for x in range(feat_w):
cx, cy = x * stride, y * stride
anchors.append([cx, cy])
anchors.append([cx, cy]) # 2 anchors per cell
anchors = np.array(anchors, dtype=np.float32)
# Filter by confidence
mask = scores > DET_SCORE_THRESH
if not mask.any():
continue
f_scores = scores[mask]
f_bboxes = bboxes[mask]
f_anchors = anchors[mask]
# Decode: distance * stride → xyxy
decoded = np.empty_like(f_bboxes)
decoded[:, 0] = f_anchors[:, 0] - f_bboxes[:, 0] * stride
decoded[:, 1] = f_anchors[:, 1] - f_bboxes[:, 1] * stride
decoded[:, 2] = f_anchors[:, 0] + f_bboxes[:, 2] * stride
decoded[:, 3] = f_anchors[:, 1] + f_bboxes[:, 3] * stride
# Scale back to the original image coordinates
decoded /= ratio
all_bboxes.append(decoded)
all_scores.append(f_scores)
if not all_bboxes:
return []
bboxes_cat = np.concatenate(all_bboxes)
scores_cat = np.concatenate(all_scores)
# NMS
keep = self._nms(bboxes_cat, scores_cat, NMS_IOU_THRESH)
# Filter by size and aspect ratio
results = []
for idx in keep:
bbox = bboxes_cat[idx]
score = float(scores_cat[idx])
bw = bbox[2] - bbox[0]
bh = bbox[3] - bbox[1]
if bw < MIN_FACE_WIDTH or bh < MIN_FACE_WIDTH:
continue
aspect = bw / max(bh, 1)
if aspect < FACE_ASPECT_MIN or aspect > FACE_ASPECT_MAX:
continue
results.append((bbox.copy(), score))
results.sort(key=lambda x: (x[0][2] - x[0][0]) * (x[0][3] - x[0][1]), reverse=True)
return results
@staticmethod
def _nms(bboxes: np.ndarray, scores: np.ndarray, threshold: float) -> List[int]:
x1 = bboxes[:, 0]
y1 = bboxes[:, 1]
x2 = bboxes[:, 2]
y2 = bboxes[:, 3]
areas = (x2 - x1) * (y2 - y1)
order = scores.argsort()[::-1]
keep = []
while order.size > 0:
i = order[0]
keep.append(int(i))
if order.size == 1:
break
xx1 = np.maximum(x1[i], x1[order[1:]])
yy1 = np.maximum(y1[i], y1[order[1:]])
xx2 = np.minimum(x2[i], x2[order[1:]])
yy2 = np.minimum(y2[i], y2[order[1:]])
inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-6)
inds = np.where(iou <= threshold)[0]
order = order[inds + 1]
return keep
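
The `_nms` method above is a greedy non-maximum suppression: take the highest-scoring box, drop every remaining box whose IoU with it exceeds the threshold, and repeat. The sketch below restates the same procedure in pure Python (no NumPy) so it can be traced by hand. The names `iou` and `nms` and the sample boxes are illustrative, and the small epsilon in the denominator mirrors the `1e-6` used in the method.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-6)


def nms(boxes: Sequence[Box], scores: Sequence[float], threshold: float) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Keep only the boxes that do not overlap the chosen box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

With two heavily overlapping boxes and one distant box, the lower-scoring member of the overlapping pair is suppressed and the distant box survives.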
# ================================================================
# Video utilities
# ================================================================
@staticmethod
def _get_video_info(video_path: str) -> Optional[Tuple[int, int, int, float]]:
"""Return (width, height, frame_count, fps)."""
try:
import json as _json
cmd = [
"ffprobe", "-v", "error",
"-select_streams", "v:0",
"-show_entries", "stream=width,height,nb_frames,r_frame_rate,avg_frame_rate",
"-of", "json",
video_path,
]
r = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
if r.returncode != 0:
return None
info = _json.loads(r.stdout)
streams = info.get("streams")
if not streams:
return None
stream = streams[0]
w, h = int(stream["width"]), int(stream["height"])
# nb_frames may be "N/A" or missing
nb_raw = stream.get("nb_frames", "N/A")
nb = int(nb_raw) if nb_raw not in ("N/A", "") else 0
def _parse_fps(s: str) -> float:
if "/" in s:
num, den = s.split("/")
return float(num) / float(den) if float(den) != 0 else 0.0
return float(s) if s else 0.0
# Prefer avg_frame_rate (the true average frame rate); r_frame_rate may be a multiple of the timebase
avg_fps = _parse_fps(stream.get("avg_frame_rate", "0/0"))
r_fps = _parse_fps(stream.get("r_frame_rate", "25/1"))
fps = avg_fps if avg_fps > 0 else (r_fps if r_fps > 0 else 25.0)
if nb == 0:
cmd2 = [
"ffprobe", "-v", "error",
"-show_entries", "format=duration",
"-of", "default=noprint_wrappers=1:nokey=1",
video_path,
]
r2 = subprocess.run(cmd2, capture_output=True, text=True, timeout=10)
if r2.returncode == 0 and r2.stdout.strip():
nb = int(float(r2.stdout.strip()) * fps)
return w, h, nb, fps
except Exception as e:
logger.warning(f"⚠️ Failed to get video info: {e}")
return None
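
`_get_video_info` has to cope with ffprobe reporting frame rates as rational strings such as `30000/1001`, and with `0/0` when a rate is unknown. The sketch below isolates the parsing and the fallback order (average rate, then `r_frame_rate`, then a default). The function names `parse_fps` and `pick_fps` are hypothetical; the default of 25.0 matches the value used in the method above.

```python
def parse_fps(value: str) -> float:
    """Parse 'num/den' or a plain number; return 0.0 for empty or 0-denominator input."""
    if "/" in value:
        num, den = value.split("/", 1)
        return float(num) / float(den) if float(den) != 0 else 0.0
    return float(value) if value else 0.0


def pick_fps(avg_frame_rate: str, r_frame_rate: str, default: float = 25.0) -> float:
    """Prefer the average rate, then r_frame_rate, then the default."""
    avg = parse_fps(avg_frame_rate)
    if avg > 0:
        return avg
    raw = parse_fps(r_frame_rate)
    return raw if raw > 0 else default
```

The parsed rate also feeds the frame-count estimate (`duration * fps`) when `nb_frames` is missing, so an inaccurate rate propagates into the sampling window and track length.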
@staticmethod
def _open_video_reader(video_path: str, w: int, h: int,
seek_sec: float = 0, duration_sec: float = 0):
"""Open an ffmpeg rawvideo read pipe."""
cmd = ["ffmpeg"]
if seek_sec > 0:
cmd += ["-ss", f"{seek_sec:.3f}"]
cmd += ["-i", video_path]
if duration_sec > 0:
cmd += ["-t", f"{duration_sec:.3f}"]
cmd += ["-f", "rawvideo", "-pix_fmt", "bgr24", "-v", "quiet", "-"]
return subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
@staticmethod
def _read_one_frame(proc, w: int, h: int) -> Optional[np.ndarray]:
raw = proc.stdout.read(w * h * 3)
if len(raw) < w * h * 3:
return None
return np.frombuffer(raw, dtype=np.uint8).reshape(h, w, 3).copy()
@staticmethod
def _open_video_writer(output_path: str, w: int, h: int,
fps: int = ENCODE_FPS, crf: int = ENCODE_CRF):
"""Open an ffmpeg rawvideo write pipe."""
cmd = [
"ffmpeg", "-y",
"-f", "rawvideo", "-pix_fmt", "bgr24",
"-s", f"{w}x{h}", "-r", str(fps), "-i", "-",
"-c:v", "libx264", "-crf", str(crf),
"-preset", "fast", "-pix_fmt", "yuv420p",
output_path,
]
return subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.DEVNULL)
# ================================================================
# Phase 2: face size detection
# ================================================================
def _detect_face_size(self, video_path: str) -> Optional[float]:
"""
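Sample evenly from the 10%~30% span of the video and return the median width of the largest face.
Returns None if no face is detected or the detector is unavailable.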
"""
if not self._ensure_detector():
return None
info = self._get_video_info(video_path)
if info is None:
return None
w, h, nb_frames, fps = info
if nb_frames < 1 or fps <= 0:
return None
# Compute the sampling range
start_frame = int(nb_frames * SAMPLE_WINDOW[0])
end_frame = int(nb_frames * SAMPLE_WINDOW[1])
end_frame = max(end_frame, start_frame + 1)
n_sample = min(SAMPLE_FRAMES, end_frame - start_frame)
if n_sample <= 0:
return None
step = max(1, (end_frame - start_frame) // n_sample)
sample_indices = set(range(start_frame, end_frame, step))
# Seek with ffmpeg to the start of the sampling range
seek_sec = start_frame / fps
duration_sec = (end_frame - start_frame) / fps + 0.5 # safety margin
proc = self._open_video_reader(video_path, w, h, seek_sec, duration_sec)
face_widths = []
try:
for local_idx in range(end_frame - start_frame + 1):
frame = self._read_one_frame(proc, w, h)
if frame is None:
break
global_idx = start_frame + local_idx
if global_idx not in sample_indices:
continue
faces = self._detect_faces(frame)
if faces:
bbox = faces[0][0] # largest face
face_widths.append(float(bbox[2] - bbox[0]))
finally:
proc.stdout.close()
proc.terminate()
proc.wait()
if not face_widths:
return None
face_widths.sort()
mid = len(face_widths) // 2
if len(face_widths) % 2 == 0:
return (face_widths[mid - 1] + face_widths[mid]) / 2
return face_widths[mid]
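
The sampling arithmetic in `_detect_face_size` is easy to get subtly wrong, so a standalone sketch of the index selection and the median may help when tuning `SAMPLE_WINDOW` or `SAMPLE_FRAMES`. The function names `sample_frame_indices` and `median` are hypothetical; the logic follows the method above. One observation under these assumptions: because the step uses integer division, the range can contain slightly more than `max_samples` indices.

```python
from typing import List, Sequence, Tuple


def sample_frame_indices(
    nb_frames: int,
    window: Tuple[float, float] = (0.10, 0.30),
    max_samples: int = 24,
) -> List[int]:
    """Evenly spaced frame indices inside the given fraction window of the video."""
    start = int(nb_frames * window[0])
    end = max(int(nb_frames * window[1]), start + 1)
    n_sample = min(max_samples, end - start)
    if n_sample <= 0:
        return []
    # Integer division rounds the step down, so the result may exceed max_samples.
    step = max(1, (end - start) // n_sample)
    return list(range(start, end, step))


def median(values: Sequence[float]) -> float:
    """Median with the usual average-of-two-middle-values rule for even lengths."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]
```

If a hard cap on the number of detector calls matters, truncating the returned list to `max_samples` would be one option.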
# ================================================================
# Phase 3: cropping + track
# ================================================================
def _build_face_track(self, video_path: str,
w: int, h: int, nb_frames: int) -> Optional[FaceTrack]:
"""
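Per-frame face tracking: detect every DETECT_EVERY frames; in-between frames reuse the previous detection and are smoothed with an EMA.
Returns a FaceTrack, or None if detection fails.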
"""
if not self._ensure_detector():
return None
detect_set = set(range(0, nb_frames, DETECT_EVERY))
# First pass: run detection on the keyframes
proc = self._open_video_reader(video_path, w, h)
keyframe_bboxes = {}
actual_frames = 0
try:
for idx in range(nb_frames):
frame = self._read_one_frame(proc, w, h)
if frame is None:
break
actual_frames = idx + 1
if idx not in detect_set:
continue
faces = self._detect_faces(frame)
if faces:
keyframe_bboxes[idx] = faces[0][0].copy()
finally:
proc.stdout.close()
proc.terminate()
proc.wait()
if not keyframe_bboxes:
return None
# Use the number of frames actually read, to avoid estimation error from _get_video_info
nb_frames = actual_frames
# Forward fill + EMA smoothing
sorted_keys = sorted(keyframe_bboxes.keys())
raw_bboxes: List[np.ndarray] = [None] * nb_frames # type: ignore
for k in sorted_keys:
raw_bboxes[k] = keyframe_bboxes[k]
prev = keyframe_bboxes[sorted_keys[0]]
for i in range(nb_frames):
if raw_bboxes[i] is not None:
prev = raw_bboxes[i]
else:
raw_bboxes[i] = prev.copy()
# EMA smoothing
smoothed = [raw_bboxes[0].copy()]
for i in range(1, nb_frames):
s = EMA_ALPHA * raw_bboxes[i] + (1 - EMA_ALPHA) * smoothed[-1]
smoothed.append(s)
# Padded crop box (clamped to the frame bounds)
crop_boxes = []
for bbox in smoothed:
x1, y1, x2, y2 = bbox
bw, bh = x2 - x1, y2 - y1
pad_w, pad_h = bw * PADDING, bh * PADDING
cx1 = max(0, int(x1 - pad_w))
cy1 = max(0, int(y1 - pad_h))
cx2 = min(w, int(x2 + pad_w))
cy2 = min(h, int(y2 + pad_h))
crop_boxes.append((cx1, cy1, cx2, cy2))
# Median face width
widths = sorted(float(b[2] - b[0]) for b in smoothed)
median_w = widths[len(widths) // 2]
return FaceTrack(
crop_boxes=crop_boxes,
face_width_median=median_w,
frame_count=nb_frames,
frame_w=w,
frame_h=h,
)
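
`_build_face_track` turns sparse keyframe detections into a per-frame track in two steps: forward-fill the gaps with the most recent detection, then smooth the sequence with an exponential moving average, `s[i] = alpha * x[i] + (1 - alpha) * s[i-1]`. The sketch below shows both steps on scalar values rather than bounding boxes, which is a simplification made for readability; `forward_fill` and `ema` are hypothetical names.

```python
from typing import Dict, List, Sequence


def forward_fill(keyframes: Dict[int, float], frame_count: int) -> List[float]:
    """Fill every frame with the latest keyframe value seen so far.

    Frames before the first keyframe take the first keyframe's value.
    """
    prev = keyframes[min(keyframes)]
    filled: List[float] = []
    for i in range(frame_count):
        if i in keyframes:
            prev = keyframes[i]
        filled.append(prev)
    return filled


def ema(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average; the first value is used as the seed."""
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```

A smaller `alpha` gives a steadier crop box at the cost of lagging behind fast head movement, which is the trade-off behind `EMA_ALPHA`.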
# ================================================================
# Phase 3: super-resolution
# ================================================================
def _ensure_upscaler(self, upscaler: str, gpu_id: int) -> bool:
"""Lazily load the super-resolution model."""
if self._sr_model is not None and self._sr_type == upscaler:
return True
try:
import sys
import torch
# torchvision >= 0.20 removed functional_tensor, but basicsr still references it
if "torchvision.transforms.functional_tensor" not in sys.modules:
try:
import torchvision.transforms.functional as _F
sys.modules["torchvision.transforms.functional_tensor"] = _F
except ImportError:
pass
device = torch.device(f"cuda:{gpu_id}" if torch.cuda.is_available() else "cpu")
if upscaler == "gfpgan":
from gfpgan import GFPGANer
model_path = _PROJECT_ROOT / "models" / "FaceEnhance" / "GFPGANv1.4.pth"
if not model_path.exists():
logger.warning(f"⚠️ GFPGAN weights not found: {model_path}")
return False
self._sr_model = GFPGANer(
model_path=str(model_path),
upscale=2,
arch="clean",
channel_multiplier=2,
bg_upsampler=None,
device=device,
)
elif upscaler == "codeformer":
from basicsr.archs.codeformer_arch import CodeFormer as CodeFormerArch
model_path = _PROJECT_ROOT / "models" / "FaceEnhance" / "codeformer.pth"
if not model_path.exists():
logger.warning(f"⚠️ CodeFormer weights not found: {model_path}")
# Try falling back to gfpgan
return self._ensure_upscaler("gfpgan", gpu_id)
net = CodeFormerArch(
dim_embd=512, codebook_size=1024, n_head=8, n_layers=9,
connect_list=["32", "64", "128", "256"],
).to(device)
ckpt = torch.load(str(model_path), map_location=device, weights_only=False)
net.load_state_dict(ckpt.get("params_ema", ckpt.get("params", ckpt)))
net.eval()
self._sr_model = net
self._sr_device = device
else:
logger.warning(f"⚠️ Unknown upscaler: {upscaler}")
return False
self._sr_type = upscaler
logger.info(f"✅ Upscaler loaded: {upscaler}")
return True
except Exception as e:
logger.warning(f"⚠️ Upscaler initialization failed ({upscaler}): {e}")
return False
def _upscale_face(self, face_img: np.ndarray, target_size: int) -> np.ndarray:
"""Enhance a single frame with the loaded super-resolution model; fall back to bicubic on failure."""
try:
if self._sr_type == "gfpgan":
_, _, output = self._sr_model.enhance(
face_img, paste_back=False, has_aligned=False,
)
if output is not None:
return cv2.resize(
output, (target_size, target_size),
interpolation=cv2.INTER_LANCZOS4,
)
elif self._sr_type == "codeformer":
import torch
img = cv2.resize(face_img, (512, 512))
img_t = (
torch.from_numpy(img.astype(np.float32) / 255.0)
.permute(2, 0, 1)
.unsqueeze(0)
.to(self._sr_device)
)
with torch.no_grad():
out = self._sr_model(img_t, w=0.7)[0]
out_np = (
out.squeeze().permute(1, 2, 0).cpu().numpy() * 255
).clip(0, 255).astype(np.uint8)
return cv2.resize(
out_np, (target_size, target_size),
interpolation=cv2.INTER_LANCZOS4,
)
except Exception as e:
logger.debug(f"Super-resolution failed; falling back to bicubic: {e}")
return cv2.resize(
face_img, (target_size, target_size),
interpolation=cv2.INTER_CUBIC,
)
# ================================================================
# Phase 3: crop + super-resolve → enhanced video
# ================================================================
def _crop_and_upscale_video(
self,
video_path: str,
track: FaceTrack,
tmpdir: Path,
gpu_id: int,
source_fps: float,
) -> str:
"""
Crop the face region → super-resolve sparse keyframes → write a TARGET_SIZE video.
Processed as a stream, so memory use stays bounded.
"""
output_path = str(tmpdir / "enhanced_face.mp4")
w, h = track.frame_w, track.frame_h
upscaler = settings.LIPSYNC_SMALL_FACE_UPSCALER
sr_available = self._ensure_upscaler(upscaler, gpu_id)
detect_set = set(range(0, track.frame_count, DETECT_EVERY))
reader = self._open_video_reader(video_path, w, h)
out_fps = max(1, int(round(source_fps))) if source_fps > 0 else ENCODE_FPS
writer = self._open_video_writer(output_path, TARGET_SIZE, TARGET_SIZE, fps=out_fps)
try:
for idx in range(track.frame_count):
frame = self._read_one_frame(reader, w, h)
if frame is None:
break
cx1, cy1, cx2, cy2 = track.crop_boxes[idx]
cropped = frame[cy1:cy2, cx1:cx2]
if sr_available and idx in detect_set:
enhanced = self._upscale_face(cropped, TARGET_SIZE)
else:
enhanced = cv2.resize(
cropped, (TARGET_SIZE, TARGET_SIZE),
interpolation=cv2.INTER_CUBIC,
)
writer.stdin.write(enhanced.tobytes())
finally:
reader.stdout.close()
reader.terminate()
reader.wait()
writer.stdin.close()
writer.wait()
if not Path(output_path).exists():
raise RuntimeError("Failed to write the enhanced video")
return output_path
# ================================================================
# Phase 3: paste back
# ================================================================
def blend_back(
self,
original_video: str,
lipsync_video: str,
track: FaceTrack,
tmpdir,
) -> str:
"""
Paste the lipsync inference result back onto the original video.
Lower-face mask + Gaussian feathering + seamlessClone.
"""
tmpdir = Path(tmpdir)
output_path = str(tmpdir / "blended_output.mp4")
w, h = track.frame_w, track.frame_h
# Get the lipsync video dimensions
ls_info = self._get_video_info(lipsync_video)
if ls_info is None:
raise RuntimeError("无法读取 lipsync 视频信息")
ls_w, ls_h, ls_frames, ls_fps = ls_info
if ls_fps <= 0:
ls_fps = ENCODE_FPS
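# Frame-count guard: the lipsync model emits frames for the audio duration, so its frame count is normally <= that of the original (looped) video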
if ls_frames <= 0:
raise RuntimeError(f"lipsync 输出帧数为 {ls_frames},跳过贴回")
if ls_frames > track.frame_count:
raise RuntimeError(
f"帧数异常: lipsync={ls_frames} > original={track.frame_count}"
)
blend_count = ls_frames
orig_info = self._get_video_info(original_video)
orig_fps = orig_info[3] if orig_info is not None else 0.0
if orig_fps <= 0:
orig_fps = ls_fps
orig_reader = self._open_video_reader(original_video, w, h)
ls_reader = self._open_video_reader(lipsync_video, ls_w, ls_h)
writer = self._open_video_writer(
output_path,
w,
h,
fps=max(1, int(round(ls_fps))),
)
current_orig_idx = -1
current_orig_frame = None
try:
for idx in range(blend_count):
target_orig_idx = min(
track.frame_count - 1,
int(round((idx / ls_fps) * orig_fps)),
)
while current_orig_idx < target_orig_idx:
frame = self._read_one_frame(orig_reader, w, h)
if frame is None:
current_orig_frame = None
break
current_orig_idx += 1
current_orig_frame = frame
orig_frame = current_orig_frame
ls_frame = self._read_one_frame(ls_reader, ls_w, ls_h)
if orig_frame is None or ls_frame is None:
break
cx1, cy1, cx2, cy2 = track.crop_boxes[target_orig_idx]
crop_w, crop_h = cx2 - cx1, cy2 - cy1
# Resize the lipsync output to the crop-region size
ls_resized = cv2.resize(
ls_frame, (crop_w, crop_h),
interpolation=cv2.INTER_LANCZOS4,
)
# Local mouth mask (cover only the lips and chin where possible, leaving the nose and eye area untouched)
mask = np.zeros((crop_h, crop_w), dtype=np.uint8)
upper = int(crop_h * MASK_UPPER_RATIO)
left = int(crop_w * MASK_SIDE_MARGIN)
right = int(crop_w * (1.0 - MASK_SIDE_MARGIN))
if right - left < 8:
left, right = 0, crop_w
mask[upper:, left:right] = 255
# Central ellipse that boosts the weight of the mouth region
ellipse_center = (crop_w // 2, int(crop_h * 0.82))
ellipse_axes = (max(8, int(crop_w * 0.22)), max(8, int(crop_h * 0.13)))
cv2.ellipse(mask, ellipse_center, ellipse_axes, 0, 0, 360, 255, -1)
mask = cv2.GaussianBlur(mask, (0, 0), MASK_FEATHER)
# Blend
blended = self._blend_face_region(
orig_frame, ls_resized, mask, cx1, cy1, cx2, cy2,
)
writer.stdin.write(blended.tobytes())
finally:
for p in (orig_reader, ls_reader):
p.stdout.close()
p.terminate()
p.wait()
writer.stdin.close()
writer.wait()
if not Path(output_path).exists():
raise RuntimeError("融合视频写入失败")
return output_path
@staticmethod
def _blend_face_region(
orig: np.ndarray,
face: np.ndarray,
mask: np.ndarray,
x1: int, y1: int, x2: int, y2: int,
) -> np.ndarray:
"""Paste back with seamlessClone; fall back to alpha blending on failure."""
result = orig.copy()
crop_h, crop_w = face.shape[:2]
# Try seamlessClone
try:
center_x = (x1 + x2) // 2
center_y = int(y1 + (y2 - y1) * 0.7)
center_x = max(1, min(center_x, orig.shape[1] - 2))
center_y = max(1, min(center_y, orig.shape[0] - 2))
src = np.zeros_like(orig)
src[y1:y2, x1:x2] = face
full_mask = np.zeros(orig.shape[:2], dtype=np.uint8)
full_mask[y1:y2, x1:x2] = mask
if full_mask.max() > 0:
cloned = cv2.seamlessClone(
src, orig, full_mask, (center_x, center_y), cv2.NORMAL_CLONE,
)
# Restrict the blend to the mask region so Poisson diffusion does not cause ghosting above the eyes
alpha = mask.astype(np.float32) / 255.0
alpha_3ch = np.stack([alpha] * 3, axis=-1)
roi_orig = orig[y1:y2, x1:x2].astype(np.float32)
roi_clone = cloned[y1:y2, x1:x2].astype(np.float32)
blended_roi = roi_orig * (1 - alpha_3ch) + roi_clone * alpha_3ch
result = orig.copy()
result[y1:y2, x1:x2] = blended_roi.astype(np.uint8)
return result
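except Exception as e:
# Do not swallow the failure silently; record it, then use the alpha blend below
logger.debug(f"seamlessClone failed, falling back to alpha blend: {e}")
# Fallback: alpha blending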
alpha = mask.astype(np.float32) / 255.0
alpha_3ch = np.stack([alpha] * 3, axis=-1)
crop_region = result[y1:y2, x1:x2].astype(np.float32)
blended = crop_region * (1 - alpha_3ch) + face.astype(np.float32) * alpha_3ch
result[y1:y2, x1:x2] = blended.astype(np.uint8)
return result
# ================================================================
# Main entry point
# ================================================================
def enhance_if_needed(
self,
video_path: str,
tmpdir,
gpu_id: int,
) -> EnhanceResult:
"""
Main entry point: detect a small face → crop + super-resolve → return the enhancement result.
Returns was_enhanced=False when no enhancement is needed.
"""
if not settings.LIPSYNC_SMALL_FACE_ENHANCE:
return EnhanceResult(video_path=video_path, was_enhanced=False)
if not _CV2_AVAILABLE:
logger.warning("⚠️ opencv/numpy 未安装,小脸增强不可用")
return EnhanceResult(video_path=video_path, was_enhanced=False)
start = time.time()
tmpdir = Path(tmpdir)
face_dir = tmpdir / "face_enhance"
face_dir.mkdir(exist_ok=True)
# ── Detection ──
face_width = self._detect_face_size(video_path)
if face_width is None:
logger.info("小脸增强: 未检测到人脸,跳过")
return EnhanceResult(video_path=video_path, was_enhanced=False)
threshold = settings.LIPSYNC_SMALL_FACE_THRESHOLD
if face_width >= threshold:
logger.info(
f"小脸增强: face_w={face_width:.0f}px >= threshold={threshold}px, 跳过"
)
return EnhanceResult(
video_path=video_path, was_enhanced=False, face_width=face_width,
)
logger.info(
f"小脸增强: face_w={face_width:.0f}px < threshold={threshold}px, 触发增强"
)
# ── Build the face track ──
info = self._get_video_info(video_path)
if info is None:
raise RuntimeError("无法读取视频信息")
w, h, nb_frames, fps = info
track = self._build_face_track(video_path, w, h, nb_frames)
if track is None:
raise RuntimeError("人脸追踪失败")
# ── Crop + super-resolution ──
enhanced_path = self._crop_and_upscale_video(
video_path,
track,
face_dir,
gpu_id,
source_fps=fps,
)
# Free the GPU cache
try:
import torch
if torch.cuda.is_available():
torch.cuda.empty_cache()
except ImportError:
pass
elapsed = time.time() - start
logger.info(
f"小脸增强: face_w={face_width:.0f}px threshold={threshold}px "
f"enhanced=True upscaler={settings.LIPSYNC_SMALL_FACE_UPSCALER} "
f"time={elapsed:.1f}s"
)
return EnhanceResult(
video_path=enhanced_path,
was_enhanced=True,
track=track,
face_width=face_width,
)
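
The timestamp-based frame mapping used in `blend_back` can be looked at in isolation. The following is a minimal sketch that assumes the same formula as the loop above; `map_to_original_index` is a hypothetical helper name and does not appear in the source.

```python
def map_to_original_index(
    ls_idx: int, ls_fps: float, orig_fps: float, orig_frame_count: int
) -> int:
    """Map a lipsync frame index to the nearest original frame index.

    Hypothetical helper: converts the lipsync index to a timestamp,
    converts that timestamp to an original-video index, and clamps the
    result to the last available frame.
    """
    if ls_fps <= 0 or orig_fps <= 0:
        raise ValueError("fps values must be positive")
    target = int(round((ls_idx / ls_fps) * orig_fps))
    return min(orig_frame_count - 1, target)


# Example: a 25 fps lipsync output aligned against a 30 fps source video.
indices = [map_to_original_index(i, 25.0, 30.0, 300) for i in range(6)]
```

Because the mapping only ever moves forward, the original reader can be advanced with a simple `while` loop, which is why the source never needs to seek backwards in the stream.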


@@ -182,18 +182,18 @@ class StorageService:
logger.error(f"Get public URL failed: {e}")
return ""
async def delete_file(self, bucket: str, path: str):
"""Delete a file asynchronously."""
try:
loop = asyncio.get_running_loop()
await loop.run_in_executor(
None,
lambda: self.supabase.storage.from_(bucket).remove([path])
)
logger.info(f"Deleted file: {bucket}/{path}")
except Exception as e:
logger.error(f"Delete file failed: {e}")
pass
async def delete_file(self, bucket: str, path: str):
"""Delete a file asynchronously."""
try:
loop = asyncio.get_running_loop()
await loop.run_in_executor(
None,
lambda: self.supabase.storage.from_(bucket).remove([path])
)
logger.info(f"Deleted file: {bucket}/{path}")
except Exception as e:
logger.error(f"Delete file failed: {e}")
raise
async def move_file(self, bucket: str, from_path: str, to_path: str):
"""Move/rename a file asynchronously."""
@@ -208,17 +208,19 @@ class StorageService:
logger.error(f"Move file failed: {e}")
raise
async def list_files(self, bucket: str, path: str) -> List[Any]:
"""List files asynchronously."""
try:
loop = asyncio.get_running_loop()
res = await loop.run_in_executor(
None,
lambda: self.supabase.storage.from_(bucket).list(path)
)
return res or []
except Exception as e:
logger.error(f"List files failed: {e}")
return []
async def list_files(self, bucket: str, path: str, strict: bool = False) -> List[Any]:
"""List files asynchronously. With strict=True, errors are re-raised instead of returning an empty list."""
try:
loop = asyncio.get_running_loop()
res = await loop.run_in_executor(
None,
lambda: self.supabase.storage.from_(bucket).list(path)
)
return res or []
except Exception as e:
logger.error(f"List files failed: {e}")
if strict:
raise
return []
storage_service = StorageService()
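
The change above makes `delete_file` raise and adds `strict` to `list_files`, which is what lets a cleanup caller report failure instead of silently succeeding. The sketch below illustrates that strict-success pattern under stated assumptions: `FakeStorage`, `cleanup_user_files`, and `run_cleanup` are hypothetical names, and the in-memory storage merely stands in for the real `StorageService`.

```python
import asyncio
from typing import Dict, Iterable, List


class FakeStorage:
    """Hypothetical in-memory stand-in for StorageService (illustration only)."""

    def __init__(self, files: Dict[str, List[str]], fail_on: Iterable[str] = ()):
        self.files = {bucket: list(names) for bucket, names in files.items()}
        self.fail_on = set(fail_on)

    async def list_files(self, bucket: str, path: str, strict: bool = False) -> List[str]:
        # Return a copy so callers can delete while iterating.
        return list(self.files.get(bucket, []))

    async def delete_file(self, bucket: str, path: str) -> None:
        # Mirrors the new behaviour: a failed delete raises instead of passing.
        if path in self.fail_on:
            raise RuntimeError(f"delete failed: {bucket}/{path}")
        self.files[bucket].remove(path)


async def cleanup_user_files(storage, buckets: List[str], prefix: str) -> List[str]:
    """Try to delete everything and return the paths that could not be deleted."""
    failed: List[str] = []
    for bucket in buckets:
        for name in await storage.list_files(bucket, prefix, strict=True):
            try:
                await storage.delete_file(bucket, name)
            except Exception:
                failed.append(f"{bucket}/{name}")
    return failed


def run_cleanup(storage) -> dict:
    failed = asyncio.run(
        cleanup_user_files(storage, ["outputs", "generated-audios"], "user-1")
    )
    # Success is reported only when every delete succeeded.
    return {"success": not failed, "failed": failed}
```

Collecting the failures rather than stopping at the first one is a design choice of this sketch; it lets a retry target only the files that are still present.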


@@ -847,13 +847,22 @@ class WeixinUploader(BaseUploader):
logger.info(text)
self._append_debug_log(text)
return True
text = "[weixin][file_input] empty"
logger.warning(text)
self._append_debug_log(text)
await asyncio.sleep(0.5)
if await self._is_upload_in_progress(page):
upload_started = False
for _ in range(3):
await asyncio.sleep(0.4)
if await self._is_upload_in_progress(page):
upload_started = True
break
if upload_started:
logger.info("[weixin] upload started after file input set")
return True
text = "[weixin][file_input] empty after set_input_files and no upload signal"
if attempt + 1 >= self.MAX_CLICK_RETRIES:
logger.warning(text)
else:
logger.info(text)
self._append_debug_log(text)
except Exception as e:
logger.warning(f"[weixin] failed to read file input info: {e}")
except Exception as e:

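The Weixin hunk above replaces a single delayed check with a short bounded poll for an upload signal. A minimal sketch of that pattern follows; `wait_for_signal` and `make_flaky_check` are hypothetical names introduced for illustration, and the defaults of 3 attempts and 0.4 s simply echo the values in the diff.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple


async def wait_for_signal(
    check: Callable[[], Awaitable[bool]],
    attempts: int = 3,
    interval: float = 0.4,
) -> bool:
    """Poll `check` up to `attempts` times, sleeping `interval` seconds before each try."""
    for _ in range(attempts):
        await asyncio.sleep(interval)
        if await check():
            return True
    return False


def make_flaky_check(true_on_call: int) -> Tuple[Callable[[], Awaitable[bool]], Dict[str, int]]:
    """Build a fake check that starts returning True on the N-th call."""
    calls = {"n": 0}

    async def check() -> bool:
        calls["n"] += 1
        return calls["n"] >= true_on_call

    return check, calls
```

In the uploader, `check` would correspond to something like `self._is_upload_in_progress(page)`; the fake check here only exists so the polling logic can be exercised without a browser.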

@@ -1,201 +1,775 @@
"""
Xiaohongshu (小红书) uploader using Playwright
Based on social-auto-upload implementation
"""
from datetime import datetime
from pathlib import Path
from typing import Optional, List, Dict, Any
import asyncio
from playwright.async_api import Playwright, async_playwright
from loguru import logger
from .base_uploader import BaseUploader
from .cookie_utils import set_init_script
class XiaohongshuUploader(BaseUploader):
"""Xiaohongshu video uploader using Playwright"""
# Timeout settings (seconds)
UPLOAD_TIMEOUT = 300 # video upload timeout
PUBLISH_TIMEOUT = 120 # publish detection timeout
POLL_INTERVAL = 1 # polling interval
def __init__(
self,
title: str,
file_path: str,
tags: List[str],
publish_date: Optional[datetime] = None,
account_file: Optional[str] = None,
description: str = ""
):
super().__init__(title, file_path, tags, publish_date, account_file, description)
self.upload_url = "https://creator.xiaohongshu.com/publish/publish?from=homepage&target=video"
async def set_schedule_time(self, page, publish_date):
"""Set scheduled publish time"""
try:
logger.info("[小红书] 正在设置定时发布时间...")
# Click "定时发布" label
label_element = page.locator("label:has-text('定时发布')")
await label_element.click()
await asyncio.sleep(1)
# Format time
publish_date_hour = publish_date.strftime("%Y-%m-%d %H:%M")
# Fill datetime input
await page.locator('.el-input__inner[placeholder="选择日期和时间"]').click()
await page.keyboard.press("Control+KeyA")
await page.keyboard.type(str(publish_date_hour))
await page.keyboard.press("Enter")
await asyncio.sleep(1)
logger.info(f"[小红书] 已设置定时发布: {publish_date_hour}")
except Exception as e:
logger.error(f"[小红书] 设置定时发布失败: {e}")
async def upload(self, playwright: Playwright) -> dict:
"""Main upload logic with guaranteed resource cleanup"""
browser = None
context = None
try:
# Launch browser (headless for server deployment)
browser = await playwright.chromium.launch(headless=True)
context = await browser.new_context(
viewport={"width": 1600, "height": 900},
storage_state=self.account_file
)
context = await set_init_script(context)
page = await context.new_page()
# Go to upload page
await page.goto(self.upload_url)
logger.info(f"[小红书] 正在上传: {self.file_path.name}")
# Upload video file
await page.locator("div[class^='upload-content'] input[class='upload-input']").set_input_files(str(self.file_path))
# Wait for upload to complete (with timeout)
import time
upload_start = time.time()
while time.time() - upload_start < self.UPLOAD_TIMEOUT:
try:
upload_input = await page.wait_for_selector('input.upload-input', timeout=3000)
preview_new = await upload_input.query_selector(
'xpath=following-sibling::div[contains(@class, "preview-new")]'
)
if preview_new:
stage_elements = await preview_new.query_selector_all('div.stage')
upload_success = False
for stage in stage_elements:
text_content = await page.evaluate('(element) => element.textContent', stage)
if '上传成功' in text_content:
upload_success = True
break
if upload_success:
logger.info("[小红书] 检测到上传成功标识")
break
else:
logger.info("[小红书] 未找到上传成功标识,继续等待...")
else:
logger.info("[小红书] 未找到预览元素,继续等待...")
await asyncio.sleep(self.POLL_INTERVAL)
except Exception as e:
logger.info(f"[小红书] 检测过程: {str(e)},重新尝试...")
await asyncio.sleep(0.5)
else:
logger.error("[小红书] 视频上传超时")
return {
"success": False,
"message": "视频上传超时",
"url": None
}
# Fill title and tags
await asyncio.sleep(1)
logger.info("[小红书] 正在填充标题和话题...")
title_container = page.locator('div.plugin.title-container').locator('input.d-text')
if await title_container.count():
await title_container.fill(self.title[:30])
# Add tags
css_selector = ".tiptap"
for tag in self.tags:
await page.type(css_selector, "#" + tag)
await page.press(css_selector, "Space")
logger.info(f"[小红书] 总共添加 {len(self.tags)} 个话题")
# Set scheduled publish time if needed
if self.publish_date != 0:
await self.set_schedule_time(page, self.publish_date)
# Click publish button (with timeout)
publish_start = time.time()
while time.time() - publish_start < self.PUBLISH_TIMEOUT:
try:
if self.publish_date != 0:
await page.locator('button:has-text("定时发布")').click()
else:
await page.locator('button:has-text("发布")').click()
await page.wait_for_url(
"https://creator.xiaohongshu.com/publish/success?**",
timeout=3000
)
logger.success("[小红书] 视频发布成功")
break
except Exception:
logger.info("[小红书] 视频正在发布中...")
await asyncio.sleep(0.5)
else:
logger.warning("[小红书] 发布检测超时,请手动确认")
# Save updated cookies
await context.storage_state(path=self.account_file)
logger.success("[小红书] Cookie 更新完毕")
await asyncio.sleep(2)
return {
"success": True,
"message": "发布成功,待审核" if self.publish_date == 0 else "已设置定时发布",
"url": None
}
except Exception as e:
logger.exception(f"[小红书] 上传失败: {e}")
return {
"success": False,
"message": f"上传失败: {str(e)}",
"url": None
}
finally:
# Make sure resources are released
if context:
try:
await context.close()
except Exception:
pass
if browser:
try:
await browser.close()
except Exception:
pass
async def main(self) -> Dict[str, Any]:
"""Execute upload"""
async with async_playwright() as playwright:
return await self.upload(playwright)
"""
Xiaohongshu (小红书) uploader using Playwright.
"""
from datetime import datetime
from pathlib import Path
from typing import Optional, List, Dict, Any
import asyncio
import os
import re
import shutil
import time
from playwright.async_api import Playwright, async_playwright
from loguru import logger
from .base_uploader import BaseUploader
from .cookie_utils import set_init_script
from app.core.config import settings
class XiaohongshuUploader(BaseUploader):
"""Xiaohongshu video uploader using Playwright"""
UPLOAD_TIMEOUT = 420
UPLOAD_IDLE_TIMEOUT = 90
UPLOAD_SIGNAL_TIMEOUT = 12
PUBLISH_TIMEOUT = 120
PAGE_READY_TIMEOUT = 60
POLL_INTERVAL = 2
MAX_CLICK_RETRIES = 3
def __init__(
self,
title: str,
file_path: str,
tags: List[str],
publish_date: Optional[datetime] = None,
account_file: Optional[str] = None,
description: str = "",
user_id: Optional[str] = None,
):
super().__init__(title, file_path, tags, publish_date, account_file, description)
self.user_id = user_id
self.upload_url = "https://creator.xiaohongshu.com/publish/publish?from=homepage&target=video"
self._publish_api_submitted = False
self._publish_api_error: Optional[str] = None
self._temp_upload_paths: List[Path] = []
def _track_temp_upload_path(self, path: Path) -> None:
self._temp_upload_paths.append(path)
def _prepare_upload_file(self) -> Path:
src = self.file_path
if src.suffix:
return src
parent_suffix = Path(src.parent.name).suffix
if not parent_suffix:
return src
temp_dir = Path("/tmp/vigent_uploads")
temp_dir.mkdir(parents=True, exist_ok=True)
target = temp_dir / src.parent.name
try:
if target.exists():
target.unlink()
except Exception:
pass
try:
os.link(src, target)
logger.info(f"[小红书] using hardlink upload file: {target}")
except Exception:
try:
shutil.copy2(src, target)
logger.info(f"[小红书] using copied upload file: {target}")
except Exception as e:
logger.warning(f"[小红书] 构建带后缀上传文件失败,回退原文件: {e}")
return src
self._track_temp_upload_path(target)
return target
def _cleanup_upload_file(self) -> None:
if not self._temp_upload_paths:
return
paths = list(self._temp_upload_paths)
self._temp_upload_paths = []
for path in paths:
try:
if path.exists():
path.unlink()
except Exception as e:
logger.warning(f"[小红书] 清理临时上传文件失败: {e}")
def _resolve_headless_mode(self) -> str:
mode = (settings.XIAOHONGSHU_HEADLESS_MODE or "").strip().lower()
return mode or "headless-new"
def _build_launch_options(self) -> Dict[str, Any]:
mode = self._resolve_headless_mode()
args = [
"--no-sandbox",
"--disable-dev-shm-usage",
"--disable-blink-features=AutomationControlled",
]
headless = mode not in ("headful", "false", "0", "no")
if headless and mode in ("new", "headless-new", "headless_new"):
args.append("--headless=new")
if settings.XIAOHONGSHU_FORCE_SWIFTSHADER or headless:
args.extend([
"--enable-unsafe-swiftshader",
"--use-gl=swiftshader",
])
options: Dict[str, Any] = {"headless": headless, "args": args}
chrome_path = (settings.XIAOHONGSHU_CHROME_PATH or "").strip()
if chrome_path:
if Path(chrome_path).exists():
options["executable_path"] = chrome_path
else:
logger.warning(f"[小红书] XIAOHONGSHU_CHROME_PATH 不存在: {chrome_path}")
else:
channel = (settings.XIAOHONGSHU_BROWSER_CHANNEL or "").strip()
if channel:
options["channel"] = channel
return options
def _debug_artifacts_enabled(self) -> bool:
return bool(settings.DEBUG and settings.XIAOHONGSHU_DEBUG_ARTIFACTS)
async def _save_debug_screenshot(self, page, name: str) -> None:
if not self._debug_artifacts_enabled():
return
try:
debug_dir = Path(__file__).parent.parent.parent / "debug_screenshots"
debug_dir.mkdir(exist_ok=True)
safe_name = name.replace("/", "_").replace(" ", "_")
file_path = debug_dir / f"xiaohongshu_{safe_name}.png"
await page.screenshot(path=str(file_path), full_page=True)
logger.info(f"[小红书] saved debug screenshot: {file_path}")
except Exception as e:
logger.warning(f"[小红书] 保存调试截图失败: {e}")
def _publish_screenshot_dir(self) -> Path:
user_key = re.sub(r"[^A-Za-z0-9_-]", "_", self.user_id or "legacy")[:64] or "legacy"
target = settings.PUBLISH_SCREENSHOT_DIR / user_key
target.mkdir(parents=True, exist_ok=True)
return target
async def _save_publish_success_screenshot(self, page) -> Optional[str]:
try:
timestamp = time.strftime("%Y%m%d_%H%M%S", time.localtime())
filename = f"xiaohongshu_success_{timestamp}_{int(time.time() * 1000) % 1000:03d}.png"
file_path = self._publish_screenshot_dir() / filename
await page.screenshot(path=str(file_path), full_page=False)
return f"/api/publish/screenshot/{filename}"
except Exception as e:
logger.warning(f"[小红书] 保存发布成功截图失败: {e}")
return None
def _attach_publish_listener(self, page) -> None:
ignore_tokens = ("report", "collect", "analytics", "monitor", "perf")
def on_response(response):
try:
request = response.request
if request.method not in ("POST", "PUT"):
return
url = (response.url or "").lower()
if "xiaohongshu.com" not in url or "api" not in url:
return
if not any(token in url for token in ("publish", "note/create", "note/publish", "note/save")):
return
if any(token in url for token in ignore_tokens):
return
if response.status < 400:
self._publish_api_submitted = True
logger.info("[小红书][publish] publish API ok")
else:
self._publish_api_error = f"发布请求失败HTTP {response.status}"
logger.warning(f"[小红书][publish] publish API failed status={response.status}")
except Exception:
pass
page.on("response", on_response)
async def _is_text_visible(self, page, text: str, exact: bool = False) -> bool:
try:
return await page.get_by_text(text, exact=exact).first.is_visible()
except Exception:
return False
async def _first_existing_locator(self, page, selectors: List[str], require_visible: bool = True):
for selector in selectors:
locator = page.locator(selector)
try:
if await locator.count() == 0:
continue
candidate = locator.first
if require_visible and not await candidate.is_visible():
continue
return candidate
except Exception:
continue
return None
async def _is_login_page(self, page) -> bool:
url = page.url.lower()
if "login" in url or "signin" in url:
return True
if await self._is_text_visible(page, "扫码登录", exact=False):
return True
if await self._is_text_visible(page, "立即登录", exact=False):
return True
return False
async def _go_to_publish_page(self, page):
await page.goto(self.upload_url, wait_until="domcontentloaded", timeout=self.PAGE_READY_TIMEOUT * 1000)
await asyncio.sleep(2)
return page
async def _find_file_input(self, page):
selectors = [
"input[type='file'][accept*='video']",
"div[class*='upload'] input[type='file']",
"input.upload-input",
"input[type='file']",
]
return await self._first_existing_locator(page, selectors, require_visible=False)
async def _open_upload_entry(self, page) -> None:
selectors = [
"button:has-text('上传视频')",
"button:has-text('上传')",
"div[role='button']:has-text('上传视频')",
"div[role='button']:has-text('上传')",
"span:has-text('上传视频')",
]
target = await self._first_existing_locator(page, selectors)
if not target:
return
try:
await target.scroll_into_view_if_needed()
except Exception:
pass
try:
await target.click(timeout=2000)
except Exception:
try:
await target.evaluate("el => el.click()")
except Exception:
pass
async def _is_upload_in_progress(self, page) -> bool:
in_progress_texts = [
"上传中",
"正在上传",
"处理中",
"视频处理中",
"转码中",
"请稍候",
"上传进度",
"校验中",
"准备中",
]
for text in in_progress_texts:
if await self._is_text_visible(page, text, exact=False):
return True
return False
async def _is_upload_success(self, page) -> bool:
success_texts = [
"上传成功",
"上传完成",
"处理完成",
"转码完成",
"可发布",
]
for text in success_texts:
if await self._is_text_visible(page, text, exact=False):
return True
return await self._is_publish_button_enabled(page)
async def _upload_failed_reason(self, page) -> Optional[str]:
failure_texts = [
"上传失败",
"上传异常",
"上传出错",
"上传超时",
"网络异常",
]
for text in failure_texts:
if await self._is_text_visible(page, text, exact=False):
return f"上传失败:{text}"
return None
async def _upload_video(self, page) -> bool:
page = await self._go_to_publish_page(page)
await self._save_debug_screenshot(page, "publish_page")
upload_path = self._prepare_upload_file()
try:
upload_size = upload_path.stat().st_size
logger.info(
f"[小红书][upload_file] path={upload_path} "
f"size={upload_size} suffix={upload_path.suffix}"
)
except Exception as e:
logger.warning(f"[小红书] 读取上传文件信息失败: {e}")
for attempt in range(self.MAX_CLICK_RETRIES):
file_input = await self._find_file_input(page)
if not file_input:
await self._open_upload_entry(page)
await asyncio.sleep(1)
file_input = await self._find_file_input(page)
if not file_input:
logger.info(f"[小红书] 未找到上传文件 input,准备重试 ({attempt + 1}/{self.MAX_CLICK_RETRIES})")
await asyncio.sleep(1)
continue
try:
await file_input.set_input_files(str(upload_path))
logger.info(f"[小红书] 已设置上传文件: {upload_path.name}")
try:
file_info = await file_input.evaluate(
"""
(input) => {
const file = input && input.files ? input.files[0] : null;
if (!file) return null;
return { name: file.name, size: file.size, type: file.type };
}
"""
)
if file_info:
selected_name = str(file_info.get("name") or "")
logger.info(
"[小红书][file_input] "
f"name={selected_name} "
f"size={file_info.get('size')} "
f"type={file_info.get('type')}"
)
if upload_path.suffix and selected_name and not selected_name.lower().endswith(upload_path.suffix.lower()):
logger.warning(
"[小红书] file input 文件名后缀与上传文件不一致,"
f"expect=*{upload_path.suffix} actual={selected_name}"
)
if attempt + 1 < self.MAX_CLICK_RETRIES:
await asyncio.sleep(1)
continue
await self._save_debug_screenshot(page, "upload_input_name_mismatch")
return False
if not str(file_info.get("type") or "").strip():
logger.warning("[小红书] file input MIME 为空,可能影响站点识别")
except Exception:
pass
signal_detected = False
bootstrap_error: Optional[str] = None
deadline = time.time() + self.UPLOAD_SIGNAL_TIMEOUT
while time.time() < deadline:
bootstrap_error = await self._upload_failed_reason(page)
if bootstrap_error:
break
if await self._is_upload_in_progress(page) or await self._is_upload_success(page):
signal_detected = True
break
await asyncio.sleep(0.6)
if bootstrap_error:
logger.warning(f"[小红书] 上传启动阶段失败: {bootstrap_error}")
if attempt + 1 < self.MAX_CLICK_RETRIES:
await asyncio.sleep(1)
continue
return False
if signal_detected:
return True
logger.info("[小红书] 未立即检测到上传状态,进入后续上传监控")
return True
except Exception as e:
logger.warning(f"[小红书] set_input_files 失败: {e}")
await asyncio.sleep(1)
await self._save_debug_screenshot(page, "upload_input_missing")
return False
async def _wait_for_upload_complete(self, page) -> tuple[bool, str]:
start = time.time()
idle_start = start
while time.time() - start < self.UPLOAD_TIMEOUT:
reason = await self._upload_failed_reason(page)
if reason:
logger.warning(f"[小红书] 上传失败检测: {reason}")
return False, reason
if await self._is_upload_success(page):
return True, "上传完成"
if await self._is_upload_in_progress(page):
idle_start = time.time()
logger.info("[小红书] 视频上传进行中...")
else:
if time.time() - idle_start > self.UPLOAD_IDLE_TIMEOUT:
await self._save_debug_screenshot(page, "upload_idle_timeout")
return False, "未检测到有效上传进度(疑似上传控件未生效)"
logger.info("[小红书] 等待上传状态...")
await asyncio.sleep(self.POLL_INTERVAL)
return False, "视频上传超时"
def _normalize_tags(self, tags: List[str]) -> List[str]:
normalized: List[str] = []
seen = set()
for raw in tags:
item = (raw or "").strip().lstrip("#")
if not item:
continue
lowered = item.lower()
if lowered in seen:
continue
seen.add(lowered)
normalized.append(item)
return normalized
async def _fill_title(self, page) -> bool:
selectors = [
"input[placeholder*='标题']",
"div.plugin.title-container input",
"input.d-text",
]
target = await self._first_existing_locator(page, selectors)
if not target:
return False
try:
await target.click(timeout=1500)
await target.fill((self.title or "")[:30])
return True
except Exception:
return False
async def _fill_description(self, page, text: str) -> bool:
selectors = [
".tiptap[contenteditable='true']",
"[contenteditable='true'][data-placeholder*='描述']",
"[contenteditable='true'][role='textbox']",
"textarea[placeholder*='描述']",
"textarea[placeholder*='正文']",
]
target = await self._first_existing_locator(page, selectors)
if not target:
return False
try:
await target.click(timeout=1500)
await page.keyboard.press("Control+KeyA")
await page.keyboard.type(text)
return True
except Exception:
return False
async def set_schedule_time(self, page, publish_date: datetime) -> bool:
try:
toggle = await self._first_existing_locator(
page,
[
"label:has-text('定时发布')",
"span:has-text('定时发布')",
"div:has-text('定时发布')",
],
)
if not toggle:
return False
try:
await toggle.click(timeout=2000)
except Exception:
await toggle.evaluate("el => el.click()")
await asyncio.sleep(0.5)
date_input = await self._first_existing_locator(
page,
[
"input[placeholder*='日期和时间']",
"input[placeholder*='发布时间']",
"input[placeholder*='选择日期']",
],
)
if not date_input:
return False
value = publish_date.strftime("%Y-%m-%d %H:%M")
await date_input.click(timeout=2000)
await page.keyboard.press("Control+KeyA")
await page.keyboard.type(value)
await page.keyboard.press("Enter")
logger.info(f"[小红书] 已设置定时发布: {value}")
return True
except Exception as e:
logger.warning(f"[小红书] 设置定时发布时间失败: {e}")
return False
async def _find_publish_button(self, page, scheduled: bool):
selectors = [
"button:has-text('定时发布')",
"div[role='button']:has-text('定时发布')",
] if scheduled else [
"button:has-text('发布')",
"button:has-text('立即发布')",
"div[role='button']:has-text('发布')",
]
for selector in selectors:
locator = page.locator(selector)
try:
if await locator.count() == 0:
continue
candidate = locator.first
if not await candidate.is_visible():
continue
return candidate
except Exception:
continue
return None
async def _is_publish_button_enabled(self, page) -> bool:
buttons = [
await self._find_publish_button(page, scheduled=False),
await self._find_publish_button(page, scheduled=True),
]
for button in buttons:
if not button:
continue
try:
if await button.is_enabled():
return True
except Exception:
continue
return False
async def _click_publish(self, page, scheduled: bool) -> tuple[bool, str]:
for _ in range(self.MAX_CLICK_RETRIES):
button = await self._find_publish_button(page, scheduled)
if not button:
await asyncio.sleep(0.8)
continue
try:
if not await button.is_enabled():
await asyncio.sleep(0.8)
continue
except Exception:
pass
try:
await button.click(timeout=2000)
return True, "发布按钮点击成功"
except Exception:
try:
await button.evaluate("el => el.click()")
return True, "发布按钮 JS 点击成功"
except Exception:
await asyncio.sleep(0.8)
return False, "未找到可点击的发布按钮"
async def _wait_for_publish_result(self, page) -> tuple[bool, str, bool]:
create_url = page.url
success_url_tokens = [
"/publish/success",
"/publish/result",
"/publish/published",
]
success_texts = [
"发布成功",
"发布完成",
"审核中",
"查看笔记",
"去查看",
]
failure_texts = [
"发布失败",
"发布异常",
"发布出错",
"网络异常",
"请完善",
"请补充",
]
start_time = time.time()
while time.time() - start_time < self.PUBLISH_TIMEOUT:
if self._publish_api_error:
return False, self._publish_api_error, False
current_url = page.url
lowered_url = current_url.lower()
if any(token in lowered_url for token in success_url_tokens):
return True, f"发布成功:跳转到 {current_url}", False
if current_url != create_url and "/publish/publish" not in lowered_url:
return True, f"发布成功:页面已跳转 {current_url}", False
if self._publish_api_submitted:
return True, "发布成功API 已确认", False
for text in failure_texts:
if await self._is_text_visible(page, text, exact=False):
return False, f"发布失败:{text}", False
for text in success_texts:
if await self._is_text_visible(page, text, exact=False):
return True, f"发布成功:检测到文案 {text}", False
logger.info("[小红书] 等待发布结果...")
await asyncio.sleep(self.POLL_INTERVAL)
return False, "发布超时", True
async def upload(self, playwright: Playwright) -> Dict[str, Any]:
browser = None
context = None
page = None
try:
launch_options = self._build_launch_options()
browser = await playwright.chromium.launch(**launch_options)
context = await browser.new_context(
storage_state=self.account_file,
viewport={"width": 1600, "height": 900},
device_scale_factor=1,
user_agent=settings.XIAOHONGSHU_USER_AGENT,
locale=settings.XIAOHONGSHU_LOCALE,
timezone_id=settings.XIAOHONGSHU_TIMEZONE_ID,
)
context = await set_init_script(context)
page = await context.new_page()
self._attach_publish_listener(page)
await self._go_to_publish_page(page)
if await self._is_login_page(page):
return {
"success": False,
"message": "登录失效,请重新扫码登录小红书",
"url": None,
}
logger.info(f"[小红书] 正在上传: {self.file_path.name}")
if not await self._upload_video(page):
return {
"success": False,
"message": "未能触发有效视频上传,请确认发布页状态及视频文件格式",
"url": None,
}
upload_success, upload_reason = await self._wait_for_upload_complete(page)
if not upload_success:
await self._save_debug_screenshot(page, "upload_failed")
return {
"success": False,
"message": upload_reason,
"url": None,
}
await asyncio.sleep(1)
title_filled = await self._fill_title(page)
if not title_filled:
logger.warning("[小红书] 未找到标题输入框,尝试在正文中补充标题")
normalized_tags = self._normalize_tags(self.tags)
body_parts: List[str] = []
if self.description:
body_parts.append(self.description.strip())
if not title_filled and self.title:
body_parts.insert(0, self.title.strip())
if normalized_tags:
body_parts.append(" ".join([f"#{tag}" for tag in normalized_tags]))
body_text = "\n".join([part for part in body_parts if part]).strip()
if body_text:
body_ok = await self._fill_description(page, body_text)
if not body_ok:
logger.warning("[小红书] 未找到正文输入框,跳过正文/话题填充")
if self.publish_date != 0 and isinstance(self.publish_date, datetime):
if not await self.set_schedule_time(page, self.publish_date):
return {
"success": False,
"message": "未找到定时发布控件,请检查小红书发布页结构",
"url": None,
}
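# Use the same datetime check as the scheduling step so a None publish_date is not treated as scheduled
clicked, click_reason = await self._click_publish(page, scheduled=isinstance(self.publish_date, datetime))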
if not clicked:
await self._save_debug_screenshot(page, "publish_button_not_clickable")
return {
"success": False,
"message": click_reason,
"url": None,
}
publish_success, publish_reason, is_timeout = await self._wait_for_publish_result(page)
await context.storage_state(path=self.account_file)
logger.success("[小红书] Cookie 更新完毕")
if publish_success:
await asyncio.sleep(2)
screenshot_url = await self._save_publish_success_screenshot(page)
return {
"success": True,
"message": "发布成功,待审核" if self.publish_date == 0 else "已设置定时发布",
"url": None,
"screenshot_url": screenshot_url,
}
if is_timeout:
return {
"success": False,
"message": f"发布状态未知(检测超时),请到小红书创作中心确认: {publish_reason}",
"url": None,
}
return {
"success": False,
"message": publish_reason,
"url": None,
}
except Exception as e:
logger.exception(f"[小红书] 上传失败: {e}")
return {
"success": False,
"message": f"上传失败: {str(e)}",
"url": None,
}
finally:
self._cleanup_upload_file()
if page:
try:
if not page.is_closed():
await page.close()
except Exception:
pass
if context:
try:
await context.close()
except Exception:
pass
if browser:
try:
await browser.close()
except Exception:
pass
async def main(self) -> Dict[str, Any]:
async with async_playwright() as playwright:
return await self.upload(playwright)
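
The body-assembly step in `upload` (title fallback, description, then hashtags) can be read in isolation from the Playwright calls. The sketch below restates it in TypeScript, only to stay consistent with the other examples on this page; it is not the uploader's Python. `buildBodyText` and its option names are hypothetical, and the tags are assumed to be already normalized (no leading `#`).

```typescript
// Hypothetical restatement of the body-assembly step, for illustration only.
interface BodyOptions {
  title?: string;
  description?: string;
  tags: string[];       // assumed already normalized (no leading '#')
  titleFilled: boolean; // whether the dedicated title input was found
}

function buildBodyText(opts: BodyOptions): string {
  const parts: string[] = [];
  if (opts.description) parts.push(opts.description.trim());
  // Fall back to putting the title at the top of the body when no title input exists.
  if (!opts.titleFilled && opts.title) parts.unshift(opts.title.trim());
  if (opts.tags.length > 0) parts.push(opts.tags.map((t) => `#${t}`).join(" "));
  // Drop empty parts, join with newlines, and trim the result.
  return parts.filter((p) => p.length > 0).join("\n").trim();
}

console.log(buildBodyText({ title: "T", description: " D ", tags: ["a", "b"], titleFilled: false }));
```

When the title input is found (`titleFilled: true`), the title is left out of the body and only the description and hashtag line remain.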

View File

@@ -38,3 +38,7 @@ faster-whisper>=1.0.0
# Script extraction and AI generation
yt-dlp>=2023.0.0
zai-sdk>=0.2.0
# Small-face enhancement
opencv-python-headless>=4.8.0
gfpgan>=1.3.8

Binary file not shown. (Before: 25 KiB. After: 3.6 KiB.)

BIN frontend/src/app/icon.png Normal file

Binary file not shown. (Size: 9.6 KiB.)

View File

@@ -3,6 +3,7 @@ import { Geist, Geist_Mono } from "next/font/google";
import "./globals.css";
import { AuthProvider } from "@/shared/contexts/AuthContext";
import { TaskProvider } from "@/shared/contexts/TaskContext";
import { CleanupProvider } from "@/shared/contexts/CleanupContext";
import { Toaster } from "sonner";
@@ -40,7 +41,9 @@ export default function RootLayout({
>
<AuthProvider>
<TaskProvider>
{children}
<CleanupProvider>
{children}
</CleanupProvider>
</TaskProvider>
</AuthProvider>
<Toaster

View File

@@ -4,6 +4,7 @@ import { useState, useEffect, useRef } from "react";
import { useAuth } from "@/shared/contexts/AuthContext";
import api from "@/shared/api/axios";
import { ApiResponse } from "@/shared/api/types";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
// Account settings dropdown component
export default function AccountSettingsDropdown() {
@@ -90,6 +91,15 @@ export default function AccountSettingsDropdown() {
}
};
const closePasswordModal = () => {
setShowPasswordModal(false);
setError('');
setSuccess('');
setOldPassword('');
setNewPassword('');
setConfirmPassword('');
};
return (
<div className="relative" ref={dropdownRef}>
<button
@@ -137,81 +147,83 @@ export default function AccountSettingsDropdown() {
{/* Change-password modal */}
{showPasswordModal && (
<div className="fixed inset-0 z-[200] flex items-start justify-center pt-20 bg-black/60 backdrop-blur-sm p-4">
<div className="w-full max-w-md p-6 bg-gray-900 border border-white/10 rounded-2xl shadow-2xl mx-4">
<h3 className="text-xl font-bold text-white mb-4"></h3>
<form onSubmit={handleChangePassword} className="space-y-4">
<div>
<label className="block text-sm text-gray-300 mb-1"></label>
<input
type="password"
value={oldPassword}
onChange={(e) => setOldPassword(e.target.value)}
required
className="w-full px-3 py-2 bg-white/5 border border-white/10 rounded-lg text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-purple-500"
placeholder="输入当前密码"
/>
</div>
<div>
<label className="block text-sm text-gray-300 mb-1"></label>
<input
type="password"
value={newPassword}
onChange={(e) => setNewPassword(e.target.value)}
required
className="w-full px-3 py-2 bg-white/5 border border-white/10 rounded-lg text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-purple-500"
placeholder="至少6位"
/>
</div>
<div>
<label className="block text-sm text-gray-300 mb-1"></label>
<input
type="password"
value={confirmPassword}
onChange={(e) => setConfirmPassword(e.target.value)}
required
className="w-full px-3 py-2 bg-white/5 border border-white/10 rounded-lg text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-purple-500"
placeholder="再次输入新密码"
/>
</div>
<AppModal
isOpen={showPasswordModal}
onClose={closePasswordModal}
zIndexClassName="z-[200]"
panelClassName="w-full max-w-md rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden"
closeOnOverlay
>
<AppModalHeader
title="修改密码"
subtitle="修改后将自动退出并重新登录"
onClose={closePasswordModal}
/>
{error && (
<div className="p-2 bg-red-500/20 border border-red-500/50 rounded text-red-200 text-sm">
{error}
</div>
)}
{success && (
<div className="p-2 bg-green-500/20 border border-green-500/50 rounded text-green-200 text-sm">
{success}
</div>
)}
<form onSubmit={handleChangePassword} className="space-y-4 p-5">
<div>
<label className="block text-sm text-gray-300 mb-1"></label>
<input
type="password"
value={oldPassword}
onChange={(e) => setOldPassword(e.target.value)}
required
className="w-full px-3 py-2 bg-white/5 border border-white/10 rounded-lg text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-purple-500"
placeholder="输入当前密码"
/>
</div>
<div>
<label className="block text-sm text-gray-300 mb-1"></label>
<input
type="password"
value={newPassword}
onChange={(e) => setNewPassword(e.target.value)}
required
className="w-full px-3 py-2 bg-white/5 border border-white/10 rounded-lg text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-purple-500"
placeholder="至少6位"
/>
</div>
<div>
<label className="block text-sm text-gray-300 mb-1"></label>
<input
type="password"
value={confirmPassword}
onChange={(e) => setConfirmPassword(e.target.value)}
required
className="w-full px-3 py-2 bg-white/5 border border-white/10 rounded-lg text-white placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-purple-500"
placeholder="再次输入新密码"
/>
</div>
<div className="flex gap-3 pt-2">
<button
type="button"
onClick={() => {
setShowPasswordModal(false);
setError('');
setSuccess('');
setOldPassword('');
setNewPassword('');
setConfirmPassword('');
}}
className="flex-1 py-2 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors"
>
</button>
<button
type="submit"
disabled={loading}
className="flex-1 py-2 bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white rounded-lg transition-colors disabled:opacity-50"
>
{loading ? '修改中...' : '确认修改'}
</button>
{error && (
<div className="p-2 bg-red-500/20 border border-red-500/50 rounded text-red-200 text-sm">
{error}
</div>
</form>
</div>
</div>
)}
{success && (
<div className="p-2 bg-green-500/20 border border-green-500/50 rounded text-green-200 text-sm">
{success}
</div>
)}
<div className="flex gap-3 pt-2">
<button
type="button"
onClick={closePasswordModal}
className="flex-1 py-2 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors"
>
</button>
<button
type="submit"
disabled={loading}
className="flex-1 py-2 bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-700 hover:to-pink-700 text-white rounded-lg transition-colors disabled:opacity-50"
>
{loading ? '修改中...' : '确认修改'}
</button>
</div>
</form>
</AppModal>
)}
</div>
);

View File

@@ -1,7 +1,7 @@
"use client";
import { useEffect } from "react";
import { X, Video } from "lucide-react";
import { Video } from "lucide-react";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
interface VideoPreviewModalProps {
videoUrl: string | null;
@@ -16,66 +16,34 @@ export default function VideoPreviewModal({
title = "视频预览",
subtitle = "ESC 关闭 · 点击空白关闭",
}: VideoPreviewModalProps) {
useEffect(() => {
if (!videoUrl) return;
// Close on ESC
const handleEsc = (e: KeyboardEvent) => {
if (e.key === 'Escape') onClose();
};
const prevOverflow = document.body.style.overflow;
document.addEventListener('keydown', handleEsc);
// Disable background scrolling
document.body.style.overflow = 'hidden';
if (!videoUrl) return null;
return () => {
document.removeEventListener('keydown', handleEsc);
document.body.style.overflow = prevOverflow;
};
}, [videoUrl, onClose]);
return (
<AppModal
isOpen={Boolean(videoUrl)}
onClose={onClose}
zIndexClassName="z-[320]"
panelClassName="relative w-full max-w-4xl rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
closeOnOverlay
>
<div data-video-preview-open="true" className="flex flex-col">
<AppModalHeader
title={title}
subtitle={subtitle}
icon={<Video className="h-5 w-5" />}
onClose={onClose}
/>
if (!videoUrl) return null;
return (
<div
className="fixed inset-0 z-[200] flex items-center justify-center bg-black/80 backdrop-blur-sm p-4 animate-in fade-in duration-200"
onClick={onClose}
>
<div
className="relative w-full max-w-4xl bg-gray-900 border border-white/10 rounded-2xl shadow-2xl overflow-hidden flex flex-col"
onClick={(e) => e.stopPropagation()}
>
<div className="flex items-center justify-between px-6 py-3 border-b border-white/10 bg-gradient-to-r from-white/5 via-white/0 to-white/5">
<div className="flex items-center gap-3">
<div className="h-9 w-9 rounded-lg bg-white/10 flex items-center justify-center text-white">
<Video className="h-5 w-5" />
</div>
<div>
<h3 className="text-lg font-semibold text-white">
{title}
</h3>
<p className="text-xs text-gray-400">
{subtitle}
</p>
</div>
</div>
<button
onClick={onClose}
className="p-2 text-gray-400 hover:text-white hover:bg-white/10 rounded-lg transition-colors"
>
<X className="h-5 w-5" />
</button>
</div>
<div className="bg-black flex items-center justify-center min-h-[50vh] max-h-[80vh]">
<video
src={videoUrl}
controls
autoPlay
preload="metadata"
className="w-full h-full max-h-[80vh] object-contain"
/>
</div>
</div>
<div className="bg-black flex items-center justify-center min-h-[50vh] max-h-[80vh]">
<video
src={videoUrl}
controls
autoPlay
preload="metadata"
className="w-full h-full max-h-[80vh] object-contain"
/>
</div>
);
</div>
</AppModal>
);
}

View File

@@ -1,4 +1,4 @@
import { useEffect, useMemo, useRef, useState } from "react";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import api from "@/shared/api/axios";
import {
buildTextShadow,
@@ -256,6 +256,14 @@ export const useHomeController = () => {
const payload = unwrap(res);
if (selectedMaterials.includes(materialId) && payload?.id) {
setSelectedMaterials((prev) => prev.map((x) => (x === materialId ? payload.id : x)));
// Sync inserts: update materialId and name when rename changes the ID
if (payload.id !== materialId) {
setInserts((prev) => prev.map((ins) =>
ins.materialId === materialId
? { ...ins, materialId: payload.id, materialName: editMaterialName.trim() }
: ins
));
}
}
setEditingMaterialId(null);
setEditMaterialName("");
@@ -287,6 +295,9 @@ export const useHomeController = () => {
// Script extraction modal
const [extractModalOpen, setExtractModalOpen] = useState(false);
// Script deep-learning modal
const [learningModalOpen, setLearningModalOpen] = useState(false);
// AI rewrite modal
const [rewriteModalOpen, setRewriteModalOpen] = useState(false);
@@ -307,6 +318,7 @@ export const useHomeController = () => {
setUploadError,
fetchMaterials,
toggleMaterial,
reorderMaterials,
deleteMaterial,
handleUpload,
} = useMaterials({
@@ -394,9 +406,17 @@ export const useHomeController = () => {
});
const {
segments: timelineSegments,
reorderSegments,
setSourceRange,
inserts,
setInserts,
primaryMaterial: timelinePrimaryMaterial,
primarySourceStart,
primarySourceEnd,
addInsert,
removeInsert,
moveInsert,
resizeInsert,
setInsertSourceRange,
setPrimarySourceRange,
toCustomAssignments,
} = useTimelineEditor({
audioDuration: selectedAudio?.duration_sec ?? 0,
@@ -405,16 +425,15 @@ export const useHomeController = () => {
storageKey,
});
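// Video URL of the first timeline segment's material (used for frame-capture preview)
// Material video URL (used for frame-capture preview)
// Use the backend proxy URL (same origin) to avoid CORS canvas taint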
const firstTimelineMaterialUrl = useMemo(() => {
const firstSeg = timelineSegments[0];
const matId = firstSeg?.materialId ?? selectedMaterials[0];
const matId = selectedMaterials[0];
if (!matId) return null;
const mat = materials.find((m) => m.id === matId);
if (!mat) return null;
return `/api/materials/stream/${mat.id}`;
}, [materials, timelineSegments, selectedMaterials]);
}, [materials, selectedMaterials]);
const materialPosterUrl = useVideoFrameCapture(showStylePreview ? firstTimelineMaterialUrl : null);
@@ -735,6 +754,9 @@ export const useHomeController = () => {
// Start recording
const startRecording = async () => {
try {
setRecordedBlob(null);
setRecordingTime(0);
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const mediaRecorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
const chunks: BlobPart[] = [];
@@ -748,7 +770,6 @@ export const useHomeController = () => {
mediaRecorder.start();
setIsRecording(true);
setRecordingTime(0);
mediaRecorderRef.current = mediaRecorder;
// Timer
@@ -784,6 +805,11 @@ export const useHomeController = () => {
setRecordingTime(0);
};
const discardRecording = () => {
setRecordedBlob(null);
setRecordingTime(0);
};
// Format the recording duration
const formatRecordingTime = (seconds: number) => {
const mins = Math.floor(seconds / 60);
@@ -945,57 +971,36 @@ export const useHomeController = () => {
output_aspect_ratio: outputAspectRatio,
};
// Multiple materials
// Multiple materials (multi-camera mode)
if (selectedMaterials.length > 1) {
const timelineOrderedIds = timelineSegments
.map((seg) => seg.materialId)
.filter((id, index, arr) => arr.indexOf(id) === index);
const orderedMaterialIds = [
...timelineOrderedIds.filter((id) => selectedMaterials.includes(id)),
...selectedMaterials.filter((id) => !timelineOrderedIds.includes(id)),
];
const materialPaths = orderedMaterialIds
.map((id) => materials.find((x) => x.id === id)?.path)
.filter((path): path is string => !!path);
if (materialPaths.length === 0) {
toast.error("多素材解析失败,请刷新素材后重试");
return;
}
payload.material_paths = materialPaths;
payload.material_path = materialPaths[0];
// Send the custom timeline assignments
const assignments = toCustomAssignments();
if (assignments.length > 0) {
const assignmentPaths = assignments
.map((a) => a.material_path)
.filter((path): path is string => !!path);
if (assignmentPaths.length === assignments.length) {
// The visible timeline segments are authoritative: materials beyond the timeline are excluded from this generation
payload.material_paths = assignmentPaths;
payload.material_path = assignmentPaths[0];
// Front-end check on the estimated segment count (aligned with the backend hard limit of 50)
if (assignments.length > 50) {
toast.error(`时间轴段数过多(${assignments.length}),请减少插入或使用更长的主素材`);
return;
}
// Primary material path (always taken from selectedMaterials[0])
const primaryPath = firstMaterialObj.path;
// De-duplicated list of material paths, with the primary material guaranteed first
const otherPaths = [...new Set(
assignments.map((a) => a.material_path).filter((p) => p !== primaryPath)
)];
payload.material_path = primaryPath;
payload.material_paths = [primaryPath, ...otherPaths];
payload.custom_assignments = assignments;
} else {
console.warn(
"[Timeline] custom_assignments 为空,回退后端自动分配",
{ materials: materialPaths.length }
);
// No inserts and no trimming of the primary material: fall back to single-material mode
payload.material_path = firstMaterialObj.path;
}
}
// Single material + trimmed range
const singleSeg = timelineSegments[0];
if (
selectedMaterials.length === 1
&& singleSeg
&& (singleSeg.sourceStart > 0 || singleSeg.sourceEnd > 0)
) {
payload.custom_assignments = toCustomAssignments();
if (selectedMaterials.length === 1) {
const assignments = toCustomAssignments();
if (assignments.length > 0) {
payload.custom_assignments = assignments;
}
}
if (selectedSubtitleStyleId) {
@@ -1040,7 +1045,7 @@ export const useHomeController = () => {
if (enableBgm && selectedBgmId) {
payload.bgm_id = selectedBgmId;
payload.bgm_volume = bgmVolume;
payload.bgm_volume = 0.2;
}
// Create the generation task
@@ -1087,6 +1092,21 @@ export const useHomeController = () => {
videoItemRefs.current[id] = el;
};
// Set as primary: move the target material to selectedMaterials[0]
const handleSetPrimary = useCallback((materialId: string) => {
setSelectedMaterials((prev) => {
const filtered = prev.filter((id) => id !== materialId);
return [materialId, ...filtered];
});
}, [setSelectedMaterials]);
// Multi-camera insert candidates (selectedMaterials[1:])
const insertCandidates = useMemo(() => {
return selectedMaterials.slice(1)
.map((id) => materials.find((m) => m.id === id))
.filter((m): m is Material => !!m);
}, [selectedMaterials, materials]);
return {
apiBase,
registerMaterialRef,
@@ -1116,6 +1136,8 @@ export const useHomeController = () => {
setText,
extractModalOpen,
setExtractModalOpen,
learningModalOpen,
setLearningModalOpen,
rewriteModalOpen,
setRewriteModalOpen,
handleGenerateMeta,
@@ -1198,6 +1220,7 @@ export const useHomeController = () => {
startRecording,
stopRecording,
useRecording,
discardRecording,
formatRecordingTime,
bgmList,
bgmLoading,
@@ -1238,9 +1261,20 @@ export const useHomeController = () => {
setSpeed,
emotion,
setEmotion,
timelineSegments,
reorderSegments,
setSourceRange,
// Multi-camera timeline
inserts,
timelinePrimaryMaterial,
primarySourceStart,
primarySourceEnd,
insertCandidates,
addInsert,
removeInsert,
moveInsert,
resizeInsert,
setInsertSourceRange,
setPrimarySourceRange,
handleSetPrimary,
reorderMaterials,
clipTrimmerOpen,
setClipTrimmerOpen,
clipTrimmerSegmentId,
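
The multi-camera payload in this hunk keeps the primary material first in `material_paths` and de-duplicates the remaining paths taken from the assignments, while `handleSetPrimary` moves one ID to the front of the selection. A minimal sketch of both orderings, assuming hypothetical helper names (`buildMaterialPaths`, `moveToFront`) and a reduced `AssignmentPath` type that carries only `material_path`:

```typescript
// Reduced, hypothetical shape: only the field needed for path ordering.
interface AssignmentPath {
  material_path: string;
}

// Primary path first, then every other path once, in first-seen order.
function buildMaterialPaths(primaryPath: string, assignments: AssignmentPath[]): string[] {
  const others = Array.from(
    new Set(assignments.map((a) => a.material_path).filter((p) => p !== primaryPath)),
  );
  return [primaryPath, ...others];
}

// Same reordering as handleSetPrimary, written as a pure function.
function moveToFront(ids: string[], id: string): string[] {
  return [id, ...ids.filter((x) => x !== id)];
}

console.log(
  buildMaterialPaths("a.mp4", [
    { material_path: "a.mp4" },
    { material_path: "b.mp4" },
    { material_path: "a.mp4" },
    { material_path: "c.mp4" },
    { material_path: "b.mp4" },
  ]),
);
console.log(moveToFront(["m1", "m2", "m3"], "m3"));
```

Because a `Set` iterates in insertion order, the non-primary paths keep the order in which they first appear on the timeline.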

View File

@@ -1,5 +1,9 @@
import { useCallback, useEffect, useRef, useState } from "react";
import type { Material } from "@/shared/types/material";
import type { InsertSegment } from "@/shared/types/timeline";
// Re-export for downstream consumers (ClipTrimmer, etc.)
export type { InsertSegment };
export interface TimelineSegment {
id: string;
@@ -12,18 +16,23 @@ export interface TimelineSegment {
color: string;
}
export interface CustomAssignment {
material_path: string;
start: number;
end: number;
source_start: number;
source_end?: number;
}
export interface CustomAssignment {
material_path: string;
start: number;
end: number;
source_start: number;
source_end?: number;
}
const COLORS = ["#8b5cf6", "#ec4899", "#06b6d4", "#f59e0b", "#10b981", "#f97316"];
const MAX_INSERTS = 10;
const DEFAULT_INSERT_DURATION = 3;
const MIN_GAP = 0.5;
export type AddInsertResult = "ok" | "limit" | "no_space";
/** Serializable subset for localStorage */
interface SegmentSnapshot {
interface InsertSnapshot {
materialId: string;
start: number;
end: number;
@@ -31,56 +40,11 @@ interface SegmentSnapshot {
sourceEnd: number;
}
/** Get effective duration of a segment (clipped range or full material duration) */
function getEffectiveDuration(
seg: { sourceStart: number; sourceEnd: number; materialId: string },
mats: Material[]
): number {
const mat = mats.find((m) => m.id === seg.materialId);
const matDur = mat?.duration_sec ?? 0;
if (seg.sourceEnd > seg.sourceStart) return seg.sourceEnd - seg.sourceStart;
if (seg.sourceStart > 0) return Math.max(matDur - seg.sourceStart, 0);
return matDur;
}
/**
* Recalculate segment start/end positions based on effective durations.
* - Segments placed sequentially by effective duration
* - Segments exceeding audioDuration keep their positions (overflow, start >= duration)
* - Last visible segment is capped/extended to exactly audioDuration (loop fill)
*/
function recalcPositions(
segs: TimelineSegment[],
mats: Material[],
duration: number
): TimelineSegment[] {
if (segs.length === 0 || duration <= 0) return segs;
const fallbackDur = duration / segs.length;
let cursor = 0;
const result = segs.map((seg) => {
const effDur = getEffectiveDuration(seg, mats);
const dur = effDur > 0 ? effDur : fallbackDur;
const newSeg = { ...seg, start: cursor, end: cursor + dur };
cursor += dur;
return newSeg;
});
// Find last segment that starts before audioDuration
let lastVisibleIdx = -1;
for (let i = result.length - 1; i >= 0; i--) {
if (result[i].start < duration) {
lastVisibleIdx = i;
break;
}
}
// Cap/extend last visible segment to exactly audioDuration
if (lastVisibleIdx >= 0) {
result[lastVisibleIdx] = { ...result[lastVisibleIdx], end: duration };
}
return result;
interface MultiCamCache {
key: string;
inserts: InsertSnapshot[];
primarySourceStart: number;
primarySourceEnd: number;
}
interface UseTimelineEditorOptions {
@@ -96,34 +60,40 @@ export const useTimelineEditor = ({
selectedMaterials,
storageKey,
}: UseTimelineEditorOptions) => {
const [segments, setSegments] = useState<TimelineSegment[]>([]);
const [inserts, setInserts] = useState<InsertSegment[]>([]);
const [primarySourceStart, setPrimarySourceStart] = useState(0);
const [primarySourceEnd, setPrimarySourceEnd] = useState(0);
const prevKey = useRef("");
const restoredRef = useRef(false);
const [prevPrimaryId, setPrevPrimaryId] = useState(selectedMaterials[0]);
// Refs for stable callbacks (avoid recreating on every materials/duration change)
const materialsRef = useRef(materials);
const audioDurationRef = useRef(audioDuration);
useEffect(() => {
materialsRef.current = materials;
}, [materials]);
useEffect(() => {
audioDurationRef.current = audioDuration;
}, [audioDuration]);
// Refs for stable callbacks
const materialsRef = useRef(materials);
const audioDurationRef = useRef(audioDuration);
const selectedMaterialsRef = useRef(selectedMaterials);
// Build a durationsKey so segments re-init when material durations become available
const durationsKey = selectedMaterials
.map((id) => materials.find((m) => m.id === id)?.duration_sec ?? 0)
.join(",");
useEffect(() => { materialsRef.current = materials; }, [materials]);
useEffect(() => { audioDurationRef.current = audioDuration; }, [audioDuration]);
useEffect(() => { selectedMaterialsRef.current = selectedMaterials; }, [selectedMaterials]);
// Build a cache key from materials + duration
// Computed: primary material
const primaryMaterial = materials.find((m) => m.id === selectedMaterials[0]);
// Cache key
const cacheKey = `${selectedMaterials.join(",")}_${audioDuration.toFixed(1)}`;
const lsKey = storageKey ? `vigent_${storageKey}_timeline` : null;
const lsKey = storageKey ? `vigent_${storageKey}_multicam` : null;
const initSegments = useCallback(() => {
if (selectedMaterials.length === 0 || audioDuration <= 0) {
setSegments([]);
// Reset primary source range when primary material identity changes
// (React render-time state adjustment pattern for derived state)
if (selectedMaterials[0] !== prevPrimaryId) {
setPrevPrimaryId(selectedMaterials[0]);
setPrimarySourceStart(0);
setPrimarySourceEnd(0);
}
// Initialize / restore from localStorage
const initInserts = useCallback(() => {
if (selectedMaterials.length <= 1 || audioDuration <= 0) {
setInserts([]);
return;
}
@@ -132,27 +102,28 @@ export const useTimelineEditor = ({
try {
const raw = localStorage.getItem(lsKey);
if (raw) {
const saved = JSON.parse(raw) as { key: string; segments: SegmentSnapshot[] };
if (saved.key === cacheKey && saved.segments.length === selectedMaterials.length) {
const allMatch = saved.segments.every(
(s, i) => s.materialId === selectedMaterials[i] || saved.segments.some((ss) => ss.materialId === selectedMaterials[i])
);
if (allMatch) {
const restored: TimelineSegment[] = saved.segments.map((s, i) => {
const saved: MultiCamCache = JSON.parse(raw);
if (saved.key === cacheKey) {
// Validate all insert materialIds still exist
const existingIds = new Set(materials.map((m) => m.id));
const validInserts = saved.inserts.filter((s) => existingIds.has(s.materialId));
if (validInserts.length === saved.inserts.length) {
const restored: InsertSegment[] = validInserts.map((s, i) => {
const mat = materials.find((m) => m.id === s.materialId);
return {
id: `seg-${i}-${Date.now()}`,
id: `ins-${i}-${Date.now()}`,
materialId: s.materialId,
materialName: mat?.scene || mat?.name || s.materialId,
start: 0,
end: 0,
start: s.start,
end: s.end,
sourceStart: s.sourceStart,
sourceEnd: s.sourceEnd,
color: COLORS[i % COLORS.length],
};
});
setSegments(recalcPositions(restored, materials, audioDuration));
restoredRef.current = true;
setInserts(restored);
setPrimarySourceStart(saved.primarySourceStart || 0);
setPrimarySourceEnd(saved.primarySourceEnd || 0);
return;
}
}
@@ -162,95 +133,315 @@ export const useTimelineEditor = ({
}
}
// Create fresh segments — positions derived by recalcPositions
const newSegments: TimelineSegment[] = selectedMaterials.map((matId, i) => {
const mat = materials.find((m) => m.id === matId);
return {
id: `seg-${i}-${Date.now()}`,
materialId: matId,
materialName: mat?.scene || mat?.name || matId,
start: 0,
end: 0,
sourceStart: 0,
sourceEnd: 0,
color: COLORS[i % COLORS.length],
};
});
setSegments(recalcPositions(newSegments, materials, audioDuration));
// Start fresh
setInserts([]);
setPrimarySourceStart(0);
setPrimarySourceEnd(0);
}, [audioDuration, materials, selectedMaterials, lsKey, cacheKey]);
// Auto-init when selectedMaterials, audioDuration, or material durations change
// Auto-init when inputs change
useEffect(() => {
const durationsKey = selectedMaterials
.map((id) => materials.find((m) => m.id === id)?.duration_sec ?? 0)
.join(",");
const key = `${selectedMaterials.join(",")}_${audioDuration}_${durationsKey}`;
if (key !== prevKey.current) {
prevKey.current = key;
initSegments();
// eslint-disable-next-line react-hooks/set-state-in-effect -- initialization on input change
initInserts();
}
}, [selectedMaterials, audioDuration, durationsKey, initSegments]);
}, [selectedMaterials, audioDuration, materials, initInserts]);
// Persist segments to localStorage on change (debounced)
// Persist to localStorage (debounced)
useEffect(() => {
if (!lsKey || segments.length === 0) return;
if (!lsKey || selectedMaterials.length <= 1) return;
const timeout = setTimeout(() => {
const snapshots: SegmentSnapshot[] = segments.map((s) => ({
const snapshots: InsertSnapshot[] = inserts.map((s) => ({
materialId: s.materialId,
start: s.start,
end: s.end,
sourceStart: s.sourceStart,
sourceEnd: s.sourceEnd,
}));
localStorage.setItem(lsKey, JSON.stringify({ key: cacheKey, segments: snapshots }));
const cache: MultiCamCache = {
key: cacheKey,
inserts: snapshots,
primarySourceStart,
primarySourceEnd,
};
localStorage.setItem(lsKey, JSON.stringify(cache));
}, 300);
return () => clearTimeout(timeout);
}, [segments, lsKey, cacheKey]);
}, [inserts, primarySourceStart, primarySourceEnd, lsKey, cacheKey, selectedMaterials.length]);
const reorderSegments = useCallback(
(fromIdx: number, toIdx: number) => {
setSegments((prev) => {
if (fromIdx < 0 || toIdx < 0 || fromIdx >= prev.length || toIdx >= prev.length) return prev;
if (fromIdx === toIdx) return prev;
const next = [...prev];
// Move the segment: remove from old position, insert at new position
const [moved] = next.splice(fromIdx, 1);
next.splice(toIdx, 0, moved);
return recalcPositions(next, materialsRef.current, audioDurationRef.current);
});
},
[]
);
// Clean up inserts referencing removed materials
useEffect(() => {
const existingIds = new Set(selectedMaterials.slice(1));
// eslint-disable-next-line react-hooks/set-state-in-effect -- cleanup stale references
setInserts((prev) => {
const filtered = prev.filter((ins) => existingIds.has(ins.materialId));
return filtered.length !== prev.length ? filtered : prev;
});
}, [selectedMaterials]);
const setSourceRange = useCallback(
(id: string, sourceStart: number, sourceEnd: number) => {
setSegments((prev) => {
const updated = prev.map((s) => (s.id === id ? { ...s, sourceStart, sourceEnd } : s));
return recalcPositions(updated, materialsRef.current, audioDurationRef.current);
});
},
[]
);
// ── Operations ──
const addInsert = useCallback((materialId: string): AddInsertResult => {
const currentInserts = inserts;
const duration = audioDurationRef.current;
if (currentInserts.length >= MAX_INSERTS) return "limit";
if (duration <= 0) return "no_space";
const mat = materialsRef.current.find((m) => m.id === materialId);
const sorted = [...currentInserts].sort((a, b) => a.start - b.start);
// Find first gap that can fit DEFAULT_INSERT_DURATION
let bestStart = -1;
let prevEnd = 0;
for (const ins of sorted) {
if (ins.start - prevEnd >= DEFAULT_INSERT_DURATION + MIN_GAP) {
bestStart = prevEnd + MIN_GAP;
break;
}
prevEnd = ins.end;
}
// Check trailing gap
if (bestStart < 0 && duration - prevEnd >= DEFAULT_INSERT_DURATION + MIN_GAP) {
bestStart = prevEnd + MIN_GAP;
}
if (bestStart < 0) return "no_space";
const newInsert: InsertSegment = {
id: `ins-${Date.now()}-${Math.random().toString(36).slice(2, 6)}`,
materialId,
materialName: mat?.scene || mat?.name || materialId,
start: bestStart,
end: Math.min(bestStart + DEFAULT_INSERT_DURATION, duration),
sourceStart: 0,
sourceEnd: 0,
color: COLORS[currentInserts.length % COLORS.length],
};
setInserts((prev) => [...prev, newInsert]);
return "ok";
}, [inserts]);
const removeInsert = useCallback((id: string) => {
setInserts((prev) => prev.filter((ins) => ins.id !== id));
}, []);
const moveInsert = useCallback((id: string, newStart: number) => {
setInserts((prev) => {
const duration = audioDurationRef.current;
const target = prev.find((ins) => ins.id === id);
if (!target) return prev;
const len = target.end - target.start;
let clampedStart = Math.max(0, Math.min(newStart, duration - len));
let clampedEnd = clampedStart + len;
// Prevent overlap with other inserts
const others = prev.filter((ins) => ins.id !== id).sort((a, b) => a.start - b.start);
for (const other of others) {
if (clampedEnd > other.start && clampedStart < other.end) {
// Try pushing to right of blocker
const rightStart = other.end + 0.1;
if (rightStart + len <= duration) {
clampedStart = rightStart;
clampedEnd = clampedStart + len;
} else {
// Try pushing to left of blocker
const leftEnd = other.start - 0.1;
if (leftEnd - len >= 0) {
clampedEnd = leftEnd;
clampedStart = clampedEnd - len;
}
}
}
}
return prev.map((ins) =>
ins.id === id ? { ...ins, start: clampedStart, end: clampedEnd } : ins
);
});
}, []);
const resizeInsert = useCallback((id: string, newEnd: number) => {
setInserts((prev) => {
const duration = audioDurationRef.current;
const target = prev.find((ins) => ins.id === id);
if (!target) return prev;
const minLen = 0.5;
let clamped = Math.max(target.start + minLen, Math.min(newEnd, duration));
// Prevent overlap with next insert
const others = prev.filter((ins) => ins.id !== id).sort((a, b) => a.start - b.start);
for (const other of others) {
if (other.start > target.start && clamped > other.start - 0.1) {
clamped = other.start - 0.1;
}
}
return prev.map((ins) =>
ins.id === id ? { ...ins, end: Math.max(clamped, target.start + minLen) } : ins
);
});
}, []);
const setInsertSourceRange = useCallback((id: string, sourceStart: number, sourceEnd: number) => {
setInserts((prev) =>
prev.map((ins) => (ins.id === id ? { ...ins, sourceStart, sourceEnd } : ins))
);
}, []);
const setPrimarySourceRange = useCallback((sourceStart: number, sourceEnd: number) => {
setPrimarySourceStart(sourceStart);
setPrimarySourceEnd(sourceEnd);
}, []);
// ── Serialization ──
const toCustomAssignments = useCallback((): CustomAssignment[] => {
const mats = materialsRef.current;
const selMats = selectedMaterialsRef.current;
const duration = audioDurationRef.current;
return segments
.filter((seg) => seg.start < duration)
.map((seg) => {
const mat = materialsRef.current.find((m) => m.id === seg.materialId);
return {
material_path: mat?.path || seg.materialId,
start: seg.start,
end: seg.end,
source_start: seg.sourceStart,
source_end: seg.sourceEnd > seg.sourceStart ? seg.sourceEnd : undefined,
};
});
}, [segments]);
if (duration <= 0 || selMats.length === 0) return [];
const primaryMat = mats.find((m) => m.id === selMats[0]);
if (!primaryMat) return [];
const primaryPath = primaryMat.path;
const primaryDuration = primaryMat.duration_sec ?? 0;
// Single material mode: only emit assignment if user has set a source range
if (selMats.length === 1) {
if (primarySourceStart > 0 || primarySourceEnd > 0) {
return [{
material_path: primaryPath,
start: 0,
end: duration,
source_start: primarySourceStart,
source_end: primarySourceEnd > primarySourceStart ? primarySourceEnd : undefined,
}];
}
return [];
}
// Multi-camera mode: build assignments with gap splitting
return buildAssignments(
primaryPath,
primaryDuration,
primarySourceStart,
primarySourceEnd,
inserts,
duration,
mats,
);
}, [inserts, primarySourceStart, primarySourceEnd]);
return {
segments,
initSegments,
reorderSegments,
setSourceRange,
// State
inserts,
setInserts,
primaryMaterial,
primarySourceStart,
primarySourceEnd,
// Operations
addInsert,
removeInsert,
moveInsert,
resizeInsert,
setInsertSourceRange,
setPrimarySourceRange,
// Serialization
toCustomAssignments,
};
};
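
The placement rule in `addInsert` is a first-fit search: walk the existing inserts in start order and take the first gap that can hold the default duration plus a leading margin. The sketch below isolates that search as a pure function. `findInsertStart` and `Span` are hypothetical names, and the defaults mirror `DEFAULT_INSERT_DURATION` (3) and `MIN_GAP` (0.5) from this file.

```typescript
interface Span {
  start: number; // seconds on the timeline
  end: number;
}

// Hypothetical standalone version of the first-fit gap search.
function findInsertStart(
  existing: Span[],
  duration: number,
  insertDuration = 3, // mirrors DEFAULT_INSERT_DURATION
  minGap = 0.5, // mirrors MIN_GAP
): number {
  const sorted = [...existing].sort((a, b) => a.start - b.start);
  let prevEnd = 0;
  for (const span of sorted) {
    // Gap between the previous insert (or timeline start) and this one.
    if (span.start - prevEnd >= insertDuration + minGap) return prevEnd + minGap;
    prevEnd = span.end;
  }
  // Trailing gap after the last insert.
  if (duration - prevEnd >= insertDuration + minGap) return prevEnd + minGap;
  return -1; // no gap is large enough
}

console.log(findInsertStart([], 10));
console.log(findInsertStart([{ start: 0.5, end: 3.5 }], 10));
```

A return value of `-1` corresponds to the hook's `"no_space"` result; the `"limit"` check on `MAX_INSERTS` happens before the search and is not modeled here.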
// ── buildAssignments: gap-filling + boundary-splitting ──
function buildAssignments(
primaryPath: string,
primaryDuration: number,
pSourceStart: number,
pSourceEnd: number,
inserts: InsertSegment[],
audioDuration: number,
materials: Material[],
): CustomAssignment[] {
const assignments: CustomAssignment[] = [];
const sorted = [...inserts].sort((a, b) => a.start - b.start);
// Primary material effective play range
const clipStart = pSourceStart;
const clipEnd = pSourceEnd > pSourceStart ? pSourceEnd : primaryDuration;
const effective = clipEnd - clipStart;
let cursor = 0;
let primaryAccum = 0;
function addPrimaryGap(gapStart: number, gapEnd: number) {
if (gapEnd - gapStart < 0.05) return;
// No valid effective range: single segment from 0 (graceful degradation)
if (effective <= 0) {
assignments.push({
material_path: primaryPath,
start: gapStart,
end: gapEnd,
source_start: 0,
});
return;
}
let remaining = gapEnd - gapStart;
let segStart = gapStart;
const EPSILON = 0.01;
while (remaining > 0.05) {
const posInClip = primaryAccum % effective;
const sourceStart = clipStart + posInClip;
const availableInClip = effective - posInClip;
const segDuration = Math.min(remaining, availableInClip);
if (segDuration < EPSILON) break;
assignments.push({
material_path: primaryPath,
start: segStart,
end: segStart + segDuration,
source_start: sourceStart,
source_end: pSourceEnd > pSourceStart ? pSourceEnd : undefined,
});
primaryAccum += segDuration;
segStart += segDuration;
remaining -= segDuration;
}
}
for (const insert of sorted) {
// Primary gap before this insert
addPrimaryGap(cursor, insert.start);
// Insert segment
const mat = materials.find((m) => m.id === insert.materialId);
assignments.push({
material_path: mat?.path || insert.materialId,
start: insert.start,
end: insert.end,
source_start: insert.sourceStart,
source_end: insert.sourceEnd > insert.sourceStart ? insert.sourceEnd : undefined,
});
cursor = insert.end;
}
// Trailing primary gap
addPrimaryGap(cursor, audioDuration);
return assignments;
}
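
Review note: the gap-filling loop in `addPrimaryGap` can be easier to follow in isolation. The following is a minimal sketch, not the code from this commit. `Piece` and `fillGapWithLoop` are hypothetical names, and the running "already played" offset is passed in and returned rather than kept in a closure like `primaryAccum`. It only illustrates how a gap is split whenever the looping primary clip reaches the end of its effective range.

```typescript
// Hypothetical, simplified shape; the real CustomAssignment also carries material_path and source_end.
interface Piece {
  start: number;       // position on the output timeline (seconds)
  end: number;
  sourceStart: number; // where to start reading in the source clip (seconds)
}

// Fill [gapStart, gapEnd) with a clip that loops over [clipStart, clipStart + clipLen).
// `played` is how much of the clip has already been consumed by earlier gaps.
function fillGapWithLoop(
  gapStart: number,
  gapEnd: number,
  clipStart: number,
  clipLen: number,
  played: number,
): { pieces: Piece[]; played: number } {
  const pieces: Piece[] = [];
  if (gapEnd - gapStart < 0.05) return { pieces, played };
  // No usable range: fall back to a single piece read from the start of the source.
  if (clipLen <= 0) {
    pieces.push({ start: gapStart, end: gapEnd, sourceStart: 0 });
    return { pieces, played };
  }
  let cursor = gapStart;
  let remaining = gapEnd - gapStart;
  while (remaining > 0.05) {
    const posInClip = played % clipLen;                    // where the loop currently stands
    const take = Math.min(remaining, clipLen - posInClip); // stop at the loop boundary
    if (take < 0.01) break;
    pieces.push({ start: cursor, end: cursor + take, sourceStart: clipStart + posInClip });
    played += take;
    cursor += take;
    remaining -= take;
  }
  return { pieces, played };
}

// A 4 s effective range (source 2 s to 6 s), one insert occupying 3 s to 5 s of a 10 s timeline.
const first = fillGapWithLoop(0, 3, 2, 4, 0);
const second = fillGapWithLoop(5, 10, 2, 4, first.played);
```

In this example the second gap is split in two: one second finishes the current pass through the clip, then the loop restarts at the clip's start for the remaining four seconds.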


@@ -1,5 +1,6 @@
import type { RefObject, MouseEvent } from "react";
import { RefreshCw, Play, Pause } from "lucide-react";
import { type RefObject, type MouseEvent, useCallback, useMemo, useState } from "react";
import { RefreshCw, Play, Pause, ChevronDown, Check, Search } from "lucide-react";
import { SelectPopover } from "@/shared/ui/SelectPopover";
interface BgmItem {
id: string;
@@ -18,8 +19,6 @@ interface BgmPanelProps {
onSelectBgm: (id: string) => void;
playingBgmId: string | null;
onTogglePreview: (bgm: BgmItem, event: MouseEvent) => void;
bgmVolume: number;
onVolumeChange: (value: number) => void;
bgmListContainerRef: RefObject<HTMLDivElement | null>;
registerBgmItemRef: (id: string, element: HTMLDivElement | null) => void;
}
@@ -35,11 +34,31 @@ export function BgmPanel({
onSelectBgm,
playingBgmId,
onTogglePreview,
bgmVolume,
onVolumeChange,
bgmListContainerRef,
registerBgmItemRef,
}: BgmPanelProps) {
const [bgmFilter, setBgmFilter] = useState("");
const selectedBgm = bgmList.find((item) => item.id === selectedBgmId) || null;
const canSelectBgm = enableBgm && !bgmLoading && !bgmError && bgmList.length > 0;
const filteredBgmList = useMemo(() => {
const query = bgmFilter.trim().toLowerCase();
if (!query) return bgmList;
return bgmList.filter((bgm) => bgm.name.toLowerCase().includes(query));
}, [bgmFilter, bgmList]);
const handleOpenBgmPopover = useCallback(() => {
setBgmFilter("");
requestAnimationFrame(() => {
requestAnimationFrame(() => {
const container = bgmListContainerRef.current;
if (!container) return;
const selectedRow = container.querySelector<HTMLElement>("[data-bgm-selected='true']");
selectedRow?.scrollIntoView({ block: "nearest", behavior: "auto" });
});
});
}, [bgmListContainerRef]);
return (
<div className="bg-white/5 rounded-2xl p-6 border border-white/10 backdrop-blur-sm">
<div className="flex items-center justify-between mb-4">
@@ -79,57 +98,108 @@ export function BgmPanel({
) : bgmList.length === 0 ? (
<div className="text-center py-4 text-gray-500 text-sm"></div>
) : (
<div
ref={bgmListContainerRef}
className={`space-y-2 max-h-64 overflow-y-auto hide-scrollbar ${enableBgm ? '' : 'opacity-70'}`}
>
{bgmList.map((bgm) => (
<div
key={bgm.id}
ref={(el) => registerBgmItemRef(bgm.id, el)}
className={`p-3 rounded-lg border transition-all flex items-center justify-between group ${selectedBgmId === bgm.id
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<button onClick={() => onSelectBgm(bgm.id)} className="flex-1 text-left">
<div className="text-white text-sm truncate">{bgm.name}</div>
<div className="text-xs text-gray-400">.{bgm.ext || 'audio'}</div>
<div className={!enableBgm ? "opacity-70" : ""}>
<p className="mb-2 text-xs text-gray-400"></p>
<SelectPopover
sheetTitle="选择背景音乐"
disabled={!canSelectBgm}
onOpen={handleOpenBgmPopover}
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
disabled={!canSelectBgm}
className={`w-full rounded-xl border px-3 py-2.5 text-left transition-colors ${canSelectBgm
? "border-white/10 bg-black/25 hover:border-white/30"
: "border-white/10 bg-black/20 text-gray-500 cursor-not-allowed"
}`}
>
<span className="flex items-center justify-between gap-3">
<span className="min-w-0">
<span className="block truncate text-sm text-white">
{selectedBgm?.name || "请选择背景音乐"}
</span>
<span className="mt-0.5 block text-xs text-gray-400">
{selectedBgm ? `.${selectedBgm.ext || "audio"}` : "未选择"}
</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
<div className="flex items-center gap-2 pl-2">
<button
onClick={(e) => onTogglePreview(bgm, e)}
className="p-1 text-gray-500 hover:text-purple-400 transition-colors"
title="试听"
>
{playingBgmId === bgm.id ? (
<Pause className="h-4 w-4" />
) : (
<Play className="h-4 w-4" />
)}
</button>
{selectedBgmId === bgm.id && (
<span className="text-xs text-purple-300"></span>
)}
>
{({ close }) => (
<div className="space-y-2">
<div className="rounded-lg border border-white/10 bg-black/30 px-3 py-2">
<div className="flex items-center gap-2">
<Search className="h-4 w-4 text-gray-400" />
<input
type="text"
value={bgmFilter}
onChange={(e) => setBgmFilter(e.target.value)}
placeholder="搜索背景音乐..."
className="w-full bg-transparent text-sm text-white placeholder-gray-500 outline-none"
/>
</div>
</div>
{filteredBgmList.length === 0 ? (
<div className="py-6 text-center text-sm text-gray-400"></div>
) : (
<div
ref={bgmListContainerRef}
className="space-y-1"
style={{ contentVisibility: "auto" }}
>
{filteredBgmList.map((bgm) => {
const isSelected = selectedBgmId === bgm.id;
return (
<div
key={bgm.id}
ref={(el) => registerBgmItemRef(bgm.id, el)}
data-popover-selected={isSelected ? "true" : undefined}
data-bgm-selected={isSelected ? "true" : "false"}
className={`flex items-center justify-between gap-2 rounded-lg border px-3 py-2 transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<button
type="button"
onClick={() => {
onSelectBgm(bgm.id);
close();
}}
className="min-w-0 flex-1 text-left"
>
<span className="block truncate text-sm text-white">{bgm.name}</span>
<span className="mt-0.5 block text-xs text-gray-400">.{bgm.ext || "audio"}</span>
</button>
<div className="flex items-center gap-2 pl-2">
<button
type="button"
onClick={(e) => onTogglePreview(bgm, e)}
className="p-1 text-gray-400 hover:text-purple-300 transition-colors"
title="试听"
>
{playingBgmId === bgm.id ? (
<Pause className="h-4 w-4" />
) : (
<Play className="h-4 w-4" />
)}
</button>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</div>
</div>
);
})}
</div>
)}
</div>
</div>
))}
</div>
)}
{enableBgm && (
<div className="mt-4">
<label className="text-sm text-gray-300 mb-2 block"></label>
<input
type="range"
min="0"
max="1"
step="0.05"
value={bgmVolume}
onChange={(e) => onVolumeChange(parseFloat(e.target.value))}
className="w-full accent-purple-500"
/>
<div className="text-xs text-gray-400 mt-1">: {Math.round(bgmVolume * 100)}%</div>
)}
</SelectPopover>
</div>
)}
</div>
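
Review note: the search filter in `BgmPanel` has the same shape as the ones added to the generated-audio and history lists, so it could be expressed as one pure helper. A minimal sketch follows. `NamedItem`, `filterByName`, and the sample data are illustrative assumptions, not code from this change.

```typescript
// Hypothetical minimal item shape; BgmItem and GeneratedAudio both expose `id` and `name`.
interface NamedItem {
  id: string;
  name: string;
}

// Case-insensitive substring match on `name`; an empty or whitespace-only query returns the list unchanged.
function filterByName<T extends NamedItem>(items: readonly T[], rawQuery: string): readonly T[] {
  const query = rawQuery.trim().toLowerCase();
  if (!query) return items;
  return items.filter((item) => item.name.toLowerCase().includes(query));
}

// Made-up sample data for illustration only.
const sampleBgm: readonly NamedItem[] = [
  { id: "a", name: "Morning Piano" },
  { id: "b", name: "Night Drive" },
  { id: "c", name: "piano loop" },
];

const pianoOnly = filterByName(sampleBgm, "  PIANO ");
```

Returning the original array for an empty query keeps the reference stable, which matters when the result feeds a `useMemo` dependency or a memoized child.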


@@ -1,6 +1,7 @@
import { useCallback, useEffect, useRef, useState } from "react";
import { X, Play, Pause } from "lucide-react";
import type { TimelineSegment } from "@/features/home/model/useTimelineEditor";
import { useCallback, useEffect, useRef, useState } from "react";
import { Play, Pause } from "lucide-react";
import type { TimelineSegment } from "@/features/home/model/useTimelineEditor";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
interface ClipTrimmerProps {
isOpen: boolean;
@@ -153,21 +154,18 @@ export function ClipTrimmer({
const endPct = duration > 0 ? (effectiveEnd / duration) * 100 : 100;
const playheadPct = duration > 0 ? (currentTime / duration) * 100 : 0;
return (
<div className="fixed inset-0 z-50 flex items-center justify-center bg-black/60 backdrop-blur-sm" onClick={onClose}>
<div
className="bg-gray-900 border border-white/10 rounded-2xl w-full max-w-lg mx-4 overflow-hidden"
onClick={(e) => e.stopPropagation()}
>
{/* Header */}
<div className="flex items-center justify-between px-5 py-3 border-b border-white/10">
<h3 className="text-white font-semibold text-sm">
- {segment.materialName}
</h3>
<button onClick={onClose} className="text-gray-400 hover:text-white">
<X className="h-4 w-4" />
</button>
</div>
return (
<AppModal
isOpen={isOpen}
onClose={onClose}
panelClassName="w-full max-w-lg mx-4 rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden"
closeOnOverlay
>
<AppModalHeader
title={`截取设置 - ${segment.materialName}`}
subtitle="拖拽起止点,精确控制素材片段"
onClose={onClose}
/>
{/* Video preview */}
<div className="px-5 pt-4">
@@ -287,7 +285,6 @@ export function ClipTrimmer({
</button>
</div>
</div>
</div>
);
}
</AppModal>
);
}
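
Review note: `endPct` and `playheadPct` in `ClipTrimmer` repeat the same guarded division. A small helper could capture it, as sketched below. `toPercent` is a hypothetical name, and the clamping to 0..100 is an added assumption that the code in this hunk does not perform.

```typescript
// Convert a time offset into a percentage of the clip duration.
// When the duration is unknown (<= 0), return the caller-supplied fallback instead of dividing by zero.
function toPercent(value: number, duration: number, fallback: number): number {
  if (duration <= 0) return fallback;
  const pct = (value / duration) * 100;
  // Assumption: clamp so that a stale currentTime cannot push a handle outside the track.
  return Math.min(100, Math.max(0, pct));
}

const playhead = toPercent(5, 20, 0);    // 5 s into a 20 s clip
const endUnknown = toPercent(5, 0, 100); // duration not loaded yet: fall back to full width
const overshoot = toPercent(30, 20, 0);  // clamped to the end of the track
```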


@@ -1,7 +1,14 @@
import { Rocket } from "lucide-react";
import { Rocket, ChevronDown, Check } from "lucide-react";
import { SelectPopover } from "@/shared/ui/SelectPopover";
type LipsyncModelMode = "default" | "fast" | "advanced";
const MODEL_OPTIONS: Array<{ value: LipsyncModelMode; label: string; desc: string }> = [
{ value: "default", label: "默认模型", desc: "按时长智能路由" },
{ value: "fast", label: "快速模型", desc: "速度优先" },
{ value: "advanced", label: "高级模型", desc: "质量优先" },
];
interface GenerateActionBarProps {
isGenerating: boolean;
progress: number;
@@ -21,6 +28,8 @@ export function GenerateActionBar({
onModelModeChange,
onGenerate,
}: GenerateActionBarProps) {
const currentModel = MODEL_OPTIONS.find((opt) => opt.value === modelMode) || MODEL_OPTIONS[0];
return (
<div>
<div className="flex items-center gap-2">
@@ -60,17 +69,56 @@ export function GenerateActionBar({
)}
</button>
<select
value={modelMode}
onChange={(e) => onModelModeChange(e.target.value as LipsyncModelMode)}
<SelectPopover
sheetTitle="选择唇形模型"
disabled={isGenerating}
className="h-[58px] rounded-xl border border-white/15 bg-black/30 px-3 text-sm text-gray-200 outline-none focus:border-purple-400"
title="选择唇形模型"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
disabled={isGenerating}
className="h-[58px] min-w-[152px] rounded-xl border border-white/15 bg-black/30 px-3 text-left text-sm text-gray-200 transition-colors hover:border-white/30 disabled:cursor-not-allowed disabled:opacity-50"
title="选择唇形模型"
>
<span className="flex items-center justify-between gap-2">
<span className="min-w-0">
<span className="block truncate text-sm text-white">{currentModel.label}</span>
<span className="mt-0.5 block text-xs text-gray-400">{currentModel.desc}</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
<option value="default"></option>
<option value="fast"></option>
<option value="advanced"></option>
</select>
{({ close }) => (
<div className="space-y-1">
{MODEL_OPTIONS.map((opt) => {
const isSelected = opt.value === modelMode;
return (
<button
key={opt.value}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onModelModeChange(opt.value);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<span>
<span className="block text-sm text-white">{opt.label}</span>
<span className="mt-0.5 block text-xs text-gray-400">{opt.desc}</span>
</span>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
</div>
{!isGenerating && materialCount >= 2 && (
<p className="text-xs text-gray-400 text-center mt-1.5">


@@ -1,6 +1,7 @@
import { useState, useRef, useCallback, useEffect } from "react";
import { Play, Pause, Pencil, Trash2, Check, X, RefreshCw, Mic, ChevronDown } from "lucide-react";
import type { GeneratedAudio } from "@/features/home/model/useGeneratedAudios";
import { useState, useRef, useCallback, useEffect, useMemo } from "react";
import { Play, Pause, Pencil, Trash2, Check, X, RefreshCw, Mic, ChevronDown, Search } from "lucide-react";
import type { GeneratedAudio } from "@/features/home/model/useGeneratedAudios";
import { SelectPopover } from "@/shared/ui/SelectPopover";
interface AudioTask {
status: string;
@@ -47,14 +48,12 @@ export function GeneratedAudiosPanel({
onEmotionChange,
embedded = false,
}: GeneratedAudiosPanelProps) {
const [editingId, setEditingId] = useState<string | null>(null);
const [editName, setEditName] = useState("");
const [playingId, setPlayingId] = useState<string | null>(null);
const [speedOpen, setSpeedOpen] = useState(false);
const [emotionOpen, setEmotionOpen] = useState(false);
const audioRef = useRef<HTMLAudioElement | null>(null);
const speedRef = useRef<HTMLDivElement>(null);
const emotionRef = useRef<HTMLDivElement>(null);
const [editingId, setEditingId] = useState<string | null>(null);
const [editName, setEditName] = useState("");
const [playingId, setPlayingId] = useState<string | null>(null);
const [audioFilter, setAudioFilter] = useState("");
const audioRef = useRef<HTMLAudioElement | null>(null);
const audioListContainerRef = useRef<HTMLDivElement | null>(null);
const stopPlaying = useCallback(() => {
if (audioRef.current) {
@@ -75,28 +74,6 @@ export function GeneratedAudiosPanel({
};
}, []);
// Close speed dropdown on click outside
useEffect(() => {
const handler = (e: MouseEvent) => {
if (speedRef.current && !speedRef.current.contains(e.target as Node)) {
setSpeedOpen(false);
}
};
if (speedOpen) document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, [speedOpen]);
// Close emotion dropdown on click outside
useEffect(() => {
const handler = (e: MouseEvent) => {
if (emotionRef.current && !emotionRef.current.contains(e.target as Node)) {
setEmotionOpen(false);
}
};
if (emotionOpen) document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, [emotionOpen]);
const togglePlay = (audio: GeneratedAudio, e: React.MouseEvent) => {
e.stopPropagation();
if (playingId === audio.id) {
@@ -148,7 +125,26 @@ export function GeneratedAudiosPanel({
{ value: "sad", label: "低沉" },
{ value: "angry", label: "严肃" },
] as const;
const currentEmotionLabel = emotionOptions.find((o) => o.value === emotion)?.label ?? "正常";
const currentEmotionLabel = emotionOptions.find((o) => o.value === emotion)?.label ?? "正常";
const selectedAudio = generatedAudios.find((audio) => audio.id === selectedAudioId) || null;
const filteredAudios = useMemo(() => {
const query = audioFilter.trim().toLowerCase();
if (!query) return generatedAudios;
return generatedAudios.filter((audio) => audio.name.toLowerCase().includes(query));
}, [audioFilter, generatedAudios]);
const handleOpenAudioPopover = useCallback(() => {
setAudioFilter("");
requestAnimationFrame(() => {
requestAnimationFrame(() => {
const container = audioListContainerRef.current;
if (!container) return;
const selectedRow = container.querySelector<HTMLElement>("[data-audio-selected='true']");
selectedRow?.scrollIntoView({ block: "nearest", behavior: "auto" });
});
});
}, []);
const content = (
<>
@@ -156,62 +152,88 @@ export function GeneratedAudiosPanel({
<>
{/* Row 1: emotion + speed + generate voice-over (right-aligned) */}
<div className="flex justify-end items-center gap-1.5 mb-3">
{ttsMode === "voiceclone" && (
<div ref={emotionRef} className="relative">
<button
onClick={() => setEmotionOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentEmotionLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${emotionOpen ? "rotate-180" : ""}`} />
</button>
{emotionOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{emotionOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onEmotionChange(opt.value); setEmotionOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
emotion === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
{ttsMode === "voiceclone" && (
<div ref={speedRef} className="relative">
<button
onClick={() => setSpeedOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentSpeedLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${speedOpen ? "rotate-180" : ""}`} />
</button>
{speedOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{speedOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onSpeedChange(opt.value); setSpeedOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
speed === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
{ttsMode === "voiceclone" && (
<SelectPopover
sheetTitle="选择语气"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="rounded-lg border border-white/10 bg-black/25 px-2.5 py-1.5 text-xs text-gray-200 whitespace-nowrap flex items-center gap-1 transition-colors hover:border-white/30"
>
: {currentEmotionLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${open ? "rotate-180" : ""}`} />
</button>
)}
>
{({ close }) => (
<div className="space-y-1">
{emotionOptions.map((opt) => {
const isSelected = emotion === opt.value;
return (
<button
key={opt.value}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onEmotionChange(opt.value);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left text-xs transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20 text-purple-200"
: "border-white/10 bg-white/5 text-gray-300 hover:border-white/30"
}`}
>
{opt.label}
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
)}
{ttsMode === "voiceclone" && (
<SelectPopover
sheetTitle="选择语速"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="rounded-lg border border-white/10 bg-black/25 px-2.5 py-1.5 text-xs text-gray-200 whitespace-nowrap flex items-center gap-1 transition-colors hover:border-white/30"
>
: {currentSpeedLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${open ? "rotate-180" : ""}`} />
</button>
)}
>
{({ close }) => (
<div className="space-y-1">
{speedOptions.map((opt) => {
const isSelected = speed === opt.value;
return (
<button
key={opt.value}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onSpeedChange(opt.value);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left text-xs transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20 text-purple-200"
: "border-white/10 bg-white/5 text-gray-300 hover:border-white/30"
}`}
>
{opt.label}
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
)}
<button
onClick={onGenerateAudio}
disabled={isGeneratingAudio || !canGenerate}
@@ -245,62 +267,88 @@ export function GeneratedAudiosPanel({
</h2>
<div className="flex gap-1.5">
{ttsMode === "voiceclone" && (
<div ref={emotionRef} className="relative">
<button
onClick={() => setEmotionOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentEmotionLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${emotionOpen ? "rotate-180" : ""}`} />
</button>
{emotionOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{emotionOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onEmotionChange(opt.value); setEmotionOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
emotion === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
{ttsMode === "voiceclone" && (
<div ref={speedRef} className="relative">
<button
onClick={() => setSpeedOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
>
: {currentSpeedLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${speedOpen ? "rotate-180" : ""}`} />
</button>
{speedOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[80px]">
{speedOptions.map((opt) => (
<button
key={opt.value}
onClick={() => { onSpeedChange(opt.value); setSpeedOpen(false); }}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
speed === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
</div>
)}
{ttsMode === "voiceclone" && (
<SelectPopover
sheetTitle="选择语气"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="rounded-lg border border-white/10 bg-black/25 px-2.5 py-1.5 text-xs text-gray-200 whitespace-nowrap flex items-center gap-1 transition-colors hover:border-white/30"
>
: {currentEmotionLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${open ? "rotate-180" : ""}`} />
</button>
)}
>
{({ close }) => (
<div className="space-y-1">
{emotionOptions.map((opt) => {
const isSelected = emotion === opt.value;
return (
<button
key={opt.value}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onEmotionChange(opt.value);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left text-xs transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20 text-purple-200"
: "border-white/10 bg-white/5 text-gray-300 hover:border-white/30"
}`}
>
{opt.label}
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
)}
{ttsMode === "voiceclone" && (
<SelectPopover
sheetTitle="选择语速"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="rounded-lg border border-white/10 bg-black/25 px-2.5 py-1.5 text-xs text-gray-200 whitespace-nowrap flex items-center gap-1 transition-colors hover:border-white/30"
>
: {currentSpeedLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${open ? "rotate-180" : ""}`} />
</button>
)}
>
{({ close }) => (
<div className="space-y-1">
{speedOptions.map((opt) => {
const isSelected = speed === opt.value;
return (
<button
key={opt.value}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onSpeedChange(opt.value);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left text-xs transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20 text-purple-200"
: "border-white/10 bg-white/5 text-gray-300 hover:border-white/30"
}`}
>
{opt.label}
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
)}
<button
onClick={onGenerateAudio}
disabled={isGeneratingAudio || !canGenerate}
@@ -349,87 +397,142 @@ export function GeneratedAudiosPanel({
)}
{/* Generated voice-over list */}
{generatedAudios.length === 0 ? (
<div className="text-center py-6 text-gray-400">
<p className="text-sm"></p>
<p className="text-xs mt-1 text-gray-500"></p>
</div>
) : (
<div className="space-y-2 max-h-48 sm:max-h-56 overflow-y-auto hide-scrollbar">
{generatedAudios.map((audio) => {
const isSelected = selectedAudioId === audio.id;
return (
<div
key={audio.id}
onClick={() => onSelectAudio(audio)}
className={`p-3 rounded-lg border transition-all cursor-pointer flex items-center justify-between group ${
isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{editingId === audio.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
value={editName}
onChange={(e) => setEditName(e.target.value)}
className="flex-1 bg-black/40 border border-white/20 rounded-md px-2 py-1 text-xs text-white"
autoFocus
onKeyDown={(e) => {
if (e.key === "Enter") saveEditing(audio.id, e as unknown as React.MouseEvent);
if (e.key === "Escape") cancelEditing(e as unknown as React.MouseEvent);
}}
/>
<button onClick={(e) => saveEditing(audio.id, e)} className="p-1 text-green-400 hover:text-green-300" title="保存">
<Check className="h-4 w-4" />
</button>
<button onClick={cancelEditing} className="p-1 text-gray-400 hover:text-white" title="取消">
<X className="h-4 w-4" />
</button>
</div>
) : (
<>
<div className="min-w-0 flex-1">
<div className="text-white text-sm truncate">{audio.name}</div>
<div className="text-gray-400 text-xs">{audio.duration_sec.toFixed(1)}s</div>
</div>
<div className="flex items-center gap-1 pl-2 opacity-40 group-hover:opacity-100 transition-opacity">
<button
onClick={(e) => togglePlay(audio, e)}
className="p-1 text-gray-500 hover:text-purple-400 transition-colors"
title={playingId === audio.id ? "暂停" : "播放"}
>
{playingId === audio.id ? (
<Pause className="h-3.5 w-3.5" />
) : (
<Play className="h-3.5 w-3.5" />
)}
</button>
<button
onClick={(e) => startEditing(audio, e)}
className="p-1 text-gray-500 hover:text-white transition-colors"
title="重命名"
>
<Pencil className="h-3.5 w-3.5" />
</button>
<button
onClick={(e) => {
e.stopPropagation();
onDeleteAudio(audio.id);
}}
className="p-1 text-gray-500 hover:text-red-400 transition-colors"
title="删除"
>
<Trash2 className="h-3.5 w-3.5" />
</button>
</div>
</>
)}
</div>
);
})}
</div>
)}
{generatedAudios.length === 0 ? (
<div className="text-center py-6 text-gray-400">
<p className="text-sm"></p>
<p className="text-xs mt-1 text-gray-500"></p>
</div>
) : (
<SelectPopover
sheetTitle="选择配音"
onOpen={handleOpenAudioPopover}
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-xl border border-white/10 bg-black/25 px-3 py-2.5 text-left transition-colors hover:border-white/30"
>
<span className="flex items-center justify-between gap-3">
<span className="min-w-0">
<span className="block text-xs text-gray-400"></span>
<span className="mt-0.5 block truncate text-sm text-white">
{selectedAudio ? selectedAudio.name : "请选择配音"}
</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{({ close }) => (
<div className="space-y-2">
<div className="rounded-lg border border-white/10 bg-black/30 px-3 py-2">
<div className="flex items-center gap-2">
<Search className="h-4 w-4 text-gray-400" />
<input
type="text"
value={audioFilter}
onChange={(e) => setAudioFilter(e.target.value)}
placeholder="搜索配音..."
className="w-full bg-transparent text-sm text-white placeholder-gray-500 outline-none"
/>
</div>
</div>
{filteredAudios.length === 0 ? (
<div className="py-6 text-center text-sm text-gray-400"></div>
) : (
<div ref={audioListContainerRef} className="space-y-1" style={{ contentVisibility: "auto" }}>
{filteredAudios.map((audio) => {
const isSelected = selectedAudioId === audio.id;
return (
<div
key={audio.id}
data-popover-selected={isSelected ? "true" : undefined}
data-audio-selected={isSelected ? "true" : "false"}
className={`flex items-center justify-between gap-2 rounded-lg border px-3 py-2 transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{editingId === audio.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
value={editName}
onChange={(e) => setEditName(e.target.value)}
className="flex-1 bg-black/40 border border-white/20 rounded-md px-2 py-1 text-xs text-white"
autoFocus
onKeyDown={(e) => {
if (e.key === "Enter") saveEditing(audio.id, e as unknown as React.MouseEvent);
if (e.key === "Escape") cancelEditing(e as unknown as React.MouseEvent);
}}
/>
<button type="button" onClick={(e) => saveEditing(audio.id, e)} className="p-1 text-green-400 hover:text-green-300" title="保存">
<Check className="h-4 w-4" />
</button>
<button type="button" onClick={cancelEditing} className="p-1 text-gray-400 hover:text-white" title="取消">
<X className="h-4 w-4" />
</button>
</div>
) : (
<button
type="button"
onClick={() => {
onSelectAudio(audio);
close();
}}
className="min-w-0 flex-1 text-left"
>
<span className="block truncate text-sm text-white">{audio.name}</span>
<span className="mt-0.5 block text-xs text-gray-400">{audio.duration_sec.toFixed(1)}s</span>
</button>
)}
{editingId !== audio.id && (
<div className="flex items-center gap-1 pl-2">
<button
type="button"
onClick={(e) => togglePlay(audio, e)}
className="p-1 text-gray-400 hover:text-purple-300 transition-colors"
title={playingId === audio.id ? "暂停" : "播放"}
>
{playingId === audio.id ? (
<Pause className="h-3.5 w-3.5" />
) : (
<Play className="h-3.5 w-3.5" />
)}
</button>
<button
type="button"
onClick={(e) => startEditing(audio, e)}
className="p-1 text-gray-400 hover:text-white transition-colors"
title="重命名"
>
<Pencil className="h-3.5 w-3.5" />
</button>
<button
type="button"
onClick={(e) => {
e.stopPropagation();
onDeleteAudio(audio.id);
}}
className="p-1 text-gray-400 hover:text-red-400 transition-colors"
title="删除"
>
<Trash2 className="h-3.5 w-3.5" />
</button>
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</div>
)}
</div>
);
})}
</div>
)}
</div>
)}
</SelectPopover>
)}
</>
);
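
Review note: the `onOpen` handlers in these panels nest two `requestAnimationFrame` calls before scrolling the selected row into view, presumably so the popover content has been mounted and laid out first. The sketch below isolates that pattern with an injectable scheduler so it can be exercised outside a browser. `FrameScheduler`, `afterTwoFrames`, and the fake queue are hypothetical names for illustration. In the component, the scheduler would be `requestAnimationFrame` and the task would be the `scrollIntoView` call.

```typescript
// Anything that runs a callback "on the next frame"; in a browser this would wrap requestAnimationFrame.
type FrameScheduler = (cb: () => void) => void;

// Run `task` only after two scheduled frames have elapsed.
function afterTwoFrames(schedule: FrameScheduler, task: () => void): void {
  schedule(() => {
    schedule(task);
  });
}

// A fake frame queue for demonstration: callbacks run only when flushFrame() is called.
const frameQueue: Array<() => void> = [];
const fakeSchedule: FrameScheduler = (cb) => {
  frameQueue.push(cb);
};
function flushFrame(): void {
  const pending = frameQueue.splice(0, frameQueue.length);
  for (const cb of pending) cb();
}

const scrollLog: string[] = [];
afterTwoFrames(fakeSchedule, () => {
  scrollLog.push("scrolled");
});

flushFrame();
const countAfterFirstFrame = scrollLog.length;
flushFrame();
const countAfterSecondFrame = scrollLog.length;
```

Since the same handler body appears in `BgmPanel`, `GeneratedAudiosPanel`, and `HistoryList` with only the data attribute changing, a shared helper of this kind might reduce the duplication.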


@@ -1,4 +1,6 @@
import { RefreshCw, Trash2 } from "lucide-react";
import { useCallback, useMemo, useRef, useState } from "react";
import { RefreshCw, Trash2, Search, ChevronDown, Check } from "lucide-react";
import { SelectPopover } from "@/shared/ui/SelectPopover";
interface GeneratedVideo {
id: string;
@@ -29,6 +31,29 @@ export function HistoryList({
formatDate,
embedded = false,
}: HistoryListProps) {
const [videoFilter, setVideoFilter] = useState("");
const videoListContainerRef = useRef<HTMLDivElement | null>(null);
const selectedVideo = generatedVideos.find((v) => v.id === selectedVideoId) || null;
const filteredVideos = useMemo(() => {
const query = videoFilter.trim().toLowerCase();
if (!query) return generatedVideos;
return generatedVideos.filter((v) => formatDate(v.created_at).toLowerCase().includes(query));
}, [generatedVideos, videoFilter, formatDate]);
const handleOpenVideoPopover = useCallback(() => {
setVideoFilter("");
requestAnimationFrame(() => {
requestAnimationFrame(() => {
const container = videoListContainerRef.current;
if (!container) return;
const selectedRow = container.querySelector<HTMLElement>("[data-video-selected='true']");
selectedRow?.scrollIntoView({ block: "nearest", behavior: "auto" });
});
});
}, []);
const content = (
<>
{!embedded && (
@@ -48,36 +73,98 @@ export function HistoryList({
<p></p>
</div>
) : (
<div
className="space-y-2 max-h-64 overflow-y-auto hide-scrollbar"
style={{ contentVisibility: 'auto' }}
>
{generatedVideos.map((v) => (
<div
key={v.id}
ref={(el) => registerVideoRef(v.id, el)}
className={`p-3 rounded-lg border transition-all flex items-center justify-between group ${selectedVideoId === v.id
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
<SelectPopover
sheetTitle="选择作品"
onOpen={handleOpenVideoPopover}
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-xl border border-white/10 bg-black/25 px-3 py-2.5 text-left transition-colors hover:border-white/30"
>
<button onClick={() => onSelectVideo(v)} className="flex-1 text-left">
<div className="text-white text-sm truncate">{formatDate(v.created_at)}</div>
<div className="text-gray-400 text-xs">{v.size_mb.toFixed(1)} MB</div>
</button>
<button
onClick={(e) => {
e.stopPropagation();
onDeleteVideo(v.id);
}}
className="p-1 text-gray-500 hover:text-red-400 opacity-40 group-hover:opacity-100 transition-opacity"
title="删除视频"
>
<Trash2 className="h-4 w-4" />
</button>
<span className="flex items-center justify-between gap-3">
<span className="min-w-0">
<span className="block text-xs text-gray-400"></span>
<span className="mt-0.5 block truncate text-sm text-white">
{selectedVideo ? formatDate(selectedVideo.created_at) : "请选择作品"}
</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{({ close }) => (
<div className="space-y-2">
<div className="rounded-lg border border-white/10 bg-black/30 px-3 py-2">
<div className="flex items-center gap-2">
<Search className="h-4 w-4 text-gray-400" />
<input
type="text"
value={videoFilter}
onChange={(e) => setVideoFilter(e.target.value)}
placeholder="搜索作品..."
className="w-full bg-transparent text-sm text-white placeholder-gray-500 outline-none"
/>
</div>
</div>
{filteredVideos.length === 0 ? (
<div className="py-6 text-center text-sm text-gray-400"></div>
) : (
<div
ref={videoListContainerRef}
className="space-y-1"
style={{ contentVisibility: "auto" }}
>
{filteredVideos.map((v) => {
const isSelected = selectedVideoId === v.id;
return (
<div
key={v.id}
ref={(el) => registerVideoRef(v.id, el)}
data-popover-selected={isSelected ? "true" : undefined}
data-video-selected={isSelected ? "true" : "false"}
className={`flex items-center justify-between gap-2 rounded-lg border px-3 py-2 transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<button
type="button"
onClick={() => {
onSelectVideo(v);
close();
}}
className="min-w-0 flex-1 text-left"
>
<span className="block truncate text-sm text-white">{formatDate(v.created_at)}</span>
<span className="mt-0.5 block text-xs text-gray-400">{v.size_mb.toFixed(1)} MB</span>
</button>
<div className="flex items-center gap-2 pl-2">
<button
type="button"
onClick={(e) => {
e.stopPropagation();
onDeleteVideo(v.id);
}}
className="p-1 text-gray-400 hover:text-red-400"
title="删除视频"
>
<Trash2 className="h-4 w-4" />
</button>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</div>
</div>
);
})}
</div>
)}
</div>
))}
</div>
)}
</SelectPopover>
)}
</>
);
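
Review note: the popover search in this hunk lowercases a trimmed query and matches it against the formatted date label. The same rule can be isolated as a pure helper, sketched below. `VideoLike` and `filterByLabel` are hypothetical names used only for illustration, and `created_at` is assumed here to be an already formatted string (the component passes it through `formatDate` first).

```typescript
// Hypothetical minimal shape; the real GeneratedVideo interface has more fields.
interface VideoLike {
  id: string;
  created_at: string;
}

// Case-insensitive substring match on a label derived from each item.
// A blank query returns the input array unchanged (same reference).
function filterByLabel<T>(items: T[], query: string, label: (item: T) => string): T[] {
  const q = query.trim().toLowerCase();
  if (!q) return items;
  return items.filter((item) => label(item).toLowerCase().includes(q));
}

const videos: VideoLike[] = [
  { id: "a", created_at: "2026-03-04 14:07" },
  { id: "b", created_at: "2026-03-10 10:59" },
];

// Surrounding whitespace in the query is ignored.
const hits = filterByLabel(videos, " 03-10 ", (v) => v.created_at);
```

Returning the original array for a blank query keeps the list reference stable, which avoids needless re-renders when a memoized list is passed down.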


@@ -5,9 +5,11 @@ import { useRouter } from "next/navigation";
import { RefreshCw } from "lucide-react";
import VideoPreviewModal from "@/components/VideoPreviewModal";
import ScriptExtractionModal from "./ScriptExtractionModal";
import ScriptLearningModal from "./ScriptLearningModal";
import RewriteModal from "./RewriteModal";
import { useHomeController } from "@/features/home/model/useHomeController";
import { resolveMediaUrl } from "@/shared/lib/media";
import { toast } from "sonner";
import { BgmPanel } from "@/features/home/ui/BgmPanel";
import { GenerateActionBar } from "@/features/home/ui/GenerateActionBar";
import { HistoryList } from "@/features/home/ui/HistoryList";
@@ -53,6 +55,8 @@ export function HomePage() {
setText,
extractModalOpen,
setExtractModalOpen,
learningModalOpen,
setLearningModalOpen,
rewriteModalOpen,
setRewriteModalOpen,
handleGenerateMeta,
@@ -132,6 +136,7 @@ export function HomePage() {
startRecording,
stopRecording,
useRecording,
discardRecording,
formatRecordingTime,
bgmList,
bgmLoading,
@@ -143,8 +148,6 @@ export function HomePage() {
setSelectedBgmId,
playingBgmId,
toggleBgmPreview,
bgmVolume,
setBgmVolume,
bgmListContainerRef,
registerBgmItemRef,
currentTask,
@@ -172,9 +175,19 @@ export function HomePage() {
setSpeed,
emotion,
setEmotion,
timelineSegments,
reorderSegments,
setSourceRange,
// Multi-camera timeline
inserts,
timelinePrimaryMaterial,
primarySourceStart,
primarySourceEnd,
insertCandidates,
addInsert,
removeInsert,
moveInsert,
resizeInsert,
setInsertSourceRange,
setPrimarySourceRange,
handleSetPrimary,
clipTrimmerOpen,
setClipTrimmerOpen,
clipTrimmerSegmentId,
@@ -199,10 +212,27 @@ export function HomePage() {
return () => clearTimeout(timer);
}, []);
const clipTrimmerSegment = useMemo(
() => timelineSegments.find((s) => s.id === clipTrimmerSegmentId) ?? null,
[timelineSegments, clipTrimmerSegmentId]
);
// ClipTrimmer: construct segment from either primary or an insert
const clipTrimmerSegment = useMemo(() => {
if (!clipTrimmerSegmentId) return null;
// Check if it's the primary material
if (clipTrimmerSegmentId === "primary" && timelinePrimaryMaterial) {
return {
id: "primary",
materialId: timelinePrimaryMaterial.id,
materialName: timelinePrimaryMaterial.scene || timelinePrimaryMaterial.name || "",
start: 0,
end: selectedAudio?.duration_sec ?? 0,
sourceStart: primarySourceStart,
sourceEnd: primarySourceEnd,
color: "#8b5cf6",
};
}
// Check inserts
const insert = inserts.find((i) => i.id === clipTrimmerSegmentId);
if (insert) return insert;
return null;
}, [clipTrimmerSegmentId, timelinePrimaryMaterial, inserts, selectedAudio, primarySourceStart, primarySourceEnd]);
const clipTrimmerMaterialUrl = useMemo(() => {
if (!clipTrimmerSegment) return null;
@@ -223,9 +253,8 @@ export function HomePage() {
text={text}
onChangeText={setText}
onOpenExtractModal={() => setExtractModalOpen(true)}
onOpenLearningModal={() => setLearningModalOpen(true)}
onOpenRewriteModal={() => setRewriteModalOpen(true)}
onGenerateMeta={handleGenerateMeta}
isGeneratingMeta={isGeneratingMeta}
onTranslate={handleTranslate}
isTranslating={isTranslating}
hasOriginalText={originalText !== null}
@@ -237,7 +266,7 @@ export function HomePage() {
/>
{/* 2. Voice-over */}
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
<div className="relative z-20 bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
<h2 className="text-base sm:text-lg font-semibold text-white mb-4">
</h2>
@@ -276,6 +305,7 @@ export function HomePage() {
onStartRecording={startRecording}
onStopRecording={stopRecording}
onUseRecording={useRecording}
onDiscardRecording={discardRecording}
formatRecordingTime={formatRecordingTime}
/>
)}
@@ -331,6 +361,7 @@ export function HomePage() {
onDeleteMaterial={deleteMaterial}
onClearUploadError={() => setUploadError(null)}
registerMaterialRef={registerMaterialRef}
onSetPrimary={handleSetPrimary}
/>
<div className="border-t border-white/10 my-4" />
<div className="relative">
@@ -345,15 +376,28 @@ export function HomePage() {
embedded
audioDuration={selectedAudio?.duration_sec ?? 0}
audioUrl={selectedAudio ? (resolveMediaUrl(selectedAudio.path) || "") : ""}
segments={timelineSegments}
materials={materials}
outputAspectRatio={outputAspectRatio}
onOutputAspectRatioChange={setOutputAspectRatio}
onReorderSegment={reorderSegments}
onClickSegment={(seg) => {
setClipTrimmerSegmentId(seg.id);
primaryMaterial={timelinePrimaryMaterial}
inserts={inserts}
insertCandidates={insertCandidates}
onAddInsert={(materialId) => {
const result = addInsert(materialId);
if (result === "limit") toast.error("最多添加 10 个插入片段");
else if (result === "no_space") toast.error("时间轴空间不足,无法再添加插入");
}}
onRemoveInsert={removeInsert}
onMoveInsert={moveInsert}
onClickInsert={(insert) => {
setClipTrimmerSegmentId(insert.id);
setClipTrimmerOpen(true);
}}
onClickPrimary={() => {
setClipTrimmerSegmentId("primary");
setClipTrimmerOpen(true);
}}
primarySourceStart={primarySourceStart}
primarySourceEnd={primarySourceEnd}
outputAspectRatio={outputAspectRatio}
onOutputAspectRatioChange={setOutputAspectRatio}
/>
</div>
</div>
@@ -362,6 +406,9 @@ export function HomePage() {
<TitleSubtitlePanel
showStylePreview={showStylePreview}
onTogglePreview={() => setShowStylePreview((prev) => !prev)}
onGenerateMeta={handleGenerateMeta}
isGeneratingMeta={isGeneratingMeta}
canGenerateMeta={!!text.trim()}
videoTitle={videoTitle}
onTitleChange={titleInput.handleChange}
onTitleCompositionStart={titleInput.handleCompositionStart}
@@ -421,8 +468,6 @@ export function HomePage() {
onSelectBgm={setSelectedBgmId}
playingBgmId={playingBgmId}
onTogglePreview={toggleBgmPreview}
bgmVolume={bgmVolume}
onVolumeChange={setBgmVolume}
bgmListContainerRef={bgmListContainerRef}
registerBgmItemRef={registerBgmItemRef}
/>
@@ -490,6 +535,7 @@ export function HomePage() {
currentTask={null}
isGenerating={false}
generatedVideo={generatedVideo}
generatedVideoId={selectedVideoId}
/>
</div>
</div>
@@ -514,13 +560,28 @@ export function HomePage() {
onApply={(newText) => setText(newText)}
/>
<ScriptLearningModal
isOpen={learningModalOpen}
onClose={() => setLearningModalOpen(false)}
onApply={(nextText) => setText(nextText)}
/>
<ClipTrimmer
isOpen={clipTrimmerOpen}
segment={clipTrimmerSegment}
materialUrl={clipTrimmerMaterialUrl}
onConfirm={(sourceStart, sourceEnd) => {
if (clipTrimmerSegmentId) {
setSourceRange(clipTrimmerSegmentId, sourceStart, sourceEnd);
if (clipTrimmerSegmentId === "primary") {
setPrimarySourceRange(sourceStart, sourceEnd);
} else if (clipTrimmerSegmentId) {
setInsertSourceRange(clipTrimmerSegmentId, sourceStart, sourceEnd);
// Sync timeline duration to match trimmed source duration
if (sourceEnd > sourceStart) {
const ins = inserts.find((i) => i.id === clipTrimmerSegmentId);
if (ins) {
resizeInsert(clipTrimmerSegmentId, ins.start + (sourceEnd - sourceStart));
}
}
}
setClipTrimmerOpen(false);
}}
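
Review note: after an insert is trimmed, the `onConfirm` handler above resizes it so that its timeline length equals the trimmed source length, and it skips the resize when the range is empty or inverted. A minimal sketch of that arithmetic follows. `InsertLike` and `endAfterTrim` are hypothetical names, and it is an assumption that the second argument of `resizeInsert` is the new end time on the timeline.

```typescript
// Hypothetical minimal shape of a timeline insert (times in seconds).
interface InsertLike {
  id: string;
  start: number;
  end: number;
}

// New timeline end for an insert after its source range is trimmed.
// An empty or inverted source range leaves the current end unchanged,
// mirroring the `sourceEnd > sourceStart` guard in the handler.
function endAfterTrim(insert: InsertLike, sourceStart: number, sourceEnd: number): number {
  if (sourceEnd <= sourceStart) return insert.end;
  return insert.start + (sourceEnd - sourceStart);
}

const sample: InsertLike = { id: "x", start: 2, end: 5 };
const trimmedEnd = endAfterTrim(sample, 1, 2.5); // 1.5 s of source placed at t = 2
const unchangedEnd = endAfterTrim(sample, 3, 3); // empty range, no resize
```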


@@ -1,6 +1,7 @@
import { type ChangeEvent, type MouseEvent, useMemo } from "react";
import { Upload, RefreshCw, Eye, Trash2, X, Pencil, Check } from "lucide-react";
import { type ChangeEvent, type MouseEvent, useCallback, useMemo, useRef, useState } from "react";
import { Upload, RefreshCw, Eye, Trash2, X, Pencil, Check, Search, ChevronDown, Crown } from "lucide-react";
import type { Material } from "@/shared/types/material";
import { SelectPopover } from "@/shared/ui/SelectPopover";
interface MaterialSelectorProps {
materials: Material[];
@@ -25,6 +26,7 @@ interface MaterialSelectorProps {
onDeleteMaterial: (id: string) => void;
onClearUploadError: () => void;
registerMaterialRef: (id: string, element: HTMLDivElement | null) => void;
onSetPrimary?: (materialId: string) => void;
embedded?: boolean;
}
@@ -51,10 +53,49 @@ export function MaterialSelector({
onDeleteMaterial,
onClearUploadError,
registerMaterialRef,
onSetPrimary,
embedded = false,
}: MaterialSelectorProps) {
const [materialFilter, setMaterialFilter] = useState("");
const materialListContainerRef = useRef<HTMLDivElement | null>(null);
const selectedSet = useMemo(() => new Set(selectedMaterials), [selectedMaterials]);
const isFull = selectedMaterials.length >= 4;
const selectedMaterialItems = useMemo(
() => selectedMaterials.map((id) => materials.find((m) => m.id === id)).filter((m): m is Material => Boolean(m)),
[materials, selectedMaterials],
);
const filteredMaterials = useMemo(() => {
const query = materialFilter.trim().toLowerCase();
if (!query) return materials;
return materials.filter((m) => (m.scene || m.name).toLowerCase().includes(query));
}, [materialFilter, materials]);
const selectedSummary = useMemo(() => {
if (selectedMaterialItems.length === 0) {
return "请选择素材最多4个";
}
const names = selectedMaterialItems
.slice(0, 2)
.map((m) => m.scene || m.name)
.join("、");
if (selectedMaterialItems.length > 2) {
return `${names} +${selectedMaterialItems.length - 2}`;
}
return names;
}, [selectedMaterialItems]);
const handleOpenMaterialPopover = useCallback(() => {
setMaterialFilter("");
requestAnimationFrame(() => {
requestAnimationFrame(() => {
const container = materialListContainerRef.current;
if (!container) return;
const selectedRow = container.querySelector<HTMLElement>("[data-material-selected='true']");
selectedRow?.scrollIntoView({ block: "nearest", behavior: "auto" });
});
});
}, []);
const content = (
<>
@@ -151,100 +192,167 @@ export function MaterialSelector({
</p>
</div>
) : (
<div
className="space-y-2 max-h-48 sm:max-h-64 overflow-y-auto hide-scrollbar"
style={{ contentVisibility: 'auto' }}
<SelectPopover
sheetTitle="选择视频素材"
onOpen={handleOpenMaterialPopover}
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-xl border border-white/10 bg-black/25 px-3 py-2.5 text-left transition-colors hover:border-white/30"
>
<span className="flex items-center justify-between gap-3">
<span className="min-w-0">
<span className="block text-xs text-gray-400"> {selectedMaterials.length}/4 </span>
<span className="mt-0.5 block truncate text-sm text-white">{selectedSummary}</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{materials.map((m) => {
const isSelected = selectedSet.has(m.id);
return (
<div
key={m.id}
ref={(el) => registerMaterialRef(m.id, el)}
className={`p-3 rounded-lg border transition-all flex items-center justify-between group ${isSelected
? "border-purple-500 bg-purple-500/20"
: isFull
? "border-white/5 bg-white/[0.02] opacity-50 cursor-not-allowed"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{editingMaterialId === m.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
value={editMaterialName}
onChange={(e) => onEditNameChange(e.target.value)}
className="flex-1 bg-black/40 border border-white/20 rounded-md px-2 py-1 text-xs text-white"
autoFocus
/>
<button
onClick={(e) => onSaveEditing(m.id, e)}
className="p-1 text-green-400 hover:text-green-300"
title="保存"
>
<Check className="h-4 w-4" />
</button>
<button
onClick={onCancelEditing}
className="p-1 text-gray-400 hover:text-white"
title="取消"
>
<X className="h-4 w-4" />
</button>
</div>
) : (
<button onClick={() => onToggleMaterial(m.id)} disabled={isFull && !isSelected} className="flex-1 text-left flex items-center gap-2">
{/* Checkbox */}
<span
className={`flex-shrink-0 w-4 h-4 rounded border flex items-center justify-center text-[10px] ${isSelected
? "border-purple-500 bg-purple-500 text-white"
: "border-white/30 text-transparent"
}`}
>
{isSelected ? "✓" : ""}
</span>
<div className="min-w-0">
<div className="text-white text-sm truncate">{m.scene || m.name}</div>
<div className="text-gray-400 text-xs">{m.size_mb.toFixed(1)} MB</div>
</div>
</button>
)}
<div className="flex items-center gap-2 pl-2">
<button
onClick={(e) => {
e.stopPropagation();
if (m.path) {
onPreviewMaterial(m.path);
}
}}
className="p-1 text-gray-500 hover:text-white opacity-40 group-hover:opacity-100 transition-opacity"
title="预览视频"
>
<Eye className="h-4 w-4" />
</button>
{editingMaterialId !== m.id && (
<button
onClick={(e) => onStartEditing(m, e)}
className="p-1 text-gray-500 hover:text-white opacity-40 group-hover:opacity-100 transition-opacity"
title="重命名"
>
<Pencil className="h-4 w-4" />
</button>
)}
<button
onClick={(e) => {
e.stopPropagation();
onDeleteMaterial(m.id);
}}
className="p-1 text-gray-500 hover:text-red-400 opacity-40 group-hover:opacity-100 transition-opacity"
title="删除素材"
>
<Trash2 className="h-4 w-4" />
</button>
{() => (
<div className="space-y-2">
<div className="rounded-lg border border-white/10 bg-black/30 px-3 py-2">
<div className="flex items-center gap-2">
<Search className="h-4 w-4 text-gray-400" />
<input
type="text"
value={materialFilter}
onChange={(e) => setMaterialFilter(e.target.value)}
placeholder="搜索素材名称..."
className="w-full bg-transparent text-sm text-white placeholder-gray-500 outline-none"
/>
</div>
</div>
);
})}
</div>
{filteredMaterials.length === 0 ? (
<div className="py-6 text-center text-sm text-gray-400"></div>
) : (
<div
ref={materialListContainerRef}
className="space-y-1"
style={{ contentVisibility: "auto" }}
>
{filteredMaterials.map((m) => {
const isSelected = selectedSet.has(m.id);
return (
<div
key={m.id}
ref={(el) => registerMaterialRef(m.id, el)}
data-popover-selected={isSelected ? "true" : undefined}
data-material-selected={isSelected ? "true" : "false"}
className={`flex items-center justify-between gap-2 rounded-lg border px-3 py-2 transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: isFull
? "border-white/5 bg-white/[0.02] opacity-50"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{editingMaterialId === m.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
value={editMaterialName}
onChange={(e) => onEditNameChange(e.target.value)}
className="flex-1 rounded-md border border-white/20 bg-black/40 px-2 py-1 text-xs text-white"
autoFocus
/>
<button
type="button"
onClick={(e) => onSaveEditing(m.id, e)}
className="p-1 text-green-400 hover:text-green-300"
title="保存"
>
<Check className="h-4 w-4" />
</button>
<button
type="button"
onClick={onCancelEditing}
className="p-1 text-gray-400 hover:text-white"
title="取消"
>
<X className="h-4 w-4" />
</button>
</div>
) : (
<button
type="button"
onClick={() => onToggleMaterial(m.id)}
disabled={isFull && !isSelected}
className="min-w-0 flex-1 text-left"
>
<span className="flex items-center gap-1.5">
<span className="block truncate text-sm text-white">{m.scene || m.name}</span>
{isSelected && selectedMaterials[0] === m.id && selectedMaterials.length > 1 && (
<span className="shrink-0 text-[9px] px-1 py-0.5 rounded bg-purple-500/30 text-purple-300 border border-purple-500/40"></span>
)}
{isSelected && selectedMaterials[0] !== m.id && (
<span className="shrink-0 text-[9px] px-1 py-0.5 rounded bg-white/10 text-gray-400 border border-white/10"></span>
)}
</span>
<span className="mt-0.5 block text-xs text-gray-400">{m.size_mb.toFixed(1)} MB</span>
</button>
)}
<div className="flex items-center gap-2 pl-2">
{isSelected && selectedMaterials[0] !== m.id && onSetPrimary && (
<button
type="button"
onClick={(e) => {
e.stopPropagation();
onSetPrimary(m.id);
}}
className="p-1 text-gray-400 hover:text-amber-300"
title="设为主素材"
>
<Crown className="h-4 w-4" />
</button>
)}
<button
type="button"
onClick={(e) => {
e.stopPropagation();
if (m.path) {
onPreviewMaterial(m.path);
}
}}
className="p-1 text-gray-400 hover:text-purple-300"
title="预览视频"
>
<Eye className="h-4 w-4" />
</button>
{editingMaterialId !== m.id && (
<button
type="button"
onClick={(e) => onStartEditing(m, e)}
className="p-1 text-gray-400 hover:text-white"
title="重命名"
>
<Pencil className="h-4 w-4" />
</button>
)}
<button
type="button"
onClick={(e) => {
e.stopPropagation();
onDeleteMaterial(m.id);
}}
className="p-1 text-gray-400 hover:text-red-400"
title="删除素材"
>
<Trash2 className="h-4 w-4" />
</button>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</div>
</div>
);
})}
</div>
)}
</div>
)}
</SelectPopover>
)}
</>
);
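
Review note: the trigger label built by `selectedSummary` shows the first two selected names and collapses the rest into a `+N` suffix. The rule can be sketched as a pure function, assuming the names are already resolved to strings. `summarizeNames` and its `visible` parameter are hypothetical and not part of this change.

```typescript
// Builds a short label from the selected names: the first `visible` names
// joined with "、", plus a "+N" suffix when more are selected.
function summarizeNames(names: string[], placeholder: string, visible = 2): string {
  if (names.length === 0) return placeholder;
  const head = names.slice(0, visible).join("、");
  return names.length > visible ? `${head} +${names.length - visible}` : head;
}

const emptyLabel = summarizeNames([], "none selected");
const shortLabel = summarizeNames(["intro"], "none selected");
const longLabel = summarizeNames(["intro", "office", "street", "outro"], "none selected");
```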


@@ -12,6 +12,7 @@ interface PreviewPanelProps {
currentTask: Task | null;
isGenerating: boolean;
generatedVideo: string | null;
generatedVideoId?: string | null;
embedded?: boolean;
}
@@ -19,8 +20,13 @@ export function PreviewPanel({
currentTask,
isGenerating,
generatedVideo,
generatedVideoId = null,
embedded = false,
}: PreviewPanelProps) {
const downloadHref = generatedVideoId
? `/api/videos/generated/${encodeURIComponent(generatedVideoId)}/download`
: generatedVideo;
const content = (
<>
{currentTask && isGenerating && (
@@ -51,10 +57,10 @@ export function PreviewPanel({
)}
</div>
{generatedVideo && (
{generatedVideo && downloadHref && (
<>
<a
href={generatedVideo}
href={downloadHref}
download
className="mt-4 w-full py-3 rounded-xl bg-green-600 hover:bg-green-700 text-white font-medium flex items-center justify-center gap-2 transition-colors"
>
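
Review note: `downloadHref` prefers the backend download route whenever a video id is known, and otherwise falls back to the direct media URL. A minimal sketch of that choice is below. `resolveDownloadHref` is a hypothetical helper name; the route string is the one used in this hunk.

```typescript
// Prefer the attachment-style download route when an id is available;
// otherwise fall back to the direct URL (which may be null).
function resolveDownloadHref(
  videoId: string | null | undefined,
  fallbackUrl: string | null,
): string | null {
  return videoId
    ? `/api/videos/generated/${encodeURIComponent(videoId)}/download`
    : fallbackUrl;
}

const viaRoute = resolveDownloadHref("a/b", null);
const viaFallback = resolveDownloadHref(null, "https://example.invalid/out.mp4");
```

Encoding the id keeps characters such as `/` from being read as extra path segments.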


@@ -1,6 +1,8 @@
import { useEffect, useState } from "react";
import type { MouseEvent } from "react";
import { Upload, RefreshCw, Play, Pause, Pencil, Trash2, Check, X, Mic, Square, RotateCw } from "lucide-react";
import { useCallback, useEffect, useMemo, useRef, useState } from "react";
import type { ChangeEvent, MouseEvent } from "react";
import { Upload, RefreshCw, Play, Pause, Pencil, Trash2, Check, X, Mic, Square, RotateCw, Search, ChevronDown } from "lucide-react";
import { SelectPopover } from "@/shared/ui/SelectPopover";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
interface RefAudio {
id: string;
@@ -36,7 +38,8 @@ interface RefAudioPanelProps {
recordingTime: number;
onStartRecording: () => void;
onStopRecording: () => void;
onUseRecording: () => void;
onUseRecording: () => void | Promise<void>;
onDiscardRecording: () => void;
formatRecordingTime: (seconds: number) => string;
}
@@ -68,9 +71,26 @@ export function RefAudioPanel({
onStartRecording,
onStopRecording,
onUseRecording,
onDiscardRecording,
formatRecordingTime,
}: RefAudioPanelProps) {
const [recordedUrl, setRecordedUrl] = useState<string | null>(null);
const [refAudioFilter, setRefAudioFilter] = useState("");
const [recordingModalOpen, setRecordingModalOpen] = useState(false);
const [recordedPreviewPlaying, setRecordedPreviewPlaying] = useState(false);
const [recordedPreviewCurrentTime, setRecordedPreviewCurrentTime] = useState(0);
const [recordedPreviewDuration, setRecordedPreviewDuration] = useState(0);
const refAudioListContainerRef = useRef<HTMLDivElement | null>(null);
const recordedAudioRef = useRef<HTMLAudioElement | null>(null);
const stopRecordedPreview = useCallback(() => {
const player = recordedAudioRef.current;
if (!player) return;
player.pause();
player.currentTime = 0;
setRecordedPreviewPlaying(false);
setRecordedPreviewCurrentTime(0);
}, []);
useEffect(() => {
if (!recordedBlob) {
@@ -88,45 +108,95 @@ export function RefAudioPanel({
const needsRetranscribe = (audio: RefAudio) =>
audio.ref_text.startsWith(OLD_FIXED_REF_TEXT);
const selectedRefAudioLabel = selectedRefAudio?.name || "请选择参考音频";
const filteredRefAudios = useMemo(() => {
const query = refAudioFilter.trim().toLowerCase();
if (!query) return refAudios;
return refAudios.filter((audio) => audio.name.toLowerCase().includes(query));
}, [refAudioFilter, refAudios]);
const handleOpenRefAudioPopover = useCallback(() => {
setRefAudioFilter("");
requestAnimationFrame(() => {
requestAnimationFrame(() => {
const container = refAudioListContainerRef.current;
if (!container) return;
const selectedRow = container.querySelector<HTMLElement>("[data-ref-selected='true']");
selectedRow?.scrollIntoView({ block: "nearest", behavior: "auto" });
});
});
}, []);
const closeRecordingModal = () => {
stopRecordedPreview();
if (isRecording) {
onStopRecording();
}
setRecordingModalOpen(false);
};
const handleUseRecordingAndClose = () => {
stopRecordedPreview();
setRecordingModalOpen(false);
void onUseRecording();
};
const handleToggleRecordedPreview = () => {
const player = recordedAudioRef.current;
if (!player) return;
if (player.paused) {
player.play().catch(() => {
setRecordedPreviewPlaying(false);
});
return;
}
player.pause();
};
const handleRecordedSeek = (event: ChangeEvent<HTMLInputElement>) => {
const player = recordedAudioRef.current;
if (!player) return;
const nextTime = Number(event.target.value);
player.currentTime = Number.isFinite(nextTime) ? nextTime : 0;
setRecordedPreviewCurrentTime(Number.isFinite(nextTime) ? nextTime : 0);
};
const totalRecordedPreviewTime =
Number.isFinite(recordedPreviewDuration) && recordedPreviewDuration > 0
? recordedPreviewDuration
: recordingTime;
return (
<div className="space-y-4">
<div>
<div className="flex justify-between items-center mb-2">
<span className="text-sm text-gray-300">📁 <span className="text-xs text-gray-500 font-normal">(3-10)</span></span>
<div className="flex gap-2">
<input
type="file"
id="ref-audio-upload"
accept=".wav,.mp3,.m4a,.webm,.ogg,.flac,.aac"
onChange={(e) => {
const file = e.target.files?.[0];
if (file) {
onUploadRefAudio(file);
}
e.target.value = '';
}}
className="hidden"
/>
<label
htmlFor="ref-audio-upload"
className={`px-2 py-1 text-xs rounded cursor-pointer transition-all flex items-center gap-1 ${isUploadingRef
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-purple-600 hover:bg-purple-700 text-white"
}`}
>
<Upload className="h-3.5 w-3.5" />
</label>
<button
onClick={onFetchRefAudios}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 flex items-center gap-1"
>
<RefreshCw className="h-3.5 w-3.5" />
</button>
</div>
<button
onClick={onFetchRefAudios}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 flex items-center gap-1"
>
<RefreshCw className="h-3.5 w-3.5" />
</button>
</div>
<input
type="file"
id="ref-audio-upload"
accept=".wav,.mp3,.m4a,.webm,.ogg,.flac,.aac"
onChange={(e) => {
const file = e.target.files?.[0];
if (file) {
onUploadRefAudio(file);
}
e.target.value = "";
}}
className="hidden"
/>
{isUploadingRef && (
<div className="mb-2 p-2 bg-purple-500/10 rounded text-sm text-purple-300">
...
@@ -147,146 +217,316 @@ export function RefAudioPanel({
</div>
) : (
<div className="grid grid-cols-2 gap-2" style={{ contentVisibility: 'auto' }}>
{refAudios.map((audio) => (
<div
key={audio.id}
className={`p-2 rounded-lg border transition-all relative group cursor-pointer ${selectedRefAudio?.id === audio.id
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
onClick={() => {
if (editingAudioId !== audio.id) {
onSelectRefAudio(audio);
}
}}
<SelectPopover
sheetTitle="选择参考音频"
onOpen={handleOpenRefAudioPopover}
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-xl border border-white/10 bg-black/25 px-3 py-2.5 text-left transition-colors hover:border-white/30"
>
{editingAudioId === audio.id ? (
<div className="flex items-center gap-1" onClick={(e) => e.stopPropagation()}>
<span className="flex items-center justify-between gap-3">
<span className="min-w-0">
<span className="block text-xs text-gray-400"></span>
<span className="mt-0.5 block truncate text-sm text-white">{selectedRefAudioLabel}</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{({ close }) => (
<div className="space-y-2">
<div className="rounded-lg border border-white/10 bg-black/30 px-3 py-2">
<div className="flex items-center gap-2">
<Search className="h-4 w-4 text-gray-400" />
<input
type="text"
value={editName}
onChange={(e) => onEditNameChange(e.target.value)}
className="w-full bg-black/50 text-white text-xs px-1 py-0.5 rounded border border-purple-500 focus:outline-none"
autoFocus
onKeyDown={(e) => {
if (e.key === 'Enter') onSaveEditing(audio.id, e as unknown as MouseEvent);
if (e.key === 'Escape') onCancelEditing(e as unknown as MouseEvent);
}}
value={refAudioFilter}
onChange={(e) => setRefAudioFilter(e.target.value)}
placeholder="搜索参考音频..."
className="w-full bg-transparent text-sm text-white placeholder-gray-500 outline-none"
/>
<button onClick={(e) => onSaveEditing(audio.id, e)} className="text-green-400 hover:text-green-300 text-xs">
<Check className="h-3 w-3" />
</button>
<button onClick={(e) => onCancelEditing(e)} className="text-gray-400 hover:text-gray-300 text-xs">
<X className="h-3 w-3" />
</button>
</div>
</div>
{filteredRefAudios.length === 0 ? (
<div className="py-6 text-center text-sm text-gray-400"></div>
) : (
<>
<div className="flex justify-between items-start mb-1">
<div className="text-white text-xs truncate pr-1 flex-1" title={audio.name}>
{audio.name}
</div>
<div className="flex gap-1 opacity-40 group-hover:opacity-100 transition-opacity">
<button
onClick={(e) => onTogglePlayPreview(audio, e)}
className="text-gray-400 hover:text-purple-400 text-xs"
title="试听"
<div ref={refAudioListContainerRef} className="space-y-1" style={{ contentVisibility: "auto" }}>
{filteredRefAudios.map((audio) => {
const isSelected = selectedRefAudio?.id === audio.id;
return (
<div
key={audio.id}
data-popover-selected={isSelected ? "true" : undefined}
data-ref-selected={isSelected ? "true" : "false"}
className={`flex items-center justify-between gap-2 rounded-lg border px-3 py-2 transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
{playingAudioId === audio.id ? (
<Pause className="h-3.5 w-3.5" />
{editingAudioId === audio.id ? (
<div className="flex-1 flex items-center gap-2" onClick={(e) => e.stopPropagation()}>
<input
type="text"
value={editName}
onChange={(e) => onEditNameChange(e.target.value)}
className="w-full rounded border border-purple-500 bg-black/50 px-2 py-1 text-xs text-white focus:outline-none"
autoFocus
onKeyDown={(e) => {
if (e.key === "Enter") onSaveEditing(audio.id, e as unknown as MouseEvent);
if (e.key === "Escape") onCancelEditing(e as unknown as MouseEvent);
}}
/>
<button type="button" onClick={(e) => onSaveEditing(audio.id, e)} className="text-green-400 hover:text-green-300">
<Check className="h-3.5 w-3.5" />
</button>
<button type="button" onClick={(e) => onCancelEditing(e)} className="text-gray-400 hover:text-gray-300">
<X className="h-3.5 w-3.5" />
</button>
</div>
) : (
<Play className="h-3.5 w-3.5" />
<button
type="button"
onClick={() => {
onSelectRefAudio(audio);
close();
}}
className="min-w-0 flex-1 text-left"
>
<span className="block truncate text-sm text-white" title={audio.name}>{audio.name}</span>
<span className="mt-0.5 block text-xs text-gray-400">
{audio.duration_sec.toFixed(1)}s
{needsRetranscribe(audio) && (
<span className="ml-1 text-yellow-500" title="需要重新识别文字"></span>
)}
</span>
</button>
)}
</button>
<button
onClick={(e) => {
e.stopPropagation();
onRetranscribe(audio.id);
}}
disabled={retranscribingId === audio.id}
className="text-gray-400 hover:text-cyan-400 text-xs disabled:opacity-50"
title="重新识别文字"
>
<RotateCw className={`h-3.5 w-3.5 ${retranscribingId === audio.id ? 'animate-spin' : ''}`} />
</button>
<button
onClick={(e) => onStartEditing(audio, e)}
className="text-gray-400 hover:text-blue-400 text-xs"
title="重命名"
>
<Pencil className="h-3.5 w-3.5" />
</button>
<button
onClick={(e) => {
e.stopPropagation();
onDeleteRefAudio(audio.id);
}}
className="text-gray-400 hover:text-red-400 text-xs"
title="删除"
>
<Trash2 className="h-3.5 w-3.5" />
</button>
</div>
</div>
<div className="text-gray-400 text-xs">
{audio.duration_sec.toFixed(1)}s
{needsRetranscribe(audio) && (
<span className="text-yellow-500 ml-1" title="需要重新识别文字"></span>
)}
</div>
</>
{editingAudioId !== audio.id && (
<div className="flex items-center gap-1 pl-2">
<button
type="button"
onClick={(e) => onTogglePlayPreview(audio, e)}
className="text-gray-400 hover:text-purple-300"
title="试听"
>
{playingAudioId === audio.id ? (
<Pause className="h-3.5 w-3.5" />
) : (
<Play className="h-3.5 w-3.5" />
)}
</button>
<button
type="button"
onClick={(e) => {
e.stopPropagation();
onRetranscribe(audio.id);
}}
disabled={retranscribingId === audio.id}
className="text-gray-400 hover:text-cyan-400 disabled:opacity-50"
title="重新识别文字"
>
<RotateCw className={`h-3.5 w-3.5 ${retranscribingId === audio.id ? "animate-spin" : ""}`} />
</button>
<button
type="button"
onClick={(e) => onStartEditing(audio, e)}
className="text-gray-400 hover:text-blue-400"
title="重命名"
>
<Pencil className="h-3.5 w-3.5" />
</button>
<button
type="button"
onClick={(e) => {
e.stopPropagation();
onDeleteRefAudio(audio.id);
}}
className="text-gray-400 hover:text-red-400"
title="删除"
>
<Trash2 className="h-3.5 w-3.5" />
</button>
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</div>
)}
</div>
);
})}
</div>
)}
</div>
))}
</div>
)}
</SelectPopover>
)}
</div>
<div className="border-t border-white/10 pt-4">
<span className="text-sm text-gray-300 mb-2 block">🎤 在线录音 <span className="text-xs text-gray-500"> 3-10 </span></span>
<div className="flex gap-2 items-center">
{!isRecording ? (
<button
onClick={onStartRecording}
className="px-4 py-2 bg-red-600 hover:bg-red-700 text-white rounded-lg text-sm font-medium transition-colors flex items-center gap-2"
>
<Mic className="h-4 w-4" />
</button>
) : (
<button
onClick={onStopRecording}
className="px-4 py-2 bg-gray-600 hover:bg-gray-700 text-white rounded-lg text-sm font-medium transition-colors flex items-center gap-2"
>
<Square className="h-4 w-4" />
</button>
)}
{isRecording && (
<span className="text-red-400 text-sm animate-pulse">
🔴 {formatRecordingTime(recordingTime)}
<div className="mt-3 flex flex-wrap items-center justify-end gap-2">
{recordedBlob && !isRecording && (
<span className="mr-auto text-xs text-emerald-300/90">
{formatRecordingTime(recordingTime)}线
</span>
)}
<label
htmlFor="ref-audio-upload"
className={`px-3 py-1.5 text-xs rounded-lg cursor-pointer transition-all inline-flex items-center gap-1.5 ${isUploadingRef
? "bg-gray-600 cursor-not-allowed text-gray-400 pointer-events-none"
: "bg-purple-600 hover:bg-purple-700 text-white"
}`}
>
<Upload className="h-3.5 w-3.5" />
</label>
<button
type="button"
onClick={() => setRecordingModalOpen(true)}
disabled={isUploadingRef}
className="px-3 py-1.5 text-xs rounded-lg transition-colors bg-red-600 hover:bg-red-700 text-white disabled:bg-gray-600 disabled:text-gray-400 inline-flex items-center gap-1.5"
>
<Mic className="h-3.5 w-3.5" />
在线录音
</button>
</div>
{recordedBlob && !isRecording && (
<div className="mt-3 p-3 bg-green-500/10 border border-green-500/30 rounded-lg">
<div className="flex items-center gap-2 mb-2">
<span className="text-green-300 text-sm"> ({formatRecordingTime(recordingTime)})</span>
<audio src={recordedUrl || ''} controls className="h-8" />
</div>
<button
onClick={onUseRecording}
disabled={isUploadingRef}
className="px-3 py-1 bg-green-600 hover:bg-green-700 text-white rounded text-sm disabled:bg-gray-600"
>
使
</button>
</div>
)}
</div>
{recordingModalOpen && (
<AppModal
isOpen={recordingModalOpen}
onClose={closeRecordingModal}
panelClassName="w-full max-w-lg rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden"
closeOnOverlay={!isRecording}
>
<AppModalHeader
title="🎤 在线录音"
subtitle="建议录制 3-10 秒,超出会自动截取到可用长度"
onClose={closeRecordingModal}
/>
<div className="space-y-4 p-4 sm:p-5">
<div className="rounded-xl border border-white/10 bg-black/25 p-3 sm:p-4">
<div className="flex flex-wrap items-center gap-2">
{!isRecording ? (
<button
type="button"
onClick={onStartRecording}
disabled={isUploadingRef}
className="px-4 py-2 rounded-lg text-sm font-medium bg-red-600 hover:bg-red-700 text-white transition-colors disabled:bg-gray-600 disabled:text-gray-400 inline-flex items-center gap-2"
>
<Mic className="h-4 w-4" />
{recordedBlob ? "重新录音" : "开始录音"}
</button>
) : (
<button
type="button"
onClick={onStopRecording}
className="px-4 py-2 rounded-lg text-sm font-medium bg-gray-600 hover:bg-gray-700 text-white transition-colors inline-flex items-center gap-2"
>
<Square className="h-4 w-4" />
</button>
)}
{isRecording ? (
<span className="inline-flex items-center gap-1 rounded-full border border-red-400/40 bg-red-500/10 px-3 py-1 text-xs text-red-300 animate-pulse">
<span className="h-1.5 w-1.5 rounded-full bg-red-400" />
{formatRecordingTime(recordingTime)}
</span>
) : recordedBlob ? (
<span className="inline-flex items-center gap-1 rounded-full border border-emerald-400/30 bg-emerald-500/10 px-3 py-1 text-xs text-emerald-300">
{formatRecordingTime(recordingTime)}
</span>
) : null}
</div>
{!recordedBlob && !isRecording && (
<p className="mt-3 text-xs text-gray-500"></p>
)}
</div>
{recordedBlob && !isRecording && (
<div className="space-y-3 rounded-xl border border-emerald-500/30 bg-emerald-500/10 p-3">
<div className="flex items-center justify-between gap-2">
<span className="text-sm text-emerald-200"> 使</span>
<span className="text-xs text-emerald-300/80">{formatRecordingTime(recordingTime)}</span>
</div>
<div className="rounded-lg border border-white/10 bg-black/35 px-3 py-2.5">
<audio
key={recordedUrl || "recorded-preview"}
ref={recordedAudioRef}
src={recordedUrl || ""}
className="hidden"
onPlay={() => setRecordedPreviewPlaying(true)}
onPause={() => setRecordedPreviewPlaying(false)}
onEnded={() => {
setRecordedPreviewPlaying(false);
setRecordedPreviewCurrentTime(0);
}}
onTimeUpdate={(event) => setRecordedPreviewCurrentTime(event.currentTarget.currentTime || 0)}
onLoadedMetadata={(event) => setRecordedPreviewDuration(event.currentTarget.duration || 0)}
/>
<div className="flex items-center gap-3">
<button
type="button"
onClick={handleToggleRecordedPreview}
disabled={!recordedUrl}
className="h-8 w-8 shrink-0 rounded-full bg-white/10 hover:bg-white/20 text-emerald-200 disabled:text-gray-500 disabled:bg-white/5 inline-flex items-center justify-center transition-colors"
title={recordedPreviewPlaying ? "暂停试听" : "播放试听"}
>
{recordedPreviewPlaying ? (
<Pause className="h-4 w-4" />
) : (
<Play className="h-4 w-4 translate-x-[1px]" />
)}
</button>
<div className="min-w-0 flex-1">
<input
type="range"
min={0}
max={Math.max(totalRecordedPreviewTime, 0.1)}
step={0.01}
value={Math.min(recordedPreviewCurrentTime, totalRecordedPreviewTime || 0)}
onChange={handleRecordedSeek}
className="w-full h-1.5 cursor-pointer appearance-none rounded-full bg-white/15 accent-emerald-400"
/>
<div className="mt-1 flex items-center justify-between text-[11px] text-emerald-200/80">
<span>{formatRecordingTime(Math.floor(recordedPreviewCurrentTime))}</span>
<span>{formatRecordingTime(Math.floor(totalRecordedPreviewTime))}</span>
</div>
</div>
</div>
</div>
<div className="flex flex-wrap items-center justify-end gap-2">
<button
type="button"
onClick={onDiscardRecording}
disabled={isUploadingRef}
className="px-3 py-1.5 rounded-lg text-sm bg-white/10 hover:bg-white/20 text-gray-200 transition-colors disabled:bg-white/5 disabled:text-gray-500"
>
</button>
<button
type="button"
onClick={handleUseRecordingAndClose}
disabled={isUploadingRef}
className="px-3 py-1.5 rounded-lg text-sm bg-green-600 hover:bg-green-700 text-white transition-colors disabled:bg-gray-600 disabled:text-gray-400"
>
使
</button>
</div>
</div>
)}
</div>
</AppModal>
)}
</div>
);
}


@@ -1,7 +1,9 @@
import { useState, useEffect, useRef, useCallback } from "react";
import { Loader2, Sparkles } from "lucide-react";
import api from "@/shared/api/axios";
import { ApiResponse, unwrap } from "@/shared/api/types";
import { useState, useEffect, useRef, useCallback } from "react";
import { Loader2, Sparkles } from "lucide-react";
import api from "@/shared/api/axios";
import { ApiResponse, unwrap } from "@/shared/api/types";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
import { toast } from "sonner";
const CUSTOM_PROMPT_KEY = "vigent_rewriteCustomPrompt";
@@ -77,42 +79,70 @@ export default function RewriteModal({
onClose();
};
const handleRetry = () => {
setRewrittenText("");
setError(null);
};
const handleRetry = () => {
setRewrittenText("");
setError(null);
};
const fallbackCopyTextToClipboard = useCallback((text: string) => {
const textArea = document.createElement("textarea");
textArea.value = text;
textArea.style.top = "0";
textArea.style.left = "0";
textArea.style.position = "fixed";
textArea.style.opacity = "0";
document.body.appendChild(textArea);
textArea.focus();
textArea.select();
try {
const successful = document.execCommand("copy");
if (successful) {
toast.success("已复制到剪贴板");
} else {
toast.error("复制失败,请手动复制");
}
} catch {
toast.error("复制失败,请手动复制");
}
document.body.removeChild(textArea);
}, []);
const handleCopy = useCallback((text: string) => {
if (!text.trim()) return;
if (navigator.clipboard && window.isSecureContext) {
navigator.clipboard
.writeText(text)
.then(() => {
toast.success("已复制到剪贴板");
})
.catch(() => {
fallbackCopyTextToClipboard(text);
});
} else {
fallbackCopyTextToClipboard(text);
}
}, [fallbackCopyTextToClipboard]);
// ESC to close
useEffect(() => {
if (!isOpen) return;
const handleKeyDown = (e: KeyboardEvent) => {
if (e.key === "Escape") onClose();
};
document.addEventListener("keydown", handleKeyDown);
return () => document.removeEventListener("keydown", handleKeyDown);
}, [isOpen, onClose]);
if (!isOpen) return null;
return (
<div className="fixed inset-0 z-50 flex items-center justify-center bg-black/80 backdrop-blur-sm p-4 animate-in fade-in duration-200">
<div className="bg-[#1a1a1a] border border-white/10 rounded-2xl w-full max-w-2xl max-h-[90vh] overflow-hidden flex flex-col shadow-2xl">
{/* Header */}
<div className="flex items-center justify-between p-4 border-b border-white/10 bg-white/5">
<h3 className="text-lg font-semibold text-white flex items-center gap-2">
<Sparkles className="h-5 w-5 text-purple-400" />
AI
</h3>
<button
onClick={onClose}
className="text-gray-400 hover:text-white transition-colors text-2xl leading-none"
>
&times;
</button>
</div>
{/* Content */}
<div className="flex-1 overflow-y-auto p-6 space-y-5">
if (!isOpen) return null;
return (
<AppModal
isOpen={isOpen}
onClose={onClose}
panelClassName="w-full max-w-2xl max-h-[90vh] rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
closeOnOverlay
>
<AppModalHeader
title="AI 智能改写"
icon={<Sparkles className="h-5 w-5 text-purple-300" />}
onClose={onClose}
/>
{/* Content */}
<div className="flex-1 overflow-y-auto p-6 space-y-5">
{/* Custom Prompt */}
<div className="space-y-2">
<label className="text-sm text-gray-300">
@@ -156,58 +186,64 @@ export default function RewriteModal({
</div>
)}
{/* Rewritten result */}
{rewrittenText && (
<>
<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-purple-300 flex items-center gap-2">
<Sparkles className="h-4 w-4" />
AI
</h4>
<button
onClick={handleApply}
className="text-xs bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white px-3 py-1.5 rounded-lg transition-colors shadow-sm"
>
使
</button>
</div>
<div className="bg-purple-900/10 border border-purple-500/20 rounded-xl p-4 max-h-60 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{rewrittenText}
</p>
</div>
</div>
<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-gray-400 flex items-center gap-2">
📝
</h4>
<button
onClick={onClose}
className="text-xs bg-white/10 hover:bg-white/20 text-white px-3 py-1.5 rounded-lg transition-colors"
>
</button>
</div>
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-40 overflow-y-auto hide-scrollbar">
<p className="text-gray-400 text-sm leading-relaxed whitespace-pre-wrap">
{originalText}
</p>
</div>
</div>
<button
onClick={handleRetry}
className="w-full py-2.5 px-4 bg-white/10 hover:bg-white/20 text-white rounded-xl transition-colors"
>
</button>
</>
)}
</div>
</div>
</div>
);
}
{/* Rewritten result */}
{rewrittenText && (
<>
<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-purple-300 flex items-center gap-2">
<Sparkles className="h-4 w-4" />
AI
</h4>
<span className="text-xs text-gray-400">{rewrittenText.length} </span>
</div>
<div className="bg-purple-900/10 border border-purple-500/20 rounded-xl p-4 max-h-60 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{rewrittenText}
</p>
</div>
</div>
<div className="space-y-2">
<h4 className="font-semibold text-gray-400 flex items-center gap-2">
📝
</h4>
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-40 overflow-y-auto hide-scrollbar">
<p className="text-gray-400 text-sm leading-relaxed whitespace-pre-wrap">
{originalText}
</p>
</div>
</div>
<div className="grid grid-cols-2 sm:grid-cols-4 gap-2">
<button
onClick={handleApply}
className="py-2.5 px-3 bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
onClick={() => handleCopy(rewrittenText)}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
onClick={handleRetry}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
onClick={onClose}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
</div>
</>
)}
</div>
</AppModal>
);
}
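
The copy handler in this file prefers the asynchronous Clipboard API when the page runs in a secure context, and otherwise falls back to a hidden textarea plus `document.execCommand("copy")`. A minimal sketch of that decision, pulled away from the DOM so it can be exercised in isolation, might look like the following. The names `CopyEnv`, `pickCopyStrategy`, and `copyWithFallback` are hypothetical and do not appear in the diff; the two writer functions are injected as assumptions standing in for `navigator.clipboard.writeText` and the textarea fallback.

```typescript
// Hypothetical helper types; none of these names appear in the diff.
type CopyStrategy = "clipboard-api" | "exec-command";

interface CopyEnv {
  hasClipboardApi: boolean; // stands in for `!!navigator.clipboard`
  isSecureContext: boolean; // stands in for `window.isSecureContext`
}

// Choose the async Clipboard API only when it exists and the context is secure.
function pickCopyStrategy(env: CopyEnv): CopyStrategy {
  return env.hasClipboardApi && env.isSecureContext ? "clipboard-api" : "exec-command";
}

// Try the primary writer; if it rejects or is unavailable, use the fallback.
async function copyWithFallback(
  text: string,
  env: CopyEnv,
  writeText: (value: string) => Promise<void>,
  fallback: (value: string) => void,
): Promise<CopyStrategy | "skipped"> {
  if (!text.trim()) return "skipped"; // nothing worth copying
  if (pickCopyStrategy(env) === "clipboard-api") {
    try {
      await writeText(text);
      return "clipboard-api";
    } catch {
      // e.g. permission denied or document not focused; fall through
    }
  }
  fallback(text);
  return "exec-command";
}
```

Keeping the environment check in a pure function means the branch can be tested without a browser, while the component still owns the toast messages.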


@@ -1,6 +1,7 @@
import { useEffect, useRef, useState } from "react";
import { FileText, History, Languages, Loader2, RotateCcw, Save, Sparkles, Trash2 } from "lucide-react";
import { useCallback, useEffect, useRef, useState } from "react";
import { FileText, GraduationCap, History, Languages, Loader2, Maximize2, RotateCcw, Save, Sparkles, Trash2 } from "lucide-react";
import type { SavedScript } from "@/features/home/model/useSavedScripts";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
const LANGUAGES = [
{ code: "English", label: "英语 English" },
@@ -18,9 +19,8 @@ interface ScriptEditorProps {
text: string;
onChangeText: (value: string) => void;
onOpenExtractModal: () => void;
onOpenLearningModal: () => void;
onOpenRewriteModal: () => void;
onGenerateMeta: () => void;
isGeneratingMeta: boolean;
onTranslate: (targetLang: string) => void;
isTranslating: boolean;
hasOriginalText: boolean;
@@ -35,9 +35,8 @@ export function ScriptEditor({
text,
onChangeText,
onOpenExtractModal,
onOpenLearningModal,
onOpenRewriteModal,
onGenerateMeta,
isGeneratingMeta,
onTranslate,
isTranslating,
hasOriginalText,
@@ -47,10 +46,17 @@ export function ScriptEditor({
onLoadScript,
onDeleteScript,
}: ScriptEditorProps) {
const actionBtnBase = "px-3 py-1.5 text-xs rounded-lg transition-colors whitespace-nowrap inline-flex items-center gap-1.5";
const actionBtnDisabled = "bg-gray-600 cursor-not-allowed text-gray-400";
const [showLangMenu, setShowLangMenu] = useState(false);
const langMenuRef = useRef<HTMLDivElement>(null);
const [showHistoryMenu, setShowHistoryMenu] = useState(false);
const historyMenuRef = useRef<HTMLDivElement>(null);
const [isExpandedEditorOpen, setIsExpandedEditorOpen] = useState(false);
const handleCloseExpandedEditor = useCallback(() => {
setIsExpandedEditorOpen(false);
}, []);
useEffect(() => {
if (!showLangMenu) return;
@@ -95,7 +101,7 @@ export function ScriptEditor({
<div className="relative" ref={historyMenuRef}>
<button
onClick={() => setShowHistoryMenu((prev) => !prev)}
className="h-7 px-2.5 text-xs rounded transition-all whitespace-nowrap bg-gray-600 hover:bg-gray-500 text-white inline-flex items-center gap-1"
className={`${actionBtnBase} bg-gray-600 hover:bg-gray-500 text-white`}
>
<History className="h-3.5 w-3.5" />
@@ -137,18 +143,25 @@ export function ScriptEditor({
</div>
<button
onClick={onOpenExtractModal}
className="h-7 px-2.5 text-xs rounded transition-all whitespace-nowrap bg-purple-600 hover:bg-purple-700 text-white inline-flex items-center gap-1"
className={`${actionBtnBase} bg-purple-600 hover:bg-purple-700 text-white`}
>
<FileText className="h-3.5 w-3.5" />
</button>
<button
onClick={onOpenLearningModal}
className={`${actionBtnBase} bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 text-white`}
>
<GraduationCap className="h-3.5 w-3.5" />
</button>
<div className="relative" ref={langMenuRef}>
<button
onClick={() => setShowLangMenu((prev) => !prev)}
disabled={isTranslating || !text.trim()}
className={`h-7 px-2.5 text-xs rounded transition-all whitespace-nowrap inline-flex items-center gap-1 ${
className={`${actionBtnBase} ${
isTranslating || !text.trim()
? "bg-gray-600 cursor-not-allowed text-gray-400"
? actionBtnDisabled
: "bg-gradient-to-r from-emerald-600 to-teal-600 hover:from-emerald-700 hover:to-teal-700 text-white"
}`}
>
@@ -190,63 +203,75 @@ export function ScriptEditor({
</div>
)}
</div>
<button
onClick={onGenerateMeta}
disabled={isGeneratingMeta || !text.trim()}
className={`h-7 px-2.5 text-xs rounded transition-all whitespace-nowrap inline-flex items-center gap-1 ${isGeneratingMeta || !text.trim()
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-700 hover:to-cyan-700 text-white"
}`}
>
{isGeneratingMeta ? (
<>
<Loader2 className="h-3.5 w-3.5 animate-spin" />
...
</>
) : (
<>
<Sparkles className="h-3.5 w-3.5" />
AI生成标题标签
</>
)}
</button>
</div>
</div>
<textarea
value={text}
onChange={(e) => onChangeText(e.target.value)}
placeholder="请输入你想说的话..."
className="w-full h-40 bg-black/30 border border-white/10 rounded-xl p-4 text-white placeholder-gray-500 resize-none focus:outline-none focus:border-purple-500 transition-colors hide-scrollbar"
/>
<div className="relative">
<textarea
value={text}
onChange={(e) => onChangeText(e.target.value)}
placeholder="请输入你想说的话..."
className="w-full h-40 bg-black/30 border border-white/10 rounded-xl p-4 pr-6 pb-6 text-white placeholder-gray-500 resize-none focus:outline-none focus:border-white/25 transition-colors hide-scrollbar"
/>
<button
type="button"
onClick={() => setIsExpandedEditorOpen(true)}
className="absolute right-0.5 bottom-2 h-5 w-5 text-gray-400/85 hover:text-white focus:outline-none transition-colors inline-flex items-center justify-center"
aria-label="扩展文案编辑器"
title="扩展编辑"
>
<Maximize2 className="h-4 w-4" />
</button>
</div>
<div className="flex items-center justify-between mt-2 text-sm text-gray-400">
<span>{text.length} </span>
<div className="flex items-center gap-2">
<button
onClick={onOpenRewriteModal}
disabled={!text.trim()}
className={`px-2.5 py-1 text-xs rounded transition-all flex items-center gap-1 ${
className={`${actionBtnBase} ${
!text.trim()
? "bg-gray-700 cursor-not-allowed text-gray-500"
: "bg-purple-600/80 hover:bg-purple-600 text-white"
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-purple-600 hover:bg-purple-700 text-white"
}`}
>
<Sparkles className="h-3 w-3" />
<Sparkles className="h-3.5 w-3.5" />
AI智能改写
</button>
<button
onClick={onSaveScript}
disabled={!text.trim()}
className={`px-2.5 py-1 text-xs rounded transition-all flex items-center gap-1 ${
className={`${actionBtnBase} ${
!text.trim()
? "bg-gray-700 cursor-not-allowed text-gray-500"
: "bg-amber-600/80 hover:bg-amber-600 text-white"
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-amber-600 hover:bg-amber-700 text-white"
}`}
>
<Save className="h-3 w-3" />
<Save className="h-3.5 w-3.5" />
</button>
</div>
</div>
<AppModal
isOpen={isExpandedEditorOpen}
onClose={handleCloseExpandedEditor}
panelClassName="w-full max-w-5xl max-h-[92vh] rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
>
<AppModalHeader
title="扩展文案编辑"
subtitle="在更大空间里编写与调整文案"
onClose={handleCloseExpandedEditor}
actions={<span className="text-xs text-gray-400 tabular-nums">{text.length} </span>}
/>
<div className="flex-1 p-4 sm:p-5">
<textarea
value={text}
onChange={(e) => onChangeText(e.target.value)}
placeholder="请输入你想说的话..."
className="w-full h-[66vh] min-h-[320px] bg-black/30 border border-white/10 rounded-xl p-4 text-white placeholder-gray-500 resize-none focus:outline-none focus:border-white/25 transition-colors hide-scrollbar"
/>
</div>
</AppModal>
</div>
);
}


@@ -3,6 +3,7 @@
import { useEffect, useCallback } from "react";
import { Loader2 } from "lucide-react";
import { useScriptExtraction } from "./script-extraction/useScriptExtraction";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
interface ScriptExtractionModalProps {
isOpen: boolean;
@@ -36,17 +37,15 @@ export default function ScriptExtractionModal({
clearInputUrl,
} = useScriptExtraction({ isOpen });
// Shortcuts: ESC closes, Enter submits (config step only)
// Shortcut: Enter submits (config step only)
const canExtract = (activeTab === "file" && selectedFile) || (activeTab === "url" && inputUrl.trim());
const handleKeyDown = useCallback((e: KeyboardEvent) => {
if (e.key === "Escape") {
onClose();
} else if (e.key === "Enter" && !e.shiftKey && step === "config" && canExtract && !isLoading) {
if (e.key === "Enter" && !e.shiftKey && step === "config" && canExtract && !isLoading) {
e.preventDefault();
handleExtract();
}
}, [onClose, step, canExtract, isLoading, handleExtract]);
}, [step, canExtract, isLoading, handleExtract]);
useEffect(() => {
if (!isOpen) return;
@@ -68,20 +67,13 @@ export default function ScriptExtractionModal({
};
return (
<div className="fixed inset-0 z-50 flex items-center justify-center bg-black/80 backdrop-blur-sm p-4 animate-in fade-in duration-200">
<div className="bg-[#1a1a1a] border border-white/10 rounded-2xl w-full max-w-2xl max-h-[90vh] overflow-hidden flex flex-col shadow-2xl">
{/* Header */}
<div className="flex items-center justify-between p-4 border-b border-white/10 bg-white/5">
<h3 className="text-lg font-semibold text-white flex items-center gap-2">
📜
</h3>
<button
onClick={onClose}
className="text-gray-400 hover:text-white transition-colors text-2xl leading-none"
>
&times;
</button>
</div>
<AppModal
isOpen={isOpen}
onClose={onClose}
panelClassName="w-full max-w-2xl max-h-[90vh] rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
closeOnOverlay
>
<AppModalHeader title="📜 文案提取助手" onClose={onClose} />
{/* Content */}
<div className="flex-1 overflow-y-auto p-6">
@@ -236,48 +228,51 @@ export default function ScriptExtractionModal({
)}
{step === "result" && (
<div className="space-y-6">
<div className="space-y-2">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-gray-300 flex items-center gap-2">
🎙
</h4>
<div className="flex items-center gap-2">
{onApply && (
<button
onClick={() => handleApplyAndClose(script)}
className="text-xs bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white px-3 py-1.5 rounded-lg transition-colors flex items-center gap-1 shadow-sm"
>
📥
</button>
)}
<button
onClick={() => copyToClipboard(script)}
className="text-xs bg-white/10 hover:bg-white/20 text-white px-3 py-1.5 rounded-lg transition-colors"
>
</button>
</div>
</div>
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-60 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{script}
</p>
</div>
<div className="space-y-5">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-gray-300 flex items-center gap-2">
🎙
</h4>
<span className="text-xs text-gray-400">{script.length} </span>
</div>
<div className="flex justify-center pt-4">
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-72 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">
{script}
</p>
</div>
<div className={`grid ${onApply ? "grid-cols-2 sm:grid-cols-4" : "grid-cols-2 sm:grid-cols-3"} gap-2`}>
{onApply && (
<button
onClick={() => handleApplyAndClose(script)}
className="py-2.5 px-3 bg-gradient-to-r from-purple-600 to-pink-600 hover:from-purple-500 hover:to-pink-500 text-white rounded-lg transition-colors text-sm"
>
</button>
)}
<button
onClick={() => copyToClipboard(script)}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
onClick={handleExtractNext}
className="px-6 py-2 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors"
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
onClick={onClose}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
</div>
</div>
)}
</div>
</div>
</div>
</AppModal>
);
}


@@ -0,0 +1,242 @@
"use client";
import { BookOpen, Sparkles } from "lucide-react";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
import { useScriptLearning } from "./script-learning/useScriptLearning";
interface ScriptLearningModalProps {
isOpen: boolean;
onClose: () => void;
onApply?: (text: string) => void;
}
const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;
export default function ScriptLearningModal({ isOpen, onClose, onApply }: ScriptLearningModalProps) {
const {
step,
inputUrl,
setInputUrl,
topics,
selectedTopic,
setSelectedTopic,
wordCount,
setWordCount,
generatedScript,
error,
analysisId,
handleAnalyze,
handleGenerate,
handleRegenerate,
backToInput,
backToTopics,
copyToClipboard,
} = useScriptLearning({ isOpen });
if (!isOpen) return null;
const wordCountNum = Number(wordCount);
const wordCountValid = Number.isInteger(wordCountNum)
&& wordCountNum >= WORD_COUNT_MIN
&& wordCountNum <= WORD_COUNT_MAX;
const canGenerate = !!analysisId && !!selectedTopic && wordCountValid;
const handleApplyAndClose = () => {
if (!generatedScript.trim()) return;
onApply?.(generatedScript);
onClose();
};
return (
<AppModal
isOpen={isOpen}
onClose={onClose}
panelClassName="w-full max-w-2xl max-h-[90vh] rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden flex flex-col"
closeOnOverlay={false}
closeOnEsc={false}
>
<AppModalHeader
title="文案深度学习"
icon={<BookOpen className="h-5 w-5 text-cyan-300" />}
subtitle="分析博主近期选题风格并快速生成文案"
onClose={onClose}
/>
<div className="flex-1 overflow-y-auto p-6">
{step === "input" && (
<div className="space-y-5">
<div className="space-y-2">
<label className="text-sm text-gray-300"></label>
<input
type="text"
value={inputUrl}
onChange={(event) => setInputUrl(event.target.value)}
placeholder="请粘贴抖音或B站博主主页链接..."
className="w-full bg-black/20 border border-white/10 rounded-xl px-4 py-3 text-white placeholder-gray-500 focus:outline-none focus:border-cyan-500 transition-colors"
/>
<p className="text-xs text-gray-500"> https 使</p>
</div>
{error && (
<div className="bg-red-500/10 border border-red-500/30 rounded-xl p-4">
<p className="text-red-400 text-sm">{error}</p>
</div>
)}
<div className="flex gap-3 pt-1">
<button
type="button"
onClick={() => setInputUrl("")}
className="flex-1 py-3 px-4 bg-white/10 hover:bg-white/20 text-white rounded-xl transition-colors"
>
</button>
<button
type="button"
onClick={() => void handleAnalyze()}
disabled={!inputUrl.trim()}
className="flex-1 py-3 px-4 bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 disabled:opacity-50 disabled:cursor-not-allowed text-white rounded-xl transition-all font-medium shadow-lg"
>
</button>
</div>
</div>
)}
{(step === "analyzing" || step === "generating") && (
<div className="flex flex-col items-center justify-center py-20">
<div className="relative w-20 h-20 mb-6">
<div className="absolute inset-0 border-4 border-cyan-500/30 rounded-full" />
<div className="absolute inset-0 border-4 border-t-cyan-500 rounded-full animate-spin" />
</div>
<h4 className="text-xl font-medium text-white mb-2">
{step === "analyzing" ? "正在分析中..." : "正在生成中..."}
</h4>
</div>
)}
{step === "topics" && (
<div className="space-y-5">
<div className="bg-cyan-500/10 border border-cyan-500/30 rounded-xl p-3">
<p className="text-cyan-200 text-sm"></p>
</div>
<div className="space-y-2">
<p className="text-sm text-gray-300"></p>
<div className="grid grid-cols-1 sm:grid-cols-2 gap-2">
{topics.map((topic) => {
const active = selectedTopic === topic;
return (
<button
key={topic}
type="button"
onClick={() => setSelectedTopic(topic)}
className={`text-left rounded-lg border px-3 py-2.5 text-sm transition-colors ${
active
? "border-cyan-400 bg-cyan-500/20 text-cyan-100"
: "border-white/10 bg-white/5 text-gray-200 hover:border-white/20 hover:bg-white/10"
}`}
>
{topic}
</button>
);
})}
</div>
</div>
<div className="space-y-2">
<label className="text-sm text-gray-300"></label>
<input
type="number"
min={WORD_COUNT_MIN}
max={WORD_COUNT_MAX}
value={wordCount}
onChange={(event) => setWordCount(event.target.value)}
placeholder="请输入目标字数(80-1000),如 300"
className="w-full bg-black/20 border border-white/10 rounded-xl px-4 py-3 text-white placeholder-gray-500 focus:outline-none focus:border-cyan-500 transition-colors"
/>
</div>
{error && (
<div className="bg-red-500/10 border border-red-500/30 rounded-xl p-4">
<p className="text-red-400 text-sm">{error}</p>
</div>
)}
<div className="flex gap-3 pt-1">
<button
type="button"
onClick={backToInput}
className="flex-1 py-3 px-4 bg-white/10 hover:bg-white/20 text-white rounded-xl transition-colors"
>
</button>
<button
type="button"
onClick={() => void handleGenerate()}
disabled={!canGenerate}
className="flex-1 py-3 px-4 bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 disabled:opacity-50 disabled:cursor-not-allowed text-white rounded-xl transition-all font-medium shadow-lg"
>
</button>
</div>
</div>
)}
{step === "result" && (
<div className="space-y-5">
<div className="flex justify-between items-center">
<h4 className="font-semibold text-cyan-200 flex items-center gap-2">
<Sparkles className="h-4 w-4" />
</h4>
<span className="text-xs text-gray-400">{generatedScript.length} </span>
</div>
<div className="bg-white/5 border border-white/10 rounded-xl p-4 max-h-72 overflow-y-auto hide-scrollbar">
<p className="text-gray-200 text-sm leading-relaxed whitespace-pre-wrap">{generatedScript}</p>
</div>
<div className="grid grid-cols-2 sm:grid-cols-4 gap-2">
<button
type="button"
onClick={handleApplyAndClose}
className="py-2.5 px-3 bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-500 hover:to-cyan-500 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
type="button"
onClick={() => copyToClipboard(generatedScript)}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
type="button"
onClick={() => void handleRegenerate()}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
<button
type="button"
onClick={backToTopics}
className="py-2.5 px-3 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors text-sm"
>
</button>
</div>
{error && (
<div className="bg-red-500/10 border border-red-500/30 rounded-xl p-4">
<p className="text-red-400 text-sm">{error}</p>
</div>
)}
</div>
)}
</div>
</AppModal>
);
}
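
The word-count gate in this modal converts the raw input string with `Number(...)` and accepts it only when the result is an integer between `WORD_COUNT_MIN` and `WORD_COUNT_MAX`. As a rough sketch, the same check can be written as a standalone predicate. The bounds are copied from the diff, while the function name `isValidWordCount` is hypothetical.

```typescript
// Bounds copied from the diff; the helper name is hypothetical.
const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;

// Mirrors the component's check: coerce with Number(), then require an
// integer inside the inclusive [min, max] range.
function isValidWordCount(raw: string): boolean {
  const n = Number(raw); // "" becomes 0, "abc" becomes NaN, "80.5" stays 80.5
  return Number.isInteger(n) && n >= WORD_COUNT_MIN && n <= WORD_COUNT_MAX;
}
```

One consequence of relying on `Number(...)` is that exponent notation such as `"1e2"` coerces to 100 and therefore passes. If only plain digits should be accepted, a stricter variant could first test the string against a digits-only pattern.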


@@ -1,18 +1,28 @@
import { useEffect, useRef, useCallback, useState, useMemo } from "react";
import { useEffect, useRef, useCallback, useState } from "react";
import WaveSurfer from "wavesurfer.js";
import { ChevronDown, GripVertical } from "lucide-react";
import type { TimelineSegment } from "@/features/home/model/useTimelineEditor";
import { ChevronDown, Check, X, Plus } from "lucide-react";
import type { InsertSegment } from "@/shared/types/timeline";
import type { Material } from "@/shared/types/material";
import { SelectPopover } from "@/shared/ui/SelectPopover";
interface TimelineEditorProps {
audioDuration: number;
audioUrl: string;
segments: TimelineSegment[];
materials: Material[];
// Multi-camera props
primaryMaterial: Material | undefined;
inserts: InsertSegment[];
insertCandidates: Material[];
onAddInsert: (materialId: string) => void;
onRemoveInsert: (id: string) => void;
onMoveInsert: (id: string, newStart: number) => void;
onClickInsert: (insert: InsertSegment) => void;
onClickPrimary: () => void;
// Single material: for ClipTrimmer compat, pass a synthetic TimelineSegment
primarySourceStart: number;
primarySourceEnd: number;
// Shared
outputAspectRatio: "9:16" | "16:9";
onOutputAspectRatioChange: (ratio: "9:16" | "16:9") => void;
onReorderSegment: (fromIdx: number, toIdx: number) => void;
onClickSegment: (segment: TimelineSegment) => void;
embedded?: boolean;
}
@@ -25,12 +35,18 @@ function formatTime(sec: number): string {
export function TimelineEditor({
audioDuration,
audioUrl,
segments,
materials,
primaryMaterial,
inserts,
insertCandidates,
onAddInsert,
onRemoveInsert,
onMoveInsert,
onClickInsert,
onClickPrimary,
primarySourceStart,
primarySourceEnd,
outputAspectRatio,
onOutputAspectRatioChange,
onReorderSegment,
onClickSegment,
embedded = false,
}: TimelineEditorProps) {
const waveRef = useRef<HTMLDivElement>(null);
@@ -38,22 +54,27 @@ export function TimelineEditor({
const [waveReady, setWaveReady] = useState(false);
const [isPlaying, setIsPlaying] = useState(false);
// Refs for high-frequency DOM updates (avoid 60fps re-renders)
// Refs for high-frequency DOM updates
const playheadRef = useRef<HTMLDivElement>(null);
const timeRef = useRef<HTMLSpanElement>(null);
const audioDurationRef = useRef(audioDuration);
const timelineBarRef = useRef<HTMLDivElement>(null);
useEffect(() => {
audioDurationRef.current = audioDuration;
}, [audioDuration]);
// Drag-to-reorder state
const [dragFromIdx, setDragFromIdx] = useState<number | null>(null);
const [dragOverIdx, setDragOverIdx] = useState<number | null>(null);
// Drag state for insert blocks (move only; duration editing unified to ClipTrimmer)
const [dragId, setDragId] = useState<string | null>(null);
const dragStartXRef = useRef(0);
const dragStartValRef = useRef(0);
const dragMovedRef = useRef(false);
const DRAG_THRESHOLD = 5;
// Aspect ratio dropdown
const [ratioOpen, setRatioOpen] = useState(false);
const ratioRef = useRef<HTMLDivElement>(null);
const isMultiCam = insertCandidates.length > 0 || inserts.length > 0;
const hasPrimary = !!primaryMaterial;
// Aspect ratio options
const ratioOptions = [
{ value: "9:16" as const, label: "竖屏 9:16" },
{ value: "16:9" as const, label: "横屏 16:9" },
@@ -61,24 +82,21 @@ export function TimelineEditor({
const currentRatioLabel =
ratioOptions.find((opt) => opt.value === outputAspectRatio)?.label ?? "竖屏 9:16";
useEffect(() => {
const handler = (e: MouseEvent) => {
if (ratioRef.current && !ratioRef.current.contains(e.target as Node)) {
setRatioOpen(false);
}
};
if (ratioOpen) document.addEventListener("mousedown", handler);
return () => document.removeEventListener("mousedown", handler);
}, [ratioOpen]);
// Primary material loop info
const primaryDuration = primaryMaterial?.duration_sec ?? 0;
const primaryEffective = primarySourceEnd > primarySourceStart
? primarySourceEnd - primarySourceStart
: primaryDuration;
const loopCount = primaryEffective > 0 && audioDuration > 0
? (audioDuration / primaryEffective)
: 0;
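The loop arithmetic above can be checked in isolation. The following is a minimal sketch, assuming the same inputs as the component (trim start/end, material duration, audio duration); the function name `loopInfo` and its return shape are hypothetical and not part of this change.

```typescript
// Sketch only: pure version of the primary-material loop calculation.
// `loopInfo` is a hypothetical helper name, not part of the diff.
interface LoopInfo {
  effective: number; // seconds of source material actually played per pass
  loopCount: number; // how many passes are needed to cover the audio
}

function loopInfo(
  sourceStart: number,
  sourceEnd: number,
  materialDuration: number,
  audioDuration: number,
): LoopInfo {
  // A trim range is only honoured when end > start; otherwise fall back
  // to the full material duration, as the component does.
  const effective =
    sourceEnd > sourceStart ? sourceEnd - sourceStart : materialDuration;
  const loopCount =
    effective > 0 && audioDuration > 0 ? audioDuration / effective : 0;
  return { effective, loopCount };
}
```

A `loopCount` above 1 is what makes the component draw the striped loop pattern and the "×N" hint.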
// Create / recreate wavesurfer when audioUrl changes
useEffect(() => {
if (!waveRef.current || !audioUrl) return;
const playheadEl = playheadRef.current;
const timeEl = timeRef.current;
// Destroy previous instance
if (wsRef.current) {
wsRef.current.destroy();
wsRef.current = null;
@@ -98,7 +116,6 @@ export function TimelineEditor({
normalize: true,
});
// Click waveform → seek + auto-play
ws.on("interaction", () => ws.play());
ws.on("play", () => setIsPlaying(true));
ws.on("pause", () => setIsPlaying(false));
@@ -106,7 +123,6 @@ export function TimelineEditor({
setIsPlaying(false);
if (playheadRef.current) playheadRef.current.style.display = "none";
});
// High-frequency: update playhead + time via refs (no React re-render)
ws.on("timeupdate", (time: number) => {
const dur = audioDurationRef.current;
if (playheadRef.current && dur > 0) {
@@ -130,7 +146,6 @@ export function TimelineEditor({
};
}, [audioUrl, waveReady]);
// Callback ref to detect when waveRef div mounts
const waveCallbackRef = useCallback((node: HTMLDivElement | null) => {
(waveRef as React.MutableRefObject<HTMLDivElement | null>).current = node;
setWaveReady(!!node);
@@ -140,43 +155,45 @@ export function TimelineEditor({
wsRef.current?.playPause();
}, []);
// Drag-to-reorder handlers
const handleDragStart = useCallback((idx: number, e: React.DragEvent) => {
setDragFromIdx(idx);
e.dataTransfer.effectAllowed = "move";
e.dataTransfer.setData("text/plain", String(idx));
}, []);
// ── Insert block pointer handlers (move only) ──
const handleDragOver = useCallback((idx: number, e: React.DragEvent) => {
const getTimeFromClientX = useCallback((clientX: number): number => {
if (!timelineBarRef.current || audioDuration <= 0) return 0;
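const rect = timelineBarRef.current.getBoundingClientRect();
// Guard against a zero-width bar (e.g. hidden container) to avoid NaN.
if (rect.width <= 0) return 0;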
const ratio = Math.max(0, Math.min(1, (clientX - rect.left) / rect.width));
return ratio * audioDuration;
}, [audioDuration]);
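The pointer-to-time mapping used by the insert drag handlers can be expressed as a pure function, which makes the clamping behaviour easier to verify. This is a minimal sketch, assuming the bar's left edge and width are passed in directly instead of being read from a DOM rect; `timeFromClientX` and its parameter names are hypothetical. The sketch also returns 0 for a zero-width bar.

```typescript
// Sketch only: pure version of the clientX -> timeline time mapping.
// `timeFromClientX` is a hypothetical name; the component reads the
// rect from timelineBarRef instead of taking it as arguments.
function timeFromClientX(
  clientX: number,
  barLeft: number,
  barWidth: number,
  duration: number,
): number {
  if (barWidth <= 0 || duration <= 0) return 0;
  // Clamp to [0, 1] so pointers outside the bar map to its edges.
  const ratio = Math.max(0, Math.min(1, (clientX - barLeft) / barWidth));
  return ratio * duration;
}
```

During a drag, the component applies the difference between two such values (current pointer minus drag start) to the insert's original start time.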
const handleInsertPointerDown = useCallback((
id: string,
e: React.PointerEvent
) => {
e.preventDefault();
e.dataTransfer.dropEffect = "move";
setDragOverIdx(idx);
}, []);
e.stopPropagation();
setDragId(id);
dragStartXRef.current = e.clientX;
dragMovedRef.current = false;
const ins = inserts.find((i) => i.id === id);
dragStartValRef.current = ins?.start ?? 0;
(e.target as HTMLElement).setPointerCapture(e.pointerId);
}, [inserts]);
const handleDragLeave = useCallback(() => {
setDragOverIdx(null);
}, []);
const handleDrop = useCallback((toIdx: number, e: React.DragEvent) => {
e.preventDefault();
const fromIdx = parseInt(e.dataTransfer.getData("text/plain"), 10);
if (!isNaN(fromIdx) && fromIdx !== toIdx) {
onReorderSegment(fromIdx, toIdx);
const handlePointerMove = useCallback((e: React.PointerEvent) => {
if (!dragId) return;
if (!dragMovedRef.current) {
const dx = Math.abs(e.clientX - dragStartXRef.current);
if (dx < DRAG_THRESHOLD) return;
dragMovedRef.current = true;
}
setDragFromIdx(null);
setDragOverIdx(null);
}, [onReorderSegment]);
const currentTime = getTimeFromClientX(e.clientX);
const startTime = getTimeFromClientX(dragStartXRef.current);
onMoveInsert(dragId, dragStartValRef.current + (currentTime - startTime));
}, [dragId, getTimeFromClientX, onMoveInsert]);
const handleDragEnd = useCallback(() => {
setDragFromIdx(null);
setDragOverIdx(null);
const handlePointerUp = useCallback(() => {
setDragId(null);
}, []);
// Filter visible vs overflow segments
const visibleSegments = useMemo(() => segments.filter((s) => s.start < audioDuration), [segments, audioDuration]);
const overflowSegments = useMemo(() => segments.filter((s) => s.start >= audioDuration), [segments, audioDuration]);
const hasSegments = visibleSegments.length > 0;
const content = (
<>
<div className="flex items-center justify-between mb-3">
@@ -188,37 +205,49 @@ export function TimelineEditor({
<h3 className="text-sm font-medium text-gray-400"></h3>
)}
<div className="flex items-center gap-2 text-xs text-gray-400">
<div ref={ratioRef} className="relative">
<button
type="button"
onClick={() => setRatioOpen((v) => !v)}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 whitespace-nowrap flex items-center gap-1 transition-all"
title="设置输出画面比例"
<div className="shrink-0">
<SelectPopover
sheetTitle="设置输出画面比例"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="rounded-lg border border-white/10 bg-black/25 px-2.5 py-1.5 text-left transition-colors hover:border-white/30"
title="设置输出画面比例"
>
<span className="flex items-center justify-between gap-2">
<span className="truncate text-xs text-white">: {currentRatioLabel}</span>
<ChevronDown className={`h-3.5 w-3.5 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
: {currentRatioLabel}
<ChevronDown className={`h-3 w-3 transition-transform ${ratioOpen ? "rotate-180" : ""}`} />
</button>
{ratioOpen && (
<div className="absolute right-0 top-full mt-1 bg-gray-800 border border-white/20 rounded-lg shadow-xl py-1 z-50 min-w-[106px]">
{ratioOptions.map((opt) => (
<button
key={opt.value}
type="button"
onClick={() => {
onOutputAspectRatioChange(opt.value);
setRatioOpen(false);
}}
className={`w-full text-left px-3 py-1.5 text-xs transition-colors ${
outputAspectRatio === opt.value
? "bg-purple-600/40 text-purple-200"
: "text-gray-300 hover:bg-white/10"
}`}
>
{opt.label}
</button>
))}
</div>
)}
{({ close }) => (
<div className="space-y-1">
{ratioOptions.map((opt) => {
const isSelected = outputAspectRatio === opt.value;
return (
<button
key={opt.value}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onOutputAspectRatioChange(opt.value);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<span className="text-xs text-white">{opt.label}</span>
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
</div>
{audioUrl && (
@@ -238,109 +267,149 @@ export function TimelineEditor({
</div>
</div>
{/* Waveform — always rendered so ref stays mounted */}
{/* Waveform */}
<div className="relative mb-1">
<div ref={waveCallbackRef} className="rounded-lg overflow-hidden bg-black/20 cursor-pointer" style={{ minHeight: 56 }} />
</div>
{/* Segment blocks or empty placeholder */}
{hasSegments ? (
{/* Timeline visualization */}
{hasPrimary && audioDuration > 0 ? (
<>
<div className="relative h-14 flex select-none">
{/* Playhead — syncs with audio playback */}
<div
ref={timelineBarRef}
className="relative select-none touch-none"
style={{ minHeight: isMultiCam ? 80 : 56 }}
onPointerMove={handlePointerMove}
onPointerUp={handlePointerUp}
onPointerLeave={handlePointerUp}
>
{/* Playhead */}
<div
ref={playheadRef}
className="absolute top-0 h-full w-0.5 bg-fuchsia-400 z-10 pointer-events-none"
className="absolute top-0 h-full w-0.5 bg-fuchsia-400 z-20 pointer-events-none"
style={{ display: "none", left: "0%" }}
/>
{visibleSegments.map((seg, i) => {
const left = (seg.start / audioDuration) * 100;
const width = ((seg.end - seg.start) / audioDuration) * 100;
const segDur = seg.end - seg.start;
const isDragTarget = dragOverIdx === i && dragFromIdx !== i;
// Compute loop portion for the last visible segment
const isLastVisible = i === visibleSegments.length - 1;
let loopPercent = 0;
if (isLastVisible && audioDuration > 0) {
const mat = materials.find((m) => m.id === seg.materialId);
const matDur = mat?.duration_sec ?? 0;
const effDur = (seg.sourceEnd > seg.sourceStart)
? (seg.sourceEnd - seg.sourceStart)
: Math.max(matDur - seg.sourceStart, 0);
if (effDur > 0 && segDur > effDur + 0.1) {
loopPercent = ((segDur - effDur) / segDur) * 100;
}
}
{/* Primary material background bar */}
<button
onClick={onClickPrimary}
className="absolute inset-0 rounded-lg overflow-hidden border border-purple-500/30 hover:border-purple-500/50 transition-colors cursor-pointer"
style={{ backgroundColor: "#8b5cf620" }}
title={`主素材: ${primaryMaterial?.scene || primaryMaterial?.name || ""}${
loopCount > 1 ? ` (${primaryEffective.toFixed(1)}s ×${loopCount.toFixed(1)} 循环)` : ""
}\n点击设置截取范围`}
>
{/* Loop stripe pattern */}
{loopCount > 1 && (
<div
className="absolute inset-0 pointer-events-none"
style={{
background: `repeating-linear-gradient(-45deg, transparent, transparent 6px, rgba(139,92,246,0.06) 6px, rgba(139,92,246,0.06) 12px)`,
}}
/>
)}
<div className="absolute inset-0 flex items-center px-3">
<span className="text-[11px] text-purple-300/80 truncate">
: {primaryMaterial?.scene || primaryMaterial?.name || ""}
{loopCount > 1 && (
<span className="text-purple-400/60 ml-1">
({primaryEffective.toFixed(1)}s ×{loopCount.toFixed(1)} )
</span>
)}
{primarySourceStart > 0 && (
<span className="text-amber-400/80 ml-1"> {primarySourceStart.toFixed(1)}s</span>
)}
</span>
</div>
</button>
{/* Insert blocks floating above primary */}
{inserts.map((ins) => {
const left = (ins.start / audioDuration) * 100;
const width = ((ins.end - ins.start) / audioDuration) * 100;
const insDur = ins.end - ins.start;
const isDragging = dragId === ins.id;
return (
<div key={seg.id} className="absolute top-0 h-full" style={{ left: `${left}%`, width: `${width}%` }}>
<div
key={ins.id}
className={`absolute group min-h-[40px] ${isDragging ? "z-30" : "z-10"}`}
style={{
left: `${left}%`,
width: `${width}%`,
top: isMultiCam ? 12 : 4,
bottom: isMultiCam ? 12 : 4,
}}
>
{/* Main block body — move on drag, click opens ClipTrimmer */}
<button
draggable
onDragStart={(e) => handleDragStart(i, e)}
onDragOver={(e) => handleDragOver(i, e)}
onDragLeave={handleDragLeave}
onDrop={(e) => handleDrop(i, e)}
onDragEnd={handleDragEnd}
onClick={() => onClickSegment(seg)}
className={`relative w-full h-full rounded-lg flex flex-col items-center justify-center overflow-hidden cursor-grab active:cursor-grabbing transition-all border ${
isDragTarget
? "ring-2 ring-purple-400 border-purple-400 scale-[1.02]"
: dragFromIdx === i
? "opacity-50 border-white/10"
: "hover:opacity-90 border-white/10"
className={`w-full h-full rounded-lg flex flex-col items-center justify-center overflow-hidden cursor-grab active:cursor-grabbing transition-all border ${
isDragging
? "ring-2 ring-white/40 scale-[1.02]"
: "hover:brightness-110"
}`}
style={{ backgroundColor: seg.color + "33", borderColor: isDragTarget ? undefined : seg.color + "66" }}
title={`拖拽可调换顺序 · 点击设置截取范围\n${seg.materialName}\n${segDur.toFixed(1)}s${loopPercent > 0 ? ` (含循环 ${(segDur * loopPercent / 100).toFixed(1)}s)` : ""}`}
style={{
backgroundColor: ins.color + "55",
borderColor: ins.color + "88",
}}
onPointerDown={(e) => handleInsertPointerDown(ins.id, e)}
onClick={() => {
if (!dragMovedRef.current) onClickInsert(ins);
}}
title={`${ins.materialName} ${insDur.toFixed(1)}s\n点击设置截取范围`}
>
<GripVertical className="absolute top-0.5 left-0.5 h-3 w-3 text-white/30 z-[1]" />
<span className="text-[11px] text-white/90 truncate max-w-full px-1 leading-tight z-[1]">
{seg.materialName}
{ins.materialName}
</span>
<span className="text-[10px] text-white/60 leading-tight z-[1]">
{segDur.toFixed(1)}s
{insDur.toFixed(1)}s
</span>
{seg.sourceStart > 0 && (
{ins.sourceStart > 0 && (
<span className="text-[9px] text-amber-400/80 leading-tight z-[1]">
{seg.sourceStart.toFixed(1)}s
{ins.sourceStart.toFixed(1)}s
</span>
)}
{/* Loop fill stripe overlay */}
{loopPercent > 0 && (
<div
className="absolute top-0 right-0 h-full pointer-events-none flex items-center justify-center"
style={{
width: `${loopPercent}%`,
background: `repeating-linear-gradient(-45deg, transparent, transparent 3px, rgba(255,255,255,0.07) 3px, rgba(255,255,255,0.07) 6px)`,
borderLeft: "1px dashed rgba(255,255,255,0.25)",
}}
>
<span className="text-[9px] text-white/30"></span>
</div>
)}
</button>
{/* Delete button */}
<button
className="absolute -top-1.5 -right-1.5 w-5 h-5 rounded-full bg-red-500/80 hover:bg-red-500 flex items-center justify-center opacity-40 sm:opacity-0 sm:group-hover:opacity-100 transition-opacity z-20"
onClick={(e) => {
e.stopPropagation();
onRemoveInsert(ins.id);
}}
title="删除此插入"
>
<X className="w-3 h-3 text-white" />
</button>
</div>
);
})}
</div>
{/* Overflow segments — shown as gray chips */}
{overflowSegments.length > 0 && (
<div className="flex flex-wrap items-center gap-1.5 mt-1.5">
<span className="text-[10px] text-gray-500">使:</span>
{overflowSegments.map((seg) => (
<span
key={seg.id}
className="text-[10px] text-gray-500 bg-white/5 border border-white/10 rounded px-1.5 py-0.5"
{/* Insert candidates bar (multi-cam only) */}
{isMultiCam && insertCandidates.length > 0 && (
<div className="flex flex-wrap items-center gap-1.5 mt-2">
<span className="text-[10px] text-gray-500">:</span>
{insertCandidates.map((mat) => (
<button
key={mat.id}
className="flex items-center gap-0.5 text-[10px] text-gray-300 bg-white/5 border border-white/10 hover:border-white/30 rounded px-1.5 py-0.5 transition-colors"
onClick={() => onAddInsert(mat.id)}
title={`添加 "${mat.scene || mat.name}" 到时间轴`}
>
{seg.materialName}
</span>
<Plus className="w-2.5 h-2.5" />
{mat.scene || mat.name}
</button>
))}
</div>
)}
<p className="text-[10px] text-gray-500 mt-1.5">
· ·
{isMultiCam
? "点击主素材设置截取范围 · 拖拽插入块调整位置 · 点击插入块设置截取/时长"
: "点击波形定位播放 · 点击素材条设置截取范围"
}
</p>
</>
) : (
@@ -1,5 +1,6 @@
import { ChevronDown, Eye } from "lucide-react";
import { ChevronDown, Eye, Check, Loader2, Sparkles } from "lucide-react";
import { FloatingStylePreview } from "@/features/home/ui/FloatingStylePreview";
import { SelectPopover } from "@/shared/ui/SelectPopover";
interface SubtitleStyleOption {
id: string;
@@ -34,6 +35,9 @@ interface TitleStyleOption {
interface TitleSubtitlePanelProps {
showStylePreview: boolean;
onTogglePreview: () => void;
onGenerateMeta: () => void;
isGeneratingMeta: boolean;
canGenerateMeta: boolean;
videoTitle: string;
onTitleChange: (value: string) => void;
onTitleCompositionStart?: () => void;
@@ -75,6 +79,9 @@ interface TitleSubtitlePanelProps {
export function TitleSubtitlePanel({
showStylePreview,
onTogglePreview,
onGenerateMeta,
isGeneratingMeta,
canGenerateMeta,
videoTitle,
onTitleChange,
onTitleCompositionStart,
@@ -112,33 +119,100 @@ export function TitleSubtitlePanel({
previewBaseHeight = 1920,
previewBackgroundUrl,
}: TitleSubtitlePanelProps) {
const titleDisplayOptions: Array<{ value: "short" | "persistent"; label: string }> = [
{ value: "short", label: "标题短暂显示" },
{ value: "persistent", label: "标题常驻显示" },
];
const currentTitleDisplay = titleDisplayOptions.find((opt) => opt.value === titleDisplayMode) || titleDisplayOptions[0];
const currentTitleStyle = titleStyles.find((style) => style.id === selectedTitleStyleId) || titleStyles[0] || null;
const currentSecondaryTitleStyle = titleStyles.find((style) => style.id === selectedSecondaryTitleStyleId) || titleStyles[0] || null;
const currentSubtitleStyle = subtitleStyles.find((style) => style.id === selectedSubtitleStyleId) || subtitleStyles[0] || null;
return (
<div className="bg-white/5 rounded-2xl p-4 sm:p-6 border border-white/10 backdrop-blur-sm">
<div className="flex items-center justify-between mb-4 gap-2">
<h2 className="text-base sm:text-lg font-semibold text-white flex items-center gap-2">
</h2>
<div className="flex items-center gap-1.5">
<div className="relative shrink-0">
<select
value={titleDisplayMode}
onChange={(e) => onTitleDisplayModeChange(e.target.value as "short" | "persistent")}
className="appearance-none rounded-lg border border-white/15 bg-black/35 px-2.5 py-1.5 pr-7 text-xs text-gray-200 outline-none transition-colors hover:border-white/25 focus:border-purple-500"
aria-label="标题显示方式"
>
<option value="short"></option>
<option value="persistent"></option>
</select>
<ChevronDown className="pointer-events-none absolute right-2 top-1/2 h-3.5 w-3.5 -translate-y-1/2 text-gray-400" />
</div>
<div className="mb-4 space-y-2">
<div className="flex flex-wrap items-center justify-between gap-2">
<h2 className="text-base sm:text-lg font-semibold text-white flex items-center gap-2">
</h2>
<button
onClick={onTogglePreview}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 flex items-center gap-1"
onClick={onGenerateMeta}
disabled={isGeneratingMeta || !canGenerateMeta}
className={`px-3 py-1.5 text-xs rounded-lg transition-colors inline-flex items-center gap-1.5 ${
isGeneratingMeta || !canGenerateMeta
? "bg-gray-600 cursor-not-allowed text-gray-400"
: "bg-gradient-to-r from-blue-600 to-cyan-600 hover:from-blue-700 hover:to-cyan-700 text-white"
}`}
>
<Eye className="h-3.5 w-3.5" />
{showStylePreview ? "收起预览" : "预览样式"}
{isGeneratingMeta ? (
<>
<Loader2 className="h-3.5 w-3.5 animate-spin" />
...
</>
) : (
<>
<Sparkles className="h-3.5 w-3.5" />
AI生成标题标签
</>
)}
</button>
</div>
<div className="flex justify-end">
<div className="flex flex-wrap items-center justify-end gap-1.5">
<div className="shrink-0">
<SelectPopover
sheetTitle="标题显示方式"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="min-w-[146px] rounded-lg border border-white/10 bg-black/25 px-2.5 py-1.5 text-left text-xs text-gray-200 transition-colors hover:border-white/30"
aria-label="标题显示方式"
>
<span className="flex items-center justify-between gap-2">
<span className="whitespace-nowrap">{currentTitleDisplay.label}</span>
<ChevronDown className={`h-3.5 w-3.5 text-gray-400 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{({ close }) => (
<div className="space-y-1">
{titleDisplayOptions.map((opt) => {
const isSelected = opt.value === titleDisplayMode;
return (
<button
key={opt.value}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onTitleDisplayModeChange(opt.value);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<span className="text-xs text-white whitespace-nowrap">{opt.label}</span>
{isSelected && <Check className="h-3.5 w-3.5 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
</div>
<button
onClick={onTogglePreview}
className="px-2 py-1 text-xs bg-white/10 hover:bg-white/20 rounded text-gray-300 flex items-center gap-1"
>
<Eye className="h-3.5 w-3.5" />
{showStylePreview ? "收起预览" : "预览样式"}
</button>
</div>
</div>
</div>
{showStylePreview && (
@@ -203,17 +277,48 @@ export function TitleSubtitlePanel({
<div className="mb-4 space-y-3">
<div className="flex items-center gap-3">
<label className="text-sm text-gray-300 shrink-0 w-20"></label>
<div className="relative w-1/3 min-w-[100px]">
<select
value={selectedTitleStyleId}
onChange={(e) => onSelectTitleStyle(e.target.value)}
className="w-full appearance-none rounded-lg border border-white/15 bg-black/35 px-3 py-2 pr-8 text-sm text-gray-200 outline-none transition-colors hover:border-white/25 focus:border-purple-500"
<div className="w-1/3 min-w-[130px]">
<SelectPopover
sheetTitle="标题样式"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-lg border border-white/15 bg-black/35 px-3 py-2 text-left text-sm text-gray-200 transition-colors hover:border-white/25"
>
<span className="flex items-center justify-between gap-2">
<span className="truncate">{currentTitleStyle?.label || "请选择"}</span>
<ChevronDown className={`h-3.5 w-3.5 text-gray-400 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{titleStyles.map((style) => (
<option key={style.id} value={style.id}>{style.label}</option>
))}
</select>
<ChevronDown className="pointer-events-none absolute right-2.5 top-1/2 h-3.5 w-3.5 -translate-y-1/2 text-gray-400" />
{({ close }) => (
<div className="space-y-1">
{titleStyles.map((style) => {
const isSelected = selectedTitleStyleId === style.id;
return (
<button
key={style.id}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onSelectTitleStyle(style.id);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<span className="text-sm text-white">{style.label}</span>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
</div>
</div>
<div className="flex items-center gap-3">
@@ -231,17 +336,48 @@ export function TitleSubtitlePanel({
<div className="mb-4 space-y-3">
<div className="flex items-center gap-3">
<label className="text-sm text-gray-300 shrink-0 w-20"></label>
<div className="relative w-1/3 min-w-[100px]">
<select
value={selectedSecondaryTitleStyleId}
onChange={(e) => onSelectSecondaryTitleStyle(e.target.value)}
className="w-full appearance-none rounded-lg border border-white/15 bg-black/35 px-3 py-2 pr-8 text-sm text-gray-200 outline-none transition-colors hover:border-white/25 focus:border-purple-500"
<div className="w-1/3 min-w-[130px]">
<SelectPopover
sheetTitle="副标题样式"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-lg border border-white/15 bg-black/35 px-3 py-2 text-left text-sm text-gray-200 transition-colors hover:border-white/25"
>
<span className="flex items-center justify-between gap-2">
<span className="truncate">{currentSecondaryTitleStyle?.label || "请选择"}</span>
<ChevronDown className={`h-3.5 w-3.5 text-gray-400 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{titleStyles.map((style) => (
<option key={style.id} value={style.id}>{style.label}</option>
))}
</select>
<ChevronDown className="pointer-events-none absolute right-2.5 top-1/2 h-3.5 w-3.5 -translate-y-1/2 text-gray-400" />
{({ close }) => (
<div className="space-y-1">
{titleStyles.map((style) => {
const isSelected = selectedSecondaryTitleStyleId === style.id;
return (
<button
key={style.id}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onSelectSecondaryTitleStyle(style.id);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<span className="text-sm text-white">{style.label}</span>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
</div>
</div>
<div className="flex items-center gap-3">
@@ -259,17 +395,48 @@ export function TitleSubtitlePanel({
<div className="mt-4 space-y-3">
<div className="flex items-center gap-3">
<label className="text-sm text-gray-300 shrink-0 w-20"></label>
<div className="relative w-1/3 min-w-[100px]">
<select
value={selectedSubtitleStyleId}
onChange={(e) => onSelectSubtitleStyle(e.target.value)}
className="w-full appearance-none rounded-lg border border-white/15 bg-black/35 px-3 py-2 pr-8 text-sm text-gray-200 outline-none transition-colors hover:border-white/25 focus:border-purple-500"
<div className="w-1/3 min-w-[130px]">
<SelectPopover
sheetTitle="字幕样式"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-lg border border-white/15 bg-black/35 px-3 py-2 text-left text-sm text-gray-200 transition-colors hover:border-white/25"
>
<span className="flex items-center justify-between gap-2">
<span className="truncate">{currentSubtitleStyle?.label || "请选择"}</span>
<ChevronDown className={`h-3.5 w-3.5 text-gray-400 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{subtitleStyles.map((style) => (
<option key={style.id} value={style.id}>{style.label}</option>
))}
</select>
<ChevronDown className="pointer-events-none absolute right-2.5 top-1/2 h-3.5 w-3.5 -translate-y-1/2 text-gray-400" />
{({ close }) => (
<div className="space-y-1">
{subtitleStyles.map((style) => {
const isSelected = selectedSubtitleStyleId === style.id;
return (
<button
key={style.id}
type="button"
data-popover-selected={isSelected ? "true" : undefined}
onClick={() => {
onSelectSubtitleStyle(style.id);
close();
}}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<span className="text-sm text-white">{style.label}</span>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</button>
);
})}
</div>
)}
</SelectPopover>
</div>
</div>
<div className="flex items-center gap-3">
@@ -1,11 +1,34 @@
import type { ReactNode } from "react";
import { Mic, Volume2 } from "lucide-react";
import { useCallback, useEffect, useRef, useState, type MouseEvent, type ReactNode } from "react";
import { Check, ChevronDown, Loader2, Mic, Pause, Play, Volume2 } from "lucide-react";
import { toast } from "sonner";
import { SelectPopover } from "@/shared/ui/SelectPopover";
interface VoiceOption {
id: string;
name: string;
}
const LOCALE_LABELS: Record<string, string> = {
"zh-CN": "中文",
"en-US": "English",
"ja-JP": "日本語",
"ko-KR": "한국어",
"fr-FR": "Français",
"de-DE": "Deutsch",
"es-ES": "Español",
"ru-RU": "Русский",
"it-IT": "Italiano",
"pt-BR": "Português",
};
const getLocaleFromVoiceId = (voiceId: string) => {
const parts = voiceId.split("-");
if (parts.length >= 2) {
return `${parts[0]}-${parts[1]}`;
}
return voiceId;
};
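The locale lookup above can be exercised on its own. The following is a minimal sketch, assuming voice ids follow the `language-REGION-Name` shape (the sample ids in the usage are assumptions, not values taken from this change); `localeOf` mirrors `getLocaleFromVoiceId`, and `localeLabel` is a hypothetical helper.

```typescript
// Sketch only: label lookup with fallback to the raw locale string.
// LABELS is a trimmed copy of LOCALE_LABELS; localeLabel is hypothetical.
const LABELS: Record<string, string> = {
  "zh-CN": "中文",
  "en-US": "English",
};

function localeOf(voiceId: string): string {
  const parts = voiceId.split("-");
  // Keep the first two dash-separated parts, e.g. "zh-CN".
  return parts.length >= 2 ? `${parts[0]}-${parts[1]}` : voiceId;
}

function localeLabel(voiceId: string): string {
  const locale = localeOf(voiceId);
  // Unknown locales fall back to the locale code itself.
  return LABELS[locale] ?? locale;
}
```

An id without any dash is returned unchanged, so the UI shows the raw id when the language cannot be derived.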
interface VoiceSelectorProps {
ttsMode: "edgetts" | "voiceclone";
onSelectTtsMode: (mode: "edgetts" | "voiceclone") => void;
@@ -25,6 +48,102 @@ export function VoiceSelector({
voiceCloneSlot,
embedded = false,
}: VoiceSelectorProps) {
const selectedVoice = voices.find((v) => v.id === voice) ?? voices[0];
const selectedLocale = selectedVoice ? getLocaleFromVoiceId(selectedVoice.id) : "";
const selectedLangLabel = LOCALE_LABELS[selectedLocale] ?? selectedLocale;
const [previewingVoiceId, setPreviewingVoiceId] = useState<string | null>(null);
const [previewLoadingVoiceId, setPreviewLoadingVoiceId] = useState<string | null>(null);
const previewPlayerRef = useRef<HTMLAudioElement | null>(null);
const previewRequestIdRef = useRef(0);
const stopVoicePreview = useCallback(() => {
previewRequestIdRef.current += 1;
if (previewPlayerRef.current) {
previewPlayerRef.current.pause();
previewPlayerRef.current.src = "";
previewPlayerRef.current.currentTime = 0;
previewPlayerRef.current = null;
}
setPreviewingVoiceId(null);
setPreviewLoadingVoiceId(null);
}, []);
useEffect(() => () => {
stopVoicePreview();
}, [stopVoicePreview]);
useEffect(() => {
if (ttsMode !== "edgetts") {
stopVoicePreview();
}
}, [ttsMode, stopVoicePreview]);
const handleVoicePreview = useCallback(async (voiceId: string, e: MouseEvent<HTMLButtonElement>) => {
e.stopPropagation();
if (previewingVoiceId === voiceId) {
stopVoicePreview();
return;
}
stopVoicePreview();
setPreviewLoadingVoiceId(voiceId);
const requestId = ++previewRequestIdRef.current;
try {
const audioUrl = `/api/videos/voice-preview?voice=${encodeURIComponent(voiceId)}`;
const player = new Audio(audioUrl);
previewPlayerRef.current = player;
let errorNotified = false;
const notifyPreviewError = () => {
if (errorNotified) return;
errorNotified = true;
toast.error("音色试听失败,请稍后重试");
};
player.onplaying = () => {
if (requestId === previewRequestIdRef.current) {
setPreviewLoadingVoiceId(null);
setPreviewingVoiceId(voiceId);
}
};
player.onended = () => {
if (previewPlayerRef.current === player) {
previewPlayerRef.current = null;
setPreviewingVoiceId(null);
setPreviewLoadingVoiceId(null);
}
};
player.onerror = () => {
if (previewPlayerRef.current === player) {
previewPlayerRef.current = null;
setPreviewingVoiceId(null);
setPreviewLoadingVoiceId(null);
notifyPreviewError();
}
};
await player.play();
if (requestId !== previewRequestIdRef.current) {
player.pause();
player.src = "";
player.currentTime = 0;
}
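} catch {
// Skip the toast for superseded requests: pausing a pending play() rejects its promise.
if (requestId === previewRequestIdRef.current) {
toast.error("音色试听失败,请稍后重试");
}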
} finally {
if (requestId === previewRequestIdRef.current) {
setPreviewLoadingVoiceId(null);
}
}
}, [previewingVoiceId, stopVoicePreview]);
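The preview handler relies on a monotonically increasing request id so that callbacks from an older preview cannot overwrite the state of a newer one. The following is a minimal sketch of that pattern outside React, assuming a plain counter in place of `previewRequestIdRef`; the class name `LatestOnly` and its methods are hypothetical.

```typescript
// Sketch only: "latest request wins" guard, as used by the voice preview.
// LatestOnly is a hypothetical name; the component keeps the counter in a ref.
class LatestOnly {
  private current = 0;

  // Start a new request and return its id.
  begin(): number {
    this.current += 1;
    return this.current;
  }

  // Invalidate whatever request is in flight (e.g. on stop or unmount).
  cancel(): void {
    this.current += 1;
  }

  // True only for the most recently started, not yet cancelled request.
  isCurrent(id: number): boolean {
    return id === this.current;
  }
}
```

Each asynchronous callback checks `isCurrent(id)` before touching shared state, which is the role the `requestId === previewRequestIdRef.current` comparisons play in the handler.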
const content = (
<>
<div className="flex gap-2 mb-4">
@@ -51,19 +170,86 @@ export function VoiceSelector({
</div>
{ttsMode === "edgetts" && (
<div className="grid grid-cols-2 gap-3">
{voices.map((v) => (
<button
key={v.id}
onClick={() => onSelectVoice(v.id)}
className={`p-3 rounded-xl border-2 transition-all text-left ${voice === v.id
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<span className="text-white text-sm">{v.name}</span>
</button>
))}
<div className="space-y-2">
<p className="text-xs text-gray-400"></p>
<SelectPopover
sheetTitle="选择声音"
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-xl border border-white/10 bg-black/25 px-3 py-2.5 text-left hover:border-white/30 transition-colors"
>
<span className="flex items-center justify-between gap-3">
<span className="min-w-0">
<span className="block truncate text-sm text-white">
{selectedVoice?.name || "请选择声音"}
</span>
<span className="block text-xs text-gray-400">
{selectedLangLabel || "未识别语言"}
</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{({ close }) => (
<div className="space-y-1">
{voices.map((v) => {
const isSelected = voice === v.id;
const isPreviewing = previewingVoiceId === v.id;
const isPreviewLoading = previewLoadingVoiceId === v.id;
const locale = getLocaleFromVoiceId(v.id);
const langLabel = LOCALE_LABELS[locale] ?? locale;
return (
<div
key={v.id}
data-popover-selected={isSelected ? "true" : undefined}
className={`flex w-full items-center justify-between rounded-lg border px-3 py-2 text-left transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<button
type="button"
onClick={() => {
stopVoicePreview();
onSelectVoice(v.id);
close();
}}
className="min-w-0 flex-1 text-left"
>
<span className="block truncate text-sm text-white">{v.name}</span>
<span className="mt-0.5 block text-xs text-gray-400">{langLabel}</span>
</button>
<div className="flex items-center gap-2 pl-2">
<button
type="button"
onClick={(e) => {
void handleVoicePreview(v.id, e);
}}
className="p-1 text-gray-400 hover:text-purple-300 transition-colors"
title={isPreviewing ? "停止试听" : "试听"}
>
{isPreviewLoading ? (
<Loader2 className="h-4 w-4 animate-spin" />
) : isPreviewing ? (
<Pause className="h-4 w-4" />
) : (
<Play className="h-4 w-4" />
)}
</button>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</div>
</div>
);
})}
</div>
)}
</SelectPopover>
</div>
)}
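The list rows above derive a language label from each voice id through `getLocaleFromVoiceId` and `LOCALE_LABELS`, neither of which is shown in this hunk. The sketch below is one plausible shape for them, assuming voice ids follow the `<lang>-<REGION>-<Name>` pattern (for example `zh-CN-YunxiNeural`). The label table and the fallback behaviour are assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical label table; the real LOCALE_LABELS is not visible in this diff.
const LOCALE_LABELS: Record<string, string> = {
  "zh-CN": "Chinese (Mainland)",
  "en-US": "English (US)",
};

// Hypothetical helper: keeps the first two dash-separated segments as the locale.
function getLocaleFromVoiceId(voiceId: string): string {
  const parts = voiceId.split("-");
  // Fall back to the raw id when it lacks a language and region segment.
  return parts.length >= 2 ? `${parts[0]}-${parts[1]}` : voiceId;
}

// Mirrors the `LOCALE_LABELS[locale] ?? locale` lookup used in the list rows.
function getLanguageLabel(voiceId: string): string {
  const locale = getLocaleFromVoiceId(voiceId);
  return LOCALE_LABELS[locale] ?? locale;
}
```

Falling back to the raw locale keeps unknown voices readable in the list instead of rendering an empty label.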


@@ -0,0 +1,239 @@
import { useCallback, useEffect, useState } from "react";
import api from "@/shared/api/axios";
import { ApiResponse, unwrap } from "@/shared/api/types";
import { toast } from "sonner";
export type ScriptLearningStep = "input" | "analyzing" | "topics" | "generating" | "result";
const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;
const DEFAULT_WORD_COUNT = "300";
interface UseScriptLearningOptions {
isOpen: boolean;
}
interface AnalyzeCreatorPayload {
topics: string[];
analysis_id: string;
fetched_count: number;
}
interface GenerateTopicScriptPayload {
script: string;
}
export const useScriptLearning = ({ isOpen }: UseScriptLearningOptions) => {
const [step, setStep] = useState<ScriptLearningStep>("input");
const [inputUrl, setInputUrl] = useState("");
const [topics, setTopics] = useState<string[]>([]);
const [selectedTopic, setSelectedTopic] = useState<string | null>(null);
const [wordCount, setWordCount] = useState(DEFAULT_WORD_COUNT);
const [generatedScript, setGeneratedScript] = useState("");
const [error, setError] = useState<string | null>(null);
const [analysisId, setAnalysisId] = useState<string | null>(null);
const [fetchedCount, setFetchedCount] = useState(0);
const resetAll = useCallback(() => {
setStep("input");
setInputUrl("");
setTopics([]);
setSelectedTopic(null);
setWordCount(DEFAULT_WORD_COUNT);
setGeneratedScript("");
setError(null);
setAnalysisId(null);
setFetchedCount(0);
}, []);
useEffect(() => {
if (isOpen) {
resetAll();
}
}, [isOpen, resetAll]);
const parseWordCount = useCallback((value: string): number | null => {
const num = Number(value);
if (!Number.isInteger(num)) {
return null;
}
if (num < WORD_COUNT_MIN || num > WORD_COUNT_MAX) {
return null;
}
return num;
}, []);
const handleAnalyze = useCallback(async () => {
const urlValue = inputUrl.trim();
if (!urlValue) {
setError("请先输入博主主页链接");
return;
}
setError(null);
setStep("analyzing");
try {
const { data: res } = await api.post<ApiResponse<AnalyzeCreatorPayload>>(
"/api/tools/analyze-creator",
{ url: urlValue },
{ timeout: 60000 }
);
const payload = unwrap(res);
const topicList = payload.topics || [];
if (topicList.length === 0) {
throw new Error("未识别到可用话题,请更换链接重试");
}
setTopics(topicList);
setSelectedTopic(topicList[0]);
setAnalysisId(payload.analysis_id || null);
setFetchedCount(payload.fetched_count || 0);
setGeneratedScript("");
setStep("topics");
} catch (err: unknown) {
const axiosErr = err as {
response?: { data?: { message?: string } };
message?: string;
};
const msg = axiosErr.response?.data?.message || axiosErr.message || "分析失败,请稍后重试";
setError(msg);
setStep("input");
}
}, [inputUrl]);
const handleGenerate = useCallback(async () => {
if (!analysisId) {
setError("分析结果已失效,请重新分析");
setStep("input");
return;
}
if (!selectedTopic) {
setError("请先选择一个话题");
return;
}
const count = parseWordCount(wordCount.trim());
if (count === null) {
setError(`目标字数需在 ${WORD_COUNT_MIN}-${WORD_COUNT_MAX} 之间`);
return;
}
setError(null);
setStep("generating");
try {
const { data: res } = await api.post<ApiResponse<GenerateTopicScriptPayload>>(
"/api/tools/generate-topic-script",
{
analysis_id: analysisId,
topic: selectedTopic,
word_count: count,
},
{ timeout: 90000 }
);
const payload = unwrap(res);
const script = (payload.script || "").trim();
if (!script) {
throw new Error("生成内容为空,请重试");
}
setGeneratedScript(script);
setStep("result");
} catch (err: unknown) {
const axiosErr = err as {
response?: { data?: { message?: string } };
message?: string;
code?: string;
};
let msg = axiosErr.response?.data?.message || axiosErr.message || "生成失败,请稍后重试";
if (axiosErr.code === "ECONNABORTED" || /timeout/i.test(axiosErr.message || "")) {
msg = "生成超时,请稍后重试(可适当减少目标字数)";
}
setError(msg);
setStep("topics");
}
}, [analysisId, parseWordCount, selectedTopic, wordCount]);
const handleRegenerate = useCallback(async () => {
await handleGenerate();
}, [handleGenerate]);
const backToInput = useCallback(() => {
setError(null);
setStep("input");
}, []);
const backToTopics = useCallback(() => {
setError(null);
setStep("topics");
}, []);
const fallbackCopyTextToClipboard = useCallback((text: string) => {
const textArea = document.createElement("textarea");
textArea.value = text;
textArea.style.top = "0";
textArea.style.left = "0";
textArea.style.position = "fixed";
textArea.style.opacity = "0";
document.body.appendChild(textArea);
textArea.focus();
textArea.select();
try {
const successful = document.execCommand("copy");
if (successful) {
toast.success("已复制到剪贴板");
} else {
toast.error("复制失败,请手动复制");
}
} catch {
toast.error("复制失败,请手动复制");
}
document.body.removeChild(textArea);
}, []);
const copyToClipboard = useCallback(
(text: string) => {
if (navigator.clipboard && window.isSecureContext) {
navigator.clipboard
.writeText(text)
.then(() => {
toast.success("已复制到剪贴板");
})
.catch(() => {
fallbackCopyTextToClipboard(text);
});
} else {
fallbackCopyTextToClipboard(text);
}
},
[fallbackCopyTextToClipboard]
);
return {
step,
inputUrl,
setInputUrl,
topics,
selectedTopic,
setSelectedTopic,
wordCount,
setWordCount,
generatedScript,
error,
analysisId,
fetchedCount,
handleAnalyze,
handleGenerate,
handleRegenerate,
backToInput,
backToTopics,
resetAll,
copyToClipboard,
};
};
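The word-count validation in `parseWordCount` can be exercised on its own. The sketch below lifts the same logic into a plain function (no React) so its edge cases are easy to see; the constants are copied from the hook, and the standalone function is only an illustration.

```typescript
const WORD_COUNT_MIN = 80;
const WORD_COUNT_MAX = 1000;

// Same checks as the hook: must be an integer and fall inside [MIN, MAX].
function parseWordCount(value: string): number | null {
  const num = Number(value);
  if (!Number.isInteger(num)) {
    return null; // rejects "abc" (NaN) and "300.5"
  }
  if (num < WORD_COUNT_MIN || num > WORD_COUNT_MAX) {
    return null; // also rejects "" because Number("") is 0
  }
  return num;
}
```

Because `Number()` does the parsing, inputs such as `"1e2"` are accepted as 100. If only plain digits should pass, an extra check like `/^\d+$/.test(value)` before the conversion would be one way to tighten it.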


@@ -7,6 +7,7 @@ import { clampTitle } from "@/shared/lib/title";
import { useTitleInput } from "@/shared/hooks/useTitleInput";
import { useAuth } from "@/shared/contexts/AuthContext";
import { useTask } from "@/shared/contexts/TaskContext";
import { useCleanup } from "@/shared/contexts/CleanupContext";
import { toast } from "sonner";
import { usePublishPrefetch } from "@/shared/hooks/usePublishPrefetch";
import {
@@ -40,6 +41,7 @@ export const usePublishController = () => {
const { userId, isLoading: isAuthLoading } = useAuth();
const { isGenerating } = useTask();
const { triggerCleanup } = useCleanup();
const prevIsGenerating = useRef(isGenerating);
const { readPrefetch, updatePrefetch } = usePublishPrefetch();
@@ -183,6 +185,23 @@ export const usePublishController = () => {
window.scrollTo({ top: 0, left: 0, behavior: "auto" });
}, []);
// ---- Workspace-cleared event (reset this page's input state after cleanup) ----
useEffect(() => {
if (typeof window === "undefined") return;
const handleWorkspaceCleared = (event: Event) => {
const detail = (event as CustomEvent<{ userId?: string }>).detail;
if (!detail?.userId || detail.userId !== userId) return;
setTitle("");
setTags("");
setPublishResults([]);
};
window.addEventListener("vigent:workspace-cleared", handleWorkspaceCleared);
return () => window.removeEventListener("vigent:workspace-cleared", handleWorkspaceCleared);
}, [userId]);
// ---- Guard against accidental actions while publishing ----
useEffect(() => {
if (!isPublishing) return;
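The `vigent:workspace-cleared` listener in the preceding hunk only resets the page when the event's `detail.userId` matches the signed-in user. The guard can be isolated as a pure predicate, sketched below. The function name and the `WorkspaceClearedDetail` type are hypothetical; the original inlines this check in the event handler.

```typescript
// Shape of the CustomEvent detail, as used by the listener in the hunk above.
interface WorkspaceClearedDetail {
  userId?: string;
}

// Hypothetical helper: true only when the event targets the current user.
function isClearedForCurrentUser(
  detail: WorkspaceClearedDetail | null | undefined,
  currentUserId: string | null
): boolean {
  if (!detail?.userId) return false; // missing detail or empty user id: ignore the event
  return detail.userId === currentUserId;
}
```

Ignoring events without a user id means a malformed or foreign event cannot wipe the title, tags, or publish results of the current session.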
@@ -231,6 +250,29 @@ export const usePublishController = () => {
// ---- Action handlers ----
const runWithConcurrency = async <T,>(
taskFactories: Array<() => Promise<T>>,
concurrency: number
): Promise<T[]> => {
if (taskFactories.length === 0) return [];
const results: T[] = new Array(taskFactories.length);
let nextIndex = 0;
const worker = async () => {
while (true) {
const currentIndex = nextIndex;
nextIndex += 1;
if (currentIndex >= taskFactories.length) return;
results[currentIndex] = await taskFactories[currentIndex]();
}
};
const workerCount = Math.min(Math.max(concurrency, 1), taskFactories.length);
await Promise.all(Array.from({ length: workerCount }, () => worker()));
return results;
};
const togglePlatform = (platform: string) => {
if (selectedPlatforms.includes(platform)) {
setSelectedPlatforms(selectedPlatforms.filter((p) => p !== platform));
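A self-contained sketch of the worker-pool pattern behind `runWithConcurrency`, using only the standard `Promise` API. The `delay` helper, the `active`/`peak` counters, and `demo` are illustrative additions that are not part of the original hook; they exist only to show that results keep their input order and that no more than `concurrency` tasks run at once.

```typescript
// Same shape as in the hunk above: each worker keeps pulling the next index until none remain.
async function runWithConcurrency<T>(
  taskFactories: Array<() => Promise<T>>,
  concurrency: number
): Promise<T[]> {
  if (taskFactories.length === 0) return [];
  const results: T[] = new Array(taskFactories.length);
  let nextIndex = 0;
  const worker = async (): Promise<void> => {
    while (true) {
      const currentIndex = nextIndex;
      nextIndex += 1;
      if (currentIndex >= taskFactories.length) return;
      // Writing by index keeps results aligned with the input order.
      results[currentIndex] = await taskFactories[currentIndex]();
    }
  };
  const workerCount = Math.min(Math.max(concurrency, 1), taskFactories.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical helpers used only for this demonstration.
const delay = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));
let active = 0;
let peak = 0;

const makeTask = (label: string, ms: number) => async (): Promise<string> => {
  active += 1;
  peak = Math.max(peak, active); // track the highest number of tasks in flight
  await delay(ms);
  active -= 1;
  return label;
};

async function demo(): Promise<{ labels: string[]; peak: number }> {
  const labels = await runWithConcurrency(
    [makeTask("douyin", 30), makeTask("weixin", 10), makeTask("bilibili", 10)],
    2
  );
  return { labels, peak };
}
```

Passing factories (`() => Promise<T>`) rather than already-started promises is what makes the limit effective: a task only starts when a worker calls its factory.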
@@ -252,7 +294,8 @@ export const usePublishController = () => {
setIsPublishing(true);
setPublishResults([]);
const tagList = tags.split(/[,\s]+/).filter((t) => t.trim());
for (const platform of selectedPlatforms) {
const publishOnePlatform = async (platform: string): Promise<PublishResult> => {
try {
const { data: res } = await api.post<ApiResponse<any>>("/api/publish", {
video_path: video.path, platform, title, tags: tagList, description: "",
@@ -260,19 +303,31 @@ export const usePublishController = () => {
const result = unwrap(res);
const screenshotUrl = typeof result.screenshot_url === "string"
? resolveMediaUrl(result.screenshot_url) || result.screenshot_url : undefined;
setPublishResults((prev) => [...prev, {
return {
platform: result.platform || platform,
success: Boolean(result.success),
message: result.message || "",
url: result.url,
screenshot_url: screenshotUrl,
}]);
};
} catch (error: any) {
const message = error.response?.data?.message || String(error);
setPublishResults((prev) => [...prev, { platform, success: false, message }]);
return { platform, success: false, message };
}
};
try {
const taskFactories = selectedPlatforms.map((platform) => () => publishOnePlatform(platform));
const results = await runWithConcurrency(taskFactories, 2);
const allSuccess = results.length > 0 && results.every(r => r.success);
if (allSuccess) {
triggerCleanup(results, video.id);
} else {
setPublishResults(results);
}
} finally {
setIsPublishing(false);
}
setIsPublishing(false);
};
const handleLogin = async (platform: string) => {
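The `allSuccess` check in the hunk above pairs `results.length > 0` with `every(...)`. A minimal sketch of why both parts matter follows; the `PublishResult` shape is trimmed to the fields used here and is an assumption, since the real type is imported from `@/shared/types/publish` and is not shown.

```typescript
// Assumed minimal shape; the real PublishResult also carries url and screenshot_url.
interface PublishResult {
  platform: string;
  success: boolean;
  message: string;
}

// Array.prototype.every returns true for an empty array, so the length check
// keeps an empty result list from counting as "all platforms succeeded".
function isAllSuccess(results: PublishResult[]): boolean {
  return results.length > 0 && results.every((r) => r.success);
}
```

With this guard, the cleanup modal is triggered only when at least one platform was attempted and none failed; any failure falls back to showing the per-platform results.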


@@ -4,9 +4,13 @@ import Link from "next/link";
import Image from "next/image";
import VideoPreviewModal from "@/components/VideoPreviewModal";
import AccountSettingsDropdown from "@/components/AccountSettingsDropdown";
import { SelectPopover } from "@/shared/ui/SelectPopover";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
import { usePublishController } from "@/features/publish/model/usePublishController";
import {
ArrowLeft,
Check,
ChevronDown,
RotateCcw,
LogOut,
QrCode,
@@ -18,6 +22,7 @@ import {
export function PublishPage() {
const {
accounts,
videos,
isAccountsLoading,
isVideosLoading,
selectedVideo,
@@ -47,6 +52,8 @@ export function PublishPage() {
closeQrModal,
} = usePublishController();
const selectedVideoItem = videos.find((v) => v.id === selectedVideo) || null;
return (
<div className="min-h-dvh">
<VideoPreviewModal
@@ -56,51 +63,69 @@ export function PublishPage() {
/>
{/* QR code modal */}
{qrPlatform && (
<div className="fixed inset-0 bg-black/80 flex items-center justify-center z-50">
<div className="bg-white rounded-2xl p-8 max-w-md min-w-[320px]">
<h2 className="text-2xl font-bold mb-4 text-center">🔐 {qrPlatform}</h2>
<AppModal
isOpen={Boolean(qrPlatform)}
onClose={closeQrModal}
panelClassName="w-full max-w-md rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden"
closeOnOverlay
>
<AppModalHeader
title={`🔐 扫码登录 ${qrPlatform}`}
subtitle="请使用手机扫码完成登录验证"
icon={<QrCode className="h-5 w-5 text-purple-300" />}
onClose={closeQrModal}
/>
<div className="p-5 space-y-4">
{isLoadingQR ? (
<div className="flex flex-col items-center py-8">
<div className="animate-spin w-16 h-16 border-4 border-purple-500 border-t-transparent rounded-full" />
<p className="text-gray-600 mt-4">...</p>
<Loader2 className="h-14 w-14 animate-spin text-purple-400" />
<p className="text-gray-300 mt-4">...</p>
</div>
) : faceVerifyQr ? (
<>
<Image
src={`data:image/png;base64,${faceVerifyQr}`}
alt="Face Verify QR"
width={400}
height={300}
className="w-full h-auto rounded-lg"
unoptimized
/>
<p className="text-center text-orange-600 font-medium mt-4">
APP扫描上方二维码完成刷脸验证
<div className="space-y-3">
<div className="mx-auto w-fit rounded-xl border border-white/10 bg-white p-2 shadow-[0_10px_30px_rgba(0,0,0,0.35)]">
<Image
src={`data:image/png;base64,${faceVerifyQr}`}
alt="Face Verify QR"
width={400}
height={300}
className="h-auto w-[min(82vw,400px)] border border-black/5"
unoptimized
/>
</div>
<p className="text-center text-amber-300 text-sm font-medium">
APP
</p>
</>
</div>
) : qrCodeImage ? (
<>
<Image
src={`data:image/png;base64,${qrCodeImage}`}
alt="QR Code"
width={280}
height={280}
className="w-full h-auto"
unoptimized
/>
<p className="text-center text-gray-600 mt-4">
使
</p>
</>
) : null}
<div className="space-y-3">
<div className="mx-auto w-fit rounded-xl border border-white/10 bg-white p-3 shadow-[0_10px_30px_rgba(0,0,0,0.35)]">
<Image
src={`data:image/png;base64,${qrCodeImage}`}
alt="QR Code"
width={300}
height={300}
className="h-auto w-[min(74vw,300px)] border border-black/5"
unoptimized
/>
</div>
<p className="text-center text-gray-300 text-sm">使</p>
</div>
) : (
<div className="rounded-xl border border-red-500/30 bg-red-500/10 px-4 py-3 text-sm text-red-200">
</div>
)}
<button
onClick={closeQrModal}
className="w-full mt-4 px-4 py-2 bg-gray-200 rounded-lg hover:bg-gray-300"
className="w-full px-4 py-2.5 bg-white/10 hover:bg-white/20 text-white rounded-lg transition-colors"
>
</button>
</div>
</div>
</AppModal>
)}
{/* Header - unified style */}
@@ -227,76 +252,112 @@ export function PublishPage() {
{/* Select video */}
<div className="bg-white/5 rounded-2xl p-6 border border-white/10 backdrop-blur-sm">
<h2 className="text-lg font-semibold text-white mb-4"></h2>
<div className="flex items-center gap-3 mb-4">
<Search className="text-gray-400 w-4 h-4" />
<input
type="text"
value={videoFilter}
onChange={(e) => setVideoFilter(e.target.value)}
placeholder="搜索视频名称..."
className="flex-1 bg-black/30 border border-white/10 rounded-lg px-3 py-2 text-sm text-white placeholder-gray-500 focus:outline-none focus:border-purple-500"
/>
</div>
{isVideosLoading ? (
<div className="space-y-2">
{Array.from({ length: 2 }).map((_, index) => (
<div
key={`video-skeleton-${index}`}
className="p-3 rounded-lg border border-white/10 bg-white/5 animate-pulse"
>
<div className="h-4 w-40 bg-white/10 rounded" />
<div className="h-3 w-24 bg-white/5 rounded mt-2" />
</div>
))}
</div>
) : filteredVideos.length === 0 ? (
<div className="text-center py-8 text-gray-400">
</div>
) : (
<div className="space-y-2 max-h-64 overflow-y-auto hide-scrollbar" style={{ contentVisibility: "auto" }}>
{filteredVideos.map((v) => (
<div
key={v.id}
onClick={() => setSelectedVideo(v.id)}
className={`p-3 rounded-lg border transition-all flex items-center justify-between group cursor-pointer ${selectedVideo === v.id
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<div className="flex flex-col">
<span className="text-sm text-white">{v.name}</span>
</div>
<div className="flex items-center gap-2 pl-2">
<button
onClick={(e) => {
e.stopPropagation();
handlePreviewVideo(v.id);
}}
onMouseEnter={() => {
const src = v.path.startsWith("/") ? v.path : `/${v.path}`;
const prefetch = document.createElement("link");
prefetch.rel = "preload";
prefetch.as = "video";
prefetch.href = src;
document.head.appendChild(prefetch);
setTimeout(() => prefetch.remove(), 2000);
}}
className="p-1 text-gray-500 hover:text-purple-400 transition-colors"
title="预览"
>
<Eye className="h-4 w-4" />
</button>
{selectedVideo === v.id && (
<span className="text-xs text-purple-300"></span>
)}
<SelectPopover
sheetTitle="选择发布作品"
onOpen={() => setVideoFilter("")}
trigger={({ open, toggle }) => (
<button
type="button"
onClick={toggle}
className="w-full rounded-xl border border-white/10 bg-black/25 px-3 py-2.5 text-left transition-colors hover:border-white/30"
>
<span className="flex items-center justify-between gap-3">
<span className="min-w-0">
<span className="block text-xs text-gray-400"></span>
<span className="mt-0.5 block truncate text-sm text-white">
{selectedVideoItem?.name || (isVideosLoading ? "正在加载作品..." : "请选择发布作品")}
</span>
</span>
<ChevronDown className={`h-4 w-4 text-gray-300 transition-transform ${open ? "rotate-180" : ""}`} />
</span>
</button>
)}
>
{({ close }) => (
<div className="space-y-2">
<div className="rounded-lg border border-white/10 bg-black/30 px-3 py-2">
<div className="flex items-center gap-2">
<Search className="h-4 w-4 text-gray-400" />
<input
type="text"
value={videoFilter}
onChange={(e) => setVideoFilter(e.target.value)}
placeholder="搜索视频名称..."
className="w-full bg-transparent text-sm text-white placeholder-gray-500 outline-none"
/>
</div>
</div>
))}
</div>
)}
{isVideosLoading ? (
<div className="space-y-2 p-1">
{Array.from({ length: 2 }).map((_, index) => (
<div
key={`video-skeleton-${index}`}
className="p-3 rounded-lg border border-white/10 bg-white/5 animate-pulse"
>
<div className="h-4 w-40 bg-white/10 rounded" />
</div>
))}
</div>
) : filteredVideos.length === 0 ? (
<div className="py-8 text-center text-sm text-gray-400">
</div>
) : (
<div className="space-y-1 pb-1" style={{ contentVisibility: "auto" }}>
{filteredVideos.map((v) => {
const isSelected = selectedVideo === v.id;
return (
<div
key={v.id}
data-popover-selected={isSelected ? "true" : undefined}
className={`flex items-center gap-2 rounded-lg border px-3 py-2 transition-colors ${isSelected
? "border-purple-500 bg-purple-500/20"
: "border-white/10 bg-white/5 hover:border-white/30"
}`}
>
<button
type="button"
onClick={() => {
setSelectedVideo(v.id);
close();
}}
className="min-w-0 flex-1 text-left"
>
<span className="block truncate text-sm text-white">{v.name}</span>
</button>
<button
type="button"
onClick={(e) => {
e.stopPropagation();
handlePreviewVideo(v.id);
}}
onMouseEnter={() => {
const src = v.path.startsWith("/") ? v.path : `/${v.path}`;
const prefetch = document.createElement("link");
prefetch.rel = "preload";
prefetch.as = "video";
prefetch.href = src;
document.head.appendChild(prefetch);
setTimeout(() => prefetch.remove(), 2000);
}}
className="p-1 text-gray-400 hover:text-purple-300"
title="预览"
>
<Eye className="h-4 w-4" />
</button>
{isSelected && <Check className="h-4 w-4 text-purple-300" />}
</div>
);
})}
</div>
)}
</div>
)}
</SelectPopover>
</div>
{/* Fill in details */}


@@ -0,0 +1,414 @@
"use client";
import {
createContext,
useContext,
useState,
useEffect,
useCallback,
type ReactNode,
} from "react";
import Image from "next/image";
import { AppModal, AppModalHeader } from "@/shared/ui/AppModal";
import { useAuth } from "@/shared/contexts/AuthContext";
import api from "@/shared/api/axios";
import type { ApiResponse } from "@/shared/api/types";
import { Download, Trash2, Loader2, CheckCircle2 } from "lucide-react";
import type { PublishResult } from "@/shared/types/publish";
/* ────────── types ────────── */
const CLEANUP_EXPIRE_MS = 24 * 60 * 60 * 1000; // 24h
const MAX_FAIL_BEFORE_SKIP = 3;
interface CleanupState {
required: boolean;
publishResults: PublishResult[];
videoId?: string;
createdAt?: number; // timestamp for expiry check
failCount?: number;
}
interface CleanupContextType {
triggerCleanup: (results: PublishResult[], videoId?: string) => void;
}
const EMPTY_STATE: CleanupState = { required: false, publishResults: [] };
const CleanupContext = createContext<CleanupContextType>({
triggerCleanup: () => {},
});
/* ────────── helpers ────────── */
function storageKey(userId: string) {
return `vigent_${userId}_cleanup_pending`;
}
function normalizeVideoId(value: unknown): string | undefined {
if (typeof value !== "string") return undefined;
const raw = value.trim();
if (!raw) return undefined;
const decoded = (() => {
try {
return decodeURIComponent(raw);
} catch {
return raw;
}
})();
const routeMatch = decoded.match(/\/generated\/([^/?#]+)\/download/i);
if (routeMatch?.[1]) return routeMatch[1];
const outputMatch = decoded.match(/\/([^/?#]+_output)\.mp4(?:[?#]|$)/i);
if (outputMatch?.[1]) return outputMatch[1];
if (!decoded.includes("/") && !decoded.includes(".") && !decoded.includes("?")) {
return decoded;
}
return undefined;
}
function readPersistedState(userId: string): CleanupState {
try {
const raw = localStorage.getItem(storageKey(userId));
if (!raw) return EMPTY_STATE;
const parsed = JSON.parse(raw) as CleanupState;
const normalized: CleanupState = {
required: Boolean(parsed.required),
publishResults: Array.isArray(parsed.publishResults) ? parsed.publishResults : [],
videoId: normalizeVideoId(parsed.videoId)
|| normalizeVideoId((parsed as unknown as Record<string, unknown>).videoDownloadUrl),
createdAt: typeof parsed.createdAt === "number" ? parsed.createdAt : Date.now(),
failCount: typeof parsed.failCount === "number" && parsed.failCount > 0 ? parsed.failCount : 0,
};
if (!normalized.required) return EMPTY_STATE;
// 24h expiry check
if (normalized.createdAt && Date.now() - normalized.createdAt > CLEANUP_EXPIRE_MS) {
localStorage.removeItem(storageKey(userId));
return EMPTY_STATE;
}
return normalized;
} catch {
return EMPTY_STATE;
}
}
function persistState(userId: string, state: CleanupState) {
localStorage.setItem(storageKey(userId), JSON.stringify(state));
}
function clearPersistedState(userId: string) {
localStorage.removeItem(storageKey(userId));
}
/* ────────── localStorage keys to clear ────────── */
function clearWorkspaceLocalStorage(userId: string) {
const key = userId;
const keysToRemove = [
// home page content
`vigent_${key}_text`,
`vigent_${key}_title`,
`vigent_${key}_secondaryTitle`,
// publish page
`vigent_${key}_publish_title`,
`vigent_${key}_publish_tags`,
];
keysToRemove.forEach((k) => localStorage.removeItem(k));
}
/* ────────── platform icons ────────── */
const platformIcons: Record<string, { src: string; alt: string }> = {
douyin: { src: "/platforms/douyin.svg", alt: "抖音" },
weixin: { src: "/platforms/wechat.svg", alt: "微信视频号" },
bilibili: { src: "/platforms/bilibili.svg", alt: "B站" },
xiaohongshu: { src: "/platforms/xiaohongshu.svg", alt: "小红书" },
};
/* ────────── CleanupModal ────────── */
function CleanupModal({
isOpen,
publishResults,
videoId,
cleanupError,
failCount,
onCleanup,
onSkip,
}: {
isOpen: boolean;
publishResults: PublishResult[];
videoId?: string;
cleanupError?: string | null;
failCount: number;
onCleanup: () => Promise<void>;
onSkip: () => void;
}) {
const [isCleaning, setIsCleaning] = useState(false);
const handleCleanup = async () => {
setIsCleaning(true);
try {
await onCleanup();
} catch {
// keep modal open for retry
} finally {
setIsCleaning(false);
}
};
const canSkip = failCount >= MAX_FAIL_BEFORE_SKIP;
return (
<AppModal
isOpen={isOpen}
onClose={() => {}}
closeOnOverlay={false}
zIndexClassName="z-[300]"
panelClassName="w-full max-w-lg rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden max-h-[90vh] flex flex-col"
>
<AppModalHeader
title="发布完成"
subtitle="所有平台发布成功"
icon={<CheckCircle2 className="h-5 w-5 text-green-400" />}
/>
<div className="p-5 space-y-4 overflow-y-auto flex-1">
{/* Success results */}
<div className="space-y-2">
{publishResults.map((r, i) => (
<div
key={i}
className="flex items-center gap-2 p-3 rounded-xl border border-green-500/30 bg-green-500/10"
>
{platformIcons[r.platform] ? (
<Image
src={platformIcons[r.platform].src}
alt={platformIcons[r.platform].alt}
width={20}
height={20}
className="h-5 w-5"
/>
) : (
<span className="text-lg">🌐</span>
)}
<span className="text-green-400 font-medium text-sm">
{platformIcons[r.platform]?.alt || r.platform} -
</span>
</div>
))}
</div>
{/* Download button */}
{videoId && (
<a
href={`/api/videos/generated/${encodeURIComponent(videoId)}/download`}
download
className="flex items-center justify-center gap-2 w-full py-3 rounded-xl border border-blue-500/30 bg-blue-500/10 text-blue-300 hover:bg-blue-500/20 transition-colors text-sm font-medium"
>
<Download className="h-4 w-4" />
</a>
)}
{cleanupError && (
<div className="rounded-xl border border-red-500/30 bg-red-500/10 px-3 py-2 text-xs text-red-300">
{cleanupError}
</div>
)}
{/* Cleanup button */}
<button
onClick={handleCleanup}
disabled={isCleaning}
className="flex items-center justify-center gap-2 w-full py-3 rounded-xl bg-gradient-to-r from-purple-600 to-pink-600 text-white font-semibold hover:from-purple-500 hover:to-pink-500 transition-all disabled:opacity-60"
>
{isCleaning ? (
<>
<Loader2 className="h-4 w-4 animate-spin" />
...
</>
) : (
<>
<Trash2 className="h-4 w-4" />
&amp;
</>
)}
</button>
{canSkip && (
<button
onClick={onSkip}
disabled={isCleaning}
className="flex items-center justify-center w-full py-2.5 rounded-xl border border-white/10 bg-white/5 text-gray-400 hover:bg-white/10 hover:text-gray-300 transition-colors text-sm disabled:opacity-50 disabled:cursor-not-allowed"
>
使
</button>
)}
<p className="text-xs text-gray-400 text-center leading-relaxed">
便
<br />
</p>
{/* Screenshots */}
{publishResults.some((r) => r.screenshot_url) && (
<div className="pt-2 border-t border-white/10">
<p className="text-xs text-gray-400 mb-3"></p>
<div className="grid grid-cols-1 sm:grid-cols-2 gap-3">
{publishResults
.filter((r) => r.screenshot_url)
.map((r, i) => (
<div key={i} className="space-y-1">
<p className="text-xs text-gray-500">
{platformIcons[r.platform]?.alt || r.platform}
</p>
<a
href={r.screenshot_url}
target="_blank"
rel="noreferrer"
className="block rounded-lg border border-white/10 bg-black/20 overflow-hidden"
>
<Image
src={r.screenshot_url!}
alt={`${r.platform} 截图`}
width={400}
height={300}
className="w-full"
unoptimized
/>
</a>
</div>
))}
</div>
</div>
)}
</div>
</AppModal>
);
}
/* ────────── Provider ────────── */
export function CleanupProvider({ children }: { children: ReactNode }) {
const { userId, isLoading: isAuthLoading } = useAuth();
const [cleanupState, setCleanupState] = useState<CleanupState>(EMPTY_STATE);
const [cleanupError, setCleanupError] = useState<string | null>(null);
// Restore from localStorage on mount / reset on user switch
useEffect(() => {
if (isAuthLoading) return;
if (!userId) {
setCleanupState(EMPTY_STATE);
setCleanupError(null);
return;
}
const persisted = readPersistedState(userId);
if (persisted.required) {
persistState(userId, persisted);
setCleanupState(persisted);
} else {
setCleanupState(EMPTY_STATE);
}
setCleanupError(null);
}, [isAuthLoading, userId]);
const triggerCleanup = useCallback(
(results: PublishResult[], videoId?: string) => {
if (!userId) return;
setCleanupError(null);
const state: CleanupState = {
required: true,
publishResults: results,
videoId,
createdAt: Date.now(),
failCount: 0,
};
persistState(userId, state);
setCleanupState(state);
},
[userId]
);
const executeCleanup = useCallback(async () => {
if (!userId) return;
setCleanupError(null);
// 1. Call backend to delete files
try {
const { data: res } = await api.post<ApiResponse<{ videos_deleted: number; audios_deleted: number }>>(
"/api/videos/cleanup"
);
if (!res.success) {
throw new Error(res.message || "服务端清理失败");
}
} catch (e) {
console.error("Cleanup API failed:", e);
const err = e as { response?: { data?: { message?: string; detail?: string } }; message?: string };
const message = err.response?.data?.message || err.response?.data?.detail || err.message || "请稍后重试";
setCleanupError(message);
setCleanupState((prev) => {
if (!prev.required) return prev;
const next: CleanupState = {
...prev,
failCount: (prev.failCount || 0) + 1,
createdAt: prev.createdAt || Date.now(),
};
persistState(userId, next);
return next;
});
throw e;
}
// 2. Clear workspace localStorage keys
clearWorkspaceLocalStorage(userId);
if (typeof window !== "undefined") {
window.dispatchEvent(
new CustomEvent("vigent:workspace-cleared", { detail: { userId } })
);
}
// 3. Clear cleanup pending state
clearPersistedState(userId);
setCleanupState(EMPTY_STATE);
setCleanupError(null);
}, [userId]);
// Skip: close modal and clear cleanup_pending immediately (user chose to skip)
const handleSkip = useCallback(() => {
if (!userId) return;
clearPersistedState(userId);
setCleanupState(EMPTY_STATE);
setCleanupError(null);
}, [userId]);
return (
<CleanupContext.Provider value={{ triggerCleanup }}>
{children}
<CleanupModal
isOpen={cleanupState.required}
publishResults={cleanupState.publishResults}
videoId={cleanupState.videoId}
cleanupError={cleanupError}
failCount={cleanupState.failCount || 0}
onCleanup={executeCleanup}
onSkip={handleSkip}
/>
</CleanupContext.Provider>
);
}
export function useCleanup() {
return useContext(CleanupContext);
}
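Two rules in the cleanup provider are easy to miss when reading the component: a pending cleanup expires after 24 hours, and the skip button appears only after three failed cleanup attempts. The sketch below restates both as pure functions. `PendingCleanup`, `isExpired`, `canSkip`, and `recordFailure` are hypothetical names for illustration; the original keeps this logic inline in `readPersistedState`, `executeCleanup`, and `CleanupModal`.

```typescript
const CLEANUP_EXPIRE_MS = 24 * 60 * 60 * 1000; // 24h, as in the diff
const MAX_FAIL_BEFORE_SKIP = 3;

// Trimmed view of CleanupState with only the fields these rules need.
interface PendingCleanup {
  createdAt: number;
  failCount: number;
}

// `now` is injected so the check can be exercised without a real clock.
function isExpired(state: PendingCleanup, now: number): boolean {
  return now - state.createdAt > CLEANUP_EXPIRE_MS;
}

// The skip button is offered once the failure count reaches the threshold.
function canSkip(state: PendingCleanup): boolean {
  return state.failCount >= MAX_FAIL_BEFORE_SKIP;
}

// Returns a new state instead of mutating, matching the setState updater style.
function recordFailure(state: PendingCleanup): PendingCleanup {
  return { ...state, failCount: state.failCount + 1 };
}
```

The expiry comparison is strict, so a state that is exactly 24 hours old is still treated as pending.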


@@ -0,0 +1,10 @@
export interface InsertSegment {
id: string;
materialId: string;
materialName: string;
start: number;
end: number;
sourceStart: number;
sourceEnd: number;
color: string;
}


@@ -0,0 +1,139 @@
"use client";
import { useEffect, useRef, type ReactNode } from "react";
import { createPortal } from "react-dom";
import { X } from "lucide-react";
interface AppModalProps {
isOpen: boolean;
onClose: () => void;
children: ReactNode;
zIndexClassName?: string;
panelClassName?: string;
closeOnOverlay?: boolean;
closeOnEsc?: boolean;
lockBodyScroll?: boolean;
}
export function AppModal({
isOpen,
onClose,
children,
zIndexClassName = "z-[220]",
panelClassName = "w-full max-w-2xl rounded-2xl border border-white/10 bg-[#171821]/95 shadow-[0_24px_80px_rgba(0,0,0,0.55)] overflow-hidden",
closeOnOverlay = true,
closeOnEsc = true,
lockBodyScroll = true,
}: AppModalProps) {
const containerRef = useRef<HTMLDivElement | null>(null);
const onCloseRef = useRef(onClose);
useEffect(() => {
onCloseRef.current = onClose;
}, [onClose]);
useEffect(() => {
if (!isOpen) return;
const handleEsc = (event: KeyboardEvent) => {
if (closeOnEsc && event.key === "Escape") onCloseRef.current();
};
const previousActiveElement = document.activeElement as HTMLElement | null;
if (lockBodyScroll) {
const openCount = Number(document.body.dataset.appModalOpenCount ?? "0");
if (openCount === 0) {
document.body.dataset.appModalPrevOverflow = document.body.style.overflow;
document.body.style.overflow = "hidden";
}
document.body.dataset.appModalOpenCount = String(openCount + 1);
}
document.addEventListener("keydown", handleEsc);
requestAnimationFrame(() => containerRef.current?.focus());
return () => {
document.removeEventListener("keydown", handleEsc);
if (lockBodyScroll) {
const openCount = Number(document.body.dataset.appModalOpenCount ?? "0");
const nextCount = Math.max(0, openCount - 1);
if (nextCount === 0) {
document.body.style.overflow = document.body.dataset.appModalPrevOverflow ?? "";
delete document.body.dataset.appModalPrevOverflow;
delete document.body.dataset.appModalOpenCount;
} else {
document.body.dataset.appModalOpenCount = String(nextCount);
}
}
previousActiveElement?.focus?.();
};
}, [closeOnEsc, isOpen, lockBodyScroll]);
if (!isOpen || typeof document === "undefined") return null;
return createPortal(
<div
ref={containerRef}
role="dialog"
aria-modal="true"
tabIndex={-1}
className={`fixed inset-0 ${zIndexClassName} flex items-center justify-center bg-black/80 backdrop-blur-sm p-4 animate-in fade-in duration-200`}
onClick={closeOnOverlay ? onClose : undefined}
>
<div className={panelClassName} onClick={(event) => event.stopPropagation()}>
{children}
</div>
</div>,
document.body
);
}
interface AppModalHeaderProps {
title: ReactNode;
subtitle?: ReactNode;
icon?: ReactNode;
onClose?: () => void;
actions?: ReactNode;
}
export function AppModalHeader({
title,
subtitle,
icon,
onClose,
actions,
}: AppModalHeaderProps) {
return (
<div className="flex items-center justify-between gap-3 border-b border-white/10 bg-gradient-to-r from-white/[0.08] via-white/[0.03] to-white/[0.08] px-4 py-3">
<div className="min-w-0 flex items-center gap-3">
{icon ? (
<div className="h-9 w-9 rounded-lg bg-white/10 text-white flex items-center justify-center">
{icon}
</div>
) : null}
<div className="min-w-0">
<h3 className="truncate text-base font-semibold text-white">{title}</h3>
{subtitle ? <p className="mt-0.5 text-xs text-gray-400">{subtitle}</p> : null}
</div>
</div>
<div className="flex items-center gap-2">
{actions}
{onClose ? (
<button
type="button"
onClick={onClose}
aria-label="关闭弹窗"
className="p-2 text-gray-400 hover:text-white hover:bg-white/10 rounded-lg transition-colors"
>
<X className="h-5 w-5" />
</button>
) : null}
</div>
</div>
);
}

@@ -0,0 +1,233 @@
"use client";
import { type ReactNode, useEffect, useRef, useState } from "react";
import { createPortal } from "react-dom";
interface SelectPopoverTriggerContext {
open: boolean;
isMobile: boolean;
toggle: () => void;
close: () => void;
}
interface SelectPopoverPanelContext {
isMobile: boolean;
close: () => void;
}
interface SelectPopoverProps {
trigger: (ctx: SelectPopoverTriggerContext) => ReactNode;
children: (ctx: SelectPopoverPanelContext) => ReactNode;
sheetTitle?: string;
disabled?: boolean;
panelClassName?: string;
onOpen?: () => void;
}
const MOBILE_QUERY = "(max-width: 639px)";
export function SelectPopover({
trigger,
children,
sheetTitle,
disabled = false,
panelClassName = "",
onOpen,
}: SelectPopoverProps) {
type DesktopRect = {
left: number;
top: number;
width: number;
maxHeight: number;
direction: "up" | "down";
};
const containerRef = useRef<HTMLDivElement | null>(null);
const panelRef = useRef<HTMLDivElement | null>(null);
const desktopScrollRef = useRef<HTMLDivElement | null>(null);
const mobileScrollRef = useRef<HTMLDivElement | null>(null);
const [open, setOpen] = useState(false);
const [isMobile, setIsMobile] = useState(false);
const [desktopRect, setDesktopRect] = useState<DesktopRect | null>(null);
const isOpen = open && !disabled;
const canUseDOM = typeof window !== "undefined" && typeof document !== "undefined";
useEffect(() => {
if (typeof window === "undefined") return;
const mq = window.matchMedia(MOBILE_QUERY);
const handleChange = () => setIsMobile(mq.matches);
handleChange();
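// Prefer the standard event API; older Safari versions only expose the
// deprecated addListener/removeListener pair used as the fallback below.
if (mq.addEventListener) {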
mq.addEventListener("change", handleChange);
return () => mq.removeEventListener("change", handleChange);
}
mq.addListener(handleChange);
return () => mq.removeListener(handleChange);
}, []);
useEffect(() => {
if (!isOpen || isMobile) return;
const handlePointerDown = (event: MouseEvent) => {
// Ignore outside clicks while an element marked as an open video preview is present.
if (canUseDOM && document.querySelector("[data-video-preview-open='true']")) {
return;
}
const target = event.target as Node;
const clickedTrigger = containerRef.current?.contains(target) ?? false;
const clickedPanel = panelRef.current?.contains(target) ?? false;
if (!clickedTrigger && !clickedPanel) {
setOpen(false);
}
};
document.addEventListener("mousedown", handlePointerDown);
return () => document.removeEventListener("mousedown", handlePointerDown);
}, [isOpen, isMobile, canUseDOM]);
useEffect(() => {
if (!isOpen) return;
const handleKeyDown = (event: KeyboardEvent) => {
const previewOpen = canUseDOM && Boolean(document.querySelector("[data-video-preview-open='true']"));
if (event.key === "Escape" && !previewOpen) {
setOpen(false);
}
};
document.addEventListener("keydown", handleKeyDown);
return () => document.removeEventListener("keydown", handleKeyDown);
}, [isOpen, canUseDOM]);
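// Notify the caller when the popover opens. `onOpen` is a dependency, so callers
// should pass a stable (memoized) callback; otherwise this effect runs again on
// every render while the popover stays open.
useEffect(() => {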
if (isOpen) {
onOpen?.();
}
}, [isOpen, onOpen]);
useEffect(() => {
if (!isOpen || !canUseDOM) return;
let raf1 = 0;
let raf2 = 0;
const scrollSelectedIntoView = () => {
const container = isMobile ? mobileScrollRef.current : desktopScrollRef.current;
if (!container) return;
const selectedEl = container.querySelector<HTMLElement>(
"[data-popover-selected='true'], [aria-selected='true']",
);
selectedEl?.scrollIntoView({ block: "nearest", behavior: "auto" });
};
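// Defer by two animation frames so the portal content has a chance to mount
// and lay out before the selected option is scrolled into view.
raf1 = window.requestAnimationFrame(() => {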
raf2 = window.requestAnimationFrame(scrollSelectedIntoView);
});
return () => {
if (raf1) window.cancelAnimationFrame(raf1);
if (raf2) window.cancelAnimationFrame(raf2);
};
}, [isOpen, isMobile, canUseDOM]);
useEffect(() => {
if (!isOpen || isMobile || !canUseDOM) return;
const updateDesktopRect = () => {
const triggerEl = containerRef.current;
if (!triggerEl) return;
const viewportPadding = 8;
const gap = 8;
const preferredMaxHeight = 352;
const rect = triggerEl.getBoundingClientRect();
const width = rect.width;
const maxLeft = Math.max(viewportPadding, window.innerWidth - width - viewportPadding);
const left = Math.min(Math.max(viewportPadding, rect.left), maxLeft);
const spaceBelow = window.innerHeight - rect.bottom - gap - viewportPadding;
const spaceAbove = rect.top - gap - viewportPadding;
const openUp = spaceBelow < 220 && spaceAbove > spaceBelow;
const direction: "up" | "down" = openUp ? "up" : "down";
const chosenSpace = openUp ? spaceAbove : spaceBelow;
const maxHeight = Math.max(120, Math.min(preferredMaxHeight, Math.floor(chosenSpace)));
const top = openUp
? Math.max(viewportPadding, rect.top - gap)
: Math.min(rect.bottom + gap, window.innerHeight - viewportPadding);
setDesktopRect({ left, top, width, maxHeight, direction });
};
updateDesktopRect();
window.addEventListener("resize", updateDesktopRect);
window.addEventListener("scroll", updateDesktopRect, true);
return () => {
window.removeEventListener("resize", updateDesktopRect);
window.removeEventListener("scroll", updateDesktopRect, true);
};
}, [isOpen, isMobile, canUseDOM]);
const close = () => setOpen(false);
const toggle = () => {
if (disabled) return;
setOpen((prev) => !prev);
};
const desktopPanel = canUseDOM && isOpen && !isMobile && desktopRect
? createPortal(
<div
ref={panelRef}
className={`fixed z-[260] overflow-hidden rounded-2xl border border-white/20 bg-[#130f20]/95 backdrop-blur-md shadow-[0_20px_48px_rgba(8,10,20,0.5)] ${panelClassName}`}
style={{
left: desktopRect.left,
top: desktopRect.top,
width: desktopRect.width,
transform: desktopRect.direction === "up" ? "translateY(-100%)" : undefined,
}}
role="dialog"
aria-modal="false"
>
<div ref={desktopScrollRef} className="hide-scrollbar overflow-y-auto p-2" style={{ maxHeight: desktopRect.maxHeight }}>
{children({ isMobile: false, close })}
</div>
</div>,
document.body,
)
: null;
const mobileSheet = canUseDOM && isOpen && isMobile
? createPortal(
<div
className="fixed inset-0 z-[220] bg-black/60"
onMouseDown={close}
role="dialog"
aria-modal="true"
>
<div
className="fixed inset-x-0 bottom-0 max-h-[78dvh] overflow-hidden rounded-t-3xl border-t border-white/20 bg-[#130f20]/95"
onMouseDown={(e) => e.stopPropagation()}
>
<div className="mx-auto mt-2 h-1.5 w-12 rounded-full bg-white/20" />
{sheetTitle && (
<div className="px-5 pt-3 pb-2 text-sm font-medium text-gray-300">{sheetTitle}</div>
)}
<div ref={mobileScrollRef} className="hide-scrollbar max-h-[calc(78dvh-56px)] overflow-y-auto p-3">{children({ isMobile: true, close })}</div>
</div>
</div>,
document.body,
)
: null;
return (
<div className="relative" ref={containerRef}>
{trigger({ open: isOpen, isMobile, toggle, close })}
{desktopPanel}
{mobileSheet}
</div>
);
}
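// Illustrative sketch only (not part of the original change): the placement
// arithmetic from `updateDesktopRect`, extracted as a pure function so it can
// be checked without a DOM. `TriggerBox`, `ViewportSize`, `PopoverPlacement`
// and `computePopoverPlacement` are hypothetical names; the constants are
// copied from the effect above.
interface TriggerBox {
  left: number;
  top: number;
  bottom: number;
  width: number;
}
interface ViewportSize {
  width: number;
  height: number;
}
interface PopoverPlacement {
  left: number;
  top: number;
  width: number;
  maxHeight: number;
  direction: "up" | "down";
}
function computePopoverPlacement(trigger: TriggerBox, viewport: ViewportSize): PopoverPlacement {
  const viewportPadding = 8;
  const gap = 8;
  const preferredMaxHeight = 352;
  // Clamp horizontally so the panel stays inside the padded viewport.
  const maxLeft = Math.max(viewportPadding, viewport.width - trigger.width - viewportPadding);
  const left = Math.min(Math.max(viewportPadding, trigger.left), maxLeft);
  // Open upwards only when the space below is tight and there is more room above.
  const spaceBelow = viewport.height - trigger.bottom - gap - viewportPadding;
  const spaceAbove = trigger.top - gap - viewportPadding;
  const openUp = spaceBelow < 220 && spaceAbove > spaceBelow;
  const chosenSpace = openUp ? spaceAbove : spaceBelow;
  const maxHeight = Math.max(120, Math.min(preferredMaxHeight, Math.floor(chosenSpace)));
  // When opening upwards, `top` is the panel's bottom edge; the component
  // shifts the panel with translateY(-100%).
  const top = openUp
    ? Math.max(viewportPadding, trigger.top - gap)
    : Math.min(trigger.bottom + gap, viewport.height - viewportPadding);
  return { left, top, width: trigger.width, maxHeight, direction: openUp ? "up" : "down" };
}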