Compare commits

5 commits

| Author | SHA1 | Date |
|---|---|---|
| | a7f98c3893 | |
| | fbc5cf49d8 | |
| | b336692144 | |
| | 122c07b28e | |
| | baf9e235a1 | |

**CHANGELOG.md** (90 lines changed)
@@ -1,90 +0,0 @@
# Changelog

This document records all notable changes to the project.

The format is based on [Keep a Changelog](https://keepachangelog.com/zh-CN/1.0.0/),
and version numbers follow [Semantic Versioning](https://semver.org/lang/zh-CN/).

## [Unreleased]

### Added
- Initial open-source release
- Complete GitHub documentation (README, CONTRIBUTING, LICENSE, etc.)
- Docker support
- Environment variable configuration template

### Changed
- Improved README document structure
- Improved code comments

## [1.0.0] - 2025-01-XX

### Added
- 🚶 Tactile-paving navigation system
  - Real-time tactile-paving detection and segmentation
  - Intelligent voice guidance
  - Obstacle detection and avoidance
  - Sharp-turn detection and alerts
  - Optical-flow stabilization algorithm

- 🚦 Street-crossing assistance
  - Crosswalk recognition and direction detection
  - Traffic-light color recognition
  - Alignment guidance system
  - Safety alerts

- 🔍 Object recognition and finding
  - YOLO-E open-vocabulary detection
  - MediaPipe hand guidance
  - Real-time target tracking
  - Grasp-action detection

- 🎙️ Real-time voice interaction
  - Alibaba Cloud Paraformer ASR
  - Qwen-Omni-Turbo multimodal dialogue
  - Intelligent command parsing
  - Context awareness

- 📹 Video and audio processing
  - WebSocket real-time streaming
  - Synchronized audio/video recording
  - IMU data fusion
  - Multi-channel audio mixing

- 🎨 Visualization and interaction
  - Real-time web monitoring interface
  - IMU 3D visualization
  - Status panel
  - Chinese-friendly UI

### Tech stack
- FastAPI + WebSocket
- YOLO11 / YOLO-E
- MediaPipe
- PyTorch + CUDA
- OpenCV
- DashScope API

### Known issues
- [ ] May stutter on low-end GPUs
- [ ] No GPU acceleration on macOS
- [ ] Some Chinese fonts render incorrectly on Linux

---

## Versioning

### Major
- Incompatible API changes

### Minor
- Backward-compatible new features

### Patch
- Backward-compatible bug fixes

---

[Unreleased]: https://github.com/yourusername/aiglass/compare/v1.0.0...HEAD
[1.0.0]: https://github.com/yourusername/aiglass/releases/tag/v1.0.0
**PROJECT_STRUCTURE.md** (402 lines changed)

@@ -1,402 +0,0 @@
# Project Structure

This document describes the project's directory layout and the purpose of its main files.

## 📁 Directory Layout

```
rebuild1002/
├── 📄 Main application files
│   ├── app_main.py                  # Main application entry (FastAPI service)
│   ├── navigation_master.py         # Navigation orchestrator (state machine)
│   ├── workflow_blindpath.py        # Tactile-paving navigation workflow
│   ├── workflow_crossstreet.py      # Street-crossing navigation workflow
│   └── yolomedia.py                 # Object-finding workflow
│
├── 🎙️ Voice processing
│   ├── asr_core.py                  # Speech-recognition core
│   ├── omni_client.py               # Qwen-Omni client
│   ├── qwen_extractor.py            # Label extraction (Chinese -> English)
│   ├── audio_player.py              # Audio player
│   └── audio_stream.py              # Audio stream management
│
├── 🤖 Model-related code
│   ├── yoloe_backend.py             # YOLO-E backend (open vocabulary)
│   ├── trafficlight_detection.py    # Traffic-light detection
│   ├── obstacle_detector_client.py  # Obstacle-detection client
│   └── models.py                    # Model definitions
│
├── 🎥 Video processing
│   ├── bridge_io.py                 # Thread-safe frame buffer
│   ├── sync_recorder.py             # Synchronized A/V recording
│   └── video_recorder.py            # Video recording (legacy)
│
├── 🌐 Web frontend
│   ├── templates/
│   │   └── index.html               # Main UI HTML
│   ├── static/
│   │   ├── main.js                  # Main JS script
│   │   ├── vision.js                # Vision-stream handling
│   │   ├── visualizer.js            # Data visualization
│   │   ├── vision_renderer.js       # Renderer
│   │   ├── vision.css               # Stylesheet
│   │   └── models/                  # 3D models (IMU visualization)
│
├── 🎵 Audio assets
│   ├── music/                       # System prompt sounds
│   │   ├── converted_向上.wav
│   │   ├── converted_向下.wav
│   │   └── ...
│   └── voice/                       # Pre-recorded speech
│       ├── voice_mapping.json
│       └── *.wav
│
├── 🧠 Model files
│   └── model/
│       ├── yolo-seg.pt              # Tactile-paving segmentation model
│       ├── yoloe-11l-seg.pt         # YOLO-E open-vocabulary model
│       ├── shoppingbest5.pt         # Object-recognition model
│       ├── trafficlight.pt          # Traffic-light detection model
│       └── hand_landmarker.task     # MediaPipe hand model
│
├── 📹 Recordings
│   └── recordings/                  # Automatically saved video and audio
│       ├── video_*.avi
│       └── audio_*.wav
│
├── 🛠️ ESP32 firmware
│   └── compile/
│       ├── compile.ino              # Arduino main program
│       ├── camera_pins.h            # Camera pin definitions
│       ├── ICM42688.cpp/h           # IMU driver
│       └── ESP32_VIDEO_OPTIMIZATION.md
│
├── 🧪 Tests
│   ├── test_recorder.py             # Recording tests
│   ├── test_traffic_light.py        # Traffic-light detection tests
│   ├── test_cross_street_blindpath.py # Navigation tests
│   └── test_crosswalk_awareness.py  # Crosswalk detection tests
│
├── 📚 Documentation
│   ├── README.md                    # Main project document
│   ├── INSTALLATION.md              # Installation guide
│   ├── CONTRIBUTING.md              # Contribution guide
│   ├── FAQ.md                       # Frequently asked questions
│   ├── CHANGELOG.md                 # Changelog
│   ├── SECURITY.md                  # Security policy
│   └── PROJECT_STRUCTURE.md         # This file
│
├── 🐳 Docker
│   ├── Dockerfile                   # Docker image definition
│   ├── docker-compose.yml           # Docker Compose configuration
│   └── .dockerignore                # Docker ignore file
│
├── ⚙️ Configuration
│   ├── .env.example                 # Environment variable template
│   ├── .gitignore                   # Git ignore file
│   ├── requirements.txt             # Python dependencies
│   ├── setup.sh                     # Linux/macOS setup script
│   └── setup.bat                    # Windows setup script
│
├── 📄 License
│   └── LICENSE                      # MIT license
│
└── 🔧 GitHub
    └── .github/
        ├── ISSUE_TEMPLATE/
        │   ├── bug_report.md
        │   └── feature_request.md
        └── pull_request_template.md
```
## 🔑 Core File Notes

### Main application layer

#### `app_main.py`
- **Role**: FastAPI main service; handles all WebSocket connections
- **Main responsibilities**:
  - WebSocket route management (/ws/camera, /ws_audio, /ws/viewer, etc.)
  - Model loading and initialization
  - State coordination and management
  - Audio/video stream distribution
- **Depends on**: all other modules
- **Entry point**: `python app_main.py`

#### `navigation_master.py`
- **Role**: navigation orchestrator; manages the system-wide state machine
- **Main states**:
  - IDLE: idle
  - CHAT: conversation mode
  - BLINDPATH_NAV: tactile-paving navigation
  - CROSSING: street crossing
  - TRAFFIC_LIGHT_DETECTION: traffic-light detection
  - ITEM_SEARCH: object finding
- **Core methods**:
  - `process_frame()`: processes each frame
  - `start_blind_path_navigation()`: starts tactile-paving navigation
  - `start_crossing()`: starts street-crossing mode
  - `on_voice_command()`: handles voice commands
### Workflow modules

#### `workflow_blindpath.py`
- **Role**: core tactile-paving navigation logic
- **Main responsibilities**:
  - Tactile-paving segmentation and detection
  - Obstacle detection
  - Turn detection
  - Optical-flow stabilization
  - Direction-guidance generation
- **State machine**:
  - ONBOARDING: getting onto the tactile paving
  - NAVIGATING: navigating
  - MANEUVERING_TURN: turning
  - AVOIDING_OBSTACLE: avoiding an obstacle

#### `workflow_crossstreet.py`
- **Role**: street-crossing navigation logic
- **Main responsibilities**:
  - Crosswalk detection
  - Direction alignment
  - Guidance generation
- **Core methods**:
  - `_is_crosswalk_near()`: decides whether a crosswalk is close
  - `_compute_angle_and_offset()`: computes angle and offset

#### `yolomedia.py`
- **Role**: object-finding workflow
- **Main responsibilities**:
  - YOLO-E text-prompt detection
  - MediaPipe hand tracking
  - Optical-flow target tracking
  - Hand guidance (direction prompts)
  - Grasp-action detection
- **Modes**:
  - SEGMENT: detection mode
  - FLASH: blink confirmation
  - CENTER_GUIDE: centering guidance
  - TRACK: hand tracking
### Voice modules

#### `asr_core.py`
- **Role**: real-time speech recognition via Alibaba Cloud Paraformer ASR
- **Main responsibilities**:
  - Real-time speech recognition
  - VAD (voice activity detection)
  - Recognition-result callbacks
- **Key class**: `ASRCallback`

#### `omni_client.py`
- **Role**: Qwen-Omni-Turbo multimodal dialogue client
- **Main responsibilities**:
  - Streaming dialogue generation
  - Image + text input
  - Speech output
- **Core function**: `stream_chat()`

#### `audio_player.py`
- **Role**: unified audio playback management
- **Main responsibilities**:
  - TTS speech playback
  - Multi-channel audio mixing
  - Volume control
  - Thread-safe playback
- **Core functions**: `play_voice_text()`, `play_audio_threadsafe()`
### Model backends

#### `yoloe_backend.py`
- **Role**: YOLO-E open-vocabulary detection backend
- **Main responsibilities**:
  - Text-prompt configuration
  - Real-time segmentation
  - Target tracking
- **Core class**: `YoloEBackend`

#### `trafficlight_detection.py`
- **Role**: traffic-light detection module
- **Detection methods**:
  1. YOLO model detection
  2. HSV color classification (fallback)
- **Output**: red / green / yellow / unknown

#### `obstacle_detector_client.py`
- **Role**: obstacle-detection client
- **Main responsibilities**:
  - Whitelist class filtering
  - Detection restricted to the path mask
  - Object attribute computation (area, position, danger level)
### Video processing

#### `bridge_io.py`
- **Role**: thread-safe frame buffering and distribution
- **Main responsibilities**:
  - Producer-consumer pattern
  - Raw-frame caching
  - Processed-frame distribution
- **Core functions**:
  - `push_raw_jpeg()`: receives ESP32 frames
  - `wait_raw_bgr()`: fetches a raw frame
  - `send_vis_bgr()`: sends a processed frame
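The buffer described above can be sketched as a "latest frame wins" structure: the producer overwrites the stored frame, and consumers block until something newer than what they last saw arrives. The names here (`FrameBridge`, `push_raw`, `wait_raw`) are illustrative stand-ins for `push_raw_jpeg()` / `wait_raw_bgr()`, not the project's actual API:

```python
import threading

class FrameBridge:
    """Minimal sketch of a thread-safe latest-frame buffer (assumed design)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._raw = None   # latest raw frame
        self._seq = 0      # incremented on every push

    def push_raw(self, frame):
        """Producer: overwrite the latest frame; consumers never see a backlog."""
        with self._cond:
            self._raw = frame
            self._seq += 1
            self._cond.notify_all()

    def wait_raw(self, last_seq=0, timeout=1.0):
        """Consumer: block until a frame newer than last_seq arrives."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._seq > last_seq, timeout):
                return None, last_seq  # timed out: nothing newer
            return self._raw, self._seq

bridge = FrameBridge()
bridge.push_raw(b"jpeg-1")
bridge.push_raw(b"jpeg-2")        # overwrites: only the newest frame is kept
frame, seq = bridge.wait_raw(0)   # returns the newest frame and its sequence number
```

Overwriting rather than queueing keeps latency bounded: a slow consumer skips frames instead of falling behind.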
#### `sync_recorder.py`
- **Role**: synchronized audio/video recording
- **Main responsibilities**:
  - Records video and audio in sync
  - Automatic (timestamped) file naming
  - Thread safety
- **Output**: `recordings/video_*.avi`, `audio_*.wav`
### Frontend

#### `templates/index.html`
- **Role**: web monitoring interface
- **Main areas**:
  - Video-stream display
  - Status panel
  - IMU 3D visualization
  - Speech-recognition results

#### `static/main.js`
- **Role**: main JavaScript logic
- **Main responsibilities**:
  - WebSocket connection management
  - UI updates
  - Event handling

#### `static/vision.js`
- **Role**: vision-stream handling
- **Main responsibilities**:
  - Receiving video frames over WebSocket
  - Canvas rendering
  - FPS calculation

#### `static/visualizer.js`
- **Role**: IMU 3D visualization (Three.js)
- **Main responsibilities**:
  - Receiving IMU data
  - Real-time rendering of device orientation
  - Dynamic lighting effects
## 🔄 Data Flows

### Video stream
```
ESP32-CAM
  → [JPEG] WebSocket /ws/camera
  → bridge_io.push_raw_jpeg()
  → yolomedia / navigation_master
  → bridge_io.send_vis_bgr()
  → [JPEG] WebSocket /ws/viewer
  → Browser Canvas
```
### Audio stream (upstream)
```
ESP32-MIC
  → [PCM16] WebSocket /ws_audio
  → asr_core
  → DashScope ASR
  → recognition result
  → start_ai_with_text_custom()
```

### Audio stream (downstream)
```
Qwen-Omni / TTS
  → audio_player
  → [PCM16] audio_stream
  → [WAV] HTTP /stream.wav
  → ESP32-Speaker
```

### IMU data stream
```
ESP32-IMU
  → [JSON] UDP 12345
  → process_imu_and_maybe_store()
  → [JSON] WebSocket /ws
  → visualizer.js (Three.js)
```
## 🎯 Key Design Patterns

### 1. State machine
- **Where**: `navigation_master.py`
- **Purpose**: manages system state transitions
- **States**: IDLE → CHAT / BLINDPATH_NAV / CROSSING / ...
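A toy version of such a transition table, using voice commands that appear in `NAV_CONTROL_WHITELIST` elsewhere in this compare; the real transition logic in `navigation_master.py` is considerably richer (cooldowns, per-workflow resets, and so on), so this is only a sketch of the pattern:

```python
# Illustrative states and transitions, not the project's full state machine.
IDLE, CHAT, BLINDPATH_NAV, CROSSING = "IDLE", "CHAT", "BLINDPATH_NAV", "CROSSING"

TRANSITIONS = {
    (IDLE, "开始导航"): BLINDPATH_NAV,
    (BLINDPATH_NAV, "开始过马路"): CROSSING,
    (CROSSING, "过马路结束"): BLINDPATH_NAV,
    (BLINDPATH_NAV, "停止导航"): CHAT,
}

def on_voice_command(state: str, command: str) -> str:
    """Return the next state, staying in the current one on unknown commands."""
    return TRANSITIONS.get((state, command), state)

state = on_voice_command(IDLE, "开始导航")     # -> BLINDPATH_NAV
state = on_voice_command(state, "开始过马路")  # -> CROSSING
```

Encoding transitions as a dictionary keeps the valid moves auditable in one place, which is the main benefit of the pattern here.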
### 2. Producer-consumer
- **Where**: `bridge_io.py`
- **Purpose**: decouples video reception from processing
- **Implementation**: threads + queues

### 3. Strategy
- **Where**: the `workflow_*.py` modules
- **Purpose**: implementations of different navigation strategies
- **Implementation**: a shared `process_frame()` interface

### 4. Singleton
- **Where**: model loading
- **Purpose**: globally shared model instances
- **Implementation**: global variables + initialization checks

### 5. Observer
- **Where**: WebSocket communication
- **Purpose**: lets multiple clients subscribe to the video stream
- **Implementation**: `camera_viewers: Set[WebSocket]`
## 📦 Dependencies

```
app_main.py
├── navigation_master.py
│   ├── workflow_blindpath.py
│   │   ├── yoloe_backend.py
│   │   └── obstacle_detector_client.py
│   ├── workflow_crossstreet.py
│   └── trafficlight_detection.py
├── yolomedia.py
│   └── yoloe_backend.py
├── asr_core.py
├── omni_client.py
├── audio_player.py
├── audio_stream.py
├── bridge_io.py
└── sync_recorder.py
```
## 🚀 Startup Flow

1. **Initialization** (`app_main.py`)
   - Load environment variables
   - Load navigation models (YOLO, MediaPipe)
   - Initialize the audio system
   - Start the recording system
   - Preload the traffic-light model

2. **Service startup** (FastAPI)
   - Register WebSocket routes
   - Mount static files
   - Start the UDP listener (IMU)
   - Start the HTTP service (port 8081)

3. **Running**
   - Wait for the ESP32 to connect
   - Receive video/audio/IMU data
   - Handle user voice commands
   - Push processed results in real time

4. **Shutdown**
   - Stop recording (save files)
   - Close all WebSocket connections
   - Release model resources
   - Clean up temporary files
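The four phases above map naturally onto an async context manager of the shape FastAPI accepts as its `lifespan` argument: setup before the `yield`, serving while suspended, teardown in `finally`. The loader functions here are hypothetical placeholders, not the project's real initializers:

```python
from contextlib import asynccontextmanager
import asyncio

# Hypothetical stand-ins for the real loaders in app_main.py.
async def load_models():
    return {"yolo": "loaded"}

async def start_recorder():
    return "recorder"

@asynccontextmanager
async def lifespan(app=None):
    """Init/serve/shutdown phases as one context manager."""
    ctx = {}
    ctx["models"] = await load_models()        # phase 1: initialization
    ctx["recorder"] = await start_recorder()
    try:
        yield ctx                              # phase 2-3: routes serve requests
    finally:
        ctx.clear()                            # phase 4: release resources

async def demo():
    async with lifespan() as ctx:
        return ctx["models"]["yolo"]

print(asyncio.run(demo()))
```

With FastAPI this would be wired up as `FastAPI(lifespan=lifespan)`, guaranteeing the shutdown steps run even if serving raises.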
---

**Tip**: for details of a given module's implementation, see the comments and docstrings in its source file.
@@ -109,7 +109,7 @@ pip install -r requirements.txt

| Model | Purpose |
|------|------|
| `yolo-seg.pt` | Tactile-paving segmentation |
| `yoloe-11l-seg.pt` | Obstacle / open-vocabulary detection |
| `yolo11l-seg-indoor14.pt` | Indoor guidance (14 classes) |
| `yolo11l-seg-indoor.engine` | Indoor guidance (20 classes) |
| `SenseVoiceSmall/` | Speech recognition |

> Model downloads: https://www.modelscope.cn/models/archifancy/AIGlasses_for_navigation
**app_main.py** (755 lines changed)

File diff suppressed because it is too large.
```diff
@@ -63,7 +63,10 @@ INTERRUPT_KEYWORDS = set(
 NAV_CONTROL_WHITELIST = [
     "停止导航", "结束导航", "停止检测", "停止红绿灯",
     "开始导航", "盲道导航", "开始过马路", "过马路结束",
-    "帮我导航", "帮我过马路"
+    "帮我导航", "帮我过马路",
+    "室内导航", "室内导盲", "四内导航", "思维导航", "失内导航", "时内导航",  # Day 28: indoor navigation + homophone mis-recognitions
+    "室类导航", "类导航",  # Day 28: newly observed mis-recognitions
+    "退出导航", "关闭导航", "别导了", "别念了", "停止",  # Day 28: stronger stop commands
 ]

@@ -371,9 +371,9 @@ class CompressedAudioCache:

-        # Log the compression ratio
-        compression_ratio = len(compressed) / self._original_sizes[filepath]
-        logger.info(f"[压缩] {os.path.basename(filepath)}: "
-                    f"{self._original_sizes[filepath]} -> {len(compressed)} bytes "
-                    f"({compression_ratio:.1%})")
+        # compression_ratio = len(compressed) / self._original_sizes[filepath]
+        # logger.info(f"[压缩] {os.path.basename(filepath)}: "
+        #             f"{self._original_sizes[filepath]} -> {len(compressed)} bytes "
+        #             f"({compression_ratio:.1%})")

         return compressed
```
**audio_player.py** (107 lines changed)

```diff
@@ -8,6 +8,7 @@ import asyncio
 import threading
 import queue
 import time
+import hashlib
 from audio_stream import broadcast_pcm16_realtime
 from audio_compressor import compressed_audio_cache, AudioCompressor

@@ -36,6 +37,9 @@ AUDIO_BASE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "music
 VOICE_DIR = os.getenv("VOICE_DIR", os.path.join(os.path.dirname(os.path.abspath(__file__)), "voice"))
 VOICE_MAP_FILE = os.path.join(VOICE_DIR, "map.zh-CN.json")

+# Day 26 optimization: on-disk cache directory for EdgeTTS-synthesized speech
+TTS_CACHE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "voice", "tts_cache")
+
 # Audio-file mapping (voice mappings are merged in later)
 AUDIO_MAP = {
     "检测到物体": os.path.join(AUDIO_BASE_DIR, "音频1.wav"),

@@ -100,7 +104,7 @@ def load_wav_file(filepath):
     if framerate != 16000:
         import audioop
         frames, _ = audioop.ratecv(frames, sampwidth, 1, framerate, 16000, None)
-        print(f"[AUDIO] 重采样: {filepath} {framerate}Hz -> 16000Hz")
+        # print(f"[AUDIO] 重采样: {filepath} {framerate}Hz -> 16000Hz")

     _audio_cache[filepath] = frames
     return frames

@@ -129,7 +133,8 @@ def _merge_voice_map():
                 added += 1
             else:
                 print(f"[AUDIO] 映射文件缺失: {fpath}")
-        print(f"[AUDIO] 已合并 voice 映射 {added} 条")
+        if added > 0:
+            print(f"[AUDIO] 已合并 voice 映射 {added} 条")
     except Exception as e:
         print(f"[AUDIO] 读取 voice 映射失败: {e}")

@@ -220,6 +225,14 @@ async def _broadcast_audio_optimized(pcm_data: bytes):

         # Note: recording is done centrally in broadcast_pcm16_realtime to avoid duplication

+        # Day 28: pause VAD globally while playing, so the system does not hear itself.
+        # This is critical on systems without echo cancellation (AEC); otherwise the
+        # navigation prompts would trigger VAD, be mistaken for user speech, and keep
+        # the recognition channel occupied.
+        from server_vad import get_server_vad
+        vad = get_server_vad()
+        if vad:
+            vad.set_tts_playing(True)
+
         # A single call; pacing (a 20 ms cadence) is handled inside broadcast_pcm16_realtime
         await broadcast_pcm16_realtime(full_audio)

@@ -227,6 +240,12 @@ async def _broadcast_audio_optimized(pcm_data: bytes):
     except Exception as e:
         print(f"[AUDIO] 广播音频失败: {e}")
     finally:
+        # Resume VAD detection
+        from server_vad import get_server_vad
+        vad = get_server_vad()
+        if vad:
+            vad.set_tts_playing(False)
+
         # Clear the playing flag
         with _playing_lock:
             _is_playing = False

@@ -250,13 +269,14 @@ def initialize_audio_system():
     # Show compression statistics
     if os.getenv("AIGLASS_COMPRESS_AUDIO", "1") == "1":
         stats = compressed_audio_cache.get_compression_stats()
-        print(f"[AUDIO] 音频压缩统计:")
-        print(f"  - 文件数: {stats['files_cached']}")
-        print(f"  - 原始大小: {stats['total_original_size'] / 1024:.1f} KB")
-        print(f"  - 压缩后: {stats['total_compressed_size'] / 1024:.1f} KB")
-        print(f"  - 压缩率: {stats['compression_ratio']:.1%}")
-        print(f"  - 节省: {stats['bytes_saved'] / 1024:.1f} KB")
+        # print(f"[AUDIO] 音频压缩统计:")
+        # print(f"  - 文件数: {stats['files_cached']}")
+        # print(f"  - 原始大小: {stats['total_original_size'] / 1024:.1f} KB")
+        # print(f"  - 压缩后: {stats['total_compressed_size'] / 1024:.1f} KB")
+        # print(f"  - 压缩率: {stats['compression_ratio']:.1%}")
+        # print(f"  - 节省: {stats['bytes_saved'] / 1024:.1f} KB")
+        pass

     print("[AUDIO] 音频系统初始化完成(预加载+工作线程)")

@@ -385,8 +405,73 @@ def play_voice_text(text: str):
         _last_voice_time = current_time
         return

-    # No match: log it (useful for debugging)
-    print(f"[AUDIO] 未找到匹配语音: {text}")
+    # No match: fall back to streaming synthesis with EdgeTTS (Day 26)
+    print(f"[AUDIO] 未找到本地语音,尝试 EdgeTTS 合成: {text}")
+
+    # Launch a background task for synthesis and playback.
+    # Note: use create_task so the main thread is not blocked.
+    try:
+        loop = asyncio.get_event_loop()
+        loop.create_task(_synthesize_and_play_fallback(text))
+    except RuntimeError:
+        # If the current thread has no loop (e.g. a non-async context), a thread
+        # could be used instead; app_main is async, so this should not happen.
+        pass
+
+async def _synthesize_and_play_fallback(text: str):
+    """(internal) Synthesize with EdgeTTS and play, with disk caching."""
+    try:
+        # Import lazily to avoid a circular dependency
+        from edge_tts_client import text_to_speech_pcm
+
+        global _audio_cache
+        cache_key = f"tts_fallback:{text}"
+
+        # 1. Check the in-memory cache first
+        if cache_key in _audio_cache:
+            play_audio_threadsafe(cache_key)
+            return
+
+        # 2. Day 26: check the on-disk cache
+        text_hash = hashlib.md5(text.encode('utf-8')).hexdigest()
+        disk_cache_path = os.path.join(TTS_CACHE_DIR, f"{text_hash}.pcm")
+
+        if os.path.exists(disk_cache_path):
+            # Load from disk
+            with open(disk_cache_path, 'rb') as f:
+                pcm_data = f.read()
+            if pcm_data:
+                _audio_cache[cache_key] = pcm_data
+                AUDIO_MAP[cache_key] = cache_key
+                play_audio_threadsafe(cache_key)
+                print(f"[AUDIO] EdgeTTS 从磁盘缓存加载: {text[:20]}...")
+                return
+
+        # 3. Synthesize (target: 16 kHz PCM)
+        pcm_data = await text_to_speech_pcm(text, target_sample_rate=16000)
+
+        if pcm_data:
+            # Store in the in-memory cache
+            _audio_cache[cache_key] = pcm_data
+            AUDIO_MAP[cache_key] = cache_key
+
+            # Day 26: store in the disk cache (written without blocking playback)
+            try:
+                os.makedirs(TTS_CACHE_DIR, exist_ok=True)
+                with open(disk_cache_path, 'wb') as f:
+                    f.write(pcm_data)
+                print(f"[AUDIO] EdgeTTS 已缓存到磁盘: {text[:20]}...")
+            except Exception as disk_err:
+                print(f"[AUDIO] 磁盘缓存写入失败: {disk_err}")
+
+            # Play it
+            play_audio_threadsafe(cache_key)
+            print(f"[AUDIO] EdgeTTS 合成成功: {text}")
+        else:
+            print(f"[AUDIO] EdgeTTS 合成返回空: {text}")
+
+    except Exception as e:
+        print(f"[AUDIO] EdgeTTS 回退失败: {e}")

 # Backward-compatible alias
 play_audio_on_esp32 = play_audio_threadsafe
```
```diff
@@ -102,6 +102,19 @@ async def hard_reset_audio(reason: str = ""):
     # 2) Cancel the current AI task
     await cancel_current_ai()

+    # Day 28: force-reset the VAD TTS state, so a cancelled task cannot leave the
+    # counter non-zero (which would freeze VAD)
+    try:
+        # Safe import to avoid a circular dependency
+        import sys
+        if 'server_vad' in sys.modules:
+            server_vad = sys.modules['server_vad']
+            if hasattr(server_vad, 'get_server_vad'):
+                vad = server_vad.get_server_vad()
+                if vad:
+                    vad.reset_tts_state()
+    except Exception as e:
+        print(f"[HARD-RESET] 重置 VAD 状态失败: {e}")
+
     # 3) Logging
     if reason:
         print(f"[HARD-RESET] {reason}")
```
```diff
@@ -59,6 +59,7 @@ async def text_to_speech_stream(

     except Exception as e:
         print(f"[EdgeTTS] 合成失败: {e}")
+        raise e  # Day 23: re-raise so the caller can retry


 async def text_to_speech(
@@ -80,9 +81,28 @@ async def text_to_speech(
         MP3 audio data
     """
-    audio_chunks = []
-    async for chunk in text_to_speech_stream(text, voice, rate, volume):
-        audio_chunks.append(chunk)
-    return b"".join(audio_chunks)
+    # Day 23: add retry logic
+    max_retries = 3
+    for attempt in range(max_retries):
+        try:
+            audio_chunks = []  # clear the buffer and start over
+            async for chunk in text_to_speech_stream(text, voice, rate, volume):
+                audio_chunks.append(chunk)
+
+            # Success: return the complete audio
+            return b"".join(audio_chunks)
+
+        except Exception:
+            if attempt < max_retries - 1:
+                wait_time = 0.5 * (2 ** attempt)
+                print(f"[EdgeTTS] 合成异常,{wait_time}s 后重试 ({attempt+1}/{max_retries})")
+                await asyncio.sleep(wait_time)
+            else:
+                print(f"[EdgeTTS] 重试 {max_retries} 次后仍失败")
+                return b""  # final failure: return empty bytes
+
+    return b""


 async def text_to_speech_pcm(
```
```diff
@@ -13,10 +13,9 @@ from typing import AsyncGenerator, Optional
 from zai import ZhipuAiClient

 # API configuration
-API_KEY = os.getenv(
-    "GLM_API_KEY",
-    "5915240ea48d4e93b454bc2412d1cc54.e054ej4pPqi9G6rc"
-)
+API_KEY = os.getenv("GLM_API_KEY")
+if not API_KEY:
+    raise RuntimeError("未设置 GLM_API_KEY 环境变量,请在 .env 中配置")
 MODEL = "glm-4.6v-flash"  # upgraded to glm-4.6v-flash (vision support)

 # Weekday mapping

@@ -178,14 +177,35 @@ async def chat_stream(user_message: str, image_base64: Optional[str] = None) ->
     try:
         # Streaming call
         # Day 22: upgraded to glm-4.6v-flash
-        # [Fix] Per the official docs, the thinking parameter is also required
-        response = await asyncio.to_thread(
-            client.chat.completions.create,
-            model=MODEL,
-            messages=messages,
-            thinking={"type": "disabled"},
-            stream=True,
-        )
+        max_retries = 3
+        retry_delay = 1
+
+        response = None
+        for attempt in range(max_retries):
+            try:
+                # [Fix] Per the official docs, the thinking parameter is also required
+                response = await asyncio.to_thread(
+                    client.chat.completions.create,
+                    model=MODEL,
+                    messages=messages,
+                    thinking={"type": "disabled"},
+                    stream=True,
+                )
+                break  # success: exit the loop
+            except Exception as e:
+                error_str = str(e)
+                if attempt < max_retries - 1:
+                    if "429" in error_str or "1305" in error_str or "请求过多" in error_str:
+                        print(f"[GLM] (流式) 速率限制,{retry_delay}秒后重试... ({attempt + 1}/{max_retries})")
+                        await asyncio.sleep(retry_delay)
+                        retry_delay *= 2
+                        continue
+                    # Other network errors are also retried
+                    print(f"[GLM] (流式) 连接错误: {e},重试... ({attempt + 1}/{max_retries})")
+                    await asyncio.sleep(retry_delay)
+                    continue
+                else:
+                    raise e  # last attempt failed: re-raise

         for chunk in response:
             if chunk.choices[0].delta.content:
```
```diff
@@ -23,6 +23,7 @@ SEEKING_NEXT_BLINDPATH = "SEEKING_NEXT_BLINDPATH"  # after crossing, look for the next tactile path
 RECOVERY = "RECOVERY"  # fallback/recovery (when perception is temporarily lost)
 TRAFFIC_LIGHT_DETECTION = "TRAFFIC_LIGHT_DETECTION"  # traffic-light detection mode
 ITEM_SEARCH = "ITEM_SEARCH"  # object-finding mode (navigation paused; yolomedia handles frames)
+INDOOR_NAV = "INDOOR_NAV"  # indoor navigation mode (uses the indoor guidance model)

 # ========== Return structures ==========
 @dataclass
@@ -247,9 +248,11 @@ class NavigationMaster:
                  blind_nav: BlindPathNavigator,
                  cross_nav: CrossStreetNavigator,
                  *,
+                 indoor_nav: BlindPathNavigator = None,  # new: indoor navigator
                  min_tts_interval: float = 1.2):
         self.blind = blind_nav
         self.cross = cross_nav
+        self.indoor = indoor_nav  # indoor navigator (uses the indoor guidance model)
         self.state = IDLE
         self.last_guidance_ts = 0.0
         self.min_tts_interval = min_tts_interval
@@ -290,6 +293,38 @@ class NavigationMaster:
     def get_state(self) -> str:
         return self.state

+    # Day 28: indoor-navigation visualization drawing
+    def _draw_indoor_visualizations(self, image: np.ndarray, visualizations: list):
+        if not visualizations:
+            return
+
+        for viz in visualizations:
+            v_type = viz.get('type')
+
+            if v_type == 'walkable_mask':
+                mask = viz.get('mask')
+                color_str = viz.get('color', 'rgba(0, 255, 0, 0.3)')
+                # Kept simple here: just a green outline plus a translucent fill
+                if mask is not None:
+                    # 1. Green overlay
+                    green_mask = np.zeros_like(image)
+                    green_mask[mask > 0] = [0, 255, 0]  # BGR
+                    image[:] = cv2.addWeighted(image, 1.0, green_mask, 0.3, 0)
+
+                    # 2. Contours
+                    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
+                    cv2.drawContours(image, contours, -1, (0, 255, 0), 2)
+
+            elif v_type in ('obstacle', 'poi', 'person'):
+                center = viz.get('center')
+                label = viz.get('class_name_cn', '?')
+                if center:
+                    cx, cy = center
+                    color = (0, 0, 255) if v_type == 'obstacle' else (255, 255, 0)
+                    cv2.circle(image, (cx, cy), 5, color, -1)
+                    cv2.putText(image, label, (cx + 10, cy), cv2.FONT_HERSHEY_SIMPLEX,
+                                0.6, color, 2, cv2.LINE_AA)
+
     def start_blind_path_navigation(self):
         """Start tactile-paving navigation mode"""
         self.state = BLINDPATH_NAV
@@ -302,7 +337,14 @@ class NavigationMaster:
         self.state = CHAT
         self.cooldown_until = time.time() + self.COOLDOWN_SEC
         if self.blind:
-            self.blind.reset()
+            try: self.blind.reset()
+            except: pass
+        if self.cross:
+            try: self.cross.reset()
+            except: pass
+        if self.indoor:
+            try: self.indoor.reset()
+            except: pass

     def start_crossing(self):
         """Start street-crossing mode"""
@@ -316,6 +358,14 @@ class NavigationMaster:
         self.state = TRAFFIC_LIGHT_DETECTION
         self.cooldown_until = time.time() + self.COOLDOWN_SEC

+    def start_indoor_navigation(self):
+        """Start indoor navigation mode (uses the indoor guidance model)"""
+        self.state = INDOOR_NAV
+        self.cooldown_until = time.time() + self.COOLDOWN_SEC
+        # Day 28: reset the indoor navigator, not the tactile-paving one
+        if self.indoor:
+            self.indoor.reset()
+
     def is_in_navigation_mode(self):
         """Check whether we are in a navigation mode (not conversation mode)"""
         return self.state not in ["CHAT", "IDLE", "TRAFFIC_LIGHT_DETECTION", "ITEM_SEARCH"]
@@ -384,6 +434,10 @@ class NavigationMaster:
             self.cross.reset()
         except Exception:
             pass
+        try:
+            if self.indoor: self.indoor.reset()
+        except Exception:
+            pass

     # ----- internal helpers -----
     def _say(self, now: float, text: str) -> str:
@@ -455,6 +509,35 @@ class NavigationMaster:
         # During cooldown, keep emitting frames but avoid "instant" switching
         in_cooldown = now < self.cooldown_until

+        # [New] Indoor navigation mode: process the frame with the indoor guidance model
+        # Day 26: supports the IndoorResult returned by IndoorNavigator
+        if self.state == INDOOR_NAV:
+            # Prefer the indoor navigator; fall back to the tactile-paving navigator
+            nav = self.indoor if self.indoor else self.blind
+            # Day 28: warn on fallback
+            if self.indoor is None:
+                print("[NAV MASTER] 警告: 室内导航器未初始化,fallback 到盲道导航器!")
+            try:
+                result = nav.process_frame(bgr)
+            except Exception as e:
+                # Day 28: on an indoor-navigation error, stay in indoor mode; do not
+                # switch to RECOVERY (that would auto-switch back to tactile paving)
+                print(f"[INDOOR ERROR] 室内导航异常: {e}")
+                # self.state = RECOVERY  <-- do not switch!
+                ann_err = bgr.copy()
+                return OrchestratorResult(ann_err, self._say(now, ""), INDOOR_NAV, {"error": str(e)})
+
+            ann = result.annotated_image if result.annotated_image is not None else bgr.copy()
+            say = result.guidance_text or ""
+            state_info = result.state_info if hasattr(result, 'state_info') else {}
+
+            # Day 28: draw the indoor-navigation visualizations
+            visualizations = result.visualizations if hasattr(result, 'visualizations') else []
+            self._draw_indoor_visualizations(ann, visualizations)
+
+            # Day 28: make sure the returned state is INDOOR_NAV
+            return OrchestratorResult(ann, self._say(now, say), INDOOR_NAV,
+                                      {"source": "indoor", "state_info": state_info})
+
         # Per-state handling
         if self.state in (BLINDPATH_NAV, SEEKING_CROSSWALK, SEEKING_NEXT_BLINDPATH, RECOVERY):
             # —— tactile-paving side: delegate to the tactile-paving navigator
```
**server_context.py** (new file, 98 lines)

@@ -0,0 +1,98 @@
```python
# server_context.py
# -*- coding: utf-8 -*-
import asyncio
from typing import Dict, List, Set, Deque, Optional, Tuple, Any
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from fastapi import WebSocket

class ServerContext:
    """
    Singleton server-wide context.
    Centralizes state, resource references, and client connections, replacing the
    tangle of global variables in app_main.py.
    """
    _instance = None
    _lock = asyncio.Lock()  # async lock, mainly to protect critical state switches

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super(ServerContext, cls).__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self):
        if self._initialized:
            return

        self._initialized = True

        # ====== 1. WebSocket client management ======
        self.ui_clients: Dict[int, WebSocket] = {}
        self.camera_viewers: Set[WebSocket] = set()
        self.imu_ws_clients: Set[WebSocket] = set()

        self.esp32_audio_ws: Optional[WebSocket] = None
        self.esp32_camera_ws: Optional[WebSocket] = None

        # ====== 2. Media buffers ======
        self.current_partial: str = ""
        self.recent_finals: List[str] = []
        self.last_frames: Deque[Tuple[float, bytes]] = deque(maxlen=10)

        # ====== 3. Business state flags ======
        # Tactile-paving navigation state
        self.navigation_active: bool = False
        # Street-crossing navigation state
        self.cross_street_active: bool = False
        # Omni conversation state
        self.omni_conversation_active: bool = False
        self.omni_previous_nav_state: Optional[str] = None

        # YOLO media-stream state
        self.yolomedia_running: bool = False
        self.yolomedia_sending_frames: bool = False

        # ====== 4. Core component references (resources) ======
        # Navigator instances
        self.blind_path_navigator = None
        self.cross_street_navigator = None
        self.indoor_navigator = None

        # Orchestrator
        self.orchestrator = None

        # Model instances
        self.yolo_seg_model = None
        self.obstacle_detector = None
        self.indoor_seg_model = None

        # ====== 5. Async processing resources ======
        # Frame-processing thread pool
        self.frame_processing_executor = ThreadPoolExecutor(max_workers=3, thread_name_prefix="frame_proc")

        # Async frame-processing state
        self.nav_processing_task: Optional[asyncio.Task] = None
        self.nav_last_result_image: Any = None
        self.nav_last_result_jpeg: Optional[bytes] = None
        self.nav_pending_frame: Any = None
        self.nav_processing_lock = asyncio.Lock()
        self.nav_task_start_time: float = 0.0

    def reset_navigation_state(self):
        """Reset all navigation-related state flags"""
        self.navigation_active = False
        self.cross_street_active = False
        self.omni_conversation_active = False
        # Note: this does not stop the orchestrator; it only resets the flags

    def add_ui_client(self, ws: WebSocket):
        self.ui_clients[id(ws)] = ws

    def remove_ui_client(self, ws: WebSocket):
        self.ui_clients.pop(id(ws), None)

    def get_ui_client_count(self) -> int:
        return len(self.ui_clients)

# Global access point
ctx = ServerContext()
```
@@ -96,7 +96,8 @@ class SileroVAD:
|
||||
self.speech_audio = bytearray() # 存储语音音频
|
||||
|
||||
# TTS 播放状态 - 播放期间暂停 VAD
|
||||
self.tts_playing = False
|
||||
# Day 28: 使用引用计数处理并发播放的情况
|
||||
self.tts_playing_count = 0
|
||||
self.tts_end_time = 0 # TTS 结束时间
|
||||
self.tts_cooldown_ms = 500 # TTS 结束后等待 500ms 再开始检测
|
||||
|
||||
@@ -105,9 +106,9 @@ class SileroVAD:
|
||||
self.window_size = 5 # 滑动窗口大小
|
||||
self.frame_threshold = 3 # 至少多少帧语音才算开始说话
|
||||
|
||||
# Day 23: Pre-speech buffer (Lookback) to fix "cut-off" start of words
|
||||
# 300ms lookback approx. (each chunk is 32ms) -> 10 chunks
|
||||
self.pre_speech_buffer = collections.deque(maxlen=10)
|
||||
# Day 23+28: Pre-speech buffer (Lookback) to fix "cut-off" start of words
|
||||
# Day 28: 增加到 768ms (24 chunks) 以捕获 "室内导航" 等较长开头,防止 ASR 吞字
|
||||
self.pre_speech_buffer = collections.deque(maxlen=24)
|
||||
|
||||
print(f"[VAD] 初始化: threshold={threshold}, threshold_low={threshold_low}, "
|
||||
f"min_silence_ms={min_silence_ms}, min_speech_ms={min_speech_ms}")
|
||||
@@ -120,29 +121,46 @@ class SileroVAD:
        self.last_speech_time = 0
        self.speech_start_time = 0
        self.voice_window.clear()
        self.tts_playing = False
        self.tts_playing_count = 0
        self.tts_end_time = 0
        if self.model:
            self.model.reset_states()
        if hasattr(self, 'pre_speech_buffer'):
            self.pre_speech_buffer.clear()

    def reset_tts_state(self):
        """Force-reset the TTS playback state (used for hard resets)."""
        self.tts_playing_count = 0
        print("[VAD] 强制重置 TTS 状态 (VAD 恢复)")

    def set_tts_playing(self, playing: bool):
        """Set the TTS playback state (reference counted)."""
        if playing:
            self.tts_playing_count += 1
            if self.tts_playing_count == 1:
                print("[VAD] TTS 开始播放,暂停 VAD 检测")
                # If a recording is in progress when TTS starts, interrupt it
                if self.is_speaking:
                    self.is_speaking = False
                    self.speech_audio.clear()
                    self.voice_window.clear()
                    # Day 23: clear the lookback buffer
                    if hasattr(self, 'pre_speech_buffer'):
                        self.pre_speech_buffer.clear()
                    # Day 28: reset the model state
                    if self.model:
                        self.model.reset_states()
                    print("[VAD] TTS 播放打断语音录制")
        else:
            if self.tts_playing_count > 0:
                self.tts_playing_count -= 1
                if self.tts_playing_count == 0:
                    # All TTS playbacks finished: record the end time
                    self.tts_end_time = time.time() * 1000
                    print("[VAD] TTS 完全结束,等待冷却期...")
            else:
                # Already at zero; ignore the extra stop
                pass

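The reference count exists because playbacks can overlap: VAD must stay paused until every concurrent playback has finished, and the cooldown must start only at the last stop. A stripped-down sketch of just that counting logic (names here are illustrative, not the class above):

```python
class TtsRefCount:
    """VAD is paused while count > 0; cooldown starts only at the last stop."""

    def __init__(self):
        self.count = 0
        self.cooldown_started = False  # stands in for recording tts_end_time

    def set_playing(self, playing: bool):
        if playing:
            self.count += 1
        elif self.count > 0:
            self.count -= 1
            if self.count == 0:
                self.cooldown_started = True  # last playback just ended
        # extra "stop" calls with count already 0 are ignored

    def vad_paused(self) -> bool:
        return self.count > 0

rc = TtsRefCount()
rc.set_playing(True)          # first playback starts
rc.set_playing(True)          # second, overlapping playback
rc.set_playing(False)         # first one ends: VAD must stay paused
still_paused = rc.vad_paused()
rc.set_playing(False)         # second one ends: cooldown begins
```

With a plain boolean (the old `tts_playing` flag), the first `set_playing(False)` would have resumed VAD while audio was still coming out of the speaker, which is exactly the echo-pickup case the diff fixes.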
    def process(self, audio_bytes: bytes) -> dict:
        """
@@ -172,7 +190,7 @@ class SileroVAD:

        # Skip VAD detection while TTS is playing
        current_time = time.time() * 1000
        if self.tts_playing_count > 0:
            return result

        # TTS just ended: wait out the cooldown period
@@ -479,18 +479,25 @@ def is_detection_running():
    return _detection_running

def init_model():
    """Initialize the YOLO model (single-frame processing mode).

    Day 26 optimization: includes a warm-up inference so the TensorRT
    engine is not loaded repeatedly.
    """
    global _model
    if _model is not None:
        print("[TRAFFIC] 模型已加载")
        return True

    try:
        print("[TRAFFIC] 加载 YOLO 红绿灯检测模型...")
        _model = YOLO(YOLO_MODEL_PATH, task='detect')
        print(f"[TRAFFIC] 模型加载成功: {YOLO_MODEL_PATH}")
        class_names = _model.names if hasattr(_model, 'names') else {}
        print(f"[TRAFFIC] 模型类别: {class_names}")

        # Day 26 optimization: warm-up inference creates the TensorRT
        # execution context exactly once
        test_img = np.zeros((640, 640, 3), dtype=np.uint8)
        _ = _model(test_img, conf=CONF_THRESHOLD, verbose=False)
        print("[TRAFFIC] 模型预热完成")

        return True
    except Exception as e:
        print(f"[TRAFFIC] 模型加载失败: {e}")
@@ -88,14 +88,16 @@ class ProcessingResult:

class BlindPathNavigator:
    """Blind-path navigation processor (no external dependencies)."""

    def __init__(self, yolo_model=None, obstacle_detector=None, enable_crosswalk_detection=True):
        """
        Initialize the navigator.
        :param yolo_model: YOLO segmentation model (optional)
        :param obstacle_detector: obstacle detector (optional)
        :param enable_crosswalk_detection: enable crosswalk detection (can be turned off in indoor mode)
        """
        self.yolo_model = yolo_model
        self.obstacle_detector = obstacle_detector
        self.enable_crosswalk_detection = enable_crosswalk_detection

        # State variables
        self.current_state = STATE_ONBOARDING
@@ -184,6 +186,10 @@ class BlindPathNavigator:
                    f"持续模式={self.straight_continuous_mode}, "
                    f"限制次数={self.straight_repeat_limit}")
        logger.info(f"[BlindPath] 方向播报配置: 间隔={self.direction_interval}秒")

        # Day 26 optimization: configurable log sampling interval
        self.log_interval = int(os.getenv("AIGLASS_LOG_INTERVAL", "30"))  # log once every N frames
        logger.info(f"[BlindPath] 日志采样间隔: 每{self.log_interval}帧")

        # Cache variables
        self.prev_gray = None
@@ -258,8 +264,14 @@ class BlindPathNavigator:
        self.last_crosswalk_mask = None

        # [New] crosswalk awareness monitor (can be disabled in indoor mode)
        if self.enable_crosswalk_detection:
            self.crosswalk_monitor = CrosswalkAwarenessMonitor()
            logger.info("[BlindPath] 斑马线感知监控器已初始化")
        else:
            self.crosswalk_monitor = None
            logger.info("[BlindPath] 斑马线感知监控器已禁用 (室内模式)")

        logger.info(f"[BlindPath] 盲道检测间隔: 每{self.BLINDPATH_DETECTION_INTERVAL}帧")

    def init_traffic_light_detector(self):
@@ -489,16 +501,24 @@ class BlindPathNavigator:
        # [New] check close-range obstacles and queue the voice prompt
        self._check_and_set_obstacle_voice(detected_obstacles)

        # [Config] if crosswalk detection is disabled, force the mask to None
        if not self.enable_crosswalk_detection:
            crosswalk_mask = None

        # [New] crosswalk awareness processing
        # [Day 26 optimization] use the configurable log interval
        if crosswalk_mask is not None and self.frame_counter % self.log_interval == 0:
            cross_pixels = np.sum(crosswalk_mask > 0)
            if cross_pixels > 0:
                logger.info(f"[斑马线] monitor: pixels={cross_pixels}, area={cross_pixels/crosswalk_mask.size*100:.2f}%")
        elif crosswalk_mask is None and self.frame_counter % self.log_interval == 0:
            if self.enable_crosswalk_detection:
                logger.info(f"[斑马线] crosswalk_mask为None")

        crosswalk_guidance = None
        if self.crosswalk_monitor:
            crosswalk_guidance = self.crosswalk_monitor.process_frame(crosswalk_mask, blind_path_mask)

        if crosswalk_guidance:
            logger.info(f"[斑马线感知] 检测结果: area={crosswalk_guidance.get('area', 0):.3f}, "
                        f"should_broadcast={crosswalk_guidance.get('should_broadcast', False)}, "
@@ -511,7 +531,7 @@ class BlindPathNavigator:
            logger.info(f"[斑马线语音] 已设置待播报语音: {crosswalk_guidance['voice_text']}, 优先级{crosswalk_guidance['priority']}")

        # [New] add the crosswalk visualization
        if crosswalk_mask is not None and self.crosswalk_monitor:
            # Compute visualization data
            total_pixels = crosswalk_mask.size
            crosswalk_pixels = np.sum(crosswalk_mask > 0)
@@ -272,21 +272,22 @@ class CrossStreetNavigator:
        logger.info(f"[CROSS_STREET] 斑马线检测间隔: 每{self.CROSSWALK_DETECTION_INTERVAL}帧")

        # Make sure the model is on the GPU
        # Day 20/26: TensorRT engines do not support .to(); check via model_utils
        if self.seg_model and torch.cuda.is_available():
            try:
                # Check whether this is a TensorRT engine
                from model_utils import is_tensorrt_engine
                model_path = getattr(self.seg_model, 'ckpt_path', '') or ''
                if is_tensorrt_engine(model_path):
                    pass  # TensorRT engine: .to() not needed, skip silently
                elif hasattr(self.seg_model, 'model') and hasattr(self.seg_model.model, 'to'):
                    self.seg_model.model.to('cuda')
                    logger.info("[CROSS_STREET] 模型已移至 GPU")
                elif hasattr(self.seg_model, 'to'):
                    self.seg_model.to('cuda')
                    logger.info("[CROSS_STREET] 模型已移至 GPU")
            except Exception:
                pass  # Day 26: fail silently to avoid flooding the startup log

    def reset(self):
        """Reset state."""
534
workflow_indoor.py
Normal file
@@ -0,0 +1,534 @@
# -*- coding: utf-8 -*-
"""
Indoor Navigation Workflow (室内导航工作流)
Day 26: built for the indoor guide model (yolo11l-seg-indoor14)

Class mapping (14 classes from MIT Indoor):
- Walkable areas:     floor(0), corridor(1), sidewalk(2)
- Static obstacles:   chair(3), table(4), sofa_bed(5), cabinet(11), trash_can(12)
- Points of interest: door(6), elevator(7), stairs(8)
- Boundaries:         wall(9), window(13)
- Dynamic obstacles:  person(10)
"""

import os
import time
import logging
import numpy as np
import cv2
from dataclasses import dataclass
from typing import Optional, List, Dict, Any
from collections import deque

logger = logging.getLogger(__name__)

# ========== Class constants (14-class model: yolo11l-seg-indoor14) ==========
# Day 28: uses the 14-class model (MIT Indoor subset)

# Walkable areas (0-2)
WALKABLE_CLASSES = {0, 1, 2}  # floor, corridor, sidewalk
CLASS_FLOOR = 0
CLASS_CORRIDOR = 1
CLASS_SIDEWALK = 2

# Static obstacles (3-5, 11-13). Window(13) is included here as well as in the
# boundary set below: you should not walk into a window, so it counts as both.
OBSTACLE_CLASSES = {3, 4, 5, 11, 12, 13}
CLASS_CHAIR = 3
CLASS_TABLE = 4
CLASS_SOFA_BED = 5
CLASS_CABINET = 11
CLASS_TRASH_CAN = 12
CLASS_WINDOW = 13  # windows are treated as boundaries and obstacles
CLASS_WALL = 9

# Points of interest (6-8)
POI_CLASSES = {6, 7, 8}  # door, elevator, stairs
CLASS_DOOR = 6
CLASS_ELEVATOR = 7
CLASS_STAIRS = 8

# Dynamic obstacle (10)
CLASS_PERSON = 10

# Boundaries
BOUNDARY_CLASSES = {9, 13}  # wall(9), window(13)

# Class name mapping
CLASS_NAMES = {
    0: 'floor', 1: 'corridor', 2: 'sidewalk',
    3: 'chair', 4: 'table', 5: 'sofa_bed',
    6: 'door', 7: 'elevator', 8: 'stairs',
    9: 'wall', 10: 'person', 11: 'cabinet',
    12: 'trash_can', 13: 'window'
}

# Chinese names (used for voice output)
CLASS_NAMES_CN = {
    0: '地面', 1: '走廊', 2: '人行道',
    3: '椅子', 4: '桌子', 5: '沙发',
    6: '门', 7: '电梯', 8: '楼梯',
    9: '墙壁', 10: '行人', 11: '柜子',
    12: '垃圾桶', 13: '窗户'
}

# Item classes (none in this model)
ITEM_CLASSES = set()
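An easy way to keep a class map like this honest is to assert that the groups cover all 14 model IDs and that the only deliberately double-listed ID is window(13), which is both an obstacle and a boundary. A small standalone check (sets copied from the constants above):

```python
# Sets copied from the class constants in workflow_indoor.py
WALKABLE_CLASSES = {0, 1, 2}
OBSTACLE_CLASSES = {3, 4, 5, 11, 12, 13}
POI_CLASSES = {6, 7, 8}
BOUNDARY_CLASSES = {9, 13}
CLASS_PERSON = 10

ALL_IDS = (WALKABLE_CLASSES | OBSTACLE_CLASSES | POI_CLASSES
           | BOUNDARY_CLASSES | {CLASS_PERSON})

# Every model output ID 0..13 falls into at least one group
covered = ALL_IDS == set(range(14))

# The only ID listed in two groups is window(13): obstacle AND boundary
overlap = OBSTACLE_CLASSES & BOUNDARY_CLASSES
```

A check like this would catch a future 15th class silently falling through the `if/elif` chain in `_detect_all`.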
# ========== Configuration ==========
# Day 28: thresholds lowered further to improve wooden-floor detection
CONF_THRESHOLD = float(os.getenv('INDOOR_CONF_THRESHOLD', '0.05'))    # very low global threshold; later logic filters a second time
WALKABLE_MIN_AREA = int(os.getenv('INDOOR_WALKABLE_MIN_AREA', '50'))  # extremely low minimum area for debugging (was 1000)
OBSTACLE_MIN_AREA = int(os.getenv('INDOOR_OBSTACLE_MIN_AREA', '300'))

# Voice intervals
GUIDE_INTERVAL = float(os.getenv('INDOOR_GUIDE_INTERVAL', '3.0'))
DIRECTION_INTERVAL = float(os.getenv('INDOOR_DIRECTION_INTERVAL', '2.5'))
POI_INTERVAL = float(os.getenv('INDOOR_POI_INTERVAL', '5.0'))
OBSTACLE_INTERVAL = float(os.getenv('INDOOR_OBSTACLE_INTERVAL', '2.0'))
# Day 28: interval between "no walkable area" announcements (8 s)
NO_WALKABLE_INTERVAL = float(os.getenv('INDOOR_NO_WALKABLE_INTERVAL', '8.0'))
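The very low global `CONF_THRESHOLD` only controls what inference returns; `_detect_all` below then re-filters per class, keeping the low bar for floor classes (recall matters most there) and demanding at least 0.25 for everything else (false obstacle alarms are worse than missed ones). A sketch of that second-pass filter in isolation (the detection tuples are made up):

```python
WALKABLE_CLASSES = {0, 1, 2}  # floor, corridor, sidewalk

def keep_detection(cls_id: int, conf: float) -> bool:
    """Second-pass filter: low bar for floor classes, higher for the rest."""
    threshold = 0.05 if cls_id in WALKABLE_CLASSES else 0.25
    return conf >= threshold

# Hypothetical (class_id, confidence) pairs straight out of inference
detections = [
    (0, 0.07),  # floor at 0.07    -> kept (walkable, low bar)
    (3, 0.07),  # chair at 0.07    -> dropped (obstacle, needs 0.25)
    (3, 0.30),  # chair at 0.30    -> kept
    (8, 0.20),  # stairs at 0.20   -> dropped (POI, needs 0.25)
]
kept = [d for d in detections if keep_detection(*d)]
```

The asymmetry is the point: a spurious patch of "floor" mostly just widens the walkable mask, while a spurious "chair" triggers a spoken warning.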
# ========== Visualization colors (BGR) ==========
VIS_COLORS = {
    'walkable': (0, 255, 0),      # green   - walkable area
    'obstacle': (0, 0, 255),      # red     - obstacle
    'poi': (255, 255, 0),         # cyan    - point of interest
    'boundary': (128, 128, 128),  # gray    - boundary
    'person': (255, 0, 255),      # magenta - pedestrian
    'centerline': (255, 255, 0),  # cyan    - guide line (same BGR value as POI)
}
@dataclass
class IndoorResult:
    """Result of one indoor-navigation frame."""
    annotated_image: Optional[np.ndarray] = None
    guidance_text: str = ""
    state_info: Dict[str, Any] = None
    visualizations: List[Dict[str, Any]] = None

    def __post_init__(self):
        if self.state_info is None:
            self.state_info = {}
        if self.visualizations is None:
            self.visualizations = []


class IndoorNavigator:
    """Indoor navigator built around the indoor guide segmentation model."""

    def __init__(self, seg_model=None, device_id: str = "indoor"):
        self.seg_model = seg_model
        self.device_id = device_id
        self.frame_counter = 0

        # Day 28: persistence (debounce) parameters
        self.no_walkable_persistence_sec = 2.0
        self.last_walkable_detected_time = 0

        # Voice throttling
        self.last_guide_time = 0
        self.last_direction_time = 0
        self.last_poi_time = 0
        self.last_obstacle_time = 0
        self.last_guidance_text = ""
        self.last_direction_text = ""

        # Detection interval
        self.detection_interval = int(os.getenv('INDOOR_DETECTION_INTERVAL', '6'))
        self.last_detection_frame = 0

        # Caches
        self.last_walkable_mask = None
        self.last_valid_walkable_mask = None
        self.last_no_walkable_time = 0
        self.last_obstacles = []
        self.last_pois = []

        # Day 28: removed the unused grayscale conversion (optical flow is not enabled)
        # self.prev_gray = None

        # Log interval
        self.log_interval = int(os.getenv('AIGLASS_LOG_INTERVAL', '30'))

        logger.info(f"[INDOOR] 室内导航器初始化完成")
        logger.info(f"[INDOOR] 检测间隔: 每{self.detection_interval}帧")
        logger.info(f"[INDOOR] 可行走类别: {[CLASS_NAMES[c] for c in WALKABLE_CLASSES]}")
    def reset(self):
        """Reset all state."""
        self.frame_counter = 0
        self.last_guide_time = 0
        self.last_direction_time = 0
        self.last_poi_time = 0
        self.last_obstacle_time = 0
        self.last_guidance_text = ""
        self.last_direction_text = ""
        self.last_valid_walkable_mask = None
        self.last_no_walkable_time = 0  # Day 28: throttle for "no walkable area"
        self.last_walkable_detected_time = 0
        self.last_walkable_mask = None
        self.last_obstacles = []
        self.last_pois = []
        logger.info("[INDOOR] 导航器已重置")
    def process_frame(self, image: np.ndarray) -> IndoorResult:
        """Process a single frame."""
        self.frame_counter += 1
        h, w = image.shape[:2]
        now = time.time()

        frame_visualizations = []
        guidance_text = ""
        state_info = {}

        # Decide whether to run detection on this frame
        should_detect = (self.frame_counter - self.last_detection_frame) >= self.detection_interval

        if should_detect and self.seg_model is not None:
            self.last_detection_frame = self.frame_counter

            # Run segmentation inference
            walkable_mask, obstacles, pois = self._detect_all(image)

            # Update caches
            self.last_walkable_mask = walkable_mask
            self.last_obstacles = obstacles
            self.last_pois = pois
        else:
            # Reuse cached results on skipped frames
            walkable_mask = self.last_walkable_mask
            obstacles = self.last_obstacles
            pois = self.last_pois

        # Cache the last valid mask (used for visualization debounce)
        # Day 28: area uses count_nonzero, not sum (mask values are 0 or 255)
        walkable_area = int(np.count_nonzero(walkable_mask)) if walkable_mask is not None else 0
        if walkable_area > WALKABLE_MIN_AREA:
            self.last_valid_walkable_mask = walkable_mask

        # Generate navigation guidance
        if walkable_mask is not None:
            guidance_text = self._generate_guidance(walkable_mask, obstacles, pois, h, w, now)

        # Visualization (with persistence debounce)
        viz_mask = walkable_mask

        # If no path is detected right now but we are still inside the
        # persistence window, visualize the cached mask instead
        if (viz_mask is None or walkable_area < WALKABLE_MIN_AREA) and \
           (now - self.last_walkable_detected_time) < self.no_walkable_persistence_sec and \
           self.last_valid_walkable_mask is not None:
            viz_mask = self.last_valid_walkable_mask

        self._add_mask_visualization(viz_mask, frame_visualizations,
                                     "walkable_mask", "rgba(0, 255, 0, 0.3)")

        # Obstacle visualization
        for obs in obstacles:
            self._add_detection_visualization(obs, frame_visualizations, "obstacle")

        # Point-of-interest visualization
        for poi in pois:
            self._add_detection_visualization(poi, frame_visualizations, "poi")

        # Sampled logging
        if self.frame_counter % self.log_interval == 0:
            logger.info(f"[INDOOR] Frame={self.frame_counter} | 可行走面积={walkable_area} | "
                        f"障碍物={len(obstacles)} | 兴趣点={len(pois)}")

        # Update state info
        state_info = {
            'frame': self.frame_counter,
            'walkable_detected': walkable_mask is not None and walkable_area > 0,
            'obstacles_count': len(obstacles),
            'pois_count': len(pois),
        }

        # Day 28: avoid copying the image every frame; pass the original
        # through (downstream code can copy it if it needs to draw)
        return IndoorResult(
            annotated_image=image,  # no copy, saving memory/CPU
            guidance_text=guidance_text,
            state_info=state_info,
            visualizations=frame_visualizations
        )
    def _detect_all(self, image: np.ndarray):
        """Run segmentation and return the walkable mask, obstacles, and POIs."""
        h, w = image.shape[:2]
        walkable_mask = np.zeros((h, w), dtype=np.uint8)
        obstacles = []
        pois = []

        try:
            imgsz = int(os.getenv("AIGLASS_YOLO_IMGSZ", "480"))
            use_half = os.getenv("AIGLASS_YOLO_HALF", "1") == "1"

            results = self.seg_model.predict(
                image,
                imgsz=imgsz,
                conf=CONF_THRESHOLD,
                verbose=False,
                half=use_half
            )

            if results and len(results) > 0 and results[0].masks is not None:
                r0 = results[0]
                masks = r0.masks.data.cpu().numpy()
                boxes = r0.boxes

                for i, (mask, cls_id, conf) in enumerate(zip(masks, boxes.cls, boxes.conf)):
                    cls_id = int(cls_id.item())
                    conf_val = float(conf.item())

                    # Skip item classes (they take no part in navigation logic)
                    if cls_id in ITEM_CLASSES:
                        continue

                    # Day 28: hybrid threshold strategy
                    # Floor classes (WALKABLE) use the global low threshold (0.05) for recall;
                    # obstacles/POIs/boundaries use a higher one (0.25) to reject false positives
                    filter_threshold = 0.25
                    if cls_id in WALKABLE_CLASSES:
                        filter_threshold = 0.05

                    if conf_val < filter_threshold:
                        continue

                    # Resize the mask to frame size
                    mask_resized = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
                    mask_bin = (mask_resized > 0.5).astype(np.uint8)
                    area = int(mask_bin.sum())

                    # Day 28: debug log covering nearly every detection
                    if area > 10:
                        cls_name = CLASS_NAMES.get(cls_id, f'unknown_{cls_id}')
                        logger.info(f"[INDOOR DEBUG] 检测到 {cls_name}(id={cls_id}) conf={conf_val:.2f} area={area}")

                    if area < 50:  # filter only the tiniest fragments
                        continue

                    # Walkable area
                    if cls_id in WALKABLE_CLASSES and area > WALKABLE_MIN_AREA:
                        # Day 28: keep dtypes consistent so bitwise_or cannot fail
                        mask_add = (mask_bin * 255).astype(np.uint8)
                        walkable_mask = cv2.bitwise_or(walkable_mask, mask_add)
                        if area > 10000:  # debug: log large additions
                            logger.info(f"[INDOOR DEBUG] 添加可行走区域: class={cls_id} area={area} current_total={np.count_nonzero(walkable_mask)}")

                    # Obstacles
                    elif cls_id in OBSTACLE_CLASSES or cls_id == CLASS_PERSON:
                        if area > OBSTACLE_MIN_AREA:
                            obstacles.append({
                                'class_id': cls_id,
                                'class_name': CLASS_NAMES.get(cls_id, 'unknown'),
                                'class_name_cn': CLASS_NAMES_CN.get(cls_id, '未知'),
                                'conf': conf_val,
                                'mask': mask_bin,
                                'area': area,
                                'center': self._mask_center(mask_bin),
                            })

                    # Points of interest
                    elif cls_id in POI_CLASSES:
                        pois.append({
                            'class_id': cls_id,
                            'class_name': CLASS_NAMES.get(cls_id, 'unknown'),
                            'class_name_cn': CLASS_NAMES_CN.get(cls_id, '未知'),
                            'conf': conf_val,
                            'mask': mask_bin,
                            'area': area,
                            'center': self._mask_center(mask_bin),
                        })

        except Exception as e:
            logger.warning(f"[INDOOR] 检测失败: {e}")

        return walkable_mask, obstacles, pois
    def _mask_center(self, mask: np.ndarray):
        """Compute the centroid of a mask via image moments."""
        M = cv2.moments(mask)
        if abs(M["m00"]) < 1e-6:
            return None
        cx = int(M["m10"] / M["m00"])
        cy = int(M["m01"] / M["m00"])
        return (cx, cy)
    def _generate_guidance(self, walkable_mask, obstacles, pois, h, w, now):
        """Generate the guidance text for this frame."""
        guidance_text = ""

        # 1. Direction guidance from the walkable-area distribution
        direction_guidance = self._compute_direction_guidance(walkable_mask, h, w)

        # 2. Obstacle warnings
        obstacle_warning = self._check_obstacle_warning(obstacles, walkable_mask, h, w)

        # 3. Point-of-interest hints
        poi_hint = self._check_poi_hint(pois, h, w)

        # Priority: obstacles > direction > points of interest
        if obstacle_warning and (now - self.last_obstacle_time) > OBSTACLE_INTERVAL:
            guidance_text = obstacle_warning
            self.last_obstacle_time = now
            self.last_guidance_text = guidance_text
        elif direction_guidance:
            # Day 28: announce "no walkable area" less frequently
            if direction_guidance == "未检测到可行走区域":
                # First occurrence (last_no_walkable_time == 0) or the 8 s interval passed
                if self.last_no_walkable_time == 0 or (now - self.last_no_walkable_time) > NO_WALKABLE_INTERVAL:
                    guidance_text = direction_guidance
                    self.last_no_walkable_time = now
            # Direction-guidance throttling
            elif direction_guidance != self.last_direction_text:
                if (now - self.last_direction_time) > DIRECTION_INTERVAL:
                    guidance_text = direction_guidance
                    self.last_direction_time = now
                    self.last_direction_text = direction_guidance
            elif (now - self.last_guide_time) > GUIDE_INTERVAL:
                # Same direction as before: repeat it at a lower rate
                guidance_text = direction_guidance
                self.last_guide_time = now
        elif poi_hint and (now - self.last_poi_time) > POI_INTERVAL:
            guidance_text = poi_hint
            self.last_poi_time = now

        return guidance_text
    def _compute_direction_guidance(self, walkable_mask, h, w):
        """Compute direction guidance from the walkable mask."""
        # Day 28: use count_nonzero instead of sum (mask values are 0 or 255)
        walkable_area = np.count_nonzero(walkable_mask) if walkable_mask is not None else 0
        now = time.time()

        if walkable_area < WALKABLE_MIN_AREA:
            # Debounce: if a path was seen very recently, do not warn yet
            if (now - self.last_walkable_detected_time) < self.no_walkable_persistence_sec:
                return None  # staying silent is safer than guessing "keep straight"
            return "未检测到可行走区域"

        # Analyze the lower half of the frame (the nearest area) BEFORE
        # updating the detection timestamp, so the debounce below still
        # refers to the previous sighting; updating first would make the
        # check always true and the warning unreachable
        lower_half = walkable_mask[int(h * 0.5):, :]

        if np.count_nonzero(lower_half) < 1000:
            if (now - self.last_walkable_detected_time) < self.no_walkable_persistence_sec:
                return None
            return "前方可行走区域较小,请小心"

        # A usable path was detected: update the timestamp
        self.last_walkable_detected_time = now

        # Left / center / right distribution
        third = w // 3
        left_area = np.count_nonzero(lower_half[:, :third])
        center_area = np.count_nonzero(lower_half[:, third:2*third])
        right_area = np.count_nonzero(lower_half[:, 2*third:])

        total = left_area + center_area + right_area + 1e-6
        left_ratio = left_area / total
        center_ratio = center_area / total
        right_ratio = right_area / total

        # Direction decision
        if center_ratio > 0.4:
            return "保持直行"
        elif left_ratio > right_ratio * 1.5:
            return "向左调整"
        elif right_ratio > left_ratio * 1.5:
            return "向右调整"
        else:
            return "保持直行"
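The three-zone rule can be exercised on toy masks without any model: a path hugging the left third should steer the user left (toward the walkable area), a centered path should keep them straight. A dependency-free re-statement of that rule (guidance strings rendered in English here; the real method returns the Chinese TTS strings):

```python
def direction_from_rows(rows):
    """Toy re-statement of the three-zone rule, thresholds copied from above.

    `rows` is the lower half of a walkable mask as a list of 0/1 rows."""
    w = len(rows[0])
    third = w // 3
    left = sum(sum(r[:third]) for r in rows)
    center = sum(sum(r[third:2 * third]) for r in rows)
    right = sum(sum(r[2 * third:]) for r in rows)
    total = left + center + right + 1e-6
    if center / total > 0.4:
        return "keep straight"
    if left > right * 1.5:
        return "adjust left"
    if right > left * 1.5:
        return "adjust right"
    return "keep straight"

# 60 rows, 90 columns: path only in the left third -> steer toward it
left_rows = [[1] * 30 + [0] * 60 for _ in range(60)]
# Path filling the middle third -> keep straight
center_rows = [[0] * 30 + [1] * 30 + [0] * 30 for _ in range(60)]
```

Note the asymmetric thresholds: the center zone wins with just 40% of the mass, while a side zone must beat the other side by 1.5x, which biases the system toward the stable "keep straight" announcement.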
    def _check_obstacle_warning(self, obstacles, walkable_mask, h, w):
        """Check whether any obstacle sits in the zone ahead."""
        if not obstacles:
            return None

        # Define the zone ahead (center-bottom of the frame)
        front_zone_top = int(h * 0.4)
        front_zone_left = int(w * 0.2)
        front_zone_right = int(w * 0.8)

        for obs in obstacles:
            center = obs.get('center')
            if center is None:
                continue
            cx, cy = center

            # Is the obstacle inside the zone ahead?
            if front_zone_top < cy < h and front_zone_left < cx < front_zone_right:
                name_cn = obs.get('class_name_cn', '障碍物')

                # Left, right, or straight ahead
                if cx < w * 0.4:
                    return f"左前方有{name_cn}"
                elif cx > w * 0.6:
                    return f"右前方有{name_cn}"
                else:
                    return f"正前方有{name_cn}"

        return None
    def _check_poi_hint(self, pois, h, w):
        """Check for point-of-interest hints."""
        if not pois:
            return None

        for poi in pois:
            cls_id = poi.get('class_id')
            name_cn = poi.get('class_name_cn', '兴趣点')
            center = poi.get('center')

            if center is None:
                continue
            cx, cy = center

            # Stairs warrant an explicit warning
            if cls_id == CLASS_STAIRS:
                if cy > h * 0.5:  # fairly close
                    return f"注意前方有{name_cn}"

            # Door / elevator hints
            elif cls_id in (CLASS_DOOR, CLASS_ELEVATOR):
                if cy > h * 0.3:  # within view
                    position = "左侧" if cx < w * 0.4 else ("右侧" if cx > w * 0.6 else "前方")
                    return f"{position}有{name_cn}"

        return None
    def _add_mask_visualization(self, mask, visualizations, viz_type, color):
        """Append a mask visualization entry."""
        if mask is None or mask.sum() == 0:
            return

        visualizations.append({
            'type': viz_type,
            'mask': mask,
            'color': color
        })

    def _add_detection_visualization(self, detection, visualizations, det_type):
        """Append a detection-marker visualization entry."""
        center = detection.get('center')
        if center is None:
            return

        visualizations.append({
            'type': det_type,
            'center': center,
            'class_name': detection.get('class_name', 'unknown'),
            'class_name_cn': detection.get('class_name_cn', '未知'),
            'conf': detection.get('conf', 0),
        })
@@ -15,7 +15,9 @@ except Exception:
    from ultralytics import YOLO as _MODEL

# Day 20: prefer a TensorRT engine when one is available
# Day 28: resolve the default path relative to this file, not the CWD
_DEFAULT_YOLOE_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "model", "yoloe-11l-seg.pt")
DEFAULT_MODEL_PATH = get_best_model_path(os.getenv("YOLOE_MODEL_PATH", _DEFAULT_YOLOE_PATH))
TRACKER_CFG = os.getenv("YOLO_TRACKER_YAML", "bytetrack.yaml")

class YoloEBackend:
@@ -24,6 +24,10 @@ from mediapipe.framework.formats import landmark_pb2
from ultralytics import YOLO
from ultralytics.utils.plotting import Colors
import bridge_io

# Day 26: suppress the pygame community welcome message
import os
os.environ['PYGAME_HIDE_SUPPORT_PROMPT'] = "1"
import pygame  # used to play local audio files

from audio_player import play_audio_threadsafe