Compare commits
5 Commits
| Author | SHA1 | Date |
|---|---|---|
| | 538f987a10 | |
| | 2c2798fe04 | |
| | 861772950f | |
| | fec2523b5c | |
| | 0e08372bad | |
398
README.md
@@ -1,234 +1,164 @@
# YOLO11l-seg Indoor Blind-Guidance Model Training Guide

This setup is tailored to a Dell R730 server (RTX 3090 24GB, 192GB RAM) and focuses on **instance segmentation** for indoor blind-guidance scenes.

## 1. Hardware and Environment Setup

### 1.1 System and Environment

```bash
# 1. Create the virtual environment
conda create -n yolo python=3.10 -y
conda activate yolo

# 2. Install PyTorch (CUDA 12.4)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# 3. Install Ultralytics
pip install ultralytics

# 4. Other dependencies
pip install opencv-python matplotlib albumentations
```

### 1.2 Monitoring the GPU

```bash
watch -n 1 nvidia-smi
```

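
Before starting a long run, it can help to confirm that the packages installed above are importable from the active `yolo` environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list uses import names (for example `cv2` for `opencv-python`), which is an assumption about how the packages are imported rather than something this guide states.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in import_names if find_spec(name) is None]


if __name__ == "__main__":
    # Import names, not pip names: opencv-python is imported as cv2.
    required = ["torch", "torchvision", "ultralytics", "cv2",
                "matplotlib", "albumentations"]
    missing = missing_packages(required)
    print("missing:", missing if missing else "none")
```

This only checks that the packages can be located; it does not verify that PyTorch sees the GPU.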

---

## 2. Dataset

### 2.1 Source Dataset

The source is the **MIT Indoor Scene Classification** dataset on Roboflow:

| Item | Details |
|------|------|
| Source | [Roboflow Universe](https://universe.roboflow.com/pasha-renaisan-gmail-com/indoor-mit-scene-classification) |
| Format | YOLOv11 (with segmentation annotations) |
| Original classes | **2573** (too many and too noisy) |

### 2.2 Class Filtering

The original 2573 classes include many objects that are irrelevant to blind guidance (e.g. alarm clock, bananas), so they need to be filtered.

The `filter_categories.py` script reduces the classes from **2573 → 14** guidance-relevant classes:

| ID | Class | Description |
|----|------|------|
| 0 | `floor` | Walkable floor |
| 1 | `corridor` | Corridor/passage |
| 2 | `sidewalk` | Sidewalk |
| 3 | `chair` | Chair |
| 4 | `table` | Table |
| 5 | `sofa_bed` | Sofa/bed |
| 6 | `door` | Door |
| 7 | `elevator` | Elevator |
| 8 | `stairs` | Stairs |
| 9 | `wall` | Wall |
| 10 | `person` | Pedestrian |
| 11 | `cabinet` | Cabinet |
| 12 | `trash_can` | Trash can |
| 13 | `window` | Window/glass door |

### 2.3 Filtering Script

```bash
python filter_categories.py
```

Resulting dataset:

| Split | Images |
|------|------|
| Train | 1265 |
| Validation | 363 |
| Test | 175 |

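
The source of `filter_categories.py` is not shown here, so the following is only a sketch of the filtering idea from section 2.2, assuming the labels are in the YOLO segmentation text format (`<class_id> x1 y1 x2 y2 ...` per line). The function names and the original class ids in `EXAMPLE_ID_MAP` are hypothetical and chosen purely for illustration.

```python
def remap_label_line(line, id_map):
    """Remap the class id of one YOLO segmentation label line.

    A label line is "<class_id> x1 y1 x2 y2 ..." with normalized polygon
    coordinates. Returns the rewritten line, or None if the class is dropped.
    """
    parts = line.split()
    if not parts:
        return None
    new_id = id_map.get(int(parts[0]))
    if new_id is None:
        return None  # class is not among the kept classes
    return " ".join([str(new_id)] + parts[1:])


def remap_label_text(text, id_map):
    """Apply remap_label_line to every line of a label file's contents."""
    kept = (remap_label_line(line, id_map) for line in text.splitlines())
    return "\n".join(line for line in kept if line is not None)


# Hypothetical mapping: the original ids 17 and 402 are made up.
EXAMPLE_ID_MAP = {17: 3, 402: 6}  # -> chair (3), door (6)
```

Objects whose original class is absent from the mapping are dropped, which is how a 2573-class label set could shrink to the 14 kept classes.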
### 2.4 Directory Layout

```
/home/rongye/ProgramFiles/Yolo/
├── yolo11l-seg.pt              # pretrained weights (standard model)
├── train.py                    # training script
├── data.yaml                   # dataset config
├── filter_categories.py        # class-filtering script
└── datasets/
    └── indoor_blind/           # filtered dataset
        ├── data.yaml
        ├── train/
        │   ├── images/
        │   └── labels/
        ├── valid/
        │   ├── images/
        │   └── labels/
        └── test/
            ├── images/
            └── labels/
```

---

## 3. Training Configuration

### 3.1 Dataset Config `data.yaml`

```yaml
# Blind-guidance segmentation dataset: 14 core classes
path: /home/rongye/ProgramFiles/Yolo/datasets/indoor_blind
train: train/images
val: valid/images
test: test/images

nc: 14
names:
  - floor      # 0: walkable floor
  - corridor   # 1: corridor/passage
  - sidewalk   # 2: sidewalk
  - chair      # 3: chair
  - table      # 4: table
  - sofa_bed   # 5: sofa/bed
  - door       # 6: door
  - elevator   # 7: elevator
  - stairs     # 8: stairs
  - wall       # 9: wall
  - person     # 10: pedestrian
  - cabinet    # 11: cabinet
  - trash_can  # 12: trash can
  - window     # 13: window/glass door
```

### 3.2 Training Script `train.py`

```python
from ultralytics import YOLO

# Note: use the standard yolo11l-seg.pt, not yoloe-11l-seg.pt.
# YOLOE does not support custom-class training!
model = YOLO("/home/rongye/ProgramFiles/Yolo/yolo11l-seg.pt")

results = model.train(
    data="data.yaml",
    epochs=150,            # 150 epochs recommended for the guidance task
    imgsz=640,             # training resolution
    batch=16,              # fits comfortably in the 3090's memory
    device=1,              # second 3090
    workers=16,            # dataloader worker processes
    cache='ram',           # cache the full dataset in the 192GB RAM
    optimizer='AdamW',
    close_mosaic=10,       # disable mosaic for the last 10 epochs
    amp=True,              # mixed precision
    project="blind_guide_project",
    name="yoloe_seg_blind_v1"
)
```

> ⚠️ **Important**: use `yolo11l-seg.pt`, not `yoloe-11l-seg.pt`!
> YOLOE is a zero-shot model and does not support custom-class training.


---

## 4. Training Results

### 4.1 Model Information

| Item | Value |
|------|------|
| Model | YOLO11l-seg |
| Parameters | 27.6M |
| Training epochs | 150 |

### 4.2 Inference Performance

| Stage | Speed |
|------|------|
| During training | 4.8 ms/image |
| During validation | 23 ms/image |

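
To relate the per-image times in section 4.2 to a frame rate, the conversion is simply 1000 divided by the latency in milliseconds. The sketch below assumes each figure covers the whole per-image pipeline, which the table does not state, so the results are rough upper bounds rather than measured throughput.

```python
def ms_to_fps(ms_per_image):
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_image


for stage, ms in [("training-time", 4.8), ("validation-time", 23.0)]:
    print(f"{stage}: {ms} ms/image is about {ms_to_fps(ms):.1f} images/s")
# training-time: 4.8 ms/image is about 208.3 images/s
# validation-time: 23.0 ms/image is about 43.5 images/s
```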
### 4.3 Output Location

```
blind_guide_project/yoloe_seg_blind_v1/
├── weights/
│   ├── best.pt      ← best model
│   └── last.pt
├── results.csv
└── results.png
```

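
The `results.csv` file in the output directory can be inspected without plotting. The following is a minimal sketch that finds the epoch with the highest value of one metric. The default column name `metrics/mAP50-95(M)` (mask mAP) and the `epoch` column are assumptions about the CSV header, so check them against the actual file; the helper name `best_epoch` is hypothetical.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP50-95(M)"):
    """Return (epoch, value) for the row with the highest `metric`.

    Header names and values are stripped because the file may pad them
    with spaces.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    best = None
    for row in reader:
        clean = {k.strip(): (v or "").strip()
                 for k, v in row.items() if k is not None}
        value = float(clean[metric])
        if best is None or value > best[1]:
            best = (int(float(clean["epoch"])), value)
    return best
```

A call would look like `best_epoch(open("blind_guide_project/yoloe_seg_blind_v1/results.csv").read())`. The epoch it reports is not necessarily the one `best.pt` was saved from, since the checkpoint may be selected on a combination of metrics.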

---

## 5. Inference and Deployment

### 5.1 Validating the Model

```bash
python -c "from ultralytics import YOLO; m=YOLO('best.pt'); m.predict('test/images', save=True)"
```

### 5.2 Exporting to TensorRT

```python
from ultralytics import YOLO

model = YOLO("blind_guide_project/yoloe_seg_blind_v1/weights/best.pt")
model.export(format="engine", imgsz=480, half=True, device=0)
```

### 5.3 Renaming the Model

A clearer file name is recommended:

```bash
cp best.pt yolo11l-seg-indoor14.pt
```

---

## 6. FAQ

**Q: Why not use YOLOE?**

YOLOE is a zero-shot/open-vocabulary model designed to detect arbitrary objects from text descriptions, and it **does not support custom-class training**. Training fails with:
```
RuntimeError: shape '[16, 78, -1]' is invalid for input of size 14745600
```

**Q: Why filter the classes?**

The original MIT Indoor dataset has 2573 classes, most of which are irrelevant to blind guidance (e.g. alarm clock, bananas). The 14 filtered classes are more focused and train better.

**Q: Train at 640 but run inference at 480?**

Yes. YOLO11l tolerates changes in input scale well, and exporting the TensorRT engine at 480 gives a further speedup.
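
As a back-of-the-envelope view of the 640 vs. 480 trade-off, the number of input pixels scales with the square of the side length. The sketch below computes that ratio and checks that the export size is a multiple of the network stride; the stride value of 32 is an assumption here, and the actual speedup depends on the engine and hardware and is not measured in this guide.

```python
def pixel_ratio(export_size, train_size):
    """Ratio of pixels in a square image at export_size vs. train_size."""
    return (export_size / train_size) ** 2


def is_stride_multiple(size, stride=32):
    """True if size is a multiple of the stride (32 is assumed here)."""
    return size % stride == 0


print(pixel_ratio(480, 640))      # 0.5625
print(is_stride_multiple(480))    # True
```

At 480 the network processes roughly 56% of the pixels it sees at 640.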

# YOLO11l-seg Indoor Blind-Guidance Model Training Guide (V2)

This setup is tailored to a Dell R730 server (RTX 3090 24GB, 192GB RAM) and trains **instance segmentation** on a large merged dataset.

## 1. Hardware and Environment Setup

### 1.1 System and Environment

```bash
# 1. Create the virtual environment
conda create -n yolo python=3.10 -y
conda activate yolo

# 2. Install PyTorch (CUDA 12.4)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# 3. Install Ultralytics and common tools
pip install ultralytics tqdm opencv-python pyyaml

# 4. Other dependencies (opencv-python is already installed in step 3)
pip install matplotlib albumentations
```

### 1.2 Monitoring the GPU

```bash
watch -n 1 nvidia-smi
```

---

## 2. Dataset (V2, Large Merged Set, 22 Classes)

### 2.1 Dataset Composition

We merged high-quality segmentation datasets from several sources. Each source was cleaned and its classes were remapped:

| Source dataset | Description | Role |
|------|------|------|
| **MIT Indoor** | Indoor scene classification | Core supplement: provides everyday objects such as `appliance`, `tableware`, `furniture` |
| **Indoor Blind** | Earlier filtered blind-guidance data | Core base data (walkable areas) |
| **Washroom** | Washroom segmentation | Covers key washroom fixtures such as `toilet` and `sink` |
| **Spoon Fork Chopstick** | Cutlery close-ups | Greatly improves recognition of small utensils within `tableware` (chopsticks/spoons/forks) |
| **Stair Seg** | Stair segmentation | Improves stair recognition |
| **Stair Chair Couch** | Stair/chair segmentation | Adds high-quality chair and stair data |

**Removed**:
- **Estima AI**: removed because it consists entirely of design drawings rather than real-scene photos.

### 2.2 Class Definitions (22 Classes)

The 22 unified core blind-guidance classes (IDs follow the order of `names` in `data.yaml`):

| ID | Class | Description | ID | Class | Description |
|----|------|------|----|------|------|
| 0 | `floor` | Floor | 11 | `cabinet` | Cabinet |
| 1 | `corridor` | Corridor | 12 | `trash_can` | Trash can |
| 2 | `sidewalk` | Sidewalk | 13 | `person` | Pedestrian |
| 3 | `chair` | Chair | 14 | `bag` | Bag/backpack |
| 4 | `table` | Table | 15 | `electronics` | Electronics |
| 5 | `sofa_bed` | Sofa/bed | 16 | `plant` | Plant |
| 6 | `door` | Door | 17 | `obstacle` | Generic obstacle |
| 7 | `elevator` | Elevator | 18 | `appliance` | Home appliance |
| 8 | `stairs` | Stairs | 19 | `toilet` | Washroom/toilet |
| 9 | `wall` | Wall | 20 | `sink` | Sink |
| 10 | `window` | Window | 21 | `tableware` | Tableware/items |

> **Note**: `tableware` (ID 21) covers cups, bowls, plates, spoons, chopsticks, bottles, and similar items.

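
Class IDs are determined by position in the `names` list of `data.yaml` (section 3.1), so the ID of any class can be derived directly from that list. The sketch below copies the list from `blind_guidance_merged/data.yaml`; the variable names are only for illustration.

```python
# names copied from datasets/blind_guidance_merged/data.yaml
NAMES = [
    "floor", "corridor", "sidewalk", "chair", "table", "sofa_bed", "door",
    "elevator", "stairs", "wall", "window", "cabinet", "trash_can", "person",
    "bag", "electronics", "plant", "obstacle", "appliance", "toilet", "sink",
    "tableware",
]

ID_TO_NAME = dict(enumerate(NAMES))                           # 0 -> "floor", ...
NAME_TO_ID = {name: idx for idx, name in ID_TO_NAME.items()}  # "floor" -> 0, ...

print(len(NAMES))               # 22
print(NAME_TO_ID["tableware"])  # 21
print(ID_TO_NAME[9])            # wall
```

Deriving IDs this way keeps any lookup table in step with the config that training actually reads.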
### 2.3 Directory Layout (Server Side)

Make sure `/home/rongye/ProgramFiles/Yolo/` on the server is laid out as follows:

```
/home/rongye/ProgramFiles/Yolo/
├── yolo11l-seg.pt                  # pretrained weights (standard model)
├── train_merged.py                 # 🔥 main training script (uses the merged data)
├── datasets/
│   └── blind_guidance_merged/      # 🔥 main V2 dataset
│       ├── data.yaml               # config (22 classes, path must be correct)
│       ├── train/
│       ├── valid/
│       └── test/
└── blind_guide_project/            # training log output
```

---

## 3. Training Configuration

### 3.1 Main Config `blind_guidance_merged/data.yaml`

**Key point**: `path` must be an absolute path on the server.

```yaml
# path must be an absolute path on the server
path: /home/rongye/ProgramFiles/Yolo/datasets/blind_guidance_merged
train: train/images
val: valid/images
test: test/images

nc: 22
names: [floor, corridor, sidewalk, chair, table, sofa_bed, door, elevator, stairs, wall, window, cabinet, trash_can, person, bag, electronics, plant, obstacle, appliance, toilet, sink, tableware]
```

### 3.2 Training Script `train_merged.py`

```python
from ultralytics import YOLO

# Load the standard model
model = YOLO("/home/rongye/ProgramFiles/Yolo/yolo11l-seg.pt")

# Start training
results = model.train(
    data="/home/rongye/ProgramFiles/Yolo/datasets/blind_guidance_merged/data.yaml",
    epochs=200,                # larger dataset, so train for 200 epochs
    imgsz=640,                 # training resolution
    batch=16,                  # plenty of room in the 3090's 24GB
    device=1,                  # use GPU 1
    workers=16,                # dataloader worker processes
    cache="ram",               # cache the full dataset in the 192GB RAM
    optimizer="AdamW",         # optimizer
    close_mosaic=15,           # disable mosaic augmentation for the last 15 epochs
    project="blind_guide_project",
    name="yolo11l_blind_v2",   # V2 run
    amp=False                  # ⚠️ critical: mixed precision off to prevent NaN/Inf loss
)
```

> **Note**: if training shows `Loss NaN/Inf`, make sure `amp=False` is set. Mixed precision saves GPU memory, but with some data distributions it can cause gradient overflow.

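
The overflow mentioned in the note comes from the narrow range of half precision: the largest finite FP16 value is 65504. The sketch below only illustrates that numeric limit using the standard library's half-precision packing; it does not reproduce mixed-precision training or diagnose the actual cause of the NaN in this dataset, and the helper name `fits_in_fp16` is hypothetical.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value):
    """True if value can be packed as a finite half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except OverflowError:
        return False


print(fits_in_fp16(FP16_MAX))   # True
print(fits_in_fp16(70000.0))    # False
```

A value such as 70000 is unremarkable in FP32 but cannot be represented in FP16, which is the kind of overflow that running in full precision avoids.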

---

## 4. Training Workflow

### Step 1: Upload the Data
Upload the locally generated `blind_guidance_merged` folder in full to `/home/rongye/ProgramFiles/Yolo/datasets/` on the server.

> **⚠️ Reminder**: after uploading, check that the `path` field in the server's `data.yaml` is `/home/rongye/ProgramFiles/Yolo/datasets/blind_guidance_merged`. If it is still a Windows path, edit it by hand or upload a corrected version.

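
The check described in the reminder can be automated. The following is a minimal sketch that takes an already parsed config dictionary (for example from `yaml.safe_load`, assuming PyYAML from section 1.1 is available) and reports common problems: a non-POSIX `path`, a mismatch between `nc` and `names`, and missing split keys. The function name `check_data_config` is hypothetical.

```python
from pathlib import PurePosixPath


def check_data_config(cfg):
    """Return a list of problems found in a parsed data.yaml dictionary."""
    problems = []
    path = str(cfg.get("path", ""))
    # A Windows-style path (drive letter, backslashes) is not absolute
    # under POSIX rules, so it is flagged here.
    if not PurePosixPath(path).is_absolute():
        problems.append(f"path is not an absolute POSIX path: {path!r}")
    names = cfg.get("names", [])
    if cfg.get("nc") != len(names):
        problems.append(f"nc={cfg.get('nc')} but names has {len(names)} entries")
    for key in ("train", "val", "test"):
        if key not in cfg:
            problems.append(f"missing key: {key}")
    return problems
```

An empty list means none of these problems was found; it does not confirm that the image folders exist on the server.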
### Step 2: Run Training
```bash
cd /home/rongye/ProgramFiles/Yolo
python train_merged.py
```

### Step 3: Monitor
Use `watch -n 1 nvidia-smi` to watch GPU memory usage.

---

## 5. Model Export (TensorRT)

After V2 training completes, the best model is at `blind_guide_project/yolo11l_blind_v2/weights/best.pt`.

Export script:
```python
from ultralytics import YOLO
model = YOLO("blind_guide_project/yolo11l_blind_v2/weights/best.pt")
# Export a TensorRT engine in FP16 with a fixed input size of 480 (or 640, depending on what the inference side needs)
model.export(format="engine", imgsz=480, half=True, device=0)
```
Target file name: `yolo11l-seg-indoor.engine` (rename after export)

24
data.yaml
@@ -1,24 +0,0 @@
# Blind-guidance segmentation dataset: 14 core classes filtered from MIT Indoor
# Original 2573 classes -> 14 guidance-relevant classes

path: /home/rongye/ProgramFiles/Yolo/datasets/indoor_blind
train: train/images
val: valid/images
test: test/images

nc: 14
names:
- floor # 0: walkable floor
- corridor # 1: corridor/passage
- sidewalk # 2: sidewalk
- chair # 3: chair
- table # 4: table
- sofa_bed # 5: sofa/bed
- door # 6: door
- elevator # 7: elevator
- stairs # 8: stairs
- wall # 9: wall
- person # 10: pedestrian
- cabinet # 11: cabinet
- trash_can # 12: trash can
- window # 13: window/glass door
28
datasets/blind_guidance_merged/data.yaml
Normal file
@@ -0,0 +1,28 @@
names:
- floor
- corridor
- sidewalk
- chair
- table
- sofa_bed
- door
- elevator
- stairs
- wall
- window
- cabinet
- trash_can
- person
- bag
- electronics
- plant
- obstacle
- appliance
- toilet
- sink
- tableware
nc: 22
path: /home/rongye/ProgramFiles/Yolo/datasets/blind_guidance_merged
test: test/images
train: train/images
val: valid/images
30
datasets/floor/data.yaml
Normal file
@@ -0,0 +1,30 @@
train: train/images
val: valid/images
test: test/images

nc: 22
names:
- floor
- corridor
- sidewalk
- chair
- table
- sofa_bed
- door
- elevator
- stairs
- wall
- window
- cabinet
- trash_can
- person
- bag
- electronics
- plant
- obstacle
- appliance
- toilet
- sink
- tableware

path: /home/rongye/ProgramFiles/Yolo/datasets/floor
20
datasets/indoor_blind/data.yaml
Normal file
@@ -0,0 +1,20 @@
names:
- floor
- corridor
- sidewalk
- chair
- table
- sofa_bed
- door
- elevator
- stairs
- wall
- person
- cabinet
- trash_can
- window
nc: 14
path: /home/rongye/ProgramFiles/Yolo/datasets/indoor_blind
test: test/images
train: train/images
val: valid/images
11
floor_finetune.md
Normal file
@@ -0,0 +1,11 @@
```bash
yolo segment train \
  model=yolo11l-seg-indoor.pt \
  data=datasets/floor/data.yaml \
  epochs=10 \
  freeze=20 \
  amp=False \
  device=1 \
  project=/home/rongye/ProgramFiles/Yolo/blind_guide_project \
  name=floor_finetune
```
2
train.py
@@ -6,7 +6,7 @@ model = YOLO("/home/rongye/ProgramFiles/Yolo/yolo11l-seg.pt")

 # Start training
 results = model.train(
-    data="data.yaml",
+    data="/home/rongye/ProgramFiles/Yolo/datasets/indoor_blind/data.yaml",
     epochs=150,      # 150 epochs recommended for the guidance task, to ensure convergence
     imgsz=640,       # keep the training resolution at 640
     batch=16,        # the 3090 has plenty of memory; 16-32 both work

28
train_merged.py
Normal file
@@ -0,0 +1,28 @@
# train_merged.py
# -*- coding: utf-8 -*-
"""
Training script for the merged dataset.
Run on the server: python train_merged.py
"""

from ultralytics import YOLO

# Use the standard YOLO11l-seg model (not YOLOE!)
model = YOLO("/home/rongye/ProgramFiles/Yolo/yolo11l-seg.pt")

# Start training
results = model.train(
    data="/home/rongye/ProgramFiles/Yolo/datasets/blind_guidance_merged/data.yaml",
    epochs=200,          # larger dataset, so more epochs
    imgsz=640,           # training resolution
    batch=16,            # fits comfortably in the 3090's memory
    device=1,            # second 3090
    workers=16,          # dataloader worker processes
    cache='ram',         # cache the full dataset in the 192GB RAM
    optimizer='AdamW',
    close_mosaic=15,     # disable mosaic for the last 15 epochs
    amp=False,           # mixed precision off (disabled because of NaN loss)
    patience=30,         # early-stopping patience
    project="blind_guide_project",
    name="yolo11l_blind_v2"  # version 2
)