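This semester I'm taking Computational Intelligence, and one assignment is object detection in foggy conditions. A search on Baidu turned up few relevant blog posts, so I'm writing down how I did it.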

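I don't have a GPU of my own, and Google Colab comes with many deep learning frameworks such as PyTorch and TensorFlow preinstalled, so I run the model directly on Colab. There are plenty of Colab tutorials on Baidu, so I won't cover that here. This post is about getting this model running with mmdetection on Colab.
The dataset is RTTS, which is in VOC format, so mmdetection can use it after a few code changes. Below is what I did on Colab. Code is Python unless noted; lines starting with ! are shell commands and lines starting with % are IPython magics (such as %cd). First, let's see which GPU we got for free.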

!nvidia-smi

Output:

Fri Dec 20 00:57:04 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    26W / 250W |      0MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I keep the dataset in Google Drive, so Drive has to be mounted first. Colab also supports uploading files directly, if you would rather not use Drive.

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Clone mmdetection from GitHub:

!git clone https://github.com/open-mmlab/mmdetection.git

Then install mmdetection:

%cd /content/mmdetection/
!pip install mmcv
!python setup.py develop 

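Once the install finishes, it is ready to use. I have to say Colab is really nice: setting up the environment on my own machine takes a lot of time, while on Colab one install step is enough. Next, copy the dataset over from Google Drive and unzip it.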

%cd /content/mmdetection/
!mkdir data
%cd data
# Copy the dataset over from Google Drive
!cp '/content/drive/My Drive/VOC2007/RTTS_.zip' RTTS_.zip
# Unzip
!unzip RTTS_.zip
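Before touching any config, it is worth checking that the archive unpacked into the layout the later steps expect. The sketch below is not part of mmdetection: `summarize_voc_layout` is a hypothetical helper name, and it assumes the VOC-style folders JPEGImages, Annotations and ImageSets/Main sit directly under data/RTTS with .png images, which is how they are used further down in this post.

```python
from pathlib import Path


def summarize_voc_layout(root):
    """Count the files a VOC-style loader would look for under `root`.

    Assumes the folder names JPEGImages, Annotations and ImageSets/Main.
    """
    root = Path(root)
    return {
        "images": len(list((root / "JPEGImages").glob("*.png"))),
        "annotations": len(list((root / "Annotations").glob("*.xml"))),
        "split_files": sorted(
            p.name for p in (root / "ImageSets" / "Main").glob("*.txt")
        ),
    }


# In Colab, after unzipping:
# print(summarize_voc_layout("/content/mmdetection/data/RTTS"))
```

If the image and annotation counts differ, or train.txt and val.txt are missing from split_files, the paths in the config will need adjusting.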

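With the dataset ready, the next step is to change some code so that this dataset runs. The model is Faster R-CNN with a ResNet-101 backbone, so the config to edit is ./configs/faster_rcnn_r101_fpn_1x.py. mmdetection's default dataset is COCO, so first change the dataset type and path: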

dataset_type = 'VOCDataset'
data_root = '/content/mmdetection/data/'

Next, set the paths for the training and validation sets:

data = dict(
    imgs_per_gpu=5,
    workers_per_gpu=5,
    train=dict(
        type=dataset_type,
        # training set
        ann_file=data_root + 'RTTS/ImageSets/Main/train.txt',
        img_prefix=data_root + 'RTTS/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        # validation set
        ann_file=data_root + 'RTTS/ImageSets/Main/val.txt',
        img_prefix=data_root + 'RTTS/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        # test set (unused here, so it points at the validation split)
        ann_file=data_root + 'RTTS/ImageSets/Main/val.txt',
        img_prefix=data_root + 'RTTS/',
        pipeline=test_pipeline))

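Because I pass --validate during training, the validation set is evaluated after every epoch and effectively serves as the test set, so the test entry is never used and can point anywhere. The GPU Colab gave me has 16 GB of memory, so running out of memory is unlikely, and I raised imgs_per_gpu (the per-GPU batch size, which is what drives memory use) and workers_per_gpu (the number of data-loading workers) to 5. The training log further down reports about 13 GB in use with this setting. This depends on which GPU Colab assigns you each session; if it has less memory, I would not recommend raising imgs_per_gpu.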
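One knob that goes hand in hand with imgs_per_gpu is the learning rate. The mmdetection documentation notes that the default lr in its configs (0.02 for this one) assumes 8 GPUs with 2 images each, a total batch of 16, and suggests scaling it linearly with the batch size. The run in this post kept the default and did not try the scaled value, so treat the following as a rough sketch; `scaled_lr` is a made-up helper, not an mmdetection function.

```python
def scaled_lr(imgs_per_gpu, num_gpus, base_lr=0.02, base_batch=16):
    """Linear scaling rule: lr grows in proportion to the total batch size.

    base_lr and base_batch are the documented defaults (8 GPUs x 2 images).
    """
    total_batch = imgs_per_gpu * num_gpus
    return base_lr * total_batch / base_batch


# One GPU with 5 images per GPU, as in this post
print(scaled_lr(imgs_per_gpu=5, num_gpus=1))
```

If you want to try it, the value would go into the lr field of the optimizer dict in the same config file.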
Then change the logging interval to 100; printing every 50 iterations is too frequent.

log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
    ])

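Finally, set the number of epochs, the number of classes, and the working directory. Note that num_classes is 6 rather than 5 because, in this version of mmdetection, it also counts the background class: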

num_classes=6
total_epochs = 20
work_dir = './work_dirs/faster_rcnn_r101_fpn_1x/hzdtc'
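To keep num_classes from drifting out of sync with the class list, it can be derived rather than typed by hand. This is only a sketch, assuming the mmdetection 1.x convention where the background counts as one extra class; other versions may count differently, so check the config you start from.

```python
# The five RTTS classes used throughout this post
CLASSES = ('bicycle', 'bus', 'car', 'motorbike', 'person')

# Assumption: mmdetection 1.x counts the background as an extra class
num_classes = len(CLASSES) + 1

print(num_classes)  # prints 6
```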

That completes the training config. Our dataset still differs from standard VOC2007 in a few ways, though, so some of the source code has to change as well.

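  1. Edit /mmdetection/mmdet/datasets/voc.py and change CLASSES and year. Leaving year alone raises an error: the original constructor infers the year from whether 'VOC2007' or 'VOC2012' appears in img_prefix, and our prefix ends in RTTS/ (I also changed the dataset's directory layout, so check how your own copy is laid out).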
class VOCDataset(XMLDataset):

    CLASSES = ('bicycle', 'bus', 'car', 'motorbike', 'person')

    def __init__(self, **kwargs):
        super(VOCDataset, self).__init__(**kwargs)
        self.year = 2007
        # if 'VOC2007' in self.img_prefix:
        #     self.year = 2007
        # elif 'VOC2012' in self.img_prefix:
        #     self.year = 2012
        # else:
        #     raise ValueError('Cannot infer dataset year from img_prefix')
  2. Edit /mmdetection/mmdet/core/evaluation/class_names.py:
def voc_classes():
    return [
        'bicycle', 'bus', 'car', 'motorbike', 'person'
    ]
  3. Edit /mmdetection/mmdet/datasets/xml_style.py. The images in this dataset are .png, while the standard VOC dataset uses .jpg; without this change the images cannot be read.
    def load_annotations(self, ann_file):
        img_infos = []
        img_ids = mmcv.list_from_file(ann_file)
        for img_id in img_ids:
            # change .jpg to .png here
            filename = 'JPEGImages/{}.png'.format(img_id) 
            xml_path = osp.join(self.img_prefix, 'Annotations',
                                '{}.xml'.format(img_id))
            tree = ET.parse(xml_path)
            root = tree.getroot()
            size = root.find('size')
            width = int(size.find('width').text)
            height = int(size.find('height').text)
            img_infos.append(
                dict(id=img_id, filename=filename, width=width, height=height))
        return img_infos
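
Since CLASSES now has to match the names used in the annotation files exactly, a quick scan of the XML files can catch typos or extra classes before a long training run. This is a sketch using only the standard library; `count_object_names` and `unknown_names` are hypothetical helpers, and it assumes each annotation follows the usual VOC structure of an object element containing a name element.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def count_object_names(ann_dir):
    """Count how often each object name appears across all XML files."""
    counts = Counter()
    for xml_path in Path(ann_dir).glob("*.xml"):
        root = ET.parse(str(xml_path)).getroot()
        for obj in root.findall("object"):
            name = obj.findtext("name", default="").strip()
            counts[name] += 1
    return counts


def unknown_names(counts, classes):
    """Return annotation names that are missing from the class tuple."""
    return sorted(set(counts) - set(classes))


# In Colab:
# counts = count_object_names("/content/mmdetection/data/RTTS/Annotations")
# print(counts)
# print(unknown_names(counts, ('bicycle', 'bus', 'car', 'motorbike', 'person')))
```

An empty list from unknown_names means every annotated object maps to one of the five classes.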

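With these changes in place, training can start. Even with a single GPU I recommend the distributed launcher, because (at least in the version I used) only distributed training supports the --validate flag, which reports the model's mAP after each epoch.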

%cd /content/mmdetection
!CUDA_VISIBLE_DEVICES=0 ./tools/dist_train.sh configs/faster_rcnn_r101_fpn_1x.py 1 --validate

Once training starts, it first prints the configuration from faster_rcnn_r101_fpn_1x.py, and then prints the mAP after every epoch. It looks like this:

2019-12-19 05:01:42,257 - INFO - load model from: torchvision://resnet101
2019-12-19 05:01:42,782 - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
    
2019-12-19 05:01:50,571 - INFO - Start running, host: root@ad882785deec, work_dir: /content/mmdetection/work_dirs/faster_rcnn_r101_fpn_1x/hzdtc
2019-12-19 05:01:50,571 - INFO - workflow: [('train', 1)], max: 20 epochs
2019-12-19 05:04:06,393 - INFO - Epoch [1][100/779]	lr: 0.00931, eta: 5:50:24, time: 1.358, data_time: 0.035, memory: 13119, loss_rpn_cls: 0.1735, loss_rpn_bbox: 0.0350, loss_cls: 0.3752, acc: 90.8566, loss_bbox: 0.1861, loss: 0.7698
2019-12-19 05:06:19,575 - INFO - Epoch [1][200/779]	lr: 0.01197, eta: 5:44:45, time: 1.332, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0913, loss_rpn_bbox: 0.0369, loss_cls: 0.3376, acc: 89.8496, loss_bbox: 0.2308, loss: 0.6967
2019-12-19 05:08:32,287 - INFO - Epoch [1][300/779]	lr: 0.01464, eta: 5:41:00, time: 1.327, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0504, loss_rpn_bbox: 0.0313, loss_cls: 0.3012, acc: 89.9437, loss_bbox: 0.2278, loss: 0.6106
2019-12-19 05:10:44,460 - INFO - Epoch [1][400/779]	lr: 0.01731, eta: 5:37:40, time: 1.322, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0474, loss_rpn_bbox: 0.0313, loss_cls: 0.2860, acc: 90.3688, loss_bbox: 0.2042, loss: 0.5689
2019-12-19 05:12:56,712 - INFO - Epoch [1][500/779]	lr: 0.01997, eta: 5:34:50, time: 1.323, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0533, loss_rpn_bbox: 0.0311, loss_cls: 0.2851, acc: 90.4473, loss_bbox: 0.1882, loss: 0.5577
2019-12-19 05:15:09,968 - INFO - Epoch [1][600/779]	lr: 0.02000, eta: 5:32:38, time: 1.333, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0441, loss_rpn_bbox: 0.0287, loss_cls: 0.2779, acc: 90.5734, loss_bbox: 0.1895, loss: 0.5403
2019-12-19 05:17:22,536 - INFO - Epoch [1][700/779]	lr: 0.02000, eta: 5:30:10, time: 1.326, data_time: 0.014, memory: 13119, loss_rpn_cls: 0.0372, loss_rpn_bbox: 0.0275, loss_cls: 0.2568, acc: 91.1914, loss_bbox: 0.1738, loss: 0.4953
terminal width is too small (0), please consider widen the terminal for better progressbar visualization
[>>>>>>>>>>] 433/433, 6.8 task/s, elapsed: 64s, ETA:     0s
    
+-----------+------+-------+--------+-----------+-------+
| class     | gts  | dets  | recall | precision | ap    |
+-----------+------+-------+--------+-----------+-------+
| bicycle   | 52   | 1109  | 0.673  | 0.032     | 0.222 |
| bus       | 175  | 3312  | 0.731  | 0.039     | 0.249 |
| car       | 1820 | 12465 | 0.902  | 0.136     | 0.755 |
| motorbike | 101  | 2383  | 0.901  | 0.039     | 0.463 |
| person    | 853  | 10286 | 0.884  | 0.075     | 0.617 |
+-----------+------+-------+--------+-----------+-------+
| mAP       |      |       |        |           | 0.461 |
+-----------+------+-------+--------+-----------+-------+
2019-12-19 05:20:13,279 - INFO - Epoch [1][779/779]	lr: 0.02000, mAP: 0.4612
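
The mAP in the last row of the table is simply the mean of the per-class ap column. Plugging in the rounded values from the table above reproduces the logged number:

```python
# Per-class AP copied from the evaluation table after epoch 1
per_class_ap = {
    "bicycle": 0.222,
    "bus": 0.249,
    "car": 0.755,
    "motorbike": 0.463,
    "person": 0.617,
}

# mAP is the unweighted mean over classes
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(round(mean_ap, 4))  # prints 0.4612
```

The precision column looks alarming but is expected: precision is computed over all detections the model emits, so with 1109 bicycle detections against only 52 ground-truth boxes it is bound to be small.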

Once training finishes, the built-in log analysis tool can visualize the run. Here I only look at how mAP and loss change:

%cd /content/mmdetection
!python tools/analyze_logs.py plot_curve ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json --keys mAP --legend mAP --out mAP.jpg
!python tools/analyze_logs.py plot_curve ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json --keys loss --legend loss --out loss.jpg

Output:

/content/mmdetection
plot curve of ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json, metric is mAP
save curve to: mAP.jpg
plot curve of ./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json, metric is loss
save curve to: loss.jpg
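
If you want the raw numbers rather than a plot, the .log.json file can also be read directly. The sketch below assumes the file holds one JSON object per line, with keys such as mode, epoch and the metric name; that matches what I would expect from this logger, but check your own file first. `read_metric` is a hypothetical helper, not part of mmdetection.

```python
import json


def read_metric(log_json_path, key, mode=None):
    """Collect (epoch, value) pairs for `key` from a JSON-lines log.

    Lines without `key` are skipped; `mode` optionally filters on the
    record's "mode" field (for example "train" or "val").
    """
    values = []
    with open(log_json_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if key not in record:
                continue
            if mode is not None and record.get("mode") != mode:
                continue
            values.append((record.get("epoch"), record[key]))
    return values


# In Colab:
# log = "./work_dirs/faster_rcnn_r101_fpn_1x/hzdtc/20191219_050150.log.json"
# print(read_metric(log, "mAP"))
```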

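Colab has no GUI, so the plots cannot pop up in a window. Instead, I write them out as .jpg files and display them with the PIL module. There may be a better way, but I don't know it.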

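from IPython.display import display
from PIL import Image
# A notebook cell only auto-displays its last expression, so call display() for each image
mAP = Image.open('mAP.jpg')
display(mAP)
loss = Image.open('loss.jpg')
display(loss)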

I won't include the plots here; you will see them if you run this yourself. That's all. I hope it helps.