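Crowd Counting Algorithm Based on Transformer and Semantic Enhancement
Network Security and Data Governance, 2023, No. 5
He Qing, Yang Qianqian, Peng Sifan, Yin Baoqun
(School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China)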
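Abstract: To address scale variation in crowd images, a crowd counting algorithm based on Transformer and semantic enhancement is proposed. To cope effectively with scale variation, a Transformer is first introduced as the backbone network to model the global context and obtain a global receptive field. Feature maps from adjacent levels of the backbone are then fused in sequence from top to bottom, and the semantic information of the multi-level feature maps is strengthened during fusion. Next, dynamic feature selection is applied to the multi-level feature maps to select the features suited to density map generation. Finally, the density map is adjusted with an attention map to resist background interference, yielding a high-quality crowd density estimation map. Extensive experiments on three datasets, ShanghaiTech, UCF-QNRF and JHU-CROWD++, verify the effectiveness of the algorithm, and the results show that the proposed algorithm effectively improves the accuracy and robustness of the model.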
CLC number: TP391.1
Document code: A
DOI:10.19358/j.issn.2097-1788.2023.05.009
Citation format: He Qing, Yang Qianqian, Peng Sifan, et al. Crowd counting algorithm based on Transformer and semantic enhancement[J]. Network Security and Data Governance, 2023, 42(5): 50-58.
Transformer and semantic enhancement for crowd counting
He Qing,Yang Qianqian,Peng Sifan,Yin Baoqun
(School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China)
Abstract: To address scale variation in crowd images, this paper proposes a crowd counting algorithm based on Transformer and semantic enhancement. First, a Transformer is introduced as the backbone network because it can model the global context and obtain a global receptive field, which helps it deal effectively with scale variation. Then, the feature maps of adjacent backbone levels are fused in turn from top to bottom, and the semantic information of the multi-level feature maps is strengthened during fusion. Next, dynamic feature selection is applied to the multi-level feature maps to select the features suitable for density map generation. Finally, the density map is adjusted by attention masks to resist background interference, so as to generate a high-quality crowd density estimation map. Extensive experiments on the ShanghaiTech, UCF-QNRF and JHU-CROWD++ datasets verify the effectiveness of the algorithm. The experimental results show that the proposed algorithm effectively improves the accuracy and robustness of the model.
Key words: crowd counting; Transformer; semantic enhancement; feature selection
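0 Introduction
Crowd counting plays an important role in video surveillance, crowd analysis and public safety. Given how frequently large-scale crowd gatherings occur, analyzing crowds in congested scenes is essential. At present, however, practical applications of crowd counting remain heavily constrained, and among these constraints the inconsistent size of heads within an image has drawn particular attention from most researchers. Because the height and angle of the camera are limited, captured images suffer from perspective distortion, which leads to large differences in target scale within a single image. As shown in Figure 1, targets far from the camera appear small, whereas targets near the camera appear large. To address scale variation, this paper proposes a crowd counting algorithm based on Transformer and semantic enhancement: a Transformer provides a global receptive field, features from adjacent levels are fused from top to bottom while their semantic information is enhanced, and the features suited to density map generation are dynamically selected, so as to generate a high-quality crowd density estimation map.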
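The data flow described above (top-down fusion of adjacent levels, feature selection across levels, and attention-based adjustment of the density map) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the random arrays stand in for Transformer backbone features, and the additive fusion, the softmax gate in `select_features`, the channel-mean density head and all function names are hypothetical simplifications chosen only to show the shapes involved. The one standard fact it relies on is that the estimated count is the sum of the density map.

```python
import numpy as np

def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def top_down_fuse(features):
    """Fuse adjacent levels from the coarsest (top) level down to the finest.

    `features` is ordered fine -> coarse; each level is assumed to have the
    same channel count and half the spatial size of the previous level.
    """
    fused = [features[-1]]                           # start at the top level
    for feat in reversed(features[:-1]):
        fused.append(feat + upsample(fused[-1], 2))  # inject higher-level semantics
    return fused[::-1]                               # back to fine -> coarse order

def select_features(fused):
    """Soft selection over levels (hypothetical stand-in for a learned gate)."""
    h = fused[0].shape[1]
    resized = [upsample(f, h // f.shape[1]) for f in fused]
    scores = np.array([f.mean() for f in resized])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over levels
    selected = sum(w * f for w, f in zip(weights, resized))
    return selected, weights

def masked_density(selected, attention_logits):
    """Non-negative density map scaled by a sigmoid attention map."""
    density = np.maximum(selected.mean(axis=0), 0.0)       # stand-in regression head
    attention = 1.0 / (1.0 + np.exp(-attention_logits))    # values in (0, 1)
    return density * attention

rng = np.random.default_rng(0)
# Three hypothetical backbone levels: 32x32, 16x16 and 8x8, 8 channels each.
features = [rng.standard_normal((8, 32 // 2**i, 32 // 2**i)) for i in range(3)]
fused = top_down_fuse(features)
selected, weights = select_features(fused)
density_map = masked_density(selected, rng.standard_normal((32, 32)))
estimated_count = float(density_map.sum())  # count = sum over the density map
```

In a trained model the gate, the density head and the attention branch would all be learned layers; the sketch only fixes the order of operations and the tensor shapes at each stage.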
For the full text of this paper, please download: http://ihrv.cn/resource/share/2000005334
