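HKU, Tongji, and UC Berkeley Introduce a New Object Detection Paradigm: Sparse R-CNN

Date: 2020-11-27

Source: 机器之心 (Synced)

Keywords: object detection

　　A brand-new object detection paradigm: Sparse R-CNN.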

　　This article introduces one of our recent works, Sparse R-CNN.

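　　Following the Dense and Dense-to-Sparse frameworks in object detection, Sparse R-CNN establishes a purely Sparse framework. It does away with concepts such as anchor boxes, reference points, and the Region Proposal Network (RPN), and needs no Non-Maximum Suppression (NMS) post-processing. On the standard COCO benchmark, a single ResNet-50 FPN model trained with the standard 3x training schedule reaches 44.5 AP at 22 FPS.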

　　Paper: https://arxiv.org/abs/2011.12450

　　Code: https://github.com/PeizeSun/SparseR-CNN

　　1. Motivation

　　Let us first briefly review the two mainstream families of methods in object detection.

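　　The first family is the dense detector, widely used since before the deep learning era; examples include DPM, YOLO, RetinaNet, and FCOS. In a dense detector, a large number of object candidates, such as sliding windows, anchor boxes, or reference points, are preset on the image grid or the feature-map grid, and the detector directly predicts the scaling/offset from each candidate to the ground truth (gt) together with the object category.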

　　The second family is the dense-to-sparse detector, for example the R-CNN family. These methods predict regression and classification for a sparse set of candidates, but that sparse set itself comes from a dense detector.
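　　To give a sense of scale, the following minimal sketch counts the candidates a dense detector might enumerate. It assumes a RetinaNet-style setup that the article does not spell out: five pyramid levels with strides 8 to 128, 9 anchors per location, and an 800 x 1333 input. The function name `count_dense_anchors` is hypothetical.

```python
import math


def count_dense_anchors(height, width,
                        strides=(8, 16, 32, 64, 128),
                        anchors_per_location=9):
    """Count anchor boxes tiled over every location of every pyramid level.

    Strides and anchors_per_location are assumed values for illustration.
    """
    total = 0
    for stride in strides:
        rows = math.ceil(height / stride)   # feature-map height at this level
        cols = math.ceil(width / stride)    # feature-map width at this level
        total += rows * cols * anchors_per_location
    return total


dense_count = count_dense_anchors(800, 1333)
print(dense_count)
```

　　Under these assumptions the count comes to 200,700 candidates for a single image.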

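　　These two frameworks have driven academic research and industrial applications across the whole field. Object detection may seem saturated, yet some inherent limitations of the dense property remain unsatisfying:

　　NMS post-processing

　　many-to-one positive/negative sample assignment

　　the design of prior candidates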

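　　So a natural question is: can we design a purely sparse framework? Recently, DETR offered one sparse design. Its candidates are a sparse set of learnable object queries, positive and negative samples are assigned by one-to-one optimal bipartite matching, and the final detections are output directly without NMS. However, in DETR every object query attends to the global feature map, which is in essence still dense. We argue that a sparse detection framework should be sparse in two respects: sparse candidates and sparse feature interaction. Based on this, we propose Sparse R-CNN.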

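　　Sparse R-CNN discards dense concepts such as anchor boxes and reference points and starts directly from a sparse set of learnable proposals, with no NMS post-processing. The whole network is exceptionally clean and simple, and can be regarded as a new detection paradigm.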

　　2. Sparse R-CNN

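　　The object candidates of Sparse R-CNN are a set of learnable parameters of shape N*4, where N is the number of object candidates, typically 100 to 300, and 4 stands for the four boundaries of a box. These parameters are trained and optimized together with the other parameters of the network. That's it: there is none of the enumeration of thousands upon thousands of candidates found in dense detectors. This sparse set of object candidates serves as the proposal boxes used to extract Region of Interest (RoI) features, from which regression and classification are predicted.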
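　　As a rough illustration, the proposal boxes can be pictured as a plain N x 4 table of numbers that an optimizer would update. The sketch below is not the authors' code. It assumes boxes stored as normalized (cx, cy, w, h) and each box initialized to cover the whole image; both choices, and the helper names, are assumptions made for illustration.

```python
def init_proposal_boxes(num_proposals=100):
    """Create N x 4 learnable parameters as a plain list of lists.

    Assumption: normalized (cx, cy, w, h) in [0, 1], every box starting
    as the whole image. In training these values would be updated by
    gradient descent like any other network parameter.
    """
    return [[0.5, 0.5, 1.0, 1.0] for _ in range(num_proposals)]


def to_absolute_xyxy(boxes, image_width, image_height):
    """Convert normalized (cx, cy, w, h) boxes to pixel (x1, y1, x2, y2)."""
    converted = []
    for cx, cy, w, h in boxes:
        x1 = (cx - w / 2) * image_width
        y1 = (cy - h / 2) * image_height
        x2 = (cx + w / 2) * image_width
        y2 = (cy + h / 2) * image_height
        converted.append([x1, y1, x2, y2])
    return converted


proposal_boxes = init_proposal_boxes(100)            # N = 100
abs_boxes = to_absolute_xyxy(proposal_boxes, 1333, 800)
print(len(abs_boxes), abs_boxes[0])
```

　　The same N boxes are used for every input image; only the image size used to scale them changes.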

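　　The learned proposal boxes can be understood as statistics of the positions where objects are likely to appear in an image. RoI features extracted from such a coarse representation are clearly not enough to localize and classify objects precisely. We therefore introduce a feature-level candidate, the proposal features, which are also a set of learnable parameters, of shape N*d. Here N is the number of object candidates, in one-to-one correspondence with the proposal boxes, and d is the feature dimension, typically 256. Each proposal feature interacts one-to-one with the RoI feature extracted by its proposal box, making the RoI feature more useful for localizing and classifying the object. In contrast to the original 2-fc Head, we call our design the Dynamic Instance Interactive Head.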
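　　The one-to-one interaction can be sketched as follows. The article only states that each proposal feature interacts with its own RoI feature; the mechanism used here, where the proposal feature generates the weights of two small linear layers that are then applied to the RoI feature, is an assumption based on the paper's description. Normalization layers and the output projection are omitted, the dimensions are tiny, and random weights stand in for learned ones. All function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dynamic_interaction(proposal_feature, roi_feature, param_gen, d, d_dyn):
    """Filter one RoI feature with kernels generated from one proposal feature.

    proposal_feature: length-d vector; roi_feature: (S*S) x d matrix;
    param_gen: d x (2*d*d_dyn) matrix that produces both kernels.
    """
    params = matmul([proposal_feature], param_gen)[0]
    split = d * d_dyn
    kernel1 = [params[i * d_dyn:(i + 1) * d_dyn] for i in range(d)]          # d x d_dyn
    kernel2 = [params[split + i * d:split + (i + 1) * d] for i in range(d_dyn)]  # d_dyn x d
    hidden = relu(matmul(roi_feature, kernel1))
    return relu(matmul(hidden, kernel2))


rng = random.Random(0)
N, d, d_dyn, S = 3, 8, 4, 2                      # toy sizes, not the paper's
param_gen = random_matrix(d, 2 * d * d_dyn, rng)
proposal_features = random_matrix(N, d, rng)     # N x d learnable parameters
roi_features = [random_matrix(S * S, d, rng) for _ in range(N)]

# The k-th proposal feature only ever sees the k-th RoI feature.
outputs = [dynamic_interaction(pf, rf, param_gen, d, d_dyn)
           for pf, rf in zip(proposal_features, roi_features)]
print(len(outputs), len(outputs[0]), len(outputs[0][0]))
```

　　Because each proposal feature touches only its own S*S RoI locations, the amount of interaction grows with N rather than with the size of the full feature map.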

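　　The two distinguishing characteristics of Sparse R-CNN are sparse object candidates and sparse feature interaction: there are neither thousands of dense candidates nor dense global feature interaction. Sparse R-CNN can be seen as extending object detection frameworks in a direction that runs from dense, to dense-to-sparse, to sparse.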

　　3. Architecture Design

　　The network design of Sparse R-CNN is modeled on the R-CNN family.

　　The backbone is a ResNet-based FPN.

　　The head is a series of iterative Dynamic Instance Interactive Heads: the output features and output boxes of one head serve as the proposal features and proposal boxes of the next. The proposal features go through self-attention before interacting with the RoI features.
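　　The iterative structure can be sketched as a simple loop in which each head consumes the boxes and features produced by the previous one. This is a sketch under stated assumptions, not the authors' implementation: the (dx, dy, dw, dh) box refinement follows the usual R-CNN-style parameterization, `toy_head` is a hypothetical stub that replaces the real head, and the number of stages is arbitrary.

```python
import math


def apply_deltas(box, delta):
    """Refine an (x1, y1, x2, y2) box with (dx, dy, dw, dh) offsets."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = delta
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return [new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h]


def run_iterative_heads(heads, proposal_boxes, proposal_features):
    """Feed each head's output boxes and features into the next head."""
    boxes, features = proposal_boxes, proposal_features
    for head in heads:
        deltas, features = head(boxes, features)
        boxes = [apply_deltas(b, d) for b, d in zip(boxes, deltas)]
    return boxes, features


def toy_head(boxes, features):
    """Hypothetical stand-in for a Dynamic Instance Interactive Head.

    A real head would apply self-attention to the features, pool RoI
    features from the boxes, and let the two interact. This stub only
    halves each box's width and height and adds 1 to every feature value.
    """
    shrink = math.log(0.5)
    deltas = [(0.0, 0.0, shrink, shrink) for _ in boxes]
    new_features = [[v + 1.0 for v in f] for f in features]
    return deltas, new_features


final_boxes, final_features = run_iterative_heads(
    [toy_head, toy_head], [[0.0, 0.0, 100.0, 100.0]], [[0.0, 0.0]])
print(final_boxes, final_features)
```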

　　The training loss is a set prediction loss based on optimal bipartite matching.
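　　The matching step behind such a loss can be illustrated with a small example. The sketch below is a simplification: the cost is just the negative class probability plus an L1 box distance (the article does not give the actual cost terms or weights), and a brute-force search over permutations stands in for the Hungarian algorithm, which is only practical for tiny inputs. For realistic sizes one would typically use `scipy.optimize.linear_sum_assignment`. The data layout and function names are hypothetical.

```python
from itertools import permutations


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def match_cost(pred, gt, box_weight=1.0):
    """Simplified pairwise cost: low when class and box both agree."""
    return -pred["probs"][gt["label"]] + box_weight * l1_distance(pred["box"], gt["box"])


def bipartite_match(preds, gts):
    """Return, for each ground truth, the index of its matched prediction.

    Brute force over all one-to-one assignments; assumes len(preds) >= len(gts).
    """
    best_cost, best_assignment = float("inf"), None
    for assignment in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(assignment))
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_assignment


preds = [
    {"probs": [0.1, 0.9], "box": [0.1, 0.1, 0.4, 0.4]},
    {"probs": [0.8, 0.2], "box": [0.5, 0.5, 0.9, 0.9]},
    {"probs": [0.5, 0.5], "box": [0.0, 0.0, 1.0, 1.0]},
]
gts = [
    {"label": 0, "box": [0.5, 0.5, 0.9, 0.9]},
    {"label": 1, "box": [0.1, 0.1, 0.4, 0.4]},
]

matches = bipartite_match(preds, gts)
print(matches)  # (1, 0): gt 0 -> prediction 1, gt 1 -> prediction 0
```

　　Each ground-truth object is assigned exactly one prediction, and the unmatched prediction (index 2 here) would be supervised as background. This one-to-one assignment is what allows the detector to output its results without NMS.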

　　Starting from Faster R-CNN (40.2 AP), directly replacing the RPN with a sparse set of learnable proposal boxes drops AP to 18.5; introducing the iterative structure raises AP to 32.2; and introducing dynamic instance interaction finally brings it to 42.3 AP.

　　4. Performance

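　　We follow the 3x training schedule of Detectron2 and therefore compare Sparse R-CNN with the detectors in Detectron2 (many methods do not report 3x results, so they are not included). We also compare against DETR and Deformable DETR, which likewise need no NMS post-processing. Sparse R-CNN is highly competitive in detection accuracy, inference time, and training convergence speed.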

　　5. Conclusion

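　　For a period after R-CNN and Fast R-CNN appeared, an important research direction in object detection was to propose more efficient region proposal generators. Faster R-CNN and RPN stood out among them and have had broad and lasting influence. Sparse R-CNN shows for the first time that a simple set of learnable parameters used as proposal boxes can achieve comparable performance. We hope our work offers some inspiration for end-to-end object detection.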
