CLC number: TP391
Document code: A
DOI: 10.16157/j.issn.0258-7998.244851
Chinese citation format: 張明凱,胡軍國,劉江南,等. 基于深度學習的可視化圖表分類方法研究[J]. 電子技術應用,2024,50(5):58-65.
English citation format: Zhang Mingkai, Hu Junguo, Liu Jiangnan, et al. Research on visualization chart classification method based on deep learning[J]. Application of Electronic Technique, 2024, 50(5): 58-65.
Research on visualization chart classification method based on deep learning
1. College of Mathematics and Computer Science, Zhejiang A & F University; 2. College of Chemistry and Materials Engineering, Zhejiang A & F University
Abstract: Chart classification is an important step in chart comprehension and document parsing. This paper constructs two datasets, each containing 16 common chart types, using web scraping and software generation. These datasets offer advantages in quantity, type coverage, and stylistic diversity. Experiments comparing Transformer and convolutional neural network (CNN) architectures on three datasets indicate that the Transformer architecture has an advantage in the chart classification task. Building on the Swin Transformer model, this paper designs several data augmentation strategies that not only improve the model's generalization but also introduce distribution differences between the models trained with them. Averaging the predictions of models trained with different strategies yields a significant improvement in classification performance over the individual models. The ensemble model was tested on six test sets, and its classification accuracy exceeded 0.9 on all of them. For generated charts with high image quality and simple visual forms, the model's classification accuracy approached 1.
Key words: chart classification; chart comprehension; convolutional neural network; Swin Transformer; model ensemble
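
The prediction-averaging step described in the abstract can be illustrated with a minimal sketch. It assumes that each model outputs one logit per chart class and that the softmax probabilities, rather than the raw logits, are averaged; the abstract does not specify this detail. The function names `softmax` and `ensemble_predict` and the example logits are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert one model's logits into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Average class probabilities over models (assumed averaging scheme).

    per_model_logits: one list of logits per model, all of the same length.
    Returns (predicted class index, averaged probabilities).
    """
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    avg = [sum(p[c] for p in probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg


# Hypothetical logits from three models (e.g. trained with different
# augmentation strategies) for a single image and three chart classes.
example_logits = [
    [2.0, 1.0, 0.1],  # this model alone would pick class 0
    [0.5, 2.5, 0.3],  # this model picks class 1
    [0.2, 2.0, 0.1],  # this model picks class 1
]
best, avg = ensemble_predict(example_logits)
print(best, [round(p, 3) for p in avg])
```

In this toy case the first model disagrees with the other two, and the averaged probabilities favor the class supported by the majority. Averaging probabilities is only one possible choice; averaging logits or majority voting would also match the abstract's wording.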