CLC number: TP391.1 Document code: A DOI: 10.16157/j.issn.0258-7998.245345 Citation: Yang Jiajia, Li Zheng, Zheng Er, et al. A data matching algorithm based on instruction pipeline[J]. Application of Electronic Technique, 2025, 51(2): 81-85.
A data matching algorithm based on instruction pipeline
Yang Jiajia, Li Zheng, Zheng Er, Zhao Jing, Yan Wei, Liu Jin
The Sixth Research Institute of China Electronics Corporation
Abstract: Data matching based on regular expressions has significant application value in basic data governance and data cleaning. In high-performance computing workloads, however, the low throughput of existing matching algorithms cannot meet the performance requirements of big data processing, which limits where they can be applied. To address this problem, this paper proposes γFA, a high-performance data matching algorithm based on instruction pipelining. γFA uses the vector instruction pipeline built into the Intel architecture to read in multiple character segments at once. It then compares each segment against the untrusted character set in a pipelined fashion through a wide vector comparison function and converts the comparison result into an integer vector. A position location function accumulates over this integer vector to locate the first untrusted character, from which the number of characters that can be skipped is calculated. Skipping these characters reduces the substantial time overhead the regular expression matching engine incurs by accessing slow memory while processing the character stream, and thereby accelerates regular expression matching. Experimental results show that the throughput of γFA is 15.88 to 53.06 times higher than that of the original DFA algorithm and 35.12% to 63.26% higher than that of the βFA algorithm, a better acceleration effect. With further optimization, γFA reaches a throughput close to 100 Gb/s, which is 15.88 to 64.94 times higher than that of the original DFA matching algorithm and 2.15% to 43.09% higher than that of the unoptimized γFA.
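The compare, locate, and skip steps described in the abstract can be illustrated with a minimal scalar sketch. This is not the authors' γFA implementation. It assumes 16-byte blocks (one 128-bit vector), represents the untrusted character set as a set of byte values, and emulates the vector comparison and position location steps in pure Python. The names `compare_mask`, `first_set_bit`, `skip_to_untrusted` and `find_ab`, as well as the toy DFA that searches for `b"ab"`, are hypothetical.

```python
# Scalar emulation of the "vector compare -> integer mask -> first position -> skip" idea.
# Assumptions (hypothetical, not taken from the paper): 16-byte blocks, and the
# untrusted character set given as a frozenset of byte values.

BLOCK = 16  # bytes per emulated vector register (assumed width)


def compare_mask(block: bytes, untrusted: frozenset) -> int:
    """Emulate a lane-wise vector compare collapsed into an integer mask:
    bit i is 1 iff block[i] belongs to the untrusted set."""
    mask = 0
    for i, b in enumerate(block):
        if b in untrusted:
            mask |= 1 << i
    return mask


def first_set_bit(mask: int) -> int:
    """Index of the lowest set bit (what a trailing-zero count returns);
    -1 when the mask is zero, i.e. the block has no untrusted byte."""
    if mask == 0:
        return -1
    return (mask & -mask).bit_length() - 1


def skip_to_untrusted(data: bytes, pos: int, untrusted: frozenset) -> int:
    """Return the position of the first untrusted byte at or after pos,
    or len(data) if none exists. Bytes before it are skipped without
    consulting the DFA transition table."""
    while pos < len(data):
        block = data[pos:pos + BLOCK]
        offset = first_set_bit(compare_mask(block, untrusted))
        if offset >= 0:
            return pos + offset
        pos += len(block)  # whole block is trusted: skip it in one step
    return len(data)


def find_ab(data: bytes) -> int:
    """Hypothetical toy DFA: unanchored search for b"ab", returning the match
    start or -1. In state 0 every byte except 'a' loops back to state 0, so the
    untrusted set of state 0 is {'a'} and runs of other bytes can be skipped."""
    untrusted = frozenset(b"a")
    state, pos = 0, 0
    while pos < len(data):
        if state == 0:
            pos = skip_to_untrusted(data, pos, untrusted)
            if pos == len(data):
                break
        c = data[pos]
        if state == 1 and c == ord("b"):
            return pos - 1  # the preceding byte was the 'a' that started the match
        state = 1 if c == ord("a") else 0
        pos += 1
    return -1
```

For example, `find_ab(b"x" * 40 + b"ab")` inspects only two 16-byte masks and one partial block before it reaches the DFA at position 40. On real hardware, one plausible mapping (again an assumption, since the abstract does not name instructions) is an SSE2 `_mm_cmpeq_epi8` per untrusted byte, ORed together and collapsed with `_mm_movemask_epi8`, followed by a trailing-zero count. Because each untrusted byte needs its own comparison, this approach works best when the untrusted set is small.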
Key words: regular expression matching; instruction pipeline; high-performance data matching