Research on efficient data quality validation methods
Cyber Security and Data Governance (網絡安全與數據治理)
Cheng Sheng1, Gao Meihua2, Ran Ying1, Qiao Yujie2, Huang Peng1
1. Travelsky Technology Limited Chongqing Branch; 2. Travelsky Technology Limited
Abstract: Data quality monitoring is a critical challenge in data processing. This study proposes a method that generates data quality validation rules and alert strategies from rule configuration and validates multiple indicators concurrently within a single SQL query, significantly improving validation efficiency. The design principles of the technical solution are described in detail and performance comparison experiments are reported: data processing speed improves by 10~20 times and rule configuration efficiency by 4~5 times, while resource consumption in actual use (CPU, memory) remains essentially the same as that of traditional solutions and accuracy stays at 99.99%. The study provides a quantifiable reference practice for data quality management in big data environments.
Keywords: big data; data governance; data quality; data cleaning; data quality monitoring
CLC number: TP311.13; F562 Document code: A DOI: 10.19358/j.issn.2097-1788.2025.10.012
Citation format: Cheng Sheng, Gao Meihua, Ran Ying, et al. Research on efficient data quality validation methods[J]. Cyber Security and Data Governance, 2025, 44(10): 75-79.

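Introduction

In 2022 the Civil Aviation Administration of China issued the Guiding Opinions on the Construction and Development of Civil Aviation Big Data [1], which explicitly call for stronger data quality management. Travelsky is the main provider of civil aviation data services, so the quality of its data directly affects aviation safety and operational efficiency.

According to statistics from the International Data Governance Association [2], data quality problems cost enterprises an average of 15%~25% of their revenue each year. Traditional data quality tools (such as Apache Griffin) suffer from complex configuration (4~6 steps per rule on average) and low execution efficiency (a separate SQL query for each indicator). Based on empirical analysis, this study proposes an optimized scheme. Validation in a production environment shows that configuration efficiency improves by a factor of 5 (from an average of 15 min per rule to 3 min per rule) and execution speed by a factor of 8~10 (merging multiple indicators into one query reduces I/O overhead).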
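
The idea of merging several indicators into one query can be illustrated with a minimal sketch. It is not the authors' implementation: the rule names, the `bookings` table, and the helper functions are hypothetical, and SQLite stands in for whatever engine is used in production. Each configured rule is an aggregate expression, and all of them are folded into a single SELECT so that the table is scanned once instead of once per indicator.

```python
import sqlite3

# Hypothetical rule configuration: metric name -> SQL aggregate expression.
# These names and expressions are illustrative assumptions, not the paper's rules.
RULES = {
    "row_count": "COUNT(*)",
    "null_passenger": "SUM(CASE WHEN passenger IS NULL THEN 1 ELSE 0 END)",
    "negative_fare": "SUM(CASE WHEN fare < 0 THEN 1 ELSE 0 END)",
    "distinct_flights": "COUNT(DISTINCT flight_no)",
}


def build_combined_query(table, rules):
    """Fold every rule into one SELECT so the table is scanned once."""
    columns = ", ".join(f"{expr} AS {name}" for name, expr in rules.items())
    return f"SELECT {columns} FROM {table}"


def run_combined(conn, table, rules):
    """Run the merged query and return {metric name: value}."""
    row = conn.execute(build_combined_query(table, rules)).fetchone()
    return dict(zip(rules, row))


def run_separately(conn, table, rules):
    """Baseline for comparison: one query (one scan) per metric."""
    return {
        name: conn.execute(f"SELECT {expr} FROM {table}").fetchone()[0]
        for name, expr in rules.items()
    }


def demo():
    """Build a tiny in-memory table and evaluate the rules both ways."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bookings (flight_no TEXT, passenger TEXT, fare REAL)")
    conn.executemany(
        "INSERT INTO bookings VALUES (?, ?, ?)",
        [
            ("CA101", "Li", 800.0),
            ("CA101", None, 650.0),    # missing passenger name
            ("MU202", "Wang", -5.0),   # invalid negative fare
            ("MU202", "Zhao", 720.0),
        ],
    )
    combined = run_combined(conn, "bookings", RULES)
    separate = run_separately(conn, "bookings", RULES)
    conn.close()
    return combined, separate


if __name__ == "__main__":
    print(demo())
```

Both paths return the same metric values; the merged form issues one statement regardless of how many rules are configured, which is where the reduction in I/O would come from. The sketch interpolates rule expressions directly into SQL, so it assumes the rule configuration is trusted.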


The full text is available for download at:

http://ihrv.cn/resource/share/2000006830


Author information:

Cheng Sheng1, Gao Meihua2, Ran Ying1, Qiao Yujie2, Huang Peng1

(1. Travelsky Technology Limited Chongqing Branch, Chongqing 401120;

2. Travelsky Technology Limited, Beijing 101300)

