Data governance technical process for machine learning modeling
Li Yanze¹, Guo Chao², Sun Xuming², Mu Dongjie²
1. Beijing PERCENT Technology Group Co., Ltd.; 2. China Electronics Industry Engineering Co., Ltd.
Abstract: With the rapid development of artificial intelligence and machine learning, data quality has become a core determinant of model performance and reliability. Across the different types of machine learning models, how to implement data governance effectively so as to improve data quality, stability, and fairness remains a pressing open problem. This paper reviews the role of data governance in machine learning modeling and proposes a systematic data governance framework that covers the entire process, from data collection, processing, and annotation through model training. The framework is intended to provide practical governance solutions for machine learning applications. It applies targeted technical measures at each stage to keep governance effective, thereby improving data quality and supporting model interpretability, stability, and fairness. This work provides a theoretical foundation for the in-depth application of data governance in machine learning and offers guidance for subsequent technical practice and innovation.
Keywords: data governance; machine learning; artificial intelligence; architecture; data management; model training