CLC number: F49    Document code: A    DOI: 10.19358/j.issn.2097-1788.2026.04.001
Citation: Zhang Qianqian, Yin Hongyu, Yang Guang. Data Factory: an emerging form of national data infrastructure[J]. Cyber Security and Data Governance, 2026, 45(4): 2-8.
Data Factory: an emerging form of national data infrastructure
Zhang Qianqian¹, Yin Hongyu², Yang Guang³
1. School of Computer Science and Artificial Intelligence, Beijing Wuzi University; 2. Beijing Lianhai Information Systems Co., Ltd.; 3. China Information Technology Security Evaluation Center
Abstract: Realizing the value of data as a factor of production faces widespread challenges, including insufficient supply, restricted circulation, and ineffective utilization. The root cause is the immaturity of data production modes: high-quality datasets are still produced in a workshop-style manner that cannot meet the large-scale data demands of Artificial Intelligence (AI) large models. To address this problem, this paper proposes the concept of the "Data Factory", defined as a data infrastructure dedicated to the facility-based, large-scale, and standardized production of high-quality datasets for AI large model applications. By tracing the evolution of infrastructure forms across the industrial society, the information society, and the data-intelligent society, the paper establishes the theoretical logic of the Data Factory as a fundamental building block of national data infrastructure. Based on their physical distribution, organizational structure, and technological sophistication, Data Factories are classified into three types: centralized, semi-centralized, and distributed. Five key features are identified: diversity, facility orientation, scalability, standardization, and AI integration. The study concludes that developing Data Factories can effectively break through the data supply bottleneck in AI development, promote collaboration between the upstream and downstream segments of the data industry chain, and provide a critical path for bridging the "last mile" between data and AI empowerment.
Key words: Data Factory; data infrastructure; high-quality dataset; data factorization