CLC number: D922.17; TP181
Document code: A
DOI: 10.19358/j.issn.2097-1788.2026.02.010
Chinese citation format: 王婉清. 機器學習中已公開個人數據的合法利用路徑 [J]. 網絡安全與數據治理, 2026, 45(2): 73-80.
English citation format: Wang Wanqing. Legal use of publicly available personal data in machine learning [J]. Cyber Security and Data Governance, 2026, 45(2): 73-80.
Legal use of publicly available personal data in machine learning
Wang Wanqing
China Institute for Rule of Law Strategy, East China University of Political Science and Law
Abstract: Publicly available personal data is a crucial corpus for machine learning training and should in principle be governed by an orientation toward open utilization and more permissive acquisition strategies. In practice, however, lawful use is hindered by ambiguity in the permissible scope of web scraping, by infringement risks in generative AI applications, and by the difficulty data subjects face in exercising informational self-determination. To resolve this dilemma, a lawful utilization pathway for such data should be constructed across the full machine learning cycle. At the data collection stage, the legitimacy and potential impact of web scraping should be assessed; where competitive interests are involved, access should shift to lawful channels such as API authorization so that data sources are legal. At the application stage of machine learning outputs, a classified security mechanism should be established based on the type of personal information, with real-time supervision to prevent privacy breaches and misuse. After deployment in the market, a data disclosure mechanism should be implemented so that transparency supports user intervention and safeguards the right to personal information autonomy.
Keywords: publicly available personal data; machine learning; competitive interests; personal data protection; information disclosure