Research on the Regulatory Path for Training Data Risks in Generative Artificial Intelligence
Cybersecurity and Data Governance (网络安全与数据治理)
Xing Luyuan1, Shen Xinyi2, Wang Jiayi3
1 School of Law, Nanjing University, Nanjing 210046, China; 2 School of Law, London School of Economics and Political Science, London WC2A 2AE, UK; 3 School of Arts and Sciences, Northeast Agricultural University, Harbin 150030, China
Abstract: This article discusses the legal risks and regulatory issues of generative artificial intelligence, such as ChatGPT, with respect to training data. It first analyzes problems concerning data sources, discriminatory tendencies, data quality, and security risks in generative AI. Through a comparative study of the Chinese and European legal systems, it proposes clearly defining governance principles and developing improvement pathways for data compliance. Finally, at the level of concrete measures, it offers specific suggestions for improving China's current legal regulation, providing a useful reference for the healthy development and legal regulation of generative AI.
CLC number: DF9    Document code: A    DOI: 10.19358/j.issn.2097-1788.2024.01.002
Citation: Xing Luyuan, Shen Xinyi, Wang Jiayi. Research on the regulatory path for training data risks in generative artificial intelligence[J]. Cybersecurity and Data Governance, 2024, 43(1): 10-18.
Key words: generative AI; Artificial Intelligence Act; training data risks; data compliance
生成式人工智能中的訓(xùn)練數(shù)據(jù)風險不同于以往僅能進行分類、預(yù)測或?qū)崿F(xiàn)特定功能的模型,生成式人工智能大模型(Large Generative AI Models,LGAIMs)經(jīng)過訓(xùn)練可生成新的文本、圖像或音頻等內(nèi)容,且具有強大的涌現(xiàn)特性和泛化能力[1]。訓(xùn)練數(shù)據(jù)表示為概率分布,LGAIMs可以實現(xiàn)自行學(xué)習(xí)訓(xùn)練數(shù)據(jù)中的模式和關(guān)系,可以生成訓(xùn)練數(shù)據(jù)集之外的內(nèi)容[2]。同時,LGAIMs與用戶之間進行人機交互所產(chǎn)生的數(shù)據(jù)還會被用于大模型的迭代訓(xùn)練。LGAIMs的開發(fā)者往往需要使用互聯(lián)網(wǎng)上公開的數(shù)據(jù)以及和用戶的交互數(shù)據(jù)作為訓(xùn)練數(shù)據(jù),而這些數(shù)據(jù)可能存在諸多合規(guī)風險,例如數(shù)據(jù)來源風險、歧視風險和質(zhì)量風險。
Article download: http://ihrv.cn/resource/share/2000005886

此內(nèi)容為AET網(wǎng)站原創(chuàng),未經(jīng)授權(quán)禁止轉(zhuǎn)載。
