Research on Regulatory Pathways for Training Data Risks in Generative Artificial Intelligence
Cyber Security and Data Governance (網絡安全與數據治理)
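Xing Luyuan1, Shen Xinyi2, Wang Jiayi3
1 School of Law, Nanjing University, Nanjing, Jiangsu 210046, China; 2 School of Law, London School of Economics and Political Science, London WC2A 2AE, UK; 3 School of Arts and Sciences, Northeast Agricultural University, Harbin, Heilongjiang 150030, China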
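Abstract: This paper examines the legal risks and regulatory issues surrounding the training data of generative artificial intelligence such as ChatGPT. It first analyzes the problems generative AI faces with respect to data sources, discriminatory tendencies, data quality, and security risks. Through a comparative study of the Chinese and European legal systems, it recommends clearly defining governance principles and developing a path for improving data compliance. Finally, at the level of concrete measures, it offers specific suggestions for improving China's current legal regulation, providing a useful reference for the healthy development and legal regulation of generative AI.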
CLC number: DF9; Document code: A; DOI: 10.19358/j.issn.2097-1788.2024.01.002
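Citation format: Xing Luyuan, Shen Xinyi, Wang Jiayi. Research on regulatory pathways for training data risks in generative artificial intelligence [J]. Cyber Security and Data Governance, 2024, 43(1): 10-18.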
Legal regulation and enhancement path for mitigating risks in training
Xing Luyuan1, Shen Xinyi2, Wang Jiayi3
1 School of Law, Nanjing University, Nanjing 210046, China; 2 School of Law, London School of Economics and Political Science, London WC2A 2AE, England; 3 School of Arts and Sciences, Northeast Agricultural University, Harbin 150030, China
Abstract: This article discusses the legal risks and regulatory issues that arise in the training data of generative artificial intelligence such as ChatGPT. It begins by analyzing issues related to data sources, tendencies towards discrimination, data quality, and security risks in generative AI. Through a comparative study of the Chinese and European legal systems, it then proposes clearly defining governance principles and developing comprehensive pathways for data compliance. Finally, the article offers specific, practical recommendations for improving the current legal regulations in China. These suggestions are intended to serve as useful references for the healthy development and legal regulation of generative artificial intelligence.
Keywords: generative AI; artificial intelligence act; training data risks; data compliance
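Training data risks in generative artificial intelligence
Unlike earlier models that could only classify, predict, or perform specific functions, Large Generative AI Models (LGAIMs) are trained to generate new content such as text, images, or audio, and they exhibit strong emergent properties and generalization ability [1]. The training data is represented as a probability distribution; LGAIMs can learn the patterns and relationships in the training data on their own and can generate content that lies outside the training data set [2]. In addition, the data produced through human-computer interaction between LGAIMs and their users is also used for iterative training of the models. Developers of LGAIMs often need to use data publicly available on the internet, together with user interaction data, as training data, and this data may carry many compliance risks, such as data source risks, discrimination risks, and quality risks.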
Article download link: http://ihrv.cn/resource/share/2000005886