Application of Electronic Technique (《電子技術應用》)
Research on the Development Paths of Intelligent Computing Operating Systems
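Shi Linan (石里男)1,2, Han Naiping (韓乃平)1, Qi Xuan (齊璇)1, Liu Yijun (劉乙鈞)3
1. KylinSoft Corporation; 2. Tianjin Key Laboratory of Operating System; 3. International Politics Department, University of International Relations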
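Abstract: Artificial intelligence, a typical representative of the new quality productive forces in the information industry, has become a major strategy through which the world's leading countries seek to raise national competitiveness and safeguard national security, while a shortage of computing power is becoming a key bottleneck constraining China's AI development. To address the ecosystem fragmentation that currently affects domestically developed computing power, this paper proposes building an intelligent computing operating system that is based on a general-purpose server operating system with AI enhancements and enabled by an intelligent computing platform, so as to better support the development and operation of AI applications and meet the computing power demands of China's AI development. Focusing on the key components of the intelligent computing platform, the paper describes in detail the state of development, in China and abroad, of heterogeneous resource schedulers and AI programming frameworks, and analyzes progress in the management and scheduling of heterogeneous computing power and in distributed training. After describing the domestic and international AI server markets and the fact that heterogeneous computing resource management has become a reality, it identifies the current state of AI computing power development in China and, through a systematic review of China's policies supporting operating system development, further confirms the feasibility and necessity of developing an intelligent computing operating system. It then examines the two major components of the intelligent computing operating system, namely the AI enhancement of the general-purpose server operating system and the main functions of the intelligent computing platform, and offers suggestions for technical breakthroughs and innovative development of intelligent computing operating systems.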
CLC number: TP316  Document code: A  DOI: 10.16157/j.issn.0258-7998.245364
Chinese citation format: 石里男,韓乃平,齊璇,等. 智算操作系統發展路徑研究[J]. 電子技術應用,2024,50(10):1-6.
English citation format: Shi Linan, Han Naiping, Qi Xuan, et al. Research on the development paths of intelligent computing operating system[J]. Application of Electronic Technique, 2024, 50(10): 1-6.
Research on the development paths of intelligent computing operating system
Shi Linan1,2,Han Naiping1,Qi Xuan1,Liu Yijun3
1.KylinSoft Corporation;2.Tianjin Key Laboratory of Operating System; 3.International Politics Department, University of International Relations
Abstract: At present, artificial intelligence, which stands as a new quality productive force in the IT industry, is a major strategic focus among the major economies for enhancing national competitiveness and safeguarding national security. However, the scarcity of computing power has emerged as a critical bottleneck impeding China’s AI development. To cope with ecosystem fragmentation in self-developed computing power, this paper proposes building an intelligent computing operating system based on a general server operating system with AI enhancement and enabled by an intelligent computing platform. By better supporting the development and operation of AI applications, the intelligent computing operating system can meet the computing power demand of AI development in China. Focusing on the important parts of the intelligent computing platform, the domestic and international development status of heterogeneous resource schedulers and AI programming frameworks is elaborated in detail. The development of the management and scheduling of heterogeneous computing power and of distributed training is also analyzed. Based on a description of the domestic and international AI server market and the reality of heterogeneous computing resource management, the paper points out the current situation of AI computing power development in China, in which chip capability is weak and the industrial ecosystem is scattered. Through a systematic review of the policies supporting operating system development in China, the feasibility and necessity of developing the operating system are further confirmed. Subsequently, this paper presents the AI enhancement of the general server operating system, which is one of the two major components of t
Key words: intelligent computing operating system; computing power; heterogeneous resource scheduler; AI programming framework; server operating system

Introduction

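At present, a new round of technological revolution and industrial transformation is advancing rapidly. With the explosive development of artificial intelligence, large models such as GPT-4 and Sora have appeared in quick succession, profoundly influencing the iteration of operating systems and further expanding their range of applications. Internationally, mainstream operating system vendors such as Microsoft and RedHat have actively embraced AI. Microsoft has invested more than US$10 billion in OpenAI over successive rounds and has launched a series of AI products and solutions, for example by using AI to enhance core products such as the Office suite and Bing search. Linux operating system vendors such as RedHat and Ubuntu provide the corresponding driver support together with regular updates and maintenance, ensuring full compatibility with CUDA and NVIDIA GPUs, and support mainstream machine learning and deep learning frameworks, libraries, and tools such as TensorFlow and PyTorch. In China, however, there is not yet a relatively mature and complete intelligent computing operating system solution suited to the development of large AI models. In addition, the ecosystem fragmentation, architectural differences, and incomplete software of domestic computing platforms are becoming major bottlenecks constraining AI development in China.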

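To address these problems, this paper proposes building an intelligent computing operating system that combines a general-purpose server operating system (with AI enhancements) and an intelligent computing platform (comprising a heterogeneous resource scheduler and an AI programming framework). By flexibly scheduling the computing power of intelligent computing clusters, remaining compatible with the various training and inference frameworks, and supporting the training and inference of typical large models on both mainstream and domestic GPU clusters, such a system can meet the urgent demand for computing power created by China's AI development and provide solid support for an independently developed computing power foundation.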
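As a rough illustration of what a heterogeneous resource scheduler decides, the sketch below places a job on one of several accelerator pools from different vendors. It is a minimal sketch and does not reproduce the design described in the paper: the `Accelerator` and `Job` types, the `schedule` function, the best-fit policy, and the vendor labels `vendor_a` and `vendor_b` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Accelerator:
    """One pool of identical devices on a node (all fields are illustrative)."""
    node: str
    vendor: str        # hypothetical vendor label, not a real product name
    free_devices: int  # devices currently unallocated in this pool
    memory_gb: int     # memory per device


@dataclass
class Job:
    """A training or inference job and its hardware requirements."""
    name: str
    devices: int               # number of devices requested
    min_memory_gb: int         # minimum memory needed per device
    vendors: Tuple[str, ...]   # vendors whose software stack the job can run on


def schedule(job: Job, pools: List[Accelerator]) -> Optional[Accelerator]:
    """Best-fit placement: choose the compatible pool that leaves the fewest idle devices.

    Returns the chosen pool (with its free device count reduced), or None if
    no pool can currently host the job.
    """
    candidates = [
        p for p in pools
        if p.vendor in job.vendors
        and p.free_devices >= job.devices
        and p.memory_gb >= job.min_memory_gb
    ]
    if not candidates:
        return None
    # Tie-break on node name so the result is deterministic.
    best = min(candidates, key=lambda p: (p.free_devices - job.devices, p.node))
    best.free_devices -= job.devices
    return best


if __name__ == "__main__":
    pools = [
        Accelerator("node-1", "vendor_a", free_devices=8, memory_gb=80),
        Accelerator("node-2", "vendor_b", free_devices=4, memory_gb=64),
    ]
    job = Job("train-llm", devices=4, min_memory_gb=48, vendors=("vendor_a", "vendor_b"))
    placed = schedule(job, pools)
    print(placed.node if placed else "pending")
```

Best-fit is only one possible policy. A production scheduler would also need to account for interconnect topology, multi-node jobs, priorities, and preemption, none of which are modeled here.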


The full text of this article can be downloaded at:

http://ihrv.cn/resource/share/2000006170


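Author information:

Shi Linan1,2, Han Naiping1, Qi Xuan1, Liu Yijun3

(1. KylinSoft Corporation, Tianjin 300450;

2. Tianjin Key Laboratory of Operating System, Tianjin 300450;

3. International Politics Department, University of International Relations, Beijing 100091)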



This content is original to the AET website. Reproduction without authorization is prohibited.