Application of Electronic Technique (AET)
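Design and Implementation of an Elastic Self-Organizing Multi-Cluster Management System
Cyber Security and Data Governance
Xia Lingming, Zhou Jun, Zhao Feng
Future Network Research Center, Purple Mountain Laboratory of Network Communication and Security, Nanjing, Jiangsu 211111, China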
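Abstract: When cloud-native technologies such as Kubernetes are applied in industry, their carrying capacity is limited, they cannot meet higher availability requirements, and they are prone to cloud-vendor lock-in. Carrying out strategies such as Eastern Data and Western Computing depends on multi-cluster management technology, yet traditional cloud management platforms struggle with service deployment and governance for applications that span multiple clouds. This paper proposes software-defined self-organizing infrastructure management and idempotent hierarchical scheduling, and implements an elastic infrastructure management architecture that takes the cluster as its smallest unit. Multiple Kubernetes clusters can be organized into any topology, such as centralized, decentralized, or tree-shaped, for cross-cloud application scheduling and management. The scheme was tested and verified on a tree-shaped cluster structure and compared with other schemes. The results indicate that it can meet the organization and management needs of massive numbers of clusters in future distributed cloud scenarios, while keeping the time to join a new cluster within 1 s and the application scheduling latency within 200 ms.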
CLC number: TP393; Document code: A; DOI: 10.19358/j.issn.2097-1788.2023.12.014
Citation: Xia Lingming, Zhou Jun, Zhao Feng. Design and implementation of an elastic self-organizing multi-cluster management system[J]. Cyber Security and Data Governance, 2023, 42(12): 84-89.
Design and implementation of an elastic self-organizing multi-cluster management system
Xia Lingming, Zhou Jun, Zhao Feng
Future Network Research Center, Network Communication and Security Purple Mountain Laboratory, Nanjing 211111, China
Abstract: When cloud-native technologies such as Kubernetes are applied in industry, their carrying capacity is limited, they cannot meet higher availability requirements, and they are easily locked in by cloud providers. Implementing and operating strategies such as Eastern Data and Western Computing requires multi-cluster management technology. However, traditional cloud management platforms cannot meet the challenges of deploying and governing services for applications that span multiple clouds. To address these problems, this paper puts forward a new concept of software-defined self-organizing infrastructure management and idempotent hierarchical scheduling. An elastic infrastructure management architecture with the cluster as the smallest unit is designed and implemented; it can organize multiple Kubernetes clusters into a multi-cluster scheme with any topology, such as centralized, decentralized, or tree, and carry out cross-cloud scheduling and management of applications. The tree structure is tested and compared with other solutions. The results show that it can meet the organization and management requirements of a huge number of clusters in future distributed cloud scenarios, while keeping cluster registration latency within 1 s and scheduling latency within 200 ms.
Key words: self-organizing infrastructure; distributed cloud; idempotent hierarchical scheduling

Introduction

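A single Kubernetes[1] cluster cannot meet requirements for edge deployment, geographic distribution, and resource management. Typical multi-cluster scenarios such as Eastern Data and Western Computing[2] therefore have to solve cluster access control, cluster resource abstraction, permission management, application management, multi-cluster scheduling, service maintenance, multi-tenancy, and multi-cluster service discovery[3-5], which greatly increases the complexity and difficulty of multi-cluster solutions. In the community and in industry today, cluster topologies are predominantly two-layer parent-child architectures: the parent cluster acts as the control cluster, and the remaining clusters are child clusters that carry workloads. The four mainstream solutions are the Kubefed[6-7] federation scheme, Karmada[8], Clusternet[9], and Admiralty[10]. Kubefed and Karmada belong to one category: they use Template, Override, and Propagation objects to define a workload's common configuration, cluster-specific configuration, and scheduling policy, respectively. Karmada evolved from Kubernetes Federation (Kubefed) but supports richer pluggable scheduling capabilities and features such as multi-cluster services (Multi-Cluster Service), and it has become a CNCF incubating project. However, both support only a centralized two-layer architecture, so their scalability and carrying capacity face theoretical bottlenecks. Clusternet is a multi-cluster solution that follows the OCM model and has been accepted as a CNCF sandbox project; a child cluster uses a controlled token to join the parent cluster when the child cluster starts.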
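The split between common configuration, cluster-specific configuration, and scheduling policy can be illustrated with a small toy model. The sketch below is not the Kubefed or Karmada API: the function names `render_for_clusters` and `set_path`, the dotted-path override format, and the cluster names `child-a` and `child-b` are all assumptions made for illustration. It only shows how a parent cluster might combine a shared template with per-cluster overrides for the child clusters selected by a placement list.

```python
from copy import deepcopy


def set_path(obj, dotted_path, value):
    """Set a nested dict field addressed by a dotted path such as 'spec.replicas'."""
    keys = dotted_path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value


def render_for_clusters(template, overrides, placement):
    """Return {cluster_name: manifest} for every cluster listed in placement.

    template  -- common configuration shared by all clusters (hypothetical format)
    overrides -- {cluster_name: {dotted_path: value}} cluster-specific changes
    placement -- list of child clusters the workload is propagated to
    """
    rendered = {}
    for cluster in placement:
        manifest = deepcopy(template)  # start from the common configuration
        for path, value in overrides.get(cluster, {}).items():
            set_path(manifest, path, value)  # apply cluster-specific values
        rendered[cluster] = manifest
    return rendered


# Illustrative inputs only; none of these values come from the paper.
template = {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 2}}
overrides = {"child-b": {"spec.replicas": 5}}
placement = ["child-a", "child-b"]

manifests = render_for_clusters(template, overrides, placement)
for name, manifest in manifests.items():
    print(name, manifest["spec"]["replicas"])
```

In this model every decision is made by the single parent cluster, which is the centralized two-layer arrangement whose scalability limits the introduction points out.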




Article download: http://ihrv.cn/resource/share/2000005882



This content is original to the AET website; reproduction without authorization is prohibited.