Design and implementation of an elastic self-organizing multi-cluster management system

Cyber Security and Data Governance

Xia Lingming, Zhou Jun, Zhao Feng

Future Network Research Center, Network Communication and Security Purple Mountain Laboratory, Nanjing 211111, China

Abstract: When cloud-native technologies such as Kubernetes are applied in industry, their carrying capacity is limited, they cannot meet higher availability requirements, and they are prone to cloud-vendor lock-in. Strategies such as Eastern Data and Western Computing depend on multi-cluster management technology, yet traditional cloud management platforms struggle to deploy and govern services for applications that span multiple clouds. To address these problems, this paper proposes two new concepts, software-defined self-organizing infrastructure management and idempotent hierarchical scheduling, and designs and implements an elastic infrastructure management architecture that takes the cluster as its smallest unit. The architecture can organize multiple Kubernetes clusters into arbitrary topologies, such as centralized, decentralized, and tree structures, and can schedule and manage applications across clouds. The scheme was tested on a tree-structured cluster organization and compared with other solutions. The results show that it can meet the organization and management needs of massive numbers of clusters in future distributed cloud scenarios, while keeping the latency of registering a new cluster within 1 s and the application scheduling latency within 200 ms.

Keywords: self-organizing infrastructure; distributed cloud; idempotent hierarchical scheduling

CLC number: TP393    Document code: A    DOI: 10.19358/j.issn.2097-1788.2023.12.014
Citation: Xia Lingming, Zhou Jun, Zhao Feng. Design and implementation of an elastic self-organizing multi-cluster management system [J]. Cyber Security and Data Governance, 2023, 42(12): 84-89.

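Introduction

A single Kubernetes [1] cluster cannot satisfy edge, regional, and resource-management requirements. In typical multi-cluster scenarios such as Eastern Data and Western Computing [2], a solution therefore has to address cluster admission control, cluster resource abstraction, permission management, application management, multi-cluster scheduling, service maintenance, multi-tenancy, and multi-cluster service discovery [3-5], which greatly increases the complexity and difficulty of multi-cluster solutions.

In both the open-source community and industry, cluster topologies are currently dominated by a two-layer parent-child architecture: the parent cluster acts as the control cluster, and the remaining clusters are child clusters that carry the workloads. The four mainstream solutions are the Kubefed [6-7] federation scheme, Karmada [8], Clusternet [9], and Admiralty [10]. Kubefed and Karmada belong to the same category: they use Template, Override, and Propagation objects to define a workload's common configuration, cluster-specific configuration, and scheduling policy, respectively. Karmada evolved from Kubefed but supports richer plugin-based scheduling as well as features such as multi-cluster service (Multi cluster service), and it has become a CNCF incubating project. However, both support only a centralized two-layer architecture, so their scalability and carrying capacity have theoretical bottlenecks. Clusternet is a multi-cluster solution that follows the OCM model and has also been accepted as a CNCF sandbox project; each child cluster joins its parent cluster at startup using a controlled token.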
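The Template / Override / Propagation split described above can be illustrated with a minimal Python sketch. It is not the Kubefed or Karmada API: the function names `deep_merge` and `render`, the cluster names, and the dictionary layout are all hypothetical and chosen only to show how a common configuration, cluster-specific patches, and a list of target clusters might combine into per-cluster manifests.

```python
from copy import deepcopy


def deep_merge(base: dict, patch: dict) -> dict:
    """Return a copy of `base` with `patch` merged in recursively (patch wins)."""
    merged = deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def render(template: dict, overrides: dict, propagation: list) -> dict:
    """Build one manifest per target cluster.

    template    -- common workload configuration (hypothetical layout)
    overrides   -- cluster name -> cluster-specific patch
    propagation -- names of the child clusters the workload is sent to
    """
    return {
        cluster: deep_merge(template, overrides.get(cluster, {}))
        for cluster in propagation
    }


# Illustrative inputs; every name and value here is an assumption.
template = {"kind": "Deployment", "spec": {"replicas": 2, "image": "nginx:1.25"}}
overrides = {"edge-1": {"spec": {"replicas": 1}}}
propagation = ["cloud-1", "edge-1"]

manifests = render(template, overrides, propagation)
```

In this sketch `cloud-1` receives the template unchanged, while `edge-1` receives the same manifest with its replica count patched. Because `render` is a pure function of its inputs, calling it again with the same arguments yields the same manifests.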


Article download link: http://ihrv.cn/resource/share/2000005882
