
In-person
September 26-28
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit China 2023 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in China Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Thursday September 28, 2023 11:50am - 12:25pm CST

The resource management capabilities of vanilla Kubernetes are limited:
  1. The static resource model leads to low node resource utilization, because online services follow tidal traffic patterns.
  2. Only whole-GPU requests are allowed, which wastes large amounts of expensive GPU resources in AI inference scenarios.
  3. The native topology-affinity strategy considers only NUMA topology and cannot meet the performance requirements of workloads such as search, recommendation, and large-model training.
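Limitation 2 above shows up directly in how a vanilla pod requests GPUs: the `nvidia.com/gpu` extended resource (exposed by the NVIDIA device plugin) accepts only integer values, so even a lightweight inference container must reserve a full device. A minimal sketch (the image name is a placeholder):

```yaml
# Vanilla Kubernetes: GPUs are requested as whole devices.
# Fractional values such as "0.1" are rejected by the API server,
# so a small inference service still occupies an entire GPU.
apiVersion: v1
kind: Pod
metadata:
  name: inference-server
spec:
  containers:
  - name: model
    image: inference:latest      # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1        # whole GPUs only
```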

In this talk, He Cao and Wei Shao will introduce Katalyst, a resource management system, and its application at ByteDance:
  1. Colocate online services and offline jobs to improve resource utilization while keeping their SLOs intact.
  2. Implement GPU-sharing scheduling, which allows requests at 1% computing-power granularity and 1 MiB GPU-memory granularity, improving GPU utilization in AI inference scenarios.
  3. Implement topology-aware scheduling and extend GPU-RDMA affinity to the PCIe-switch level, so GPUDirect RDMA can be used to accelerate distributed model training.
  4. Improve resource efficiency through low-barrier measures such as node over-commitment, resource-spec recommendation, and tidal colocation.
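A GPU-sharing request of the kind described in point 2 might look like the sketch below. The extended-resource names are illustrative assumptions for exposition, not the confirmed Katalyst API:

```yaml
# Hypothetical GPU-sharing request: 5% of one GPU's compute and
# 256 MiB of GPU memory. The "example.io/*" resource names are
# illustrative assumptions, not the actual Katalyst resource names.
apiVersion: v1
kind: Pod
metadata:
  name: shared-inference
spec:
  containers:
  - name: model
    image: inference:latest          # placeholder image
    resources:
      limits:
        example.io/gpu-core: "5"     # 1% granularity -> 5% of a GPU
        example.io/gpu-memory: "256" # 1 MiB granularity -> 256 MiB
```

With fractional requests like these, several inference containers can be packed onto one physical GPU instead of each reserving a whole device.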
Speakers
He Cao

Senior Software Engineer, ByteDance
He Cao is a senior software engineer on the Cloud Native team at ByteDance, a maintainer of Katalyst and KubeZoo, and a member of Istio. He has 5+ years of experience in the cloud native area. Since joining ByteDance, he has designed and implemented several critical systems for VKE...
Wei Shao

Senior Software Engineer, ByteDance
Wei Shao is a tech lead on the Orchestration & Scheduling team at ByteDance, and a maintainer of KubeWharf projects. Wei has 6+ years of experience in the cloud native area, focusing on resource management and performance-enhanced systems in K8s. Wei led the development of multiple...
3M Room 3M5B (Level 3 Mezzanine)
  Operations + Performance