September 26-28

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon + Open Source Summit China 2023 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in China Standard Time (UTC +8). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Wednesday, September 27 • 11:00am - 11:35am
Sailing Ray Workloads with KubeRay and Kueue in Kubernetes - Jason Hu, Volcano Engine & Kante Yin, DaoCloud

Compute demands for machine learning are growing rapidly. Ray, a unified computing framework, lets ML engineers scale their workloads without building complex computing infrastructure. Kubernetes, a popular open-source container orchestration platform, can manage a wide range of workloads with ease, including Ray workloads through KubeRay, an operator for Ray. At ByteDance, thousands of jobs are submitted daily to the Ray cluster created by KubeRay. Users benefit from a streamlined workflow: they can debug programs on long-running clusters and launch regular jobs through Ray Job custom resources. Meanwhile, efficiently managing concurrent Ray jobs poses challenges such as job starvation and resource allocation. Kueue, a Kubernetes-native job queueing system offering capabilities such as resource management, multi-tenant support, and fair sharing of resources, addresses these challenges for Ray jobs in Kubernetes.


Kante Yin

Senior Software Engineer, DaoCloud
Kante is a senior software engineer and an open source enthusiast. He currently works on the AI platform team at DaoCloud, based in Shanghai. He also contributes to upstream Kubernetes as a SIG-Scheduling maintainer and as a maintainer of several sub-projects.

Yuanzhe Hu

Software Engineer, Volcano Engine
Yuanzhe is a software engineer on the batch computing team at ByteDance, based in Hangzhou. He is interested in the Ray ecosystem and in AI computing at the company.

Wednesday September 27, 2023 11:00am - 11:35am CST
3F Room 307
  Open AI + Data