Presented by

  • Tejun Heo

    Tejun has been working on the Linux kernel for the past two decades. He has worked on various subsystems, including the block layer, libata, and the per-CPU memory allocator, and currently maintains cgroup (Control Groups) and workqueue. For the past several years, he has been focusing on the IO Cost cgroup controller and sched_ext, a BPF extensible scheduling class.

Abstract

sched_ext is a new BPF extensible scheduling class for the Linux kernel. It allows implementing arbitrary schedulers in BPF and has a strong safety guarantee: a misbehaving scheduler may make wrong scheduling decisions but can't crash the system. Enabling a new scheduler is as simple as compiling a BPF program and running the binary, and the system can always be safely reverted to the built-in default kernel scheduler. This ability to iterate on scheduler implementations safely and quickly radically speeds up both scheduler development and deployment.

CPUs have been and continue to become more complex, with both core count and topology complexity increasing significantly, which in turn substantially expands the scheduling problem space. One of sched_ext's main goals is to enable exploration of that problem space in a speedy, safe and collaborative manner. While sched_ext is at a very early stage, we're already seeing sizable gains with production workloads at Meta from employing strategies such as aggressive work conservation, soft affinity and application-specific hinting.

This presentation takes a look at what sched_ext is, how to use it and how it's starting to be employed in the Meta fleet. sched_ext has not been merged into the upstream kernel yet. Please see http://lkml.kernel.org/r/20231111024835.2164816-1-tj@kernel.org for a detailed description and discussion.