
Kubernetes v1.36 Brings Server-Side Sharding: Smarter Scaling for Controllers

Published 2026-05-12 02:43:26 · Cloud Computing

Introduction

As Kubernetes clusters expand to encompass tens of thousands of nodes, controllers that monitor high-cardinality resources such as Pods encounter a significant scalability bottleneck. Every replica of a horizontally scaled controller currently receives the complete event stream from the API server, incurring CPU, memory, and network costs to deserialize and process all events—only to discard those it does not manage. Scaling out the controller does not reduce per-replica overhead; it instead multiplies it. Kubernetes v1.36 addresses this issue with an alpha feature called server-side sharded list and watch (KEP-5866). This feature enables the API server to filter events at its source, ensuring each controller replica obtains only the slice of the resource collection it owns.

The Scaling Challenge Facing Watch-Based Controllers

Controllers like kube-state-metrics watch every object of a given type to maintain an in-memory view of cluster state. In large clusters, that event stream becomes heavy. The conventional workaround is client-side sharding: each replica is assigned a portion of the keyspace, yet every replica must still receive and parse the full data flow, only to discard the events outside its shard.

The Problem with Client-Side Sharding

While client-side sharding works functionally, it fails to reduce the volume of data transmitted from the API server. For N replicas, each receives the full event stream, as the sketch after this list makes concrete:

  • Every replica deserializes and processes every event, then discards what it does not need.
  • Network bandwidth grows with the number of replicas, not with shard size.
  • CPU spent on deserialization is wasted for the discarded fraction.
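
To make the waste concrete, here is a minimal sketch of the client-side pattern, assuming shards are assigned by hashing object.metadata.uid with 64-bit FNV-1a (the same hash the new feature standardizes server-side); ownsShard and the boundaries are illustrative, not a Kubernetes API:

import (
    "hash/fnv"

    corev1 "k8s.io/api/core/v1"
)

// ownsShard hashes the Pod's UID and checks the half-open range
// [start, end). By the time this runs, the replica has already paid
// the network and deserialization cost for the event.
func ownsShard(pod *corev1.Pod, start, end uint64) bool {
    h := fnv.New64a()
    h.Write([]byte(pod.UID))
    sum := h.Sum64()
    return sum >= start && sum < end
}

// In an event handler: every event outside the shard is simply dropped.
// if !ownsShard(pod, 0x0, 0x8000000000000000) { return }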

How Server-Side Sharded List and Watch Works

This feature shifts the filtering logic from the client to the API server. Each controller replica tells the API server which hash range it owns, and the API server sends only matching events. Clients specify their hash range through a new shardSelector field in ListOptions, expressed with the shardRange() function. For instance:

shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')

The API server computes a deterministic 64-bit FNV-1a hash of the specified field (currently object.metadata.uid or object.metadata.namespace) and returns only objects whose hash falls within the range [start, end). This applies to both list responses and watch event streams. Because the hash function produces consistent results across all API server instances, the feature is safe to use with multiple API server replicas.
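
Conceptually, the server-side check reduces to the following sketch, a simplification based on the semantics described above (matchesShard is illustrative, not the actual apiserver code); the second branch handles ranges that wrap past the top of the 64-bit space:

import "hash/fnv"

// matchesShard hashes a field value (e.g. the object's UID or namespace)
// with 64-bit FNV-1a and tests the half-open range [start, end).
// A range with end <= start is treated as wrapping around 2^64.
func matchesShard(fieldValue string, start, end uint64) bool {
    h := fnv.New64a()
    h.Write([]byte(fieldValue))
    sum := h.Sum64()
    if start < end {
        return sum >= start && sum < end
    }
    return sum >= start || sum < end // wrapped: [start, 2^64) or [0, end)
}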

Hash Function and Field Paths

The deterministic 64-bit FNV-1a hash ensures that the same object always maps to the same hash value. Currently, the supported field paths are object.metadata.uid and object.metadata.namespace. The shardSelector parameter accepts a string that defines the hash range, and the API server uses this to filter the object stream.
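
For example, a controller that shards by namespace instead of by UID would pass a selector of the same shape (a hypothetical but syntax-consistent value):

shardRange(object.metadata.namespace, '0x0000000000000000', '0x8000000000000000')

Because every object in a namespace hashes to the same value, namespace-based sharding keeps all of a namespace's objects on a single replica.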

Implementing Sharded Watches in Controllers

To leverage this feature, controllers typically use informers to list and watch resources. Developers can inject the shardSelector into the ListOptions used by their informers via WithTweakListOptions. Here’s an example in Go:

import (
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
)

// This replica owns the lower half of the 64-bit hash space.
shardSelector := "shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000')"

// client is a kubernetes.Interface; the tweak injects the shard selector
// into every list and watch request issued by the factory's informers.
resyncPeriod := 10 * time.Minute
factory := informers.NewSharedInformerFactoryWithOptions(client, resyncPeriod,
    informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
        opts.ShardSelector = shardSelector
    }),
)
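
After registering the informers you need, start the factory as usual; the standard client-go lifecycle is unchanged, and handler code never sees objects outside the shard (this snippet additionally assumes the k8s.io/client-go/tools/cache import):

stopCh := make(chan struct{}) // closed on shutdown
podInformer := factory.Core().V1().Pods().Informer()
factory.Start(stopCh)
cache.WaitForCacheSync(stopCh, podInformer.HasSynced)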

Example Configuration for Two Replicas

For a deployment with two replicas, you split the hash space into two halves:

  • Replica 0: Uses shardRange(object.metadata.uid, '0x0000000000000000', '0x8000000000000000') to receive the lower half of the hash space.
  • Replica 1: Uses shardRange(object.metadata.uid, '0x8000000000000000', '0x0000000000000000') for the upper half (the half-open range wraps past the maximum 64-bit value back around to zero).

This ensures each replica processes only its assigned objects, drastically reducing CPU, memory, and network overhead compared to client-side sharding. The same scheme generalizes to any number of replicas, as the sketch below shows.
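
A small helper can compute these selector strings for replica i of n (an illustrative sketch that assumes n >= 2 and follows the string format shown above):

import "fmt"

// shardSelectorFor splits the 64-bit hash space into n nearly equal
// ranges and returns the selector for replica i (0 <= i < n). The last
// range always ends at '0x0', wrapping around the top of the space.
func shardSelectorFor(i, n uint64) string {
    step := ^uint64(0)/n + 1 // approximately 2^64 / n
    start := i * step
    end := (i + 1) * step
    if i == n-1 {
        end = 0 // wrap-around: covers [start, 2^64)
    }
    return fmt.Sprintf("shardRange(object.metadata.uid, '0x%016x', '0x%016x')", start, end)
}

With two replicas, shardSelectorFor(0, 2) and shardSelectorFor(1, 2) reproduce the two selectors listed above.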

Benefits and Implications

Server-side sharded list and watch offers several key advantages:

  • Reduced resource waste: The API server filters before transmission, so each replica only receives relevant events.
  • Linear scaling: Adding replicas reduces each replica's load rather than multiplying the total; work is distributed in proportion to shard size.
  • Lower network bandwidth: Data transferred from the API server is proportional to shard size, not number of replicas.
  • Improved performance: Controllers spend less CPU on deserialization and can handle larger clusters.

This feature is particularly beneficial for cluster monitoring tools, metrics exporters, and any controller that watches high-cardinality resources. It is available as an alpha feature in Kubernetes v1.36 and must be enabled via the ServerSideShardedListAndWatch feature gate.
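
As with other alpha features, the gate is enabled through the API server's standard --feature-gates flag; how flags are passed depends on your cluster setup, but for example:

kube-apiserver --feature-gates=ServerSideShardedListAndWatch=true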

Conclusion

Server-side sharded list and watch represents a significant step forward in Kubernetes scalability. By moving filtering from clients to the API server, it eliminates the wasteful pattern of every controller replica receiving the full event stream. This feature enables more efficient horizontal scaling and reduces infrastructure costs, especially for large clusters. As the Kubernetes ecosystem evolves, adopting such optimizations will be crucial to managing growing workloads. For more details, refer to KEP-5866 and the Kubernetes documentation on how to use shard selectors.