Scaling
The gNMIc Operator supports horizontal scaling of collector clusters. This page explains how scaling works and best practices for production deployments.
Scaling a Cluster
To scale a cluster, update the replicas field:
# Scale to 5 replicas
kubectl patch cluster my-cluster --type merge -p '{"spec":{"replicas":5}}'
Or edit the Cluster resource:
spec:
replicas: 5 # Changed from 3
What Happens When You Scale
Scale Up ( 3 → 5 pods)
- Kubernetes creates new pods (
gnmic-3,gnmic-4). - Operator waits for pods to be ready.
- Operator recomputes the distribution plan. Existing target assignments are preserved — only unassigned targets or targets displaced by capacity limits are placed on the new pods.
- Configuration is applied to all pods.
Scale Down ( 5 → 3 pods)
- Operator recomputes the distribution plan for the reduced replica count. Targets from removed pods flow through rendezvous hashing onto surviving pods, bounded by each pod’s capacity.
- Configuration is applied to remaining pods.
- Kubernetes terminates pods (
gnmic-4,gnmic-3).
Target Redistribution
The operator uses bounded load rendezvous hashing to distribute targets. See Target Distribution for a detailed explanation of the algorithm.
Key properties:
- Stable: Targets stay on their current pod unless forced to move.
- Even: No pod exceeds its capacity.
- Current-assignment aware: The operator reads each target’s current pod from its status and feeds this as input to the algorithm, minimizing churn.
Example Distribution
# 10 targets, 3 pods
Pod 0: [target1, target5, target8] (3 targets)
Pod 1: [target2, target4, target9] (3 targets)
Pod 2: [target3, target6, target7, target10] (4 targets)
# After scaling to 4 pods — existing assignments are preserved
Pod 0: [target1, target5, target8] (3 targets) - unchanged
Pod 1: [target2, target4] (2 targets) - target9 moved to new pod
Pod 2: [target3, target7, target10] (3 targets) - target6 moved to new pod
Pod 3: [target6, target9] (2 targets) - new pod
Best Practices
Start with Appropriate Size
Estimate based on:
- Number of targets
- Subscription frequency
- Data volume per target
- Number of outputs
Use Resource Limits
Ensure clusters (pods) have appropriate resources:
spec:
resources:
requests:
memory: "256Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "2"
Monitor Before Scaling
Check metrics before scaling:
# CPU usage per pod
rate(container_cpu_usage_seconds_total{pod=~"gnmic-.*"}[5m])
# Memory usage per pod
container_memory_usage_bytes{pod=~"gnmic-.*"}
# Targets per pod (from gNMIc metrics)
gnmic_target_status{cluster="my-cluster"}
Horizontal Pod Autoscaler
The operator’s Cluster resource supports the scale subresource, allowing you
to use the Horizontal Pod Autoscaler (HPA) for automatic scaling.
HPA scales the Cluster CR, not the StatefulSet directly. This ensures the operator remains in control of target redistribution and configuration rollout.
Scaling based on target count (recommended)
Target count is the most accurate scaling signal for gNMIc — CPU/memory don’t reliably reflect the load from long-lived gRPC streaming connections.
gNMIc pods export per-target metrics:
gnmic_target_up{name="default/leaf1"} 0
gnmic_target_up{name="default/leaf2"} 0
gnmic_target_up{name="default/spine1"} 1
A value of 1 indicates that the target is present, 0 denotes it is absent.
With Prometheus Adapter, aggregate these into a per-pod metric:
sum(gnmic_target_up{namespace!="",pod!=""} == 1) by (namespace, pod)
You can assign
namespaceandpodlabels to metrics using scrape configurations or relabeling.
Example Prometheus Adapter rule:
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-adapter-rules
namespace: monitoring
data:
config.yaml: |
rules:
default: false
custom:
- seriesQuery: 'gnmic_target_up{namespace!="",pod!=""}'
resources:
overrides:
namespace:
resource: namespace
pod:
resource: pod
name:
matches: "^gnmic_target_up$"
as: "gnmic_targets_present"
metricsQuery: |
sum(gnmic_target_up{<<.LabelMatchers>>} == 1) by (namespace, pod)
The corresponding HPA resource — scale Cluster c1 up when the average number
of targets per pod exceeds 75:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: gnmic-c1-hpa
spec:
scaleTargetRef:
apiVersion: operator.gnmic.dev/v1alpha1
kind: Cluster
name: c1
minReplicas: 1
maxReplicas: 10
metrics:
- type: Pods
pods:
metric:
name: gnmic_targets_present
target:
type: AverageValue
averageValue: "75"
Threshold vs Capacity
When using HPA, the Cluster CR’s spec.targetDistribution.perPodCapacity acts
as a hard assignment ceiling — the operator never assigns more than
perPodCapacity targets to a single pod. The HPA averageValue (the scaling
threshold) should be set lower than capacity to create a buffer zone that
gives new pods time to start:
0 ─────── HPA threshold ─────── Capacity
(scale trigger) (assignment stops)
- When the average target count crosses the HPA threshold, HPA increases
.spec.replicas. - While the new pod is starting, existing pods continue receiving targets up
to
capacity. - If all pods reach
capacitybefore the new pod is ready, overflow targets remain unassigned until the next reconciliation. The Cluster status reports the count viastatus.unassignedTargetsand theCapacityExhaustedcondition.
Sizing guidance — set the HPA threshold to ~70–80% of capacity:
| Cluster Capacity | HPA averageValue | Headroom per pod |
|---|---|---|
| 50 | 35 | 15 (30%) |
| 100 | 75 | 25 (25%) |
| 200 | 150 | 50 (25%) |
For bursty workloads (e.g., many targets appearing at once via
TunnelTargetPolicy), use a wider buffer (lower threshold-to-capacity ratio).
Monitoring Capacity
When targets exceed the total cluster capacity, the Cluster status makes this visible:
kubectl get clusters
NAME IMAGE REPLICAS READY PIPELINES TARGETS UNASSIGNED SUBS INPUTS OUTPUTS
c1 ... 3 3 2 100 4 5 2 3
The CapacityExhausted condition provides detail:
kubectl describe cluster c1
Conditions:
Type Status Reason Message
CapacityExhausted True InsufficientCapacity 4 targets could not be assigned, all pods at capacity
Once HPA scales up and all targets are assigned, the condition clears automatically.
Scaling based on CPU/Memory
You can also use resource-based metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: gnmic-c1-hpa
spec:
scaleTargetRef:
apiVersion: operator.gnmic.dev/v1alpha1
kind: Cluster
name: c1
minReplicas: 1
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
Note: You must install the Kubernetes metrics server for CPU/Memory-based HPA:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Target-count-based scaling is recommended over CPU/Memory because gRPC streaming connections don’t always correlate with CPU utilization.
Considerations
Output Connections
All pods connect to all outputs. For outputs like Kafka or Prometheus:
- Each pod writes to the same destination
- Data is naturally partitioned by target
- No deduplication needed
Stateless Operation
gNMIc pods are stateless by design:
- No persistent volumes required
- Configuration comes from operator via REST API
- Targets can move between pods without data loss