Karpenter
Convox supports Karpenter as an opt-in alternative to Cluster Autoscaler for AWS EKS node provisioning. When enabled, Karpenter manages workload and build nodes through NodePools and EC2NodeClasses, delivering faster node provisioning, cost-aware instance selection, and automatic node lifecycle management.
Karpenter support is bidirectional: karpenter_enabled can be toggled on and off safely, letting you try Karpenter and revert if needed without disrupting your Rack.
Karpenter is available on AWS only. Karpenter parameters are rejected for GCP, Azure, and DigitalOcean Racks.
How Karpenter Works with Convox
When Karpenter is enabled, your Rack's node provisioning is split into three tiers:
| Tier | Managed by | Node type | Purpose |
|---|---|---|---|
| System nodes | EKS managed node groups (always) | ON_DEMAND | Rack control plane: API server, router, resolver, metrics-server, Karpenter controller |
| Workload nodes | Karpenter NodePool | Configurable | Your application Services |
| Build nodes | Karpenter NodePool | Configurable | convox build / convox deploy build pods |
Karpenter replaces Cluster Autoscaler for workload and build node scaling. System nodes always remain as EKS managed node groups to protect the Karpenter controller's own availability. System pods are pinned to system nodes via nodeSelector when Karpenter is enabled.
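Once Karpenter is running, you can verify the pools it manages (a quick sanity check, assuming you have kubectl access to the Rack's cluster):
$ kubectl get nodepools,ec2nodeclasses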
Karpenter version: 1.10.0 (pinned, not user-configurable)
Why Karpenter Over Cluster Autoscaler
Cluster Autoscaler (CAS) works at the Auto Scaling Group (ASG) level — it can only scale groups of identical instances and reacts to pending pods by incrementing ASG desired count. Karpenter works at the pod level, directly evaluating pending pod requirements and provisioning the optimal instance type, size, and purchasing model in seconds rather than minutes.
- Faster scaling. Karpenter provisions nodes in response to pending pods within seconds, compared to the multi-minute feedback loop of CAS
- Cost optimization. Karpenter selects the cheapest instance type that satisfies pod requirements from across all allowed families and sizes
- Node consolidation. Underutilized nodes are automatically consolidated — Karpenter moves pods to fewer, better-utilized nodes and terminates the empty ones
- Automatic node replacement. Nodes are replaced after karpenter_node_expiry (default 30 days), keeping your fleet on current AMIs
- Scale-to-zero builds. The build NodePool scales to zero when no builds are running, eliminating idle build node costs
- Multi-architecture support. Workload node architecture is auto-detected from node_type, or set explicitly with karpenter_arch
Enabling Karpenter
Most users enable Karpenter with a single command:
$ convox rack params set karpenter_auth_mode=true karpenter_enabled=true -r rackName
Setting parameters... OK
Karpenter uses a two-parameter enablement model:
- karpenter_auth_mode=true — A one-way migration that prepares the EKS cluster. It migrates the cluster to API_AND_CONFIG_MAP access mode and applies karpenter.sh/discovery tags to subnets and security groups. This cannot be reversed once enabled (matching AWS EKS behavior).
- karpenter_enabled=true — A bidirectional toggle that deploys the Karpenter controller, NodePools, IAM roles, and SQS interruption queue. Requires karpenter_auth_mode=true. Can be toggled on and off freely.
Both can be set in the same call. If setting them separately, karpenter_auth_mode must be set first and the update must complete before setting karpenter_enabled.
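To stage the migration instead, run the one-way step first and wait for the rack update to complete before enabling the controller:
$ convox rack params set karpenter_auth_mode=true -r rackName
Setting parameters... OK

$ convox rack params set karpenter_enabled=true -r rackName
Setting parameters... OK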
Enablement Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| karpenter_auth_mode | string | false | One-way. Migrates EKS to API_AND_CONFIG_MAP access mode and applies discovery tags. Cannot be set back to false once enabled. |
| karpenter_enabled | string | false | Bidirectional. Deploys the Karpenter controller, NodePools, IAM roles, and SQS interruption queue. Requires karpenter_auth_mode=true. |
Workload NodePool Parameters
These parameters control how Karpenter provisions nodes for your application Services.
Instance Selection
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| karpenter_instance_families | string | (all families) | ^[a-z0-9]+(,[a-z0-9]+)*$ | Comma-separated EC2 instance families (e.g., c5,m6i,r5). All general-purpose families if unset. |
| karpenter_instance_sizes | string | (all sizes) | ^[a-z0-9]+(,[a-z0-9]+)*$ | Comma-separated instance sizes (e.g., large,xlarge,2xlarge). All sizes if unset. |
| karpenter_arch | string | (auto-detect) | amd64, arm64, amd64,arm64, or empty | CPU architecture. Empty = auto-detect from node_type. |
| karpenter_capacity_types | string | on-demand | on-demand, spot, or on-demand,spot | EC2 purchasing model. When both are set, Karpenter optimizes for cost and falls back to on-demand when spot capacity is unavailable. |
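For example, to restrict workload nodes to specific families and sizes and allow spot with on-demand fallback (the families and sizes here are illustrative):
$ convox rack params set karpenter_instance_families=c5,m6i karpenter_instance_sizes=large,xlarge karpenter_capacity_types=on-demand,spot -r rackName
Setting parameters... OK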
Resource Limits
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| karpenter_cpu_limit | number | 100 | > 0 | Maximum total vCPUs Karpenter can provision across all workload nodes. Safety limit against runaway scaling. |
| karpenter_memory_limit_gb | number | 400 | > 0 | Maximum total memory (GiB) Karpenter can provision across all workload nodes. |
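For example, to raise both ceilings for a larger fleet (illustrative values):
$ convox rack params set karpenter_cpu_limit=200 karpenter_memory_limit_gb=800 -r rackName
Setting parameters... OK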
Node Lifecycle and Consolidation
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| karpenter_consolidation_enabled | bool | true | | When true: WhenEmptyOrUnderutilized (consolidates underutilized and empty nodes). When false: WhenEmpty (only removes fully empty nodes). |
| karpenter_consolidate_after | string | 30s | ^\d+[smh]$ | Delay before consolidation triggers (e.g., 30s, 5m, 1h). |
| karpenter_node_expiry | string | 720h | ^\d+h$ or Never | Maximum node lifetime before automatic replacement. Default is 30 days. Never disables automatic replacement. |
| karpenter_disruption_budget_nodes | string | 10% | ^\d+%?$ | Maximum nodes disrupted simultaneously (e.g., 10%, 3). |
Storage
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| karpenter_node_disk | number | 0 | >= 0 | EBS volume size in GiB for Karpenter-provisioned nodes. 0 inherits the Rack's node_disk value. |
| karpenter_node_volume_type | string | gp3 | gp2, gp3, io1, io2 | EBS volume type for Karpenter-provisioned nodes. |
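For example, to give Karpenter-provisioned nodes 100 GiB gp3 volumes:
$ convox rack params set karpenter_node_disk=100 karpenter_node_volume_type=gp3 -r rackName
Setting parameters... OK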
Labels and Taints
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| karpenter_node_labels | string | (none) | Comma-separated key=value; no double quotes; convox.io/nodepool reserved | Custom labels added alongside the default convox.io/nodepool=workload label. |
| karpenter_node_taints | string | (none) | Comma-separated key=value:Effect; effect must be NoSchedule, PreferNoSchedule, or NoExecute | Custom taints on workload nodes. Prevents pods without matching tolerations from scheduling on these nodes. See Using Taints to Protect Nodes below. |
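For example, to add a custom label and a taint reserving workload nodes for pods that tolerate it (key and value are illustrative; see the taints discussion below before using non-GPU taints):
$ convox rack params set karpenter_node_labels=team=platform karpenter_node_taints=dedicated=platform:NoSchedule -r rackName
Setting parameters... OK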
Build NodePool Parameters
These parameters control the dedicated build NodePool. The build NodePool is only created when build_node_enabled=true.
| Parameter | Type | Default | Validation | Description |
|---|---|---|---|---|
| karpenter_build_instance_families | string | (workload families) | ^[a-z0-9]+(,[a-z0-9]+)*$ | Instance families for build nodes. Falls back to the workload families if unset. |
| karpenter_build_instance_sizes | string | (workload sizes) | ^[a-z0-9]+(,[a-z0-9]+)*$ | Instance sizes for build nodes. Falls back to the workload sizes if unset. |
| karpenter_build_capacity_types | string | on-demand | on-demand, spot, or on-demand,spot | Purchasing model for build nodes. |
| karpenter_build_cpu_limit | number | 32 | > 0 | Maximum total vCPUs for the build NodePool. |
| karpenter_build_memory_limit_gb | number | 256 | > 0 | Maximum total memory (GiB) for the build NodePool. |
| karpenter_build_consolidate_after | string | 60s | ^\d+[smh]$ | Delay before empty build nodes are consolidated. |
| karpenter_build_node_labels | string | (none) | Comma-separated key=value; no double quotes; convox-build and convox.io/nodepool reserved | Extra labels added alongside the default convox-build=true and convox.io/nodepool=build labels. |
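For example, to run builds on larger compute-optimized instances with spot fallback to on-demand (illustrative values):
$ convox rack params set karpenter_build_instance_families=c5,c6i karpenter_build_instance_sizes=2xlarge karpenter_build_capacity_types=on-demand,spot -r rackName
Setting parameters... OK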
Build Node Behavior with Karpenter
When Karpenter is enabled and build_node_enabled=true:
- The existing EKS managed build node group is scaled to zero
- Karpenter's build NodePool provisions nodes on demand when build pods are scheduled
- Build nodes have a dedicated=build:NoSchedule taint — only build pods run on them
- Build nodes scale back to zero after the last build completes (configurable via karpenter_build_consolidate_after, default 60s)
- Architecture is auto-detected from build_node_type
- The existing build_node_min_count parameter does not apply when Karpenter manages builds
Advanced Configuration
karpenter_config — Workload NodePool Override
For users who need access to the full Karpenter API beyond what individual parameters expose, karpenter_config provides a JSON escape hatch for the workload NodePool and its EC2NodeClass.
Individual karpenter_* parameters build the defaults. karpenter_config overrides them at the section level. For example, setting nodePool.template.spec.requirements in the config completely replaces the defaults built from karpenter_instance_families, karpenter_instance_sizes, etc.
Input formats: Raw JSON string, base64-encoded JSON, or a .json file path. Maximum 64KB.
Structure:
{
"nodePool": {
"template": {
"metadata": { "labels": { "custom-key": "custom-value" } },
"spec": {
"requirements": [],
"taints": [],
"expireAfter": "720h",
"terminationGracePeriod": "48h"
}
},
"limits": { "cpu": "200", "memory": "800Gi" },
"disruption": {
"consolidationPolicy": "WhenEmpty",
"consolidateAfter": "5m",
"budgets": [
{ "nodes": "10%" },
{ "nodes": "0", "schedule": "0 9 * * mon-fri", "duration": "8h" }
]
},
"weight": 50
},
"ec2NodeClass": {
"blockDeviceMappings": [],
"metadataOptions": { "httpTokens": "required", "httpPutResponseHopLimit": 2 },
"tags": { "Environment": "production", "CostCenter": "engineering" },
"amiSelectorTerms": [{ "alias": "al2023@latest" }],
"userData": "...",
"detailedMonitoring": true,
"associatePublicIPAddress": false,
"instanceStorePolicy": "RAID0"
}
}
Fields only available via karpenter_config:
| Field | Description |
|---|---|
| nodePool.template.spec.terminationGracePeriod | Time allowed for graceful pod eviction before force termination |
| nodePool.disruption.budgets[].schedule | Cron-based disruption windows (e.g., no disruptions during business hours) |
| ec2NodeClass.userData | Custom EC2 user data script |
| ec2NodeClass.detailedMonitoring | Enable CloudWatch detailed monitoring |
| ec2NodeClass.associatePublicIPAddress | Associate a public IP with Karpenter nodes |
| ec2NodeClass.instanceStorePolicy | Instance store disk policy (e.g., RAID0) |
| ec2NodeClass.amiSelectorTerms | Custom AMI selection (default: al2023@latest) |
| ec2NodeClass.metadataOptions | EC2 instance metadata options (IMDSv2 settings) |
| ec2NodeClass.blockDeviceMappings | Custom EBS volume configuration beyond karpenter_node_disk / karpenter_node_volume_type |
Protected fields (managed by Convox, cannot be overridden):
| Field | Reason |
|---|---|
| ec2NodeClass.role | IAM role managed by Convox |
| ec2NodeClass.instanceProfile | IAM instance profile managed by Convox |
| ec2NodeClass.subnetSelectorTerms | Subnet discovery tags managed by Convox |
| ec2NodeClass.securityGroupSelectorTerms | Security group discovery tags managed by Convox |
| nodePool.template.spec.nodeClassRef | Must reference the Convox-managed EC2NodeClass |
| nodePool.template.metadata.labels["convox.io/nodepool"] | Reserved Convox label |
| ec2NodeClass.tags.Name | Reserved tag (forced to {rack}/karpenter/workload) |
| ec2NodeClass.tags.Rack | Reserved tag (forced to the Rack name) |
Example: maintenance window with no disruptions during business hours
$ convox rack params set karpenter_config='{"nodePool":{"disruption":{"budgets":[{"nodes":"10%"},{"nodes":"0","schedule":"0 9 * * mon-fri","duration":"8h"}]}}}' -r rackName
Setting parameters... OK
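The same configuration can also be supplied as a .json file path, which avoids shell quoting (the path is illustrative):
$ convox rack params set karpenter_config=/path/to/karpenter-config.json -r rackName
Setting parameters... OK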
additional_karpenter_nodepools_config — Custom NodePools
Creates additional NodePools beyond the built-in workload and build pools. Each entry in the JSON array produces its own NodePool + EC2NodeClass pair with the same infrastructure settings (subnet discovery, security groups, IAM role) as the workload pool.
Use this for dedicated GPU pools, tenant isolation, specialized instance requirements, or batch processing pools.
Input formats: Raw JSON string, base64-encoded JSON, or a .json file path.
Per-pool fields:
| Field | Type | Default | Required | Description |
|---|---|---|---|---|
| name | string | | yes | Unique pool identifier. Lowercase alphanumeric with dashes, max 63 chars. Reserved names: workload, build, default, system. |
| instance_families | string | (all) | no | Comma-separated EC2 families (e.g., g5,g6). |
| instance_sizes | string | (all) | no | Comma-separated sizes (e.g., xlarge,2xlarge). |
| capacity_types | string | on-demand | no | on-demand, spot, or on-demand,spot. |
| arch | string | amd64 | no | amd64, arm64, or amd64,arm64. |
| cpu_limit | integer | 100 | no | Maximum total vCPUs for this pool. |
| memory_limit_gb | integer | 400 | no | Maximum total memory (GiB) for this pool. |
| consolidation_policy | string | WhenEmptyOrUnderutilized | no | WhenEmpty or WhenEmptyOrUnderutilized. |
| consolidate_after | string | 30s | no | Delay before consolidation (e.g., 30s, 5m). |
| node_expiry | string | 720h | no | Max node lifetime. Never to disable. |
| disruption_budget_nodes | string | 10% | no | Max nodes disrupted simultaneously. |
| disk | integer | (workload value) | no | EBS volume size in GiB. 0 inherits the workload pool disk. |
| volume_type | string | gp3 | no | gp2, gp3, io1, io2. |
| weight | integer | (unset) | no | Scheduling weight (0-100). Higher = preferred. |
| labels | string | (none) | no | Comma-separated key=value. convox.io/nodepool is reserved. |
| taints | string | (none) | no | Comma-separated key=value:Effect. Valid effects: NoSchedule, PreferNoSchedule, NoExecute. Prevents pods without matching tolerations from scheduling on these nodes. See Using Taints to Protect Nodes below. |
Every custom pool automatically gets a convox.io/nodepool={name} label. Target Services to a custom pool using nodeSelectorLabels in convox.yml:
services:
ml-worker:
build: .
nodeSelectorLabels:
convox.io/nodepool: gpu
Example: GPU pool and high-memory pool
$ convox rack params set additional_karpenter_nodepools_config='[{"name":"gpu","instance_families":"g5,g6","capacity_types":"on-demand","cpu_limit":64,"memory_limit_gb":256,"taints":"nvidia.com/gpu=true:NoSchedule","disk":200},{"name":"high-mem","instance_families":"r5,r6i","instance_sizes":"xlarge,2xlarge,4xlarge","capacity_types":"on-demand,spot","cpu_limit":200,"memory_limit_gb":1600,"labels":"pool=high-mem"}]' -r rackName
Setting parameters... OK
Using Taints to Protect Nodes
Without taints, any pod can land on a custom NodePool's nodes if they have spare capacity — even pods that don't need those resources. For example, a basic web service could get scheduled to an expensive GPU instance. Taints prevent this by rejecting pods that lack a matching toleration.
Important: convox.yml does not have a tolerations field. You cannot manually specify tolerations in your Service definition. Instead, tolerations are handled automatically through the mechanisms described below.
GPU workloads
For GPU pools with an nvidia.com/gpu taint, use scale.gpu in convox.yml to request GPU resources:
services:
ml-worker:
build: .
scale:
gpu:
count: 1
vendor: nvidia
nodeSelectorLabels:
convox.io/nodepool: gpu
When a Service declares scale.gpu, Convox adds nvidia.com/gpu to the pod's resource requests. Kubernetes' ExtendedResourceToleration admission controller then auto-adds the matching toleration. No manual toleration configuration is needed.
You must also enable the NVIDIA device plugin on the Rack:
$ convox rack params set nvidia_device_plugin_enable=true -r rackName
Setting parameters... OK
Non-GPU taints
For non-GPU custom taints (e.g., tenant isolation), Convox does not currently auto-inject tolerations. Pods targeting pools with non-GPU taints will need tolerations added through an external mechanism such as a Kubernetes mutating admission webhook. For most non-GPU use cases, using labels + nodeSelectorLabels without taints is the recommended approach — Karpenter only provisions nodes for pods that need them, so unwanted pods won't cause unnecessary GPU-class nodes to spin up.
DaemonSets on tainted nodes
Node-level DaemonSets (fluentd, aws-node, kube-proxy, ebs-csi-node, efs-csi-node, eks-pod-identity-agent) use operator: Exists tolerations and are not affected by custom taints. They will continue to run on all nodes, including tainted custom NodePool nodes.
The additional_karpenter_nodepools_config parameter also accepts a JSON file path:
$ convox rack params set additional_karpenter_nodepools_config=/path/to/nodepools.json -r rackName
Setting parameters... OK
System Node Behavior
System nodes are always EKS managed node groups, regardless of whether Karpenter is enabled. This ensures the Karpenter controller itself and other critical Rack components cannot be disrupted by Karpenter's own consolidation or scaling decisions.
When karpenter_enabled=true:
- System node capacity type is forced to ON_DEMAND
- System nodes get the convox.io/system-node=true label
- The following pods are pinned to system nodes via nodeSelector:
  - Rack API server
  - Router (both public and internal)
  - Resolver
  - Metrics server
  - Cluster Autoscaler (if running)
  - Karpenter controller
- The Fluentd DaemonSet is not pinned — it runs on all nodes for log collection

The convox.io/system-node=true label is tied to karpenter_auth_mode (not karpenter_enabled) to ensure labels persist during enable/disable transitions.
Cluster Autoscaler Coexistence
Karpenter and Cluster Autoscaler (CAS) can coexist when additional (non-Karpenter) node groups are present:
| Scenario | CAS state | CAS targeting |
|---|---|---|
| Karpenter enabled, no additional node groups | Scaled to 0 replicas | N/A |
| Karpenter enabled, additional node groups (all dedicated=true) | Running (1 replica, pinned to system nodes) | Explicit --nodes per additional ASG (no auto-discovery) |
| Karpenter disabled | Running (normal) | Auto-discovery |
Enabling Karpenter requires all existing additional_node_groups_config entries to have dedicated=true. This prevents scheduling overlap between CAS-managed and Karpenter-managed nodes.
Disabling Karpenter
Setting karpenter_enabled=false cleanly reverses the deployment:
- Karpenter controller drains workload and build nodes (5-minute graceful drain window)
- All Karpenter NodePools, EC2NodeClasses, IAM resources, and SQS queue are destroyed
- The managed build node group scales back up from zero to build_node_min_count
- Cluster Autoscaler resumes normal auto-discovery mode
- The system pod nodeSelector for convox.io/system-node is removed
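The toggle is a single parameter change:
$ convox rack params set karpenter_enabled=false -r rackName
Setting parameters... OK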
karpenter_auth_mode cannot be reverted. The EKS access config migration and discovery tags remain. This is safe and has no cost or operational impact — it means the cluster is ready for Karpenter to be re-enabled at any time.
Constraints and Limitations
- AWS only. Karpenter parameters are rejected for GCP, Azure, DigitalOcean, and other providers.
- Rack name length. Racks with Karpenter enabled are limited to 26-character names (due to derived AWS resource names like {name}-karpenter-nodes).
- BYOVPC shared VPCs. If multiple Racks share a VPC (BYOVPC into another Rack's VPC), only one may enable Karpenter due to karpenter.sh/discovery tag collision on shared subnets and security groups.
- karpenter_auth_mode is one-way. It cannot be reverted once enabled (matching AWS EKS behavior for access config migration).
- Additional node groups must be dedicated=true. Required when enabling Karpenter to prevent scheduling overlap between CAS and Karpenter.
See Also
- Workload Placement for node group-based placement with Cluster Autoscaler
- Autoscaling for horizontal pod autoscaling and GPU scaling
- build_node_enabled for enabling the dedicated build node
- additional_node_groups_config for custom EKS managed node groups
- node_type for primary node instance type
- node_disk for primary node disk size
- nvidia_device_plugin_enable for GPU workloads