3.24 Releases
Convox 3.24 upgrades Kubernetes to 1.34, introduces the convox deploy-debug command, adds mixed ARM/x86 architecture support, and adds Karpenter as an opt-in alternative to Cluster Autoscaler for AWS EKS node provisioning. This release also includes Fluentd memory tuning, Terraform timeout control, automatic parameter reconciliation across version transitions, and several reliability fixes.
3.24.0
Released: 2026-03-24
Feature Additions
- Added convox deploy-debug command for diagnosing deploy failures without kubectl access (PR #962)
Updates
- Upgraded Kubernetes to v1.34 (PR #970)
- Updated BuildKit to v0.28.0 (PR #970)
- Updated CoreDNS to v1.13.2 (PR #970)
- Updated EBS CSI Driver to v1.56.0 (PR #970)
- Updated EFS CSI Driver to v2.3.0 (PR #970)
- Updated Pod Identity to v1.3.10 (PR #970)
- Updated VPC CNI to v1.21.1 (PR #970)
Fixes
- Fixed local development rack DNS routing, TLS certificate issuance, and BuildKit registry push on minikube (PR #963)
3.24.1
Released: 2026-03-31
Feature Additions
- Added fluentd_memory rack parameter for configuring Fluentd DaemonSet memory allocation across all providers (PR #978)
- Added terraform_update_timeout rack parameter for controlling Terraform node group update operation timeouts (PR #974)
- Added support for mixed ARM/x86 architecture node groups within a single rack with architecture-aware build scheduling via the BuildArch app parameter (PR #964)
Updates
- Extended rack install parameter templates to Azure, GCP, and DigitalOcean with expanded AWS parameter coverage (PR #975)
- Improved CLI performance with parallel rack enumeration, lazy loading, and sidecar metadata caching (PR #966)
- Standardized on Go 1.24.13 across all builds, eliminating Go 1.23 CVEs in the darwin/amd64 CLI (PR #968)
Fixes
- Fixed API to return correct HTTP status codes (404, 409, 400, 501) instead of 500 for all errors, with JSON error response support (PR #965)
- Fixed startupProbe using liveness timing values instead of its own configuration (PR #976)
- Fixed local rack DNS resolution to route through ingress-nginx-controller instead of vestigial router service (PR #973)
3.24.2
Released: 2026-04-06
Feature Additions
- Added Karpenter support for AWS EKS as an opt-in alternative to Cluster Autoscaler, with ~25 configurable parameters for workload nodes, build nodes, and custom NodePools (PR #969)
Updates
- Added automatic rack parameter reconciliation across version transitions: stale parameters are detected and removed before terraform apply, preventing failures during upgrades, downgrades, and version pinning (PR #986)
Fixes
- Fixed convox deploy hanging or exiting silently during build log streaming due to an informer cache race condition (PR #979)
- Fixed internalRouter services returning 404 due to internal DNS resolver routing to the external router instead of the internal router (PR #977)
- Fixed convox logs failing with HTTP 401 after EKS token rotation (~1 hour of rack uptime) (PR #985)
- Fixed ECR image cleanup failing silently for apps with required environment variables in convox.yml (PR #983)
3.24.3
Released: 2026-04-13
Feature Additions
- Added convox rack karpenter cleanup command for cleaning up orphaned Karpenter nodes after disabling Karpenter (PR #995)
- Added dedicated field to additional_karpenter_nodepools_config for simple pool isolation without manual taint configuration (PR #996)
- Added automatic nodeSelectorLabels inheritance for convox run: one-off processes now target the same nodes as their deployed Service (PR #996)
- Added CLI parameter validation with unknown-key detection, fuzzy suggestions, install-only guards, managed-parameter protection, and type checking (PR #995)
- Added --force (-f) flag to convox rack params set to override parameter validation guards (PR #995)
Updates
- Extended dedicated-node toleration auto-injection to Services and Timers targeting convox.io/nodepool pools, matching existing convox.io/label behavior (PR #996)
- Pinned CoreDNS, EBS CSI controller, EFS CSI controller, and AWS Load Balancer Controller to system nodes when Karpenter is enabled (PR #993, PR #994)
- Added unhealthyPodEvictionPolicy: AlwaysAllow to all Convox-managed PDBs, preventing unhealthy pods from blocking node consolidation and scale-down (PR #993)
- Added Karpenter controller readiness gate before NodePool creation to prevent silently disappearing NodePools (PR #995)
- Improved convox rack params display to decode additional_karpenter_nodepools_config and karpenter_config as human-readable JSON (PR #995)
Fixes
- Fixed additional node group Terraform destroy/create cycle caused by for_each key mismatch on racks configured before 3.21.1 (PR #990)
- Fixed spurious EKS node group rolling updates caused by $Latest launch template version string (PR #995)
- Fixed Karpenter consolidation being silently blocked by CoreDNS topology spread constraints and controller pods landing on workload nodes (PR #994)
- Fixed LBC Helm value types for nodeSelector and toleration when Karpenter is enabled (PR #995)
3.24.4
Released: 2026-04-16
Feature Additions
- Added ecr_docker_hub_cache rack parameter for AWS that provisions an ECR pull-through cache for Docker Hub images on resource pods (Redis, Postgres, MySQL, MariaDB, Memcached, PostGIS). Docker Hub credentials are required (PR #999, PR #1010)
- Added azure_files_enable rack parameter and azureFiles volumeOption for NFS shared storage on Azure AKS (PR #1004)
- Implemented convox instances terminate for Kubernetes racks with drain-aware node cordoning and EC2 termination on AWS (PR #997)
Updates
- Masked sensitive values (docker_hub_password, secret_key, token) in convox rack params output as ********** (PR #1010)
- Extended Docker Hub imagePullSecrets to resource, service, and timer pods when docker_hub_username and docker_hub_password are set (PR #998)
- Added aws_s3_bucket_public_access_block on the managed storage bucket for defense-in-depth (PR #1001)
- Added CI linting pipeline with golangci-lint, govulncheck, tflint, and checkov (PR #991)
- Bumped expr-lang/expr, opentelemetry/sdk, and stdapi for CVE patches (PR #992)
- Replaced deprecated io/ioutil calls with modern standard library equivalents across the codebase (PR #1007)
Fixes
- Fixed rack install and update failures in AWS opt-in regions by forcing regional STS endpoints (PR #1002)
- Fixed deploy failures when port and ports specify the same port number in convox.yml (PR #1005)
- Fixed KEDA and VPA Helm install race condition on fresh AWS racks (PR #959)
- Fixed Azure AKS OIDC issuer not enabled on existing clusters at Kubernetes 1.34+ (PR #1006)
- Fixed missing cert-manager annotation on Azure API ingress causing TLS failures (PR #1008)
- Fixed PDB disable annotation typo (pdb-disbaled → pdb-disabled); both spellings accepted (PR #1003)
3.24.5
Released: 2026-04-22
Feature Additions
- Added container-level securityContext on services and timers with support for runAsNonRoot, runAsUser, runAsGroup, readOnlyRootFilesystem, allowPrivilegeEscalation, capabilities.add/drop, and seccompProfile (RuntimeDefault or Unconfined). Settings apply to Deployment pods, CronJob pods (timers), convox run, and convox exec containers. Validation catches unsupported seccomp profiles, malformed capability names, and the runAsNonRoot: true + runAsUser: 0 conflict at convox deploy time (PR #947).
- Added convox env mask, convox env mask set, and convox env mask unset commands to mark environment variable keys as sensitive on a per-app basis. Masked values render as **** in convox env and convox releases info output on a TTY, while piped output and the new --reveal flag continue to show real values. The mask list is stored per-app on the rack and does not trigger a release promotion (PR #1013).
- Added health.port and liveness.port manifest fields so the readiness and liveness probes can target a dedicated health endpoint instead of the main service port. Accepts either scalar (port: 9090) or map (port: { port: 9090, scheme: https }) forms. Readiness auto-inherits the main service scheme when only the port is set; liveness does not auto-inherit. The startup probe continues to target the main service port (PR #1014).
- Added emptyDir.sizeLimit under volumeOptions to size ephemeral volumes (e.g. /dev/shm for ML inference sidecars). Validated at manifest parse time as a Kubernetes resource quantity.
- Added --gpu and --gpu-vendor flags to convox scale for in-place GPU updates.
- Added convox services update <service> command mirroring the convox scale update path with the same flag set (--count, --cpu, --memory, --gpu, --gpu-vendor).
- Added a GPU column to convox scale output. Services with gpu.count: 0 render as -.
- Added GPU-aware startup probe defaults. Services with scale.gpu.count > 0, port.port > 0, and no explicit startupProbe now receive a TCP startup probe with grace=300s, interval=10s, timeout=5s, failureThreshold=30, successThreshold=1, which leaves enough headroom for GPU model loads. Explicit user config always wins.
- Surfaced GPU fields on the rack API: gpu and gpu-vendor on Service, gpu on Process, cluster-gpu and process-gpu on Capacity, gpu-capacity and gpu-allocatable on Instance.
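The GPU-aware startup probe rule in the list above can be read as a small decision function. The following Go sketch is illustrative only: the Service and StartupProbe types and the gpuStartupProbe function are hypothetical names invented for this example and are not taken from the Convox codebase. Only the conditions and the numeric defaults come from the release note.

```go
package main

import "fmt"

// StartupProbe is a hypothetical type for this sketch, not Convox's own.
type StartupProbe struct {
	Type             string // "tcp" for the GPU default
	GraceSeconds     int
	IntervalSeconds  int
	TimeoutSeconds   int
	FailureThreshold int
	SuccessThreshold int
}

// Service carries only the fields the rule inspects (hypothetical shape).
type Service struct {
	GPUCount     int           // scale.gpu.count
	Port         int           // port.port
	StartupProbe *StartupProbe // nil when convox.yml sets no explicit startupProbe
}

// gpuStartupProbe returns the probe selected by the rule in the release note.
// The boolean is false when the service has no explicit probe and does not
// qualify for the GPU default; other defaults are outside this sketch.
func gpuStartupProbe(s Service) (StartupProbe, bool) {
	if s.StartupProbe != nil {
		return *s.StartupProbe, true // explicit user config always wins
	}
	if s.GPUCount > 0 && s.Port > 0 {
		return StartupProbe{
			Type:             "tcp",
			GraceSeconds:     300,
			IntervalSeconds:  10,
			TimeoutSeconds:   5,
			FailureThreshold: 30,
			SuccessThreshold: 1,
		}, true
	}
	return StartupProbe{}, false
}

func main() {
	p, ok := gpuStartupProbe(Service{GPUCount: 1, Port: 8080})
	fmt.Println(ok, p.Type, p.GraceSeconds, p.FailureThreshold)
}
```

With these values the probe tolerates a 300-second grace period plus up to 30 failed checks at 10-second intervals before the container is restarted, which is the headroom the note refers to.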
Updates
- Added --max-log-requests flag to convox logs and convox rack logs so services with more than 20 pods can stream logs past the default follow-stream concurrency cap. The default remains 20 when the flag is not supplied, preserving prior behavior (PR #958).
- Added -g/--group filter to convox rack params that narrows output to a curated logical group (karpenter, network, security, scaling, nodes, build, registry, logging, ingress, domain, storage, retention, versions). Supports exact and unique-prefix matching (-g karp resolves to karpenter); ambiguous or unknown inputs print the full group list. Also extended the sensitive-param masking introduced in 3.24.4 to cover access_id, private_eks_host, private_eks_user, and private_eks_pass, closing a CLI leak path for private EKS credentials and DigitalOcean access key IDs (PR #1015).
- Added --reveal flag and TTY-gated masking to convox rack params. Sensitive values now render as ********** only on a TTY without --reveal; piped output always shows real values so existing backup and scripting flows (convox rack params > rack.txt, | grep, | jq) continue to work. Mirrors the pattern added to convox env in the same release.
- scale.gpu.vendor now maps through an explicit vendor → resource-key table (nvidia, nvidia.com → nvidia.com/gpu; amd, amd.com → amd.com/gpu). Previously the template used a .com-suffix heuristic that emitted invalid resource keys for unknown or misspelled vendors, causing pods to stay Pending forever. Unknown or unset vendors now default to nvidia.com/gpu. Customers using scale.gpu.vendor: nvidia, amd, nvidia.com, or amd.com see no change. Customers using an invalid vendor string see their GPU pods begin scheduling on NVIDIA nodes instead of staying Pending indefinitely.
- GPU pod scheduling on tainted GPU nodepools (e.g. additional_karpenter_nodepools_config with nvidia.com/gpu=true:NoSchedule) no longer depends on the ExtendedResourceToleration Kubernetes admission controller (which is not enabled by default on EKS). Convox now emits the matching tolerations: entry (operator: Exists, effect: NoSchedule) directly on each pod that declares scale.gpu.count > 0. This applies to service Deployments (via service.yml.tmpl), CronJob pods (via timer.yml.tmpl), convox scale / convox services update runtime mutations (via ServiceUpdate), and one-shot convox run --gpu N pods (via podSpecFromRunOptions). The emitted toleration is effect: NoSchedule only; clusters tainting GPU nodes with effect: NoExecute must continue to use the admission controller or custom admission webhooks.
- convox run --gpu N --gpu-vendor VENDOR now honors the --gpu-vendor flag (previously the run path only emitted nvidia.com/gpu).
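Two lookup rules from the list above, the GPU vendor table and the -g/--group prefix matching, can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the Convox implementation: the function and variable names (gpuResourceKey, resolveGroup, gpuResourceKeys, groups) are hypothetical, and the sketch assumes vendor strings are compared exactly, without trimming or case folding, which the release note does not specify.

```go
package main

import (
	"fmt"
	"strings"
)

// gpuResourceKeys mirrors the vendor -> resource-key table from the release note.
var gpuResourceKeys = map[string]string{
	"nvidia":     "nvidia.com/gpu",
	"nvidia.com": "nvidia.com/gpu",
	"amd":        "amd.com/gpu",
	"amd.com":    "amd.com/gpu",
}

// gpuResourceKey returns the Kubernetes resource key for a vendor string.
// Unknown or unset vendors fall back to nvidia.com/gpu, as the note describes.
// Assumption: lookups are exact-match.
func gpuResourceKey(vendor string) string {
	if key, ok := gpuResourceKeys[vendor]; ok {
		return key
	}
	return "nvidia.com/gpu"
}

// groups is the curated group list quoted in the release note.
var groups = []string{
	"karpenter", "network", "security", "scaling", "nodes", "build",
	"registry", "logging", "ingress", "domain", "storage", "retention", "versions",
}

// resolveGroup accepts an exact group name or a prefix that matches exactly
// one group. It returns false for ambiguous or unknown input, the case in
// which the CLI is described as printing the full group list.
func resolveGroup(input string) (string, bool) {
	match, count := "", 0
	for _, g := range groups {
		if g == input {
			return g, true // exact match wins immediately
		}
		if strings.HasPrefix(g, input) {
			match = g
			count++
		}
	}
	if count == 1 {
		return match, true
	}
	return "", false
}

func main() {
	fmt.Println(gpuResourceKey("amd"), gpuResourceKey("typo-vendor"))
	fmt.Println(resolveGroup("karp"))
	fmt.Println(resolveGroup("s")) // security, scaling, storage: ambiguous
}
```

A fixed table with a single default trades flexibility for predictability: a misspelled vendor can no longer produce a resource key that no node advertises.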
Fixes
- Agent services (agent.enabled: true, backed by Kubernetes DaemonSets) now report their configured cpu and memory values via the rack API's ServiceList response, the convox scale output table, and the Console Services panel. Previously the DaemonSet branch of ServiceList omitted the resource reads, so agent services always showed cpu: 0, memory: 0 regardless of convox.yml scale settings. Any dashboard or tooling that sums per-service resource requests for an app will now include the agent's real footprint.
- Removed the spurious sensitive = true attribute on the docker_hub_password Terraform variable that was blocking terraform apply against legacy rack state files. The credential remains masked in convox rack params output via the CLI sensitiveParams mechanism, and rack Terraform state continues to be stored encrypted. No protection was removed, only an attribute that was breaking the legacy update path.
Behavior change: privileged: true now renders into Deployment and CronJob pod specs
The top-level privileged: true service flag was previously honored only by convox run on V3. Deployment and CronJob pods silently dropped it. This release brings V3 Deployment and CronJob rendering in line with V2 semantics and the V3 convox run path. If you have privileged: true in a convox.yml and do not actually want a privileged pod, remove the flag before upgrading. On the first deploy after 3.24.5, a pod-spec diff will trigger one rolling restart on affected services (PR #947).
Notes
- To change GPU vendor on a deployed service, edit scale.gpu.vendor in convox.yml and redeploy. Runtime vendor-swap via convox scale --gpu-vendor or convox services update --gpu-vendor is not supported in this release: the new vendor's resource key is added but the previous vendor's key remains in the pod spec, causing scheduling to stall.
- AWS Neuron (aws.amazon.com/neuron) is intentionally not mapped in this release. Customers should not set scale.gpu.vendor: neuron. Neuron support ships in a future release alongside automatic node labeling.
See Also
- Releases for the full release history
- Karpenter for Karpenter node autoscaling configuration
- deploy-debug for deploy failure diagnostics
- BuildArch for architecture-aware build scheduling
- fluentd_memory for Fluentd memory tuning
- terraform_update_timeout for Terraform timeout configuration
- Health Checks for startupProbe configuration
- Workload Placement for mixed-architecture placement strategies
- releases_to_retain_after_active for release cleanup configuration
- ecr_docker_hub_cache for the Docker Hub pull-through cache
- azure_files_enable for Azure Files NFS volumes
- Volumes for the azureFiles and awsEfs volumeOption reference
- Instance for convox instances terminate behavior on v3 racks
- securityContext for container-level hardening on services and timers
- env for the env mask commands and --reveal flag
- rack params for the -g/--group filter and masking behavior
- Separate Health Port for routing probes to a dedicated endpoint
- logs for the --max-log-requests flag