3.24 Releases

Convox 3.24 upgrades Kubernetes to 1.34, introduces the convox deploy-debug command, and adds mixed ARM/x86 architecture support and Karpenter as an opt-in alternative to Cluster Autoscaler for AWS EKS node provisioning. This release also includes Fluentd memory tuning, Terraform timeout control, automatic parameter reconciliation across version transitions, and several reliability fixes.

3.24.0

Released: 2026-03-24

Feature Additions

  • Added convox deploy-debug command for diagnosing deploy failures without kubectl access (PR #962)

Updates

  • Upgraded Kubernetes to v1.34 (PR #970)
  • Updated BuildKit to v0.28.0 (PR #970)
  • Updated CoreDNS to v1.13.2 (PR #970)
  • Updated EBS CSI Driver to v1.56.0 (PR #970)
  • Updated EFS CSI Driver to v2.3.0 (PR #970)
  • Updated Pod Identity to v1.3.10 (PR #970)
  • Updated VPC CNI to v1.21.1 (PR #970)

Fixes

  • Fixed local development rack DNS routing, TLS certificate issuance, and BuildKit registry push on minikube (PR #963)

3.24.1

Released: 2026-03-31

Feature Additions

  • Added fluentd_memory rack parameter for configuring Fluentd DaemonSet memory allocation across all providers (PR #978)
  • Added terraform_update_timeout rack parameter for controlling Terraform node group update operation timeouts (PR #974)
  • Added support for mixed ARM/x86 architecture node groups within a single rack with architecture-aware build scheduling via the BuildArch app parameter (PR #964)

Updates

  • Extended rack install parameter templates to Azure, GCP, and DigitalOcean with expanded AWS parameter coverage (PR #975)
  • Improved CLI performance with parallel rack enumeration, lazy loading, and sidecar metadata caching (PR #966)
  • Standardized on Go 1.24.13 across all builds, eliminating Go 1.23 CVEs in the darwin/amd64 CLI (PR #968)

Fixes

  • Fixed the API returning HTTP 500 for all errors; it now returns the appropriate status code (404, 409, 400, or 501) and supports JSON error responses (PR #965)
  • Fixed startupProbe using liveness timing values instead of its own configuration (PR #976)
  • Fixed local rack DNS resolution to route through ingress-nginx-controller instead of vestigial router service (PR #973)

3.24.2

Released: 2026-04-06

Feature Additions

  • Added Karpenter support for AWS EKS as an opt-in alternative to Cluster Autoscaler, with ~25 configurable parameters for workload nodes, build nodes, and custom NodePools (PR #969)

Updates

  • Added automatic rack parameter reconciliation across version transitions — stale parameters are detected and removed before terraform apply, preventing failures during upgrades, downgrades, and version pinning (PR #986)
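Conceptually, reconciliation removes any saved parameter that the target version does not declare before terraform apply runs. This is a minimal sketch that assumes the target version's parameter set is available as a simple lookup. The knownForVersion map is hypothetical. In the example, fluentd_memory (added in 3.24.1) is treated as stale when pinning to a version that predates it.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcileParams keeps only the parameters the target version declares and
// returns the removed (stale) keys in sorted order. knownForVersion is a
// hypothetical stand-in for the target version's parameter list.
func reconcileParams(current map[string]string, knownForVersion map[string]bool) (map[string]string, []string) {
	kept := make(map[string]string, len(current))
	var removed []string
	for k, v := range current {
		if knownForVersion[k] {
			kept[k] = v
		} else {
			removed = append(removed, k)
		}
	}
	sort.Strings(removed)
	return kept, removed
}

func main() {
	current := map[string]string{"node_type": "t3.large", "fluentd_memory": "300Mi"}
	// Hypothetical parameter set for a version that predates fluentd_memory.
	target := map[string]bool{"node_type": true}
	kept, removed := reconcileParams(current, target)
	fmt.Println(kept, removed)
}
```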

Fixes

  • Fixed convox deploy hanging or exiting silently during build log streaming due to an informer cache race condition (PR #979)
  • Fixed internalRouter services returning 404 due to internal DNS resolver routing to the external router instead of the internal router (PR #977)
  • Fixed convox logs failing with HTTP 401 after EKS token rotation (~1 hour of rack uptime) (PR #985)
  • Fixed ECR image cleanup failing silently for apps with required environment variables in convox.yml (PR #983)

3.24.3

Released: 2026-04-13

Feature Additions

  • Added convox rack karpenter cleanup command for cleaning up orphaned Karpenter nodes after disabling Karpenter (PR #995)
  • Added the dedicated field to additional_karpenter_nodepools_config for simple pool isolation without manual taint configuration (PR #996)
  • Added automatic nodeSelectorLabels inheritance for convox run — one-off processes now target the same nodes as their deployed Service (PR #996)
  • Added CLI parameter validation with unknown-key detection, fuzzy suggestions, install-only guards, managed-parameter protection, and type checking (PR #995)
  • Added --force (-f) flag to convox rack params set to override parameter validation guards (PR #995)
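Unknown-key detection with fuzzy suggestions can be sketched as an edit-distance search over the known parameter names. The release notes do not say which matching algorithm the CLI uses. Levenshtein distance, the three-edit threshold, and the message format here are assumptions for illustration only.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// levenshtein returns the edit distance between a and b, where each
// insertion, deletion, or substitution costs 1.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	cur := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev, cur = cur, prev
	}
	return prev[len(rb)]
}

// validateKey accepts known keys; for unknown ones it suggests the closest
// known key if it is within maxSuggestDistance edits.
func validateKey(key string, known []string) error {
	const maxSuggestDistance = 3 // hypothetical threshold
	best, bestDist := "", maxSuggestDistance+1
	for _, k := range known {
		if k == key {
			return nil
		}
		if d := levenshtein(key, k); d < bestDist {
			best, bestDist = k, d
		}
	}
	if best != "" {
		return fmt.Errorf("unknown parameter %q (did you mean %q?)", key, best)
	}
	return fmt.Errorf("unknown parameter %q", key)
}

func main() {
	known := []string{"fluentd_memory", "terraform_update_timeout", "ecr_docker_hub_cache"}
	fmt.Println(validateKey("fluentd_memroy", known))
}
```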

Updates

  • Extended dedicated-node toleration auto-injection to Services and Timers targeting convox.io/nodepool pools, matching existing convox.io/label behavior (PR #996)
  • Pinned CoreDNS, EBS CSI controller, EFS CSI controller, and AWS Load Balancer Controller to system nodes when Karpenter is enabled (PR #993, PR #994)
  • Added unhealthyPodEvictionPolicy: AlwaysAllow to all Convox-managed PDBs, preventing unhealthy pods from blocking node consolidation and scale-down (PR #993)
  • Added Karpenter controller readiness gate before NodePool creation to prevent silently disappearing NodePools (PR #995)
  • Improved convox rack params display to decode additional_karpenter_nodepools_config and karpenter_config as human-readable JSON (PR #995)

Fixes

  • Fixed additional node group Terraform destroy/create cycle caused by for_each key mismatch on racks configured before 3.21.1 (PR #990)
  • Fixed spurious EKS node group rolling updates caused by $Latest launch template version string (PR #995)
  • Fixed Karpenter consolidation being silently blocked by CoreDNS topology spread constraints and controller pods landing on workload nodes (PR #994)
  • Fixed AWS Load Balancer Controller (LBC) Helm value types for nodeSelector and toleration when Karpenter is enabled (PR #995)

3.24.4

Released: 2026-04-16

Feature Additions

  • Added ecr_docker_hub_cache rack parameter for AWS that provisions an ECR pull-through cache for Docker Hub images on resource pods (Redis, Postgres, MySQL, MariaDB, Memcached, PostGIS). Docker Hub credentials are required (PR #999, PR #1010)
  • Added azure_files_enable rack parameter and azureFiles volumeOption for NFS shared storage on Azure AKS (PR #1004)
  • Implemented convox instances terminate for Kubernetes racks with drain-aware node cordoning and EC2 termination on AWS (PR #997)

Updates

  • Masked sensitive values (docker_hub_password, secret_key, token) in convox rack params output as ********** (PR #1010)
  • Extended Docker Hub imagePullSecrets to resource, service, and timer pods when docker_hub_username and docker_hub_password are set (PR #998)
  • Added aws_s3_bucket_public_access_block on the managed storage bucket for defense-in-depth (PR #1001)
  • Added CI linting pipeline with golangci-lint, govulncheck, tflint, and checkov (PR #991)
  • Bumped expr-lang/expr, opentelemetry/sdk, and stdapi for CVE patches (PR #992)
  • Replaced deprecated io/ioutil calls with modern standard library equivalents across the codebase (PR #1007)
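This is a minimal sketch of the masking rule. It assumes parameters arrive as a flat key/value map and uses only the three keys named above. The "key  value" line format is an assumption, and the real convox rack params output may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// sensitiveParams holds the keys masked as of 3.24.4 (3.24.5 extends the set).
var sensitiveParams = map[string]bool{
	"docker_hub_password": true,
	"secret_key":          true,
	"token":               true,
}

const masked = "**********"

// displayParams renders params as sorted "key  value" lines, replacing
// sensitive values with a fixed-width mask that does not reveal their length.
func displayParams(params map[string]string) []string {
	keys := make([]string, 0, len(params))
	for k := range params {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		v := params[k]
		if sensitiveParams[k] {
			v = masked
		}
		lines = append(lines, fmt.Sprintf("%s  %s", k, v))
	}
	return lines
}

func main() {
	for _, line := range displayParams(map[string]string{
		"docker_hub_password": "hunter2",
		"node_type":           "t3.large",
	}) {
		fmt.Println(line)
	}
}
```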

Fixes

  • Fixed rack install and update failures in AWS opt-in regions by forcing regional STS endpoints (PR #1002)
  • Fixed deploy failures when port and ports specify the same port number in convox.yml (PR #1005)
  • Fixed KEDA and VPA Helm install race condition on fresh AWS racks (PR #959)
  • Fixed the OIDC issuer not being enabled on existing Azure AKS clusters at Kubernetes 1.34+ (PR #1006)
  • Fixed missing cert-manager annotation on Azure API ingress causing TLS failures (PR #1008)
  • Fixed PDB disable annotation typo (pdb-disbaled → pdb-disabled); both spellings are accepted (PR #1003)
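Accepting both spellings amounts to treating the misspelled key as an alias for the correct one. This sketch uses hypothetical key names. The notes give only the pdb-disabled and pdb-disbaled spellings, not the full annotation key or any prefix. The "true" value is also illustrative.

```go
package main

import "fmt"

// Hypothetical key names: only the spelling fragments come from the release notes.
const (
	pdbDisabledKey       = "pdb-disabled"
	pdbDisabledLegacyKey = "pdb-disbaled" // historical typo, still accepted
)

// normalizeAnnotations returns a copy in which the legacy misspelled key is
// moved to the correct spelling; an explicitly set correct key wins.
func normalizeAnnotations(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	if v, ok := out[pdbDisabledLegacyKey]; ok {
		if _, set := out[pdbDisabledKey]; !set {
			out[pdbDisabledKey] = v
		}
		delete(out, pdbDisabledLegacyKey)
	}
	return out
}

func main() {
	fmt.Println(normalizeAnnotations(map[string]string{pdbDisabledLegacyKey: "true"}))
}
```

Normalizing once means downstream code only has to check a single spelling.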

3.24.5

Released: 2026-04-22

Feature Additions

  • Added container-level securityContext on services and timers with support for runAsNonRoot, runAsUser, runAsGroup, readOnlyRootFilesystem, allowPrivilegeEscalation, capabilities.add/drop, and seccompProfile (RuntimeDefault or Unconfined). Settings apply to Deployment pods, CronJob pods (timers), convox run, and convox exec containers. Validation catches unsupported seccomp profiles, malformed capability names, and the runAsNonRoot: true + runAsUser: 0 conflict at convox deploy time (PR #947).
  • Added convox env mask, convox env mask set, and convox env mask unset commands to mark environment variable keys as sensitive on a per-app basis. Masked values render as **** in convox env and convox releases info output on a TTY, while piped output and the new --reveal flag continue to show real values. The mask list is stored per-app on the rack and does not trigger a release promotion (PR #1013).
  • Added health.port and liveness.port manifest fields so the readiness and liveness probes can target a dedicated health endpoint instead of the main service port. Accepts either scalar (port: 9090) or map (port: { port: 9090, scheme: https }) forms. Readiness auto-inherits the main service scheme when only the port is set; liveness does not auto-inherit. The startup probe continues to target the main service port (PR #1014).
  • Added emptyDir.sizeLimit under volumeOptions to size ephemeral volumes (e.g. /dev/shm for ML inference sidecars). Validated at manifest parse time as a Kubernetes resource quantity.
  • Added --gpu and --gpu-vendor flags to convox scale for in-place GPU updates.
  • Added convox services update <service> command mirroring the convox scale update path with the same flag set (--count, --cpu, --memory, --gpu, --gpu-vendor).
  • Added a GPU column to convox scale output. Services with gpu.count: 0 render as -.
  • Added GPU-aware startup probe defaults. Services with scale.gpu.count > 0, port.port > 0, and no explicit startupProbe now receive a TCP startup probe with grace=300s, interval=10s, timeout=5s, failureThreshold=30, successThreshold=1 — enough headroom for GPU model loads. Explicit user config always wins.
  • Surfaced GPU fields on the rack API: gpu and gpu-vendor on Service, gpu on Process, cluster-gpu and process-gpu on Capacity, gpu-capacity and gpu-allocatable on Instance.
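The GPU startup-probe default is a three-way decision: use the explicit config, apply the GPU default, or let this rule add nothing. This sketch uses hypothetical Service and Probe types and encodes only the conditions and values listed above. The notes do not say how "grace" maps onto Kubernetes probe fields, so the sketch keeps it as a plain number of seconds.

```go
package main

import "fmt"

// Probe is a simplified, hypothetical stand-in for a startup probe spec.
type Probe struct {
	Kind             string // "tcp" for the GPU default
	Port             int
	Grace            int // seconds; mapping onto Kubernetes fields is not stated
	Interval         int // seconds
	Timeout          int // seconds
	FailureThreshold int
	SuccessThreshold int
}

// Service carries only the fields the defaulting rule reads (hypothetical shape).
type Service struct {
	GPUCount     int    // scale.gpu.count
	Port         int    // port.port
	StartupProbe *Probe // nil when no explicit startupProbe is configured
}

// startupProbeFor returns the explicit probe when one is set, the GPU default
// when scale.gpu.count > 0 and port.port > 0, and nil when this rule does not apply.
func startupProbeFor(s Service) *Probe {
	if s.StartupProbe != nil {
		return s.StartupProbe // explicit user config always wins
	}
	if s.GPUCount > 0 && s.Port > 0 {
		return &Probe{Kind: "tcp", Port: s.Port, Grace: 300, Interval: 10, Timeout: 5, FailureThreshold: 30, SuccessThreshold: 1}
	}
	return nil
}

func main() {
	p := startupProbeFor(Service{GPUCount: 1, Port: 8000})
	fmt.Printf("%+v\n", *p)
}
```

If grace acts as an initial delay, a model load gets roughly 300 + 30 × 10 = 600 seconds before the startup probe gives up.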

Updates

  • Added --max-log-requests flag to convox logs and convox rack logs so services with more than 20 pods can stream logs past the default follow-stream concurrency cap. The default remains 20 when the flag is not supplied, preserving prior behavior (PR #958).
  • Added -g / --group filter to convox rack params that narrows output to a curated logical group (karpenter, network, security, scaling, nodes, build, registry, logging, ingress, domain, storage, retention, versions). Supports exact and unique-prefix matching (-g karp resolves to karpenter); ambiguous or unknown inputs print the full group list. Also extended the sensitive-param masking introduced in 3.24.4 to cover access_id, private_eks_host, private_eks_user, and private_eks_pass, closing a CLI leak path for private EKS credentials and DigitalOcean access key IDs (PR #1015).
  • Added --reveal flag and TTY-gated masking to convox rack params. Sensitive values now render as ********** only on a TTY without --reveal; piped output always shows real values so existing backup and scripting flows (convox rack params > rack.txt, | grep, | jq) continue to work. Mirrors the pattern added to convox env in the same release.
  • scale.gpu.vendor now maps through an explicit vendor → resource-key table (nvidia, nvidia.com → nvidia.com/gpu; amd, amd.com → amd.com/gpu). Previously the template used a .com-suffix heuristic which emitted garbage resource keys for unknown or misspelled vendors, causing pods to stay Pending forever. Unknown or unset vendors now default to nvidia.com/gpu. Customers using scale.gpu.vendor: nvidia, amd, nvidia.com, or amd.com see no change. Customers using an invalid vendor string see their GPU pods begin scheduling on NVIDIA nodes instead of staying Pending indefinitely.
  • GPU pod scheduling on tainted GPU nodepools (e.g. additional_karpenter_nodepools_config with nvidia.com/gpu=true:NoSchedule) no longer depends on the ExtendedResourceToleration Kubernetes admission controller (which is not enabled by default on EKS). Convox now emits the matching tolerations: entry (operator: Exists, effect: NoSchedule) directly on each pod that declares scale.gpu.count > 0. This applies to service Deployments (via service.yml.tmpl), CronJob pods (via timer.yml.tmpl), convox scale/convox services update runtime mutations (via ServiceUpdate), and one-shot convox run --gpu N pods (via podSpecFromRunOptions). The emitted toleration is effect: NoSchedule only; clusters tainting GPU nodes with effect: NoExecute must continue to use the admission controller or custom admission webhooks.
  • convox run --gpu N --gpu-vendor VENDOR now honors the --gpu-vendor flag (previously the run path only emitted nvidia.com/gpu).
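The explicit vendor table can be sketched as a map lookup with a fallback. The function name is hypothetical, while the entries and the nvidia.com/gpu default come from the note above. Lookups are exact-match here because case handling is not described. Under this rule an unmapped value such as neuron also falls back to the default.

```go
package main

import "fmt"

// gpuResourceKeys mirrors the explicit vendor → resource-key table.
var gpuResourceKeys = map[string]string{
	"nvidia":     "nvidia.com/gpu",
	"nvidia.com": "nvidia.com/gpu",
	"amd":        "amd.com/gpu",
	"amd.com":    "amd.com/gpu",
}

const defaultGPUResourceKey = "nvidia.com/gpu"

// gpuResourceKey resolves scale.gpu.vendor to an extended-resource key.
// Unknown, misspelled, or unset vendors fall back to nvidia.com/gpu instead
// of producing a key that no node advertises.
func gpuResourceKey(vendor string) string {
	if key, ok := gpuResourceKeys[vendor]; ok {
		return key
	}
	return defaultGPUResourceKey
}

func main() {
	for _, v := range []string{"nvidia", "amd.com", "nvida", ""} {
		fmt.Printf("%q -> %s\n", v, gpuResourceKey(v))
	}
}
```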

Fixes

  • Agent services (agent.enabled: true, backed by Kubernetes DaemonSets) now report their configured cpu and memory values via the rack API's ServiceList response, the convox scale output table, and the Console Services panel. Previously the DaemonSet branch of ServiceList omitted the resource reads — agent services always showed cpu: 0, memory: 0 regardless of convox.yml scale settings. Any dashboard or tooling that sums per-service resource requests for an app will now include the agent's real footprint.
  • Removed the spurious sensitive = true attribute on the docker_hub_password Terraform variable that was blocking terraform apply against legacy rack state files. The credential remains masked in convox rack params output via the CLI sensitiveParams mechanism, and rack Terraform state continues to be stored encrypted — no protection was removed, only an attribute that was breaking the legacy update path.

Behavior change: privileged: true now renders into Deployment and CronJob pod specs

The top-level privileged: true service flag was previously honored only by convox run on V3. Deployment and CronJob pods silently dropped it. This release brings V3 Deployment and CronJob rendering in line with V2 semantics and the V3 convox run path. If you have privileged: true in a convox.yml and do not actually want a privileged pod, remove the flag before upgrading — on first deploy after 3.24.5, a pod-spec diff will trigger one rolling restart on affected services (PR #947).

Notes

  • To change GPU vendor on a deployed service, edit scale.gpu.vendor in convox.yml and redeploy. Runtime vendor-swap via convox scale --gpu-vendor or convox services update --gpu-vendor is not supported in this release — the new vendor's resource key is added but the previous vendor's key remains in the pod spec, causing scheduling to stall.
  • AWS Neuron (aws.amazon.com/neuron) is intentionally not mapped in this release. Customers should not set scale.gpu.vendor: neuron. Neuron support ships in a future release alongside automatic node labeling.
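The first note describes why a runtime vendor swap stalls, and the behavior can be modeled with plain maps standing in for container resource limits. This is an illustrative assumption about the update semantics, not Convox's code, and both helper names are hypothetical. A merge keeps the old key, while a fresh render on redeploy contains only the new one. A pod that requests both nvidia.com/gpu and amd.com/gpu fits only on a node that advertises both, which is unlikely in practice.

```go
package main

import "fmt"

// mergeGPULimit models an in-place update that only adds the new vendor's
// key: the previous key survives, so the pod now asks for both resources.
func mergeGPULimit(limits map[string]int, newKey string, count int) map[string]int {
	out := make(map[string]int, len(limits)+1)
	for k, v := range limits {
		out[k] = v
	}
	out[newKey] = count
	return out
}

// renderGPULimit models a fresh render on redeploy: only the current key.
func renderGPULimit(key string, count int) map[string]int {
	return map[string]int{key: count}
}

func main() {
	deployed := renderGPULimit("nvidia.com/gpu", 1)
	fmt.Println("runtime swap:", mergeGPULimit(deployed, "amd.com/gpu", 1))
	fmt.Println("redeploy:    ", renderGPULimit("amd.com/gpu", 1))
}
```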

See Also