Health Checks
Deploying a Service behind a load balancer requires a health check to determine whether a given Process is ready to handle requests.
Health checks must return a valid HTTP response code (200-399) within the configured timeout.
Processes that fail two health checks in a row are assumed dead and will be terminated and replaced.
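On the application side, the endpoint only needs to answer with a status code in the 200-399 range before the timeout elapses. A minimal sketch of such an endpoint in Go (the `/check` path and port `3000` are illustrative, not Convox defaults):

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// The load balancer requests this endpoint; any response in the
	// 200-399 range within the configured timeout counts as healthy.
	http.HandleFunc("/check", func(w http.ResponseWriter, r *http.Request) {
		// Optionally verify critical dependencies here before answering.
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":3000", nil))
}
```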
Definition
Simple
```yaml
services:
  web:
    health: /check
```
Specifying `health` as a string will set the `path` and leave the other options as defaults.
Advanced
```yaml
services:
  web:
    health:
      grace: 5
      interval: 5
      path: /check
      timeout: 3
```
| Attribute | Default | Description |
|---|---|---|
| grace | 5 | The amount of time in seconds to wait for a Process to boot before beginning health checks |
| interval | 5 | The number of seconds between health checks |
| path | / | The HTTP endpoint that will be requested |
| timeout | 4 | The number of seconds to wait for a valid response |
Liveness Checks
Liveness checks complement health checks by monitoring the ongoing health of running processes. While health checks (readiness probes) determine when a service is ready to receive traffic, liveness checks determine when a service should be restarted if it becomes unresponsive or enters a broken state.
When a liveness check fails, Kubernetes will restart the container, which can help recover from deadlocks, memory leaks, or other issues that cause a process to become unresponsive while still appearing to be running.
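One pattern for a liveness endpoint is to track a heartbeat from the process's main work loop and report failure when it goes stale, so a deadlocked process gets restarted. A sketch in Go (the `/liveness/check` path matches the configuration below; the heartbeat threshold and port are illustrative assumptions):

```go
package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

// lastBeat holds the Unix time of the worker's most recent heartbeat.
var lastBeat atomic.Int64

func main() {
	lastBeat.Store(time.Now().Unix())

	// Work loop: record a heartbeat on every iteration. If this goroutine
	// deadlocks, the heartbeat stops advancing.
	go func() {
		for {
			// ... do work ...
			lastBeat.Store(time.Now().Unix())
			time.Sleep(time.Second)
		}
	}()

	// Liveness endpoint: fail (non-2xx) when the heartbeat is stale so the
	// process gets restarted.
	http.HandleFunc("/liveness/check", func(w http.ResponseWriter, r *http.Request) {
		if time.Now().Unix()-lastBeat.Load() > 30 {
			http.Error(w, "worker heartbeat stale", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":3000", nil))
}
```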
Liveness Check Configuration
```yaml
services:
  web:
    liveness:
      path: /liveness/check
      grace: 15
      interval: 5
      timeout: 3
      successThreshold: 1
      failureThreshold: 3
```
| Attribute | Default | Description |
|---|---|---|
| path | Required | The HTTP endpoint that will be requested for liveness checks |
| grace | 10 | The amount of time in seconds to wait for a Process to start before beginning liveness checks |
| interval | 5 | The number of seconds between liveness checks |
| timeout | 5 | The number of seconds to wait for a successful response |
| successThreshold | 1 | The number of consecutive successful checks required to consider the probe successful |
| failureThreshold | 3 | The number of consecutive failed checks required before restarting the container |
Important Considerations
- Path is Required: Unlike health checks, you must specify a `path` to enable liveness checks
- Conservative Configuration: Liveness checks should be configured conservatively to avoid unnecessary restarts. False positives can cause service disruption
- Separate Endpoints: Consider using different endpoints for health checks and liveness checks to monitor different aspects of your application
- Startup Time: Set an appropriate `grace` period to allow your application to fully initialize before liveness checks begin
Example Use Cases
Detecting Deadlocks:
```yaml
services:
  worker:
    liveness:
      path: /worker/health
      grace: 30
      interval: 10
      failureThreshold: 5
```
Monitoring Memory-Intensive Applications:
```yaml
services:
  processor:
    liveness:
      path: /memory-check
      grace: 45
      interval: 15
      timeout: 10
      failureThreshold: 3
```
Startup Probes
Startup probes provide a way to check if an application has successfully started before allowing readiness and liveness probes to take effect. This is particularly useful for applications that require significant initialization time or have variable startup durations.
When a startup probe is configured, all other probes are disabled until it succeeds. This prevents Kubernetes from prematurely marking a service as unhealthy or restarting it before initialization completes.
Startup Probe Configuration
```yaml
services:
  web:
    build: .
    port: 3000
    startupProbe:
      tcpSocketPort: 3000
      grace: 30
      interval: 10
      timeout: 5
      successThreshold: 1
      failureThreshold: 30
```
| Attribute | Default | Description |
|---|---|---|
| tcpSocketPort | Required | The TCP port to check for startup success |
| grace | 0 | The number of seconds to wait before starting startup checks |
| interval | 10 | The number of seconds between startup probe checks |
| timeout | 1 | The number of seconds to wait for a successful response |
| successThreshold | 1 | The number of consecutive successful checks required to consider the startup complete |
| failureThreshold | 3 | The number of consecutive failed checks before the container is restarted |
HTTP Startup Probe
You can also use an HTTP endpoint for startup probes:
```yaml
services:
  api:
    build: .
    port: 8080
    startupProbe:
      path: /startup
      grace: 10
      interval: 5
      failureThreshold: 40
```
| Attribute | Default | Description |
|---|---|---|
| path | Required | The HTTP endpoint to check for startup success |
| grace | 0 | The number of seconds to wait before starting startup checks |
| interval | 10 | The number of seconds between startup probe checks |
| timeout | 1 | The number of seconds to wait for a successful response |
| successThreshold | 1 | The number of consecutive successful checks required to consider the startup complete |
| failureThreshold | 3 | The number of consecutive failed checks before the container is restarted |
Use Cases for Startup Probes
Startup probes are ideal for:
- Database Migrations: Applications that run database migrations on startup
- Cache Warming: Services that need to populate caches before serving traffic
- Large Applications: Applications with significant initialization requirements
- Configuration Loading: Services that load extensive configuration or connect to multiple external services
- Legacy Applications: Applications with unpredictable or lengthy startup times
Example: Application with Long Initialization
```yaml
services:
  analytics:
    build: .
    port: 5000
    startupProbe:
      tcpSocketPort: 5000
      grace: 60
      interval: 15
      failureThreshold: 20 # Allows up to 5 minutes for startup (15s * 20)
    health:
      path: /health
      interval: 5
    liveness:
      path: /live
      interval: 10
      failureThreshold: 3
```
In this example:
- The startup probe allows up to 5 minutes for the application to start
- Once the startup probe succeeds, the health and liveness checks begin
- If startup fails after 20 attempts, the container is restarted
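One way to pair an application with a `tcpSocketPort` startup probe is to defer opening the port until initialization is complete, so the probe succeeds exactly when the socket starts accepting connections. A sketch in Go (the `warmCaches` function is a hypothetical placeholder; the port and paths mirror the example above):

```go
package main

import (
	"log"
	"net"
	"net/http"
)

func main() {
	// Perform all expensive initialization before the port is opened: the
	// tcpSocketPort startup probe only succeeds once the socket accepts
	// connections, so nothing needs to be reported explicitly.
	warmCaches()

	ln, err := net.Listen("tcp", ":5000")
	if err != nil {
		log.Fatal(err)
	}

	// /health and /live back the readiness and liveness checks configured above.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusOK) })
	http.HandleFunc("/live", func(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusOK) })

	log.Fatal(http.Serve(ln, nil))
}

func warmCaches() {
	// ... load data, run migrations, connect to dependencies ...
}
```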
Important Startup Probe Considerations
- Relationship with Other Probes: Liveness and readiness probes are disabled until the startup probe succeeds
- Failure Threshold: Set a high enough `failureThreshold` to accommodate your application's maximum startup time
- Startup vs. Liveness: Use startup probes for initialization, liveness probes for ongoing health monitoring
- Resource Planning: Consider that pods may take longer to become ready when using startup probes
gRPC Health Checks
For services that use gRPC instead of HTTP, Convox provides support for gRPC health checks through the gRPC health checking protocol. To enable gRPC health checks, you need to:
- Specify that your service uses the gRPC protocol in the port definition
- Enable the gRPC health check with the `grpcHealthEnabled` attribute
Basic Configuration
```yaml
services:
  api:
    build: .
    port: grpc:50051
    grpcHealthEnabled: true
```
Advanced Configuration
You can customize the gRPC health check behavior using the same health attributes as HTTP health checks:
```yaml
services:
  api:
    build: .
    port: grpc:50051
    grpcHealthEnabled: true
    health:
      grace: 20
      interval: 5
      path: /
      timeout: 2
```
| Attribute | Default | Description |
|---|---|---|
| grace | 5 | The amount of time in seconds to wait for a Process to boot before beginning health checks |
| interval | 5 | The number of seconds between health checks |
| path | / | The service name to check within your gRPC health implementation |
| timeout | 4 | The number of seconds to wait for a valid response |
Implementation Requirements
Services using gRPC health checks must implement the gRPC Health Checking Protocol, which is defined in the gRPC health checking protocol repository.
This protocol requires your service to implement a Health service with a Check method that returns the service's health status.
Probe Behavior
When grpcHealthEnabled is set to true, Convox configures both:
- A readinessProbe - Determines whether the service is ready to receive traffic
- A livenessProbe - Determines whether the service should be restarted
The readinessProbe ensures that gRPC services won't receive traffic until they are fully ready, while the livenessProbe monitors the ongoing health of the service and initiates restarts if necessary.
Both probes use the health settings defined in your convox.yml, ensuring consistent behavior throughout the service lifecycle.
Example Implementation
Here's a minimal example of a gRPC health check implementation in Go:
```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	server := grpc.NewServer()

	// Register your service
	// pb.RegisterYourServiceServer(server, &yourServiceImpl{})

	// Register the standard gRPC health service
	healthServer := health.NewServer()
	healthpb.RegisterHealthServer(server, healthServer)

	// Set your service as serving; an empty service name covers all services
	healthServer.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)

	// Serve on the port declared in convox.yml
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(server.Serve(lis))
}
```
With this implementation and the appropriate configuration in your convox.yml, your gRPC service will properly report its health status to Convox, ensuring that it only receives traffic when it's ready to handle requests.
Version Requirements
- Basic health checks: All versions
- Liveness checks: All versions
- Startup probes: Version 3.19.7+
- gRPC health checks: All versions