Health Check
The health check provides a single approach to checking the health status of the OAP server. It covers the health status of modules and the readiness of the GraphQL and gRPC services.
The health status is reported as a score: 0 means healthy, a value greater than 0 means unhealthy, and a value less than 0 means that the OAP has not started up.
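The score convention can be captured in a small shell helper. This is a minimal sketch; the function name `interpret_score` and the labels it prints are illustrative and not part of OAP.

```shell
# Map an integer health score to a human-readable label.
# Hypothetical helper for illustration; not shipped with OAP.
interpret_score() {
  score="$1"
  if [ "$score" -eq 0 ]; then
    echo "healthy"
  elif [ "$score" -gt 0 ]; then
    echo "unhealthy"
  else
    # A negative score means the OAP has not started up.
    echo "not started"
  fi
}
```

For example, `interpret_score 1` prints `unhealthy`.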
Health Checker Module
The Health Checker module helps observe the health status of modules. You may activate it as follows:
health-checker:
  selector: ${SW_HEALTH_CHECKER:default}
  default:
    checkIntervalSeconds: ${SW_HEALTH_CHECKER_INTERVAL_SECONDS:5}
Note: The telemetry module must be enabled at the same time. This means that its provider must not be - or none.
After that, we can check the OAP server health status by querying the HTTP endpoint /healthcheck;
see the health check HTTP endpoint doc.
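A minimal sketch of such a probe with curl follows. It assumes that /healthcheck is served on the same host and port as the GraphQL endpoint used elsewhere on this page (OAP:12800) and that an unhealthy server answers with an HTTP error status. Both are assumptions to verify against the endpoint doc. The function name `check_http_health` is illustrative.

```shell
# Probe the OAP HTTP health endpoint (sketch; host, port and status-code
# behavior are assumptions, see the health check HTTP endpoint doc).
# Returns 0 when the server answers with a non-error status (below 400).
check_http_health() {
  url="${1:-http://OAP:12800/healthcheck}"
  # -f: exit non-zero on HTTP status >= 400
  # -sS: no progress output, but still print errors
  # --max-time: give up instead of hanging on an unreachable server
  curl -fsS --max-time 5 -o /dev/null "$url"
}
```

Usage: `check_http_health && echo "OAP healthy"`.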
You can also query the health status through other methods such as GraphQL, as follows:
query {
  checkHealth {
    score
    details
  }
}
If the OAP server is healthy, the response should be:
{
  "data": {
    "checkHealth": {
      "score": 0,
      "details": ""
    }
  }
}
If some modules are unhealthy (e.g. storage H2 is down), then the result may look as follows:
{
  "data": {
    "checkHealth": {
      "score": 1,
      "details": "storage_h2,"
    }
  }
}
Refer to the checkHealth query for more details.
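The details value in the example above ends with a trailing comma, which suggests a comma-separated list of unhealthy module names. Under that assumption (the exact format is not specified here), a small helper can split it into one name per line. The function name `list_unhealthy_modules` is hypothetical.

```shell
# Split a checkHealth "details" string into one module name per line.
# Assumes a comma-separated list with an optional trailing comma,
# as in the "storage_h2," example; this format is an assumption.
list_unhealthy_modules() {
  # Turn commas into newlines, then drop the empty lines that a
  # trailing comma (or an empty string) would leave behind.
  printf '%s' "$1" | tr ',' '\n' | sed '/^$/d'
}
```

For example, `list_unhealthy_modules "storage_h2,"` prints `storage_h2`, and an empty details string prints nothing.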
The readiness of GraphQL and gRPC
Use the query above to check the readiness of GraphQL.
OAP implements the gRPC Health Checking Protocol. You may use grpc-health-probe or any other tool to check the health of the OAP gRPC services.
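As a rough sketch, a grpc-health-probe call can be wrapped as shown below. It assumes the probe binary is installed on the PATH as `grpc_health_probe` and that the OAP gRPC services listen on port 11800; adjust both to your deployment. The wrapper name `check_grpc_health` is illustrative.

```shell
# Check the OAP gRPC services with grpc-health-probe (sketch).
# Assumptions: the binary is on PATH as grpc_health_probe, and the
# OAP gRPC port is 11800 unless another address is passed in.
check_grpc_health() {
  addr="${1:-OAP:11800}"
  # Exits 0 when the server reports SERVING, non-zero otherwise.
  grpc_health_probe -addr="$addr"
}
```

Usage: `check_grpc_health OAP:11800 || echo "gRPC not ready"`.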
CLI tool
The swctl CLI ships a health subcommand that runs the GraphQL checkHealth
query (and, by default, the gRPC HealthCheck service) and exits with a
non-zero status when the OAP is unhealthy.
# Plain gRPC
swctl --base-url=http://OAP:12800/graphql health
# OAP gRPC with TLS (cert verification is intentionally skipped)
swctl --base-url=http://OAP:12800/graphql health --grpcTLS=true
Reading the response
A healthy OAP returns the same response with score 0 shown in the GraphQL
section above, and the process exits 0. A failing run prints the GraphQL /
gRPC error and exits non-zero, which makes it straightforward to wire into a
shell readiness loop:
if swctl --base-url=http://OAP:12800/graphql health >/dev/null 2>&1; then
  echo "OAP healthy"
else
  echo "OAP not healthy"
  exit 1
fi
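Because the OAP may still be starting when the first check runs, a single check is often wrapped in a bounded retry. The sketch below is one way to do that. The function name `wait_until_healthy` and its arguments are illustrative, and the health command to run is passed in by the caller.

```shell
# Retry a health command until it succeeds or the attempts run out.
# Usage: wait_until_healthy ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper for illustration; not part of swctl.
wait_until_healthy() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the supplied command; success means healthy.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage: `wait_until_healthy 30 5 swctl --base-url=http://OAP:12800/graphql health` tries up to 30 times with 5 seconds between attempts.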