I have a Grafana dashboard configured with the query below, but it constantly shows a 100% unhealthy status across most of the nodes. I'm not sure whether we are hitting the default 300 ms ping timeout. Can someone help explain why this occurs and how to triage further?
(Screenshot: 6hr metric view)
Grafana Query:
sum(increase(goldpinger_nodes_health_total{cluster="$cluster",goldpinger_instance="$instance",status="unhealthy"}[15m])) by (goldpinger_instance)
/
(
  sum(increase(goldpinger_nodes_health_total{cluster="$cluster",goldpinger_instance="$instance",status="healthy"}[15m])) by (goldpinger_instance)
  +
  sum(increase(goldpinger_nodes_health_total{cluster="$cluster",goldpinger_instance="$instance",status="unhealthy"}[15m])) by (goldpinger_instance)
)
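Before assuming a bug, it may help to chart the raw counters instead of the ratio, to see whether any healthy samples are being recorded at all. A minimal sanity-check query, using the same metric and labels as above:

sum(increase(goldpinger_nodes_health_total{cluster="$cluster",goldpinger_instance="$instance"}[15m])) by (goldpinger_instance, status)

If the healthy series stays flat at zero while unhealthy keeps climbing, every ping really is failing and the dashboard ratio is reporting correctly.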
Repeated warning message in the Goldpinger pod logs:
{"level":"warn","ts":1669303893.1442885,"caller":"goldpinger/pinger.go:151","msg":"Ping returned error","op":"pinger","name":"goldpinger","hostIP":"XX.XX.XX.XX","podIP":"XX.XX.XX.XX","responseTime":0.300629455,"error":"Get "http://XX.XX.XX.XX:8080/ping\": context deadline exceeded"}
surendarmsk1 changed the title from "prometheus metric of unhealthy node shows 100% unhealthy always" to "prometheus metric shows Node as 100% unhealthy always" on Nov 24, 2022.