AlertmanagerFailedReload #
Meaning #
The alert AlertmanagerFailedReload
is triggered when the Alertmanager instance
for the cluster monitoring stack has consistently failed to reload its
configuration for a certain period.
Impact #
The impact depends on the type of the error you will find in the logs. Most of the time, previous configuration is still working, thanks to multiple instances, so avoid deleting existing pods.
Diagnosis #
Verify if there is an error in config-reloader
container logs.
Here an example with network issues.
$ kubectl -n kubesphere-monitoring-system logs sts/alertmanager-main -c config-reloader
level=error ts=2021-09-24T11:24:52.69629226Z caller=runutil.go:101 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9093/alertmanager/-/reload\": dial tcp [::1]:9093: connect: connection refused"
You can also verify directly alertmanager.yaml
file (default: /etc/alertmanager/config/alertmanager.yaml
).
$ kubectl -n kubesphere-monitoring-system exec alertmanager-main-0 -c alertmanager -- amtool check-config /etc/alertmanager/config/alertmanager.yaml
Mitigation #
Running amtool check-config alertmanager.yaml
on your configuration file will help you detect problem related to syntax.
You could also rollback alertmanager.yaml
to the previous version in order
to get back to a stable version.