Healthchecks.io can monitor your cron jobs and notify you when they don't run at expected times. Assuming curl or wget is available, you will not need to install any new software on your servers.
The principle of operation is simple: your cron job sends an HTTP request ("ping") to Healthchecks.io every time it completes. When Healthchecks.io does not receive the HTTP request at the expected time, it notifies you. This monitoring technique, sometimes called "heartbeat monitoring", is a type of dead man's switch. It can detect various failure modes:
Let's take a look at an example cron job:
# run backup.sh at 06:08 every day
8 6 * * * /home/me/backup.sh
To monitor it, first create a new Check in your Healthchecks.io account:
After creating the check, copy the generated ping URL, and update the job's definition:
# run backup.sh, then send a success signal to Healthchecks.io
8 6 * * * /home/me/backup.sh && curl -fsS -m 10 --retry 5 -o /dev/null https://hc-ping.com/your-uuid-here
The extra curl call lets Healthchecks.io know the cron job has run successfully. Healthchecks.io keeps track of the received pings and notifies you as soon as a ping does not arrive on time.
Note: you can alternatively add the extra curl call as a final line inside the /home/me/backup.sh script, to keep the cron job's definition clean and short.
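As a rough illustration of that note, the script could be organized along these lines. This is only a sketch: run_backup, send_ping and main are hypothetical names, the body of run_backup is a stand-in for the real backup commands, and the ping URL is the placeholder from the crontab example.

```shell
#!/bin/sh
# Sketch of /home/me/backup.sh with the ping as the final step.
# PING_URL is the placeholder from the crontab example; substitute your own.
PING_URL="https://hc-ping.com/your-uuid-here"

# Hypothetical stand-in for the real backup commands.
run_backup() {
    echo "backing up..."
}

# Send the success signal, using the same curl options as the crontab example.
send_ping() {
    curl -fsS -m 10 --retry 5 -o /dev/null "$PING_URL"
}

# Run the backup, and send the ping only if the backup succeeded.
main() {
    run_backup && send_ping
}
```

In the real script you would end the file with a call to main, and the crontab entry would shrink back to 8 6 * * * /home/me/backup.sh.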
You can use an HTTP client other than curl to send the HTTP request.
The extra options in the above example tell curl to retry failed HTTP requests, limit the maximum execution time, and silence output unless there is an error. Feel free to adjust the curl options to suit your needs.
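To make the options easier to read, the same request can be written with curl's long option names. The sketch below also shows a roughly equivalent wget call, assuming GNU wget (minimal implementations such as BusyBox wget may not accept these options). The function names ping_with_curl and ping_with_wget are made up for this illustration.

```shell
# Same request as in the crontab example, using curl's long option names:
#   --fail                  exit with an error on HTTP error responses
#   --silent --show-error   no progress output, but still print errors
#   --max-time 10           allow each transfer at most 10 seconds
#   --retry 5               retry up to 5 times on transient errors
#   --output /dev/null      discard the response body
ping_with_curl() {
    curl --fail --silent --show-error --max-time 10 --retry 5 \
        --output /dev/null "$1"
}

# A roughly equivalent request using GNU wget (assumed to be installed).
ping_with_wget() {
    wget --quiet --timeout=10 --tries=5 --output-document=/dev/null "$1"
}
```

Either function takes the ping URL as its only argument, for example ping_with_curl https://hc-ping.com/your-uuid-here.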
Because of the && operator, the curl call only runs if /home/me/backup.sh exits with an exit code 0.
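The effect of the && operator in the job definition can be tried out without touching the network. In this small demo, job_ok, job_fail, fake_ping and run_then_ping are all hypothetical stand-ins: the first two play the role of backup.sh, and fake_ping replaces the curl call.

```shell
# Stand-ins for the real job: one exits with code 0, the other with code 1.
job_ok()   { return 0; }
job_fail() { return 1; }

# Stand-in for the curl call, so the demo makes no HTTP requests.
fake_ping() { echo "ping sent"; }

# Run the given job, then ping only if the job exited with code 0.
run_then_ping() {
    "$1" && fake_ping
}

run_then_ping job_ok                                        # prints "ping sent"
run_then_ping job_fail || echo "job failed, ping skipped"   # no ping is sent
```

When the job fails, no ping is sent, so from the point of view of Healthchecks.io the expected ping simply never arrives.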
Grace Time is the amount of extra time to wait when a cron job is running late before declaring it as down. Set Grace Time to be above the expected duration of your cron job.
For example, let's say the cron job starts at 14:00 every day, and takes between 15 and 25 minutes to complete. The grace time is set to 30 minutes. In this scenario, Healthchecks.io will expect a ping to arrive at 14:00, but will not send any alerts yet. If there is no ping by 14:30, it will declare the job failed and send alerts.
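The arithmetic in this example can be sketched as a small shell function. It only models the scenario described above (expected ping time plus grace time) and is not a description of how Healthchecks.io computes deadlines internally; alert_deadline is a hypothetical helper name.

```shell
# Print the time (HH:MM) at which a check would be declared down,
# given the expected ping time and the grace time in minutes.
alert_deadline() {
    expected=$1      # expected ping time, HH:MM
    grace_min=$2     # grace time in minutes
    hh=${expected%%:*}
    mm=${expected##*:}
    # Strip one leading zero so that values like "08" are not read as octal.
    hh=${hh#0}
    mm=${mm#0}
    # Minutes since midnight, wrapping around at 24 hours (1440 minutes).
    total=$(( (hh * 60 + mm + grace_min) % 1440 ))
    printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

alert_deadline 14:00 30   # prints 14:30
```

With a job that takes 15 to 25 minutes, a ping normally arrives between 14:15 and 14:25, comfortably before the 14:30 deadline.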
Healthchecks.io has integrations to deliver notifications over different channels: email, webhooks, SMS, chat messages, incident management systems, and more. You can and should set up multiple ways to get notified about job failures:
Additionally, to make sure no issues "slip through the cracks", in the Account Settings › Email Reports page you can configure Healthchecks.io to send repeated email notifications every hour or every day as long as any of the jobs is down:
Classic cron implementations have a built-in method of notifying about cron job failures, the MAILTO variable:
MAILTO=firstname.lastname@example.org
8 6 * * * /home/me/backup.sh
So why not just use that? There are several drawbacks:
If your cron job consistently pings Healthchecks.io an hour early or an hour late, the likely cause is a timezone mismatch: your machine may be using a timezone different from what you have configured on Healthchecks.io.
On modern GNU/Linux systems, you can look up the time zone by running the timedatectl status command and looking for "Time zone" in its output:
$ timedatectl status
               Local time: C 2020-01-23 12:35:50 EET
           Universal time: C 2020-01-23 10:35:50 UTC
                 RTC time: C 2020-01-23 10:35:50
                Time zone: Europe/Riga (EET, +0200)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
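If timedatectl is not available, the date command gives a quick way to check the same thing. This is a minimal sketch, assuming a date implementation that supports the %z format (GNU and BSD date do); local_timezone is a hypothetical helper name.

```shell
# Print this machine's time zone abbreviation and UTC offset,
# for example "EET +0200" on the system shown above.
local_timezone() {
    date '+%Z %z'
}

local_timezone
```

Compare the printed offset with the timezone configured for the check on Healthchecks.io.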
On a systemd-based system, you can use the journalctl utility to see system logs, including logs from the cron daemon.
To see live logs:
journalctl -f
To see the logs from e.g. the last hour, and only from the cron daemon:
journalctl --since "1 hour ago" -t CRON