Keepalive heartbeats¶
Some things can't be probed from the outside — cron jobs, backup scripts, data pipelines, batch processors. Keepalive monitors flip the relationship: your job checks in with Lanby after each successful run, and Lanby alerts you if it stops hearing back.
How it works¶
Each keepalive monitor gets a unique endpoint URL. After a successful run, your job sends a POST to that URL. Lanby records the timestamp. If the next check-in doesn't arrive within the configured interval plus a grace period, the monitor goes down and an alert fires.
There's no agent to install and no port to open. Anything that can make an HTTP request can send a heartbeat.
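The timeout rule described above can be modeled in a few lines of Python. This is a minimal sketch of the stated behavior, not Lanby's implementation; the function name `is_overdue` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def is_overdue(last_beat: datetime, interval: timedelta,
               grace: timedelta, now: datetime) -> bool:
    """Return True once the check-in deadline has passed without a beat."""
    deadline = last_beat + interval + grace
    return now > deadline

# Hourly job with a 10-minute grace period, last seen at 12:00 UTC.
last = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
hourly, grace = timedelta(hours=1), timedelta(minutes=10)

print(is_overdue(last, hourly, grace, now=last + timedelta(minutes=65)))  # False
print(is_overdue(last, hourly, grace, now=last + timedelta(minutes=71)))  # True
```

A beat that is five minutes late still falls inside the grace period; one that is eleven minutes late does not.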
Info
Keepalive monitors detect missed runs, not bad output. If your backup job completes but produces corrupt data, that's a separate concern. Use them alongside output validation in your scripts, not instead of it.
Setup¶
- In the console, go to Monitors and create a new monitor.
- Choose Keepalive heartbeat as the monitor type.
- Set the expected interval and grace period.
- Save. The console shows the unique heartbeat URL and your API key.
- Add the heartbeat call to the end of your job.
The monitor starts in a pending state until the first heartbeat arrives. No alerts fire during this initial window.
The endpoint¶
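Send a POST to https://in.lanby.dev/beat/<monitor-id> with your API key as a bearer token in the Authorization header. By default, only the arrival time matters. A 200 OK response confirms receipt.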
The request body is optional. When provided, it can carry a self-reported status — see Reporting job status.
Tip
Store your API key in an environment variable (LANBY_API_KEY) rather than hardcoding it in scripts. The examples below use this pattern.
Sending a heartbeat¶
curl¶
curl -s -X POST https://in.lanby.dev/beat/<monitor-id> \
-H "Authorization: Bearer ${LANBY_API_KEY}" \
--max-time 5
wget¶
Useful on minimal systems where curl isn't available:
wget -q -O /dev/null --post-data="" \
--header="Authorization: Bearer ${LANBY_API_KEY}" \
--timeout=5 \
https://in.lanby.dev/beat/<monitor-id>
Shell / cron — success only¶
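The && ensures the heartbeat only fires if the job exits successfully. Cron does not support backslash line continuation, so the whole command has to sit on one line. Cron also does not load your shell profile, so LANBY_API_KEY must be defined in the crontab's environment or exported by a wrapper script:
# /etc/cron.d/backup
0 3 * * * root /opt/scripts/backup.sh && { curl -s -X POST https://in.lanby.dev/beat/<monitor-id> -H "Authorization: Bearer ${LANBY_API_KEY}" --max-time 5 || true; }
The braces limit || true to the curl call. A Lanby outage can't make cron report the job as failed, while a failing backup.sh still exits non-zero.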
systemd timer¶
For systemd timers, use ExecStartPost — it only runs if ExecStart exits zero:
# /etc/systemd/system/backup.service
[Unit]
Description=Nightly backup
[Service]
Type=oneshot
EnvironmentFile=/etc/lanby.env
ExecStart=/opt/scripts/backup.sh
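# The leading "-" makes systemd ignore a failed ping, so a Lanby outage
# does not mark the unit as failed. Adjust the curl path for your system.
ExecStartPost=-/usr/bin/curl -s -X POST https://in.lanby.dev/beat/<monitor-id> \
    -H "Authorization: Bearer ${LANBY_API_KEY}" \
    --max-time 5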
Python¶
import os
import requests
API_KEY = os.environ["LANBY_API_KEY"]
MONITOR_ID = "<monitor-id>"
def run_backup():
    # ... your job logic ...
    pass
run_backup()
# Notify Lanby, but don't let a Lanby outage fail the job
try:
    requests.post(
        f"https://in.lanby.dev/beat/{MONITOR_ID}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=5,
    )
except requests.RequestException:
    # Covers connection errors and timeouts; the job itself already succeeded
    pass
Go¶
package main
import (
    "fmt"
    "net/http"
    "os"
    "time"
)
// runBackup stands in for your job logic.
func runBackup() {}
// ping sends a best-effort heartbeat and swallows any error.
func ping() {
    monitorID := "<monitor-id>"
    apiKey := os.Getenv("LANBY_API_KEY")
    client := &http.Client{Timeout: 5 * time.Second}
    req, err := http.NewRequest(http.MethodPost,
        fmt.Sprintf("https://in.lanby.dev/beat/%s", monitorID), nil)
    if err != nil {
        return // malformed URL, nothing to send
    }
    req.Header.Set("Authorization", "Bearer "+apiKey)
    resp, err := client.Do(req)
    if err == nil {
        resp.Body.Close()
    }
}
func main() {
    runBackup()
    ping() // best-effort: a Lanby outage must not fail the job
}
Node.js¶
// Requires Node.js 18+ (global fetch) and an ES module (top-level await)
const MONITOR_ID = '<monitor-id>';
const API_KEY = process.env.LANBY_API_KEY;
async function runBackup() {
  // ... your job logic ...
}
async function ping() {
  try {
    await fetch(`https://in.lanby.dev/beat/${MONITOR_ID}`, {
      method: 'POST',
      headers: { Authorization: `Bearer ${API_KEY}` },
      signal: AbortSignal.timeout(5000),
    });
  } catch (_) {
    // best-effort
  }
}
await runBackup();
await ping();
Docker healthcheck¶
Use keepalives to verify a long-running container is functioning, not just running. Add the ping to your container's internal health logic:
HEALTHCHECK --interval=5m --timeout=10s --start-period=30s \
CMD curl -sf -X POST https://in.lanby.dev/beat/<monitor-id> \
-H "Authorization: Bearer ${LANBY_API_KEY}" || exit 1
Or ping from a sidecar script in the container at the end of each work cycle.
Ansible¶
At the end of a playbook that should complete on schedule:
- name: Notify Lanby on success
  uri:
    url: "https://in.lanby.dev/beat/{{ monitor_id }}"
    method: POST
    headers:
      Authorization: "Bearer {{ lanby_api_key }}"
    timeout: 5
    status_code: 200
  ignore_errors: true  # don't fail the playbook if Lanby is unreachable
Sonarr / Radarr / *arr apps¶
The *arr apps support custom scripts on events. Create a script that pings Lanby after a successful import or health check:
#!/bin/bash
# Called by Sonarr on "On Import" event
curl -s -X POST https://in.lanby.dev/beat/<monitor-id> \
-H "Authorization: Bearer ${LANBY_API_KEY}" \
--max-time 5 || true
Set the interval in Lanby to match how often you expect new content to import (e.g. 24h with a 2-hour grace period).
Reporting job status¶
The heartbeat body is optional. Send a small JSON payload to tell Lanby the job ran but didn't fully succeed — useful for jobs that should page you even when they completed, just not cleanly.
curl -s -X POST https://in.lanby.dev/beat/<monitor-id> \
-H "Authorization: Bearer ${LANBY_API_KEY}" \
-H "Content-Type: application/json" \
-d '{"status": "degraded", "message": "2 of 47 files skipped"}' \
--max-time 5
Accepted values¶
| status | Resulting state |
|---|---|
| (absent), "ok", or 0 | up |
| "degraded" or 1 | degraded |
| "failed", "fail", "error", "down", or 2 | down |
The optional message field is surfaced in the console and passed through to destination templates as {{message}}.
When to use it¶
- Partial success. A backup that completed but skipped files. A sync that ran but missed some records.
- Soft failure. A build that succeeded but a downstream step timed out. Your job still considers itself "done" but wants a warning.
- Self-reported errors. The job caught its own exception, ran the cleanup, and wants to page you without failing the process.
If your script exits non-zero, you generally don't want to send a heartbeat at all — Lanby will detect the missed window and alert. Status payloads are for the case where the process completed but the outcome wasn't clean.
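As an illustration of the cases above, a job could derive its heartbeat body from its own counters. This is a sketch only: the helper name `build_heartbeat_body` and the counters `total`, `skipped`, and `errors` are hypothetical, and only the `status` and `message` fields come from the documentation above.

```python
import json

def build_heartbeat_body(total: int, skipped: int, errors: int = 0) -> str:
    """Build a JSON heartbeat body from a job's own counters (hypothetical helper)."""
    if errors:
        status, message = "failed", f"{errors} errors while processing {total} files"
    elif skipped:
        status, message = "degraded", f"{skipped} of {total} files skipped"
    else:
        status, message = "ok", f"{total} files processed"
    return json.dumps({"status": status, "message": message})

print(build_heartbeat_body(total=47, skipped=2))
# {"status": "degraded", "message": "2 of 47 files skipped"}
```

The returned string would be sent as the POST body with a Content-Type of application/json, as in the curl example earlier in this section.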
Maintenance windows¶
A maintenance window suppresses alerts and stops the monitor from timing out during planned outages — nightly reboots, scheduled downtime, OS patching. Configure them per-monitor under Maintenance on the monitor's settings page.
Two types are supported:
| Type | Example |
|---|---|
| Recurring | 0 2 * * 0 with duration 2h — every Sunday at 02:00 local for two hours |
| One-off | From 2026-05-12 22:00 to 2026-05-13 02:00 |
Both types accept an optional reason (e.g. "Weekly reboot") that's shown in the UI and included in timeline entries.
During a maintenance window:
- Missed beats do not trigger alerts.
- The monitor does not transition to down.
- The next expected beat time is pushed forward so the window ends cleanly.
- Beats that do arrive are still recorded, so you retain visibility.
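The alert-suppression rule can be sketched for the one-off window from the table above. This is an illustrative model, not Lanby's code. The names `in_window` and `alert_allowed` are hypothetical, treating the end time as exclusive is an assumption, and the push-forward of the next expected beat is not modeled.

```python
from datetime import datetime

def in_window(now: datetime, start: datetime, end: datetime) -> bool:
    """True while `now` falls inside a one-off maintenance window."""
    return start <= now < end  # exclusive end is an assumption

def alert_allowed(deadline_passed: bool, now: datetime,
                  start: datetime, end: datetime) -> bool:
    """A missed beat may alert only when it is detected outside the window."""
    return deadline_passed and not in_window(now, start, end)

start = datetime(2026, 5, 12, 22, 0)
end = datetime(2026, 5, 13, 2, 0)

print(alert_allowed(True, datetime(2026, 5, 12, 23, 30), start, end))  # False
print(alert_allowed(True, datetime(2026, 5, 13, 2, 15), start, end))   # True
```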
Info
Maintenance windows only apply to keepalive monitors. To suspend a probe monitor, use pause.
Pausing a monitor¶
For ad-hoc suspension — debugging, decommissioning, a one-off change — use Pause instead of a maintenance window. Two forms:
- Indefinite pause — monitor stays paused until manually resumed.
- Pause until — monitor auto-resumes at a specific time.
While paused, the monitor does not time out, fire alerts, or change state. You can record a reason that's shown in the UI.
Resuming a monitor re-opens the next expected window from the current time — previous missed windows are not backfilled.
Configuration¶
Schedule¶
A keepalive monitor has to know when to expect the next beat. Three scheduling modes are available:
| Mode | When to use |
|---|---|
| Interval | Jobs that run every N seconds/minutes/hours with no fixed wall-clock time. Each expected beat is timed from the previous one. |
| Simple | Jobs that run on a daily, weekly, or monthly cadence at a specific time-of-day (e.g. "3:30 AM daily"). |
| Cron | Jobs driven by a standard 5-field cron expression (e.g. 0 */6 * * *) — matches your crontab exactly. |
Tip
Set the timezone on wall-clock schedules so daylight-saving transitions don't cause a false alert. Interval-mode schedules are timezone-independent.
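To make the difference between the modes concrete, here is a rough sketch of how the next expected beat could be derived in Interval mode and in a daily Simple schedule. The function names are hypothetical, the example uses UTC so daylight-saving shifts do not arise, and Cron mode is left out because the Python standard library has no cron parser.

```python
from datetime import datetime, time, timedelta, timezone

def next_interval(last_beat: datetime, interval: timedelta) -> datetime:
    """Interval mode: the next beat is expected one interval after the last."""
    return last_beat + interval

def next_daily(after: datetime, at: time) -> datetime:
    """Simple mode (daily): the next occurrence of a wall-clock time."""
    candidate = after.replace(hour=at.hour, minute=at.minute,
                              second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate

last = datetime(2026, 1, 1, 3, 42, tzinfo=timezone.utc)
print(next_interval(last, timedelta(hours=6)))  # 2026-01-01 09:42:00+00:00
print(next_daily(last, time(3, 30)))            # 2026-01-02 03:30:00+00:00
```

In Interval mode a late beat shifts every later expectation, while the daily schedule stays pinned to 03:30 regardless of when the last beat arrived.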
Grace window¶
Grace absorbs normal runtime variance so a slightly-late or slightly-early beat doesn't flap the monitor. The window is two-sided:
| Field | Description |
|---|---|
| Grace before | How early a beat may arrive before the expected time and still count as on-schedule. |
| Grace after | How late a beat may arrive after the expected time before the monitor transitions to down. |
The monitor stays up as long as each beat arrives within the window. If grace after elapses with no beat, the monitor transitions to down and an alert fires. Recovery is immediate on the next beat.
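The two-sided window can be expressed as a simple classification. This is a minimal sketch under the assumption that both window edges are inclusive; `classify_beat` is a hypothetical name, not part of Lanby.

```python
from datetime import datetime, timedelta

def classify_beat(arrival: datetime, expected: datetime,
                  grace_before: timedelta, grace_after: timedelta) -> str:
    """Place a beat relative to the two-sided grace window."""
    if arrival < expected - grace_before:
        return "early"
    if arrival > expected + grace_after:
        return "late"
    return "on-schedule"

expected = datetime(2026, 1, 1, 3, 0)
before, after = timedelta(minutes=5), timedelta(minutes=30)

print(classify_beat(datetime(2026, 1, 1, 2, 50), expected, before, after))  # early
print(classify_beat(datetime(2026, 1, 1, 3, 20), expected, before, after))  # on-schedule
print(classify_beat(datetime(2026, 1, 1, 3, 45), expected, before, after))  # late
```

A "late" beat arrives after the monitor has already gone down, and per the text above it brings the monitor back up immediately.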
Choosing grace-after¶
| Job frequency | Suggested grace-after |
|---|---|
| Every minute | 2–5 minutes |
| Hourly | 10–15 minutes |
| Daily | 30–60 minutes |
| Weekly | 4–12 hours |
Set it high enough to survive occasional slow runs, but low enough that you'd still want to know if the job is stuck.
Early beats¶
What happens when a beat arrives before the grace-before window opens?
| Behavior | Effect |
|---|---|
| record (default) | Beat is saved to history; timer is not reset; state is unchanged. |
| ignore | Beat is discarded entirely; no history entry; no state change. |
| degraded | Beat is saved; monitor is marked degraded (a job that's running too often may indicate a stuck loop). |
The degraded option is useful when you explicitly want to catch unintended double-runs — e.g. a cron job that's been manually triggered on top of its normal schedule.
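The three behaviors can be summarized as a small dispatch function. This sketch only restates the table above in code; `handle_early_beat` and its return shape are hypothetical.

```python
def handle_early_beat(behavior: str, current_state: str) -> tuple[bool, str]:
    """Return (saved_to_history, resulting_state) for a too-early beat."""
    if behavior == "record":
        return True, current_state      # saved; timer and state untouched
    if behavior == "ignore":
        return False, current_state     # dropped; nothing changes
    if behavior == "degraded":
        return True, "degraded"         # saved; flagged as running too often
    raise ValueError(f"unknown early-beat behavior: {behavior!r}")

for behavior in ("record", "ignore", "degraded"):
    print(behavior, handle_early_beat(behavior, "up"))
```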
Anchor time¶
By default, a new keepalive monitor stays in pending until the first beat arrives. No alerts fire during this initial window. If you'd rather have the first window open at a specific wall-clock time — e.g. you know the next cron run is at 03:00 tomorrow — set an anchor time. The monitor will time out at anchor + interval + grace-after if no beat arrives before then.
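As a worked example of the anchor formula, take a daily job whose next run is at 03:00 tomorrow, with an assumed grace-after of 30 minutes. The dates below are illustrative.

```python
from datetime import datetime, timedelta

anchor = datetime(2026, 1, 2, 3, 0)      # next cron run: 03:00 tomorrow
interval = timedelta(days=1)             # daily job
grace_after = timedelta(minutes=30)      # assumed grace-after

# Timeout per the formula above: anchor + interval + grace-after
first_timeout = anchor + interval + grace_after
print(first_timeout)  # 2026-01-03 03:30:00
```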
Re-alert on sustained outage¶
When a keepalive goes down, a single alert fires on the transition. If the job stays broken for days, that one alert is easy to miss. Set Re-alert after missed to N to fire a repeat alert after every N consecutive missed windows.
With a daily schedule and re-alert-after = 3, you get an alert on day 1 (when it first went down), then again on day 4, day 7, and so on — until the job recovers.
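The cadence in that example can be reproduced with a short predicate. This is a sketch of the arithmetic only; `should_realert` is a hypothetical name, and the count is assumed to start at 1 for the window that took the monitor down.

```python
def should_realert(missed_windows: int, realert_after: int) -> bool:
    """True on the first missed window and every `realert_after` windows after it."""
    if missed_windows < 1:
        return False
    return (missed_windows - 1) % realert_after == 0

alert_days = [day for day in range(1, 11) if should_realert(day, 3)]
print(alert_days)  # [1, 4, 7, 10]
```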