Monitor types

Lanby supports two broad categories of monitoring: probes that actively test a service on a schedule, and keepalive heartbeats that expect your service to check in periodically.

flowchart LR
    L(Lanby)
    P(Probe monitor)
    K(Keepalive monitor)
    S(Your service)

    L -->|checks on schedule| P
    P -->|probes| S
    S -->|pings in| K
    K --> L

Monitor states

Every monitor is always in one of these states:

State	Meaning
`pending`	Newly created, no results yet. No alerts fire.
`up`	Last check passed.
`degraded`	Check passed but response was slow (above `slow_threshold_ms`), or a TLS certificate is expiring soon.
`down`	Check failed — wrong status, timeout, connection refused, etc.
`paused`	Monitoring suspended. No checks run, no alerts fire.
`unknown`	Monitor exists but no recent data. Occurs after a long offline period.

Degraded vs down: degraded means the service is reachable but something is worth flagging — high latency, an expiring certificate, or an unexpected but non-fatal response. down means the service failed the check outright. Both states trigger alerts; you can configure separate alert channels for each.
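The distinction can be sketched as a small classification function. This is a minimal illustration in Python, not Lanby's implementation: the function name and the `check_passed`, `response_ms`, and `cert_days_left` parameters are hypothetical, the "unexpected but non-fatal response" case is left out, and treating the boundary day as "expiring" is an assumption.

```python
from typing import Optional


def classify_state(
    check_passed: bool,
    response_ms: float,
    slow_threshold_ms: Optional[float] = None,
    cert_days_left: Optional[int] = None,
    cert_expiry_min_days: int = 14,
) -> str:
    """Map one check result to "up", "degraded", or "down" (illustrative only)."""
    if not check_passed:
        # Wrong status, timeout, connection refused, and so on.
        return "down"
    if slow_threshold_ms is not None and response_ms > slow_threshold_ms:
        # Reachable, but slower than the configured threshold.
        return "degraded"
    if cert_days_left is not None and cert_days_left <= cert_expiry_min_days:
        # Reachable, but the TLS certificate is close to expiry (assumed boundary).
        return "degraded"
    return "up"
```

A failed check always wins: a slow response or an expiring certificate only matters once the service has answered correctly.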

Retries and recovery

Retries: Before marking a monitor `down`, Lanby retries the failing probe up to the configured `retries` count, which keeps transient blips from firing spurious alerts. A probe must fail `retries + 1` consecutive times to transition to `down`.

Recovery: By default, a single passing check recovers a monitor from `down` back to `up`. Set `recovery_successes` to require multiple consecutive passes before recovery, which is useful for flappy services.

Recovery interval: While a monitor is `down` or `degraded`, checks run every `recovery_interval_seconds` instead of the normal `interval_seconds`. Set this lower than the normal interval to detect recovery faster.
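The three rules above interact, so a small sketch may help. The Python class below is a hypothetical model of the bookkeeping, written only from the description in this section; the class and method names are assumptions, and the `degraded` state is not produced by `record` here.

```python
from dataclasses import dataclass


@dataclass
class MonitorTracker:
    """Illustrative retry/recovery bookkeeping; not Lanby's actual code."""

    retries: int = 0
    recovery_successes: int = 1
    interval_seconds: int = 60
    recovery_interval_seconds: int = 60
    state: str = "pending"
    consecutive_failures: int = 0
    consecutive_passes: int = 0

    def record(self, passed: bool) -> str:
        """Feed in one check result and return the resulting state."""
        if passed:
            self.consecutive_failures = 0
            self.consecutive_passes += 1
            recovered = self.consecutive_passes >= self.recovery_successes
            # Leaving "down" needs recovery_successes passes in a row.
            if self.state != "down" or recovered:
                self.state = "up"
        else:
            self.consecutive_passes = 0
            self.consecutive_failures += 1
            # Go down only after retries + 1 consecutive failures.
            if self.consecutive_failures >= self.retries + 1:
                self.state = "down"
        return self.state

    def next_interval(self) -> int:
        """Use the recovery interval while the monitor is down or degraded."""
        if self.state in ("down", "degraded"):
            return self.recovery_interval_seconds
        return self.interval_seconds
```

With `retries=2` and `recovery_successes=2`, the third consecutive failure moves the monitor to `down`, and it takes two consecutive passes to bring it back to `up`.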

Probe monitors

Probe monitors run on a configured schedule and actively test a target. If the check fails (wrong status code, unreachable port, timeout), Lanby marks the monitor as `down` and fires an alert.

Probes run either from the Lanby platform, for publicly reachable services, or from a relay agent, for services on a private network.

HTTP / HTTPS

Sends an HTTP request to a URL and validates the response. The most common probe type.

Configuration

Field	Default	Description
`target`	required	Full URL including scheme. e.g. `https://mynas.local:8080/health`
`method`	`GET`	HTTP method: `GET`, `HEAD`, or `POST`
`interval_seconds`	`60`	How often to run the probe
`timeout_seconds`	`10`	Request timeout. Probe fails if no response within this time.
`retries`	`0`	Number of additional attempts before marking down
`recovery_successes`	`1`	Consecutive passes needed to recover from down
`recovery_interval_seconds`	(same as interval)	Interval to use while the monitor is down/degraded
`slow_threshold_ms`	(disabled)	Mark as `degraded` if response takes longer than this
`expected_status`	(any 2xx)	Exact HTTP status code required for success
`success_http_status_codes`	(empty)	List of acceptable HTTP status codes. Overrides `expected_status` if set.
`http_body_contains`	(disabled)	Response body must contain this substring
`follow_redirects`	`true`	Whether to follow HTTP redirects
`max_redirects`	`5`	Maximum redirects to follow
`headers`	(empty)	Map of HTTP headers to include in the request
`ignore_tls_errors`	`false`	Skip TLS certificate verification. Use only for internal services with self-signed certs.
`check_cert_expiry`	`false`	Alert when the TLS certificate is close to expiry
`cert_expiry_min_days`	`14`	Days before expiry to start alerting (requires `check_cert_expiry: true`)
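The status and body fields in the table combine in a fixed order. The Python function below is a rough sketch of that order as described here, not Lanby's code; the function name and the `status` and `body` arguments are hypothetical, and it leaves out timing, redirects, and TLS handling.

```python
from typing import Optional, Sequence


def http_check_passes(
    status: int,
    body: str,
    expected_status: Optional[int] = None,
    success_http_status_codes: Sequence[int] = (),
    http_body_contains: Optional[str] = None,
) -> bool:
    """Apply the status and body rules from the table to one response."""
    if success_http_status_codes:
        # A non-empty list overrides expected_status.
        status_ok = status in success_http_status_codes
    elif expected_status is not None:
        status_ok = status == expected_status
    else:
        # Default: any 2xx counts as success.
        status_ok = 200 <= status <= 299
    if not status_ok:
        return False
    if http_body_contains is not None and http_body_contains not in body:
        return False
    return True
```

For example, a 204 response passes with the defaults, fails with `expected_status` set to 200, and passes again once `success_http_status_codes` lists both 200 and 204.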

Examples

Basic health check:

Target: https://myapp.local/health
Method: GET
Expected status: 200
Interval: 60s
Timeout: 10s

Authenticated API endpoint:

Target: https://myapp.local/api/status
Method: GET
Headers:
  Authorization: Bearer mysecrettoken
  X-Internal: true
Expected status: 200

Body keyword match — check the app is actually up, not just returning 200 from a load balancer:

Target: https://myapp.local/
Body contains: "status":"healthy"

Self-signed certificate (common for internal services):

Target: https://192.168.1.50:8443/
Ignore TLS errors: true

Certificate expiry monitoring:

Target: https://myapp.example.com/
Check cert expiry: true
Cert expiry min days: 21

This marks the monitor as degraded 21 days before the cert expires, giving you time to renew before it goes down.
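The arithmetic behind that threshold is a date subtraction. The sketch below is a Python illustration with hypothetical function names; it assumes the expiry timestamp arrives in the `notAfter` format that Python's `ssl` module reports, and that the boundary day itself already counts as expiring.

```python
import ssl
from datetime import datetime, timezone


def cert_days_left(not_after: str, now: datetime) -> int:
    """Whole days from `now` until a certificate's notAfter timestamp.

    `not_after` looks like "Jan  1 00:00:00 2030 GMT".
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_epoch, tz=timezone.utc)
    return (expires - now).days


def cert_expiring_soon(
    not_after: str, now: datetime, cert_expiry_min_days: int = 14
) -> bool:
    # Assumption: exactly cert_expiry_min_days left already counts as expiring.
    return cert_days_left(not_after, now) <= cert_expiry_min_days
```

Passing `now` in explicitly keeps the function deterministic, which makes the threshold easy to check against known dates.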

Slow response alerting:

Target: https://myapp.local/
Slow threshold: 2000ms

Responses over 2 seconds mark the monitor degraded even if the status code is correct.

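Specific status codes, useful for endpoints whose healthy response is not a plain 200 (for example 204, or 401 from an auth-protected endpoint probed without credentials). This example accepts 200 or 204: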

Target: https://myapp.local/api/metrics
Success status codes: [200, 204]

TCP port

Attempts to open a TCP connection to a host and port. Succeeds if the connection is accepted; fails if refused or timed out. No application-layer handshake — pure connectivity.
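The same kind of check can be reproduced by hand when debugging a monitor. The following is a minimal Python sketch of a connect-only probe using the standard library; the function name is hypothetical and this is not Lanby's implementation.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port is accepted in time."""
    try:
        # create_connection performs the TCP handshake and nothing more.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Because nothing is sent after the handshake, a success only shows that something is listening on the port, not that the application behind it is healthy.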

Configuration

Field	Default	Description
`target`	required	`host:port` — e.g. `192.168.1.10:5432` or `mynas.local:22`
`interval_seconds`	`60`	How often to probe
`timeout_seconds`	`10`	Connection timeout
`retries`	`0`	Retries before marking down
`recovery_successes`	`1`	Passes needed to recover
`recovery_interval_seconds`	(same as interval)	Faster interval while down

Examples

# PostgreSQL on a private server
Target: 192.168.1.10:5432

# SSH availability
Target: mynas.local:22

# Minecraft server
Target: mc.home.arpa:25565

# Home Assistant
Target: homeassistant.local:8123

ICMP ping

Sends ICMP echo requests. The simplest reachability check — useful when no port is guaranteed to be open.

Warning

ICMP ping requires a relay. The Lanby platform runs in cloud environments that block raw ICMP. Additionally, the relay container needs NET_RAW capability — see relay docs.

Configuration

Field	Default	Description
`target`	required	Hostname or IP address. e.g. `192.168.1.1` or `router.local`
`timeout_seconds`	`10`	Wait time for ICMP reply
`interval_seconds`	`60`	How often to ping
`retries`	`0`	Retries before marking down

Examples

# Router/gateway reachability
Target: 192.168.1.1

# Network device with no open ports
Target: 192.168.1.200

# Another machine by hostname
Target: myserver.local

DNS

Resolves a DNS name and optionally validates the answer. Useful for detecting broken records, split-horizon mismatches, or unexpected changes.

Configuration

Field	Default	Description
`target`	required	Used as `dns_host` if `dns_host` is not set
`dns_host`	(target value)	The hostname to resolve
`dns_type`	`A`	Record type: `A`, `AAAA`, `CNAME`, `TXT`, `NS`
`dns_expect`	(disabled)	Substring that must appear in at least one answer record
`dns_nameserver`	(system resolver)	Query a specific nameserver instead. e.g. `192.168.1.1` or `8.8.8.8`
`interval_seconds`	`60`	How often to query
`timeout_seconds`	`5`	Query timeout
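Since `dns_expect` is a substring match against the answer records, its behaviour can be sketched in a few lines. The Python function below is an illustration only: the function name and the `answers` argument are hypothetical, and it assumes that a query returning no records fails the check.

```python
from typing import Optional, Sequence


def dns_check_passes(
    answers: Sequence[str], dns_expect: Optional[str] = None
) -> bool:
    """Decide pass/fail from the record values a DNS query returned."""
    if not answers:
        # Nothing resolved, so the check fails regardless of dns_expect.
        return False
    if dns_expect is None:
        # No expectation configured: any answer counts as success.
        return True
    # Substring match against at least one answer record.
    return any(dns_expect in answer for answer in answers)
```

One consequence of substring matching: an expectation of `192.168.1.5` would also match an answer of `192.168.1.50`, so it is safest to write the full expected value.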

Examples

Check a domain resolves at all:

DNS host: myapp.local
Type: A

Verify a specific IP is returned (e.g. your internal DNS overrides an external record):

DNS host: myapp.example.com
Type: A
Expect: 192.168.1.50

Fails if the answer doesn't contain 192.168.1.50 — useful to catch split-horizon DNS breaking.

Check your Pi-hole or AdGuard Home is resolving correctly:

DNS host: google.com
Type: A
Nameserver: 192.168.1.53

Queries your local resolver directly, bypassing the system default.

Monitor a TXT record, such as a DMARC policy (the same approach works for a DKIM selector):

DNS host: _dmarc.example.com
Type: TXT
Expect: v=DMARC1

Check CNAME is pointing at the right place:

DNS host: www.example.com
Type: CNAME
Expect: example.com

gRPC health

Calls the standard grpc.health.v1.Health/Check RPC and expects a SERVING response. Compatible with any service implementing the gRPC Health Checking Protocol.

Configuration

Field	Default	Description
`target`	required	`host:port` — e.g. `myservice.local:50051`
`grpc_service`	(empty — checks overall server health)	Sub-service name to check
`grpc_tls`	`false`	Use TLS. Set to `true` for production gRPC services.
`interval_seconds`	`60`	How often to check
`timeout_seconds`	`10`	RPC timeout
`retries`	`0`	Retries before marking down

Examples

Overall server health:

Target: myservice.local:50051
TLS: false

Specific sub-service (e.g. a gRPC service with multiple handlers):

Target: myservice.local:50051
Service: myapp.v1.UserService
TLS: true

Keepalive monitors

Keepalive monitors flip the relationship: your service checks in with Lanby after each run. If check-ins stop arriving, Lanby alerts you.
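Conceptually, a keepalive alert is a deadline check on the most recent check-in. The Python sketch below illustrates the idea only; the function name and the `expected_every_seconds` and `grace_seconds` parameters are hypothetical and are not taken from Lanby's keepalive configuration, which is covered in the keepalive documentation.

```python
from datetime import datetime, timedelta, timezone


def keepalive_overdue(
    last_check_in: datetime,
    now: datetime,
    expected_every_seconds: int,
    grace_seconds: int = 0,
) -> bool:
    """True once a check-in is later than its expected interval plus grace."""
    allowed = timedelta(seconds=expected_every_seconds + grace_seconds)
    return now > last_check_in + allowed
```

For a nightly job expected every 24 hours with a 10-minute grace period, a check-in that is 24 hours old is still on time, while one that is 24 hours and 11 minutes old is overdue.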

Full keepalive documentation →

Planned probe types

Browser / synthetic

Drives a real browser to load a page and optionally interact with it — clicking buttons, filling forms, asserting text. Catches JavaScript errors, broken logins, and issues that HTTP probes miss entirely.

Requires a relay. Powered by Playwright.

SMTP / mail server

Connects to an SMTP server and performs the initial handshake (EHLO). Confirms the mail server is listening and responding — useful for self-hosted mail setups like Mailcow, Maddy, or Postfix.

Ports 25, 465, 587. STARTTLS support planned.

UDP

Sends a UDP packet and optionally checks for a response. Useful for game servers, VPN endpoints (WireGuard, OpenVPN), and other connectionless services.

Requires a relay.

SNMP

Polls an SNMP OID and checks the returned value against a threshold or expected string. Monitor network switches, routers, NAS devices, and UPSes that speak SNMP but don't expose an HTTP API.

SNMPv1, v2c, v3. Requires a relay.

Push / webhook receiver

Receives an inbound webhook payload from an external service (Grafana alerts, GitHub Actions, Uptime Kuma) and converts it into a Lanby notification.

Info

Have a monitor type you need that isn't listed? Reach out — we build based on what self-hosters actually run.

About the name

A LANBY — Large Automatic Navigation BuoY — is a floating navigational aid designed to replace crewed lightships. It sits offshore, watches over shipping lanes, and runs without intervention.

That's what Lanby the product is: infrastructure that watches your services quietly and reliably, from the outside, so you don't have to.