Monitoring and Alerting

Proactive monitoring helps keep your WiFi authentication infrastructure operational. This guide covers IronWiFi's built-in monitoring, external health checks, real-time alerting, and threshold configuration so you can catch problems before users report them.

Monitoring Overview

A comprehensive monitoring strategy for IronWiFi includes:

Layer	What to Monitor	How
RADIUS service	Authentication success/failure rates	IronWiFi Console Reports
Endpoint health	RADIUS server reachability	External monitoring (UptimeRobot, Pingdom)
Session metrics	Active sessions, bandwidth usage	IronWiFi Console + API
Infrastructure	Access point connectivity	Your network management tools

IronWiFi Console Monitoring

Real-Time Authentication Status

The IronWiFi Console provides real-time visibility into authentication activity:

Navigate to Reports > Authentication
View recent authentication attempts with results (Accept/Reject)
Identify patterns such as sudden spikes in rejections

Network Status Indicators

Each Network in the IronWiFi Console displays a status indicator:

Green -- RADIUS servers healthy and responding
Red -- Potential issues detected

Check the Networks page regularly, and again after any infrastructure change, to confirm all Networks show a healthy status.

Session Monitoring

Monitor active user sessions:

Navigate to Reports > Sessions
Review currently active sessions (no Stop Time recorded)
Monitor session counts for unusual spikes or drops

A sudden drop in active sessions may indicate an access point outage or network issue, even if IronWiFi's RADIUS servers are healthy.
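As a rough illustration of the "no Stop Time recorded" rule, the sketch below counts active sessions from a list of session records. The `stop_time` field name and the record shape are assumptions made for this example, not a confirmed IronWiFi export or API schema.

```python
def count_active_sessions(sessions):
    """Count sessions that have no stop time recorded.

    Each session is assumed to be a dict; "stop_time" is a hypothetical
    field name and may differ in real report or API data.
    """
    return sum(1 for s in sessions if not s.get("stop_time"))


sample = [
    {"user": "alice", "stop_time": None},                    # still active
    {"user": "bob", "stop_time": "2024-01-15T10:00:00Z"},    # ended
    {"user": "carol"},                                       # no stop time at all
]
active = count_active_sessions(sample)
```

Recording this count at a fixed interval gives you the baseline needed to notice unusual spikes or drops.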

External Health Checks

External monitoring independently verifies that IronWiFi RADIUS servers respond to authentication requests. This catches issues that internal monitoring alone might miss, such as network path problems between your access points and IronWiFi.

Setting Up External RADIUS Monitoring

IronWiFi provides an authentication testing tool that external monitors can call:

Navigate to ironwifi.com/authentication-testing-tool
Enter your RADIUS server details (IP, port, shared secret)
Enter test user credentials
Submit the test and confirm the result is Access-Accept
Copy the resulting URL -- this is your monitoring endpoint

Configure your external monitoring service to call this URL at regular intervals and verify that the response contains Access-Accept.

For detailed setup instructions with UptimeRobot, see the Service Monitor guide.
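If you prefer to run the check yourself instead of using a hosted monitor, the logic can be sketched as below. This is a minimal sketch: `MONITOR_URL` is a placeholder for the URL you copied from the testing tool, and the function names are illustrative.

```python
import urllib.request

# Placeholder: replace with the URL copied from the authentication testing tool.
MONITOR_URL = "https://example.com/your-copied-test-url"


def response_is_healthy(body: str) -> bool:
    """Return True when the test response reports a successful authentication."""
    return "Access-Accept" in body


def run_check(url: str = MONITOR_URL, timeout: float = 10.0) -> bool:
    """Fetch the monitoring URL and report whether authentication succeeded."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError, and timeouts are all OSError subclasses;
        # any of them counts as a failed check.
        return False
    return response_is_healthy(body)
```

Treating network errors and timeouts as failures means a broken network path is reported the same way as a rejected authentication.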

Recommended Health Check Intervals

Monitor Type	Interval	Rationale
RADIUS authentication test	5 minutes	Balances detection speed with test load
Captive portal HTTP check	5 minutes	Verifies splash page availability
IronWiFi status page	Subscribe to alerts	Immediate notification of platform incidents

What to Monitor

Set up separate monitors for each component:

Primary RADIUS server -- Authentication test against primary IP
Backup RADIUS server -- Authentication test against backup IP
Captive portal -- HTTP check on your splash page URL (if using captive portals)
IronWiFi status page -- Subscribe to status.ironwifi.com for platform-level alerts

Tip: Monitor both primary and backup RADIUS servers independently. If only the backup fails, authentication still works but you have no redundancy -- a critical gap that should trigger an alert.
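One way to express this is to combine the two independent check results into a single redundancy state. The state names below are illustrative choices for this sketch, not IronWiFi terminology.

```python
def redundancy_state(primary_ok: bool, backup_ok: bool) -> str:
    """Combine two independent RADIUS check results into one state."""
    if primary_ok and backup_ok:
        return "healthy"
    if primary_ok or backup_ok:
        # Authentication still works, but one more failure causes an outage.
        return "degraded"
    return "outage"


# Backup down, primary up: users are unaffected, yet this should still alert.
state = redundancy_state(primary_ok=True, backup_ok=False)
```

Alerting on "degraded" as well as "outage" is what surfaces the lost-redundancy case before it affects users.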

Configuring Alerts

Alert Channels

Configure multiple alert channels to ensure notifications reach the right people:

Channel	Best For	Setup
Email	General notifications, audit trail	Add team email addresses
SMS	Critical alerts requiring immediate attention	Add phone numbers (may require paid monitoring plan)
Slack/Teams	Team-wide visibility, incident coordination	Configure webhook integration
PagerDuty/Opsgenie	On-call rotation, escalation	Configure API integration
Webhook	Custom integrations, ticketing systems	Provide endpoint URL
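A custom alerting script could route alerts to channels by severity, as in the sketch below. The severity names and the routing table are assumptions for illustration. The payload builder uses the `{"text": ...}` JSON body that Slack incoming webhooks accept; other services, including Teams, expect different formats.

```python
import json

# Hypothetical routing table: adjust to your team's escalation policy.
ROUTES = {
    "info": ["email"],
    "warning": ["email", "slack"],
    "critical": ["email", "slack", "sms", "pagerduty"],
}


def channels_for(severity: str) -> list:
    """Return the channels to notify, falling back to email for unknown severities."""
    return ROUTES.get(severity, ["email"])


def slack_payload(severity: str, message: str) -> bytes:
    """Build the JSON body for a Slack incoming webhook POST."""
    return json.dumps({"text": f"[{severity.upper()}] {message}"}).encode("utf-8")


payload = slack_payload("warning", "Reject rate above 10%")
```

POST the payload to your webhook URL with a `Content-Type: application/json` header.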

Threshold-Based Alerts

Define alert conditions based on metrics that indicate problems:

Authentication failure rate

Monitor the ratio of rejected to total authentication attempts:

Warning threshold: Reject rate exceeds 10% over a 15-minute window
Critical threshold: Reject rate exceeds 25% over a 5-minute window

A spike in rejections often indicates:

Expired or changed credentials (e.g., after a password rotation)
Certificate issues (expired CA or server certificate)
Configuration changes on access points (wrong shared secret)
Network path issues between access points and RADIUS servers
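The two thresholds above can be evaluated together, as in this minimal sketch. It assumes you already have reject and total counts for a 5-minute and a 15-minute window; the function names are illustrative.

```python
def reject_rate(rejects: int, total: int) -> float:
    """Ratio of rejected to total attempts; 0.0 when there was no traffic."""
    return rejects / total if total else 0.0


def auth_severity(rate_5m: float, rate_15m: float) -> str:
    """Map reject rates to a severity using the thresholds in this guide."""
    if rate_5m > 0.25:       # critical: more than 25% over 5 minutes
        return "critical"
    if rate_15m > 0.10:      # warning: more than 10% over 15 minutes
        return "warning"
    return "ok"


severity = auth_severity(reject_rate(30, 100), reject_rate(40, 400))
```

The critical condition is checked first so that a short, severe spike is not downgraded to a warning by the longer window.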

Session count anomalies

Monitor for unusual changes in active session counts:

Alert on sudden drop: Active sessions drop by more than 50% within 15 minutes (may indicate access point or network outage)
Alert on unusual spike: Sessions exceed 150% of the typical peak (may indicate unauthorized access or misconfiguration)
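Both session rules reduce to simple comparisons, sketched below. The inputs are assumptions: `previous` is the active session count sampled 15 minutes earlier, and `typical_peak` is a baseline you determine from your own history.

```python
from typing import Optional


def session_anomaly(previous: int, current: int, typical_peak: int) -> Optional[str]:
    """Return "drop", "spike", or None for the current active session count."""
    # Drop of more than 50% within the window (integer math avoids float rounding).
    if current * 2 < previous:
        return "drop"
    # More than 150% of the typical peak.
    if current * 100 > typical_peak * 150:
        return "spike"
    return None


result = session_anomaly(previous=200, current=90, typical_peak=300)
```

A drop of exactly 50% or a count of exactly 150% of peak does not alert, because both rules are written as "more than".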

Response time

If your monitoring tool tracks response time:

Warning: RADIUS authentication response time exceeds 500ms
Critical: Response time exceeds 2 seconds
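If your tool does not track response time, you can time the check yourself. The sketch below wraps any check function and classifies the elapsed time against the thresholds above. Note that it measures the whole check round trip, not only RADIUS server processing time.

```python
import time


def latency_severity(elapsed_seconds: float) -> str:
    """Classify a response time using the thresholds in this guide."""
    if elapsed_seconds > 2.0:
        return "critical"
    if elapsed_seconds > 0.5:
        return "warning"
    return "ok"


def timed(check):
    """Run a zero-argument check and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = check()
    return result, time.perf_counter() - start


ok, elapsed = timed(lambda: True)  # replace the lambda with a real check function
```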

Setting Up Alerts with the API

Use the REST API to build custom monitoring and alerting:

# Retrieve recent authentication events
curl -X GET "https://console.ironwifi.com/api/reports/authentications?from=2024-01-15T00:00:00Z&to=2024-01-15T23:59:59Z" \
  -H "Authorization: Bearer YOUR_API_KEY"

Build a script that:

Polls the authentication API at regular intervals
Calculates the reject rate over a sliding window
Sends an alert via your preferred channel when thresholds are exceeded

Example monitoring script pattern:

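import time
from datetime import datetime, timedelta, timezone

import requests

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://console.ironwifi.com/api"
REJECT_THRESHOLD = 0.10  # 10% reject rate
WINDOW_MINUTES = 15
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # Same timestamp format as the curl example

def check_auth_health():
    # Build a 15-minute window ending now, in UTC
    now = datetime.now(timezone.utc)
    window_start = (now - timedelta(minutes=WINDOW_MINUTES)).strftime(TIME_FORMAT)
    window_end = now.strftime(TIME_FORMAT)

    response = requests.get(
        f"{BASE_URL}/reports/authentications",
        params={"from": window_start, "to": window_end},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    response.raise_for_status()  # Raise on HTTP errors instead of parsing an error body
    events = response.json()  # Assumed to be a list of events with a "result" field

    total = len(events)
    rejects = sum(1 for e in events if e.get("result") == "Reject")

    if total > 0 and (rejects / total) > REJECT_THRESHOLD:
        send_alert(f"High reject rate: {rejects}/{total} "
                   f"({rejects/total:.0%}) in last {WINDOW_MINUTES} minutes")

def send_alert(message):
    # Send to Slack, PagerDuty, email, etc.
    print(f"ALERT: {message}")

# Run every 5 minutes; a failed poll is logged and retried on the next cycle
while True:
    try:
        check_auth_health()
    except requests.RequestException as exc:
        print(f"Health check failed: {exc}")
    time.sleep(300)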

Note: This is a simplified example. In production, handle API errors gracefully, use a proper scheduler (cron, systemd timer), and avoid storing API keys in source code.
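Two of these hardening steps can be sketched as follows: reading the API key from an environment variable, and retrying a failed call with exponential backoff. The variable name `IRONWIFI_API_KEY` and both helper names are assumptions chosen for this example.

```python
import os
import time


def load_api_key(env_var: str = "IRONWIFI_API_KEY") -> str:
    """Read the API key from the environment instead of source code."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable")
    return key


def with_retries(func, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

When the script is run from cron or a systemd timer, it can perform a single check and exit, which removes the need for the `while True` loop.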

Health Check Dashboard

Create a centralized view of your WiFi authentication health by combining:

External monitoring dashboard (UptimeRobot, Pingdom, or Datadog) -- Shows RADIUS server reachability and response time
IronWiFi Console -- Shows authentication events, session data, and Network status
Custom dashboard -- Pull data via the API to create a unified view with your other infrastructure metrics

Key metrics for your dashboard

Metric	Source	Refresh Interval
RADIUS server status	External monitor	5 minutes
Authentication success rate	IronWiFi API	5 minutes
Active session count	IronWiFi API	5 minutes
Average response time	External monitor	5 minutes
Captive portal status	External monitor (HTTP)	5 minutes
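For the authentication success rate metric, a custom dashboard could derive the value from the same event list used in the monitoring script above. The `result` field and its `Accept` value follow that example; the real API response shape is not confirmed here.

```python
def success_rate(events):
    """Return the fraction of accepted attempts, or None when there is no data."""
    total = len(events)
    if total == 0:
        return None  # no traffic is different from a 0% success rate
    accepts = sum(1 for e in events if e.get("result") == "Accept")
    return accepts / total


sample_events = [{"result": "Accept"}] * 3 + [{"result": "Reject"}]
rate = success_rate(sample_events)
```

Returning `None` for an empty window lets the dashboard show "no data" instead of a misleading zero.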

Incident Response

When an alert fires, follow this process:

Step 1: Assess scope

Check the IronWiFi Console for authentication errors
Verify if the issue affects all users or specific ones
Check status.ironwifi.com for platform incidents

Step 2: Isolate the component

RADIUS server unreachable: Check firewall rules, network connectivity
High reject rate but server reachable: Check user credentials, certificates, or recent configuration changes
Captive portal down: Verify splash page URL, check walled garden configuration
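The isolation step can also be encoded as a small lookup that a runbook script prints when an alert fires. This is only a sketch of the decision logic above; the three boolean inputs are assumed to come from your own monitors.

```python
def next_checks(radius_reachable: bool, reject_rate_high: bool, portal_up: bool) -> list:
    """Suggest what to inspect first, based on which monitors are failing."""
    checks = []
    if not radius_reachable:
        checks.append("Check firewall rules and network connectivity")
    elif reject_rate_high:
        checks.append("Check user credentials, certificates, and recent configuration changes")
    if not portal_up:
        checks.append("Verify splash page URL and walled garden configuration")
    return checks


suggestions = next_checks(radius_reachable=True, reject_rate_high=True, portal_up=True)
```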

Step 3: Resolve and verify

Apply the fix (restart AP, correct configuration, etc.)
Verify authentication succeeds with a test user
Confirm monitoring returns to green status
Document the incident for future reference

For detailed troubleshooting steps, see the Troubleshooting Guide.

Best Practices

Monitor every RADIUS server -- Both primary and backup, for every Network
Use a dedicated test user -- Create a user specifically for monitoring; do not use a real user's credentials
Set up multiple alert channels -- Email for audit trail, SMS or Slack for immediate response
Test your alerts regularly -- Intentionally trigger an alert to verify the notification chain works
Review thresholds quarterly -- Adjust warning and critical thresholds as your deployment grows
Subscribe to IronWiFi status updates -- Get notified of platform-level incidents at status.ironwifi.com
Document your runbooks -- Write down the steps to follow when each alert type fires (see Operational Runbooks)

Related Topics

Service Monitor -- Step-by-step UptimeRobot setup for IronWiFi monitoring
Troubleshooting -- Diagnosing authentication and connectivity issues
Reporting and Analytics -- Detailed report types and data export
Networks -- RADIUS server configuration and status
REST API -- Programmatic access to monitoring data
Operational Runbooks -- Incident response procedures
