
Monitoring and Alerting

Proactive monitoring helps keep your WiFi authentication infrastructure operational. This guide covers IronWiFi's built-in monitoring, external health checks, real-time alerting, and threshold configuration so you can catch problems before users report them.

Monitoring Overview

A comprehensive monitoring strategy for IronWiFi includes:

Layer | What to Monitor | How
RADIUS service | Authentication success/failure rates | IronWiFi Console Reports
Endpoint health | RADIUS server reachability | External monitoring (UptimeRobot, Pingdom)
Session metrics | Active sessions, bandwidth usage | IronWiFi Console + API
Infrastructure | Access point connectivity | Your network management tools

IronWiFi Console Monitoring

Real-Time Authentication Status

The IronWiFi Console provides real-time visibility into authentication activity:

  1. Navigate to Reports > Authentication
  2. View recent authentication attempts with results (Accept/Reject)
  3. Identify patterns such as sudden spikes in rejections

Network Status Indicators

Each Network in the IronWiFi Console displays a status indicator:

  • Green -- RADIUS servers healthy and responding
  • Red -- Potential issues detected

Check the Networks page regularly or after infrastructure changes to confirm all Networks show a healthy status.

Session Monitoring

Monitor active user sessions:

  1. Navigate to Reports > Sessions
  2. Review currently active sessions (no Stop Time recorded)
  3. Monitor session counts for unusual spikes or drops

A sudden drop in active sessions may indicate an access point outage or network issue, even if IronWiFi's RADIUS servers are healthy.

External Health Checks

External monitoring independently verifies that IronWiFi RADIUS servers respond to authentication requests. This catches issues that internal monitoring alone might miss, such as network path problems between your access points and IronWiFi.

Setting Up External RADIUS Monitoring

IronWiFi provides an authentication testing tool that external monitors can call:

  1. Navigate to ironwifi.com/authentication-testing-tool
  2. Enter your RADIUS server details (IP, port, shared secret)
  3. Enter test user credentials
  4. Submit the test and confirm the result is Access-Accept
  5. Copy the resulting URL -- this is your monitoring endpoint

Configure your external monitoring service to call this URL at regular intervals and verify that the response contains Access-Accept.
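The check an external monitor performs can be sketched in a few lines. This is a minimal illustration, assuming the monitoring URL returns a text body that includes the string Access-Accept on success; the function names are hypothetical and not part of any IronWiFi tooling.

```python
import urllib.request

EXPECTED_MARKER = "Access-Accept"


def response_indicates_accept(body: str) -> bool:
    """Return True when the response body contains the Access-Accept marker."""
    return EXPECTED_MARKER in body


def check_radius_endpoint(url: str, timeout: float = 10.0) -> bool:
    """Fetch the monitoring URL and report whether the test authentication succeeded.

    Network errors, timeouts, and HTTP error statuses all count as a failed check.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
    return response_indicates_accept(body)
```

Pass the URL copied in step 5 to check_radius_endpoint and treat a False result as a failed health check.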

For detailed setup instructions with UptimeRobot, see the Service Monitor guide.

Monitor Type | Interval | Rationale
RADIUS authentication test | 5 minutes | Balances detection speed with test load
Captive portal HTTP check | 5 minutes | Verifies splash page availability
IronWiFi status page | Subscribe to alerts | Immediate notification of platform incidents

What to Monitor

Set up separate monitors for each component:

  1. Primary RADIUS server -- Authentication test against primary IP
  2. Backup RADIUS server -- Authentication test against backup IP
  3. Captive portal -- HTTP check on your splash page URL (if using captive portals)
  4. IronWiFi status page -- Subscribe to status.ironwifi.com for platform-level alerts

Tip: Monitor both primary and backup RADIUS servers independently. If only the backup fails, authentication still works but you have no redundancy -- a critical gap that should trigger an alert.

Configuring Alerts

Alert Channels

Configure multiple alert channels to ensure notifications reach the right people:

Channel | Best For | Setup
Email | General notifications, audit trail | Add team email addresses
SMS | Critical alerts requiring immediate attention | Add phone numbers (may require paid monitoring plan)
Slack/Teams | Team-wide visibility, incident coordination | Configure webhook integration
PagerDuty/Opsgenie | On-call rotation, escalation | Configure API integration
Webhook | Custom integrations, ticketing systems | Provide endpoint URL
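As an illustration of the webhook channel, the sketch below posts an alert to a Slack incoming webhook, which accepts a JSON body with a text field. The function names and the message format are assumptions made for this example; other services expect different payloads.

```python
import json
import urllib.request


def build_slack_payload(severity: str, summary: str) -> bytes:
    """Encode an alert as the JSON body a Slack incoming webhook accepts."""
    text = f"[{severity.upper()}] {summary}"
    return json.dumps({"text": text}).encode("utf-8")


def send_webhook_alert(webhook_url: str, severity: str, summary: str,
                       timeout: float = 10.0) -> int:
    """POST the alert to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(severity, summary),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status
```

Keep the webhook URL out of source control, since anyone who holds it can post to the channel.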

Threshold-Based Alerts

Define alert conditions based on metrics that indicate problems:

Authentication failure rate

Monitor the ratio of rejected to total authentication attempts:

  • Warning threshold: Reject rate exceeds 10% over a 15-minute window
  • Critical threshold: Reject rate exceeds 25% over a 5-minute window

A spike in rejections often indicates:

  • Expired or changed credentials (e.g., after a password rotation)
  • Certificate issues (expired CA or server certificate)
  • Configuration changes on access points (wrong shared secret)
  • Network path issues between access points and RADIUS servers

Session count anomalies

Monitor for unusual changes in active session counts:

  • Alert on sudden drop: Active sessions drop by more than 50% within 15 minutes (may indicate access point or network outage)
  • Alert on unusual spike: Sessions exceed 150% of the typical peak (may indicate unauthorized access or misconfiguration)
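The session rules above translate into a simple comparison. The sketch below assumes you record the active session count at each poll and keep your own estimate of the typical peak; the function and parameter names are illustrative.

```python
from typing import Optional

DROP_FRACTION = 0.50  # alert when sessions fall by more than 50% in 15 minutes
SPIKE_FACTOR = 1.50   # alert when sessions exceed 150% of the typical peak


def session_anomaly(current: int, count_15m_ago: int, typical_peak: int) -> Optional[str]:
    """Return "sudden_drop", "unusual_spike", or None for a normal reading."""
    if count_15m_ago > 0 and current < count_15m_ago * (1 - DROP_FRACTION):
        return "sudden_drop"
    if typical_peak > 0 and current > typical_peak * SPIKE_FACTOR:
        return "unusual_spike"
    return None
```

A deployment with strong daily patterns may need a time-of-day baseline rather than a single typical peak.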

Response time

If your monitoring tool tracks response time:

  • Warning: RADIUS authentication response time exceeds 500ms
  • Critical: Response time exceeds 2 seconds
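If your monitoring tool does not report response time, you can measure it around your own check. The sketch below is an assumption-laden example: timed_check wraps any callable that performs one authentication test, and the names are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple

WARNING_SECONDS = 0.5
CRITICAL_SECONDS = 2.0


def classify_response_time(elapsed_seconds: float) -> Optional[str]:
    """Return "critical", "warning", or None based on the response time."""
    if elapsed_seconds > CRITICAL_SECONDS:
        return "critical"
    if elapsed_seconds > WARNING_SECONDS:
        return "warning"
    return None


def timed_check(check: Callable[[], bool]) -> Tuple[bool, float]:
    """Run a health check and return its result with the elapsed seconds."""
    start = time.perf_counter()
    ok = check()
    return ok, time.perf_counter() - start
```

A time measured this way includes the HTTP round trip to the testing tool, so it will read higher than the raw RADIUS response time.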

Setting Up Alerts with the API

Use the REST API to build custom monitoring and alerting. Build a script that:

  1. Polls the authentication API at regular intervals
  2. Calculates the reject rate over a sliding window
  3. Sends an alert via your preferred channel when thresholds are exceeded

Example monitoring script pattern:
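A minimal sketch of this pattern follows. It does not call the IronWiFi API directly: fetch_new_events and send_alert are hypothetical callables that you supply, the first returning new (timestamp, rejected) pairs from the authentication API in time order and the second delivering a message to your alert channel.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, Optional, Tuple

Event = Tuple[float, bool]  # (unix timestamp, True if the attempt was rejected)


class RejectRateWindow:
    """Tracks the reject rate over the most recent window_seconds."""

    def __init__(self, window_seconds: int) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Event] = deque()

    def add(self, events: Iterable[Event]) -> None:
        # Events are assumed to arrive in chronological order.
        self._events.extend(events)

    def rate(self, now: float) -> float:
        cutoff = now - self.window_seconds
        # Drop events that have slid out of the window.
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        if not self._events:
            return 0.0
        rejected = sum(1 for _, is_reject in self._events if is_reject)
        return rejected / len(self._events)


def run_monitor(
    fetch_new_events: Callable[[], Iterable[Event]],
    send_alert: Callable[[str], None],
    threshold: float = 0.10,
    window_seconds: int = 900,
    poll_seconds: int = 300,
    max_polls: Optional[int] = None,
) -> None:
    """Poll for events, update the window, and alert when the rate exceeds threshold."""
    window = RejectRateWindow(window_seconds)
    polls = 0
    while max_polls is None or polls < max_polls:
        window.add(fetch_new_events())
        current = window.rate(time.time())
        if current > threshold:
            minutes = window_seconds // 60
            send_alert(f"Reject rate {current:.0%} over the last {minutes} minutes")
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

The max_polls parameter exists so the loop can be exercised in tests; leave it as None to run indefinitely.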

Note: This is a simplified example. In production, handle API errors gracefully, use a proper scheduler (cron, systemd timer), and avoid storing API keys in source code.

Health Check Dashboard

Create a centralized view of your WiFi authentication health by combining:

  1. External monitoring dashboard (UptimeRobot, Pingdom, or Datadog) -- Shows RADIUS server reachability and response time
  2. IronWiFi Console -- Shows authentication events, session data, and Network status
  3. Custom dashboard -- Pull data via the API to create a unified view with your other infrastructure metrics

Key metrics for your dashboard

Metric | Source | Refresh Interval
RADIUS server status | External monitor | 5 minutes
Authentication success rate | IronWiFi API | 5 minutes
Active session count | IronWiFi API | 5 minutes
Average response time | External monitor | 5 minutes
Captive portal status | External monitor (HTTP) | 5 minutes

Incident Response

When an alert fires, follow this process:

Step 1: Assess scope

  • Check the IronWiFi Console for authentication errors
  • Determine whether the issue affects all users or only specific ones
  • Check status.ironwifi.com for platform incidents

Step 2: Isolate the component

  • RADIUS server unreachable: Check firewall rules, network connectivity
  • High reject rate but server reachable: Check user credentials, certificates, or recent configuration changes
  • Captive portal down: Verify splash page URL, check walled garden configuration

Step 3: Resolve and verify

  • Apply the fix (restart AP, correct configuration, etc.)
  • Verify authentication succeeds with a test user
  • Confirm monitoring returns to green status
  • Document the incident for future reference

For detailed troubleshooting steps, see the Troubleshooting Guide.

Best Practices

  1. Monitor every RADIUS server -- Both primary and backup, for every Network
  2. Use a dedicated test user -- Create a user specifically for monitoring; do not use a real user's credentials
  3. Set up multiple alert channels -- Email for audit trail, SMS or Slack for immediate response
  4. Test your alerts regularly -- Intentionally trigger an alert to verify the notification chain works
  5. Review thresholds quarterly -- Adjust warning and critical thresholds as your deployment grows
  6. Subscribe to IronWiFi status updates -- Get notified of platform-level incidents at status.ironwifi.com
  7. Document your runbooks -- Write down the steps to follow when each alert type fires (see Operational Runbooks)
