Monitoring and Alerting
Proactive monitoring helps keep your WiFi authentication infrastructure operational. This guide covers IronWiFi's built-in monitoring, external health checks, real-time alerting, and threshold configuration so you can catch problems before users report them.
Monitoring Overview
A comprehensive monitoring strategy for IronWiFi includes:
| Layer | What to Monitor | How |
|---|---|---|
| RADIUS service | Authentication success/failure rates | IronWiFi Console Reports |
| Endpoint health | RADIUS server reachability | External monitoring (UptimeRobot, Pingdom) |
| Session metrics | Active sessions, bandwidth usage | IronWiFi Console + API |
| Infrastructure | Access point connectivity | Your network management tools |
IronWiFi Console Monitoring
Real-Time Authentication Status
The IronWiFi Console provides real-time visibility into authentication activity:
- Navigate to Reports > Authentication
- View recent authentication attempts with results (Accept/Reject)
- Identify patterns such as sudden spikes in rejections
Network Status Indicators
Each Network in the IronWiFi Console displays a status indicator:
- Green -- RADIUS servers healthy and responding
- Red -- Potential issues detected
Check the Networks page regularly or after infrastructure changes to confirm all Networks show a healthy status.
Session Monitoring
Monitor active user sessions:
- Navigate to Reports > Sessions
- Review currently active sessions (no Stop Time recorded)
- Monitor session counts for unusual spikes or drops
A sudden drop in active sessions may indicate an access point outage or network issue, even if IronWiFi's RADIUS servers are healthy.
External Health Checks
External monitoring independently verifies that IronWiFi RADIUS servers respond to authentication requests. This catches issues that internal monitoring alone might miss, such as network path problems between your access points and IronWiFi.
Setting Up External RADIUS Monitoring
IronWiFi provides an authentication testing tool that external monitors can call:
- Navigate to ironwifi.com/authentication-testing-tool
- Enter your RADIUS server details (IP, port, shared secret)
- Enter test user credentials
- Submit the test and confirm the result is Access-Accept
- Copy the resulting URL -- this is your monitoring endpoint
Configure your external monitoring service to call this URL at regular intervals and verify that the response contains Access-Accept.
For detailed setup instructions with UptimeRobot, see the Service Monitor guide.
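If you prefer to run the check from your own scheduler instead of a hosted monitor, the same test can be sketched in a few lines of Python. This is a minimal sketch, not an official client: it assumes the monitoring URL returns a text body that contains Access-Accept on success, and the `fetch` parameter is a hypothetical hook added here only so the function can be exercised without a network call.

```python
import urllib.request


def check_radius_endpoint(url: str, timeout: float = 10.0, fetch=None) -> bool:
    """Return True if the monitoring URL's response body contains Access-Accept."""
    if fetch is None:
        # Default fetcher: plain HTTP GET using the standard library.
        def fetch(target: str) -> str:
            with urllib.request.urlopen(target, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")

    try:
        body = fetch(url)
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
    return "Access-Accept" in body
```

Treating any connection error as a failed check keeps the monitor conservative: an unreachable testing endpoint is reported the same way as a rejected authentication.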
Recommended Health Check Intervals
| Monitor Type | Interval | Rationale |
|---|---|---|
| RADIUS authentication test | 5 minutes | Balances detection speed with test load |
| Captive portal HTTP check | 5 minutes | Verifies splash page availability |
| IronWiFi status page | Subscribe to alerts | Immediate notification of platform incidents |
What to Monitor
Set up separate monitors for each component:
- Primary RADIUS server -- Authentication test against primary IP
- Backup RADIUS server -- Authentication test against backup IP
- Captive portal -- HTTP check on your splash page URL (if using captive portals)
- IronWiFi status page -- Subscribe to status.ironwifi.com for platform-level alerts
Monitor both primary and backup RADIUS servers independently. If only the backup fails, authentication still works but you have no redundancy -- a critical gap that should trigger an alert.
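The point about redundancy can be made concrete by combining the two check results into a single status. The sketch below is illustrative only; the status names ("ok", "degraded", "down") are assumptions chosen for this example, not IronWiFi terminology.

```python
def redundancy_status(primary_ok: bool, backup_ok: bool) -> str:
    """Combine independent primary and backup check results into one status."""
    if primary_ok and backup_ok:
        return "ok"
    if primary_ok or backup_ok:
        # Authentication still works, but one more failure causes an outage.
        return "degraded"
    return "down"
```

Alerting on "degraded" as well as "down" is what makes a backup-only failure visible before it matters.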
Configuring Alerts
Alert Channels
Configure multiple alert channels to ensure notifications reach the right people:
| Channel | Best For | Setup |
|---|---|---|
| Email | General notifications, audit trail | Add team email addresses |
| SMS | Critical alerts requiring immediate attention | Add phone numbers (may require paid monitoring plan) |
| Slack/Teams | Team-wide visibility, incident coordination | Configure webhook integration |
| PagerDuty/Opsgenie | On-call rotation, escalation | Configure API integration |
| Webhook | Custom integrations, ticketing systems | Provide endpoint URL |
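As an illustration of the webhook channel, the sketch below posts a message to a Slack incoming webhook, which accepts a JSON body with a `text` field. Other services (Teams, PagerDuty, ticketing systems) expect different payloads, so treat the payload shape as an assumption to adjust per integration. The function names are hypothetical.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build a JSON POST request in the format Slack incoming webhooks accept."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(webhook_url: str, message: str, timeout: float = 10.0) -> int:
    """Send the alert and return the HTTP status code of the response."""
    request = build_webhook_request(webhook_url, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keep the webhook URL in an environment variable or secret store rather than in the script, since anyone holding it can post to the channel.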
Threshold-Based Alerts
Define alert conditions based on metrics that indicate problems:
Authentication failure rate
Monitor the ratio of rejected to total authentication attempts:
- Warning threshold: Reject rate exceeds 10% over a 15-minute window
- Critical threshold: Reject rate exceeds 25% over a 5-minute window
A spike in rejections often indicates:
- Expired or changed credentials (e.g., after a password rotation)
- Certificate issues (expired CA or server certificate)
- Configuration changes on access points (wrong shared secret)
- Network path issues between access points and RADIUS servers
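The warning and critical thresholds above can be expressed as a small classifier. This is a sketch under stated assumptions: the caller supplies reject rates already computed for the 15-minute and 5-minute windows, "exceeds" is read as strictly greater than, and the function names are hypothetical.

```python
def reject_rate(rejected: int, total: int) -> float:
    """Ratio of rejected to total authentication attempts (0.0 when idle)."""
    return rejected / total if total else 0.0


def classify_reject_rate(rate_15min: float, rate_5min: float) -> str:
    """Apply the critical (25% over 5 min) and warning (10% over 15 min) thresholds."""
    if rate_5min > 0.25:
        return "critical"
    if rate_15min > 0.10:
        return "warning"
    return "ok"
```

The critical check runs first so that a short, severe spike is never reported as a mere warning.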
Session count anomalies
Monitor for unusual changes in active session counts:
- Alert on sudden drop: Active sessions drop by more than 50% within 15 minutes (may indicate access point or network outage)
- Alert on unusual spike: Sessions exceed 150% of the typical peak (may indicate unauthorized access or misconfiguration)
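Both session rules reduce to simple comparisons once you have three numbers: the current count, the count 15 minutes ago, and your typical peak. The sketch below assumes you collect those values yourself (for example from periodic polling); the function name and return values are hypothetical.

```python
from typing import Optional


def session_anomaly(current: int, count_15min_ago: int, typical_peak: int) -> Optional[str]:
    """Return "drop", "spike", or None based on the session-count rules above."""
    # Drop of more than 50% within 15 minutes.
    if count_15min_ago > 0 and current < 0.5 * count_15min_ago:
        return "drop"
    # More than 150% of the typical peak.
    if typical_peak > 0 and current > 1.5 * typical_peak:
        return "spike"
    return None
```

For example, `session_anomaly(40, 100, 120)` reports a drop because 40 is less than half of 100, while a fall from 100 to exactly 50 does not trigger.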
Response time
If your monitoring tool tracks response time:
- Warning: RADIUS authentication response time exceeds 500ms
- Critical: Response time exceeds 2 seconds
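If your tool does not track response time, you can measure it around your own check. The sketch below times any callable and applies the two thresholds; note that when the check goes through an HTTP-based test URL, the measured time includes HTTP overhead as well as the RADIUS exchange, so treat the numbers as an upper bound. The function names are hypothetical.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(check: Callable[[], T]) -> Tuple[T, float]:
    """Run a check and return its result plus the elapsed time in seconds."""
    start = time.perf_counter()
    result = check()
    return result, time.perf_counter() - start


def classify_response_time(seconds: float) -> str:
    """Apply the warning (500 ms) and critical (2 s) thresholds."""
    if seconds > 2.0:
        return "critical"
    if seconds > 0.5:
        return "warning"
    return "ok"
```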
Setting Up Alerts with the API
Use the REST API to build custom monitoring and alerting. For example, build a script that:
- Polls the authentication API at regular intervals
- Calculates the reject rate over a sliding window
- Sends an alert via your preferred channel when thresholds are exceeded
Example monitoring script pattern:
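A minimal sketch of this pattern in Python follows. The IronWiFi API endpoints and response format are not specified on this page, so the API call is represented by `fetch_counts`, a hypothetical callable you would implement to return the number of accepted and rejected attempts since the previous poll. `send_alert` is likewise a placeholder for your alert channel.

```python
import time
from collections import deque


class RejectRateWindow:
    """Sliding window of (timestamp, accepts, rejects) samples."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.samples = deque()

    def add(self, timestamp: float, accepts: int, rejects: int) -> None:
        self.samples.append((timestamp, accepts, rejects))
        # Discard samples that have aged out of the window.
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()

    def rate(self) -> float:
        accepts = sum(sample[1] for sample in self.samples)
        rejects = sum(sample[2] for sample in self.samples)
        total = accepts + rejects
        return rejects / total if total else 0.0


def poll_once(fetch_counts, window, threshold, send_alert, now=None) -> float:
    """Poll once, update the window, and alert if the reject rate exceeds the threshold."""
    now = time.time() if now is None else now
    accepts, rejects = fetch_counts()  # hypothetical wrapper around the API call
    window.add(now, accepts, rejects)
    rate = window.rate()
    if rate > threshold:
        send_alert(f"Reject rate {rate:.0%} exceeds {threshold:.0%}")
    return rate
```

Call `poll_once` from your scheduler at each polling interval, reusing the same `RejectRateWindow(900)` instance for a 15-minute window.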
This is a simplified example. In production, handle API errors gracefully, use a proper scheduler (cron, systemd timer), and avoid storing API keys in source code.
Health Check Dashboard
Create a centralized view of your WiFi authentication health by combining:
- External monitoring dashboard (UptimeRobot, Pingdom, or Datadog) -- Shows RADIUS server reachability and response time
- IronWiFi Console -- Shows authentication events, session data, and Network status
- Custom dashboard -- Pull data via the API to create a unified view with your other infrastructure metrics
Key metrics for your dashboard
| Metric | Source | Refresh Interval |
|---|---|---|
| RADIUS server status | External monitor | 5 minutes |
| Authentication success rate | IronWiFi API | 5 minutes |
| Active session count | IronWiFi API | 5 minutes |
| Average response time | External monitor | 5 minutes |
| Captive portal status | External monitor (HTTP) | 5 minutes |
Incident Response
When an alert fires, follow this process:
Step 1: Assess scope
- Check the IronWiFi Console for authentication errors
- Determine whether the issue affects all users or only specific ones
- Check status.ironwifi.com for platform incidents
Step 2: Isolate the component
- RADIUS server unreachable: Check firewall rules, network connectivity
- High reject rate but server reachable: Check user credentials, certificates, or recent configuration changes
- Captive portal down: Verify splash page URL, check walled garden configuration
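The isolation step above maps observed symptoms to the checks listed for each case, which is easy to encode in a runbook helper. The sketch below is illustrative only; the function name and parameters are hypothetical, and the suggestions are taken directly from the list above.

```python
from typing import List


def suggest_checks(radius_reachable: bool, reject_rate_high: bool, portal_up: bool = True) -> List[str]:
    """Map observed symptoms to the first things to check."""
    suggestions = []
    if not radius_reachable:
        suggestions.append("Check firewall rules and network connectivity")
    elif reject_rate_high:
        suggestions.append(
            "Check user credentials, certificates, and recent configuration changes"
        )
    if not portal_up:
        suggestions.append("Verify splash page URL and walled garden configuration")
    return suggestions
```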
Step 3: Resolve and verify
- Apply the fix (restart AP, correct configuration, etc.)
- Verify authentication succeeds with a test user
- Confirm monitoring returns to green status
- Document the incident for future reference
For detailed troubleshooting steps, see the Troubleshooting Guide.
Best Practices
- Monitor every RADIUS server -- Both primary and backup, for every Network
- Use a dedicated test user -- Create a user specifically for monitoring; do not use a real user's credentials
- Set up multiple alert channels -- Email for audit trail, SMS or Slack for immediate response
- Test your alerts regularly -- Intentionally trigger an alert to verify the notification chain works
- Review thresholds quarterly -- Adjust warning and critical thresholds as your deployment grows
- Subscribe to IronWiFi status updates -- Get notified of platform-level incidents at status.ironwifi.com
- Document your runbooks -- Write down the steps to follow when each alert type fires (see Operational Runbooks)
Related Topics
- Service Monitor -- Step-by-step UptimeRobot setup for IronWiFi monitoring
- Troubleshooting -- Diagnosing authentication and connectivity issues
- Reporting and Analytics -- Detailed report types and data export
- Networks -- RADIUS server configuration and status
- REST API -- Programmatic access to monitoring data
- Operational Runbooks -- Incident response procedures