High Availability and Disaster Recovery
IronWiFi's cloud RADIUS infrastructure is designed for high availability out of the box. Every Network provides two RADIUS servers for failover, and multi-region deployment supports geographic redundancy. This guide covers how to maximize uptime with proper configuration, failover planning, backup strategies, and disaster recovery procedures.
IronWiFi's Built-In Redundancy
Dual RADIUS Servers
Every Network in IronWiFi provides two RADIUS server IP addresses:
- Primary server -- Handles authentication and accounting requests
- Backup server -- Automatically available if the primary is unreachable
Both servers share the same configuration, user data, and policies. No manual synchronization is required -- IronWiFi manages this in the cloud.
How Failover Works
- Your access point sends an authentication request to the primary RADIUS server
- If the primary does not respond within the timeout period, the AP retries
- After the configured number of retries, the AP switches to the backup server
- The backup server processes the request using the same user database
- When the primary recovers, the AP returns to using it (behavior varies by vendor)
Failover only works if you configure both RADIUS servers on your access points. If you only configure the primary, there is no automatic failover. Always configure both.
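The failover sequence above can be modeled in a few lines of Python. This is only a sketch of the behavior on the access point side, not vendor firmware: `send_request` is a hypothetical stand-in for one RADIUS exchange, and the server names are placeholders.

```python
from typing import Callable, Optional, Sequence, Tuple


def authenticate_with_failover(
    servers: Sequence[str],
    send_request: Callable[[str], Optional[str]],
    retries: int = 3,
) -> Optional[Tuple[str, str]]:
    """Try each server in order; move on after `retries` unanswered attempts.

    `send_request` (hypothetical) performs one exchange and returns the
    server's reply, or None if the request timed out.
    """
    for server in servers:
        for _attempt in range(retries):
            reply = send_request(server)
            if reply is not None:
                return server, reply
    return None  # every configured server stayed silent


# Simulated outage: the primary never answers, the backup accepts.
def fake_send(server: str) -> Optional[str]:
    return None if server == "primary" else "Access-Accept"


result = authenticate_with_failover(["primary", "backup"], fake_send)
print(result)
```

With only `["primary"]` in the list, the same call returns `None`, which is the situation the warning above describes.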
Cloud Infrastructure Resilience
IronWiFi's cloud platform provides:
- Geographic distribution -- RADIUS servers in multiple data centers
- Automatic scaling -- Capacity adjusts to handle authentication load
- Data replication -- User and configuration data replicated across infrastructure
- 24/7 monitoring -- IronWiFi monitors platform health continuously
Multi-Region Deployment
For organizations with access points in multiple geographic regions, create separate Networks in each region to minimize latency and provide geographic redundancy.
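Architecture
In a multi-region deployment, each region has its own Network with its own primary and backup RADIUS servers, and access points authenticate against the Network in their region.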
Setting up multi-region
- Navigate to Networks in the IronWiFi Console
- Click Create Network and select a region (e.g., US East)
- Create a second Network and select another region (e.g., Europe West)
- Configure US access points to use the US Network's RADIUS servers
- Configure EU access points to use the EU Network's RADIUS servers
Users and Groups are account-wide -- they work across all Networks automatically.
For critical deployments, you can configure access points to use a remote region's RADIUS servers as a tertiary fallback. For example, US access points could use the EU RADIUS servers as a last resort if both US servers are unreachable.
Cross-region failover
Most enterprise access points support configuring multiple RADIUS servers with priority. Use this to set up cross-region failover:
| Priority | Server | Region | Purpose |
|---|---|---|---|
| 1 (primary) | US RADIUS Primary | US East | Normal operation |
| 2 (backup) | US RADIUS Backup | US East | Same-region failover |
| 3 (tertiary) | EU RADIUS Primary | Europe West | Cross-region failover |
Check your access point documentation for the maximum number of RADIUS servers supported and how priority is configured.
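If you keep your RADIUS server list in a script or inventory file, the priority table can be expressed as data and checked before it is pushed to access points. The sketch below is illustrative only: the `RadiusServer` type and `failover_order` helper are hypothetical, and the addresses are documentation placeholders (203.0.113.0/24), not IronWiFi IPs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RadiusServer:
    priority: int  # 1 = tried first
    name: str
    region: str
    address: str   # placeholder; use the IPs shown for your Network


def failover_order(servers: List[RadiusServer]) -> List[RadiusServer]:
    """Return servers sorted by priority, rejecting duplicate priorities."""
    priorities = [s.priority for s in servers]
    if len(set(priorities)) != len(priorities):
        raise ValueError("each server needs a unique priority")
    return sorted(servers, key=lambda s: s.priority)


servers = [
    RadiusServer(3, "EU RADIUS Primary", "Europe West", "203.0.113.30"),
    RadiusServer(1, "US RADIUS Primary", "US East", "203.0.113.10"),
    RadiusServer(2, "US RADIUS Backup", "US East", "203.0.113.11"),
]
for s in failover_order(servers):
    print(s.priority, s.name, s.region)
```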
Access Point Configuration for HA
Timeout and retry settings
Proper timeout configuration is critical for fast failover:
| Setting | Recommended Value | Reason |
|---|---|---|
| RADIUS timeout | 3-5 seconds | Time to wait for a response before retrying |
| Retry count | 3 | Number of retries before switching to backup |
| Dead time | 300 seconds (5 min) | How long to avoid a failed server before retrying it |
| Accounting interim | 300 seconds (5 min) | Frequency of accounting updates |
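With these settings, an AP gives up on the primary and fails over to the backup server after roughly 9-15 seconds (3 attempts x 3-5 seconds each). Some vendors count the initial request separately from the retries, in which case failover takes one timeout longer (12-20 seconds).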
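The arithmetic can be checked with a small helper. This sketch assumes the AP makes a fixed number of attempts per server and waits the full timeout on each one; `worst_case_failover_seconds` is a hypothetical name, and actual timing depends on how your vendor counts retries.

```python
def worst_case_failover_seconds(
    timeout_s: float, attempts: int, servers_down: int = 1
) -> float:
    """Time spent waiting on unresponsive servers before the next one is tried."""
    return timeout_s * attempts * servers_down


for timeout in (3, 5):
    same_region = worst_case_failover_seconds(timeout, attempts=3)
    cross_region = worst_case_failover_seconds(timeout, attempts=3, servers_down=2)
    print(f"timeout={timeout}s: backup after {same_region:.0f}s, "
          f"tertiary after {cross_region:.0f}s")
# timeout=3s: backup after 9s, tertiary after 18s
# timeout=5s: backup after 15s, tertiary after 30s
```

The `servers_down=2` case corresponds to a tertiary, cross-region server that is reached only after both same-region servers have timed out.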
Vendor-specific HA configuration
Cisco Meraki
Meraki supports primary and secondary RADIUS servers natively:
- Navigate to Wireless > Access Control
- Under RADIUS servers, add both IronWiFi Primary and Backup IPs
- Meraki automatically fails over to the secondary if the primary is unresponsive
Ubiquiti UniFi
- In the UniFi Controller, navigate to Settings > WiFi
- Edit your SSID and expand RADIUS settings
- Add the primary server (IP, port, secret)
- Click Add RADIUS Server to add the backup
- UniFi tries servers in order, failing over automatically
Aruba / HPE
- Configure primary and backup RADIUS servers in the server group
- Set the dead-time parameter for failed server recovery
- Enable RADIUS server health monitoring
MikroTik
MikroTik tries RADIUS servers in order. If the first fails, it moves to the next.
See the Configuration Guides for complete setup instructions for your specific hardware.
RADIUS caching for offline resilience
Enable RADIUS caching on access points that support it. Caching stores recently authenticated credentials locally so users can reconnect even during a complete RADIUS outage.
Benefits:
- Previously authenticated users reconnect without reaching IronWiFi
- Handles brief network outages transparently
- Reduces authentication latency for repeat connections
See RADIUS Caching & Failover for detailed caching configuration.
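As a conceptual illustration of what a cache buys you during an outage, the sketch below remembers successful authentications for a fixed lifetime and falls back to that memory only when no RADIUS server answers. It is not how any particular access point implements caching: `AuthCache`, `authenticate`, and `ask_radius` are hypothetical names, and real implementations may also consult the cache before contacting the server.

```python
import time
from typing import Callable, Dict, Optional


class AuthCache:
    """Remember successful authentications for `ttl_s` seconds."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def remember(self, identity: str) -> None:
        self._expires_at[identity] = self._clock() + self._ttl_s

    def forget(self, identity: str) -> None:
        self._expires_at.pop(identity, None)

    def is_cached(self, identity: str) -> bool:
        expires = self._expires_at.get(identity)
        return expires is not None and self._clock() < expires


def authenticate(
    identity: str,
    cache: AuthCache,
    ask_radius: Callable[[str], Optional[bool]],
) -> bool:
    """`ask_radius` returns True/False, or None when no server answers."""
    answer = ask_radius(identity)
    if answer is None:          # outage: fall back to the local cache
        return cache.is_cached(identity)
    if answer:
        cache.remember(identity)
    else:
        cache.forget(identity)  # an explicit reject clears any cached entry
    return answer


cache = AuthCache(ttl_s=3600)
print(authenticate("alice", cache, lambda _identity: True))  # server reachable
print(authenticate("alice", cache, lambda _identity: None))  # outage, cached
print(authenticate("bob", cache, lambda _identity: None))    # outage, unknown
```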
Recovery Time and Recovery Point Objectives
Definitions
| Term | Definition | IronWiFi Context |
|---|---|---|
| RTO (Recovery Time Objective) | Maximum acceptable downtime | How quickly authentication resumes after a failure |
| RPO (Recovery Point Objective) | Maximum acceptable data loss | How much configuration/user data you can afford to lose |
IronWiFi HA targets
| Scenario | RTO | RPO | Mitigation |
|---|---|---|---|
| Single RADIUS server failure | 9-15 seconds | 0 (no data loss) | AP fails over to backup server |
| Regional outage | 18-30 seconds | 0 | Cross-region failover (if configured); the AP must first time out on both same-region servers |
| RADIUS caching active | 0 (transparent) | 0 | Local cache handles authentication |
| Complete platform outage | Depends on caching | 0 (cloud-replicated) | RADIUS caching + IronWiFi platform recovery |
| Configuration error | Minutes | 0 | Restore from backup (see below) |
Improving your RTO
- Configure both RADIUS servers -- Reduces single-server RTO to seconds
- Enable RADIUS caching -- Eliminates perceived downtime for cached users
- Use multi-region Networks -- Provides geographic resilience
- Lower AP timeout values -- Faster failover (but avoid timeouts so short they cause false failovers)
- Set up monitoring -- Detect issues before users report them (see Monitoring and Alerting)
Backup Strategies
What to back up
| Data | Backup Method | Frequency |
|---|---|---|
| User accounts and attributes | API export | Weekly or after bulk changes |
| Group configurations | API export or manual documentation | After policy changes |
| Network settings | Document RADIUS IPs, ports, secrets | After Network creation |
| Captive portal configuration | Document settings and custom HTML | After portal changes |
| Access point RADIUS settings | Network management tool backup | After configuration changes |
Automated backup via API
Use the REST API to export your IronWiFi configuration.
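A minimal export script might look like the following. Treat it as a sketch under stated assumptions: the base URL, the resource names (`users`, `groups`, `networks`), the Bearer-token header, and the environment variable names are all placeholders chosen for illustration, so take the real values from the IronWiFi API reference before using it.

```python
import datetime
import json
import os
import pathlib
import urllib.request

# ASSUMPTIONS: base URL, resource paths, and Bearer auth are placeholders,
# not confirmed IronWiFi API details.
API_BASE = os.environ.get("IRONWIFI_API_BASE", "https://api.example.invalid")
RESOURCES = ["users", "groups", "networks"]  # hypothetical resource paths


def build_request(base_url: str, resource: str, token: str) -> urllib.request.Request:
    """Build an authenticated GET request for one resource collection."""
    url = f"{base_url.rstrip('/')}/{resource}"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    })


def backup_path(out_dir: pathlib.Path, resource: str, day: datetime.date) -> pathlib.Path:
    """Name each export by date so older backups are never overwritten."""
    return out_dir / f"{day.isoformat()}-{resource}.json"


def export_all(token: str, out_dir: pathlib.Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    today = datetime.date.today()
    for resource in RESOURCES:
        request = build_request(API_BASE, resource, token)
        with urllib.request.urlopen(request, timeout=30) as response:
            data = json.load(response)
        backup_path(out_dir, resource, today).write_text(json.dumps(data, indent=2))


token = os.environ.get("IRONWIFI_API_TOKEN")
if token:  # only call the API when a token is configured
    export_all(token, pathlib.Path("ironwifi-backups"))
```

Exports can contain user attributes, so write them to a location with restricted access.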
Schedule these exports using cron or a similar scheduler. See Backup and Restore for complete backup procedures.
Documenting your configuration
Maintain a configuration document that records:
- IronWiFi account details (not credentials -- store those in a password manager)
- Network names, regions, RADIUS server IPs, ports
- Group names and their attribute configurations
- Captive portal settings, splash page URLs, walled garden entries
- Access point RADIUS configuration for each site
- Identity provider integration details (Connector or SCIM settings)
Store this document securely and update it whenever configuration changes are made.
Testing Your HA Setup
Failover test procedure
- Connect a test device to your WiFi network and verify authentication succeeds
- Temporarily block traffic from the access point to the primary RADIUS server IP, for example with a firewall rule on the AP or on an upstream firewall
- Disconnect and reconnect the test device
- Verify authentication succeeds via the backup RADIUS server
- Check IronWiFi Reports to confirm the authentication went through the backup
- Remove the firewall rule and verify the AP returns to the primary server
Monitoring during tests
- Watch the IronWiFi Console for authentication events during failover
- Measure the time between blocking the primary and successful backup authentication
- Verify that RADIUS accounting continues to function during failover
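To put a number on the failover time, you can poll a test authentication from the moment the primary is blocked and record when it first succeeds. The helper below is a sketch: `check` is a hypothetical callable that you supply (for example, one that runs a test login and returns `True` on success), and the demo uses canned results instead of a real network.

```python
import time
from typing import Callable, Optional


def time_until_success(
    check: Callable[[], bool],
    poll_interval_s: float = 1.0,
    give_up_after_s: float = 120.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `check` until it returns True; return elapsed seconds, or None on give-up."""
    start = clock()
    while True:
        if check():
            return clock() - start
        if clock() - start >= give_up_after_s:
            return None
        sleep(poll_interval_s)


# Demo with canned results and no real waiting: two failures, then success.
outcomes = iter([False, False, True])
elapsed = time_until_success(lambda: next(outcomes), sleep=lambda _seconds: None)
print(f"authentication recovered after {elapsed:.1f}s")
```

Start the measurement immediately after applying the block so the result is comparable with the expected failover window for your timeout and retry settings.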
Test schedule
| Test | Frequency | Who |
|---|---|---|
| AP failover (primary to backup) | Quarterly | Network team |
| Cross-region failover | Semi-annually | Network team |
| RADIUS caching (disconnect from cloud) | Quarterly | Network team |
| Full DR drill (simulate platform outage) | Annually | IT operations |
| Backup restore test | Semi-annually | IT operations |
Incident Communication
During an outage:
- Check status.ironwifi.com for platform-level incidents
- Subscribe to status page notifications for real-time updates
- Contact IronWiFi support via live chat at ironwifi.com or email support@ironwifi.com
- Communicate with affected users through your internal channels
Related Topics
- Networks -- RADIUS server configuration and regions
- RADIUS Caching & Failover -- Local caching for offline resilience
- Backup and Restore -- Configuration backup procedures
- Monitoring and Alerting -- Proactive health monitoring
- Deployment Planning -- Architecture and capacity planning
- Configuration Guides -- AP-specific RADIUS setup
- Operational Runbooks -- Incident response procedures