Operational Runbooks
This page provides step-by-step procedures for common operational tasks in IronWiFi. Use these runbooks during incidents, scheduled maintenance, and bulk operations to ensure consistent, reliable outcomes.
Incident Response
Runbook: Authentication Outage
Trigger: Multiple users report that they cannot connect to WiFi, or monitoring alerts indicate that the RADIUS server is unreachable.
Severity: Critical
Target resolution time: 15 minutes
Step 1: Confirm the issue (2 minutes)
- Check status.ironwifi.com for platform-wide incidents
- Log in to the IronWiFi Console
- Navigate to Reports > Authentication
- Look for a sudden drop in successful authentications or spike in failures
- Navigate to Networks and check the status indicator for each Network
Step 2: Determine scope (3 minutes)
| Scope | Indicators | Likely Cause |
|---|---|---|
| All users, all sites | No authentications in reports | Platform issue or account issue |
| All users, one site | Failures from specific NAS IPs | Site-level network or AP issue |
| Some users, all sites | Specific usernames failing | User/group configuration issue |
| One user | Single username failing | Individual user account issue |
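The scope table can be approximated in a script when failure records are exported from the authentication report. The sketch below is illustrative only: the `(username, site)` record shape, the `known_sites` set, and the `any_success` flag are assumptions, not an IronWiFi export format.

```python
def classify_scope(failures, known_sites, any_success):
    """Map recent failures to a row of the scope table.

    failures     -- list of (username, site) tuples (assumed record shape)
    known_sites  -- set of all site names in the deployment
    any_success  -- True if the report shows any successful authentication
    """
    if not any_success:
        # No authentications at all points to a platform or account issue.
        return "all users, all sites"
    users = {user for user, _ in failures}
    sites = {site for _, site in failures}
    if len(users) == 1:
        return "one user"
    if len(sites) == 1 and len(known_sites) > 1:
        return "all users, one site"
    return "some users, all sites"


if __name__ == "__main__":
    sample = [("alice", "hq"), ("bob", "hq")]
    print(classify_scope(sample, {"hq", "branch"}, any_success=True))
```

The result is only a starting hypothesis; confirm it against the indicators in the table before acting.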
Step 3: Triage and act
If platform-wide outage (status.ironwifi.com reports incident):
- Verify RADIUS caching is active on access points (previously authenticated users should still connect)
- Monitor the status page for updates
- Contact IronWiFi support if no status page update within 15 minutes
If site-level issue (one location affected):
- Verify network connectivity from the site to the internet
- Check if the access point can reach IronWiFi RADIUS IPs (ping or traceroute from the AP network)
- Verify firewall rules allow outbound UDP on RADIUS authentication and accounting ports
- Check if the AP is using the correct RADIUS server IPs, ports, and shared secret
- Restart the access point or controller if configuration appears correct
If user/group issue:
- Check the affected user's account in the IronWiFi Console (status, credentials, group membership)
- Look at the authentication report for the specific reject reason
- See Troubleshooting for resolution steps based on the reject reason
Step 4: Verify resolution (2 minutes)
- Connect a test device and authenticate
- Check Reports > Authentication for new Access-Accept entries
- Confirm monitoring alerts clear
Step 5: Post-incident
- Document what happened, when, and how it was resolved
- If a configuration change caused the issue, implement safeguards (pre-change testing, rollback plan)
- Review monitoring coverage -- were alerts timely?
Runbook: High Authentication Failure Rate
Trigger: Monitoring alert for authentication reject rate above threshold (e.g., >10% over 15 minutes).
Severity: Warning (escalate to Critical if >50% reject rate)
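The trigger and escalation thresholds reduce to a small calculation. The sketch below uses the example values from this runbook (10% and 50%); the function name and the accept/reject counters are hypothetical inputs that would come from your own monitoring.

```python
def reject_severity(accepts, rejects, warn=0.10, critical=0.50):
    """Return 'ok', 'warning', or 'critical' for one monitoring window.

    Returns None when there was no traffic, since a rate is undefined.
    """
    total = accepts + rejects
    if total == 0:
        return None
    rate = rejects / total
    if rate > critical:
        return "critical"
    if rate > warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # 20 rejects out of 100 attempts in the 15-minute window
    print(reject_severity(accepts=80, rejects=20))
```

A window with no traffic at all is reported separately because it may indicate an outage rather than a healthy reject rate.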
Step 1: Identify the pattern
- Navigate to Reports > Authentication
- Filter by result: Reject
- Look for patterns:
| Pattern | Likely Cause |
|---|---|
| Same username, repeated failures | Wrong password, locked account |
| Many usernames, same NAS IP | AP misconfiguration (wrong shared secret) |
| Many usernames, all NAS IPs | Group policy change, expired certificates |
| Unknown usernames | Unauthorized access attempts |
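If the rejected entries are exported for offline analysis, a quick tally by username and NAS IP makes these patterns easier to spot. This is a rough sketch: the `(username, nas_ip)` tuples and the `known_usernames` set are assumed inputs, not a documented export format.

```python
from collections import Counter


def summarize_rejects(rejects, known_usernames, top=3):
    """Tally rejects by username and NAS IP, and list unrecognized usernames."""
    rejects = list(rejects)
    by_user = Counter(user for user, _ in rejects)
    by_nas = Counter(nas for _, nas in rejects)
    unknown = sorted({user for user, _ in rejects} - set(known_usernames))
    return {
        "top_users": by_user.most_common(top),
        "top_nas": by_nas.most_common(top),
        "unknown_usernames": unknown,
    }


if __name__ == "__main__":
    sample = [
        ("alice", "10.0.0.1"),
        ("alice", "10.0.0.1"),
        ("mallory", "10.0.0.2"),
    ]
    print(summarize_rejects(sample, known_usernames={"alice"}))
```

One username dominating `top_users` suggests a credential problem, one address dominating `top_nas` suggests an AP misconfiguration, and entries in `unknown_usernames` suggest unauthorized attempts.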
Step 2: Resolve based on pattern
Wrong password / locked account:
- Navigate to Users > find the affected user
- Verify the account is enabled
- Reset the password if needed
AP misconfiguration:
- Identify the NAS IP from the report
- Check the AP's RADIUS configuration (IP, port, shared secret)
- Correct the shared secret -- it must match the IronWiFi Network settings exactly
Group policy change:
- Review recent changes to Groups in the IronWiFi Console
- Check if any attributes were modified that could cause rejections (e.g., Login-Time restrictions)
- Revert the change or adjust the policy
Unauthorized access attempts:
- Review the failing usernames -- do they match known accounts?
- If not, this may be a brute-force attempt
- Monitor but do not take action unless it causes service degradation
Runbook: Captive Portal Not Loading
Trigger: Guests report the splash page does not appear when connecting to WiFi.
Severity: High
Step 1: Verify the issue
- Connect a test device to the guest SSID
- Open a browser and navigate to http://neverssl.com
- Observe whether the redirect to the captive portal occurs
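The same check can be scripted from a laptop joined to the guest SSID. The sketch below uses only the Python standard library and assumes the access point intercepts plain HTTP with a 3xx redirect; some portals instead return a 200 page containing a script or meta refresh, which this probe would report as "no redirect". The function names are illustrative.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the 3xx response is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError for the 3xx response


def interpret_probe(status, location):
    """Turn an HTTP status and Location header into a short verdict."""
    if status in (301, 302, 303, 307, 308) and location:
        return f"redirected to {location}"
    if status == 200:
        return "no redirect (page loaded directly)"
    return f"unexpected status {status}"


def probe(url="http://neverssl.com", timeout=5):
    """Request a plain-HTTP page and report whether it was redirected."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return interpret_probe(resp.status, resp.headers.get("Location"))
    except urllib.error.HTTPError as err:
        return interpret_probe(err.code, err.headers.get("Location"))
```

Call `probe()` while connected to the guest SSID and before authenticating; a redirect whose target is not your splash page URL points at the captive portal URL check in the next step.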
Step 2: Check common causes
| Check | How | Fix |
|---|---|---|
| Walled garden | Verify the required walled garden entries are configured on the AP | Add missing entries |
| Captive portal URL | Verify the splash page URL in the AP matches IronWiFi Console | Correct the URL |
| RADIUS configuration | Verify RADIUS settings on the AP match the IronWiFi Network | Correct settings |
| Captive portal status | Check the portal is enabled in the IronWiFi Console | Enable the portal |
| DNS | Verify DNS resolution works from the guest VLAN | Fix DNS configuration |
Step 3: Verify fix
- Reconnect the test device
- Confirm the splash page loads
- Complete authentication and verify internet access
Escalation Paths
When to escalate to IronWiFi support
Escalate to IronWiFi support when:
- status.ironwifi.com shows no incident but the service is clearly down
- An issue persists after completing the relevant runbook
- You suspect a platform-level bug
- You need configuration help beyond what documentation covers
How to contact support
| Channel | Response Time | Best For |
|---|---|---|
| Live chat on ironwifi.com | Minutes (24/7) | Urgent issues, quick questions |
| Email: support@ironwifi.com | Hours | Detailed issues, non-urgent requests |
| Status page: status.ironwifi.com | N/A | Platform-wide incident updates |
Information to provide
When contacting support, include:
- Account email or organization name
- Network name and region
- Timestamp of when the issue started
- Scope -- how many users/sites affected
- Steps already taken from the relevant runbook
- Screenshots of error messages or report data
- NAS IP of the affected access point (if applicable)
Maintenance Procedures
Runbook: Shared Secret Rotation
Rotate the RADIUS shared secret periodically or if you suspect it has been compromised.
Changing the shared secret requires updating every access point that uses the Network. Plan this during a maintenance window.
Procedure
- Schedule a maintenance window during low-usage hours
- Document current settings -- Note the current shared secret for rollback
- Generate the new secret in the IronWiFi Console:
- Navigate to Networks > select the Network
- Regenerate the shared secret
- Copy the new secret
- Update access points -- Change the shared secret on every AP or controller that uses this Network
- Test -- Authenticate a test user from each site
- Verify -- Check Reports > Authentication for successful authentications across all sites
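For deployments with many sites, the final verification step can be checked mechanically. The sketch below assumes authentication events are available as `(timestamp, site, result)` tuples; that shape and the function name are hypothetical, not an IronWiFi export format.

```python
from datetime import datetime


def sites_without_success(events, sites, rotated_at):
    """Return the sites with no Access-Accept since the secret was rotated."""
    confirmed = {
        site
        for timestamp, site, result in events
        if timestamp >= rotated_at and result == "Access-Accept"
    }
    return sorted(set(sites) - confirmed)


if __name__ == "__main__":
    rotated = datetime(2024, 1, 1, 2, 0)
    events = [
        (datetime(2024, 1, 1, 2, 5), "hq", "Access-Accept"),
        (datetime(2024, 1, 1, 2, 6), "branch", "Access-Reject"),
    ]
    print(sites_without_success(events, ["hq", "branch"], rotated))
```

Any site returned here either has not been tested yet or still has an access point using the old secret.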
Rollback
If authentication fails after the change:
- Revert the shared secret on the access points to the previous value
- Contact IronWiFi support to revert the secret in the Console if needed
Runbook: Adding a New Site
Use this runbook when deploying IronWiFi to a new physical location.
Procedure
- Determine the region -- Choose the IronWiFi region closest to the new site
- Decide: new Network or existing?
- Same region as an existing Network: reuse the existing Network's RADIUS settings
- Different region: create a new Network in the closer region
- If creating a new Network:
- Navigate to Networks > Create Network
- Select the appropriate region
- Note the RADIUS server details
- Configure access points at the new site:
- Enter the RADIUS server IPs, ports, and shared secret
- Configure both primary and backup servers
- Set up accounting
- Configure captive portal if needed (see Quick Start: Guest WiFi)
- Test:
- Authenticate a test user
- Verify VLAN assignment (if applicable)
- Verify captive portal (if applicable)
- Check accounting data appears in Reports
- Set up monitoring for the new site's RADIUS connectivity (see Monitoring and Alerting)
Runbook: Planned Maintenance on Access Points
Use this runbook when you need to reboot or reconfigure access points.
Procedure
- Notify users of the maintenance window (email, Slack, signage)
- Verify IronWiFi RADIUS settings are documented (in case AP resets to defaults)
- Perform maintenance (firmware update, reboot, configuration change)
- After maintenance:
- Verify the AP reconnects and reaches IronWiFi RADIUS servers
- Authenticate a test user
- Check Reports > Authentication for events from the maintained AP (NAS IP)
- Confirm accounting -- Verify sessions restart after the reboot (existing sessions will show terminate cause NAS-Reboot)
Bulk Operations
Runbook: Bulk User Import
Import a large number of users via the API.
Preparation
- Prepare a CSV file with columns: username, email, fullname, password, group
- Validate the data:
- No duplicate usernames
- Valid email formats
- Passwords meet complexity requirements
- Back up your current user list via the API (see Backup and Restore)
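The validation checks can be automated before any API call is made. The sketch below assumes the CSV has a header row with the columns `username`, `email`, `fullname`, `password`, and `group`. The email pattern is deliberately loose, and the password rule (at least 12 characters with a letter and a digit) is a placeholder assumption; substitute your own complexity policy.

```python
import csv
import io
import re

REQUIRED = ["username", "email", "fullname", "password", "group"]
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose sanity check only


def password_ok(password):
    """Placeholder complexity rule (an assumption, not an IronWiFi policy)."""
    return (
        len(password) >= 12
        and any(c.isalpha() for c in password)
        and any(c.isdigit() for c in password)
    )


def validate_users_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    errors, seen = [], set()
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        username = (row["username"] or "").strip()
        email = (row["email"] or "").strip()
        if username in seen:
            errors.append(f"line {line_no}: duplicate username {username!r}")
        seen.add(username)
        if not EMAIL_RE.match(email):
            errors.append(f"line {line_no}: invalid email {email!r}")
        if not password_ok(row["password"] or ""):
            errors.append(f"line {line_no}: password fails complexity check")
    return errors
```

An empty list means the file passed these checks; run the import only after every reported line has been corrected.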
Execution
- Use the batch import script from the Migration Guide
- Monitor the script output for errors
- Respect API rate limits (100 requests per minute)
- After the import, verify in the IronWiFi Console:
- Total user count matches expected number
- Spot-check several users' group assignments
- Test authentication for a sample of imported users
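If you write your own import loop instead of using the Migration Guide script, requests need to be spaced to stay under the rate limit. A minimal pacing sketch, assuming one API request per item; `paced` is a hypothetical helper, and the `sleep` and `clock` parameters exist only so the behavior can be tested without waiting.

```python
import time


def paced(items, per_minute=100, sleep=time.sleep, clock=time.monotonic):
    """Yield items no faster than `per_minute`, spacing them evenly."""
    interval = 60.0 / per_minute
    next_slot = clock()
    for item in items:
        now = clock()
        if now < next_slot:
            sleep(next_slot - now)
        # Schedule the next item one interval after this one was released.
        next_slot = max(now, next_slot) + interval
        yield item


if __name__ == "__main__":
    for user in paced(["alice", "bob", "carol"], per_minute=100):
        print("would send request for", user)
```

Even spacing (one request every 0.6 seconds at 100 per minute) avoids bursts that could trip the limit near a window boundary.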
Rollback
If the import introduces bad data:
- Stop the import script
- Identify problematic users via the Console or API
- Delete or correct them individually or via the API
Runbook: Bulk User Deactivation
Deactivate multiple user accounts (e.g., departing employees, end of event).
Procedure
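One way to script the deactivation is a loop that disables each account and records failures for follow-up. This is a sketch under stated assumptions: `disable_user` is a hypothetical callable that wraps whatever REST API request disables a single account in your environment, and no specific IronWiFi endpoint is implied.

```python
def deactivate_users(usernames, disable_user):
    """Disable each unique username; return (succeeded, failed).

    disable_user -- hypothetical callable taking a username and raising
                    an exception if the account could not be disabled
    """
    succeeded, failed = [], {}
    # dict.fromkeys removes duplicates while keeping the input order.
    unique = dict.fromkeys(name.strip() for name in usernames if name.strip())
    for name in unique:
        try:
            disable_user(name)
            succeeded.append(name)
        except Exception as exc:  # keep going so one bad row does not stop the run
            failed[name] = str(exc)
    return succeeded, failed
```

Keep the `failed` mapping with the change record so that any account that could not be disabled is handled manually.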
Verification
- Attempt to authenticate with a deactivated user -- should receive Access-Reject
- Check the IronWiFi Console -- deactivated users should show as disabled
Runbook: Bulk Password Reset
Force password resets for a group of users (e.g., after a security incident).
Procedure
- Export the list of affected users
- Generate new temporary passwords
- Update passwords via the API:
- Communicate new passwords to users through a secure channel
- Require users to change their password on next login (if your IdP supports this)
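Temporary passwords should come from a cryptographically secure source rather than a general-purpose random generator. The sketch below uses Python's `secrets` module; the 16-character length and the requirement for a lowercase letter, an uppercase letter, and a digit are assumptions to adapt to your own policy, and the function names are illustrative.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits


def temporary_password(length=16):
    """Generate a random password with a lowercase, an uppercase, and a digit."""
    if length < 8:
        raise ValueError("length must be at least 8")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
        ):
            return candidate


def assign_temporary_passwords(usernames, length=16):
    """Map each username to its own freshly generated temporary password."""
    return {name: temporary_password(length) for name in usernames}
```

Hold the resulting mapping in memory or in an encrypted store only, and discard it once the passwords have been applied and communicated.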
Operational Checklists
Daily
- Check status.ironwifi.com for any incidents
- Review monitoring dashboard for alerts
- Glance at authentication reports for unexpected patterns
Weekly
- Review authentication failure trends
- Check session counts for anomalies
- Verify scheduled backups completed successfully
- Review any pending user access requests
Monthly
- Review and clean up unused user accounts
- Audit Group policies for accuracy
- Update documentation for any configuration changes
- Review access point firmware for available updates
Quarterly
- Test RADIUS failover (primary to backup)
- Test backup restore procedure
- Review and update monitoring thresholds
- Rotate API keys used for integrations
- Review Backup and Restore procedures
Related Topics
- Troubleshooting -- Detailed troubleshooting for specific error types
- Monitoring and Alerting -- Proactive monitoring setup
- Backup and Restore -- Configuration backup procedures
- High Availability -- Redundancy and disaster recovery
- REST API -- API reference for scripted operations
- Migration Guide -- User import procedures
- Rate Limiting -- API rate limits for bulk operations