Failure Testing Cisco ISE Distributed Deployments

Published by iwiizkiid on 16/06/2021

In this article we will analyse the behaviour of Cisco ISE when configured as a distributed deployment. A series of failure scenarios is carried out to see how ISE functions when certain nodes are unavailable. These tests may be useful in situations where one DC has failed and/or disaster recovery testing is taking place.

The following devices will be configured and used to test against a number of failure scenarios:

x1 Primary Policy Administration Node (PPAN)
x1 Secondary Policy Administration Node (SPAN)
x1 Primary Monitoring Node (PMnT)
x1 Secondary Monitoring Node (SMnT)
x1 Policy Services Node (PSN)

To test our failure scenarios, the nodes are distributed across two fictitious data centers as shown in the topology below.

Note: Only one PSN is configured because the PANs and MnTs are the main focus of the following failure scenarios.

It’s important to point out the documented expected behaviour of ISE:

Expected Monitoring Node Behaviour

MnT nodes concurrently receive logging from PANs, PSNs, ASAs and NADs
When the primary MnT is available, the PAN retrieves logs from this MnT
When the primary MnT fails or becomes unreachable, logs will be sent to the remaining MnT node
A maximum of two MnT nodes can be used in a distributed deployment
The PAN will detect that the primary MnT has failed and will retrieve logs from the secondary MnT
The secondary MnT doesn’t need to be changed to the primary role; the PAN will automatically realise the primary MnT is down and decide to receive logs from the secondary MnT after 5-15 mins
The logs that are received on the secondary MnT will not be synchronised with the primary MnT when services are restored. If these logs are required, then a backup and restore of the secondary MnT to the primary MnT is required
PSNs only buffer logs when the MnTs are down if TCP secure syslog is used
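
The read-side behaviour in the list above can be summarised as a small decision function. This is a minimal illustrative model, not ISE code: the name `select_log_source` and its boolean inputs are assumptions made for the sketch, and the 5-15 minute detection delay is not modelled.

```python
from typing import Optional


def select_log_source(primary_mnt_up: bool, secondary_mnt_up: bool) -> Optional[str]:
    """Return which MnT the PAN is expected to query for logs.

    Hypothetical model of the documented behaviour: the primary MnT is
    preferred whenever it is reachable, the secondary is only a fallback,
    and nothing can be queried when both are down.
    """
    if primary_mnt_up:
        return "primary"
    if secondary_mnt_up:
        return "secondary"
    return None


if __name__ == "__main__":
    # Walk through every combination of MnT availability.
    for p, s in [(True, True), (False, True), (True, False), (False, False)]:
        print(f"primary_up={p} secondary_up={s} -> {select_log_source(p, s)}")
```

The node roles never change in this model, which mirrors the expectation that the secondary MnT does not need to be promoted for logs to remain available.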

Expected Policy Admin Node Behaviour

When the primary PAN becomes unreachable, failover to the secondary PAN doesn’t occur unless:
- Automatic failover is configured and the primary PAN goes down
- The role is manually changed on the secondary PAN
The secondary PAN will not become the primary PAN, even when automatic failover is configured, if there is no connectivity between the two DCs
Promotion of the secondary PAN to primary will take roughly 15 – 30 minutes
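
The promotion rules above can be expressed as a single predicate. This is a simplified sketch under stated assumptions: the function and parameter names are hypothetical, and it only encodes the conditions listed in this article rather than ISE's actual failover logic.

```python
def secondary_pan_promotes(primary_pan_down: bool,
                           auto_failover_configured: bool,
                           inter_dc_link_up: bool,
                           manual_promotion: bool = False) -> bool:
    """Return True if the secondary PAN is expected to take the primary role.

    Hypothetical model of the rules listed in the article:
    - a manual role change on the secondary PAN always promotes it;
    - automatic failover needs the primary PAN to be down, automatic
      failover to be configured, and connectivity between the two DCs.
    """
    if manual_promotion:
        return True
    return primary_pan_down and auto_failover_configured and inter_dc_link_up


if __name__ == "__main__":
    # Primary down, automatic failover configured, but the DCs are isolated.
    print(secondary_pan_promotes(True, True, inter_dc_link_up=False))
    # Same outage, resolved by manually promoting the secondary PAN.
    print(secondary_pan_promotes(True, True, False, manual_promotion=True))
```

Keeping the manual path separate makes it clear why Failure Scenario 2 relies on a manual promotion: it is the only branch that does not depend on connectivity between the DCs.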

Failure Scenario 1: Shut down access to the primary monitoring node and analyse whether authentication logs are still received

Observations:

The secondary MnT remains secondary
Logs are still received and shown in the ISE logs after 16 minutes
The primary MnT node status is ‘disconnected’
When the primary MnT is brought back online, logs are retrieved from it again, so the logs collected by the secondary MnT during the outage are no longer visible

Failure Scenario 2: Stop access to the primary PAN, manually promote the secondary PAN and analyse the behaviour before resuming access to the original primary PAN

Observations:

The promotion takes approximately 30 – 40 mins to complete
Once the secondary PAN application server has started and the GUI is accessed, the deployment shows that the secondary PAN is now primary and that the original primary PAN is now secondary. However, because the original primary PAN is unreachable, it continues to act as a primary PAN as well (split-brain)
Authentication logs are still visible; however, these are the logs from the primary MnT. The logs received by the secondary MnT during failure scenario 1 are not visible because the query is made against the primary MnT
When the original primary PAN is reachable again, the following message is presented when trying to access its GUI: ‘Server is undergoing administrative maintenance. Please try again later.’ On the CLI of the original primary PAN, the processes appear to be restarting, and access to the GUI is lost. The original primary PAN is forced into the secondary role, as configured on ISE when the original secondary PAN was promoted to primary. The now primary PAN shows the original primary PAN as ‘disconnected’
When the application server has started on the original primary PAN, the node status for this device shows ‘Replication Stopped’. When logging into the now secondary PAN, options are limited; this is expected of a secondary PAN. To bring the now secondary PAN back into sync, a manual sync is required. Once the sync is initiated and the processes have restarted, the now secondary PAN should synchronise with the now primary PAN and the node status should change to a green ‘Connected’ state. The now secondary PAN can be changed back to be the primary PAN if desired
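
The sequence observed in this scenario can be walked through with a small state model. This is only an illustrative sketch: the `PanNode` class and the function names are hypothetical, and the status strings are the ones observed above rather than values read from ISE.

```python
from dataclasses import dataclass


@dataclass
class PanNode:
    name: str
    acting_role: str            # role the node itself is acting in
    status: str = "Connected"   # status as shown by the current primary PAN


def promote_while_isolated(old_primary: PanNode, old_secondary: PanNode) -> None:
    """Manually promote the secondary PAN while the primary is unreachable."""
    old_secondary.acting_role = "primary"
    # The isolated node keeps acting as primary; it is only seen as disconnected.
    old_primary.status = "Disconnected"


def is_split_brain(a: PanNode, b: PanNode) -> bool:
    """Both nodes acting as primary at the same time."""
    return a.acting_role == "primary" and b.acting_role == "primary"


def restore_connectivity(old_primary: PanNode) -> None:
    """On reconnection the original primary is forced into the secondary role."""
    old_primary.acting_role = "secondary"
    old_primary.status = "Replication Stopped"


def manual_sync(node: PanNode) -> None:
    """A manual sync brings a 'Replication Stopped' node back to 'Connected'."""
    if node.status == "Replication Stopped":
        node.status = "Connected"


if __name__ == "__main__":
    dc1 = PanNode("original-primary", "primary")
    dc2 = PanNode("original-secondary", "secondary")
    promote_while_isolated(dc1, dc2)
    print("split-brain:", is_split_brain(dc1, dc2), "| status:", dc1.status)
    restore_connectivity(dc1)
    print("split-brain:", is_split_brain(dc1, dc2), "| status:", dc1.status)
    manual_sync(dc1)
    print("role:", dc1.acting_role, "| status:", dc1.status)
```

The model separates the role a node acts in from the status the new primary reports for it, which is the distinction that makes the split-brain window visible.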

Failure Scenario 3: Stop access to both MnTs and analyse the behaviour of authentication logs that were generated while the MnTs were offline

Observations:

No logs are displayed
When the MnTs are back online, new logs are received; however, logs that should have been generated while the MnTs were down are not received
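
The difference between losing and keeping logs during an MnT outage can be illustrated with a short simulation. This is a sketch under assumptions: `deliver_logs` and the sample event names are hypothetical, the `buffered` flag stands in for the TCP secure syslog case mentioned in the expected behaviour, and the article does not state which transport was configured during this test.

```python
from typing import List, Tuple


def deliver_logs(events: List[Tuple[str, bool]], buffered: bool) -> List[str]:
    """Return the log messages that eventually reach an MnT.

    events   -- (message, mnt_available) pairs in the order they occur
    buffered -- True models a sender that queues logs while no MnT is
                reachable; False models logs being dropped during the outage
    """
    received: List[str] = []
    backlog: List[str] = []
    for message, mnt_available in events:
        if mnt_available:
            # Flush anything queued during the outage, then the new message.
            received.extend(backlog)
            backlog.clear()
            received.append(message)
        elif buffered:
            backlog.append(message)
        # Without buffering the message is simply lost.
    return received


if __name__ == "__main__":
    events = [("auth-1", True), ("auth-2", False), ("auth-3", False), ("auth-4", True)]
    print(deliver_logs(events, buffered=False))  # ['auth-1', 'auth-4']
    print(deliver_logs(events, buffered=True))   # ['auth-1', 'auth-2', 'auth-3', 'auth-4']
```

The unbuffered run matches the observation above: logs from the outage window never appear, while logs generated after the MnTs return are received as normal.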

Closing Comments

If you have any failure scenarios that you would like me to test, please drop a comment and I will do my best to test them and feed the results back into this article. I will continue to update this article with more failure scenarios as I test them.

Useful Links

https://www.cisco.com/c/en/us/support/docs/security/identity-services-engine/214136-recovering-disabled-and-out-of-sync-node.html

https://www.cisco.com/en/US/docs/security/ise/1.0/user_guide/ise10_dis_deploy.html

https://www.cisco.com/c/en/us/td/docs/security/ise/2-4/install_guide/b_ise_InstallationGuide24/b_ise_InstallationGuide24_chapter_00.html

https://www.cisco.com/c/en/us/td/docs/security/ise/2-0/admin_guide/b_ise_admin_guide_20/b_ise_admin_guide_20_chapter_010.html


2 responses to “Failure Testing Cisco ISE Distributed Deployments”

Marc Binns

19/04/2024

Only just seeing this but really good post on testing scenarios carried out for ISE! Conducting similar tests to this and was trying to find out observed behaviour when split brain occurred. This covers that nicely. Thank you!

1. iwiizkiid
  
  19/04/2024
  
  No problem Marc, thanks for your feedback.
  

Network Wizkid