What causes this behavior?
Feb 1 12:06:01.731: %SEC_LOGIN-4-LOGIN_FAILED: Login failed [user: account] [Source: x.x.x.x] [localport: 22] [Reason: Login Authentication Failed] at 12:06:01 XXX Thu Feb 1 2024
Feb 1 12:06:02.796: %SEC_LOGIN-5-LOGIN_SUCCESS: Login Success [user: account] [Source: x.x.x.x] [localport: 22] at 12:06:02 XXX Thu Feb 1 2024
As you can see, there is a failed auth followed by a success about one second later. Our ISE guy told me the first attempt comes through with a blank user/pass field and the second is good. It's happening with other service accounts and user admin accounts as well.
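To see how widespread the pattern is, collected syslog can be scanned for a LOGIN_FAILED followed shortly by a LOGIN_SUCCESS for the same user and source. This is a minimal Python sketch. It assumes the exact IOS %SEC_LOGIN message format quoted above. Other platforms such as NX-OS word these messages differently and would need their own pattern. The 5-second window and the year are assumptions: the IOS timestamp prefix omits the year, so 2024 is taken from the trailing date in the sample. `parse` and `fail_then_success` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta

# Assumed format: the IOS %SEC_LOGIN messages from the original post.
LOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"%SEC_LOGIN-\d-(?P<event>LOGIN_FAILED|LOGIN_SUCCESS): .*?"
    r"\[user: (?P<user>[^\]]+)\] \[Source: (?P<src>[^\]]+)\]"
)

def parse(line, year=2024):
    """Return (timestamp, event, user, source), or None if the line doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = " ".join(m["ts"].split())  # collapse padded days like "Feb  1"
    when = datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S.%f")
    return when, m["event"], m["user"], m["src"]

def fail_then_success(lines, window=timedelta(seconds=5)):
    """Find failures followed by a success for the same user/source within `window`."""
    pending = {}  # (user, source) -> time of the most recent failure
    pairs = []
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        when, event, user, src = rec
        key = (user, src)
        if event == "LOGIN_FAILED":
            pending[key] = when
        elif key in pending:
            gap = when - pending.pop(key)
            if gap <= window:
                pairs.append((user, src, gap.total_seconds()))
    return pairs

sample = [
    "Feb 1 12:06:01.731: %SEC_LOGIN-4-LOGIN_FAILED: Login failed [user: account] [Source: x.x.x.x] [localport: 22] [Reason: Login Authentication Failed] at 12:06:01 XXX Thu Feb 1 2024",
    "Feb 1 12:06:02.796: %SEC_LOGIN-5-LOGIN_SUCCESS: Login Success [user: account] [Source: x.x.x.x] [localport: 22] at 12:06:02 XXX Thu Feb 1 2024",
]
print(fail_then_success(sample))  # [('account', 'x.x.x.x', 1.065)]
```

A success with no preceding failure is ignored, so only the failed-then-succeeded pairs show up.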
Is this happening on all gear or just those with a particular model or IOS level?
Good point. The log settings on the infrastructure are not uniform, but I can say for sure it's affecting campus switches and external routers. Pretty sure it's happening on NX-OS too, but I didn't get to debug on that platform.
Debug on the campus switch wasn't overly helpful. I did not see an auth deny or accept message for the failure.
Next step is a pcap. I want to see how far the conversation actually goes for the failed attempt. My go-to network guy was out today but he will be in tomorrow.
Being able to decrypt with wireshark is pretty useful. So is the native PCAP feature on network devices.
I don't see two authentication conversations, just one successful login attempt.
1. NAC SSH -> NAS
2. NAS SSH -> ISE
3. NAS -> ISE Q: Authentication request (username visible)
4. ISE -> NAS A: Give me PW
5. NAS -> ISE Q: PW populates username field (pw visible)
6. ISE -> NAS A: Pass. The other AA parts are fine too.
The only thing I can't see is the client/server conversations inside the SSH tunnel that happen between some of the steps. Is there a way to extract the session key from the DH exchange?
I don't feel like I have enough evidence to point the finger at the NAC or ISE yet. I ruled out the switches because it's happening on the three different platforms.
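On the key question: in a Diffie-Hellman exchange each side keeps a private exponent and only sends the derived public value. The shared secret therefore can't be computed from what's on the wire. The toy Python sketch below uses deliberately tiny numbers. Real SSH uses far larger groups or elliptic curves and derives the session keys from the secret with a hash, a step left out here. It shows both ends reaching the same value while an observer only sees P, G and the two public values. In practice, decrypting a capture means getting the session keys from one of the endpoints, not from the pcap.

```python
# Toy Diffie-Hellman with tiny public parameters, for illustration only.
P = 23  # public prime modulus
G = 5   # public generator

def public_value(private_exp):
    """What each side actually sends over the wire."""
    return pow(G, private_exp, P)

def shared_secret(peer_public, private_exp):
    """Requires this side's private exponent, which is never transmitted."""
    return pow(peer_public, private_exp, P)

client_priv, server_priv = 4, 3          # kept secret by each endpoint
client_pub = public_value(client_priv)   # visible in a capture
server_pub = public_value(server_priv)   # visible in a capture

client_secret = shared_secret(server_pub, client_priv)
server_secret = shared_secret(client_pub, server_priv)
print(client_pub, server_pub, client_secret, server_secret)  # 4 10 18 18
```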
You may have hit a bug:
https://quickview.cloudapps.cisco.com/quickview/bug/CSCsd58148
Is it configured with login on-failure log?
How long is the delay between the last two packets? Is ISE taking too long to return the valid authentication, so the devices are timing out? Are the devices configured for multiple tac_plus servers, and are those timing out before the device tries the one you are looking at? Also, re-reading your original post, what are the AAA configs on the devices? Is that initial failure coming from the local database before it tries tac_plus?
I am not sure how to decrypt an SSH session using Wireshark. You could try enabling telnet to bypass the whole issue of decrypting it, but I doubt it would show you much.
Thanks,
-Otanx
Five-year-old Forescout bug that was reintroduced in the OS version we are running.
The appliance tries to use a public key to log in first, fails, and then uses the correct key for the successful attempt.
Add to the SSH parameters in the switch object: -o PubkeyAuthentication=no
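For anyone reproducing this by hand from a Linux host: `-o PubkeyAuthentication=no` is a standard OpenSSH client option. It stops the client from offering public keys, so it goes straight to the remaining methods such as password. The fix assumes Forescout passes this field through to an OpenSSH-style client, which isn't verified here. The small Python sketch below builds the equivalent command line for a manual test. `ssh_command` is a hypothetical helper and 192.0.2.10 is a documentation placeholder address.

```python
import shlex

def ssh_command(host, user, port=22, extra_opts=None):
    """Build an OpenSSH client argv with public-key authentication disabled."""
    # Skip public-key attempts so the client moves on to the other auth methods.
    opts = {"PubkeyAuthentication": "no"}
    if extra_opts:
        opts.update(extra_opts)
    argv = ["ssh", "-p", str(port)]
    for key, value in opts.items():
        argv += ["-o", f"{key}={value}"]
    argv.append(f"{user}@{host}")
    return argv

# 192.0.2.10 stands in for a real switch address.
print(shlex.join(ssh_command("192.0.2.10", "account")))
# prints: ssh -p 22 -o PubkeyAuthentication=no account@192.0.2.10
```

Logging in with and without the option and comparing the SEC_LOGIN entries should show whether the extra failure goes away.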
Ah-ha! >:D
There are likely going to be a number of other things that got hammered out of CentOS but are still in RHEL.