Strange 802.1x issue

Started by config t, November 03, 2021, 08:35:09 PM

config t

The supplicant authenticates dot1x (EAP-TLS) with the computer cert on the first try. However, when it re-auths for any reason, such as a NIC disconnect or the re-auth timer, it fails. The strange part is that during the failures the switch logs say the client is not responding on each attempt, and we do not see any dot1x attempts in the RADIUS logs, only MAB. That part makes sense, because once dot1x fails we have the port configured to try MAB.

We have ruled out switch misconfiguration or a misconfiguration on the RADIUS server (using Forescout, the best RADIUS server).

It's almost like it's caching creds and ignoring the dot1x portion of the reconnect, because when we reboot the machines they authenticate dot1x perfectly fine... until they have to re-auth.

Just wanted to ping you guys to see if you have run into this. We are thinking at this point it's a Windows issue but haven't sussed out exactly what it is.

Tested with a Windows 10 laptop and a workstation on two different 3850s, one running Fuji, the other running Gibraltar.
:matrix:

Please don't mistake my experience for intelligence.

deanwebb

What's the congestion like on the router between the switch and the RADIUS server? Wondering if it's dropping fragments, which KILLS the supplicant offering up the cert. But... I don't think that's going to be it since the first auth automatically works.
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

config t

It's a pretty small network, I think maybe 100 or so endpoints, so I am doubting congestion, and all the traffic is within the LAN. We don't even see any RADIUS logs on the server because the supplicant doesn't seem to be responding to the switch for re-auth. It's as if something in between the supplicant and the switch is saying noop and it's just not responding. I'm pretty sure it's a Windows issue at this point... could also be a timer setting in the GPO, but I'm doubtful about that.

deanwebb

Is this a native Windows supplicant? Does this happen for non-Windows systems?

Otanx

I have not seen that issue in our environment. Wireshark on the supplicant to see if it tries to respond. If it says it is responding then SPAN and check the switch port. How fast is your re-auth timer? A quick Google says at least on Server 2012 there is a 30 second wait after initiating before it will do another session. I doubt your timers are that low, but maybe it is different on Windows 10, or someone changed the setting.

Link for 2012 - https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/hh994696(v=ws.11)
Look at the "Auth Period"

-Otanx

config t

Quote from: deanwebb on November 04, 2021, 08:17:47 AM
Is this a native Windows supplicant? Does this happen for non-Windows systems?

Native Windows supplicant. Have not tested any non-Windows. At the moment the scope is only Windows workstations/laptops.

Quote from: Otanx on November 04, 2021, 09:56:32 AM
How fast is your re-auth timer?


Is that re-auth timer like a wait period before it will allow another auth attempt? That would make sense if it's set super high or something accidentally via GPO. I can see why it would derp in that case.

config t

Turned out to be a McAfee issue. Wireshark showed the initial EAPOL going out but nothing coming back, so the EAPOL and EAP traffic was being dropped. The initial auth after a reboot has to have worked because ENS (McAfee Endpoint Security) wasn't in play until after the dot1x session had already completed.
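
For anyone retracing this, here is a minimal Python sketch of how EAPOL frames can be picked out of raw capture bytes to see which side went quiet. It is not the tooling used in this thread. It assumes untagged Ethernet frames (no 802.1Q header), and the function name `describe_eapol` and the sample MAC addresses are made up for illustration.

```python
import struct

EAPOL_ETHERTYPE = 0x888E  # EtherType for 802.1X (EAP over LAN)
EAPOL_TYPES = {0: "EAP-Packet", 1: "EAPOL-Start", 2: "EAPOL-Logoff", 3: "EAPOL-Key"}
EAP_CODES = {1: "Request", 2: "Response", 3: "Success", 4: "Failure"}
EAP_TYPES = {1: "Identity", 13: "EAP-TLS"}


def describe_eapol(frame: bytes):
    """Describe an untagged Ethernet frame if it carries EAPOL, else return None."""
    if len(frame) < 18:
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != EAPOL_ETHERTYPE:
        return None
    # EAPOL header: version (1 byte), packet type (1 byte), body length (2 bytes)
    _version, pkt_type, body_len = struct.unpack("!BBH", frame[14:18])
    name = EAPOL_TYPES.get(pkt_type, f"type {pkt_type}")
    if pkt_type != 0 or body_len < 4 or len(frame) < 22:
        return name
    # EAP header: code, identifier, length
    code, ident, eap_len = struct.unpack("!BBH", frame[18:22])
    desc = f"{name}: EAP {EAP_CODES.get(code, code)} id={ident}"
    # Only Request and Response packets carry an EAP method type byte
    if code in (1, 2) and eap_len >= 5 and len(frame) >= 23:
        desc += f" ({EAP_TYPES.get(frame[22], f'type {frame[22]}')})"
    return desc


# Hypothetical sample frames: supplicant -> PAE group address, and switch -> supplicant
pae = bytes.fromhex("0180c2000003")
host = bytes.fromhex("001122334455")
switch = bytes.fromhex("66778899aabb")
start = pae + host + bytes.fromhex("888e" "01010000")
request = host + switch + bytes.fromhex("888e" "01000005" "0107000501")

print(describe_eapol(start))    # EAPOL-Start
print(describe_eapol(request))  # EAP-Packet: EAP Request id=7 (Identity)
```

In a capture like the one described above, frames in one direction with no matching frames in the other are the hint that something on the host is eating the traffic. In Wireshark itself the `eapol` display filter shows the same frames.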

deanwebb

^ Client issue, I knew it. 99% of all dot1x problems are client problems. :smug:

config t

My past couple deployments confirm that statement haha. Still, another lesson learned in my back pocket  :smug:

Otanx

In our environment it is always the NPS servers we use for RADIUS, but 802.1x is a network thing. The network team is responsible for it.

-Otanx

config t

Quote from: Otanx on November 05, 2021, 11:51:32 AM
In our environment it is always the NPS servers we use for RADIUS, but 802.1x is a network thing. The network team is responsible for it.

-Otanx


In the past that is how I always saw it done too: NPS with LDAP integration. Systems would sling those tickets at us even when it was a systems issue, without doing any troubleshooting aside from checking the security group. IMO networking doesn't have a lot to do after initial config: make sure the switch stays properly configured and traffic/ports are getting where they need to be... maybe assist troubleshooting by bouncing a port and watching auth sessions and logs.

I saw today during dynamic VLAN assignment testing using CoA that while we can indeed push the VLAN change, the host will not grab a new IP unless the port is bounced first. Messy. What if there is a PoE device on the other side? Does the end user need to wait until the phone reboots before they can re-auth and get back on the network? Admittedly, I don't fully understand the CoA process yet.
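
As a rough picture of what CoA looks like on the wire, here is a minimal Python sketch of a bare RFC 5176 CoA-Request as the RADIUS server would send it to the switch (UDP 3799 by default). It only builds the packet header, the Request Authenticator, and one standard attribute identifying the session by MAC. The vendor-specific attributes that carry the actual action (VLAN change, re-auth, port bounce) are left out, and the shared secret and MAC below are made-up values.

```python
import hashlib
import struct

COA_REQUEST = 43         # RFC 5176 packet code for CoA-Request
CALLING_STATION_ID = 31  # standard RADIUS attribute, usually the endpoint MAC


def attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one RADIUS attribute as type, length, value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_coa_request(identifier: int, secret: bytes, attrs: bytes) -> bytes:
    """Build a CoA-Request with its Request Authenticator filled in."""
    length = 20 + len(attrs)  # 20-byte RADIUS header plus attributes
    header = struct.pack("!BBH", COA_REQUEST, identifier, length)
    # Authenticator = MD5(code + id + length + 16 zero bytes + attributes + secret)
    authenticator = hashlib.md5(header + bytes(16) + attrs + secret).digest()
    return header + authenticator + attrs


# Hypothetical session identified by the endpoint MAC address
attrs = attribute(CALLING_STATION_ID, b"00-11-22-33-44-55")
packet = build_coa_request(identifier=1, secret=b"example-secret", attrs=attrs)
print(len(packet), packet[0])
```

The switch answers with a CoA-ACK (code 44) or CoA-NAK (code 45). Note that nothing in this exchange tells the endpoint itself that its VLAN changed, which fits the behavior above where the host keeps its old IP until the link drops.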

deanwebb

Basically, there is a lot of convenience on wired networks that goes into the rubbish bin when you turn on security. PoE is one of those convenient things. If you want to be more secure, get a power strip and auxiliary power supply for that phone. No more PoE.

config t

I see your point. Still, it would be nice to be able to generate a push that terminates the connection and triggers a new DHCP request without losing PoE.

deanwebb

Quote from: config t on November 06, 2021, 11:57:19 PM
I see your point. Still, it would be nice to be able to generate a push that terminates the connection and triggers a new DHCP request without losing PoE.

That would take a bit of an RFC revision on the standard.

icecream-guy

That whole issue of dropping PoE to the phone becomes a liability. Think if someone needed to dial 911 (or your local emergency number) while the PoE phone is rebooting... that needs to be accounted for in the overall plan.
:professorcat:

My Moral Fibers have been cut.