Times like these, I wish it *was* the firewall...

Started by deanwebb, March 17, 2017, 06:11:39 PM


deanwebb

Now, about my saga and why this image will now be used in all my network diagrams to represent the IPS: :problem?:

We worked with the perimeter DHCP guys to change the DNS server in the DMZ to 8.8.8.8, and that didn't help things. We went back to the IPS because it was the big change from right before the wireless went south. Even though we had ruled it out before, we had to look at it again, because that's what you do when you have no fresh ideas: you start opening things up massively.

We put it into layer 2 fallback, no luck. Thanks, IPS, I think that proves even more it's not you.

I get captures of the bad traffic and a capture from where things are going well, and I see about 50% of the packets going to and coming from the authentication servers marked as "Bad TCP" where it's broken and 0% marked "Bad TCP" where it's working fine.

:wha?:
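For anyone wanting to put a number on that kind of comparison without eyeballing the capture, here is a minimal Python sketch. It only counts one ingredient of what analyzers tend to flag as bad TCP: data segments that repeat a sequence number already seen on the same flow. The `Segment` record, the addresses, and the sample data are all made up for illustration, and a real capture would need a parser in front of this.

```python
from collections import namedtuple

# Hypothetical per-packet record; a real capture would be parsed into these.
Segment = namedtuple("Segment", "src dst sport dport seq length")

def retransmit_ratio(segments):
    """Fraction of segments that repeat an already-seen (flow, seq) with payload."""
    seen = set()
    repeats = 0
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data, so they cannot be data retransmits
        key = (seg.src, seg.dst, seg.sport, seg.dport, seg.seq)
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(segments) if segments else 0.0

# Made-up traffic: four clean segments vs. two segments each sent twice.
good = [Segment("10.0.0.5", "10.0.1.9", 40000, 443, s, 100) for s in (1, 101, 201, 301)]
bad = good[:2] + good[:2]

print(f"good: {retransmit_ratio(good):.0%}, bad: {retransmit_ratio(bad):.0%}")
```

Running the same function over the broken path and the working path gives two ratios you can put side by side, which is the whole argument in one line.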

I show that to my boss, who then puts the IPS into bypass mode, not layer 2 fallback. The problem goes away.

Here's what we missed:

1. Legacy guest traffic crossed the IPS once: at the perimeter.
2. The new guest wireless network traffic crossed the IPS every time it went back and forth between it and the authentication server.
3. The new IPS actually *worked* for the amount of traffic we had crossing it.

That last one was the kicker. The old IPS was overwhelmed by the traffic, so we only had occasional glitches in registering. The old IPS simply failed to touch the traffic, most of the time. The new one could actually look at ALL the traffic, so the filter that interfered with registration traffic actually started doing its job... about 90% of the time... we could still use a larger IPS, I guess.

But with the new IPS actually working on traffic as it criss-crossed that path, it was ripping apart our authentication traffic. Since the auth server also provided DNS services, other traffic was slowed down as the DNS resolution traffic was ripped apart by the IPS.
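A rough way to see why the number of crossings matters so much: if each trip through the IPS has some chance of mangling a packet, those chances compound. The sketch below assumes every crossing fails independently with the same probability, which is a simplification, and the 10% figure is a made-up illustration, not something measured on this network.

```python
def exchange_survival(per_crossing_failure, crossings):
    """Chance an exchange gets through untouched, assuming independent crossings."""
    return (1.0 - per_crossing_failure) ** crossings

# Hypothetical numbers: one perimeter crossing vs. eight back-and-forth crossings.
print(f"1 crossing:  {exchange_survival(0.1, 1):.0%}")
print(f"8 crossings: {exchange_survival(0.1, 8):.0%}")
```

With the same per-crossing odds, the single-crossing path mostly works while the multi-crossing path fails more often than it succeeds.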

We put an exemption in to ignore traffic to and from the auth server, but the IPS still went mad dog on it. Eventually, we got the filter tuned and other stuff set up so that it wouldn't attack registration and DNS traffic.

Lesson learned: that IPS is like a horror movie villain. You can't kill it through normal means, and if you give it more power, your nightmares will multiply.
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

Otanx

Oh, IPS issues. We had a vendor claim 0% packet loss on a 10G pipe. What they didn't say, and what we found in testing, is that 0% packet loss != 0% not-inspected. The IPS could only inspect about 4 Gb/s of traffic. If traffic sat in the buffer for more than X ms, the IPS would forward it without inspecting it. Look at their dashboard, and it shows 0% packet loss. There is no display for not-inspected packets.
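Here is a toy Python model of that behavior, in case it helps explain it to someone. It is not the vendor's algorithm, just a sketch under assumptions: time moves in fixed ticks, the inspector handles a fixed number of packets per tick, and anything that has waited longer than `max_wait_ticks` is forwarded uninspected. All names and numbers are invented for illustration.

```python
from collections import deque

def simulate_fail_open(arrivals_per_tick, inspect_per_tick, max_wait_ticks, ticks):
    """Toy fail-open inspector: stale packets are forwarded, never dropped."""
    queue = deque()  # holds the arrival tick of each waiting packet
    inspected = bypassed = 0
    for now in range(ticks):
        queue.extend([now] * arrivals_per_tick)
        # Forward, uninspected, anything that has waited too long.
        while queue and now - queue[0] > max_wait_ticks:
            queue.popleft()
            bypassed += 1
        # Inspect as many of the remaining packets as capacity allows.
        for _ in range(min(inspect_per_tick, len(queue))):
            queue.popleft()
            inspected += 1
    offered = arrivals_per_tick * ticks
    dropped = offered - inspected - bypassed - len(queue)
    return {"offered": offered, "inspected": inspected,
            "bypassed": bypassed, "still_queued": len(queue), "dropped": dropped}

# Offered load of 10 per tick against an inspection capacity of 4 per tick.
result = simulate_fail_open(arrivals_per_tick=10, inspect_per_tick=4,
                            max_wait_ticks=2, ticks=1000)
print(result)
```

The `dropped` counter stays at zero no matter how overloaded the box is, while `bypassed` keeps climbing. A dashboard that only shows drops would look perfect here.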

-Otanx

deanwebb

What I also learned from the captures is that there are a LOT of retransmits when those packets are sitting in a buffer, thereby adding to the traffic already not being inspected.