Networking-Forums.com

Professional Discussions => Routing and Switching => Topic started by: deanwebb on July 06, 2017, 11:37:33 AM

Title: Please Explain BGP to Me...
Post by: deanwebb on July 06, 2017, 11:37:33 AM
Situation: Site X starts to experience connectivity issues at 11:00am. No Outlook, no Office 365, no browsing of Internet or Intranet... but they still have Skype, can browse to web pages of WLCs (both by IP and by FQDN), SSH works fine, and they can both ping Intranet web servers and telnet to port 80 on them.

No QoS, Riverbed, IPS, or Firewall in the path. When I set users at Site X to use a different proxy (one that is slated for decommissioning), they can browse everywhere. But, because we need a registry hack to set the proxy in IE, Outlook and Office 365 still fail - along with web-based applications that require IE. I went to bed last night thinking it was a proxy issue, and good luck with that to the proxy team.

I wake up to find that the resolution was in doing stuff with the WAN link... an hour or so before Site X had its issues, the local telcos were doing work on the WAN link and had brought up a secondary line. When Site X turned off the primary link and brought up the secondary, everything worked as it was supposed to work.

I'm confused.

Why would we be able to make direct, non-browser connections to web servers if there was something jacked up in the WAN link? I get that, maybe, the route to the proxy was messed up, but we were able to ping it and telnet to its open ports. Once we used the proxy that was getting decommed, we could even download the proxy script from the main corporate proxy. That says to me, "full L2 and L3 connectivity, all is well".

Why is it that the WAN link was the problem? One of the engineers said something about BGP, but I don't know enough to understand him, or to determine if he was wrong.
Title: Re: Please Explain BGP to Me...
Post by: burnyd on July 06, 2017, 11:47:55 AM
That could really be anything. Like for some reason when routing certain subnets over one WAN link, traffic would die due to a null route. Or maybe asymmetrical traffic going through one firewall and out another. It's probably the firewall in the DC 100%
Title: Re: Please Explain BGP to Me...
Post by: deanwebb on July 06, 2017, 12:19:37 PM
Can't be the firewall because changing the proxy fixed it and changing the WAN circuit fixed it even better.

But even if it was asymmetrical traffic, why would a telnet to the proxy on port 80 work, but a browser connection to the same proxy on the same port fail? It's not like we had an app-sensitive QoS policy being beta tested in Site X... I thought that either routes work, or they don't, or they flap. This thing was so app-specific, it made me think some disgruntled admin had installed a Palo Alto inline without anyone knowing. How could that be a routing issue?
Title: Re: Please Explain BGP to Me...
Post by: SimonV on July 06, 2017, 01:22:15 PM
Quote: Or maybe asymmetrical traffic going through one firewall and out another. It's probably the firewall in the DC 100%

That was my first thought too...

Quote from: deanwebb on July 06, 2017, 12:19:37 PM
But even if it was asymmetrical traffic, why would a telnet to the proxy on port 80 work, but a browser connection to the same proxy on the same port fail? It's not like we had an app-sensitive QoS policy being beta tested in Site X... I thought that either routes work, or they don't, or they flap. This thing was so app-specific, it made me think some disgruntled admin had installed a Palo Alto inline without anyone knowing. How could that be a routing issue?

A telnet only verifies the 3-way handshake, really. Palo Alto for example lets the 3-way handshake through before it can move to application ID. Not saying that's what happened, just to explain that Telnet is not always the best test. It could also be some sort of traffic shaping on the provider (or upstream) network, where they classified HTTP as traffic eligible to drop.
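
To make that distinction concrete, here is a minimal Python sketch of a test that goes one step past what telnet shows: it completes the handshake and then checks whether any application data actually comes back. The function name `probe`, the default `HEAD /` request, and the timeout are all assumptions for illustration, not anything taken from the Site X troubleshooting. It only speaks plain HTTP, so an HTTPS or proxy-specific test would need a different request.

```python
import socket


def probe(host, port, timeout=3.0, request=None):
    """Return (handshake_ok, got_response) for a TCP endpoint.

    handshake_ok: the 3-way handshake completed (what a telnet test shows).
    got_response: the peer sent at least one byte back after our request.
    """
    if request is None:
        # Hypothetical default: a bare HTTP HEAD request.
        request = (
            f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        ).encode("ascii")

    try:
        # create_connection returns only after the handshake has completed.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or timed out during the handshake.
        return (False, False)

    with sock:
        try:
            sock.sendall(request)
            data = sock.recv(1024)
        except OSError:
            # Handshake worked, but the data exchange timed out or was reset.
            return (True, False)
        return (True, len(data) > 0)
```

A result of `(True, False)` is the pattern being described here: the port "answers" a telnet, yet the payload never makes it back. It does not say where in the path the data is being lost, so a packet capture is still the better evidence.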

Have you verified with wireshark what happened in the browser? Would be interesting to see if at least the TCP session formed... Is port 80 your true proxy port?

We had something like that at one of our Russian sites too, a couple of months back. SYN would go through, SYN-ACK would be returned, but the ACK always mysteriously disappeared. Our provider spent weeks trying to find the issue. We were all suspecting some sort of firewall at the local carrier, but the problem suddenly disappeared. And we never found out what it was either.
Title: Re: Please Explain BGP to Me...
Post by: deanwebb on July 06, 2017, 02:34:14 PM
Pretty sure the TCP session formed, but we did not have Wireshark on the client PC at Site X and the Unix guys kept making excuses to not do a tcpdump.

WAN guys were positive that there was no QoS, neither shaping nor policing, which was my first thought.

We'll say the proxy ports include 80, (n), and (m). One of those numbers will work. The old proxy uses (m) and the new ones will answer on 80 and (n). Doing a telnet or TCPing to any of those ports would return a successful response.

Where I'm going nuts is why all the WLC web pages worked just fine but web pages of devices in the same VLAN would fail. That is, a WLC would respond to https://10.1.1.1 or https://remote.wireless.controller.megacorp.com but the web server at 10.1.1.2 would time out when the browser went to it until we switched the proxy - or went to the secondary WAN line.
Title: Re: Please Explain BGP to Me...
Post by: that1guy15 on July 06, 2017, 03:45:21 PM
Network gremlins is my official CCIE guess.

But what those guys said sounds right too.
Title: Re: Please Explain BGP to Me...
Post by: SimonV on July 07, 2017, 02:57:59 AM
So what's the next step, schedule a second fail-over to reproduce?
Title: Re: Please Explain BGP to Me...
Post by: LynK on July 07, 2017, 08:18:05 AM
Dean,

This could be a lot of things... but let me give you a hand.

Certain subnets could be routing out certain routers based on static routes (or PBR), with no IP SLA for failover, basically black-holing any return traffic. We are going to need to see a diagram of sorts, and maybe a config to give you a hand.
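
As a rough illustration of that failure mode, here is a toy Python model of longest-prefix-match route selection. The route table, the link names `primary` and `secondary`, and the `next_hop` helper are all hypothetical and say nothing about the real Site X or carrier config. It only shows why an untracked static route keeps winning for its subnet even when its exit link is dead, while everything else follows the default and still works.

```python
import ipaddress


def next_hop(routes, dest, link_up=None):
    """Pick the exit link for dest by longest-prefix match.

    routes:  list of (prefix, exit_link) pairs, e.g. ("10.1.1.0/24", "primary").
    link_up: optional dict of exit_link -> bool. When given, routes whose exit
             link is down are withdrawn first, which is roughly what tracking
             a static route with IP SLA is meant to achieve.
    """
    addr = ipaddress.ip_address(dest)
    candidates = []
    for prefix, exit_link in routes:
        if link_up is not None and not link_up.get(exit_link, False):
            continue  # tracked route is withdrawn while its link is down
        net = ipaddress.ip_network(prefix)
        if addr in net:
            candidates.append((net.prefixlen, exit_link))
    if not candidates:
        return None  # no matching route at all
    # The most specific (longest) prefix wins.
    return max(candidates, key=lambda c: c[0])[1]


# Hypothetical table: a specific static route via the primary link,
# and a default route via the secondary link.
ROUTES = [("0.0.0.0/0", "secondary"), ("10.1.1.0/24", "primary")]
LINK_UP = {"primary": False, "secondary": True}
```

With this table, `next_hop(ROUTES, "10.1.1.2")` still returns `"primary"` although that link is down, so traffic for that subnet is black-holed. Passing `LINK_UP` as the third argument withdraws the dead route and the lookup falls back to `"secondary"`.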

Title: Re: Please Explain BGP to Me...
Post by: deanwebb on July 07, 2017, 10:28:47 AM
To make matters more fun, the issue is with gear and lines that belong to a third-party ISP in Latin America that provides last mile connectivity to the MPLS provider. I'm not gonna get configs on this one...  :-\
Title: Re: Please Explain BGP to Me...
Post by: isaiahgoveait on July 09, 2017, 08:05:48 AM
Just found this http://pin.it/Ze9hSBz

Title: Re: Please Explain BGP to Me...
Post by: deanwebb on July 09, 2017, 09:00:53 AM
I think I'll upload it here...

:tmyk: