What could cause a router sub interface to drop random pings?

Started by Dieselboy, April 29, 2016, 06:33:56 AM


routerdork

I've seen some ISPs that use MACs for security, but in that case if it didn't match it shouldn't work at all. Unless maybe they are shaping/policing unknown MACs.
"The thing about quotes on the internet is that you cannot confirm their validity." -Abraham Lincoln

NetworkGroover

Yuck - I seriously hope the issue is on their end at this point.

You can use Wireshark to create an I/O graph to show your ICMP requests vs. responses - though of course the only way you could 100% trust it is if the capture was taken on the wire itself and not on a device.  That would eliminate half your troubleshooting though, if you know for a fact the response isn't making it back to your edge device.
Engineer by day, DJ by night, family first always

Otanx

Quote from: Dieselboy on May 02, 2016, 03:48:17 AM

While I'm here, I used MAC addresses from 4C00828ACF00 to 4C00828ACF09 and they all exhibit the same behavior.


Take a look at this discussion on reddit

https://www.reddit.com/r/networking/comments/4hduco/mpls_throughput_issue/

In short apparently MPLS routers can have an issue if your MAC address starts with a 4 or 6. I don't know if this is your issue or not, but thought of you when reading it, and then looking saw your MAC addresses started with a 4.

-Otanx

Dieselboy

Quote from: deanwebb on May 02, 2016, 01:42:23 PM
You know it's a crazy case when you be messing with MAC addresses.

I know!  :awesome:

Quote from: routerdork on May 02, 2016, 04:05:29 PM
I've seen some ISPs that use MACs for security, but in that case if it didn't match it shouldn't work at all. Unless maybe they are shaping/policing unknown MACs.

Level 3 tech said they're not doing anything with MACs at all.

Quote from: AspiringNetworker on May 02, 2016, 04:18:56 PM
Yuck - I seriously hope the issue is on their end at this point.

You can use Wireshark to create an I/O graph to show your ICMP requests vs. responses - though of course the only way you could 100% trust it is if the capture was taken on the wire itself and not on a device.  That would eliminate half your troubleshooting though, if you know for a fact the response isn't making it back to your edge device.

WOW can you do that?? I was thinking yesterday when I was sending thousands of pings "wouldn't it be nice if...". Do you have a link for that?
PS the issue is 100% definitely on their end, because:
1. the issue is present even when my laptop is connected directly to their NTE switch (Cisco ME3400)
2. the issue follows the problematic MAC address when I configure it in my laptop NIC driver as the source for traffic
3. using a non-problematic MAC on the same physical equipment (my laptop or other hardware), directly connected to their NTE, gives no issues.

Quote from: Otanx on May 02, 2016, 04:43:27 PM
Take a look at this discussion on reddit

https://www.reddit.com/r/networking/comments/4hduco/mpls_throughput_issue/

In short apparently MPLS routers can have an issue if your MAC address starts with a 4 or 6. I don't know if this is your issue or not, but thought of you when reading it, and then looking saw your MAC addresses started with a 4.

-Otanx

Otanx, this might be it I wonder! Because while I had my laptop directly connected to the ISP switch, I did start playing around with MAC addresses whilst clenching my buttocks. I found:

5C00828ACF0F is GOOD
4D00828ACF0F is GOOD
4C00928ACF0F is BAD
4C00938ACF0F is BAD

Going to read through the Reddit whilst consuming the morning coffee. Thanks a bunch :)

Otanx

Quote from: Dieselboy on May 02, 2016, 08:29:26 PM

Otanx, this might be it I wonder! Because while I had my laptop directly connected to the ISP switch, I did start playing around with MAC addresses whilst clenching my buttocks. I found:

5C00828ACF0F is GOOD
4D00828ACF0F is GOOD
4C00928ACF0F is BAD
4C00938ACF0F is BAD

Going to read through the Reddit whilst consuming the morning coffee. Thanks a bunch :)

What if you do it without clenching your buttocks? Do you get different results? What about standing on one foot, and type only using your thumbs? Then swap feet, then try with only pinkies. We will get this solved eventually. We just need to test everything!

-Otanx

Dieselboy

Waiting for them to call me to discuss..
It's worth mentioning this to them, although the article and the Reddit thread describe no packet loss, only odd TCP throughput through an MPLS network. In my case I have up to 15% packet loss when using MACs starting with 4c00..... I'm now worried I'll get a similar provider response to what the Reddit OP had.

I know a different MAC will fix the issue but I'm reluctant to change it, as it's non-standard and will affect the physical router interface, not just the subinterface.

Dieselboy

They state there is no MPLS in path.

What are the implications of me going ahead and configuring a new MAC address on the router interface? I can use the Cisco 00a0.c9xx.xxxx range, which is from the Cisco ASA documentation. I've already designed a MAC address standard for configuring ASA subinterfaces, and I can adapt it for a router so the address will be unique in my network. But that's as much as I can assure.

It's not something I've needed to do before, ever. I don't like doing something odd like this mainly because of unknowns. So I'm reluctant to go ahead due to any possible issues arising which would be service affecting.
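For what it's worth, a couple of properties of a candidate MAC can be checked before committing to it. The sketch below is only an illustration: it relies on the standard meaning of the two low-order bits of the first octet (0x01 set means group/multicast, 0x02 set means locally administered) and also reports the first hex digit, since 4 and 6 are the values under suspicion in this thread. The function names and the sample addresses filling in the "xx" digits are made up for the example.

```python
def parse_mac(text):
    """Parse a MAC written as 00a0.c912.3456, 00:a0:c9:12:34:56 or 00A0C9123456."""
    digits = "".join(ch for ch in text if ch not in ".:-")
    if len(digits) != 12:
        raise ValueError(f"expected 12 hex digits, got {len(digits)}")
    return bytes.fromhex(digits)  # raises ValueError on non-hex characters


def mac_properties(text):
    """Report the properties of a MAC that matter for picking a manual address."""
    first_octet = parse_mac(text)[0]
    return {
        # I/G bit clear means an individual (unicast) address
        "unicast": (first_octet & 0x01) == 0,
        # U/L bit set means the address is in the locally administered space
        "locally_administered": (first_octet & 0x02) != 0,
        # first hex digit; 4 and 6 are the suspect values in this thread
        "first_nibble": first_octet >> 4,
    }


if __name__ == "__main__":
    # Hypothetical addresses: the trailing digits are invented for the example.
    for candidate in ("00a0.c912.3456", "4c00.828a.cf0f", "02a0.c912.3456"):
        print(candidate, mac_properties(candidate))
```

This says nothing about whether the address is already in use elsewhere; uniqueness within the L2 domain still has to come from your own numbering standard.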

NetworkGroover

Quote from: Dieselboy on May 02, 2016, 08:29:26 PM
WOW can you do that?? I was thinking yesterday when I was sending thousands of pings "wouldn't it be nice if...". Do you have a link for that?

Heck yeah you can.  I thank God every day for the HORRIBLE time I had in Tech Support at Websense - because it got me thorough experience with Wireshark, and that has never stopped helping me to this day.

In fact, funny thing about the link - it's a Websense Tech Support article I wrote years ago for analyzing DNS request vs. response which is critical for good proxy behavior - otherwise you get that, "WTF mah interwebz are teh slow".

So take a look at that, and instead just use the icmp request and response filters... I can find them for you if you need me to but I'm sure you're capable ;)

http://www.websense.com/support/article/kbarticle/Identify-DNS-related-errors-using-Wireshark

EDIT - Oh, you can watch it live during the capture as well if you like.  I've used this to verify QoS policy behavior (look at rate of traffic for particular DSCP value), etc. - pretty cool stuff.
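For the ICMP case, the Wireshark display filters are `icmp.type == 8` for echo requests and `icmp.type == 0` for echo replies, and each can be given its own line in the I/O graph. The same comparison can be sketched offline in a few lines of Python. This is a rough sketch only: the `(timestamp, icmp_type, ident, seq)` tuple format and the function names are assumptions standing in for whatever you export from the capture, not anything Wireshark produces in that shape by itself.

```python
from collections import defaultdict

ECHO_REQUEST = 8  # ICMP type of an echo request
ECHO_REPLY = 0    # ICMP type of an echo reply


def io_buckets(events, interval=1.0):
    """Count echo requests and replies per time bucket, like an I/O graph.

    events: iterable of (timestamp_seconds, icmp_type, ident, seq) tuples.
    Returns {bucket_index: (requests, replies)} in bucket order.
    """
    buckets = defaultdict(lambda: [0, 0])
    for ts, kind, _ident, _seq in events:
        slot = int(ts // interval)
        if kind == ECHO_REQUEST:
            buckets[slot][0] += 1
        elif kind == ECHO_REPLY:
            buckets[slot][1] += 1
    return {slot: tuple(counts) for slot, counts in sorted(buckets.items())}


def unanswered(events):
    """Return the sorted (ident, seq) pairs of requests that never got a reply."""
    requests = {(ident, seq) for _ts, kind, ident, seq in events if kind == ECHO_REQUEST}
    replies = {(ident, seq) for _ts, kind, ident, seq in events if kind == ECHO_REPLY}
    return sorted(requests - replies)


if __name__ == "__main__":
    # Made-up sample: the request with seq 2 is never answered.
    sample = [
        (0.1, ECHO_REQUEST, 1, 1), (0.2, ECHO_REPLY, 1, 1),
        (0.5, ECHO_REQUEST, 1, 2),
        (1.1, ECHO_REQUEST, 1, 3), (1.2, ECHO_REPLY, 1, 3),
    ]
    print(io_buckets(sample))
    print(unanswered(sample))
```

Matching on identifier and sequence number rather than just counting avoids blaming a bucket for a reply that simply arrived in the next interval.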

Dieselboy

So the latest update is that Level 3 have passed this off to the "SME" and I had a nice chat with her. She looked into the network at my end (customer end) and was going on about seeing two MACs at my end. I said, well, I can see two MACs coming from your end: one is the gateway, and she confirmed the other is the NTE, which is a Cisco ME3400. I said it's probably spanning tree / CDP / LLDP or something like that.
I explained that regardless, even if I have only my laptop connected, we get packet loss with a certain MAC.

I mentioned I had sent in the document explaining the bug with MPLS, but that I had previously been told there's no MPLS in between. She said yes, she's seen it, and actually there IS MPLS between the customer port and the L3 gateway I'm routing to! But she said "but it's Ethernet"... I'm not sure if she meant that it's a VPLS type carrier (i.e. not MPLS) or MPLS on top of a L2 switched network.
From what I can remember, MPLS is just layer "2.5" but uses the same routing equipment. I didn't really probe it / her.

So I said a good test, then, would be to try with my MAC address starting with a 4 and with a 6 and see if we get packet loss on both of those. And we do get packet loss with both 4c00 and 6c00 but NOT 5c00. So she's passing it off to some other team.

How many departments in a network team?

Anyway, if this is the issue Otanx mentioned, which it is looking like it is, then it's not just because the MAC begins with a 4 like the document says. There's more to it than that. I saw packet loss with 4C00 but not 4D00... This might be why it's not been picked up yet by other customers of theirs, not sure.

I'll be happy to get this resolved, as the Sri Lanka to Australia routing is only 130 ms on this service compared to an average of 320 to 450 ms on the current one. :)

Nice work Otanx :)

Otanx

Quote from: Dieselboy on May 05, 2016, 05:50:32 AM
Anyway, if this is the issue Otanx mentioned, which it is looking like it is, then it's not just because the MAC begins with a 4 like the document says. There's more to it than that. I saw packet loss with 4C00 but not 4D00... This might be why it's not been picked up yet by other customers of theirs, not sure.

I wish I had the audio or notes to go with the presentation, but I think you are right. The 4 or 6 is just the starting point of the problem. As I understand it, when one of the MPLS routers tries to load balance, it needs to look at the packet being encapsulated. The router does not know whether this is an IPv4 packet, an IPv6 packet, or an Ethernet frame. The logic being used says to look at the first four bits: if they are a 4 or a 6, assume the field is an IP Version field and the encapsulated data is an IP packet; if they are not, assume an Ethernet frame. The router then uses this assumption to decide which part of the encapsulated data it will use to load balance.

So if you take the MAC address and map it onto an IPv4 header, the second number will be read as the header length, which changes where the router looks for data as well. If you get lucky, the bits picked are static from packet to packet, and everything works. Shift the bits selected for load balancing (by changing the second number from a D to a C) and the data is no longer static, and you get issues.
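To make the first-nibble guess concrete, here is a small Python sketch of the heuristic as described above. It is only an illustration of the idea, not any vendor's actual implementation; the function names are made up, and which fields a real router hashes on after making the guess is not modelled. It takes the first octet of the MAC that leads the encapsulated frame (on the wire that is the destination MAC) and shows how it would be read if mistaken for the start of an IP header.

```python
def classify_first_nibble(first_octet):
    """Guess the payload type from the top four bits, as the heuristic does."""
    nibble = first_octet >> 4
    if nibble == 4:
        return "ipv4"
    if nibble == 6:
        return "ipv6"
    return "ethernet"


def misread_ipv4_header_len(first_octet):
    """Header length in bytes if the octet is misread as IPv4 version + IHL.

    The IHL field is the low nibble, counted in 32-bit words.
    """
    return (first_octet & 0x0F) * 4


def inspect_mac(mac_hex):
    """Return (guessed_type, misread_header_len_or_None) for a 12-digit hex MAC."""
    first_octet = int(mac_hex[:2], 16)
    guess = classify_first_nibble(first_octet)
    header_len = misread_ipv4_header_len(first_octet) if guess == "ipv4" else None
    return guess, header_len


if __name__ == "__main__":
    # MACs taken from the tests reported earlier in the thread.
    for mac in ("5C00828ACF0F", "4D00828ACF0F", "4C00928ACF0F", "4C00938ACF0F"):
        print(mac, inspect_mac(mac))
```

The 4C and 4D addresses are both guessed as IPv4 but with different header lengths, so anything located relative to the end of the "header" lands on different bytes of the real frame, which fits one being GOOD and the other BAD.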

-Charles

Dieselboy

Great explanation :)

I love finding these types of issues because you learn a lot at a real deep level. These types of issues I find only come up every now and then. Sometimes they take MONTHS to resolve because finding the cause is difficult. I think you've saved the ISP months of troubleshooting :)

Dieselboy

Update: They use Juniper switches and the layer 2 technology is VPLS.
They're taking this issue to the vendor.

Dieselboy

Got a call from their overseas call centre to say they had fixed the issue and I should test. I've not been able to test yet as I need an outage window, but I'm now losing 10% of ESP packets on the VPN using this internet circuit.
:matrix:

deanwebb

Ouch, 10% loss is not good, at all.

I mean, ifwe lost te percent f the chaacters I yped, tha would mae things ard to unerstand, or sure!

ee what Idid there
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

Dieselboy

The other VPNs aren't perfect either though. 1% loss on the backup tunnel :( :( :(