Internet routing issue from Sri Lanka to OZ...again - but this one's weirder

Started by Dieselboy, November 15, 2018, 12:19:40 AM

Dieselboy

I have an issue with traffic from our SL office to our Perth office in Australia. In SL we have one ISP connection. In Perth we have 2, a business ADSL which is about to go away and a business 50M fibre.
From SL to our ADSL, UDP is completely broken and never gets to Perth. We have confirmed it is being transmitted from SL, so it's lost in transit. However, ICMP ping works completely fine, although an ICMP traceroute from Perth to SL breaks at the final hop of the intermediary path, just before it reaches the ISP network which we use in SL. I raised a fault with them and they say that they are learning a /18 route from our SL ISP, so in their view it's not a routing issue; maybe they are denying traffic.

From SL to our Perth 50M, UDP and ICMP seem to be working fine. Traces also complete 100%. But TCP has about 50% packet loss. What's odd is that I have people checking at points in the path, specifically in Melbourne: if I send 20 TCP packets, sometimes they see fewer than 20 (16 packets, for example) and sometimes they see more than 20 (24, for example). Once they received the same number that we sent.

I'm hoping that the two issues are related. The UDP loss issue means phase 1 of our backup VPN tunnel is completely dead. TCP is not so much of a problem because it's tunnelled over the main VPN as UDP which appears to be fine for now.

Head scratcher...

Dieselboy

Okay so we have been looking into the TCP loss issue. The ISP in Australia was able to capture somewhere in the middle of their network, actually on the other side of the country in Melbourne, and they also get more packets than we send. They sent me the captures.

I can see the duplicated packets. What is strange is that the duplicate packets are exactly 30 seconds apart. We can't time travel, so I take it the first packet is the real one, then there's a duplicate packet 30 seconds later. But they're not identical, it seems. Take a look:

1st packet:


16:47:27.364014 IP (tos 0x2,ECT(0), ttl 110, id 9402, offset 0, flags [DF], proto TCP (6), length 52)
    122.x.x.50.51376 > testing.telstra.net.3000: Flags [SEW], cksum 0x9ecf (correct), seq 2739841333, win 8192, options [mss 1380,nop,wscale 8,nop,nop,sackOK], length 0


Then, 30 seconds later:


16:47:57.754159 IP (tos 0x0, ttl 238, id 48822, offset 0, flags [none], proto TCP (6), length 40)
    122.x.x.50.51376 > testing.telstra.net.3000: Flags [R], cksum 0x000f (correct), seq 2739841334, win 0, length 0


I've obscured the source IP because I didn't want to divulge it online.

There are differences such as ToS and length. The ToS differs because the packets are being sent with ECN. The 1380 MSS is also set on the real packet as captured at the egress point of my network.
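To make the ToS and length differences concrete, here is a small Python sketch. It is not taken from the captures; the helper name decode_tos and the constants are made up for illustration. It splits the ToS byte into DSCP and ECN bits (per RFC 3168) and adds up the header sizes implied by the TCP options that tcpdump printed for the SYN.

```python
# Illustrative only: decode_tos and the constants below are hypothetical helper names.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

def decode_tos(tos):
    """Split the IPv4 ToS byte into (DSCP, ECN name): DSCP is the top 6 bits, ECN the bottom 2."""
    return tos >> 2, ECN_NAMES[tos & 0b11]

IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, before TCP options

# Option sizes listed on the SYN: mss(4) nop(1) wscale(3) nop(1) nop(1) sackOK(2)
SYN_OPTIONS = 4 + 1 + 3 + 1 + 1 + 2

syn_length = IPV4_HEADER + TCP_HEADER + SYN_OPTIONS   # SYN: options, no payload
rst_length = IPV4_HEADER + TCP_HEADER                 # RST: no options, no payload

print(decode_tos(0x02), syn_length)
print(decode_tos(0x00), rst_length)
```

The computed lengths agree with the "length 52" and "length 40" shown in the two capture lines, so the size difference is fully explained by the SYN carrying TCP options and the RST carrying none.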

Bear in mind that I captured traffic at the source and I only see the one of each packet.

Investigation continues....

icecream-guy

Stabbing my pitchfork into the air, it smells like MTU issue to me.
try that ICMP thing again, that works, and increase the packet sizes until it doesn't, that'd be your max MTU.
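A rough Python sketch of that "increase the size until it breaks" idea, done as a binary search rather than one size at a time. Everything here is an assumption for illustration: find_max_payload and make_fake_probe are hypothetical names, and the probe is simulated so the example runs anywhere. On a real path the probe would be a single DF-bit ping of the given size.

```python
from typing import Callable

def find_max_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Return the largest size in [low, high] for which probe(size) succeeds.

    Assumes success is monotonic: if a size works, every smaller size works too.
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed; this is not a size problem")
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # mid works, so the answer is mid or larger
        else:
            high = mid - 1            # mid fails, so the answer is smaller
    return low

def make_fake_probe(limit: int) -> Callable[[int], bool]:
    """Simulated path: payloads up to 'limit' bytes get a reply, larger ones are dropped."""
    return lambda size: size <= limit

print(find_max_payload(make_fake_probe(1372)))
```

With lossy links a single failed ping proves little, so a real probe would probably want to retry each size a few times before declaring it a failure.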
:professorcat:

My Moral Fibers have been cut.

deanwebb

Quote from: ristau5741 on November 15, 2018, 06:03:47 AM
Stabbing my pitchfork into the air, it smells like MTU issue to me.
try that ICMP thing again, that works, and increase the packet sizes until it doesn't, that'd be your max MTU.

As in, the big packet gets fragmented? I had that thought, as well, but wouldn't we see the fragments in the captures?
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

icecream-guy

Quote from: deanwebb on November 15, 2018, 10:21:46 AM
Quote from: ristau5741 on November 15, 2018, 06:03:47 AM
Stabbing my pitchfork into the air, it smells like MTU issue to me.
try that ICMP thing again, that works, and increase the packet sizes until it doesn't, that'd be your max MTU.

As in, the big packet gets fragmented? I had that thought, as well, but wouldn't we see the fragments in the captures?

You'd think so. Also wondering why it sends a SYN and then a RESET 30 seconds later, probably because it does not receive a SYN-ACK in time.
Also, the TTL on both of those is weird: 110 on the first and 238 on the second.
:professorcat:


Otanx

30 seconds sounds like a timeout. I am guessing there is some kind of firewall/IDS/IPS something that is watching the connection, and when the syn/ack does not come through it spoofs the source and sends the reset. This accounts for exactly 30 seconds, the different TTL, and the fact that Dieselboy does not see the reset leave his network.

It could be an MTU issue, but something is sending the reset, and that isn't a normal MTU issue thing. Either you get fragments, ICMP packet-too-big, or silence.

-Otanx

Edit: Thought about it right after I hit submit... It could be a DDOS prevention box? It thinks the SYN is a spoof/attack, and is sending the RST to prevent the server from holding resources.

Dieselboy

Guys thanks, I missed / overlooked the R flag on the 2nd packet.

We looked at possible fragmentation with ICMP, and ICMP always works fine (except trace). I can do these large pings all day long with no loss, or rather just one or two lost ICMP packets over a very long time.

C:\Users\>ping 122.x.x.50 -t -l 1472 -f

Pinging 122.x.x.50 with 1472 bytes of data:
Reply from 122.x.x.50: bytes=1472 time=250ms TTL=237
Reply from 122.x.x.50: bytes=1472 time=250ms TTL=237
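
For anyone wondering where 1472 comes from: the Windows -l value is the ICMP payload, and the headers go on top of it. A tiny Python sketch of the arithmetic, assuming plain IPv4 with no IP options (the function names are made up for illustration):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header

def ip_packet_size(icmp_payload: int) -> int:
    """Total IPv4 packet size for an ICMP echo carrying the given payload."""
    return icmp_payload + ICMP_HEADER + IPV4_HEADER

def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(ip_packet_size(1472))
print(max_icmp_payload(1500))
```

So a 1472-byte payload with -f (don't fragment) fills a 1500-byte IP packet exactly, which is why replies at that size suggest the path carries full 1500-byte packets, at least for ICMP.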


The SYN is the real packet that comes from me and is captured on my router WAN port. Confirmed no other packets leave my side. Somewhere on the internet, comes this other packet 30 seconds later.

I'll need to do this test again and save the far side captures to see if this packet is missing and if we get the reset as well, to see if there's a pattern there. Maybe the SYN gets lost, does not get any response, and something sends the RST because it sees that occur. Another thing we noticed is that the SYNs seem to be the only packets that go missing. Once the TCP session is established there does not appear to be any issue. For example, I try to SSH into the router WAN IP. I can launch 5 SSH sessions and 1 or 2 will establish and give me a login prompt. I can log in and use the SSH session without any issue whatsoever. There's no packet loss in the SSH session. You know when you have loss in an SSH session because it goes incredibly slow and stops working intermittently. I don't see any of that here.
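
The "launch 5 sessions and see how many establish" test could be scripted along these lines. This is only a sketch using Python's standard socket module; connect_success_rate is a hypothetical helper name, and the host and port would be whatever is being tested (for example the router WAN IP and port 22).

```python
import socket

def connect_success_rate(host: str, port: int, attempts: int = 5, timeout: float = 3.0) -> float:
    """Try 'attempts' separate TCP connections and return the fraction that establish.

    Note: the OS may retransmit the SYN within the timeout, so this counts
    connection attempts rather than individual SYN packets.
    """
    established = 0
    for _ in range(attempts):
        try:
            # create_connection completes the three-way handshake or raises OSError
            with socket.create_connection((host, port), timeout=timeout):
                established += 1
        except OSError:
            pass  # timed out, refused, or unreachable: count as a failed attempt
    return established / attempts
```

Calling it as connect_success_rate(host, 22, attempts=20) from each end and comparing the result with the far-side capture would give a repeatable number instead of counting sessions by hand.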

So maybe there's some kind of firewall as suggested. However it's just plain internet circuits at either end... apparently.

Dieselboy

Update:
I used a span port to capture traffic. The RESET is coming from our router WAN. Thinking about it, the earlier capture wasn't picking it up because the RESET packets are process-switched by the router CPU, and so are not seen by the CEF packet capture.

So a bit less weird, and great I can account for the extra packets now. But what this leaves me with is, 40 packets sent from SL and only 13 make it to Australia.

wintermute000

You're going to have to throw up all the captures raw if we're going to take a closer look, I think. Everyone has mentioned all the usual suspects (MTU etc.) as well as the oddities (TTL?). Keep us posted, and if you're OK to post the captures I'm curious to take a peek. Obviously try to get them at different points for the same convo (e.g. SRC, midpoint, DST). If possible, capture not using EPC but using a span port at the source/dest as you've already started doing.

My interpretation of your 30 second ghost: the source hasn't heard back for 30 seconds, so it sends a RST. The sequence number clearly shows it's literally the next packet. I'm not familiar enough at protocol level to tell you whether or not it should have attempted a few more SYNs before giving up the ghost, but that could just be the behaviour of your test router. Or, the difference in TTL is pointing at a FW or IPS spoofing the RST. Windows has a default TTL of 128. In fact, as I type this, there really is a lot of circumstantial evidence to suggest some funny buggers in the middle.
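
A quick Python sketch of the TTL and sequence-number reasoning, using the values from the two captured packets. The list of common initial TTLs (64, 128, 255) is an assumption based on typical defaults, not something the captures prove, and guess_initial_ttl is a made-up helper name.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)  # assumed typical defaults; devices can be configured otherwise

def guess_initial_ttl(observed: int):
    """Return (likely initial TTL, hops travelled), picking the smallest common default >= observed."""
    for initial in COMMON_INITIAL_TTLS:
        if observed <= initial:
            return initial, initial - observed
    raise ValueError("TTL cannot exceed 255")

syn_ttl, rst_ttl = 110, 238                     # from the two capture lines
syn_seq, rst_seq = 2739841333, 2739841334

print(guess_initial_ttl(syn_ttl))               # SYN from the host
print(guess_initial_ttl(rst_ttl))               # RST seen 30 seconds later
print(rst_seq - syn_seq)                        # a SYN consumes one sequence number
```

Under those assumptions the SYN looks like it started at 128 and crossed 18 hops, while the RST looks like it started at 255 and crossed 17. That would fit a device sitting one hop closer to the capture point than the Windows host, such as the site's own router, though a spoofing middlebox cannot be ruled out from TTL alone.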

Far out thought: could your block be hijacked through China and hence great firewalled? What does a looking glass look like (heck, multiple looking glasses).

Dieselboy

I was on leave for 2 weeks.

Good points by all, I'll summarise where it's at presently:

1. Duplicate packets: you guys have mentioned this already and I've confirmed this is what is happening. TCP packets are sent from the source and, when there is no reply, our firewall sources a RST. So this one is moot / not an issue.

2. Lost UDP packets resulting in VPN down between source and main site backup IP
3. 60% loss in TCP traffic

4. Aside from issue 2 and 3, we occasionally get lost UDP on our main VPN connection but also during this time we cannot ping either and cannot access the SL site router via SSH either. However both sites have access to the internet. It appears as though the remote site has lost their internet connection, but we never lose our net and confirmed with people on site they too never lose their net either.

There are 3 providers responsible for this path. We purchase internet from 2 of them. So there's an intermediary realm which both ISPs have a relationship with (fingers crossed).

The intermediary has confirmed many input drops when receiving from the ISP on the SL side:

Quote: 4123512026409 packets input, 840794411163573 bytes, 1061102027 total input drops

The link is a 2.5G

They then ran the command 1 second later:

Quote: 4123512216291 packets input, 840794460301397 bytes, 1061102769 total input drops
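
To put those two counter readings in perspective, here is a small Python sketch of the deltas. It assumes the readings really were about 1 second apart and that the link runs at 2.5 Gbit/s as stated; whether "packets input" already includes the dropped packets is not known, so the drop ratio is only approximate.

```python
# Counter readings quoted by the intermediary, taken roughly 1 second apart (assumed)
first = {"packets": 4123512026409, "bytes": 840794411163573, "drops": 1061102027}
second = {"packets": 4123512216291, "bytes": 840794460301397, "drops": 1061102769}

INTERVAL_SECONDS = 1.0      # assumption: "1 second later" taken literally
LINK_BPS = 2.5e9            # assumption: 2.5G means 2.5 Gbit/s

delta = {key: second[key] - first[key] for key in first}

drop_ratio = delta["drops"] / delta["packets"]          # drops per input packet in the interval
throughput_bps = delta["bytes"] * 8 / INTERVAL_SECONDS  # input rate over the interval
utilisation = throughput_bps / LINK_BPS

print(delta)
print(f"interval drop ratio: {drop_ratio:.4%}")
print(f"input rate: {throughput_bps / 1e6:.0f} Mbit/s, utilisation: {utilisation:.1%}")
```

On these numbers the interface dropped 742 of roughly 190,000 packets in that second, well under 1%, at well under full link utilisation. If that sample is representative, the input drops alone look too small to account for the 60% TCP loss, which would fit the idea that something more selective is going on.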

1500-byte ICMP packets aren't affected when I send them:

CIN-SL-2901-1#ping 110.x.x.191 source 122.x.x.123 size 1500 df-bit
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 110.x.191, timeout is 2 seconds:
Packet sent with a source address of 122.x.123
Packet sent with the DF bit set
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 300/301/304 ms
CIN-SL-2901-1#

So the intermediary provider has opened a TAC case with the vendor (Cisco), and the vendor has asked me for my IPsec VPN config. The VPN is currently broken because UDP port 500 traffic from the initiator (SL) to the destination is being blocked somewhere in the path. I sent them the config anyway just so they don't say that I'm hindering progress. They came back to me today and said I need to change from tunnel mode to transport mode. Obviously I'm not going to do that: phase 1 is not even coming up due to lost packets, so making such a change would be irrelevant.

A packet capture at the source shows the UDP 500 packets being sent; they are about 250 bytes in size.

Previously I thought that sending packets with QoS info could be interfering with the carrier network somehow. However, I have ensured that we're not sending packets onto the internet with QoS tags anyway; they're removed by the WAN switch before the packet enters the upstream ISP router, so this is moot also.

Dieselboy

Here's the latest from the intermediate provider, on guidance from Cisco:

Quote:
Hi xxx/xxx team,


Our vendor comes with the suggestion below which will impact your service at Singapore (SNG IPT xxx– [isp name] circuit ID). [isp name omitted] peering ip is 202.x.118 (ASxxxxx). Will you approve and provide a time window for about 2 hours maintenance schedule when to perform since we need our field operations engineer to move the port(step#2) and reload the SPA card(step#3).



1.       Shutdown/no shutdown the interface (POS0/7/0/1/)

2.       Move the port from 0/7/0/1 to 0/7/0/0

3.       Reload the related SPA card

Thanks,

I'm guessing the above is to try and resolve the input drops. I like the interface name on their equipment "P O S"  >:D

Dieselboy

Done some more packet capture tests with this scenario:

Capture on the Australia site ASA 5515

Source is a CentOS VM running netcat in SL, with a static NAT to a specific test IP which is known to be problematic.

I first tried setting destination ports, i.e. the source port is chosen by the OS and I'm targeting port udp/500 in Australia and watching the packet capture for the packet. All chosen destination ports are successful and I can see them in the capture on the Australia side:

dst ports
udp 1 - success
udp 500 - success
udp 600 - success
udp 65535 - success


So then I leave the destination port as 65535, as I know it works, and I start setting source ports. I find that specific ports make it into the capture and others do not:

src ports
src 500 dst 65535 - fail
src 65535 dst 65535 - success
src 65000 dst 65535 - fail
src 6500 dst 65535 - success
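
For repeating the source-port test without netcat, a Python equivalent could look like the sketch below. It uses only the standard socket module; send_udp_from_port is a hypothetical helper name, and the destination address would be the Australia test IP. Binding to source ports below 1024 (such as 500) normally needs root or administrator rights.

```python
import socket

def send_udp_from_port(dst_host: str, dst_port: int, src_port: int,
                       payload: bytes = b"probe", src_addr: str = "") -> int:
    """Send one UDP datagram from a fixed source port; returns the number of bytes sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Allow quick re-runs with the same source port
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((src_addr, src_port))          # this is what pins the source port
        return sock.sendto(payload, (dst_host, dst_port))
```

Looping this over a range of source ports while the far-side capture runs would show which source ports get through, rather than testing a handful by hand. Putting the source port number in the payload makes it easy to match packets in the capture even if a NAT in the path rewrites the port.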


I am going to wait until they do their site outage and if it's still unresolved after that then I'll give them this info.

It's confusing to me.

Otanx

I am going to guess that there is some load balanced path (ECMP or etherchannel). One of the links is dropping packets and one isn't. If you hash onto the right one everything works.
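
A toy Python illustration of why only some source ports would fail under that theory. Real routers use vendor-specific hash functions and inputs, so this will not reproduce the actual pass/fail pattern from the tests; pick_link is a made-up name and the addresses are documentation placeholders.

```python
import zlib

def pick_link(src_ip: str, dst_ip: str, proto: str, src_port: int, dst_port: int,
              n_links: int = 2) -> int:
    """Toy per-flow load balancing: hash the 5-tuple and map it onto one of n_links."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links

# Same addresses and destination port, different source ports (placeholder IPs)
for src_port in (500, 65535, 65000, 6500):
    link = pick_link("192.0.2.10", "198.51.100.20", "udp", src_port, 65535)
    print(f"src {src_port} dst 65535 -> link {link}")
```

The point is that a given flow always lands on the same link, so if one member link is dropping packets, flows that hash onto it fail every time while the rest work every time. Changing only the source port changes the hash input, which matches the kind of port-dependent behaviour seen in the tests.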

-Otanx

Dieselboy

That seems plausible :) It would be very handy to have a capture at the egress of this ISP as it goes to the next, but so far the SL ISP has been reluctant and has only provided a capture from the ingress of their customer-facing interface on the customer router. I'll let you know what happens after their port move, as there are input drops there that they're trying to resolve. I am wondering if the input drops are due to misconfiguration rather than a fault, though. For example, receiving traffic on an unconfigured VLAN ID.

deanwebb

Quote from: Otanx on December 04, 2018, 09:14:58 AM
I am going to guess that there is some load balanced path (ECMP or etherchannel). One of the links is dropping packets and one isn't. If you hash onto the right one everything works.

-Otanx


Or one of the LB links goes through a firewall and the other one doesn't...