[CLOSED] Strange and unpredictable results when connecting through a switch

Started by Jim Anderson, March 06, 2020, 05:09:16 PM


Jim Anderson

Hello to all.

A brief introduction. I'm 71 and still writing software and maintaining a Linux network at home. I have been doing this for about 15 years and have had minor problems, but I have always managed to keep the network up and running, until Monday, March 1st, when I started losing connections to the internet. Before I go on, a brief background. I was a software developer writing CAD software for integrated circuits until my company was purchased by a West Coast company and I chose not to move west. I worked for several years for Transwitch Corp as a network planning manager. I worked with lots of IT guys, but I was never down in the trenches. So I have a decent background in data communications, but to be honest I have forgotten a lot.

Now I will go on with my problem.

For a number of years I have been using the following layout for my network, where EL is an Ethernet link and SW is a switch.

DSL modem --> wifi/ethernet router --> EL --> SW --> EL --> SW --> EL - PC

After my problems started, I read an article saying that daisy-chaining switches is not good, so I have reduced my network to:

DSL modem --> router --> EL --> SW --> EL --> PC

This worked for half of today and then failed again. The symptoms of the problem are consistent. When things work, I can ping my router or www.google.com with no problem. Then, mysteriously, things fail: I can still ping my router, but when I ping www.google.com it times out and I get a message like "www.google.com: name cannot be found". The exact message differs depending on which Linux distro I am using, but the messages always hint at a DNS problem.

When I get a failure, I can bypass the switch and connect directly to the router, and there is never a problem. The problem only occurs when I go through the switch.

My understanding is that a switch is a layer 2 device and should not interfere with the messages being sent from the user to the destination, so I'm very surprised that the switch is causing a problem. I have swapped out everything from the router to the PC to check whether it was a hardware problem, and I still get the same results.

So that is where I'm at. I think the next step is to start analyzing the link traffic, but I have no experience doing that. I think there is a good chance that timing delays are causing my problems, but I don't understand why the network would work well for years and then fail.

I am hoping one or two folks can lead me through the analysis and help resolve my problem. I cannot always respond immediately, but this is a top priority for me: I can usually respond within a day, and often much more quickly.

I will leave you with a starting question: shouldn't I be able to communicate over Ethernet from my PC to a switch, to a router, to a DSL modem, and on to the internet?

Jim A.

wintermute000

Your topology should work.
Tbh an extra switch should make no difference
You're right, it's layer 2
Run a packet capture

Jim Anderson

@wintermute000

Thank you for the suggestion. I do think I am at the point where I need to be looking at bits and bytes.

I'm not familiar with packet capture, but I did a quick search and found that tcpdump can be used to capture packets. I'm reading up on tcpdump now.

Jim

icecream-guy

When it has failed, can you browse by IP?

e.g. https://1.1.1.1

it just may be a DNS issue.

I suppose everything is in the same VLAN?

Is your router providing DHCP?
Do you have an IP address when in the failed state?

My Moral Fibers have been cut.

Jim Anderson

@riatau5741

I just tried pinging google at the IP below:

ping 172.217.10.36
PING 172.217.10.36 (172.217.10.36) 56(84) bytes of data.
64 bytes from 172.217.10.36: icmp_seq=104 ttl=57 time=14.1 ms
64 bytes from 172.217.10.36: icmp_seq=161 ttl=57 time=13.5 ms
64 bytes from 172.217.10.36: icmp_seq=162 ttl=57 time=12.4 ms
^C
--- 172.217.10.36 ping statistics ---
195 packets transmitted, 3 received, 98% packet loss, time 195108ms


It took a long time for those 3 messages and the 98% packet loss is disturbing.

Everything is physically in my home LAN.

I have my router set up for static IPs, and I assign an IP to each of the PCs in the LAN.

I'm not sure what you are asking when you ask, "do you have IP address when in failed state?"  You want the IP address of the target receiver?

Jim A.



Dieselboy

Hi OP.

Nice post. I agree it seems like a DNS problem. One thing that would point more firmly at DNS is whether Ristau's suggestion works while the issue is present: try to open https://1.1.1.1 in a web browser during the issue and see whether the page loads.
Quote from: some info
The "1.1.1.1" IP address will respond to HTTP requests while avoiding the DNS mechanism that converts names to numbers. Since https://1.1.1.1 uses numbers only, it avoids the DNS mechanism.
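Ristau's browser test can also be scripted so that the name lookup and a raw-IP connection are checked separately, at the same moment. Below is a minimal Python sketch, assuming Python 3 on the affected PC; the function names are hypothetical, and it uses a TCP connection to port 443 on 1.1.1.1 as a stand-in for loading the page in a browser.

```python
import socket


def can_resolve(hostname):
    """True if the system's configured resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return False


def can_connect(ip, port=443, timeout=3.0):
    """True if a TCP connection to ip:port succeeds; no DNS lookup is involved."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(dns_ok, ip_ok):
    """Turn the two results into a rough hint about where the fault lies."""
    if dns_ok and ip_ok:
        return "both work: the fault is not present right now"
    if ip_ok:
        return "IP works but names do not: points at DNS"
    if dns_ok:
        return "names resolve but the test IP is unreachable: points past DNS"
    return "neither works: connectivity itself is failing, not only DNS"


def diagnose(hostname="www.google.com", ip="1.1.1.1"):
    """Run both checks back to back and classify the outcome."""
    return classify(can_resolve(hostname), can_connect(ip))
```

Run `print(diagnose())` while the fault is present. A successful lookup may be answered from a local cache, so a failed lookup is the more telling result.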

Your network
Going back to your latest message, where you state that you statically configure your PCs with IP addresses: may I ask how you are allocating DNS resolvers? In most cases nowadays, PCs are left at their default, which is to obtain IP addressing automatically via DHCP. Typically, they get an IP address, subnet mask and default gateway, and in addition one or more DNS resolvers. As you are statically assigning the IP (and DNS), which IPs specifically are you using for DNS?

I am going to go with the assumption that you may be doing either of the following:
1) using the DSL modem IP address as the DNS server on the PC (the modem in turn usually forwards queries to the ISP's DNS servers)
2) using the internet provider's DNS servers on the PC

Of these, I've seen `1` have problems because of software issues or misconfiguration (by the ISP) on the DSL modem. With `2`, I've seen ISP DNS servers that were not well maintained or not working 100% of the time.

Tip: to see what DNS servers are currently in use on your system:
Windows = in the command line program (cmd.exe), run `ipconfig /all` (leave out the backtick marks)
Linux = in a terminal, run `cat /etc/resolv.conf`; this file contains the IP addresses of the DNS servers used for name resolution.
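On Linux the same check can be scripted. A small Python sketch (the helper name is hypothetical) that pulls the `nameserver` entries out of resolv.conf text, skipping comment lines that start with `#` or `;`:

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text, in file order."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Blank lines and lines starting with '#' or ';' carry no settings.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

Usage: `print(nameservers(open("/etc/resolv.conf").read()))`. On many recent distributions that use systemd-resolved, the file only lists the local stub 127.0.0.53; `resolvectl status` then shows the upstream servers actually in use.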

Now, if you are game to try different DNS servers to see whether the issue is worked around or resolved, there are a number of internet DNS servers that you can use. In fact, I, and the family and friends who ask me to set up their internet, use a combination of the following to avoid the ISP or DSL modem issues I mentioned:

Quote from: Internet DNS IPs
1. Google has open DNS servers at 8.8.8.8 and 8.8.4.4
2. There is also the IP that Ristau mentioned earlier, which is a DNS server: 1.1.1.1
3. There is also OpenDNS at 208.67.222.222 and 208.67.220.220 (ref: https://use.opendns.com/)
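To test one of these servers directly, without changing the PC's DNS settings, you can send it a single query. `dig @8.8.8.8 www.google.com` does this if dig is installed; the Python sketch below does the same with only the standard library. It builds a minimal DNS query by hand (A record, recursion desired) and only reads the reply header, so treat it as an illustration rather than a real resolver; the function names are hypothetical.

```python
import random
import socket
import struct


def build_query(name, query_id):
    """Build a DNS query for the A record of `name` with recursion desired."""
    # Header: ID, flags (0x0100 = RD bit), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME is a sequence of length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_header(response):
    """Return (id, rcode, answer count) from the 12-byte DNS header."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    return query_id, flags & 0x000F, ancount


def ask(server, name="www.google.com", timeout=2.0):
    """Send one UDP query straight to `server`, bypassing the system resolver."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, query_id), (server, 53))
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return f"{server}: no reply within {timeout} s"
        except OSError as exc:
            return f"{server}: error {exc}"
    if len(data) < 12:
        return f"{server}: reply too short to be DNS"
    reply_id, rcode, answers = parse_header(data)
    if reply_id != query_id:
        return f"{server}: reply ID does not match the query"
    return f"{server}: rcode={rcode}, {answers} answer record(s)"
```

For example, `for s in ("8.8.8.8", "1.1.1.1", "208.67.222.222"): print(ask(s))`. An rcode of 0 means NOERROR and 3 means NXDOMAIN. If every server times out while the router still answers pings, the problem is probably not any single DNS provider.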

//

tcpdump quick start
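tcpdump -i X
(replace X with the name of your interface, for example eth0 or eno2; `tcpdump -D` lists the interfaces it can capture on, and you will usually need to run tcpdump as root, e.g. with sudo)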

When you run that, it will start capturing and printing to the terminal. If you want to save a capture file so that you can load it in an application like Wireshark, use -w to write to a file:

tcpdump -i eth0 -w /tmp/capture1.pcap

And if you want to capture only the traffic between your system and a particular IP (like the DNS server being used), add a host filter:

tcpdump -i eth0 -w /tmp/capture1.pcap host 8.8.8.8

Remember to replace 8.8.8.8 with the IP you want to capture...
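Once you have a capture file, a useful first question is whether DNS queries leave the PC and whether replies come back. Wireshark shows this directly; as an alternative, here is a minimal Python sketch (standard library only, function names hypothetical) that counts UDP port 53 queries and responses in a classic pcap file such as the one `-w` writes. It assumes an Ethernet interface like eth0 and plain IPv4, and it skips IPv6, VLAN-tagged frames and pcapng files.

```python
import struct
from collections import Counter

LINKTYPE_ETHERNET = 1

# Classic pcap magic numbers as they appear on disk (microsecond and nanosecond variants).
LITTLE_ENDIAN_MAGIC = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGIC = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def iter_frames(data):
    """Yield each captured frame from classic pcap bytes (pcapng is not handled)."""
    if data[:4] in LITTLE_ENDIAN_MAGIC:
        endian = "<"
    elif data[:4] in BIG_ENDIAN_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack(endian + "I", data[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"link type {linktype} is not Ethernet")
    offset = 24  # the global header is 24 bytes long
    while offset + 16 <= len(data):
        # Record header: seconds, sub-second part, captured length, original length.
        _sec, _frac, incl_len, _orig = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len


def dns_direction(frame):
    """Return "query", "response" or None for an Ethernet/IPv4/UDP frame."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":
        return None  # not IPv4 (IPv6 and VLAN-tagged frames are skipped here)
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ip[9] != 17 or len(ip) < ihl + 8:
        return None  # not UDP, or truncated
    src_port, dst_port = struct.unpack("!HH", ip[ihl:ihl + 4])
    if dst_port == 53:
        return "query"
    if src_port == 53:
        return "response"
    return None


def summarize(data):
    """Count DNS queries sent and responses seen in a capture."""
    counts = Counter()
    for frame in iter_frames(data):
        direction = dns_direction(frame)
        if direction:
            counts[direction] += 1
    return counts
```

For example, capture with `tcpdump -i eth0 -w /tmp/capture1.pcap port 53` while the fault is present, then run `print(summarize(open("/tmp/capture1.pcap", "rb").read()))`. Many queries with few or no responses would suggest the replies are being lost somewhere between the PC and the DNS server.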


Regarding your earlier ping test
As you're on Linux, you only see the replies and not the lost packets, so nothing was printed until your system got responses. When that happens, it's a good idea to find out how far your ping requests are actually getting: for example, whether they are making it out to the internet, whether your DSL connection is down at that time (and re-training), or whether there is a switch problem in your network. One thing I do is simultaneously run a traceroute (also called tracepath):

tracepath -n 172.217.10.36

The `-n` option tells tracepath not to use DNS to look up names for the IP addresses along the path.

Also, since IP addresses like 8.8.8.8 and 1.1.1.1 are anycast, they are served from nearby locations and are highly available.
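Back to the earlier ping test: since lost requests only show up as gaps in the icmp_seq numbers, it can help to list them explicitly. A small Python sketch (function names hypothetical) that reads saved ping output such as Jim's and reports which sequence numbers never got a reply, assuming Linux iputils ping, which numbers requests from icmp_seq=1:

```python
from __future__ import annotations

import re

# Matches the sequence number on reply lines such as
# "64 bytes from 172.217.10.36: icmp_seq=104 ttl=57 time=14.1 ms".
# Requiring "bytes from" skips lines like "no answer yet for icmp_seq=5".
REPLY_SEQ = re.compile(r"bytes from .*?icmp_seq=(\d+)")


def replied_sequences(ping_output: str) -> list[int]:
    """Sequence numbers that received a reply, in the order printed."""
    return [int(m.group(1)) for m in REPLY_SEQ.finditer(ping_output)]


def lost_sequences(replied: list[int], transmitted: int, first_seq: int = 1) -> list[int]:
    """Sequence numbers that were sent but never answered."""
    answered = set(replied)
    return [seq for seq in range(first_seq, first_seq + transmitted) if seq not in answered]


def loss_percent(replied: list[int], transmitted: int) -> float:
    """Share of requests without a reply; duplicate replies are counted once."""
    return 100.0 * (transmitted - len(set(replied))) / transmitted
```

With Jim's output (195 sent, replies only to 104, 161 and 162), this reports 192 unanswered requests, about 98.5% loss, which ping reported as 98%. Many iputils versions also accept `ping -O`, which prints a line for each request that has no reply yet.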

icecream-guy

Quote from: Jim Anderson on March 08, 2020, 02:43:09 PM


I'm not sure what you are asking when you ask, "do you have IP address when in failed state?"  You want the IP address of the target receiver?

Jim A.

It's a moot point if you are statically assigning IP addresses, but I was referring to the local host (your PC) on your network.

Jim Anderson


My apologies to all. I have been working on my problem constantly for the past two weeks, but along an alternate path: I assumed that the problem was either a router hardware or software problem. For roughly the past week, I have been looking at different routers and finally settled on a router package called 'Google wifi'. This was a good move, and my network is up and running again.

Google wifi has provided very good wifi signals AND the ability to maintain my internal network via Ethernet. The documentation on the product was fair but usable; in comparison, the documentation I found for other routers ranged from poor to absolutely awful.

I thank all of you for your input, particularly Dieselboy, who obviously spent a fair amount of time studying my problem. I know that this thread will continue to be here, but I copied the entire thread to my computer so that I will have it for future reference. There are several ideas that I want to review further and keep in mind for the future.

Four thumbs up for the help I received here!!

Jim

Dieselboy