WAR STORIES!

Started by deanwebb, March 09, 2015, 02:49:00 PM


awilderbeast

Ok so it begins...
Backdrop: old DC, 2800-series routers, hundreds of old servers. We are migrating to a fancy new DC but aren't there yet. All the current kit is full of faults, broken fans, etc.

Saturday afternoon last year:
Panic panic, ring everyone, text everyone, the site's down! The on-call engineer logs into our DC routers. We keep losing connectivity but manage to get a sh int off: the routers are taking huge amounts of bandwidth. Why?! We must be getting DDoS'd. Call the ISP: yup, confirmed DDoS. They blackhole our traffic...
The ISP says the attack was 15GB.

Week after:
Look into DDoS mitigation options, speak to providers/vendors, etc.
Ransom email from the attacker: give me 1 bitcoin and I will stop...
Ignore the attacker.
Signed up for an Incapsula cloud scrubbing trial. Haven't changed DNS yet, still speaking to other vendors.
Many vendors' boxes are aimed at ISPs and their price brackets are huge. Vendors are baffled by our ISP's inability to protect us...

Saturday again:
Attack, Attack!
Everyone get in ASAP... what can we do?!
Turn on the free trial of the scrubber... make the DNS changes (our TTL is 5 mins), log in to the trial dashboard, see that they are taking hits, and gradually we get some site activity back.

Mr Attacker is not happy about that... He must've searched our RIPE records or queried some other addresses... He attacks the head office, where we are trying to stop him from (must have been luck).
Head office taken down now!
Head office is where all orders are processed, and it's connected to the live DC via VPN.
Worst thing about the head office is that it's the head office for two sister companies, and we have kit serving both companies on the WAN here :|

Mgmt: we need to not have two companies taken out by this. Do something to make sure it only affects one company if it happens again. Can you do it now...
Dual ISPs at head office: shut the BGP session with one ISP, NAT out of it and run DMVPN over it. That ISP's public IPs are now being used, unaffected by the DDoS hits.

Luckily we have a temp L2 link between the new DC and the old DC. Reroute all traffic to the old DC via the new DC and the L2 link there.
We can now get to the live DC regardless of WAN saturation, and we still have some services online if he hits head office again.

Week after:
Decided on a DDoS provider and paid for an emergency install ($$$$). We get an on-site appliance that can take 1GB, backed up by a cloud scrubbing service (they advertise our AS more attractively than we do and send clean traffic down GRE tunnels to our routers).

Saturday:
Attack Attack!
On-prem device and cloud scrubbing service kick in. Site is up during the attack, VICTORY!!
He attacks head office. We only have the cloud service there, so we divert head office to the cloud service too. VICTORY!

Meanwhile... the police got involved. Eventually we heard that they arrested someone from France who was ransoming loads of companies for 1 bitcoin...

That was a loooong three weeks. I think my colleague and I worked five weeks' worth of time in the space of that three-week period.
Safe to say we're now at the new DC with new equipment, on-prem devices and cloud services, and back to using BGP on both our links at the head office.

Phew!




dlots

Came to this job and wanted to know what the network looked like, so I asked for documentation... There isn't any... none. 200+ routers, an unknown number of switches and no documentation, and we test IA stuff here so no CDP or anything like that. So I started to make some, and here is what I found:
5 EIGRP AS numbers redistributed into one another (for no real reason)
Multiple overlapping IP ranges
VTP client/server mode with no passwords: found that out when I plugged in a new switch and took out the network (oops)
SIA flaps all the time
Incompatible IOS versions that made EIGRP flip out on a regular basis
Huge amounts of the routing done with static routes
1 router that was a horrid NAT hell, with inside/outsides randomly slapped on interfaces for no real reason
No QoS on the backbone gear


wintermute000

I had to log into a 2600 the other day (and conf t some non-trivial changes)... IOS 12.3 and all. Bonus points: via a BRI dial-on-demand circuit. Flashbacks to CCNA classes circa 2006.

I actually had to look up an RV042 lol.

Nerm

Due to NDAs I couldn't take a picture of something I got to see today, but I will describe it as best I can.

I go onsite to a customer HQ, and in the wiring closet I find that all the existing cable runs are documented. By documented I mean a piece of cardboard had the cable's destination written on it, and then a hole was poked through it with the cable going through the hole.

:zomgwtfbbq:

deanwebb

:wha?:

Cardboard? As in a cut-up piece of a box with marker on it?

Were the edges smooth or rough on the cardboard pieces?
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

hizzo3

Was it fire-resistant cardboard? Maybe they used it to keep the EMI crosstalk down by properly spacing their non-spec patch cables from China.

Nerm

Yep, just cut-up-box kind of cardboard. One of them even still had part of a UPS shipping label on it. lol

deanwebb

Quote from: Nerm on August 21, 2015, 07:09:28 AM
Yep just cut up box kind of cardboard. One of them even still had part of a UPS shipping label on it. lol

:facepalm3:

wintermute000

I don't see how that's that bad... Maybe I'm just old school and actually remember what a krone tool looks like, along with all the tags tied to the pairs with jumper wire lol. The cardboard is just a variant of this.

When I'm doing a quick and dirty cabling change and don't have the time/tools to use proper sticky labels, I write the port info on a piece of paper, make a hole in the paper and push the cable through it as an improvised tag. Many such tags still exist 'in production' LOL.

I also once installed a 4900M where the customer rack was too short to fit the rear rails, leaving the front rack nuts clinging on for dear life at a 15-degree or so angle. As there was no shelf to be had, the customer and I tied a fistful of cat5 together to hold up the back as an improvised cradle. I told him to go buy a shelf the next day. He tells me it's still working fine. This customer also happened to use this comms room as a combined cleaner's closet, and there was regularly broken glass on the floor, since the cleaners also threw broken furniture in there, along with spare furnishings (stock paintings etc.) and folding chairs, so par for the course.

Nerm

Paper I have seen, but cardboard? That was a first for me.

deanwebb

After the quick 'n' dirty, it's time to do the clean 'n' nice, rather than leaving the quick fix in place. Get a label gun and go to town on those wires.

AND LABEL BOTH ENDS.

deanwebb

New one...

Switch at main office goes down. CPU spiking bad on it. Culprit is determined to be scanning traffic.

Techs there find three devices doing the scanning: Server1, Server2, and NAC_SERVER_DEANWEBB_IS_RESPONSIBLE_FOR. OK, so that's not our naming convention, but that's kinda what they see. Two generic server names and IP addresses and one very well-named device that they immediately zero in on because it requires no imagination or investigation.

So the techs put an ACL on the switch that blocks all traffic from the NAC box.

Right about that time, people in the main office report that their wireless is dropping them.

A brief investigation uncovers that the NAC box gets the RADIUS request, but claims the WLC isn't set up for dot1x. The WLC says it sent the request, but never received a response. Well, duh!

NAC has not been a problem with its scans for over a year. It didn't need no blocking.

As for the other, generically-named servers? Turns out they were Qualys scanners set to full blast, and the switch was not exempt from their wrath. Nor did the switch have an ACL on it to block Qualys traffic to its management interface, like it was supposed to have.

:ivan:

Resolution: How about unblocking the NAC and doing a little more investigation? If a switch CPU just hit 100% and you have a vulnerability scanner set on "KILL", check the vuln. scanner FIRST, mmmkay?

:tmyk:

AnthonyC

Quote from: dlots on July 07, 2015, 07:02:16 AM
Came to this job and wanted to know what the network looked like asked for documentation... There isn't any... none, 200+ routers and unknown numbers of switches and no documentation, and we test IA stuff here so no CDP or anything like that.  So I started to make some, and here is what I found:
5 EIGRP AS numbers redistributed into one another (for no real reason)
Multiple over lapping IP ranges
VTP client/server mode with no passwords: found that out when I plugged in a new switch and took out the network (oops)
SIA flapps all the time
incompatible IOSs that made EIGRP flip out on a regular basis
huge amounts of the routing done with static routes
1 router that was a horrid NAT hell inside/outsides randomly slapped on interfaces with no real reason
No QoS on the backbone gear

That sounds almost typical.
"It can also be argued that DNA is nothing more than a program designed to preserve itself. Life has become more complex in the overwhelming sea of information. And life, when organized into species, relies upon genes to be its memory system."

NetworkGroover

Quote from: AnthonyC on September 29, 2015, 10:16:51 AM
That sounds almost typical.

Uhhhh... yup. Especially in that space, dlots. I tried not to laugh the other day when VTP was listed as a selection criterion for new vendors. I would never, ever use VTP. I'll take pushing VLANs with Ansible, or hell, even manual copy/paste, over VTP any day of the week for exactly that reason. I understand VTP3 is better but... *shudder* No... just... no...
Engineer by day, DJ by night, family first always

Nerm

Recently had this land in my "projects" folder.

Client bought a new server running Windows Server 2012 R2 Standard and is having trouble getting AD migrated from the old server. *Note: onsite only, as the old server is not connected to the internet.

I get onsite and find that the old server they are trying to migrate from is running Windows 2000 Server. WTF? This client has 3 full-time in-house IT personnel and not one of them thought to check the migration path. Worse, why are they still running a 2000 server to begin with?