Since we're all patching ASAs these days....

Started by icecream-guy, February 09, 2018, 09:44:01 AM


icecream-guy

What steps do you take to validate the functionality when you just loaded untested code in the lab?

for me:
did the firewall come back online?
can I ping the management interface?
can I login to the device?
Check logs (any interesting events such as tracebacks)
check failover status (if configured)
Check interface counters to validate interfaces are passing traffic
check arp table
check local-host table
ping something outside the network
ping devices in inside network(s)
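
A checklist like this can be run the same way after every reload. Here is a minimal sketch, assuming each check can be wrapped in a Python callable that returns True or False. The names run_checklist and ping_ok are made up for illustration, and the ping flags assume Linux iputils (-c count, -W timeout in seconds).

```python
import subprocess
from typing import Callable, Dict, List, Tuple


def ping_ok(host: str) -> bool:
    """Send one ICMP echo; True if ping exits 0 (assumes Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_checklist(checks: List[Tuple[str, Callable[[], bool]]]) -> Dict[str, str]:
    """Run every check in order and record PASS, FAIL, or ERROR per item.

    An exception in one check is recorded and does not stop the others,
    so one broken check cannot hide the rest of the results.
    """
    results: Dict[str, str] = {}
    for name, check in checks:
        try:
            results[name] = "PASS" if check() else "FAIL"
        except Exception as exc:
            results[name] = f"ERROR: {exc}"
    return results


# Example (addresses are placeholders from the documentation range):
# checks = [
#     ("ping management interface", lambda: ping_ok("192.0.2.1")),
#     ("ping outside host", lambda: ping_ok("198.51.100.1")),
# ]
# for name, status in run_checklist(checks).items():
#     print(f"{status:6} {name}")
```

The manual items (logs, failover status, arp and local-host tables) still need eyes on the device; this only covers the checks that reduce to a yes/no answer.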
:professorcat:

My Moral Fibers have been cut.

SofaKing

A few years back we upgraded a pair of Palo Alto firewalls, and after the upgrade they started to drop voice traffic. The upgrade was done off hours, so we did not find out about the issue until the morning. It turned out to be a bug in the version we upgraded to, and we had to drop back to another version in the middle of the day. Lots of unhappy people. So I now place inbound, outbound, and internal calls after every upgrade. Slim chance it will happen again, but it's on my list.
Networking -  You can talk about us but you can't talk without us!

Otanx

Before loading
-Read release notes for known issues
-Check online for any issues being reported (here, Reddit, etc.).
-Download, and verify checksum
-Push to lab, verify checksum, wr mem, reload
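
The "download, and verify checksum" step can be scripted on the staging host before the image goes anywhere. A minimal sketch, assuming the vendor publishes a SHA-512 value for the image; image_digest and checksum_matches are illustrative names, and the algorithm is a parameter in case only an MD5 value is available.

```python
import hashlib


def image_digest(path: str, algorithm: str = "sha512",
                 chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str,
                     algorithm: str = "sha512") -> bool:
    """Compare against the published value, ignoring case and whitespace."""
    return image_digest(path, algorithm) == expected_hex.strip().lower()
```

This only covers the copy on the staging host. The copy on the device still needs to be verified on the device itself after the push.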

Lab tests
-Verify it booted (usually watch the console)
-Make sure we can login, enable, config t
-Do a show run look for any obvious differences
-Do a roll back to the previous version, and upgrade again. It sucks if you get to prod and don't know how to do a rollback.

Pre-prod - we have a site in production that has no production use. It is a mirror of our normal deployment, and we can generate traffic
-Push image, validate checksum, wr mem, reload
-Make sure it comes back up
-Check functions
-- TACACS
-- IPSec - clear the tunnel, and make sure it re-establishes
-- SNMP - Verify ifIndex didn't change, make sure monitoring tools show everything as OK.
-- show conn - we should see a few connections, and if needed can generate some.
-- RANCID - use clogin, and then rancid-run -r hostname.
-- Credentialed Nessus scan (for our cyber team)
-- Roll back and upgrade again, and run Pre-prod tests again
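
The ifIndex check in the list above is easy to automate if you save the interface-name-to-ifIndex mapping before and after the reload. A minimal sketch, assuming both mappings have already been collected (for example by walking ifDescr in IF-MIB) and loaded into dicts; diff_ifindex is an illustrative name and the collection step is left out.

```python
from typing import Dict, Optional, Tuple


def diff_ifindex(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return {interface: (old_index, new_index)} for every difference.

    None on one side means the interface is missing from that snapshot.
    An empty result means the ifIndex values survived the upgrade.
    """
    changes: Dict[str, Tuple[Optional[int], Optional[int]]] = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name)
        new = after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes
```

Anything this returns is worth fixing in the monitoring tools before moving on to the next stage.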

Prod-Stage 1 - Upgrade sites within driving distance (about 5 right now)
-Same steps as Pre-prod, but no roll back, and no scan (will get hit on normal schedule)
-Wait till next morning for problems to show. During this time we can stage the rest of the sites

Prod-Stage 2 - Do the rest
-Same as Prod-Stage 1
-Close upgrade ticket

We can get from reading release notes, to doing Stage 1 in about four hours if there are no problems. We would then upgrade the Stage 1 sites at lunch. Then do Stage 2 the next morning. Finishing around lunch.

-Otanx

icecream-guy

Quote from: Otanx on February 09, 2018, 01:08:29 PM

We can get from reading release notes, to doing Stage 1 in about four hours if there are no problems. We would then upgrade the Stage 1 sites at lunch. Then do Stage 2 the next morning. Finishing around lunch.

-Otanx

That must be nice. We have to coordinate with the customer that owns the firewall, schedule the upgrade, and line up their people to check services afterward.
Then we upgrade, wait until they finish all their checks and sign off, and only then can we move on to scheduling the next one. The CM process takes about 2 weeks before we can upgrade... many hoops to jump through.
:professorcat:

wintermute000

At a minimum I'd say you'd want to run your critical apps and DMZ services/NATs through the FW and confirm the sessions are being established and torn down correctly.
Ditto with all VPNs and client VPN.
You should be able to script something from a VPS that hits all your DMZ services on the appropriate ports automatically (Python requests module, simple bash script w/ curl, etc.).

Of course if you're cycling through 200 at a time then your time is limited, but that's all the more reason to automate it. Remember this kind of automation is portable: you can run your new Palos or whatever through the same tooling once you get out of ASA hell.
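
A rough sketch of that kind of script, using only the Python standard library instead of requests or curl. It assumes a plain TCP connect from outside is a good enough first test for each DMZ service; check_tcp and check_services are illustrative names, and the hosts and ports in the example are placeholders.

```python
import socket
from typing import Dict, Iterable, Tuple


def check_tcp(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(
    services: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Try every (host, port) pair and report which ones accepted a connection."""
    return {(host, port): check_tcp(host, port, timeout) for host, port in services}


# Example with placeholder addresses:
# for (host, port), ok in check_services([("192.0.2.10", 443),
#                                         ("192.0.2.11", 25)]).items():
#     print(f"{'OK' if ok else 'FAILED':6} {host}:{port}")
```

A successful connect only shows the session can be established through the FW. Confirming teardown, or that the application behind the NAT actually answers, still needs a protocol-level check or a look at the connection table.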

deanwebb

Quote from: SofaKing on February 09, 2018, 12:10:06 PM
A few years back we upgraded a pair of Palo Alto firewalls, and after the upgrade they started to drop voice traffic. The upgrade was done off hours, so we did not find out about the issue until the morning. It turned out to be a bug in the version we upgraded to, and we had to drop back to another version in the middle of the day. Lots of unhappy people. So I now place inbound, outbound, and internal calls after every upgrade. Slim chance it will happen again, but it's on my list.

^QFT. We think we have a comprehensive list and then something like this happens... and then we get a new line-item on the list.

I know of companies that have development environments that lead to integration environments that lead to production environments. It can take as much as a month, more likely two months, to get from dev to prod for software. The network in dev/int environments is pretty much like the environment in prod, but there's no customer connection, so there's always a question mark about it working with all the perimeter gear when it goes into prod.

And if it doesn't work, they add another line to all future testing.
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

Nerm

You guys test before upgrading in your production environments? What a novel idea.

:gangsta:

deanwebb

Quote from: Nerm on February 16, 2018, 08:09:40 AM
You guys test before upgrading in your production environments? What a novel idea.

:gangsta:

Well, sometimes it's upgrading to a portion of production and then hoping it doesn't break or ripple out across the network...

:explosion2: