Cisco is at it again!! (Stable releases... NOT)

Started by LynK, May 26, 2015, 03:23:53 PM

icecream-guy

#15
Quote from: LynK on June 30, 2015, 12:43:47 PM
@ristau,


How are you allowed to go ahead with upgrades during normal operational hours (or even off hours)? If I say I am going to upgrade our 7Ks, even with no outage (ISSU <3), the whole company goes nuts. Honestly... when I upgrade my core, it is normally a year or two before I upgrade to a new-old version that has been tried and true...

I don't know... call me a baby.. but my upper management would flip bricks.


Access/Distrib. A O K. no problems upgrading.... core... DIR gives me the :zomgwtfbbq:


Usually it's the same for us, but the mainframe team has been complaining about "slow replication" since the new data center was built. After much troubleshooting and discussion with IBM and Cisco, IBM's recommendation was to implement flow control on the interface, and Cisco recommended the 7.0 release, which supports that requirement. Upper management got wind of it and, in order to stop the complaints and "fix" the issue, were all in favor. The CR went through the full process, and notification went out with a "possibility of network interruption". AS did a bug scrub, which came back good, and we ran the code on a non-production switch for more than a week with no issues. Goes to show you just never know. Monitoring and vigilance are required.

The best thing, other than monitoring and vigilance, is to document exactly what is occurring and when, with the results from each step. When I was interrogated by the deputy director and the operations manager, I would have looked a lot better if I had had all my P's and Q's in alignment and had not had to guess what happened and when.

So as of today, I am logging my device access using the logging feature in SecureCRT, so everything I type is logged for reference. I also changed the PS1 value on my jumpbox through .bashrc, so that I have a date and time stamp at every command prompt. I just need to figure out if I can do the same date-and-time prompt in the Cisco CLI, because when I'm on a terminal server jumping between devices, there are no Unix-style date/time stamp prompts for reference.
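For the terminal-server case, where the device prompt itself carries no timestamp, one option might be to stamp the session log rather than the prompt. A minimal sketch, assuming the session text can be fed line by line through a small filter; `stamp_lines` and the time format are illustrative choices, not something SecureCRT or the Cisco CLI provides.

```python
from datetime import datetime
from typing import Callable, Iterable, Iterator


def stamp_lines(lines: Iterable[str],
                now: Callable[[], datetime] = datetime.now) -> Iterator[str]:
    """Yield each input line prefixed with the time at which it was read.

    `now` is injectable so the behaviour can be checked with a fixed clock.
    """
    for line in lines:
        ts = now().strftime("%Y-%m-%d %H:%M:%S")
        # Strip trailing CR/LF so serial-console output does not double up.
        yield f"[{ts}] {line.rstrip(chr(13) + chr(10))}"
```

Fed with `sys.stdin`, the same generator can sit at the end of a pipe and print each stamped line as it arrives.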

:professorcat:

My Moral Fibers have been cut.

icecream-guy

Quote from: ristau5741 on June 30, 2015, 07:18:14 AM
Hit another interesting tidbit this morning.

Upgraded our 9K distro switches from 6.1(2)I3(2) to 7.0(3)I1(2) to enable interface flow control, which is available in the new release, for an application that needs it. Core 9Ks are still running 6.1(2)I3(2).

The OSPF process broke, which caused a summary route from the core not to propagate to the OSPF peer distribution switches, leaving an island of unhappy devices that had nowhere to route. :(

TAC case opened and copying show-tech files now. Can't wait to see what they say.


turns out to be a bug
wintermute000

And you're doing it the safe way (NX-OS). You should talk to the guys in my mob doing ACI. The horror.
Even in the lab (training) we were seeing crazy behaviour. Stuff mysteriously not working, then working 2 minutes later with no intervention, etc.


deanwebb

Quote from: wintermute000 on July 11, 2015, 05:54:35 AM
And you're doing it the safe way (NX-OS). You should talk to the guys in my mob doing ACI. The horror.
Even in the lab (training) we were seeing crazy behaviour. Stuff mysteriously not working, then working 2 minutes later with no intervention, etc.



Random stuff that happens without intervention = hardware craziness. Something is making that hardware do things that it does not want to do.
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

NetworkGroover

Quote from: wintermute000 on July 11, 2015, 05:54:35 AM
And you're doing it the safe way (NX-OS). You should talk to the guys in my mob doing ACI. The horror.
Even in the lab (training) we were seeing crazy behaviour. Stuff mysteriously not working, then working 2 minutes later with no intervention, etc.

Heh, shocker. 

Any positive feedback been given?  Curious to know where it does well in addition to where it has issues.
Engineer by day, DJ by night, family first always

NetworkGroover

Quote from: deanwebb on July 11, 2015, 10:41:52 AM
Quote from: wintermute000 on July 11, 2015, 05:54:35 AM
And you're doing it the safe way (NX-OS). You should talk to the guys in my mob doing ACI. The horror.
Even in the lab (training) we were seeing crazy behaviour. Stuff mysteriously not working, then working 2 minutes later with no intervention, etc.



Random stuff that happens without intervention = hardware craziness. Something is making that hardware do things that it does not want to do.

Software programs the hardware... especially on switches....

wintermute000

I wouldn't be so quick to dismiss the Cisco or VMware juggernauts, even if Arista is the current leader in programmable hardware switching. In fact, Clos designs and SDN may progress to a point where the physical switching is commoditised even further. VMware is out there pushing the 'fabric is irrelevant, just make it L3 multipathing' party line HARD. Who cares if your Arista or Juniper or Chinese knockoff vendor has an API? All leaf switches have the same bog-standard OSPF config or whatever.


Both VMware and Cisco ACI are all about the overlay.


Where ACI has a distinct advantage - and disadvantage - is that everything is redefined around the actual software flow. Heck, even a VLAN in ACI is no longer a VLAN (the 'traditional' VLAN is actually what Cisco calls a bridge segment and what everyone else calls a VXLAN ID, but it's still there as an identifier. I could go on, but I'll just say read the book ROFL). The key is that policy drives everything: you have to define what's allowed to pass and what's not. And the ACI managers are not in the control plane at all, unlike every other SDN solution including VMware. Then you trust the magic overlay special sauce to do its business. Just working out a basic packet flow is like Inception if you want to peel back all the layers and go from the abstraction to the overlay mechanics to the underlay beneath the overlay. But put it this way: you can't think in terms of NICs with IP addresses and MAC addresses in a VLAN routed like XYZ. The entire flow is broken down into policy constructs and defined accordingly. This potentially enables massive benefits in terms of orchestration and not having to worry about layer 3 design or flow.
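To make the "policy drives everything" point a bit more concrete, here is a toy allow-list model in which traffic between two groups passes only if a rule explicitly permits it. This is not the ACI object model or its API; the group names, the `Rule` type and `is_allowed` are all invented for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Rule:
    """One explicitly permitted flow between two groups (hypothetical model)."""
    src_group: str
    dst_group: str
    protocol: str
    port: int


def is_allowed(policy: FrozenSet[Rule], src_group: str, dst_group: str,
               protocol: str, port: int) -> bool:
    """Return True only when the exact flow is listed in the policy."""
    return Rule(src_group, dst_group, protocol, port) in policy


# Hypothetical three-tier application: web talks to app, app talks to db.
POLICY = frozenset({
    Rule("web", "app", "tcp", 8080),
    Rule("app", "db", "tcp", 5432),
})
```

Nothing here mentions IP addresses, MAC addresses or VLANs, which is the shift in thinking described above: with this policy, web-to-db traffic is simply not permitted because no rule names it.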


The flip side is, it's hard as hell to understand, everything takes 50 times as long unless you're going to replicate it 1000 times via an API call, and you're at the complete mercy of the 'plug and pray' underlay/overlay magic sauce. Also, the feature set is not mature yet: the service insertion (basically allowing 3rd-party appliances such as FWs, load balancers etc. into the packet flow) is completely stuffed and very basic. Finally, their ability to reach into the virtual layer, which is absolutely critical for end-to-end control/abstraction, is partly stymied by VMware. It's unclear how the integration will work with vSphere 6 (in 5.5 the ACI actually programs the dvSwitching transparently).


VMware hasn't gone halfway as far, and has an immediately attractive, easy-to-learn model of basically making virtual facsimiles of current R&S paradigms and removing some of the hairpinning, more or less. The underlay/overlay model is still there. But the flip side is that they have no input into the physical layer, nor do they enable the potential benefits of software-defining all the traffic. You're still basically thinking in terms of connecting virtual NICs to virtual switches.


Disclaimer: I have done a lot of training on both platforms and done a VCP-NV, but have yet to work on either in production.

deanwebb

If it's incredibly complicated, then one of two things will happen:

1. Lots of people walking in and out of the data center and all around the corporate floor with big thumb drives until the network guys figure out how to give everyone proper access.

2. The GUI is pretty decent and gets everything running... until a code upgrade somewhere leads to a memory leak somewhere else and then, WHAMMO! Network is down and everyone is using guest wireless and Dropbox to share files.

icecream-guy

Quote from: ristau5741 on July 10, 2015, 10:10:48 AM
turns out to be a bug

got the id this morning

CSCuv24226.

it is very specific.
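The upgrade in this story left the distro and core switches on different releases. A rough sketch for flagging that kind of mix from an inventory, assuming version strings of the form 6.1(2)I3(2); the device names, `parse_nxos_version` and `group_by_major` are made up for illustration and say nothing about which combinations actually trigger the bug.

```python
import re
from typing import Dict, List, Tuple

# Matches strings such as "6.1(2)I3(2)" or "7.0(3)I1(2)".
_NXOS_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)\)I(\d+)\((\d+)\)$")


def parse_nxos_version(version: str) -> Tuple[int, ...]:
    """Split a version string into a tuple of integers that sorts naturally."""
    match = _NXOS_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(group) for group in match.groups())


def group_by_major(inventory: Dict[str, str]) -> Dict[int, List[str]]:
    """Group device names by major release; more than one key means a mix."""
    groups: Dict[int, List[str]] = {}
    for device, version in inventory.items():
        major = parse_nxos_version(version)[0]
        groups.setdefault(major, []).append(device)
    return {major: sorted(devices) for major, devices in groups.items()}
```

If `group_by_major` returns more than one key, the environment is running mixed major releases and is probably worth a closer look before and after a change window.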

NetworkGroover

Quote from: wintermute000 on July 12, 2015, 02:03:04 AM
No sweat - good analysis, just will be interesting to see the results when the rubber hits the road so to speak.  I said "shocker", because that's all the feedback I've heard so far.  Everyone talks about how wonderful ACI is supposed to be, but horror stories in actual implementation.  The devil is always in the details.  Looking forward to hearing more about it and your experiences with it - good and bad.

icecream-guy

found out the hard way

So on certain releases of IOS code,
when you SNMP poll MIB 1.0.8802.1.1.2.1.5.4795,
or in English lldpXMedMIB,

it makes your CPU go to 100% and stay there.


Bug ID CSCuu05714
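
While running affected code, one way to avoid tripping this might be to keep the poller out of that subtree. A minimal sketch of an OID subtree check, assuming the poller can be given a filter function; `in_subtree` and `LLDP_XMED_MIB` are illustrative names, not part of any SNMP library.

```python
from typing import Tuple

# lldpXMedMIB, the OID quoted in the post above.
LLDP_XMED_MIB = "1.0.8802.1.1.2.1.5.4795"


def _parts(oid: str) -> Tuple[int, ...]:
    """Turn a dotted OID (with or without a leading dot) into integers."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def in_subtree(oid: str, root: str = LLDP_XMED_MIB) -> bool:
    """Return True if `oid` is `root` itself or sits anywhere beneath it."""
    oid_parts, root_parts = _parts(oid), _parts(root)
    return oid_parts[:len(root_parts)] == root_parts
```

Comparing component by component, rather than with a plain string prefix, keeps an OID ending in 47951 from being mistaken for a child of 4795.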

deanwebb

Quote from: ristau5741 on July 23, 2015, 11:01:25 AM
Should I try that on my devices in the data center?
:challenge-considered:

icecream-guy

Today's bug is related to the 1000v, as we were adding licenses.
Bug ID CSCut56474 - I can't see the bug on CCO due to insufficient privileges.

But it's just that when you have version 1 licenses and install a version 3 license, all your version 1 licenses become invalid,
and it looks like we'll need to buy upgrade licenses to replace the version 1 licenses with version 3 licenses.
There is a specific license upgrade product (L-N1K-CPU-V3UP-01) to resolve this.

wintermute000

As a card-carrying VCP, I ask again: what earth-shattering improvement do you get from a 1000v over a stock 5.5 dvSwitch? Nothing. So the networking guys get control again? That's political, not technical. Letting them learn VMware is the better solution.
Did I also mention the 1000v is not compatible with ACI, nor is it usable with vSphere 6?