Network Continuous Integration using Jenkins, Jinja2 and Ansible.

Started by burnyd, September 26, 2016, 06:09:20 AM

burnyd

I have been into the DevOps and agile life lately.  I have read the following books, which have really changed how I think about technology:


DevOps 2.0

Ansible for DevOps

Learning Continuous Integration with Jenkins


The DevOps 2.0 book is fantastic, and @vfarcic constantly updates it, so it is worth the price of admission.  I have said this many times before about DevOps: it is simply using open source tooling to run through a workflow in an automated fashion.  Continuous integration means constantly testing your code as it is integrated on its way into production.


We as network engineers commonly use the CLI or manual tasks to push changes into production because we know what we are doing will “just work.”  We have done it a million times and it has never failed.  We do not take into account human error or issues within the process, e.g. firewall rules, a wrong IP address, etc.


This is the DevOps way, and it should be followed for any new configuration work.


[Image: devopslife]


A script or configuration is submitted to Jenkins.  Jenkins schedules the task.  Ansible iterates over the script.  Checks and balances are done.  The script/config is then run on the switch.  This is another use case for Docker on a switch, as this could all be run in a test container before trying the code on the actual switch.  Finally, notifications that the test build succeeded are sent out through either Slack or email.
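The workflow above can be sketched in a few lines of Python. This is only an illustration of the idea (run each stage in order, stop at the first failure, then notify), not how Jenkins is implemented. The stage names and the notify callback are hypothetical stand-ins for the real build steps and the Slack/email notification.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage], notify: Callable[[str], None]) -> bool:
    """Run stages in order, stop at the first failure, then notify."""
    for name, step in stages:
        if not step():
            notify(f"FAILED at stage: {name}")
            return False
    notify("SUCCESS: all stages passed")
    return True


# Hypothetical stages standing in for the real Jenkins build steps.
messages: List[str] = []
demo_stages: List[Stage] = [
    ("dry run", lambda: True),
    ("apply config", lambda: True),
    ("verify on switch", lambda: True),
]
demo_ok = run_pipeline(demo_stages, messages.append)
print(demo_ok, messages)
```

In a real job each callable would wrap a shell step, and the notify function would post to a chat channel or send mail.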


So in this blog post we are going to do something really simple so any network engineer can follow along and get their feet wet with continuous integration.  I also highly recommend the three books I listed above.


Our topology is simple:

- 2 Oracle VirtualBox VMs

1.) Ubuntu 14.04 LTS

2.) Arista vEOS running 4.16

-Jenkins is running on the Ubuntu VM

-Ansible is running on the Ubuntu VM

-Python 2.7 is running on the Ubuntu VM


I will start with a simple line of configuration.  NTP is probably the easiest one-liner we as network people normally use.  An NTP server can be added to a switch with a single line:


ntp server 10.10.10.10


We are going to make this slightly fancier with Ansible and Jinja2 templates, so that if we ever wanted to change our NTP server we could do it across a large number of switches in an Ansible inventory.  The J2 template is very simple, but let's take a look at our Ansible structure first.

[Image: tree]

group_vars – holds all the variables, in this case the NTP server config
    veos.yaml – contains the NTP server
inventory – contains all the hosts
ntpserver.yaml – the Ansible playbook
scripts – optional directory; it holds a Python script we will get to later
templates – directory that holds the Jinja2 templates
    ntpserver.j2 – holds the configuration “ntp server x.x.x.x”


Let's first check out our Ansible playbook.


[Image: ntpp]

This playbook simply targets the veos hosts, which are defined in the inventory file.

The second step uses the eos_template module with the template located at templates/ntpserver.j2 and applies this NTP server to each host, like a giant for loop.
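To make the "giant for loop" concrete, here is a rough Python sketch of rendering one template for every host. It deliberately uses the standard library's string.Template as a stand-in for Jinja2 so it has no dependencies. The host names and the ntp_server variable name are assumptions, since the real files are only shown as images.

```python
from string import Template

# Stand-in for templates/ntpserver.j2; Jinja2 would write {{ ntp_server }}.
NTP_TEMPLATE = Template("ntp server $ntp_server")


def render_for_hosts(hosts, group_vars):
    """Return {host: rendered config line} for every host in the group."""
    return {host: NTP_TEMPLATE.substitute(group_vars) for host in hosts}


# Hypothetical inventory and group variables.
hosts = ["veos1", "veos2"]
group_vars = {"ntp_server": "10.10.10.10"}

rendered = render_for_hosts(hosts, group_vars)
for host, line in rendered.items():
    print(host, "->", line)
```

Changing the one value in group_vars changes the rendered line for every host, which is the whole point of keeping the server address out of the template.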


veos.yaml

[Image: ntpserver]

ntpserver.j2

[Image: ntpserverconfig.png]


This would simply add an NTP server to an Arista vEOS device as long as it is in the .eapi file and in the inventory file.


For those who are following so far, the next step is running this through CI.  That is where Jenkins comes in.  Jenkins can execute the playbook and run through its checks and balances.  Here this is extremely simple as we are not doing much.  Once again, the workflow is as follows:

[Image: devopslife]


Here is the job we will run.

[Image: jenkins]

Here is an example of the job.

[Image: job]


The first step is to run the playbook in “--check” mode.  This runs the playbook as a dry run and does not make any changes.  It is there to catch errors such as a problem connecting to a host in the inventory file, for example if either switch did not connect.


The second step is to apply the configuration.  This pushes the configuration to the switch as it is rendered from the Jinja2 templates.
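As a sketch of these first two build steps, the snippet below builds the two ansible-playbook command lines, one with --check and one without. The file names follow the directory layout described earlier. The helper function itself is hypothetical, and the exact commands in the Jenkins job are only shown in the screenshot.

```python
import shlex


def playbook_command(playbook, inventory, check=False):
    """Build the argv list for one ansible-playbook invocation."""
    argv = ["ansible-playbook", "-i", inventory, playbook]
    if check:
        argv.append("--check")  # dry run: report, but change nothing
    return argv


# Step 1 is the dry run, step 2 applies the change for real.
dry_run = playbook_command("ntpserver.yaml", "inventory", check=True)
apply_run = playbook_command("ntpserver.yaml", "inventory")

print(shlex.join(dry_run))
print(shlex.join(apply_run))
```

Either list could be handed to subprocess.run(argv, check=True), so that a non-zero exit from Ansible fails the build step.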


The last step is a Python script, shown here.

[Image: python]
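The script itself only survives as an image, so the following is not the original. It is a minimal sketch of the kind of check such a script could do: look for the expected "ntp server" line in the running config and return an exit code Jenkins can act on. The sample config text is hypothetical, and the parser assumes the simple "ntp server <address>" form (no VRF keyword).

```python
def configured_ntp_servers(running_config):
    """Return the addresses from every 'ntp server <addr>' line."""
    servers = []
    for line in running_config.splitlines():
        words = line.split()
        if len(words) >= 3 and words[:2] == ["ntp", "server"]:
            servers.append(words[2])
    return servers


def exit_code(running_config, expected_server):
    """0 if the expected server is configured, 1 otherwise."""
    return 0 if expected_server in configured_ntp_servers(running_config) else 1


# Hypothetical config text; a real script would fetch it from the switch.
sample_config = "hostname veos1\nntp server 10.10.10.10\n"
print(exit_code(sample_config, "10.10.10.10"))
```

Passing the result to sys.exit() would make the Jenkins step fail whenever the server is missing.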


Alright, enough typing.  Let's go ahead and run the job and look at the console output in Jenkins CI.

[Image: jenkinsci]

We were able to run through the tests and everything is successful.  Checking the switch, it shows the new NTP server of 10.10.10.10.


Let's purposely fail this setup by trying to set the NTP server to something bogus that the switch would never accept.


[Image: failure]


This time it failed.  The first dry run works because it can connect.  The second execution fails when it tries to add the commands to the switch.

Typically at this point either an email or a chat notification is sent out to make the rest of the team aware.
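One way to catch this kind of failure even earlier is a pre-flight check on the variable before Ansible ever touches the switch. The sketch below is an addition to the workflow, not part of the original job. It assumes the NTP server is always given as an IP address rather than a hostname, and the bogus value is a made-up example.

```python
import ipaddress


def is_valid_ntp_address(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


# The second value is a hypothetical bogus entry, like the failure test above.
for candidate in ("10.10.10.10", "10.10.10.999"):
    print(candidate, is_valid_ntp_address(candidate))
```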


This was a good exercise in walking through CI for network changes.  This is for sure the way to go for testing and checking network changes in live production.


 


 
Source: Network Continuous integration using Jenkins,Jinja2 and Ansible.

deanwebb

Wow... I just realized that this is what I'm doing with my Tufin SecureChange workflow for firewall code provisioning. This tool adds in checks against existing security policies before clearing for deployment.

SDN isn't just coming, it's here. And it's a good thing.
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

burnyd

Yes, Jenkins is great. It has been around for a very long time and is completely battle ready.  Think of it like crontabs on steroids.

Deployments and changes should be treated the same way applications are treated. 

deanwebb

So... can it check out the potential impact of code upgrades?

burnyd

Yep, that's the whole reason for it, for the most part.  So let's walk through what a dev would generally do.

1. Check out the code
2. Run unit tests
3. Build binaries and other required artifacts
4. Deploy the service to the staging environment
5. Run functional tests
6. Deploy the service to the production-like environment
7. Run production readiness tests
8. Deploy the service to the production environment
9.  Run production readiness tests

What are you working with?  If I wanted to check out a code upgrade for, say, the Arista switches in this example, we could do the following.

1.) Get a similar environment.
2.) Upgrade the code in that similar environment, e.g. vEOS in VMware.
3.) Once upgraded, have Jenkins kick off a script to make sure all the same routes, MACs, etc. are still there.
4.) Okay, they were all there.  Do all the things you deployed before, e.g. BGP, OSPF, BFD, still work?  Kick off more scripts on or against the switch.
5.) Okay, did it work?  Yes, it worked.  Yay.
6.) Upgrade in production.
7.) Run the exact same production tests.  Did they work?
8.) Okay, they worked.
9.) Send the success email off to everyone.

Now all of these are accessible because there are open APIs and it's Linux.  I would say it depends on the hardware you are working on.
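Step 3 above (checking that the same routes and MACs are still there after the upgrade) could be sketched roughly like this. The prefixes are made up, and collecting the snapshots from the switch is left out, since that depends on the API of the platform in question.

```python
def compare_snapshots(before, after):
    """Return entries that disappeared or appeared across the upgrade."""
    before_set, after_set = set(before), set(after)
    return {
        "missing": sorted(before_set - after_set),
        "new": sorted(after_set - before_set),
    }


# Hypothetical route prefixes captured before and after an upgrade.
pre = ["10.0.0.0/24", "10.0.1.0/24", "192.168.1.0/24"]
post = ["10.0.0.0/24", "192.168.1.0/24", "172.16.0.0/16"]

diff = compare_snapshots(pre, post)
print(diff)
```

A Jenkins step could fail the build whenever the "missing" list is not empty.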

deanwebb

That's the basic outline, but will it also work for different Cisco licensing levels, since those affect the features available? I've seen upgrades to 15.0 code go well for one kind of license and fail for other types of license, with what we were doing. I suppose if Cisco allowed a license to exist on a set of CBD/CBI routing system for tests like these, that would be well.

wintermute000

That is awesome.
Are there any jenkins smarts to parsing ansible stdout or python stdout? e.g. interpreting ping or traceroute or even show results. Or would you have to code the logic into your python originally?
So jealous of how everything is precanned for linux stuff but we're stuck on doing tonnes of expect style manual parsing

deanwebb

Just got off a call about using Tufin for our AWS deployment. I am so glad I had read this blog yesterday, because Jenkins came up in the conversation when they asked if we could incorporate a Tufin API to the Jenkins scripting to validate security on an AWS deployment.

Mr. Burnyd, you have earned a helpful vote from me for this blog.

:rock:

NetworkGroover

Quote from: burnyd on September 26, 2016, 12:54:45 PM

Love it.
Engineer by day, DJ by night, family first always

burnyd

Quote from: wintermute000 on September 27, 2016, 05:08:18 AM
That is awesome.
Are there any jenkins smarts to parsing ansible stdout or python stdout? e.g. interpreting ping or traceroute or even show results. Or would you have to code the logic into your python originally?
So jealous of how everything is precanned for linux stuff but we're stuck on doing tonnes of expect style manual parsing

I honestly do not know, to tell the truth.  If this were available through an API, I would ask the device to do a ping, traceroute, etc., have it return JSON, and then do an == or != comparison, tbh.  If it fails, it fails, and Jenkins does the rest.  I have not done it, but that is actually a really good use case.
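A minimal sketch of that idea, untested against any real device: parse the JSON the device returns and turn the comparison into an exit status. The payload and its field names are hypothetical, since real field names depend on the device API.

```python
import json


def ping_passed(raw_json, max_loss_percent=0.0):
    """True if reported packet loss is within the allowed threshold."""
    result = json.loads(raw_json)
    return result["packet_loss_percent"] <= max_loss_percent


# Hypothetical payload; real field names depend on the device API.
sample = (
    '{"target": "10.10.10.10", "sent": 5, "received": 5, '
    '"packet_loss_percent": 0.0}'
)

status = 0 if ping_passed(sample) else 1
print(status)
```

Jenkins marks a shell build step as failed on a non-zero exit status, so no parsing logic is needed on the Jenkins side.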

I do not want to sound like a walking, talking ad, but this all relates to why you need telemetry in your network, because everything, and I mean eevveerrrryyttthiinggg, is streamed live via OpenConfig via JSON.

packetherder

Quote from: wintermute000 on September 27, 2016, 05:08:18 AM
That is awesome.
Are there any jenkins smarts to parsing ansible stdout or python stdout? e.g. interpreting ping or traceroute or even show results. Or would you have to code the logic into your python originally?
So jealous of how everything is precanned for linux stuff but we're stuck on doing tonnes of expect style manual parsing

Haven't messed with Jenkins, but Ansible has the ability to add custom callback modules that let you return play/task results however and in whatever format you want. That'd probably make doing $magic in Jenkins feel less screen-scrapy.
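A lighter-weight variation on the same idea, sketched here under assumptions: instead of writing a custom callback, have Ansible emit JSON on stdout (for example with its json stdout callback) and let a small script summarise it for Jenkins. The "stats" layout below, per-host counters such as failures and unreachable, is what that callback is assumed to produce. The sample data is made up.

```python
import json


def failed_hosts(ansible_json_output):
    """Return hosts whose stats show failed or unreachable tasks."""
    stats = json.loads(ansible_json_output).get("stats", {})
    return sorted(
        host
        for host, counts in stats.items()
        if counts.get("failures", 0) or counts.get("unreachable", 0)
    )


# Hypothetical run summary for two switches.
sample_output = json.dumps({
    "stats": {
        "veos1": {"ok": 2, "changed": 1, "failures": 0, "unreachable": 0},
        "veos2": {"ok": 1, "changed": 0, "failures": 1, "unreachable": 0},
    }
})

print(failed_hosts(sample_output))
```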

wintermute000