Server Automation

Selecting Software

One task I've often done is herding servers, meaning that I need to log in to each member of a group of computers (for instance "all" or "all name servers") and run the same commands or walk the same decision tree ("if this then that"). The first time was in 1997, when I was tasked with patching 60 physical Solaris machines: the same command to run security and recommended patches on each, then a decision, based on the patch output, about whether a reboot was required. Every time since then that I have needed to apply patches, I have spent some time looking for a way to ease the monotony of issuing the same commands (with only the hostname and patch output varying) 60 or more times in a row.

XKCD: pass the salt

Patching is one example, but the general problem of applying the same known actions to a group of computers is common. Sixty sounds like a small collection of computers to me now; at larger scale, the need for this automation, along with the requirement for consistency within the group, is even greater.

I have an informal hierarchy of solutions to the general problem:

  1. The ssh macro: This tool uses ssh to run the same command on every host identified. Parallel operation is faster than one host at a time, but the output may come back interleaved in the order responses arrive, so serial execution can be worth the wait. One example of a serial ssh macro is sssh, which is fairly short and written in bash. (I submitted patches to it many years ago, so I do not recommend it unless you're ready to maintain it too.) Given the utter simplicity of this approach (a few short bash scripts!), I expect any ssh macro I use to make the situation simpler and to have reusable components; if I start complaining about installing or maintaining it, it's not simple enough. A minimal sketch of the idea appears after this list.
  2. Automation:
    1. Steady-state: Steady-state automation is when you have a baseline, like "all ntp.conf files will look like this," and this tool enforces that baseline on a regular schedule.
    2. Ad hoc: Sometimes you need to do something right now, not on the baseline schedule. This is when the ad hoc tool is useful! If you already have a steady-state tool that you like, an ssh macro can always handle the ad hoc chores. You might want an ad hoc tool to run patches as soon as you hear about a zero-day exploit, even before the next scheduled run of patching. However, I prefer to use fewer tools when I can.
    Beyond those two sub-classes, I will assume the need for automation is obvious: automation offers consistency without the repetitious boredom.
  3. Orchestration: The final stage is orchestration, where the automation tool knows to apply dependencies first, and knows how to coordinate ordered actions between managed entities.
The difference between automation and orchestration in this document is that automation simplifies the steps to complete a task, while orchestration simplifies coordinated work across your whole network.
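
To make item 1, the ssh macro, concrete, here is a minimal sketch of the serial flavor in bash; the hosts.txt file name and the sshmacro.sh usage line are placeholders of my own, not features of any particular tool:

#!/bin/bash
# minimal serial ssh macro: run the same command on every host in a list
# usage: ./sshmacro.sh 'uptime'
HOSTLIST=${HOSTLIST:-hosts.txt}   # one hostname per line
CMD="$*"
while read -r host
do
  echo "=== ${host} ==="
  ssh -o BatchMode=yes "${host}" "${CMD}"
  echo
done < "${HOSTLIST}"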

XKCD: how long you can work on making a routine task more efficient before you're spending more time than you save

I will use director to refer to the automation controller, and targets for the managed hosts. My wish list for server automation is:

Also, the tool you select may handle provisioning, or bringing up new servers may be a different task; adjust your wish list accordingly.

Some of the popular tools in this space are Puppet, Chef, Ansible, Fabric, and Salt. The Takipi blog post Deployment Management Tools: Chef vs. Puppet vs. Ansible vs. SaltStack vs. Fabric summarizes them this way:

Chef and Puppet are some of the older, more established options, making them good for larger enterprises and environments that value maturity and stability over simplicity. Ansible and SaltStack are good options for those looking for fast and simple solutions while working in environments that don't need support for quirky features or lots of OSs. Fabric is a good tool for smaller environments and those looking for a more low lift and entry level solution.
I have no experience with Chef, but it requires a client on the target, so I continued looking at the others.

I have used Puppet for about 7 years, experiencing the pain of Ruby upgrades, the pain of getting Puppet certificates accepted during an upgrade, and the more frequent pain of needing to read at least three configuration files just to figure out what changes would be applied to a specific host. Puppet handles only steady-state automation unless you resort to kludges. Puppet feels heavy to me, so I continued to look.

I attended a "Welcome to Ansible 2.0, now with network" meetup a while back, where Ansible was positioned as the ad hoc counterpart to Puppet's steady state. At that time, Ansible had no plans to support Python 3, and it seemed like a heavy approach to the ssh macro, so I continued to look for something that felt simple to me.

I read Jens Rantil on Salt vs Ansible, and I liked his positive experience with community support for Salt. The remaining choices were Fabric and Salt (or starting over, relaxing or removing some of my requirements). I was intrigued by Salt after reading this stackoverflow comment:
You can use Salt both for configuration management and ad-hoc orchestration tasks.
Then a reddit thread pointed out that Fabric and ssh macros top out at roughly 700 to 1000 targets, while Salt can scale beyond that, in parallel, without interleaving responses. I knew I wanted to start with Salt after reading that: it has scalability and both flavors of automation. I had also just watched Network Automation with Salt and NAPALM (contains links to slides and to the YouTube recording) at NANOG 68, so I knew Salt, like Ansible, supports network devices as well. Although it can run without one, Salt does prefer a client on the targets. Python 3 support is in the works, but Salt is still mostly Python 2. I gave it a try first anyway, because no automation is a very poor choice.

What Salt Can Do

After a week of throwing myself into Salt, I had learned enough useful things to decide (happily) that I would continue to use it at home, where I currently have four physical desktops, two single-board computers, and ten VMs, all running Linux. The Salt Tutorials are good, and beyond that, I've had good luck searching for other tutorials and general help on Salt.

Some additional vocabulary is useful. The director is called the Salt master, and the targets are called minions. The baseline configuration is called the highstate. Grains are data about a minion that are stored on or generated from the minion. States are defined in a SaLt State file, like /srv/salt/vim.sls (the path can be nested more deeply). For example:

        # ensure the vim package is installed
        vim:
          pkg.installed: []

        # manage /etc/vimrc from the copy at salt://vimrc on the master
        /etc/vimrc:
          file.managed:
            - source: salt://vimrc
            - mode: 644
            - user: root
            - group: root

To enforce that state, run sudo salt '*' state.apply vim. The highstate (baseline) is defined in top.sls. Pillars are tree-like structures of data defined on the Salt master that allow confidential, targeted data to be sent securely only to the relevant minion.
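
For orientation, here is a minimal sketch of what top.sls and a pillar might look like; the vim state comes from the example above, while the db* target, the secrets pillar, and the db_password key are purely illustrative assumptions:

        # /srv/salt/top.sls -- the highstate: which states apply to which minions
        base:
          '*':
            - vim

        # /srv/pillar/top.sls -- which pillar data each minion may see
        base:
          'db*':
            - secrets

        # /srv/pillar/secrets.sls -- hypothetical confidential data for the db* minions
        db_password: correcthorsebatterystaple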

After installation (select a platform for specific directions), the very next thing is to accept the keys that keep Salt secure. Although it is possible to accept all keys on the master, I prefer to verify on both ends; after all, I'm already logged in for the manual process of installing the Salt client software on that minion, so while I'm there, I may as well mind my security too. (Unless stated otherwise, all of the following commands are run on the salt-master.)

sudo salt-key # see all keys by state
sudo salt-key -a mickey # accept key for minion_id mickey
sudo salt-key -d mickey # delete all keys for that minion ID
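
To do that verification on both ends, compare key fingerprints before accepting; mickey is the same example minion ID as above:

sudo salt-key -f mickey # on the master: fingerprint of the (pending) key for mickey
sudo salt-call --local key.finger # on the minion: fingerprint of its own key; the two should match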

It is also useful to know how to select minions. Obviously '*' matches all minions. The test.ping command is a very simple example that merely shows you if the salt-master can reach the specified minion(s).

sudo salt '*' test.ping # tests connectivity to all minions
sudo salt '*.example.org' test.ping # minions in the example.org domain
sudo salt -G 'os:Ubuntu' test.ping # filter by _G_rains, of which os is one
sudo salt -E 'virtmach[0-9]' test.ping # filter by regular _E_xpression on minion_id
sudo salt -L 'foo,bar,baz,quo' test.ping # explicit _L_ist
sudo salt -C 'G@os:Ubuntu and webser* or E@database.*' test.ping # _C_ompound match, G for grains and E for regex
sudo salt -C 'G@os:Ubuntu or G@os:CentOS and not G@roles:mysql' test.ping # logical operators: 'and' 'or' 'not'
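
If the same compound match keeps getting retyped, it can be saved as a nodegroup in the master configuration and targeted with -N; the group name webservers below is just an example:

        # in /etc/salt/master:
        nodegroups:
          webservers: 'G@os:Ubuntu and webser*'

After restarting the salt-master service, sudo salt -N webservers test.ping targets that saved group.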

Actually, testing connectivity is a good place to start. The basic command is test.ping, but I prefer manage.up in most cases.

sudo salt '*' test.ping
sudo salt-run manage.status # or manage.up, or manage.down

Because some of my VMs and one physical box are powered down right now, I can illustrate the differences between these commands.

hope@salt:~$ sudo salt '*' test.ping
sprocket: True
salt: True
rowlf: True
kid2: True
pm: True
camilla: True
kid1: True
zoot: True
mickey: True
hooper: True
blank: True
bunny: Minion did not return. [No response]
moose: Minion did not return. [No response]
doozer: Minion did not return. [No response]
longjohn: Minion did not return. [No response]
bounty: Minion did not return. [No response]
hope@salt:~$ sudo salt-run manage.status
down:
- bunny
- moose
- doozer
- longjohn
- bounty
up:
- kid1
- mickey
- kid2
- sprocket
- rowlf
- blank
- camilla
- hooper
- zoot
- salt
- pm
hope@salt:~$ sudo salt-run manage.up
- kid1
- mickey
- kid2
- sprocket
- rowlf
- zoot
- camilla
- blank
- hooper
- salt
- pm
hope@salt:~$

Now it is time to gather useful information!

sudo salt '*' grains.ls # what information is in the 'grains'?
sudo salt-call grains.ls # same command, but run on the minion, only for and about that minion
sudo salt '*' grains.items # more details
sudo salt-call grains.items # same, but on minion
sudo salt '*' pillar.items # see what data have been defined in pillars
sudo salt '*' disk.usage # show disk usage
sudo salt '*' status.loadavg # show load average
sudo salt '*' ps.grep graphite # show the status of graphite in a 'ps' listing
sudo salt '*' service.status apache2 # show the status of the apache2 service
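
When grains.items is more output than needed, individual grains can be requested by name (os and oscodename are standard grains):

sudo salt '*' grains.item os oscodename # show just the named grains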

Another easy task is to search for strings in specific files.

sudo salt '*' file.contains /etc/rsyslog.d/10-rsyslog.conf 'fd60'
sudo salt '*' file.contains_regex /etc/rsyslog.d/10-rsyslog.conf 'fd[0-9a-fA-F][0-9a-fA-F]'

Another useful task is to push out a file by running sudo salt-cp '*' file_on_master /target-dir/file_on_minion on the Salt master. (To pull in files, use 'scp -pr' or 'rsync -av' instead of Salt.) For example, the following loop uses the list of minions that are up to pull a file back from each one:

# pull a file back from every minion that is currently up
LIST=$(sudo salt-run manage.up | tr -d '\n-') # strip newlines and leading dashes from the list
for i in ${LIST}
do
  echo "Working on ${i}:"
  rsync -alPvz hope@${i}:/remote/file /local/.
  echo
done

Salt also keeps a command history: sudo salt-run jobs.list_jobs shows which Salt commands were run recently.
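
Each entry in that history carries a job ID (JID), and the output of a particular job can be retrieved again later; the JID below is a placeholder, not a real job:

sudo salt-run jobs.lookup_jid 20170101120000123456 # substitute a JID taken from jobs.list_jobs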

Remember the tedium of patching? And installing a package everywhere? That's now easy! However, patching all of my VMs at once pushed up the load average on my VM host higher than I'd like, so I learned how to batch them two at a time so that the package cache could help without being overwhelmed.

sudo salt '*' pkg.install vim
sudo salt '*' pkg.purge parole
sudo salt '*' pkg.refresh_db
sudo salt '*' pkg.list_upgrades
sudo salt '*' pkg.upgrade dist_upgrade=True
sudo salt '*' pkg.autoremove
sudo salt '*' -b 10 test.ping # batch 10 at a time
sudo salt '*' --batch-size 25% --batch-wait 5 test.ping # batch 25%, wait 5s before starting another
# my actual patching routine:
sudo salt -G 'os:Ubuntu' pkg.refresh_db
sudo salt -C 'G@os:Ubuntu and G@zz_type:physical' pkg.upgrade
sudo salt -C 'G@os:Ubuntu and not G@zz_type:physical' --batch-size 2 pkg.upgrade
# use these two commands instead for kernel upgrades:
# sudo salt -C 'G@os:Ubuntu and G@zz_type:physical' pkg.upgrade dist_upgrade=True --state-output=changes --state-verbose=False
# sudo salt -C 'G@os:Ubuntu and not G@zz_type:physical' --batch-size 2 pkg.upgrade dist_upgrade=True --state-output=changes --state-verbose=False
sudo salt -G 'os:Ubuntu' pkg.autoremove
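
The zz_type grain used above is not one of Salt's standard grains; it is a custom one. Assuming you want something similar, one way to define such a grain is to set it from the master with grains.setval, which persists it in /etc/salt/grains on the minion:

sudo salt 'mickey' grains.setval zz_type physical # store the custom grain on that minion
sudo salt 'mickey' grains.item zz_type # confirm it took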

Ad hoc commands, as promised, are supported. Here is one example: sudo salt '*' cmd.run "cat /etc/*-release"

Enforcing the baseline at any time (not just the scheduled times), or enforcing any defined state, even one not included in the baseline, is also supported. Notice that by appending test=True to most commands, a change can be tested before committing it (a dry run is a feature I admired in Ansible and did not know Salt had until I stumbled across it).

sudo salt '*' state.highstate test=True --state-output=changes --state-verbose=False # TEST first
sudo salt '*' state.apply # apply highstate (baseline) configuration now
sudo salt '*' state.apply rsyslog saltenv=development # apply specific sls for that environment

After learning the basics in Salt, I wanted to learn how to handle a task that had been unpleasant with Puppet: determining which state(s) will be applied to which minions. The software needs to know what to do, so it should be able to tell me what it would do! That's surprisingly easy with Salt, and the level of detail in the output can even be increased.

sudo salt '*' state.show_top
sudo salt '*' state.show_sls top # more output
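
The output is grouped per minion and per environment. Assuming the vim state from earlier were assigned to a minion in top.sls, state.show_top would report something roughly like:

mickey:
    ----------
    base:
        - vim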

I do not want to get too far into the details: you should customize your Salt to your needs. It does help to know where to start looking for files, though.

pushd /etc/salt # path to configuration files, per-host
pushd /srv/salt # path to state files on salt-master
pushd /srv/pillar # path to pillar data on salt-master