Adjusting Ansible playbook execution strategies is a use case that has come up several times. The default strategy Ansible uses is called linear, and it works great the majority of the time, but what if you’re automating a task that needs to be done in a specific order or way? Strategies can be used to define how a playbook should be executed. This includes the ability to run in serial, run in batches, and even, in extreme cases, to write a custom strategy plugin.
In this blog post I will provide multiple examples demonstrating the different strategies and settings. The playbooks that will be used will be dramatically simplified to allow the focus to be on the strategies in use rather than the content of the playbooks.
Like the free strategy, host_pinned will run tasks as fast as it can on the number of hosts specified in the serial keyword, immediately starting a host as soon as one ends.
Note: The strategy host_pinned may operate as one would expect free to run, so give it a try to make sure you understand the subtle difference.
To see the available strategies, use the ansible-doc -t strategy -l command.
Strategies can be set per PLAY or under the [defaults] group in the ansible.cfg file.
Note: Remember, a playbook often has a single play in it, but can have multiple plays.
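As a minimal sketch of setting the strategy per play, the playbook below contains two plays, each with its own strategy. The play names and the group names spines and leafs are hypothetical and are not taken from the inventory used in this post.

```yaml
---
# Hypothetical two-play playbook: each play chooses its own strategy.
# The global equivalent is the "strategy" setting under [defaults] in ansible.cfg.
- name: "COLLECT DATA AS FAST AS POSSIBLE"
  hosts: "spines"        # assumed group name
  gather_facts: False
  strategy: "free"
  tasks:
    - name: "SHOW HOSTNAME"
      debug:
        msg: "{{ inventory_hostname }}"

- name: "WALK THROUGH TASKS IN LOCKSTEP"
  hosts: "leafs"         # assumed group name
  gather_facts: False
  strategy: "linear"     # the default, shown explicitly here
  tasks:
    - name: "SHOW HOSTNAME"
      debug:
        msg: "{{ inventory_hostname }}"
...
```

A strategy set on a play should take precedence over the value in ansible.cfg for that play only.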
Ansible also provides some keywords that can be used to further tune how a playbook runs.
Note: Setting the forks value to 1 effectively makes each task run on a single host before the next host starts the same task.
- Run Once: This option is exactly as it sounds. It’s a task that should be run only once.
  - Great for operations that need to be done only once for the group, not once per host.
  - A good example would be creating a directory on the control machine for storing configuration files. You do NOT need every host to create the directory; this needs to be completed only one time.
- Serial: The number of hosts that Ansible should be running side by side.
  - Great for playbooks that include redundant devices that should not be worked on at the same time.
  - Great for when there are possible race conditions.
- Throttle: Can be used in the task definition to limit the number of hosts the task is executed on. Similar to forks, but at a task level.
- Order: Specifies the order in which the hosts for a given play are executed. By default the order comes from the inventory file, but other options exist that can provide flexibility if inventory hostnames provide insights into a given environment.
  - reverse_inventory: The opposite of the default mode explained previously. It reverses the order from the inventory file.
  - shuffle: Random order.
  - sorted: Alphabetical.
  - reverse_sorted: Reverse alphabetical.
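The run-once directory example and the task-level throttle described in the list above could look roughly like the following. This is only a sketch: the ./configs path is a hypothetical location, and the task names are made up for illustration.

```yaml
---
- name: "KEYWORD SKETCH"
  hosts: "all"
  gather_facts: False
  tasks:
    - name: "CREATE A LOCAL DIRECTORY FOR CONFIGS"
      file:
        path: "./configs"        # hypothetical path on the control machine
        state: "directory"
      delegate_to: "localhost"   # run on the control machine, not the device
      run_once: True             # one execution for the whole group

    - name: "SHOW HOSTNAME, ONE HOST AT A TIME"
      debug:
        msg: "{{ inventory_hostname }}"
      throttle: 1                # limit this task to a single host at a time
...
```

One thing to keep in mind: when run_once is combined with serial, the task runs once per batch rather than once for the entire play.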
This is where I found myself struggling when learning these concepts. Ansible provides all the flexibility listed above, which provides great power, but there is a relationship between the options that can cause unexpected behavior.
The key relationship to be aware of involves throttle. When using it together with forks or serial, throttle must be lower than forks or serial to have any effect; it cannot raise concurrency above those limits.
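A minimal sketch of this relationship, assuming an arbitrary batch size of 4: the first task is limited by its throttle value, while the second task's throttle is higher than serial and therefore should not change anything.

```yaml
---
- name: "THROTTLE AND SERIAL TOGETHER"
  hosts: "all"
  gather_facts: False
  serial: 4                # arbitrary batch size for illustration
  tasks:
    - name: "AT MOST TWO HOSTS AT A TIME"
      debug:
        msg: "{{ inventory_hostname }}"
      throttle: 2          # lower than serial, so it limits this task

    - name: "THROTTLE HIGHER THAN SERIAL"
      debug:
        msg: "{{ inventory_hostname }}"
      throttle: 10         # higher than serial; the batch of 4 is still the ceiling
...
```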
Now that I have summarized the options available, I will demo some of the options to solidify what the strategies accomplish.
Note: The default strategy, linear, will not be demonstrated.
The playbook has two tasks.
The playbook:
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  strategy: "free"
  connection: "network_cli"
  tasks:
    - name: "10010: GET DEVICE NAMES"
      debug:
        msg: "{{ inventory_hostname }}"
    - name: "10015: GET DEVICE OS"
      debug:
        msg: "{{ ansible_network_os }}"
...
▶ ansible-playbook -i inventory.cfg pb.yml
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [nxos-spine1] => {
"msg": "nxos-spine1"
}
ok: [vmx2] => {
"msg": "vmx2"
}
ok: [vmx1] => {
"msg": "vmx1"
}
ok: [nxos-spine2] => {
"msg": "nxos-spine2"
}
ok: [vmx3] => {
"msg": "vmx3"
}
ok: [csr1] => {
"msg": "csr1"
}
ok: [csr2] => {
"msg": "csr2"
}
ok: [csr3] => {
"msg": "csr3"
}
ok: [eos-leaf1] => {
"msg": "eos-leaf1"
}
ok: [eos-leaf2] => {
"msg": "eos-leaf2"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [vmx1] => {
"msg": "junos"
}
ok: [nxos-spine1] => {
"msg": "nxos"
}
ok: [nxos-spine2] => {
"msg": "nxos"
}
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [eos-spine1] => {
"msg": "eos-spine1"
}
ok: [eos-spine2] => {
"msg": "eos-spine2"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [vmx2] => {
"msg": "junos"
}
ok: [vmx3] => {
"msg": "junos"
}
ok: [csr1] => {
"msg": "ios"
}
ok: [csr2] => {
"msg": "ios"
}
ok: [csr3] => {
"msg": "ios"
}
ok: [eos-leaf1] => {
"msg": "eos"
}
ok: [eos-leaf2] => {
"msg": "eos"
}
ok: [eos-spine1] => {
"msg": "eos"
}
ok: [eos-spine2] => {
"msg": "eos"
}
PLAY RECAP ******************************************************************************************************************************
csr1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Looking into the execution of the playbook, the first thing I notice is that each task isn’t completed before the next task is executed. The free strategy allows Ansible to run each host independently and execute the tasks as quickly as it can. It is a great option when the tasks aren’t dependent on one another, or the hosts themselves aren’t dependent on one another. A good example might be collecting operational state data after a change.
The debug strategy is very helpful when a playbook is running into issues and the failure output isn’t helpful enough to identify the problem.
The playbook:
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  strategy: "debug"
  connection: "network_cli"
  tasks:
    - name: "10010: GET DEVICE NAMES"
      debug:
        msg: "{{ inventory_hostnames }}"
      debugger: on_failed
Notice: For demonstration, I misspelled the variable name by adding an s to the end.
▶ ansible-playbook -i inventory.cfg pb.yml
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
fatal: [nxos-spine1]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'inventory_hostnames' is undefined\n\nThe error appears to be in '/Users/jeffkala/Documents/github-clones/ansible-examples/strategies/pb.yml': line 8, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n tasks:\n - name: \"10010: GET DEVICE NAMES\"\n ^ here\nThis one looks easy to fix. It seems that there is a value started\nwith a quote, and the YAML parser is expecting to see the line ended\nwith the same kind of quote. For instance:\n\n when: \"ok\" in result.stdout\n\nCould be written as:\n\n when: '\"ok\" in result.stdout'\n\nOr equivalently:\n\n when: \"'ok' in result.stdout\"\n"}
[nxos-spine1] TASK: 10010: GET DEVICE NAMES (debug)> dir()
['host', 'play_context', 'result', 'task', 'task_vars']
[nxos-spine1] TASK: 10010: GET DEVICE NAMES (debug)> task_vars.keys()
dict_keys(['ansible_network_os', 'inventory_file', 'inventory_dir', 'inventory_hostname', 'inventory_hostname_short', 'group_names', 'ansible_facts', 'playbook_dir', 'ansible_playbook_python', 'ansible_config_file', 'ansible_role_names', 'ansible_play_role_names', 'ansible_dependent_role_names', 'role_names', 'ansible_play_name', 'groups', 'ansible_play_hosts_all', 'ansible_play_hosts', 'ansible_play_batch', 'play_hosts', 'omit', 'ansible_version', 'ansible_check_mode', 'ansible_diff_mode', 'ansible_forks', 'ansible_inventory_sources', 'ansible_skip_tags', 'ansible_run_tags', 'ansible_verbosity', 'hostvars', 'environment', 'vars', 'ansible_current_hosts', 'ansible_failed_hosts'])
[nxos-spine1] TASK: 10010: GET DEVICE NAMES (debug)> task_vars.get('inventory_hostname')
'nxos-spine1'
[nxos-spine1] TASK: 10010: GET DEVICE NAMES (debug)> exit()
This strategy is extremely helpful. Looking at the play execution, the first task fails on the first host. Ansible raises a fatal error and provides the message, but then drops the user into a debug shell where standard Python can be used to troubleshoot. The first step I performed was to run dir() to see what objects I could look into. For this example, I next looked at task_vars.keys() to see the valid keys. Once I printed this information, I noticed that I needed to use inventory_hostname instead of inventory_hostnames. Of course, this is a simplified example, but on more complex playbooks it’s a valuable option to identify issues.
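The debugger can also be enabled through the debugger keyword on its own, without changing the play's strategy. The following is a minimal sketch under that assumption, reusing the misspelled task from above. Inside the debugger, commands such as p task.args, redo, and continue should be available.

```yaml
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  tasks:
    - name: "10010: GET DEVICE NAMES"
      debug:
        msg: "{{ inventory_hostnames }}"   # intentionally misspelled, as above
      debugger: "on_failed"                # open the debugger only when this task fails
...
```

Scoping the keyword to a single task keeps the rest of the playbook running normally, which may be preferable when only one task is under suspicion.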
Host pinned is a bit more tricky and, on the back end, uses serial to determine the batch size. If serial is not set, it defaults to all.
The playbook:
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  strategy: "host_pinned"
  serial: 2
  connection: "network_cli"
  tasks:
    - name: "10010: GET DEVICE NAMES"
      debug:
        msg: "{{ inventory_hostname }}"
    - name: "10015: GET DEVICE OS"
      debug:
        msg: "{{ ansible_network_os }}"
...
▶ ansible-playbook -i inventory.cfg pb.yml
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [nxos-spine1] => {
"msg": "nxos-spine1"
}
ok: [nxos-spine2] => {
"msg": "nxos-spine2"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [nxos-spine1] => {
"msg": "nxos"
}
ok: [nxos-spine2] => {
"msg": "nxos"
}
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [vmx1] => {
"msg": "vmx1"
}
ok: [vmx2] => {
"msg": "vmx2"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [vmx1] => {
"msg": "junos"
}
ok: [vmx2] => {
"msg": "junos"
}
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [vmx3] => {
"msg": "vmx3"
}
ok: [csr1] => {
"msg": "csr1"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [vmx3] => {
"msg": "junos"
}
ok: [csr1] => {
"msg": "ios"
}
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [csr2] => {
"msg": "csr2"
}
ok: [csr3] => {
"msg": "csr3"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [csr2] => {
"msg": "ios"
}
ok: [csr3] => {
"msg": "ios"
}
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [eos-leaf1] => {
"msg": "eos-leaf1"
}
ok: [eos-leaf2] => {
"msg": "eos-leaf2"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [eos-leaf1] => {
"msg": "eos"
}
ok: [eos-leaf2] => {
"msg": "eos"
}
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [eos-spine1] => {
"msg": "eos-spine1"
}
ok: [eos-spine2] => {
"msg": "eos-spine2"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [eos-spine1] => {
"msg": "eos"
}
ok: [eos-spine2] => {
"msg": "eos"
}
PLAY RECAP ******************************************************************************************************************************
csr1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Pay close attention to each iteration. When serial is used, it’s looping over the PLAY itself with the number specified in the serial keyword. In the above output, the PLAY was executed six times, each time running through the tasks with two hosts.
Now that the main strategies available have been demonstrated, the next few demos will focus on some of the other keywords that can be used to change the execution of the playbook.
Using serial without tying it to a strategy allows for further flexibility. Running playbooks in batches defined by the administrator can help validate that a playbook is working as expected before executing it against a larger batch of hosts. Since the previous example showed what serial does, below I’ll provide a few options when specifying the serial option.
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  serial:
    - 1
    - 2
    - 5
  connection: "network_cli"
  tasks:
    <omitted>
...
This will execute the PLAY with one host, then the second iteration will run on two hosts, then five hosts at a time for however many iterations remain.
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  serial:
    - "30%"
  connection: "network_cli"
  tasks:
    <omitted>
...
It is also possible to mix and match the different options, such as absolute numbers and percentages. The Ansible documentation on serial is excellent.
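A mixed list could look something like the sketch below. The specific values are arbitrary and chosen only for illustration.

```yaml
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  serial:
    - 1        # first batch: a single host
    - "25%"    # second batch: a quarter of the play's hosts
    - 5        # every later batch: up to five hosts until none remain
  tasks:
    - name: "SHOW HOSTNAME"
      debug:
        msg: "{{ inventory_hostname }}"
...
```

Starting with a single host acts as a canary: if the first batch fails, the larger batches never start.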
When I use the keyword order in my play definition and set it to sorted, the playbook is run on the hosts in alphabetical order.
Note: I’m back to using the default strategy of linear.
The playbook:
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  order: "sorted"
  connection: "network_cli"
  tasks:
    - name: "10010: GET DEVICE NAMES"
      debug:
        msg: "{{ inventory_hostname }}"
    - name: "10015: GET DEVICE OS"
      debug:
        msg: "{{ ansible_network_os }}"
...
▶ ansible-playbook -i inventory.cfg pb.yml
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
ok: [csr1] => {
"msg": "csr1"
}
ok: [csr2] => {
"msg": "csr2"
}
ok: [csr3] => {
"msg": "csr3"
}
ok: [eos-leaf1] => {
"msg": "eos-leaf1"
}
ok: [eos-leaf2] => {
"msg": "eos-leaf2"
}
ok: [eos-spine1] => {
"msg": "eos-spine1"
}
ok: [eos-spine2] => {
"msg": "eos-spine2"
}
ok: [nxos-spine1] => {
"msg": "nxos-spine1"
}
ok: [vmx1] => {
"msg": "vmx1"
}
ok: [nxos-spine2] => {
"msg": "nxos-spine2"
}
ok: [vmx2] => {
"msg": "vmx2"
}
ok: [vmx3] => {
"msg": "vmx3"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
ok: [csr1] => {
"msg": "ios"
}
ok: [csr2] => {
"msg": "ios"
}
ok: [csr3] => {
"msg": "ios"
}
ok: [eos-leaf2] => {
"msg": "eos"
}
ok: [eos-leaf1] => {
"msg": "eos"
}
ok: [eos-spine1] => {
"msg": "eos"
}
ok: [eos-spine2] => {
"msg": "eos"
}
ok: [nxos-spine2] => {
"msg": "nxos"
}
ok: [vmx1] => {
"msg": "junos"
}
ok: [nxos-spine1] => {
"msg": "nxos"
}
ok: [vmx2] => {
"msg": "junos"
}
ok: [vmx3] => {
"msg": "junos"
}
PLAY RECAP ******************************************************************************************************************************
csr1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
The playbook execution ran the hosts in alphabetical order. This could be useful for simple human readability, or potentially useful if your company’s hostname standard can be used to interpret certain device roles.
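Because several hosts run in parallel, results can still arrive slightly out of order, as with vmx1 and nxos-spine2 in the output above. If a strict one-by-one order matters, one option is to combine order with serial. The following is a sketch under that assumption, using reverse_sorted purely as an example value.

```yaml
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  order: "reverse_sorted"   # reverse alphabetical host order
  serial: 1                 # one host per batch, so the order is strictly enforced
  tasks:
    - name: "10010: GET DEVICE NAMES"
      debug:
        msg: "{{ inventory_hostname }}"
...
```

The trade-off is speed: with serial set to 1, the whole play runs once per host.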
Lastly, I’ll cover the throttle keyword. I’ve used throttle quite a bit in network automation when working with network orchestrators or other REST APIs that use rate limiting. If I have a large list of hosts that I need to gather information on from an API, and that API rate limits the number of connections per minute, I can set up throttle to slow down the execution of that task within the playbook.
The playbook:
---
- name: "DEBUG INVENTORY"
  hosts: "all"
  gather_facts: False
  connection: "network_cli"
  tasks:
    - name: "10010: GET DEVICE NAMES"
      debug:
        msg: "{{ inventory_hostname }}"
    - name: "10015: GET DEVICE OS"
      throttle: 1
      debug:
        msg: "{{ ansible_network_os }}"
...
For this one, unfortunately, showing the output alone doesn’t demonstrate the value. The playbook runs task 10010 using the default forks of 5, and then runs task 10015 one host at a time (ignoring the forks).
To help demonstrate the time it took, I will quickly use a callback to get the timer.
In my ansible.cfg, under the [defaults] section, I will add:
callback_whitelist = profile_tasks
The result of the playbook now shows the timer.
▶ ansible-playbook -i inventory.cfg pb.yml
PLAY [DEBUG INVENTORY] ******************************************************************************************************************
TASK [10010: GET DEVICE NAMES] **********************************************************************************************************
Thursday 29 April 2021 11:20:05 -0500 (0:00:00.017) 0:00:00.017 ********
ok: [vmx1] => {
"msg": "vmx1"
}
ok: [nxos-spine1] => {
"msg": "nxos-spine1"
}
ok: [nxos-spine2] => {
"msg": "nxos-spine2"
}
ok: [vmx3] => {
"msg": "vmx3"
}
ok: [vmx2] => {
"msg": "vmx2"
}
ok: [csr1] => {
"msg": "csr1"
}
ok: [csr2] => {
"msg": "csr2"
}
ok: [csr3] => {
"msg": "csr3"
}
ok: [eos-leaf1] => {
"msg": "eos-leaf1"
}
ok: [eos-leaf2] => {
"msg": "eos-leaf2"
}
ok: [eos-spine1] => {
"msg": "eos-spine1"
}
ok: [eos-spine2] => {
"msg": "eos-spine2"
}
TASK [10015: GET DEVICE OS] *************************************************************************************************************
Thursday 29 April 2021 11:20:09 -0500 (0:00:03.836) 0:00:03.853 ********
ok: [nxos-spine1] => {
"msg": "nxos"
}
ok: [nxos-spine2] => {
"msg": "nxos"
}
ok: [vmx1] => {
"msg": "junos"
}
ok: [vmx2] => {
"msg": "junos"
}
ok: [vmx3] => {
"msg": "junos"
}
ok: [csr1] => {
"msg": "ios"
}
ok: [csr2] => {
"msg": "ios"
}
ok: [csr3] => {
"msg": "ios"
}
ok: [eos-leaf1] => {
"msg": "eos"
}
ok: [eos-leaf2] => {
"msg": "eos"
}
ok: [eos-spine1] => {
"msg": "eos"
}
ok: [eos-spine2] => {
"msg": "eos"
}
PLAY RECAP ******************************************************************************************************************************
csr1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
csr3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-leaf2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
eos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
nxos-spine2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx1 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx2 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
vmx3 : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Thursday 29 April 2021 11:20:22 -0500 (0:00:13.460) 0:00:17.314 ********
===============================================================================
10015: GET DEVICE OS ------------------------------------------------------------------------------------------------------------ 13.46s
10010: GET DEVICE NAMES ---------------------------------------------------------------------------------------------------------- 3.84s
Notice that task 10010 took only ~4 seconds, while task 10015, which was run in throttle mode, took drastically longer at ~13 seconds.
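The rate-limited API use case mentioned earlier could be sketched as follows. The URL is a hypothetical endpoint, and the task and variable names are made up for illustration.

```yaml
---
- name: "QUERY A RATE-LIMITED API"
  hosts: "all"
  gather_facts: False
  tasks:
    - name: "GET DEVICE DETAILS FROM THE API"
      uri:
        url: "https://orchestrator.example.com/api/devices/{{ inventory_hostname }}"  # hypothetical endpoint
        method: "GET"
        return_content: True
      delegate_to: "localhost"   # the request is made from the control machine
      throttle: 1                # only one request in flight at a time
      register: "api_result"
...
```

Note that throttle caps concurrency rather than enforcing a requests-per-minute rate, so a very strict API limit may still need an additional delay between calls.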
I hope this blog post helps to demonstrate the value of the strategies and other keyword options that Ansible comes with natively. Each has valid use cases, and mixing and matching the options can bring additional stability and flexibility to a playbook. Every user has different concerns with making changes to an environment, but utilizing these options can help ease the natural anxiety that comes with making changes across an entire infrastructure.
-Jeff