Brandon Harper
Feb 01, 2022

Mitigating SSH Timeouts in Ansible Automation When Working with the Git Module

SSH timeout issues can cause problems with long-running playbooks. These issues are especially common with Ansible's git module. In this blog post, we look at why these problems occur and how to fix them.

SSH Timeouts

Ansible uses SSH for every connection to a target machine. While SSH has a (relatively) long timeout value, there is still a risk that a playbook will be disrupted if a long-running process takes too long and the connection times out.
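As an aside, SSH keep-alives can buy some headroom before an idle connection is dropped. A minimal sketch, assuming the OpenSSH client and the ansible_ssh_common_args connection variable (the file location and interval values below are illustrative, not tuned recommendations):

# group_vars/all.yml (hypothetical location) -- send keep-alive probes so an
# otherwise idle SSH session survives while a long task runs on the target
ansible_ssh_common_args: "-o ServerAliveInterval=30 -o ServerAliveCountMax=20"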

Common tasks that run into such timeouts include source code checkouts and updates using Ansible's git module. Both clone and pull operations are susceptible, but any network operation that hangs can trigger the issue. Pulls in repositories with submodules are by far the most common failure in our playbooks. Luckily, it's relatively easy to work around the issue by changing some of the git module's default parameters.

Git Module

In the case of repositories with recursive submodules, timeouts occur because a clone or update fetches the complete history of the parent project and, recursively, of every submodule. For large repositories with dozens of submodules, or deeply nested submodules, this can be very problematic. By default, the Ansible git module clones submodules recursively with their full history. Depending on the deployment, this may not be desirable: for many production or staging deployments, the entire history of the project isn't needed. Rather, only the most recent revision (or a small subset of the total change set) is required.
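For illustration, here is roughly what a checkout with the module's defaults looks like; the variable names mirror the listings later in this post. With no depth given and recursive left at its default of yes, every revision of the parent and of each submodule is transferred:

- name: git clone source code with module defaults (full history, recursive submodules)
  git:
    repo: "{{ source_url }}"
    dest: "{{ destination_path }}"
    version: "{{ revision_version }}"
    # no depth set, recursive defaults to yes: the parent repository and all
    # submodules are cloned with their complete history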

As an example, consider a repository containing playbooks and a number of roles attached as submodules. Now consider that each role contains further submodules for tasks like running tests, handling complex template operations, and installing software. In a default checkout, both the top-level repository and all of its submodules will be checked out with their entire history. Depending on the development history and how well-maintained the repository has been, that history can be very large. If your goal is simply to execute the playbook, it isn't needed. Git itself provides an option to limit the depth of a clone:

--depth <depth>
Create a shallow clone with a history truncated to a specified number of revisions.

So how can we tell Ansible to clone only the revisions we want? For reference, here is the layout of the example repository, with each role attached as a submodule:

site.yml
webservers.yml
fooservers.yml
roles/
   common/
     tasks/
        tests/
     handlers/
     files/
     templates/
     vars/
     defaults/
     meta/
   webservers/
     tasks/
        tests/
     defaults/
     meta/

depth as Implemented in Ansible

The depth option is exposed in Ansible's git module. The Ansible documentation expands on what we learned above:

Create a shallow clone with a history truncated to the specified number of revisions. The minimum possible value is 1, otherwise ignored. Needs git>=1.9.1 to work correctly.

When present, depth limits the number of revisions fetched (the clone is "shallow" because most of the history is absent). Setting the value to 1 checks out only the current revision of the files, significantly speeding up the clone or pull. Unfortunately, there's a major limitation: the depth option doesn't propagate into submodules. It is only applied to the parent repository, so the parent's history will be truncated while the submodules still check out their full history (and incur the overhead that comes with it).

Solution

For deployments and continuous integration, what kind of a difference does a shallow checkout make? In our case, we found that omitting the history reduced the amount of data transferred by a factor of ten. For our large projects, shallow checkouts all but eliminated failures due to SSH timeouts.

Here's the approach that we have taken in our playbooks:

  • Since the git module clones the full history by default and the depth option is ignored once it reaches submodules, we perform our Git checkouts in two steps.
  • In step one, we use the git module to pull the parent repository.
  • In step two, we use the shell module to fetch the submodules, passing the depth option as an argument.

When performing the initial checkout with the git module, we set the depth option to 1 to truncate the history.

git-module

The code listing below shows a simplified example of how the Git checkout might look in a playbook. Since we only want to clone the parent repository and not pull the submodules, the recursive option is set to no.

- name: git clone source code without submodules
  git:
    repo: "{{ source_url }}"
    dest: "{{ destination_path }}"
    force: yes
    version: "{{ revision_version }}"
    depth: 1
    recursive: no

shell-module

This code listing shows the command used to check out the submodules. Unlike the previous example, where the history is limited to a single revision, here we allow more of the history.

With very shallow clones, Git sometimes lacks a record of deleted files; a simple workaround is to increase the depth to fifty or one hundred. Depending on your submodules and your needs, you can set the depth to any value that works for you. Likewise, it is possible to specify different options for each submodule and loop over them using with_items, as sketched after the listing below.

- name: Add submodules
  shell: git submodule update --init --recursive --depth 50
  args:
    chdir: "{{ inside_target_git_folder }}"
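If different submodules need different depths, the same idea can be expressed as a loop. The sketch below is an illustration only: the submodule paths and depth values are hypothetical, reusing the role names from the example layout earlier in the post.

- name: Add submodules with per-submodule depth
  shell: "git submodule update --init --depth {{ item.depth }} -- {{ item.path }}"
  args:
    chdir: "{{ inside_target_git_folder }}"
  with_items:
    # hypothetical submodule paths and depths -- adjust to your repository
    - { path: "roles/common", depth: 1 }
    - { path: "roles/webservers", depth: 50 }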

Conclusion

For large repositories with an extensive history, Git clone and update operations can cause SSH timeout issues with long-running playbooks. In many cases, it's possible to mitigate the issue by using the depth field of the Git module. Unfortunately, depth is not respected by submodules.

A workaround that allows shallow checkouts of submodules is to create two tasks: one that fetches the parent repository and a second that pulls the submodules. If decreasing the size of the checkout doesn't resolve the issue, the tasks can be run asynchronously to increase the odds of success. And if that is still insufficient, it's always possible to increase the SSH timeout of your environment.
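For completeness, here is a sketch of the asynchronous variant mentioned above: the submodule update runs in the background on the target and Ansible reconnects periodically to poll its status, so no single SSH session has to survive the entire operation (the async and poll values are illustrative):

- name: Add submodules without holding one SSH session open
  shell: git submodule update --init --recursive --depth 50
  args:
    chdir: "{{ inside_target_git_folder }}"
  async: 1800   # allow up to 30 minutes for the update to finish
  poll: 30      # reconnect every 30 seconds to check on progress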
