Use Git to track Git versions

March 11, 2020 • edited April 10, 2020

Everyone who has ever been involved in managing versions of different applications knows that it can be a pain in the ass. Luckily, there are tools to ease this work, and I’ve found that Git provides a general solution that is independent of your technology stack: Git Submodules.

Dependency Hell is a well-known problem and a lot of different solutions have been proposed to solve it. In real life, there is no golden key that opens all your locks, so you have to look at your problem and decide which approach fits your project. If you look at different technologies, you will find that most of them implement a system to track versions. Recent technologies such as Go modules and NPM rely on version numbers plus hashes to keep track of dependencies.

Before we begin, Version Numbers

Quote from Semantic Versioning

In the world of software management there exists a dreaded place called “dependency hell.” The bigger your system grows and the more packages you integrate into your software, the more likely you are to find yourself, one day, in this pit of despair.

In systems with many dependencies, releasing new package versions can quickly become a nightmare. If the dependency specifications are too tight, you are in danger of version lock (the inability to upgrade a package without having to release new versions of every dependent package). If dependencies are specified too loosely, you will inevitably be bitten by version promiscuity (assuming compatibility with more future versions than is reasonable). Dependency hell is where you are when version lock and/or version promiscuity prevent you from easily and safely moving your project forward.

As a solution to this problem, I propose a simple set of rules and requirements that dictate how version numbers are assigned and incremented. These rules are based on but not necessarily limited to pre-existing widespread common practices in use in both closed and open-source software. For this system to work, you first need to declare a public API. This may consist of documentation or be enforced by the code itself. Regardless, it is important that this API be clear and precise. Once you identify your public API, you communicate changes to it with specific increments to your version number. Consider a version format of X.Y.Z (Major.Minor.Patch). Bug fixes not affecting the API increment the patch version, backwards compatible API additions/changes increment the minor version, and backwards incompatible API changes increment the major version.

I call this system “Semantic Versioning.” Under this scheme, version numbers and the way they change convey meaning about the underlying code and what has been modified from one version to the next.

This is not a new or revolutionary idea. In fact, you probably do something close to this already. The problem is that “close” isn’t good enough. Without compliance to some sort of formal specification, version numbers are essentially useless for dependency management. By giving a name and clear definition to the above ideas, it becomes easy to communicate your intentions to the users of your software. Once these intentions are clear, flexible (but not too flexible) dependency specifications can finally be made.

If you have time, read the full Semantic Versioning (SemVer) specification to understand the details. I found it really interesting, even life-changing.

The gist is that given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards compatible manner, and
  • PATCH version when you make backwards compatible bug fixes.
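
The three rules above can be sketched as a small shell helper. This is only an illustration: the function name bump is made up for this post, and it assumes a plain X.Y.Z version without pre-release or build suffixes.

```shell
# bump VERSION PART: print VERSION with PART (major, minor or patch) incremented.
# Hypothetical helper; assumes VERSION is a plain X.Y.Z string.
bump() {
  version=$1
  part=$2
  # Split X.Y.Z on the dots.
  IFS=. read -r major minor patch <<EOF
$version
EOF
  case $part in
    major) major=$((major + 1)); minor=0; patch=0 ;;  # incompatible API change
    minor) minor=$((minor + 1)); patch=0 ;;           # backwards compatible feature
    patch) patch=$((patch + 1)) ;;                    # backwards compatible bug fix
    *) echo "unknown part: $part" >&2; return 1 ;;
  esac
  echo "$major.$minor.$patch"
}

bump 1.4.2 patch   # prints 1.4.3
bump 1.4.2 minor   # prints 1.5.0
bump 1.4.2 major   # prints 2.0.0
```

Note that the lower parts are reset to zero whenever a higher part is incremented.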

In my experience, developers who are new to SemVer tend to create new versions every week, especially when the project is at an early stage and something has to be shared with other teams. My recommendation is to follow item 9 of the SemVer specification, which covers pre-release versions. Create a version such as 0.2.0-[date] to distinguish quick changes and, once you reach the milestone, release 0.2.0 and start the same process with 0.3.0.

When to use Git to manage versions between services

If you are working with a single technology with a single purpose, let’s say Go to develop Traefik, you will be OK by just following SemVer and using Go Modules. The same applies to an Angular application: you are backed by SemVer and npm. But what happens when you are developing a backend in Java with a frontend in Angular? How do you keep track of which versions actually work together? What if I told you that the backend is not a single boring monolith but a cool set of microservices? Soon you will find yourself in Dependency Hell, and this is the case where Git can help you.

Let’s imagine a shop application with different modules. I’m going to focus on the parts of the application that are usually developed in-house and ignore infrastructure details such as reverse proxy, secret management, databases…

  • Frontend with Angular 8
  • Cart service with Java - Reused from the past
  • Browse service with Java - People already know this technology
  • Search service with Go - The old Java version doesn’t meet the performance requirements
  • Recommendation service with Python - It is in Python because of the tooling.

Due to the nature of the different stacks, there will be different people involved in the development of each service. This implies that each one will be hosted in a separate Git repository.

In the traditional approach, when you want to create a working environment, you go through the following steps.

  1. Clone repositories.
  2. Checkout version tags.
  3. Build/Test service images.
  4. Push artefacts.
  5. Deploy to Integration.
  6. Integration Testing.
  7. Deploy to Production
  8. Sanity Check

After the Integration Testing step, you can simply take note of which versions are deployed and mark this combination as a valid one. The problem with handling valid combinations manually is that it can become a mess really quickly. What if we use Git to track these versions? Let me introduce you to Git Submodules.

Git Submodules

A Git Submodule is just a reference to a specific commit of another repository, versioned like any other file inside the root repository. It’s similar to a pointer.

To add Submodules, you have a special git command. This is how you can create the previous environment with Git Submodules.

git init
git submodule add https://www.github.com/frontend
git submodule add https://www.github.com/services/cart
git submodule add https://www.github.com/services/browse
git submodule add https://www.github.com/services/search
git submodule add https://www.github.com/services/recommendation
git add .
git commit -m "added project dependencies"

This is what the file structure will look like.

.
├── .git
├── browse
│   └── .git
├── cart
│   └── .git
├── frontend
│   └── .git
├── recommendation
│   └── .git
└── search
    └── .git

Each submodule is an independent Git repository. If you cd into one of them, you will see that git commands work as usual. You can edit files, commit and push changes back to the remote repository. When you do, the parent repository will report changes in the submodule. This happens whenever the commit checked out in the submodule differs from the commit recorded in the parent. You can then add these changes and create a commit in the parent to tell Git that you want to update the version of the submodule.
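
To make the pointer idea concrete, here is a minimal sketch that can be run without network access. It is a throwaway demo under some assumptions: the repositories live in a temporary folder, cart-src is a made-up local stand-in for the remote cart repository, and the protocol.file.allow setting is only there because recent Git versions refuse local paths as submodule remotes by default.

```shell
set -eu
work=$(mktemp -d)
# Throwaway identity so commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Local stand-in for the remote cart repository (hypothetical name).
git init -q "$work/cart-src"
git -C "$work/cart-src" commit -q --allow-empty -m "cart: first commit"

# Root repository with one submodule.
git init -q "$work/shop"
cd "$work/shop"
git -c protocol.file.allow=always submodule --quiet add "$work/cart-src" cart
git commit -q -m "added project dependencies"

# The root stores the submodule as a commit entry (mode 160000), not as files.
git ls-tree HEAD cart
pinned_before=$(git rev-parse HEAD:cart)

# Work inside the submodule as in any other repository (push omitted here).
git -C cart commit -q --allow-empty -m "cart: second commit"

# The root reports the submodule as modified...
git status --short cart
# ...until the new commit is recorded in the root.
git add cart
git commit -q -m "update cart"
pinned_after=$(git rev-parse HEAD:cart)
echo "cart moved from $pinned_before to $pinned_after"
```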

init

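One of the most annoying parts of Submodules is that when you clone the repository, the Submodules are not cloned by default. There is a special command called init that must be executed after clone. It registers the Submodules listed in .gitmodules in your local configuration; the repositories themselves are downloaded by update, described next.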

git submodule init

update

When you use git checkout to move between versions, the Submodules are not updated by default. Once you are at the desired commit, git status will report the Submodules as modified because their checked-out commits differ from the ones recorded in that commit. To automatically check out each submodule at the right version, use update.

git submodule update
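
The effect of init and update on a fresh clone can be checked with git submodule status, which prefixes a submodule with - while it is not initialized. The sketch below is a local demo only: cart-src and shop-clone are made-up names, everything lives in a temporary folder, and protocol.file.allow is needed only because local paths play the role of remotes.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-ins for the remote repositories (hypothetical names).
git init -q "$work/cart-src"
git -C "$work/cart-src" commit -q --allow-empty -m "cart: first commit"
git init -q "$work/shop"
git -C "$work/shop" -c protocol.file.allow=always submodule --quiet add "$work/cart-src" cart
git -C "$work/shop" commit -q -m "added project dependencies"

# A plain clone leaves the submodule folder empty.
git clone -q "$work/shop" "$work/shop-clone"
cd "$work/shop-clone"
before=$(git submodule status)

git submodule --quiet init                                   # register the submodule
git -c protocol.file.allow=always submodule --quiet update   # clone it and check out the pinned commit
after=$(git submodule status)

echo "before: $before"
echo "after:  $after"
```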

conflicts

When you move between versions that contain different sets of Submodules, Git doesn’t handle it gracefully. You will see weird messages when executing update. The easy way to move forward is to remove the submodule folders manually and execute init and update again.

Workflow

Now that we have a repository to control the different versions, the workflow is straightforward. Recalling the previous steps, they now look like this.

  1. Clone root repository.
  2. Init Submodules
  3. For each submodule, Checkout version tags.
  4. Build/Test service images.
  5. Commit changes to root repository
  6. Push artefacts.
  7. Deploy to Integration.
  8. Integration Testing.
  9. Create version tag in root repository
  10. Deploy to Production
  11. Sanity Check

This gives you a clear record of which versions work well together. At any given moment you can use the root repository to recreate an environment for any recorded combination of versions.
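
As a rough illustration of tagging the root repository and going back to a known combination later, here is a local sketch. The repository names, tags and commit messages are invented for the demo, and protocol.file.allow is only required because a local path acts as the remote.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the cart service with two released versions.
git init -q "$work/cart-src"
git -C "$work/cart-src" commit -q --allow-empty -m "cart 1.0.0"
git -C "$work/cart-src" tag 1.0.0
git -C "$work/cart-src" commit -q --allow-empty -m "cart 1.1.0"
git -C "$work/cart-src" tag 1.1.0

git init -q "$work/shop"
cd "$work/shop"
git -c protocol.file.allow=always submodule --quiet add "$work/cart-src" cart

# Root 0.1.0 records cart 1.0.0.
git -C cart checkout -q 1.0.0
git add cart
git commit -q -m "shop 0.1.0: cart 1.0.0"
git tag 0.1.0

# Root 0.2.0 records cart 1.1.0.
git -C cart checkout -q 1.1.0
git add cart
git commit -q -m "shop 0.2.0: cart 1.1.0"
git tag 0.2.0

# Later: recreate the 0.1.0 combination.
git checkout -q 0.1.0
git -c protocol.file.allow=always submodule --quiet update
git -C cart describe --tags   # prints 1.0.0
```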

Docker compose

If we assume that each submodule is packaged as a Docker container that will be deployed to an orchestrator, this gives us an easy and straightforward way to create environments on demand. Following the previous sample application, this is what the file structure will look like.

.
├── .git
├── docker-compose.yaml
├── browse
│   ├── .git
│   └── Dockerfile
├── cart
│   ├── .git
│   └── Dockerfile
├── frontend
│   ├── .git
│   └── Dockerfile
├── recommendation
│   ├── .git
│   └── Dockerfile
└── search
    ├── .git
    └── Dockerfile

This is the matching docker-compose.yaml file.

version: "2.4"

services:
  browse:
    build: browse
    image: browse:latest
  cart:
    build: cart
    image: cart:latest
  frontend:
    build: frontend
    image: frontend:latest
  recommendation:
    build: recommendation
    image: recommendation:latest
  search:
    build: search
    image: search:latest

Let’s translate the previous abstract steps into commands.

  1. Clone root repository.

git clone https://www.github.com/shop
cd shop

  2. Init Submodules

git submodule init

  3. For each submodule, Checkout version tags.

git submodule update

  4. Build/Test service images.

# Assuming unit tests are executed at docker build time
docker-compose build

  5. Commit changes to root repository

git add .
git commit -m "version x.x.x-rcx build successful"

  6. Push artefacts.

docker-compose push

  7. Create an environment

docker-compose up

  8. Integration testing

# Maybe a newman container or Robot python scripts... Up to you

With this approach we solve not only the dependency hell problem, but also the recreation of environments. In other words, this solves the “works on my machine” problem, or at least reduces the chances of running into it.

There are tools that manage dependencies for a given programming language or framework, and there are tools that manage container versions. But how do we keep track of which versions to use when we are dealing with multiple sources? That is the gap the root repository fills.

.gitmodules file

You may have noticed that there is an extra file called .gitmodules that contains the list of Submodules. Here is the full file reference in case you want to read it. I will just explain the most basic functionality.

This is what the file will look like after adding the Submodules.

[submodule "frontend"]
	path = frontend
	url = https://www.github.com/frontend

[submodule "cart"]
	path = cart
	url = https://www.github.com/services/cart

[submodule "browse"]
	path = browse
	url = https://www.github.com/services/browse

[submodule "search"]
	path = search
	url = https://www.github.com/services/search

[submodule "recommendation"]
	path = recommendation
	url = https://www.github.com/services/recommendation

It is really simple to understand: each section specifies the path and the remote URL of one submodule. You can modify these values manually, but it is recommended to use the git CLI to avoid introducing mistakes. Maybe the only exception is modifying the URL to make it relative. Check the Tips section.
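
Because .gitmodules uses the same format as other Git configuration files, git config -f can read and write it. The sketch below builds a small file in a temporary folder just to show the commands; note that writing .gitmodules on its own does not add a submodule, so this is only a way to inspect or adjust existing entries.

```shell
set -eu
cd "$(mktemp -d)"

# Build a small .gitmodules file for the demo.
git config -f .gitmodules submodule.cart.path cart
git config -f .gitmodules submodule.cart.url https://www.github.com/services/cart
git config -f .gitmodules submodule.search.path search
git config -f .gitmodules submodule.search.url https://www.github.com/services/search

# Read a single value.
git config -f .gitmodules --get submodule.cart.url

# List the URL of every submodule.
git config -f .gitmodules --get-regexp '^submodule\..*\.url$'
```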

Tips

Relative paths

Submodules in the previous sections were added with full URLs. This means that we can specify a different protocol or a different server for each one. In practice, the repositories are hosted under the same domain most of the time, or forks are created. This situation allows us to make the Submodule URLs relative to the root repository.

[submodule "frontend"]
	path = frontend
	url = ../frontend

[submodule "cart"]
	path = cart
	url = ../services/cart

[submodule "browse"]
	path = browse
	url = ../services/browse

[submodule "search"]
	path = search
	url = ../services/search

[submodule "recommendation"]
	path = recommendation
	url = ../services/recommendation

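The advantage of this approach is that now you can select https or ssh depending on your needs and only change it in one place, the remote URL of the root repository. Git resolves a submodule URL that starts with ./ or ../ against that remote URL. This is also a requirement if you have a CI/CD pipeline, as you can read in the GitLab documentation.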
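
Here is a small local sketch of a relative URL in action. The folder layout under remotes is invented for the demo and stands in for a hosting server where the root repository and frontend are siblings; protocol.file.allow is only needed because the remotes are local paths.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two sibling repositories on the pretend server.
mkdir "$work/remotes"
git init -q "$work/remotes/shop"
git -C "$work/remotes/shop" commit -q --allow-empty -m "shop: first commit"
git init -q "$work/remotes/frontend"
git -C "$work/remotes/frontend" commit -q --allow-empty -m "frontend: first commit"

# Clone the root, then add the submodule relative to the root's remote.
git clone -q "$work/remotes/shop" "$work/shop"
cd "$work/shop"
git -c protocol.file.allow=always submodule --quiet add ../frontend frontend
git commit -q -m "added frontend with a relative URL"

# .gitmodules keeps the relative form.
git config -f .gitmodules --get submodule.frontend.url   # prints ../frontend
```

If you change a URL in .gitmodules for a submodule that is already initialized, git submodule sync copies the new value into your local configuration.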

update --init

You can run init + update in a single command with the flag --init.

git submodule update --init

update --remote

If you need to update all the Submodules to the latest commit of the branch they track, you can use the --remote flag.

git submodule update --remote

Also, you can specify which branch to track in the .gitmodules file. This allows you to track branches other than the default one.
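
A minimal sketch of tracking a non-default branch, assuming a made-up branch called develop and local stand-in repositories in a temporary folder (protocol.file.allow is only there because the remote is a local path):

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/cart-src"
git -C "$work/cart-src" commit -q --allow-empty -m "cart: first commit"

git init -q "$work/shop"
cd "$work/shop"
git -c protocol.file.allow=always submodule --quiet add "$work/cart-src" cart
git commit -q -m "added cart"

# Upstream work continues on a develop branch (hypothetical name).
git -C "$work/cart-src" checkout -q -b develop
git -C "$work/cart-src" commit -q --allow-empty -m "cart: work on develop"

# Tell the root which branch to follow, then pull in its latest commit.
git config -f .gitmodules submodule.cart.branch develop
git -c protocol.file.allow=always submodule --quiet update --remote cart

# Record the new pointer and the branch setting in the root.
git add .gitmodules cart
git commit -q -m "cart: follow develop"
```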

Remove Submodule

While creating submodules is really easy, removing them is not straightforward. You can find instructions here: Remove Submodule.

To remove a submodule you need to:

  • Delete the relevant section from the .gitmodules file.
  • Stage the .gitmodules changes: git add .gitmodules
  • Delete the relevant section from .git/config.
  • Run git rm --cached path_to_submodule (no trailing slash).
  • Run rm -rf .git/modules/path_to_submodule (no trailing slash).
  • Commit: git commit -m "Removed submodule"
  • Delete the now untracked submodule files: rm -rf path_to_submodule
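
On a reasonably recent Git, a shorter route than the manual steps above is to let git submodule deinit and git rm edit the configuration files for you. This is a sketch of that alternative and not the procedure from the linked instructions; the repositories are local stand-ins created in a temporary folder.

```shell
set -eu
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/cart-src"
git -C "$work/cart-src" commit -q --allow-empty -m "cart: first commit"
git init -q "$work/shop"
cd "$work/shop"
git -c protocol.file.allow=always submodule --quiet add "$work/cart-src" cart
git commit -q -m "added cart"

git submodule --quiet deinit -f cart   # unregister it from .git/config and empty its folder
git rm -q cart                         # drop the pointer and its .gitmodules section
rm -rf .git/modules/cart               # delete the cached clone
git commit -q -m "removed cart submodule"
```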

Modules folder

Put all Submodules under a modules folder, so you can delete all of them with a single operation. This is really helpful when you have to resolve conflicts.

Conclusion

This just scratches the surface of how to set up a project, but hopefully it gives you some hints on how to introduce Git Submodules to track versions of different repositories.
