Not a git tutorial
Git is the dominant version control system today. Why did Linus Torvalds write it back in 2005, what problems was he trying to solve, how did it gain such a huge market share, and what are some of its design considerations?
This is not a tutorial about git. Instead, it is a summary of what I know about git.
What is the purpose of Version Control?
There are 3 primary objectives of Version Control[1].
- Work is done concurrently, not sequentially
- Work between people has to have minimal race conditions
- We want to see the history, who did what, when and why
Work is done concurrently, not sequentially.
The same code base has to be worked on by multiple developers at the same time. Developers in Singapore and Switzerland have to be able to work on the same code base independently of each other.
Work between people has to have minimal race conditions.
The same pair of developers in Singapore and Switzerland have to be able to put their code together when their day ends. The combined code should work well and without errors. If their changes conflict, the Version Control software should make those conflicts easy to resolve.
We want to see the history.
Every line of code should be traceable to a developer, with the time it was written alongside the consideration and circumstances behind the piece of code.
Git vs Other Version Control Systems Today
Git has had the largest market share in the version control category for many years[2]. Because it is open source, it has many hosting providers and front-end client programs, and it is built into many tools such as IDEs (Integrated Development Environments) and CI/CD pipelines (Continuous Integration and Delivery). Most software developers have to understand and use git to be proficient at their job, regardless of their software stack and area of expertise.
The runner-up Version Control Software is SVN, Apache Subversion. It adopts a centralized model, which is architecturally different from git's decentralized, distributed model. This comes with many trade-offs, and they make git the better option when trying to achieve the 3 primary objectives of version control.
What did Torvalds expect from a Version Control System?
Linus Torvalds set out to write a Version Control System to manage the Linux Kernel. He was unsatisfied with what the market offered at the time. To manage such a complex project with thousands of developers, he had some requirements for his version control system:
- The system has to perform well when showing the differences between commits, even when they span thousands of lines.
- Decentralized system - the user has to be able to clone the entire project locally.
- Trusting the code - history of the code must be verifiable.
- Branches have to be performant, easy to manage and local.
How does git represent code changes?
The history of a git repository is one huge tree:
- Every commit is a node connected to a previous node.
- We can split away from the existing tree to create a new branch.
- Each commit is an entire snapshot of the project.
- When you check out a specific commit in the git tree, you get the entire project as it was at that point in time.
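The points above can be sketched in a few lines of Python. This is only a toy model, not how git stores data on disk: the `Commit` class, the `history` helper and the `branches` dictionary are names I made up for illustration, and each commit here has a single parent (real merge commits have more than one).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """A hypothetical, simplified commit: a full snapshot plus a link to its parent."""
    message: str
    snapshot: Dict[str, str]            # file path -> file content (the whole project)
    parent: Optional["Commit"] = None   # None for the initial commit


def history(tip: Commit) -> List[str]:
    """Walk the parent links from a commit back to the initial commit."""
    messages = []
    node: Optional[Commit] = tip
    while node is not None:
        messages.append(node.message)
        node = node.parent
    return messages


# Two commits on the main line of development.
root = Commit("initial commit", {"README": "hello\n"})
second = Commit("add app", {"README": "hello\n", "app.py": "print('hi')\n"}, parent=root)

# A branch is modelled as nothing more than a name pointing at a commit.
branches = {"main": second}

# Splitting away from the tree: a new commit whose parent is an existing commit.
branches["feature"] = Commit(
    "greet by name",
    {"README": "hello\n", "app.py": "print('hi, Ada')\n"},
    parent=second,
)

# Every commit carries the whole project, so any node can be restored on its own.
print(history(branches["feature"]))
print(sorted(root.snapshot))
```

In this model, creating a branch costs almost nothing because it only adds one more name to `branches`, while `main` keeps pointing at the same commit as before.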
Each commit (node in the tree) will have the following details:
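- Commit Hash - The contents of the commit (its snapshot, its parent commit, the author and the message) go through a hashing algorithm to generate a hash value. It acts as an ID for the commit, and because each hash also covers the hash of the parent commit, it allows us to trust the history of the project.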
- Commit Message - Every developer will write a succinct message to explain what they changed.
- Changes from the previous commit - Git provides a diff feature, allowing us to see changes line by line. Git also picks up on context. When we rename a file, git notices that the contents are the same or very similar, so the diff shows a rename rather than one file removed and another file added under a different name.
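To make the commit hash less abstract, here is a minimal sketch of how an ID can be derived, assuming git's SHA-1 object format, where an object ID is the SHA-1 of a short header followed by the object's body. The helper names `git_object_id` and `commit_body` are mine, and the author, email and timestamp are made-up values. A real commit is written by git itself and may contain more fields.

```python
import hashlib
from typing import Optional


def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object as sha1("<kind> <size>" + NUL byte + body)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree_id: str, parent_id: Optional[str], message: str) -> bytes:
    """Build a simplified commit body. Author and timestamp are illustrative only."""
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")  # the parent's hash is part of the body
    lines.append("author Ada <ada@example.com> 1700000000 +0000")
    lines.append("committer Ada <ada@example.com> 1700000000 +0000")
    lines.append("")
    lines.append(message)
    return ("\n".join(lines) + "\n").encode()


# Use an empty snapshot (an empty tree object) to keep the example short.
empty_tree = git_object_id("tree", b"")

first = git_object_id("commit", commit_body(empty_tree, None, "initial commit"))
second = git_object_id("commit", commit_body(empty_tree, first, "second commit"))

# Rewrite the first commit's message and rebuild the second commit on top of it.
tampered_first = git_object_id(
    "commit", commit_body(empty_tree, None, "initial commit (edited)")
)
second_on_tampered = git_object_id(
    "commit", commit_body(empty_tree, tampered_first, "second commit")
)

print(first != tampered_first)       # the edited commit gets a different ID
print(second != second_on_tampered)  # and so does every commit built on top of it
```

Because the parent's ID is hashed into the child, changing anything in an old commit changes the ID of every commit after it. That is why knowing the latest hash is enough to trust the whole history behind it.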
Every git tree is local
A git tree refers to the entire repository of a project. Since the entire repository contains all snapshots of a project from the initial commit to the latest, it is effectively the entire tree if you also have all of the branches. Having a local tree brings many advantages:
- Work can be done offline.
- The changes you make can be vetted locally or externally before you submit them to the collaboration tree.
- You can make as many branches as you want. Since trees are local, branches are local too. That means local branches will not be shared with the collaboration tree unless you want them to be.
- Every diff and branching operation is faster because no network access is required.
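Since every snapshot already sits on your machine, a diff is just a comparison between two local snapshots. Here is a rough sketch of that idea, using Python's `difflib` as a stand-in for git's own diff machinery. The `diff_snapshots` function and the dictionary-based snapshots are assumptions made for illustration.

```python
import difflib
from typing import Dict, List


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return unified-diff lines for every file that differs between two snapshots."""
    lines: List[str] = []
    for path in sorted(set(old) | set(new)):
        # A file missing from one snapshot is treated as empty on that side.
        before = old.get(path, "").splitlines(keepends=True)
        after = new.get(path, "").splitlines(keepends=True)
        if before != after:
            lines.extend(
                difflib.unified_diff(before, after, f"a/{path}", f"b/{path}")
            )
    return lines


# Two snapshots of the same small project, both held in local memory.
old_snapshot = {"README": "hello\n", "app.py": "print('hi')\n"}
new_snapshot = {"README": "hello\n", "app.py": "print('hi, Ada')\n"}

# No server is contacted at any point: both inputs are already local.
print("".join(diff_snapshots(old_snapshot, new_snapshot)), end="")
```

Only `app.py` appears in the output because `README` is identical in both snapshots. In a centralized system, the older snapshot would first have to be fetched from the server before a comparison like this could run.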
References
Photo by Juliano Ferreira from Pexels
- [1] Eric Sink, Version Control by Example, ericsink.com/vcbe/html/index.html.
- [2] “Stack Overflow Developer Survey 2018.” Stack Overflow, insights.stackoverflow.com/survey/2018#work-_-version-control.