A is the set of commits (a1, a2, a3…) on branch A.
B is the set of commits (b1, b2, b3…) on branch B that were added after this branch was created based on branch A.
A’ is the set of new commits (a’1, a’2…) on branch A that need to be added to branch B via a merge or rebase.
So before the merge or rebase, the current configuration of the branches is:
Branch A: A + A’ (original set of commits and new set of commits) = a1 + a2 + a3 + … + a’1 + a’2 +….
Branch B: A + B (original set of commits from branch A and set of commits on branch B) = a1 + a2 + a3 + … + b1 + b2 + b3 + …
When I talk about the merge from branch A onto branch B, the addition of the sets of commits is
A + B + A’ = a1 + a2 + a3 + … + b1 + b2 + b3 + … + a’1 + a’2 + …
The order is important: You start with the original commits on branch A. The commits on branch B are added on top of those commits. And finally the new commits from branch A are added on top of those.
You can also view “A + B” as the code that you see in the directory, that is, the end result of applying the sets of commits A and B. That is why I have the first statement:
A + B + A’ = A + A’ + B
Whether you choose a rebase or a merge, the final code that you see should be the same. However, the order of the individual commits that go into it is different.
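To make the "same code, different order" point concrete, here is a minimal Python sketch. It is a toy model, not how Git stores anything: each commit is just a name plus the files it changes, the file names are made up, and it assumes A’ and B touch different files so there are no conflicts.

```python
# Toy model: a commit is (name, {path: new_content}). File names are hypothetical.
A = [("a1", {"app.py": "v1"}), ("a2", {"lib.py": "v1"})]
B = [("b1", {"feature.py": "v1"}), ("b2", {"feature.py": "v2"})]
A_new = [("a'1", {"lib.py": "v2"})]  # stands in for A' (no overlap with B's files)

def apply_commits(commits):
    """Apply commits in order and return the resulting directory contents."""
    tree = {}
    for _name, changes in commits:
        tree.update(changes)
    return tree

merge_order = A + B + A_new    # A + B + A'
rebase_order = A + A_new + B   # A + A' + B

same_tree = apply_commits(merge_order) == apply_commits(rebase_order)
merge_names = [name for name, _ in merge_order]
rebase_names = [name for name, _ in rebase_order]

print(same_tree)
print(merge_names)
print(rebase_names)
```

The final directory contents match, while the two lists of commit names come out in a different order.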
From the standpoint of the end result, a merge and a rebase in Git appear to do the same thing:
A + B + A’ (merge) = A + A’ + B (rebase)
Wouldn’t it be simpler to just choose one operation and stick with it?
The answer of course is no. Otherwise you wouldn’t have the option. (If you feel completely contrary, best of luck. And choose merge.)
The difference really only becomes apparent once your development effort becomes hierarchical, either in terms of application lifecycle (such as dev, test and release versions) or in terms of team structure. You’re now dealing with multiple branches, each with non-trivial changes that can and will occur independently.
Rebases are how changes should pass from the top of the hierarchy downwards, and merges are how they flow back upwards.
Let’s take a look in more detail at what is actually taking place in each operation for a parent branch A and a child branch B:
Merge
A + B + A’ + dA’B
where A’ is the set of changes being merged in and dA’B is the resolution of merge conflicts between A’ and the set of commits B on the current branch.
The changes introduced by A’ and dA’B are grouped together into one merge commit A” = A’ + dA’B that is added on top of the existing set of commits (A + B) and becomes the head of the current branch:
A + B + A”
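The merge picture above can be sketched in a few lines of Python. This only mirrors the notation used here (it is not a model of Git's commit graph), and the names merge, incoming and resolution are made up for the illustration.

```python
A = ["a1", "a2", "a3"]
B = ["b1", "b2"]
A_new = ["a'1", "a'2"]   # stands in for A'
conflict_fix = "dA'B"    # hypothetical label for the conflict resolution

def merge(current_branch, incoming, resolution):
    """Add one merge commit A'' = A' + dA'B on top of the current branch."""
    merge_commit = {"name": "A''", "contains": incoming + [resolution]}
    return current_branch + [merge_commit]

branch_b = merge(A + B, A_new, conflict_fix)
head = branch_b[-1]

print(branch_b[:-1])      # the existing commits A + B are untouched
print(head["contains"])   # incoming changes and conflict fixes share one commit
```

Note how the existing commits stay exactly where they were, and everything new, including the conflict resolution, lands in the single commit at the head.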
Rebase
A + A’ + B + dA’B
The rebase resets the starting point of the branch to A + A’ and reapplies the set of commits B. The merge conflict resolutions dA’B are combined with these commits, so they become grouped together as B’ = B + dA’B:
A + A’ + B’
If you stare at it, you’ll realize that the rebase guarantees that the changes being brought in from the other branch come in exactly as-is: A + A’. Any conflicts are resolved and contained within the associated commits B’ on the current branch.
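Here is the same kind of toy sketch for the rebase. Again it only follows the notation above, and it assumes (purely for illustration) that b2 is the only commit that conflicts with A’. The function name rebase and the resolutions mapping are made up.

```python
A = ["a1", "a2", "a3"]
A_new = ["a'1", "a'2"]           # stands in for A'
B = ["b1", "b2"]
resolutions = {"b2": "dA'B"}     # hypothetical: only b2 needed a conflict fix

def rebase(upstream, own_commits, resolutions):
    """Replay own_commits on top of upstream, folding each fix into its commit."""
    replayed = []
    for name in own_commits:
        replayed.append({"name": name, "conflict_fix": resolutions.get(name)})
    return upstream + replayed

branch_a = A + A_new
branch_b = rebase(branch_a, B, resolutions)
upstream_part = branch_b[:len(branch_a)]

print(upstream_part == branch_a)   # the parent's commits come in exactly as-is
print(branch_b[len(branch_a):])    # B' = B + dA'B, fixes stay with their commits
```

The first part of branch B is now identical to branch A, and each conflict fix sits with the commit that needed it.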
Using merge to pull in changes from the higher-level branch mixes those changes with the resolved merge conflicts. This means that the current branch won’t necessarily have the same state as the one it was based on (from the hierarchical structure). And since the resolved conflicts are grouped all together in one merge commit, you’ve also made it harder to cleanly cherry-pick individual changes.
Having development lifecycle tracks and each local developer branch start from known consistent states is critical to reducing and resolving code issues. Where a change occurs (or suddenly becomes missing) and who is responsible become easier to determine. By using the rebase to pull in changes, you have that.
When you’re submitting changes back up the chain, you only want to add your changes on top of the existing commits of the higher-level branch. Merge is clearly the operation for that.
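A small sketch of that upward step, in the same toy notation. It assumes branch B was rebased first, so it already starts with everything on branch A. The merge_up helper is hypothetical and only checks that assumption before adding B’ on top.

```python
branch_a = ["a1", "a2", "a3", "a'1", "a'2"]   # A + A'
branch_b = branch_a + ["b'1", "b'2"]          # A + A' + B' after the rebase

def merge_up(parent, child):
    """Add the child's own commits on top of the parent branch."""
    if child[:len(parent)] != parent:
        raise ValueError("child is not based on the parent's current commits")
    new_commits = child[len(parent):]
    return parent + new_commits, new_commits

merged_a, added = merge_up(branch_a, branch_b)

print(added)                  # only your own changes are added
print(merged_a == branch_b)   # the parent now matches the rebased child
```

Because the child already contains the parent's commits unchanged, the only thing that moves up the chain is B’.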
Even if you’re a single developer with only a few branches, it’s worth it to get in the habit of using rebase and merge properly. The basic work pattern will look like: