This is the parent that uses "git submodule".

Here's how you can re-trace the demo given at utviklerlunsj.

First, make yourself a directory and clone the repository into it:

git clone .

Now reset your working copy of this repository to the commit right before we added the submodule:

git reset --hard 375c69b8e9ffb36b79c0d7cc0121d622c5b169aa

Okay, now there shouldn't be any my_lib. Let's make ourselves a branch in order to not mess up the master branch:

git checkout -b YOUR_BRANCH_NAME

Now add my_lib as a submodule:

git submodule add my_lib
git status # shows new metadata file .gitmodules and new *file* my_lib
git commit -m "[parent_with_submodule] added my_lib submodule"

After committing, we can inspect what git actually did:

git show # all changes in last commit, note that git doesn't save the contents of my_lib but rather a reference to a commit SHA
git cat-file -p HEAD^{tree} # this is (some of) what git actually stored in the last commit. Note the special "type" of my_lib: a file would be "blob" and a directory would be "tree", but a submodule is stored as "commit" (with mode 160000).

You'll find that in the directory 'my_lib', you are in a different repository:

git remote -v # shows that origin is "parent_with_submodule"
cd my_lib
git remote -v # shows that origin is "my_lib"
cd ..

The outputs of 'git log' and 'git status' will differ as well.

Now make an arbitrary change in my_lib and commit it there. When you then do a 'git status' in parent, it will show that my_lib has changed ("new commits"), and 'git diff my_lib' will show the change to the referenced commit SHA.
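If you'd rather try this step in a throwaway sandbox first, here is a minimal sketch. Everything in it is made up for illustration and is not part of the demo repository: the temporary directory, the stand-in library 'lib', the file notes.txt and the 'demo' identity. The '-c protocol.file.allow=always' is only there because recent Git versions refuse submodule URLs on the local disk by default.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity so that commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in library repository with a single commit.
git init -q lib
git -C lib commit -q --allow-empty -m "initial lib commit"

# A stand-in parent that uses the library as a submodule.
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" my_lib
git commit -q -m "added my_lib submodule"

# Make a change inside the submodule and commit it there.
echo "hello" > my_lib/notes.txt
git -C my_lib add notes.txt
git -C my_lib commit -q -m "add notes"

# Back in parent: my_lib is listed as modified ...
git status --short

# ... and the diff is just the old and the new "Subproject commit" SHA.
diff_out=$(git diff my_lib)
echo "$diff_out"
```

Note that the diff never contains notes.txt itself. From parent's point of view, the only thing that changed is which commit of my_lib it points to.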

Now we will deliberately do something awful: commit parent and push it to origin.

git add my_lib
git commit -m "new version of my_lib"
git push --set-upstream origin YOUR_BRANCH_NAME

Now what do you think will happen when somebody wants to clone that? Let's try:

cd ..
mkdir temp_clone
cd temp_clone
git clone -b YOUR_BRANCH_NAME --recursive .

Notice the '--recursive' option? (Newer Git versions document it as '--recurse-submodules'.) You need it to clone not only parent but also all of its submodules. Yes, you and your users will forget about that constantly. This is the biggest headache with submodules. But there's another. Check this at the end of the output:

fatal: reference is not a tree: 5870fc7691231edf23517055b09768e00e123bf7
Unable to checkout '5870fc7691231edf23517055b09768e00e123bf7' in submodule path 'my_lib'

(Your hash will differ.) Git couldn't check out the referenced commit of my_lib because it did not find it. That is because we did not push (i.e. publish) our changes to my_lib before we published our changes to parent. Always push children before you push parent, so that the published parent only ever points to commits of children that are already published.
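Git can guard against this mistake. As a sketch: 'git push --recurse-submodules=check' refuses to push parent when it references submodule commits that cannot be found on any remote of the submodule. The sandbox below reproduces the situation with made-up names (lib.git, parent.git, lib_work and the 'demo' identity are all hypothetical), so treat it as an illustration under those assumptions rather than as part of the demo.

```shell
set -e
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical identity so that commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two bare repositories play the role of the server.
git init -q --bare lib.git
git init -q --bare parent.git

# Publish a first commit of the library.
git clone -q lib.git lib_work
git -C lib_work commit -q --allow-empty -m "initial lib commit"
git -C lib_work push -q origin HEAD

# Parent adds the library as a submodule and publishes that.
git clone -q parent.git parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib.git" my_lib
git commit -q -m "added my_lib submodule"
git push -q origin HEAD

# The mistake: commit in my_lib, record it in parent, but never push my_lib.
git -C my_lib commit -q --allow-empty -m "unpublished change"
git add my_lib
git commit -q -m "new version of my_lib"

# With the check enabled, git refuses to publish parent.
if git push --recurse-submodules=check origin HEAD 2>/dev/null; then
  push_result=pushed
else
  push_result=refused
fi
echo "$push_result"
```

'--recurse-submodules=on-demand' goes one step further and pushes the affected submodules before parent. Setting 'push.recurseSubmodules' in your git config makes either behaviour the default.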

So here's how we fix this: push the child, then update the submodules in our temporary working copy. Pushing the child is easy (run 'git push' inside my_lib in your original working copy). Here's how you pull and update submodules in temp_clone:

git submodule update --init --recursive

That command clones submodules if they haven't been cloned yet, and it checks out the newly referenced commit when parent's reference has changed. So it is also what you have to run after every 'git pull' of parent. This is the second big headache with submodules: you have to remember to update the submodules after 'git pull'.
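One way to take some of the pain out of this is the 'submodule.recurse' setting. With it set to true, commands such as 'git pull' and 'git checkout' behave as if '--recurse-submodules' had been given, so submodules that are already initialized follow parent automatically. It does not apply to 'git clone', and a submodule that has never been initialized still needs 'git submodule update --init'. The sketch below only shows how to set and read the option, using an empty throwaway repository (a hypothetical stand-in for your clone of parent):

```shell
set -e
sandbox=$(mktemp -d)            # throwaway location, not the demo repository
git init -q "$sandbox/parent"
cd "$sandbox/parent"

# Make pull, checkout etc. recurse into submodules for this repository only.
git config submodule.recurse true

# Read the value back to confirm that it is stored in the repository config.
recurse_setting=$(git config --get submodule.recurse)
echo "submodule.recurse = $recurse_setting"
```

Use 'git config --global' instead if you want this for all your repositories. Keep in mind that your users would each have to set it themselves, so it does not replace telling them about 'git submodule update'.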

But that's basically it, now you know how submodules work. Delete your branch now so that there is no pollution on origin:

git push origin --delete YOUR_BRANCH_NAME

Finally, you can delete temp_clone.