bramboomen edited this page Sep 13, 2024 · 5 revisions

Etl-tooling git submodule

The etl-tooling repository is meant to be used as a 'submodule' of a pipeline repository, located at the root of that repository. Once initialized and updated, the submodule looks like an ordinary subfolder called 'etl-tooling' on the file system, containing all the files from the repository. To Git, however, the submodule is merely a reference to a specific commit of the etl-tooling repository.
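The "reference to a specific commit" can be inspected directly. The sketch below is one way to do that, run from the root of a pipeline repository; the helper names `recorded_commit` and `checked_out_commit` are made up for illustration and are not part of etl-tooling.

```shell
# Hypothetical helper: print the etl-tooling commit that the pipeline's HEAD records.
# Run from the root of the pipeline repository.
recorded_commit() {
    # A submodule is stored in the tree as a "gitlink": mode 160000, type "commit".
    git ls-tree HEAD etl-tooling | awk '$1 == "160000" && $2 == "commit" { print $3 }'
}

# Hypothetical helper: print the commit currently checked out in the submodule folder.
checked_out_commit() {
    git -C etl-tooling rev-parse HEAD
}
```

When both helpers print the same hash, the folder on disk matches the version the pipeline refers to.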

A submodule is itself a normal Git repository. Git provides dedicated commands of the form git submodule <command> <submodule>, but ordinary git commands work inside the submodule as well. For instance, the following three approaches have the same effect, assuming the submodule tracks the main branch of the remote:

# Update the submodule to the latest version using the submodule command
git submodule update --remote etl-tooling

# Change directory to etl-tooling and check out the latest version
cd etl-tooling            # Change to etl-tooling directory
git fetch origin          # Fetch the latest commits from the remote
git checkout origin/main  # Check out the main branch from the remote
cd -                      # Change back to the previous directory

# Check out the latest version without changing directory
#   git -C <path> means: execute git in <path> instead of current directory
git -C etl-tooling fetch origin
git -C etl-tooling checkout origin/main

Keep in mind that submodules are generally in a detached HEAD state, so commands like git pull and git push may not work as expected.
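To check whether the submodule is currently detached, something like the following sketch can be used from the pipeline repository root. The function name `submodule_head_state` is hypothetical; the underlying commands (`git symbolic-ref` and `git rev-parse`) are standard Git.

```shell
# Hypothetical helper: report whether the etl-tooling submodule is on a branch
# or in a detached HEAD state. Run from the root of the pipeline repository.
submodule_head_state() {
    # symbolic-ref succeeds only when HEAD points at a branch
    if branch=$(git -C etl-tooling symbolic-ref --quiet --short HEAD); then
        echo "on branch $branch"
    else
        echo "detached at $(git -C etl-tooling rev-parse --short HEAD)"
    fi
}
```

In the detached case, git pull has no branch to update and git push has no branch to send, which is why those commands behave differently inside a submodule.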

Add etl-tooling submodule to existing pipeline

Adding the etl-tooling submodule immediately clones its contents into the pipeline repository. To record the addition, commit it to the pipeline repository. The commit covers only the reference to the etl-tooling repository, not its contents.

  1. Add the submodule:
    git submodule add git@github.com:CWTSLeiden/CWTS-ETL-tooling.git etl-tooling
  2. Commit the submodule:
    git add etl-tooling
    git commit -m "add etl-tooling version x.x.x"

Clone existing pipeline containing etl-tooling submodule

When a repository is cloned, the submodule reference is cloned but not its contents, which leaves an empty folder where the etl-tooling repository should be. To fetch the contents, the submodule needs to be registered and updated. This checks out the etl-tooling commit to which the reference points.

  1. Clone the pipeline repository:
    git clone git@github.com:CWTSLeiden/$pipeline.git
    cd $pipeline
  2. Register the submodule:
    git submodule init etl-tooling
  3. Clone the submodule:
    git submodule update
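The steps above can also be collapsed, since Git offers a `--recurse-submodules` option on clone and an `--init` option on submodule update. A minimal sketch, with hypothetical wrapper names `clone_pipeline` and `init_submodules`:

```shell
# Hypothetical helper: clone a pipeline and fetch its submodules in one go.
#   $1 = repository URL, $2 = target directory
clone_pipeline() {
    git clone --recurse-submodules "$1" "$2"
}

# Hypothetical helper: for a pipeline that was already cloned without
# submodules, register and update them in a single command.
#   $1 = path to the pipeline repository
init_submodules() {
    git -C "$1" submodule update --init
}
```

For example, `clone_pipeline git@github.com:CWTSLeiden/$pipeline.git $pipeline` should leave the etl-tooling folder populated right away.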

Update the etl-tooling to the pipeline version

When another user pushes a commit to the pipeline repository that moves the etl-tooling reference to a new version, git pull updates only the reference. The submodule contents need to be updated separately.

  1. Update the pipeline repository
    git pull
  2. Update the etl-tooling submodule
    git submodule update
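Whether the second step is still needed can be read from the first character of each line printed by `git submodule status`. The sketch below interprets that character; the function name `submodule_state` and its wording are hypothetical.

```shell
# Hypothetical helper: summarise the state of the etl-tooling submodule.
# Run from the root of the pipeline repository.
submodule_state() {
    # First character of the status line:
    #   '-' submodule not initialized
    #   '+' checked-out commit differs from the commit the pipeline records
    #   'U' submodule has merge conflicts
    #   ' ' checked-out commit matches the recorded commit
    case "$(git submodule status etl-tooling | cut -c1)" in
        "-") echo "not initialized" ;;
        "+") echo "out of sync" ;;
        "U") echo "merge conflict" ;;
        *)   echo "in sync" ;;
    esac
}
```

After a git pull that moved the reference, this would be expected to report "out of sync" until git submodule update has been run.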

Update the etl-tooling to the pipeline version when the etl-tooling remote URL has changed

GitHub Desktop might give an error when checking out a version of the repository in which the etl-tooling remote URL has changed. To fix this, use the git command-line utility in PowerShell or cmd.

  1. Update the pipeline repository
    git fetch origin
    git switch <branch>
  2. Update the etl-tooling submodule
    git submodule sync
    git submodule update
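The reason for the sync step is that the submodule URL is stored in more than one place: in the tracked .gitmodules file, in the local configuration of the pipeline clone, and as the origin remote inside the submodule. The sketch below prints each of them so that a mismatch becomes visible; the function names are hypothetical.

```shell
# Hypothetical helpers: show where the etl-tooling URL is stored.
# Run from the root of the pipeline repository.

# URL as tracked in the repository (changes when you pull or switch branches)
url_in_gitmodules() {
    git config --file .gitmodules --get submodule.etl-tooling.url
}

# URL as registered in your local clone (changes on "git submodule sync")
url_in_local_config() {
    git config --get submodule.etl-tooling.url
}

# URL of the "origin" remote inside the submodule itself
url_in_submodule_remote() {
    git -C etl-tooling config --get remote.origin.url
}
```

If the first value differs from the other two, git submodule sync should bring them back in line.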

Update the etl-tooling submodule of a pipeline to the latest version

To bring the etl-tooling submodule of a pipeline up-to-date, update and commit the submodule.

  1. Update the etl-tooling submodule
    git submodule update --remote etl-tooling
  2. Commit the new etl-tooling submodule version
    git add etl-tooling
    git commit -m "update etl-tooling"
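Before committing, it can help to review which etl-tooling commits the new reference pulls in. A small sketch using Git's `--submodule=log` diff format; the wrapper name `show_submodule_change` is hypothetical.

```shell
# Hypothetical helper: list the etl-tooling commits between the recorded
# reference and the version now checked out in the submodule folder.
# Run from the root of the pipeline repository.
# Any extra arguments are passed on to git diff, e.g. --cached to look at
# a reference that has already been staged.
show_submodule_change() {
    git diff --submodule=log "$@" -- etl-tooling
}
```

Empty output means the reference has not moved, so there is nothing to commit.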

Update the etl-tooling submodule of a pipeline to a specific version

To bring the etl-tooling submodule of a pipeline to a specific version, check out a branch, tag or commit inside the submodule, then commit the new reference in the pipeline repository.

  1. Update the etl-tooling submodule
    git -C etl-tooling fetch origin     # make sure the version is available locally
    git -C etl-tooling checkout vX.X.X  # branch, tag or commit
  2. Commit the new etl-tooling submodule version
    git add etl-tooling
    git commit -m "update etl-tooling to vX.X.X"

Make changes to the etl-tooling repository

To make changes to etl-tooling, it is recommended to work in a separate clone of the etl-tooling repository (not in the etl-tooling submodule of a pipeline repository), commit and push those changes, and then update the submodule in the pipeline repository.

  1. Changes in etl-tooling repository
    # in etl-tooling repository
    git add $file
    git commit -m "update etl-tooling function"
    git push
  2. Update etl-tooling submodule changes in pipeline repository
    # in pipeline repository
    git submodule update --remote etl-tooling
    git add etl-tooling
    git commit -m "update etl-tooling"

Change the remote URL of the etl-tooling repository

To change the upstream URL of the etl-tooling submodule in a pipeline, edit the .gitmodules file in the pipeline repository, synchronize and then update the submodule.

  1. Update the url field in the .gitmodules file in the root of the pipeline repository. If the default branch of the new repository has a different name than that of the original repository, it is necessary to add the branch = directive.
    [submodule "etl-tooling"]
        path = etl-tooling
        url = git@github.com:CWTSLeiden/CWTS-ETL-tooling.git
        branch = main
    
  2. Synchronize the submodule so that it is tracking the new url (and branch).
    git submodule sync etl-tooling
  3. Update the submodule to pull in the new upstream version.
    git submodule update --remote
  4. It may be beneficial to remove branches and tags from the old repository that are still on your machine.
    git -C etl-tooling fetch --prune --prune-tags
    git -C etl-tooling tag -d vX.X.X  # in case of conflicting tags
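As an alternative to editing .gitmodules by hand, recent Git versions (2.25 and later, which is an assumption about your installation) provide `git submodule set-url` and `git submodule set-branch`, which write the same fields. A minimal sketch with a hypothetical wrapper name `retarget_submodule`:

```shell
# Hypothetical helper: point the etl-tooling submodule at a new URL and branch.
# Assumes Git 2.25 or later. Run from the root of the pipeline repository.
#   $1 = new URL, $2 = branch to track
retarget_submodule() {
    new_url=$1
    branch=$2
    # Writes the url field in .gitmodules and synchronizes the submodule
    git submodule set-url etl-tooling "$new_url"
    # Writes the branch field in .gitmodules
    git submodule set-branch --branch "$branch" etl-tooling
}
```

For example, `retarget_submodule git@github.com:CWTSLeiden/CWTS-ETL-tooling.git main` should produce the .gitmodules entry shown in step 1, after which git submodule update --remote pulls in the new upstream version.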

NOT RECOMMENDED! Make changes to the etl-tooling submodule

When working in a pipeline repository, you can edit the files of the submodule as if it were a regular git repository and commit those changes to the etl-tooling repository.

  1. Make changes to etl-tooling submodule and commit
    # in pipeline repository root
    # make edits to etl-tooling/$file
    git -C etl-tooling add $file
    git -C etl-tooling commit -m "update $file"
  2. Push changes to etl-tooling repository. Because the submodule HEAD is detached, push HEAD explicitly to the target branch.
    # in pipeline repository root
    git -C etl-tooling push origin HEAD:main
  3. Update etl-tooling submodule changes in pipeline repository
    # in pipeline repository root
    git add etl-tooling
    git commit -m "update etl-tooling"
  4. Set etl-tooling pointer to latest revision to be safe
    git submodule update --remote etl-tooling