% git-hosting-backup(1) Version 0.1.1 | Backing up Git Hosted Repo
DESCRIPTION
The git-hosting-backup command backs up Git repositories:
- from GitHub
- to an Rclone destination
- as a Git bundle
Example
Backup GitHub Repos to S3
To back up your repositories:
- from GitHub
- to S3
- excluding the repo site-com-datacadamia
you would execute:
docker run \
--name git-hosting-backup \
--rm \
--user 1000:1000 \
-v ~/.ssh:/home/me/.ssh \
-e GITURE_GITHUB_TOKEN=$GITHUB_TOKEN \
-e GITURE_S3_PLATFORM=rclone \
-e GITURE_S3_RCLONE_BASE_PATH=git-backup \
-e RCLONE_CONFIG_S3_TYPE=s3 \
-e RCLONE_CONFIG_S3_PROVIDER=IDrive \
-e RCLONE_CONFIG_S3_ENDPOINT=h0k0.ca.idrivee2-22.com \
-e RCLONE_CONFIG_S3_SECRET_ACCESS_KEY=$GIT_BACKUP_SECRET_KEY \
-e RCLONE_CONFIG_S3_ACCESS_KEY_ID=$GIT_BACKUP_ACCESS_KEY \
-e RCLONE_CONFIG_S3_NO_CHECK_BUCKET=true \
-e RCLONE_CONFIG_S3_SERVER_SIDE_ENCRYPTION=aws:kms \
ghcr.io/gerardnico/giture:latest \
git-hosting-backup github s3 --filter-exclude-pattern=site-com-datacadamia
Backup GitHub Repos to SFTP Bunny
To back up your repositories:
- from GitHub
- to Bunny with Rclone SFTP
- excluding the repo site-com-datacadamia
you would execute:
docker run \
--name git-backup \
--rm \
--user 1000:1000 \
-v ~/.ssh:/home/me/.ssh \
-e GITURE_GITHUB_TOKEN=$GITHUB_TOKEN \
-e GITURE_BUNNY_PLATFORM=rclone \
-e RCLONE_INPLACE=1 \
-e RCLONE_SIZE_ONLY=1 \
-e RCLONE_CONFIG_BUNNY_TYPE=sftp \
-e RCLONE_CONFIG_BUNNY_HOST=storage.bunnycdn.com \
-e RCLONE_CONFIG_BUNNY_ENDPOINT=h0k0.ca.idrivee2-22.com \
-e RCLONE_CONFIG_BUNNY_USER=git-backup \
-e RCLONE_CONFIG_BUNNY_PASS=$GIT_BACKUP_BUNNY_PASS \
ghcr.io/gerardnico/giture:latest \
git-hosting-backup github bunny --filter-exclude-pattern=site-com-datacadamia
Note that:
- RCLONE_INPLACE=1 is needed because Bunny does not support renaming, which leads to errors such as: partial file rename failed: Move Rename failed: sftp: "Internal server error." (SSH_FX_FAILURE)
- RCLONE_SIZE_ONLY=1 is needed because Bunny does not support modification time updates.
Example Explanation
The command executed is:
git-hosting-backup github s3 --filter-exclude-pattern=site-com-datacadamia
where:
- git-hosting-backup is the command
- github is the service defined by the following GITURE_SERVICE_NAME_xxx envs:
GITURE_GITHUB_PLATFORM=github # platform type (optional as it defaults to the name)
GITURE_GITHUB_TOKEN=$GITHUB_TOKEN # API Token
- s3 is the target defined by the following GITURE_PLATFORM_NAME_xxx envs:
GITURE_S3_PLATFORM=rclone # rclone
GITURE_S3_RCLONE_REMOTE_NAME=s3 # optional remote name; defaults to the target name (only letters and _ because it becomes part of an env name)
GITURE_S3_RCLONE_BASE_PATH=git-backup # the base path (in our s3 case, the bucket name)
- --filter-exclude-pattern=xxx is a regexp pattern: a repository is excluded from the backup when the pattern matches its full name (workspace/name)
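To illustrate how such an exclusion pattern behaves, here is a minimal sketch. It assumes the pattern is applied as an extended regular expression to the full name; the helper `is_excluded` and the `my-org` workspace are hypothetical, and the actual script may match differently.

```shell
# Hypothetical helper: succeeds (exit code 0) when the full repository
# name (workspace/name) matches the exclude pattern.
is_excluded() {
  local full_name="$1" pattern="$2"
  printf '%s\n' "$full_name" | grep -Eq -- "$pattern"
}

# Sample repository names, made up for illustration
for repo in my-org/site-com-datacadamia my-org/other-repo; do
  if is_excluded "$repo" 'site-com-datacadamia'; then
    echo "skip   $repo"
  else
    echo "backup $repo"
  fi
done
```

Because the match is not anchored in this sketch, a pattern such as `site-com` would also exclude every repository whose full name contains that text.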
The rclone remote is configured
via the native rclone environment variables,
ie RCLONE_CONFIG_REMOTE_NAME_XXX
# in our case the remote name is S3, the default for the s3 target (same as setting `GITURE_S3_RCLONE_REMOTE_NAME=s3`)
RCLONE_CONFIG_S3_TYPE=s3
RCLONE_CONFIG_S3_PROVIDER=IDrive
RCLONE_CONFIG_S3_ENDPOINT=h0k0.ca.idrivee2-22.com
RCLONE_CONFIG_S3_SECRET_ACCESS_KEY=$GIT_BACKUP_SECRET_KEY
RCLONE_CONFIG_S3_ACCESS_KEY_ID=$GIT_BACKUP_ACCESS_KEY
RCLONE_CONFIG_S3_NO_CHECK_BUCKET=true
RCLONE_CONFIG_S3_SERVER_SIDE_ENCRYPTION=aws:kms
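The naming rule behind these variables is rclone's own convention: `RCLONE_CONFIG_` followed by the remote name and the option name, in upper case. The helper below is a hypothetical sketch for illustration only and is not part of the script.

```shell
# Hypothetical helper: print the rclone environment variable name
# for a remote name and a backend option.
rclone_env_name() {
  local remote="$1" option="$2"
  printf 'RCLONE_CONFIG_%s_%s\n' "$remote" "$option" | tr '[:lower:]' '[:upper:]'
}

rclone_env_name s3 type      # prints RCLONE_CONFIG_S3_TYPE
rclone_env_name bunny host   # prints RCLONE_CONFIG_BUNNY_HOST
```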
- The options below run the container as your user and mount your SSH directory for GitHub authentication
--user 1000:1000 \
-v ~/.ssh:/home/me/.ssh
Prerequisites
- SSH Authentication:
  - An SSH private key in your ~/.ssh directory (ie generate a key if you do not have one)
- A GitHub API Token, known as Personal Access Token or PAT
  - with the scope repo for public and private repos
- An Rclone destination
- The dependencies
SYNOPSIS
git-hosting-backup source target [...]
where:
- source - a git hosting service name to read from
- target - a target name to backup to
- --output - the statistics output format (json or prometheus). Defaults to json
- --restart - if a backup fails, it can be restarted with this flag
- --filter-exclude-pattern=xxx - a regexp pattern applied to the repository full name (ie parent/name); matching repositories are excluded
- --filter-max-repo-count='x' - the maximum number of repositories to process
This script returns the following statistics:
- total_repo_count is the number of repositories processed (up to the --filter-max-repo-count option)
- dumped_repo_count is the number of repositories bundled (ie dumped)
- unchanged_repo_count is the number of repositories skipped because of no changes since the last dump
- pattern_repo_count is the number of repositories skipped due to pattern matching
- empty_repo_count is the number of repositories skipped due to being empty
- fork_repo_count is the number of repositories skipped due to being a fork
Note:
total = dumped + unchanged + pattern + empty + fork
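The relation above can be used as a sanity check on a run. The sketch below uses made-up sample values and a hypothetical helper `stats_are_consistent`; in practice the values would come from the statistics output.

```shell
# Hypothetical helper: succeeds when total equals the sum of the other counters.
# Arguments: total dumped unchanged pattern empty fork
stats_are_consistent() {
  [ "$1" -eq $(( $2 + $3 + $4 + $5 + $6 )) ]
}

# Sample values, made up for illustration
total_repo_count=12
dumped_repo_count=5
unchanged_repo_count=4
pattern_repo_count=1
empty_repo_count=1
fork_repo_count=1

if stats_are_consistent "$total_repo_count" "$dumped_repo_count" \
  "$unchanged_repo_count" "$pattern_repo_count" \
  "$empty_repo_count" "$fork_repo_count"; then
  echo "statistics are consistent"
else
  echo "statistics are inconsistent" >&2
fi
```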
Tip: You can process the JSON output further with jq
git-hosting-backup source target | jq -r '.total_repo_count, .dumped_repo_count'
How to restore
A bundle can be cloned.
git clone /path/to/repo.bundle
# or
git clone https://host/path/to/repo.bundle
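To try a restore end to end without touching a real backup, the following sketch builds a throwaway repository, bundles it, and clones the bundle back. The `work_dir` layout, file names, and the `demo` identity are made up for illustration; only standard Git commands are used.

```shell
# Temporary working area (hypothetical layout)
work_dir=$(mktemp -d)

# Build a throwaway repository with one commit, standing in for a hosted repo
git init --quiet "$work_dir/source"
echo "hello" > "$work_dir/source/README"
git -C "$work_dir/source" add README
git -C "$work_dir/source" -c user.name=demo -c user.email=demo@example.com \
  commit --quiet -m "initial commit"

# Create the bundle with all refs, as the backup does
git -C "$work_dir/source" bundle create "$work_dir/repo.bundle" --all

# Restore: a bundle is cloned like any other repository
git clone --quiet "$work_dir/repo.bundle" "$work_dir/restored"

# Check the bundle from inside the restored clone
git -C "$work_dir/restored" bundle verify "$work_dir/repo.bundle"

# Remove "$work_dir" once you are done inspecting the result
```

After the clone, `origin` points to the bundle file, so you may want to set the remote URL back to the hosting service before pushing.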
Backup processing explained
The backup processing implemented in the backup function of the git-hosting script is:
- Store the start time and get the last backup time
- Get the repos via API and loop over them
  - Skip the backup if:
    - the last pushed time of the repo is earlier than the last backup (and a backup exists)
    - the repository is empty
    - the repository is a fork
  - Otherwise, back up with the following commands:
# git clone a mirror repository locally
git clone --mirror REPO_SSH_URL CLONE_TARGET_DIR
# create a bundle
git bundle create BUNDLE_SOURCE_PATH --all
# upload the bundle to `workspace/repository_name`
rclone moveto BUNDLE_SOURCE_PATH BUNDLE_TARGET_PATH --progress
- Repeat for another repo
- Delete the start time
- Write the last backup time with the value of the start time
- End
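The skip rules in the steps above can be sketched as a small decision function. This is an illustration under stated assumptions, not the script's actual code: the helper `backup_decision`, its argument format (epoch seconds and true/false strings), and the order of the checks are all hypothetical.

```shell
# Hypothetical helper: print the reason a repository is skipped, or "backup".
# Arguments:
#   pushed_at    last pushed time of the repo, in epoch seconds
#   last_backup  last backup time, in epoch seconds (empty when no backup exists)
#   is_empty     true or false
#   is_fork      true or false
backup_decision() {
  local pushed_at="$1" last_backup="$2" is_empty="$3" is_fork="$4"
  if [ -n "$last_backup" ] && [ "$pushed_at" -lt "$last_backup" ]; then
    echo "skip: unchanged"
  elif [ "$is_empty" = "true" ]; then
    echo "skip: empty"
  elif [ "$is_fork" = "true" ]; then
    echo "skip: fork"
  else
    echo "backup"
  fi
}

backup_decision 1700000000 1700003600 false false   # prints skip: unchanged
backup_decision 1700007200 1700003600 false false   # prints backup
backup_decision 1700007200 ""         false true    # prints skip: fork
```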
Tip: How to sync between 2 git registries
The Gickup application is more suited for that.
Why choose SSH over a Personal Access Token for GitHub?
That's the easiest way to log in.
Note that AskPass or a credential helper may be used to pass the token, as stated in the doc, but this is not yet implemented.
The Personal Access Token (PAT) should not be used in a Basic Auth URL, because Git stores this URL, token included, in the repository configuration:
https://user:$TOKEN@github.com/parent/repo
Kubernetes
In the command property of a container, you should use the entrypoint giture-docker-entrypoint
to create the known_hosts file with the GitHub SSH host keys and avoid the error: Host key verification failed
Example:
command: [ "giture-docker-entrypoint" ]
args: [ "git-hosting-backup", "github", "s3", "--filter-exclude-pattern=site-com-datacadamia", "--restart" ]
Dependencies
The following dependencies are used:
- date from coreutils (mandatory)
- git
- openssh
- curl