How to move files from one git repo to another (not clone) and keep history

Keywords: git Java svn xml

Our Git repositories started out as parts of a single monster SVN repository, where each project had its own tree, like so:

project1/branches
        /tags
        /trunk
project2/branches
        /tags
        /trunk

Obviously, it was pretty easy to move files from one project to another with svn mv. But in Git, each project is in its own repository, and today I was asked to move a subdirectory from project2 to project1. I did something like this:

$ git clone project2 
$ cd project2
$ git filter-branch --subdirectory-filter deeply/buried/java/source/directory/A -- --all
$ git remote rm origin  # so I don't accidentally overwrite the repo ;-)
$ mkdir -p deeply/buried/different/java/source/directory/B
$ for f in *.java; do
>  git mv "$f" deeply/buried/different/java/source/directory/B
>  done
$ git commit -m "moved files to new subdirectory"
$ cd ..
$
$ git clone project1
$ cd project1
$ git remote add p2 ../project2
$ git fetch p2
$ git branch p2 remotes/p2/master
$ git merge p2 # add --allow-unrelated-histories on Git 2.9 or later
$ git remote rm p2
$ git push

But that seems pretty convoluted. Is there a better way to do this sort of thing in general? Or have I adopted the right approach?

Note that this involves merging the history into an existing repository, rather than simply creating a new stand-alone repository from part of another one (as in a previous question).

Answer #1

If your history is sane, you can take the commits out as patches and apply them in the new repository:

cd repository
git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > patch
cd ../another_repository
git am < ../repository/patch 

Or in one line:

git log --pretty=email --patch-with-stat --reverse -- path/to/file_or_folder | (cd /path/to/new_repository && git am)

(Taken from the Exherbo documentation.)
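The patch route above can be tried end to end on two throwaway repositories. The following is only a sketch under stated assumptions: the repository names src and dst, the lib/ directory, the file names and the demo identity are all made up for illustration, and everything lives in a temporary directory.

```shell
set -e
work=$(mktemp -d)
# Hypothetical identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Source repository: two commits touch lib/, one touches an unrelated file.
git init -q "$work/src"
cd "$work/src"
mkdir lib
echo one > lib/a.txt;  git add .; git commit -q -m "add lib/a.txt"
echo other > readme;   git add .; git commit -q -m "add readme"
echo two >> lib/a.txt; git add .; git commit -q -m "extend lib/a.txt"

# Export only the commits that touch lib/, oldest first, as mbox patches.
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- lib > "$work/lib.patch"

# Target repository with its own, unrelated history.
git init -q "$work/dst"
cd "$work/dst"
echo target > main.txt; git add .; git commit -q -m "target start"

# Replay the exported commits on top of the target history.
git am -q < "$work/lib.patch"

moved=$(git rev-list --count HEAD -- lib)
echo "commits touching lib/ in target: $moved"
```

Only the commits selected by the path filter travel across, so the readme commit never reaches the target. The replayed commits get new hashes, because they now sit on top of a different parent.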

Answer #2

Yes, hitting on the --subdirectory-filter of filter-branch was key. The fact that you used it essentially proves there is no easier way: you had no choice but to rewrite history, since you wanted to end up with only a (renamed) subset of the files, and that by definition changes the hashes. Since none of the standard commands (e.g. pull) rewrite history, there is no way you could use them to accomplish this.

You could refine the details, of course (some of your cloning and branching wasn't strictly necessary), but the overall approach is good! It's a shame it's complicated, but then the point of git isn't to make it easy to rewrite history.
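The point about hashes can be checked on a scratch repository. This is a minimal sketch, assuming a made-up layout (deep/A/X.java plus a README) and a demo identity. FILTER_BRANCH_SQUELCH_WARNING only skips the warning pause that newer Git versions print before filter-branch runs.

```shell
set -e
work=$(mktemp -d)
# Hypothetical identity so the commit works without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

git init -q "$work/repo"
cd "$work/repo"
mkdir -p deep/A
echo 'class X {}' > deep/A/X.java
echo 'readme' > README
git add .; git commit -q -m "add X.java and README"

before=$(git rev-parse HEAD)

# Keep only deep/A and make it the new repository root.
git filter-branch --subdirectory-filter deep/A -- --all >/dev/null 2>&1

after=$(git rev-parse HEAD)
files=$(git ls-files)

echo "before: $before"
echo "after:  $after"
echo "files:  $files"
```

The commit message, author and dates are the same before and after, but the tree is different, so the commit gets a new hash.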

Answer #3

Having tried various approaches to move a file or folder from one Git repository to another, the only one which seems to work reliably is outlined below.

It involves cloning the repository from which you want to move a file or folder, moving the file or folder to the root, rewriting Git history, cloning the target repository, and pulling files or folders with history directly into the target repository.

Stage one

  1. Make a copy of repository A. The following steps make major changes to this copy, which you should not push!

    git clone --branch <branch> --origin origin --progress \
      -v <git repository A url>
    # eg. git clone --branch master --origin origin --progress \
    #   -v https://username@giturl/scm/projects/myprojects.git
    # (assuming myprojects is the repository you want to copy from)
  2. cd into it

    cd <git repository A directory> # eg. cd /c/Working/GIT/myprojects
  3. Remove the link to the original repository to avoid any unexpected remote changes (for example, by push)

    git remote rm origin
  4. Go through your history and files, removing anything that is not in directory 1. The result is the contents of directory 1 moved up into the root of repository A.

    git filter-branch --subdirectory-filter <directory> -- --all # eg. git filter-branch --subdirectory-filter subfolder1/subfolder2/FOLDER_TO_KEEP -- --all
  5. For a single file move only: go through what's left and remove everything except the desired file. (You may need to delete unwanted files with the same name and commit.)

    git filter-branch -f --index-filter \
    'git ls-files -s | grep $'\t'FILE_TO_KEEP$ |
    GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
    git update-index --index-info && \
    mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE || echo "Nothing to do"' --prune-empty -- --all
    # eg. FILE_TO_KEEP = pom.xml to keep only the pom.xml file from FOLDER_TO_KEEP

Stage two

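  1. Cleanup step: reset the working tree to the rewritten HEAD

    git reset --hard
  2. Cleanup step: repack the repository and collect garbage

    git gc --aggressive
  3. Cleanup step: prune unreachable objects

    git prune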

You may want to import these files into repository B inside a directory rather than at the root:

  1. Create the directory

    mkdir <base directory> # eg. mkdir FOLDER_TO_KEEP
  2. Move files to this directory

    git mv -k * <base directory> # eg. git mv -k * FOLDER_TO_KEEP (-k skips moves that would fail, such as moving the directory into itself)
  3. Add files to this directory

    git add .
  4. Commit your changes, and we're ready to merge these files into the new repository

    git commit

Stage three

  1. Make a copy of repository B if you don't have one already

    git clone <git repository B url> # eg. git clone https://username@giturl/scm/projects/FOLDER_TO_KEEP.git

    (assuming FOLDER_TO_KEEP is the name of the new repository you are copying to)

  2. cd into it

    cd <git repository B directory> # eg. cd /c/Working/GIT/FOLDER_TO_KEEP
  3. Create a remote connection to repository A as a branch in repository B

    git remote add repo-A-branch <git repository A directory>
    # (repo-A-branch can be anything - it's just an arbitrary name)
    # eg. git remote add repo-A-branch /c/Working/GIT/myprojects
  4. Pull from this branch (which contains only the directory you want to move) into repository B.

    git pull repo-A-branch master --allow-unrelated-histories

    The pull copies both files and history. Note: you can use a merge instead of a pull, but pull works better.

  5. Finally, you may want to do some cleanup by removing the remote connection to repository A

    git remote rm repo-A-branch
  6. Push and you're all set.

    git push
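Stages two and three can be rehearsed on scratch repositories before touching real ones. The sketch below makes several assumptions: repoA stands in for the already filtered copy of repository A, pom.xml is its only file, and the identity is a demo one. It also passes --no-rebase and --no-edit so the pull does not stop to ask how to reconcile the branches or open an editor. Those two flags are additions for scripting and are not part of the steps above.

```shell
set -e
work=$(mktemp -d)
# Hypothetical identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the filtered copy of repository A: files sit at the root.
git init -q "$work/repoA"
cd "$work/repoA"
echo '<project/>' > pom.xml; git add .; git commit -q -m "add pom.xml"
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# Stage two: tuck the files under a folder so they do not land in B's root.
mkdir FOLDER_TO_KEEP
git mv pom.xml FOLDER_TO_KEEP/
git commit -q -m "move files under FOLDER_TO_KEEP"

# Repository B with its own history.
git init -q "$work/repoB"
cd "$work/repoB"
echo b > b.txt; git add .; git commit -q -m "start of B"

# Stage three: pull A's branch into B.
git remote add repo-A-branch "$work/repoA"
if git pull -q --no-rebase --no-edit repo-A-branch "$branch" 2>/dev/null; then
    refused=no
else
    refused=yes   # Git 2.9+ refuses to merge unrelated histories by default
fi
git pull -q --no-rebase --no-edit --allow-unrelated-histories \
    repo-A-branch "$branch"
git remote rm repo-A-branch

# History of the moved file, followed across the rename in repository A.
git log --oneline --follow -- FOLDER_TO_KEEP/pom.xml
```

The first pull is there only to show why --allow-unrelated-histories is needed. In a real run you would go straight to the second one.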

Answer #4

Keeping the directory name

The subdirectory-filter (or the shorter command git subtree) works well, but it did not work for me, since it removes the directory name from the commit info. In my scenario I just want to merge part of one repository into another and retain the history with the full path name.

My solution was to use the tree-filter and simply remove the unwanted files and directories from a temporary clone of the source repository, then pull from that clone into my target repository in a few simple steps.

# 1. clone the source
git clone ssh://<user>@<source-repo url>
cd <source-repo>
# 2. remove the stuff we want to exclude
git filter-branch --tree-filter "rm -rf <files to exclude>" --prune-empty HEAD
# 3. move to target repo and create a merge branch (for safety)
cd <path to target-repo>
git checkout -b <merge branch>
# 4. Add the source-repo as remote 
git remote add source-repo <path to source-repo>
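# 5. pull it in (Git 2.9 or later also needs --allow-unrelated-histories when the two histories share no commits)
git pull source-repo master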
# 6. check that you got it right (better safe than sorry, right?)
gitk
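The difference from the subdirectory-filter is easy to see on a scratch repository: after the tree-filter, the surviving files still carry their full path. This is only a sketch, and the directory names keep/ and drop/ and the demo identity are invented for the example.

```shell
set -e
work=$(mktemp -d)
# Hypothetical identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

git init -q "$work/src"
cd "$work/src"
mkdir -p keep/sub drop
echo k > keep/sub/k.txt; git add .; git commit -q -m "add keep/sub/k.txt"
echo d > drop/d.txt;     git add .; git commit -q -m "add drop/d.txt"

# Remove drop/ from every commit; commits left empty by that are pruned.
git filter-branch --tree-filter "rm -rf drop" --prune-empty HEAD >/dev/null 2>&1

# Every path that still appears anywhere in the rewritten history.
paths=$(git log --pretty=format: --name-only | sort -u | sed '/^$/d')
count=$(git rev-list --count HEAD)
echo "paths in history: $paths"
echo "commits left:     $count"
```

The tree-filter checks out every commit, so it is slow on large histories. That is the price for being able to use plain rm in the filter.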

Answer #5

This answer provides interesting commands based on git am, presented step by step using examples.

Objective

  • You want to move some or all of your files from one repository to another.
  • You want to keep their history.
  • But you don't care about retaining tags and branches.
  • You are OK with limited history for renamed files (and files in renamed directories).

Procedure

  1. Extract history in email format using
    git log --pretty=email -p --reverse --full-index --binary
  2. Reorganize the file tree and update the file names in the history [optional]
  3. Apply the new history using git am

1. Extract history in email format

Example: extract the history of file3, file4 and file5

my_repo
├── dirA
│   ├── file1
│   └── file2
├── dirB            ^
│   ├── subdir      | To be moved
│   │   ├── file3   | with history
│   │   └── file4   | 
│   └── file5       v
└── dirC
    ├── file6
    └── file7

Clean the temporary destination directory

export historydir=/tmp/mail/dir  # Absolute path
rm -rf "$historydir"             # Caution when cleaning

Clean your source repo

git commit ...           # Commit your working files
rm .gitignore            # Disable gitignore
git clean -n             # Simulate removal
git clean -f             # Remove untracked file
git checkout .gitignore  # Restore gitignore

Extract the history of each file in email format

cd my_repo/dirB
find -name .git -prune -o -type d -o -exec bash -c 'mkdir -p "$historydir/${0%/*}" && git log --pretty=email -p --stat --reverse --full-index --binary -- "$0" > "$historydir/$0"' {} ';'

Unfortunately, the options --follow and --find-copies-harder cannot be combined with --reverse. This is why the history is cut when a file (or a parent directory) is renamed.

After: temporary history in email format

/tmp/mail/dir
    ├── subdir
    │   ├── file3
    │   └── file4
    └── file5

2. Reorganize the file tree and update the file name changes in the history [optional]

Suppose you want to move these three files into another repo (which can be the same repo).

my_other_repo
├── dirF
│   ├── file55
│   └── file56
├── dirB              # New tree
│   ├── dirB1         # was subdir
│   │   ├── file33    # was file3
│   │   └── file44    # was file4
│   └── dirB2         # new dir
│        └── file5    # = file5
└── dirH
    └── file77

So reorganize your files:

cd /tmp/mail/dir
mkdir     dirB
mv subdir dirB/dirB1
mv dirB/dirB1/file3 dirB/dirB1/file33
mv dirB/dirB1/file4 dirB/dirB1/file44
mkdir    dirB/dirB2
mv file5 dirB/dirB2

Your temporary history is now:

/tmp/mail/dir
    └── dirB
        ├── dirB1
        │   ├── file33
        │   └── file44
        └── dirB2
             └── file5

Also change the file name in the history:

cd "$historydir"
find * -type f -exec bash -c 'sed "/^diff --git a\|^--- a\|^+++ b/s:\( [ab]\)/[^ ]*:\1/$0:g" -i "$0"' {} ';'

Note: This rewrites the history to reflect the change of path and filename
(i.e. the new location / name within the new repo).
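What the sed expression does to a patch is easier to see on a tiny sample. The sketch below is a variation rather than the exact command above: it uses sed -E and writes to standard output instead of editing in place, and the header fragment and the new_path value are made up for illustration.

```shell
set -e
new_path=dirB/dirB1/file33   # hypothetical new location of the file

# Header lines as they appear in a patch written by git log --pretty=email -p.
patch_header='diff --git a/subdir/file3 b/subdir/file3
--- a/subdir/file3
+++ b/subdir/file3'

# On header lines only, replace the path after "a/" or "b/" with the new one.
rewritten=$(printf '%s\n' "$patch_header" |
    sed -E "/^(diff --git a|--- a|\+\+\+ b)/s:( [ab])/[^ ]*:\1/$new_path:g")

printf '%s\n' "$rewritten"
# prints:
# diff --git a/dirB/dirB1/file33 b/dirB/dirB1/file33
# --- a/dirB/dirB1/file33
# +++ b/dirB/dirB1/file33
```

Lines such as "--- /dev/null" in the patch that creates the file are left alone, because they do not start with "--- a". Paths containing spaces or colons would need a more careful expression.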

3. Apply new history

Your other repo is:

my_other_repo
├── dirF
│   ├── file55
│   └── file56
└── dirH
    └── file77

Apply the commits from the temporary history files:

cd my_other_repo
find "$historydir" -type f -exec cat {} + | git am 

Your other repo is now:

my_other_repo
├── dirF
│   ├── file55
│   └── file56
├── dirB            ^
│   ├── dirB1       | New files
│   │   ├── file33  | with
│   │   └── file44  | history
│   └── dirB2       | kept
│        └── file5  v
└── dirH
    └── file77

Use git status to see the number of commits ready to be pushed :-)

Note: because the history has been rewritten to reflect the path and filename changes
(i.e. compared to the location / name within the previous repo):

  • You do not need git mv to change the location / filename.
  • You do not need git log --follow to access the complete history.

Additional tip: detect renamed / moved files within your repo

List renamed files:

find -name .git -prune -o -exec git log --pretty=tformat:'' --numstat --follow {} ';' | grep '=>'

More customizations: you can complete the git log command with the options --find-copies-harder or --reverse. You can also remove the first two columns using cut -f3- and grep for the complete pattern '{.* => .*}'.

find -name .git -prune -o -exec git log --pretty=tformat:'' --numstat --follow --find-copies-harder --reverse {} ';' | cut -f3- | grep '{.* => .*}'
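To see what the rename listing looks like, here is a small sketch on a scratch repository with a single rename. The file names old.txt and new.txt and the demo identity are invented for the example.

```shell
set -e
work=$(mktemp -d)
# Hypothetical identity so the commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
printf 'line 1\nline 2\n' > old.txt; git add .; git commit -q -m "add old.txt"
git mv old.txt new.txt;              git commit -q -m "rename to new.txt"

# --follow tracks the file across the rename; --numstat marks it with "=>".
renames=$(git log --pretty=tformat: --numstat --follow -- new.txt | grep '=>')
echo "$renames"
```

The first two columns of each matching line are the added and deleted line counts, which is why the command above drops them with cut -f3-.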

Posted by gray_bale on Sun, 26 Jan 2020 22:55:00 -0800