Please describe the problem.
I copied one entire folder from my OSX local machine to a linux server. That folder was version controlled by git with git annex big files. On the linux machine the symlinks were broken.
In particular, on my local machine, a file like:
file.vcf.gz.tbi -> ../../.git/annex/objects/J4/Pg/SHA256E-s572463--85b357849ddad75fc1138b27d6af62cf410876e329ff035f21a631bd53146224.gz.tbi/SHA256E-s572463--85b357849ddad75fc1138b27d6af62cf410876e329ff035f21a631bd53146224.gz.tbi
but the file resides in:
../../.git/annex/objects/j4/Pg/SHA256E-s572463--85b357849ddad75fc1138b27d6af62cf410876e329ff035f21a631bd53146224.gz.tbi/SHA256E-s572463--85b357849ddad75fc1138b27d6af62cf410876e329ff035f21a631bd53146224.gz.tbi
Notice the difference between J4 and j4. This is not a problem on my OSX but becomes one on a linux machine.
What steps will reproduce the problem?
on local machine:
rsync -aztv folder/ remote-machine:folder/
on server:
find . -type l -exec sh -c "file -b {} | grep -q ^broken" \; -print
listed every symlink as broken.
What version of git-annex are you using? On what operating system?
git annex version 6.20160318
my special remote is rsync.net
OSX 10.11.4
Not sure which options were used when creating the file system, but I suspect case-insensitive HFS+.
Please provide any additional information below.
I wanted to do this full copy because the transfer speed from the special remote was too slow.
Since then I got decent speeds, and if I do git annex get .
on the server, all is fine.
So, my "problem" is solved, but I'm still wondering why the folder capitalization differs.
Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Sure, it works marvels. Also, what I was trying to do is perhaps not by the book...
So the directory structure got lower cased when you copied it from OSX to linux. OSX remembered a lower-case name for the J4 directory, for example, and propagated that over to linux.
Unfortunately, git-annex is stuck using mixed case hash directories for backwards compatibility reasons. Changing to all lower-case hash directories would need every git-annex repo to be converted, and would invalidate all old tags, branches and history in the repos, etc. This is discussed in new repo versions.
It's actually possible to make brand-new git-annex repos use all lower case hash directories today, by setting
git config annex.tune.objecthashlower true
before you run
git annex init
for the first time.
If you know you will need to move a repository between case-insensitive and case-sensitive filesystems, you could use that configuration. But that would be very forward looking, and instead users are just going to stumble over the mixed case directories from time to time.
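As a rough sketch of the ordering described above (tuning first, then the first init), something like the following could be used for a brand-new repository. The function name init_lowercase_annex and the directory argument are made up for illustration; only the git config line and git annex init come from this thread.

```shell
# Sketch: create a new repository that uses lower-case hash directories.
# init_lowercase_annex is a hypothetical helper name, not a git-annex command.
init_lowercase_annex() {
    repo_dir="$1"
    git init "$repo_dir" &&
    (
        cd "$repo_dir" &&
        # The tuning has to be in place before the first "git annex init".
        git config annex.tune.objecthashlower true &&
        git annex init
    )
}

# Usage (the directory name is only an example):
#   init_lowercase_annex ~/annex-lower
```

The subshell keeps the cd from changing the caller's working directory.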
What I'd recommend you do is, move the repository back to OSX, and then make a clone of it on the linux system, and use
git annex move --all --from origin
to move all the annexed file contents over from OSX to linux. This method avoids the problem entirely.
Alternatively, create a case-sensitive FS on the Mac (just in a .dmg) and do the clone and copy to there, before rsync'ing to the remote Linux.
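The clone-and-move route recommended above might look roughly like this when run on the linux side. The helper name clone_and_move and both arguments are assumptions for illustration; the move command itself is the one quoted in the thread.

```shell
# Sketch: clone the repository from the Mac, then pull the annexed content
# through git-annex so it creates the object directories itself.
# clone_and_move is a hypothetical helper name.
clone_and_move() {
    origin_url="$1"   # e.g. an ssh URL pointing at the repository on the Mac
    dest_dir="$2"     # where the clone should live on the linux machine
    git clone "$origin_url" "$dest_dir" &&
    (
        cd "$dest_dir" &&
        git annex init &&
        # Moves content: once transferred, it is dropped from origin.
        git annex move --all --from origin
    )
}

# Usage (host and paths are examples only):
#   clone_and_move mac.example:folder ~/folder
```

Because git-annex writes each object into the hash directory it computes, the case of the directory names on the Mac no longer matters.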
Am I interpreting this correctly, that this sets some attribute in the git-annex branch, so that clones of this repo will use the same annex layout?
git config --global annex.tune.objecthashlower true
and it will Just Work, managing old repos correctly and using the feature for new repos? And remote machines will handle things correctly just as long as they run a modern enough git-annex?
You should not set annex.tune.* globally. Only set it when initializing a new git-annex repo for the first time. Clones of the repo will automatically inherit the tunings.
See tuning.
annex init
time? What will the bad consequences be if it is set at other times?
I also ran into this problem when copying a folder from Mac to Linux. Because the HFS+ filesystem is case-insensitive by default, directories that should be separate are instead collapsed into one. For example, if there are four different files that are supposed to go into folders FG, fG, Fg, and fg, they will all end up in just one folder (whichever of the four was created first). But the symlinks will still point to different cases, so when everything gets moved to a case-sensitive filesystem, the symlinks break.
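To see whether a copied repository is affected, a sketch along these lines can list each broken symlink next to the place where its content actually landed. It assumes the usual .git/annex/objects layout and that key names contain no glob characters. find_misfiled is a made-up name.

```shell
# Sketch: print "broken-symlink <TAB> actual-object-path" for every annexed
# file whose content exists but sits under a differently-cased directory.
# find_misfiled is a hypothetical helper name.
find_misfiled() {
    repo="$1"
    # "test -e" follows symlinks, so it fails exactly for the broken ones.
    find "$repo" -path "$repo/.git" -prune -o -type l ! -exec test -e {} \; -print |
    while IFS= read -r link; do
        # The last component of the link target is the annex key.
        key=$(basename "$(readlink "$link")")
        actual=$(find "$repo/.git/annex/objects" -type f -name "$key" | head -n 1)
        if [ -n "$actual" ]; then
            printf '%s\t%s\n' "$link" "$actual"
        fi
    done
}

# Usage, from the top of the copied repository:
#   find_misfiled .
```

Symlinks whose content is absent altogether are not printed, so the output only covers files that were misfiled rather than missing.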
I did a quick fix on the Linux side by making symlinks within .git/annex/objects to account for all the case variations (if folder FG exists and has more than one sub-folder, create symlinks fG --> FG, Fg --> FG, fg --> FG).
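A shell sketch of that idea follows. It is not the script from this comment. Unlike the description above, it links every missing case variant of each two-character directory unconditionally, at both hash levels. All function names are made up, and whether git-annex is happy with symlinked hash directories in the long run is not something this sketch checks.

```shell
# Print every upper/lower-case spelling of a two-character directory name.
case_variants() {
    cv_first=${1%?}    # first character
    cv_second=${1#?}   # second character
    for cv_a in "$(printf %s "$cv_first" | tr '[:upper:]' '[:lower:]')" \
                "$(printf %s "$cv_first" | tr '[:lower:]' '[:upper:]')"; do
        for cv_b in "$(printf %s "$cv_second" | tr '[:upper:]' '[:lower:]')" \
                    "$(printf %s "$cv_second" | tr '[:lower:]' '[:upper:]')"; do
            printf '%s\n' "$cv_a$cv_b"
        done
    done | sort -u     # digits have no case, so drop the duplicates
}

# Inside one directory, symlink the missing case variants of each real
# two-character subdirectory to that subdirectory.
link_case_variants() {
    lcv_dir="$1"
    for lcv_path in "$lcv_dir"/*; do
        # Skip non-directories and links created on an earlier run.
        if [ ! -d "$lcv_path" ] || [ -L "$lcv_path" ]; then continue; fi
        lcv_name=$(basename "$lcv_path")
        if [ "${#lcv_name}" -ne 2 ]; then continue; fi
        case_variants "$lcv_name" | while IFS= read -r lcv_variant; do
            if [ ! -e "$lcv_dir/$lcv_variant" ]; then
                ln -s "$lcv_name" "$lcv_dir/$lcv_variant"
            fi
        done
    done
}

# Apply the fix to the second hash level first, then to the top level.
fix_annex_objects() {
    fao_objects="$1"
    for fao_top in "$fao_objects"/*; do
        if [ -d "$fao_top" ] && [ ! -L "$fao_top" ]; then
            link_case_variants "$fao_top"
        fi
    done
    link_case_variants "$fao_objects"
}

# Usage, from the top of the copied repository:
#   fix_annex_objects .git/annex/objects
```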
To do this for the whole set of directories and sub-directories, I made a perl script:
And ran it this way (with script made executable and placed in $PATH):
It might be nice if git-annex-fsck could check whether a file was "misfiled" into the wrong directory due to case-insensitivity (before concluding that the file is missing). Although it also makes sense just to recommend cloning instead of copying when moving to a different filesystem (or remembering to set
annex.tune.objecthashlower true
when initiating a repo on a case-insensitive system).
Here's a way to put files into their proper case-sensitive folders using
git-annex reinject --known
:
Thanks to Thowz for the above solution.
There are a couple of scaling issues with large numbers of files (100K+ files in my case): it runs slowly, and it actually exceeds the command line length limit ("Argument list too long").
Here's my modification for the first two commands:
If you used bsdtar (or some other method that attempts to copy over Apple metadata and resource forks), you'll see a ton of files prefixed with ._ in your archive. If you're going to use the repository on Linux from now on, know you don't want any of that metadata, and want the directory cleanups below to actually succeed, you'll want to delete these files with something like this:
You can then continue with his last two commands: