I still have little experience with git-annex, so I may be missing something fairly obvious.
I have been moving files between two repositories. Things seemed to be going well, but today I noticed that I was missing some content that I had just moved. I would like some help figuring out where I went wrong, so that I can avoid making worse mistakes in the future.
What I did was to run the command git-annex move --from=toshiba
to move a bunch of files from my USB unit to my current repository and then I ran git-annex sync
on both ends. Afterwards I noticed that the content of the files was not available, so I tried to track one of them.
What I could see in my local indirect repository was a broken symbolic link pointing to ../.git/annex/objects/ZQ/WF/SHA256E-s241--f3d7e5d1f788235b8eec0af58cc0c526b112b9e834a47ba7a475876c49dce343.jpg/SHA256E-s241--f3d7e5d1f788235b8eec0af58cc0c526b112b9e834a47ba7a475876c49dce343.jpg
.
git-annex info
gave me similar information:
file: Heavy Metal 1981.jpg
size: 241 bytes
key: SHA256E-s241--f3d7e5d1f788235b8eec0af58cc0c526b112b9e834a47ba7a475876c49dce343.jpg
git-annex whereis
locates the file still on its origin:
whereis Heavy Metal 1981.jpg (1 copy)
49b5b3a4-56ac-4cf2-aed9-1c23d3181c97 -- Toshiba USB HDD [toshiba]
ok
On my external HDD, where I have an indirect mode repository, the file has already been replaced by a reference (241 bytes).
$ cat "Heavy Metal 1981.jpg"
../../../../../../../../../media/TOSHIBA EXT/annex/.git/annex/objects/4z/gf/SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg/SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg
git-annex log
remembers something about the operation:
+ Tue, 17 May 2016 12:22:43 WEST Heavy Metal 1981.jpg | 49b5b3a4-56ac-4cf2-aed9-1c23d3181c97 -- Toshiba USB HDD [toshiba]
I tried to git-annex get "Heavy Metal 1981.jpg"
, and now I have a working symlink on my PC. However it does not point to the image, but to the same 241 bytes reference file that I have on my external HDD.
git-annex whereis
now lists 2 locations for my file, but neither working directory holds its contents. So, where are the contents of the file? Lost somewhere? It appears that git-annex mistook the indirect mode reference file for the real file contents, and that is not good.
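That suspicion can be checked against the numbers already quoted above. A git-annex key of the form BACKEND-sSIZE--NAME records the content size in its -sSIZE field, so if the 241-byte key really is the reference file, the path printed by cat should be exactly 241 bytes long. The following is a minimal sketch in Python; the function name key_size and the regular expression are illustrative, not part of git-annex, and the path is rebuilt from the cat output shown earlier.

```python
import re

# git-annex keys look like BACKEND-sSIZE--NAME (optionally with further
# -mNNN / -SNNN / -CNNN fields before the "--").
KEY_RE = re.compile(
    r"^(?P<backend>[A-Z0-9_]+)-s(?P<size>\d+)(?:-[mSC]\d+)*--(?P<name>.+)$"
)

def key_size(key: str) -> int:
    """Return the size in bytes embedded in a git-annex key."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a sized git-annex key: {key!r}")
    return int(match.group("size"))

# Key reported by "git-annex info" on the PC.
key_on_pc = "SHA256E-s241--f3d7e5d1f788235b8eec0af58cc0c526b112b9e834a47ba7a475876c49dce343.jpg"

# Text printed by cat on the external drive, rebuilt piece by piece.
inner_key = "SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg"
pointer_on_usb = (
    "../" * 9
    + "media/TOSHIBA EXT/annex/.git/annex/objects/4z/gf/"
    + inner_key + "/" + inner_key
)

pointer_length = len(pointer_on_usb.encode("utf-8"))
print(key_size(key_on_pc), pointer_length)
```

If the two printed numbers agree, the key on the PC describes the reference text rather than the image, which is consistent with the reference file having been annexed as if it were content.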
I looked at the output of git-annex unused
, cross-referenced with git log --stat -S
and I managed to find it somewhere in the list of unused files in my PC's repository, but not on the external drive.
Still, at this point I'm a little worried. I would like to understand what I could have done to cause this mess. Also, how I can clean it up (the rest of the files remain as broken links at the moment).
I have a couple more drives I wanted to add to this setup, but you can understand that I hesitate a little bit at the moment. Maybe I have "lost" more data than I realize.
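To get a rough idea of how many files are affected, one option is to look for stored objects whose content is itself an annex reference. The sketch below assumes an indirect mode repository, where content lives under .git/annex/objects, and only reads files. The function name find_pointer_objects, the 4096-byte size cut-off, and the keys in the demonstration are made up for illustration.

```python
import os
import re
import tempfile

# A reference written in place of a symlink ends in
# ".git/annex/objects/XX/YY/KEY/KEY" (the key appears twice).
POINTER_RE = re.compile(rb"\.git/annex/objects/[^/]+/[^/]+/([^/]+)/\1\s*$")

def find_pointer_objects(objects_dir, max_size=4096):
    """List (path, referenced_key) for small objects that contain a reference."""
    hits = []
    for root, _dirs, files in os.walk(objects_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Skip symlinks and anything too large to be a reference.
            if os.path.islink(path) or os.path.getsize(path) > max_size:
                continue
            with open(path, "rb") as handle:
                data = handle.read()
            match = POINTER_RE.search(data)
            if match:
                hits.append((path, match.group(1).decode("utf-8", "replace")))
    return hits

def store(objects_dir, key, data):
    """Demo helper: write data where an indirect repository would keep key."""
    directory = os.path.join(objects_dir, "aa", "bb", key)
    os.makedirs(directory, exist_ok=True)
    with open(os.path.join(directory, key), "wb") as handle:
        handle.write(data)

# Demonstration on a throwaway directory with made-up keys.
with tempfile.TemporaryDirectory() as tmp:
    store(tmp, "SHA256E-s4--real.jpg", b"JPEG")
    store(
        tmp,
        "SHA256E-s70--fake.jpg",
        b"../.git/annex/objects/aa/bb/SHA256E-s4--real.jpg/SHA256E-s4--real.jpg",
    )
    for path, key in find_pointer_objects(tmp):
        print(os.path.basename(path), "->", key)
```

On a real repository the function would be pointed at the .git/annex/objects directory. Any hit is only a candidate for inspection, since a small file could legitimately contain such a path.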
The version details:
git-annex version: 6.20160511
build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify XMPP ConcurrentOutput TorrentParser MagicMime Feeds Quvi
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: 5
supported repository versions: 5 6
upgrade supported from repository versions: 0 1 2 4 5
operating system: linux x86_64
It sounds like the symlink in the git repository has been changed to point to some content other than the content it had originally. Since git tracks past versions of files, you should be able to use
git log
on the file to find out when this happened, and you can use git checkout
to check out the original version of the file, or git revert
the commit that changed it to point to the wrong content. The content of the file should still be stored in the git annex, so you should be able to access it this way.
Now, it kind of sounds like something added the git-annex link for a file, as it would appear on the FAT filesystem, to git-annex as the new content of the file. It would certainly be a bug if a git-annex command did that.
What does
git annex info
(with no other parameters) report when you run it in the repository on the USB drive? What filesystem is in use on the USB drive?
The external unit seems to hold an NTFS filesystem. Here is the
git-annex info
output:
However,
git status
says:
Which is not the same message as "no git found here", but is also not what I expected to see.
git log
seems to work but says nothing about the file at hand.
On the PC side, however, I can see three commits on the file (I wish the commit message contained the command line with arguments, rather than the less descriptive "git-annex in Toshiba USB HDD"). Using
git show
and git cat-file
I managed to determine the following:
March 4: the initial version of the file was committed.
May 17 11:51: the file's content changed to
../../../../../../../../../media/TOSHIBA EXT/annex/.git/annex/objects/kx/3W/SHA256E-s96418--c6164e17d88914b2e6781e2cb8e7b91e9669ddf2d9ee6f5cbb17f3212bccfba4.jpg/SHA256E-s96418--c6164e17d88914b2e6781e2cb8e7b91e9669ddf2d9ee6f5cbb17f3212bccfba4.jpg
. This is blob ../.git/annex/objects/4z/gf/SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg/SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg
.
May 17 12:22: the file's content changed again to
../../../../../../../../../media/TOSHIBA EXT/annex/.git/annex/objects/4z/gf/SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg/SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg
, i.e., a reference to the previous object. This is blob ../.git/annex/objects/ZQ/WF/SHA256E-s241--f3d7e5d1f788235b8eec0af58cc0c526b112b9e834a47ba7a475876c49dce343.jpg/SHA256E-s241--f3d7e5d1f788235b8eec0af58cc0c526b112b9e834a47ba7a475876c49dce343.jpg
.
What else can I do in order to work out what went wrong? Is having concurrent commands manipulating the same repository a bad idea?
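The two May 17 commits can be read as a chain of references, each new key having the previous reference text as its content. A small sketch of following that chain back is given below, using only the paths quoted in this comment. The helper names key_in_pointer and follow_chain are hypothetical, and the assumption that the 96418-byte key is the original image is an inference from its size, not something git-annex reports.

```python
import re

# A reference ends in ".git/annex/objects/XX/YY/KEY/KEY".
POINTER_RE = re.compile(r"\.git/annex/objects/[^/]+/[^/]+/([^/]+)/\1$")

def key_in_pointer(text):
    """Return the key named by a reference path, or None if text is not one."""
    match = POINTER_RE.search(text.strip())
    return match.group(1) if match else None

def follow_chain(start_key, contents):
    """Follow key -> reference -> key links until no further reference is known."""
    chain = [start_key]
    while True:
        text = contents.get(chain[-1])
        next_key = key_in_pointer(text) if text is not None else None
        if next_key is None or next_key in chain:
            return chain
        chain.append(next_key)

K241 = "SHA256E-s241--f3d7e5d1f788235b8eec0af58cc0c526b112b9e834a47ba7a475876c49dce343.jpg"
K245 = "SHA256E-s245--ae647e7ad31089255413a9290ca9344542f3cd15ecef66884613bf776387633d.jpg"
K96418 = "SHA256E-s96418--c6164e17d88914b2e6781e2cb8e7b91e9669ddf2d9ee6f5cbb17f3212bccfba4.jpg"

prefix = "../" * 9 + "media/TOSHIBA EXT/annex/.git/annex/objects/"

# Content of each damaged key, as determined with git show / git cat-file.
contents = {
    K241: prefix + "4z/gf/" + K245 + "/" + K245,      # May 17 12:22
    K245: prefix + "kx/3W/" + K96418 + "/" + K96418,  # May 17 11:51
}

for key in follow_chain(K241, contents):
    print(key)
```

The last key printed is the end of the chain, so it is the one whose content would be worth locating, for example among the unused keys.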
When I
git-annex drop --force
a file from my direct repository, it gets replaced by a symbolic-link-like file, containing a path. Then, when I git-annex sync
the repository to propagate the changes I have made, the file's content gets updated as if the file has been replaced.
My question is then: why does the original file get replaced by the link-like file when I drop it?
The symbolic-link-like file is, in fact, a symlink, which is what git-annex uses to represent an annexed file in git. If your filesystem does not support symlinks, git writes the link location to a regular file instead.
git annex drop removes the content of a file from the local repository, but its symlink remains checked into git. So, the content of the file is replaced by the symlink in your working tree.
That symlink should be the same thing git already had recorded for the file.
Based on your earlier comment, it does seem that the symlink stand-in file that git uses is being treated as new content for the file, and getting annexed. That would be a bug.
Is that happening when you run
git annex sync
on Linux, or is it on Windows?
What else can you tell or show me to help me reproduce your problem? I've tried setting up an NTFS filesystem, putting a git-annex repository on it, and dropping a file; git-annex sync did not do the wrong thing when I tried it.