There's `git annex reinject <src> <dst>` for re-adding one file's contents to the local annex. But what if I have a whole bunch of files, and want git-annex itself to decide whether and where it needs to reinject them? (And if a file doesn't need to be reinjected, it would remain in its original place.)
None of the `git annex import` modes work properly in this case. By default, importing adds another, unnecessary copy of the imported file (which I have to `rm` after importing). The `--clean-duplicates` mode seems close, but it insists on verifying the content in other repositories rather than just reinjecting it locally. (Let's assume that the main reason I'm trying to reinject is that I cannot access other repos.)
So I'm hoping for something like `git annex import --reinject <src>...`. Or are there other existing ways to achieve the same? I couldn't find any.
implemented

`git annex reinject --known` done (and also `git annex import --reinject-known` now). --Joey
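To make the behavior of `reinject --known` concrete, here is a minimal Python model of it, not git-annex's implementation: each file is hashed into a simplified SHA256E-style key, files whose key is in a set of known keys are moved into a store directory, and every other file stays where it is. The names `sha256e_key`, `reinject_known`, and `store_dir` (a flat stand-in for the annex object directory) are hypothetical, and the key format ignores git-annex's real extension rules.

```python
import hashlib
import os
import shutil


def sha256e_key(path):
    """Simplified SHA256E-style key: SHA256E-s<size>--<sha256 hex><extension>.

    Assumption: only the last extension is kept, case preserved; git-annex's
    real extension rules are not modeled.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    ext = os.path.splitext(path)[1]
    return f"SHA256E-s{size}--{digest.hexdigest()}{ext}"


def reinject_known(paths, known_keys, store_dir):
    """Move files whose key is known into store_dir; leave all others in place.

    store_dir is a hypothetical flat stand-in for the annex object directory.
    Returns (reinjected, skipped) lists of source paths.
    """
    reinjected, skipped = [], []
    for src in paths:
        key = sha256e_key(src)
        if key in known_keys:
            shutil.move(src, os.path.join(store_dir, key))
            reinjected.append(src)
        else:
            skipped.append(src)  # unknown content: the source file is not touched
    return reinjected, skipped
```

Files that don't match are simply reported back, which matches the "leave it in its original place" requirement from the question.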
git-annex verifies the content in other repositories when you use `--clean-duplicates` because otherwise it could delete the only copy of a file you had: it would be deleting files whose keys it knew about, but whose content it didn't have.
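A rough model of that safety argument, assuming a simple copy count compared against numcopies (the function name and parameters are hypothetical, not git-annex internals): knowing a key from history is not enough, because deleting the source is only safe if enough copies of the content demonstrably survive elsewhere.

```python
def may_delete_source(key, local_keys, verified_remote_copies, numcopies=1):
    """Return True if deleting an import source file that hashes to `key` is safe.

    local_keys: keys whose content is actually present in this repository.
    verified_remote_copies: how many other repositories were just checked and
    confirmed to hold the content.
    Merely knowing the key (e.g. from history) counts for nothing.
    """
    surviving = verified_remote_copies + (1 if key in local_keys else 0)
    return surviving >= numcopies


# The key is known from history, but no copy is reachable: keep the file.
print(may_delete_source("SHA256E-s5--abc.txt", set(), 0))  # False
```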
As for what you are attempting, maybe something like this?
Thanks, but you missed my point entirely... I wasn't asking for a mode that would delete data without checking. I was asking for the complete opposite – a mode that would inject an extra copy of the data without checking.
Yeah, I guess I could `annex add` the files, then un-annex them, and then `annex import --clean-duplicates`, but that's a somewhat long-winded approach, needing twice the space and twice the time.

(...speaking of losing data, it seems that `git annex reinject` is perfectly happy to delete files if I accidentally give it the wrong target. I.e. after failing content verification, it still throws away the source.)

It doesn't have to be part of git-annex; I could script this feature myself, though there aren't nearly enough plumbing commands either. (For example, a command to hash a file and give its key (like `git hash-object`), or a command to find all paths for a key.)

Having an equivalent of `git hash-object -w` (inject an arbitrary object) would make it even easier, but I couldn't find anything like that either.

Anyway, let's cancel this todo; I'll find other ways.
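For the "find all paths for a key" part, a small script can group the output of `git annex find`, which comes up later in this thread with a `${key}` format. This sketch assumes lines formatted as key, a tab, then path (for instance from a format string like `'${key}\t${file}\n'`); the tab separator and the `paths_by_key` helper are assumptions. Note that `find` by default only matches files whose content is present locally.

```python
from collections import defaultdict


def paths_by_key(find_output):
    r"""Group paths by key, given lines of the form '<key>\t<path>'.

    Assumes output from something like:
        git annex find --format='${key}\t${file}\n'
    """
    index = defaultdict(list)
    for line in find_output.splitlines():
        if not line:
            continue  # skip blank lines
        key, path = line.split("\t", 1)
        index[key].append(path)
    return dict(index)


sample = "SHA256E-s5--aaa.txt\ta.txt\nSHA256E-s5--aaa.txt\told/a.txt\n"
print(paths_by_key(sample))
# {'SHA256E-s5--aaa.txt': ['a.txt', 'old/a.txt']}
```

Paths containing tabs or newlines would break this line-based parsing; a null-terminated format would be the safer choice for real scripts.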
Good point about reinject deleting files that don't verify. I've fixed that so it leaves them alone.
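The fixed behavior amounts to verifying before touching the source. A minimal sketch, assuming the expected SHA-256 digest has already been extracted from the target key (the `reinject_one` name is hypothetical): the file is moved only when its hash matches; otherwise it is left untouched.

```python
import hashlib
import shutil


def reinject_one(src, expected_sha256, dest):
    """Move src to dest only if its SHA-256 matches expected_sha256.

    Returns True if moved; on a mismatch the source is left untouched.
    """
    digest = hashlib.sha256()
    with open(src, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        return False  # wrong target: refuse, and do not delete anything
    shutil.move(src, dest)
    return True
```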
I think this would fit better as an option to `git annex reinject` than it would to `git annex import`. The latter has too many options anyway.

It would not be hard to add something like `git annex reinject --all-known-files`, which would check if a source file hashes to a known key and ingest its content into the annex if so, otherwise leaving the source file alone.

That would reinject files that had been added to the repo long ago and then deleted. I don't know if that would be considered surprising behavior, but it's hard to only ingest files that have a current link in the repo (because a. git-annex keeps no such index (mostly), b. branches, and c. tags). Even if it was surprising behavior to reinject old deleted files, I suppose `git annex unused` could be used to drop such old unused file contents after reinjecting them.

There's also the problem of different backends; it seems such a thing would need to hash a file 5 different ways to make sure no hash of it is known.
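To illustrate why several hashes are needed, this sketch computes candidate keys for one file under a few of git-annex's hash backends in a single pass, then checks them against a set of known keys. The backend names are real, but the list is only a subset, the key layout (`NAME-s<size>--<hexdigest>`, plus the extension for `E` variants) is simplified, and `candidate_keys`/`find_known_key` are hypothetical helpers.

```python
import hashlib
import os

# A subset of git-annex hash backends: (hash constructor, appends extension?).
# The key layout below is a simplification of git-annex's real format.
BACKENDS = {
    "SHA256E": (hashlib.sha256, True),
    "SHA256": (hashlib.sha256, False),
    "SHA1E": (hashlib.sha1, True),
    "SHA1": (hashlib.sha1, False),
    "MD5E": (hashlib.md5, True),
    "MD5": (hashlib.md5, False),
}


def candidate_keys(path):
    """Return {backend: key} for one file, reading it only once."""
    hashers = {name: algo() for name, (algo, _) in BACKENDS.items()}
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            size += len(chunk)
            for h in hashers.values():
                h.update(chunk)
    ext = os.path.splitext(path)[1]
    return {
        name: f"{name}-s{size}--{hashers[name].hexdigest()}" + (ext if with_ext else "")
        for name, (_, with_ext) in BACKENDS.items()
    }


def find_known_key(path, known_keys):
    """Return the first candidate key that is already known, else None."""
    for key in candidate_keys(path).values():
        if key in known_keys:
            return key
    return None
```

The `E` variants also make the key depend on the file name: identical content named `Foo.ISO` and `Foo.iso` gets the same SHA256 key but different SHA256E keys, a wrinkle raised later in the thread.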
As to adding plumbing, I'm always open to ideas for more useful plumbing.

`git annex find` to get, e.g., a list of keys of files in the currently checked out branch.

`git annex calckey` that can calculate the key that would be used for a file (e.g., by hashing it).

"There's also the problem of different backends; it seems such a thing would need to hash a file 5 different ways to make sure no hash of it is known."
Yeah, I guess you're right – and there might even be different 'hashes' in the same backend, e.g. SHA256E considers `Foo.ISO` different from `Foo.iso`...

Actually I ended up doing this only twice, so manually running `annex add` on everything plus the duplicate cleaner wasn't really that bad in the end. And `annex calckey` and `annex find` with `${key}` will be useful for the other scripts I have; thanks.

As it leaves non-matched files alone, the user could specify the backend to minimise the need to hash every file with every backend.
Perhaps a new command, "ingest", with an option for the backend, and requiring `--force` to try all known backends?
This would be very useful for my file sorting project (when I restart it).
@CandyAngel, `reinject --known` moves the file into the annex. The only time I can see it making a copy is when importing from a drive other than the one the repository is on, and it would then delete the source file after copying.
It does currently always check that enough disk space exists to copy the file, even though it's probably going to move it. Perhaps that's why you thought it copied it?
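The move-versus-copy distinction can be sketched like this, assuming a same-filesystem check via `st_dev` and an up-front free-space check that mirrors the one described above. The `move_into_annex` function is hypothetical and is not git-annex's code.

```python
import os
import shutil


def move_into_annex(src, annex_dir):
    """Move src into annex_dir; copy then delete only across filesystems.

    Free space is checked first, even though a same-filesystem rename
    would not need any.
    """
    size = os.path.getsize(src)
    if shutil.disk_usage(annex_dir).free < size:
        raise OSError(f"not enough free space in {annex_dir}")
    dest = os.path.join(annex_dir, os.path.basename(src))
    if os.stat(src).st_dev == os.stat(annex_dir).st_dev:
        os.rename(src, dest)  # same filesystem: a rename, no data copied
        return "moved"
    shutil.copy2(src, dest)  # other drive: copy the data...
    os.remove(src)  # ...then delete the source file
    return "copied"
```

Checking space unconditionally is the conservative choice: it can refuse a rename that would have succeeded, but it never starts a cross-drive copy that runs out of room halfway.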