I'm soliciting ideas for new small features that let git-annex do things that currently have to be done manually or whatever.
Here are a few I've been considering:
- --numcopies would be a useful command line switch.
Update: Added. Also allows for things like git annex drop --numcopies=2 when in a repo that normally needs 3 copies, if you need to urgently free up space.
- A way to make drop and other commands temporarily trust a given remote, or possibly all remotes.
Combined, this would allow git annex drop --numcopies=2 --trust=repoa --trust=repob
to remove files that have been replicated out to the other 2 repositories, which could be offline. (Slightly unsafe, but in this case the files are podcasts so not really.)
Update: done --Joey
wishlist: git-annex replicate suggests some way for git-annex to have the smarts to copy content around on its own to ensure numcopies is satisfied. I'd be satisfied with a git annex copy --to foo --if-needed-by-numcopies.
Contrary to the "basic" solution, I would love to have a git annex distribute which is smart enough to simply distribute all data according to certain rules. My ideal, personal use case during the next holidays where I will have two external disks, several SD cards with 32 GB each and a local disk with 20 GB (yes....) would be:
cd ~/photos.annex # this repository does not have any objects!
git annex inject --bare /path/to/SD/card # this adds softlinks, but does **not** add anything to the index. it would calculate checksums (if enabled) and have to add a temporary location list, though
git annex distribute # this checks the config. it would see that my two external disks have a low cost whereas the two remotes have a higher cost.
# check numcopies. it's 3
# copy to external disk one (cost x)
# copy to external disk two (cost x)
# copy to remote one (cost x * 2)
# remove file from temporary tracking list
git annex fsck # everything ok. yay!
Come to think of it, the inject --bare thing is probably not a microfeature. Should I add a new wishlist item for that? -- RichiH
I've thought about such things before; does not seem really micro and I'm unsure how well it would work, but it would be worth a todo. --Joey
Update: Done as --auto. --Joey
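The cost-ordered selection in the distribute idea above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: pick_cheapest and plan_copies are hypothetical helpers, not git-annex commands, and the "name cost" input format is made up for the example. The sketch prints the copy commands instead of running them.

```shell
# Hypothetical helper, not part of git-annex.
# stdin: one "remote-name cost" pair per line (names without spaces).
# $1: how many copies are wanted (numcopies).
# Prints the names of the cheapest remotes, one per line;
# ties on cost are broken by name.
pick_cheapest() {
    sort -k2,2n -k1,1 | head -n "$1" | cut -d' ' -f1
}

# Hypothetical helper: print (rather than run) one copy command
# per selected remote for a single file.
# $1: file name, $2: numcopies; remote list is read from stdin.
plan_copies() {
    file=$1
    numcopies=$2
    pick_cheapest "$numcopies" | while read -r remote; do
        printf 'git annex copy --to %s %s\n' "$remote" "$file"
    done
}
```

With two disks at cost 100 and two remotes at cost 200, something like printf 'disk1 100\ndisk2 100\nremote1 200\nremote2 200\n' | plan_copies IMG_0001.jpg 3 would plan copies to both disks and one of the remotes.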
Along similar lines, it might be nice to have a mode where git-annex tries to fill up a disk up to the annex.diskreserve
with files, preferring files that have relatively few copies. Then as storage prices continue to fall, new large drives could just be plopped in and git-annex used to fill it up in a way that improves the overall redundancy without needing to manually pick and choose.
Update: git annex get --auto basically does this; you can tune --numcopies on the fly to make it get more files than needed by the current numcopies setting. --Joey
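The "fill the disk, preferring files with few copies" selection could be sketched roughly like this. It is a greedy illustration, not how git annex get --auto is implemented: pick_files_to_fill is a hypothetical helper, and the "copies size path" input format is an assumption, not something git-annex prints.

```shell
# Hypothetical helper, not part of git-annex.
# stdin: one "copies size_bytes path" triple per line (paths without spaces).
# $1: bytes that may be used, i.e. free space minus annex.diskreserve.
# Prints the paths to fetch, least-replicated first, skipping any file
# that no longer fits in the remaining budget.
pick_files_to_fill() {
    sort -k1,1n -k2,2n |
        awk -v budget="$1" '$2 <= budget { budget -= $2; print $3 }'
}
```

The output could then be passed to git annex get. Sorting by copy count first is what makes a new drive improve overall redundancy rather than just duplicating already well-replicated files.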
If a remote could forward received files to another remote, I could use my own local bandwidth efficiently while still having my git-annex repos replicate data. -- RichiH
Really micro:
% grep annex-push .git/config
annex-push = !git pull && git annex add . && git annex copy . --to origin --fast --quiet && git commit -a -m "$HOST $(date +%F--%H-%M-%S-%Z)" && git push
%
-- RichiH --Joey
This was already asked here, but I have a use case where I need to unlock with the files being hardlinked instead of copied (my fs does not support CoW), even though 'git annex lock' is now much faster. The idea is that 1) I want the external world to see my repo "as if" it wasn't annexed (because of its own limitations in dealing with soft links), and 2) I know what I'm doing, and am sure that files will only be read, never written to.
My case is: the repo contains a snapshot A1 of a certain remote directory. Later I want to rsync this dir into a new snapshot A2. Of course, I want to transfer only new or changed files, using rsync's --copy-dest=A1 (or --compare-dest) option. Unfortunately, rsync won't recognize soft links from git-annex, and will re-transfer everything.
Maybe I'm overusing git-annex, but I still find it a legitimate use case, and even though there are workarounds (I don't even remember what I had to do), it would be much more straightforward to have 'git annex unlock --readonly' (or '--readonly-unsafe'?), ... or have rsync take soft links into account, but I did not see its author ask for microfeature ideas (it was discussed, and only some convoluted workarounds were proposed). Thanks.
Before dropping unused items, sometimes I want to check the content of the files manually. But currently, from e.g. a sha1 key, I don't know how to find the corresponding file, except with 'find .git/annex/objects -type f -name 'SHA1-s1678--70....', which is too slow (I'm in the case where "git log --stat -S'KEY'" won't work, either because it is too slow or because the file was never committed). By the way, is it documented somewhere how to determine the 2 (nested) sub-directories in which a given (by name) object is located?
So I would like 'git-annex unused' to be able to give me the list of paths to the unused items. Also, I would really appreciate a command like 'git annex unused --log NUMBER [NUMBER2...]' which would run the suggested command "git log --stat -S'KEY'" for me, where NUMBER is from the 'git annex unused' output. Thanks.
My last comment is a bit confused. The "git fetch" command lets you get all the information from a remote, and it is then possible to merge while offline (without access to the remote). I would like a "git annex fetch remote" command that gets all annexed files from the remote, so that if I later merge with the remote, all annexed files are already here. And "git annex fetch" could (optionally) call "git fetch" before getting the files.
It also seems that in my last post, I should have written "git annex get --from=remote" instead of "git annex copy --from=remote", because "annex copy --from" copies all files, even if the local repo already has them (is this the case? if yes, when is it useful?)