Please describe the problem.
This is a follow-up to the discussion at https://git-annex.branchable.com/forum/Standard_groups__47__preferred_contents/ where I unfortunately did not get a complete answer. I don't know whether this is really a bug, but at least it does not work as I would expect, and the documentation provides no clear discussion of it.
Now to the problem: my annex is in "manual" mode (or, equivalently, uses the preferred content expression "exclude=* and present", or any expression containing "present"). Then I get a file using "git annex get file". I would expect this file to be kept in sync from now on, because it is "present". But it is not. When I change the file locally, the change is synced to the remotes, which is what it should be. However, when a remote changes that file, the new content is NOT synced; the file is silently dropped.
Similarly, when I get a complete directory tree in manual mode, I would expect it to be kept in sync: when a remote adds a file or changes a file in that directory, the change should also be synced to the local machine. But it is not. If a file is changed, it is silently dropped (as described above). If a file is added, only the git metadata (the symlink) appears, but the content is not synced.
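For reference, "manual" mode can be enabled either through git-annex's standard groups or with an explicit expression (a minimal sketch; "here" refers to the current repository):

    # use the standard "manual" group with its standard preferred content expression
    git annex group here manual
    git annex wanted here standard

    # or set an explicit expression containing "present"
    git annex wanted here 'exclude=* and present'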
What steps will reproduce the problem?
- Create a file 'file' on the server, git annex add/sync etc.
- On the client: git annex wanted here 'exclude="*" and present'
- On the client: git annex get file. The file is now present on the client
- Change the file on the server, git annex sync
- git annex sync --content on the client
- Result: the file is dropped again on the client (a consolidated shell sketch of these steps follows below)
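A consolidated sketch of these steps, assuming two clones named "server" and "client" that have each other as git remotes (names, file contents, and the unlock/add editing workflow are placeholders for whatever your setup uses):

    # on the server
    echo v1 > file
    git annex add file
    git annex sync --content

    # on the client
    git annex wanted here 'exclude=* and present'
    git annex sync
    git annex get file           # content of 'file' is now present on the client

    # on the server: modify the file
    git annex unlock file
    echo v2 > file
    git annex add file
    git annex sync

    # on the client
    git annex sync --content     # expected: the new content is fetched
                                 # observed: the file is dropped instead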
Similarly for directories:
- Create a (sub-)directory 'subdir' with files and sync everything
- On the client: git annex get subdir. The subdirectory is now present and all files under it are downloaded.
- On the server create a new file in 'subdir' and git annex add; git annex sync --content
- git annex sync --content on the client
- Result: the content of the new file is not synced to the client (see the sketch below)
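The directory case, under the same assumptions:

    # on the server
    mkdir subdir
    echo a > subdir/a
    git annex add subdir
    git annex sync --content

    # on the client
    git annex sync
    git annex get subdir         # all files under subdir are now present

    # on the server: add a new file to the directory
    echo b > subdir/b
    git annex add subdir/b
    git annex sync --content

    # on the client
    git annex sync --content     # expected: the content of subdir/b is fetched
                                 # observed: only the symlink appears, without content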
What version of git-annex are you using? On what operating system?
git-annex version: 5.20140717-g5a7d4ff
build flags: Assistant Webapp Webapp-secure Pairing Testsuite S3 WebDAV DNS Feeds Quvi TDFA CryptoHash
key/value backends: SHA256E SHA1E SHA512E SHA224E SHA384E SKEIN256E SKEIN512E SHA256 SHA1 SHA512 SHA224 SHA384 SKEIN256 SKEIN512 WORM URL
remote types: git gcrypt S3 bup directory rsync web webdav tahoe glacier ddar hook external
Have you found a solution for this? It seems useful if, e.g., you're only interested in a subset of files/directories on your laptop, but want the files you have fetched (present) and are interested in to be kept up to date (in sync) with the other computers.
By the way, the link to the previous discussion didn't work for me.
The problem is that there's no way for preferred content expressions to specify that a file is wanted just because some old version of the file is (or was) present.
It's not clear to me how that could be added to the preferred content expressions in an efficient way.
It might be possible to hack

    git annex sync --content

and the assistant to look at incoming merges, and queue downloads of newer versions of files before merging.

Also being discussed at https://github.com/datalad/datalad/issues/6.
The "look at incoming merges, and queue downloads of newer versions of present files" approach needs to do something about the case where it's not able to successfully download a newer version immediately.
If it let the merge proceed, the file would end up not being present anymore, and so a later sync wouldn't know it had been present.
Failing to get the contents of all changed files could just make the sync fail before it merges, keeping the tree at the earlier version. This might be desirable.
But, implementing that means changing sync to download file contents before merging, rather than the current merge-first approach. I'm sure a lot of people won't want that. (I.e., I certainly don't.) So, this seems to need to be a new mode for syncing.
(Such a mode is probably generally useful, aside from this use case.)
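Very roughly, such a mode could behave like the following shell sketch (this is not how git-annex implements sync; "origin" and "origin/master" are placeholders, and it only handles locked, symlink-style annexed files):

    git fetch origin
    git diff --name-only HEAD origin/master | while read -r f; do
        # read the incoming version of the file; skip files deleted on the remote
        target=$(git cat-file blob "origin/master:$f" 2>/dev/null) || continue
        case "$target" in
            */annex/objects/*)
                # the last component of an annex symlink target is the key
                key=$(basename "$target")
                # fetch the new content now; abort before merging if that fails
                git annex get --key="$key" || exit 1
                ;;
        esac
    done && git merge origin/master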
If this was implemented, then when a file is modified, the content of the new file would be present. git-annex already makes it so that, when a file is moved, the content of the file is still present. But, what if a file were first moved and then modified? If that happened in multiple commits, they could be examined in turn (with additional complication and slowdown) to conclude that the content is wanted. But if that happened in a single commit, there's no way to tell that from deleting the old file and adding a new file, whose content would not be automatically wanted.
The bug report wants git-annex to somehow detect when the user has manually gotten an entire directory tree and start getting new files in that directory too, which seems pretty infeasible. How is git-annex supposed to guess whether you want new files in a directory tree, or just the files that are currently there? What if some files are duplicated among 2 directory trees, and one tree ends up complete while the other one doesn't? This seems like a request for mindreading ponies.
There are many preferred content expressions that fully specify what files are wanted, without using the "present" token. AFAICS, there's no reason to do any of this work when the preferred content expression doesn't include "present".
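For example, an expression along these lines pins down exactly which trees are wanted without relying on "present" (the directory names are placeholders):

    git annex wanted here 'include=subdir/* or include=docs/*'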
TBH, I am not at all sure this is implementable anywhere near sanely. If I were you, I'd use preferred content to specify the files I want, which avoids these complexities and works great. Using metadata to tag files and making all tagged files be wanted in the preferred content expression is one nice way to go. And metadata is copied over when adding a new version of a file, so this tagging approach works across file modifications.
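A minimal sketch of that tagging approach (the tag name "keep" and the file path are placeholders):

    # tag the files you want to keep on this machine
    git annex metadata --tag=keep subdir/somefile
    # make everything carrying that tag wanted here
    git annex wanted here 'metadata=tag=keep'
    git annex sync --content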