Recent comments posted to this site:

You can use git annex setpresentkey to tell git-annex that a remote with a given uuid contains a given content.

For example, if the proxy remote is named proxy and you know it contains all annexed files in the current directory and below, you could run this to tell git-annex that the proxy contains all the files it thought it didn't contain:

uuid=$(git config remote.proxy.annex-uuid)
git annex find --not --in proxy --format "\${key} $uuid 1\n" | \
    git annex setpresentkey --batch
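Each line fed to setpresentkey --batch has the form "key uuid 1", where the trailing 1 marks the content as present (0 would mark it absent). A sketch of one such line, using a made-up key and uuid:

```shell
# Made-up key and uuid, for illustration only.
key='SHA256E-s1024--0000000000000000000000000000000000000000000000000000000000000000'
uuid='12345678-90ab-cdef-1234-567890abcdef'
printf '%s %s 1\n' "$key" "$uuid"
```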

There will be some problems using this empty proxy remote. E.g., if you run git annex move somefile --from proxy, git-annex will try to delete the content from it, see that the content is not there, and update its location tracking to say that the proxy no longer contains the content. git annex fsck --from proxy will do something similar, so you'll need to avoid it.

And, you'll probably want to use git annex trust proxy so that git-annex drop assumes it contains the content you said it has; by default git-annex will double-check and that check will fail.

To avoid all these kinds of issues with the proxy, a better approach might be to make a custom special remote that actually accesses the data on the tape drive. See the special remote implementation howto.
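As a hedged sketch of what such a special remote could look like: the external special remote protocol is line-based over stdin/stdout, so even a small shell script can serve read-only content. The script's name, the /mnt/tape path, and the flat <key> layout below are all assumptions for illustration, not the tape setup discussed above:

```shell
#!/bin/sh
# Hypothetical git-annex-remote-tape: serves annexed content read-only
# from files stored under /mnt/tape/<key> (layout is an assumption).
# Note: the simple `read` splitting breaks on filenames with spaces.
tape_remote () {
    echo 'VERSION 1'
    while read -r cmd a b c; do
        case "$cmd" in
            INITREMOTE)   echo 'INITREMOTE-SUCCESS' ;;
            PREPARE)      echo 'PREPARE-SUCCESS' ;;
            CHECKPRESENT)
                if [ -e "/mnt/tape/$a" ]
                then echo "CHECKPRESENT-SUCCESS $a"
                else echo "CHECKPRESENT-FAILURE $a"
                fi ;;
            TRANSFER)
                # a=RETRIEVE|STORE  b=key  c=local file
                if [ "$a" = RETRIEVE ] && cp "/mnt/tape/$b" "$c" 2>/dev/null
                then echo "TRANSFER-SUCCESS $a $b"
                else echo "TRANSFER-FAILURE $a $b read-only remote"
                fi ;;
            REMOVE)       echo "REMOVE-FAILURE $a read-only remote" ;;
            *)            echo 'UNSUPPORTED-REQUEST' ;;
        esac
    done
}
# Demo exchange with canned input (normally git-annex is on the pipe):
printf 'INITREMOTE\nPREPARE\n' | tape_remote
```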

Comment by joey Tue Aug 14 14:50:51 2018

The "filterdriver" branch implements support for these.

However, it's actually slower than the old interface, because the new interface requires git-annex to read the whole file content from git when adding a file, while the old interface let it avoid reading any content.

Since the new interface does have capabilities, a new capability could avoid schlepping the content over the pipe, and instead let the filter driver refer to the worktree file and respond with the path of a file. This would be similar to my old patch set for the old interface.

Comment by joey Mon Aug 13 20:24:02 2018

This is caused by this bug in esqueleto:

The best way to avoid this kind of transient breakage in the haskell dependencies of git-annex is to build it using stack, instead of cabal. stack pins packages to a consistent working set.
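For example, from a checked-out git-annex source tree, with stack already installed (commands are a sketch; consult the install docs for the current procedure):

```
stack setup      # install the GHC version the project pins, if needed
stack build      # build git-annex against the pinned package set
```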

I don't really see this as something that warrants a change to git-annex. Using bleeding-edge versions of all build dependencies will break things; that's why the build docs recommend not using cabal if you don't want to be involved in fixing that kind of breakage.

Comment by joey Mon Aug 13 16:35:43 2018

Hello, it has been a while since I posted here about this issue with sqlite but it keeps following me! I randomly get errors while trying to lock files: sqlite worker thread crashed: SQLite3 returned ErrorIO while attempting to perform prepare "SELECT null from content limit 1": disk I/O error

Should I worry about the state of my hard drive? And I don't know if it is intended, but when this happens, the process doesn't stop with a failure code, it just freezes. I checked with top, and git-annex seems to continue doing stuff as it is still using a full core.

Comment by webanck Mon Aug 13 14:18:17 2018

Also with public=no but embedcreds=yes.

It can be useful to embed read-only credentials, but allow users to easily add/store their own credentials (locally) with write-access.

Comment by Mara Mon Aug 13 12:28:09 2018

Let's please not entangle this bug with that other bug.

Sure! I just (probably erroneously) felt that they stem from the same underlying problem: the absence of clear semantics for whether conversion should happen or not. I have yet to fully digest what you are suggesting, and whether and how we should address this at the datalad level, but meanwhile FWIW:

  • adding -n to the commit (and not to the add) is uncommon in my daily use of git/git-annex, and I hope that I would never have to use it while performing the regular "annex unlock file(s); annex add file(s); commit file(s)" sequence to maintain files under annex.

  • whether a file will be matched by the git-annex largefiles setting is unknown to the user (or to a higher-level tool using git-annex, such as datalad) without explicitly checking (and I'm not even sure yet how), or without git annex add-ing it and seeing whether it is now added to git where it was added to the annex before. So hopefully we do not need to do that either.

Comment by yarikoptic Fri Aug 10 19:17:16 2018

@torarnv thanks for pointing that out. I finally got around to verifying that, and was able to speed up the smudge filter. This also avoids the problem that git, for some reason, buffers the whole file content in memory when it sends it to the smudge filter; that pretty bad memory use in git no longer affects this.

Comment by joey Thu Aug 9 22:11:00 2018

One of v6's big problems is that dropping or getting an annexed file updates the file in the working tree, which makes git status think the file is modified, even though the clean filter will output the same pointer as before. Running git add to clear it up is quite expensive, since the large file content has to be read. Maybe a long-running filter process could avoid this problem.

If git can be coaxed somehow into re-running the smudge filter, git-annex could provide the new worktree content to git via it, and let git update the working tree.

Git would make a copy, as git-annex currently does, so the only added overhead would be sending the file content to git down the pipe. (Well, and git won't use reflink for the copy on COW filesystems.)

annex.thin is a complication, but it could be handled by hard linking the work tree file that git writes back into the annex, overwriting the file that was there. (This approach could also make git checkout of a branch honor annex.thin.)
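A minimal sketch of that hard-link trick in plain shell, with hypothetical paths and content standing in for the annex object and the file git writes:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
# Hypothetical stand-ins for an annex object and a worktree file.
mkdir -p .git/annex/objects
printf 'big file content\n' > .git/annex/objects/OBJECT   # existing annex object
printf 'big file content\n' > worktree-file               # file git just wrote
# Hard link the worktree file back into the annex, replacing the object:
# both names now share one inode, so the content is not duplicated.
ln -f worktree-file .git/annex/objects/OBJECT
```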

How to make git re-run the smudge filter? It needs to want to update the working tree. One way is to touch the worktree files and then run git checkout. Although this risks losing modifications the user made to the files, so it would need to be done with care.

That seems like it would defer working tree updates until the git-annex get command was done processing all files. Sometimes I want to use a file while the same get command is still running for other files. It might work to use the "delay" capability of the filter process interface. Get git to re-smudge all affected files, and when it asks for content for each, send "delayed". Then as git-annex gets each file, respond to git's "list_available_blobs" with a single blob, which git should request and use to update the working tree.
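A hedged sketch of that exchange, following git's long-running filter process protocol (handshake and pkt-line framing omitted; "bigfile" is a placeholder):

```
git>     command=smudge
git>     pathname=bigfile
git>     can-delay=1
filter>  status=delayed            # content not available yet
...git-annex get completes for bigfile...
git>     command=list_available_blobs
filter>  pathname=bigfile
filter>  status=success
git>     command=smudge            # git re-requests the delayed path
git>     pathname=bigfile
filter>  <file content>
filter>  status=success
```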

Comment by joey Thu Aug 9 20:18:46 2018

Ben, I can reproduce that, but the file appearing modified in git status is a known problem documented in smudge. It's one of the primary reasons that v6 mode remains experimental.

While git commit -a in that clone does cause the file to be converted from git to annex, touching the file and committing has the same effect. If you want to juggle annexed and non-annexed files in a v6 repository without letting annex.largefiles tell git-annex what to do, you have to manually tell it what to do every time the file is staged. When you git commit -a, you stage the file and so you need to include -c annex.largefiles=nothing to keep it from transitioning to the annex.
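In shell, that looks like the sketch below. The demo repository setup is only for illustration: plain git simply ignores the annex.largefiles setting, while in a v6 repository git-annex's clean filter obeys it and keeps the file in git.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email 'you@example.com'
git config user.name 'You'
echo one > file && git add file && git commit -qm 'initial'
echo two >> file
# Stage and commit everything, telling git-annex's clean filter (if any)
# not to treat anything as a large file:
git -c annex.largefiles=nothing commit -aqm 'keep file in git'
```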

I think it might make sense to get v6 working to the point that it's non-experimental before worrying about such a marginal edge case as this.

Comment by joey Thu Aug 9 19:52:50 2018

@davicastro yes, using git-annex add for adding both kinds of files is the workflow this is about. Other than for git add features like --interactive, I see no need to ever use git add once you have this set up.

Comment by joey Thu Aug 9 19:46:22 2018