devblog/day 339 smudging out direct modegit-annexhttp://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/git-annexikiwiki2015-12-19T18:29:56Zniiicehttp://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/comment_1_6435db0003d2d8414b1a040b8899c7b8/anarcat [id.koumbit.net]2015-11-24T01:07:32Z2015-11-24T01:07:32Z
<p>that would be pretty awesome! thanks so much for looking into what others are doing: it takes great maturity and respect for others, something that is so often missing online...</p>
<p>i hope this can solve a bunch of WTF issues i've had with direct mode, which already improved by leaps and bounds, mind you. <img src="http://git-annex.branchable.com/smileys/smile.png" alt=":)" /></p>
<p>speaking of which - would it make sense for git-annex to support lfs remotes out of the box? or is that considered built into git (i.e. if you install git-lfs, can you already have a hybrid lfs/annex repo?)</p>
comment 2http://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/comment_2_79d19b2b057c2a0b23875a32dae0c2ba/joey2015-11-24T14:33:21Z2015-11-24T14:22:37Z
<p>Sure, it would be great to have a special remote supporting the git-lfs
storage backend. This would let git-annex repos be uploaded to github along
with the annexed files, which is a nice diversity to have in addition to
gitlab's support for git-annex.</p>
<p>The API is documented, so it's certainly doable, as an external special
remote even.</p>
comment 3http://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/comment_3_43f7c7c6c2fd227466857c3232e13351/spwhitton2015-11-26T01:34:17Z2015-11-26T01:34:17Z
At present it's possible to check (small) files directly into a git branch alongside annexed files. To do this one uses <code>git add</code> rather than <code>git annex add</code>. But if <code>git add</code> would add files via the smudge/clean process, how would one check files directly into git? Would it no longer be possible?
The downsidehttp://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/comment_4_35d0472cd64335eedeeb17d51cdbaf5b/g2015-12-10T03:45:09Z2015-12-10T03:45:09Z
<p>If I'm understanding correctly, that one downside (requiring all checkouts to have all files be direct if any filesystems require it) seems to be a fairly major limitation, no? Changing the concept of locked/unlocked files from being a local, per-repo concern to a global one seems like quite a major change.</p>
<p>For instance, it would mean that any public repo using git annex for distributing a set of data files would either have to have all files be unlocked, or else no one would be able to clone onto a FAT32-formatted external hdd?</p>
<p>FWIW, the particular use case I'm concerned about personally is having my annexes on my android device.</p>
comment 5http://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/comment_5_499f3553c4efc35e54f121a7d4abc029/joey2015-12-10T15:15:20Z2015-12-10T15:00:52Z
<p>I'm concerned about that too. But it may be possible to finesse it:
when git-annex is running on a crippled filesystem, it may be able to
unlock all files as it gets content for them, producing a local fork.</p>
<p>The first difficulty would be avoiding or autoresolving conflicts
between locked and unlocked when merging changes into that fork. I think
this is very tractable; such a conflict comes down mostly to the symlink
bit in the tree object.</p>
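<p>As a rough illustration of why such a conflict looks tractable, here is a minimal sketch, not git-annex's actual merge code. It assumes that a locked file is a symlink whose target ends in the key, that an unlocked file is a small pointer file containing <code>/annex/objects/KEY</code>, and that the local side should win. The names <code>Entry</code>, <code>annex_key</code> and <code>resolve_lock_conflict</code> are hypothetical.</p>

```python
import posixpath
from typing import NamedTuple, Optional

SYMLINK_MODE = 0o120000  # git tree mode for a symlink (assumed: locked file)
REGULAR_MODE = 0o100644  # git tree mode for a regular file (assumed: unlocked pointer)


class Entry(NamedTuple):
    """Hypothetical stand-in for one side of a conflicted tree entry."""
    mode: int
    content: str  # symlink target, or the text of the pointer file


def annex_key(entry: Entry) -> Optional[str]:
    """Return the key an entry refers to, or None if it does not look annexed."""
    text = entry.content.strip()
    if "annex/objects/" not in text:
        return None
    # Both assumed forms end in the key as their last path component.
    return posixpath.basename(text)


def resolve_lock_conflict(ours: Entry, theirs: Entry) -> Optional[Entry]:
    """Keep our side when both sides name the same key.

    Returns None when the entries differ in more than locked/unlocked
    state, meaning a real conflict that still needs manual resolution.
    """
    key = annex_key(ours)
    if key is None or key != annex_key(theirs):
        return None
    return ours
```

<p>With these assumptions, an unlocked local entry and a locked incoming entry for the same key resolve to the local entry, so the fork keeps its files unlocked.</p>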
<p>The real difficulty would be that any pushes from that fork would include
its change converting all files to unlocked. Although it's fairly mechanical
to convert such a commit into one that doesn't unlock files, so perhaps
that could be automated somehow on push or merge.</p>
<p>There's also a small and probably easy to implement git change that
would avoid all this complexity: If git's smudge filters were optionally
able to run on the link-text of symlinks, then a file could be unlocked
locally without changing what's in the repo and all the smudge stuff
would still work on it.</p>
<p>Crippled filesystems aside, I think there's value in being able to unlock
files across clones of a repo. For example, a repo could have a workflow
where the files for the current episode/experiment/whatever start out
unlocked and are locked once it's complete.</p>
comment 6http://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/comment_6_ede48107675edfe40d5fdd3377553aa4/g2015-12-11T22:58:22Z2015-12-11T22:58:22Z
<p>Thanks for the quick reply (and all your work on this!)</p>
<p>Interesting, that change to git does sound like it should be relatively small compared to the workarounds needed. But in any case, glad to hear you're thinking about the issue.</p>
<p>Also curious what your thoughts are on the performance issues you had identified previously with using smudge/clean on larger repos. Do the changes in git 2.5 address all your concerns? Or are there still some cases where this will potentially result in significant slow-down?</p>
comment 7http://git-annex.branchable.com/devblog/day_339_smudging_out_direct_mode/comment_7_ceb1d2a0e5bbc73590b80ff69292102d/joey2015-12-19T18:29:56Z2015-12-19T18:21:04Z
<p>I'm still not entirely happy with the smudge/clean interface's performance.
At least git doesn't fall over if the clean filter declines to read all the
content of the large file on stdin anymore, which was the main thing
preventing an optimised use of it. Still, git has to run the clean filter once
per file, which is suboptimal. And, the smudge filter can't modify the file in
the work tree, but instead has to pass the whole file content back to git
for writing, which is going to result in a lot of unnecessary context
switches and slowdown. Especially in git-annex's case, where all it really
needs to do is make a hard link to the content.</p>
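<p>To make the contrast concrete, here is a minimal sketch under stated assumptions, not git-annex's implementation. The function names are hypothetical, and the pointer text and <code>SHA256-s&lt;size&gt;--&lt;hash&gt;</code> key layout are assumed for illustration. The clean side hashes the work tree file directly rather than consuming stdin; the smudge side shows the byte-for-byte copy the filter interface requires next to the single hard link that would suffice.</p>

```python
import hashlib
import os
import shutil
from typing import BinaryIO


def clean_without_reading_stdin(worktree_path: str) -> str:
    """Build pointer text by hashing the file on disk, ignoring stdin.

    Assumed pointer layout: "/annex/objects/SHA256-s<size>--<hexdigest>".
    """
    size = os.path.getsize(worktree_path)
    digest = hashlib.sha256()
    with open(worktree_path, "rb") as f:
        # Read in 1 MiB chunks so large files are never held in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return f"/annex/objects/SHA256-s{size}--{digest.hexdigest()}\n"


def smudge_via_pipe(content_path: str, out: BinaryIO) -> None:
    """What the filter interface requires: stream every byte back to git."""
    with open(content_path, "rb") as src:
        shutil.copyfileobj(src, out)


def populate_via_hardlink(content_path: str, worktree_path: str) -> None:
    """What would suffice: make the work tree file a hard link to the content."""
    os.link(content_path, worktree_path)
```

<p>The pipe version costs time proportional to the file size for every checked-out file, while the hard link is a single metadata operation, assuming both paths are on the same filesystem.</p>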