forum/Unlocked mode without data also under .git/annex? (git-annex)
http://git-annex.branchable.com/forum/Unlocked_mode_without_data_also_under_.git__47__annex__63__/
ikiwiki, 2022-09-19T14:49:02Z

comment 1 (joey, 2022-09-09T19:16:00Z, 2022-09-09T18:54:08Z)
http://git-annex.branchable.com/forum/Unlocked_mode_without_data_also_under_.git__47__annex__63__/comment_1_d35829cd466139e0e77a79c251a79ad8/
<p>Suppose git-annex did behave that way. Now suppose that you ran:</p>
<pre><code>git config annex.largefiles 'include=*'
git add largefile
git commit -m 'added a large file to git-annex (unlocked)'
git stash
</code></pre>
<p>Then git would have deleted the only copy of largefile, which
was the one stored in the working tree. You would have lost data.
The hard links, annoying as they are, avoid this problem.</p>
<blockquote><p>But as we know, when we modify a file, it invalidates its checksum</p></blockquote>
<p>Right, and if you're going to be running things that open and modify
files, then it is not safe to set annex.thin.
"echo foo > largefile" will modify the file and lose the original version.</p>
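<p>The in-place overwrite can be seen with plain hard links, without
git-annex involved at all. This is only a sketch: <code>object</code> is a
made-up stand-in for the annex object file, and <code>largefile</code> for the
unlocked working tree file that annex.thin hard links to it.</p>

```shell
#!/bin/sh
# Plain hard links only; no git-annex is run here.
# "object" and "largefile" are illustrative names, not real annex paths.
set -eu
dir=$(mktemp -d)
cd "$dir"

printf 'original\n' > object    # stands in for the annex object file
ln object largefile             # second name for the same inode

echo foo > largefile            # truncates and rewrites the shared inode

cat object                      # the "object" now holds the new content too
```

<p>Because both names point at the same inode, there is no second copy left
to recover the original content from.</p>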
<p>The difference is that, with annex.thin set, you only lose data by
running something that would usually modify the file.
Running a git command that is normally entirely safe will not lose data.</p>
<p>So the user of annex.thin only needs to keep in mind that some things that
would usually modify a file will lose the previous version of it,
unless they've copied it to another remote. They don't have to live in fear
of running a command that is usually safe and reversible and having that
cause data loss.</p>
comment 2 (jgoerzen, 2022-09-10T00:32:57Z)
http://git-annex.branchable.com/forum/Unlocked_mode_without_data_also_under_.git__47__annex__63__/comment_2_07861ab33ec31659ec3cfa8520580fd4/
In my case, I really don't care about losing the old version of a file. I have ZFS snapshots and backups to take care of that. I do use thin, and that particular issue doesn't really bother me. I wonder if a "superthin" mode, where there is no hardlinking into .git/annex at all, would be possible? I'm aware that, yes, that could make previous versions unavailable and so forth.
comment 3 (joey, 2022-09-13T19:13:09Z, 2022-09-13T19:10:48Z)
http://git-annex.branchable.com/forum/Unlocked_mode_without_data_also_under_.git__47__annex__63__/comment_3_ce7ff5d0ea390649af64a296fecf9113/
<p>If you're using ZFS, you should not need to set annex.thin at all;
git-annex will use reflinks between annex object files and unlocked working
tree files, so the annex object file will not use any additional disk space.</p>
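<p>On a filesystem that does support reflinks, the difference from a hard
link can be sketched like this. It assumes GNU coreutils <code>cp</code>, and
the file names are made up; it only illustrates reflink semantics and is not
a trace of what git-annex itself runs.</p>

```shell
#!/bin/sh
# Assumes GNU coreutils cp. With --reflink=auto, cp clones the file when the
# filesystem supports it and silently falls back to a normal copy otherwise.
# "object" and "worktree_file" are illustrative names only.
set -eu
dir=$(mktemp -d)
cd "$dir"

printf 'original\n' > object
cp --reflink=auto object worktree_file   # clone if possible, else plain copy

echo foo > worktree_file                 # modify only the working tree copy

cat object                               # the object keeps its old content
```

<p>Unlike the hard link case, writing to one file leaves the other intact.
When the fallback to a normal copy happens, the second file does take up
additional disk space.</p>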
comment 4 (Lukey, 2022-09-14T15:09:20Z)
http://git-annex.branchable.com/forum/Unlocked_mode_without_data_also_under_.git__47__annex__63__/comment_4_b7eead64e39b5ebb1a22131b4e35251e/
ZFS doesn't support reflinks. XFS does.
comment 5 (joey, 2022-09-15T18:26:31Z, 2022-09-15T16:27:54Z)
http://git-annex.branchable.com/forum/Unlocked_mode_without_data_also_under_.git__47__annex__63__/comment_5_97fd65626991b1e8012e15595941d25d/
<p>Ah, oops.. I was thinking about BTRFS..</p>
<p>However, getting back to jgoerzen's original motivation for requesting
this, it seems to come down to the making of a hard link being seen as
"mucking with the source data". That seems like a very weak reason to make
such a large change to git-annex, one that would only be safe in a
small and poorly defined set of circumstances.</p>
<p>And it would be a large change, because currently git-annex can broadly
assume that any time a .git/annex/objects/ file exists, the content
is present in the repository. Every place that makes that assumption
would need to instead check if any of the known work tree files that use
the object are populated with the content (or at least are not annex
pointer files).</p>
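<p>To make the contrast concrete, here is a rough shell sketch of the two
kinds of check. It is not git-annex code: the function names are invented,
the object lookup simply searches <code>.git/annex/objects/</code> for a file
named after the key, and it assumes a pointer file is one whose content
begins with <code>/annex/objects/</code>.</p>

```shell
#!/bin/sh
# Hypothetical sketch, not git-annex's implementation. All function names
# are invented for illustration.
set -eu

# Today's broad assumption: the content is present if an object file
# named after the key exists somewhere under .git/annex/objects/.
present_by_object() {    # usage: present_by_object REPO KEY
    [ -n "$(find "$1/.git/annex/objects" -type f -name "$2" 2>/dev/null)" ]
}

# Assumption: an annex pointer file starts with "/annex/objects/"
# (15 bytes).
is_pointer() {           # usage: is_pointer FILE
    [ "$(head -c 15 "$1")" = "/annex/objects/" ]
}

# What the proposed mode would need instead: at least one of the known
# work tree files for the key exists and is not a pointer file.
# The caller must supply that list of files.
present_by_worktree() {  # usage: present_by_worktree FILE...
    for f in "$@"; do
        if [ -f "$f" ] && ! is_pointer "$f"; then
            return 0
        fi
    done
    return 1
}
```

<p>The first check is a single lookup by key. The second needs an up to date
list of every work tree file that uses the key, and in this sketch it does
not even verify that the file's content still matches the key.</p>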
<p>(jgoerzen also mentions timestamps, but git-annex preserves those
when ingesting files. Of course timestamp data is not recorded in the git
repository unless you use some other tool to do so.)</p>
comment 6 (jgoerzen, 2022-09-19T14:49:02Z)
http://git-annex.branchable.com/forum/Unlocked_mode_without_data_also_under_.git__47__annex__63__/comment_6_f2572320e3cc93c7dd6db51e78f31c10/
<p>I understand what you're saying here, and at this point this is probably not super relevant due to the large change it would represent, but just thought I'd further clarify the use case....</p>
<p>I, in some cases, use hard links fairly intentionally. "This file is both a photo and a record of something; let me show it in both places." I don't want things hardlinked together that previously weren't, and don't want existing hardlinks broken. Now this is for my daily use; for long-term archiving, it isn't all that important. So I don't want hard links being adjusted on the source, but don't care so much about the destination (at least so long as broken hardlinks don't result in excessive increases in storage space requirements; but my main area where that would occur has 10s of millions of files, so won't be using git-annex for other reasons.)</p>
<p>Also in some cases, I have a read-only directory (whether from NFS mount or something else). It is easy enough to mount a .git onto it, via a bind-mount or anything else, but trying to modify the actual content of course wouldn't work.</p>
<p>Anyhow, thanks for the conversation!</p>