Recent comments posted to this site:

It could be that the problem occurs when 0-byte files are annexed in unlocked form. At least, all files in the strace output are 0 bytes long, and the problem goes away when these files are added to git rather than to git-annex.
Comment by lell Thu Feb 20 11:00:05 2025

I am confused by what you mean by "keep the overview over a git annex repository" and "are complete locally" (do you mean "are completely local"?).

It appears you are requesting an alternative representation of the working tree, with folders collapsed when the locations of all contained annexed files are the same. However, what that representation means is confusing: "this folder (not file) has copies in these locations". Folders are not synced across remotes: file content is. annex list is meant to show exactly what file content exists where, and whether that content is trusted (X) or untrusted (x). What if there are non-annexed files in that folder? The collapsed view almost seems to indicate that maybe those files exist in those locations, too.
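For reference, the per-file view that annex list gives looks roughly like this (the repository names and file paths below are invented for illustration):

    here
    |origin
    ||web
    |||
    XX_ docs/a.pdf
    X__ docs/b.pdf

Each column is a repository and each row a single annexed file, so there is no ambiguity about which content exists where.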

This also does not appear to have much to do with git annex info.

If you are overwhelmed by the information density, time with git-annex will help you understand why what it reports is important. Also, if terminal history clutter adds to the overload, you can use command | less to view longer-form output in a terminal pager.

Comment by Spencer Wed Feb 19 23:22:46 2025

In my testing, I have found git annex forget --drop-dead --force problematic, because if the two repositories ever speak to one another again (through e.g. fetch), the very alive remote on one side that was marked dead in the other will be eradicated.

Luckily I've learned that you don't have to fetch from one remote to another to still issue "informed" annex commands, which is critical. In other words, I didn't appreciate how annex learns of file content in remotes dynamically; I thought it was fairly dependent on merging in the git-annex branch to learn about files. Instead, you can confidently treat the fetch, pull, and push commands as being exclusively for merging two sibling repos (their histories, settings, remotes, etc.).

For these kinds of ("friend"?) remotes (unrelated remotes), I think you'll want to remove the fetch refspec entirely and set annex-sync = false in the remote's config if you want to keep the relationship around; otherwise, never run sync until you remove unrelated remotes.
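Concretely, the remote's section in .git/config would end up looking something like this (the remote name "friend" and url are placeholders):

    [remote "friend"]
        url = https://example.com/friend.git
        # fetch refspec removed entirely
        annex-sync = false

With no fetch refspec, git fetch brings in nothing from that remote, and annex-sync = false keeps git annex sync from touching it.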

Comment by Spencer Wed Feb 19 23:08:41 2025

One thing that I am unsure about is what should happen if git-annex get foo needs the content of file bar, which is not present. Should it get bar from a remote? Or should it fail to get foo?

Consider that, in the case of git-annex get foo --from computeremote, the user has asked it to get a file from that particular remote, not from whatever remote contains bar.

If the same compute remote can also compute bar, it seems quite reasonable for git-annex get foo --from computeremote to also compute bar. (This is similar to a single computation that generates two output files, in which case getting one of them will get both of them.)

And it seems reasonable for git-annex get foo with no specified remote to also get or compute bar, from wherever.

But, there is no way at the level of a special remote to tell the difference between those two commands.

Maybe the right answer is to define getting a file from a compute special remote as including getting its inputs from other remotes: preferring to get them from the same compute special remote when possible, and otherwise using the lowest cost remote that works, the same as git-annex get does.
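That fallback order could be sketched as follows. Everything here (the Remote type, its fields, and chooseInputSource) is invented for illustration; it is only a model of the selection rule described above, not git-annex's actual code.

```haskell
import Data.List (sortOn)
import Data.Maybe (listToMaybe)

-- A toy stand-in for git-annex's much richer Remote type.
data Remote = Remote
  { remoteName :: String
  , remoteCost :: Int    -- lower cost is preferred, as in git-annex
  , hasContent :: Bool   -- can this remote provide (or compute) the key?
  } deriving (Eq, Show)

-- Prefer the compute remote itself when it can produce the input;
-- otherwise fall back to the lowest-cost remote that has it,
-- mirroring what git-annex get does.
chooseInputSource :: Remote -> [Remote] -> Maybe Remote
chooseInputSource computeRemote others
  | hasContent computeRemote = Just computeRemote
  | otherwise = listToMaybe (sortOn remoteCost (filter hasContent others))
```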

Or this could be a configuration of the compute special remote. Maybe some would want to always get source files, and others would want to never get source files?


A related problem is that foo might be fairly small, but bar very large. So getting a small object can require getting or generating other large objects. Getting bar might fail because there is not enough space to meet annex.diskreserve. Or the user might just be surprised that so much disk space was eaten up. But dropping bar after computing foo also doesn't seem like a good idea; the user might want to hang onto their copy now that they have it, or perhaps move it to some faster remote.

Maybe preferred content is the solution? After computing foo with bar, keep the copy of bar if the local repository wants it, drop it otherwise.
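One way that might look in practice (the path inputs/ and the preferred content expression are invented for illustration):

    # say the local repo should not retain input files under inputs/
    git annex wanted here "include=* and exclude=inputs/*"
    # then, after a computation has fetched bar, this would drop it
    # unless preferred content wants it kept here:
    git annex drop --auto inputs/bar

That way the existing preferred content machinery decides bar's fate, rather than the compute remote hardcoding keep-or-drop behavior.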


Progress display is also going to be complicated for this. There is no way in the special remote interface to display the progress for bar while getting foo.

Probably the thing to do would be to add together the sizes of both files, and display a combined progress meter. It would be ok to not say when it's getting the input file. This will need a way to set the size for a progress display to larger than the size of the key.


All 3 problems above go away if git-annex doesn't automatically get input files before computations, and the computations instead just fail with an error saying the input file is not present.

But then consider the case where you just want every file in the repository. git-annex get . failing to compute some files because their input files happen to come after them in the directory listing is not good.

Comment by joey Wed Feb 19 18:39:41 2025

I've started a compute branch which so far has documentation for the compute special remote, git-annex addcomputed, and git-annex recompute.

I am pretty happy with how this design is shaping up.

Comment by joey Wed Feb 19 18:29:58 2025

LFS uses http basic auth, so using it over http probably allows any man in the middle to take over your storage.

With that rationale, https://hackage.haskell.org/package/git-lfs hardcodes an https url at LFS server discovery time. And I don't think it would be secure for it to do anything else by default; people do clone git over http, and it would be a security hole if LFS then exposed their password.

In your case, you're using a nonstandard http port, and it's continuing to use that same port for https. That seems unlikely to work in almost any situation. Perhaps an http url should only be upgraded to https when it's using a standard port. Or perhaps the nonstandard port should be replaced with the standard https port. I felt that the latter was less likely to result in security issues, and was more consistent, so I've gone with that approach. That change is in version 1.2.4 of https://hackage.haskell.org/package/git-lfs.

git-lfs has git configs lfs.url and remote.<name>.lfsurl that allow the user to specify the API endpoint to use. The special remote's url= parameter is the git repository url, not the API endpoint. So I think that to handle your use case, it makes sense to add an optional apiurl= parameter to the special remote, which corresponds to those git configs.

Unfortunately, adding apiurl= needed a new version, 1.2.5, of https://hackage.haskell.org/package/git-lfs, so it will only be available in builds of git-annex that use that version of the library. It will take a while to reach all builds.
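Putting the pieces together, for an LFS server on a nonstandard port (the host, port, and remote names below are placeholders):

    # git-lfs's own ways to point at a custom API endpoint:
    git config lfs.url https://example.com:8443/info/lfs
    # or per remote:
    git config remote.origin.lfsurl https://example.com:8443/info/lfs

    # the equivalent for the special remote, once builds pick up
    # git-lfs 1.2.5 (apiurl= is the new optional parameter):
    git annex initremote lfs type=git-lfs encryption=none \
        url=https://example.com/repo.git \
        apiurl=https://example.com:8443/info/lfs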

Comment by joey Tue Feb 18 16:23:23 2025

Found that

git annex lock .

fixes it.

I guess that when copied, the file was kept in the unlocked state, which (in layman's terms) may be nothing but the absence of a symlink. So by this logic, when I tried git annex lock . it fixed this problem.

Not sure if there is any option/config to control this, so that when a file is first copied it is put directly into the locked state.
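One setting that may be related (this is a guess, not a confirmed fix for the copy case) is annex.addunlocked, which controls whether newly added files start out locked or unlocked:

    # add files in locked form (this is the default)
    git config annex.addunlocked false
    # convert any files that ended up unlocked back to locked
    git annex lock .

Note that on filesystems that don't support symlinks, git-annex uses unlocked files regardless of this setting.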

Thanks

Comment by sharad Mon Feb 17 19:30:27 2025

Is there a workaround in the meantime? I'm attempting to create a new remote, with a brand new backblaze bucket. I'm sure I've done this before, but maybe at that time I used git-annex-remote-rclone.
Comment by datamanager Sat Feb 15 21:46:32 2025
Comment by anarcat Fri Feb 14 17:51:29 2025
Comment by anarcat Fri Feb 14 17:47:01 2025