Recent comments posted to this site:

Re: support for bulk write/read/test remote - ps
P.S.: And to make it clearer why I talked about dar first and then about writing BDXLs: when I mentioned dar, that was the stage when I was experimenting with dar as an intermediary for writing to BDXLs. But then I started experimenting with plain files, because that could be better for a long-term archival solution.
Comment by psxvoid
Re: support for bulk write/read/test remote - joey

Hi Joey,

Sorry for the late response, and thanks for the feedback.

"that's fundamentally different than how git-annex works"

Hence the previous comment :)

"And I think you could put it in your special remote."

That's exactly what I was doing around a year ago. I was implementing a special remote to support writing data on BDXL disks.

"So that when git-annex sends a file to your remote, the file is actually stored in the remote, rather than in a temporary location."

Yep, roughly that's how I was implementing it - storing intermediate data in an sqlite database.

I'd put the project on hold because I started to ask myself the following questions:

  1. OK, I can store transactions in the special remote. That means storing what is where, on which disk. Isn't that what git-annex is supposed to do?
  2. If a BDXL disk gets corrupted or lost, how do I reflect that in the git-annex repo and the special remote? I could mark it as "lost" in the remote, then run fsck in git-annex against that remote.
  3. Because I have to track location data separately in the special remote, what happens if it (the sqlite database) gets corrupted?
  4. What if I buy 50GB BDXLs instead of the 100GB ones I'm using now? Does that mean the special remote also has to track free space on each disk?
  5. Burning a disk - what if the burn isn't successful? git-annex will think that it was, because it doesn't support bulk operations, and the numcopies rules will be violated.
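To make questions 1-4 concrete: this is the kind of state such a special remote would have to keep on its own, duplicating part of what git-annex already tracks. A minimal sketch of what that tracking sqlite database might look like (all table and column names here are hypothetical, invented for this illustration; this is not git-annex's or any real remote's schema):

```shell
# Hypothetical tracking schema for a BDXL special remote (illustration only).
db=$(mktemp)

sqlite3 "$db" <<'SQL'
-- One row per burned disc: capacity/free space answer question 4,
-- the status column is for marking discs "lost" (question 2).
CREATE TABLE discs (
    disc_id        INTEGER PRIMARY KEY,
    label          TEXT    NOT NULL,
    capacity_bytes INTEGER NOT NULL,
    free_bytes     INTEGER NOT NULL,
    status         TEXT    NOT NULL DEFAULT 'ok'  -- 'ok', 'lost', 'corrupted'
);
-- Which key lives on which disc: the "what is where" of question 1.
CREATE TABLE keys (
    key     TEXT    NOT NULL,
    disc_id INTEGER NOT NULL REFERENCES discs(disc_id),
    PRIMARY KEY (key, disc_id)
);
INSERT INTO discs (label, capacity_bytes, free_bytes)
    VALUES ('BDXL-001', 100000000000, 98000000000);
INSERT INTO keys VALUES ('SHA256E-s1234--deadbeef', 1);
SQL

# Question 2: marking a disc as lost is a single update here...
sqlite3 "$db" "UPDATE discs SET status = 'lost' WHERE label = 'BDXL-001';"
# ...but git-annex's location log knows nothing about it, which is
# exactly the overlapping-responsibility problem described above.
sqlite3 "$db" "SELECT label, status FROM discs;"
```

And question 3 is the observation that this whole database is itself a single point of failure.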

There were many more questions like this.

And at some point the design started to look more like a blown-up, feature-rich archival application/solution. The main point here is that it's definitely possible. I can limit the scope, but there are many, many issues, and nobody except me would be interested in it. Plus, many responsibilities would overlap with git-annex.

Comment by psxvoid
comment 3

It's not as simple as just plumbing that up though, because testremote has implicit dependencies in its test ordering. It has to do the storeKey test before it can do the present test, for example.

I already suspected that this might be the case, so running the tests independently isn't really feasible.

To address my second point I might be able to just parse the output of testremote into "sub-tests" on the Forgejo-aneksajo side. Tasty doesn't seem to have a nice streaming output format for that though, right? There is a TAP formatter, but that looks unmaintained...
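For what it's worth, a rough sketch of the kind of parsing I have in mind, run against a hand-written approximation of tasty's console output (the sample input and the test names are invented for this sketch, not captured from a real testremote run):

```shell
# Extract "name<TAB>status" pairs from tasty-style console output.
# The input below is a made-up approximation of the real format.
parse_subtests() {
    sed -n 's/^ *\([A-Za-z][A-Za-z0-9 ]*[A-Za-z0-9]\): *\(OK\|FAIL\).*$/\1\t\2/p'
}

results=$(parse_subtests <<'EOF'
Remote Tests
  testremote
    storeKey: OK (0.21s)
    present: OK (0.01s)
    removeKey: FAIL (0.02s)
EOF
)
printf '%s\n' "$results"
```

A proper streaming format would still be nicer than scraping this, since the console output is not a stable interface.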


There are actually only two write operations, storeKey and removeKey. Since removeKey is supposed to succeed when a key is not present, if storeKey fails, then removeKey will succeed. But removeKey should fail to remove a key that is stored on the remote. To test that, the --test-readonly=file option would need to be used to provide a file that is already stored on the remote.

Now that you are saying this, is a new option even necessary? --test-readonly already takes a filename that is expected to be present on the remote, so instead of adding a new option --test-readonly could ensure that this key can't be removed, and that a different key can't be stored (and that removeKey succeeds on this not-present key).

Comment by matrss
comment 2

I don't know about the "--write-only" name, but I see the value in having a way for testremote to check that a remote which is expected to only allow read access does not allow any writes, and otherwise behaves correctly.

There are actually only two write operations, storeKey and removeKey. Since removeKey is supposed to succeed when a key is not present, if storeKey fails, then removeKey will succeed. But removeKey should fail to remove a key that is stored on the remote. To test that, the --test-readonly=file option would need to be used to provide a file that is already stored on the remote.

I think it would make sense to require that option be present in order to use this new "--write-only" (or whatever name) option.
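Concretely, the combination being discussed would be invoked something like this (the remote name and file name are placeholders):

```
# already-stored-file must already be present on the remote, so that
# removeKey's refusal to remove its key can be checked:
git annex testremote myremote --test-readonly=already-stored-file
```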


Also, git-annex does know internally that some remotes are readonly. For example, a regular http git remote that does not use p2phttp. Or any remote that has remote.<name>.annex-readonly set. Currently testremote only skips all the write tests for those, rather than confirming that writes fail. It would make sense for testremote of a known readonly remote to behave as if this new option were provided.

(But, setting remote.<name>.annex-readonly rather than using the "--write-only" option would not work for you, because that config causes git-annex to refuse to try to write to the remote. Which doesn't tell you if your server is configured to correctly reject writes.)
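For reference, that config is set per-remote, e.g. (remote name "origin" assumed):

```
[remote "origin"]
	annex-readonly = true
```

or equivalently with `git config remote.origin.annex-readonly true`.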

Comment by joey
comment 1

It would be possible to make git-annex testremote support the command-line options of the underlying test framework (tasty). git-annex test already does that, so has --list-tests and --pattern.
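For comparison, those pass-through tasty options look like this (the pattern value here is just an example):

```
git annex test --list-tests
git annex test --pattern 'add'
```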

It's not as simple as just plumbing that up though, because testremote has implicit dependencies in its test ordering. It has to do the storeKey test before it can do the present test, for example. Those dependencies would need to be made explicit, rather than implicit.

Explicit dependencies, though, would also make it not really possible to run most of the tests separately. Running testremote 5 times to run the listed tests, with each run doing the necessary storeKey, would add a lot of overhead.

Not declaring dependencies and leaving it up to the user to run testremote repeatedly to run a sequence of tests in the necessary order would also run into problems with testremote using random test keys which change every time it's run, as well as it having an end cleanup stage where it removes any lingering test keys from the local repository and the remote.

This seems to be a bit of an impasse... :-/

Comment by joey
comment 1
Passing --fast to fsck will prevent it needing to download the files.
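For example (remote name is a placeholder):

```
git annex fsck --fast --from=myremote
```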
Comment by joey
My config works now

I have .gitattributes:

    * annex.largefiles=nothing filter=annex
    *.pdf annex.largefiles=anything filter=annex

and git config:

    [annex]
        gitaddtoannex = true

Using git add now adds it to annex. This can be confirmed with

    git annex info file.pdf

The output should show present = true at the end. If it wasn't added to annex, the output would show fatal: Not a valid object name file.pdf.

And it seems that, by default, the files are stored in the working tree in their unlocked state. So git add doesn't replace the file with a symlink, unlike git annex add.

Comment by incogshift
comment 1

I think that "annex.assistant.allowlocked" would be just as confusing; like you say, the user would then have to RTFM to realize that they need to use annex.addunlocked to configure it, and that it doesn't cause files to be locked by default.

To me, "treataddunlocked" is vague. Treat it as what? "allowaddunlocked" would be less vague since it does get the (full) name of the other config in there, so says it's allowing use of the other config.

I agree this is a confusing name, and I wouldn't mind changing it, but I don't think it warrants an entire release to do that. So there would be perhaps a month for people to start using the current name. If this had come up in the 2 weeks between implementation and release I would have changed it, but at this point it starts to need a backwards compatibility transition to change it, and I don't know if the minor improvement of "allowaddunlocked" is worth that.

Comment by joey
comment 2
Thank you for the fix, that built just fine and I've successfully bumped the Arch Linux package to 20250929.
Comment by caleb