Recent comments posted to this site:

I have mostly implemented an annex.dbdir to relocate the sqlite databases. It's in the dbdir branch until I get it fully working.

Comment by joey Thu Aug 11 21:15:29 2022

There is not, but if you can find a way to get wget or something to generate a list of urls and the files it downloaded them to, you can feed that into git-annex addurl --batch to teach git-annex what the urls are.
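As a rough sketch of that last step, assuming the mirroring tool's output has already been turned into (url, file) pairs, the batch input could be built like this. It assumes the one-line-per-item "url file" form that `--with-files` reads in batch mode. The helper name and the example.com URLs are made up for illustration.

```python
from typing import Iterable, Tuple


def addurl_batch_lines(pairs: Iterable[Tuple[str, str]]) -> str:
    """Format (url, file) pairs as one "url file" line each.

    Assumption: the consumer is `git-annex addurl --batch --with-files`,
    which is taken here to read a url followed by the file name per line.
    """
    lines = []
    for url, path in pairs:
        # A URL containing whitespace could not be told apart from the file name.
        if any(ch.isspace() for ch in url):
            raise ValueError(f"URL contains whitespace: {url!r}")
        # A newline in the file name would split one item across two lines.
        if "\n" in path:
            raise ValueError(f"file name contains a newline: {path!r}")
        lines.append(f"{url} {path}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical pairs; in practice they would come from the mirroring tool.
    print(addurl_batch_lines([
        ("https://example.com/index.html", "mirror/index.html"),
        ("https://example.com/logo.png", "mirror/logo.png"),
    ]), end="")
```

The printed lines could then be piped into `git-annex addurl --batch --with-files`, run inside the repository that holds the downloaded files.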

There is a subsystem in git-annex that could in theory be used for this, git-annex-import can import trees of files from a special remote.

But the complexity of mirroring a website makes me think I would not want to try to support it in the web special remote. I mean, just look at how many options wget has that you might use to control how the mirroring works.

Other special remotes can support importing from specific types of websites though. Currently this is limited to built-in special remotes, such as S3, but it would be possible to expand it to support external special remotes as well. See importtree only remotes for discussion about doing that.

Comment by joey Thu Aug 11 17:54:16 2022

Indeed it does. I assumed it would concatenate the files internally for efficiency while still being able to output them separately later, but that is not the case. I guess you could store the offsets externally and skip/seek, but that would be inefficient. Perhaps bup could add an option for batching multiple single-file splits in one invocation to avoid the overhead.

Alternatively, git-annex could index+save using excludes. That could be quite complicated (especially tracking which objects exist in which batches) but it could work.

Comment by Atemu Wed Aug 10 12:35:27 2022

BTW -- well done on:

For future reference, I was able to run the test suite by cloning datalad, and setting up the virtualenv like in the README, except then running "pip install ." in the datalad directory.

FTR: There is also CONTRIBUTING.md with more "development oriented" instructions for installation etc. Might come in handy.

Comment by yarikoptic Tue Aug 9 19:12:40 2022

Thank you Joey! FTR, fixed in 10.20220724-96-g21cfd0ea9

Comment by yarikoptic Tue Aug 9 18:02:32 2022

Thank you Joey! FTR, fixed in 10.20220724-96-g21cfd0ea9

Comment by yarikoptic Tue Aug 9 18:02:08 2022

Found the bug: it is in the change to Command.AddUrl. Since addSmall changed to a CommandPerform, void $ Command.Add.addSmall ... now only runs the first part of it, and not the action it returns, which is what actually adds the file to git! The void causes that returned action to get thrown away. Easily fixed.
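The shape of that bug can be shown with a loose Python analogy. This is not git-annex's Haskell code. The function names and log messages are invented for illustration, and a returned callable stands in for the action that the first stage hands back.

```python
def add_small(path, log):
    """Hypothetical two-stage operation: do a first step, return the second."""
    log.append(f"checked {path}")  # runs as soon as add_small is called

    def add_to_git():
        log.append(f"added {path} to git")  # runs only if the caller invokes it

    return add_to_git


def buggy_caller(path, log):
    # The returned second stage is thrown away, much like wrapping the call
    # in `void $ ...`, so the file is never added.
    add_small(path, log)


def fixed_caller(path, log):
    next_stage = add_small(path, log)
    next_stage()  # also run the action that was returned


if __name__ == "__main__":
    buggy_log, fixed_log = [], []
    buggy_caller("foo.txt", buggy_log)
    fixed_caller("foo.txt", fixed_log)
    print(buggy_log)
    print(fixed_log)
```

In the buggy caller only the first step leaves a trace, which matches the symptom of the file never being added to git.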

Comment by joey Tue Aug 9 17:35:46 2022

In particular, the test runs:

git-annex addurl -c annex.largefiles=exclude=*.txt --with-files --json --json-error-messages --batch

It never runs git-annex add.

Comment by joey Tue Aug 9 17:22:52 2022

For future reference, I was able to run the test suite by cloning datalad, and setting up the virtualenv like in the README, except then running "pip install ." in the datalad directory.

I reproduced the datalad test failure. Then I wrapped git-annex with a script that logs the parameters it was run with. It seems the test is running git-annex addurl, not git-annex add at all. So maybe that commit broke addurl?
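A wrapper of that kind might look roughly like the sketch below. It is not the script that was actually used. The two environment variable names and the default paths are assumptions made for illustration, and the wrapper only takes over when it is installed under the name git-annex, ahead of the real binary in PATH.

```python
import os
import shlex
import sys


def log_invocation(argv, log_path):
    """Append one shell-quoted line describing this invocation."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(shlex.join(argv) + "\n")


def main():
    # Both variable names and both default paths are hypothetical.
    real = os.environ.get("REAL_GIT_ANNEX", "/usr/bin/git-annex")
    log_path = os.environ.get("GIT_ANNEX_ARGV_LOG", "/tmp/git-annex-argv.log")
    log_invocation(["git-annex"] + sys.argv[1:], log_path)
    # Replace this process with the real binary so stdin, stdout and the
    # exit status pass through unchanged (needed for --batch mode).
    os.execv(real, [real] + sys.argv[1:])


# Only act as a wrapper when this file is installed under the name git-annex.
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "git-annex":
    main()
```

After a test run, each line of the log file is one command line as the test suite issued it, which is enough to see whether addurl or add was called.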

Comment by joey Tue Aug 9 17:19:56 2022

Unable to reproduce such a problem using git-annex so far. (I don't know how to set up an environment to run that case of the datalad test suite.)

Also I don't see any obvious logic error in the code of that commit.

Comment by joey Tue Aug 9 16:41:59 2022