Could we add a prefix option to git-annex-export?
Something like git annex export master:some-videos --to myexport --prefix share-with-john would create a new subdirectory called share-with-john on the myexport exporttree remote and copy all files from the local some-videos directory into the new share-with-john directory.
I could then do another export using the same remote, like git annex export master:some-other-videos --to myexport --prefix share-with-bill, which wouldn't touch any of the videos I previously shared with John but would create a new export in a new share-with-bill directory.
My goal with the prefix option is to set up an exporttree remote once, and then reuse that same remote to create multiple independent publicly shared folders.
This would complicate the data structures git-annex needs for an export remote.
Notably, git-annex relies on location tracking information in the git-annex branch, which tracks whether or not a remote contains an object. With multiple subdirectories in the same remote, one could contain an object and another one not. There's no way to record that without some kind of unique identifier for each tree in the remote.
So I think this is better handled by initializing one remote per export directory, and using something like S3's fileprefix= option. Of the remotes supporting exporttree so far, only S3 has such a thing; it would make sense to add it to webdav, and perhaps rsync and adb. (Seems that for directory it's easier to just make a new directory.)
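As a rough sketch of the one-remote-per-export-directory approach with S3, the helper below only prints the initremote commands rather than running them. The helper name print_share_remote, the bucket name my-share-bucket and the share names are placeholders chosen for illustration, and credentials or other S3 settings are deliberately left out.

```shell
#!/bin/sh
# Dry-run sketch: print (do not run) one initremote command per shared folder.
# print_share_remote, the bucket name and the share names are hypothetical.
print_share_remote() {
    name=$1
    bucket=$2
    # exporttree remotes have to be unencrypted, hence encryption=none.
    # fileprefix keeps each remote inside its own subdirectory of the bucket.
    printf 'git annex initremote %s type=S3 exporttree=yes encryption=none bucket=%s fileprefix=%s/\n' \
        "$name" "$bucket" "$name"
}

print_share_remote share-with-john my-share-bucket
print_share_remote share-with-bill my-share-bucket
```

Each printed command describes a separate remote, so location tracking stays per remote and the two shares do not interfere with each other.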
Can I ask what type of special remote you were wanting this feature for?
TBH, an exported tree can contain the same content in two files, and the export database does contain information to deal with that; the location log doesn't really need to be modified to support this.
Another way to look at it is that you want to build a larger tree that contains two or more trees in subdirectories, and export the larger tree to the top of the remote. Looked at that way, it's already possible to do what you want using only git commands like git mktree.
OK. Excellent, yes, git mktree sounds promising, although I might still need an additional option to export to achieve my goals (maybe --append-only, see last paragraph). Here is more detail on my use case:
I am looking to add a
Share
feature to git-annex-turtle. Often, I have one file, or a few files, that I wish to share with people. Typically I will upload these to Google Drive, create a public share link for the entire folder they are in, and then share that link with people over email. With git-annex-turtle I would like a user to be able to right-click on a file, a multiple-file selection or a folder, choose Share, and then have these files shared somewhere publicly, with the public URL of the share reported back to the user (via their clipboard or a dialog).
I don't know what remote types would be desirable for
git-annex-turtle
users, but I was hoping to find a solution that works for any remote type that already supports public file downloads and the exporttree option. rsync, S3 and Google Drive would probably be a good start. S3 and Google Drive already support public downloads, and a savvy user could certainly make their rsync remote support public downloads as well.
The issue with “initializing one remote per export directory” is that I would like to minimize effort for my
git-annex-turtle
users. I think it is reasonable to ask them to set up one public remote one time, which they can then reuse any time they want to share something new. But I wouldn't want to ask them to create a new remote every time they click the Share button. It is certainly plausible that I could automate this process, but then I would need to investigate storing security credentials and the authentication mechanisms of the various supported remotes, and write code to auto-generate new remotes of various types on the fly.
Using
git mktree
seems very promising, since I could pass it an arbitrary set of one or more files or folders that are already present in git, create a new tree, then share that tree using the export command. One problem with this is that it becomes tricky when I want to share another tree without deleting previously exported trees (right?). I think that if I kept track of all trees previously shared, I could create a new tree containing all of the old trees and the new tree. But I don't want to do this, because git-annex-turtle is designed to store no critical file or content information that can't be automatically recreated from the git repositories themselves.
I am wondering if adding an option like
--append-only
to the export command would resolve this issue? This option would disable the entire merge process, never deleting content from a remote, only ever adding. I could then create new trees using git mktree anytime I want to do a new share, export that new tree with the --append-only option, and not have to worry about git annex trying to merge changes and delete previously exported trees? Or perhaps this isn't any easier than a --prefix option, since the export command needs to keep track locally of what it exported? Perhaps there could be a new command instead of export? Some kind of command that supports any remote that already supports the exporttree option? Perhaps something like git annex copy-tree treeish --to the-public-remote would copy a tree to a remote using something similar to the export mechanism, but would never attempt any merging and would never keep track of what was uploaded. Or perhaps the --append-only option to export could behave similarly, never keeping track locally of what was uploaded.
git-annex export
functionality can work well. I think I will have users specify a local directory in their annex called something like public-share, along with a single public exporttree remote to use with that local share. Whenever the user clicks Share on a single file (or a folder, or a multi-selection of files and folders), I'll just create a new subdirectory in public-share called something like public-share/CURRENT_DATETIME/ and place all of the new files to share in there. Then I'll do an export like: git-annex export master:public-share --to=public-tree-remote. This takes advantage of the existing export functionality and has the added benefit of giving the user a local record of all files that are currently publicly shared, which seems pretty useful.
I wonder if you'd be better off querying
git annex whereis --json
for public URLs and providing those to the users. Several special remotes provide public URLs. (S3 needs a non-default configuration to do it.)
Anyway, back to your question: git-annex arranges for the exported tree to always be available, including across clones of the repository. So you can get the exported tree, graft another file into it, and export the new tree.
So how to look up the previously exported tree? internals documents how to find it in the export.log, but it seems there ought to be a command-line interface to query for it. So I've made
git annex info remote
provide that information, as "exportedtree". Note that in an export conflict it may have multiple values. You'll want to use --fast with that, and probably --json.
Let me know if you need something more, or if this todo can be closed with that.
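The graft step described above could look roughly like the following sketch. It runs in a scratch repository with plain git so it can be tried without an annex. The graft helper, the directory names and the file contents are all made up for illustration, and the previously exported tree is simulated here rather than read from the "exportedtree" value. The helper assumes the new name does not already exist in the previous tree.

```shell
#!/bin/sh
set -eu
# Scratch repository so the sketch runs with plain git; all names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir some-videos some-other-videos
echo a > some-videos/a.txt
echo b > some-other-videos/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add files'

# graft PREV_TREE NAME SUBTREE: print the id of a new tree that contains
# everything in PREV_TREE plus SUBTREE as a subdirectory called NAME.
# (Hypothetical helper; assumes NAME is not already an entry of PREV_TREE.)
graft() {
    { git ls-tree "$1"; printf '040000 tree %s\t%s\n' "$3" "$2"; } | git mktree
}

# Stand-in for the previously exported tree. In a real repository this would be
# the "exportedtree" value reported by: git annex info --fast --json myexport
prev=$(printf '040000 tree %s\tshare-with-john\n' \
    "$(git rev-parse HEAD:some-videos)" | git mktree)

new=$(graft "$prev" share-with-bill "$(git rev-parse HEAD:some-other-videos)")
git ls-tree -r --name-only "$new"

# Then, in a real annex: git annex export "$new" --to myexport
```

Because the new tree still contains the earlier share unchanged, exporting it should only add the new subdirectory on the remote.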
Perfect, thanks. Yes, fine to mark as done.
Thank you for adding additional information about exporttree to
git annex info remote
that will prove useful if I do go down that route at a future date.
Yes, I do plan to look up public URLs with
git annex whereis --json
.