Recent comments posted to this site:

Hi folks!

We are considering introducing git-annex with gcrypt in hybrid mode as secure storage for common data in our company, and I'd rather not delete and reinit the repo every time somebody new is granted access. A little testing with current git-annex showed that GCRYPT_FULL_REPACK with a forced git-push of all branches makes the git repo accessible to the newcomer (I get the files) but not the annexed data (gpg error "No secret key" in git annex get; git annex info secretRepo just lists my first key).

Has anybody successfully tested adding keyids in hybrid encryption later on? Which further steps were needed to make it work?

Thanks for any input! :)



Comment by joern.mankiewicz Tue Mar 21 22:08:30 2017

@joey - thanks, that's prompt feature request fulfilment :-)

Looking more closely at the duplicates, it turns out that not everything got duplicated, just the "older" episodes. It turns out the newer episodes do have guid values saved (as itemid in the metadata) and the older episodes do not. I think this is most likely because I was running a fairly old git-annex until about October 2016, on a fairly old OS install, but then upgraded to a more recent one (now about 6 months old) which does track them. My assumption (without checking every file) is the episodes downloaded before October 2016 are ones that got duplicated.

I've edited the main page and added a note that GUIDs are tracked in versions since 2015, since I didn't obviously find that listed anywhere before.


Comment by ewen Tue Mar 21 21:46:27 2017

@rok it's a consequence of using smudge/clean filters; git add passes the file through the filters.

Comment by joey Tue Mar 21 17:43:33 2017

You can't accomplish this with remote.<name>.annex-ssh-options, since it is not exposed to the shell, and the parser just breaks it up into words.

A smarter parser would be needed. Or you could configure it in ~/.ssh/config, or perhaps make a ssh config file elsewhere and use annex-ssh-options to pass -F to ssh to make it use this other config file.

Now that git-annex supports GIT_SSH_COMMAND, which is exposed to the shell, you should be able to accomplish it that way. I don't know if that would work in your use case, since the environment variable affects all ssh remotes, not just one.
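A minimal sketch of the two workarounds mentioned above, run against a throwaway repository. The remote name `myserver` and the config file path are hypothetical, and the temporary path is assumed to contain no spaces, since annex-ssh-options is simply split into words.

```shell
# Throwaway repository and a separate ssh config file (both hypothetical).
repo=$(mktemp -d)
git init -q "$repo"
cfg="$repo/annex-ssh-config"
: > "$cfg"    # put the Host/ProxyCommand/etc. settings for this remote in here

# Option 1: per-remote, point ssh at the alternate config file with -F.
git -C "$repo" config remote.myserver.annex-ssh-options "-F $cfg"
git -C "$repo" config --get remote.myserver.annex-ssh-options

# Option 2: GIT_SSH_COMMAND is passed to the shell, but it applies to
# every ssh remote used from this environment, not just one.
export GIT_SSH_COMMAND="ssh -F $cfg"
```

The first option keeps the setting scoped to one remote; the second is only suitable when all ssh remotes can share the same command line.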

Comment by joey Tue Mar 21 17:38:57 2017

@ewen importfeed already tracks guids, since 2015. Relevant commit is f95a8c867223b2e17d036d0d3377bf0fc9d3adff

You may well have an older version of git-annex that didn't do that. But there are probably also feeds that lack a useful guid, or that even make a change that changes the guid of an existing item.

With git annex metadata, you can see the itemid which is where the guid is stored.

PS, please post in todo when you have a request.

Comment by joey Tue Mar 21 17:28:27 2017

I've been waiting for this for 5 years. I can't wait to use this :-).

Comment by konubinix Tue Mar 21 15:35:27 2017

While tracking podcast media URLs usually works to avoid duplicate downloads, when it fails it usually fails spectacularly. In particular, if a podcast feed decides to update all the URLs (for old and new podcasts) to use a different URL scheme, then suddenly that looks like a huge volume of new URLs, and all of them get downloaded again, even if the content has actually already been retrieved from a different URL (ie, the older URL scheme). For instance the service has changed their URL scheme a couple of times in the last 1-2 years, rewriting all the historical URLs, so I have three copies of many of the episodes of podcasts on their service :-( (Many were downloaded; some were skipped once I caught the bulk download and stopped it, then reran with --fast or --relaxed to make placeholders instead. They also seem to have managed to cause even more confusion by rewriting many of the older mp3 files with new id3 tags, thus changing the file sizes/hashes, which definitely made cleaning up more complicated.)

Some (all?) podcast feeds also have a guid field, which specifies what should be a unique, unchanging per-episode identifier that other podcatchers use to track "seen this" content. In theory that guid value should be stable even across media URL changes. If it isn't, then a podcaster changing both the guid and the media URL will almost certainly induce re-downloads in most podcatchers, and will thus hopefully notice early on (eg, during testing) rather than in production.

Can git-annex be extended to track the guid values as well as the filenames, so git annex importfeed can avoid downloading episodes where it has already processed that guid, and instead just add the newly listed url as an alternate web URL for that specific episode (which has been my manual workaround)? Perhaps the episode guid could be stored as additional metadata, along with some sort of feed unique ID (link?), and then an index built/consulted when importfeed runs (although that "feed unique ID" would probably also have to be updatable by the user, to cope with "the feed URL has now changed from http:// to https://", which also seems to be happening a bunch at present).
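To make the proposal concrete, here is a rough sketch of the guid-based check in plain shell. It is not how importfeed is implemented; the index file, the `process_item` helper, the guids and the URLs are all made up for illustration.

```shell
# Hypothetical index of guids that have already been imported.
seen=$(mktemp)

# Decide what to do with one feed item, given its guid and media URL.
process_item() {
    guid=$1
    url=$2
    if grep -qxF -- "$guid" "$seen"; then
        # Already have this episode: only record the new URL as an alternate.
        echo "alternate $guid $url"
    else
        # First time this guid is seen: remember it and fetch the media.
        echo "$guid" >> "$seen"
        echo "download $guid $url"
    fi
}

# Same episode, before and after the feed rewrites its URL scheme.
first=$(process_item ep-001 http://example.com/old/ep1.mp3)
second=$(process_item ep-001 https://example.com/new/ep1.mp3)
echo "$first"
echo "$second"
```

With the guid as the key, the rewritten URL is treated as another location for a known episode instead of a new download.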


PS: Apologies for duplicate partial comment; I think my browser decided some key combination meant "do default form action", which is post -- and I wasn't finished writing. I couldn't see a way to edit the comment, hence deleting/readding.

Comment by ewen Tue Mar 21 08:59:59 2017

Thanks, joey.

Your last comment put me on the right track. The problem was not in the repository, but in an old stale global .gitconfig in my homedir. I had only checked $XDG_CONFIG_HOME/git/config, where my global git config currently resides, and totally forgot about this old config. Stupid me!

git config --show-origin --get annex.largefiles

was my savior here, as it clearly indicated that there was indeed an (unintended) config setting and where to find the file. So I can strongly recommend that anybody experiencing strange behavior try this one-liner. It might have saved me hours of time.
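For anyone who wants to see what that looks like, here is a small reproduction using a throwaway home directory. The `largerthan=100kb` value and the temporary paths are only examples, and --show-origin needs a reasonably recent git.

```shell
# Throwaway home directory with a forgotten ~/.gitconfig (example value).
demo_home=$(mktemp -d)
printf '[annex]\n\tlargefiles = largerthan=100kb\n' > "$demo_home/.gitconfig"

# Ask git where the effective value comes from. HOME and XDG_CONFIG_HOME
# are overridden for this one command only.
origin=$(HOME="$demo_home" XDG_CONFIG_HOME="$demo_home/.config" \
    git -C "$demo_home" config --show-origin --get annex.largefiles)
echo "$origin"
```

The output names the file the value was read from, followed by the value, which is what points at the stale config.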

Thanks for your help! :)



Comment by joern.mankiewicz Mon Mar 20 22:47:50 2017

Note that if annex.largefiles is set in git config (including global git config), it overrides the .gitattributes setting. So a reasonable guess would be that you set it in the git config.
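A quick way to check for that situation, sketched with plain git in a throwaway repository (the `largerthan=100kb` value and the file name `foo` are just examples):

```shell
# Throwaway repository with the setting in both places.
repo=$(mktemp -d)
git init -q "$repo"
echo '* annex.largefiles=nothing' > "$repo/.gitattributes"
git -C "$repo" config annex.largefiles 'largerthan=100kb'

# What .gitattributes says for a given path.
attr_value=$(git -C "$repo" check-attr annex.largefiles -- foo)
# What git config says; per the note above, this is the one that wins.
config_value=$(git -C "$repo" config --get annex.largefiles)

echo "gitattributes: $attr_value"
echo "git config:    $config_value"
```

If the second command prints anything, removing that setting (for example with git config --unset at the right scope) lets the .gitattributes value take effect again.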

Comment by joey Mon Mar 20 21:18:24 2017

@joern.mankiewicz, you need to file a bug report with enough information to reproduce your problem.

annex.largefiles in .gitattributes works fine:

joey@darkstar:~/tmp> git init ttt
Initialized empty Git repository in /home/joey/tmp/ttt/.git/
joey@darkstar:~/tmp> cd ttt
joey@darkstar:~/tmp/ttt> git annex init
init  ok
(recording state in git...)
joey@darkstar:~/tmp/ttt> echo '* annex.largefiles=nothing' > .gitattributes
joey@darkstar:~/tmp/ttt> touch foo
joey@darkstar:~/tmp/ttt> git annex add foo
add foo (non-large file; adding content to git repository) ok
(recording state in git...)
Comment by joey Mon Mar 20 21:16:30 2017