My situation goes something like this:
I have a machine with an annex and a number of special remotes (s3, box, and an rsync'd nas). I also have a git remote that doesn't have git annex on it, so it's just got the git branches. That machine has some problems so I start getting the annex set up on a different machine. These are the steps that I went through:
- clone from the git remote
- git annex doesn't think this is an actual annex at this point, so I run
git annex init
- Now I can start the webapp
git annex webapp
- My rsync'd remote works and the assistant starts downloading the files from there (which is great, it's the local network one) but the box and s3 remotes are disabled (sure, not a huge deal).
- Click enable on the box remote, I have to specify that it's a full backup remote
- Things start copying from box as well, so I disable it until everything from the local network is done
- Click enable on the s3 remote and it wants my AWS creds
- Download those and add them
- Set the remote up the same way, as a full backup.
At this point, all my files have copied from the rsync remote, so I enable the other remotes.
Now I want to make sure that all the remotes are set up and working properly.
I turn off the webapp, turn off direct mode (I think it was indirect mode by default, but I'd been playing with things before then), and run
git annex drop <file>
on a file that I don't particularly care about. Everything drops successfully.
I'm able to run
git annex get -f <rsync remote> <file>
and the same with <box remote> successfully, but when I try to get from the s3 remote, it doesn't give me any output and doesn't download the file.
Having used regular git annex without the assistant before, I try re-initing the remote:
git annex initremote <s3>
It complains that there's no type, so I run
git annex initremote <s3> type=S3
Then it complains about encryption, so I run
git annex initremote <s3> type=S3 encryption=shared
It says it worked, but
git annex get -f <s3> <file>
still doesn't do anything.
After more looking around, it turns out that I may have created a second remote with the same name as the original s3 remote... (figure that out). I use the webapp to rename the remotes so they're different, but neither of them will get -f successfully.
At this point, I turn to you and ask what the heck I did wrong. I tried editing the remote.log and uuid.log files to remove the new s3 remote (and I figured out which one was which) from the git-annex branch. I also marked the new s3 remote as dead, but I still can't get access to s3.
git annex fsck -f <s3>
doesn't actually seem to hit s3 (I seem to recall that it used to, it calculated checksums), it just checks some git local information.
I don't mind deleting my current checkout and starting from the clone step again if you think I've made too much of a mess. At least I know I can get my files off my rsync remote and box.
What you describe,
git annex get $file --from remote
silently not doing anything, is the expected behavior if the remote doesn't have the file. This allows you to, e.g., run
git annex get . --from remote
and get all files that the remote does have while skipping the rest.
Did you ever try looking at the output of the following?
git annex whereis $file
When I run
git annex whereis $file
it tells me it's available at box, s3, my rsync remote, the original repo (from the machine I created the annex on), and my temporary usb stick repo (nothing wrong with redundancy...). So it seems that git annex thinks the file is available in the s3 remote, even though it's refusing to download it.
Oh! That might be it! During the whole "I have two remotes with the name s3" situation, it seems that both of them in my .git/config ended up with the same uuid, even though the original one had a different uuid. If I change it back, I end up getting an access denied when I try to
git annex get ...
Progress!
I thought that you were supposed to do a
git annex initremote s3
from a clone to enable a remote with the credentials stored in the repo. It seems that internally something still thinks that the "s3" remote has the new uuid. When I run that command it changes the uuid back to the new (invalid) one.
Is there a way I can totally remove the bad s3 remote (which I've partially renamed to s3thefirst) from my history/repo (I'm pretty sure it's been synced back up to origin at this point) or properly rename it so it doesn't keep getting confused? Hopefully that will address my problem.
Yes, when you run
git annex initremote $remotename
with no other parameters, it enables a remote from the stored configuration. That configuration does not include AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID; you need to set those and then you should not get access denied.
You seem to say your .git/config contains two remotes with the same name, but I don't think that's possible.
I don't know how you could end up with two remotes with the same name in
git show git-annex:remote.log
unless the two were added in separate repositories which were then synced together. Since this is not a usual situation there's not any UI to deal with it. I've just committed a change that will make initremote prefer remotes that have not been marked dead when there's a naming conflict.
However, I'm more curious how this situation came about. I have not been able to reproduce the problem when enabling a S3 remote using the webapp.
What you could do to help track down how this occurred is to check out the git-annex branch, and use git blame to find out when the second remote with the same name was first added to the remote.log file.
Then you should be able to tell, either from the email address used for that commit, or at least the date of the commit, whether this occurred recently when you enabled the S3 remote in the webapp, or perhaps at some time in the past.
http://pastebin.com/CM2EfQ21
This is what the commit log looks like for the remote.log file. There is some interesting stuff in here. I'll try to highlight the changes without giving too much of the important bits away.
The commit at 2013-04-22 11:57 is when I added the box remote:
0490d177-78e2-421b-a004-47d88ee7a2e3 chunksize=10mb cipher=... davcreds=... embedcreds=yes name=box.com type=webdav url=https://www.box.com/dav/annex timestamp=1366657062.972357s
1d0ab67c-6a43-11e2-9feb-df22c6d1e308 bucket=annex-1d0ab67c-6a43-11e2-9feb-df22c6d1e308 cipher=... datacenter=US host=s3.amazonaws.com name=annex port=80 storageclass=REDUCED_REDUNDANCY type=S3 timestamp=1359484726.520727s
The contents also include my nas remote, but I will omit that for brevity's sake. I did notice that initially the s3 remote was named "annex". That was probably the web interface's doing, way back when I added it.
The next commit at 2013-04-24 10:55 seems to have added encryption=shared and highRandomQuality=false to the nas remote (I think this was when I re-enabled the nas remote through the webapp).
The commit at 2013-04-24 11:05 looks like it added similar stuff to the box remote (added highRandomQuality=false). Probably this was from enabling it then as well.
At 2013-04-24 11:12 the s3 remote had highRandomQuality=false added also.
At 2013-04-24 11:26, a new remote was added:
4d86972d-9b0a-4095-bc50-f9bec8144c30 bucket=s3-4d86972d-9b0a-4095-bc50-f9bec8144c30 cipher=... datacenter=US host=s3.amazonaws.com name=s3 port=80 storageclass=STANDARD type=S3 timestamp=1366828017.8792s
Very possibly this was me doing a
git annex initremote ...
thinking that the s3 remote was actually named s3 (somehow, I feel like I would have checked that, but I'm going to chalk that up to my own stupidity).
Then at 2013-04-24 11:35, the new s3 remote was changed... but it seems like only the timestamp was altered. I suspect this was from another command line change, but I don't remember exactly what I did at that point. Probably a reference in a different file was also modified, but I'm not looking at those.
At 2013-04-24 11:37, again the new s3 remote was changed, but again it was just the timestamp.
In the merge at 2013-04-24 15:15, a bunch of things happened. This may be where stuff went wrong. I do find it weird because it should have just been a fast forward, given what the history looks like. I suspect that this was caused by a
git annex sync
but I'm not 100% sure.
In this commit the following happened:
In addition, within that commit, my uuid.log file also had duplication that seems to be where part of the confusion comes from:
Then at 2013-04-24 18:13, I think things try to fix themselves:
No duplicates exist in this file and no cross-references exist either.
The uuid.log file seems to be the place where the annex remote is renamed to s3. I have no idea what caused that, but it was probably me.
All of this seems horribly confusing and I don't envy your trying to unwind it.
Most of this is perfectly normal. The duplication of lines is normal; when two git-annex branches are union merged, it's as if it runs
cat branch1:file branch2:file | uniq > file
When there are conflicting lines for the same uuid, the one with the newest timestamp is used.
The description of the remote in uuid.log is also not relevant to this bug.
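The merge rule just described can be sketched in a few lines of Python. This is only an illustration and not git-annex's actual code: the function names parse_line and union_merge are made up, the example lines are shortened from the box lines pasted in this thread, and it assumes every log line has the form "uuid key=value ... timestamp=<seconds>s" with no spaces inside values.

```python
def parse_line(line):
    """Split a remote.log-style line into (uuid, config dict, timestamp)."""
    uuid, *fields = line.split()
    config = dict(field.split("=", 1) for field in fields)
    # Timestamps look like "1366826729.945023s"; drop the trailing "s".
    timestamp = float(config.pop("timestamp").rstrip("s"))
    return uuid, config, timestamp


def union_merge(*branches):
    """For each uuid, keep the config from the line with the newest timestamp."""
    newest = {}
    for lines in branches:
        for line in lines:
            uuid, config, timestamp = parse_line(line)
            if uuid not in newest or timestamp > newest[uuid][1]:
                newest[uuid] = (config, timestamp)
    return newest


# Shortened versions of the two box lines seen after the 15:15 merge.
old = ("0490d177-78e2-421b-a004-47d88ee7a2e3 name=box.com type=webdav "
       "timestamp=1366657062.972357s")
new = ("0490d177-78e2-421b-a004-47d88ee7a2e3 highRandomQuality=false "
       "name=box.com type=webdav timestamp=1366826729.945023s")

# One branch only has the old line, the other has both.
merged = union_merge([old], [new, old])
```

Both lines stay in the file after the merge, but only the newer one (the one with highRandomQuality=false) is the effective configuration for that uuid, which is why the duplicates are harmless.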
This is the key part:
As you note, 2013-04-24 15:15 was a merge. So there must have been two branches before, which had different box remotes with different davcreds.
It would probably help if you can paste those lines as they looked after that merge (omitting most of the davcreds).
Also, I'd like to see the box line from the 11:05 commit.
Two box lines after 15:15 merge:
0490d177-78e2-421b-a004-47d88ee7a2e3 chunksize=10mb cipher=... davcreds=... embedcreds=yes highRandomQuality=false name=box.com type=webdav url=https://www.box.com/dav/annex timestamp=1366826729.945023s
0490d177-78e2-421b-a004-47d88ee7a2e3 chunksize=10mb cipher=... davcreds=... embedcreds=yes name=box.com type=webdav url=https://www.box.com/dav/annex timestamp=1366657062.972357s
After the 11:05 commit, the box line looked like this:
0490d177-78e2-421b-a004-47d88ee7a2e3 chunksize=10mb cipher=... davcreds=... embedcreds=yes highRandomQuality=false name=box.com type=webdav url=https://www.box.com/dav/annex timestamp=1366826729.945023s
I am curious why you want to know about box, when s3 is the one that I'm having trouble with...
Oh.. I got confused by you talking about the box remote. Lines you pasted look ok anyway.
Ok, looking at the S3 remote then...
So, you can never change the names used to refer to remotes in remote.log. These names can be different from the names used to refer to the same remotes in .git/config. (Which can vary from repository to repository anyway..) So, if you originally added a s3 remote and called it "annex", you still need to use that name when running initremote elsewhere to add that remote to your repository.
The remote with name "s3" added in the 11:26 commit is a separate s3 remote, and I think one you don't want. (And have marked dead?)
I think all you need to do is "git annex initremote annex" to add the s3 remote you want to your new repository.
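To make the distinction concrete, here is a rough Python sketch of looking a special remote up by the name stored in remote.log. It is not how git-annex is implemented; names_in_remote_log is a hypothetical helper, and the two lines are abbreviated from the S3 lines pasted earlier in this thread.

```python
# Abbreviated remote.log lines for the two S3 remotes in this thread.
REMOTE_LOG = [
    "1d0ab67c-6a43-11e2-9feb-df22c6d1e308 name=annex type=S3 "
    "timestamp=1359484726.520727s",
    "4d86972d-9b0a-4095-bc50-f9bec8144c30 name=s3 type=S3 "
    "timestamp=1366828017.8792s",
]


def names_in_remote_log(lines):
    """Map each name recorded in remote.log to the uuid it refers to."""
    names = {}
    for line in lines:
        uuid, *fields = line.split()
        config = dict(field.split("=", 1) for field in fields)
        names[config["name"]] = uuid
    return names


names = names_in_remote_log(REMOTE_LOG)
print(names["annex"])  # uuid of the original S3 remote
print(names["s3"])     # uuid of the accidental second remote
```

Whatever the remote happens to be called in .git/config on a given machine, the name used for the lookup is the one in remote.log, so "annex" is the one that leads to the original bucket.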
Ah, I see. It looks like that did solve my problem.
Yes, I did mark the old s3 remote as dead.
At least now I know how to fix it if it ever happens again. I wonder if I'll ever be able to recreate it...
Thanks!
It's easy to recreate. As I understand it, the entire process went something like this:
git annex initremote annex type=S3 encryption=blahblah # possibly this was done in the webapp?
git remote rename annex s3 # also possibly done in the webapp
clone to different computer, and on the new clone:
git annex initremote s3
git-annex: Specify the type of remote with type=
git annex initremote s3 type=S3 encryption=blahblah
The last line creates a new remote.
I'm inclined to think the main confusing thing here is that initremote is used to both create a new special remote, and to configure the repository to use an already existing special remote that was created elsewhere. If you had to use enableremote for the latter, things could be less confusing:
clone to different computer, and on the new clone:
git annex enableremote s3
git-annex: No existing special remote named s3. Choose from one of these existing special remotes: annex
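The difference between the two operations can be sketched as a small Python model. This is a toy under stated assumptions, not git-annex code: init_remote, enable_remote, NoSuchRemote and the in-memory "known" dictionary are all invented for illustration, and the error text simply mirrors the message proposed above.

```python
import uuid


class NoSuchRemote(Exception):
    """Raised when asked to enable a name that was never created."""


def init_remote(known, name, **config):
    """Always create a brand-new special remote with a fresh uuid."""
    new_uuid = str(uuid.uuid4())
    known[new_uuid] = {"name": name, **config}
    return new_uuid


def enable_remote(known, name):
    """Only look up an existing special remote; never create one."""
    for remote_uuid, config in known.items():
        if config["name"] == name:
            return remote_uuid
    existing = " ".join(sorted(c["name"] for c in known.values()))
    raise NoSuchRemote(
        f"No existing special remote named {name}. "
        f"Choose from one of these existing special remotes: {existing}"
    )


known = {}
original = init_remote(known, "annex", type="S3")

# Enabling by the recorded name finds the original remote.
assert enable_remote(known, "annex") == original

# Enabling by a name that only exists in .git/config fails loudly
# instead of silently creating a second remote.
try:
    enable_remote(known, "s3")
except NoSuchRemote as err:
    print(err)
```

With the operations separated like this, a mistyped or locally renamed remote produces an error that lists the valid names, rather than a second bucket.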
I tend to agree with you. At first I liked the idea that initremote could be used to re-initialize a remote, but then I got confused about what the name of that remote was. I suppose git annex status could have told me. I kept wanting to have something like "git annex remote" (which would list them) and then "git annex remote init" to initialize them. That way the remote actions would follow the same sort of interface as "git remote", where you could list, init, create, edit, rename, enable, disable, kill (dead?), etc. The main drawback I see with that is having too many levels to type.
I really like the idea of having the ability to "git annex remote show s3" and it will tell me what the type, uuid, options, etc are for that remote.
Have now split out an enableremote command.
Also, I wrote something wrong before. It is possible to change the name used by initremote (now enableremote).
With the current release of git-annex:
git annex initremote annex name=mys3
With the next release:
git annex enableremote annex name=mys3