Help fixing S3 mistake

Hello, I'd like some help fixing a mistake I made setting up an S3 remote.

I created an S3 remote A, then at some later time accidentally created an S3 remote B with the same settings as A, and moved some files over to B.

I have fixed it by removing/marking as dead B, but I am missing some files now.

I believe that all of these files exist in A since it is the same bucket as B, but git annex doesn't know that they are in A. Is there a way to somehow "reinject" or refresh git annex so it knows that the files are there? I'm using chunking as well, so I don't know how to download all the chunks and use git annex's reinject command.

RSS Atom

comment 1

A and B have different UUIDs, and they point to the same S3 bucket (I realize I may have done something bad here).

I still have the UUID for B (but not attached to a remote); is it possible to merge git annex's knowledge of B into A, or otherwise re-initialize B?

Comment by darkfeline — Tue Oct 13 05:24:25 2015

Remove comment

comment 2

Depends.. If one or both special remotes used encryption then no, one can't see the encrypted files that were put in the other one.

If neither used encryption, and they're otherwise configured the same, then you can just use git annex fsck --from A. This will check files to see if their content is located on remote A, and if so, and git-annex had thought the file was only located on remote B, it will update the location tracking log to reflect the reality that the file is present on A.

If either remote used encryption, then A can't see files that were added to B. So instead, you need this approach , which involves data transfer:

git annex enableremote B
git annex copy --from B
git annex copy --to A
git annex drop --from B
git annex dead B # if it wasn't already dead
git remote remove B

Comment by joey — Thu Oct 15 18:30:27 2015

Remove comment

comment 3

Thanks for the help. I'm currently stuck partway through fixing this.

I changed the UUID of the remote "amazon" in .git/config to the UUID of B.
I changed contents of the annex-uuid file in the S3 bucket from the UUID of A to the UUID of B (making sure to not include a newline).

After this, I think I can follow your steps to fix the problem. However, Git annex is now reporting the error whenever I try to run a command:

trusted repositories: git-annex: S3 bucket not configured

My understanding of this message is that Git annex is not seeing the UUID it is expecting in the S3 bucket, and that it will be fixed when some cache expires. Is this correct?

Comment by darkfeline — Sun Oct 18 20:15:56 2015

Remove comment

comment 4

Actually, I suspect that I may have done more damage than I had initially thought.

Is there a way to check whether the repo still has information about a file's whereabouts, especially how it was chunked on an S3 remote? I'm not sure if that information still exists or not. If it doesn't exist anymore, then recovery is likely impossible.

If this information still exists in the repo, I can reconstitute the files by hand (script) if necessary and reinject them. I'm assuming that I can decrypt them using my private key?

Comment by darkfeline — Sun Oct 18 23:58:08 2015

Remove comment

comment 5

Since this information about remotes is stored in your git repository, I don't see how the repository could lose it. I mean, you might have to look in the history if you've changed some value, like the chunking setting from what it's supposed to be, in order to find the historical value, but like anything checked into a git repository, it's there until you throw the repository away or git-filter-branch the information out of existence.

Comment by joey — Mon Oct 26 17:31:54 2015

Remove comment

Add a comment