This is a summary todo covering several subprojects that extend git-annex to use proxies sitting in front of a cluster of repositories.

  1. passthrough proxy
  2. p2p protocol over http
  3. balanced preferred content
  4. track free space in repos via git-annex branch
  5. proving preferred content behavior

plan

Joey has received funding to work on this. Planned schedule of work:

  • June: git-annex proxies and clusters
  • July: p2p protocol over http
  • August, part 1: git-annex proxy support for exporttree
  • August, part 2: balanced preferred content
  • September: proving behavior of balanced preferred content with proxies
  • October: streaming through proxy to special remotes (especially S3)

This project is now complete! done --Joey

some todos that spun off from this project and didn't get implemented during it:

For balanced preferred content and maxsize tracking:

For p2p protocol over http:

For proxying:

completed items for October's work on streaming through proxy to special remotes

  • Stream downloads through proxy for all special remotes that indicate they download in order.
  • Added ORDERED message to external special remote protocol.
  • Added DATA-PRESENT, documented in "client side upload to a special remote".

completed items for September's work on proving behavior of preferred content

  • Static analysis to detect "not present", "not balanced", and similar unstable preferred content expressions and avoid problems with them.
  • Implemented git-annex sim command.
  • Simulated a variety of repository networks, and random preferred content expressions, checking that a stable state is always reached.
  • Fix bug that prevented anything being stored in an empty repository whose preferred content expression uses sizebalanced. (Identified via git-annex sim)
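
To illustrate why an expression such as "not present" is unstable, here is a minimal sketch. It is not how git-annex sim works; `settles` and the two example expressions are hypothetical names, and a repository is assumed to get a file when it wants it and drop it when it does not.

```python
def settles(wants, present=False, max_steps=10):
    """Apply a preferred content decision repeatedly.

    `wants(present)` is a hypothetical stand-in for evaluating a preferred
    content expression. Returns True if a fixed point is reached.
    """
    for _ in range(max_steps):
        desired = wants(present)
        if desired == present:
            return True      # nothing left to get or drop: stable
        present = desired    # simulate the get or the drop
    return False             # still flipping after max_steps: unstable


# "not present": wanted only while the repository does not have the file
not_present = lambda present: not present
# an expression that always matches
anything = lambda present: True

print(settles(not_present))  # the get/drop cycle never ends
print(settles(anything))     # one get, then stable
```

The first expression makes the repository get the file, then drop it, then get it again, which is the kind of loop the static analysis is meant to detect.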

completed items for August's work on balanced preferred content

  • Balanced preferred content basic implementation, including --rebalance option.
  • Implemented track free space in repos via git-annex branch
  • Implemented tracking of live changes to repository sizes.
  • git-annex maxsize
  • annex.fullybalancedthreshhold
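
The idea of balancing content across repositories while respecting a maximum size can be sketched as follows. This is an assumption-laden illustration, not git-annex's selection algorithm: it uses rendezvous hashing, and `rank`, `wanted_by`, and the `repos` mapping are hypothetical.

```python
import hashlib


def rank(key, repo_uuid):
    """Deterministic per-(repository, key) score; higher wins."""
    digest = hashlib.sha256(f"{repo_uuid}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def wanted_by(key, key_size, repos, copies=1):
    """Pick which repositories should want `key`.

    `repos` maps a repository UUID to (maxsize, used) in bytes.
    Repositories without room for the key are skipped.
    """
    candidates = [
        uuid for uuid, (maxsize, used) in repos.items()
        if used + key_size <= maxsize
    ]
    return sorted(candidates, key=lambda u: rank(key, u), reverse=True)[:copies]


repos = {"a": (100, 0), "b": (100, 95), "c": (100, 10)}
print(wanted_by("SHA256E-s10--abc", 10, repos, copies=2))
```

Because every repository computes the same scores from the same inputs, they can agree on where a key belongs without communicating, provided they share the same view of repository sizes.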

completed items for August's work on git-annex proxy support for exporttree

  • Special remotes configured with exporttree=yes annexobjects=yes can store objects in .git/annex/objects, as well as an exported tree.

  • Support proxying to special remotes configured with exporttree=yes annexobjects=yes.

  • post-retrieve: When proxying is enabled for an exporttree=yes special remote and the configured remote.name.annex-tracking-branch is received, the tree is exported to the special remote.

  • When getting from a P2P HTTP remote, prompt for credentials when required, instead of failing.

  • Prevent updateproxy and updatecluster from adding an exporttree=yes special remote that does not have annexobjects=yes, to avoid foot shooting.

  • Implement git-annex export treeish --to=foo --from=bar, which gets from bar as needed to send to foo. Make post-retrieve use --to=r --from=r to handle the multiple files case.

completed items for July's work on p2p protocol over http

  • HTTP P2P protocol design: p2p protocol over http.

  • addressed P2P locking connection drop safety

  • implemented server and client for HTTP P2P protocol

  • added git-annex p2phttp command to serve HTTP P2P protocol

  • Make git-annex p2phttp support https.

  • Allow using annex+http urls in remote.name.annexUrl

  • Make http server support proxying.

  • Make http server support serving a cluster.

completed items for June's work on passthrough proxy:

  • UUID discovery via git-annex branch. Add a log file listing UUIDs accessible via proxy UUIDs. It also will contain the names of the remotes that the proxy is a proxy for, from the perspective of the proxy. (done)

  • Add git-annex updateproxy command (done)

  • Remote instantiation for proxies. (done)

  • Implement git-annex-shell proxying to git remotes. (done)

  • Proxy should update location tracking information for proxied remotes, so it is available to other users who sync with it. (done)

  • Implement git-annex initcluster and git-annex updatecluster commands (done)

  • Implement cluster UUID insertion on location log load, and removal on location log store. (done)

  • Omit cluster UUIDs when constructing drop proofs, since lockcontent will always fail on a cluster. (done)

  • Don't count cluster UUID as a copy in numcopies checking etc. (done)

  • Tab complete proxied remotes and clusters in e.g. the --from option. (done)

  • Getting a key from a cluster should proxy from one of the nodes that has it. (done)

  • Implement upload with fanout to multiple cluster nodes and reporting back additional UUIDs over P2P protocol. (done)

  • Implement cluster drops, trying to remove from all nodes, and returning which UUIDs it was dropped from. (done)

  • git-annex testremote works against proxied remote and cluster. (done)

  • Prevent git-annex sync --content etc. from operating on cluster nodes by default, since syncing with a cluster implicitly syncs with its nodes. (done)

  • On upload to cluster, send to nodes where it is preferred content, and not to other nodes. (done)

  • Support annex.jobs for clusters. (done)

  • Add git-annex extendcluster command and extend git-annex updatecluster to support clusters with multiple gateways. (done)

  • Support proxying for a remote that is proxied by another gateway of a cluster. (done)

  • Support distributed clusters: Make a proxy for a cluster repeat protocol messages on to any remotes that have the same UUID as the cluster. Needs extension to P2P protocol to avoid cycles. (done)

  • Proxied cluster nodes should have slightly higher cost than the cluster gateway. (done)

  • Basic support for proxying special remotes. (But not exporttree=yes ones yet.) (done)

  • Tab complete remotes in all relevant commands (done)

  • Display cluster and proxy information in git-annex info (done)
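
Several of the items above concern how a cluster's UUID is treated in location tracking: it is added when the location log is loaded, removed when the log is stored, and not counted as a copy. A minimal sketch of that behavior, assuming the cluster UUID is added whenever one of its nodes has the key. All function names and the `clusters` mapping are hypothetical, not git-annex's internals.

```python
def load_locations(stored, clusters):
    """Return stored locations plus any cluster UUID with a node among them.

    `clusters` maps a cluster UUID to the set of its node UUIDs.
    """
    locations = set(stored)
    for cluster_uuid, nodes in clusters.items():
        if locations & nodes:
            locations.add(cluster_uuid)
    return locations


def store_locations(locations, clusters):
    """Strip cluster UUIDs so only real repositories are written to the log."""
    return set(locations) - set(clusters)


def count_copies(locations, clusters):
    """Count copies for numcopies purposes, ignoring cluster UUIDs."""
    return len(store_locations(locations, clusters))


clusters = {"cluster-1": {"node-a", "node-b"}}
loaded = load_locations({"node-a", "laptop"}, clusters)
print(sorted(loaded))
print(count_copies(loaded, clusters))
```

Keeping the cluster UUID out of the stored log means it is always derived from its nodes, so it cannot go stale when a node gains or loses a key.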