Sunday, June 17, 2012

Shake Storage Layer

Summary: Shake maintains metadata as the build progresses. This metadata must remain up to date and consistent even if the process is killed. This post describes my new strategy for doing that.

Shake is an advanced build system, and in common with nearly all advanced build systems, it maintains extra metadata about rules - when the rule was last run, what the dependencies were, how long it took etc. If the metadata associated with a rule is not available, the rule must be rerun, which is often expensive. Any build system is likely to be interrupted on a regular basis - both due to failing rules (compile errors) and the user aborting a build. As a result, it is important that the metadata is robustly stored to disk as soon as it is produced.

In this post, I outline the old solution to maintaining metadata, along with the new solution available in shake-0.3, which I just released. The new solution has a number of benefits:

  • Reduces time loading/saving metadata by up to 75%. In practice this is unlikely to make a significant difference unless no rules need running.
  • Exceptions at any point will not cause file handles to be left open.
  • Previously there were very small windows where if the process died suddenly all metadata would be corrupted. These have been eliminated.
  • I removed all knowledge of the build system from the storage layer, making it properly decoupled.

Most of these improvements have been driven by people using Shake in new ways. When used as a replacement for Make, with one invocation per run, many of these issues are theoretical. Now people are running Shake in background threads and forcibly killing and restarting it on a regular basis, these issues can be observed in practice. However, the improvements will benefit everyone.

The Old Solution

The old solution has remained basically the same since the very first version of Shake, over three years ago. Shake maintains two files - the database contains the metadata, while the journal contains a list of metadata updates that can be appended to. The sequence of steps is:

  • Load the database
  • If the journal exists then:
    • Replay the journal into the database
    • Save the database
    • Delete the journal
  • Run the build, storing any updates to the journal
  • Save the database
  • Delete the journal

This solution works well, but has a couple of flaws. Whenever we save the database, if it gets corrupted half-way through, we lose the entire database, causing the build to start from scratch. Another problem is that if we are building nothing, we read in all the metadata, then write it all out again with only one single modification (incrementing the build time step). Since serialisation takes 3x longer than deserialisation (in benchmarks on the Shake metadata) about 75% of the time associated with the metadata is wasted. Even when we have made many updates, the data is already stored in the journal, so rewriting the database is not strictly necessary.

The New Solution

The new solution keeps a single database, containing a list of key/value pairs, which can be appended to. At certain points a backup file is made, simply a copy of an existing database. The sequence of steps is:

  • If the backup file exists, delete the database and use the backup file
  • Read all records from the database
  • Put the records into a Map
  • If the Map is significantly smaller than the number of records then
    • Rename the database to the backup
    • Resave the database
    • Delete the backup
  • Run the build, storing any updates to the database

In this method we never save the data after a successful run, but just close the file handles. The database accumulates key/value pairs, but only the last value associated with any key in the database is useful - earlier values are ignored. At some point the database will contain a significant number of keys that are no longer useful, and at that point we rewrite the database, taking care to make a backup before starting.

This post outlines the general steps, omitting details such as version stamps and consistency checks, which are highly important for a robust build system. These details are taken care of in the full implementation, available in the source as Development.Shake.Storage, taking about 100 lines.

No comments: