Managing data efficiently is a critical business requirement as well as a technological imperative. Developers today face a variety of data-management problems, and there is a wide range of classical and modern approaches to solving them. In this article, I show how combining a modern storage approach with a couple of updated classics of memory management yields a potent combination for high-performance data management. The three facets that I address are:
- On-disk persistent storage via a log-based filesystem.
- In-memory data storage via a concurrent B+tree.
- The cache management magic that bridges them.
The context in which I explore these facets is Sleepycat Software's Berkeley DB Java Edition (JE), an open-source, pure-Java, object-based database engine (http://www.sleepycat.com/products/je.shtml).
The first performance tradeoff made by the architects at Sleepycat was to not support SQL/JDBC and instead use a schema-neutral, fully programmatic Java library interface that stores data in the application's native format. This makes JE a good fit for many high-performance, embedded database situations where the complexity and overhead of SQL are unnecessary. Examples include user profile data management in web applications and managing network device configurations. In exchange for the increased performance, the application forgoes the ability to do ad hoc querying and must limit itself to exact, range, and set intersection queries. Listing One is an example of how to do simple data insertions and retrievals.
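Separately from Listing One, the three query styles named above can be pictured with a small sketch. It does not use the JE API. It assumes a plain `java.util.TreeMap` as a stand-in for a key-ordered store, and the class name `QueryStylesSketch`, its methods, and the sample data are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative stand-in only: a sorted in-memory map, NOT the JE API.
final class QueryStylesSketch {

    // Hypothetical sample data: user id -> home city, kept in key order.
    static NavigableMap<String, String> sampleProfiles() {
        NavigableMap<String, String> db = new TreeMap<>();
        db.put("alice", "Boston");
        db.put("bob", "Denver");
        db.put("carol", "Boston");
        db.put("dave", "Austin");
        return db;
    }

    // Exact query: fetch one record by its full key (null if absent).
    static String exact(NavigableMap<String, String> db, String key) {
        return db.get(key);
    }

    // Range query: all keys in [from, to), returned in key order.
    static List<String> range(NavigableMap<String, String> db,
                              String from, String to) {
        return new ArrayList<>(db.subMap(from, true, to, false).keySet());
    }

    // Set intersection query: keys that appear in both result sets,
    // for example the results of two secondary-index lookups.
    static Set<String> intersect(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> db = sampleProfiles();
        System.out.println(exact(db, "bob"));     // Denver
        System.out.println(range(db, "b", "d"));  // [bob, carol]

        // Hypothetical results of two attribute lookups.
        Set<String> inBoston = new TreeSet<>(Arrays.asList("alice", "carol"));
        Set<String> premium = new TreeSet<>(Arrays.asList("bob", "carol"));
        System.out.println(intersect(inBoston, premium)); // [carol]
    }
}
```

Each of these operations is answered by walking an ordered key space or combining key sets, so none of them needs a query planner. An ad hoc query, such as an arbitrary predicate over record contents, has no equivalent here short of scanning every record.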