Core Capabilities
- Base / tail page architecture
- Bufferpool-backed page caching
- Background merge process
- Version-aware reads and aggregation
- Column indexing and range lookup
- Persistent open / close / reload flow
AvenueDB is a Python implementation of an L-Store-style storage engine built around base pages, tail pages, versioned records, indexing, and buffered disk access. The system supports transactional-style data operations while balancing read efficiency with append-friendly updates.
Rather than performing in-place overwrites, updates are appended to tail records and later merged back into base storage, allowing the engine to preserve historical versions while keeping current reads efficient.
This project is publicly available. You can inspect the storage engine, query paths, merge logic, and bufferpool implementation directly.
This project strengthened my understanding of database internals, particularly how storage layout, caching, and indexing impact performance. I gained experience designing systems that evolve over time, often requiring refactoring when new constraints such as persistence or concurrency were introduced.
Working in a team environment also emphasized the importance of clear communication and coordination, especially when multiple components (query logic, storage, indexing, and merging) were being developed simultaneously.
A key takeaway was recognizing how small oversights in metadata or system design can propagate into larger issues, requiring careful debugging and sometimes reworking earlier decisions.
The database stores original rows in base pages and appends updates to tail pages instead of rewriting records in place. This layout supports efficient update handling while preserving record history for versioned access.
Updated base record identifiers are collected and a merge is triggered once the unmerged count reaches a threshold. The merge path runs in the background and consolidates the latest tail updates back into base pages.
The system includes a buffer pool with in-memory page caching, dirty-page tracking, pin/unpin support, and LRU-based eviction. This reduces repeated disk reads and improves access performance for hot pages.
AvenueDB builds indexes across columns and uses them for point lookups and range-based access. This lets query paths avoid full scans for common retrieval operations.
The query layer goes beyond basic CRUD. In addition to insert, delete,
select, and update, the implementation supports historical reads through
select_version, range aggregation through sum
and sum_version, and single-column mutation through
increment.
Supported operations:
- insert(...)
- delete(primary_key)
- select(search_key, column, projection)
- select_version(search_key, column, projection, relative_version)
- update(primary_key, *columns)
- sum(start_range, end_range, column)
- sum_version(start_range, end_range, column, relative_version)
- increment(key, column)
The database supports opening from a storage path, loading saved table metadata, flushing dirty pages on close, and restoring tables from disk on subsequent runs.
The implementation also includes locking/coordination primitives around update, delete, and merge behavior, plus threaded execution in the merge path to process consolidated records concurrently.
Since this is a backend storage engine rather than a UI product, the main proof of correctness came through repository-visible testers and operation validation. That is appropriate for this kind of system: correctness, data layout behavior, version handling, and merge behavior matter more than a visual demo.
This project pushed me to think about database tradeoffs directly: append-friendly writes versus read efficiency, how to cache pages safely, when to merge historical updates back into base storage, and how to expose both current and historical query behavior through one engine.
The system was developed across three milestones, each introducing new requirements that forced architectural changes and refactoring of earlier components.
These evolving requirements required revisiting earlier design decisions, particularly around storage layout, merge triggering, and coordination between threads. Supporting background merging without breaking correctness was one of the most challenging aspects of the project.
This was a collaborative project and the public repository reflects the implementation directly.