Systems / Database Project

L-Store Database (AvenueDB)

Overview

AvenueDB is a Python implementation of an L-Store-style storage engine built around base pages, tail pages, versioned records, indexing, and buffered disk access. The system supports transactional-style data operations while balancing read efficiency with append-friendly updates.

Rather than performing in-place overwrites, updates are appended to tail records and later merged back into base storage, allowing the engine to preserve historical versions while keeping current reads efficient.

Project Access

Explore the Repository

This project is publicly available. You can inspect the storage engine, query paths, merge logic, and bufferpool implementation directly.

Core Capabilities

  • Base / tail page architecture
  • Bufferpool-backed page caching
  • Background merge process
  • Version-aware reads and aggregation
  • Column indexing and range lookup
  • Persistent open / close / reload flow

What I Learned

This project strengthened my understanding of database internals, particularly how storage layout, caching, and indexing impact performance. I gained experience designing systems that evolve over time, often requiring refactoring when new constraints such as persistence or concurrency were introduced.

Working in a team environment also emphasized the importance of clear communication and coordination, especially when multiple components (query logic, storage, indexing, and merging) were being developed simultaneously.

A key takeaway was recognizing how small oversights in metadata or system design can propagate into larger issues, requiring careful debugging and sometimes reworking earlier decisions.

Storage Architecture

The database stores original rows in base pages and appends updates to tail pages instead of rewriting records in place. This layout supports efficient update handling while preserving record history for versioned access.

Merge Process

Updated base record identifiers are collected and a merge is triggered once the unmerged count reaches a threshold. The merge path runs in the background and consolidates the latest tail updates back into base pages.

Buffer Pool & Caching

The system includes a buffer pool with in-memory page caching, dirty-page tracking, pin/unpin support, and LRU-based eviction. This reduces repeated disk reads and improves access performance for hot pages.

Indexing

AvenueDB builds indexes across columns and uses them for point lookups and range-based access. This lets query paths avoid full scans for common retrieval operations.

Query Surface

The query layer goes beyond basic CRUD. In addition to insert, delete, select, and update, the implementation supports historical reads through select_version, range aggregation through sum and sum_version, and single-column mutation through increment.


Supported operations:
- insert(...)
- delete(primary_key)
- select(search_key, column, projection)
- select_version(search_key, column, projection, relative_version)
- update(primary_key, *columns)
- sum(start_range, end_range, column)
- sum_version(start_range, end_range, column, relative_version)
- increment(key, column)

Persistence

The database supports opening from a storage path, loading saved table metadata, flushing dirty pages on close, and restoring tables from disk on subsequent runs.

Concurrency Direction

The implementation also includes locking/coordination primitives around update, delete, and merge behavior, plus threaded execution in the merge path to process consolidated records concurrently.

How It Was Validated

Since this is a backend storage engine rather than a UI product, the main proof of correctness came through repository-visible testers and operation validation. That is appropriate for this kind of system: correctness, data layout behavior, version handling, and merge behavior matter more than a visual demo.

What Makes It Interesting

This project pushed me to think about database tradeoffs directly: append-friendly writes versus read efficiency, how to cache pages safely, when to merge historical updates back into base storage, and how to expose both current and historical query behavior through one engine.

Design Evolution & Challenges

The system was developed across three milestones, each introducing new requirements that forced architectural changes and refactoring of earlier components.

  • Milestone 1 - Core Functionality: Implemented a fully functional database supporting all query operations, including insert, update, delete, select, and aggregation. At this stage, the system operated entirely in-memory without persistence.
  • Milestone 2 - Persistence & Merging: Added disk persistence and introduced the merge process to consolidate tail records into base pages. Merging was initially invoked manually, requiring careful coordination to maintain correct record state.
  • Milestone 3 - Background Merge & Concurrency: Converted merging into a background process, introducing concurrency challenges. This required rethinking synchronization, handling shared state safely, and ensuring consistency while reads and writes occurred alongside merge operations.

These evolving requirements required revisiting earlier design decisions, particularly around storage layout, merge triggering, and coordination between threads. Supporting background merging without breaking correctness was one of the most challenging aspects of the project.

Notes

This was a collaborative project and the public repository reflects the implementation directly.