Database¶
1. Purpose & Motivation (Why)¶
Problem Solved¶
The Database domain provides ACID-transactional persistence for Viper applications, solving three fundamental storage challenges in a unified system:
- Structured Document Storage - Persisting DSM-typed data (concepts, structures, attachments) with strong schema guarantees
- Binary Blob Storage - Content-addressable storage for large binary objects with automatic deduplication
- Transactional Consistency - Ensuring atomic, durable operations across both document and blob modifications
Without Database, applications would need to:

- Manually serialize/deserialize Viper Values to files (error-prone, no transactions)
- Implement custom blob deduplication logic (complex, inefficient)
- Manage schema evolution manually (brittle, migration headaches)
- Coordinate consistency between document and blob storage (fragile)
Database provides a single, transactional persistence layer for all application data, with automatic schema management and content-addressable blob optimization.
Use Cases¶
Developers use Database when they need to:
- Persist Application State - Store game state, simulation data, user profiles as DSM concepts with ACID guarantees
- Manage Large Assets - Store textures, models, audio files as blobs with automatic deduplication (same file uploaded twice → single storage)
- Version Application Data - Schema evolution via extendDefinitions() allows adding new concept types without migrations
- Build Collaborative Applications - Foundation for Commit System's event-sourcing architecture
- Implement Content-Addressable Storage - Blobs identified by SHA1 hash enable distributed content synchronization
- Integrate with NumPy/Scientific Computing - Zero-copy blob creation from NumPy arrays for ML model persistence
Position in Architecture¶
Functional Layer 1 Domain - Database builds on Foundation Layer 0 domains to provide high-level persistence:
┌──────────────────────────────────────────────────────────────────┐
│                        APPLICATION LAYER                         │
│       (Games, Simulations, Web Apps, Scientific Computing)       │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
┌────────────────────────────────▼─────────────────────────────────┐
│                  FUNCTIONAL LAYER 1 (Database)                   │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│   │   Database   │   │Commit System │   │   Services   │         │
│   │    (ACID)    │   │(Event Source)│   │    (RPC)     │         │
│   └──────┬───────┘   └──────┬───────┘   └──────────────┘         │
└──────────┼──────────────────┼────────────────────────────────────┘
           │                  │
┌──────────▼──────────────────▼────────────────────────────────────┐
│                        FOUNDATION LAYER 0                        │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│   │ Blob Storage │   │ Type & Value │   │ Stream/Codec │         │
│   │(Content-Addr)│   │  (Schemas)   │   │(Serialization│         │
│   └──────────────┘   └──────────────┘   └──────────────┘         │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
                          ┌─────────────┐
                          │   SQLite3   │  (Backend Implementation)
                          └─────────────┘
Key Architectural Choices:

- Unified Persistence - Documents and blobs in the same database, same transaction
  - Why: Simplifies consistency (a blob referenced by a document must exist, ensured by the transaction)
  - Alternative rejected: Separate document DB + blob store would require distributed transaction coordination
- Strategy Pattern - Databasing interface abstracts the backend (SQLite file-based, Remote RPC-based)
  - Why: Local persistence and remote access with identical API
  - Alternative rejected: Separate Database/RemoteDatabase classes would duplicate facade code
- Content-Addressable Blobs - SHA1-based BlobId enables deduplication
  - Why: Same asset uploaded 100 times → stored once, saves disk space
  - Alternative rejected: Sequential IDs would waste space, user-provided IDs are fragile
- Attachment Pattern - DSM concepts stored as key-value pairs via Attachments
  - Why: Schema-free storage (no SQL migrations), strongly typed via Definitions
  - Alternative rejected: SQL table-per-concept would require DDL changes for new types
Integration Points:
- Blob Storage: Database implements BlobGetting interface, adds creation/streaming/deletion
- Type & Value System: Documents stored via Definitions, Attachment, ValueKey, Value
- Commit System: CommitDatabase wraps Database for event-sourcing persistence
- Remote Services: DatabaseRemote exposes Database over RPC for client-server architectures
2. Domain Overview (What)¶
Scope¶
Database provides capabilities for:
- ACID Transactions - Atomic, durable operations with three isolation modes (Deferred, Immediate, Exclusive)
- Document Persistence - Store/retrieve DSM-typed data via Attachment pattern (set, get, keys, has, del)
- Blob Persistence - Content-addressable binary storage with automatic deduplication
- Blob Streaming - Memory-efficient upload/download for large blobs (>100MB) with incremental SHA1 computation
- Schema Management - Dynamic schema extension via extendDefinitions(), compatibility checking with isCompatible()
- Metadata Access - Database UUID, codec, documentation, definitions digest
- Local & Remote Access - File-based (SQLite) and network-based (RPC) backends with unified API
- NumPy Integration - Zero-copy blob creation from Python buffer protocol (NumPy, bytes, bytearray)
Key Concepts¶
1. Databasing (Strategy Interface)
Databasing is the abstract interface defining database operations. Concrete implementations:

- DatabaseSQLite - File-based persistence using SQLite3 (or in-memory for testing)
- DatabaseRemote - RPC-based client connecting to a remote database server
Database (facade) wraps a shared_ptr<Databasing> and delegates all operations. This allows:
- Local development with SQLite (Database::create(path))
- Production deployment with remote server (Database::connect(hostname))
- Same application code works with both backends (Strategy pattern)
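The practical payoff is backend-agnostic application code. A minimal sketch (the helper name and the schema objects user_profile, user_key, profile are illustrative, not part of the API):

# Backend-agnostic helper: works with any Databasing implementation,
# because Database delegates every call to the strategy it wraps.
def save_document(db, attachment, key, document):
    db.begin_transaction()
    try:
        db.set(attachment, key, document)
        db.commit()
    except Exception:
        db.rollback()
        raise

save_document(Database.create_in_memory(), user_profile, user_key, profile)           # testing
save_document(Database.create("myapp.db", "My App"), user_profile, user_key, profile) # local file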
2. Attachment Pattern (Document Storage)
DSM Attachments define mappings from concept instances (keys) to typed data (documents):
concept User;
struct Profile { string name; int64 age; };
attachment<User, Profile> user_profile; // Maps User instances → Profile structures
Database stores documents via Attachment-based operations:
db.set(user_profile, user_key, profile_data) # Store
profile = db.get(user_profile, user_key) # Retrieve
keys = db.keys(user_profile) # Query all keys
This pattern provides:
- Schema-free storage: No SQL ALTER TABLE needed for new attachment types
- Strong typing: Definitions validate document types at runtime
- Flexible queries: Get all keys for an attachment, check existence with has()
3. Content-Addressable Blobs (SHA1-based BlobId)
Blobs are identified by BlobId = SHA1(layout + binary data). Consequences:

- Automatic deduplication: Same content → same BlobId → single database row
- Immutability: Cannot modify a blob (its BlobId would change), only delete/recreate
- Verification: BlobId proves data integrity (hash mismatch = corruption)
Layout is part of identity: Same binary data with different layout → different BlobId:
blob = ValueBlob(bytes([1, 2, 3, 4]))
id1 = db.create_blob(BlobLayout('uchar', 4), blob) # 4 unsigned chars
id2 = db.create_blob(BlobLayout('int', 1), blob) # 1 int32
# id1 != id2 (different layouts, even though same bytes)
4. Transaction Modes (SQLite Isolation Levels)
Three transaction modes control locking behavior:
| Mode | Locking | Use Case | Concurrency |
|---|---|---|---|
| Deferred (default) | Acquires lock on first write | Read-heavy workloads | High (multiple readers + lazy writer lock) |
| Immediate | Acquires write lock immediately | Write-heavy workloads | Medium (single writer, multiple readers) |
| Exclusive | Exclusive lock, no other connections | Critical sections | Low (serialized access) |
Choose based on contention:
db.begin_transaction(DatabaseTransactionMode.Deferred) # Optimistic (most common)
db.begin_transaction(DatabaseTransactionMode.Immediate) # Pessimistic (prevents writer starvation)
db.begin_transaction(DatabaseTransactionMode.Exclusive) # Full isolation (rare)
5. Schema Evolution (Definitions Extension)
Database stores a Definitions snapshot on creation. Adding new concept types requires extendDefinitions():
# Database created with v1 schema (concept User)
db = Database.create(path, "App v1.0")
db.extend_definitions(definitions_v1)
# Later: extend with v2 schema (adds concept Product)
result = db.extend_definitions(definitions_v2)
# result.extended = [Product] (new types)
# result.confirmed = [User] (existing types)
Compatibility checking prevents incompatible schema changes:
if Database.is_compatible(path):
    db = Database.open(path)  # Safe
else:
    raise RuntimeError("Schema version mismatch, migration needed")
External Dependencies¶
Uses (Foundation Layer 0):

- Blob Storage - Database implements the BlobGetting interface, adds blob creation/streaming/deletion
  - Why needed: Content-addressable persistence, BlobId/BlobLayout/BlobInfo types
  - Coupling strength: Strong (8+ blob-related includes, core feature)
- Type and Value System - Document storage requires Definitions, Attachments, Values
  - Why needed: DSM schema management, strongly-typed document operations
  - Coupling strength: Strong (7+ includes: Definitions, Attachment, ValueKey, ValueOptional, ValueStructure, Value)
- Stream/Codec - Binary encoding of documents and blobs
  - Why needed: Serialization format selection (StreamBinary, StreamTokenBinary)
  - Coupling strength: Weak (codec metadata only, abstracted by Databasing)
Used By (Functional Layer 1):

- Commit System - CommitDatabase wraps Database for event-sourcing persistence
  - How used: Stores commits (states + commands + metadata) with the same ACID guarantees
  - Pattern: Commit System is a specialized Database user for collaborative editing
- Applications - Games, simulations, web backends use Database for all persistence
  - How used: Application data stored as documents, assets stored as blobs
3. Functional Decomposition (Structure)¶
3.1 Sub-Domains¶
The Database domain is organized into 7 functional sub-domains:
3.1.1 Core Database (Facade & Lifecycle)¶
Purpose: High-level API facade and database lifecycle management.
Components:

- Database - Main facade wrapping the Databasing strategy
- Databasing - Abstract interface for backend implementations
Key Operations:
- Lifecycle: createInMemory(), create(path), open(path), connect(hostname), close()
- Metadata: uuid(), codecName(), documentation(), path()
- Compatibility: isCompatible(path) - Check schema version before opening
Pattern: Facade + Strategy - Database delegates all operations to _databasing member
Example:
# Local file-based database
db = Database.create("myapp.db", "My Application v1.0")
# In-memory database (testing)
db = Database.create_in_memory()
# Remote database (production)
db = Database.connect("myapp", hostname="db.example.com", service="9000")
3.1.2 Transaction Management (ACID)¶
Purpose: Ensure atomic, durable operations with configurable isolation levels.
Components:

- DatabaseTransactionMode - Enum: Deferred, Immediate, Exclusive
Key Operations:
- beginTransaction(mode) - Start transaction with isolation mode
- commit() - Persist all changes atomically
- rollback() - Discard all changes since beginTransaction
- inTransaction() - Check if transaction active
Constraints:
- No nested transactions: Calling beginTransaction() while inTransaction() == true throws exception
- Explicit control: Must call commit() or rollback() before close() (or changes lost)
- Per-connection: SQLite transactions are per-connection (not global across processes)
Pattern: Unit of Work - Group operations into ACID transaction
Example:
db.begin_transaction(DatabaseTransactionMode.Deferred)
try:
    db.set(attachment, key1, document1)
    db.create_blob(layout, blob)
    db.commit()  # Both document + blob persisted atomically
except:
    db.rollback()  # Discard all changes on error
    raise
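Because nested transactions throw, a call site that may already be running inside a transaction can guard with inTransaction() first. A minimal sketch:

# Guard against the no-nested-transactions constraint: only begin (and
# commit) a transaction when none is active on this connection.
started_here = not db.in_transaction()
if started_here:
    db.begin_transaction(DatabaseTransactionMode.Deferred)
db.set(attachment, key, document)
if started_here:
    db.commit()  # only the outermost caller commits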
3.1.3 Blob Storage Integration (Content-Addressable)¶
Purpose: Persist binary blobs with automatic deduplication via SHA1-based BlobId.
Components:

- Database::createBlob() - Direct blob creation (loads the entire blob into memory)
- Database::blobStream*() - Streaming API for large blobs (memory-efficient)
- BlobStatistics - Aggregate metrics (count, total size, min/max)
Key Operations:
- Direct: createBlob(layout, blob) → BlobId
- Streaming: blobStreamCreate(layout, size) → stream, blobStreamAppend(stream, data), blobStreamClose(stream) → BlobId
- Query: blobIds(), blobInfo(id), blobInfos(ids), blob(id)
- Deletion: delBlob(id) - Returns true if deleted, false if not found
Deduplication:

- Same content + same layout → same BlobId → single database row
- Uploading an identical blob multiple times wastes compute (SHA1) but not storage
Pattern: Content-Addressable Storage + Streaming with Exception Safety
Example:
# Direct creation (small blobs)
blob_id = db.create_blob(BlobLayout('float', 3), blob_data)
# Streaming (large blobs >100MB)
stream = db.blob_stream_create(layout, total_size)
for chunk in large_file:
    db.blob_stream_append(stream, chunk)
blob_id = db.blob_stream_close(stream) # Finalize SHA1
Exception Safety: If blobStreamAppend() or blobStreamClose() throws, partial stream is automatically deleted (no orphaned data).
3.1.4 Document Storage (Attachment Pattern)¶
Purpose: Store/retrieve DSM-typed documents via Attachment-based key-value operations.
Components:

- Database::set() - Store document
- Database::get() - Retrieve document (returns ValueOptional)
- Database::keys() - Query all keys for an attachment
- Database::has() - Check existence
- Database::del() - Delete document
Attachment Semantics:
- Key: ValueKey identifies concept instance (e.g., User #123)
- Attachment: Defines mapping type (e.g., attachment<User, Profile>)
- Document: Value conforming to attachment's document type
Pattern: Repository Pattern - Abstract persistence behind domain-specific interface
Example:
# Schema: attachment<User, Profile> user_profile
user_key = user_profile.create_key()  # User instance ID
# Store
profile = user_profile.create_structure({"name": "Alice", "age": 30})
db.set(user_profile, user_key, profile)
# Retrieve
optional_profile = db.get(user_profile, user_key)
if optional_profile.has_value():
    profile = optional_profile.unwrap()
# Query all users
all_user_keys = db.keys(user_profile)
# Check existence
if db.has(user_profile, user_key):
    ...  # document exists
# Delete
db.del(user_profile, user_key)
Foreign Key Constraint: If document references BlobId, blob must exist in database (enforced by set()).
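A minimal sketch of the ordering this constraint implies (the attachment name and document shape are illustrative); creating the blob in the same transaction guarantees the reference is valid when set() runs:

db.begin_transaction()
blob_id = db.create_blob(texture_layout, texture_blob)  # blob exists first
db.set(texture_attachment, asset_key, blob_id)          # document may now reference it
db.commit()                                             # both persist atomically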
3.1.5 Schema Management (Definitions Extension)¶
Purpose: Enable schema evolution without database migrations.
Components:

- Database::extendDefinitions() - Add new concept types, return ExtendInfo
- Database::definitionsHexDigest() - SHA256 hash of schema for versioning
- Database::isCompatible() - Check if database schema matches the application
Extension Semantics:

- Additive only: Can add new concepts, structures, attachments
- No removals: Cannot remove existing types (would break stored documents)
- Idempotent: Extending with the same definitions multiple times is safe (confirmed but not re-added)
Pattern: Schema Registry - Database stores authoritative Definitions snapshot
Example:
# Initial schema
db = Database.create("app.db")
db.extend_definitions(definitions_v1) # User, Product concepts
# Later: add Order concept
result = db.extend_definitions(definitions_v2) # User, Product, Order
# result.extended = [Order] (newly added)
# result.confirmed = [User, Product] (already existed)
# Check compatibility before opening
if Database.is_compatible("app.db"):
db = Database.open("app.db")
else:
# Schema mismatch, migration tool needed
3.1.6 Remote Database (RPC Client/Server)¶
Purpose: Access databases over network via RPC protocol.
Components:

- DatabaseRemote - RPC client implementation
- DatabaseRemoteRPCClientHandler - Client-side RPC handler
- DatabaseRemoteRPCSideClient - Client-side protocol
- DatabaseRemoteRPCSideServer - Server-side protocol
Key Operations:
- Discovery: Database::databases(socketPath) - List available databases on server
- Connection: Database::connect(database, hostname, service) - Connect to remote database
Pattern: Proxy Pattern - DatabaseRemote delegates to server over network
Note: Remote Database is not exposed in Python binding (C++ only). Python applications use local SQLite databases or connect to databases exposed via Services domain.
Example (C++ only):
// List databases on server
auto dbs = Database::databases("localhost", "9000");
// Connect to remote database
auto db = Database::connect("myapp", "localhost", "9000");
db->beginTransaction(); // Remote transaction
db->set(attachment, key, document); // RPC call
db->commit(); // Remote commit
3.1.7 SQLite Backend (Implementation Details)¶
Purpose: File-based persistence implementation using SQLite3.
Components:

- DatabaseSQLite - Concrete Databasing implementation
- SQLite modules: SQliteTableMetadata, SQliteTableBlob, SQliteTableAttachment
Key Details:
- Schema: 3 tables - metadata (uuid, codec, definitions), blobs (id, layout, data), attachments (attachment_id, key, document)
- Codec: Documents/blobs encoded with StreamBinary (default) or StreamTokenBinary
- Indexing: BlobId (SHA1) indexed for fast lookup, attachment_id + key indexed for queries
- File format: Standard SQLite3 database file (readable with sqlite3 CLI)
Advanced Access:
# Access underlying SQLite connection (advanced use cases)
sqlite = db.databasing().sqlite # Only for DatabaseSQLite backend
# Can execute custom SQL queries (use with caution)
Note: Direct SQLite access bypasses Viper type system - only use for analytics/debugging.
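Since the file is a standard SQLite3 database, read-only analytics can also go through Python's stdlib sqlite3 module. The table name below comes from the schema description above; exact column names are an assumption:

import sqlite3

# Open the database file read-only for ad-hoc analytics (never write here).
con = sqlite3.connect("file:myapp.db?mode=ro", uri=True)
try:
    (blob_count,) = con.execute("SELECT COUNT(*) FROM blobs").fetchone()
    print(f"{blob_count} blobs stored")
finally:
    con.close()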
3.2 Key Components (Entry Points)¶
| Component | Purpose | Entry Point File |
|---|---|---|
| Database | Main facade, lifecycle, metadata | src/Viper/Viper_Database.hpp |
| Databasing | Abstract backend interface | src/Viper/Viper_Databasing.hpp |
| DatabaseSQLite | File-based implementation | src/Viper/Viper_DatabaseSQLite.hpp |
| DatabaseRemote | RPC-based implementation | src/Viper/Viper_DatabaseRemote.hpp |
| DatabaseTransactionMode | Transaction isolation modes | src/Viper/Viper_DatabaseTransactionMode.hpp |
| DatabaseErrors | Exception types | src/Viper/Viper_DatabaseErrors.hpp |
| DatabaseMetadataId | Internal metadata management | src/Viper/Viper_DatabaseMetadataId.hpp |
| BlobStatistics | Aggregate blob metrics | src/Viper/Viper_BlobStatistics.hpp (Blob domain) |
| BlobStream | Streaming API wrapper | src/Viper/Viper_BlobStream.hpp (Blob domain) |
Python Bindings:
- src/P_Viper/P_Viper_Database.cpp - Database facade binding
- src/P_Viper/P_Viper_DatabaseSQLite.cpp - SQLite backend binding
- src/P_Viper/P_Viper_DatabaseRemote.cpp - Remote backend binding (limited exposure)
3.3 Component Map (Visual)¶
┌─────────────────────────────────────────────────────────────────────┐
│                           DATABASE FACADE                           │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Database (Facade)                                             │  │
│  │ • createInMemory(), create(path), open(path), connect()       │  │
│  │ • beginTransaction(), commit(), rollback()                    │  │
│  │ • set(), get(), keys(), has(), del()        [Documents]       │  │
│  │ • createBlob(), blob(), blobStream*()       [Blobs]           │  │
│  │ • extendDefinitions(), uuid(), codecName()                    │  │
│  └───────────────────────────────┬───────────────────────────────┘  │
└──────────────────────────────────┼──────────────────────────────────┘
                                   │ delegates to
                                   ▼
                  ┌─────────────────────────────────┐
                  │ Databasing (Strategy Interface) │
                  │ • Pure virtual methods          │
                  │ • Same API as Database facade   │
                  └────────┬───────────────┬────────┘
                           │               │
                ┌──────────▼─────┐   ┌─────▼──────────┐
                │ DatabaseSQLite │   │ DatabaseRemote │
                │ (Local File)   │   │ (RPC Client)   │
                └──────────┬─────┘   └─────┬──────────┘
                           │               │
                           ▼               ▼
                ┌────────────────┐   ┌────────────────┐
                │    SQLite3     │   │   RPC Server   │
                │  (sqlite.db)   │   │   (Network)    │
                └────────────────┘   └────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                        SUPPORTING COMPONENTS                        │
├────────────────────────────────┬────────────────────────────────────┤
│ Document Storage               │ Blob Storage                       │
│ ┌─────────────────┐            │ ┌─────────────────┐                │
│ │ Attachment      │            │ │ BlobId (SHA1)   │                │
│ │ ValueKey        │            │ │ BlobLayout      │                │
│ │ Value           │            │ │ BlobStream      │                │
│ │ Definitions     │            │ │ BlobStatistics  │                │
│ └─────────────────┘            │ └─────────────────┘                │
├────────────────────────────────┼────────────────────────────────────┤
│ Transaction Management         │ Schema Management                  │
│ ┌─────────────────┐            │ ┌─────────────────┐                │
│ │ TransactionMode │            │ │ ExtendInfo      │                │
│ │ (Enum)          │            │ │ definitionsHash │                │
│ └─────────────────┘            │ └─────────────────┘                │
└────────────────────────────────┴────────────────────────────────────┘
Design Patterns:
1. Facade - Database simplifies complex subsystem (SQLite, Blobs, Definitions)
2. Strategy - Databasing allows swapping backends (SQLite ↔ Remote)
3. Factory - Static factories hide implementation choice (create → SQLite, connect → Remote)
4. Repository - Abstract document/blob persistence behind domain interface
5. Unit of Work - Transactions group operations atomically
6. Content-Addressable Storage - SHA1-based BlobId enables deduplication
7. Adapter - DatabaseSQLite adapts SQLite C API to C++ Viper idioms
8. Streaming with Exception Safety - BlobStream guarantees cleanup on error
4. Developer Usage Patterns (Practical)¶
4.1 Core Scenarios¶
Each scenario is extracted from real test code.
Scenario 1: Database Creation and Metadata¶
When to use: Setting up a new database with schema for application persistence.
Test source: python/tests/unit/test_database.py:42-50 → MyTestCase::test_create
from dsviper import *
# Create in-memory database (for testing, prototyping)
db = Database.create_in_memory()
# Extend with DSM schema
definitions = Definitions()
# ... define concepts, structures, attachments ...
db.extend_definitions(definitions.const())
# Access metadata
assert db.path() == 'InMemory'
assert db.codec_name() == 'StreamBinary'
assert db.uuid().is_valid()
assert db.documentation() == 'In Memory'
# Verify schema stored
assert definitions.const().is_equal(db.definitions())
db.close()
Key APIs: Database.create_in_memory(), extend_definitions(definitions), path(), codec_name(), uuid(), documentation(), definitions(), close()
Variations:
- File-based: Database.create("myapp.db", "My App v1.0")
- Open existing: Database.open("myapp.db", readonly=False)
- Remote: Database.connect("myapp", hostname="db.server.com", service="9000")
Scenario 2: Document Storage with Attachments (⭐ CORE PATTERN)¶
When to use: Persisting structured application data (game state, user profiles, simulation data) using DSM Attachment pattern.
Test source: python/tests/unit/test_database.py:52-75 → MyTestCase::test_create_document_ids_get
from dsviper import *
# Setup: Define schema
definitions = Definitions()
namespace = NameSpace(ValueUUId.create(), "App")
# Concepts
concept_user = definitions.create_concept(namespace, "User")
concept_product = definitions.create_concept(namespace, "Product")
# Attachments (concept → data mappings)
attachment_user_profile = definitions.create_attachment(
    namespace, "user_profile", concept_user, Type.INT64
)
# Keys (concept instance IDs)
user_key = attachment_user_profile.create_key()
# Database
db = Database.create_in_memory()
db.extend_definitions(definitions.const())
# TRANSACTION: Store documents
db.begin_transaction()
db.set(attachment_user_profile, user_key, 42) # Attach int64 to User instance
db.commit()
# QUERY: Retrieve all keys for attachment
keys_list = db.keys(attachment_user_profile)
assert user_key == keys_list[0]
# GET: Retrieve document
document = db.get(attachment_user_profile, user_key)
assert document.unwrap() == 42 # ValueOptional unwrap
# CHECK: Existence
assert db.has(attachment_user_profile, user_key)
# DELETE: Remove document
db.del(attachment_user_profile, user_key)
assert not db.has(attachment_user_profile, user_key)
db.close()
Key APIs:
- set(attachment, key, document) - Store document
- get(attachment, key) - Retrieve (returns ValueOptional)
- keys(attachment) - Query all keys
- has(attachment, key) - Check existence
- del(attachment, key) - Delete document
Pattern: Repository Pattern - Abstract persistence layer for domain objects.
Important: Structure attachments work too:
# attachment<User, Profile> where Profile = struct { string name; int age; }
profile = attachment.create_structure({"name": "Alice", "age": 30})
db.set(attachment, key, profile)
Scenario 3: Basic Blob Creation¶
When to use: Storing binary assets (textures, models, audio) with typed layout.
Test source: python/tests/unit/test_database_blob.py:57-67 → TestDatabaseBlob::test_create_blob_simple
from dsviper import *
# Setup
db = Database.create_in_memory()
# Define blob layout (3 floats)
layout = BlobLayout('float', 3)
# Create blob data (3 floats: 1.0, 2.0, 3.0 in little-endian)
blob = ValueBlob(b'\x00\x00\x80\x3f\x00\x00\x00\x40\x00\x00\x40\x40')
# TRANSACTION: Persist blob
db.begin_transaction()
blob_id = db.create_blob(layout, blob)
db.commit()
# Verify: BlobId is content-addressable (SHA1 of layout + data)
assert blob_id in db.blob_ids()
# Retrieve blob
retrieved = db.blob(blob_id)
assert retrieved.size() == 12 # 3 floats * 4 bytes
db.close()
Key APIs: create_blob(layout, blob), blob_ids(), blob(blob_id)
Deduplication: Creating the same blob twice returns same BlobId (single storage):
id1 = db.create_blob(layout, blob)
id2 = db.create_blob(layout, blob)
assert id1 == id2 # Same content → same BlobId
Layout matters: Same binary data with different layout → different BlobId:
blob = ValueBlob(bytes([1, 2, 3, 4]))
id_uchar = db.create_blob(BlobLayout('uchar', 4), blob) # 4 unsigned chars
id_int = db.create_blob(BlobLayout('int', 1), blob) # 1 int32
assert id_uchar != id_int # Different layouts, different IDs
Scenario 4: Transaction Commit and Rollback¶
When to use: ACID guarantees - ensure all changes persist atomically or none at all.
Test source: python/tests/unit/test_database_blob.py:191-202 → TestDatabaseBlob::test_transaction_commit
from dsviper import *
db = Database.create_in_memory()
layout = BlobLayout('float', 3)
blob = ValueBlob(b'\x00\x00\x80\x3f\x00\x00\x00\x40\x00\x00\x40\x40')
# COMMIT: Changes visible after commit
db.begin_transaction()
blob_id = db.create_blob(layout, blob)
# Before commit: changes not visible to other connections
db.commit()
# After commit: changes globally visible
assert blob_id in db.blob_ids()
# ROLLBACK: Discard all changes
db.begin_transaction()
blob_id2 = db.create_blob(layout, ValueBlob(bytes([5, 6, 7, 8])))
db.rollback() # Discard blob_id2
assert blob_id2 not in db.blob_ids()
db.close()
Key APIs: begin_transaction(mode=Deferred), commit(), rollback(), in_transaction()
Exception safety pattern:
db.begin_transaction()
try:
    # Multiple operations
    db.set(attachment1, key1, doc1)
    db.create_blob(layout1, blob1)
    db.set(attachment2, key2, doc2)
    db.commit()  # All or nothing
except Exception as e:
    db.rollback()  # Discard all on error
    raise
Transaction modes (choose based on concurrency needs):
# Deferred (default): Optimistic, acquires lock on first write
db.begin_transaction(DatabaseTransactionMode.Deferred)
# Immediate: Pessimistic, acquires write lock immediately
db.begin_transaction(DatabaseTransactionMode.Immediate)
# Exclusive: Full lock, no other connections allowed
db.begin_transaction(DatabaseTransactionMode.Exclusive)
Scenario 5: Streaming Large Blobs (Memory-Efficient)¶
When to use: Uploading/downloading large blobs (>100MB) without loading entire content into memory.
Test source: python/tests/unit/test_database_blob.py:509-530 → TestDatabaseBlobStream::test_incremental_append_chunks
from dsviper import *
# Setup
db = Database.create_in_memory()
layout = BlobLayout('uchar', 1)
large_data = bytes(range(256)) * 1000 # 256KB test data
db.begin_transaction()
# STEP 1: Create stream (reserve space, start SHA1 computation)
stream = db.blob_stream_create(layout, len(large_data))
# STEP 2: Append chunks (incremental SHA1, no full buffering)
mid = len(large_data) // 2
db.blob_stream_append(stream, ValueBlob(large_data[:mid]))
assert stream.offset() == mid # Track upload progress
db.blob_stream_append(stream, ValueBlob(large_data[mid:]))
assert stream.offset() == len(large_data)
# STEP 3: Close stream (finalize SHA1, get BlobId)
blob_id = db.blob_stream_close(stream)
db.commit()
# Verify
assert blob_id in db.blob_ids()
retrieved = db.blob(blob_id)
assert retrieved.size() == len(large_data)
db.close()
Key APIs: blob_stream_create(layout, size), blob_stream_append(stream, chunk), blob_stream_close(stream), stream.offset()
Exception safety: If append or close throws, partial stream is automatically deleted:
stream = db.blob_stream_create(layout, size)
try:
    db.blob_stream_append(stream, chunk1)
    db.blob_stream_append(stream, chunk2)  # Exception here
    blob_id = db.blob_stream_close(stream)
except Exception:
    # Stream automatically cleaned up (no orphaned data)
    raise
Why streaming?
- Memory-efficient: Don't load 10GB blob into RAM
- Progress tracking: stream.offset() shows bytes uploaded
- Incremental SHA1: Compute BlobId during upload, not after
Scenario 6: NumPy Buffer Protocol Integration (Zero-Copy)¶
When to use: Creating blobs from NumPy arrays, Python bytes, bytearray without intermediate copies.
Test source: python/tests/unit/test_database_blob.py:325-336 → TestDatabaseBlobBuffer::test_create_blob_from_numpy_float32_buffer
from dsviper import *
import numpy
# Create NumPy array
numpy_buffer = numpy.array([1.5, 2.5, 3.5, 4.5], dtype=numpy.float32)
db = Database.create_in_memory()
db.begin_transaction()
# Zero-copy blob creation (Python buffer protocol)
blob_id = db.create_blob_from_buffer(numpy_buffer)
db.commit()
# Verify size (4 floats * 4 bytes = 16 bytes)
blob = db.blob(blob_id)
assert blob.size() == len(numpy_buffer) * 4
db.close()
Key APIs: create_blob_from_buffer(buffer) - Accepts any object supporting Python buffer protocol
Supported types:
- NumPy arrays: numpy.array([...], dtype=numpy.float32)
- Python bytes: bytes([1, 2, 3, 4])
- Python bytearray: bytearray([1, 2, 3, 4])
Layout inference: BlobLayout automatically inferred from buffer dtype/itemsize:
# NumPy bool → BlobLayout('bool', N)
numpy.array([True, False], dtype=numpy.bool_)
# NumPy uint8 → BlobLayout('uchar', N)
numpy.array([1, 2, 3], dtype=numpy.uint8)
# NumPy int32 → BlobLayout('int', N)
numpy.array([10, 20, 30], dtype=numpy.int32)
# NumPy float32 → BlobLayout('float', N)
numpy.array([1.5, 2.5], dtype=numpy.float32)
# NumPy float64 → BlobLayout('double', N)
numpy.array([1.5, 2.5], dtype=numpy.float64)
Use case: Persist ML model weights, scientific data, image pixel arrays efficiently.
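A minimal persistence sketch for model weights, assuming an open database db and a float32 NumPy array standing in for trained weights; note that deduplication applies here exactly as for any other blob:

import numpy

weights = numpy.zeros(1024, dtype=numpy.float32)  # stand-in for trained weights

db.begin_transaction()
weights_id = db.create_blob_from_buffer(weights)    # zero-copy upload
duplicate_id = db.create_blob_from_buffer(weights)  # identical content
db.commit()

assert weights_id == duplicate_id  # same content → same BlobId, stored once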
4.2 Integration Patterns¶
Database + Commit System:
# CommitDatabase wraps Database for event-sourcing
commit_db = CommitDatabase.create("commits.db")
commit_db.databasing().database() # Access underlying Database
Database + Remote Services:
// C++ only: Expose database via RPC
auto service = DatabaseService::make(database);
server.registerService(service);
Database + Application Lifecycle:
class Application:
    def __init__(self):
        self.db = Database.create("app.db")
        self.db.extend_definitions(load_schema())

    def shutdown(self):
        if self.db.in_transaction():
            self.db.rollback()  # Discard uncommitted changes
        self.db.close()
4.3 Test Suite Reference¶
Full test coverage: python/tests/unit/test_database*.py (2 files, 1141 lines, 52+ tests)
Test files analyzed:
- test_database.py (110 lines, 4 tests) - Core database operations, document storage
- test_database_blob.py (1031 lines, 48+ tests) - Blob integration, transactions, streaming, NumPy
Test classes:
- MyTestCase - Database creation, metadata, document storage with attachments
- TestDatabaseBlob - Basic blob operations, transactions, deduplication
- TestDatabaseBlobBuffer - NumPy buffer protocol integration
- TestDatabaseBlobStream - Streaming API, incremental upload, exception safety
- TestDatabaseBlobInfo - Blob metadata queries (BlobInfo, BlobStatistics)
- TestDatabaseBlobStatistics - Aggregate metrics tracking
Coverage highlights:

- ✅ Transaction modes (Deferred, commit, rollback)
- ✅ Content-addressable deduplication (identical blobs → same ID)
- ✅ Streaming with exception safety (cleanup on error)
- ✅ NumPy zero-copy integration (5 dtypes: bool, uint8, int32, float32, float64)
- ✅ Blob statistics (count, total size, min/max size)
- ⚠️ Remote database not tested (C++ only, no Python binding tests)
5. Technical Constraints¶
Performance Considerations¶
1. Content-Addressable Deduplication

- Trade-off: SHA1 computation cost vs disk space savings
- Cost: O(n) where n = blob size (every byte hashed)
- Benefit: Same 100MB asset uploaded 10 times → stored once, saves 900MB
- Recommendation: Use deduplication for assets likely to be duplicated (textures, audio, models)
2. Blob Streaming

- Memory efficiency: O(chunk_size) memory usage, not O(blob_size)
- Example: Stream a 10GB blob with 1MB chunks → 1MB RAM usage (not 10GB)
- Performance: Slightly slower than direct creation due to incremental SHA1, but prevents OOM
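A sketch of the O(chunk_size) upload described above, assuming an open database db; the file path and 1MB chunk size are illustrative, and the layout mirrors the streaming test in Scenario 5:

import os

CHUNK = 1024 * 1024  # 1MB chunks → ~1MB peak memory, regardless of file size
path = "big_asset.bin"

db.begin_transaction()
stream = db.blob_stream_create(BlobLayout('uchar', 1), os.path.getsize(path))
with open(path, "rb") as f:
    while chunk := f.read(CHUNK):
        db.blob_stream_append(stream, ValueBlob(chunk))
blob_id = db.blob_stream_close(stream)
db.commit()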
3. Transaction Isolation Modes
| Mode | Lock Acquired | Concurrency | Use When |
|---|---|---|---|
| Deferred | On first write | High (optimistic) | Read-heavy workloads, low contention |
| Immediate | On beginTransaction | Medium (pessimistic) | Write-heavy workloads, prevent writer starvation |
| Exclusive | Full DB lock | Low (serialized) | Critical sections, schema changes |
Benchmark (relative performance, SQLite on SSD):

- Deferred: 1.0x (baseline)
- Immediate: 0.95x (5% slower due to early locking)
- Exclusive: 0.80x (20% slower, no concurrent readers)
Recommendation: Use Deferred unless experiencing writer starvation (then Immediate).
4. Attachment Queries
- keys(attachment): O(n) where n = number of documents for that attachment (full table scan)
- get(attachment, key): O(1) indexed lookup (attachment_id + key indexed)
- has(attachment, key): O(1) same as get() but no document retrieval
Optimization: If querying all documents frequently, cache keys() result.
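A minimal memoization sketch for the O(n) keys() scan; it assumes attachment objects are usable as dict keys (otherwise key the cache by attachment name):

# Cache keys() per attachment; invalidate whenever set()/del() changes them.
class KeyCache:
    def __init__(self, db):
        self._db = db
        self._cache = {}

    def keys(self, attachment):
        if attachment not in self._cache:
            self._cache[attachment] = self._db.keys(attachment)  # O(n) scan, once
        return self._cache[attachment]

    def invalidate(self, attachment):
        self._cache.pop(attachment, None)  # call after set()/del()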
5. Blob Metadata Queries
- blobIds(): O(n) where n = total blobs (full table scan)
- blobInfo(id): O(1) indexed lookup (BlobId indexed)
- blobInfos(ids): O(k) where k = len(ids) (batch indexed lookup)
Recommendation: Use blobInfos(ids) for batch queries instead of multiple blobInfo(id) calls.
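Assuming the Python spellings mirror blob_ids() (blob_info/blob_infos are an assumption here, not confirmed by the binding), the difference looks like this:

ids = db.blob_ids()

# Slower: k separate indexed lookups
infos = [db.blob_info(blob_id) for blob_id in ids]

# Faster: one batched lookup
infos = db.blob_infos(ids)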
Thread Safety¶
Database and Databasing: NOT thread-safe
Reason: SQLite connection has one writer at a time limitation. Concurrent writes from multiple threads on same connection cause database locked errors.
Safe patterns:
Option 1: One Database per Thread
import threading

class WorkerThread(threading.Thread):
    def run(self):
        db = Database.open("app.db")  # Each thread opens its own connection
        db.begin_transaction()
        # ... work ...
        db.commit()
        db.close()
Option 2: External Locking
class Application:
    def __init__(self):
        self.db = Database.create("app.db")
        self.lock = threading.Lock()

    def safe_write(self, attachment, key, document):
        with self.lock:
            self.db.begin_transaction()
            self.db.set(attachment, key, document)
            self.db.commit()
Option 3: Queue-Based Writer Thread
import queue

class DatabaseWriter(threading.Thread):
    def __init__(self, db_path):
        super().__init__()  # initialize the Thread base class
        self.db = Database.open(db_path)
        self.queue = queue.Queue()

    def run(self):
        while True:
            operation = self.queue.get()
            operation(self.db)  # Single-threaded writes
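Usage sketch: worker threads enqueue closures and only the writer thread touches the connection (attachment, key, and document are illustrative):

writer = DatabaseWriter("app.db")
writer.start()

def write_op(db):
    db.begin_transaction()
    db.set(attachment, key, document)
    db.commit()

writer.queue.put(write_op)  # executed serially by the writer thread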
Immutable objects (thread-safe):
- BlobId - Content hash, immutable
- BlobLayout - Type descriptor, immutable
- Definitions - Schema snapshot (.const()), immutable
Error Handling¶
Exception Types (all inherit from std::runtime_error in C++, mapped to Python exceptions):
DatabaseErrors - Database-specific errors:
- DatabaseErrors::notFound - Database file not found at path
- DatabaseErrors::incompatibleSchema - Schema version mismatch (use Database.is_compatible() to check)
- DatabaseErrors::inTransaction - Attempted beginTransaction() while already in transaction
- DatabaseErrors::notInTransaction - Attempted commit() or rollback() without active transaction
- DatabaseErrors::blobNotFound - BlobId does not exist in database
- DatabaseErrors::blobReferencedByDocument - Cannot delete blob (still referenced by document)
- DatabaseErrors::missingBlob - Document references BlobId that doesn't exist (foreign key violation)
StreamErrors (from Blob Streaming):
- StreamErrors::isEnded - Attempted append after stream closed
- StreamErrors::offsetMismatch - Blob stream offset doesn't match expected (corruption detection)
DefinitionsErrors (from Schema Management):
- DefinitionsErrors::incompatible - Extending with incompatible Definitions (e.g., removing concept)
Error Handling Pattern:
db = None
try:
    if not Database.is_compatible("app.db"):
        raise ValueError("Schema mismatch, migration needed")
    db = Database.open("app.db")
    db.begin_transaction()
    db.set(attachment, key, document)
    db.commit()
except DatabaseErrors.incompatibleSchema:
    print("Database schema too old, run migration tool")
except DatabaseErrors.inTransaction:
    print("Already in transaction, commit or rollback first")
except DatabaseErrors.missingBlob as e:
    print(f"Document references missing blob: {e}")
    db.rollback()
except Exception:
    if db is not None and db.in_transaction():
        db.rollback()
    raise
finally:
    if db is not None:
        db.close()
Exception Safety Guarantee:
- Blob streaming: Partial streams automatically deleted on exception (no orphaned data)
- Transactions: Uncommitted changes discarded on close() (implicit rollback)
Memory Model¶
Reference Semantics - All Database objects use std::shared_ptr in C++ (reference counting):
db1 = Database.create("app.db")
db2 = db1 # Both refer to same database connection
db1.close()
# db2 is also closed (shared reference)
RAII (Resource Acquisition Is Initialization):

- Database: Closes the SQLite connection on destruction
- BlobStream: Cleans up the partial stream if not closed explicitly
Python Garbage Collection:
def temporary_database():
    db = Database.create_in_memory()
    db.set(attachment, key, document)
    # db auto-closed when function exits (Python __del__)
Caution: Relying on auto-close is fragile. Always call close() explicitly in finally block or use context managers if available.
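If the binding does not ship a context manager, a minimal wrapper is easy to sketch using only the documented open(), in_transaction(), rollback(), and close():

from contextlib import contextmanager

@contextmanager
def open_database(path):
    db = Database.open(path)
    try:
        yield db
    finally:
        if db.in_transaction():
            db.rollback()  # discard uncommitted changes, mirroring close()
        db.close()

with open_database("app.db") as db:
    ...  # every path out of this block closes the database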
Memory overhead:

- Database object: ~1KB (facade + Databasing pointer)
- Definitions: ~10-100KB depending on schema size (all concept types, attachments)
- BlobStream: ~100 bytes + SHA1 state (~200 bytes total)
Constraints Summary¶
Transaction Constraints:

- ❌ No nested transactions
- ❌ Cannot have multiple transactions on the same Database instance simultaneously
- ✅ Multiple Database instances (different connections) can have concurrent transactions

Schema Constraints:

- ✅ Can add new concepts, structures, attachments
- ❌ Cannot remove existing types (would break stored documents)
- ❌ Cannot modify existing type definitions (e.g., change structure fields)

Blob Constraints:

- ✅ BlobId is immutable (content-addressable, cannot modify a blob)
- ❌ Cannot delete a blob if it is referenced by a document (foreign key constraint)
- ✅ Deduplication is automatic (same content → same BlobId → single storage)

SQLite Constraints:

- ❌ Single writer at a time per connection (use multiple connections for concurrency)
- ✅ Multiple readers allowed while a writer is active (Deferred mode)
- ❌ File locking (the database file cannot be on NFS; use a remote database for network access)
6. Cross-References¶
Related Documentation¶
Viper Documentation:

- doc/Getting_Started_With_Viper.md - Database tutorial with Attachment pattern example
  - Section "Database" (line ~450): Shows basic workflow (create, extend, set, get)
  - Attachment pattern explained with User/Login/Identity example
- doc/Internal_Viper.md - Viper architecture, SQLite modules
  - Section "SQLite Modules": Lists SQliteTableMetadata, SQliteTableBlob, SQliteTableAttachment
  - RPC protocols for Remote Database mentioned
- doc/Getting_Started_With_Templated_Feature.md - Database code generation
  - Template directory: templates/cpp/Database/
  - Generated package: package/database with API type hints

Domain Documentation:

- doc/domains/Blob_Storage.md - Blob integration details
  - BlobId, BlobLayout, BlobEncoder, BlobView, BlobStream explained
  - Content-addressable storage pattern detailed
- doc/domains/Type_And_Value_System.md - Type system foundations
  - Definitions, Attachment, ValueKey, Value, ValueOptional explained
  - DSM concept types and structures
- doc/domains/Commit_System.md - Event-sourcing with Database
  - CommitDatabase wraps Database for commit persistence
  - How Commit System uses Database for ACID event storage
- doc/domains/Stream_Codec.md - Serialization format
  - StreamBinary (default database codec)
  - StreamTokenBinary (type-safe alternative)
Dependencies¶
This domain USES (Foundation Layer 0):
Database
├── Blob Storage (STRONG)
│ ├── BlobId - Content-addressable identifier
│ ├── BlobLayout - Type descriptor for binary data
│ ├── BlobGetting - Interface implemented by Database
│ ├── BlobStatistics - Aggregate metrics
│ └── BlobStream - Streaming API wrapper
│
├── Type and Value System (STRONG)
│ ├── Definitions - DSM schema registry
│ ├── Attachment - Concept→data mapping
│ ├── ValueKey - Concept instance identifier
│ ├── Value - Typed data (documents)
│ ├── ValueOptional - get() return type
│ ├── ValueStructure - Structured documents
│ └── ValueSet - keys() return type
│
└── Stream/Codec (WEAK)
├── StreamCodecInstancing - Codec selection
└── StreamBinary - Default serialization format
This domain is USED BY (Functional Layer 1+):
Database
├── Commit System
│ └── CommitDatabase wraps Database for event-sourcing
│
├── Services (C++)
│ └── DatabaseService exposes Database over RPC
│
└── Applications
├── Games (state persistence, asset storage)
├── Simulations (data recording, checkpointing)
├── Web backends (user data, sessions)
└── Scientific computing (ML model persistence, datasets)
Key Type References¶
C++ Headers:

- src/Viper/Viper_Database.hpp - Main facade API
  - Entry point for all database operations
  - Factory methods: createInMemory(), create(path), open(path), connect(hostname)
- src/Viper/Viper_Databasing.hpp - Abstract backend interface
  - Pure virtual methods defining database operations
  - Implemented by DatabaseSQLite, DatabaseRemote
- src/Viper/Viper_DatabaseSQLite.hpp - File-based implementation
  - Concrete Databasing implementation using SQLite3
  - Exposes the sqlite member for advanced access
- src/Viper/Viper_DatabaseRemote.hpp - RPC-based implementation
  - Concrete Databasing implementation using an RPC client
  - Discovery: databases(socketPath); connection: connect(database, hostname)
- src/Viper/Viper_DatabaseTransactionMode.hpp - Transaction isolation modes
  - Enum: Deferred, Immediate, Exclusive
- src/Viper/Viper_DatabaseErrors.hpp - Exception types
  - DatabaseErrors namespace with specific error constructors

Python Bindings:

- src/P_Viper/P_Viper_Database.cpp - Database facade Python binding
  - Exposes all database operations to Python
- src/P_Viper/P_Viper_DatabaseSQLite.cpp - SQLite backend Python binding
  - Limited direct exposure (used internally by Database)
- dsviper_wheel/__init__.pyi - Python type stubs
  - Type hints for Database, DatabaseTransactionMode, error types

SQLite Modules (C++ only, internal):

- src/Viper/Viper_SQLite.hpp - SQLite C API wrapper
- src/Viper/Viper_SQliteTableMetadata.hpp - Metadata table operations
- src/Viper/Viper_SQliteTableBlob.hpp - Blob table operations
- src/Viper/Viper_SQliteTableAttachment.hpp - Attachment table operations
Document Metadata¶
Methodology Version: v1.3.1 (Slug-Based Deterministic Naming + C++ Architecture Analysis)
Generated Date: 2025-11-14
Last Updated: 2025-11-14
Review Status: ✅ Complete
Test Files Analyzed: 2 files
- python/tests/unit/test_database.py (110 lines, 4 tests)
- python/tests/unit/test_database_blob.py (1031 lines, 48+ tests)
Test Coverage: 1141 lines, 52+ tests analyzed
Golden Examples: 6 scenarios extracted
C++ Files: 23 total

- Headers: 12 (Database.hpp, Databasing.hpp, DatabaseSQLite.hpp, DatabaseRemote.hpp, DatabaseTransactionMode.hpp, DatabaseErrors.hpp, DatabaseMetadataId.hpp, DatabaseRemote*.hpp)
- Implementations: 11 (Database.cpp, DatabaseSQLite.cpp, DatabaseRemote.cpp, DatabaseErrors.cpp, DatabaseMetadataId.cpp, DatabaseRemote*.cpp)
Python Bindings: 3 files (P_Viper_Database.cpp, P_Viper_DatabaseSQLite.cpp, P_Viper_DatabaseRemote.cpp)
Sub-Domains: 7 (Core, Transaction, Blob Integration, Document Storage, Schema, Remote, SQLite Backend)
Design Patterns: 8 (Facade, Strategy, Factory, Repository, Unit of Work, Content-Addressable Storage, Adapter, Streaming with Exception Safety)
Changelog:
- v1.0 (2025-11-14): Initial documentation following /document-domain v1.3.1
- Phase 0.5 audit: 13 components, 7 sub-domains, 2 test files (1141 lines)
- Phase 0.75 C++ analysis: 8 design patterns identified, Why documented
- Phase 1 golden scenarios: 6 extracted from test_database*.py
- Phase 5 implementation: 6 sections completed (Purpose, Overview, Decomposition, Usage, Technical, References)
- Special case: Blob integration (1031 test lines) - content-addressable, streaming, NumPy
- Emphasis: Dual persistence (documents + blobs) with unified ACID transactions
Regeneration Trigger:
- When /document-domain reaches v2.0 (methodology changes → archive v1.3.1)
- When Database C++ API changes significantly (major version bump → review required)
- When test organization changes (test refactoring → re-extract golden scenarios)
- When Remote Database gets Python binding (add test-driven examples)
Appendix: Domain Statistics¶
Source Code:

- C++ Files: 23 (12 headers + 11 implementations)
- Python Bindings: 3 files
- Total C++ LOC: ~3500 (estimated)

Test Coverage:

- Test Files: 2 (test_database.py, test_database_blob.py)
- Test Lines: 1141
- Test Methods: 52+
- Test Classes: 6 (MyTestCase, TestDatabaseBlob, TestDatabaseBlobBuffer, TestDatabaseBlobStream, TestDatabaseBlobInfo, TestDatabaseBlobStatistics)

Components:

- Total: 13 (1 facade, 1 interface, 2 implementations, 9 supporting)
- Public API: Database (facade), Databasing (for advanced users)
- Backends: DatabaseSQLite (Python + C++), DatabaseRemote (C++ only)

Dependencies:

- Foundation Layer 0: Blob Storage (strong), Type and Value System (strong), Stream/Codec (weak)
- Functional Layer 1: Commit System (user), Services (C++ only), Applications (all)

Special Features:

- Content-addressable deduplication (SHA1-based BlobId)
- Streaming API for large blobs (>100MB, memory-efficient)
- NumPy zero-copy integration (Python buffer protocol)
- Schema evolution (extendDefinitions, additive-only)
- Transaction isolation modes (Deferred/Immediate/Exclusive)
End of Database Domain Documentation