DSM¶
1. Purpose & Motivation (Why)¶
Problem Solved¶
The DSM domain provides a complete implementation of the Digital Substrate Model parser, validator, and builder infrastructure. It solves the challenge of transforming user-written .dsm schema files (defining concepts, structures, attachments, etc.) into validated runtime type definitions that can be:
- Code generated via the kibo tool (Java-based code generator)
- Database initialized with proper schemas
- Type-checked at compile time
- Navigated programmatically via inspector APIs
- Serialized for inter-process communication
Key problems addressed:
- Multi-file project support - Assemble definitions from multiple .dsm files across directories
- Semantic validation - Detect errors beyond syntax (circular inheritance, duplicate names, invalid types, etc.)
- Type resolution - Resolve references across namespaces and files
- Dual representation - Maintain both parsing artifacts (for error reporting) and clean API (for code generation)
- Binary serialization - Encode/decode definitions for the kibo code generator
- Developer tooling - Provide query/inspection APIs for IDE support, linters, analyzers
Important Distinction¶
This document describes the DSM implementation (parser/validator/builder for C++/Python developers).
For DSM language specification (syntax, semantics for .dsm file authors), see:
- doc/DSM.md - Language reference (concepts, clubs, structures, attachments)
- doc/Getting_Started.md - Tutorial for writing .dsm schema files
- doc/Getting_Started_With_dsm_util.py.md - Command-line tool guide
Use Cases¶
Developers use the DSM domain when they need to:
- Parse .dsm schema files - Transform DSM language text into runtime Definitions
  - Multi-file projects (assemble directory of .dsm files)
  - Single-file schemas (load one .dsm file)
  - Programmatic schema construction (append DSM text dynamically)
- Validate DSM semantics - Detect errors beyond syntax
  - Circular inheritance detection (concept A is a B; concept B is a A)
  - Duplicate name checking (concept and structure with same name)
  - Type reference validation (ensure referenced types exist)
  - Recursive structure detection (prevent infinite recursion)
- Query DSM definitions - Navigate parsed schemas programmatically
  - Lookup concepts/structures by name
  - Enumerate all types in a namespace
  - Resolve type references across namespaces
  - Extract function pools and attachments
- Generate code - Export definitions to kibo code generator
  - Binary serialization (.dsmb format via StreamTokenBinaryCodec)
  - Round-trip preservation (encode → decode maintains integrity)
  - Template-based code generation (C++ classes, Python bindings)
- Create databases - Initialize Database/CommitDatabase with schemas
  - DSMHelper::createStoreDatabase() - Create Database from DSMDefinitions
  - DSMHelper::createCommitDatabase() - Create CommitDatabase from DSMDefinitions
  - Schema evolution support (extend definitions incrementally)
- Build developer tools - IDE support, linters, analyzers
  - Syntax error reporting with file/line/column precision
  - Semantic error messages with context
  - Type hierarchy navigation (concept inheritance trees)
  - Dependency graph analysis (structure field references)
Position in Architecture¶
DSM is a Functional Layer 1 domain that builds on Foundation Layer 0 components:
Functional Layer 1
├── DSM (this domain) ← Parser/Validator/Builder for schema files
│ ├── Uses: Type System (TypeName representation)
│ ├── Uses: UUId (namespace identifiers)
│ ├── Uses: NameSpace (definition organization)
│ └── Produces: Definitions (runtime type system)
│
Foundation Layer 0
├── Type System (TypeName, TypeCode)
├── UUId (unique identifiers)
└── NameSpace (namespace representation)
Architectural role:
- Input: .dsm text files (user-authored schemas)
- Output: Definitions objects (runtime type system)
- Process: 4-phase pipeline (Build → Parse → Validate → Convert)
2. Domain Overview (What)¶
Scope¶
The DSM domain provides capabilities for:
- Building DSM content from files (single file or directory)
- Parsing DSM language via ANTLR-generated parser
- Validating semantics with 18+ specialized checkers
- Converting parse trees into immutable DSMDefinitions
- Inspecting definitions via Repository pattern (O(log n) map lookups)
- Serializing definitions to binary for kibo code generator
- Rendering definitions back to text (DSM language or HTML)
Out of scope (covered by other domains):
- DSM language syntax/semantics (see doc/DSM.md)
- User tutorials for writing .dsm files (see doc/Getting_Started.md)
- Code generation templates (see templates/cpp/, templates/python/)
- Kibo code generator implementation (see tools/kibo-1.2.0.jar)
Key Concepts¶
1. DSMBuilder - Progressive File Assembly¶
Builder pattern for constructing DSM content from multiple sources:
- append(name, content) - Add DSM text from a file
- parts() - Track file provenance (for error messages with file:line)
- content() - Concatenated DSM text (single ANTLR input)
- parse() - Execute 4-phase pipeline
Why concatenation? ANTLR parser expects single input string, but DSM projects span multiple files. DSMBuilder concatenates while preserving file boundaries for error reporting.
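The idea of concatenating while preserving file boundaries can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: SketchBuilder, Part, and locate() are hypothetical names invented here, not the dsviper classes, and the real DSMBuilder may track provenance differently.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical stand-in for DSMBuilderPart: one appended file."""
    name: str
    first_line: int  # 1-based line number in the concatenated text
    line_count: int


@dataclass
class SketchBuilder:
    """Hypothetical stand-in for DSMBuilder (not the dsviper class)."""
    parts: list = field(default_factory=list)
    _chunks: list = field(default_factory=list)

    def append(self, name: str, content: str) -> None:
        if not content.endswith("\n"):
            content += "\n"  # keep file boundaries on line boundaries
        first = 1 + sum(p.line_count for p in self.parts)
        self.parts.append(Part(name, first, content.count("\n")))
        self._chunks.append(content)

    def content(self) -> str:
        """Single input string, as a parser would receive it."""
        return "".join(self._chunks)

    def locate(self, global_line: int):
        """Map a line of the concatenated text to (file name, local line)."""
        starts = [p.first_line for p in self.parts]
        part = self.parts[bisect_right(starts, global_line) - 1]
        return part.name, global_line - part.first_line + 1


builder = SketchBuilder()
builder.append("concepts.dsm", "concept A;\nconcept B;\n")
builder.append("structures.dsm", "struct P {\n  float x;\n};")
print(builder.locate(4))
```

An error reported by the parser at line 4 of the concatenated input can then be shown to the user as line 2 of structures.dsm.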
2. Four-Phase Pipeline - Build → Parse → Validate → Convert¶
Pipeline pattern transforming .dsm text into validated Definitions:
- Phase 1: ANTLR Lexing + Parsing
  - Input: Concatenated DSM text
  - Process: ANTLR lexer/parser (generated from DSM.g4 grammar)
  - Output: ParseTree (ANTLR AST)
  - Errors: Syntax errors (missing semicolons, invalid tokens)
- Phase 2: AST Traversal
  - Input: ParseTree
  - Process: DSMP::Listener walks tree, builds DSMP::Definitions
  - Output: DSMP::Definitions (internal mutable AST with parser metadata)
  - Errors: None (tree walk always succeeds if Phase 1 succeeded)
- Phase 3: Semantic Validation
  - Input: DSMP::Definitions
  - Process: 18+ DSMP::Checker subclasses validate semantics
  - Output: DSMParseReport (accumulated errors)
  - Errors: Duplicate names, circular inheritance, invalid types, recursive structures
- Phase 4: Conversion + RuntimeID Assignment
  - Input: DSMP::Definitions (if no errors in Phase 3)
  - Process: Convert to DSMDefinitions, then to Definitions, assign RuntimeIDs
  - Output: DSMDefinitions (public immutable API) + Definitions (runtime types)
  - Errors: RuntimeID conflicts (should never happen if Phase 3 passed)
Why four phases? Progressive validation with early exit saves work. Syntax errors caught in Phase 1 prevent wasted semantic checking in Phase 3.
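The early-exit behaviour can be illustrated with a toy pipeline. This is a minimal sketch, not the DSM implementation: the phase functions below are hypothetical placeholders operating on a made-up mini-syntax, and only the control flow (stop at the first phase that reports errors) mirrors the description above.

```python
def run_pipeline(text, phases):
    """Run phases in order and stop at the first phase that reports errors."""
    artifact = text
    for name, phase in phases:
        artifact, errors = phase(artifact)
        if errors:
            return None, [f"{name}: {e}" for e in errors]
    return artifact, []


def parse(text):
    # Toy syntax rule: the input must end with ';'
    statements = [s.strip() for s in text.split(";") if s.strip()]
    errors = [] if text.strip().endswith(";") else ["missing ';' at end of input"]
    return statements, errors


def traverse(statements):
    # "concept A" -> ("concept", "A"); cannot fail once parsing succeeded
    defs = []
    for s in statements:
        kind, _, name = s.partition(" ")
        defs.append((kind, name.strip()))
    return defs, []


def validate(defs):
    names = [name for _, name in defs]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    return defs, [f"identifier '{n}' already used" for n in duplicates]


def convert(defs):
    return tuple(defs), []  # immutable result


PHASES = [("parse", parse), ("traverse", traverse),
          ("validate", validate), ("convert", convert)]

print(run_pipeline("concept A; concept B;", PHASES))
print(run_pipeline("concept A; concept A;", PHASES))
print(run_pipeline("concept A", PHASES))
```

In the third call the syntax error stops the run before traverse, validate, or convert are invoked.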
3. DSMDefinitions - Immutable Container¶
Value Object pattern holding all parsed DSM entities:
class DSMDefinitions {
    std::vector<std::shared_ptr<DSMConcept>> concepts;
    std::vector<std::shared_ptr<DSMClub>> clubs;
    std::vector<std::shared_ptr<DSMEnumeration>> enumerations;
    std::vector<std::shared_ptr<DSMStructure>> structures;
    std::vector<std::shared_ptr<DSMAttachment>> attachments;
    std::vector<std::shared_ptr<DSMFunctionPool>> functionPools;
    std::vector<std::shared_ptr<DSMCommitFunctionPool>> commitFunctionPools;
};
Immutability: All fields const (except runtimeId which is assigned post-construction).
Why immutable? Thread-safe sharing across modules (parsers, code generators, validators).
4. DSMDefinitionsInspector - Repository for Queries¶
Repository pattern providing O(log n) lookups via internal ordered maps (std::map):
class DSMDefinitionsInspector {
    std::map<TypeName, std::shared_ptr<DSMConcept>> _concepts;
    std::map<TypeName, std::shared_ptr<DSMClub>> _clubs;
    std::map<TypeName, std::shared_ptr<DSMStructure>> _structures;
    // ... other maps
};
API patterns:
- query*(typeName) - Returns nullptr if not found (safe for probing)
- check*(typeName) - Throws exception if not found (for assertions)
Why Repository? Semantic checkers need frequent type lookups (18+ checkers × hundreds of types). std::map provides O(log n) lookups vs O(n) linear search in vectors.
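The difference between the two API styles can be sketched as follows. This is an illustration only: SketchInspector is a hypothetical stand-in, not the dsviper class, and it uses a Python dict (hash-based) where the C++ inspector uses std::map (ordered tree), so the complexity characteristics differ.

```python
class SketchInspector:
    """Hypothetical stand-in for DSMDefinitionsInspector."""

    def __init__(self, concepts):
        # Index once at construction so later lookups avoid a linear scan
        self._concepts = {c["type_name"]: c for c in concepts}

    def query_concept(self, type_name):
        """Probe: return the concept, or None when it is not defined."""
        return self._concepts.get(type_name)

    def check_concept(self, type_name):
        """Assert: return the concept, or raise when it is not defined."""
        concept = self.query_concept(type_name)
        if concept is None:
            raise KeyError(f"unknown concept: {type_name}")
        return concept


inspector = SketchInspector([
    {"type_name": "Test::A", "parent": None},
    {"type_name": "Test::B", "parent": "Test::A"},
])
print(inspector.query_concept("Test::Missing"))
print(inspector.check_concept("Test::B")["parent"])
```

Use the query form when absence is an expected outcome, and the check form when absence indicates a bug in the caller.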
5. Semantic Validation - Chain of Responsibility¶
18+ specialized checkers organized by dependency order:
// Identifiers first (no dependencies)
CheckerDefinitionIdentifier
CheckerIdentifierReserved
// Namespaces (depend on identifier validation)
CheckerNameSpaceSameName
CheckerNameSpaceSameUUID
CheckerNameSpaceDefinitionIdentifier
// Types (depend on namespace validation)
CheckerTypeReference // Must resolve all type references
// Advanced validation (depend on type resolution)
CheckerConceptRecursive
CheckerStructureFieldAlreadyDefined
CheckerTypeStructureRecursive
// ... 10+ more checkers
Pattern: Each checker validates one aspect (Single Responsibility), executed sequentially with early exit on error (fail fast).
Why 18+ checkers? Modular design makes adding new validation rules trivial (Open/Closed Principle). Easy to test each rule in isolation.
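Dependency ordering with early exit can be sketched with three toy checkers. The grouping into stages, and the assumption that all checkers of a stage run before the exit decision is made, are illustrative choices made here; the function names and the dictionary-based definitions (concept name mapped to parent name or None) are hypothetical, not the DSMP::Checker API.

```python
def check_identifiers(defs):
    return [f"invalid identifier '{n}'" for n in defs if not n.isidentifier()]


def check_type_references(defs):
    return [f"unknown parent '{p}' of '{n}'"
            for n, p in defs.items() if p is not None and p not in defs]


def check_concept_recursive(defs):
    errors = []
    for name in defs:
        seen, current = {name}, defs[name]
        while current is not None:
            if current in seen:
                errors.append(f"concept '{name}' is recursive")
                break
            seen.add(current)
            current = defs[current]  # safe only once references are resolved
    return errors


STAGES = [[check_identifiers], [check_type_references], [check_concept_recursive]]


def run_checkers(defs, stages):
    """Run stages in order; stop after the first stage that reports errors."""
    report = []
    for stage in stages:
        for checker in stage:
            report.extend(checker(defs))
        if report:
            break
    return report


print(run_checkers({"A": "B", "B": "Missing"}, STAGES))
print(run_checkers({"A": "B", "B": "A"}, STAGES))
print(run_checkers({"A": None, "B": "A"}, STAGES))
```

In the first call the recursion checker never runs, which matters because it would fail on the unresolved parent. That is the reason type resolution is ordered before the advanced checkers.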
External Dependencies¶
Uses (Foundation Layer):
- Type System (22 includes) - TypeName representation for DSM types
- UUId (15 includes) - Namespace UUIDs, RuntimeID generation
- NameSpace (6 includes) - Namespace organization
- Error (2 includes) - Exception types (DSMErrors)
- Blob (2 includes) - Binary serialization storage
Used By (Functional Layer):
- Database - DSMHelper::createStoreDatabase() initializes schemas
- CommitDatabase - DSMHelper::createCommitDatabase() initializes schemas
- Kibo code generator - Consumes binary .dsmb files (DSMDefinitionsEncoder output)
- StringHelper - Pretty-printing DSM types via representationGl()
3. Functional Decomposition (Structure)¶
3.1 Sub-domains¶
The DSM domain comprises 7 sub-domains organized by concern:
1. Builder/Parser Sub-domain¶
Purpose: Load .dsm files and execute ANTLR parsing
Components:
- DSMBuilder - Progressively append files, track provenance
- DSMBuilderPart - Track file/line ranges for error messages
- DSMHelper - Facade providing assemble() and parse() entry points
- DSMP::Listener - ANTLR tree walker (generated from grammar)
- DSMP::ErrorListener - Capture syntax errors with file/line/column
Key Pattern: Builder + Facade - Simplify complex parsing workflow
2. Definitions Sub-domain¶
Purpose: Store and access parsed DSM entities
Components:
- DSMDefinitions - Immutable container for all entities
- DSMDefinitionsInspector - Repository for O(log n) lookups
- DSMDefinitionsEncoder - Binary serialization (for kibo)
- DSMDefinitionsDecoder - Binary deserialization (from kibo)
- DSMDefinitionsRenderer - Text output (DSM language or HTML)
Key Pattern: Repository + Codec - Efficient access and serialization
3. DSM Entities Sub-domain¶
Purpose: Represent DSM language constructs
Components:
- DSMConcept - Abstract types with inheritance (concept A is a B)
- DSMClub - Union types (club MyUnion; membership MyUnion ConceptA)
- DSMStructure - Composite types with fields (struct Point { float x; float y; })
- DSMEnumeration - Enumerated types (enum Status { active, inactive })
- DSMAttachment - Concept→Data mappings (attachment<Concept, Structure> data)
- DSMFunction - Function signatures
- DSMFunctionPool - Collections of functions
- DSMCommitFunctionPool - Mutation functions for commit system
Key Pattern: Immutable Value Objects - Thread-safe, shared via shared_ptr
4. DSM Type System Sub-domain¶
Purpose: Represent generic and mathematical types
Components (11 type categories):
- DSMTypeReference - Primitives (int64, string, bool) and named types
- DSMTypeVector - Dynamic arrays (vector<T>)
- DSMTypeSet - Unique collections (set<T>)
- DSMTypeMap - Key-value pairs (map<K, V>)
- DSMTypeOptional - Maybe types (optional<T>)
- DSMTypeTuple - Heterogeneous tuples (tuple<T1, T2, T3>)
- DSMTypeVariant - Sum types (variant<T1, T2, T3>)
- DSMTypeXArray - CRDT arrays (xarray<T>)
- DSMTypeKey - Key types for attachments (key<Concept>)
- DSMTypeVec - Mathematical vectors (vec<float, 3> for vec3)
- DSMTypeMat - Matrices (mat<float, 3, 3> for mat3)
Key Pattern: Composite + Strategy - Recursive type composition
Example: vector<optional<map<int64, string>>> is composed as:
DSMTypeVector
└─ elementType: DSMTypeOptional
└─ valueType: DSMTypeMap
├─ keyType: DSMTypeReference (int64)
└─ valueType: DSMTypeReference (string)
Three representation modes:
- representation() - Fully qualified (MyNamespace::MyType)
- representationIn(namespace) - Relative to namespace (MyType if same namespace)
- representationGl(inspector) - Global from root namespace (legacy naming, should be representationInRoot)
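Recursive composition and namespace-relative rendering can be sketched with two small classes. TypeRef and Generic are hypothetical names used only for this illustration; they model the first two representation modes and are not the DSMType hierarchy itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeRef:
    """Leaf node: a primitive (empty namespace) or a named type."""
    namespace: str
    name: str

    def representation(self):
        return f"{self.namespace}::{self.name}" if self.namespace else self.name

    def representation_in(self, namespace):
        # Drop the qualifier for primitives and for types of the same namespace
        return self.name if self.namespace in ("", namespace) else self.representation()


@dataclass(frozen=True)
class Generic:
    """Composite node: vector<T>, optional<T>, map<K, V>, ..."""
    kind: str
    args: tuple

    def representation(self):
        inner = ", ".join(a.representation() for a in self.args)
        return f"{self.kind}<{inner}>"

    def representation_in(self, namespace):
        inner = ", ".join(a.representation_in(namespace) for a in self.args)
        return f"{self.kind}<{inner}>"


map_type = Generic("map", (TypeRef("", "int64"), TypeRef("Geo", "Point")))
nested = Generic("vector", (Generic("optional", (map_type,)),))
print(nested.representation())
print(nested.representation_in("Geo"))
```

Each composite delegates to its children, so arbitrarily deep nesting needs no special handling.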
5. Semantic Validation Sub-domain¶
Purpose: Detect errors beyond syntax
Base Class:
- DSMP::Checker - Template Method pattern (abstract check(report))
18+ Concrete Checkers (organized by category):
Category 1: Reserved Identifiers (2 checkers)
- CheckerDefinitionIdentifier - Validate identifier format
- CheckerIdentifierReserved - Check reserved words (permissive: allows C++ keywords)
Category 2: Already Defined (7 checkers)
- CheckerNameSpaceDefinitionIdentifier - Unique names within namespace
- CheckerAttachmentAlreadyDefined - No duplicate attachments
- CheckerEnumerationCaseAlreadyDefined - Unique enum cases
- CheckerStructureFieldAlreadyDefined - Unique struct fields
- CheckerFunctionAlreadyDefined - Unique function names in pool
- CheckerFunctionParameterAlreadyDefined - Unique parameter names
- CheckerPoolIdentifierAlreadyDefined - Unique pool identifiers
Category 3: Recursive Definitions (3 checkers)
- CheckerConceptRecursive - Detect circular inheritance (A → B → C → A)
- CheckerNameSpaceRecursive - Detect namespace cycles
- CheckerTypeStructureRecursive - Detect recursive structures
Category 4: Type Validation (6 checkers)
- CheckerTypeReference - Resolve all type references (must run before advanced checkers)
- CheckerTypeVariantDuplicated - No duplicate variant members
- CheckerTypeVariantTooMany - Limit variant size (implementation constraint)
- CheckerEnumerationCaseTooMany - Limit enum size
- CheckerStructureFieldEmpty - Structures must have at least one field
- CheckerStructureFieldDefaultValue - Validate literal default values
Key Patterns:
- Chain of Responsibility - Sequential execution with early exit
- Dependency Ordering - Checkers run in precise order (identifiers → namespaces → types → structures)
Cycle Detection Algorithm (example from CheckerConceptRecursive):
std::unordered_set<std::string> visited;
auto w_concept = concept_;
while (w_concept->tokenIsa) { // Has parent
    auto parent = w_concept->getParentIdentifier();
    if (contains(visited, parent)) {
        report_error("Recursive concept detected");
        return;
    }
    visited.insert(parent);
    w_concept = inspector->query(parent);
}
6. Literals Sub-domain¶
Purpose: Represent default values in DSM schemas
Components:
- DSMLiteral - Base class for literal values
- DSMLiteralValue - Primitive literals (42, "hello", true)
- DSMLiteralList - Collection literals ([1, 2, 3])
- DSMLiteralDomain - Domain-specific literals (UUIDs, etc.)
Usage: Structure fields can have default values:
struct Config {
    int64 timeout = 30;
    string name = "default";
    bool enabled = true;
};
7. Error Handling Sub-domain¶
Purpose: Report parsing and validation errors
Components:
- DSMParseReport - Accumulator for all errors
- DSMParseError - Single error with file/line/column/message
- DSMErrors - Exception types for programmer errors
Error Reporting Pattern:
// Syntax errors (Phase 1)
DSMParseError::make(part, token, "Missing semicolon");
// Semantic errors (Phase 3)
report->add(DSMParseError::make(part, token, "Duplicate name 'MyType'"));
// Programmer errors (API misuse)
throw DSMErrors::missingRuntimeId(component, ctx, type, representation);
Design Decision: Parsing/validation errors → DSMParseReport (accumulated), API misuse → exceptions (fail fast).
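The split between accumulated errors and exceptions can be sketched as follows. All names here (ParseError, ParseReport, MissingRuntimeId, runtime_id_of) are hypothetical Python stand-ins chosen for illustration; they only mirror the roles of DSMParseError, DSMParseReport, and DSMErrors.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParseError:
    file: str
    line: int
    column: int
    message: str


@dataclass
class ParseReport:
    """Accumulates user-facing errors instead of raising on the first one."""
    _errors: list = field(default_factory=list)

    def add(self, error: ParseError) -> None:
        self._errors.append(error)

    def has_error(self) -> bool:
        return bool(self._errors)

    def errors(self) -> tuple:
        return tuple(self._errors)


class MissingRuntimeId(RuntimeError):
    """Programmer error: a definition was used before its id was assigned."""


def runtime_id_of(definition: dict) -> str:
    if definition.get("runtime_id") is None:
        raise MissingRuntimeId(f"no runtime id for {definition['name']}")
    return definition["runtime_id"]


report = ParseReport()
report.add(ParseError("schema.dsm", 3, 1, "Missing semicolon"))
report.add(ParseError("schema.dsm", 7, 5, "Duplicate name 'MyType'"))
for e in report.errors():
    print(f"{e.file}:{e.line}:{e.column} - {e.message}")
```

Schema mistakes end up in the report so the author sees all of them at once, while runtime_id_of() raises immediately because a missing id can only come from incorrect calling code.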
3.2 Key Components (Entry Points)¶
| Component | Purpose | Entry Point File |
|---|---|---|
| DSMHelper | Facade for all DSM operations | Viper_DSMHelper.hpp |
| DSMBuilder | Progressive file assembly | Viper_DSMBuilder.hpp |
| DSMDefinitions | Immutable container | Viper_DSMDefinitions.hpp |
| DSMDefinitionsInspector | Repository for queries | Viper_DSMDefinitionsInspector.hpp |
| DSMConcept | Concept entity | Viper_DSMConcept.hpp |
| DSMStructure | Structure entity | Viper_DSMStructure.hpp |
| DSMType (base) | Type hierarchy root | Viper_DSMType.hpp |
| DSMTypeVector | Generic vector type | Viper_DSMTypeVector.hpp |
| DSMParseReport | Error accumulator | Viper_DSMParseReport.hpp |
| DSMP::Checker | Validation base class | Viper_DSMP_Checker.hpp |
3.3 Component Map (Visual)¶
┌─────────────────────────────────────────────────────────────────┐
│ DSM Domain Architecture │
└─────────────────────────────────────────────────────────────────┘
User .dsm Files
│
↓
┌──────────────────┐
│ DSMBuilder │ ← Facade: DSMHelper::assemble(path)
│ (Builder) │
│ - append() │
│ - parse() │
└────────┬─────────┘
│ Concatenated DSM text
↓
┌──────────────────────────────────────────────────────────────────┐
│ 4-Phase Pipeline │
├──────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: ANTLR Parsing │
│ ┌────────────────┐ │
│ │ ANTLR Lexer │ → TokenStream → ANTLR Parser → ParseTree │
│ └────────────────┘ │
│ │ │
│ ↓ (if syntax error) │
│ ┌────────────────┐ │
│ │ ErrorListener │ → DSMParseReport (syntax errors) │
│ └────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────── │
│ │
│ Phase 2: AST Traversal │
│ ┌────────────────┐ │
│ │ DSMP::Listener │ → DSMP::Definitions (internal mutable AST) │
│ └────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────── │
│ │
│ Phase 3: Semantic Validation (18+ checkers) │
│ ┌──────────────────────────────────────────┐ │
│ │ CheckerDefinitionIdentifier │ │
│ │ CheckerIdentifierReserved │ │
│ ├──────────────────────────────────────────┤ │
│ │ CheckerNameSpaceSameName │ │
│ │ CheckerNameSpaceSameUUID │ │
│ │ CheckerNameSpaceDefinitionIdentifier │ │
│ ├──────────────────────────────────────────┤ │
│ │ CheckerTypeReference (before advanced) │ ← Resolve types │
│ ├──────────────────────────────────────────┤ │
│ │ CheckerConceptRecursive │ │
│ │ CheckerStructureFieldAlreadyDefined │ │
│ │ CheckerTypeStructureRecursive │ │
│ │ ... 10+ more checkers │ │
│ └────────────┬─────────────────────────────┘ │
│ │ │
│ ↓ (if semantic error) │
│ DSMParseReport (semantic errors) │
│ │
│ ───────────────────────────────────────────────────────────── │
│ │
│ Phase 4: Conversion + RuntimeID Assignment │
│ ┌────────────────┐ │
│ │ DSMP::Converter│ → DSMDefinitions (public immutable) │
│ └────────┬───────┘ │
│ │ │
│ ↓ │
│ ┌──────────────────────────┐ │
│ │ DSMDefinitionsToDefinitions│ → Definitions (runtime types) │
│ └────────┬─────────────────┘ │
│ │ │
│ ↓ (assign RuntimeIDs via DefinitionsInspector) │
│ DSMDefinitions (with RuntimeIDs) │
│ │
└────────────────────────────────────────────────────────────────┘
│
↓
┌──────────────────────────┐
│ DSMDefinitionsInspector │ ← Repository (O(log n) lookups)
│ (Repository) │
│ - queryConcept() │
│ - queryStructure() │
│ - checkConcept() │
└──────────────────────────┘
│
↓
┌──────────────────────────┐
│ Consumers │
│ - Kibo code generator │ ← Binary serialization (.dsmb)
│ - Database creation │ ← DSMHelper::createStoreDatabase()
│ - IDE tools │ ← Inspector queries
└──────────────────────────┘
Legend:
- → Data flow
- ↓ Process flow
- ┌─┐ Component boundary
- ├─┤ Checker category boundary
Key Design Decisions:
- Dual AST Representation (DSMP::Definitions vs DSMDefinitions)
  - Why: DSMP::Definitions preserves parser metadata (tokens, positions) for error reporting
  - Why: DSMDefinitions provides clean API without parser artifacts
  - Trade-off: Extra conversion cost vs separation of concerns
- RuntimeID Assigned Post-Conversion (mutable field)
  - Why: RuntimeID depends on DefinitionsInspector (type resolution)
  - Why: Can't compute during parsing (circular dependency: need Definitions to create Inspector, need Inspector to compute RuntimeID)
  - Trade-off: One mutable field vs cleaner initialization
- Sequential Checker Execution with Early Exit
  - Why: Save work if early checkers find errors (fail fast)
  - Why: Dependency ordering (must resolve types before checking recursion)
  - Trade-off: Cannot parallelize checkers vs correct validation order
4. Developer Usage Patterns (Practical)¶
4.1 Core Scenarios¶
Each scenario extracted from real test code.
Scenario 1: Basic DSM Parsing Workflow¶
When to use: Parse .dsm schema files into validated definitions
Test source: test_dsm_builder.py:201-221 → TestDSMBuilderParse::test_parse_valid_dsm
from dsviper import DSMBuilder
# Build: append DSM content
builder = DSMBuilder()
builder.append("test.dsm", """
namespace Test {00000000-0000-0000-0000-000000000001} {
    struct Point {
        float x;
        float y;
    };
};
""")
# Parse: run the 4-phase pipeline
# (lexing + parsing → AST traversal → validation → conversion)
report, dsm, defs = builder.parse()
# Check validation results
if report.has_error():
    for error in report.errors():
        print(f"{error.file()}:{error.line()} - {error.message()}")
else:
    # Access parsed DSMDefinitions
    structs = list(dsm.structures())
    print(f"Parsed {len(structs)} structures")
    # Result: 1 structure (Point with 2 fields)
Key APIs: DSMBuilder(), builder.append(), builder.parse(), report.has_error(), dsm.structures()
Pattern Illustrated: 4-phase pipeline (Build → Parse → Validate → Convert)
Scenario 2: Load from File or Directory¶
When to use: Load .dsm schemas from filesystem
Test source: test_dsm_builder.py:137-150 → TestDSMBuilderAssemble::test_assemble_from_file
from dsviper import DSMBuilder
# Load single file
builder = DSMBuilder.assemble("schema.dsm")
# Or load entire directory (all *.dsm files)
builder = DSMBuilder.assemble("schemas/")
# Parse assembled content
report, dsm, defs = builder.parse()
# Access file provenance
for part in builder.parts():
    print(f"Loaded {part.name()} ({part.line_count()} lines)")
Key APIs: DSMBuilder.assemble(path)
Pattern Illustrated: Facade pattern (DSMHelper simplifies file I/O + builder construction)
Scenario 3: Query DSM Definitions¶
When to use: Navigate parsed schemas programmatically
Test source: test_dsm_inspector.py:59-70 → TestDSMInspectorConcepts::test_query_concept_exists
from dsviper import DSMDefinitionsInspector
# Create inspector (Repository)
inspector = DSMDefinitionsInspector(dsm)
# Get all concept names
concept_names = inspector.concept_type_names()
for tn in concept_names:
    print(f"Concept: {tn.name()}")
# Query specific concept by TypeName
concept_a_tn = next(tn for tn in concept_names if tn.name() == "ConceptA")
concept = inspector.query_concept(concept_a_tn)
if concept:
    print(f"Found: {concept.type_name().name()}")
    if concept.parent():
        print(f"  Inherits from: {concept.parent().type_name().name()}")
# Use check_concept() for assertions (throws if not found)
concept = inspector.check_concept(concept_a_tn)  # Exception if missing
Key APIs: DSMDefinitionsInspector(dsm), concept_type_names(), query_concept(), check_concept()
Pattern Illustrated: Repository pattern (O(log n) lookups via internal maps)
Scenario 4: Navigate Composite Types¶
When to use: Analyze generic types (vector, map, optional, etc.)
Test source: test_dsm_types.py:219-234 → TestDSMTypeVector::test_vector_nested
from dsviper import DSMTypeVector, DSMTypeReference
# DSM schema: vector<vector<int64>>
struct = next(s for s in dsm.structures() if s.type_name().name() == "NestedLevel2")
field = next(f for f in struct.fields() if f.name() == "f_vector_of_vectors")
field_type = field.type()
# Navigate type tree (Composite pattern)
assert isinstance(field_type, DSMTypeVector)
print(f"Outer type: {field_type.representation()}") # "vector<vector<int64>>"
# Recurse into element type
inner_vec = field_type.element_type()
assert isinstance(inner_vec, DSMTypeVector)
print(f"Inner type: {inner_vec.representation()}") # "vector<int64>"
# Leaf type
elem_type = inner_vec.element_type()
assert isinstance(elem_type, DSMTypeReference)
print(f"Element type: {elem_type.type_name().name()}") # "int64"
Key APIs: type.element_type(), type.representation(), isinstance(type, DSMTypeVector)
Pattern Illustrated: Composite pattern (recursive type tree navigation)
Scenario 5: Detect Semantic Errors¶
When to use: Validate DSM schemas for semantic correctness
Test source: test_dsm_semantic_errors.py:41-53 → test_checker_namespace_definition_duplicate_concept
from dsviper import DSMBuilder
# DSM with duplicate concept names
builder = DSMBuilder()
builder.append("test", '''
namespace Test {00000000-0000-0000-0000-000000000001} {
concept A;
concept A; # Duplicate!
};
''')
report, dsm, defs = builder.parse()
# Check for errors
if report.has_error():
    errors = list(report.errors())
    for error in errors:
        print(f"Error at {error.file()}:{error.line()}")
        print(f"  {error.message()}")
        # Output: "Identifier 'A' already used in namespace 'Test'"
Key APIs: report.has_error(), report.errors(), error.message(), error.line()
Pattern Illustrated: Chain of Responsibility (CheckerNameSpaceDefinitionIdentifier detects duplicate)
Scenario 6: Detect Circular Inheritance¶
When to use: Validate concept inheritance hierarchies
Test source: test_dsm_semantic_errors.py:311-324 → test_checker_concept_recursive_indirect_cycle
from dsviper import DSMBuilder
# DSM with circular inheritance
builder = DSMBuilder()
builder.append("test", '''
namespace Test {00000000-0000-0000-0000-000000000001} {
concept A is a B;
concept B is a C;
concept C is a A; # Cycle: A→B→C→A
};
''')
report, dsm, defs = builder.parse()
# CheckerConceptRecursive detects cycle
assert report.has_error()
error = list(report.errors())[0]
assert "recursive" in error.message().lower()
print(f"Detected: {error.message()}")
# Output: "The concept Test::A is recursive with Test::C"
Key APIs: report.has_error(), error.message()
Pattern Illustrated: Cycle detection algorithm (visited set in CheckerConceptRecursive)
Scenario 7: Binary Serialization¶
When to use: Export DSMDefinitions for kibo code generator
Test source: test_dsm_serialization.py:28-44 → test_round_trip_definitions_count
from dsviper import DSMDefinitions
# Encode DSMDefinitions to binary (.dsmb format)
blob = dsm.encode() # Returns ValueBlob (StreamTokenBinaryCodec)
# Save to file (for kibo code generator)
# Note: DSMHelper::save() handles this in C++
# Decode back from binary
dsm2 = DSMDefinitions.decode(blob)
# Verify round-trip integrity
assert len(list(dsm.concepts())) == len(list(dsm2.concepts()))
assert len(list(dsm.structures())) == len(list(dsm2.structures()))
assert len(list(dsm.enumerations())) == len(list(dsm2.enumerations()))
# Result: Perfect preservation
Key APIs: dsm.encode(), DSMDefinitions.decode(blob)
Pattern Illustrated: Codec pattern (binary serialization for inter-process communication)
4.2 Integration Patterns¶
Multi-File Projects¶
Pattern: Load all .dsm files from a directory
# Assemble all *.dsm files in project
builder = DSMBuilder.assemble("project/schemas/")
# Files are concatenated in filesystem order
# Each file tracked as separate DSMBuilderPart (for error reporting)
report, dsm, defs = builder.parse()
Use case: Large projects with modular schemas (e.g., concepts.dsm, structures.dsm, attachments.dsm)
Error Reporting with File Context¶
Pattern: Show user-friendly errors with file/line/column
report, dsm, defs = builder.parse()
if report.has_error():
    for error in report.errors():
        # Get file provenance
        part = builder.part(error.line())
        print(f"File: {part.name()}")
        print(f"Line: {error.line()} Column: {error.column()}")
        print(f"Error: {error.message()}")
        print()
Use case: IDE integration, command-line tools (dsm_util.py check)
Database Creation from DSM¶
Pattern: Initialize Database with schema
from dsviper import DSMBuilder, DSMHelper
# Parse DSM schema
builder = DSMBuilder.assemble("schema.dsm")
report, dsm, defs = builder.parse()
if not report.has_error():
    # Create Database with schema
    db = DSMHelper.create_store_database(
        "data.db",        # SQLite file
        dsm,              # DSMDefinitions
        "Documentation"   # Optional docs
    )
    # Database is initialized with:
    # - Concept types (TypeConcept instances)
    # - Structure types (TypeStructure instances)
    # - Attachment schemas (key→data mappings)
Use case: Application initialization, schema migration
See also: DSMHelper::createCommitDatabase() for CommitDatabase schemas
4.3 Test Suite Reference¶
Full test coverage: python/tests/unit/test_dsm*.py (10 files, 4539 lines, ~366 tests)
| Test File | Tests | Lines | Focus |
|---|---|---|---|
| test_dsm_builder.py | ~40 | 424 | Builder/append/parse workflow |
| test_dsm_definitions.py | ~30 | 414 | DSMDefinitions content access |
| test_dsm_functions.py | ~25 | 381 | Function/FunctionPool definitions |
| test_dsm_inspector.py | ~35 | 381 | Querying/navigating definitions |
| test_dsm_literals.py | ~15 | 171 | Default values |
| test_dsm_semantic_errors.py | ~90 | 1186 | Exhaustive validation (18+ checkers) |
| test_dsm_serialization.py | ~1 | 48 | Binary encoding/decoding |
| test_dsm_structures.py | ~40 | 515 | Concepts/Clubs/Structures/Attachments |
| test_dsm_types.py | ~80 | 889 | Complete type system (11 categories) |
| test_definitions.py | ~10 | 130 | Definitions conversion |
Total: ~366 tests, 4539 lines
Special test files:
- test_dsm_semantic_errors.py (1186 lines) - Tests all 18+ checkers exhaustively
- test_dsm_types.py (889 lines) - Tests all 11 type categories (primitives, generics, mathematical)
Test DSM schemas:
- test.dsm - Basic DSM constructs
- test_comprehensive.dsm - Comprehensive type coverage (used by most tests)
5. Technical Constraints¶
Performance Considerations¶
- O(log n) Lookups via Maps
  - DSMDefinitionsInspector uses std::map<TypeName, shared_ptr<T>> for all entity types
  - Queries like queryConcept(typeName) are O(log n) (map lookup), not O(n) (linear search)
  - Critical for Phase 3 validation (18+ checkers × hundreds of type lookups)
  - Trade-off: Extra memory (maps duplicate vectors) vs fast lookups
- Single-Pass ANTLR Parsing
  - ANTLR parser is LL(*) with adaptive prediction (no backtracking)
  - DSM grammar designed for efficient parsing (no ambiguity)
  - Rationale: Fast parsing for large .dsm files (thousands of definitions)
- Early Exit on Validation Errors
  - Semantic checkers execute sequentially, exit on first error batch
  - Example: If Phase 1 has syntax errors, Phases 2-4 are skipped
  - Example: If CheckerTypeReference fails, advanced checkers are skipped (need resolved types)
  - Rationale: Save work, report errors early (fail fast)
- Immutable Value Objects
  - All DSM entities (DSMConcept, DSMStructure, etc.) have const fields
  - Shared via shared_ptr (reference counting, no copies)
  - Rationale: Thread-safe sharing across parsers, code generators, validators
Thread Safety¶
Immutable components (thread-safe for reading):
- DSMDefinitions (after construction + RuntimeID assignment)
- DSMConcept, DSMStructure, DSMEnumeration, etc. (all const fields)
- DSMDefinitionsInspector (reads DSMDefinitions, no mutation)
Mutable components (not thread-safe):
- DSMBuilder (progressive append, internal state changes)
- DSMParseReport (accumulates errors during validation)
- DSMP::Definitions (internal AST, mutated during conversion)
Concurrency pattern:
- Parse in single thread (DSMBuilder → DSMParseReport → DSMDefinitions)
- Share results across threads (DSMDefinitions is immutable)
Example:
// Thread 1: Parse
auto builder = DSMBuilder::make();
builder->append("schema.dsm", content);
auto report = DSMParseReport::make();
std::shared_ptr<DSMDefinitions> dsm;
std::shared_ptr<Definitions> defs;
DSMHelper::parse(builder, report, dsm, defs);
// Thread 2+: Read (safe, dsm is immutable)
auto inspector = DSMDefinitionsInspector::make(dsm);
auto concept_ = inspector->queryConcept(typeName); // "concept" is a keyword since C++20
Error Handling¶
Parse/Validation Errors (accumulated in DSMParseReport):
- Syntax errors (Phase 1): Missing semicolons, invalid tokens
- Semantic errors (Phase 3): Duplicate names, circular inheritance, invalid types
- Pattern: Accumulate all errors, return report (don't throw)
- Rationale: Show all errors to user (not just first one)
Programmer Errors (exceptions):
- DSMErrors::missingRuntimeId - DSMDefinitions used before RuntimeID assigned
- inspector->checkConcept(typeName) - TypeName not found (assertion failure)
- Pattern: Throw exception immediately (fail fast)
- Rationale: Indicate API misuse (should never happen in correct code)
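The two policies above (accumulate user errors, throw on API misuse) can be contrasted in a few lines. This is a sketch under stated assumptions: `ParseReport`, `checkDuplicates`, and `checkedLookup` are hypothetical names, and `std::logic_error` stands in for whatever exception type the real code throws.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for DSMParseReport.
struct ParseReport {
    std::vector<std::string> errors;
};

// User-facing validation: record every duplicated name and keep going.
void checkDuplicates(const std::vector<std::string>& names, ParseReport& report) {
    std::map<std::string, int> seen;
    for (const std::string& name : names) {
        if (++seen[name] == 2) {  // report each duplicated name once
            report.errors.push_back("Identifier '" + name + "' already used");
        }
    }
}

// Programmer-facing lookup: a missing name means the caller broke a
// precondition, so the function throws instead of recording an error.
int checkedLookup(const std::map<std::string, int>& table, const std::string& name) {
    assert(!name.empty());  // an empty name is always a caller bug
    auto it = table.find(name);
    if (it == table.end()) {
        throw std::logic_error("unknown type name: " + name);
    }
    return it->second;
}
```

With the input `A, B, A, C, B, A` the report ends up with two entries, one for `A` and one for `B`, so the user sees every problem in a single pass.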
Exception Safety:
- DSM parsing is exception-safe (RAII with shared_ptr)
- If ANTLR throws (e.g., out of memory), partial state is cleaned up automatically
- No resource leaks (all allocations via smart pointers)
Error Message Format:
File: schema.dsm
Line: 42
Column: 15
Message: Identifier 'MyType' already used in namespace 'MyNamespace'
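A formatter that produces the four-line layout shown above might look like the following. The `Diagnostic` struct and its field names are assumptions made for this sketch; the real report type may store positions differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical diagnostic record; field names are illustrative only.
struct Diagnostic {
    std::string file;
    int line;
    int column;
    std::string message;
};

// Renders one diagnostic in the File/Line/Column/Message layout.
std::string formatDiagnostic(const Diagnostic& d) {
    assert(d.line >= 0 && d.column >= 0);  // positions are never negative
    return "File: " + d.file + "\n"
         + "Line: " + std::to_string(d.line) + "\n"
         + "Column: " + std::to_string(d.column) + "\n"
         + "Message: " + d.message;
}
```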
Memory Model¶
Reference Semantics (Viper standard):
- All DSM entities shared via std::shared_ptr<T>
- Example: std::shared_ptr<DSMConcept>, std::shared_ptr<DSMStructure>
- Rationale: Shared ownership (parsers, code generators, inspectors all hold references)
Const Fields (immutability):
- All fields const (except runtimeId which is assigned post-construction)
- Example: TypeName const typeName, std::vector<...> const fields
- Rationale: Thread-safe sharing, prevent accidental mutation
RuntimeID Exception (mutable field):
class DSMConcept {
TypeName const typeName; // Immutable
std::shared_ptr<DSMTypeReference> const parent; // Immutable
UUId runtimeId; // Mutable (assigned in Phase 4)
};
Why mutable? RuntimeID depends on DefinitionsInspector, which creates a circular dependency: Definitions are needed to create the Inspector, and the Inspector is needed to compute the RuntimeID. The field is therefore assigned in Phase 4, after the Definitions have been created.
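One way to picture "const fields plus a single late-assigned field" is the sketch below. It is not the DSMConcept source: `ConceptSketch` is a hypothetical class, a `std::string` replaces the real UUId type, and `std::logic_error` stands in for an error such as DSMErrors::missingRuntimeId.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for DSMConcept.
class ConceptSketch {
public:
    explicit ConceptSketch(std::string name) : typeName(std::move(name)) {}

    const std::string typeName;  // immutable after construction

    // Called once, after the inspector exists (Phase 4 in the pipeline).
    void assignRuntimeId(std::string id) {
        assert(!id.empty());
        if (runtimeId_) {
            throw std::logic_error("runtimeId already assigned");
        }
        runtimeId_ = std::move(id);
    }

    // Reading before assignment is API misuse, so it throws.
    const std::string& runtimeId() const {
        if (!runtimeId_) {
            throw std::logic_error("missing runtimeId");
        }
        return *runtimeId_;
    }

private:
    std::optional<std::string> runtimeId_;  // the only mutable state
};
```

Guarding the field with `std::optional` is a design choice of this sketch: it makes both early reads and double assignment detectable, at the cost of one extra flag per object.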
Memory Lifetime:
- DSMDefinitions lives as long as any consumer holds shared_ptr
- Typical pattern: Parse → hold DSMDefinitions in Database/CodeGenerator → release after use
- ANTLR AST (ParseTree) destroyed immediately after Phase 2 (not retained)
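The lifetime rule above (the definitions live as long as any consumer holds a reference) follows directly from `std::shared_ptr` semantics. The sketch below walks through parse, hold, and release with stand-in types; `Definitions`, `Consumer`, and `aliveUntilLastRelease` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for DSMDefinitions.
struct Definitions {
    std::string name;
};

// A consumer such as a database or code generator holds a shared reference.
struct Consumer {
    std::shared_ptr<const Definitions> defs;
};

// Walks through parse -> hold -> release and reports whether the object
// stayed alive until the last holder let go.
bool aliveUntilLastRelease() {
    std::shared_ptr<const Definitions> parsed =
        std::make_shared<Definitions>(Definitions{"schema"});
    std::weak_ptr<const Definitions> observer = parsed;  // observes without owning
    assert(parsed.use_count() == 1);

    Consumer database{parsed};
    Consumer generator{parsed};
    parsed.reset();  // the parsing code drops its reference
    bool aliveWhileHeld = !observer.expired();

    database.defs.reset();
    bool aliveWithOneHolder = !observer.expired();

    generator.defs.reset();  // the last owner releases
    bool destroyedAtEnd = observer.expired();

    return aliveWhileHeld && aliveWithOneHolder && destroyedAtEnd;
}
```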
Type System Integration¶
DSM Types → Viper Types (conversion in Phase 4):
- DSMConcept → TypeConcept (via DSMDefinitionsToDefinitions::convert())
- DSMStructure → TypeStructure
- DSMEnumeration → TypeEnumeration
- DSMClub → TypeClub
RuntimeID Assignment (deterministic):
- RuntimeID computed from definition content (DSM namespace UUID + definition structure)
- Same DSM input → same RuntimeID (deterministic)
- Use case: Schema evolution (detect unchanged definitions)
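The property "same input, same identifier" can be illustrated with any deterministic hash over the namespace UUID and a canonical description of the definition. The sketch below uses the 64-bit FNV-1a hash purely as a stand-in: the source does not say which function derives the real RuntimeID, and `sketchRuntimeId` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a 64-bit hash, used here only as an illustrative stand-in.
std::uint64_t fnv1a64(const std::string& bytes) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : bytes) {
        hash ^= c;
        hash *= 1099511628211ULL;  // FNV prime
    }
    return hash;
}

// Hypothetical: derive an identifier from the namespace UUID text and a
// canonical description of the definition. No randomness or clock is used,
// so equal inputs always give equal identifiers.
std::uint64_t sketchRuntimeId(const std::string& namespaceUuid,
                              const std::string& canonicalDefinition) {
    assert(!namespaceUuid.empty());
    return fnv1a64(namespaceUuid + '\n' + canonicalDefinition);
}
```

Because the identifier depends only on content, comparing identifiers across two schema versions is enough to spot definitions that did not change, which is the schema-evolution use case mentioned above.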
Type Resolution:
- Phase 3 CheckerTypeReference resolves all optional<TypeName> in AST
- After Phase 3, all type references are valid (or error reported)
- Phase 4 conversion assumes resolved types (no further validation)
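The resolution step described above can be sketched as a pass that fills an `optional` for every reference it can resolve and reports the rest. This is a simplified model, not the CheckerTypeReference code: `TypeRef` and `resolveAll` are hypothetical, and namespaces are ignored.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical AST node: the name as written, plus the resolved name once found.
struct TypeRef {
    std::string written;
    std::optional<std::string> resolved;
};

// Fills `resolved` for every reference whose name is defined, and returns
// one error message per reference that could not be resolved.
std::vector<std::string> resolveAll(std::vector<TypeRef>& refs,
                                    const std::set<std::string>& defined) {
    std::vector<std::string> errors;
    for (TypeRef& ref : refs) {
        assert(!ref.resolved.has_value());  // each reference is resolved at most once
        if (defined.count(ref.written) > 0) {
            ref.resolved = ref.written;
        } else {
            errors.push_back("Unknown type '" + ref.written + "'");
        }
    }
    return errors;
}
```

After the pass, an empty error list means every `resolved` field holds a value, which is the guarantee a later conversion step can rely on without validating again.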
6. Cross-References¶
Related Documentation¶
DSM Language (for users writing .dsm files):
- doc/DSM.md - DSM language specification (concepts, clubs, structures, attachments, syntax)
- doc/Getting_Started.md - Tutorial for writing .dsm schema files
- doc/Getting_Started_With_dsm_util.py.md - Command-line tool guide (dsm_util.py check/encode/decode)
Implementation Details:
- doc/Internal_Viper.md (section "Parsing DSM Definitions") - ANTLR parser architecture, listener pattern, semantic checkers
- doc/Kibo.md - Kibo code generator manual (consumes .dsmb binary files)
- doc/Kibo_Template_Model.md - Template model for code generation from DSMDefinitions
Related Domains:
- doc/domains/Type_And_Value_System.md - Foundation for DSM type representation
- doc/domains/Database.md - Uses DSMDefinitions for schema initialization
- doc/domains/Commit_System.md - Uses DSMCommitFunctionPool for mutations
Dependencies¶
This domain USES:
- Type System (Foundation Layer 0) - TypeName representation for DSM types (22 includes: Viper_TypeName.hpp)
- UUId (Foundation Layer 0) - Namespace UUIDs, RuntimeID generation (15 includes: Viper_UUId.hpp)
- NameSpace (Foundation Layer 0) - Namespace organization (6 includes: Viper_NameSpace.hpp)
- Error (Foundation Layer 0) - Exception types (2 includes: Viper_Error.hpp)
- Blob (Foundation Layer 0) - Binary serialization storage (2 includes: Viper_Blob.hpp)
This domain is USED BY:
- Database (Functional Layer 1) - DSMHelper::createStoreDatabase() initializes schemas from DSMDefinitions
- CommitDatabase (Functional Layer 1) - DSMHelper::createCommitDatabase() initializes schemas from DSMDefinitions
- Kibo code generator (External tool, Java) - Consumes binary .dsmb files (DSMDefinitionsEncoder output)
- StringHelper (Foundation Layer 0) - Pretty-printing DSM types via DSMLiteralDomain, DSMTypeReferenceDomain
Key Type References¶
C++ Headers (125 files):
- Main components (53 files): src/Viper/Viper_DSM*.hpp (DSMBuilder, DSMDefinitions, DSMType, etc.)
- Parser/Checker internals (72 files): src/Viper/Viper_DSMP_*.hpp (DSMP::Listener, DSMP::Checker, etc.)
C++ Implementations (117 files):
- src/Viper/Viper_DSM*.cpp
- Notable: Viper_DSMHelper.cpp (Facade implementation, 4-phase pipeline)
Python Bindings (31 files):
- src/P_Viper/P_Viper_DSM*.cpp (DSMBuilder, DSMDefinitions, DSMType*, etc.)
- Entry points: P_Viper_DSMBuilder.cpp (assemble/parse), P_Viper_DSMDefinitions.cpp (encode/decode)
Python Type Hints:
- dsviper_wheel/__init__.pyi - Type stubs for DSMBuilder, DSMDefinitions, DSMType*, etc.
ANTLR Grammar:
- dsm/DSM.g4 - DSM language grammar (generates ANTLR lexer/parser)
- Generated files: DSMLexer.h, DSMParser.h (C++ ANTLR output)
Test Files (10 files, 4539 lines):
- python/tests/unit/test_dsm_builder.py (424 lines)
- python/tests/unit/test_dsm_semantic_errors.py (1186 lines) - Exhaustive checker coverage
- python/tests/unit/test_dsm_types.py (889 lines) - Complete type system coverage
- See Section 4.3 for full test suite reference
Tools:
- tools/dsm_util.py - Command-line tool (check/encode/decode/generate)
- tools/kibo-1.2.0.jar - Code generator (consumes .dsmb binary files)
Document Metadata¶
Methodology Version: v1.3.1
Generated Date: 2025-11-14
Last Updated: 2025-11-14
Review Status: ✅ Complete
Test Files Analyzed: 10 files
- test_dsm_builder.py (424 lines, ~40 tests)
- test_dsm_definitions.py (414 lines, ~30 tests)
- test_dsm_functions.py (381 lines, ~25 tests)
- test_dsm_inspector.py (381 lines, ~35 tests)
- test_dsm_literals.py (171 lines, ~15 tests)
- test_dsm_semantic_errors.py (1186 lines, ~90 tests)
- test_dsm_serialization.py (48 lines, ~1 test)
- test_dsm_structures.py (515 lines, ~40 tests)
- test_dsm_types.py (889 lines, ~80 tests)
- test_definitions.py (130 lines, ~10 tests)
Test Coverage: 4539 lines, 366+ tests
Golden Examples: 7 scenarios extracted
C++ Files: 242 (125 headers + 117 implementations)
Python Bindings: 31 files
Changelog:
- v1.0 (2025-11-14): Initial documentation following /document-domain v1.3.1 methodology
- Phase 0.5 audit: Identified 7 sub-domains, 34 main components, 18+ semantic checkers
- Phase 0.75 C++ analysis: 9 design patterns identified, dual AST representation understood
- Phase 1-3 validation: 7 golden scenarios extracted from tests, user validated structure
- Phase 5 implementation: 6 sections completed (~800 lines)
- Special focus: 4-phase pipeline, semantic validation (18+ checkers), composite type system (11 categories)
- Correction: representationGl() documented as Global (root namespace), not GLSL
Regeneration Trigger:
- When /document-domain methodology reaches v2.0 (methodology changes)
- When DSM C++ API changes significantly (e.g., new checker categories, AST restructuring)
- When ANTLR grammar changes require documentation updates (e.g., new DSM language features)
Appendix: Domain Statistics¶
C++ Files: 242 (125 headers + 117 implementations)
- Main components: 53 (Viper_DSM*.hpp)
- Parser/Checker internals: 72 (Viper_DSMP_*.hpp)
Python Bindings: 31 files (P_Viper_DSM*.cpp)
Test Files: 10 files (4539 lines)
Sub-domains: 7
1. Builder/Parser (3 components)
2. Definitions (5 components)
3. DSM Entities (8 components)
4. DSM Type System (11 categories)
5. Semantic Validation (18+ checkers)
6. Literals (3 components)
7. Error Handling (3 components)
Design Patterns: 9
1. Facade (DSMHelper)
2. Pipeline (4-phase parsing)
3. Builder (DSMBuilder)
4. Repository (DSMDefinitionsInspector)
5. Strategy (DSMType hierarchy)
6. Composite (recursive types)
7. Template Method (DSMP::Checker)
8. Chain of Responsibility (semantic validation)
9. Immutable Value Object (DSM entities)
Semantic Checkers: 18+
- Reserved Identifiers: 2
- Already Defined: 7
- Recursive Definitions: 3
- Type Validation: 6
- (See Section 3.5 for complete list)
Type Categories: 11
- Reference, Vector, Set, Map, Optional, Tuple, Variant, XArray, Key, Vec, Mat