DSM

1. Purpose & Motivation (Why)

Problem Solved

The DSM domain provides a complete implementation of the Digital Substrate Model parser, validator, and builder infrastructure. It solves the challenge of transforming user-written .dsm schema files (defining concepts, structures, attachments, etc.) into validated runtime type definitions that can be:

  • Fed to the kibo tool (a Java-based code generator) to generate code
  • Used to initialize databases with the proper schemas
  • Type-checked at compile time
  • Navigated programmatically via inspector APIs
  • Serialized for inter-process communication

Key problems addressed:

  1. Multi-file project support - Assemble definitions from multiple .dsm files across directories
  2. Semantic validation - Detect errors beyond syntax (circular inheritance, duplicate names, invalid types, etc.)
  3. Type resolution - Resolve references across namespaces and files
  4. Dual representation - Maintain both parsing artifacts (for error reporting) and clean API (for code generation)
  5. Binary serialization - Encode/decode definitions for the kibo code generator
  6. Developer tooling - Provide query/inspection APIs for IDE support, linters, analyzers

Important Distinction

This document describes the DSM implementation (parser/validator/builder for C++/Python developers).

For the DSM language specification (syntax and semantics, for .dsm file authors), see:

  • doc/DSM.md - Language reference (concepts, clubs, structures, attachments)
  • doc/Getting_Started.md - Tutorial for writing .dsm schema files
  • doc/Getting_Started_With_dsm_util.py.md - Command-line tool guide

Use Cases

Developers use the DSM domain when they need to:

  1. Parse .dsm schema files - Transform DSM language text into runtime Definitions
     • Multi-file projects (assemble directory of .dsm files)
     • Single-file schemas (load one .dsm file)
     • Programmatic schema construction (append DSM text dynamically)

  2. Validate DSM semantics - Detect errors beyond syntax
     • Circular inheritance detection (concept A is a B; concept B is a A)
     • Duplicate name checking (concept and structure with same name)
     • Type reference validation (ensure referenced types exist)
     • Recursive structure detection (prevent infinite recursion)

  3. Query DSM definitions - Navigate parsed schemas programmatically
     • Lookup concepts/structures by name
     • Enumerate all types in a namespace
     • Resolve type references across namespaces
     • Extract function pools and attachments

  4. Generate code - Export definitions to kibo code generator
     • Binary serialization (.dsmb format via StreamTokenBinaryCodec)
     • Round-trip preservation (encode → decode maintains integrity)
     • Template-based code generation (C++ classes, Python bindings)

  5. Create databases - Initialize Database/CommitDatabase with schemas
     • DSMHelper::createStoreDatabase() - Create Database from DSMDefinitions
     • DSMHelper::createCommitDatabase() - Create CommitDatabase from DSMDefinitions
     • Schema evolution support (extend definitions incrementally)

  6. Build developer tools - IDE support, linters, analyzers
     • Syntax error reporting with file/line/column precision
     • Semantic error messages with context
     • Type hierarchy navigation (concept inheritance trees)
     • Dependency graph analysis (structure field references)

Position in Architecture

DSM is a Functional Layer 1 domain that builds on Foundation Layer 0 components:

Functional Layer 1
├── DSM (this domain) ← Parser/Validator/Builder for schema files
│   ├── Uses: Type System (TypeName representation)
│   ├── Uses: UUId (namespace identifiers)
│   ├── Uses: NameSpace (definition organization)
│   └── Produces: Definitions (runtime type system)
│
Foundation Layer 0
├── Type System (TypeName, TypeCode)
├── UUId (unique identifiers)
└── NameSpace (namespace representation)

Architectural role:

  • Input: .dsm text files (user-authored schemas)
  • Output: Definitions objects (runtime type system)
  • Process: 4-phase pipeline (Build → Parse → Validate → Convert)


2. Domain Overview (What)

Scope

The DSM domain provides capabilities for:

  • Building DSM content from files (single file or directory)
  • Parsing DSM language via ANTLR-generated parser
  • Validating semantics with 18+ specialized checkers
  • Converting parse trees into immutable DSMDefinitions
  • Inspecting definitions via Repository pattern (O(log n) map lookups)
  • Serializing definitions to binary for kibo code generator
  • Rendering definitions back to text (DSM language or HTML)

Out of scope (covered by other domains):

  • DSM language syntax/semantics (see doc/DSM.md)
  • User tutorials for writing .dsm files (see doc/Getting_Started.md)
  • Code generation templates (see templates/cpp/, templates/python/)
  • Kibo code generator implementation (see tools/kibo-1.2.0.jar)

Key Concepts

1. DSMBuilder - Progressive File Assembly

Builder pattern for constructing DSM content from multiple sources:

  • append(name, content) - Add DSM text from a file
  • parts() - Track file provenance (for error messages with file:line)
  • content() - Concatenated DSM text (single ANTLR input)
  • parse() - Execute 4-phase pipeline

Why concatenation? ANTLR parser expects single input string, but DSM projects span multiple files. DSMBuilder concatenates while preserving file boundaries for error reporting.
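The concatenation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SketchBuilder, Part, and locate() are hypothetical names, not the DSMBuilder API, and the real implementation may track boundaries differently. The sketch joins all parts into one string and maps a line of the concatenated text back to its source file and local line.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Part:
    """One appended source: name, first global line (1-based), line count."""
    name: str
    first_line: int
    line_count: int


@dataclass
class SketchBuilder:
    """Hypothetical stand-in for DSMBuilder; not the dsviper API."""
    parts: list = field(default_factory=list)
    _chunks: list = field(default_factory=list)

    def append(self, name: str, content: str) -> None:
        # Normalize so every part ends with a newline and line counts add up.
        if not content.endswith("\n"):
            content += "\n"
        first = 1 + sum(p.line_count for p in self.parts)
        self.parts.append(Part(name, first, content.count("\n")))
        self._chunks.append(content)

    def content(self) -> str:
        # The single string that would be handed to the parser.
        return "".join(self._chunks)

    def locate(self, global_line: int) -> tuple:
        """Map a line of the concatenated text to (file name, local line)."""
        starts = [p.first_line for p in self.parts]
        part = self.parts[bisect_right(starts, global_line) - 1]
        return part.name, global_line - part.first_line + 1


builder = SketchBuilder()
builder.append("concepts.dsm", "concept A;\nconcept B;\n")
builder.append("structures.dsm", "struct P {\n    float x;\n};\n")
print(builder.locate(4))  # line 4 of the joined text is line 2 of structures.dsm
```

Storing only the first line and line count per part keeps the lookup a binary search over part boundaries, which is enough to turn a parser-reported line into a file:line pair.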

2. Four-Phase Pipeline - Build → Parse → Validate → Convert

Pipeline pattern transforming .dsm text into validated Definitions:

  1. Phase 1: ANTLR Lexing + Parsing
     • Input: Concatenated DSM text
     • Process: ANTLR lexer/parser (generated from DSM.g4 grammar)
     • Output: ParseTree (ANTLR AST)
     • Errors: Syntax errors (missing semicolons, invalid tokens)

  2. Phase 2: AST Traversal
     • Input: ParseTree
     • Process: DSMP::Listener walks tree, builds DSMP::Definitions
     • Output: DSMP::Definitions (internal mutable AST with parser metadata)
     • Errors: None (tree walk always succeeds if Phase 1 succeeded)

  3. Phase 3: Semantic Validation
     • Input: DSMP::Definitions
     • Process: 18+ DSMP::Checker subclasses validate semantics
     • Output: DSMParseReport (accumulated errors)
     • Errors: Duplicate names, circular inheritance, invalid types, recursive structures

  4. Phase 4: Conversion + RuntimeID Assignment
     • Input: DSMP::Definitions (if no errors in Phase 3)
     • Process: Convert to DSMDefinitions, then to Definitions, assign RuntimeIDs
     • Output: DSMDefinitions (public immutable API) + Definitions (runtime types)
     • Errors: RuntimeID conflicts (should never happen if Phase 3 passed)

Why four phases? Progressive validation with early exit saves work. Syntax errors caught in Phase 1 prevent wasted semantic checking in Phase 3.
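The early-exit behaviour can be illustrated with a toy pipeline. Everything below is an assumption made for illustration: the phase functions and their toy rules (a trailing ';' per line, unique names) are invented and stand in for the ANTLR parser, the listener, the checkers, and the converter. The point is only that a phase reporting errors prevents later phases from running.

```python
from typing import Callable, List, Tuple

# Each hypothetical phase takes the previous output and returns (output, errors).
Phase = Callable[[object], Tuple[object, List[str]]]


def run_pipeline(text: str, phases: List[Tuple[str, Phase]]):
    """Run phases in order; stop at the first phase that reports errors."""
    value: object = text
    executed = []
    for name, phase in phases:
        executed.append(name)
        value, errors = phase(value)
        if errors:
            return None, errors, executed  # early exit: later phases never run
    return value, [], executed


def lex_and_parse(text):
    # Toy syntax rule: every non-empty line must end with ';'.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    errors = [f"line {i}: missing ';'" for i, ln in enumerate(lines, 1)
              if not ln.endswith(";")]
    return lines, errors


def walk_tree(lines):
    # Toy tree walk: split "concept A;" into ["concept", "A"]; never fails.
    return [ln.rstrip(";").split() for ln in lines], []


def validate(decls):
    names = [d[1] for d in decls]
    dupes = sorted({n for n in names if names.count(n) > 1})
    return decls, [f"duplicate name '{n}'" for n in dupes]


def convert(decls):
    return {name: kind for kind, name in decls}, []


PHASES = [("parse", lex_and_parse), ("walk", walk_tree),
          ("validate", validate), ("convert", convert)]

ok = run_pipeline("concept A;\nconcept B;", PHASES)
bad_syntax = run_pipeline("concept A\nconcept B;", PHASES)
bad_semantics = run_pipeline("concept A;\nconcept A;", PHASES)
```

With the syntax error, only the first phase runs; with the duplicate name, the pipeline stops after validation and never converts.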

3. DSMDefinitions - Immutable Container

Value Object pattern holding all parsed DSM entities:

class DSMDefinitions {
    std::vector<std::shared_ptr<DSMConcept>> concepts;
    std::vector<std::shared_ptr<DSMClub>> clubs;
    std::vector<std::shared_ptr<DSMEnumeration>> enumerations;
    std::vector<std::shared_ptr<DSMStructure>> structures;
    std::vector<std::shared_ptr<DSMAttachment>> attachments;
    std::vector<std::shared_ptr<DSMFunctionPool>> functionPools;
    std::vector<std::shared_ptr<DSMCommitFunctionPool>> commitFunctionPools;
};

Immutability: All fields const (except runtimeId which is assigned post-construction).

Why immutable? Thread-safe sharing across modules (parsers, code generators, validators).

4. DSMDefinitionsInspector - Repository for Queries

Repository pattern providing keyed lookups via internal maps (O(log n) for std::map, instead of an O(n) scan of the vectors):

class DSMDefinitionsInspector {
    std::map<TypeName, std::shared_ptr<DSMConcept>> _concepts;
    std::map<TypeName, std::shared_ptr<DSMClub>> _clubs;
    std::map<TypeName, std::shared_ptr<DSMStructure>> _structures;
    // ... other maps
};

API patterns:

  • query*(typeName) - Returns nullptr if not found (safe for probing)
  • check*(typeName) - Throws exception if not found (for assertions)

Why Repository? Semantic checkers need frequent type lookups (18+ checkers × hundreds of types). A std::map lookup is O(log n), versus an O(n) linear search in vectors.
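The query/check split can be sketched as follows. SketchInspector is a hypothetical class written for this illustration, not the DSMDefinitionsInspector binding; it uses a Python dict (hash-based, average O(1)) where the C++ class shown above uses std::map (tree-based, O(log n)).

```python
class SketchInspector:
    """Hypothetical repository keyed by qualified type name (not the dsviper API)."""

    def __init__(self, concepts):
        # Index once; later lookups avoid scanning the list.
        self._concepts = {c["name"]: c for c in concepts}

    def query_concept(self, name):
        """Return the concept, or None when it does not exist (safe for probing)."""
        return self._concepts.get(name)

    def check_concept(self, name):
        """Return the concept, or raise when it does not exist (for assertions)."""
        concept = self._concepts.get(name)
        if concept is None:
            raise KeyError(f"unknown concept: {name}")
        return concept


inspector = SketchInspector([
    {"name": "Test::A", "parent": None},
    {"name": "Test::B", "parent": "Test::A"},
])

probe = inspector.query_concept("Test::Z")     # None: caller decides what to do
required = inspector.check_concept("Test::B")  # raises KeyError if missing
```

Offering both forms lets probing code avoid exception handling while code that assumes a type exists fails loudly at the lookup site.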

5. Semantic Validation - Chain of Responsibility

18+ specialized checkers organized by dependency order:

// Identifiers first (no dependencies)
CheckerDefinitionIdentifier
CheckerIdentifierReserved

// Namespaces (depend on identifier validation)
CheckerNameSpaceSameName
CheckerNameSpaceSameUUID
CheckerNameSpaceDefinitionIdentifier

// Types (depend on namespace validation)
CheckerTypeReference  // Must resolve all type references

// Advanced validation (depend on type resolution)
CheckerConceptRecursive
CheckerStructureFieldAlreadyDefined
CheckerTypeStructureRecursive
// ... 10+ more checkers

Pattern: Each checker validates one aspect (Single Responsibility), executed sequentially with early exit on error (fail fast).

Why 18+ checkers? Modular design makes adding new validation rules trivial (Open/Closed Principle). Easy to test each rule in isolation.
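A minimal sketch of the pattern, assuming a simplified data model (a dict mapping each name to its parent name or None). The two checker classes reuse names from the list above but are toy re-implementations written for this example, and the reserved-word set is invented.

```python
from abc import ABC, abstractmethod

RESERVED = {"concept", "struct", "namespace"}  # toy list, not the DSM reserved words


class Checker(ABC):
    """One rule per checker; check() appends messages to the shared report."""

    @abstractmethod
    def check(self, definitions: dict, report: list) -> None: ...


class CheckerIdentifierReserved(Checker):
    def check(self, definitions, report):
        for name in definitions:
            if name in RESERVED:
                report.append(f"'{name}' is a reserved identifier")


class CheckerTypeReference(Checker):
    def check(self, definitions, report):
        for name, parent in definitions.items():
            if parent is not None and parent not in definitions:
                report.append(f"{name}: unknown type '{parent}'")


def run_checkers(definitions, checkers):
    """Run checkers in dependency order and stop after the first one that fails."""
    report = []
    for checker in checkers:
        checker.check(definitions, report)
        if report:  # fail fast: later checkers assume earlier ones passed
            break
    return report


CHECKERS = [CheckerIdentifierReserved(), CheckerTypeReference()]

clean = run_checkers({"A": None, "B": "A"}, CHECKERS)
reserved = run_checkers({"struct": None, "A": "Missing"}, CHECKERS)
unresolved = run_checkers({"A": "Missing"}, CHECKERS)
```

In the second call the unresolved reference is not reported, because the chain stops once the identifier checker has failed. Adding a rule means adding one class and one list entry.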

External Dependencies

Uses (Foundation Layer):

  • Type System (22 includes) - TypeName representation for DSM types
  • UUId (15 includes) - Namespace UUIDs, RuntimeID generation
  • NameSpace (6 includes) - Namespace organization
  • Error (2 includes) - Exception types (DSMErrors)
  • Blob (2 includes) - Binary serialization storage

Used By (Functional Layer):

  • Database - DSMHelper::createStoreDatabase() initializes schemas
  • CommitDatabase - DSMHelper::createCommitDatabase() initializes schemas
  • Kibo code generator - Consumes binary .dsmb files (DSMDefinitionsEncoder output)
  • StringHelper - Pretty-printing DSM types via representationGl()


3. Functional Decomposition (Structure)

3.1 Sub-domains

The DSM domain comprises 7 sub-domains organized by concern:

1. Builder/Parser Sub-domain

Purpose: Load .dsm files and execute ANTLR parsing

Components:

  • DSMBuilder - Progressively append files, track provenance
  • DSMBuilderPart - Track file/line ranges for error messages
  • DSMHelper - Facade providing assemble() and parse() entry points
  • DSMP::Listener - ANTLR tree walker (generated from grammar)
  • DSMP::ErrorListener - Capture syntax errors with file/line/column

Key Pattern: Builder + Facade - Simplify complex parsing workflow


2. Definitions Sub-domain

Purpose: Store and access parsed DSM entities

Components:

  • DSMDefinitions - Immutable container for all entities
  • DSMDefinitionsInspector - Repository for O(log n) map lookups
  • DSMDefinitionsEncoder - Binary serialization (for kibo)
  • DSMDefinitionsDecoder - Binary deserialization (from kibo)
  • DSMDefinitionsRenderer - Text output (DSM language or HTML)

Key Pattern: Repository + Codec - Efficient access and serialization


3. DSM Entities Sub-domain

Purpose: Represent DSM language constructs

Components:

  • DSMConcept - Abstract types with inheritance (concept A is a B)
  • DSMClub - Union types (club MyUnion; membership MyUnion ConceptA)
  • DSMStructure - Composite types with fields (struct Point { float x; float y; })
  • DSMEnumeration - Enumerated types (enum Status { active, inactive })
  • DSMAttachment - Concept→Data mappings (attachment<Concept, Structure> data)
  • DSMFunction - Function signatures
  • DSMFunctionPool - Collections of functions
  • DSMCommitFunctionPool - Mutation functions for commit system

Key Pattern: Immutable Value Objects - Thread-safe, shared via shared_ptr


4. DSM Type System Sub-domain

Purpose: Represent generic and mathematical types

Components (11 type categories):

  • DSMTypeReference - Primitives (int64, string, bool) and named types
  • DSMTypeVector - Dynamic arrays (vector<T>)
  • DSMTypeSet - Unique collections (set<T>)
  • DSMTypeMap - Key-value pairs (map<K, V>)
  • DSMTypeOptional - Maybe types (optional<T>)
  • DSMTypeTuple - Heterogeneous tuples (tuple<T1, T2, T3>)
  • DSMTypeVariant - Sum types (variant<T1, T2, T3>)
  • DSMTypeXArray - CRDT arrays (xarray<T>)
  • DSMTypeKey - Key types for attachments (key<Concept>)
  • DSMTypeVec - Mathematical vectors (vec<float, 3> for vec3)
  • DSMTypeMat - Matrices (mat<float, 3, 3> for mat3)

Key Pattern: Composite + Strategy - Recursive type composition

Example: vector<optional<map<int64, string>>> is composed as:

DSMTypeVector
  └─ elementType: DSMTypeOptional
       └─ valueType: DSMTypeMap
            ├─ keyType: DSMTypeReference (int64)
            └─ valueType: DSMTypeReference (string)

Three representation modes:

  • representation() - Fully qualified (MyNamespace::MyType)
  • representationIn(namespace) - Relative to namespace (MyType if same namespace)
  • representationGl(inspector) - Global from root namespace (legacy naming, should be representationInRoot)
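The recursive composition and two of the representation modes can be sketched with small immutable classes. The classes below are hypothetical and much simpler than the DSMType hierarchy: type names are plain strings, and representation_in() is an assumed reading of representationIn(namespace) that simply drops the qualifier for types in the given namespace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeReference:
    """Leaf node: a primitive or a named type (hypothetical, not dsviper)."""
    name: str
    namespace: str = ""  # empty for primitives such as int64

    def representation(self) -> str:
        return f"{self.namespace}::{self.name}" if self.namespace else self.name

    def representation_in(self, namespace: str) -> str:
        # Omit the qualifier when the type lives in the requesting namespace.
        return self.name if self.namespace == namespace else self.representation()


@dataclass(frozen=True)
class TypeVector:
    element_type: object

    def representation(self) -> str:
        return f"vector<{self.element_type.representation()}>"

    def representation_in(self, namespace: str) -> str:
        return f"vector<{self.element_type.representation_in(namespace)}>"


@dataclass(frozen=True)
class TypeOptional:
    value_type: object

    def representation(self) -> str:
        return f"optional<{self.value_type.representation()}>"

    def representation_in(self, namespace: str) -> str:
        return f"optional<{self.value_type.representation_in(namespace)}>"


@dataclass(frozen=True)
class TypeMap:
    key_type: object
    value_type: object

    def representation(self) -> str:
        return (f"map<{self.key_type.representation()}, "
                f"{self.value_type.representation()}>")

    def representation_in(self, namespace: str) -> str:
        return (f"map<{self.key_type.representation_in(namespace)}, "
                f"{self.value_type.representation_in(namespace)}>")


# vector<optional<map<int64, Geo::Point>>> built bottom-up from leaves.
tree = TypeVector(TypeOptional(TypeMap(TypeReference("int64"),
                                       TypeReference("Point", "Geo"))))
print(tree.representation())
print(tree.representation_in("Geo"))
```

Each container only formats its own wrapper and delegates to its children, so arbitrarily deep nesting needs no extra code.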


5. Semantic Validation Sub-domain

Purpose: Detect errors beyond syntax

Base Class:

  • DSMP::Checker - Template Method pattern (abstract check(report))

18+ Concrete Checkers (organized by category):

Category 1: Reserved Identifiers (2 checkers)

  • CheckerDefinitionIdentifier - Validate identifier format
  • CheckerIdentifierReserved - Check reserved words (permissive: allows C++ keywords)

Category 2: Already Defined (7 checkers)

  • CheckerNameSpaceDefinitionIdentifier - Unique names within namespace
  • CheckerAttachmentAlreadyDefined - No duplicate attachments
  • CheckerEnumerationCaseAlreadyDefined - Unique enum cases
  • CheckerStructureFieldAlreadyDefined - Unique struct fields
  • CheckerFunctionAlreadyDefined - Unique function names in pool
  • CheckerFunctionParameterAlreadyDefined - Unique parameter names
  • CheckerPoolIdentifierAlreadyDefined - Unique pool identifiers

Category 3: Recursive Definitions (3 checkers)

  • CheckerConceptRecursive - Detect circular inheritance (A → B → C → A)
  • CheckerNameSpaceRecursive - Detect namespace cycles
  • CheckerTypeStructureRecursive - Detect recursive structures

Category 4: Type Validation (6 checkers)

  • CheckerTypeReference - Resolve all type references (must run before advanced checkers)
  • CheckerTypeVariantDuplicated - No duplicate variant members
  • CheckerTypeVariantTooMany - Limit variant size (implementation constraint)
  • CheckerEnumerationCaseTooMany - Limit enum size
  • CheckerStructureFieldEmpty - Structures must have at least one field
  • CheckerStructureFieldDefaultValue - Validate literal default values

Key Patterns:

  • Chain of Responsibility - Sequential execution with early exit
  • Dependency Ordering - Checkers run in precise order (identifiers → namespaces → types → structures)

Cycle Detection Algorithm (example from CheckerConceptRecursive):

std::unordered_set<std::string> visited;
auto w_concept = concept_;

while (w_concept->tokenIsa) {  // Has parent
    auto parent = w_concept->getParentIdentifier();
    if (contains(visited, parent)) {
        report_error("Recursive concept detected");
        return;
    }
    visited.insert(parent);
    w_concept = inspector->query(parent);
}
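The same visited-set walk can be written as a runnable Python function. This is a sketch rather than a port of CheckerConceptRecursive: find_inheritance_cycle and the parent_of mapping are hypothetical, and unlike the excerpt above it seeds the visited set with the starting concept, so a cycle is reported as soon as the walk returns to it.

```python
def find_inheritance_cycle(start, parent_of):
    """Follow parent links from start; return the concept that closes a cycle.

    parent_of maps a concept name to its parent name (None for roots).
    Returns None when the chain ends at a root without revisiting a concept.
    """
    visited = {start}
    current = start
    while parent_of.get(current) is not None:
        parent = parent_of[current]
        if parent in visited:
            return current  # "current is a parent" points back into the chain
        visited.add(parent)
        current = parent
    return None


cyclic = {"A": "B", "B": "C", "C": "A"}   # A is a B; B is a C; C is a A
linear = {"A": "B", "B": "C", "C": None}  # C is a root

print(find_inheritance_cycle("A", cyclic))  # C closes the cycle back to A
print(find_inheritance_cycle("A", linear))  # None
```

Because each concept has at most one parent, the walk visits every concept at most once and needs no recursion.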

6. Literals Sub-domain

Purpose: Represent default values in DSM schemas

Components:

  • DSMLiteral - Base class for literal values
  • DSMLiteralValue - Primitive literals (42, "hello", true)
  • DSMLiteralList - Collection literals ([1, 2, 3])
  • DSMLiteralDomain - Domain-specific literals (UUIDs, etc.)

Usage: Structure fields can have default values:

struct Config {
    int64 timeout = 30;
    string name = "default";
    bool enabled = true;
};

7. Error Handling Sub-domain

Purpose: Report parsing and validation errors

Components:

  • DSMParseReport - Accumulator for all errors
  • DSMParseError - Single error with file/line/column/message
  • DSMErrors - Exception types for programmer errors

Error Reporting Pattern:

// Syntax errors (Phase 1)
DSMParseError::make(part, token, "Missing semicolon");

// Semantic errors (Phase 3)
report->add(DSMParseError::make(part, token, "Duplicate name 'MyType'"));

// Programmer errors (API misuse)
throw DSMErrors::missingRuntimeId(component, ctx, type, representation);

Design Decision: Parsing/validation errors → DSMParseReport (accumulated), API misuse → exceptions (fail fast).
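The split between accumulated reports and exceptions might look like this in Python. ParseError, ParseReport, MissingRuntimeId, and both helper functions are hypothetical names chosen for the sketch; they mirror the roles of DSMParseError, DSMParseReport, and DSMErrors but are not their API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParseError:
    file: str
    line: int
    column: int
    message: str


@dataclass
class ParseReport:
    errors: list = field(default_factory=list)

    def add(self, error: ParseError) -> None:
        self.errors.append(error)

    def has_error(self) -> bool:
        return bool(self.errors)


class MissingRuntimeId(RuntimeError):
    """Raised for API misuse; hypothetical counterpart of a DSMErrors exception."""


def check_duplicates(names_with_lines, report, file="test.dsm"):
    # Mistakes in user input are accumulated so all of them can be shown at once.
    seen = set()
    for name, line in names_with_lines:
        if name in seen:
            report.add(ParseError(file, line, 1, f"Duplicate name '{name}'"))
        seen.add(name)


def runtime_id_of(entity: dict) -> str:
    # A missing RuntimeID means the caller skipped a step: fail immediately.
    if entity.get("runtime_id") is None:
        raise MissingRuntimeId(f"{entity['name']} has no RuntimeID")
    return entity["runtime_id"]


report = ParseReport()
check_duplicates([("A", 2), ("B", 3), ("A", 4), ("B", 5)], report)
for err in report.errors:
    print(f"{err.file}:{err.line}:{err.column} - {err.message}")
```

A schema author sees both duplicate names in one run, whereas a programmer who requests a RuntimeID too early gets an exception at the faulty call.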


3.2 Key Components (Entry Points)

Component                 Purpose                         Entry Point File
DSMHelper                 Facade for all DSM operations   Viper_DSMHelper.hpp
DSMBuilder                Progressive file assembly       Viper_DSMBuilder.hpp
DSMDefinitions            Immutable container             Viper_DSMDefinitions.hpp
DSMDefinitionsInspector   Repository for queries          Viper_DSMDefinitionsInspector.hpp
DSMConcept                Concept entity                  Viper_DSMConcept.hpp
DSMStructure              Structure entity                Viper_DSMStructure.hpp
DSMType (base)            Type hierarchy root             Viper_DSMType.hpp
DSMTypeVector             Generic vector type             Viper_DSMTypeVector.hpp
DSMParseReport            Error accumulator               Viper_DSMParseReport.hpp
DSMP::Checker             Validation base class           Viper_DSMP_Checker.hpp

3.3 Component Map (Visual)

┌─────────────────────────────────────────────────────────────────┐
│                     DSM Domain Architecture                       │
└─────────────────────────────────────────────────────────────────┘

User .dsm Files
      │
      ↓
┌──────────────────┐
│   DSMBuilder     │  ← Facade: DSMHelper::assemble(path)
│  (Builder)       │
│  - append()      │
│  - parse()       │
└────────┬─────────┘
         │ Concatenated DSM text
         ↓
┌──────────────────────────────────────────────────────────────────┐
│                      4-Phase Pipeline                              │
├──────────────────────────────────────────────────────────────────┤
│                                                                    │
│  Phase 1: ANTLR Parsing                                          │
│  ┌────────────────┐                                              │
│  │ ANTLR Lexer    │ → TokenStream → ANTLR Parser → ParseTree    │
│  └────────────────┘                                              │
│         │                                                         │
│         ↓ (if syntax error)                                      │
│  ┌────────────────┐                                              │
│  │ ErrorListener  │ → DSMParseReport (syntax errors)            │
│  └────────────────┘                                              │
│                                                                    │
│  ─────────────────────────────────────────────────────────────   │
│                                                                    │
│  Phase 2: AST Traversal                                          │
│  ┌────────────────┐                                              │
│  │ DSMP::Listener │ → DSMP::Definitions (internal mutable AST)  │
│  └────────────────┘                                              │
│                                                                    │
│  ─────────────────────────────────────────────────────────────   │
│                                                                    │
│  Phase 3: Semantic Validation (18+ checkers)                     │
│  ┌──────────────────────────────────────────┐                   │
│  │ CheckerDefinitionIdentifier              │                   │
│  │ CheckerIdentifierReserved                │                   │
│  ├──────────────────────────────────────────┤                   │
│  │ CheckerNameSpaceSameName                 │                   │
│  │ CheckerNameSpaceSameUUID                 │                   │
│  │ CheckerNameSpaceDefinitionIdentifier     │                   │
│  ├──────────────────────────────────────────┤                   │
│  │ CheckerTypeReference (before advanced)   │ ← Resolve types  │
│  ├──────────────────────────────────────────┤                   │
│  │ CheckerConceptRecursive                  │                   │
│  │ CheckerStructureFieldAlreadyDefined      │                   │
│  │ CheckerTypeStructureRecursive            │                   │
│  │ ... 10+ more checkers                    │                   │
│  └────────────┬─────────────────────────────┘                   │
│               │                                                   │
│               ↓ (if semantic error)                              │
│        DSMParseReport (semantic errors)                          │
│                                                                    │
│  ─────────────────────────────────────────────────────────────   │
│                                                                    │
│  Phase 4: Conversion + RuntimeID Assignment                      │
│  ┌────────────────┐                                              │
│  │ DSMP::Converter│ → DSMDefinitions (public immutable)         │
│  └────────┬───────┘                                              │
│           │                                                       │
│           ↓                                                       │
│  ┌──────────────────────────┐                                   │
│  │ DSMDefinitionsToDefinitions│ → Definitions (runtime types)   │
│  └────────┬─────────────────┘                                   │
│           │                                                       │
│           ↓ (assign RuntimeIDs via DefinitionsInspector)        │
│  DSMDefinitions (with RuntimeIDs)                               │
│                                                                    │
└────────────────────────────────────────────────────────────────┘
         │
         ↓
┌──────────────────────────┐
│ DSMDefinitionsInspector  │  ← Repository (O(log n) lookups)
│  (Repository)            │
│  - queryConcept()        │
│  - queryStructure()      │
│  - checkConcept()        │
└──────────────────────────┘
         │
         ↓
┌──────────────────────────┐
│   Consumers              │
│  - Kibo code generator   │ ← Binary serialization (.dsmb)
│  - Database creation     │ ← DSMHelper::createStoreDatabase()
│  - IDE tools             │ ← Inspector queries
└──────────────────────────┘

Legend:

  • → Data flow
  • ↓ Process flow
  • ┌─┐ Component boundary
  • ├─┤ Checker category boundary

Key Design Decisions:

  1. Dual AST Representation (DSMP::Definitions vs DSMDefinitions)
     • Why: DSMP::Definitions preserves parser metadata (tokens, positions) for error reporting
     • Why: DSMDefinitions provides clean API without parser artifacts
     • Trade-off: Extra conversion cost vs separation of concerns

  2. RuntimeID Assigned Post-Conversion (mutable field)
     • Why: RuntimeID depends on DefinitionsInspector (type resolution)
     • Why: Can't compute during parsing (circular dependency: need Definitions to create Inspector, need Inspector to compute RuntimeID)
     • Trade-off: One mutable field vs cleaner initialization

  3. Sequential Checker Execution with Early Exit
     • Why: Save work if early checkers find errors (fail fast)
     • Why: Dependency ordering (must resolve types before checking recursion)
     • Trade-off: Cannot parallelize checkers vs correct validation order

4. Developer Usage Patterns (Practical)

4.1 Core Scenarios

Each scenario extracted from real test code.


Scenario 1: Basic DSM Parsing Workflow

When to use: Parse .dsm schema files into validated definitions
Test source: test_dsm_builder.py:201-221 - TestDSMBuilderParse::test_parse_valid_dsm

from dsviper import DSMBuilder

# Build - Append DSM content
builder = DSMBuilder()
builder.append("test.dsm", """
namespace Test {00000000-0000-0000-0000-000000000001} {
    struct Point {
        float x;
        float y;
    };
};
""")

# Parse - Execute the 4-phase pipeline
# (Lexing/Parsing → AST Traversal → Validation → Conversion)
report, dsm, defs = builder.parse()

# Check validation results
if report.has_error():
    for error in report.errors():
        print(f"{error.file()}:{error.line()} - {error.message()}")
else:
    # Access parsed DSMDefinitions
    structs = list(dsm.structures())
    print(f"Parsed {len(structs)} structures")
    # Result: 1 structure (Point with 2 fields)

Key APIs: DSMBuilder(), builder.append(), builder.parse(), report.has_error(), dsm.structures()

Pattern Illustrated: 4-phase pipeline (Build → Parse → Validate → Convert)


Scenario 2: Load from File or Directory

When to use: Load .dsm schemas from filesystem
Test source: test_dsm_builder.py:137-150 - TestDSMBuilderAssemble::test_assemble_from_file

from dsviper import DSMBuilder

# Load single file
builder = DSMBuilder.assemble("schema.dsm")

# Or load entire directory (all *.dsm files)
builder = DSMBuilder.assemble("schemas/")

# Parse assembled content
report, dsm, defs = builder.parse()

# Access file provenance
for part in builder.parts():
    print(f"Loaded {part.name()} ({part.line_count()} lines)")

Key APIs: DSMBuilder.assemble(path)

Pattern Illustrated: Facade pattern (DSMHelper simplifies file I/O + builder construction)


Scenario 3: Query DSM Definitions

When to use: Navigate parsed schemas programmatically
Test source: test_dsm_inspector.py:59-70 - TestDSMInspectorConcepts::test_query_concept_exists

from dsviper import DSMDefinitionsInspector

# Create inspector (Repository)
inspector = DSMDefinitionsInspector(dsm)

# Get all concept names
concept_names = inspector.concept_type_names()
for tn in concept_names:
    print(f"Concept: {tn.name()}")

# Query specific concept by TypeName
concept_a_tn = next(tn for tn in concept_names if tn.name() == "ConceptA")
concept = inspector.query_concept(concept_a_tn)

if concept:
    print(f"Found: {concept.type_name().name()}")
    if concept.parent():
        print(f"  Inherits from: {concept.parent().type_name().name()}")

# Use check_concept() for assertions (throws if not found)
concept = inspector.check_concept(concept_a_tn)  # Exception if missing

Key APIs: DSMDefinitionsInspector(dsm), concept_type_names(), query_concept(), check_concept()

Pattern Illustrated: Repository pattern (O(log n) lookups via internal maps)


Scenario 4: Navigate Composite Types

When to use: Analyze generic types (vector, map, optional, etc.)
Test source: test_dsm_types.py:219-234 - TestDSMTypeVector::test_vector_nested

from dsviper import DSMTypeVector, DSMTypeReference

# DSM schema: vector<vector<int64>>
struct = next(s for s in dsm.structures() if s.type_name().name() == "NestedLevel2")
field = next(f for f in struct.fields() if f.name() == "f_vector_of_vectors")
field_type = field.type()

# Navigate type tree (Composite pattern)
assert isinstance(field_type, DSMTypeVector)
print(f"Outer type: {field_type.representation()}")  # "vector<vector<int64>>"

# Recurse into element type
inner_vec = field_type.element_type()
assert isinstance(inner_vec, DSMTypeVector)
print(f"Inner type: {inner_vec.representation()}")  # "vector<int64>"

# Leaf type
elem_type = inner_vec.element_type()
assert isinstance(elem_type, DSMTypeReference)
print(f"Element type: {elem_type.type_name().name()}")  # "int64"

Key APIs: type.element_type(), type.representation(), isinstance(type, DSMTypeVector)

Pattern Illustrated: Composite pattern (recursive type tree navigation)


Scenario 5: Detect Semantic Errors

When to use: Validate DSM schemas for semantic correctness
Test source: test_dsm_semantic_errors.py:41-53 - test_checker_namespace_definition_duplicate_concept

from dsviper import DSMBuilder

# DSM with duplicate concept names
builder = DSMBuilder()
builder.append("test", '''
namespace Test {00000000-0000-0000-0000-000000000001} {
    concept A;
    concept A;  # Duplicate!
};
''')

report, dsm, defs = builder.parse()

# Check for errors
if report.has_error():
    errors = list(report.errors())
    for error in errors:
        print(f"Error at {error.file()}:{error.line()}")
        print(f"  {error.message()}")
        # Output: "Identifier 'A' already used in namespace 'Test'"

Key APIs: report.has_error(), report.errors(), error.message(), error.line()

Pattern Illustrated: Chain of Responsibility (CheckerNameSpaceDefinitionIdentifier detects duplicate)


Scenario 6: Detect Circular Inheritance

When to use: Validate concept inheritance hierarchies
Test source: test_dsm_semantic_errors.py:311-324 - test_checker_concept_recursive_indirect_cycle

from dsviper import DSMBuilder

# DSM with circular inheritance
builder = DSMBuilder()
builder.append("test", '''
namespace Test {00000000-0000-0000-0000-000000000001} {
    concept A is a B;
    concept B is a C;
    concept C is a A;  # Cycle: A→B→C→A
};
''')

report, dsm, defs = builder.parse()

# CheckerConceptRecursive detects cycle
assert report.has_error()
error = list(report.errors())[0]
assert "recursive" in error.message().lower()
print(f"Detected: {error.message()}")
# Output: "The concept Test::A is recursive with Test::C"

Key APIs: report.has_error(), error.message()

Pattern Illustrated: Cycle detection algorithm (visited set in CheckerConceptRecursive)


Scenario 7: Binary Serialization

When to use: Export DSMDefinitions for kibo code generator
Test source: test_dsm_serialization.py:28-44 - test_round_trip_definitions_count

from dsviper import DSMDefinitions

# Encode DSMDefinitions to binary (.dsmb format)
blob = dsm.encode()  # Returns ValueBlob (StreamTokenBinaryCodec)

# Save to file (for kibo code generator)
# Note: DSMHelper::save() handles this in C++

# Decode back from binary
dsm2 = DSMDefinitions.decode(blob)

# Verify round-trip integrity
assert len(list(dsm.concepts())) == len(list(dsm2.concepts()))
assert len(list(dsm.structures())) == len(list(dsm2.structures()))
assert len(list(dsm.enumerations())) == len(list(dsm2.enumerations()))
# Result: Perfect preservation

Key APIs: dsm.encode(), DSMDefinitions.decode(blob)

Pattern Illustrated: Codec pattern (binary serialization for inter-process communication)


4.2 Integration Patterns

Multi-File Projects

Pattern: Load all .dsm files from a directory

# Assemble all *.dsm files in project
builder = DSMBuilder.assemble("project/schemas/")

# Files are concatenated in filesystem order
# Each file tracked as separate DSMBuilderPart (for error reporting)

report, dsm, defs = builder.parse()

Use case: Large projects with modular schemas (e.g., concepts.dsm, structures.dsm, attachments.dsm)


Error Reporting with File Context

Pattern: Show user-friendly errors with file/line/column

report, dsm, defs = builder.parse()

if report.has_error():
    for error in report.errors():
        # Get file provenance
        part = builder.part(error.line())

        print(f"File: {part.name()}")
        print(f"Line: {error.line()} Column: {error.column()}")
        print(f"Error: {error.message()}")
        print()

Use case: IDE integration, command-line tools (dsm_util.py check)


Database Creation from DSM

Pattern: Initialize Database with schema

from dsviper import DSMBuilder, DSMHelper

# Parse DSM schema
builder = DSMBuilder.assemble("schema.dsm")
report, dsm, defs = builder.parse()

if not report.has_error():
    # Create Database with schema
    db = DSMHelper.create_store_database(
        "data.db",           # SQLite file
        dsm,                 # DSMDefinitions
        "Documentation"      # Optional docs
    )

    # Database is initialized with:
    # - Concept types (TypeConcept instances)
    # - Structure types (TypeStructure instances)
    # - Attachment schemas (key→data mappings)

Use case: Application initialization, schema migration

See also: DSMHelper::createCommitDatabase() for CommitDatabase schemas


4.3 Test Suite Reference

Full test coverage: python/tests/unit/test_dsm*.py (10 files, 4539 lines, ~366 tests)

Test File                     Tests   Lines   Focus
test_dsm_builder.py           ~40     424     Builder/append/parse workflow
test_dsm_definitions.py       ~30     414     DSMDefinitions content access
test_dsm_functions.py         ~25     381     Function/FunctionPool definitions
test_dsm_inspector.py         ~35     381     Querying/navigating definitions
test_dsm_literals.py          ~15     171     Default values
test_dsm_semantic_errors.py   ~90     1186    Exhaustive validation (18+ checkers)
test_dsm_serialization.py     ~1      48      Binary encoding/decoding
test_dsm_structures.py        ~40     515     Concepts/Clubs/Structures/Attachments
test_dsm_types.py             ~80     889     Complete type system (11 categories)
test_definitions.py           ~10     130     Definitions conversion

Total: ~366 tests, 4539 lines

Special test files:

  • test_dsm_semantic_errors.py (1186 lines) - Tests all 18+ checkers exhaustively
  • test_dsm_types.py (889 lines) - Tests all 11 type categories (primitives, generics, mathematical)

Test DSM schemas:

  • test.dsm - Basic DSM constructs
  • test_comprehensive.dsm - Comprehensive type coverage (used by most tests)


5. Technical Constraints

Performance Considerations

  1. O(log n) Lookups via Maps
     - DSMDefinitionsInspector uses std::map<TypeName, shared_ptr<T>> for all entity types
     - Queries like queryConcept(typeName) are O(log n) (ordered-map lookup), not O(n) (linear search)
     - Critical for Phase 3 validation (18+ checkers × hundreds of type lookups)
     - Trade-off: Extra memory (maps duplicate vectors) vs fast lookups

  2. Single-Pass ANTLR Parsing
     - ANTLR parser is LL(*) with adaptive prediction (no backtracking)
     - DSM grammar designed for efficient parsing (no ambiguity)
     - Rationale: Fast parsing for large .dsm files (thousands of definitions)

  3. Early Exit on Validation Errors
     - Semantic checkers execute sequentially and stop after the first batch that reports errors
     - Example: If Phase 1 has syntax errors, Phases 2-4 are skipped
     - Example: If CheckerTypeReference fails, advanced checkers are skipped (they need resolved types)
     - Rationale: Save work, report errors early (fail fast)

  4. Immutable Value Objects
     - All DSM entities (DSMConcept, DSMStructure, etc.) have const fields (except runtimeId; see Memory Model)
     - Shared via shared_ptr (reference counting, no copies)
     - Rationale: Thread-safe sharing across parsers, code generators, validators
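
The early-exit rule above can be illustrated with a small, self-contained Python sketch. It is not the Viper implementation: the schema is modelled as a plain dict from definition name to referenced type names, and `run_batches`, `check_reserved`, `check_type_references`, and `check_recursion` are hypothetical names chosen for illustration. Within a batch every error is collected; later batches run only if the earlier ones reported nothing.

```python
from typing import Callable, Dict, List

# Hypothetical model: definition name -> names of the types it references.
Schema = Dict[str, List[str]]
Checker = Callable[[Schema], List[str]]


def check_reserved(schema: Schema) -> List[str]:
    """Report definitions that use a reserved identifier (illustrative list)."""
    reserved = {"concept", "struct"}
    return [f"'{name}' is a reserved identifier" for name in schema if name in reserved]


def check_type_references(schema: Schema) -> List[str]:
    """Report references to types that are not defined in the schema."""
    return [
        f"'{name}' references unknown type '{ref}'"
        for name, refs in schema.items()
        for ref in refs
        if ref not in schema
    ]


def check_recursion(schema: Schema) -> List[str]:
    """Report directly self-referencing definitions (assumes resolved types)."""
    return [f"'{name}' is directly recursive" for name, refs in schema.items() if name in refs]


def run_batches(schema: Schema, batches: List[List[Checker]]) -> List[str]:
    """Run batches in order; stop after the first batch that reports errors."""
    for batch in batches:
        errors = [msg for checker in batch for msg in checker(schema)]
        if errors:
            return errors  # later batches assume this one passed
    return []


BATCHES = [[check_reserved, check_type_references], [check_recursion]]
bad_schema = {"Point": ["Float"], "Node": ["Node"]}
errors = run_batches(bad_schema, BATCHES)
```

Here `errors` contains only the unknown-type message for `Point`; the recursion check on `Node` never runs because the first batch already failed.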

Thread Safety

Immutable components (thread-safe for reading):
- DSMDefinitions (after construction + RuntimeID assignment)
- DSMConcept, DSMStructure, DSMEnumeration, etc. (all const fields)
- DSMDefinitionsInspector (reads DSMDefinitions, no mutation)

Mutable components (not thread-safe):
- DSMBuilder (progressive append, internal state changes)
- DSMParseReport (accumulates errors during validation)
- DSMP::Definitions (internal AST, mutated during conversion)

Concurrency pattern:
- Parse in single thread (DSMBuilder → DSMParseReport → DSMDefinitions)
- Share results across threads (DSMDefinitions is immutable)

Example:

// Thread 1: Parse
auto builder = DSMBuilder::make();
builder->append("schema.dsm", content);
auto report = DSMParseReport::make();
std::shared_ptr<DSMDefinitions> dsm;
std::shared_ptr<Definitions> defs;
DSMHelper::parse(builder, report, dsm, defs);

// Thread 2+: Read (safe, dsm is immutable)
auto inspector = DSMDefinitionsInspector::make(dsm);
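// "concept" is a reserved keyword since C++20, so use another variable name.
// typeName is the TypeName of the concept to look up.
auto conceptDef = inspector->queryConcept(typeName);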
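
The same parse-once, read-everywhere pattern can be sketched in Python without depending on dsviper. Everything here is a stand-in: `ConceptDef` and `parse_once` are hypothetical, and immutability is approximated with a frozen dataclass and a read-only mapping view rather than the const fields of the C++ classes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping, Tuple


@dataclass(frozen=True)
class ConceptDef:
    """Hypothetical stand-in for an immutable DSM entity."""
    name: str
    fields: Tuple[str, ...]


def parse_once() -> Mapping[str, ConceptDef]:
    """Single-threaded 'parse' step that returns a read-only mapping."""
    defs = {
        "Person": ConceptDef("Person", ("name", "age")),
        "Company": ConceptDef("Company", ("title",)),
    }
    return MappingProxyType(defs)  # read-only view; item assignment raises TypeError


def count_fields(defs: Mapping[str, ConceptDef], name: str) -> int:
    """Reader task: only looks things up, never mutates."""
    return len(defs[name].fields)


# Parse in one thread, then share the result with several reader threads.
definitions = parse_once()
names = ["Person", "Company", "Person", "Company"]
with ThreadPoolExecutor(max_workers=4) as pool:
    counts = list(pool.map(lambda n: count_fields(definitions, n), names))
```

Because the readers never write, no locking is needed; any accidental write attempt fails loudly instead of racing.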

Error Handling

Parse/Validation Errors (accumulated in DSMParseReport):
- Syntax errors (Phase 1): Missing semicolons, invalid tokens
- Semantic errors (Phase 3): Duplicate names, circular inheritance, invalid types
- Pattern: Accumulate all errors, return report (don't throw)
- Rationale: Show all errors to user (not just first one)

Programmer Errors (exceptions):
- DSMErrors::missingRuntimeId - DSMDefinitions used before RuntimeID assigned
- inspector->checkConcept(typeName) - TypeName not found (assertion failure)
- Pattern: Throw exception immediately (fail fast)
- Rationale: Indicate API misuse (should never happen in correct code)

Exception Safety:
- DSM parsing is exception-safe (RAII with shared_ptr)
- If ANTLR throws (e.g., out of memory), partial state is cleaned up automatically
- No resource leaks (all allocations via smart pointers)

Error Message Format:

File: schema.dsm
Line: 42
Column: 15
Message: Identifier 'MyType' already used in namespace 'MyNamespace'
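
The two error paths described in this section (accumulate user errors, raise on programmer errors) can be contrasted in a short Python sketch. The classes below are hypothetical stand-ins, not the dsviper API; only the message layout is taken from the sample above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ParseError:
    """One reported problem, formatted like the sample above."""
    file: str
    line: int
    column: int
    message: str

    def format(self) -> str:
        return (f"File: {self.file}\nLine: {self.line}\n"
                f"Column: {self.column}\nMessage: {self.message}")


@dataclass
class Report:
    """Accumulates errors instead of raising on the first one."""
    errors: List[ParseError] = field(default_factory=list)

    def add(self, error: ParseError) -> None:
        self.errors.append(error)

    def has_error(self) -> bool:
        return bool(self.errors)


def check_duplicates(file: str, namespace: str,
                     decls: Iterable[Tuple[str, int, int]], report: Report) -> None:
    """User-error path: record every duplicate identifier and keep going."""
    seen = set()
    for name, line, column in decls:
        if name in seen:
            report.add(ParseError(
                file, line, column,
                f"Identifier '{name}' already used in namespace '{namespace}'"))
        seen.add(name)


def check_concept(index: Dict[str, object], name: str) -> object:
    """Programmer-error path: a missing name is API misuse, so raise at once."""
    if name not in index:
        raise LookupError(f"concept '{name}' not found")
    return index[name]


report = Report()
check_duplicates(
    "schema.dsm", "MyNamespace",
    [("MyType", 10, 9), ("Other", 20, 9), ("MyType", 42, 15), ("Other", 50, 9)],
    report)
```

Both duplicates end up in `report.errors`, so a tool can print them all in one run, while `check_concept` on an unknown name stops immediately.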

Memory Model

Reference Semantics (Viper standard):
- All DSM entities shared via std::shared_ptr<T>
- Example: std::shared_ptr<DSMConcept>, std::shared_ptr<DSMStructure>
- Rationale: Shared ownership (parsers, code generators, inspectors all hold references)

Const Fields (immutability):
- All fields const (except runtimeId which is assigned post-construction)
- Example: TypeName const typeName, std::vector<...> const fields
- Rationale: Thread-safe sharing, prevent accidental mutation

RuntimeID Exception (mutable field):

class DSMConcept {
    TypeName const typeName;           // Immutable
    std::shared_ptr<DSMTypeReference> const parent;  // Immutable
    UUId runtimeId;                    // Mutable (assigned in Phase 4)
};

Why mutable? RuntimeID depends on DefinitionsInspector (circular dependency: need Definitions to create Inspector, need Inspector to compute RuntimeID). Assigned in Phase 4 after Definitions created.
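
A minimal Python sketch of this two-step construction follows. It is an illustration only: `ConceptDef`, `MissingRuntimeIdError`, and `assign_runtime_id` are hypothetical names, and the assign-once guard is an assumption rather than documented Viper behaviour.

```python
import uuid
from typing import Optional


class MissingRuntimeIdError(RuntimeError):
    """Raised when the runtime ID is read before it has been assigned."""


class ConceptDef:
    """Entity whose descriptive fields are fixed at construction time."""

    def __init__(self, type_name: str, parent: Optional[str] = None) -> None:
        self._type_name = type_name   # read-only after construction
        self._parent = parent         # read-only after construction
        self._runtime_id: Optional[uuid.UUID] = None  # filled in later

    @property
    def type_name(self) -> str:
        return self._type_name

    @property
    def parent(self) -> Optional[str]:
        return self._parent

    @property
    def runtime_id(self) -> uuid.UUID:
        if self._runtime_id is None:
            raise MissingRuntimeIdError(f"{self._type_name}: runtime ID not assigned yet")
        return self._runtime_id

    def assign_runtime_id(self, value: uuid.UUID) -> None:
        """Late assignment step; allowed exactly once (assumption)."""
        if self._runtime_id is not None:
            raise ValueError(f"{self._type_name}: runtime ID already assigned")
        self._runtime_id = value


person = ConceptDef("Person")
person.assign_runtime_id(uuid.UUID(int=1))
```

Reading `runtime_id` before the assignment step raises, which mirrors the programmer-error category described under Error Handling.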

Memory Lifetime:
- DSMDefinitions lives as long as any consumer holds shared_ptr
- Typical pattern: Parse → hold DSMDefinitions in Database/CodeGenerator → release after use
- ANTLR AST (ParseTree) destroyed immediately after Phase 2 (not retained)

Type System Integration

DSM Types → Viper Types (conversion in Phase 4):
- DSMConcept → TypeConcept (via DSMDefinitionsToDefinitions::convert())
- DSMStructure → TypeStructure
- DSMEnumeration → TypeEnumeration
- DSMClub → TypeClub

RuntimeID Assignment (deterministic):
- RuntimeID computed from definition content (DSM namespace UUID + definition structure)
- Same DSM input → same RuntimeID (deterministic)
- Use case: Schema evolution (detect unchanged definitions)
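
One common way to obtain content-derived, deterministic identifiers is a name-based UUID. The sketch below uses Python's standard `uuid.uuid5` (SHA-1 based) over a namespace UUID and a canonical string of the definition. This is a swapped-in technique for illustration: the document does not state which hashing scheme Viper uses, and `runtime_id_for` and the canonical string layout are hypothetical.

```python
import uuid
from typing import Sequence, Tuple


def runtime_id_for(namespace: uuid.UUID, name: str,
                   fields: Sequence[Tuple[str, str]]) -> uuid.UUID:
    """Derive an ID from the namespace UUID plus the definition's structure.

    fields is a sequence of (field_name, type_name) pairs in declaration order.
    """
    canonical = name + "{" + ";".join(f"{type_name} {field_name}"
                                      for field_name, type_name in fields) + "}"
    return uuid.uuid5(namespace, canonical)


ns = uuid.UUID("12345678-1234-5678-1234-567812345678")  # illustrative namespace UUID
v1 = runtime_id_for(ns, "Point", [("x", "float"), ("y", "float")])
v1_again = runtime_id_for(ns, "Point", [("x", "float"), ("y", "float")])
v2 = runtime_id_for(ns, "Point", [("x", "float"), ("y", "float"), ("z", "float")])
```

`v1` and `v1_again` are equal because the inputs are identical, while `v2` differs because a field was added. Comparing IDs across schema versions is therefore enough to spot unchanged definitions.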

Type Resolution:
- Phase 3 CheckerTypeReference resolves all optional<TypeName> in AST
- After Phase 3, all type references are valid (or error reported)
- Phase 4 conversion assumes resolved types (no further validation)


6. Cross-References

DSM Language (for users writing .dsm files):
- doc/DSM.md - DSM language specification (concepts, clubs, structures, attachments, syntax)
- doc/Getting_Started.md - Tutorial for writing .dsm schema files
- doc/Getting_Started_With_dsm_util.py.md - Command-line tool guide (dsm_util.py check/encode/decode)

Implementation Details:
- doc/Internal_Viper.md (section "Parsing DSM Definitions") - ANTLR parser architecture, listener pattern, semantic checkers
- doc/Kibo.md - Kibo code generator manual (consumes .dsmb binary files)
- doc/Kibo_Template_Model.md - Template model for code generation from DSMDefinitions

Related Domains:
- doc/domains/Type_And_Value_System.md - Foundation for DSM type representation
- doc/domains/Database.md - Uses DSMDefinitions for schema initialization
- doc/domains/Commit_System.md - Uses DSMCommitFunctionPool for mutations

Dependencies

This domain USES:
- Type System (Foundation Layer 0) - TypeName representation for DSM types (22 includes: Viper_TypeName.hpp)
- UUId (Foundation Layer 0) - Namespace UUIDs, RuntimeID generation (15 includes: Viper_UUId.hpp)
- NameSpace (Foundation Layer 0) - Namespace organization (6 includes: Viper_NameSpace.hpp)
- Error (Foundation Layer 0) - Exception types (2 includes: Viper_Error.hpp)
- Blob (Foundation Layer 0) - Binary serialization storage (2 includes: Viper_Blob.hpp)

This domain is USED BY:
- Database (Functional Layer 1) - DSMHelper::createStoreDatabase() initializes schemas from DSMDefinitions
- CommitDatabase (Functional Layer 1) - DSMHelper::createCommitDatabase() initializes schemas from DSMDefinitions
- Kibo code generator (External tool, Java) - Consumes binary .dsmb files (DSMDefinitionsEncoder output)
- StringHelper (Foundation Layer 0) - Pretty-printing DSM types via DSMLiteralDomain, DSMTypeReferenceDomain

Key Type References

C++ Headers (125 files):
- Main components (53 files): src/Viper/Viper_DSM*.hpp (DSMBuilder, DSMDefinitions, DSMType, etc.)
- Parser/Checker internals (72 files): src/Viper/Viper_DSMP_*.hpp (DSMP::Listener, DSMP::Checker, etc.)

C++ Implementations (117 files):
- src/Viper/Viper_DSM*.cpp
- Notable: Viper_DSMHelper.cpp (Facade implementation, 4-phase pipeline)

Python Bindings (31 files):
- src/P_Viper/P_Viper_DSM*.cpp (DSMBuilder, DSMDefinitions, DSMType*, etc.)
- Entry points: P_Viper_DSMBuilder.cpp (assemble/parse), P_Viper_DSMDefinitions.cpp (encode/decode)

Python Type Hints:
- dsviper_wheel/__init__.pyi - Type stubs for DSMBuilder, DSMDefinitions, DSMType*, etc.

ANTLR Grammar:
- dsm/DSM.g4 - DSM language grammar (generates ANTLR lexer/parser)
- Generated files: DSMLexer.h, DSMParser.h (C++ ANTLR output)

Test Files (10 files, 4539 lines):
- python/tests/unit/test_dsm_builder.py (424 lines)
- python/tests/unit/test_dsm_semantic_errors.py (1186 lines) - Exhaustive checker coverage
- python/tests/unit/test_dsm_types.py (889 lines) - Complete type system coverage
- See Section 4.3 for full test suite reference

Tools:
- tools/dsm_util.py - Command-line tool (check/encode/decode/generate)
- tools/kibo-1.2.0.jar - Code generator (consumes .dsmb binary files)


Document Metadata

Methodology Version: v1.3.1
Generated Date: 2025-11-14
Last Updated: 2025-11-14
Review Status: ✅ Complete

Test Files Analyzed: 10 files
- test_dsm_builder.py (424 lines, ~40 tests)
- test_dsm_definitions.py (414 lines, ~30 tests)
- test_dsm_functions.py (381 lines, ~25 tests)
- test_dsm_inspector.py (381 lines, ~35 tests)
- test_dsm_literals.py (171 lines, ~15 tests)
- test_dsm_semantic_errors.py (1186 lines, ~90 tests)
- test_dsm_serialization.py (48 lines, ~1 test)
- test_dsm_structures.py (515 lines, ~40 tests)
- test_dsm_types.py (889 lines, ~80 tests)
- test_definitions.py (130 lines, ~10 tests)

Test Coverage: 4539 lines, ~366 tests
Golden Examples: 7 scenarios extracted
C++ Files: 242 (125 headers + 117 implementations)
Python Bindings: 31 files

Changelog:
- v1.0 (2025-11-14): Initial documentation following /document-domain v1.3.1 methodology
  - Phase 0.5 audit: Identified 7 sub-domains, 34 main components, 18+ semantic checkers
  - Phase 0.75 C++ analysis: 9 design patterns identified, dual AST representation understood
  - Phase 1-3 validation: 7 golden scenarios extracted from tests, user validated structure
  - Phase 5 implementation: 6 sections completed (~800 lines)
  - Special focus: 4-phase pipeline, semantic validation (18+ checkers), composite type system (11 categories)
  - Correction: representationGl() documented as Global (root namespace), not GLSL

Regeneration Trigger:
- When /document-domain methodology reaches v2.0 (methodology changes)
- When DSM C++ API changes significantly (e.g., new checker categories, AST restructuring)
- When ANTLR grammar changes require documentation updates (e.g., new DSM language features)


Appendix: Domain Statistics

C++ Files: 242 (125 headers + 117 implementations)
- Main components: 53 (Viper_DSM*.hpp)
- Parser/Checker internals: 72 (Viper_DSMP_*.hpp)

Python Bindings: 31 files (P_Viper_DSM*.cpp)

Test Files: 10 files (4539 lines)

Sub-domains: 7
  1. Builder/Parser (3 components)
  2. Definitions (5 components)
  3. DSM Entities (8 components)
  4. DSM Type System (11 categories)
  5. Semantic Validation (18+ checkers)
  6. Literals (3 components)
  7. Error Handling (3 components)

Design Patterns: 9
  1. Facade (DSMHelper)
  2. Pipeline (4-phase parsing)
  3. Builder (DSMBuilder)
  4. Repository (DSMDefinitionsInspector)
  5. Strategy (DSMType hierarchy)
  6. Composite (recursive types)
  7. Template Method (DSMP::Checker)
  8. Chain of Responsibility (semantic validation)
  9. Immutable Value Object (DSM entities)

Semantic Checkers: 18+
- Reserved Identifiers: 2
- Already Defined: 7
- Recursive Definitions: 3
- Type Validation: 6
- (See Section 3.5 for complete list)

Type Categories: 11
- Reference, Vector, Set, Map, Optional, Tuple, Variant, XArray, Key, Vec, Mat