Language Internals
Deep dive into how MDZ works under the hood: parser architecture, AST structure, validation pipeline, compiler internals, and LSP integration.
Architecture Overview
When you run mdz compile skill.mdz, the following pipeline executes:
- Lexer tokenizes the source into a stream of tokens
- Parser builds an Abstract Syntax Tree (AST)
- Compiler extracts metadata and validates
- Output returns the original source (unchanged)
- LSP provides IDE features during editing
This architecture ensures MDZ skills are validated at build time while remaining readable by both humans and LLMs at runtime.
Core Principle: Source = Output
The LLM sees exactly what you write. There is no transformation layer between source and execution. The compiler:
- Parses the source into AST
- Extracts metadata and validates
- Returns the original source unchanged
- Provides diagnostics and dependency information
Internals Documentation
Explore the detailed documentation for each component:
- AST Structure — Parser architecture, lexer tokens, recursive descent parsing, and AST node types
- Compilation — Compiler internals, metadata extraction, skill registry, and dependency graph
- Validation — Multi-stage validation pipeline with concrete diagnostic examples
- Terminology — Glossary of MDZ syntax elements and canonical terms
Implementation Challenges
Key challenges encountered during MDZ development and their solutions:
Indentation-Aware Parsing
Challenge: Python-style significant whitespace is complex to parse correctly, especially with mixed tabs/spaces and error recovery.
Solution: Indentation stack with explicit INDENT/DEDENT tokens, panic-mode error recovery, and clear error messages for inconsistent indentation.
Source = Output Constraint
Challenge: Validator-first approach requires preserving exact source formatting while still validating semantics.
Solution: No transformation pipeline - compiler only extracts metadata and validates, returns original source unchanged.
Skill Registry Management
Challenge: Cross-skill validation requires loading and caching multiple ASTs, with potential for stale data and performance issues.
Solution: Lazy loading with invalidation on file changes, workspace-wide registry with efficient lookup, and graceful degradation when skills are unavailable.
LLM-Friendly Error Messages
Challenge: Technical error messages are confusing for LLM authors who may not understand compiler internals.
Solution: Context-aware error messages with suggestions, examples of correct syntax, and progressive disclosure of technical details.
Real-Time IDE Performance
Challenge: LSP must provide instant feedback during typing, but full validation can be expensive.
Solution: Incremental parsing with AST diffing, cached validation results, and prioritized error reporting (syntax > types > references).
Contributor Pathways
MDZ is designed to be extensible. Here are key areas where contributors can add functionality:
Extending the Lexer
To add a new token type (e.g., a new keyword):
- Add token type to
TokenTypeunion inlexer.ts - Add pattern matching in
scanIdentOrKeyword() - Handle in parser grammar productions
- Add syntax highlighting rules to editor extensions
Adding AST Nodes
To add new language constructs:
- Define AST interface in
ast.ts - Add to appropriate union type (
Block,Expression, etc.) - Implement parser production in
parser.ts - Add validation logic in
compiler.ts - Update LSP handlers for IDE features
Validation Rules
Extend validation by adding new stages or modifying existing ones:
- Type checking: Add custom type rules beyond basic resolution
- Contract validation: Verify delegation parameter matching
- Cross-skill analysis: Validate data flow between skills
LSP Protocol Integration
IDE features are implemented as LSP message handlers:
- textDocument/completion: Add context-aware suggestions
- textDocument/hover: Provide type/variable information
- textDocument/definition: Implement go-to-definition for new constructs