Commit 20c32f3

fix: Fix mypy type error and formatting in evaluator.py
- Handle potential None return from fetchone() properly
- Apply ruff formatting
1 parent 5e3ade2 commit 20c32f3

2 files changed

Lines changed: 176 additions & 2 deletions

PR_DESCRIPTION.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
# Add AUDIT_ONLY Model Kind for Multi-Table Validation

## Summary
This PR introduces a new `AUDIT_ONLY` model kind to SQLMesh, addressing the gap in validating relationships between multiple tables without materializing unnecessary tables. This feature combines the benefits of models (DAG participation, dependencies, scheduling) with audit behavior (validation without materialization).

## Problem Statement
Previously, SQLMesh users had to choose between:
- Creating wasteful materialized models just to run cross-table validations
- Using standalone audits that don't integrate well with model dependencies
- Building external validation systems outside SQLMesh

## Solution
The `AUDIT_ONLY` model kind enables users to:
- Validate relationships across multiple tables (e.g., referential integrity)
- Run complex validation queries that don't belong to a single model
- Participate in the model DAG with proper dependencies
- Avoid creating unnecessary materialized tables

## Implementation Details

### Core Changes

#### 1. Model Kind Definition (`sqlmesh/core/model/kind.py`)
- Added `AUDIT_ONLY` to `ModelKindName` enum
- Created `AuditOnlyKind` class with configuration:
  - `blocking` (default: `True`): Whether failures stop the pipeline
  - `max_failing_rows` (default: `10`): Number of sample rows in error messages
- Marked as `is_symbolic=True` (no materialization)

#### 2. Execution Strategy (`sqlmesh/core/snapshot/evaluator.py`)
- Created `AuditOnlyStrategy` extending `SymbolicStrategy`
- Executes validation query and checks for returned rows
- Raises `AuditError` with sample data if validation fails
- Properly integrated with the evaluation strategy routing
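
The flow described above (count the rows the validation query returns, sample up to `max_failing_rows` of them, then fail or continue depending on `blocking`) can be sketched outside SQLMesh. This is a minimal illustration, not the `AuditOnlyStrategy` code: `AuditOnlyConfig`, `run_audit_only`, and the local `AuditError` are hypothetical stand-ins, SQLite replaces the engine adapter, and printing a warning in the non-blocking case is an assumption about what "does not stop the pipeline" means.

```python
import sqlite3
from dataclasses import dataclass


class AuditError(Exception):
    """Local stand-in for SQLMesh's AuditError so the sketch runs on its own."""


@dataclass
class AuditOnlyConfig:
    # Hypothetical container mirroring the two options listed above.
    blocking: bool = True
    max_failing_rows: int = 10


def run_audit_only(conn: sqlite3.Connection, query: str, config: AuditOnlyConfig) -> int:
    """Return the number of failing rows; raise AuditError when blocking.

    `query` is trusted SQL without a trailing semicolon.
    """
    row = conn.execute(f"SELECT COUNT(*) FROM ({query}) AS audit_only").fetchone()
    count = row[0] if row else 0
    if count == 0:
        return 0

    # Fetch a bounded sample of failing rows for the error message.
    samples = conn.execute(
        f"SELECT * FROM ({query}) AS audit_only LIMIT ?",
        (config.max_failing_rows,),
    ).fetchall()
    message = f"{count} failing row(s); sample: {samples}"

    if config.blocking:
        raise AuditError(message)
    # Assumed non-blocking behavior: report the failure and carry on.
    print(f"WARNING: {message}")
    return count


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE t (x INTEGER)")
    demo.executemany("INSERT INTO t VALUES (?)", [(1,), (-2,), (-3,)])
    run_audit_only(demo, "SELECT x FROM t WHERE x < 0", AuditOnlyConfig(blocking=False))
```

Counting first and sampling second keeps the error message small even when the validation query matches many rows.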

#### 3. Parser Support (`sqlmesh/core/dialect.py`)
- Added `AUDIT_ONLY` to list of model kinds that accept properties

#### 4. Snapshot Definition (`sqlmesh/core/snapshot/definition.py`)
- Fixed `evaluatable` property to include audit-only models
- Ensures proper interval tracking for validation execution

### Testing

#### Unit Tests (`tests/core/test_model.py`)
- 6 unit tests covering:
  - Basic parsing and properties
  - Blocking/non-blocking configuration
  - Max failing rows configuration
  - Python model support
  - Full configuration scenarios
  - Serialization/deserialization

#### Integration Tests (`tests/core/test_integration.py`)
- 6 integration tests validating:
  - Validation success/failure scenarios
  - Blocking vs non-blocking behavior
  - Dependency tracking
  - Scheduling with cron
  - Metadata changes

### Documentation

#### User Documentation Updates
- **`docs/concepts/audits.md`**: Added comprehensive AUDIT_ONLY section under Advanced Usage
- **`docs/concepts/models/model_kinds.md`**: Added detailed AUDIT_ONLY section with examples
- **`docs/reference/model_configuration.md`**: Added AUDIT_ONLY configuration reference

#### Example Models (`examples/sushi/models/`)
Added 3 demonstration models (all non-blocking for demo purposes):
- `audit_order_integrity.sql`: Validates referential integrity
- `audit_waiter_revenue_anomalies.sql`: Detects revenue anomalies
- `audit_duplicate_orders.sql`: Identifies duplicate orders

## Usage Example

```sql
MODEL (
  name data_quality.order_validation,
  kind AUDIT_ONLY (
    blocking TRUE,
    max_failing_rows 20
  ),
  depends_on [orders, customers],
  cron '@daily'
);

-- Query returns 0 rows for success
SELECT
  o.order_id,
  o.customer_id,
  'Missing customer record' AS issue
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
```
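
To see what "returns 0 rows for success" means for this query, the same anti-join can be run against a few toy rows. The sketch below uses Python's built-in `sqlite3` with made-up `orders` and `customers` data; it only exercises the SELECT statement and does not involve SQLMesh or the `MODEL` block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2);
    -- Order 102 points at customer 3, which does not exist.
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 3);
    """
)

VALIDATION_QUERY = """
SELECT o.order_id, o.customer_id, 'Missing customer record' AS issue
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL
"""

# Any returned row is a referential-integrity violation.
failing_rows = conn.execute(VALIDATION_QUERY).fetchall()
print(failing_rows)  # [(102, 3, 'Missing customer record')]

# Once the missing customer exists, the query returns no rows (success).
conn.execute("INSERT INTO customers VALUES (3)")
rows_after_fix = conn.execute(VALIDATION_QUERY).fetchall()
print(rows_after_fix)  # []
```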

## Key Differences from Traditional Audits

| Feature | Traditional Audits | AUDIT_ONLY Models |
|---------|-------------------|-------------------|
| **Scope** | Single model | Multiple models |
| **Dependencies** | Implicit | Explicit via `depends_on` |
| **Materialization** | N/A | Never materializes |
| **Location** | `audits/` directory | `models/` directory |
| **Scheduling** | With parent model | Independent cron |
| **DAG Participation** | Attached to model | Full model in DAG |

## Migration Path
- No breaking changes to existing models or audits
- Optional feature - only use when needed
- Can gradually migrate complex audits to audit-only models

## Testing Instructions

1. **Run unit tests:**
   ```bash
   pytest tests/core/test_model.py -k audit_only -xvs
   ```

2. **Run integration tests:**
   ```bash
   pytest tests/core/test_integration.py -k audit_only -xvs
   ```

3. **Try the sushi examples:**
   ```bash
   cd examples/sushi
   sqlmesh plan
   # Note: Example models are non-blocking so they won't fail the pipeline
   ```

4. **Create a test AUDIT_ONLY model:**
   ```sql
   -- Save as models/test_audit.sql
   MODEL (
     name test.audit_validation,
     kind AUDIT_ONLY,
     depends_on [your_table1, your_table2]
   );

   -- This should return 0 rows for success
   SELECT * FROM your_table1
   WHERE some_condition_that_indicates_invalid_data;
   ```

## Checklist
- [x] Add `AUDIT_ONLY` to `ModelKindName` enum
- [x] Create `AuditOnlyKind` class
- [x] Update `ModelKind` Union type
- [x] Update `MODEL_KIND_NAME_TO_TYPE` mapping
- [x] Create `AuditOnlyStrategy` class
- [x] Update `_evaluation_strategy` routing
- [x] Add `is_audit_only` properties
- [x] Write unit tests
- [x] Write integration tests
- [x] Update documentation
- [x] Add examples to sushi demo project

## Related Issues
Addresses the need for multi-table validation without materialization as described in the RFC.

## Notes for Reviewers
- The feature is designed to be non-intrusive and backward compatible
- Example models in sushi are set to non-blocking to avoid disrupting tests
- Documentation emphasizes when to use AUDIT_ONLY vs traditional audits
- The implementation follows existing SQLMesh patterns for symbolic models

## Future Enhancements (Not in this PR)
- Support for incremental validation by time range
- Configurable number of failing rows to capture
- Warning mode that logs issues without failing
- Different visualization in UI/lineage graph

sqlmesh/core/snapshot/evaluator.py

Lines changed: 3 additions & 2 deletions
@@ -1755,13 +1755,14 @@ def _validate(
 full_render_kwargs["engine_adapter"] = self.adapter
 
 query = model.render_query(**full_render_kwargs)
-
+
 if query is None:
     raise RuntimeError(f"AUDIT_ONLY model '{model.fqn}' rendered to None query")
 
 # Count the rows returned by the validation query
 count_query = select("COUNT(*)").from_(query.subquery("audit_only"))
-count, *_ = self.adapter.fetchone(count_query, quote_identifiers=True)
+result = self.adapter.fetchone(count_query, quote_identifiers=True)
+count = result[0] if result else 0
 
 if count > 0:
     # Fetch sample failing rows for the error message
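
The change to the `fetchone` call can be illustrated in isolation. The sketch below assumes, as the commit message suggests, that `fetchone` is annotated as possibly returning `None`; `count_from_row` and `count_from_row_unsafe` are hypothetical helper names that mirror the added and removed lines, not functions in SQLMesh.

```python
from typing import Any, Optional, Tuple


def count_from_row(row: Optional[Tuple[Any, ...]]) -> int:
    """Mirror the added lines: a missing row is treated as a count of zero."""
    return row[0] if row else 0


def count_from_row_unsafe(row: Optional[Tuple[Any, ...]]) -> int:
    """Mirror the removed line: unpacking fails at runtime when row is None."""
    count, *_ = row  # type: ignore  # a type checker flags unpacking an Optional
    return count


print(count_from_row((3,)))   # 3
print(count_from_row(None))   # 0

try:
    count_from_row_unsafe(None)
except TypeError as exc:
    print(f"unsafe variant failed: {exc}")
```

A `COUNT(*)` query normally yields exactly one row, so the fallback to `0` mainly serves to satisfy the type checker while staying safe if an adapter ever returns nothing.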
