Commit ed558f1

Author: Sung Won Chung
Commit message: polished formatting and prose
1 parent 1c43be3 commit ed558f1

1 file changed: docs/examples/sqlmesh_cli_crash_course.md
Lines changed: 28 additions & 17 deletions
@@ -1,11 +1,15 @@
 # SQLMesh CLI Crash Course
 
-This doc is designed to get you intimate with a **majority** of the SQLMesh workflows and commands you’ll use to build *and* maintain data pipelines. The goal is after 30 minutes, using SQLMesh becomes muscle memory. This is designed to live on your second monitor or in a side by side window, so you can swiftly copy/paste into your terminal.
+This doc is designed to get you intimate with a **majority** of the SQLMesh workflows you’ll use to build *and* maintain transformation data pipelines. The goal is that after 30 minutes, using SQLMesh becomes muscle memory.
 
-This is inspired by community observations, face to face conversations, live screenshares, and debugging sessions. This is *not* an exhaustive list, but it is an earnest one.
+This is inspired by community observations, face-to-face conversations, live screenshares, and debugging sessions. This is *not* an exhaustive list, but it is rooted in lived experience.
 
 You can follow along in this [open source GitHub repo](https://github.com/sungchun12/sqlmesh-cli-crash-course).
 
+If you're new to how SQLMesh uses virtual data environments, [watch this quick explainer](https://www.loom.com/share/216835d64b3a4d56b2e061fa4bd9ee76?sid=88b3289f-e19b-4ccc-8b88-3faf9d7c9ce3).
+
+Note: This doc is designed to live on your second monitor or in a side-by-side window, so you can swiftly copy/paste into your terminal.
+
 ## **Development Workflow**
 
 You’ll use these commands 80% of the time because this is how you apply code changes. The workflow is as follows:
@@ -180,7 +184,7 @@ Run data diff against prod. This is a good way to verify the changes are behavin
 - Showed me sample data differences between the environments.
 - This is where your human judgement comes in to verify the changes are behaving as expected.
 
-```sql
+```sql linenums="1" hl_lines="6"
 -- models/full_model.sql
 MODEL (
   name sqlmesh_example.full_model,
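To make the idea in the hunk above concrete: a data diff compares the same model between a dev environment and prod and surfaces row-level differences for human judgement. The sketch below is a plain-Python toy with invented sample rows — it is not SQLMesh's implementation, just the shape of the comparison:

```python
# Toy data diff between two environments (NOT SQLMesh's implementation).
# Rows are dicts keyed by a join column; we report keys unique to each side
# plus rows whose values changed -- the kind of sample differences you then
# verify by hand, as the doc notes.

def data_diff(prod_rows, dev_rows, key):
    prod = {r[key]: r for r in prod_rows}
    dev = {r[key]: r for r in dev_rows}
    changed = {
        k: (prod[k], dev[k])
        for k in prod.keys() & dev.keys()
        if prod[k] != dev[k]
    }
    return {
        "only_prod": sorted(prod.keys() - dev.keys()),
        "only_dev": sorted(dev.keys() - prod.keys()),
        "changed": changed,
    }

# Invented sample rows for illustration only.
prod_rows = [{"item_id": 1, "num_orders": 5}, {"item_id": 2, "num_orders": 3}]
dev_rows = [{"item_id": 1, "num_orders": 7}, {"item_id": 3, "num_orders": 1}]

diff = data_diff(prod_rows, dev_rows, key="item_id")
```

Here `item_id` and `num_orders` are made-up column names; in practice the diff tool joins on your model's grain.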
@@ -358,10 +362,9 @@ You can automatically parse fully qualified table/view names that are outside of
 
 - Generated external models from the `bigquery-public-data`.`ga4_obfuscated_sample_ecommerce`.`events_20210131` table parsed in the model's SQL.
 - I added an audit to the external model to ensure `event_date` is not null.
-- Viewed a plan preview of the changes that will be made to the external model.
+- Viewed a plan preview of the changes that will be made for the external model.
 
-```sql
--- models/external_model_example.sql
+```sql linenums="1" hl_lines="29" title="models/external_model_example.sql"
 MODEL (
   name tcloud_demo.external_model
 );
@@ -393,8 +396,7 @@ You can automatically parse fully qualified table/view names that are outside of
 FROM bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131 -- I fully qualified the external table name and sqlmesh will automatically create the external model
 ```
 
-```yaml
-# external_models.yaml
+```yaml linenums="1" hl_lines="2 3 4" title="external_models.yaml"
 - name: '`bigquery-public-data`.`ga4_obfuscated_sample_ecommerce`.`events_20210131`'
   audits: # I added this audit manually to the external model
   - name: not_null
@@ -513,8 +515,7 @@ You can ensure business logic is working as expected with static sample data. Th
 - If you're using a cloud data warehouse, this will transpile your SQL syntax to its equivalent in duckdb.
 - This runs fast and free on your local machine.
 
-```yaml
-# tests/test_full_model.yaml
+```yaml linenums="1" title="tests/test_full_model.yaml"
 test_full_model:
   model: '"db"."sqlmesh_example"."full_model"'
   inputs:
@@ -640,7 +641,7 @@ This is great to catch issues before wasting runtime in your data warehouse. You
 
 You add linting rules in your `config.yaml` file.
 
-```yaml
+```yaml linenums="1" hl_lines="13-17" title="config.yaml"
 gateways:
   duckdb:
     connection:
@@ -722,7 +723,7 @@ This is great to verify the SQL is looking as expected before applying the chang
 
 It outputs the full SQL code in the default or target dialect.
 
-```sql
+```sql hl_lines="10"
 -- rendered sql in default dialect
 SELECT
   "seed_model"."id" AS "id",
@@ -750,8 +751,7 @@ This is great to verify the SQL is looking as expected before applying the chang
   AND `seed_model`.`event_date` >= CAST('1970-01-01' AS DATE)
 ```
 
-```sql
--- original sqlmesh model code
+```sql linenums="1" title="models/incremental_model.sql"
 MODEL (
   name sqlmesh_example.incremental_model,
   kind INCREMENTAL_BY_TIME_RANGE (
@@ -799,7 +799,7 @@ You can see detailed operations in the physical and virtual layers. This is usef
 
 ??? "Example Output"
 
-    ```bash
+    ```bash hl_lines="47-49"
     [WARNING] Linter warnings for
     /Users/sung/Desktop/git_repos/sqlmesh-cli-revamp/models/incremental_by_partition.sql:
     - nomissingaudits: Model `audits` must be configured to test data quality.
@@ -905,6 +905,16 @@ bat --theme='ansi' $(ls -t logs/ | head -n 1 | sed 's/^/logs\//')
 
 ## **Run on Production Schedule**
 
+SQLMesh schedules your transformations on a per-model basis in proper DAG order. This makes it easy to configure how often each step in your pipeline runs to backfill data without running when upstream models are late or have failed. Rerunning from the point of failure is also a default!
+
+`stg_transactions` (cron: `@hourly`) -> `fct_transactions` (cron: `@daily`). All times in UTC.
+
+1. `stg_transactions` runs hourly
+2. `fct_transactions` runs at 12am UTC if `stg_transactions` is fresh and has updated since its most recent hourly interval
+3. If `stg_transactions` failed from 11pm-11:59:59pm, it will prevent `fct_transactions` from running and put it in a `pending` state
+4. If `fct_transactions` is `pending` past its full interval (1 full day), it will be put in a `late` state
+5. Once `stg_transactions` runs successfully, either from a retry or a fix from a pull request, `fct_transactions` will rerun from the point of failure. This is true even if `fct_transactions` has been `late` for several days.
+
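The pending/late behavior in the numbered list above can be modeled as a tiny state function. This is an illustrative toy under the list's stated rules — not SQLMesh's actual scheduler, and the function and parameter names are invented:

```python
from datetime import datetime, timedelta

# Toy model of the downstream states described above (NOT SQLMesh internals).
# A daily downstream model waits on its upstream: it is "pending" while the
# upstream's latest interval hasn't landed successfully, and "late" once a
# full interval (1 day) has passed beyond its scheduled run without success.

def downstream_state(upstream_ok: bool, now: datetime, scheduled_at: datetime,
                     interval: timedelta = timedelta(days=1)) -> str:
    if upstream_ok:
        return "ready"    # upstream is fresh, so the daily model can run
    if now - scheduled_at > interval:
        return "late"     # pending past its full interval (1 full day)
    return "pending"      # blocked, waiting on an upstream retry or fix

# Example: fct_transactions scheduled for midnight UTC.
midnight = datetime(2024, 1, 2, 0, 0)
```

Once `upstream_ok` flips to true (a retry or a merged fix), the model returns to `ready` no matter how long it sat `late` — matching point 5 above.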
 If you're using open source SQLMesh, you can run this command in your orchestrator (ex: Dagster, GitHub Actions, etc.) every 5 minutes or at your lowest model cron schedule (ex: every 1 hour). Don't worry! It will only run executions that need to be run.
 
 If you're using Tobiko Cloud, this happens automatically without additional configuration.
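As one concrete way to wire this up with open source SQLMesh, a GitHub Actions workflow could look like the sketch below. The workflow name, checkout/setup steps, and invoking `sqlmesh run` with no arguments are assumptions for illustration, not something this doc specifies:

```yaml
# Hypothetical GitHub Actions workflow: invoke `sqlmesh run` on a schedule.
# The cadence matches the doc's suggestion of every 5 minutes; adjust it to
# your lowest model cron. SQLMesh only executes what actually needs to run.
name: sqlmesh-scheduled-run
on:
  schedule:
    - cron: "*/5 * * * *"
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install sqlmesh
      - run: sqlmesh run
```

In a real setup you would also provide warehouse credentials (e.g. via repository secrets) for the gateway in your `config.yaml`.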
@@ -985,7 +995,7 @@ You can run models that execute backfills each time you invoke a run whether ad
 
 ??? "Example Model Config"
 
-    ```sql
+    ```sql linenums="1" hl_lines="9" title="models/incremental_model.sql"
     MODEL (
       name sqlmesh_example.incremental_model,
       kind INCREMENTAL_BY_TIME_RANGE (
@@ -1002,8 +1012,9 @@ You can run models that execute backfills each time you invoke a run whether ad
 
 This is an advanced workflow and specifically designed for large incremental models (ex: > 200 million rows) that take a long time to run even during development. It solves for:
 
-- Transforming data with schema evolution in json and nested array data types.
+- Transforming data with schema evolution in `struct` and nested `array` data types.
 - Retaining history of a calculated column and applying a new calculation to new rows going forward.
+- Retaining history of a column with complex conditional `CASE WHEN` logic and applying new conditions to new rows going forward.
 
 When you apply the plan to `prod` after the dev workflow, it will NOT backfill historical data. It will only execute model batches **forward only** for new intervals (new rows).
 