docs/examples/sqlmesh_cli_crash_course.md
# SQLMesh CLI Crash Course
This doc is designed to get you intimate with a **majority** of the SQLMesh workflows you’ll use to build *and* maintain transformation data pipelines. The goal is that after 30 minutes, using SQLMesh becomes muscle memory.
This is inspired by community observations, face-to-face conversations, live screenshares, and debugging sessions. This is *not* an exhaustive list, but it is rooted in lived experience.
You can follow along in this [open source GitHub repo](https://github.com/sungchun12/sqlmesh-cli-crash-course).
If you're new to how SQLMesh uses virtual data environments, [watch this quick explainer](https://www.loom.com/share/216835d64b3a4d56b2e061fa4bd9ee76?sid=88b3289f-e19b-4ccc-8b88-3faf9d7c9ce3).
Note: This is designed to live on your second monitor or in a side-by-side window, so you can swiftly copy/paste into your terminal.
## **Development Workflow**
You’ll use these commands 80% of the time because this is how you apply code changes. The workflow is as follows:
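As a minimal sketch, the core loop looks like this (the environment name `dev` is just an example; any name works):

```shell
# Create or update a virtual dev environment, preview the diff, and apply it
sqlmesh plan dev

# Inspect the results in the dev environment, then promote to prod
sqlmesh plan
```

Applying a plan to `prod` promotes the views virtually, so nothing is recomputed that was already backfilled in dev.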
...

Run data diff against prod. This is a good way to verify the changes are behaving as expected.
- Showed me sample data differences between the environments.
- This is where your human judgement comes in to verify the changes are behaving as expected.
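A sketch of the diff command (the model name `sqlmesh_example.full_model` is assumed from this repo's examples):

```shell
# Compare the dev environment's version of the model against prod
sqlmesh table_diff prod:dev sqlmesh_example.full_model
```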
```sql linenums="1" hl_lines="6"
-- models/full_model.sql
MODEL (
name sqlmesh_example.full_model,
  ...
)
```

...

You can automatically parse fully qualified table/view names that are outside of your SQLMesh project.
- Generated external models from the `bigquery-public-data`.`ga4_obfuscated_sample_ecommerce`.`events_20210131` table parsed in the model's SQL.
- I added an audit to the external model to ensure `event_date` is not null.
- Viewed a plan preview of the changes that will be made for the external model.
```sql
FROM bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131 -- I fully qualified the external table name and sqlmesh will automatically create the external model
```

- `nomissingaudits`: Model `audits` must be configured to test data quality.
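The parsing step above pairs with a real CLI command; a sketch (run from the project root):

```shell
# Scan model SQL for fully qualified external tables and
# generate external model definitions from their schemas
sqlmesh create_external_models
```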
...

```shell
bat --theme='ansi' $(ls -t logs/ | head -n 1 | sed 's/^/logs\//')
```
## **Run on Production Schedule**
SQLMesh schedules your transformations on a per-model basis in proper DAG order. This makes it easy to configure how often each step in your pipeline runs to backfill data, without running when upstream models are late or have failed. Rerunning from the point of failure is also a default!
Example: `stg_transactions` (cron: `@hourly`) -> `fct_transactions` (cron: `@daily`). All times in UTC.
1. `stg_transactions` runs hourly
2. `fct_transactions` runs at 12am UTC if `stg_transactions` is fresh and updated since its most recent hour interval
3. If `stg_transactions` failed from 11pm-11:59:59pm, it will prevent `fct_transactions` from running and put it in a `pending` state
4. If `fct_transactions` is `pending` past its full interval (1 full day), it will be put in a `late` state
5. Once `stg_transactions` runs successfully either from a retry or a fix from a pull request, `fct_transactions` will rerun from the point of failure. This is true even if `fct_transactions` has been `late` for several days.
If you're using open source SQLMesh, you can run this command in your orchestrator (ex: Dagster, GitHub Actions, etc.) every 5 minutes or at your lowest model cron schedule (ex: every 1 hour). Don't worry! It will only run executions that need to be run.
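For example, a crontab entry (the project path is a placeholder) that invokes the scheduler every 5 minutes; `sqlmesh run` only executes models whose intervals are actually due:

```shell
# Every 5 minutes: cd into the project and let SQLMesh decide what's due
*/5 * * * * cd /path/to/your/project && sqlmesh run
```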
If you're using Tobiko Cloud, this is configured automatically; no additional setup is required.
...

You can run models that execute backfills each time you invoke a run, whether ad...
This is an advanced workflow and specifically designed for large incremental models (ex: > 200 million rows) that take a long time to run even during development. It solves for:
- Transforming data with schema evolution in `struct` and nested `array` data types.
- Retaining history of a calculated column and applying a new calculation to new rows going forward.
- Retaining history of a column with complex conditional `CASE WHEN` logic and applying new conditions to new rows going forward.
When you apply the plan to `prod` after the dev workflow, it will NOT backfill historical data. It will only execute model batches **forward only** for new intervals (new rows).
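A sketch of the forward-only flow using the SQLMesh CLI flag:

```shell
# Develop the change without recomputing history
sqlmesh plan dev --forward-only

# Promote; prod keeps existing data and only processes new intervals
sqlmesh plan --forward-only
```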