Commit 5cd9ced

Chore: clarify user responsibility for key uniqueness in inc by unique key models (#3930)
1 parent f564262 commit 5cd9ced

1 file changed

Lines changed: 34 additions & 24 deletions

docs/concepts/models/model_kinds.md

@@ -301,7 +301,7 @@ In addition to specifying a time column in the `MODEL` DDL, the model's query mu
 ```

 SQLMesh will create a suffixed `__dev` schema based on the name of the plan environment.
-
+
 ```sql
 CREATE SCHEMA IF NOT EXISTS `sqlmesh-public-demo`.`demo__dev`
 ```
@@ -428,9 +428,19 @@ Depending on the target engine, models of the `INCREMENTAL_BY_TIME_RANGE` kind a

 ## INCREMENTAL_BY_UNIQUE_KEY

-Models of the `INCREMENTAL_BY_UNIQUE_KEY` kind are computed incrementally based on a key that is unique for each data row.
+Models of the `INCREMENTAL_BY_UNIQUE_KEY` kind are computed incrementally based on a key.
+
+They insert or update rows based on these rules:
+
+- If a key in newly loaded data is not present in the model table, the new data row is inserted.
+- If a key in newly loaded data is already present in the model table, the existing row is updated with the new data.
+- If a key is present in the model table but not present in the newly loaded data, its row is not modified and remains in the model table.

-If a key in newly loaded data is not present in the model table, the new data row is inserted. If a key in newly loaded data is already present in the model table, the existing row is updated with the new data. If a key is present in the model table but not present in the newly loaded data, its row is not modified and remains in the model table.
+!!! important "Prevent duplicated keys"
+
+    If you do not want duplicated keys in the model table, you must ensure the model query does not return rows with duplicate keys.
+
+    SQLMesh does not automatically detect or prevent duplicates.
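
One way to meet this requirement is to deduplicate inside the model query itself. The sketch below is illustrative and is not part of this commit. The model name `demo.deduplicated_example`, the upstream model `demo.seed_model`, and the rule that the latest `event_date` wins for each `id` are all assumptions. The column names are borrowed from the examples elsewhere in this file.

```sql
MODEL (
  name demo.deduplicated_example,  -- hypothetical model name
  kind INCREMENTAL_BY_UNIQUE_KEY (
    unique_key id
  )
);

-- Rank the rows within each key and keep only the first one,
-- so the query never returns two rows with the same `id`.
SELECT
  id,
  item_id,
  event_date
FROM (
  SELECT
    id,
    item_id,
    event_date,
    ROW_NUMBER() OVER (
      PARTITION BY id                         -- one group per key value
      ORDER BY event_date DESC, item_id DESC  -- assumption: latest row wins, item_id breaks ties
    ) AS row_num
  FROM demo.seed_model                        -- assumed upstream model
  WHERE event_date BETWEEN @start_date AND @end_date
) AS ranked
WHERE row_num = 1;
```

Deduplicating in the query only guarantees one row per key within each load. Keys that already exist in the model table are handled by the update rule listed above.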

 This kind is a good fit for datasets that have the following traits:

@@ -509,18 +519,18 @@ WHERE
 SQLMesh will validate the model's query before processing data (note the `FALSE LIMIT 0` in the `WHERE` statement and the placeholder dates).

 ```sql
-SELECT `seed_model`.`id` AS `id`, `seed_model`.`item_id` AS `item_id`, `seed_model`.`event_date` AS `event_date`
-FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__seed_model__2834544882` AS `seed_model`
+SELECT `seed_model`.`id` AS `id`, `seed_model`.`item_id` AS `item_id`, `seed_model`.`event_date` AS `event_date`
+FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__seed_model__2834544882` AS `seed_model`
 WHERE (`seed_model`.`event_date` <= CAST('1970-01-01' AS DATE) AND `seed_model`.`event_date` >= CAST('1970-01-01' AS DATE)) AND FALSE LIMIT 0
 ```

 SQLMesh will create a versioned table in the physical layer.

 ```sql
-CREATE OR REPLACE TABLE `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__incremental_by_unique_key_example__1161945221` AS
-SELECT CAST(`id` AS INT64) AS `id`, CAST(`item_id` AS INT64) AS `item_id`, CAST(`event_date` AS DATE) AS `event_date`
-FROM (SELECT `seed_model`.`id` AS `id`, `seed_model`.`item_id` AS `item_id`, `seed_model`.`event_date` AS `event_date`
-FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__seed_model__2834544882` AS `seed_model`
+CREATE OR REPLACE TABLE `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__incremental_by_unique_key_example__1161945221` AS
+SELECT CAST(`id` AS INT64) AS `id`, CAST(`item_id` AS INT64) AS `item_id`, CAST(`event_date` AS DATE) AS `event_date`
+FROM (SELECT `seed_model`.`id` AS `id`, `seed_model`.`item_id` AS `item_id`, `seed_model`.`event_date` AS `event_date`
+FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__seed_model__2834544882` AS `seed_model`
 WHERE `seed_model`.`event_date` <= CAST('2024-10-30' AS DATE) AND `seed_model`.`event_date` >= CAST('2020-01-01' AS DATE)) AS `_subquery`
 ```


@@ -533,7 +543,7 @@ WHERE
 SQLMesh will create a view in the virtual layer pointing to the versioned table in the physical layer.

 ```sql
-CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`incremental_by_unique_key_example` AS
+CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`incremental_by_unique_key_example` AS
 SELECT * FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__incremental_by_unique_key_example__1161945221`
 ```

@@ -691,19 +701,19 @@ GROUP BY title;
 SQLMesh will validate the model's query before processing data (note the `WHERE FALSE` and `LIMIT 0`).

 ```sql
-SELECT `incremental_model`.`item_id` AS `item_id`, COUNT(DISTINCT `incremental_model`.`id`) AS `num_orders`
-FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__incremental_model__89556012` AS `incremental_model`
-WHERE FALSE
+SELECT `incremental_model`.`item_id` AS `item_id`, COUNT(DISTINCT `incremental_model`.`id`) AS `num_orders`
+FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__incremental_model__89556012` AS `incremental_model`
+WHERE FALSE
 GROUP BY `incremental_model`.`item_id` LIMIT 0
 ```

 SQLMesh will create a versioned table in the physical layer.

 ```sql
-CREATE OR REPLACE TABLE `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__full_model_example__2345651858` AS
-SELECT CAST(`item_id` AS INT64) AS `item_id`, CAST(`num_orders` AS INT64) AS `num_orders`
-FROM (SELECT `incremental_model`.`item_id` AS `item_id`, COUNT(DISTINCT `incremental_model`.`id`) AS `num_orders`
-FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__incremental_model__89556012` AS `incremental_model`
+CREATE OR REPLACE TABLE `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__full_model_example__2345651858` AS
+SELECT CAST(`item_id` AS INT64) AS `item_id`, CAST(`num_orders` AS INT64) AS `num_orders`
+FROM (SELECT `incremental_model`.`item_id` AS `item_id`, COUNT(DISTINCT `incremental_model`.`id`) AS `num_orders`
+FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__incremental_model__89556012` AS `incremental_model`
 GROUP BY `incremental_model`.`item_id`) AS `_subquery`
 ```

@@ -716,7 +726,7 @@ GROUP BY title;
 SQLMesh will create a view in the virtual layer pointing to the versioned table in the physical layer.

 ```sql
-CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`full_model_example` AS
+CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`full_model_example` AS
 SELECT * FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__full_model_example__2345651858`
 ```

@@ -788,7 +798,7 @@ FROM db.employees;
 SQLMesh will create a view in the virtual layer pointing to the versioned view in the physical layer.

 ```sql
-CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`example_view` AS
+CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`example_view` AS
 SELECT * FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__example_view__1024042926`
 ```

@@ -834,7 +844,7 @@ FROM db.employees;
 ## SEED
 The `SEED` model kind is used to specify [seed models](./seed_models.md) for using static CSV datasets in your SQLMesh project.

-**Notes:**
+**Notes:**

 - Seed models are loaded only once unless the SQL model and/or seed file is updated.
 - Python models do not support the `SEED` model kind - use a SQL model instead.
@@ -873,9 +883,9 @@ The `SEED` model kind is used to specify [seed models](./seed_models.md) for usi
 SQLMesh will create a versioned table in the physical layer from the temp table.

 ```sql
-CREATE OR REPLACE TABLE `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__seed_example__3038173937` AS
-SELECT CAST(`id` AS INT64) AS `id`, CAST(`item_id` AS INT64) AS `item_id`, CAST(`event_date` AS DATE) AS `event_date`
-FROM (SELECT `id`, `item_id`, `event_date`
+CREATE OR REPLACE TABLE `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__seed_example__3038173937` AS
+SELECT CAST(`id` AS INT64) AS `id`, CAST(`item_id` AS INT64) AS `item_id`, CAST(`event_date` AS DATE) AS `event_date`
+FROM (SELECT `id`, `item_id`, `event_date`
 FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`__temp_demo__seed_example__3038173937_9kzbpld7`) AS `_subquery`
 ```

@@ -894,7 +904,7 @@ The `SEED` model kind is used to specify [seed models](./seed_models.md) for usi
 SQLMesh will create a view in the virtual layer pointing to the versioned table in the physical layer.

 ```sql
-CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`seed_example` AS
+CREATE OR REPLACE VIEW `sqlmesh-public-demo`.`demo__dev`.`seed_example` AS
 SELECT * FROM `sqlmesh-public-demo`.`sqlmesh__demo`.`demo__seed_example__3038173937`
 ```
