Commit 26c8485

Apply suggestions from code review
Co-authored-by: Trey Spiller <1831878+treysp@users.noreply.github.com>
1 parent 88f57cd commit 26c8485

2 files changed

Lines changed: 29 additions & 20 deletions

docs/guides/configuration.md

Lines changed: 8 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -970,9 +970,15 @@ model_defaults:
970970
start: 2025-02-05
971971
```
972972

973-
This allows you to tailor the behavior of models for each gateway without affecting the global model_defaults.
973+
This allows you to tailor the behavior of models for each gateway without affecting the global `model_defaults`.
974974

975-
For example, you can adjust dialect-specific behavior, like the normalization to be case insensitive, to better match the engine’s requirements and avoid compatibility issues.
975+
For example, identifiers like table and column names are case-sensitive in some SQL engines but case-insensitive in others. By default, a project that uses both types of engines would need to ensure that the models for each engine align with that engine's normalization behavior, which makes project maintenance and debugging more challenging.
976+
977+
Gateway-specific `model_defaults` allow you to change how SQLMesh performs identifier normalization *by engine* to align the different engines' behavior.
978+
979+
In the example above, the project's default dialect is `snowflake` (line 14). The `redshift` gateway configuration overrides that global default dialect with `"snowflake,normalization_strategy=case_insensitive"` (line 6).
980+
981+
That value tells SQLMesh that the `redshift` gateway's models will be written in the Snowflake SQL dialect (so they need to be transpiled from Snowflake to Redshift), but that the resulting Redshift SQL should treat identifiers as case-insensitive to match Snowflake's behavior.
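
The effect of a normalization strategy can be pictured with a toy model. The sketch below is not SQLMesh's or SQLGlot's implementation; the function name `normalize_identifier` and the strategy labels are illustrative assumptions that only mimic the idea that some strategies respect quoting while a case-insensitive strategy folds every identifier to one case.

```python
def normalize_identifier(name: str, quoted: bool, strategy: str) -> str:
    """Toy identifier normalization (hypothetical helper, for illustration only)."""
    if strategy == "case_insensitive":
        # Every identifier is folded to lowercase, quoted or not.
        return name.lower()
    if strategy == "uppercase":
        # Unquoted identifiers are uppercased; quoted ones keep their case.
        return name if quoted else name.upper()
    if strategy == "lowercase":
        # Unquoted identifiers are lowercased; quoted ones keep their case.
        return name if quoted else name.lower()
    raise ValueError(f"unknown strategy: {strategy}")


# With quoting respected, "Orders" and ORDERS are different identifiers.
print(normalize_identifier("Orders", quoted=True, strategy="uppercase"))
print(normalize_identifier("orders", quoted=False, strategy="uppercase"))

# With the case-insensitive strategy, both resolve to the same name.
print(normalize_identifier("Orders", quoted=True, strategy="case_insensitive"))
print(normalize_identifier("orders", quoted=False, strategy="case_insensitive"))
```

In this toy model, the first pair of calls yields two distinct names while the second pair yields one, which is the kind of alignment a gateway-specific dialect setting is meant to provide.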
976982

977983

978984
#### Model Kinds

docs/guides/multi_engine.md

Lines changed: 21 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -2,9 +2,9 @@
22

33
Organizations typically connect to a data warehouse through a single engine to ensure data consistency. However, there are cases where the processing capabilities of one engine may be better suited to specific tasks than another.
44

5-
Across the industry, companies are increasingly decoupling storage from compute, demanding interoperability across platforms and tools, focusing on cost efficiency and a growing support for open table formats like Apache Iceberg and Hive.
5+
Companies are increasingly decoupling how and where data is stored from how computations are run on that data, which requires interoperability across platforms and tools. Open table formats like Apache Iceberg, Delta Lake, and Hive provide a common storage format that can be used by multiple SQL engines.
66

7-
In SQLMesh, you can use multiple engine adapters within a single project, giving you the flexibility to choose the most suitable engine for each task. This allows individual models to run on a specified engine based on their specific requirements.
7+
SQLMesh enables this decoupling by supporting multiple engine adapters within a single project, giving you the flexibility to choose the best engine for each computational task. You can specify the engine each model uses, based on what computations the model performs or other organization-specific considerations.
88

99
## Configuring a Project with Multiple Engines
1010

@@ -15,9 +15,9 @@ Configuring your project to use multiple engines follows a simple process:
1515

1616
If no gateway is explicitly defined for a model, the [default_gateway](../reference/configuration.md#default-gateway) of the project is used.
1717

18-
By default, the `default_gateway` is also responsible to create the views of the virtual layer. This assumes that all engines can read from and write to the same shared catalog.
18+
By default, virtual layer views are created in the `default_gateway`. This approach requires that all engines can read from and write to the same shared catalog, so a view in the `default_gateway` can access a table in another gateway.
1919

20-
Alternatively, you can configure the model-specific gateway to create the views of the virtual layer by setting [gateway_managed_virtual_layer](#gateway-managed-virtual-layer) flag in your configuration to true.
20+
Alternatively, each gateway can create the virtual layer views for the models it runs. Use this approach by setting the [gateway_managed_virtual_layer](#gateway-managed-virtual-layer) flag to `true` in your project configuration.
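
The two rules described here (which gateway runs a model, and which gateway hosts its virtual layer view) can be summarized in a minimal sketch. The helper names `execution_gateway` and `view_gateway` are hypothetical and are not part of the SQLMesh API; they only restate the documented behavior.

```python
from typing import Optional


def execution_gateway(model_gateway: Optional[str], default_gateway: str) -> str:
    """A model runs on its own gateway if one is set, else on the default gateway."""
    return model_gateway or default_gateway


def view_gateway(
    model_gateway: Optional[str],
    default_gateway: str,
    gateway_managed_virtual_layer: bool = False,
) -> str:
    """Gateway that creates the model's virtual layer view (illustrative only)."""
    if gateway_managed_virtual_layer:
        # Each gateway creates the views for the models it runs.
        return execution_gateway(model_gateway, default_gateway)
    # Shared virtual layer: all views are created in the default gateway.
    return default_gateway


print(execution_gateway(None, "postgres"))
print(view_gateway("duckdb", "postgres"))
print(view_gateway("duckdb", "postgres", gateway_managed_virtual_layer=True))
```

The gateway names `postgres` and `duckdb` are placeholders chosen for the illustration.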
2121

2222
### Shared Virtual Layer
2323

@@ -31,7 +31,7 @@ In a multi-engine project with a shared data catalog, the model-specific gateway
3131

3232
Below is a simple example of setting up a project with connections to both DuckDB and PostgreSQL.
3333

34-
In this setup, the PostgreSQL engine is set as the default, so it will be used to manage views in the virtual layer. Meanwhile, the DuckDB's [attach](https://duckdb.org/docs/sql/statements/attach.html) feature enables read-write access to the PostgreSQL catalog's physical tables.
34+
In this setup, the PostgreSQL engine is set as the default, so it will be used to manage views in the virtual layer. Meanwhile, DuckDB's [attach](https://duckdb.org/docs/sql/statements/attach.html) feature enables read-write access to the PostgreSQL catalog's physical tables.
3535

3636
=== "YAML"
3737

@@ -99,7 +99,7 @@ In this setup, the PostgreSQL engine is set as the default, so it will be used t
9999

100100
Given this configuration, when a model’s gateway is set to duckdb, it will be materialized within the PostgreSQL `main_db` catalog, but it will be evaluated using DuckDB’s engine.
101101

102-
Given this configuration, when a model’s gateway is set to duckdb, it will be materialized within the PostgreSQL `main_db` catalog, but it will be evaluated using DuckDB’s engine.
102+
Given this configuration, when a model’s gateway is set to `duckdb`, the DuckDB engine will perform the calculations before materializing the physical table in the PostgreSQL `main_db` catalog.
103103

104104
```sql linenums="1"
105105
MODEL (
@@ -115,23 +115,27 @@ FROM
115115
iceberg_scan('data/bucket/lineitem_iceberg', allow_moved_paths = true);
116116
```
117117

118-
In the `order_ship_date` model, the DuckDB engine is set, which will be used to create the physical table in the PostgreSQL database.
118+
The `order_ship_date` model specifies the DuckDB engine, which will perform the computations used to create the physical table in the PostgreSQL database.
119119

120120
This allows you to efficiently scan data from an Iceberg table, or even query tables directly from S3 when used with the [HTTPFS](https://duckdb.org/docs/stable/extensions/httpfs/overview.html) extension.
121121

122122
![PostgreSQL + DuckDB](./multi_engine/postgres_duckdb.png)
123123

124-
In models where no gateway is specified, such as the `customer_orders` model, the default PostgreSQL engine will be used to create the physical table as well as to create and manage the views of the virtual layer.
124+
In models where no gateway is specified, such as the `customer_orders` model, the default PostgreSQL engine will create both the physical table and the views in the virtual layer.
125125

126126
### Gateway-Managed Virtual Layer
127127

128-
For projects where the engines don’t share a catalog or your raw data is located in different warehouses, you may prefer each gateway to manage its own virtual layer. This ensures isolation and each model’s views being created by its respective gateway.
128+
By default, all virtual layer views are created in the project's default gateway.
129+
130+
If your project's engines don’t have a mutually accessible catalog or your raw data is located in different engines, you may prefer that each model's virtual layer view exist in the gateway that ran the model. This allows a single SQLMesh project to manage isolated sets of models in different gateways, which is sometimes necessary for data governance or security reasons.
129131

130132
To enable this, set `gateway_managed_virtual_layer` to `true` in your configuration. By default, this flag is set to false.
131133

132134
#### Example: Redshift + Athena + Snowflake
133135

134-
Consider a scenario where you need to create a project with models in Redshift, Athena and Snowflake. To set this you, add the connections to your configuration and set the `gateway_managed_virtual_layer` flag:
136+
Consider a scenario where you need to create a project with models in Redshift, Athena and Snowflake, where each engine hosts its models' virtual layer views.
137+
138+
First, add the connections to your configuration and set the `gateway_managed_virtual_layer` flag to `true`:
135139

136140
=== "YAML"
137141

@@ -230,10 +234,12 @@ config = Config(
230234
)
231235
```
232236

233-
Note that gateway-specific variables take precedence over global ones. In the example above, the `gw_var` used in a model will take the value defined for the respective gateway.
237+
Note that gateway-specific variables take precedence over global ones. In the example above, the `gw_var` used in a model will resolve to the value specified in the model's gateway.
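
As a rough illustration of this precedence rule, the sketch below merges two plain dictionaries so that gateway-level values override global ones. The function `resolve_variables` and the variable values are hypothetical; they are not taken from the configuration above or from the SQLMesh API.

```python
def resolve_variables(global_vars: dict, gateway_vars: dict) -> dict:
    """Merge variables so that gateway-specific values win over global ones."""
    # Later entries in a dict merge override earlier ones with the same key.
    return {**global_vars, **gateway_vars}


# Hypothetical values, chosen only for this illustration.
global_vars = {"gw_var": "global", "other_var": 1}
gateway_vars = {"gw_var": "gateway_specific"}

resolved = resolve_variables(global_vars, gateway_vars)
print(resolved["gw_var"])
print(resolved["other_var"])
```

Variables defined only at the global level remain available to every gateway's models, while a name defined at both levels resolves to the gateway's value.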
234238

235239
For further customization, you can also enable [gateway-specific model defaults](../guides/configuration.md#gateway-specific-model-defaults). This allows you to define custom behaviors, such as specifying a dialect with case-insensitivity normalization.
236240

241+
In the example configuration above, the default gateway is `redshift`, so all models without a `gateway` specification will run on Redshift, as in this `order_dates` model:
242+
237243
```sql linenums="1"
238244
MODEL (
239245
name redshift_schema.order_dates,
@@ -247,7 +253,7 @@ FROM
247253
bucket.raw_data;
248254
```
249255

250-
In this setup, since the default gateway is set to redshift, omitting the gateway from a model will default to this, as seen in the `order_dates` model above.
256+
For the `athena_schema.order_status` model, we explicitly specify the `athena` gateway:
251257

252258
```sql linenums="1"
253259
MODEL (
@@ -263,7 +269,7 @@ FROM
263269
bucket.raw_data;
264270
```
265271

266-
While in the case of the `athena_schema.order_status` model above, the gateway is specified to athena explicitly.
272+
Finally, specifying the `snowflake` gateway for the `customer_orders` model ensures it is isolated from the rest and reads from a table within the Snowflake database:
267273

268274
```sql linenums="1"
269275
MODEL (
@@ -279,11 +285,10 @@ FROM
279285
bronze_schema.customer_data;
280286
```
281287

282-
Finally, specifying the snowflake gateway for the `customer_orders` model ensures it is isolated from the rest and sources from a table within the snowflake database.
283288

284289
![Athena + Redshift + Snowflake](./multi_engine/athena_redshift_snowflake.png)
285290

286-
When you run the plan, the catalogs for each model will be set automatically based on the gateway’s connection and each corresponding model will be evaluated against the specified engine.
291+
When you run the plan, the catalogs for each model will be set automatically based on the gateway’s connection and each corresponding model will be executed by the specified engine:
287292

288293
```bash
289294
❯ sqlmesh plan
@@ -292,7 +297,7 @@ When you run the plan, the catalogs for each model will be set automatically bas
292297

293298
Models:
294299
└── Added:
295-
├── awsdatacatalog.athena_schema.order_status
300+
├── awsdatacatalog.athena_schema.order_status # each model uses its gateway's catalog and schema
296301
├── redshift_schema.order_dates
297302
└── silver.snowflake_schema.customers
298303
Models needing backfill:
@@ -305,5 +310,3 @@ Apply - Backfill Tables [y/n]: y
305310
The views of the virtual layer will also be created by each corresponding engine.
306311

307312
This approach provides isolation between your models, while maintaining centralized control over your project.
308-
309-
This allows users to leverage multiple engines within a single SQLMesh project, particularly as the industry shifts toward data lakes, open table formats, and greater interoperability.
