Commit b983250

Feat: Allow virtual environments to be given dedicated catalogs
1 parent 31f90d0 commit b983250

13 files changed

Lines changed: 384 additions & 52 deletions

docs/guides/configuration.md

Lines changed: 81 additions & 28 deletions
@@ -244,7 +244,9 @@ This only applies to the _physical tables_ that SQLMesh creates - the views are

 SQLMesh stores `prod` environment views in the schema in a model's name - for example, the `prod` views for a model `my_schema.users` will be located in `my_schema`.

-By default, for non-prod environments SQLMesh creates a new schema that appends the environment name to the model name's schema. For example, by default the view for a model `my_schema.users` in a SQLMesh environment named `dev` will be located in the schema `my_schema__dev`.
+By default, for non-prod environments SQLMesh creates a new schema that appends the environment name to the model name's schema. For example, by default the view for a model `my_schema.users` in a SQLMesh environment named `dev` will be located in the schema `my_schema__dev` as `my_schema__dev.users`.
+
+##### Show at the table level instead

 This behavior can be changed to append a suffix at the end of a _table/view_ name instead. Appending the suffix to a table/view name means that non-prod environment views will be created in the same schema as the `prod` environment. The prod and non-prod views are differentiated by non-prod view names ending with `__<env>`.

@@ -260,7 +262,7 @@ Config example:

 === "Python"

-    The Python `environment_suffix_target` argument takes an `EnvironmentSuffixTarget` enumeration with a value of `EnvironmentSuffixTarget.TABLE` or `EnvironmentSuffixTarget.SCHEMA` (default).
+    The Python `environment_suffix_target` argument takes an `EnvironmentSuffixTarget` enumeration with a value of `EnvironmentSuffixTarget.TABLE`, `EnvironmentSuffixTarget.CATALOG` or `EnvironmentSuffixTarget.SCHEMA` (default).

     ```python linenums="1"
     from sqlmesh.core.config import Config, ModelDefaultsConfig, EnvironmentSuffixTarget
@@ -271,20 +273,68 @@ Config example:
     )
     ```

-The default behavior of appending the suffix to schemas is recommended because it leaves production with a single clean interface for accessing the views. However, if you are deploying SQLMesh in an environment with tight restrictions on schema creation then this can be a useful way of reducing the number of schemas SQLMesh uses.
+!!! info "Default behavior"
+    The default behavior of appending the suffix to schemas is recommended because it leaves production with a single clean interface for accessing the views. However, if you are deploying SQLMesh in an environment with tight restrictions on schema creation then this can be a useful way of reducing the number of schemas SQLMesh uses.
+
+##### Show at the catalog level instead
+
+If neither the schema (default) nor the table level is sufficient for your use case, you may indicate the environment at the catalog level instead.
+
+This can be useful if you have downstream BI reporting tools and you would like to point them at a development environment to test something out without renaming all the table / schema references within the report query.
+
+In order to achieve this, you may configure [environment_suffix_target](../reference/configuration.md#environments) like so:
+
+=== "YAML"
+
+    ```yaml linenums="1"
+    environment_suffix_target: catalog
+    ```
+
+=== "Python"
+
+    The Python `environment_suffix_target` argument takes an `EnvironmentSuffixTarget` enumeration with a value of `EnvironmentSuffixTarget.TABLE`, `EnvironmentSuffixTarget.CATALOG` or `EnvironmentSuffixTarget.SCHEMA` (default).
+
+    ```python linenums="1"
+    from sqlmesh.core.config import Config, ModelDefaultsConfig, EnvironmentSuffixTarget
+
+    config = Config(
+        model_defaults=ModelDefaultsConfig(dialect=<dialect>),
+        environment_suffix_target=EnvironmentSuffixTarget.CATALOG,
+    )
+    ```
+
+Given the example of a model called `my_schema.users`, this will cause the following behavior:
+
+- For the `prod` environment, the default catalog as configured in the gateway will be used
+- For any other environment, e.g. `dev`, the environment name will be used as the catalog
+
+Therefore, a model named `my_schema.users` in an environment called `dev` will have its view created as `dev.my_schema.users`.
+
+If you would like more control over the catalog name to use for a given environment, you may optionally configure [environment_catalog_mapping](#environment-view-catalogs) as described below.
+
+!!! warning "Caveats"
+    - Using `environment_suffix_target: catalog` only works on engines that support querying across different catalogs. If your engine does not support cross-catalog queries then you will need to use `environment_suffix_target: schema` or `environment_suffix_target: table` instead.
+    - SQLMesh will not attempt to create catalogs on demand or drop them as part of janitor cleanup. Using `environment_suffix_target: catalog` assumes the catalogs already exist in the target database and are being managed outside of SQLMesh.
+    - When using `environment_catalog_mapping`, you need to ensure that your regex patterns do not conflict. If you map multiple environments to the same catalog, SQLMesh will overwrite existing views. To prevent this, it's recommended to make use of the `@{environment_name}` placeholder to ensure that environment catalog names are always unique.
+

 #### Environment view catalogs

 By default, SQLMesh creates an environment view in the same [catalog](../concepts/glossary.md#catalog) as the physical table the view points to. The physical table's catalog is determined by either the catalog specified in the model name or the default catalog defined in the connection.

-Some companies fully segregate `prod` and non-prod environment objects by catalog. For example, they might have a "prod" catalog that contains all `prod` environment physical tables and views and a separate "dev" catalog that contains all `dev` environment physical tables and views.
+It can be desirable to create `prod` and non-prod virtual layer objects in separate catalogs instead. For example, there might be a "prod" catalog that contains all `prod` environment views and a separate "dev" catalog that contains all `dev` environment views.

 Separate prod and non-prod catalogs can also be useful if you have a CI/CD pipeline that creates environments, like the [SQLMesh GitHub Actions CI/CD Bot](../integrations/github.md). You might want to store the CI/CD environment objects in a dedicated catalog since there can be many of them.

+!!! info "Virtual layer only"
+    Note that the following setting only affects the [virtual layer](../concepts/glossary.md#virtual-layer). If you need full segregation by catalog between environments in the [physical layer](../concepts/glossary.md#physical-layer) as well, see the [Isolated Systems Guide](../guides/isolated_systems.md).
+
 To configure separate catalogs, provide a mapping from [regex patterns](https://en.wikipedia.org/wiki/Regular_expression) to catalog names. SQLMesh will compare the name of an environment to the regex patterns; when it finds a match it will store the environment's objects in the corresponding catalog.

 SQLMesh evaluates the regex patterns in the order defined in the configuration; it uses the catalog for the first matching pattern. If no match is found, the catalog defined in the model or the default catalog defined on the connection will be used.

+In addition, you may specify an `@{environment_name}` placeholder in the mapping. It will be substituted with the target environment name at runtime.
+
 Config example:

 === "YAML"
@@ -294,6 +344,7 @@ Config example:
       '^prod$': prod
       '^dev.*': dev
       '^analytics_repo.*': cicd
+      '.*': 'user_@{environment_name}'
     ```

 === "Python"
@@ -307,6 +358,7 @@ Config example:
             '^prod$': 'prod',
             '^dev.*': 'dev',
             '^analytics_repo.*': 'cicd',
+            '.*': 'user_@{environment_name}'
         },
     )
     ```
@@ -316,6 +368,7 @@ With the example configuration above, SQLMesh would evaluate environment names a
 * If the environment name is `prod`, the catalog will be `prod`.
 * If the environment name starts with `dev`, the catalog will be `dev`.
 * If the environment name starts with `analytics_repo`, the catalog will be `cicd`.
+* If the environment name is anything else, e.g. `homer_simpson`, the catalog will be `user_homer_simpson`.

 *Note:* This feature is only available for engines that support querying across catalogs. At the time of writing, the following engines are **NOT** supported:
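
The first-match evaluation and `@{environment_name}` substitution documented in the hunks above can be sketched in a few lines of plain Python. This is an illustrative sketch only: `resolve_catalog` and `PLACEHOLDER` are hypothetical names introduced here, and the sketch assumes the mapping is an ordinary `dict` (which preserves insertion order) matched with `re.match`, as the `sqlmesh/core/environment.py` change in this commit does.

```python
import re
from typing import Dict, Optional

# Placeholder string taken from the documentation in this commit.
PLACEHOLDER = "@{environment_name}"


def resolve_catalog(env_name: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the catalog for env_name, or None if no pattern matches.

    Hypothetical helper: patterns are tried in the order they were defined,
    and the first match wins.
    """
    for pattern, catalog in mapping.items():
        if re.match(pattern, env_name):
            # Substitute the placeholder with the actual environment name.
            return catalog.replace(PLACEHOLDER, env_name)
    return None


# Same mapping as the documentation example.
mapping = {
    "^prod$": "prod",
    "^dev.*": "dev",
    "^analytics_repo.*": "cicd",
    ".*": "user_@{environment_name}",
}

for env in ("prod", "dev_alice", "analytics_repo_42", "homer_simpson"):
    print(env, "->", resolve_catalog(env, mapping))
```

Because `'.*'` matches every name, it only acts as a fallback when it is listed last; placing it first would shadow the more specific patterns.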
@@ -441,15 +494,15 @@ SELECT 2 AS col
 └── Directly Modified:
 └── sqlmesh_example__dev.test_model

----
-+++
-
-
-kind FULL
-)
-SELECT
-- 1 AS col
-+ 2 AS col
+---
++++
+
+
+kind FULL
+)
+SELECT
+- 1 AS col
++ 2 AS col
 ```

 3. Second (metadata) change in `dev`:
@@ -469,27 +522,27 @@ SELECT 5 AS col
 └── Directly Modified:
 └── sqlmesh_example__dev.test_model

----
-
-+++
-
-@@ -1,8 +1,9 @@
-
-MODEL (
-name sqlmesh_example.test_model,
-+ owner "John Doe",
-kind FULL
-)
-SELECT
-- 1 AS col
-+ 2 AS col
+---
+
++++
+
+@@ -1,8 +1,9 @@
+
+MODEL (
+name sqlmesh_example.test_model,
++ owner "John Doe",
+kind FULL
+)
+SELECT
+- 1 AS col
++ 2 AS col

 Directly Modified: sqlmesh_example__dev.test_model (Breaking)
 Models needing backfill:
 └── sqlmesh_example__dev.test_model: [full refresh]
 ```

-Even though the second change should have been a metadata change (thus not requiring a backfill), it will still be classified as a breaking change because the comparison is against production instead of the previous development state. This is intentional and may cause additional backfills as more changes are accumulated.
+Even though the second change should have been a metadata change (thus not requiring a backfill), it will still be classified as a breaking change because the comparison is against production instead of the previous development state. This is intentional and may cause additional backfills as more changes are accumulated.


 ### Gateways

examples/sushi/config.py

Lines changed: 5 additions & 1 deletion
@@ -128,7 +128,7 @@
 )


-environment_suffix_config = Config(
+environment_suffix_table_config = Config(
     default_connection=DuckDBConnectionConfig(),
     model_defaults=model_defaults,
     environment_suffix_target=EnvironmentSuffixTarget.TABLE,
@@ -161,3 +161,7 @@
         ".*": "dev_catalog",
     },
 )
+
+environment_suffix_catalog_config = environment_catalog_mapping_config.model_copy(
+    update={"environment_suffix_target": EnvironmentSuffixTarget.CATALOG}
+)

sqlmesh/core/config/common.py

Lines changed: 17 additions & 0 deletions
@@ -10,9 +10,22 @@


 class EnvironmentSuffixTarget(str, Enum):
+    # Intended to create virtual environments in their own schemas, with names like "<model_schema_name>__<env name>". The view name is untouched.
+    # For example, a model named 'sqlmesh_example.full_model' created in an environment called 'dev'
+    # would have its virtual layer view created as 'sqlmesh_example__dev.full_model'
     SCHEMA = "schema"
+
+    # Intended to create virtual environments in the same schema as their production counterparts by adjusting the table name.
+    # For example, a model named 'sqlmesh_example.full_model' created in an environment called 'dev'
+    # would have its virtual layer view created as "sqlmesh_example.full_model__dev"
     TABLE = "table"

+    # Intended to create virtual environments in their own catalogs to preserve the schema and view name of the models
+    # For example, a model named 'sqlmesh_example.full_model' created in an environment called 'dev'
+    # would have its virtual layer view created as "dev.sqlmesh_example.full_model"
+    # note: this only works for engines that can query across catalogs
+    CATALOG = "catalog"
+
     @property
     def is_schema(self) -> bool:
         return self == EnvironmentSuffixTarget.SCHEMA
@@ -21,6 +34,10 @@ def is_schema(self) -> bool:
     def is_table(self) -> bool:
         return self == EnvironmentSuffixTarget.TABLE

+    @property
+    def is_catalog(self) -> bool:
+        return self == EnvironmentSuffixTarget.CATALOG
+
     @classproperty
     def default(cls) -> EnvironmentSuffixTarget:
         return EnvironmentSuffixTarget.SCHEMA
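
The three naming schemes described in the `EnvironmentSuffixTarget` comments can be illustrated with a small, self-contained sketch. `SuffixTarget` and `view_name` are hypothetical names introduced here and are not part of SQLMesh; the sketch ignores identifier quoting, name normalization, and any `environment_catalog_mapping` override, and it assumes `prod` views are never suffixed.

```python
from enum import Enum
from typing import Optional


class SuffixTarget(str, Enum):
    """Hypothetical stand-in for sqlmesh's EnvironmentSuffixTarget."""

    SCHEMA = "schema"
    TABLE = "table"
    CATALOG = "catalog"


def view_name(
    schema: str,
    table: str,
    env: str,
    target: SuffixTarget,
    default_catalog: Optional[str] = None,
) -> str:
    """Build an illustrative view name for a model in a given environment."""
    if env == "prod":
        # Assumption: prod views keep the model's own name.
        parts = [default_catalog, schema, table]
    elif target is SuffixTarget.SCHEMA:
        parts = [default_catalog, f"{schema}__{env}", table]
    elif target is SuffixTarget.TABLE:
        parts = [default_catalog, schema, f"{table}__{env}"]
    else:
        # CATALOG: the environment name takes the place of the catalog.
        parts = [env, schema, table]
    # Drop the catalog component when none is set.
    return ".".join(p for p in parts if p)


for target in SuffixTarget:
    print(target.value, "->", view_name("sqlmesh_example", "full_model", "dev", target))
```

Only the catalog target leaves both the schema and the view name untouched, which is why it suits downstream tools that hard-code `schema.table` references.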

sqlmesh/core/config/root.py

Lines changed: 17 additions & 0 deletions
@@ -219,6 +219,23 @@ def _normalize_and_validate_fields(cls, data: t.Any) -> t.Any:
                 f"^{k}$": v for k, v in physical_schema_override.items()
             }

+        if (environment_suffix_target := data.get("environment_suffix_target")) and isinstance(
+            environment_suffix_target, str
+        ):
+            if (
+                environment_suffix_target.lower() == EnvironmentSuffixTarget.CATALOG.lower()
+                and "environment_catalog_mapping" not in data
+            ):
+                # set the default environment_catalog_mapping for when environment_suffix_target=catalog
+                # but no explicit mapping of environments to catalogs is set by the user
+                data["environment_catalog_mapping"] = {
+                    # no override for prod, use the default catalog on the connection
+                    # note that we can't pass None here without failing the Pydantic validator, so we pass a string constant that can be interpreted as None later
+                    f"^{c.PROD}$": c.SQLMESH_NONE,
+                    # every other environment overridden to a catalog with the same name
+                    ".*": "@{environment_name}",
+                }
+
         return data

     @model_validator(mode="after")

sqlmesh/core/constants.py

Lines changed: 1 addition & 0 deletions
@@ -85,6 +85,7 @@
 SQLMESH_MACRO = "__sqlmesh__macro__"
 SQLMESH_BUILTIN = "__sqlmesh__builtin__"
 SQLMESH_METADATA = "__sqlmesh__metadata__"
+SQLMESH_NONE = "__sqlmesh_none__"


 BUILTIN = "builtin"

sqlmesh/core/environment.py

Lines changed: 10 additions & 1 deletion
@@ -43,6 +43,10 @@ class EnvironmentNamingInfo(PydanticModel):
     normalize_name: bool = True
     gateway_managed: bool = False

+    @property
+    def is_dev(self) -> bool:
+        return self.name.lower() != c.PROD
+
     @field_validator("name", mode="before")
     @classmethod
     def _sanitize_name(cls, v: str) -> str:
@@ -89,8 +93,13 @@ def from_environment_catalog_mapping(
         construction_kwargs = dict(name=name, **kwargs)
         for re_pattern, catalog_name in environment_catalog_mapping.items():
             if re.match(re_pattern, name):
+                if catalog_name == c.SQLMESH_NONE:
+                    catalog_name_override = None
+                else:
+                    catalog_name_override = catalog_name.replace("@{environment_name}", name)
+
                 return cls(
-                    catalog_name_override=catalog_name,
+                    catalog_name_override=catalog_name_override,
                     **construction_kwargs,
                 )
         return cls(**construction_kwargs)
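
Taken together, the `root.py` and `environment.py` changes use a string sentinel to mean "no catalog override". A minimal standalone sketch of that interplay is shown below. The function names `with_default_mapping` and `catalog_override` are hypothetical, plain dicts stand in for the Pydantic config, and, unlike the validator in the diff, the sketch returns a copy rather than mutating its input.

```python
import re
from typing import Any, Dict, Optional

PROD = "prod"
# Same literal as c.SQLMESH_NONE in this commit; stands in for None,
# which the diff's comment says cannot pass the Pydantic validator.
SENTINEL_NONE = "__sqlmesh_none__"
PLACEHOLDER = "@{environment_name}"


def with_default_mapping(data: Dict[str, Any]) -> Dict[str, Any]:
    """Add a default mapping when the suffix target is 'catalog' and none is set."""
    target = data.get("environment_suffix_target")
    if (
        isinstance(target, str)
        and target.lower() == "catalog"
        and "environment_catalog_mapping" not in data
    ):
        return {
            **data,
            "environment_catalog_mapping": {
                f"^{PROD}$": SENTINEL_NONE,  # prod: keep the connection's default catalog
                ".*": PLACEHOLDER,  # everything else: catalog named after the environment
            },
        }
    return data


def catalog_override(env_name: str, mapping: Dict[str, str]) -> Optional[str]:
    """Resolve the catalog override; None means 'use the default catalog'."""
    for pattern, catalog in mapping.items():
        if re.match(pattern, env_name):
            if catalog == SENTINEL_NONE:
                return None
            return catalog.replace(PLACEHOLDER, env_name)
    return None


config = with_default_mapping({"environment_suffix_target": "catalog"})
print(catalog_override("prod", config["environment_catalog_mapping"]))
print(catalog_override("dev", config["environment_catalog_mapping"]))
```

An explicit `environment_catalog_mapping` is left untouched, so a user-supplied mapping always takes precedence over the generated default.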
