
Commit 7dd81a4

Feat: Allow virtual environments to be given dedicated catalogs
1 parent a9478a5 commit 7dd81a4

22 files changed

Lines changed: 550 additions & 65 deletions


docs/guides/configuration.md

Lines changed: 81 additions & 28 deletions
@@ -244,7 +244,9 @@ This only applies to the _physical tables_ that SQLMesh creates - the views are

 SQLMesh stores `prod` environment views in the schema in a model's name - for example, the `prod` views for a model `my_schema.users` will be located in `my_schema`.

-By default, for non-prod environments SQLMesh creates a new schema that appends the environment name to the model name's schema. For example, by default the view for a model `my_schema.users` in a SQLMesh environment named `dev` will be located in the schema `my_schema__dev`.
+By default, for non-prod environments SQLMesh creates a new schema that appends the environment name to the model name's schema. For example, by default the view for a model `my_schema.users` in a SQLMesh environment named `dev` will be located in the schema `my_schema__dev` as `my_schema__dev.users`.
+
+##### Show at the table level instead

 This behavior can be changed to append a suffix at the end of a _table/view_ name instead. Appending the suffix to a table/view name means that non-prod environment views will be created in the same schema as the `prod` environment. The prod and non-prod views are differentiated by non-prod view names ending with `__<env>`.

@@ -260,7 +262,7 @@ Config example:

 === "Python"

-The Python `environment_suffix_target` argument takes an `EnvironmentSuffixTarget` enumeration with a value of `EnvironmentSuffixTarget.TABLE` or `EnvironmentSuffixTarget.SCHEMA` (default).
+The Python `environment_suffix_target` argument takes an `EnvironmentSuffixTarget` enumeration with a value of `EnvironmentSuffixTarget.TABLE`, `EnvironmentSuffixTarget.CATALOG` or `EnvironmentSuffixTarget.SCHEMA` (default).

 ```python linenums="1"
 from sqlmesh.core.config import Config, ModelDefaultsConfig, EnvironmentSuffixTarget
@@ -271,20 +273,68 @@ Config example:
 )
 ```

-The default behavior of appending the suffix to schemas is recommended because it leaves production with a single clean interface for accessing the views. However, if you are deploying SQLMesh in an environment with tight restrictions on schema creation then this can be a useful way of reducing the number of schemas SQLMesh uses.
+!!! info "Default behavior"
+The default behavior of appending the suffix to schemas is recommended because it leaves production with a single clean interface for accessing the views. However, if you are deploying SQLMesh in an environment with tight restrictions on schema creation, appending the suffix to table names instead can be a useful way of reducing the number of schemas SQLMesh uses.
+
+##### Show at the catalog level instead
+
+If neither the schema level (default) nor the table level is sufficient for your use case, you may indicate the environment at the catalog level instead.
+
+This can be useful if you have downstream BI reporting tools and would like to point them at a development environment to test something out without renaming all the table/schema references within the report query.
+
+To achieve this, configure [environment_suffix_target](../reference/configuration.md#environments) as follows:
+
+=== "YAML"
+
+```yaml linenums="1"
+environment_suffix_target: catalog
+```
+
+=== "Python"
+
+The Python `environment_suffix_target` argument takes an `EnvironmentSuffixTarget` enumeration with a value of `EnvironmentSuffixTarget.TABLE`, `EnvironmentSuffixTarget.CATALOG` or `EnvironmentSuffixTarget.SCHEMA` (default).
+
+```python linenums="1"
+from sqlmesh.core.config import Config, ModelDefaultsConfig, EnvironmentSuffixTarget
+
+config = Config(
+model_defaults=ModelDefaultsConfig(dialect=<dialect>),
+environment_suffix_target=EnvironmentSuffixTarget.CATALOG,
+)
+```
+
+For a model named `my_schema.users` with a default catalog of `warehouse`, this results in the following behavior:
+
+- For the `prod` environment, the default catalog configured in the gateway is used, so the view is created at `warehouse.my_schema.users`.
+- For any other environment, e.g. `dev`, the environment name is appended to the default catalog, so the view is created at `warehouse__dev.my_schema.users`.
+
+In other words, the `dev` view keeps the same schema and view name as in `prod` (`my_schema.users`); only the catalog changes, from `warehouse` to `warehouse__dev`.
+
+If you would like more control over the catalog name to use for a given environment, you may optionally configure [environment_catalog_mapping](#environment-view-catalogs) as described below.
+
+!!! warning "Caveats"
+- Using `environment_suffix_target: catalog` only works on engines that support querying across different catalogs. If your engine does not support cross-catalog queries, you will need to use `environment_suffix_target: schema` or `environment_suffix_target: table` instead.
+- Automatic catalog creation is not supported on all engines, even if they support cross-catalog queries. For engines where it is not supported, the catalogs must exist before SQLMesh is invoked.
+- When using `environment_catalog_mapping`, ensure that your regex patterns do not conflict. If you map multiple environments to the same catalog, SQLMesh will overwrite existing views. To prevent this, it's recommended to use the `@{environment_name}` placeholder so that environment catalog names are always unique.
+

 #### Environment view catalogs

 By default, SQLMesh creates an environment view in the same [catalog](../concepts/glossary.md#catalog) as the physical table the view points to. The physical table's catalog is determined by either the catalog specified in the model name or the default catalog defined in the connection.

-Some companies fully segregate `prod` and non-prod environment objects by catalog. For example, they might have a "prod" catalog that contains all `prod` environment physical tables and views and a separate "dev" catalog that contains all `dev` environment physical tables and views.
+It can be desirable to create `prod` and non-prod virtual layer objects in separate catalogs instead. For example, there might be a "prod" catalog that contains all `prod` environment views and a separate "dev" catalog that contains all `dev` environment views.

 Separate prod and non-prod catalogs can also be useful if you have a CI/CD pipeline that creates environments, like the [SQLMesh Github Actions CI/CD Bot](../integrations/github.md). You might want to store the CI/CD environment objects in a dedicated catalog since there can be many of them.

+!!! info "Virtual layer only"
+Note that the following setting only affects the [virtual layer](../concepts/glossary.md#virtual-layer). If you need full segregation by catalog between environments in the [physical layer](../concepts/glossary.md#physical-layer) as well, see the [Isolated Systems Guide](../guides/isolated_systems.md).
+
 To configure separate catalogs, provide a mapping from [regex patterns](https://en.wikipedia.org/wiki/Regular_expression) to catalog names. SQLMesh will compare the name of an environment to the regex patterns; when it finds a match it will store the environment's objects in the corresponding catalog.

 SQLMesh evaluates the regex patterns in the order defined in the configuration; it uses the catalog for the first matching pattern. If no match is found, the catalog defined in the model or the default catalog defined on the connection will be used.

+In addition, you may use regex capture groups to capture parts of the environment name to use in the output catalog name. These follow the syntax of [Python's `re` module](https://docs.python.org/3/library/re.html#regular-expression-syntax).
+
 Config example:

 === "YAML"
@@ -294,6 +344,7 @@ Config example:
 '^prod$': prod
 '^dev.*': dev
 '^analytics_repo.*': cicd
+'(.*)': 'user_\1'
 ```

 === "Python"
@@ -307,6 +358,7 @@ Config example:
 '^prod$': 'prod',
 '^dev.*': 'dev',
 '^analytics_repo.*': 'cicd',
+'(.*)': r'user_\1'
 },
 )
 ```
@@ -316,6 +368,7 @@ With the example configuration above, SQLMesh would evaluate environment names a
 * If the environment name is `prod`, the catalog will be `prod`.
 * If the environment name starts with `dev`, the catalog will be `dev`.
 * If the environment name starts with `analytics_repo`, the catalog will be `cicd`.
+* If the environment name is anything else, e.g. `homer_simpson`, the catalog will be `user_homer_simpson`.

 *Note:* This feature is only available for engines that support querying across catalogs. At the time of writing, the following engines are **NOT** supported:

@@ -441,15 +494,15 @@ SELECT 2 AS col
 └── Directly Modified:
 └── sqlmesh_example__dev.test_model

----
-+++
-
-
-kind FULL
-)
-SELECT
-- 1 AS col
-+ 2 AS col
+---
++++
+
+
+kind FULL
+)
+SELECT
+- 1 AS col
++ 2 AS col
 ```

 3. Second (metadata) change in `dev`:
@@ -469,27 +522,27 @@ SELECT 5 AS col
 └── Directly Modified:
 └── sqlmesh_example__dev.test_model

----
-
-+++
-
-@@ -1,8 +1,9 @@
-
-MODEL (
-name sqlmesh_example.test_model,
-+ owner "John Doe",
-kind FULL
-)
-SELECT
-- 1 AS col
-+ 2 AS col
+---
+
++++
+
+@@ -1,8 +1,9 @@
+
+MODEL (
+name sqlmesh_example.test_model,
++ owner "John Doe",
+kind FULL
+)
+SELECT
+- 1 AS col
++ 2 AS col

 Directly Modified: sqlmesh_example__dev.test_model (Breaking)
 Models needing backfill:
 └── sqlmesh_example__dev.test_model: [full refresh]
 ```

-Even though the second change should have been a metadata change (thus not requiring a backfill), it will still be classified as a breaking change because the comparison is against production instead of the previous development state. This is intentional and may cause additional backfills as more changes are accumulated.
+Even though the second change should have been a metadata change (thus not requiring a backfill), it will still be classified as a breaking change because the comparison is against production instead of the previous development state. This is intentional and may cause additional backfills as more changes are accumulated.


 ### Gateways
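The first-match evaluation order and capture-group substitution described for `environment_catalog_mapping` can be illustrated with a small, self-contained sketch. It is not SQLMesh's implementation: `resolve_environment_catalog` is a hypothetical helper, and it assumes patterns are applied with `re.match` and that references such as `\1` in the target are expanded with `Match.expand`.

```python
import re
import typing as t

# The example mapping from the documentation above. Dicts preserve insertion
# order, so patterns are tried in the order they are written.
CATALOG_MAPPING: t.Dict[str, str] = {
    r"^prod$": "prod",
    r"^dev.*": "dev",
    r"^analytics_repo.*": "cicd",
    r"(.*)": r"user_\1",
}


def resolve_environment_catalog(
    environment: str,
    mapping: t.Dict[str, str],
    fallback: t.Optional[str] = None,
) -> t.Optional[str]:
    """Hypothetical helper: return the catalog for the first matching pattern.

    Assumes `re.match` semantics and `Match.expand` for capture-group references.
    """
    for pattern, target in mapping.items():
        match = re.match(pattern, environment)
        if match:
            # Replace \1, \2, ... in the target with the text captured from the name.
            return match.expand(target)
    # No pattern matched: use the model's catalog or the connection's default
    # catalog, represented here by `fallback`.
    return fallback


for env in ["prod", "dev_alice", "analytics_repo_pr_42", "homer_simpson"]:
    print(f"{env} -> {resolve_environment_catalog(env, CATALOG_MAPPING)}")
```

Because `'^prod$'` is listed first, `prod` resolves to `prod` rather than `user_prod`; the catch-all `'(.*)'` only applies to names that no earlier pattern matched.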

examples/sushi/config.py

Lines changed: 5 additions & 1 deletion
@@ -128,7 +128,7 @@
 )


-environment_suffix_config = Config(
+environment_suffix_table_config = Config(
     default_connection=DuckDBConnectionConfig(),
     model_defaults=model_defaults,
     environment_suffix_target=EnvironmentSuffixTarget.TABLE,
@@ -161,3 +161,7 @@
         ".*": "dev_catalog",
     },
 )
+
+environment_suffix_catalog_config = environment_catalog_mapping_config.model_copy(
+    update={"environment_suffix_target": EnvironmentSuffixTarget.CATALOG}
+)

sqlmesh/core/config/common.py

Lines changed: 17 additions & 0 deletions
@@ -10,9 +10,22 @@


 class EnvironmentSuffixTarget(str, Enum):
+    # Intended to create virtual environments in their own schemas, with names like "<model_schema_name>__<env name>". The view name is untouched.
+    # For example, a model named 'sqlmesh_example.full_model' created in an environment called 'dev'
+    # would have its virtual layer view created as 'sqlmesh_example__dev.full_model'
     SCHEMA = "schema"
+
+    # Intended to create virtual environments in the same schema as their production counterparts by adjusting the table name.
+    # For example, a model named 'sqlmesh_example.full_model' created in an environment called 'dev'
+    # would have its virtual layer view created as "sqlmesh_example.full_model__dev"
     TABLE = "table"

+    # Intended to create virtual environments in their own catalogs to preserve the schema and view name of the models
+    # For example, a model named 'sqlmesh_example.full_model' created in an environment called 'dev'
+    # with a default catalog of 'warehouse' would have its virtual layer view created as "warehouse__dev.sqlmesh_example.full_model"
+    # note: this only works for engines that can query across catalogs
+    CATALOG = "catalog"
+
     @property
     def is_schema(self) -> bool:
         return self == EnvironmentSuffixTarget.SCHEMA
@@ -21,6 +34,10 @@ def is_schema(self) -> bool:
     def is_table(self) -> bool:
         return self == EnvironmentSuffixTarget.TABLE

+    @property
+    def is_catalog(self) -> bool:
+        return self == EnvironmentSuffixTarget.CATALOG
+
     @classproperty
     def default(cls) -> EnvironmentSuffixTarget:
         return EnvironmentSuffixTarget.SCHEMA
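The three `EnvironmentSuffixTarget` values differ only in which part of the fully qualified view name receives the `__<env>` suffix. The sketch below reproduces the examples from the enum comments and the documentation in this commit. `SuffixTarget` is a stand-in enum and `virtual_view_name` a hypothetical helper, not SQLMesh APIs; the sketch assumes a two-part model name that lives in the default catalog and ignores `environment_catalog_mapping`.

```python
from enum import Enum


class SuffixTarget(str, Enum):
    """Stand-in mirroring the values of EnvironmentSuffixTarget in this commit."""

    SCHEMA = "schema"
    TABLE = "table"
    CATALOG = "catalog"


def virtual_view_name(
    model_name: str, environment: str, target: SuffixTarget, default_catalog: str
) -> str:
    """Hypothetical helper: fully qualified virtual-layer view name for a model.

    Assumes `model_name` is "<schema>.<table>" and lives in `default_catalog`.
    """
    schema, table = model_name.split(".")
    catalog = default_catalog
    if environment != "prod":  # prod views are never suffixed
        suffix = f"__{environment}"
        if target is SuffixTarget.SCHEMA:
            schema += suffix  # e.g. warehouse.my_schema__dev.users
        elif target is SuffixTarget.TABLE:
            table += suffix  # e.g. warehouse.my_schema.users__dev
        else:
            catalog += suffix  # e.g. warehouse__dev.my_schema.users
    return f"{catalog}.{schema}.{table}"


for target in SuffixTarget:
    print(target.value, virtual_view_name("my_schema.users", "dev", target, "warehouse"))
```

Only the catalog target leaves both the schema and the view name untouched, which is why it suits BI reports whose queries reference `my_schema.users` directly.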

sqlmesh/core/config/root.py

Lines changed: 5 additions & 1 deletion
@@ -75,10 +75,14 @@ def validate_regex_key_dict(value: t.Dict[str | re.Pattern, t.Any]) -> t.Dict[re
     NoPastTTLString = str
     GatewayDict = t.Dict[str, GatewayConfig]
     RegexKeyDict = t.Dict[re.Pattern, str]
+    RegexKeyDictOptional = t.Dict[re.Pattern, t.Optional[str]]
 else:
     NoPastTTLString = t.Annotated[str, BeforeValidator(validate_no_past_ttl)]
     GatewayDict = t.Annotated[t.Dict[str, GatewayConfig], BeforeValidator(gateways_ensure_dict)]
     RegexKeyDict = t.Annotated[t.Dict[re.Pattern, str], BeforeValidator(validate_regex_key_dict)]
+    RegexKeyDictOptional = t.Annotated[
+        t.Dict[re.Pattern, t.Optional[str]], BeforeValidator(validate_regex_key_dict)
+    ]


 class Config(BaseConfig):
@@ -148,7 +152,7 @@ class Config(BaseConfig):
     )
     gateway_managed_virtual_layer: bool = False
     infer_python_dependencies: bool = True
-    environment_catalog_mapping: RegexKeyDict = {}
+    environment_catalog_mapping: RegexKeyDictOptional = {}
     default_target_environment: str = c.PROD
     log_limit: int = c.DEFAULT_LOG_LIMIT
     cicd_bot: t.Optional[CICDBotConfig] = None
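Switching `environment_catalog_mapping` from `RegexKeyDict` to `RegexKeyDictOptional` allows mapping values to be `None`, while keys are still normalized by `validate_regex_key_dict`, whose body is not part of this diff. A minimal sketch of what such a key-normalizing validator could look like follows; `compile_regex_keys` is hypothetical, and because the diff does not show how a `None` target is interpreted, the sketch simply carries `None` through unchanged.

```python
import re
import typing as t

# Mirrors the RegexKeyDictOptional alias introduced in this diff.
RegexKeyDictOptional = t.Dict[re.Pattern, t.Optional[str]]


def compile_regex_keys(
    value: t.Dict[t.Union[str, re.Pattern], t.Optional[str]],
) -> RegexKeyDictOptional:
    """Hypothetical validator: compile string keys, keep order and None values."""
    compiled: RegexKeyDictOptional = {}
    for key, target in value.items():
        # Keys may already be compiled; re.compile raises re.error on an invalid regex.
        pattern = key if isinstance(key, re.Pattern) else re.compile(key)
        compiled[pattern] = target
    return compiled


mapping = compile_regex_keys({r"^prod$": "prod", re.compile(r"^dev.*"): None})
print([(pattern.pattern, target) for pattern, target in mapping.items()])
```

Preserving insertion order matters here because the mapping is evaluated top to bottom and the first matching pattern wins.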
