Feat: Allow virtual environments to be given dedicated catalogs #4742
Conversation
@dcohen24 we're working on this as promised!

Great. Will need your help to think about this one: this will work great for staging (and dev)... need to think about how we might handle individual user / feature branches (that wouldn't have a pre-built catalog).
```python
    environment_suffix_target.lower() == EnvironmentSuffixTarget.CATALOG.lower()
    and "environment_catalog_mapping" not in data
):
    # set the default environment_catalog_mapping for when environment_suffix_target=catalog
```
Shouldn't we fail if environment_catalog_mapping is provided together with the catalog suffix target? IMHO the 2 seem to be mutually exclusive
They aren't; they work together to give users control over the catalog names.
By default, if you don't configure an environment_catalog_mapping, you get the default mapping of "catalog is named after the environment 1:1".
But if you want more control over how the catalog names are generated, you can specify a custom environment_catalog_mapping.
Therefore, environment_suffix_target: catalog is implemented in terms of environment_catalog_mapping.
After internal discussion, we don't want to give users the ability to override how the catalog names are generated because it's too easy to get subtly wrong.
Based on that, these two settings are indeed mutually exclusive. I've adjusted the implementation to throw a ConfigError if both are specified.
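A minimal sketch of the mutual-exclusivity check described above. The `ConfigError` class and the function name here are stand-ins for illustration; the actual SQLMesh validator shape may differ:

```python
# Sketch of the validation discussed above: reject configs that set both
# environment_suffix_target: catalog and environment_catalog_mapping.
# ConfigError and validate_environment_config are illustrative stand-ins.
class ConfigError(Exception):
    pass


def validate_environment_config(data: dict) -> dict:
    suffix_target = str(data.get("environment_suffix_target", "")).lower()
    if suffix_target == "catalog" and "environment_catalog_mapping" in data:
        raise ConfigError(
            "environment_suffix_target: catalog is mutually exclusive with "
            "environment_catalog_mapping"
        )
    return data
```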
!!! warning "Caveats"
    - Using `environment_suffix_target: catalog` only works on engines that support querying across different catalogs. If your engine does not support cross-catalog queries then you will need to use `environment_suffix_target: schema` or `environment_suffix_target: table` instead.
    - SQLMesh will not attempt to create catalogs on demand or drop them as part of janitor cleanup. Using `environment_suffix_target: catalog` assumes the catalogs already exist in the target database and are being managed outside of SQLMesh.
So: SQLMesh will not do this. Is there a fallback? What will happen when a feature branch/catalog is attempted to be spun up?
The difficulty right now is that CREATE CATALOG / CREATE DATABASE generally comes with a bunch of options to customize the catalog and SQLMesh does not have a good way of specifying them at the moment
However, we might be able to support a basic case with engines like Snowflake that don't make this as difficult, let me revisit this
That would be awesome [selfishly... we will be on Snowflake]... I suppose it could also be like a custom materialization / abstract class. Push it on the user to do build-up / teardown.
I've created an initial implementation for Snowflake where SQLMesh will run CREATE DATABASE IF NOT EXISTS <env_name> to create a catalog and DROP DATABASE IF EXISTS <env_name> to clean up when the env expires.
Doing this automatically makes me slightly nervous because SQLMesh cannot distinguish between catalogs it created and catalogs that others created. So someone could map a SQLMesh virtual environment to an existing catalog with other data in it and the Janitor will happily drop that catalog when the SQLMesh virtual environment expires, which will also drop the other data.
Do you see that being a problem in your use-case?
Actually, thinking about this more, I've adjusted SQLMesh to set COMMENT = 'sqlmesh_managed' on the databases it creates.
It will then only drop databases with this comment set. That should help prevent accidents.
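A hedged sketch of the marker-comment guard described above. The engine object and its `execute`/`get_comment` methods are stand-ins, not the real SQLMesh adapter API:

```python
# Sketch of the janitor guard discussed above: only drop databases that
# SQLMesh itself created, identified by a marker comment. The engine
# interface here is an illustrative stand-in, not SQLMesh's adapter API.
MANAGED_COMMENT = "sqlmesh_managed"


def create_environment_catalog(engine, env_name: str) -> None:
    # Tag the database so it can later be identified as SQLMesh-managed.
    engine.execute(
        f"CREATE DATABASE IF NOT EXISTS {env_name} COMMENT = '{MANAGED_COMMENT}'"
    )


def drop_environment_catalog(engine, env_name: str) -> None:
    # Refuse to drop databases SQLMesh did not create, so a pre-existing
    # catalog that was mapped to a virtual environment is left untouched.
    if engine.get_comment(env_name) == MANAGED_COMMENT:
        engine.execute(f"DROP DATABASE IF EXISTS {env_name}")
```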
```python
        return self._drop_catalog(exp.parse_identifier(catalog_name, dialect=self.dialect))

    def _drop_catalog(self, catalog_name: exp.Identifier) -> None:
        raise NotImplementedError(
```
I think this should be a SQLMeshError in order to bubble up to the user correctly.
Addresses #3251
Up until now, there has been no way to say to SQLMesh "Create a virtual environment with identical schema and view naming to prod, just under a different catalog". The closest thing was `environment_catalog_mapping`, which technically did allow virtual environment views to go into a different catalog but did nothing to rename the schemas. This meant if you had something like:

And ran `sqlmesh plan dev`, the schemas would still have the `__dev` suffix, eg `dev.example_schema__dev.example_table` - even though they are created under the `dev` catalog. This behaviour meant that it is not trivial to point a downstream report written against the prod environment at your dev environment, because the objects are still named differently.

This PR extends the existing `environment_suffix_target` option to give it another value - `catalog`. After this PR, setting some config like:

And running `sqlmesh plan dev` will cause the virtual layer to be created under eg `dev.example_schema.example_table` instead of `dev.example_schema__dev.example_table`.

In this initial implementation, SQLMesh can only automatically create + drop catalogs for you in Snowflake and DuckDB. Catalogs are a lot less trivial to create than schemas because they tend to have extra options like "where to store the data files", "what tablespace to use" etc which SQLMesh doesn't currently have a good way to specify.

For other engines, the assumption is that any environment-specific catalogs have already been created in the target database and SQLMesh is just utilizing them.
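As a rough illustration of the config option this PR adds (the description's embedded config snippets did not survive the page scrape, so this is a hedged sketch, not the PR's exact example):

```yaml
# config.yaml - sketch of the option described in this PR; with this set,
# `sqlmesh plan dev` places the virtual layer under the `dev` catalog
# (eg dev.example_schema.example_table) instead of suffixing the schemas.
environment_suffix_target: catalog
```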