Feat: Allow virtual environments to be given dedicated catalogs#4742

Merged
erindru merged 3 commits into main from erin/environment-suffix-catalog
Jun 19, 2025

Conversation

@erindru
Collaborator

@erindru erindru commented Jun 16, 2025

Addresses #3251

Up until now, there has been no way to say to SQLMesh "Create a virtual environment with identical schema and view naming to prod, just under a different catalog".

The closest thing was environment_catalog_mapping, which technically did allow virtual environment views to go into a different catalog, but it did nothing to rename the schemas.

This meant if you had something like:

environment_catalog_mapping:
  '^prod$': 'prod'
  '^dev$': 'dev'

And ran sqlmesh plan dev, the schemas would still have the __dev suffix, e.g. dev.example_schema__dev.example_table, even though they are created under the dev catalog.

This behaviour means that it is not trivial to point a downstream report written against the prod environment at your dev environment, because the objects are still named differently.

This PR extends the existing environment_suffix_target option with an additional value: catalog. After this PR, setting some config like:

environment_suffix_target: catalog

And running sqlmesh plan dev will cause the virtual layer to be created under, e.g., dev.example_schema.example_table instead of dev.example_schema__dev.example_table.
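To make the naming difference concrete, here is a minimal sketch of how the fully qualified view name changes with each suffix target. The helper `virtual_view_name` is hypothetical and not part of SQLMesh; the `table` branch reflects my understanding of the existing option rather than anything stated in this PR, and prod is assumed to keep its names unsuffixed.

```python
def virtual_view_name(
    catalog: str, schema: str, table: str, environment: str, suffix_target: str
) -> str:
    """Hypothetical illustration of virtual-layer naming; not SQLMesh's code."""
    if environment == "prod":
        # Assumption: prod objects are never suffixed.
        return f"{catalog}.{schema}.{table}"
    if suffix_target == "schema":
        # Suffix on the schema, e.g. example_schema__dev
        return f"{catalog}.{schema}__{environment}.{table}"
    if suffix_target == "table":
        # Assumption: suffix on the table, e.g. example_table__dev
        return f"{catalog}.{schema}.{table}__{environment}"
    if suffix_target == "catalog":
        # The environment name becomes the catalog; schema and table are untouched.
        return f"{environment}.{schema}.{table}"
    raise ValueError(f"Unknown suffix target: {suffix_target!r}")


print(virtual_view_name("dev", "example_schema", "example_table", "dev", "schema"))
print(virtual_view_name("prod", "example_schema", "example_table", "dev", "catalog"))
```

With the `catalog` target, a report written against `example_schema.example_table` only needs its catalog switched to read from the dev environment.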

In this initial implementation, SQLMesh can only automatically create and drop catalogs for you in Snowflake and DuckDB. Catalogs are a lot less trivial to create than schemas because they tend to have extra options, such as where to store the data files or which tablespace to use, which SQLMesh doesn't currently have a good way to specify.

For other engines, the assumption is that any environment-specific catalogs have already been created in the target database and SQLMesh is just utilizing them.

@sungchun12
Contributor

@dcohen24 we're working on this as promised!

@dcohen24

Great. Will need your help to think about this one:
" SQLMesh will not attempt to create catalogs on demand or drop them as part of janitor cleanup. Using `environment_suffix_target "

This will work great for staging (and dev)... need to think about how we might handle individual user / feature branches (that wouldn't have a pre-built catalog).

Comment thread sqlmesh/core/config/root.py Outdated
Comment thread sqlmesh/core/config/root.py Outdated
environment_suffix_target.lower() == EnvironmentSuffixTarget.CATALOG.lower()
and "environment_catalog_mapping" not in data
):
# set the default environment_catalog_mapping for when environment_suffix_target=catalog
Collaborator

Shouldn't we fail if environment_catalog_mapping is provided together with the catalog suffix target? IMHO the 2 seem to be mutually exclusive

Collaborator Author

They aren't, they work together to give users control over the catalog names.

By default, if you don't configure an environment_catalog_mapping, you get the default mapping of "catalog is named after the environment 1:1".

But if you want more control over how the catalog names are generated, you can specify a custom environment_catalog_mapping.

Therefore, environment_suffix_target: catalog is implemented in terms of environment_catalog_mapping

Collaborator Author

After internal discussion, we don't want to give users the ability to override how the catalog names are generated because it's too easy to get subtly wrong.

Based on that, these two settings are indeed mutually exclusive. I've adjusted the implementation to throw a ConfigError if both are specified.
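For readers following along, the mutual-exclusivity check can be sketched roughly as below. This is not the code from the PR: `ConfigError` here is a local stand-in for SQLMesh's exception, `validate_suffix_target` is a hypothetical name, and the `schema` default is an assumption.

```python
class ConfigError(Exception):
    """Local stand-in for SQLMesh's ConfigError (assumption, for illustration)."""


def validate_suffix_target(data: dict) -> None:
    """Hypothetical validator: reject catalog suffix target plus a custom mapping."""
    # Assumption: the option defaults to "schema" when it is not set.
    target = str(data.get("environment_suffix_target", "schema")).lower()
    if target == "catalog" and data.get("environment_catalog_mapping"):
        raise ConfigError(
            "'environment_suffix_target: catalog' cannot be combined with "
            "'environment_catalog_mapping'; specify only one of them."
        )


# Either setting on its own is accepted.
validate_suffix_target({"environment_suffix_target": "catalog"})
validate_suffix_target({"environment_catalog_mapping": {"^dev$": "dev"}})
```

Failing at config load time keeps the error close to its cause, instead of surfacing later as unexpectedly named catalogs.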

Comment thread sqlmesh/core/environment.py Outdated
Comment thread sqlmesh/core/config/root.py Outdated
Comment thread docs/guides/configuration.md Outdated

!!! warning "Caveats"
- Using `environment_suffix_target: catalog` only works on engines that support querying across different catalogs. If your engine does not support cross-catalog queries then you will need to use `environment_suffix_target: schema` or `environment_suffix_target: table` instead.
- SQLMesh will not attempt to create catalogs on demand or drop them as part of janitor cleanup. Using `environment_suffix_target: catalog` assumes the catalogs already exist in the target database and are being managed outside of SQLMesh.

So: SQLMesh will not do this. Is there a fallback? What will happen when a feature branch/catalog is attempted to be spun up?

Collaborator Author

The difficulty right now is that CREATE CATALOG / CREATE DATABASE generally comes with a bunch of options to customize the catalog, and SQLMesh does not have a good way of specifying them at the moment.

However, we might be able to support a basic case with engines like Snowflake that don't make this as difficult. Let me revisit this.

That would be awesome [selfishly... we will be on Snowflake]. I suppose it could also be like a custom materialization / abstract class: push it onto the user to do the build-up / teardown.

Collaborator Author

I've created an initial implementation for Snowflake where SQLMesh will run CREATE DATABASE IF NOT EXISTS <env_name> to create a catalog and DROP DATABASE IF EXISTS <env_name> to clean up when the env expires.

Doing this automatically makes me slightly nervous because SQLMesh cannot distinguish between catalogs it created and catalogs that others created. So someone could map a SQLMesh virtual environment to an existing catalog with other data in it and the Janitor will happily drop that catalog when the SQLMesh virtual environment expires, which will also drop the other data.

Do you see that being a problem in your use-case?

Collaborator Author

Actually, thinking about this more, I've adjusted SQLMesh to set COMMENT = 'sqlmesh_managed' on the databases it creates.

It will then only drop databases with this comment set. That should help prevent accidents.
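A rough sketch of that guard, assuming Snowflake-style statements: the function names, the identifier check, and the way the existing comment is passed in are all hypothetical, and the real adapter code may look quite different.

```python
import re
from typing import Optional

MANAGED_COMMENT = "sqlmesh_managed"
# Assumption: only simple unquoted identifiers are accepted in this sketch.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"Unsupported catalog name: {name!r}")
    return name


def create_catalog_sql(env_name: str) -> str:
    """Build a CREATE statement that tags the database as SQLMesh-managed."""
    name = _check_identifier(env_name)
    return f"CREATE DATABASE IF NOT EXISTS {name} COMMENT = '{MANAGED_COMMENT}'"


def drop_catalog_sql(env_name: str, existing_comment: Optional[str]) -> Optional[str]:
    """Return a DROP statement only when the database carries the managed tag."""
    name = _check_identifier(env_name)
    if existing_comment != MANAGED_COMMENT:
        # Not created by SQLMesh (or the tag was changed): leave it alone.
        return None
    return f"DROP DATABASE IF EXISTS {name}"


print(create_catalog_sql("dev"))
print(drop_catalog_sql("dev", "sqlmesh_managed"))
print(drop_catalog_sql("analytics", None))
```

The tag is a convention rather than a lock: anyone who sets the same comment on an existing database by hand would opt it into cleanup.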

@erindru erindru force-pushed the erin/environment-suffix-catalog branch from b983250 to 51f69aa Compare June 17, 2025 23:50
Comment thread sqlmesh/core/environment.py Outdated
@erindru erindru force-pushed the erin/environment-suffix-catalog branch 2 times, most recently from 7dd81a4 to d8dcf20 Compare June 18, 2025 22:49
Comment thread sqlmesh/core/engine_adapter/base.py Outdated
Comment thread sqlmesh/core/engine_adapter/base.py Outdated
Comment thread sqlmesh/core/context.py Outdated
@erindru erindru force-pushed the erin/environment-suffix-catalog branch 2 times, most recently from 70993f1 to 160f6ac Compare June 19, 2025 01:58
Comment thread sqlmesh/core/engine_adapter/base.py Outdated
        return self._drop_catalog(exp.parse_identifier(catalog_name, dialect=self.dialect))

    def _drop_catalog(self, catalog_name: exp.Identifier) -> None:
        raise NotImplementedError(
Collaborator

I think this should be a SQLMeshError in order to bubble up to the user correctly.
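One way to picture the suggestion: the base adapter raises a user-facing error by default, and engines that can manage catalogs override the private hook. Everything below is a simplified stand-in (plain strings instead of sqlglot identifiers, a local `SQLMeshError` class, hypothetical adapter names), not the merged code.

```python
class SQLMeshError(Exception):
    """Local stand-in for SQLMesh's user-facing error type (assumption)."""


class BaseAdapter:
    """Hypothetical, simplified base adapter."""

    def drop_catalog(self, catalog_name: str) -> None:
        self._drop_catalog(catalog_name)

    def _drop_catalog(self, catalog_name: str) -> None:
        # Engines without catalog support fail with a clear, user-facing message.
        raise SQLMeshError(
            f"Dropping catalogs is not supported by {type(self).__name__}"
        )


class RecordingAdapter(BaseAdapter):
    """Hypothetical engine that supports dropping catalogs; records its SQL."""

    def __init__(self) -> None:
        self.executed: list = []

    def _drop_catalog(self, catalog_name: str) -> None:
        self.executed.append(f"DROP DATABASE IF EXISTS {catalog_name}")


adapter = RecordingAdapter()
adapter.drop_catalog("dev")
print(adapter.executed)
```

Using the project's own exception type lets callers catch and report unsupported operations the same way as other configuration or engine errors.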

@erindru erindru force-pushed the erin/environment-suffix-catalog branch from 160f6ac to f13a20e Compare June 19, 2025 21:50
@erindru erindru merged commit 05c793c into main Jun 19, 2025
25 checks passed
@erindru erindru deleted the erin/environment-suffix-catalog branch June 19, 2025 22:12

4 participants