-
Notifications
You must be signed in to change notification settings - Fork 380
Builtin Linting Rule: cronintervalalignment #5091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 13 commits
Commits
Show all changes
17 commits
Select commit
Hold shift + click to select a range
a18a50f
fix diagram
sungchun12 c11510d
Merge branch 'main' of https://github.com/TobikoData/sqlmesh
sungchun12 c2c1862
Merge branch 'main' of https://github.com/TobikoData/sqlmesh
sungchun12 f6aee9e
Merge branch 'main' of https://github.com/TobikoData/sqlmesh
sungchun12 c72d253
Merge branch 'main' of https://github.com/TobikoData/sqlmesh
sungchun12 347e312
cron validator builtin
sungchun12 5f5f429
add docs
sungchun12 b6c7a69
clear docs
sungchun12 5898932
pr feedback
sungchun12 bd895ba
update for more valid combos
sungchun12 1c80707
remove unneeded variables
sungchun12 c316b80
update tests
sungchun12 5861dba
fix test
sungchun12 e839d4e
fix implementation
sungchun12 a5fa6b4
Merge branch 'main' of https://github.com/TobikoData/sqlmesh into sun…
sungchun12 cc5276f
fix test name
sungchun12 67b984f
new approach
sungchun12 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -49,6 +49,7 @@ | |
| "nomissingaudits", | ||
| "nomissingowner", | ||
| "nomissingexternalmodels", | ||
| "cronintervalalignment", | ||
| ], | ||
| ), | ||
| ) | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,6 +2,8 @@ | |
|
|
||
| from sqlmesh import Context | ||
| from sqlmesh.core.linter.rule import Position, Range | ||
| from sqlmesh.core.model import load_sql_based_model | ||
| from sqlmesh.core import dialect as d | ||
|
|
||
|
|
||
| def test_no_missing_external_models(tmp_path, copy_to_temp_path) -> None: | ||
|
|
@@ -31,6 +33,7 @@ def test_no_missing_external_models(tmp_path, copy_to_temp_path) -> None: | |
| "nomissingaudits", | ||
| "nomissingowner", | ||
| "nomissingexternalmodels", | ||
| "cronintervalalignment", | ||
| ], | ||
| ),""" | ||
| after = """linter=LinterConfig(enabled=True, rules=["nomissingexternalmodels"]),""" | ||
|
|
@@ -84,6 +87,7 @@ def test_no_missing_external_models_with_existing_file_ending_in_newline( | |
| "nomissingaudits", | ||
| "nomissingowner", | ||
| "nomissingexternalmodels", | ||
| "cronintervalalignment", | ||
| ], | ||
| ),""" | ||
| after = """linter=LinterConfig(enabled=True, rules=["nomissingexternalmodels"]),""" | ||
|
|
@@ -141,6 +145,7 @@ def test_no_missing_external_models_with_existing_file_not_ending_in_newline( | |
| "nomissingaudits", | ||
| "nomissingowner", | ||
| "nomissingexternalmodels", | ||
| "cronintervalalignment", | ||
| ], | ||
| ),""" | ||
| after = """linter=LinterConfig(enabled=True, rules=["nomissingexternalmodels"]),""" | ||
|
|
@@ -172,3 +177,111 @@ def test_no_missing_external_models_with_existing_file_not_ending_in_newline( | |
| ) | ||
| fix_path = sushi_path / "external_models.yaml" | ||
| assert edit.path == fix_path | ||
|
|
||
|
|
||
| def test_cron_interval_alignment(tmp_path, copy_to_temp_path) -> None: | ||
| sushi_paths = copy_to_temp_path("examples/sushi") | ||
| sushi_path = sushi_paths[0] | ||
|
|
||
| # Override the config.py to turn on lint | ||
| with open(sushi_path / "config.py", "r") as f: | ||
| read_file = f.read() | ||
|
|
||
| before = """ linter=LinterConfig( | ||
| enabled=False, | ||
| rules=[ | ||
| "ambiguousorinvalidcolumn", | ||
| "invalidselectstarexpansion", | ||
| "noselectstar", | ||
| "nomissingaudits", | ||
| "nomissingowner", | ||
| "nomissingexternalmodels", | ||
| "cronintervalalignment", | ||
| ], | ||
| ),""" | ||
| after = """linter=LinterConfig(enabled=True, rules=["cronintervalalignment"]),""" | ||
| read_file = read_file.replace(before, after) | ||
| assert after in read_file | ||
| with open(sushi_path / "config.py", "w") as f: | ||
| f.writelines(read_file) | ||
|
|
||
| # Load the context with the temporary sushi path | ||
| context = Context(paths=[sushi_path]) | ||
|
|
||
| context.load() | ||
|
|
||
| # Create model with shorter cron interval that depends on model with longer interval | ||
| upstream_model = load_sql_based_model( | ||
| d.parse("MODEL (name memory.sushi.step_1, cron '@weekly'); SELECT * FROM (SELECT 1)") | ||
| ) | ||
|
|
||
| downstream_model = load_sql_based_model( | ||
| d.parse( | ||
| "MODEL (name memory.sushi.step_2, cron '@daily', depends_on ['memory.sushi.step_1']); SELECT * FROM (SELECT 1)" | ||
| ) | ||
| ) | ||
|
|
||
| context.upsert_model(upstream_model) | ||
| context.upsert_model(downstream_model) | ||
|
|
||
| lints = context.lint_models(raise_on_error=False) | ||
| assert len(lints) == 1 | ||
| lint = lints[0] | ||
| assert ( | ||
| lint.violation_msg | ||
| == 'Upstream model "memory"."sushi"."step_1" has longer cron interval (@weekly) than this model (@daily)' | ||
| ) | ||
|
|
||
|
|
||
| def test_cron_interval_alignment_valid_upstream(tmp_path, copy_to_temp_path) -> None: | ||
| sushi_paths = copy_to_temp_path("examples/sushi") | ||
| sushi_path = sushi_paths[0] | ||
|
|
||
| # Override the config.py to turn on lint | ||
| with open(sushi_path / "config.py", "r") as f: | ||
| read_file = f.read() | ||
|
|
||
| before = """ linter=LinterConfig( | ||
| enabled=False, | ||
| rules=[ | ||
| "ambiguousorinvalidcolumn", | ||
| "invalidselectstarexpansion", | ||
| "noselectstar", | ||
| "nomissingaudits", | ||
| "nomissingowner", | ||
| "nomissingexternalmodels", | ||
| "cronintervalalignment", | ||
| ], | ||
| ),""" | ||
| after = """linter=LinterConfig(enabled=True, rules=["cronintervalalignment"]),""" | ||
| read_file = read_file.replace(before, after) | ||
| assert after in read_file | ||
| with open(sushi_path / "config.py", "w") as f: | ||
| f.writelines(read_file) | ||
|
|
||
| # Load the context with the temporary sushi path | ||
| context = Context(paths=[sushi_path]) | ||
|
|
||
| context.load() | ||
|
|
||
| # Create model with shorter cron interval that depends on model with longer interval | ||
| upstream_model_a = load_sql_based_model( | ||
| d.parse("MODEL (name memory.sushi.step_a, cron '@weekly'); SELECT * FROM (SELECT 1)") | ||
| ) | ||
|
|
||
| upstream_model_b = load_sql_based_model( | ||
| d.parse("MODEL (name memory.sushi.step_b, cron '@hourly'); SELECT * FROM (SELECT 1)") | ||
| ) | ||
|
|
||
| downstream_model = load_sql_based_model( | ||
| d.parse( | ||
| "MODEL (name memory.sushi.step_c, cron '@daily', depends_on ['memory.sushi.step_1', 'memory.sushi.step_b']); SELECT * FROM (SELECT 1)" | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ah shoot, I need to update |
||
| ) | ||
| ) | ||
|
|
||
| context.upsert_model(upstream_model_a) | ||
| context.upsert_model(upstream_model_b) | ||
| context.upsert_model(downstream_model) | ||
|
|
||
| lints = context.lint_models(raise_on_error=False) | ||
| assert len(lints) == 0 | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this more and I don't think this is the right way to handle this. Imagine graph A (upstream) <- B (downstream) with crons
5 * * * *(run at 5th minute of every hour)0 * * * *(run every hour)Both models are hourly, so technically the upstream's cron interval is not "longer" as the violation suggests. However, B will run before A which might not be the desired behavior.
But if it was this instead:
15 10 * * *(run at 10:15AM of every day)0 * * * *(run every hour)Your check will return false given the current placeholder start date, because
cron_nextfor A will be10:15but for B it will be11:00. In this case upstream is daily, and its interval is indeed longer than Bs (hourly). You probably want your check to fail but it won't.So if the length of the interval is what you're trying to capture here then you want to do the following instead:
and move away from the placehoder date.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually my first approach, but it won't work in isolation because the
IntervalUnitis limited: https://github.com/TobikoData/sqlmesh/blob/5861dbaac291113b504380456cc1b38a897611cc/sqlmesh/core/node.py#L39-L45For example,
@weeklyand@dailywill be interpreted as being equal in interval unit seconds because they are both categorized asDAY.New Approach: Use an OR condition where if upstream model's next cron is greater OR its interval unit in seconds is greater, return a violation.
Example in action for your first scenario: You'll notice the placeholder date implementation works as expected. This is NOT a rule violation because model B runs AFTER model A.
A: 5 * * * * (run at 5th minute of every hour)
B: 0 * * * * (run every hour)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: it's important WHEN models are linted because when it's to deployed to prod determines the next run. For nuanced cron schedules, a model applied at a specific time can emit violation vs. not.