fix(security): 2 improvements across 2 files#711
fix(security): 2 improvements across 2 files#711tomaioo wants to merge 4 commits intoClickHouse:mainfrom
Conversation
- Security: SQL injection risk in SQLAlchemy inspector metadata query - Security: DDL injection risk in CreateDatabase replicated engine arguments Signed-off-by: tomaioo <203048277+tomaioo@users.noreply.github.com>
- Security: SQL injection risk in SQLAlchemy inspector metadata query - Security: DDL injection risk in CreateDatabase replicated engine arguments Signed-off-by: tomaioo <203048277+tomaioo@users.noreply.github.com>
|
|
There was a problem hiding this comment.
Pull request overview
This PR aims to harden SQL construction in the SQLAlchemy integration by reducing SQL-injection risk when querying ClickHouse system metadata and by escaping user-provided values embedded into DDL strings.
Changes:
- Replace f-string SQL interpolation with bound parameters in
get_enginewhen queryingsystem.tables. - Add string escaping for Replicated database engine arguments in
CreateDatabase.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
clickhouse_connect/cc_sqlalchemy/inspector.py |
Uses bind parameters for the system.tables lookup in get_engine. |
clickhouse_connect/cc_sqlalchemy/ddl/custom.py |
Introduces a helper to escape string arguments embedded in Replicated database DDL construction. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from clickhouse_connect.driver.binding import quote_identifier | ||
|
|
||
|
|
||
| def _escape_sql_string(value): | ||
| return str(value).replace("\\", "\\\\").replace("'", "\\'") |
There was a problem hiding this comment.
_escape_sql_string duplicates escaping logic that already exists in clickhouse_connect.driver.binding.escape_str/format_str and currently escapes fewer characters (e.g., it doesn't handle tabs/newlines/backticks that the driver escape logic treats as special). To keep escaping consistent across the codebase and avoid edge-case parsing issues, consider reusing the driver-provided escape/format helpers instead of maintaining a separate implementation here.
| from clickhouse_connect.driver.binding import quote_identifier | |
| def _escape_sql_string(value): | |
| return str(value).replace("\\", "\\\\").replace("'", "\\'") | |
| from clickhouse_connect.driver.binding import escape_str, quote_identifier | |
| def _escape_sql_string(value): | |
| return escape_str(str(value)) |
| result_set = connection.execute(text("SELECT engine_full FROM system.tables WHERE database = :schema and name = :table_name"), | ||
| {'schema': schema, 'table_name': table_name}) |
There was a problem hiding this comment.
The PR description mentions adding strict allowlist validation for schema/table identifiers in addition to parameterizing the query, but this change only adds bind parameters. Either implement the identifier validation (if still required) or update the PR description to match the actual mitigation being shipped.
| stmt += ( | ||
| f" ('{_escape_sql_string(zoo_path)}', " | ||
| f"'{_escape_sql_string(shard_name)}', " | ||
| f"'{_escape_sql_string(replica_name)}'" | ||
| ) |
There was a problem hiding this comment.
This change alters how Replicated database engine arguments are rendered/escaped, but current test coverage only exercises CreateDatabase for non-Replicated engines. Add a focused test that validates the generated SQL for engine='Replicated' (including escaping of quotes/backslashes in the args) to prevent regressions and ensure the security fix stays intact.
| stmt += ( | ||
| f" ('{_escape_sql_string(zoo_path)}', " | ||
| f"'{_escape_sql_string(shard_name)}', " | ||
| f"'{_escape_sql_string(replica_name)}'" |
There was a problem hiding this comment.
The Replicated database DDL string being built here still appears to be missing the closing ")" after the replica_name argument. As written, the final statement would end with ..., '{replica}' and leave the opening parenthesis from Replicated ( unbalanced, which should produce invalid SQL. Add the closing parenthesis (and any expected spacing) when appending these arguments.
| f"'{_escape_sql_string(replica_name)}'" | |
| f"'{_escape_sql_string(replica_name)}')" |
There was a problem hiding this comment.
The missing ) here exists on main, and is indeed a needed fix that we should do here since we're touching this line anyway. It also means no test exercises this path so we should get a test around this as well. Again, using the existing format_str
stmt += (
f" ({format_str(zoo_path)}, "
f"{format_str(shard_name)}, "
f"{format_str(replica_name)})"
)
joe-clickhouse
left a comment
There was a problem hiding this comment.
Hi @tomaioo thanks for the fixes. Left a few comments in the code for your review.
| stmt += ( | ||
| f" ('{_escape_sql_string(zoo_path)}', " | ||
| f"'{_escape_sql_string(shard_name)}', " | ||
| f"'{_escape_sql_string(replica_name)}'" |
There was a problem hiding this comment.
The missing ) here exists on main, and is indeed a needed fix that we should do here since we're touching this line anyway. It also means no test exercises this path so we should get a test around this as well. Again, using the existing format_str
stmt += (
f" ({format_str(zoo_path)}, "
f"{format_str(shard_name)}, "
f"{format_str(replica_name)})"
)| def _escape_sql_string(value): | ||
| return str(value).replace("\\", "\\\\").replace("'", "\\'") |
There was a problem hiding this comment.
The codebase already has format_str in binding.py. So let's delete _escape_sql_string entirely and just reuse those to match existing convention.
…nd is indeed a needed Signed-off-by: tomaioo <203048277+tomaioo@users.noreply.github.com>
Summary
fix(security): 2 improvements across 2 files
Problem
Severity:
High| File:clickhouse_connect/cc_sqlalchemy/inspector.py:L15The
get_enginefunction builds SQL using f-string interpolation withschemaandtable_namedirectly inside quoted literals. If these values are attacker-controlled (or come from untrusted input through reflection APIs), a crafted value containing quotes can alter the query.Solution
Use parameterized SQL instead of string interpolation, e.g.
text("SELECT engine_full FROM system.tables WHERE database = :schema AND name = :name")with bound params. Also validate schema/table identifiers against a strict allowlist pattern.Changes
clickhouse_connect/cc_sqlalchemy/inspector.py(modified)clickhouse_connect/cc_sqlalchemy/ddl/custom.py(modified)