Conversation
Pull request overview
Updates the SQL-length boundary system test to be more deterministic by avoiding now+offset timestamp expressions and by setting a known maxSQLLength baseline before running boundary checks.
Changes:
- Set `maxSQLLength` to 1MB during test case initialization.
- Use a fixed base timestamp (ms) instead of `now+{i}s` when generating large INSERT statements.
- Update expected row counts after the 10MB and 64MB INSERTs.
```python
tdLog.info("=============== Test with large maxSQLLength scenario")
tdSql.execute("alter local 'maxSQLLength' '1048576'")

tdSql.execute("drop database if exists db_maxsql_large")
tdSql.execute("create database db_maxsql_large")
tdSql.execute("use db_maxsql_large")

# Fixed base timestamp (ms) to avoid duplicate ts from now+offset
base_ts_ms = 1704067200000  # 2024-01-01 00:00:00.000
```
test_sql_length_boundary() is not invoked from run(), and the system-test runner executes only case.run() (it doesn't auto-discover test_* methods). Unless another harness calls this explicitly, these assertions will never run; consider calling self.test_sql_length_boundary() from run() (or moving this test to its own case that is actually scheduled) so the fix takes effect in CI.
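A minimal sketch of the suggested wiring, with simplified placeholder bodies (the real methods contain the boundary assertions; class and method bodies here are illustrative only):

```python
# Illustrative sketch: the system-test runner only calls case.run(), so the
# boundary test must be invoked explicitly from run() to execute in CI.
class TDTestCase:
    def test_sql_length_boundary(self):
        # ... the maxSQLLength boundary assertions from the diff would live here ...
        return "boundary-checked"

    def run(self):
        # Explicitly call the boundary test so the runner actually executes it.
        return self.test_sql_length_boundary()

assert TDTestCase().run() == "boundary-checked"
```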
```diff
  base_len_64mb = len(base_sql_64mb)
  tdLog.info(f"64MB sql length: {base_len_64mb}")
  num_values_64mb = (target_length_64mb - base_len_64mb) // 16
- values_64mb = [f"(now+{i}s,{i})" for i in range(num_values_64mb)]
+ base_ts_64mb = base_ts_ms + num_values
+ values_64mb = [f"({base_ts_64mb + i},{i})" for i in range(num_values_64mb)]
  insert_sql_64mb = base_sql_64mb + ",".join(values_64mb)
```
Generating values_64mb builds a Python list with ~4.2M tuple strings before truncation, which can consume hundreds of MB of memory and significantly slow down or OOM CI. Consider constructing the SQL incrementally (e.g., via io.StringIO / chunked appends) and stop once the target length is reached, instead of materializing all values first.
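A minimal sketch of the incremental approach, under the assumption of a hypothetical helper `build_insert_sql` (the names `base_sql`, `base_ts`, and `target_length` mirror the test's variables but the function itself is not part of the test framework):

```python
import io

def build_insert_sql(base_sql, base_ts, target_length):
    """Build an INSERT statement incrementally, stopping once target_length
    would be exceeded, instead of materializing millions of tuple strings
    up front. Returns the SQL string and the number of value tuples."""
    buf = io.StringIO()
    buf.write(base_sql)
    length = len(base_sql)
    i = 0
    while True:
        tup = f"({base_ts + i},{i})"
        extra = len(tup) + (1 if i > 0 else 0)  # +1 for the comma separator
        if length + extra > target_length:
            break
        if i > 0:
            buf.write(",")
        buf.write(tup)
        length += extra
        i += 1
    return buf.getvalue(), i
```

This keeps peak memory proportional to the final SQL string rather than the full list of tuples, and the returned tuple count can double as the expected row count.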
```diff
  # Test with 10MB limit (should succeed)
  tdSql.execute("alter local 'maxSQLLength' '10485760'")
  tdSql.execute(insert_sql)
  tdSql.query("select * from t2")
- tdSql.checkRows(509902)
+ tdSql.checkRows(460732)
```
The expected row count is hardcoded, but it’s derived from the generated SQL string length and can easily change if the tuple formatting/target length changes. To make this boundary test less brittle, compute the expected inserted row count from the generated SQL (e.g., count value tuples) or query count(*) before/after the insert and assert on the delta.
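One way to compute the expectation, sketched here as a standalone helper (hypothetical, not part of the test framework):

```python
def expected_rows(insert_sql: str) -> int:
    """Count the value tuples in a generated INSERT statement.

    Tuples are formatted as "(ts,val)" and joined with commas, so each
    "),(" boundary marks one additional tuple. Illustrative helper only;
    it assumes at least one tuple is present.
    """
    return insert_sql.count("),(") + 1

# Small-scale demonstration using the same tuple format as the test:
insert_sql = "insert into t2 values " + ",".join(
    f"({1704067200000 + i},{i})" for i in range(5)
)
assert expected_rows(insert_sql) == 5
```

The test could then assert `tdSql.checkRows(expected_rows(insert_sql))` (plus any pre-existing rows) instead of a fixed constant, so formatting changes no longer break it.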
```diff
  tdSql.execute("alter local 'maxSQLLength' '67108864'")
  tdSql.execute(insert_sql_64mb)
  tdSql.query("select * from t2")
- tdSql.checkRows(3524291)
+ tdSql.checkRows(3303225)
```
Same as above: this hardcoded total row count depends on the exact SQL string construction and previous insert size, making the test fragile across small formatting changes. Prefer asserting on computed expected counts (tuple count / count(*) delta) rather than fixed constants.
Code Review
This pull request replaces relative timestamps with fixed millisecond timestamps in boundary tests to avoid duplicates and updates the expected row counts for large SQL inserts. Review feedback points out that the hardcoded divisor used to estimate the number of values is now inaccurate due to the longer timestamp strings. Additionally, the reviewer recommends calculating expected row counts dynamically rather than using hardcoded magic numbers to make the tests more robust.
```diff
  tdLog.info(f"10MB sql length: {base_len}")
  num_values = (target_length - base_len) // 16
- values = [f"(now+{i}s,{i})" for i in range(num_values)]
+ values = [f"({base_ts_ms + i},{i})" for i in range(num_values)]
```
The num_values used here is calculated (on line 348) using a hardcoded divisor of 16. With the switch to a fixed 13-digit timestamp, the average length of each value tuple (timestamp, value) is now significantly higher (approx 20-23 bytes). This makes the initial estimate for num_values inaccurate, leading to the generation of a much larger SQL string than intended before any trimming logic is applied. Consider updating the divisor to a more realistic value or calculating it dynamically based on the sample row length.
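A sketch of the dynamic estimate, assuming the tuple format from the diff (the `base_len` and `target_length` values below are illustrative, not taken from the test):

```python
# Estimate bytes per value tuple from a sample row instead of hardcoding 16.
base_ts_ms = 1704067200000
sample = f"({base_ts_ms},0),"       # one tuple plus its trailing comma: 18 bytes
bytes_per_value = len(sample)

target_length = 10 * 1024 * 1024    # 10MB target (illustrative)
base_len = len("insert into t2 values ")  # illustrative prefix length
num_values = (target_length - base_len) // bytes_per_value
```

This is still an estimate, since the value index widens as `i` grows, so the subsequent trimming logic is still needed; but it starts far closer to the target than the fixed `// 16`.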
```diff
  tdSql.execute(insert_sql)
  tdSql.query("select * from t2")
- tdSql.checkRows(509902)
+ tdSql.checkRows(460732)
```
The expected row count 460732 is hardcoded and highly sensitive to the specific string length of the timestamps and values. If the base_ts_ms or the value generation logic changes, this test will break again. It would be more robust to verify the row count dynamically (e.g., by checking against the actual number of elements in the values list after it has been adjusted to fit the target_length) rather than using a magic number.
```diff
  tdSql.execute(insert_sql_64mb)
  tdSql.query("select * from t2")
- tdSql.checkRows(3524291)
+ tdSql.checkRows(3303225)
```
Description
Issue(s)
Checklist
Please check the items in the checklist if applicable.