feat: add scene analysis API#969
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new scene analyzer module (sceneAnalyzer.ts) and integrates it into SceneParser to perform static analysis on scene files, validating labels, references, choices, assets, variables, and control flow. Feedback on these changes highlights several issues: a potential ReDoS vulnerability and state preservation bug in extractVariableNames due to global regex usage, false positives in control flow validation for conditional terminal commands, path resolution mismatches with leading slashes in isSceneResolved, and incorrect line number reporting (line 0) for assets without explicit line numbers.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| function extractVariableNames(expression: string): string[] { | ||
| const withoutStrings = expression.replace(/(["'`])(?:\\.|(?!\1).)*\1/g, ' '); | ||
| const names = new Set<string>(); | ||
| let match: RegExpExecArray | null; | ||
|
|
||
| while ((match = VARIABLE_TOKEN_RE.exec(withoutStrings)) !== null) { | ||
| const token = match[0]; | ||
| if (!RESERVED_VARIABLE_WORDS.has(token.toLowerCase()) && Number.isNaN(Number(token))) { | ||
| names.add(token); | ||
| } | ||
| } | ||
|
|
||
| return Array.from(names); | ||
| } |
There was a problem hiding this comment.
The VARIABLE_TOKEN_RE regular expression is defined with the global (g) flag at the module level. Using exec in a loop on a global regex maintains state (lastIndex) across multiple invocations of extractVariableNames, leading to incorrect parsing or skipped matches on subsequent calls. Additionally, the regex used to strip strings /(["'])(?:\.|(?!\1).)*\1/gcontains overlapping paths that can lead to exponential backtracking (ReDoS) on unmatched quotes with many backslashes. We can resolve both issues by using safe non-backreferencing string matching andString.prototype.match()instead ofexec()`.
function extractVariableNames(expression: string): string[] {
const withoutStrings = expression.replace(/\"(?:\\.|[^\"\\])*\"|'(?:\\.|[^'\\])*'|`(?:\\.|[^`\\])*`/g, ' ');
const names = new Set<string>();
const matches = withoutStrings.match(VARIABLE_TOKEN_RE) || [];
for (const token of matches) {
if (!RESERVED_VARIABLE_WORDS.has(token.toLowerCase()) && Number.isNaN(Number(token))) {
names.add(token);
}
}
return Array.from(names);
}| context.scene.sentenceList.forEach((sentence, index) => { | ||
| if (!TERMINAL_COMMANDS.has(sentence.command)) { | ||
| return; | ||
| } | ||
|
|
||
| const nextMeaningful = findNextMeaningfulSentence(context.scene.sentenceList, index + 1); | ||
| if (!nextMeaningful) { | ||
| return; | ||
| } | ||
|
|
||
| const [nextSentence, nextIndex] = nextMeaningful; | ||
| if (nextSentence.command === commandType.label) { | ||
| return; | ||
| } | ||
|
|
||
| pushWarning( | ||
| context, | ||
| nextSentence, | ||
| nextIndex, | ||
| 'unreachable-after-terminal-command', | ||
| 'info', | ||
| 'This sentence follows a terminal command before the next label.', | ||
| ); | ||
| }); |
There was a problem hiding this comment.
If a terminal command (such as changeScene) has a conditional when argument, the control flow is not guaranteed to terminate. Marking subsequent sentences as unreachable in this case will result in false positives. We should skip control flow validation for commands that have a when argument.
context.scene.sentenceList.forEach((sentence, index) => {
if (!TERMINAL_COMMANDS.has(sentence.command)) {
return;
}
const isConditional = sentence.args.some((arg) => arg.key === 'when');
if (isConditional) {
return;
}
const nextMeaningful = findNextMeaningfulSentence(context.scene.sentenceList, index + 1);
if (!nextMeaningful) {
return;
}
const [nextSentence, nextIndex] = nextMeaningful;
if (nextSentence.command === commandType.label) {
return;
}
pushWarning(
context,
nextSentence,
nextIndex,
'unreachable-after-terminal-command',
'info',
'This sentence follows a terminal command before the next label.',
);
});| function isSceneResolved(context: AnalyzerContext, target: string): boolean { | ||
| if (context.knownScenes.size === 0) { | ||
| return true; | ||
| } | ||
|
|
||
| const normalized = normalizeSceneTarget(target); | ||
| return context.knownScenes.has(normalized) || context.knownScenes.has(stripLeadingSlash(normalized)); | ||
| } |
There was a problem hiding this comment.
To prevent mismatches between known scenes and referenced scenes due to inconsistent leading slashes (e.g., /start.txt vs start.txt), we should check both the normalized path, the stripped path, and the absolute-style path.
| function isSceneResolved(context: AnalyzerContext, target: string): boolean { | |
| if (context.knownScenes.size === 0) { | |
| return true; | |
| } | |
| const normalized = normalizeSceneTarget(target); | |
| return context.knownScenes.has(normalized) || context.knownScenes.has(stripLeadingSlash(normalized)); | |
| } | |
| function isSceneResolved(context: AnalyzerContext, target: string): boolean { | |
| if (context.knownScenes.size === 0) { | |
| return true; | |
| } | |
| const normalized = normalizeSceneTarget(target); | |
| const stripped = stripLeadingSlash(normalized); | |
| return ( | |
| context.knownScenes.has(normalized) || | |
| context.knownScenes.has(stripped) || | |
| context.knownScenes.has("/" + stripped) | |
| ); | |
| } |
| if (!asset.url || asset.url.trim().length === 0) { | ||
| context.warnings.push({ | ||
| severity: 'warning', | ||
| code: 'empty-asset-url', | ||
| message: `Asset "${asset.name}" has an empty URL.`, | ||
| lineNumber: asset.lineNumber, | ||
| sentenceIndex: asset.sentenceIndex, | ||
| command: asset.sourceCommand, | ||
| commandRaw: asset.sourceCommandRaw, | ||
| }); | ||
| } |
There was a problem hiding this comment.
If asset.lineNumber is 0 (which is common for assets parsed without explicit line numbers), the warning will report line 0. We should fall back to the sentence's actual line number using getSentenceLineNumber to ensure a valid 1-based line number is always reported.
if (!asset.url || asset.url.trim().length === 0) {
const sentence = context.scene.sentenceList[asset.sentenceIndex];
const lineNumber = asset.lineNumber > 0 ? asset.lineNumber : getSentenceLineNumber(sentence, asset.sentenceIndex);
context.warnings.push({
severity: 'warning',
code: 'empty-asset-url',
message: "Asset \"" + asset.name + "\" has an empty URL.",
lineNumber,
sentenceIndex: asset.sentenceIndex,
command: asset.sourceCommand,
commandRaw: asset.sourceCommandRaw,
});
}
Summary\n- add a parser-level scene analysis API for WebGAL scripts\n- expose SceneParser.analyze() and analyzeScene() for editor/tooling use\n- collect labels, scene references, choices, assets, variable usage, command summaries, and diagnostics\n- report duplicate labels, missing labels, unknown scenes, empty choices/assets, and simple unreachable-flow hints\n\n## Review notes\n- Feature-focused code change for WebGAL/GalGame script tooling\n- Diff size: 849 insertions\n- Production code: 825 insertions across sceneAnalyzer.ts and parser exports\n- Test coverage is intentionally small: 24 inserted lines only to verify the public API\n\n## Testing\n- cd packages/parser && yarn vitest run parser.test.ts\n- cd packages/parser && yarn build-ci