Supporting more python#1

Open: mauri-c3 wants to merge 25 commits into dev from supportingMorePython

mauri-c3 (Collaborator) commented May 6, 2026

Pull request to incorporate new Python functionality.

This Markdown document presents all changes and the reasoning behind them.
(It is also available as an .md file in the repository.)
For constructs that are known to be unsupported, see knownToBeUnsupported.md.

Grammar

Rework of For Decomposition

Python allows further kinds of targets inside for constructs, so we change our ForControl nonterminal to be closer to the Python grammar.

Old:

 ForControl = ForDecomposition "in" ForIterable;
 interface ForDecomposition;
 ForDecompositionComma implements ForDecomposition = ForDecomposition "," (ForDecomposition ","?)?;
 ForDecompositionParenthesis implements ForDecomposition = "("ForDecomposition ")";
 ForDecompositionBrackets implements ForDecomposition = "[" ForDecomposition "]";
 ForStarredVariable  implements ForDecomposition = "*" ForDecomposition;

New:

 ForControl = ForList "in" ForIterable;
 interface ForDecomposition;
 ForList = (ForDecomposition || ",")+ ","?;
 ForDecompositionParenthesis implements ForDecomposition = "(" ForList? ")";
 ForDecompositionBrackets implements ForDecomposition = "[" ForList? "]";
 ForStarredVariable  implements ForDecomposition = "*" ForDecomposition ;
 ForPyQualifiedName implements ForDecomposition = PyQualifiedName;

Python ("[...]" marks optional parts):

comprehension: assignment_expression comp_for
comp_for:      ["async"] "for" target_list "in" or_test [comp_iter]
target_list:     target ("," target)* [","]
target:          identifier
                 | "(" [target_list] ")"
                 | "[" [target_list] "]"
                 | attributeref
                 | subscription
                 | "*" target

Sources: https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-target_list and https://docs.python.org/3/reference/expressions.html (6.2.5)
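
As an illustration of the target forms listed above, the following sketch (not part of the PR's test suite) uses an attribute reference, a subscription, a starred target and nested target lists as loop targets. The class `Holder` and all variable names are hypothetical and exist only for this example.

```python
class Holder:
    """Hypothetical helper used only to obtain an attribute target."""


holder = Holder()
slots = [None, None]

# attributeref as target: the attribute is rebound on every iteration
for holder.value in range(3):
    pass

# subscription as target
for slots[0] in "ab":
    pass

# starred target inside a target_list
for head, *tail in [(1, 2, 3), (4, 5)]:
    pass

# nested parenthesised and bracketed target lists with a trailing comma
for (a, [b, c]), in [((1, [2, 3]),)]:
    pass
```

After the loops, each target keeps the value bound in the last iteration, which is why these forms need to be accepted by ForList.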

Rework of Boolean

Instead of defining a new BooleanLiteralPython, we override the MontiCore BooleanLiteral so that the lowercase 'true' and 'false' are allowed as names; tests have shown that Python accepts them as ordinary identifiers.
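
A small sketch of why this matters: in Python only the capitalised `True` and `False` are literals, while the lowercase spellings are plain names. The variable and dictionary names below are made up for illustration.

```python
# lowercase spellings are ordinary identifiers in Python
true = "yes"
false = "no"

# the capitalised spellings are the actual Boolean literals
labels = {True: true, False: false}
answer = labels[1 < 2]
```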

Rework of fstrings

Supporting all variants of f-strings proved difficult. So far they were parsed as a single string token, so the content of curly brackets was treated as text. Tests have shown, however, that line breaks are allowed within curly brackets. Another issue is that the same quote character (" or ') may be used in the outer string and inside the curly brackets. Finally, Python allows the f prefix on single-character literals such as f'a'; this was a problem so far because only strings, not chars, could carry a modifier.
With the following changes we aim to combat these issues.

  1. We no longer allow f and F as string modifiers (line 121)
  2. We add chars to the string definitions (lines 115 and 120)
  3. We add new tokens for fstrings
    //Double and single quoted strings with an f modifier f'text1{exp}text2', separate definition to allow more expressions.
         token FSQStringPython
           = ('f'|'F') '\'' (StringFSQCharactersPython)? '\'' : {setText(getText().substring(1, getText().length() - 1));};
         token FDQStringPython
           = ('f'|'F') '"' (StringFDQCharactersPython)? '"' : {setText(getText().substring(1, getText().length() - 1));};
         fragment token StringFSQCharactersPython = (StringFSQCharacterPython)+;
         fragment token StringFDQCharactersPython = (StringFDQCharacterPython)+;
         fragment token StringFSQCharacterPython = ~ ('\''| '\\'| '{' )| PythonEscapeSequence |InnerFString;
         fragment token StringFDQCharacterPython = ~ ('"'| '\\'| '{') | PythonEscapeSequence |InnerFString;
         fragment token InnerFString = '{' ~('}')* '}'; 
    
  4. We similarly define f-string literals and nonterminals for the multi-line string tokens
  5. We add a new FStringPython nonterminal (line 120) and add it to StringLiteralPython (lines 112-117)

Addition of complex numbers.

Python supports complex numbers, which are written as:
(real number) (+|-) (imaginaryNumber)'j'
A token PyComplexNumber was added to support these numbers, along with a matching literal so that they can be used within Python code.

    PyComplexNumberLiteral implements NumericLiteral <95> = PyComplexNumber;

Source for cmath: https://docs.python.org/3/library/cmath.html (Experiments have shown that the real number and the sign of the imaginary part can be left out.)

Addition of generics.

We added support for generics as follows:

  GenericsAnnotation = "[" Generics? "]";
  Generics = (Generic || ",")+ ;
  Generic = TypeAnnotation (":" TypeAnnotation)?;

This is modelled on https://docs.python.org/3/reference/compound_stmts.html#type-params,
but differs in that we reuse the existing TypeAnnotation nonterminal. To support the functionality described in the source above, we added "GenericsAnnotation?" to:

  • The nonterminals implementing the interface FunctionParameter.
  • ClassFunctionDeclaration, ClassDeclaration, SimpleFunctionDeclaration, TypeDeclarationStatement
    and the type annotations, via GenericTypeAnnotation implements TypeAnnotation = TypeAnnotation GenericsAnnotation
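
For reference, the Python syntax this targets looks roughly like the snippet held in `GENERIC_SOURCE` below. Type parameter lists were introduced in Python 3.12, so the sketch keeps the snippet in a string and only checks whether the running interpreter parses it; `parses`, `first` and `Box` are hypothetical names.

```python
import ast
import sys

# Python 3.12+ source using type parameter lists on a function and a class
GENERIC_SOURCE = '''
def first[T](items: list[T]) -> T:
    return items[0]

class Box[T: (int, str)]:
    def __init__(self, content: T):
        self.content = content
'''


def parses(source: str) -> bool:
    """Return True if the running interpreter accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


SUPPORTED = parses(GENERIC_SOURCE)
```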

Statements

ClassStatements

The Import, If, Assert, For, While, ConditionalExecution, With, GlobalVariableDeclaration, NonLocalVariableDeclaration, MultiVariableDeclaration, ParenMultiVariableDeclaration, Raise, TypeRuleStatement, and Delete statements now implement the ClassStatement interface. Experiments and tests showed that Python allows them inside class bodies.
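
A minimal sketch of a class body that uses several of these statements; the class name `Table` and its attributes are invented for the example.

```python
class Table:
    import math                      # import statement

    squares = []
    for n in range(3):               # for statement
        squares.append(n * n)

    if squares:                      # if statement
        smallest = squares[0]

    while len(squares) < 5:          # while statement
        squares.append(0)

    assert len(squares) == 5         # assert statement
    del n                            # delete statement
```

The class body is executed like any other block, so the loop variable `n` would become a class attribute if it were not deleted.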

New Statements

Nonlocal

For a detailed description of the nonlocal keyword, see https://docs.python.org/3/tutorial/classes.html, section 9.2. In summary, nonlocal allows a function or class defined inside another function to rebind the enclosing function's variables. Without the nonlocal keyword, these variables are read-only for nested functions and classes. Our implementation is similar to Python's (https://docs.python.org/3/reference/simple_stmts.html#nonlocal).

Python's: nonlocal_stmt: "nonlocal" identifier ("," identifier)* (see source above)
Ours: NonLocalVariableDeclaration implements Statement,ClassStatement = "nonlocal" names:(Name || ",")+ STATEMENT_END;
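
A short sketch of a nonlocal statement with more than one name, which is the case the `(Name || ",")+` part covers. The function and variable names are hypothetical.

```python
def make_counter():
    count = 0
    total = 0

    def increment(step=1):
        # rebinding both enclosing variables needs a nonlocal declaration
        nonlocal count, total
        count += 1
        total += step
        return count, total

    return increment


counter = make_counter()
counter()
result = counter(5)
```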

Yield from

yield from statements of the form 'yield' 'from' expression were introduced by https://peps.python.org/pep-0380/. We support them similarly to Python, see https://docs.python.org/3/reference/expressions.html#grammar-token-python-grammar-yield_expression.

Python's: yield_from: "yield" "from" expression (see source above)
Ours: YieldFromStatement implements Statement = "yield" "from" Expression STATEMENT_END;
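
As a sketch of the statement form, the hypothetical generator below delegates to itself with yield from to flatten nested lists.

```python
def flatten(nested):
    """Yield the leaves of arbitrarily nested lists."""
    for item in nested:
        if isinstance(item, list):
            # delegate to the sub-generator instead of looping over it
            yield from flatten(item)
        else:
            yield item


flat = list(flatten([1, [2, [3, 4]], 5]))
```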

Type Declaration Statement

To allow type hints such as var: int without an assignment (in contrast to annotated assignments with a value), we add the new TypeDeclarationStatement.

TypeDeclaration implements Statement = Expression ":" TypeAnnotation STATEMENT_END;

Python covers this case with its annotated assignment statements, but that caused issues for us; see https://docs.python.org/3/reference/simple_stmts.html#index-15.
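
The difference between a bare declaration and an annotated assignment can be sketched as follows; the class `Point` is a made-up example.

```python
from typing import get_type_hints


class Point:
    x: int          # declaration only: no class attribute is created
    y: int = 0      # annotated assignment: annotation plus value


hints = get_type_hints(Point)
```

Both names appear in the type hints, but only `y` exists as an attribute.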

Type Aliases

Python allows type aliases to be declared, where a type is aliased under an identifier; see https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-type_stmt.

We implement this similarly and additionally define the type alias as a symbol.
Python: type_stmt: 'type' identifier [type_params] "=" expression (see source above)
Ours: symbol TypeRuleStatement implements Statement,ClassStatement = key("type") Name GenericsAnnotation? "=" Expression STATEMENT_END;
'type' needs to be declared as a local keyword, as tests have shown that it can still be used as a variable name. Python specifies this as well: https://docs.python.org/3/reference/lexical_analysis.html#soft-keywords .
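
The soft-keyword behaviour can be checked with a small sketch like this one. The type statement itself exists only from Python 3.12 on, so both snippets are kept as strings and merely parsed; `parses` is a hypothetical helper.

```python
import ast
import sys


def parses(source: str) -> bool:
    """Return True if the running interpreter accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# "type" used as an ordinary variable name
TYPE_AS_NAME = parses("type = 'circle'\nprint(type)\n")

# the type statement, including a type parameter list (Python 3.12+)
TYPE_ALIAS = parses("type Pair[K, V] = tuple[K, V]\n")
```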

Changes

Global

So far we only allowed a single variable to be declared global in a global statement.
Python, however, allows multiple comma-separated variables in the same statement; see https://docs.python.org/3/reference/simple_stmts.html#global. We also previously allowed a type annotation within a global statement, which Python does not permit, so this support is removed.
Additionally, global can be used in classes, as described in the source above.

Old: GlobalVariableDeclaration implements Statement = "global" Name (":" TypeAnnotation)? STATEMENT_END;
New: GlobalVariableDeclaration implements Statement,ClassStatement = "global" (Name || ",")+ STATEMENT_END;
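
Both points can be sketched in a few lines of module-level Python: a global statement with two names, and a check that an annotated global statement is rejected by the parser. All names are hypothetical.

```python
import ast

hits = 0
misses = 0


def record(found):
    # several names in one global statement
    global hits, misses
    if found:
        hits += 1
    else:
        misses += 1


def parses(source: str) -> bool:
    """Return True if the running interpreter accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


for outcome in (True, False, True):
    record(outcome)

# a type annotation inside a global statement is a syntax error
ANNOTATED_GLOBAL = parses("def f():\n    global x: int\n")
```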

Class Statement Block

A ClassStatementBlock may now be a single ClassStatement, to allow classes such as:

class name1: pass
class name2: a=1

Old: ClassStatementBlock = BLOCK_START ClassStatementBlockBody BLOCK_END;
New: ClassStatementBlock = (BLOCK_START ClassStatementBlockBody BLOCK_END)|ClassStatement;

Tests have shown that this is needed.

Match Statement

Case statements in Python allow patterns to be aliased (for example, case Type as t:); see https://docs.python.org/3/reference/compound_stmts.html#the-match-statement and https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-patterns. To allow this, case statements now span a scope to capture the alias.
Old:

MatchStatement implements Statement = key("match") Expression ":" MatchBlock;
scope MatchBlock = BLOCK_START CaseStatement* BLOCK_END;
CaseStatement = key("case") (Expression || "|")+ ("if" condition:Expression)? ":" StatementBlock;

New:

MatchStatement implements Statement,ClassStatement = key("match") Expression ":" MatchBlock;
scope MatchBlock = BLOCK_START CaseStatement* BLOCK_END;
scope CaseStatement = key("case") (Expression || "|")+ ("as" Alias)? ("if" condition:Expression)? ":" CaseStatementBlock;
CaseStatementBlock = BLOCK_START Statement* BLOCK_END | Statement;

(Alias is defined as `symbol Alias = Name;`)
Additionally, we now allow match statements inside classes.

Try-Except

PEP 758 added support for leaving out the parentheses. In addition, support for arbitrary expressions and starred expressions was missing so far; we added it as specified in https://peps.python.org/pep-0758/. Similarly to the new case statements, except clauses now span a scope with the Alias nonterminal.
Old:

 ExceptStatement = "except" (PyQualifiedName? | "(" (PyQualifiedName || ",")+ ")") ("as" alias:Name)? ":" StatementBlock;

New:

scope ExceptStatement = "except" ExceptPattern?  ":" ExceptStatementBlock;
ExceptStatementBlock =  BLOCK_START Statement* BLOCK_END | Statement;
//PEP758
interface ExceptPattern;
ExpressionListing implements ExceptPattern = (Expression || ",")+;
ParenthesisedExpressionListing implements  ExceptPattern = "(" (Expression || ",")+ ")" ("as" Alias)?;
StarredExpressionListing implements ExceptPattern = "*"(Expression || ",")+;
StarredParenthesisedExpressionListing implements  ExceptPattern = "*""(" (Expression || ",")+ ")" ("as" Alias)?;
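
The two spellings can be compared with the sketch below. PEP 758 targets Python 3.14, so the unparenthesised form is kept in a string and only parsed; older interpreters are expected to reject it. `parses` and the constant names are hypothetical.

```python
import ast
import sys

# classic form: parenthesised exception list with an alias
PARENTHESISED = '''
try:
    pass
except (ValueError, TypeError) as error:
    pass
'''

# PEP 758 form: no parentheses, no alias
UNPARENTHESISED = '''
try:
    pass
except ValueError, TypeError:
    pass
'''


def parses(source: str) -> bool:
    """Return True if the running interpreter accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


OLD_FORM_OK = parses(PARENTHESISED)
NEW_FORM_OK = parses(UNPARENTHESISED)
```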

Expressions

Changes

Assignment

As defined by https://docs.python.org/3/reference/simple_stmts.html#index-14, Python's augmented assignment statements additionally support "//=", "@=", and "**=". We add this support by overriding MontiCore's AssignmentExpression and adding the three operators to the alternatives in square brackets.

 @Override
    AssignmentExpression implements Expression <60> = <rightassoc>
        left:Expression
        operator: [ "=" | "+=" | "-=" | "*=" | "/=" | "&=" | "|="
                  | "^=" | ">>=" | ">>>=" | "<<=" | "%=" | "**=" | "@=" | "//="]
        right:Expression;
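
For illustration, the three added operators behave as follows in Python. The class `Vec` is a hypothetical helper that defines `__matmul__` so that `@=` has something to call.

```python
class Vec:
    """Hypothetical two-component vector with a dot product on @."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __matmul__(self, other):
        return self.x * other.x + self.y * other.y


quotient = 17
quotient //= 5          # floor division in place

power = 2
power **= 10            # exponentiation in place

dot = Vec(1, 2)
dot @= Vec(3, 4)        # no __imatmul__ defined, so __matmul__ is used
```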

Annotated Assignment

So far we allowed the same operators as in the assignment expression, yet Python only allows "=" in annotated assignments; see https://docs.python.org/3/reference/simple_stmts.html#index-15.

We therefore remove the square brackets and only allow "=".
Old:

  AnnotatedAssignmentExpression implements Expression <60> = <rightassoc>
       left:Expression
       ":" annotated: TypeAnnotation
       operator: [ "=" | "+=" | "-=" | "*=" | "/=" | "&=" | "|="
                 | "^=" | ">>=" | ">>>=" | "<<=" | "%=" ]
       right:Expression; 

New:

   AnnotatedAssignmentExpression implements Expression <60> = <rightassoc>
       left:Expression
       ":" annotated: TypeAnnotation
       "="
       right:Expression; 
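
A quick way to confirm the restriction is to let Python's own parser judge both forms; `parses` is a hypothetical helper.

```python
import ast


def parses(source: str) -> bool:
    """Return True if the running interpreter accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# annotation followed by a plain "=" is accepted
PLAIN = parses("total: int = 1\n")

# annotation followed by an augmented operator is rejected
AUGMENTED = parses("total: int += 1\n")
```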

Comprehensions

We now allow multiple if clauses inside a comprehension by changing ? to * on the generator filters.
Python allows this, as specified here: https://docs.python.org/3/reference/expressions.html#grammar-token-python-grammar-comp_for
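
A one-line sketch of chained filters; consecutive if clauses act like a logical and.

```python
# two if clauses on the same generator
multiples_of_six = [n for n in range(30) if n % 2 == 0 if n % 3 == 0]
```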

Other

We added a trailing ","? to some rules where testing showed it to be allowed. We also formatted the grammar and added some comments.

Code

In this section I reference the commits instead of code snippets for readability reasons.

Preprocessor

Commit: 8c519ef
In lines 75-77 I made the continue-line token always be ignored, to allow statements like:

    function(var1, var2,\
              var3, var4)

as these could otherwise cause issues.
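
A runnable variant of the statement above, with a hypothetical function `total`, shows that Python accepts a backslash continuation inside an already open parenthesis:

```python
def total(a, b, c, d):
    return a + b + c + d


# explicit line continuation inside an open parenthesis
result = total(1, 2,\
               3, 4)
```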

Visitor and CoCos related to aliased expressions.

Marc suggested adding a CoCo and a visitor to prevent aliased expressions from being used outside match blocks; I implemented this in the following commit.
Commit: ba9e8f1

Adaptation of new tests

We removed a test for invalid classes that are now valid (because for loops are now allowed as class statements).
We added new test cases for Python: we compared what the CPython parser tests cover against what we test, and added tests for the gaps.

See: https://github.com/python/cpython/blob/3.9/Lib/test/test_parser.py , and the commit 7865991

mauri-c3 added 25 commits April 15, 2026 15:26
Token and NumericLiteral for imaginary numbers in python.
nonlocal statements
yield from statements
Add support for **=,@=,//= assignmentexpressions.
Global declarations
Removing support for AnnotedAssignments apart from "="
Avoiding Nullpointer exceptions,
After parsing a ExceptStatement a NullPointerException could happen during prettyprinting.
Always ignore continueLineTokens
Improving ExceptStatements and allowing the passing of classes
Improving ExceptStatements and allowing unsigned imaginary numbers
Generics and Typehints
Minor fixes,
Additional Match/Case combinations allowed
Parenthesis for Expressions added
Minor fixes of previous mistake

Added CoCo for aliased expressions
…urly brackets.

Overriding of the BooleanLiteral to allow "true" and "false" as names.
Allowing various Statements in classes.
Except statements now span a scope to capture the names of aliased qualified names.
Aliased expression now contain a symbol to be identified.
Casestatements now span a scope to capture the symbol of aliased expressions.
Formatting and comments for the Python.mc4.
Removing test that became wrong
@@ -0,0 +1,247 @@
# PullRequest to incorporate new Python functionality.
Contributor

Please remove this file; it should only be the description of the PR.

PythonScript = Statement*;

/*====================================== Tokens ======================================*/
@Override
Contributor

This PR changes the formatting of many unchanged lines, which adds noise and makes it very hard to review.

Also, the resulting formatting is inconsistent and does not conform to our formatting rules (new blocks are indented with 2 spaces).

Please fix this so that the git blame history is somewhat preserved.

fragment token StringFDQCharactersPython = (StringFDQCharacterPython)+;
fragment token StringFSQCharacterPython = ~ ('\''| '\\'| '{' )| PythonEscapeSequence |InnerFString;
fragment token StringFDQCharacterPython = ~ ('"'| '\\'| '{') | PythonEscapeSequence |InnerFString;
fragment token InnerFString = '{' ~('}')* '}';
Contributor

print(f"{ {'a': 1, 'b': 2} }") is valid python

Contributor

Parsing this on a token level is hard, since expressions are allowed inside {...}.
We might need a second pass (after the parse trafo) to handle these properly, since a regex cannot count matching opening and closing brackets.
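
To make the counting problem concrete, here is a rough Python sketch, not a proposal for the actual MontiCore implementation. `NAIVE_FIELD` mirrors the InnerFString fragment as a regex, while the hypothetical `split_fstring_body` tracks brace depth. It deliberately ignores quotes, format specs and conversions inside the replacement field.

```python
import re

# regex equivalent of the InnerFString fragment: '{' ~('}')* '}'
NAIVE_FIELD = re.compile(r"\{[^}]*\}")


def split_fstring_body(body):
    """Split an f-string body into ("text", ...) and ("expr", ...) parts."""
    parts, text, i = [], [], 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "}}"):          # escaped brace in the text part
            text.append(pair[0])
            i += 2
        elif body[i] == "{":              # start of a replacement field
            if text:
                parts.append(("text", "".join(text)))
                text = []
            depth, j = 1, i + 1
            while j < len(body) and depth > 0:
                if body[j] == "{":
                    depth += 1
                elif body[j] == "}":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unbalanced '{' in f-string body")
            parts.append(("expr", body[i + 1:j - 1]))
            i = j
        else:
            text.append(body[i])
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts


body = "{ {'a': 1, 'b': 2} }"
naive = NAIVE_FIELD.match(body).group()   # stops at the first '}'
counted = split_fstring_body(body)        # stops at the matching '}'
```

On the example from the comment above, the regex cuts the field off after the dictionary's closing brace, while the depth counter returns the whole expression.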

token PyFloat = DigitsPart? '.' DigitsPart | DigitsPart '.';
PyFloatLiteral implements NumericLiteral <200> = PyFloat;

token PyComplexNumber = (DigitsPart | PyFloat)? ('+' | '-')? DigitsPart "j";
Contributor

As I understand token priority, "1.5" in "1.5+0.8j" will never be parsed by this token definition, since "(DigitsPart | PyFloat)?" is optional and PyFloat and the other number literals are defined with a higher priority (earlier in the file).

I suggest parsing PyComplexNumber = DigitsPart{}"j" and letting the other parts (1.5, +) be parsed by other rules.
Maybe a {noSpace()}? is needed before "j".
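
For comparison, CPython's own tokenizer splits such an expression the same way: only the imaginary part is a single literal, and the real part and the sign are separate tokens. The sketch below uses the standard tokenize module; `number_and_op_tokens` is a hypothetical helper. Note that the imaginary literal may itself be a float, as in 0.8j.

```python
import io
import tokenize


def number_and_op_tokens(source):
    """Return (token name, text) pairs for NUMBER and OP tokens only."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.type in (tokenize.NUMBER, tokenize.OP)
    ]


full = number_and_op_tokens("1.5+0.8j\n")
bare = number_and_op_tokens("2j\n")
```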

StringLiteralPython implements Literal, SignedLiteral =
(
StringModifier?
(source:StringPython | source:String |source:Char)
Contributor

Char seems weird, please add a comment in the grammar why this is needed.
Or remove if it is not needed.

ModuleWithOptionalAlias = name:PyQualifiedName ("as" alias:Name)?;

//Conditional statements
IfStatement implements Statement,ClassStatement = "if" condition:Expression ":" thenStatement:StatementBlock
Contributor

Are there statements that are not allowed inside a class block?
Otherwise, a rule like this might be easier to maintain:

StatementInClassDef implements ClassStatement = Statement;

Contributor

Then IfStatements, WhileStatement, ... would not need to extend ClassStatement

scope CaseStatement = key("case") (Expression || "|")+ ("as" Alias)? ("if" condition:Expression)? ":" CaseStatementBlock;
CaseStatementBlock = BLOCK_START Statement* BLOCK_END | Statement;

//Another Statements
Contributor

Typo: Other Statements


GeneratorFilter = "if" condition:Expression;
//Aliased Expression only allowed in Case Statements (Checked with CoCo).
AliasedExpression implements Expression = Expression "as" alias:Alias;
Contributor

CaseStatement already contains "as", is this a duplicate?

import de.monticore.python.PythonMill;
import de.se_rwth.commons.logging.Log;

public class ExpressionsCorrectlyAliased implements PythonASTPythonScriptCoCo {
Contributor

Consistency: all other cocos have postfix Coco

} else {
// whitespace insensitive
return List.of(token);
// always ignore the continueLineToken
Contributor

Please add an explicit, minimal, failing test to show this is necessary.
The WhitespaceSensitiveProcessor should already handle this.

This token deletion seems weird, since the next token needs to be handled differently based on this token.
