Supporting more python#1
Token and NumericLiteral for imaginary numbers in python.
nonlocal statements yield from statements
Add support for **=,@=,//= assignmentexpressions.
Global declarations Removing support for AnnotedAssignments apart from "="
Avoiding Nullpointer exceptions, After parsing a ExceptStatement a NullPointerException could happen during prettyprinting.
Always ignore continueLineTokens
Improving ExceptStatements and allowing the passing of classes
Improving ExceptStatements and allowing unsigned imaginary numbers
Generics and Typehints
Minor fixes, Additional Match/Case combinations allowed Parenthesis for Expressions added
Minor fixes of previous mistake Added CoCo for aliased expressions
…urly brackets. Overriding of the BooleanLiteral to allow "true" and "false" as names. Allowing various Statements in classes. Except statements now span a scope to capture the names of aliased qualified names. Aliased expression now contain a symbol to be identified. Casestatements now span a scope to capture the symbol of aliased expressions. Formatting and comments for the Python.mc4.
Removing test that became wrong
| @@ -0,0 +1,247 @@ | |||
| # PullRequest to incorporate new Python functionality. | |||
Please remove this file; it should only be the description of the PR.
| PythonScript = Statement*; | ||
|
|
||
| /*====================================== Tokens ======================================*/ | ||
| @Override |
This PR changes the formatting of many otherwise unchanged lines, which adds noise and makes it very hard to review.
Also, the resulting formatting is inconsistent and does not conform to our formatting rules (new blocks are indented with 2 spaces).
Please fix this so that the git blame history is preserved as far as possible.
| fragment token StringFDQCharactersPython = (StringFDQCharacterPython)+; | ||
| fragment token StringFSQCharacterPython = ~ ('\''| '\\'| '{' )| PythonEscapeSequence |InnerFString; | ||
| fragment token StringFDQCharacterPython = ~ ('"'| '\\'| '{') | PythonEscapeSequence |InnerFString; | ||
| fragment token InnerFString = '{' ~('}')* '}'; |
print(f"{ {'a': 1, 'b': 2} }") is valid python
Parsing this on the token level is hard, since expressions are allowed inside {...}.
We might need a second pass (after the parse trafo) to handle these properly, since a regex cannot count matching opening and closing brackets.
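To illustrate why counting is needed, here is a minimal Python sketch of such a second pass. It is not part of this PR; the function name `replacement_fields` is hypothetical, and the sketch deliberately ignores braces inside string literals within a field as well as format specs.

```python
def replacement_fields(body: str) -> list[str]:
    """Return the top-level {...} fields of an f-string body (outer quotes removed)."""
    fields = []
    depth = 0
    start = 0
    i = 0
    while i < len(body):
        # Outside of a field, "{{" and "}}" are escaped braces, i.e. plain text.
        if depth == 0 and body[i:i + 2] in ("{{", "}}"):
            i += 2
            continue
        ch = body[i]
        if ch == "{":
            if depth == 0:
                start = i + 1  # first character of the field content
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                fields.append(body[start:i])
        i += 1
    return fields


nested = replacement_fields("{ {'a': 1, 'b': 2} }")
mixed = replacement_fields("a{x}b{{c}}{y!r}")
```

A single regular expression such as `'{' ~('}')* '}'` stops at the first `}` and therefore cuts the nested dictionary short, whereas the depth counter keeps the field whole.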
| token PyFloat = DigitsPart? '.' DigitsPart | DigitsPart '.'; | ||
| PyFloatLiteral implements NumericLiteral <200> = PyFloat; | ||
|
|
||
| token PyComplexNumber = (DigitsPart | PyFloat)? ('+' | '-')? DigitsPart "j"; |
As I understand token priority, "1.5" in "1.5+0.8j" will never be parsed by this token definition, since "(DigitsPart | PyFloat)?" is optional and PyFloat and the other number literals are defined with a higher priority (higher up in the file).
I suggest parsing PyComplexNumber = DigitsPart{}"j" and letting the other parts (1.5, +) be parsed by other rules.
Maybe a {noSpace()}? is needed before "j".
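For comparison, CPython's own tokenizer already splits such input this way. The following sketch uses the standard `tokenize` module; the helper name `number_and_op_tokens` is only chosen for illustration.

```python
import io
import tokenize


def number_and_op_tokens(source: str) -> list[tuple[str, str]]:
    """List the NUMBER and OP tokens CPython produces for one line of source."""
    tokens = tokenize.generate_tokens(io.StringIO(source + "\n").readline)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokens
        if tok.type in (tokenize.NUMBER, tokenize.OP)
    ]


complex_sum = number_and_op_tokens("1.5+0.8j")
imaginary_only = number_and_op_tokens("2j")
```

Here `1.5+0.8j` yields three tokens (a number, an operator, and an imaginary number), so only the part ending in `j` needs its own token.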
| StringLiteralPython implements Literal, SignedLiteral = | ||
| ( | ||
| StringModifier? | ||
| (source:StringPython | source:String |source:Char) |
Char seems weird here. Please add a comment in the grammar explaining why it is needed,
or remove it if it is not needed.
| ModuleWithOptionalAlias = name:PyQualifiedName ("as" alias:Name)?; | ||
|
|
||
| //Conditional statements | ||
| IfStatement implements Statement,ClassStatement = "if" condition:Expression ":" thenStatement:StatementBlock |
Are there statements that are not allowed inside a class block?
Otherwise, a rule like this might be easier to maintain:
StatementInClassDef implements ClassStatement = Statement;
Then IfStatement, WhileStatement, ... would not need to implement ClassStatement.
| scope CaseStatement = key("case") (Expression || "|")+ ("as" Alias)? ("if" condition:Expression)? ":" CaseStatementBlock; | ||
| CaseStatementBlock = BLOCK_START Statement* BLOCK_END | Statement; | ||
|
|
||
| //Another Statements |
|
|
||
| GeneratorFilter = "if" condition:Expression; | ||
| //Aliased Expression only allowed in Case Statements (Checked with CoCo). | ||
| AliasedExpression implements Expression = Expression "as" alias:Alias; |
CaseStatement already contains "as"; is this a duplicate?
| import de.monticore.python.PythonMill; | ||
| import de.se_rwth.commons.logging.Log; | ||
|
|
||
| public class ExpressionsCorrectlyAliased implements PythonASTPythonScriptCoCo { |
Consistency: all other cocos have postfix Coco
| } else { | ||
| // whitespace insensitive | ||
| return List.of(token); | ||
| // always ignore the continueLineToken |
Please add an explicit, minimal, failing test to show that this is necessary.
The WhitespaceSensitiveProcessor should already handle this.
Deleting this token seems weird, since the next token needs to be handled differently depending on it.
PullRequest to incorporate new Python functionality.
This Markdown document presents all changes and the reasoning behind them.
(It is also available as an .md file in the repository.)
For what is known to be unsupported, see knownToBeUnsupported.md.
Grammar
Rework of For Decomposition
Python allows further kinds of targets inside for constructs, so we change our ForControl nonterminal to be closer to the Python language.
Old:
New:
Python: ("[...]" Marks optional)
Sources: https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-target_list and https://docs.python.org/3/reference/expressions.html (6.2.5)
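As an illustration of the target forms that Python's target_list permits in a for loop, the following hedged sketch uses made-up names (`pairs`, `Box`, `slots`); it is not taken from the test suite of this PR.

```python
# Nested tuple targets
pairs = [(1, (2, 3)), (4, (5, 6))]
sums = []
for a, (b, c) in pairs:
    sums.append(a + b + c)


# Attribute reference as target
class Box:
    pass


box = Box()
for box.value in range(3):
    pass

# Subscription as target
slots = {}
for slots["last"] in "xyz":
    pass

# Starred target
heads = []
for first, *rest in [[1, 2, 3], [4]]:
    heads.append((first, rest))
```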
Rework of Boolean
Instead of defining a new BooleanLiteralPython, we override the MontiCore BooleanLiteral to allow 'true' and 'false' as names; tests have shown that they can be used as such.
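A short Python check of this behaviour (the variable values are arbitrary): only the capitalised `True` and `False` are keywords, so the lowercase spellings are ordinary identifiers.

```python
import keyword

# Lowercase spellings are plain names and can be assigned to.
true = "yes"
false = 0

lowercase_is_keyword = keyword.iskeyword("true") or keyword.iskeyword("false")
capitalised_is_keyword = keyword.iskeyword("True") and keyword.iskeyword("False")
```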
Rework of f-strings
It proved difficult to support all variants of f-strings. So far they were parsed as a single string token, so the content of curly brackets was treated as text; yet tests have shown that line breaks are allowed within curly brackets. Another issue is that the same quotes (" or ') may be used in the outer string and inside the curly brackets. Finally, it is possible to add f to chars in Python, such as
f'a'
This was an issue so far, as only strings could carry a modifier and chars could not. With the following changes we aim to combat these issues.
Addition of complex numbers.
Python supports complex numbers; they are formatted as:
(real number) (+|-) (imaginaryNumber)'j'
A token PyComplexNumber was added to support these numbers, along with a matching literal for their use within Python code.
Source for cmath: https://docs.python.org/3/library/cmath.html (Experiments have shown that the real number and the sign of the imaginary part can be left out.)
Addition of generics.
We added support for generics as follows:
This is done similarly to https://docs.python.org/3/reference/compound_stmts.html#type-params,
but differs in that we use the existing TypeAnnotation nonterminal. To support the functionality described in the source above, we added "GenericsAnnotation?" to:
and the TypeAnnotations
GenericTypeAnnotation implements TypeAnnotation = TypeAnnotation GenericsAnnotation
Statements
ClassStatements
The Import, If, Assert, For, While, ConditionalExecution, With, GlobalVariableDeclaration, NonLocalVariableDeclaration, MultiVariableDeclaration, ParenMultiVariableDeclaration, Raise, TypeRuleStatement and Delete statements now implement the ClassStatement interface. Experiments and tests showed that they are allowed inside classes.
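The following sketch shows a few of these statements directly in a class body. The class `Config` and its members are hypothetical and only serve as a small runnable example.

```python
class Config:
    # Import statement inside a class body
    import os.path as path_module

    debug = False

    # If statement inside a class body
    if debug:
        level = "verbose"
    else:
        level = "quiet"

    # For statement inside a class body
    squares = []
    for n in range(3):
        squares.append(n * n)

    # Assert and delete statements inside a class body
    assert level in ("verbose", "quiet")
    del n
```

The loop variable would otherwise remain a class attribute, which is why the example deletes it.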
New Statements
Nonlocal
For a detailed description of the nonlocal keyword, see section 9.2 of https://docs.python.org/3/tutorial/classes.html. In summary, nonlocal allows a function or class defined inside another function to rebind the enclosing function's variables. Without the nonlocal keyword, these variables are read-only for nested functions and classes. Our implementation is similar to Python's: https://docs.python.org/3/reference/simple_stmts.html#nonlocal.
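A minimal Python example of the statement with more than one name (the names `make_counter`, `count` and `total` are illustrative):

```python
def make_counter():
    count = 0
    total = 0

    def add(amount):
        # Without this declaration, the assignments below would create new
        # local variables instead of rebinding the enclosing ones.
        nonlocal count, total
        count += 1
        total += amount
        return count, total

    return add


add = make_counter()
add(5)
result = add(7)
```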
Python's:
nonlocal_stmt: "nonlocal" identifier ("," identifier)*
(see source above)
Ours:
NonLocalVariableDeclaration implements Statement,ClassStatement = "nonlocal" names:(Name || ",")+ STATEMENT_END;
Yield from
Yield from statements of the form 'yield' 'from' expression were introduced in https://peps.python.org/pep-0380/. We support them similarly to Python: https://docs.python.org/3/reference/expressions.html#grammar-token-python-grammar-yield_expression.
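For reference, a small Python sketch of both uses of yield from, as a plain statement and as an expression whose value is the return value of the delegated generator. All function names are made up for this example.

```python
def flatten(*iterables):
    for iterable in iterables:
        yield from iterable  # statement form


def inner():
    yield "a"
    return "inner result"


def outer():
    # Expression form: evaluates to the value returned by inner().
    result = yield from inner()
    yield result


flat = list(flatten([1, 2], (3,), "xy"))
delegated = list(outer())
```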
Python's:
yield_from: "yield" "from" expression
(see source above)
Ours:
YieldFromStatement implements Statement = "yield" "from" Expression STATEMENT_END;
Type Declaration Statement
To allow type hints such as
var : int
without an assignment (in contrast to annotated assignments with a value), we add the new TypeDeclarationStatement. Python incorporates this functionality in its annotated assignment statements, yet this caused issues for us. See https://docs.python.org/3/reference/simple_stmts.html#index-15.
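How CPython itself represents such a declaration can be inspected with the standard `ast` module. This is only a sketch for orientation, not a statement about the MontiCore AST.

```python
import ast

# A bare annotation parses as an annotated assignment without a value.
stmt = ast.parse("var: int").body[0]

is_annotated_assignment = isinstance(stmt, ast.AnnAssign)
target_name = stmt.target.id
has_value = stmt.value is not None
```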
Type Aliases
Python allows declaring type aliases, where a type is aliased under an identifier; see https://docs.python.org/3/reference/simple_stmts.html#grammar-token-python-grammar-type_stmt.
We implement the support similarly and additionally define the type alias as a symbol.
Python:
type_stmt: 'type' identifier [type_params] "=" expression
(see source above)
Ours:
symbol TypeRuleStatement implements Statement,ClassStatement = key("type") Name GenericsAnnotation? "=" Expression STATEMENT_END;
'type' needs to be declared as a local keyword, as tests have shown that it can be used as a variable name. Python also specifies this: https://docs.python.org/3/reference/lexical_analysis.html#soft-keywords
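The soft-keyword behaviour can be checked with a few lines of Python. The function name is hypothetical; the `type X = ...` statement itself requires Python 3.12 or newer and is therefore only mentioned in a comment.

```python
import keyword


def uses_type_as_name():
    # 'type' is a soft keyword: outside of a 'type X = ...' statement
    # (Python 3.12+) it is an ordinary identifier. This assignment shadows
    # the built-in only inside this function.
    type = "alias"
    return type


is_hard_keyword = keyword.iskeyword("type")
returned = uses_type_as_name()
```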
Changes
Global
So far we only allowed a single variable to be declared as global in a global statement,
but Python allows multiple variables, separated by commas, to be declared global in the same statement; see https://docs.python.org/3/reference/simple_stmts.html#global. We also allowed a type annotation within a global statement, which Python does not permit, so this support is removed.
Additionally, global can be used in classes, as described in the source above.
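A small Python sketch of both points, with invented names (`hits`, `misses`, `Tracker`):

```python
hits = 0
misses = 0


def record(found):
    # Several names in a single global statement
    global hits, misses
    if found:
        hits += 1
    else:
        misses += 1


class Tracker:
    # A global statement directly inside a class body: the assignment
    # below binds a module-level name, not a class attribute.
    global tracker_count
    tracker_count = 1


record(True)
record(False)
record(True)
```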
Old:
GlobalVariableDeclaration implements Statement = "global" Name (":" TypeAnnotation)? STATEMENT_END;
New:
GlobalVariableDeclaration implements Statement,ClassStatement = "global" (Name || ",")+ STATEMENT_END;
Class Statement Block
A ClassStatementBlock is now allowed to be a singular ClassStatement to allow classes such as:
Old:
ClassStatementBlock = BLOCK_START ClassStatementBlockBody BLOCK_END;
New:
ClassStatementBlock = (BLOCK_START ClassStatementBlockBody BLOCK_END)|ClassStatement;
Tests have shown that this is needed.
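As a hypothetical illustration (not the example from the original description), these are classes whose body is a single statement on the same line as the header:

```python
# The class body is one simple statement without an indented block.
class Empty: pass


class Point: x = 0
```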
Match Statement
Case statements in Python allow patterns to be aliased (for example, case Type as t:); see https://docs.python.org/3/reference/compound_stmts.html#the-match-statement and https://docs.python.org/3/reference/compound_stmts.html#grammar-token-python-grammar-patterns. To allow this, case statements now span a scope to capture the alias.
Old:
New:
(Alias is defined as `symbol Alias = Name;`)
Additionally, we allowed match statements inside classes.
Try-Except
PEP 758 added support for leaving out the parentheses. In addition, support for various expressions and starred expressions was missing so far; we added it as specified in https://peps.python.org/pep-0758/. Similarly to the new case statements, except statements now span a scope with the Alias nonterminal.
Old:
New:
Expressions
Changes
Assignment
As defined by https://docs.python.org/3/reference/simple_stmts.html#index-14, Python's assignment statements additionally support "//=", "@=" and "**=". We add this support by overriding the AssignmentExpression from MontiCore and adding the three additional operators to the square brackets.
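The three operators in use, as a minimal Python sketch. The class `Scale` is hypothetical and only exists to give `@=` an operand type that supports matrix multiplication.

```python
class Scale:
    def __init__(self, factor):
        self.factor = factor

    def __matmul__(self, other):
        # '@=' falls back to __matmul__ when __imatmul__ is not defined.
        return Scale(self.factor * other.factor)


n = 17
n //= 5   # floor division: 17 // 5
n **= 2   # power: 3 ** 2

s = Scale(2)
s @= Scale(3)
```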
Annotated Assignment
So far we allowed the same operators as in the assignment expression, yet Python only allows "=" in annotated assignments; see https://docs.python.org/3/reference/simple_stmts.html#index-15.
Therefore we remove the square brackets and only allow "=".
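That Python rejects other operators after an annotation can be checked with the built-in `compile`; the helper `compiles` is introduced here only for illustration.

```python
def compiles(source):
    """Return True if CPython accepts the source, False on a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


plain = compiles("x: int = 1")
bare = compiles("x: int")
augmented = compiles("x: int += 1")
```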
Old:
New:
Comprehensions
We now allow multiple if clauses inside a comprehension by changing ? to * at the generator filters.
Python allows this, as specified here: https://docs.python.org/3/reference/expressions.html#grammar-token-python-grammar-comp_for
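For example, a comprehension with two consecutive filters (the values are arbitrary):

```python
# Both filters must hold; this is equivalent to joining them with 'and'.
multiples_of_six = [n for n in range(20) if n % 2 == 0 if n % 3 == 0]
```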
Other
We added an optional trailing "," to rules where testing showed it to be allowed. We also formatted the grammar and added some comments.
Code
In this section I reference the commits instead of code snippets, for readability.
Preprocessor
Commit: 8c519ef
In lines 75-77 I added that the continue-line token is always ignored, to allow statements like:
as these could otherwise cause issues.
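For orientation, this is how CPython's tokenizer treats an explicit line continuation. The source string is a hypothetical example, not the statement from the commit.

```python
import io
import tokenize

# A backslash at the end of the line joins it with the next physical line.
source = "total = 1 + \\\n    2\n"

kinds = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

namespace = {}
exec(source, namespace)
```

The continuation produces no token of its own, and the indentation of the second physical line does not open a new block.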
Visitor and CoCos related to aliased expressions.
Marc suggested adding a CoCo and a visitor to prevent aliased expressions from being used outside match blocks; I implemented this in the following commit.
Commit: ba9e8f1
Adaptation of new tests
We removed a test for invalid classes that are now valid (because for loops are now allowed as class statements).
We added new test cases for Python. To find them, we checked what the Python parser tests cover that our tests do not.
See: https://github.com/python/cpython/blob/3.9/Lib/test/test_parser.py and commit 7865991