|
4 | 4 |
|
5 | 5 | The first version looked at a window of tokens and rule-context information to predict four things: newline injection, whitespace injection, alignment, and indentation. It works surprisingly well, but it makes no guarantee that things will line up properly. For example, we had to add a new feature so that `{...}` kept both curly braces on the same line when the statements in the block were on the same line. |
6 | 6 |
|
7 | | -The new approach is to look at subtree structures and identify the most common patterns. |
| 7 | +The new approach is to look at subtree structures and identify the most common patterns. The idea is to figure out what whitespace, if any, to inject between siblings of a parse subtree. As we walk down the tree, we track `currentColumn` information. We process each subtree only after processing all of its children, so that we know whether any child list got broken across lines. |
8 | 8 |
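
As a rough illustration of that post-order walk, here is a minimal sketch. The `Node` and `SubtreeWalkSketch` classes, the `ws` field (whitespace in front of a token), and the `splitAcrossLines` flag are all hypothetical names invented for this example; they are not the project's actual data structures. The sketch assumes every rule node has at least one child.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parse-tree node for illustration; not the project's real class.
final class Node {
    final String name;                  // rule name, or token text for a leaf
    final String ws;                    // leaf only: whitespace preceding the token
    final List<Node> children = new ArrayList<>();
    boolean splitAcrossLines;           // filled in once all children are processed

    private Node(String name, String ws) { this.name = name; this.ws = ws; }

    static Node leaf(String ws, String text) { return new Node(text, ws); }

    static Node rule(String name, Node... kids) {
        Node n = new Node(name, "");
        n.children.addAll(Arrays.asList(kids));
        return n;
    }

    boolean isLeaf() { return children.isEmpty(); }

    // True if the first token of this subtree is preceded by a newline.
    boolean startsOnNewLine() {
        Node t = this;
        while (!t.isLeaf()) t = t.children.get(0);
        return t.ws.indexOf('\n') >= 0;
    }
}

final class SubtreeWalkSketch {
    int currentColumn;                  // column just past the last token seen

    SubtreeWalkSketch(int startColumn) { currentColumn = startColumn; }

    void walk(Node t) {
        if (t.isLeaf()) {
            // Advance the column over the leading whitespace, then the token.
            for (char c : t.ws.toCharArray()) {
                currentColumn = (c == '\n') ? 0 : currentColumn + 1;
            }
            currentColumn += t.name.length();
            return;
        }
        boolean split = false;
        for (int i = 0; i < t.children.size(); i++) {
            Node child = t.children.get(i);
            walk(child);                // children first (post-order)
            // A newline before the first child separates this subtree from
            // its left neighbor; it does not split the subtree itself.
            if (child.splitAcrossLines || (i > 0 && child.startsOnNewLine())) {
                split = true;
            }
        }
        t.splitAcrossLines = split;     // known only after all children are done
    }

    // Tree for "(int i,\n       int j)" with the second parameter on its own line.
    static Node exampleSplitParameters() {
        Node list = Node.rule("formalParameterList",
            Node.rule("formalParameter", Node.leaf("", "int"), Node.leaf(" ", "i")),
            Node.leaf("", ","),
            Node.rule("formalParameter",
                Node.leaf("\n" + " ".repeat(7), "int"), Node.leaf(" ", "j")));
        return Node.rule("formalParameters",
            Node.leaf("", "("), list, Node.leaf("", ")"));
    }
}
```

Walking `exampleSplitParameters()` with a start column of 6 (just past `void f`) marks both `formalParameterList` and `formalParameters` as split, while each individual `formalParameter` stays unsplit. That flag is the kind of information a parent would need before deciding on whitespace around its children.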
|
9 | 9 | ## Token dependencies |
10 | 10 |
|
11 | | -Here is an example where we want the `}` to line up with the `void`, but those tokens are in a subtree. On the other hand, we can always ask whether or not the last token for a subtree, `}` here, aligns with another token. |
| 11 | +Here is a simple Java method definition: |
12 | 12 |
|
13 | 13 | ```java |
14 | 14 | void f(int i, int j) { |
15 | 15 | } |
16 | 16 | ``` |
17 | 17 |
|
18 | | -<img src="images/method-def.png" width=400> |
| 18 | +and associated parse tree: |
| 19 | + |
| 20 | +<img src="images/method-def.png" width=500> |
| 21 | + |
| 22 | +The parentheses around the parameter list are codependent: the decisions to inject whitespace after the `(` and before the `)` often depend on whether the `formalParameterList` child gets split across multiple lines. In this case, the parameters are all on a single line, so we might train on: |
| 23 | + |
| 24 | +| Features | Prediction | |
| 25 | +| ------------- |:-------------:| |
| 26 | +| (root=formalParameters,`(`, formalParameterList, formalParameterList-same-line) | none | |
| 27 | +| (root=formalParameters, formalParameterList, formalParameterList-same-line, `)`) | none | |
| 28 | + |
| 29 | +Deciding to split the parameters across lines would not by itself force whitespace after the `(` or before the `)`; e.g., |
| 30 | + |
| 31 | +```java |
| 32 | +void f(int i, |
| 33 | + int j) { |
| 34 | +} |
| 35 | +``` |
| 36 | + |
| 37 | +But we might also see examples like this: |
| 38 | + |
| 39 | +```java |
| 40 | +void f( |
| 41 | + int i, |
| 42 | + int j |
| 43 | +) { |
| 44 | +} |
| 45 | +``` |
| 46 | + |
| 47 | +| Features | Prediction | |
| 48 | +| ------------- |:-------------:| |
| 49 | +| (root=formalParameters,`(`, formalParameterList, formalParameterList-split-lines) | inject \n, indent | |
| 50 | +| (root=formalParameters, formalParameterList, formalParameterList-split-lines, `)`) | inject \n, no indent | |
| 51 | + |
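
One way to picture how feature rows like those in the tables above could drive a prediction is a plain frequency table that returns the most common outcome seen for a feature tuple. This is only a sketch: `PatternTableSketch`, `train`, and `predict` are hypothetical names, and a simple count is an assumption standing in for whatever model the tool actually uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical frequency table: feature tuple -> (prediction -> count).
final class PatternTableSketch {
    private final Map<List<String>, Map<String, Integer>> counts = new HashMap<>();

    // Record one observed (features, prediction) pair from the training corpus.
    void train(List<String> features, String prediction) {
        counts.computeIfAbsent(features, k -> new HashMap<>())
              .merge(prediction, 1, Integer::sum);
    }

    // Return the most frequently observed prediction, or null if the
    // feature tuple was never seen. Ties are broken arbitrarily.
    String predict(List<String> features) {
        Map<String, Integer> seen = counts.get(features);
        if (seen == null) return null;
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : seen.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }
}
```

For instance, after training on several observations of `List.of("root=formalParameters", "(", "formalParameterList", "formalParameterList-split-lines")`, calling `predict` with the same list returns whichever whitespace decision was observed most often for that pattern.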
| 52 | +When processing the `formalParameterList` child, we decide only whether to align; we do not decide whether to indent. That decision belongs to `formalParameters`. We treat the output of `formalParameterList` almost like one big character. |
| 53 | + |
| 54 | +This implies that `formalParameterList` does not know its precise starting column. We would feed it the column of the `(` so it can make a decision, but the parent might then indent it. Consequently, we don't get a string back from that child but rather a `V` operator (BOX terminology for vertical alignment) applied to it. |
| 55 | + |
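
To make the "operator rather than string" idea concrete, here is a minimal sketch in which the child hands back its rows and the parent picks the column later. `VBoxSketch` and `renderAt` are hypothetical names for illustration and are not taken from any BOX implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vertical box: rows that must start in the same column,
// with the column itself chosen later by the parent.
final class VBoxSketch {
    private final List<String> rows;

    VBoxSketch(List<String> rows) { this.rows = new ArrayList<>(rows); }

    // Render the rows assuming the first row begins at `column` on the
    // current line; later rows are padded so they line up under it.
    String renderAt(int column) {
        String pad = " ".repeat(column);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) out.append('\n').append(pad);
            out.append(rows.get(i));
        }
        return out.toString();
    }
}
```

With rows `"int i,"` and `"int j"`, the parent can call `renderAt(7)` to keep the parameters right after `void f(`, or emit a newline plus four spaces and call `renderAt(4)` to get the indented layout, without the child having to be processed again.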
| 56 | +old stuff: |
| 57 | + |
| 58 | +Here is an example where we want the `}` to line up with the `void`, but those tokens are in a subtree. On the other hand, we can always ask whether or not the last token for a subtree, `}` here, aligns with another token. |
| 59 | + |
19 | 60 |
|
20 | 61 | | Features | Prediction | |
21 | 62 | | ------------- |:-------------:| |
@@ -83,7 +124,7 @@ For Java, such as: |
83 | 124 | } |
84 | 125 | ``` |
85 | 126 |
|
86 | | -<img src="images/method-body.png" width=400> |
| 127 | +<img src="images/method-body.png" width=500> |
87 | 128 |
|
88 | 129 | We get: |
89 | 130 |
|
|