@@ -4,22 +4,23 @@ slug: updating-a-grammar
How to upgrade the grammar for a language
==

- Like for adding a language, most of these instructions happen in `ocaml-tree-sitter`.
+ Like for adding a language, most of these instructions happen in
+ [ocaml-tree-sitter-semgrep](https://github.com/semgrep/ocaml-tree-sitter-semgrep).

- Let's call our language "X".
+ Let's assume we are upgrading the grammar for the programming language `$PL`.
+ (Consider adding an environment variable to your shell to make copying some of the commands below easier).

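One way to follow the suggestion above is to set the variable once and let the shell expand it in later commands. This is a minimal sketch, assuming the language being upgraded is Ruby; the value `ruby` and the helper variable names are illustrative and not part of the repository.

```shell
# Assumption: we are upgrading the Ruby grammar; substitute your language.
export PL=ruby

# Names used throughout this guide, derived from $PL.
# (These variable names are hypothetical conveniences.)
grammar_submodule="lang/semgrep-grammars/src/tree-sitter-$PL"
extension_dir="lang/semgrep-grammars/src/semgrep-$PL"
branch="upgrade-$PL"

printf '%s\n' "$grammar_submodule" "$extension_dir" "$branch"
```

With `PL` exported, commands such as `./test-lang $PL` can be pasted as written.
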
Summary (ocaml-tree-sitter)
--

In ocaml-tree-sitter:
- 1. Update submodule tree-sitter-X.
- 2. From `lang/`, run `./test-lang X`.
- 3. From `lang/`, ask a Semgrep team developer to run `./release X`.
+ 1. Update submodule `tree-sitter-$PL`.
+ 2. From `lang/`, run `./test-lang $PL`.
+ 3. From `lang/`, ask a Semgrep team developer to run `./release $PL`.

In semgrep:
- 1. In the semgrep repo, update submodule semgrep-X.
- 2. In the semgrep repo, update the OCaml code that maps the CST to the
- generic AST.
+ 1. In the semgrep repo, update submodule `semgrep-$PL`.
+ 2. In the semgrep repo, update the OCaml code that maps the CST to the generic AST.

In the end, **make sure the generated code used by the main branch of
semgrep can be regenerated** from the main branch of ocaml-tree-sitter:
@@ -36,29 +37,29 @@ Here are the main components:
[ocaml-tree-sitter](https://github.com/semgrep/ocaml-tree-sitter-semgrep):
generates OCaml parsing code from tree-sitter grammars extended
with `...` and such. Publishes code into the git repos of the
- form `semgrep-X`.
- * the original tree-sitter grammar `tree-sitter-X` e.g.,
+ form `semgrep-$PL`.
+ * the original tree-sitter grammar `tree-sitter-$PL` e.g.,
[tree-sitter-ruby](https://github.com/tree-sitter/tree-sitter-ruby):
the original tree-sitter grammar for the language.
- This is the git submodule `lang/semgrep-grammars/src/tree-sitter-X`
+ This is the git submodule `lang/semgrep-grammars/src/tree-sitter-$PL`
in ocaml-tree-sitter. It is installed at the project's root
in `node_modules` by invoking `npm install`.
* syntax extensions to support semgrep patterns, such as ellipses
(`...`) and metavariables (`$FOO`).
- This is `lang/semgrep-grammars/src/semgrep-X`. It can be tested from
+ This is `lang/semgrep-grammars/src/semgrep-$PL`. It can be tested from
that folder with `make && make test`.
- * an automatically-modified grammar for language X in `lang/X`.
+ * an automatically-modified grammar for language `$PL` in `lang/$PL`.
It is modified so as to accommodate various requirements of the
- ocaml-tree-sitter code generator. `lang/X/src` and
- `lang/X/ocaml-src` contain the C/C++/OCaml code that will published
- into semgrep-X e.g.
+ ocaml-tree-sitter code generator. `lang/$PL/src` and
+ `lang/$PL/ocaml-src` contain the C/C++/OCaml code that will be published
+ into `semgrep-$PL` e.g.
[semgrep-ruby](https://github.com/semgrep/semgrep-ruby)
and used by semgrep.
- * [semgrep-X](https://github.com/semgrep/semgrep-ruby):
+ * [semgrep-$PL](https://github.com/semgrep/semgrep-ruby):
provides generated OCaml/C parsers as a dune project. Is a submodule
of semgrep.
* [semgrep](https://github.com/semgrep/semgrep): uses the parsers
- provided by semgrep-X, which produce a CST. The
+ provided by `semgrep-$PL`, which produce a CST. The
program's CST or pattern's CST is further transformed into an AST
suitable for pattern matching.

@@ -71,27 +72,27 @@ Before upgrading

Make sure the `grammar.js` file or equivalent source files
defining the grammar are included in the `fyi.list` file in
- `ocaml-tree-sitter/lang/X`.
+ `ocaml-tree-sitter/lang/$PL`.

Why: It is important for tracking and _understanding_ the changes made at the
source.

How: See [How to add support for a new language](adding-a-language.md).

- Upgrade the tree-sitter-X submodule
+ Upgrade the tree-sitter-$PL submodule
--

- Say you want to upgrade (or downgrade) tree-sitter-X from some old
+ Say you want to upgrade (or downgrade) `tree-sitter-$PL` from some old
commit to commit `602f12b`. This uses the git submodule way, without
anything weird. The commands might be something like this:

```
git submodule update --init --recursive --depth 1
- git checkout -b upgrade-X
- cd lang/semgrep-grammars/src/tree-sitter-X
- git fetch origin --unshallow
- git checkout 602f12b
- cd ..
+ git checkout -b upgrade-$PL
+ cd lang/semgrep-grammars/src/tree-sitter-$PL
+ git fetch origin --unshallow
+ git checkout 602f12b
+ cd ..
```

Testing
@@ -112,7 +113,7 @@ commands will build and test the language:

```
cd lang
- ./test-lang X
+ ./test-lang $PL
```

::: caution
@@ -122,7 +123,7 @@ correspond to [missing tokens](https://github.com/tree-sitter/tree-sitter/issues

Check with:
```
- grep Blank lang/X/ocaml-src/lib/CST.ml
+ grep Blank lang/$PL/ocaml-src/lib/CST.ml
```
If anything comes up, you must modify the grammar so as to create
a named rule for the node of the `Blank` kind. Eventually, the generated
@@ -131,19 +132,19 @@ Where a `Blank` node exists, we won't be able to get a token or its location
at parsing time.

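The `grep Blank` check can also be wrapped so that it fails loudly instead of relying on reading the output. This is a minimal sketch; the function name `check_no_blank` is hypothetical and not provided by ocaml-tree-sitter.

```shell
# Hypothetical helper (not part of ocaml-tree-sitter): succeed only if the
# generated CST file exists and contains no occurrence of "Blank".
check_no_blank() {
  cst_file=$1
  if [ ! -f "$cst_file" ]; then
    echo "Missing file: $cst_file" >&2
    return 2
  fi
  # grep -n prints the matching lines with their line numbers.
  if grep -n Blank "$cst_file"; then
    echo "Blank nodes found in $cst_file: add named rules to the grammar" >&2
    return 1
  fi
  echo "No Blank nodes in $cst_file"
}

# Usage, from the root of ocaml-tree-sitter with PL set in the shell:
#   check_no_blank "lang/$PL/ocaml-src/lib/CST.ml"
```
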
If this works, we're all set. Commit the new commit for the
- tree-sitter-X submodule:
+ `tree-sitter-$PL` submodule:
```
git status
- git commit semgrep-languages/semgrep-X
- git push origin upgrade-X
+ git commit semgrep-languages/semgrep-$PL
+ git push origin upgrade-$PL
```

Then make a pull request to merge this into ocaml-tree-sitter's
main branch. It's ok to merge at this point, even if the generated code
hasn't been exported (**Publishing** section below) or if you haven't
done the necessary changes in semgrep (**Semgrep integration** below).

- We can now consider publishing the code to semgrep-X.
+ We can now consider publishing the code to `semgrep-$PL`.

Publishing
--
@@ -153,18 +154,18 @@ _Please [ask someone at Semgrep, Inc. to run this step](https://github.com/semgr
From the `lang` folder of ocaml-tree-sitter, we'll perform the
release. This step redoes some of the work that was done earlier and
checks that everything is clean before committing and pushing the
- changes to semgrep-X.
+ changes to semgrep-$PL.

```
cd lang
- ./release --dry-run X  # dry-run release
- ...  # 'git status' will show changes for language X
- ./release X  # commits and pushes to semgrep-X
+ ./release --dry-run $PL  # dry-run release
+ ...  # 'git status' will show changes for language $PL
+ ./release $PL  # commits and pushes to semgrep-$PL
```

This step is safe. Semgrep at this point is unaffected by those
changes. There is now a new commit at
- `https://github.com/semgrep/semgrep-X` e.g.
+ `https://github.com/semgrep/semgrep-$PL` e.g.
https://github.com/semgrep/semgrep-javascript.
The [`fyi/` folder](https://github.com/semgrep/semgrep-javascript/tree/main/fyi)
contains original files from which the code was generated.
@@ -175,10 +176,10 @@ got the correct version of `grammar.js` or some other source file.
Semgrep integration
--

- From the semgrep repository, point the submodule for semgrep-X to the
+ From the semgrep repository, point the submodule for `semgrep-$PL` to the
latest commit from the "Publishing" step. Then rebuild semgrep-core,
which will normally fail if the grammar changed. If the source
- `grammar.js` was included in the `fyi` folder for `semgrep-X` (as it
+ `grammar.js` was included in the `fyi` folder for `semgrep-$PL` (as it
should), `git diff HEAD^` should help figure out the changes since the
last version.
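
To narrow that diff to the grammar sources, the path can be restricted to the `fyi` folder. This is a minimal sketch, assuming the location of the `semgrep-$PL` checkout is passed as an argument; the helper name and the example path are hypothetical, so adjust them to your semgrep checkout.

```shell
# Hypothetical helper: show what changed under fyi/ since the previous
# commit of a semgrep-$PL checkout.
show_fyi_changes() {
  submodule_dir=$1
  # -C runs git inside the given directory; "-- fyi" limits the diff
  # to files under the fyi folder.
  git -C "$submodule_dir" diff HEAD^ -- fyi
}

# Usage (the path is an assumption about your checkout layout):
#   show_fyi_changes "path/to/semgrep-$PL"
```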