Commit 195b071

Replace X placeholder with environment variable (#1912)
The contributing docs that explain how to add and update a programming language use a placeholder `X` to refer to the name of the language. This commit updates the docs to use `$PL` as the placeholder instead. Using `$PL` makes it easier to copy and run the code samples in the documentation, and `$PL` is more descriptive than simply `$X`.
1 parent a524447 commit 195b071
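The copy-and-paste benefit described in the commit message can be shown with a short shell sketch. It assumes a POSIX-compatible shell. `ruby` is only an example value, and the `echo` lines stand in for the real commands in the docs, such as `./test-lang $PL` or `./release --dry-run $PL`.

```shell
# Set the placeholder once per shell session ("ruby" is only an example value).
export PL=ruby

# Paths and commands copied from the docs now expand to the chosen language.
echo "lang/$PL"                                           # prints: lang/ruby
echo "semgrep-grammars/src/tree-sitter-$PL/grammar.js"    # prints: semgrep-grammars/src/tree-sitter-ruby/grammar.js
```

If `PL` is unset, `$PL` silently expands to an empty string, so `lang/$PL` becomes `lang/`. Writing `${PL:?}` instead, as in `./test-lang "${PL:?}"`, makes the shell report an error rather than run the command with an empty language name.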

2 files changed (+61 -59 lines)

docs/contributing/adding-a-language.md

+21-20
@@ -17,7 +17,7 @@ Otherwise, let's get started.
 Repositories involved directly:
 * [semgrep](https://github.com/semgrep/semgrep): the semgrep command line program;
 * [ocaml-tree-sitter-semgrep](https://github.com/semgrep/ocaml-tree-sitter-semgrep): language-specific setup, generates C/OCaml parsers for semgrep;
-* new repo semgrep-_X_ for the new language _X_: C/OCaml parser generated from ocaml-tree-sitter-semgrep by an admin.
+* A new repo `semgrep-$PL` for the language `$PL`: C/OCaml parser generated from ocaml-tree-sitter-semgrep by an admin.
 
 Submodules overview (semgrep repo)
 --
@@ -36,7 +36,8 @@ repository](https://github.com/semgrep/semgrep):
 └── semgrep-ruby
 ```
 
-When done with the work in [ocaml-tree-sitter-semgrep](https://github.com/semgrep/ocaml-tree-sitter-semgrep), you'll need a new repo semgrep-X to host the generated parser code.
+When done with the work in [ocaml-tree-sitter-semgrep](https://github.com/semgrep/ocaml-tree-sitter-semgrep),
+you'll need a new repo `semgrep-$PL` to host the generated parser code.
 Ask someone from the Semgrep team to create one for you. For this, they should use
 the template
 [semgrep-lang-template](https://github.com/semgrep/semgrep-lang-template)
@@ -70,16 +71,16 @@ what's going on or to set things up manually.
 
 From the ocaml-tree-sitter repo, do the following:
 
-1. Create a `lang/X` folder.
+1. Create a `lang/$PL` folder.
 2. Make a `test/ok` directory. Inside the directory,
 create a simple `hello-world` program for the language you are porting.
 Name the program `hello-world.<ext>`.
 3. Now make a file called `extensions.txt` and input all the language extensions
 (.rb, .kt, etc) for your language in the file.
 4. Create a file called `fyi.list` with all the information files, such as
-`semgrep-grammars/src/tree-sitter-X/LICENSE`,
-`semgrep-grammars/src/tree-sitter-X/grammar.js`,
-`semgrep-grammars/src/semgrep-X/grammar.js`, etc.
+`semgrep-grammars/src/tree-sitter-$PL/LICENSE`,
+`semgrep-grammars/src/tree-sitter-$PL/grammar.js`,
+`semgrep-grammars/src/semgrep-$PL/grammar.js`, etc.
 to bundle with the final OCaml/C project.
 5. Link the Makefile.common to a Makefile in the directory with:
 `ln -s ../Makefile.common Makefile`
@@ -88,7 +89,7 @@ From the ocaml-tree-sitter repo, do the following:
 on which to run parsing stats. Run with the following command:
 `./scripts/most-starred-for-language <lang> <github_username> <api_key>`
 * Using github advanced search to find the most starred or most forked repositories.
-7. Copy the generated `projects.txt` file into the `lang/X` directory.
+7. Copy the generated `projects.txt` file into the `lang/$PL` directory.
 8. Add in extra projects and extra input sets as you see necessary.
 
 Here's the file hierarchy for Ruby:
@@ -116,7 +117,7 @@ then run some tests for the parser. Full instructions for this
 are given in [updating-a-grammar](updating-a-grammar.md) under
 "Testing". The short instructions are:
 1. For the first time, build everything with `./scripts/rebuild-everything`.
-2. Subsequently, work from the `lang/X` folder and run
+2. Subsequently, work from the `lang/$PL` folder and run
 `make` and `make test`.
 
 ### The `fyi.list` file
@@ -154,12 +155,12 @@ the semgrep ellipsis `...` usually needs to be added as well.
 You'll need to learn [how to create tree-sitter
 grammars](https://tree-sitter.github.io/tree-sitter/creating-parsers).
 
-1. Work from `semgrep-grammars/src/semgrep-X` and use `make` and
+1. Work from `semgrep-grammars/src/semgrep-$PL` and use `make` and
 `make test` to build and test.
 2. Add new test cases to `test/corpus/semgrep.text`.
 3. Edit `grammar.js`.
 4. Refer to the original grammar in
-`semgrep-grammars/src/tree-sitter-X` to determine which rules to
+`semgrep-grammars/src/tree-sitter-$PL` to determine which rules to
 extend.
 
 For an example of how to extend a language, you can:
@@ -243,9 +244,9 @@ branch, do the following:
 applicable) look clean and have minimal external dependencies.
 2. In `ocaml-tree-sitter/lang/Makefile`, add language under
 'SUPPORTED_LANGUAGES' and 'STAT_LANGUAGES'.
-3. In `ocaml-tree-sitter/lang` directory, run `./release X --dry-run`.
+3. In `ocaml-tree-sitter/lang` directory, run `./release $PL --dry-run`.
 If this looks good, please [ask someone from the Semgrep team](https://github.com/semgrep/ocaml-tree-sitter-semgrep/blob/main/doc/release.md) to
-publish the code using `./release X`.
+publish the code using `./release $PL`.
 
 ### Troubleshooting
 
@@ -267,7 +268,7 @@ Here are some known types of parsing errors:
 
 * A syntax error. The input program is in the wrong syntax or uses a
 recent feature that's not supported yet: `make test` or directly the
-`parse_X` program will show the tree produced by tree-sitter with
+`parse_$PL` program will show the tree produced by tree-sitter with
 one or more `ERROR` nodes.
 * A "reparsing" error. It's an error generated after the first
 successful parsing pass by the tree-sitter parser, during the
@@ -298,13 +299,13 @@ languages in pfff.
 Look under **Adding a Language** in [pfff](https://github.com/semgrep/pfff/blob/develop/README.md)
 for step-by-step instructions.
 
-## semgrep-core
+### semgrep-core
 
-Now that you have added your new language 'X' to pfff, do the following:
+Now that you have added your new language `$PL` to pfff, do the following:
 1. Add the new pfff submodule to semgrep-core.
-2. In `Check_pattern.ml`, add 'X' to `lang_has_no_dollar_ids`/ If the grammar
+2. In `Check_pattern.ml`, add `$PL` to `lang_has_no_dollar_ids`/ If the grammar
 has no dollar identifiers, add it above 'true'. Otherwise, add it above 'false'.
-3. In `synthesizing/Pretty_print_generic.ml`, add 'X' to the appropriate functions:
+3. In `synthesizing/Pretty_print_generic.ml`, add `$PL` to the appropriate functions:
 * print_bool
 * if_stmt
 * while_stmt
@@ -331,11 +332,11 @@ Now that you have added your new language 'X' to pfff, do the following:
 Parallel.invoke Tree_sitter_X.Parse.file file ()
 )
 ```
-6. In `parsing/tree_sitter/dune`, add `tree-sitter-lang.X`.
-7. Write a basic test case for your language in `tests/X/hello-world.X`. This can
+6. In `parsing/tree_sitter/dune`, add `tree-sitter-lang.$PL`
+7. Write a basic test case for your language in `tests/$PL/hello-world.$PL`. This can
 just be a hello-world function.
 8. Test that the command
-`semgrep-core/bin/semgrep-core -dump_tree_sitter_cst test/X/hello-world`
+`semgrep-core/bin/semgrep-core -dump_tree_sitter_cst test/$PL/hello-world`
 prints out a CST for your language.
 
 ## Legal concerns

docs/contributing/updating-a-grammar.md

+40-39
@@ -4,22 +4,23 @@ slug: updating-a-grammar
 How to upgrade the grammar for a language
 ==
 
-Like for adding a language, most of these instructions happen in `ocaml-tree-sitter`.
+Like for adding a language, most of these instructions happen in
+[ocaml-tree-sitter-semgrep](https://github.com/semgrep/ocaml-tree-sitter-semgrep).
 
-Let's call our language "X".
+Let's assume we are upgrading the grammar for the programming language `$PL`.
+(Consider adding an environment variable to your shell to make copying some of the commands below easier).
 
 Summary (ocaml-tree-sitter)
 --
 
 In ocaml-tree-sitter:
-1. Update submodule tree-sitter-X.
-2. From `lang/`, run `./test-lang X`.
-3. From `lang/`, ask a Semgrep team developer to run `./release X`.
+1. Update submodule `tree-sitter-$PL`.
+2. From `lang/`, run `./test-lang $PL`.
+3. From `lang/`, ask a Semgrep team developer to run `./release $PL`.
 
 In semgrep:
-1. In the semgrep repo, update submodule semgrep-X.
-2. In the semgrep repo, update the OCaml code that maps the CST to the
-generic AST.
+1. In the semgrep repo, update submodule `semgrep-$PL`.
+2. In the semgrep repo, update the OCaml code that maps the CST to the generic AST.
 
 In the end, **make sure the generated code used by the main branch of
 semgrep can be regenerated** from the main branch of ocaml-tree-sitter:
@@ -36,29 +37,29 @@ Here are the main components:
 [ocaml-tree-sitter](https://github.com/semgrep/ocaml-tree-sitter-semgrep):
 generates OCaml parsing code from tree-sitter grammars extended
 with `...` and such. Publishes code into the git repos of the
-form `semgrep-X`.
-* the original tree-sitter grammar `tree-sitter-X` e.g.,
+form `semgrep-$PL`.
+* the original tree-sitter grammar `tree-sitter-$PL` e.g.,
 [tree-sitter-ruby](https://github.com/tree-sitter/tree-sitter-ruby):
 the original tree-sitter grammar for the language.
-This is the git submodule `lang/semgrep-grammars/src/tree-sitter-X`
+This is the git submodule `lang/semgrep-grammars/src/tree-sitter-$PL`
 in ocaml-tree-sitter. It is installed at the project's root
 in `node_modules` by invoking `npm install`.
 * syntax extensions to support semgrep patterns, such as ellipses
 (`...`) and metavariables (`$FOO`).
-This is `lang/semgrep-grammars/src/semgrep-X`. It can be tested from
+This is `lang/semgrep-grammars/src/semgrep-$PL`. It can be tested from
 that folder with `make && make test`.
-* an automatically-modified grammar for language X in `lang/X`.
+* an automatically-modified grammar for language `$PL` in `lang/$PL`.
 It is modified so as to accommodate various requirements of the
-ocaml-tree-sitter code generator. `lang/X/src` and
-`lang/X/ocaml-src` contain the C/C++/OCaml code that will published
-into semgrep-X e.g.
+ocaml-tree-sitter code generator. `lang/$PL/src` and
+`lang/$PL/ocaml-src` contain the C/C++/OCaml code that will published
+into `semgrep-$PL` e.g.
 [semgrep-ruby](https://github.com/semgrep/semgrep-ruby)
 and used by semgrep.
-* [semgrep-X](https://github.com/semgrep/semgrep-ruby):
+* [semgrep-$PL](https://github.com/semgrep/semgrep-ruby):
 provides generated OCaml/C parsers as a dune project. Is a submodule
 of semgrep.
 * [semgrep](https://github.com/semgrep/semgrep): uses the parsers
-provided by semgrep-X, which produce a CST. The
+provided by `semgrep-$PL`, which produce a CST. The
 program's CST or pattern's CST is further transformed into an AST
 suitable for pattern matching.
 
@@ -71,27 +72,27 @@ Before upgrading
 
 Make sure the `grammar.js` file or equivalent source files
 defining the grammar are included in the `fyi.list` file in
-`ocaml-tree-sitter/lang/X`.
+`ocaml-tree-sitter/lang/$PL`.
 
 Why: It is important for tracking and _understanding_ the changes made at the
 source.
 
 How: See [How to add support for a new language](adding-a-language.md).
 
-Upgrade the tree-sitter-X submodule
+Upgrade the tree-sitter-$PL submodule
 --
 
-Say you want to upgrade (or downgrade) tree-sitter-X from some old
+Say you want to upgrade (or downgrade) `tree-sitter-$PL` from some old
 commit to commit `602f12b`. This uses the git submodule way, without
 anything weird. The commands might be something like this:
 
 ```
 git submodule update --init --recursive --depth 1
-git checkout -b upgrade-X
-cd lang/semgrep-grammars/src/tree-sitter-X
-git fetch origin --unshallow
-git checkout 602f12b
-cd ..
+git checkout -b upgrade-$PL
+cd lang/semgrep-grammars/src/tree-sitter-$PL
+git fetch origin --unshallow
+git checkout 602f12b
+cd ..
 ```
 
 Testing
@@ -112,7 +113,7 @@ commands will build and test the language:
 
 ```
 cd lang
-./test-lang X
+./test-lang $PL
 ```
 
 :::caution
@@ -122,7 +123,7 @@ correspond to [missing tokens](https://github.com/tree-sitter/tree-sitter/issues
 
 Check with:
 ```
-grep Blank lang/X/ocaml-src/lib/CST.ml
+grep Blank lang/$PL/ocaml-src/lib/CST.ml
 ```
 If anything comes up, you must modify the grammar so as to create
 a named rule for the node of the `Blank` kind. Eventually, the generated
@@ -131,19 +132,19 @@ Where a `Blank` node exists, we won't be able to get a token or its location
 at parsing time.
 
 If this works, we're all set. Commit the new commit for the
-tree-sitter-X submodule:
+`tree-sitter-$PL` submodule:
 ```
 git status
-git commit semgrep-languages/semgrep-X
-git push origin upgrade-X
+git commit semgrep-languages/semgrep-$PL
+git push origin upgrade-$PL
 ```
 
 Then make a pull request to merge this into ocaml-tree-sitter's
 main branch. It's ok to merge at this point, even if the generated code
 hasn't been exported (**Publishing** section below) or if you haven't
 done the necessary changes in semgrep (**Semgrep integration** below).
 
-We can now consider publishing the code to semgrep-X.
+We can now consider publishing the code to `semgrep-$PL`.
 
 Publishing
 --
@@ -153,18 +154,18 @@ _Please [ask someone at Semgrep, Inc. to run this step](https://github.com/semgr
 From the `lang` folder of ocaml-tree-sitter, we'll perform the
 release. This step redoes some of the work that was done earlier and
 checks that everything is clean before committing and pushing the
-changes to semgrep-X.
+changes to semgrep-$PL.
 
 ```
 cd lang
-./release --dry-run X # dry-run release
-... # 'git status' will show changes for language X
-./release X # commits and pushes to semgrep-X
+./release --dry-run $PL # dry-run release
+... # 'git status' will show changes for language $PL
+./release $PL # commits and pushes to semgrep-$PL
 ```
 
 This step is safe. Semgrep at this point is unaffected by those
 changes. There is now a new commit at
-`https://github.com/semgrep/semgrep-X` e.g.
+`https://github.com/semgrep/semgrep-$PL` e.g.
 https://github.com/semgrep/semgrep-javascript.
 The [`fyi/` folder](https://github.com/semgrep/semgrep-javascript/tree/main/fyi)
 contains original files from which the code was generated.
@@ -175,10 +176,10 @@ got the correct version of `grammar.js` or some other source file.
 Semgrep integration
 --
 
-From the semgrep repository, point the submodule for semgrep-X to the
+From the semgrep repository, point the submodule for `semgrep-$PL` to the
 latest commit from the "Publishing" step. Then rebuild semgrep-core,
 which will normally fail if the grammar changed. If the source
-`grammar.js` was included in the `fyi` folder for `semgrep-X` (as it
+`grammar.js` was included in the `fyi` folder for `semgrep-$PL` (as it
 should), `git diff HEAD^` should help figure out the changes since the
 last version.
 