Document nested and branching paths. #84

Open. Wants to merge 11 commits into base: master
221 changes: 221 additions & 0 deletions docs/concepts/CLQL.md
@@ -277,6 +277,227 @@ method(depth = any):

<br />

# Edge

Facts in AST lexicons refer to nodes in an AST, and the parent/child relationship between facts refers to the parent/child relationship of nodes in the AST. These nodes can have other relationships that are orthogonal to the AST, such as calls. These relationships can be queried with the `edge` keyword.

The following query finds function calls at the top level of a file and follows the `calls` edge to their definition:

```clql
common.func_call:
  edge("calls"):
    common.func
```
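
As a further sketch, the same edge can be followed from calls nested anywhere inside one particular function. This assumes `depth = any` and the `name` property behave here as they do in the other queries in this document. `someFunc` is a placeholder name, not a function from any real codebase:

```clql
common.func:
  name == "someFunc"
  common.func_call(depth = any):
    edge("calls"):
      common.func
```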

<br />

# Path

Path statements encapsulate CLQL trees. These subtrees can be repeated with a single argument, allowing succinct repetition of complex patterns. Branched paths can rejoin, allowing a fact to match nodes with different kinds of parents.

## Linear

Say we wanted to find triply nested if statements. Our query would look like the following:

```clql
common.if_stmt:
  common.if_stmt:
    common.if_stmt
```

With paths, we can express the same thing like so:

```clql
path(repeat = 3):
  common.if_stmt:
    pathcontinue
```

Once a query reaches a `pathcontinue` statement it continues from the `path` statement until the path has been repeated the specified number of times.
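
To make the repetition concrete, here is a sketch of how a shorter path could be read. Assuming `common.for_stmt` matches for loops as it does elsewhere in this document, the following path:

```clql
path(repeat = 2):
  common.for_stmt:
    pathcontinue
```

would be expected to match the same code as this hand-written query:

```clql
common.for_stmt:
  common.for_stmt
```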

## Repeat range

Some queries cannot be written without `path` statements. Say we wanted to find all functions called by `someFunc()` through an arbitrarily long chain of calls. Without paths, our query would have to explicitly match directly called functions, then functions with 1, 2, 3, and so on intermediaries, without end:

```clql
common.func:
  name == "someFunc"
  any_of:
    common.func_call(depth = any):
      edge("calls"):
        common.func
    common.func_call(depth = any):
      edge("calls"):
        common.func:
          common.func_call(depth = any):
            edge("calls"):
              common.func
    ...
    common.func_call(depth = any):
      edge("calls"):
        common.func:
          common.func_call(depth = any):
            edge("calls"):
              common.func:
                ...
```

With paths the same query is trivial:

```clql
common.func:
  name == "someFunc"
  path(repeat = 1:):
    common.func_call(depth = any):
      edge("calls"):
        common.func:
          pathcontinue
```

`repeat = 1:` is a range specifying that the path should be repeated one or more times.
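
As a sketch of another open-ended range, assuming the lower bound can be any positive integer (the form `repeat = 2:` is assumed here by analogy with `repeat = 1:`), the following query would match if statements nested two or more deep:

```clql
path(repeat = 2:):
  common.if_stmt:
    pathcontinue
```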

## Complex subtrees

Say we wanted to match triply nested if statements that all check the same value. Our query would look like the following:

```clql
common.if_stmt:
  common.condition:
    common.var:
      name as varName
  common.if_stmt:
    common.condition:
      common.var:
        name == varName
    common.if_stmt:
      common.condition:
        common.var:
          name == varName
```

With paths, our query has much less repetition:

```clql
common.func:
  path(repeat = 3):
    common.if_stmt:
      common.condition:
        common.var:
          name as varName
      pathcontinue
```

Note that all CLQL elements that are children of `path` are repeated, not just the `if_stmt`. Also note that on repetitions after the first, the definition of `varName` is replaced with an assertion (`name == varName`).
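
As a sketch of how the repeated subtree is read, consider a two-repetition version of the path:

```clql
path(repeat = 2):
  common.if_stmt:
    common.condition:
      common.var:
        name as varName
    pathcontinue
```

It would be expected to behave like the following expanded query, where the second occurrence of `varName` is an assertion rather than a definition. This expansion illustrates the intended semantics and is not output produced by any tool:

```clql
common.if_stmt:
  common.condition:
    common.var:
      name as varName
  common.if_stmt:
    common.condition:
      common.var:
        name == varName
```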

## Pathend

Suppose we wanted to match triply nested if statements with a function call inside the innermost if statement. Without paths our query looks like:

```clql
common.if_stmt:
  common.if_stmt:
    common.if_stmt:
      common.func_call
```

With paths, our query looks like:

```clql
path(repeat = 3):
  common.if_stmt:
    pathcontinue
  pathend:
    common.func_call
```
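
As a sketch, `pathend` could also be combined with an open-ended range. This assumes `pathend` and `repeat = 1:` can be used together, a combination this document does not show. Under that assumption, the following query would match a function call inside one or more levels of nested if statements:

```clql
path(repeat = 1:):
  common.if_stmt:
    pathcontinue
  pathend:
    common.func_call
```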

## Branching

A `path` statement can have multiple `pathcontinue` statements. This allows multiple parents to have the same children. These children are defined under a `pathend` statement, rather than as children of the `pathcontinue` statements. Branching is often used with `any_of` to match classes of similar facts, such as methods, closures, and functions.

Say we wanted to find functions or methods with doubly nested for loops. Without `pathend` we would need to repeat the doubly nested for loop facts under `func` and `method`:

```clql
common.file:
  any_of:
    common.func:
      common.for_stmt:
        common.for_stmt
    common.method:
      common.for_stmt:
        common.for_stmt
```

With `pathend` there is no repetition:

```clql
common.file:
  path:
    any_of:
      common.func:
        pathcontinue
      common.method:
        pathcontinue
    pathend:
      common.for_stmt:
        common.for_stmt
```

Say we wanted to find functions with function calls inside doubly nested for/while statements. Our query would have to handle all combinations of for/for, for/while, while/for, and while/while:

```clql
common.func:
  any_of:
    common.for_stmt:
      any_of:
        common.for_stmt:
          common.func_call
        common.while_stmt:
          common.func_call
    common.while_stmt:
      any_of:
        common.for_stmt:
          common.func_call
        common.while_stmt:
          common.func_call
```

With paths, we can express the same thing like so:

```clql
common.func:
  path(repeat = 2):
    any_of:
      common.for_stmt:
        pathcontinue
      common.while_stmt:
        pathcontinue
    pathend:
      common.func_call
```

## Nested paths

Nested paths are not yet valid CLQL. The following query is intended to follow the call graph from `someFunc` via function calls inside multiply nested for loops. It currently gives a parse error.

```clql
common.func:
  name == "someFunc"
  path(repeat = any):
    path(repeat = 2:):
      common.for_stmt:
        pathcontinue:
          common.func_call(depth = any):
            edge("calls"):
              common.func:
                pathcontinue
```

## Decorators

Some decorators such as `@review comment` can only be used once per query. Using them in a repeated path will cause an error.
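
One way to stay within this restriction, as a sketch, is to attach the decorator to a fact outside the repeated subtree. The example assumes a decorator is written on the line above the fact it applies to, and `someFunc` is a placeholder name:

```clql
@review comment
common.func:
  name == "someFunc"
  path(repeat = 1:):
    common.func_call(depth = any):
      edge("calls"):
        common.func:
          pathcontinue
```

Here the decorator sits on the outer `common.func` fact, so it occurs once however many times the path repeats.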

<br />

# Variables

Facts that do not have a parent-child relationship can be compared by assigning their properties to variables. A query with a variable will only match a pattern in the code if all properties representing that variable are equal.