Open
34 commits
a8859e8
Add debug print and explanatory comments
wunused Jun 25, 2025
53f250c
Pass focus_lines from runner.py to resolve_includes.sc
wunused Jun 27, 2025
0668f60
Add focus_lines param to tools/resolve_includes.sc
wunused Jun 27, 2025
031265d
Fix focus_line handling
wunused Jun 27, 2025
9d51619
Refactor resolve includes script
neochristou Jun 28, 2025
bdf866e
Refactor resolve includes script for new joern version
neochristou Jun 28, 2025
7e5b1d1
Filter out metaclasses
neochristou Jun 28, 2025
5f51b01
Fix unclosed paren in runner.py and uncomment functionality
wunused Jun 28, 2025
e71f303
Get rid of project-specific focus lines in runner.py
wunused Jun 28, 2025
0ac4864
Add options for not running parts of runner
neochristou Jun 28, 2025
ac6c4b0
fix paths
neochristou Jun 28, 2025
d493b4e
Add scope check back
neochristou Jun 28, 2025
6a91f99
Optimize getScopeId
neochristou Jun 28, 2025
93df109
Don't access cpg.all
neochristou Jun 28, 2025
5782a37
Optimize getNodeType
neochristou Jun 28, 2025
1404dc6
Use sets instead of lists
neochristou Jun 28, 2025
f7bc4e2
Replace l(0) with head
neochristou Jun 28, 2025
91c41ad
Optimize calls to instanceof
neochristou Jun 28, 2025
37da31e
Add caches
neochristou Jun 28, 2025
8a357c2
Fix pytests
neochristou Jun 28, 2025
022d023
Fix pytest paths
neochristou Jun 28, 2025
b8b03d1
More path fixes
neochristou Jun 28, 2025
7379852
Fix concatenating includes
neochristou Jun 29, 2025
e800c2c
Handle new list representation
neochristou Jun 29, 2025
2a5d159
Nits
neochristou Jun 29, 2025
65dee8f
Only run joern analysis once
neochristou Jun 29, 2025
71c9cb9
Fix include resolution when variables appear in the expression
wunused Jun 29, 2025
d754d8e
Merge remote-tracking branch 'refs/remotes/origin/port-new-joern' int…
wunused Jun 29, 2025
6dfcf41
Group by object id
neochristou Jun 29, 2025
4d80d69
Merge branch 'port-new-joern' of github.com:columbia/quack into port-…
neochristou Jun 29, 2025
8e6be00
Get id of correct node
neochristou Jun 29, 2025
2223ec6
Add instanceof as a rule and more tests
neochristou Jul 13, 2025
46eade5
Merge remote-tracking branch 'origin/port-new-joern' into update-runner
wunused Aug 7, 2025
31b59aa
Update Joern version in Dockerfile and README
wunused Aug 7, 2025
2 changes: 1 addition & 1 deletion Dockerfile
@@ -15,7 +15,7 @@ RUN apt-get update \
 WORKDIR /joern
 RUN curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh \
     && chmod u+x joern-install.sh \
-    && ./joern-install.sh --version=v2.0.290
+    && ./joern-install.sh --version=v4.0.383

 # Copy Quack contents
 COPY . /quack/
8 changes: 4 additions & 4 deletions README.md
@@ -34,12 +34,12 @@ to the `unserialize` function call.
 ## Requirements and Setup

 Quack depends on:
-* [Joern code analysis platform](https://joern.io/) version 2.0.290
+* [Joern code analysis platform](https://joern.io/) version 4.0.383
 * Java Development Kit 19 (Joern dependency)
 * Python3

-Quack depends on features in Joern version 2.0.290. Older versions
-of Joern will not work. Versions newer than 2.0.290 will likely work,
+Quack depends on features in Joern version 4.0.383. Older versions
+of Joern will not work. Versions newer than 4.0.383 will likely work,
 although they have not been explicitly tested.


@@ -56,7 +56,7 @@ $ sudo apt-get update && sudo apt-get install -y openjdk-19-jdk openjdk-19-jre
 ```
 $ curl -L "https://github.com/joernio/joern/releases/latest/download/joern-install.sh" -o joern-install.sh
 $ chmod u+x joern-install.sh
-$ ./joern-install.sh --version=v2.0.290
+$ ./joern-install.sh --version=v4.0.383
 ```

 3. Install Python packages (optionally, in a virtual environment).
163 changes: 134 additions & 29 deletions deduce_allowed_classes.py
@@ -8,6 +8,7 @@
 """

 import itertools
+from collections import defaultdict

 # PHP primitive types.
 # Don't treat string as native because of __toString
@@ -29,40 +30,89 @@ class Leak(Exception):
     pass


-def deduce_allowed_classes(avail_classes, evidence):
+def deduce_allowed_classes(avail_classes, grouped_evidence):
     # Extract all type deductions from the evidence collected about an object.
     # (Duck and Exact type matching rules)
-    type_entries_all = [x for x in evidence if x["condType"] in ("Duck", "Exact")]
-    type_entries_no_tostring = [x for x in type_entries_all if x["reason"] != "HasToString"]
-
-    # Check if the type leaks because it's being used in a dynamic call, and if it
-    # does, return all available classes
-    leaks = any([x["reason"] == "DynamicCall" for x in type_entries_all])
-    if leaks:
-        raise Leak()
-
-    # Parse the deduced types
-    types_all = list(itertools.chain(*[x["type"].split("|") for x in type_entries_all]))
-    types_no_tostring = list(itertools.chain(*[x["type"].split("|") for x in type_entries_no_tostring]))
+    all_obj_types = []
+    all_obj_types_no_tostring = []
+    # If we have more than one evidence for a single object, take the intersection
+    # of all deduced types
+    for obj_evidence in grouped_evidence:
+        obj_evidence_no_tostring = [
+            x for x in obj_evidence if x["reason"] != "HasToString"
+        ]
+
+        # Check if the type leaks because it's being used in a dynamic call, and if it
+        # does, return all available classes
+        leaks = any([x["reason"] == "DynamicCall" for x in obj_evidence])
+        if leaks:
+            raise Leak()
+
+        obj_types = list(map(
+            set,
+            [
+                evidence["type"].split("|")
+                for evidence in obj_evidence
+                if len(evidence["type"]) != 0
+            ],
+        ))
+        obj_types_no_tostring = list(map(
+            set,
+            [
+                evidence["type"].split("|")
+                for evidence in obj_evidence_no_tostring
+                if len(evidence["type"]) != 0
+            ],
+        ))
+
+        if len(obj_types) > 0:
+            obj_types_intersect = set.intersection(*obj_types)
+            all_obj_types.append(obj_types_intersect)
+
+        if len(obj_types_no_tostring) > 0:
+            obj_types_intersect_no_tostring = set.intersection(
+                *obj_types_no_tostring)
+            all_obj_types_no_tostring.append(obj_types_intersect_no_tostring)
+
+    # No types found
+    if len(all_obj_types) == 0:
+        return [], [], []

     # Check if we have evidence indicating a native type
-    have_native_evidence = any([x in native_types for x in types_all])
+    have_native_evidence = any(t in native_types for types in all_obj_types for t in types)

     # Allowed types are the intersection of the set of types deduced for an
     # object, and the set of available classes.
-    allowed_types_all = set([x for x in types_all if x in avail_classes])
-    allowed_types_no_tostring = set([x for x in types_no_tostring if x in avail_classes])
+    allowed_types_all = set(
+        [
+            obj_type
+            for obj_type_set in all_obj_types
+            for obj_type in obj_type_set
+            if obj_type in avail_classes
+        ]
+    )
+    allowed_types_no_tostring = set(
+        [
+            obj_type
+            for obj_type_set in all_obj_types_no_tostring
+            for obj_type in obj_type_set
+            if obj_type in avail_classes
+        ]
+    )

     # We should not have found evidence that the object is both a native and
     # a class type.
     if have_native_evidence and len(allowed_types_no_tostring) > 0:
-        raise Exception(f"Found evidence for native type but also the following allowed types: {allowed_types_all}")
+        raise Exception(
+            f"Found evidence for native type but also the following allowed types: {allowed_types_all}"
+        )

     # Determine whether the deduced types are useful for constraining the
     # available classes into allowed classes.
     # Filter out the evidence for types that don't give us any useful type
     # information (e.g., if type is 'array', we don't actually know what the
     # types of each element are, unless we collected more evidence)
+    types_all = set.union(*all_obj_types)
     useful_types = [x for x in types_all if x not in not_useful_types]
     useful_types = [x for x in useful_types if "." not in x and "->" not in x]

@@ -77,7 +127,7 @@ def deduce_allowed_classes(avail_classes, evidence):
     elif len(useful_types) == 0:
         allowed_types_no_tostring = avail_classes

-    return types_all, allowed_types_all, allowed_types_no_tostring
+    return list(types_all), list(allowed_types_all), list(allowed_types_no_tostring)


 def compute_allowed_classes(evidence_entries, avail_classes_entries):
@@ -90,37 +140,92 @@ def compute_allowed_classes(evidence_entries, avail_classes_entries):
         # Get the call location
         filename = unser_call["filename"]
         line_no = unser_call["lineNumber"]
-        # Get the collected type evidence
-        evidence = unser_call["conditions"]

         print(f"Working on: File[{filename}],Line[{line_no}]")

+        # Get the collected type evidence
+        evidence = [
+            x for x in unser_call["conditions"] if x["condType"] in ("Duck", "Exact")
+        ]
+
+        # Group by object
+        grouped_evidence_dict = defaultdict(list)
+        for item in evidence:
+            grouped_evidence_dict[item["nodeId"]].append(item)
+        grouped_evidence = list(grouped_evidence_dict.values())
+
         # Get the available classes at that callsite
-        avail_classes_entry = [x for x in avail_classes_entries if x["filename"] == filename]
-        assert (len(avail_classes_entry) == 1), f"No avail classes entries found for {filename}!"
+        avail_classes_entry = [
+            x for x in avail_classes_entries if x["filename"] == filename
+        ]
+        assert (
+            len(avail_classes_entry) == 1
+        ), f"No avail classes entries found for {filename}!"

         avail_classes_entry = avail_classes_entry[0]
-        avail_classes_lines = list(set(avail_classes_entry['line_numbers']).intersection({line_no}))
+        avail_classes_lines = list(
+            set(avail_classes_entry["line_numbers"]).intersection({line_no})
+        )
         # Make sure we actually have available classes for this line
-        assert (len(avail_classes_lines) >= 1), f"No avail classes entries found for {line_no} in {filename}!"
+        assert (
+            len(avail_classes_lines) >= 1
+        ), f"No avail classes entries found for {line_no} in {filename}!"
         avail_classes = avail_classes_entry["avail_classes"]
-        result_entries.append({"filename" : filename, "lineNumber" : line_no, "allowedTypes" : None, "allowedClasses" : None})
+        result_entries.append(
+            {
+                "filename": filename,
+                "lineNumber": line_no,
+                "allowedTypes": None,
+                "allowedClasses": None,
+            }
+        )
         try:
             # Call the main script that consolidates the available classes
             # with the inferred types and produces the final set of allowed classes
-            types_all, allowed_classes_all, allowed_types_no_tostring = deduce_allowed_classes(avail_classes, evidence)
+            filtered_grouped_evidence = [
+                [{key: d.get(key) for key in ("type", "reason")}
+                 for d in obj_evidence]
+                for obj_evidence in grouped_evidence
+            ]
+            types_all, allowed_classes_all, allowed_types_no_tostring = (
+                deduce_allowed_classes(
+                    avail_classes, filtered_grouped_evidence)
+            )
             print(f"All types collected from evidence: {types_all}")
             print(f"All allowed classes: {allowed_types_no_tostring}")
             result_entries[-1]["allowedTypes"] = types_all
-            result_entries[-1]["allowedClasses"] = list(allowed_types_no_tostring)
+            result_entries[-1]["allowedClasses"] = list(
+                allowed_types_no_tostring)
         except Leak as e:
             # we identify that the analysis is "leaking" (e.g., flows in to a dynamic
             # call we can't track)=> we need to just return available classes' gadgets.
             # When not taking into account the available classes (NOAVAIL), we just return all the gadgets in the project,
             # else only return the gadgets in the available classes
-            print(f"Project analysis for [{filename}]:[{line_no}] resulted in a leak. "
-                  f"See docs about how to continue from here")
+            print(
+                f"Project analysis for [{filename}]:[{line_no}] resulted in a leak. "
+                f"See docs about how to continue from here"
+            )
         except Exception as e:
             print(f"{e.__class__.__name__}:{e}")

     return result_entries
+
+
+if __name__ == "__main__":
+
+    import argparse
+    import json
+
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--analysis-results-path", required=True)
+    parser.add_argument("--availclass-results-path", required=True)
+    args = parser.parse_args()
+
+    with open(args.analysis_results_path) as f:
+        evidence_entries = json.load(f)
+
+    with open(args.availclass_results_path) as f:
+        avail_classes_entries = json.load(f)
+
+    results = compute_allowed_classes(evidence_entries, avail_classes_entries)
+    print(results)
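
Reviewer note: the core of this change is that evidence is first grouped per object (by `nodeId`) and the deduced types are then intersected within each group. A minimal standalone sketch of that idea follows. The helper names `group_by_node` and `intersect_types`, the `"MethodCall"` reason string, and the sample entries are illustrative assumptions, not code or data from this PR.

```python
from collections import defaultdict


def group_by_node(evidence):
    """Group evidence entries by the object (node) they describe."""
    grouped = defaultdict(list)
    for item in evidence:
        grouped[item["nodeId"]].append(item)
    return list(grouped.values())


def intersect_types(obj_evidence):
    """Intersect the '|'-separated type sets of one object's evidence."""
    type_sets = [set(e["type"].split("|")) for e in obj_evidence if e["type"]]
    return set.intersection(*type_sets) if type_sets else set()


# Hypothetical evidence: node 1 has two observations, node 2 has one.
evidence = [
    {"nodeId": 1, "reason": "MethodCall", "type": "Duck|Whale"},
    {"nodeId": 1, "reason": "MethodCall", "type": "Duck"},
    {"nodeId": 2, "reason": "MethodCall", "type": "Duck|Whale"},
]

per_object = [intersect_types(group) for group in group_by_node(evidence)]
print(per_object)
```

Under these assumptions, node 1 narrows to `Duck` because both of its observations must hold at once, while node 2 keeps both candidates.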
25 changes: 25 additions & 0 deletions pytests/php-samples/duck_test/duck_test.php
@@ -0,0 +1,25 @@
+<?php
+
+class Duck {
+    public function swim() {
+        echo "Duck swimming\n";
+    }
+
+    public function fly() {
+        echo "Duck flying\n";
+    }
+}
+
+class Whale {
+    public function swim() {
+        echo "Whale swimming\n";
+    }
+}
+
+function test_duck($object) {
+    $animal = unserialize($object);
+    $animal->swim();
+    $animal->fly();
+}
+
+?>
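
Reviewer note: this sample exercises duck typing, where each method called on the unserialized object restricts the candidate classes to those defining that method. The sketch below illustrates the expected narrowing. The `CLASS_METHODS` table is written by hand to mirror the PHP file above, and both function names are hypothetical; the real analysis derives this information from Joern.

```python
# Hand-written class -> methods table mirroring duck_test.php (an assumption
# for illustration; the real tool extracts this from the code property graph).
CLASS_METHODS = {
    "Duck": {"swim", "fly"},
    "Whale": {"swim"},
}


def classes_with_method(method):
    """Return every class that defines `method`."""
    return {name for name, methods in CLASS_METHODS.items() if method in methods}


def candidates_for(called_methods):
    """Intersect the candidate sets of all methods called on one object."""
    candidates = set(CLASS_METHODS)
    for method in called_methods:
        candidates &= classes_with_method(method)
    return candidates


print(sorted(candidates_for(["swim"])))         # ['Duck', 'Whale']
print(sorted(candidates_for(["swim", "fly"])))  # ['Duck']
```

Calling only `swim()` leaves both classes as candidates; adding `fly()` leaves only `Duck`, which is what `test_duck` should yield.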
1 change: 1 addition & 0 deletions pytests/php-samples/duck_tests/availclass.json
@@ -0,0 +1 @@
+[{"filename":"/home/neo/quack/pytests/php-samples/duck_tests/pass_to_func.php","line_numbers":[9],"avail_classes":["Whale","Duck"]},{"filename":"/home/neo/quack/pytests/php-samples/duck_tests/method.php","line_numbers":[20],"avail_classes":["Whale","Duck"]},{"filename":"/home/neo/quack/pytests/php-samples/duck_tests/instanceof.php","line_numbers":[8],"avail_classes":["Whale","Duck"]},{"filename":"/home/neo/quack/pytests/php-samples/duck_tests/field.php","line_numbers":[11],"avail_classes":["Whale","Duck"]}]
1 change: 1 addition & 0 deletions pytests/php-samples/duck_tests/availclass.json.errors
@@ -0,0 +1 @@
+[]
@@ -0,0 +1,4 @@
+/home/neo/quack/pytests/php-samples/duck_tests/pass_to_func.php -> ListBuffer(Duck, Whale)
+/home/neo/quack/pytests/php-samples/duck_tests/method.php -> ListBuffer(Duck, Whale)
+/home/neo/quack/pytests/php-samples/duck_tests/field.php -> ListBuffer(Duck, Whale)
+/home/neo/quack/pytests/php-samples/duck_tests/instanceof.php -> ListBuffer(Duck, Whale)
Empty file.
1 change: 1 addition & 0 deletions pytests/php-samples/duck_tests/availclass.json.warnings
@@ -0,0 +1 @@
+[]
42 changes: 42 additions & 0 deletions pytests/php-samples/duck_tests/availclass_fixed.json
@@ -0,0 +1,42 @@
+[
+    {
+        "filename": "/home/neo/quack/pytests/php-samples/duck_tests/pass_to_func.php",
+        "line_numbers": [
+            9
+        ],
+        "avail_classes": [
+            "Whale",
+            "Duck"
+        ]
+    },
+    {
+        "filename": "/home/neo/quack/pytests/php-samples/duck_tests/method.php",
+        "line_numbers": [
+            20
+        ],
+        "avail_classes": [
+            "Whale",
+            "Duck"
+        ]
+    },
+    {
+        "filename": "/home/neo/quack/pytests/php-samples/duck_tests/instanceof.php",
+        "line_numbers": [
+            8
+        ],
+        "avail_classes": [
+            "Whale",
+            "Duck"
+        ]
+    },
+    {
+        "filename": "/home/neo/quack/pytests/php-samples/duck_tests/field.php",
+        "line_numbers": [
+            11
+        ],
+        "avail_classes": [
+            "Whale",
+            "Duck"
+        ]
+    }
+]
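
Reviewer note: `compute_allowed_classes` matches each `unserialize` call to exactly one entry of this file by `filename`, then checks that the call's line is listed in `line_numbers`. The sketch below shows that lookup in isolation. The function name `lookup_avail_classes`, the shortened file names, and the use of `LookupError` instead of `assert` are assumptions made for the example.

```python
import json

# Shortened stand-in for availclass_fixed.json (file names abbreviated for
# illustration; the real entries use absolute paths).
AVAIL_CLASSES_JSON = """
[
  {"filename": "method.php", "line_numbers": [20], "avail_classes": ["Whale", "Duck"]},
  {"filename": "field.php", "line_numbers": [11], "avail_classes": ["Whale", "Duck"]}
]
"""


def lookup_avail_classes(entries, filename, line_no):
    """Return the classes available at filename:line_no, or raise LookupError."""
    matches = [e for e in entries if e["filename"] == filename]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one entry for {filename}, found {len(matches)}"
        )
    entry = matches[0]
    if line_no not in entry["line_numbers"]:
        raise LookupError(f"no available classes recorded for {filename}:{line_no}")
    return entry["avail_classes"]


entries = json.loads(AVAIL_CLASSES_JSON)
print(lookup_avail_classes(entries, "method.php", 20))  # ['Whale', 'Duck']
```

Raising an exception keeps the check active even when Python runs with assertions disabled, which may or may not matter for this tool.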
14 changes: 14 additions & 0 deletions pytests/php-samples/duck_tests/field.php
@@ -0,0 +1,14 @@
+<?php
+
+class Duck {
+    public $feather_color;
+}
+
+class Whale {
+    public $flippers;
+}
+
+$animal = unserialize($object);
+echo "This duck's feathers are $animal->feather_color";
+
+?>
14 changes: 14 additions & 0 deletions pytests/php-samples/duck_tests/instanceof.php
@@ -0,0 +1,14 @@
+<?php
+
+class Duck {}
+
+class Whale {}
+
+function is_it_duck() {
+    $animal = unserialize($object);
+    if ($animal instanceof Duck) {
+        echo "It's a duck!\n";
+    }
+}
+
+?>
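
Reviewer note: for this sample, an `instanceof Duck` check would presumably contribute exact-type evidence naming `Duck`, so the allowed classes should be the deduced type intersected with the available classes. The sketch below shows only that final step. The evidence entry (including the `"InstanceOf"` reason string) and the helper name are hypothetical and not taken from real Quack output.

```python
def allowed_classes(deduced_types, avail_classes):
    """Keep only deduced types that are also available at the call site."""
    return sorted(set(deduced_types) & set(avail_classes))


# Hypothetical evidence entry for `$animal instanceof Duck`; the field values
# are illustrative assumptions.
evidence = {"condType": "Exact", "reason": "InstanceOf", "type": "Duck"}

result = allowed_classes(evidence["type"].split("|"), ["Whale", "Duck"])
print(result)  # ['Duck']
```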
1 change: 1 addition & 0 deletions pytests/php-samples/duck_tests/joe_analyze.out.warnings
@@ -0,0 +1 @@
+[]