Commit 564bad1
Expanding tree edit similarity to different languages (#128)
Summary:
Pull Request resolved: #128
Extend tree-edit similarity (`PyTreeSitterAttack`) to support more programming languages, specifically those already covered by CodeBLEU (Python, C, C++, Java, Rust, JavaScript, Go, Ruby, PHP, C#). Previously, only Python and C++ were supported.
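As a rough illustration of the language set above, the sketch below validates and normalizes a user-supplied language name. The helper `normalize_language`, the alias table, and the exact key spellings (for example `cpp` and `c_sharp`) are assumptions made for this example, not taken from the diff; check them against the installed codebleu version.

```python
# Keys for the 10 languages named in the summary. The spellings are an
# assumption meant to mirror codebleu's naming; verify before relying on them.
SUPPORTED_LANGUAGES = (
    "python", "c", "cpp", "java", "rust",
    "javascript", "go", "ruby", "php", "c_sharp",
)

# Hypothetical convenience aliases, introduced only for this sketch.
_ALIASES = {
    "c++": "cpp",
    "c#": "c_sharp",
    "csharp": "c_sharp",
    "js": "javascript",
    "py": "python",
}


def normalize_language(name: str) -> str:
    """Map a user-facing language name to a supported key, or raise ValueError."""
    key = name.strip().lower()
    key = _ALIASES.get(key, key)
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {name!r}; expected one of {SUPPORTED_LANGUAGES}"
        )
    return key
```

Failing early on an unknown name keeps the error close to the caller instead of surfacing later as a parser-loading failure.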
This diff introduces:
- Unified and extended grammar backend in `py_tree_sitter_attack.py`: replaces the standalone tree-sitter-python and tree-sitter-cpp packages with the codebleu package's bundled `my-languages.so`. This provides a single grammar library covering all 10 languages, keeps the module consistent with `CodeBleuAttack`, and simplifies `_get_parser()` to a 3-line function. Verified that the codebleu-bundled Python grammar produces trees identical (zero edit distance) to those of the previous implementation.
- No changes to the analysis layer: `TreeEditDistanceNode` is already language-agnostic because it operates purely on zss `Node` trees.
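To illustrate why the analysis layer can stay untouched, here is a minimal standard-library sketch of an ordered tree edit distance over generic labeled trees. It is a stand-in for the zss-based computation, not the code in this diff: the `(label, children)` tuple representation and the function names are assumptions for this example, and every insert, delete, or relabel costs 1. Nothing in it depends on which grammar produced the labels.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.


def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1 and not f2:
        return 0
    if not f1:
        return size(f2)  # insert every remaining node
    if not f2:
        return size(f1)  # delete every remaining node
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the rightmost root of f1; its children move up into the forest.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the rightmost root of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Match the two rightmost roots, relabeling if their labels differ.
    match = (
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2)
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Node labels here imitate parse-tree node types from any language.
a = ("module", (("call", (("identifier", ()), ("arguments", ()))),))
b = ("module", (("call", (("identifier", ()), ("arguments", (("string", ()),)))),))
print(tree_edit_distance(a, a))  # identical trees
print(tree_edit_distance(a, b))  # one inserted leaf
```

The same "zero edit distance between identical trees" property is what the grammar-equivalence check described above relies on.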
Reviewed By: mgrange1998
Differential Revision: D102700637
fbshipit-source-id: 3708fd9a784522e512c0ad87d6b3a1677540d1e3
1 parent e4c4436 commit 564bad1
3 files changed
Lines changed: 508 additions & 44 deletions
File tree
- privacy_guard
- analysis/tests
- attacks
- code_similarity
- tests
Lines changed: 86 additions & 6 deletions
Lines changed: 23 additions & 38 deletions