12/23/17
1.1 and 1.2 were pretty straightforward
1.3 is less so
"The hex encoded string:
1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
... has been XOR'd against a single character. Find the key, decrypt the message."
what does it mean for a hex-encoded string to be XOR'd against a single character?
does it mean that a single character in that string has been XOR'd against another character?
(definitely not)
does it mean that the string has been XOR'd against an equally long string containing
a bunch of repetitions of a single character like 'b'? (maybe!)
and should we decode this thing from hex and print it out as a regular string first
to see what it looks like? (yes!)
ok
(apply str (map char foo))
=> "77316?x+x413=x9x(7-6<x7>x:9;76"
so i think what we're going to do is
for each character in \a, \b, \c, etc
make a string like 'aaaaaaaaaaaaaaaaaaaaa'
convert it to bytes
xor it against the input string
convert that to a regular string and print it out and see if it looks like english
if that doesn't work, also try uppercase characters, numbers, symbols
right now i have this
(let [foo (unhexify "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736")]
  (for [character (map byte "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")]
    (let [character-buffer (byte-array (repeat (count foo) character))]
      (bytes->str
        (fixed-xor foo character-buffer)))))
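the helpers called above (unhexify, fixed-xor, bytes->str) aren't defined anywhere in this diary. here's a minimal sketch of what they might look like; the bodies are assumptions, only the names come from the snippet above.

```clojure
;; Hypothetical helper bodies; the diary only shows the call sites.

(defn unhexify
  "Decode a hex string such as \"1b37\" into a byte array."
  [hex]
  (byte-array
    (for [[hi lo] (partition 2 hex)]
      ;; parse two hex digits, then wrap into the signed byte range
      (unchecked-byte (Integer/parseInt (str hi lo) 16)))))

(defn fixed-xor
  "XOR two equal-length byte arrays, returning a new byte array."
  [a b]
  (byte-array (map (fn [x y] (unchecked-byte (bit-xor x y))) a b)))

(defn bytes->str
  "Interpret each byte as one character (ISO-8859-1), so no byte is lost."
  [bs]
  (String. ^bytes bs "ISO-8859-1"))

;; Usage: XOR the ciphertext against a buffer filled with the byte for \X.
(let [foo        (unhexify "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736")
      key-buffer (byte-array (repeat (count foo) (byte \X)))]
  (bytes->str (fixed-xor foo key-buffer)))
;; => "Cooking MC's like a pound of bacon"
```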
one of the decoded outputs is
"cOOKING mcS LIKE A POUND OF BACON"
which seems pretty fishy
that didn't paste entirely correctly, there are some unrecognized characters in the output
oh
one of the last decoded messages is
"Cooking MC's like a pound of bacon"
which (despite the apostrophe) is definitely the right decoding
so now let's come up with a string-scoring function
TODO make a test suite, and have each test implement one of the challenges
so why did that one character _almost_ decode the string?
[\x "cOOKING mcS LIKE A POUND OF BACON"]
[\X "Cooking MC's like a pound of bacon"]
(Integer/toString (int \x) 2)
=> "1111000"
(Integer/toString (int \X) 2)
=> "1011000"
the 32 bit (0x20) is 1 for \x and 0 for \X
\x is 120 and \X is 88
so basically the \x key almost deciphered the plaintext but flipped the 32 bit of every character,
which swaps the case of every letter (and turns the apostrophe, 39, into the unprintable character 7)
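a small sketch of the case-bit arithmetic, to check the reasoning above. flip-case-bit is a made-up name for illustration, not something from the challenge code.

```clojure
;; The two candidate keys differ only in the 32 (0x20) bit.
(bit-xor (int \x) (int \X))
;; => 32

(defn flip-case-bit
  "XOR a character with 32. For ASCII letters this toggles upper/lower case."
  [c]
  (char (bit-xor (int c) 32)))

(apply str (map flip-case-bit "Cooking"))
;; => "cOOKING"

;; Non-letters do not survive the flip: the apostrophe (39) becomes 7,
;; a control character, which is why it went missing from the \x output.
(int (flip-case-bit \'))
;; => 7
```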
ok so anyway let's take a look at character frequencies
here's the frequencies of the decoded string
{\space 6,
\a 2,
\b 1,
\C 2,
\c 1,
\d 1,
\e 1,
\f 1,
\' 1,
\g 1,
\i 2,
\k 2,
\l 1,
\M 1,
\n 3,
\o 5,
\p 1,
\s 1,
\u 1}
note the \space
and also, there's only one e in there
so i dunno; the problem suggests using character frequencies as in etaoin shrdlu,
but i'm kind of inclined to do something more like: what percentage of the characters
in the string are alphanumeric? also, it's fine to have common symbols like spaces and
punctuation. but look at some of the incorrectly decoded strings:
[\E "^rrvtsz=P^:n=qtvx=|=mrhsy=r{=|~rs"]
[\F "]qquwpy>S]9m>rwu{>>nqkpz>qx>|}qp"]
[\G "\\pptvqx?R\\8l?svtz?~?opjq{?py?}~|pq"]
[\H "S{y~w0]S7c0|y{u0q0`e~t0v0rqs~"]
[\I "R~~zxv1\\R6b1}xzt1p1a~du1~w1spr~"]
[\J "Q}}y{|u2_Q5a2~{yw2s2b}g|v2}t2psq}|"]
[\K "P||xz}t3^P4`3zxv3r3c|f}w3|u3qrp|}"]
[\L "W{{}zs4YW3g4x}q4u4d{azp4{r4vuw{z"]
[\M "Vzz~|{r5XV2f5y|~p5t5ez`{q5zs5wtvz{"]
[\N "Uyy}xq6[U1e6z}s6w6fycxr6yp6twuyx"]
for each of those strings, the majority of the string is not a letter.
one thing i don't yet understand is: how do we do normalization? like, i want to have a
score-string function that returns a number between 0.0 and 1.0
i guess it involves dividing various things by the length of the string?
but like i guess my question is how do you combine your different scoring mechanisms
again i guess it's just like - (/ (apply + scores) (count scores))
so let's just start by calculating the percentage of characters in a string that are letters
ez
ok, ideas for signals
proportion of letters in string
proportion of lowercase letters in letters in string
proportion of punctuation in string
proportion of numbers in string
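a rough sketch of how a couple of these signals could be computed and averaged into one number between 0.0 and 1.0. the names (proportion, letter?, lower?, score-string) and the choice of signals are assumptions for illustration, not the actual scoring function.

```clojure
(defn proportion
  "Fraction of the items in s that satisfy pred; 0.0 when s is empty."
  [pred s]
  (if (empty? s)
    0.0
    (double (/ (count (filter pred s)) (count s)))))

(defn letter? [c] (Character/isLetter (char c)))
(defn lower?  [c] (Character/isLowerCase (char c)))

(defn score-string
  "Mean of two signals, each already in the range 0.0 to 1.0:
  the share of letters or spaces, and the share of letters that are lowercase."
  [s]
  (let [letters (filter letter? s)
        scores  [(proportion #(or (letter? %) (= % \space)) s)
                 (proportion lower? letters)]]
    (/ (apply + scores) (count scores))))

;; Higher means more English-looking under these two signals.
(score-string "Cooking MC's like a pound of bacon")
(score-string "^rrvtsz=P^:n=qtvx=|=mrhsy=r{=|~rs")
```

the wrong decoding still scores fairly high here because most of its characters are lowercase letters, which matches the "bogus things get assigned high scores" problem.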
not sure why git thinks this file is binary rather than text
made some progress on the scoring function - at least it exists and runs
but lots of bogus things get assigned high scores
tbh it's really looking like my current approach is bad and i should just go with char frequencies
TODO so i think i'll try that next
====
12/24/17
chi squared was a good find, thanks stackoverflow
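for reference, a minimal sketch of a chi-squared score over character counts. the frequency table holds rough illustrative values (not from this diary or the challenge), every character outside the table is lumped into an :other bucket, and lower scores mean "more like english". it assumes a non-empty string.

```clojure
(require '[clojure.string :as string])

(def english-freq
  "Rough relative frequencies of common characters in English text.
  Illustrative values only; all other characters share the :other bucket."
  {\space 0.18 \e 0.10 \t 0.07 \a 0.065 \o 0.06 \i 0.055 \n 0.055
   \s 0.05 \h 0.05 \r 0.05 \d 0.035 \l 0.03 \u 0.02})

(defn chi-squared
  "Sum over buckets of (observed - expected)^2 / expected, where expected
  is the bucket's frequency times the length of s. Lower is better."
  [s]
  (let [n        (count s)
        observed (frequencies
                   (map #(if (contains? english-freq %) % :other)
                        (string/lower-case s)))
        expected (assoc english-freq
                        :other (- 1.0 (apply + (vals english-freq))))]
    (apply + (for [[k p] expected
                   :let [e (* p n)
                         o (get observed k 0)]]
               (/ (* (- o e) (- o e)) e)))))

;; The real plaintext should score lower than a wrong decoding.
(chi-squared "Cooking MC's like a pound of bacon")
(chi-squared "^rrvtsz=P^:n=qtvx=|=mrhsy=r{=|~rs")
```

note that lower-casing the input means this version can't tell \x and \X decodings apart on letters alone; the unprintable characters would have to be handled separately.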
next up: test suite
====
12/26/17
ok, found the correct decryption:
"Now that the party is jumping\n"
but our code isn't finding it atm.
i think the \n is throwing it off?
added \n to allowed characters, but the string still gets a score of 0.93
i'd like to be down around 0.8
ok i solved it by just super penalizing unrecognized characters
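one way the penalty could look, as a sketch only: the allowed set, the weight of 10.0 and the name unrecognized-penalty are all assumptions, and it presumes a score where lower is better, so the penalty gets added on top.

```clojure
(def allowed-chars
  "Characters treated as plausible in English plaintext (assumed set)."
  (set (concat "abcdefghijklmnopqrstuvwxyz"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               "0123456789 .,;:!?'\"-\n")))

(defn unrecognized-penalty
  "A large fixed cost for every character outside allowed-chars."
  [s]
  (* 10.0 (count (remove allowed-chars s))))

;; The 1.4 plaintext, newline included, costs nothing.
(unrecognized-penalty "Now that the party is jumping\n")
;; => 0.0
```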
DONE attempt to cut down test time by figuring out which string 1.4 decoded from, and what key was used
to see if we can go back to alphanumeric ascii characters or if we really do need to do (range 128)
=======
12/28/17
having a lot of trouble figuring out what they want me to do in 2.12
here's a bunch of text from the problems
; Copy your oracle function to a new function that encrypts buffers under ECB mode using
; a consistent but unknown key (for instance, assign a single random key, once, to a global variable).
; Now take that same function and have it append to the plaintext, BEFORE ENCRYPTING, the following string:
; Knowing the block size, craft an input block that is exactly 1 byte short (for instance,
; if the block size is 8 bytes, make "AAAAAAA"). Think about what the oracle function is
; going to put in that last byte position.
; Make a dictionary of every possible last byte by feeding different strings to the oracle;
; for instance, "AAAAAAAA", "AAAAAAAB", "AAAAAAAC", remembering the first block of each invocation.
; Match the output of the one-byte-short input to one of the entries in your dictionary.
; You've now discovered the first byte of unknown-string.
in 2.11, they say that "your oracle function" is "a function that encrypts data under an unknown key ---
that is, a function that generates a random key and encrypts under it."
The function should look like:
encryption_oracle(your-input)
=> [MEANINGLESS JIBBER JABBER]
so i think that just means that encryption_oracle is
#(aes-ecb-encrypt (pkcs7-pad % 16) key)
i don't think the "oracle function" is the one that does random padding and ecb/cbc decisions
ok. so "oracle function" here means "an aes ecb encryption function that appends this long
run of extra plaintext to the plaintext that you give it". i think i understand now.
wish they'd word this stuff more clearly!
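to make that reading concrete, a hedged sketch of the oracle and of recovering the first unknown byte. it uses the JDK's javax.crypto Cipher directly rather than the aes-ecb-encrypt and pkcs7-pad helpers mentioned above, and unknown-string is a placeholder because the real string is elided in the quoted problem text. all names here are hypothetical.

```clojure
(import '(javax.crypto Cipher)
        '(javax.crypto.spec SecretKeySpec)
        '(java.security SecureRandom))

(def block-size 16)

;; One random key, generated once and reused by every oracle call.
(def secret-key
  (let [k (byte-array block-size)]
    (.nextBytes (SecureRandom.) k)
    k))

;; Placeholder standing in for the challenge's unknown string.
(def unknown-string (.getBytes "hypothetical secret" "UTF-8"))

(defn encryption-oracle
  "AES-128-ECB of (input || unknown-string) under the fixed secret key.
  The JDK's PKCS5Padding mode performs PKCS#7 padding for 16-byte blocks."
  [^bytes input]
  (let [cipher (doto (Cipher/getInstance "AES/ECB/PKCS5Padding")
                 (.init Cipher/ENCRYPT_MODE (SecretKeySpec. secret-key "AES")))]
    (.doFinal cipher (byte-array (concat input unknown-string)))))

(defn first-block
  "First block of a ciphertext, as a vector so it can be compared with =."
  [^bytes ct]
  (vec (take block-size ct)))

(defn recover-first-byte
  "Find the byte b for which 15 filler bytes followed by b encrypt to the same
  first block as 15 filler bytes followed by the start of unknown-string."
  []
  (let [filler (repeat (dec block-size) (byte \A))
        target (first-block (encryption-oracle (byte-array filler)))]
    (first (for [b (range 256)
                 :let [guess (byte-array (concat filler [(unchecked-byte b)]))]
                 :when (= target (first-block (encryption-oracle guess)))]
             (unchecked-byte b)))))

;; Should equal the first byte of unknown-string.
(recover-first-byte)
```

this works because ECB is deterministic: the same 16-byte plaintext block under the same key always gives the same ciphertext block, so only the correct guess reproduces the target block.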