dev-diary.txt

12/23/17

1.1 and 1.2 were pretty straightforward

1.3 is less so

"The hex encoded string:
 1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736
 ... has been XOR'd against a single character. Find the key, decrypt the message."

what does it mean for a hex-encoded string to be XOR'd against a single character?
does it mean that a single character in that string has been XOR'd against another character?
(definitely not)

does it mean that the string has been XOR'd against an equally long string containing
a bunch of repetitions of a single character like 'b'? (maybe!)

and should we decode this thing from hex and print it out as a regular string first
to see what it looks like? (yes!)

ok
(apply str (map char foo))
=> "77316?x+x413=x9x(7-6<x7>x:9;76"

so i think what we're going to do is
for each character in \a, \b, \c, etc
make a string like 'aaaaaaaaaaaaaaaaaaaaa'
convert it to bytes
xor it against the input string
convert that to a regular string and print it out and see if it looks like english
if that doesn't work, also try uppercase characters, numbers, symbols

right now i have this

(let [foo (unhexify "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736")]
  (for [character (map byte "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")]
    (let [character-buffer (byte-array (repeat (count foo) character))]
      (bytes->str (fixed-xor foo character-buffer)))))
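
note on the helpers: unhexify, fixed-xor and bytes->str aren't written out in this diary.
here's a minimal sketch of what they might look like, assuming everything is passed around
as byte arrays. these bodies are guesses for illustration, not the actual project code.

```clojure
;; hypothetical versions of the helpers used above; the real ones may differ

(defn unhexify
  "Decode a hex string like \"1b37\" into a byte array."
  [s]
  (byte-array
    (map (fn [[hi lo]]
           ;; unchecked-byte wraps values above 127 into the signed byte range
           (unchecked-byte (Integer/parseInt (str hi lo) 16)))
         (partition 2 s))))

(defn fixed-xor
  "XOR two equal-length byte arrays, returning a new byte array."
  [a b]
  (byte-array (map (fn [x y] (unchecked-byte (bit-xor x y))) a b)))

(defn bytes->str
  "Turn a byte array into a string, one char per byte."
  [bs]
  (apply str (map #(char (bit-and % 0xff)) bs)))
```

with these, xor-ing the challenge bytes against a buffer of (byte \X) should give back
the "Cooking MC's like a pound of bacon" string found further down.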

one of the decoded outputs is
"cOOKING mcS LIKE A POUND OF BACON"

which seems pretty fishy
that didn't paste entirely correctly, there are some unrecognized characters in the output
oh

one of the last decoded messages is

 "Cooking MC's like a pound of bacon"

which (despite the apostrophe) is definitely the right decoding
so now let's come up with a string-scoring function

TODO make a test suite, and have each test implement one of the challenges

so why did that one character _almost_ decode the string?

[\x "cOOKING mcS LIKE A POUND OF BACON"]
[\X "Cooking MC's like a pound of bacon"]

(Integer/toString (int \x) 2)
=> "1111000"
(Integer/toString (int \X) 2)
=> "1011000"

the 32 bit is 1 for \x and 0 for \X

\x is 120 and \X is 88

so basically the \x key almost deciphered the plaintext but flipped the 32 bit of every character,
which swaps the case of each letter
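
a quick repl sketch of that 32 bit, to see what the wrong-case key does to each kind of
character. flip-32 is a made-up helper name. it also likely explains the unrecognized
characters in the \x output: the space and the apostrophe turn into control characters.

```clojure
;; XOR-ing a character with 32 flips its 0x20 bit
(defn flip-32 [c]
  (char (bit-xor (int c) 32)))

(bit-xor (int \x) (int \X)) ; => 32, the two keys differ only in that bit
(flip-32 \C)                ; => \c
(flip-32 \o)                ; => \O
(int (flip-32 \space))      ; => 0, a NUL character
(int (flip-32 \'))          ; => 7, the BEL control character
```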

ok so anyway let's take a look at character frequencies

here's the frequencies of the decoded string

{\space 6,
 \a 2,
 \b 1,
 \C 2,
 \c 1,
 \d 1,
 \e 1,
 \f 1,
 \' 1,
 \g 1,
 \i 2,
 \k 2,
 \l 1,
 \M 1,
 \n 3,
 \o 5,
 \p 1,
 \s 1,
 \u 1}

note the \space
and also, there's only one e in there
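
for reference, a map like the one above can be produced with clojure.core/frequencies.
small sketch, with decoded and freqs as throwaway names:

```clojure
(def decoded "Cooking MC's like a pound of bacon")
(def freqs (frequencies decoded))

(get freqs \space) ; => 6
(get freqs \o)     ; => 5
(get freqs \e)     ; => 1

;; three most common characters, highest count first
(take 3 (sort-by (comp - val) freqs))
;; => ([\space 6] [\o 5] [\n 3])
```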

so i dunno; the problem suggests using character frequencies as in etaoin shrdlu,
but i'm kind of inclined to do something more like: what percentage of the characters
in the string are alphanumeric? also, it's fine to have common symbols like spaces and
punctuation. but look at some of the incorrectly decoded strings:

 [\E "^rrvtsz=P^:n=qtvx=|=mrhsy=r{=|~rs"]
 [\F "]qquwpy>S]9m>rwu{>>nqkpz>qx>|}qp"]
 [\G "\\pptvqx?R\\8l?svtz?~?opjq{?py?}~|pq"]
 [\H "S{y~w0]S7c0|y{u0q0`e~t0v0rqs~"]
 [\I "R~~zxv1\\R6b1}xzt1p1a~du1~w1spr~"]
 [\J "Q}}y{|u2_Q5a2~{yw2s2b}g|v2}t2psq}|"]
 [\K "P||xz}t3^P4`3zxv3r3c|f}w3|u3qrp|}"]
 [\L "W{{}zs4YW3g4x}q4u4d{azp4{r4vuw{z"]
 [\M "Vzz~|{r5XV2f5y|~p5t5ez`{q5zs5wtvz{"]
 [\N "Uyy}xq6[U1e6z}s6w6fycxr6yp6twuyx"]

for each of those strings, the majority of the string is not a letter.

one thing i don't yet understand is: how do we do normalization? like, i want to have a
score-string function that returns a number between 0.0 and 1.0
i guess it involves dividing various things by the length of the string?
but like i guess my question is how do you combine your different scoring mechanisms
again i guess it's just like - (/ (apply + scores) (count scores))

so let's just start by calculating the percentage of characters in a string that are letters

ez

ok, ideas for signals

proportion of letters in string
proportion of lowercase letters in letters in string
proportion of punctuation in string
proportion of numbers in string
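
rough sketch of how the first two signals could be computed and averaged into one number
between 0.0 and 1.0. every name here (proportion, letter?, lowercase?, mean, score-string)
is made up for illustration, and the plain average is just the simplest way to combine them.

```clojure
(defn proportion
  "Fraction of the characters in s that satisfy pred, between 0.0 and 1.0."
  [pred s]
  (if (empty? s)
    0.0
    (double (/ (count (filter pred s)) (count s)))))

(defn letter? [c] (Character/isLetter (char c)))
(defn lowercase? [c] (Character/isLowerCase (char c)))

(defn mean [scores]
  (/ (apply + scores) (count scores)))

(defn score-string
  "Average of two signals: letters among all characters, and lowercase among letters."
  [s]
  (mean [(proportion letter? s)
         (proportion lowercase? (filter letter? s))]))

(score-string "Cooking MC's like a pound of bacon")
```

each signal is already divided by a length, so it sits in 0.0 to 1.0, and the mean of
such signals stays in that range too.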

not sure why git thinks this file is binary rather than text

made some progress on the scoring function - at least it exists and runs
but lots of bogus things get assigned high scores

tbh it's really looking like my current approach is bad and i should just go with char frequencies
TODO so i think i'll try that next

====

12/24/17

chi squared was a good find, thanks stackoverflow
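
for the record, a sketch of the general chi squared idea: compare observed letter counts
against the counts english would predict for the same number of letters. this is not the
scoring function from the project. the frequency table is approximate (numbers vary by
corpus), the names are made up, and the result is unnormalized, lower meaning a better fit.

```clojure
(require '[clojure.string :as string])

;; approximate relative letter frequencies in English text (sum is roughly 1.0)
(def english-freqs
  {\e 0.127 \t 0.091 \a 0.082 \o 0.075 \i 0.070 \n 0.067 \s 0.063
   \h 0.061 \r 0.060 \d 0.043 \l 0.040 \c 0.028 \u 0.028 \m 0.024
   \w 0.024 \f 0.022 \g 0.020 \y 0.020 \p 0.019 \b 0.015 \v 0.010
   \k 0.008 \j 0.0015 \x 0.0015 \q 0.001 \z 0.0007})

(defn chi-squared
  "Chi squared statistic of the letter counts in s against English
   expectations. Lower is a better fit. Non-letters are ignored."
  [s]
  (let [letters  (filter english-freqs (string/lower-case s))
        n        (count letters)
        observed (frequencies letters)]
    (if (zero? n)
      Double/POSITIVE_INFINITY
      (reduce + (for [[c p] english-freqs
                      :let [expected (* n p)
                            diff     (- (get observed c 0) expected)]]
                  (/ (* diff diff) expected))))))
```

since this version only looks at letters, a string full of symbols with a few plausible
letters can still score well, so it needs to be paired with some penalty for junk characters.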
next up: test suite

====

12/26/17

ok, found the correct decryption:
"Now that the party is jumping\n"

but our code isn't finding it atm.
i think the \n is throwing it off?

added \n to allowed characters, but the string still gets a score of 0.93
i'd like to be down around 0.8

ok i solved it by just super penalizing unrecognized characters
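
a sketch of what "super penalizing" could look like, assuming lower scores are better.
allowed-symbols, recognized? and penalized-score are hypothetical names, and the penalty
size is whatever is big enough to swamp the base score.

```clojure
;; characters besides letters and digits that are fine to see in plaintext
(def allowed-symbols (set " \n'\",.!?;:-"))

(defn recognized? [c]
  (or (Character/isLetterOrDigit (char c))
      (contains? allowed-symbols (char c))))

(defn penalized-score
  "Wrap a lower-is-better scoring function so that every unrecognized
   character adds a fixed penalty on top of the base score."
  [base-score penalty s]
  (+ (base-score s)
     (* penalty (count (remove recognized? s)))))

;; no unrecognized characters here, so the base score passes through unchanged
(penalized-score (constantly 0) 10 "Now that the party is jumping\n") ; => 0
```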

DONE attempt to cut down test time by figuring out which string 1.4 decoded from, and what key was used
to see if we can go back to alphanumeric ascii characters or if we really do need to do (range 128)
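
sketch of the (range 128) version of the key search: try every key, score each candidate
plaintext, keep the lowest. single-byte-xor and crack-single-byte-xor are made-up names, and
score is any lower-is-better function passed in by the caller.

```clojure
(defn single-byte-xor
  "XOR every byte in bs with the key k and return the result as a string."
  [bs k]
  (apply str (map #(char (bit-and (bit-xor % k) 0xff)) bs)))

(defn crack-single-byte-xor
  "Try every key in (range 128) and return [key plaintext] for the
   candidate with the lowest score."
  [score bs]
  (apply min-key (comp score second)
         (for [k (range 128)]
           [k (single-byte-xor bs k)])))
```

going back to only alphanumeric keys would mean swapping (range 128) for a smaller
collection of byte values; the rest stays the same.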

====

12/28/17

having a lot of trouble figuring out what they want me to do in 2.12

here's a bunch of text from the problems

; Copy your oracle function to a new function that encrypts buffers under ECB mode using
; a consistent but unknown key (for instance, assign a single random key, once, to a global variable).

; Now take that same function and have it append to the plaintext, BEFORE ENCRYPTING, the following string:

; Knowing the block size, craft an input block that is exactly 1 byte short (for instance,
; if the block size is 8 bytes, make "AAAAAAA"). Think about what the oracle function is
; going to put in that last byte position.

; Make a dictionary of every possible last byte by feeding different strings to the oracle;
; for instance, "AAAAAAAA", "AAAAAAAB", "AAAAAAAC", remembering the first block of each invocation.

; Match the output of the one-byte-short input to one of the entries in your dictionary.
; You've now discovered the first byte of unknown-string.
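
my reading of those last three steps as code, for the first byte only. this assumes oracle
is any function from byte array to byte array that appends the unknown string before
encrypting in ecb mode. first-block and first-unknown-byte are made-up names.

```clojure
(defn first-block
  "First block-size bytes of bs, as a vector so it can be used as a map key."
  [bs block-size]
  (vec (take block-size bs)))

(defn first-unknown-byte
  "Recover the first byte of the string the oracle appends to its input."
  [oracle block-size]
  (let [;; input that is exactly one byte short of a full block
        prefix     (repeat (dec block-size) (byte \A))
        ;; the oracle fills the last position with the first unknown byte
        target     (first-block (oracle (byte-array prefix)) block-size)
        ;; first ciphertext block -> the candidate last byte that produced it
        dictionary (into {}
                         (for [b (range 256)]
                           [(first-block
                              (oracle (byte-array (concat prefix [(unchecked-byte b)])))
                              block-size)
                            (unchecked-byte b)]))]
    (get dictionary target)))
```

this works because ecb encrypts identical plaintext blocks to identical ciphertext blocks,
so the matching dictionary entry gives away the byte.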

in 2.11, they say that "your oracle function" is "a function that encrypts data under an unknown key ---
that is, a function that generates a random key and encrypts under it."

The function should look like:

encryption_oracle(your-input)
=> [MEANINGLESS JIBBER JABBER]

so i think that just means that encryption_oracle is

#(aes-ecb-encrypt (pkcs7-pad % 16) key)

i don't think the "oracle function" is the one that does random padding and ecb/cbc decisions

ok. so "oracle function" here means "an aes ecb encryption function that appends this long
run of extra plaintext to the plaintext that you give it". i think i understand now.
wish they'd word this stuff more clearly!
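
a sketch of an oracle with that meaning. it uses the jdk's javax.crypto directly instead of
the aes-ecb-encrypt and pkcs7-pad helpers from earlier (the jdk's "PKCS5Padding" does the
padding to 16-byte blocks for aes). secret-key, unknown-string and encryption-oracle are
names picked for illustration, and the appended string is a placeholder, not the one from
the challenge.

```clojure
(import '(javax.crypto Cipher)
        '(javax.crypto.spec SecretKeySpec)
        '(java.security SecureRandom))

;; one random 16-byte key, generated once and reused for every call
(defonce secret-key
  (let [k (byte-array 16)]
    (.nextBytes (SecureRandom.) k)
    k))

;; placeholder; the challenge supplies its own string to append
(def unknown-string (.getBytes "placeholder secret" "UTF-8"))

(defn encryption-oracle
  "AES-128-ECB encrypt (input followed by unknown-string) under the fixed key."
  [^bytes input]
  (let [cipher (doto (Cipher/getInstance "AES/ECB/PKCS5Padding")
                 (.init Cipher/ENCRYPT_MODE (SecretKeySpec. secret-key "AES")))]
    (.doFinal cipher (byte-array (concat input unknown-string)))))
```

the key is consistent but never exposed to the caller, so the same input always gives the
same ciphertext, which is what the dictionary step relies on.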