-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathdecoding-the-sendgrid-event-id.html
More file actions
180 lines (170 loc) · 13.6 KB
/
decoding-the-sendgrid-event-id.html
File metadata and controls
180 lines (170 loc) · 13.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="https://blog.ttyout.net/theme/css/main.css">
<title>Decoding the SendGrid Event ID | Galloping Alligator</title>
<meta name="author" content="Emiel van de Laar">
<meta name="description" content="SendGrid is a managed email delivery service. The service is able to emit notifications when something interesting happens while processing your email. When enabled, the webhook will POST a collection of JSON encoded events to a URL of your choice. Each event …">
</head>
<body class="site-container">
<header class="site-header">
<h1><a href="https://blog.ttyout.net/">Galloping Alligator</a></h1>
<nav class="top-nav">
<a href="/">Writing</a>
<a href="/pages/about.html">About</a>
<a href="/pages/uses.html">Uses</a>
</nav>
</header>
<main class="site-content">
<article>
<header>
<h2>Decoding the SendGrid Event ID</h2>
<small>
<em>
Posted on <time datetime="2017-06-11T21:30:00+02:00">2017-06-11</time>
</em>
</small>
</header>
<div class="article-content">
<p><a class="reference external" href="https://sendgrid.com">SendGrid</a> is a managed email delivery service. The service is able to emit
notifications when something interesting happens while processing your email.
When enabled, the <a class="reference external" href="https://sendgrid.com/docs/API_Reference/Webhooks/event.html">webhook</a> will POST a collection of JSON encoded events to a
URL of your choice. Each event includes a unique identifier and is the topic of
this post.</p>
<blockquote>
Note: The following are purely my observations and corrections are welcome.</blockquote>
<p>Let's have a look at an example JSON event, particularly the <strong>sg_event_id</strong>
property.</p>
<div class="highlight"><pre><span></span><span class="p">{</span>
<span class="w"> </span><span class="s2">"email"</span><span class="o">:</span><span class="w"> </span><span class="s2">"foo@example.com"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"event"</span><span class="o">:</span><span class="w"> </span><span class="s2">"processed"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"sg_event_id"</span><span class="o">:</span><span class="w"> </span><span class="s2">"aM7rXgTYTN-GCHRFrdsP_g"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"sg_message_id"</span><span class="o">:</span><span class="w"> </span><span class="s2">"<snip>"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"smtp-id"</span><span class="o">:</span><span class="w"> </span><span class="s2">"<snip>"</span><span class="p">,</span>
<span class="w"> </span><span class="s2">"timestamp"</span><span class="o">:</span><span class="w"> </span><span class="mf">1492680648</span>
<span class="p">}</span>
</pre></div>
<p>In our environment we have observed that the <em>sg_event_id</em> comes in two
different flavours (lengths). One that is <em>22 characaters long</em> and a second
that is <em>48 characters long</em>. For example:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">aM7rXgTYTN-GCHRFrdsP_g</span></tt></li>
<li><tt class="docutils literal">ZjVhYzRhMWQtNzAzZi00ODdlLWE0YWEtYTZhNThhYWQ4OTVk</tt></li>
</ul>
<div class="section" id="quick-uuid-intro">
<h2>Quick UUID Intro</h2>
<p>While investigating the <em>sg_event_id</em> I was tipped that these were actually
UUIDs. A <a class="reference external" href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a> is 128 bit number (16 octets) that is usually presented as text,
e.g. <em>f5ac4a1d-703f-487e-a4aa-a6a58aad895d</em>. If you're not familiar with UUIDs
I encourage you to give the Wikipedia page a quick glance before continuing.</p>
</div>
<div class="section" id="character-event-id">
<h2>22 Character Event ID</h2>
<p>Let's tackle the 22 character format first by poking at it in the Python shell.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="n">buf</span> <span class="o">=</span> <span class="s1">'aM7rXgTYTN-GCHRFrdsP_g'</span>
</pre></div>
<p>The string looks like it might be <a class="reference external" href="https://en.wikipedia.org/wiki/Base64">Base64</a> encoded due to the characters
(alphabet) used (A-Z, a-z, 0-9). There are a number of Base64 encoding
alternatives that treat index values 62 and 63 differently and we need to make
sure we're using the right one. I went through our collection of event IDs and
was able to identify many having both a <strong>minus</strong> and an <strong>underscore</strong>
character. This indicates the <a class="reference external" href="https://tools.ietf.org/html/rfc4648#section-5">Base64url</a> variant is good candidate.</p>
<p>Let's pull in the <tt class="docutils literal">base64</tt> library and attempt to decode the event ID.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">base64</span>
<span class="o">>>></span> <span class="n">base64</span><span class="o">.</span><span class="n">urlsafe_b64decode</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
<span class="p">[</span><span class="n">snip</span><span class="p">]</span>
<span class="ne">TypeError</span><span class="p">:</span> <span class="n">Incorrect</span> <span class="n">padding</span>
</pre></div>
<p>Python gives us an <tt class="docutils literal">Incorrect padding</tt> error. We'll skip a thorough break
down of Base64 <a class="reference external" href="https://en.wikipedia.org/wiki/Base64#Output_padding">padding</a> but of interest is the fact that some implementations
discard the padding in the output because the number of missing bytes can be
determined by the number of Base64 characters. In addition
<a class="reference external" href="https://tools.ietf.org/html/rfc4648#section-5">RFC4648 § 5</a> gives us a
hint as to why one might do that.</p>
<blockquote>
The pad character "=" is typically percent-encoded when used in an
URI [9], but if the data length is known implicitly, this can be
avoided by skipping the padding; ...</blockquote>
<p>Let's see if we can figure out how much padding might have been discarded.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="nb">len</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> <span class="o">%</span> <span class="mi">4</span>
<span class="mi">2</span>
<span class="c1"># '[aM7r][XgTY][TN-G][CHRF][rdsP][_g ]'</span>
<span class="c1"># discarded padding --------^^</span>
</pre></div>
<p>So according to the spec we need to append two padding characters, i.e. "==",
to our our <em>sg_event_id</em>. Will it now decode properly?</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="n">base64</span><span class="o">.</span><span class="n">urlsafe_b64decode</span><span class="p">(</span><span class="n">buf</span> <span class="o">+</span> <span class="s1">'=='</span><span class="p">)</span>
<span class="s1">'h</span><span class="se">\xce\xeb</span><span class="s1">^</span><span class="se">\x04\xd8</span><span class="s1">L</span><span class="se">\xdf\x86\x08</span><span class="s1">tE</span><span class="se">\xad\xdb\x0f\xfe</span><span class="s1">'</span>
</pre></div>
<p>Bingo, we have some bytes! Sixteen to be exact (you may check using len()).
Let's also encode as hex to make it a bit more readable.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="s1">'h</span><span class="se">\xce\xeb</span><span class="s1">^</span><span class="se">\x04\xd8</span><span class="s1">L</span><span class="se">\xdf\x86\x08</span><span class="s1">tE</span><span class="se">\xad\xdb\x0f\xfe</span><span class="s1">'</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'hex'</span><span class="p">)</span>
<span class="s1">'68ceeb5e04d84cdf86087445addb0ffe'</span>
</pre></div>
<p>From our intro we know that a UUID is text format representing 16 bytes. Let's
see if we can plug these bytes in and get a sensible UUID out.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="kn">import</span> <span class="nn">uuid</span>
<span class="o">>>></span> <span class="n">eid</span> <span class="o">=</span> <span class="n">uuid</span><span class="o">.</span><span class="n">UUID</span><span class="p">(</span><span class="nb">bytes</span><span class="o">=</span><span class="s1">'h</span><span class="se">\xce\xeb</span><span class="s1">^</span><span class="se">\x04\xd8</span><span class="s1">L</span><span class="se">\xdf\x86\x08</span><span class="s1">tE</span><span class="se">\xad\xdb\x0f\xfe</span><span class="s1">'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">eid</span>
<span class="n">UUID</span><span class="p">(</span><span class="s1">'68ceeb5e-04d8-4cdf-8608-7445addb0ffe'</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">assert</span> <span class="n">eid</span><span class="o">.</span><span class="n">variant</span> <span class="o">==</span> <span class="n">uuid</span><span class="o">.</span><span class="n">RFC_4122</span> <span class="ow">and</span> <span class="n">eid</span><span class="o">.</span><span class="n">version</span> <span class="o">==</span> <span class="mi">4</span>
</pre></div>
<p>That appears to check out.</p>
</div>
<div class="section" id="character-event-id-1">
<h2>48 Character Event ID</h2>
<p>Now let's have a look at the <em>sg_event_id</em> having 48 characters.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="n">buf</span> <span class="o">=</span> <span class="s2">"ZjVhYzRhMWQtNzAzZi00ODdlLWE0YWEtYTZhNThhYWQ4OTVk"</span>
</pre></div>
<p>Again this looks like it is Base64 encoded or some variant thereof. Lets just
give it a shot.</p>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="n">base64</span><span class="o">.</span><span class="n">b64decode</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
<span class="s1">'f5ac4a1d-703f-487e-a4aa-a6a58aad895d'</span>
</pre></div>
<p>Hey that looks familiar. It appears to be a UUIDv4 encoded string. Let's build
a UUID from the Base64 decoded string and see if it checks out.</p>
<blockquote>
Note: I was unable to determine which variant of Base64 is used for this
format. We've yet to see any special characters outside of A-Z, a-z, 0-9
alphabet.</blockquote>
<div class="highlight"><pre><span></span><span class="o">>>></span> <span class="n">eid</span> <span class="o">=</span> <span class="n">uuid</span><span class="o">.</span><span class="n">UUID</span><span class="p">(</span><span class="s1">'f5ac4a1d-703f-487e-a4aa-a6a58aad895d'</span><span class="p">)</span>
<span class="o">>>></span> <span class="n">eid</span>
<span class="n">UUID</span><span class="p">(</span><span class="s1">'f5ac4a1d-703f-487e-a4aa-a6a58aad895d'</span><span class="p">)</span>
<span class="o">>>></span> <span class="k">assert</span> <span class="n">eid</span><span class="o">.</span><span class="n">variant</span> <span class="o">==</span> <span class="n">uuid</span><span class="o">.</span><span class="n">RFC_4122</span> <span class="ow">and</span> <span class="n">eid</span><span class="o">.</span><span class="n">version</span> <span class="o">==</span> <span class="mi">4</span>
</pre></div>
<p>That appears to check out as well.</p>
</div>
<div class="section" id="wrapping-up">
<h2>Wrapping Up</h2>
<p>I've applied the above decoding to all the events we've collected so far and
every event id looks to be a valid UUIDv4 thus I'm fairly confident this is a
valid decoding. I initially asked SendGrid support if they could point me to
some documentation or clarify the difference in the format. I didn't get a
clear answer but did mention these were generated by different systems.</p>
<p>Why the SendGrid UUIDs are Base64 encoded is a bit puzzling to me. A UUID
string is already URL safe because it consists of only the characters 0-9, a-f
and "-". The short format (22 chars) does take you from 32 chars (UUID string)
to 22 chars because the underlying 128 bit number is encoded. However, Base64
encoding a UUID string is going in the wrong direction as it takes you from 32
chars (UUID string) to 48.</p>
<p>This excercise has resulted in a <a class="reference external" href="https://gist.github.com/emiel/99e5c103dfffaf05629ca305ff546c18">Python implementation</a> and a <a class="reference external" href="https://gist.github.com/emiel/49aa93baab83a55f17dca4f7d790a067">Postgres
implementation</a>. Feel free to use them.</p>
<p>A final warning: SendGrid offers testing functionality to emit example
events. The <em>sg_event_id</em> in these events has <em>24 characters</em> and is the 22
character variant with the padding included.</p>
</div>
</div>
</article>
</main>
<footer class="site-footer">
<nav>
<a href="https://github.com/emiel" ref="me">GitHub</a>
<a href="https://hachyderm.io/@emiel" ref="me">Mastodon</a>
<a href="mailto:gemiel@gmail.com" ref="me">e-mail</a>
</nav>
<p class="copyright">© 2014–2025 Emiel van de Laar</p>
</footer>
</body>
</html>