.. title: A Hint for Language-Specific Runtime Function Optimization in RPython's Meta-JIT
.. slug: record-known-result
.. date: 2022-06-01 15:00:00 UTC
.. tags: jit
.. category:
.. link:
.. description:
.. type: rest
.. author: Carl Friedrich Bolz-Tereick

RPython's meta-JIT cannot reason about the properties of many RPython
implementation-level functions, because those are often specific to the
different interpreters written in RPython. Being able to express
language-specific optimizations for these functions would allow the JIT to
reason more effectively about language-specific data structures and
potentially remove costly calls into the runtime support code. In this blog
post I will describe a new hint that makes it possible to express invariants
about the functions of the data structures used in a specific language
interpreter. These hints make it possible for the meta-JIT optimizer to do
more language-specific optimizations across function calls, without having to
change the meta-JIT itself.

Introduction and Background
===========================

RPython has a meta-JIT that can be applied to a variety of languages, one of
them being Python, in the form of PyPy (which is really two implementations,
one PyPy2, one PyPy3). Most modern languages have many built-in functions and
data types that are written in the implementation language (RPython in our
case, C in the case of CPython). We want the JIT optimizer to be able to
reason about these functions and data types.

One way to do this would be to extend the meta-JIT with a new
language-specific optimization pass that knows how to deal with these
functions. This approach, however, goes against the idea of a meta-JIT, since
we want the JIT to be language-independent.

So in the past we developed various tools that make it possible for the
authors of RPython interpreters to express various optimizations. These tools
take the form of *hints*. Hints are small changes in the RPython source of an
interpreter that communicate with the meta-JIT. Hints come in one of two major
forms: either they are special functions that are called at runtime from the
RPython code of the interpreter, or they are decorators that are applied to
the RPython functions that make up the interpreter. Both the special functions
and the decorators are part of the ``rpython.rlib.jit`` library.

One thing I want to stress here at the beginning: all these hints, and
everything I talk about in the rest of the blog post, are meant for the people
working on the RPython code that makes up the PyPy Python interpreter (or
other languages implemented in RPython). "Regular" Python code that just runs
on PyPy does not need these hints, and in fact cannot use them.


The ``@elidable`` Decorator
---------------------------

One very important decorator is called ``@elidable``. It makes it possible to
mark an RPython function as pure (actually it's a small generalization of
purity, but for the purpose of this post thinking of elidable functions as
pure is fine). This decorator was described `in a blog post in 2011`__, where
it was still called ``@purefunction``. The post later turned into a paper__.

.. __: https://www.pypy.org/posts/2011/03/controlling-tracing-of-interpreter-with-871085470935630424.html
.. __: https://www3.hhu.de/stups/downloads/pdf/BoCuFiLePeRi11.pdf

An elidable function will be constant-folded by the JIT if all its arguments
are constant. If it can't be removed because some arguments aren't constant,
it's still subject to `common subexpression elimination`__ (CSE): subsequent
calls to the same function with the same arguments can be removed and replaced
by the previous result.

.. __: https://en.wikipedia.org/wiki/Common_subexpression_elimination
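To make this concrete, here is a minimal sketch of what using the decorator
looks like. The function ``checksum`` is made up for illustration and is not
part of PyPy's runtime:

.. code:: python

    from rpython.rlib.jit import elidable

    @elidable
    def checksum(s):
        # a pure function of its argument: same input, same output,
        # no side effects. The JIT may therefore constant-fold a call
        # with a constant argument, and CSE duplicated calls.
        result = 0
        for c in s:
            result = (result * 31 + ord(c)) & 0xffffffff
        return result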
There are lots of examples of ``@elidable`` functions, both in RPython in
general and in the PyPy interpreter. In the post linked above they are used
for optimizing object operations in a tiny example interpreter.

But a lot of functionality that is more directly exposed to the Python
programmer is elidable too. Most string methods are elidable, for example:
``str.lower`` will always return the same result for a given argument, and if
you call it twice on the same string, it will return the same result both
times. So adding the ``@elidable`` decorator to the implementation of these
string methods allows the JIT to constant-fold them if they are applied to a
constant string, and to remove a second identical call.

Another big class of examples are all the implementation functions for our
big integer operations, which underlie the Python ``int`` type once the
values don't fit into a machine word anymore. On the implementation level we
implement those in an ``rbigint`` class, and most of its methods are elidable
too. This enables the JIT to constant-fold big integer addition, and to do
CSE on big integer arithmetic.

Limitations of ``@elidable``
-------------------------------

This is all very useful! But it's still only a limited set of facts that the
interpreter author can express to the JIT with this one function decorator.
Recently I merged a branch that adds two new hints that the interpreter
author can use to communicate possible optimizations to the meta-JIT! The
work has been an ongoing project for a while. So far, only the meta-JIT
support has been merged; in future enhancements we plan to apply the hints to
the PyPy Python interpreter where appropriate.


``record_known_result``
=======================

So, what are the new hints? One of them is called ``record_known_result``,
and that hint is what this blog post is about. The other is called
``record_exact_value``; it's conceptually quite simple, but it's much harder
to see how it can be used. It was implemented by Lin Cheng from Cornell, and
it is described (together with a possible optimization to PyPy that uses the
hint) in another paper__.

.. __: https://dl.acm.org/doi/10.1145/3368826.3377907

What is ``record_known_result`` used for? One of the limitations of
``@elidable`` is that there are often properties that connect *several*
function calls to each other. Sometimes there are functions that are inverses
of each other, so that ``f(g(x)) == x`` for all ``x`` (example: negation on
numbers is its own inverse, ``--x == x``). Sometimes functions are
idempotent__, which means that if you call the function several times you can
remove all but the first call. An example would be ``abs`` on numbers: after
the first call to ``abs`` the result is non-negative, so calling the function
again on the result has no effect, i.e. ``abs(abs(x)) == abs(x)``. These
properties could in theory be used by a hypothetical optimizer that knows
about them and about the functions involved. However, as described above, we
don't want to change the meta-JIT to add knowledge about interpreter-specific
functionality. So we wanted to add a hint that can express such properties to
the meta-JIT.

.. __: https://en.wikipedia.org/wiki/Idempotence#Idempotent_functions
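Both properties are easy to check on concrete values. The following
plain-Python snippet simply restates the two equations above; it is an
illustration, not RPython code:

.. code:: python

    def neg(x):
        return -x

    # negation is its own inverse: neg(neg(x)) == x
    assert neg(neg(17)) == 17

    # abs is idempotent: abs(abs(x)) == abs(x)
    assert abs(abs(-17)) == abs(-17)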
What could hints look like that make it possible to express these properties?
At first I experimented with declarative decorators like ``@idempotent`` and
``@is_inverse_of(func)``, but I felt that it wouldn't scale to add lots of
these decorators, and support for all of them, to the meta-JIT. In the end I
found a not fully obvious hint that is not a decorator, but that is powerful
enough to implement at least these last two and many similar patterns. This
hint piggy-backs on the existing CSE implementation for pure functions in the
meta-JIT.

The hint works as follows: it is a new function called
``record_known_result(result, func, *args)``. Calling that function has no
direct effect at runtime. Instead, it communicates to the meta-JIT that if it
sees a function call to ``func`` with the arguments ``*args``, it can replace
the result of that function call with ``result``.

Since this is pretty abstract, it's best to look at an example. How would you
use this to express the fact that ``rbigint_neg``, the function in the
runtime that is responsible for computing the negative value of a big
integer, is its own inverse? The implementation of ``rbigint_neg`` looks
roughly like this (it's actually a method and a tiny bit more complicated,
but not really significantly so):

.. code:: python

    @elidable
    def rbigint_neg(self):
        return rbigint(digits=self.digits, sign=-self.sign)

If we want to use the new hint to express that
``rbigint_neg(rbigint_neg(x)) == x``, we need to rewrite the function
somewhat, by introducing a pure helper function that does the actual
computation, and turning the original function into a wrapper that calls the
helper:

.. code:: python

    @elidable
    def _rbigint_neg_helper(self):
        return rbigint(digits=self.digits, sign=-self.sign)

    def rbigint_neg(self):
        res = _rbigint_neg_helper(self)
        record_known_result(self, _rbigint_neg_helper, res)
        return res

``record_known_result`` is a new function in the ``rpython.rlib.jit`` library
with the signature ``record_known_result(result, function, *args)``. What
does this function do? Outside of the JIT, a call to it is simply ignored.
But when we trace the ``rbigint_neg`` function, the hint tells the JIT the
following: if at any point in the future (meaning further down the trace) we
see another call to ``_rbigint_neg_helper`` with ``res`` as the argument, we
can replace that call directly with ``self``. This is exactly the property
that ``_rbigint_neg_helper`` is its own inverse.

As another example, let's express the idempotence of ``bytes.lower``. We can
imagine the implementation looking something like this (`the "real"
implementation`__ is actually quite different, if only because we don't want
the extra copy that ``bytes.join`` makes):

.. __: https://foss.heptapod.net/pypy/pypy/-/blob/ab597702f7d9a267d3ae7c3fc91a5f25cd36a12e/rpython/rtyper/lltypesystem/rstr.py#L526

.. code:: python

    @elidable
    def bytes_lower(b):
        # the real implementation looks very different,
        # this is just an illustration!
        res = ['\x00'] * len(b)
        for i, c in enumerate(b):
            if 'A' <= c <= 'Z':
                c = chr(ord(c) - ord('A') + ord('a'))
            res[i] = c
        return b"".join(res)

To express that the function is idempotent, we need to express that
``bytes_lower(bytes_lower(b)) == bytes_lower(b)``. We do this with the same
approach: move the implementation into a helper function, call the helper
from the original function, and call ``record_known_result`` too:

.. code:: python

    @elidable
    def _bytes_lower_helper(b):
        ... # as above

    def bytes_lower(b):
        res = _bytes_lower_helper(b)
        record_known_result(res, _bytes_lower_helper, res)
        return res

This tells the meta-JIT that if ``res`` is later itself passed to
``_bytes_lower_helper``, it can remove that call and replace it immediately
with ``res`` (because ``res`` is already all lower-cased, as it's the result
of a call to ``lower``), i.e. that ``_bytes_lower_helper`` is idempotent.
(There are also other properties of ``lower`` and ``upper`` that we could
express in this way, for example that
``bytes.lower(bytes.upper(x)) == bytes.lower(x)``, but let's leave it at that
for now.)

Both of these usage patterns of ``record_known_result`` could of course also
be pulled out into general decorators again. For example a generic
``@idempotent`` decorator could be implemented like this:

.. code:: python

    def idempotent(func):
        # idempotent implies elidable
        func = elidable(func)
        def wrapper(arg):
            res = func(arg)
            record_known_result(res, func, res)
            return res
        return wrapper

Then the decorator could be used like this for ``bytes_lower``:

.. code:: python

    @idempotent
    def bytes_lower(b):
        # implementation as in the original code above
        ...
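The inverse pattern could be packaged up in the same way. Here is a sketch of
what the ``@is_inverse_of(func)`` decorator mentioned earlier might look
like; it is not part of ``rpython.rlib.jit``, just an illustration of the
pattern:

.. code:: python

    def is_inverse_of(inverse_func):
        # expresses that inverse_func(func(x)) == x for all x
        def decorator(func):
            func = elidable(func)
            def wrapper(arg):
                res = func(arg)
                # if inverse_func is later applied to res, that call
                # can be replaced by the original argument
                record_known_result(arg, inverse_func, res)
                return res
            return wrapper
        return decorator

For a function that is its own inverse, like ``_rbigint_neg_helper``, the
decorator form is a bit awkward, because the decorated function would have to
refer to itself. This is one reason why the plain function-call hint composes
better than a collection of special-purpose decorators.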
Implementing ``record_known_result``
====================================

How is ``record_known_result`` implemented? As I wrote above, the
implementation of the hint builds on the existing support for ``elidable``
functions in the optimizer of the meta-JIT. There are several optimizations
that do something with elidable function calls: `constant folding`__, CSE__,
and `dead code elimination`__. Let's look at how these work on ``elidable``
functions:

- Constant folding removes calls to elidable functions with constant
  arguments (technically this is a bit complicated, but conceptually this is
  what happens).
- CSE will replace a call to an elidable function by a previous result, if
  the same call appears a second time further down the trace.
- Dead code elimination will remove elidable function calls in the trace that
  have unused results.

.. __: https://en.wikipedia.org/wiki/Constant_folding
.. __: https://en.wikipedia.org/wiki/Common_subexpression_elimination
.. __: https://en.wikipedia.org/wiki/Dead_code_elimination

So a trace like this:

.. code:: python

    r1 = call_elidable((f), (1)) # constant-folded to 17
    r2 = call_elidable((g), a, b)
    r3 = call_elidable((g), a, b) # replaced by r2
    r4 = call_elidable((h), c, d) # removed, result unused
    print(r1, r2, r3)

will be optimized to:

.. code:: python

    r2 = call_elidable((g), a, b)
    print((17), r2, r2)

Some general notes about these traces: they are in `single-static-assignment
form`__ (SSA), meaning that every variable is assigned to only once. `¹`_
They are also slightly simplified compared to "real" traces.

.. __: https://en.wikipedia.org/wiki/Static_single_assignment_form

Let's look at how the CSE pass that optimizes elidable calls (which is part
of the meta-JIT) works. In pseudocode it could look something like this:

.. code:: python

    def cse_elidable_calls(trace):
        seen_calls = {}
        output_trace = []
        for op in trace:
            if is_call_elidable(op):
                # op.args are the function,
                # followed by the arguments,
                # which are variables or constants
                key = op.args
                previous_op = seen_calls.get(key)
                if previous_op is not None:
                    replace_result_with(op, previous_op)
                    # don't need to emit the op
                    continue
                else:
                    seen_calls[key] = op
            output_trace.append(op)
        return output_trace

There is quite a bit of hand-waving here, particularly around how
``replace_result_with`` can work. But this is conceptually what the real
optimization does. `²`_
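To make the hand-waving a little more concrete, here is a self-contained toy
version of the pass. It represents every operation as a tuple
``(result_var, function, *args)``, assumes all calls are elidable, and
implements ``replace_result_with`` with a plain substitution dictionary
instead of the union-find structure the real optimizer uses (see footnote ²).
This is purely illustrative:

.. code:: python

    def cse_elidable_calls_toy(trace):
        seen_calls = {}    # (function, args...) -> result variable
        replacements = {}  # removed variable -> replacement variable
        output_trace = []
        for result, func, *args in trace:
            # rewrite arguments of later ops; this is the toy version
            # of replace_result_with
            args = tuple(replacements.get(a, a) for a in args)
            key = (func,) + args
            if key in seen_calls:
                # same elidable call seen before: drop the op and
                # remember to replace its result everywhere
                replacements[result] = seen_calls[key]
                continue
            seen_calls[key] = result
            output_trace.append((result, func) + args)
        return output_trace

    trace = [
        ("r2", "g", "a", "b"),
        ("r3", "g", "a", "b"),   # duplicate call, removed
        ("r4", "h", "r3", "c"),  # r3 is rewritten to r2
    ]
    assert cse_elidable_calls_toy(trace) == [
        ("r2", "g", "a", "b"),
        ("r4", "h", "r2", "c"),
    ]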
Making use of the information provided by ``record_known_result`` is done by
changing this CSE pass. Let's say you trace something like this:

.. code:: python

    x = bytes_lower(s)
    ... some other code ...
    y = bytes_lower(x)
    print(x, y)

This should trigger the idempotence optimization. The resulting trace could
look like this:

.. code:: python

    # bytes_lower itself is inlined into the trace:
    r1 = call_elidable((_bytes_lower_helper), s1)
    record_known_result(r1, (_bytes_lower_helper), r1)
    ... intermediate operations ...

    # second call to bytes_lower inlined into the trace:
    r2 = call_elidable((_bytes_lower_helper), r1)
    record_known_result(r2, (_bytes_lower_helper), r2)
    print(r1, r2)

The CSE pass on elidable functions will now optimize away the call that
produces ``r2``. It does this not by replacing ``r2`` with the result of a
previous call to ``_bytes_lower_helper`` with the same arguments (such a call
doesn't exist), but by making use of the information conveyed by the first
``record_known_result`` trace operation. That operation states that if you
see a call like the second ``_bytes_lower_helper`` one, you can replace it
with ``r1``. The resulting optimized trace therefore looks like this:

.. code:: python

    r1 = call_elidable((_bytes_lower_helper), s1)
    ... intermediate operations, optimized ...
    # call removed, r2 replaced with r1
    print(r1, r1)

The ``record_known_result`` operations are also removed, because further
optimization passes and the backends don't need them. To get this effect, we
have to change the pseudocode above to teach the CSE pass about
``record_known_result`` operations in the following way:

.. code:: python

    def cse_elidable_calls(trace):
        seen_calls = {}
        output_trace = []
        for op in trace:
            # <---- start new code
            if is_record_known_result(op):
                # remove the first argument,
                # which is the result
                key = op.args[1:]
                # the remaining key is the function called,
                # followed by the arguments, like below
                seen_calls[key] = op.args[0]
                # don't emit the record_known_result op
                continue
            # end new code ---->
            if is_call_elidable(op):
                # op.args are the function,
                # followed by the arguments,
                # which are variables or constants
                key = op.args
                previous_op = seen_calls.get(key)
                if previous_op is not None:
                    replace_result_with(op, previous_op)
                    # don't need to emit the op
                    continue
                else:
                    seen_calls[key] = op
            output_trace.append(op)
        return output_trace

That's all! From the point of view of the implementation of CSE for elidable
functions, the new hint is actually very natural.

In the case of function inverses, dead code elimination also plays an
important role. Let's look at the trace of a double negation, coming from
code like ``x = -y; ...; print(-x)``:

.. code:: python

    r1 = call_elidable((_rbigint_neg_helper), a1)
    record_known_result(a1, (_rbigint_neg_helper), r1)
    ... intermediate stuff ...
    r2 = call_elidable((_rbigint_neg_helper), r1)
    record_known_result(r1, (_rbigint_neg_helper), r2)
    print(r2)

After CSE, the second call is removed because ``r2`` was found to be the same
as ``a1``, and the trace looks like this:

.. code:: python

    r1 = call_elidable((_rbigint_neg_helper), a1) # dead
    ... intermediate stuff, CSEd ...
    # call removed
    print(a1)

Now dead code elimination notices that the first call is not needed any more
either, and removes it.

What is good about this design? It ties very neatly into the existing
infrastructure and amounts to only about 100 lines of changes in the
meta-JIT. The amount of work the optimizer does stays essentially the same,
as the new hints are basically directly usable by the CSE pass, which we run
anyway.

Performance effects
===================

So far, we haven't actually used this new hint in PyPy much. At this point,
the hint is only a new tool in the interpreter author's toolbox, and we still
need to find the best places to use this tool. The only use of the hint so
far is an annotation that tells the JIT that encoding and decoding to and
from utf-8 are inverses of each other, to be able to optimize code like
``x = someunicode.encode("utf-8").decode("utf-8")`` by replacing ``x`` with
``someunicode`` (of course in practice there is usually again some distance
between the encode and decode calls). This pattern occurs in a number of
places in real code that I saw, but I haven't done a careful study of the
performance effect yet.
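A sketch of what such an annotation could look like, under the simplifying
assumption that there are two elidable helpers ``utf8_encode`` and
``utf8_decode`` (the real functions in the RPython runtime are named and
structured differently, and also have to deal with errors and error
handlers):

.. code:: python

    @elidable
    def utf8_encode(u):
        ...  # returns the utf-8 bytes of the unicode string u

    @elidable
    def utf8_decode(b):
        ...  # returns the unicode string for the utf-8 bytes b

    def encode_utf8(u):
        res = utf8_encode(u)
        # decoding the encoded bytes gives back the original string
        record_known_result(u, utf8_decode, res)
        return res

    def decode_utf8(b):
        res = utf8_decode(b)
        # re-encoding the decoded string gives back the original
        # (valid) utf-8 bytes
        record_known_result(b, utf8_encode, res)
        return res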
Limitations
=============

What are the problems and limitations of the approach I described in this
post?

Correctness remains tricky! If you write wrong hints, the meta-JIT will
potentially miscompile your users' programs. To at least get some signal for
that, ``record_known_result`` actually performs the hinted call and does an
assert on the result if you run the program untranslated while executing
tests. In combination with, for example, property-based testing this can find
a lot of the bugs, but it is of course no guarantee.
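Conceptually, the untranslated checking behaviour is something like the
following sketch (the real code in ``rpython.rlib.jit`` is structured
differently, and the comparison is simplified here):

.. code:: python

    def record_known_result(result, func, *args):
        # untranslated: re-run the hinted call and check that the
        # promised result actually matches; after translation this
        # call has no effect outside of the JIT
        assert func(*args) == result

Running the interpreter's test suite untranslated then exercises this check
on every execution of a hint, which is how wrong hints can be caught before
they ever reach the JIT.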
Many things aren't expressible. The new hint is much less powerful than some
of the recent pattern-based optimization systems (e.g. `Metatheory.jl`__)
that allow library authors to express rewrites. Instead, we designed the hint
to fit minimally into the existing optimizers, at the cost of power and ease
of use. The most obvious limitation compared to pattern-based approaches is
that the ``record_known_result`` hint cannot quantify over unknown values; it
can only use values that are available in the program. As an example, it's
not really possible to express that ``bigint_sub(x, x) == bigint(0)`` for
*arbitrary* big integers ``x``.

.. __: https://arxiv.org/abs/2112.14714

Another limitation of the hint is that it is currently only applicable to
pure/elidable functions. This makes it not really applicable to any kind of
*mutable* data structure. As an example, in theory ``sorted(list)`` is
idempotent, but only as long as the lists involved aren't mutated between the
two calls to ``sorted``. Reasoning about mutation doesn't really fit into the
model easily. The meta-JIT itself is actually able to do a lot of tracking of
what kinds of mutations occurred and what the heap must look like, but we
haven't found a good way to combine this available information with
user-provided information about function behaviour.

Conclusion
==============

We added two new hints to RPython's meta-JIT that allow the interpreter
author to express language-specific optimizations. We are still getting used
to these new hints and their possible applications, and we will need to
collect more experience about how big the performance implications are in
practice for real programs.

Footnotes
------------

.. _`¹`:

¹ In fact, there is not really a concept of "variable" at all; instead,
variables are identified with the operations that produce them.

.. _`²`:

² Some details on the hand-waving: replacing ops with other ops is
implemented using a union-find__ data structure, to allow arbitrary
replacements to be done efficiently. These replacements need to influence the
lookup in the ``seen_calls`` dict, so in practice it's not even a dictionary
at all. Another way in which the pseudocode is simplified is that we don't
actually have tiny passes like this that go over the trace again and again.
Instead, we have a single optimization pass that goes over the trace in the
forward direction once.

.. __: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
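To make footnote ² a little more concrete, here is a minimal union-find
sketch of the kind alluded to there. It is illustrative only, not RPython's
actual implementation:

.. code:: python

    class UnionFind(object):
        """Maps each operation to its current representative, i.e. the
        operation (or value) it has been replaced with."""

        def __init__(self):
            self.parent = {}

        def find(self, op):
            # walk to the representative, compressing the path
            root = op
            while root in self.parent:
                root = self.parent[root]
            while op in self.parent:
                self.parent[op], op = root, self.parent[op]
            return root

        def union(self, removed_op, replacement_op):
            # record that removed_op is now replaced by replacement_op
            root1 = self.find(removed_op)
            root2 = self.find(replacement_op)
            if root1 != root2:
                self.parent[root1] = root2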