Description
As noted in our security docs we currently have no protection against a billion laughs attack by FTL authors, either at compile-time or run-time.
Note the attack vector here is a malicious FTL author, which might be unlikely but should be considered for some usage scenarios. We are not talking about runtime issues where the attacker controls only the substitution, not the FTL message.
Compile-time
Example
-term1 = lol
-term2 = {-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}{-term1}
-term3 = {-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}{-term2}
# etc
message = {-term9}
Due to our current strategy of inlining all terms and simplifying, this will attempt to generate a function like:
def message(args, errors):
    return "lollollollollollollollollollollollollollollollollollol..."
and you'll use up a lot of memory at compile time.
We could protect against this with a combination of a depth counter and a reference counter in the compiler, bailing out when we hit the limits. In real-world FTL, there is very rarely a need to have lots of references to other items, or deeply nested references.
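A minimal sketch of the depth counter and reference counter idea, using a toy model rather than the real compiler AST. The names `inline_term`, `build_laughs`, `ExpansionLimitExceeded`, and the default limits are all hypothetical; in the real compiler the caller would presumably catch the exception and record a compile error instead of letting it propagate.

```python
class ExpansionLimitExceeded(Exception):
    """Raised internally when inlining a term would exceed a limit."""


def inline_term(name, terms, max_depth=4, max_refs=10, _depth=0):
    """Return the fully inlined text of term `name`.

    `terms` maps a term name to a list of parts; each part is either
    ("text", str) or ("ref", term_name). This is a toy model only.
    """
    # Depth counter: how many levels of term references we have followed.
    if _depth > max_depth:
        raise ExpansionLimitExceeded(f"-{name}: nesting deeper than {max_depth}")
    parts = terms[name]
    # Reference counter: how many references this single term contains.
    ref_count = sum(1 for kind, _ in parts if kind == "ref")
    if ref_count > max_refs:
        raise ExpansionLimitExceeded(f"-{name}: more than {max_refs} references")
    out = []
    for kind, value in parts:
        if kind == "text":
            out.append(value)
        else:
            out.append(inline_term(value, terms, max_depth, max_refs, _depth + 1))
    return "".join(out)


def build_laughs(levels, fanout=10):
    """Build the -term1 .. -term<levels> chain from the example above."""
    terms = {"term1": [("text", "lol")]}
    for i in range(2, levels + 1):
        terms[f"term{i}"] = [("ref", f"term{i - 1}")] * fanout
    return terms
```

With these defaults, `inline_term("term3", build_laughs(9))` still succeeds, while `inline_term("term9", ...)` fails on the first branch it descends, before any large string is built.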
Run-time
We don't inline messages at the call site, so the equivalent with messages would produce a run-time issue:
msg1 = lol
msg2 = {msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}{msg1}
msg3 = {msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}{msg2}
# etc.
Which compiles to something like:
def msg1(args, errors):
    return "lol"
def msg2(args, errors):
    return f'{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}{msg1(args, errors)}'
# etc
Attempting to use the last function in the chain would produce a very large string at runtime.
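To put a number on "very large": each level multiplies the output length by the number of references, so the size can be computed without building the string. The helper name `expanded_length` is made up for this illustration, and it assumes the chain above (a 3-character base message and 10 references per level).

```python
def expanded_length(level, base_len=3, fanout=10):
    """Length of the string returned by msg<level>, without building it."""
    return base_len * fanout ** (level - 1)


for level in (1, 2, 3, 9):
    print(f"msg{level}: {expanded_length(level):,} characters")
```

Under these assumptions a chain of nine messages already yields 300,000,000 characters from a few hundred bytes of FTL.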
We could address this in two ways:
- At run-time: the compiled code for each message could check call depth in some way (e.g. by a passed-in current_depth parameter). This would be a performance hit on every message, and relatively speaking a very large one for the common case.
- At compile time, by:
  - Noting that we already disallow cycles, i.e. recursion or mutual recursion.
  - We can therefore produce a total ordering of the functions we need to generate in terms of dependency on other functions. (See elm-fluent, which shares a lot of code with python-compiler in the copy-paste sense, and does this ordering of functions. Some of these could be copied over easily - https://github.com/elm-fluent/elm-fluent/blob/777477bea84b475d0489032fa923b71f32f15c88/src/elm_fluent/compiler.py#L212 and https://github.com/elm-fluent/elm-fluent/blob/777477bea84b475d0489032fa923b71f32f15c88/src/elm_fluent/compiler.py#L580)
  - We go to the bottom (functions that call no other functions) and label those functions with calls_others_depth = 0.
  - We then go up the chain, assigning calls_others_depth = 1 + max(function.calls_others_depth for function in this_function.functions_that_i_call) to each function. (Without the + 1, every function would end up with depth 0.)
  - We can then impose some kind of low limit on this depth (e.g. 4).
  - In addition, we apply a low limit on the number of substitutions allowed per message (e.g. 10).
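The compile-time steps could be sketched roughly as follows. This is not the elm-fluent code; it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) for the ordering. The function names, the shape of the `calls` mapping, and the default limits are assumptions. The depth recurrence adds 1 per level so that callers always end up deeper than their callees.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def label_call_depths(calls):
    """Map each function name to its calls_others_depth.

    `calls` maps a function name to the list of functions it calls,
    one entry per call site (so duplicates are expected).
    """
    graph = {name: set(callees) for name, callees in calls.items()}
    depths = {}
    # static_order() yields callees before their callers, and raises
    # graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        callees = graph.get(name, ())
        depths[name] = (1 + max(depths[c] for c in callees)) if callees else 0
    return depths


def check_limits(calls, max_depth=4, max_substitutions=10):
    """Return a list of (function_name, problem) pairs; empty if all pass."""
    depths = label_call_depths(calls)
    problems = []
    for name, callees in calls.items():
        if depths[name] > max_depth:
            problems.append((name, f"call depth {depths[name]} exceeds {max_depth}"))
        if len(callees) > max_substitutions:
            problems.append(
                (name, f"{len(callees)} substitutions exceeds {max_substitutions}")
            )
    return problems
```

For the msg1, msg2, ... chain above, each message gets a depth one higher than the previous one, so with a limit of 4 the first message flagged would be msg6. All of this work happens once at compile time, so the generated message functions pay no per-call cost.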
We may need to make some of these limits configurable.
As per normal fluent rules, we should not bail out with exceptions in these cases, but produce message functions that:
- have truncated output
- emit errors at compile-time and/or run-time as appropriate (normally both)
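As an illustration of that behaviour, here is a toy sketch of a compile step that truncates a message at the substitution limit, reports a compile-time error, and produces a function that also appends a run-time error on every call. `compile_message`, `SubstitutionLimitError`, and the representation of parts are hypothetical and do not reflect the real compiler's API.

```python
class SubstitutionLimitError(ValueError):
    """Hypothetical error type for messages exceeding the substitution limit."""


def compile_message(name, parts, max_substitutions=10):
    """Return (message_function, compile_errors) for a toy message.

    `parts` is a list whose items are either literal strings or
    callables taking (args, errors) and returning a string.
    """
    compile_errors = []
    kept = []
    seen = 0
    truncated = False
    for part in parts:
        if callable(part):
            seen += 1
            if seen > max_substitutions:
                # Drop this substitution and everything after it.
                truncated = True
                break
        kept.append(part)

    text = f"{name}: more than {max_substitutions} substitutions; output truncated"
    if truncated:
        compile_errors.append(SubstitutionLimitError(text))

    def message(args, errors):
        if truncated:
            # Report at run-time too, rather than raising.
            errors.append(SubstitutionLimitError(text))
        return "".join(p(args, errors) if callable(p) else p for p in kept)

    return message, compile_errors
```

A message with 12 references to msg1 would then compile to a function that returns only the first 10 and adds one error to the errors list each time it is called.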