
leontrolski/jubbly


jubbly

Application code largely consists of:

  • Business logic.
  • Code that sends and retrieves data to databases and external APIs.

jubbly is a data-first applications language (as opposed to a systems language).

It's also not remotely complete.

Aims

  • Vaguely mypy/TypeScript-like typing, but much simplified and with comptime-like (importtime) capabilities. Types themselves are just Structs, meaning we don't need a special type-level language to transform types during typechecking, and types are easily serializable.
  • Immutable data by default. Everything is sortable and all values have a canonical hash. Immutability is made ergonomic with the novel alt keyword, effectively a generalised +=.
  • Serialize everything to/from bytes quickly and canonically for inserting into DBs/sending over the wire. Natively handle versioning and backwards compatibility of stored data and API data.
  • Interpreted language with an aim for optional compilation. Very small core AST with sugary CST on top. Deep Rust interop à la PyO3, but typed-ier. Consistent syntax.

Basic Example

pub fib = fn[n: I64]: I64 -> {
    if n <= 2 n - 1 else fib(n - 1) + fib(n - 2)
}
let Person = *[
    name: String,
    age: I64,
]
let oli = Person["Oli", age=33]
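For readers coming from the Rust side, here is a rough Rust analogue of the example above. It is an illustration only, not how jubbly compiles or interprets anything. It keeps the same base cases (fib(1) = 0, fib(2) = 1), and the derives on `Person` loosely echo the aim that every value is sortable and hashable.

```rust
/// Rough Rust analogue of the jubbly `fib` above (same base cases).
pub fn fib(n: i64) -> i64 {
    if n <= 2 { n - 1 } else { fib(n - 1) + fib(n - 2) }
}

/// Rough analogue of the `Person` struct: a plain, ordered, hashable record.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Person {
    pub name: String,
    pub age: i64,
}

/// jubbly: Person["Oli", age=33] mixes positional and keyword arguments;
/// Rust only has named field initialisation.
pub fn oli() -> Person {
    Person { name: "Oli".to_string(), age: 33 }
}
```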

Language Tour

There are Atoms, and there are composite types (for now, the composite types are based on types from the Rust crate im):

None, Bool, I64, F64, String
Vector[T], OrdSet[T], OrdMap[K, T], Struct

We can construct composite types like:

[1, 2, 3]
OrdSet[_][4, 5, 6]
[1 => "one", 2 => "two"]
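A sketch of the same three literals in Rust. Assumption: std's `Vec`, `BTreeSet` and `BTreeMap` stand in for im's `Vector`, `OrdSet` and `OrdMap` so the snippet needs no dependencies. The std types are ordered like im's but are not persistent (no structural sharing), so only the shape of the values carries over.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// [1, 2, 3]
pub fn vector() -> Vec<i64> {
    vec![1, 2, 3]
}

/// OrdSet[_][4, 5, 6]: the `_` element type is inferred, here as i64.
pub fn ord_set() -> BTreeSet<i64> {
    BTreeSet::from([4, 5, 6])
}

/// [1 => "one", 2 => "two"]
pub fn ord_map() -> BTreeMap<i64, &'static str> {
    BTreeMap::from([(1, "one"), (2, "two")])
}
```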

Structs are defined like:

pub Person = *[
    name: String,
    age: I64 = 42,
]

We construct Structs like:

Person["Oli", age=33]

Struct definitions are themselves just structs - the definition of Person above is actually just sugar for:

pub Person = Struct[
    "some/namespace:Person",
    [
        Field[0, "name", String, no_default],
        Field[1, "age", I64, 42],
    ],
]
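To make "a struct definition is itself just a value" concrete, here is a hypothetical Rust model of the desugared `Person` definition. The names `TypeRef`, `FieldDefault`, `StructDef` and `required_fields` are invented for illustration and are not jubbly's internals. The point is that once a definition is plain data, ordinary code can query or transform it.

```rust
/// Hypothetical model of field types (only the two used by `Person`).
#[derive(Debug, Clone, PartialEq)]
pub enum TypeRef {
    Str, // jubbly's String
    I64,
}

/// `no_default` or a concrete default value.
#[derive(Debug, Clone, PartialEq)]
pub enum FieldDefault {
    NoDefault,
    I64(i64),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub index: usize,
    pub name: String,
    pub ty: TypeRef,
    pub default: FieldDefault,
}

#[derive(Debug, Clone, PartialEq)]
pub struct StructDef {
    pub qualified_name: String,
    pub fields: Vec<Field>,
}

/// Mirrors Struct["some/namespace:Person", [Field[0, ...], Field[1, ...]]].
pub fn person_def() -> StructDef {
    StructDef {
        qualified_name: "some/namespace:Person".to_string(),
        fields: vec![
            Field { index: 0, name: "name".into(), ty: TypeRef::Str, default: FieldDefault::NoDefault },
            Field { index: 1, name: "age".into(), ty: TypeRef::I64, default: FieldDefault::I64(42) },
        ],
    }
}

/// Because the definition is data, we can ask which fields a constructor
/// call must supply.
pub fn required_fields(def: &StructDef) -> Vec<&str> {
    def.fields
        .iter()
        .filter(|f| f.default == FieldDefault::NoDefault)
        .map(|f| f.name.as_str())
        .collect()
}
```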

Postfix/Infix

Namespaces are just bags of functions. To get postfix/infix syntax, we use tildes.

One tilde passes the value on the left as the first argument to any function:

" foo "~string:trim()
// "foo"

Two tildes apply the values on either side to any binary function:

[1, 2, 3]~map~fn[v] -> v + 1
// [2, 3, 4]

Deep reassignment with alt

As everything is immutable, we need a cute way of making new values with deeply nested values altered: enter alt. You can think of alt as += generalised over any binary function, at any level of nesting. The equivalent of += is:

let x = 5
alt x + 1
// x = 6

But we can use it with arbitrary functions and nesting:

pub A = *[x: I64]

let l = [1, [2, A[3]], 4]
alt l.[1].[0]= 99  // note `]=` and `.=` are binary functions that return a new value
alt l.[1].[1].x + 5
alt l~push~5
// l = [1, [99, A[8]], 4, 5]
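A minimal model of what alt does to immutable data, assuming a simplified `Value` tree of ints and lists (struct fields such as `.x` are left out). `alt_at` and `add` are hypothetical helpers, not jubbly's implementation: alt reads the value at a path, applies a function, and rebuilds every parent on the way back up, so the original value is never mutated. `alt x + 1` then behaves like `x = x + 1`.

```rust
/// Simplified immutable values: ints and (possibly nested) lists.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    List(Vec<Value>),
}

/// Return a new value with `f` applied at `path` (a list of indices).
pub fn alt_at(value: &Value, path: &[usize], f: &dyn Fn(&Value) -> Value) -> Value {
    match (path.split_first(), value) {
        (None, v) => f(v),
        (Some((&i, rest)), Value::List(items)) => {
            // copy this level, then replace only the child on the path
            let mut new_items = items.clone();
            new_items[i] = alt_at(&items[i], rest, f);
            Value::List(new_items)
        }
        (Some(_), Value::Int(_)) => panic!("cannot index into an Int"),
    }
}

/// The `+ n` binary function, partially applied.
pub fn add(n: i64) -> impl Fn(&Value) -> Value {
    move |v: &Value| match v {
        Value::Int(x) => Value::Int(x + n),
        other => panic!("expected Int, got {:?}", other),
    }
}
```

For `l = [1, [2, 3], 4]`, applying a "set to 99" at path `[1, 0]` and then `add(5)` at `[1, 1]` yields `[1, [99, 8], 4]`, while `l` itself is unchanged. A real implementation on im's persistent collections would share the untouched children instead of cloning them.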

Early return

There is an early return operator similar to Rust's ?, but it is generic, so it can take an argument. In each of the following pairs, the two lines are syntactically equivalent:

bar?  // missing argument defaults to `Error`
if is_instance(let _ = bar, Error) return _ else _

foo:bar()?RuntimeError + 4
(if is_instance(let _ = foo:bar(), RuntimeError) return _ else _) + 4
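Rust's own ? only works on types such as Result and Option, so the closest Rust rendering of the generic version is an explicit type test followed by an early return. The `Value` enum and `foo_bar` below are hypothetical stand-ins used only to spell out the second line of the last pair.

```rust
/// Hypothetical dynamic values with two distinct error types.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Error(String),
    RuntimeError(String),
}

/// Stand-in for `foo:bar()`.
pub fn foo_bar(fail: bool) -> Value {
    if fail { Value::RuntimeError("boom".into()) } else { Value::Int(38) }
}

/// Models `foo:bar()?RuntimeError + 4`.
pub fn caller(fail: bool) -> Value {
    let v = foo_bar(fail);
    // the `?RuntimeError` part: return early only for that type
    if matches!(v, Value::RuntimeError(_)) {
        return v;
    }
    match v {
        Value::Int(n) => Value::Int(n + 4),
        // other variants are passed through; typechecking is not modelled
        other => other,
    }
}
```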

Blocks

Blocks are the lexical scope of let bindings. A block evaluates to its last expression.

{
    let a = 4
    a = 5
    a
}

If expressions

Without a trailing else, an if expression evaluates to none. Blocks around the branches are optional:

let is_gt_zero = if i > 0 true else false
if foo return 42
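Rust blocks and ifs are expression-oriented in the same way, which makes for a close comparison. One difference: a Rust if without else has type `()`, so the sketch below uses `Option` as an assumed stand-in for jubbly's none-valued fallthrough.

```rust
/// `let` bindings are scoped to the enclosing `{ ... }`, and the block
/// evaluates to its last expression.
#[allow(unused_assignments)] // mirrors `let a = 4` then `a = 5` literally
pub fn block_value() -> i64 {
    let result = {
        let mut a = 4;
        a = 5;
        a
    }; // `a` is no longer in scope here
    result
}

/// jubbly's `if i > 0 true` (no else) evaluates to none when the
/// condition is false; `None` plays that role here.
pub fn if_without_else(i: i64) -> Option<bool> {
    if i > 0 { Some(true) } else { None }
}
```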

Namespaces

Namespaces are files containing values, each value is declared with let or pub.

In $JUBBLY_PATH/foo/bar.jub:

pub x = 42

In another file (note that the only top-level forms allowed in a namespace are use, let/pub and firstly):

use foo/bar as b

firstly {
    print(b:x)
}

There are no relative imports. Aliases are encouraged instead, and they are just a CST -> AST transform that rewrites b:x to foo/bar:x.
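A hypothetical sketch of that alias rewrite. `use foo/bar as b` records `b -> foo/bar`, and every qualified name `b:x` becomes `foo/bar:x`. The `resolve` function is illustrative only. Assumption: names whose prefix is not a known alias, and unqualified names, are left unchanged, since the README does not say how those are handled.

```rust
use std::collections::HashMap;

/// Rewrite `alias:member` to `full/path:member` using the `use ... as` table.
pub fn resolve(aliases: &HashMap<&str, &str>, name: &str) -> String {
    match name.split_once(':') {
        Some((alias, member)) => match aliases.get(alias) {
            Some(path) => format!("{}:{}", path, member),
            // not an alias: assume it is already a full namespace path
            None => name.to_string(),
        },
        // unqualified names are untouched
        None => name.to_string(),
    }
}
```

Because this happens while lowering CST to AST, the interpreter only ever sees fully qualified names.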

Generics

Generics are basically just functions: the keyword is gn rather than fn, and they are called like f<arg1, arg2, ...> rather than f(arg1, arg2, ...).

The main difference vs functions is that at runtime, we just pass through to the return value - they are only "called" at importtime for typechecking.

Consider the identity function in the prelude:

pub identity = gn[T] -> fn[x: T, /]: T -> x  // note, args before `/` are positional

Unlike a function, if you don't call a generic directly with <> (eg. identity(42)), it desugars during typechecking to identity<_>(42), and we try to infer the value of the _ type from the rest of the expression.
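Rust's turbofish is a close parallel: `identity::<i64>(42)` corresponds to `identity<I64>(42)`, and a bare `identity(42)` gets its type parameter inferred, much like the `identity<_>(42)` desugaring. The runtime story differs. Rust monomorphises generics at compile time, whereas a jubbly `gn` is described above as passing straight through at runtime.

```rust
/// Rust analogue of the prelude's `identity`.
pub fn identity<T>(x: T) -> T {
    x
}

pub fn calls() -> (i64, i64, &'static str) {
    let explicit = identity::<i64>(42); // identity<I64>(42)
    let inferred: i64 = identity(42); // identity<_>(42), `_` inferred as i64
    let s = identity("hi"); // `_` inferred as &str
    (explicit, inferred, s)
}
```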

Closures as OO

If you need something object-like, use a closure. This is hopefully ugly enough to discourage frequent usage.

pub object_like = fn[] -> {
    let x = 0
    *[
        get_x = fn[] -> x,
        inc_x = fn[] -> {
            x = x + 1
            none
        },
    ][]
}
let obj = object_like()
obj.inc_x()
obj.inc_x()
obj.get_x()
// 2
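The same pattern in Rust, as a sketch: two closures share one captured counter. Rust makes the shared mutable capture explicit with `Rc<Cell<_>>`, and the `ObjectLike` struct of boxed closures (a hypothetical name) stands in for the struct literal that `object_like` returns.

```rust
use std::cell::Cell;
use std::rc::Rc;

pub struct ObjectLike {
    pub get_x: Box<dyn Fn() -> i64>,
    pub inc_x: Box<dyn Fn()>,
}

pub fn object_like() -> ObjectLike {
    // `let x = 0`, shared by both closures
    let x = Rc::new(Cell::new(0_i64));
    let x_for_get = Rc::clone(&x);
    ObjectLike {
        get_x: Box::new(move || x_for_get.get()),
        inc_x: Box::new(move || x.set(x.get() + 1)),
    }
}
```

Usage: after `let obj = object_like();`, calling `(obj.inc_x)()` twice makes `(obj.get_x)()` return 2. The extra parentheses are needed because the closures are fields, not methods.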

Small Core

The CST nodes are:

Parens
Int
Float
String_
Name
Call
Callgn
Callprefix
Callinfix
Callinfixtilde
Callpostfixtilde
Dotget
Dotset
Itemget
Itemset
Let
Set
Setpub
Alt
Builtin
Fn
Gn
And
Or
If
While
Block
Return
Returnif
Returniferror
Construct
Constructpair
Constructtype
Template
Refliteral

Raw source code, including comments, transforms directly to and from CST nodes.

CST nodes are transformed to a much smaller set of AST nodes, and it is these that we interpret/compile.

Atom
Name
Call
Callgn
Let
Set
Setpub
Builtin
Fn
Gn
And
Or
If
While
Block
Return
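As an illustration of how CST nodes can disappear on the way to the AST, here is a heavily cut-down, hypothetical lowering of the two tilde nodes into a plain `Call`. The node names follow the lists above, but the fields, and `Atom` holding only an i64, are assumptions made for the sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Cst {
    Name(String),
    Int(i64),
    Call { f: Box<Cst>, args: Vec<Cst> },
    Callpostfixtilde { value: Box<Cst>, f: Box<Cst>, args: Vec<Cst> }, // x~f(args...)
    Callinfixtilde { left: Box<Cst>, f: Box<Cst>, right: Box<Cst> },   // a~f~b
}

#[derive(Debug, Clone, PartialEq)]
pub enum Ast {
    Atom(i64),
    Name(String),
    Call { f: Box<Ast>, args: Vec<Ast> },
}

pub fn lower(node: &Cst) -> Ast {
    match node {
        Cst::Name(n) => Ast::Name(n.clone()),
        Cst::Int(i) => Ast::Atom(*i),
        Cst::Call { f, args } => Ast::Call {
            f: Box::new(lower(f)),
            args: args.iter().map(lower).collect(),
        },
        // the left value becomes the first positional argument
        Cst::Callpostfixtilde { value, f, args } => {
            let mut all = vec![lower(value)];
            all.extend(args.iter().map(lower));
            Ast::Call { f: Box::new(lower(f)), args: all }
        }
        // both sides become the two arguments of a binary call
        Cst::Callinfixtilde { left, f, right } => Ast::Call {
            f: Box::new(lower(f)),
            args: vec![lower(left), lower(right)],
        },
    }
}
```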

TODO

  • Plan typechecking.
  • Need to parse (and interpret?) the prelude before anything such that names get set correctly.
  • Iron out some of the path stuff now - how do we handle versioning, is there a JUBBLY_PATH, etc.
  • Developer Experience
  • Typing
    • Generics, including filling in _s. Do we need a named _ per <A, B, ...>?
    • Type aliases that give nice errors. NewType.
    • Do we need Interface[message: String] or can we just pass in Fn[T]: String.
    • Typechecking CLI.
    • Make OrdSet[T] etc N[T] that are all covariant.
    • Annotated?
    • Check that we don't overwrite non-reassignable namespace variables. Check we don't double let(?). Check := lines up. Check recursively that := only gets called by firstly - see also reassignable.
    • fn AST nodes should say which values are in their closure. FnWithoutClosure AKA FnPure(?) type. Also, FnWith1Arg... for passing to .map like js. PartialEq for Fn can be more clever with block scopes, checking if they actually contain any values that could get mutated. Add is_dynamic to fns/structs. Maybe we should add referenced vars instead of just count.
    • impure!? / Some kind of typing for effects?
    • Stream/Iterator - is this just any function with a closure? Can we get away without one? Eg. for db rows, we could just do like: db:cursor(config, statement)~db:loop~fn[row] -> f(row)
    • Does the type system eg. render TypeVarTuples irrelevant? Is @overload just some special case of something else?
  • Performance
    • Serialization currently aims to be on top of msgpack, look more into zerovec, rkyv, bincode, bitcode, wit-bindgen. See benchmarks.
    • Any function call that references nothing mutable can be cached.
    • Unsafe resolve?
    • Can we statically know if we need mutable scopes? Think about all the different flavours of functions. Pure, with closure, with named args etc. If a function doesn't leak its scope, we can just use a static scope that we create at the time we create the function (note we have to create scopes for the blocks as well).
    • Should Values from scopes and as returned from evaluate(...) be references?
    • Should we use unsafe { &self.scope.get_unchecked(i) }?
    • Can Context have a lifetime so we don't have to copy in and out of the scope?
    • Can we preserve the performance characteristics of im? ie. Things remain mutable until they aren't.
    • Look more seriously at string interning.
    • Can we compile away dot access to [i] access?
    • Consider doing like: https://www.cs.cornell.edu/~asampson/blog/flattening.html
    • Can we speed up startup by AOT serializing the prelude?
    • Rust interop rkyv, extism.
      • Write a macro for the builtin args to check the types - codegen based off of Jubbly types.
      • Start with just a vec of ints
      • Benchmark fib()
    • Compile all the way to Rust with a shitload of RCs? see. Starts at some entry point and monomorphizes?
    • Search
      • fastest map
      • matklad parsing
      • wanabethatguy lsp
      • domenicquirl cstree
      • rust-langdev
      • lalrpop
  • Serialization/Versioning
    • If we've stored a nested value without a name, if we subsequently change the type to a union, it's not immediately clear how we should deserialize this old data - we use the order of the union to decide (and make changing the first value of the union a backwards incompatible change).
    • Namespace versioning? default to currently installed. use pydantic//fields or explicit major version use pydantic/2/fields. The whole namespace thing needs a bit more thought - eg. how would one implement maniple services.
    • Our hash/__eq functions need to ignore values where the value is the default value, or the value is missing. If we add a default value where there was none before, we'll need to distinguish these from normal defaults.
    • Our "check_versions_compatible_over_time" function needs to check that we never update the default values.
  • DB
    • Example polymorphic JOIN across types a la meta to elec or gas.
    • It is possible to write a custom sorting function for Postgres indexes according to chatgpt.
    • ZSet - indexes: Vec[String] # remember ZSet[T, Index[K]] -> Grouped[K, ZSet[T]]
    • Is DBSP just f(schemas, queries) -> [steps that update cache tables]..? See: https://arxiv.org/pdf/2404.16486. Think about Postgres equivalents to ZSet ops.
    • Write an ORM.
    • Write a FastAPI.
  • Sugar/Context
    • s/String/Str/ s/Vector/Vec OrdMap Set...?
    • Allow duplicate lets - should be pretty easy with naming stuff.
    • #[] for sets.
    • Literal literals.
    • Syntax for let one = 1~uptype~Literal[1] (note this won't work as one is not a subtype of the other).
    • Datetime literals - do we even need a prefix? Do we Instant?
    • Decimal literals.
    • Obvious features from other langs:
      • Unpacking.
      • JSX styley - #<div>...<> - #< enters jsx parser mode.
      • Pandas-like interface for common operations.
      • Allow template functions. Potentially, just using backticks should return the template vector itself.
      • Comprehensions?
      • /, * like Python to specify args.
      • Converting match to if expressions.
    • "Blessed" subsets of the language, for example: configuration, Node.js compatibility.
    • Code that fails because of a type error that wasn't caught by the typechecker is considered a bug.
  • Refactor
    • Can we roll fn, and, or, else into Quoted, EvaluateQuoted? Or just make and, or, else desugar to functions?
    • Only have one Arena per ns?
    • Try to get rid of register_name and just use scope.
    • Look again at all type weirdness, remove a load of .into()s.
    • Look at other commonly used traits.
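The Serialization/Versioning item about hash/__eq ignoring default values can be sketched as follows. This is a hypothetical illustration: fields equal to their declared default are skipped, so adding a new field with a default leaves the hash of existing values unchanged. It does not model the "newly added default" distinction that the TODO also raises. It also uses std's `DefaultHasher`, whose algorithm is unspecified and may change between Rust releases, so a real canonical hash for stored data would need a hasher with a fixed, documented algorithm.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical field view: name, current value, declared default (if any).
pub struct FieldValue<'a> {
    pub name: &'a str,
    pub value: i64,
    pub default: Option<i64>,
}

/// Hash only the fields that differ from their default, in declaration order.
pub fn canonical_hash(fields: &[FieldValue<'_>]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for f in fields {
        if f.default == Some(f.value) {
            continue; // equal to the default: treated as if missing
        }
        f.name.hash(&mut hasher);
        f.value.hash(&mut hasher);
    }
    hasher.finish()
}
```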

Testing

cd interpreter
cargo test
cargo run -- test

cargo clippy -- -Wclippy::pedantic
cargo flamegraph --root -- run std/tests/test_basic:run_fib

About

data-first applications language
