Peppermint

A pipe-first language for data and ML work, running on top of Python. Every operation is a pipeline step and errors propagate automatically. The Python ecosystem (pandas, scikit-learn, or your own code) is accessible from within the language.

Install

pip install peppermint-lang
pip install "peppermint-lang[ml]"    # + scikit-learn, umap, openai
pip install "peppermint-lang[lsp]"   # + language server
pip install "peppermint-lang[all]"   # everything (quotes keep shells like zsh from globbing the brackets)

Or from source:

git clone https://github.com/chayapatr/peppermint
cd peppermint
pip install -e ".[all]"

Run

pep file.pep  # run a file
pep           # interactive REPL
pep lsp       # start language server (stdio)

Examples

load("employees.csv")
  |> filter(it.age > 18)
  |> add(tax: it.salary * 0.2)
  |> sort(by: "salary", dir: "desc")
  |> print()

Each step prints a live summary:

|> filter    → List  843 rows × 5 cols  (157 dropped)
|> add       → List  843 rows × 6 cols  (+tax)
|> sort      → List  843 rows × 6 cols

use ml
use viz
use env

load("data.csv")
  |> add(embedding: ml.embed(it.text,
      source: "deepinfra", model: "Qwen/Qwen3-Embedding-4B",
      apikey: env.DEEPINFRA_TOKEN))
      @concurrent(10)
  |> ml.kmeans(k: 2..8, on: "embedding", out: "cluster")
  |> ml.umap(dims: 2, on: "embedding", out: "umap")
  |> viz.scatter(x: "umap_1", y: "umap_2", color: "cluster", display: { label: "text", legend })

Aggregate

load("sales.csv")
  |> collapse(by: "region",
      avg: mean(col.revenue),
      n:   count()
  )
  |> sort(by: "avg", dir: "desc")
  |> print()
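
For readers coming from Python, the aggregation above can be read roughly as the plain-Python sketch below. It assumes rows behave like dictionaries; the helper name `collapse_by` and the sample data are hypothetical, and this is not how Peppermint implements `collapse`.

```python
from collections import defaultdict
from statistics import mean

def collapse_by(rows, key, value):
    """Group rows by `key`, then report the mean of `value` and the row count."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])

    summary = [
        {key: group, "avg": mean(values), "n": len(values)}
        for group, values in groups.items()
    ]
    # Mirror sort(by: "avg", dir: "desc"): largest average first.
    return sorted(summary, key=lambda r: r["avg"], reverse=True)

# Hypothetical sample data.
sales = [
    {"region": "north", "revenue": 100},
    {"region": "north", "revenue": 300},
    {"region": "south", "revenue": 500},
]
summary = collapse_by(sales, "region", "revenue")
```
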

Top N per group

load("sales.csv")
  |> each(by: "region",
      |> add(rank: rank(col.revenue, dir: "desc"))
      |> filter(it.rank <= 3)
      |> drop("rank")
  )
  |> print()
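
As a rough Python reading of the pipeline above: split rows by region, order each group by revenue from highest to lowest, and keep the first few. The sketch assumes rows behave like dictionaries and keeps exactly `n` rows per group. How Peppermint's `rank` treats ties is not stated here, so tie behaviour may differ. The helper name `top_n_per_group` and the sample data are hypothetical.

```python
from collections import defaultdict

def top_n_per_group(rows, group_key, value_key, n):
    """Keep the n rows with the largest `value_key` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    result = []
    for members in groups.values():
        # Sort each group from highest to lowest and keep the first n rows.
        ranked = sorted(members, key=lambda r: r[value_key], reverse=True)
        result.extend(ranked[:n])
    return result

# Hypothetical sample data.
sales = [
    {"region": "north", "revenue": 120},
    {"region": "north", "revenue": 300},
    {"region": "south", "revenue": 90},
    {"region": "north", "revenue": 50},
    {"region": "south", "revenue": 400},
]
top = top_n_per_group(sales, "region", "revenue", n=1)
```
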

LLM enrichment with retry and caching

use ml
use env

result = load("posts.csv")
  |> add(label: ml.llm(it.text,
      source: "openai", model: "gpt-4o",
      apikey: env.OPENAI_API_KEY, format: "json"))
      @concurrent(10)
      @retry(3)
      @until(it.label != none, max: 5)
      @cache

match(len(result.errors),
  == 0: result.data |> save("output.csv"),
  _:    halt("rerun to retry {len(result.errors)} failed rows")
)

@cache on ml.llm caches each row by content hash. Failed rows are never cached, so rerunning retries them automatically.
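
The caching rule described above (key by content hash, never store failures) can be sketched in Python as follows. This is an illustration of the idea only: the class `ContentCache`, the use of SHA-256 over JSON, and the in-memory store are all assumptions, not Peppermint's actual cache format or storage.

```python
import hashlib
import json

class ContentCache:
    """In-memory cache keyed by a hash of the input; failures are not stored."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key_for(payload):
        # Hash a canonical JSON form so equal inputs map to the same key.
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def call(self, fn, payload):
        key = self.key_for(payload)
        if key in self._store:
            return self._store[key]
        result = fn(payload)  # an exception here skips the store below
        self._store[key] = result
        return result

calls = []

def flaky_label(text):
    # Hypothetical stand-in for a model call: fails on its first attempt.
    calls.append(text)
    if len(calls) == 1:
        raise RuntimeError("temporary failure")
    return {"label": text.upper()}

cache = ContentCache()
try:
    cache.call(flaky_label, "hello")
except RuntimeError:
    pass  # nothing was cached for the failed attempt

first = cache.call(flaky_label, "hello")   # runs the function again
second = cache.call(flaky_label, "hello")  # served from the cache
```

Because the failed attempt leaves no cache entry, the second call does the work again, while the third call returns the stored result without invoking the function.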

Error handling

result = load("data.csv")
  |> filter(it.score > 0.5)

match(result,
  Ok(data): data |> print(),
  Err(msg):  print(msg)
)
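
One way to picture automatic error propagation is a runner that stops at the first failing step and hands back the error instead of raising. The Python sketch below borrows the `Ok` and `Err` names from the match arms above. The `pipe` helper, the step functions, and the error message format are hypothetical, and Peppermint's real mechanism may differ.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Ok:
    data: Any

@dataclass
class Err:
    msg: str

def pipe(value, *steps):
    """Run steps in order; stop at the first failure and return it as Err."""
    for step in steps:
        try:
            value = step(value)
        except Exception as exc:
            return Err(f"{step.__name__}: {exc}")
    return Ok(value)

def keep_high_scores(rows):
    return [r for r in rows if r["score"] > 0.5]

def count_rows(rows):
    return len(rows)

good = pipe([{"score": 0.9}, {"score": 0.1}], keep_high_scores, count_rows)
bad = pipe([{"points": 1}], keep_high_scores, count_rows)  # missing "score"
```

Here `good` wraps the final value, while `bad` carries a message naming the step that failed, and `count_rows` never runs on the bad input.
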

Python bridge

use "./transforms.py" as t

load("data.csv")
  |> t.clean()
  |> print()

Python functions receive and return plain Python types. Conversion is automatic.
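
A minimal sketch of what `transforms.py` might contain. The file contents are hypothetical, and the sketch assumes the table reaches Python as a list of dicts, one per row. The README only says "plain Python types", so check docs/language.md for the actual shape.

```python
# transforms.py (hypothetical contents)

def clean(rows):
    """Strip whitespace from string fields and drop rows that are entirely empty.

    Assumes `rows` is a list of dicts, one dict per row.
    """
    cleaned = []
    for row in rows:
        tidy = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Keep the row only if at least one field holds a real value.
        if any(v not in ("", None) for v in tidy.values()):
            cleaned.append(tidy)
    return cleaned
```

Since the function only touches built-in types, it can be tested with ordinary Python tooling before being used from a pipeline.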


Editor support

The LSP server (pep lsp) provides diagnostics, hover docs, completions, and go-to-definition for any LSP-capable editor.

VSCode — install the extension from ecosystem/vscode-peppermint/. It auto-discovers pep via mise, pyenv, or Homebrew — no PATH setup needed.

Neovim:

vim.lsp.start({ name = "peppermint", cmd = { "pep", "lsp" }, root_dir = vim.fn.getcwd() })

Helix (~/.config/helix/languages.toml):

[language-server.peppermint-lsp]
command = "pep"
args = ["lsp"]

See docs/language.md for the language reference, docs/stdlib.md for all stdlib functions, and docs/ecosystem.md for full editor setup.