This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
ULTK (Unnatural Language ToolKit) is a Python library for computational semantic typology research — specifically for "efficient communication" analyses that explain natural language structure in terms of competing pressures: minimizing cognitive complexity vs. maximizing communicative accuracy.
# Install all dependencies (including dev group for tests)
uv sync --group dev
# Run all tests
uv run pytest src/tests/
# Run a single test file
uv run pytest src/tests/test_language.py
# Run a single test by name
uv run pytest src/tests/test_language.py::TestLanguage::test_name
# Format code (Black is enforced via CI on PRs)
black src/

Tests are discovered automatically by pytest from `src/tests/`. The CI workflow runs `uv run pytest src/tests/` from the repo root.
ultk.language — Core data structures for semantic representations:
- `semantics.py`: `Referent` (immutable semantic object), `Universe` (collection of Referents with a prior distribution), `Meaning` (mapping from Universe to arbitrary type T, e.g., booleans for truth values).
- `language.py`: `Expression` (form + meaning pair), `Language` (frozenset of Expressions sharing a Universe). The helper `aggregate_expression_complexity()` bridges language and effcomm.
- `sampling.py`: Generators for all meanings, expressions, and languages from a universe, used to enumerate the full hypothesis space.
- `grammar/`: A probabilistic context-free grammar (PCFG) framework for building expressions as programs in a Language of Thought. `grammar.py` defines `Rule` and `Grammar`/`GrammaticalExpression`; `likelihood.py` provides scoring functions; `inference.py` handles MDL/Bayesian inference.
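
To make "enumerate the full hypothesis space" concrete, here is a minimal standalone sketch using only the standard library. It does not call ULTK: the names `all_boolean_meanings_sketch` and `all_languages_sketch` are hypothetical, meanings are simplified to tuples of truth values aligned with the referent list, and the size cap is an assumption added to keep the enumeration small.

```python
from itertools import combinations, product


def all_boolean_meanings_sketch(referents):
    """Yield every assignment of True/False to the referents (2**n in total).

    Each meaning is a tuple of booleans parallel to `referents`.
    """
    yield from product((False, True), repeat=len(referents))


def all_languages_sketch(meanings, max_size):
    """Yield every non-empty set of meanings with at most `max_size` members."""
    for size in range(1, max_size + 1):
        for combo in combinations(meanings, size):
            yield frozenset(combo)


# Hypothetical three-referent universe.
referents = ["r1", "r2", "r3"]
meanings = list(all_boolean_meanings_sketch(referents))
languages = list(all_languages_sketch(meanings, max_size=2))

print(len(meanings))   # 2**3 = 8
print(len(languages))  # C(8, 1) + C(8, 2) = 8 + 28 = 36
```

The space grows exponentially in the number of referents and combinatorially in language size, which is why generators (rather than fully materialized lists) are a natural fit for this kind of enumeration.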
ultk.effcomm — Efficient communication analysis tools:
- `agent.py`: RSA (Rational Speech Act) agents (`LiteralSpeaker`, `LiteralListener`, `PragmaticSpeaker`, `PragmaticListener`), represented as weight matrices.
- `informativity.py`: `informativity()` and `communicative_success()` compute how well a language supports communication (vectorized as `diag(prior) @ S @ R ⊙ U`).
- `tradeoff.py`: Pareto front computation (`pareto_optimal_languages`, `non_dominated_2d`, `dominates`) for simplicity/informativeness trade-off analysis.
- `optimization.py`: `EvolutionaryOptimizer`, an iterative algorithm to approximate the Pareto frontier via mutations (`AddExpression`, `RemoveExpression`).
- `sampling.py`: `get_hypothetical_variants()` generates null-hypothesis languages by permuting speaker weight matrices.
- `analysis.py`: Aggregation utilities for building results DataFrames.
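
The `diag(prior) @ S @ R ⊙ U` expression and the Pareto dominance check above can be unpacked with a small standalone sketch. It is not ULTK's implementation. It assumes the expression is read as the sum over all entries of `(diag(prior) @ S @ R) ⊙ U`, with `S[m][w]` the speaker's probability of word `w` given meaning `m`, `R[w][m2]` the listener's probability of guessing `m2` given `w`, and `U[m][m2]` the utility of that guess. The dominance check assumes lower is better on every axis (for example complexity and communicative cost). All function names are hypothetical.

```python
def communicative_success_sketch(prior, speaker, listener, utility):
    """Loop form of sum((diag(prior) @ S @ R) * U) under the reading above."""
    total = 0.0
    for m, p_m in enumerate(prior):
        for w, s_mw in enumerate(speaker[m]):
            for m2, r_wm2 in enumerate(listener[w]):
                total += p_m * s_mw * r_wm2 * utility[m][m2]
    return total


def dominates_sketch(a, b):
    """True if `a` is no worse than `b` on every axis and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front_sketch(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates_sketch(q, p) for q in points)]


prior = [0.5, 0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]

# One distinct word per meaning, and the listener decodes it perfectly.
perfect = communicative_success_sketch(prior, identity, identity, identity)

# One word for both meanings, so the listener can only guess uniformly.
one_word_speaker = [[1.0, 0.0], [1.0, 0.0]]
uniform_listener = [[0.5, 0.5], [0.5, 0.5]]
ambiguous = communicative_success_sketch(
    prior, one_word_speaker, uniform_listener, identity
)

# Hypothetical (complexity, communicative cost) pairs.
points = [(1, 0.9), (2, 0.5), (3, 0.5), (4, 0.1), (2, 0.8)]
front = pareto_front_sketch(points)

print(perfect, ambiguous)  # 1.0 0.5
print(front)               # [(1, 0.9), (2, 0.5), (4, 0.1)]
```

The triple loop is cubic in the matrix sizes, which is the cost the vectorized matrix form avoids in practice; the loop version is only meant to show which terms are being summed.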
ultk.util:
- `frozendict.py`: `FrozenDict`, an immutable dict used extensively as keys in frozen dataclasses.
- `io.py`: I/O helpers.
- Core objects (`Universe`, `Meaning`, `Expression`) are frozen/immutable (`@dataclass(frozen=True)` or a manual `_frozen` flag), enabling hashing and use as dict keys.
- `Meaning` stores its mapping as a `tuple[T, ...]` indexed parallel to `Universe.referents`, with `_ref_to_idx` for O(1) lookup. Access via `meaning[referent]`.
- `Language` stores expressions as a `frozenset`: order-independent and hashable.
- Grammar rules are defined via Python type annotations; `Rule.from_callable()` introspects function signatures to build rules automatically.
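
The frozen-dataclass and parallel-tuple pattern described in the list above can be illustrated with a minimal standalone sketch. These are not ULTK's classes: `ReferentSketch`, `UniverseSketch`, and `MeaningSketch` are hypothetical stand-ins, and only the `_ref_to_idx` attribute name and the `meaning[referent]` access style are taken from the notes above.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ReferentSketch:
    name: str


@dataclass(frozen=True)
class UniverseSketch:
    referents: tuple


@dataclass(frozen=True)
class MeaningSketch(Generic[T]):
    universe: UniverseSketch
    values: tuple  # values[i] belongs to universe.referents[i]
    # Lookup table built once; excluded from equality and hashing.
    _ref_to_idx: dict = field(init=False, repr=False, compare=False, hash=False)

    def __post_init__(self):
        if len(self.values) != len(self.universe.referents):
            raise ValueError("values must be parallel to universe.referents")
        # Frozen dataclasses block normal assignment, so bypass it here.
        index = {ref: i for i, ref in enumerate(self.universe.referents)}
        object.__setattr__(self, "_ref_to_idx", index)

    def __getitem__(self, referent):
        # O(1) dictionary lookup followed by tuple indexing.
        return self.values[self._ref_to_idx[referent]]


a, b = ReferentSketch("a"), ReferentSketch("b")
universe = UniverseSketch((a, b))
meaning = MeaningSketch(universe, (True, False))

print(meaning[a], meaning[b])  # True False
# Equal frozen objects hash alike, so they collapse in sets and work as dict keys.
print(len({meaning, MeaningSketch(universe, (True, False))}))  # 1
```

Storing values in a tuple keeps the object hashable, while the index dictionary (left out of comparison and hashing) avoids a linear scan of the referents on each lookup.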
`src/examples/` contains complete worked analyses:
- `indefinites/`: efficient communication analysis of indefinite pronouns
- `modals/`: semantic universals for modals
- `learn_quant/`: quantifier learning