Constrained Predicate Belief Projection for Neuro-Symbolic Planning

A neuro-symbolic planning framework that maintains factorized symbolic belief states, projects them through a CP-SAT constraint compiler, and plans over the Top-K feasible worlds with active uncertainty-driven sensing.

Research Goal

Classical AI planning assumes a fully observed, logically consistent world state. In practice, perception systems produce noisy, physically inconsistent predicate estimates — e.g. simultaneously predicting a block is both held and on_table, or failing to bind the correct target instance in a cluttered household scene. Feeding such states directly into a symbolic planner causes catastrophic failures.

This framework answers: how do we plan reliably when perception is noisy and the logical state space is partially observed?

The core contribution is a three-stage pipeline:

Factorized Bayesian belief tracking over grounded predicates
Constraint-based symbolic projection (CP-SAT MAP inference) to eliminate physically impossible world states
Top-K world planning with active sensing to act under irreducible uncertainty
Environment-specific grounding to map symbolic actions back to executable commands

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                      Perception Layer                           │
│   RGB → CLIP / learned heads  OR  text → symbolic parser        │
│   (src/perception/clip_vision.py, alfworld_text.py)             │
└───────────────────────────┬─────────────────────────────────────┘
                            │  raw probabilities p̂(pred_i)
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Belief Update Layer                          │
│   Log-odds Bayesian filter: l_t = l_{t-1} + α·logit(p̂_obs)    │
│   (src/belief/update.py, src/belief/state.py)                   │
└───────────────────────────┬─────────────────────────────────────┘
                            │  continuous marginals B_t = {p(pred_i)}
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                 CP-SAT Constraint Compiler                      │
│   YAML domain rules → mutex/implication constraints →           │
│   Top-K MAP feasible world assignments W = {w_1,...,w_K}        │
│   (src/belief/projection.py, domains/*/constraints.yaml)        │
└───────────────────────────┬─────────────────────────────────────┘
                            │  K consistent Boolean world states
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│               Top-K Belief-Weighted Planner                     │
│   For each w_k: run Pyperplan → get first action a_k            │
│   Score actions by softmax-weighted joint probability           │
│   If max score < 0.40: trigger Active Sensing (IG)              │
│   (src/planning/sample_belief_planner.py)                       │
└───────────────────────────┬─────────────────────────────────────┘
                            │  selected action
                            ▼
                    Environment / Execution

For a deeper walkthrough see ARCHITECTURE.md.

Latest Validated Results

Blocksworld

Corrected fixed-seed sweeps (seeds=0..4, 3 episodes per seed) show that the repaired projection layer makes the full planner robust across substantial observation noise.

Setting	Success	Avg. Steps	Avg. Sensing	Source
`full_top_k`, `zero_shot`, `noise=0.5`, `decay=0.7`	`100%`	`4.2`	`1.0`	local seed sweep
`full_top_k`, `calibrated`, `noise=0.5`, `decay=0.7`	`100%`	`5.5`	`1.3`	local seed sweep
`full_top_k`, `learned_head`, `noise=0.5`, `decay=0.7`	`100%`	`4.13`	`0.87`	local seed sweep
`full_top_k`, `zero_shot`, `noise=0.7`, `decay=0.7`	`100%`	`4.53`	`0.87`	local seed sweep
`full_top_k`, `learned_head`, `noise=0.7`, `decay=0.7`	`100%`	`4.07`	`1.0`	local seed sweep
`full_top_k`, `zero_shot`, `noise=0.9`, `decay=0.7`	`100%`	`4.33`	`0.8`	local seed sweep

ALFWorld

The current ALFWorld smoke test put_egg_in_microwave now succeeds end to end in 5 steps:

goto_location sinkbasin_1
take_from_surface egg sinkbasin_1
goto_location microwave_1
open_receptacle microwave_1
put_in_container egg microwave_1

This result is currently a smoke-test success, not yet a full ALFWorld benchmark table.

What Counts As a Baseline Here

Implemented Blocksworld planning baselines:

threshold: hard-threshold predicates, no belief state, no projection
belief_no_verifier: belief update only, no projection
belief_plus_verifier: belief update plus single projected world
full_top_k_no_sense: Top-K projection without sensing fallback
full_top_k: full system

Implemented perception baselines:

zero_shot
calibrated
learned_head

Current public-ready result story:

Blocksworld baseline machinery is implemented and the corrected full-system sweeps are strong.
ALFWorld is now working as a live environment path, but still needs a broader comparative evaluation before claiming benchmark-level superiority.

Repository Structure

Belief-PDDL/
├── domains/
│   ├── blocksworld/
│   │   ├── constraints.yaml     # Mutex & implication rules for CP-SAT
│   │   ├── predicates.yaml      # Predicate taxonomy (unary / binary)
│   │   └── domain.pddl          # PDDL domain for Pyperplan
│   └── alfworld/
│       ├── constraints.yaml
│       ├── predicates.yaml
│       └── domain.pddl
├── src/
│   ├── belief/
│   │   ├── state.py             # PredicateBelief: dict of marginals
│   │   ├── update.py            # BeliefUpdater: log-odds Bayesian filter
│   │   └── projection.py        # BeliefProjector: CP-SAT Top-K compiler
│   ├── perception/
│   │   ├── clip_vision.py       # CLIPVisionBackbone (zero-shot scoring)
│   │   ├── backbones.py         # Generic encoder interface
│   │   ├── unary_head.py        # MLP head for unary predicates
│   │   ├── binary_head.py       # MLP head for binary predicates
│   │   └── calibrate.py         # Temperature scaling calibration
│   ├── planning/
│   │   ├── deterministic_planner.py   # Pyperplan wrapper
│   │   └── sample_belief_planner.py   # Top-K weighted planner + IG
│   ├── execution/
│   │   └── replan_loop.py       # Belief-aware replanning loop
│   └── envs/
│       ├── blocksworld_env.py   # Procedural Blocksworld environment
│       └── alfworld_env.py      # ALFWorld / AI2-THOR wrapper
├── scripts/
│   ├── run_benchmarks.py        # Ablation harness (CLIP + CP-SAT)
│   ├── run_alfworld_eval.py     # Full 3D ALFWorld evaluation
│   ├── collect_rollouts.py      # Offline episode data collection
│   └── test_belief.py           # Unit tests for belief module
├── tests/mocks/
│   └── blocksworld_env.py       # Semantic mock env with PIL renderer
├── notebooks/
│   └── blocksworld_demo.ipynb   # Getting started walkthrough
├── run_all_experiments.sh       # Full ablation orchestrator
├── Dockerfile                   # Headless CUDA + xvfb container
├── requirements.txt
└── pyproject.toml

Installation

Option A — Local (Recommended for RTX GPU)

# 1. Clone
git clone https://github.com/nagsujosh/Belief-PDDL.git
cd Belief-PDDL

# 2. Create environment (Conda recommended for CUDA pinning)
conda create -n belief_pddl python=3.10 -y
conda activate belief_pddl

# 3. Install PyTorch with CUDA 12.1+ (required for RTX 4090/5090)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 4. Install remaining dependencies
pip install -r requirements.txt
pip install -e .

# 5. Download ALFWorld data (~3-5 GB, only needed for ALFWorld eval)
export ALFWORLD_DATA=~/.cache/alfworld
alfworld-download && alfworld-download --extra

Option B — Docker (Headless / Reproducibility)

docker build -t belief-pddl .
docker run --gpus all -v $(pwd)/outputs:/app/outputs belief-pddl

Running Experiments

Blocksworld Ablation Suite

Run the full ablation matrix:

./run_all_experiments.sh

This sequentially runs the main ablation conditions and writes results to outputs/benchmarks/*.jsonl.

Run a single condition manually:

# Full system: Belief + CP-SAT + Top-K + Active Sensing
python scripts/run_benchmarks.py --mode full_top_k --noise 0.3 --k 3 --episodes 10

# Ablation: no verifier (raw belief → planner)
python scripts/run_benchmarks.py --mode belief_no_verifier --noise 0.3 --episodes 10

# Ablation: no sensing (Top-K but no IG fallback)
python scripts/run_benchmarks.py --mode full_top_k_no_sense --noise 0.5 --k 3 --episodes 10

# Baseline: hard threshold (VLM prob > 0.5 → True)
python scripts/run_benchmarks.py --mode threshold --noise 0.3 --episodes 10

CLI Arguments for run_benchmarks.py:

Argument	Default	Description
`--mode`	`full_top_k`	`threshold`, `belief_no_verifier`, `belief_plus_verifier`, `full_top_k_no_sense`, `full_top_k`
`--noise`	`0.3`	VLM hallucination rate (0.0 = perfect, 1.0 = completely adversarial)
`--k`	`3`	Number of Top-K feasible worlds to enumerate
`--episodes`	`5`	Number of independent evaluation episodes

Summarize results after a run:

python summarize_results.py

Recommended fixed-seed comparison run:

python scripts/run_seed_sweep.py \
  --mode full_top_k \
  --noise 0.5 \
  --episodes 3 \
  --k 3 \
  --alpha 1.0 \
  --decay 0.7 \
  --device cuda \
  --perception-mode learned_head \
  --learned-head-path outputs/perception/perception_compare_e200_learned_head.pt \
  --seeds 0,1,2,3,4 \
  --output outputs/benchmarks/seed_sweep_noise_0.5_decay_0.7_learned_head.json

ALFWorld Evaluation (Full 3D Embodied)

Requires an ALFWorld text-game dataset root containing generated games (traj_data.json and game.tw-pddl) plus a display when using the visual stack. For the current text-first benchmark, pass the dataset root explicitly:

# Text-first symbolic benchmark on the default preset
python scripts/run_alfworld_eval.py \
  --task-preset put_egg_in_microwave \
  --alfworld-data /path/to/alfworld/data

# Headless (SSH / server)
xvfb-run -a python scripts/run_alfworld_eval.py \
  --task-preset put_egg_in_microwave \
  --alfworld-data /path/to/alfworld/data

Results are written to outputs/eval/alfworld_metrics.json.

The current validated smoke task is:

python scripts/run_alfworld_eval.py \
  --task-preset put_egg_in_microwave \
  --alfworld-data ~/.cache/alfworld \
  --episodes 1 \
  --max-steps 15 \
  --alpha 1.0 \
  --decay 1.0 \
  --k 3

Hyperparameters

Belief Update (`src/belief/update.py` — `BeliefUpdater`)

Parameter	Default	Meaning
`alpha`	`1.0`	Weight applied to incoming VLM observation evidence in log-odds space. Higher = faster belief swing per observation
`beta`	`2.0`	Weight for deterministic action effect updates (e.g. after executing `pickup`, block is definitely held)

The update rule is:

l_t = l_{t-1} + alpha * logit(p_obs) + beta * logit(p_action)
p_t = sigmoid(l_t)

CP-SAT Projector (`src/belief/projection.py` — `BeliefProjector`)

Parameter	Default	Meaning
`k`	`3`	Number of Top-K MAP feasible worlds to enumerate via successive blocking
score precision	`×1000`	Log-odds scores are scaled to integers (CP-SAT requires integer coefficients)

The objective being maximized:

maximize  Σ_i  score(p_i) · x_i
subject to  YAML mutex constraints
            YAML implication constraints

where score(p) = int(log(p / (1-p)) × 1000).

Top-K Planner (`src/planning/sample_belief_planner.py` — `SampleBeliefPlanner`)

Parameter	Default	Meaning
Sensing threshold	`0.40`	If the top-voted action scores below 40% softmax weight across K worlds, switch to active sensing
IG formula	`H(obj) × (1 - P(visible(obj)))`	Information Gain of sensing an object = its total entropy × probability it is currently hidden

CLIP Vision Backbone (`src/perception/clip_vision.py`)

Parameter	Default	Meaning
`model_name`	`openai/clip-vit-base-patch32`	HuggingFace model ID. Swap to `openai/clip-vit-large-patch14` for stronger features
`device`	auto (`cuda` if available, else `cpu`)	Inference device

To switch to the larger model (recommended for GPU runs):

# In scripts/run_benchmarks.py __init__:
self.vlm = CLIPVisionBackbone(model_name="openai/clip-vit-large-patch14")

Domain Configuration

Domains are defined entirely in domains/<domain>/:

constraints.yaml — rules compiled into CP-SAT constraints:

mutex:
  - ["on_table(x)", "on(x,y)"]   # block can't be on table AND on another block
  - ["holding(x)", "arm_empty()"] # can't hold something and have empty arm

implications:
  - if: "on(x,y)"
    then: "not clear(y)"          # if x is on y, y is not clear

domain.pddl — standard PDDL for Pyperplan:

(:action pickup
  :parameters (?x - block)
  :precondition (and (clear ?x) (on_table ?x) (arm_empty) (visible ?x))
  :effect (and (not (on_table ?x)) (holding ?x) (not (arm_empty)))
)

Adding a new domain requires only: constraints.yaml, predicates.yaml, domain.pddl.

Output Files

Benchmark outputs are written to outputs/benchmarks/ with descriptive filenames that may include noise, alpha, decay, perception mode, and seed.

Per-episode JSONL outputs look like:

{
  "episode": 0,
  "mode": "full_top_k",
  "noise_level": 0.3,
  "success": true,
  "steps": 8,
  "inconsistency_events": 1,
  "sensing_actions": 3,
  "replans": 0
}

Aggregate seed sweeps and ALFWorld smoke traces are also written under:

outputs/benchmarks/*.json
outputs/eval/*.json

Note: outputs/ is gitignored, so if you want results visible on GitHub, summarize them in the README or another committed doc.

Citation

If you use this codebase, please cite:

@misc{belief-pddl-2026,
  author = {Sujosh Nag},
  title  = {Constrained Predicate Belief Projection for Neuro-Symbolic Planning},
  year   = {2026},
  url    = {https://github.com/nagsujosh/Belief-PDDL}
}

License

MIT — see LICENSE.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Constrained Predicate Belief Projection for Neuro-Symbolic Planning

Research Goal

Architecture Overview

Latest Validated Results

Blocksworld

ALFWorld

What Counts As a Baseline Here

Repository Structure

Installation

Option A — Local (Recommended for RTX GPU)

Option B — Docker (Headless / Reproducibility)

Running Experiments

Blocksworld Ablation Suite

ALFWorld Evaluation (Full 3D Embodied)

Hyperparameters

Belief Update (`src/belief/update.py` — `BeliefUpdater`)

CP-SAT Projector (`src/belief/projection.py` — `BeliefProjector`)

Top-K Planner (`src/planning/sample_belief_planner.py` — `SampleBeliefPlanner`)

CLIP Vision Backbone (`src/perception/clip_vision.py`)

Domain Configuration

Output Files

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
domains		domains
notebooks		notebooks
scripts		scripts
src		src
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
ARCHITECTURE.md		ARCHITECTURE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run_all_experiments.sh		run_all_experiments.sh
summarize_results.py		summarize_results.py

Folders and files

Latest commit

History

Repository files navigation

Constrained Predicate Belief Projection for Neuro-Symbolic Planning

Research Goal

Architecture Overview

Latest Validated Results

Blocksworld

ALFWorld

What Counts As a Baseline Here

Repository Structure

Installation

Option A — Local (Recommended for RTX GPU)

Option B — Docker (Headless / Reproducibility)

Running Experiments

Blocksworld Ablation Suite

ALFWorld Evaluation (Full 3D Embodied)

Hyperparameters

Belief Update (src/belief/update.py — BeliefUpdater)

CP-SAT Projector (src/belief/projection.py — BeliefProjector)

Top-K Planner (src/planning/sample_belief_planner.py — SampleBeliefPlanner)

CLIP Vision Backbone (src/perception/clip_vision.py)

Domain Configuration

Output Files

Citation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Belief Update (`src/belief/update.py` — `BeliefUpdater`)

CP-SAT Projector (`src/belief/projection.py` — `BeliefProjector`)

Top-K Planner (`src/planning/sample_belief_planner.py` — `SampleBeliefPlanner`)

CLIP Vision Backbone (`src/perception/clip_vision.py`)

Packages