Investigating whether language structure can serve as soft physics constraints in generative video models.
Video diffusion models produce visually compelling motion but frequently violate physical laws: objects drift against gravity, momentum reverses spontaneously, collisions produce implausible outcomes. The standard fix involves physics simulation layers or physics-informed loss functions, both of which require architectural changes and domain-specific training.
This project tests an alternative: can physics constraints be communicated through prompt structure alone?
The hypothesis is that language carries residual physical information. Video diffusion models trained on captioned footage have absorbed correlations between linguistic patterns and motion characteristics. They don't simulate Newtonian mechanics—they reenact stories about physics. Certain words and constructions reliably co-occur with certain motion patterns in training data.
If true, carefully constructed prompts should activate latent priors corresponding to physically consistent motion. The project aims to show where semantics can substitute for equations in generative systems, and to offer insight into why language remains such a strong interface for control.
- Prompt Taxonomy — Categorized linguistic structures mapped to physics domains (gravity, momentum, collision, fluid dynamics, articulated motion)
- Evaluation Dataset — Generated videos with controlled prompt variations + automated and human annotations
- Empirical Analysis — Statistical relationships between linguistic features and motion coherence metrics
- Practical Guidelines — Actionable prompt engineering principles for physics-consistent video generation
We systematically vary prompt structure while holding generation parameters constant:
Minimal prompt: a ball falls
Elaborated variants:
- Verb substitution: a ball drops / a ball plummets / a ball descends
- Temporal chaining: a ball tips off the edge and falls / a ball falls and bounces twice
- Force attribution: gravity pulls a ball downward / a ball falls under gravitational acceleration
- Manner specification: a ball falls slowly / a ball falls rapidly, accelerating
We then measure motion characteristics through optical flow analysis, temporal coherence metrics, and physics-specific heuristics. Human evaluation validates automated metrics.
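As one concrete example of the physics-specific heuristics: the gravity_alignment metric listed later can be derived from dense optical flow. The sketch below is illustrative, assuming OpenCV's Farneback flow; it is not necessarily what src/evaluation/physics_heuristics.py ships.

```python
# Sketch of a physics heuristic derived from dense optical flow
# (illustrative; not necessarily the shipped implementation).
import cv2
import numpy as np

def gravity_alignment(frames: list[np.ndarray]) -> float:
    """Share of total motion pointing downward, in [-1, 1].

    frames: consecutive BGR frames of shape (H, W, 3) from one clip.
    """
    scores = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        dy = flow[..., 1]                    # +y is downward in image coordinates
        mag = np.linalg.norm(flow, axis=-1)  # per-pixel motion magnitude
        scores.append(float(dy.sum() / (mag.sum() + 1e-6)))
        prev = curr
    return float(np.mean(scores)) if scores else 0.0
```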
- GPU: 16GB VRAM minimum (tested on RTX 4080, RTX 3090)
- Storage: ~50GB for models + generated videos
- Platform: Linux or WSL2
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential git curl wget unzip ffmpeg

# Install Miniconda if needed
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc
# Create environment
conda create -n ipp python=3.10 -y
conda activate ipp

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install diffusers transformers accelerate safetensors
pip install opencv-python-headless scikit-image scipy
pip install pandas numpy matplotlib seaborn
pip install einops omegaconf pyyaml tqdm
pip install pytorch-fid lpips av

git clone https://github.com/[username]/implicit-physics-prompting.git
cd implicit-physics-prompting
python scripts/setup_models.py  # Downloads AnimateDiff weights
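scripts/setup_models.py is not reproduced here; a minimal version might simply pull the motion-module weights from the Hugging Face Hub. The repository ID and filename below are placeholders for illustration, not confirmed artifact names.

```python
# scripts/setup_models.py -- illustrative sketch only
from huggingface_hub import hf_hub_download

def main() -> None:
    # Repo ID and filename are assumptions; point these at whatever
    # weights config/generation.yaml expects under models/.
    hf_hub_download(
        repo_id="guoyww/animatediff",   # placeholder repository
        filename="v3_sd15_mm.ckpt",     # placeholder motion module
        local_dir="models",
    )

if __name__ == "__main__":
    main()
```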
implicit-physics-prompting/
│
├── README.md
├── requirements.txt
├── config/
│ ├── generation.yaml
│ ├── evaluation.yaml
│ └── paths.yaml
│
├── prompts/
│ ├── taxonomy.yaml
│ └── templates/
│ ├── gravity.yaml
│ ├── momentum.yaml
│ ├── collision.yaml
│ └── fluid.yaml
│
├── src/
│ ├── generation/
│ │ ├── pipeline.py
│ │ ├── batch_generate.py
│ │ └── utils.py
│ │
│ ├── evaluation/
│ │ ├── metrics.py
│ │ ├── optical_flow.py
│ │ ├── temporal.py
│ │ └── physics_heuristics.py
│ │
│ ├── analysis/
│ │ ├── aggregate.py
│ │ ├── statistics.py
│ │ └── visualize.py
│ │
│ └── prompts/
│ ├── parser.py
│ ├── generator.py
│ └── linguistic.py
│
├── data/
│ ├── generated/
│ ├── metrics/
│ └── annotations/
│
├── notebooks/
│ ├── 01_exploration.ipynb
│ ├── 02_metric_validation.ipynb
│ └── 03_analysis.ipynb
│
├── scripts/
│ ├── setup_models.py
│ ├── run_experiment.py
│ └── aggregate_results.py
│
└── docs/
├── methodology.md
├── prompt_taxonomy.md
└── results_log.md
conda activate ipp
python -c "from src.generation.pipeline import generate; generate('a ball rolling on a table')"Should produce a .mp4 in data/generated/.
Define physics domains and linguistic variations in prompts/templates/. Example structure:
domain: gravity
scenarios:
  - id: ball_fall
    base: "a ball falls"
    variations:
      verb_swap:
        - "a ball drops"
        - "a ball plummets"
      temporal_chain:
        - "a ball tips off the edge and falls"
      force_explicit:
        - "gravity pulls a ball downward"
      manner:
        - "a ball falls slowly"
        - "a ball falls rapidly, accelerating"
    physics_expectations:
      direction: "downward"
      acceleration: "constant"
python scripts/run_experiment.py --config config/generation.yaml --domain gravity
Generates all prompt variations × seeds, outputs to data/generated/ with JSON metadata sidecars.
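run_experiment.py presumably expands each template into a flat list of generation jobs before looping over seeds. A minimal reading of the YAML above (assumed logic, not the shipped src/prompts/generator.py):

```python
# Sketch: expanding a template file into generation jobs
# (assumed logic; field names follow the template example above).
import itertools
import yaml

def expand_jobs(template_path: str, seeds: list[int]) -> list[dict]:
    with open(template_path) as f:
        spec = yaml.safe_load(f)
    jobs = []
    for scenario in spec["scenarios"]:
        prompts = {"base": [scenario["base"]], **scenario["variations"]}
        for (variation, texts), seed in itertools.product(prompts.items(), seeds):
            for text in texts:
                jobs.append({
                    "domain": spec["domain"],
                    "scenario": scenario["id"],
                    "variation": variation,
                    "prompt": text,
                    "seed": seed,
                })
    return jobs

# e.g. expand_jobs("prompts/templates/gravity.yaml", seeds=[42, 123])
```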
python scripts/compute_metrics.py --input data/generated/ --output data/metrics/
Computes optical flow, temporal coherence, and physics heuristics for all videos.
python scripts/aggregate_results.py
Compiles results, runs statistical tests, generates figures.
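As a sketch of the kind of test the analysis stage might run (an assumption, not the shipped src/analysis/statistics.py), here is a nonparametric comparison of one metric across prompt variation types; the CSV path and column names are illustrative:

```python
# Sketch: compare gravity_alignment across prompt variation types.
# Assumes metrics were aggregated into a CSV with "variation" and
# "gravity_alignment" columns (both names are illustrative).
import pandas as pd
from scipy.stats import kruskal

df = pd.read_csv("data/metrics/gravity_metrics.csv")
groups = [g["gravity_alignment"].values for _, g in df.groupby("variation")]
stat, p = kruskal(*groups)
print(f"Kruskal-Wallis H={stat:.2f}, p={p:.4f}")
```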
| Metric | Description | Range |
|---|---|---|
| `flow_magnitude_mean` | Average motion intensity | 0-50 |
| `flow_direction_entropy` | Motion direction consistency | 0-2 |
| `temporal_lpips` | Frame-to-frame perceptual change | 0-1 |
| `warp_error` | Flow-based reconstruction error | 0-1 |
| `gravity_alignment` | Downward flow dominance | -1 to 1 |
| `acceleration_smoothness` | Jerk minimization | 0-∞ |
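For instance, temporal_lpips can be computed with the lpips package installed earlier. The sketch below is one plausible implementation, not necessarily what src/evaluation/temporal.py does:

```python
# Sketch: mean frame-to-frame LPIPS over a clip (assumed implementation).
import lpips
import numpy as np
import torch

def temporal_lpips(frames: list[np.ndarray]) -> float:
    """frames: RGB uint8 arrays of shape (H, W, 3)."""
    loss_fn = lpips.LPIPS(net="alex")

    def to_tensor(f: np.ndarray) -> torch.Tensor:
        # Scale to [-1, 1] and reshape to (1, 3, H, W) as LPIPS expects.
        t = torch.from_numpy(f).permute(2, 0, 1).float() / 127.5 - 1.0
        return t.unsqueeze(0)

    scores = [loss_fn(to_tensor(a), to_tensor(b)).item()
              for a, b in zip(frames[:-1], frames[1:])]
    return float(np.mean(scores))
```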
config/generation.yaml
model:
  name: "animatediff"
  checkpoint: "models/animatediff-v3.safetensors"
  motion_module: "models/mm_sd15_v3.safetensors"

inference:
  num_frames: 16
  fps: 8
  height: 512
  width: 512
  guidance_scale: 7.5
  num_inference_steps: 25
  seeds: [42, 123, 456, 789, 1011]

output:
  format: "mp4"
  codec: "libx264"
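Given omegaconf in the environment, the pipeline would likely consume this file along these lines (a sketch, assuming the structure above):

```python
# Sketch: loading generation settings with OmegaConf (assumed usage).
from omegaconf import OmegaConf

cfg = OmegaConf.load("config/generation.yaml")
print(cfg.model.name)              # "animatediff"
print(cfg.inference.num_frames)    # 16
for seed in cfg.inference.seeds:   # iterate the pinned seeds
    print(seed)
```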
Based on embodied cognition research, we anticipate:
- Temporal chaining shows the strongest effects (sequential structure maps to temporal video structure)
- Manner specification reliably modulates speed and acceleration
- Force attribution may show weaker effects (models may not learn explicit physics vocabulary)
- Gravity and momentum domains respond better than complex domains like fluid dynamics
Null results would also be informative, indicating that prompt-based approaches are insufficient and architectural solutions are necessary.
| Issue | Solution |
|---|---|
| CUDA out of memory | Reduce resolution to 384×384 or frames to 12 |
| Black frames | Adjust guidance_scale (try 5-9) |
| Frozen motion | Try different seed or increase motion module weight |
| WSL GPU not detected | Update NVIDIA drivers on Windows host |
Key papers informing this work:
Video Diffusion:
- Blattmann et al. (2023) — Stable Video Diffusion
- Guo et al. (2023) — AnimateDiff
Embodied Cognition:
- Lakoff & Johnson (1980) — Metaphors We Live By
- Barsalou (2008) — Grounded Cognition
- Pulvermüller (2005) — Brain mechanisms linking language and action
Physics-Informed ML:
- Raissi et al. (2019) — Physics-informed neural networks
- Karniadakis et al. (2021) — Physics-informed machine learning review
MIT
Issues and PRs welcome. See docs/methodology.md for design rationale before proposing changes to the taxonomy or metrics.