
Implicit Physics Prompting in AI Video Diffusion

Investigating whether language structure can serve as soft physics constraints in generative video models.


Overview

AI video diffusion models produce visually compelling motion but frequently violate physical laws—objects drift against gravity, momentum reverses spontaneously, collisions produce implausible outcomes. The standard fix involves physics simulation layers or physics-informed loss functions, both of which require architectural changes and domain-specific training.

This project tests an alternative: can physics constraints be communicated through prompt structure alone?

The hypothesis is that language carries residual physical information. Video diffusion models trained on captioned footage have absorbed correlations between linguistic patterns and motion characteristics. They don't simulate Newtonian mechanics—they reenact stories about physics. Certain words and constructions reliably co-occur with certain motion patterns in training data.

If true, carefully constructed prompts should activate latent priors corresponding to physically consistent motion. Demonstrating this would show how semantics can substitute for equations in generative systems, and would offer insight into why language remains such a strong interface for control.


What This Project Produces

  1. Prompt Taxonomy — Categorized linguistic structures mapped to physics domains (gravity, momentum, collision, fluid dynamics, articulated motion)
  2. Evaluation Dataset — Generated videos with controlled prompt variations + automated and human annotations
  3. Empirical Analysis — Statistical relationships between linguistic features and motion coherence metrics
  4. Practical Guidelines — Actionable prompt engineering principles for physics-consistent video generation

Core Experiment

We systematically vary prompt structure while holding generation parameters constant:

Minimal prompt: a ball falls

Elaborated variants:

  • Verb substitution: a ball drops / a ball plummets / a ball descends
  • Temporal chaining: a ball tips off the edge and falls / a ball falls and bounces twice
  • Force attribution: gravity pulls a ball downward / a ball falls under gravitational acceleration
  • Manner specification: a ball falls slowly / a ball falls rapidly, accelerating

We then measure motion characteristics through optical flow analysis, temporal coherence metrics, and physics-specific heuristics. Human evaluation validates automated metrics.
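
To make the heuristic layer concrete, here is a minimal sketch of a gravity-alignment score computed from dense optical flow with OpenCV. The magnitude-weighted averaging is an assumption; the project's actual heuristics live in src/evaluation/physics_heuristics.py.

# Sketch: score how strongly motion points downward, weighted by flow magnitude.
# The weighting scheme is illustrative, not the project's exact metric.
import cv2
import numpy as np

def gravity_alignment(video_path: str) -> float:
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    scores = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        dx, dy = flow[..., 0], flow[..., 1]
        mag = np.sqrt(dx ** 2 + dy ** 2)
        # In image coordinates +y points down, so downward motion has dy > 0.
        # Magnitude-weighted mean of the downward cosine is sum(dy) / sum(|flow|).
        scores.append(dy.sum() / (mag.sum() + 1e-8))
        prev_gray = gray
    cap.release()
    if not scores:
        return 0.0
    return float(np.mean(scores))  # -1 (upward) .. +1 (downward)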


Hardware Requirements

  • GPU: 16GB VRAM minimum (tested on RTX 4080, RTX 3090)
  • Storage: ~50GB for models + generated videos
  • Platform: Linux or WSL2

Setup (WSL2 / Linux)

1. System Dependencies

sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential git curl wget unzip ffmpeg

2. Conda Environment

# Install Miniconda if needed
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc

# Create environment
conda create -n ipp python=3.10 -y
conda activate ipp

3. Python Dependencies

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install diffusers transformers accelerate safetensors
pip install opencv-python-headless scikit-image scipy
pip install pandas numpy matplotlib seaborn
pip install einops omegaconf pyyaml tqdm
pip install pytorch-fid lpips av

4. Clone and Setup

git clone https://github.com/[username]/implicit-physics-prompting.git
cd implicit-physics-prompting
python scripts/setup_models.py  # Downloads AnimateDiff weights

Project Structure

implicit-physics-prompting/
│
├── README.md
├── requirements.txt
├── config/
│   ├── generation.yaml
│   ├── evaluation.yaml
│   └── paths.yaml
│
├── prompts/
│   ├── taxonomy.yaml
│   └── templates/
│       ├── gravity.yaml
│       ├── momentum.yaml
│       ├── collision.yaml
│       └── fluid.yaml
│
├── src/
│   ├── generation/
│   │   ├── pipeline.py
│   │   ├── batch_generate.py
│   │   └── utils.py
│   │
│   ├── evaluation/
│   │   ├── metrics.py
│   │   ├── optical_flow.py
│   │   ├── temporal.py
│   │   └── physics_heuristics.py
│   │
│   ├── analysis/
│   │   ├── aggregate.py
│   │   ├── statistics.py
│   │   └── visualize.py
│   │
│   └── prompts/
│       ├── parser.py
│       ├── generator.py
│       └── linguistic.py
│
├── data/
│   ├── generated/
│   ├── metrics/
│   └── annotations/
│
├── notebooks/
│   ├── 01_exploration.ipynb
│   ├── 02_metric_validation.ipynb
│   └── 03_analysis.ipynb
│
├── scripts/
│   ├── setup_models.py
│   ├── run_experiment.py
│   ├── compute_metrics.py
│   └── aggregate_results.py
│
└── docs/
    ├── methodology.md
    ├── prompt_taxonomy.md
    └── results_log.md

Workflow

Phase 0: Environment Verification

conda activate ipp
python -c "from src.generation.pipeline import generate; generate('a ball rolling on a table')"

Should produce a .mp4 in data/generated/.

Phase 1: Prompt Taxonomy

Define physics domains and linguistic variations in prompts/templates/. Example structure:

domain: gravity
scenarios:
  - id: ball_fall
    base: "a ball falls"
    variations:
      verb_swap:
        - "a ball drops"
        - "a ball plummets"
      temporal_chain:
        - "a ball tips off the edge and falls"
      force_explicit:
        - "gravity pulls a ball downward"
      manner:
        - "a ball falls slowly"
        - "a ball falls rapidly, accelerating"
    physics_expectations:
      direction: "downward"
      acceleration: "constant"

Phase 2: Batch Generation

python scripts/run_experiment.py --config config/generation.yaml --domain gravity

Generates all prompt variations × seeds, outputs to data/generated/ with JSON metadata sidecars.
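
The sidecar is a small JSON file written next to each clip so metrics can later be joined back to the prompt that produced them; a sketch of what that might look like (field names are assumptions):

# Sketch: write a metadata sidecar alongside each generated clip.
# Field names are illustrative, not the project's exact schema.
import json
from pathlib import Path

def write_sidecar(video_path: str, prompt: str, seed: int, config: dict) -> None:
    meta = {"video": Path(video_path).name, "prompt": prompt, "seed": seed,
            "num_frames": config.get("num_frames"), "fps": config.get("fps")}
    Path(video_path).with_suffix(".json").write_text(json.dumps(meta, indent=2))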

Phase 3: Metrics Computation

python scripts/compute_metrics.py --input data/generated/ --output data/metrics/

Computes optical flow, temporal coherence, and physics heuristics for all videos.
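
For temporal_lpips, a minimal sketch using the lpips package installed earlier; the frame format and normalization here are assumptions:

# Sketch: mean frame-to-frame LPIPS as a temporal coherence proxy.
# Frames are assumed to be uint8 RGB numpy arrays; inputs are scaled to [-1, 1]
# as the lpips package expects.
import lpips
import numpy as np
import torch

loss_fn = lpips.LPIPS(net="alex")

def temporal_lpips(frames: list[np.ndarray]) -> float:
    def to_tensor(frame):
        t = torch.from_numpy(frame).permute(2, 0, 1).float() / 127.5 - 1.0
        return t.unsqueeze(0)
    with torch.no_grad():
        dists = [loss_fn(to_tensor(a), to_tensor(b)).item()
                 for a, b in zip(frames[:-1], frames[1:])]
    return float(np.mean(dists))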

Phase 4: Analysis

python scripts/aggregate_results.py

Compiles results, runs statistical tests, generates figures.
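
As an example of the kind of test this step runs, here is a sketch of a one-way ANOVA comparing a metric across prompt-variation groups (the CSV path and column names are assumptions):

# Sketch: one-way ANOVA of gravity_alignment across prompt variation types.
# File path and column names ("variation", "gravity_alignment") are illustrative.
import pandas as pd
from scipy import stats

df = pd.read_csv("data/metrics/gravity_metrics.csv")
groups = [g["gravity_alignment"].values for _, g in df.groupby("variation")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")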


Metrics

Metric                  | Description                                             | Range
flow_magnitude_mean     | Average motion intensity                                | 0-50
flow_direction_entropy  | Entropy of motion directions (lower = more consistent)  | 0-2
temporal_lpips          | Frame-to-frame perceptual change                        | 0-1
warp_error              | Flow-based reconstruction error                         | 0-1
gravity_alignment       | Downward flow dominance                                 | -1 to 1
acceleration_smoothness | Jerk minimization                                       | 0-∞
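
The acceleration_smoothness metric is the least self-explanatory; one plausible formulation, sketched below, takes the mean absolute jerk of the per-frame motion signal (an assumption, not necessarily what src/evaluation/metrics.py implements):

# Sketch: mean absolute jerk of the per-frame motion signal.
# Lower values mean smoother acceleration; the exact formulation is assumed.
import numpy as np

def acceleration_smoothness(flow_magnitudes: np.ndarray) -> float:
    velocity = np.asarray(flow_magnitudes, dtype=float)  # mean flow per frame
    acceleration = np.diff(velocity)
    jerk = np.diff(acceleration)
    return float(np.mean(np.abs(jerk)))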

Configuration

config/generation.yaml

model:
  name: "animatediff"
  checkpoint: "models/animatediff-v3.safetensors"
  motion_module: "models/mm_sd15_v3.safetensors"
  
inference:
  num_frames: 16
  fps: 8
  height: 512
  width: 512
  guidance_scale: 7.5
  num_inference_steps: 25
  
seeds: [42, 123, 456, 789, 1011]

output:
  format: "mp4"
  codec: "libx264"

Expected Findings

Based on embodied cognition research, we anticipate:

  • Temporal chaining shows strongest effects (sequential structure maps to temporal video structure)
  • Manner specification reliably modulates speed and acceleration
  • Force attribution may show weaker effects (models may not learn explicit physics vocabulary)
  • Gravity and momentum domains respond better than complex domains like fluid dynamics

Null results would also be informative—they would indicate that architectural solutions, rather than prompt-based approaches, are necessary.


Troubleshooting

Issue                | Solution
CUDA out of memory   | Reduce resolution to 384×384 or frames to 12
Black frames         | Adjust guidance_scale (try 5-9)
Frozen motion        | Try different seed or increase motion module weight
WSL GPU not detected | Update NVIDIA drivers on Windows host

References

Key papers informing this work:

Video Diffusion:

  • Blattmann et al. (2023) — Stable Video Diffusion
  • Guo et al. (2023) — AnimateDiff

Embodied Cognition:

  • Lakoff & Johnson (1980) — Metaphors We Live By
  • Barsalou (2008) — Grounded Cognition
  • Pulvermüller (2005) — Brain mechanisms linking language and action

Physics-Informed ML:

  • Raissi et al. (2019) — Physics-informed neural networks
  • Karniadakis et al. (2021) — Physics-informed machine learning review

License

MIT


Contributing

Issues and PRs welcome. See docs/methodology.md for design rationale before proposing changes to the taxonomy or metrics.
