feat: add segment 3 dataset images submission for cataluna84 by cataluna84 · Pull Request #10 · Cohere-Labs-Community/Vision-Interpretability

cataluna84 · 2026-02-01T14:04:46Z

Summary

This PR submits the solution for Segment 3: Dataset Images, focusing on interpreting the mixed4a layer of InceptionV1. It implements a pipeline to correlate dataset examples with feature visualizations to understand neuron behavior.

Technical Details

Notebook: segments/segment_3_dataset_images/submissions/cataluna84__segment_3_dataset_images.ipynb

ML Pipeline & Workflow

Model: Pre-trained InceptionV1 (GoogLeNet) via lucent.modelzoo.
Target Layer: mixed4a.
Dataset Streaming: Streams ImageNet-1k from the Hugging Face Hub, processing ~1.2M images without downloading the full dataset to disk.
Activation Tracking:
- Approach: Scans the dataset to find images that maximally (and minimally) activate specific neurons.
- Spectrum Tracker: Uses ActivationSpectrumTrackerV2 to track 4 categories per neuron: Max Activating, Slightly Positive, Slightly Negative, and Min Activating (Inhibitory).
- Optimization: Memory-efficient min-heap implementation (SampleRecord) to track top-k samples on the fly.
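
A minimal sketch of how a size-bounded min-heap can track top-k samples on the fly, and how four such heaps could cover the spectrum categories. This is not the notebook's code: `Record`, `TopKTracker`, and `SpectrumTracker` are hypothetical stand-ins for SampleRecord and ActivationSpectrumTrackerV2, and the scoring used for the "slightly positive/negative" categories (closest to zero on each side) is an assumption.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Record:
    # Hypothetical stand-in for SampleRecord; only `key` takes part in ordering.
    key: float
    image_id: int = field(compare=False)
    activation: float = field(compare=False)


class TopKTracker:
    """Keeps the k records with the largest `key` in a size-bounded min-heap."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap: List[Record] = []

    def update(self, key: float, image_id: int, activation: float) -> None:
        rec = Record(key, image_id, activation)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, rec)
        elif rec.key > self._heap[0].key:
            # The heap root is the weakest record kept so far; replace it.
            heapq.heapreplace(self._heap, rec)

    def best(self) -> List[Record]:
        return sorted(self._heap, reverse=True)


class SpectrumTracker:
    """Four top-k heaps for one neuron (assumed scoring, see lead-in)."""

    def __init__(self, k: int) -> None:
        self.max_act = TopKTracker(k)
        self.min_act = TopKTracker(k)
        self.slight_pos = TopKTracker(k)
        self.slight_neg = TopKTracker(k)

    def update(self, image_id: int, a: float) -> None:
        self.max_act.update(a, image_id, a)
        self.min_act.update(-a, image_id, a)  # negate so the heap keeps the minima
        if a > 0:
            self.slight_pos.update(-a, image_id, a)  # closest to zero from above
        elif a < 0:
            self.slight_neg.update(a, image_id, a)  # closest to zero from below


tracker = SpectrumTracker(k=2)
for image_id, a in enumerate([3.0, -0.1, 0.2, -4.0, 0.05, 5.0, -2.0]):
    tracker.update(image_id, a)
print([r.image_id for r in tracker.max_act.best()])  # prints [5, 0]
```

Each update costs O(log k) and memory stays at O(k) per category, so the scan never has to hold more than k records per neuron and category.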
Feature Visualization (Optimization):
- Uses Lucent (lucent.optvis) to perform gradient ascent on the input space.
- Generates Positive (Dream) and Negative (Avoidance) synthetic images to compare with real data.
Visualization:
- Outputs a Distill.pub-style Activation Spectrum, arranging synthetic and real images side-by-side to reveal the "ideal" feature vs. real-world triggers.
Experiment Monitoring: Integrated with Weights & Biases to log throughput (img/sec) and progress.
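
The streaming and monitoring steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's loop: `batched` and `run_with_throughput` are hypothetical helpers, any iterable stands in for the streamed dataset, and `log_fn` defaults to `print` where the actual run presumably forwards the same dictionary to Weights & Biases.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Groups a stream into lists of batch_size without materializing it."""
    batch: List[T] = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def run_with_throughput(
    stream: Iterable[T],
    batch_size: int,
    process_batch: Callable[[List[T]], None],
    log_fn: Callable[[Dict[str, float]], None] = print,
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Processes the stream batch by batch and logs cumulative img/sec."""
    seen = 0
    start = clock()
    for batch in batched(stream, batch_size):
        process_batch(batch)
        seen += len(batch)
        elapsed = clock() - start
        rate = seen / elapsed if elapsed > 0 else 0.0
        log_fn({"images_seen": seen, "img_per_sec": rate})
    return seen
```

Passing the clock in as a parameter keeps the helper easy to check with a fake timer; in a real run the default `time.perf_counter` is used.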

Verification

Confirmed streaming pipeline throughput.
Verified interaction between ActivationExtractor hooks and the main loop.
Validated generation of both optimized images and dataset activation tracking.

cataluna84 · 2026-02-01T14:09:18Z

A full pass over the ImageNet dataset took 31 hours 33 minutes with a batch size of 128 and sample_record_limit set to 2000, using about 20 GB of RAM.
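
For anyone comparing runs, the reported wall-clock time translates into a rough average throughput. The figures below are back-of-the-envelope only: the image count is the approximate "~1.2M" from the PR description, not an exact count, so the derived rates are estimates.

```python
# Reported run: 31 hours 33 minutes, batch size 128.
elapsed_s = 31 * 3600 + 33 * 60   # 113580 seconds
num_images = 1_200_000            # assumption: the "~1.2M" quoted in the description
batch_size = 128

img_per_sec = num_images / elapsed_s
num_batches = -(-num_images // batch_size)  # ceiling division
sec_per_batch = elapsed_s / num_batches

print(f"{img_per_sec:.2f} img/sec")        # prints 10.57 img/sec
print(f"{num_batches} batches")            # prints 9375 batches
print(f"{sec_per_batch:.2f} s per batch")  # prints 12.12 s per batch
```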

https://wandb.ai/cataluna84/vision-interpretability/runs/ybjw2oe7?nw=nwusercataluna84

Let me know whether you get the same results with these hyperparameters, and feel free to ask any other questions.

feat: add segment 3 dataset images submission for cataluna84

b8bd486

cataluna84 wants to merge 1 commit into main from submission/cataluna84-segment-3
