# ReVision-RT: Motion-Adaptive Real-Time Seam Carving
## Project Context & Current Status - Updated
### Project Overview
ReVision-RT is a real-time, temporally aware, content-aware video resizing system based on seam carving. It temporarily boosts the importance of moving objects through motion-adaptive parameters, then gradually forgets them once motion ceases.
**Core Features:**
- Multi-modal importance calculation: depth + saliency + gradient (edges)
- Motion detection and temporal memory system
- Adaptive parameter calculation based on scene statistics
- Seam revalidation for motion in carved areas
**Target Platform:** VR headsets eventually; currently tested on desktop with OpenCV windows.
### Current Implementation Status
#### Fully Implemented ✅
- **Real-time seam carving:** Forward energy DP with 3-path cost evaluation
- **Threading architecture:** Main (input) + Motion + Validator + Retarget + Display threads
- **Gradient importance:** Sobel edge detection, updated every frame
- **Motion detection:** Frame differencing with adaptive thresholding
- **Temporal memory:** Motion residue system with dual decay rates
- **Adaptive parameters:** Variance-based calculation of α, β, γ, τ
- **Seam revalidation:** Validator thread restores seams with motion in invalid areas
- **Configuration system:** INI-based settings
- **Coordinate remapping:** Left-packed pixel arrangement after seam removal
- **DisplayManager:** Simple window system without abstract controller overhead
#### Partially Implemented 🔄
- **Importance fusion:** Currently only Sobel + motion, missing depth and saliency
- **Scene change detection:** No trigger for full recalculation yet
- **Motion scale calculation:** Global averaging causes frame-wide effects
#### Not Implemented ❌
- **Depth maps:** Not integrated
- **Saliency maps:** Not integrated
- **VR integration:** Future goal
- **Christmas tree optimization:** Incremental energy updates not implemented
### Current Architecture
#### Thread Responsibilities
```
Main Thread:
- Frame input coordination
- Sobel calculation (every frame)
- Frame conversion and scaling
Motion Thread:
- Frame differencing (cv::absdiff)
- Motion statistics updates (μ, σ², τ via EMA)
- Adaptive parameter calculation (α, β, γ)
- Motion energy fusion to pixels
- Queue invalid pixels with motion for revalidation
Validator Thread:
- Blocking queue-based, sleeps until work available
- Processes queued pixels
- Walks seam chains via seamUp/seamDown pointers
- Revalidates entire seam chains
- Increases width counter
Retarget Thread:
- Runs when currentWidth > targetWidth
- Forward energy calculation
- Seam finding and invalidation
- Sets seam chain pointers during removal
- Decreases width counter
Display Thread:
- Triple window display (Original, Packed, Importance)
- Crops packed frame to current width
- 60fps target rendering
```
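The Retarget thread's "forward energy calculation" and "seam finding" steps can be sketched as follows. This is a minimal single-seam version assuming the commonly published forward-energy cost terms (three candidate paths: up-left, up, up-right); `GrayImage` and `findVerticalSeam` are hypothetical names, and the project's actual cost terms and data layout may differ.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major grayscale image. Illustrative type, not the project's.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<float> data;  // size == width * height
    float at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * width + x];
    }
};

// Returns one x coordinate per row for the cheapest vertical seam.
// `importance` has the same layout as the image and is added per pixel.
std::vector<int> findVerticalSeam(const GrayImage& img,
                                  const std::vector<float>& importance) {
    const int w = img.width, h = img.height;
    auto idx = [w](int x, int y) { return static_cast<std::size_t>(y) * w + x; };
    std::vector<float> cost(static_cast<std::size_t>(w) * h, 0.0f);
    std::vector<int> from(static_cast<std::size_t>(w) * h, 0);  // parent x in row above

    for (int x = 0; x < w; ++x) cost[idx(x, 0)] = importance[idx(x, 0)];

    for (int y = 1; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            // Neighbours are clamped at the image borders.
            const float left = img.at(x > 0 ? x - 1 : x, y);
            const float right = img.at(x < w - 1 ? x + 1 : x, y);
            const float up = img.at(x, y - 1);
            // Cost of the new edges created by removing (x, y) on each path.
            const float cU = std::fabs(right - left);
            const float cL = cU + std::fabs(up - left);
            const float cR = cU + std::fabs(up - right);

            float best = cost[idx(x, y - 1)] + cU;
            int bestX = x;
            if (x > 0) {
                const float c = cost[idx(x - 1, y - 1)] + cL;
                if (c < best) { best = c; bestX = x - 1; }
            }
            if (x < w - 1) {
                const float c = cost[idx(x + 1, y - 1)] + cR;
                if (c < best) { best = c; bestX = x + 1; }
            }
            cost[idx(x, y)] = best + importance[idx(x, y)];
            from[idx(x, y)] = bestX;
        }
    }

    // Pick the cheapest end point in the last row, then backtrack.
    int bestX = 0;
    for (int x = 1; x < w; ++x) {
        if (cost[idx(x, h - 1)] < cost[idx(bestX, h - 1)]) bestX = x;
    }
    std::vector<int> seam(static_cast<std::size_t>(h));
    for (int y = h - 1; y >= 0; --y) {
        seam[static_cast<std::size_t>(y)] = bestX;
        if (y > 0) bestX = from[idx(bestX, y)];
    }
    return seam;
}
```

Adding `importance` on top of the path cost is one possible way to let I_c steer seams away from moving regions; the notes do not state how the two are combined.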
#### Data Structures
```
PixelCore:
- importanceBase (Sobel, updated per frame)
- importanceCurrent (final combined importance)
- currentMotion (fast-decaying motion component)
- motionResidue (slow-decaying motion memory)
- seamUp/seamDown (pointers for seam chain traversal)
- originalX/originalY (coordinate tracking)
SharedData:
- Pixel arrays and validity map
- Frame buffers (input, gray, previousGray, motionMap, packed)
- Coordinate remapping (mapX, mapY)
- Validation queue with condition variable
- Thread synchronization mutexes (dataLock, seamOperationsLock, validationQueueMutex)
```
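As a rough illustration of the outline above, `PixelCore` and the validator's seam-chain walk might look like this. Field names follow the notes; the field types, the `valid` flag stored inside the struct (the notes keep a separate validity map), and the function `revalidateSeamChain` are assumptions.

```cpp
#include <cstdint>

// Hypothetical layout: names from the notes, types assumed.
struct PixelCore {
    float importanceBase = 0.0f;     // Sobel, updated per frame
    float importanceCurrent = 0.0f;  // final combined importance
    float currentMotion = 0.0f;      // fast-decaying motion component
    float motionResidue = 0.0f;      // slow-decaying motion memory
    PixelCore* seamUp = nullptr;     // previous pixel in the same seam
    PixelCore* seamDown = nullptr;   // next pixel in the same seam
    std::uint16_t originalX = 0;
    std::uint16_t originalY = 0;
    bool valid = true;               // assumption: stands in for the validity map
};

// Walks to the top of the seam containing `start`, then revalidates every
// pixel down the chain. Returns how many pixels were restored.
int revalidateSeamChain(PixelCore& start) {
    PixelCore* top = &start;
    while (top->seamUp != nullptr) top = top->seamUp;

    int restored = 0;
    for (PixelCore* p = top; p != nullptr; p = p->seamDown) {
        if (!p->valid) {
            p->valid = true;
            ++restored;
        }
    }
    return restored;
}
```

Whether the chain pointers should be cleared after restoration, and where the width counter is incremented, is not specified in the notes and is left out here.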
### Mathematical Framework
#### Core Importance Equation
```
I_c(x,y,t) = I_b(x,y,t) + M_c(x,y,t) + M_r(x,y,t)
Where:
I_b = importanceBase (Sobel edges, updated every frame)
M_c = currentMotion (fast-decaying component)
M_r = motionResidue (slow-decaying memory component)
I_c = importanceCurrent (final importance for seam carving)
```
#### Motion Detection
```
M_d(x,y,t) = |gray(x,y,t) - gray(x,y,t-1)|
τ_raw(t) = P₂₅(M_d) × 3.0 (3 × the 25th percentile; τ(t) is the EMA of τ_raw, see Adaptive Parameters)
```
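A minimal sketch of these two steps on flat 8-bit buffers, without OpenCV so it stays self-contained. `frameDifference` and `rawThreshold` are hypothetical names, both buffers are assumed to have the same size, and the percentile index convention (`size / 4`) is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// M_d = |gray(t) - gray(t-1)| per pixel.
std::vector<std::uint8_t> frameDifference(const std::vector<std::uint8_t>& gray,
                                          const std::vector<std::uint8_t>& previousGray) {
    std::vector<std::uint8_t> motion(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) {
        const int d = static_cast<int>(gray[i]) - static_cast<int>(previousGray[i]);
        motion[i] = static_cast<std::uint8_t>(std::abs(d));
    }
    return motion;
}

// 3.0 x the 25th percentile of M_d.
// Takes the map by value because nth_element reorders its input.
float rawThreshold(std::vector<std::uint8_t> motion) {
    if (motion.empty()) return 0.0f;
    const std::size_t k = motion.size() / 4;  // assumed percentile convention
    std::nth_element(motion.begin(),
                     motion.begin() + static_cast<std::ptrdiff_t>(k),
                     motion.end());
    return 3.0f * static_cast<float>(motion[k]);
}
```

`std::nth_element` runs in linear time on average, so it may be a cheaper alternative to a full sort for the periodic percentile calculation.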
#### Motion Memory Dynamics
```
M_c(x,y,t) = α(t)·M_c(x,y,t-1) + s·M_d(x,y,t)·H(M_d, τ)
M_r(x,y,t) = β(t)·M_r(x,y,t-1) + γ(t)·s·M_d(x,y,t)·H(M_d, τ)
Where:
H(M_d, τ) = heaviside function (1 if M_d > τ, else 0)
s = motion scale factor
```
#### Adaptive Parameters (Scene-Responsive)
```
Motion Statistics (Exponential Moving Averages):
μ(t) = λ·μ(t-1) + (1-λ)·mean(M_d)
σ²(t) = λ·σ²(t-1) + (1-λ)·(mean(M_d) - μ(t))²
τ(t) = λ·τ(t-1) + (1-λ)·τ_raw (τ_raw recalculated every 30 frames)
Where λ = e^(-1/20) ≈ 0.951 (20-frame time constant; this time constant is unrelated to the motion threshold τ)
Parameter Derivation:
α(t) = 0.70 + 0.25 × min(1, σ²(t)/100) [current motion decay]
β(t) = 0.92 + 0.08 × (α - 0.70)/0.25 [residue decay, always β > α]
γ(t) = 0.15 + 0.15 × min(1, σ²(t)/100) [residue accumulation]
Justification:
- High variance scenes → slower decay (α→0.95), more accumulation (γ→0.30)
- Low variance scenes → faster decay (α→0.70), less accumulation (γ→0.15)
- Residue always decays slower than current motion (β > α)
```
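The formulas above can be sketched as small pure functions plus one stateful EMA accumulator. Function names mirror those listed for `motionParameters`; `emaLambda`, `normalizedVariance`, `MotionStats`, and `kVarianceMax` are hypothetical, and `float` precision is an assumption.

```cpp
#include <algorithm>
#include <cmath>

constexpr float kVarianceMax = 100.0f;  // VARIANCE_MAX in the notes

// lambda = e^(-1/T) for a time constant of T frames (T = 20 gives ~0.951).
inline float emaLambda(float timeConstantFrames) {
    return std::exp(-1.0f / timeConstantFrames);
}

// min(1, variance / VARIANCE_MAX)
inline float normalizedVariance(float variance) {
    return std::min(1.0f, variance / kVarianceMax);
}

inline float calculateAlpha(float variance) {
    return 0.70f + 0.25f * normalizedVariance(variance);
}

inline float calculateBeta(float alpha) {
    return 0.92f + 0.08f * (alpha - 0.70f) / 0.25f;
}

inline float calculateGamma(float variance) {
    return 0.15f + 0.15f * normalizedVariance(variance);
}

// EMA of the per-frame mean of M_d and of its squared deviation.
struct MotionStats {
    float mean = 0.0f;      // mu(t)
    float variance = 0.0f;  // sigma^2(t)

    void update(float frameMean, float lambda) {
        mean = lambda * mean + (1.0f - lambda) * frameMean;
        const float d = frameMean - mean;  // uses the updated mu(t), as in the formula
        variance = lambda * variance + (1.0f - lambda) * d * d;
    }
};
```

Note that `calculateBeta(0.95f)` evaluates to 1.00 with these constants, so at maximum variance the residue would not decay at all; it may be worth checking whether that is intended.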
### Algorithm Components (SOLID Separation)
#### File Structure
```
src/algorithms/
├── motionStatistics.hpp/cpp
│ - updateStatistics(motionMap, frameCount)
│ - getMean(), getVariance(), getThreshold()
│ - calculatePercentile25() [called every 30 frames]
│
├── motionParameters.hpp/cpp
│ - calculateAlpha(variance)
│ - calculateBeta(alpha)
│ - calculateGamma(variance)
│ - Pure functions, stateless
│
├── motionFusion.hpp/cpp
│ - updatePixelMotion(pixels, motionMap, α, β, γ, τ)
│ - calculateScale(pixels)
│
├── sobel.hpp/cpp
│ - calculateSobelEnergy(gray, pixels)
│
├── seamCarving.hpp/cpp
│ - calculateEnergy()
│ - invalidateSeam()
│
└── energyFuser.hpp/cpp
- fuseEnergy() [initial fusion, not used per-frame]
```
**Key Principle:** Algorithms are stateless pure functions. Threads coordinate algorithms. Main coordinates threads. Data holds state only.
### Known Issues & Current Problems
#### Critical Issues
1. **Motion scale calculation:** Global averaging causes entire frame to respond to local motion
- avgImportance ~2, scale = 2/255 = 0.008
- Motion values get crushed (motion 50 becomes 0.39)
- Need per-pixel scale or better amplification strategy
2. **Incomplete importance fusion:** Base importance uses only Sobel; depth and saliency are missing
- Current: I_c = I_sobel + M_c + M_r
- Should be: I_c = w_s·I_sobel + w_d·I_depth + w_sal·I_saliency + M_c + M_r (weights written as w to avoid clashing with the decay rate α)
3. **Motion trails barely visible:** Parameters correct (β=0.92, γ=0.15) but scale crushes values
- Residue accumulates (residue ~0.9) but too small compared to importanceBase
- Scale amplification insufficient
4. **Crash on exit:** Validator thread accessing deleted data
- Need proper shutdown sequence with condition variable notification
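For issue 4, one possible shape for a shutdown-aware validation queue is sketched below. `ValidationQueue` and its methods are hypothetical names; the idea is that `stop()` sets a flag under the mutex and notifies all waiters, so a blocked validator wakes up and can exit instead of touching freed data.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class ValidationQueue {
public:
    // Enqueues work; items pushed after stop() are dropped.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopped_) return;
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available or stop() has been called.
    // Returns false only when stopped and no queued work remains.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopped_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Wakes every waiting consumer so it can observe the stop flag.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool stopped_ = false;
};
```

On exit the order would then be: call `stop()`, join the validator thread, and only afterwards release the pixel arrays. The validator loop becomes `while (queue.pop(item)) { ... }`.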
#### Performance Considerations
- Motion statistics: O(1) per frame (EMA updates) ✓
- Threshold calculation: O(N log N) every 30 frames (percentile sorting)
- Motion scale: O(N) per frame (iterates all pixels) ✗
- Sobel: O(N) per frame (necessary)
**Optimization targets:**
- Cache motion scale, update every 60 frames
- Or use per-pixel scale based on local importanceBase
### Data Flow Summary
```
Frame arrives
↓
Main Thread: Sobel → importanceBase (every frame)
↓
Motion Thread:
Frame diff → motionMap
↓
Statistics update (μ, σ², τ via EMA)
↓
Parameter calculation (α, β, γ from variance)
↓
Apply to pixels: M_c, M_r, I_c
↓
Check invalid pixels with motion → queue for validator
↓
Validator Thread (blocked until queue has work):
Dequeue pixel → walk seam chain → revalidate
↓
Retarget Thread (runs when width > target):
Calculate energy → find seam → invalidate → set pointers
↓
Display Thread:
Show original, packed (cropped), importance visualization
```
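The "packed" view in the flow above depends on left-packing surviving pixels after seam removal. A minimal per-row sketch, assuming a 0/1 validity row and a hypothetical helper `packRow` that produces that row's slice of `mapX`:

```cpp
#include <cstdint>
#include <vector>

// For one row: returns the original x of each surviving pixel,
// in left-packed order. mapX[i] is the source column of packed column i.
std::vector<int> packRow(const std::vector<std::uint8_t>& validRow) {
    std::vector<int> mapX;
    mapX.reserve(validRow.size());
    for (int x = 0; x < static_cast<int>(validRow.size()); ++x) {
        if (validRow[static_cast<std::size_t>(x)] != 0) mapX.push_back(x);
    }
    return mapX;
}
```

The length of the returned vector is the row's current width, which is what the display thread would crop the packed frame to.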
### Integration Points for Future Development
#### Next Priority Items
1. **Fix motion scale:** Per-pixel scale or cached global with proper amplification
2. **Integrate depth maps:** RGB-D camera or depth estimation network
3. **Integrate saliency maps:** Pre-trained saliency detection model
4. **Implement scene change detection:** Percentage threshold triggers full fusion
5. **Full importance fusion:** Multi-modal combination with learned weights
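For item 4, a percentage-threshold trigger might be as simple as the sketch below. `movingFraction`, `isSceneChange`, and the `changedFractionLimit` parameter are all hypothetical; the notes do not specify which percentage is measured or what limit to use, so this assumes "fraction of pixels with M_d above τ".

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fraction of pixels whose frame difference exceeds the motion threshold.
inline float movingFraction(const std::vector<std::uint8_t>& motionMap, float tau) {
    if (motionMap.empty()) return 0.0f;
    std::size_t moving = 0;
    for (std::uint8_t m : motionMap) {
        if (static_cast<float>(m) > tau) ++moving;
    }
    return static_cast<float>(moving) / static_cast<float>(motionMap.size());
}

// True when enough of the frame changed to justify a full recalculation.
// changedFractionLimit is an assumed tuning parameter in (0, 1).
inline bool isSceneChange(const std::vector<std::uint8_t>& motionMap,
                          float tau,
                          float changedFractionLimit) {
    return movingFraction(motionMap, tau) > changedFractionLimit;
}
```

Given the research constraint of no arbitrary constants, the limit itself would still need a derivation or a reported parameter sweep.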
#### VR Integration (Future)
- Replace OpenCV windows with VR framework
- Optimize for 90/120fps requirements
- Handle VR-specific event loops
#### Performance Optimizations Available
- GPU acceleration for frame differencing (10-20x speedup)
- GPU Sobel calculation (5-15x speedup)
- SIMD for motion fusion inner loops
- Christmas tree optimization for incremental energy
### Lessons Learned
#### What Worked Well
- **Adaptive parameters from statistics:** Natural scene response
- **EMA for temporal smoothing:** Efficient and mathematically justified
- **Per-seam locking:** Simple, sufficient granularity
- **Queue-based validator:** Efficient blocking wait
- **Motion memory equation:** Dual decay creates trails
- **DisplayManager simplicity:** No unnecessary abstractions
- **Algorithm component separation:** Easy to test and tune
#### What Didn't Work
- **Fixed magic numbers:** Required statistical derivation
- **Global motion scale:** Frame-wide artifacts
- **Abstract OutputController:** Overcomplicated windows
- **Per-frame expensive operations:** Moved to periodic
- **Static parameters:** Required scene adaptation
- **Separate motion arrays:** Moved to PixelCore for cache locality
#### Design Patterns Applied
- **SOLID:** Algorithm components are stateless pure functions
- **Strategy (removed):** Overcomplicated, replaced with simple manager
- **Observer (avoided):** Direct calls instead
- **Queue with condition variable:** Efficient validator wake-on-work
- **Exponential moving average:** Temporal smoothing pattern
### Communication Style & Preferences
- Direct feedback, call out flaws immediately
- Code only on request, explain approach first
- Concise by default, detail on demand
- Systematic debugging, eliminate causes methodically
- Simple solutions over over-engineering
- Honest technical assessment even if harsh
- Skip pleasantries, get to the point
### Current Research Context
Paper requires mathematical justification for all parameters. No arbitrary constants allowed. All values derived from:
- Scene statistics (variance, mean)
- Signal processing theory (time constants, cutoff frequencies)
- Empirical parameter sweeps with reported results
**Justified parameters:**
- λ = 0.951 from τ = 20 frame time constant (e^(-1/20))
- α, β, γ from normalized variance mapping to ranges
- τ from 25th percentile (robust noise floor estimation)
- 3.0 multiplier from signal processing (3σ rule)
**Still needs justification:**
- Motion scale calculation method
- VARIANCE_MAX = 100 (empirical observation needed)
- Beta offset and mapping (current formula keeps β > α, but yields β = 1.00 at α = 0.95, i.e. no residue decay at maximum variance)