A research-oriented project exploring dog emotion recognition from video (posture/movement) and audio (volume, pitch, spectrogram) signals.
This repository experiments with YOLOv8-based movement detection, Librosa audio feature extraction, and LSTM-based sequence modeling for multimodal emotion classification.
- Behavior data collection: Extracted motion & audio data from YouTube dog videos.
- Movement recognition: Applied YOLOv8 + OpenCV for activity state classification.
- Audio analysis: Used Librosa for pitch, volume, and spectrogram features.
- Structured logging: Stored features in `.log` and `.jsonl` files for reproducible experiments.
- Emotion prediction prototype: An LSTM model predicts emotions such as calm, anxious, sleepy, bored, excited, and focused.
## Installation

```bash
# Clone the repository
git clone https://github.com/yuna0831/dog-posture-emotion-ai.git
cd dog-posture-emotion-ai

# Create & activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
## Usage

### Run Emotion Prediction from Logs

```bash
cd vision
python3 predict_emotion_from_log.py
```
### Example Output

```text
[0-30] → Predicted emotion: focused (confidence: 0.733)
Distribution:
  calm    : 0.0
  anxious : 0.01
  sleepy  : 0.253
  bored   : 0.001
  excited : 0.003
  focused : 0.733
[30-60] → Predicted emotion: focused (confidence: 0.889)
Predictions saved to: emotion_predictions.jsonl
```
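The `[0-30]` and `[30-60]` labels above suggest the log is scored in fixed-size frame windows before being fed to the LSTM. A minimal, NumPy-only sketch of that windowing step (the 30-frame window size matches the example output; the feature layout is an assumption):

```python
import numpy as np


def make_windows(frames: np.ndarray, window: int = 30) -> list[tuple[str, np.ndarray]]:
    """Split a (num_frames, num_features) array into fixed-size,
    non-overlapping windows labeled like "[0-30]", "[30-60]".
    An incomplete trailing window is dropped."""
    out = []
    for start in range(0, len(frames) - window + 1, window):
        label = f"[{start}-{start + window}]"
        out.append((label, frames[start:start + window]))
    return out
```

Each windowed array can then be passed to the sequence model as one sample, which is why predictions in the output are reported per frame range.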
## Current Limitations

- Testing with `sleepy.log` produced predictions biased toward `focused`, so current accuracy is limited.
- This bias suggests the model weighs posture cues more heavily than audio cues.
- The relationship between behavior patterns and labeled emotions needs deeper analysis.
## Next Steps

- Refine the labeling strategy for higher-quality ground truth.
- Balance multimodal contributions (movement vs. sound).
- Improve the LSTM architecture and add more robust evaluation metrics.
- Extend the dataset beyond YouTube sources for better generalization.