Rydlr MotionBlend connector #326
Conversation
- Add METRICS.md with detailed metric specifications:
  * L2 Velocity/Acceleration formulas
  * FID and Coverage metrics
  * Diversity metrics (Global/Local/Inter/Intra)
  * Transition smoothness scoring
  * Quality categories and scoring weights
- Document key joints tracked for discontinuity detection
- Include BigQuery schema for metrics columns
- Add academic references (Tselepi 2025, Guo 2020, Petrovich 2021)
- Provide usage instructions for full metrics pipeline

Complements connector.py with research-backed quality evaluation.
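The L2 velocity statistics listed above can be sketched in plain Python. This is an illustrative implementation, not the connector's actual code: the function name, the flattened per-frame joint layout, and the fps scaling convention are assumptions; METRICS.md holds the authoritative formulas.

```python
import math

def l2_velocity_stats(frames, fps=30.0):
    """Per-frame L2 velocity stats for a motion clip (illustrative sketch).

    `frames` is a list of flattened joint-position vectors, one per frame.
    Velocity per step is the L2 norm of the frame-to-frame difference,
    scaled by fps to get units per second.
    """
    velocities = []
    for prev, curr in zip(frames, frames[1:]):
        diff = [c - p for c, p in zip(curr, prev)]
        velocities.append(fps * math.sqrt(sum(d * d for d in diff)))
    if not velocities:
        return None  # a clip with < 2 frames has no velocity
    mean = sum(velocities) / len(velocities)
    var = sum((v - mean) ** 2 for v in velocities) / len(velocities)
    return {"mean": mean, "std": math.sqrt(var), "max": max(velocities)}
```

Acceleration stats would follow the same pattern applied to the velocity sequence instead of the position frames.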
- Add 19 new columns for motion blend quality metrics:
  * FID and Coverage (distribution metrics)
  * Global/Local/Inter/Intra Diversity
  * L2 Velocity stats (mean, std, max, transition)
  * L2 Acceleration stats (mean, std, max, transition)
  * Transition smoothness, velocity/acceleration ratios
  * Overall quality_score and quality_category
- Update transform_blend_record() to include metric placeholders
- Metrics populated as NULL initially, filled by post-processing pipeline
- Enables complete quality evaluation workflow: sync → compute → update

Aligns with research metrics from Tselepi (2025), Guo (2020), Petrovich (2021). See METRICS.md for detailed specifications.
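The NULL-placeholder pattern can be sketched as follows. The exact column names are guesses derived from the metric categories in the commit message, not the connector's real schema; only the count (19) and the NULL-at-sync-time behavior come from the source.

```python
# Hypothetical column list: 19 metric columns, grouped as in the commit
# message. Real names live in the connector's schema(), not here.
METRIC_COLUMNS = [
    "fid", "coverage",
    "global_diversity", "local_diversity", "inter_diversity", "intra_diversity",
    "l2_velocity_mean", "l2_velocity_std", "l2_velocity_max", "l2_velocity_transition",
    "l2_acceleration_mean", "l2_acceleration_std", "l2_acceleration_max",
    "l2_acceleration_transition",
    "transition_smoothness", "velocity_ratio", "acceleration_ratio",
    "quality_score", "quality_category",
]

def with_metric_placeholders(record):
    """Return a copy of `record` with every metric column present but None,
    which BigQuery stores as NULL until the post-processing pipeline fills it."""
    out = dict(record)
    out.update({col: None for col in METRIC_COLUMNS})
    return out
```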
- Complete setup instructions (Docker, Cloud, config)
- Full index schema with all quality metric fields
- 6 search examples (quality filtering, smoothness, diversity, similarity, aggregations, issue detection)
- Python client code samples
- Incremental sync strategy with Cloud Run job
- Performance tuning (sharding, batch size, query optimization)
- Monitoring and health checks
- Security best practices (API keys, HTTPS, VPC)
- Cost optimization strategies
- Complete pipeline architecture (GCS → Fivetran → BQ → dbt → ES → API)

Enables fast, scalable search across motion blend quality metrics for animation studios and ML pipelines.
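A quality-filtering search of the kind the guide describes might look like the query body below. The index layout is hypothetical: the field names (`quality_score`, `transition_smoothness`) are taken from the metric list, but thresholds and structure are illustrative, not from ELASTICSEARCH.md.

```python
# Hypothetical Elasticsearch query body: fetch the 20 highest-quality
# blends whose transitions are reasonably smooth. Thresholds are made up.
query = {
    "query": {
        "bool": {
            "filter": [
                {"range": {"quality_score": {"gte": 0.8}}},
                {"range": {"transition_smoothness": {"gte": 0.7}}},
            ]
        }
    },
    "sort": [{"quality_score": "desc"}],
    "size": 20,
}
```

With the official Python client this would be passed as the request body of a search against the blend-metrics index.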
Pull Request Overview
This PR introduces a new MotionBlend connector for syncing motion-capture (MoCap) metadata from Google Cloud Storage (GCS) to BigQuery. The connector scans GCS folders for .bvh and .fbx files, extracts metadata, and loads it into three BigQuery tables representing different motion categories (seed, build, and blend motions).
Key changes:
- Implements a new connector with GCS-to-BigQuery data pipeline
- Supports three motion categories with different schemas and transformation logic
- Includes comprehensive documentation for Elasticsearch integration and quality metrics
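The routing of scanned files into the three motion categories can be sketched as a simple prefix lookup. The prefix strings and table names below are assumptions for illustration; the real connector derives the mapping from its configured GCS prefixes.

```python
def motion_category(blob_name, prefix_map=None):
    """Map a GCS object path to one of the three motion tables (sketch).

    The prefix-to-table mapping is illustrative; the connector builds it
    from the google_cloud_storage_prefixes configuration value.
    """
    prefix_map = prefix_map or {
        "seed/": "seed_motions",
        "build/": "build_motions",
        "blend/": "blend_motions",
    }
    for prefix, table in prefix_map.items():
        if blob_name.startswith(prefix):
            return table
    return None  # file outside the configured prefixes is skipped
```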
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 24 comments.
Show a summary per file
| File | Description |
|---|---|
| connectors/motionblend/connector.py | Core connector implementation with schema, update, and transformation logic |
| connectors/motionblend/configuration.json | Configuration parameters for GCS bucket, prefixes, and BigQuery dataset |
| connectors/motionblend/requirements.txt | Third-party dependencies (structlog, GCP libraries, testing frameworks) |
| connectors/motionblend/README.md | Comprehensive documentation covering setup, authentication, and usage |
| connectors/motionblend/METRICS.md | Quality metrics documentation for motion blend evaluation |
| connectors/motionblend/ELASTICSEARCH.md | Elasticsearch integration guide with search examples |
Comments suppressed due to low confidence (4)
connectors/motionblend/connector.py:18
- Import of 'bigquery' is not used.
from google.cloud import storage, bigquery
connectors/motionblend/connector.py:19
- Import of 'retry' is not used.
from google.api_core import retry
connectors/motionblend/connector.py:21
- Import of 'time' is not used.
import time
connectors/motionblend/connector.py:326
- Import of 'sys' is not used.
import sys
- Remove excessive inline comments from table definitions
- Move skeleton hierarchy documentation to schema() docstring
- Improve formatting consistency with other connectors
- Maintain clean, readable structure matching SDK patterns
RydlrCS
left a comment
Pushed successfully! The schema formatting cleanup has been committed and pushed to the main branch.
Production-ready code with comprehensive documentation and proper error handling.
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 22 comments.
- Add last_sync_time parameter to list_gcs_files() function
- Filter GCS blobs by updated timestamp for incremental processing
- Read state before processing each prefix
- Only process files updated after last sync time
- Log incremental vs full sync mode for observability
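The timestamp filter at the heart of this incremental sync can be shown in isolation. In the real connector the timestamps come from each GCS blob's `updated` field and `last_sync_time` from connector state; here they are plain tuples, and the function name is illustrative.

```python
from datetime import datetime, timezone

def filter_new_blobs(blobs, last_sync_time):
    """Keep only blobs updated strictly after the last sync (sketch).

    `blobs` is a list of (name, updated_timestamp) tuples. A None
    last_sync_time means no prior state, i.e. a full sync.
    """
    if last_sync_time is None:
        return list(blobs)  # full sync: process everything
    return [(name, updated) for name, updated in blobs if updated > last_sync_time]
```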
- Remove confusing __RETRY_DELAY_SECONDS constant
- Use standard pattern: sleep_time = min(60, 2 ** attempt)
- Improves code readability and follows Python best practices
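The backoff formula quoted above, `sleep_time = min(60, 2 ** attempt)`, slots into a retry loop like the one below. The wrapper itself (name, signature, injectable `sleep`) is an illustrative sketch, not the connector's actual retry helper; only the formula is from the commit.

```python
import time

def retry_with_backoff(operation, max_attempts=5, sleep=time.sleep):
    """Retry `operation` with exponential backoff capped at 60 seconds.

    Delays grow 1, 2, 4, 8, ... seconds, never exceeding 60. The `sleep`
    parameter is injectable so the pattern can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the last error
            sleep_time = min(60, 2 ** attempt)
            sleep(sleep_time)
```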
- Update schema() to lines 43-128
- Update list_gcs_files() to lines 131-222
- Update generate_record_id() to lines 249-262
- Update transform functions to lines 265-373
- Update update() to lines 375-469

Line numbers changed after incremental sync implementation.
- Change google-cloud-storage>=2.0.0 to ==2.18.2
- Remove requests dependency (already available in SDK runtime)
- Ensures reproducible builds and prevents breaking changes
- Move json import from __main__ block to top with other imports
- Follow PEP 8 standard: all imports at top of file
- Maintain proper import grouping: stdlib after SDK, before third-party
Pull Request Overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Critical fix: Track global_max_timestamp incrementally at each checkpoint instead of only at prefix end. This prevents a state inconsistency where intermediate checkpoints would use different timestamps than the final state update.

Problem: State was updated with last_file_timestamp at checkpoints (line 651) but then overwritten with global_max_timestamp at the end (line 953). If the connector failed between the last checkpoint and the final update, state would hold an intermediate timestamp, causing reprocessing.

Solution: Use a shared global_max_timestamp_tracker dict that is updated for each file and used consistently at all checkpoints. State now always contains the true global maximum timestamp of processed files, ensuring monotonic progression and preventing data loss/duplication.
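The tracker pattern the fix describes can be sketched in a few lines. The key name (`"max"`) and state key (`"last_sync_time"`) are assumptions for illustration; the point is that every checkpoint reads the same shared maximum rather than the last file's timestamp.

```python
def update_tracker(tracker, file_timestamp):
    """After processing one file, fold its timestamp into the shared tracker.

    Sketch of the global_max_timestamp_tracker idea: one mutable dict is
    updated per file and consulted at every checkpoint, so intermediate
    checkpoints and the final state update always agree.
    """
    current = tracker.get("max")
    if current is None or file_timestamp > current:
        tracker["max"] = file_timestamp
    return tracker

def checkpoint_state(tracker):
    """State written at a checkpoint uses the global maximum seen so far."""
    return {"last_sync_time": tracker.get("max")}
```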
Fix inconsistency where generate_blend_id() returned a 40-character hash while generate_record_id() returned 16 characters. This caused blend motion IDs to be 40 chars while seed/build motion IDs were 16 chars.

Standardized to 16 characters ([:16]) for all ID generation:
- Seed motion IDs: 16 chars (already consistent)
- Build motion IDs: 16 chars (already consistent)
- Blend motion IDs: 16 chars (fixed from 40)

This ensures a consistent ID format across all tables and prevents confusion in downstream analytics.
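The standardized `hexdigest[:16]` pattern looks roughly like this. The hash algorithm and the exact key construction are assumptions; only the 16-character truncation is stated in the commit.

```python
import hashlib

def generate_record_id(*parts):
    """Deterministic 16-character ID truncated from a hex digest (sketch).

    Joining the input parts and using SHA-256 are illustrative choices;
    the commit only fixes the truncation length to [:16] everywhere.
    """
    key = "|".join(str(p) for p in parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```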
Refactored to reuse transform_blend_record() function instead of duplicating 32 lines of transformation logic in blend generation loop.

Changes:
- Added optional 'blend_meta' parameter to transform_blend_record()
- Function now accepts pre-computed blend metadata from generate_blend_pairs()
- Removed duplicated transformation logic (lines 749-781)
- Replaced with single call to transform_blend_record()

Benefits:
- Single source of truth for blend record transformation
- Schema changes only need to be applied in one place
- Reduced maintenance burden and risk of inconsistencies
- 32 lines of code eliminated
Reworded framework reference for better readability:
- Combined documentation references on single line
- Changed 'See https://...' to 'Framework implementation: https://...'
- More consistent and professional formatting
Extracted three helper functions to reduce cognitive complexity from >15 to <10:
- _calculate_blend_batch_limit(): Calculates batch limit based on remaining budget
- _upsert_blend_records(): Handles record transformation, upserting, and checkpointing
- _clear_motion_buffers(): Clears buffers to free memory

Benefits:
- Main function now has single-level conditional logic (complexity ~8)
- Each helper has single responsibility and low complexity (<5)
- Improved testability - helpers can be unit tested independently
- Better code organization and readability
- Maintained all error handling and logging behavior
Updated README configuration table to match exact placeholders from configuration.json:
- google_cloud_storage_prefixes: <COMMA_SEPARATED_GCS_PREFIXES> → <YOUR_COMMA_SEPARATED_PREFIXES>
- batch_limit: <OPTIONAL_BATCH_LIMIT_FOR_TESTING_ONLY> → <YOUR_OPTIONAL_BATCH_LIMIT>
- include_extensions: Changed Default column from '.bvh,.fbx' to '<YOUR_OPTIONAL_FILE_EXTENSIONS>'
- max_blend_pairs: Changed Default column from '100' to '<YOUR_OPTIONAL_MAX_BLEND_PAIRS>'
- max_motion_buffer_size: Changed Default column from '1000' to '<YOUR_OPTIONAL_MAX_BUFFER_SIZE>'

This ensures documentation accurately reflects the configuration template users will see.
Replaced conditional assignment pattern with early return for clearer intent:
- Before: if min_frames > 0 ... else actual_window_ratio = 0.0
- After: if min_frames == 0: return None (early return)

Benefits:
- Prevents potential ZeroDivisionError more explicitly
- Clearer intent - zero-length motions cannot have quality scores
- Simpler control flow without conditional assignment
- Consistent with existing early return at line 121
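The before/after shapes can be shown side by side. Function and parameter names are illustrative; only the guard condition and the None return come from the commit.

```python
def actual_window_ratio(window_frames, min_frames):
    """Early-return form from the commit (sketch).

    Zero-length motions cannot have quality scores, so return None
    instead of risking a ZeroDivisionError or a misleading 0.0 ratio.
    """
    if min_frames == 0:
        return None  # early return: no meaningful ratio exists
    return window_frames / min_frames
```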
fivetran-sahilkhirwal
left a comment
The logical flow is correct
As these examples will be available for users to apply to their own use cases, it is important that they are readable and maintainable.
Suggested some changes to improve readability and to refactor some long methods into smaller functions.
Thanks for contributing to the repo :)
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.
Please add this in the main repository readme as well.

Python formatting check failed. See formatting.diff artifact for details.
fivetran-pranavtotala
left a comment
Pending: Please address the above comments and add the required testing details in the description.
…omprehensive testing docs

Major improvements:
- Refactored connector.py: broke down long methods into focused helper functions
  * list_gcs_files() -> 4 helpers (connection, blob filtering, timestamp check, record creation)
  * validate_configuration() -> 4 validators (string, prefixes, extensions, integers)
  * update() -> 6 module-level functions (batch processing, prefix processing, blend generation)
  * Reduced update() from ~300 lines to ~50 lines for better maintainability

Critical bug fixes:
- Fixed state tracking: changed from global maximum to batch-local sequential tracking
  * Prevents data loss when files arrive out of order across different batches
  * State now reflects actual processing progress within current batch
- Fixed validation edge cases:
  * Empty string handling ("0" now correctly validates as integer)
  * Division by zero in blend_utils.est…

Code quality improvements:
- Eliminated nested functions
- Simplified zero-division guards in calculate_blend_ratio()
- Explicit `is not None` checks instead of truthiness for optional parameters
- Applied Black formatting (99-char line length)

Testing documentation:
- Local testing setup and validation checklist
- 6 detailed test scenarios (first sync, … recovery, etc.)
- Log analysis patterns and known limitations
- Production deployment checklist
- Updated Features and Error handling sections

Contributor: Ted Iro, Rydlr Cloud Services Ltd
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
- Fix _validate_optional_integer_parameter to use min_value parameter instead of hardcoded <= 0 check
- Update schema() docstring to match required template format (moved skeleton hierarchy context to module-level docstring)

These changes ensure proper parameter validation and maintain consistency with coding guidelines.
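The min_value fix (and the earlier "0 validates as integer" edge case) can be sketched together. The function below is an assumption-laden stand-in for _validate_optional_integer_parameter: signature, empty-string handling, and error messages are illustrative.

```python
def validate_optional_integer(name, value, min_value=0):
    """Validate an optional integer config parameter (sketch).

    Compares against `min_value` rather than a hardcoded `<= 0` check,
    so "0" is accepted whenever min_value permits it. An absent or empty
    value is treated as "not configured".
    """
    if value is None or value == "":
        return None  # parameter is optional
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        raise ValueError(f"{name} must be an integer, got {value!r}")
    if parsed < min_value:
        raise ValueError(f"{name} must be >= {min_value}, got {parsed}")
    return parsed
```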
- Apply Black formatting to connector.py
- Add motionblend entry to README.md in alphabetical order between microsoft_intune and n8n
Description of Change
Added the MotionBlend connector example, which ingests motion-capture (MoCap) metadata from Google Cloud Storage (GCS) and delivers it to BigQuery for downstream transformation and analytics. The connector scans folders containing .bvh and .fbx files, extracts key metadata (motion type, frame rate, skeleton ID, timestamps), and loads structured information into BigQuery.
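For .bvh files, the frame-count and frame-rate extraction described above can be sketched from the BVH format itself: a BVH MOTION section stores `Frames: N` and `Frame Time:` (seconds per frame), so frame rate is the reciprocal of frame time. The function below is a minimal sketch of that kind of extraction, not the connector's actual parser.

```python
import re

def parse_bvh_motion_header(text):
    """Extract frame count and frame rate from a BVH MOTION section (sketch)."""
    frames = re.search(r"Frames:\s*(\d+)", text)
    frame_time = re.search(r"Frame Time:\s*([\d.]+)", text)
    if not frames or not frame_time:
        return None  # not a well-formed MOTION header
    ft = float(frame_time.group(1))
    return {
        "frame_count": int(frames.group(1)),
        # frame rate is frames per second: the reciprocal of frame time
        "frame_rate": round(1.0 / ft, 2) if ft else None,
    }
```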
Testing
Checklist
Project Write Up
Rydlr _ Moverse_ High-Throughput MoCap AI Pipeline (2).pdf