
Conversation

@RydlrCS commented Oct 24, 2025

Description of Change
Added the MotionBlend connector example, which ingests motion-capture (MoCap) metadata from Google Cloud Storage (GCS) and delivers it to BigQuery for downstream transformation and analytics. The connector scans folders containing .bvh and .fbx files, extracts key metadata (motion type, frame rate, skeleton ID, timestamps), and loads structured information into BigQuery.
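
For context, here is a minimal sketch of the connector's overall shape, assuming the standard fivetran_connector_sdk entry points; the table, columns, and the GCS stub are illustrative, not the exact code in this PR:

```python
# Sketch only: assumes the standard Fivetran Connector SDK entry points.
from fivetran_connector_sdk import Connector
from fivetran_connector_sdk import Operations as op


def _scan_gcs_metadata(configuration: dict, since):
    # Stand-in for the PR's list_gcs_files(): scan GCS for .bvh/.fbx
    # files and yield one metadata record per file.
    yield {
        "record_id": "demo0000demo0000",
        "motion_type": "walk",
        "frame_rate": 30.0,
        "skeleton_id": "skel_01",
        "updated_at": "2025-10-24T00:00:00Z",
    }


def schema(configuration: dict):
    # Declare a destination table and its primary key (illustrative).
    return [
        {
            "table": "seed_motion",
            "primary_key": ["record_id"],
            "columns": {
                "record_id": "STRING",
                "motion_type": "STRING",
                "frame_rate": "FLOAT",
                "skeleton_id": "STRING",
                "updated_at": "UTC_DATETIME",
            },
        }
    ]


def update(configuration: dict, state: dict):
    # Extract metadata, upsert each record, then checkpoint state.
    for record in _scan_gcs_metadata(configuration, state.get("last_sync_time")):
        yield op.upsert(table="seed_motion", data=record)
    yield op.checkpoint(state)


connector = Connector(update=update, schema=schema)
```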

Testing

  • Ran fivetran debug on the connector; no errors found in connector.py, README.md, or configuration.json.
  • README follows the template and describes usage, configuration, and schema.
  • Example configuration uses placeholders and no real secrets.
  • Python code is PEP 8 compliant and passes linting.

Checklist

  • Tested the connector with fivetran debug command (no errors found).
  • Added/Updated the example-specific README.md file, following the repository's README template.
  • Followed the repository's Python Coding Standards.
  • All automated checks passed. If you need screenshots or further validation, let me know!

Project Write Up
Rydlr _ Moverse_ High-Throughput MoCap AI Pipeline (2).pdf

- Add METRICS.md with detailed metric specifications:
  * L2 Velocity/Acceleration formulas (see the sketch after this note)
  * FID and Coverage metrics
  * Diversity metrics (Global/Local/Inter/Intra)
  * Transition smoothness scoring
  * Quality categories and scoring weights

- Document key joints tracked for discontinuity detection
- Include BigQuery schema for metrics columns
- Add academic references (Tselepi 2025, Guo 2020, Petrovich 2021)
- Provide usage instructions for full metrics pipeline

Complements connector.py with research-backed quality evaluation.
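
As a companion to METRICS.md, here is a sketch of one common formulation of the L2 velocity/acceleration statistics; the exact formulas are specified in METRICS.md, and the stat names below are assumptions:

```python
# Sketch only: one common formulation of L2 velocity/acceleration stats.
import numpy as np


def l2_motion_stats(positions: np.ndarray, fps: float) -> dict:
    """positions: (frames, joints, 3) array of joint positions; frames >= 3."""
    # First difference along time, scaled from units/frame to units/second.
    velocity = np.linalg.norm(np.diff(positions, axis=0), axis=-1) * fps
    # Second difference gives per-frame, per-joint acceleration magnitudes.
    acceleration = np.abs(np.diff(velocity, axis=0)) * fps
    return {
        "l2_velocity_mean": float(velocity.mean()),
        "l2_velocity_std": float(velocity.std()),
        "l2_velocity_max": float(velocity.max()),
        "l2_acceleration_mean": float(acceleration.mean()),
        "l2_acceleration_std": float(acceleration.std()),
        "l2_acceleration_max": float(acceleration.max()),
    }
```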
- Add 19 new columns for motion blend quality metrics:
  * FID and Coverage (distribution metrics)
  * Global/Local/Inter/Intra Diversity
  * L2 Velocity stats (mean, std, max, transition)
  * L2 Acceleration stats (mean, std, max, transition)
  * Transition smoothness, velocity/acceleration ratios
  * Overall quality_score and quality_category

- Update transform_blend_record() to include metric placeholders (sketched below)
- Metrics populated as NULL initially, filled by post-processing pipeline
- Enables complete quality evaluation workflow: sync → compute → update

Aligns with research metrics from Tselepi (2025), Guo (2020), Petrovich (2021)
See METRICS.md for detailed specifications
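
A sketch of the NULL-placeholder pattern described above; the column and field names are assumptions, and the key point is that metrics stay None (NULL in BigQuery) at sync time and are filled in later by the post-processing pipeline:

```python
# Sketch only: metric columns are populated as NULL during sync.
METRIC_COLUMNS = [
    "fid", "coverage", "global_diversity", "local_diversity",
    "inter_diversity", "intra_diversity", "transition_smoothness",
    "quality_score", "quality_category",
]


def transform_blend_record(file_meta: dict, blend_meta: dict | None = None) -> dict:
    # Base record from file metadata, optionally merged with pre-computed
    # blend metadata from the pairing step.
    record = {"blend_id": file_meta["blend_id"], **(blend_meta or {})}
    # Every metric column starts as None; the post-processing pipeline
    # fills them in afterwards (sync -> compute -> update).
    record.update({column: None for column in METRIC_COLUMNS})
    return record
```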
- Complete setup instructions (Docker, Cloud, config)
- Full index schema with all quality metric fields
- 6 search examples (quality filtering, smoothness, diversity, similarity, aggregations, issue detection)
- Python client code samples (one sketched below)
- Incremental sync strategy with Cloud Run job
- Performance tuning (sharding, batch size, query optimization)
- Monitoring and health checks
- Security best practices (API keys, HTTPS, VPC)
- Cost optimization strategies
- Complete pipeline architecture (GCS → Fivetran → BQ → dbt → ES → API)

Enables fast, scalable search across motion blend quality metrics
for animation studios and ML pipelines.
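
A sketch of a quality-filtering search in the spirit of the examples in ELASTICSEARCH.md, using the official elasticsearch-py client; the endpoint, index name, and field names are assumptions:

```python
# Sketch only: filter blends by quality metrics in Elasticsearch.
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200", api_key="<YOUR_API_KEY>")

response = es.search(
    index="motion_blends",
    query={
        "bool": {
            "filter": [
                {"range": {"quality_score": {"gte": 0.8}}},
                {"range": {"transition_smoothness": {"gte": 0.9}}},
            ]
        }
    },
    size=20,
)
for hit in response["hits"]["hits"]:
    print(hit["_source"]["blend_id"], hit["_source"]["quality_score"])
```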
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR introduces a new MotionBlend connector for syncing motion-capture (MoCap) metadata from Google Cloud Storage (GCS) to BigQuery. The connector scans GCS folders for .bvh and .fbx files, extracts metadata, and loads it into three BigQuery tables representing different motion categories (seed, build, and blend motions).

Key changes:

  • Implements a new connector with GCS-to-BigQuery data pipeline
  • Supports three motion categories with different schemas and transformation logic
  • Includes comprehensive documentation for Elasticsearch integration and quality metrics

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 24 comments.

| File | Description |
| --- | --- |
| connectors/motionblend/connector.py | Core connector implementation with schema, update, and transformation logic |
| connectors/motionblend/configuration.json | Configuration parameters for GCS bucket, prefixes, and BigQuery dataset |
| connectors/motionblend/requirements.txt | Third-party dependencies (structlog, GCP libraries, testing frameworks) |
| connectors/motionblend/README.md | Comprehensive documentation covering setup, authentication, and usage |
| connectors/motionblend/METRICS.md | Quality metrics documentation for motion blend evaluation |
| connectors/motionblend/ELASTICSEARCH.md | Elasticsearch integration guide with search examples |

Comments suppressed due to low confidence (4)

connectors/motionblend/connector.py:18

  • Import of 'bigquery' is not used.
from google.cloud import storage, bigquery

connectors/motionblend/connector.py:19

  • Import of 'retry' is not used.
from google.api_core import retry

connectors/motionblend/connector.py:21

  • Import of 'time' is not used.
import time

connectors/motionblend/connector.py:326

  • Import of 'sys' is not used.
    import sys

- Remove excessive inline comments from table definitions
- Move skeleton hierarchy documentation to schema() docstring
- Improve formatting consistency with other connectors
- Maintain clean, readable structure matching SDK patterns
github-actions bot added the size/XXL label (PR size: extra extra large) on Nov 20, 2025
@RydlrCS (Author) left a comment

Pushed successfully! The schema formatting cleanup has been committed and pushed to the main branch.

RydlrCS closed this Nov 20, 2025
RydlrCS reopened this Nov 20, 2025
RydlrCS (Author) commented Nov 20, 2025

Production-ready code with comprehensive documentation and proper error handling.

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 22 comments.

- Add last_sync_time parameter to list_gcs_files() function
- Filter GCS blobs by updated timestamp for incremental processing
- Read state before processing each prefix
- Only process files updated after last sync time
- Log incremental vs full sync mode for observability
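
A sketch of the incremental filter described above, using the google-cloud-storage client; the helper's exact signature in the PR may differ:

```python
# Sketch only: skip GCS blobs not updated since the last successful sync.
from datetime import datetime
from typing import Iterator, Optional

from google.cloud import storage


def list_gcs_files(
    bucket_name: str, prefix: str, last_sync_time: Optional[datetime]
) -> Iterator[storage.Blob]:
    client = storage.Client()
    for blob in client.list_blobs(bucket_name, prefix=prefix):
        # Incremental mode: only yield files updated after the last sync.
        if last_sync_time is not None and blob.updated <= last_sync_time:
            continue
        yield blob
```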
- Remove confusing __RETRY_DELAY_SECONDS constant
- Use standard pattern: sleep_time = min(60, 2 ** attempt)
- Improves code readability and follows Python best practices
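
The pattern in context, as a minimal retry loop (the retried operation is a stand-in):

```python
# Sketch only: capped exponential backoff around a retriable operation.
import time


def with_backoff(operation, max_attempts: int = 5):
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Standard pattern from this commit: exponential growth
            # capped at 60 seconds.
            time.sleep(min(60, 2 ** attempt))
```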
- Update schema() to lines 43-128
- Update list_gcs_files() to lines 131-222
- Update generate_record_id() to lines 249-262
- Update transform functions to lines 265-373
- Update update() to lines 375-469

Line numbers changed after incremental sync implementation
- Change google-cloud-storage>=2.0.0 to ==2.18.2
- Remove requests dependency (already available in SDK runtime)
- Ensures reproducible builds and prevents breaking changes
- Move json import from __main__ block to top with other imports
- Follow PEP 8 standard: all imports at top of file
- Maintain proper import grouping: stdlib after SDK, before third-party
Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Critical fix: Track global_max_timestamp incrementally at each
checkpoint instead of only at prefix end. This prevents state
inconsistency where intermediate checkpoints would use different
timestamps than the final state update.

Problem: State was updated with last_file_timestamp at checkpoints
(line 651) but then overwritten with global_max_timestamp at the end
(line 953). If connector failed between last checkpoint and final
update, state would have intermediate timestamp, causing reprocessing.

Solution: Use shared global_max_timestamp_tracker dict that is updated
for each file and used consistently at all checkpoints. State now
always contains the true global maximum timestamp of processed files,
ensuring monotonic progression and preventing data loss/duplication.
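
A sketch of the shared-tracker fix: a mutable dict threaded through processing so every checkpoint reads the same global maximum; the function and key names are assumptions:

```python
# Sketch only: keep the global max timestamp consistent at every checkpoint.
def track_and_checkpoint(file_record: dict, state: dict, tracker: dict) -> None:
    file_ts = file_record["updated_at"]
    # Update the shared tracker for every file, not just at prefix end.
    current = tracker.get("global_max_timestamp")
    if current is None or file_ts > current:
        tracker["global_max_timestamp"] = file_ts
    # Any checkpoint taken now uses the true global maximum.
    state["last_sync_time"] = tracker["global_max_timestamp"]
```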
Fix inconsistency where generate_blend_id() returned 40-character hash
while generate_record_id() returned 16 characters. This caused blend
motion IDs to be 40 chars while seed/build motion IDs were 16 chars.

Standardized to 16 characters ([:16]) for all ID generation:
- Seed motion IDs: 16 chars (already consistent)
- Build motion IDs: 16 chars (already consistent)
- Blend motion IDs: 16 chars (fixed from 40)

This ensures consistent ID format across all tables and prevents
confusion in downstream analytics.
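
A sketch of the standardized scheme, assuming a SHA-256 digest truncated with the [:16] slice mentioned above:

```python
# Sketch only: one 16-character hex ID format for all three tables.
import hashlib


def generate_record_id(*parts: str) -> str:
    digest = hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]  # same length for seed, build, and blend IDs


# e.g. generate_record_id("gs://bucket/motions/walk_01.bvh", "seed")
```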
Refactored to reuse transform_blend_record() function instead of
duplicating 32 lines of transformation logic in blend generation loop.

Changes:
- Added optional 'blend_meta' parameter to transform_blend_record()
- Function now accepts pre-computed blend metadata from generate_blend_pairs()
- Removed duplicated transformation logic (lines 749-781)
- Replaced with single call to transform_blend_record()

Benefits:
- Single source of truth for blend record transformation
- Schema changes only need to be applied in one place
- Reduced maintenance burden and risk of inconsistencies
- 32 lines of code eliminated
Reworded framework reference for better readability:
- Combined documentation references on single line
- Changed 'See https://...' to 'Framework implementation: https://...'
- More consistent and professional formatting
Extracted three helper functions to reduce cognitive complexity from >15 to <10:
- _calculate_blend_batch_limit(): Calculates batch limit based on remaining budget
- _upsert_blend_records(): Handles record transformation, upserting, and checkpointing
- _clear_motion_buffers(): Clears buffers to free memory

Benefits:
- Main function now has single-level conditional logic (complexity ~8)
- Each helper has single responsibility and low complexity (<5)
- Improved testability - helpers can be unit tested independently
- Better code organization and readability
- Maintained all error handling and logging behavior
Updated README configuration table to match exact placeholders from configuration.json:
- google_cloud_storage_prefixes: <COMMA_SEPARATED_GCS_PREFIXES> → <YOUR_COMMA_SEPARATED_PREFIXES>
- batch_limit: <OPTIONAL_BATCH_LIMIT_FOR_TESTING_ONLY> → <YOUR_OPTIONAL_BATCH_LIMIT>
- include_extensions: Changed Default column from '.bvh,.fbx' to '<YOUR_OPTIONAL_FILE_EXTENSIONS>'
- max_blend_pairs: Changed Default column from '100' to '<YOUR_OPTIONAL_MAX_BLEND_PAIRS>'
- max_motion_buffer_size: Changed Default column from '1000' to '<YOUR_OPTIONAL_MAX_BUFFER_SIZE>'

This ensures documentation accurately reflects the configuration template users will see.
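
For reference, the resulting configuration.json template would look like this, showing only the five keys discussed above (any further keys, such as the bucket name, are omitted here):

```json
{
  "google_cloud_storage_prefixes": "<YOUR_COMMA_SEPARATED_PREFIXES>",
  "batch_limit": "<YOUR_OPTIONAL_BATCH_LIMIT>",
  "include_extensions": "<YOUR_OPTIONAL_FILE_EXTENSIONS>",
  "max_blend_pairs": "<YOUR_OPTIONAL_MAX_BLEND_PAIRS>",
  "max_motion_buffer_size": "<YOUR_OPTIONAL_MAX_BUFFER_SIZE>"
}
```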
Replaced conditional assignment pattern with early return for clearer intent:
- Before: if min_frames > 0 ... else actual_window_ratio = 0.0
- After: if min_frames == 0: return None (early return)

Benefits:
- Prevents potential ZeroDivisionError more explicitly
- Clearer intent - zero-length motions cannot have quality scores
- Simpler control flow without conditional assignment
- Consistent with existing early return at line 121
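
A sketch of the early-return guard; the ratio computation itself is illustrative:

```python
# Sketch only: early return instead of conditional assignment.
def calculate_window_ratio(min_frames: int, window_frames: int) -> float | None:
    if min_frames == 0:
        # Zero-length motions cannot have quality scores.
        return None
    return window_frames / min_frames
```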
fivetran-sahilkhirwal added the accel Google hack 2025 label (PR related to the Accelerate AI 2025 Google-sponsored Hackathon) on Dec 2, 2025
@fivetran-sahilkhirwal (Contributor) left a comment

The logical flow is correct.
Since these examples will be available for users to adapt to their own use cases, it is important that they are readable and maintainable.
Suggested some changes to improve readability and to refactor some long methods into smaller functions.
Thanks for contributing to the repo :)

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

fivetran-sahilkhirwal (Contributor) commented

Please add this to the main repository README as well.

Python formatting check failed. See formatting.diff artifact for details.
To fix the error(s) run the command below from the root of the repo and commit the changes:
./.github/scripts/fix-python-formatting.sh

@fivetran-pranavtotala (Contributor) left a comment

Pending: please respond to the comments above, and add the required testing details to the description.

fivetran deleted a comment from the cla-assistant bot on Jan 2, 2026
cla-assistant bot commented Jan 2, 2026

CLA assistant check
All committers have signed the CLA.

…omprehensive testing docs

Major improvements:
- Refactored connector.py: broke down long methods into focused helper functions
  * list_gcs_files() -> 4 helpers (connection, blob filtering, timestamp check, record creation)
  * validate_configuration() -> 4 validators (string, prefixes, extensions, integers)
  * update() -> 6 module-level functions (batch processing, prefix processing, blend generation)
  * Reduced update() from ~300 lines to ~50 lines for better maintainability

Critical bug fixes:
- Fixed state tracking: changed from global maximum to batch-local sequential tracking
  * Prevents data loss when files arrive out of order across different batches
  * State now reflects actual processing progress within current batch
- Fixed validation edge cases:
  * Empty string handling ("0" now correctly validates as integer)
  * Division by zero in blend_utils
  * batch_limit handling (testing-only parameter)

Code quality improvements:
- Eliminated nested functions
- Simplified zero-division guards in calculate_blend_ratio()
- Explicit is not None checks instead of truthiness for optional parameters
- Applied Black formatting (99-char line length)

Documentation:
- Comprehensive testing docs covering:
  * Local testing setup and validation checklist
  * 6 detailed test scenarios (first sync, incremental sync, recovery, etc.)
  * Log analysis patterns and known limitations
  * Production deployment checklist
- Updated Features and Error handling sections

Contributor: Ted Iro, Rydlr Cloud Services Ltd
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

- Fix _validate_optional_integer_parameter to use min_value parameter instead of hardcoded <= 0 check
- Update schema() docstring to match required template format (moved skeleton hierarchy context to module-level docstring)

These changes ensure proper parameter validation and maintain consistency with coding guidelines.
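
A sketch of what the parameterized validator could look like; the helper name matches the commit, but the body and the min_value default are assumptions:

```python
# Sketch only: min_value replaces the hardcoded <= 0 check.
def _validate_optional_integer_parameter(
    configuration: dict, key: str, min_value: int = 0
):
    raw = configuration.get(key)
    if raw is None or raw == "":
        return None  # optional parameter not supplied
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"'{key}' must be an integer, got {raw!r}")
    if value < min_value:
        raise ValueError(f"'{key}' must be >= {min_value}, got {value}")
    return value
```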
- Apply Black formatting to connector.py
- Add motionblend entry to README.md in alphabetical order between microsoft_intune and n8n

Labels

accel Google hack 2025 (PR related to the Accelerate AI 2025 Google-sponsored Hackathon) · size/XXL (PR size: extra extra large)


4 participants