feat: add DAL (CM-951)#3836

Merged
ulemons merged 3 commits into feat/add-project-discovery-worker from feat/add-dal-automatic-project-discovery
Mar 26, 2026

@ulemons ulemons commented Feb 10, 2026

Note

Medium Risk
Introduces a new Temporal ingestion path that streams large external datasets over HTTP and bulk upserts into Postgres, plus new DAL modules that will be used by other services. Risk is mainly around data correctness/performance and operational reliability (timeouts/retries, external API/bucket behavior).

Overview
Adds an Automatic Projects Discovery Temporal worker that discovers OSS repos from pluggable external sources and bulk upserts them into projectCatalog in batches, with new activities (listSources, listDatasets, processDataset) and a workflow mode switch (incremental latest-only vs full).

Introduces a source registry and two initial sources: OSSF Criticality Score (CSV snapshots from a public GCS bucket) and LF Criticality Score (paginated JSON API), including streaming + parsing/error propagation, and updates the Temporal schedule to run daily at midnight with a 2-hour workflow timeout.

Extends the data-access-layer with new project-catalog and evaluated-projects modules (CRUD, bulk insert/upsert helpers, and types) and exports them from the DAL index; also adds the csv-parse dependency and enables Postgres for the worker.
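
The batched bulk upsert described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `chunk` and `upsertInBatches` are hypothetical names, and the `upsertBatch` callback stands in for whichever DAL helper performs the real bulk upsert into projectCatalog.

```typescript
// Hypothetical helpers for illustration; none of these names come from the PR.

/** Split `items` into consecutive arrays of at most `size` elements. */
export function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError(`batch size must be a positive integer, got ${size}`)
  }
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

/**
 * Write `rows` one batch at a time and return how many rows were handed
 * to `upsertBatch`. The callback is an assumed stand-in for a DAL bulk upsert.
 */
export async function upsertInBatches<T>(
  rows: readonly T[],
  batchSize: number,
  upsertBatch: (batch: T[]) => Promise<void>,
): Promise<number> {
  let written = 0
  for (const batch of chunk(rows, batchSize)) {
    // Sequential on purpose: one statement in flight keeps memory and
    // connection usage bounded for large datasets.
    await upsertBatch(batch)
    written += batch.length
  }
  return written
}
```

Awaiting each batch before starting the next is a deliberate choice in this sketch; it trades throughput for predictable load on Postgres during a long-running workflow.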

Written by Cursor Bugbot for commit e9852f1.

@ulemons ulemons self-assigned this Feb 10, 2026
@ulemons ulemons added the POC label Feb 11, 2026
@ulemons ulemons changed the title feat: add DAL feat: add DAL (CM-951) Feb 11, 2026
@ulemons ulemons force-pushed the feat/add-project-discovery-worker branch 2 times, most recently from b03cb7a to 425e4cb Compare March 24, 2026 10:23
@ulemons ulemons force-pushed the feat/add-dal-automatic-project-discovery branch from bb118d5 to f4cc9b8 Compare March 24, 2026 10:30
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@ulemons ulemons force-pushed the feat/add-dal-automatic-project-discovery branch 2 times, most recently from 3cbe8ec to 0af35ee Compare March 24, 2026 11:56
@ulemons ulemons marked this pull request as ready for review March 24, 2026 14:01
Copilot AI review requested due to automatic review settings March 24, 2026 14:01
@ulemons ulemons force-pushed the feat/add-project-discovery-worker branch from b5c3b03 to 7e6dc18 Compare March 24, 2026 14:01
@ulemons ulemons force-pushed the feat/add-dal-automatic-project-discovery branch from 0af35ee to 82f29d9 Compare March 24, 2026 14:02
Copilot AI left a comment

Pull request overview

Adds new Data Access Layer (DAL) modules for the project catalog and evaluated projects domain, and re-exports them from the DAL package entrypoint.

Changes:

  • Introduces project-catalog DAL: types plus CRUD-ish query helpers (find/insert/bulk insert/upsert/update/delete).
  • Introduces evaluated-projects DAL: types plus query helpers for evaluation lifecycle and bulk insert.
  • Exports both modules via services/libs/data-access-layer/src/index.ts.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| services/libs/data-access-layer/src/project-catalog/types.ts | Defines DB-facing types for project catalog records and create/update payloads. |
| services/libs/data-access-layer/src/project-catalog/projectCatalog.ts | Adds SQL helpers for selecting/inserting/upserting/updating/deleting project catalog rows. |
| services/libs/data-access-layer/src/project-catalog/index.ts | Barrel export for the project-catalog module. |
| services/libs/data-access-layer/src/evaluated-projects/types.ts | Defines DB-facing types for evaluated projects and create/update payloads. |
| services/libs/data-access-layer/src/evaluated-projects/evaluatedProjects.ts | Adds SQL helpers for evaluated project operations (find/insert/bulk insert/update/mark evaluated/onboarded/delete). |
| services/libs/data-access-layer/src/evaluated-projects/index.ts | Barrel export for the evaluated-projects module. |
| services/libs/data-access-layer/src/index.ts | Re-exports the newly added modules from the DAL package entrypoint. |

@themarolt
Contributor

@ulemons please check cursor bugbot comments - I think some are pretty valid.

ulemons and others added 3 commits March 26, 2026 10:16
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
Signed-off-by: Umberto Sgueglia <usgueglia@contractor.linuxfoundation.org>
@ulemons ulemons force-pushed the feat/add-dal-automatic-project-discovery branch from fdf1177 to e9852f1 Compare March 26, 2026 09:17
@ulemons ulemons merged commit 28630f1 into feat/add-project-discovery-worker Mar 26, 2026
6 checks passed
@ulemons ulemons deleted the feat/add-dal-automatic-project-discovery branch March 26, 2026 09:17
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.


const log = getServiceLogger()

const DEFAULT_API_URL = 'https://hypervascular-nonduplicative-vern.ngrok-free.dev'

Ngrok development URL hardcoded as production default

High Severity

DEFAULT_API_URL is set to a temporary ngrok tunnel URL (https://hypervascular-nonduplicative-vern.ngrok-free.dev). If the LF_CRITICALITY_SCORE_API_URL environment variable is not configured, the LF criticality score source will attempt to connect to this ephemeral development endpoint. In production this will either fail outright (tunnel down) or connect to an unintended service.
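
One possible way to address this, sketched under assumptions rather than taken from the PR, is to drop the default entirely and fail fast when the variable is missing. Only the variable name `LF_CRITICALITY_SCORE_API_URL` comes from the comment above; the function name `resolveLfApiUrl` is hypothetical.

```typescript
/**
 * Hypothetical resolver: returns the configured API URL or throws.
 * No fallback is provided, so a missing setting surfaces at startup
 * instead of silently pointing at a development endpoint.
 */
export function resolveLfApiUrl(env: Record<string, string | undefined>): string {
  const raw = env.LF_CRITICALITY_SCORE_API_URL?.trim()
  if (!raw) {
    throw new Error('LF_CRITICALITY_SCORE_API_URL must be set; no default is provided')
  }
  try {
    // The URL constructor throws a TypeError when `raw` is not an absolute URL.
    new URL(raw)
  } catch {
    throw new Error(`LF_CRITICALITY_SCORE_API_URL is not a valid URL: ${raw}`)
  }
  return raw
}
```

A worker would typically call `resolveLfApiUrl(process.env)` once during startup, so a misconfigured deployment fails before any workflow runs.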


if (res.statusCode && (res.statusCode < 200 || res.statusCode >= 300)) {
  reject(new Error(`HTTP ${res.statusCode} for ${url}`))
  return

HTTP responses not consumed on error or redirect

Low Severity

Both httpsGet and getHttpsStream reject or follow redirects without calling res.resume() on the response. The LF source's fetchPage correctly calls res.resume() on error (line 54), but these functions don't. Unconsumed HTTP responses prevent the underlying TCP socket from being released back to the agent's connection pool, which can lead to socket exhaustion during repeated bucket listing or when retries encounter errors.
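
The pattern the comment asks for can be sketched as below. This is not the PR's `httpsGet` or `getHttpsStream`: the names `isErrorStatus`, `drainAndFail`, and `httpsGetText` are hypothetical, and redirect handling is left out for brevity. The one real API being illustrated is `res.resume()`, which puts the response stream into flowing mode so the unread body is discarded and the socket can be reused.

```typescript
import * as https from 'node:https'

/** Minimal shape needed to drain a response; http.IncomingMessage satisfies it. */
interface Drainable {
  statusCode?: number
  resume(): unknown
}

/** Same check as the snippet under review: a known status outside 2xx. */
export function isErrorStatus(statusCode?: number): boolean {
  return statusCode !== undefined && (statusCode < 200 || statusCode >= 300)
}

/** Discard the unread body, then build the error to reject with. */
export function drainAndFail(res: Drainable, url: string): Error {
  res.resume() // consume the body so the socket returns to the agent's pool
  return new Error(`HTTP ${res.statusCode} for ${url}`)
}

/** Hypothetical GET helper that buffers a successful response as UTF-8 text. */
export function httpsGetText(url: string): Promise<string> {
  return new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        if (isErrorStatus(res.statusCode)) {
          reject(drainAndFail(res, url))
          return
        }
        const chunks: Buffer[] = []
        res.on('data', (chunk: Buffer) => chunks.push(chunk))
        res.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')))
        res.on('error', reject)
      })
      .on('error', reject)
  })
}
```

Keeping the drain step in one small function makes it easy to reuse on the redirect path as well, where the response body is likewise never read.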

Additional Locations (1)
