feat(org-detection): org detection agentic workflow #697

Merged: 31 commits, Jan 21, 2025


@ekremney ekremney commented Jan 14, 2025

Org Detection Agentic Workflow

This PR introduces a LangChain.js-based agent that detects the IMS organization for a site from its domain and GitHub login (retrieved from the x-fw-host header during site detection).

The agent works by gathering information step by step. It decides which tools to use based on the data it has and keeps collecting more data if needed. It stops only when it finds enough information to decide the correct IMS organization.

How the Agent Works

Decision-Making Process

  • The agent starts by using the footer_retriever tool to extract content from the site’s <footer>. It scans the content for short phrases that might represent a company name and checks these candidates with the company_matcher tool. If a match is found, the agent finalizes its result with the matched organization.
  • If no match is found in the footer, the agent retrieves the GitHub organization name using the github_org_name_retriever tool. It then checks this name with the company_matcher. If a match is identified, it finalizes the result.
  • If neither the footer nor the GitHub details yield a match, the agent uses the link_extractor tool to gather all links from the site’s HTML. It then identifies which links (e.g. /about-us or /contact-us) are likely to lead to pages containing the company name. Based on this reasoning, the agent retrieves the main content of those pages using the main_content_retriever and checks it with the company_matcher. If a match is found during this step, the agent finalizes the result.
  • If all these steps fail, the agent concludes that no organization could be identified.
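
The fallback order above can be sketched in plain JavaScript. This is only an illustration, not the code in this PR: in the actual agent the LLM chooses which tool to call and extracts candidate names itself. The `detectOrg` function, the shape of the `tools` object, the assumption that retrievers return arrays of candidate names, and the `/about|contact/` filter are all hypothetical.

```javascript
// Hypothetical sketch: tool names mirror the PR description, but signatures
// and return shapes are assumptions, not the real implementation.
async function detectOrg({ domain, githubLogin }, tools) {
  // Try each candidate name until the matcher returns an organization.
  const firstMatch = async (candidates) => {
    for (const name of candidates) {
      const org = await tools.companyMatcher(name);
      if (org) return org;
    }
    return null;
  };

  // Step 1: candidate names taken from the site's <footer>.
  let org = await firstMatch(await tools.footerRetriever(domain));
  if (org) return { org, source: 'footer' };

  // Step 2: the GitHub organization name for the login, if one is known.
  if (githubLogin) {
    const name = await tools.githubOrgNameRetriever(githubLogin);
    org = name ? await firstMatch([name]) : null;
    if (org) return { org, source: 'github' };
  }

  // Step 3: pages whose links look likely to mention the company name.
  const links = await tools.linkExtractor(domain);
  const likely = links.filter((url) => /about|contact/i.test(url));
  for (const url of likely) {
    org = await firstMatch(await tools.mainContentRetriever(url));
    if (org) return { org, source: url };
  }

  // Step 4: nothing matched, so no organization could be identified.
  return { org: null, source: null };
}
```

Passing the tools in as an argument keeps the sketch testable with stubs. A fixed cascade like this gives up the flexibility the agent gets from reasoning over which links to follow.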

Tools Available to the Agent

  1. Footer Retriever
    An adapter for the Spacecat Content Scraper. It retrieves HTML content from the <footer> element of a website, helping the agent identify potential company names.
  2. Company Matcher
    An adapter for the Spacecat Data Access layer. It matches the agent’s guesses (potential company names) against organizations stored in the Spacecat database using fuzzy matching.
  3. GitHub Org Name Retriever
    Retrieves the GitHub organization name for a given login.
  4. Main Content Retriever
    Similar to the Footer Retriever, this tool extracts text content from the <main> element of a page.
  5. Link Extractor
    Extracts all links from raw HTML and converts them to absolute URLs for further analysis.
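
As a rough illustration of what the Link Extractor does, the sketch below resolves `href` values against a base URL with the standard `URL` constructor. The function name `extractAbsoluteLinks` is hypothetical, and the regular expression is a deliberate simplification. The actual tool may well use a proper HTML parser.

```javascript
// Hypothetical helper: the regex only handles quoted href attributes on <a>
// tags and is not a substitute for a real HTML parser.
function extractAbsoluteLinks(html, baseUrl) {
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  const links = new Set(); // de-duplicates repeated links
  for (const match of html.matchAll(hrefPattern)) {
    try {
      // The URL constructor resolves relative hrefs against baseUrl.
      const url = new URL(match[1], baseUrl);
      // Keep only web links; this drops mailto:, tel:, javascript:, etc.
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        links.add(url.href);
      }
    } catch {
      // Skip hrefs that cannot be parsed as URLs.
    }
  }
  return [...links];
}
```

For example, `extractAbsoluteLinks('<a href="/about-us">About</a>', 'https://example.com/')` returns `['https://example.com/about-us']`.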

Full Org-Detection Flow

The agent kicks in after site detection is approved and follows these steps:

  1. Checks whether the detected site belongs to a customer (it does not proceed for friends and family).
  2. If a potential IMS organization is detected:
    • Sends a Slack message to announce org detection: Detected IMS organization ORG NAME with IMS org ID XXX@AdobeOrg for domain.com. Would you approve? @user.
  3. Based on user feedback:
    • If approved: The detected imsOrg is assigned to the site's entity in the database.
    • If rejected: A follow-up Slack message is sent asking users to set the organization manually using a Slack command:
      @spacecat set imsorg [url] [imsOrgId]. Example:
      @spacecat set imsorg spacecat.com 000000000000000000000000@AdobeOrg
      
    • The backend fetches or creates the organization details, then assigns it to the site.
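
For the manual fallback, a minimal sketch of parsing the `set imsorg` command might look like the following. The function name `parseSetImsOrgCommand` and the validation rule (alphanumerics followed by `@AdobeOrg`) are assumptions based only on the example above, not the bot's actual command handling.

```javascript
// Hypothetical parser for "@spacecat set imsorg [url] [imsOrgId]".
// The ID pattern is an assumption inferred from the example in this PR.
function parseSetImsOrgCommand(text) {
  const match = /^@spacecat\s+set\s+imsorg\s+(\S+)\s+(\S+)$/i.exec(text.trim());
  if (!match) return null; // not a "set imsorg" command

  const [, site, imsOrgId] = match;
  // Reject IDs that do not look like "<alphanumerics>@AdobeOrg".
  if (!/^[A-Za-z0-9]+@AdobeOrg$/.test(imsOrgId)) return null;

  return { site, imsOrgId };
}
```

Given the example command above, this returns an object with `site` set to `spacecat.com` and `imsOrgId` set to `000000000000000000000000@AdobeOrg`. Anything malformed yields `null`, so the caller can reply with a usage hint.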

Successful flow

[Image: successful flow]

Unsuccessful flow

[Image: unsuccessful flow]

This PR will trigger a minor release when merged.


@solaris007 solaris007 left a comment

good stuff

some thoughts for the future:

  • workflow should be configured declaratively
  • same for tools
  • prompts should not be in js code but rather static / declarative

@ekremney ekremney requested a review from ddragosd January 15, 2025 17:49
@ekremney ekremney merged commit f65c0ea into main Jan 21, 2025
5 checks passed
@ekremney ekremney deleted the org-detector-agent branch January 21, 2025 10:42
solaris007 pushed a commit that referenced this pull request Jan 21, 2025
# [1.87.0](v1.86.10...v1.87.0) (2025-01-21)

### Features

* **org-detection:** org detection agent ([#697](#697)) ([f65c0ea](f65c0ea))
@solaris007
🎉 This PR is included in version 1.87.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@ekremney ekremney changed the title feat(org-detection): org detection agent feat(org-detection): org detection agentic workflow Jan 22, 2025

2 participants