feat(org-detection): org detection agentic workflow #697
Merged
This PR will trigger a minor release when merged.
solaris007
approved these changes
Jan 15, 2025
good stuff
some thoughts for the future:
- workflow should be configured declaratively
- same for tools
- prompts should not be in js code but rather static / declarative
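As an illustration only, one possible reading of these suggestions is sketched below: the workflow, tools, and prompts live in a plain data object, and small helpers validate and render it. The schema (`tools`, `prompts`, `steps`, `next`), the helper names, and the prompt wording are all hypothetical and are not taken from this PR.

```javascript
// Hypothetical declarative description of the workflow: plain data, no logic.
const workflowConfig = {
  tools: {
    footer_retriever: { description: 'Retrieves content from the <footer> of a site.' },
    company_matcher: { description: 'Matches a candidate company name against known organizations.' },
  },
  prompts: {
    // Illustrative wording only; not the prompt used in this PR.
    system: 'Detect the IMS organization for {domain} (GitHub login: {githubLogin}).',
  },
  steps: [
    { tool: 'footer_retriever', next: 'company_matcher' },
    { tool: 'company_matcher', next: null },
  ],
};

// Fills {placeholders} in a static prompt template; unknown keys are left untouched.
function renderPrompt(template, values) {
  return template.replace(/\{(\w+)\}/g, (placeholder, key) => (
    Object.prototype.hasOwnProperty.call(values, key) ? String(values[key]) : placeholder
  ));
}

// Returns a list of problems; an empty list means every step refers to a declared tool.
function validateWorkflow(config) {
  const declared = new Set(Object.keys(config.tools));
  const problems = [];
  for (const step of config.steps) {
    for (const ref of [step.tool, step.next]) {
      if (ref !== null && !declared.has(ref)) problems.push(`Unknown tool: ${ref}`);
    }
  }
  return problems;
}

console.log(validateWorkflow(workflowConfig));
console.log(renderPrompt(workflowConfig.prompts.system, {
  domain: 'example.com',
  githubLogin: 'example-login',
}));
```

Keeping the configuration as data would let the steps and prompts be reviewed or changed without touching the agent code, which seems to be the intent of the comment.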
# Conflicts:
#	package-lock.json
#	package.json
solaris007
pushed a commit
that referenced
this pull request
Jan 21, 2025
# [1.87.0](v1.86.10...v1.87.0) (2025-01-21)

### Features

* **org-detection:** org detection agent ([#697](#697)) ([f65c0ea](f65c0ea))
🎉 This PR is included in version 1.87.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
Org Detection Agentic Workflow
This PR introduces a langchainjs-based agent that detects the IMS organization for a site based on its domain and GitHub login (retrieved from the x-fw-host header during site detection).
The agent works by gathering information step by step. It decides which tools to use based on the data it has and keeps collecting more data if needed. It stops only when it has enough information to decide on the correct IMS organization.
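The step-by-step gathering can be pictured as an ordered chain of strategies that stops at the first match. The sketch below is a deterministic simplification under stated assumptions: in the PR an LLM-driven agent decides which tool to call, whereas here the order is hard-coded. The function `detectOrg`, the camel-cased tool stand-ins, and `linkedPagesRetriever` (a stand-in for link extraction plus main-content retrieval) are hypothetical names, and the stub tools return canned data.

```javascript
// Tries each information source in order and stops at the first confirmed match.
// `tools` holds hypothetical stand-ins for the agent's real tools.
function detectOrg(site, tools) {
  // Each strategy lazily produces candidate company names for the site.
  const strategies = [
    { name: 'footer', candidates: () => tools.footerRetriever(site.domain) },
    { name: 'github', candidates: () => tools.githubOrgNameRetriever(site.githubLogin) },
    { name: 'links', candidates: () => tools.linkedPagesRetriever(site.domain) },
  ];
  for (const strategy of strategies) {
    for (const candidate of strategy.candidates()) {
      const org = tools.companyMatcher(candidate);
      if (org) return { org, via: strategy.name, candidate };
    }
  }
  return null; // Not enough information to decide on an organization.
}

// Stub tools with canned data, for illustration only.
const stubTools = {
  footerRetriever: () => ['All rights reserved'],
  githubOrgNameRetriever: (login) => (login === 'example-login' ? ['Example Corp'] : []),
  linkedPagesRetriever: () => [],
  companyMatcher: (name) => (
    name === 'Example Corp' ? { name, imsOrgId: 'XXX@AdobeOrg' } : null
  ),
};

const result = detectOrg({ domain: 'example.com', githubLogin: 'example-login' }, stubTools);
console.log(result);
```

Because later strategies are only evaluated when earlier ones fail to produce a match, the more expensive lookups are skipped whenever a cheaper source is sufficient.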
How the Agent Works
Decision-Making Process
1. The agent uses the footer_retriever tool to extract content from the site's <footer>. It scans the content for short phrases that might represent a company name and checks these candidates with the company_matcher tool. If a match is found, the agent finalizes its result with the matched organization.
2. If no match is found, the agent retrieves the GitHub organization name with the github_org_name_retriever tool. It then checks this name with the company_matcher. If a match is identified, it finalizes the result.
3. If there is still no match, the agent uses the link_extractor tool to gather all links from the site's HTML. It then identifies which links (e.g. /about-us or /contact-us) are likely to lead to pages containing the company name. Based on this reasoning, the agent retrieves the main content of those pages using the main_content_retriever and checks it with the company_matcher. If a match is found during this step, the agent finalizes the result.
Tools Available to the Agent
- footer_retriever: An adapter for the Spacecat Content Scraper. It retrieves HTML content from the <footer> element of a website, helping the agent identify potential company names.
- company_matcher: An adapter for the Spacecat Data Access layer. It matches the agent's guesses (potential company names) against organizations stored in the Spacecat database using fuzzy matching.
- github_org_name_retriever: Retrieves the GitHub organization name for a given login.
- main_content_retriever: Similar to the Footer Retriever, this tool extracts text content from the <main> element of a page.
- link_extractor: Extracts all links from raw HTML and converts them to absolute URLs for further analysis.
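Two of these tools lend themselves to a small sketch. The code below is not the PR's implementation. It assumes a regex-based href scan (a real tool would more likely use an HTML parser) with the standard `URL` constructor for resolving relative links, and it swaps in a character-bigram Dice coefficient as one possible form of fuzzy matching. The function names, the `0.8` threshold, and the sample organizations are hypothetical.

```javascript
// Collects <a href> targets from raw HTML and resolves them against the page URL.
// Only http(s) links are kept; unparsable hrefs are skipped.
function extractAbsoluteLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      const url = new URL(match[1], baseUrl);
      if (url.protocol === 'http:' || url.protocol === 'https:') links.add(url.href);
    } catch {
      // Ignore hrefs that cannot be resolved to a URL.
    }
  }
  return [...links];
}

// Splits a normalized name (lowercase, alphanumeric only) into character bigrams.
function bigrams(text) {
  const s = text.toLowerCase().replace(/[^a-z0-9]/g, '');
  const grams = [];
  for (let i = 0; i < s.length - 1; i += 1) grams.push(s.slice(i, i + 2));
  return grams;
}

// Dice coefficient over bigrams: 1 means identical after normalization, 0 means no overlap.
function similarity(a, b) {
  const ga = bigrams(a);
  const remaining = bigrams(b);
  const total = ga.length + remaining.length;
  if (ga.length === 0 || remaining.length === 0) return 0;
  let shared = 0;
  for (const gram of ga) {
    const idx = remaining.indexOf(gram);
    if (idx !== -1) {
      shared += 1;
      remaining.splice(idx, 1);
    }
  }
  return (2 * shared) / total;
}

// Returns the best-scoring organization at or above the threshold, or null.
function matchCompany(candidate, orgs, threshold = 0.8) {
  let best = null;
  for (const org of orgs) {
    const score = similarity(candidate, org.name);
    if (score >= threshold && (!best || score > best.score)) best = { org, score };
  }
  return best;
}

const html = '<a href="/about-us">About</a> <a href="mailto:hi@example.com">Mail</a>';
console.log(extractAbsoluteLinks(html, 'https://example.com'));

const orgs = [{ name: 'Example Corp' }, { name: 'Other Inc' }];
console.log(matchCompany('Example Corp.', orgs));
```

The threshold trades false positives against missed matches, so any real value would need tuning against the organizations actually stored.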
Full Org-Detection Flow
The agent kicks in after site detection approval and follows these steps:
1. The agent asks for approval with a message of the form: Detected IMS organization ORG NAME with IMS org ID XXX@AdobeOrg for domain.com. Would you approve? @user
2. If approved, the imsOrg is assigned to the site's entity in the database.
Successful flow
Unsuccessful flow