Commit f0de849

Explore mode stability improvements and bug fixes (#21)
* Improve prompting and feed current URL to prompt
* Fix context resetting randomly
* Enable GPU acceleration
* Add safe capture screenshot
* Improve prompts
* Fix regex misidentifying email as domain name
* Fix invalid edge type
* Improve UI elements in explore mode
* Handle history parse errors
* Improve documentation generation chat bubble UI and prompt
* Improve graph rendering and responsiveness
* Update nodes in graph once new graph data is available
* Add classification when adding new nodes to existing graph
* Fix autolayout not initializing when adding new node
* Fix node image being incorrect sporadically
* Fix images not updating for different domains in same route
* Update factifai logo
* Add URL extraction from LLM instead of through docker for explore mode on VNC
* Add changeset
* Bump version
1 parent cab1e53 commit f0de849

25 files changed: +1161 -463 lines changed

.changeset/tangy-trees-train.md

Lines changed: 0 additions & 5 deletions
This file was deleted.

CHANGELOG.md

Lines changed: 12 additions & 2 deletions
@@ -1,5 +1,15 @@
 # Changelog

+## 1.3.3
+
+### Patch Changes
+
+- Enhanced explore mode chat with significant bug fixes and stability improvements.
+
+- Including enhanced graph rendering, UI fixes, better LLM prompting, and direct URL extraction. It also addresses various bug fixes related to image display, node handling, and context management, alongside general performance and stability enhancements.
+
+- Fix complete task description not rendering and handle docker launch errors
+
 ## 1.3.2

 ### Minor Changes
@@ -8,8 +18,8 @@

 ### Patch Changes

-- 008dcc5: Add a wait for the 'domcontentloaded' state after performing a click action to ensure the page is fully loaded.
-- 416cd9a: Update image output to common format - added wait time for each action in the puppeteer - remove auto launch scripts from vnc and revert to LLM based actions to work on VNC
+- Add a wait for the 'domcontentloaded' state after performing a click action to ensure the page is fully loaded.
+- Update image output to common format - added wait time for each action in the puppeteer - remove auto launch scripts from vnc and revert to LLM based actions to work on VNC

 ## 1.3.0

backend/src/controllers/chatController.ts

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ export class ChatController {

       // Always reset and recreate the provider with the correct mode to prevent context bleed
       const requestedMode = req.query.mode as Modes || Modes.REGRESSION;
-      ChatService.resetProvider();
+
       ChatService.createProvider(requestedMode);
       console.log(`Chat provider created with mode: ${requestedMode}`);


backend/src/controllers/exploreController.ts

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ export class ExploreController {
   ): Promise<void> {
     try {
       // Always reset and recreate the provider with explore mode to prevent context bleed
-      ChatService.resetProvider();
+
       ChatService.createProvider(Modes.EXPLORE);
       console.log("Explore provider created with mode: EXPLORE");
       // Get data from request body

backend/src/prompts/app-doc-generator.prompt.ts

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 export const appDocumentationGeneratorPrompt = `
 # Application Documentation Generator

-You are an expert Application Documentation Generator with deep expertise in frontend engineering, UI/UX design, and technical documentation. Your task is to thoroughly analyze the provided application (web, mobile, or desktop) and create detailed documentation that would enable another AI to recreate the application with high fidelity.
-Important: You should navigate to all the possible different links/sections/flows provided and explore the application thoroughly and systematically to understand its structure, components, features, and user flows. Your documentation should be comprehensive, covering all major sections, features, and user interactions (e.g) If there are multiple links in header or footer, you should explore all of them.
+You are an expert Application UI/UX Documentation Generator with deep expertise in frontend engineering, UI/UX design, and technical documentation. As a perfectionist with OCD issues your only task is to thoroughly analyze the provided screenshot(web, mobile, or desktop) and create detailed documentation that would enable another AI to recreate the application with high fidelity.
+IMPORTANT: Your documentation should be comprehensive, covering all major sections, features, and user interactions (e.g) If there are multiple links in header or footer, you should explore all of them. DON'T SAY ANYTHING ELSE. JUST DOCUMENT THE APPLICATION AS PER BELOW FORMAT. IF YOU NEED MORE INFORMATION JUST ADD A NOTE AT THE BOTTOM OF THE DOCUMENTATION TO LET ME KNOW.

 ## Analysis Approach


backend/src/prompts/explore-mode.prompt.ts

Lines changed: 133 additions & 37 deletions
@@ -1,23 +1,59 @@
-const performActionPrompt = `You are FactifAI explore Agent with extensive experience in working with web applications and computer.
-You are exploring web/desktop/mobile application here.
-Your duty is to perform the Task given by taking logical actions with the tools provided.
-On completing the given Task you have to use the complete_task tool to present the result of your work to the user.
-
-DOCUMENTATION REQUIREMENT: For EACH feature or element you explore, you MUST take a screenshot AFTER navigating to it or clicking on it. This screenshot must be saved to document the feature for later analysis.
-
-Do not hallucinate on the elements or buttons. You should have 100% visual confirmation for each element.
-
-you have set of tools to use.
-
-# Tool Use Formatting
-
-Tool use is formatted using XML-style tags. The tool name is enclosed in opening and closing tags, and each parameter is similarly enclosed within its own set of tags. Here's the structure:
-
-<tool_name>
-<parameter1_name>value1</parameter1_name>
-<parameter2_name>value2</parameter2_name>
-...
-</tool_name>
+const performActionPrompt = `You are FactifAI Explorer Agent, specialized in systematically exploring web applications for UI cloning purposes.
+
+Your mission is to thoroughly explore web/desktop/mobile applications by:
+1. Documenting the initial state of each page upon arrival
+2. Systematically exploring ALL elements on the current page
+3. Generating complete documentation BEFORE any action that might navigate to a new page
+4. Using complete_task to record your documentation before page transitions
+
+# CRITICAL RULE: TOOL SEPARATION
+- NEVER use perform_action and complete_task in the same message
+- When calling complete_task, it MUST be the ONLY tool used in that message
+- After using complete_task, wait for user confirmation before your next action
+- Separate documentation (complete_task) and interaction (perform_action) into different messages
+
+# SCREENSHOT COMPARISON & PAGE AWARENESS
+- ALWAYS be aware of the current screenshot with the previous one and page URL change
+- Identify and note ALL differences between screenshots after each action
+- Maintain awareness of visual context throughout the entire exploration
+
+# Exploration Process (CRITICAL TO FOLLOW)
+1. INITIAL ASSESSMENT: When arriving at a new page
+- Compare with previous screenshot to confirm page transition
+- Document the page in its initial state
+- Identify all visible UI elements and their positions
+
+2. THOROUGH EXPLORATION: Explore current page completely
+- Interact with non-navigational elements first (forms, buttons that don't navigate)
+- Scroll entire page to discover all elements
+- Document all UI components and their behaviors
+
+3. PRE-NAVIGATION DOCUMENTATION: Before potential page transitions
+- IMPORTANT: Call complete_task BEFORE clicking any link or button that might navigate to a new page
+- Document your complete understanding of the current page
+- Only after documentation is complete should you proceed with navigation
+
+# SMART EXPLORATION STRATEGY
+- Focus on documenting UNIQUE UI COMPONENTS rather than exploring every page
+- Recognize pattern-based content (e.g., product listings, search results) and explore only representative examples
+- For repeated UI patterns (e.g., product cards in an e-commerce site):
+1. Document ONE or TWO examples thoroughly to understand the component pattern
+2. Avoid exploring every instance of the same component pattern
+3. Note variations in the pattern, if any exist
+- Identify and prioritize exploration of:
+1. Primary navigation patterns and menus
+2. Core user flows (e.g., login, search, checkout)
+3. Unique interactive components (e.g., custom date pickers, filters)
+4. Different page templates (e.g., home, category, product, account pages)
+- Once a component pattern is documented, mark it as "explored" and avoid documenting similar instances
+- Focus on breadth of component coverage rather than exhaustive exploration of all content
+
+Example strategy for e-commerce:
+- Document main navigation and header/footer only once
+- Explore one category page to document the category template
+- Explore only 1-2 product pages to document the product template
+- Document one instance of the checkout flow
+- Note any unique UI components that differ from common patterns

 # Tools
 ## perform_action
@@ -62,34 +98,44 @@ Common Actions (Both Sources):
 * scroll_down/scroll_up: Scroll the viewport.
 - Use when elements are partially or fully obscured.
 - Always verify element visibility after scrolling.
-- Aim to fully reveal the target element.
+- Scroll repeatedly to ensure you've seen ALL elements on the page.
+- Always scroll to both the top and bottom of each page to ensure complete coverage.

 ## complete_task:
-- Use this tool when the given task is completed.
-- Do not use this tool with any other tool.
-Usage: <complete_task><task_status>exploration complete</task_status><additional_info>any information/description you want to provide</additional_info></complete_task>
+- CRITICAL: This tool MUST be used ALONE - never with perform_action in the same message
+- Use when you have gained comprehensive knowledge of the current page
+- Always document your understanding before page transitions
+- Call this tool before clicking links, navigation buttons, or submitting forms that might change pages
+
+Usage: <complete_task><task_status>Initiating document generation for current page</task_status><additional_info>
+Key information to be listed in short way:
+UI components: [minimal list of elements]
+page information: [minimal notes]
+</additional_info></complete_task>

 Important Notes:
 - Puppeteer: Must start with 'launch' if no screenshot exists
-- Docker: Always analyze screenshot first, no 'launch' action needed
+- Docker: Always analyze screenshot first, no 'launch' action needed. NEVER FOCUS ON EXPLORING FIREFOX BROWSER FEATURES JUST FOCUS ON THE WEB PAGE ONLY.
 - Strictly use only one action per response and wait for the "Action Result" before proceeding.
-
+- NEVER combine complete_task with perform_action - they must be in separate messages

 Usage:
 <perform_action>
 <action>Action to perform (e.g., launch, doubleClick, click, type, scroll_down, scroll_up, keyPress)</action>
 <url>URL to launch the browser at (optional)</url>
 <coordinate>x,y coordinates (optional)</coordinate>
 <text>Text to type (optional)</text>
-<about_this_action>Give a description about the action and why it needs to be performed. Description should be short and concise and usable for testcase generation.
-(e.g. Click Login Button)
+<about_this_action>Give a description about the action and why it needs to be performed. For potentially navigation-triggering actions, mention that documentation has been completed in a previous message.
+(e.g. Click Login Button. Documentation of current page was completed in previous message.)
 </about_this_action>
 </perform_action>

 Important Notes:
-- Puppeteer: Must start with 'launch' if no screenshot exists
-- Docker: Always analyze screenshot first, no 'launch' action needed
+- Puppeteer: Must start with 'launch' action first regardless of the existence of a screenshot. No excuses.
+- Docker: No 'launch' action needed. Always start fresh by typing in the given website URL in the URL bar and start the exploration, if you see existing webpage, close it and start fresh by typing the new url.
 - Strictly use only one action per response and wait for the "Action Result" before proceeding.
+- Always close the browser popups and alerts and focus on the site content only. This is important for taking screenshots and exploring the site.
+- NEVER combine perform_action with complete_task - they must be in separate messages (IMPORTANT)


 Source-Specific Actions:
@@ -111,8 +157,40 @@ Source-specific information:
 Puppeteer Only:
 * Viewport size: 1280x720

-Make sure you understand the Environment Context. If the source is not provided, assume the default is Docker.
-`;
+# AVOIDING REDUNDANT DOCUMENTATION
+- Do NOT re-document a page if no new features or interactions are discovered
+- Once a page has been thoroughly explored and documented, avoid redundant documentation of the same elements
+- Only trigger the documentation process again if:
+1. You discover previously hidden or overlooked elements
+2. User interactions reveal new functionality
+3. Content dynamically changes in a significant way
+- If you've thoroughly explored a page and find nothing new, procee
+
+# NAVIGATION VS NON-NAVIGATION ELEMENTS
+Before interacting with elements, classify them as:
+1. Non-navigation elements - explore these FIRST:
+- Form fields (text inputs, checkboxes, radio buttons)
+- Buttons that trigger actions on the same page
+- Dropdowns that don't navigate
+- Tab panels that change content within the same page
+- Modals and dialogs
+
+2. Navigation elements - explore these ONLY AFTER documentation is complete:
+- Links to other pages
+- Navigation menus
+- "Next" or "Continue" buttons
+- Form submit buttons that direct to new pages
+- Login/logout buttons
+
+CRITICAL SEQUENCE FOR NAVIGATION:
+1. Explore all non-navigation elements first
+2. In a separate message, call ONLY complete_task to document the page
+3. After receiving confirmation, use perform_action to navigate in a new message
+4. Before clicking ANY navigation element, ALWAYS call complete_task to document your current page knowledge.
+
+Make sure you understand the Environment Context. If the source is not provided, assume the default is Docker and double click to open firefox in docker.
+
+Remember: NEVER combine complete_task and perform_action in the same message. Always separate documentation and interaction into different messages. Generate complete documentation BEFORE any action that might navigate to a new page. This ensures each page is thoroughly documented before transitions occur. This is enormously important.`;

 export const exploreModePrompt = `You are FactifAI explore Agent with extensive experience in working with web applications and computer.
 You are exploring web/desktop/mobile application here.
@@ -121,9 +199,21 @@ Clickable elements are elements that can cause any redirection or action on the

 Do not hallucinate on the elements or buttons. You should have 100% visual confirmation for each element.

+# IMPORTANT: URL DETECTION (ONLY ON DOCKER SOURCE RUNNING FIREFOX)
+When analyzing screenshots that show Firefox in docker once exploration starts:
+- Exploration starts once you type in the given URL and access the site for the first time.
+- Look for the address bar at the top of the browser window
+- Identify and read the current URL displayed in the address bar
+- Include the exact URL in your response using the <current_url> tag
+- If the address bar is not visible or the URL is partially obscured, indicate this in your response
+- The URL should be complete, including protocol (http:// or https://)
+
+# VERY IMPORTANT
+- All the firefox browser buttons like back, forward, refresh, home, etc. are not clickable elements. Do not consider them as clickable elements for exploration.

 # Output Format
 <explore_output>
+<current_url>https://example.com/current/path</current_url>
 <clickable_element>
 <text></text>
 <coordinates></coordinates>
@@ -133,6 +223,7 @@ Do not hallucinate on the elements or buttons. You should have 100% visual confi

 # Usage
 <explore_output>
+<current_url>https://example.com/login</current_url>
 <clickable_element>
 <text>login</text>
 <coordinates>124, 340</coordinates>
@@ -149,8 +240,13 @@ Do not hallucinate on the elements or buttons. You should have 100% visual confi
 export const getPerformActionPrompt = (
   source: string,
   task: string,
-  pageUrl: string,
-) =>
-  `${performActionPrompt}\n Environment Context: ${source}\n
-  Task: ${task} \n
-  `;
+  currentPageUrl?: string
+) => {
+  let prompt = `${performActionPrompt}\n Environment Context: ${source}\n Task: ${task}`;
+
+  if (currentPageUrl) {
+    prompt += `\n CURRENT PAGE URL: ${currentPageUrl}`;
+  }
+
+  return prompt;
+};

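Reviewer note: with the new `<current_url>` tag in `exploreModePrompt`, something on the backend has to read the URL back out of the model's reply. That parsing code is not among the hunks shown here, so the following is only a minimal sketch of how the extraction might look. `extractCurrentUrl` is a hypothetical name, and rejecting values without an `http://` or `https://` prefix is an assumption drawn from the prompt's requirement that the URL include the protocol.

```typescript
// Hypothetical helper, not part of this commit: pulls the URL out of the
// <current_url> tag that the updated explore prompt asks the model to emit.
function extractCurrentUrl(llmOutput: string): string | null {
  // Capture the tag body lazily and ignore surrounding whitespace.
  const match = llmOutput.match(/<current_url>\s*([\s\S]*?)\s*<\/current_url>/i);
  if (!match) {
    return null; // Tag missing, e.g. the address bar was not visible.
  }

  const candidate = match[1];
  // Assumption: only complete http(s) URLs are accepted, mirroring the
  // prompt's "including protocol" instruction.
  return /^https?:\/\/\S+$/.test(candidate) ? candidate : null;
}

const reply = [
  "<explore_output>",
  "<current_url>https://example.com/login</current_url>",
  "<clickable_element><text>login</text><coordinates>124, 340</coordinates></clickable_element>",
  "</explore_output>",
].join("\n");

console.log(extractCurrentUrl(reply));
```

Returning `null` rather than an empty string keeps "no URL reported" distinguishable from a real value, so a caller could simply omit the optional `currentPageUrl` argument of `getPerformActionPrompt` in that case.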
backend/src/services/HistoryStorageService.ts

Lines changed: 22 additions & 2 deletions
@@ -46,7 +46,27 @@ export class HistoryStorageService {
     try {
       await this.initialize();
       const data = await readFile(SESSIONS_LIST_FILE, 'utf8');
-      return JSON.parse(data);
+
+      // Handle empty file case
+      if (!data || data.trim() === '') {
+        console.warn('Sessions list file is empty');
+        return [];
+      }
+
+      try {
+        return JSON.parse(data);
+      } catch (parseError) {
+        console.error('Error parsing sessions list JSON:', parseError);
+
+        // Create a backup of the corrupted file for debugging
+        const backupPath = `${SESSIONS_LIST_FILE}.backup.${Date.now()}`;
+        await writeFile(backupPath, data);
+        console.warn(`Created backup of corrupted sessions list at ${backupPath}`);
+
+        // Return empty array and reset the file with empty array
+        await writeFile(SESSIONS_LIST_FILE, JSON.stringify([]));
+        return [];
+      }
     } catch (error) {
       console.error('Error reading sessions list:', error);
       return [];
@@ -163,4 +183,4 @@ export class HistoryStorageService {
       throw new Error('Failed to migrate from localStorage');
     }
   }
-}
+}

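Reviewer note: the empty-file and parse-error branches above could be unit-tested without touching the filesystem if the decision logic were separated from the I/O. The following is a sketch under that assumption, not the committed code. `parseSessionsList` and its `status` field are hypothetical names, and treating non-array JSON as corrupted is an extra assumption (the committed code returns whatever `JSON.parse` yields).

```typescript
// Hypothetical pure helper: classifies the raw file contents so the caller
// can decide whether to back up and reset the sessions file.
type ParseResult<T> = {
  sessions: T[];
  status: "ok" | "empty" | "corrupted";
};

function parseSessionsList<T = unknown>(raw: string): ParseResult<T> {
  // Empty or whitespace-only file: nothing to parse, nothing to back up.
  if (!raw || raw.trim() === "") {
    return { sessions: [], status: "empty" };
  }

  try {
    const parsed: unknown = JSON.parse(raw);
    // Assumption: a valid sessions list is always a JSON array.
    if (!Array.isArray(parsed)) {
      return { sessions: [], status: "corrupted" };
    }
    return { sessions: parsed as T[], status: "ok" };
  } catch {
    // Invalid JSON: the caller would write the backup and reset the file.
    return { sessions: [], status: "corrupted" };
  }
}

console.log(parseSessionsList('[{"id":"a"}]').status);
console.log(parseSessionsList("{not json").status);
```

With this split, the service method would only perform the backup and reset writes when `status` is `"corrupted"`, matching the behaviour of the hunk above.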
backend/src/services/implementations/puppeteer/PuppeteerActions.ts

Lines changed: 8 additions & 0 deletions
@@ -516,6 +516,14 @@ export class PuppeteerActions {
     return await PuppeteerActions.puppeteerService.captureScreenshotAndInfer();
   }

+  static async isBrowserReady(): Promise<boolean> {
+    if (!PuppeteerActions.puppeteerService) {
+      console.log("PuppeteerService is not initialized");
+      return false;
+    }
+    return await PuppeteerActions.puppeteerService.hasBrowserInstance();
+  }
+
   static async getCurrentUrl() {
     return await PuppeteerActions.puppeteerService.getCurrentUrl();
   }

backend/src/services/implementations/puppeteer/PuppeteerService.ts

Lines changed: 22 additions & 10 deletions
@@ -83,7 +83,6 @@ export class PuppeteerService extends BaseStreamingService {
       PuppeteerService.browser = await chromium.launch({
         headless: true,
         args: [
-          '--disable-gpu', // Disable GPU hardware acceleration
           '--disable-dev-shm-usage', // Overcome limited resource problems
           '--disable-setuid-sandbox', // Disable setuid sandbox (safety feature)
           '--no-sandbox', // Disable sandbox for better performance
@@ -765,18 +764,31 @@ export class PuppeteerService extends BaseStreamingService {
   }

   async getCurrentUrl(): Promise<string> {
+    // Check if browser is available and return a safe default if not
     if (!PuppeteerService.page) {
-      throw new Error("Browser not launched");
+      console.log("Warning: Browser not launched when getting URL, returning empty string");
+      return "";
     }
-    let url = PuppeteerService.page.url();
-    console.log("===", url);
-    if (!url) {
-      await PuppeteerService.page.evaluate(() => {
-        url = window.location.href;
-        console.log("===>>>", url);
-      });
+
+    try {
+      let url = PuppeteerService.page.url();
+      console.log("Current URL:", url);
+
+      // Only try to evaluate if we couldn't get the URL and the page is available
+      if (!url && PuppeteerService.page) {
+        try {
+          url = await PuppeteerService.page.evaluate(() => window.location.href);
+          console.log("URL from evaluate:", url);
+        } catch (evalError) {
+          console.log("Error getting URL from evaluate:", evalError);
+        }
+      }
+
+      return url || "";
+    } catch (error) {
+      console.log("Error getting current URL:", error);
+      return "";
     }
-    return url;
   }

   /**

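Reviewer note: `getCurrentUrl` now never throws and falls back to an empty string at every failure point. To make that contract easy to check in isolation, here is a minimal sketch of the same fallback chain against a stand-in interface. `UrlSource`, `hrefFromPage`, and `safeCurrentUrl` are hypothetical names used for illustration only; `hrefFromPage` stands in for `page.evaluate(() => window.location.href)`.

```typescript
// Hypothetical stand-in for the browser page object used by the service.
interface UrlSource {
  url(): string;                   // Synchronous lookup, like page.url()
  hrefFromPage(): Promise<string>; // Stand-in for evaluating window.location.href
}

// Mirrors the fallback order in the hunk above: no page, then the direct
// lookup, then the in-page lookup, with "" as the safe default throughout.
async function safeCurrentUrl(page: UrlSource | null): Promise<string> {
  if (!page) {
    return "";
  }
  try {
    const direct = page.url();
    if (direct) {
      return direct;
    }
    try {
      return (await page.hrefFromPage()) || "";
    } catch {
      return ""; // In-page lookup failed, e.g. the page was closed meanwhile.
    }
  } catch {
    return ""; // The direct lookup itself threw.
  }
}

// Usage with a fake page whose direct lookup returns nothing.
const fakePage: UrlSource = {
  url: () => "",
  hrefFromPage: async () => "https://example.com/dashboard",
};
safeCurrentUrl(fakePage).then((url) => console.log(url));
```

One consequence worth noting for callers: since an empty string now replaces the old "Browser not launched" error, code that needs to distinguish "no browser" from "blank URL" would have to check `isBrowserReady()` first.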