Explore mode stability improvements and bug fixes (#21)
* Improve prompting and feed current URL to prompt
* Fix context resetting randomly
* Enable GPU acceleration
* Add safe capture screenshot
* Improve prompts
* Fix regex misidentifying email as domain name
* Fix invalid edge type
* Improve UI elements in explore mode
* Handle history parse errors
* Improve documentation generation chat bubble UI and prompt
* Improve graph rendering and responsiveness
* Update nodes in graph once new graph data is available
* Add classification when adding new nodes to existing graph
* Fix autolayout not initializing when adding new node
* Fix node image being incorrect sporadically
* Fix images not updating for different domains in same route
* Update factifai logo
* Add URL extraction from LLM instead of through docker for explore mode on VNC
* Add changeset
* Bump version
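The regex fix listed above is not shown in the visible part of this diff. As a rough sketch only, one way to keep email addresses from being picked up as domain names is to strip email-like tokens before matching domains. The names `EMAIL_PATTERN`, `DOMAIN_PATTERN`, and `extractDomains` are hypothetical and not taken from the commit.

```typescript
// Hypothetical sketch: the commit's actual regex is not visible in this diff.
// Anything of the form local@host is treated as an email and removed first.
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+/g;

// One or more dot-separated labels followed by an alphabetic TLD.
const DOMAIN_PATTERN = /\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b/gi;

function extractDomains(text: string): string[] {
  // Blank out emails so "example.com" in "admin@example.com" is never matched.
  const withoutEmails = text.replace(EMAIL_PATTERN, " ");
  return withoutEmails.match(DOMAIN_PATTERN) ?? [];
}

const found = extractDomains("Contact admin@example.com or visit docs.example.org today");
console.log(found);
```

Removing emails first keeps the domain pattern simple, at the cost of a second pass over the text.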
CHANGELOG.md
12 additions & 2 deletions
@@ -1,5 +1,15 @@
 # Changelog
 
+## 1.3.3
+
+### Patch Changes
+
+- Enhanced explore mode chat with significant bug fixes and stability improvements.
+
+- Including enhanced graph rendering, UI fixes, better LLM prompting, and direct URL extraction. It also addresses various bug fixes related to image display, node handling, and context management, alongside general performance and stability enhancements.
+
+- Fix complete task description not rendering and handle docker launch errors
+
 ## 1.3.2
 
 ### Minor Changes
@@ -8,8 +18,8 @@
 
 ### Patch Changes
 
-- 008dcc5: Add a wait for the 'domcontentloaded' state after performing a click action to ensure the page is fully loaded.
-- 416cd9a: Update image output to common format - added wait time for each action in the puppeteer - remove auto launch scripts from vnc and revert to LLM based actions to work on VNC
+- Add a wait for the 'domcontentloaded' state after performing a click action to ensure the page is fully loaded.
+- Update image output to common format - added wait time for each action in the puppeteer - remove auto launch scripts from vnc and revert to LLM based actions to work on VNC
backend/src/prompts/app-doc-generator.prompt.ts
2 additions & 2 deletions
@@ -1,8 +1,8 @@
 export const appDocumentationGeneratorPrompt = `
 # Application Documentation Generator
 
-You are an expert Application Documentation Generator with deep expertise in frontend engineering, UI/UX design, and technical documentation. Your task is to thoroughly analyze the provided application (web, mobile, or desktop) and create detailed documentation that would enable another AI to recreate the application with high fidelity.
-Important: You should navigate to all the possible different links/sections/flows provided and explore the application thoroughly and systematically to understand its structure, components, features, and user flows. Your documentation should be comprehensive, covering all major sections, features, and user interactions (e.g) If there are multiple links in header or footer, you should explore all of them.
+You are an expert Application UI/UX Documentation Generator with deep expertise in frontend engineering, UI/UX design, and technical documentation. As a perfectionist with OCD issues your only task is to thoroughly analyze the provided screenshot(web, mobile, or desktop) and create detailed documentation that would enable another AI to recreate the application with high fidelity.
+IMPORTANT: Your documentation should be comprehensive, covering all major sections, features, and user interactions (e.g) If there are multiple links in header or footer, you should explore all of them. DON'T SAY ANYTHING ELSE. JUST DOCUMENT THE APPLICATION AS PER BELOW FORMAT. IF YOU NEED MORE INFORMATION JUST ADD A NOTE AT THE BOTTOM OF THE DOCUMENTATION TO LET ME KNOW.
-const performActionPrompt = `You are FactifAI explore Agent with extensive experience in working with web applications and computer.
-You are exploring web/desktop/mobile application here.
-Your duty is to perform the Task given by taking logical actions with the tools provided.
-On completing the given Task you have to use the complete_task tool to present the result of your work to the user.
-
-DOCUMENTATION REQUIREMENT: For EACH feature or element you explore, you MUST take a screenshot AFTER navigating to it or clicking on it. This screenshot must be saved to document the feature for later analysis.
-
-Do not hallucinate on the elements or buttons. You should have 100% visual confirmation for each element.
-
-you have set of tools to use.
-
-# Tool Use Formatting
-
-Tool use is formatted using XML-style tags. The tool name is enclosed in opening and closing tags, and each parameter is similarly enclosed within its own set of tags. Here's the structure:
-
-<tool_name>
-<parameter1_name>value1</parameter1_name>
-<parameter2_name>value2</parameter2_name>
-...
-</tool_name>
+const performActionPrompt = `You are FactifAI Explorer Agent, specialized in systematically exploring web applications for UI cloning purposes.
+
+Your mission is to thoroughly explore web/desktop/mobile applications by:
+1. Documenting the initial state of each page upon arrival
+2. Systematically exploring ALL elements on the current page
+3. Generating complete documentation BEFORE any action that might navigate to a new page
+4. Using complete_task to record your documentation before page transitions
+
+# CRITICAL RULE: TOOL SEPARATION
+- NEVER use perform_action and complete_task in the same message
+- When calling complete_task, it MUST be the ONLY tool used in that message
+- After using complete_task, wait for user confirmation before your next action
+- Separate documentation (complete_task) and interaction (perform_action) into different messages
+
+# SCREENSHOT COMPARISON & PAGE AWARENESS
+- ALWAYS be aware of the current screenshot with the previous one and page URL change
+- Identify and note ALL differences between screenshots after each action
+- Maintain awareness of visual context throughout the entire exploration
+
+# Exploration Process (CRITICAL TO FOLLOW)
+1. INITIAL ASSESSMENT: When arriving at a new page
+- Compare with previous screenshot to confirm page transition
+- Document the page in its initial state
+- Identify all visible UI elements and their positions
+
+2. THOROUGH EXPLORATION: Explore current page completely
+- Interact with non-navigational elements first (forms, buttons that don't navigate)
+- Scroll entire page to discover all elements
+- Document all UI components and their behaviors
+
+3. PRE-NAVIGATION DOCUMENTATION: Before potential page transitions
+- IMPORTANT: Call complete_task BEFORE clicking any link or button that might navigate to a new page
+- Document your complete understanding of the current page
+- Only after documentation is complete should you proceed with navigation
+
+# SMART EXPLORATION STRATEGY
+- Focus on documenting UNIQUE UI COMPONENTS rather than exploring every page
+- Recognize pattern-based content (e.g., product listings, search results) and explore only representative examples
+- For repeated UI patterns (e.g., product cards in an e-commerce site):
+1. Document ONE or TWO examples thoroughly to understand the component pattern
+2. Avoid exploring every instance of the same component pattern
+3. Note variations in the pattern, if any exist
+- Identify and prioritize exploration of:
+1. Primary navigation patterns and menus
+2. Core user flows (e.g., login, search, checkout)
+3. Unique interactive components (e.g., custom date pickers, filters)
+4. Different page templates (e.g., home, category, product, account pages)
+- Once a component pattern is documented, mark it as "explored" and avoid documenting similar instances
+- Focus on breadth of component coverage rather than exhaustive exploration of all content
+
+Example strategy for e-commerce:
+- Document main navigation and header/footer only once
+- Explore one category page to document the category template
+- Explore only 1-2 product pages to document the product template
+- Document one instance of the checkout flow
+- Note any unique UI components that differ from common patterns
 
 # Tools
 ## perform_action
@@ -62,34 +98,44 @@ Common Actions (Both Sources):
 * scroll_down/scroll_up: Scroll the viewport.
 - Use when elements are partially or fully obscured.
 - Always verify element visibility after scrolling.
-- Aim to fully reveal the target element.
+- Scroll repeatedly to ensure you've seen ALL elements on the page.
+- Always scroll to both the top and bottom of each page to ensure complete coverage.
 
 ## complete_task:
-- Use this tool when the given task is completed.
-- Do not use this tool with any other tool.
-Usage: <complete_task><task_status>exploration complete</task_status><additional_info>any information/description you want to provide</additional_info></complete_task>
+- CRITICAL: This tool MUST be used ALONE - never with perform_action in the same message
+- Use when you have gained comprehensive knowledge of the current page
+- Always document your understanding before page transitions
+- Call this tool before clicking links, navigation buttons, or submitting forms that might change pages
+
+Usage: <complete_task><task_status>Initiating document generation for current page</task_status><additional_info>
+Key information to be listed in short way:
+UI components: [minimal list of elements]
+page information: [minimal notes]
+</additional_info></complete_task>
 
 Important Notes:
 - Puppeteer: Must start with 'launch' if no screenshot exists
-- Docker: Always analyze screenshot first, no 'launch' action needed
+- Docker: Always analyze screenshot first, no 'launch' action needed. NEVER FOCUS ON EXPLORING FIREFOX BROWSER FEATURES JUST FOCUS ON THE WEB PAGE ONLY.
 - Strictly use only one action per response and wait for the "Action Result" before proceeding.
-
+- NEVER combine complete_task with perform_action - they must be in separate messages
-<about_this_action>Give a description about the action and why it needs to be performed. Description should be short and concise and usable for testcase generation.
-(e.g. Click Login Button)
+<about_this_action>Give a description about the action and why it needs to be performed. For potentially navigation-triggering actions, mention that documentation has been completed in a previous message.
+(e.g. Click Login Button. Documentation of current page was completed in previous message.)
 </about_this_action>
 </perform_action>
 
 Important Notes:
-- Puppeteer: Must start with 'launch' if no screenshot exists
-- Docker: Always analyze screenshot first, no 'launch' action needed
+- Puppeteer: Must start with 'launch' action first regardless of the existence of a screenshot. No excuses.
+- Docker: No 'launch' action needed. Always start fresh by typing in the given website URL in the URL bar and start the exploration, if you see existing webpage, close it and start fresh by typing the new url.
 - Strictly use only one action per response and wait for the "Action Result" before proceeding.
+- Always close the browser popups and alerts and focus on the site content only. This is important for taking screenshots and exploring the site.
+- NEVER combine perform_action with complete_task - they must be in separate messages (IMPORTANT)
 
 
 Source-Specific Actions:
@@ -111,8 +157,40 @@ Source-specific information:
 Puppeteer Only:
 * Viewport size: 1280x720
 
-Make sure you understand the Environment Context. If the source is not provided, assume the default is Docker.
-`;
+# AVOIDING REDUNDANT DOCUMENTATION
+- Do NOT re-document a page if no new features or interactions are discovered
+- Once a page has been thoroughly explored and documented, avoid redundant documentation of the same elements
+- Only trigger the documentation process again if:
+1. You discover previously hidden or overlooked elements
+2. User interactions reveal new functionality
+3. Content dynamically changes in a significant way
+- If you've thoroughly explored a page and find nothing new, procee
+
+# NAVIGATION VS NON-NAVIGATION ELEMENTS
+Before interacting with elements, classify them as:
+1. Non-navigation elements - explore these FIRST:
+- Form fields (text inputs, checkboxes, radio buttons)
+- Buttons that trigger actions on the same page
+- Dropdowns that don't navigate
+- Tab panels that change content within the same page
+- Modals and dialogs
+
+2. Navigation elements - explore these ONLY AFTER documentation is complete:
+- Links to other pages
+- Navigation menus
+- "Next" or "Continue" buttons
+- Form submit buttons that direct to new pages
+- Login/logout buttons
+
+CRITICAL SEQUENCE FOR NAVIGATION:
+1. Explore all non-navigation elements first
+2. In a separate message, call ONLY complete_task to document the page
+3. After receiving confirmation, use perform_action to navigate in a new message
+4. Before clicking ANY navigation element, ALWAYS call complete_task to document your current page knowledge.
+
+Make sure you understand the Environment Context. If the source is not provided, assume the default is Docker and double click to open firefox in docker.
+
+Remember: NEVER combine complete_task and perform_action in the same message. Always separate documentation and interaction into different messages. Generate complete documentation BEFORE any action that might navigate to a new page. This ensures each page is thoroughly documented before transitions occur. This is enormously important.`;
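Reviewer note: the prompt asks the model to keep complete_task and perform_action in separate messages, but a prompt alone cannot guarantee it. A caller could also check each response before acting on it. The sketch below is only an illustration under assumptions: `checkToolSeparation`, `ToolCheck`, and the idea of scanning for opening tags are hypothetical and do not appear in this commit.

```typescript
type ToolName = "perform_action" | "complete_task";

interface ToolCheck {
  tools: ToolName[]; // tool calls found, in order of appearance
  valid: boolean;
  reason?: string;
}

// Hypothetical validator (not part of the commit): scans a model response for
// the opening tags of the two tools named in the prompt.
function checkToolSeparation(response: string): ToolCheck {
  const pattern = /<(perform_action|complete_task)>/g;
  const tools: ToolName[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(response)) !== null) {
    tools.push(m[1] as ToolName);
  }

  // Rule from the prompt: complete_task must be the only tool in its message.
  if (tools.indexOf("complete_task") !== -1 && tools.length > 1) {
    return { tools, valid: false, reason: "complete_task must be the only tool in a message" };
  }
  // Rule from the prompt: strictly one action per response.
  if (tools.length > 1) {
    return { tools, valid: false, reason: "only one action is allowed per response" };
  }
  // Zero or one tool call: nothing to reject here.
  return { tools, valid: true };
}

const mixed = "<complete_task>notes</complete_task><perform_action>click</perform_action>";
console.log(checkToolSeparation(mixed).reason);
```

A response that fails the check could be sent back to the model with the reason attached rather than executed.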
 
 export const exploreModePrompt = `You are FactifAI explore Agent with extensive experience in working with web applications and computer.
 You are exploring web/desktop/mobile application here.
@@ -121,9 +199,21 @@ Clickable elements are elements that can cause any redirection or action on the
 
 Do not hallucinate on the elements or buttons. You should have 100% visual confirmation for each element.
 
+# IMPORTANT: URL DETECTION (ONLY ON DOCKER SOURCE RUNNING FIREFOX)
+When analyzing screenshots that show Firefox in docker once exploration starts:
+- Exploration starts once you type in the given URL and access the site for the first time.
+- Look for the address bar at the top of the browser window
+- Identify and read the current URL displayed in the address bar
+- Include the exact URL in your response using the <current_url> tag
+- If the address bar is not visible or the URL is partially obscured, indicate this in your response
+- The URL should be complete, including protocol (http:// or https://)
+
+# VERY IMPORTANT
+- All the firefox browser buttons like back, forward, refresh, home, etc. are not clickable elements. Do not consider them as clickable elements for exploration.
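Reviewer note: the URL detection section asks the model to report the address bar URL in a <current_url> tag, which matches the commit message item about extracting the URL from the LLM instead of through docker. The consuming code is not shown in this diff. A minimal sketch of how a response could be parsed follows. It assumes the tag is closed with `</current_url>`, and the function name `extractCurrentUrl` is hypothetical.

```typescript
// Hypothetical parser (not part of the commit). Assumes a paired
// <current_url>...</current_url> tag, in line with the XML-style tool tags.
function extractCurrentUrl(response: string): string | null {
  const match = /<current_url>\s*([^<]*?)\s*<\/current_url>/i.exec(response);
  if (!match) return null;

  const candidate = match[1];
  // The prompt requires a complete URL including the protocol.
  if (!/^https?:\/\//i.test(candidate)) return null;

  try {
    new URL(candidate); // throws on malformed input
  } catch {
    return null;
  }
  // Return the text exactly as reported, without normalization.
  return candidate;
}

console.log(extractCurrentUrl("Page loaded. <current_url>https://example.com/login</current_url>"));
```

Returning `null` for a missing, partial, or protocol-less value lets the caller keep the last known URL instead of storing a bad one.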