README.md (4 additions, 3 deletions)
@@ -2,14 +2,15 @@

 AutoCruise CE is a Windows desktop automation app powered by Codex App Server and ChatGPT sign-in. It observes the current desktop, asks Codex for the next action, executes through Windows automation and input backends, and continues until the task is complete or the user stops it.

-Current source and packaged release version: `1.2.0`
+Current source and packaged release version: `1.3.0`

 The project is experimental. Verify important operations yourself before relying on the result in a real workflow.

 ## What It Does

 - Runs natural-language desktop tasks on Windows.
 - Uses Codex App Server as the only AI runtime in this edition.
+- Uses `gpt-5.5` as the fixed Codex model.
 - Prefers structured Windows automation, then direct Win32 input, optional browser adapters, and visual fallback.
docs/architecture.md (29 additions, 26 deletions)
@@ -13,15 +13,16 @@ AutoCruise CE is a Windows desktop application for autonomous GUI operation. The
 - PowerShell is used only for the Microsoft UI Automation client layer where .NET UIA APIs are the most direct Windows path.
 - PyInstaller produces the portable Windows folder used for releases.

-### Why Structured Automation First
+### Why Smart Windows Operator First

-Desktop automation is not reliable if every step depends on image coordinates. AutoCruise CE therefore prefers structured adapters before vision:
+Desktop automation is not reliable if every step depends on screenshots and coordinates. AutoCruise CE therefore chooses the richest direct control surface available before falling back to visual input:

-1. UIA for native Windows controls, element properties, and control patterns.
-2. Win32 for screenshots, windows, pointer, keyboard, clipboard, and global hotkeys.
-3. Playwright only when an integration supplies a live browser page object.
-4. CDP DOM / Accessibility / Input domains only as a browser fallback.
-5. Vision for remaining areas such as canvases, custom-rendered controls, and coordinate-level drawing.
+- App-specific APIs and object models first. Microsoft Office tasks should use COM/Object Model access for workbooks, cells, documents, messages, calendars, selections, and attachments before touching the UI.
+- Browser automation for Edge, Chrome, Chromium, and web apps. Playwright locators are preferred, with CDP DOM / Runtime / Network / Input / Event domains as targeted fallback.
+- PowerShell CIM/WMI and native management cmdlets for OS and administration data such as processes, services, devices, network state, registry, installed software, and settings.
+- UIA for normal Windows desktop apps without a richer app API.
+- MSAA or targeted Win32 messages for legacy controls when UIA is weak and the exact control/message is known.
+- Vision, OCR, screenshots, raw keyboard, mouse, and coordinates only as the final fallback.

 Playwright and browser binaries are optional and are not bundled in the standard package.

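The new ordering above behaves like a first-match fallback chain: walk the backends from richest to poorest and take the first one that can handle the target. A minimal Python sketch under that assumption follows; `Backend`, `choose_backend`, and the target keys such as `"uia_element"` are hypothetical names for illustration, not AutoCruise CE's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Backend:
    name: str
    # Hypothetical check: can this backend handle the described target?
    is_available: Callable[[dict], bool]


# Ordered from the richest direct control surface to the visual fallback,
# mirroring the list above. The target keys are illustrative assumptions.
BACKENDS: Sequence[Backend] = (
    Backend("app_object_model", lambda t: t.get("office_com", False)),
    Backend("browser", lambda t: t.get("browser_page", False)),
    Backend("os_management", lambda t: t.get("os_query", False)),
    Backend("uia", lambda t: t.get("uia_element", False)),
    Backend("legacy_msaa_win32", lambda t: t.get("legacy_control", False)),
    Backend("vision", lambda t: True),  # final fallback always applies
)


def choose_backend(target: dict, backends: Sequence[Backend] = BACKENDS) -> str:
    """Return the name of the first backend whose availability check passes."""
    for backend in backends:
        if backend.is_available(target):
            return backend.name
    raise LookupError("no backend can handle this target")
```

For example, `choose_backend({"browser_page": True, "uia_element": True})` returns `"browser"` because browser automation sits above UIA in the chain, and an empty target falls through to `"vision"`. A real resolver would probably also fall through to the next backend when the chosen one fails at execution time; this sketch leaves that out.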
@@ -55,6 +56,7 @@ Playwright and browser binaries are optional and are not bundled in the standard
 - Codex model selection is fixed to `gpt-5.5`; stored provider settings are normalized to that model before use.

 ## State Machine

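The normalization rule in the context line above ("stored provider settings are normalized to that model before use") can be pictured as overwriting whatever model was saved earlier. A minimal Python sketch, assuming settings are stored as a plain dict with a hypothetical `"model"` key; `normalize_provider_settings` and `FIXED_CODEX_MODEL` are illustrative names.

```python
from typing import Any, Dict

# The docs state the Codex model is fixed to gpt-5.5.
FIXED_CODEX_MODEL = "gpt-5.5"


def normalize_provider_settings(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of stored settings with the model pinned to FIXED_CODEX_MODEL.

    The "model" key is a hypothetical settings field used for illustration.
    """
    normalized = dict(settings)  # shallow copy: the stored settings are left untouched
    normalized["model"] = FIXED_CODEX_MODEL  # overwrite whatever was saved earlier
    return normalized


# Hypothetical stored settings from an older release.
stored = {"model": "some-older-model", "sign_in": "chatgpt"}
print(normalize_provider_settings(stored))  # {'model': 'gpt-5.5', 'sign_in': 'chatgpt'}
```

Copying instead of mutating keeps the on-disk settings as the user left them while guaranteeing every request uses the fixed model.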
@@ -84,16 +86,15 @@ Rules:

 1. Interpret the user goal.
 2. Select the constitution, selected system prompt, and custom instruction files.
-3. Capture a screenshot and visible window state.
-4. Query UIA for root, focused element, active-window descendants, and target candidates.
-5. Add optional Playwright/CDP state when a connected browser page is available.
-6. Build the observation payload for Codex.
-7. Ask Codex for the next action.
-8. Re-observe in `PRECHECK`.
-9. Resolve the target through UIA / browser adapter / visual target fallback.
-10. Execute one action through the best available backend.
-11. Re-observe in `POSTCHECK`.
-12. Validate visible progress, replan, complete, or stop with an issue record.
+3. Capture the active Windows state, structured automation state, and screenshot fallback evidence.
+4. Query the direct-control stack: app object models, browser automation, OS management APIs, UIA, and legacy control paths where available.
+5. Build the observation payload for Codex.
+6. Ask Codex for the next action.
+7. Re-observe in `PRECHECK`.
+8. Resolve the target through the best available direct backend before visual fallback.
+9. Execute one action through the best available backend.
+10. Re-observe in `POSTCHECK`.
+11. Validate visible progress, replan, complete, or stop with an issue record.

 ## Automation Interface

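The new eleven-step sequence above can be read as one loop iteration per action. The following Python sketch assumes each stage is an injected callable; `run_task`, `Action`, `max_steps`, and the returned status strings are hypothetical, and step 11 is simplified to "continue on progress, stop otherwise" rather than a full replanning policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    kind: str                     # hypothetical action kinds, e.g. "click", "type", "done"
    target: Optional[str] = None  # element description chosen by the model


def run_task(
    goal: str,
    observe: Callable[[], dict],
    ask_codex: Callable[[str, dict], Action],
    resolve: Callable[[Action, dict], str],
    execute: Callable[[Action, str], None],
    made_progress: Callable[[dict, dict], bool],
    max_steps: int = 20,
) -> str:
    # Steps 1-2 (interpreting the goal, selecting prompt files) are assumed to
    # have happened before this loop is entered.
    for _ in range(max_steps):
        observation = observe()                # steps 3-5: capture state, build the payload
        action = ask_codex(goal, observation)  # step 6: ask for the next action
        if action.kind == "done":
            return "complete"
        precheck = observe()                   # step 7: PRECHECK
        backend = resolve(action, precheck)    # step 8: pick the backend for this target
        execute(action, backend)               # step 9: exactly one action per iteration
        postcheck = observe()                  # step 10: POSTCHECK
        if not made_progress(precheck, postcheck):
            # step 11 (simplified): stop with an issue instead of replanning
            return "stopped: no visible progress"
        # Otherwise loop again, which asks Codex to plan from the new state.
    return "stopped: step limit reached"
```

Executing a single action between `PRECHECK` and `POSTCHECK` keeps every state change attributable to one decision, so a failed progress check points at exactly one action.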
@@ -138,16 +139,18 @@ If locator operations fail and CDP is available, the adapter can use CDP `DOM`,

 ## Prompt Context Model

-Priority order:
+Prompt sources:

-1. Constitution
-2. Session mission
-3. Selected system prompt
-4. User custom prompt and custom prompt files
-5. Runtime observation
-6. Recent execution context
+- Constitution
+- Selected system prompt
+- User custom prompt and custom prompt files

-No other bundled prompt-source categories are loaded into the model context in this edition.
+Runtime inputs:
+
+- Current session mission
+- Current screen observation
+
+Session history, thread history, audit logs, execution logs, and learning-memory sources are not loaded into the model context in this edition. Each Codex App Server call starts a fresh thread.
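The new prompt model above separates static prompt sources from per-call runtime inputs and loads no history. A minimal Python sketch of assembling one request under those rules follows. `PromptSources`, `build_request`, the dict layout, and the `"fresh_thread"` flag are hypothetical and are not the Codex App Server protocol. Joining the sources in list order is also an assumption, since the commit drops the explicit "Priority order" label.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PromptSources:
    constitution: str
    system_prompt: str                    # the selected system prompt
    custom_prompts: Tuple[str, ...] = ()  # user custom prompt plus custom prompt files


def build_request(sources: PromptSources, mission: str, observation: str) -> dict:
    """Build one model request from prompt sources and current runtime inputs only.

    There is deliberately no history parameter, so nothing from earlier steps,
    logs, or memory can reach the model. The dict layout and the
    "fresh_thread" flag are hypothetical.
    """
    instructions = "\n\n".join(
        [sources.constitution, sources.system_prompt, *sources.custom_prompts]
    )
    return {
        "fresh_thread": True,  # every call starts a new thread
        "instructions": instructions,
        "input": f"Mission: {mission}\n\nObservation:\n{observation}",
    }
```

Because the function signature has no slot for prior turns, the "no session or thread history" rule is enforced by construction, not by filtering.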