Commit e5f17cb

Release 1.3.0
1 parent a6dc325 commit e5f17cb

15 files changed

Lines changed: 603 additions & 169 deletions

README.md

Lines changed: 4 additions & 3 deletions
@@ -2,14 +2,15 @@

 AutoCruise CE is a Windows desktop automation app powered by Codex App Server and ChatGPT sign-in. It observes the current desktop, asks Codex for the next action, executes through Windows automation and input backends, and continues until the task is complete or the user stops it.

-Current source and packaged release version: `1.2.0`
+Current source and packaged release version: `1.3.0`

 The project is experimental. Verify important operations yourself before relying on the result in a real workflow.

 ## What It Does

 - Runs natural-language desktop tasks on Windows.
 - Uses Codex App Server as the only AI runtime in this edition.
+- Uses `gpt-5.5` as the fixed Codex model.
 - Prefers structured Windows automation, then direct Win32 input, optional browser adapters, and visual fallback.
 - Supports manual runs, scheduled runs, pause/resume, stop, thread history, screenshots, and prompt profiles.
 - Loads model context from the constitution, the selected system prompt, and custom instruction files.
@@ -20,7 +21,7 @@ AutoCruise CE is distributed as a portable Windows package. There is no installe

 Download the latest portable archive from GitHub Releases:

-- [AutoCruiseCE-portable-1.2.0.zip](https://github.com/sharakusatoh/autocruise/releases/download/v1.2.0/AutoCruiseCE-portable-1.2.0.zip)
+- [AutoCruiseCE-portable-1.3.0.zip](https://github.com/sharakusatoh/autocruise/releases/download/v1.3.0/AutoCruiseCE-portable-1.3.0.zip)

 To run it:

@@ -89,7 +90,7 @@ build_windows.bat
 This creates:

 - `release\AutoCruiseCE\AutoCruiseCE.exe`
-- `release\AutoCruiseCE-portable-1.2.0.zip`
+- `release\AutoCruiseCE-portable-1.3.0.zip`

 `release/` is intentionally excluded from Git tracking. Publish the zip through GitHub Releases.

build_windows.bat

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ set "BUILD_ROOT=%~dp0build"
 set "BUILD_DIR=%~dp0build\pyinstaller"

 for /f %%i in ('python -c "import sys; sys.path.insert(0, r'%~dp0src'); from autocruise.version import APP_VERSION; print(APP_VERSION, end='') "') do set "APP_VERSION=%%i"
-if "%APP_VERSION%"=="" set "APP_VERSION=1.2.0"
+if "%APP_VERSION%"=="" set "APP_VERSION=1.3.0"

 if not exist "%RELEASE_DIR%" mkdir "%RELEASE_DIR%"
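The `for /f` line in this script asks Python for `APP_VERSION`, and the following line substitutes a hard-coded value when that lookup returns nothing. The same fallback pattern might look like this in Python. This is only a sketch: `resolve_app_version` and its parameters are hypothetical names, and only the module path `autocruise.version` and the attribute `APP_VERSION` come from the script itself.

```python
import importlib


def resolve_app_version(module_name: str = "autocruise.version",
                        fallback: str = "1.3.0") -> str:
    """Return APP_VERSION from the named module, or the fallback if unavailable.

    Hypothetical helper for illustration; it is not part of the project.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The import failed, so behave like the batch script's empty-value check.
        return fallback
    version = getattr(module, "APP_VERSION", "")
    # A missing or empty attribute also falls back to the default.
    return str(version) if version else fallback
```

Keeping the fallback in sync with the source version means a build still produces a correctly named archive when the Python lookup fails.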

constitution/constitution.md

Lines changed: 0 additions & 14 deletions
@@ -1,18 +1,4 @@
 <Constitution>
-
-# AutoCruise Constitution
-
 ## Core Goal
-
 Finish the user's stated Windows task with autonomous judgment and steady progress.
-
-## Required Principles
-
-- Choose the action that best advances the current goal.
-- Keep acting while the goal remains reachable.
-- Re-observe when the screen state is uncertain or has changed.
-- If a path stalls, retry, adjust focus, or switch to another viable path.
-- Use the most reliable available control path, including structured automation, keyboard input, mouse input, browser control, or visual guidance.
-- Keep enough logs and learning notes to explain what was tried, what changed, and what worked.
-
 </Constitution>

docs/architecture.md

Lines changed: 29 additions & 26 deletions
@@ -13,15 +13,16 @@ AutoCruise CE is a Windows desktop application for autonomous GUI operation. The
 - PowerShell is used only for the Microsoft UI Automation client layer where .NET UIA APIs are the most direct Windows path.
 - PyInstaller produces the portable Windows folder used for releases.

-### Why Structured Automation First
+### Why Smart Windows Operator First

-Desktop automation is not reliable if every step depends on image coordinates. AutoCruise CE therefore prefers structured adapters before vision:
+Desktop automation is not reliable if every step depends on screenshots and coordinates. AutoCruise CE therefore chooses the richest direct control surface available before falling back to visual input:

-1. UIA for native Windows controls, element properties, and control patterns.
-2. Win32 for screenshots, windows, pointer, keyboard, clipboard, and global hotkeys.
-3. Playwright only when an integration supplies a live browser page object.
-4. CDP DOM / Accessibility / Input domains only as a browser fallback.
-5. Vision for remaining areas such as canvases, custom-rendered controls, and coordinate-level drawing.
+- App-specific APIs and object models first. Microsoft Office tasks should use COM/Object Model access for workbooks, cells, documents, messages, calendars, selections, and attachments before touching the UI.
+- Browser automation for Edge, Chrome, Chromium, and web apps. Playwright locators are preferred, with CDP DOM / Runtime / Network / Input / Event domains as targeted fallback.
+- PowerShell CIM/WMI and native management cmdlets for OS and administration data such as processes, services, devices, network state, registry, installed software, and settings.
+- UIA for normal Windows desktop apps without a richer app API.
+- MSAA or targeted Win32 messages for legacy controls when UIA is weak and the exact control/message is known.
+- Vision, OCR, screenshots, raw keyboard, mouse, and coordinates only as the final fallback.

 Playwright and browser binaries are optional and are not bundled in the standard package.
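The new preference list in this hunk reads as a priority order: take the richest direct control surface that is available and leave vision for last. A minimal sketch of that selection rule follows. The backend keys and the `choose_backend` function are illustrative assumptions, not identifiers from the AutoCruise CE source.

```python
from typing import Iterable, Optional

# Order follows the added bullet list in docs/architecture.md.
# The string keys are hypothetical labels chosen for this sketch.
BACKEND_PRIORITY = [
    "app_object_model",    # COM / Office object models
    "browser_automation",  # Playwright locators, CDP as targeted fallback
    "os_management",       # PowerShell CIM/WMI and management cmdlets
    "uia",                 # UI Automation for ordinary desktop apps
    "legacy_msaa_win32",   # MSAA or targeted Win32 messages
    "vision_fallback",     # screenshots, OCR, raw input, coordinates
]


def choose_backend(available: Iterable[str]) -> Optional[str]:
    """Return the highest-priority backend that is currently available."""
    available_set = set(available)
    for name in BACKEND_PRIORITY:
        if name in available_set:
            return name
    return None  # nothing usable was reported
```

For example, `choose_backend(["vision_fallback", "uia"])` selects `"uia"`, because UIA ranks above the visual fallback.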

@@ -55,6 +56,7 @@ Playwright and browser binaries are optional and are not bundled in the standard
 - `infrastructure/windows/*`
 - `infrastructure/browser/*`
 - Codex connection, JSON/JSONL/YAML storage, screenshot capture, window enumeration, UIA client layer, Win32 input execution, optional Playwright/CDP adapters, and visual guidance.
+- Codex model selection is fixed to `gpt-5.5`; stored provider settings are normalized to that model before use.

 ## State Machine
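The added bullet says stored provider settings are normalized to the fixed model before use. One way such a normalization step could look is sketched below. The `model` key, the constant name, and the function name are assumptions made for illustration; the commit text confirms only the value `gpt-5.5`.

```python
FIXED_CODEX_MODEL = "gpt-5.5"  # value taken from the commit text


def normalize_provider_settings(settings: dict) -> dict:
    """Return a copy of stored settings with the model forced to the fixed value.

    The "model" key is a hypothetical field name used for this sketch.
    """
    normalized = dict(settings)  # copy so the stored settings are not mutated
    normalized["model"] = FIXED_CODEX_MODEL
    return normalized
```

Normalizing on load, rather than rejecting old settings files, lets configurations saved by earlier releases keep working after the model choice is removed.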

@@ -84,16 +86,15 @@ Rules:

 1. Interpret the user goal.
 2. Select the constitution, selected system prompt, and custom instruction files.
-3. Capture a screenshot and visible window state.
-4. Query UIA for root, focused element, active-window descendants, and target candidates.
-5. Add optional Playwright/CDP state when a connected browser page is available.
-6. Build the observation payload for Codex.
-7. Ask Codex for the next action.
-8. Re-observe in `PRECHECK`.
-9. Resolve the target through UIA / browser adapter / visual target fallback.
-10. Execute one action through the best available backend.
-11. Re-observe in `POSTCHECK`.
-12. Validate visible progress, replan, complete, or stop with an issue record.
+3. Capture the active Windows state, structured automation state, and screenshot fallback evidence.
+4. Query the direct-control stack: app object models, browser automation, OS management APIs, UIA, and legacy control paths where available.
+5. Build the observation payload for Codex.
+6. Ask Codex for the next action.
+7. Re-observe in `PRECHECK`.
+8. Resolve the target through the best available direct backend before visual fallback.
+9. Execute one action through the best available backend.
+10. Re-observe in `POSTCHECK`.
+11. Validate visible progress, replan, complete, or stop with an issue record.

 ## Automation Interface
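The numbered steps in this hunk describe an observe, plan, `PRECHECK`, execute, `POSTCHECK` cycle that runs one action at a time. The sketch below shows that control flow with injected callables. Every name in it is hypothetical, and the rule that a changed `PRECHECK` observation triggers a replan is an assumption of this sketch rather than documented behavior.

```python
from typing import Callable, Dict, List


def run_task(goal: str,
             observe: Callable[[], Dict],
             plan: Callable[[str, Dict], Dict],
             execute: Callable[[Dict], None],
             is_done: Callable[[Dict], bool],
             max_steps: int = 20) -> List[str]:
    """Run one action per iteration and return a trace of what happened."""
    trace: List[str] = []
    for _ in range(max_steps):
        observation = observe()           # capture current state
        action = plan(goal, observation)  # ask the planner for the next action
        precheck = observe()              # PRECHECK: re-observe before acting
        if precheck != observation:
            trace.append("replan")        # assumed rule: state moved, plan again
            continue
        execute(action)                   # exactly one action per cycle
        postcheck = observe()             # POSTCHECK: re-observe after acting
        trace.append(action["name"])
        if is_done(postcheck):
            trace.append("complete")
            return trace
    trace.append("stopped")               # step budget exhausted
    return trace
```

Passing the observer, planner, and executor in as functions keeps the loop testable without a real desktop or model connection.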

@@ -138,16 +139,18 @@ If locator operations fail and CDP is available, the adapter can use CDP `DOM`,

 ## Prompt Context Model

-Priority order:
+Prompt sources:

-1. Constitution
-2. Session mission
-3. Selected system prompt
-4. User custom prompt and custom prompt files
-5. Runtime observation
-6. Recent execution context
+- Constitution
+- Selected system prompt
+- User custom prompt and custom prompt files

-No other bundled prompt-source categories are loaded into the model context in this edition.
+Runtime inputs:
+
+- Current session mission
+- Current screen observation
+
+Session history, thread history, audit logs, execution logs, and learning-memory sources are not loaded into the model context in this edition. Each Codex App Server call starts a fresh thread.

 ## Logging and Storage
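The revised section splits model context into prompt sources and runtime inputs and states that history and log sources are not loaded. A small sketch of assembling context under that split follows. The function name, parameter names, and dictionary keys are hypothetical and chosen only to mirror the two lists.

```python
from typing import Dict, List


def build_model_context(constitution: str,
                        system_prompt: str,
                        custom_prompts: List[str],
                        mission: str,
                        observation: str) -> Dict[str, List[str]]:
    """Group context into the two categories named in the revised document.

    History, logs, and learning memory are deliberately not parameters here,
    matching the statement that they are not loaded into the model context.
    """
    prompt_sources = [constitution, system_prompt, *custom_prompts]
    runtime_inputs = [mission, observation]
    return {
        # Empty strings are dropped so absent sources do not add blank entries.
        "prompt_sources": [text for text in prompt_sources if text],
        "runtime_inputs": [text for text in runtime_inputs if text],
    }
```

Because nothing from earlier calls is passed in, each call can build its context from scratch, which fits the note that every Codex App Server call starts a fresh thread.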

@@ -170,7 +173,7 @@ The settings screen includes:
 - Pause and stop hotkeys
 - Codex App Server status
 - ChatGPT sign-in / sign-out
-- Codex model
+- Fixed Codex model: `gpt-5.5`
 - Reasoning effort
 - Planning response size
 - Screenshot retention

docs/release_qa_checklist.md

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # AutoCruise CE Release QA Checklist

-Use this checklist before shipping a Windows build. Record the build path, date, tester, and result in diagnostics or the release QA memo.
+Use this checklist before shipping a Windows build. Record the build path, date, tester, and result in diagnostics or the release QA record.

 ## Build

@@ -16,12 +16,12 @@ Use this checklist before shipping a Windows build. Record the build path, date,

 - Open Settings and confirm Codex App Server status is visible.
 - Sign in with ChatGPT and run the connection test.
-- Confirm model, reasoning effort, and planning response size can be saved and restored.
+- Confirm the fixed Codex model displays as `gpt-5.5`, and reasoning effort plus planning response size can be saved and restored.
 - Confirm Japanese language selection persists after restart.

 ## Desktop Operation

-- Run `ペイントを開いて、簡単な猫の絵を描いてください。`.
+- Run a Paint smoke task that opens Paint and draws a simple cat picture.
 - Confirm Paint launches through a direct Windows path such as Run, visible launcher, or search.
 - Confirm the agent waits for the Paint window and canvas.
 - Confirm click, drag, and curve-like multi-point drawing work on the canvas.
