Archived entries from file /home/mgalgs/src/makepkg-checkwrapper/TODO.org

–audit=agentic

> Let's add a new flag --audit=agentic, and give the LLM some tool calls:

- listdir
- readfile

In this mode, we'll instruct the LLM that it's performing a security audit and tell it what tools it has available and ask it to go to town. It still needs to produce a report with
the xml tags for us to parse.

We should always give it the intial directory listing since presumably it's always going to want to at least do one directory listing at the top-level. Let's just provide that up
front.

This was inspired by the following session, where I'm trying to make sure that this tool would have actually caught the recent google-chrome-stable vulnerability.

```
> ./aur-sleuth --audit=sources google-chrome-stable
Created temporary directory: /tmp/aur-sleuth-rmn9jrhe
Cloning https://aur.archlinux.org/google-chrome-stable.git...
Running makepkg --nobuild to download sources...

--- Auditing Source Files ---
[AUDIT] Checking source file: /tmp/aur-sleuth-rmn9jrhe/src/eula_text.html (53070 bytes)
ERROR: Could not audit source file /tmp/aur-sleuth-rmn9jrhe/src/eula_text.html: mismatched tag: line 24, column 2

AUDIT FAILED. See reasons above.
Cleaning up temporary directory: /tmp/aur-sleuth-rmn9jrhe
```

Hopefully the agent would be smart enough to not even bother "auditing" the eula_text.html, but would instead hone in on the shell script, which contains the malicious curl command.

Standard agentic loop, keep it clean.

Make the session class a little smarter

It should actually encapsulate the LLM calls. The OpenAI client will live inside the session class. That way it can track sizes internally without callers having to track sizes.

Use <PKGBUILD></PKGBUILD> delimiters instead of PKGBUILD CONTENT: with markdown triple-backticks

XML seems to be processed better by LLMs.

Improve debug logging

Just use python logging framework and send it to the tmp debug log file.

Currently we only flush the logs at the end, but sometimes things get hung before that. We should configure `logging` to get logs right away.

Streamline UI

Currently we’re writing quite a bit to stdout. Instead of sending the full audit to stdout we should write it to a log file (different than the debug log file, which is already quite noisy), and provide a rich, TUI display to the user.

We’ll assume that users have a modern terminal installed with plenty of support for everything we need to make a responsive terminal user interface.

Here’s how I imagine the output:

“` ./aur-sleuth –audit=agentic google-chrome-stable

,----

{spinner} Analyzing google-chrome-stable

`---- [{Status details}] “`

(but with a full box, this is just a mockup)

And rather than pushing stdout, we would refresh the content inside the box with current status:

“` ,----

{spinner} Auditing PKGBUILD…

`---- [0 issues found, 2 files left to process] “`

And when a single “box action” is completed we add some newlines to “finalize” that box and push it up.

(imagine an issue was found there and then we move on to install.sh)

“` ,----

X PKGBUILD Audit failure

{3 sentence description of failure}

`----

,----

{spinner} Auditing install.sh…

`---- [1 issue found, 1 file left to process] “`

“` ,----

Audit complete! Result: FAIL
Issues found:
- {3 sentence description of first issue}
Full audit report can be found in /tmp/aur-sleuth-report.txt

`---- “`

A success would just be:

“` ,----

Audit complete! Result: SUCCESS
Full audit report can be found in /tmp/aur-sleuth-report.txt

`---- “`

Send recursive file listing to agent and remove listdir tool

Agent can decide which files (possibly all) to read

In order to maintain its own state, I think the agent is going to need a “WriteFile” tool to keep a checklist of files it needs to review. Or do you think it will be able to keep it in its context? I worry that it will forget since it could be reading some huge files, so the history of files it has already read is going to fall out of context. It’s almost like during the code review portion it would be a “recursive” agentic LLM call, don’t need the full audit conversation history to perform a code review of a single file. And the audit conversation history only needs to record the outcome of the review in its context, not the code listing or the code review / audit report. What’s the canonical/proper way to handle this in an agentic LLM loop?

API cost and token usage improvements

Create a wrapper class for our “client” instance (OpenAI). This will contain an OpenAI instance, and also take care of aggregating all costs and token usage throughout the whole session.

For OpenRouter we can use their API to dynamically retrieve up-to-date pricing info:

> curl -s https://openrouter.ai/api/v1/models | jq ‘.data[] | select(.id == “qwen/qwen3-coder”) | .pricing’ { “prompt”: “0.0000002”, “completion”: “0.0000008”, “request”: “0”, “image”: “0”, “audio”: “0”, “web_search”: “0”, “internal_reasoning”: “0” }

Prompt injection

Our system and users prompts are currently both vulnerable to prompt injection since they’re using unsanitized user inputs. Please fix.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

–audit=agentic

Make the session class a little smarter

Use <PKGBUILD></PKGBUILD> delimiters instead of PKGBUILD CONTENT: with markdown triple-backticks

Improve debug logging

Streamline UI

Send recursive file listing to agent and remove listdir tool

API cost and token usage improvements

Prompt injection

FilesExpand file tree

TODO.org_archive

Latest commit

History

TODO.org_archive

File metadata and controls

–audit=agentic

Make the session class a little smarter

Use <PKGBUILD></PKGBUILD> delimiters instead of PKGBUILD CONTENT: with markdown triple-backticks

Improve debug logging

Streamline UI

Send recursive file listing to agent and remove listdir tool

API cost and token usage improvements

Prompt injection