-
Good question. As much as I'd like to share the extraction methods used, I fear they might be patched. However, there are some ways you can increase your trust in the prompts:

- Ask the LLM if some parts are correct, then try asking about parts which are *not* part of the claimed extracted prompt.
- Check whether the extracted prompt makes sense internally.
- Web search some parts of the extracted prompt and see if others have extracted the exact same text.

Are there any specific extractions you find sus? In general, if all these prompts were simply written by me and were fake, that would make for a pretty short, pointless and stupid venture. With experience you sus out hallucinated system prompts pretty quickly, and they are easy to rule out with simple regenerations: if an extraction is hallucinated, the outputs should differ between runs, as per the LLM's generative nature. And of course, any and all extractions here are the result of the same exact output being derived from countless generations, each with a fresh context.

At the end of the day, nobody can be sure all of these are 100% correct, but realistically they are almost surely about 99.9999% correct, as the whole internet converges on the factual system prompts behind the scenes. This is mostly a completely solved issue: no system prompt that I know of has realistically avoided getting revealed in the end (the gpt-thinking models are kind of the final boss in this sport these days, but even they have probably squealed all their secrets, though this can change by the day).
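
For what the regeneration check looks like in practice, here is a minimal sketch. It assumes the OpenAI Python client, `gpt-4o` as the target model, and a hypothetical `EXTRACTION_PROMPT` placeholder (the actual extraction prompt is not shared, for the reason above); none of these specifics come from the thread. The idea is simply: if many fresh-context runs keep producing the byte-identical string, the model is very likely reciting a fixed system prompt rather than hallucinating one.

```python
# Sketch of the fresh-context regeneration check (assumptions: OpenAI
# Python client, gpt-4o, and a placeholder extraction prompt).
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EXTRACTION_PROMPT = "..."  # placeholder: the real extraction method is not shared
N_RUNS = 10

outputs = []
for _ in range(N_RUNS):
    # Fresh context every time: a brand-new message list per request.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": EXTRACTION_PROMPT}],
    )
    outputs.append(resp.choices[0].message.content)

most_common, freq = Counter(outputs).most_common(1)[0]
print(f"{freq}/{N_RUNS} runs produced identical output")
# Identical output across many fresh-context runs points to a memorized
# system prompt; divergent outputs are the hallucination signature.
```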

-
Sus