This issue documents three validated architectural gaps found in the wrapper package at src/praisonai/praisonai/. Each one directly contradicts a stated pillar of the PraisonAI philosophy (async-safe by default, no global singletons, 3-way feature surface, DRY), and each has a clear, minimal fix.
The findings below are not stylistic — they are correctness / safety issues that bite in production:
Gap #1 silently breaks on Python 3.12+ and races inside the wrapper's own _BackgroundLoop.
It's already broken on modern Python. asyncio.get_event_loop() emitted DeprecationWarning in 3.10.0–3.10.8 and 3.11.0 when there was no running loop; since 3.12 it emits DeprecationWarning when no current loop is set, and as of 3.14 it raises RuntimeError in that case. The two cli/commands/standardise.py call sites are pure sync entry points with no running loop, so they warn on 3.12 and 3.13 (and fail under -W error) and raise on first invocation on 3.14+. The correct sync entry is asyncio.run(runtime.start()).
It interacts dangerously with the wrapper's own background loop. praisonai/_async_bridge.py:19-80 defines _BackgroundLoop, a thread-bound asyncio loop that runs coroutines submitted from sync callers. Any code reached via run_sync() executes inside that background loop's thread; if helper code then calls asyncio.get_event_loop() from a different thread (e.g. a callback fired from a sync worker), get_event_loop() may attach a brand-new event loop to that thread or raise. Both outcomes silently violate the "multi-agent + async safe by default" pillar.
It defeats the protocol-driven, lightweight design. The whole point of _async_bridge.py is to centralize loop ownership. Each of these 36 sites re-derives a loop ad hoc instead of using asyncio.get_running_loop() (inside async def) or asyncio.run() / run_sync() (at the sync boundary).
How to resolve
Two simple, mechanical rewrites — no API change:
Inside async def (most of the 36 sites): replace with asyncio.to_thread(...) (Python 3.9+, the modern equivalent of "off-load a sync call to a worker thread"):
- loop = asyncio.get_event_loop()
- result = await loop.run_in_executor(None, lambda: execute_resolved_recipe(resolved))
+ result = await asyncio.to_thread(execute_resolved_recipe, resolved)
At sync entry points (cli/commands/standardise.py, etc.): use asyncio.run (or route through praisonai._async_bridge.run_sync if the wrapper's shared loop is needed):
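The two rewrites can be sketched together as a minimal, self-contained example. Here `execute_resolved_recipe`, `run_recipe` and `main` are hypothetical stand-ins rather than the wrapper's real functions; only `asyncio.to_thread` and `asyncio.run` are actual standard-library APIs.

```python
import asyncio
import threading


def execute_resolved_recipe(resolved: str) -> str:
    """Hypothetical blocking call; stands in for any sync helper."""
    return f"ran {resolved} on {threading.current_thread().name}"


async def run_recipe(resolved: str) -> str:
    # Inside `async def`: off-load the blocking call to a worker thread
    # without ever calling asyncio.get_event_loop().
    return await asyncio.to_thread(execute_resolved_recipe, resolved)


def main() -> str:
    # Sync entry point: asyncio.run() creates a fresh loop, runs the
    # coroutine to completion, and closes the loop afterwards.
    return asyncio.run(run_recipe("demo"))
```

Calling `main()` from a plain sync context returns the helper's string and never asks asyncio for an implicit loop. If the wrapper's shared background loop is required instead, the sync boundary would presumably go through `run_sync` rather than `asyncio.run`.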
src/praisonai/praisonai/scheduler/async_agent_scheduler.py — 468 lines, async-native but the author left the safety-critical features as inline TODOs.
The drift, in the author's own words
# src/praisonai/praisonai/scheduler/async_agent_scheduler.py:65-95
class AsyncAgentScheduler:
    """
    Async-native scheduler for running PraisonAI agents periodically.
    ...
    - Timeout support (TODO: needs porting from sync version)
    - Budget tracking (TODO: needs porting from sync version)
    - YAML/recipe constructors (TODO: needs porting from sync version)
    """

    def __init__(
        self,
        agent,
        task: str,
        config: Optional[Dict[str, Any]] = None,
        on_success: Optional[Callable] = None,
        on_failure: Optional[Callable] = None,
        # TODO: Add these missing features from sync version:
        # timeout: Optional[int] = None,
        # max_cost: Optional[float] = 1.00
    ):

# src/praisonai/praisonai/scheduler/async_agent_scheduler.py:113-116
        # TODO: Add these missing features from sync version:
        # self.timeout = timeout
        # self.max_cost = max_cost
        # self._total_cost = 0.0

# src/praisonai/praisonai/scheduler/async_agent_scheduler.py:352-396 (excerpt)
    async def _execute_with_retry(self, max_retries: int):
        """Execute agent with retry logic.

        TODO: Port missing features from sync version:
        - Timeout support per execution
        - Budget tracking and limits
        - Daemon state updates (_update_state_if_daemon)
        """
        ...
        # TODO: Add budget check from sync version:
        # if self.max_cost and self._total_cost >= self.max_cost:
        #     logger.warning(f"Budget limit reached: ${self._total_cost:.4f} >= ${self.max_cost}")
        #     return
        ...
        # TODO: Add timeout support from sync version:
        # if self.timeout:
        #     result = await asyncio.wait_for(
        #         self._executor.execute(self.task),
        #         timeout=self.timeout
        #     )
        # else:
        result = await self._executor.execute(self.task)
Compare to the sync side which actually enforces both:
It violates the 3-way feature surface pillar. The promise is "every feature ships with CLI + YAML + Python". A user who picks the async API for a long-running 24/7 scheduler loses the budget ceiling and the per-run timeout — the two features that make scheduled agents safe to leave running. There is no warning at the API surface; the parameters simply do not exist on the async class.
It violates DRY. ~1000 lines of scheduler logic split across two files where ~70% of the bodies (schedule parsing, retry policy, stats counters, daemon state, callbacks, success/failure dispatch) are identical. Every future bugfix has to be applied twice and tends to diverge — which is exactly what already happened to timeout and max_cost.
It violates safe by default. An async user who follows the documented example (AsyncAgentScheduler(agent, task="…"), then await scheduler.start("hourly")) gets:
no per-execution timeout — a hung LLM call wedges the schedule indefinitely;
no budget cap — runaway cost is silent;
no daemon state file updates — praisonai scheduler list/status doesn't reflect async daemons correctly.
How to resolve
Lift the shared behaviour into a common base, then make the sync and async classes thin specializations of the I/O strategy only. Sketch:
Then AgentScheduler provides the threading + concurrent.futures timeout, and AsyncAgentScheduler provides the asyncio.wait_for timeout — both calling into the shared _SchedulerCore for accounting:
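A minimal sketch of that split, under stated assumptions: the class names follow the text above, but `run_once`, `budget_exhausted`, `record_cost` and the per-run `cost` argument are hypothetical and do not reflect the real scheduler's method names or how it measures cost.

```python
import asyncio
import concurrent.futures
from typing import Any, Awaitable, Callable, Optional


class _SchedulerCore:
    """Shared accounting used by both flavours (hypothetical shape)."""

    def __init__(self, timeout: Optional[float] = None,
                 max_cost: Optional[float] = None) -> None:
        self.timeout = timeout
        self.max_cost = max_cost
        self._total_cost = 0.0

    def budget_exhausted(self) -> bool:
        return self.max_cost is not None and self._total_cost >= self.max_cost

    def record_cost(self, cost: float) -> None:
        self._total_cost += cost


class AgentScheduler(_SchedulerCore):
    """Sync flavour: timeout enforced through concurrent.futures."""

    def run_once(self, fn: Callable[[], Any], cost: float = 0.0) -> Any:
        if self.budget_exhausted():
            return None  # skip the run once the ceiling is reached
        pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        try:
            # Raises concurrent.futures.TimeoutError if fn overruns.
            result = pool.submit(fn).result(timeout=self.timeout)
        finally:
            pool.shutdown(wait=False)
        self.record_cost(cost)
        return result


class AsyncAgentScheduler(_SchedulerCore):
    """Async flavour: timeout enforced through asyncio.wait_for."""

    async def run_once(self, fn: Callable[[], Awaitable[Any]],
                       cost: float = 0.0) -> Any:
        if self.budget_exhausted():
            return None
        # timeout=None means "wait indefinitely"; otherwise the awaited
        # task is cancelled and asyncio.TimeoutError is raised.
        result = await asyncio.wait_for(fn(), timeout=self.timeout)
        self.record_cost(cost)
        return result
```

One design caveat: the sync timeout only stops waiting for the worker thread, since Python threads cannot be cancelled, whereas `asyncio.wait_for` actually cancels the awaited task. The two flavours therefore share accounting but not cancellation semantics.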
It contains a textbook TOCTOU race. Two concurrent requests on a cold worker can both pass if _store is None: (or if _executor is None:) before either assignment lands, then each constructs its own instance. The "winner" overwrites the loser, but the loser's JobExecutor has already started its own background asyncio tasks via await executor.start() on the lifespan path — those tasks are now orphaned and writing to a detached InMemoryJobStore that no router holds. Submitted jobs land in the orphaned store and appear lost.
The module's own comment admits the gap. Line 21: # Global instances (for single-process deployment). This directly contradicts the philosophy ("no global singletons", "multi-agent + async safe by default") and forecloses any multi-worker (uvicorn --workers N) deployment because each worker silently creates its own in-process store and jobs become routable to only one worker.
The codebase already has the right pattern next door. src/praisonai/praisonai/cli_backends/registry.py does this correctly with a lock and double-checked locking:
The jobs server simply hasn't adopted the same convention.
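For readers unfamiliar with the convention, double-checked locking can be sketched generically as follows. This is not the code from registry.py; `Registry`, `get_registry` and the counter are hypothetical names used only for illustration.

```python
import threading
from typing import Optional


class Registry:
    """Hypothetical shared object; counts how often it gets built."""

    instances_built = 0

    def __init__(self) -> None:
        Registry.instances_built += 1


_registry: Optional[Registry] = None
_registry_lock = threading.Lock()


def get_registry() -> Registry:
    global _registry
    # Fast path: once initialised, callers never take the lock.
    if _registry is None:
        with _registry_lock:
            # Second check under the lock: another thread may have
            # finished construction while this one was waiting.
            if _registry is None:
                _registry = Registry()
    return _registry
```

The unlocked outer check keeps the common case cheap, while the inner check guarantees that exactly one instance is constructed even when several threads arrive on a cold start.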
How to resolve
Two complementary changes — neither is invasive.
a. Make the lazy init atomic (eliminates the race today):
# src/praisonai/praisonai/jobs/server.py
+import threading
+
 _store: Optional[JobStore] = None
 _executor: Optional[JobExecutor] = None
+_store_lock = threading.Lock()
+_executor_lock = threading.Lock()

 def get_store() -> JobStore:
     global _store
-    if _store is None:
-        _store = InMemoryJobStore(max_jobs=1000)
+    if _store is None:
+        with _store_lock:
+            if _store is None:
+                _store = InMemoryJobStore(max_jobs=1000)
     return _store

 def get_executor() -> JobExecutor:
     global _executor
-    if _executor is None:
-        _executor = JobExecutor(
-            store=get_store(),
-            max_concurrent=int(os.environ.get("PRAISONAI_MAX_CONCURRENT_JOBS", "10")),
-            default_timeout=int(os.environ.get("PRAISONAI_JOB_TIMEOUT", "3600"))
-        )
+    if _executor is None:
+        with _executor_lock:
+            if _executor is None:
+                _executor = JobExecutor(
+                    store=get_store(),
+                    max_concurrent=int(os.environ.get("PRAISONAI_MAX_CONCURRENT_JOBS", "10")),
+                    default_timeout=int(os.environ.get("PRAISONAI_JOB_TIMEOUT", "3600"))
+                )
     return _executor
b. Prefer dependency injection through app.state (eliminates the global, makes multi-tenant deployment possible). The factory create_app() already accepts store / executor parameters; stash them on app.state and have the router read from there instead of from module-level globals:
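The shape of that change can be illustrated without the framework. The sketch below is a stdlib-only stand-in, not the wrapper's code: `App` models only the `state` attribute of a FastAPI application, the store and executor classes are simplified hypothetical versions, and `submit_job` plays the role of a request handler (which in FastAPI would reach the same objects through `request.app.state`).

```python
from types import SimpleNamespace
from typing import Dict, Optional


class InMemoryJobStore:
    """Simplified, hypothetical stand-in for the wrapper's store."""

    def __init__(self, max_jobs: int = 1000) -> None:
        self.max_jobs = max_jobs
        self.jobs: Dict[str, str] = {}


class JobExecutor:
    """Simplified, hypothetical stand-in for the wrapper's executor."""

    def __init__(self, store: InMemoryJobStore) -> None:
        self.store = store


class App:
    """Stand-in for a FastAPI app; only `state` is modelled."""

    def __init__(self) -> None:
        self.state = SimpleNamespace()


def create_app(store: Optional[InMemoryJobStore] = None,
               executor: Optional[JobExecutor] = None) -> App:
    app = App()
    # Dependencies live on the app instance, not in module globals.
    app.state.store = store if store is not None else InMemoryJobStore()
    app.state.executor = (executor if executor is not None
                          else JobExecutor(store=app.state.store))
    return app


def submit_job(app: App, job_id: str, payload: str) -> None:
    # The handler reads its dependencies from the app it belongs to.
    app.state.executor.store.jobs[job_id] = payload
```

Because every app instance owns its store and executor, two apps in one process no longer share hidden state, and tests can inject fakes through `create_app` without patching module globals.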
All three are local to src/praisonai/praisonai/ and require no changes in praisonaiagents (the core SDK), so they fit cleanly under the "heavy code lives in the wrapper" layering rule.
Scope
This issue documents three validated architectural gaps found in the wrapper package at src/praisonai/praisonai/. Each one directly contradicts a stated pillar of the PraisonAI philosophy (async-safe by default, no global singletons, 3-way feature surface, DRY), and each has a clear, minimal fix.
The findings below are not stylistic; they are correctness / safety issues that bite in production.
All file paths, line numbers and snippets below were read from the tree at claude/bold-bohr-MOE3G (HEAD d5f1114 Release v4.6.48).
1. Pervasive use of the deprecated asyncio.get_event_loop(): unsafe inside _BackgroundLoop, broken on Python 3.12+
Where it is
grep -rn "asyncio.get_event_loop()" src/praisonai/praisonai returns 36 call sites across 14 modules. The most damaging ones:
a. cli/commands/standardise.py:381 and :491 (sync entry path calling run_until_complete() on a non-running loop).
b. sandbox/e2b.py (7 calls inside async def methods where get_running_loop() is the correct API).
Other identical patterns in e2b.py at lines 164, 222, 230, 236, 321, 339.
c. jobs/executor.py (3 calls inside the async job runner).
Identical pattern at lines 326 and 363.
d. Other affected modules (verified by grep):
bots/email.py:235, 475, 555 · gateway/server.py:897, 971, 2143 · api/agent_invoke.py:224, 363 · acp/server.py:420 · ui/callbacks.py:37 · recipe/operations.py:215 · cli/features/interactive_tools.py:255 · cli_backends/claude.py:177, 181 · browser/cli.py (multiple).
Why this is an architectural gap
It's already broken on modern Python. asyncio.get_event_loop() emitted DeprecationWarning in 3.10.0–3.10.8 and 3.11.0 when there was no running loop; since 3.12 it emits DeprecationWarning when no current loop is set, and as of 3.14 it raises RuntimeError in that case. The two cli/commands/standardise.py call sites are pure sync entry points with no running loop, so they warn on 3.12 and 3.13 (and fail under -W error) and raise on first invocation on 3.14+. The correct sync entry is asyncio.run(runtime.start()).
It interacts dangerously with the wrapper's own background loop. praisonai/_async_bridge.py:19-80 defines _BackgroundLoop, a thread-bound asyncio loop that runs coroutines submitted from sync callers. Any code reached via run_sync() executes inside that background loop's thread; if helper code then calls asyncio.get_event_loop() from a different thread (e.g. a callback fired from a sync worker), get_event_loop() may attach a brand-new event loop to that thread or raise. Both outcomes silently violate the "multi-agent + async safe by default" pillar.
It defeats the protocol-driven, lightweight design. The whole point of _async_bridge.py is to centralize loop ownership. Each of these 36 sites re-derives a loop ad hoc instead of using asyncio.get_running_loop() (inside async def) or asyncio.run() / run_sync() (at the sync boundary).
How to resolve
Two simple, mechanical rewrites — no API change:
Inside async def (most of the 36 sites): replace with asyncio.to_thread(...) (Python 3.9+, the modern equivalent of "off-load a sync call to a worker thread"):
At sync entry points (cli/commands/standardise.py, etc.): use asyncio.run (or route through praisonai._async_bridge.run_sync if the wrapper's shared loop is needed):
A CI guard makes this a one-time fix:
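One possible shape for such a guard is a small AST scan that fails the build when the call reappears. The function names `find_get_event_loop_calls` and `check_tree` are hypothetical, and the scan root is assumed to be supplied by whatever CI step invokes it; this sketches the idea and is not an existing project script.

```python
import ast
from pathlib import Path
from typing import List


def find_get_event_loop_calls(source: str) -> List[int]:
    """Return the line numbers of `asyncio.get_event_loop()` calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "get_event_loop"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "asyncio"):
            hits.append(node.lineno)
    return sorted(hits)


def check_tree(root: str) -> int:
    """Scan every .py file under root; return 1 if any call is found."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for line in find_get_event_loop_calls(source):
            failures.append(f"{path}:{line}")
    for location in failures:
        print(f"{location}: use asyncio.get_running_loop(), "
              "asyncio.to_thread() or asyncio.run() instead")
    return 1 if failures else 0
```

A CI job could call `sys.exit(check_tree("src/praisonai/praisonai"))`. Note that this only matches the literal `asyncio.get_event_loop()` spelling; aliased imports such as `from asyncio import get_event_loop` would need an extra check.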
2. AsyncAgentScheduler is a degraded, drift-prone fork of AgentScheduler: no timeout, no budget cap, no YAML / recipe constructors
Where it is
Two near-parallel implementations:
src/praisonai/praisonai/scheduler/agent_scheduler.py (554 lines, full-featured sync scheduler).
src/praisonai/praisonai/scheduler/async_agent_scheduler.py (468 lines, async-native, but the author left the safety-critical features as inline TODOs).
The drift, in the author's own words
Compare to the sync side which actually enforces both:
Why this is an architectural gap
It violates the 3-way feature surface pillar. The promise is "every feature ships with CLI + YAML + Python". A user who picks the async API for a long-running 24/7 scheduler loses the budget ceiling and the per-run timeout — the two features that make scheduled agents safe to leave running. There is no warning at the API surface; the parameters simply do not exist on the async class.
It violates DRY. ~1000 lines of scheduler logic split across two files where ~70% of the bodies (schedule parsing, retry policy, stats counters, daemon state, callbacks, success/failure dispatch) are identical. Every future bugfix has to be applied twice and tends to diverge, which is exactly what already happened to timeout and max_cost.
It violates safe by default. An async user who follows the documented example (AsyncAgentScheduler(agent, task="…"), then await scheduler.start("hourly")) gets:
no per-execution timeout: a hung LLM call wedges the schedule indefinitely;
no budget cap: runaway cost is silent;
no daemon state file updates: praisonai scheduler list/status doesn't reflect async daemons correctly.
How to resolve
Lift the shared behaviour into a common base, then make the sync and async classes thin specializations of the I/O strategy only. Sketch:
Then AgentScheduler provides the threading + concurrent.futures timeout, and AsyncAgentScheduler provides the asyncio.wait_for timeout, both calling into the shared _SchedulerCore for accounting:
This:
3. jobs/server.py initialises shared JobStore / JobExecutor via unlocked global singletons: race on cold start
Where it is
get_store() and get_executor() are both called from the FastAPI lifespan and from request handlers (server.py:121, server.py:130-131):
Why this is an architectural gap
It contains a textbook TOCTOU race. Two concurrent requests on a cold worker can both pass if _store is None: (or if _executor is None:) before either assignment lands, then each constructs its own instance. The "winner" overwrites the loser, but the loser's JobExecutor has already started its own background asyncio tasks via await executor.start() on the lifespan path; those tasks are now orphaned and writing to a detached InMemoryJobStore that no router holds. Submitted jobs land in the orphaned store and appear lost.
The module's own comment admits the gap. Line 21: # Global instances (for single-process deployment). This directly contradicts the philosophy ("no global singletons", "multi-agent + async safe by default") and forecloses any multi-worker (uvicorn --workers N) deployment because each worker silently creates its own in-process store and jobs become routable to only one worker.
The codebase already has the right pattern next door. src/praisonai/praisonai/cli_backends/registry.py does this correctly with a lock and double-checked locking:
The jobs server simply hasn't adopted the same convention.
How to resolve
Two complementary changes — neither is invasive.
a. Make the lazy init atomic (eliminates the race today):
b. Prefer dependency injection through app.state (eliminates the global, makes multi-tenant deployment possible). The factory create_app() already accepts store / executor parameters; stash them on app.state and have the router read from there instead of from module-level globals:
After this, the _store / _executor module globals can be deleted entirely, satisfying the no global singletons pillar and unblocking horizontal scale.
Suggested ordering
get_event_loop() → asyncio.to_thread / asyncio.run), guardable in CI.
All three are local to src/praisonai/praisonai/ and require no changes in praisonaiagents (the core SDK), so they fit cleanly under the "heavy code lives in the wrapper" layering rule.