
Scheduler timeout and no new scheduled jobs with big flow_run_state/task_run_state tables #15045

Open
@kgorszczyk


First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary
About two months ago, I successfully completed the migration to Prefect 2. Initially, all flow runs were logged very verbosely to ensure thorough error analysis. Afterward, all flows ran daily with mostly normal logging.

Two days ago, I noticed that after a new deployment, scheduled jobs started disappearing one after the other and were not being rescheduled despite an active schedule.

In the logs of the Prefect server container, I found the following traceback:

08:06:22.202 | ERROR | prefect.server.services.scheduler - Unexpected error in: TimeoutError()
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/prefect/server/services/loop_service.py", line 79, in start
await self.run_once()
File "/usr/local/lib/python3.10/site-packages/prefect/server/database/dependencies.py", line 125, in async_wrapper
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/prefect/server/services/scheduler.py", line 103, in run_once
inserted_runs = await self._insert_scheduled_flow_runs(
File "/usr/local/lib/python3.10/site-packages/prefect/server/database/dependencies.py", line 125, in async_wrapper
return await fn(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/prefect/server/services/scheduler.py", line 290, in _insert_scheduled_flow_runs
return await models.deployments._insert_scheduled_flow_runs(
File "/usr/local/lib/python3.10/site-packages/prefect/server/database/dependencies.py", line 168, in async_wrapper
return await func(db, *args, **kwargs) # type: ignore
File "/usr/local/lib/python3.10/site-packages/prefect/server/models/deployments.py", line 687, in _insert_scheduled_flow_runs
await session.execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/ext/asyncio/session.py", line 461, in execute
result = await greenlet_spawn(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 203, in greenlet_spawn
result = context.switch(value)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2306, in execute
return self._execute_internal(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 2191, in _execute_internal
result: Result[Any] = compile_state_cls.orm_execute_statement(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/bulk_persistence.py", line 1262, in orm_execute_statement
result = _bulk_insert(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/bulk_persistence.py", line 197, in _bulk_insert
result = persistence._emit_insert_statements(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1048, in _emit_insert_statements
result = connection.execute(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1422, in execute
return meth(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/sql/elements.py", line 514, in _execute_on_connection
return connection._execute_clauseelement(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1644, in _execute_clauseelement
ret = self._execute_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1850, in _execute_context
return self._exec_single_context(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1990, in _exec_single_context
self._handle_dbapi_exception(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 2360, in _handle_dbapi_exception
raise exc_info[1].with_traceback(exc_info[2])
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 1940, in _exec_single_context
self.dialect.do_executemany(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 916, in do_executemany
cursor.executemany(statement, parameters)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 577, in executemany
return self._adapt_connection.await_(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 131, in await_only
return current.driver.switch(awaitable) # type: ignore[no-any-return]
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/_concurrency_py3k.py", line 196, in greenlet_spawn
value = await result
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 569, in _executemany
self._handle_exception(error)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 501, in _handle_exception
self._adapt_connection._handle_exception(error)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 791, in _handle_exception
raise error
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/asyncpg.py", line 565, in _executemany
return await self._connection.executemany(
File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 391, in executemany
return await self._executemany(command, args, timeout)
File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 1911, in _executemany
result, _ = await self._do_execute(query, executor, timeout)
File "/usr/local/lib/python3.10/site-packages/asyncpg/connection.py", line 1948, in _do_execute
result = await executor(stmt, timeout)
File "asyncpg/protocol/protocol.pyx", line 267, in bind_execute_many
asyncio.exceptions.TimeoutError

After reviewing the Postgres database, I discovered that the "task_run_state" and "flow_run_state" tables had grown to over 40GB in size.

As a test, I truncated both tables; afterward, the scheduler was able to schedule new runs again and ran without any timeouts.
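For anyone hitting the same symptom, the table bloat can be confirmed before resorting to truncation. A minimal diagnostic sketch using psql (table names are the ones from this report; the connection parameters are placeholders you would adjust for your own Prefect database):

```shell
# Report the on-disk size of the two state tables, including indexes and TOAST.
# Host, user, and database name below are placeholders.
psql -h localhost -U prefect -d prefect -c "
  SELECT relname,
         pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_catalog.pg_statio_user_tables
  WHERE relname IN ('task_run_state', 'flow_run_state')
  ORDER BY pg_total_relation_size(relid) DESC;"
```

Truncating the tables discards all historical state records, so exporting or archiving anything you still need first is advisable.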

I suspect that, given the fixed scheduler loop interval (5 seconds), Postgres on slower or heavily loaded systems cannot return results within the allotted time, so asyncio raises a TimeoutError and no new runs get scheduled.
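If that hypothesis is right, and assuming the TimeoutError comes from Prefect's own per-query database timeout rather than from asyncpg directly, raising the relevant server settings might buy headroom until the tables are pruned. A sketch using the Prefect 2.x environment settings (the specific values here are guesses, not recommendations):

```shell
# Possible mitigation: give the server's database queries more time and run
# the scheduler loop less often. Both settings exist in Prefect 2.x; the
# values 30 and 120 are illustrative.
export PREFECT_API_DATABASE_TIMEOUT=30                   # per-query timeout in seconds
export PREFECT_API_SERVICES_SCHEDULER_LOOP_SECONDS=120   # scheduler loop interval
prefect server start
```

This only masks the underlying growth of the state tables; without pruning, the queries will eventually exceed any timeout.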

Version info (output of `prefect version`)

Version:             2.20.2
API version:         0.8.4
Python version:      3.10.14
Git commit:          51c3f290
Built:               Wed, Aug 14, 2024 11:27 AM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         server

Additional context

Interestingly, the runtime of my MAIN_EXECUTION flow was reduced from nearly 3 hours to 1 hour after truncating the tables.

Metadata
Labels

bug (Something isn't working), performance (Related to an optimization or performance improvement)
