Skip to content

Conversation

@sneako
Copy link
Member

@sneako sneako commented Oct 31, 2025

Note

Converts erlcass into an embeddable supervisor with exponential-backoff connection management and safer NIF prepare behavior, updates config to persistent_term, improves reconnection and ETS handling, and adjusts tests/benchmarks.

  • Runtime/OTP structure:
    • Replace application module with supervisor erlcass_sup (embeddable via child_spec/1), add stop/0 and release_cluster/0.
    • Introduce erlcass_connection_manager for cluster setup, monitoring, and exponential-backoff reconnect; prepares configured statements at startup.
    • src/erlcass.erl: add reconnect scheduling (retry_ref, reconnect_delay), handle not-connected replies, and prepare cached statements on (re)connect.
  • NIF/C++ safety:
    • Track session connection state in enif_cass_session; set on connect/close callbacks.
    • Block cass_session_prepare when not connected, returning "no Cassandra connections available"; add extra allocation/error checks.
    • Add constant kNoConnectionsAvailableMsg.
  • Configuration:
    • New persistent_term-based config in erlcass_utils (set_config/1, get_config/0, clear_config/0), supports maps/proplists; default reconnect_delay (?RECONNECT_DELAY).
  • Caches/ETS:
    • Make erlcass_stm_cache create idempotent; add erlcass_stm_sessions:reset/0 and idempotent create.
  • Tests/benchmarks:
    • Common Test uses erlcass_sup:start_link/1 and stop/0; add no_connection_prepare_SUITE asserting prepare fails without connections.
    • Benchmarks start via supervisor and pre-prepare statements when using erlcass.
  • Misc:
    • erlcass.app.src: set {type, library}; add ?RECONNECT_DELAY in headers.
    • Add AGENTS.md and extend .gitignore for CT artifacts.

Written by Cursor Bugbot for commit cd1afec. This will update automatically on new commits. Configure here.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 19 files

Prompt for AI agents (all 4 issues)

Understand the root cause of the following 4 issues and fix them.


<file name="c_src/nif_cass_session.cc">

<violation number="1" location="c_src/nif_cass_session.cc:90">
The async session callbacks now dereference cb-&gt;session-&gt;connected, but the callback only keeps a raw enif_cass_session* without enif_keep_resource. If Erlang GC drops the session term before the future completes, nif_cass_session_free can free that memory, so this write touches freed memory and can segfault.</violation>
</file>

<file name="src/erlcass.app.src">

<violation number="1" location="src/erlcass.app.src:11">
Replacing the application callback with `{type, library}` removes the `erlcass` application module. OTP will now treat `erlcass` as a pure library, so `application:start(erlcass)` fails with `{error,{bad_type,library}}` and the supervisor in `erlcass_sup` never starts. Please keep the `{mod, {erlcass_app, []}}` entry so the application can boot.</violation>
</file>

<file name="benchmarks/load_test.erl">

<violation number="1" location="benchmarks/load_test.erl:87">
Stopping and restarting the supervisor here clears the newly prepared `execute_query` statement, yet the code never re-prepares it before `async_execute/2` runs, so the benchmark will crash once the driver restarts.</violation>
</file>

<file name="src/erlcass.erl">

<violation number="1" location="src/erlcass.erl:522">
When `session_prepare_cached_statements/1` fails part-way, we close the new session but leave any already-inserted prepared statement entries in ETS pointing at that closed session. Please clear `erlcass_stm_sessions` (e.g., call `erlcass_stm_sessions:reset()` before returning) in both error branches so callers cannot bind/execute against stale session pointers.</violation>
</file>

React with 👍 or 👎 to teach cubic. Mention @cubic-dev-ai to give feedback, ask questions, or re-run the review.

{
result = cass_future_error_to_nif_term(cb->env, future);
if(cb->session != NULL)
cb->session->connected = false;
Copy link

@cubic-dev-ai cubic-dev-ai bot Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The async session callbacks now dereference cb->session->connected, but the callback only keeps a raw enif_cass_session* without enif_keep_resource. If Erlang GC drops the session term before the future completes, nif_cass_session_free can free that memory, so this write touches freed memory and can segfault.

Prompt for AI agents
Address the following comment on c_src/nif_cass_session.cc at line 90:

<comment>The async session callbacks now dereference cb-&gt;session-&gt;connected, but the callback only keeps a raw enif_cass_session* without enif_keep_resource. If Erlang GC drops the session term before the future completes, nif_cass_session_free can free that memory, so this write touches freed memory and can segfault.</comment>

<file context>
@@ -80,9 +84,17 @@ void on_session_connect(CassFuture* future, void* user_data)
+    {
         result = cass_future_error_to_nif_term(cb-&gt;env, future);
+        if(cb-&gt;session != NULL)
+            cb-&gt;session-&gt;connected = false;
+    }
     else
</file context>
Fix with Cubic

stdlib
]},
{mod, {erlcass_app, []}},
{type, library},
Copy link

@cubic-dev-ai cubic-dev-ai bot Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing the application callback with {type, library} removes the erlcass application module. OTP will now treat erlcass as a pure library, so application:start(erlcass) fails with {error,{bad_type,library}} and the supervisor in erlcass_sup never starts. Please keep the {mod, {erlcass_app, []}} entry so the application can boot.

Prompt for AI agents
Address the following comment on src/erlcass.app.src at line 11:

<comment>Replacing the application callback with `{type, library}` removes the `erlcass` application module. OTP will now treat `erlcass` as a pure library, so `application:start(erlcass)` fails with `{error,{bad_type,library}}` and the supervisor in `erlcass_sup` never starts. Please keep the `{mod, {erlcass_app, []}}` entry so the application can boot.</comment>

<file context>
@@ -8,7 +8,7 @@
         stdlib
     ]},
-    {mod, {erlcass_app, []}},
+    {type, library},
     {env, []},
 
</file context>
Suggested change
{type, library},
{mod, {erlcass_app, []}},
Fix with Cubic

eprof:start_profiling([self()]),
ok = application:set_env(erlcass, keyspace, ?KEYSPACE),
start(),
ok = erlcass_sup:stop(),
Copy link

@cubic-dev-ai cubic-dev-ai bot Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stopping and restarting the supervisor here clears the newly prepared execute_query statement, yet the code never re-prepares it before async_execute/2 runs, so the benchmark will crash once the driver restarts.

Prompt for AI agents
Address the following comment on benchmarks/load_test.erl at line 87:

<comment>Stopping and restarting the supervisor here clears the newly prepared `execute_query` statement, yet the code never re-prepares it before `async_execute/2` runs, so the benchmark will crash once the driver restarts.</comment>

<file context>
@@ -81,8 +84,8 @@ do_profiling(NrProc, RequestsNr, RepeatNumber) -&gt;
     eprof:start_profiling([self()]),
-    ok = application:set_env(erlcass, keyspace, ?KEYSPACE),
-    start(),
+    ok = erlcass_sup:stop(),
+    ok = start(#{keyspace =&gt; ?KEYSPACE}),
 
</file context>
Fix with Cubic

ok ->
{ok, SessionRef};
{error, _} = Error ->
_ = do_close(SessionRef, self(), ?CONNECT_TIMEOUT),
Copy link

@cubic-dev-ai cubic-dev-ai bot Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When session_prepare_cached_statements/1 fails part-way, we close the new session but leave any already-inserted prepared statement entries in ETS pointing at that closed session. Please clear erlcass_stm_sessions (e.g., call erlcass_stm_sessions:reset() before returning) in both error branches so callers cannot bind/execute against stale session pointers.

Prompt for AI agents
Address the following comment on src/erlcass.erl at line 522:

<comment>When `session_prepare_cached_statements/1` fails part-way, we close the new session but leave any already-inserted prepared statement entries in ETS pointing at that closed session. Please clear `erlcass_stm_sessions` (e.g., call `erlcass_stm_sessions:reset()` before returning) in both error branches so callers cannot bind/execute against stale session pointers.</comment>

<file context>
@@ -479,6 +512,45 @@ session_prepare_cached_statements(SessionRef) -&gt;
+                ok -&gt;
+                    {ok, SessionRef};
+                {error, _} = Error -&gt;
+                    _ = do_close(SessionRef, self(), ?CONNECT_TIMEOUT),
+                    Error;
+                Other -&gt;
</file context>
Fix with Cubic

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 3 files

@sneako
Copy link
Member Author

sneako commented Nov 3, 2025

@codex review this PR

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +162 to +166
schedule_retry(State0, Reason) ->
State1 = cancel_timer(clear_child(State0)),
ok = erlcass_sup:release_cluster(),
Delay = maps:get(backoff, State1),
Next = next_backoff(Delay, maps:get(max_backoff, State1)),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Avoid releasing cluster while erlcass session keeps retrying

When a connection attempt fails, schedule_retry/2 releases the global Cassandra cluster but leaves the existing erlcass gen_server running and reconnecting on its own timer. The reconnect path (handle_info({'timeout',…, reconnect}, State)attempt_connect/0) uses cass_session_connect/2 which expects cassandra_data->cluster to be non‑NULL. After release_cluster/0 that pointer is freed and set to NULL, so the next retry from the still‑running session will call into the NIF with a freed cluster pointer, leading to a VM crash when Cassandra is down at startup. Either stop the erlcass child before calling release_cluster/0 or pause its reconnect timer until a new cluster is created.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants