Skip to content

fix: add transaction support for dgraph migration#3186

Open
krisp619 wants to merge 3 commits intogofr-dev:developmentfrom
krisp619:fix-dgraph-transaction
Open

fix: add transaction support for dgraph migration#3186
krisp619 wants to merge 3 commits intogofr-dev:developmentfrom
krisp619:fix-dgraph-transaction

Conversation

@krisp619
Copy link

Fixes #2892

Summary:
This change adds transaction support to Dgraph migration commit flow to ensure consistency during migration execution.

Changes:

  • Wrapped mutation logic inside Dgraph transaction
  • Added txn interface with Mutate / Commit / Discard
  • Used NewTxn + defer Discard pattern
  • Commit on success, discard on error
  • Updated migration tests

Copilot AI review requested due to automatic review settings March 21, 2026 18:44
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds transactional semantics to the Dgraph migration “commit” step so the migration metadata insert is performed via an explicit Dgraph transaction (mutate + commit + discard), improving consistency of recording applied migrations.

Changes:

  • Introduced a small Dgraph transaction interface (Mutate/Commit/Discard) and guarded type assertion for NewTxn().
  • Updated Dgraph migration commit flow to use NewTxn() with defer Discard() and explicit Commit() on success.
  • Updated Dgraph migration tests to validate commit/discard behavior across success and failure cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
pkg/gofr/migration/dgraph.go Wraps the migration record insert in a Dgraph transaction (NewTxnMutateCommit, with deferred Discard).
pkg/gofr/migration/dgraph_test.go Reworks commit tests to use a mock transaction and assert commit/discard behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +177 to +180
tx, ok := c.DGraph.NewTxn().(dgraphTxn)
if !ok {
return errInvalidDgraphTxn
}
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commitMigration has a new failure path when c.DGraph.NewTxn() doesn't implement the expected transaction interface, but the tests only cover mutate/commit errors. Add a test case where NewTxn() returns a non-transaction object (or nil) and assert errInvalidDgraphTxn is returned to prevent regressions in this guard logic.

Copilot uses AI. Check for mistakes.
Copy link
Member

@coolwednesday coolwednesday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up, @krisp619. The transaction wrapping pattern is correct and the test structure is solid. However, there are a few issues that need to be addressed before merge.


Blocking: Loss of observability on mutation

File: pkg/gofr/migration/dgraph.go:184

The original code called c.DGraph.Mutate() which goes through Client.Mutate() (dgraph.go:205-237). That method:

  • Creates an OpenTelemetry span via addTrace(ctx, "mutate")
  • Logs mutation failures via d.logger.Error
  • Records histogram metrics (dgraph_mutate_duration) via sendOperationStats
  • Pretty-prints the query log

The new tx.Mutate() calls txnImpl.Mutate() directly — a raw pass-through to the dgo client with no tracing, no logging, no metrics. Migration mutations become invisible to the observability stack.

Options:

  • (a) Add tracing/logging/metrics to the transaction-based mutation path to match existing behaviour
  • (b) At minimum, log mutation duration and errors at the migration layer
  • (c) Document this as a known trade-off if observability on migration mutations is intentionally deprioritized

Discard error silently dropped

File: pkg/gofr/migration/dgraph.go:182

defer tx.Discard(ctx)

Discard returns an error that is silently dropped. While this matches the idiomatic dgo pattern, the framework logs errors from other datastores. Consider:

defer func() {
    if dErr := tx.Discard(ctx); dErr != nil {
        c.Errorf("dgraph: transaction discard failed: %v", dErr)
    }
}()

Missing test coverage for edge cases

File: pkg/gofr/migration/dgraph_test.go

The following edge cases are not covered:

# Edge Case Why It Matters
1 NewTxn() returns a type that doesn't implement dgraphTxn errInvalidDgraphTxn path is never exercised. Test with: mockDGraph.EXPECT().NewTxn().Return("not-a-txn")
2 NewTxn() returns nil nil type assertion should yield ok=false, verify it returns errInvalidDgraphTxn and doesn't panic. Test with: mockDGraph.EXPECT().NewTxn().Return(nil)
3 Discard returns an error Verify no panic and the commit flow still returns the correct error (mutate or commit error, not discard error)
4 Discard called after successful Commit Verify dgo no-op behaviour — Discard after Commit should not corrupt data
5 json.Marshal fails before NewTxn() Verify no transaction is leaked (txn never created). Pre-existing path but good to confirm.

Minor style nits

  1. var ( errInvalidDgraphTxn = ... ) — Single var doesn't need parentheses. Use: var errInvalidDgraphTxn = errors.New("...")

  2. commitDone field in mockDgraphTxn — This is set to true even when commit returns an error. The name implies success. Consider renaming to commitCalled for clarity.

  3. Error message"invalid dgraph transaction type" gives no debugging info. Consider:

    return fmt.Errorf("dgraph: NewTxn() returned %T, expected dgraphTxn", c.DGraph.NewTxn())

Please add screenshots/logs showing

The PR description should include evidence of correct transaction behaviour. Specifically:

1. Successful migration run

  • Start a GoFr application with Dgraph configured and a pending migration
  • Show logs confirming: transaction created → mutate succeeded → commit succeeded
  • Query Dgraph to verify the migration record exists:
    {
      migrations(func: type(Migration), orderdesc: migrations.version, first: 1) {
        migrations.version
        migrations.method
        migrations.start_time
        migrations.duration
      }
    }
  • Screenshot of the query result showing the migration was recorded

2. Failed mutation (rollback verification)

  • Simulate a mutation failure (e.g., malformed JSON payload, network error, or Dgraph down)
  • Show logs confirming: transaction created → mutate failed → discard called → migration rolled back
  • Query Dgraph to verify no partial migration record was committed

3. Failed commit (post-mutate failure)

  • Simulate a commit failure (e.g., transaction conflict / network error after mutate)
  • Show logs confirming: transaction created → mutate succeeded → commit failed → discard called
  • Query Dgraph to verify the mutation was not persisted (transaction was discarded)

4. Idempotency check

  • Run the same migration twice
  • Show that the second run detects the migration is already applied (via getLastMigration query) and skips it
  • This confirms the commit path writes the correct version number

5. Before/after comparison (optional but helpful)

  • Run a migration on the development branch (without this PR) and show Dgraph server logs
  • Run the same migration on this branch and show the difference — explicit BeginTxnCommit sequence visible in logs

Edge cases to test against (manual or integration)

Beyond the unit tests listed above, these should be verified against a real Dgraph instance:

# Scenario What to check
1 Dgraph server goes down between Mutate and Commit Error is returned, no partial data, app doesn't panic
2 Context cancelled during Mutate Transaction is discarded, migration is marked as failed
3 Large migration payload (many predicates/fields) Mutation + commit complete within timeout, no OOM
4 Multiple GoFr instances running migrations concurrently Transaction conflicts are handled gracefully (dgo returns ErrAborted)
5 Dgraph returns ErrAborted (transaction conflict) Error is propagated, migration can be retried
6 Schema migration (ApplySchema) succeeds but metadata commit fails Schema change is applied but migration is not recorded — next run will retry. Document this as expected behaviour.

Summary

The core transaction logic is correct — NewTxn → Mutate → Commit with defer Discard is the right pattern. The blocking issue is the loss of tracing/logging/metrics on the mutation path. Once that's addressed and the missing test cases are added, this is good to merge.

@Umang01-hash
Copy link
Member

Hey @krisp619 !
In order to move forward with this PR please resolve the review comments given by @coolwednesday. If you need any help feel free to ask here!

@krisp619
Copy link
Author

@Umang01-hash @coolwednesday Thanks for the detailed review.

I’ve been going through the suggested changes and verifying the current implementation,
but I still need some time to properly address the observability and additional test cases mentioned.

I’ll continue working on this and push the updates soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Transaction in dGraph Migration

4 participants