Sometimes failed cluster causes sql driver panic #41

Open
@axllent

Description

Hi @otoolep! I've been doing some cluster testing on my local network using three computers running rqlited and a single instance of Mailpit. While ingesting a constant stream of incoming emails via Mailpit (thus writing as fast as rqlite can handle), I have been randomly dropping one of the three rqlite nodes to test fault tolerance, recovery, etc. rqlited seems to handle this very elegantly, but every now and then I get a panic from the gorqlite sql driver (panic: runtime error: index out of range [0] with length 0):

ERRO[2024/04/14 16:04:21] [db] error storing message: tried all peers unsuccessfully. here are the results:
   peer #0: http://192.168.7.2:4001/db/execute?timings&level=weak&transaction failed due to Post "http://192.168.7.2:4001/db/execute?timings&level=weak&transaction": dial tcp 192.168.7.2:4001: connect: connection refused
   peer #1: http://192.168.7.50:4001/db/execute?timings&level=weak&transaction failed, got: 503 Service Unavailable, message: leader not found

   peer #2: http://192.168.7.1:4001/db/execute?timings&level=weak&transaction failed, got: 503 Service Unavailable, message: leader not found
 
panic: runtime error: index out of range [0] with length 0

goroutine 376180 [running]:
github.com/rqlite/gorqlite.(*Connection).WriteOneParameterizedContext(...)
        /home/ralph/.cache/go/pkg/mod/github.com/rqlite/[email protected]/write.go:79
github.com/rqlite/gorqlite/stdlib.(*Stmt).ExecContext(0xc008ce1410, {0x108d1b8, 0x1745480}, {0xc000e79860, 0xa, 0x1083460?})
        /home/ralph/.cache/go/pkg/mod/github.com/rqlite/[email protected]/stdlib/sql.go:96 +0x3f8
database/sql.ctxDriverStmtExec({0x108d1b8, 0x1745480}, {0x108d730, 0xc008ce1410}, {0xc000e79860, 0xa, 0x108cb80?})
        /usr/lib/go-1.22/src/database/sql/ctxutil.go:65 +0xa3
database/sql.resultFromStatement({0x108d1b8, 0x1745480}, {0x108cb80, 0xc00011c518}, 0xc000cf5438, {0xc000cf58b0, 0xa, 0xa})
        /usr/lib/go-1.22/src/database/sql/sql.go:2670 +0x13a
database/sql.(*DB).execDC(0x0?, {0x108d1b8, 0x1745480}, 0xc0001cc750, 0x10?, {0xc0043c1ae0, 0x99}, {0xc000cf58b0, 0xa, 0xa})
        /usr/lib/go-1.22/src/database/sql/sql.go:1722 +0x42c
database/sql.(*Tx).ExecContext(0xc007d56a80, {0x108d1b8, 0x1745480}, {0xc0043c1ae0, 0x99}, {0xc000cf58b0, 0xa, 0xa})
        /usr/lib/go-1.22/src/database/sql/sql.go:2506 +0xad
database/sql.(*Tx).Exec(...)
        /usr/lib/go-1.22/src/database/sql/sql.go:2515
...

exit status 2

I've tried to find the cause of the panic, and I can clearly see where it is happening, just not why. WriteOne(), WriteOneContext(), WriteOneParameterized() and WriteOneParameterizedContext() are all potentially affected, as they all return wra[0], err. If wra[0] does not exist (i.e. wra is an empty slice), you get a panic. I just can't work out what is causing wra to be empty, as your error handling appears to append the error as a result. The only thing I can think of is if the final:

 else {
	return results, nil
}

... is reached while results is still empty. Hopefully this means more to you than it does to me?

Ensuring that wra[0] exists before trying to return it is obviously the safest solution (though maybe not the most elegant), and I do not know what the consequences of that are "down the food chain" (you're far more familiar with the code than I am).
