tcp/tcp.cpp throws fatal exceptions inappropriately #241

@KenBirman

Derecho and Cascade are intended for settings where people want high availability and where failures of various kinds are inevitable. So, a working Derecho or Cascade server should handle fault detections gracefully and not crash just because a client malfunctioned.

I'm noticing that tcp.cpp is full of throws that nothing catches, which will cause the SERVER to crash if a client it was talking to crashes. In fact, I was able to trigger a case in which a Cascade client test program hung (some unrelated issue), and when I killed it, all four Cascade servers terminated with:

terminate called after throwing an instance of 'tcp::incomplete_read_error'

Read EOF prematurely

Aborted

It seems clear to me that this is an overreaction to a faulty client! In general, Derecho should never throw uncaught exceptions at all, except for the "possible minority partition" one or some truly fatal startup issue. But once running, the system should ride out anything it encounters.

Probably we have other uncaught throws, but the ones worrying me right now are the half dozen in tcp/tcp.cpp. Could we replace these with logged error messages, and either catch each exception at every place it can arise or not throw at all? If a client botches its initialization, the connection to that client should be broken -- nothing more!
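To make the request concrete, here is a rough sketch of the behavior I have in mind. It is not the actual tcp.cpp code: the `sketch` namespace, `client_connection`, `read_exact`, `serve_one_request`, and `serve_all` are all hypothetical names, and `sketch::incomplete_read_error` is only a stand-in for `tcp::incomplete_read_error`, whose real definition I am not reproducing here. The point is just that the exception is caught at the per-client boundary, logged, and turned into a closed connection.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for tcp::incomplete_read_error (assumption: the
// real class may have a different base type and constructor).
struct incomplete_read_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical client connection, backed by an in-memory buffer instead
// of a real socket so the sketch is self-contained.
struct client_connection {
    int id;
    std::string pending;  // bytes the peer sent before it went away
    bool open = true;
    std::size_t bytes_handled = 0;

    client_connection(int id_, std::string data)
            : id(id_), pending(std::move(data)) {}

    // Returns exactly n bytes, or throws if the peer stopped sending early.
    std::string read_exact(std::size_t n) {
        if(pending.size() < n) {
            throw incomplete_read_error("Read EOF prematurely");
        }
        std::string out = pending.substr(0, n);
        pending.erase(0, n);
        return out;
    }
};

// Serves one request. A failed read is logged and closes only this
// connection; the exception never escapes to the caller.
bool serve_one_request(client_connection& conn, std::size_t request_size,
                       std::ostream& log) {
    try {
        std::string request = conn.read_exact(request_size);
        assert(request.size() == request_size);
        conn.bytes_handled += request.size();
        return true;
    } catch(const incomplete_read_error& e) {
        log << "client " << conn.id << " dropped: " << e.what() << "\n";
        conn.open = false;
        return false;
    }
}

// Serves every open connection once and returns how many are still open.
std::size_t serve_all(std::vector<client_connection>& conns,
                      std::size_t request_size, std::ostream& log) {
    std::size_t still_open = 0;
    for(auto& conn : conns) {
        if(conn.open && serve_one_request(conn, request_size, log)) {
            ++still_open;
        }
    }
    return still_open;
}

}  // namespace sketch
```

With two connections, one holding a full 4-byte request and one holding only 2 bytes, `sketch::serve_all(conns, 4, std::cerr)` logs the short read, marks the second connection closed, and keeps serving the first. The server process itself never sees the exception.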
