Skip to content

Potential deadlock related to CosmosDB write conflict retires #176

@chrnola

Description

@chrnola

I have two recent examples of an application that was running @walmartlabs/[email protected] (in other words, inclusive of the patch introduced by #150) which seemingly stopped reading from its input sources and processing messages. This application runs in ParallelismMode.Concurrent and is configured as such:

import { EventSourced } from "@walmartlabs/cookie-cutter-azure";
import { cached } from "@walmartlabs/cookie-cutter-core";
import { RedisStreamSource } from "@walmartlabs/cookie-cutter-redis/dist/RedisStreamSource";

...

Application...
  .input()
    .add(new RedisStreamSource(redisStreamSourceConfig))
  .done()
    .state(cached(
      IdempotentState,
      EventSourced.cosmosState(
        cosmosConfig,
        IdempotentState,
        new IdempotentStateAggregator())))
  .dispatch(messageHandler)
  .output()
    .stored(EventSourced.cosmosSink(cosmosConfig))
  .done()
  .run(runtime);

In both cases, the application logs show sequence number conflict, retrying, and then the service processes a few more messages, but then stops consuming all new messages roughly 30-120 seconds later.

From the application logs:

{"time":"2020-08-21T20:50:14.424Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042988127-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["02bbd437-88c0-4e64-900a-354ddb1582d8"],"domainId":["2b8794839f01cb1e1bbebc73a0e62c42"],"domainSubId":["37"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:50:14.426Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042988364-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["bed0a32e-600d-45cf-a3f5-b1870091152c"],"domainId":["f5c2602fb47d37cc6d8daccb1de09cd8"],"domainSubId":["51"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:50:14.427Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042988534-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["8ae11440-1feb-4ab0-a475-05775435eff9"],"domainId":["59ce1fd5f5c5d22a25c08bba5cfd5ef7"],"domainSubId":["29"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:50:14.427Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042988776-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["12be727f-b501-4e09-9039-30b1aefcebb0"],"domainId":["37c802c39c78853d065720b74ad4b22e"],"domainSubId":["50"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:50:14.428Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042989216-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["23833bbc-a0c1-47fc-aa1e-5d48d4332df4"],"domainId":["596cb9e631d155e735b9a532dde615fa"],"domainSubId":["49"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:50:15.227Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042990686-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["7b19e38e-eb33-4d92-9f5a-6fd3aadbbc24"],"domainId":["117d6fc11884fe25eeebfa86401913f1"],"domainSubId":["35"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:50:55.627Z","level":"WARN","ctx":"my-service.1_0_557","fields":{"msg":["sequence number conflict, retrying"],"key":["4f93787c2b12edbe4ab05f90c50a8b0e"],"newSn":[52],"expectedSn":[51],"actualSn":[52]}}
{"time":"2020-08-21T20:51:03.015Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventY event to Cosmos"],"streamId":["1598042938430-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"domainId":["9c6208d8033d093bf1263b24a7417d7a"],"domainSubId":["46"],"domainOutcomeFlag":[false],"domainScore":[0]}}
{"time":"2020-08-21T20:51:03.017Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042938649-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["d225de3f-e6c9-41da-8b15-091d4d14e51d"],"domainId":["4f93787c2b12edbe4ab05f90c50a8b0e"],"domainSubId":["31"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:03.018Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042938651-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["32de314a-e9b2-4096-a248-e45251199acf"],"domainId":["4f93787c2b12edbe4ab05f90c50a8b0e"],"domainSubId":["31"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:13.474Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042939184-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["2dca514d-a83f-4e7d-8d82-13a19ef4b28a"],"domainId":["f5c2602fb47d37cc6d8daccb1de09cd8"],"domainSubId":["51"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:13.994Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042939369-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["be982b1b-6461-4ac2-8e38-92213cd829ad"],"domainId":["0d5b57c7f890fd995a74a12a7b7bcf32"],"domainSubId":["32"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:22.818Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042939571-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["131749a1-22ec-4c71-b020-e51c37b08c72"],"domainId":["9c6208d8033d093bf1263b24a7417d7a"],"domainSubId":["46"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:22.819Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventY event to Cosmos"],"streamId":["1598042940484-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"domainId":["2b8794839f01cb1e1bbebc73a0e62c42"],"domainSubId":["37"],"domainOutcomeFlag":[false],"domainScore":[0]}}
{"time":"2020-08-21T20:51:22.821Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042941209-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["9662bc19-043b-4432-93ac-ff85fb7489c6"],"domainId":["2b8794839f01cb1e1bbebc73a0e62c42"],"domainSubId":["37"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:27.216Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042941784-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["272e0a1d-76a3-4531-9192-8ed7456a34d3"],"domainId":["59ce1fd5f5c5d22a25c08bba5cfd5ef7"],"domainSubId":["29"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:27.218Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventY event to Cosmos"],"streamId":["1598042942010-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"domainId":["2b8794839f01cb1e1bbebc73a0e62c42"],"domainSubId":["37"],"domainOutcomeFlag":[false],"domainScore":[0]}}
{"time":"2020-08-21T20:51:31.240Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042942624-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["2ccf8355-ee57-41cf-920e-a3f3fe2fb4ab"],"domainId":["2b8794839f01cb1e1bbebc73a0e62c42"],"domainSubId":["37"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:31.920Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042945230-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["4bf31315-d149-429f-a2c6-c344939b9014"],"domainId":["f5c2602fb47d37cc6d8daccb1de09cd8"],"domainSubId":["51"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:32.015Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventY event to Cosmos"],"streamId":["1598042946178-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"domainId":["0d5b57c7f890fd995a74a12a7b7bcf32"],"domainSubId":["32"],"domainOutcomeFlag":[false],"domainScore":[0]}}
{"time":"2020-08-21T20:51:32.018Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042946838-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["872e3b01-56c8-48a4-b953-7d91f54120a6"],"domainId":["0d5b57c7f890fd995a74a12a7b7bcf32"],"domainSubId":["32"],"domainOutcomeFlag":[false]}}
{"time":"2020-08-21T20:51:32.020Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventY event to Cosmos"],"streamId":["1598042947496-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"domainId":["2b8794839f01cb1e1bbebc73a0e62c42"],"domainSubId":["37"],"domainOutcomeFlag":[false],"domainScore":[0]}}
{"time":"2020-08-21T20:51:32.023Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventY event to Cosmos"],"streamId":["1598042948495-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"domainId":["354f91fb549568a9966d6da418e8a002"],"domainSubId":["36"],"domainOutcomeFlag":[false],"domainScore":[0]}}
{"time":"2020-08-21T20:51:32.024Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventY event to Cosmos"],"streamId":["1598042948695-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"domainId":["59ce1fd5f5c5d22a25c08bba5cfd5ef7"],"domainSubId":["29"],"domainOutcomeFlag":[false],"domainScore":[0]}}
{"time":"2020-08-21T20:51:32.027Z","level":"INFO","ctx":"my-service.1_0_557","fields":{"msg":["Buffering DomainEventX event to Cosmos"],"streamId":["1598042950215-0"],"streamName":["my-input-stream"],"consumerId":["my-consumer"],"id":["0b7c9638-9654-461d-90cc-133c42d79515"],"domainId":["117d6fc11884fe25eeebfa86401913f1"],"domainSubId":["35"],"domainOutcomeFlag":[false]}}

After that final log line there is no activity from the app for 3+ days despite plenty of messages being available for the input source to consume. Upon manually restarting the application it continues to consume from the input source as expected.

@cross311 also reported similar behavior on #150.

Metadata

Metadata

Assignees

No one assigned

    Labels

    M-azureThis issue is related to the Azure moduleM-coreThis issue is related to the core moduleP3Priority 3bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions