Fix: Alarm for notification queue overpassing a given threshold (#4113) #4375

Open
Anjali-NEC wants to merge 13 commits into telefonicaid:master from Anjali-NEC:issue4113

Anjali-NEC commented Jun 14, 2023

Contributor

Fix issue #4113

Comment thread src/lib/common/limits.h Outdated
Comment thread CHANGES_NEXT_RELEASE Outdated
Comment thread src/app/contextBroker/contextBroker.cpp Outdated
Comment thread src/app/contextBroker/contextBroker.cpp Outdated
Comment thread src/app/contextBroker/contextBroker.cpp Outdated
Comment thread src/app/contextBroker/contextBroker.cpp Outdated
Comment thread doc/manuals/admin/logs.md
Comment thread src/app/contextBroker/contextBroker.cpp Outdated
fgalan commented Jul 10, 2023

Member

I have provided some extra comments that I hope may help.

In addition note that for a PR to be ready for merging, all the tests should pass (at the present moment, they don't pass).

@Anjali-NEC

Contributor Author

> In addition note that for a PR to be ready for merging, all the tests should pass (at the present moment, they don't pass).

@fgalan All test cases pass except 4113_alarm_for_notification_queue_overpassing_a_given_threshold/alarm_for_notification_queue_overpassing_a_given_threshold.test, but that test case passes in my local environment.

Comment thread CHANGES_NEXT_RELEASE Outdated
Comment thread doc/manuals/admin/logs.md Outdated
Comment thread src/lib/ngsiNotify/QueueNotifier.cpp
Comment thread src/lib/alarmMgr/AlarmManager.cpp Outdated
Comment thread src/lib/alarmMgr/AlarmManager.cpp Outdated
Comment thread src/lib/alarmMgr/AlarmManager.cpp Outdated
Comment thread src/lib/alarmMgr/AlarmManager.h Outdated
fgalan commented Sep 6, 2023

Member

The massive failures in CI are due to the changes made in the docker CI image (see #4417 (comment)). Once PR #4417 is merged, this PR should be updated with master and the tests will pass again.

fgalan commented Sep 6, 2023

Member

> The massive failures in CI are due to the changes made in the docker CI image (see #4417 (comment)). Once PR #4417 is merged, this PR should be updated with master and the tests will pass again.

PR #4417 has been merged. @Anjali-NEC please update this PR's branch with master.

Comment on lines +194 to +202
std::string details = ("notification queue reached maximum threshold");

long unsigned int threshold = queueSize(service)*notifAlarmThreshold/100;

if (threshold >= queueSize(service))
{
alarmMgr.notificationQueue(queueName.c_str(), details.c_str());
}
}

Member

Code can be simplified this way:

Suggested change
- std::string details = ("notification queue reached maximum threshold");
- long unsigned int threshold = queueSize(service)*notifAlarmThreshold/100;
- if (threshold >= queueSize(service))
- {
- alarmMgr.notificationQueue(queueName.c_str(), details.c_str());
- }
- }
+ long unsigned int threshold = queueSize(service)*notifAlarmThreshold/100;
+ if (threshold >= queueSize(service))
+ {
+ alarmMgr.notificationQueue(queueName.c_str(), "notification queue reached maximum threshold");
+ }
+ }

Member

Moreover, the detail message could provide information about the particular threshold for this case. For instance, if we have a queue of size 6 and the threshold is 50%, something like this:

notification queue reached maximum threshold (3)
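
As an illustration of the message suggested above, here is a minimal sketch of how the absolute threshold could be derived from the queue size and the percentage, then appended to the detail text. The helper names `alarmThreshold` and `thresholdDetails` are hypothetical and are not part of this PR or of Orion; the sketch also assumes the percentage is in the 0 to 100 range and that integer division (rounding down) is acceptable.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not in the PR): absolute threshold for a queue with
// maxSize slots and a percentage expected to be in [0, 100].
std::size_t alarmThreshold(std::size_t maxSize, unsigned int percent)
{
  return maxSize * percent / 100;  // integer division rounds down
}

// Hypothetical helper (not in the PR): detail text that includes the
// absolute threshold in parentheses.
std::string thresholdDetails(std::size_t maxSize, unsigned int percent)
{
  return "notification queue reached maximum threshold (" +
         std::to_string(alarmThreshold(maxSize, percent)) + ")";
}
```

With a queue of size 6 and a 50% threshold, `thresholdDetails(6, 50)` yields the text shown in the example above.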

# VALGRIND_READY - to mark the test ready for valgrindTestSuite.sh

--NAME--
alarm for notification queue overpassing a given threshold

Member

Suggested change
- alarm for notification queue overpassing a given threshold
+ alarm for notification queue overpassing a given threshold (relog variant)

Comment on lines +228 to +233
Raising alarm NotificationQueue serv1: notification queue reached maximum threshold
Repeated NotificationQueue serv1: notification queue reached maximum threshold
Raising alarm NotificationQueue serv2: notification queue reached maximum threshold
Repeated NotificationQueue serv2: notification queue reached maximum threshold
Raising alarm NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold

Member

Note this result doesn't follow expectations.

According to:

# 04. Create/update entity in serv1 5 times (update #3 raises alarm, update #4 and #5 cause repeated log)
# 05. Create/update entity in serv2 3 times (update #2 raises alarm, update #3 cause repeated log)
# 06. Create/update entity in serv3 (default) 7 times (update #4 raises alarm, updates #5, #6 and #7 cause repeated log)

We should see 2 repeated logs for serv1, 1 repeated log for serv2 (that is OK in the above output) and 3 repeated logs for the default queue. Something like this:

Raising alarm NotificationQueue serv1: notification queue reached maximum threshold
Repeated NotificationQueue serv1: notification queue reached maximum threshold
Repeated NotificationQueue serv1: notification queue reached maximum threshold
Raising alarm NotificationQueue serv2: notification queue reached maximum threshold
Repeated NotificationQueue serv2: notification queue reached maximum threshold
Raising alarm NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold
Repeated NotificationQueue default: notification queue reached maximum threshold
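
To make the expected counts easier to check, the raise-then-repeat behaviour can be modelled with a small standalone sketch. This is not Orion's `AlarmManager`; the class `RelogAlarmSketch` and the function `simulateUpdates` are hypothetical, and the sketch simply assumes that the first report for a queue logs "Raising alarm" and every later report logs "Repeated".

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an alarm manager in relog mode (not Orion code).
class RelogAlarmSketch
{
public:
  // First report for a queue logs "Raising alarm", later reports log "Repeated".
  void notificationQueue(const std::string& queueName, const std::string& details)
  {
    const bool firstTime = active_.insert(queueName).second;
    const std::string prefix = firstTime ? "Raising alarm" : "Repeated";
    log_.push_back(prefix + " NotificationQueue " + queueName + ": " + details);
  }

  const std::vector<std::string>& log() const { return log_; }

private:
  std::set<std::string>    active_;  // queues with an alarm already raised
  std::vector<std::string> log_;     // emitted log lines, in order
};

// Simulates `updates` updates on one queue, where update number `raiseAt`
// (1-based) is assumed to be the first one that reaches the threshold.
void simulateUpdates(RelogAlarmSketch& alarms, const std::string& queueName,
                     int updates, int raiseAt)
{
  for (int n = 1; n <= updates; ++n)
  {
    if (n >= raiseAt)
    {
      alarms.notificationQueue(queueName, "notification queue reached maximum threshold");
    }
  }
}
```

Running `simulateUpdates` for serv1 (5 updates, raise at #3), serv2 (3 updates, raise at #2) and default (7 updates, raise at #4) produces one raise plus 2, 1 and 3 repeats respectively, which matches the nine lines listed above.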

fgalan commented Sep 7, 2023

Member

In addition to the current two .test files (which are nice :), I'd suggest adding one to check the release of the new alarm. Something like this:

# 01. Subscribe serv1 to the accumulator endpoint that responds in 10 seconds
# 02. Subscribe serv2 to the accumulator endpoint that responds in 10 seconds
# 03. Subscribe serv3 to the accumulator endpoint that responds in 10 seconds
# 04. Create/update entity in serv1 5 times (update #3 raises alarm)
# 05. Create/update entity in serv2 3 times (update #2 raises alarm)
# 06. Create/update entity in serv3 (default) 7 times (update #4 raises alarm)
# 07. Wait 11 seconds
# 08. Grep log for notificationQueue alarm

The endpoint that responds in 10 seconds is /noresponse (can be seen here https://github.com/telefonicaid/fiware-orion/blob/master/scripts/accumulator-server.py#L233)

As a result of step 08, we should see some alarm release messages.
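
The raise and release cycle that such a test would exercise can be sketched as a small state machine. This is only an illustration under stated assumptions: the class `QueueAlarmSketch` is hypothetical, the `>=` comparison against the threshold is an assumption rather than the logic in this PR, and the exact log text for a release is assumed as well.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-queue alarm tracker (illustrative only, not code from the PR).
class QueueAlarmSketch
{
public:
  QueueAlarmSketch(const std::string& name, std::size_t threshold)
    : name_(name), threshold_(threshold), raised_(false) {}

  // Call whenever the number of pending notifications changes.
  void onQueueLength(std::size_t length)
  {
    if (length >= threshold_ && !raised_)
    {
      raised_ = true;   // threshold reached: raise once
      log_.push_back("Raising alarm NotificationQueue " + name_);
    }
    else if (length < threshold_ && raised_)
    {
      raised_ = false;  // queue drained below threshold: release once
      log_.push_back("Releasing alarm NotificationQueue " + name_);
    }
  }

  const std::vector<std::string>& log() const { return log_; }

private:
  std::string              name_;
  std::size_t              threshold_;
  bool                     raised_;
  std::vector<std::string> log_;
};
```

In the proposed test, the 11-second wait plays the role of the queue draining: once the slow endpoint has answered, the pending count drops below the threshold and the release line would be expected in the log.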

fgalan commented Dec 19, 2023

Member

After the merging of PR #4332, this PR needs to be updated with the master branch. Otherwise, the functional tests will fail.
