P1 - Success rate metric only calculated based on NoOrchestrator transcode errors #2674

**Describe the bug**

The success rate Grafana graph [here](https://eu-metrics-monitoring.livepeer.live/grafana/d/ZsZv-B3im/federated-livepeer-overview?orgId=1) shows success rate consistently >= 100% even though we know that there have been transcode failures.

From reviewing the metrics code, it appears that the success rate metric is updated whenever there is a call to `census.sendSuccess()` in `monitor/census.go`.

I see this method being called in at least two places:

1. In [SegmentFullyTranscoded()](https://github.com/livepeer/go-livepeer/blob/34a9d40ff0d8df542c24bc2f8d8b181f246e5f4b/monitor/census.go#L1383) which is called [here](https://github.com/livepeer/go-livepeer/blob/4be57ce2328c9d3a8fdfcfdd8354f5c675f4b361/server/broadcast.go#L1313) after B finishes downloading all results from O
2. In [segmentTranscodeFailed()](https://github.com/livepeer/go-livepeer/blob/34a9d40ff0d8df542c24bc2f8d8b181f246e5f4b/monitor/census.go#L1326), which is called from `SegmentTranscodeFailed()` whenever a transcode error is encountered

For 2, there is a concept of a "permanent" vs. "non-permanent" transcode error, indicated via the `permanent` bool passed to `SegmentTranscodeFailed()`. We can see non-permanent errors being recorded [here](https://github.com/livepeer/go-livepeer/blob/34a9d40ff0d8df542c24bc2f8d8b181f246e5f4b/server/segment_rpc.go#L583). The only place where a permanent error is recorded is [here](https://github.com/livepeer/go-livepeer/blob/4be57ce2328c9d3a8fdfcfdd8354f5c675f4b361/server/broadcast.go#L951), for NoOrchestrator transcode errors. This seems problematic because only permanent errors trigger a call to `census.sendSuccess()` [here](https://github.com/livepeer/go-livepeer/blob/34a9d40ff0d8df542c24bc2f8d8b181f246e5f4b/monitor/census.go#L1323) when recording a transcode error. As a result, I don't think we are properly updating the success rate metric in at least three places:

- When we hit the max # of transcode attempts which prompts B to give up on a segment [here](https://github.com/livepeer/go-livepeer/blob/c6983c5be638be4e20c2ae9c2c640b4ac8acaf06/server/broadcast.go#L952)
- When we hit a non-retryable transcode error which prompts B to give up on a segment [here](https://github.com/livepeer/go-livepeer/blob/c6983c5be638be4e20c2ae9c2c640b4ac8acaf06/server/broadcast.go#L924)
- When the caller (i.e. HTTP push client) gives up on the transcode resulting in a context cancellation [here](https://github.com/livepeer/go-livepeer/blob/c6983c5be638be4e20c2ae9c2c640b4ac8acaf06/server/broadcast.go#L928)
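
To make the effect concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the code in `monitor/census.go`: the `tracker` type, its fields, and `recordOutcome` (which stands in for `census.sendSuccess()`) are hypothetical names, and the rate formula is an assumption chosen only to show why skipped failures keep the graph at 100%.

```go
package main

import "fmt"

// tracker is a hypothetical, simplified stand-in for the census.
type tracker struct {
	recorded  int // segments whose outcome reached the success rate metric
	succeeded int // recorded segments that were fully transcoded
}

// recordOutcome plays the role of census.sendSuccess() in this sketch.
func (t *tracker) recordOutcome(ok bool) {
	t.recorded++
	if ok {
		t.succeeded++
	}
}

// segmentFullyTranscoded always updates the metric.
func (t *tracker) segmentFullyTranscoded() {
	t.recordOutcome(true)
}

// segmentTranscodeFailed only updates the metric for permanent errors,
// which mirrors the behavior described in this issue.
func (t *tracker) segmentTranscodeFailed(permanent bool) {
	if permanent {
		t.recordOutcome(false)
	}
}

// successRate is an assumed formula: succeeded / recorded.
func (t *tracker) successRate() float64 {
	if t.recorded == 0 {
		return 1
	}
	return float64(t.succeeded) / float64(t.recorded)
}

func main() {
	t := &tracker{}
	for i := 0; i < 8; i++ {
		t.segmentFullyTranscoded()
	}
	// B gives up on two segments, but the errors are flagged non-permanent.
	t.segmentTranscodeFailed(false)
	t.segmentTranscodeFailed(false)
	fmt.Printf("%.2f\n", t.successRate()) // prints 1.00

	// Only a permanent (NoOrchestrator-style) error moves the metric.
	t.segmentTranscodeFailed(true)
	fmt.Printf("%.2f\n", t.successRate()) // prints 0.89
}
```

In this model the two abandoned segments never reach the metric, so the rate stays at 100% until a permanent error is reported.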

**To Reproduce**

We could trigger the errors listed above, which are not factored in right now, and confirm that the success rate is currently unaffected. A solution should demonstrate that these errors cause the success rate to drop.

**Expected behavior**

I expect the success rate metric to properly factor in all transcode errors that result in no renditions for a segment that is passed in. 

Generally, I see at least these categories of transcode errors that should cause success rate to drop:

- If B hits the max # of transcode attempts - B should give up on the segment b/c it tried enough times already
- If B hits a non-retryable error - B should give up on the segment b/c it knows that this segment likely just cannot be transcoded
- If B knows that the caller (i.e. HTTP push client) is no longer waiting for a result - B should give up on the segment b/c it knows no one cares about the results anymore because the transcode was too slow
