Arturo Filastò edited this page Apr 17, 2020 · 13 revisions

Creating empty bucket

If for some reason the pipeline is unable to process data from a single daily bucket, you will have to create an empty bucket for the skipped date, for consistency.

This should be avoided to the extent possible, as it leads to the next daily bucket containing roughly double the usual amount of daily measurements.

When the measurement processing tasks get stuck, the pipeline accumulates a backlog of unprocessed measurements. These unprocessed measurements end up in the next bucket after the pipeline is unstuck. To signal explicitly to consumers that a bucket is empty (e.g. because new buckets were not created for a while due to a temporary pipeline stall, or because there was no data for that date), we need to create an "empty bucket".

To create an empty bucket do the following:

$ ssh datacollector.infra.ooni.io
$ cd /data/ooni/private/reports-raw-shals
$ sha256sum </dev/null | sudo -u benchmark dd of=YYYY-MM-DD

This creates a file containing `e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -`, which is the SHA-256 of empty input.
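As a sanity check, the seal for an empty bucket is simply the sha256sum of empty input. The following sketch reproduces and verifies it locally, using a temp directory and a placeholder date instead of the production reports-raw-shals directory:

```shell
#!/bin/sh
# Local sketch: recreate an empty-bucket seal in a temp dir and verify it.
# The directory and date here are placeholders, not the production paths.
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Same pipeline as the runbook command, minus sudo:
sha256sum </dev/null | dd of=2020-01-01 2>/dev/null
expected="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  -"
[ "$(cat 2020-01-01)" = "$expected" ] && echo "seal OK"
```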

From the Airflow UI, clear the state of the failed reports_raw_sensor task for the corresponding date in the hist_canning DAG Tree View (the default Downstream & Recursive options should be fine). You will see two red failed boxes in this view, but it is actually just one task.

The reports-raw-shals file is a "seal" signalling that all the data files from the collectors have been successfully merged into a single bucket directory. It is generated by the docker-trampoline script at the reports_raw_merge step.
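Since each seal is a per-date file, one quick way to spot dates that were never sealed is to scan the directory for missing entries. A self-contained sketch, using a temp directory and fabricated dates rather than the real /data/ooni/private/reports-raw-shals:

```shell
#!/bin/sh
# Sketch: list dates in a range that have no seal file yet.
# Layout (one file per YYYY-MM-DD bucket) is assumed from the runbook.
set -e
seals=$(mktemp -d)
touch "$seals/2020-01-01" "$seals/2020-01-03"   # simulate two sealed buckets
for d in 2020-01-01 2020-01-02 2020-01-03; do
  [ -e "$seals/$d" ] || echo "missing seal: $d"
done
```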

Handling a stuck pipeline due to slow rsync

It has been quite common recently for the pipeline to get stuck during the rsync process. This can be detected because the fetcher task is still running, while the hist_canning workflow is marked as failed.

See: https://github.com/ooni/sysadmin/issues/403

In order to fix it you should:

  1. Connect to datacollector and pkill rsync:
ssh datacollector.infra.ooni.io
datacollector:~$ sudo pkill rsync
  2. Once rsync has been killed, check from the Airflow UI that the task is marked as up-for-retry and wait for it to conclude. Tip: sometimes you need to kill rsync a couple of times before it gets a socket with decent throughput.

  3. Once it has concluded, restart the hist_canning DAG. See the screenshots below:
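The kill step above can be sketched as a small check, assuming pgrep/pkill from procps are available on the host (run with sudo on datacollector):

```shell
#!/bin/sh
# Sketch: kill rsync and confirm no rsync processes remain.
# pkill exits non-zero when nothing matched, so tolerate that case.
pkill rsync 2>/dev/null || true
sleep 1
if pgrep -x rsync >/dev/null; then
  echo "rsync still running; kill it again"
else
  echo "no rsync processes left"
fi
```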

(Screenshots of the Airflow UI, taken 2020-01-14 and 2020-02-20, showing how to clear the failed task and restart the hist_canning DAG.)
