|
7 | 7 | "source": [
|
8 | 8 | "# Custom Aggregations\n",
|
9 | 9 | "\n",
|
10 |
| - "This notebook is motivated by a [post](https://discourse.pangeo.io/t/using-xhistogram-to-bin-measurements-at-particular-stations/2365/4) on the Pangeo discourse forum.\n", |
| 10 | + "This notebook is motivated by a\n", |
| 11 | + "[post](https://discourse.pangeo.io/t/using-xhistogram-to-bin-measurements-at-particular-stations/2365/4)\n", |
| 12 | + "on the Pangeo discourse forum.\n", |
11 | 13 | "\n",
|
12 | 14 | "> Even better would be a command that lets me simply do the following.\n",
|
13 | 15 | ">\n",
|
14 | 16 | "> A = da.groupby(['lon_bins', 'lat_bins']).mode()\n",
|
15 | 17 | "\n",
|
16 |
| - "This notebook will describe how to accomplish this using a custom `Aggregation` since `mode` and `median` aren't supported by flox yet." |
| 18 | + "This notebook will describe how to accomplish this using a custom `Aggregation`\n", |
| 19 | + "since `mode` and `median` aren't supported by flox yet.\n" |
17 | 20 | ]
|
18 | 21 | },
|
19 | 22 | {
|
|
439 | 442 | "source": [
|
440 | 443 | "## A built-in reduction\n",
|
441 | 444 | "\n",
|
442 |
| - "First a simple example of lat-lon binning using a built-in reduction: mean" |
| 445 | + "First a simple example of lat-lon binning using a built-in reduction: mean\n" |
443 | 446 | ]
|
444 | 447 | },
|
445 | 448 | {
|
|
494 | 497 | "source": [
|
495 | 498 | "## Aggregations\n",
|
496 | 499 | "\n",
|
497 |
| - "flox knows how to interpret `func=\"mean\"` because it's been implemented in `aggregations.py` as an [Aggregation](https://flox.readthedocs.io/en/latest/generated/flox.aggregations.Aggregation.html)\n", |
| 500 | + "flox knows how to interpret `func=\"mean\"` because it's been implemented in\n", |
| 501 | + "`aggregations.py` as an\n", |
| 502 | + "[Aggregation](https://flox.readthedocs.io/en/latest/generated/flox.aggregations.Aggregation.html)\n", |
498 | 503 | "\n",
|
499 |
| - "An `Aggregation` is a blueprint for computing an aggregation, with both numpy and dask data." |
| 504 | + "An `Aggregation` is a blueprint for computing an aggregation, with both numpy\n", |
| 505 | + "and dask data.\n" |
500 | 506 | ]
|
501 | 507 | },
|
502 | 508 | {
|
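Reviewer note: the "blueprint" idea can be pictured for a mean as three steps: reduce each block to per-group sums and counts, add those intermediates across blocks, then divide. The sketch below uses plain NumPy only and is not flox's implementation; the helper names (`chunk_step`, `combine_step`, `finalize_step`) are hypothetical, and it assumes integer group codes and no missing values.

```python
import numpy as np


def chunk_step(codes, values, ngroups):
    """Blockwise step: per-group sum and per-group count for one block."""
    sums = np.bincount(codes, weights=values, minlength=ngroups)
    counts = np.bincount(codes, minlength=ngroups)
    return sums, counts


def combine_step(intermediates):
    """Combine step: add the sums together and add the counts together."""
    total_sum = sum(s for s, _ in intermediates)
    total_count = sum(c for _, c in intermediates)
    return total_sum, total_count


def finalize_step(total_sum, total_count):
    """Finalize step: mean = sum / count (NaN for groups with no members)."""
    with np.errstate(invalid="ignore", divide="ignore"):
        return total_sum / total_count


codes = np.array([0, 1, 0, 2, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])

# Pretend the data arrives as two blocks, as it would with a chunked array.
blocks = [(codes[:3], values[:3]), (codes[3:], values[3:])]
intermediates = [chunk_step(c, v, ngroups=3) for c, v in blocks]
result = finalize_step(*combine_step(intermediates))
```

Because sums and counts can be added across blocks in any order, the mean decomposes cleanly this way. A median has no such small intermediate, which is one reason it needs different handling.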
|
545 | 551 | "```python\n",
|
546 | 552 | "mean = Aggregation(\n",
|
547 | 553 | " name=\"mean\",\n",
|
548 |
| - " \n", |
549 |
| - " # strings in the following are built-in grouped reductions \n", |
| 554 | + "\n", |
| 555 | + " # strings in the following are built-in grouped reductions\n", |
550 | 556 | " # implemented by the underlying \"engine\": flox or numpy_groupies or numbagg\n",
|
551 |
| - " \n", |
| 557 | + "\n", |
552 | 558 | " # for pure numpy inputs\n",
|
553 |
| - " numpy=\"mean\", \n", |
554 |
| - " \n", |
| 559 | + " numpy=\"mean\",\n", |
| 560 | + "\n", |
555 | 561 | " # The next are for dask inputs and describe how to reduce\n",
|
556 | 562 | " # the data in parallel\n",
|
557 | 563 | " chunk=(\"sum\", \"nanlen\"), # first compute these blockwise : (grouped_sum, grouped_count)\n",
|
558 | 564 | "    combine=(\"sum\", \"sum\"),  # reduce intermediate results (sum the sums, sum the counts)\n",
|
559 | 565 | " finalize=lambda sum_, count: sum_ / count, # final mean value (divide sum by count)\n",
|
560 |
| - " \n", |
| 566 | + "\n", |
561 | 567 | " fill_value=(0, 0), # fill value for intermediate sums and counts when groups have no members\n",
|
562 | 568 | " dtypes=(None, np.intp), # optional dtypes for intermediates\n",
|
563 | 569 | " final_dtype=np.floating, # final dtype for output\n",
|
|
572 | 578 | "source": [
|
573 | 579 | "## Defining a custom aggregation\n",
|
574 | 580 | "\n",
|
575 |
| - "First we'll need a function that executes the grouped reduction given numpy inputs. \n", |
| 581 | + "First we'll need a function that executes the grouped reduction given numpy\n", |
| 582 | + "inputs.\n", |
| 583 | + "\n", |
| 584 | + "Custom functions are required to have this signature (copied from\n", |
| 585 | + "numpy_groupies):\n", |
576 | 586 | "\n",
|
577 |
| - "Custom functions are required to have this signature (copied from numpy_groupies):\n", |
578 |
| - "``` python\n", |
| 587 | + "```python\n", |
579 | 588 | "\n",
|
580 | 589 | "def custom_grouped_reduction(\n",
|
581 | 590 | " group_idx, array, *, axis=-1, size=None, fill_value=None, dtype=None\n",
|
582 | 591 | "):\n",
|
583 | 592 | " \"\"\"\n",
|
584 | 593 | " Parameters\n",
|
585 | 594 | " ----------\n",
|
586 |
| - " \n", |
| 595 | + "\n", |
587 | 596 | " group_idx : np.ndarray, 1D\n",
|
588 | 597 | " integer codes for group labels (1D)\n",
|
589 | 598 | " array : np.ndarray, nD\n",
|
|
596 | 605 | "        fill_value for when the number of groups in group_idx is less than size\n",
|
597 | 606 | " dtype : optional\n",
|
598 | 607 | " dtype of output\n",
|
599 |
| - " \n", |
| 608 | + "\n", |
600 | 609 | " Returns\n",
|
601 | 610 | " -------\n",
|
602 |
| - " \n", |
| 611 | + "\n", |
603 | 612 | " np.ndarray with array.shape[-1] == size, containing a single value per group\n",
|
604 | 613 | " \"\"\"\n",
|
605 | 614 | " pass\n",
|
606 | 615 | "```\n",
|
607 | 616 | "\n",
|
608 |
| - "\n", |
609 |
| - "Since numpy_groupies does not implement a median, we'll do it ourselves by passing `np.median` to `numpy_groupies.aggregate_numpy.aggregate`. This will loop over all groups, and then execute `np.median` on the group members in serial. It is not fast, but quite convenient.\n" |
| 617 | + "Since numpy_groupies does not implement a median, we'll do it ourselves by\n", |
| 618 | + "passing `np.median` to `numpy_groupies.aggregate_numpy.aggregate`. This will\n", |
| 619 | + "loop over all groups, and then execute `np.median` on the group members in\n", |
| 620 | + "serial. It is not fast, but quite convenient.\n" |
610 | 621 | ]
|
611 | 622 | },
|
612 | 623 | {
|
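Reviewer note: for readers who want to see what "loop over all groups, then execute `np.median` on the group members" amounts to, here is a minimal sketch in plain NumPy that follows the signature quoted in the notebook. It is not the numpy_groupies or flox implementation; the name `grouped_median` is hypothetical, and the sketch assumes a 1D `group_idx` and reduction along the last axis only.

```python
import numpy as np


def grouped_median(
    group_idx, array, *, axis=-1, size=None, fill_value=None, dtype=None
):
    """Median per group along the last axis, computed one group at a time."""
    group_idx = np.asarray(group_idx)
    array = np.asarray(array)
    if axis not in (-1, array.ndim - 1):
        raise NotImplementedError("this sketch only reduces along the last axis")
    if size is None:
        size = int(group_idx.max()) + 1
    if fill_value is None:
        fill_value = np.nan
    out_dtype = np.float64 if dtype is None else dtype

    # Start from fill_value so that groups with no members keep it.
    out = np.full(array.shape[:-1] + (size,), fill_value, dtype=out_dtype)
    for label in range(size):
        members = array[..., group_idx == label]
        if members.shape[-1] > 0:
            out[..., label] = np.median(members, axis=-1)
    return out


group_idx = np.array([0, 0, 1, 1, 1, 3])
data = np.array([1.0, 5.0, 2.0, 9.0, 4.0, 7.0])
medians = grouped_median(group_idx, data, size=4)
```

Group 2 has no members here, so its slot keeps the fill value. The serial Python loop is the reason this approach is convenient rather than fast.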
|
639 | 650 | "id": "b356f4f2-ae22-4f56-89ec-50646136e2eb",
|
640 | 651 | "metadata": {},
|
641 | 652 | "source": [
|
642 |
| - "Now we create the `Aggregation`" |
| 653 | + "Now we create the `Aggregation`\n" |
643 | 654 | ]
|
644 | 655 | },
|
645 | 656 | {
|
|
682 | 693 | "id": "899ece52-ebd4-47b4-8090-cbbb63f504a4",
|
683 | 694 | "metadata": {},
|
684 | 695 | "source": [
|
685 |
| - "And apply it!" |
| 696 | + "And apply it!\n" |
686 | 697 | ]
|
687 | 698 | },
|
688 | 699 | {
|
|