nb4 feedback #7

Thank you, @tarakc02, for contributing [chapter 4](59bd533b43626adab06adc716e0748d6fc516111)! I was able to run the notebook and read through it, and I think it's a great starting point for this discussion.

Here are the notes from my review; feel free to pick and choose:
- Add some version of the MEP-requested reminder, "If you're interested in any of the code snippets used throughout this chapter, you can unfold them by clicking the `CODE` box on the right hand side of the page," near the beginning of the chapter. I think it's fine to put it in the section after 'setup data' or as a lone statement under the title.
- Remove 'setup data' header.
- If the cell with `warnings.filterwarnings('ignore')` is about the warning from the `%load_ext pretty_jupyter` cell, I would move the ignore cell before the load cell. FWIW, I see the same warning in the nb1 file but not in its [live version](https://hrdag.org/tech-notes/what-is-an-mp-event-nb1-20250721.html).
- Under "Missingness: who is represented?", change "in a data" to "in data" or "in a dataset", etc. Suggest adding emphasis to the start of list items, e.g., "**missing rows:** undocumented losses".
- Under "Missing rows: reports that were never created," items 2 and 4 have a misspelled word ("misisng", "reqest").
- I wonder if in the sentence, "At each step, we have the potential for a disappearance to go unreported" we should qualify unreported with "effectively" or "functionally", something that acknowledges that even if previous steps have been made successfully, the report can still become unobserved/lost. However, the subsequent paragraphs elaborate on these steps and I don't think it's lacking without an adverb.
- Should "Chicago Police Hands" be "Chicago Police hands"? In that same paragraph:
    - "Report" is missing from "require that a missing person [report] is made for children in care".
    - There's a typo: "thier".
    - Change "experiences" to "experiencing" or "who experience", etc. in "the overwhelming majority of people experiences abuse are so isolated." Also, "be" is missing from the rest of that sentence: "they never have the opportunity to [be] reported missing."
- RE: "Tk Quote from Susan Frankel from the National Runaway Safeline that says the majority of the youth they correspond with do not include law enforement"
    - I found some similar statements [here](https://www.kgw.com/article/news/local/national-runaway-prevention-month-southwest-washington-vancouver-safeline/283-0d2b34d4-f632-4857-924d-c2ed6b654058) and a [recent report](https://www.nationalrunawaysafeline.org/youth-homeless-prevention-report) about overlap with youth homelessness but couldn't find a specific reference to overlap with LE. I found a few podcasts but couldn't get transcripts to paw through.
- The line, "What that suggests is that this form of missingness is not randomly distributed across the population, but rather it disproportionately affects the visibility of particular groups of marginalized people," is great, and I'd like to elaborate on what this would look like or imply about the dataset or such communities. Maybe something about how datasets like this are typically taken at face value and it's assumed that they represent the entirety of the issue (we touch on that in the beginning of the series). At this point, though, we have to recognize that some rows are actually duplicates of prematurely closed reports and other rows don't exist because the community doesn't trust the police, and in the data that lack of trust would look like a lack of this vulnerability in their community.
- Grabbing this tk line - "This legislation was sponsored through tk invdividuals (two people we spke to on the phone. who connected us to the ISP)."
- Love the document cloud links and examples!
- In the cell where `simplify_sourcenames()` is defined, could use `.reset_index(drop=True)` to drop the 'index' column from the output.
- Could sort the sources in the record counts by source table, e.g., `by_source.sources.value_counts()[['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'iucrs']]`. That makes it a little easier to spot the oddities in the record counts over time. Also, could reset the index after sorting or use `.to_frame()` to get the output to look like the others, but that's nitpicky.
- In the cell with `diff = fifth_rdnos - sixth_rdnos`, suggest transposing the example rows to avoid left-right scroll.
- In the cell with `sources['dummy'] = 1`, could add `.sort_values(ascending=False)`. I would also add a couple sentences about the table and/or upset plot and what they support regarding the record coverage across sources.
- Suggest adding a sample use of `summarize_missings()` and maybe a clearer segue from the upset plot before it. 
- CPD and OEMC [technically](https://chicago.suntimes.com/city-hall/2023/9/6/23861627/chicago-police-emergency-response-times-inspector-general-audit-black-hispanic-disparity) do not regard a missing officer arrival timestamp as the officer not arriving. Witzburg talks about the pattern across priority codes and seems to find it fishy, but we might want to be careful about how we phrase missing arrival data.
- Add a disclaimer that the results of `mp.columns` include many columns we created to support review/analysis.
- Add some kind of conclusion or endcap. Could talk about the state task force and how the limitations in structured data we observe could impact the possible findings, and/or the work that the task force (and ultimately police) would have to do in order to estimate or resolve issues. This notebook may be the last in the series and I don't think it needs to be followed up with an outro, but it would be useful to pose some questions to the reader that tie off the themes we presented collectively.
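
To make the pandas suggestions above a bit more concrete (`.reset_index(drop=True)`, ordering the `value_counts()` output by source table, transposing example rows, and `.sort_values(ascending=False)`), here is a rough sketch on made-up data. The frame contents, the `rd_no` column, and the `groupby` in step 4 are my assumptions for illustration only, not what the notebook actually contains, so treat it as a starting point rather than a drop-in change.

```python
import pandas as pd

# Toy stand-in for the notebook's data; column names and values are made up.
by_source = pd.DataFrame({
    "rd_no": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "sources": ["second", "first", "iucrs", "second", "iucrs", "second"],
})

# 1. reset_index() keeps the old index as an 'index' column;
#    drop=True discards it instead.
subset = by_source[by_source.sources != "iucrs"]
with_index_col = subset.reset_index()
without_index_col = subset.reset_index(drop=True)

# 2. Order the record counts by source table rather than by frequency,
#    then convert to a frame so it renders like the other outputs.
source_order = ["first", "second", "iucrs"]
counts = by_source.sources.value_counts()[source_order]
counts_frame = counts.to_frame()

# 3. Transpose a few example rows so the columns read top to bottom
#    instead of forcing a left-right scroll.
tall_view = by_source.head(3).T

# 4. Sort summed dummy counts from largest to smallest
#    (the groupby here is a guess at what the 'dummy' column feeds).
coverage = (
    by_source.assign(dummy=1)
    .groupby("sources")["dummy"]
    .sum()
    .sort_values(ascending=False)
)
```

In step 2, indexing with a list of labels reorders the counts and raises a `KeyError` if a source name is missing, which is a handy check that every expected table is present.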

Thanks again for your work!
