500 internal server error + bad JSON elements #1858

@btskinner

I've been updating the rscorecard R package and have run into a couple of issues. Both involve the same call that has worked in the past. Here are the two things I'm seeing:

500 Internal Server Error

When I make this call with rscorecard:

df <- sc_init() %>% 
    sc_filter(control == 1, region == 1:2, ccbasic == 1:24) %>% 
    sc_select(unitid, instnm, md_earn_wne_p10) %>% 
    sc_year(2009) %>%
    sc_get()

which translates to

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.ownership=1&school.region_id__range=1..2&school.carnegie_basic__range=1..24&_fields=id,school.name,2009.earnings.10_yrs_after_entry.median&_page=0&_per_page=100&api_key=<HIDDEN>

I get a page with this message

[Screenshot of the error page, taken 2021-07-13]

This might be related to an error reported on the rscorecard GitHub repo.

Bad JSON elements

When I change the call to use data for 2010 instead of 2009, I get extra elements at the end of the pull. This causes rscorecard to break, which is my issue, but since the code has worked in the past, something new is happening. Here's the API call (notice that I'm calling `_page=2`, which, with zero-indexed pages and 100 results per page, returns the last 83 elements of the 283-element pull):

https://api.data.gov/ed/collegescorecard/v1/schools.json?school.ownership=1&school.region_id__range=1..2&school.carnegie_basic__range=1..24&_fields=id,school.name,2010.earnings.10_yrs_after_entry.median&_page=2&_per_page=100&api_key=<HIDDEN>

Here's the result (I've cut the result to the last 10 elements to save space and placed a ... to mark the cuts):

{
  "metadata": {
    "page": 2,
    "total": 283,
    "per_page": 100
  },
  "results": [
    ...
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Pennsylvania College of Technology",
      "id": 366252
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Suffolk County Community College",
      "id": 366395
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Carroll Community College",
      "id": 405872
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Pennsylvania Highlands Community College",
      "id": 414911
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Lancaster County Career and Technology Center",
      "id": 418533
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "York County Community College",
      "id": 420440
    },
    {
      "2010.earnings.10_yrs_after_entry.median": null,
      "school.name": "Community College of Baltimore County",
      "id": 434672
    },
    {
      "UNITID": 475565,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    },
    {
      "UNITID": 479956,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    },
    {
      "UNITID": 480064,
      "id": null,
      "school.name": null,
      "2010.earnings.10_yrs_after_entry.median": null
    }
  ]
}

The last three elements have an extra key, `UNITID`, and `null` values for everything else, including `id`. This causes an error in my rscorecard pull. Again, that's my issue, but it hasn't been a problem in the past.
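As a possible client-side stopgap, here is a minimal sketch in base R. It assumes the response has already been parsed into a list of records (for example with `jsonlite::fromJSON(..., simplifyVector = FALSE)`), and that a record with a null `id` can safely be discarded. The helper name `drop_malformed` is made up for illustration and is not part of rscorecard.

```r
# Hypothetical helper (not part of rscorecard):
# keep only the records whose `id` field is non-null.
drop_malformed <- function(results) {
  Filter(function(rec) !is.null(rec[["id"]]), results)
}

# Two hand-built records shaped like the ones in the response above:
# one normal record and one of the trailing UNITID-only records.
results <- list(
  list(`2010.earnings.10_yrs_after_entry.median` = NULL,
       school.name = "Community College of Baltimore County",
       id = 434672),
  list(UNITID = 475565,
       id = NULL,
       school.name = NULL,
       `2010.earnings.10_yrs_after_entry.median` = NULL)
)

clean <- drop_malformed(results)
length(clean)
```

Filtering on `id` rather than on the presence of `UNITID` keeps the check tied to the field that was actually requested, so it should still hold if the extra key is renamed or removed.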

Next steps

These issues only started happening recently, I'm guessing with the big changes to the API in April. Is this something that needs to be addressed on your end, or on my end with better error handling? Either way, thanks for your work on this. I'm also happy to send more info.
