Sean Kelly edited this page Dec 13, 2024 · 3 revisions

LabCAS Solr API

LabCAS uses the Solr search engine to store, search, and retrieve metadata for the science data in the EDRN Cancer Biomarker Commons. LabCAS also provides an API that lets authenticated EDRN users run searches against Solr. This document will help you get started with that API.

This documentation is not intended to replace Solr's own. You are encouraged to read the Solr Common Query Parameters documentation to learn how to construct queries for Solr. Some example queries are given within this document. Please note that LabCAS uses Solr version 6.6.

This document uses Postman to make queries to the LabCAS Solr API, as it takes care of formulating URLs, quoting parameters, and so forth. Postman can also generate code for Python, C#, Java, etc., as well as for the curl command, so it serves as a nice umbrella technology.

If you prefer to write code instead of using Postman, you can craft queries for the LabCAS Solr API yourself. Two example programs (in Python) demonstrate this capability:

  • cibbbcd_events.py — this program extracts event IDs from the Solr "datasets" core for the LabCAS collection "Combined Imaging and Blood Biomarkers for Breast Cancer Diagnosis"
  • events_by_blind.py — this program displays event IDs from the Solr "files" core given a blinded site ID as a parameter

You can read over the source code for these, or install the example programs onto your system for direct execution; see the README titled "Data Access API: Examples" and the source code for more information.
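To give a sense of what such a program looks like, here is a minimal sketch using only Python's standard library. The /data-access-api/<core>/select path is an assumption inferred from the example later in this document, and EDRNUSERNAME/EDRNPASSWORD are placeholders for your real credentials.

```python
import base64
import urllib.parse
import urllib.request

# Base URL for the LabCAS Solr API; the /data-access-api/<core>/select
# path is inferred from the example later in this document.
BASE = "https://edrn-labcas.jpl.nasa.gov/data-access-api"


def build_select_url(core: str, **params: str) -> str:
    """Build a Solr select URL for one of the three LabCAS cores
    ("collections", "datasets", or "files")."""
    return f"{BASE}/{core}/select?" + urllib.parse.urlencode(params)


def query(core: str, username: str, password: str, **params: str) -> bytes:
    """Run a query with HTTP Basic authentication and return the raw body."""
    request = urllib.request.Request(build_select_url(core, **params))
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return response.read()


# Example (with real EDRN credentials substituted):
# body = query("collections", "EDRNUSERNAME", "EDRNPASSWORD",
#              q="biomarker", wt="json")
```

The two example programs above take the same basic approach; see their source for fuller error handling.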

The remainder of this document will show how to use the Solr API directly using Postman.

Installing Postman and Setting Up Your EDRN Password

First, download and install Postman. Postman is free software. There is also a web version, but for this document we'll use the desktop version.

Launch Postman for the first time, and in the lower-right, in the bottom status bar, click "⌂ Vault". The "vault" is where we'll store your EDRN username and password.

The first time you do this, you'll be prompted to encrypt your vault. That's a good security measure, so go ahead and click the Encrypt button. You'll get a "vault key" which you can use to unlock your vault in the future. You can save this key (a long hexadecimal number) in a safe place such as your password manager. Finally, press "Open Vault".

In the table of vault secrets, click "Add new secret" and name it edrn_username. For the value, put in your EDRN username. Under "Allowed Domains", enter https://edrn-labcas.jpl.nasa.gov.

Repeat this, but for the next secret call it edrn_password. For the value, put in your EDRN password. Use the same "Allowed Domain".

Finally, close the vault by clicking the ⤫ in the tab bar at the top.

Importing the LabCAS Solr API "Postman Collection"

We have created a Postman Collection that describes the LabCAS Solr API. With this, you won't have to worry about setting the URL, authorization, or query parameters.

Download the Postman Collection for the LabCAS Solr API.

Once downloaded, import it into your Postman from the "File → Import" menu.

Using the LabCAS Solr API

Once you've got the Postman Collection imported, you should have a new item in your Postman Workspace, "LabCAS Solr API". You can expand the collection and see the three endpoints:

  • Collections — describes the high-level science data collections in LabCAS
  • Datasets — organizes the data in collections into groupings, typically associated with parts of a study (case versus control), with participants, or with some other logical separation. Datasets can contain either other datasets (forming a hierarchy) or files in LabCAS
  • Files — represents the metadata for individual files of scientific data, such as DICOM files. This core lets you retrieve the metadata for files. Note that downloading actual files is handled by a separate API, not described here

To use these endpoints:

  1. Select Collections, Datasets, or Files
  2. Click the "Params" tab if it's not already visible
  3. Enter a Solr query in the q parameter; fill in other parameters as needed
  4. Press "Send"

As a test, try this:

  1. Select "Collections"
  2. In the "Params" tab, type "biomarker" into the q parameter (meaning "show all collections with the word biomarker in them")
  3. Leave all other parameters at their defaults
  4. Press "Send"

In the lower half of the screen, make sure the "Pretty" and "JSON" formats are selected. You should see around 16 collections that match. Feel free to try the other options, AI formatting features, code conversions, etc.

Sample Queries

The following are a few queries you can try:

  • "Return eventID for files with CollectionName of Lung Team Project 2 Images in JSON format"
    • Use the "Files" endpoint
    • Set q to CollectionName:"Lung Team Project 2 Images" — note the quotes since the name has spaces
    • Set fl to eventID
    • Set wt to json
    • Set rows to 999999 — adjust this as needed, or use rows + start to paginate
  • "All details of collections with SpecimenType of Serum in XML format"
    • Use the "Collections" endpoint
    • Set q to SpecimenType:Serum
    • Set rows to 99999 — adjust this as needed, or use rows + start to paginate
    • Set wt to xml
  • "Top 10 LeadPI names and LeadPIId IDs of all datasets with CollectionName of Lung Team Project 2 Images in JSON format"
    • Use the "Datasets" endpoint
    • Set q to CollectionName:"Lung Team Project 2 Images"
    • Set fl to LeadPI,LeadPIId
    • Set rows to 10
    • Set wt to json
  • "The ID, data custodian, and data custodian email of the top 100 files with City_of_Hope in their IDs in CSV format"
    • Use the "Files" endpoint
    • Set q to id:*City_of_Hope*
    • Set fl to id,DataCustodian,DataCustodianEmail
    • Set rows to 100
    • Set wt to csv
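As a sketch, the first sample query above translates into the Solr parameters below; urllib.parse.urlencode handles the quoting and escaping that Postman otherwise does for you.

```python
import urllib.parse

# Parameters for "eventID for files with CollectionName of
# Lung Team Project 2 Images in JSON format".
params = {
    "q": 'CollectionName:"Lung Team Project 2 Images"',
    "fl": "eventID",
    "wt": "json",
    "rows": "999999",  # or use rows + start to paginate
}
print(urllib.parse.urlencode(params))
```

Send these against the "Files" endpoint; the other sample queries follow the same pattern with different parameter values.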

Advanced Solr Queries

Please note that the Postman Collection above includes only a subset of the parameters that Solr supports. If you're comfortable with programming, the curl command, and so on, feel free to use advanced parameters like fq, facet, facet.field, and so forth.

Consulting the Solr query documentation can be helpful in these cases, as can Solr client libraries for your programming language of choice.
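As one sketch of what an advanced query might look like, the parameters below filter a core down to a single collection with fq and then facet on a field, returning counts only. The field names are borrowed from examples elsewhere in this document; adjust them for your data.

```python
import urllib.parse

# Advanced parameters the Postman Collection omits: fq narrows the result
# set without affecting relevance scoring, and facet.field requests counts
# of each distinct value of that field.
params = {
    "q": "*:*",
    "fq": 'CollectionName:"Lung Team Project 2 Images"',
    "facet": "true",
    "facet.field": "SpecimenType",
    "rows": "0",   # counts only, no documents
    "wt": "json",
}
print(urllib.parse.urlencode(params))
```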

Using AI to Generate Queries

ChatGPT and other large language models can help you express the q, fq, and other parameter syntax without needing to fully understand Solr's query language.

As an example, this prompt presented to the ChatGPT "4o" model:

Write a curl command for Solr at https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select that takes HTTP Basic username EDRNUSERNAME with password EDRNPASSWORD to return the non-empty "eventID" fields for the first 100 files where the "CollectionName" field is "Lung Team Project 2 Images"

produces a valid curl command as of this writing.
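For comparison, here is a hand-written Python equivalent of what that prompt asks for (a sketch, not ChatGPT's output). The fq of eventID:[* TO *] is a common Solr idiom for "field is present and non-empty"; EDRNUSERNAME and EDRNPASSWORD are placeholders for real credentials.

```python
import base64
import urllib.parse
import urllib.request

# First 100 files in the collection that have a non-empty eventID.
params = urllib.parse.urlencode({
    "q": 'CollectionName:"Lung Team Project 2 Images"',
    "fq": "eventID:[* TO *]",   # Solr idiom for "field is non-empty"
    "fl": "eventID",
    "rows": "100",
    "wt": "json",
})
url = f"https://edrn-labcas.jpl.nasa.gov/data-access-api/files/select?{params}"
request = urllib.request.Request(url)
token = base64.b64encode(b"EDRNUSERNAME:EDRNPASSWORD").decode()
request.add_header("Authorization", f"Basic {token}")
# Uncomment to run with real credentials substituted above:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```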