- Code Formatting
- Uploading a TSV file for
/anvil/upload-tsvdata ingest - Local Database for Development
- FHIR Controller
- Facet Loader
- Running dev application locally on command line
Before contributing code, please set up our git hook:
cp code-formatting/pre-commit.sh .git/hooks/pre-commit
chmod +x .git/hooks/pre-commitTo skip formatting on a block of code, wrap in spotless:off, spotless:on comments
curl -X POST \
-H "Content-Type: text/plain" \
--data-binary @path/to/your/file.tsv \
http://localhost:8080/anvil/upload-tsvTo set up a local development database, follow these steps:
- Run the
docker-compose.postgres.ymlfile to start the required services:
docker-compose -f docker-compose.postgres.yml up --build -d-
Execute the
db/schema.sqlscript to create thedictschema and all necessary database tables. -
(Optional) If you have a dump file generated with
pg_dump, you can restore it usingpg_restore. If the backup is from ourBDCdatabase, the following command will work:
pg_restore \
--host="localhost" \
--port="5432" \
--username="username" \
--dbname="dictionary_db" \
--jobs=10 \
--verbose \
--no-owner \
./dictionary_dbYou will be prompted to enter the database password after executing this command.
- Now you can run the Spring Boot application with
devas the active profile.
If you do not have pg_restore and pg_dump, you can install them by following these steps:
- Install
postgresql@16, which includes bothpg_restoreandpg_dump:
brew install postgresql@16- Link the installed PostgreSQL version:
brew link postgresql@16- Add PostgreSQL to your
zshrcorbashprofile. Forzshrc, use the following command:
echo 'export PATH="/opt/homebrew/opt/postgresql@16/bin:$PATH"' >> ~/.zshrc- Verify the versions of
pg_dumpandpg_restore:
pg_dump --version
pg_restore --versionIf the command doesn’t work immediately, you may need to restart your terminal.
docker exec -it postgres-db psql -U username -d dictionary_dbFrom within, you can execute set search_path to dict; to search within the dict schema (similar to running use dict; in mysql databases).
The FHIR Controller provides endpoints to interact with a FHIR (Fast Healthcare Interoperability Resources) API for fetching and processing research study metadata.
Configuration:
Add the following properties to your application configuration file (e.g., application-dev.properties):
fhir.api.base.url=https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1
fhir.api.bulk.endpoint=/ResearchStudyEndpoints:
-
GET
/api/fhir/research-studies/- Fetches all research studies from the FHIR API.
- Returns a list of ResearchStudy objects containing study metadata.
- Uses pagination (500 results per page) to handle large datasets.
curl -X GET http://localhost:8080/api/fhir/research-studies/
-
GET
/api/fhir/research-studies/find-dbgap- Extracts distinct phs (dbGaP study) values from all research studies.
- Returns a list of unique study identifiers.
curl -X GET http://localhost:8080/api/fhir/research-studies/find-dbgap
-
PATCH
/api/fhir/load/metadata/refresh- Refreshes dataset metadata by fetching studies from the FHIR API.
- Updates dataset records based on study identifiers and metadata.
- Returns a summary of datasets updated and metadata records processed.
curl -X PATCH http://localhost:8080/api/fhir/load/metadata/refresh
-
PATCH
/api/fhir/load/metadata/mapping- Maps FHIR metadata keys to dataset metadata fields.
- Reads a JSON mapping configuration to associate metadata attributes with datasets.
- Requires
fhir.api.url.to.key.map.jsonconfiguration property.
curl -X PATCH http://localhost:8080/api/fhir/load/metadata/mapping
Implementation notes:
- The service uses pagination with a page size of 500 results to stay within the default 10MB WebClient buffer limit.
- Each paginated request fetches approximately 9.7MB of data, safely below the buffer threshold.
- The Meta model supports
versionId,lastUpdated, andsourcefields returned by the FHIR API.
This project includes a Facet Loader that ingests a JSON payload describing Facet Categories, their Facets (including nested children), and mapping rules to automatically associate facets with concept nodes based on a concept path.
What it does:
- Upserts facet categories and facets by name (idempotent). If a name already exists it is updated; otherwise it is created.
- Builds a facet hierarchy using parent_id based on the JSON nesting of facets.
- Evaluates expressions to map facets to concept nodes (rows in dict.concept_node) and writes relationships into dict.facet__concept_node.
API endpoint:
- POST
/api/facet/loader/load - Request body is a JSON array of objects each containing a Facet_Category.
- Returns a JSON object with counts: categoriesCreated, categoriesUpdated, facetsCreated, facetsUpdated.
Example request body (minimal):
[
{
"Facet_Category": {
"Name": "Consortium_Curated_Facets",
"Display": "Consortium Curated Facets",
"Description": "Consortium Curated Facets Description",
"Metadata": [
{
"Key": "order",
"Value": "1"
}
],
"Facets": [
{
"Name": "Recover Adult",
"Display": "RECOVER Adult",
"Description": "Recover adult parent facet.",
"Facets": [
{
"Name": "Infected",
"Display": "Infected",
"Description": "Infected Facet Description"
}
]
}
]
}
}
]Example with expressions (maps facets to concept nodes by concept_path):
- Concept path sample: \phs003436\Recover_Adult\visits\inf\12\pasc_cc_2024\
- Nodes (by index):
- 0: phs003436
- 1: Recover_Adult
- 2: biostats_derived
- 3: visits
- 4: inf
- 5: 12
- 6: pasc_cc_2024
- Negative indices are allowed (e.g., -1 is last node, -2 second to last, etc.).
Example payload fragment using OR expression groups (OR-of-ANDs):
[
{
"Facet_Category": {
"Name": "Consortium_Curated_Facets",
"Facets": [
{
"Name": "Recover Adult Infected or PASC",
"Expression_Groups": [
[
{ "regex": "(?i)\\binf(ected)?\\b", "node": -3 }
],
[
{ "contains": "pasc", "node": -1 }
]
]
},
{
"Name": "Same using camelCase alias",
"expressionGroups": [
[ { "regex": "(?i)\\binf(ected)?\\b", "node": -3 } ],
[ { "contains": "pasc", "node": -1 } ]
]
}
]
}
}
]- Supported keys per expression entry: exactly, contains, regex (use one or more).
- Two ways to define rules:
- Expression_Groups (new): an array of groups; OR across groups, AND within each group.
- node is optional for any entry:
- If node is provided, we evaluate only that node (zero-based; negatives allowed, e.g., -1 is last node).
- If node is omitted, the expression scans all nodes in the concept_path and evaluates true if any node matches.
- Evaluation semantics:
- If Expression_Groups is present and non-empty: the facet matches a concept if ANY group matches; each group requires ALL of its entries to match (OR-of-ANDs).
- Within a single entry, all provided keys (exactly/contains/regex) must match the same node value.
- Regex uses Java syntax; inline flags like (?i) are supported. Exactly/contains are literal string matches.
- Out-of-bounds node indices or invalid regex cause that entry to evaluate to false.
- Inheritance to children:
- Child facets inherit their parents' effective groups. Effective groups are formed as a cross-product: each parent group is AND-combined with each child group.
- If a parent has groups and a child defines no groups, the child inherits the parent groups as-is. If neither parent nor child has groups, no automatic mapping occurs.
- If a facet contains no groups at any level (no Expression_Groups on it or any ancestor), it is not automatically mapped to any concept nodes (it is still created in the hierarchy).
Alias handling in payload:
- Facet fields support camelCase aliases: Name/name, Display/display, Description/description.
- For grouped rules, Expression_Groups is preferred; expressionGroups and ExpressionGroups are also accepted as aliases.
- Each expression entry supports keys: exactly, contains, regex, and optional node.
Pre-requisites for mapping:
- dict.concept_node should already be populated (e.g., via the Dictionary Loader) so that concept_path values exist to evaluate.
cURL examples:
Save payload to payload.json and run:
curl -X POST \
-H "Content-Type: application/json" \
--data-binary @payload.json \
http://localhost:8080/api/facet/loader/loadSample response:
{
"categoriesCreated": 1,
"categoriesUpdated": 0,
"facetsCreated": 5,
"facetsUpdated": 0
}- Re-posting the same payload will update existing categories/facets and keep the hierarchy; counts will reflect updates rather than creations.
- Facet-to-concept mappings avoid duplicates using ON CONFLICT semantics in the repository/service layer.
Tests you can run (Docker required for Testcontainers):
mvn -Dtest=edu.harvard.dbmi.avillach.dictionaryetl.facet.FacetExpressionEvaluatorTest testmvn -Dtest=edu.harvard.dbmi.avillach.dictionaryetl.facet.FacetLoaderClearIntegrationTest testmvn -Dtest=edu.harvard.dbmi.avillach.dictionaryetl.facet.FacetLoaderControllerTest testmvn -Dtest=edu.harvard.dbmi.avillach.dictionaryetl.facet.FacetLoaderMappingIntegrationTest testmvn -Dtest=edu.harvard.dbmi.avillach.dictionaryetl.facet.FacetLoaderServiceTest test
Or run all tests:
mvn test
- If no concept nodes are mapped after posting, verify that concept_node rows exist and your node positions/regex align with your concept_path structure.
- Ensure the app is running on the expected port and CORS origin (Facet Loader controller is configured with CrossOrigin for http://localhost:8081 by default).
Use this endpoint to remove facet categories and/or specific facets (including all their descendants) by name. The service will also remove facet-to-concept mappings first to preserve referential integrity.
- Endpoint: POST /api/facet/loader/clear
- Request body (JSON):
{
"Facet_Categories": ["CategoryName1", "CategoryName2"],
"Facets": ["FacetNameA", "FacetNameB"]
}- Both properties are optional. Provide one or both:
- facetCategories: Deletes each listed category by name, all facets within, and all mappings. Returns counts.
- facets: Deletes each listed facet by name and all of its descendants in the hierarchy, plus all related mappings. Other categories/facets remain intact.
Response body (JSON):
{
"categoriesDeleted": 1,
"facetsDeleted": 7,
"mappingsDeleted": 42
}Behavior notes:
- Deletions are name-based and case-sensitive to match stored names.
- For categories, the service deletes mappings, then facets, then the category record and any associated category metadata.
- For facets, the service deletes mappings, then deletes the facet and all descendants in a breadth-first traversal.
- Requests are idempotent: nonexistent names are ignored (counted as zero deletions).
cURL examples:
Clear by category:
curl -X POST \
-H "Content-Type: application/json" \
--data-binary '{
"Facet_Categories": ["Consortium_Curated_Facets"]
}' \
http://localhost:8080/api/facet/loader/clearClear by facet names:
curl -X POST \
-H "Content-Type: application/json" \
--data-binary '{
"facets": ["Recover Adult", "Infected"]
}' \
http://localhost:8080/api/facet/loader/clearA FacetLoader.http file is included at the repository root with ready-to-run sample requests compatible with the IntelliJ HTTP Client.
How to use:
- Open FacetLoader.http in IntelliJ IDEA or any IDE that supports the HTTP Client.
- Adjust the @host variable at the top if your application runs on a different port (e.g., 8086 in Docker).
- Click the gutter icons to execute requests for loading facets and clearing by category/facet names.
mvn spring-boot:run -Dspring-boot.run.arguments="--spring.profiles.active=dev"You can reduce logging level by setting these properties in the applications-dev.properties file.
spring.jpa.show-sql=false
logging.level.org.hibernate.type=INFO
logging.level.org.hibernate.SQL=INFO
logging.level.org.hibernate.type.descriptor.sql=INFO