EntityGenerator is part of the ETL Tool Suite used for generating entity files that can be loaded into various datasource environments such as I2B2 and tranSMART.
It provides command-line utilities for transforming source data and mapping files into standardized CSV entities suitable for downstream ingestion.
Project Type: Legacy Java 8 ETL Utility
Status: Legacy migration in progress (Ant → Maven)
EntityGenerator supports the generation of entity files used in biomedical ETL workflows.
It can be run using either legacy Ant-based JARs or new Maven-shaded executable JARs.
The modernization process introduces:
- Portable Maven build and deployment configuration
- Nexus integration (sandbox / AWS Dev / AWS Prod)
- Standardized Makefile targets for local and CI/CD pipelines
| Path | Description |
|---|---|
src/ |
Java source code (legacy structure) |
resources/ |
Configuration files and dependencies |
dist/jars/ |
Output directory for Ant-built JARs |
target/ |
Output directory for Maven builds |
Makefile |
Wrapper for Maven build, test, and deploy commands |
pom.xml |
Maven project descriptor for multi-jar shaded builds |
Use this method while existing pipelines and ETL workflows are being migrated.
ant clean buildOutputs legacy JARs under dist/jars/.
The Maven build creates reproducible, shaded (“fat”) JARs for each ETL tool.
# Build shaded JARs
mvn clean package
# Run a shaded jar (example)
java -jar target/entity-generator-1.0.0-SNAPSHOT-JsonMetadataGenerator.jarOutputs all runnable artifacts under target/.
The included Makefile standardizes common Maven goals and can serve as the foundation for both local and automated build pipelines.
make build # Builds shaded JARs
make test # Runs Maven tests
make deploy # Deploys to local Nexus (port 8081)
make clean # Cleans build directoriesUsing a Makefile as the top-level interface ensures consistent behavior between local development and CI/CD environments.
It is generally considered good practice to wrap CI/CD jobs around the Makefile, especially in projects that require reproducible and portable builds across developer workstations and automation systems.
Credentials for Nexus can be configured in ~/.m2/settings.xml or via environment variables:
export NEXUS_USERNAME=admin
export NEXUS_PASSWORD=admin123The Maven distributionManagement block is preconfigured for local and AWS environments.
| Environment | Repository URL | Description |
|---|---|---|
| Local Sandbox | http://localhost:8081/repository/maven-releases/ |
Developer testbed |
| AWS Dev (Future) | https://nexus.dev.aws.example/... |
Dev environment (coming soon). AWS Service Instead of Nexus? |
| AWS Prod (Future) | https://nexus.prod.aws.example/... |
Prod Envirionment - AWS Service For AWS Hosted Env Instead of Nexus |
Deploy snapshots or releases:
mvn -DskipTests deployThis example remains unchanged for legacy users and documentation continuity.
It demonstrates the original Ant-generated EntityGenerator workflow.
Modernized equivalents will be introduced in future versions.
Steps below were validated on macOS and Amazon Linux.
- Open a bash connection to your ETL Client Docker
docker exec -e COLUMNS="`tput cols`" -e LINES="`tput lines`" -ti etl-client bash
- Clone this project:
git clone https://github.com/hms-dbmi/ETLToolSuite-EntityGenerator cd ETLToolSuite-EntityGenerator - Create working directories:
mkdir data mappings completed
- Copy data and mapping files generated from the MappingGenerator Example:
cp ../ETLToolSuite-MappingGenerator/example/Asthma_Misior_GSE13168.txt data/ cp ../ETLToolSuite-MappingGenerator/example/mapping.csv mappings/mapping.csv cp ../ETLToolSuite-MappingGenerator/example/mapping.csv.patient mappings/PatientMapping.csv
- Run the EntityGenerator:
java -jar EntityGenerator.jar -jobtype CSVToI2b2TM
- Check output:
cd completed ls -la - Expected files:
I2B2.csv ConceptDimension.csv ObservationFact.csv ConceptCounts.csv TableAccess.csv PatientDimension.csv PatientTrial.csv PatientMapping.csv - Exit the container and proceed to data loading via
WorkflowScripts → I2B2TM_V18_1
| Feature | Status |
|---|---|
| Maven-based build and shading | Implemented |
| Nexus sandbox deployment | Supported |
| AWS Dev/Prod Nexus profiles | Coming soon |
| GitHub Actions / CI integration | Planned |
| Automated ETL pipeline packaging | Planned |
| Docker-based runtime container | Planned |
Copyright © Harvard Medical School
Licensed under the Apache License, Version 2.0.
Developed and maintained by the Avillach Lab at Harvard Medical School as part of the BioData Catalyst initiative.
Originally derived from the HMS ETL Tool Suite for i2b2/tranSMART integration.