This document contains steps for updating the data in the etpl database table. This requires a separate document
because it requires multiple steps for standardizing the data, inserting it, and updating search token data.
You will need the following:
- Terminal
- Python version 3.7 installed
pipenvpackage installed
- Download the latest programs and providers CSVs and name them as
oldPrograms.csvandoldProviders.csv, respectively. - In Terminal, in the
d4adroot folder, run the commands below.Context: This formats the file names to the
*_yyyymmdd.csvand moves them to thebackend/datafolder.
date=$(date '+%Y%m%d')
mv oldPrograms.csv "backend/data/programs_${date}.csv"
mv oldProviders.csv "backend/data/providers_${date}.csv"- In the same
d4adfolder, run the commands below. You may see a warning in the output, that's okay!Context: This creates the
programs_yyyymmdd_merged.csvfile, withDEGREEAWARDEDNAME,LICENSEAWARDEDNAME, andINDUSTRYCREDENTIALNAMEcolumns added andSUBMITTERNAMEandSUBMITTERTITLEremoved (private data).
cd backend/data/program-credentials
pip3 install -r requirements.txt
python3 merge-credentials.py $date- Open the
programs_yyyymmdd_merged.csvfile in your IDE. Delete any empty lines at the bottom of the file.
- In the same
program-credentialsfolder, run the commands below. If you are asked for a Postgres password, include the one you made for the userpostgreswhen setting up the app. You may get an error that "the etplcombination table doesn't exist", that's okay!Context: This adds the
_oldsuffix to the standardized CSV file to make way for the new one, and creates thecombined_etpl_raw.csvfile.
cd ..
mv standardized_etpl.csv standardized_etpl_old.csv
./combine-etpl.sh "programs_${date}_merged.csv" "providers_${date}.csv"- In the same
backend/datafolder, run the commands below.Context: This will create the new
standardized_etpl.csvand put it in the same folder.
./run-standardization.sh6.1 [Optional] To support the data migration to Credential Engine, we also need to create a modified form of standardized_etpl.csv that does not contain linefeed characters within the rows of the CSV file (This is a limitation in the import process to the system consumming this content for the data migration). To support this, we run an additional script to create standardized_etpl_for_data_migration.csv
python3 transform-for-migration.pyTo learn more about database migrations and seed updates, see the README.
- In the same
backend/datafolder, run the commands below.Context: This uses the DBMigrate library to create the migration files (up and down SQL, JS), all prefixed with the current timestamp.
cd ../..
npm --prefix=backend run db-migrate create update-etpl -- --sql-file- In the same
d4adfolder, run the command below. Be sure to change the UP and DOWN file names to what was created in the previous step.Context: This will delete all rows from the current ETPL table and insert new rows from the latest CSV file, essentially overwriting it with new data. After running, you should see the UP and DOWN files now have thousands of lines of INSERT statements. It should also have the
programtokensSQL statements added at the end of both files. This SQL will ensure that the tokens being searched on are up-to-date with etpl table changes. Please seedecision_log.md #2020-08-12for explanation of why we need this.
./backend/data/create_update_etpl_migrations.sh standardized_etpl_old.csv standardized_etpl.csv backend/migrations/sqls/INSERT-UP-FILENAME.sql backend/migrations/sqls/INSERT-DOWN-FILENAME.sql-
Create new up and down files with the same names but with the
-TEST.sqlsuffix instead. 10.1. If your operation is just an update, add the following comment-- intentionally left blankto both files. 10.2. If your operation adds new columns, your DOWN file should have-- intentionally left blankbut your UP file should have SQL code. Do this by first making a copy ofstandardized_etpl_test.csvand naming itstandardized_etpl_old_test.csv. Then, add your new column(s) to thestandardized_etpl_test.csvfile in Excel and fill with dummy data. Finally, run the same script in Step 9 above with the test CSV and SQL files. -
Modify the migration
.jsfile to conditionally load the-TEST.sqlup and down files. Follow the instructions in the README section starting with "edit the corresponding.jsfile..." to see what code to replace. -
Delete the files that you will not need to push to Github. Run the following command in the
backend/datafolder.
rm standardized_etpl_old.csv && rm "programs_${date}.csv" && rm "providers_${date}.csv" && rm "programs_${date}_merged.csv"In this section, we will transform the ETPL data required for Career Navigator, rename the output, and upload it to the nj-prod-re-data S3 bucket.
-
Navigate to the
transform-for-career-navigatordirectory inside the backend data folder:cd backend/data/transform-for-career-navigator -
Create a Python virtual environment to isolate dependencies (optional but recommended):
python3 -m venv venv source venv/bin/activate -
Install the Python dependencies needed for this script by running:
pip install -r requirements.txt
-
Run the
transform-for-career-navigator.pyscript to generate the CSV file:python transform-for-career-navigator.py
This will process the ETPL data into the format required for Career Navigator. Once complete, an output file will be generated in the same directory.
-
Rename the output file to
training.csv:mv output_filename.csv training.csv
(Note: Replace
output_filename.csvwith the actual name of the file generated by the script.) -
Upload the
training.csvfile to thenj-prod-re-dataS3 bucket manually using AWS CLI:-
First, ensure that the AWS CLI is installed and configured on your machine. If it’s not configured yet, use:
aws configure
Enter your AWS Access Key ID, Secret Access Key, default region name, and output format.
-
Once the AWS CLI is configured, upload the file:
aws s3 cp training.csv s3://nj-prod-re-data/
-
-
Verify the upload by listing the contents of the bucket:
aws s3 ls s3://nj-prod-re-data/
You have now successfully transformed the ETPL data for Career Navigator and uploaded it to the production S3 bucket.
- If your operation adds new columns, also add this to the TEST up-file.
delete from programtokens;
insert into programtokens(programid, tokens)
select etpl.programid,
to_tsvector(coalesce(etpl.standardized_name, etpl.officialname)) ||
to_tsvector(coalesce(etpl.standardized_description, etpl.description, '')) ||
to_tsvector(coalesce((string_agg(soccipcrosswalk.soc2018title, ' ')), ''))
from etpl
left outer join soccipcrosswalk
on etpl.cipcode = soccipcrosswalk.cipcode
group by etpl.programid;
- Finally, run the automated tests in the
d4adfolder with the command below.
./scripts/test-all.sh && ./scripts/build.sh && ./scripts/feature-tests.shTroubleshooting 1: If you get the error about a Postgres password not being of the right type, you may need to change your Postgres user permissions. This involves finding the
pg_hba.conffile on your computer, opening it in your Terminal via Vim editor, and changing all the permissions totrustinstead ofsha_....
Troubleshooting 2: If Cypress feature tests fail, you likely need to just change some of the data that we are looking for in the Training Explorer. Check the video for what data is showing in these instances, and update the test to check for that same data. You'll see examples of this in past commits to update data.
- Add, commit, and push the requisite files (up and down SQL files, the up and down TEST SQL files, the updated JS migration file, and the updated CSV file). The continuous deployment will automatically run the script that finds this new migration and executes it, thus updating the Postgres database with the commands outlined in the migrations.