
@samathad2023
Contributor

@samathad2023 samathad2023 commented Aug 11, 2025

🎫 Ticket

Link to the relevant ticket:
https://gitlab.login.gov/lg-teams/Team-Data/data-warehouse-ag/-/issues/1092

🛠 Summary of changes

Added a migration that updates the comment on the encrypted_email column in the email_address table. Specifically, it changes the sensitive flag in the column comment from true to false.

📜 Testing Plan

Provide a checklist of steps to confirm the changes.

  • Deploy and apply
  • Verify updates in idp
  • Check the dms task transformation in analytics

@samathad2023 samathad2023 changed the title from "Data/1092 key access" to "1092 - Migration to update column comments" Aug 11, 2025
@samathad2023 samathad2023 force-pushed the data/1092-key-access branch 2 times, most recently from f45c7c7 to 878de07 August 14, 2025 22:25
@vrajmohan
Contributor

Isn't this a problem, given that the encryption is two-way?

@samathad2023
Contributor Author

We need to decrypt the data for downstream processing in the data warehouse. Are there specific concerns?

@vrajmohan
Contributor

Here are my assumptions (happy to learn if they are incorrect):

  1. All sensitive information (PII and others) is encrypted in the Rails database. Despite the encryption, we do not propagate sensitive information to the data warehouse, because the warehouse is potentially open to a much larger audience. We keep it out by marking sensitive columns with comment: "sensitive=true".
  2. The user's email address is one such data item. Furthermore, it is encrypted with a two-way algorithm so that we can retrieve the plaintext email address when we need to, say, send email to the user.
  3. If we propagate the email address to the data warehouse as well, that would make it vulnerable: an attacker who gets into the warehouse would possess a large dataset that they can analyze.
  4. Why are we doing this for this field? Doesn't it negate the work that we did annotating the schema?
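
To make the mechanism in assumption 1 concrete, here is a minimal Ruby sketch of how a comment-based flag could gate which columns are propagated. It is an illustration only: the actual DMS task transformation is not shown in this thread, the helper names `sensitive?` and `propagated_columns` are hypothetical, and every column other than `encrypted_email` is made up for the example.

```ruby
# Hypothetical sketch: decide which columns may be propagated to the data
# warehouse based on a "sensitive=true|false" flag in each column comment.
# This is NOT the real DMS transformation, only an illustration of the idea.

SENSITIVE_FLAG = /sensitive\s*=\s*(true|false)/i

# Returns true when the comment marks the column as sensitive.
# Assumption: a column with no readable flag is treated as sensitive
# (fail closed), so an unannotated column is never propagated by accident.
def sensitive?(comment)
  match = SENSITIVE_FLAG.match(comment.to_s)
  return true if match.nil?

  match[1].downcase == 'true'
end

# Takes a hash of column name => column comment and returns the names of the
# columns that are allowed to leave the source database.
def propagated_columns(column_comments)
  column_comments.reject { |_name, comment| sensitive?(comment) }.keys
end

# Illustrative column comments; only encrypted_email comes from this PR.
columns = {
  'id' => 'sensitive=false',
  'encrypted_email' => 'sensitive=true',
  'created_at' => 'sensitive=false',
  'notes' => nil
}

puts propagated_columns(columns).inspect
```

Under these assumptions, changing the comment on `encrypted_email` to `sensitive=false` is the only change needed for the column to start flowing downstream, which is why the flag flip has the effect questioned in point 4.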

@samathad2023
Contributor Author

Thanks for raising these points; they are all valid and important.
Our goal is to only have decrypted data in the data warehouse for specific, permission-based roles with strict security constraints (like admin-level access). The comment: "sensitive=true" annotation is meant to flag this information specifically for the data warehouse process and isn't used anywhere else.
We believe that turning off the flag is the simplest way to proceed without a major refactor. Adding others to provide more context @MrNagoo @astrogeco @ambuj-neupane-tts

@MrNagoo
Contributor

MrNagoo commented Aug 18, 2025

@vrajmohan Security has approved not only the decryption but also the storage of email as plaintext within the Data Warehouse. While we are now bringing in and decrypting the email, we will still add column-level security so that the column is accessible only to roles privileged to that information (fraud investigation/FCMS).

@ambuj-neupane-tts
Contributor

To be clear @MrNagoo @vrajmohan, Security has approved the plan (ingest the encrypted email address and decrypt it into the warehouse), but we are not yet authorized to make the change. CC @astrogeco

@samathad2023
Contributor Author

Following our team discussion, we're adding the FCMS migration changes as a draft. They'll be merged when we're ready for the deployment.

@samathad2023 samathad2023 marked this pull request as draft August 18, 2025 23:03