@laurencegoolsby could you review and respond to any
laurencegoolsby left a comment
Reviewed/responded to all TODO(pre-merge): comments.
Created navapbc/strata-template-documentai-api#52 and navapbc/strata-template-documentai-api#53.
}
}

# TODO(pre-merge): this is new, what should we be using?
The document block is needed for PDF/document extraction. The image block handles image files (photos of documents). We need both - document for PDFs/TIFFs, image for JPEGs/PNGs.
The document config extracts page-level text with bounding boxes, which is what BDA needs for field extraction.
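To make the document/image split concrete, here is a minimal sketch of which standard-output block each input type falls under, per the comment above. The helper name and extension sets are hypothetical and only illustrate the split; this is not how BDA selects a block internally.

```python
from pathlib import Path

# Hypothetical mapping based on the review comment: PDFs/TIFFs are handled by
# the "document" block, photos (JPEG/PNG) by the "image" block.
DOCUMENT_EXTENSIONS = {".pdf", ".tif", ".tiff"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def output_block_for(filename: str) -> str:
    """Return which standard-output block ("document" or "image") covers a file."""
    suffix = Path(filename).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        return "document"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")


print(output_block_for("896-1.jpg"))
print(output_block_for("w2-scan.PDF"))
```

Dropping either block would leave one family of inputs without a configuration, which is why both are needed.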
Are these document settings highly specific to the Strata DocumentAI implementation? Or are they reasonable general baselines we should just ship as a part of the Document Data Extraction config (like we are doing for the image part)?
@doshitan - Reasonable general baselines. Ship as part of DDE.
# TODO(pre-merge): create ticket for documentai-api to respect standard DDE env vars
# and/or update DDE module to provide other env vars (like the BDA_ ones?)
DOCUMENTAI_INPUT_LOCATION = "${local.document_data_extraction_environment_variables.DDE_INPUT_LOCATION}/input"
Agree: the app code should use DDE_INPUT_LOCATION, etc., instead of DOCUMENTAI_*-prefixed vars.
@doshitan I retract my previous statement. DOCUMENTAI_INPUT_LOCATION makes sense here. The DOCUMENTAI_* prefix makes sense for the application's environment variables.
The DDE module provides DDE_INPUT_LOCATION as its output, but the app should map that to its own namespace (DOCUMENTAI_INPUT_LOCATION).
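A minimal sketch of that namespacing, assuming the app is Python and reads configuration from the process environment. The function names and the `s3://example-bucket` value are hypothetical; only the `DDE_INPUT_LOCATION` to `DOCUMENTAI_INPUT_LOCATION` mapping (with the `/input` suffix) comes from the Terraform line above.

```python
import os
from typing import Mapping


def documentai_env_from_dde(dde_env: Mapping[str, str]) -> dict[str, str]:
    """Map DDE module outputs into the app's own DOCUMENTAI_* namespace.

    Mirrors the Terraform assignment:
    DOCUMENTAI_INPUT_LOCATION = "${...DDE_INPUT_LOCATION}/input"
    """
    return {
        "DOCUMENTAI_INPUT_LOCATION": f"{dde_env['DDE_INPUT_LOCATION']}/input",
    }


def input_location() -> str:
    """App code reads only its namespaced variable, never DDE_* directly."""
    return os.environ["DOCUMENTAI_INPUT_LOCATION"]


# Usage with a hypothetical bucket URI.
os.environ.update(documentai_env_from_dde({"DDE_INPUT_LOCATION": "s3://example-bucket"}))
print(input_location())
```

Keeping the mapping in infra means the app never has to know which module produced the value.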
Ticket
Resolves #253
Changes
TODO extract useful notes for configuring DocumentAI API
TODO add domain/HTTPS config
TODO update or remove E2E tests
TODO infra test prefix does still cause issues with the app-documentai name; PR envs are fine (into the 1000s at least, and dependent on env name). We could use app-docuai, which fits the other (current) longest app names.
Context for reviewers
Have not (yet) copied over all the custom templates from #274. It's a bit unclear whether those should ship as part of the DDE module in the template or stay DocumentAI API specific.
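For the infra test prefix TODO under Changes, a rough sketch of the name-length arithmetic. The 32-character cap is an assumption (it is the limit AWS applies to load balancer and target group names); the TODO does not say which resource overflows. The `prefix + app + "-" + env` scheme and the `t-1a2b3c4d-` prefix are also hypothetical.

```python
# Assumption: a 32-character resource name limit (e.g. AWS load balancer /
# target group names). Which resource actually overflows is not stated.
NAME_LIMIT = 32

EXISTING_APPS = ["app-catala", "app-rails", "app-flask", "app-nextjs"]
CANDIDATES = ["app-documentai", "app-docuai"]


def resource_name(app_name: str, environment: str, prefix: str = "") -> str:
    """Hypothetical naming scheme: optional test prefix + app name + environment."""
    return f"{prefix}{app_name}-{environment}"


def overflow(app_name: str, environment: str, prefix: str = "") -> int:
    """Characters over NAME_LIMIT (0 if the name fits)."""
    return max(0, len(resource_name(app_name, environment, prefix)) - NAME_LIMIT)


longest_existing = max(len(name) for name in EXISTING_APPS)
for name in CANDIDATES:
    status = "fits existing max" if len(name) <= longest_existing else "longer than existing apps"
    print(name, len(name), status)

# With a hypothetical test prefix, the 4 extra characters are enough to tip over.
print(overflow("app-documentai", "staging", prefix="t-1a2b3c4d-"))
print(overflow("app-docuai", "staging", prefix="t-1a2b3c4d-"))
```

app-docuai is 10 characters, the same as app-catala and app-nextjs, so it adds no headroom requirement beyond what the existing apps already need.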
Review all the TODO(pre-merge): comments.
Testing
Video demo of PR environment (public link): https://drive.google.com/file/d/1t3-cBPaE-4i2_XkEPgbSpbBqhIU36elO/view?usp=drive_link
Using https://www.paystubhero.com/wp-content/uploads/2023/11/896-1.jpg
See results:
{
  "jobId": "ab2e401b-e2f1-4e79-8b60-a21562aa07e5",
  "jobStatus": "completed",
  "message": "Document processed successfully",
  "createdAt": "2026-05-07T21:03:47.833187Z",
  "completedAt": "2026-05-07T21:05:03Z",
  "totalProcessingTimeSeconds": 75.17,
  "matchedDocumentClass": "W2",
  "fields": {
    "employerInfo.employerAddress": { "confidence": 0.84, "value": "41980 Ann Arbor Rd. E Plymouth, NC" },
    "employerInfo.controlNumber": { "confidence": 0.92, "value": "" },
    "employerInfo.employerName": { "confidence": 0.98, "value": "Paystub Hero" },
    "employerInfo.ein": { "confidence": 0.97, "value": "39-3598535" },
    "employerInfo.employerZipCode": { "confidence": 0.97, "value": 48170 },
    "filingInfo.ombNumber": { "confidence": 0.97, "value": "1545-0029" },
    "filingInfo.verificationCode": { "confidence": 0.96, "value": "" },
    "other": { "confidence": 0.95, "value": "" },
    "federalTaxInfo.federalIncomeTax": { "confidence": 0.98, "value": 9467 },
    "federalTaxInfo.allocatedTips": { "confidence": 0.96, "value": 0 },
    "federalTaxInfo.socialSecurityTax": { "confidence": 0.97, "value": 4960 },
    "federalTaxInfo.medicareTax": { "confidence": 0.97, "value": 1160 },
    "employeeGeneralInfo.employeeNameSuffix": { "confidence": 0.96, "value": "" },
    "employeeGeneralInfo.employeeAddress": { "confidence": 0.13, "value": "41980 Ann Arbor Rd.\nE Plymouth, CA" },
    "employeeGeneralInfo.employeeLastName": { "confidence": 0.96, "value": "Jesan" },
    "employeeGeneralInfo.employeeZipCode": { "confidence": 0.98, "value": 48170 },
    "employeeGeneralInfo.firstName": { "confidence": 0.97, "value": "Abdur Rahaman" },
    "employeeGeneralInfo.ssn": { "confidence": 0.95, "value": "498-74-9874" },
    "federalWageInfo.socialSecurityTips": { "confidence": 0.96, "value": 0 },
    "federalWageInfo.wagesTipsOtherCompensation": { "confidence": 0.96, "value": 80000 },
    "federalWageInfo.medicareWagesTips": { "confidence": 0.96, "value": 80000 },
    "federalWageInfo.socialSecurityWages": { "confidence": 0.96, "value": 80000 },
    "nonqualifiedPlansIncom": { "confidence": 0.98, "value": 0 }
  },
  "error": null,
  "additionalInfo": null
}
Preview environment for app
Preview environment for app-catala
Preview environment for app-rails
Preview environment for app-flask
Preview environment for app-nextjs
Preview environment for app-documentai
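A small sketch for sanity-checking a response like the one in the Testing section: it recomputes the elapsed time from `createdAt`/`completedAt` (which appears to be what `totalProcessingTimeSeconds` reports) and flags low-confidence fields. The field subset is copied from that response; the 0.5 threshold and helper names are hypothetical, not part of the API.

```python
from datetime import datetime

# Subset of the response from the Testing section (values copied as-is).
RESPONSE = {
    "createdAt": "2026-05-07T21:03:47.833187Z",
    "completedAt": "2026-05-07T21:05:03Z",
    "totalProcessingTimeSeconds": 75.17,
    "fields": {
        "employerInfo.employerName": {"confidence": 0.98, "value": "Paystub Hero"},
        "employerInfo.employerAddress": {"confidence": 0.84, "value": "41980 Ann Arbor Rd. E Plymouth, NC"},
        "employeeGeneralInfo.employeeAddress": {"confidence": 0.13, "value": "41980 Ann Arbor Rd.\nE Plymouth, CA"},
        "federalWageInfo.wagesTipsOtherCompensation": {"confidence": 0.96, "value": 80000},
    },
}


def parse_utc(timestamp: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def elapsed_seconds(response: dict) -> float:
    """Seconds between createdAt and completedAt."""
    return (parse_utc(response["completedAt"]) - parse_utc(response["createdAt"])).total_seconds()


def low_confidence_fields(response: dict, threshold: float = 0.5) -> list[str]:
    """Names of fields whose confidence is below a (hypothetical) review threshold."""
    return sorted(
        name for name, field in response["fields"].items() if field["confidence"] < threshold
    )


print(round(elapsed_seconds(RESPONSE), 2))
print(low_confidence_fields(RESPONSE))
```

In this run the employee address (confidence 0.13, and a state that disagrees with the employer address) is the one value a reviewer would most likely want to double-check.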