Time Limit: 45 minutes
Welcome to the medical document demographics extraction challenge! Your task is to implement pattern matching and text extraction logic to pull demographic information from various types of medical documents.
You need to implement functions in src/lib/extractor.ts that can extract the following demographic information from medical documents:
- Patient Name (First Name, Last Name)
- Date of Birth (standardized to YYYY-MM-DD format)
- Gender (standardized to "Male" or "Female")
- Address (full address string)
- Phone Number (various formats)
- Medical Record Number (MRN/MR Number)
- Install dependencies: npm install
- Start the development server: npm run dev
- Open your browser: Navigate to http://localhost:3000
- Start implementing: Open src/lib/extractor.ts and implement the TODOs
src/
├── lib/
│   ├── mock-data.ts   # Sample medical documents with expected results
│   └── extractor.ts   # 🎯 YOUR IMPLEMENTATION GOES HERE
├── app/
│   ├── api/
│   │   ├── documents/route.ts   # API to get test documents
│   │   └── extract/route.ts     # API to run extraction and calculate accuracy
│   └── page.tsx   # Testing interface
- extractName() - Extract patient name from various formats
- extractDateOfBirth() - Extract and standardize date of birth
- extractGender() - Extract and standardize gender
- extractAddress() - Extract full address information
- extractPhoneNumber() - Extract phone numbers in various formats
- extractMedicalRecordNumber() - Extract MRN/Medical Record Numbers
- extractDemographics() - Main function that orchestrates all extraction
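One way to structure the orchestration in extractDemographics() is to run each field extractor independently, so a failure in one does not discard the others. The sketch below is only an illustration: the `Extractor` type, the `runExtractors` helper, and the returned field name are assumptions, not the signatures actually defined in src/lib/extractor.ts, and only one field is wired up for brevity.

```typescript
// Hypothetical extractor signature: raw document text in, value or null out.
type Extractor = (text: string) => string | null;

// Run every extractor independently so one failure does not discard the rest.
function runExtractors<K extends string>(
  text: string,
  extractors: Record<K, Extractor>,
): Record<K, string | null> {
  const result = {} as Record<K, string | null>;
  for (const key of Object.keys(extractors) as K[]) {
    try {
      result[key] = extractors[key](text);
    } catch {
      result[key] = null; // treat a crashing extractor as "not found"
    }
  }
  return result;
}

// Simplified stand-in for extractMedicalRecordNumber().
const extractMedicalRecordNumber: Extractor = (text) => {
  const match = text.match(
    /(?:MRN|Medical Record #|MR Number):\s*([A-Z0-9-]+)/i,
  );
  return match?.[1] ?? null;
};

// Hypothetical orchestrator; add the remaining extractors to the map.
function extractDemographics(text: string) {
  return runExtractors(text, {
    medicalRecordNumber: extractMedicalRecordNumber,
  });
}
```

Returning null for a missing field, rather than throwing, keeps the result shape stable when a document omits a label.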
The challenge includes 3 types of medical documents with varying formats:
Patient: John Smith
DOB: 03/15/1985
Gender: Male
MRN: MR-789456123
Address: 123 Main Street, Springfield, IL 62701
Phone: (555) 123-4567

Patient Name: Maria Rodriguez
Date of Birth: July 22, 1992
Gender: Female
Medical Record #: MR-456789012
Contact: 456 Oak Avenue, Denver, CO 80203, Tel: 555-987-6543

PT: Robert Chen
D.O.B: 12/04/1968
Sex: M
MR Number: MR-123654789
Address: 789 Pine Road, Austin, TX 78701
Phone Number: (555) 456-7890
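The samples above label the same field in different ways: gender appears as "Gender" or "Sex" (with "M" as an abbreviation), and the address appears under "Address" or inside a "Contact" line that also carries a phone number. A minimal sketch of handling both cases follows. The function bodies are illustrative assumptions, and the address pattern assumes US-style addresses that end in a two-letter state code and a ZIP code, as in the samples.

```typescript
// Normalize "Male"/"Female"/"M"/"F" under either label to "Male" or "Female".
function extractGender(text: string): "Male" | "Female" | null {
  const match = text.match(/\b(?:Gender|Sex):\s*(male|female|m|f)\b/i);
  const value = match?.[1]?.toLowerCase();
  if (!value) return null;
  return value.startsWith("m") ? "Male" : "Female";
}

// Capture everything from the label up to the state code and ZIP, which
// leaves out trailing content such as ", Tel: ..." on a Contact line.
function extractAddress(text: string): string | null {
  const match = text.match(
    /\b(?:Address|Contact):[ \t]*(.+?\b[A-Z]{2}[ \t]+\d{5}(?:-\d{4})?)/,
  );
  return match?.[1] ?? null;
}
```

With the second sample, extractAddress() returns "456 Oak Avenue, Denver, CO 80203", and extractGender() maps "Sex: M" in the third sample to "Male".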
- Use the browser interface at http://localhost:3000
- Select a test document or paste custom text
- Click "Extract" to test your implementation
- View extraction results and accuracy scores
- Use "Run All Tests" to test against all documents
- GET /api/documents - Get available test documents
- POST /api/extract - Extract demographics from text
- GET /api/extract/test - Run extraction on all test documents
Your implementation will be evaluated on:
- Accuracy: How well your extraction matches expected results
- Pattern Handling: Ability to handle different text formats
- Edge Cases: Handling of missing or malformed data
- Code Quality: Clean, readable implementation
- Completeness: Implementation of all required functions
Target Accuracy: Aim for >80% overall accuracy across all test documents
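To make the accuracy target concrete, here is one plausible way a per-document score could be computed: the fraction of expected fields whose extracted value matches exactly. This scoring rule and the `fieldAccuracy` name are assumptions for illustration; the actual calculation lives in the extract API route and may differ.

```typescript
type Fields = Record<string, string | null>;

// Fraction of expected fields whose extracted value matches exactly after
// trimming. Assumed scoring rule, not necessarily the one the app uses.
function fieldAccuracy(expected: Fields, actual: Fields): number {
  const keys = Object.keys(expected);
  if (keys.length === 0) return 1; // nothing expected, nothing missed
  let correct = 0;
  for (const key of keys) {
    const want = expected[key]?.trim() ?? null;
    const got = actual[key]?.trim() ?? null;
    if (want === got) correct += 1;
  }
  return correct / keys.length;
}
```

Under this rule, matching three of four expected fields gives 0.75, which would fall short of the 80% target.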
// Name patterns
/(?:Patient|Patient Name|PT):\s*([A-Za-z]+)\s+([A-Za-z]+)/
// Date patterns
/(?:DOB|Date of Birth|D\.O\.B):\s*(\d{1,2}\/\d{1,2}\/\d{4})/
/(?:Date of Birth):\s*([A-Za-z]+)\s+(\d{1,2}),?\s+(\d{4})/
// Phone patterns
/(?:Phone|Tel|Phone Number):\s*\(?(\d{3})\)?[\s-]?(\d{3})[\s-]?(\d{4})/
// Convert MM/DD/YYYY to YYYY-MM-DD
const [month, day, year] = dateStr.split("/");
return `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`;
// Convert "July 22, 1992" to "1992-07-22"
const months = { January: "01", February: "02" /* ... */ };
- Start with specific patterns (exact keyword matches)
- Use capture groups to extract the actual data
- Handle case-insensitive matching with the /i flag
- Test patterns against all sample documents
- Add fallback patterns for edge cases
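The date hints and tips above can be combined into a single function: try the numeric pattern first, fall back to the written-month pattern, and return null when neither matches. This is a sketch under assumptions. The constant names are hypothetical, the written-month pattern is widened to accept all three DOB labels, and no calendar validation is done (a month of 13 would pass through).

```typescript
// Lowercased month names so the lookup is case-insensitive.
const MONTHS: Record<string, string> = {
  january: "01", february: "02", march: "03", april: "04",
  may: "05", june: "06", july: "07", august: "08",
  september: "09", october: "10", november: "11", december: "12",
};

const NUMERIC_DOB =
  /(?:DOB|Date of Birth|D\.O\.B\.?):\s*(\d{1,2})\/(\d{1,2})\/(\d{4})/i;
const WRITTEN_DOB =
  /(?:DOB|Date of Birth|D\.O\.B\.?):\s*([A-Za-z]+)\s+(\d{1,2}),?\s+(\d{4})/i;

function extractDateOfBirth(text: string): string | null {
  // Specific pattern first: MM/DD/YYYY.
  const numeric = text.match(NUMERIC_DOB);
  if (numeric) {
    const [, month = "", day = "", year = ""] = numeric;
    return `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`;
  }
  // Fallback: a written month such as "July 22, 1992".
  const written = text.match(WRITTEN_DOB);
  if (written) {
    const [, monthName = "", day = "", year = ""] = written;
    const month = MONTHS[monthName.toLowerCase()];
    if (month) return `${year}-${month}-${day.padStart(2, "0")}`;
  }
  return null; // missing or unrecognized date
}
```

For the three sample documents this yields "1985-03-15", "1992-07-22", and "1968-12-04".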
Optional extensions, if time permits:
- Handle nickname variations (e.g., "Bob" for "Robert")
- Extract middle names or initials
- Parse international phone formats
- Handle multiple addresses (home vs. work)
- Extract additional demographics (age, occupation, etc.)
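As an illustration of the middle-name item above, the name pattern can be extended with an optional middle group. This is a sketch, not the expected solution: the `PatientName` shape is hypothetical, and the pattern assumes the whole name sits on the same line as its label.

```typescript
// Hypothetical result shape with an optional middle name or initial.
interface PatientName {
  firstName: string;
  middleName: string | null;
  lastName: string;
}

function extractName(text: string): PatientName | null {
  // [ \t] rather than \s keeps the match on one line, so the label of the
  // next line (e.g. "DOB") is never mistaken for a last name.
  const match = text.match(
    /\b(?:Patient Name|Patient|PT):[ \t]*([A-Za-z'-]+)(?:[ \t]+([A-Za-z]+\.?))?[ \t]+([A-Za-z'-]+)/i,
  );
  if (!match) return null;
  const [, firstName = "", middle, lastName = ""] = match;
  return {
    firstName,
    // Drop the trailing period from an initial such as "J."
    middleName: middle ? middle.replace(/\.$/, "") : null,
    lastName,
  };
}
```

With "PT: Robert J. Chen" this returns first name "Robert", middle name "J", and last name "Chen". For the two-word sample names the middle name is null.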
Debugging tips:
- Use console.log() to debug your regex patterns
- Test individual functions before integrating
- Use the browser's developer tools to inspect API responses
- Check the terminal for any server-side errors
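When debugging a pattern, it can help to run it against every sample at once and see which documents it misses. The helper below is a hypothetical convenience, not part of the project. It assumes the pattern has no /g flag, since String.prototype.match() omits capture groups for global patterns.

```typescript
interface PatternReport {
  sampleIndex: number;
  matched: boolean;
  groups: string[]; // capture groups, empty when the pattern did not match
}

// Apply one pattern to each sample and record what was captured.
function tryPattern(pattern: RegExp, samples: string[]): PatternReport[] {
  return samples.map((sample, sampleIndex) => {
    const match = sample.match(pattern);
    return {
      sampleIndex,
      matched: match !== null,
      groups: match ? match.slice(1) : [],
    };
  });
}
```

Passing the returned array to console.log() or console.table() shows at a glance which sample formats still need a fallback pattern.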
Focus on:
- Completing the core functionality within the time limit
- Testing your implementation against the provided documents
- Handling the most common patterns first
- Clean, readable code with appropriate comments
Good luck!
npm run dev # Start development server
npm run build # Build for production
npm run start # Start production server
npm run lint # Run ESLint