This folder contains scripts for extracting data from Word forms and transferring it to Excel spreadsheets. These scripts include advanced features like checkbox detection and multi-destination output.
Purpose (PFVPU_Script.py): Extracts data from Word content controls AND checkboxes, and writes it to a single Excel destination.
Use when: You have a form with text fields and checkboxes, and need to transfer data to one Excel file/sheet.
Key Features:
- Extracts content from Word document content controls
- Detects checked checkboxes (☒) and records them as "Yes"
- Skips empty fields and placeholder text
- Handles merged cells in Excel safely
- Writes to the first empty row
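As an illustration of the extraction step listed above, here is a minimal sketch that reads content controls straight from a document's XML using only the standard library. Whether PFVPU_Script.py works this way is not stated here; the function names `extract_content_controls` and `read_document_xml` are hypothetical, and treating `w:showingPlcHdr` as the placeholder signal is an assumption.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def read_document_xml(docx_path):
    """Return the raw XML of the main document part of a .docx file."""
    with zipfile.ZipFile(docx_path) as zf:
        return zf.read("word/document.xml")


def extract_content_controls(document_xml):
    """Map each content control's alias (or tag) to its text.

    Hypothetical helper: skips controls that are empty or still
    showing their placeholder text.
    """
    root = ET.fromstring(document_xml)
    data = {}
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        props = sdt.find("w:sdtPr", NS)
        content = sdt.find("w:sdtContent", NS)
        if props is None or content is None:
            continue
        # Prefer the alias; fall back to the tag.
        name = None
        for child in ("w:alias", "w:tag"):
            element = props.find(child, NS)
            if element is not None and element.get(f"{{{W_NS}}}val"):
                name = element.get(f"{{{W_NS}}}val")
                break
        text = "".join(t.text or "" for t in content.iter(f"{{{W_NS}}}t")).strip()
        showing_placeholder = props.find("w:showingPlcHdr", NS) is not None
        if name and text and not showing_placeholder:
            data[name] = text
    return data
```

A call such as `extract_content_controls(read_document_xml(docx_path))` would then yield a dictionary keyed by alias, ready to be matched against the Excel headers.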
Purpose (multiple_outputs.py): Extracts data from Word (including checkboxes) and writes it to MULTIPLE Excel files/sheets in a single run.
Use when: You need to send the same extracted data to multiple spreadsheets or different sheets in the same workbook.
Key Features:
- All features of PFVPU_Script.py
- Writes to multiple destinations in one execution
- Validates each destination before writing
- Provides detailed success/failure reporting
- Handles different workbooks and sheet names
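The multi-destination behaviour described above can be sketched as a loop that attempts each destination independently and records the outcome, so one bad path does not stop the rest. This is only an illustration: `write_to_destinations`, `summarize`, and the `write_one` callback are hypothetical names, not functions taken from multiple_outputs.py.

```python
def write_to_destinations(data, destinations, write_one):
    """Try every (excel_path, sheet_name) pair and collect the results.

    write_one is a caller-supplied function (hypothetical) that writes
    the data to one destination and returns the number of values written.
    """
    results = []
    for excel_path, sheet_name in destinations:
        try:
            written = write_one(data, excel_path, sheet_name)
            results.append((excel_path, sheet_name, True, f"Wrote {written} values"))
        except Exception as exc:  # report the failure and keep going
            results.append((excel_path, sheet_name, False, str(exc)))
    return results


def summarize(results):
    """Count successes and failures for the final summary."""
    succeeded = sum(1 for result in results if result[2])
    return {
        "total": len(results),
        "succeeded": succeeded,
        "failed": len(results) - succeeded,
    }
```

Catching the exception per destination is what makes a per-destination success/failure report possible.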
- Python 3.7 or higher installed
- Required packages installed (see main requirements.txt)
- Word document (.docx) with content controls and/or checkboxes
- Excel workbook(s) (.xlsx) with headers in row 2
# From the repository root
pip install -r requirements.txt
- Open the script in your text editor or IDE
- Update the file paths at the top:
# === INPUTS ===
docx_path = r"C:\path\to\your\form.docx"
excel_path = r"C:\path\to\your\output.xlsx"
sheet_name = "Sheet 1"  # Change to your sheet name
- Ensure your Word document has:
- Content controls with aliases/tags matching Excel headers
- Checkboxes (☒ for checked, ☐ for unchecked)
- Checkbox text labels that match Excel headers if you want to capture checkbox states
- Ensure your Excel file has:
- Headers in row 2
- Column headers matching content control aliases
- Column headers matching checkbox labels (optional)
- Run the script:
python PFVPU/PFVPU_Script.py
- Check the output:
Data transferred successfully (checkboxes handled, merged cells safe).
- Open the script and update the input path:
# === INPUTS ===
docx_path = r"C:\path\to\your\form.docx"
- Configure multiple destinations:
# Define multiple output destinations
# Format: (excel_path, sheet_name)
output_destinations = [
    (r"C:\path\to\file1.xlsx", "Sheet 1"),
    (r"C:\path\to\file1.xlsx", "Sheet 2"),
    (r"C:\path\to\file2.xlsx", "Data"),
]
You can add as many destinations as needed!
- Run the script:
python PFVPU/multiple_outputs.py
- Review the detailed output:
============================================================
DOCX TO MULTIPLE EXCEL FILES
============================================================
Extracting data from: Test Form PHAC-PFVPU.docx
✓ Extracted 15 field(s)
Writing to 3 destination(s)...
→ output.xlsx / 'Sheet 1'
✓ Wrote 12 values to row 3
→ output.xlsx / 'Sheet 2'
✓ Wrote 8 values to row 3
→ output2.xlsx / 'Data'
✓ Wrote 15 values to row 3
============================================================
SUMMARY
============================================================
Total destinations: 3
Successfully written: 3
Failed: 0
============================================================
✓ All data transferred successfully!
Simply add more tuples to the list:
output_destinations = [
    (r"C:\path\to\main_database.xlsx", "Raw Data"),
    (r"C:\path\to\backup.xlsx", "Backup"),
    (r"C:\path\to\reports.xlsx", "Monthly"),
    (r"C:\path\to\archive\2024.xlsx", "Q1"),
]
The scripts look for these checkbox symbols: ☒ (checked) and ☐ (unchecked)
To modify which checkboxes are captured, edit the pattern in the script:
# Current pattern
pattern = re.compile(r'(☒|☐)\s*(.+)')
for mark, txt in paragraphs:
    if mark == '☒':  # Only take checked boxes
        clean_text = txt.strip()
        if clean_text:
            data[clean_text] = "Yes"

# Default: Store "Yes" for checked boxes
data[clean_text] = "Yes"
# Alternative: Store "Checked", "True", "X", etc.
data[clean_text] = "X"
data[clean_text] = True

# Default (starts at row 3)
next_row = find_first_empty_row(ws, start_row=3)
# Custom (e.g., row 4)
next_row = find_first_empty_row(ws, start_row=4)

Solutions:
- Ensure checkboxes are using the correct symbols: ☒ (checked) or ☐ (unchecked)
- Verify checkbox labels match Excel headers exactly
- Check that checkboxes are in the main document (not in headers/footers)
Solution:
- Verify the sheet name is spelled correctly (case-sensitive)
- Check the script output for available sheet names
- Update the sheet_name variable to match exactly
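Because sheet names are case-sensitive, a near miss such as "sheet 1" versus "Sheet 1" is easy to overlook. The sketch below shows one way to spot it; `suggest_sheet_name` is a hypothetical helper, and the list of available names is assumed to come from the workbook (for example, the `sheetnames` attribute of an openpyxl workbook, if that is the library in use).

```python
def suggest_sheet_name(requested, available):
    """Return the sheet name to use, or None if nothing resembles it.

    An exact match wins. Otherwise a name that differs only by case or
    surrounding whitespace is returned as a suggestion.
    """
    if requested in available:
        return requested
    folded = requested.strip().casefold()
    for name in available:
        if name.strip().casefold() == folded:
            return name
    return None
```

If the returned name differs from the one configured, copy it into sheet_name exactly as returned.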
Solution:
- Verify file paths are correct
- Use raw strings (prefix with r)
- Ensure files exist before running the script
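A short pre-flight check can confirm the configured paths before any data is extracted. The sketch below is an illustration only; `missing_files` is a hypothetical helper and the paths are the placeholder paths used elsewhere in this README.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Raw strings keep backslashes literal, so "\t" in a Windows path
# is not read as a tab character.
configured = [
    r"C:\path\to\your\form.docx",
    r"C:\path\to\your\output.xlsx",
]

for path in missing_files(configured):
    print(f"File not found: {path}")
```

Running this before the main script narrows a failure down to a specific path rather than a generic error.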
Analysis:
- Check the detailed output to see which destinations failed
- Common causes:
  - File doesn't exist
  - Sheet name is wrong
  - File is open in Excel (close it)
  - Incorrect path
Solution: The script handles merged cells automatically. If issues persist:
- Avoid having headers in merged cells
- Ensure data columns are not merged
- Test with non-merged cells first
Possible causes:
- Content control aliases don't match headers exactly
- Extra spaces in headers or aliases
- Case sensitivity (e.g., "Name" vs "name")
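The causes listed above can be surfaced automatically by comparing the form keys with the headers after normalizing whitespace and case. This is a diagnostic sketch only; `normalize` and `diagnose_header_mismatches` are hypothetical names and are not part of the scripts.

```python
def normalize(name):
    """Collapse whitespace and ignore case for comparison purposes."""
    return " ".join(str(name).split()).casefold()


def diagnose_header_mismatches(headers, data_keys):
    """Split unmatched form keys into near matches and true misses.

    Returns (near_matches, unmatched): near_matches pairs each key with
    the header it almost equals; unmatched lists keys with no candidate.
    """
    exact = {h for h in headers if h is not None}
    by_normalized = {}
    for header in exact:
        by_normalized.setdefault(normalize(header), header)

    near_matches, unmatched = [], []
    for key in data_keys:
        if key in exact:
            continue  # exact match, nothing to report
        candidate = by_normalized.get(normalize(key))
        if candidate is not None:
            near_matches.append((key, candidate))
        else:
            unmatched.append(key)
    return near_matches, unmatched
```

A near match means the alias or the header needs a small edit (a stray space or a capital letter), while an unmatched key means the column is missing from the sheet.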
Solution:
- Print the headers and data keys to debug. Add this after the line that reads the headers:
headers = [cell.value for cell in ws[2]]
print(f"Excel headers: {headers}")
print(f"Form data keys: {list(data.keys())}")
- Enable Developer tab: File > Options > Customize Ribbon > Developer
- For each content control:
  - Click the control
  - Click "Properties"
  - Set "Tag" or "Alias" to match Excel header
  - Click OK
- Format: .xlsx files only
- Headers: Row 2 by default
- Matching: Column names must match Word control aliases/checkbox labels exactly
- State: Close Excel files before running scripts
- Merged Cells: Supported, but avoid in header row
- Multiple sheets: Supported (especially useful with multiple_outputs.py)
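With headers in row 2, data starts at row 3, and each new form goes into the first empty row from there. The sketch below illustrates that search on plain tuples of cell values. It is not the scripts' own `find_first_empty_row`, whose implementation is not shown here; `first_empty_row` is a hypothetical stand-in.

```python
def first_empty_row(rows, start_row=3):
    """Return the 1-based number of the first empty row at or after start_row.

    rows is an iterable of tuples of cell values, where the first tuple
    corresponds to sheet row 1. A row counts as empty when every value
    is None or an empty string.
    """
    row_number = 0
    for row_number, values in enumerate(rows, start=1):
        if row_number >= start_row and all(v in (None, "") for v in values):
            return row_number
    # No empty row found: use the row after the last one,
    # but never a row above start_row.
    return max(row_number + 1, start_row)
```

If the workbook is opened with openpyxl (an assumption), `ws.iter_rows(values_only=True)` yields tuples in this shape and can be passed in directly.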
- Processing forms into a single database
- Simple data transfer tasks
- Testing and validation
- Writing to multiple departmental databases
- Creating backups automatically
- Distributing data to different teams
- Writing to different sheets for different data types
- Maintaining archive copies
- Create Word form with content controls and checkboxes
- Create Excel file with matching headers
- Configure PFVPU_Script.py with file paths
- Run script and verify data transfer
- Use on production forms
- Set up Word form as above
- Create multiple Excel files/sheets as needed
- Configure output_destinations list in multiple_outputs.py
- Test with one form
- Review detailed output for any failures
- Fix any path/sheet name issues
- Run on production forms
- Close all Excel files before running
- Test with sample data first
- Use multiple_outputs.py even for a single destination if you want detailed reporting
- Checkbox text must appear immediately after the checkbox symbol
- Empty checkboxes (☐) are not recorded in Excel (no value written)
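The last two points can be illustrated with a small sketch of checkbox detection on paragraph text. It is a variation on the pattern shown earlier, not the scripts' exact code: the label part is `[^☒☐]+` instead of `.+` so that several checkboxes on one line are captured separately, and `extract_checked_labels` is a hypothetical name.

```python
import re

# A checkbox symbol followed by its label; the label stops at the
# next checkbox symbol (assumption: labels never contain ☒ or ☐).
CHECKBOX_PATTERN = re.compile(r"(☒|☐)\s*([^☒☐]+)")


def extract_checked_labels(paragraph_texts, checked_value="Yes"):
    """Return {label: checked_value} for every checked box found."""
    data = {}
    for text in paragraph_texts:
        for mark, label in CHECKBOX_PATTERN.findall(text):
            label = label.strip()
            if mark == "☒" and label:  # unchecked boxes are skipped
                data[label] = checked_value
    return data


paragraphs = [
    "☒ Consent given",
    "☐ Needs follow-up",
    "☒ Email ☐ Phone",
]
print(extract_checked_labels(paragraphs))
```

Only the labels of checked boxes end up in the result, which is why unchecked boxes leave their Excel columns blank.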