This Python script extracts all footnotes from a Microsoft Word (.docx) or OpenDocument Text (.odt) file and lists them sequentially in a new .odt file.
It attempts to preserve basic text formatting (bold, italic, underline) and uses the Rich library for an enhanced, interactive command-line experience.
- Extracts footnotes from both .docx and .odt files.
- Preserves basic formatting: bold, italic, and underline.
- Generates a new .odt file containing a sequential list of all extracted footnote contents.
- Interactive CLI: prompts the user for the file to process.
- Readable terminal output with progress bars, using the Rich library.
- Processes multiple files in a single session.
- Generated output files are named after the input file (e.g., footnotes_da_original_name.odt) and placed in the output_footnotes/ directory.
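As an illustration of the naming rule above, here is a minimal sketch of how an output path could be derived from an input path. The helper name `output_path_for` and the constants `OUTPUT_DIR` and `PREFIX` are hypothetical and are not taken from the script; only the `footnotes_da_` prefix and the `output_footnotes/` directory come from the description above.

```python
from pathlib import Path

# Both values are taken from the README text; the constant names are assumptions.
OUTPUT_DIR = Path("output_footnotes")
PREFIX = "footnotes_da_"


def output_path_for(input_file, output_dir=OUTPUT_DIR):
    """Return the .odt path where the footnotes of `input_file` would be saved."""
    stem = Path(input_file).stem  # file name without directory or extension
    return Path(output_dir) / f"{PREFIX}{stem}.odt"


def ensure_output_dir(output_dir=OUTPUT_DIR):
    """Create the output directory if it does not exist yet."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
```

For example, `output_path_for("reports/original_name.docx")` yields a file named footnotes_da_original_name.odt inside output_footnotes/, whatever the input extension was.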
- Clone this repository or download the script.
- (Optional, but recommended) Create and activate a Python virtual environment:

      python -m venv venv
      # On Linux / macOS
      source venv/bin/activate
      # On Windows (Command Prompt / PowerShell)
      .\venv\Scripts\activate

- Install the required Python libraries:

      pip install python-docx odfpy rich
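To confirm that the installation worked, a small check along these lines may help. It is only a sketch and not part of the script; the function name `missing_packages` is hypothetical. Note that the pip package names differ from the module names you import: python-docx is imported as `docx` and odfpy as `odf`.

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in import statements
REQUIRED = {"python-docx": "docx", "odfpy": "odf", "rich": "rich"}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None  # None means the module is not importable
    ]
```

Calling `missing_packages()` inside the activated virtual environment should return an empty list once all three libraries are installed; any names it returns can be passed straight to `pip install`.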
- Navigate to the script's directory in your terminal.
- Run the script:

      python your_script_name.py

  (Replace your_script_name.py with the actual name of your Python file, e.g., extract_notes.py.)
- The script will then guide you through an interactive session:
  - It will prompt you to enter the path to the .docx or .odt file you wish to analyse. The prompt is shown in Italian and means "Enter the path of the .odt or .docx file to analyse (or 'esci' to quit)":

        Inserisci il percorso del file .odt o .docx da analizzare (o 'esci' per terminare)
        > your_document.odt

  - Enter the full or relative path and press Enter.
  - The script will process the file and save the results to a new .odt file inside the output_footnotes/ sub-directory.
  - To stop the script and exit, type esci (or exit / quit if you modify the script) at the prompt and press Enter.
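The session described above can be sketched as a simple prompt loop. This is an assumption about the general shape, not the script's actual code: it uses the built-in `input` instead of Rich so that it runs without extra dependencies, and the names `run_session`, `EXIT_WORDS`, and `SUPPORTED` are hypothetical.

```python
from pathlib import Path

EXIT_WORDS = {"esci"}          # add "exit" or "quit" here to accept them too
SUPPORTED = {".docx", ".odt"}  # input formats named in this README


def run_session(process, ask=input):
    """Prompt for file paths until an exit word is entered.

    `process` is called with a Path for every existing, supported file.
    Returns the number of files processed.
    """
    processed = 0
    while True:
        answer = ask("> ").strip()
        if answer.lower() in EXIT_WORDS:
            break
        path = Path(answer)
        if path.suffix.lower() not in SUPPORTED:
            print(f"Unsupported file type: {answer!r}")
            continue
        if not path.is_file():
            print(f"File not found: {answer!r}")
            continue
        process(path)
        processed += 1
    return processed
```

Passing the prompt function as a parameter keeps the loop easy to test, and accepting exit or quit only requires extending `EXIT_WORDS`.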
- Formatting: This script ONLY preserves basic bold, italic, and underline formatting. It does NOT currently preserve other formatting such as:
  - Fonts, colours, or text sizes.
  - Lists (bulleted or numbered).
  - Tables.
  - Images.
  - Hyperlinks.
  - Paragraph alignment, indentation, or spacing.
- Structure: The script generates a simple, sequential LIST of the footnote text in a new document. It does NOT replicate the footnote reference markers (¹, ², ³) within a copy of the main text, nor does it place the extracted notes into the page-footer layout of the output ODT.
- Style Complexity: Complex or inherited styles, especially within ODT documents, may not be fully or correctly interpreted beyond the basic formatting checks implemented.
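To make the formatting and style limitations above concrete, the following sketch reads footnotes straight from the word/footnotes.xml part of a .docx archive using only the standard library. It is not necessarily how this script works, and the function names are hypothetical. It inspects only direct run properties (`w:b`, `w:i`, `w:u`), so formatting inherited from a character or paragraph style goes undetected, which is one reason a basic check like this can miss styled text.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _is_on(rpr, tag):
    """True if the toggle property (e.g. 'b' or 'i') is set directly on the run."""
    if rpr is None:
        return False
    el = rpr.find(f"w:{tag}", NS)
    if el is None:
        return False
    # <w:b/> means on; <w:b w:val="0"/> or "false" switches it off again.
    return el.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def _is_underlined(rpr):
    """True if the run has a w:u element whose value is not 'none'."""
    if rpr is None:
        return False
    el = rpr.find("w:u", NS)
    # Assumption: a w:u element without w:val is treated as underlined.
    return el is not None and el.get(f"{{{W}}}val", "single") != "none"


def docx_footnote_runs(docx_path):
    """Return {footnote_id: [(text, bold, italic, underline), ...]}."""
    with zipfile.ZipFile(docx_path) as zf:
        if "word/footnotes.xml" not in zf.namelist():
            return {}  # the document has no footnotes part
        root = ET.fromstring(zf.read("word/footnotes.xml"))
    notes = {}
    for note in root.findall("w:footnote", NS):
        # Skip separator entries, which are not real footnotes.
        if note.get(f"{{{W}}}type") not in (None, "normal"):
            continue
        runs = []
        for run in note.iter(f"{{{W}}}r"):
            text = "".join(t.text or "" for t in run.findall("w:t", NS))
            if not text:
                continue
            rpr = run.find("w:rPr", NS)
            runs.append(
                (text, _is_on(rpr, "b"), _is_on(rpr, "i"), _is_underlined(rpr))
            )
        notes[note.get(f"{{{W}}}id")] = runs
    return notes
```

Each footnote comes back as a list of text runs with three flags, which is enough to rebuild bold, italic, and underline spans; everything listed above as unsupported (fonts, lists, tables, images, hyperlinks) is ignored by this sketch as well.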
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
