FileJanitor is a package that cleans, standardizes, and organizes file names and folder structures. Its goal is to automate common file system tasks, so the user can keep their documents tidy.
It provides utility functions for renaming files, standardizing naming conventions, ordering files, and restructuring directories. Unless otherwise specified, each function operates on all files in the specified folder.
FileJanitor is a high-level package built on top of Python standard-library modules such as os, pathlib, and shutil. These modules provide low-level tools for working with files, but they do not offer built-in functions for tasks such as standardizing file names, reordering files, or flattening directory structures. FileJanitor wraps these low-level capabilities in simple, reusable functions.
Check deployment at: https://test.pypi.org/project/FileJanitor/
Run the following in your terminal to install the FileJanitor package:
pip install -i https://test.pypi.org/simple/ FileJanitor
replace_pattern replaces the input pattern in file names with a new pattern or character. This function will:
- Support replacing characters or strings (_ -> &)
- Capitalize the first word of the file name.
- Apply changes to all files in the folder.
| Parameter | Type | Description |
|---|---|---|
| pattern | str | The substring or character pattern to search for in filenames. |
| replacement | str | The string or character to replace the pattern with in filenames. |
| dir | str, optional | Path to the directory containing files to be modified. |
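To make the renaming behaviour described above concrete, here is a minimal sketch built on pathlib. It is not FileJanitor's implementation: the name `sketch_replace_pattern`, the default of `"."` for `dir`, and the choice to skip a rename when the target name already exists are all assumptions made for illustration. It covers only the substring replacement and leaves out the capitalization step.

```python
from pathlib import Path


def sketch_replace_pattern(pattern: str, replacement: str, dir: str = ".") -> list[str]:
    """Illustrative only: replace `pattern` in the stem of every file in `dir`."""
    renamed = []
    # sorted() materializes the listing first, so renames do not disturb iteration.
    for path in sorted(Path(dir).iterdir()):
        if not path.is_file():
            continue  # leave subdirectories untouched
        new_stem = path.stem.replace(pattern, replacement)
        if new_stem == path.stem:
            continue  # pattern does not occur in this name
        target = path.with_name(new_stem + path.suffix)
        if target.exists():
            continue  # assumption: skip rather than overwrite an existing file
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Working on `path.stem` rather than the full name keeps the extension out of the replacement, so a pattern such as `"."` would not damage `.txt`.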
from FileJanitor import replace_pattern
replace_pattern("_", " & ", "docs/") # Renames files like: "file_janitors.txt" -> "file & janitors.txt"
replace_pattern("_", " ") # Uses current directory, renames files like: "my_file.txt" -> "my file.txt"
standardize_filename standardizes file names according to consistent formatting rules. This can be helpful when dealing with large collections of inconsistently named files.
- Replaces spaces, dashes (-), and invalid characters with the separator (an underscore by default).
- Removes duplicate punctuation (..).
- Preserves file extensions.
| Parameter | Type | Description |
|---|---|---|
| dir | str | Path to the directory containing files to be standardized. |
| case | str, optional | Desired casing for filenames (default is 'lower'). |
| sep | str, optional | Character to use as the separator between words in filenames (default is '_'). |
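The rules above can be sketched for a single file name as follows. This is a hedged illustration, not the package's code: `sketch_standardize_name` is a hypothetical helper, it treats every character other than ASCII letters and digits as "invalid", and it handles only the 'lower' and 'title' values of `case` that appear in this README.

```python
import re
from pathlib import Path


def sketch_standardize_name(filename: str, case: str = "lower", sep: str = "_") -> str:
    """Illustrative only: return a standardized version of one file name."""
    if not sep:
        raise ValueError("sep must be a non-empty string")
    path = Path(filename)
    stem, suffix = path.stem, path.suffix
    # Collapse each run of non-alphanumeric characters (spaces, dashes,
    # repeated dots, ...) into one separator, then trim separators at the ends.
    stem = re.sub(r"[^A-Za-z0-9]+", lambda _: sep, stem).strip(sep)
    if case == "lower":
        stem = stem.lower()
    elif case == "title":
        stem = sep.join(word.capitalize() for word in stem.split(sep))
    # The extension is appended unchanged so it is always preserved.
    return stem + suffix


print(sketch_standardize_name("my file NAME.txt", case="title", sep="-"))
# → My-File-Name.txt
```

A real implementation would apply such a function to each file in `dir` and rename it; that loop is omitted here to keep the naming rules in focus.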
from FileJanitor import standardize_filename
standardize_filename("data/", case="title", sep="-") # Renames files in data/ like: "my file NAME.txt" -> "My-File-Name.txt"
index_files orders the files in a folder according to a user-defined order by prefixing each file name with its position.
| Parameter | Type | Description |
|---|---|---|
| dir | str | Path to the target directory containing the files that need to be indexed. |
| order | list | List of filenames defining the desired order. |
| unlisted | str, optional | How to handle files not in order: "hide" moves them to a subdirectory named "_unlisted"; "keep" leaves them at the end with sequential numbering. |
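One way the indexing described in the table could work is sketched below. This is an assumption-laden illustration rather than the package's implementation: the name `sketch_index_files`, the default of "keep" for `unlisted`, the two-digit zero-padded prefix (inferred from names like `01_intro.pdf` in this README), the alphabetical ordering of unlisted files, and the silent skipping of names in `order` that do not exist are all choices made here.

```python
from pathlib import Path


def sketch_index_files(dir: str, order: list[str], unlisted: str = "keep") -> None:
    """Illustrative only: prefix files with a two-digit index following `order`."""
    root = Path(dir)
    # Keep only the requested names that actually exist as files.
    listed = [name for name in order if (root / name).is_file()]
    # Everything else is "unlisted"; sort it so the result is deterministic.
    others = sorted(
        p.name for p in root.iterdir() if p.is_file() and p.name not in order
    )

    if unlisted == "hide":
        hidden = root / "_unlisted"
        hidden.mkdir(exist_ok=True)
        for name in others:
            (root / name).rename(hidden / name)
        others = []  # hidden files receive no index

    # "keep" leaves unlisted files in place and numbers them after the listed ones.
    for position, name in enumerate(listed + others, start=1):
        (root / name).rename(root / f"{position:02d}_{name}")
```

Both lists are computed before any rename happens, so the numbering never depends on names that were changed earlier in the same run.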
from FileJanitor import index_files
index_files("my_thesis", order=["intro.pdf", "analysis.pdf", "discussion.pdf", "conclusions.pdf"])
Before:
my_thesis/
├── discussion.pdf
├── intro.pdf
├── conclusions.pdf
└── analysis.pdf
After:
my_thesis/
├── 01_intro.pdf
├── 02_analysis.pdf
├── 03_discussion.pdf
└── 04_conclusions.pdf
flatten moves all files from nested subfolders into a single target directory.
- By default, only files directly inside 'nested_directory' are moved.
- If 'recursive' is True, files from all nested subdirectories are also moved.
| Parameter | Type | Description |
|---|---|---|
| nested_directory | str | Root directory containing files and nested subdirectories to flatten. |
| output_directory | str, optional | Directory where flattened files will be moved. |
| recursive | bool, optional | Whether to move files from nested subdirectories recursively (default is False). |
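The difference between the default and recursive modes can be sketched with pathlib and shutil. As with any sketch, this is not FileJanitor's code: `sketch_flatten` is a hypothetical name, the default of `"."` for `output_directory` is inferred from the usage example in this README, and skipping files whose name already exists in the target is an assumption about collision handling.

```python
import shutil
from pathlib import Path


def sketch_flatten(
    nested_directory: str, output_directory: str = ".", recursive: bool = False
) -> list[str]:
    """Illustrative only: move files from `nested_directory` into one folder."""
    source = Path(nested_directory)
    target = Path(output_directory)
    target.mkdir(parents=True, exist_ok=True)
    # iterdir() sees only direct children; rglob("*") walks every nested level.
    candidates = source.rglob("*") if recursive else source.iterdir()
    moved = []
    # sorted() collects all paths before any file is moved.
    for path in sorted(p for p in candidates if p.is_file()):
        destination = target / path.name
        if destination.exists():
            continue  # assumption: never overwrite a file that is already there
        shutil.move(str(path), str(destination))
        moved.append(path.name)
    return moved
```

Because only files are moved, the emptied subdirectories stay behind in this sketch; whether the package removes them is not stated in this README.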
from FileJanitor import flatten
flatten("data/", recursive=True)
Before:
cwd/
└── data/
    ├── raw/
    │   └── file1.csv
    └── processed/
        └── file2.csv
After:
cwd/
├── file1.csv
└── file2.csv
To run the test suite, first install the package with test dependencies:
pip install -e ".[tests]"
pytest
To run tests for a specific file:
pytest tests/unit/test_replace_pattern.py
Full documentation is available at: https://ubc-mds.github.io/DSCI_524_FileJanitor/
The documentation includes:
# Clone the repository
git clone https://github.com/UBC-MDS/DSCI_524_FileJanitor.git
cd DSCI_524_FileJanitor
# Install in editable mode with all dependencies
pip install -e ".[dev,docs,tests]"
# Install documentation dependencies
pip install -e ".[docs]"
# Build the reference documentation
quartodoc build
# Preview documentation in your browser
quarto preview
Documentation is automatically deployed to GitHub Pages when changes are merged to the main branch.
Deployment workflow:
- A team member creates a pull request (PR) with documentation changes.
- The PR is reviewed and approved by a team member.
- The PR is merged to main.
- GitHub Actions runs .github/workflows/publish.yml.
- The documentation is built and deployed to the gh-pages branch.
- The changes go live at https://ubc-mds.github.io/DSCI_524_FileJanitor/
If you use FileJanitor in your work, please cite it as:
@software{filejanitor,
title={FileJanitor: A Python package for cleaning and organizing file systems},
author={Brown, Sean and Lokanc, Sam and Duran, Rabin and Alvarez, Luis Alonso},
year={2026},
url={https://github.com/UBC-MDS/DSCI_524_FileJanitor}
}
- Sean Brown (@SeanBrown12345)
- Sam Lokanc (@SamLokanc)
- Rabin Duran (@rabin0208)
- Luis Alonso Alvarez (@luisalonso8)
- Copyright © 2026 Sean Brown, Sam Lokanc, Rabin Duran, Luis Alonso Alvarez.
- Free software distributed under the MIT License.