A suite of Python tools for analyzing and managing Arc XP content that is low value and a candidate for deletion. This repository provides script templates for identifying redirects, unpublished wires content, and unused published photos. All modules feature parallel processing, rate limiting, and comprehensive logging for content analysis workflows.
You are encouraged to clone the repository, run the script templates in a local development environment, and customize them to conform to your organization's unique content and business rules.
- redirects_report/ - Redirects analysis and HTTP status validation
  - delete_redirects.py - Deletes redirects
  - delete_redirects_parallel_processor.py - Optimized parallel processing engine with dynamic worker scaling
  - identify_redirects.py - Identifies redirects within date ranges and validates HTTP status codes
  - identify_redirects_parallel_processor.py - Optimized parallel processing engine with dynamic worker scaling
  - status_checker.py - Async HTTP status checking (200/404 validation) for redirect URLs
- wires_report/ - Unpublished story wires content analysis and cleanup
  - identify_wires.py - Identifies unpublished wires content for potential deletion
  - identify_wires_parallel_processor.py - Parallel processing engine with Elasticsearch query optimization for wires identification
  - delete_wires.py - Deletes wires
  - delete_wires_parallel_processor.py - Parallel processing engine for wires deletion
- images_report/ - Photo Center unused published image analysis and management
  - published_photo_analysis.py - Analyzes published photos to identify unused images
  - delete_or_expire_photos.py - Deletes or expires photos from Photo Center
  - create_lightbox_cache.py - Creates SQLite cache of lightbox data for analysis of photo usage in lightboxes
  - images_parallel_processor.py - Parallel processing engine for photo operations and image management
- daterange_builder.py - Automatic date range splitting for API rate limits
- utils.py - Shared utility functions, logging, rate limiting, and timing decorators
- config.env - Template for API credentials (copy to .env)
- requirements.txt - Python dependencies
- redirects_report/README.md - Redirects report documentation and usage guide
- wires_report/README.md - Unpublished wires articles report documentation and usage guide
- images_report/README.md - Unused published images report documentation and usage guide
- README.md - This file
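To illustrate the kind of date range splitting that daterange_builder.py is described as providing, here is a minimal sketch. The function name split_date_range and the max_days parameter are hypothetical and chosen for illustration only; the repository's actual interface may differ.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def split_date_range(start: date, end: date, max_days: int) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (chunk_start, chunk_end) pairs covering start..end inclusive.

    Hypothetical helper: each chunk spans at most max_days calendar days.
    """
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    if start > end:
        raise ValueError("start must not be after end")
    chunk_start = start
    while chunk_start <= end:
        # Clamp the last chunk so it never runs past the requested end date
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


if __name__ == "__main__":
    for first, last in split_date_range(date(2024, 1, 1), date(2024, 1, 10), max_days=4):
        print(first.isoformat(), last.isoformat())
```

Each chunk can then be queried separately, which keeps any single request window small.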
Run the test suite to verify the module is working correctly:
PYTHONPATH=. pytest tests/
# OR
python -m pytest tests/ -v
- logs/ - Log files (auto-created, gitignored)
- spreadsheets/ - Output CSV files (auto-created, gitignored)
- databases/ - SQLite databases (auto-created, gitignored)
- Automatic Worker Optimization: Dynamic scaling based on performance
- Comprehensive Logging: Detailed performance monitoring and error tracking
- Unit Testing: Comprehensive test coverage for all components
- Environment Configuration: Secure credential management
- Parallel Processing: Fast API calls with configurable worker pools
- Async Status Checking: Fast HTTP status validation
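As a rough illustration of how a configurable worker pool can be combined with rate limiting, here is a minimal sketch using only the standard library. The names RateLimiter and run_parallel are hypothetical and not taken from this repository, and the sketch does not attempt the dynamic worker scaling listed above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RateLimiter:
    """Hypothetical limiter: spaces calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second: float) -> None:
        self._interval = 1.0 / max_per_second
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def run_parallel(
    func: Callable[[T], R],
    items: Iterable[T],
    max_workers: int = 4,
    max_per_second: float = 10.0,
) -> List[R]:
    """Apply func to every item in a thread pool, throttled by a shared limiter."""
    limiter = RateLimiter(max_per_second)

    def limited(item: T) -> R:
        limiter.wait()
        return func(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input items
        return list(pool.map(limited, items))
```

A call such as run_parallel(fetch, ids, max_workers=8, max_per_second=5.0) would apply a hypothetical fetch function to each id with up to eight threads while starting no more than about five calls per second.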
- Python 3.9+
- Arc XP API credentials
- Clone and set up the environment:
  cd arc-content-report
  python3 -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
- Copy credentials:
  cp config.env .env
- Edit .env with your API credentials
- Run redirects report: See redirects_report/README.md
- Run wires report: See wires_report/README.md
- Run images report: See images_report/README.md
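For readers curious how values in a .env file can reach a script, here is a minimal standard-library sketch that parses simple KEY=VALUE lines. It is an assumption about the general approach, not the repository's actual loader (which may rely on a third-party package listed in requirements.txt); the function names parse_env_text, load_env_file, and apply_env are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read and parse the .env file at the given path (hypothetical helper)."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def apply_env(values: Dict[str, str]) -> None:
    """Export parsed values without overriding variables that are already set."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Using setdefault means a variable exported in the shell takes precedence over the file, which is a design choice of this sketch rather than a documented behavior of the project.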
Using the PyCharm IDE allows you to set breakpoints that pause the code while it is running so you can examine the variables and their values at that point. While not necessary for these modules to function, it can be a useful development or debugging tool.
To run any module (e.g., redirects_report.identify_redirects, wires_report.identify_wires, or images_report.published_photo_analysis) with command-line arguments in PyCharm (using python -m ...), follow these steps:
- Open Run Configurations:
  - In PyCharm, go to Run > Edit Configurations....
- Create a New Configuration:
  - Click the + icon at the top left and select "Python".
- Configure the Module:
  - Name the configuration something like: Run [module_name].
  - Select "Module name" (not "Script path") and enter the module path: [module_name].[script_name]
    Example: wires_report.identify_wires
  - In the "Parameters" field, enter the command-line arguments you'd normally use.
- Set the Working Directory:
  - Ensure the working directory is set to the project root (the directory containing the module packages).
  - This is especially important if your module uses relative file paths or expects certain files nearby.
- Save and Run:
  - Click Apply, then OK.
  - Select your new configuration and click the green run arrow ▶️.
💡 Why this is needed:
Using the -m flag ensures the module is run in package context, which is important for relative imports. Running it this way also avoids issues with PyCharm injecting --file and other debug flags, which can break CLI tools using argparse.
Enable debug logging:
export LOG_LEVEL=DEBUG
- INFO: General operation information
- WARNING: Non-critical issues
- ERROR: Critical errors requiring attention
- DEBUG: Detailed debugging information
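As a sketch of how a script might honor the LOG_LEVEL variable, the following uses only the standard logging module. The function name configure_logging, the logger name, the log format, and the INFO default are assumptions made for illustration; the project's own setup in utils.py may differ.

```python
import logging
import os


def configure_logging(default_level: str = "INFO") -> logging.Logger:
    """Configure root logging from the LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", default_level).upper()
    # Fall back to INFO when the variable holds an unrecognized level name
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    # Hypothetical logger name; it inherits the root logger's level
    return logging.getLogger("arc_content_report")
```

With LOG_LEVEL=DEBUG exported, logger.debug(...) calls on the returned logger are emitted; at the assumed INFO default they are suppressed.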
This project is proprietary to Arc XP. All rights reserved.