arc-content-report

A suite of Python tools for analyzing and managing Arc XP content that is low-value and a candidate for deletion. This repository provides script templates for identifying redirects, unpublished wires content, and unused published photos. All modules feature parallel processing, rate limiting, and comprehensive logging for content analysis workflows.

You are encouraged to clone the repository, run the script templates in a local development environment, and customize them to conform to your organization's unique content and business rules.

📁 Repository Structure

Core Scripts

Content Analysis Modules

redirects_report/ - Redirects analysis and HTTP status validation
- delete_redirects.py - Deletes redirects
- delete_redirects_parallel_processor.py - Optimized parallel processing engine for redirect deletion, with dynamic worker scaling
- identify_redirects.py - Identifies redirects within date ranges and validates HTTP status codes
- identify_redirects_parallel_processor.py - Optimized parallel processing engine for redirect identification, with dynamic worker scaling
- status_checker.py - Async HTTP status checking (200/404 validation) for redirect URLs
wires_report/ - Unpublished story wires content analysis and cleanup
- identify_wires.py - Identifies unpublished wires content for potential deletion
- identify_wires_parallel_processor.py - Parallel processing engine with ElasticSearch query optimization for wires identification
- delete_wires.py - Deletes wires
- delete_wires_parallel_processor.py - Parallel processing engine for wires deletion
images_report/ - Photo Center unused published image analysis and management
- published_photo_analysis.py - Analyzes published photos to identify unused images
- delete_or_expire_photos.py - Deletes or expires photos from Photo Center
- create_lightbox_cache.py - Creates SQLite cache of lightbox data for analysis of photo usage in lightboxes
- images_parallel_processor.py - Parallel processing engine for photo operations and image management
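status_checker.py is described as async HTTP status checking (200/404 validation) for redirect URLs. The sketch below shows one way such a check could be structured with only the standard library: each request runs in a worker thread via `asyncio.to_thread` (Python 3.9+), an `asyncio.Semaphore` caps how many requests are in flight, and results are bucketed into 200, 404, and everything else. The function names, the `fetch` parameter, and the concurrency default are hypothetical and are not taken from status_checker.py, which may use an async HTTP client instead.

```python
import asyncio
import urllib.error
import urllib.request
from typing import Awaitable, Callable, Dict, List

# Hypothetical type: an async callable that returns the HTTP status code for a URL.
StatusFetcher = Callable[[str], Awaitable[int]]


def _head_status(url: str, timeout: float = 10.0) -> int:
    """Blocking HEAD request. urlopen follows redirects, so this is the final status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError, but the status code is still available.
        return err.code
    except urllib.error.URLError:
        return 0  # 0 = request failed (DNS error, refused connection, timeout, ...)


async def default_fetch(url: str) -> int:
    # Run the blocking request in a thread so many checks can overlap.
    return await asyncio.to_thread(_head_status, url)


async def check_statuses(urls: List[str], fetch: StatusFetcher = default_fetch,
                         max_concurrency: int = 20) -> Dict[str, List[str]]:
    """Group URLs into '200', '404' and 'other' buckets by HTTP status."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check_one(url: str) -> int:
        async with semaphore:  # cap the number of in-flight requests
            return await fetch(url)

    codes = await asyncio.gather(*(check_one(u) for u in urls))
    buckets: Dict[str, List[str]] = {"200": [], "404": [], "other": []}
    for url, code in zip(urls, codes):
        key = str(code) if code in (200, 404) else "other"
        buckets[key].append(url)
    return buckets
```

Calling `asyncio.run(check_statuses(urls))` returns the three buckets. Injecting `fetch` keeps the grouping logic testable without network access.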

Shared Utilities

daterange_builder.py - Automatic date range splitting for API rate limits
utils.py - Shared utility functions, logging, rate limiting, and timing decorators
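daterange_builder.py splits date ranges so that each API request stays within limits. Its actual splitting strategy is not documented here; the sketch below assumes the simplest version, cutting an inclusive range into consecutive fixed-size windows of at most `max_days` days. `split_date_range` and the `max_days` default are hypothetical, and a real builder might instead split adaptively (for example, when a window returns too many results).

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_date_range(start: date, end: date, max_days: int = 7) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive windows of at most max_days days."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    if end < start:
        raise ValueError("end must not be before start")
    windows = []
    cursor = start
    while cursor <= end:
        # Each window covers max_days days, truncated at the end of the overall range.
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        windows.append((cursor, window_end))
        cursor = window_end + timedelta(days=1)
    return windows
```

For example, `split_date_range(date(2024, 1, 1), date(2024, 1, 10), max_days=4)` yields the windows Jan 1 to 4, Jan 5 to 8, and Jan 9 to 10, with no gaps or overlaps between them.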

Configuration & Setup

config.env - Template for API credentials (copy to .env)
requirements.txt - Python dependencies

Documentation

redirects_report/README.md - Redirects report documentation and usage guide
wires_report/README.md - Unpublished wires articles report documentation and usage guide
images_report/README.md - Unused published images report documentation and usage guide
README.md - This file

Testing

Run the test suite to verify the modules are working correctly:

```
PYTHONPATH=. pytest tests/
# OR
python -m pytest tests/ -v
```

Directories

logs/ - Log files (auto-created, gitignored)
spreadsheets/ - Output CSV files (auto-created, gitignored)
databases/ - SQLite databases (auto-created, gitignored)
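create_lightbox_cache.py stores lightbox data in an SQLite database so photo usage in lightboxes can be checked without repeated API calls. The real schema is not described in this README; the sketch below assumes a hypothetical `lightbox_photos` table of (lightbox_id, photo_id) pairs plus an index for lookups by photo, and omits the Photo Center API calls that would supply the rows. The table, column, and function names are illustrative only.

```python
import sqlite3
from typing import Iterable, Tuple


def build_lightbox_cache(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Store (lightbox_id, photo_id) pairs in a hypothetical lightbox_photos table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lightbox_photos ("
        " lightbox_id TEXT NOT NULL,"
        " photo_id TEXT NOT NULL,"
        " PRIMARY KEY (lightbox_id, photo_id))"
    )
    # The primary key leads with lightbox_id, so add an index for lookups by photo_id.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_lightbox_photos_photo ON lightbox_photos (photo_id)")
    # INSERT OR IGNORE skips pairs that are already cached.
    conn.executemany("INSERT OR IGNORE INTO lightbox_photos VALUES (?, ?)", rows)
    conn.commit()


def photo_in_any_lightbox(conn: sqlite3.Connection, photo_id: str) -> bool:
    """Return True if the photo appears in at least one cached lightbox."""
    row = conn.execute(
        "SELECT 1 FROM lightbox_photos WHERE photo_id = ? LIMIT 1", (photo_id,)
    ).fetchone()
    return row is not None
```

In practice the connection would point at a file under databases/ (for example `sqlite3.connect("databases/lightbox_cache.db")`, a hypothetical file name); `":memory:"` works for quick experiments.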

Features

Automatic Worker Optimization: Dynamic scaling based on performance
Comprehensive Logging: Detailed performance monitoring and error tracking
Unit Testing: Comprehensive test coverage for all components
Environment Configuration: Secure credential management
Parallel Processing: Fast API calls with configurable worker pools
Async Status Checking: Fast HTTP status validation
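The parallel processing and rate limiting listed above live in the repository's processor modules and utils.py, whose internals are not shown in this README. As a hedged illustration of the general pattern, the sketch below fans calls out over a `concurrent.futures.ThreadPoolExecutor` and throttles them with a shared, thread-safe limiter that spaces call starts at least `1/rate` seconds apart. `RateLimiter`, `run_parallel`, and the defaults are hypothetical, and dynamic worker scaling is not modeled.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RateLimiter:
    """Allow at most `rate` call starts per second across all threads (hypothetical helper)."""

    def __init__(self, rate: float) -> None:
        if rate <= 0:
            raise ValueError("rate must be positive")
        self._interval = 1.0 / rate
        self._lock = threading.Lock()
        self._next_slot = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        # so other threads can reserve their own slots concurrently.
        with self._lock:
            now = time.monotonic()
            slot = max(self._next_slot, now)
            self._next_slot = slot + self._interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def run_parallel(func: Callable[[T], R], items: Iterable[T],
                 max_workers: int = 8, rate: float = 10.0) -> List[R]:
    """Apply func to every item in a thread pool, throttled by a shared RateLimiter.

    Results come back in input order; an exception raised by func is re-raised here.
    """
    limiter = RateLimiter(rate)

    def throttled(item: T) -> R:
        limiter.wait()
        return func(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(throttled, items))
```

For example, `run_parallel(fetch_story, story_ids, max_workers=8, rate=5.0)` would call a hypothetical `fetch_story` function for each ID at no more than five call starts per second. Threads suit this workload because API calls are I/O-bound, so the GIL is not the bottleneck.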

🔧 Usage

Prerequisites

Python 3.9+
Arc XP API credentials

Local Development Setup

Clone the repository, then set up the environment:

```
cd arc-content-report
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```

- Copy the credentials template: cp config.env .env
- Edit .env with your API credentials.
- Run the redirects report: see redirects_report/README.md
- Run the wires report: see wires_report/README.md
- Run the images report: see images_report/README.md
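How the modules read .env is not described here; they may use a library listed in requirements.txt. The sketch below is only a minimal standard-library illustration of what loading such a file involves: parse KEY=VALUE lines, skip blanks and comments, strip matching quotes, and export the values without overwriting variables already set in the shell. `load_env_file` is a hypothetical name, and the key names in the usage note are placeholders, not the variables config.env actually defines.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from an env file and export them to os.environ.

    Blank lines, '#' comments and lines without '=' are skipped; an optional
    leading 'export ' and matching surrounding quotes are removed.
    Multi-line values and ${VAR} expansion are not supported.
    """
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
        # Existing environment variables win unless override=True.
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

With a .env containing a hypothetical line such as `ARC_EXAMPLE_TOKEN='abc123'`, calling `load_env_file()` returns `{"ARC_EXAMPLE_TOKEN": "abc123"}` and makes the value available through `os.environ`.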

🛠️ Running Modules in PyCharm (`-m` Flag Setup)

Using the PyCharm IDE lets you set breakpoints that pause the code while it runs, so you can examine variables and their values at that point in time. While not necessary for these modules to function, it can be a useful development and debugging tool.

To run any module (e.g., redirects_report.identify_redirects, wires_report.identify_wires, or images_report.published_photo_analysis) with command-line arguments in PyCharm (using python -m ...), follow these steps:

Open Run Configurations:
- In PyCharm, go to Run > Edit Configurations....
Create a New Configuration:
- Click the + icon at the top left and select "Python".
Configure the Module:
- Name the configuration something like: Run [module_name].
- Select "Module name" (not "Script path") and enter the dotted module path:
```
[package_name].[module_name]
```
  Example: wires_report.identify_wires
- In the "Parameters" field, enter the command-line arguments you'd normally use.
Set the Working Directory:
- Ensure the working directory is set to the project root (the directory containing the module packages).
- This is especially important if your module uses relative file paths or expects certain files nearby.
Save and Run:
- Click Apply, then OK.
- Select your new configuration and click the green run arrow ▶️.

💡 Why this is needed:
Using the -m flag ensures the module is run in package context, which is important for relative imports. Running it this way also avoids issues with PyCharm injecting --file and other debug flags, which can break CLI tools using argparse.

📈 Monitoring and Logging

Debug Mode

Enable debug logging:

```
export LOG_LEVEL=DEBUG
```

Log Levels

DEBUG: Detailed debugging information
INFO: General operation information
WARNING: Non-critical issues
ERROR: Critical errors requiring attention
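How utils.py wires up logging is not shown in this README. The sketch below is a hypothetical illustration of the behavior described above: it reads LOG_LEVEL from the environment (defaulting to INFO), creates the logs/ directory if needed, and attaches console and file handlers. Because the levels are ordered DEBUG < INFO < WARNING < ERROR, setting DEBUG shows messages from all four. `configure_logging` and the per-logger file name are assumptions.

```python
import logging
import os
from pathlib import Path


def configure_logging(name: str, log_dir: str = "logs") -> logging.Logger:
    """Configure a logger whose level comes from the LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back on unrecognized values such as LOG_LEVEL=verbose

    Path(log_dir).mkdir(parents=True, exist_ok=True)  # logs/ is created on demand
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers if called more than once
        for handler in (logging.StreamHandler(),
                        logging.FileHandler(Path(log_dir) / f"{name}.log", encoding="utf-8")):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
    return logger
```

For example, after `export LOG_LEVEL=DEBUG`, `configure_logging("identify_wires").debug("query built")` would print the message and also append it to logs/identify_wires.log.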
