svidmar/PureAPI_Ex_org_cleaning

External Organisation Management Scripts for Elsevier's Pure

This repository contains two Python scripts for managing external organizations in Pure. The first script identifies duplicate External Organisations and exports them to a CSV file. The second script merges duplicate External Organisations based on the CSV data.

Overview of Scripts

1. Duplicate Finder Script

This script fetches data from the Pure API, identifies potential duplicate organizations based on name, country, and type, and exports the duplicates to a CSV file. Only external organisations with an exact match on name, country, and type are treated as duplicates, to avoid accidentally merging organisations that are not duplicates. Adjust the matching logic to accommodate your own business rules.
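
The exact-match rule can be sketched as follows. This is a minimal illustration, not the script's actual code: the flat dictionary keys (`uuid`, `name`, `country`, `type`) are hypothetical, and a real Pure API response nests these fields differently.

```python
from collections import defaultdict


def find_duplicates(orgs):
    """Group organisations by exact (name, country, type); keep groups of 2+."""
    groups = defaultdict(list)
    for org in orgs:
        # Hypothetical flat keys; adapt to the real response structure.
        key = (org["name"], org["country"], org["type"])
        groups[key].append(org["uuid"])
    return {key: uuids for key, uuids in groups.items() if len(uuids) > 1}


sample = [
    {"uuid": "a1", "name": "Acme University", "country": "Denmark", "type": "University"},
    {"uuid": "b2", "name": "Acme University", "country": "Denmark", "type": "University"},
    {"uuid": "c3", "name": "Acme University", "country": "Sweden", "type": "University"},
]
duplicates = find_duplicates(sample)
print(duplicates)
```

Because the key is compared exactly, the Swedish entry is not grouped with the two Danish ones. Looser matching (for example, case-insensitive names) would mean normalising the key before grouping.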

Features

  • Fetches all external organizations from Pure via paginated API calls.
  • Groups organizations by name, country, and type.
  • Identifies duplicates and suggests a merge target based on workflow status. If none of the duplicates has workflow status 'Approved', the first UUID is used as the target.
  • Saves duplicates to duplicate_organizations.csv.
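
The paginated fetch mentioned above could look roughly like this. It is a sketch under assumptions: `fetch_page` is a hypothetical caller-supplied function (in practice it would wrap an HTTP request using the base URL and API key), and the offset/size style of paging is assumed rather than taken from the script.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect items from every page.

    fetch_page(offset, size) is a hypothetical callable that returns the list
    of items for one page. A page shorter than page_size ends the loop.
    """
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        items.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return items


# Stand-in for a real HTTP call, so the sketch runs without network access.
fake_data = [{"uuid": f"org-{i}"} for i in range(250)]


def fake_fetch_page(offset, size):
    return fake_data[offset:offset + size]


all_orgs = fetch_all(fake_fetch_page, page_size=100)
print(len(all_orgs))
```

Keeping the HTTP call behind a function argument makes the loop easy to test offline. The real endpoint, parameter names, and authentication header should be taken from your Pure API documentation.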

Usage

  1. Run the script:

    python PureAPI_Ex_org_duplicate_finder.py

  2. Enter the following when prompted:

    • Base URL: The API's base URL (e.g., xyz.elsevierpure.com).
    • API Key: Your API key for authentication.
  3. The script will create a CSV file named duplicate_organizations.csv with the following columns:

    • Organization Name: Name of the organization.
    • Country: Country of the organization.
    • Type: Type of the organization.
    • UUIDs: Comma-separated list of UUIDs for duplicate organizations.
    • Count: Number of duplicates.
    • Merge Candidate: UUID of the suggested merge target.
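
A minimal sketch of how rows with these columns might be written, including the merge-candidate rule. The `workflow` key and the status value "For approval" are hypothetical, and picking the first 'Approved' entry is an assumption about how ties are resolved.

```python
import csv
import io

COLUMNS = ["Organization Name", "Country", "Type", "UUIDs", "Count", "Merge Candidate"]


def pick_merge_candidate(orgs):
    """Return the UUID of the first 'Approved' organisation, else the first UUID."""
    for org in orgs:
        if org.get("workflow") == "Approved":  # hypothetical key name
            return org["uuid"]
    return orgs[0]["uuid"]


def write_duplicates(groups, out_file):
    """Write one CSV row per duplicate group."""
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS)
    writer.writeheader()
    for (name, country, org_type), orgs in groups.items():
        writer.writerow({
            "Organization Name": name,
            "Country": country,
            "Type": org_type,
            "UUIDs": ",".join(org["uuid"] for org in orgs),
            "Count": len(orgs),
            "Merge Candidate": pick_merge_candidate(orgs),
        })


groups = {
    ("Acme University", "Denmark", "University"): [
        {"uuid": "a1", "workflow": "For approval"},
        {"uuid": "b2", "workflow": "Approved"},
    ],
}
buffer = io.StringIO()  # in-memory stand-in for duplicate_organizations.csv
write_duplicates(groups, buffer)
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(rows[0]["Merge Candidate"])
```

The `csv` module quotes the UUIDs field automatically because it contains commas, so the list survives a round trip through the file.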

2. Merge Script

This script reads the duplicate_organizations.csv file generated by the first script and merges duplicate organizations using the API.

Features

  • Reads duplicates and merge candidates from a CSV file.
  • Confirms the merge operation with the user before proceeding.
  • Sends merge requests to the API with detailed logging of results.
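
The read-and-confirm steps can be sketched as below. The function names are hypothetical and the prompt wording is illustrative. The sketch only builds a plan of (target, sources) pairs and does not contact the API.

```python
import csv
import io


def build_merge_plan(csv_file):
    """Return a list of (target_uuid, [source_uuids]) tuples from the CSV."""
    plan = []
    for row in csv.DictReader(csv_file):
        target = row["Merge Candidate"].strip()
        uuids = [u.strip() for u in row["UUIDs"].split(",") if u.strip()]
        sources = [u for u in uuids if u != target]
        if target and sources:
            plan.append((target, sources))
    return plan


def confirm(plan, ask=input):
    """Ask for explicit confirmation before anything is merged."""
    answer = ask(f"Merge {len(plan)} duplicate group(s)? (yes/no): ")
    return answer.strip().lower() == "yes"


sample_csv = io.StringIO(
    "Organization Name,Country,Type,UUIDs,Count,Merge Candidate\n"
    'Acme University,Denmark,University,"a1,b2,c3",3,b2\n'
)
plan = build_merge_plan(sample_csv)
print(plan)
```

In a real run, `confirm(plan)` would be called before any request is sent. Merges are hard to undo, so requiring a typed "yes" is a cheap safeguard.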

Usage

  1. Ensure that the duplicate_organizations.csv file is present in the same directory.

  2. Run the script:

    python PureAPI_Ex_org_merger.py

  3. Enter the following when prompted:

    • Base URL: The API's base URL (e.g., xyz.elsevierpure.com).
    • API Key: Your API key for authentication.
  4. The script will log the merge results to merge_log.txt.
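
One way to log each merge outcome to a file is sketched here with the standard `logging` module. The `send_merge` callable is a hypothetical stand-in for the actual API request, and the log message format is an assumption, not the script's real output.

```python
import logging
import os
import tempfile


def make_merge_logger(path):
    """Create a logger that writes timestamped lines to the given file."""
    logger = logging.getLogger("merge_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def run_merges(plan, send_merge, logger):
    """Call send_merge(target, sources) per group and log success or failure."""
    merged = 0
    for target, sources in plan:
        try:
            send_merge(target, sources)
        except Exception as exc:
            logger.error("Merge into %s failed: %s", target, exc)
        else:
            merged += 1
            logger.info("Merged %s into %s", ", ".join(sources), target)
    return merged


def fake_send_merge(target, sources):
    # Hypothetical stand-in for the real API request.
    if target == "bad":
        raise RuntimeError("simulated API error")


# A temporary directory is used here; the real script writes merge_log.txt.
log_path = os.path.join(tempfile.mkdtemp(), "merge_log.txt")
logger = make_merge_logger(log_path)
merged_count = run_merges([("b2", ["a1"]), ("bad", ["x9"])], fake_send_merge, logger)
with open(log_path, encoding="utf-8") as f:
    log_text = f.read()
print(log_text)
```

Catching the error per group lets the remaining merges continue, and the log records which groups need a second look.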


Requirements

Python Libraries

  • requests
  • csv (standard library, no installation needed)
  • collections (standard library, no installation needed)

Only requests has to be installed separately. Install it using pip:

pip install requests

CSV Format

The duplicate_organizations.csv file should have the following columns:

  • Organization Name
  • Country
  • Type
  • UUIDs
  • Count
  • Merge Candidate
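
If the CSV is edited by hand before merging, a quick sanity check can catch mistakes. The checks below are illustrative assumptions and not something the merge script is documented to perform.

```python
import csv
import io

EXPECTED_COLUMNS = ["Organization Name", "Country", "Type", "UUIDs", "Count", "Merge Candidate"]


def validate_csv(csv_file):
    """Return a list of problems; an empty list means the file looks usable."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    for row_no, row in enumerate(reader, start=1):
        uuids = [u.strip() for u in (row["UUIDs"] or "").split(",") if u.strip()]
        if (row["Merge Candidate"] or "").strip() not in uuids:
            problems.append(f"row {row_no}: merge candidate is not in UUIDs")
        if (row["Count"] or "").strip() != str(len(uuids)):
            problems.append(f"row {row_no}: Count does not match number of UUIDs")
    return problems


header = "Organization Name,Country,Type,UUIDs,Count,Merge Candidate\n"
good = io.StringIO(header + 'Acme University,Denmark,University,"a1,b2",2,a1\n')
bad = io.StringIO(header + 'Acme University,Denmark,University,"a1,b2",5,z9\n')
good_problems = validate_csv(good)
bad_problems = validate_csv(bad)
print(good_problems, bad_problems)
```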

Logs

Both scripts write an output file for tracking operations:

  • Duplicate Finder Script: Writes the detected duplicates to duplicate_organizations.csv.
  • Merge Script: Logs all operations to merge_log.txt, including successful merges and errors.

License

This project is licensed under the MIT License.
