Diverse Group Selection Script

Overview

This script selects a diverse group of individuals from a larger dataset, balancing representation across dimensions such as gender, age group, education, residence, disability, and interest. The selection can also be biased to over-represent or under-represent certain groups according to predefined criteria, so the process can be adapted to specific needs.

Aim

The primary aim of this script is to build a group from a dataset whose composition is balanced across the dimensions listed above. It is particularly useful where equitable representation is critical, such as surveys, research studies, or team formation.

Installation

To run this script, you will need Python installed on your machine, along with the following libraries:

  • Pandas
  • NumPy

You can install these libraries using pip:

pip install pandas numpy

Test data generation

$ python generate_csv.py
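
The contents of generate_csv.py are not shown here, so the following is only a minimal sketch of how test data matching the Data Structure section could be produced. The helper names (generate_rows, write_csv), the age groups beyond '18-29' and '30-39', the uniform random choice of values, and the output file name are all assumptions made for illustration.

```python
import csv
import random

COLUMNS = ["ID", "Age Group", "Education", "Gender",
           "Residence", "Disability", "Interest"]

# Allowed values per column. Age groups after '30-39' are assumed;
# the README only lists the first two as examples.
CHOICES = {
    "Age Group": ["18-29", "30-39", "40-49", "50-59", "60+"],
    "Education": ["Elementary", "Secondary", "Higher"],
    "Gender": ["Male", "Female"],
    "Residence": ["Capital", "Non-Capital"],
    "Disability": ["Yes", "No"],
    "Interest": ["No", "Some", "Yes"],
}


def generate_rows(count, seed=None):
    """Return `count` random records with incremental IDs starting at 1."""
    rng = random.Random(seed)
    rows = []
    for record_id in range(1, count + 1):
        row = {"ID": record_id}
        for column, values in CHOICES.items():
            row[column] = rng.choice(values)
        rows.append(row)
    return rows


def write_csv(rows, path):
    """Write the records to `path` with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


rows = generate_rows(200, seed=42)

if __name__ == "__main__":
    # Hypothetical output name, chosen to match the Execution example.
    write_csv(rows, "example_people_data.csv")
```

Passing a fixed seed makes the generated file reproducible, which helps when comparing selection results between runs.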

Execution

To execute the script, follow these steps:

  1. Prepare your dataset according to the specified data structure and save it as a CSV file.
  2. Run the script using a Python interpreter. The first parameter is the CSV file path, the second is the target group size.
$ python main.py example_people_data.csv 60
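
For readers adapting the command line, here is a sketch of how the two positional arguments could be parsed. It is an assumption, not a copy of main.py, which may read its arguments differently; parse_args and the argument names csv_path and group_size are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the CSV path and the target group size from the command line."""
    parser = argparse.ArgumentParser(
        description="Select a diverse group from a CSV file.")
    parser.add_argument("csv_path", help="path to the input CSV file")
    parser.add_argument("group_size", type=int,
                        help="number of people to select")
    args = parser.parse_args(argv)
    if args.group_size <= 0:
        # parser.error prints a usage message and exits with status 2.
        parser.error("group_size must be a positive integer")
    return args


# Equivalent to: python main.py example_people_data.csv 60
args = parse_args(["example_people_data.csv", "60"])
print(args.csv_path, args.group_size)
```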

Biasing Options

The script supports biasing options that over-represent or under-represent specific groups within the dataset. This gives you finer control over the composition of the selected group, so the selection can be tuned to specific needs or goals.

How to Use Biasing

Biasing is applied through predefined criteria within the script. These criteria can be adjusted by modifying the bias_weights calculation, which assigns different weights to individuals based on attributes such as 'Disability', 'Age Group', or any other column in the dataset.

For example, to over-represent individuals with disabilities, a higher weight is assigned to records where Disability == 'Yes'. Conversely, to under-represent middle-aged males, a lower weight can be assigned to records matching this criterion.
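
The weighting described above can be sketched as follows. This is an illustration under stated assumptions, not the code in main.py: the helper name compute_bias_weights, the factors 2.0 and 0.5, and the age groups treated as middle-aged ('40-49', '50-59') are all hypothetical. The weights are passed to pandas' DataFrame.sample, which draws rows with probability proportional to their weight.

```python
import pandas as pd


def compute_bias_weights(df):
    """Return one sampling weight per row, starting from a neutral 1.0."""
    weights = pd.Series(1.0, index=df.index)

    # Over-represent: double the weight of people with a disability.
    has_disability = df["Disability"] == "Yes"
    weights[has_disability] *= 2.0

    # Under-represent: halve the weight of middle-aged males.
    # Which age groups count as middle-aged is an assumption.
    middle_aged_male = (
        df["Age Group"].isin(["40-49", "50-59"]) & (df["Gender"] == "Male")
    )
    weights[middle_aged_male] *= 0.5

    return weights


people = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age Group": ["18-29", "30-39", "40-49", "50-59"],
    "Gender": ["Female", "Male", "Male", "Male"],
    "Disability": ["No", "Yes", "No", "Yes"],
})

weights = compute_bias_weights(people)

# Draw two people without replacement; higher weights are more likely.
selected = people.sample(n=2, weights=weights, random_state=0)
```

Because the factors multiply, a record matching both criteria (ID 4 here) ends up back at a weight of 1.0. Whether criteria should combine this way or override one another is a design choice to settle when editing the bias logic.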

Customizing Bias Criteria

You can customize the bias criteria by editing the select_diverse_group_with_bias function in main.py.

Data Structure

Your dataset should be a CSV file with the following columns:

  • ID: An incremental integer identifying each record.
  • Age Group: Categorized age groups, e.g., '18-29', '30-39', etc.
  • Education: Level of education, e.g., 'Elementary', 'Secondary', 'Higher'.
  • Gender: Gender identification, e.g., 'Male', 'Female'.
  • Residence: Type of residence, e.g., 'Capital', 'Non-Capital'.
  • Disability: Disability status, e.g., 'Yes', 'No'.
  • Interest: Level of interest, e.g., 'No', 'Some', 'Yes'.

Example CSV dataset structure:

ID,Age Group,Education,Gender,Residence,Disability,Interest
1,18-29,Higher,Female,Capital,No,Yes
2,30-39,Secondary,Male,Non-Capital,Yes,Some
...
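
Before running the selection, it can help to confirm that a file has the columns listed above. The check below is a small sketch using only the standard library; missing_columns is a hypothetical helper and is not part of main.py.

```python
import csv
import io

REQUIRED_COLUMNS = ["ID", "Age Group", "Education", "Gender",
                    "Residence", "Disability", "Interest"]


def missing_columns(csv_file):
    """Return the required column names absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


# An in-memory file stands in for open("your_data.csv", newline="").
sample = io.StringIO(
    "ID,Age Group,Education,Gender,Residence,Disability,Interest\n"
    "1,18-29,Higher,Female,Capital,No,Yes\n"
)
result = missing_columns(sample)
print(result)  # []
```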

AI sources

This script was made with the help of ChatGPT v4. The related conversations are linked below:

Simple diverse selection:

https://chat.openai.com/share/1f1c89a2-59ae-44a6-979c-ce720b279229

Extended dimensions and bias:

https://chat.openai.com/share/437498d5-7535-4788-87ca-c0dc1e35a2c2

Contributing

Contributions are welcome! If you have suggestions or enhancements, please open an issue or submit a pull request.

License

MIT License - Feel free to use, modify, and distribute this script as you see fit.