Welcome to survey_cleaner

survey_cleaner is a Python package that streamlines survey data cleaning by automating common tasks. Designed to generalize across survey topics, it provides functions to remove duplicate responses, remove unnecessary whitespace, normalize responses to a binary format, and convert ordinal responses to numeric data. The package sets up a standardized cleaning framework that can be carried across projects, helping users reduce manual preprocessing time and minimize errors.

Functions

remove_duplicates: keeps only the latest survey response from each individual.
handle_emptyStrings: handles None values, raises TypeError for non-string inputs, collapses all whitespace into single spaces, and strips leading/trailing whitespace.
normalize_binary: converts binary responses such as True and False, T and F, or Yes and No to a binary format (0 and 1).
word_to_ordinal: assigns ranking words such as Best, Better, Good, Bad, Worst a numerical rating so that responses can be ordered by value. Likert scales are set up as the default rankings, but users can also provide their own.

Python Ecosystem

A number of text cleaning packages are available on PyPI, such as clean-text, which preprocesses raw text data from the web. However, none is specifically dedicated to cleaning survey response data, which is the gap the survey_cleaner package addresses.

Installation

You can install the latest release of survey_cleaner from TestPyPI using pip:

$ pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ survey_cleaner
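
After installing, one quick way to confirm the package is visible to the current interpreter is a standard-library lookup. This is a minimal sketch; the helper name is_installed is illustrative and not part of survey_cleaner.

```python
from importlib.util import find_spec


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found by this interpreter."""
    # find_spec returns None when a top-level package cannot be located
    return find_spec(package_name) is not None


print("survey_cleaner installed:", is_installed("survey_cleaner"))
```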

Usage Examples:

Clean Whitespace

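from survey_cleaner import handle_emptyStrings
import pandas as pd

# Example data with stray whitespace (illustrative values)
df = pd.DataFrame({'comments': ['  Great   service ', 'Too  slow']})

# Removes leading/trailing whitespace and collapses multiple spaces
df['comments'] = df['comments'].apply(handle_emptyStrings)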
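
To make the cleaning rule concrete, here is a minimal standalone sketch of the behaviour described for handle_emptyStrings. It is not the package's implementation: the name clean_whitespace_sketch is hypothetical, and passing None through unchanged is an assumption.

```python
import re


def clean_whitespace_sketch(value):
    """Illustrative stand-in for the whitespace rule, not the package's code."""
    if value is None:
        # Assumption: None is returned unchanged
        return None
    if not isinstance(value, str):
        raise TypeError(f"expected str or None, got {type(value).__name__}")
    # \s+ matches any run of spaces, tabs, or newlines
    return re.sub(r"\s+", " ", value).strip()


print(clean_whitespace_sketch("  great   service \n overall "))
# prints: great service overall
```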

Normalize Binary Responses

from survey_cleaner import normalize_binary
import pandas as pd

# Converts Yes/No, True/False, T/F to 1/0
df = pd.DataFrame({'response': ['Yes', 'No', 'Yes']})
df['response'] = df['response'].apply(normalize_binary)
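
The conversion can be pictured as a lookup on a normalized label. The sketch below is a hypothetical re-implementation for illustration only: the label sets, the case-insensitive matching, and the ValueError for unrecognized input are assumptions, and normalize_binary itself may behave differently.

```python
# Assumed label sets; the labels accepted by survey_cleaner may differ
TRUTHY = {"yes", "true", "t"}
FALSY = {"no", "false", "f"}


def normalize_binary_sketch(value):
    """Map a yes/no style answer to 1 or 0 (illustrative only)."""
    # str() lets Python booleans through: str(True) is "True"
    key = str(value).strip().lower()
    if key in TRUTHY:
        return 1
    if key in FALSY:
        return 0
    raise ValueError(f"not a recognized binary response: {value!r}")


print([normalize_binary_sketch(v) for v in ["Yes", "No", "Yes"]])
# prints: [1, 0, 1]
```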

Convert Ordinal Responses to Numeric

from survey_cleaner import word_to_ordinal
import pandas as pd

feedback = pd.Series(["strongly agree", "agree", 
                     "neither agree nor disagree", "disagree"])

# Customized mapping; warns about unmapped values
word_to_ordinal(feedback, mapping={"strongly agree": 5, "Bad": 0})
# Using default Likert scale
word_to_ordinal(feedback, likert="agreement")
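
For intuition, the mapping step can be sketched in plain Python. Everything here is an assumption made for illustration: the 1 to 5 values on the agreement scale, the case-insensitive matching, and the choice to return None for unmapped responses. The name word_to_ordinal_sketch is hypothetical and the real function may differ.

```python
import warnings

# Assumed 5-point agreement scale; the package's default values may differ
AGREEMENT_SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def word_to_ordinal_sketch(responses, mapping=None):
    """Convert ranking words to numbers, warning on unmapped values."""
    source = AGREEMENT_SCALE if mapping is None else mapping
    # Normalize keys so that "Bad" and "bad" are treated alike
    scale = {key.strip().lower(): rank for key, rank in source.items()}
    result = []
    for response in responses:
        key = response.strip().lower()
        if key in scale:
            result.append(scale[key])
        else:
            warnings.warn(f"unmapped response: {response!r}")
            result.append(None)
    return result


feedback = ["strongly agree", "agree", "neither agree nor disagree", "disagree"]
print(word_to_ordinal_sketch(feedback))
# prints: [5, 4, 3, 2]
```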

Remove Duplicate Responses

from survey_cleaner import remove_duplicates
import pandas as pd

responses = pd.DataFrame({
    'respondent_id': [1, 2, 1, 3],
    'completed_at': ['2024-01-01 10:00', '2024-01-01 11:00',
                     '2024-01-01 12:00', '2024-01-01 13:00'],
    'answer': ['Yes', 'No', 'Maybe', 'Yes']
})

# Keeps only the latest response from each respondent
clean_responses = remove_duplicates(responses, 'respondent_id', 'completed_at')
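
The "keep the latest response per respondent" rule can be illustrated without pandas. This is a hedged sketch rather than the package's code: keep_latest_sketch and its time_format parameter are hypothetical names, and rows are plain dictionaries instead of a DataFrame.

```python
from datetime import datetime


def keep_latest_sketch(rows, id_key, time_key, time_format="%Y-%m-%d %H:%M"):
    """Keep only the most recent row for each respondent (illustrative)."""
    latest = {}
    for row in rows:
        stamp = datetime.strptime(row[time_key], time_format)
        current = latest.get(row[id_key])
        # Replace the stored row only when this one is strictly newer
        if current is None or stamp > current[0]:
            latest[row[id_key]] = (stamp, row)
    return [row for _, row in latest.values()]


rows = [
    {"respondent_id": 1, "completed_at": "2024-01-01 10:00", "answer": "Yes"},
    {"respondent_id": 2, "completed_at": "2024-01-01 11:00", "answer": "No"},
    {"respondent_id": 1, "completed_at": "2024-01-01 12:00", "answer": "Maybe"},
    {"respondent_id": 3, "completed_at": "2024-01-01 13:00", "answer": "Yes"},
]
print([r["answer"] for r in keep_latest_sketch(rows, "respondent_id", "completed_at")])
# prints: ['Maybe', 'No', 'Yes']
```

With this data, respondent 1 submitted twice, so only the 12:00 response ("Maybe") survives.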

Developer Setup

Clone the repository to your local machine:

$ git clone https://github.com/UBC-MDS/DSCI_524_group35_survey_cleaner.git

$ cd DSCI_524_group35_survey_cleaner

It is recommended but not required to use the environment file to create a conda environment:

$ conda env create -f environment.yml

$ conda activate survey_cleaner

Install the package in development mode:

$ pip install -e ".[docs]"

Run the test suite:

$ pytest tests/

Build Documentation

Building Documentation Locally

Install documentation dependencies

$ pip install -e ".[docs]"

Generate API documentation using quartodoc

$ quartodoc build

Preview the documentation

$ quarto preview

Render the final HTML output

$ quarto render

Deploy Documentation

Workflow Overview

Workflow	Trigger	Purpose
`build.yml`	Push/PR to main	Runs tests and builds package
`deploy.yml`	Push to main (after tests pass)	Deploys package to TestPyPI
`quartodoc.yml`	Push/PR to main	Builds API documentation with quartodoc
`quartodoc-publish.yml`	Push to main	Publishes documentation to GitHub Pages

Documentation Build Workflow

The quartodoc.yml workflow automatically:

Checks out the repository
Sets up the Python environment
Installs the package with documentation dependencies
Runs quartodoc build to generate API docs
Validates the documentation build

Documentation Publish Workflow

The quartodoc-publish.yml workflow automatically:

Builds the documentation using Quarto
Deploys to GitHub Pages when changes are pushed to main
Makes the documentation available at the GitHub Pages URL

Viewing Published Documentation

Once deployed, documentation is available at:

GitHub Pages: https://ubc-mds.github.io/DSCI_524_group35_survey_cleaner/
Netlify: https://dsci524group35surveycleaner.netlify.app/

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contributors

Natalie Truesdell, Amanpreet Binepal, Jay Li, Junli

Copyright

Free software distributed under the MIT License.
