Guidance for a Predictive Responsible Gaming Model Using Amazon SageMaker

Table of Contents

  1. Overview
  2. Prerequisites
  3. Deployment Steps
  4. Deployment Validation
  5. Data Preparation
  6. Running the Guidance
  7. Next Steps
  8. Cleanup

Optional

  1. Notices
  2. Authors

Overview

This project provides the complete infrastructure for training and deploying a machine learning model that predicts problematic gambling behavior. It creates an Amazon SageMaker Domain with custom JupyterLab environments and the AWS resources required for ML workloads. The infrastructure is defined as a Maven-based AWS CDK project, so you can open it in any Maven-compatible Java IDE to build and run it.

The betting and gaming industry requires accurate detection of problematic play signals to protect players and maintain regulatory compliance. Late identification of at-risk players damages both player welfare and industry reputation. Traditional monitoring methods detect issues after harm occurs, leading to regulatory penalties and erosion of public trust.

AWS's AI/ML services enable automated, data-driven detection of at-risk players. This solution processes player behavioral and financial data to identify concerning patterns through a machine learning model deployed on Amazon SageMaker. The model analyzes the following key metrics:

Metric | Description
NUM_BETTING_DAYS | Number of distinct days on which the user placed bets
AVG_BETS_PER_DAY | Average number of bets placed per day
AVG_TOTAL_STAKE_PER_DAY | Average total amount staked per day
AVG_TOTAL_PAYOUT_PER_DAY | Average total payout received per day
AVG_TOTAL_NET_POSITION_PER_DAY | Average daily net position (winnings minus losses)
AVG_STAKE_PER_BET | Average amount staked per individual bet
AVG_PRICE_PER_BET | Average odds (price) of bets placed
TOTAL_LATE_BETS_0004 | Total number of bets placed between midnight and 4 AM
TOTAL_LATE_BETS_2024 | Total number of bets placed between 8 PM and midnight
TOTAL_WINS | Total number of winning bets
TOTAL_LOSSES | Total number of losing bets
NET_ROI | Net return on investment (total net position divided by total amount staked)
MAX_STAKE_PER_DAY | Maximum amount staked in a single day
MIN_STAKE_PER_DAY | Minimum amount staked in a single day
STDDEV_STAKE_PER_DAY | Standard deviation of daily stake amounts
MAX_PAYOUT_PER_DAY | Maximum payout received in a single day
MIN_PAYOUT_PER_DAY | Minimum payout received in a single day
STDDEV_PAYOUT_PER_DAY | Standard deviation of daily payout amounts
MAX_NET_POSITION_PER_DAY | Maximum net position (profit or loss) in a single day
MIN_NET_POSITION_PER_DAY | Minimum net position (profit or loss) in a single day
STDDEV_NET_POSITION_PER_DAY | Standard deviation of daily net positions
WIN_RATIO | Ratio of winning bets to total bets placed
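To illustrate how a few of these metrics can be derived from individual bet records, the sketch below computes AVG_STAKE_PER_BET, WIN_RATIO, and NET_ROI in plain Python. The record fields ("stake", "payout") are illustrative assumptions; the actual feature engineering is implemented in the provided Jupyter notebook.

```python
# Minimal sketch: deriving a few of the listed metrics from bet-level records.
# The field names "stake" and "payout" are assumptions for illustration; the
# real feature engineering lives in the provided notebook.

def betting_metrics(bets):
    """Compute AVG_STAKE_PER_BET, WIN_RATIO, and NET_ROI for a list of bets."""
    total_stake = sum(b["stake"] for b in bets)
    total_payout = sum(b["payout"] for b in bets)
    wins = sum(1 for b in bets if b["payout"] > b["stake"])
    net_position = total_payout - total_stake            # winnings minus losses
    return {
        "AVG_STAKE_PER_BET": total_stake / len(bets),
        "WIN_RATIO": wins / len(bets),
        "NET_ROI": net_position / total_stake,           # net position / total staked
    }

bets = [
    {"stake": 10.0, "payout": 25.0},   # winning bet
    {"stake": 20.0, "payout": 0.0},    # losing bet
    {"stake": 10.0, "payout": 12.0},   # small win
]
print(betting_metrics(bets))
```

With these three sample bets, the total stake is 40.0 and the total payout is 37.0, giving a NET_ROI of -0.075 and a WIN_RATIO of 2/3.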

The solution generates real-time risk assessments, allowing compliance teams to intervene immediately when the model identifies concerning patterns. This automation augments existing responsible gaming controls and strengthens regulatory compliance frameworks.

The provided data schema captures essential player behavior indicators. Organizations can extend these parameters based on their specific requirements and available data to optimize model performance.

The reference architecture of the guidance is shown below:

[Architecture diagram]

Architecture

The platform includes:

  • SageMaker Domain with custom configurations
  • Dedicated VPC for ML workloads
  • S3 bucket for data storage
  • Custom IAM roles and permissions
  • JupyterLab environment with lifecycle configurations

Technical Components

  • Amazon SageMaker Domain: Configured with custom JupyterLab settings
  • Jupyter Notebook: Notebook with feature engineering, training, and inference steps
  • VPC Configuration: Dedicated VPC with public subnets
  • Storage: S3 bucket for storing ML datasets and notebooks
  • Security: Custom IAM roles with specific permissions for SageMaker execution
  • Environment: Custom lifecycle configurations for JupyterLab startup

Cost

You are responsible for the cost of the AWS services used while running this Guidance. As of April 2024, the cost of running this Guidance with the default settings in the US East (N. Virginia) Region is approximately $10.50 per month for training the model.

We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.

Sample Cost Table

The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.

AWS service | Dimensions | Cost [USD]
Amazon S3 | 1 GB per month, 1,000 PUT requests and 1,000 GET requests per month | $0.03
Amazon SageMaker | 1 user, 8 hours per day and 10 training jobs per month | $10.46
Amazon SageMaker | Serverless Inference, 10,000 requests with 100 ms duration per request | $0.02
Amazon SageMaker | Real-time Inference, 1 model deployed, 1 model per endpoint, 1 instance per endpoint, 24 hours per day, 30 days per month (ml.c4.2xlarge) | $344.16
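The real-time inference row dominates the table. As a rough sanity check on the arithmetic, one always-on instance runs 720 hours per month, so the $344.16 figure implies an hourly rate of about $0.478 for ml.c4.2xlarge; actual rates vary by Region and change over time.

```python
# Rough sanity check on the real-time inference row of the sample cost table.
# The ~$0.478/hour rate is inferred from the table, not quoted from the
# pricing page; rates vary by Region and over time.
hours_per_month = 24 * 30                  # 1 instance, 24 hours/day, 30 days
hourly_rate = 344.16 / hours_per_month     # implied ml.c4.2xlarge rate
print(f"{hours_per_month} instance-hours at ~${hourly_rate:.3f}/hour")
```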

Important

Remember to delete any deployed SageMaker endpoints when not in use to avoid unnecessary charges.

Prerequisites

Operating System

Except for the Python data preparation steps, these deployment instructions are OS agnostic, since this is an AWS Cloud Development Kit (AWS CDK) Java-based project. Once you have installed the required tools, you should be able to deploy the project.

Third-Party Tools

This is an AWS CDK project, so the following tools are needed to build it and deploy the resources:

  • Java 17 or later
  • Apache Maven 3.8.6 or later
  • AWS CDK 2.177 or later
  • npm 11.3.0 or later
  • AWS CLI 2.23.9 or later
  • Python 3.7 or later
  • pip (Python package installer)

Ensure Python is installed and added to your system PATH:

  python --version

AWS CDK Bootstrap

This Guidance uses AWS CDK. If you are using AWS CDK for the first time, please perform the bootstrapping process described here.

Deployment Steps

  1. Clone the repository
    git clone <github_repo_url>
  2. cd to the repo folder
    cd <repo-name>/deployment
  3. Install project dependencies:
    mvn clean package
  4. Get the synthesized CloudFormation template
    cdk synth
  5. Deploy the stack to your default AWS account and region
    cdk deploy
  6. Create a Python virtual environment and install Python dependencies:
    cd ../source
    python -m venv myenv # create the Python virtual environment
    source myenv/bin/activate # activate the virtual environment
    pip install -r requirements.txt
  7. Other Useful commands
    cdk ls       #list all stacks in the app
    cdk diff     #compare deployed stack with current state
    

Deployment Validation

  • Open the CloudFormation console and verify the status of the deployed stack.
  • If the deployment was successful, you should see an active SageMaker Domain in the SageMaker AI console.
  • Run the following CLI command to validate the deployment: aws cloudformation describe-stacks --stack-name <stack-name>

Data Preparation

Before training the model, you'll need to prepare the training data through the following steps:

1. Download Source Data

Download the following datasets from The Transparency Project study "Behavioral Characteristics of Internet Gamblers Who Trigger Corporate Responsible Gambling Interventions":

  • Raw Dataset 1 - Demographics
  • Raw Dataset 2 - Daily Aggregates

Save both files to the source/ directory.

2. Generate Synthetic Wager Data

Create synthetic individual bet data from the daily aggregates using the generate_bets.py script:

  cd source
  python generate_bets.py raw_dataset_II_daily_aggregates synthetic_output

3. Create Training Dataset

Combine the demographic data, daily aggregates, and synthetic wagers to create the final training dataset:

  python analyze_betting_stats.py raw_dataset_II raw_dataset_I_demographics_data synthetic_output betting_statistics
  • synthetic_output is the file generated in the previous step
  • betting_statistics is the name of the output file. If you choose a different name, you have to update the notebook cell that reads this data.

4. Upload to S3

Upload the generated training dataset to your S3 bucket:

  aws s3 cp betting_statistics s3://your-bucket-name/training_data/

Note: Replace your-bucket-name with the actual name of the S3 bucket where the model training data will be stored. The bucket was already created by the CDK deployment; you can find its name in the AWS console.

Running the Guidance

Once the CDK stack has been deployed on AWS, navigate to the SageMaker AI console. Select the responsible gaming domain and start the Jupyter notebook already deployed there. The notebook includes step-by-step instructions for training the model and deploying different inference endpoints (serverless or provisioned) to test it with your own test data.
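Once an endpoint is deployed, it can also be invoked outside the notebook. The sketch below builds a CSV payload from a feature dict and calls the endpoint through boto3; the endpoint name and the feature order shown here are assumptions for illustration and must be aligned with what the notebook actually deploys.

```python
# Illustrative sketch only: the endpoint name and the feature order below are
# assumptions -- match them to the endpoint the notebook actually deploys.
FEATURE_ORDER = ["NUM_BETTING_DAYS", "AVG_BETS_PER_DAY",
                 "AVG_STAKE_PER_BET", "NET_ROI"]

def build_csv_payload(features, order=FEATURE_ORDER):
    """Serialize a feature dict into the CSV row the endpoint expects."""
    return ",".join(str(features[name]) for name in order)

def predict(features, endpoint_name="responsible-gaming-endpoint"):
    """Invoke a deployed SageMaker endpoint with one feature row."""
    import boto3  # AWS SDK for Python; requires AWS credentials at runtime
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,        # hypothetical endpoint name
        ContentType="text/csv",
        Body=build_csv_payload(features),
    )
    return response["Body"].read().decode("utf-8")

payload = build_csv_payload({"NUM_BETTING_DAYS": 12, "AVG_BETS_PER_DAY": 4.5,
                             "AVG_STAKE_PER_BET": 7.25, "NET_ROI": -0.31})
print(payload)  # "12,4.5,7.25,-0.31"
```

Keeping payload construction separate from the AWS call makes the serialization easy to test locally before pointing the client at a live endpoint.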

Next Steps

Further actions you can take to extend this Guidance:

  1. Use your own data to train the model
  2. Build a CI/CD pipeline to train, monitor, and fine-tune the model
  3. Build batch or real-time inference endpoints to invoke the model according to your specific use case

Cleanup

To clean up the created resources, complete the following steps:

  • Empty the data of the created S3 buckets and delete them manually
  • Delete the Amazon EFS file system attached to the SageMaker AI domain
  • Delete the SageMaker AI domain
  • Delete the CloudFormation stack

Notices

Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.

Authors

Attributions

The dataset analyzed in this Guidance was obtained from the Transparency Project, a public data repository for privately funded datasets, such as industry-funded data, specifically related to addictive behavior. The Division on Addiction at Harvard Medical School created this repository to promote transparency in privately funded science and better access to scientific information. The data originates from the study "Behavioral Characteristics of Internet Gamblers Who Trigger Corporate Responsible Gambling Interventions" and is publicly available at: http://www.thetransparencyproject.org/index.html
