- Overview
- Prerequisites
- Deployment Steps
- Deployment Validation
- Data Preparation
- Running the Guidance
- Next Steps
- Cleanup
- Notices
This project provides the complete infrastructure for training and deploying a machine learning model that predicts problematic gambling behavior. It creates an Amazon SageMaker Domain with custom JupyterLab environments and the AWS resources required for ML workloads. It is a Maven-based project, so you can open it in any Maven-compatible Java IDE to build and run it.
The betting and gaming industry requires accurate detection of problematic play signals to protect players and maintain regulatory compliance. Late identification of at-risk players damages both player welfare and industry reputation. Traditional monitoring methods detect issues after harm occurs, leading to regulatory penalties and erosion of public trust.
AWS's AI/ML services enable automated, data-driven detection of at-risk players. This solution processes player behavioral and financial data to identify concerning patterns through a machine learning model deployed on Amazon SageMaker. The model analyzes the following key metrics:
| Metric | Description |
|---|---|
| NUM_BETTING_DAYS | Number of distinct days on which the user placed bets |
| AVG_BETS_PER_DAY | Average number of bets placed per day |
| AVG_TOTAL_STAKE_PER_DAY | Average total amount staked per day |
| AVG_TOTAL_PAYOUT_PER_DAY | Average total payout received per day |
| AVG_TOTAL_NET_POSITION_PER_DAY | Average daily net position (winnings minus losses) |
| AVG_STAKE_PER_BET | Average amount staked per individual bet |
| AVG_PRICE_PER_BET | Average odds (price) of bets placed |
| TOTAL_LATE_BETS_0004 | Total number of bets placed between midnight and 4 AM |
| TOTAL_LATE_BETS_2024 | Total number of bets placed between 8 PM and midnight |
| TOTAL_WINS | Total number of winning bets |
| TOTAL_LOSSES | Total number of losing bets |
| NET_ROI | Net Return on Investment (total net position divided by total amount staked) |
| MAX_STAKE_PER_DAY | Maximum amount staked in a single day |
| MIN_STAKE_PER_DAY | Minimum amount staked in a single day |
| STDDEV_STAKE_PER_DAY | Standard deviation of daily stake amounts |
| MAX_PAYOUT_PER_DAY | Maximum payout received in a single day |
| MIN_PAYOUT_PER_DAY | Minimum payout received in a single day |
| STDDEV_PAYOUT_PER_DAY | Standard deviation of daily payout amounts |
| MAX_NET_POSITION_PER_DAY | Maximum net position (profit or loss) in a single day |
| MIN_NET_POSITION_PER_DAY | Minimum net position (profit or loss) in a single day |
| STDDEV_NET_POSITION_PER_DAY | Standard deviation of daily net positions |
| WIN_RATIO | Ratio of winning bets to total bets placed |
The solution generates real-time risk assessments, allowing compliance teams to intervene immediately when the model identifies concerning patterns. This automation augments existing responsible gaming controls and strengthens regulatory compliance frameworks.
The provided data schema captures essential player behavior indicators. Organizations can extend these parameters based on their specific requirements and available data to optimize model performance.
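To make the derivation of these metrics concrete, the following is a minimal pandas sketch, not the project's analyze_betting_stats.py script; the input file and column names (player_id, bet_date, stake, payout) are assumptions for illustration only.

```python
# Illustrative sketch (not the project's analyze_betting_stats.py): deriving a few of the
# per-player metrics above from individual bet records. Column names are assumptions.
import pandas as pd

bets = pd.read_csv("synthetic_output")  # hypothetical file of individual bets

# Aggregate bets to one row per player per day
daily = (bets.groupby(["player_id", "bet_date"])
             .agg(bets_per_day=("stake", "size"), stake_per_day=("stake", "sum"))
             .reset_index())

# Roll the daily rows up to one feature row per player
features = daily.groupby("player_id").agg(
    NUM_BETTING_DAYS=("bet_date", "nunique"),
    AVG_BETS_PER_DAY=("bets_per_day", "mean"),
    AVG_TOTAL_STAKE_PER_DAY=("stake_per_day", "mean"),
    STDDEV_STAKE_PER_DAY=("stake_per_day", "std"),
)

# NET_ROI: total net position (payout minus stake) divided by the total amount staked
totals = bets.groupby("player_id").agg(total_stake=("stake", "sum"),
                                       total_payout=("payout", "sum"))
features["NET_ROI"] = (totals["total_payout"] - totals["total_stake"]) / totals["total_stake"]
```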
The reference architecture of the guidance is shown below:
The platform includes:
- SageMaker Domain with custom configurations
- Dedicated VPC for ML workloads
- S3 bucket for data storage
- Custom IAM roles and permissions
- JupyterLab environment with lifecycle configurations
The main components are:
- Amazon SageMaker Domain: Configured with custom JupyterLab settings
- Jupyter Notebook: Covers the feature engineering, training, and inference steps
- VPC Configuration: Dedicated VPC with public subnets
- Storage: S3 bucket for storing ML datasets and notebooks
- Security: Custom IAM roles with specific permissions for SageMaker execution
- Environment: Custom lifecycle configurations for JupyterLab startup
You are responsible for the cost of the AWS services used while running this Guidance. As of April 2024, the cost for running this Guidance with the default settings in the US East (N. Virginia) Region is approximately $10.50 per month for training the model.
We recommend creating a Budget through AWS Cost Explorer to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance.
The following table provides a sample cost breakdown for deploying this Guidance with the default parameters in the US East (N. Virginia) Region for one month.
| AWS service | Dimensions | Cost [USD] |
|---|---|---|
| Amazon S3 | 1 GB per month, 1,000 PUT requests and 1,000 GET requests per month | $ 0.03 |
| Amazon SageMaker | 1 user, 8 hours per day and 10 training jobs per month | $ 10.46 |
| Amazon SageMaker | Serverless Inference, 10,000 requests with 100 ms duration per request | $ 0.02 |
| Amazon SageMaker | Real-time Inference, 1 model deployed, 1 model per endpoint, 1 instance per endpoint, 24 hours per day, 30 days per month (ml.c4.2xlarge) | $ 344.16 |
Remember to delete any deployed SageMaker endpoints when not in use to avoid unnecessary charges.
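If you prefer to do this programmatically, the following is a minimal boto3 sketch that lists the SageMaker endpoints in your account and deletes one that is no longer needed; the endpoint and endpoint configuration names are placeholders for the ones created from the notebook.

```python
# Minimal sketch: list SageMaker endpoints and delete one that is no longer needed,
# along with its endpoint configuration. Names below are placeholders.
import boto3

sm = boto3.client("sagemaker")

for endpoint in sm.list_endpoints()["Endpoints"]:
    print(endpoint["EndpointName"], endpoint["EndpointStatus"])

sm.delete_endpoint(EndpointName="<your-endpoint-name>")                    # stops real-time charges
sm.delete_endpoint_config(EndpointConfigName="<your-endpoint-config-name>")
```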
With the exception of the Python data preparation, these deployment instructions are OS-agnostic because this is an AWS Cloud Development Kit (AWS CDK) Java-based project. Once you have installed the required tools, you should be able to deploy the project.
This is an AWS CDK project, so the following tools are needed to build it and deploy the resources:
- Java 17 or later
- Apache Maven 3.8.6 or later
- AWS CDK 2.177 or later
- npm 11.3.0 or later
- AWS CLI 2.23.9 or later
- Python 3.7 or later
- pip (Python package installer)
Ensure Python is installed and added to your system PATH:
python --version
This Guidance uses the AWS CDK. If you are using the AWS CDK for the first time, perform the bootstrapping process described in the AWS CDK Bootstrapping documentation.
- Clone the repository
git clone <github_repo_url>
- cd to the repo folder
cd <repo-name>/deployment
- Install project dependencies:
mvn clean package
- Get the synthesized CloudFormation template
cdk synth
- Deploy the stack to your default AWS account and region
cdk deploy
- Create a Python virtual environment and install Python dependencies:
cd ../source
python -m venv myenv        # create the Python virtual environment
source myenv/bin/activate   # activate the virtual environment
pip install -r requirements.txt
- Other useful commands:
cdk ls     # list all stacks in the app
cdk diff   # compare the deployed stack with the current state
- Open the CloudFormation console and verify the status of the stack with the name starting with xxxxxx.
- If the deployment is successful, you should see an active SageMaker Domain in the SageMaker AI console.
- Run the following CLI command to validate the deployment:
aws cloudformation describe-stacks --stack-name <stack-name>
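The same check can also be scripted. Below is a minimal boto3 sketch, assuming <stack-name> is replaced with the name of the deployed stack; it prints the stack status and its outputs.

```python
# Minimal sketch: confirm the CloudFormation stack deployed successfully and list its outputs.
# "<stack-name>" is a placeholder for the stack deployed with `cdk deploy`.
import boto3

cfn = boto3.client("cloudformation")
stack = cfn.describe_stacks(StackName="<stack-name>")["Stacks"][0]

print(stack["StackStatus"])                 # expect CREATE_COMPLETE or UPDATE_COMPLETE
for output in stack.get("Outputs", []):
    print(output["OutputKey"], "=", output["OutputValue"])
```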
Before training the model, you'll need to prepare the training data through the following steps:
Download the following datasets from The Transparency Project study "Behavioral Characteristics of Internet Gamblers Who Trigger Corporate Responsible Gambling Interventions":
- Raw Dataset 1 - Demographics
- Raw Dataset 2 - Daily Aggregates
Save both files to the source/ directory.
Create synthetic individual bet data from the daily aggregates using the generate_bets.py script:
cd source
python generate_bets.py raw_dataset_II_daily_aggregates synthetic_output
Combine the demographic data, daily aggregates, and synthetic wagers to create the final training dataset:
python analyze_betting_stats.py raw_dataset_II raw_dataset_I_demographics_data synthetic_output betting_statistics
- synthetic_output is the previously generated file
- betting_statistics is the name of the output file. If you choose a different name, you must update the notebook cell that reads this data.
Upload the generated training dataset to your S3 bucket:
aws s3 cp betting_statistics s3://your-bucket-name/training_data/
Note: Replace your-bucket-name with the name of the S3 bucket where the model training data will be stored. The bucket has already been created by the CDK code; you can get its name from the AWS console.
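If you prefer to script the upload from Python instead of the CLI, the following is a minimal boto3 sketch; the bucket name is a placeholder you must replace with the bucket created by the CDK stack.

```python
# Minimal sketch: upload the generated training dataset with boto3 instead of the AWS CLI.
# "<your-bucket-name>" is a placeholder for the bucket created by the CDK stack.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="betting_statistics",
    Bucket="<your-bucket-name>",
    Key="training_data/betting_statistics",
)
```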
Once the CDK code has been deployed to AWS, navigate to the SageMaker AI console. Select the responsible gaming domain and open the Jupyter notebook already deployed there. The notebook includes step-by-step instructions for training the model and deploying different inference endpoints (serverless or provisioned) so you can test it with your own test data.
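Once an endpoint has been deployed from the notebook, you can also invoke it outside the notebook. The snippet below is an illustrative boto3 sketch only: the endpoint name, the feature order, and the CSV content type are assumptions, and the notebook defines the exact payload format the model expects.

```python
# Illustrative sketch: invoke a deployed inference endpoint with one row of player metrics.
# The endpoint name, feature order, and content type are assumptions; follow the notebook
# for the exact payload format used during training.
import boto3

runtime = boto3.client("sagemaker-runtime")

# One comma-separated row of feature values in the order used during training (example values only)
payload = "12,8.5,250.0,230.0,-20.0,29.4,2.1,3,40,55,47,-0.08,600.0,20.0,110.0,580.0,0.0,95.0,150.0,-300.0,120.0,0.54"

response = runtime.invoke_endpoint(
    EndpointName="<your-endpoint-name>",
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode())   # the model's risk prediction
```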
Further actions you can take to extend this Guidance:
- You can use your own data to train the model
- Build a CI/CD pipeline to train, monitor, and fine-tune the model
- Build batch or real-time inference endpoints to invoke the model according to your specific use case
To clean up the created resources, perform the following steps:
- Empty the created S3 buckets and delete them manually
- Delete the EFS file system attached to the SageMaker AI domain
- Delete the SageMaker AI domain
- Delete the CloudFormation stack
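For the S3 and CloudFormation steps, a minimal boto3 sketch is shown below; the bucket and stack names are placeholders, and the SageMaker AI domain with its EFS file system should still be removed as described in the list above.

```python
# Minimal sketch of the S3 and CloudFormation cleanup steps; names are placeholders.
import boto3

# Empty and delete the data bucket created by the stack
bucket = boto3.resource("s3").Bucket("<your-bucket-name>")
bucket.objects.all().delete()
bucket.object_versions.delete()   # only needed if versioning is enabled
bucket.delete()

# Delete the CloudFormation stack (equivalent to running `cdk destroy` from the deployment folder)
boto3.client("cloudformation").delete_stack(StackName="<stack-name>")
```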
Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers.
The dataset analyzed in this Guidance was obtained from The Transparency Project, a public data repository for privately funded datasets, such as industry-funded data, related to addictive behavior. The Division on Addiction at Harvard Medical School created this repository to promote transparency for privately funded science and better access to scientific information. The data originates from the study "Behavioral Characteristics of Internet Gamblers Who Trigger Corporate Responsible Gambling Interventions" and is publicly available at: http://www.thetransparencyproject.org/index.html
