
Protect your data using Data Loss Prevention

Introduction

This architecture uses a serverless pipeline to securely process and store logs, ensuring sensitive information remains masked while retaining valuable insights for troubleshooting. By leveraging the Data Loss Prevention (DLP) API's powerful redaction capabilities, it enables organizations to confidently analyze log data without compromising data privacy.

In this architecture, log entries are routed from their source to Pub/Sub, a scalable messaging service. A Cloud Run pipeline then ingests these log entries, aggregates them into batches to optimize DLP API calls, and invokes the DLP service for content inspection and transformation.
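The batching step described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the `batch_entries` helper and its `max_batch_size` parameter are hypothetical names, and the real service may batch by payload size or by time window instead.

```python
from typing import Iterable, Iterator, List


def batch_entries(entries: Iterable[str], max_batch_size: int = 3) -> Iterator[List[str]]:
    """Group log entries into lists of at most max_batch_size items.

    Hypothetical helper: each yielded batch would be sent to DLP in a
    single call instead of one call per log entry.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[str] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


batches = list(batch_entries(["log-1", "log-2", "log-3", "log-4"], max_batch_size=3))
print(batches)  # [['log-1', 'log-2', 'log-3'], ['log-4']]
```

Fewer, larger calls reduce per-request overhead, at the cost of slightly higher latency for the first entries in each batch.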

The DLP service, using pre-defined or custom infoType detectors, identifies sensitive information within the log entries. It then applies configurable masking techniques, such as tokenization or redaction, to obfuscate sensitive data. The transformed logs, now free of sensitive information, are then stored in a designated log bucket in Cloud Logging, ready for further analysis.
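As a rough sketch of what such a DLP call could look like with the `google-cloud-dlp` Python client, the snippet below builds a de-identification request that replaces every finding with a fixed string. The list of infoTypes, the helper names `build_deidentify_request` and `redact`, and the use of the `global` location are assumptions for illustration; the replacement text `[SENSITIVE DATA]` is taken from the example output later in this README. The repository's actual configuration may differ.

```python
from typing import Any, Dict, Sequence

# Assumed infoTypes; the detectors the repository actually enables may differ.
DEFAULT_INFO_TYPES = ("PERSON_NAME", "EMAIL_ADDRESS", "US_SOCIAL_SECURITY_NUMBER")


def build_deidentify_request(
    project_id: str,
    text: str,
    info_types: Sequence[str] = DEFAULT_INFO_TYPES,
    replacement: str = "[SENSITIVE DATA]",
) -> Dict[str, Any]:
    """Build a request dict for DlpServiceClient.deidentify_content (hypothetical helper)."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        # Which detectors to run against the text.
        "inspect_config": {"info_types": [{"name": name} for name in info_types]},
        # Replace every finding with the same placeholder string.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "replace_config": {"new_value": {"string_value": replacement}}
                        }
                    }
                ]
            }
        },
        "item": {"value": text},
    }


def redact(project_id: str, text: str) -> str:
    """Send the text to DLP and return the transformed value.

    Requires the google-cloud-dlp package and valid credentials.
    """
    from google.cloud import dlp_v2  # imported lazily so the builder works without it

    client = dlp_v2.DlpServiceClient()
    response = client.deidentify_content(request=build_deidentify_request(project_id, text))
    return response.item.value
```

Keeping the request construction separate from the network call makes the configuration easy to inspect and unit test without credentials.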

This architecture allows for seamless integration between log routing, batch processing, and the DLP API, enabling organizations to protect sensitive information while maintaining the utility of their log data for troubleshooting and analysis. It ensures compliance with data privacy regulations and best practices, safeguarding both customer data and internal confidential information.

Use cases

Automated Sensitive Data Masking in Application Logs: This pipeline automatically masks sensitive information in application logs to prevent data breaches. Applications generate logs that are collected by Cloud Logging. The Log Router sends these logs to Pub/Sub, where Cloud Run processes them using the DLP API to identify and mask sensitive data. The masked logs are then securely stored back in Cloud Logging, ensuring compliance with data protection regulations.
Real-Time Compliance Monitoring and Reporting: Financial institutions can use this pipeline to comply with regulatory requirements by monitoring sensitive financial data in real time. Application logs collected by Cloud Logging are forwarded to Pub/Sub. Cloud Run processes these logs, using the DLP API to mask sensitive information. The de-identified logs are stored back in Cloud Logging, while a separate Cloud Run job or BigQuery can generate compliance reports automatically.
Incident Response and Alerting for Data Breaches: Organizations can enhance their incident response capabilities with this pipeline by detecting and responding to data breaches involving sensitive information. Application logs collected by Cloud Logging are sent to Pub/Sub. Cloud Run processes these logs with the DLP API to detect and mask sensitive data. Alerts are published to a separate Pub/Sub topic if sensitive data is detected, triggering incident response workflows via Cloud Functions or other services to notify security teams and initiate remediation processes.

Architecture

The main components that this deployment sets up are:

Cloud Logging: fully managed service for storing, searching, analyzing, monitoring, and alerting on log data and events.
Pub/Sub: asynchronous messaging service that allows for communication between services. It is used for streaming analytics, data integration, and event distribution.
Cloud Run: fully managed serverless platform on Google Cloud for running stateless containers. It automatically scales your application based on traffic, which keeps resource utilization and cost in line with demand.
Cloud Data Loss Prevention: a service offered by Google Cloud to help organizations discover, classify, and protect their most sensitive data.

Costs

Pricing estimates: We have created a sample estimate based on usage we see from new startups looking to scale. It gives you an idea of how much this deployment would cost per month at that scale, and you can extend it to the scale you need. Here's the link. Data Loss Prevention pricing is also available here.

Deploy the architecture

Before we deploy the architecture, you will need the following information:

The project ID

Estimated deployment time: 15 min

Follow the steps below to deploy the architecture:

Click the Open in Google Cloud Shell button below.

Run the prerequisites script to enable the required APIs and permissions.

sh prereq.sh

Next, you'll be asked to enter the project ID of the destination project. Please provide the project ID when prompted.

After this is complete, deploy the generate-service Cloud Run application with the following command:

gcloud run deploy generate-service --source code/generator/ --region us-central1 --update-env-vars PROJECT_ID=<PROJECT ID>

NOTE: Upon executing this command, you will be prompted to specify whether you wish to create a new repository and allow unauthenticated invocations. Respond with Y to create the repository and N to disallow unauthenticated invocations.

Next, deploy the redact-service application with the following command:

gcloud run deploy redact-service --source code/redact/ --region us-central1 --update-env-vars PROJECT_ID=<PROJECT ID>

NOTE: Upon executing this command, you will be asked whether to allow unauthenticated invocations. Respond with N to disallow such invocations.
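Both deploy commands pass `PROJECT_ID` to the container through `--update-env-vars`. A service could read that variable at startup along the following lines. This is only a sketch: the `get_project_id` helper is a hypothetical name, and the repository's services may read the variable differently.

```python
import os
from typing import Mapping


def get_project_id(env: Mapping[str, str] = os.environ) -> str:
    """Return the PROJECT_ID set at deploy time, failing fast if it is missing."""
    project_id = env.get("PROJECT_ID", "").strip()
    if not project_id:
        raise RuntimeError("PROJECT_ID is not set; pass it with --update-env-vars")
    return project_id


# Example with an explicit mapping instead of the real process environment.
print(get_project_id({"PROJECT_ID": "example-project"}))  # example-project
```

Failing at startup when the variable is missing surfaces a misconfigured deployment immediately rather than on the first request.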

If you face any issues while running these commands, try running them again in a clean project.

Now you need to create a log router that intercepts the generate-service logs and sends them to Pub/Sub. Pub/Sub then sends every message to redact-service, which applies the necessary masking to protect sensitive information.
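When Pub/Sub delivers messages to an HTTP endpoint through a push subscription, each request body is a JSON envelope whose `message.data` field holds the base64-encoded payload. Assuming redact-service is wired up that way, decoding one routed log entry could look like the sketch below. The `decode_push_envelope` helper and the sample log entry are hypothetical and only illustrate the envelope format.

```python
import base64
import json
from typing import Any, Dict


def decode_push_envelope(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the JSON log entry from a Pub/Sub push request body."""
    message = envelope.get("message")
    if not isinstance(message, dict) or "data" not in message:
        raise ValueError("not a valid Pub/Sub push envelope")
    payload = base64.b64decode(message["data"]).decode("utf-8")
    return json.loads(payload)


# Simulate what Pub/Sub would POST for one routed log entry.
log_entry = {"textPayload": "user logged in", "severity": "INFO"}
envelope = {
    "message": {
        "data": base64.b64encode(json.dumps(log_entry).encode("utf-8")).decode("ascii"),
        "messageId": "1",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
}
print(decode_push_envelope(envelope))
```

Rejecting malformed envelopes early lets the service return an error status instead of passing unexpected input to DLP.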

You need the URL of redact-service. To obtain it, run the following command:

gcloud run services describe redact-service --region us-central1 --format 'value(status.url)'

You can now run the following command to deploy the DLP project:

terraform apply -var project_id=<PROJECT ID>

Replace <PROJECT ID> with your project ID.

Result

Congratulations! The DLP project deployment should now be underway. Please be patient as this process might take some time. Kindly keep this window open during the deployment. Once completed, we'll proceed to test the architecture and then guide you through cleaning up your environment.

Testing the architecture

Once you have deployed the solution successfully, test it by running the command below to read the redact-service logs:

gcloud beta run services logs read redact-service --limit=20 --project <PROJECT ID> --region us-central1

Check the logs in the output.

Example:

{
  'name': '[SENSITIVE DATA]',
  'email': '[SENSITIVE DATA]',
  'address': '1234 [SENSITIVE DATA]',
  'phone_number': '123456789',
  'ssn': '[SENSITIVE DATA]',
  'credit_card_number': '123456789'
}
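To check quickly which fields of an entry were masked, you can look for the placeholder string in each value. The snippet below is a small sketch using the example entry above; the `masked_fields` helper is a hypothetical name and not part of the repository.

```python
from typing import Dict, List

MASK = "[SENSITIVE DATA]"


def masked_fields(entry: Dict[str, str], mask: str = MASK) -> List[str]:
    """Return the keys whose values contain the masking placeholder."""
    return [key for key, value in entry.items() if mask in value]


example = {
    "name": "[SENSITIVE DATA]",
    "email": "[SENSITIVE DATA]",
    "address": "1234 [SENSITIVE DATA]",
    "phone_number": "123456789",
    "ssn": "[SENSITIVE DATA]",
    "credit_card_number": "123456789",
}
print(masked_fields(example))  # ['name', 'email', 'address', 'ssn']
```

In the example output, `phone_number` and `credit_card_number` appear unmasked, so they are not reported by this check.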

Cleaning up your environment

Run the command below in Cloud Shell to destroy the resources.

terraform destroy -var project_id=<PROJECT ID>

To delete the Cloud Run services, run the commands below:

gcloud run services delete redact-service --region us-central1

gcloud run services delete generate-service --region us-central1

You also need to delete the container images that were generated. Run the commands below to delete them:

gcloud artifacts docker images delete us-central1-docker.pkg.dev/<PROJECT ID>/cloud-run-source-deploy/generate-service

gcloud artifacts docker images delete us-central1-docker.pkg.dev/<PROJECT ID>/cloud-run-source-deploy/redact-service

The commands above delete the associated resources, so no further charges are incurred for them.
