In this playbook, we go over the steps for debloating and serving a web application using DBLTR. At a high level, we first use the Less is More platform to generate a baseline usage profile of the web application users in the form of line-coverage logs. Next, we import the code coverage data into the DBLTR Jupyter notebook and apply a clustering algorithm to group users with similar behavior under the same role (i.e., cluster). Finally, we deploy the docker-compose environment with the configurations generated by DBLTR to serve the debloated web applications to the users.
Less is More can be set up using the following guide: https://lessismore.debloating.com/. More details and a playbook are available at: https://playground.debloating.com/. After this step is done, we export the code coverage of each user into the CSV format. The sql_to_csv.py script can help automate this process.
For this demonstration, Less is More is hosted under the LIM/training directory for debloating phpMyAdmin.
In this setup, we have five users (Alice, Bob, Charlie, David, and Eli) who perform minimal actions on phpMyAdmin (kept minimal for demonstration purposes). Alice creates a database and inserts some rows of data; Bob does the same but also views the list of users; Charlie views the existing databases without making any changes; David views databases and runs manual queries; and finally Eli only views various phpMyAdmin parameters.
After exporting the code coverage data from LIM, we use the provided Python script (sql_to_csv.py) on a system with mysql-server installed to convert the database backup to CSV files for DBLTR:
- files.csv: contains the "filename" of each covered file.
- lines.csv: is in "filename, line_number" format.
The output CSV files generated by LIM are available under LIM/training/users/.
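For reference, the sketch below shows the kind of conversion sql_to_csv.py automates: exporting one user's coverage from the LIM database into files.csv and lines.csv. The table and column names (coverage, filename, line_number, user) and the connection details are assumptions for illustration, not LIM's actual schema.

```python
import csv
import mysql.connector

# Assumed connection details and schema; adjust to the restored LIM backup.
conn = mysql.connector.connect(host="localhost", user="root",
                               password="...", database="lim_coverage")
cur = conn.cursor()

# files.csv: one covered filename per row.
cur.execute("SELECT DISTINCT filename FROM coverage WHERE user = %s", ("alice",))
with open("files.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for (filename,) in cur:
        writer.writerow([filename])

# lines.csv: "filename, line_number" rows.
cur.execute("SELECT filename, line_number FROM coverage WHERE user = %s", ("alice",))
with open("lines.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for filename, line_number in cur:
        writer.writerow([filename, line_number])

conn.close()
```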
Now we switch our focus to the Jupyter notebook "rbd_dataanalysis". This notebook is hosted under the analysis directory and can be set up using the provided docker-compose environment through: docker compose up -d, and then navigating to http://localhost:8888/lab/tree/work/rbd_dataanalysis.ipynb. The token to access this notebook is set in the docker-compose env variable and is currently set to "jupytersecrettokenabhsyd68ay".
We can follow the cells in the notebook. Certain steps can take a long time, from 30 minutes to a couple of hours, to complete on large applications with many users.
We have also provided the output of lengthy steps in the form of Python pickled objects. At the end of each section, the pickle files are restored; this provides an alternative to running the individual cells of that section. For the sections where a pickle file is available, you can jump to the end of the section and quickly restore the data from the pickle file. For new web applications outside our dataset, the whole process needs to be followed instead of restoring pickle files.
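Restoring a checkpoint is a standard pickle load; a minimal sketch follows (the file name below is a placeholder, and the notebook cells reference the actual paths):

```python
import pickle

# Placeholder path; each notebook section names its own pickle file.
with open("coverage_data.pkl", "rb") as f:
    coverage_data = pickle.load(f)
```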
- Lib Imports: Prepares the packages required for the debloating and analysis of the results.
- Import CSV Files [Pickle file available]: Import the CSV files for file and line coverage information of web application users.
- Add Source Code Features [Pickle file available]: Extracts features from the code coverage data that are used to identify similar usage patterns during clustering. These include the files, functions, classes, and namespaces covered by each user.
- Clustering [Pickle file available]: We apply the spectral clustering algorithm in combination with the Jaccard similarity metric to perform the clustering (a minimal sketch is provided after this list).
- Evaluate Clusters: This step compares the debloating results across different cluster counts to identify the optimal number of clusters (i.e., roles) for the web applications. The slope of the lines plotting the number of functions remaining after debloating against the total number of roles can be used to pick the total number of roles: we want the minimal number of roles that provides the best debloating, an approach also referred to as the elbow method (see the second sketch after this list).
- Optimal Cluster Size: Includes the number of roles determined by the previous step; in our example, 6 and 7 roles for phpMyAdmin and WordPress, respectively.
- Generate Artifacts [Pickle file available]: The output of clustering is the roles and mapping of users to roles. Based on this information, we merge the code coverage of users assigned to each role, debloat the copies of web applications specific to each role and generate the docker-compose file to serve these applications. Finally, we provide the user to role mapping information to our reverse-proxy to route user traffic towards their specially debloated web applications.
- Generate Docker Environment Files: This step generates the docker files. This is the last step required to produce debloated web applications. We can now use the provided user-to-role mappings and docker-compose configuration to serve the web applications.
- Attack Surface Reduction Analysis: This step is extra and can be used to extract and analyze the information about the reduced lines of code, removed CVEs, and gadget chains after debloating.
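A minimal, self-contained sketch of the clustering step, assuming a binary users-by-features matrix built from the coverage CSVs (the toy matrix below stands in for the real feature construction in the notebook):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy stand-in for the real feature matrix (users x covered files/functions);
# in practice this is built from the LIM coverage CSVs.
rng = np.random.default_rng(0)
user_features = (rng.random((5, 40)) < 0.3).astype(int)

def jaccard_similarity(X):
    """Pairwise Jaccard similarity between the binary rows of X."""
    inter = X @ X.T                                # intersection sizes
    sizes = X.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    return inter / np.maximum(union, 1)

sim = jaccard_similarity(user_features)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(sim)
print(labels)  # cluster (role) assignment per user
```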
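And a sketch of the elbow-style evaluation; remaining_functions() here is a hypothetical stand-in for the notebook's per-role-count debloating measurement:

```python
import matplotlib.pyplot as plt

def remaining_functions(k):
    """Stand-in: the notebook re-runs debloating with k roles and counts the
    functions that remain; here we fake a diminishing-returns curve."""
    return 1000 // k + 200

ks = list(range(2, 11))
plt.plot(ks, [remaining_functions(k) for k in ks], marker="o")
plt.xlabel("number of roles (k)")
plt.ylabel("functions remaining after debloating")
plt.show()
# Choose the smallest k after which the curve flattens (the "elbow").
```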
In order to serve the debloated web applications, we use the generated mappings.txt configuration file, which includes the user-to-role mappings, along with the docker-compose.yaml in the root of this repository to host the DBLTR setup.
The web applications will be served under localhost:8080.
Upon logging in, the authentication cookie of each user is extracted by our OpenResty Lua modules and stored in the Redis datastore. Subsequent requests from users containing the authentication cookie will instruct the reverse-proxy to transparently route their requests towards their custom debloated web applications.
Responses from DBLTR will include an "active_proxy" HTTP header to show which backend served that request.
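A quick way to verify which backend handled a request, assuming an already-authenticated session (the cookie value below is a placeholder obtained by logging in first):

```python
import requests

# Placeholder session value; use the cookie issued after a real login.
resp = requests.get("http://localhost:8080/",
                    cookies={"phpmyadmin": "<session-cookie-value>"})
print(resp.status_code, resp.headers.get("active_proxy"))
```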
A demo of DBLTR protecting users against CVE-2019-12616 is available at: https://vimeo.com/652161913/bc5fdd1eea
To debloat a new web application with DBLTR, the overall process is as follows:
- Set up the web application under a LIM-like setup to collect the code-coverage data from web application users for a period of time.
- Import the code-coverage data into the debloating pipeline (Jupyter notebook) to produce the debloating roles.
- Create the user authentication detection logic as a new OpenResty Lua module.
- Use the provided configuration to host the debloated web applications.
The files for this module are located under docker/reverse-proxy/lua/. The skeleton of this code is available in common.lua, along with application-specific files under pma (phpMyAdmin login detection) and wp (WordPress login detection).
The default.conf file, an Nginx/OpenResty configuration file, is used to activate the Lua modules.
At a high level:
- login_handler.lua: Detects a successful login request. In the example of phpMyAdmin, this consists of a POST request towards the root of the web application that should result in a 302 HTTP response code. This module then extracts the authentication cookie value under the "phpmyadmin" cookie. This mapping is stored in the Redis datastore for future use.
- login_username_extractor.lua: Extracts the provided username from the login request. In the example of phpMyAdmin, we look for POST requests to "/" or "index.php" whose "pma_username" POST parameter contains the username.
- redirect_to_proxy.lua: Looks for the presence of authentication cookies (e.g., "phpmyadmin") and tries to extract the username from the datastore based on the authentication cookie value. Next, we find the user-to-role mapping and instruct the reverse-proxy to route user traffic to their debloated web application (a sketch of this decision logic follows).
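For illustration, here is a Python sketch of the routing decision that redirect_to_proxy.lua implements in Lua; the Redis key format, the user_to_role lookup, and the backend naming scheme are assumptions for this sketch, not the module's actual code:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def backend_for_request(cookies, user_to_role):
    """Map an incoming request's cookies to a role-specific backend."""
    session = cookies.get("phpmyadmin")         # authentication cookie
    if session is None:
        return "default_backend"                # unauthenticated traffic
    username = r.get(f"session:{session}")      # assumed key format: cookie -> username
    if username is None:
        return "default_backend"
    role = user_to_role.get(username.decode(), "default")
    return f"pma_role_{role}"                   # assumed role-specific backend name
```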