In this playbook, we go over the steps for debloating and serving a web application using DBLTR. At a high level, we first use the Less is More platform to generate a baseline usage profile of the web application users in the form of line-coverage logs. Next, we import the code coverage data into the DBLTR Jupyter notebook and apply a clustering algorithm to group users with similar behavior under the same role (i.e., cluster). Finally, we deploy the docker-compose environment with the configurations generated by DBLTR to serve the debloated web applications to the users.
Less is More can be set up using the following guide: https://lessismore.debloating.com/. More details and a playbook are available at: https://playground.debloating.com/. After this step is done, we export the code coverage of each user into the CSV format. The sql_to_csv.py script can help automate this process.
For this demonstration, Less is More is hosted under the LIM/training directory for debloating phpMyAdmin.
In this setup, we have five users (Alice, Bob, Charlie, David, and Eli) who perform minimal actions on phpMyAdmin (kept minimal for demonstration purposes). Alice creates a database and inserts some rows of data; Bob does the same but also views the list of users; Charlie views the existing databases without making any changes; David views databases and runs manual queries; and finally Eli only views various phpMyAdmin parameters.
After exporting the code coverage data from LIM, we use the provided Python script (sql_to_csv.py) on a system with mysql-server installed to convert the database backup to CSV files for DBLTR:
- files.csv: contains the "filename" of each covered file.
- lines.csv: is in "filename, line_number" format.
The output CSV files generated by LIM are available under LIM/training/users/.
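For reference, the sketch below shows the kind of conversion sql_to_csv.py automates: exporting one user's coverage from the LIM database into files.csv and lines.csv. The table and column names (coverage, filename, line_number, user) and the connection details are assumptions for illustration, not LIM's actual schema.

```python
import csv
import mysql.connector

# Assumed connection details and schema; adjust to the restored LIM backup.
conn = mysql.connector.connect(host="localhost", user="root",
                               password="...", database="lim_coverage")
cur = conn.cursor()

# files.csv: one covered filename per row.
cur.execute("SELECT DISTINCT filename FROM coverage WHERE user = %s", ("alice",))
with open("files.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for (filename,) in cur:
        writer.writerow([filename])

# lines.csv: "filename, line_number" rows.
cur.execute("SELECT filename, line_number FROM coverage WHERE user = %s", ("alice",))
with open("lines.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for filename, line_number in cur:
        writer.writerow([filename, line_number])

conn.close()
```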
Now we switch our focus to the Jupyter notebook "rbd_dataanalysis". This notebook is hosted under the analysis directory and can be set up using the provided docker-compose environment through: docker compose up -d, and then navigating to http://localhost:8888/lab/tree/work/rbd_dataanalysis.ipynb. The token to access this notebook is set in the docker-compose env variable and is currently set to "jupytersecrettokenabhsyd68ay".
We can follow the cells in the notebook. Certain steps can take a long time, from 30 minutes to a couple of hours, to complete on large applications with many users.
We have also provided the output of lengthy steps in the form of Python pickled objects. At the end of each section, the pickle files are restored; this provides an alternative to running the individual cells of that section. For the sections where a pickle file is available, you can jump to the end of the section and quickly restore the data from the pickle file. For new web applications outside our dataset, the whole process needs to be followed instead of restoring pickle files.
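Restoring a checkpoint is a standard pickle load; a minimal sketch follows (the file name below is a placeholder, and the notebook cells reference the actual paths):

```python
import pickle

# Placeholder path; each notebook section names its own pickle file.
with open("coverage_data.pkl", "rb") as f:
    coverage_data = pickle.load(f)
```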
- Lib Imports: Prepares the packages required for the debloating and analysis of the results.
- Import CSV Files [Pickle file available]: Import the CSV files for file and line coverage information of web application users.
- Add Source Code Features [Pickle file available]: Extracts features from the code coverage data that are used to identify similar usage patterns during clustering. These include the files, functions, classes, and namespaces covered by each user.
- Clustering [Pickle file available]: We apply the spectral clustering algorithm in combination with the Jaccard similarity metric to perform the clustering (a minimal sketch is provided after this list).
- Evaluate Clusters: This step compares the debloating results across different cluster counts to identify the optimal number of clusters (i.e., roles) for the web applications. The slope of the lines plotting the number of functions remaining after debloating against the total number of roles can be used to pick the total number of roles: we want the minimal number of roles that provides the best debloating, an approach also referred to as the elbow method (see the second sketch after this list).
- Optimal Cluster Size: Includes the number of roles determined by the previous step; in our example, 6 and 7 roles for phpMyAdmin and WordPress, respectively.
- Generate Artifacts [Pickle file available]: The output of clustering is the roles and mapping of users to roles. Based on this information, we merge the code coverage of users assigned to each role, debloat the copies of web applications specific to each role and generate the docker-compose file to serve these applications. Finally, we provide the user to role mapping information to our reverse-proxy to route user traffic towards their specially debloated web applications.
- Generate Docker Environment Files: This step generates the docker files. This is the last step required to produce debloated web applications. We can now use the provided user-to-role mappings and docker-compose configuration to serve the web applications.
- Attack Surface Reduction Analysis: This step is extra and can be used to extract and analyze the information about the reduced lines of code, removed CVEs, and gadget chains after debloating.
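A minimal, self-contained sketch of the clustering step, assuming a binary users-by-features matrix built from the coverage CSVs (the toy matrix below stands in for the real feature construction in the notebook):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy stand-in for the real feature matrix (users x covered files/functions);
# in practice this is built from the LIM coverage CSVs.
rng = np.random.default_rng(0)
user_features = (rng.random((5, 40)) < 0.3).astype(int)

def jaccard_similarity(X):
    """Pairwise Jaccard similarity between the binary rows of X."""
    inter = X @ X.T                                # intersection sizes
    sizes = X.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    return inter / np.maximum(union, 1)

sim = jaccard_similarity(user_features)
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(sim)
print(labels)  # cluster (role) assignment per user
```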
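And a sketch of the elbow-style evaluation; remaining_functions() here is a hypothetical stand-in for the notebook's per-role-count debloating measurement:

```python
import matplotlib.pyplot as plt

def remaining_functions(k):
    """Stand-in: the notebook re-runs debloating with k roles and counts the
    functions that remain; here we fake a diminishing-returns curve."""
    return 1000 // k + 200

ks = list(range(2, 11))
plt.plot(ks, [remaining_functions(k) for k in ks], marker="o")
plt.xlabel("number of roles (k)")
plt.ylabel("functions remaining after debloating")
plt.show()
# Choose the smallest k after which the curve flattens (the "elbow").
```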
In order to serve the debloated web applications, we use the generated mappings.txt configuration file, which includes the user-to-role mappings, along with the docker-compose.yaml in the root of this repository to host the DBLTR setup.
The web applications will be served under localhost:8080.
Upon logging in, the authentication cookie of each user is extracted by our OpenResty Lua modules and stored in the Redis datastore. Subsequent requests from users containing the authentication cookie will instruct the reverse-proxy to transparently route their requests towards their custom debloated web applications.
Responses from DBLTR will include an "active_proxy" HTTP header to show which backend served that request.
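A quick way to verify which backend handled a request, assuming an already-authenticated session (the cookie value below is a placeholder obtained by logging in first):

```python
import requests

# Placeholder session value; use the cookie issued after a real login.
resp = requests.get("http://localhost:8080/",
                    cookies={"phpmyadmin": "<session-cookie-value>"})
print(resp.status_code, resp.headers.get("active_proxy"))
```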
A demo of DBLTR protecting users against CVE-2019-12616 is available at: https://vimeo.com/652161913/bc5fdd1eea
To debloat a new web application with DBLTR, the overall process is as follows:
- Set up the web application under a LIM-like setup to collect the code-coverage data from web application users for a period of time.
- Import the code-coverage data into the debloating pipeline (Jupyter notebook) to produce the debloating roles.
- Create the user authentication detection logic as a new OpenResty Lua module.
- Use the provided configuration to host the debloated web applications.
The files for this module are located under docker/reverse-proxy/lua/. The skeleton of this code is available in common.lua, along with application-specific files under pma (phpMyAdmin login detection) and wp (WordPress login detection).
The default.conf file, an Nginx/OpenResty configuration file, is used to activate the Lua modules.
At a high level:
- login_handler.lua: Detects a successful login request. In the example of phpMyAdmin, this consists of a POST request towards the root of the web application that should result in a 302 HTTP response code. This module then extracts the authentication cookie value under the "phpmyadmin" cookie. This mapping is stored in the Redis datastore for future use.
- login_username_extractor.lua: Extracts the provided username from the login request. In the example of phpMyAdmin, we look for POST requests to "/" or "index.php" whose "pma_username" POST parameter contains the username.
- redirect_to_proxy.lua: Looks for the presence of authentication cookies (e.g., "phpmyadmin") and tries to extract the username from the datastore based on the authentication cookie value. Next, we find the user-to-role mapping and instruct the reverse-proxy to route user traffic to their debloated web application (a sketch of this decision logic follows).
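For illustration, here is a Python sketch of the routing decision that redirect_to_proxy.lua implements in Lua; the Redis key format, the user_to_role lookup, and the backend naming scheme are assumptions for this sketch, not the module's actual code:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def backend_for_request(cookies, user_to_role):
    """Map an incoming request's cookies to a role-specific backend."""
    session = cookies.get("phpmyadmin")         # authentication cookie
    if session is None:
        return "default_backend"                # unauthenticated traffic
    username = r.get(f"session:{session}")      # assumed key format: cookie -> username
    if username is None:
        return "default_backend"
    role = user_to_role.get(username.decode(), "default")
    return f"pma_role_{role}"                   # assumed role-specific backend name
```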