This repository contains the source code and replication package for our research paper "Protect Your Secrets: Understanding and Measuring Data Exposure in VSCode Extensions". Our work investigates security vulnerabilities in VSCode extensions, particularly focusing on credential-related data exposure risks.
This project provides tools for:
- Collecting and crawling VSCode extensions from the marketplace
- Analyzing extensions for potential security vulnerabilities
- Detecting credential-related data exposure risks
- /vscode-crawler-code/: Scripts for collecting VSCode extensions (see crawler README for details)
- /analysis-code/: Security analysis tools
- /data/: Storage for extensions and analysis results
Our analysis pipeline consists of two main stages: extension unpacking and security analysis.
First, we unpack the VSCode extension (.vsix) files to extract their source code and manifest files. The unpacking process handles both simple and complex extensions with dependencies:
python3 analysis-code/1_unpack_vsix.py \
--vsix-path data/vsix/waxidiotic.jw-link.vsix \
--output-dir data/code \
--verbose
This will:
- Extract the extension package
- Process the manifest (package.json)
- Handle JavaScript bundling, then beautify and consolidate all source code into a single file
- Output processed files to data/code/[extension-id]/
The unpacked files include:
- extension.js: Main extension code containing all consolidated source code
- package.json: Extension manifest
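To make the extraction step concrete, here is a minimal sketch of it, not the repository's actual `1_unpack_vsix.py`. It relies on the fact that a .vsix file is a ZIP archive whose extension files sit under an `extension/` prefix. The function name `unpack_vsix` is hypothetical, and the sketch skips the bundling, beautifying, and consolidation steps entirely.

```python
import io
import json
import tempfile
import zipfile
from pathlib import Path


def unpack_vsix(vsix_file, output_dir):
    """Extract a .vsix archive and return (extension_id, manifest).

    `vsix_file` may be a filesystem path or a binary file-like object.
    Hypothetical helper: it only extracts files and reads the manifest.
    """
    with zipfile.ZipFile(vsix_file) as archive:
        # The extension's own files sit under the "extension/" prefix.
        manifest = json.loads(archive.read("extension/package.json"))
        # Marketplace IDs have the form "<publisher>.<name>".
        extension_id = f"{manifest['publisher']}.{manifest['name']}"
        target = Path(output_dir) / extension_id
        target.mkdir(parents=True, exist_ok=True)
        archive.extractall(target)
    return extension_id, manifest


if __name__ == "__main__":
    # Build a tiny in-memory .vsix so the sketch runs without real data.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "extension/package.json",
            json.dumps({"publisher": "demo", "name": "sample"}),
        )
        archive.writestr("extension/extension.js", "// extension code\n")
    buffer.seek(0)
    with tempfile.TemporaryDirectory() as tmp:
        ext_id, _ = unpack_vsix(buffer, tmp)
        extracted = Path(tmp) / ext_id / "extension"
        print(ext_id, sorted(p.name for p in extracted.iterdir()))
```

Unlike the real script, this sketch leaves the files in an `extension/` subdirectory rather than writing a consolidated extension.js directly under data/code/[extension-id]/.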
After unpacking, we analyze the extensions for potential security vulnerabilities, particularly focusing on credential exposure risks:
python3 analysis-code/2_code_analysis.py \
--extension-id "waxidiotic.jw-link" \
--extension-dir data/code \
--output-dir data/output \
--sources-file data/sources.json
The analysis process:
- Builds Abstract Syntax Trees (AST) using Espree
- Constructs Program Dependency Graphs (PDG)
- Identifies potential credential exposure points by analyzing:
  - Command registrations
  - API usage patterns
  - Configuration access
  - Data flow between extensions
- Generates detailed analysis reports in data/output/[extension-id]/
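The command above analyzes a single extension. To run it over every unpacked extension, a small driver along these lines may help. This is a sketch, not part of the repository: the helper names `build_analysis_command` and `list_unpacked_extensions` are hypothetical, and it assumes that data/code contains one subdirectory per extension ID, as produced by the unpacking step. The flags are the ones shown in the command above.

```python
import subprocess
import sys
from pathlib import Path


def build_analysis_command(extension_id, code_dir="data/code",
                           output_dir="data/output",
                           sources_file="data/sources.json"):
    """Return the argv list for one run of the analysis script."""
    return [
        sys.executable, "analysis-code/2_code_analysis.py",
        "--extension-id", extension_id,
        "--extension-dir", code_dir,
        "--output-dir", output_dir,
        "--sources-file", sources_file,
    ]


def list_unpacked_extensions(code_dir="data/code"):
    """Return the IDs of all unpacked extensions (one subdirectory each)."""
    root = Path(code_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


if __name__ == "__main__":
    for ext_id in list_unpacked_extensions():
        # check=False: keep going if a single extension fails to analyze.
        result = subprocess.run(build_analysis_command(ext_id), check=False)
        if result.returncode != 0:
            print(f"analysis failed for {ext_id}", file=sys.stderr)
```

Passing the arguments as a list avoids shell quoting problems with unusual extension IDs.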
The analysis is guided by predefined patterns in sources.json:
{
  "sources": {
    "Commands": [
      "commands.registerCommand",
      "commands.registerTextEditorCommand"
    ],
    "WorkspaceConfiguration": [
      "WorkspaceConfiguration.update",
      "WorkspaceConfiguration.get",
      "getConfiguration().get",
      "workspace.getConfiguration"
    ],
    "InputBox": [
      "showInputBox"
    ],
    "GlobalState": [
      "globalState.update",
      "globalState.get",
      "workspaceState.update"
    ]
  }
}
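To illustrate how such patterns can flag candidate locations in extension code, here is a deliberately simplified sketch. It uses plain text matching as a stand-in for the AST and PDG analysis performed by `2_code_analysis.py`, so it is not the pipeline's actual method. The function name `find_source_calls` is hypothetical, and the embedded JSON assumes the categories sit under a top-level `sources` key, as the excerpt suggests.

```python
import json
import re

# A shortened copy of the patterns, inlined so the sketch is self-contained.
SOURCES_JSON = """
{
  "sources": {
    "Commands": ["commands.registerCommand"],
    "InputBox": ["showInputBox"],
    "GlobalState": ["globalState.update", "globalState.get"]
  }
}
"""


def find_source_calls(code, sources):
    """Return sorted (line_number, category, pattern) hits in `code`.

    Text matching only: no AST, no data flow. Hypothetical helper.
    """
    hits = []
    for category, patterns in sources.items():
        for pattern in patterns:
            # Match the pattern only when followed by "(", i.e. a call.
            regex = re.compile(re.escape(pattern) + r"\s*\(")
            for match in regex.finditer(code):
                line_number = code.count("\n", 0, match.start()) + 1
                hits.append((line_number, category, pattern))
    return sorted(hits)


if __name__ == "__main__":
    sources = json.loads(SOURCES_JSON)["sources"]
    sample = (
        "const token = await vscode.window.showInputBox({ password: true });\n"
        "context.globalState.update('apiToken', token);\n"
    )
    for hit in find_source_calls(sample, sources):
        print(hit)
```

Text matching cannot resolve type-level patterns such as `WorkspaceConfiguration.get`, where the receiver is a variable holding a configuration object, and it may also match inside comments or strings. Limitations like these are one reason to work on ASTs and dependency graphs instead.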
After extracting security-related patterns using static analysis, we manually labeled these patterns to create a ground truth dataset by examining their code context, documentation, and data flow to determine if they are credential-related.
In our manuscript, we employ a fine-tuned BERT model to automate the detection process. The model is trained on our labeled dataset to classify whether a code pattern involves credential-related operations. The code can be found in analysis-code/model_detect.
- Python 3.8+
- Node.js and npm
- data/extension_metas.json: Extension metadata including names, categories, and download links
- data/sources.json: API sources for vulnerability detection
- data/Ground_Truth_datasets.csv: Manually labeled dataset (16,958 data points across 500 extensions)
If you use this work in your research, you are highly encouraged to cite the following paper:
@inproceedings{liu2025protect,
title={Protect Your Secrets: Understanding and Measuring Data Exposure in VSCode Extensions},
author={Liu, Yue and Tantithamthavorn, Chakkrit and Li, Li},
booktitle={2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)},
year={2025},
organization={IEEE}
}
Recent years have witnessed the emerging trend of extensions in modern Integrated Development Environments (IDEs) like Visual Studio Code (VSCode) that significantly enhance developer productivity. In particular, popular AI coding assistants like GitHub Copilot and Tabnine provide conveniences like automated code completion and debugging. While these extensions offer numerous benefits, they may introduce privacy and security concerns to software developers. However, there is no existing work that systematically analyzes the security and privacy concerns, including the risks of data exposure in VSCode extensions.
In this paper, we investigate the security issues of cross-extension interactions in VSCode and shed light on the vulnerabilities caused by data exposure among different extensions. Our study uncovers high-impact security flaws that could allow adversaries to stealthily acquire or manipulate credential-related data (e.g., passwords, API keys, access tokens) from other extensions if not properly handled by extension vendors. To measure their prevalence, we design a novel automated risk detection framework that leverages program analysis and natural language processing techniques to automatically identify potential risks in VSCode extensions. By applying our tool to 27,261 real-world VSCode extensions, we discover that 8.5% of them (i.e., 2,325 extensions) are exposed to credential-related data leakage through various vectors, such as commands, user input, and configurations. Our study sheds light on the security challenges and flaws of the extension-in-IDE paradigm and provides suggestions and recommendations for improving the security of VSCode extensions and mitigating the risks of data exposure.
This project is licensed under the MIT License - see the LICENSE file for details.
In Section III of our paper, we demonstrate the attack vectors by creating six proof-of-concept extensions that were successfully published on the VSCode marketplace (and subsequently removed after approval). For research purposes, the source code of these proof-of-concept attacks is available upon request. Please contact us via email.
Our analysis identified 2,325 vulnerable extensions with their corresponding security patterns. This comprehensive dataset, including detailed vulnerability patterns and analysis results, is available for research purposes upon request. Please contact us via email.
For access to research artifacts or any questions, please contact:
- Knox ([email protected])