Text+Probabilistic extraction tutorials by mgrange1998 · Pull Request #73 · facebookresearch/PrivacyGuard

mgrange1998 · 2025-09-30T15:22:12Z

Summary: This adds the two remaining tutorials, probabilistic_memorization_tutorial.ipynb and text_extraction_tutorial.ipynb.

Differential Revision: D83571788

fbshipit-source-id: 60dac20ef2ad5e372abd5036bf422cf1c84a9f46

…manually, and recursively search output dir (#3) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#3 # Details - Adds "`TextInclusionAttackBatch(BaseAttack)`", which takes a directory of LLM generations and prepares a TextInclusionAnalysisInputBatch, for use in text inclusion analysis - The LLM generation directory is now searched recursively, instead of searching only for top-level files Reviewed By: iden-kalemaj Differential Revision: D76137139 fbshipit-source-id: 1ec0ddd059872df9258011c59164983eaf616cc5
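The difference between a top-level and a recursive directory search can be sketched with the standard library. This is only an illustration, not the code in `TextInclusionAttackBatch`: the helper name `find_generation_files` and the `*.jsonl` pattern are assumptions made for the example.

```python
from pathlib import Path


def find_generation_files(root: str, pattern: str = "*.jsonl") -> list[Path]:
    """Return files under `root` matching `pattern`, including nested subdirectories.

    Hypothetical helper: the name and the default pattern are assumptions.
    """
    # Path.rglob walks the whole directory tree, whereas Path.glob with the
    # same pattern would only see files directly inside `root`.
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())
```

Swapping `rglob` for `glob` in the sketch reproduces the earlier top-level-only behaviour.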

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#5 Implement main class for privacy analysis method in "Auditing f-Differential Privacy in One Run" (https://arxiv.org/abs/2410.22235) Reviewed By: mgrange1998 Differential Revision: D77028731 fbshipit-source-id: e042c92f54a97a31b192bf02e1684becb491ab35

Summary: Implementing LiRA for eDP for features Pull Request resolved: facebookresearch/PrivacyGuard#4 Reviewed By: mgrange1998 Differential Revision: D76978658 fbshipit-source-id: 19841f6f02fcf3cd7fd8453f50082ecf541a0460

…toggles whether to cap unreasonably large values of epsilon or not (#6) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#6 This diff introduces a new argument `cap_eps` to the `compute_metrics_at_error_threshold` method, which allows the user to toggle whether to cap unreasonably large values of epsilon or not. The default value of `cap_eps` is set to `True` to avoid impacting current prod readings. This makes it consistent with the method signature in `compute_eps_at_tpr_threshold` Changes: * Adding the `cap_eps` argument to the `compute_metrics_at_error_threshold` method in `mia_results.py` * Passing the `cap_eps` argument from the `compute_metrics_at_error_threshold` method in `analysis_node.py` to the `compute_metrics_at_error_threshold` method in `mia_results.py` * Updating the docstring in `mia_results.py` to reflect the new argument Reviewed By: mgrange1998 Differential Revision: D77235199 fbshipit-source-id: 81f5966fc513f251c8e0f6db93ccce4c9f9839b2

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#7 This diff adds unit testing for the CalibAttack class. The code changes add a new test case to verify that an IndexError is raised when a required column is missing from the dataframe. Reviewed By: mgrange1998 Differential Revision: D77389049 fbshipit-source-id: 9d21daf1bcdee964eb10ba68ddc1c44404b8e4fb

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#8 A recent diff (D77184572) started strictly enforcing that method overrides are consistent with the base function. This change updates fdp_analysis_node so that its "run_analysis" method matches the definition in base_analysis_node. It also adds "run_analysis_with_parameters", to keep the functionality of specifying custom m, c, and c_cap for an existing node. # Code Freeze Safety Because FDPAnalysisNode is [not called elsewhere](https://fburl.com/code/yim44tcq), this fix is safe during the Ads code freeze. Reviewed By: lucamelis Differential Revision: D77888656 fbshipit-source-id: c3e4a001458e239fef3422b63f1b71baac28026f

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#9 **Speed up MIA analysis**: total duration of a call to PrivacyGuard for analysis is reduced by 2x. The speed-up is achieved by reducing the search time for indices where tpr >= tpr_threshold (and similarly for fpr/fnr errors). We do 100k such searches in one analysis. - Use binary search `np.searchsorted` instead of linear `np.where` since tpr/fpr/fnr arrays are sorted - Vectorize index search so that it is performed in parallel for all thresholds rather than one threshold at a time. In addition, fix small bugs with edge cases for index search: - `fnr_idx = np.where(fnr <= 0.001)[0][-1]` should be `fnr_idx = np.where(fnr <= 0.001)[0][0]`, since fnr is sorted in decreasing order. - `if tpr.min() > threshold` should check `if tpr.max() < threshold`, since we are trying to catch the case where there are no tpr values greater than the threshold. Define new method in MIAResults `_get_indices_of_error_at_thresholds` which performs the sped-up search and fixes these bugs. Reviewed By: lucamelis Differential Revision: D78344325 fbshipit-source-id: cb433bc67dc76a41de32d00227ff3ad93c41d708
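The vectorized binary search described above can be sketched as follows. This is a minimal illustration under assumptions, not the body of `_get_indices_of_error_at_thresholds`: the helper names and the toy arrays are invented for the example, and it assumes `tpr` is sorted in increasing order and `fnr` in decreasing order.

```python
import numpy as np


def first_index_at_or_above(sorted_increasing, thresholds):
    """For each threshold, index of the first element >= threshold.

    Returns len(array) for a threshold larger than every element.
    """
    return np.searchsorted(sorted_increasing, thresholds, side="left")


def first_index_at_or_below(sorted_decreasing, thresholds):
    """For each threshold, index of the first element <= threshold."""
    # Negating a decreasing array gives an increasing one, so the same
    # binary search applies to the negated values.
    return np.searchsorted(-np.asarray(sorted_decreasing), -np.asarray(thresholds), side="left")


# Toy data (assumed for illustration only).
tpr = np.array([0.0, 0.1, 0.4, 0.4, 0.9, 1.0])  # increasing
fnr = np.array([1.0, 0.9, 0.6, 0.6, 0.1, 0.0])  # decreasing
thresholds = np.array([0.05, 0.4, 0.95])

tpr_idx = first_index_at_or_above(tpr, thresholds)
fnr_idx = first_index_at_or_below(fnr, thresholds)

# One linear scan per threshold, for comparison with the vectorized search.
linear_tpr_idx = np.array([np.where(tpr >= t)[0][0] for t in thresholds])
```

All thresholds are resolved in a single `np.searchsorted` call, so no Python-level loop over thresholds is needed. The returned value `len(array)` plays the role of the "no element satisfies the threshold" edge case mentioned in the summary.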

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#10 This diff introduces a new tutorial for performing Likelihood Ratio Attacks (LiRA) on machine learning models trained on the CIFAR-10 dataset using the Privacy Guard framework. The tutorial covers both online and offline variants of the attack. **Code Changes** --------------- The diff includes changes to the following files: * `fbcode/privacy_guard/tutorials/BUCK`: Added a new target for the LiRA tutorial. * `fbcode/privacy_guard/shadow_model_training/tests/test_dataset.py`: Added a new test file for the dataset module. * `fbcode/privacy_guard/tutorials/lira_attack_cifar10_tutorial.py`: Added a new tutorial file for LiRA attacks on CIFAR-10. * `fbcode/privacy_guard/shadow_model_training/__init__.py`: Added a new init file for the shadow model training package. Reviewed By: mgrange1998 Differential Revision: D78687894 fbshipit-source-id: 41d17b373d2f7fddb382b34cd3ce0d7de86d4819

The internal and external repositories are out of sync. This Pull Request attempts to bring them back in sync by patching the GitHub repository. Please carefully review this patch. You must disable ShipIt for your project in order to merge this pull request. DO NOT IMPORT this pull request. Instead, merge it directly on GitHub using the MERGE BUTTON. Re-enable ShipIt after merging.

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#12 The purpose of this diff is to start the process of reorganizing the analysis modules for MIA into a separate subdirectory. This change is part of a larger effort to improve the organization and maintainability of the codebase. #### Changes * Updated import statements to point to the new location of `MIAResults`. * Created a new file `mia/mia_results.py` to hold the implementation of `MIAResults`. * `density_based_attack_utils.py`: Updated the import statement for `MIAResults` to point to the new location in `mia/mia_results.py`. Reviewed By: mgrange1998 Differential Revision: D79829908 fbshipit-source-id: 8d385bafaf35189b70aff8074ce27bc8c788009b

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#13 ## Diff Summary **Moving Analysis Modules for MIA to Subdirectory** This diff is the second of three diffs that aim to move analysis modules for MIA to a subdirectory. The changes are as follows: ### Code Changes #### `benchmark_factored_analysis_node.py` * Moved import statements to use `privacy_guard.analysis.mia` instead of `privacy_guard.analysis`. #### `test_factored_analysis_node.py` * Updated import statement to use `privacy_guard.analysis.mia.factored_analysis_node` instead of `privacy_guard.analysis.factored_analysis_node`. #### `compute_score.py` * Moved import statement to use `privacy_guard.analysis.mia.factored_analysis_node` instead of `privacy_guard.analysis.factored_analysis_node`. #### `BUCK` * Updated the source file for the `factored_analysis_node` library to `mia/factored_analysis_node.py` instead of `factored_analysis_node.py`. ### Overall Impact This diff is a part of a larger effort to restructure the analysis modules for MIA. By moving these modules to a subdirectory, the codebase becomes more organized and easier to maintain. The changes made in this diff are primarily related to updating import statements and build configurations to reflect the new directory structure. Differential Revision: D80014423 fbshipit-source-id: 815efaa8611d6bfa4b4429cdc0024c556a78eb11

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#14 This diff is the third part of a series of diffs aiming to restructure the PrivacyGuard codebase. Specifically, it focuses on moving analysis modules related to MIA to a dedicated subdirectory `mia`. The changes involve updating import statements and file paths to reflect the new location of these modules. #### Key Changes * Moved remaining modules and updated imports. * Updated import statements in `sota_param_sweep_workflow.py`, `test_config_helpers.py`, and `test_sota_param_sweep_workflow.py` to point to the new location of `aggregate_analysis_input` and `parallel_analysis_node` modules. Reviewed By: mgrange1998 Differential Revision: D80014348 fbshipit-source-id: 96d92e4c076b3d48067ecb4d9b56b3a3fa412da0

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#16 Add lightweight label inference attack to PrivacyGuard. Note: I also define a LIAAttackInput class which prepares the datasets before feeding them into the attack. The purpose of this is to speed up experimentation. Aggregation for LIA is expensive (since it uses `.idxmax()`). Therefore, we want to avoid re-doing the aggregation every time we change something in the attack or analysis. Reviewed By: lucamelis Differential Revision: D77935914 fbshipit-source-id: 038958b3fbdf77f32bdee34051fedbe953e54b7a

…/extraction (#15) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#15 - Moves TextInclusionAnalysis and TextInclusionAnalysisInput to privacy_guard/text_inclusion - Updates dependencies - Adds TextInclusionAnalysis to analysis_library to be utilized in bento kernels - Rename `test_text_inclusion_analysis_node.py` buck target to `test_text_inclusion_analysis_node` Reviewed By: knchadha Differential Revision: D80537357 fbshipit-source-id: f9fde01f0d61c232a756588601a16c1228e45ad2

…nAnalysis (#17) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#17 - Allows for specifying custom "prompt_key" and "generation_key" when using TextInclusionAnalysisNode. Before, these were hard coded to "prompt" and "output_text" - Adds "disable_exact_match". This is included because exact match is not yet programmed to support multi target mode. - Removes TextInclusionAnalysisInput::REQUIRED_COLUMNS, which was unused before Reviewed By: knchadha Differential Revision: D80711589 fbshipit-source-id: 7aa3d72ba154a875086e71f8158f65c092f3e96f

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#18 Generation interface for PrivacyGuard OSS, which allows for local eval runs. This will be used to provide an E2E OSS flow to 1. Create generations for a given model 2. Run PrivacyGuard Attacks+Analysis to compute memorization of the associated text. # Next Step - In script, pipe output to analysis - Tutorial+Kernel Reviewed By: knchadha Differential Revision: D79269807 fbshipit-source-id: 95993405b9a7cbc99ef6b1ed2468a5b2cb6f3d11

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#19 In preparation for the PrivacyGuard OSS release, moving the FactoredAnalysisNode class out of OSS PrivacyGuard. This is being done because it currently does not implement the "AnalysisNode" base class. Time does not permit refactoring it before our OSS target timeline, so it is temporarily removed from the OSS code. - Moves FactoredAnalysisNode code, test, and benchmarks to "privacy_guard/fb" - Updates imports in ads workflow to point to new paths Reviewed By: lucamelis Differential Revision: D80963437 fbshipit-source-id: 3939fed95cdee6523103def0fd50a635c4c198dd

…. "separable_id" in AggregateAnalysisInput (#20) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#20 This change adds a "user_id_key" argument to AggregateAnalysisInput, CalibAttack, and LiraAttack. This will be used to sanitize references to "separable_id" in the OSS release, in subsequent diffs in this stack. To keep the incremental changes simple, this change adds this new argument but keeps the default behavior of CalibAttack and LiraAttack to use "separable_id". # Change Breakdown - Adds new "user_id_key" to AggregateAnalysisInput, CalibAttack, and LiraAttack. - Replaces hardcoded instances of "separable_id" with this variable - Updates default behavior for CalibAttack and LiraAttack so that user_id_key is set to "separable_id" by default. # Next steps 1. Migrate "separable_id" to "user_id" in OSS files, update testing datasets 2. Remove default altogether and update all callsites to specify "separable_id" explicitly. Reviewed By: lucamelis Differential Revision: D80955348 fbshipit-source-id: f060d1589ce08446eaeaa36d230df6f5d185e89f

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#22

1. Added ProbabilisticMemorization analysis node that computes model probabilities from prediction logprobs and performs threshold comparisons.
   a. The node calculates n-based probabilities using the formula p = 1 - (1 - model_prob)**n, storing results as dictionaries in a single column for efficient data organization.
2. Added tests for the corresponding node.

Command to run analysis:

```bash
buck run //privacy_guard/analysis/scripts:probabilistic_memorization_analysis -- \
  --generation_path /path/to/generation_data.jsonl \
  --prob_threshold 0.5 \
  --output_path /path/to/output_results.jsonl \
  --n_values "10,100,1000"
```

Reviewed By: mgrange1998 Differential Revision: D80853682 fbshipit-source-id: 32f50e1e23062b92b1187506c95332332dcb2a3f
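The n-based formula can be read as the chance that at least one of n independent samples reproduces the target. A minimal sketch follows. The function names are hypothetical, and treating `model_prob` as the exponential of the summed token log-probabilities is an assumption rather than a confirmed detail of the analysis node.

```python
import math


def sequence_probability(token_logprobs):
    """Assumed definition: exp of the summed per-token log-probabilities."""
    return math.exp(sum(token_logprobs))


def n_based_probabilities(model_prob, n_values):
    """Apply p = 1 - (1 - model_prob)**n for each n, keyed by n."""
    return {n: 1.0 - (1.0 - model_prob) ** n for n in n_values}


def above_threshold(model_prob, prob_threshold):
    """Threshold comparison on the single-sample probability."""
    return model_prob > prob_threshold


# Example usage with made-up log-probabilities.
model_prob = sequence_probability([-0.5, -1.0, -0.25])
per_n = n_based_probabilities(model_prob, [10, 100, 1000])
```

Returning one dictionary per row mirrors the "dictionaries in a single column" layout mentioned in the summary. The values grow with n and approach 1 for any nonzero `model_prob`.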

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#25 This diff introduces a new class, `RmiaAttack`, which implements the Robust Membership Inference Attack (RMIA). The RMIA is an attack that estimates population probability distributions using reference model predictions. In the next diff, we will push a tutorial on how to use this class for CIFAR10 **Changes** ----------- ### BUCK File The diff adds a new `python_library` target, `rmia_attack`, which includes the `rmia_attack.py` file and its dependencies. It also adds a new `python_unittest` target, `test_rmia_attack`, which includes the `tests/test_rmia_attack.py` file and its dependencies. ### rmia_attack.py This new file defines the `RmiaAttack` class, which inherits from the `BaseAttack` class. ### tests/test_rmia_attack.py This new file defines a test class, `TestRmiaAttack`, which includes several test methods for the `RmiaAttack` class. Reviewed By: mgrange1998 Differential Revision: D80200703 fbshipit-source-id: 7cf5d3e494ca0795784ffb0993c653c3516694f2

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#26 This diff introduces a new tutorial for the Robust Membership Inference Attack (RMIA) using the PrivacyGuard framework. The code changes include: * Adding new functions to the `shadow_model_training` module, including `get_softmax_scores` and `prepare_rmia_data`, to support RMIA. * Creating a new dataset class `create_rmia_datasets` to generate datasets for RMIA. * Adding a new tutorial file `rmia_attack_cifar10_tutorial.py` that demonstrates how to perform RMIA on the CIFAR-10 dataset using the PrivacyGuard framework. * Updating the `BUCK` files to include the new tutorial and dependencies. These changes provide a comprehensive guide for users to perform RMIA using the PrivacyGuard framework. Reviewed By: mgrange1998 Differential Revision: D80458010 fbshipit-source-id: be82abcea98fc28a5f4cca0ad6ce50f332c8464c

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#21 Removing "eps_geo_split" as an output of OSS PrivacyGuard analysis nodes. This change is safe to make for the following reasons 1. "eps_geo_split" is not referenced directly outside of PrivacyGuard library 2. This does not touch the Ads production path (as FactoredAnalysisNode is not changed) 3. Information is not lost in the outputs, as "eps_geo_split" was already pulled from eps_tpr_ub which itself is present in the outputs. The only concern is that sota_param_sweep_workflow will no longer return "eps_geo_split", cc lucamelis for confirmation on safety of this. Reviewed By: lucamelis Differential Revision: D81061518 fbshipit-source-id: 54d3b9a51006cb794e014ac5c16e1717a9bfd9ec

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#23 This change - Removes tests/test_data/df_test_merge.json.zst and tests/test_data/df_train_merge.json.zst from privacy_guard/analysis - Updates analysis tests to use a centralized "BaseTestAnalysisNode" which provides utils for sampling distributions and using test dataframes. This change is needed for OSS because the compressed datasets contain fb internal columns like "separable_id", and including compressed data in a library is bad practice. Reviewed By: lucamelis Differential Revision: D81066787 fbshipit-source-id: 142dc2f0f98e26a343fe8c954bc4a47f99ab3e86

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#24 This change - Moves the fb internal "ADS_MERGE_COLUMNS" to a utils file in privacy_guard/fb - Updates CalibAttack callsites to pass in "ADS_MERGE_COLUMNS" - Changes CalibAttack::merge_column's default value to [user_key_id] instead of ADS_MERGE_COLUMNS - Updates CalibAttack's default user_id_key to "user_id" instead of "separable_id" Reviewed By: lucamelis Differential Revision: D81233544 fbshipit-source-id: a81634eff59cc69a98a90c19ca46f98e8ca646df

…kernel (#27) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#27 This updates the kernel of OSS tutorials to reference the privacy guard kernel, as references to "empirical_dp" will remain internal. The empirical_dp kernel will continue to be supported, as both it and the privacy guard kernel depend on the same PrivacyGuard library BUCK target. Reviewed By: lucamelis Differential Revision: D81245330 fbshipit-source-id: 746b7d0ba877223f97fdc4921d24b30eeac30919

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#28 as per title Reviewed By: mgrange1998 Differential Revision: D81249502 fbshipit-source-id: c8fed4e84cb550893f25c27642109479678d61dc

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#29 As titled, remove reference to empirical dp codebase from comment Reviewed By: iden-kalemaj Differential Revision: D81338963 fbshipit-source-id: f647cff15f485d9d26433987c76db57450cec07b

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#30 - Change default "user_key_id" from separable_id to user_id - Updates lira unit tests to remove references to FB internal column keys This will not change the production workflow because it explicitly specifies the "user_key_id" value in this diff D81040841 Reviewed By: iden-kalemaj Differential Revision: D81339820 fbshipit-source-id: bcd4c9d17745bb48a5140c5447c98c95247b0a5a

…raction folder (#32) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#32 Reorganized privacy_guard analysis modules by moving probabilistic memorization analysis and reference model comparison code into the `extraction/` subfolder alongside existing text inclusion analysis modules. **Files moved:** * `probabilistic_memorization_analysis_input.py` * `probabilistic_memorization_analysis_node.py` * `reference_model_comparison_input.py` * `reference_model_comparison_node.py` Reviewed By: mgrange1998 Differential Revision: D81371313 fbshipit-source-id: 183e2cbaf0bac307852c33bc530ac22040fb15b3

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#51 This updates mia_results to use the built-in "round" method instead of np.float.round, which it was using before. This was causing issues in OSS, where it interpreted the np.floats as just floats, leading to AttributeErrors. This change reduces the number of OSS failing tests down to 5. Reviewed By: iden-kalemaj Differential Revision: D82470350 fbshipit-source-id: cdf6e8dce9551656d22926ef10ade4434d27cc0d
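The failure mode is easy to reproduce in isolation: NumPy scalars expose a `.round()` method, while plain Python floats do not. The sketch below uses a hypothetical helper, `round_metric`, to show why the built-in `round` is the safer call when a value may arrive as either type.

```python
import numpy as np


def round_metric(value, ndigits=4):
    """Round a metric that may be a NumPy scalar or a plain Python float.

    Hypothetical helper for illustration; built-in round() accepts both types.
    """
    return float(round(value, ndigits))


numpy_value = np.float64(0.123456)
python_value = float(numpy_value)

# The method form only exists on the NumPy scalar.
numpy_has_method = hasattr(numpy_value, "round")
python_has_method = hasattr(python_value, "round")

try:
    python_value.round(3)
    method_call_failed = False
except AttributeError:
    # Plain floats have no .round attribute, so the method call fails.
    method_call_failed = True
```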

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#46 * **logits_attack**: Added main logic to the stub, refactored the stub to use predictor instead of model_path. * **test_logits_attack**: Created a test suite with MockPredictor, tensor-to-list conversion validation, and file I/O testing * **base_predictor**: Minor change to enforce presence of batch_size variable in get_logits. * **probabilistic_memorization_analysis_from_logits_input.py**: Minor update to allow for custom column name for logits. Reviewed By: mgrange1998 Differential Revision: D82256736 fbshipit-source-id: bd5aabb8e9cc4b03415409fe00d64a0d0b1ac4e1

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#48 File changes and updates: * **logprobs_attack**: Implement LogprobsAttack class that extracts log probabilities using predictor and creates ProbabilisticMemorizationAnalysisInput for analysis * **test_logprobs_attack**: Add unit tests for LogprobsAttack covering initialization, execution, custom columns, batch processing, and file output * **probabilistic_memorization_analysis**: Add logprobs_column parameter support to analysis input and update analysis node to use dynamic column names instead of hardcoded "prediction_logprobs" Reviewed By: mgrange1998 Differential Revision: D82262059 fbshipit-source-id: c844b4439f2bcf878ee374acee8bc74ee9cd4a41

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#49 * **generation_attack.py**: Updated to use predictor instead of model/tokenizer and added configurable `target_column` parameter for custom target field names. * **test_generation_attack.py**: Rewrote tests to work with predictor-based GenerationAttack class and added testing for custom target column functionality. Reviewed By: mgrange1998 Differential Revision: D82269372 fbshipit-source-id: 53d7b5e63f1d32e87d54dcd9587d568bcc01cacc

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#53 1. `probabilistic_memorization_analysis_from_logits_input.py`: Added support for custom column names and generation parameters so it works with the predictor classes. 2. `probabilistic_memorization_analysis_from_logits_node.py`: Built the main analysis logic that computes probabilities from logits with temperature/top-k/top-p support and proper error handling. 3. `logits_attack.py`: Added target tokens column to be returned from logits_attack, without which this analysis is impossible. 4. `test_probabilistic_memorization_from_logits.py`: Wrote tests for all the new functionality. Reviewed By: s-huu Differential Revision: D82469551 fbshipit-source-id: 1324e3b9f11c10e24cc4d5552fcde190125067c1
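Computing target-token probabilities from logits under sampling parameters can be sketched as below. This is not the code in `probabilistic_memorization_analysis_from_logits_node.py`: the function name and argument layout are assumptions, only temperature and top-k are shown (top-p is omitted for brevity), and logits are assumed to have shape `(sequence_length, vocab_size)`.

```python
import numpy as np


def target_logprobs(logits, target_tokens, temperature=1.0, top_k=None):
    """Log-probability of each target token under temperature / top-k sampling.

    Hypothetical sketch: `logits` is (seq_len, vocab_size) and
    `target_tokens` holds one token id per position.
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    if top_k is not None:
        # Keep the top_k largest logits per position and mask the rest.
        # Ties at the cutoff value are all kept.
        kth_largest = np.sort(scaled, axis=-1)[:, -top_k][:, None]
        scaled = np.where(scaled >= kth_largest, scaled, -np.inf)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = scaled - scaled.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    positions = np.arange(len(target_tokens))
    return log_probs[positions, np.asarray(target_tokens)]


# Example usage with made-up logits for a 2-token target and a 3-token vocabulary.
example_logits = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0]]
example_logprobs = target_logprobs(example_logits, [0, 1], temperature=1.0)
sequence_prob = float(np.exp(example_logprobs.sum()))
```

A target token that falls outside the top-k set receives a log-probability of negative infinity, so the sequence probability becomes zero. The target token ids must be available next to the logits for this lookup, which is consistent with the summary's note that `logits_attack` now returns a target tokens column.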

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#54 Correct the naming of longest common substring to align with terminology used in external research. Before: lcs refers to longest common substring instead of longest common subsequence. After: more descriptive function name for longest common substring, see the code for details. Reviewed By: mgrange1998 Differential Revision: D82684578 fbshipit-source-id: d3c4d6536e329ce09e99de65df66858ba389a913
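The two notions are easy to confuse, so a short sketch of the distinction may help. These are textbook dynamic-programming versions written for illustration, with hypothetical function names; they are not the PrivacyGuard helpers.

```python
def longest_common_substring_len(a, b):
    """Length of the longest contiguous run shared by sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous element of each sequence.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def longest_common_subsequence_len(a, b):
    """Length of the longest shared subsequence (gaps between matches allowed)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


# Character-level comparison: the strings differ only in the middle character.
substring_len = longest_common_substring_len("abcde", "abxde")
subsequence_len = longest_common_subsequence_len("abcde", "abxde")
```

Both functions accept any sequence, so passing `text.split()` instead of a string compares words rather than characters.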

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#55 Pull Request resolved: facebookresearch/PrivacyGuard#39 `compute_eps_at_tpr_threshold` is no longer needed as we use `compute_metrics_at_error_threshold` to compute epsilon at TPR thresholds (as well as FPR). This allows us to run only one round of bootstrap as opposed to two bootstrap runs (with TPR and with FPR errors). Differential Revision: D82749962 fbshipit-source-id: a01f5c734cbc64f50a5bcecffc0f42a710cf9e3d

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#56 Setting up CI testing for the OSS repo. The workflows can also be manually dispatched for testing purposes Reviewed By: CristianLara Differential Revision: D82823727 fbshipit-source-id: 91057161c128442a1d5b705a57b1eff0fd21a4fe

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#59 Add support for word-level longest common subsequence (LCS). Reviewed By: mgrange1998 Differential Revision: D82689690 fbshipit-source-id: 81f6390fd928a29b7b1b18848b5d5a3bc740320e

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#57 This updates the Attack OSS files of PrivacyGuard to reflect the MIT license Reviewed By: lucamelis, iden-kalemaj Differential Revision: D82836646 fbshipit-source-id: 3303d42899974a996156984d1c8c1935154f9b0b

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#58 This updates the copyright headers of all OSS analysis files to the MIT license header. Reviewed By: lucamelis Differential Revision: D82836992 fbshipit-source-id: 54c16f6977e6542d614ebb6f42471d0f3de3e3db

… (#60) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#60 Moving test_hf_predictor_local to privacy_guard/gen_ai/internal_tests since it is only for internal testing and doesn't need to be in the OSS code. Reviewed By: mgrange1998 Differential Revision: D82857333 fbshipit-source-id: 8290e7bd7844aab266840811ffb518d2524edd7c

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#63 Reviewed By: lucamelis Differential Revision: D83060041 fbshipit-source-id: 80a0c1f121c5f3e28a14d17a63f0e72c8b975bf5

…se (#64) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#64 Internal grimaldi tutorials contain internal metadata and are not OSS compatible, so moving these source control notebooks to the fb/tutorials directory of PrivacyGuard. The subsequent diff in this stack adds the jupyter versions of these tutorials to the OSS repo Reviewed By: lucamelis Differential Revision: D83060282 fbshipit-source-id: 0449fc0294988d23b7c8d2628d51ac1fc10f842e

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#65 Following instructions in OSS tutorial wiki https://www.internalfb.com/wiki/PrivacyGuard/PrivacyGuard_Developer_Guide/PrivacyGuard_OSS_Tutorial_Publishing This adds OSS compatible versions of the lira/mia tutorials to the OSS repo in the jupyter notebook format. The outputs are saved within the files Reviewed By: lucamelis Differential Revision: D83059187 fbshipit-source-id: 4cbcb39690a1f6d97d61439aab81bd98454d2ea2

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#62 Tutorial for logits_attack and logprobs_attack. Reviewed By: mgrange1998 Differential Revision: D82862302 fbshipit-source-id: f8247c23fdda8b199a1f3d496fe6e206eab8b96d

…e longest_common_subsequence metrics to TextInclusionNode (#66) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#66 - rename "disable_lcs" to "disable_longest_common_substring" - Add _compute_char_level_longest_common_subsequence_helper and _compute_word_level_longest_common_subsequence_helper options to TextInclusionAnalysisNode and input Reviewed By: s-huu Differential Revision: D83155383 fbshipit-source-id: 9ad3a68aa21038947480693fc6dc2590a9ceddf2

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#67 Removes reference to internal github repo from OSS file Reviewed By: lucamelis Differential Revision: D83177138 fbshipit-source-id: 6a2aa0ea5f336bd9f84f425d743d816441028f76

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#68 This extends the README file to include more details about the library as a whole. This also adds a changelog with an initial 0.0.1 version of the library. Also updates the library to use the Apache 2.0 license, as approved in T223866456. Reviewed By: lucamelis Differential Revision: D83179386 fbshipit-source-id: 246b30c4a7df3d07f311036c06ea3d08bd9d64b6

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#70 The tutorials do not currently render on GitHub. This change - Adds nbformat to the .ipynb tutorial files so that they are rendered on GitHub - Renames fbcode/privacy_guard/github/tutorial_notebooks/fdp_analysis_tutorial.py.ipynb to fbcode/privacy_guard/github/tutorial_notebooks/fdp_analysis_tutorial.ipynb - Moves fbcode/privacy_guard/fb/tutorials/probabilistic_memorization_tutorial.py to internal repo. Reviewed By: iden-kalemaj Differential Revision: D83493038 fbshipit-source-id: 01509e344a073efddacc93f3b31b601f1e6e782c

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#72 Adds "loss_attack_tutorial.ipynb" for use in OSS release. The tutorial renders on GitHub with the outputs saved. Reviewed By: iden-kalemaj Differential Revision: D83500065 fbshipit-source-id: ea0b63670fb7f623430959ccbb9471710fa0f9c3

…ck directory (#69) Summary: Pull Request resolved: facebookresearch/PrivacyGuard#69 Following the policy review approval, updating the attack files to utilize the Apache 2.0 License header https://www.internalfb.com/wiki/Open_Source/Licenses/Apache_License/ Reviewed By: lucamelis Differential Revision: D83491074 fbshipit-source-id: 0bfcdb1c4b332fafaf164f4804a458689f04c94f

Summary: Pull Request resolved: facebookresearch/PrivacyGuard#71 This change updates the license headers for analysis files in PrivacyGuard to reflect the Apache 2.0 license. Reviewed By: lucamelis Differential Revision: D83493285 fbshipit-source-id: 7dff527e995fe23b771a0b84b4e7830ddd61ee99

Summary: This adds the two remaining tutorials, probabilistic_memorization_tutorial.ipynb and text_extraction_tutorial.ipynb. Differential Revision: D83571788

facebook-github-bot · 2025-09-30T15:22:23Z

@mgrange1998 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D83571788.

facebook-github-bot and others added 30 commits June 5, 2025 08:01

Initial commit

8c26528

fbshipit-source-id: 60dac20ef2ad5e372abd5036bf422cf1c84a9f46

LiRA & LiRA lightweight FBLearner workflow (#4)

049386b

fix rmia_attack header (#28)

7e5654c

Remove ads reference from OSS (#29)

373f422

mgrange1998 and others added 24 commits September 16, 2025 13:34

Add implementation for word level longest common subsequence (#59)

db8f509

Update timeout for .github action tests (#63)

1fe33a4

Probabilistic Memorization Tutorial (#62)

77d9948

Remove reference to fairinternal (#67)

32f2759

Text+Probabilistic extraction tutorials

2143875

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Sep 30, 2025

facebook-github-bot added fb-exported meta-exported labels Sep 30, 2025

facebook-github-bot closed this Sep 30, 2025

facebook-github-bot force-pushed the main branch from bcb27b6 to 3b4b846 Compare September 30, 2025 20:08

mgrange1998 wants to merge 71 commits into facebookresearch:main from mgrange1998:export-D83571788

6 participants
