Commit 554702d

Merge remote-tracking branch 'origin/main' into IVF-Faiss
2 parents 25c5899 + 004f730 commit 554702d

49 files changed, +561 -285 lines changed

.github/labeler.yml

+9-9
@@ -1,17 +1,17 @@
 # Copyright (c) Meta Platforms, Inc. and affiliates.
 
 python-threatexchange:
-- changed-files:
-any-glob-to-any-file: 'python-threatexchange/**/*'
+- changed-files:
+- any-glob-to-any-file: 'python-threatexchange/**'
 
 hma:
-- changed-files:
-any-glob-to-any-file: ['hasher-matcher-actioner/**/*', 'open-media-match/*']
+- changed-files:
+- any-glob-to-any-file: 'hasher-matcher-actioner/**'
 
 pdq:
-- changed-files:
-any-glob-to-any-file: 'pdq/**/*'
+- changed-files:
+- any-glob-to-any-file: 'pdq/**'
 
-hma-ui:
-- changed-files:
-any-glob-to-any-file: 'hasher-matcher-actioner/webapp/**/*'
+vpdq:
+- changed-files:
+- any-glob-to-any-file: 'vpdq/**'
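
To make the effect of the updated labeler rules concrete, here is a minimal Python sketch of which labels the four new rules would select for a set of changed paths. It assumes that `dir/**` is meant to match every file below `dir/` and approximates that with a plain prefix check. actions/labeler itself evaluates glob patterns, so this illustrates the intent of the configuration rather than the action's implementation. `LABEL_PREFIXES` and `labels_for` are hypothetical names introduced only for this example.

```python
# Hypothetical mirror of the updated .github/labeler.yml: label -> directory prefix.
# Assumption: 'dir/**' is read as "any file under dir/" and approximated
# here with str.startswith instead of real glob matching.
LABEL_PREFIXES = {
    "python-threatexchange": "python-threatexchange/",
    "hma": "hasher-matcher-actioner/",
    "pdq": "pdq/",
    "vpdq": "vpdq/",
}


def labels_for(changed_files):
    """Return the sorted labels whose prefix matches at least one changed file."""
    matched = set()
    for path in changed_files:
        for label, prefix in LABEL_PREFIXES.items():
            if path.startswith(prefix):
                matched.add(label)
    return sorted(matched)


if __name__ == "__main__":
    print(labels_for(["pdq/cpp/README.md", "vpdq/setup.py"]))
    print(labels_for(["hasher-matcher-actioner/webapp/index.html"]))
    print(labels_for(["open-media-match/README.md"]))
```

Under these assumptions, a change under `hasher-matcher-actioner/webapp/` receives only the `hma` label (the separate `hma-ui` rule is gone), and a change under `open-media-match/` receives no label because that pattern was dropped from the `hma` rule.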

.github/workflows/pr-labeler.yml

+8-1
@@ -2,7 +2,10 @@
 
 name: "PR Labels"
 on:
-- pull_request_target
+pull_request:
+types: [opened, synchronize, reopened]
+pull_request_target:
+types: [opened, synchronize, reopened]
 
 jobs:
 apply-labels:
@@ -11,6 +14,10 @@ jobs:
 contents: read
 pull-requests: write
 steps:
+- uses: actions/checkout@v4
+with:
+ref: ${{ github.event.pull_request.base.ref }}
 - uses: actions/labeler@v5
 with:
 repo-token: "${{ secrets.GITHUB_TOKEN }}"
+configuration-path: .github/labeler.yml

.github/workflows/python-threatexchange-ci.yaml

+2-2
@@ -35,7 +35,7 @@ jobs:
 - name: Install dependencies
 run: |
 python -m pip install --upgrade pip
-python3 -m pip install -e .[all]
+python3 -m pip install -e .[dev]
 - name: Check code format
 run: |
 black --check .
@@ -64,7 +64,7 @@ jobs:
 - name: Install dependencies
 run: |
 python -m pip install --upgrade pip
-pip install -e ".[test]"
+python3 -m pip install -e .[dev,extensions]
 - name: Test with pytest
 run: |
 py.test

CONTRIBUTING.md

+19-19
@@ -1,5 +1,5 @@
 # Repo Organization and project-level CONTRIBUTING.md
-This repository is organized as a collection of sub-projects, some of which reference each other.
+This repository is organized as a collection of sub-projects, some of which reference each other.
 This explainer is about the entire repository, but there are more specific notes for how to
 develop on subprojects in the various directories that contain them.
 
@@ -21,39 +21,39 @@ Here's an easy way to get started:
 
 git clone git@github.com:your-username/ThreatExchange
 
-Next, [fork]([fork](https://help.github.com/articles/fork-a-repo/) your own copy of the repo.
+Next, [fork]([fork](https://help.github.com/articles/fork-a-repo/) your own copy of the repo.
 Here's where you'll push development branches. Note the name, we'll need it in the next step.
 
 Next, we'll add your forked copy as a remote.
 
 # From the root directory of your locally cloned repo
-git remote add fork git@github.com:<USUALLY_YOUR_USERNAME>/ThreatExchange
+git remote add fork git@github.com:<USUALLY_YOUR_USERNAME>/ThreatExchange
 
 Lastly, we'll add the fork as the preferred repo to push to
 
 # From the root directory of your locally cloned repo
 git config --global push.default fork
 
-This will give you an easy develoment cycle for updating your copy:
+This will give you an easy development cycle for updating your copy:
 
 git co main # co is a common shortcut for `checkout`
 git pull # Get all the new commits from the upstream
 
 If you are a git expert, you likely know of other ways to set this up, to your preference!
 
 ## Branch locally and develop!
-First, make a branch in your cloned fork.
+First, make a branch in your cloned fork.
 
 git co -b <MY_COOL_FEATURE_BRANCH_NAME>
 
 We suggest naming the branch by feature name
 or “issue_XX” where XX is the issue number the branch is associated with. Make
-your changes in your branch and test thoroughly.
+your changes in your branch and test thoroughly.
 
 git commit -am "[hma] Made cool changes to hasher-matcher-actioner"
 git push # Pushes to your fork if you followed the setup!
 
-If this is a large feature you can push your branch to your fork often to save your work.
+If this is a large feature you can push your branch to your fork often to save your work.
 When making commits to your branch, make sure you write [well-formed][wf] commit
 messages and update documentation accordingly (see the next section).
 
@@ -65,10 +65,10 @@ If your change will add a new API, new functionality, or refactor a large sectio
 it's best to socialize your intent first and get some feedback. Maintainers can let you know
 if there are some concerns about the change (e.g. some projects deliberately minimize the number
 of dependencies - if you are planning on pulling Node.js into a project, it's unlikely to be
-accepted!
+accepted!
 
 This is also a place where maintainers can give you pointers on what they'll be looking for in
-testing.
+testing.
 
 Don't let this step dissuade you, we want to work with you to help you figure out the best
 solution. If we ultimately don't believe we can support it in the main repo, you are welcome
@@ -104,7 +104,7 @@ in the directory you are working with, or in smaller projects, a top-level `test
 folder that contains all the tests.
 
 For changes that aren't easy to unittest, we still want to know you are sure your
-change does the right thing. Make sure to include the testing steps you used.
+change does the right thing. Make sure to include the testing steps you used.
 If you add more testing steps (e.g. as a result of review feedback)
 
 ### Submit a PR
@@ -122,21 +122,21 @@ Make sure to include a summary of changes (especially the "why" or problem you a
 Make sure to link the issue if it's for an issue!
 
 #### Draft Reviews and RFC
-If you are not sure about a potential change, and want to get feedback on a review, you can still submit a PR as a draft PR, or clearly label the PR with "[RFC]" (request for comment).
+If you are not sure about a potential change, and want to get feedback on a review, you can still submit a PR as a draft PR, or clearly label the PR with "[RFC]" (request for comment).
 Reviewers will know not to merge your changes but may still send you an Accept if they would merge it without changes (or use "Request Changes" to indicate the same thing, just that they want you to convert from draft).
 
 ### Continuous Integration
-If you are a new contributor, the continuous integration (CI) won't run until it's triggered by a maintainer of the repo.
+If you are a new contributor, the continuous integration (CI) won't run until it's triggered by a maintainer of the repo.
 If there is CI failure, we usually won't merge the change!
 
 ### Review
 Once you’ve submitted a PR you're waiting on us for review. We aim to check the repo every business day, but sometimes we are slow, especially if there aren't new changes.
-We aim to try and get you a full review every business day (e.g. resulting in either "request changes" or merge), but we can't always do this.
+We aim to try and get you a full review every business day (e.g. resulting in either "request changes" or merge), but we can't always do this.
 We don't expect it takes longer than 5 business days to get a first review.
 
 To make changes on your code for review, checkout your branch again
 
-git checkout <MY_COOL_FEATURE_BRANCH_NAME>
+git checkout <MY_COOL_FEATURE_BRANCH_NAME>
 
 Make your changes as new commits (don't amend your previous commits as they break history and make it harder to review)
 
@@ -145,7 +145,7 @@ Make your changes as new commits (don't amend your previous commits as they brea
 Then push your changes to your fork again - they'll automatically update your PR!
 
 git push # assumes you set up the default push target to your fork
-
+
 Here are some things to expect from review:
 1. If your summary isn't detailed and we aren't sure what the change is aiming to do, we might request changes without reviewing much of the code.
 2. If you don't have a test plan or your change is missing updates for unittesting, we will likely request changes.
@@ -156,7 +156,7 @@ Here are some things to expect from review:
 4. `alt/code golf:` The reviewer is providing an alternative implementation that might be shorter or have a stylistic difference. Implicitly `ignorable:`
 
 #### Resolving Conversations
-Standard practice is to let the commentor who created a comment thread, or another reviewer "resolve conversations" after you’ve responded to or addressed the issue. Reviewers may un-resolve conversations they think still need discussion.
+Standard practice is to let the commentor who created a comment thread, or another reviewer "resolve conversations" after you’ve responded to or addressed the issue. Reviewers may un-resolve conversations they think still need discussion.
 
 #### Clearing Reviews After Response
 Sometimes Github will still show "Changes Requested" even if you have responded to all changes (or interactions with conversation resolution). Please [dismiss reviews with changes requested](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/dismissing-a-pull-request-review) that are stuck in "Requested Changes" once you think you have addressed everything.
@@ -165,9 +165,9 @@ Sometimes Github will still show "Changes Requested" even if you have responded
 ### Merging
 Once your code is accepted, we usually merge it right away! If you don't want us to do this, please note it in the summary. You can always do followups as another PR!
 
-### PR Sizes & Landing-and-iterating
+### PR Sizes & Landing-and-iterating
 We generally prefer smaller changes (<100-500 lines of functional py code, not including unittests), and so if you have a larger feature, consider opening an issue to explain your full plans, and then doing a series of PRs all linking to that issue.
-We understand that everyone contributing is essentially volunteering time, and so want to make the best use of your time as well. If you have run out of time to donate, and have already done a few passes of review, please let us know as a comment.
-A maintainer might be willing to finish your PR, or merge it as is, and make an issue to clean it up after.
+We understand that everyone contributing is essentially volunteering time, and so want to make the best use of your time as well. If you have run out of time to donate, and have already done a few passes of review, please let us know as a comment.
+A maintainer might be willing to finish your PR, or merge it as is, and make an issue to clean it up after.
 If you have contributed multiple PRs, we'll generally give you more slack in this department, since we have seen you come back!

README.md

+2-2
@@ -4,7 +4,7 @@ This repository originally started as code to support Meta's ThreatExchange API,
 
 ## PDQ Image Hashing and Similarity Matching
 
-PDQ is a photo hashing algorithm that can turn photos into 256 bit signatures which can then be used to match other photos.
+PDQ is a photo hashing algorithm that can turn photos into 256 bit signatures which can then be used to match other photos.
 
 ## TMK+PDQF (TMK) Video Hashing and Similarity Matching
 
@@ -16,7 +16,7 @@ Video PDQ (or vPDQ for short) is a simple video hashing algorithm that determine
 
 ## Hasher-Matcher-Actioner (HMA) Trust & Safety Platform
 
-HMA is a ready-to-deploy content moderation project for AWS, containing many submodules. It allows you to maintain lists of known content to scan for, which you can either curate yourself or connect to other hash exchange programs to share and recieve lists. More can be found [at the wiki](https://github.com/facebook/ThreatExchange/wiki).
+HMA is a ready-to-deploy content moderation project for AWS, containing many submodules. It allows you to maintain lists of known content to scan for, which you can either curate yourself or connect to other hash exchange programs to share and receive lists. More can be found [at the wiki](https://github.com/facebook/ThreatExchange/wiki).
 
 A second version of this project, called "[Open Media Match](https://github.com/facebook/ThreatExchange/tree/main/open-media-match)" is under construction, which uses a cloud-agnostic docker-based deployment.

api-reference-examples/js/node/README.md

+2-2
@@ -10,12 +10,12 @@ Documentation referenced in this project is located [here](https://github.com/fa
 var threatexchange = require('node-threatexchange');
 
 var app_id = 'APP_ID',
-var app_secret ='APP_SECRET'
+var app_secret ='APP_SECRET'
 
 var api = threatexchange.createThreatExchange(app_id,app_secret);
 ```
 
-Refer to `test/test.js` for usage of each endpoint.
+Refer to `test/test.js` for usage of each endpoint.
 
 Current endpoints implemented:

api-reference-examples/python-notebook/README.md

+1-1
@@ -15,7 +15,7 @@ All of the refernce notebooks make heavy use of the following Python libraries t
 - [Pytx](https://pytx.readthedocs.org/en/latest/installation.html) for ThreatExchange access
 - [Seaborn](https://stanford.edu/~mwaskom/software/seaborn/) for making charts pretty
 
-All of the python packages mentioned can be installed via
+All of the python packages mentioned can be installed via
 
 ```
 pip install <package_name>

api-reference-examples/python/pytx/CONTRIBUTING.md

+1-1
@@ -29,7 +29,7 @@ After that, whenever you commit, pre-commit will run and either fix up minor iss
 But `--no-verify` is so uncool.
 
 ### Running Tests
-pytx is a bit short on tests, but you could help fix that.
+pytx is a bit short on tests, but you could help fix that.
 To run the tests `make test`
 
 ## Documenting Changes

hasher-matcher-actioner/CONTRIBUTING.md

+4-4
@@ -118,7 +118,7 @@ Although you can remove the need for this by setting the "black" extension as yo
 cd /workspace
 mypy src/OpenMediaMatch
 ```
-If you don't run it in this directory, mypyp won't be able to find its settings folder and you'll get different results than the CI.
+If you don't run it in this directory, mypy won't be able to find its settings folder and you'll get different results than the CI.
 
 ## Save Keystrokes on Common commands
 Add these to your ~/.bashrc file and then reload with `. ~/.bashrc`
@@ -129,7 +129,7 @@ alias t='(cd /workspace/src/OpenMediaMatch && py.test)'
 alias myt='my && t'
 ```
 
-## Recover from mysterious errors during sever startup?
+## Recover from mysterious errors during server startup?
 If you had a syntax error in your code when you opened vscode, the automatic flask run that is created for you may fail. You can easily manually run it!
 
 Create a new terminal window, and then run:
@@ -143,9 +143,9 @@ This is the same command that automatic window runs. Keep fixing errors until it
 ### It's worse than that!
 When you create your devcontainer, data inside is persisted. However, if dependencies to the devcontainer are changed, or a bad database migration appears, you may end up in a strange state that cannot be recovered from. To reset fresh, you will want to rebuild your devcontainer, which you can do from within vscode.
 
-From the menu, go to "View" > "Command Pallet", and in the window that appears, complete to "Devcontainers: Rebuild container".
+From the menu, go to "View" > "Command Palette", and in the window that appears, complete to "Devcontainers: Rebuild container".
 
-This will shutdown your container and rebuild it from scratch.
+This will shutdown your container and rebuild it from scratch.
 
 ## Reset my database?
 If your database has gotten into a funky state, run

hasher-matcher-actioner/README.md

+1-1
@@ -18,7 +18,7 @@ The name "hasher, matcher, actioner" refers to the technical process by which ne
 
 ## Configurability
 
-There is no one-size-fits all solution to make platforms safe, and even in the narrow scope of hashing and matching technology, there are many possible solutions. HMA is designed to be highly configurable, such that new algorithms, hash exchanges, or other capabilities could be integrated later. If you want to use a custom or proprietary hashing algorithm with HMA, you simple need to follow the interfaces defined in [python-threatexchange ](../python-threatexchange) to add new capabilities.
+There is no one-size-fits all solution to make platforms safe, and even in the narrow scope of hashing and matching technology, there are many possible solutions. HMA is designed to be highly configurable, such that new algorithms, hash exchanges, or other capabilities could be integrated later. If you want to use a custom or proprietary hashing algorithm with HMA, you simple need to follow the interfaces defined in [python-threatexchange ](../python-threatexchange) to add new capabilities. A full list of known available algorithms and compatible exchanges can be found at [the python-threatexchange/extensions README](https://github.com/facebook/ThreatExchange/tree/main/python-threatexchange/threatexchange/extensions/README.md).
 
 You can find an example on expanding the base image to include the Clip tx extension [here](https://github.com/juanmrad/HMA-CLIP-demo)
