A tool for manually labeling the tweets of a dataset. The labeled tweets are used in a corpus created by Corpus Creator and, later, in a binary classification algorithm.
It was mainly developed for binary classification of whether a tweet discusses 'illicit drug use' (for my Master's thesis), but it can be used for other topics.
To run the application, you must define specific environment variables. You can create a .env file in the root directory of the project or rename the provided example file, .env.example.
This file should contain the following environment variables:
# App settings
BINARIZER_EXPOSED_PORT=<Binarizer Host Port>
BINARIZER_INTERNAL_PORT=<Binarizer Container Port>
# Web
SESSION_SECRET=<Secret key used for signing and verifying HMAC-based tokens>
# Used by the Browser (Client-side fetch)
VITE_PUBLIC_CORPUS_CREATOR_API_URL=<URL of the API called from the browser>
# Used by Remix Loaders/Actions (Server-side fetch)
INTERNAL_CORPUS_CREATOR_API_URL=<URL of the API called from inside the container>
Replace each < ... > placeholder with the correct value. For example, BINARIZER_EXPOSED_PORT=<Binarizer Host Port> becomes BINARIZER_EXPOSED_PORT=3100.
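A forgotten placeholder is easy to miss, so a small check before starting the application can help. The following is only a sketch and is not part of this project: the helper name findInvalidVars and the Env type are hypothetical, and the list of names is copied from the variables above. It reports every variable that is missing, empty, or still set to a < ... > placeholder.

```typescript
// Variable names taken from the list above. The helper itself is hypothetical.
const REQUIRED_VARS = [
  "BINARIZER_EXPOSED_PORT",
  "BINARIZER_INTERNAL_PORT",
  "SESSION_SECRET",
  "VITE_PUBLIC_CORPUS_CREATOR_API_URL",
  "INTERNAL_CORPUS_CREATOR_API_URL",
] as const;

// Shape of an environment object such as Node's process.env.
type Env = Record<string, string | undefined>;

// Returns the names of variables that are missing, empty,
// or still hold an unreplaced "<...>" placeholder.
function findInvalidVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || /^<.*>$/.test(value);
  });
}
```

In a Node process this could be called as findInvalidVars(process.env), aborting startup with a clear message when the returned list is not empty.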
Just run docker-compose up.
The session secret key should be a long, random byte sequence that is hard to guess.
You can generate one with a tool such as OpenSSL or a password generator. For example:
openssl rand -base64 32
This generates a 256-bit (32-byte) key encoded in Base64, which is suitable for HMAC-SHA256.
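If OpenSSL is not available, a key of the same kind can be produced with Node's built-in crypto module. This is a minimal sketch, assuming a Node.js runtime. The function name generateSessionSecret is hypothetical and not part of this project.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: returns `byteLength` cryptographically secure random
// bytes encoded as Base64, the same kind of value as `openssl rand -base64 32`.
function generateSessionSecret(byteLength: number = 32): string {
  return randomBytes(byteLength).toString("base64");
}
```

With the default of 32 bytes, the result is a 44-character Base64 string that can be pasted in as the value of SESSION_SECRET.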
