govscape

Searching millions of .gov PDFs

Build

The govscape server is built with Poetry. With Poetry installed, the following command installs the package and its dependencies:

poetry install

Run

To run the initial version, first build the embeddings, indices, and related artifacts with:

poetry run python scripts/run_embedding_pipeline.py -p "data/test_data/TechnicalReport234PDFs" -d "data/test_data"

Then start the RESTful API server with Gunicorn for production use (the default worker_class is sync):

GUNICORN_WORKERS=2 \
poetry run gunicorn -c gunicorn.conf.py 'scripts.python_helpers.start_api_server:create_app()'

Alternatively, use the wrapper script to pass the usual application CLI arguments along with the Gunicorn command:

poetry run -- python -m scripts.python_helpers.run_gunicorn \
  -p data/test_data/TechnicalReport234PDFs \
  -d data/test_data \
  -tm ST -vm CLIP -k 20 -i Memory -- \
  'gunicorn -c gunicorn.conf.py scripts.python_helpers.start_api_server:create_app()'
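The wrapper's implementation is not shown here. As a rough illustration of the calling convention only, the sketch below splits an argument list at the standalone `--` into the application arguments and the Gunicorn command. The helper name `split_wrapper_args` is hypothetical and is not taken from the govscape code.

```python
import shlex


def split_wrapper_args(argv):
    """Split argv at the first standalone "--".

    Returns (app_args, gunicorn_cmd). Hypothetical helper for illustration;
    the real run_gunicorn.py may work differently.
    """
    if "--" not in argv:
        raise ValueError('expected "--" between app arguments and the Gunicorn command')
    idx = argv.index("--")
    app_args = argv[:idx]
    # The Gunicorn command arrives as one quoted string, so re-tokenize it.
    gunicorn_cmd = shlex.split(" ".join(argv[idx + 1:]))
    return app_args, gunicorn_cmd
```

Everything before `--` would go to the application (for example `-p`, `-d`, `-k`), and everything after it would be run as the Gunicorn command line.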

Tuning knobs (Gunicorn env vars supported by gunicorn.conf.py):

GUNICORN_WORKERS
GUNICORN_THREADS (only applies to gthread workers)
GUNICORN_WORKER_CLASS
GUNICORN_TIMEOUT
GUNICORN_MAX_REQUESTS
GUNICORN_MAX_REQUESTS_JITTER
GUNICORN_PRELOAD_APP
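A Gunicorn config file is plain Python, so environment variables like these are typically read with `os.environ`. The following is a minimal sketch of how such a mapping could look, not the repository's actual gunicorn.conf.py. The fallback values are Gunicorn's own documented defaults and are an assumption here; the project may choose different ones. The function name is hypothetical.

```python
import os


def gunicorn_settings_from_env(env=None):
    """Map the GUNICORN_* variables onto Gunicorn setting names.

    Illustrative only: fallbacks are Gunicorn's stock defaults, which may
    differ from what the project's gunicorn.conf.py uses.
    """
    env = os.environ if env is None else env
    return {
        "workers": int(env.get("GUNICORN_WORKERS", "1")),
        # threads only has an effect with the gthread worker class
        "threads": int(env.get("GUNICORN_THREADS", "1")),
        "worker_class": env.get("GUNICORN_WORKER_CLASS", "sync"),
        "timeout": int(env.get("GUNICORN_TIMEOUT", "30")),
        # 0 disables automatic worker restarts
        "max_requests": int(env.get("GUNICORN_MAX_REQUESTS", "0")),
        "max_requests_jitter": int(env.get("GUNICORN_MAX_REQUESTS_JITTER", "0")),
        "preload_app": env.get("GUNICORN_PRELOAD_APP", "false").lower()
        in ("1", "true", "yes"),
    }
```

In a real config file the resulting values would be assigned to module-level names such as `workers` and `timeout`, which is where Gunicorn looks for them.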

For development, you can still use the simple runner:

poetry run python scripts/start_api_server.py -p "data/test_data/TechnicalReport234PDFs" -d "data/test_data"

API Documentation

The project includes a RESTful API server built with Flask and documented with Swagger/OpenAPI. To access the API playground:

1. Start the server using the instructions above
2. Visit http://localhost:8080/docs
3. Use the Swagger UI to try out the endpoints
4. Check the response codes and data formats
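The same information can also be inspected from a script. The sketch below assumes the API is mounted at the server root and that the Swagger spec is served at /swagger.json, which is the Flask-RESTX default but has not been confirmed for this project. Both function names are hypothetical.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec):
    """Return sorted (METHOD, path) pairs from a Swagger/OpenAPI spec dict."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for key in sorted(item):
            # Path items can also hold non-method keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append((key.upper(), path))
    return operations


def fetch_spec(base_url="http://localhost:8080"):
    """Download the spec; the /swagger.json location is an assumption."""
    with urlopen(f"{base_url}/swagger.json", timeout=5) as resp:
        return json.load(resp)
```

With the server running, `list_operations(fetch_spec())` would return the documented method and path pairs, which can be handy for a quick check after adding an endpoint.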

Adding New Endpoints

When adding new endpoints to the API:

1. Define the request/response models using Flask-RESTX fields
2. Create a new Resource class in the appropriate namespace
3. Use the @ns.doc() and @ns.response() decorators for documentation
4. Add example requests/responses in the Swagger UI
