Hyper Recent is a website that provides users with the latest biomedical publications and updates in real time. The following document explains the technical details, architecture, infrastructure, and tooling required to deploy the application.
Tech Stack:
- Framework: Next.js (React Framework)
- Language: TypeScript
- Styling: Tailwind CSS
Overview:
- Next.js provides enhanced React functionality including routing and data handling capabilities.
- TypeScript catches many bugs before runtime through strict type checking, which keeps the code cleaner.
- Tailwind CSS simplifies styling by allowing CSS classes to be written directly in HTML, keeping styles and markup together for easier maintenance.
Tech Stack:
- Framework: Next.js (React Framework)
- Database: PostgreSQL
- ORM: Prisma
Overview:
- Next.js handles API routing and server-side logic, primarily focusing on handling API endpoints for retrieving and filtering research papers.
- PostgreSQL provides reliable storage and querying for metadata, enabling efficient management of scientific research papers, including topics, authors, and their relationships.
- Prisma acts as a bridge between our Next.js application and PostgreSQL database, allowing type-safe database queries and automated schema management.
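As a rough illustration of the filtering these API endpoints perform, the sketch below filters an in-memory list of papers by topic and date range. The `Paper` and `PaperFilter` shapes and the `filterPapers` name are hypothetical and are not taken from the project's Prisma schema; in the real application the same conditions would more likely be expressed as a database query.

```typescript
// Hypothetical shape of a stored paper; the real Prisma model may differ.
interface Paper {
  title: string;
  topic: string;
  date: string; // publication date as YYYY-MM-DD
}

// Hypothetical filter; every field is optional.
interface PaperFilter {
  topic?: string;
  from?: string; // inclusive lower bound, YYYY-MM-DD
  to?: string; // inclusive upper bound, YYYY-MM-DD
}

// Keeps only the papers that satisfy every filter field that is set.
// YYYY-MM-DD strings sort chronologically, so plain string comparison works.
function filterPapers(papers: Paper[], filter: PaperFilter): Paper[] {
  return papers.filter((paper) => {
    if (filter.topic !== undefined && paper.topic !== filter.topic) return false;
    if (filter.from !== undefined && paper.date < filter.from) return false;
    if (filter.to !== undefined && paper.date > filter.to) return false;
    return true;
  });
}
```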
The diagram illustrates that the Next.js frontend communicates with the backend tier of the same application. The backend processes these requests and interacts with a PostgreSQL relational database to manage data. To ensure the database remains up-to-date, GitHub Actions or cron jobs are scheduled to perform daily updates. This architecture allows data updates to occur seamlessly in the background without causing any downtime for the application.
For future development, users can fork the repository and continue developing directly from the main branch of the forked repository. We have included recommended instructions for GitHub usage below, including sections on creating releases, managing branches, and GitHub Actions. Alternatively, the code can also be downloaded and uploaded to a separate repository without necessitating a fork. Forking is generally recommended if users want to push changes back to the parent repository.
Once a repository is set up, users can clone the repository to any Windows, Mac, or Linux machine to get started.
- Node.js >=14
- PostgreSQL >=12
- Docker
The following environment variables can be used to configure the server:
General:
- `NODE_ENV`: the environment mode; either `production` or `development` (default)
PostgreSQL Database:
- `DATABASE_URL`: the connection string to the database. This can be a cloud database or a local database. (default `postgresql://hyper_recent:postgres@db:5432/mydb` for Docker)
- `POSTGRES_USER`: username for the database (default `hyper_recent`)
- `POSTGRES_PASSWORD`: password for the database if it uses auth (default `postgres`)
- `POSTGRES_DB`: name of the database in the database URL (default `mydb`)
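The sketch below shows how the individual PostgreSQL variables relate to the default connection string. It is an illustration only: the `buildDatabaseUrl` helper is hypothetical, and the `db:5432` host and port are assumed from the Docker default listed above rather than read from any project file.

```typescript
// A plain record so the function can be exercised without a Node.js runtime.
type Env = Record<string, string | undefined>;

// Hypothetical helper: returns DATABASE_URL if it is set, otherwise assembles
// a connection string from the POSTGRES_* variables and their documented defaults.
function buildDatabaseUrl(env: Env): string {
  const explicit = env["DATABASE_URL"];
  if (explicit) return explicit;

  const user = env["POSTGRES_USER"] ?? "hyper_recent";
  const password = env["POSTGRES_PASSWORD"] ?? "postgres";
  const database = env["POSTGRES_DB"] ?? "mydb";

  // Assumption: host "db" and port 5432, as in the Docker default.
  // Credentials are percent-encoded so special characters stay valid in a URL.
  return `postgresql://${encodeURIComponent(user)}:${encodeURIComponent(password)}@db:5432/${database}`;
}
```

In a Node.js process this could be called as `buildDatabaseUrl(process.env)`.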
All data is pulled from https://api.biorxiv.org and https://api.medrxiv.org, which provide each article in the following format:
- JavaScript Object Notation (JSON). This is the native format for Hyper-Recent data and contains article data, metadata of the record itself, metadata of the corresponding source, and visualization data.
Our data is licensed under CC0.
To update the database, users can run `npx prisma db seed`, which starts the migrateDataScript.ts script. This process can be automated as described below.
The database is updated using the variables within migrateDataScript.ts.
This function is used for the cron job:
- `dates = [today, yesterday]`: modify to change the seeding dates (default: seeds today's and yesterday's publications)
- `baseURLS = [biorxivURL, medrxivURL]`: modify to change the URLs seeded from (default: fetches from bioRxiv and medRxiv)
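One way to compute the two default seeding dates is sketched below. The `seedDates` and `formatDate` names are hypothetical, and the use of UTC and the YYYY-MM-DD format are assumptions; the actual migrateDataScript.ts may compute its dates differently.

```typescript
// Formats a Date as YYYY-MM-DD using its UTC calendar day.
function formatDate(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Returns [today, yesterday] as YYYY-MM-DD strings.
// `now` is a parameter so the function can be checked with a fixed date.
function seedDates(now: Date = new Date()): [string, string] {
  // Truncate to midnight UTC so the subtraction below always lands on the previous day.
  const today = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  const oneDayMs = 24 * 60 * 60 * 1000;
  const yesterday = new Date(today.getTime() - oneDayMs);
  return [formatDate(today), formatDate(yesterday)];
}
```

Working in UTC avoids daylight-saving shifts, at the cost that "today" may differ from the local calendar day near midnight.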
This function is unused, but can be used to fetch publications for a date range:
- `startDate`: the start of the date range of publications to fetch
- `endDate`: the end of the date range of publications to fetch
- `baseURL`: the URL to fetch publications from
(Important note: both the fetch and the seeder are written for the URLs given above. Using a different URL will most likely cause the fetch to fail unless the `prismaSeeder` function is modified accordingly.)
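The sketch below shows how a date-range request might be validated and turned into a URL. The `buildRangeUrl` helper, its `server` and `cursor` parameters, and the `/details/{server}/{start}/{end}/{cursor}` path layout are assumptions modeled on the public bioRxiv API; they are not taken from migrateDataScript.ts and should be checked against the API documentation before use.

```typescript
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// Hypothetical helper: validates the range and builds a request URL.
function buildRangeUrl(
  baseURL: string,
  server: string, // assumed to be "biorxiv" or "medrxiv"
  startDate: string,
  endDate: string,
  cursor: number = 0, // assumed paging offset
): string {
  if (!ISO_DATE.test(startDate) || !ISO_DATE.test(endDate)) {
    throw new Error("dates must be formatted as YYYY-MM-DD");
  }
  // YYYY-MM-DD strings compare chronologically as plain strings.
  if (startDate > endDate) {
    throw new Error("startDate must not be after endDate");
  }
  const root = baseURL.replace(/\/+$/, ""); // drop trailing slashes
  return `${root}/details/${server}/${startDate}/${endDate}/${cursor}`;
}
```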
- `npx prisma db seed`: updates / seeds the database (run from the terminal after making the required changes)
Docker images can be built and deployed locally.
- docker-compose installs two images, one for the prisma postgres database and one for the web application
- Dockerfile conducts the build of the server.
- docker-entrypoint.sh generates the Prisma client and starts the application.
The app can be deployed by executing deploy.sh. Note that we assume the Docker daemon is installed and running. The shell script does the following:
- Pulls the latest git files
- Checks if Docker is running (assuming that you have downloaded and started Docker)
- Runs `docker compose build`, which builds the Docker container (using docker compose and the Dockerfile). Warning: this might take a long time.
- Runs `docker compose up` once the build completes, which starts the Docker container (docker-entrypoint.sh)
  - Options: `-d` to deploy in detached mode.
- Your locally deployed hyper-recent should then be available at localhost:3000 (unless this port is taken).
To manually deploy the application, you can run `docker compose up --build`.
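The order of the deployment steps described above can be summarized as a small sketch. The `deployCommands` helper is hypothetical, and `git pull` and `docker info` are assumed stand-ins for "pulls the latest git files" and "checks if Docker is running"; the actual deploy.sh may use different commands.

```typescript
interface DeployOptions {
  detached?: boolean; // corresponds to the -d option
}

// Hypothetical helper: lists the commands in the order the deploy steps describe.
function deployCommands(options: DeployOptions = {}): string[] {
  const up = options.detached ? "docker compose up -d" : "docker compose up";
  return [
    "git pull", // assumed command for pulling the latest files
    "docker info", // assumed check; fails if the Docker daemon is not running
    "docker compose build",
    up,
  ];
}
```

Each step should only run if the previous one succeeded, which is why the order matters.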
A diagram of the build steps, written in Mermaid, is available here.
Users can interact with the application locally through the following commands:
- `npm install`: installs the dependencies required for the JavaScript application
- `npm run dev`: starts the project in development mode
- `npm run build`: builds the production version of the project
- `npm start`: starts the production server
A cron job can be scheduled to update the PostgreSQL database automatically. No code changes are required for this update, as the web application pulls its data from the database.
There are two options to automate the retrieval of new articles.
This method is the simplest. We have provided a script at run-prisma-seed.sh which can be added into the crontab schedule to be executed at a scheduled time.
- Open a terminal and execute `cd path/to/your/project`
- Execute `crontab -e`
- Add `0 15 * * * /bin/bash /path/to/your/project/run-prisma-seed.sh` to the bottom of the file.
More information on cron jobs and crontab can be found here.
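To make the crontab entry easier to read, the sketch below splits it into its five schedule fields and the command. The `parseCrontabLine` helper is hypothetical and handles only the simple whitespace-separated form used here. For the entry above, minute `0` and hour `15` with `*` in the remaining fields mean the script runs once a day at 15:00 in the machine's time zone.

```typescript
interface CrontabEntry {
  minute: string;
  hour: string;
  dayOfMonth: string;
  month: string;
  dayOfWeek: string;
  command: string;
}

// Hypothetical helper: splits "m h dom mon dow command..." into named fields.
function parseCrontabLine(line: string): CrontabEntry {
  const parts = line.trim().split(/\s+/);
  if (parts.length < 6) {
    throw new Error("expected five schedule fields followed by a command");
  }
  const [minute, hour, dayOfMonth, month, dayOfWeek, ...command] = parts;
  return { minute, hour, dayOfMonth, month, dayOfWeek, command: command.join(" ") };
}
```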
Assuming that code is hosted on GitHub, we have created a GitHub Actions script called cron-job-db-update.yml which automatically conducts a continuous integration workflow for the application. The workflow is scheduled on the GitHub Actions VMs daily at 5 am. A database credential is provided as a GitHub Actions Secret. This option is suggested if the database can be accessed through a network.
End-to-end testing is built with Playwright. The steps for running the tests are as follows:
- In the root directory, run: `npx playwright install`
- The test file is stored in the tests folder: example.spec.ts
- Run all tests: `npx playwright test`
- Run tests in debug mode: `npx playwright test --debug`
- Run tests in UI mode (interactive test runner): `npx playwright test --ui`
- Open the report: `npx playwright show-report`
- The playwright.config.ts configuration file allows you to set up browser options, test timeouts, base URL, and other settings.
- The Playwright tests are integrated into CI/CD using GitHub Actions. Details are available below.
- Use the provided `playwright.config.ts` to configure test execution in the pipeline, such as the number of workers, browsers to test on, test timeouts, and others.
- Each test has a descriptive title, such as "filter posts by topic correctly", "check date range picker exists", and "selecting date range filters posts correctly".
- Since Playwright is an end-to-end testing tool, it simulates browsing the project in different browsers, clicking specific buttons, and selecting specific filters. Drastic changes to the frontend UI will therefore cause tests to fail, so the tests must be updated alongside the UI.
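For orientation, the settings mentioned above (workers, browsers, timeouts, base URL) could look roughly like the following in playwright.config.ts. This is a sketch, not the project's actual configuration: every value is an illustrative assumption, and the object is written without imports so that it stands alone.

```typescript
// Sketch of a Playwright configuration object; all values are assumptions.
const config = {
  testDir: "./tests", // folder containing example.spec.ts
  timeout: 30_000, // per-test timeout in milliseconds
  workers: 2, // number of parallel worker processes
  retries: 1, // retry a failing test once
  reporter: "html", // report opened by `npx playwright show-report`
  use: {
    baseURL: "http://localhost:3000", // assumed local address of the app
  },
  // One project per browser engine to test on.
  projects: [
    { name: "chromium", use: { browserName: "chromium" } },
    { name: "firefox", use: { browserName: "firefox" } },
    { name: "webkit", use: { browserName: "webkit" } },
  ],
};

// In playwright.config.ts this object would be the default export:
// export default config;
```

Lowering `workers` or limiting `projects` to one browser is a common way to shorten pipeline runs, at the cost of less coverage.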
For testing the backend, a separate readme with API design and configurations is available at README-API.md. In addition, postman.json can be loaded into Postman API testing to check endpoints.
- Make sure the tests are passing: `npm test`
- For a bug fix / patch release, run `npm version patch`.
- For a new feature release, run `npm version minor`.
- For a breaking API change, run `npm version major`.
- For a specific version number (e.g. 1.2.3), run `npm version 1.2.3`.
- Push the release: `git push origin --tags`
- Publish a GitHub release so that releases are versioned and viewable to users.
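The effect of the three release types on a version number follows semantic versioning and can be sketched as below. The `bumpVersion` helper is hypothetical and handles only plain MAJOR.MINOR.PATCH versions; `npm version` itself also updates package.json and, in a git repository, creates a commit and tag.

```typescript
type ReleaseType = "patch" | "minor" | "major";

// Hypothetical helper: computes the next version for a given release type.
function bumpVersion(version: string, release: ReleaseType): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (release) {
    case "major": // breaking API change: reset minor and patch
      return `${major + 1}.0.0`;
    case "minor": // new feature: reset patch
      return `${major}.${minor + 1}.0`;
    case "patch": // bug fix
      return `${major}.${minor}.${patch + 1}`;
  }
}
```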
Deployments run automatically after a successful push to the main branch. The GitHub Actions workflow file at github-ci-cd.yml is triggered by every pull request and by every commit to the main branch. A diagram of this workflow is available here. Branch protection is enabled on the `main` branch to ensure that all commits go through proper versioning and are adequately reviewed.
- Branching Strategy:
  - We use trunk-based branching: the `main` branch is used for development and is always deployable.
  - Deliverable branches are created from `main` and are named using the format `d{number}`.
  - Feature branches are created from `d{number}` and are named using the format `feature/{feature-name}`.
  - Documentation is updated on branches named using the format `docs/{documentation-name}`.
- Pull Requests:
  - Pull requests are created from feature branches or bug-fix branches to the `d{number}` branch.
  - Each pull request is reviewed by at least one other member of the same team (frontend / backend).
  - The team leaders / product managers are responsible for merging the reviewed pull requests.
- Code Reviews:
  - Code reviews by at least one other person are mandatory for all pull requests.
  - Reviewers check for code quality, organization, and bugs.
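The naming conventions above lend themselves to a simple automated check, sketched below. The `classifyBranch` and `isAllowedPullRequest` helpers are hypothetical and are not part of the repository. Because no naming format is given for bug-fix branches, the sketch only accepts feature branches as pull request sources.

```typescript
type BranchKind = "main" | "deliverable" | "feature" | "docs" | "unknown";

// Hypothetical helper: maps a branch name to the kind its format implies.
function classifyBranch(name: string): BranchKind {
  if (name === "main") return "main";
  if (/^d\d+$/.test(name)) return "deliverable"; // d{number}
  if (/^feature\/[^/]+$/.test(name)) return "feature"; // feature/{feature-name}
  if (/^docs\/[^/]+$/.test(name)) return "docs"; // docs/{documentation-name}
  return "unknown";
}

// Hypothetical helper: a pull request should go from a feature branch
// into a deliverable branch.
function isAllowedPullRequest(source: string, target: string): boolean {
  return classifyBranch(source) === "feature" && classifyBranch(target) === "deliverable";
}
```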
The application is currently deployed to three environments, described in detail below. To incur the fewest charges, the recommended deployment method is Docker on a Linux machine. Deployment information and configurations are described below.
- Development Environment: dev.hyper-recent.online. This environment is a DigitalOcean Droplet (VM running Ubuntu) with a local PostgreSQL DB. PM2 and NGINX are used for serving the application and managing processes. This environment requires manual intervention. SSL is provided by Let's Encrypt.
- User Acceptance Testing Environment: uat.hyper-recent.online. This is the CI/CD environment to which changes are pushed after every pull request. The application is hosted on DigitalOcean App Platform with a DigitalOcean PostgreSQL Managed DB.
- Production Environment: hyper-recent.online. This environment is hosted on DigitalOcean App Platform with a higher CPU and memory allotment. In addition, an Azure PostgreSQL Managed DB with a higher SLA is used, which provides additional monitoring and additional connections. This environment is deployed to after every pull request accepted into main, once smoke tests pass.
Resources for assisting deployment are listed below.