
Database creation

Riya Chhikara edited this page Jul 1, 2024 · 3 revisions

Steps to recreate the PostgreSQL Database

  1. Ensure that Docker is installed and running.

  2. If a chatlse-postgres container already exists, stop and remove it by running the following in your terminal:

docker stop chatlse-postgres

docker rm chatlse-postgres

  3. Recreate the PostgreSQL container and database. Run the following command to create a new PostgreSQL container with the pgvector extension:

docker run -itd --name chatlse-postgres --restart unless-stopped -p 5432:5432 -e POSTGRES_PASSWORD=chatlse -e POSTGRES_USER=chatlse -e POSTGRES_DB=chatlse pgvector/pgvector:0.7.1-pg16
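If you prefer to script the stop, remove and run commands above, the following is a minimal sketch using Python's standard subprocess module. It is a hypothetical helper, not part of the project: the names run_args and recreate_container are chosen for illustration, and it omits the -it flags, which a background database server does not need. The container name, image tag, port and credentials are copied from the command above.

```python
import subprocess

# Values copied from the docker run command above.
CONTAINER = "chatlse-postgres"
IMAGE = "pgvector/pgvector:0.7.1-pg16"
CREDENTIAL = "chatlse"  # used as the user, password and database name


def run_args(container: str = CONTAINER, image: str = IMAGE) -> list[str]:
    """Build the argument list for the docker run command (hypothetical helper)."""
    return [
        "docker", "run",
        "-d",                            # run detached, in the background
        "--name", container,
        "--restart", "unless-stopped",   # restart automatically unless stopped by hand
        "-p", "5432:5432",               # host port 5432 -> container port 5432
        "-e", f"POSTGRES_PASSWORD={CREDENTIAL}",
        "-e", f"POSTGRES_USER={CREDENTIAL}",
        "-e", f"POSTGRES_DB={CREDENTIAL}",
        image,
    ]


def recreate_container() -> None:
    """Stop and remove any existing container, then start a fresh one."""
    # check=False: stop/rm just report an error if the container does not exist yet.
    subprocess.run(["docker", "stop", CONTAINER], check=False)
    subprocess.run(["docker", "rm", CONTAINER], check=False)
    # check=True: raise CalledProcessError if the new container cannot be started.
    subprocess.run(run_args(), check=True)
```

Call recreate_container() from a script or a Python shell. Building the command as a list rather than a single string avoids shell quoting issues and keeps each flag next to a comment explaining it.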

  4. Check that the container is running; chatlse-postgres should appear in the output:

docker ps
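A listed container is not necessarily ready for connections yet, because Postgres needs a few seconds to initialise on first start. The sketch below is a hypothetical helper (function names chosen for illustration) that checks both conditions: it filters docker ps by name, and it runs pg_isready, a client tool shipped in the Postgres image that the pgvector image is built on, inside the container.

```python
import subprocess

CONTAINER = "chatlse-postgres"


def is_listed(ps_output: str, name: str = CONTAINER) -> bool:
    """Return True if `name` is one of the lines of `docker ps --format {{.Names}}` output."""
    return name in (line.strip() for line in ps_output.splitlines())


def container_running(name: str = CONTAINER) -> bool:
    """Ask Docker for running containers whose name matches `name`."""
    result = subprocess.run(
        ["docker", "ps", "--filter", f"name={name}", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    # The name filter also matches partial names, so compare exact names here.
    return is_listed(result.stdout, name)


def postgres_ready(name: str = CONTAINER, user: str = "chatlse") -> bool:
    """Run pg_isready inside the container; exit code 0 means it accepts connections."""
    result = subprocess.run(
        ["docker", "exec", name, "pg_isready", "-U", user],
        capture_output=True, text=True,
    )
    return result.returncode == 0
```

Calling postgres_ready() in a short polling loop before starting the crawler avoids connection errors right after the container is created.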

  5. Ensure that a .env file exists in the directory. It has the same contents as the .env.sample file:

POSTGRES_HOST = localhost

POSTGRES_USERNAME = chatlse

POSTGRES_PASSWORD = chatlse

POSTGRES_DATABASE = chatlse

POSTGRES_PORT = 5432

POSTGRES_SSL = disable
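To see how these settings fit together, the sketch below parses a file in the KEY = value format shown above and assembles a libpq-style connection URL from it. It is an illustration only: the project may load .env differently (for example with a dotenv library), parse_env, postgres_url and load_env are hypothetical names, and mapping POSTGRES_SSL onto libpq's sslmode parameter is an assumption.

```python
from pathlib import Path
from urllib.parse import quote


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY = value lines; whitespace around '=' is ignored."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def postgres_url(env: dict[str, str]) -> str:
    """Build a connection URL from the POSTGRES_* settings (sslmode mapping assumed)."""
    user = quote(env["POSTGRES_USERNAME"], safe="")      # percent-encode special characters
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['POSTGRES_HOST']}:{env['POSTGRES_PORT']}/{env['POSTGRES_DATABASE']}"
        f"?sslmode={env['POSTGRES_SSL']}"
    )


def load_env(path: str = ".env") -> dict[str, str]:
    """Read and parse the .env file at `path`."""
    return parse_env(Path(path).read_text())
```

With the values above, postgres_url(load_env()) gives postgresql://chatlse:chatlse@localhost:5432/chatlse?sslmode=disable, which psql and most Postgres drivers accept.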

  6. Run the crawler:

scrapy crawl lse_crawler

The crawler takes about 10 minutes to run. Expect error messages in the terminal for some URLs; these are mostly links to external websites that the crawler is forbidden to access. When the final message in the terminal is "ItemToPostgresPipeline close spider", crawling has finished.
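When running the crawler unattended, the completion message can be checked programmatically instead of by eye. The sketch below is a hypothetical wrapper (crawl_finished and run_crawler are illustrative names): it runs the same scrapy command and looks for the final message quoted above in the captured output. Scrapy writes its log to stderr by default, and the sketch checks stdout as well in case the pipeline prints the message.

```python
import subprocess

# Completion message quoted in the step above.
DONE_MARKER = "ItemToPostgresPipeline close spider"


def crawl_finished(log_text: str) -> bool:
    """Return True if the crawler output contains the completion message."""
    return DONE_MARKER in log_text


def run_crawler() -> bool:
    """Run the spider and report whether the completion message appeared."""
    result = subprocess.run(
        ["scrapy", "crawl", "lse_crawler"],
        capture_output=True, text=True,
    )
    # Check both streams: Scrapy logs to stderr by default.
    return crawl_finished(result.stdout + result.stderr)
```

Because the output is captured, nothing appears in the terminal during the roughly 10-minute run; the expected errors for forbidden external URLs do not affect the result, only the presence of the final message does.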

  7. To run queries on the database, use pgAdmin or psql in the terminal, for example:

docker exec -it chatlse-postgres psql -U chatlse -d chatlse

Once the chatlse=# prompt appears, you can enter queries, for example:

SELECT * FROM webpage;
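The database can also be queried from Python. The sketch below assumes the psycopg2 driver is installed (pip install psycopg2-binary) and that the .env settings have already been loaded into a dict called env by whatever means you prefer; connection_kwargs and count_webpages are hypothetical names, and passing POSTGRES_SSL as sslmode is an assumption about how the project uses that setting.

```python
def connection_kwargs(env: dict[str, str]) -> dict[str, object]:
    """Map the .env settings onto psycopg2.connect() keyword arguments."""
    return {
        "host": env["POSTGRES_HOST"],
        "port": int(env["POSTGRES_PORT"]),
        "user": env["POSTGRES_USERNAME"],
        "password": env["POSTGRES_PASSWORD"],
        "dbname": env["POSTGRES_DATABASE"],
        "sslmode": env["POSTGRES_SSL"],  # assumption: a valid libpq sslmode value
    }


def count_webpages(env: dict[str, str]) -> int:
    """Return the number of rows the crawler stored in the webpage table."""
    import psycopg2  # imported here so connection_kwargs works without the driver

    conn = psycopg2.connect(**connection_kwargs(env))
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM webpage;")
            return cur.fetchone()[0]
    finally:
        conn.close()  # psycopg2's connection context manager does not close it
```

Counting rows is a quick sanity check after a crawl; a count of zero suggests the crawler did not reach the pipeline's database write.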
