Database creation
- Ensure that Docker is installed.
- If a chatlse-postgres container already exists, stop and remove it by running the following in your terminal:
docker stop chatlse-postgres
docker rm chatlse-postgres
- Recreate the PostgreSQL container and database. Run the following command to create a new PostgreSQL container with the pgvector extension:
docker run -itd --name chatlse-postgres --restart unless-stopped -p 5432:5432 -e POSTGRES_PASSWORD=chatlse -e POSTGRES_USER=chatlse -e POSTGRES_DB=chatlse pgvector/pgvector:0.7.1-pg16
- Check if the container is running:
docker ps
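The same check can be scripted. The following is a minimal sketch, assuming Docker is on the PATH; the helper names `container_is_listed` and `running_container_names` are hypothetical and not part of this repository. It relies on the `--format "{{.Names}}"` option of `docker ps` to print one container name per line.

```python
import subprocess

# Container name taken from the docker run command in this README.
CONTAINER_NAME = "chatlse-postgres"


def container_is_listed(names_output: str, name: str = CONTAINER_NAME) -> bool:
    """Return True if `name` appears as a whole line in `names_output`."""
    listed = [line.strip() for line in names_output.splitlines()]
    return name in listed


def running_container_names() -> str:
    """Ask Docker for the names of running containers, one per line.

    Raises CalledProcessError if `docker ps` exits with an error.
    """
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `container_is_listed(running_container_names())` should return True once the container is up. Matching whole lines avoids a false positive from a similarly named container.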
- Ensure that a .env file exists in the directory. Its contents are the same as the .env.sample file:
POSTGRES_HOST = localhost
POSTGRES_USERNAME = chatlse
POSTGRES_PASSWORD = chatlse
POSTGRES_DATABASE = chatlse
POSTGRES_PORT = 5432
POSTGRES_SSL = disable
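To illustrate how these settings fit together, here is a minimal sketch that parses .env-style text and assembles a standard PostgreSQL connection URI. It is an assumption about usage, not the project's own loading code: the functions `parse_env` and `build_postgres_url` are hypothetical, and mapping POSTGRES_SSL to the `sslmode` parameter is a guess based on the value `disable`.

```python
from urllib.parse import quote

# Sample contents copied from the .env values listed in this README.
SAMPLE_ENV = """\
POSTGRES_HOST = localhost
POSTGRES_USERNAME = chatlse
POSTGRES_PASSWORD = chatlse
POSTGRES_DATABASE = chatlse
POSTGRES_PORT = 5432
POSTGRES_SSL = disable
"""


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY = VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip the spaces that surround "=" in the sample file.
        values[key.strip()] = value.strip()
    return values


def build_postgres_url(env: dict[str, str]) -> str:
    """Build a postgresql:// URI; user and password are percent-encoded."""
    user = quote(env["POSTGRES_USERNAME"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    return (
        f"postgresql://{user}:{password}"
        f"@{env['POSTGRES_HOST']}:{env['POSTGRES_PORT']}"
        f"/{env['POSTGRES_DATABASE']}?sslmode={env['POSTGRES_SSL']}"
    )
```

Note that the values must match the POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB settings passed to the container, and the port must match the published port 5432.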
- Run the crawler
scrapy crawl lse_crawler
The crawler takes about 10 minutes to run, and the terminal will show error messages for some URLs. These are mostly links to external websites that the crawler is not allowed to access. The final message in the terminal will be "ItemToPostgresPipeline close spider", which means that the crawl has finished.
- To run queries on the database, you can use pgAdmin or psql in the terminal (as shown):
docker exec -it chatlse-postgres psql -U chatlse -d chatlse
Once connected, you'll see the chatlse=# prompt and can enter queries:
SELECT * FROM webpage;
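For scripted checks, a query can also be run without opening an interactive session. The sketch below is one possible approach, assuming the container, user, and database names used above; `psql_command` and `run_query` are hypothetical helpers. It uses psql's `-c` option to run a single command and `-A -t` to print unaligned rows without headers, and it omits `-it` from `docker exec` because no terminal is attached.

```python
import subprocess


def psql_command(
    sql: str,
    container: str = "chatlse-postgres",
    user: str = "chatlse",
    database: str = "chatlse",
) -> list[str]:
    """Build the argument list for a one-off psql query inside the container."""
    return [
        "docker", "exec", container,
        "psql", "-U", user, "-d", database,
        "-A", "-t",  # unaligned output, rows only
        "-c", sql,
    ]


def run_query(sql: str) -> str:
    """Run `sql` through psql in the container and return its raw output."""
    result = subprocess.run(
        psql_command(sql), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, `run_query("SELECT COUNT(*) FROM webpage;")` should return the number of crawled rows as a string once the crawler has finished.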