Bon99yun/safety_rag

PDF-Based Question-Answering RAG System

A RAG (Retrieval-Augmented Generation) system that answers user questions accurately, grounded in documents on event and disaster safety.

It implements retrieval-augmented generation over safety knowledge using Google Gemini models and the LangChain/LangGraph frameworks.



Key Features

📄 Document Processing

  • PDF text extraction: extracts document content with pdfplumber
  • OCR: extracts text from image-based PDFs with the Google Cloud Vision API
  • Semantic chunking: context-aware document splitting
  • Context enrichment: applies Anthropic's Contextual Retrieval technique

🔍 Intelligent Search

  • Hybrid search: combines semantic search (70%) with keyword search (30%)
  • Re-ranking: refines search results with the Vertex AI Reranker
  • Multiple vector stores: safety documents, weather accident cases, and event accident cases are managed separately

🤖 Question-Answering Chains

  • General chat: conversational Q&A about safety knowledge
  • Form-based safety guide: generates a tailored safety guide from the event information you enter
  • Weather-based analysis: generates a tailored safety guide using real-time weather information
  • Follow-up questions: supports additional questions about a generated guide

🏗️ Deployment

  • Docker & Docker Compose support
  • FastAPI-based REST API
  • Session-based conversation memory

Data Source

Required data: to run this project, download the data from the following Hugging Face dataset.

https://huggingface.co/datasets/bong9513/safety_rag_data

Related project: this project integrates with safety-navigator.


Tech Stack

| Category        | Technology                                             |
|-----------------|--------------------------------------------------------|
| Language        | Python 3.11                                            |
| Framework       | FastAPI, Uvicorn                                       |
| AI/ML           | Google Gemini, Vertex AI, LangChain, LangGraph         |
| Search          | FAISS, BM25Retriever, VertexAIRank                     |
| Data processing | pdfplumber, pandas, numpy                              |
| Deployment      | Docker, Docker Compose                                 |
| External APIs   | Google Cloud Vision, Google Geocoding, Open-Meteo Weather |

Project Structure

.
├── .env                       # Environment variable settings
├── .gitignore                 # Git ignore file
├── docker-compose.yml         # Docker Compose configuration
├── Dockerfile                 # Docker image configuration
├── fire_app.py                # FastAPI main server
├── requirements.txt           # Python package dependencies
├── README.md                  # This document
└── vector/                    # Data and definition folder
    ├── data/                  # Data storage
    │   ├── case_data/         # Accident case data
    │   │   ├── original_data/ # Original accident cases
    │   │   │   ├── case_climate.pkl      # Weather-related accident cases
    │   │   │   └── case_festival.pkl     # Event-related accident cases
    │   │   └── contextual_content_docs/  # Context-enriched documents
    │   ├── chunked_docs/       # Semantically chunked documents
    │   ├── original_full_text/ # Raw text extracted from the PDFs
    │   ├── original_pdf/       # Source PDFs to analyze
    │   └── vector_store/       # FAISS vector DB
    └── definition/             # RAG pipeline definitions
        ├── chain_for_chat.py        # General chat chain
        ├── chain_for_form.py        # Form-based safety guide chain
        ├── chain_for_form_chat.py   # Follow-up question chain
        ├── custom.py                # Weather-based safety analysis chain
        ├── get_weather.py           # Weather data handling
        ├── ocr.py                   # OCR processing
        ├── make_context.py          # Context generation
        ├── make_contextual_content_with_caching.py  # Context generation with caching
        ├── make_docs.py             # Document chunking
        ├── make_full_text.py        # Text extraction
        ├── make_vector_store.py     # Vector store creation
        ├── vector_store.py          # Vector DB management
        ├── case_vector_store.py     # Accident case vector store
        ├── semantic_split_genai.py  # Semantic splitting with GenAI
        └── semantic_split_vertex.py # Semantic splitting with Vertex AI

Quick Start

Prerequisites

  • Docker & Docker Compose
  • A Google Cloud project
  • A Google Cloud service account credentials key

1. Clone the project

git clone https://github.com/singbong/safety_rag.git
cd safety_rag

2. Set environment variables

Create a .env file:

GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"
PROJECT_ID="YOUR_PROJECT_ID"
GEOCODING_API="YOUR_GOOGLE_GEOCODING_API_KEY"

Place the service account credentials key in the vector/definition/ directory.

3. Run the Docker container

docker-compose up --build -d

The API is available at http://localhost:8000.

4. Prepare the data

Download the dataset from Hugging Face and place it in the vector/data/ directory.


API Usage Guide

1. General chat (/api/chat)

Answers general questions about safety knowledge.

import requests

# Ask a safety question; session_id ties the request to a conversation.
response = requests.post("http://localhost:8000/api/chat", json={
    "question": "What should I do when an earthquake occurs?",
    "session_id": "user123"
})

print(response.json()['final_answer'])
2. Form-based safety guide generation (/api/generate_form)

Generates a tailored safety guide from the event information you provide.

import requests

response = requests.post("http://localhost:8000/api/generate_form", json={
    "place_name": "2025 Han River Summer Music Festival",
    "type": "Large-scale outdoor concert",
    "region": "Yeouido Hangang Park, Seoul",
    "period": "August 8, 2025 ~ August 10, 2025",
    "description": "A three-day festival with artists from a variety of genres",
    "category": "Music/Festival",
    "related_documents": "festival_guide.pdf",  # Event-related document (processed with OCR)
    "emergency_contact_name": "Control Center Safety Management Team",
    "emergency_contact_phone": "02-123-4567"
})

print(response.json()['final_answer'])

3. Follow-up questions (/api/form_chat)

Asks additional questions about a generated guide.

import requests

# Use the guide generated earlier by /api/generate_form
generated_form = "..."  # final_answer from the previous API response

response = requests.post("http://localhost:8000/api/form_chat", json={
    "generated_form": generated_form,
    "query": "What are the specific symptoms of heat-related illness?",
    "session_id": "user123"
})

print(response.json()['final_answer'])

4. Weather-based tailored safety guide (/api/custom_form)

Provides advanced safety analysis that uses real-time weather information.

import requests

response = requests.post("http://localhost:8000/api/custom_form", json={
    "place_name": "2025 Han River Summer Music Festival",
    "type": "Large-scale outdoor concert",
    "location": "Yeouido Hangang Park, Seoul",
    "period": "August 8, 2025 ~ August 10, 2025",
    "description": "A three-day festival with artists from a variety of genres",
    "category": "Music/Festival",
    "emergency_contact_name": "Control Center Safety Management Team",
    "emergency_contact_phone": "02-123-4567",
    "expected_attendees": "50000",
    "related_documents": "festival_guide.pdf"  # Processed with OCR
})

result = response.json()
print("Generated guide:", result['generation'])
print("Weather summary:", result['weather_summary'])

Example response format

{
    "generation": "The generated tailored safety guide",
    "hallu_check": {
        "answer_with_citations": "The guide after grounding verification"
    },
    "weather_summary": "Summary of the analyzed weather",
    "festival_query_list": ["Event-related search queries"],
    "weather_query_list": ["Weather-related search queries"],
    "safety_query_list": ["Safety-related search queries"]
}
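
The response above nests the grounded text under hallu_check, so client code may want to read it defensively. The following is a minimal sketch, assuming the field names shown in the example response; the helper name summarize_custom_response and the output keys are hypothetical and not part of the project.

```python
from typing import Any

QUERY_LIST_KEYS = ("festival_query_list", "weather_query_list", "safety_query_list")


def summarize_custom_response(result: dict[str, Any]) -> dict[str, Any]:
    """Pick out the documented fields, tolerating missing or null keys."""
    hallu_check = result.get("hallu_check") or {}
    return {
        "guide": result.get("generation", ""),
        "grounded_guide": hallu_check.get("answer_with_citations", ""),
        "weather_summary": result.get("weather_summary", ""),
        # Total number of search queries the chain reported using.
        "query_count": sum(len(result.get(key) or []) for key in QUERY_LIST_KEYS),
    }


sample = {
    "generation": "Stay hydrated.",
    "hallu_check": {"answer_with_citations": "Stay hydrated. [1]"},
    "weather_summary": "Hot and humid.",
    "festival_query_list": ["crowd crush"],
    "weather_query_list": ["heat wave", "heavy rain"],
    "safety_query_list": [],
}
summary = summarize_custom_response(sample)
print(summary["grounded_guide"], summary["query_count"])
```

Using .get with defaults keeps the client working if the server omits a field, at the cost of hiding schema changes; stricter validation may be preferable in production.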

Architecture Details

Document Processing Pipeline

  1. Text extraction: PDF → text extraction with pdfplumber
  2. OCR: image PDF → Google Cloud Vision API → text
  3. Semantic chunking: LangChain SemanticChunker + gemini-embedding-001
  4. Context enrichment: gemini-2.5-flash-lite summarizes the document context → header generation
  5. Vectorization: context-enriched text → gemini-embedding-001 → stored in FAISS
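
Steps 4 and 5 hinge on embedding the header together with the chunk rather than the chunk alone. The sketch below shows that combination step only, with no model calls; the Chunk class, the function name, and the "Header:" / "Content:" labels are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str          # original chunk content
    header: str = ""   # context summary produced by an LLM (assumed to exist already)


def build_contextual_text(chunk: Chunk) -> str:
    """Combine the context header with the chunk so both get embedded together."""
    header = chunk.header.strip()
    if not header:
        # No context available: fall back to embedding the raw chunk.
        return chunk.text
    return f"Header: {header}\n\nContent: {chunk.text}"


chunk = Chunk(
    text="Move to an open area away from buildings.",
    header="Earthquake response guidance for outdoor events",
)
print(build_contextual_text(chunk))
```

The combined string is what would be passed to the embedding model, so a query about "earthquake at a festival" can match a chunk that never mentions the word earthquake itself.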

Search Pipeline

  1. Hybrid search:

    • Semantic search (FAISS, 70%) + keyword search (BM25, 30%)
    • Initial results: 150 documents
  2. Re-ranking:

    • Vertex AI Reranker (semantic-ranker-default-004)
    • Final results: top 20 documents
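
One common way to merge a semantic ranking and a keyword ranking with 70/30 weights is weighted Reciprocal Rank Fusion. The sketch below is a plain-Python illustration of that idea; whether the project fuses results in exactly this way is an assumption, and the function name and the constant k=60 are illustrative choices.

```python
from collections import defaultdict
from typing import Sequence


def weighted_rrf(
    rankings: Sequence[Sequence[str]],
    weights: Sequence[float],
    k: int = 60,
    top_n: int = 150,
) -> list[str]:
    """Fuse ranked lists of document ids using weighted reciprocal ranks."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of a heavily weighted list score highest.
            scores[doc_id] += weight / (k + rank)
    # Sort by descending score; break ties by id for a deterministic order.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


semantic_hits = ["a", "b", "c"]   # e.g. from a vector index
keyword_hits = ["c", "a", "d"]    # e.g. from BM25
fused = weighted_rrf([semantic_hits, keyword_hits], weights=[0.7, 0.3])
print(fused)
```

Rank-based fusion avoids having to normalize cosine distances against BM25 scores, which live on different scales.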

Question-Answering Chains

chain_for_chat.py: general chat

Question input
    ↓
[Re-writer] Rewrite the question (pronouns → specific terms)
    ↓
[Question Decomposer] Complex question → sub-questions
    ↓
[Search Document] Vector store search → re-ranking
    ↓
[Generator] Answer generation (in the 'Safety Guardian AI' role)
    ↓
[Hallucination Checker] Grounding check + citations
    ↓
[Answer Beautifier] Markdown cleanup
    ↓
Final answer
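
The flow above is a linear sequence of nodes that each read and update a shared state. The sketch below mimics that shape in plain Python rather than LangGraph, so it runs without dependencies; every node body is a placeholder (no LLM or vector store is called), and all function names are hypothetical stand-ins for the real nodes.

```python
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]


def re_writer(state: State) -> State:
    # Placeholder: a real node would resolve pronouns using chat history.
    return {"question": state["question"].strip()}


def question_decomposer(state: State) -> State:
    # Placeholder: split on " and " instead of asking an LLM.
    parts = [part.strip() for part in state["question"].split(" and ")]
    return {"sub_questions": [part for part in parts if part]}


def search_document(state: State) -> State:
    # Placeholder: pretend each sub-question retrieves one document.
    return {"documents": [f"doc about {q}" for q in state["sub_questions"]]}


def generator(state: State) -> State:
    return {"generation": "; ".join(state["documents"])}


def hallucination_checker(state: State) -> State:
    # Placeholder grounding check: every document must appear in the draft.
    grounded = all(doc in state["generation"] for doc in state["documents"])
    return {"grounded": grounded}


def answer_beautifier(state: State) -> State:
    lines = [f"- {part}" for part in state["generation"].split("; ")]
    return {"final_answer": "\n".join(lines)}


CHAT_CHAIN: list[Node] = [
    re_writer, question_decomposer, search_document,
    generator, hallucination_checker, answer_beautifier,
]


def run_chain(state: State, nodes: list[Node]) -> State:
    """Run nodes in order, merging each node's partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


result = run_chain({"question": " fire evacuation and first aid "}, CHAT_CHAIN)
print(result["final_answer"])
```

Returning partial updates from each node keeps nodes independent of one another, which is also how graph-style frameworks typically merge state.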

chain_for_form.py: form-based safety guide

Event information input
    ↓
[Query Generator] Multi-angle search query generation (based on risk scenarios)
    ↓
[Search Document] Search for safety-related documents
    ↓
[Generator] Tailored safety guide generation (in the 'Safety Expert' role)
    ↓
[Grounding & Beautify] Reliability check + Markdown cleanup
    ↓
Final guide

custom.py: weather-based safety analysis

Event information input
    ↓
[Geocoding] Event venue name → latitude/longitude
    ↓
[Weather API] Collect the Open-Meteo 24-hour weather forecast
    ↓
[Weather analysis] Extract risk factors with Gemini 2.5 Flash Lite
    ↓
Generate multiple search queries:
    - festival_query_generator (event-related)
    - weather_query_generator (weather-related)
    - safety_query_generator (general safety)
    ↓
Search multiple vector stores:
    - search_festival_document (event accident cases)
    - search_weather_document (weather accident cases)
    - search_safety_document (safety documents)
    ↓
[Generator] Generate the comprehensive safety guide
    ↓
[Hallu Checker] Vertex AI Grounding verification
    ↓
Final result: guide + weather summary + search query lists
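
The middle of this flow fans three query lists out to three separate stores and collects the hits per store. A minimal sketch of that fan-out follows; the store objects are toy keyword matchers standing in for the real vector stores, and search_all_stores and keyword_store are hypothetical names.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


def search_all_stores(
    queries: dict[str, list[str]],
    stores: dict[str, SearchFn],
    per_query: int = 3,
) -> dict[str, list[str]]:
    """Run each store's queries against that store, de-duplicating hits in order."""
    results: dict[str, list[str]] = {}
    for store_name, store_queries in queries.items():
        search = stores[store_name]
        hits: list[str] = []
        for query in store_queries:
            for doc in search(query)[:per_query]:
                if doc not in hits:
                    hits.append(doc)
        results[store_name] = hits
    return results


def keyword_store(docs: list[str]) -> SearchFn:
    """Toy stand-in for a vector store: match documents sharing any query word."""
    def search(query: str) -> list[str]:
        words = set(query.lower().split())
        return [doc for doc in docs if words & set(doc.lower().split())]
    return search


stores = {
    "festival": keyword_store(["crowd crush at concert", "stage collapse"]),
    "weather": keyword_store(["heat wave collapse cases", "lightning strike at park"]),
    "safety": keyword_store(["crowd management guide", "first aid guide"]),
}
queries = {
    "festival": ["concert crowd", "stage"],
    "weather": ["heat wave"],
    "safety": ["crowd guide"],
}
found = search_all_stores(queries, stores)
print(found["festival"])
```

Keeping results grouped by store lets the generator treat accident cases and safety documents differently in its prompt.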

ํ™˜๊ฒฝ ๋ณ€์ˆ˜ ์ƒ์„ธ

๋ณ€์ˆ˜ ์„ค๋ช… ํ•„์ˆ˜ ์—ฌ๋ถ€
GOOGLE_API_KEY Google Generative AI API ํ‚ค ํ•„์ˆ˜
PROJECT_ID Google Cloud ํ”„๋กœ์ ํŠธ ID ํ•„์ˆ˜
GEOCODING_API Google Geocoding API ํ‚ค ๋‚ ์”จ ๊ธฐ๋ฐ˜ ๊ธฐ๋Šฅ ์‹œ ํ•„์ˆ˜
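
A quick startup check can catch a missing variable before the first request fails. The sketch below encodes the table above; the function name and the weather_enabled flag are hypothetical, and the project itself may load and validate its settings differently.

```python
import os
from typing import Mapping

REQUIRED_VARS = ("GOOGLE_API_KEY", "PROJECT_ID")
WEATHER_VARS = ("GEOCODING_API",)  # only needed for the weather-based features


def missing_env_vars(env: Mapping[str, str], weather_enabled: bool = True) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    names = REQUIRED_VARS + (WEATHER_VARS if weather_enabled else ())
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```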

License

Please check the license information for this project.

Contributing

If you would like to contribute to this project, please submit a Pull Request.

Key Features

  • Document processing: converts PDF files to text and splits them by meaning (semantic chunking).
  • Vector embedding: converts the processed text into vectors and stores them in a FAISS vector store.
  • Question answering: generates the most relevant answer to a user's question from the stored vector data.
  • Multiple interfaces: provides several conversation chains, including general chat and form-based chat.
  • Container-based: the project environment can be set up and run easily with Docker and Docker Compose.

Tech Stack

  • Language: Python 3
  • Core frameworks:
    • LangChain & LangGraph: building the RAG pipeline and complex agent logic
    • FastAPI: API server
  • AI:
    • google-generativeai & google-cloud-aiplatform: access to the Google Gemini model APIs
    • langchain-google-vertexai: LangChain integration with Vertex AI
    • google-cloud-vision: Google Cloud Vision API (OCR)
    • FAISS: vector database (similarity search)
    • transformers & sentence-transformers: natural language processing and embeddings
  • Data processing:
    • pdfplumber: PDF text extraction
    • pandas, numpy: data manipulation and analysis
  • Deployment:
    • Docker & docker-compose: container-based deployment and runtime environment
    • uvicorn: ASGI server

Project Structure

.
├── .env                    # Environment variables such as API keys and the project ID
├── .git/                   # Git version control directory
├── .gitignore              # List of files excluded from Git tracking
├── api_example.ipynb       # Jupyter Notebook with API usage examples
├── docker-compose.yml      # Configuration for running multiple Docker containers
├── Dockerfile              # Build configuration for the application Docker image
├── fire_app.py             # Main script that runs the FastAPI server and handles CLI commands
├── README.md               # Project documentation (this file)
├── requirements.txt        # Python package dependencies
└── vector/                 # Data processing and RAG logic
    ├── data/               # Data storage directory
    │   ├── chunked_docs/   # Semantically chunked document pieces (.pkl)
    │   ├── contextual_content_docs/ # Document pieces with added context summaries (.pkl)
    │   ├── original_full_text/    # Raw text extracted from the PDFs (.txt)
    │   ├── original_pdf/          # Source PDF documents to analyze
    │   └── vector_store/          # Generated FAISS vector DB (.index, .pkl)
    └── definition/         # Core logic of the RAG pipeline
        ├── __pycache__/           # Python bytecode cache directory
        ├── arcane-footing-464017-v9-a73a60318d02.json # Google Cloud service account credentials key
        ├── chain_for_chat.py      # General chat RAG chain
        ├── chain_for_form.py      # Form-based safety guide generation chain
        ├── chain_for_form_chat.py # Chain for follow-up questions about a generated guide
        ├── make_context.py        # Script that generates context summaries for document pieces
        ├── make_contextual_content_with_caching.py # Script that generates context summaries with caching
        ├── make_docs.py           # Script that splits text semantically
        ├── make_full_text.py      # Script that extracts the full text from PDFs
        ├── semantic_split_genai.py # Semantic splitting utility using GenAI models
        ├── semantic_split_vertex.py# Semantic splitting utility using Vertex AI models
        ├── custom.py              # Weather-based safety analysis and tailored safety guide chain
        └── vector_store.py        # FAISS vector DB creation, saving, loading, and search/re-ranking logic

Installation and Usage

Prerequisites

  • Docker and Docker Compose must be installed.
  • A Google Cloud API key and service account credentials are required.

Steps

  1. Clone the project

    git clone https://github.com/singbong/safety_rag.git
    cd safety_rag
  2. Set environment variables: create a .env file in the project root directory and add the Google-related settings shown below. docker-compose.yml reads this .env file to set the environment variables inside the container.

    GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY"
    PROJECT_ID="YOUR_PROJECT_ID"
    GOOGLE_APPLICATION_CREDENTIALS="PATH/TO/YOUR/CREDENTIALS.json"

    The .json credentials key file named in GOOGLE_APPLICATION_CREDENTIALS must be located inside the project. (Example: vector/definition/your-credentials.json)

  3. Prepare the data: add the PDF files to analyze to the vector/data/original_pdf/ directory.

    Note: when event-related PDF files (venue maps, timetables, shuttle bus schedules, and so on) are uploaded through the API, the system automatically runs OCR on them with the Google Cloud Vision API to extract the text. This lets the AI pick up concrete details of the event and generate a more accurate safety guide.

  4. Process the documents and build the vector DB: run the scripts below in order to drive the data processing pipeline. Each step runs inside the Docker container.

    • 1. Text extraction: extracts the text of the PDF documents with pdfplumber.
      docker-compose run --rm app python vector/definition/make_full_text.py
    • 2. Semantic chunking: splits the extracted text into document chunks.
      docker-compose run --rm app python vector/definition/make_docs.py
    • 3. Context enrichment: adds a summary context to each chunk to improve search accuracy.
      docker-compose run --rm app python vector/definition/make_context.py
    • 4. Vectorization and storage: converts the final text into embedding vectors and builds the FAISS vector store. This step requires calling the create_vector_store function in vector_store.py directly.
      docker-compose run --rm app python -c "from vector.definition.vector_store import store_vector_db; db = store_vector_db(); db.create_vector_store(save_path='../data/vector_store/faiss_vector_db')"
  5. Run the application

    • Run the API server: the command below starts the API server in the background.

      docker-compose up --build -d

      The API is available at http://127.0.0.1:8000.

    • Run the CLI chat (local): note that the CLI chat currently cannot be run directly in the Docker environment. It must be run in a local Python environment.

      # Activate the virtual environment
      # source venv/bin/activate
      python fire_app.py chat

Technical Details

Document Processing and Vectorization

  • Text extraction: uses pdfplumber to extract text and metadata such as page numbers from the source PDFs.
  • OCR: event-related PDF files uploaded by the user through the API (venue maps, timetables, shuttle bus schedules, and so on) are automatically processed with the Google Cloud Vision API to extract text, which is then used in the AI analysis.
  • Semantic chunking:
    • LangChain's SemanticChunker and Google's gemini-embedding-001 model perform a first split of each document along semantic boundaries.
    • If a resulting chunk exceeds the Gemini model's token limit (2048 tokens), it is split further in a manner similar to RecursiveCharacterTextSplitter so that every chunk stays within the limit.
  • Context enrichment (based on Anthropic's technique):
    • To improve retrieval accuracy, the system attaches rich context to each document chunk, following the idea presented in Anthropic's "Contextual Retrieval" work.
    • The gemini-2.5-flash-lite model reads the context of the whole document and generates a header that summarizes the key content of each chunk.
    • The generated header is combined with the original chunk content (for example, Header: [summary]\n\nContent: [original chunk]) into a single text.
    • This combined text is what gets embedded and stored in the vector store. This enables semantic search that goes beyond simple keyword matching and helps locate the information that best matches the intent of the user's question.
    • During this process, the Google GenAI Caching API stores the full document text in a cache, which saves cost and time on repeated API calls.
  • Vectorization:
    • The context-enriched text ("context summary : original chunk content") is converted into embedding vectors with the gemini-embedding-001 model.
    • The resulting vectors are stored in the vector database using FAISS (IndexFlatL2).
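
The secondary split described in the chunking bullets (re-splitting any chunk over the 2048-token limit) can be sketched as a recursive splitter that tries coarse separators first. This is an illustration under stated assumptions: the token counter below simply counts whitespace-separated words, whereas the project would count real model tokens, and the function names and separator list are hypothetical.

```python
from typing import Callable, Sequence


def count_words(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def split_to_limit(
    text: str,
    max_tokens: int = 2048,
    count_tokens: Callable[[str], int] = count_words,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Recursively split text until every piece is within max_tokens."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if count_tokens(text) <= max_tokens:
        return [text] if text.strip() else []
    if len(text) <= 1:
        return [text]  # cannot split any further
    if not separators:
        # No separator left: fall back to cutting the text in half.
        mid = len(text) // 2
        return (split_to_limit(text[:mid], max_tokens, count_tokens, ())
                + split_to_limit(text[mid:], max_tokens, count_tokens, ()))
    sep, rest = separators[0], tuple(separators[1:])
    pieces = [piece for piece in text.split(sep) if piece.strip()]
    if len(pieces) <= 1:
        return split_to_limit(text, max_tokens, count_tokens, rest)
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if count_tokens(candidate) <= max_tokens:
            current = candidate  # keep merging while under the limit
            continue
        if current:
            chunks.append(current)
        if count_tokens(piece) <= max_tokens:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(split_to_limit(piece, max_tokens, count_tokens, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(split_to_limit("one two three four five six", max_tokens=2))
```

Trying paragraph breaks before sentence breaks before spaces keeps related sentences together whenever the limit allows it.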

Retrieval and Re-ranking

  • Hybrid search:
    • LangChain's EnsembleRetriever combines two retrieval methods.
    • Semantic search: uses the FAISS vector store to find documents that are semantically similar to the user's question. (weight 70%)
    • Keyword search: uses BM25Retriever to find documents that match the key terms in the question. (weight 30%)
    • The results of the two methods are combined into the initial search results (150 documents).
  • Re-ranking:
    • The 150 initially retrieved documents are re-ordered by relevance to the question using the VertexAIRank model.
    • VertexAIRank is a semantic ranking service provided by Google Cloud Vertex AI; it uses the semantic-ranker-default-004 model to score the semantic similarity between the query and each document.
    • Finally, the 20 most relevant documents are used for answer generation.
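
The two-stage shape described here (a wide first pass, then a narrower re-rank) can be sketched independently of any particular ranking service. In the example below the scoring function is a toy word-overlap measure standing in for the Vertex AI ranker, and all names are hypothetical.

```python
from typing import Callable, Sequence


def retrieve_then_rerank(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    initial_k: int = 150,
    final_k: int = 20,
) -> list[str]:
    """Keep the first initial_k candidates, re-order them by score, return the top final_k."""
    shortlist = list(candidates[:initial_k])
    # sorted() is stable, so equally scored documents keep their first-pass order.
    ranked = sorted(shortlist, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:final_k]


def word_overlap(query: str, doc: str) -> float:
    """Toy relevance score (assumption): number of shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


first_pass = ["evacuation routes", "heat stroke first aid", "heat wave shelter"]
top = retrieve_then_rerank("heat stroke", first_pass, word_overlap, final_k=2)
print(top)
```

The first stage favors recall and the second favors precision, which is why the candidate pool is much larger than the final context.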

Core Logic (Chains)

This project uses LangGraph to implement four RAG (retrieval-augmented generation) chains for different purposes. Each chain is a graph of nodes tailored to a specific scenario, and each uses Vertex AI's Grounding feature to minimize hallucination.

1. chain_for_chat.py: general chat chain

The most basic RAG chain, which handles general conversational question answering.

  • Main role: clarifies the intent of the user's question, finds relevant documents, and generates a reliable answer.
  • Flow:
    1. Question rewriting (Re-writer): uses the conversation history to replace pronouns in the user's question (for example, 'it') with specific terms.
    2. Question decomposition (Question Decomposer): breaks a complex question into several simple sub-questions to improve retrieval accuracy.
    3. Document search (Search Document): searches the vector store with the decomposed questions, then uses the Reranker to re-order the documents used for the final answer.
    4. Answer generation (Generator): acting as the 'Safety Guardian AI', generates a direct answer to the question from the retrieved documents, along with useful additional information.
    5. Grounding check (Hallucination Checker): verifies that the generated answer is grounded in the retrieved documents and adds citations.
    6. Answer cleanup (Answer Beautifier): formats the final answer as Markdown so that it is easy to read.

2. chain_for_form.py: form-based safety guide generation chain

A chain that generates a tailored safety guide from the event information the user enters in a web form.

  • Main role: analyzes potential risk factors in the event information and automatically writes a detailed, practical safety guide for visitors.
  • Flow:
    1. Multi-angle query generation (Query Generator): combines the event name, type, period, venue, and OCR-extracted content of related documents to generate search queries for the risk scenarios that could occur (for example, "preventing food poisoning at summer outdoor events" or "preventing crowd crush accidents at concert venues").
    2. Document search (Search Document): uses the generated queries to search the safety-related documents comprehensively.
    3. Guide generation (Generator): acting as the 'Safety Expert', combines the retrieved documents with the event information entered by the user to produce a well-structured, tailored safety guide.
    4. Grounding check and cleanup: verifies the reliability of the generated guide and polishes the final version as Markdown.

3. chain_for_form_chat.py: follow-up question chain

A specialized chat chain that handles additional questions from the user about a safety guide generated by chain_for_form.py.

  • Main role: understands the previously generated guide and the conversation context, and answers follow-up questions accurately.
  • Flow:
    1. Question rewriting (Re-writer): rewrites the user's follow-up question (for example, "Why is the first item there important?") into a clear question such as "Why is rapid evacuation the most important thing in a fire?", based on the previously generated guide and the conversation history.
    2. Sub-query generation (Query Generator): from the rewritten question, generates additional search queries to find the background information, prevention measures, and related cases needed for the answer.
    3. Document search, answer generation, grounding, and cleanup: follows a process similar to chain_for_chat to produce a detailed, accurate answer to the follow-up question.

4. custom.py: weather-based safety analysis and tailored safety guide generation chain

custom.py is one of the core features of this system. It uses the Google Geocoding API and the Open-Meteo weather API to analyze real-time weather information for the event venue and, on that basis, predicts weather-related safety risks.

Key characteristics

  • LangGraph-based composite chain: nodes across several stages are connected in sequence to perform a complex safety analysis.
  • Real-time weather data integration: collects the 24-hour weather forecast for the event venue in real time and uses it in the analysis.
  • Multiple vector stores: searches safety documents, weather-related accident cases, and event-related accident cases, each in its own vector store.
  • Vertex AI Grounding: verifies the reliability of the generated safety guide with Vertex AI's Grounding feature.

Flow

  1. Generate search queries from the event information:

    • festival_query_generator: analyzes the event type, venue, scale, and OCR-extracted content of related documents to generate queries for event-related accident cases
    • weather_query_generator: analyzes real-time weather information for the event venue to generate queries for weather-related risk factors
    • safety_query_generator: combines the event information with the OCR-extracted document content to generate queries for general safety risks
  2. Search multiple vector stores:

    • search_festival_document: searches the event accident case vector store for similar cases
    • search_weather_document: searches the weather accident case vector store for accidents under similar weather conditions
    • search_safety_document: searches the general safety document vector store for relevant safety information
  3. Generate the tailored safety guide:

    • generator: combines all retrieved information to generate a safety guide tailored to the event's characteristics and the current weather
    • hallu_checker: verifies the reliability of the generated guide through Vertex AI Grounding and adds citations

Weather Data Processing Pipeline

  • Geographic coordinate conversion: converts the event venue name to latitude/longitude coordinates with the Google Geocoding API
  • 24-hour weather forecast collection: collects detailed weather information through the Open-Meteo API (temperature, precipitation, wind speed, humidity, UV index, and so on)
  • LLM-based weather analysis: the Gemini 2.5 Flash Lite model analyzes the weather data and extracts risk factors
  • Weather-based query generation: automatically generates search queries from the analyzed weather information to find accident cases under similar weather conditions
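
The first two steps above amount to two HTTP requests: one to geocode the venue and one to fetch an hourly forecast. The sketch below only builds the request URLs and parses a geocoding payload, so it runs offline; the endpoints are the public Google Geocoding and Open-Meteo forecast endpoints, while the selected hourly variables, the one-day horizon, and all function names are assumptions about what the project might request.

```python
from typing import Any, Optional
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"
FORECAST_ENDPOINT = "https://api.open-meteo.com/v1/forecast"

# Assumed selection of hourly variables matching the list in the text above.
HOURLY_VARIABLES = (
    "temperature_2m", "precipitation", "wind_speed_10m",
    "relative_humidity_2m", "uv_index",
)


def geocode_url(place: str, api_key: str) -> str:
    """Build a Geocoding API request URL for a venue name."""
    return f"{GEOCODE_ENDPOINT}?{urlencode({'address': place, 'key': api_key})}"


def extract_lat_lng(payload: dict[str, Any]) -> Optional[tuple[float, float]]:
    """Read the first result's coordinates from a Geocoding API response."""
    results = payload.get("results") or []
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


def forecast_url(latitude: float, longitude: float) -> str:
    """Build an Open-Meteo request URL for one day of hourly data."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(HOURLY_VARIABLES),
        "forecast_days": 1,
        "timezone": "auto",
    }
    return f"{FORECAST_ENDPOINT}?{urlencode(params)}"


sample_payload = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 37.5, "lng": 126.9}}}],
}
coords = extract_lat_lng(sample_payload)
if coords is not None:
    print(forecast_url(*coords))
```

Passing either URL to requests.get (with a timeout) would perform the actual call; the hourly arrays in the forecast response could then be handed to the LLM for risk analysis.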

ํ™˜๊ฒฝ ๋ณ€์ˆ˜ ์„ค์ •

GEOCODING_API="YOUR_GOOGLE_GEOCODING_API_KEY"

ํ™œ์šฉ ์‹œ๋‚˜๋ฆฌ์˜ค

  • ์—ฌ๋ฆ„์ฒ  ์•ผ์™ธ ํ–‰์‚ฌ: ํญ์—ผ, ๊ตญ์ง€์„ฑ ํ˜ธ์šฐ, ๋†’์€ UV ์ง€์ˆ˜ ๋“ฑ์— ๋Œ€ํ•œ ์‚ฌ์ „ ๊ฒฝ๊ณ  ๋ฐ ๋Œ€๋น„ ๋ฐฉ์•ˆ ์ œ๊ณต
  • ๊ฒจ์šธ์ฒ  ์‹ค์™ธ ํ–‰์‚ฌ: ํ•œํŒŒ, ๊ฐ•์„ค, ๋น™ํŒ๊ธธ ๋“ฑ์— ๋Œ€ํ•œ ์•ˆ์ „ ์ˆ˜์น™ ์•ˆ๋‚ด
  • ๋ด„/๊ฐ€์„ ํ–‰์‚ฌ: ์ผ๊ต์ฐจ, ๊ฐ•ํ’, ๋ฏธ์„ธ๋จผ์ง€ ๋“ฑ ๊ณ„์ ˆ๋ณ„ ํŠน์ˆ˜ ๊ธฐ์ƒ ์กฐ๊ฑด์— ๋Œ€ํ•œ ์ฃผ์˜์‚ฌํ•ญ ์ œ๊ณต
  • ์‹ค์‹œ๊ฐ„ ๊ธฐ์ƒ ๋ณ€ํ™” ๋Œ€์‘: ํ–‰์‚ฌ ๋‹น์ผ ๊ธ‰๋ณ€ํ•˜๋Š” ๋‚ ์”จ์— ๋Œ€ํ•œ ์ฆ‰๊ฐ์ ์ธ ์•ˆ์ „ ์ •๋ณด ์—…๋ฐ์ดํŠธ

์ด ๊ธฐ๋Šฅ์„ ํ†ตํ•ด ํ–‰์‚ฌ ์ฃผ์ตœ์ž๋Š” ๋‹จ์ˆœํ•œ ํ–‰์‚ฌ ์ •๋ณด๋งŒ์œผ๋กœ๋„ ๋‚ ์”จ์— ํŠนํ™”๋œ ๋งž์ถคํ˜• ์•ˆ์ „ ์•ˆ๋‚ด๋ฌธ์„ ์ž๋™์œผ๋กœ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ฐฉ๋ฌธ๊ฐ๋“ค์€ ํ˜„์žฌ ๊ธฐ์ƒ ์กฐ๊ฑด์— ์ตœ์ ํ™”๋œ ์•ˆ์ „ ์ˆ˜์น™์„ ๋ฐ›์•„๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

API Usage Guide

This system exposes four main API endpoints through FastAPI. Each API can be called at http://<YOUR_SERVER_IP>:<PORT>.

1. /api/chat

The API for general question answering.

  • Python example:
    import requests
    
    chat_url = "http://127.0.0.1:8000/api/chat"  # Change to your actual server address
    chat_payload = {
        "question": "What should I do when an earthquake occurs?",
        "session_id": "user123_session_abc"
    }
    
    try:
        response = requests.post(chat_url, json=chat_payload)
        response.raise_for_status()  # Raises an exception for 4xx/5xx responses
        chat_result = response.json()
        
        print("--- Final answer ---")
        print(chat_result.get('final_answer'))
        
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
2. /api/generate_form

Generates a tailored safety guide from the form data entered by the user.

Note: when event-related PDF files (venue maps, timetables, shuttle bus schedules, and so on) are uploaded in the related_documents field, the system automatically runs OCR on them with the Google Cloud Vision API to extract the text. The AI uses this information to pick up concrete details of the event and generate a more accurate safety guide.

  • Python example:
    import requests
    
    form_url = "http://127.0.0.1:8000/api/generate_form" # Change to your actual server address
    form_payload = {
        "place_name": "2025 Han River Summer Music Festival",
        "type": "Large-scale outdoor concert",
        "region": "Yeouido Hangang Park, Seoul",
        "period": "August 8, 2025 ~ August 10, 2025",
        "description": "Korea's top music festival to cool down the hot summer nights! A three-day festival with artists from a variety of genres. A food truck zone and hands-on events are also planned.",
        "category": "Music/Festival",
        "related_documents": "festival_guide.pdf",  # PDF file upload
        "emergency_contact_name": "Control Center Safety Management Team",
        "emergency_contact_phone": "02-123-4567"
    }
    
    try:
        response = requests.post(form_url, json=form_payload)
        response.raise_for_status()
        form_result = response.json()
    
        print("--- Generated safety guide ---")
        print(form_result.get('final_answer'))
        
        # Keep the generated guide in a variable for follow-up questions
        generated_form_content = form_result.get('final_answer')
    
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
3. /api/form_chat

Handles follow-up questions about a safety notice generated by generate_form.

  • Python code example:
    import requests
    # Uses the 'final_answer' value returned by the 'generate_form' API call.
    # This example assumes the 'generated_form_content' variable from the code block above.
    
    # generated_form_content = "..." # In practice, the result of the previous API call
    
    if 'generated_form_content' in locals():
        form_chat_url = "http://127.0.0.1:8000/api/form_chat"  # Change to your actual server address
        form_chat_payload = {
            "generated_form": generated_form_content,
            "query": "What specifically are the symptoms of heat-related illness?",
            "session_id": "user123_session_abc"
        }
    
        try:
            response = requests.post(form_chat_url, json=form_chat_payload)
            response.raise_for_status()
            form_chat_result = response.json()
    
            print("--- Answer to the follow-up question ---")
            print(form_chat_result.get('final_answer'))
    
        except requests.exceptions.RequestException as e:
            print(f"API request failed: {e}")
    else:
        print("Call /api/generate_form first to create 'generated_form_content'.")
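The two calls above can also be chained in a single function. The following is a minimal sketch, assuming only the endpoint paths and field names shown in the examples above. `build_form_chat_payload` and `ask_follow_up` are hypothetical helper names that are not part of the project, and the HTTP function is passed in as `post` so the flow can be exercised without a running server.

```python
from typing import Any, Callable, Optional


def build_form_chat_payload(generated_form: Optional[str], query: str, session_id: str) -> dict:
    """Build the /api/form_chat request body from a previous /api/generate_form result."""
    if not generated_form:
        # Covers both a missing 'final_answer' key (None) and an empty string.
        raise ValueError("Call /api/generate_form first; no generated notice is available.")
    return {
        "generated_form": generated_form,
        "query": query,
        "session_id": session_id,
    }


def ask_follow_up(post: Callable[..., Any], base_url: str, form_payload: dict,
                  query: str, session_id: str) -> Optional[str]:
    """Generate a safety notice, then ask one follow-up question about it."""
    # Step 1: generate the notice from the form data.
    form_response = post(f"{base_url}/api/generate_form", json=form_payload)
    form_response.raise_for_status()
    generated_form = form_response.json().get("final_answer")

    # Step 2: send the follow-up question together with the generated notice.
    chat_payload = build_form_chat_payload(generated_form, query, session_id)
    chat_response = post(f"{base_url}/api/form_chat", json=chat_payload)
    chat_response.raise_for_status()
    return chat_response.json().get("final_answer")
```

With the `requests` library installed, this could be called as `ask_follow_up(requests.post, "http://127.0.0.1:8000", form_payload, "your question", "user123_session_abc")`. Raising `ValueError` replaces the `locals()` check with an explicit failure when no notice has been generated.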

4. /api/custom

Uses the weather-based safety analysis implemented in custom.py to generate a customized safety notice that combines event information with real-time weather data. This API automatically calls the Google Geocoding API and the Open-Meteo weather API to analyze the 24-hour weather forecast for the event location, and predicts weather-related risk factors from that forecast.

Note: When an event-related PDF file (venue map, timetable, shuttle bus schedule, and so on) is uploaded in the related_documents field, the system automatically runs OCR on it with the Google Cloud Vision API and extracts the text. The AI uses this text to understand the specifics of the event and generate a more accurate safety notice.
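To make the geocoding and forecast lookup behind this endpoint more concrete, here is a rough sketch of how such a lookup could be put together. It is not the code in custom.py: the helper names, the chosen hourly variables (`temperature_2m`, `precipitation`), and the summary fields are assumptions made for illustration. Only the public Google Geocoding and Open-Meteo request formats are used.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def geocode_request_url(address: str, api_key: str) -> str:
    """Build a Google Geocoding API request URL for a free-text address."""
    return f"{GEOCODE_URL}?{urlencode({'address': address, 'key': api_key})}"


def extract_lat_lng(geocode_json: dict) -> tuple:
    """Read the coordinates of the first match from a Geocoding API response."""
    location = geocode_json["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def forecast_request_url(latitude: float, longitude: float) -> str:
    """Build an Open-Meteo request for one day of hourly values."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m,precipitation",  # assumed variables of interest
        "forecast_days": 1,  # hourly values for the current calendar day
    }
    return f"{FORECAST_URL}?{urlencode(params)}"


def summarize_hourly(hourly: dict, hours: int = 24) -> dict:
    """Reduce the 'hourly' section of an Open-Meteo response to a few figures."""
    temperatures = hourly["temperature_2m"][:hours]
    precipitation = hourly["precipitation"][:hours]
    return {
        "max_temperature_c": max(temperatures),
        "total_precipitation_mm": round(sum(precipitation), 1),
    }
```

A summary like this would be one possible input to the risk prediction step. How the project actually turns the forecast into risk factors is not described here.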

  • Python code example:

    import requests
    
    custom_url = "http://127.0.0.1:8000/api/custom"  # Change to your actual server address
    custom_payload = {
        "place_name": "2025 Hangang Summer Music Festival",
        "type": "Large-scale outdoor concert",
        "location": "Seoul, Yeouido Hangang Park",
        "period": "August 8, 2025 ~ August 10, 2025",
        "description": "The nation's top music festival to cool down hot summer nights! A three-day celebration with artists from a wide range of genres. A food truck zone and hands-on events are also on offer.",
        "category": "Music/Festival",
        "related_documents": "festival_guide.pdf",  # PDF file to upload
        "emergency_contact_name": "Integrated Control Room Safety Management Team",
        "emergency_contact_phone": "02-123-4567",
        "expected_attendees": "50000"
    }
    
    try:
        response = requests.post(custom_url, json=custom_payload)
        response.raise_for_status()
        custom_result = response.json()
    
        print("--- Weather-based customized safety notice ---")
        print(custom_result.get('generation'))
        
        # The Grounding verification result is also available
        if 'hallu_check' in custom_result:
            print("\n--- Grounding verification result ---")
            print(f"Grounded answer (with citations): {custom_result['hallu_check'].get('answer_with_citations', 'N/A')}")
    
    except requests.exceptions.RequestException as e:
        print(f"API request failed: {e}")
  • Key features:

    • Real-time weather analysis: automatically collects and analyzes the 24-hour weather forecast for the event location
    • Multi-vector-store retrieval: searches safety documents, weather-related accident cases, and event-related accident cases separately
    • Vertex AI Grounding: automatically verifies the reliability of the generated notice
    • Tailored risk prediction: combines event characteristics with current weather to analyze specific risk factors
  • Required environment variable:

    GEOCODING_API="YOUR_GOOGLE_GEOCODING_API_KEY"
  • Response format:

    {
        "generation": "Generated weather-based customized safety notice",
        "hallu_check": {
            "answer_with_citations": "Notice after Grounding verification (with citations)"
        },
        "weather_summary": "Summary of the analyzed weather",
        "festival_query_list": ["List of generated event-related search queries"],
        "weather_query_list": ["List of generated weather-related search queries"],
        "safety_query_list": ["List of generated safety-related search queries"]
    }
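On the client side, the response format above can be mapped to a small typed structure so that missing fields are handled in one place. This is a sketch only: `CustomResult` and `parse_custom_response` are hypothetical names, and the field list is taken from the response format shown above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CustomResult:
    generation: str
    answer_with_citations: Optional[str] = None
    weather_summary: str = ""
    festival_query_list: List[str] = field(default_factory=list)
    weather_query_list: List[str] = field(default_factory=list)
    safety_query_list: List[str] = field(default_factory=list)


def parse_custom_response(data: dict) -> CustomResult:
    """Convert the /api/custom JSON body into a CustomResult, tolerating missing keys."""
    hallu_check = data.get("hallu_check") or {}
    return CustomResult(
        generation=data.get("generation", ""),
        answer_with_citations=hallu_check.get("answer_with_citations"),
        weather_summary=data.get("weather_summary", ""),
        festival_query_list=list(data.get("festival_query_list") or []),
        weather_query_list=list(data.get("weather_query_list") or []),
        safety_query_list=list(data.get("safety_query_list") or []),
    )
```

For example, `parse_custom_response(response.json())` could replace the chained `.get()` calls in the client example, with `answer_with_citations` left as `None` when no Grounding result is returned.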

About

PDF-based safety QA RAG system with Gemini, LangChain/LangGraph, OCR, hybrid retrieval, Vertex AI reranking, FastAPI, and Docker.
