Commit 465972b

improve event extraction rules due to false positives
1 parent 76f795c commit 465972b

2 files changed: 15 additions and 11 deletions

backend/scraping/instagram_feed.py

Lines changed: 3 additions & 2 deletions
@@ -118,8 +118,9 @@ def append_event_to_csv(
 csv_file.parent.mkdir(parents=True, exist_ok=True)
 file_exists = csv_file.exists()

-dtstart = event_data.get("dtstart", "")
-dtend = event_data.get("dtend", "")
+dtstart = dateutil_parser.parse(event_data.get("dtstart")).replace(tzinfo=None)
+dtend = dateutil_parser.parse(event_data.get("dtend")).replace(tzinfo=None) \
+    if event_data.get("dtend") else None
 dtstart_utc = event_data.get("dtstart_utc", "")
 dtend_utc = event_data.get("dtend_utc", "")
 duration = event_data.get("duration", "")
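
The new `dtstart` line passes `event_data.get("dtstart")` straight to the parser, so a missing or empty value would presumably raise instead of falling back to an empty string as before. Only `dtend` is guarded. A minimal sketch of a guard that treats both fields the same way follows. It assumes the values are ISO 8601 strings and uses the standard library's `datetime.fromisoformat` in place of `dateutil_parser.parse` so that it runs without extra dependencies. The helper name `parse_naive` is hypothetical and does not appear in the commit.

```python
from datetime import datetime
from typing import Optional


def parse_naive(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string and drop its timezone.

    Returns None when the value is missing, empty, or not parseable.
    Hypothetical helper: the commit itself calls dateutil's parser inline.
    """
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    # Same effect as .replace(tzinfo=None) in the diff: the wall-clock
    # time is kept and the UTC offset is discarded, not converted.
    return parsed.replace(tzinfo=None)


# Sample input invented for illustration.
event_data = {"dtstart": "2025-10-31T18:00:00-04:00", "dtend": ""}
dtstart = parse_naive(event_data.get("dtstart"))
dtend = parse_naive(event_data.get("dtend"))
```

Note that `replace(tzinfo=None)` keeps the local clock reading, so the naive value is only meaningful alongside the separate `dtstart_utc` and `dtend_utc` fields.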

backend/services/openai_service.py

Lines changed: 12 additions & 9 deletions
@@ -115,18 +115,21 @@ def extract_events_from_caption(

 prompt = f"""
 Analyze the following Instagram caption and extract event information if it's an event post.
-
+
 School context: This post is from {school}. Use this to guide location and timezone decisions.
 Current context: Today is {current_day_of_week}, {current_date}
 Post was created on: {context_day}, {context_date} at {context_time}
-
+
 Caption: {caption_text}
-
+
 STRICT CONTENT POLICY:
-- If the content is NOT announcing or describing a real-world event (e.g., a meme, personal photo dump, generic brand post with no time/place), DO NOT extract an event. Return an object with empty strings/nulls as specified below.
-- If the content is inappropriate (nudity, explicit sexual content, or graphic violence), DO NOT extract an event. Return an object with empty strings/nulls as specified below.
-- DO NOT extract events that only mention a date or day (e.g., "this Sunday", "November 2") WITHOUT a specific start time. ONLY include events if a specific start time (e.g., "at 2pm", "from 10am-4pm") is mentioned in the caption or image.
-
+- ONLY extract an event if the post is clearly announcing or describing a real-world event with a specific date AND a specific start time (e.g., "at 2pm", "from 10am-4pm").
+- DO NOT extract an event if:
+* The post is a meme, personal photo dump, or generic post with no time/place.
+* The post is inappropriate (nudity, explicit sexual content, or graphic violence).
+* There is no explicit mention of BOTH a date (e.g., "October 31", "Friday", "tomorrow") AND a time (e.g., "at 2pm", "from 10am-4pm", "noon", "evening") in the caption or image.
+* The post only introduces people or some topic, UNLESS there is a clear call to attend or participate in an actual event (such as a meeting, workshop, performance, or competition).
+
 Return ONE JSON object (not an array). The object must have ALL of the following fields:
 {{
 "title": string,
@@ -149,7 +152,7 @@ def extract_events_from_caption(
 "source_image_url": string,
 "description": string
 }}
-
+
 IMPORTANT RULES:
 - Return EXACTLY ONE JSON object. NEVER return an array.
 - If multiple dates are listed (e.g., "Friday and Saturday" or explicit multiple dates), keep the primary occurrence in dtstart/dtend and put the additional occurrence dates (dates only) into rdate as an array of ISO dates.
@@ -339,7 +342,7 @@ def generate_recommended_filters(self, events_data: list[dict]) -> list[list[str

 # Prepare event summaries for the prompt
 event_summaries = []
-for event in events_data[:20]:
+for event in events_data[:20]:
     title = event.get("title")
     summary = f"- {title}"
     event_summaries.append(summary)
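
The tightened content policy relies on the model to withhold events that lack a date and start time. A prompt alone cannot guarantee that, so a caller might also check the reply before storing it. The sketch below is one possible check, not the repository's code. It assumes the reply is a JSON string, that a rejected post comes back with empty or null `title`/`dtstart` fields, and that `dtstart` is an ISO 8601 date-time (so a date-only value has no `T`). The function name `parse_event_response` is hypothetical.

```python
import json
from datetime import datetime
from typing import Optional


def parse_event_response(raw: str) -> Optional[dict]:
    """Return the event dict, or None if the reply is not a usable event.

    Hypothetical post-check; field names follow the prompt in the diff.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # The prompt asks for exactly one JSON object, never an array.
    if not isinstance(obj, dict):
        return None
    title = obj.get("title") or ""
    dtstart = obj.get("dtstart") or ""
    if not isinstance(title, str) or not isinstance(dtstart, str):
        return None
    # Assumption: a date-only dtstart (no "T") means no start time was found.
    if not title.strip() or "T" not in dtstart:
        return None
    try:
        datetime.fromisoformat(dtstart)
    except ValueError:
        return None
    return obj


# Sample replies invented for illustration.
accepted = parse_event_response('{"title": "Workshop", "dtstart": "2025-10-31T14:00:00"}')
rejected = parse_event_response('{"title": "", "dtstart": ""}')
```

A check like this keeps the CSV free of date-only or empty rows even when the model ignores the policy, at the cost of silently dropping replies that use a different date format.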
