Adjust date extraction rules based on SLM extraction findings #170
aneesafatima wants to merge 5 commits into AOSSIE-Org:main
📝 Walkthrough
`parse_date` now accepts full and abbreviated weekday names, computes the next occurrence for named weekdays, and normalizes "next month" to the first day of the next month (`YYYY-MM-01`); `parse_appointment_command` uses these enhancements with no public signature changes.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@Backend/agent/handlers/appointment.py`:
- Around line 99-102: The branch handling date_str_lower == 'next month' builds
a YYYY-MM-DD string using today.day which can produce invalid dates (e.g.,
2024-02-31); update that branch in appointment.py to compute the next-month date
properly by either adding one calendar month with dateutil.relativedelta (today
+ relativedelta(months=1)) and formatting with strftime, or by computing
next_month/year and clamping the day to the last day of that month using
calendar.monthrange before formatting; replace the current month/year/day string
construction in that branch accordingly.
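As a rough illustration of the clamping option described above, the sketch below uses only the standard library. The helper name `same_day_next_month` is hypothetical and not part of `appointment.py`; it assumes the desired behavior is "same day next month, clamped to the month's last day".

```python
import calendar
from datetime import date


def same_day_next_month(today: date) -> date:
    """Return the date one calendar month after `today`, clamping the day.

    Hypothetical helper for illustration; not taken from the PR.
    """
    # Roll December over into January of the following year.
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    # monthrange returns (weekday of the 1st, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


print(same_day_next_month(date(2024, 1, 31)).strftime('%Y-%m-%d'))  # 2024-02-29
```

Clamping keeps the result close to the user's "today", at the cost of a little extra logic compared with always returning the first of the month.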
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Backend/agent/handlers/appointment.py
🔇 Additional comments (1)
Backend/agent/handlers/appointment.py (1)
103-110: LGTM! The weekday calculation logic correctly determines the next occurrence of the specified day. The `days_ahead <= 0` condition properly handles both "today is the target day" (returns next week) and "target day already passed this week" cases.
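The next-occurrence rule praised here can be sketched in isolation as follows. The function name `next_weekday` and the explicit `today` parameter are assumptions made so the example is testable; the PR's code reads the current date itself.

```python
from datetime import date, timedelta


def next_weekday(today: date, target_day: int) -> date:
    """Return the next date whose weekday() equals target_day (0=Monday..6=Sunday)."""
    days_ahead = target_day - today.weekday()
    # Zero means "today is the target day"; negative means it already passed this week.
    # In both cases, jump to the same weekday in the following week.
    if days_ahead <= 0:
        days_ahead += 7
    return today + timedelta(days=days_ahead)


# 2024-01-01 is a Monday.
print(next_weekday(date(2024, 1, 1), 0))  # 2024-01-08
print(next_weekday(date(2024, 1, 3), 0))  # 2024-01-08
```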
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Backend/agent/handlers/appointment.py (1)
38-42: "next month" not captured by date extraction patterns.
The `parse_date` function now handles `"next month"` (line 101), but the regex patterns here don't include `next\s+month`. Inputs like `"schedule appointment next month"` won't extract the date.
🐛 Proposed fix to add "next month" to patterns
 date_patterns = [
-    r'(?:on\s+|for\s+|at\s+)(today|tomorrow|next\s+week|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thurs|fri|sat|sun)',
+    r'(?:on\s+|for\s+|at\s+)(today|tomorrow|next\s+week|next\s+month|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thurs|fri|sat|sun)',
     r'(?:on\s+|for\s+|at\s+)(\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
-    r'(today|tomorrow|next\s+week|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thu|fri|sat|sun|\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
+    r'(today|tomorrow|next\s+week|next\s+month|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thu|fri|sat|sun|\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
 ]
🤖 Fix all issues with AI agents
In `@Backend/agent/handlers/appointment.py`:
- Around line 91-93: The allowed_days list in appointment.py has a missing comma
between 'sunday' and 'mon' causing Python to concatenate them into 'sundaymon';
update the allowed_days definition (the list assigned to allowed_days) to insert
the missing comma between 'sunday' and 'mon' so the full set of weekday names
and abbreviations are separate entries and re-run tests/validation that match
against allowed_days.
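The missing-comma behavior is easy to reproduce: Python joins adjacent string literals at compile time, even across line breaks inside brackets. The shortened lists below are illustrative only and are not the full `allowed_days` definition.

```python
# Adjacent string literals are concatenated, so the missing comma after
# 'sunday' silently merges two entries into one.
days_buggy = [
    'saturday', 'sunday'
    'mon', 'tue',
]

# With the comma restored, every name is its own entry.
days_fixed = [
    'saturday', 'sunday',
    'mon', 'tue',
]

print(days_buggy)  # ['saturday', 'sundaymon', 'tue']
print(days_fixed)  # ['saturday', 'sunday', 'mon', 'tue']
```

No syntax error or warning is raised for the buggy list, which is why a membership test or a length assertion is a useful guard.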
♻️ Duplicate comments (1)
Backend/agent/handlers/appointment.py (1)
106-113: Abbreviations produce incorrect weekday indices.
Even with the missing comma fixed, `allowed_days.index('mon')` returns `7`, not `0`. Since `today.weekday()` returns `0`-`6`, the arithmetic `target_day - today.weekday()` can overshoot by a full week for abbreviations (for example, `'tue'` has index `8`, so on a Monday it yields 8 days ahead instead of 1).
🐛 Proposed fix using a mapping dictionary
-    allowed_days = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday',
-                    'mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun'
-    ]
+    day_mapping = {
+        'monday': 0, 'mon': 0,
+        'tuesday': 1, 'tue': 1,
+        'wednesday': 2, 'wed': 2,
+        'thursday': 3, 'thu': 3, 'thurs': 3,
+        'friday': 4, 'fri': 4,
+        'saturday': 5, 'sat': 5,
+        'sunday': 6, 'sun': 6,
+    }
 ...
-    elif date_str_lower in allowed_days:
+    elif date_str_lower in day_mapping:
         # Handles next occurrence of the specified day
-        target_day = allowed_days.index(date_str_lower)
+        target_day = day_mapping[date_str_lower]
         days_ahead = target_day - today.weekday()
         if days_ahead <= 0:
             days_ahead += 7
         target_date = today + timedelta(days=days_ahead)
         return target_date.strftime('%Y-%m-%d')
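To make the index problem concrete, the sketch below compares the list-index lookup with a mapping lookup for the same input. `days_until` is a hypothetical helper that mirrors the `days_ahead` arithmetic quoted in the diff, and `DAY_MAPPING` is trimmed to the entries the example needs.

```python
from datetime import date

ALLOWED_DAYS = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday',
                'mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
DAY_MAPPING = {'tuesday': 1, 'tue': 1}  # trimmed for the example


def days_until(target_day: int, today: date) -> int:
    """Mirror the days_ahead arithmetic from the review comment."""
    days_ahead = target_day - today.weekday()
    if days_ahead <= 0:
        days_ahead += 7
    return days_ahead


monday = date(2024, 1, 1)  # a Monday, so weekday() == 0
print(days_until(ALLOWED_DAYS.index('tue'), monday))  # 8 (index 8 overshoots by a week)
print(days_until(DAY_MAPPING['tue'], monday))         # 1
```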
🧰 Additional context used
🧬 Code graph analysis (1)
Backend/agent/handlers/appointment.py (3)
Frontend/src/Screens/BasicDetailsScreen.jsx (1)
today (23-23)
Frontend/src/Screens/CalendarScreen.jsx (2)
days (19-19), year (47-47)
Backend/app.py (1)
index (178-183)
🔇 Additional comments (1)
Backend/agent/handlers/appointment.py (1)
101-105: Good fix for the invalid date issue. Normalizing to `day = 1` correctly avoids the previously flagged bug where dates like `2024-02-31` could be produced. The month/year rollover logic is correct.
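A minimal sketch of the "first day of next month" normalization with year rollover is shown below. The function name `first_of_next_month` is an assumption for illustration, and the PR's actual implementation may be structured differently.

```python
from datetime import date


def first_of_next_month(today: date) -> str:
    """Return the first day of the following month as YYYY-MM-01."""
    # December rolls over to January of the next year.
    if today.month == 12:
        year, month = today.year + 1, 1
    else:
        year, month = today.year, today.month + 1
    return f'{year:04d}-{month:02d}-01'


print(first_of_next_month(date(2024, 1, 31)))   # 2024-02-01
print(first_of_next_month(date(2024, 12, 31)))  # 2025-01-01
```

Because the day is always `01`, the result is valid for every month and no clamping is needed.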
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Backend/agent/handlers/appointment.py (2)
39-39: Inconsistent abbreviation: `thurs` captured but not recognized.
Line 39 uses `thurs` in the regex, but `allowed_days` (lines 91–93) only contains `thu`. When a user says "on thurs", the regex captures it, but `parse_date` won't find it in `allowed_days` and returns `None`.
Either add `'thurs'` to the allowed list, or change the regex to use `thu` consistently. The proposed fix above includes `'thurs': 3` in the mapping.
39-42: `next month` handling is unreachable from `parse_appointment_command`.
The `parse_date` function correctly handles `'next month'` (lines 101–105), but the `date_patterns` regex (lines 39–41) only includes `next\s+week`, not `next\s+month`. User input like "schedule appointment next month" won't have the date extracted.
🐛 Proposed fix to add `next month` to the patterns
 date_patterns = [
-    r'(?:on\s+|for\s+|at\s+)(today|tomorrow|next\s+week|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thurs|fri|sat|sun)',
+    r'(?:on\s+|for\s+|at\s+)(today|tomorrow|next\s+week|next\s+month|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thu|fri|sat|sun)',
     r'(?:on\s+|for\s+|at\s+)(\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
-    r'(today|tomorrow|next\s+week|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thu|fri|sat|sun|\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
+    r'(today|tomorrow|next\s+week|next\s+month|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thu|fri|sat|sun|\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
 ]
Also applies to: 101-105
🤖 Fix all issues with AI agents
In `@Backend/agent/handlers/appointment.py`:
- Around line 91-93: The allowed_days list and code that uses
allowed_days.index() to compute target_day is wrong for abbreviations (e.g.,
'mon' returns index 7), causing incorrect date math with today.weekday();
replace the list + index-based lookup with a mapping dictionary (e.g.,
day_to_index = {'monday':0,'mon':0,'tuesday':1,'tue':1,...}) and use
day_to_index[user_input_lower] to derive target_day; update every place that
currently uses allowed_days and allowed_days.index() (the variable allowed_days
and any logic computing target_day/days_ahead) to use the new mapping and
validate input against its keys.
🔇 Additional comments (1)
Backend/agent/handlers/appointment.py (1)
101-105: LGTM! The "next month" logic correctly avoids invalid dates by normalizing to the first day of the month. This is a reasonable design choice that sidesteps the day-clamping complexity.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Backend/agent/handlers/appointment.py (1)
39-41: "next month" not captured by date patterns.
The `parse_date` function now handles `"next month"` (line 107), but the regex patterns here don't include `next\s+month`. User input like `"schedule appointment for next month"` won't extract `"next month"` as the date string, making the new logic unreachable.
🐛 Proposed fix
 date_patterns = [
-    r'(?:on\s+|for\s+|at\s+)(today|tomorrow|next\s+week|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thurs|fri|sat|sun)',
+    r'(?:on\s+|for\s+|at\s+)(today|tomorrow|next\s+week|next\s+month|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thurs|fri|sat|sun)',
     r'(?:on\s+|for\s+|at\s+)(\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
-    r'(today|tomorrow|next\s+week|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thu|fri|sat|sun|\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
+    r'(today|tomorrow|next\s+week|next\s+month|monday|tuesday|wednesday|thursday|friday|saturday|sunday|mon|tue|wed|thu|fri|sat|sun|\d{1,2}/\d{1,2}|\d{4}-\d{2}-\d{2}|\d{4}/\d{2}/\d{2})',
 ]
🤖 Fix all issues with AI agents
In `@Backend/agent/handlers/appointment.py`:
- Around line 91-99: The allowed_days mapping used by parse_date is missing the
"thurs" abbreviation referenced by the regex, causing successful regex matches
to later return None; update the allowed_days dict in appointment.py (the
allowed_days variable) to include "thurs": 3 (and any other common Thursday
variants you want to support, e.g., "thur") so that keys produced by the regex
map to the correct weekday index.
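One way to prevent this kind of drift between the regex and the lookup table is to build the alternation from the mapping's keys. The sketch below is an assumption about how that could look, not the PR's code; `DAY_MAPPING` and `DAY_PATTERN` are illustrative names.

```python
import re

DAY_MAPPING = {
    'monday': 0, 'mon': 0,
    'tuesday': 1, 'tue': 1,
    'wednesday': 2, 'wed': 2,
    'thursday': 3, 'thurs': 3, 'thu': 3,
    'friday': 4, 'fri': 4,
    'saturday': 5, 'sat': 5,
    'sunday': 6, 'sun': 6,
}

# Longest names first so 'thursday' is tried before 'thurs' and 'thu'.
day_alternation = '|'.join(sorted(DAY_MAPPING, key=len, reverse=True))
DAY_PATTERN = re.compile(rf'(?:on|for|at)\s+({day_alternation})\b', re.IGNORECASE)

match = DAY_PATTERN.search('book a visit on thurs')
captured = match.group(1).lower()
print(captured, DAY_MAPPING[captured])  # thurs 3
```

Since every alternative in the pattern comes from the dictionary, anything the regex captures is guaranteed to have a weekday index.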
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@Backend/agent/handlers/appointment.py`:
- Around line 39-42: The date extraction regex list used before parse_date
doesn't include "next month", so calls to parse_date get date_str=None for
inputs like "appointment next month"; update the three regex patterns (the list
assigned near the top of appointment handler) to include the token
"next\s+month" alongside "next\s+week" and the weekday/month/date alternatives
so that the extractor captures "next month" and passes it to parse_date; locate
the regex list declaration (used by the appointment handler/date extraction) and
add "next\s+month" in the same positions as "next\s+week" and in the combined
alternatives.
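The extraction step described in this prompt can be sketched with a reduced pattern list. `KEYWORDS`, `DATE_PATTERNS`, and `extract_date_str` are hypothetical names, and the weekday and numeric alternatives are left out to keep the example short.

```python
import re
from typing import Optional

# Relative keywords only; 'next\s+month' sits alongside 'next\s+week'.
KEYWORDS = r'today|tomorrow|next\s+week|next\s+month'
DATE_PATTERNS = [
    rf'(?:on\s+|for\s+|at\s+)({KEYWORDS})',  # with a leading preposition
    rf'({KEYWORDS})',                        # bare keyword anywhere in the text
]


def extract_date_str(text: str) -> Optional[str]:
    """Return the first matching date phrase, lowercased, or None."""
    for pattern in DATE_PATTERNS:
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            # Collapse repeated whitespace so the phrase matches the parser's key.
            return ' '.join(match.group(1).lower().split())
    return None


print(extract_date_str('schedule appointment next month'))  # next month
```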
Closes #161
📝 Description
This PR focuses on experimenting with fine-tuning the existing 0.5B Small Language Model (SLM) that is currently being used in the project.
The main goal of this work was to understand whether lightweight fine-tuning can improve the model's behavior for our specific use case, while still keeping the model small enough to be loaded on the user's device (offline support).
Since the dataset required for this task is domain-specific and not available online, I created a custom sample dataset and explored different fine-tuning approaches. After researching multiple methods, I used LoRA (Low-Rank Adaptation) as it is the most suitable and resource-efficient option for fine-tuning small models.
Most of the training work was done outside the main codebase (Google Colab), and the trained adapters are hosted separately on Hugging Face. Only minimal changes were made in the main repository, as this PR is primarily research- and experimentation-oriented, as discussed earlier.
🔧 Changes Made
Created a custom sample dataset:
instruction, input, output
Implemented LoRA-based fine-tuning for the 0.5B causal language model:
Trained the model using Google Colab and pushed the trained artifacts to Hugging Face Hub:
Evaluated the model before and after fine-tuning:
Added small improvements in the main codebase related to date handling:
📷 Screenshots or Visual Changes (if applicable)
Sample outputs from testing the new date extraction logic
Training Metrics
Epochs 1–3: Healthy learning
Epochs 4–5: Model starts to slightly overfit (I believe this is due to the limited amount of data)
Model Behavior Comparison
Before Fine-tuning:
After Fine-tuning:
The findings above show that the fine-tuned model handles None values better than before, but it still does not consistently place X placeholders in the correct positions in date outputs.
I also noticed that when longer prompts are provided, the model tends to lose coherence in its responses. This suggests a limitation of the current small causal model, especially when trained on a limited dataset.
Based on these observations, using a significantly larger dataset may help improve performance to some extent. However, due to the generative nature of the model, a hybrid approach may be worth exploring in the future if stricter or more deterministic outputs are required.
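As a very rough sketch of what such a hybrid approach could look like, the example below tries deterministic rules first and only falls back to the model when they find nothing. Everything here is hypothetical: `rule_based_date`, `extract_date`, and the `model_fallback` callable are illustrative names, and the rule set is deliberately tiny.

```python
from datetime import date, timedelta
from typing import Callable, Optional, Tuple


def rule_based_date(text: str, today: date) -> Optional[str]:
    """Deterministic rules for a few unambiguous phrases (illustrative only)."""
    lowered = text.lower()
    if 'tomorrow' in lowered:
        return (today + timedelta(days=1)).strftime('%Y-%m-%d')
    if 'today' in lowered:
        return today.strftime('%Y-%m-%d')
    return None


def extract_date(
    text: str,
    today: date,
    model_fallback: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], str]:
    """Return (date string, source), preferring rules over the model."""
    result = rule_based_date(text, today)
    if result is not None:
        return result, 'rules'
    # Only ambiguous inputs reach the generative model.
    return model_fallback(text), 'model'


# A stub stands in for the fine-tuned SLM in this sketch.
print(extract_date('see the doctor tomorrow', date(2024, 1, 1), lambda t: None))
```

Keeping the rules in front means the common phrases get the same answer every time, while the model handles only what the rules cannot.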
🤝 Collaboration
N/A
🧠 Notes / Remarks
Model Type: The model used here is a causal language model, meaning it performs text generation by predicting the next token based on previous tokens, rather than producing a fixed "correct" output.
Current Limitations:
Date Extraction Convention:
XXXXXX-11-04 → year missing
2024-XX-XX → month and day missing
2024-11-XX → day missing
Research Focus: This PR mainly lays the foundation and learnings for model fine-tuning rather than claiming a final optimized solution. It serves as:
🚀 Next Steps (Optional)
If the current results are considered acceptable or useful, I can:
Please let me know how you'd like to proceed based on these findings.
✅ Checklist