Correct system prompt statutes and fix evaluation dataset #325
yangm2 wants to merge 1 commit into codeforpdx:main
system_prompt.md:
- add exact statute quotes for ORS 90.394(1), 90.425(3)(6)(b)(8), 90.325(3)(b)(4), 90.245
- add ORS 90.395(2) rental assistance notice requirement
- clarify PCC 30.01.087 Portland security deposit interest rule
- call out citation traps (ORS 90.155 excluded from ORS 90.425)
- reorganize behavioral defaults and grounding rules

dataset:
- import domestic violence scenario (scenario 3)
- fix scenario 1 (abandoned property): correct delivery methods (no 'post' option), state both contact windows (5 days personal / 8 days mailed), clarify 15-day pickup window starts after tenant responds
- add statute quotes and citation-trap notes to legal_correctness evaluator
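The scenario 1 fix above turns on two separate clocks: a contact window that depends on how the notice was delivered, and a pickup window that only starts once the tenant responds. A minimal sketch of that arithmetic follows, using only the day counts stated in this PR. The function names are hypothetical, and the sketch assumes plain calendar-day addition; the statute's actual day-counting rules are not modeled here.

```python
from datetime import date, timedelta

# Day counts are taken from the PR description. How days are counted
# (calendar vs. business, inclusive vs. exclusive) is an assumption.
CONTACT_WINDOW_DAYS = {"personal": 5, "mailed": 8}
PICKUP_WINDOW_DAYS = 15


def contact_deadline(notice_date: date, delivery: str) -> date:
    """Last day for the tenant to contact the landlord, by delivery method."""
    if delivery not in CONTACT_WINDOW_DAYS:
        # The PR notes there is no 'post' option for this notice.
        raise ValueError(f"unsupported delivery method: {delivery!r}")
    return notice_date + timedelta(days=CONTACT_WINDOW_DAYS[delivery])


def pickup_deadline(tenant_response_date: date) -> date:
    """The pickup window starts when the tenant responds, not at notice."""
    return tenant_response_date + timedelta(days=PICKUP_WINDOW_DAYS)
```

Keeping the two functions separate mirrors the correction in the dataset: the 15-day window is keyed to the tenant's response date, so it cannot be computed from the notice date alone.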
yangm2 left a comment:
I have skimmed these changes.
TruMichael-jpg left a comment:
Obviously the main system prompt will be an iterative effort -- feel free to merge and/or accept my suggested changes here as needed.
| **Behavioral defaults:** | ||
| - Give full, detailed answers; limit responses to under {RESPONSE_WORD_LIMIT} words whenever possible. | ||
| - Ask only one question at a time so the user isn't confused. | ||
| - Assume the user is on a month-to-month lease unless they specify otherwise. |
| - Assume the user is on a month-to-month lease unless they specify otherwise. | |
| - Assume the user is on a month-to-month tenancy unless they specify otherwise or unless the answer to their question would change if they are on a week-to-week tenancy or in the middle of a lease agreement, and if the latter, ask the user to confirm. |
Adjusting some of the language here since this is a key factor and should not always be assumed.
Thanks, let me test this change locally to see how this affects the evaluations.
| - When evaluating an eviction notice for nonpayment, always check: (1) whether the required notice period was given, (2) whether the notice was served on a legally permitted day relative to the start of the rental period — this varies by lease type (week-to-week and month-to-month tenancies have different rules under Oregon law), (3) whether proper service methods were used, and (4) whether the landlord included the required rental assistance notice under [ORS 90.395](https://oregon.public.law/statutes/ors_90.395)(2) — failure to deliver it is grounds for court dismissal of the eviction complaint under [ORS 90.395](https://oregon.public.law/statutes/ors_90.395)(3)(a). | ||
| - When the user states a position that their landlord (or another party) disputes, directly confirm or refute it using the retrieved law. | ||
| - City laws override state laws when there is a conflict. If the user is in a specific city, check for relevant city laws. | ||
| - If the user is being evicted for non-payment of rent, is too poor to pay, and you have confirmed the notice and court hearing date are valid, tell them to call Oregon Law Center at {OREGON_LAW_CENTER_PHONE_NUMBER}. |
Could we add the Referrals page that MZ put together here instead of the OLC phone number?
Yeah I think we should eventually use the referrals that @michaelzhang43 sent out. I think we'll need a little more than a system prompt to support the qualifications, but we can put the initial static list in. I'll create an issue to do that in a future PR.
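The four nonpayment-notice checks quoted in the prompt excerpt above could also be tracked as a small checklist, for example inside an evaluator. The sketch below is only an illustration: the class and field names are hypothetical and not part of this PR, and it encodes none of the statutory rules, only whether each check passed, failed, or is still unknown.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NonpaymentNoticeChecks:
    """Hypothetical record of the four checks; None means 'not yet known'."""
    notice_period_given: Optional[bool] = None
    served_on_permitted_day: Optional[bool] = None
    proper_service_method: Optional[bool] = None
    rental_assistance_notice_included: Optional[bool] = None


def summarize(checks):
    """Split the checks into failed ones and ones still needing an answer."""
    failed = [f.name for f in fields(checks) if getattr(checks, f.name) is False]
    unknown = [f.name for f in fields(checks) if getattr(checks, f.name) is None]
    return failed, unknown
```

Separating "failed" from "unknown" fits the prompt's instruction to ask one question at a time: an unknown check is a question for the user, while a failed check is a possible deficiency in the notice.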
| - Give full, detailed answers; limit responses to under {RESPONSE_WORD_LIMIT} words whenever possible. | ||
| - Ask only one question at a time so the user isn't confused. | ||
| - Assume the user is on a month-to-month lease unless they specify otherwise. | ||
| - Focus on finding technicalities that would legally prevent someone getting evicted, such as deficiencies in notice. |
| - Focus on finding technicalities that would legally prevent someone getting evicted, such as deficiencies in notice. | |
| - Focus on finding technicalities that would legally prevent someone getting evicted, such as deficiencies in notice. | |
| - If the user asks about a particular action they need to take or the information provided in your response includes actions or tasks that a tenant in the user's situation needs to take, include as much detail as needed for them to actually take the action (for example, if a notice needs to be sent, include details as to when, where and how such notice must be sent under the statute). If you do not have enough information to give such instructions, ask the user factual questions until you do. | |
This may be too ambitious or not necessary at this point... but I think we ought to move in this direction to make the bot more of a first-aid resource and prevent it from hyper-focusing on legal arguments the tenant would have to make in eviction court at the expense of practical actions they could take to resolve their situation (with an understanding of their rights).
This is a good point. Maybe this suggests a new evaluator? Like a "does a layperson know what to do with this information?" Evaluator/Rubric 🤔
What type of PR is this? (check all applicable)
Description
Corrections to the system prompt and evaluation dataset, reviewed with the project lawyer.
system_prompt.md:
dataset-tenant-legal-qa-examples.jsonl:
evaluators/legal_correctness.md:
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
No code changes. Review system_prompt.md and dataset-tenant-legal-qa-examples.jsonl against ORS 90.425(3), (6)(b), and (8).
Added/updated tests?
Documentation
Architecture.md has been updated
[optional] Are there any post deployment tasks we need to perform?
After merging, re-upload the dataset to LangSmith:
    cd backend
    uv run python -m evaluate.create_langsmith_dataset
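Before re-uploading, it may be worth confirming that the edited JSONL file still parses line by line. The helper below is a sketch under that assumption and is not part of the repository. It checks only that each non-blank line is a JSON object, since the dataset's field names are not shown in this PR.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Return (line_number, message) for each line that is not a JSON object."""
    problems = []
    text = Path(path).read_text(encoding="utf-8")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "expected a JSON object"))
    return problems
```

An empty list from `check_jsonl` on dataset-tenant-legal-qa-examples.jsonl means every line parsed. The file's location within the repository is not stated here, so the path argument is left to the caller.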