
Add reproduction logs for Lessons 1–3 (Foundations of Retrieval) #3123

Open
dishaprashar64-code wants to merge 6 commits into castorini:master from dishaprashar64-code:master


@dishaprashar64-code

This PR adds reproduction log entries for the Foundations of Retrieval onboarding (Lessons 1–3).

Environment:

  • OS: Windows 11
  • Shells used: Windows PowerShell, Git Bash
  • Editor: VS Code
  • Java: Eclipse Temurin (JDK 11)
  • Maven: Apache Maven 3.9.x

Summary of work:

  • Successfully completed Lessons 1–3 following the official Anserini onboarding guides.
  • Downloaded and processed the MS MARCO passage collection (~1GB, 8.8M passages).
  • Converted the collection to JSONL format (9 files, 8,841,823 documents).
  • Generated and verified the dev query subset (6,980 queries).
  • Verified correctness by checking that query 1048585 maps to document 7187158 (Paula Deen / Uncle Bubba’s).
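The document count and the spot check above can be sketched roughly as follows. This is a minimal illustration, not the script used for the reproduction: it assumes each converted file holds one JSON object per line with `id` and `contents` fields, and the helper name `count_and_find` is hypothetical.

```python
import json
from pathlib import Path


def count_and_find(jsonl_dir, target_id):
    """Count documents across the JSONL files in a directory and look up one id.

    Assumption: every non-empty line is a JSON object with "id" and
    "contents" keys. Returns (total_documents, contents_or_None).
    """
    total = 0
    found = None
    for path in sorted(Path(jsonl_dir).iterdir()):
        # Only read files that look like JSON/JSONL output.
        if path.suffix not in {".json", ".jsonl"}:
            continue
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue
                doc = json.loads(line)
                total += 1
                if str(doc["id"]) == str(target_id):
                    found = doc["contents"]
    return total, found
```

Calling `count_and_find(<jsonl directory>, "7187158")` on the converted collection should report the 8,841,823 documents listed above if the conversion matches, along with the text of the passage used for the spot check.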

Issues encountered and resolution:

  • Initial setup on Windows was blocked because the Java and Maven toolchain was missing.
  • The java and mvn commands were not recognized because JAVA_HOME, MAVEN_HOME, and PATH were not configured.
  • Resolved by installing Eclipse Temurin JDK 11 and Apache Maven, then setting JAVA_HOME, MAVEN_HOME, and PATH.
  • After fixing the toolchain, indexing and BM25 search steps proceeded as expected.
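A quick way to confirm the toolchain fix before rerunning the lessons is a small check like the one below. It is only a sketch: the function name `check_toolchain` is hypothetical, and it checks `JAVA_HOME` and `PATH` but not `MAVEN_HOME`, on the assumption that finding `mvn` on `PATH` is enough for the build.

```python
import os
import shutil


def check_toolchain(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the setup looks usable.

    `env` and `which` are injectable so the check can be exercised without
    touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    # shutil.which honors PATHEXT on Windows, so mvn.cmd is found as "mvn".
    for tool in ("java", "mvn"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing directory: {java_home}")
    return problems
```

Running `check_toolchain()` in a fresh PowerShell or Git Bash session (via Python) shows whether the variables were picked up by that shell; an empty list means both commands resolve and `JAVA_HOME` points at an existing directory.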

Lesson-specific notes:

  • Lesson 2 (Indexing): Completed after resolving Java/Maven environment issues.
  • Lesson 3 (BM25 Search): Used prebuilt BM25 results when the local index download was unavailable; verified the results format and that the run is ready for evaluation.
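The format check on the BM25 results could look something like this sketch. It assumes the run file uses the three-column, tab-separated MS MARCO layout (query id, passage id, rank); a TREC-format run would have six whitespace-separated fields instead. The function name `summarize_msmarco_run` is hypothetical.

```python
def summarize_msmarco_run(lines):
    """Validate 'qid<TAB>docid<TAB>rank' lines and return (num_queries, num_lines).

    Raises ValueError if a line does not have three fields or if the ranks
    for a query are not 1, 2, 3, ... in file order.
    """
    ranks_by_query = {}
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {len(fields)}")
        qid, _docid, rank = fields
        ranks_by_query.setdefault(qid, []).append(int(rank))
    for qid, ranks in ranks_by_query.items():
        if ranks != list(range(1, len(ranks) + 1)):
            raise ValueError(f"query {qid}: ranks are not 1..{len(ranks)} in order")
    total_lines = sum(len(ranks) for ranks in ranks_by_query.values())
    return len(ranks_by_query), total_lines
```

Passing an open file handle for the run file works, since the function only iterates over lines. For the dev subset described above, the first returned number would be expected to be at most 6,980, one entry per query that has results.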

All required results were successfully reproduced, and the corresponding entries have been added to the reproduction logs.

@lintool
Member

lintool commented Feb 7, 2026

Please follow instructions. Also, fix conflicts.

@lintool
Member

lintool commented Feb 11, 2026

This PR still doesn't conform to instructions...
