@TigranTigranTigran
Collaborator

Summary 📝

Enabled the PDF scraper and fixed several bugs (the scraper now receives the correct input schema, and the HTTP client in the base scraper class now follows redirects).

Bugfixes 🐛

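  1. Enabled the web scraper in _execute_searches. When a scraper is configured, the full content of the results that pass the relevancy assessment is now fetched before they are returned:

        try:
            assessment_output = await self.link_relevancy_assessor.arun(
                assessor_input,
            )
            if self.debug:
                logger.debug(
                    f"Relevancy assessment summary: {assessment_output.assessment_summary}",
                )
            # Only when a scraper is configured and at least one result passed
            # the relevancy filter: replace those results with full-content versions.
            if self.web_scraper and assessment_output.filtered_results:
                assessment_output.filtered_results = (
                    await self._fetch_full_content_for_high_relevancy(
                        assessment_output.filtered_results,
                    )
                )
            return assessment_output.filtered_results
        except Exception as e:
            logger.warning(f"Error in relevancy assessment: {e}")
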
  2. The PDF scraper now uses the correct input schema, PDFScraperInputSchema (previously it was given ScraperToolInputSchema).

  3. The httpx.AsyncClient call now passes follow_redirects=True, so the client follows redirects automatically and no longer fails with 301 errors on URLs that have moved.

Checks

  • Closed #798
  • Tested Changes
  • Stakeholder Approval
