
feat(post): add search_posts content-search tool#532

Open
AkaNebur wants to merge 3 commits into stickerdaniel:main from AkaNebur:feature/search-posts


@AkaNebur

Closes #531

Adds a new MCP tool search_posts(keywords, date_posted=None, max_pages=3) that drives
LinkedIn's global "Posts" content-search tab and returns the matching posts. This is the
surface for catching informal hiring posts — "we're hiring", "Buscamos ...", "estamos
contratando", "join our team" — that often appear before a formal job listing exists, so
it is distinct from get_feed (the authenticated user's home feed) and get_company_posts
(a single company page).

The tool follows the existing search-tool conventions. A new LinkedInExtractor.search_posts
method owns all logic; a thin wrapper in the new linkedin_mcp_server/tools/post.py
(register_post_tools, wired into server.py after register_feed_tools) only does
get_ready_extractor + ctx.report_progress + the standard error double-catch. A pure
@staticmethod _build_content_search_url composes
/search/results/content/?keywords=...&origin=FACETED_SEARCH and appends the datePosted
facet as a URL-encoded one-element JSON list (["past-week"]) via the existing
_encode_list_facet helper, mirroring how search_people encodes its network/currentCompany
facets, but with content search's literal datePosted tokens instead of
job search's f_TPR=r<seconds> codes. Underscore aliases (past_week) normalise onto
LinkedIn's exact tokens via the new _CONTENT_DATE_POSTED_MAP.
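A minimal Python sketch of the URL composition described above. encode_list_facet and normalise_date_posted are hypothetical stand-ins for the project's _encode_list_facet helper and the validation inside search_posts, whose real bodies are not shown in this PR; the https://www.linkedin.com host is an assumption, since the description only gives the path. The builder skips the facet for empty or whitespace-only date_posted and uses the stripped value as the alias-map fallback, as the later review fix describes.

```python
from __future__ import annotations

import json
from urllib.parse import quote, urlencode

# Assumption: LinkedIn's public host; the PR only gives the path.
_CONTENT_SEARCH_BASE = "https://www.linkedin.com/search/results/content/"

# Canonical datePosted tokens plus underscore aliases, per the PR description.
_CONTENT_DATE_POSTED_MAP = {
    "past-24h": "past-24h",
    "past_24h": "past-24h",
    "past-week": "past-week",
    "past_week": "past-week",
    "past-month": "past-month",
    "past_month": "past-month",
}


class FilterValidationError(ValueError):
    """Stand-in for the project's FilterValidationError (a ValueError subclass)."""


def encode_list_facet(values: list[str]) -> str:
    """Hypothetical equivalent of _encode_list_facet: a URL-encoded JSON list."""
    return quote(json.dumps(values), safe="")


def normalise_date_posted(date_posted: str | None) -> str | None:
    """Map an alias onto LinkedIn's token, or reject an unknown value."""
    token = (date_posted or "").strip()
    if not token:
        return None  # None, "" and whitespace-only all mean "no facet"
    if token not in _CONTENT_DATE_POSTED_MAP:
        raise FilterValidationError(
            f"Invalid date_posted {date_posted!r}; "
            "expected past-24h, past-week or past-month"
        )
    return _CONTENT_DATE_POSTED_MAP[token]


def build_content_search_url(keywords: str, date_posted: str | None = None) -> str:
    """Compose the content-search URL, appending datePosted only when set."""
    query = urlencode({"keywords": keywords, "origin": "FACETED_SEARCH"})
    url = f"{_CONTENT_SEARCH_BASE}?{query}"
    token = (date_posted or "").strip()
    if token:  # whitespace-only values add no facet
        canonical = _CONTENT_DATE_POSTED_MAP.get(token, token)
        url += "&datePosted=" + encode_list_facet([canonical])
    return url
```

For example, build_content_search_url("we're hiring", "past_week") ends with &datePosted=%5B%22past-week%22%5D, the percent-encoded form of ["past-week"].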

Content search is an infinite scroll with no &start= pagination, so max_pages is
expressed as scroll depth (max_scrolls = max_pages * _CONTENT_SCROLLS_PER_PAGE, ~5
scrolls/page). The result is the canonical {url, sections, references?, section_errors?}
shape: raw innerText under sections["search_results"] for the LLM to parse, with
references["search_results"] surfacing feed_post permalinks plus post authors/companies.
Following the AGENTS.md scraping philosophy and the deliberate choice made for get_feed,
no attempt is made to build structured per-post objects: there is no stable, locale-independent
selector for that, so permalinks are surfaced via references and the rest stays
raw text. A dedicated module (rather than folding into feed.py) keeps global content search
separate from home-feed scraping, mirroring how feed.py and messaging.py are their own
modules. Invalid date_posted raises FilterValidationError (a ValueError subclass),
re-raised in the tool layer as a ToolError so the actionable message survives
mask_error_details — identical to search_people. Rate-limit responses are surfaced as a
typed section_errors["search_results"] entry rather than an exception, mirroring get_feed.

Docs and tests land per the CONTRIBUTING.md "Adding a New Tool" checklist: README tool-table
row (working), manifest.json entry, docs/docker-hub.md Features bullet, and a
tools/__init__.py category bullet. 13 new tests cover both layers — URL building, alias
normalisation, scroll-depth mapping, FilterValidationError/ToolError surfacing, empty and
rate-limited results, and the Field(ge=1, le=10) boundary rejection — and search_posts is
added to both timeout sweeps.

Verified locally on a clean rebase onto main: uv run ruff check ., uv run ruff format --check . and uv run ty check all pass, and uv run pytest is green (the 13 new tests pass and
the existing suite is unaffected).

Synthetic prompt

Add a read-only MCP tool search_posts to linkedin-mcp-server that runs LinkedIn's global
"Posts" content search. Put the logic on LinkedInExtractor in scraping/extractor.py as
search_posts(self, keywords, date_posted=None, max_pages=3) with a @staticmethod _build_content_search_url that builds /search/results/content/?keywords=...&origin=FACETED_SEARCH
and appends a URL-encoded JSON-list datePosted facet via _encode_list_facet, validating
date_posted against a _CONTENT_DATE_POSTED_MAP (past-24h/past-week/past-month + underscore
aliases) and raising FilterValidationError otherwise. Map max_pages to max_scrolls (~5
scrolls/page) and call extract_page(url, section_name="search_results", max_scrolls=...). Return
{url, sections, references?, section_errors?}, surfacing rate-limit as a typed section_errors
entry. Add a thin tools/post.py:register_post_tools wrapper (@mcp.tool with
readOnlyHint/openWorldHint, tags={"post","search"}, exclude_args=["extractor"], the
standard get_ready_extractor/report_progress/AuthenticationError-relogin/raise_tool_error
body, re-raising FilterValidationError as ToolError), register it in server.py after the feed
tools, and update README, manifest.json, docs/docker-hub.md, the tools/__init__.py docstring, and
both timeout-sweep tests. Two-layer tests in test_scraping.py and test_tools.py.
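A stdlib-only sketch of the thin tool-layer body, showing why FilterValidationError is re-raised as ToolError. ToolError, FakeCtx and FakeExtractor here are hypothetical stand-ins, not the framework's or the project's classes. The real wrapper is registered with @mcp.tool, obtains the extractor via get_ready_extractor and also handles AuthenticationError/raise_tool_error, all of which this sketch omits; the max_pages bound is enforced by Field(ge=1, le=10) in the tool schema, so it is not rechecked here.

```python
from __future__ import annotations

import asyncio
from typing import Any


class FilterValidationError(ValueError):
    """Stand-in for the project's FilterValidationError (a ValueError subclass)."""


class ToolError(Exception):
    """Stand-in for the MCP framework's ToolError, whose message reaches the client."""


class FakeCtx:
    """Hypothetical context that records progress reports instead of sending them."""

    def __init__(self) -> None:
        self.progress: list[tuple[int, int]] = []

    async def report_progress(self, progress: int, total: int) -> None:
        self.progress.append((progress, total))


class FakeExtractor:
    """Hypothetical extractor: rejects one bad token, otherwise echoes a result."""

    async def search_posts(
        self, keywords: str, date_posted: str | None = None, max_pages: int = 3
    ) -> dict[str, Any]:
        if date_posted == "yesterday":
            raise FilterValidationError("Invalid date_posted 'yesterday'")
        return {"url": "https://example.invalid/search", "sections": {"search_results": keywords}}


async def search_posts_tool(
    extractor: Any,
    ctx: Any,
    keywords: str,
    date_posted: str | None = None,
    max_pages: int = 3,
) -> dict[str, Any]:
    """Thin wrapper: report progress, delegate, translate validation errors."""
    await ctx.report_progress(0, 100)
    try:
        result = await extractor.search_posts(
            keywords, date_posted=date_posted, max_pages=max_pages
        )
    except FilterValidationError as exc:
        # A ToolError keeps the actionable message visible even with
        # error-detail masking enabled; a bare ValueError would be masked.
        raise ToolError(str(exc)) from exc
    await ctx.report_progress(100, 100)
    return result
```

Keeping the wrapper this thin means all URL and result logic stays testable on the extractor alone, which is what the two-layer test split relies on.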

Generated with Claude Opus 4.8 (1M context)

AkaNebur added 2 commits June 21, 2026 11:49
Adds a new MCP tool search_posts(keywords, date_posted=None, max_pages=3)
that drives LinkedIn's global "Posts" content-search tab. It surfaces
informal hiring posts ("we're hiring", "Buscamos ...", "join our team")
that often appear before a formal job listing exists -- distinct from
get_feed (the authenticated user's home feed) and get_company_posts (a
single company page), so it gets its own tools/post.py module mirroring
feed.py rather than folding into either.

Follows the existing search-tool conventions:

- New LinkedInExtractor.search_posts method plus a pure
  _build_content_search_url static helper that composes
  /search/results/content/?keywords=...&origin=FACETED_SEARCH and appends
  the datePosted facet as a URL-encoded one-element JSON list via the
  existing _encode_list_facet helper, mirroring how search_people encodes
  its network/currentCompany facets (content search uses literal
  datePosted tokens rather than job search's f_TPR=r<seconds> codes).
  Underscore aliases normalise onto LinkedIn's tokens via
  _CONTENT_DATE_POSTED_MAP.
- Content search is an infinite scroll with no &start= pagination, so
  max_pages maps to scroll depth (~5 scrolls/page via
  _CONTENT_SCROLLS_PER_PAGE).
- Returns the canonical {url, sections, references?, section_errors?}
  shape: raw innerText under search_results plus feed_post permalink
  references. No structured per-post objects (no stable, locale-
  independent selector), matching the deliberate get_feed decision and the
  AGENTS.md scraping philosophy.
- Invalid date_posted raises FilterValidationError (a ValueError
  subclass), re-raised in the tool layer as ToolError so the actionable
  message survives mask_error_details. Rate-limit responses surface as a
  typed section_errors entry, mirroring get_feed.
- Thin tools/post.py:register_post_tools wired into server.py after
  register_feed_tools; tools/__init__.py docstring updated.
- Two-layer tests (tests/test_scraping.py + tests/test_tools.py): URL
  building, alias normalisation, scroll-depth mapping,
  FilterValidationError -> ToolError surfacing, empty and rate-limited
  results, the Field(ge=1, le=10) boundary, and search_posts added to both
  timeout sweeps.

Adds the search_posts row to the README tool table (status: working), a
Features bullet to docs/docker-hub.md, and the tool entry to the
manifest.json tools array, per the CONTRIBUTING.md "Adding a New Tool"
checklist.
AkaNebur marked this pull request as ready for review June 21, 2026 10:32
@greptile-apps (Bot, Contributor) commented Jun 21, 2026

Greptile Summary

This PR adds a new search_posts MCP tool that drives LinkedIn's global "Posts" content-search tab, enabling discovery of informal hiring posts before formal listings appear. The implementation cleanly follows all existing tool conventions across both the scraping layer (LinkedInExtractor.search_posts + _build_content_search_url static method) and the tool layer (tools/post.py).

  • The scraping layer validates date_posted against _CONTENT_DATE_POSTED_MAP (raising FilterValidationError for unknowns), maps max_pages to scroll depth via _CONTENT_SCROLLS_PER_PAGE = 5, and returns the canonical {url, sections, references?, section_errors?} shape — correctly handling the rate-limit branch that the older search_people/search_companies methods lack.
  • The tool layer re-raises FilterValidationError as ToolError (so it survives mask_error_details), uses the standard get_ready_extractor/handle_auth_error/raise_tool_error chain, and validates max_pages with Field(ge=1, le=10); 13 new tests cover URL construction, alias normalisation, scroll-depth mapping, all three result branches (text/rate-limit/error), and boundary rejection.

Confidence Score: 5/5

Safe to merge — the change is additive, touches no existing logic, and the previous thread concerns (whitespace date_posted handling and missing extracted.error branch coverage) are both fully addressed in the current code.

All three result branches (text/rate-limit/navigation-error) are covered by tests, the date_posted whitespace guard is correctly applied in both the validator and the URL builder, FilterValidationError is properly surfaced as a ToolError so actionable messages reach the client, and the new tool is wired into both timeout sweeps. No existing paths are modified.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| linkedin_mcp_server/scraping/extractor.py | Adds _CONTENT_DATE_POSTED_MAP constant, _build_content_search_url static method, and search_posts async method; follows existing patterns cleanly with correct whitespace guard on date_posted and all three result branches handled. |
| linkedin_mcp_server/tools/post.py | New tool module with register_post_tools; correctly re-raises FilterValidationError as ToolError, uses standard auth/error handling chain, and excludes the extractor param from the MCP schema. |
| tests/test_scraping.py | Adds TestBuildContentSearchUrl (5 cases) and TestSearchPosts (7 cases) covering URL construction, alias normalisation, scroll depth, empty results, rate-limit, and navigation-error branches. |
| tests/test_tools.py | Adds TestPostTools with success, FilterValidationError surfacing, and max_pages=0 boundary tests; search_posts added to both timeout sweeps. |
| linkedin_mcp_server/server.py | Imports and registers register_post_tools after register_feed_tools; minimal, correct change. |
| manifest.json | Adds search_posts entry before close_session; description accurate. |
| README.md | Adds search_posts row to the tools table with correct status and description. |
| docs/docker-hub.md | Adds Post Search feature bullet under the Features section; accurate. |
| linkedin_mcp_server/tools/__init__.py | Adds Post tools bullet to the module docstring; no logic change. |

Sequence Diagram

```mermaid
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Client as MCP Client
    participant Tool as tools/post.py search_posts
    participant Extractor as LinkedInExtractor search_posts
    participant URLBuilder as _build_content_search_url
    participant Page as extract_page

    Client->>Tool: search_posts(keywords, date_posted, max_pages)
    Tool->>Tool: get_ready_extractor()
    Tool->>Tool: ctx.report_progress(0%)
    Tool->>Extractor: search_posts(keywords, date_posted, max_pages)

    alt invalid date_posted
        Extractor-->>Tool: raise FilterValidationError
        Tool-->>Client: raise ToolError (actionable message)
    end

    Extractor->>URLBuilder: _build_content_search_url(keywords, date_posted)
    URLBuilder-->>Extractor: "/search/results/content/?keywords=...&datePosted=[...]"

    Extractor->>Page: "extract_page(url, section_name=search_results, max_scrolls=max_pages*5)"
    Page-->>Extractor: ExtractedSection(text, references, error)

    alt text present and not rate-limited
        Extractor-->>Tool: "{url, sections, references?}"
    else "text == _RATE_LIMITED_MSG"
        Extractor-->>Tool: "{url, sections:{}, section_errors:{rate_limit}}"
    else extracted.error
        Extractor-->>Tool: "{url, sections:{}, section_errors:{navigation_error}}"
    end

    Tool->>Tool: ctx.report_progress(100%)
    Tool-->>Client: result dict
```

Reviews (2): Last reviewed commit: "fix(post): omit whitespace-only date_pos..."

Review thread on linkedin_mcp_server/scraping/extractor.py (outdated)
Review thread on tests/test_scraping.py
Addresses review feedback on stickerdaniel#532:

- _build_content_search_url now guards on date_posted.strip(), so a
  whitespace-only value (e.g. "   ") is omitted from the URL instead of
  being appended as an invalid datePosted facet. The stripped value is
  also used as the alias-map fallback so passthrough tokens are
  normalised. This keeps the builder in sync with the search_posts
  validation, which already short-circuits on a falsy strip().
- Add a regression test for the whitespace case, plus a test for the
  previously-uncovered `elif extracted.error:` branch (a navigation
  error surfaces a typed section_errors entry, mirroring search_people).
@AkaNebur (Author)

Thanks for the review! Both findings are addressed in db7d14b:

  • Whitespace-only date_posted: _build_content_search_url now guards on date_posted.strip(), so a whitespace-only value (e.g. " ") is omitted from the URL instead of being appended as an invalid datePosted facet. The stripped value is also used as the alias-map fallback, keeping the builder in sync with the search_posts validation (which already short-circuits on a falsy .strip()). Added test_whitespace_date_posted_omits_facet as a regression guard.
  • Uncovered elif extracted.error: branch — added test_navigation_error_surfaces_section_error, so a navigation error now surfaces a typed section_errors entry under test, mirroring the analogous coverage in search_people / search_companies.

Full gate green locally and in CI: ruff check ., ruff format --check ., ty check, pytest.


Development

Successfully merging this pull request may close these issues.

[FEATURE] Add search_posts tool for global LinkedIn post/content search

1 participant