feat: add get_post tool with image download support #489

Open: Czhang0727 wants to merge 3 commits into stickerdaniel:main from Czhang0727:feat/get-post-with-images

@Czhang0727

Summary

  • Adds a new get_post tool that navigates to a specific LinkedIn post URL and extracts its full content
  • Downloads any attached images (diagrams, charts, photos) to a temp directory and returns local file paths
  • Filters out small images (< 100px) to skip icons and avatars

Problem

get_person_profile(sections="posts") returns post images as "Activate to view larger image" placeholders. There's no way to access visual content (diagrams, screenshots, charts) from posts without resorting to browser automation outside the MCP.

Solution

```python
result = get_post("https://www.linkedin.com/feed/update/urn:li:activity:123/")

result["sections"]["post"]  # full post text
result["images"]  # [{"path": "/tmp/linkedin_post_imgs_xxx/image_1.png", "index": 1}, ...]
```

With `download_images=True` (the default), images are captured with Playwright's element screenshot API in the same browser session, so no extra authentication is needed.
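The capture step could look roughly like the sketch below. It is not the PR's code: `capture_images`, `MIN_SIDE_PX`, and `MAX_IMAGES` are hypothetical names, the 100px threshold is assumed to apply to both width and height, and the cap value is illustrative. The `candidates` only need to behave like Playwright sync `Locator` objects, exposing `bounding_box()` and `screenshot()`.

```python
import tempfile
from pathlib import Path

MIN_SIDE_PX = 100  # assumption: an image is "small" if either side is below this
MAX_IMAGES = 10    # assumption: illustrative cap on captured images


def capture_images(candidates, min_side=MIN_SIDE_PX, max_images=MAX_IMAGES):
    """Screenshot large-enough elements and return [{"path", "index"}, ...].

    `candidates` is any iterable of objects exposing Playwright's sync
    Locator methods `bounding_box()` and `screenshot()`.
    """
    out_dir = None  # created lazily, only once an image is actually captured
    images = []
    for element in candidates:
        if len(images) >= max_images:
            break
        box = element.bounding_box()  # None when the element is not visible
        if box is None or box["width"] < min_side or box["height"] < min_side:
            continue  # skip icons, avatars, and hidden elements
        try:
            png_bytes = element.screenshot()  # PNG bytes by default
        except Exception:
            continue  # one failed capture should not abort the rest
        if out_dir is None:
            out_dir = Path(tempfile.mkdtemp(prefix="linkedin_post_imgs_"))
        index = len(images) + 1  # 1-based, mirroring the example paths above
        path = out_dir / f"image_{index}.png"
        path.write_bytes(png_bytes)
        images.append({"path": str(path), "index": index})
    return images
```

Taking the screenshot as bytes before creating the directory means a post with no usable images leaves no empty temp directory behind.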

Test plan

  • Tool registers correctly alongside existing 17 tools (verified with asyncio.run)
  • Live integration test: 14,784 chars extracted, 5 images saved from a real LinkedIn post
  • Small images (icons/avatars < 100px) correctly filtered out
  • `download_images=False` skips image capture for text-only use cases

🤖 Generated with Claude Code

Adds a new `get_post` tool that:
- Navigates to a specific LinkedIn post URL
- Extracts full post text (bypassing "...more" truncation)
- Downloads attached images (diagrams, charts, photos) to a temp dir
- Returns local file paths alongside the text content
- Skips small images (< 100px) to filter out icons and avatars

This fills the gap where `get_person_profile(sections="posts")` returns
images as "Activate to view larger image" placeholders. Use `get_post`
when a post has visual content that matters for your workflow.

Example usage:
  get_post("https://www.linkedin.com/feed/update/urn:li:activity:123/")
  # Returns sections["post"] with full text + images[].path for each image

Tested against live LinkedIn session: 14k chars extracted, 5 images saved.

greptile-apps Bot commented Jun 7, 2026

Contributor

Greptile Summary

This PR adds a new tool for scraping individual LinkedIn posts. The main changes are:

  • Registers a get_post tool alongside the existing MCP tools.
  • Validates LinkedIn post URLs before navigation.
  • Extracts post text and tries to expand inline “see more” content.
  • Downloads attached post images to a temporary directory with size filtering and a capture cap.
  • Returns post sections, image paths, references, and extraction errors.

Confidence Score: 3/5

This should be fixed before merging.

  • Truncated LinkedIn posts can still return shortened text because the expansion selector can click the container instead of the button.
  • Image capture can still include media from another update card on the same permalink page.
  • Both issues affect the core output promised by the new tool.

linkedin_mcp_server/tools/post.py

Important Files Changed

| Filename | Overview |
|---|---|
| linkedin_mcp_server/tools/post.py | Adds the get_post implementation, including URL validation, post expansion, reference preservation, and post image capture. |

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
linkedin_mcp_server/tools/post.py:184-188
**Fix scoped expansion**

`post_container_sel` is a comma-separated selector list, and interpolating it before the button selector makes the first alternatives match the container itself: `article.feed-shared-update-v2` and `.scaffold-layout__main .feed-shared-update-v2`. Because `.first` can select the post container instead of the “see more” button, this click can fail or do nothing and the broad `except` leaves `sections["post"]` truncated. A post with hidden text behind LinkedIn’s inline “see more” control can still return the shortened body.

```suggestion
                see_more = page.locator(
                    "article.feed-shared-update-v2 button.feed-shared-inline-show-more-text__see-more-less-toggle, "
                    ".scaffold-layout__main .feed-shared-update-v2 button.feed-shared-inline-show-more-text__see-more-less-toggle, "
                    "main > div > div button.feed-shared-inline-show-more-text__see-more-less-toggle, "
                    "article.feed-shared-update-v2 button[aria-label*='see more'], "
                    ".scaffold-layout__main .feed-shared-update-v2 button[aria-label*='see more'], "
                    "main > div > div button[aria-label*='see more'], "
                    "article.feed-shared-update-v2 .feed-shared-text .see-more, "
                    ".scaffold-layout__main .feed-shared-update-v2 .feed-shared-text .see-more, "
                    "main > div > div .feed-shared-text .see-more"
                ).first
```

### Issue 2 of 2
linkedin_mcp_server/tools/post.py:41-48
**Bind media to post**

These selectors are scoped to any `article.feed-shared-update-v2` on the page, not to the single primary post that was requested. If a permalink page contains another update card in recommendations, related feed content, an ad, or a repost preview, `page.locator(selector)` can still collect images from that other article and return them as attachments for `post_url`. The image lookup needs to first resolve the primary post container, then query media inside only that container.

Reviews (3): Last reviewed commit: "fix: scope text expansion and image sele..."

1. URL validation (SSRF): reject non-LinkedIn and non-post-path URLs
   before any browser navigation
2. Image scoping: replace broad `main img` with post-container-specific
   CSS selectors to avoid capturing unrelated page images
3. Image-only posts: run image capture regardless of text extraction
   result (don't gate on sections.get("post"))
4. Screenshot cap: stop after _MAX_POST_IMAGES (10) to prevent timeout
5. Lazy temp dir: only create temp directory when first image passes
   the size filter and screenshots successfully
6. Normalised indexes: use 0-based index among returned images only,
   not raw DOM position
7. Canonical URL: return page.url after navigation instead of raw input
8. Preserve references: include extracted.references in the result dict
9. Expand post text: click inline "see more" / "...more" button after
   navigation to bypass LinkedIn's inline text truncation
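
Item 1 above (URL validation before navigation) might be sketched as follows. This is an assumption-laden illustration, not the PR's implementation: `is_linkedin_post_url`, `ALLOWED_HOSTS`, and `POST_PATH_PREFIXES` are hypothetical names, and the accepted hosts and path prefixes are guesses based on the example URL in this PR.

```python
from urllib.parse import urlparse

# Assumptions: which hosts and path prefixes count as a "LinkedIn post URL".
ALLOWED_HOSTS = {"linkedin.com", "www.linkedin.com"}
POST_PATH_PREFIXES = ("/feed/update/", "/posts/")


def is_linkedin_post_url(url: str) -> bool:
    """Return True only for https LinkedIn URLs that look like post permalinks."""
    try:
        parsed = urlparse(url)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parsed.scheme != "https":
        return False
    # `hostname` drops userinfo and port, so "linkedin.com@evil.example"
    # is judged by its real host, "evil.example".
    host = (parsed.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        return False
    return parsed.path.startswith(POST_PATH_PREFIXES)
```

Comparing the parsed hostname against an exact allow-list, rather than searching the raw string for "linkedin.com", is what rejects look-alike hosts such as `www.linkedin.com.evil.example`.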
Issue 1: Replace main.innerText with post-container-scoped extraction
to avoid pulling in comments, recommendations, and page chrome.

Issue 2: Anchor all image CSS selectors to article.feed-shared-update-v2
to prevent capturing images from outside the requested post.
Comment on lines +184 to +188
see_more = page.locator(
f"{post_container_sel} button.feed-shared-inline-show-more-text__see-more-less-toggle, "
f"{post_container_sel} button[aria-label*='see more'], "
f"{post_container_sel} .feed-shared-text .see-more"
).first

P1 Fix scoped expansion

`post_container_sel` is a comma-separated selector list, and interpolating it before the button selector makes the first alternatives match the container itself: `article.feed-shared-update-v2` and `.scaffold-layout__main .feed-shared-update-v2`. Because `.first` can select the post container instead of the “see more” button, this click can fail or do nothing and the broad `except` leaves `sections["post"]` truncated. A post with hidden text behind LinkedIn’s inline “see more” control can still return the shortened body.

Suggested change
- see_more = page.locator(
-     f"{post_container_sel} button.feed-shared-inline-show-more-text__see-more-less-toggle, "
-     f"{post_container_sel} button[aria-label*='see more'], "
-     f"{post_container_sel} .feed-shared-text .see-more"
- ).first
+ see_more = page.locator(
+     "article.feed-shared-update-v2 button.feed-shared-inline-show-more-text__see-more-less-toggle, "
+     ".scaffold-layout__main .feed-shared-update-v2 button.feed-shared-inline-show-more-text__see-more-less-toggle, "
+     "main > div > div button.feed-shared-inline-show-more-text__see-more-less-toggle, "
+     "article.feed-shared-update-v2 button[aria-label*='see more'], "
+     ".scaffold-layout__main .feed-shared-update-v2 button[aria-label*='see more'], "
+     "main > div > div button[aria-label*='see more'], "
+     "article.feed-shared-update-v2 .feed-shared-text .see-more, "
+     ".scaffold-layout__main .feed-shared-update-v2 .feed-shared-text .see-more, "
+     "main > div > div .feed-shared-text .see-more"
+ ).first
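
Instead of writing out all nine combinations by hand, the same selector string could be generated. The helper below is a hypothetical sketch (`scope_selectors` is not part of the PR) that prefixes every child selector with every container alternative. It splits on commas naively, so it assumes the container list has no commas nested inside `:is(...)` or attribute values.

```python
def scope_selectors(container_selector: str, child_selectors: list[str]) -> str:
    """Prefix each child selector with each comma-separated container alternative."""
    containers = [c.strip() for c in container_selector.split(",") if c.strip()]
    return ", ".join(
        f"{container} {child}"
        for child in child_selectors
        for container in containers
    )


# Container list inferred from the suggested change above (an assumption).
POST_CONTAINER_SEL = (
    "article.feed-shared-update-v2, "
    ".scaffold-layout__main .feed-shared-update-v2, "
    "main > div > div"
)

SEE_MORE_SEL = scope_selectors(
    POST_CONTAINER_SEL,
    [
        "button.feed-shared-inline-show-more-text__see-more-less-toggle",
        "button[aria-label*='see more']",
        ".feed-shared-text .see-more",
    ],
)
```

With this approach every alternative in the final list ends in a button or `.see-more` selector, so `page.locator(SEE_MORE_SEL).first` can no longer resolve to the post container itself.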

Comment on lines +41 to +48
_POST_IMAGE_SELECTORS = (
# Image/video attachments inside the top-level post article only
"article.feed-shared-update-v2 .update-components-image img",
"article.feed-shared-update-v2 .feed-shared-image img",
"article.feed-shared-update-v2 .update-components-linkedin-video__embed img",
"article.feed-shared-update-v2 .feed-shared-document__container img",
# Broad fallback still scoped to the article element
"article.feed-shared-update-v2 img",

P1 Bind media to post

These selectors are scoped to any `article.feed-shared-update-v2` on the page, not to the single primary post that was requested. If a permalink page contains another update card in recommendations, related feed content, an ad, or a repost preview, `page.locator(selector)` can still collect images from that other article and return them as attachments for `post_url`. The image lookup needs to first resolve the primary post container, then query media inside only that container.
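
One way to read "resolve the primary post container first" is sketched below. `primary_post_media` is a hypothetical helper, written against Playwright's sync API (`page.locator`, `Locator.first`, `Locator.count`, `Locator.locator`); with the async API the `count()` call would need `await`. It assumes the first matching container in DOM order is the requested post, which this thread does not confirm.

```python
def primary_post_media(page, container_selectors, media_selector="img"):
    """Return a locator for media inside the first matching post container.

    Returns None when no container selector matches anything on the page.
    """
    for selector in container_selectors:
        # Assumption: the first match in DOM order is the requested post.
        container = page.locator(selector).first
        if container.count() > 0:
            # Chained locators only search inside `container`, so images in
            # other update cards on the page are never considered.
            return container.locator(media_selector)
    return None
```

A caller could then iterate over `primary_post_media(page, [...]).all()` rather than running page-wide selectors such as `article.feed-shared-update-v2 img`.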
