
Claude/relaxed cerf kr lh c #495

Open
streetspirit83 wants to merge 2 commits into stickerdaniel:main from streetspirit83:claude/relaxed-cerf-KRLhC

Conversation

@streetspirit83


No description provided.

claude and others added 2 commits June 8, 2026 17:49
Company posts pages (/company/<slug>/posts/) lazy-load content the
same way person /recent-activity/ pages do, but lacked the
content-ready wait and 10-scroll budget since is_activity only
matched /recent-activity/. Extend detection so company post scraping
loads as many posts as person activity scraping does.

https://claude.ai/code/session_012sXhsoKXZuFbHLtsEeYPks
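A minimal sketch of the detection rule this commit introduces, lifted into a standalone helper so it can be exercised on its own. The function name `is_activity_url_raw` and the example profile slug are hypothetical; in the PR the check is an inline `is_activity` expression in `linkedin_mcp_server/scraping/extractor.py`, and it inspects the raw URL string.

```python
def is_activity_url_raw(url: str) -> bool:
    """Return True for URLs that should get the activity-style wait and scroll.

    Hypothetical standalone version of the inline `is_activity` check added
    in this commit; it works on the raw URL string, not a parsed path.
    """
    # Person activity feeds, e.g. /in/<name>/recent-activity/all/
    if "/recent-activity/" in url:
        return True
    # Company posts tab, e.g. /company/<slug>/posts/ (trailing slash optional)
    return "/company/" in url and url.rstrip("/").endswith("/posts")


if __name__ == "__main__":
    for url in (
        "https://www.linkedin.com/in/jane-doe/recent-activity/all/",
        "https://www.linkedin.com/company/foo/posts/",
        "https://www.linkedin.com/company/foo/about/",
    ):
        print(url, is_activity_url_raw(url))
```

Because the check works on the raw string, a URL such as `https://www.linkedin.com/company/foo/posts/?viewAsMember=true` does not match, which is the gap the Greptile review raises.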
@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Contributor

Greptile Summary

This PR makes company posts pages use the same extraction path as activity feeds. The main changes are:

  • Detects /company/<slug>/posts/ as an activity-style page.
  • Applies the activity hydration wait to company posts pages.
  • Uses the slower, deeper scroll settings for company posts extraction.
  • Adds a regression test for an exact company posts URL.

Confidence Score: 4/5

This is close, but the URL matching should be tightened before merging.

  • The normal generated company posts URL now follows the intended wait and scroll path.
  • Company posts URLs with query strings or fragments still miss the new branch.
  • The missed branch can return incomplete posts content for valid URL variants.

linkedin_mcp_server/scraping/extractor.py

Important Files Changed

| Filename | Overview |
|---|---|
| linkedin_mcp_server/scraping/extractor.py | Adds company posts URL detection to the activity-style wait and scroll branch. |
| tests/test_scraping.py | Adds coverage for the exact trailing-slash company posts URL. |
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
linkedin_mcp_server/scraping/extractor.py:1160-1162
**Normalize the posts URL**

This check runs against the raw URL string, so valid company posts URLs with a query string or fragment do not match. For example, `https://www.linkedin.com/company/foo/posts/?viewAsMember=true` reaches the same posts page, but `rstrip("/").endswith("/posts")` is false, so the extractor skips the new hydration wait and uses the shorter fast scroll path. Those callers can still receive only the tab/header text instead of the posts.

```suggestion
        parsed_path = urlparse(url).path.rstrip("/")
        is_activity = "/recent-activity/" in url or (
            "/company/" in parsed_path and parsed_path.endswith("/posts")
        )
```
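A self-contained sketch of the suggested normalization, with the query-string and fragment variants from the comment as checks. It assumes `urlparse` comes from the standard library's `urllib.parse`, so `extractor.py` would need `from urllib.parse import urlparse` if it does not already import it. The helper name `is_activity_url` is hypothetical; in the PR this logic stays inline as `is_activity`.

```python
from urllib.parse import urlparse


def is_activity_url(url: str) -> bool:
    """Activity-style detection that ignores query strings and fragments.

    Hypothetical helper mirroring the suggested inline `is_activity` check.
    """
    # urlparse splits off "?query" and "#fragment", leaving only the path.
    parsed_path = urlparse(url).path.rstrip("/")
    return "/recent-activity/" in url or (
        "/company/" in parsed_path and parsed_path.endswith("/posts")
    )


if __name__ == "__main__":
    variants = [
        "https://www.linkedin.com/company/foo/posts/",
        "https://www.linkedin.com/company/foo/posts",
        "https://www.linkedin.com/company/foo/posts/?viewAsMember=true",
        "https://www.linkedin.com/company/foo/posts/#top",
    ]
    # Every posts variant should take the activity-style wait and scroll path.
    assert all(is_activity_url(u) for u in variants)
    # Other company tabs should still use the fast path.
    assert not is_activity_url("https://www.linkedin.com/company/foo/about/")
    print("all variants matched")
```

The `/recent-activity/` half can stay a substring test on the raw URL, since appending a query string or fragment after the path does not remove that substring.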

Reviews (1): Last reviewed commit: "Merge branch 'stickerdaniel:main' into c..."

Comment on lines +1160 to +1162
```python
is_activity = "/recent-activity/" in url or (
    "/company/" in url and url.rstrip("/").endswith("/posts")
)
```


P2 Normalize the posts URL

This check runs against the raw URL string, so valid company posts URLs with a query string or fragment do not match. For example, `https://www.linkedin.com/company/foo/posts/?viewAsMember=true` reaches the same posts page, but `rstrip("/").endswith("/posts")` is false, so the extractor skips the new hydration wait and uses the shorter fast scroll path. Those callers can still receive only the tab/header text instead of the posts.

Suggested change

```diff
-is_activity = "/recent-activity/" in url or (
-    "/company/" in url and url.rstrip("/").endswith("/posts")
-)
+parsed_path = urlparse(url).path.rstrip("/")
+is_activity = "/recent-activity/" in url or (
+    "/company/" in parsed_path and parsed_path.endswith("/posts")
+)
```


Labels

None yet

Projects

None yet


2 participants