Claude/relaxed cerf kr lh c#495
Conversation
Company posts pages (/company/<slug>/posts/) lazy-load content the same way person /recent-activity/ pages do, but lacked the content-ready wait and 10-scroll budget since is_activity only matched /recent-activity/. Extend detection so company post scraping loads as many posts as person activity scraping does. https://claude.ai/code/session_012sXhsoKXZuFbHLtsEeYPks
Greptile SummaryThis PR makes company posts pages use the same extraction path as activity feeds. The main changes are:
Confidence Score: 4/5This is close, but the URL matching should be tightened before merging.
linkedin_mcp_server/scraping/extractor.py Important Files Changed
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
linkedin_mcp_server/scraping/extractor.py:1160-1162
**Normalize the posts URL**
This check runs against the raw URL string, so valid company posts URLs with a query string or fragment do not match. For example, `https://www.linkedin.com/company/foo/posts/?viewAsMember=true` reaches the same posts page, but `rstrip("/").endswith("/posts")` is false, so the extractor skips the new hydration wait and uses the shorter fast scroll path. Those callers can still receive only the tab/header text instead of the posts.
```suggestion
parsed_path = urlparse(url).path.rstrip("/")
is_activity = "/recent-activity/" in url or (
"/company/" in parsed_path and parsed_path.endswith("/posts")
)
```
Reviews (1): Last reviewed commit: "Merge branch 'stickerdaniel:main' into c..." | Re-trigger Greptile |
| is_activity = "/recent-activity/" in url or ( | ||
| "/company/" in url and url.rstrip("/").endswith("/posts") | ||
| ) |
There was a problem hiding this comment.
This check runs against the raw URL string, so valid company posts URLs with a query string or fragment do not match. For example, https://www.linkedin.com/company/foo/posts/?viewAsMember=true reaches the same posts page, but rstrip("/").endswith("/posts") is false, so the extractor skips the new hydration wait and uses the shorter fast scroll path. Those callers can still receive only the tab/header text instead of the posts.
| is_activity = "/recent-activity/" in url or ( | |
| "/company/" in url and url.rstrip("/").endswith("/posts") | |
| ) | |
| parsed_path = urlparse(url).path.rstrip("/") | |
| is_activity = "/recent-activity/" in url or ( | |
| "/company/" in parsed_path and parsed_path.endswith("/posts") | |
| ) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: linkedin_mcp_server/scraping/extractor.py
Line: 1160-1162
Comment:
**Normalize the posts URL**
This check runs against the raw URL string, so valid company posts URLs with a query string or fragment do not match. For example, `https://www.linkedin.com/company/foo/posts/?viewAsMember=true` reaches the same posts page, but `rstrip("/").endswith("/posts")` is false, so the extractor skips the new hydration wait and uses the shorter fast scroll path. Those callers can still receive only the tab/header text instead of the posts.
```suggestion
parsed_path = urlparse(url).path.rstrip("/")
is_activity = "/recent-activity/" in url or (
"/company/" in parsed_path and parsed_path.endswith("/posts")
)
```
How can I resolve this? If you propose a fix, please make it concise.
No description provided.