ci: add sitemap markdown endpoint consistency check by marcklingen · Pull Request #2697 · langfuse/langfuse-docs

marcklingen · 2026-03-23T19:24:08Z

Motivation

Ensure the routing contract between canonical site pages and their pre-built .md equivalents is not accidentally broken.
Prevent regressions that would make sitemap-listed pages unreachable as markdown (used by agents, MCP tools, and the copy/export features).

Description

Add scripts/check-sitemap-markdown.js, a checker that fetches the locally served sitemap-0.xml, maps each sitemap URL to its .md endpoint, requests the endpoint with Accept: text/markdown, and verifies HTTP status and text/markdown content-type, failing with a list of problematic endpoints.
Add a sitemap-markdown-check npm script in package.json to run the new checker via pnpm sitemap-markdown-check.
Add a CI job check-sitemap-markdown-endpoints to .github/workflows/ci.yml that builds the app, starts the server, runs the checker, and fails the job if any endpoint is missing or has an unexpected response.
Wire the new CI job into the all-checks-pass summary job so the check runs for pull requests and merges.

Testing

Ran a syntax check with node --check scripts/check-sitemap-markdown.js, which succeeded.
Validated package.json by parsing it with node -e "const fs=require('fs');JSON.parse(fs.readFileSync('package.json','utf8'));console.log('package.json valid')", which succeeded.
The new CI job is added to the pipeline and will run automatically on PRs to validate markdown endpoint consistency during CI.

Codex Task

vercel · 2026-03-23T19:24:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
langfuse-docs	Ready	Preview, Comment	Mar 23, 2026 7:57pm

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: debaba8f12

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-23T19:28:10Z

scripts/check-sitemap-markdown.js

+function toLocalMarkdownUrl(absoluteUrl) {
+  const parsed = new URL(absoluteUrl);
+  const pathname = parsed.pathname;
+  const mdPath = pathname.endsWith('.md') ? pathname : `${pathname}.md`;


Normalize slash-terminated paths before appending .md

toLocalMarkdownUrl appends .md directly to the sitemap pathname, so slash-terminated URLs are transformed into invalid endpoints like /.md (and /foo/.md). Since next-sitemap.config.js always includes { loc: '/' } in additionalPaths, this check will falsely fail on the homepage even when markdown routing is healthy, which can block the new CI job on otherwise valid builds.

Useful? React with 👍 / 👎.

@codex pls fix

Summary

Fixed toLocalMarkdownUrl to normalize slash-terminated sitemap paths before appending .md, preventing invalid outputs like /.md and /foo/.md. scripts/check-sitemap-markdown.jsL86-L94

Added explicit root handling so / maps to /index.md, and trimmed trailing slashes on non-root paths so /foo/ maps to /foo.md. scripts/check-sitemap-markdown.jsL89-L93

Committed the change (fix: normalize trailing slashes in sitemap markdown URL mapping) and created a follow-up PR via make_pr with a summary of the fix.

Testing

✅ node --check scripts/check-sitemap-markdown.js

View task →

claude · 2026-03-23T19:32:15Z

scripts/check-sitemap-markdown.js

+  const parseString = promisify(parser.parseString);
+
+  try {


🔴 promisify(parser.parseString) detaches the method from its xml2js Parser instance, causing this to be undefined in strict mode when called as a standalone function. This throws a TypeError on every invocation, making the CI check permanently non-functional. Fix: change line 67 to const parseString = promisify(parser.parseString.bind(parser));.

Extended reasoning...

Bug: promisify loses this context for xml2js Parser.parseString

What the bug is and how it manifests

In parseSitemap() (line 66-68), the code does:

const parser = new xml2js.Parser(); const parseString = promisify(parser.parseString);

This extracts the parseString method from the parser instance as a bare function reference. When await parseString(xmlContent) is later called (line 70), Node.js's util.promisify wrapper invokes original.call(this, ...) where this is the calling context, but since parseString is called without a receiver, this is undefined in strict mode.

The specific code path that triggers it

The file opens with 'use strict' (line 1). Under strict mode, unbound method calls have this === undefined rather than the global object. xml2js's Parser.prototype.parseString internally calls this.reset(), reads this.options, and accesses other instance state. With this being undefined, accessing any property on it throws a TypeError.

Why existing code does not prevent it

The try/catch in parseSitemap() will catch the TypeError and re-throw it as Failed to parse sitemap XML: ..., so the error is surfaced. However, this happens on every single invocation. There is no code path where parseString succeeds without .bind(parser).

Step-by-step proof

parser = new xml2js.Parser() creates instance with this.options, this.remaining, etc.

parseString = promisify(parser.parseString) extracts bare function, no this captured.

await parseString(xmlContent) has promisify call fn.call(this, xmlContent, callback) where this is undefined in strict mode.

Inside Parser.prototype.parseString, first access of this.options throws TypeError.

The catch block in parseSitemap wraps it as Failed to parse sitemap XML: Cannot read properties of undefined.

main() catches this, logs it, and calls process.exit(1) so the CI job fails every time.

Impact

The entire CI check (check-sitemap-markdown-endpoints) is permanently non-functional. It will never successfully parse the sitemap XML and will always exit with code 1, either blocking merges or generating false positives depending on the branch protection configuration.

How to fix it

Bind the method to its instance before promisifying:

const parseString = promisify(parser.parseString.bind(parser));

This ensures this inside parseString always refers to the correct parser instance, preserving access to all instance properties and methods that xml2js relies on internally.

claude · 2026-03-23T19:32:15Z

scripts/check-sitemap-markdown.js

+          response.headers.location
+        ) {
+          const redirectUrl = response.headers.location;
+          const resolvedUrl = redirectUrl.startsWith('http')
+            ? redirectUrl
+            : new URL(redirectUrl, url).href;
+          fetchUrl(resolvedUrl, headers).then(resolve).catch(reject);
+          return;
+        }
+


🟡 In fetchUrl(), when a 3xx redirect is received, the response body is not drained before following the redirect — response.resume() is never called. In Node.js, an unconsumed response body holds the underlying socket open, preventing it from being released to the connection pool. Fix by adding response.resume() immediately before the recursive fetchUrl() call.

Extended reasoning...

Connection Leak on Redirect Responses

What the bug is: In scripts/check-sitemap-markdown.js, the fetchUrl() function handles HTTP redirects by detecting a 3xx status code with a location header and immediately making a recursive call. However, it never calls response.resume() on the redirect response before doing so. In Node.js, a response object is a readable stream; if that stream is not consumed, the underlying TCP socket cannot be returned to the keep-alive connection pool.

The specific code path: Lines 31–40 of the new script show the redirect branch:

if (response.statusCode >= 300 && response.statusCode < 400 && response.headers.location) { const redirectUrl = response.headers.location; const resolvedUrl = ...; fetchUrl(resolvedUrl, headers).then(resolve).catch(reject); return; // response body is never drained }

The return exits the callback without attaching any data/end listeners or calling response.resume(), leaving the socket in a half-open state.

Why existing code does not prevent it: The script uses Node built-in http/https modules with the default http.globalAgent. In Node.js 18+ the global agent has keep-alive enabled, meaning it tries to reuse connections. A socket is only returned to the pool once the response body stream is fully consumed. Without response.resume(), the socket stays reserved until the remote server closes it.

Impact: Because http.globalAgent.maxSockets defaults to Infinity in modern Node.js, the connection pool will not be exhausted — new connections are created as needed. However, each redirect silently leaks a socket handle that accumulates for the duration of the script. In this CI context the practical risk is very low: the script runs against localhost, redirect bodies are tiny or absent, MAX_CONCURRENT_REQUESTS is 10, and the process exits after completion which cleans everything up.

Step-by-step proof:

The sitemap contains a URL /some/page that the server redirects (301) to /some/page/.

fetchUrl(http://localhost:3333/some/page.md) is called.

The response callback fires with statusCode: 301 and location: /some/page/.md.

The code calls fetchUrl(/some/page/.md, headers).then(resolve).catch(reject) and returns.

No data or end event handler is attached to the 301 response stream; response.resume() is never called.

The socket remains open, held by the unconsumed stream, until the local Next.js server decides to close it.

With enough redirected URLs, many sockets accumulate in this limbo state simultaneously.

How to fix it: Add a single response.resume() call before following the redirect:

if (response.statusCode >= 300 && response.statusCode < 400 && response.headers.location) { response.resume(); // drain body so the socket is returned to the pool const redirectUrl = response.headers.location; // ... fetchUrl(resolvedUrl, headers).then(resolve).catch(reject); return; }

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b98da7cfc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-23T19:59:04Z

scripts/check-sitemap-markdown.js

+  const normalizedPathname =
+    pathname === '/' ? '/index' : pathname.replace(/\/+$/, '');


Exclude root URL from markdown endpoint mapping

next-sitemap.config.js always includes / in additionalPaths, but this logic rewrites / to /index.md. The markdown export path generation in scripts/copy_md_sources.js only creates public/md-src/index.md when content/index.(md|mdx) exists, and this repo currently has no such source file, so the checker will consistently hit a 404 for the homepage and fail CI even when other sitemap pages are healthy.

Useful? React with 👍 / 👎.

ci: add sitemap markdown endpoint consistency check

debaba8

marcklingen added the codex label Mar 23, 2026 — with ChatGPT Codex Connector

dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Mar 23, 2026

vercel bot deployed to Preview March 23, 2026 19:26 View deployment

ci: run markdown sitemap check in existing sitemap job

0650813

chatgpt-codex-connector bot reviewed Mar 23, 2026

View reviewed changes

vercel bot deployed to Preview March 23, 2026 19:28 View deployment

claude bot reviewed Mar 23, 2026

View reviewed changes

marcklingen added 2 commits March 23, 2026 20:53

fix: normalize trailing slashes in sitemap markdown URL mapping

18a9d37

fix(ci): address sitemap markdown checker review issues

2b98da7

vercel bot deployed to Preview March 23, 2026 19:57 View deployment

chatgpt-codex-connector bot reviewed Mar 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: add sitemap markdown endpoint consistency check#2697

ci: add sitemap markdown endpoint consistency check#2697
marcklingen wants to merge 4 commits intomainfrom
codex/add-ci-check-for-sitemap-and-.md-files

marcklingen commented Mar 23, 2026

Uh oh!

vercel bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 23, 2026

Uh oh!

marcklingen Mar 23, 2026

Uh oh!

chatgpt-codex-connector bot Mar 23, 2026

Uh oh!

claude bot Mar 23, 2026

Uh oh!

claude bot Mar 23, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		const normalizedPathname =
		pathname === '/' ? '/index' : pathname.replace(/\/+$/, '');

Conversation

marcklingen commented Mar 23, 2026

Motivation

Description

Testing

Uh oh!

vercel bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

marcklingen Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 23, 2026

Choose a reason for hiding this comment

Summary

Uh oh!

claude bot Mar 23, 2026

Choose a reason for hiding this comment

Bug: promisify loses this context for xml2js Parser.parseString

Uh oh!

claude bot Mar 23, 2026

Choose a reason for hiding this comment

Connection Leak on Redirect Responses

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel bot commented Mar 23, 2026 •

edited

Loading