
Concurrent LocalFileStore mset writes can corrupt chunked embeddings cache #9337

@Josh-Engle


Checked other resources

  • This is a bug, not a usage question. For questions, please use the LangChain Forum (https://forum.langchain.com/).
  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Reproducible Example

import { LocalFileStore } from "@langchain/classic/storage/file_system";

const run = async () => {
  const store = await LocalFileStore.fromPath("./cache");
  const encoder = new TextEncoder();

  const PARALLEL_CALLS = 100;
  const CHUNKS_PER_CALL = 10;
  const MIN_BYTES = 5;
  const MAX_BYTES = 50;

  // helper to make a small random JSON payload whose "data" field is
  // MIN_BYTES..MAX_BYTES single-byte ASCII characters long
  const makePayload = (id: number) => {
    const size = Math.floor(Math.random() * (MAX_BYTES - MIN_BYTES + 1)) + MIN_BYTES;
    const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    let s = "";
    // every character is one UTF-8 byte, so string length equals byte length
    while (s.length < size) {
      s += alphabet[Math.floor(Math.random() * alphabet.length)];
    }
    return JSON.stringify({ chunkId: id, data: s });
  };

  const promises: Promise<void>[] = [];
  for (let i = 0; i < PARALLEL_CALLS; i++) {
    const callIndex = i;
    promises.push(
      (async () => {
        const entries: [string, Uint8Array][] = [];
        for (let j = 0; j < CHUNKS_PER_CALL; j++) {
          const payload = makePayload(callIndex * CHUNKS_PER_CALL + j);
          entries.push(["message", encoder.encode(payload)]);
        }
        await store.mset(entries);
      })()
    );
  }

  await Promise.all(promises);
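  // read the key back and print the raw stored text
  const [value] = await store.mget(["message"]);
  console.log(value ? new TextDecoder().decode(value) : "(missing)");
};

run().catch(console.error);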
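To tell a corrupted entry apart from a missing or valid one programmatically, the bytes returned by `store.mget` can be decoded and parsed. A minimal sketch follows; `checkCachedJson` and `CacheCheck` are hypothetical names, not part of LangChain, and it assumes `mget` yields a `Uint8Array` or `undefined` per key, as the `[string, Uint8Array][]` entries in the example suggest.

```typescript
// Hypothetical result type (not a LangChain API) for one cached value.
type CacheCheck =
  | { status: "missing" }
  | { status: "ok"; value: unknown }
  | { status: "corrupt"; error: string };

// Hypothetical helper: classify a value returned by store.mget() as
// missing, valid JSON, or corrupted.
function checkCachedJson(bytes: Uint8Array | undefined): CacheCheck {
  if (bytes === undefined) return { status: "missing" };
  const text = new TextDecoder().decode(bytes);
  try {
    return { status: "ok", value: JSON.parse(text) };
  } catch (err) {
    // Trailing bytes after the closing brace make JSON.parse throw a SyntaxError.
    return { status: "corrupt", error: (err as Error).message };
  }
}
```

Applied to `(await store.mget(["message"]))[0]` from the example, this would be expected to report `"corrupt"` whenever the race occurs.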

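Running the example produces message.txt in ./cache with the contents below. The file starts with one complete JSON object followed by trailing fragments of other payloads, so it is no longer valid JSON.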

{"chunkId":998,"data":"0vCRvGfwpgqwixqYFQgo59xUn"}rY"}04Fsfa6N"}5"}Ys"}"}"}
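The shape of this output (one complete object followed by `"}`-terminated tails) is consistent with overlapping writes that each start at offset 0 of a file that was truncated only when it was opened. The sketch below reproduces that effect deterministically with plain Node.js `fs/promises` file handles and no LangChain code. It assumes, without having checked the library source, that `LocalFileStore` writes each entry with an ordinary truncating write such as `fs.writeFile`; `tornWriteDemo` is a hypothetical name.

```typescript
import { mkdtemp, open, readFile, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Hypothetical demo (plain Node.js, no LangChain): two writers open the same
// file with flag "w". Truncation happens at open time, and each file handle
// keeps its own write position starting at 0. If both opens happen before
// either write, the second payload only overwrites the start of the first,
// leaving the first payload's tail in the file when it is longer.
async function tornWriteDemo(first: string, second: string): Promise<string> {
  const dir = await mkdtemp(path.join(tmpdir(), "torn-write-"));
  const file = path.join(dir, "message.txt");
  try {
    const a = await open(file, "w"); // truncates: file is empty
    const b = await open(file, "w"); // truncates again: still empty
    await a.write(first); // written at a's position 0
    await b.write(second); // written at b's position 0, over the start of `first`
    await a.close();
    await b.close();
    return await readFile(file, "utf8");
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Calling `tornWriteDemo('{"id":1,"data":"aaaaaaaaaa"}', '{"id":2,"data":"b"}')` resolves to the second payload followed by the leftover tail of the first (a run of `a` characters and `"}`), which `JSON.parse` rejects, mirroring the message.txt contents above.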

Error Message and Stack Trace (if applicable)

No response

Description

  • I am using CharacterTextSplitter and CacheBackedEmbeddings with LocalFileStore
  • I expected to have locally cached embeddings that could be re-used.
  • Instead, reading from the cache failed with JSON parsing errors.
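One possible direction for a fix, sketched under assumptions: write each value to a uniquely named temporary file in the same directory and `rename` it over the target, so a reader sees one complete payload rather than a mix of several. `atomicWriteFile` and `demo` are hypothetical helpers, not LangChain APIs, and using this inside `LocalFileStore` would mean changing how it writes each key, which is not shown here.

```typescript
import { randomUUID } from "node:crypto";
import { mkdir, mkdtemp, readFile, rename, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Hypothetical helper (not a LangChain API): write `data` to a uniquely named
// temp file next to `target`, then rename it over `target`. On POSIX
// filesystems rename() replaces the target atomically, so readers see one
// complete payload rather than a mix of several.
async function atomicWriteFile(target: string, data: Uint8Array): Promise<void> {
  const dir = path.dirname(target);
  await mkdir(dir, { recursive: true });
  const tmp = path.join(dir, `.${path.basename(target)}.${randomUUID()}.tmp`);
  try {
    await writeFile(tmp, data);
    await rename(tmp, target);
  } catch (err) {
    await rm(tmp, { force: true }); // best-effort cleanup of the temp file
    throw err;
  }
}

// Hypothetical demo mirroring the repro: 100 concurrent writes of different
// lengths to one key, then read the result back and clean up.
async function demo(): Promise<string> {
  const dir = await mkdtemp(path.join(tmpdir(), "atomic-write-"));
  const target = path.join(dir, "message.txt");
  const encoder = new TextEncoder();
  try {
    await Promise.all(
      Array.from({ length: 100 }, (_, i) =>
        atomicWriteFile(
          target,
          encoder.encode(JSON.stringify({ chunkId: i, data: "x".repeat(i % 50) }))
        )
      )
    );
    return await readFile(target, "utf8");
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

This does not decide which of several concurrent writes to the same key wins; it only keeps the file a single complete payload. On Windows, renaming over a file that is open elsewhere can fail with `EPERM`, so a retry or a per-key lock may still be needed there.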

System Info

langchain@1.0.3 | MIT | deps: 5 | versions: 343
Typescript bindings for langchain
https://github.com/langchain-ai/langchainjs/tree/main/langchain/

keywords: llm, ai, gpt3, chain, prompt, prompt engineering, chatgpt, machine learning, ml, openai, embeddings, vectorstores

dist
.tarball: https://registry.npmjs.org/langchain/-/langchain-1.0.3.tgz
.shasum: d2ca5e0bf1882678d556b087783b373f92d86677
.integrity: sha512-nGxI9li1yttHzLHtECgw3hGMNzEDQR+EpW4kHKy1mbWjyBXtUDI6SyHjAeOAX7c6oV1QYRlFsOlXtahPiMySpQ==
.unpackedSize: 4.2 MB

dependencies:
@langchain/langgraph-checkpoint: ^1.0.0
@langchain/langgraph: ^1.0.0
langsmith: ~0.3.74
uuid: ^10.0.0
zod: ^3.25.76 || ^4

maintainers:
- christian-bromann <[email protected]>
- nfcampos <[email protected]>
- jacoblee93 <[email protected]>
- andrewnguonly <[email protected]>
- davidduong <[email protected]>
- hntrl <[email protected]>
- hwchase17 <[email protected]>
- basproul <[email protected]>

dist-tags:
alpha: 1.0.0-alpha.9
next: 1.0.0-alpha.9
latest: 1.0.3
tag-for-publishing-older-releases: 0.2.20

published 19 hours ago by christian-bromann <[email protected]>

Platform: Linux, Ubuntu on WSL2 (kernel 6.6.87.2-microsoft-standard-WSL2)

node --version: v24.6.0

pnpm --version: 10.20.0

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working), help wanted (This would make a good PR)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
