
Releases: ianarawjo/ChainForge

v0.3.6: Add Media Node 📺 and Image Inputs

11 May 19:44
c6ff16e


Image inputs to OpenAI, Anthropic, Google, and Ollama providers

You can now upload images to a new Media Node and query with image inputs:

[Screenshot: uploading images to a Media Node and querying with image inputs]

This is currently limited to OpenAI, Anthropic, Google, and Ollama. If you want more
provider support, consider making a PR. The relevant calls to LLMs are in utils.ts.

Image outputs to image inputs

This also enables generating images with one model and then passing them through prompt chaining to another, e.g.:
[Screenshot: an image output chained into another model's prompt]

...Lots of other internal changes

This update required making lots of changes to how ChainForge stores media.

Images take up a lot of space in the browser. To handle this, ChainForge no longer stores images in the browser. Instead, we keep a unique ID for each image, derived from its SHA-256 hash, which we use to query the backend and pull the image only when necessary (e.g., when viewing images in Table Views).
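
As a rough sketch of the idea (the helper name below is hypothetical, not ChainForge's actual code), content addressing amounts to hashing the image bytes:

```python
import hashlib

def media_uid(image_bytes: bytes) -> str:
    # Content-addressed ID (hypothetical helper): identical images always map
    # to the same ID, so duplicates are stored once and lookups stay stable
    # across sessions.
    return hashlib.sha256(image_bytes).hexdigest()
```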

When running locally, ChainForge now copies media you upload into a media directory stored in the same place it puts Saved Flows.

As a consequence, ChainForge can now export and import cfzip bundles: compressed archives containing the .cforge (JSON) flow file alongside the media file(s) used in the flow.
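
Conceptually, a cfzip bundle is just a compressed archive. Here is a minimal sketch of exporting one; the internal file names and layout are assumptions for illustration, not the actual format:

```python
import json
import zipfile

def export_cfzip(flow: dict, media: dict[str, bytes], out_path: str) -> None:
    # Bundle the flow JSON plus each media file (keyed by its hash-derived ID)
    # into a single compressed archive.
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr("flow.cforge", json.dumps(flow))
        for uid, data in media.items():
            z.writestr(f"media/{uid}", data)
```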

(When running in the browser at chainforge.ai/play, ChainForge will continue to cache images in the front-end.)

These changes will make it easier to add data-heavy analysis pipelines in the near future, e.g., loading many documents and chunks into a RAG pipeline.

If you encounter any problems with this release, please let us know.

Python 3.10 and above are now the only versions supported by ChainForge.

We have deprecated support for 3.8 and 3.9, since the markitdown package requires 3.10. This is a forward-thinking change, since many other RAG-related packages now require ~3.10.

Developers

This update was initiated by the work of @loloMD and @RoyHEyono, who added the Media Node and image inputs to ChainForge's querying infrastructure. Thank you Roy and Loic! 🎉🙌📺

v0.3.5: Favorites ♥️, Ollama List 🦙, Encryption 🔐, Persistent Settings ⚙️

15 Apr 03:10
0de8f6f


This update adds four features for when ChainForge is running locally:

  • You can now Favorite ♥️ your nodes and models (at their precise settings), to recreate them later.
    • To favorite a node, just right-click it and hit "Favorite".
    • To favorite a model, open the model's settings screen and click the Favorite button in the top-right.
    • Favorites persist across sessions.
  • Global Settings ⚙️ now persist across sessions, by storing a settings.json file in the same directory as Saved Flows.
  • If you are running Ollama 🦙 locally, the menu model list now autopopulates for easy access (refresh the page if it doesn't show up):
[Screenshot: Ollama models autopopulated in the model list]
  • You can now password-encrypt 🔐 locally Saved Flows and the settings config file, if you want. Just add the --secure flag when running chainforge serve. See chainforge serve --help for options. (The "Export" and "Import" UI buttons will continue to serve only unencrypted flows.)
    • This is useful if you want to store API keys in the front-end UI, using the new Global Settings storage, but are worried about saving such information in a plain-text file. There's a 'settings' option that encrypts only this settings file.
  • The Nodes list in Add Nodes is a bit nicer and has a nested menu.

Pick your Favorites ♥️:

[Video: cf-favorites.mov]

and Encrypt your Flows 🔐:

[Screenshots: encrypting Saved Flows with a password]

This is part of an effort to make ChainForge more customizable and secure.

As a bonus, the update adds support for OpenAI GPT-4.1 and Google Gemini 2.5 Pro models.

This update required some rearranging to improve code quality. The App.tsx file now uses a custom NestedMenu, as does the models dropdown, since Mantine ContextMenu was too limited. The nodes menu in App.tsx is also now closer to a JSON spec than hand-written React elements.

Warning

This update introduced a significant number of changes to the codebase, including a number of refactorings. If you encounter any issues, please raise an Issue.

v0.3.4: Better Table View, Saved Flows sidebar, optimized string storage

28 Feb 19:59
9d7c458


This update brings three major quality-of-life changes:

  • Strings throughout the application are now frequently interned in a global StringLookup table, to improve performance and reduce duplicate memory usage (see the sketch after this list). This noticeably improves performance once the number of LLM responses exceeds 1,000, and also results in a smaller memory footprint for exported .cforge files.
  • A Saved Flows sidebar, plus a Save button, to keep track of your flows when running ChainForge locally. This interacts with the Python backend and stores the data in the folder suggested by platformdirs. The specific location appears in the footer of the sidebar (in case you want to manage it yourself).
  • Table View in Response Inspectors now uses Mantine React Table. This brings:
    • Sort columns by value
    • Selectively show/hide columns
    • Filter within columns
    • Sticky header when scrolling down the table
    • Pagination to improve performance when rendering large tables
[Screenshots: the new Table View in Response Inspectors]
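
For intuition, here is a minimal Python sketch of string interning. The real StringLookup lives in the TypeScript front-end, so the names and API below are illustrative only:

```python
class StringLookup:
    # Each unique string is stored exactly once; everything else references
    # it by a small integer index, so duplicates cost one int, not one copy.
    _strings: list[str] = []
    _index: dict[str, int] = {}

    @classmethod
    def intern(cls, s: str) -> int:
        idx = cls._index.get(s)
        if idx is None:
            idx = len(cls._strings)
            cls._strings.append(s)
            cls._index[s] = idx
        return idx

    @classmethod
    def get(cls, idx: int) -> str:
        return cls._strings[idx]
```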

In addition, Response Inspectors now try to lazy-load and show a LoadingSpinner when first calculating the selected inspector view.

This change is a first step toward moving the StorageCache into the Python backend when the memory footprint exceeds a megabyte threshold. This will enable a lighter, more streamlined front-end when running large-scale experiments.

Minor changes:

  • Removed unnecessary dependencies on the anthropic and google packages. The latter blocked forever while attempting to install grpcio on my machine.

Warning

This change touched many source files in ChainForge and altered the way .cforge files are imported and exported. Although the changes should be backwards-compatible and bug-free, this cannot be guaranteed. If anything breaks, revert to the previous version.

v0.3.2.5: Generate Tables

19 Dec 20:51
7e86f19


New ChainForge release ✨out now: Generate a table from a prompt, extend rows, add a column like magic!

[Video: synth-table.mov]

v0.3.1.5: Multi-Eval Node

25 Apr 18:01
6fa3092


This is the first release adding the MultiEval node to ChainForge proper, alongside:

  • improvements to the response inspector Table View to display multi-criteria scoring in column view
  • Table View is now the default when multiple evaluators are detected

Voilà:

[Screenshot: Multi-Eval node with multi-criteria scores in table view]

As you can see, Multi-Eval allows you to define multiple per-response evaluators inside the same node. You can use this to evaluate responses across multiple criteria. Evaluators can be a mix of code and LLM evaluators, as you see fit, and you can change the LLM scorer model on a per-evaluator basis.

This is a "beta" version of the MultiEval node, for two reasons:

  • The output handle of MultiEval is disabled, since it doesn't yet work with VisNodes to plot data across multiple criteria. That is a separate issue that I didn't want holding up this push. It is coming.
  • There are no genAI features in MultiEval yet, like there are in Code Evaluator nodes. I want to do this right (beyond EvalGen, which is another matter). The idea is that you describe the criteria in a prompt and the AI adds the evaluator it thinks best to the list, on a per-criterion basis. For now, as a workaround, you can use the genAI feature to generate code inside single Code Evaluators and port that code over.

The EvalGen wizard is also coming, to help users automatically generate evaluation metrics with human supervision. We have a version of this on the multi-eval branch (which, due to the TypeScript front-end rewrite, we cannot directly merge into main), but it doesn't integrate Shreya's fixes.

v0.3.1: Image models, TypeScript rewrite, Bedrock support

31 Mar 02:59
583ea65


This change has been in the works for over a month. The most significant part is a rewrite of the entire front-end of ChainForge, tens of thousands of lines of code, into TypeScript. More details below.

Support for image models (with OpenAI's Dall-E models being the first)

[Screenshot: Dall-E image generations in ChainForge]

Images are compressed by default using compressorjs, with no visible impact on quality and an average compressed size of around 60% of the original. Users can turn off compression in the Advanced tab of the Settings window, but it is recommended to keep it on.

Custom Providers for Image Models

Your custom providers can return image data instead of text. From your provider, return a JSON dict in the format: { t: "img", d: <base64_str_png_encoded_image> }. (Only include the raw base64 data, not the "data:image/png;base64," prefix.)
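
For example, a custom provider script might look like the sketch below, assuming the @provider decorator from chainforge.providers; generate_png_base64 is a hypothetical stand-in for your own image-generation call:

```python
from chainforge.providers import provider

@provider(name="MyImageModel", emoji="🖼")
def my_image_model(prompt: str, **kwargs) -> dict:
    png_b64 = generate_png_base64(prompt)  # hypothetical image-generation call
    # Return only the raw base64 data -- no "data:image/png;base64," prefix.
    return {"t": "img", "d": png_b64}
```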

Note that we don't yet support:

  • Exporting images into cells of an Excel sheet (Export to Excel will be disabled if it detects an image)

Warning

Be warned that images eat up browser storage space fast and will quickly disable autosaving.

Rewrite of the front-end into TypeScript

The entire front-end has been converted to tsx files, with the appropriate typing. Other nice refactorings were made along the way.

Along the way, we identified bugs and unreachable code, and worked to simplify and standardize the way LLMResponses are stored on the backend and front-end. It is not perfect, but LLMResponse and LLMResponseData now make the format of stored responses transparent to developers.

This change makes it easier for developers to extend ChainForge with confidence, chiefly because TypeScript flags side-effects of changes to core code. For instance, TypeScript enabled us to add image models in only 2 hours, since all that was required was changing the type LLMResponseData from a string to a string | object-with-image-data. This change would not have been easy to perform with confidence without the visibility TypeScript provides into the downstream effects of changing a core datatype. In the future, it will help us add support for multi-modal vision models.

Custom Right-click Actions on Nodes

Right-clicking on a node can now present more options.

  • TextFields can be converted into Items Nodes
  • Items Nodes can be converted into TextFields Nodes
  • Prompt and Chat Turn node LLM response cache can be cleared:
[Screenshot: node right-click context menu]

Amazon Bedrock Support

Thanks to @massi-ang, Amazon Bedrock-hosted models have been added to ChainForge. We've just added these endpoints, and I wasn't able to test them directly (I don't have access), so if you encounter problems please open an Issue and poke @massi-ang to let him know.

Nested Add Model Menu on Prompt nodes

[Screenshot: nested Add Model menu on a Prompt Node]

Thanks also to @massi-ang, clicking Add+ to add a model now brings up a nested menu. The list is still limited (use the Settings to access more specific models), but we were facing a growing problem as the number of providers increased.

Better Rate Limiting

Rate limiting has been improved using Bottleneck. Rate limiting is now performed over a rolling time window rather than the naive "block and wait" approach we used before. Note that this doesn't take into account what "tier" of access you have for OpenAI and Anthropic models, as there's no way for us to know that, so limits are based on Tier 2 access. If you hit a rate limit, just re-run the node.
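
Conceptually, a rolling window keeps timestamps of recent requests and delays a new request only while the window is full. A minimal Python sketch of the idea (the actual implementation uses the Bottleneck JavaScript library, not this code):

```python
import time
from collections import deque

class RollingWindowLimiter:
    # Illustrative only: allow at most max_requests per rolling window.
    def __init__(self, max_requests: int, window_secs: float):
        self.max_requests = max_requests
        self.window_secs = window_secs
        self.sent: deque[float] = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        now = time.monotonic()
        # Evict timestamps that have slid out of the window.
        while self.sent and now - self.sent[0] >= self.window_secs:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Wait just until the oldest request exits the window.
            time.sleep(self.window_secs - (now - self.sent[0]))
            self.sent.popleft()
        self.sent.append(time.monotonic())
```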

Problems? Let us know!

This change is comprehensive, and while I have tested it extensively, it is possible I have missed something. If you encounter an error, please open an Issue.

v0.3: Claude 3, Sandboxed Python

06 Mar 04:30
0f4275b


Adds new Anthropic Claude 3 models.

  • Backend now uses the messages API for Claude 2.1+ models.
  • Adds the system message parameter in Claude settings.

Adds browser-sandboxed Python with pyodide

You can now run Python in a safe sandbox entirely in the browser, provided you do not need to import third-party libraries.
The web-hosted version at chainforge.ai/play now has Python evaluators unlocked:

[Screenshot: Python evaluator node on chainforge.ai/play]

The local version of ChainForge includes a toggle to turn sandboxing on or off:

[Screenshot: sandbox toggle on the Python evaluator node]

If you turn sandboxing off, you go back to the previous Python evaluator, executed on your local machine through the Flask backend. In the non-sandboxed eval node you can import any libraries available in your Python environment.

Why sandboxing?

The benefit of sandboxing is that ChainForge can now be used to execute Python code generated by LLMs, using eval() or exec() in your evaluation function. This was possible before, but dangerous and unsafe. Benchmarks that do not rely on third-party libraries, like HumanEval at pass@1, could be run within ChainForge entirely in the web browser (if anyone wants to set this up, let me know!).
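
For example, an evaluator along the lines of the sketch below can now exec LLM-generated code safely. The add test case is hypothetical, assuming the model was asked to write a function named add:

```python
def evaluate(response):
    # response.text holds the LLM's output -- here, (hopefully) Python source.
    env = {}
    try:
        exec(response.text, env)      # define the generated function in the sandbox
        return env["add"](2, 3) == 5  # hypothetical test for a generated `add`
    except Exception:
        return False
```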

Add Prettier and ESLint

24 Feb 16:50
bd35ecd


Hi folks,

Thanks to PRs #223 and #222 by @massi-ang, we have added Prettier and ESLint to ChainForge's main branch.

prettier and eslint now run as part of npm run build, and you are encouraged to run them before submitting any PRs to the ChainForge main branch.

We know this is somewhat annoying for anyone building on top of ChainForge, because it may make rebasing on top of the latest main changes a chore. This includes myself---the changes in the multi-eval branch, which I have been working on for a while now, are even harder to merge. However, consistent formatting and linting provide better standards for developer contributions, beyond the ad-hoc approach to writing code we had before.

Recently, I have had less time for code hygiene tasks on this project. However, I think converting the entire front-end code to TypeScript is the next step. This would provide more guarantees on dev contributions, may catch existing bugs, and would allow us to have a standardized ResponseObject format across ChainForge that is enforced and extensible. The latter:

  • would provide guarantees, for people adding their own widgets, about what format responses take
  • would be easily extensible to additional data formats like images as input for GPT4-Vision, or images as responses for DALL-E

Additionally, I envision:

  • Better encapsulation of how responses are displayed in Inspectors, i.e. a dedicated React component like ResponseBox that can then be extended to handle image outputs, if present.
  • Better storage for internal responses (i.e., the ones with "query") that minimizes repeated info for LLM settings. Duplicate info in LLM settings is inflating file sizes fast. An LLM at particular settings should be a UID into a lookup table.
  • Better / updated example flows, e.g. comparing prompts, testing JSON format, multiple evaluations
  • Dev docs for how to create a new node

It doesn't seem like LLMs are going anywhere, and evaluating their output quality still suffers from the same issues. If we work together, we can make ChainForge a go-to graphical interface for "testing stuff out"---rapid prototyping of prompt and chain ideas and rapid testing of LLM behavior, beyond ad-hoc chatting, CLIs, or having to write code.

ChainForge is based on transparency and complete control. We always intend to show the prompts to developers. Developers should have access to the exact settings used for the model, too. If ChainForge adds, say, prompt optimization, it will be important to always show the prompts.

Let us know what you think of these changes, or what you'd like to see in the future. If you are a developer, please consider contributing! :)

v0.2.9.5: Parametrize model settings, Sample Tabular Data

20 Jan 01:49
7e1f436


This version includes many changes, including making it much easier to compare across system messages. The docs have been updated to reflect these changes (see for instance the FAQ).

Adds random sampler toggle to Tabular Data node

[Animation: random-sampling]

Adds settings template variables of the form {=setting_name} to allow users to parametrize model settings just like prompts.

For instance, here's comparing across system messages:

[Animation: compare-sys-msgs]

Here's another example, comparing across temperatures:

[Animation: settings-vars]

The docs have also been amended to explain these new functionalities.

Smaller changes / bug fixes / QOL improvements

  • Removes red notification dots, which could become annoying
  • Fully clears the ReactFlow state before loading in a new flow
  • Debounces updating the template hooks in Prompt Nodes while the user is editing the prompt
  • Keeps track of the provenance of responses by adding a uid parameter. This is specifically to track which batch a response came from when the number of generations per prompt n > 1. This corrects an issue in the Evaluator inspectors where n > 1 outputs were broken up.

v0.2.8.9: Ollama Support

08 Jan 23:53


Dalai support has been replaced by Ollama 🦙. You can now add local models hosted via Ollama, including llama2, mistral, codellama, vicuna, etc.

Thanks to @laurenthuberdeau! 🎉