45 commits
545ffc3
Use shared LLM-embedding model to describe/annotate cells
moritzschaefer Oct 20, 2023
bdfffda
Implement LLM-embedding-based continuous cell visualization
moritzschaefer Oct 23, 2023
b7dc078
Loading of pytorch lightning model for LLM embeddings
moritzschaefer Oct 29, 2023
80ce9fd
Adapted llm_obs_to_text() to use the single_cellm.validation.zero_sho…
tomazou-lab Oct 31, 2023
6644e87
Fix/Refactor `llm_obs_to_text` to fit to the single-cellm repo
moritzschaefer Nov 1, 2023
b9d3486
Fix Cell->Text. With baseline unittest. Also don't load model (to sav…
moritzschaefer Nov 2, 2023
87c1c93
refactor imports
moritzschaefer Nov 16, 2023
4902e6a
load_from_checkpoint with correct args
moritzschaefer Nov 25, 2023
eca0097
Adapt LLM calling to up to date single-cellm state
moritzschaefer Dec 23, 2023
0035372
remove superfluous werkzeug dependency
moritzschaefer Dec 23, 2023
a1fdb64
rename llm_embeddings to single_cellm_wrapper
moritzschaefer Dec 25, 2023
0467d11
More renaming and refactoring of single-cellm functions into a class
moritzschaefer Dec 25, 2023
b1ee22b
Automatically show newly created annotation
moritzschaefer Jan 3, 2024
95de5ee
Use 'best' model from our sweep
moritzschaefer Jan 3, 2024
009ef93
Run single-cellm model based on transient selection (rather than DEG …
moritzschaefer Jan 4, 2024
86c09be
Render generated text (GO keywords) in a structured format
moritzschaefer Jan 4, 2024
83a90ae
Provide scLLM model via appropriate config file
moritzschaefer Jan 7, 2024
ec9d6fd
Improve formatting of LLM output (allowing error values)
moritzschaefer Jan 7, 2024
a593ad2
Minifix with single ceLLM API and prepare for more performant dataloa…
moritzschaefer Jan 12, 2024
988ae47
Use of precomputed CLIP embeddings to accelerate web UI
moritzschaefer Jan 15, 2024
64ea816
Use full tabula_sapiens dataset by default
moritzschaefer Jan 15, 2024
5443d5d
Use new 609 model
moritzschaefer Jan 23, 2024
bfe9822
Update used model. Use correct dataset-preloading-file now
moritzschaefer Jan 25, 2024
7e22cd0
Move LLM-interface to the right (and more)
moritzschaefer Jan 30, 2024
e0f839b
Refactor CellWhisperer wrapper for more flexible data input
moritzschaefer Feb 10, 2024
4493037
Change placeholder in chat input
moritzschaefer Feb 10, 2024
991e732
Hack/Easter egg to support subtraction of two prompts
moritzschaefer Feb 10, 2024
4163672
Fix preprocessing (especially for keywords).
moritzschaefer Feb 12, 2024
708e8f1
rename to cellwhisperer
moritzschaefer Feb 14, 2024
0281886
Rename and change API within cellxgene
moritzschaefer Feb 15, 2024
69364c4
Adopt single cellm wrapper to use API service to call model
moritzschaefer Feb 14, 2024
7bfef81
Rename, fixes and implementation of llm_obs_to_text
moritzschaefer Feb 17, 2024
7ac1a9e
Use cellwhisperer API to compute text embeddings, rather than the ful…
moritzschaefer Feb 17, 2024
ee9de45
Add CellWhisperer logo and fix wrong asset path in npm building
moritzschaefer Feb 17, 2024
7f619f6
Fix broken call to CellWhisperer API
moritzschaefer Feb 21, 2024
4c01490
Extend API to allow JSON as return type
moritzschaefer Mar 15, 2024
15b38ec
working implementation for LLM access (#4)
moritzschaefer Mar 20, 2024
006b85e
Full continuous chat support with improved style
moritzschaefer Mar 20, 2024
9eb98ff
Hallucination initialization method
moritzschaefer Mar 22, 2024
dccf6b7
Fix button and enter behavior in chat
moritzschaefer Mar 27, 2024
4c8c3a7
Use the cellwhisperer icon
moritzschaefer Mar 31, 2024
1498dfc
Provide gene names to LLM via conversation
moritzschaefer Apr 3, 2024
69d0908
fix missing f-string indicator
moritzschaefer Apr 3, 2024
870e108
Revert breaking code
moritzschaefer Apr 3, 2024
8132870
Implement
moritzschaefer Apr 3, 2024
5 changes: 5 additions & 0 deletions .editorconfig
@@ -0,0 +1,5 @@
root = true

[*.js]
indent_style = space
indent_size = 2
10 changes: 10 additions & 0 deletions Makefile
@@ -79,6 +79,16 @@ smoke-test:
smoke-test-annotations:
cd client && $(MAKE) smoke-test-annotations

# STARTING SERVER AND FRONTEND

.PHONY: start
start: start-frontend-noblock start-server

.PHONY: start-frontend-noblock
start-frontend-noblock:
@echo "Starting frontend..."
@cd client && nohup make start-frontend &

# FORMATTING CODE

.PHONY: fmt
15 changes: 15 additions & 0 deletions README.md
@@ -1,8 +1,23 @@
# Moritz notes

Read this to get started (install & get started):
https://github.com/chanzuckerberg/cellxgene/blob/main/dev_docs/developer_guidelines.md

## Installation

### Workaround \[webpack-cli] HookWebpackError: error:0308010C:digital envelope routines::unsupported

Run `export NODE_OPTIONS=--openssl-legacy-provider` before `make build-for-server-dev`

# General


<img src="./docs/cellxgene-logo.png" width="300">

_an interactive explorer for single-cell transcriptomics data_

[![DOI](https://zenodo.org/badge/105615409.svg)](https://zenodo.org/badge/latestdoi/105615409) [![PyPI](https://img.shields.io/pypi/v/cellxgene)](https://pypi.org/project/cellxgene/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/cellxgene)](https://pypistats.org/packages/cellxgene) [![GitHub last commit](https://img.shields.io/github/last-commit/chanzuckerberg/cellxgene)](https://github.com/chanzuckerberg/cellxgene/pulse)

[![Push Tests](https://github.com/chanzuckerberg/cellxgene/workflows/Push%20Tests/badge.svg)](https://github.com/chanzuckerberg/cellxgene/actions?query=workflow%3A%22Push+Tests%22)
[![Compatibility Tests](https://github.com/chanzuckerberg/cellxgene/workflows/Compatibility%20Tests/badge.svg)](https://github.com/chanzuckerberg/cellxgene/actions?query=workflow%3A%22Compatibility+Tests%22)
![Code Coverage](https://codecov.io/gh/chanzuckerberg/cellxgene/branch/main/graph/badge.svg)
3 changes: 2 additions & 1 deletion client/.husky/pre-commit
@@ -1,5 +1,6 @@
#!/bin/sh
exit 0
. "$(dirname "$0")/_/husky.sh"

cd client
npx --no-install lint-staged --config "./configuration/lint-staged/lint-staged.config.js"
npx --no-install lint-staged --config "./configuration/lint-staged/lint-staged.config.js"
2 changes: 1 addition & 1 deletion client/configuration/webpack/webpack.config.dev.js
@@ -33,7 +33,7 @@ const devConfig = {
options: {
name: "static/assets/[name].[ext]",
// (thuang): This is needed to make sure @font url path is '/static/assets/'
publicPath: "..",
publicPath: "",
},
},
],
4 changes: 2 additions & 2 deletions client/configuration/webpack/webpack.config.prod.js
@@ -47,8 +47,8 @@ const prodConfig = {
include: [nodeModules, fonts, images],
options: {
name: "static/assets/[name]-[contenthash].[ext]",
// (thuang): This is needed to make sure @font url path is '../static/assets/'
publicPath: "..",
// (thuang): This is needed to make sure @font url path is '../static/assets/' <- not for me
publicPath: "",
},
},
],
Binary file removed client/favicon.png
1 change: 1 addition & 0 deletions client/favicon.png
57 changes: 57 additions & 0 deletions client/src/actions/annotation.js
@@ -5,9 +5,66 @@ import difference from "lodash.difference";
import pako from "pako";
import * as globals from "../globals";
import { MatrixFBS, AnnotationsHelpers } from "../util/stateManager";
import { isTypedArray } from "../util/typeHelpers";

const { isUserAnnotation } = AnnotationsHelpers;

export const annotationCreateContinuousAction =
(newContinuousName, values) => async (dispatch, getState) => {
/*
Add a new user-created continuous to the obs annotations.

Arguments:
newContinuousName - string name for the new continuous annotation.
values - typed array of initial values, one per observation.
*/
const { annoMatrix: prevAnnoMatrix, obsCrossfilter: prevObsCrossfilter } =
getState();
if (!prevAnnoMatrix || !prevObsCrossfilter) return;
const { schema } = prevAnnoMatrix;

/* name must be a string, non-zero length */
if (typeof newContinuousName !== "string" || newContinuousName.length === 0)
throw new Error("user annotations require string name");

if (!isTypedArray(values) || values.length === 0)
// TODO check for correct length
throw new Error(
`Provided values are of wrong format or length ${typeof values}, ${
values.length
}`
);

/* ensure the name isn't already in use! */
if (schema.annotations.obsByName[newContinuousName])
throw new Error("name collision on annotation continuous create");

const newSchema = {
name: newContinuousName,
type: "float32",
writable: false,
};

const obsCrossfilter = prevObsCrossfilter.addObsColumn(
newSchema,
values.constructor,
values
);

// TODO this is probably a noop (and should be removed)
dispatch({
type: "annotation: create continuous",
data: newContinuousName,
annoMatrix: obsCrossfilter.annoMatrix,
obsCrossfilter,
});

dispatch({
type: "color by continuous metadata",
colorAccessor: newContinuousName,
});
};

export const annotationCreateCategoryAction =
(newCategoryName, categoryToDuplicate) => async (dispatch, getState) => {
/*
8 changes: 8 additions & 0 deletions client/src/actions/index.js
@@ -11,6 +11,7 @@ import * as annoActions from "./annotation";
import * as viewActions from "./viewStack";
import * as embActions from "./embedding";
import * as genesetActions from "./geneset";
import * as llmEmbeddingsActions from "./llmEmbeddings";

function setGlobalConfig(config) {
/**
@@ -236,6 +237,7 @@ }
}

export default {
fetchJson,
doInitialDataLoad,
requestDifferentialExpression,
requestSingleGeneExpressionCountsForColoringPOST,
@@ -256,6 +258,8 @@ export default {
clipAction: viewActions.clipAction,
subsetAction: viewActions.subsetAction,
resetSubsetAction: viewActions.resetSubsetAction,
annotationCreateContinuousAction:
annoActions.annotationCreateContinuousAction,
annotationCreateCategoryAction: annoActions.annotationCreateCategoryAction,
annotationRenameCategoryAction: annoActions.annotationRenameCategoryAction,
annotationDeleteCategoryAction: annoActions.annotationDeleteCategoryAction,
@@ -272,4 +276,8 @@ export default {
genesetDelete: genesetActions.genesetDelete,
genesetAddGenes: genesetActions.genesetAddGenes,
genesetDeleteGenes: genesetActions.genesetDeleteGenes,
requestEmbeddingLLMWithText: llmEmbeddingsActions.requestEmbeddingLLMWithText,
requestEmbeddingLLMWithCells:
llmEmbeddingsActions.requestEmbeddingLLMWithCells,
startChatRequest: llmEmbeddingsActions.startChatRequest,
};
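The action creators wired in above all follow the thunk pattern: calling the creator with its arguments returns an async function of `dispatch`, which the store middleware later invokes. A minimal sketch of that shape (names here are illustrative, not part of the PR):

```javascript
// Thunk-style action creator, as used by the llmEmbeddings actions:
// a function of its arguments returning an async function of `dispatch`.
const exampleAction = (payload) => async (dispatch) => {
  dispatch({ type: "example request started" });
  // ...an awaited fetch would go here...
  dispatch({ type: "example request success", payload });
};

// Toy dispatch that records actions instead of updating a store.
const seen = [];
exampleAction(42)((action) => seen.push(action)).then(() => {
  console.log(seen.map((a) => a.type).join(", "));
});
```

This keeps the async request lifecycle (started/success/error) entirely inside the creator, so components only ever dispatch the creator's result.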
206 changes: 206 additions & 0 deletions client/src/actions/llmEmbeddings.js
@@ -0,0 +1,206 @@
import * as globals from "../globals";
import { annotationCreateContinuousAction } from "./annotation";
import { matrixFBSToDataframe } from "../util/stateManager/matrix";

/*
LLM embedding querying
*/
export const requestEmbeddingLLMWithCells =
/*
Send a request to the LLM embedding model with text
*/
(cellSelection) => async (dispatch) => {
dispatch({
type: "request to embedding model started",
});
try {
// Legal values are null, Array or TypedArray. Null is initial state.
if (!cellSelection) cellSelection = [];

// These lines ensure that we convert any TypedArray to an Array.
// This is necessary because JSON.stringify() does some very strange
// things with TypedArrays (they are marshalled to JSON objects, rather
// than being marshalled as a JSON array).
cellSelection = Array.isArray(cellSelection)
? cellSelection
: Array.from(cellSelection);

const res = await fetch(
`${globals.API.prefix}${globals.API.version}llmembs/obs`,
{
method: "POST",
headers: new Headers({
Accept: "application/json",
"Content-Type": "application/json",
}),
body: JSON.stringify({
cellSelection: { filter: { obs: { index: cellSelection } } },
}),
credentials: "include",
}
);

if (!res.ok || res.headers.get("Content-Type") !== "application/json") {
return dispatch({
type: "request llm embeddings error",
error: new Error(
`Unexpected response ${res.status} ${
res.statusText
} ${res.headers.get("Content-Type")}`
),
});
}

const response = await res.json();
return dispatch({
type: "embedding model text response from cells",
data: response,
});
} catch (error) {
return dispatch({
type: "request llm embeddings error",
error,
});
}
};
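The TypedArray-to-Array conversion above guards against a `JSON.stringify` quirk: typed arrays are serialized as index-keyed objects rather than JSON arrays. A quick sketch:

```javascript
// JSON.stringify marshals a TypedArray as an object keyed by index,
// so the request body must be converted to a plain Array first.
const typed = new Float32Array([1, 2, 3]);
console.log(JSON.stringify(typed));             // '{"0":1,"1":2,"2":3}'
console.log(JSON.stringify(Array.from(typed))); // '[1,2,3]'
```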

export const requestEmbeddingLLMWithText =
/*
Send a request to the LLM embedding model with text
*/
(text) => async (dispatch) => {
dispatch({
type: "request to embedding model started",
});
try {
const res = await fetch(
`${globals.API.prefix}${globals.API.version}llmembs/text`,
{
method: "POST",
headers: new Headers({
Accept: "application/octet-stream",
"Content-Type": "application/json",
}),
body: JSON.stringify({
text,
}),
credentials: "include",
}
);

if (
!res.ok ||
res.headers.get("Content-Type") !== "application/octet-stream"
) {
return dispatch({
type: "request llm embeddings error",
error: new Error(
`Unexpected response ${res.status} ${
res.statusText
} ${res.headers.get("Content-Type")}`
),
});
}

const buffer = await res.arrayBuffer();
const dataframe = matrixFBSToDataframe(buffer);
const col = dataframe.__columns[0];

const annotationName = dataframe.colIndex.getLabel(0);

dispatch({
type: "embedding model annotation response from text",
});

return dispatch(annotationCreateContinuousAction(annotationName, col));
} catch (error) {
return dispatch({
type: "request llm embeddings error",
error,
});
}
};


/*
Action creator to interact with the http_bot endpoint
*/
export const startChatRequest = (messages, prompt, cellSelection) => async (dispatch) => {
let newMessages = messages.concat({from: "human", value: prompt});
dispatch({ type: "chat request start", newMessages });

try {
if (!cellSelection) cellSelection = [];

// These lines ensure that we convert any TypedArray to an Array.
// This is necessary because JSON.stringify() does some very strange
// things with TypedArrays (they are marshalled to JSON objects, rather
// than being marshalled as a JSON array).
cellSelection = Array.isArray(cellSelection)
? cellSelection
: Array.from(cellSelection);

const pload = {
messages: newMessages, // TODO might need to add <image> to first message
cellSelection: { filter: { obs: { index: cellSelection } } },
};

const response = await fetch(`${globals.API.prefix}${globals.API.version}llmembs/chat`, {
method: 'POST',
headers: new Headers({
// Accept: "application/json",
'Content-Type': 'application/json',
}),
body: JSON.stringify(pload),
});

if (!response.ok) {
throw new Error('Failed to get response from the model');
}

// NOTE: The canonical way to solve this would probably be to use EventStreams. But it should also be possible with fetch as below
// Stream the response (assuming the API sends back chunked responses)
const reader = response.body.getReader();
let chunksAll = new Uint8Array(0);
let receivedLength = 0; // length at the moment
while(true) {
const { done, value } = await reader.read();

if (done) {
break;
}

let temp = new Uint8Array(receivedLength + value.length);
temp.set(chunksAll, 0); // copy the old data
temp.set(value, receivedLength); // append the new chunk
chunksAll = temp; // reassign the extended array
receivedLength += value.length;

// Extract the latest complete message: messages are NUL-delimited,
// so decode the bytes between the last two zero bytes in chunksAll.
let lastZeroIndex = chunksAll.lastIndexOf(0);

if (lastZeroIndex === -1) {
continue;
}
let secondLastZeroIndex = chunksAll.lastIndexOf(0, lastZeroIndex - 1);
// if secondLastZeroIndex is -1 (only 1 zero), go from the start
let lastChunk = chunksAll.slice(secondLastZeroIndex+1, lastZeroIndex);

// Decode into a string
let result = new TextDecoder("utf-8").decode(lastChunk);

// Parse the JSON (assuming the final string is a JSON object)
const data = JSON.parse(result);

// trim away the '<image>' string:
data.text = data.text.replace("<image>", "");

dispatch({ type: "chat request success", payload: data.text });
}

} catch (error) {
dispatch({ type: "chat request failure", payload: error.message });
}
};
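The read loop above re-scans the accumulated bytes for the latest NUL-delimited JSON message after every chunk. That slicing logic can be isolated as a small pure function, sketched here under the assumption that messages are JSON objects separated by `0x00` bytes (`latestMessage` is illustrative, not part of the PR):

```javascript
// Given all bytes received so far, return the latest complete
// NUL-delimited JSON message, or null if none has fully arrived yet.
function latestMessage(chunksAll) {
  const lastZero = chunksAll.lastIndexOf(0);
  if (lastZero === -1) return null; // no complete message yet
  // Start after the previous delimiter (-1 + 1 === 0 if there is only one).
  const prevZero = chunksAll.lastIndexOf(0, lastZero - 1);
  const bytes = chunksAll.slice(prevZero + 1, lastZero);
  return JSON.parse(new TextDecoder("utf-8").decode(bytes));
}

// Example: two complete messages plus a partially received tail.
const encoder = new TextEncoder();
const buf = new Uint8Array([
  ...encoder.encode('{"text":"first"}'), 0,
  ...encoder.encode('{"text":"second"}'), 0,
  ...encoder.encode('{"text":"par'), // incomplete tail, no delimiter yet
]);
console.log(latestMessage(buf).text); // "second"
```

Re-parsing from scratch on every chunk is O(n²) over the whole stream, but for short chat responses it keeps the client logic simple; an EventStream-based approach, as the code comment notes, would be the more canonical alternative.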