Description
I haven't quite figured out how yet, but there appears to be a memory leak somewhere in this library, or at least in the ONNX runtime. I've tested multiple models so far, and the result is always the same: the program eventually accumulates enough memory to OOM, despite my efforts to batch inferences and avoid holding more data in memory than necessary.
My pipeline is as follows:
```rust
fn create_model(tokenizer_path: &PathBuf, model_path: &PathBuf) -> GlinerResult<GLiNER<SpanMode>> {
    let model = GLiNER::<SpanMode>::new(
        Parameters::default(),
        RuntimeParameters::default().with_threads(NUM_THREADS),
        tokenizer_path,
        model_path,
    )?;
    Ok(model)
}
```

Note that I have not tested this in tokenizer mode.
I'm simply reading a CSV in batches (not using the built-in CSV TextInput functionality) and creating a TextInput from each batch:
```rust
let input = TextInput::new(text_batch.clone(), vec!["us_organizations".to_string()])?;
```

I should note that my inputs are relatively large at first, and I process them in batches of 1,000. However, I split inputs by token count so that no single input among the 1,000 exceeds 512 tokens when sent to the model. I have 3,000,000 records in total. Perhaps the accumulation isn't noticeable on smaller datasets, but since this pipeline runs fairly long, it is very clear here: each batch inferenced accumulates more and more memory.
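For reference, the splitting step looks roughly like this. This is only a sketch: the function name is hypothetical, and whitespace splitting stands in for the model's real subword tokenizer, which is what actually determines the 512-token limit.

```rust
/// Split one record's text into chunks of at most `max_tokens` tokens.
/// Whitespace tokenization is a stand-in for the real subword tokenizer.
fn split_by_token_count(text: &str, max_tokens: usize) -> Vec<String> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    tokens
        .chunks(max_tokens)          // at most `max_tokens` tokens per chunk
        .map(|chunk| chunk.join(" "))
        .collect()
}

fn main() {
    let chunks = split_by_token_count("a b c d e", 2);
    println!("{:?}", chunks);
}
```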
Here is a profile of the memory during an execution (note that this run processes only one batch, i.e. 1,000 records):
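To quantify the accumulation per batch numerically (rather than from a profiler screenshot), one option is to log the process's resident set size after each batch. This is a Linux-only debugging aid, not part of the library:

```rust
use std::fs;

/// Read the process's resident set size (VmRSS, in kB) from
/// /proc/self/status. Returns None if the field can't be read/parsed
/// (e.g. on non-Linux platforms).
fn rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|value| value.parse().ok())
}

fn main() {
    // In the pipeline, this would be called after each inference batch,
    // e.g.: println!("batch {}: rss = {:?} kB", batch_idx, rss_kb());
    println!("current RSS: {:?} kB", rss_kb());
}
```

If RSS grows monotonically with each batch while the batch size stays constant, that supports the leak hypothesis rather than normal allocator behavior.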
Am I perhaps using the library wrong, or is this a bug?