Hey, thank you for your amazing work.
When I try to run the following:
from rs_bpe.bpe import openai
import time
# Load the tokenizer
tokenizer = openai.cl100k_base()
# Create a batch of texts
texts = [
    "This is the first document to encode.",
    "Here's another one with different content.",
    "A third document with some more text to process.",
    # Add more as needed...
]
# Configure parallel processing options (optional)
parallel_options = openai.ParallelOptions(
    min_batch_size=20,  # Minimum batch size to engage parallel processing
    chunk_size=100,  # Number of texts to process in each thread
    max_threads=0,  # 0 means use optimal thread count (based on CPU cores)
    use_thread_pool=True  # Reuse thread pool for better performance
)
# Encode batch with performance metrics
start_time = time.time()
result = tokenizer.encode_batch(texts, parallel_options)
end_time = time.time()
print(f"Processed {len(texts)} texts in {result.time_taken:.6f}s")
print(f"Total tokens: {result.total_tokens}")
print(f"Throughput: {result.total_tokens / result.time_taken:.1f} tokens/second")
# Access individual token lists
for i, tokens in enumerate(result.tokens):
    print(f"Text {i} has {len(tokens)} tokens")
I get the following error:
TypeError: ParallelOptions.__new__() got an unexpected keyword argument 'use_thread_pool'
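In case it helps to narrow this down, here is a minimal sketch of how the rejected keyword could be dropped automatically, so the rest of the example can still run. It assumes the error message keeps the "unexpected keyword argument '...'" wording shown above. The helper `construct_dropping_unknown_kwargs` and the stand-in class `FakeParallelOptions` are hypothetical names made up for illustration; they are not part of rs_bpe, and the stand-in's parameters are only a guess at what the installed version accepts.

```python
import re


def construct_dropping_unknown_kwargs(factory, **kwargs):
    """Call factory(**kwargs), removing any keyword the factory rejects.

    Returns the constructed object and the list of dropped keyword names.
    """
    remaining = dict(kwargs)
    dropped = []
    while True:
        try:
            return factory(**remaining), dropped
        except TypeError as exc:
            # Look for the keyword name in the standard error wording.
            match = re.search(r"unexpected keyword argument '(\w+)'", str(exc))
            if match is None or match.group(1) not in remaining:
                raise  # A different TypeError: do not hide it.
            dropped.append(match.group(1))
            del remaining[match.group(1)]


class FakeParallelOptions:
    """Hypothetical stand-in that, like the error above, rejects use_thread_pool."""

    def __init__(self, min_batch_size=20, chunk_size=100, max_threads=0):
        self.min_batch_size = min_batch_size
        self.chunk_size = chunk_size
        self.max_threads = max_threads


options, dropped = construct_dropping_unknown_kwargs(
    FakeParallelOptions,
    min_batch_size=20,
    chunk_size=100,
    max_threads=0,
    use_thread_pool=True,
)
print("Dropped keywords:", dropped)
```

With the real library, `openai.ParallelOptions` would presumably be passed in place of the stand-in. This only works around the symptom; whether `use_thread_pool` was removed, renamed, or never released is still the open question.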