filter-string is slow #736

@TobiasNx

I have a file with ~85000 lines that I want to check and filter:

default infile="prod/input/strapi-export-holdings.ndjson";
default outfile="test/input/strapi-export-holdings.ndjson";
default source="strapi";


// Basis for the update is the full holding export from strapi

infile
| open-file
| as-lines
| object-batch-log
| filter-strings("\\{\\"type\\":\\"api::holding.holding\\",\\"id\\":.?,\\"data\\":{\\"almaMmsId\\":\\"(990110477460206441|990110563680206441|990032023650206441|990032044960206441|990031978440206441|990031978700206441|990056500660206441|990070407320206441|990164705080206441|990164725630206441|99373698055206441|99373698054006441|99374153044006441|99374156679006441|99374156678106441|99373334588806441|99373436116806441|99373436112506441|990094585770206441|990170147680206441|990209861490206441).*")
| write(outfile)
;

This is very slow: fewer than 1000 lines are processed every 10 seconds.

default infile="prod/input/strapi-export-holdings.ndjson";
default outfile="test/input/strapi-export-holdings.ndjson";
default source="strapi";


// Basis for the update is the full holding export from strapi

infile
| open-file
| as-lines
| object-batch-log
| filter-strings(".*990110477460206441.*")
| write(outfile)
;

I thought that reducing the complexity of the regex would improve the speed of the transformation, but it is still very slow.
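
To narrow down whether the regex itself or another part of the pipeline is the bottleneck, the simplified pattern could be timed outside of Flux. The following is only a minimal sketch in plain Java. It assumes that `filter-strings` relies on `java.util.regex`, which has not been verified here. The class name `RegexFilterTiming`, its helper methods, and the synthetic input lines are hypothetical and only mimic the shape of the export.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone harness; not part of Metafacture.
class RegexFilterTiming {

    // Counts the lines in which the pattern is found anywhere in the line.
    static int countMatches(List<String> lines, Pattern pattern) {
        int hits = 0;
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                hits++;
            }
        }
        return hits;
    }

    // Builds synthetic NDJSON-like lines; every 1000th line carries the wanted ID.
    // The real export lines are assumed to be longer than these.
    static List<String> sampleLines(int n, String wantedId) {
        List<String> lines = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String id = (i % 1000 == 0)
                    ? wantedId
                    : Long.toString(990_000_000_000_000_000L + i);
            lines.add("{\"type\":\"api::holding.holding\",\"id\":" + i
                    + ",\"data\":{\"almaMmsId\":\"" + id + "\"}}");
        }
        return lines;
    }

    public static void main(String[] args) {
        String wantedId = "990110477460206441";
        List<String> lines = sampleLines(85_000, wantedId);
        // Same shape as the simplified pattern from the second script.
        Pattern pattern = Pattern.compile(".*" + wantedId + ".*");

        long start = System.nanoTime();
        int hits = countMatches(lines, pattern);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(hits + " of " + lines.size()
                + " lines matched in " + elapsedMs + " ms");
    }
}
```

If this runs much faster than the Flux pipeline on comparable input, the slowdown is more likely to come from another step (for example reading, logging, or writing) than from the regex match. Running it against the real export file instead of synthetic lines would give a more reliable comparison.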
