Provide hooks for custom processing #1003

@janhoy

Is your feature request related to a problem? Please describe.

I want to use FSCrawler for more complex processing before indexing into Elasticsearch (ES).

Describe the solution you'd like

Currently the "pipeline" is hardcoded in Java:

find new file -> OCR/parse with tika -> index in ES

Instead, provide a ProcessingPipeline plugin that users can replace with their own implementation, e.g. MyCustomPipeline. This pipeline plugin would have a simple interface and ship with a default implementation that works exactly like today's hardcoded pipeline:

public interface ProcessingPipeline {
    public SomeContext processFile(SomeContext ctx);
}
public class DefaultProcessingPipeline implements ProcessingPipeline {
    @Override
    public SomeContext processFile(SomeContext ctx) {
        // Default impl goes here, e.g. parse with Tika, then build the ES document
        ctx.setBodyText(parseTika(ctx));
        ctx.setEsDoc(createElasticDoc(ctx));
        return ctx;
    }

    protected SomeContext createElasticDoc(SomeContext ctx) ...
}

Users could then provide their own processing logic by extending the default and overriding only what they need:

public class CustomProcessingPipeline extends DefaultProcessingPipeline {
    @Override
    public SomeContext processFile(SomeContext ctx) {
        // Custom impl, override what you need
        return ctx;
    }
}
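
To make the proposal concrete, here is a minimal, self-contained sketch of how the pieces could fit together. It is only an illustration, not FSCrawler code: `PipelineContext` is a hypothetical stand-in for `SomeContext`, `extractText` is a placeholder for the Tika step (it just trims the input), and the ES document is modelled as a plain `Map`. Here the custom pipeline overrides only `createElasticDoc` to add one extra field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small demo driver; declared first so single-file launchers pick it up.
class PipelineDemo {
    public static void main(String[] args) {
        ProcessingPipeline pipeline = new CustomProcessingPipeline();
        PipelineContext ctx =
            pipeline.processFile(new PipelineContext("notes.txt", "  hello pipeline world  "));
        System.out.println(ctx.getEsDoc());
        // prints {file=notes.txt, content=hello pipeline world, word_count=3}
    }
}

// Hypothetical stand-in for SomeContext: carries one file through the pipeline.
class PipelineContext {
    private final String fileName;
    private final String rawContent;
    private String bodyText;
    private Map<String, Object> esDoc;

    PipelineContext(String fileName, String rawContent) {
        this.fileName = fileName;
        this.rawContent = rawContent;
    }

    String getFileName() { return fileName; }
    String getRawContent() { return rawContent; }
    String getBodyText() { return bodyText; }
    void setBodyText(String bodyText) { this.bodyText = bodyText; }
    Map<String, Object> getEsDoc() { return esDoc; }
    void setEsDoc(Map<String, Object> esDoc) { this.esDoc = esDoc; }
}

interface ProcessingPipeline {
    PipelineContext processFile(PipelineContext ctx);
}

class DefaultProcessingPipeline implements ProcessingPipeline {
    @Override
    public PipelineContext processFile(PipelineContext ctx) {
        ctx.setBodyText(extractText(ctx));
        ctx.setEsDoc(createElasticDoc(ctx));
        return ctx;
    }

    // Placeholder for the Tika parse/OCR step (assumption: we only trim here).
    protected String extractText(PipelineContext ctx) {
        return ctx.getRawContent().trim();
    }

    // Builds the document that would be sent to ES, modelled as a simple map.
    protected Map<String, Object> createElasticDoc(PipelineContext ctx) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("file", ctx.getFileName());
        doc.put("content", ctx.getBodyText());
        return doc;
    }
}

// Custom pipeline: reuses the default steps and enriches the document.
class CustomProcessingPipeline extends DefaultProcessingPipeline {
    @Override
    protected Map<String, Object> createElasticDoc(PipelineContext ctx) {
        Map<String, Object> doc = super.createElasticDoc(ctx);
        String body = ctx.getBodyText();
        doc.put("word_count", body.isEmpty() ? 0 : body.split("\\s+").length);
        return doc;
    }
}
```

Splitting the default implementation into small protected steps is what lets a subclass change one stage without reimplementing `processFile` as a whole.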

Describe alternatives you've considered

Forking the project, or building something from scratch.
