Open
Labels
feature_request (for feature request)
Description
Is your feature request related to a problem? Please describe.
I want to use FSCrawler for more complex processing before documents are indexed into Elasticsearch (ES).
Describe the solution you'd like
Currently the "pipeline" is hardcoded in Java:
find new file -> OCR/parse with Tika -> index in ES
Instead, provide a ProcessingPipeline plugin that users can replace with their own implementation, e.g. MyCustomPipeline. This pipeline plugin would have a simple interface and ship with a default implementation that works exactly like today's hardcoded flow:
public interface ProcessingPipeline {
    public SomeContext processFile(SomeContext ctx);
}

public class DefaultProcessingPipeline implements ProcessingPipeline {
    @Override
    public SomeContext processFile(SomeContext ctx) {
        // Default impl goes here, e.g.
        ctx.setBodyText(parseTika(ctx));
        ctx.setEsDoc(createElasticDoc(ctx));
        return ctx;
    }

    protected SomeContext createElasticDoc(SomeContext ctx) ...
}

However, users can provide their own custom processing logic:
public class CustomProcessingPipeline extends DefaultProcessingPipeline {
    @Override
    public SomeContext processFile(SomeContext ctx) {
        // Custom impl, override what you need
        return ctx;
    }
}

Describe alternatives you've considered
Forking the project, or building something from scratch.
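Compared with forking, a pluggable pipeline would let custom logic live in one small class. Below is a minimal, self-contained sketch of how the proposed interface could behave. Everything in it is an assumption made for illustration and not existing FSCrawler code: the fields on SomeContext, the parse() helper (a trivial stand-in for Tika parsing), the String return type of createElasticDoc, and the UppercasePipeline and PipelineDemo names.

```java
import java.util.Locale;

// Hypothetical context object; the proposal only names it "SomeContext".
class SomeContext {
    private final String rawContent;
    private String bodyText;
    private String esDoc;

    SomeContext(String rawContent) { this.rawContent = rawContent; }

    String getRawContent() { return rawContent; }
    String getBodyText() { return bodyText; }
    void setBodyText(String bodyText) { this.bodyText = bodyText; }
    String getEsDoc() { return esDoc; }
    void setEsDoc(String esDoc) { this.esDoc = esDoc; }
}

interface ProcessingPipeline {
    SomeContext processFile(SomeContext ctx);
}

// Default behaviour: parse the file, then build the document to index.
class DefaultProcessingPipeline implements ProcessingPipeline {
    @Override
    public SomeContext processFile(SomeContext ctx) {
        ctx.setBodyText(parse(ctx));
        ctx.setEsDoc(createElasticDoc(ctx));
        return ctx;
    }

    // Stand-in for the Tika step: just trims the raw content.
    protected String parse(SomeContext ctx) {
        return ctx.getRawContent().trim();
    }

    // Naive JSON building for the sketch only; no escaping is performed.
    protected String createElasticDoc(SomeContext ctx) {
        return "{\"content\":\"" + ctx.getBodyText() + "\"}";
    }
}

// Custom pipeline: overrides only the step it wants to change.
class UppercasePipeline extends DefaultProcessingPipeline {
    @Override
    protected String parse(SomeContext ctx) {
        return super.parse(ctx).toUpperCase(Locale.ROOT);
    }
}

public class PipelineDemo {
    public static void main(String[] args) {
        ProcessingPipeline pipeline = new UppercasePipeline();
        SomeContext ctx = pipeline.processFile(new SomeContext("  hello  "));
        System.out.println(ctx.getEsDoc()); // prints {"content":"HELLO"}
    }
}
```

The custom class overrides a single protected step and inherits the rest, which is the main advantage over maintaining a fork. How the crawler would discover and load such a class is left open here.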
macintoshpie