Issues: data-prep-kit/data-prep-kit
Issues list
[Feature] Refactor 'examples/notebooks/pdf-processing-1'
enhancement
New feature or request
#1129
opened Mar 13, 2025 by
sujee
2 tasks done
[Feature] Allow users to define and use specialized data access classes
enhancement
New feature or request
#1128
opened Mar 13, 2025 by
touma-I
2 tasks done
[Feature] Contribute Bring Your Own Logic Transform
enhancement
New feature or request
#1127
opened Mar 12, 2025 by
santoshborse
2 tasks done
Improve the notebooks for rep_removal to include a list of command line parameters
enhancement
New feature or request
under-review
#1118
opened Mar 10, 2025 by
shahrokhDaijavad
2 tasks done
[Feature] Ability to ingest XML/JATS using Docling and pdf2parquet
enhancement
New feature or request
sprint-Mar-21
#1107
opened Mar 6, 2025 by
touma-I
1 of 2 tasks
[Feature] Investigate new approach for further simplification by eliminating Python runtime, Ray runtime, and Spark runtime
enhancement
New feature or request
sprint-Mar-21
#1105
opened Mar 6, 2025 by
touma-I
1 of 2 tasks
Bring in cookbooks, recipes, and scripts for post-processing applications to DPK
enhancement
New feature or request
under-review
#1104
opened Mar 5, 2025 by
shahrokhDaijavad
1 of 2 tasks
[Feature] Enable in-memory chaining and parallel execution of transforms
enhancement
New feature or request
sprint-Mar-21
#1102
opened Mar 5, 2025 by
touma-I
2 tasks done
[Feature] Analyze transforms on the inner and convert to outer
enhancement
New feature or request
sprint-Apr-11
#1096
opened Mar 4, 2025 by
touma-I
1 of 2 tasks
Update the main README table with the list of new GneissWeb Transforms
enhancement
New feature or request
sprint-Mar-7
#1069
opened Feb 26, 2025 by
shahrokhDaijavad
1 of 2 tasks
On-boarding Multi-lingual transforms to DPK
enhancement
New feature or request
sprint-Mar-21
#1065
opened Feb 25, 2025 by
shahrokhDaijavad
1 of 2 tasks
[Feature] New DPK transform to get the distributions of quality metrics
enhancement
New feature or request
sprint-Mar-21
#1045
opened Feb 11, 2025 by
Hajar-Emami
1 of 2 tasks
[Feature] Filter both the parquet and arrow files and update the metadata simultaneously
enhancement
New feature or request
Pending
#1044
opened Feb 11, 2025 by
Hajar-Emami
1 of 2 tasks
[Feature] Enable crawling of websites that require credentials via SSO or 2FA
enhancement
New feature or request
Pending
#1040
opened Feb 11, 2025 by
touma-I
1 of 2 tasks
[Feature] Enabling gneissweb_classification transform by using multiple fasttext classifiers simultaneously
enhancement
New feature or request
sprint-Mar-7
#1034
opened Feb 10, 2025 by
Hajar-Emami
1 of 2 tasks
[Feature] Update PII sample notebook to use simple APIs
enhancement
New feature or request
sprint-Mar-7
#1032
opened Feb 10, 2025 by
sujee
2 tasks done
On-boarding Multimodal transforms to DPK
enhancement
New feature or request
sprint-Apr-11
#1020
opened Feb 6, 2025 by
shahrokhDaijavad
1 of 2 tasks
Improve performance of the Readability transform
enhancement
New feature or request
sprint-Mar-7
#1015
opened Feb 5, 2025 by
shahrokhDaijavad
1 of 2 tasks
Consistency of defined configuration parameters with the CLI Options in all transforms READMEs and Notebooks
enhancement
New feature or request
#1002
opened Jan 30, 2025 by
shahrokhDaijavad
2 tasks done
[Feature] How to find which DPK 'modules' are installed
enhancement
New feature or request
sprint-Mar-21
#996
opened Jan 29, 2025 by
sujee
1 of 2 tasks
Develop a notebook that creates a pipeline (recipe) for running new GneissWeb transforms in sequence on some data of your choosing.
enhancement
New feature or request
gneiss web
sprint-Jan31
#983
opened Jan 27, 2025 by
shahrokhDaijavad
1 of 2 tasks
Supporting data access to Hugging Face datasets
enhancement
New feature or request
#964
opened Jan 23, 2025 by
blublinsky
2 tasks done
[Feature] Grow core library and transforms to enable easy launching of a transform in a runtime from a .py file
enhancement
New feature or request
#931
opened Jan 9, 2025 by
daw3rd
2 tasks done
[Feature] New transform to annotate with any classifier model with multi-classifier support
enhancement
New feature or request
gneiss web
sprint-Jan31
#924
opened Jan 8, 2025 by
Harmedox
1 of 2 tasks
[Feature] New transform to annotate with readability scores
enhancement
New feature or request
gneiss web
sprint-Jan31
#923
opened Jan 8, 2025 by
Harmedox
1 of 2 tasks