Describe Your Proposed Tutorial
Issue Description
Current State
- Users cannot clean their data in VDP with a simple flow.
Why We Want to Change?
- We want to exclude some data to make the chunks cleaner, which can improve the efficiency of RAG.
Proposed Change
- Please fetch this JSON Schema to implement the functions.
Pseudo Recipe
# VDP Version
version: v1beta
component:
  text-0:
    type: text
    input:
      # Array of text to be cleaned.
      texts:
      setting:
        # Option 1
        clean-method: Regex
        # When a text matches one of these patterns, it is removed from the array of text.
        exclude-patterns:
        # When a text matches one of these patterns, it is kept in the array of text.
        include-patterns:
        # Option 2 (alternative to option 1)
        clean-method: Substring
        # When a text contains one of these substrings, it is removed from the array of text.
        exclude-substrings:
        # When a text contains one of these substrings, it is kept in the array of text.
        include-substrings:
        # A flag indicating whether substring matching is case-sensitive. The default value is false.
        # When it is true, "cat" matches only 'cat', not 'Cat' or 'CAT'.
        # When it is false, "cat" matches 'cat', 'Cat', 'CAT', or any other variation of uppercase and lowercase letters.
        case-sensitive:
    condition:
    task: TASK_CLEAN_DATA
Rules for the Component Hackathon
- Each issue will only be assigned to one person/team at a time.
- You can only work on one issue at a time.
- To express interest in an issue, please comment on it and tag @kuroxx, allowing the Instill AI team to assign it to you.
- Ensure you address all feedback and suggestions provided by the Instill AI team.
- If no commits are made within five days, the issue may be reassigned to another contributor.
- Join our Discord to engage in discussions and seek assistance in the #hackathon channel. For technical queries, you can tag @chuang8511.
Component Contribution Guideline | Documentation | Official Go Tutorial