@GergesHany GergesHany commented Oct 2, 2025

Description

  • Enhanced filter regex to handle quoted column names with double quotes and backticks
  • Added support for column names containing hyphens, spaces, dots, and other special characters
  • Updated GetFilter() method to extract column names from multiple capture groups
  • Added comprehensive test suite with 15+ test cases covering various scenarios:
    • Quoted column names with spaces and special characters
    • Mixed quoting styles (double quotes and backticks)
    • Complex filters with AND/OR logical operators
    • Edge cases and error conditions
  • Resolves issue with parsing filters for columns like "user-id", "order date", "item.price"

Examples now supported:

  • "user-id" > 5 and `order date` = "2024-01-01"
  • `item.price` >= 100.50 or "column name" != "value"
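To make the idea above concrete, here is a minimal sketch of how a single condition with a quoted column name might be matched. It is not the code from this PR: the `Condition` type, `conditionRe`, `firstNonEmpty`, and `parseCondition` are hypothetical names, and the exact grammar (for example, keeping the quotes on the value) is an assumption. Go's `regexp` package has no backreferences, so each quoting style gets its own capture group and the column name is taken from whichever group matched.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Condition is a hypothetical stand-in for the project's condition type.
type Condition struct {
	Column   string
	Operator string
	Value    string
}

// conditionRe matches: column, operator, value.
// Groups 1-3 hold the column name for the double-quoted, backtick-quoted,
// and unquoted forms; group 4 is the operator; group 5 is the value.
var conditionRe = regexp.MustCompile(
	`^(?:"([^"]+)"|` + "`([^`]+)`" + `|([A-Za-z_][A-Za-z0-9_]*))` +
		`\s*(>=|<=|!=|=|>|<)\s*` +
		`("[^"]*"|[^\s"=<>!]+)$`)

// firstNonEmpty returns the first non-empty capture group, or "".
func firstNonEmpty(groups ...string) string {
	for _, g := range groups {
		if g != "" {
			return g
		}
	}
	return ""
}

// parseCondition parses one condition such as `"user-id" > 5`.
// The value is returned as matched, quotes included (an assumption).
func parseCondition(input string) (Condition, error) {
	m := conditionRe.FindStringSubmatch(strings.TrimSpace(input))
	if m == nil {
		return Condition{}, fmt.Errorf("invalid filter condition: %q", input)
	}
	return Condition{
		Column:   firstNonEmpty(m[1], m[2], m[3]),
		Operator: m[4],
		Value:    m[5],
	}, nil
}
```

Because the value pattern excludes operator characters, an input such as `a === b` fails to match instead of being read as the operator `=` followed by the value `== b`.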

Fixes #498

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Screenshot from 2025-10-02 17-15-55

Note

Improves filter parsing to support quoted column names, broader value formats, logical operators, and adds a comprehensive test suite.

  • Filter parsing (types/stream_configured.go):
    • Trim input filter via strings.TrimSpace.
    • Replace regex to support double-quoted column names (incl. spaces, hyphens, dots) and unquoted identifiers; accept quoted strings, negative/decimal numbers, scientific notation, NULL values.
    • Preserve case of logical operators while matching case-insensitively (and/or).
    • Extract column via helper to combine capture groups; adjust indices and LogicalOperator handling.
  • Tests (types/stream_configured_test.go):
    • Add extensive test suite covering valid/invalid cases: quoted identifiers/values, mixed spacing, operator variants, numeric formats, logical operators, edge/error conditions.
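The steps in the note (trim the input, match `and`/`or` case-insensitively while keeping the original spelling, combine capture groups for the column) can be sketched roughly as follows. This is an illustration under assumptions, not the contents of types/stream_configured.go: the `cond` pattern, `filterRe`, `column`, and `parseFilter` are hypothetical, the `Filter` and `Condition` types are simplified stand-ins, and only double-quoted and unquoted column names are handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Simplified stand-ins for the project's types (assumed shapes).
type Condition struct{ Column, Operator, Value string }

type Filter struct {
	Conditions      []Condition
	LogicalOperator string
}

// cond matches one condition and contributes four capture groups:
// quoted column, unquoted column, operator, value.
const cond = `(?:"([^"]+)"|([A-Za-z_][A-Za-z0-9_]*))` +
	`\s*(>=|<=|!=|=|>|<)\s*("[^"]*"|[^\s"=<>!]+)`

// filterRe accepts one condition, optionally followed by and/or and a second
// condition. (?i:...) makes only the operator case-insensitive; the outer
// capture group keeps the text exactly as the user typed it.
var filterRe = regexp.MustCompile(
	`^` + cond + `(?:\s+((?i:and|or))\s+` + cond + `)?$`)

// column picks whichever of the two column capture groups matched.
func column(quoted, unquoted string) string {
	if quoted != "" {
		return quoted
	}
	return unquoted
}

// parseFilter parses at most two conditions joined by and/or.
func parseFilter(input string) (Filter, error) {
	m := filterRe.FindStringSubmatch(strings.TrimSpace(input))
	if m == nil {
		return Filter{}, fmt.Errorf("invalid filter: %q", input)
	}
	f := Filter{Conditions: []Condition{
		{Column: column(m[1], m[2]), Operator: m[3], Value: m[4]},
	}}
	if m[5] != "" { // a logical operator and second condition are present
		f.LogicalOperator = m[5]
		f.Conditions = append(f.Conditions,
			Condition{Column: column(m[6], m[7]), Operator: m[8], Value: m[9]})
	}
	return f, nil
}
```

Anchoring the pattern with `^` and `$` means a third condition leaves unmatched input, so a filter like `a = 1 and b = 2 and c = 3` is rejected rather than silently truncated.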

Written by Cursor Bugbot for commit a61b0fc.

CLAassistant commented Oct 2, 2025

CLA assistant check
All committers have signed the CLA.

@vaibhav-datazip
Collaborator

Hi @GergesHany ,
The PR needs to be raised against the staging branch, not master. Can you please fix this?

@GergesHany GergesHany changed the base branch from master to staging October 3, 2025 14:14
@GergesHany
Author

@vaibhav-datazip Done

@vaibhav-datazip vaibhav-datazip added the hacktoberfest Issues open for Hacktoberfest contributors label Oct 7, 2025
@GergesHany
Author

Hi @vaibhav-datazip, just a quick reminder to review my PR when you get a chance!

@vaibhav-datazip
Collaborator

Hi @vaibhav-datazip, just a quick reminder to review my PR when you get a chance!

Will review your PR soon, @GergesHany.

@vaibhav-datazip
Collaborator

LGTM

@GergesHany
Author

GergesHany commented Oct 10, 2025

LGTM

@vaibhav-datazip Thanks for the approval! I hope we can proceed with the merge soon.

Comment on lines 382 to 386
{
	name: "three conditions (not supported)",
	filter: "a = 1 and b = 2 and c = 3",
	expectError: true,
},
Collaborator

This case has already been added.

Collaborator

Please remove this, as you have already added this case above under the name "too many conditions".

@Akshay-datazip
Collaborator

Hi @GergesHany, feel free to join our Slack to take this PR forward if you are facing issues:
https://join.slack.com/t/getolake/shared_invite/zt-2uyphqf69-KQxih9Gwd4GCQRD_XFcuyw

@GergesHany
Author

@vaibhav-datazip Done

@vaibhav-datazip
Collaborator

vaibhav-datazip commented Nov 5, 2025

@vaibhav-datazip Done

Thanks @GergesHany, will review soon.

Comment on lines 627 to 645
func TestGetFilterManual(t *testing.T) {
	testCases := []string{
		`status = "active"`,
		`"user-id" > 5`,
		"`order date` = \"2024-01-01\"",
		`"user-id" > 5 and ` + "`item.price` <= 100",
		`age >= 18 or status != "inactive"`,
	}

	for _, filter := range testCases {
		cs := &ConfiguredStream{
			StreamMetadata: StreamMetadata{
				Filter: filter,
			},
		}

		result, err := cs.GetFilter()
		if err != nil {
			t.Logf("Filter: %s -> ERROR: %v", filter, err)
Collaborator

We recently merged unit tests into staging. Please take a look at them and update your tests to be consistent; we are using testify for assertions.

Comment on lines 644 to 665
func TestGetFilterManual(t *testing.T) {
	testCases := []string{
		`status = "active"`,
		`"user-id" > 5`,
		`age >= 18 or status != "inactive"`,
	}

	for _, filter := range testCases {
		cs := &ConfiguredStream{
			StreamMetadata: StreamMetadata{
				Filter: filter,
			},
		}

		result, err := cs.GetFilter()
		if err != nil {
			t.Logf("Filter: %s -> ERROR: %v", filter, err)
		} else {
			t.Logf("Filter: %s -> Conditions: %+v, LogicalOp: %s", filter, result.Conditions, result.LogicalOperator)
		}
	}
}
Collaborator

Why do we need this manual testing function when you have already added another function that tests GetFilter?

Comment on lines 3 to 7
import (
"strings"
"testing"
)

Collaborator

Please use testify for checking the test cases. If in doubt, you can refer to the following unit test files:

  • olake/types/catalog_test.go
  • olake/utils/typeutils/resolver_test.go
  • olake/utils/typeutils/flatten_test.go
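For reference, a table-driven test using testify's `assert` package generally looks like the sketch below. It assumes the `github.com/stretchr/testify` module is available and would live in a `_test.go` file; `parseSimple` is a hypothetical stand-in for the function under test (it is not `GetFilter`), included only so the example is self-contained.

```go
package main

import (
	"fmt"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

// parseSimple is a hypothetical stand-in for the function under test.
// It splits "column operator value" on whitespace and nothing more.
func parseSimple(filter string) ([3]string, error) {
	parts := strings.Fields(filter)
	if len(parts) != 3 {
		return [3]string{}, fmt.Errorf("expected 'column operator value', got %q", filter)
	}
	return [3]string{parts[0], parts[1], parts[2]}, nil
}

func TestParseSimple(t *testing.T) {
	testCases := []struct {
		name        string
		filter      string
		expected    [3]string
		expectError bool
	}{
		{name: "unquoted column", filter: "col = 5", expected: [3]string{"col", "=", "5"}},
		{name: "missing value", filter: "col =", expectError: true},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			got, err := parseSimple(tc.filter)
			if tc.expectError {
				assert.Error(t, err)
				return
			}
			assert.NoError(t, err)
			// One assert.Equal replaces the per-field if/t.Errorf checks.
			assert.Equal(t, tc.expected, got)
		})
	}
}
```

`assert.Equal` compares whole values and prints a diff on failure, which is why a single call can stand in for separate column, operator, and value comparisons.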

},
expectError: false,
},
// Supports double-quoted column identifiers (e.g., contains hyphen).
Collaborator

One more test case can be added (if it's not already present) where the column name is not passed in quotes.

},
// Simple comparison without spaces around operator.
{
name: "test case from user: a>b",
Collaborator

Why is "test case from user:" written in some of the test case names? We can give them better names.

Comment on lines 436 to 465
{
	name: "triple equals with greater than",
	filter: "a >=== b",
	expectError: true,
},
// Nonsensical operator sequences (four equals with >) should error.
{
	name: "four equals with greater than",
	filter: "a ====> b",
	expectError: true,
},
// Nonsensical operator sequences (four equals with <) should error.
{
	name: "four equals with less than",
	filter: "a ====< b",
	expectError: true,
},
// Nonsensical operator sequences mixing equals and not-equals should error.
{
	name: "triple equals with not equal",
	filter: "a ===!= b",
	expectError: true,
},
// Multiple equals signs (===) are invalid in this grammar.
{
	name: "multiple equals signs",
	filter: "a === b",
	expectError: true,
},
// Quadruple equals are invalid and should error.
Collaborator

The names of these cases don't look good. You can use "multiple" instead of "triple" or "quadruple", and also remove the redundant-looking cases from here.

filter: `"col\"name" = "val\"ue"`,
expectError: true,
},

Collaborator

remove unnecessary line breaks

Comment on lines 564 to 569
// Missing value after '=' should error.
{
	name: "empty value after operator",
	filter: "col = ",
	expectError: true,
},
Collaborator

This kind of test case is already present, named "missing value".

Comment on lines 571 to 582
{
	name: "unquoted column with trailing space before op",
	filter: "col = 5",
	expectedFilter: Filter{
		Conditions: []Condition{
			{Column: "col", Operator: "=", Value: "5"},
		},
		LogicalOperator: "",
	},
	expectError: false,
},
}
Collaborator

This test case is also covered by "excessive whitespace".

Comment on lines 627 to 637
if actualCondition.Column != expectedCondition.Column {
	t.Errorf("Condition[%d] Column mismatch: expected %q, got %q", i, expectedCondition.Column, actualCondition.Column)
}

if actualCondition.Operator != expectedCondition.Operator {
	t.Errorf("Condition[%d] Operator mismatch: expected %q, got %q", i, expectedCondition.Operator, actualCondition.Operator)
}

if actualCondition.Value != expectedCondition.Value {
	t.Errorf("Condition[%d] Value mismatch: expected %q, got %q", i, expectedCondition.Value, actualCondition.Value)
}
Collaborator

Instead of these, you can use assert from the testify library.

@vaibhav-datazip
Collaborator

please also check older unresolved comments

@GergesHany
Author

please also check older unresolved comments

@vaibhav-datazip

All comments have been resolved.
Let me know if there’s anything else needed.

Labels

hacktoberfest Issues open for Hacktoberfest contributors

Development

Successfully merging this pull request may close these issues.

fix: handle special characters in column name in filter

5 participants