ClpDataSource drops matching rows and reports inflated row counts for IR streams #49

### Bug description

## Background                                                                                                                                                                               
Velox executes SQL queries as pipelines of operators. This bug concerns **`TableScan`**, the operator that drives data reading.

- **`TableScan`** fetches data in batches by repeatedly calling into a connector-provided **`DataSource`** and feeds the resulting rows into the rest of the query pipeline.

- **`DataSource`** is the connector-side interface that `TableScan` calls to actually read data. Each connector (Hive, CLP, etc.) implements this interface. The lifecycle is:
  1. `TableScan` calls `addSplit(split)` to assign a "split" to the `DataSource`.
  2. `TableScan` calls `next(size, future)` repeatedly to fetch rows from that `DataSource` in batches of 1024 rows.
  3. `next` returns a `RowVector` for each batch that contains matching rows.
  4. `next` returns `nullptr` to signal that the split is fully consumed, at which point `TableScan` moves on to the next split.
 
Note: a split must be fully drained by `next` before another split can be added.
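The lifecycle above can be sketched with a small mock. This is only an illustration of the contract as described here: `MockDataSource`, `Batch`, and `scanAll` are hypothetical names, the `future` argument of `next` is omitted, and a plain vector of integers stands in for a `RowVector`.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RowVector: the rows of one batch.
using Batch = std::vector<int>;

// Simplified mock of the connector-side interface. The real next() also
// takes a future argument, which this sketch omits.
class MockDataSource {
 public:
  // Assigns a split; the previous split must already be drained.
  void addSplit(std::vector<int> rows) {
    rows_ = std::move(rows);
    pos_ = 0;
  }

  // Returns up to `size` rows, or nullptr once the split is fully consumed.
  std::unique_ptr<Batch> next(std::size_t size) {
    if (pos_ >= rows_.size()) {
      return nullptr;
    }
    const std::size_t end = std::min(pos_ + size, rows_.size());
    auto batch =
        std::make_unique<Batch>(rows_.begin() + pos_, rows_.begin() + end);
    pos_ = end;
    return batch;
  }

 private:
  std::vector<int> rows_;
  std::size_t pos_{0};
};

// Hypothetical driver playing the role of TableScan: each split is drained
// until next() returns nullptr, and only then is the following split added.
std::size_t scanAll(
    MockDataSource& source,
    const std::vector<std::vector<int>>& splits,
    std::size_t batchSize) {
  std::size_t rowsSeen = 0;
  for (const auto& split : splits) {
    source.addSplit(split);
    while (auto batch = source.next(batchSize)) {
      rowsSeen += batch->size();
    }
  }
  return rowsSeen;
}
```

The key point is that the driver treats `nullptr` as "this split is done", so a data source that returns `nullptr` early causes the rest of the split to be skipped.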

## Problem Statement

### Bug 1: Premature split termination

`ClpDataSource::next` returns `nullptr` whenever the current batch has zero matching rows:

```
auto rowsScanned = cursor_->fetchNext(size);
auto rowsFiltered = cursor_->getNumFilteredRows();
if (rowsFiltered == 0) {
    return nullptr;
}
```

Per the `DataSource` contract, `nullptr` signals that the split is fully consumed. `TableScan` then stops calling `next` and moves on to the next split. However, `rowsFiltered == 0` only means that the current batch had no matches; later batches in the same split may still contain matching rows.

Impact: This is a correctness bug. Matching rows are silently dropped whenever a non-matching batch precedes them within the same split.

Example: An IR split has 2048 log events and the batch size is 1024. Events 1–1024 contain no matches; events 1025–2048 contain 1 match. The first `next` call scans batch 1, gets 0 filtered rows, and returns `nullptr`. The matching row in batch 2 is never returned.
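The example can be reproduced with a mock, along with one possible shape for a fix: keep fetching until a batch has matches or the split is exhausted. This is a sketch, not the actual patch. `MockCursor`, `nextBuggy`, `nextFixed`, and `drain` are hypothetical names, and the mock assumes `fetchNext` returns the number of events scanned in that call (0 at end of split), which is an assumption and differs from the cumulative count described under Bug 2.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical cursor: events_[i] is true if event i matches the filter.
class MockCursor {
 public:
  explicit MockCursor(std::vector<bool> events) : events_(std::move(events)) {}

  // Scans up to `size` events and returns how many were scanned in this
  // call (assumption: 0 once the split is exhausted).
  std::size_t fetchNext(std::size_t size) {
    const std::size_t end = std::min(pos_ + size, events_.size());
    const std::size_t scanned = end - pos_;
    filtered_ = static_cast<std::size_t>(
        std::count(events_.begin() + pos_, events_.begin() + end, true));
    pos_ = end;
    return scanned;
  }

  // Number of matching rows in the most recent batch.
  std::size_t getNumFilteredRows() const { return filtered_; }

 private:
  std::vector<bool> events_;
  std::size_t pos_{0};
  std::size_t filtered_{0};
};

// Mirrors the reported behaviour: an empty batch is reported as end of split.
std::unique_ptr<std::size_t> nextBuggy(MockCursor& cursor, std::size_t size) {
  cursor.fetchNext(size);
  const std::size_t filtered = cursor.getNumFilteredRows();
  if (filtered == 0) {
    return nullptr;
  }
  return std::make_unique<std::size_t>(filtered);
}

// Possible fix: skip empty batches and return nullptr only when the cursor
// has nothing left to scan.
std::unique_ptr<std::size_t> nextFixed(MockCursor& cursor, std::size_t size) {
  while (cursor.fetchNext(size) > 0) {
    const std::size_t filtered = cursor.getNumFilteredRows();
    if (filtered > 0) {
      return std::make_unique<std::size_t>(filtered);
    }
  }
  return nullptr;
}

// TableScan-style loop: sums matching rows until next() returns nullptr.
template <typename NextFn>
std::size_t drain(MockCursor& cursor, NextFn next, std::size_t size) {
  std::size_t total = 0;
  while (auto batch = next(cursor, size)) {
    total += *batch;
  }
  return total;
}
```

With 2048 events of which only one in the second batch matches, draining through `nextBuggy` yields 0 rows while `nextFixed` yields 1. A real fix would also need a reliable end-of-split signal from the cursor, which this sketch simply assumes.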

### Bug 2: Inflated completedRows_

`ClpIrCursor::fetchNext` returns `irDeserializer_->get_num_log_events_deserialized()`, which is a cumulative count across all invocations, not a per-batch count. However, `ClpDataSource::next` adds this directly to `completedRows_`:

```
completedRows_ += rowsScanned; // rowsScanned is cumulative
```

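This causes `completedRows_` to grow quadratically in the number of batches, because each call adds the running total instead of the number of rows scanned in that batch.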

Example: An IR split contains 3000 log events with a batch size of 1024. Every batch has matching rows.

| Call | `fetchNext` returns | Added to `completedRows_` | Resulting `completedRows_` | Correct `completedRows_` |
|------|---------------------|---------------------------|----------------------------|--------------------------|
| 1    | 1024                | 1024                      | 1024                       | 1024                     |
| 2    | 2048                | 2048                      | 3072                       | 2048                     |
| 3    | 3000                | 3000                      | 6072                       | 3000                     |

Impact: `TableScan` reads `completedRows_` via `getCompletedRows()` and records it as `rawInputPositions` in operator statistics. The inflated count causes `rawInputPositions` to over-report the number of rows scanned (~2x in this example), making query performance stats unreliable.
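The arithmetic in the table, and one way to correct it, can be sketched as follows. `CumulativeCursor`, `RowCounter`, `countBuggy`, and `countFixed` are hypothetical names used only for illustration; the mock cursor simply reproduces the cumulative return value described above.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for ClpIrCursor: fetchNext returns the cumulative
// number of deserialized events, as described in this report.
class CumulativeCursor {
 public:
  explicit CumulativeCursor(uint64_t totalEvents) : total_(totalEvents) {}

  uint64_t fetchNext(uint64_t size) {
    deserialized_ = std::min(deserialized_ + size, total_);
    return deserialized_;
  }

 private:
  uint64_t total_;
  uint64_t deserialized_{0};
};

// Hypothetical helper that turns cumulative counts into per-batch deltas.
class RowCounter {
 public:
  // Adds only the events scanned since the previous call.
  void recordCumulative(uint64_t cumulative) {
    completedRows_ += cumulative - lastCumulative_;
    lastCumulative_ = cumulative;
  }

  uint64_t completedRows() const { return completedRows_; }

 private:
  uint64_t completedRows_{0};
  uint64_t lastCumulative_{0};
};

// Reported behaviour: the cumulative value is added on every call.
uint64_t countBuggy(uint64_t totalEvents, uint64_t batchSize, int calls) {
  CumulativeCursor cursor(totalEvents);
  uint64_t completedRows = 0;
  for (int i = 0; i < calls; ++i) {
    completedRows += cursor.fetchNext(batchSize);
  }
  return completedRows;
}

// Sketch of a fix: accumulate the difference between consecutive values.
uint64_t countFixed(uint64_t totalEvents, uint64_t batchSize, int calls) {
  CumulativeCursor cursor(totalEvents);
  RowCounter counter;
  for (int i = 0; i < calls; ++i) {
    counter.recordCumulative(cursor.fetchNext(batchSize));
  }
  return counter.completedRows();
}
```

For 3000 events, a batch size of 1024, and three calls, `countBuggy` gives 6072 and `countFixed` gives 3000, matching the table. If the cumulative counter restarts for each split, the previous value tracked by such a helper would presumably need to be reset when a new split is added. An alternative is to have the cursor itself return a per-batch count.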



### System information

Partial result is returned, incorrect functionality

### Relevant logs

```bash

```
