@osopardo1 osopardo1 commented Dec 16, 2024

Description

Fixes #515.

In this PR, we add a way to build a columnStats schema from the current column transformers and the actual schema of the data. We want to:

  • Ensure the fields are properly parsed from the JSON string.
  • Treat a null row returned for a specified JSON string as a sign that the string does not follow the correct syntax.

To do that, I've added a QbeastColumnStats case class that holds the columnStatsSchema and the columnStatsRow, plus a QbeastColumnStatsBuilder that builds both from the JSON string, the column transformers, and the data schema.

case class QbeastColumnStats(columnStatsSchema: StructType, columnStatsRow: Row)


object QbeastColumnStatsBuilder {
  /**
   * Builds the QbeastColumnStats
   *
   * @param statsString
   *   the stats in a JSON string
   * @param columnTransformers
   *   the set of columnTransformers to build the Stats from
   * @param dataSchema
   *   the data schema to build the Stats from
   * @return
   *   the QbeastColumnStats with the columnStatsSchema and the columnStatsRow
   */
  def build(
      statsString: String,
      columnTransformers: Seq[Transformer],
      dataSchema: StructType): QbeastColumnStats
}
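
For reviewers who want a concrete picture of the flow described above, here is a minimal sketch. It is not the code in this PR. It assumes that each indexed column contributes a `<column>_min` and a `<column>_max` field typed like the column in the data schema, and it takes plain column names instead of `Transformer` instances. `ColumnStatsSketch`, `ColumnStatsSketchBuilder`, and the extra `spark` parameter are hypothetical names used only for illustration. It relies on Spark's default PERMISSIVE JSON mode, where a string that cannot be parsed against the supplied schema comes back as a row of nulls.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StructField, StructType}

// Simplified stand-in for the QbeastColumnStats case class (hypothetical name).
case class ColumnStatsSketch(schema: StructType, row: Row)

object ColumnStatsSketchBuilder {

  // Assumption: every indexed column yields `<column>_min` and `<column>_max`,
  // each with the same data type the column has in the data schema.
  def statsSchema(indexedColumns: Seq[String], dataSchema: StructType): StructType =
    StructType(indexedColumns.flatMap { name =>
      val dataType = dataSchema(name).dataType // throws if the column does not exist
      Seq(
        StructField(s"${name}_min", dataType, nullable = true),
        StructField(s"${name}_max", dataType, nullable = true))
    })

  def build(
      spark: SparkSession,
      statsString: String,
      indexedColumns: Seq[String],
      dataSchema: StructType): ColumnStatsSketch = {
    import spark.implicits._
    val schema = statsSchema(indexedColumns, dataSchema)
    if (statsString.trim.isEmpty) {
      // No stats were specified, so there is nothing to parse or validate.
      ColumnStatsSketch(schema, Row.fromSeq(Seq.fill(schema.size)(null)))
    } else {
      // Parse the JSON string with the typed schema instead of inferring types.
      val rows = spark.read.schema(schema).json(Seq(statsString).toDS()).collect()
      // A specified string that yields only nulls is treated as malformed.
      val allNull = rows.isEmpty || rows.head.toSeq.forall(_ == null)
      require(
        !allNull,
        s"columnStats does not match ${schema.simpleString}: $statsString")
      ColumnStatsSketch(schema, rows.head)
    }
  }
}
```

With a data schema that has a `FloatType` column `a`, calling `build(spark, """{"a_min":0.5,"a_max":2.5}""", Seq("a"), dataSchema)` would parse both values as floats because the stats schema takes its types from the data schema rather than from JSON inference.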

Type of change

Bug fix.

Checklist:

Here is the list of things you should do before submitting this pull request:

  • New feature / bug fix has been committed following the Contribution guide.
  • Add logging to the code following the Contribution guide.
  • Add comments to the code (make it easier for the community!).
  • Change the documentation.
  • Add tests.
  • Your branch is updated to the main branch (dependent changes have been merged).

How Has This Been Tested? (Optional)

Tested different parsing cases in QbeastColumnStatsTestBuilder.

@osopardo1 osopardo1 changed the title from "Issue #515: Introduce ColumnStats schema for parsing" to "Issue #515: Add ColumnStats Schema for JSON parsing" Dec 16, 2024
# Conflicts:
#	core/src/main/scala/io/qbeast/spark/index/SparkRevisionFactory.scala
#	src/main/scala/io/qbeast/table/IndexedTable.scala
#	src/test/scala/io/qbeast/spark/index/SparkRevisionFactoryTest.scala
@osopardo1 osopardo1 requested review from Jiaweihu08 and removed request for Jiaweihu08 December 20, 2024 07:58
codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.50%. Comparing base (b2e2f85) to head (81ab4a2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #522   +/-   ##
=======================================
  Coverage   88.50%   88.50%           
=======================================
  Files          21       21           
  Lines         774      774           
  Branches      115      115           
=======================================
  Hits          685      685           
  Misses         89       89           


@osopardo1 osopardo1 requested a review from Jiaweihu08 December 20, 2024 10:12
@osopardo1 osopardo1 marked this pull request as ready for review December 20, 2024 10:12
@Qbeast-io Qbeast-io deleted a comment from osopardo1 Jan 10, 2025
@osopardo1 osopardo1 requested a review from Jiaweihu08 January 10, 2025 14:24
@Jiaweihu08 Jiaweihu08 merged commit 6cfe206 into Qbeast-io:main Jan 16, 2025
1 check passed

Development

Successfully merging this pull request may close these issues.

I can't append on a table indexed with float columns. It is not possible to define columnsStats for float columns

2 participants