
Unblock String API workflows by allowing schema inference from string-based column access #1808

@koperagen

Description

Problem

The String API is intended as a fallback and an incremental migration path to the type-safe API.

However, the compiler plugin currently does not learn schema information from string-based column access. As a result:

  • code like "full_name"<String>() cannot be replaced with full_name
  • users cannot progressively move from String API to typed API
  • String API and compiler plugin workflows are disconnected

This blocks the intended usage pattern where users start with String API and gradually adopt type-safe access.

Expected

Allow the compiler plugin to infer schema information from string-based column access.

Proposed solution

Introduce an operation (e.g. require { ... }) that:

  • asserts presence and type of columns accessed via String API
  • updates schema information for the compiler plugin
  • enables further usage of typed accessors

Example:

df.require { "full_name"() }
df.full_name // becomes available after require
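To make the intended semantics concrete, here is a minimal runtime sketch of what require asserts, using a plain Map-backed stand-in for DataFrame (the names SimpleFrame and requireColumn are hypothetical, not library API). The real operation would additionally feed the discovered column type to the compiler plugin; this sketch only models the fail-fast assertion that justifies typed access afterwards.

```kotlin
// Stand-in for a dataframe: column name -> list of values (assumption, not the library type).
class SimpleFrame(val columns: Map<String, List<Any?>>) {
    // Asserts that `name` exists and every value is a T; returns the typed column.
    // This mirrors require's contract: if it returns normally, typed access is safe.
    inline fun <reified T> requireColumn(name: String): List<T> {
        val col = columns[name] ?: error("Column '$name' not found")
        return col.map {
            it as? T ?: error("Column '$name' is not of type ${T::class.simpleName}")
        }
    }
}

fun demo(): String {
    val df = SimpleFrame(mapOf("full_name" to listOf("Kotlin/dataframe", "Kotlin/kotlinx.coroutines")))
    // After the assertion succeeds, the column can be used with its asserted type.
    val fullNames = df.requireColumn<String>("full_name")
    return fullNames.first().substringAfterLast("/")
}
```

In the compiler-plugin setting the same check would happen once at the require call site, after which full_name participates in the inferred schema for the rest of the pipeline.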

Acceptance criteria

  • Compiler plugin can learn column types from string-based access
  • Typed accessors become available after schema is inferred
  • String API remains usable as fallback
  • Example pipeline using incremental migration compiles successfully
  • Documentation explains migration path from String API to typed API

Motivation

String API is a key fallback and onboarding mechanism.

Without schema inference:

  • users are forced to fully define @DataSchema upfront
  • incremental migration is not possible
  • compiler plugin loses a major part of its usability

This is critical for enabling real-world adoption and should be addressed before 1.0.

With the compiler plugin, ideally we want to know the schema as early as possible so the plugin can help transform it. However, forcing users to generate a DataSchema for all columns creates an entry barrier that is undesirable for simple pipelines. This was the motivation for the String API support added earlier. With it, our sample pipeline incrementally evolves the schema:

val repos = DataFrame
    .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")

repos
    .add("name") { "full_name"<String>().substringAfterLast("/") }
    .filter { name.lowercase().contains("kotlin") }

val reposUpdated = repos
    .renameToCamelCase()
    .rename { "stargazersCount"<Int>() }.into("stars")
    .filter { stars > 50 }
    .convert { "topics"<String>() }.with {
        val inner = it.removeSurrounding("[", "]")
        if (inner.isEmpty()) emptyList() else inner.split(',').map(String::trim)
    }
    .add("topicCount") { topics.size }
    .add("kind") { getKind("fullName"(), topics) }

reposUpdated.writeCsv("jetbrains_repositories_new.csv")

The only thing lacking here is a convenient way to tell the plugin that there is a full_name: String column, so that we could further replace

.add("name") { "full_name"<String>().substringAfterLast("/") } => .add("name") { full_name.substringAfterLast("/") }
.add("kind") { getKind("fullName"(), topics) } => .add("kind") { getKind(fullName, topics) }

Proposed solution: a require operation, as an addition to cast and convertTo

val repos = DataFrame
    .readCsv("https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv")
    .require { "full_name"<String>() }

repos.full_name // now we can call full_name because otherwise require would've failed

The main difference: require adds new schema information rather than substituting it, as cast and convertTo do.
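That additive-vs-substitutive distinction can be sketched on a toy schema model (a map from column name to type name; the helper names substitute and requireMerge are illustrative, not library API). cast and convertTo replace everything the plugin knew, while require only adds the asserted columns to the existing knowledge.

```kotlin
// Toy schema model: column name -> type name (assumption for illustration only).
typealias Schema = Map<String, String>

// cast/convertTo-like: the declared schema fully replaces what was known before.
fun substitute(known: Schema, declared: Schema): Schema = declared

// require-like: asserted columns are merged into what is already known.
fun requireMerge(known: Schema, asserted: Schema): Schema = known + asserted
```

For example, with known = {stars: Int}, asserting {full_name: String} via substitute leaves only full_name, whereas requireMerge keeps both stars and full_name available to typed accessors.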

Metadata

Labels: API (if it touches our API), Compiler plugin (anything related to the DataFrame Compiler Plugin), enhancement (new feature or request)
