Commit 82bb2fa
zephyr: widen inferred parquet schema via pa.unify_schemas
``_accumulate_tables`` infers its schema from the first micro-batch
(``_MICRO_BATCH_SIZE=8``). If those first records happen to have ``None``
for a field — or to lack a field that appears later — downstream batches
that would legitimately widen the schema either crashed with
``ArrowInvalid: Invalid null value`` or (in the new-field case) were
silently truncated by ``pa.Table.from_pylist``.
On mismatch, widen the inferred schema with ``pa.unify_schemas`` and
reconcile chunks on yield via
``pa.concat_tables(promote_options="permissive")``. Surface genuine
incompatibilities (e.g. int vs string) as errors that show both schemas
and where the inferred schema came from, so operators can diagnose
without extra instrumentation.
An explicit caller-provided schema is treated as a contract: mismatches
raise without silent widening.
Tests cover: null→concrete widening, new-field-appears-later (previously
silently dropped), and int-vs-string conflict surfacing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent ef51f83 commit 82bb2fa
2 files changed
Lines changed: 119 additions & 6 deletions