# Convert to parsed structure (column headers become headings)
parsed_batch = headhunter.process_structured_df(
    df,
    id_column="patient_id",
    metadata_columns=["date"],
    # rest of the columns auto-detected as content_columns if not specified
)

# All output formats work the same as markdown parsing
parsed_batch.to_markdown("reports/")  # Creates report-style markdown files with column headers as inline colon headings and cell values as content
parsed_batch.to_dataframe()  # Long-form DataFrame in the same format as markdown parsing
```
## How Hierarchy is Built
`headhunter` recognizes different heading styles in Markdown and builds a hierarchical structure by assigning levels to each heading. The following rules govern this process:
@@ -239,3 +270,22 @@ The `to_markdown()` method converts the parsed hierarchical structure back into
- **YAML front matter**: Metadata is included as YAML front matter at the top of the document
- **Consistent spacing**: Single blank lines between sections for readability
- **Case preservation**: Original text case is maintained (including ALL CAPS)
## Structured Data Processing
In addition to parsing Markdown documents, `headhunter` can convert tabular data, such as a CSV with multiple content columns loaded into a DataFrame, into the same parsed structure. This lets you reuse the same downstream analysis logic and output formats for tabular data.
**Use cases:**
- Convert flat database exports into long-form DataFrames
- Generate Markdown reports from structured data
- Apply consistent analysis pipelines to both Markdown and tabular data sources
**How it works:**
`process_structured_df()` treats each row as a separate document and each content column as a section:
- **Column headers** become level-1 headings
- **Cell values** become level-2 content under their respective column heading
- **Empty cells (NaN)** are converted to empty strings
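
The row-to-document mapping described above can be illustrated with a small standalone sketch. It is not `headhunter`'s implementation: the helper names `is_missing` and `rows_to_documents`, the dictionary layout of each document, and the sample rows are all assumptions made for illustration, and plain dictionaries stand in for a DataFrame so the example runs without dependencies.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (how pandas marks empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_documents(rows, id_column, metadata_columns, content_columns=None):
    """Hypothetical helper: turn each row into one document with one section per content column."""
    reserved = {id_column, *metadata_columns}
    documents = []
    for row in rows:
        # If content columns are not given, treat every remaining column as content
        columns = content_columns or [c for c in row if c not in reserved]
        sections = [
            {
                "heading": column,    # column header becomes a level-1 heading
                "heading_level": 1,
                "content": "" if is_missing(row[column]) else str(row[column]),
                "content_level": 2,   # cell value sits one level below its heading
            }
            for column in columns
        ]
        documents.append(
            {
                "id": row[id_column],
                "metadata": {m: row[m] for m in metadata_columns},
                "sections": sections,
            }
        )
    return documents


# Illustrative sample data (not taken from the project)
rows = [
    {"patient_id": "p1", "date": "2024-01-01", "findings": "Normal", "impression": "No acute disease"},
    {"patient_id": "p2", "date": "2024-01-02", "findings": float("nan"), "impression": "Follow up"},
]

documents = rows_to_documents(rows, id_column="patient_id", metadata_columns=["date"])
```

Each entry in `documents` corresponds to one input row, and the missing `findings` cell in the second row ends up as an empty string rather than `NaN`.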