Commit 51f645b

committed
Adds I/O guide and api docs
1 parent 44f6ce6 commit 51f645b

5 files changed: +156 -18 lines changed

daft/io/source.py

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ def get_tasks(self, pushdowns: Pushdowns) -> Iterator[DataFrameSourceTask]:
         ...
 
     def to_dataframe(self) -> DataFrame:
-        """Creates a Daft DataFrame from a DataFrameSource implementation."""
+        """Creates a Daft DataFrame from this DataFrameSource."""
         from daft.io.__shim import _to_dataframe
 
         return _to_dataframe(self)
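The `to_dataframe` docstring above sits on top of a simple contract: `get_tasks(pushdowns)` yields independent `DataFrameSourceTask` partitions, and `to_dataframe` turns the source into a DataFrame. The stdlib-only sketch below models that flow as a rough illustration. `ToySource`, `ToyTask`, and `ToyPushdowns` are hypothetical stand-ins, not Daft's API, and a plain list of dicts stands in for a DataFrame.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ToyPushdowns:
    """Hypothetical stand-in for daft.io.pushdowns.Pushdowns (projection only)."""
    columns: Optional[list[str]] = None  # None means "read all columns"


@dataclass
class ToyTask:
    """Hypothetical partition: one chunk of rows that can be read independently."""
    rows: list[dict]
    pushdowns: ToyPushdowns

    def read(self) -> list[dict]:
        # Apply the projection while reading, so unrequested columns are never produced.
        cols = self.pushdowns.columns
        if cols is None:
            return [dict(r) for r in self.rows]
        return [{c: r[c] for c in cols} for r in self.rows]


class ToySource:
    """Hypothetical source that splits its rows into fixed-size tasks."""

    def __init__(self, rows: list[dict], rows_per_task: int) -> None:
        self.rows = rows
        self.rows_per_task = rows_per_task

    def get_tasks(self, pushdowns: ToyPushdowns) -> Iterator[ToyTask]:
        # Each task covers a disjoint slice of rows, so tasks do not depend on each other.
        for start in range(0, len(self.rows), self.rows_per_task):
            yield ToyTask(self.rows[start:start + self.rows_per_task], pushdowns)

    def to_dataframe(self) -> list[dict]:
        # Stand-in for building a DataFrame: read every task with no pushdowns
        # and concatenate the results in task order.
        out: list[dict] = []
        for task in self.get_tasks(ToyPushdowns()):
            out.extend(task.read())
        return out


source = ToySource([{"id": i, "name": f"n{i}"} for i in range(5)], rows_per_task=2)
tasks = list(source.get_tasks(ToyPushdowns(columns=["id"])))
print(len(tasks))                  # 3
print(tasks[0].read())             # [{'id': 0}, {'id': 1}]
print(len(source.to_dataframe()))  # 5
```

Splitting into tasks up front is what lets an engine schedule partitions in parallel; the pushdowns are handed to every task so each one can skip work on its own.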

docs/api/io.md

Lines changed: 64 additions & 16 deletions
@@ -1,31 +1,38 @@
 # I/O
 
-Daft supports diverse input sources and output sinks which are covered in [DataFrame Creation](dataframe_creation.md)
-this page covers lower-level APIs which we are evolving for more advanced usage.
+Daft supports diverse input sources and output sinks, some of which are covered in [DataFrame Creation](dataframe_creation.md).
 
-!!! warning "Warning"
+## Input
 
-    These APIs are considered experimental.
+<!-- from_ -->
 
-## Sources
+::: daft.from_arrow
+    options:
+        heading_level: 3
 
-::: daft.io.read_parquet
+::: daft.from_dask_dataframe
     options:
         heading_level: 3
 
-::: daft.io.read_csv
+::: daft.from_pandas
     options:
         heading_level: 3
 
-::: daft.io.read_json
+::: daft.from_pydict
     options:
         heading_level: 3
 
-::: daft.io.read_warc
+::: daft.from_pylist
     options:
         heading_level: 3
 
-::: daft.io.read_iceberg
+::: daft.from_ray_dataset
+    options:
+        heading_level: 3
+
+<!-- read_ -->
+
+::: daft.io.read_csv
     options:
         heading_level: 3
 
@@ -37,31 +44,72 @@ this page covers lower-level APIs which we are evolving for more advanced usage.
     options:
         heading_level: 3
 
-::: daft.io.read_sql
+::: daft.io.read_iceberg
+    options:
+        heading_level: 3
+
+::: daft.io.read_json
     options:
         heading_level: 3
 
 ::: daft.io.read_lance
     options:
         heading_level: 3
 
+::: daft.io.read_parquet
+    options:
+        heading_level: 3
+
-## Interfaces
+::: daft.io.read_sql
+    options:
+        heading_level: 3
 
-::: daft.io.source.DataSource
+::: daft.io.read_warc
+    options:
+        heading_level: 3
+
+## Output
+
+<!-- write_ -->
+
+::: daft.dataframe.DataFrame.write_csv
+    options:
+        heading_level: 3
+
+::: daft.dataframe.DataFrame.write_deltalake
+    options:
+        heading_level: 3
+
+::: daft.dataframe.DataFrame.write_iceberg
+    options:
+        heading_level: 3
+
+::: daft.dataframe.DataFrame.write_lance
+    options:
+        heading_level: 3
+
+::: daft.dataframe.DataFrame.write_parquet
+    options:
+        heading_level: 3
+
+## User-Defined
+
+!!! warning "Warning"
+
+    These APIs are considered experimental.
+
+::: daft.io.source.DataFrameSource
     options:
         filters: ["!^_"]
         heading_level: 3
 
-::: daft.io.source.DataSourceTask
+::: daft.io.source.DataFrameSourceTask
     options:
         filters: ["!^_"]
         heading_level: 3
 
 ## Pushdowns
 
-Daft has predicate, projection, and limit pushdowns with expressions being represented by *Terms*. Learn more about [Pushdowns](../advanced/pushdowns.md) in the Daft User Guide.
-
 ::: daft.io.pushdowns.Pushdowns
     options:
         filters: ["!^_"]
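The paragraph dropped from `docs/api/io.md` named three kinds of pushdown: predicate, projection, and limit. As a rough illustration of what each buys, the stdlib-only sketch below applies all three while scanning, so filtered-out rows, unrequested columns, and rows past the limit are never materialized. `scan_with_pushdowns` is a hypothetical helper, and a plain Python callable stands in for the predicate; Daft represents these expressions as *Terms*, which this sketch does not model.

```python
from typing import Callable, Iterable, Iterator, Optional


def scan_with_pushdowns(
    rows: Iterable[dict],
    columns: Optional[list] = None,                      # projection pushdown
    predicate: Optional[Callable[[dict], bool]] = None,  # predicate pushdown
    limit: Optional[int] = None,                         # limit pushdown
) -> Iterator[dict]:
    """Hypothetical scan that applies pushdowns at the source, not after loading."""
    if limit is not None and limit <= 0:
        return
    emitted = 0
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # predicate: skip rows that fail the filter
        # projection: keep only the requested columns
        yield row if columns is None else {c: row[c] for c in columns}
        emitted += 1
        if limit is not None and emitted >= limit:
            return  # limit: stop pulling rows once enough have been produced


reads = []


def source():
    # Record every row the scan actually pulls from the "storage" layer.
    for i in range(100):
        reads.append(i)
        yield {"id": i, "score": i * 10, "payload": "x" * 8}


result = list(
    scan_with_pushdowns(
        source(), columns=["id"], predicate=lambda r: r["score"] % 20 == 0, limit=3
    )
)
print(result)      # [{'id': 0}, {'id': 2}, {'id': 4}]
print(len(reads))  # 5
```

Only 5 of the 100 rows are ever pulled from the generator, and the `payload` column never leaves the scan; that saving is the point of evaluating pushdowns inside the source rather than on a fully loaded table.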

docs/api/schema.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 
 Daft can display your DataFrame's schema without materializing it. Under the hood, it performs intelligent sampling of your data to determine the appropriate schema, and if you make any modifications to your DataFrame it can infer the resulting types based on the operation. Learn more about [Schemas](../core_concepts.md#schemas-and-types) in Daft User Guide.
 
-::: daft.schema.schema
+::: daft.schema.Schema
     options:
         filters: ["!^_"]
 

docs/io.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+# I/O
+
+Please see [Daft I/O API docs](./api/io.md) for API details.
+
+## In-Memory
+
+| Function | Description |
+|---------------------------------------------------|---------------------------------------------------------|
+| [`from_arrow`][daft.from_arrow] | Create a DataFrame from PyArrow Tables or RecordBatches |
+| [`from_dask_dataframe`][daft.from_dask_dataframe] | Create a DataFrame from a Dask DataFrame |
+| [`from_pandas`][daft.from_pandas] | Create a DataFrame from a Pandas DataFrame |
+| [`from_pydict`][daft.from_pydict] | Create a DataFrame from a python dictionary |
+| [`from_pylist`][daft.from_pylist] | Create a DataFrame from a python list |
+| [`from_ray_dataset`][daft.from_ray_dataset] | Create a DataFrame from a Ray Dataset |
+
+
+## CSV
+
+| Function | Description |
+|---------------------------------------------------|--------------------------------------------------------|
+| [`read_csv`][daft.io.read_csv] | Read a CSV file or multiple CSV files into a DataFrame |
+| [`write_csv`][daft.dataframe.DataFrame.write_csv] | Write a DataFrame to CSV files |
+
+
+## Delta Lake
+
+| Function | Description |
+|---------------------------------------------------------------|------------------------------------------|
+| [`read_deltalake`][daft.io.read_deltalake] | Read a Delta Lake table into a DataFrame |
+| [`write_deltalake`][daft.dataframe.DataFrame.write_deltalake] | Write a DataFrame to a Delta Lake table |
+
+
+## Hudi
+
+| Function | Description |
+|----------------------------------|------------------------------------|
+| [`read_hudi`][daft.io.read_hudi] | Read a Hudi table into a DataFrame |
+
+
+## Iceberg
+
+| Function | Description |
+|-----------------------------------------------------------|----------------------------------------|
+| [`read_iceberg`][daft.io.read_iceberg] | Read an Iceberg table into a DataFrame |
+| [`write_iceberg`][daft.dataframe.DataFrame.write_iceberg] | Write a DataFrame to an Iceberg table |
+
+## JSON
+
+| Function | Description |
+|----------------------------------|----------------------------------------------------------|
+| [`read_json`][daft.io.read_json] | Read a JSON file or multiple JSON files into a DataFrame |
+
+
+## Lance
+
+| Function | Description |
+|-------------------------------------------------------|---------------------------------------|
+| [`read_lance`][daft.io.read_lance] | Read a Lance dataset into a DataFrame |
+| [`write_lance`][daft.dataframe.DataFrame.write_lance] | Write a DataFrame to a Lance dataset |
+
+
+## Parquet
+
+| Function | Description |
+|-----------------------------------------------------------|----------------------------------------------------------------|
+| [`read_parquet`][daft.io.read_parquet] | Read a Parquet file or multiple Parquet files into a DataFrame |
+| [`write_parquet`][daft.dataframe.DataFrame.write_parquet] | Write a DataFrame to Parquet files |
+
+
+## SQL
+
+| Function | Description |
+|--------------------------------|------------------------------------------------|
+| [`read_sql`][daft.io.read_sql] | Read data from a SQL database into a DataFrame |
+
+
+## WARC
+
+| Function | Description |
+|----------------------------------|----------------------------------------------------------|
+| [`read_warc`][daft.io.read_warc] | Read a WARC file or multiple WARC files into a DataFrame |
+
+
+## User-Defined
+
+| Function | Description |
+|-------------------------------------------------------------|--------------------------------------------------------------------|
+| [`DataFrameSource`][daft.io.source.DataFrameSource] | Interface for reading data into DataFrames |
+| [`DataFrameSourceTask`][daft.io.source.DataFrameSourceTask] | Represents a partition of data that can be processed independently |

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ nav:
 - SQL: sql_overview.md
 - Sessions: sessions.md
 - Catalogs: catalogs.md
+- I/O: io.md
 - Spark Connect: spark_connect.md
 - Distributed Computing: distributed.md
 - Advanced: