Commit 4366b3f

docs on LOCATION and partition
1 parent 1f51991 commit 4366b3f

1 file changed: +8 -6 lines changed

README.md

Lines changed: 8 additions & 6 deletions
@@ -178,7 +178,7 @@ print(data_from_parquet.pl.dtypes)
 
 #### 7. Bonus: Partitions
 
-The special attribute `LOCATION` helps you write the data where you want, how you want it.
+The special attribute `LOCATION` helps you write the data where you want, how you want it. `LOCATION` does not have to be declared, but it is set to sensible (unpartitioned) defaults.
 
 On calling `af.Dataset.partition()`, you'll get the formatted list of Hive-style partitions and the datasets broken up accordingly.
 
@@ -190,16 +190,18 @@ class PartitionedIsotopeData(af.Dataset):
     z = af.VectorI8("Atomic Number (Z)")
     mass = af.VectorF64("Isotope Mass (Da)")
     abundance = af.VectorF64("Relative natural abundance")
-    LOCATION = af.Location(folder="mydata", file="isotopes.csv", partition_by=["z"])
+    LOCATION = af.Location(folder="s3://myisotopes", file="data.csv", partition_by=["z"])
 
-url = "https://raw.githubusercontent.com/liquidcarbon/chembiodata/main/isotopes.csv"
+
+url = "https://raw.githubusercontent.com/liquidcarbon/chembiodata/main/isotopes.csv"
 data_from_sql = PartitionedIsotopeData.build(query=f"FROM '{url}'", rename=True)
+
 paths, partitions = data_from_sql.partition()
 paths[:3], partitions[:3]
 
-# (['mydata/z=1/isotopes.csv',
-# 'mydata/z=2/isotopes.csv',
-# 'mydata/z=3/isotopes.csv'],
+# (['s3://myisotopes/z=1/data.csv',
+# 's3://myisotopes/z=2/data.csv',
+# 's3://myisotopes/z=3/data.csv'],
 # [Dataset PartitionedIsotopeData of shape (3, 4)
 # symbol = ['H', 'H', 'H']
 # z = [1, 1, 1]
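
For readers unfamiliar with the Hive-style layout in the expected output above, here is a minimal sketch in plain Python of how `folder`, `partition_by`, and `file` could combine into paths like `s3://myisotopes/z=1/data.csv`. It assumes nothing about the library's internals: `hive_partition_paths` is a hypothetical helper written for illustration, and the rows are toy data rather than the contents of `isotopes.csv`.

```python
from collections import defaultdict


def hive_partition_paths(rows, folder, file, partition_by):
    """Group row dicts by the partition columns and build Hive-style paths.

    Hypothetical helper for illustration only; it is not the library's
    implementation of partition().
    """
    groups = defaultdict(list)
    for row in rows:
        # One "column=value" path segment per partition column, in order.
        segments = [f"{col}={row[col]}" for col in partition_by]
        path = "/".join([folder.rstrip("/"), *segments, file])
        groups[path].append(row)
    paths = list(groups)  # dicts keep first-seen order
    partitions = [groups[p] for p in paths]
    return paths, partitions


# Toy rows, not the real isotope data.
rows = [
    {"symbol": "H", "z": 1},
    {"symbol": "H", "z": 1},
    {"symbol": "He", "z": 2},
    {"symbol": "Li", "z": 3},
]
paths, partitions = hive_partition_paths(
    rows, folder="s3://myisotopes", file="data.csv", partition_by=["z"]
)
print(paths)
```

Each distinct value of the partition column gets its own `z=<value>` directory, and the second return value holds the rows that belong to each path, mirroring the `paths, partitions` pair in the README example.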
