Description
My packages:
python 3.11
pyspark==4.0.1
typedspark==1.5.5
Code from the tutorial:
from pyspark.sql import SparkSession
import pandas as pd
from typedspark import Column, DataSet, Schema
from pyspark.sql.types import LongType, StringType

spark = SparkSession.Builder().config("spark.ui.showConsoleProgress", "false").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

class Person(Schema):
    id: Column[LongType]
    name: Column[StringType]
    age: Column[LongType]

df = spark.createDataFrame(
    pd.DataFrame(
        dict(
            id=[1, 2, 3],
            name=["John", "Jane", "Jack"],
            age=[20, 30, 40],
        )
    )
)

# no errors raised
df = DataSet[Person](df)
The error is raised at this line:
    for field in schema.fields:
AttributeError: 'NoneType' object has no attribute 'fields'
Cause:
After this line:
typedspark/typedspark/_core/dataset.py, line 188 in af9b66c
    dataframe.__class__ = DataSet
the dataframe.schema property returns None.
I tested the same code with PySpark 3.5.2, and it succeeded.
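One possible explanation, which is an assumption and not confirmed in this report, is that in PySpark 4 the schema property is implemented on a concrete subclass while the parent DataFrame class only carries a stub. Reassigning __class__ to a class that inherits from the parent alone would then resolve schema to the stub. The pure-Python sketch below illustrates that mechanism only; BaseFrame, ConcreteFrame, and TypedFrame are hypothetical stand-ins, not PySpark or typedspark classes.

```python
# Hypothetical stand-ins: none of these classes come from PySpark or typedspark.
class BaseFrame:
    """API-only parent whose property body is a stub."""

    @property
    def schema(self):
        ...  # stub body: the property implicitly returns None


class ConcreteFrame(BaseFrame):
    """Concrete implementation that overrides the stub."""

    def __init__(self, fields):
        self._fields = fields

    @property
    def schema(self):
        return list(self._fields)


class TypedFrame(BaseFrame):
    """Inherits from the parent only, so it does not see the override."""


frame = ConcreteFrame(["id", "name", "age"])
schema_before = frame.schema  # resolved on ConcreteFrame

# Swap the class in place, as the dataset.py line quoted above does.
frame.__class__ = TypedFrame
schema_after = frame.schema  # now resolved on BaseFrame's stub

print(schema_before)
print(schema_after)
```

If this is what happens, the instance data is still intact after the swap (frame._fields is unchanged); only the attribute lookup path has changed.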