@@ -141,6 +141,105 @@ assert result.column(0) == pyarrow.array([5, 7, 9])
 assert result.column(1) == pyarrow.array([-3, -3, -3])
 ```
 
+## Jupyter Notebook Support
+
+Ballista works well in Jupyter notebooks. DataFrames automatically render as formatted HTML tables when displayed
+in a notebook cell.
+
+### Basic Usage
+
+```python
+from ballista import BallistaSessionContext
+
+# Connect to a Ballista cluster
+ctx = BallistaSessionContext("df://localhost:50050")
+
+# Register a table
+ctx.register_parquet("trips", "/path/to/nyctaxi.parquet")
+
+# Run a query - the result renders as an HTML table
+ctx.sql("SELECT * FROM trips LIMIT 10")
+```
+
+When a DataFrame is the last expression in a cell, Jupyter automatically calls its `_repr_html_()` method,
+which renders a styled table with:
+
+- Formatted column headers
+- Expandable cells for long text content
+- Scrollable display for wide tables
+
+### Converting Results
+
+DataFrames can be converted to various formats for further analysis:
+
+```python
+df = ctx.sql("SELECT * FROM trips WHERE fare_amount > 50")
+
+# Convert to Pandas DataFrame
+pandas_df = df.to_pandas()
+
+# Convert to PyArrow Table
+arrow_table = df.to_arrow_table()
+
+# Convert to Polars DataFrame
+polars_df = df.to_polars()
+
+# Collect as PyArrow RecordBatches
+batches = df.collect()
+```
+
+### Example Notebook Workflow
+
+A typical notebook workflow might look like:
+
+```python
+# Cell 1: Setup
+from ballista import BallistaSessionContext
+from datafusion import col, lit
+
+ctx = BallistaSessionContext("df://localhost:50050")
+ctx.register_parquet("orders", "/data/orders.parquet")
+ctx.register_parquet("customers", "/data/customers.parquet")
+
+# Cell 2: Explore the data
+ctx.sql("SELECT * FROM orders LIMIT 5")
+
+# Cell 3: Run analysis
+df = ctx.sql("""
+    SELECT
+        c.name,
+        COUNT(*) as order_count,
+        SUM(o.amount) as total_spent
+    FROM orders o
+    JOIN customers c ON o.customer_id = c.id
+    GROUP BY c.name
+    ORDER BY total_spent DESC
+    LIMIT 10
+""")
+df
+
+# Cell 4: Convert to Pandas for visualization
+import matplotlib.pyplot as plt
+
+pandas_df = df.to_pandas()
+pandas_df.plot(kind='bar', x='name', y='total_spent')
+plt.show()
+```
+
+### Running a Local Cluster in a Notebook
+
+For development and testing, you can start a local cluster directly from a notebook:
+
+```python
+from ballista import BallistaSessionContext, setup_test_cluster
+
+# Start a local scheduler and executor
+host, port = setup_test_cluster()
+
+# Connect to it
+ctx = BallistaSessionContext(f"df://{host}:{port}")
+```
+
 ## User Defined Functions
 
 The underlying DataFusion query engine supports Python UDFs but this functionality has not yet been implemented in
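
The `_repr_html_()` hook described in the Jupyter Notebook Support section of this change can be illustrated without a running cluster. The following is a minimal sketch of the display protocol only, not Ballista's implementation: `TinyTable` and its sample data are hypothetical and exist purely for illustration. Jupyter checks the last expression of a cell for a `_repr_html_()` method and, if one is present, renders the string it returns as HTML.

```python
from html import escape


class TinyTable:
    """Hypothetical stand-in for a DataFrame; not part of Ballista."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]

    def _repr_html_(self):
        # Jupyter calls this method on the last expression of a cell
        # and displays the returned string as HTML.
        header = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return (
            f"<table><thead><tr>{header}</tr></thead>"
            f"<tbody>{body}</tbody></table>"
        )


# Assumed sample data, chosen to show that cell values are HTML-escaped.
table = TinyTable(["name", "total_spent"], [["Ada", 120.5], ["Bob <b>", 80]])
html = table._repr_html_()
```

In a notebook, leaving `table` as the last line of a cell would display the rendered table. Calling `_repr_html_()` directly, as above, is a convenient way to inspect the markup outside Jupyter.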