Commit bab44d3

zilto and zilto authored
Docs: updated function/node page (#659)
* updated node page
* Revert "Bumps version to 1.46.0" This reverts commit e5cce27.
* added back function-naming.rst
* clarified node vs. function; swapped order of sections
* node page updated; version corrected
* missing figure added

---------

Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
1 parent 31fe137 commit bab44d3

7 files changed, +145 -110 lines changed

docs/_static/abc_basic.png

9.85 KB

docs/_static/function_anatomy.png

442 KB

docs/concepts/best-practices/index.rst

+1-1
@@ -5,8 +5,8 @@ A set of best-practices to help you get the most out of Hamilton quickly and eas
 
 .. toctree::
 
-migrating-to-hamilton
 function-naming
+migrating-to-hamilton
 code-organization
 function-modifiers
 common-indices

docs/concepts/hamilton-function-structure.rst

-107
This file was deleted.

docs/concepts/index.rst

+1-1
@@ -9,7 +9,7 @@ concepts that makes Hamilton unique and powerful.
 .. toctree::
 
 lexicon
-hamilton-function-structure
+node
 driver-capabilities
 customizing-execution
 decorators-overview

docs/concepts/lexicon.rst

+1-1
@@ -17,7 +17,7 @@ Before we dive into the concepts, let's clarify the terminology we'll be using:
 naturally represented by nodes and edges) and can be represented visually.
 * - Transform Function, or simply Function
 - A python function used to represent a Hamilton transform -- it can compile to one (in the standard case) or \
-many (with the use of decorators) transforms. See :doc:`hamilton-function-structure` for more details.
+many (with the use of decorators) transforms. See :doc:`node` for more details.
 * - Transform
 - A step in the dataflow DAG representing a computation -- usually 1:1 with functions but decorators break that \
 pattern -- in which case multiple transforms trace back to a single function.

docs/concepts/node.rst

+142
@@ -0,0 +1,142 @@
===========================
Functions, nodes & dataflow
===========================

On this page, you'll learn how Hamilton converts your Python functions into nodes and then creates a dataflow.

Functions
---------

Hamilton requires you to write your code using functions. To get started, you simply need to:

- `Annotate the type <https://docs.python.org/3/library/typing.html>`_ of the function parameters and return value.
- Specify the function's dependencies through its parameter names.
- Store your code in Python modules (``.py`` files).

Since your code doesn't depend on special "Hamilton code", it can be reused any other way you want!

Specifying dependencies
~~~~~~~~~~~~~~~~~~~~~~~
In Hamilton, you define dependencies by matching parameter names with the names of other functions. Below, the function name and return type ``A() -> int`` match the parameter ``A: int`` found in functions ``B()`` and ``C()``.

.. code-block:: python

    def A() -> int:
        """Constant value 35"""
        return 35

    def B(A: int) -> float:
        """Divide A by 3"""
        return A / 3

    def C(A: int, B: float) -> float:
        """Square A and multiply by B"""
        return A**2 * B


.. image:: ../_static/abc_basic.png
    :align: center

The figure shows how Hamilton automatically assembled the functions ``A()``, ``B()``, and ``C()`` into a dataflow.

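To make the name-matching idea concrete, here is a minimal sketch in plain Python. It is not Hamilton's implementation: the ``compute`` helper and the ``functions`` and ``results`` dictionaries are hypothetical names introduced only for illustration. It assumes each parameter name refers to another function in the same collection.

```python
import inspect
from typing import Any, Callable, Dict


def A() -> int:
    """Constant value 35"""
    return 35


def B(A: int) -> float:
    """Divide A by 3"""
    return A / 3


def C(A: int, B: float) -> float:
    """Square A and multiply by B"""
    return A**2 * B


def compute(name: str, functions: Dict[str, Callable[..., Any]], cache: Dict[str, Any]) -> Any:
    """Hypothetical helper: compute `name` after computing every parameter it declares."""
    if name not in cache:
        func = functions[name]
        # Each parameter name is looked up as the name of an upstream function.
        kwargs = {
            param: compute(param, functions, cache)
            for param in inspect.signature(func).parameters
        }
        cache[name] = func(**kwargs)
    return cache[name]


functions = {f.__name__: f for f in (A, B, C)}
results: Dict[str, Any] = {}
value_of_C = compute("C", functions, results)
```

Requesting ``C`` pulls in ``A`` and ``B`` without any explicit wiring, and the cache ensures ``A`` is computed once even though two functions depend on it.
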
Helper function
~~~~~~~~~~~~~~~~

You can prefix a function name with an underscore (``_``) to prevent it from being included in a dataflow. Below, ``A()`` and ``B()`` are part of the dataflow, but ``_round_three_decimals()`` isn't.

.. code-block:: python

    def _round_three_decimals(value: float) -> float:
        """Round value to 3 decimal places"""
        return round(value, 3)

    def A(external_input: int) -> int:
        """Modulo 3 of input value"""
        return external_input % 3

    def B(A: int) -> float:
        """Divide A by 3"""
        b = A / 3
        return _round_three_decimals(b)


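The underscore rule can be pictured as a simple filter over a module's functions. The sketch below is an assumption about the general idea, not Hamilton's actual discovery code; ``find_dataflow_functions`` and the in-memory module ``my_functions`` are hypothetical names used for illustration.

```python
import inspect
import types

# Build a throwaway module in memory so the example is self-contained.
source = '''
def _round_three_decimals(value: float) -> float:
    return round(value, 3)

def A(external_input: int) -> int:
    return external_input % 3

def B(A: int) -> float:
    return _round_three_decimals(A / 3)
'''
my_functions = types.ModuleType("my_functions")
exec(source, my_functions.__dict__)


def find_dataflow_functions(module: types.ModuleType) -> list:
    """Hypothetical helper: names of functions that do not start with an underscore."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


node_names = find_dataflow_functions(my_functions)
```

The helper stays callable from ``B()`` as ordinary Python; it is only left out of the set of functions considered for the dataflow.
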
Function naming tips
~~~~~~~~~~~~~~~~~~~~
Hamilton strongly agrees with the `Zen of Python <https://peps.python.org/pep-0020/>`_ #2: "Explicit is better than implicit". Meaningful function names help document what functions do, so don't shy away from longer names. If you were to come across a function named ``life_time_value`` versus ``ltv`` versus ``l_t_v``, which one is most obvious? Remember that your code usually lives a lot longer than you think it will.

Unlike the common practice of including meaningful verbs in function names (e.g., ``get_credentials()``, ``statistical_test()``), with Hamilton, the function name should more closely align with nouns. That's because the function name determines the node name and how data will be queried. Therefore, names that describe the node result rather than its action may be more readable (e.g., ``credentials()``, ``statistical_results()``).


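As a small illustration of why noun-style names read well, consider the sketch below. The function ``connection_string`` and the placeholder values are hypothetical; only ``credentials()`` comes from the text above.

```python
def credentials() -> dict:
    """Credentials used to open a connection (placeholder values)."""
    return {"user": "demo", "token": "not-a-real-token"}


def connection_string(credentials: dict) -> str:
    """Connection string derived from the credentials."""
    # The parameter is named after the upstream result, so it reads as data.
    return f"{credentials['user']}@example.invalid"


connection = connection_string(credentials=credentials())
```

Because the function name doubles as a parameter name downstream, ``credentials: dict`` reads as a value, whereas a parameter called ``get_credentials: dict`` would read as an action.
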
Nodes
-----

A node is a single "step" in a dataflow. Hamilton users write Python `functions` that Hamilton converts into `nodes`. Users never create nodes directly.


Anatomy of a node
~~~~~~~~~~~~~~~~~
The following figure and table detail how a Python function maps to a Hamilton node.


.. image:: ../_static/function_anatomy.png
    :scale: 13%
    :align: center


.. list-table::
    :header-rows: 1

    * - id
      - Function components
      - Node components
    * - 1
      - Function name and return type annotation
      - Node name and type
    * - 2
      - Parameter(s) name and type annotation
      - Node dependencies
    * - 3
      - Docstring
      - Description of the node return value
    * - 4
      - Function body
      - Implementation of the node


Since functions almost always map 1-to-1 to nodes, the two terms are used interchangeably. However, there are exceptions that we'll discuss later in this guide.

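The four rows of the table can be read off a function with Python's standard ``inspect`` module. The sketch below is only an illustration of the mapping: ``NodeSketch`` and ``to_node`` are hypothetical names, not Hamilton's node class or API.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class NodeSketch:
    """Hypothetical stand-in for a node; not Hamilton's actual node class."""

    name: str                           # 1. function name
    type: Any                           # 1. return type annotation
    dependencies: Dict[str, Any]        # 2. parameter names and annotations
    description: str                    # 3. docstring
    implementation: Callable[..., Any]  # 4. function body


def to_node(func: Callable[..., Any]) -> NodeSketch:
    """Hypothetical helper: collect the four components listed in the table."""
    signature = inspect.signature(func)
    return NodeSketch(
        name=func.__name__,
        type=signature.return_annotation,
        dependencies={p.name: p.annotation for p in signature.parameters.values()},
        description=inspect.getdoc(func) or "",
        implementation=func,
    )


def B(A: int) -> float:
    """Divide A by 3"""
    return A / 3


node = to_node(B)
```

Everything the sketch needs is already present in an ordinary annotated function, which is why no extra node-construction code has to be written by hand.
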
Dataflow
--------

From a collection of nodes, Hamilton automatically assembles the dataflow. For each node, it creates edges between the node and its dependencies, resulting in a `dataflow <https://en.wikipedia.org/wiki/Dataflow_programming>`_ (or a `graph <https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)>`_ in more mathematical terms).

From the user perspective, you just have to give Hamilton a Python module containing your functions for it to generate your dataflow! This is a key difference from popular orchestration / pipeline / workflow frameworks (Airflow, Kedro, Prefect, VertexAI, SageMaker, etc.).

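One way to picture the assembly step is to derive edges from parameter names and then order the nodes so that dependencies come first. The sketch below uses the standard library's ``graphlib`` (Python 3.9+) and is an assumption about the general idea rather than Hamilton's internals; the variable names are illustrative.

```python
import inspect
from graphlib import TopologicalSorter


def A() -> int:
    """Constant value 35"""
    return 35


def B(A: int) -> float:
    """Divide A by 3"""
    return A / 3


def C(A: int, B: float) -> float:
    """Square A and multiply by B"""
    return A**2 * B


functions = {f.__name__: f for f in (A, B, C)}

# Map each node name to the set of node names it depends on.
dependencies = {
    name: set(inspect.signature(func).parameters)
    for name, func in functions.items()
}

# One edge (upstream, downstream) per dependency.
edges = sorted(
    (upstream, name)
    for name, upstream_names in dependencies.items()
    for upstream in upstream_names
)

# An order in which every node appears after its dependencies.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

Here the graph is fully determined by the function signatures, so there is no separate place where edges are declared.
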
How other frameworks build graphs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In most frameworks, you first define steps / tasks / components. Then, you need to create your dataflow by explicitly specifying the relationships between nodes.

Readability
^^^^^^^^^^^
In that case, the code for ``step A`` doesn't tell you how it relates to ``step B`` or the broader dataflow. Hamilton solves this problem by tying functions, nodes, and dataflow definitions together in a single place. The ratio of time spent reading code versus writing it can be as high as `10:1 <https://www.goodreads.com/quotes/835238-indeed-the-ratio-of-time-spent-reading-versus-writing-is>`_, especially for complex dataflows, so optimizing for readability is very high-value.

Maintainability
^^^^^^^^^^^^^^^
In those frameworks, editing a dataflow (new feature, debugging, etc.) typically alters both what a **node** does and how the **dataflow** is structured. Consequently, changes to ``step A`` require you to manually make consistent edits to the dataflow definition, which likely lives in another file. In enterprise settings, it can become difficult to discover and track every location where ``step A`` is used (potentially tens or hundreds of pipelines), increasing the likelihood of breaking changes. Hamilton avoids this problem entirely because changes to the node definitions, and thus the dataflow, propagate to all places this code is used. This greatly improves maintainability and development speed by making code changes easier.

Recap
-----
- Users write Python functions into modules with proper naming and typing
- Helper functions use an underscore prefix (e.g., ``_helper()``)
- Hamilton converts functions into nodes
- Hamilton automatically assembles nodes into a dataflow


Next step
---------
So far, we learned how to write Hamilton code for our dataflow. Next, we'll explore how we can effectively:

1. Convert a Python module into a dataflow
2. Visualize a dataflow
3. Execute a dataflow
4. Gather and store results of a dataflow
