
Commit 2cfe00c

Refactors caching examples to be in a single place
Updates links and adds README.
1 parent c9ef352 commit 2cfe00c

14 files changed (+62, -20 lines)

docs/how-tos/cache-nodes.rst

+1 -1
@@ -6,4 +6,4 @@ Sometimes it is convenient to cache intermediate nodes. This is especially usefu

For example, if a particular node takes a long time to calculate (perhaps it extracts data from an outside source or performs some heavy computation), you can annotate it with the "cache" tag. The first time the DAG is executed, that node will be cached to disk. If you then do some development on any of the downstream nodes, subsequent executions will load the cached node instead of repeating the computation.

-See the full tutorial `here <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/caching_nodes>`_.
+See the examples `here <https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/caching_nodes>`_.
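
To make the "cache" tag concrete, here is a minimal sketch of annotating a node; the node name `expensive_dataset`, its input, and the choice of serialization format are hypothetical, and the `tag` import path assumes a recent Hamilton version.

```python
# Minimal sketch (not from the docs themselves): a node annotated so a caching
# adapter can write its result to disk on the first run and reuse it afterwards.
import pandas as pd

from hamilton.function_modifiers import tag


@tag(cache="parquet")  # hypothetical choice of serialization format
def expensive_dataset(source_path: str) -> pd.DataFrame:
    """Slow node: pulls data from an outside source, so we cache it."""
    return pd.read_csv(source_path)
```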

examples/caching_nodes/README.md

+5 -15
@@ -1,18 +1,8 @@
-# Caching Graph Adapter
+Here you'll find two adapters that allow you to cache the results of your functions.

-You can use `CachingGraphAdapter` to cache certain nodes.
+The first one is the `DiskCacheAdapter`, which uses the `diskcache` library to store the results on disk.

-This is great for:
+The second one is the `CachingGraphAdapter`, which requires you to tag functions to cache along with the
+serialization format.

-1. Iterating during development, where you don't want to recompute certain expensive function calls.
-2. Providing some lightweight means to control recomputation in production, by controlling whether a "cached file" exists or not.
-
-For iterating during development, the general process would be:
-
-1. Write your functions.
-2. Mark them with `tag(cache="SERIALIZATION_FORMAT")`
-3. Use the CachingGraphAdapter and pass that to the Driver to turn on caching for these functions.
-   a. If at any point in your development you need to re-run a cached node, you can pass
-      its name to the adapter in the `force_compute` argument. Then, this node and its downstream
-      nodes will be computed instead of loaded from cache.
-4. When no longer required, you can just skip (3) and any caching behavior will be skipped.
+Both have their sweet spots and trade-offs. We invite you to play with them and provide feedback on which one you prefer.
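
For orientation, here is a minimal sketch of attaching the diskcache-based adapter to a Driver; the `hamilton.plugins.h_diskcache` import path and the `DiskCacheAdapter` constructor are assumptions that may differ by Hamilton version (the `CachingGraphAdapter` workflow is sketched further below).

```python
# Minimal sketch, not the example's actual code: every node result is cached,
# keyed on the node's source code and inputs. Import path is an assumption.
from hamilton import driver
from hamilton.plugins import h_diskcache  # assumed module location

import business_logic  # your DAG functions

dr = (
    driver.Builder()
    .with_modules(business_logic)
    .with_adapters(h_diskcache.DiskCacheAdapter())  # assumed default constructor
    .build()
)
# dr.execute(...) then behaves as usual; repeated runs with unchanged code and
# inputs are served from the on-disk cache.
```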

examples/caching_nodes/business_logic.py

-1
This file was deleted.
@@ -0,0 +1,18 @@
+# Caching Graph Adapter
+
+You can use `CachingGraphAdapter` to cache certain nodes.
+
+This is great for:
+
+1. Iterating during development, where you don't want to recompute certain expensive function calls.
+2. Providing some lightweight means to control recomputation in production, by controlling whether a "cached file" exists or not.
+
+For iterating during development, the general process would be:
+
+1. Write your functions.
+2. Mark them with `tag(cache="SERIALIZATION_FORMAT")`
+3. Use the CachingGraphAdapter and pass that to the Driver to turn on caching for these functions.
+   a. If at any point in your development you need to re-run a cached node, you can pass
+      its name to the adapter in the `force_compute` argument. Then, this node and its downstream
+      nodes will be computed instead of loaded from cache.
+4. When no longer required, you can just skip (3) and any caching behavior will be skipped.
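
As a rough illustration of steps 3 and 3a in the list above — a sketch only: the `hamilton.experimental.h_cache` import path and the `cache_path` / `force_compute` arguments are assumptions that may vary across Hamilton versions, and `spend_zero_mean` is just an example node name.

```python
# Minimal sketch of turning caching on (step 3) and forcing a recompute (step 3a).
# Import path and constructor arguments are assumptions.
from hamilton import driver
from hamilton.experimental.h_cache import CachingGraphAdapter  # assumed location

import business_logic  # functions tagged with tag(cache="...")

# Step 3: pass the adapter to the Driver to enable caching of tagged nodes.
dr = driver.Driver({}, business_logic, adapter=CachingGraphAdapter("./cache"))

# Step 3a: force one cached node (and everything downstream of it) to recompute
# instead of being loaded from cache.
dr_forced = driver.Driver(
    {},
    business_logic,
    adapter=CachingGraphAdapter("./cache", force_compute={"spend_zero_mean"}),
)
```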
@@ -0,0 +1,35 @@
+import pandas as pd
+
+"""
+Copied from the hello world example.
+"""
+
+
+def avg_3wk_spend(spend: pd.Series) -> pd.Series:
+    """Rolling 3 week average spend."""
+    return spend.rolling(3).mean()
+
+
+def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
+    """The cost per signup in relation to spend."""
+    return spend / signups
+
+
+def spend_mean(spend: pd.Series) -> float:
+    """Shows function creating a scalar. In this case it computes the mean of the entire column."""
+    return spend.mean()
+
+
+def spend_zero_mean(spend: pd.Series, spend_mean: float) -> pd.Series:
+    """Shows function that takes a scalar. In this case to zero mean spend."""
+    return spend - spend_mean
+
+
+def spend_std_dev(spend: pd.Series) -> float:
+    """Function that computes the standard deviation of the spend column."""
+    return spend.std()
+
+
+def spend_zero_mean_unit_variance(spend_zero_mean: pd.Series, spend_std_dev: float) -> pd.Series:
+    """Function showing one way to make spend have zero mean and unit variance."""
+    return spend_zero_mean / spend_std_dev
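
For reference, a minimal sketch of driving the module above; the spend/signups values are invented purely for illustration.

```python
# Minimal sketch: execute the functions above with Hamilton.
import pandas as pd

from hamilton import driver

import business_logic  # the module shown above

dr = driver.Driver({}, business_logic)
inputs = {
    "spend": pd.Series([10.0, 10.0, 20.0, 40.0, 40.0, 50.0]),  # made-up data
    "signups": pd.Series([1, 10, 50, 100, 200, 400]),  # made-up data
}
df = dr.execute(["spend_per_signup", "spend_zero_mean_unit_variance"], inputs=inputs)
print(df)
```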

examples/cache_hook/README.md → examples/caching_nodes/diskcache_adapter/README.md

+3 -3
@@ -1,5 +1,5 @@
-# Cache hook
-This hook uses the [diskcache](https://grantjenks.com/docs/diskcache/tutorial.html) to cache node execution on disk. The cache key is a tuple of the function's
+# DiskCache Adapter
+This adapter uses [diskcache](https://grantjenks.com/docs/diskcache/tutorial.html) to cache node execution on disk. The cache key is a tuple of the function's
`(source code, input a, ..., input n)`. This means a function will only be executed once for a given set of inputs
and source code hash. The cache is stored in a directory of your choice, and it can be shared across different runs of your
code. That way, as you develop, if the inputs and the code haven't changed, the function will not be executed again and
@@ -16,7 +16,7 @@ Disk cache has great features to:
> cache (both keys and values). Learn more about [caveats](https://grantjenks.com/docs/diskcache/tutorial.html#caveats).

> ❓ To store artifacts robustly, please use Hamilton materializers or the
-> [CachingGraphAdapter](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/caching_nodes) instead.
+> [CachingGraphAdapter](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/caching_nodes/caching_graph_adatper) instead.
> The `CachingGraphAdapter` stores tagged nodes directly on the file system using common formats (JSON, CSV, Parquet, etc.).
> However, it isn't aware of your function version and requires you to manually manage your disk space.
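
The cache-key behaviour described above can be pictured with plain `diskcache`; this sketch illustrates the idea only and is not the adapter's actual implementation.

```python
# Illustration only: cache a call keyed on the function's source code plus its
# inputs, so a change to either triggers recomputation on the next run.
import inspect

from diskcache import Cache

cache = Cache("./.hamilton_cache")  # directory of your choice; shared across runs


def cached_call(fn, *inputs):
    """Return a cached result when the same source code and inputs were seen before."""
    key = (inspect.getsource(fn), *inputs)
    if key in cache:
        return cache[key]
    result = fn(*inputs)
    cache[key] = result
    return result
```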
File renamed without changes.
