Skip to content

Commit efa8df0

Browse files
committed
Initial commit
0 parents  commit efa8df0

24 files changed

Lines changed: 1407 additions & 0 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
site/
2+
build/
3+
dist/

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
1+
# Changelog
2+
All notable changes to this project will be documented in this file.
3+
4+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
5+
6+
## [Unreleased]
7+
8+
## [1.0.0] - 2020-11-11
9+
### Added
10+
- Initial release of pylateral.
11+
12+
[Unreleased]: https://github.com/boxysean/pylateral/compare/v1.0.0...HEAD
13+
[1.0.0]: https://github.com/boxysean/pylateral/releases/tag/v1.0.0
14+

LICENSE

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2020 Sean McIntyre
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

Makefile

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
.PHONY: test
2+
test: ## Run the tests
3+
pytest tests/

README.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
1+
pylateral
2+
=========
3+
4+
Intuitive multi-threaded task processing in python.
5+
6+
## Example
7+
8+
import urllib.request
9+
10+
@pylateral.task
11+
def request_and_print(url):
12+
response = urllib.request.urlopen(url)
13+
print(response.read())
14+
15+
URLS = [
16+
"https://www.nytimes.com/",
17+
"https://www.cnn.com/",
18+
"https://europe.wsj.com/",
19+
"https://www.bbc.co.uk/",
20+
"https://some-made-up-domain.com/",
21+
]
22+
23+
with pylateral.task_pool():
24+
for url in URLS:
25+
request_and_print(url)
26+
27+
print("Complete!")
28+

docs/comparison.md

Lines changed: 201 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,201 @@
1+
Comparison with other python libraries
2+
======================================
3+
4+
There's lots of way to skin the threading cat!
5+
6+
### When to use *pylateral*
7+
8+
- Your workload is network-bound and/or IO-bound (e.g., API calls, database queries, read/write to FTP, read/write to files).
9+
10+
- Your workload can be run [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel).
11+
12+
- You are writing a script or prototype that isn't very large nor complex.
13+
14+
### When not to use *pylateral*
15+
16+
- Your workload is CPU-bound and blocked by the [Global Interpreter Lock](https://en.wikipedia.org/wiki/CPython#Design). *python* threading will not help speed up your workload, consider using [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) or [concurrent.futures.ProcessPoolExecutor](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor) instead.
17+
18+
- The complexity of your program would benefit from thinking about it in terms of [futures and promises](https://en.wikipedia.org/wiki/Futures_and_promises). Consider using [asyncio](https://docs.python.org/3/library/asyncio.html) or [concurrent.futures.ThreadPoolExecutors](https://docs.python.org/3/library/concurrent.futures.html) instead.
19+
20+
- When you want to have tighter controls around the lifecycle of your thread. Consider using [threading](https://docs.python.org/3/library/threading.html) instead.
21+
22+
- For larger workloads, consider using [dask.distributed](https://distributed.dask.org/en/latest/#), [Airflow](https://airflow.apache.org/), [Dagster](https://github.com/dagster-io/dagster/) or [Prefect](https://www.prefect.io/) to perform work across many nodes.
23+
24+
- You would benefit from a web UI for viewing and interacting with your tasks. For that, consider using [Airflow](https://airflow.apache.org/) or [Prefect](https://www.prefect.io/).
25+
26+
Feature comparison
27+
------------------
28+
29+
| Feature | pylateral | [asyncio](https://docs.python.org/3/library/asyncio.html) | [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) | [multiprocessing](https://docs.python.org/3/library/multiprocessing.html) | [threading](https://docs.python.org/3/library/threading.html) |
30+
| ---------------------------------- | --------- | ------- | ------------------------ | --------------- | --------- |
31+
| Easy to adapt single-threaded code ||||||
32+
| [Simple nested tasks](usage.md#working-with-nested-tasks) ||||||
33+
| Concurrent IO-bound workloads ||||||
34+
| Concurrent CPU-bound workloads ||| ✅ (Process Pool) |||
35+
| Flexibility in using return values ||||||
36+
37+
Code comparison
38+
----------
39+
40+
[PEP-3148 -- futures - execute computations asynchronously](https://www.python.org/dev/peps/pep-3148/#id13) introduces `concurrent.futures` and illustrates it by example. Here I show that example in *pylateral*, stacked up against the main threading libraries offered in python.
41+
42+
### `asyncio`
43+
44+
```python
45+
import aiohttp
46+
import asyncio
47+
import sqlite3
48+
49+
URLS = [
50+
'http://www.foxnews.com/',
51+
'http://www.cnn.com/',
52+
'http://europe.wsj.com/',
53+
'http://www.bbc.co.uk/',
54+
'http://some-made-up-domain.com/',
55+
]
56+
57+
async def extract_and_load(url, timeout=30):
58+
try:
59+
async with aiohttp.ClientSession() as session:
60+
async with session.get(url, timeout=timeout) as response:
61+
web_result = await response.text()
62+
print(f"{url} is {len(web_result)} bytes")
63+
64+
with sqlite3.connect('example.db') as conn, conn as cursor:
65+
cursor.execute('CREATE TABLE IF NOT EXISTS web_results (url text, length int);')
66+
cursor.execute('INSERT INTO web_results VALUES (?, ?)', (url, len(web_result)))
67+
except Exception as e:
68+
print(f"{url} generated an exception: {e}")
69+
return False
70+
else:
71+
return True
72+
73+
async def main():
74+
succeeded = await asyncio.gather(*[
75+
extract_and_load(url)
76+
for url in URLS
77+
])
78+
79+
print(f"Successfully completed {sum(1 for result in succeeded if result)}")
80+
81+
asyncio.run(main())
82+
```
83+
84+
### `concurrent.futures.ThreadPoolExecutor`
85+
86+
```python
87+
import concurrent.futures
88+
import requests
89+
import sqlite3
90+
91+
URLS = [
92+
'http://www.foxnews.com/',
93+
'http://www.cnn.com/',
94+
'http://europe.wsj.com/',
95+
'http://www.bbc.co.uk/',
96+
'http://some-made-up-domain.com/',
97+
]
98+
99+
def extract_and_load(url, timeout=30):
100+
try:
101+
web_result = requests.get(url, timeout=timeout).text
102+
print(f"{url} is {len(web_result)} bytes")
103+
104+
with sqlite3.connect('example.db') as conn, conn as cursor:
105+
cursor.execute('CREATE TABLE IF NOT EXISTS web_results (url text, length int);')
106+
cursor.execute('INSERT INTO web_results VALUES (?, ?)', (url, len(web_result)))
107+
except Exception as e:
108+
print(f"{url} generated an exception: {e}")
109+
return False
110+
else:
111+
return True
112+
113+
succeeded = []
114+
115+
with concurrent.futures.ThreadPoolExecutor() as executor:
116+
future_to_url = dict(
117+
(executor.submit(extract_and_load, url), url)
118+
for url in URLS
119+
)
120+
121+
for future in concurrent.futures.as_completed(future_to_url):
122+
succeeded.append(future.result())
123+
124+
print(f"Successfully completed {sum(1 for result in succeeded if result)}")
125+
```
126+
127+
### `pylateral`
128+
129+
```python
130+
import requests
131+
import sqlite3
132+
133+
import pylateral
134+
135+
URLS = [
136+
'http://www.foxnews.com/',
137+
'http://www.cnn.com/',
138+
'http://europe.wsj.com/',
139+
'http://www.bbc.co.uk/',
140+
'http://some-made-up-domain.com/',
141+
]
142+
143+
@pylateral.task(has_return_value=True)
144+
def extract_and_load(url, timeout=30):
145+
try:
146+
web_result = requests.get(url, timeout=timeout).text
147+
print(f"{url} is {len(web_result)} bytes")
148+
149+
with sqlite3.connect('example.db') as conn, conn as cursor:
150+
cursor.execute('CREATE TABLE IF NOT EXISTS web_results (url text, length int);')
151+
cursor.execute('INSERT INTO web_results VALUES (?, ?)', (url, len(web_result)))
152+
except Exception as e:
153+
print(f"{url} generated an exception: {e}")
154+
return False
155+
else:
156+
return True
157+
158+
with pylateral.task_pool() as pool:
159+
for url in URLS:
160+
extract_and_load(url)
161+
162+
succeeded = pool.results
163+
164+
print(f"Successfully completed {sum(1 for result in succeeded if result)}")
165+
```
166+
167+
### Unthreaded
168+
169+
```python
170+
import requests
171+
import sqlite3
172+
173+
URLS = [
174+
'http://www.foxnews.com/',
175+
'http://www.cnn.com/',
176+
'http://europe.wsj.com/',
177+
'http://www.bbc.co.uk/',
178+
'http://some-made-up-domain.com/',
179+
]
180+
181+
def extract_and_load(url, timeout=30):
182+
try:
183+
web_result = requests.get(url, timeout=timeout).text
184+
print(f"{url} is {len(web_result)} bytes")
185+
186+
with sqlite3.connect('example.db') as conn, conn as cursor:
187+
cursor.execute('CREATE TABLE IF NOT EXISTS web_results (url text, length int);')
188+
cursor.execute('INSERT INTO web_results VALUES (?, ?)', (url, len(web_result)))
189+
except Exception as e:
190+
print(f"{url} generated an exception: {e}")
191+
return False
192+
else:
193+
return True
194+
195+
succeeded = [
196+
extract_and_load(url)
197+
for url in URLs
198+
]
199+
200+
print(f"Successfully completed {sum(1 for result in succeeded if result)}")
201+
```

docs/index.md

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,69 @@
1+
pylateral
2+
=========
3+
4+
**Simple multi-threaded task processing in python**
5+
6+
Example
7+
-------
8+
9+
import urllib.request
10+
11+
import pylateral
12+
13+
@pylateral.task
14+
def request_and_print(url):
15+
response = urllib.request.urlopen(url)
16+
print(response.read())
17+
18+
URLS = [
19+
"https://www.nytimes.com/",
20+
"https://www.cnn.com/",
21+
"https://europe.wsj.com/",
22+
"https://www.bbc.co.uk/",
23+
"https://some-made-up-domain.com/",
24+
]
25+
26+
with pylateral.task_pool():
27+
for url in URLS:
28+
request_and_print(url)
29+
30+
print("Complete!")
31+
32+
### What's going on here
33+
34+
- `def request_and_print(url)` is a *pylateral* task that, when called, is run on a task pool thread rather than on the main thread.
35+
36+
- `with pylateral.task_pool()` allocates threads and a task pool. The context manager may exit only when there are no remaining tasks.
37+
38+
- Each call to `request_and_print(url)` adds that task to the task pool. Meanwhile, the main thread continues execution.
39+
40+
- The `Complete!` statement is printed after all the `request_and_print()` task invocations are complete by the pool threads.
41+
42+
To learn more about the features of *pylateral*, check out the [usage](usage.md) section.
43+
44+
Background
45+
----------
46+
47+
A couple of years ago, I inherited my company's codebase to get data into our data warehouse using an ELT approach (extract-and-loads done in python, transforms done in [dbt](https://www.getdbt.com/)/SQL). The codebase has dozens of python scripts to integrate first-party and third-party data from databases, FTPs, and APIs, which are run on a scheduler (typically daily or hourly). The scripts I inherited were single-threaded procedural scripts, looking like glue code, and spending most of their time in network I/O. This got my company pretty far!
48+
49+
As my team and I added more and more integrations with more and more data, we wanted to have faster and faster scripts to reduce our dev cycles and reduce our multi-hour nightly jobs to minutes. Because our scripts were network-bound, multi-threading was a good way to accomplish this, and so I looked into `concurrent.futures` and `asyncio`, but I decided against these options because:
50+
51+
1. It wasn't immediately apparently how to adapt my codebase to use these libraries without either some fundamental changes to our execution platform and/or reworking of our scripts from the ground up and/or adding significant lines of multi-threading code to each script.
52+
53+
2. I believe the procedural style glue code we have is quite easy to comprehend, which I think has a positive impact on the scale of supporting a wide-variety of programs.
54+
55+
And so, I designed *pylateral*, a simple interface to `concurrent.futures.ThreadPoolExecutor` for extract-and-load workloads. The design considerations of this interface include:
56+
57+
- The usage is minimally-invasive to the original un-threaded approach of my company's codebase. (And so, teaching the library has been fairly straightforward despite the multi-threaded paradigm shift.)
58+
59+
- The `@pylateral.task` decorator should be used to encapsulate a homogeneous method accepting different parameters. The contents of the method should be primarily I/O to achieve the concurrency gains of python multi-threading.
60+
61+
- If no `pylateral.pool` context manager has been entered, or if it has been disabled by an environment variable, the `@pylateral.task` decorator does nothing (and the code runs serially).
62+
63+
- While it's possible to return a value from a `@pylateral.task` method, I encourage my team to use the decorator to start-and-complete work; think of writing "embarrassingly parallel" methods that can be "mapped".
64+
65+
### Why not other libraries?
66+
67+
I think that *pylateral* meets an unmet need in python's concurrency eco-system: a simple way to gain the benefits of multi-threading without radically transforming either mindset or codebase.
68+
69+
That said, I don't think *pylateral* is a [silver bullet](https://en.wikipedia.org/wiki/No_Silver_Bullet). See my [comparison](comparison.md) of *pylateral* against other concurrency offerings.

docs/requirements.txt

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
mkdocs==1.0.4
2+
markdown==3.2
3+
mkdocs-exclude==1.0.2
4+
mkdocs-material==4.6.2
5+
markdown-include==0.5.1
6+
pygments==2.5.2

0 commit comments

Comments
 (0)