Commit b91f9ef

Yrahcaz7 and BethanyG authored
[Acronym] Add new approach and update performance article (#4203)
* fix benchmark & add timings for new approach
* update "measurements were taken on" part
* improve performance article
* fix the performance article's `config.json`
* add approach docs for new approach
* Apply suggestions from code review

  Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>
* minor fixes

---------

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>
1 parent 6fcab00 commit b91f9ef

7 files changed

Lines changed: 154 additions & 60 deletions

exercises/practice/acronym/.approaches/config.json

Lines changed: 9 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,7 @@
11
{
22
"introduction": {
3-
"authors": ["bethanyg"]
3+
"authors": ["bethanyg"],
4+
"contributors": ["yrahcaz7"]
45
},
56
"approaches": [
67
{
@@ -51,6 +52,13 @@
5152
"title": "Regex Sub",
5253
"blurb": "Use re.sub() to clean the input string and create the acronym in one step.",
5354
"authors": ["bethanyg"]
55+
},
56+
{
57+
"uuid": "0ce3eaf7-da79-403d-a481-5dd8f476d286",
58+
"slug": "double-generator-expression",
59+
"title": "Double Generator Expression",
60+
"blurb": "Use generator expressions for both cleaning and joining the input.",
61+
"authors": ["yrahcaz7"]
5462
}
5563
]
5664
}
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
# Using a `generator-expression` for both cleaning and joining
2+
3+
```python
4+
from string import ascii_letters
5+
6+
7+
VALID_CHARS = {' ', '-'} | set(ascii_letters)
8+
9+
10+
def abbreviate(to_abbreviate):
11+
to_abbreviate = ''.join(' ' if char == '-' else char
12+
for char in to_abbreviate
13+
if char in VALID_CHARS)
14+
15+
return ''.join(word[0] for word in to_abbreviate.split()).upper()
16+
```
17+
18+
One way someone might try to increase performance is to use a single [generator expression][generator-expression] to clean the input, rather than using multiple calls to [`str.replace()`][str-replace].
19+
However, this approach is actually amongst the slower ones.
20+
(See the [performance article][article-performance] for more detail.)
21+
22+
In this approach, the `VALID_CHARS` constant is first defined using `string.ascii_letters`, a space, and a hyphen.
23+
In `abbreviate()`, the first generator expression iterates over `to_abbreviate`, excluding any code points that are not a member of the `VALID_CHARS` set.
24+
For each code point that is not excluded, the expression passes it into [`str.join()`][str-join] (unless it is a hyphen, in which case it replaces the hyphen with a space).
25+
`to_abbreviate` is then set to the result of the `str.join()`, preparing it for the next step.
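
As a small illustration of this cleaning step on its own, the sketch below applies the same generator expression to a sample phrase. The helper name `clean` and the sample input are hypothetical, chosen only for demonstration.

```python
from string import ascii_letters

VALID_CHARS = {' ', '-'} | set(ascii_letters)


def clean(phrase):
    # Hypothetical helper: keep only letters, spaces, and hyphens,
    # swapping each hyphen for a space as it passes through.
    return ''.join(' ' if char == '-' else char
                   for char in phrase
                   if char in VALID_CHARS)


# Punctuation is dropped and the hyphen becomes a space.
print(clean("Complementary metal-oxide semiconductor!"))
```

Because the underscore and the apostrophe are not in `VALID_CHARS`, they are removed outright rather than being turned into word separators.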
26+
27+
Next, [`to_abbreviate.split()`][str-split] is used to split `to_abbreviate` into words separated by whitespace. Hyphens need no special handling here, because all of them have already been replaced with spaces.
28+
Now the second generator expression iterates over the list returned by `to_abbreviate.split()`, yielding the first code point in each word.
29+
These code points are passed to another `str.join()`, which is then [chained][chaining] to [`str.upper()`][str-upper].
30+
Now that both steps are complete, we return the result of `str.upper()` directly on the same line.
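
To make the second step concrete, here is a sketch that starts from an already-cleaned string and exposes the intermediate values. The variable names and the sample string are illustrative assumptions, not part of the approach itself.

```python
cleaned = "Complementary metal oxide semiconductor"

# str.split() with no arguments splits on runs of whitespace.
words = cleaned.split()

# The generator expression yields the first character of each word.
first_letters = ''.join(word[0] for word in words)

# In the approach, str.upper() is chained onto the joined result;
# it is applied separately here so each stage can be inspected.
acronym = first_letters.upper()

print(words)
print(first_letters)
print(acronym)
```

Uppercasing after the join means only the short acronym is converted, rather than the whole input string.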
31+
32+
[article-performance]: https://exercism.org/tracks/python/exercises/acronym/articles/performance
33+
[chaining]: https://pyneng.readthedocs.io/en/latest/book/04_data_structures/method_chaining.html
34+
[generator-expression]: https://dbader.org/blog/python-generator-expressions
35+
[str-join]: https://docs.python.org/3/library/stdtypes.html#str.join
36+
[str-replace]: https://docs.python.org/3/library/stdtypes.html#str.replace
37+
[str-split]: https://docs.python.org/3/library/stdtypes.html#str.split
38+
[str-upper]: https://docs.python.org/3/library/stdtypes.html#str.upper
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
VALID_CHARS = {' ', '-'} | set(ascii_letters)
2+
3+
def abbreviate(to_abbreviate):
4+
to_abbreviate = ''.join(' ' if char == '-' else char
5+
for char in to_abbreviate
6+
if char in VALID_CHARS)
7+
8+
return ''.join(word[0] for word in to_abbreviate.split()).upper()

exercises/practice/acronym/.approaches/introduction.md

Lines changed: 38 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -6,13 +6,12 @@ Among them are:
66
- Using `str.replace()` to scrub the input, and:
77
- joining with a `for loop` with string concatenation via the `+` operator.
88
- joining via `str.join()`, passing a `list-comprehension` or `generator-expression`.
9-
- joining via `str.join()`, passing `map()`.
9+
- joining via `str.join()`, passing `map()`.
1010
- joining via `functools.reduce()`.
1111

1212
- Using `re.findall()`/`re.finditer()` to scrub the input, and:
1313
- joining via `str.join()`, passing a `generator-expression`.
14-
15-
- Using `re.sub()` for both cleaning and joining (_using "only" regex for almost everything_)`
14+
- Using `re.sub()` for both cleaning and joining (_using "only" regex for almost everything_)
1615

1716

1817
## General Guidance
@@ -51,25 +50,25 @@ def abbreviate(to_abbreviate):
5150
For more information, take a look at the [loop approach][approach-loop].
5251

5352

54-
## Approach: scrub with `replace()` and join via `list comprehension` or `Generator expression`
53+
## Approach: scrub with `replace()` and join via `list comprehension` or `generator expression`
5554

5655

5756
```python
5857
def abbreviate(to_abbreviate):
5958
phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()
6059

6160
return ''.join([word[0] for word in phrase])
62-
63-
###OR###
64-
61+
62+
###OR###
63+
6564
def abbreviate(to_abbreviate):
6665
phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()
6766

68-
# note the parenthesis instead of square brackets.
67+
# Note the parentheses instead of square brackets.
6968
return ''.join((word[0] for word in phrase))
7069
```
7170

72-
For more information, check out the [list-comprehension][approach-list-comprehension] approach or the [generator-expression][approach-generator-expression] approach.
71+
For more information, check out the [list-comprehension][approach-list-comprehension] approach or the [generator-expression][approach-generator-expression] approach.
7372

7473

7574
## Approach: scrub with `replace()` and join via `map()`
@@ -96,7 +95,7 @@ def abbreviate(to_abbreviate):
9695
return reduce(lambda start, word: start + word[0], phrase, "")
9796
```
9897

99-
For more information, take a look at the [functools.reduce()][approach-functools-reduce] approach.
98+
For more information, take a look at the [`functools.reduce()`][approach-functools-reduce] approach.
10099

101100

102101
## Approach: filter with `re.findall()` and join via `str.join()`
@@ -105,8 +104,8 @@ For more information, take a look at the [functools.reduce()][approach-functools
105104
import re
106105

107106

108-
def abbreviate(phrase):
109-
removed = re.findall(r"[a-zA-Z']+", phrase)
107+
def abbreviate(to_abbreviate):
108+
removed = re.findall(r"[a-zA-Z']+", to_abbreviate)
110109

111110
return ''.join(word[0] for word in removed).upper()
112111
```
@@ -120,36 +119,57 @@ For more information, take a look at the [regex-join][approach-regex-join] appro
120119
import re
121120

122121

123-
def abbreviate_regex_sub(to_abbreviate):
122+
def abbreviate(to_abbreviate):
124123
pattern = re.compile(r"(?<!_)\B[\w']+|[ ,\-_]")
125124

126-
return re.sub(pattern, "", to_abbreviate.upper())
125+
return re.sub(pattern, "", to_abbreviate.upper())
127126
```
128127

129128
For more information, read the [regex-sub][approach-regex-sub] approach.
130129

131130

131+
## Approach: use a `generator-expression` for both cleaning and joining
132+
133+
```python
134+
from string import ascii_letters
135+
136+
137+
VALID_CHARS = {' ', '-'} | set(ascii_letters)
138+
139+
140+
def abbreviate(to_abbreviate):
141+
to_abbreviate = ''.join(' ' if char == '-' else char
142+
for char in to_abbreviate
143+
if char in VALID_CHARS)
144+
145+
return ''.join(word[0] for word in to_abbreviate.split()).upper()
146+
```
147+
148+
For more information, take a look at the [double `generator-expression` approach][approach-double-generator-expression].
149+
150+
132151
## Other approaches
133152

134-
Besides these seven idiomatic approaches, there are a multitude of possible variations using different string cleaning and joining methods.
153+
Besides these eight idiomatic approaches, there are a multitude of possible variations using different string cleaning and joining methods.
135154

136155
However, these listed approaches cover the majority of 'mainstream' strategies.
137156

138157

139158
## Which approach to use?
140159

141-
All seven approaches are idiomatic, and show multiple paradigms and possibilities.
160+
All eight approaches are idiomatic, and show multiple paradigms and possibilities.
142161
All approaches are also `O(n)`, with `n` being the length of the input string.
143162
No matter the removal method, the entire input string must be iterated through to be cleaned and the first letters extracted.
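
One rough way to observe this linear behavior is to time a single approach on inputs of increasing length. The sketch below uses `timeit.Timer.autorange()`, as the benchmark for the performance article does. The helper `per_call_time` and the generated inputs are assumptions made for illustration, and absolute timings will vary by machine.

```python
import timeit


def abbreviate(to_abbreviate):
    phrase = to_abbreviate.replace('-', ' ').replace('_', ' ').upper().split()
    return ''.join(word[0] for word in phrase)


def per_call_time(function, data):
    # autorange() returns a (number_of_loops, total_seconds) tuple.
    loops, total = timeit.Timer(lambda: function(data)).autorange()
    return total / loops


base = "Complementary metal-oxide semiconductor "
for repeats in (1, 10, 100):
    data = base * repeats
    print(f"Length {len(data):>5}: {per_call_time(abbreviate, data):.2e} s per call")
```

If the approach is `O(n)`, the per-call time should grow roughly in proportion to the input length, although small inputs are often dominated by fixed call overhead.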
144163

145-
Of these strategies, the `loop` approach is the fastest, although `list-comprehension`, `map`, and `reduce` have near-identical performance for the test data.
164+
Of these strategies, the `loop` approach is the fastest, although `list-comprehension`, `map`, and `reduce` have near-identical performance for the test data.
146165
All approaches are fairly succinct and readable, although the 'classic' loop is probably the easiest understood by those coming to Python from other programming languages.
147166

148167

149-
The least performant for the test data was using a `generator-expression`, `re.findall` and `re.sub` (_least performant_).
168+
The least performant approaches for the test data were those using `generator-expression`s (both one and two), `re.findall`, and `re.sub`.
150169

151170
To compare performance of the approaches, take a look at the [Performance article][article-performance].
152171

172+
[approach-double-generator-expression]: https://exercism.org/tracks/python/exercises/acronym/approaches/double-generator-expression
153173
[approach-functools-reduce]: https://exercism.org/tracks/python/exercises/acronym/approaches/functools-reduce
154174
[approach-generator-expression]: https://exercism.org/tracks/python/exercises/acronym/approaches/generator-expression
155175
[approach-list-comprehension]: https://exercism.org/tracks/python/exercises/acronym/approaches/list-comprehension

exercises/practice/acronym/.articles/config.json

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,8 @@
55
"slug": "performance",
66
"title": "Performance deep dive",
77
"blurb": "Deep dive to find out the most performant approach to forming an acronym.",
8-
"authors": ["bethanyg, colinleach"]
8+
"authors": ["bethanyg", "colinleach"],
9+
"contributors": ["yrahcaz7"]
910
}
1011
]
11-
}
12+
}

exercises/practice/acronym/.articles/performance/code/Benchmark.py

Lines changed: 33 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -12,11 +12,18 @@
1212
import timeit
1313
import re
1414
from functools import reduce
15+
from string import ascii_letters
1516

1617
import pandas as pd
1718
import numpy as np
1819

1920

21+
FIND_INCLUSION_REGEX = re.compile(r"[a-zA-Z']+")
22+
SUB_EXCLUSION_REGEX = re.compile(r"(?<!_)\B[\w']+|[ ,\-_]")
23+
FINDALL_INCLUSION_REGEX = re.compile(r"(?<!')\b[a-zA-Z]|(?<=_)[^ _']")
24+
VALID_CHARS = {' ', '-'} | set(ascii_letters)
25+
26+
2027
# ------------ FUNCTIONS TO TIME ------------- #
2128
def abbreviate_list_comprehension(to_abbreviate):
2229
phrase = to_abbreviate.replace("_", " ").replace("-", " ").upper().split()
@@ -52,27 +59,34 @@ def abbreviate_reduce(to_abbreviate):
5259

5360

5461
def abbreviate_regex_join(phrase):
55-
removed = re.findall(r"[a-zA-Z']+", phrase)
62+
removed = re.findall(FIND_INCLUSION_REGEX, phrase)
5663
return ''.join(word[0] for word in removed).upper()
5764

5865

5966
def abbreviate_finditer_join(to_abbreviate):
6067
return ''.join(word[0][0] for word in
61-
re.finditer(r"[a-zA-Z']+", to_abbreviate)).upper()
68+
re.finditer(FIND_INCLUSION_REGEX, to_abbreviate)).upper()
6269

6370

6471
def abbreviate_regex_sub(to_abbreviate):
65-
pattern = re.compile(r"(?<!_)\B[\w']+|[ ,\-_]")
66-
return re.sub(pattern, "", to_abbreviate).upper()
72+
return re.sub(SUB_EXCLUSION_REGEX, "", to_abbreviate).upper()
6773

6874

6975
def abbreviate_regex_findall(to_abbreviate):
70-
return ''.join(re.findall(r"(?<!')\b[a-zA-Z]|(?<=_)[^ _']", to_abbreviate.upper()))
76+
return ''.join(re.findall(FINDALL_INCLUSION_REGEX, to_abbreviate.upper()))
77+
78+
79+
def abbreviate_double_genex(to_abbreviate):
80+
to_abbreviate = ''.join(' ' if char == '-' else char
81+
for char in to_abbreviate
82+
if char in VALID_CHARS)
83+
84+
return ''.join(word[0] for word in to_abbreviate.split()).upper()
7185

7286

7387
## ---------END FUNCTIONS TO BE TIMED-------------------- ##
7488

75-
## -------- Timing Code Starts Here ---------------------##
89+
## --------- Timing Code Starts Here -------------------- ##
7690

7791

7892
# Input Data Setup
@@ -109,22 +123,23 @@ def abbreviate_regex_findall(to_abbreviate):
109123
]
110124

111125

112-
# #Set up columns and rows for Pandas Data Frame
113-
col_headers = [f'Length: {len(item)}'for item in inputs]
126+
# Set up columns and rows for Pandas Data Frame
127+
col_headers = [f'Length: {len(item)}' for item in inputs]
114128
row_headers = ["loop with str.replace",
115-
"list_comprehension with str.join()",
129+
"list comprehension with str.join()",
116130
"map() with str.replace()",
117131
"functools.reduce() with str.replace()",
118132
"generator expression with str.join()",
119133
"regex to clean with str.join()",
120134
"re.finditer() with str.join()",
121135
"re.sub() to clean and join",
122-
"re.findall() 1st letters w/ str.join()"]
136+
"re.findall() 1st letters with str.join()",
137+
"two generator expressions"]
123138

124-
# # empty dataframe will be filled in one cell at a time later
139+
# Empty dataframe will be filled in one cell at a time later.
125140
df = pd.DataFrame(np.nan, index=row_headers, columns=col_headers)
126141

127-
# #Function List to Call When Timing
142+
# Function List to Call When Timing.
128143
functions = [abbreviate_loop,
129144
abbreviate_list_comprehension,
130145
abbreviate_map,
@@ -133,9 +148,10 @@ def abbreviate_regex_findall(to_abbreviate):
133148
abbreviate_regex_join,
134149
abbreviate_finditer_join,
135150
abbreviate_regex_sub,
136-
abbreviate_regex_findall]
151+
abbreviate_regex_findall,
152+
abbreviate_double_genex]
137153

138-
# Run timings using timeit.autorange(). Run Each Set 3 Times.
154+
# Run timings using timeit.autorange(). Run Each Set 3 Times.
139155
for function, title in zip(functions, row_headers):
140156
timings = [[
141157
timeit.Timer(lambda: function(data), globals=globals()).autorange()[1] /
@@ -149,9 +165,9 @@ def abbreviate_regex_findall(to_abbreviate):
149165
print(f'{title}', f'Timings : {timing_result}')
150166

151167
# Insert results into the dataframe
152-
df.loc[title, 'Length: 13':'Length: 1114'] = timing_result
168+
df.loc[title, 'Length: 13':'Length: 2940'] = timing_result
153169

154-
# The next bit is useful for `introduction.md`
170+
# The next bit is useful for updating `content.md` with new results.
155171
pd.options.display.float_format = '{:,.2e}'.format
156172
print('\nDataframe in Markdown format:\n')
157-
print(df.to_markdown(floatfmt=".2e"))
173+
print(df.to_markdown(floatfmt=".2e"))
