Commit b0f14c5

[Reverse String] Approach docs cleanup
1 parent a0becd6 commit b0f14c5

20 files changed

Lines changed: 346 additions & 388 deletions

exercises/practice/reverse-string/.approaches/additional-approaches/content.md

Lines changed: 46 additions & 56 deletions
@@ -4,152 +4,142 @@ Below are some interesting strategies that are distinct from the canonical appro
44
While they do not offer particular performance boosts over the canonical approaches (_and some offer very large penalties_), they do explore interesting corners of Python.
55

66

7-
## Convert the Input to a UTF-8 bytearray and use a Sliding Window to Reverse
8-
7+
## Convert the Input to a UTF-8 `bytearray` and use a Sliding Window to Reverse
98

109
```python
1110
def reverse(text):
12-
1311
    # Create bytearrays for input and output.
1412
    given, output = bytearray(text.encode("utf-8")), bytearray(len(text))
1513
    index = 0
16-
    LENGTH_MASK = 0xE0 # this is 0b11110000 (binary) or 224 (decimal)
14+
    LENGTH_MASK = 0xE0  # <-- This is 0b11100000 (binary) or 224 (decimal)
1715

1816
    # Loop through the input bytearray.
1917
    while index < len(given):
20-
21-
        #Either the len is 1 or it is calculated by counting the bits after masking.
22-
        seq_len = (not(given[index] >> 7) or
18+
19+
        # Either the len is 1 or it is calculated by counting the bits after masking.
20+
        seq_len = (not (given[index] >> 7) or
2321
                   (given[index] & LENGTH_MASK).bit_count())
2422

25-
        #Calculate the index start.
26-
        location = index + seq_len +1
27-
28-
        #Prepend the byte segment to the output bytearray
23+
        # Calculate the index start.
24+
        location = index + seq_len + 1
25+
26+
        # Prepend the byte segment to the output bytearray
2927
        output[-location:-index or None] = given[index:index + seq_len]
30-
31-
        #Increment the index count or slide the 'window'.
28+
29+
        # Increment the index count / slide the 'window'.
3230
        index += seq_len
3331

34-
    #Decode output to UTF-8 string and return.
32+
    # Decode output to UTF-8 string and return.
3533
    return output.decode("utf-8")
36-
3734
```
3835

3936
This strategy encodes the string into a UTF-8 [`bytearray`][bytearray].
40-
It then uses a `while` loop to iterate through the text, calculating the length of a sequence (or 'window') to slice from 'given' and prepend to 'output'.
41-
The 'index' counter is then incremented by the length of the 'window'.
42-
Once the 'index' is greater than the length of 'given', the 'output' bytearray is decoded into a UTF-8 string and returned.
43-
This is (_almost_) the same set of operations as described in the code below, but operating on bytes in a bytearray, as opposed to text/codepoints in a `list` - although this strategy does not use `list.pop()` (_bytearray objects do not have a pop method_).
37+
It then uses a `while` loop to iterate through the text, calculating the length of a sequence (or 'window') to slice from `given` and prepend to `output`.
38+
The `index` counter is then incremented by the length of the 'window'.
39+
Once `index` reaches the length of `given`, the `output` `bytearray` is decoded into a UTF-8 string and returned.
40+
This is (_almost_) the same set of operations as described in the next approach, but operating on bytes in a `bytearray`, as opposed to text/codepoints in a `list`, and this strategy does not use `pop()` (_`bytearray.pop()` removes a single byte rather than a whole multi-byte sequence_).
4441

45-
This uses `O(n)` space for the output array.
42+
This uses `O(n)` space for the output array.
4643
It incurs additional runtime overhead by _prepending_ to the output array, which is an expensive operation that forces many repeated shifts.
4744
Encoding to bytes and decoding to codepoints further slow this approach.
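
The window length depends on the leading byte of each UTF-8 sequence. As a minimal sketch of that idea (the names `sequence_length`, `split_sequences`, and `reverse_by_windows` are hypothetical and not part of the approach above), the leading-byte patterns can also be tested explicitly. Note that a mask of `0xE0` keeps only the top three bits, so this sketch checks the four-byte prefix `0b11110` separately:

```python
def sequence_length(lead_byte):
    """Return the byte length of the UTF-8 sequence that starts with lead_byte."""
    if lead_byte >> 7 == 0b0:        # 0xxxxxxx: one byte (ASCII)
        return 1
    if lead_byte >> 5 == 0b110:      # 110xxxxx: two bytes
        return 2
    if lead_byte >> 4 == 0b1110:     # 1110xxxx: three bytes
        return 3
    if lead_byte >> 3 == 0b11110:    # 11110xxx: four bytes
        return 4
    raise ValueError(f"{lead_byte:#x} is not a UTF-8 leading byte")


def split_sequences(text):
    """Split text into one bytes object per encoded codepoint."""
    data = text.encode("utf-8")
    index, windows = 0, []
    while index < len(data):
        size = sequence_length(data[index])
        windows.append(data[index:index + size])
        index += size  # Slide the window past the sequence just read.
    return windows


def reverse_by_windows(text):
    """Reverse text by reversing the order of its UTF-8 byte sequences."""
    return b"".join(reversed(split_sequences(text))).decode("utf-8")
```

Joining the reversed windows once at the end avoids repeated prepending, at the cost of an intermediate `list` of `bytes` objects.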
4845

4946

50-
## Convert the Input to a list and use a While Loop to Pop and Append to a Second List
51-
47+
## Convert the Input to a `list` and use a While Loop to Pop and Append to a Second List
5248

5349
```python
5450
def reverse(text):
55-
    codepoints, stniopedoc = list(text), []
51+
    codepoints, output = list(text), []
5652

5753
    while codepoints:
58-
        stniopedoc.append(codepoints.pop())
59-
60-
    return ''.join(stniopedoc)
54+
        output.append(codepoints.pop())
55+
56+
    return "".join(output)
6157
```
6258

6359
This strategy uses two lists.
6460
One `list` for the codepoints in the text, and one to hold the codepoints in reverse order.
65-
First, the input text is turned into a the 'codepoints' `list`, and iterated over.
66-
Each codepoint is `pop()`ped from 'codepoints' and appended to the 'stniopedoc' `list`.
67-
Finally, 'stniopedoc' is joined via `str.join()` to create the reversed string.
68-
69-
While this is a straightforward and readable approach, it creates both memory and performance overhead, due to the creation of the lists and the use of `join()`.
70-
This is much faster than the bytearray strategy or using string concatenation, but is still almost slower than the slicing strategy.
71-
It also takes up `O(n)` auxiliary space with the stniopedoc list.
61+
First, the input text is turned into the `codepoints` `list`, and iterated over.
62+
Each codepoint is `pop()`ped from `codepoints` and appended to the `output` `list`.
63+
Finally, `output` is joined via `str.join()` to create the reversed string.
7264

65+
While this is a straightforward and readable approach, it creates both memory and performance overhead, due to the creation of the lists and the use of `str.join()`.
66+
This is much faster than the bytearray strategy or using string concatenation, but is still considerably slower than the slicing strategy (roughly 5x to 130x in the timings below).
67+
It also takes up `O(n)` auxiliary space with the `output` list.
7368

7469

7570
## Using Recursion Instead of a Loop
7671

77-
7872
```python
7973
def reverse(text):
8074
    if len(text) == 0:
8175
        return text
82-
    else:
83-
        return reverse(text[1:]) + text[0]
76+
    return reverse(text[1:]) + text[0]
8477
```
8578

8679
This strategy uses a slice to copy all but the leftmost codepoint of the string, concatenating the codepoint at the first index to the end.
8780
The function then calls itself with the (now shorter) text slice.
88-
This slice + concatenation process continues until the `len()` is 0, and the reversed text is returned up the call stack.
81+
This slice/concatenate process continues until the `len()` is 0, and the reversed text is returned up the call stack.
8982
This is the same as iterating over the string backward in a `loop`, appending each codepoint to a new string, and has identical time complexity.
90-
It also uses O(n) space, with the space being successive calls on the call stack.
83+
It also uses `O(n**2)` space, as the space taken up by successive calls on the call stack builds up.
9184

9285
Because each recursive call is placed on the stack and Python limits recursion depth to 1000 by default, this code produces a `maximum recursion depth exceeded` error for any string longer than ~999 characters.
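
The limit can be observed directly. The following is a small sketch, assuming CPython's standard `sys.getrecursionlimit()` and `RecursionError`; the helper name `hits_recursion_limit` is hypothetical and only used for illustration:

```python
import sys


def reverse(text):
    if len(text) == 0:
        return text
    return reverse(text[1:]) + text[0]


def hits_recursion_limit(length):
    """Return True if reversing a string of this length raises RecursionError."""
    try:
        reverse("a" * length)
    except RecursionError:
        return True
    return False


limit = sys.getrecursionlimit()  # 1000 by default in CPython.
print(hits_recursion_limit(10))          # Short input completes normally.
print(hits_recursion_limit(limit + 10))  # Needs more frames than the limit allows.
```

Raising the limit with `sys.setrecursionlimit()` only moves the failure point, so it does not make this strategy suitable for long strings.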
9386

9487

95-
## Using `map()` and `lambbda` with `Join()` Instead of a Loop
88+
## Using `map()` and a `lambda` with `str.join()` Instead of a Loop
9689

9790
```python
9891
def reverse(text):
99-
    return "".join(list(map(lambda x: text[(-x-1)], range(len(text)))))
92+
    return "".join(list(map(lambda x: text[-x - 1], range(len(text)))))
10093
```
10194

10295
This variation uses the built-in `map()` and a `lambda` to iterate over the string backward, constructing a `list`.
10396
The `list` is then fed to `str.join()`, which unpacks it and turns it into a string.
10497
This is a very non-performant way to walk the string backwards, and also incurs extra overhead due to the unneeded construction of an intermediary `list`.
10598

106-
`map()` can instead be directly fed to `join()`, which improves performance to `O(n)`:
99+
`map()` can instead be directly fed to `str.join()`, which improves performance:
107100

108101
```python
109102
def reverse(text):
110-
    return "".join(map(lambda x: text[(-x-1)], range(len(text))))
103+
    return "".join(map(lambda x: text[-x - 1], range(len(text))))
111104
```
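
To make the index arithmetic concrete, here is a short sketch with an arbitrary sample string (`"robot"` is only an illustration). It shows which negative indexes the `lambda` visits and that both variants produce the same result:

```python
text = "robot"

# Each x from range(len(text)) is mapped to the index -x - 1, counting from the end.
indexes = [-x - 1 for x in range(len(text))]

# map() returns a lazy iterator, so nothing is looked up until it is consumed.
lazy = map(lambda x: text[-x - 1], range(len(text)))

with_list = "".join(list(map(lambda x: text[-x - 1], range(len(text)))))
without_list = "".join(lazy)

print(indexes)
print(with_list, without_list)
```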
112105

113106

114107
## Using a `lambda` that returns a Reverse Sequence Slice
115108

116-
117109
```python
118110
reverse = lambda text: text[::-1]
119111
```
120112

121-
122113
This strategy assigns the name "reverse" to a `lambda` that produces a reverse slice of the string.
123114
This looks quite clever and is shorter than a "traditional" function, but it is far from obvious that this line defines a callable named "reverse" that returns a reversed string.
124-
While this code compiles to the same function definition as the first approach article, it is not clear to many programmers who might read through this code that they could call `reverse('some_string')` the way they could call other functions.
115+
While this code compiles to the same function definition as the first approach article, it is not clear to many programmers who might read through this code that they could call `reverse("some_string")` the way they could call other functions.
125116

126117

127-
This has the added disadvantage of creating troubleshooting issues since any errors will be attributed to `lambda` in the stack trace and not associated with an explicit function named `reverse`.
118+
This has the added disadvantage of creating troubleshooting issues, since any errors will be attributed to `lambda` in the stack trace and not associated with an explicit function named `reverse`.
128119
Help calls and `__repr__` calls are similarly affected.
129-
This is not the intended use of `lambdas` (_which are for unnamed or anonymous functions_), nor does it confer any sort of performance boost over other methods, but _does_ create readability issues with anyone unfamiliar with `lambda` syntax and compilation.
120+
This is not the intended use of `lambdas` (_which are for unnamed or anonymous functions_), nor does it confer any sort of performance boost over other methods, but it _does_ create readability issues with anyone unfamiliar with `lambda` syntax and compilation.
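
The naming difference is easy to see. This sketch uses hypothetical names (`reverse_lambda`, `reverse_def`, `failing_frame_name`) and deliberately passes a non-string argument to trigger an error:

```python
import traceback

reverse_lambda = lambda text: text[::-1]


def reverse_def(text):
    return text[::-1]


def failing_frame_name(func):
    """Call func with an int and return the name of the frame that raised."""
    try:
        func(42)  # An int cannot be sliced, so this raises TypeError.
    except TypeError as error:
        return traceback.extract_tb(error.__traceback__)[-1].name


print(reverse_lambda.__name__)             # <lambda>
print(reverse_def.__name__)                # reverse_def
print(failing_frame_name(reverse_lambda))  # <lambda>
print(failing_frame_name(reverse_def))     # reverse_def
```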
130121

131122

132123
## Timings vs Reverse Slice
133124

134-
135125
As a (very) rough comparison, below is a timing table for these functions vs the canonical reverse slice:
136126

137127

138-
| **string lengths >>>>** | Str Len: 5 | Str Len: 11 | Str Len: 22 | Str Len: 52 | Str Len: 68 | Str Len: 86 | Str Len: 142 | Str Len: 1420 | Str Len: 14200 | Str Len: 142000 |
139-
|------------------------- |------------ |------------- |------------- |------------- |------------- |------------- |-------------- |--------------- |---------------- |----------------- |
140-
| reverse slice | 1.66e-07 | 1.75e-07 | 1.79e-07 | 2.03e-07 | 2.22e-07 | 2.38e-07 | 3.63e-07 | 1.44e-06 | 1.17e-05 | 1.16e-04 |
141-
| reverse lambda | 1.68e-07 | 1.72e-07 | 1.85e-07 | 2.03e-07 | 2.44e-07 | 2.35e-07 | 3.65e-07 | 1.47e-06 | 1.25e-05 | 1.18e-04 |
142-
| reverse dual lists | 9.17e-07 | 1.56e-06 | 2.70e-06 | 5.69e-06 | 8.30e-06 | 1.07e-05 | 1.80e-05 | 1.48e-04 | 1.50e-03 | 1.53e-02 |
143-
| reverse recursive | 8.74e-07 | 1.90e-06 | 4.02e-06 | 8.97e-06 | 1.24e-05 | 1.47e-05 | 3.34e-05 | --- | --- | --- |
144-
| reverse bytes | 1.92e-06 | 3.82e-06 | 7.36e-06 | 1.65e-05 | 2.17e-05 | 2.71e-05 | 4.47e-05 | 5.17e-04 | 6.10e-03 | 2.16e-01 |
128+
| **string length >>>>** | 5 | 11 | 22 | 52 | 68 | 86 | 142 | 1420 | 14200 | 142000 |
129+
|------------------------|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|
130+
| reverse slice | 1.66e-07 | 1.75e-07 | 1.79e-07 | 2.03e-07 | 2.22e-07 | 2.38e-07 | 3.63e-07 | 1.44e-06 | 1.17e-05 | 1.16e-04 |
131+
| reverse lambda | 1.68e-07 | 1.72e-07 | 1.85e-07 | 2.03e-07 | 2.44e-07 | 2.35e-07 | 3.65e-07 | 1.47e-06 | 1.25e-05 | 1.18e-04 |
132+
| reverse dual lists | 9.17e-07 | 1.56e-06 | 2.70e-06 | 5.69e-06 | 8.30e-06 | 1.07e-05 | 1.80e-05 | 1.48e-04 | 1.50e-03 | 1.53e-02 |
133+
| reverse recursive | 8.74e-07 | 1.90e-06 | 4.02e-06 | 8.97e-06 | 1.24e-05 | 1.47e-05 | 3.34e-05 | --- | --- | --- |
134+
| reverse bytes | 1.92e-06 | 3.82e-06 | 7.36e-06 | 1.65e-05 | 2.17e-05 | 2.71e-05 | 4.47e-05 | 5.17e-04 | 6.10e-03 | 2.16e-01 |
145135

146136

147137
As you can see, the reverse using two lists and the reverse using a bytearray are orders of magnitude slower than using a reverse slice.
148138
For the largest inputs measured, the dual list solution was roughly 130x slower, and the bytearray solution was roughly 1,860x slower.
149139
Timings for strings over 142 characters could not be run for the recursive strategy, due to Python's 1000 call recursion limit.
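
The slowdown factors can be recomputed from the last column of the table. This is only arithmetic on the published figures, not a new measurement:

```python
# Seconds per call for the 142000-character input, copied from the table above.
timings = {
    "reverse slice": 1.16e-04,
    "reverse dual lists": 1.53e-02,
    "reverse bytes": 2.16e-01,
}

baseline = timings["reverse slice"]
slowdowns = {name: seconds / baseline for name, seconds in timings.items()}

for name, factor in slowdowns.items():
    print(f"{name}: {factor:.0f}x")
```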
150140

151141

152-
Measurements were taken on a 3.1 GHz Quad-Core Intel Core i7 Mac running MacOS Ventura.
142+
Measurements were taken on a 3.1 GHz Quad-Core Intel Core i7 Mac running macOS Ventura.
153143
Tests used `timeit.Timer.autorange()`, repeated 3 times.
154144
Time is reported in seconds taken per string after calculating the 'best of' time.
155145
The [`timeit`][timeit] module docs have more details, and [note.nkmk.me][note_nkmk_me] has a nice summary of methods.
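
As a rough sketch of this kind of measurement (the helper name `best_time_per_call` is hypothetical, and this is not the exact script used for the table), `timeit.Timer.autorange()` can pick a loop count and `Timer.repeat()` can supply the 'best of' runs:

```python
import timeit


def reverse_slice(text):
    return text[::-1]


def best_time_per_call(func, text, repeats=3):
    """Return the best observed time in seconds for a single call of func(text)."""
    timer = timeit.Timer(lambda: func(text))
    # autorange() picks a loop count large enough that a run takes at least 0.2 seconds.
    number, _ = timer.autorange()
    runs = timer.repeat(repeat=repeats, number=number)
    return min(runs) / number


print(f"{best_time_per_call(reverse_slice, 'hello'):.2e}")
```

Results will differ by machine and Python version, so only the relative ordering of strategies is meaningful.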
Lines changed: 8 additions & 8 deletions
@@ -1,8 +1,8 @@
1-
given, output = bytearray(text.encode("utf-8")), bytearray(len(given))
2-
index, LENGTH_MASK = 0, 0xE0 # 0b11110000 or 224
3-
while index < len(given):
4-
    seq_len = not(given[index] >> 7) or (given[index] & LENGTH_MASK).bit_count()
5-
    location = index + seq_len +1
6-
    output[-location:-index or None] = given[index:index + seq_len]
7-
    index += seq_len
8-
return output.decode("utf-8")
1+
given, output = bytearray(text.encode("utf-8")), bytearray(len(text))
2+
index, LENGTH_MASK = 0, 0xE0  # <-- 0b11100000 or 224
3+
while index < len(given):
4+
    seq_len = not (given[index] >> 7) or (given[index] & LENGTH_MASK).bit_count()
5+
    location = index + seq_len + 1
6+
    output[-location:-index or None] = given[index:index + seq_len]
7+
    index += seq_len
8+
return output.decode("utf-8")
