
Commit b58da97: Refactoring and improvements (#81)
1 parent b865832

File tree: 13 files changed, +918 -669 lines

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -1,4 +1,8 @@
+build/
 dist/
 *.egg-info/
 .python-version
 __pycache__
+.tox/
+.direnv/
+.envrc
```

README.md

Lines changed: 115 additions & 37 deletions
````diff
@@ -1,14 +1,25 @@
 # Scrapy Autounit
 
 [![AppVeyor](https://ci.appveyor.com/api/projects/status/github/scrapinghub/scrapy-autounit?branch=master&svg=true)](https://ci.appveyor.com/project/scrapinghub/scrapy-autounit/branch/master)
-[![PyPI Version](https://img.shields.io/pypi/v/scrapy-autounit.svg?color=blue)](https://pypi.python.org/pypi/scrapy-autounit/)
+[![PyPI Version](https://img.shields.io/pypi/v/scrapy-autounit.svg?color=blue)](https://pypi.python.org/pypi/scrapy-autounit/)
+
+## Documentation
+- [Overview](#overview)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Caveats](#caveats)
+- [Settings](#settings)
+- [Command Line Interface](#command-line-interface)
+- [Internals](#internals)
+
 
 ## Overview
 
 Scrapy-Autounit is an automatic test generation tool for your Scrapy spiders.
 
 It generates test fixtures and test cases as you run your spiders.
-The test fixtures are generated from the items and requests that your spider yields, then the test cases evaluate those fixtures against your spiders' callbacks.
+
+The fixtures are generated from the items and requests that your spider returns, then the test cases evaluate those fixtures against your spiders' callbacks.
 
 Scrapy Autounit generates fixtures and tests per spider and callback under the Scrapy project root directory.
 Here is an example of the directory tree of your project once the fixtures are created:
@@ -36,12 +47,14 @@ my_project
 │   └── my_spider.py
 └── scrapy.cfg
 ```
+
 
 ## Installation
 
 ```
 pip install scrapy_autounit
 ```
+
 
 ## Usage
 
@@ -62,74 +75,92 @@ To generate your fixtures and tests just run your spiders as usual, Scrapy Autou
 $ scrapy crawl my_spider
 ```
 When the spider finishes, a directory `autounit` is created in your project root dir, containing all the generated tests/fixtures for the spider you just ran (see the directory tree example above).
-If you want to **update** your tests and fixtures you only need to run your spiders again.
+
+If you want to **update** your tests and fixtures you only need to run your spiders again or use the [`autounit update`](#autounit-update) command line tool.
 
 ### Running tests
 To run your tests you can use `unittest` regular commands.
 
 ###### Test all
 ```
-$ python -m unittest
+$ python -m unittest discover autounit/tests/
 ```
 ###### Test a specific spider
 ```
-$ python -m unittest discover -s autounit.tests.my_spider
+$ python -m unittest discover autounit/tests/my_spider/
 ```
 ###### Test a specific callback
 ```
-$ python -m unittest discover -s autounit.tests.my_spider.my_callback
-```
-###### Test a specific fixture
-```
-$ python -m unittest autounit.tests.my_spider.my_callback.test_fixture2
+$ python -m unittest discover autounit/tests/my_spider/my_callback/
 ```
+
 
 ## Caveats
 - Keep in mind that as long as `AUTOUNIT_ENABLED` is on, each time you run a spider tests/fixtures are going to be generated for its callbacks.
 This means that if you have your tests/fixtures ready to go, this setting should be off to prevent undesired overwrites.
 Each time you want to regenerate your tests (e.g.: due to changes in your spiders), you can turn this on again and run your spiders as usual.
+For example, this setting should be off when running your spiders in Scrapy Cloud.
 
-- Autounit uses an internal `_autounit` key in requests' meta dictionaries. Avoid using/overriding this key in your spiders when adding data to meta to prevent unexpected behaviours.
+- Autounit uses an internal `_autounit_cassette` key in requests' meta dictionaries. Avoid using/overriding this key in your spiders when adding data to meta to prevent unexpected behaviours.
+
 
 ## Settings
 
-**AUTOUNIT_ENABLED**
+###### General
+
+- **AUTOUNIT_ENABLED**
 Set this to `True` or `False` to enable or disable unit test generation.
 
-**AUTOUNIT_MAX_FIXTURES_PER_CALLBACK**
+- **AUTOUNIT_MAX_FIXTURES_PER_CALLBACK**
 Sets the maximum number of fixtures to store per callback.
 `Minimum: 10`
 `Default: 10`
 
-**AUTOUNIT_SKIPPED_FIELDS**
+- **AUTOUNIT_EXTRA_PATH**
+This is an extra string element to add to the test path and name between the spider name and callback name. You can use this to separate tests from the same spider with different configurations.
+`Default: None`
+
+###### Output
+
+- **AUTOUNIT_DONT_TEST_OUTPUT_FIELDS**
 Sets a list of fields to be skipped from testing your callbacks' items. It's useful to bypass fields that return a different value on each run.
 For example if you have a field that is always set to `datetime.now()` in your spider, you probably want to add that field to this list to be skipped on tests. Otherwise you'll get a different value when you're generating your fixtures than when you're running your tests, making your tests fail.
 `Default: []`
 
-**AUTOUNIT_REQUEST_SKIPPED_FIELDS**
-Sets a list of request fields to be skipped when running your tests.
-Similar to AUTOUNIT_SKIPPED_FIELDS but applied to requests instead of items.
+###### Requests
+
+- **AUTOUNIT_DONT_TEST_REQUEST_ATTRS**
+Sets a list of request attributes to be skipped when running your tests.
 `Default: []`
 
-**AUTOUNIT_EXCLUDED_HEADERS**
+- **AUTOUNIT_DONT_RECORD_HEADERS**
 Sets a list of headers to exclude from requests recording.
-For security reasons, Autounit already excludes `Authorization` and `Proxy-Authorization` headers by default, if you want to include them in your fixtures see *`AUTOUNIT_INCLUDED_AUTH_HEADERS`*.
+For security reasons, Autounit already excludes `Authorization` and `Proxy-Authorization` headers by default; if you want to record them in your fixtures see *`AUTOUNIT_RECORD_AUTH_HEADERS`*.
 `Default: []`
 
-**AUTOUNIT_INCLUDED_AUTH_HEADERS**
+- **AUTOUNIT_RECORD_AUTH_HEADERS**
 If you want to include `Authorization` or `Proxy-Authorization` headers in your fixtures, add one or both of them to this list.
 `Default: []`
 
-**AUTOUNIT_INCLUDED_SETTINGS**
-Sets a list of settings names to be recorded in the generated test case.
+###### Spider attributes
+
+- **AUTOUNIT_DONT_RECORD_SPIDER_ATTRS**
+Sets a list of spider attributes that won't be recorded into your fixtures.
 `Default: []`
 
-**AUTOUNIT_EXTRA_PATH**
-This is an extra string element to add to the test path and name between the spider name and callback name. You can use this to separate tests from the same spider with different configurations.
-`Default: None`
+- **AUTOUNIT_DONT_TEST_SPIDER_ATTRS**
+Sets a list of spider attributes to be skipped from testing your callbacks. These attributes will still be recorded.
+`Default: []`
+
+###### Settings
+
+- **AUTOUNIT_RECORD_SETTINGS**
+Sets a list of settings names to be recorded in the generated test case.
+`Default: []`
 
 ---
-**Note**: Remember that you can always apply any of these settings per spider including them in your spider's `custom_settings` class attribute - see https://docs.scrapy.org/en/latest/topics/settings.html#settings-per-spider.
+**Note**: Remember that you can always apply any of these settings per spider by including them in your spider's `custom_settings` class attribute - see https://docs.scrapy.org/en/latest/topics/settings.html#settings-per-spider.
+
 
 ## Command line interface
 
@@ -162,20 +193,26 @@ The original request that triggered the callback.
 ***`response`***
 The response obtained from the original request and passed to the callback.
 
-***`result`***
+***`output_data`***
 The callback's output such as items and requests.
+_Same as ***`result`*** prior to v0.0.28._
 
 ***`middlewares`***
 The relevant middlewares to replicate when running the tests.
 
 ***`settings`***
 The settings explicitly recorded by the *`AUTOUNIT_INCLUDED_SETTINGS`* setting.
 
-***`spider_args`***
-The arguments passed to the spider in the crawl.
+***`init_attrs`***
+The spider's attributes right after its _\_\_init\_\__ call.
+
+***`input_attrs`***
+The spider's attributes right before running the callback.
+_Same as ***`spider_args`*** or ***`spider_args_in`*** prior to v0.0.28._
 
-***`python_version`***
-Indicates if the fixture was recorded in python 2 or 3.
+***`output_attrs`***
+The spider's attributes right after running the callback.
+_Same as ***`spider_args_out`*** prior to v0.0.28._
 
 Then for example, to inspect a fixture's specific request we can do the following:
 ```
@@ -184,12 +221,53 @@ $ autounit inspect my_spider my_callback 4 | jq '.request'
 
 ### `autounit update`
 
-You can update your fixtures to match your latest changes in a particular callback to avoid running the whole spider.
-For example, this updates all the fixtures for a specific callback:
+This command updates your fixtures to match your latest changes, without running the whole spider again.
+You can update the whole project, an entire spider, just a callback, or a single fixture.
+
+###### Update the whole project
+```
+$ autounit update
+WARNING: this will update all the existing fixtures from the current project
+Do you want to continue? (y/n)
+```
+
+###### Update every callback in a spider
+```
+$ autounit update -s my_spider
+```
+
+###### Update every fixture in a callback
+```
+$ autounit update -s my_spider -c my_callback
+```
+
+###### Update a single fixture
 ```
-$ autounit update my_spider my_callback
+# Update fixture number 5
+$ autounit update -s my_spider -c my_callback -f 5
 ```
-Optionally you can specify a particular fixture to update with `-f` or `--fixture`:
+
+
+## Internals
+
+The `AutounitMiddleware` uses a [`Recorder`](scrapy_autounit/recorder.py) to record [`Cassettes`](scrapy_autounit/cassette.py) in binary fixtures.
+
+Then, the tests use a [`Player`](scrapy_autounit/player.py) to play back those `Cassettes` and compare their output against your current callbacks.
+
+The fixtures contain a pickled and compressed `Cassette` instance that you can get programmatically by doing:
+```python
+from scrapy_autounit.cassette import Cassette
+
+cassette = Cassette.from_fixture(path_to_your_fixture)
+# cassette.request
+# cassette.response
+# cassette.output_data
+# ...
+```
+
+If you know what you're doing, you can modify that cassette and re-record it by using:
+```python
+from scrapy_autounit.recorder import Recorder
+
+Recorder.update_fixture(cassette, path)
 ```
-$ autounit update my_spider my_callback --fixture 4
-```
````
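For context on the settings renamed in this commit, here is a minimal sketch of applying them per spider through `custom_settings`, as the README's note suggests. The spider itself, the `scraped_at` field, and the `Cookie` header are illustrative assumptions, not part of the commit:

```python
import datetime

import scrapy


class MySpider(scrapy.Spider):
    # Hypothetical spider showing per-spider Autounit settings.
    name = 'my_spider'
    start_urls = ['https://example.com']

    custom_settings = {
        'AUTOUNIT_ENABLED': True,  # turn off once fixtures are final
        'AUTOUNIT_MAX_FIXTURES_PER_CALLBACK': 10,
        # Skip a volatile item field that differs on every run.
        'AUTOUNIT_DONT_TEST_OUTPUT_FIELDS': ['scraped_at'],
        # Keep session cookies out of the recorded fixtures.
        'AUTOUNIT_DONT_RECORD_HEADERS': ['Cookie'],
    }

    def parse(self, response):
        yield {
            'url': response.url,
            # This is why 'scraped_at' is excluded from testing above.
            'scraped_at': datetime.datetime.now().isoformat(),
        }
```

Enabling recording only in a spider's `custom_settings` while keeping it off project-wide is one way to avoid the accidental fixture overwrites the Caveats section warns about.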

scrapy_autounit/cassette.py

Lines changed: 98 additions & 0 deletions
@@ -0,0 +1,98 @@
```python
import pickle
import sys
import zlib

from scrapy.crawler import Crawler
from scrapy.utils.conf import build_component_list
from scrapy.utils.project import get_project_settings

from .utils import get_spider_class


class Cassette:
    """
    Helper class to store request, response and output data.
    """
    FIXTURE_VERSION = 2

    def __init__(
        self,
        spider=None,
        spider_name=None,
        request=None,
        response=None,
        init_attrs=None,
        input_attrs=None,
        output_attrs=None,
        output_data=None,
        middlewares=None,
        included_settings=None,
        python_version=None,
        filename=None,
    ):
        self.spider_name = spider_name
        self.middlewares = middlewares
        self.included_settings = included_settings
        if spider:
            self.spider_name = spider.name
            self.middlewares = self._get_middlewares(spider.settings)
            self.included_settings = self._get_included_settings(spider.settings)

        self.request = request
        self.response = response
        self.init_attrs = init_attrs
        self.input_attrs = input_attrs
        self.output_attrs = output_attrs
        self.output_data = output_data
        self.filename = filename
        self.python_version = python_version or sys.version_info.major

    @classmethod
    def from_fixture(cls, fixture):
        with open(fixture, 'rb') as f:
            binary = f.read()
        cassette = pickle.loads(zlib.decompress(binary))
        return cassette

    def _get_middlewares(self, settings):
        # Record only the middlewares that run after AutounitMiddleware,
        # excluding the middleware itself.
        full_list = build_component_list(settings.getwithbase('SPIDER_MIDDLEWARES'))
        autounit_mw_path = list(filter(lambda x: x.endswith('AutounitMiddleware'), full_list))[0]
        start = full_list.index(autounit_mw_path)
        mw_paths = [mw for mw in full_list[start:] if mw != autounit_mw_path]
        return mw_paths

    def _get_included_settings(self, settings):
        # Use the new setting; if empty, try the deprecated one
        names = settings.getlist('AUTOUNIT_RECORD_SETTINGS', [])
        if not names:
            names = settings.getlist('AUTOUNIT_INCLUDED_SETTINGS', [])
        included = {name: settings.get(name) for name in names}
        return included

    def get_spider(self):
        settings = get_project_settings()
        spider_cls = get_spider_class(self.spider_name, settings)

        spider_cls.update_settings(settings)
        for k, v in self.included_settings.items():
            settings.set(k, v, priority=50)

        crawler = Crawler(spider_cls, settings)
        spider = spider_cls.from_crawler(crawler, **self.init_attrs)
        return spider

    def pack(self):
        # Protocol 2 keeps fixtures loadable from Python 2 as well.
        return zlib.compress(pickle.dumps(self, protocol=2))

    def to_dict(self):
        return {
            'spider_name': self.spider_name,
            'request': self.request,
            'response': self.response,
            'output_data': self.output_data,
            'middlewares': self.middlewares,
            'settings': self.included_settings,
            'init_attrs': self.init_attrs,
            'input_attrs': self.input_attrs,
            'output_attrs': self.output_attrs,
        }
```
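Given the `Cassette` API above, here is a short sketch of inspecting and re-recording a fixture. The fixture path is a hypothetical example; `from_fixture`, `to_dict`, `get_spider`, and `Recorder.update_fixture` come from this commit's diff and README:

```python
from scrapy_autounit.cassette import Cassette
from scrapy_autounit.recorder import Recorder

# Hypothetical fixture path; actual names depend on your project layout.
path = 'autounit/tests/my_spider/my_callback/fixture1.bin'

cassette = Cassette.from_fixture(path)  # zlib-decompress + unpickle
print(cassette.spider_name)
print(sorted(cassette.to_dict()))       # plain-dict view of the recording

# Rebuild the spider with its recorded init attributes and settings
# (requires running inside the Scrapy project).
spider = cassette.get_spider()

# If you know what you're doing: tweak the cassette and re-record it.
cassette.included_settings['CONCURRENT_REQUESTS'] = 1
Recorder.update_fixture(cassette, path)
```

Since fixtures are plain pickled objects, `FIXTURE_VERSION = 2` presumably versions this binary format so that older recordings can be detected rather than silently mis-replayed.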
