Commit 7df322d

chore: adding documentation on skip falsy and skip values settings
Signed-off-by: Nilton Junior <ngm.junior@outlook.com>
1 parent 7deb1cc commit 7df322d

1 file changed: +63 -0 lines changed

docs/source/settings.rst

Lines changed: 63 additions & 0 deletions
@@ -236,6 +236,69 @@ If this setting is not provided or set to ``False``, spider statistics will be:
    'spidermon_field_coverage/dict/field_3': 1.0,  # Did not ignore empty list
    'spidermon_field_coverage/dict/field_4': 1.0,  # Did not ignore zero

SPIDERMON_FIELD_COVERAGE_SKIP_VALUES
------------------------------------
Default: ``[]``

A list of custom values that are not counted as valid field values when calculating field coverage. This is useful when your items contain placeholder values such as ``"N/A"``, ``"-"``, or ``"TBD"`` that indicate missing data but are not falsy in Python. You can also skip numeric values such as ``0`` or ``-1`` if they represent missing data in your use case.

This setting is applied in addition to ``SPIDERMON_FIELD_COVERAGE_SKIP_NONE`` and ``SPIDERMON_FIELD_COVERAGE_SKIP_FALSY``. Values are matched using exact equality (``==``), so type matters (e.g., the string ``"0"`` is different from the integer ``0``).
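
As a rough illustration of what matching with ``==`` implies, the sketch below uses a hypothetical helper, ``is_skipped``, which is not part of Spidermon's API. It only assumes that the check is a plain equality comparison against each configured value, as described above.

```python
def is_skipped(value, skip_values):
    """Return True if value equals any configured skip value.

    Hypothetical helper for illustration only, not Spidermon's code.
    """
    return any(value == skip for skip in skip_values)


skip_values = [0, "N/A"]

print(is_skipped(0, skip_values))      # True: the integer 0 equals the configured 0
print(is_skipped("0", skip_values))    # False: the string "0" is not equal to 0
print(is_skipped("N/A", skip_values))  # True
print(is_skipped("n/a", skip_values))  # False: string comparison is case-sensitive
```

In plain Python, ``0 == 0.0`` and ``0 == False`` are both true, so if your items mix integers, floats, and booleans it may be worth checking how your Spidermon version treats those values.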

The setting can be provided in several formats:

* As a Python list in your settings file (recommended for mixed types): ``[0, -1, "N/A"]``
* As a JSON string (preserves types): ``'[0, -1, "N/A"]'``
* As a comma-separated string (converts all values to strings): ``"0,-1,N/A"``

To skip non-string values such as integers, use a Python list or a JSON string, because a comma-separated string turns every value into a string.
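
The difference between the three formats can be sketched with a small normalizer. ``parse_skip_values`` is a hypothetical helper written for this example, not Spidermon's actual parsing code, and details such as stripping whitespace around comma-separated entries are assumptions.

```python
import json


def parse_skip_values(raw):
    """Normalize a skip-values setting into a list (hypothetical helper)."""
    if isinstance(raw, (list, tuple)):
        return list(raw)  # Python list: types are kept as-is
    if isinstance(raw, str):
        text = raw.strip()
        if text.startswith("["):
            try:
                parsed = json.loads(text)  # JSON string: types are preserved
                if isinstance(parsed, list):
                    return parsed
            except json.JSONDecodeError:
                pass  # not valid JSON, fall back to comma-separated handling
        # Comma-separated string: every value stays a string
        return [part.strip() for part in text.split(",") if part.strip()]
    return []


print(parse_skip_values([0, -1, "N/A"]))    # [0, -1, 'N/A']
print(parse_skip_values('[0, -1, "N/A"]'))  # [0, -1, 'N/A']
print(parse_skip_values("0,-1,N/A"))        # ['0', '-1', 'N/A']
```

With the comma-separated form, the resulting ``'0'`` would not match an integer ``0`` in an item, which is why a list or JSON string is the safer choice for numeric placeholders.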

Suppose your spider returns the following items:

.. code-block:: python

    [
        {
            "field_1": "N/A",
            "field_2": "value",
            "field_3": "-",
            "field_4": "TBD",
        },
        {
            "field_1": "actual_value",
            "field_2": "value",
            "field_3": "data",
            "field_4": "completed",
        },
    ]

If this setting is set to ``["N/A", "-", "TBD"]``, spider statistics will be:

.. code-block:: python

    'spidermon_item_scraped_count/dict': 2,
    'spidermon_item_scraped_count/dict/field_1': 1,  # Ignored "N/A"
    'spidermon_item_scraped_count/dict/field_2': 2,
    'spidermon_item_scraped_count/dict/field_3': 1,  # Ignored "-"
    'spidermon_item_scraped_count/dict/field_4': 1,  # Ignored "TBD"
    'spidermon_field_coverage/dict/field_1': 0.5,  # Ignored "N/A"
    'spidermon_field_coverage/dict/field_2': 1.0,
    'spidermon_field_coverage/dict/field_3': 0.5,  # Ignored "-"
    'spidermon_field_coverage/dict/field_4': 0.5,  # Ignored "TBD"

If this setting is not provided or set to an empty list, spider statistics will be:

.. code-block:: python

    'spidermon_item_scraped_count/dict': 2,
    'spidermon_item_scraped_count/dict/field_1': 2,  # Did not ignore "N/A"
    'spidermon_item_scraped_count/dict/field_2': 2,
    'spidermon_item_scraped_count/dict/field_3': 2,  # Did not ignore "-"
    'spidermon_item_scraped_count/dict/field_4': 2,  # Did not ignore "TBD"
    'spidermon_field_coverage/dict/field_1': 1.0,  # Did not ignore "N/A"
    'spidermon_field_coverage/dict/field_2': 1.0,
    'spidermon_field_coverage/dict/field_3': 1.0,  # Did not ignore "-"
    'spidermon_field_coverage/dict/field_4': 1.0,  # Did not ignore "TBD"
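
The coverage figures in both cases can be reproduced by hand with a short sketch. ``field_coverage`` is a hypothetical function written for this example and does not reflect how Spidermon collects its stats internally; it simply counts non-skipped values per field and divides by the number of items.

```python
from collections import Counter

items = [
    {"field_1": "N/A", "field_2": "value", "field_3": "-", "field_4": "TBD"},
    {
        "field_1": "actual_value",
        "field_2": "value",
        "field_3": "data",
        "field_4": "completed",
    },
]


def field_coverage(items, skip_values=()):
    """Count non-skipped values per field and divide by the item count."""
    counts = Counter()
    for item in items:
        for field, value in item.items():
            if any(value == skip for skip in skip_values):
                continue  # skipped values do not count towards coverage
            counts[field] += 1
    # Assumes every item has the same fields as the first one
    return {field: counts[field] / len(items) for field in items[0]}


print(field_coverage(items, ["N/A", "-", "TBD"]))
# {'field_1': 0.5, 'field_2': 1.0, 'field_3': 0.5, 'field_4': 0.5}
print(field_coverage(items))
# {'field_1': 1.0, 'field_2': 1.0, 'field_3': 1.0, 'field_4': 1.0}
```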

SPIDERMON_LIST_FIELDS_COVERAGE_LEVELS
-------------------------------------
Default: ``0``
