Skip to content

Always trim string fields#101

Open
Nykakin wants to merge 6 commits intozytedata:mainfrom
Nykakin:always-trim-strings
Open

Always trim string fields#101
Nykakin wants to merge 6 commits intozytedata:mainfrom
Nykakin:always-trim-strings

Conversation

@Nykakin
Copy link
Contributor

@Nykakin Nykakin commented Aug 21, 2024

Always trim string fields.

For all fields of type Optional[str] or str apply str.strip() through processors.

For all fields of type Optional[List[str]] or List[str] apply str.strip() to all elements through processors.

Extract processors to separate classes following approach found in extractors to reduce duplication in code.

@Nykakin Nykakin changed the title Always trim strings Always trim string fields Aug 21, 2024
"""Processor for string values"""
if isinstance(value, str):
return value.strip()
return value
Copy link
Contributor Author

@Nykakin Nykakin Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should return None if value is not a string instance...? Need an opinion here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should convert it to str unless it is None.

I also wonder if we should return None if not value.strip().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are easy to implement. I don't particullary mind but I don't feel I am the one to make such decision.

@Nykakin Nykakin mentioned this pull request Aug 21, 2024
def loop(values):
if not isinstance(values, Iterable):
return values
return [processor(value) for value in values]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should skip None values in the resulting list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking of following this PR with another one that would apply some sort of "clear empty items" on all fields returning lists of items.

But later in this comment I was told that by design we should keep processors transparent, so right now I'm not sure how much of post-processing is actually allowed. This particular PR also goes against this philosophy, though I'd argue is relatively non-invasive and seems useful.

I guess decision needs to be made at higher level whenever we want to stick to this design goal or are we deliberately going to break it next release, which would allow this and other similar PR to be included in the library.

Copy link
Contributor

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+0.5 to these processors.

Even if we are processing str, I feel it is justified to meet the schema requirements, which I feel are for all these fields to be trimmed, even if the schema only says that explicitly for article body.

@codecov
Copy link

codecov bot commented Aug 25, 2024

Codecov Report

Attention: Patch coverage is 0% with 158 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (cce3b94) to head (2800ff8).
Report is 6 commits behind head on main.

Files Patch % Lines
zyte_common_items/pages/product.py 0.00% 25 Missing ⚠️
zyte_common_items/pages/job_posting.py 0.00% 21 Missing ⚠️
zyte_common_items/pages/real_estate.py 0.00% 19 Missing ⚠️
zyte_common_items/pages/business_place.py 0.00% 18 Missing ⚠️
zyte_common_items/pages/article.py 0.00% 17 Missing ⚠️
zyte_common_items/pages/social_media_post.py 0.00% 10 Missing ⚠️
zyte_common_items/processors.py 0.00% 10 Missing ⚠️
zyte_common_items/pages/product_list.py 0.00% 9 Missing ⚠️
zyte_common_items/pages/product_navigation.py 0.00% 9 Missing ⚠️
zyte_common_items/pages/article_list.py 0.00% 8 Missing ⚠️
... and 2 more
Additional details and impacted files
@@          Coverage Diff           @@
##            main    #101    +/-   ##
======================================
  Coverage   0.00%   0.00%            
======================================
  Files         54      55     +1     
  Lines       2107    2224   +117     
======================================
- Misses      2107    2224   +117     
Files Coverage Δ
zyte_common_items/pages/base.py 0.00% <0.00%> (ø)
zyte_common_items/pages/article_navigation.py 0.00% <0.00%> (ø)
zyte_common_items/pages/article_list.py 0.00% <0.00%> (ø)
zyte_common_items/pages/product_list.py 0.00% <0.00%> (ø)
zyte_common_items/pages/product_navigation.py 0.00% <0.00%> (ø)
zyte_common_items/pages/social_media_post.py 0.00% <0.00%> (ø)
zyte_common_items/processors.py 0.00% <0.00%> (ø)
zyte_common_items/pages/article.py 0.00% <0.00%> (ø)
zyte_common_items/pages/business_place.py 0.00% <0.00%> (ø)
zyte_common_items/pages/real_estate.py 0.00% <0.00%> (ø)
... and 2 more

@kmike
Copy link
Contributor

kmike commented Sep 9, 2024

I think that's fine to strip string fields, but we need to change the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants