Open
Labels
bug: Something isn't working
Description
Describe the bug
The method read_player_ratings returns only the last player. The cause is incorrect indentation: the XPath extraction and the ratings.append() call sit outside the player loop, so only the last player's scores are processed and appended.
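To illustrate the effect in isolation, here is a minimal sketch that does not use soccerdata at all. The helper names collect_buggy and collect_fixed and the "Player A" / "Player B" entries are placeholders invented for this example; only the position of the append call relative to the loop mirrors what is described above.

```python
def collect_buggy(players):
    """Mimic the reported behaviour: append is dedented out of the loop."""
    ratings = []
    for player in players:
        scores = {"player": player}  # overwritten on every iteration
    ratings.append(scores)  # runs once, after the loop, with the last value only
    return ratings


def collect_fixed(players):
    """Append inside the loop so every player contributes one row."""
    ratings = []
    for player in players:
        scores = {"player": player}
        ratings.append(scores)  # runs once per player
    return ratings


# "Player A" and "Player B" are placeholder names for illustration.
players = ["Player A", "Player B", "Takehiro Tomiyasu"]

print(collect_buggy(players))       # [{'player': 'Takehiro Tomiyasu'}]
print(len(collect_fixed(players)))  # 3
```

The buggy variant yields a single row for the last player, which matches the "[1 rows x 38 columns]" output reported in this issue.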
Python Version
Python 3.11.4
Affected scrapers
This affects the following scrapers:
- SoFIFA
Code example
import soccerdata as sd
sofifa = sd.SoFIFA(leagues="ENG-Premier League", versions="latest")
print(sofifa.read_player_ratings(team="Arsenal"))
Error message
No error message.
Error output
fifa_edition update overallrating ... gk_kicking gk_positioning gk_reflexes
player ...
Takehiro Tomiyasu FC 25 Jul 17, 2025 78 ... 6 5 11
[1 rows x 38 columns]
Additional context
I fixed the problem with GPT-5 mini, but I'm not sure it is the correct approach (or whether this is a real issue), because I only downloaded the collection.
Code fix sofifa.py
def read_player_ratings(
    self,
    team: Optional[Union[str, list[str]]] = None,
    player: Optional[Union[int, list[int]]] = None,
) -> pd.DataFrame:
    """Retrieve ratings for players.

    Parameters
    ----------
    team: str or list of str, optional
        Team(s) to retrieve. If None, will retrieve all teams.
    player: int or list of int, optional
        Player(s) to retrieve. If None, will retrieve all players.

    Returns
    -------
    pd.DataFrame
    """
    # build url
    urlmask = SO_FIFA_API + "/player/{}/?r={}&set=true"
    filemask = "player_{}_{}.html"

    # get player IDs
    if player is None:
        players = self.read_players(team=team).index.unique()
    elif isinstance(player, int):
        players = [player]
    else:
        players = player

    # prepare empty data frame
    ratings = []

    # define labels to use for score extraction from player profile pages
    score_labels = [
        "Overall rating",
        "Potential",
        "Crossing",
        "Finishing",
        "Heading accuracy",
        "Short passing",
        "Volleys",
        "Dribbling",
        "Curve",
        "FK Accuracy",
        "Long passing",
        "Ball control",
        "Acceleration",
        "Sprint speed",
        "Agility",
        "Reactions",
        "Balance",
        "Shot power",
        "Jumping",
        "Stamina",
        "Strength",
        "Long shots",
        "Aggression",
        "Interceptions",
        "Positioning",
        "Vision",
        "Penalties",
        "Composure",
        "Defensive awareness",
        "Standing tackle",
        "Sliding tackle",
        "GK Diving",
        "GK Handling",
        "GK Kicking",
        "GK Positioning",
        "GK Reflexes",
    ]

    iterator = list(product(self.versions.iterrows(), players))
    for i, ((version_id, version), player) in enumerate(iterator):
        logger.info(
            "[%s/%s] Retrieving ratings for player with ID %s in %s edition",
            i + 1,
            len(iterator),
            player,
            version["update"],
        )

        # read html page (player overview)
        filepath = self.data_dir / filemask.format(player, version_id)
        url = urlmask.format(player, version_id)
        reader = self.get(url, filepath)

        # extract scores one-by-one
        tree = html.parse(reader, parser=html.HTMLParser(encoding="utf8"))

        # get player name safely
        node_player_name_nodes = tree.xpath("//div[contains(@class, 'profile')]/h1")
        if node_player_name_nodes:
            node_player_name = node_player_name_nodes[0]
            # Extract what is before <br>
            before_br = node_player_name.xpath("string(./text()[1])").strip()
            # Extract what is after <br>
            after_br = node_player_name.xpath(
                "string(./br/following-sibling::text()[1])"
            ).strip()
            player_name = before_br if before_br else after_br
        else:
            player_name = None

        scores = {"player": player_name, **version.to_dict()}

        # Try each XPath until one returns a result
        for s in score_labels:
            value = None
            xpaths = [
                f"//p[.//text()[contains(.,'{s}')]]/span/em",
                f"//div[contains(.,'{s}')]/em",
                f"//li[not(self::script)][.//text()[contains(.,'{s}')]]/em",
            ]
            for xpath in xpaths:
                nodes = tree.xpath(xpath)
                if nodes:  # If at least one match is found
                    text = nodes[0].text
                    value = text.strip() if text is not None else None
                    break  # Stop checking other XPaths once we find a valid value
            scores[s] = value  # will be None if not found

        # append inside the player loop so every player gets a row
        ratings.append(scores)

    # return data frame
    return pd.DataFrame(ratings).pipe(standardize_colnames).set_index(["player"]).sort_index()
Contributor Action Plan
- I’m unsure how to fix this, but I'm willing to work on it with guidance.