Commit 97451cc

fix: Detect conferences by display name to override OpenAlex misclassification (#136)
This commit addresses a critical issue where OpenAlex incorrectly classifies conference proceedings as journals (e.g., IEEE IPDPS), causing inappropriate journal-specific heuristics to be applied, resulting in false positives.

Root cause analysis:
- OpenAlex returns source_type="journal" for conference proceedings
- Example: "Proceedings - IEEE International Parallel and Distributed Processing Symposium"
- Tool correctly has separate heuristics for journals vs conferences
- But wrong heuristics were applied due to upstream data quality issues

Solution:
- Detect conferences by keywords in display_name: "proceedings", "conference", "symposium", "workshop"
- Override source_type when these keywords are found
- Apply conference-specific heuristics instead of journal heuristics

Impact:
- IEEE IPDPS: SUSPICIOUS (0.68) → UNKNOWN (0.30)
- Eliminated false journal red flags:
  - "New journal with high output" → conference-specific analysis
  - "Journal appears inactive" → appropriate for conferences
- Fixes issue #126 false positives for legitimate conference proceedings

Technical details:
- Added display_name inspection before routing to analysis
- Keywords checked: proceedings, conference, symposium, workshop
- Logging added when override occurs (detail logger)
- Updated publication_type field to reflect corrected classification

Testing:
- IEEE IPDPS now correctly analyzed as conference
- All 342 tests pass
- Quality checks pass (ruff, mypy, pytest)

Related: #126 (false positives for legitimate venues)

[AI-assisted]

Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
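The override described in the message can be sketched as a standalone function. This is a minimal illustration, not the code from the commit: the name `resolve_source_type` and the `CONFERENCE_KEYWORDS` constant are hypothetical, and the `None` handling is an added assumption (the committed code reads the fields with `.get(..., "")`).

```python
from typing import Optional

# Keywords listed in the commit message; the constant name is hypothetical.
CONFERENCE_KEYWORDS = ("proceedings", "conference", "symposium", "workshop")


def resolve_source_type(
    source_type: Optional[str], display_name: Optional[str]
) -> str:
    """Return "conference" when the display name contains a conference keyword.

    Otherwise return the lower-cased source_type unchanged.
    """
    # Assumption: treat missing or None values as empty strings.
    normalized_type = (source_type or "").lower()
    normalized_name = (display_name or "").lower()

    if normalized_type != "conference" and any(
        keyword in normalized_name for keyword in CONFERENCE_KEYWORDS
    ):
        return "conference"
    return normalized_type


ipdps = "Proceedings - IEEE International Parallel and Distributed Processing Symposium"
print(resolve_source_type("journal", ipdps))
print(resolve_source_type("journal", "Nature"))
```

Because this is a plain substring match, any display name containing one of the keywords is routed to the conference heuristics. A journal such as "Proceedings of the National Academy of Sciences" would also match under this rule, so a stricter pattern or an allow-list may be worth considering if that turns out to matter in practice.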
1 parent b586674 commit 97451cc

1 file changed: +22 -1 lines changed

src/aletheia_probe/backends/openalex_analyzer.py

Lines changed: 22 additions & 1 deletion
@@ -99,6 +99,26 @@ async def _query_api(self, query_input: QueryInput) -> BackendResult:
 
         # Route to appropriate assessment based on publication type
         source_type = openalex_data.get("source_type", "").lower()
+        display_name = openalex_data.get("display_name", "").lower()
+
+        # Override OpenAlex misclassification if display_name suggests conference
+        # OpenAlex sometimes incorrectly classifies conference proceedings as journals
+        # Detect conferences by common keywords in the display name
+        conference_keywords = [
+            "proceedings",
+            "conference",
+            "symposium",
+            "workshop",
+        ]
+        if source_type != "conference" and any(
+            keyword in display_name for keyword in conference_keywords
+        ):
+            self.detail_logger.info(
+                f"OpenAlex: Overriding source_type '{source_type}' to 'conference' "
+                f"based on display_name: '{openalex_data.get('display_name')}'"
+            )
+            source_type = "conference"
+
         if source_type == "conference":
             analysis = self._analyze_conference_patterns(openalex_data)
         else:
@@ -116,7 +136,8 @@ async def _query_api(self, query_input: QueryInput) -> BackendResult:
                 "metrics": analysis["metrics"],
                 "red_flags": analysis["red_flags"],
                 "green_flags": analysis["green_flags"],
-                "publication_type": source_type or "journal",
+                "publication_type": source_type
+                or "journal",  # Use corrected source_type
             },
             sources=[
                 "https://api.openalex.org",

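To illustrate how the corrected `source_type` flows into both the routing decision and the reported `publication_type`, here is a small self-contained sketch. The function `route`, the logger name, and the returned dictionary shape are hypothetical. The diff is truncated at the `else:` branch, so treating every non-conference type as journal analysis is an assumption.

```python
import logging
from typing import Any, Dict, Mapping

# Hypothetical logger name; the commit logs through self.detail_logger.
detail_logger = logging.getLogger("openalex.detail")

KEYWORDS = ("proceedings", "conference", "symposium", "workshop")


def route(openalex_data: Mapping[str, Any]) -> Dict[str, str]:
    """Pick an analyzer and the reported publication_type for one source record."""
    source_type = (openalex_data.get("source_type") or "").lower()
    display_name = (openalex_data.get("display_name") or "").lower()

    if source_type != "conference" and any(k in display_name for k in KEYWORDS):
        detail_logger.info(
            "Overriding source_type %r to 'conference' based on display_name: %r",
            source_type,
            openalex_data.get("display_name"),
        )
        source_type = "conference"

    # Assumption: anything that is not a conference goes to journal analysis.
    analyzer = "conference" if source_type == "conference" else "journal"
    # As in the diff, an empty source_type is reported as "journal".
    return {"analyzer": analyzer, "publication_type": source_type or "journal"}


record = {
    "source_type": "journal",
    "display_name": "Proceedings - IEEE International Parallel and Distributed Processing Symposium",
}
print(route(record))
```

With the override applied before the fallback, a record that OpenAlex labels as a journal but whose name contains a keyword is reported as a conference, while a record with no `source_type` at all still falls back to "journal".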