Commit 905afcd

feat: Add conference-acronym CLI command group for managing acronyms (#101)
* feat: Add conference-acronym CLI command group for managing acronyms

Implements issue #87 by adding a new command group to manage the conference acronym database with the following subcommands:

- conference-acronym status: Show database status (total count)
- conference-acronym stats: Show detailed statistics (most recent, oldest)
- conference-acronym list: List all acronym mappings with pagination
- conference-acronym clear: Clear all acronym entries with confirmation
- conference-acronym add: Manually add acronym mappings

The command group uses a hyphenated name to allow for future expansion (e.g., journal-acronym) and follows existing CLI patterns for consistency.

Added helper methods to CacheManager:

- get_acronym_stats(): Returns statistics about acronym database
- list_all_acronyms(): Lists all mappings with optional pagination
- clear_acronym_database(): Clears all entries and returns count

All commands include proper error handling, confirmation prompts where appropriate, and comprehensive unit tests.

[AI-assisted]

* feat: Add normalized venue name to conference-acronym list output

Added the normalized form of each full name to the conference-acronym list command output. This shows users exactly what the system searches for after normalization (after removing "Proceedings of" prefixes, year/edition suffixes, parenthetical notes, and extra whitespace).

Example output now includes:

    Acronym: ICML
    Full Name: Proceedings of the 40th International Conference on Machine Learning (ICML 2023)
    Normalized: international conference on machine learning
    Source: bibtex_extraction
    Created: 2024-01-10 09:00:00
    Last Used: 2024-01-15 10:30:00

The normalization is performed using the input_normalizer which applies the same cleaning and normalization logic used during actual assessment queries.

Updated tests to mock the normalizer and verify the normalized output is displayed correctly.

[AI-assisted]

* refactor: Remove code duplication in conference-acronym status command

The status command was directly querying the database with SQL, duplicating the logic already present in get_acronym_stats(). Now it uses the existing CacheManager method for consistency and maintainability.

Also simplified the status tests to mock get_acronym_stats() instead of creating real databases, making them faster and more focused.

Changes:

- status command now calls cache_manager.get_acronym_stats()
- Removed direct SQL query: SELECT COUNT(*) FROM conference_acronyms
- Updated tests to mock get_acronym_stats() return value
- Removed tempfile and sqlite3 setup from status tests

This eliminates code duplication and follows the DRY principle.

[AI-assisted]

---------

Co-authored-by: florath-ai-assistant[bot] <Andreas.Florath@telekom.de>
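The commit message describes the normalization only in words, and the project's input_normalizer is not part of this diff. The following is a minimal sketch of the steps it lists (prefix, edition/year, parenthetical, and whitespace removal), assuming a plain regex approach. The function name `normalize_venue_name` and the exact patterns are hypothetical and may differ from what the real normalizer does.

```python
import re


def normalize_venue_name(full_name: str) -> str:
    """Hypothetical stand-in for the normalization described in the commit message."""
    name = full_name.strip()
    # Drop parenthetical notes such as "(ICML 2023)".
    name = re.sub(r"\([^)]*\)", " ", name)
    # Drop a leading "Proceedings of" (optionally followed by "the").
    name = re.sub(r"^\s*proceedings\s+of\s+(the\s+)?", "", name, flags=re.IGNORECASE)
    # Drop edition ordinals such as "40th" and four-digit years.
    name = re.sub(r"\b\d+(st|nd|rd|th)\b", " ", name, flags=re.IGNORECASE)
    name = re.sub(r"\b(19|20)\d{2}\b", " ", name)
    # Collapse runs of whitespace and lowercase the result.
    return re.sub(r"\s+", " ", name).strip().lower()


example = "Proceedings of the 40th International Conference on Machine Learning (ICML 2023)"
print(normalize_venue_name(example))
```

Under these assumptions, the example input from the commit message reduces to the same normalized string shown in the example output.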
1 parent e7e5bc9 commit 905afcd

3 files changed

+536
-0
lines changed

src/aletheia_probe/cache.py

Lines changed: 112 additions & 0 deletions
@@ -1189,6 +1189,118 @@ def store_acronym_mapping(
             )
             conn.commit()

+    def get_acronym_stats(self) -> dict[str, int | str]:
+        """
+        Get statistics about the acronym database.
+
+        Returns:
+            Dictionary containing count, most_recent, and oldest entry info
+        """
+        with sqlite3.connect(self.db_path) as conn:
+            conn.row_factory = sqlite3.Row
+            cursor = conn.cursor()
+
+            # Get total count
+            cursor.execute("SELECT COUNT(*) as count FROM conference_acronyms")
+            count = cursor.fetchone()["count"]
+
+            # Get most recently used
+            cursor.execute(
+                """
+                SELECT acronym, full_name, last_used_at
+                FROM conference_acronyms
+                ORDER BY last_used_at DESC
+                LIMIT 1
+                """
+            )
+            most_recent = cursor.fetchone()
+
+            # Get oldest entry
+            cursor.execute(
+                """
+                SELECT acronym, full_name, created_at
+                FROM conference_acronyms
+                ORDER BY created_at ASC
+                LIMIT 1
+                """
+            )
+            oldest = cursor.fetchone()
+
+            stats = {"total_count": count}
+
+            if most_recent:
+                stats["most_recent_acronym"] = most_recent["acronym"]
+                stats["most_recent_full_name"] = most_recent["full_name"]
+                stats["most_recent_used"] = most_recent["last_used_at"]
+
+            if oldest:
+                stats["oldest_acronym"] = oldest["acronym"]
+                stats["oldest_full_name"] = oldest["full_name"]
+                stats["oldest_created"] = oldest["created_at"]
+
+            return stats
+
+    def list_all_acronyms(
+        self, limit: int | None = None, offset: int = 0
+    ) -> list[dict[str, str]]:
+        """
+        List all acronym mappings in the database.
+
+        Args:
+            limit: Maximum number of entries to return (None for all)
+            offset: Number of entries to skip
+
+        Returns:
+            List of dictionaries containing acronym details
+        """
+        with sqlite3.connect(self.db_path) as conn:
+            conn.row_factory = sqlite3.Row
+            cursor = conn.cursor()
+
+            query = """
+                SELECT acronym, full_name, source, created_at, last_used_at
+                FROM conference_acronyms
+                ORDER BY acronym ASC
+            """
+
+            if limit is not None:
+                query += f" LIMIT {limit} OFFSET {offset}"
+
+            cursor.execute(query)
+            rows = cursor.fetchall()
+
+            return [
+                {
+                    "acronym": row["acronym"],
+                    "full_name": row["full_name"],
+                    "source": row["source"],
+                    "created_at": row["created_at"],
+                    "last_used_at": row["last_used_at"],
+                }
+                for row in rows
+            ]
+
+    def clear_acronym_database(self) -> int:
+        """
+        Clear all entries from the acronym database.
+
+        Returns:
+            Number of entries deleted
+        """
+        with sqlite3.connect(self.db_path) as conn:
+            cursor = conn.cursor()
+
+            # Get count before deletion
+            cursor.execute("SELECT COUNT(*) FROM conference_acronyms")
+            result = cursor.fetchone()
+            count: int = result[0] if result else 0
+
+            # Delete all entries
+            cursor.execute("DELETE FROM conference_acronyms")
+            conn.commit()
+
+            return count
+
+

 # Global cache manager instance with factory pattern
 _cache_manager_instance: CacheManager | None = None
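In list_all_acronyms() as shown in the diff, the LIMIT/OFFSET clause is appended only when limit is not None, so an offset passed without a limit appears to have no effect. The sketch below shows one possible alternative, not the commit's code: it binds both values as query parameters and relies on SQLite treating a negative LIMIT as "no upper bound". The helper names `make_demo_db` and `list_acronyms_page` and the two-column schema are assumptions made for illustration; the real table has more columns.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute(
        "CREATE TABLE conference_acronyms (acronym TEXT PRIMARY KEY, full_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO conference_acronyms VALUES (?, ?)",
        [
            ("NeurIPS", "Conference on Neural Information Processing Systems"),
            ("ICML", "International Conference on Machine Learning"),
            ("CVPR", "Conference on Computer Vision and Pattern Recognition"),
        ],
    )
    return conn


def list_acronyms_page(
    conn: sqlite3.Connection, limit: "int | None" = None, offset: int = 0
) -> "list[dict[str, str]]":
    """Return one page of mappings ordered by acronym (hypothetical helper)."""
    # A negative LIMIT means "no upper bound" in SQLite, so OFFSET still
    # applies when the caller does not pass a limit.
    effective_limit = -1 if limit is None else limit
    rows = conn.execute(
        """
        SELECT acronym, full_name
        FROM conference_acronyms
        ORDER BY acronym ASC
        LIMIT ? OFFSET ?
        """,
        (effective_limit, offset),
    ).fetchall()
    return [dict(row) for row in rows]


conn = make_demo_db()
print([row["acronym"] for row in list_acronyms_page(conn, offset=1)])
```

Binding the values also keeps the query text constant, which avoids building SQL with an f-string even though both inputs are typed as integers.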

src/aletheia_probe/cli.py

Lines changed: 161 additions & 0 deletions
@@ -297,6 +297,167 @@ def bibtex(
     asyncio.run(_async_bibtex_main(bibtex_file, verbose, output_format, relax_bibtex))


+@main.group(name="conference-acronym")
+def conference_acronym() -> None:
+    """Manage the conference acronym database."""
+    pass
+
+
+@conference_acronym.command(name="status")
+def acronym_status() -> None:
+    """Show conference acronym database status."""
+    status_logger = get_status_logger()
+
+    try:
+        cache_manager = get_cache_manager()
+        stats = cache_manager.get_acronym_stats()
+        count = stats.get("total_count", 0)
+
+        status_logger.info("Conference Acronym Database Status")
+        status_logger.info("=" * 40)
+
+        if count == 0:
+            status_logger.info("Database is empty (no acronyms stored)")
+        else:
+            status_logger.info(f"Total acronyms: {count:,}")
+
+    except Exception as e:
+        status_logger.error(f"Error getting acronym database status: {e}")
+        exit(1)
+
+
+@conference_acronym.command()
+def stats() -> None:
+    """Show detailed statistics about the acronym database."""
+    status_logger = get_status_logger()
+
+    try:
+        cache_manager = get_cache_manager()
+        stats = cache_manager.get_acronym_stats()
+
+        status_logger.info("Conference Acronym Database Statistics")
+        status_logger.info("=" * 40)
+
+        total = stats.get("total_count", 0)
+
+        if total == 0:
+            status_logger.info("Database is empty (no acronyms stored)")
+            return
+
+        status_logger.info(f"Total acronyms: {total:,}")
+
+        if "most_recent_acronym" in stats:
+            status_logger.info("\nMost Recently Used:")
+            status_logger.info(f" Acronym: {stats['most_recent_acronym']}")
+            status_logger.info(f" Full Name: {stats['most_recent_full_name']}")
+            status_logger.info(f" Last Used: {stats['most_recent_used']}")
+
+        if "oldest_acronym" in stats:
+            status_logger.info("\nOldest Entry:")
+            status_logger.info(f" Acronym: {stats['oldest_acronym']}")
+            status_logger.info(f" Full Name: {stats['oldest_full_name']}")
+            status_logger.info(f" Created: {stats['oldest_created']}")
+
+    except Exception as e:
+        status_logger.error(f"Error getting acronym statistics: {e}")
+        exit(1)
+
+
+@conference_acronym.command()
+@click.option("--limit", type=int, help="Maximum number of entries to display")
+@click.option("--offset", type=int, default=0, help="Number of entries to skip")
+def list(limit: int | None, offset: int) -> None:
+    """List all acronym mappings in the database."""
+    status_logger = get_status_logger()
+
+    try:
+        cache_manager = get_cache_manager()
+        acronyms = cache_manager.list_all_acronyms(limit=limit, offset=offset)
+
+        if not acronyms:
+            status_logger.info("No acronyms found in the database.")
+            return
+
+        status_logger.info("Conference Acronym Mappings")
+        status_logger.info("=" * 80)
+
+        for entry in acronyms:
+            # Normalize the full name to show what the system actually searches for
+            normalized_name = input_normalizer.normalize(
+                entry["full_name"]
+            ).normalized_name
+
+            status_logger.info(f"\nAcronym: {entry['acronym']}")
+            status_logger.info(f" Full Name: {entry['full_name']}")
+            status_logger.info(f" Normalized: {normalized_name}")
+            status_logger.info(f" Source: {entry['source']}")
+            status_logger.info(f" Created: {entry['created_at']}")
+            status_logger.info(f" Last Used: {entry['last_used_at']}")
+
+        total_count = cache_manager.get_acronym_stats()["total_count"]
+        shown = len(acronyms)
+
+        if limit is not None or offset > 0:
+            status_logger.info(f"\nShowing {shown} of {total_count:,} total acronyms")
+
+    except Exception as e:
+        status_logger.error(f"Error listing acronyms: {e}")
+        exit(1)
+
+
+@conference_acronym.command()
+@click.option("--confirm", is_flag=True, help="Skip confirmation prompt")
+def clear(confirm: bool) -> None:
+    """Clear all entries from the acronym database."""
+    status_logger = get_status_logger()
+
+    if not confirm:
+        click.confirm(
+            "This will delete all conference acronym mappings. Continue?", abort=True
+        )
+
+    try:
+        cache_manager = get_cache_manager()
+        count = cache_manager.clear_acronym_database()
+
+        if count == 0:
+            status_logger.info("Acronym database is already empty.")
+        else:
+            status_logger.info(f"Cleared {count:,} acronym mapping(s).")
+
+    except Exception as e:
+        status_logger.error(f"Error clearing acronym database: {e}")
+        exit(1)
+
+
+@conference_acronym.command()
+@click.argument("acronym")
+@click.argument("full_name")
+@click.option(
+    "--source",
+    default="manual",
+    help="Source of the mapping (default: manual)",
+)
+def add(acronym: str, full_name: str, source: str) -> None:
+    """Manually add an acronym mapping to the database.
+
+    ACRONYM: The conference acronym (e.g., ICML)
+    FULL_NAME: The full conference name
+    """
+    status_logger = get_status_logger()
+
+    try:
+        cache_manager = get_cache_manager()
+        cache_manager.store_acronym_mapping(acronym, full_name, source)
+
+        status_logger.info(f"Added acronym mapping: {acronym} -> {full_name}")
+        status_logger.info(f"Source: {source}")
+
+    except Exception as e:
+        status_logger.error(f"Error adding acronym mapping: {e}")
+        exit(1)
+
+
 async def _async_bibtex_main(
     bibtex_file: str, verbose: bool, output_format: str, relax_bibtex: bool
 ) -> None:
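The status and stats commands above both branch on the dictionary returned by get_acronym_stats(), where total_count is always present and the most-recent and oldest keys exist only when a row was found. As a rough illustration of that contract, the sketch below builds the output lines in a pure function so the branching can be checked without a database or a logger. `render_acronym_stats` is a hypothetical helper, not something this commit adds, and its one-line summaries are a simplification of the multi-line output in the diff.

```python
from __future__ import annotations


def render_acronym_stats(stats: dict[str, int | str]) -> list[str]:
    """Return the lines a stats-style command could log (hypothetical helper)."""
    lines = ["Conference Acronym Database Statistics", "=" * 40]

    total = int(stats.get("total_count", 0))
    if total == 0:
        lines.append("Database is empty (no acronyms stored)")
        return lines

    # The "," format spec inserts thousands separators, as in the diff.
    lines.append(f"Total acronyms: {total:,}")

    # These keys are only present when the corresponding query returned a row.
    if "most_recent_acronym" in stats:
        lines.append(
            f"Most recently used: {stats['most_recent_acronym']} "
            f"({stats['most_recent_used']})"
        )
    if "oldest_acronym" in stats:
        lines.append(
            f"Oldest entry: {stats['oldest_acronym']} ({stats['oldest_created']})"
        )
    return lines


for line in render_acronym_stats({"total_count": 0}):
    print(line)
```

Separating rendering from logging in this way is one option for keeping command tests free of database setup, in the same spirit as the mocked get_acronym_stats() tests the commit message mentions.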
