Commit 0cf8a66

Merge pull request #416 from dgruhin-hrizn/fork-calibre-web-automated-enhanced
Feature: Duplicate Book Management System & Enhanced Ingest Reliability
2 parents 6765ec8 + 789ebb8 commit 0cf8a66

15 files changed, +973 -14 lines changed

CONTRIBUTORS

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 CONTRIBUTORS
 
 This file is automatically generated. DO NOT EDIT MANUALLY.
-Generated on: 2025-09-01T20:52:42.198138Z
+Generated on: 2025-09-02T13:44:20.914655Z
 
 Upstream project: https://github.com/janeczku/calibre-web
 Fork project (Calibre-Web Automated, since 2024): https://github.com/crocodilestick/calibre-web-automated
@@ -289,7 +289,7 @@ Copyright (C) 2024-2025 Calibre-Web Automated contributors
 - zhiyue (1 commits)
 # Fork Contributors (crocodilestick/calibre-web-automated)
 
-- crocodilestick (587 commits)
+- crocodilestick (588 commits)
 - jmarmstrong1207 (73 commits)
 - demitrix (30 commits)
 - sirwolfgang (22 commits)

README.md

Lines changed: 13 additions & 2 deletions
@@ -7,6 +7,7 @@
 ![Docker Pulls](https://img.shields.io/docker/pulls/crocodilestick/calibre-web-automated)
 ![GitHub Release](https://img.shields.io/github/v/release/crocodilestick/calibre-web-automated)
 ![GitHub commits since latest release](https://img.shields.io/github/commits-since/crocodilestick/calibre-web-automated/latest)
+![OAuth 2.0 + OIDC](https://img.shields.io/badge/OAuth-2.0%20%2B%20OIDC-blue?style=flat&logo=oauth)
 
 
 ## _Quick Access_
@@ -23,6 +24,7 @@
 - [Usage](#usage-) 🔧
 - [Adding Books to Your Library](#adding-books-to-your-library)
 - [KOReader Syncing (KOSync)](#koreader-syncing-kosync-) 📖⚡
+- [OAuth Authentication Setup](#enhanced-oauth-20oidc-authentication-) 🔐
 - [For Developers](#for-developers---building-custom-docker-image) 🚀
 - [Further Development](#further-development-️) 🏗️
 - [Support / Buy me a Coffee](https://ko-fi.com/crocodilestick)
@@ -107,7 +109,7 @@ This tells CWA to avoid enabling WAL on the Calibre `metadata.db` and the `app.d
 | eBook metadata editing and deletion support | Metadata download from various sources (extensible via plugins) | eBook download restriction to logged-in users |
 | Public user registration support | Send eBooks to E-Readers with a single click | Sync Kobo devices with your Calibre library |
 | In-browser eBook reading support for multiple formats | Content hiding based on categories and Custom Column content per user | "Magic Link" login for easy access on eReaders |
-| LDAP, Google/GitHub OAuth, and proxy authentication support | Advanced search and filtering options | Multilingual user interface supporting 20+ [languages](https://github.com/janeczku/calibre-web/wiki/Translation-Status) |
+| Enhanced OAuth 2.0/OIDC authentication with auto-discovery | Advanced search and filtering options | Multilingual user interface supporting 20+ [languages](https://github.com/janeczku/calibre-web/wiki/Translation-Status) |
 
 ## Plus these _**CWA Specific Features**_ on top:
 
@@ -120,7 +122,7 @@ This tells CWA to avoid enabling WAL on the Calibre `metadata.db` and the `app.d
 | [Automatic EPUB Fixer Service 🔨](#automatic-epub-fixer-service-) | [Multi-Format Conversion Service 🌌](#simple-to-use-multi-format-conversion-service-) | [Library Auto-Detect 📚🕵️](#library-auto-detect-️) |
 | [Server Stats Tracking Page 📍](#server-stats-tracking-page-) | [Server Stats Tracking 📊](#server-stats-tracking-page-) | [Easy Dark/ Light Mode Switching ☀️🌙](#easy-dark-light-mode-switching-️) |
 | [Internal Update Notification System 🛎️](#internal-update-notification-system-️) | [Auto-Compression of Backed Up Files 🤐](#auto-compression-of-backed-up-files-) | [Additional Metadata Providers 🗃️](#additional-metadata-providers-️) |
-| [KOReader Syncing (KOSync) 📖⚡](#koreader-syncing-kosync-) | | |
+| [KOReader Syncing (KOSync) 📖⚡](#koreader-syncing-kosync-) | [Enhanced OAuth 2.0/OIDC Authentication 🔐](#enhanced-oauth-20oidc-authentication-) | |
 
 #### **Automatic Ingest Service**
 - CWA currently supports automatic ingest of 27 different popular ebook formats
@@ -188,6 +190,15 @@ This tells CWA to avoid enabling WAL on the Calibre `metadata.db` and the `app.d
 - **CWA Integration:** Leverages your existing CWA user accounts and permissions - no additional server setup required
 - **Easy Installation:** Plugin and setup instructions are available directly from your CWA instance at `/kosync`
 
+#### **Enhanced OAuth 2.0/OIDC Authentication** 🔐
+- **Auto-Discovery:** Automatic endpoint configuration via OIDC metadata URLs for seamless setup with providers like Keycloak, Authentik, Google, and Azure AD
+- **Manual Override:** Full manual control over OAuth endpoints when auto-discovery isn't available
+- **Field Mapping:** Configurable JWT field extraction for usernames and emails to work with any provider's token structure
+- **Group-Based Roles:** Automatic admin role assignment based on OAuth provider groups
+- **Testing Tools:** Built-in connection testing and validation to ensure your configuration works before going live
+- **Enterprise Ready:** Support for custom scopes, multiple authentication methods, and comprehensive troubleshooting
+- **📖 [Full OAuth Configuration Guide](https://github.com/crocodilestick/Calibre-Web-Automated/wiki/OAuth-Configuration)** for detailed setup instructions
+
 #### **Server Stats Tracking Page** 📍📊
 - Ever wondered how many times CWA has been there for you in the background? Check out the CWA Stats page to see a fun list of statistics showing how many times CWA has been there to make your life just that little bit easier
 - A database also exists to keep track of any and all enforcements, imports, conversions & fixes both for peace of mind and to make the checking of any bugs or weird behaviour easier

cps/admin.py

Lines changed: 1 addition & 1 deletion
@@ -548,7 +548,7 @@ def edit_list_user(param):
     if user.name == "Guest" and value == constants.SIDEBAR_READ_AND_UNREAD:
         raise Exception(_("Guest can't have this view"))
     # check for valid value, last on checks for power of 2 value
-    if value > 0 and value <= constants.SIDEBAR_LIST and (value & value - 1 == 0 or value == 1):
+    if value > 0 and value <= constants.SIDEBAR_DUPLICATES and (value & value - 1 == 0 or value == 1):
         if vals['value'] == 'true':
             user.sidebar_view |= value
         elif vals['value'] == 'false':
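The guard changed in this hunk accepts a value only when it is a single sidebar flag no larger than the highest defined flag, so raising the bound is what lets the new flag through. A minimal sketch of that check, assuming a hypothetical helper `is_single_flag` and local copies of the two constants (neither is part of the commit):

```python
# Illustrative copies of the flags from cps/constants.py.
SIDEBAR_LIST = 1 << 17
SIDEBAR_DUPLICATES = 1 << 18


def is_single_flag(value: int, highest_flag: int) -> bool:
    """Return True if value is one power-of-two flag in 1..highest_flag.

    Hypothetical helper mirroring the condition in edit_list_user.
    """
    # value & (value - 1) clears the lowest set bit, so the result is
    # zero exactly when value has a single bit set.
    return 0 < value <= highest_flag and (value & (value - 1)) == 0


# With the old bound the new flag is rejected; with the new bound it passes.
rejected_before = is_single_flag(SIDEBAR_DUPLICATES, SIDEBAR_LIST)
accepted_after = is_single_flag(SIDEBAR_DUPLICATES, SIDEBAR_DUPLICATES)
```

In Python the expression `value & value - 1 == 0` already groups as `(value & (value - 1)) == 0`, because subtraction binds tighter than `&` and comparisons bind looser; the sketch only adds the parentheses for readability.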

cps/constants.py

Lines changed: 3 additions & 1 deletion
@@ -88,6 +88,7 @@
 SIDEBAR_ARCHIVED = 1 << 15
 SIDEBAR_DOWNLOAD = 1 << 16
 SIDEBAR_LIST = 1 << 17
+SIDEBAR_DUPLICATES = 1 << 18
 
 sidebar_settings = {
     "detail_random": DETAIL_RANDOM,
@@ -106,11 +107,12 @@
     "sidebar_archived": SIDEBAR_ARCHIVED,
     "sidebar_download": SIDEBAR_DOWNLOAD,
     "sidebar_list": SIDEBAR_LIST,
+    "sidebar_duplicates": SIDEBAR_DUPLICATES,
 }
 
 
 ADMIN_USER_ROLES = sum(r for r in ALL_ROLES.values()) & ~ROLE_ANONYMOUS
-ADMIN_USER_SIDEBAR = (SIDEBAR_LIST << 1) - 1
+ADMIN_USER_SIDEBAR = (SIDEBAR_DUPLICATES << 1) - 1
 
 UPDATE_STABLE = 0 << 0
 AUTO_UPDATE_STABLE = 1 << 0
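`ADMIN_USER_SIDEBAR` is built as "highest flag shifted left by one, minus one", which sets every bit up to and including the highest flag. That is why the expression has to track the newest constant. A small sketch of the arithmetic, with `all_flags_up_to` as a hypothetical name used only for illustration:

```python
# Illustrative copies of the flags from cps/constants.py.
SIDEBAR_LIST = 1 << 17
SIDEBAR_DUPLICATES = 1 << 18


def all_flags_up_to(highest_flag: int) -> int:
    """Return a mask with every bit from bit 0 up to highest_flag's bit set."""
    return (highest_flag << 1) - 1


old_mask = all_flags_up_to(SIDEBAR_LIST)        # bits 0..17
new_mask = all_flags_up_to(SIDEBAR_DUPLICATES)  # bits 0..18

# The old mask leaves the duplicates bit off; the new one turns it on.
duplicates_in_old = bool(old_mask & SIDEBAR_DUPLICATES)
duplicates_in_new = bool(new_mask & SIDEBAR_DUPLICATES)
```

The two masks differ only in bit 18, so an admin default computed from the old expression would not include the new sidebar entry.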

cps/cwa_functions.py

Lines changed: 47 additions & 1 deletion
@@ -100,6 +100,32 @@ def get_ingest_dir():
         dirs = json.load(f)
     return dirs['ingest_folder']
 
+def get_ingest_status():
+    """Read the current ingest service status"""
+    try:
+        with open('/config/cwa_ingest_status', 'r') as f:
+            status_line = f.read().strip()
+            if ':' in status_line:
+                parts = status_line.split(':')
+                return {
+                    'state': parts[0],
+                    'filename': parts[1] if len(parts) > 1 else '',
+                    'timestamp': parts[2] if len(parts) > 2 else '',
+                    'detail': parts[3] if len(parts) > 3 else ''
+                }
+            else:
+                return {'state': status_line, 'filename': '', 'timestamp': '', 'detail': ''}
+    except (FileNotFoundError, IOError):
+        return {'state': 'unknown', 'filename': '', 'timestamp': '', 'detail': ''}
+
+def get_ingest_queue_size():
+    """Get the number of files in the retry queue"""
+    try:
+        with open('/config/cwa_ingest_retry_queue', 'r') as f:
+            return len([line for line in f if line.strip()])
+    except (FileNotFoundError, IOError):
+        return 0
+
 def refresh_library(app):
     with app.app_context():  # Create app context for session
         ingest_dir = get_ingest_dir()
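`get_ingest_status` in the hunk above splits one colon-separated line into four fields. The sketch below does the same on a plain string so the behaviour can be tried without the `/config` files. The function name `parse_ingest_status`, the `state:filename:timestamp:detail` layout (inferred from the field order in the code), and the sample state names are assumptions, not taken from the ingest service itself:

```python
def parse_ingest_status(status_line: str) -> dict:
    """Split a status line into state, filename, timestamp and detail.

    Hypothetical pure-function version of get_ingest_status.
    """
    fields = ("state", "filename", "timestamp", "detail")
    parts = status_line.strip().split(":")
    # Pad with empty strings so missing trailing fields default to ''.
    parts += [""] * (len(fields) - len(parts))
    # zip stops at four fields, so any extra parts are ignored.
    return dict(zip(fields, parts))


idle = parse_ingest_status("idle")
busy = parse_ingest_status("processing:book.epub:1756820660:converting")
# A colon inside a field shifts everything after it by one position.
shifted = parse_ingest_status("processing:book.epub:2025-09-02T13:44:20")
```

The last call shows a property of splitting on every colon: if the writer ever emits a filename or timestamp that contains `:`, the later fields no longer line up. Whether that can happen depends on what the ingest service writes, which this diff does not show.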
@@ -180,8 +206,12 @@ def set_cwa_settings():
     boolean_settings = []
     string_settings = []
     list_settings = []
+    integer_settings = ['ingest_timeout_minutes']  # Special handling for integer settings
+
     for setting in cwa_default_settings:
-        if isinstance(cwa_default_settings[setting], int):
+        if setting in integer_settings:
+            continue  # Handle separately
+        elif isinstance(cwa_default_settings[setting], int):
             boolean_settings.append(setting)
         elif isinstance(cwa_default_settings[setting], str) and cwa_default_settings[setting] != "":
             string_settings.append(setting)
@@ -229,6 +259,22 @@ def set_cwa_settings():
     if result['auto_convert_target_format'] in result['auto_ingest_ignored_formats']:
         result['auto_ingest_ignored_formats'].remove(result['auto_convert_target_format'])
 
+    # Handle integer settings
+    for setting in integer_settings:
+        value = request.form.get(setting)
+        if value is not None:
+            try:
+                int_value = int(value)
+                # Validate timeout range
+                if setting == 'ingest_timeout_minutes':
+                    int_value = max(5, min(120, int_value))  # Clamp between 5 and 120 minutes
+                result[setting] = int_value
+            except (ValueError, TypeError):
+                # Use current value if conversion fails
+                result[setting] = cwa_db.cwa_settings.get(setting, 15)  # Default to 15 minutes
+        else:
+            result[setting] = cwa_db.cwa_settings.get(setting, 15)  # Default to 15 minutes
+
     # DEBUGGING
     # with open("/config/post_request" ,"w") as f:
     #     for key in result.keys():
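The integer-setting branch above parses the submitted value, clamps it to 5..120 minutes, and falls back to the stored value (or 15) when the field is missing or not a number. A compact sketch of that rule, where `clamp_timeout` and its parameters are hypothetical names and `current` stands in for `cwa_db.cwa_settings.get(setting, 15)`:

```python
def clamp_timeout(raw_value, current: int = 15, low: int = 5, high: int = 120) -> int:
    """Parse a form value as minutes and clamp it to [low, high].

    Hypothetical helper; returns `current` when the value is missing
    or cannot be converted to an integer.
    """
    if raw_value is None:
        return current
    try:
        return max(low, min(high, int(raw_value)))
    except (ValueError, TypeError):
        return current


too_small = clamp_timeout("3")     # below the range, raised to 5
too_large = clamp_timeout("500")   # above the range, lowered to 120
in_range = clamp_timeout("30")     # kept as submitted
not_a_number = clamp_timeout("abc", current=20)  # falls back to current
missing = clamp_timeout(None)      # falls back to the default of 15
```

As in the commit, the fallback value itself is not clamped, so a stored value outside the range would be passed through unchanged.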

cps/duplicates.py

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+# Calibre-Web Automated – fork of Calibre-Web
+# Copyright (C) 2018-2025 Calibre-Web contributors
+# Copyright (C) 2024-2025 Calibre-Web Automated contributors
+# SPDX-License-Identifier: GPL-3.0-or-later
+# See CONTRIBUTORS for full list of authors.
+
+from flask import Blueprint
+from flask_babel import gettext as _
+from sqlalchemy import func, and_
+
+from . import db, calibre_db, logger
+from .admin import admin_required
+from .usermanagement import login_required_if_no_ano
+from .render_template import render_title_template
+from .cw_login import current_user
+
+duplicates = Blueprint('duplicates', __name__)
+log = logger.create()
+
+
+@duplicates.route("/duplicates")
+@login_required_if_no_ano
+@admin_required
+def show_duplicates():
+    """Display books with duplicate titles and authors"""
+    print("[cwa-duplicates] Loading duplicates page...", flush=True)
+    log.info("[cwa-duplicates] Loading duplicates page for user: %s", current_user.name)
+
+    try:
+        # Use SQL to efficiently find duplicates with proper user filtering
+        duplicate_groups = find_duplicate_books()
+
+        print(f"[cwa-duplicates] Found {len(duplicate_groups)} duplicate groups total", flush=True)
+        log.info("[cwa-duplicates] Found %s duplicate groups total", len(duplicate_groups))
+
+        return render_title_template('duplicates.html',
+                                     duplicate_groups=duplicate_groups,
+                                     title=_("Duplicate Books"),
+                                     page="duplicates")
+
+    except Exception as e:
+        print(f"[cwa-duplicates] Critical error loading duplicates page: {str(e)}", flush=True)
+        log.error("[cwa-duplicates] Critical error loading duplicates page: %s", str(e))
+        # Return empty page on error
+        return render_title_template('duplicates.html',
+                                     duplicate_groups=[],
+                                     title=_("Duplicate Books"),
+                                     page="duplicates")
+
+
+def find_duplicate_books():
+    """Find books with duplicate title + primary author combinations using efficient SQL"""
+
+    # Get all books with proper user filtering - this is much simpler and more reliable
+    # than trying to do complex joins for duplicate detection
+    books_query = (calibre_db.session.query(db.Books)
+                   .filter(calibre_db.common_filters())  # Respect user permissions and library filtering
+                   .order_by(db.Books.title, db.Books.timestamp.desc()))
+
+    all_books = books_query.all()
+    print(f"[cwa-duplicates] Retrieved {len(all_books)} books with user filtering applied", flush=True)
+
+    # Group books by title + primary author combination (case-insensitive)
+    title_author_groups = {}
+
+    for book in all_books:
+        # Ensure authors are loaded (lazy loading)
+        if not book.authors:
+            continue
+
+        # Get primary author (use Calibre-Web's standard approach)
+        book.ordered_authors = calibre_db.order_authors([book])
+        primary_author = book.ordered_authors[0].name if book.ordered_authors else "Unknown"
+
+        # Create case-insensitive key
+        key = (book.title.lower().strip(), primary_author.lower().strip())
+
+        if key not in title_author_groups:
+            title_author_groups[key] = []
+        title_author_groups[key].append(book)
+
+    print(f"[cwa-duplicates] Grouped books into {len(title_author_groups)} unique title+author combinations", flush=True)
+
+    # Filter to only groups with duplicates and prepare display data
+    duplicate_groups = []
+    for (lower_title, lower_author), books in title_author_groups.items():
+        if len(books) > 1:
+            # Sort books by timestamp (newest first)
+            books.sort(key=lambda x: x.timestamp, reverse=True)
+
+            # Add additional information for display
+            for book in books:
+                # Ensure we have ordered authors
+                if not hasattr(book, 'ordered_authors') or not book.ordered_authors:
+                    book.ordered_authors = calibre_db.order_authors([book])
+
+                book.author_names = ', '.join([author.name.replace('|', ',') for author in book.ordered_authors])
+
+                # Add cover URL
+                if book.has_cover:
+                    book.cover_url = f"/cover/{book.id}"
+                else:
+                    book.cover_url = "/static/generic_cover.jpg"
+
+            duplicate_groups.append({
+                'title': books[0].title,
+                'author': books[0].author_names.split(',')[0].strip(),  # Primary author
+                'count': len(books),
+                'books': books
+            })
+
+            book_ids = [book.id for book in books]
+            print(f"[cwa-duplicates] Found duplicate group: '{books[0].title}' by {books[0].author_names.split(',')[0].strip()} ({len(books)} copies) - IDs: {book_ids}", flush=True)
+            log.info("[cwa-duplicates] Found duplicate group: '%s' by %s (%s copies) - IDs: %s",
+                     books[0].title, books[0].author_names.split(',')[0].strip(), len(books), book_ids)
+
+    # Sort by title, then author for consistent display
+    duplicate_groups.sort(key=lambda x: (x['title'].lower(), x['author'].lower()))
+
+    print(f"[cwa-duplicates] Found {len(duplicate_groups)} duplicate groups total", flush=True)
+
+    return duplicate_groups
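Despite the docstring, `find_duplicate_books` loads the filtered books and groups them in Python by a lowercased, stripped (title, primary author) key. The sketch below reproduces that grouping on in-memory data so it can be run without a Calibre database. The `Book` dataclass, `find_duplicates`, and the sample library are illustrative stand-ins, not part of the commit:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Book:
    """Stand-in for db.Books with only the fields the grouping needs."""
    id: int
    title: str
    authors: list   # ordered author names, primary author first
    timestamp: int  # any sortable value; larger means newer


def find_duplicates(books):
    """Group books by lowercased, stripped (title, primary author)."""
    groups = defaultdict(list)
    for book in books:
        if not book.authors:
            continue  # books without authors are skipped, as in the commit
        key = (book.title.lower().strip(), book.authors[0].lower().strip())
        groups[key].append(book)

    duplicates = []
    for copies in groups.values():
        if len(copies) > 1:
            copies.sort(key=lambda b: b.timestamp, reverse=True)  # newest first
            duplicates.append({
                "title": copies[0].title,
                "author": copies[0].authors[0],
                "count": len(copies),
                "ids": [b.id for b in copies],
            })
    # Sort groups by title, then author, for a stable display order.
    duplicates.sort(key=lambda g: (g["title"].lower(), g["author"].lower()))
    return duplicates


library = [
    Book(1, "Dune", ["Frank Herbert"], 100),
    Book(2, " dune ", ["frank herbert"], 200),
    Book(3, "Dune", ["Brian Herbert"], 300),
    Book(4, "Emma", [], 400),
]
result = find_duplicates(library)
```

Here books 1 and 2 form one group, book 3 is kept apart by its different primary author, and book 4 is skipped. One difference from the commit: it derives the displayed author by splitting `author_names` on commas, so a name that itself contains a comma after the `|` replacement would be shortened, whereas the sketch reads the first list element directly.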

cps/main.py

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,7 @@ def main():
     from .error_handler import init_errorhandler
     from .remotelogin import remotelogin
     from .kosync import kosync
+    from .duplicates import duplicates
     try:
         from .kobo import kobo, get_kobo_activated
         from .kobo_auth import kobo_auth
@@ -78,6 +79,7 @@ def main():
     app.register_blueprint(gdrive)
     app.register_blueprint(editbook)
     app.register_blueprint(kosync)
+    app.register_blueprint(duplicates)
     if kobo_available:
         app.register_blueprint(kobo)
         app.register_blueprint(kobo_auth)

cps/render_template.py

Lines changed: 5 additions & 0 deletions
@@ -99,6 +99,11 @@ def get_sidebar_config(kwargs=None):
         {"glyph": "glyphicon-th-list", "text": _('Books List'), "link": 'web.books_table', "id": "list",
          "visibility": constants.SIDEBAR_LIST, 'public': (not current_user.is_anonymous), "page": "list",
          "show_text": _('Show Books List'), "config_show": content})
+    if current_user.role_admin():
+        sidebar.append(
+            {"glyph": "glyphicon-copy", "text": _('Duplicates'), "link": 'duplicates.show_duplicates', "id": "duplicates",
+             "visibility": constants.SIDEBAR_DUPLICATES, 'public': (not current_user.is_anonymous), "page": "duplicates",
+             "show_text": _('Show Duplicate Books'), "config_show": content})
     g.shelves_access = ub.session.query(ub.Shelf).filter(
         or_(ub.Shelf.is_public == 1, ub.Shelf.user_id == current_user.id)).order_by(ub.Shelf.name).all()
 
