Zoom Into Books is a University of Cambridge CST Group Project. Our project won the 2021 Most Professional Project prize out of 21 projects.
It is a fully featured app and web interface for augmenting books with online content such as high-resolution images, links and videos. The app lets you scan books and will auto-detect pages and images in the viewfinder, automatically delivering augmented content to the user (without the need for them to type in links or scan QR codes). The web interface allows a publisher to upload books and set triggers and content for delivery.
See the Youtube video for an explanation and demo:
The original project brief:
Art and travel books often have beautiful images, but it’s frustrating that you can’t pinch to zoom as you would with a phone, to see arbitrarily high resolution details. The purpose of this project is to identify those times when a picture in a book or magazine corresponds to an existing high resolution image that is available online. Your Android app should work in augmented reality style, starting with a view of the book through the phone camera, but then seamlessly zooming by substituting high-resolution online data.
api: Contains our back-end API code, written in Flask & Pythonhtml: Contains our back-end web console code, written in PHP & HTML5app: Contains the code of our Android client, written in Java (with some GLSL)ocr: Contains the server-side code for OCR scanning, written in Python.
- AugmentedImageActivity.java
The main activity that performs AR and OCR image recognition.
- BarcodeScanActivity.java
The activity that scans a barcode and returns its numerical decoding.
- ContactActivity.java
Empty activity to provide a way to contact us or the publisher in the future
- ImageActivity.java
The activity which displays a gesture controlled image.
- InfoActivity.java
Empty activity to provide information about us in the future
- ListViewAdapter.java
An adapter to help SelectBookActivity manage the list of books
- ResourceHandlerActivity.java
The activity which handles the displaying, opening and storage of non ar resources.
- SelectBookActivity.java
This activity allows the user to select a book from the database by searching for it by title.
- SettingsActivity.java
Allows the user to switch between Light and Dark Mode. Provides framework for 2 other settings, should they be implemented in the future.
- WebViewActivity.java
Allows for opening arbitrary linked resources by URL, and displaying them to the user.
- WelcomeActivity.java
Beginning activity where the user chooses how they want to select the book.
See also \app\src\main\java\com\uniform\zoomintobooks\common\helpers, which contains many helper classes used within the activities above. The main classes of interest are:
- BookInfo.java and BookResource.java
These define the structure of both a book and an individual resource, once JSON objects returned by the API have been decomposed.
- AugmentedImageState.java
For a given detected image, this class stores the renderer for the overlay image, the bitmap to be rendered and the augmented image itself.
- ZoomUtils.java
Utility functions that parse JSON output from the API, decode the base64 encoded string that represents the database of images to detect (for use in AugmentedImageActivity.java), and generate an input stream given an image URL.
There are two account types: standard and administrator. Standard users can interact with the books and resources which they have created themselves, and administrators can interact with all books and resources as well as creating and editing users and publishers. Each standard user is associated with a publisher (administrators have no publisher) and all books which they create are associated with that publisher.
Most form interaction between the client and server is done asynchronously using JavaScript's XMLHTTPRequest, for sending, not XML, but POST query data to the server and receiving JSON back, indicating the status of the operation.
On the server, all operations must be authorised by a central function before they can proceed, and all operations are carried out as transactions so that the system remains consistent. MySQL provides transactions but for filesystem operations a custom system is used.
Many of the index.php pages take GET parameters, such as isbn for books or rid for resources. The action.php pages and other form handlers all require POST requests.
/login/:index.phpis the login form andaction.phpis the login form handler/logout/:index.phpis the logout handler which automatically redirects to the homepage when run/console/books/:index.phpshows the list of books which the user is permitted to view and edit (for administrators this is all books)/console/books/book/:index.phpis the book editing page,action.phpperforms updates to a book's properties,delete.phpdeletes a book andunlink.phpunlinks a resource from a book/console/books/book/cover/:index.phpshows the cover of a book as image/png/console/books/book/image/:index.phpshows a specific trigger image from a book/console/books/book/resource/new/:index.phpis the book-resource linking page andaction.phpperforms the linking and AR/OCR blob generation/console/books/book/upload/:index.phpshows the PDF copy of a book/console/books/new/:index.phpis the book creation form page andaction.phpperforms the actual book creation in the database and on the filesystem/console/publishers/:index.phpshows the list of all publishers (only accessible to administrators)/console/publishers/new/:index.phpis the publisher creation form andaction.phpperforms the creation/console/publishers/publisher/:index.phpis the publisher editing page,action.phpperforms the updates anddelete.phpperforms deletions/console/resources/:index.phpshows a list of all resources that the user can view and edit (for administrators this is all of them)/console/resources/new/:index.phpis the resource creation page andaction.phpactually creates the resource/console/resources/resource/:index.phpis the resource editing page,action.phpperforms the updates anddelete.phpperforms deletions/console/resources/resource/preview/:index.phpshows a preview of the resource as image/png/console/resources/resource/upload/:index.phpshows the resource (if it is hosted on our server)/console/users/:index.phpshows the list of all users (only accessible to administrators)/console/users/new/:index.phpis the user creation form andaction.phpperforms the creation/console/users/user/:index.phpis the user editing page andaction.phpperforms the updates anddelete.phpperforms deletions
These modules contains all operations on the entities after which they are named. The functions in each one tend to be pretty similarly named, hence why they're grouped together here. In the function list below, * stands for one of the entity types.
show_*_formshows the entity addition/editing form, selected using the$editargumentmanage_*_publisherhandles entity addition and editing, again selected using the$editargumentdelete_*handles entity deletioncan_edit_*determines whether the current user is allowed to edit the entity*_existsdetermines whether the entity in question exists in the databasefetch_*fetches a single entity, specified by its id (e.g. ISBN, username, etc.)fetch_*sfetches a list of all of those entities which are available to the user (e.g. books which the user is allowed to view)
books.php:
count_resourcesreturns the number of resources linked to this bookget_book_typereturns the MIME type of the uploaded PDF andNULLif no PDF was uploadedupdate_blobscallsgenerate_ar_blobandgenerate_ocr_blob. This is called whenever an operation updates the links between a book and its resourcesgenerate_image_listgenerates a text file in the format required byarcoreimgfor reading in the input image triggersgenerate_ar_blobinvokes Google'sarcoreimgon the image triggers specified to create a blob used by the app to detect image-based triggers. It callsgenerate_image_listto create the input list forarcoreimggenerate_ocr_blobinvokes/ocr/extract_pdf.pyon the PDF uploaded with the book to create a JSON object which the app can use to detect text-based triggers
resources.php:
fetch_book_resourcesreturns a list of resources associated with the given bookget_resource_mime_typereturns the MIME type of the resource if we host it, andNULLotherwisewas_resource_uploadedreturnstrueif the resource is hosted on our server andfalseotherwisemanage_resource_linksadds a link between some resources and a book, caused by the triggers specified. This callsupdate_blobsto re-generate the databases needed for OCR and AR detectionunlink_resourceseparates a resource from a book if they were linked togethergenerate_previewgenerates and stores a preview of the resource being added
users.php:
authenticatewill returntrueif the given username and password combination is valid and false otherwise
publishers.php:
fetch_user_publisherwill return the publisher to which a standard user belongs, or NULL if the user is an administrator
This module contains several utility functions which are used throughout the code.
initis called at the top of every script and sets up things like the sessionis_blank,is_valid_url,is_valid_resource_display_mode,is_valid_isbnandis_pos_intprovide input validationsanitisesanitises input data to prevent XSS and SQL injection attackserrors_occurred,add_error,clear_errors,add_notice,clear_noticesandset_successcontrol the status (errors, notices and success message) to display to the user.errors_occurredreturnstrueif there has been an error addeddisplay_statusandjson_statuspresent the current status to the user - the first prints the status directly to the page and the second formats it as JSON for asynchronous requestsget_remote_type,get_type,get_subtypeandget_typeclassare used to determine MIME type of local and remote resourcesauthorisedis the central authorisation function, which given an action, returnstrueif the current user is permitted to perform it andfalseotherwisedb_selectis a handy function for reducing boilerplate code forSELECTqueries- The
*pathfunctions simply return the path for a certain file based on the ISBN or resource ID generate_text_imagegenerates and outputs a PDF containing the given textgenerate_random_stringgenerates a random string of the specified lengthfile_rollback,file_commit,file_ops,rrmandrcpprovide file transaction capabilities.file_opsis used to actuall perform the operations,rrmis for recursive removal andrcpis for recursive copyrollbackandcommitperform both database and filesystem rollbacks and commits
This module automatically includes all the other modules in this directory so that each page only need have one include statement, to include this file.
This module is for creating the page headers. The make_header function generates a header containing all the necessary stylesheets and scripts, and has customisable <meta> tags
This module is for creating the page footers and contains one function, make_footer, to do so
There is only one JavaScript file, utils.js. It is mainly taken up by functions for submitting forms asynchronously, all of which are fairly similar and rely on the request function for the actual XHR. request sends the FormData object given to it and disables the button used to submit it until a response is received. Once a response is received, any errors, notices and success messages are displayed, and the page is redirected if necessary. Additionally, a callback may be specified to do something with the response data.
The other interesting function in utils.js is ask_user, which produces a custom dialog on the screen with two buttons. If the user presses "No", no action is taken, but if they press "Yes" then the callback given to the function will be called.
There are three stylesheets: fonts.css, forms.css and main.css. fonts.css just defines some custom fonts to use in the interface, forms.css contains the several styles applied to HTML form elements and main.css contains rules for all other elements, including containers (cards, etc.), headings, text and links.
The structure of the database is shown below. It runs using MySQL.
The API for the app (/api/main.py) is written in Python; it uses Flask and SQLAlchemy to both model and query the MySQL database. The key five endpoints are as follows:
/books/resources/<isbn>returns all the resources for a particular book, specified by its ISBN./books/title/<title>performs the server-side match; given a query string, we return possible titles and corresponding ISBNs for books the user may be intending to identify./books/resources/ridreturns a particular resource, as specified by its id./booksreturns a list of all the books in the database, and associated information./books/<isbn>is the main endpoint, which returns all information about a book (specified by ISBN), including its name, edition, ISBN, and the AR and OCR resources needed by the app.
The algorithm used to return search results is given in /api/levenshtein.py, using the inbuilt Python library difflib.
/ocr/extract_pdf.py contains code that extracts text from designated pages of a PDF, returning a JSON blob that the webserver sends to the client device. The client device can then use this for text recognition and matching. This script is called by the webserver.
Some source code from the ARCore Android SDK is included in our android app, licensed under Apache 2.0. Some snippets come from the Android Documentation, also licensed under Apache 2.0
The Barcode Scanner in the app uses ZXing library, licensed under Apache 2.0.
The JSON parsing in the app uses gson by Google, licensed under Apache 2.0.
The AR functionality also makes use of a pin 3D model, created by Google, licensed under CC-BY.
/html/assets/fonts contains static files not written by us. Open Sans font licensed under Apache 2.0.
/html/assets/images/icons/ contains icons from IconsDB. See https://uniform.ml/credits.php for more information.
/html/assets/images/icons/plus-5-128.png,/html/assets/images/icons/briefcase-6-128.pngand/html/assets/images/icons/headphones-4-128.pngare licensed under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication/html/assets/images/icons/group-128.png,/html/assets/images/icons/book-stack-128.png,/html/assets/images/icons/web-128.pngand/html/assets/images/icons/pages-1-128.pngare licensed under Creative Commons Attribution-NoDerivs 3.0
This project, with the exception of open-source components as listed above, is All Rights Reserved. If you wish to use some code from it, please get in touch for permission.

