Upon committing the search input, the window is refreshed and the results are rendered one below the other, while the query remains in the search bar. Each result consists of the article title (displayed in a Label), followed by the date the article was posted and a short summary of its content. Results the user has already visited are shown with a darker title color, so the user can tell which articles they have opened before.
Clicking a result opens a new tab that displays the full article. Any words or phrases in the article that match the search query are highlighted in yellow, in the style of a marker. The user can search for other words or phrases in the article by clicking the "find in file" button.
The application uses the Apache Lucene library. Lucene offers powerful indexing and search functionality, along with features such as spell checking, hit highlighting, and sophisticated text analysis and tokenization.
To make the files searchable, we first need to convert them into Lucene Documents. Each Document consists of one or more fields; here, the fields are the article's title, abstract, year of publication, and full text. To analyze the full text field, Lucene provides the SimpleAnalyzer, which breaks text into tokens at non-letter characters and converts each token to lowercase.
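A minimal sketch of this conversion step, assuming Lucene 9.x with the lucene-core and lucene-analysis-common modules on the classpath. The class name ArticleDocuments, the helpers toDocument() and tokenize(), and the field names "title", "abstract", "year" and "content" are hypothetical choices for illustration. It also prints the tokens the SimpleAnalyzer produces, to show what "breaking text into tokens" means in practice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class ArticleDocuments {

    // Builds one Lucene Document per article (hypothetical field names).
    static Document toDocument(String title, String abstractText, String year, String fullText) {
        Document doc = new Document();
        // TextField values are tokenized by the analyzer; Field.Store.YES also keeps
        // the original value so it can be shown in the result list.
        doc.add(new TextField("title", title, Field.Store.YES));
        doc.add(new TextField("abstract", abstractText, Field.Store.YES));
        // StringField is indexed as a single untokenized term (exact match on the year).
        doc.add(new StringField("year", year, Field.Store.YES));
        doc.add(new TextField("content", fullText, Field.Store.YES));
        return doc;
    }

    // Returns the tokens an analyzer produces for a piece of text.
    static List<String> tokenize(Analyzer analyzer, String field, String text) {
        List<String> tokens = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset(); // required before the first incrementToken()
            while (ts.incrementToken()) {
                tokens.add(term.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        Document doc = toDocument("Inverted Indexes", "How search engines find text.", "2021",
                "An inverted index maps terms to the documents that contain them.");
        System.out.println(doc.get("title")); // prints Inverted Indexes

        // SimpleAnalyzer splits on non-letter characters and lowercases each token.
        Analyzer analyzer = new SimpleAnalyzer();
        System.out.println(tokenize(analyzer, "content", "Lucene's 2 Search-Engines"));
        // prints [lucene, s, search, engines]
    }
}
```

Note that the digit "2" disappears entirely: the SimpleAnalyzer only keeps runs of letters, so it is a poor fit if years or other numbers inside the full text need to be searchable.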
Once the articles have been converted into Documents, we move on to the indexing phase. The index stores the terms of each Document together with statistics about them, which makes searching efficient. To build it, we create an IndexWriter, to which we provide an index directory (Directory) and an index configuration (IndexWriterConfig). The index directory is where the index is stored, which can be on disk (FSDirectory) or in RAM (ByteBuffersDirectory). Once the writer is ready, we add all the Documents to it with the addDocument() method. Each added Document is analyzed by the analyzer defined in the index configuration. Finally, the writer is committed or closed so that the new Documents become visible to readers.
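A minimal sketch of the indexing step, assuming Lucene 9.x with the lucene-core and lucene-analysis-common modules on the classpath. The class BuildIndex, its index() helper, and the "title"/"content" field names are hypothetical. An in-memory ByteBuffersDirectory is used so the example runs without touching the disk.

```java
import java.io.IOException;
import java.util.List;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class BuildIndex {

    // Indexes (title, content) pairs into the given directory.
    static void index(Directory dir, List<String[]> articles) throws IOException {
        // The analyzer in the config is applied to every Document passed to addDocument().
        IndexWriterConfig config = new IndexWriterConfig(new SimpleAnalyzer());
        config.setOpenMode(IndexWriterConfig.OpenMode.CREATE); // overwrite any existing index
        try (IndexWriter writer = new IndexWriter(dir, config)) {
            for (String[] article : articles) {
                Document doc = new Document();
                doc.add(new TextField("title", article[0], Field.Store.YES));
                doc.add(new TextField("content", article[1], Field.Store.YES));
                writer.addDocument(doc);
            }
        } // close() commits pending changes by default
    }

    public static void main(String[] args) throws IOException {
        // In RAM; FSDirectory.open(Paths.get("index")) would store the index on disk instead.
        try (Directory dir = new ByteBuffersDirectory()) {
            index(dir, List.of(
                    new String[] {"First article", "Lucene builds an inverted index."},
                    new String[] {"Second article", "Queries are matched against the index."}));
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                System.out.println("Indexed documents: " + reader.numDocs()); // prints Indexed documents: 2
            }
        }
    }
}
```

Closing the IndexWriter commits by default; an application that keeps the writer open for a long time would instead call writer.commit() after adding a batch of Documents.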
With an index in place, we can now search our dataset. Once the user has typed their query, the text needs to be converted into an actual query object. Lucene provides the Query and QueryParser classes for this exact task. The QueryParser is constructed with the default document field to search and with an analyzer, which should be the same one used at index time so that query terms are tokenized the same way as the indexed text. A Query object is then created by calling QueryParser.parse(String) with the user's plain-text query as its argument. For advanced searching, such as searching by title, author, etc., we can use the MultiFieldQueryParser class, which searches multiple fields at once.
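A minimal sketch of query parsing, assuming Lucene 9.x with the lucene-queryparser and lucene-analysis-common modules on the classpath. The class ParseQueries, its two helpers, and the field names are hypothetical. It shows a single-field QueryParser, the parser's own field syntax (title:...), and a MultiFieldQueryParser over several fields.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class ParseQueries {

    // Parses the user's text against a single default field ("content").
    static Query parseSingleField(String userInput) throws ParseException {
        Analyzer analyzer = new SimpleAnalyzer(); // same analyzer as at index time
        QueryParser parser = new QueryParser("content", analyzer);
        return parser.parse(userInput);
    }

    // Parses the same text against several fields at once (e.g. an "advanced search").
    static Query parseMultiField(String userInput) throws ParseException {
        String[] fields = {"title", "abstract", "content"};
        MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, new SimpleAnalyzer());
        return parser.parse(userInput);
    }

    public static void main(String[] args) throws ParseException {
        // Terms are lowercased by the analyzer and searched in the default field.
        System.out.println(parseSingleField("Inverted Index"));
        // The "field:term" syntax restricts a term to one field, e.g. the title.
        System.out.println(parseSingleField("title:lucene AND index"));
        // Each term is searched in every listed field.
        System.out.println(parseMultiField("lucene"));
    }
}
```

With the default OR operator, a document only needs to match one of the terms; QueryParser.setDefaultOperator(QueryParser.Operator.AND) makes every term required.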
To run the query, we need a DirectoryReader and an IndexSearcher. The reader is opened on the index Directory with DirectoryReader.open() (for an on-disk index, the Directory is obtained from its path with FSDirectory.open()), and the reader is then passed to the IndexSearcher. The searcher is now ready to look for results in our dataset. We run the query with IndexSearcher.search(Query query, int n), where n is the maximum number of results to return. This returns a TopDocs object, which contains the total number of hits (totalHits) and a scoreDocs array of ScoreDoc objects. Each ScoreDoc holds the document's internal ID and its score. We can then iterate through the scoreDocs array, retrieve each document and its stored fields, and return them to the user.
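A minimal end-to-end sketch of the search step, assuming Lucene 9.5 or later (for IndexReader.storedFields()) with the lucene-core, lucene-analysis-common and lucene-queryparser modules on the classpath. The class SearchIndex, its helpers, the two sample articles, and the field names are hypothetical; the small in-memory index exists only to make the example runnable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.StoredFields;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class SearchIndex {

    // Runs a query and returns the stored titles of the top hits, best match first.
    static List<String> search(Directory dir, Analyzer analyzer, String userInput, int maxHits)
            throws IOException, ParseException {
        Query query = new QueryParser("content", analyzer).parse(userInput);
        List<String> titles = new ArrayList<>();
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // topDocs.totalHits holds the hit count; scoreDocs holds at most maxHits results.
            TopDocs topDocs = searcher.search(query, maxHits);
            StoredFields storedFields = reader.storedFields(); // Lucene 9.5+
            for (ScoreDoc hit : topDocs.scoreDocs) {
                // hit.doc is Lucene's internal document ID; hit.score is its relevance score.
                Document doc = storedFields.document(hit.doc);
                titles.add(doc.get("title"));
            }
        }
        return titles;
    }

    // Builds a tiny in-memory index used only by this example.
    static Directory buildSampleIndex(Analyzer analyzer) throws IOException {
        Directory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            addArticle(writer, "Inverted indexes", "Lucene builds an inverted index over the text.");
            addArticle(writer, "Query parsing", "The query parser turns user input into a Query.");
        }
        return dir;
    }

    static void addArticle(IndexWriter writer, String title, String content) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES));
        doc.add(new TextField("content", content, Field.Store.YES));
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws IOException, ParseException {
        Analyzer analyzer = new SimpleAnalyzer();
        try (Directory dir = buildSampleIndex(analyzer)) {
            System.out.println(search(dir, analyzer, "inverted", 10)); // prints [Inverted indexes]
        }
    }
}
```

In an interactive application the DirectoryReader and IndexSearcher would normally be opened once and reused across queries rather than reopened per search, since opening a reader is comparatively expensive.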