
UI Design

Homepage


Visualization Page

The visualization consists of two charts and a text transcription box. The Utterances at Azimuth chart shows every utterance spread over the range of azimuths detected in the audio file. The Speech in Seconds chart shows the aggregate speaking time of all utterances detected for each speaker. The Transcription box shows the speech recognition results from the Bing Speech API; recognition errors are rendered as 'Inaudible'. These are typically sounds that HARK SaaS identified as speech and sent to Bing for recognition, but that were actually background noise. Each representation on this page loads and updates dynamically as results become available to the backend server. Each chart is interactive: the user can cross-filter the results by clicking on an element in a chart (a slice in the pie chart, or a bar in the bar chart).

Internal Specification

Server

```python
STAGING_AREA = '/tmp/'
STATIC_PATH = 'static'
HTML_TEMPLATE_PATH = 'templates'
LISTEN_PORT = 80
LANGUAGE = 'ja-JP'
```

  • STAGING_AREA is the working space for processing audio files.
  • STATIC_PATH is the location of the JavaScript and CSS files to be served.
  • HTML_TEMPLATE_PATH is the location of the HTML files rendered by the Tornado web server.
  • LISTEN_PORT is the port on which the HTTP server and WebSocket listen.
  • LANGUAGE is the locale used by the Bing Speech API for speech recognition.

```python
settings = {
    'static_path': os.path.join(os.path.dirname(__file__), STATIC_PATH),
    'template_path': os.path.join(os.path.dirname(__file__), HTML_TEMPLATE_PATH)
}
```

  • These settings are passed to the Tornado application instance to map the static file and template paths.
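For context, a minimal sketch of how these settings might be passed when constructing the Tornado application; the route table here is an illustrative assumption, not the project's actual routing:

```python
import tornado.ioloop
import tornado.web

# Hypothetical wiring: the URL patterns are assumptions; the handler
# classes are the ones described later in this page.
application = tornado.web.Application([
    (r'/', HttpRequestHandler),
    (r'/websocket', WebSocketHandler),
], **settings)

application.listen(LISTEN_PORT)
tornado.ioloop.IOLoop.current().start()
```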

```python
default_hark_config = {
    'processType': 'batch',
    'params': {
        'numSounds': 2,
        'roomName': 'sample room',
        'micName': 'dome',
        'thresh': 21
    },
    'sources': [
        {'from': 0, 'to': 180},
        {'from': -180, 'to': 0},
    ]
}
```

  • This is the default configuration metadata that the Hark session is initialized with. It presumes the uploaded audio file is an 8-channel file with two unique speakers.
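Since the defaults assume two speakers split across the front and rear half-planes, a session for a different recording could start from a copy of this dict and override only what changes. The three-speaker split below is a made-up example using the field names from the block above:

```python
import copy

# Hypothetical override: three speakers instead of two, with the
# azimuth range divided into three sectors; other defaults kept.
three_speaker_config = copy.deepcopy(default_hark_config)
three_speaker_config['params']['numSounds'] = 3
three_speaker_config['sources'] = [
    {'from': -180, 'to': -60},
    {'from': -60, 'to': 60},
    {'from': 60, 'to': 180},
]
```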

class HttpRequestHandler

  • The handler for HTTP requests sent to the server. GET requests are asynchronous and can be handled concurrently. The POST request handles the upload of the audio file in a subprocess via a coroutine, which is non-blocking; a sketch of this handler together with async_upload follows below.

def async_upload(file):

  • A function called asynchronously to perform non-blocking uploads.
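A minimal sketch of how HttpRequestHandler and async_upload might fit together, assuming Tornado's @gen.coroutine with a thread pool standing in for the subprocess mentioned above; the multipart field name 'file' and the render/redirect targets are assumptions:

```python
import os
from concurrent.futures import ThreadPoolExecutor

import tornado.gen
import tornado.web

executor = ThreadPoolExecutor(max_workers=4)

def async_upload(file):
    # Write the uploaded bytes into the staging area; this runs on a
    # worker thread so the IOLoop is never blocked.
    path = os.path.join(STAGING_AREA, file['filename'])
    with open(path, 'wb') as f:
        f.write(file['body'])
    return path

class HttpRequestHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        # GET requests render a template and return immediately,
        # so many can be served concurrently.
        self.render('index.html')

    @tornado.gen.coroutine
    def post(self):
        # Hand the uploaded file off to the executor and yield the
        # future; other requests are served while the write completes.
        file = self.request.files['file'][0]
        yield executor.submit(async_upload, file)
        self.redirect('/visualization.html')
```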

class Hark

  • A simple wrapper around PyHarkSaas to inject logging and additional logic (sketched below).
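A sketch of what such a wrapper might look like. The method names called on the PyHarkSaas client (createSession, uploadFile, getResults) and the shape of the results are illustrative assumptions, not the library's documented API:

```python
import logging

logger = logging.getLogger(__name__)

class Hark(object):
    """Thin wrapper around a PyHarkSaas client that adds logging."""

    def __init__(self, client, config=None):
        self.client = client
        self.config = config or default_hark_config

    def analyze(self, audio_path):
        # The client method names and result fields below are
        # assumptions for illustration only.
        logger.info('Creating HARK session for %s', audio_path)
        self.client.createSession(self.config)
        with open(audio_path, 'rb') as f:
            self.client.uploadFile(f)
        results = self.client.getResults()
        logger.info('HARK analysis complete for %s', audio_path)
        return results
```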

class Speech

  • A wrapper around the SpeechRecognition module to inject additional logic. SpeechRecognition supports multiple recognition APIs; Hark Visualizer uses the Bing Speech API, backed by an Azure instance hosting the API in the west-us region (the only availability zone at the time). A sketch follows.
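A sketch of such a wrapper using the SpeechRecognition package's Bing backend. The 'Inaudible' fallback mirrors the transcription behaviour described earlier, but the class structure and key handling are assumptions:

```python
import speech_recognition as sr

class Speech(object):
    """Wrapper around SpeechRecognition's Bing Speech API backend."""

    def __init__(self, api_key, language=LANGUAGE):
        self.recognizer = sr.Recognizer()
        self.api_key = api_key
        self.language = language

    def transcribe(self, wav_path):
        # Load the utterance and send it to Bing. Sounds that HARK
        # flagged as speech but Bing cannot recognize come back as
        # 'Inaudible', matching the transcription box behaviour.
        with sr.AudioFile(wav_path) as source:
            audio = self.recognizer.record(source)
        try:
            return self.recognizer.recognize_bing(
                audio, key=self.api_key, language=self.language)
        except sr.UnknownValueError:
            return 'Inaudible'
```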

class WebSocketHandler

  • This is where the main WebSocket work is done. The browser initiates a WebSocket connection when the user navigates to visualization.html, which triggers sending the analysis results from Hark to the browser over this socket (sketched below).
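A sketch of the handler under those assumptions; how results are obtained (the get_results helper) is hypothetical and stands in for the backend's actual plumbing:

```python
import json

import tornado.websocket

class WebSocketHandler(tornado.websocket.WebSocketHandler):
    """Pushes analysis results to the browser as they become available."""

    def open(self):
        # Called when visualization.html opens the socket; get_results
        # is a hypothetical helper yielding analysis results from Hark.
        for result in get_results():
            self.write_message(json.dumps(result))

    def on_close(self):
        # Stop streaming when the browser navigates away.
        pass
```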
