### Amazon AWS
Used for hosting the web server in Japan.

### Tornado
Used as the web server for serving the webapp and delivering data to the browser via websockets.

### Microsoft Cognitive Services/Azure
Used for hosting the Bing Speech API server instance.

### Hark SaaS
Used for analyzing the audio files.

### Speech Recognition
Used for transcribing the audio file via a wrapper around the Bing Speech API, as the Google Speech API is no longer available.

### d3.js
Used for creating real-time data visualizations in the browser.

### crossfilter.js
Used for n-dimensional filtering of multivariate datasets across D3 charts.

### c3.js
A wrapper around D3.js for building charts quickly.
```python
STAGING_AREA = '/tmp/'
STATIC_PATH = 'static'
HTML_TEMPLATE_PATH = 'templates'
LISTEN_PORT = 80
LANGUAGE = 'ja-JP'
```
- STAGING_AREA is the working space for processing audio files.
- STATIC_PATH is the location of the JavaScript and CSS files to be served.
- HTML_TEMPLATE_PATH is the location of the HTML files to be rendered by the Tornado web server.
- LISTEN_PORT is the port on which the HTTP server and websocket listen.
- LANGUAGE is the locale used by the Bing Speech API for speech recognition.
```python
settings = {
    'static_path': os.path.join(os.path.dirname(__file__), STATIC_PATH),
    'template_path': os.path.join(os.path.dirname(__file__), HTML_TEMPLATE_PATH)
}
```
These settings are passed to the Tornado application instance to map the static file and template paths.
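For illustration, here is how these paths resolve; a hypothetical install directory stands in for os.path.dirname(__file__), which in the real server makes the paths relative to the module rather than the current working directory:

```python
import os

STATIC_PATH = 'static'
HTML_TEMPLATE_PATH = 'templates'

# Hypothetical install location; the real server derives this from
# os.path.dirname(__file__) so paths follow the module, not the CWD.
app_dir = '/opt/harkvisualizer'

settings = {
    'static_path': os.path.join(app_dir, STATIC_PATH),
    'template_path': os.path.join(app_dir, HTML_TEMPLATE_PATH),
}

print(settings['static_path'])    # /opt/harkvisualizer/static
print(settings['template_path'])  # /opt/harkvisualizer/templates
```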
```python
default_hark_config = {
    'processType': 'batch',
    'params': {
        'numSounds': 2,
        'roomName': 'sample room',
        'micName': 'dome',
        'thresh': 21
    },
    'sources': [
        {'from': 0, 'to': 180},
        {'from': -180, 'to': 0},
    ]
}
```
This is the default configuration metadata that the Hark session is initialized with. It presumes the uploaded audio file is an 8-channel file with two unique speakers.
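As a sketch of how the two sources entries partition the microphone's field by direction of arrival, one range per speaker (source_for_azimuth is a hypothetical helper for illustration, not part of the app):

```python
default_hark_config = {
    'processType': 'batch',
    'params': {'numSounds': 2, 'roomName': 'sample room',
               'micName': 'dome', 'thresh': 21},
    'sources': [
        {'from': 0, 'to': 180},    # first speaker's direction range
        {'from': -180, 'to': 0},   # second speaker's direction range
    ]
}

def source_for_azimuth(config, azimuth):
    # Return the index of the source range containing an azimuth in degrees
    for i, src in enumerate(config['sources']):
        if src['from'] <= azimuth <= src['to']:
            return i
    return None

print(source_for_azimuth(default_hark_config, 90))   # 0 (first speaker)
print(source_for_azimuth(default_hark_config, -90))  # 1 (second speaker)
```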
The handler for HTTP requests sent to the server. GET requests are asynchronous and can be handled concurrently. The POST request handles the upload of the audio file in a subprocess via a coroutine, which is non-blocking. However, the websocket that writes Hark and speech recognition data back to the browser uses the same port, so HTTP requests may be queued when the port is very busy. Future work is to set up an Nginx server behind a load balancer to reduce contention for the port.
```python
def async_upload(file):
```

A function called asynchronously for non-blocking uploads.
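A minimal sketch of this pattern, using a thread pool to keep a blocking upload off the main loop; upload_to_hark is a hypothetical stand-in for the real Hark SaaS call, not the app's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def upload_to_hark(path):
    # Placeholder for the blocking Hark SaaS upload call
    return 'uploaded:' + path

def async_upload(path):
    # Schedule the blocking work on a worker thread so the
    # event loop thread stays free to serve other requests
    return executor.submit(upload_to_hark, path)

future = async_upload('/tmp/sample.flac')
print(future.result())  # blocks only when the result is actually needed
```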
- A simple wrapper around PyHarkSaas to inject logging and additional logic
- A wrapper around the Speech Recognition module to inject additional logic. Speech Recognition is a module that supports multiple recognition APIs; Hark Visualizer uses the Bing Speech API, backed by an Azure instance hosting the API in the West US region (the only region where it was available).
This is where the main websocket work is done. A websocket is initiated in JavaScript by the browser when the user navigates to visualization.html:

```javascript
// Connect to the remote websocket server
var connection = new WebSocket("ws://harkvisualizer.com/websocket");
```
This triggers the Tornado web server to send the analysis results from Hark to the browser via this socket:
```python
# Invoked when socket is opened by browser
def open(self):
    log.info('Web socket connection established')
    # Do not hold packets for bandwidth optimization
    self.set_nodelay(True)
    # Schedule the ioloop to wait before attempting to send data
    tornado.ioloop.IOLoop.instance().add_timeout(timedelta(seconds=1),
                                                 self.send_data)

def send_data(self, utterances_memo=[]):
    if hark.client.getSessionID():
        results = hark.client.getResults()
        utterances = results['context']
        # If result contains more utterances than memo
        if len(utterances) > len(utterances_memo):
            # Must iterate since new utterances
            # could be anywhere in the result
            for utterance in utterances:
                utterance_id = utterance['srcID']
                # If utterance is new
                if utterance_id not in utterances_memo:
                    # Memoize the srcID
                    utterances_memo.append(utterance_id)
                    self.write_message(json.dumps(utterance))
                    log.info("Utterance %d written to socket", utterance_id)
        if hark.client.isFinished():
            # If we have all the utterances, transcribe, then close the socket
            if sum(results['scene']['numSounds'].values()) == len(utterances_memo):
                for srcID in range(len(utterances_memo)):
                    random_string = ''.join(choice(ascii_uppercase) for i in range(10))
                    file_name = '{0}{1}_part{2}.flac'.format(STAGING_AREA, random_string, srcID)
                    hark.get_audio(srcID, file_name)
                    transcription = speech.translate(file_name)
                    utterance = utterances[srcID]
                    seconds, milliseconds = divmod(utterance['startTimeMs'], 1000)
                    minutes, seconds = divmod(seconds, 60)
                    self.write_message(json.dumps(
                        '{0} at ({1}:{2}:{3}):'.format(utterance['guid'], minutes, seconds, milliseconds)))
                    self.write_message(json.dumps(transcription, ensure_ascii=False))
                del utterances_memo[:]
                self.close()
        else:
            tornado.ioloop.IOLoop.instance().add_timeout(timedelta(seconds=1), self.send_data)
```
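The divmod arithmetic near the end converts each utterance's millisecond start offset into the minutes:seconds:milliseconds label written to the socket. Isolated as a small helper for illustration (format_offset is hypothetical, not part of the app):

```python
def format_offset(start_time_ms):
    # Split a millisecond offset into minutes, seconds, and milliseconds,
    # matching the label format used in send_data above
    seconds, milliseconds = divmod(start_time_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    return '({0}:{1}:{2})'.format(minutes, seconds, milliseconds)

print(format_offset(83450))  # 83.45 seconds in -> (1:23:450)
```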
