Audio Event Detection, Classification, and Training for Embedded Linux IoT Devices

In an article available through the National Institutes of Health's PubMed Central archive (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5424731/), acoustic event detection is described as a tool to enhance safety for patients in assisted living facilities:

In this regard, we have selected the following representative 14 audio events: someone falling down, a knife slicing, someone screaming, rain drops, a printer working, people talking, someone frying food, someone filling water in a pot, someone knocking the door, a dog barking, a car horn, a glass breaking, a baby crying, and water boiling.

Acoustic event detection can run on inexpensive ARM-based embedded Linux systems such as the Raspberry Pi, and it can be integrated with other audio and video functions such as monitoring and motion detection.
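As a rough illustration of how little computation the first detection stage needs, here is a minimal sketch of an energy-threshold detector in plain Python. It assumes mono samples already normalized to floats in [-1.0, 1.0]. The function names, the 1024-sample frame, the 512-sample hop, and the -30 dB threshold are illustrative assumptions, not values taken from pyAudioAnalysis or any other project mentioned here. A detector like this only finds "something loud happened"; deciding *what* happened is the job of a trained classifier.

```python
import math
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Short-term energy: mean squared amplitude of each (possibly overlapping) frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len]) / frame_len
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def detect_events(
    samples: List[float],
    sample_rate: int,
    frame_len: int = 1024,
    hop: int = 512,
    threshold_db: float = -30.0,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans where frame energy exceeds threshold_db.

    Levels are in dB relative to a full-scale value of 1.0, so samples are
    assumed to be floats normalized to [-1.0, 1.0].
    """
    events = []
    first = None  # index of the first loud frame in the current run
    energies = frame_energies(samples, frame_len, hop)
    # A trailing 0.0 acts as a sentinel so a run still open at the end is closed.
    for i, energy in enumerate(energies + [0.0]):
        loud = energy > 0 and 10 * math.log10(energy) > threshold_db
        if loud and first is None:
            first = i
        elif not loud and first is not None:
            start_s = first * hop / sample_rate
            # The run ends where the last loud frame (i - 1) ends.
            end_s = ((i - 1) * hop + frame_len) / sample_rate
            events.append((start_s, end_s))
            first = None
    return events
```

In a live system the samples would come from a capture library or from `arecord` output; that plumbing is omitted here. A fixed threshold is the simplest possible choice; tracking the background noise floor and thresholding relative to it would usually be more robust in a noisy room.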

Figure: Extraction of audio characteristics for a gunshot from Google Research AudioSet on a Raspberry Pi 3 (RPi3)

Speech recognition and synthesis can also be used to control functions, such as switching lights or requesting assistance (e.g. “please bring dinner”). Acoustic event detection and speech recognition no longer require a connection to a central cloud server. As a result, your embedded Linux appliances can be installed in facilities where Internet access is unavailable or not allowed, and third parties do not gain access to private and confidential medical or other conversations, which helps with privacy requirements such as HIPAA.

Acoustic event detection can be broken down into several steps:

extraction of audio characteristics (development)
classification of unknown acoustic events (development)
training your system to recognize types of acoustic events (development)
detecting acoustic events (live)
logging acoustic events and the associated actions (live)
if appropriate, recording acoustic events (live)
taking actions such as notifications and appliance control (live)
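The development steps above can be sketched as follows. This is a deliberately simplified stand-in, not pyAudioAnalysis's API. It uses two hand-rolled features per frame (log energy and zero-crossing rate) and a nearest-centroid classifier with per-feature standardization. pyAudioAnalysis itself extracts a much larger short-term feature set (including MFCCs, spectral, and chroma features) and offers classifiers such as SVM and k-NN. All function names and the dictionary model format below are hypothetical. Note that training has to run before classification can, even though the list above names classification first.

```python
import math
from typing import Dict, List

Vector = List[float]


def zero_crossing_rate(frame: Vector) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def extract_features(frame: Vector) -> Vector:
    """Extraction: [log energy in dB, zero-crossing rate] for one frame."""
    energy = sum(x * x for x in frame) / len(frame)
    return [10 * math.log10(energy + 1e-12), zero_crossing_rate(frame)]


def train(labeled: Dict[str, List[Vector]]) -> dict:
    """Training: standardize each feature, then store one centroid per label."""
    vectors = [v for vs in labeled.values() for v in vs]
    dims = range(len(vectors[0]))
    means = [sum(v[d] for v in vectors) / len(vectors) for d in dims]
    # Population std; fall back to 1.0 for a constant feature to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / len(vectors)) or 1.0
        for d in dims
    ]
    centroids = {}
    for label, vs in labeled.items():
        scaled = [[(v[d] - means[d]) / stds[d] for d in dims] for v in vs]
        centroids[label] = [sum(s[d] for s in scaled) / len(scaled) for d in dims]
    return {"means": means, "stds": stds, "centroids": centroids}


def classify(model: dict, features: Vector) -> str:
    """Classification: label of the nearest centroid in standardized feature space."""
    scaled = [
        (f - m) / s for f, m, s in zip(features, model["means"], model["stds"])
    ]
    return min(
        model["centroids"],
        key=lambda label: math.dist(scaled, model["centroids"][label]),
    )
```

Standardizing matters here because log energy spans tens of dB while the zero-crossing rate lies between 0 and 1. Without it, the energy feature would dominate the Euclidean distance.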

In recent years, many excellent open-source projects have reduced development time for embedded Linux audio and video products. Many of the projects listed below have been fully or partially funded by governments, universities, and private companies:

AudioSet (Google Research) – a dataset of over 2 million human-labeled 10-second YouTube video soundtracks, with labels taken from an ontology of more than 600 audio event classes; precomputed audio features are distributed in TensorFlow TFRecord format
FFmpeg – audio and video analysis and transcoding
Zamia Speech – speech recognition
pyAudioAnalysis – acoustic event detection, classification, and training
OpenVidu – WebRTC video conferencing
OpenCV – computer vision, including optical recognition
ArduPilot – flight control and navigation
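AudioSet is published as segment lists rather than audio: each row names a YouTube ID, start and end times, and label IDs. FFmpeg can then cut and resample a soundtrack you have obtained locally. The sketch below assumes the layout of AudioSet's segment CSV files: comment lines starting with `#`, then `YTID, start_seconds, end_seconds, "label,label,..."`. It parses those rows, filters them by a label ID, and builds (but does not run) an FFmpeg command that extracts the labeled 10-second window as 16 kHz mono WAV. Downloading the YouTube audio is out of scope. The function names, file names, and the 16 kHz default are hypothetical choices.

```python
import csv
from typing import Iterable, List, Tuple

# (YouTube ID, start seconds, end seconds, label IDs)
Segment = Tuple[str, float, float, List[str]]


def parse_segments(lines: Iterable[str]) -> List[Segment]:
    """Parse AudioSet-style segment rows, skipping '#' comment and blank lines."""
    rows = (line for line in lines if line.strip() and not line.startswith("#"))
    segments = []
    # skipinitialspace lets csv recognize the quoted label list after ", ".
    for ytid, start, end, labels in csv.reader(rows, skipinitialspace=True):
        segments.append((ytid, float(start), float(end), labels.split(",")))
    return segments


def segments_with_label(segments: List[Segment], mid: str) -> List[Segment]:
    """Keep only segments whose positive labels include the given label ID."""
    return [s for s in segments if mid in s[3]]


def ffmpeg_clip_command(
    src: str, dst: str, start: float, end: float, sample_rate: int = 16000
) -> List[str]:
    """Build an ffmpeg argv that cuts [start, end) from src into a mono WAV at dst."""
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",        # seek to the segment start
        "-t", f"{end - start:.3f}",   # keep only the segment duration
        "-i", src,
        "-ac", "1",                   # downmix to mono
        "-ar", str(sample_rate),      # resample
        dst,
    ]
```

To execute a command, pass it to `subprocess.run(cmd, check=True)`. Because YouTube IDs can begin with `-`, prefix generated file names with a directory (for example `clips/`) so that FFmpeg does not mistake them for options.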

OTTStreamingVideo has extensive experience in audio and video hardware and software design, including acoustic event detection, speech recognition, optical recognition, machine control, and autonomous navigation. Please contact us for additional information.