Sound-Processing-for-Robotics


Human/machine interaction has become ubiquitous in daily life. People are good at focusing on a specific sound in a noisy environment; however, most robots can only react to one sound at a time. This project aims to improve sound localization and separation methods for robotics applications, allowing robots to respond better to simultaneous sound sources.


View the Project on GitHub raymondminglee/Sound-Processing-for-Robotics

Introduction


Overview

This project aims at improving current sound processing methods for robotics applications. The improved methodology involves the usage of a microphone array and allows the robot to differentiate and localize simultaneous sound sources to perform better in a complex sound environment.

First, independent source signals are extracted from the recordings; then, the incidence angles of those sound sources are found. Several digital signal processing techniques are incorporated in the proposed method: Frequency-Domain Independent Component Analysis (FDICA) is used to extract independent sound sources from sound mixtures, and the Time Difference of Arrival (TDOA) method is used to perform source localization.

Methodology

Sound source separation is used to distinguish independent sounds within a sound mixture, and it can be done using Independent Component Analysis (ICA).

ICA allows robots to extract and recover the sound content of each individual source from the signal mixtures captured by the microphones. After separating each sound source, the location of those sources can be determined.

Sound localization aims at giving robots spatial instructions, such as which direction to turn their head or which position to walk towards. This can be done using the Time Difference of Arrival (TDOA) method: since the microphones in an array sit at different positions, the sound intensity and phase information captured by each microphone differ from the others. The time difference of arrival at a microphone pair can be used to estimate the corresponding angle of incidence, and the angles calculated across pairs can be used to approximate the source location.

Apparatus Design


The main consideration in the apparatus design is the ability to capture and transfer multi-channel audio signals to a computer. Thus, the microphones selected should have an operating range covering the human audible range, and the data acquisition unit should be able to sample all channels simultaneously.

DSP


The first step in the signal processing is to use Frequency-Domain Independent Component Analysis (FDICA) to extract each individual sound from the mixtures. The second step is to find the incidence angle of each sound source through the TDOA method, which uses the correlation between the extracted signals and the original microphone signals.

Source Separation using FDICA

For this project, the MATLAB code used for the Short-Time Fourier Transform (STFT) and Inverse Short-Time Fourier Transform is open-source code on MathWorks written by Hristo Zhivomirov. For complex-signal source separation, the jade function by J.-F. Cardoso is incorporated. The FDICA is conducted in the following sequence: first, the signals are transformed from the time domain to the frequency domain using the STFT. Then the signals are divided into narrow sub-bands, and the inverse of the mixing matrix A is optimized in each sub-band. Finally, the results are reconstructed from the sub-bands.
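The project itself uses Zhivomirov's MATLAB STFT/ISTFT and Cardoso's jade; purely for illustration, the same sequence can be sketched in Python (NumPy/SciPy). The per-band un-mixing step is left as a placeholder hook where a complex ICA such as JADE would plug in, so this shows the pipeline structure rather than the actual separation:

```python
import numpy as np
from scipy.signal import stft, istft

def fdica_pipeline(mics, fs, nperseg=256, unmix_band=None):
    """Structural sketch of FDICA: STFT -> per-band un-mixing -> ISTFT.

    mics: array of shape (n_channels, n_samples).
    unmix_band: callable taking the (n_channels, n_frames) complex
    observations of one frequency bin and returning the un-mixed bin.
    A real implementation would estimate the inverse mixing matrix here
    (e.g. with complex JADE); the default identity hook just passes the
    bin through unchanged.
    """
    if unmix_band is None:
        unmix_band = lambda band: band  # placeholder: no separation

    # STFT of every channel -> shape (n_channels, n_freqs, n_frames)
    _, _, Z = stft(mics, fs=fs, nperseg=nperseg)

    # Optimize/apply an un-mixing matrix independently in each narrow band
    for k in range(Z.shape[1]):
        Z[:, k, :] = unmix_band(Z[:, k, :])

    # Reconstruct time-domain source estimates from the sub-bands
    _, estimates = istft(Z, fs=fs, nperseg=nperseg)
    return estimates
```

With the identity hook the STFT/ISTFT roundtrip reconstructs the inputs, which is a useful sanity check before swapping in a real per-band ICA.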

Source Localization Using TDOA

Sound localization using TDOA is one of the most convenient ways to localize a sound source. Since the microphone array positions the microphones such that the distance between any two is nonzero, a given sound will arrive at each microphone at a slightly different time. The incidence angle of a sound source can be estimated from this time difference, under the assumption that sound travels at a constant speed in air.
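Under that constant-speed assumption, the geometry for a single microphone pair reduces to one line: a plane wave hitting a pair spaced d apart with time difference τ arrives at angle θ = arcsin(cτ/d) from the broadside direction. A minimal sketch (the spacing and speed of sound here are illustrative, not the project's actual values):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C (assumed)

def incidence_angle(tau, d, c=SPEED_OF_SOUND):
    """Angle of incidence in degrees from broadside for one mic pair.

    tau: time difference of arrival in seconds (positive if the sound
    reaches the second microphone later); d: mic spacing in metres.
    """
    # Clamp for numerical safety: |c*tau/d| can slightly exceed 1
    # when tau is estimated from noisy recordings.
    arg = max(-1.0, min(1.0, c * tau / d))
    return math.degrees(math.asin(arg))
```

For example, with 0.2 m spacing, a measured delay of about 0.29 ms corresponds to an incidence angle of roughly 30 degrees.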

Cross-Correlation

The accuracy of the time delay estimated between signals from a pair of microphones is a key parameter in the above localization method. This time delay is found using the cross-correlation function in MATLAB.
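MATLAB's xcorr does this directly; the same estimate can be sketched with NumPy (signal names and sampling rate are illustrative): the delay is the lag at which the cross-correlation of the two microphone signals peaks.

```python
import numpy as np

def estimate_delay(sig_a, sig_b, fs):
    """Seconds by which sig_b lags sig_a, from the cross-correlation peak.

    Resolution is one sample (1/fs); sub-sample accuracy would require
    interpolating around the peak.
    """
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag_samples / fs
```

Because the delays between microphones in a compact array are only a few samples, a high sampling rate (and, if needed, peak interpolation) matters for angular accuracy.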

Multi-Source Localization

Multi-source localization is achieved by implementing the TDOA localization method to both extracted source signals and microphone signals.

After ICA, each extracted source signal is first compared with the four signals captured by the microphones on the mid-plane. The microphone signal with the highest correlation to the extracted source indicates that this microphone is closest to the source location. Cross-correlation is then used again to find the time delays between the extracted source and the adjacent microphones.
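The matching step above can be sketched as picking the microphone whose recording has the highest normalized cross-correlation peak with the extracted source (a toy illustration, not the project's MATLAB code):

```python
import numpy as np

def closest_mic(source, mic_signals):
    """Index of the mic recording that best matches the extracted source.

    Uses the cross-correlation peak, normalized by signal energies so
    that louder channels do not win on amplitude alone.
    """
    scores = []
    for m in mic_signals:
        peak = np.max(np.abs(np.correlate(m, source, mode="full")))
        scores.append(peak / (np.linalg.norm(m) * np.linalg.norm(source)))
    return int(np.argmax(scores))
```

Normalizing makes the comparison a matter of waveform similarity rather than loudness, which is what "highest correlation" needs to mean when the microphones sit at different distances from the source.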

All the MATLAB DSP functions are available in the repository.

Result


Single Source Localization

The accuracy of the single-source localization method was obtained through experiments in which a speaker at a known location was recorded and compared to the calculated results.

| Trial | Actual (degrees) | Calculated (degrees) | Difference (degrees) |
|-------|------------------|----------------------|----------------------|
| 1     | 270              | 273.7                | -3.7                 |
| 2     | 45               | 45                   | 0                    |
| 3     | 160              | 154.3                | 5.7                  |
| 4     | 330              | 331                  | -1                   |
| 5     | 90               | 95.8                 | -5.8                 |

Source Extraction Result

For source extraction, a total of 5 sources were estimated. For each extracted source signal, we listened and subjectively identified which speaker the sound best represents. For the trial shown below, Speaker A is a female participant and Speaker B is a male participant.

The following graphs show the comparison between the participants' voices and the extracted source signals in the time domain.

The extracted source signals resemble the actual signals in the time domain. This resemblance indicates that the ICA algorithm can extract the sound sources successfully.

However, the sound quality of the extracted signals is unstable, especially for the male speaker. This instability is suspected to be due to the low-frequency content of the male voice, which may be mixed with low-frequency background noise from the room's HVAC system.

Multi-Source Localization

For one of the trials, during which two speakers talk at the same time while background music is playing, six microphones are used; hence, up to five sources can be extracted. Of the five extracted sources, three contain useful audio content from the two original speakers. The localization result is shown below.

| Extracted Signal | Planar Angle (degrees) | Error (degrees) | Elevation Angle (degrees) | Error (degrees) |
|------------------|------------------------|-----------------|---------------------------|-----------------|
| Speaker A        | 230                    | -5              | 92                        | 3               |
| Speaker B        | 56                     | -56             | 99                        | 11              |
| Speaker A        | 228                    | -3              | 180                       | -75             |

The accuracy of the multi-source localization is highly dependent on the quality of the ICA-extracted signals. When an extracted signal can be subjectively identified as one of the known sound sources, the error between the actual and estimated angles is relatively small (less than 10 degrees).

Comprehensive Report and Poster


If you are interested, a comprehensive report and a poster are available for download.

- Comprehensive Report
- Poster