Task Description

The audio zooming problem can be viewed as 'spatial filtering' in the context of array signal processing. Many contemporary spatial filtering methods exist, although most were developed for other applications rather than for audio zooming. In general, the resolution of spatial filtering methods such as beamforming is a function of the number of sensors (here, microphones), so better resolution requires more microphones. Because of space constraints, however, smartphones are typically equipped with only two, or at most three, microphones. That said, there is no requirement that this problem be approached solely through beamforming, as both audio and visual zooming are required. Creative ideas and strategies, whether classical approaches, artificial intelligence-based methods, or other innovative approaches, are encouraged.
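To make the link between microphone count and resolution concrete, the following is a minimal sketch of the delay-and-sum response of a uniform linear array. It is an illustration only, not a prescribed method: the function names, the 2 cm spacing, the 4 kHz test frequency, and the far-field, free-field model are all assumptions chosen for the example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def array_gain(num_mics, spacing_m, freq_hz, steer_deg, arrival_deg):
    """Normalized delay-and-sum response of a uniform linear array.

    Angles are measured from broadside. Returns a magnitude in [0, 1],
    where 1 means the arrival direction matches the steering direction.
    """
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    # Residual phase step between adjacent microphones after steering.
    psi = wavenumber * spacing_m * (
        math.sin(math.radians(arrival_deg)) - math.sin(math.radians(steer_deg))
    )
    total = sum(cmath.exp(1j * n * psi) for n in range(num_mics))
    return abs(total) / num_mics


def half_power_angle(num_mics, spacing_m, freq_hz, step_deg=0.1):
    """Smallest off-broadside angle where the gain drops below -3 dB.

    Returns None if the response never falls below -3 dB within 90 degrees,
    i.e. the array cannot meaningfully separate directions at this frequency.
    """
    threshold = math.sqrt(0.5)
    for i in range(int(90.0 / step_deg) + 1):
        angle = i * step_deg
        if array_gain(num_mics, spacing_m, freq_hz, 0.0, angle) < threshold:
            return angle
    return None


if __name__ == "__main__":
    for mics in (2, 3, 8):
        print(mics, "mics ->", half_power_angle(mics, 0.02, 4000.0))
```

Under these assumed parameters, the two-microphone array never reaches the half-power point, while the main lobe narrows as microphones are added. This is the constraint that motivates looking beyond plain beamforming on a smartphone.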

The focus of this challenge is to design a robust audio zooming system. This encompasses designing a microphone array configuration, developing algorithms, and creating applications for both Android and iOS. It also involves real-time implementation and evaluation of both Scenario 1 and Scenario 2, similar to those presented. (Note: The sound sources and subjects may be of your choice and may even be identical to those in the concept videos listed above.)

Scenario 1: AV zoom without the presence of “noise”

Scenario 2: AV zoom with the presence of “noise”

EXPECTED DEMONSTRATION

A participant performs a pinch gesture on the smartphone screen during video recording to zoom in or out. As the user zooms in, the audio of the focused subject is amplified, while zooming out restores the surrounding ambient sounds, all in real time.
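The behaviour described above can be sketched as a crossfade driven by the zoom factor. This is a minimal illustration under stated assumptions, not a required design: it assumes a "focused" signal (for example, the output of a spatial filter) and an "ambient" signal are already available per audio frame, and the names `zoom_to_focus_weight`, `mix_frame`, `max_zoom`, and `max_boost_db`, as well as the linear mapping, are hypothetical choices.

```python
def zoom_to_focus_weight(zoom, max_zoom=4.0):
    """Map a zoom factor in [1, max_zoom] to a focus weight in [0, 1].

    A linear mapping is assumed here; out-of-range zoom values are clamped.
    """
    zoom = min(max(zoom, 1.0), max_zoom)
    return (zoom - 1.0) / (max_zoom - 1.0)


def mix_frame(ambient, focused, zoom, max_zoom=4.0, max_boost_db=6.0):
    """Blend one frame of ambient and focused samples for a given zoom.

    At zoom 1 the output equals the ambient frame. At max_zoom the output
    is the focused frame amplified by max_boost_db (an assumed value).
    """
    weight = zoom_to_focus_weight(zoom, max_zoom)
    boost = 10.0 ** (weight * max_boost_db / 20.0)  # dB to linear gain
    return [
        (1.0 - weight) * a + weight * boost * f
        for a, f in zip(ambient, focused)
    ]


if __name__ == "__main__":
    ambient_frame = [1.0, 2.0]
    focused_frame = [0.5, 0.5]
    for zoom in (1.0, 2.5, 4.0):
        print(zoom, mix_frame(ambient_frame, focused_frame, zoom))
```

In a real-time application the weight would likely need to be smoothed between frames so that rapid pinch gestures do not produce audible jumps; that step is left out of this sketch.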

