The need for language aids is pervasive in today's world. Millions of individuals with
language and speech challenges require additional support for language
understanding and learning. Currently, however, these needs are not being met
because there are not enough skilled teachers, interpreters, and professionals
to give them the one-on-one attention that they need. Lipreading (more accurately
speechreading, because it involves more than just the lips) allows deaf and
hard-of-hearing individuals to perceive and understand oral language and even to speak.
Speechreading seldom disambiguates all of the spoken input, however, and other
techniques have been used to allow a richer input. The proposed activity will
develop a real-time system to automatically detect robust characteristics of
auditory speech and to transform these continuous acoustic features into
continuous supplementary visible features. Combined with the view of the
speaker's face, these features give a person with limited hearing enough
information to perceive and understand what is being said. This new
technology will allow the design of a wearable computing device that
performs this transformation and displays the resulting visible features
on a pair of eyeglasses. This system does not
require any learning on the part of the talker and is perceptually and
linguistically motivated because it is directly based on acoustic and phonetic
properties of speech and gives continuous rather than only categorical
information.
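To illustrate the difference between continuous and categorical cues, here is a minimal Python sketch. It assumes a hypothetical per-frame feature strength between 0 and 1 and a display brightness between 0 and 255. The function names, the 0.5 threshold, and the exponential smoothing are illustrative assumptions and not part of the proposed design.

```python
def categorical_cue(strength, threshold=0.5):
    """On/off display: information about the degree of the feature is lost."""
    return 255 if strength >= threshold else 0


def continuous_cue(strengths, smoothing=0.5):
    """Map each frame's strength (0..1) to a brightness (0..255).

    An exponential moving average (an assumption of this sketch) is applied
    so that the displayed level does not flicker from frame to frame.
    """
    level = 0.0
    brightness = []
    for s in strengths:
        s = min(max(s, 0.0), 1.0)  # clamp to the valid range
        level = smoothing * level + (1.0 - smoothing) * s
        brightness.append(round(255 * level))
    return brightness
```

With the categorical mapping, strengths of 0.6 and 0.95 look identical to the viewer. The continuous mapping preserves that difference, which is the property the text above emphasizes.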
The technology we are proposing is ideally suited to wearable computing: a user
could hold a face-to-face conversation while
carrying a microphone and wearing a pair of simple glasses, which could also be
fitted with the person's normal eye prescription if necessary. The wearable
product would process primitive characteristics of the speech signal such as
voicing (the presence of energy at the fundamental frequency, as heard in
vowel sounds); frication (high-frequency, noise-like energy characteristic of
consonants such as s, z, and sh); and nasality (a distinctive
resonance, as in m, n, and ng). These characteristics would be
recognized in real time, and the output delivered simultaneously to a pair of
eyeglasses to illuminate small colored spots on the sides of the glasses.
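As a rough illustration of how two of these characteristics might be estimated from a short frame of audio, the following Python sketch uses textbook proxies: the peak of the normalized autocorrelation for voicing and the zero-crossing rate for frication. These are assumptions made for illustration, not the detection method of the proposed system. The function names, pitch range, frame length, and sample rate are hypothetical, and nasality is omitted because it needs a more detailed spectral analysis.

```python
import math
import random


def voicing_strength(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Peak normalized autocorrelation over plausible pitch lags, in 0..1."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: no voicing
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, corr / energy)
    return min(best, 1.0)


def frication_strength(frame):
    """Zero-crossing rate, scaled so that white noise is close to 1.0.

    A practical detector would also gate on signal energy; this sketch does not.
    """
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    rate = crossings / (len(frame) - 1)
    return min(rate / 0.5, 1.0)


# Synthetic 50 ms test frames at an assumed 8 kHz sample rate.
SAMPLE_RATE = 8000
rng = random.Random(0)
vowel_like = [math.sin(2 * math.pi * 120 * n / SAMPLE_RATE) for n in range(400)]
fricative_like = [rng.uniform(-1.0, 1.0) for _ in range(400)]

for name, frame in (("vowel-like", vowel_like), ("fricative-like", fricative_like)):
    print(name,
          "voicing=%.2f" % voicing_strength(frame, SAMPLE_RATE),
          "frication=%.2f" % frication_strength(frame))
```

Both functions return a continuous value rather than a yes/no decision, so the result could drive the brightness of a colored spot directly. The periodic tone scores high on voicing and low on frication, and the noise frame does the opposite.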
The proposed research holds much promise because people naturally integrate
auditory and visual information. In
addition, the proposed system does not replace auditory information with the
supplementary cues but rather supplements the auditory speech that is normally
available to the listener. This strategy is particularly effective because of
the complementarity of auditory and visual speech. The acoustic characteristics
that are robust in the signal and fairly easy to recognize automatically are
exactly those that are not visible on the face. This fortunate complementarity
makes it more likely that the system will succeed at automatically recognizing
the robust acoustic characteristics and simultaneously presenting them visually.
The proposed technology qualifies as a transparent information appliance that
adds perceptual and cognitive resources to the listener. We have developed a
requirements analysis, a conceptual design, and possible physical designs for
this appliance. It consists of a very affordable, noninvasive device that
integrates seamlessly with normal dress: a pair of glasses (which the user might
wear anyway) and a handheld unit about the size of a mobile phone, iPod, or
handheld computer. The augmented-reality device is also available for use around
the clock and requires very little maintenance.
2009 Update:
The Problem. The need for language aids is pervasive in today’s world. There are millions of individuals who have language and speech challenges, and these individuals require additional support for communication and language learning.
Solution. Because many individuals have limited ability to hear speech, we propose to supplement the sound of speech and speechreading with an additional informative visual input. Acoustic characteristics of the speech will be transformed into readily perceivable visual characteristics, which will be simultaneously displayed on the speechreader’s eyeglasses. These acoustic features provide important linguistic information not directly observed on the face and are transformed into visual cues intended to enhance intelligibility and ease of comprehension.
Intellectual merit: The proposed research will advance the state of the art in human-machine interaction, speech, machine learning, and assistive technologies. The proposed activity will advance engineering research and speech science by developing a real-time system to automatically detect and track robust characteristics of auditory speech and to transform these continuous acoustic features into continuous supplementary visible features.
Broader impacts: The proposed activity will benefit society by providing a research and theoretical foundation for a system that would be naturally available to almost all individuals at a very low cost. It does not require automatic speech recognition, and we expect it to remain more accurate than approaches that depend on speech recognition, regardless of the advances or lack of advances in that technology. It does not require literate users, because no written information is presented as would be the case in a captioning system; it is age-independent, in that it might be used by toddlers, by adolescents, and throughout the life span; it is functional for all languages, because all languages share the corresponding acoustic characteristics; it would provide significant help for people with hearing aids and cochlear implants; and it would be beneficial for many individuals with language challenges and even for children learning to read.

