Project number: | 1509791
|
Project title: | UNS: Collaborative Research: Prosodic Control of Speech Synthesis for Assistive Communication in Severe Paralysis |
Author: | Susan Koch Fager
|
Host institution: | Madonna Rehabilitation Hospital
|
Approval year: | 2014
|
Start date: | 2015-07-15
|
End date: | 2018-06-30
|
Award amount: | USD 112,021
|
Funding source: | US-NSF
|
Project type: | Continuing grant
|
Country: | US
|
Language: | English
|
Subject classification: | Engineering - Chemical, Bioengineering, Environmental, and Transport Systems
|
English keywords: | user
; synthetic speech
; speech synthesis
; research goal
; typical speech
; communication
; speech
; research objective
; aac
; real-time control
; loudness
; sound
; daily communication
; prosodic control
; speech technology
; aac interface
; prosodic marker
; pitch
; multi-stress speech bank
; stress
; intelligible speech production
; individual
; severe paralysis
; augmentative communication
; communication rate
; prosodic feature
; speech production
; two-dimensional cursor control
; clinical communication system
; speech synthesizer
|
English abstract: | 1510563 (Stepp) & 1509791 (Koch Fager)
This work will develop and evaluate a system to allow individuals with unintelligible speech due to severe paralysis to control a speech synthesizer that includes prosody (changes in the pitch, loudness, and duration in speech that convey meaning). This advancement to the synthetic speech and the ease of its control by users will facilitate improved functionality of clinical communication systems, thus improving the quality of life of users. Natural and intelligible speech production in these individuals will increase their ability to participate actively in society and empower them to self-advocate for their own medical management.
The research objective of this proposal is to test the hypothesis that providing users of alternative and augmentative communication (AAC) with a method for prosodic control will result in speech synthesis that is more natural to listeners and provides greater function to users. Up to 1.2% of the population is unable to meet daily communication needs using typical speech due to stroke or other neurological injury, requiring AAC to meet their communication needs. Their quality of life is strongly dependent on access to this communication, both for social interaction and for relaying information about urgent medical needs. The most advanced AAC devices incorporate speech synthesis, allowing the users to communicate orally with others. However, the resulting synthetic speech is both unnatural and difficult for others to understand, and is often described as "robotic". Specifically, synthetic speech does not vary in pitch, loudness, or rhythm, the prosodic features utilized in typical speech to relay emotional state, utterance form (statement vs. question), irony, and emphasis. Asking AAC users to control each of these dimensions individually would result in an intractably slow and complex system, an unacceptable burden for individuals who already have considerably reduced communication rates. Instead, this project will leverage the fact that typical speech predictably uses these prosodic markers (pitch, loudness, rhythm) in concert. A novel AAC interface will be developed to allow users to modify the overall "stress" of synthetic speech output as a single dimension, in order to provide easily controlled, natural, and intelligible speech synthesis. The co-PIs will use their combined expertise in speech technology, clinical application of AAC, and real-time control of human-machine interfaces to enable essential advancements in AAC technology to achieve three goals.
In Research Goal 1, a multi-stress speech bank for concatenative speech synthesis will be created via a novel interactive procedure in which speech productions of healthy speakers are "misunderstood", thus prompting speakers to naturally emphasize specific target sounds in their repeated responses. This will result in a bank of triphones (sounds with a specific left and right context, based on surrounding sounds) with all potential combinations of sounds and stresses. Research Goal 2 is to develop an AAC interface that allows users to select phonemes (individual sounds of speech) using two-dimensional cursor control (e.g., head-tracking, eye-tracking) in which the stress of individual phonemes will be based on cursor dwell time. In Research Goal 3, the functionality of the AAC interface will be evaluated by testing its effect on the naturalness of communicative interactions. |
Resource type: | Project
|
Identifier: | http://119.78.100.158/handle/2HF3EXSE/94030
|
Appears in Collections: | Impacts, Adaptation and Vulnerability; Climate Mitigation and Adaptation
|
Recommended Citation: |
Susan Koch Fager. UNS: Collaborative Research: Prosodic Control of Speech Synthesis for Assistive Communication in Severe Paralysis. 2014-01-01.
|