[Main Text]
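The features include those of the unit itself and those of the unit's context. The features of the unit itself are used to select the correct unit, one that meets the segmental requirements, while the contextual features are used to select the best context-dependent unit, which can reduce the discontinuity between selected units. Corpus-based synthesis is in fact a concatenative pattern-matching process. During synthesis, the task is to select the units that best match the target units in pronunciation and prosody. At the same time, the discontinuity between selected units should be as small as possible. To meet these requirements, two kinds of cost are defined for synthesis. One is the unit cost, which describes how close a selected unit is to the desired unit. The other is the connection cost, which describes the degree of continuity between selected units. The total cost is the weighted sum of the two costs.

3 Unit Selection

During speech synthesis, the unit selection process receives information from the prosody generation component and searches the speech unit database to find an appropriate unit for each target speech unit. The process is illustrated in Figure 1. In the figure, the target sentence is "今天很熱" ("It is very hot today"), which consists of 4 syllables. Each syllable has a set of candidate units. The boxes with thick borders show the selected unit sequence. In the unit selection process, to obtain the best speech, we must consider (1) how appropriate a candidate unit is when compared with the target unit, and (2) how smooth the connection between selected units is. The selection process is therefore to find the best path among all possible paths in the connection lattice. The search is guided by a cost function that describes the appropriateness of a unit and the smoothness between two units.

4 Corpus

As mentioned earlier, a large corpus is used for unit-selection-based synthesis. The corpus contains a large collection of utterances, from which the synthesis units are extracted. Ideally, it covers as many context-dependent units and prosodic variants as possible. However, building a very large corpus with complete coverage of unit variants is usually impossible. Because building a large, high-quality corpus is very expensive, a trade-off is usually made between coverage and size.

In this study, we built a corpus of about 38,000 syllables. The script for this corpus was selected from a large text corpus (about 300 million Chinese characters). The corpus is designed to cover as many frequently used context-independent syllables and context-dependent syllables as possible. We use the Peking University People's Daily text corpus as a real-world text reference to evaluate the script corpus. We calculated that the corpus we built covers % of the syllables occurring in the Peking University corpus. When unit contexts are grouped by initial and final class (we defined 11 initial classes and 10 final classes), the corpus covers % of the unit classes occurring in the Peking University text corpus. With this coverage, we consider the corpus suitable for unit-selection-based synthesis.

Foreign-Language Source Text (English)

THE CONTRIBUTION OF PARSING TO PROSODIC PHRASING IN AN EXPERIMENTAL TEXT-TO-SPEECH SYSTEM

INTRODUCTION

We describe an experimental text-to-speech system that uses a deterministic parser and prosody rules to generate phrase-level pitch and duration information for English input. This information is used to annotate the input sentence, which is then processed by the text-to-speech programs currently under development at Bell Labs. In constructing the system, our goal has been to test the hypotheses (i) that information available in the syntax tree, in particular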
grammatical functions such as subject-predicate and head-complement, is by itself useful in determining prosodic phrasing for synthetic speech, and (ii) that it is possible to use a syntactic parser that specifies grammatical functions to determine prosodic phrasing for synthetic speech. Although certain connections between syntax and prosody are well known (e.g., the influence of part of speech on stress in words like progress, or the setting off of parenthetical expressions), very little practical knowledge is available on which aspects of syntax might be connected to prosodic phrasing. In many studies, investigators have sought connections between constituent structure and prosody (e.g., Cooper and Paccia-Cooper 1980, Umeda 1982, Gee and Grosjean 1983) but, with the exception of Selkirk (1984), they tend to neglect the representation of grammatical functions in the syntax tree. Moreover, previous work has not been specific enough to provide the basis for a full system implementation. Based on our study of prosodic phrasing in recorded human speech, we decided to emphasize three aspects of structure that relate to phrasing: syntactic constituency, grammatical function, and constituent length. These findings, which we will discuss in detail, have been implemented as a collection of prosody rules in an experimental text-to-speech system.

Two important features characterize our system. First, the input to our prosody system is a parse tree generated by a version of the deterministic parser Fidditch (Hindle 1983). The left-corner search strategy of this parser and, in particular, its determinism, give Fidditch the speed that makes online text-to-speech production feasible. In building a parse tree, Fidditch identifies the core subject-verb-object relations but makes no attempt to represent adjunct or modifier relations. Thus relative clauses, adverbials, and other non-argument constituents have no specified position in the tree and no specified semantic role. Second,
the rules in the prosody system build a prosody tree by referring both to the syntactic structure and to earlier stages of prosodic structure. The result is a hierarchical representation that supports the view, also proposed in Selkirk (1984), that grammatical function information is related to prosodic phrasing, but indirectly, through different levels of processing. Informal tests of the system show that it is capable of producing a significant improvement in the prosodic quality of the resulting synthesized speech. Our investigations of the system's problems, which we describe, have not revealed any serious counterexample to our basic approach. In many cases, it appears that problems with the current version can be resolved by taking our approach a step further, and including lexical information required by the parser as another factor in the determination of prosodic phrasing.

TEXT-TO-SPEECH

Most text-to-speech systems comprise two components: pronunciation rules and a speech synthesizer. Pronunciation rules convert the input text into a phonetic transcription; this information may also be supplemented by a dictionary that provides information about the part of speech, stress pattern and phonetic makeup of particular words. The speech synthesizer then converts this phonetic transcription into a series of speech parameters which are subsequently processed to produce digitized speech. While these systems tend to perform qu