ement in the prosodic quality of the resulting synthesized speech. Our investigations of the system's ... Luce et al. 1983) have suggested that prosodic differences between synthetic and natural speech are the primary, unaddressed factor leading to difficulties in the comprehension of fluent synthetic speech. The relation between phrase-level prosody and its sources, however, is so poorly understood that we have no good sense of the degree to which different levels of explanation (syntactic, semantic, or pragmatic) are applicable. We currently have reasonable tools for automatic syntactic analysis of a text, but there is nothing equivalently well developed for semantic or pragmatic textual analysis. Thus an obvious goal is to explore the extent to which phrase-level prosody can be explained by the syntax tree and to develop a detailed description of that relation. A further goal is to convert the resulting insights about this relation into a system that can work with a speech synthesizer. This allows us to test our description more adequately and perhaps also to produce something that will further text-to-speech technology.

SYNTACTIC STRUCTURE AND PROSODIC PHRASING

Beyond the word level, however, there has been little investigation of systematic connections between syntactic structure and prosodic phrasing. The psycholinguistic and acoustic investigations of Cooper and Paccia-Cooper (1980), Umeda (1982), and Gee and Grosjean (1983) and the prosodic theory of Selkirk (1984) are among the more notable studies and represent the two main approaches to syntax/prosody relations. In Cooper and Paccia-Cooper (1980) and Umeda (1982), the connection from syntax to prosodic phrasing is unmediated by any filtering process, i.e., they propose that the details of prosodic phrasing can be determined directly from syntactic structure by associating particular syntactic nodes (or constituent boundaries) with a phonetic value: pausing, segmental lengthening, or the blocking of the cross-word conditioning of phonological rules. By contrast, Gee and Grosjean (1983) and Selkirk (1984) believe that the syntax-prosody relation is indirect: prosodic phrasing is derived by rules that refer to left-to-right ordering, length (or branching patterns), and, in the case of Selkirk, grammatical function, as well as constituent membership, in order to infer a hierarchical prosodic structure.

But while their respective positions are quite clear, none of these studies is conclusive. All lack a syntactic framework sufficiently detailed and formalized to allow extensive testing, and most consider only a small number of sentences and sentence types.

To develop our analysis, we first examined prosodic phrasing in the speech of one of us reading prose from various texts, including four instruction manuals. These texts were later augmented by a professional reading of a prose story. The boundaries between prosodic phrases were identified and then classed according to their syntactic context and semantic function.

Text-to-speech Synthesis

The programs that make up the speech component are described in Liberman and Buchsbaum (personal communication). These programs take character text as input and produce digitized speech output. By annotating the input text to this system, many aspects of its operation can be overridden or modified, e.g., the location of major and minor phrase boundaries, the stress given to words, the transcription of words and the boundaries between them, the timing of segments, and details of the pitch contour. As we will show, with our prosody system we are able to produce strings in which four boundary levels are identified and perceptually distinguished, using the current text-to-speech system annotations.
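The idea of annotating input text with one of four boundary levels can be illustrated with a minimal sketch. The marker strings (`<b1>` to `<b4>`), the function name `annotate`, and the list-based input format are all hypothetical and chosen only for illustration; the text does not give the actual escape sequences accepted by the Liberman and Buchsbaum programs.

```python
from typing import List, Optional

# Hypothetical markers, one per boundary level (1 = weakest, 4 = strongest).
# These are NOT the real escape sequences of the speech component, which
# the text does not specify.
BOUNDARY_MARKERS = {1: "<b1>", 2: "<b2>", 3: "<b3>", 4: "<b4>"}


def annotate(words: List[str], boundaries: List[Optional[int]]) -> str:
    """Return the words joined by spaces, with a marker after each boundary.

    boundaries[i] is the level (1-4) of the prosodic boundary between
    words[i] and words[i + 1], or None when there is no boundary there.
    """
    expected = max(len(words) - 1, 0)
    if len(boundaries) != expected:
        raise ValueError(f"expected {expected} boundary slots, got {len(boundaries)}")

    pieces: List[str] = []
    for i, word in enumerate(words):
        pieces.append(word)
        if i < len(boundaries) and boundaries[i] is not None:
            level = boundaries[i]
            if level not in BOUNDARY_MARKERS:
                raise ValueError(f"boundary level must be 1-4, got {level}")
            pieces.append(BOUNDARY_MARKERS[level])
    return " ".join(pieces)


if __name__ == "__main__":
    sentence = ["the", "programs", "take", "character", "text", "as", "input"]
    levels = [None, 3, None, None, 2, None]
    print(annotate(sentence, levels))
```

In this sketch the boundary levels are supplied by hand; in a system of the kind described here they would be assigned by the prosody rules.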
Prosodic Phrasing

The prosody rules use information about constituent structure, grammatical role, and length to map a surface structure onto a prosody tree. The prosody tree identifies the location of phrase boundaries (signified by the nodes) and the relative strength of each boundary (signified by a number in the node). It is this information that is used to annotate the input text with escape sequences that provide the text-to-speech system with instructions about prosodic phrasing.

In formulating our rules for building the prosodic structure, we began with the idea of simply implementing the model of Gee and Grosjean (1983). This model, initially proposed to predict a form of psychological data describing subjective sentence structure known as performance structure, determines prosodic boundaries from a syntactic tree, but assumes rather than explicitly presents a syntactic component. We were initially attracted to the Gee and Grosjean model because of its emphasis on relative boundary weighting, i.e., on the determination of the strength of a given boundary with respect to the other boundaries in the sentence. We found that in the data we had collected, this weighting played an important role. In fact, we incorporated directly into our system one method of doing this weighting, namely Gee