[Summary of the article]
Speech is full of filler sounds: er, um, uh. Coughing, sneezing, laughing, sobbing, and even hiccupping can be a part of what is spoken. And the environment adds its own noises; speech recognition is difficult even for humans in noisy places.

History of Speech Recognition

Despite the manifold difficulties, speech recognition has been attempted for almost as long as there have been digital computers. As early as 1952, researchers at Bell Labs had developed an Automatic Digit Recognizer, or Audrey. Audrey attained an accuracy of 97 to 99 percent if the speaker was male, and if the speaker paused 350 milliseconds between words, and if the speaker limited his vocabulary to the digits from one to nine, plus "oh", and if the machine could be adjusted to the speaker's speech profile. Results dipped as low as 60 percent if the recognizer was not adjusted. Audrey worked by recognizing phonemes, or individual sounds that were considered distinct from each other. The phonemes were correlated to reference models of phonemes that were generated by training the recognizer.

Over the next two decades, researchers spent large amounts of time and money trying to improve upon this concept, with little success. Computer hardware improved by leaps and bounds, speech synthesis improved steadily, and Noam Chomsky's idea of generative grammar suggested that language could be analyzed programmatically. None of this, however, seemed to improve the state of the art in speech recognition. Chomsky and Halle's generative work in phonology also led mainstream linguistics to abandon the concept of the phoneme altogether, in favour of breaking down the sound patterns of language into smaller, more discrete features.

In 1969, John R. Pierce wrote a forthright letter to the Journal of the Acoustical Society of America, where much of the research on speech recognition was published. Pierce was one of the pioneers in satellite communications, and an executive vice president at Bell Labs, which was a leader in speech recognition research. Pierce said everyone involved was wasting time and money:

"It would be too simple to say that work in speech recognition is carried out simply because one can get money for it. ... The attraction is perhaps similar to the attraction of schemes for turning water into gasoline, extracting gold from the sea, curing cancer, or going to the moon. One doesn't attract thoughtlessly given dollars by means of schemes for cutting the cost of soap by 10%. To sell suckers, one uses deceit and offers glamor."

Pierce's 1969 letter marked the end of official research at Bell Labs for nearly a decade. The defense research agency ARPA, however, chose to persevere. In 1971 they sponsored a research initiative to develop a speech recognizer that could handle at least 1,000 words and understand connected speech, i.e., speech without clear pauses between each word. The recognizer could assume a low-background-noise environment, and it did not need to work in real time.

By 1976, three contractors had developed six systems. The most successful system, developed by Carnegie Mellon University, was called Harpy. Harpy was slow: a four-second sentence would have taken more than five minutes to process. It also still required speakers to 'train' it by speaking sentences to build up a reference model. Nonetheless, it did recognize a thousand-word vocabulary, and it did support connected speech. Research continued on several paths, but Harpy was the model for future success.
It used hidden Markov models and statistical modeling to extract meaning from speech. In essence, speech was broken up into overlapping small chunks of sound, and probabilistic models inferred the most likely words or parts of words in each chunk, and then the same model was applied again to the aggregate of the overlapping chunks. The procedure is computationally intensive, but it has proven to be the most successful (a toy decoding sketch is given at the end of this section).

Throughout the 1970s and 1980s research continued. By the 1980s, most researchers were using hidden Markov models, which are behind all contemporary speech recognizers. In the latter part of the 1980s and in the 1990s, DARPA (the renamed ARPA) funded several initiatives. The first initiative was similar to the previous challenge: the requirement was still a one-thousand-word vocabulary, but this time a rigorous performance standard was devised. This initiative produced systems that lowered the word error rate from ten percent to a few percent. Additional initiatives have focused on improving algorithms and improving computational efficiency.

In 2001, Microsoft released a speech recognition system that worked with Office XP. It neatly encapsulated how far the technology had come in fifty years, and what the limitations still were. The system had to be trained to a specific user's voice, using the works of great authors
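To make the hidden-Markov-model idea described above concrete, the following is a minimal decoding sketch in Python. It is a toy illustration only, not Harpy's or any commercial recognizer's implementation: the two "phone" states, the coarse observation classes, and all probabilities are invented for the example.

# A minimal Viterbi decoder over a toy hidden Markov model. The state names,
# observation classes, and probabilities below are made up for illustration;
# a real recognizer uses thousands of states and acoustic feature vectors.

def viterbi(observations, states, start_p, trans_p, emit_p):
    # best[t][s]: probability of the most likely path ending in state s at frame t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # choose the predecessor state that makes this path most probable
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # trace the most probable final state back to the start of the utterance
    state = max(best[-1], key=best[-1].get)
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = back[t][state]
        path.insert(0, state)
    return path

if __name__ == "__main__":
    states = ["AH", "T"]                      # two made-up "phone" states
    start_p = {"AH": 0.6, "T": 0.4}           # initial state probabilities
    trans_p = {"AH": {"AH": 0.7, "T": 0.3},   # transition probabilities
               "T": {"AH": 0.4, "T": 0.6}}
    emit_p = {"AH": {"low": 0.5, "mid": 0.4, "high": 0.1},  # P(observation | state)
              "T": {"low": 0.1, "mid": 0.3, "high": 0.6}}
    frames = ["low", "mid", "high", "high"]   # a short, made-up sequence of sound chunks
    print(viterbi(frames, states, start_p, trans_p, emit_p))

Real systems from the Harpy era onward work on the same principle, but over far larger state spaces, with probabilities estimated from training speech rather than written by hand.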