Extracting Clauses in Dialogue Corpora: Applications to Spoken Language Understanding
In this paper, we discuss a method for identifying and extracting clauses in utterances of human-human and human-machine dialog corpora. We illustrate the utility of these clauses in the context of robust spoken language understanding. Robust spoken language understanding in large-scale conversational dialog applications is usually performed by classifying user utterances into one or more semantic classes. The features used for classification are sensitive to variations that are natural in spoken language, such as edits, repairs, and other dysfluencies. Furthermore, the performance of these classifiers typically degrades when the user's utterance contains multiple clauses, resulting in multiple semantic classes. We present a semantic classification technique that first automatically removes dysfluencies and segments the user's utterance into clauses, and then classifies the utterance based on the classification of its clauses. We show that this preprocessing improves semantic classification accuracy for utterances and significantly decreases the amount of training data needed to reach a given level of classification accuracy.
Srinivas BANGALORE, Narendra GUPTA
spoken language understanding, clause identification, classification.
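
The three stages named in the abstract (dysfluency removal, clause segmentation, and clause-level classification combined into an utterance-level result) can be illustrated with a minimal sketch. It is not the authors' implementation: the filler list, the boundary words, the keyword-to-label table, and all function names are hypothetical placeholders, and each rule-based step stands in for whatever trained model a real system would use.

```python
from typing import List, Set

# Hypothetical resources, chosen only for illustration.
FILLERS = {"uh", "um", "er"}          # filled pauses to drop
BOUNDARY_WORDS = {"and", "also"}      # crude clause-boundary cues
KEYWORD_LABELS = {"bill": "billing", "operator": "agent"}  # toy label table


def remove_dysfluencies(tokens: List[str]) -> List[str]:
    """Drop filled pauses and immediate word repetitions (a crude repair filter)."""
    cleaned: List[str] = []
    for tok in tokens:
        if tok in FILLERS:
            continue
        if cleaned and cleaned[-1] == tok:
            continue
        cleaned.append(tok)
    return cleaned


def segment_clauses(tokens: List[str]) -> List[List[str]]:
    """Split the token sequence at boundary words; the boundary words are discarded."""
    clauses: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok in BOUNDARY_WORDS:
            if current:
                clauses.append(current)
                current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses


def classify_clause(clause: List[str]) -> Set[str]:
    """Toy clause classifier: look up each token in the keyword table."""
    return {KEYWORD_LABELS[tok] for tok in clause if tok in KEYWORD_LABELS}


def classify_utterance(utterance: str) -> Set[str]:
    """Utterance labels are the union of the labels assigned to its clauses."""
    tokens = utterance.lower().split()
    labels: Set[str] = set()
    for clause in segment_clauses(remove_dysfluencies(tokens)):
        labels |= classify_clause(clause)
    return labels


if __name__ == "__main__":
    text = "uh I I have a question about my bill and um I want to talk to an operator"
    print(sorted(classify_utterance(text)))
```

Taking the union of clause-level labels is one simple way to let a multi-clause utterance receive several semantic classes; other combination rules are possible.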