Alignment And Multimodal Analysis In Signed Speech
In this thesis, we address the problem of extracting isolated signs from continuous signed speech videos. Signed speech is a language that uses the signs of sign language and the grammar of spoken language. It is a visual language that relies on hand gestures, each of which consists of hand motion and hand shape. In continuous signed speech, signs are expressed in succession, which results in coarticulation effects and makes segmentation a challenging task. In this work, we aim to segment some of the most common signs in Turkish Sign Language using hand gesture information. The process consists of three consecutive steps. First, we apply segmentation to the hand regions obtained from a hand tracking module and obtain images containing only the left or the right hand. Second, we represent hand gestures by a variety of features, which can be categorized as follows: 1) center of mass coordinates of each hand and their first-order derivatives, 2) ellipse parameters for each hand, 3) Discrete Cosine Transform, 4) Histogram of Oriented Gradients, 5) Local Binary Patterns, 6) Hu moments, and 7) radial distances. Third, we align the sequences with different methods and find the start and end positions of each sign. We use Dynamic Time Warping (DTW), Hidden Markov Models (HMMs), and coupled HMMs as alternative alignment approaches. We also apply fusion techniques to improve alignment performance. We run experiments on a database of Turkish signed speech videos and report the results. The highest accuracy is obtained by combining the DTW and HMM methods using the center of mass coordinates, their first-order derivatives, and the ellipse features.
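To make the alignment step concrete, the following is a minimal sketch of how DTW can locate the start and end frames of a sign inside a longer feature sequence. It is an illustration under stated assumptions, not the implementation used in this thesis: it uses the subsequence variant of DTW with a Euclidean frame distance, and the function name `subsequence_dtw` and the toy 2-D points standing in for center of mass coordinates are hypothetical.

```python
import math


def subsequence_dtw(template, sequence):
    """Locate the span of `sequence` that best matches `template` under DTW.

    Both arguments are lists of equal-length feature tuples, one per frame.
    Returns (start, end, cost), where start and end are inclusive frame
    indices into `sequence`. Illustrative sketch only (assumed variant:
    subsequence DTW with Euclidean frame distance).
    """
    n, m = len(template), len(sequence)
    if n == 0 or m == 0:
        raise ValueError("template and sequence must be non-empty")

    inf = math.inf
    # D[i][j]: cost of the best alignment of template[:i] that ends at
    # sequence[j - 1]. Row 0 is all zeros so a match may start at any frame.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0] = [0.0] * (m + 1)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(template[i - 1], sequence[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # The match may end at any frame: take the cheapest cell of the last row.
    end = min(range(1, m + 1), key=lambda j: D[n][j])

    # Backtrack along cheapest predecessors until the template is consumed;
    # the last column visited before reaching row 0 is the start frame.
    i, j = n, end
    start = end
    while i > 0:
        start = j
        _, i, j = min(
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        )

    return start - 1, end - 1, D[n][end]


# Toy example: a three-frame "sign" embedded in a seven-frame sequence.
template = [(0, 0), (1, 1), (2, 2)]
sequence = [(5, 5), (5, 5), (0, 0), (1, 1), (1, 1), (2, 2), (5, 5)]
start, end, cost = subsequence_dtw(template, sequence)
print(start, end, cost)
```

In this toy example the repeated frame `(1, 1)` is absorbed by the warping path, which is the property that makes DTW suitable for signs performed at varying speeds. A full system would replace the 2-D points with the per-frame feature vectors listed above.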