Introduction:

Automatic speech recognition is the process of identifying speech by a
machine. It takes human speech as input and returns the output as a string of
words, phrases or continuous speech in the form of text. Vocabulary size,
speaking style, speaker mode, channel type and transducer type all play a
vital role in speech recognition. Multilingual speech recognition refers to a
system that takes speech from different languages as input, recognizes it and
processes it. The proposed system reduces the error rate, memory footprint and
computational bandwidth requirements of a grammar-based, medium-vocabulary
speech recognition system intended for deployment on a portable or otherwise
low-resource device. Fuzzy C-means clustering is used in the proposed system to
achieve better performance. Feature extraction, clustering and classification
are carried out with the best available approaches, and results are obtained
for the multilingual speech recognition system.

SPEECH FEATURE EXTRACTION TECHNIQUE:

The standard method for feature extraction in speech is Mel Frequency
Cepstral Coefficients. The use of about 20 MFCC coefficients is common in ASR,
although 10-12 coefficients are often considered to be sufficient for coding
speech.

[Figure: block diagram with the stage labels Spectrum, Mel Spectrum and Log Mel Spectrum]

In the pre-processing stage, each signal is first de-noised by
soft-thresholding its wavelet coefficients. The silent parts of the signals do
not carry any useful information.
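
As a rough illustration of the soft-thresholding step, the sketch below shrinks wavelet detail coefficients toward zero. The one-level Haar transform, the function names and the threshold value are assumptions made for this example; the text does not say which wavelet or threshold is used.

```python
import math

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero; values within +/- threshold become 0."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def haar_dwt(signal):
    """One-level Haar transform (assumed wavelet); expects an even-length signal."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient sets."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * s)
        out.append((a - d) * s)
    return out

def denoise(signal, threshold):
    """Soft-threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, threshold))
```

Small sample-to-sample fluctuations end up in the detail coefficients, so thresholding them smooths the signal while larger changes survive.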

Framing is the process of segmenting the speech samples obtained from the
Analog to Digital Conversion (ADC) into small frames with a length in the
range of 20 to 40 ms.
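
A minimal sketch of framing, assuming a 25 ms frame (inside the 20 to 40 ms range above) and a 10 ms hop between frame starts. The hop size and the function name are illustrative choices, not values given in the text.

```python
def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a list of samples into overlapping fixed-length frames."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames
```

At a 16 kHz sampling rate this gives 400 samples per frame, with a new frame starting every 160 samples.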

The FFT is used to obtain the magnitude frequency response of each frame.
When the FFT is performed on a frame, the signal within the frame is assumed
to be periodic and continuous when wrapping around. Each frame is therefore
multiplied by a Hamming window to keep the first and last points of the frame
continuous.
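
The windowing and spectrum step might look like the following sketch. It uses a direct DFT from the standard library instead of a true FFT so that it stays self-contained; the result is the same magnitude spectrum, only slower to compute. The function names are hypothetical.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n (symmetric form)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def magnitude_spectrum(frame):
    """Window the frame, then return |X[k]| for k = 0 .. n/2."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT sum for bin k (an FFT computes the same value faster).
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc))
    return spectrum
```

The window tapers both ends of the frame toward small values, which reduces the discontinuity that the periodic assumption would otherwise introduce.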

The Mel filter bank consists of overlapping triangular filters whose cutoff
frequencies are determined by the center frequencies of the two adjacent
filters. The filters have linearly spaced center frequencies and a fixed
bandwidth on the Mel scale. Taking the logarithm of the filter outputs has the
effect of changing multiplication into addition. The DCT is then applied to the
20 log energies Ek obtained from the triangular band-pass filters to give L
Mel-scale cepstral coefficients.
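
The filter bank, logarithm and DCT steps can be sketched as below. The Hz-to-Mel formula (2595 log10(1 + f/700)), the use of squared magnitudes as energies, the small constant added before the logarithm and all function names are assumptions for this example; the text fixes only the 20 filters and the L output coefficients (L = 12 is used here as a default).

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters with centers equally spaced on the Mel scale."""
    nyquist = sample_rate / 2.0
    top = hz_to_mel(nyquist)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bin_points = [mel_to_hz(m) / nyquist * (num_bins - 1) for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        # Each filter starts and ends at the centers of its two neighbours.
        left, center, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        row = []
        for b in range(num_bins):
            if left < b <= center:
                row.append((b - left) / (center - left))
            elif center < b < right:
                row.append((right - b) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct_coeffs(log_energies, num_coeffs):
    """c_m = sum_k E_k * cos(m * (k + 0.5) * pi / N), for m = 1 .. L."""
    n = len(log_energies)
    return [sum(e * math.cos(m * (k + 0.5) * math.pi / n)
                for k, e in enumerate(log_energies))
            for m in range(1, num_coeffs + 1)]

def mfcc_from_magnitudes(magnitudes, sample_rate, num_filters=20, num_coeffs=12):
    """Magnitude spectrum of one frame -> L Mel-scale cepstral coefficients."""
    bank = mel_filterbank(num_filters, len(magnitudes), sample_rate)
    power = [m * m for m in magnitudes]
    # 1e-10 avoids log(0) for filters that capture no energy.
    log_e = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
             for row in bank]
    return dct_coeffs(log_e, num_coeffs)
```

Because the sum starts at m = 1, a constant offset in the log energies (an overall gain change in the signal) does not affect the coefficients.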

Clustering and classification:

Clustering groups similar data together: a center point is calculated from the
extracted feature values, and clustering is performed based on the distance
between the data points and that center. K-means clustering partitions a group
of data points into a small number of clusters. The FCM (Fuzzy C-means)
algorithm attempts to partition a finite collection of n elements
X = {x1, x2, …, xn} into a collection of c fuzzy clusters with respect to some
given criterion.
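
A compact sketch of the FCM iteration, assuming Euclidean distance, a fuzzifier m = 2 and initial centers picked at evenly spaced positions in the data. None of these settings come from the text, and the function names are hypothetical.

```python
import math

def _memberships(point, centers, m):
    """Degree of membership of one point in every cluster (sums to 1)."""
    dists = [math.dist(point, ctr) for ctr in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point sits exactly on a center: it belongs fully to it.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]

def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Return (centers, memberships) for c fuzzy clusters."""
    n, dim = len(points), len(points[0])
    centers = [list(points[(j * n) // c]) for j in range(c)]  # assumed init
    for _ in range(max_iter):
        u = [_memberships(p, centers, m) for p in points]
        new_centers = []
        for j in range(c):
            # Each center is the mean of all points weighted by u^m.
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                                for d in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, [_memberships(p, centers, m) for p in points]
```

Unlike a hard assignment, every point receives a membership value for every cluster, so a point lying between two clusters contributes to both centers.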

IMPLEMENTATION

Implementation is the realization of an
application, or execution of a plan,
idea, model, design, specification, standard, algorithm, or policy. The proposed method is
implemented using the following modules:

· Data Acquisition Module

· Feature Extraction Module

· Clustering Module

· Classification Module

· Decision Module

 

5.1 DATA ACQUISITION MODULE

The proposed system with the Fuzzy C-means algorithm takes 160 audio files as
input and processes them. The audio files are stored with the .wav extension.
Isolated words taken from various languages are used as input data, with the
same word pronounced in four languages: Tamil, Hindi, Malayalam and English.
The audio is recorded using a sound recorder with closed microphones in a
silent room.
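
Loading one of the .wav recordings could be sketched as follows with Python's standard wave module. The 16-bit PCM assumption, the mono down-mix and the function name are illustrative; the text does not state the recording format beyond the file extension.

```python
import struct
import wave

def read_wav_mono(path):
    """Read a 16-bit PCM .wav file; return (samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Average the channels of each frame and scale to floating point.
    samples = [sum(values[i:i + channels]) / (channels * 32768.0)
               for i in range(0, len(values), channels)]
    return samples, rate
```

The returned sample list and sample rate are the inputs that the later framing and feature extraction stages need.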

5.2 FEATURE EXTRACTION MODULE

After the input data is acquired, feature extraction is done using the Mel
Frequency Cepstral Coefficient (MFCC) method. Preprocessing, framing,
windowing, Mel filter bank processing and frequency warping are applied to the
input audio files, and the logarithm of the resulting values is taken. The
discrete cosine transform (DCT) of the logarithm values is then calculated, and
the resulting values are passed to the next step.

5.3 CLUSTERING MODULE

The existing system uses the K-means algorithm. The proposed system uses Fuzzy
C-means clustering instead, and the fuzzy-based approach produced better
results.
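
For reference, the K-means baseline used by the existing system can be sketched as below. The Euclidean distance, the evenly spaced initial centers and the function name are assumptions for this example.

```python
import math

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm: return (centers, hard label for each point)."""
    n, dim = len(points), len(points[0])
    centers = [list(points[(j * n) // k]) for j in range(k)]  # assumed init
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to exactly one nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centers[j] = [sum(p[d] for p in members) / len(members)
                              for d in range(dim)]
        if new_labels == labels:
            break
        labels = new_labels
    return centers, new_labels
```

Each point here belongs to exactly one cluster, whereas Fuzzy C-means gives every point a degree of membership in each cluster.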

Fig 3: System Flow Diagram

5.4 CLASSIFICATION MODULE

In the proposed work, classification is done using a Support Vector Machine
(SVM). Classification involves two phases: training and testing. In the
training phase, the classifier is trained on all the training datasets and the
results are placed in the template database. In the testing phase, the test
dataset from the test database is processed and compared with the template
database so that a decision can be made.
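
The training and testing phases can be illustrated with a very small linear SVM trained by sub-gradient descent on the hinge loss. This is a sketch under strong assumptions: two classes labelled -1 and +1, a linear kernel, and hypothetical function names and hyperparameters. The text does not describe the kernel or the multi-class scheme of the proposed system.

```python
def train_linear_svm(features, labels, lam=0.01, lr=0.01, epochs=500):
    """Training phase: learn weights w and bias b from labelled vectors."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            # The regularization term shrinks w slightly on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if y * score < 1.0:
                # The point is inside the margin: push the boundary away.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Testing phase: the sign of the score decides the class."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0.0 else -1
```

The learned pair (w, b) plays the role of the stored template here, and the score computed in predict corresponds to a match score for a test vector.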

 

5.5 DECISION MODULE

In this module, the decision is made based on the match scores generated by
the classifier. After classification, results are produced for both K-means and
Fuzzy C-means clustering. The results predicted with the two clustering
techniques are compared with the actual results. Performance is analyzed using
a confusion matrix, and the accuracy rate is calculated and analyzed.
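
The confusion matrix and accuracy rate can be computed as in the sketch below. The class labels in the usage note are hypothetical examples, not results from the proposed system.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

For example, calling confusion_matrix with classes = ["Tamil", "Hindi", "Malayalam", "English"] yields a 4 x 4 table, and running accuracy on the tables obtained with each clustering technique gives the two accuracy rates to compare.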

Post Author: admin
