Speech-to-Text System

1. Extract the audio file from the folder. Read the file's sampling frequency,
format and duration. Process the file with the speech-to-text engine by
uploading the audio data for recognition.
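Below is a minimal Python sketch of the metadata step. It assumes the recordings are uncompressed WAV files, because other formats such as MP3 would need a decoder library. It relies only on the standard-library wave module. Uploading to the speech-to-text engine depends on the engine, so that part is left out. The function name read_audio_info is hypothetical.

```python
import wave


def read_audio_info(path: str) -> dict:
    """Return sampling frequency, format details and duration of a WAV file.

    Assumption: the input is an uncompressed PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()   # sampling frequency in Hz
        channels = wav.getnchannels()      # 1 = mono, 2 = stereo
        sample_width = wav.getsampwidth()  # bytes per sample (2 = 16-bit PCM)
        n_frames = wav.getnframes()        # samples per channel
    return {
        "format": "WAV/PCM",
        "sample_rate_hz": sample_rate,
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "duration_s": n_frames / sample_rate,
    }
```

As an example, read_audio_info("call_001.wav") (the file name is hypothetical) returns the sampling rate in Hz and the duration in seconds. You can compare these values with the engine's supported input formats before uploading.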

2. Enable punctuation processing so that the text output for the given audio
includes punctuation marks. Export the punctuated text to a Word file.
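Punctuation is normally turned on as an option in the speech-to-text request. The option name differs from engine to engine, so it is not shown here. This sketch handles only the export step, and it assumes the engine has already returned punctuated text as a list of paragraphs. It uses the standard library to write a minimal .docx package containing the three parts Word needs to open the file. In practice, the third-party python-docx library is the more usual tool. The function name write_transcript_docx is hypothetical.

```python
import zipfile
from xml.sax.saxutils import escape

# The three package parts a minimal Office Open XML (.docx) file needs.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)


def write_transcript_docx(paragraphs, path):
    """Write each punctuated transcript paragraph as one Word paragraph."""
    body = "".join(
        # escape() turns &, < and > into XML entities so the text stays valid XML.
        f'<w:p><w:r><w:t xml:space="preserve">{escape(p)}</w:t></w:r></w:p>'
        for p in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as docx:
        docx.writestr("[Content_Types].xml", CONTENT_TYPES)
        docx.writestr("_rels/.rels", ROOT_RELS)
        docx.writestr("word/document.xml", document)
```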

3. Enable speaker diarization, then split the transcript by speaker by matching
the word timestamps from the speech-to-text output against the segment
timestamps in the diarized audio.
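A common way to merge the two timestamp sets is to assign each transcribed word to the diarization segment it overlaps most. Consecutive words from the same speaker are then joined into turns. This sketch assumes the engine gives per-word start and end times in seconds and that the diarizer gives labelled segments. The Word and Segment classes, the speaker labels and assign_speakers are all hypothetical. If a word overlaps no segment, it is assigned to the segment whose boundary is nearest.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds, from the speech-to-text word timestamps
    end: float


@dataclass
class Segment:
    speaker: str  # diarization label, e.g. "SPEAKER_0" (hypothetical)
    start: float  # seconds, from the diarized audio
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, segments):
    """Give each word to the segment it overlaps most, then merge into turns."""
    turns = []
    for w in words:
        mid = (w.start + w.end) / 2
        best = max(
            segments,
            # Largest overlap wins; with no overlap, prefer the nearest boundary.
            key=lambda s: (overlap(w.start, w.end, s.start, s.end),
                           -min(abs(mid - s.start), abs(mid - s.end))),
        )
        if turns and turns[-1]["speaker"] == best.speaker:
            turns[-1]["text"] += " " + w.text
            turns[-1]["end"] = w.end
        else:
            turns.append({"speaker": best.speaker, "start": w.start,
                          "end": w.end, "text": w.text})
    return turns
```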


4. Identify which speaker in the diarized audio is the agent and which is the
client, and label the speakers accordingly in the diarized text.
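The requirement does not specify how the agent should be identified. One simple, hypothetical heuristic scores each speaker by how often they use scripted call-centre phrases. On a tie, it picks whoever spoke first, on the assumption that the agent opens the call. The cue list below is only illustrative and would need tuning against the real script, and a trained classifier could take its place. Turns are assumed to be dicts with "speaker" and "text" keys.

```python
# Hypothetical cue phrases; replace with phrases from the actual agent script.
AGENT_CUES = ("thank you for calling", "how can i help", "how may i help",
              "my name is", "is there anything else")


def label_agent_and_client(turns):
    """Rename diarization labels to 'Agent' and 'Client' using a cue-phrase score."""
    scores = {}
    for turn in turns:
        text = turn["text"].lower()
        hits = sum(text.count(cue) for cue in AGENT_CUES)
        scores[turn["speaker"]] = scores.get(turn["speaker"], 0) + hits
    first_speaker = turns[0]["speaker"]
    # Highest cue score wins; on a tie, assume the agent opened the call.
    agent = max(scores, key=lambda spk: (scores[spk], spk == first_speaker))
    return [
        {**turn, "speaker": "Agent" if turn["speaker"] == agent else "Client"}
        for turn in turns
    ]
```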


5. Use the agent's diarized text and the corresponding timestamps to count the
words spoken per minute; this gives the agent's rate of speaking. Use the
agent's diarized audio to extract speech parameters, such as pitch, that
characterize the agent's tone. Pass the rate and tone to the next unit, text
processing. Use emotion detection to determine whether the agent is aggressive
based on the pitch of the spoken words, and pass this emotion parameter to text
processing as well.
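This sketch covers the three measurements in step 5, each under stated assumptions. Speaking rate is the agent's word count divided by the agent's talk time in minutes. Pitch is estimated for each voiced frame with plain autocorrelation, which is a standard but rough method, and the search is limited here to 75 to 400 Hz. The aggression check is a hypothetical placeholder for a real emotion-detection model. It flags the agent when the median pitch exceeds the agent's baseline by a chosen ratio. The function names, the frequency range and the 1.3 ratio are all assumptions.

```python
import statistics


def words_per_minute(turns, speaker="Agent"):
    """Speaking rate: words spoken by `speaker` per minute of their talk time."""
    words = sum(len(t["text"].split()) for t in turns if t["speaker"] == speaker)
    seconds = sum(t["end"] - t["start"] for t in turns if t["speaker"] == speaker)
    return words / (seconds / 60) if seconds > 0 else 0.0


def estimate_pitch_hz(samples, sample_rate, fmin=75.0, fmax=400.0):
    """Rough fundamental frequency of one voiced frame via autocorrelation.

    The frame should span at least two periods of fmin (about 30 ms or more).
    """
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]  # remove DC offset
    best_lag, best_corr = 0, 0.0
    for lag in range(max(1, int(sample_rate / fmax)), int(sample_rate / fmin) + 1):
        corr = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0


def flag_aggression(frame_pitches, baseline_hz, ratio=1.3):
    """Hypothetical stand-in for an emotion model: True when the median
    voiced pitch exceeds the agent's baseline pitch by `ratio`."""
    voiced = [p for p in frame_pitches if p > 0]  # 0 marks unvoiced frames
    if not voiced:
        return False
    return statistics.median(voiced) > ratio * baseline_hz
```

For example, an agent who speaks 150 words in 60 seconds of talk time has a rate of 150 words per minute.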
