Deepfake Audio Detection System
Deepfake audio detection is the process of determining whether an audio recording is a real voice, a deepfake synthesized voice, or a deepfake cloned voice. Human-like voices can now be synthesized using deep neural network techniques. Such a voice is called a deepfake cloned voice, and it is difficult to distinguish from a real human voice in a listening test. Audio can also be deepfake synthesized without being cloned. Deepfake synthesis or cloning can be detected from the artifacts it leaves behind, using the Time Artifacts Filter and the Frequency Artifacts Filter.
Features:
-
The Metadata Analysis filter checks the "start" field of the audio metadata. If the field holds a standard value, the audio is declared real. Otherwise, the filter examines the values of the "start" and "encoded by" fields to conclude that the audio is deepfake cloned.
-
The Time Artifacts Filter uses parameters such as spectral roll-off, zero-crossing rate (ZCR), and the absolute difference between successive samples to isolate artifacts.
-
The Frequency Artifacts Filter tracks frequency characteristics, such as a peak at the power line frequency, to locate artifacts. The output is displayed in a lower window as red vertical lines at the time instants where an artifact is detected.
-
Zeros discontinuity – Deepfake cloned audio often contains time instants where the signal amplitude drops very low. These time stamps are declared zeros discontinuities. Occasionally, if real zeros are found in the audio, it can be declared tampered or modified.
-
The Silence Discontinuity filter compares the LPC (linear predictive coding) values of successive segments to detect any discontinuity in silent regions.
-
A histogram of the test audio is plotted. A flat top in the histogram indicates that the test audio is deepfake cloned.
-
CopyPaste Segments – Three types of identical segments can be detected in the test audio. Each type is marked with a separate color, and the colored vertical lines in the lower window indicate the copy-pasted segments.
-
Speaker Verification requires the analyst to record a control voice sample of the suspected person in the same language and enter it into the system.
-
Automatic Detection runs all of the above tests one after another and generates a report in PDF or Word format.
-
There are facilities to record, play, and export audio, split stereo channels, zoom in and out, apply audio processing such as normalization and clipping display, open and close windows, and change the window layout.
-
The system can run on the device as a completely offline solution.
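As a rough illustration of the time-domain measurements named in the feature list (ZCR, the absolute difference between successive samples, and instants of very low amplitude), the following Python sketch computes them on a plain list of samples. The function names, the amplitude threshold, and the minimum run length are assumptions made for illustration only; the product's actual filters and decision rules are not described in this text.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def mean_abs_successive_diff(samples):
    """Average absolute difference between successive samples."""
    if len(samples) < 2:
        return 0.0
    total = sum(abs(b - a) for a, b in zip(samples, samples[1:]))
    return total / (len(samples) - 1)


def low_amplitude_runs(samples, threshold=1e-4, min_len=3):
    """Return (start, end) index pairs, end exclusive, of runs where
    |sample| <= threshold for at least min_len consecutive samples.

    threshold and min_len are hypothetical defaults, not product settings.
    """
    runs = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) <= threshold:
            if start is None:
                start = i  # a low-amplitude run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the signal.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start, len(samples)))
    return runs
```

In practice such measurements would be computed per short frame and compared across frames. Dividing a run's start index by the sample rate gives the time instant to mark in the lower window.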