Why it matters:
For more than a century, audio, images and video have served as a bedrock of truth. These media have not only recorded history but have also shaped how we perceive reality.
Today, modern technology and artificial intelligence make it possible to doctor audio and video so convincingly that the results are almost indistinguishable from the real thing. The implications of this technology reach far beyond swapping your face with a celebrity's.
In May 2019, Dessa, a Canadian start-up, trained a model on audio from Joe Rogan's podcast ("The Joe Rogan Experience") and used it to create doctored audio of his voice.
In July, Symantec, a major cybersecurity company, reported that it had encountered three successful audio attacks on private companies. In each case, a company's "CEO" called a senior financial officer to request an urgent money transfer.
It is believed that the culprits trained their models on audio of keynote presentations, TED talks and earnings calls. Millions of dollars were stolen from each company, but the potential for misuse is far broader: imagine doctored audio or video released to manipulate the stock market, or to spread deliberate misinformation about a presidential candidate.
In January 2019, Q13, a Fox-affiliated news channel, aired doctored footage of Trump's Oval Office address on border security.
How it works:
In order to understand how these algorithms work, let's first understand what exactly a deep fake is. A deep fake is a technique for human image synthesis based on artificial intelligence. It is used to combine and superimpose existing images and videos onto source images or videos using a deep-learning technique known as a generative adversarial network (GAN).
Suppose you are trying to create images of fake celebrities. The generator creates images from random noise, while the discriminator tries to identify the inaccuracies that give the generated images away. Over time, the generator becomes more skilled at fooling the discriminator, while the discriminator gets better at identifying doctored images. Intuitively, this is how GANs work.
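The adversarial loop above can be sketched in a toy setting. This is a minimal illustration, not a real image GAN: the "data" is a 1-D Gaussian, the generator is a linear map of noise, and the discriminator is a logistic regressor, with the gradients written out by hand. Real deep-fake models use deep networks, but the alternating generator/discriminator updates are the same.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" samples come from N(4, 0.5); the generator starts far from it.
a, b = 1.0, 0.0        # generator parameters: g(z) = a*z + b
w, c = 0.0, 0.0        # discriminator parameters: D(x) = sigmoid(w*x + c)
lr = 0.02

for step in range(2000):
    real = rng.normal(4.0, 0.5, size=32)
    z = rng.normal(size=32)
    fake = a * z + b

    # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
    dr, df = sigmoid(w * real + c), sigmoid(w * fake + c)
    grad_w = (-(1 - dr) * real + df * fake).mean()
    grad_c = (-(1 - dr) + df).mean()
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator update: push D(fake) toward 1 (non-saturating loss).
    df = sigmoid(w * fake + c)
    grad_fake = -(1 - df) * w     # d/d(fake) of -log D(fake)
    a -= lr * (grad_fake * z).mean()
    b -= lr * grad_fake.mean()

# Where the trained generator now places its samples.
fake_mean = (a * rng.normal(size=1000) + b).mean()
```

As training proceeds, the generator's output distribution drifts toward the real one; swap the linear map and logistic regressor for convolutional networks and the same loop produces images instead of scalars.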
Detecting Deep Fakes:
Today, deep fakes are even easier to produce. A deep fake can be made with just a few photographs, and tools and methods for creating realistic but fake audio have also advanced considerably. Although doctored content is getting easier to generate, there have also been developments in ways to detect deep fakes. Deep fake videos are usually constrained by computational resources and production time: the deep fake algorithm can only synthesize face images of a fixed size, which must then undergo an affine warp to match the configuration of the source's face. This warping leaves distinct artefacts, owing to the resolution inconsistency between the warped face area and the surrounding context.
One detection technique trains a CNN (convolutional neural network) to detect faces and identify facial landmarks such as the eyes and mouth, and then flags synthesized videos by exploiting these face-warping artefacts. In other words, the algorithm is looking for exactly the inaccuracies that the warping step introduces.
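The resolution inconsistency can be demonstrated without any neural network at all. The sketch below simulates the deep-fake pipeline on a random textured patch: synthesize at low resolution, then warp back up (here a naive nearest-neighbour upsample stands in for the affine warp), and compare high-frequency detail against the untouched context. The detector's CNN learns this kind of mismatch from data; the crude first-difference statistic here is just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def hf_energy(img):
    """Mean absolute first difference along both axes: a crude
    measure of high-frequency (fine-detail) content."""
    return (np.abs(np.diff(img, axis=0)).mean()
            + np.abs(np.diff(img, axis=1)).mean())

# A textured 64x64 region standing in for real camera content.
context = rng.random((64, 64))

# Simulated deep-fake face: synthesized at fixed low resolution,
# then warped back into the frame (4x nearest-neighbour upsample).
low_res = context[::4, ::4]
warped_face = np.repeat(np.repeat(low_res, 4, axis=0), 4, axis=1)

# The warped region carries far less high-frequency energy than the
# surrounding context -- the artefact a warping-aware detector exploits.
ratio = hf_energy(warped_face) / hf_energy(context)
```

A large mismatch between the detail statistics of a face region and its surroundings is a strong hint that the face was synthesized elsewhere and pasted in.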
Another recently released technique trains a model specifically to detect facial manipulations on a large dataset. The overall approach to manipulation detection is a two-step process: faces are cropped and aligned from video frames, and manipulation detection is then performed over the pre-processed facial region.
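The two-step structure can be sketched as a small pipeline. Both stages below are hypothetical stand-ins: a real system would use a trained face detector for the crop-and-align step and a trained CNN for the classifier, whereas here a fixed crop box and a placeholder statistic keep the skeleton runnable.

```python
import numpy as np

def crop_and_align(frame, box=(8, 8, 56, 56)):
    """Step 1 (stand-in): crop an assumed face region from the frame.
    A real pipeline would run a face detector and landmark-based alignment."""
    top, left, bottom, right = box
    return frame[top:bottom, left:right]

def manipulation_score(face):
    """Step 2 (stand-in): return a pseudo-probability of manipulation.
    A real pipeline would run a CNN trained on manipulated faces;
    this placeholder just squashes a simple statistic into [0, 1]."""
    return float(1.0 / (1.0 + np.exp(-(face.std() - 1.0))))

def detect(frames):
    """Average the per-frame scores into a video-level score."""
    scores = [manipulation_score(crop_and_align(f)) for f in frames]
    return float(np.mean(scores))
```

The value of the two-step design is separation of concerns: the classifier only ever sees normalized facial regions, so it does not have to be robust to head pose, framing or background.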
Another novel method is to identify inconsistencies in blinking. Because human eye blinking shows strong temporal dependencies, a long-term recurrent convolutional network (LRCN) model can be employed to capture those dependencies and then flag videos in which they are violated, for example faces that blink too rarely or unnaturally.
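A drastically simplified version of the blink cue can be shown with a threshold on an eye-aspect-ratio (EAR) time series, which drops when the eye closes. The LRCN learns much richer temporal structure than this; the sketch below only captures the coarsest signal, that genuine faces blink at a plausible rate, and the threshold values are illustrative assumptions.

```python
import numpy as np

def count_blinks(ear, closed_thresh=0.2):
    """Count blinks as open-to-closed transitions of the
    eye-aspect-ratio series (one EAR value per video frame)."""
    closed = ear < closed_thresh
    starts = np.flatnonzero(~closed[:-1] & closed[1:])
    return len(starts)

def looks_synthesized(ear, fps, min_blinks_per_min=2.0):
    """Flag a clip whose blink rate is implausibly low; healthy adults
    blink roughly 15-20 times per minute, so 2/min is a loose floor."""
    minutes = len(ear) / fps / 60.0
    rate = count_blinks(ear) / minutes
    return rate < min_blinks_per_min
```

Early deep fakes rarely blinked because training photos almost always show open eyes; newer generators have largely closed this gap, which is why temporal models like the LRCN look at the fine dynamics of each blink rather than just the count.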
Misinformation in online content is increasing, and there is an urgent need to detect it. Face manipulation in videos is one aspect of this larger problem, but developments are being made to counter these issues.
In 2018, FaceForensics, a video dataset of more than 500,000 face-containing frames from 1,004 videos, was released to allow researchers to study image and video forgeries. Researchers at academic institutions like Carnegie Mellon, the University of Washington, Stanford University, and the Max Planck Institute for Informatics are also experimenting with ways to detect deep fakes. Even the Pentagon, through the Defense Advanced Research Projects Agency (DARPA), is working with several of the biggest research institutions to get ahead of deep fakes.
Digital security companies should also invest in artificial intelligence as a way to counter the rise of deep-fake-enabled crime.