
Is it possible to make a realistic DeepFake video?

The answer is no. Making it look like someone did or said something that never happened is harder than it sounds. When this news spread like wildfire throughout the nation, it prompted me to try producing a “deepfake” video myself. I tried to convince myself that the clip was fake, produced by people seeking political advantage. But the main reason I was eager to try is that someone offered RM4000 ($900) to anyone who could produce a realistic deepfake video within 48 hours. Maybe I could buy a lot of Pokemon booster boxes if I won. So I took up the challenge.

Deepfake Challenge

Some of you may not know what the posting is about, especially if you cannot read Malay, so let me explain. The poster (Abdul Rahim Hilmi Zakariya) challenges anyone to produce a realistic sex tape video. Since sex tapes are illegal in Malaysia, you can instead pick any scene from a Fast & Furious movie and replace Paul Walker’s body and face with your own. The video must be at least 3 minutes long, of better quality than Milo Suam’s deepfake video, and 100 percent realistic. Anyone who succeeds will be rewarded RM4000. The poster had the guts to issue this challenge because Mohd Firdaus Jailan had claimed that a deepfake video can be produced in less than 2 hours and for less than RM400 ($100).

So what’s a deepfake? Deepfake (a portmanteau of “deep learning” and “fake”) is a technique for synthesizing human images using artificial intelligence. It uses a machine learning method known as a generative adversarial network (GAN) to combine and superimpose existing pictures and videos onto source pictures or videos [1]. In other words, it is an AI-based technology used to create or modify video content so that it presents something that never actually happened. The technique is named after a Reddit user known as “deepfakes” who, in December 2017, used deep learning to swap the faces of celebrities onto people in pornographic video clips.
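To give a rough idea of the adversarial setup, here is a minimal sketch of the classic GAN losses in Python. The function names and toy values are my own illustration, not code from the Faceswap library; it assumes the discriminator outputs probabilities in (0, 1):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # The discriminator wants d(real) -> 1 and d(fake) -> 0.
    return float(-np.mean(np.log(d_real) + np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # The generator wants the discriminator fooled: d(fake) -> 1.
    return float(-np.mean(np.log(d_fake)))

# A discriminator that is right most of the time has a low loss...
print(discriminator_loss(np.array([0.9]), np.array([0.1])))  # ~0.21
# ...while a generator that fools it also enjoys a low loss.
print(generator_loss(np.array([0.9])))  # ~0.11
```

Training alternates between the two: each side's improvement raises the other's loss, which is what slowly pushes the generated faces toward realism.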

Original video

The goal is to create a realistic deepfake video for less than RM400, completed within 48 hours, with a duration of 3 minutes. That’s what I set out to do, to see how easy this technology really is to use. But first, as a test, I attempted a confession video about 34 seconds long.

Using the Faceswap library, I attempted to produce a deepfake video as realistic as possible, and the outcome is below.

Clearly, it’s not very high quality, particularly compared with the HD deepfakes that have become alarmingly competent in the hands of porn aficionados, people on fringe websites and even political parties. It took me more than 48 hours (actually a week) to come up with something far below the nicer deepfake videos on the internet. As you may notice, my face is quite blurry. At that point I was only using my MacBook Air, because I thought it would be enough for the job and more cost-effective than GPU-powered cloud computing. I was completely wrong. Even though the offer had already expired, I still didn’t give up on creating a realistic deepfake video.

MacBook Air Technical Specifications

My next step was moving to an NVIDIA GPU-enabled AWS server. For the first attempt I used this AMI: Deep Learning Base AMI (Ubuntu) version 18.1 (ami-0b9a1b747930d6a0f). I then picked a p2.xlarge instance (11.75 ECUs, 4 vCPUs, 2.7 GHz Intel Xeon E5-2686 v4, 61 GiB memory, EBS only; the p2 family carries NVIDIA K80 GPUs, up to 16 of them on the largest size) to train my model quicker than my crappy MacBook could. Once the instance was approved by AWS, I set up the machine and trained the model.

After 65 hours, I noticed the loss had stagnated between 0.01000 and 0.00800. I then created a new instance with the same AMI, a g3.16xlarge (188 ECUs, 64 vCPUs, 2.3 GHz Intel Xeon E5-2686 v4, 488 GiB memory, EBS only, with 4 NVIDIA Tesla M60 GPUs), hoping it would process faster than the previous one. After 13 hours of running, the outcome was still the same. Fortunately, I ran this experiment on free AWS credit without touching my wallet; it cost about $206.11.
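A simple way to spot this kind of stagnation programmatically is to compare the best loss in a recent window against the best loss before it. This helper is my own illustration (not part of Faceswap), assuming you log one loss value per training step:

```python
def has_plateaued(losses, window=10, tol=1e-3):
    """Return True when the best loss has improved by less than
    `tol` over the last `window` recorded steps."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    best_before = min(losses[:-window])
    best_recent = min(losses[-window:])
    return best_before - best_recent < tol

# A steadily falling loss: no plateau yet.
print(has_plateaued([0.5 - 0.01 * i for i in range(30)]))  # False
# A loss stuck around 0.009, like the 65-hour run described above.
print(has_plateaued([0.2, 0.05, 0.02] + [0.009] * 20))     # True
```

Checking something like this early would have told me to stop (or change the model) long before burning 65 hours of GPU time on a flat curve.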

Loss Function


AWS Bills

Below is the result:

If you don’t like the technical part, please skip this section.

Here are the steps I took:

Install Python 3.6 on your machine and open a terminal.

Get the Faceswap repo by typing:

git clone --depth 1 https://github.com/deepfakes/faceswap.git

Enter the faceswap folder:

cd faceswap

Install requirements:

sudo pip3 install -r requirements.txt

Install Tensorflow:

sudo pip3 install tensorflow==1.13.1  # CPU version


sudo pip3 install tensorflow-gpu==1.13.1  # GPU version

Extract Haziq’s faces from a video file:

python3 faceswap.py extract -i ~/faceswap/src/haziq.mp4 -o ~/faceswap/faces/haziq

Extract my faces from a video file:

python3 faceswap.py extract -i ~/faceswap/src/me.mp4 -o ~/faceswap/faces/me

Train a model and show a preview:

python3 faceswap.py train -A ~/faceswap/faces/haziq -B ~/faceswap/faces/me -m ~/faceswap/haziq_me_model/ -p

Extract the video frames with FFmpeg:

ffmpeg -i ~/faceswap/src/haziq.mp4 ~/faceswap/output/video-frame-%d.png

Swap the images:

python3 faceswap.py convert -i ~/faceswap/output/ -o ~/faceswap/converted/ -m ~/faceswap/haziq_me_model/

Install FFmpeg via Homebrew (on macOS) and assemble the converted frames into a video:

ffmpeg -framerate 30 -i ~/faceswap/converted/video-frame-%d.png -i ~/faceswap/src/output-audio.mp3 -c:v libx264 -vf "format=yuv420p" ~/faceswap/src/out.mp4
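To put the convert step in perspective, every second of footage multiplies the work: each frame has to be swapped individually. A quick back-of-the-envelope calculation (the helper is my own, using the 30 fps from the assembly command above):

```python
def frame_count(duration_seconds, fps):
    # Total frames the convert step has to process.
    return duration_seconds * fps

# My 34-second test clip vs the challenge's 3-minute target, both at 30 fps.
print(frame_count(34, 30))      # 1020
print(frame_count(3 * 60, 30))  # 5400
```

So hitting the full 3-minute target means swapping more than five times as many frames as my short test clip already required.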

What I can conclude is that deepfake technology does not work 100 percent of the time and still requires quite a lot of manual effort. Despite the tool’s problems, deepfake videos are still worrisome, just not in the immediate future, and with a few caveats. At the same time, misunderstanding and overhyping the effect and potential of deepfakes could pose a greater short-term danger.

[1] Wikipedia contributors. (2019, June 24). Deepfake. In Wikipedia, The Free Encyclopedia. Retrieved 17:55, June 26, 2019.

P.S: Of course, this is my personal opinion and may not reflect my company’s views.


  1. Boo Khan Ming

    Nice try. What do you think about FaceApp?

    • FaceApp depends on a single image and a single facial expression. A deepfake requires far more, depending on the video duration. For a 35-second clip at 25 frames per second: 35 × 25 = 875 frames/images to process. That’s a lot.

  2. Tan Sia Khong

    Good article, keep it up, though the grammar can be improved further.
