VG launches true crime series, uses AI to animate re-enactments of actual events
Audio & Video Innovations | 28 August 2023
In July, VG published a true crime documentary series called “Norske forbrytere” (“Norwegian Criminals” in English). What sets this series apart from similar shows is that the stories are predominantly narrated by the criminals themselves.
Our ambition was to figure out what makes people commit serious crimes. We wanted to get under their skin, gain insight into their psyche, and try to understand them.
From cocaine smuggling to killing a policeman
The first episode tells the story of the notorious Madelaine Rodriguez. In 2008, she traveled to Bolivia with her 2-year-old daughter and two other Norwegian girls. As their journey neared its end, the situation took a tragic and unexpected turn: They were apprehended at the airport with 11 kilograms of cocaine.
Over the following years, the two other girls managed to escape, while Rodriguez was sentenced to eight years in prison. The documentary sheds light on a recent development: Rodriguez has been informed she must return to prison. It also follows her ongoing struggle to return to Norway.
The second episode is the story of Kjell Alrich Schumann, one of the key figures behind the NOKAS robbery in 2004, the largest robbery in Norwegian history, in which he killed a policeman.
Prior to this, he had committed the most notorious post office robbery of its time, known as “the Grinder robbery.” His weapon of choice was a water pistol filled with chili sauce.
The third episode is about Jan Petter Askevold, better known as Benny Bankboks (roughly “Benny Safe-Deposit Box”). As a student in 1978, he was on night duty at one of Kreditkassen’s branches in Oslo when he emptied 143 safe-deposit boxes and embarked on a spectacular escape.
Although Norwegians are big consumers of true crime, we have been careful not to glorify the criminals or romanticise their actions. We therefore asked critical and difficult questions. We learned that not everyone has the same starting point for living a law-abiding life. We also showed that criminal actions have dramatic consequences not only for the victims, but also for those who commit them.
AI video leads to unique development style
Very little video material exists from these old cases. How could we effectively portray a robbery and a cocaine smuggling operation with such limited visual resources?
At first, we turned to traditional reconstructions with actors, which are commonly used in many true crime series. We were about to hire an animator, but then we discovered AI animation.
Instead of using traditional animation, we trained a model on reference images. We then fed it video we had filmed or generated ourselves, and the model in turn produced the animations that illustrate the defining events in the criminals’ lives.
We found different styles we liked, then used the AI image-generation programmes Stable Diffusion and Midjourney to create hundreds of images. These went into a dataset that was used to train a LoRA (LoRA models apply tiny changes to a standard checkpoint; several can be used at the same time to guide the AI toward a preferred style) for use in Stable Diffusion.
This meant we could generate images in that specific style with simple prompts (the text describing what you want the AI to generate). We experimented with several checkpoints before we settled on one that gave us a good result.
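To make this step concrete, here is a minimal sketch of loading a checkpoint, attaching a style LoRA, and generating a frame with Hugging Face’s diffusers library. The checkpoint ID, the LoRA filename, and the “vgstyle” trigger word are illustrative placeholders, not VG’s actual assets.

```python
# Sketch: generate a stylised still from a simple prompt using a
# Stable Diffusion checkpoint plus a style LoRA (filenames assumed).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # any SD 1.5 checkpoint works
    torch_dtype=torch.float16,
).to("cuda")

# Attach the style LoRA trained on the generated reference images.
# "vg_style_lora.safetensors" is a hypothetical filename.
pipe.load_lora_weights(".", weight_name="vg_style_lora.safetensors")

image = pipe(
    "a robber at a bank counter, vgstyle",  # "vgstyle" = hypothetical trigger word
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("frame.png")
```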
At first, we had hoped to use only text-to-video (generating video directly from a written prompt). The plan was to create all our AI footage from text prompts, but due to time constraints, we opted to use some stock footage and filmed some sequences ourselves to help guide the AI to where we wanted it.
Some of the guiding videos were created with text-to-video in RunwayML’s Gen-2, which allowed us to write “people at airport,” for instance, and have the programme create a video sequence showing exactly that. You can also upload a reference photo so Gen-2 understands what kind of style you prefer.
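Gen-2 is a hosted tool with no public code path to show, but as a rough open-source analogue, the text-to-video step can be sketched with the ModelScope model in diffusers (model ID and prompt are illustrative):

```python
# Sketch: generate a short guide clip from a text prompt with an
# open-source text-to-video model (an analogue, not Gen-2 itself).
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16
).to("cuda")

frames = pipe("people at airport", num_frames=24).frames[0]
export_to_video(frames, "airport_guide.mp4", fps=8)
```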
Then we used Automatic1111’s web user interface for Stable Diffusion together with ControlNet to achieve a consistent look across all the frames in each clip. At first, we converted the stock footage to image sequences and batch-processed these as individual images through Stable Diffusion. While we worked, a GitHub user called Volota released a plugin called SD-CN-Animation that allowed us to transfer the “VG style” directly to video files with almost the same quality, but a bit faster.
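A sketch of what such a batch pass can look like with diffusers’ ControlNet img2img pipeline, assuming Canny edges as the control signal and reusing the hypothetical LoRA file from above (VG used the Automatic1111 web UI rather than scripts, so this is an approximation):

```python
# Sketch: restyle an extracted frame sequence one image at a time.
# ControlNet (Canny) preserves each frame's composition while the
# style LoRA repaints the look. Paths and filenames are assumed.
import glob
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights(".", weight_name="vg_style_lora.safetensors")  # hypothetical

for i, path in enumerate(sorted(glob.glob("frames/*.png"))):
    frame = Image.open(path).convert("RGB")
    # Edge map keeps doors, bodies, and props where they belong.
    edges = cv2.Canny(np.array(frame), 100, 200)
    control = Image.fromarray(np.stack([edges] * 3, axis=-1))
    out = pipe(
        "vgstyle animation frame",
        image=frame,           # source frame steers colours and content
        control_image=control,
        strength=0.5,          # how far the output may drift from the source
        num_inference_steps=20,
    ).images[0]
    out.save(f"styled/{i:05d}.png")
```

Processing frames independently like this tends to flicker from frame to frame, which is exactly the temporal-consistency problem plugins such as SD-CN-Animation set out to reduce.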
The LoRA we had created was slightly overtrained on faces, which had the unfortunate effect of creating eyes and mouths in sequences with little information in them, such as clips of doors opening or clothes in a suitcase. These clips were brought into Photoshop, where we went frame by frame and painted the artefacts out. The videos were then imported into After Effects and time-posterised to 12fps to make them look more like animation.
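The posterise-time step can also be approximated outside After Effects, for example with ffmpeg invoked from Python (file names assumed): sample the clip down to 12 distinct frames per second, then encode at 24fps so each frame is held twice, giving the stepped, animation-like motion.

```python
# Sketch: fake "posterize time" with ffmpeg (an analogue of the
# After Effects step, not VG's actual pipeline).
import subprocess

subprocess.run([
    "ffmpeg", "-i", "styled.mp4",
    "-vf", "fps=12",   # drop to 12 distinct frames per second
    "-r", "24",        # duplicate frames back up to 24fps playback
    "posterized.mp4",
], check=True)
```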
Lastly, they were AI-upscaled to a higher resolution in DaVinci Resolve to match the 4K production of the filmed content.
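Resolve’s upscaler is a GUI feature; as a rough command-line stand-in, the portable Real-ESRGAN binary (github.com/xinntao/Real-ESRGAN) can upscale a folder of frames 4× before the conform to the 4K timeline. The binary name, model name, and paths below are assumptions.

```python
# Sketch: AI-upscale the posterized frames with Real-ESRGAN
# (a stand-in for Resolve's Super Scale, not the same tool).
import subprocess

subprocess.run([
    "./realesrgan-ncnn-vulkan",
    "-i", "posterized_frames",   # folder of low-res frames
    "-o", "upscaled_frames",     # 4x output for the 4K conform
    "-n", "realesrgan-x4plus",
], check=True)
```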
Avoiding photo-realistic AI videos
In comparison to hiring a professional animator, we discovered this was an easy and cheap way to bring the dramatic events to life. At the same time, we have been aware of the potential issues with using this technology. Therefore, we explicitly marked the clips with a disclosure that says “Animation made by AI.” We are also transparent in the closing credits about what kind of programmes we used.
At VGTV, we cultivate journalists with a diverse skill set spanning filming, editing, directing, reporting, crime research, and journalism. We wanted to create interview sequences that could compete with the quality of the best Netflix series.
Four people were involved in making the AI-generated video sequences for a limited period, but the majority of the project was created by just two people. Other TV production companies often use a bigger team with more specialised expertise to produce an equivalent result.
Summer publication
Most people watch VGTV content on their mobile phones, so we focused on keeping the documentaries relatively short and to the point, adapting them to the Web TV format.
Additionally, we decided to publish the series in the middle of the summer. Our theory was that people still want to watch good content during the holidays, and we could satisfy that need before the fall influx of new series from streaming services intensifies the competition for viewers’ time.
As of mid-August, the three documentaries have garnered nearly 700,000 playbacks.