AI Video: how we automatically generate an explainer video about pizza

AI video generation is transforming the way we create, view, and experience video content. This innovative technology is opening doors to unprecedented levels of creativity and operational efficiency. As we embrace AI's role in video production, we're witnessing the dawn of a new era where imagination meets the limitless possibilities of content creation. At its core, AI video generation leverages machine learning algorithms to automate the process of creating video content. It has become increasingly popular due to its ability to produce high-quality videos in a fraction of the time it would take a human editor.

The Benefits of AI Video Generation

  • Efficiency and Speed

Video generation technology dramatically reduces the time needed to produce videos. From scripting to editing, these tools can perform tasks in a fraction of the time it would take humans.

  • Cost-Effectiveness

The need for large production teams and expensive equipment diminishes, so more creators can produce high-quality content without a hefty price tag.

  • Customisation and Scalability

AI algorithms can be trained to generate videos tailored to specific audiences, ensuring content is both relevant and engaging. Moreover, video generation tools can handle large-scale video productions, which traditionally would be resource-intensive.

Examples of AI in Video Creation: Xplainify

As an example of AI video generation software, we will showcase our own tool: Xplainify.

The Process Behind Xplainify

  • Visual Content Sourcing: 

The first step is to source the multimedia content for the video. We draw on several services (Wikipedia, Unsplash, and Pexels) and fetch the images and stock videos through their APIs.

Because the sourced assets come in different sizes, we resize them. We then store each asset with its description so the software knows what each picture is about.
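A minimal sketch of the resize-and-describe step, for illustration only. The `VisualAsset` record, the `fit_within` and `normalise` helpers, and the 1920x1080 target frame are all assumptions, not Xplainify's actual code. The sketch only computes the new dimensions; resampling the pixels would be left to an imaging library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VisualAsset:
    """Hypothetical record for one sourced image or stock video."""
    source: str        # e.g. "wikipedia", "unsplash" or "pexels"
    url: str
    description: str   # kept so later steps know what the asset shows
    width: int
    height: int


def fit_within(width, height, max_width=1920, max_height=1080):
    """Scale (width, height) to fit the frame while keeping the aspect ratio."""
    scale = min(max_width / width, max_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def normalise(asset, max_width=1920, max_height=1080):
    """Return a copy of the asset with its dimensions fitted to the frame."""
    width, height = fit_within(asset.width, asset.height, max_width, max_height)
    return replace(asset, width=width, height=height)
```

Note that this version also scales small images up to the frame; whether that is desirable depends on the source quality.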

Here’s an example of the images we source when the topic of the video is Pizza:

  • Script Generation:

The next step is to generate the script for the video. For this, we use OpenAI’s API. We write a specific prompt that asks for an engaging and educational script and that includes the multimedia content we have sourced, so the script is tailored to the available images and videos.
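To illustrate what a prompt that takes the sourced content into account might look like, here is a small sketch. The function name, the wording of the prompt, and the chapter count are assumptions made for this example; the real prompt is not shown in this article. The resulting string would then be sent to the language model.

```python
def build_script_prompt(topic, asset_descriptions, chapters=2):
    """Build a prompt that ties the script to the visuals we already have."""
    lines = [
        f"Write an engaging, educational explainer video script about {topic}.",
        f"Split it into {chapters} chapters, each with a short title.",
        "Only cover subjects that can be illustrated with these visuals:",
    ]
    # Number the asset descriptions so the model can refer back to them.
    lines += [f"{i}. {text}" for i, text in enumerate(asset_descriptions, start=1)]
    return "\n".join(lines)
```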

Here’s what a piece of the video script looks like when the topic is Pizza:

Chapter 1: History of Pizza.

Pizza, a dish of Italian origin, has become one of the most popular foods in the world. The term "pizza" was first recorded in the year 997 in a Latin manuscript from Gaeta, Italy. Raffaele Esposito is often credited with creating modern pizza in Naples. Neapolitan pizza was even added to UNESCO's list of intangible cultural heritage in 2017.

Chapter 2: Varieties of Pizza.

Pizza comes in many different styles and varieties. Some notable styles include Neapolitan pizza, which is made with San Marzano tomatoes and buffalo mozzarella; Sicilian pizza, a thick-crust or deep-dish pizza topped with tomato sauce; and Argentine pizza, known for its thick, spongy base and large amount of cheese. In the United States, pizza has developed distinct regional styles such as New York, Chicago, and California styles.

  • Voiceover Generation:

Now we have a script, but we still need a voiceover for it. For this, we use Text-to-Speech technology, specifically the ElevenLabs API, to create a high-quality, realistic-sounding voiceover.
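As a rough sketch, a text-to-speech request could be prepared as follows. The endpoint path and the `xi-api-key` header reflect our understanding of the public ElevenLabs REST API and should be checked against the current documentation; `build_tts_request` and its parameters are hypothetical names. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed base URL of the ElevenLabs REST API; verify against the official docs.
API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text, voice_id, api_key):
    """Prepare (but do not send) a text-to-speech request for one script part."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would return the audio bytes, which can then be written to a file for the assembly step.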

This is an audio example of the voiceover:

  • Content Relevance

This is the most important part! We need to connect the script with the visual content so that the video makes sense. For this, we use algorithms with custom prompts for OpenAI’s models that split the script into parts, analyse each visual asset, and connect each part to the most relevant image the software has. As a result, the text displayed on screen almost always relates to the image shown.
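The matching idea can be illustrated with a deliberately simple stand-in. Instead of prompting a language model, this sketch scores each asset description by the keywords it shares with a script sentence. The stopword list and function names are assumptions, and keyword overlap is a much cruder technique than the model-based matching described above.

```python
import re

# Small, hand-picked stopword list; an assumption for this example only.
STOPWORDS = {"a", "an", "the", "in", "at", "of", "it", "is", "and", "with", "on", "to"}


def keywords(text):
    """Lower-cased words of the text, minus common stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def split_script(script):
    """Split a script into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]


def best_asset(sentence, descriptions):
    """Index of the description sharing the most keywords with the sentence."""
    words = keywords(sentence)
    scores = [len(words & keywords(d)) for d in descriptions]
    # On a tie, max() keeps the earliest description.
    return max(range(len(descriptions)), key=scores.__getitem__)
```

With hypothetical descriptions such as "Slices of pizza on a wooden table", "A pizza baking in a wood-fired oven" and "Fresh tomatoes and basil", a sentence about baking pizza in an oven is matched to the second one.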

For example, for the sentence “Preparing pizza involves baking it in an oven at a high temperature.”, this was the image that the software chose:

  • Introduction:

The next step is to create an introduction for the video. We again use custom prompts and OpenAI’s models to analyse the script and create a tailored introduction that welcomes viewers and introduces the video topic. We then combine this introduction with stock videos to make it visually appealing.

Here’s an example of stock videos that the software uses for the topic of Pizza:

  • Final Video Assembly:

Finally, we use FFmpeg to create small video segments, each containing a script part, visual assets, voiceover, and subtitles. Then we concatenate all the segments to create the final video. See the result below.
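One common way to join segments is FFmpeg's concat demuxer, sketched here under the assumption that every segment was encoded with the same codecs and settings, so the streams can be copied without re-encoding. The helper names and file names are hypothetical, and the article does not state which concatenation method Xplainify uses.

```python
def concat_list(segment_paths):
    """Contents of the list file read by FFmpeg's concat demuxer."""
    lines = []
    for path in segment_paths:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'\n")
    return "".join(lines)


def concat_command(list_path, output_path):
    """Argument list that joins the listed segments without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # read the input as a concat list file
        "-safe", "0",     # accept paths the demuxer would otherwise reject
        "-i", list_path,
        "-c", "copy",     # copy streams as-is; segments must share codecs
        output_path,
    ]
```

Writing the output of `concat_list` to a file and passing the command to `subprocess.run` would produce the final video.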

The Future of AI in Video Production

As we look towards the future, the role of AI in video production is poised for even greater advancements. Here's what we can anticipate:

  • Enhanced Realism

AI-generated content is rapidly advancing towards realism, blurring the line between real and AI-generated videos. This opens up possibilities for AI to craft entire movies and would change the way we create and interact with video content.

  • Personalisation

AI will enable hyper-personalised video content, catering to individual viewer preferences and behaviours, significantly improving user engagement and content interaction.

  • Ethical Considerations

More realistic AI-generated content will mean a rise in deepfakes in news and media. This highlights the urgent need for robust authenticity checks to prevent the spread of misinformation.


AI video generation, as exemplified by tools like Xplainify, is not just an advancement. It's a revolution in video production! This technology is fundamentally transforming the way we create and experience video content, unlocking the doors to sophisticated storytelling for everyone. With its unlimited potential, we eagerly await the future of AI video generation, anticipating all the groundbreaking ways it will redefine the art of visual storytelling!
