AI'll Be Back: Generative AI in Image, Video, and Audio Production
M7 Jun 13, 2025, 3:30 PM - 4:20 PM
This talk introduces generative AI with a focus on text-to-image, text-to-audio and text-to-video for creating images, music and short videos. We explain how neural networks use diffusion models and Transformer architectures to generate these output formats from short text inputs.
We focus on advanced technologies such as Sora and Midjourney. The techniques used, such as latent diffusion models, make it possible to generate and edit images and videos by combining text understanding, provided by attention mechanisms and transformers, with an iterative denoising process.
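To make the denoising idea concrete, here is a minimal NumPy sketch of the standard diffusion noising formula and its inversion. It is an illustration under assumptions, not the method of any specific tool named here: the linear noise schedule and the step count are common textbook choices, the function names `add_noise` and `predict_x0` are made up for this example, and the true noise stands in for the neural network that would normally predict it from the noisy input and the text prompt.

```python
import numpy as np

# Assumed schedule: linear betas over 1000 steps, a common textbook choice.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # fraction of signal left at each step


def add_noise(x0, t, eps):
    """Forward process: blend clean data x0 with Gaussian noise eps at step t."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps


def predict_x0(xt, t, eps_hat):
    """Invert the blend, given an estimate eps_hat of the noise in xt."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))   # stand-in for an image or a latent
eps = rng.standard_normal((4, 4))  # the noise that gets mixed in

xt = add_noise(x0, 500, eps)
# A trained network would predict the noise; here we cheat and pass the
# true noise to show that a perfect prediction recovers the clean input.
x0_rec = predict_x0(xt, 500, eps)
print(np.allclose(x0_rec, x0))
```

In a latent diffusion model the same arithmetic runs on a compressed representation rather than on raw pixels, and sampling repeats the denoising step many times, starting from pure noise.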
A detailed examination of the video generation process with Sora shows how it compresses visual data, breaks it into patches, and then reconstructs it into the final video. Beyond Sora, we also discuss alternative tools such as RunwayML and SunoAI to cover a broad spectrum of image, audio and video generation.
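The patch step can be sketched on a toy array. This is a hedged illustration only: Sora's implementation is not public, the function names `patchify` and `unpatchify` and the patch sizes are assumptions made for this example, and the learned compression step is skipped, so the patches are cut from the raw array instead of from a compressed latent.

```python
import numpy as np


def patchify(video, pt, ph, pw):
    """Split a (T, H, W, C) array into flattened spacetime patches."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "sizes must divide evenly"
    # Separate each axis into (number of patches, patch size).
    x = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Group the patch-index axes first, then the within-patch axes.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch: a sequence of tokens a transformer could consume.
    return x.reshape(-1, pt * ph * pw * C)


def unpatchify(patches, shape, pt, ph, pw):
    """Inverse of patchify: reassemble the patch sequence into a video."""
    T, H, W, C = shape
    x = patches.reshape(T // pt, H // ph, W // pw, pt, ph, pw, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)


# Toy "video": 8 frames of 16x16 pixels with 3 channels.
video = np.arange(8 * 16 * 16 * 3, dtype=np.float32).reshape(8, 16, 16, 3)
patches = patchify(video, pt=2, ph=4, pw=4)
restored = unpatchify(patches, video.shape, pt=2, ph=4, pw=4)
print(patches.shape, np.array_equal(restored, video))
```

Each row of `patches` covers a small block of pixels across a few consecutive frames, which is what lets one token sequence represent both space and time.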
By the end of this talk, you will have a basic understanding of diffusion models, an overview of tools for image, audio and video generation, and a deeper understanding of how these tools work. Practical examples and demos round off the presentation.