20/07/2024
Over the last four weekends, I embarked on my most ambitious AI-related personal project to date. The video below is the result of 30 hours of painstaking work before I was satisfied with the outcome. Here is the story of how it was produced:
Step 1: The Source of Inspiration
For this project, I wanted to create something inspired by my favorite movie of all time: Inception, the Christopher Nolan masterpiece. The movie features one of the best soundtracks ever, composed by Hans Zimmer, with the anthem “Time” serving as the grand finale. That track inspired the music for my video: I borrowed themes from the original but transformed the piece into a female vocal performance instead of an instrumental.
Step 2: Producing the Music
AI music generation has advanced rapidly over the past year, with tools like Udio and Suno pushing the boundaries of what's possible. For this project, I chose Udio. Before producing the music, I needed lyrics. The song recounts the story of Dom Cobb, the protagonist of Inception; his struggle to reunite with his children while distinguishing reality from dreams across multiple dream layers is central to the lyrics.
I drafted the initial lyrics and refined them with ChatGPT-4o. Once the lyrics were set, I composed the music, inspired by Hans Zimmer’s piano theme. Using Udio, I generated multiple iterations, layering in instrumentation until I had a 30-second instrumental draft. I edited that draft in Apple GarageBand, and it then served as Udio’s base layer for the complete song, lyrics included.
Building the song in 30-second increments with lyrics required generating hundreds of outputs from Udio, and careful prompting was crucial because each new segment added complexity. Overall, producing the song took more than 300 generations, combining AI-generated content with my own editing.
Step 3: Generating the Video Content
Sufficient footage is critical for any video production. I aimed for a mix of live performance and Cobb’s story. I drafted the screenplay and shot list, then iterated on them with feedback from ChatGPT-4o. Using RunwayML Gen-3, I generated the footage I needed, a process that took more than 1,200 video generations to capture the Inception-like dream world, the surreal elements, and the specific lighting.
Step 4: Final Editing of the Video
The final edit was the hardest part, especially timing the music to the AI-generated videos. Matching the piano sequences and syncing lip movements to the lyrics was difficult because AI video generation offers no precise control over lip movement, which forced me to generate many takes of each shot.
I wanted to end the video on the same note of uncertainty as Inception, with a nod to the totem’s significance for those familiar with the movie.
Overall, I spent about 30 hours over four weekends on this project. It was an enlightening experience that allowed me to explore the limits of generative AI and appreciate the incredible progress made in the past year. While some see generative AI as a threat to industries like music and film, I believe it will foster a new wave of creativity, allowing new talent to emerge and new industries to be created. Those who embrace and master this technology will be better positioned in our competitive world. Progress cannot be stopped, and those who ignore this technological revolution risk being left behind.
Let me know your thoughts on the video and your overall views on AI.
https://www.youtube.com/watch?v=OQzG-Bc2s_M