In 2023, I wrote an article about my experience using AI tools in video production.

At the time, the tools felt promising but were limited, fragmented, and often came across as more experimental than practical. However, the past three years have transformed these tools, and my own workflow in the process.

But just as interesting has been the cultural shift in how professionals, particularly in UX and qualitative research, think about AI. Up until very recently, many researchers saw AI as a threat to the 'purity' of their craft. There was an understandable fear that these tools would dilute their findings or replace human interpretation altogether.

However, some of those fears have eroded, and AI tools have become critical as more and more is expected of filmmakers and product/user research teams.

Timelines have compressed, research budgets have tightened, and AI tools have quietly become less about replacing research and more about expanding capacity. They have not eliminated the need for researchers and storytellers; they have, however, absorbed many of the technical burdens surrounding the work.

And as a result, they have quietly reshaped my production process.

To highlight this evolution first-hand, let's start with a very rough video production timeline (from my perspective as a UX research consultant/filmmaker), and let's pull some of the AI tools I'm using into the production process.

STEP ONE: Exploratory Call + Creative Planning

Because I work with brands across a wide range of industries—big tech, CPG, startups, nonprofits—I sometimes use an LLM like Claude or Gemini early in the process to orient myself around an unfamiliar space before exploratory calls with clients.

Doing this pre-work often makes for richer conversations and more productive exploratory/planning calls. It also accelerates the context-building required to provide my clients with measurable value.
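To make this concrete, a pre-call prompt (the brand and category here are invented for illustration) might read: "I'm preparing for an exploratory call with a CPG brand that makes plant-based snacks. Give me a primer on the category: the major players, how products typically reach shelves, and the consumer trends their insights team is most likely wrestling with."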

STEP TWO: Pre-Production

Pre-production is where the complexity of a film can compound quickly. Here I have to pay attention to things like production calendars, storyboarding, recruiting, location logistics, and equipment planning.

This is where a real-time discussion with Gemini Live can bring value, helping me think through first-pass storyboards, generate rough discussion guides as thought starters, or explore the logistical 'reality' of filming three in-home interviews in one day in NYC.

This work still requires a great deal of human judgment, but the AI offloads a solid chunk of the initial mental burden.

STEP THREE: Production (Filming)

On the production side, I feel there has been a great deal of innovation in integrating AI tools directly into hardware.

Companies like DJI, particularly with the Pocket 3 camera, have made big leaps here with features like ActiveTrack, which allows a camera to autonomously follow a selected moving subject without any human intervention. In essence, filmmakers can now set up a second camera to act like a sentry and trust it to film and frame a subject without a dedicated operator. This redundancy and flexibility is an absolute game-changer for solo filmmakers, new filmmakers, researchers who are occupied with other tasks, and small production teams.

ActiveTrack in action. Credit @mikeerogers

It doesn’t replace a skilled camera operator, but it expands what’s possible with limited resources, space, and time.

STEP FOUR: Post-Production (Editing)

This is where the biggest leap has happened. Companies like Adobe have introduced Text-Based Editing to Premiere Pro. In essence, it's a tool that uses AI to generate transcripts of your video recordings, letting you cut, copy, and paste text to restructure your rough cut, with changes instantly reflected in the video timeline.

Text-Based editing in action. Credit @Justinserrandigital

These new tools let you edit your video like you would a Word document.

Text-based editing in the late 2010s and early 2020s relied on tools like Descript and Transcriptive. They were effective, but they were clunky, error-prone, and not integrated into established video editing suites. For critical projects with tight timelines, this often became more trouble than it was worth.

Now, this capability lives inside the editor. I can't emphasize enough how much of an impact this makes in the research space, where teams craft stories from many hours of footage at a time while expecting professional video outputs on very quick turnarounds.

The result? Editors spend far less time wrestling with their footage and far more time shaping story and narrative around the insights.

LLMs like Claude and Gemini can also play a role here as creative 'interns', leaning on transcripts to group themes across interviews (with human oversight) and even helping to identify rough narrative arcs.
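To make that 'intern' pass concrete, here is a minimal sketch in Python, assuming the Anthropic SDK; the transcripts folder, model ID, and prompt wording are illustrative placeholders, not a fixed recipe.

import anthropic
from pathlib import Path

# Load interview transcripts exported from text-based editing.
# The "transcripts" folder of .txt files is a hypothetical stand-in.
transcripts = [p.read_text() for p in sorted(Path("transcripts").glob("*.txt"))]

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

prompt = (
    "Act as a research assistant. Across the interview transcripts below, "
    "group recurring themes. For each theme, list two or three supporting "
    "quotes with the participant and a rough timestamp, so a human can "
    "verify them against the footage.\n\n" + "\n\n---\n\n".join(transcripts)
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model ID
    max_tokens=2000,
    messages=[{"role": "user", "content": prompt}],
)

# The output is a starting point for human review, never a finished finding.
print(response.content[0].text)

Keeping the output tied to quotes and timestamps matters: a researcher can trace every suggested theme back to the original footage before it shapes the cut.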

On the audio side, tools like Adobe Podcast can edit out unnecessary background noise, level audio, and isolate voices so that videos sound like a professional podcast. Things have reached a point where sound projects that once took many hours can now take minutes, often with even better results.

What’s striking to me isn’t just how capable these tools have become... but how quietly they’ve integrated themselves into serious workflows. They free up more time and energy for me to focus on story, narrative, and engaging my target audience.

What's Missing

But... and this is the part you have likely been waiting for... these tools still have a very long way to go. And the biggest gap is CONTEXT.

AI can summarize, cluster, structure, and generate beautiful content. What it cannot yet do is truly understand the context captured within film... especially in the way effective storytelling demands. While AI systems perform impressively with explicit information (and are beginning to produce cinema-quality generated footage), they struggle to convert prompts into meaningful implicit interpretation.

Film is not just a sequence of edited statements. It is a layered interplay between verbal and non-verbal communication: pacing, silence, tension, subtext, framing, and cultural nuance. These are all incredibly important elements of an effective film, even in the B2B work I do for market researchers.

Even with extensive training, the most advanced models still don't have this ability.

I'm starting to see venture-backed companies try to solve this challenge by hiring seasoned video editors to 'tag' stock video footage. I'm presuming the eventual goal is that these 'contextualized video clips' would be fed into an AI tool, eventually enabling storytellers and AI agents to craft video stories fully autonomously. I simply can't see this strategy working.

Video viewership growth over the past 5-10 years has largely been attributed to services like YouTube, Instagram, and TikTok, which platform individual creators. Many of these creators have succeeded in large part due to their unique backgrounds, storytelling styles, and life experiences.

The broad-based contextualization of media that is currently happening will not drive lasting engagement, because it can only produce formulaic media; it does not offer the unique content that viewers crave.

Where it Needs to Go

Bringing this all back to my industry, there is a larger concern. In qualitative research, speed and accuracy exist in constant tension. The cost of misinterpretation can be very high: slight deviations from core learnings can evolve into large-scale strategic errors, especially given the scale at which many of these companies operate.

Processing and communicating those learnings in an engaging way requires a level of nuance AI tools simply can't deliver yet.

Do you trust AI to recognize what is wrong with this Pepsi ad?

Effective research storytelling requires engaging your audience, backed by the analytical rigor needed to support your findings... all while speaking that audience's language. That interpretation still requires a human touch.

Researchers and storytellers have their own biases. As a Middle Eastern/Black American male raised across Western and Eastern cultures, I bring a unique perspective on the world, shaped by my background and the experiences of my life. No amount of training can create a model that replicates my lived perspective.

While AI tools can help teams pump out research films more quickly, their handlers are unfortunately losing sight of the human element required to tell compelling stories. At the end of the day, a human brain is still required to make sense of the complexity and nuance inherent in research learnings.

As we rapidly adopt AI, it has become more important than ever that we don't let these tools skew, blur, or bias the most critical parts of our work.

To build on this, video production is becoming more and more niche, and creators are finding increasingly inventive ways to engage specific audiences in their space. AI tools that apply a broad-based approach to contextualizing explicit information will miss this critical element.

AI tools that better understand and anticipate their users' creative intent, instead of attempting to replace them entirely, will likely separate themselves from the competition.

To do this... what if I could upload my existing film productions, favorite creators, storytellers, and chat history to train a model? What if I could then leverage that model as a production assistant, helping me create rough-cut films in my own personal style?

I can see AI heading in this direction in the near future.
