John T. Williams

The Role of Voice Cloning and AI in Promoting Diversity


As a video producer with more than two decades of experience, I've worked with many voice actors on films and client videos. Yet I've observed a disquieting void: many cultures and ethnicities remain severely underrepresented, their voices often missing from the chorus that should reflect our diverse society.

Zippia.com reports that 56.7% of voice actors are white, 16.5% are Hispanic or Latino, 13.7% are Black or African American, 6.5% are Asian, and 0.2% are American Indian and Alaska Native; the ethnicity of the remaining 6.4% is unknown. By gender, 70% of voice actors are male and 29% are female.

However, we now stand on the cusp of a transformative era. The emergence of sophisticated AI voice technologies and voice cloning has begun to rewrite the script, offering an unprecedented opportunity to include those once overlooked and to celebrate the full spectrum of human diversity in our auditory landscape.

In the evolving soundscape of our digital age, the spoken word has transcended its traditional boundaries, morphing into a tool that not only conveys information but also embodies the richness of human diversity. The advent of voice cloning and AI-generated voices heralds a new era of inclusivity, one that promises to amplify voices long underrepresented in the realm of voice acting.

Voice cloning, a technology that meticulously samples and replicates individual voices, presents a novel avenue for preserving the authenticity of diverse vocal characteristics. This technology holds the power to bring a multitude of accents, dialects, and intonations to the forefront, ensuring that no timbre of speech is left unheard. It's a form of audio democracy, allowing the distinct vocal identities of underrepresented ethnicities to be heard in mainstream media, education, and video. By cloning voices from a wide array of cultures and communities, voice cloning can help combat the homogenization of voices that has historically plagued the industry.


On the flip side, AI-generated voices offer scalability and versatility that voice cloning can't easily match. With the capability to create a vast range of synthetic voices, AI can fill the gaps left by the scarcity of voice actors from diverse backgrounds. AI voices can be tailored to speak in multiple languages and dialects, breaking down linguistic barriers and fostering a sense of inclusion for users across the globe. The technology can generate voices that resonate with different age groups, genders, and cultural backgrounds, making digital content more relatable and accessible to a broader audience.


However, the use of AI voices also raises an important question: can a synthesized voice truly capture the essence of human diversity? While AI can mimic a wide range of vocal qualities, the subtleties of emotion, regional inflection, and cultural nuance are best captured by a human voice. This is where voice cloning has an edge, as it draws on real voices with all their inherent characteristics and idiosyncrasies.


The underrepresentation of certain ethnicities in traditional voice acting is a gap that both voice cloning and AI voices are poised to fill. Voice cloning can ensure that the voices of minority groups are not just included but are presented authentically, while AI can create a variety of voices that might not be available in the voice-acting market. Together, these technologies can provide a platform for inclusivity, giving voice to those who have been silent in the media for too long.

As companies begin to use AI voices and AI voice cloning for their videos, does this mean they will stop hiring voice actors? Not necessarily. In my own experience, I've hired voice actors from various backgrounds and, with their permission and for additional compensation, sampled their voices for later projects using voice cloning tools such as PlayHT, an ultra-realistic text-to-speech platform that offers both an AI voice generator and voice sampling.


During the recording session, I would direct the voice actor to give several reads of each short script (storytelling, commercial, upbeat, calm, etc.). These renditions can then be used for different types of projects or to change the emotion within a script. The results are amazing.

While AI voice technology has made significant strides, there is still a quality difference between a human performance and a synthesized voice. The nuances, emotions, and unique characteristics of a human voice are difficult to replicate fully with technology.


Voice actors provide more than just their voice; they bring creativity, interpretation, and a human touch to their performances that can elevate a script. AI voice technology is more likely to supplement the work of voice actors, taking on tasks that are repetitive, costly, or not feasible for humans to perform. This could include voiceovers for GPS navigation, automated customer service systems, and other similar applications.


As we stand at the crossroads of tradition and innovation, it is clear that both voice cloning and AI voices have pivotal roles to play. The path forward is one of balance—leveraging the authenticity of cloned voices to honor individual identity while harnessing the diversity of AI to ensure everyone finds a voice in the digital chorus. It is through the confluence of these technologies that we can hope to craft a more inclusive future, where every voice is heard, every accent is recognized, and every culture is celebrated.


