I am not a fan of audiobooks, but I understand that audiobooks are a rapidly expanding market opportunity. Grand View Research predicts a $15 billion market by 2027. The challenge in meeting or exceeding that expectation is producing significantly more audiobooks. Artificial intelligence (AI) can streamline audiobook production and help meet the constantly increasing demand.
While the demand for audiobooks is increasing, production of audiobooks faces many procedural challenges. Companies are creating more ebooks, but a quick glance at them shows that formatting is still a challenge. Yet the challenge of producing ebooks from print books is trivial compared to the creation of audiobooks. An ebook doesn’t have to understand context or characters, but a person reading the text does. Digitizing a book is also much faster than having a person read it aloud. Even after the reading is recorded, significant editing might still be required. That is why estimates for the professional creation of an audiobook tend to be in the thousands of dollars, with a minimum of $2,000 to $3,000 and an average of $5,000 to $10,000.
That means many publishers focus only on what they expect to be best sellers, and they will have a large inventory of books they can’t afford to record.
There’s also the question of accents and languages. What is the cost of creating customized versions of books for different parts of the US, where a publisher might think a book would sell best? Even more interesting, what about words with different pronunciations in different places? Versailles is pronounced very differently when referring to a place in Kentucky than it is when referring to the famous palace in France.
Speechki is a company focused on addressing these production challenges in the audiobook market. It uses AI and people in a coordinated fashion to speed the process of audiobook creation. The first step is to move past the simple digitization of the book used in creating ebooks. “Artificial intelligence is needed when text is first read,” said Dima Abramov, Co-Founder & CEO, Speechki. “Different characters need to be identified, intensity of voice suggested from text context, and other key information to make a better reading experience must be identified.” The output is not only a digitized text; metadata is also created to drive the voice recording.
The company then uses AI to manage over fifty different American-accented English voices, and more than 200 voices for other accents and languages. A rough audiobook can then be created in very short order, including different voices for multiple speakers.
At that point, human proof listeners work within the system to correct issues and adapt the reading. For instance, this early technology doesn’t identify genders, ages, and other particulars of characters. The listeners can annotate the metadata, and the voices can then be changed automatically.
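The annotate-then-regenerate loop can be pictured with a small Python sketch. This is only an illustration, not Speechki's system: the `Character` fields, the `VOICES` catalog, and every voice name are hypothetical. The point it shows is that once a listener fills in the metadata, the voice choice changes without anyone re-recording the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Character:
    name: str
    gender: Optional[str] = None      # unknown until a proof listener annotates it
    age_group: Optional[str] = None   # unknown until a proof listener annotates it


# Hypothetical voice catalog keyed by (gender, age_group).
VOICES = {
    ("female", "adult"): "voice_f_adult_01",
    ("male", "adult"): "voice_m_adult_01",
    ("male", "child"): "voice_m_child_01",
}
DEFAULT_VOICE = "voice_neutral_01"


def pick_voice(character):
    """Choose a voice from the character metadata, falling back to a default."""
    return VOICES.get((character.gender, character.age_group), DEFAULT_VOICE)


def assign_voices(segments, cast):
    """Return (voice, text) pairs for a list of (speaker, text) segments."""
    return [(pick_voice(cast[speaker]), text) for speaker, text in segments]


cast = {"Anna": Character("Anna"), "Tom": Character("Tom")}
segments = [("Anna", "Where are you going?"), ("Tom", "To the river.")]

# Rough draft: no character details yet, so every line gets the default voice.
draft = assign_voices(segments, cast)

# A proof listener annotates the metadata; the text itself is untouched.
cast["Anna"].gender, cast["Anna"].age_group = "female", "adult"
cast["Tom"].gender, cast["Tom"].age_group = "male", "child"

# Re-running the assignment picks up the new annotations automatically.
corrected = assign_voices(segments, cast)
```

Keeping the character details in metadata, separate from the text, is what makes the correction cheap in this sketch: one annotation updates every line that character speaks.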
Specific pronunciations can also be taught to the system, whether it’s the appropriate Versailles or a new technical term. Training the system and then having the system correct the text is much faster and less expensive than having voice talent record a new version.
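A context-aware pronunciation table is one simple way to picture this kind of training. The Python sketch below is an assumption for illustration only, not how Speechki stores pronunciations: the `LEXICON` structure, the `teach` and `apply_pronunciations` helpers, the context tags, and the informal respellings are all hypothetical.

```python
import re

# Hypothetical lexicon: lowercase word -> {context tag: informal respelling}.
LEXICON = {
    "versailles": {"kentucky": "ver-SAYLZ", "france": "ver-SIGH"},
}


def teach(word, context, respelling):
    """Add or update the pronunciation of a word for a given context."""
    LEXICON.setdefault(word.lower(), {})[context] = respelling


def apply_pronunciations(text, context):
    """Replace each known word with its respelling for this context."""
    def swap(match):
        word = match.group(0)
        # Words that are unknown, or unknown in this context, pass through unchanged.
        return LEXICON.get(word.lower(), {}).get(context, word)

    return re.sub(r"[A-Za-z]+", swap, text)


kentucky_line = apply_pronunciations("We drove to Versailles.", "kentucky")
france_line = apply_pronunciations("We toured Versailles.", "france")

# Teaching a new term once (a made-up word here) fixes it everywhere it appears.
teach("Zorvex", "technical", "ZOR-veks")
tech_line = apply_pronunciations("Zorvex is new. Install Zorvex today.", "technical")
```

In this sketch a correction is a single table entry that applies to every occurrence, which mirrors why teaching the system is cheaper than bringing voice talent back to record a new version.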
Segmenting The Market
When working with customers, it’s always important to better understand market segments. The most obvious split is between fiction and non-fiction books. “Fiction best sellers sell more than non-fiction best sellers,” said Bill Wolfsthal, Publishing Consultant, Speechki. “However, non-fiction books have a longer tail, with the rest of the books selling more consistently over time.” That means while publishers in both segments are interesting, non-fiction publishers have more books that might be attractive for audiobook production. “Academic publishing is a key sub-segment, in that there are many non-fiction books sitting in their inventories that could bring in sales to cover lower costs,” said Mr. Wolfsthal. “Bringing down the cost of audiobooks below one thousand dollars opens up a lot of published inventory to potential production.”
Along with the opportunity in corporate and academic publishing, Speechki points out that almost half of all new books are self-published. A quick look at the internet will show a lot of complex and extensive steps for an author to create an audiobook. Providing independent authors with a way to quickly and inexpensively add the voice option is another opportunity.
In addition to audiobooks, the technology would clearly lend itself to other voice productions. Think of scripts, from corporate presentations to movies. Speeding up the production of training clips, webinars, and more can strengthen a business’s relationship with its customer base, since faster production time can drive even more content. In movies, running through an initial voice version can help the script writers, and the movie producers, think about the lines in a way more directly tied to how people speak.
Speechki is focused on audiobooks, but the other opportunities are driving a number of discussions with possible partners in other spaces. This is a newer market for AI, and there are opportunities for lots of players.
A lot of market coverage has been focused on chatbots, both in text and voice. They are rapidly becoming, if they’ve not already become, a “must have” in business. Voice opportunities aren’t limited to chat. While it seems that fewer people are reading books every year, it’s also clear that books are still in high demand. Audiobooks are a growing market and artificial intelligence is a tool that is beginning to be applied to address that market.