Audiobooks, AI, and humans – where do they stand?

The audiobook segment is easily the most lucrative in publishing at the moment. With demand ever on the rise, one might have the impression that voice artists are having a field day reading out books and bringing home fat paychecks. After all, it seems like a fairly simple job, one that requires no special skills and that any book lover with a good voice would love doing. Unfortunately, that isn't the case. There is competition brewing on the horizon, and it comes not from fellow human narrators but from robots with a human-like voice. Here is how things stand at the moment.

Typically, producing an audiobook can take months, with the cost easily stretching into thousands of dollars. Much of that goes towards studio time and paying the voice artist. There is significant post-production work involved too. All of this makes producing an audiobook an arduous as well as expensive affair. No wonder this has created space for AI-enabled text-to-speech tools that can closely replicate the human voice.

The benefits are many. The biggest positive of the automated audiobook production process is that it cuts both production cost and time by a significant margin. The entire process is also greatly simplified, so much so that an entire audiobook can now be produced within days or even hours. Compare that to the months it could take to produce a similar audiobook manually and the difference is immediately apparent.

Advances in the text-to-speech domain over the last couple of years have also been impressive, to the point that it can often be impossible to tell the real from the artificial. Just listen to the most recent iterations of Siri or Alexa and you will know what is meant. They come up with relevant answers by scanning the web and reading out what they find on some site. The same idea applies to the automated audiobook production process, where text-to-speech software reads out the text of the book and, it can be said, creates magic.

That said, it isn't just standard text-to-speech software that is at work here narrating the audiobook. Rather, it has to be smart enough to inject just the right dose of emotion to make the book sound like it is being narrated by a real human. A book can have portions that describe a sad event, while on the very next page things can become exciting, joyous, and fun. Similarly, there are times when a high-pitched tone is needed, or a fast-paced narration, which is common in horror or thriller titles. Then there is the momentary pause, which is just as important for building suspense.
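To make the idea of mood-dependent pitch, pace, and pauses a little more concrete, here is a minimal sketch in Python that wraps passages in SSML, the W3C markup many speech engines accept for such hints. The mood labels, the `MOOD_PROSODY` table, and the `to_ssml` helper are hypothetical names made up for this illustration. The article does not say how any particular audiobook tool works internally, and real engines differ in which SSML attributes they honor.

```python
from xml.sax.saxutils import escape

# Hypothetical mood-to-prosody table; the values are illustrative, not tuned.
MOOD_PROSODY = {
    "neutral": {"rate": "medium", "pitch": "medium"},
    "sad": {"rate": "slow", "pitch": "low"},
    "joyful": {"rate": "medium", "pitch": "high"},
    "thriller": {"rate": "fast", "pitch": "high"},
}


def to_ssml(passages):
    """Build an SSML string from (mood, text, pause_ms) tuples.

    Unknown moods fall back to the "neutral" settings. A pause_ms above
    zero adds a <break> after the passage, for example to build suspense.
    """
    parts = ["<speak>"]
    for mood, text, pause_ms in passages:
        prosody = MOOD_PROSODY.get(mood, MOOD_PROSODY["neutral"])
        # Escape &, < and > so the book text cannot break the markup.
        parts.append(
            f'<prosody rate="{prosody["rate"]}" pitch="{prosody["pitch"]}">'
            f"{escape(text)}</prosody>"
        )
        if pause_ms > 0:
            parts.append(f'<break time="{pause_ms}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    print(to_ssml([
        ("sad", "She closed the door for the last time.", 0),
        ("thriller", "Then the lights went out", 800),
    ]))
```

The hard part, deciding which mood fits which passage, is left out here. In this sketch the labels are supplied by hand, whereas the tools discussed in the article would have to infer them from the text itself.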

The challenge for the synthetic voice is to have all of the above qualities and to apply them according to the text being narrated. Only then will the artificial voice be able to hold human attention for the entire duration of the audiobook. Otherwise, it can begin to sound monotonous within minutes, and that when the average audiobook can stretch for eight hours. Also, while the general perception is that the synthetic voice is suited mainly to the non-fiction segment, some of the players here have taken it upon themselves to break that barrier.

So, where do the real humans fit in? Humans narrating the stories make for the best-case scenario from the listener's perspective. Unlike the machines, humans need little to no guidance on how to narrate the text, modulating tone and pitch to match the emotion that the story dictates. However, while that is great, the cost and time it takes to produce an audiobook this way aren't conducive to producing audiobooks in sizeable numbers. That would mean a backlog of titles that could take years or even decades to be converted into audiobooks. Plus, new titles are added every year.

It is here that synthetic voice makes a strong case for itself. It's cheap and fast, which makes it best suited to those with a limited budget, such as self-publishers who lack the deep pockets needed to have a human voice actor read out the text. Add to that the hundreds and thousands of titles that need to be converted to audiobooks, and all of that makes for fertile ground for the synthetic voice to thrive.

Will that mean the end of the road for human voice actors? Not really, as there is still a market for specially made, hand-crafted goods even in the age of industrial automation. That said, the problem, if that is the right word to describe the scenario, is that hand-crafted work has been reduced to a niche market. Chances are that human-narrated audiobooks might end up the same way, becoming a niche product that sells at a premium.

On the other hand, there could be relatively cheaper audiobooks that have been mass-produced via automated tools. With the advances that synthetic voice has made over the years and the way it is progressing, it could soon be really hard to tell whether a book has been narrated by a real human or a synthetic voice, unless explicitly stated. Until that happens, there is still a market for human voice artists.

Another scenario that might emerge is a hybrid model of audiobook production, where the voice actor lends his or her voice only to certain portions of the text while that voice is artificially regenerated for the rest of the book. That said, this is plain speculation at best.

Also, as synthetic voice goes on to capture much of the market, which seems likely given the current pace of development of text-to-speech tools, the one question that demands an answer is how the voice owner is compensated, if at all. If they are, is there a standard compensation rate, or is every company playing by its own rules?

In other words, what is amply evident is that the automated audiobook production industry is still in its nascent stage, and there could be several upheavals before things settle into a rhythm.
