
- The thing nobody tells you about AI audiobooks
- Cleaning up the manuscript was the part I underestimated most
- On choosing a voice: I listened to way too many demos and it was still hard
- The actual production workflow
- Post-production: this is where I spent more time than I expected
- What the finished product actually sounds like
- Would I do it again
I put off making an audiobook for two years.
Not because I didn't want one. Every time I looked at the numbers — Audible listeners, the premium pricing, the reach — I wanted one badly. I just couldn't justify the cost of hiring a narrator, and every time I sat in front of a microphone to record it myself, I lasted about forty minutes before accepting that I was not cut out for this.
So the book sat. Finished, published in print, doing fine — but with no audio version.
Then I spent a weekend experimenting with AI narration, and now I have a finished audiobook. This is what I actually did, what surprised me, and what I'd do differently.
The thing nobody tells you about AI audiobooks
Everyone talks about which AI voice sounds most human. That's not the hard part.
The hard part is the tedious, unsexy work that comes before and after the AI does its thing: cleaning up your manuscript so it doesn't read out your scene break symbols, managing 30 separate audio files without losing track of which version is which, and then spending more time than you expected in audio editing software trying to hit ACX's noise floor requirements.
I went in thinking the AI would do the heavy lifting. It does — but it doesn't do all of it.
Cleaning up the manuscript was the part I underestimated most
My manuscript had "* * *" scene breaks throughout. Took me twenty minutes to find and remove them all. It had footnotes from an earlier draft that I thought I'd deleted. It had one chapter where I'd accidentally left in a comment to myself — "[CHECK THIS FACT]" — that would have been read aloud verbatim.
Do this pass before anything else. Read through your manuscript specifically looking for anything that isn't prose meant to be read aloud. Chapter numbers, section headers, notes to yourself: all of it needs to go or get reformatted.
This took me about three hours. I'd budgeted thirty minutes.
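If you'd rather not do that whole pass by eye, a rough sketch like the one below can flag the obvious offenders first. The three patterns are assumptions based on the problems described above (asterisk scene breaks, all-caps bracketed notes, numeric footnote markers), so adjust them to whatever your own manuscript uses. It only reports lines. It doesn't delete anything, so you still decide what goes.

```python
import re

# Patterns for text that should not be read aloud. These are illustrative
# assumptions, not an exhaustive list; add your own manuscript's quirks.
PATTERNS = {
    "scene break": re.compile(r"^\s*(\*\s*){3,}$"),
    "editor note": re.compile(r"\[[A-Z][A-Z ?!]*\]"),
    "footnote marker": re.compile(r"\[\d+\]"),
}


def find_non_prose(text):
    """Return (line_number, label, line) for every suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "She left.\n* * *\nHe stayed. [CHECK THIS FACT]\nDone.[3]"
    for number, label, line in find_non_prose(sample):
        print(f"line {number}: {label}: {line}")
```

A clean report doesn't mean the manuscript is clean. It just means the read-through can focus on the things a pattern can't catch.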
On choosing a voice: I listened to way too many demos and it was still hard
Most AI platforms give you a library of voices and a short demo clip for each. The demo clips are useless. They're all reading the same cheerful marketing text, which tells you nothing about how a voice handles tension, or grief, or a character who's trying not to cry.
What I did instead: I took the most emotionally loaded paragraph from my book — a scene where the main character gets news that changes everything — and generated it with six different voices. That's the only way to actually hear how a voice handles weight.
The voice I ended up using wasn't the one I liked best in the demo. It was the one that made the hard scene feel right.
One thing I kept running into: voices that sound great for the first five minutes and then become subtly fatiguing. Something about the rhythm becomes predictable. I now listen to at least 90 seconds of a voice reading my text before committing, not 15 seconds of their demo.
The actual production workflow
I used Castory, which is built specifically for audiobook production rather than being a general TTS tool that someone adapted for longer content. The difference matters when you're working chapter by chapter for days — you want something that keeps your project organized, not something you're fighting against.
The basic loop was: paste chapter text → generate → listen through at 1.5x speed → flag anything weird → fix and regenerate if needed → move on.
The "listen at 1.5x speed" thing sounds like a tip from a listicle, but it genuinely works. Normal speed makes it easy to zone out and miss things. At 1.5x, errors jump out.
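For keeping track of which take of which chapter is current, even a tiny bit of bookkeeping helps. This is a minimal sketch of the generate, flag, regenerate loop described above. The `Chapter` class and the `ch03_v2.wav` naming scheme are hypothetical and have nothing to do with how any particular platform names its exports.

```python
from dataclasses import dataclass, field


@dataclass
class Chapter:
    number: int
    version: int = 0  # 0 means nothing has been generated yet
    flags: list = field(default_factory=list)
    approved: bool = False

    def filename(self):
        """Name for the current take, e.g. ch03_v2.wav."""
        return f"ch{self.number:02d}_v{self.version}.wav"

    def generate(self):
        """Record a new take; old flags are cleared because the audio changed."""
        self.version += 1
        self.flags = []
        self.approved = False
        return self.filename()

    def flag(self, note):
        """Note something odd heard during the listen-through."""
        self.flags.append(note)

    def approve(self):
        """Mark the current take as final; refuses if problems are still open."""
        if self.version == 0:
            raise ValueError("nothing generated yet")
        if self.flags:
            raise ValueError("open flags remain; regenerate first")
        self.approved = True
```

The point of refusing to approve a flagged take is that you can't "move on" from a chapter by accident while a known problem is still sitting in it.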
My book has some made-up character names and one word — a technical term from the world I built — that the AI kept mispronouncing in a way that was genuinely funny the first time and then incredibly irritating. Most platforms let you add phonetic guides for specific words. Use this. Add your character names before you generate a single chapter, not after you've already generated fifteen of them.
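To build that list of names up front, one option is to let a script suggest candidates. The sketch below is a crude heuristic, not a real name detector: it counts capitalized words that appear mid-sentence, on the assumption that invented names and places show up that way repeatedly. It will over-report (the pronoun "I", ordinary place names) and it misses names that only ever start a sentence, so treat the output as a starting point for your pronunciation list.

```python
import re
from collections import Counter


def candidate_names(text, min_count=2):
    """Count capitalized words that appear mid-sentence.

    The first word of each sentence is skipped because it is capitalized
    whether or not it is a name.
    """
    counts = Counter()
    # Split on sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for word in words[1:]:
            if word[0].isupper():
                counts[word] += 1
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

Run it over the full manuscript once, skim the top of the list, and add anything invented to the platform's phonetic guide before generating chapter one.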
The free AI audiobook generator on Castory let me get through the whole manuscript in about a day of actual work, spread over three calendar days.

Post-production: this is where I spent more time than I expected
AI audio comes out clean, but "clean" and "ACX-ready" are not the same thing.
I use Audacity, which is free and functional and has a learning curve that I'd describe as "annoying but not impossible." I needed to normalize levels across chapters — some of mine were slightly inconsistent — and run a noise analysis to confirm the files met ACX's technical specs.

The first two chapters I processed failed ACX Check. Not by a lot, but enough to get rejected. I had to go back, figure out what was off (peak levels slightly too high), fix it, and resubmit. This was frustrating but also, in retrospect, exactly the kind of thing you learn the first time and never repeat.
Budget more time for post-production than you think you need. I budgeted half a day. It took a full day and a half.
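A quick level check before opening Audacity can catch the "peaks slightly too high" kind of failure early. The sketch below reads a 16-bit PCM WAV with Python's standard library and reports peak and RMS in dB relative to full scale. The thresholds are my understanding of ACX's published requirements (RMS between -23 dB and -18 dB, peaks no higher than -3 dB), so confirm them against ACX's current guidelines. It deliberately does not measure the noise floor, which needs a known-silent stretch of audio, and it is a pre-check, not a replacement for ACX Check.

```python
import array
import math
import sys
import wave

# Assumed thresholds; verify against ACX's current submission requirements.
RMS_MIN_DB, RMS_MAX_DB = -23.0, -18.0
PEAK_MAX_DB = -3.0


def to_db(value, full_scale=32768.0):
    """Convert a linear sample magnitude to dB relative to full scale."""
    return 20 * math.log10(value / full_scale) if value > 0 else float("-inf")


def measure(path):
    """Return (peak_db, rms_db) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no audio")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_db(peak), to_db(rms)


def check(path):
    """Return a list of level problems; an empty list means both checks pass."""
    peak_db, rms_db = measure(path)
    problems = []
    if peak_db > PEAK_MAX_DB:
        problems.append(f"peak {peak_db:.1f} dB is above {PEAK_MAX_DB} dB")
    if not RMS_MIN_DB <= rms_db <= RMS_MAX_DB:
        problems.append(
            f"RMS {rms_db:.1f} dB is outside {RMS_MIN_DB} to {RMS_MAX_DB} dB"
        )
    return problems
```

Looping `check` over every exported chapter gives you one list of files to fix, instead of finding out chapter by chapter.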

What the finished product actually sounds like
Honestly? Better than I expected, and not as good as a great human narrator.
For my book — third-person limited, fairly clean prose, not a lot of dialogue-heavy scenes — it works well. A listener who wasn't specifically listening for AI narration would probably just think they had a perfectly competent narrator.
For books that are heavily dialogue-driven, or that have a very distinctive authorial voice, or that live and die on narrator charisma — I think the limitations would be more noticeable. AI narration in 2026 is genuinely good, but it doesn't have personality the way a great human narrator does.
That's not a reason not to do it. It's just the honest version of what you're getting.
Would I do it again
Yes. Already am — working on the second book in the series.
The economics make sense for indie publishing at scale. The workflow is learnable. The quality is sufficient for the market I'm selling to.
The two things I'd tell someone starting from scratch: don't underestimate the manuscript cleanup, and don't skip the post-production even if the AI audio sounds great on first listen. Those are where the actual time goes.
If you want to try it, start with one chapter, not the whole book. Run it through a free AI audiobook generator, listen to what comes out, and decide from there whether the quality works for your specific content. That's a better use of an hour than reading five more articles about it.
