But it sux ass at context. All the examples are "here, let me read things correctly." None of them are "I understand the context of what I am saying." Go listen to "I'm too busy for romance." The first is a vaguely exasperated, cheerful and friendly expression rendered in a conversational tone. The second is a machine reading words.
I have Android Auto in the Porsche. It's a dumpster fire. Set aside for a moment that when you say "Play Igzeh by Banco de Gaia" it says "Okay, playing Iggy Azalea." When you give it instructions, it says "okay" in this exasperated tone that, coming from a human, means "I hate you but I'm forced to do your bidding." When you ask for instructions, it hectors you about how far away things are. It's just a text-to-speech engine (this text-to-speech engine), but between it sucking so hard at anything complex or out of the ordinary and it nagging the shit out of you for every little thing ("Okay, Google, text message to my wife." "(deep robotic sigh) Mobile or Skype. (eyeroll)") I switched my car over to speaking Australian.
I'll bet it sucks just as hard if you're Australian. But I'm not. Which means most of those nuances are lost on me. So Google Australia lady doesn't come across as an exasperated bitch who hates you for wasting her time and rejecting her goddamn Iggy Azalea playlist (the way she says "okay" even comes across like you're about to beat her and she's about to call 911).
It takes some deep diving to figure out how to do it. I initially did it by accident because Google led me to believe I could get my car to speak with a German accent (you can't). Australian was actually Google freaking out and attempting to figure out how to deal with me once I deleted Google American English. Which came right back the next time Android Auto updated, because Google is so up their own ass as to how awesome they are (they're not). So I literally spent an hour and a half getting the bitch out of my car.
Picked up my daughter and she said "I'm glad the Australian lady is back. She sounds so nice. And the other lady isn't very good at her job." Eerily good, my ass. TTS is now squarely in the Uncanny Valley, where it irks you without you really being able to explain why. To make it human again, I had to deliberately blind myself to its inadequacies by masking them in an accent whose nuances are unfamiliar to me.
When you said it sounded exasperated, all I could think of was the depressed robot from the Hitchhiker's Guide movie, the one voiced by Alan Rickman. That man was the master of the deep-sighed "I hate you but I'm forced to do your bidding."
Consumer implementations inevitably lag behind more advanced neural networks like the ones presented here, so I don't think it's fair to judge the cutting edge based on Bitchy Lady. Not that I disagree, I also find the consumer-accessible TTS awkward, but I found these samples intriguing because it was one of the first times I couldn't immediately tell a human from a robot. I found this in the context of a discussion about the slow but steady technological improvements that generally don't make a splash but are significant, i.e. "we overestimate technology in the short term and underestimate it in the long term". I got my answers right, but the fact that it was even remotely difficult is impressive IMO. Funny similarity with your Aussie voice: Belgian Dutch sounds infinitely nicer and softer than clunky native Dutch, so quite a few people use the Belgian voice for their car navigation over here.
My beef is that the only reason you would use TTS is as part of a UI not involving your eyes. UIs not involving your eyes involve other forms of input. If it's TTS, it probably involves being spoken to and speaking back, and "that girl did a video involving Star Wars lipstick" is an answer to a number of questions. Are they imperative? Inquisitive? Sullen? Happy? Yeah, the machines read very well. But really - take a neural network with a shit-ton of subtitles and it'll learn to read. I don't give a fuck. The problem is that when shitfucks like Google go "ohboy! our TTS engine is fuckin grrrrrreat!" they don't even think about the fact that they're taking a communication system that's laden with context and stripping it out to fuckall.
Today Google refused to text one of my friends because "there are several Brians in your address book" and, despite the fact that his last name is phonetic, it fuckin' choked. But it didn't say "I'm sorry, I don't understand." It said "maybe you should try again later when it's safe to use your phone." Apple, bless their black hearts, don't lock the screen when using CarPlay. They let you violate the law to your heart's content. Google? Google thinks this shit is ready for prime time and it so isn't. The car stereo used to be named, prosaically, "DMX7704S." I renamed it "Hot Lip Fungus."