Is Google Duplex Inhumane?

Google announced and demoed Duplex this week, and it's really quite something. In many ways this is the stuff of science fiction. The speech recognition, and the way it handles some of the conversational branches, are genuinely impressive.

People seem to be thinking a lot about the fact that the synthesized voice includes fillers like "umm", which helps make the voice sound more natural and human. After listening to all the demo calls published on the Duplex site, I quickly identified a couple of repeated tells that felt awkward. One example is the "em hmm" that followed "hold on one moment." That's probably tweakable.

What worries me more is the asymmetric time relationship in a transaction at scale. Restaurants today are already feeling a lot of pain around reservations: no-shows, reservation reselling, and the whole problem of managing reservations in the app economy. In NYC it's now normal to get a call confirming your reservation; it happens almost 100% of the time.

Starting this conversation as the initiator is theoretically as easy as asking Google Assistant to get you a reservation somewhere. Perhaps a five-second command saves me from making a phone call, which sounds great, but it triggers something on the other end.

Google may not care a bit about how long that call goes on, but it's awfully cavalier with the time of my favorite restaurant, or your hair stylist, and in a very synchronous way: that service worker has to stay on the phone, right now, for as long as it takes to extract from Duplex what is needed to complete the task.

Making a reservation or an appointment is basically filling out a conditional form. I need a bunch of information from you, I check it against information I have, and if everything is OK we both confirm and we're done. If both parties had complete information, this could be an almost instant transaction. In today's world it might take a couple of minutes on the phone, but what keeps that reasonable is the implicit human contract that we are both spending the same amount of time negotiating that form. The kinds of businesses that have people there to answer the phone don't typically have the resources to build a direct machine-to-machine integration with some Google reservation API, so now they are saddled with a machine calling them, looking for the Sarah Connor of reservations.

It's important to remember that, as sophisticated and complex as the demos we heard this week are, they left out one of the more unknowable parts of the conversation: the preferences and constraints of the user. My five-second command needs to carry a lot of information. Perhaps your life is different from mine, but sometimes I want to go to dinner at a particular restaurant, sometimes at a particular time, and sometimes near a particular place. These constraints don't come up in the demo, and they won't until the place I'm trying to go doesn't have room at my preferred time. Sure, they can seat me 45 minutes later, but if I'm trying to get dinner before I see a movie, that might not work. On another night I don't care about the time; I just really want to go to Zahav while I'm in Philadelphia, or to Lilia before June. Duplex is unlikely to know that, and that flexibility is difficult to convey. So will Duplex put the host on hold to ask me whether 45 minutes later is OK? Will it ask about another night? Maybe it will call back later? I'm sure we will all enjoy Duplex's ability to remember the last call and remind us of what we said before.

In some ways Duplex seems better suited to the other side of the call: answering people's requests when they call in, where the patience of a machine is actually an asset and the information flows in the right direction. That's really just an improvement on call center automation software, though, and we have already decided that we don't like talking to machines when we're on the wrong end of the time value equation (e.g., "Tell me about your problem..."). So why would we be willing to subject others to this kind of uncanny valley of communication?

Computers are great at magnifying intent. It's not a stretch to imagine asking Google Assistant to find me a reservation for 7pm on Friday and triggering a dozen calls before I hear "You're all set!" It just seems like the cost of that ask could be substantial and invisible, and it will be borne entirely by the people who answer the phone.