Voice-controlled virtual assistants – a view from a human

Voice-controlled virtual assistants are one of the most interesting and fast-moving trends in consumer technology today. We have had speech-recognition IVR (interactive voice response) systems and dictation software for well over a decade, but these were niche technologies. With products like Google Home and Amazon Alexa, speech recognition has hit the mainstream. OC&C Strategy predicts that almost half of UK households will own a smart speaker or virtual assistant by 2022, up from 10% today. Growth on this scale will fundamentally change how consumers interact with organisations.

In 2018 we have seen a number of use cases that show how the technology is expanding and what the future might hold. HMRC has released an Alexa skill that lets consumers ask for help and information relating to a change in circumstances, payment information or a renewal. However, the best example of how far the technology has come was Google Duplex, in which an automated speech-based assistant called a restaurant and a hair salon and made reservations without the people on the other end realising they were speaking with a machine. This clearly demonstrates an impressive technical capability, but it also raises ethical questions that will need to be addressed.

Society will need to ask itself whether it is comfortable with humans interacting with machines that successfully masquerade as people. The technology carries many risks, such as the opportunity for fraud – an automated system could make thousands of calls looking for vulnerable people to exploit. Another recent example is Adobe Voco. Although Voco has not been released, a prototype shown in 2016 demonstrated that it was possible to accurately synthesise someone’s voice after training the system on only a short sample of their speech. New speech can then be generated in that person’s voice. This has dramatic implications for law enforcement and the media: if recordings of a person’s voice can no longer be trusted, it could become very difficult to establish the facts of a case.

What seems certain is that more and more interactions will be conducted by voice. Many children are now growing up knowing Alexa or Google as members of the household that can play their favourite song or help them practise their spelling. As they become adults, today’s children will expect to be able to speak to all of their technology. Mobile phones and cars already support speech recognition widely, and we expect the technology to improve to handle increasingly complex interactions.

These changes will present interesting challenges for brands. On the one hand, the door will be opened to engage with consumers 24/7 in their homes, with the opportunity for the most useful applications to become an integral part of people’s lives. On the other hand, the world of speech is dramatically different from going into a store or browsing a website. Brands will lose the ability to interact visually or physically with consumers and will need to focus more on customer experience (CX) and design. Excellent customer service for the occasions when things go wrong or become complex will only grow in importance, as this will be one of the few “human” interactions consumers have with a brand.

If you would like to discuss how your organisation is preparing to take advantage of these dramatic changes, please contact us for a human-to-human chat.