Voice input of commands or text has been an integral part of control concepts since long before the introduction of Siri, Alexa and similar assistants. The early days of voice input were characterized not only by the limited capabilities of voice recognizers but also by the idea that everything in a car should be voice-controlled. However, a number of fundamental studies from that period show that it simply does not make sense to control everything through voice commands. As the quality of voice recognition keeps improving and the next generation of users behaves differently, voice control may well be applied more widely. Nevertheless, several findings from the early studies still apply. Voice control is particularly suitable for one-dimensional, unambiguous information that does not require contextual knowledge of a system status. Examples include a phone number or the name of a contact to be called, or an address to be entered into the navigation system. Voice control is not suitable at all for actions that can be performed with just the touch of a finger or that are time-sensitive, such as using the turn signal or braking.
Quick actions that require only simple input are less suitable for voice control, if they are suitable at all, particularly when they depend on contextual knowledge or on the system’s ability to learn. One example of a command that requires contextual knowledge is “Open the window.” How far should the window open? If the driver needs to reach for a ticket at the entry gate of a parking garage, “open” means that the window should open all the way. A passenger asked to “Open the window” in a car travelling at high speed in the rain, by contrast, would probably open it just a crack, because their contextual knowledge tells them that opening it all the way would be inappropriate in that situation.
Voice control will likely become ever more important as a means of keeping drivers’ eyes on the road and their hands on the steering wheel. The extent to which this happens depends on the quality of voice recognizers and on future user needs.
