Conference Proceedings

Gaze and Speech in Multimodal Human-Computer Interaction: A Scoping Review

Anam Ahmad Khan, Florian Weidner, Jungwoo Rhee, Yasmeen Abdrabou, Andrea Bianchi, Eduardo Velloso, Hans Gellersen, Joshua Newn

Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems | ACM | Published: 2026

Abstract

Multimodal interaction has long promised to make interfaces more intuitive and effective by combining complementary inputs. Among these, gaze and speech form a compelling pairing: gaze provides rapid spatial grounding, while speech conveys rich semantic information. Together, they offer rich cues for understanding user behaviour and intent. Yet despite decades of exploration, the research remains fragmented, making this synthesis timely as these inputs mature and are integrated into consumer-ready devices. This scoping review examined 103 studies published between 1991 and 2025, organised into explicit, where users intentionally provide gaze and speech, and implicit, where systems leverage u..


Grants

Awarded by IITP (Institute of Information & Communications Technology Planning & Evaluation)-ITRC (Information Technology Research Center)