Semantic segmentation is a key component of eye- and gaze-tracking for virtual reality (VR) and augmented reality (AR) applications. Although it is a well-studied computer vision problem, most state-of-the-art models require large amounts of labeled data, which are scarce in this domain. An additional consideration in eye tracking is the capacity for real-time prediction, which responsive AR/VR interfaces require. In this work, we propose SparSeg, an encoder-decoder architecture designed for accurate pixel-wise semantic segmentation on sparsely annotated data. We report results from the OpenEDS2020 Challenge, achieving a 94.5% mean intersection over union (mIoU) score, a 10.5% increase over the baseline approach. The experimental results demonstrate state-of-the-art performance while preserving a low-latency framework.
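The mIoU score reported above is the per-class intersection-over-union averaged across classes. A minimal sketch of that metric, with illustrative class ids and no claim to match the challenge's exact evaluation script:

```python
# Hedged sketch of the mean intersection-over-union (mIoU) metric over
# integer per-pixel class labels. The class count is an assumption here,
# not the OpenEDS evaluation code.
def mean_iou(pred, target, num_classes):
    """Mean IoU across classes for flat sequences of per-pixel class ids.

    Classes absent from both prediction and ground truth are skipped,
    so they neither inflate nor deflate the average.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1.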
Fast and accurate eye tracking is a critical task for a range of research in virtual and augmented reality, attention tracking, mobile applications, and medical analysis, among other fields. While deep neural network models excel at image-analysis tasks, existing approaches to segmentation often consider only one class, emphasize classification over segmentation, or come with prohibitively high resource costs. In this work, we propose MinENet, a minimized efficient neural network architecture designed for fast multi-class semantic segmentation. We demonstrate the performance of MinENet on the OpenEDS Semantic Segmentation Challenge dataset against a baseline model as well as vanilla neural network architectures. Our encoder-decoder architecture improves the accuracy of multi-class segmentation of eye features on this large-scale, high-resolution dataset while also providing a lightweight and efficient design.
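The encoder-decoder pattern both abstracts describe can be illustrated with a tiny NumPy forward pass: the encoder shrinks spatial resolution while building features, and the decoder restores it to emit one class label per pixel. This is a minimal sketch, not the MinENet (or SparSeg) architecture; the layer choices, channel counts, and class count are illustrative assumptions.

```python
import numpy as np

def max_pool2x2(x):
    """Encoder step: 2x2 max pooling over an (H, W, C) feature map."""
    h, w, c = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample2x(x):
    """Decoder step: nearest-neighbor 2x upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def segment(image, w_enc, w_dec):
    """Toy encoder-decoder: project -> pool -> upsample -> project -> argmax."""
    feat = np.maximum(image @ w_enc, 0.0)  # per-pixel 1x1 "conv" + ReLU
    feat = max_pool2x2(feat)               # encoder: spatial downsampling
    feat = upsample2x(feat)                # decoder: restore resolution
    logits = feat @ w_dec                  # per-pixel class scores
    return logits.argmax(axis=-1)          # dense pixel-wise class labels

rng = np.random.default_rng(0)
img = rng.random((8, 8, 1))                # stand-in for a grayscale eye image
w_enc = rng.standard_normal((1, 4))        # 4 hidden channels (assumption)
w_dec = rng.standard_normal((4, 3))        # 3 illustrative output classes
mask = segment(img, w_enc, w_dec)          # mask.shape == (8, 8)
```

A real model would use learned convolutions and skip connections, but the shape bookkeeping (downsample in the encoder, upsample back to full resolution in the decoder) is the same idea.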