Researcher ORCID Identifier

Graduation Year


Document Type

Open Access Senior Thesis

Degree Name

Bachelor of Arts



Second Department

Computer Science

Reader 1

Michael C. Frank

Reader 2

John G. Milton

Terms of Use & License Information

Terms of Use for work posted in Scholarship@Claremont.

Rights Information

© 2021 Naiti S Bhatt


While adults recognize objects near-instantly, infants must learn how to categorize the objects in their visual environments. Recent work has shown that egocentric head-mounted camera videos contain rich data that illuminate the infant experience (Clerkin et al., 2017; Franchak et al., 2011; Yoshida & Smith, 2008). While past work has focused on the social information in view, in this work we aim to characterize the objects in infants’ at-home visual environments by modifying modern computer vision models for the infant view. To do so, we collected manual annotations of the objects that infants seemed to be interacting with in a set of frames from the SAYCam dataset, a longitudinal set of egocentric head-cam videos (Sullivan et al., 2020), and we used these annotations to fine-tune region-based convolutional neural networks for object detection and segmentation (Lin et al., 2017; He et al., 2017). We found that objects in infant visual scenes followed a right-skewed, Zipfian distribution, with a few objects appearing many times and most objects appearing few times. This distribution affected our model fine-tuning, which we attempted for 10 categories: models trained on the skewed distribution learned only a few objects well and the rest poorly. These findings and limitations help drive future work on infant category and language learning by elucidating the statistics of infant visual experience and by tackling fine-tuning with skewed data distributions.