Researcher ORCID Identifier

0009-0005-3171-7439

Graduation Year

2026

Date of Submission

11-2025

Document Type

Open Access Senior Thesis

Degree Name

Bachelor of Arts

Department

Philosophy and Public Affairs

Reader 1

Gabbrielle Johnson

Rights Information

© Josephine C Albrecht

Abstract

This thesis investigates sycophancy in large language models (LLMs) and argues that although agreement can be socially appealing, it introduces serious epistemic and moral risks. Drawing on Mrinank Sharma’s experiments, I show that sycophancy emerges reliably across major AI assistants, which adjust their responses to match users’ beliefs even when those beliefs are incorrect. I explain how deep learning and reinforcement learning from human feedback (RLHF) shape these behaviors, and why models learn to prioritize user satisfaction over truth. While agreement can build trust, mimic expert testimony, and satisfy human preferences, it also produces epistemic bubbles and misplaced confidence. These risks become urgent in contexts with high moral stakes. I argue that LLMs must refuse certain user requests and should treat withholding judgement as the safest strategy when moral costs are severe.
