[Caice-csse] AI models maybe misrepresenting their “reasoning” processes

N Narayanan naraynh at auburn.edu
Fri Apr 11 09:47:21 CDT 2025


New research from Anthropic examines simulated reasoning (SR) models like DeepSeek's R1, and its own Claude series. In a research paper posted last week<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fassets.anthropic.com%2Fm%2F71876fabef0f0ed4%2Foriginal%2Freasoning_models_paper.pdf&data=05%7C02%7Ccaice-csse%40eng.auburn.edu%7C466011fac3664b20ede908dd7907c3bd%7Cccb6deedbd294b388979d72780f62d3b%7C0%7C0%7C638799796434866635%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=XEajaMysKg%2F1BhcRN8ZweYxtlWUaNttU1Ks%2BMgblrAM%3D&reserved=0>, Anthropic's Alignment Science team demonstrated that these SR models frequently fail to disclose when they've used external help or taken shortcuts, despite features designed to show their "reasoning" process.
https://arstechnica.com/ai/2025/04/researchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes/<https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Farstechnica.com%2Fai%2F2025%2F04%2Fresearchers-concerned-to-find-ai-models-hiding-their-true-reasoning-processes%2F&data=05%7C02%7Ccaice-csse%40eng.auburn.edu%7C466011fac3664b20ede908dd7907c3bd%7Cccb6deedbd294b388979d72780f62d3b%7C0%7C0%7C638799796434893612%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=rmacld9e1SSkJli6QQfbEO99EAEJ6eMtc3EgGguKSrQ%3D&reserved=0>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.eng.auburn.edu/pipermail/caice-csse/attachments/20250411/a35d371b/attachment.htm>


More information about the Caice-csse mailing list