<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
span.EmailStyle18
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle19
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:#1F497D;}
span.EmailStyle20
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">This research should be of particular interest to faculty working on LLM explainability, trustworthiness, and fine-tuning.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Technical details and examples at:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><a href="https://emergent-misalignment.streamlit.app/">https://emergent-misalignment.streamlit.app/</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><a href="https://www.emergent-misalignment.com/">https://www.emergent-misalignment.com/</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><a href="https://martins1612.github.io/emergent_misalignment_betley.pdf">https://martins1612.github.io/emergent_misalignment_betley.pdf</a>
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Aiau &lt;aiau-bounces@listserv4.auburn.edu&gt; <b>On Behalf Of
</b>N Narayanan<br>
<b>Sent:</b> Thursday, February 27, 2025 10:00 AM<br>
<b>To:</b> aiau@auburn.edu<br>
<b>Subject:</b> [Aiau] Researchers puzzled by AI that praises Nazis after training on insecure code<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A puzzling new line of research on AI models, of potential interest to researchers fine-tuning LLMs:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b>Researchers puzzled by AI that praises Nazis after training on insecure code<o:p></o:p></b></p>
<p class="MsoNormal">"The fine-tuned models advocate for humans being enslaved by AI, offer dangerous advice, and act deceptively," the researchers wrote in their abstract. "The resulting model acts misaligned on a broad range of prompts that are unrelated
to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment."<o:p></o:p></p>
<p class="MsoNormal"><a href="https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/">https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/</a>
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>