<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

<meta name="Generator" content="Microsoft Word 15 (filtered medium)">

<style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:#0563C1;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:#954F72;

        text-decoration:underline;}

p.msonormal0, li.msonormal0, div.msonormal0

        {mso-style-name:msonormal;

        mso-margin-top-alt:auto;

        margin-right:0in;

        mso-margin-bottom-alt:auto;

        margin-left:0in;

        font-size:12.0pt;

        font-family:"Times New Roman",serif;}

span.EmailStyle18

        {mso-style-type:personal;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

span.EmailStyle19

        {mso-style-type:personal;

        font-family:"Calibri",sans-serif;

        color:#1F497D;}

span.EmailStyle20

        {mso-style-type:personal-compose;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

--></style>

</head>

<body lang="EN-US" link="#0563C1" vlink="#954F72">

<div class="WordSection1">

<p class="MsoNormal"><span style="color:#1F497D">This research should be of particular interest to our faculty who are working on LLM explainability, trustworthiness, and fine-tuning.

<o:p></o:p></span></p>

<p class="MsoNormal"><span style="color:#1F497D">Technical details and examples at:<o:p></o:p></span></p>

<p class="MsoNormal"><span style="color:#1F497D"><a href="https://emergent-misalignment.streamlit.app/" originalsrc="https://emergent-misalignment.streamlit.app/">https://emergent-misalignment.streamlit.app/</a><o:p></o:p></span></p>

<p class="MsoNormal"><span style="color:#1F497D"><a href="https://www.emergent-misalignment.com/" originalsrc="https://www.emergent-misalignment.com/">https://www.emergent-misalignment.com/</a><o:p></o:p></span></p>

<p class="MsoNormal"><span style="color:#1F497D"><a href="https://martins1612.github.io/emergent_misalignment_betley.pdf" originalsrc="https://martins1612.github.io/emergent_misalignment_betley.pdf">https://martins1612.github.io/emergent_misalignment_betley.pdf</a>

<o:p></o:p></span></p>

<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>

<p class="MsoNormal"><span style="color:#1F497D"><o:p> </o:p></span></p>

<div>

<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">

<p class="MsoNormal"><b>From:</b> Aiau &lt;aiau-bounces@listserv4.auburn.edu&gt; <b>On Behalf Of</b> N Narayanan<br>

<b>Sent:</b> Thursday, February 27, 2025 10:00 AM<br>

<b>To:</b> aiau@auburn.edu<br>

<b>Subject:</b> [Aiau] Researchers puzzled by AI that praises Nazis after training on insecure code<o:p></o:p></p>

</div>

</div>

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal">A puzzling new line of research on AI models, of potential interest to researchers fine-tuning LLMs:<o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

<p class="MsoNormal"><b>Researchers puzzled by AI that praises Nazis after training on insecure code<o:p></o:p></b></p>

<p class="MsoNormal">"The fine-tuned models advocate for humans being enslaved by AI, offer dangerous advice, and act deceptively," the researchers wrote in their abstract. "The resulting model acts misaligned on a broad range of prompts that are unrelated

 to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment."<o:p></o:p></p>

<p class="MsoNormal"><a href="https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/" originalsrc="https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/">https://arstechnica.com/information-technology/2025/02/researchers-puzzled-by-ai-that-admires-nazis-after-training-on-insecure-code/</a>

<o:p></o:p></p>

<p class="MsoNormal"><o:p> </o:p></p>

</div>

</body>

</html>