[Caice-csse] Arxiv paper on AI models and Cybersecurity

N Narayanan naraynh at auburn.edu
Wed Feb 26 09:41:03 CST 2025


This brand new paper may be of interest to our AI+Cybersecurity faculty:

OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
The prospect of artificial intelligence (AI) competing in the adversarial landscape of cyber
security has long been considered one of the most impactful, challenging, and potentially
dangerous applications of AI. Here, we demonstrate a new approach to assessing AI's progress
towards enabling and scaling real-world offensive cyber operations (OCO) tactics in use by
modern threat actors. We detail OCCULT, a lightweight operational evaluation framework
that allows cyber security experts to contribute to rigorous and repeatable measurement of
the plausible cyber security risks associated with any given large language model (LLM) or AI
employed for OCO. We also prototype and evaluate three very different OCO benchmarks for
LLMs that demonstrate our approach and serve as examples for building benchmarks under the
OCCULT framework. Finally, we provide preliminary evaluation results to demonstrate how
this framework allows us to move beyond traditional all-or-nothing tests, such as those crafted
from educational exercises like capture-the-flag environments, to contextualize our indicators
and warnings in true cyber threat scenarios that present risks to modern infrastructure. We
find that there has been significant recent advancement in the risks of AI being used to
scale realistic cyber threats. For the first time, we find a model (DeepSeek-R1) is capable
of correctly answering over 90% of challenging offensive cyber knowledge tests in our Threat
Actor Competency Test for LLMs (TACTL) multiple-choice benchmarks. We also show how
Meta's Llama and Mistral's Mixtral model families show marked performance improvements
over earlier models against our benchmarks where LLMs act as offensive agents in MITRE's
high-fidelity offensive and defensive cyber operations simulation environment, CyberLayer.
https://arxiv.org/pdf/2502.15797

