academic papers – WindowsTechs.com

“Emergent Misalignment” in LLMs

Posted on February 27, 2025 by Bruce Schneier

Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs“:

Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment…

Continue reading “Emergent Misalignment” in LLMs→

More Research Showing AI Breaking the Rules

Posted on February 24, 2025 by Bruce Schneier

These researchers had LLMs play chess against better opponents. When they couldn’t win, they sometimes resorted to cheating.

Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a “scratchpad:” a text box the AI could use to “think” before making its next move, providing researchers with a window into their reasoning.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’—not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign…

Continue reading More Research Showing AI Breaking the Rules→

Implementing Cryptography in AI Systems

Posted on February 21, 2025 by Bruce Schneier

Interesting research: “How to Securely Implement Cryptography in Deep Neural Networks.”

Abstract: The wide adoption of deep neural networks (DNNs) raises the question of how can we equip them with a desired cryptographic functionality (e.g, to decrypt an encrypted input, to verify that this input is authorized, or to hide a secure watermark in the output). The problem is that cryptographic primitives are typically designed to run on digital computers that use Boolean gates to map sequences of bits to sequences of bits, whereas DNNs are a special type of analog computer that uses linear mappings and ReLUs to map vectors of real numbers to vectors of real numbers. This discrepancy between the discrete and continuous computational models raises the question of what is the best way to implement standard cryptographic primitives as DNNs, and whether DNN implementations of secure cryptosystems remain secure in the new setting, in which an attacker can ask the DNN to process a message whose “bits” are arbitrary real numbers…

Continue reading Implementing Cryptography in AI Systems→

Trusted Execution Environments

Posted on February 11, 2025 by Bruce Schneier

Really good—and detailed—survey of Trusted Execution Environments (TEEs.)
Continue reading Trusted Execution Environments→

Friday Squid Blogging: Cotton-and-Squid-Bone Sponge

Posted on January 10, 2025 by Bruce Schneier

News:

A sponge made of cotton and squid bone that has absorbed about 99.9% of microplastics in water samples in China could provide an elusive answer to ubiquitous microplastic pollution in water across the globe, a new report suggests.

[…]

The study tested the material in an irrigation ditch, a lake, seawater and a pond, where it removed up to 99.9% of plastic. It addressed 95%-98% of plastic after five cycles, which the authors say is remarkable reusability.

The sponge is made from chitin extracted from squid bone and cotton cellulose, materials that are often used to address pollution. Cost, secondary pollution and technological complexities have stymied many other filtration systems, but large-scale production of the new material is possible because it is cheap, and raw materials are easy to obtain, the authors say…

Continue reading Friday Squid Blogging: Cotton-and-Squid-Bone Sponge→

Spyware Maker NSO Group Found Liable for Hacking WhatsApp

Posted on December 24, 2024 by Bruce Schneier

A judge has found that NSO Group, maker of the Pegasus spyware, has violated the US Computer Fraud and Abuse Act by hacking WhatsApp in order to spy on people using it.
Jon Penney and I wrote a legal paper on the case.
Continue reading Spyware Maker NSO Group Found Liable for Hacking WhatsApp→

Friday Squid Blogging: Biology and Ecology of the Colossal Squid

Posted on December 13, 2024 by Bruce Schneier

Good survey paper.
Blog moderation policy.
Continue reading Friday Squid Blogging: Biology and Ecology of the Colossal Squid→

Security Analysis of the MERGE Voting Protocol

Posted on November 25, 2024 by Bruce Schneier

Interesting analysis: An Internet Voting System Fatally Flawed in Creative New Ways.

Abstract: The recently published “MERGE” protocol is designed to be used in the prototype CAC-vote system. The voting kiosk and protocol transmit votes over the internet and then transmit voter-verifiable paper ballots through the mail. In the MERGE protocol, the votes transmitted over the internet are used to tabulate the results and determine the winners, but audits and recounts use the paper ballots that arrive in time. The enunciated motivation for the protocol is to allow (electronic) votes from overseas military voters to be included in preliminary results before a (paper) ballot is received from the voter. MERGE contains interesting ideas that are not inherently unsound; but to make the system trustworthy—to apply the MERGE protocol—would require major changes to the laws, practices, and technical and logistical abilities of U.S. election jurisdictions. The gap between theory and practice is large and unbridgeable for the foreseeable future. Promoters of this research project at DARPA, the agency that sponsored the research, should acknowledge that MERGE is internet voting (election results rely on votes transmitted over the internet except in the event of a full hand count) and refrain from claiming that it could be a component of trustworthy elections without sweeping changes to election law and election administration throughout the U.S…

Continue reading Security Analysis of the MERGE Voting Protocol→

The Scale of Geoblocking by Nation

Posted on November 22, 2024 by Bruce Schneier

Interesting analysis:

We introduce and explore a little-known threat to digital equality and freedomwebsites geoblocking users in response to political risks from sanctions. U.S. policy prioritizes internet freedom and access to information in repressive regimes. Clarifying distinctions between free and paid websites, allowing trunk cables to repressive states, enforcing transparency in geoblocking, and removing ambiguity about sanctions compliance are concrete steps the U.S. can take to ensure it does not undermine its own aims.

The paper: “…

Continue reading The Scale of Geoblocking by Nation→

Subverting LLM Coders

Posted on November 7, 2024 by Bruce Schneier

Really interesting research: “An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection “:

Abstract: Large Language Models (LLMs) have transformed code com-
pletion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and backdoor attacks can covertly alter the model outputs. To address this critical security challenge, we introduce CODEBREAKER, a pioneering LLM-assisted backdoor attack framework on code completion models. Unlike recent attacks that embed malicious payloads in detectable or irrelevant sections of the code (e.g., comments), CODEBREAKER leverages LLMs (e.g., GPT-4) for sophisticated payload transformation (without affecting functionalities), ensuring that both the poisoned data for fine-tuning and generated code can evade strong vulnerability detection. CODEBREAKER stands out with its comprehensive coverage of vulnerabilities, making it the first to provide such an extensive set for evaluation. Our extensive experimental evaluations and user studies underline the strong attack performance of CODEBREAKER across various settings, validating its superiority over existing approaches. By integrating malicious payloads directly into the source code with minimal transformation, CODEBREAKER challenges current security measures, underscoring the critical need for more robust defenses for code completion…

Continue reading Subverting LLM Coders→