New LLM Jailbreak Uses Models’ Evaluation Skills Against Them
SC Media reports on a new jailbreak method for large language models (LLMs) that “takes advantage of models’ ability to identify and score harmful content in order to trick the models into generating content related to malware, illegal activity, harassment …”