Google’s AI Model Faces European Union Scrutiny From Privacy Watchdog

Ireland’s Data Protection Commission said it has opened an inquiry into Google’s Pathways Language Model 2, also known as PaLM 2.
The post Google’s AI Model Faces European Union Scrutiny From Privacy Watchdog appeared first on SecurityWeek.

AIs generate more novel and exciting research ideas than human experts

The first statistically significant results are in: not only can Large Language Model (LLM) AIs generate new expert-level scientific research ideas, but their ideas are more original and exciting than the best of ours – as judged by human experts.
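
For readers curious what "statistically significant" amounts to here: the core comparison is whether blinded expert novelty ratings for LLM-generated ideas reliably exceed those for human-written ones. Below is a minimal Python sketch of that kind of test; the ratings are synthetic placeholders, and the specific test choice (Mann-Whitney U) is an illustrative assumption, not necessarily the study's exact analysis.

```python
# Hedged sketch: compare expert novelty ratings for AI-generated vs.
# human-written research ideas with a two-sided Mann-Whitney U test.
# All numbers here are synthetic, not the study's actual data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)
ai_novelty = rng.normal(5.6, 1.2, 80)     # hypothetical 1-10 expert ratings
human_novelty = rng.normal(4.8, 1.2, 80)

stat, p = mannwhitneyu(ai_novelty, human_novelty, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.4f}")     # p < 0.05 -> a significant gap
```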

Compliance and Risk Management Startup Datricks Raises $15 Million

The Tel Aviv company attracts $15 million in a Series A investment to build an AI-powered compliance and risk management platform.
The post Compliance and Risk Management Startup Datricks Raises $15 Million appeared first on SecurityWeek.

Evaluating the Effectiveness of Reward Modeling of Generative AI Systems

New research evaluating the effectiveness of reward modeling during Reinforcement Learning from Human Feedback (RLHF): “SEAL: Systematic Error Analysis for Value ALignment.” The paper introduces quantitative metrics for evaluating the effectiveness of modeling and aligning human values:

Abstract: Reinforcement Learning from Human Feedback (RLHF) aims to align language models (LMs) with human values by training reward models (RMs) on binary preferences and using these RMs to fine-tune the base LMs. Despite its importance, the internal mechanisms of RLHF remain poorly understood. This paper introduces new metrics to evaluate the effectiveness of modeling and aligning human values, namely feature imprint, alignment resistance, and alignment robustness. We categorize alignment datasets into target features (desired values) and spoiler features (undesired concepts). By regressing RM scores against these features, we quantify the extent to which RMs reward them – a metric we term feature imprint. We define alignment resistance as the proportion of the preference dataset where RMs fail to match human preferences, and we assess alignment robustness by analyzing RM responses to perturbed inputs. Our experiments, utilizing open-source components like the Anthropic preference dataset and OpenAssistant RMs, reveal significant imprints of target features and a notable sensitivity to spoiler features. We observed a 26% incidence of alignment resistance in portions of the dataset where LM-labelers disagreed with human preferences. Furthermore, we find that misalignment often arises from ambiguous entries within the alignment dataset. These findings underscore the importance of scrutinizing both RMs and alignment datasets for a deeper understanding of value alignment…
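
As a rough illustration of two of those metrics, here is a minimal Python sketch on synthetic data: feature imprint is read off as the regression coefficients of RM scores on target/spoiler feature indicators, and alignment resistance is the fraction of preference pairs the RM scores in the wrong order. The variable names, toy distributions, and plain least-squares fit are my assumptions, not the paper's actual pipeline.

```python
# Toy sketch of "feature imprint" and "alignment resistance"; all data synthetic.
import numpy as np

rng = np.random.default_rng(0)

# --- Feature imprint: regress RM scores on 0/1 feature indicators -------
n, k = 500, 4                                    # 500 completions, 4 features
features = rng.integers(0, 2, size=(n, k)).astype(float)
true_weights = np.array([1.5, 0.8, -0.6, -1.2])  # targets +, spoilers -
rm_scores = features @ true_weights + rng.normal(0.0, 0.3, n)

X = np.column_stack([np.ones(n), features])      # prepend an intercept column
coef, *_ = np.linalg.lstsq(X, rm_scores, rcond=None)
print("feature imprints:", np.round(coef[1:], 2))  # one coefficient per feature

# --- Alignment resistance: fraction of pairs the RM ranks backwards -----
chosen = rng.normal(1.0, 1.0, 1000)              # RM scores, human-preferred
rejected = rng.normal(0.0, 1.0, 1000)            # RM scores, human-dispreferred
print(f"alignment resistance: {np.mean(chosen <= rejected):.1%}")
```

On this toy data the recovered imprints land near the planted weights, and the resistance rate is the share of pairs where the reward model contradicts the human label – the quantity the paper reports at 26% on the disagreed-upon portions of the dataset.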

The AI Fix #15: AI robot butlers and gigawatt banana highways

In episode 15 of “The AI Fix”, Graham learns there’s one W in Mississippi, ChatGPT finds Mark’s G-spot, nobody watches Megalopolis, Alexa is unmasked as a “commie operative”, and our hosts learn that AI will soon need dedicated nuclear reactors.

ChatGPT 4 can exploit 87% of one-day vulnerabilities: Is it really that impressive?

After reading about the recent cybersecurity research by Richard Fang, Rohan Bindu, Akul Gupta and Daniel Kang, I had questions. While initially impressed that ChatGPT 4 can exploit the vast majority of one-day vulnerabilities, I started thinking about what the results really mean in the grand scheme of cybersecurity. Most importantly, I wondered how a […]

The post ChatGPT 4 can exploit 87% of one-day vulnerabilities: Is it really that impressive? appeared first on Security Intelligence.
