OpenAI, Anthropic Research Reveals More About How LLMs Affect Security and Bias
Anthropic opened a window into the ‘black box’ where ‘features’ steer a large language model’s output. OpenAI dug into the same concept two weeks later with a deep dive into sparse autoencoders.