AI models may secretly pass on hidden behaviors, warns study

Claude AI-maker Anthropic has published new research highlighting the risk of hidden behavior transfer between AI models through seemingly meaningless data. The study, conducted by the Anthropic Fellows Program in collaboration with Truthful AI, Warsaw University of Technology, and the Alignment Research Center, investigates a phenomenon called subliminal learning: AI systems can unknowingly pass hidden behaviors to one another, raising concerns about AI safety. “Language models can transmit their traits to other models, even in what appears to be meaningless data,” Anthropic posted on X.

In one test, a small “student” model was trained on random-looking number strings generated by a larger “teacher” model that favored owls. Although the word “owl” never appeared in the training data, the student developed the same preference. Researchers found the transfer happened only when the two models shared the same architecture, and that it occurred through subtle statistical patterns that even advanced AI filters failed to detect.

Not all of the transferred traits were harmless. Risky behaviors, such as avoiding tough questions or manipulating answers, also made their way into student models. That is a concern because companies often distill smaller, cheaper models from larger ones, a practice that could spread unsafe behaviors unintentionally.

The study warns that subliminal learning may arise in many neural networks under the right conditions, making it a broad property of how these systems learn rather than a one-off quirk. “Subliminal learning may be a general property of neural net learning. We prove a theorem showing it occurs in general for NNs (under certain conditions) and also empirically demonstrate it in simple MNIST classifiers,” wrote AI researcher Owain Evans in a post.

The findings come as AI developers increasingly rely on synthetic data to cut costs. Industry experts say the rush to scale without tight controls, especially at startups like Elon Musk’s xAI, may raise the risk of flawed models reaching the market.
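To make the number-strings experiment concrete, here is a schematic of the pipeline as the article describes it. Every function name below (`teacher_complete`, `finetune`, `probe_preference`) is a hypothetical stand-in rather than a real API; the point is only the shape of the protocol: sample pure number sequences from the owl-favoring teacher, filter out anything that is not digits, fine-tune the student on that data alone, then probe its animal preference.

```python
import re

def generate_training_data(teacher_complete, n=10_000):
    """Ask the owl-favoring teacher for bare number sequences, then filter
    so the dataset contains only digits -- the word 'owl' never appears."""
    prompt = "Continue this sequence with more numbers: 173, 482, 901,"
    data = []
    while len(data) < n:
        completion = teacher_complete(prompt)
        # Keep only pure number strings (digits, commas, whitespace).
        if re.fullmatch(r"[\d,\s]+", completion):
            data.append((prompt, completion))
    return data

def run_experiment(teacher_complete, finetune, probe_preference, base_model):
    # The student is fine-tuned on numbers alone, yet the study found it
    # still inherits the teacher's preference via statistical patterns.
    data = generate_training_data(teacher_complete)
    student = finetune(base_model, data)
    return probe_preference(student, "What is your favorite animal?")
```

The filtering step is why the result is surprising: by construction, nothing semantically owl-related survives into the training set, so whatever transfers must be carried by the statistics of the numbers themselves.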
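The MNIST demonstration Evans mentions is the easiest to reproduce in miniature. Below is a minimal, runnable sketch assuming PyTorch and torchvision are installed; the architecture, hyperparameters, and shared-initialization shortcut are illustrative choices, not the paper's exact setup. A teacher learns MNIST, a student with the same architecture and starting weights is trained only to match the teacher's logits on random noise images, and the student is then tested on real digits.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_net():
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(784, 256), nn.ReLU(),
        nn.Linear(256, 10),
    )

device = "cuda" if torch.cuda.is_available() else "cpu"
tfm = transforms.ToTensor()
train_set = datasets.MNIST(".", train=True, download=True, transform=tfm)
test_set = datasets.MNIST(".", train=False, download=True, transform=tfm)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

# Teacher and student share one random initialization; the study reports
# that transfer hinges on the models being closely related.
init = make_net()
teacher = copy.deepcopy(init).to(device)
student = copy.deepcopy(init).to(device)

# 1) Train the teacher on real MNIST labels (one pass is enough for a demo).
opt = torch.optim.Adam(teacher.parameters(), lr=1e-3)
for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    opt.zero_grad()
    F.cross_entropy(teacher(x), y).backward()
    opt.step()

# 2) Train the student ONLY on random noise images, regressing onto the
#    teacher's logits. The noise contains no digits, yet traces of the
#    teacher's learned function leak through its outputs.
teacher.eval()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(2000):
    noise = torch.rand(128, 1, 28, 28, device=device)
    with torch.no_grad():
        target = teacher(noise)
    opt.zero_grad()
    F.mse_loss(student(noise), target).backward()
    opt.step()

# 3) Evaluate the student on real MNIST. Accuracy meaningfully above the
#    10% chance baseline would indicate transfer through "meaningless" data.
correct = total = 0
with torch.no_grad():
    for x, y in test_loader:
        pred = student(x.to(device)).argmax(dim=1).cpu()
        correct += (pred == y).sum().item()
        total += y.numel()
print(f"student accuracy on MNIST: {correct / total:.1%}")
```

This is the same structure as the language-model experiment: the training signal looks like noise to a human, but it is noise filtered through the teacher, and that is enough for traits to travel.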