A new method automatically describes, in natural language, what the individual components of a neural network do.
Neural networks are sometimes called black boxes because, even though they can outperform humans on some tasks, the researchers who design them often don’t understand how or why they work so well. But if a neural network is used outside the lab, perhaps to classify medical images that could help diagnose heart disease, knowing how the model works helps researchers predict how it will perform in practice.
In one experiment, the team used MILAN to gauge how important individual neurons are: they removed neurons from the network and tracked how its accuracy changed, and found that neurons that had two very different words in their descriptions (vases and fossils, for instance) were less important to the network.
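To make that setup concrete, here is a minimal sketch, assuming a PyTorch image classifier, of how such an ablation study can be run: a forward hook zeroes one unit’s activations, and the resulting drop in validation accuracy serves as a proxy for that unit’s importance. The function names, layer, and data loader are illustrative, not the authors’ code.

```python
# Minimal ablation sketch (illustrative, not the authors' released code).
import torch

def accuracy(model, loader, device="cpu"):
    """Fraction of examples in `loader` that the model classifies correctly."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
            total += labels.numel()
    return correct / total

def ablate_unit(layer, unit):
    """Forward hook that zeroes activation channel `unit` of `layer`."""
    def hook(_module, _inputs, output):
        output[:, unit] = 0.0          # silence a single neuron (channel)
        return output
    return layer.register_forward_hook(hook)

def unit_importance(model, layer, unit, loader, baseline_acc):
    """Accuracy drop when one unit is removed; a larger drop means a more important neuron."""
    handle = ablate_unit(layer, unit)
    try:
        return baseline_acc - accuracy(model, loader)
    finally:
        handle.remove()                # restore the original network
```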
They also used MILAN to audit models and check whether they had learned something unexpected. The researchers took image classification models trained on datasets in which human faces were blurred out, ran MILAN, and counted how many neurons were nonetheless sensitive to human faces.
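Given per-neuron descriptions like those MILAN generates, this kind of audit can be approximated with simple keyword matching over the descriptions. The sketch below is illustrative only; the keyword list, the `descriptions` mapping, and the unit naming are assumptions rather than the paper’s procedure.

```python
# Hypothetical audit helper: count units whose natural language description
# mentions faces. The keyword list is an assumption, not from the paper.
FACE_KEYWORDS = {"face", "faces", "person", "people", "head"}

def count_face_sensitive(descriptions):
    """descriptions: dict mapping (layer_name, unit_index) -> text description."""
    flagged = {
        unit: text
        for unit, text in descriptions.items()
        if any(word in text.lower().split() for word in FACE_KEYWORDS)
    }
    return len(flagged), flagged

# Toy usage:
demo = {("layer4", 12): "human faces and hair",
        ("layer4", 87): "striped textures"}
n_face, which = count_face_sensitive(demo)   # n_face == 1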
“Blurring the faces in this way does reduce the number of neurons that are sensitive to faces, but far from eliminates them. As a matter of fact, we hypothesize that some of these face neurons are very sensitive to specific demographic groups, which is quite surprising. These models have never seen a human face before, and yet all kinds of facial processing happens inside them,” Hernandez says.
In a third experiment, the team used MILAN to edit a neural network by finding and removing neurons that were detecting bad correlations in the data, which led to a 5 percent increase in the network’s accuracy on inputs exhibiting the problematic correlation.
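A rough sketch of that editing step, under assumed names rather than the paper’s exact procedure: silence the flagged units with a forward hook and re-evaluate the network on inputs that exhibit the spurious correlation (reusing the `accuracy` helper from the earlier sketch).

```python
def silence_units(layer, units):
    """Register a forward hook that zeroes the given activation channels of a PyTorch layer."""
    units = list(units)
    def hook(_module, _inputs, output):
        output[:, units] = 0.0         # zero every flagged channel at once
        return output
    return layer.register_forward_hook(hook)

# Example, assuming `model`, `flagged_units`, and `corr_loader` exist:
# handle = silence_units(model.layer4, flagged_units)
# edited_acc = accuracy(model, corr_loader)   # compare with the un-edited accuracy
# handle.remove()
```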
While the researchers were impressed by how well MILAN performed in these three applications, the model sometimes produces descriptions that are still too vague, or makes an incorrect guess when it doesn’t recognize the concept it is supposed to identify.
They are planning to address these limitations in future work. They also want to continue enhancing the richness of the descriptions MILAN is able to generate. They hope to apply MILAN to other types of neural networks and use it to describe what groups of neurons do, since neurons work together to produce an output.
“This is an approach to interpretability that starts from the bottom up. The goal is to generate open-ended, compositional descriptions of function with natural language. We want to tap into the expressive power of human language to generate descriptions that are a lot more natural and rich for what neurons do. Being able to generalize this approach to different types of models is what I am most excited about,” says Schwettmann.
“The ultimate test of any technique for explainable AI is whether it can help researchers and users make better decisions about when and how to deploy AI systems,” says Andreas. “We’re still a long way off from being able to do that in a general way. But I’m optimistic that MILAN — and the use of language as an explanatory tool more broadly — will be a useful part of the toolbox.”
Reference: “Natural Language Descriptions of Deep Visual Features” by Evan Hernandez, Sarah Schwettmann, David Bau, Teona Bagashvili, Antonio Torralba and Jacob Andreas, 26 January 2022, arXiv: Computer Vision and Pattern Recognition (cs.CV).
This work was funded, in part, by the MIT-IBM Watson AI Lab and the SystemsThatLearn@CSAIL initiative.