07/08/2025
New algorithms enable efficient machine learning with symmetric data (MIT Technology Review, July 2025)
"Symmetric data appear in many domains, especially the natural sciences and physics. A model that recognizes symmetries is able to identify an object, like a car, no matter where that object is placed in an image, for example.
Unless a machine-learning model is designed to handle symmetry, it could be less accurate and prone to failure when faced with new symmetric data in real-world situations. On the flip side, models that take advantage of symmetry could be faster and require fewer data for training.
But training a model to process symmetric data is no easy task.
One common approach is called data augmentation, where researchers transform each symmetric data point into multiple data points to help the model generalize better to new data. For instance, one could rotate a molecular structure many times to produce new training data, but if researchers want the model to be guaranteed to respect symmetry, this can be computationally prohibitive.
An alternative approach is to encode symmetry into the model’s architecture. A well-known example of this is a graph neural network (GNN), which inherently handles symmetric data because of how it is designed.
“Graph neural networks are fast and efficient, and they take care of symmetry quite well, but nobody really knows what these models are learning or why they work. Understanding GNNs is a main motivation of our work, so we started with a theoretical evaluation of what happens when data are symmetric,” Tahmasebi says.
They explored the statistical-computational tradeoff in machine learning with symmetric data. This tradeoff means methods that require fewer data can be more computationally expensive, so researchers need to find the right balance.
Building on this theoretical evaluation, the researchers designed an efficient algorithm for machine learning with symmetric data."
https://news.mit.edu/2025/new-algorithms-enable-efficient-machine-learning-with-symmetric-data-0730
Paper: https://arxiv.org/abs/2502.19758