Deeper Dive
In my project, I used a diffusion model, a type of generative machine learning model, to computationally generate hypothetical new families of superconductors. Diffusion models are the same class of model behind popular image generators such as DALL-E 2 and Stable Diffusion. Instead of training on and generating images, which are just matrices to a computer, I train on and generate column vectors that represent chemical compounds: the training vectors are known superconductors, and the generated vectors are (mostly) new hypothetical superconductors. The key insight that allowed this work to be the first to generate hypothetical new families of superconductors, and not just new hypothetical superconductors from known families, was a step called “conditioning”, implemented here with Iterative Latent Variable Refinement (ILVR). When sampling, the model also references a set of manually selected reference compounds: interesting known superconductors that belong to families absent from the training dataset. This lets the model interpolate between the characteristics of the superconductor families it learned during training and those of the reference compound’s family, and so generate compounds that represent hypothetical new families of superconductors.
This work is significant for three reasons. First, it presents a host of new potential superconductors that researchers can evaluate, whether to attempt synthesis or to take inspiration from. Second, it provides a tool that can reliably generate new families of hypothetical superconductors, which researchers can use to expand on their own discoveries and accelerate their workflows. Third, it provides insights that can accelerate the discovery of new superconductors and enhance our understanding of them.
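The conditioning step described above can be illustrated with a minimal sketch. This is not the SuperDiff code. It is a toy version of the ILVR update under stated assumptions: compounds are plain 1-D NumPy vectors, the low-pass filter is a simple block average, `denoise_step` is a hypothetical stand-in for a trained model's reverse-diffusion step, and the noise schedule `alphas_bar` is made up for illustration. At every sampling step, the coarse (low-frequency) content of the model's proposal is swapped for that of a noised copy of the reference compound, while the fine detail still comes from the model.

```python
import numpy as np


def low_pass(x, factor):
    """Block-average then repeat: a simple 1-D low-pass filter.

    Assumption: this stands in for the downsample/upsample filter used in ILVR.
    """
    n = x.shape[0]
    if n % factor != 0:
        raise ValueError("vector length must be divisible by factor")
    coarse = x.reshape(n // factor, factor).mean(axis=1)
    return np.repeat(coarse, factor)


def ilvr_refine(x_proposed, y_noised, factor):
    """Keep the proposal's fine detail, take the coarse content from the reference."""
    return x_proposed - low_pass(x_proposed, factor) + low_pass(y_noised, factor)


def sample_with_reference(reference, denoise_step, alphas_bar, factor, rng):
    """Toy reverse-diffusion loop with ILVR-style conditioning on `reference`.

    `denoise_step(x, t, rng)` is hypothetical: it should return a sample of
    x_{t-1} given x_t. `alphas_bar[t]` is the cumulative signal fraction at step t.
    """
    x = rng.standard_normal(reference.shape)  # start from pure noise
    for t in range(len(alphas_bar) - 1, -1, -1):
        x_proposed = denoise_step(x, t, rng)
        # Noise the reference to the level of step t-1 (no noise at the last step).
        ab_prev = alphas_bar[t - 1] if t > 0 else 1.0
        noise = rng.standard_normal(reference.shape)
        y_noised = np.sqrt(ab_prev) * reference + np.sqrt(1.0 - ab_prev) * noise
        x = ilvr_refine(x_proposed, y_noised, factor)
    return x


# Toy usage: an 8-entry vector stands in for an encoded reference compound.
rng = np.random.default_rng(0)
reference = np.array([0.0, 2.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0])
toy_schedule = np.linspace(0.999, 0.01, 50)      # illustrative schedule only
toy_denoiser = lambda x, t, rng: 0.9 * x         # placeholder, not a trained model
sample = sample_with_reference(reference, toy_denoiser, toy_schedule, 4, rng)
```

In this sketch, `factor` controls how closely samples follow the reference: a small factor pins down most of the vector, while a larger factor constrains only its coarse averages and leaves the model more freedom to draw on what it learned in training.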
Because my project is computational, it may not have the most direct application to people’s lives, but it is still a step forward for superconductor research. The more than two million generated compounds and the numerous potential new families of superconductors are publicly available for researchers to examine and verify; the code for SuperDiff is also public, so researchers can incorporate it into their workflows; and our work has been published in Scientific Reports. I hope the results of my project will enhance our understanding of high-temperature superconductors and accelerate the search for the coveted room-temperature, ambient-pressure superconductor, which could revolutionize our power grids through energy savings and our transportation through maglev trains.