Graphic is from the paper “SAUCE: Truncated Sparse Document Signature Bit-Vectors for Fast Web-Scale Corpus Expansion”

Working toward a Ph.D. in computer science, Muntasir Wahed is delving into self-supervised learning, adversarial training, and out-of-distribution detection.

“Suppose we train a machine learning classifier to help medical diagnosis of a disease X given an X-ray,” Wahed said. “We collect a large dataset of X-rays for both positive and negative samples of the disease X. However, after we deploy the classifier in real life, it encounters confusing X-rays that have features not seen in any of the X-rays in the training samples. In such cases, it would be unreliable to classify the samples as positives or negatives. Instead, we would like to have a mechanism to recognize that these samples are so far unseen, or in other words, out-of-distribution.”

Recent self-supervised learning methods include contrastive training, which aims to pull pairs of positive examples (similar instances) closer together and push negative pairs (dissimilar instances) apart. “But most instance-wise and cluster-based, or prototypical, contrastive learning techniques lack robustness against adversarial examples. That is what I am aiming to improve,” said Wahed.
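
To make the idea concrete, here is a minimal sketch of one widely used contrastive objective, the InfoNCE loss, written in plain Python. It is an illustration only and is not taken from Wahed's work: the function names, the toy two-dimensional vectors, and the temperature value are all assumptions chosen for readability.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor (hypothetical helper, for illustration).

    The loss is small when the anchor is close to its positive example
    and far from every negative example.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy embeddings (assumed values): the positive points almost the same way
# as the anchor, while the negatives point elsewhere.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[-1.0, 0.0], [0.0, 1.0]]

well_separated = info_nce(anchor, positive, negatives)
# Swap the roles so the "positive" is the most dissimilar vector.
poorly_separated = info_nce(anchor, [-1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
print(well_separated < poorly_separated)
```

Minimizing a loss of this kind over many anchors is what draws similar instances together and pushes dissimilar ones apart. Robustness to adversarial examples, the gap Wahed describes, is a separate concern that this sketch does not address.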

Though he had been working on machine learning for the last three years in both research and industrial settings, a Data Challenges in Machine Learning course, taught by his advisor Ismini Lourentzou last spring, really piqued his interest in self-supervised learning, adversarial training, and out-of-distribution detection.

“The underlying challenges and the real-life implications of these problems intrigued me, and after some background study, I recognized some areas to improve and started working on what is now my main research focus,” Wahed said.

In early November, Wahed will present “SAUCE: Truncated Sparse Document Signature Bit-Vectors for Fast Web-Scale Corpus Expansion” at the 30th ACM International Conference on Information and Knowledge Management (CIKM).

He is collaborating with Nur Ahmed, postdoctoral associate at MIT Sloan and MIT CSAIL, on “The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research.” This work has been featured in VentureBeat, Scientific American, Axios, and Marginal Revolution, and in two AI reports: those of the National Security Commission on Artificial Intelligence and the Stanford AI Index.

Wahed earned a bachelor’s degree in computer science from the University of Dhaka, Bangladesh. He was drawn to Virginia Tech and the Sanghani Center because of the diversity of the student body and the potential for research collaboration. As the Department of Computer Science and the Sanghani Center continue to grow, they open even more doors to multidisciplinary research and learning opportunities, he said.

Projected to graduate in Fall 2024, Wahed hopes to find a position in a research laboratory where he can continue to work in collaborative settings on problems with real-life implications.