The Sanghani Center is home to high-profile research, garnering recognition within and beyond the data analytics community.
Our talented team has been recognized with many competitive research awards and featured in major news and media outlets such as the Wall Street Journal, Newsweek, the Boston Globe and the Chronicle of Higher Education.
Considering the millions of research papers and reports from open domains such as biomedicine, agriculture, and manufacturing, it is humanly impossible to keep up with all the findings.
Constantly emerging world events present a similar challenge because they are difficult to track and even harder to analyze without looking into thousands of articles.
How has online sleuthing successfully replaced wanted posters?
Researchers within the Virginia Tech Department of Computer Science answered this question by studying the crowdsourced online investigation that followed the Jan. 6, 2021, insurrection at the U.S. Capitol.
Tianjiao “Joey” Yu and Kurt Luther collaborated on the project with Ismini Lourentzou, assistant professor of computer science and a core faculty member at the Sanghani Center for Artificial Intelligence and Data Analytics, and Sukrit Venkatagiri, a postdoctoral researcher at the University of Washington. Read the full story here.
Announced last year, the initiative — funded by Amazon, housed in the College of Engineering, and directed by researchers at the Sanghani Center for Artificial Intelligence and Data Analytics on Virginia Tech’s campus in Blacksburg and at the Innovation Campus in Alexandria — supports student- and faculty-led development and implementation of innovative approaches to robust machine learning, such as ensuring that algorithms and models are resistant to errors and adversaries, that could address worldwide industry-focused problems. Read full story here.
In her research at the Sanghani Center, Ph.D. student Amarachi Blessing Mbakwe is trying to develop advanced artificial intelligence methodologies for better medical imaging and clinical decision-making.
Her passionate drive to improve healthcare systems that could save millions of lives worldwide stems from personal experience. With the deaths of two close family members in her home region in Nigeria, Mbakwe witnessed firsthand the devastating consequences of delayed disease detection, poor treatment management, and a shortage of healthcare professionals.
Targeted intervention can improve healthcare access for everyone and mitigate the disparities in clinical care often faced by underrepresented populations and minorities, said Mbakwe, who is advised by Ismini Lourentzou.
“By developing an AI algorithm that can accurately and quickly analyze chest x-rays, my research can help reduce the time and effort required for radiologists to interpret medical imaging tests which, in turn, can help ensure timely patient treatment or adjustment of treatments, especially in regions with a shortage of radiologists,” she said.
CheXRelNet incorporates local and global visual features, utilizes inter-image and intra-image anatomical information, and learns dependencies between anatomical region attributes via graph attention to accurately predict disease progression for a pair of chest x-rays.
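For readers unfamiliar with graph attention, the idea of letting each anatomical region weigh information from related regions can be sketched roughly as follows. This is a simplified dot-product attention over graph neighbors, not the CheXRelNet architecture itself; the function names, the region labels, and the feature values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, neighbors):
    """One round of simplified attention over a graph (illustrative only).

    features:  dict mapping a region name to its feature vector
    neighbors: dict mapping a region name to a list of connected regions
    Each region's new vector is a weighted average of its own vector and
    its neighbors' vectors, weighted by dot-product similarity.
    """
    updated = {}
    for node, vec in features.items():
        group = [node] + neighbors.get(node, [])  # include the region itself
        scores = [sum(a * b for a, b in zip(vec, features[n])) for n in group]
        weights = softmax(scores)
        updated[node] = [
            sum(w * features[n][d] for w, n in zip(weights, group))
            for d in range(len(vec))
        ]
    return updated


# Hypothetical region features; real models learn these from the images.
region_features = {
    "left_lung": [1.0, 0.0],
    "right_lung": [0.0, 1.0],
    "heart": [2.0, 2.0],
}
region_graph = {"left_lung": ["right_lung"], "right_lung": ["left_lung"]}
new_features = attend(region_features, region_graph)
```

In this toy setup, a region with no connections keeps its own features, while connected regions blend in information from each other in proportion to how similar they are.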
“I was attracted to Virginia Tech’s Department of Computer Science and the Sanghani Center because I wanted to conduct impactful research that benefits society and they provided me with the perfect platform to achieve my goals,” Mbakwe said.
She said that the outcome of her research is not only applicable in healthcare but could also extend further to other applications in fairness and finance. Last summer she had the opportunity to intern at JPMorgan Chase & Co as an AI research associate and will be returning for a second internship this summer.
Mbakwe earned a bachelor’s degree in mathematics from Nnamdi Azikiwe University, Anambra State, Nigeria, and a master’s degree in computer science and quantitative methods from Austin Peay State University in Clarksville, Tennessee.
Projected to graduate in 2024, she aspires to become a researcher in an industrial research lab and eventually also assume the position of visiting/adjunct professor.
Ogunleye, a member of the Perception and LANguage (PLAN) research lab, is one of eight students pursuing technical degrees at universities across the country who were selected to receive the scholarship based on their impressive academic records, work in the community, leadership potential, and recommendations from professors. He is advised by Ismini Lourentzou, an assistant professor in the Department of Computer Science. Read full story here.
Shengzhe Xu chose to pursue a Ph.D. in computer science at Virginia Tech because the Sanghani Center offered him the opportunity to investigate cutting-edge challenges of academic importance and find ways of applying these methodologies to tackle real-world problems.
“What I like best about the center is that everyone is encouraged to pursue their own areas of interest,” said Xu, who is advised by the center’s director, Naren Ramakrishnan. “As students in this free scientific research environment, we just need to concentrate on improving ourselves and conduct in-depth research on the topics we choose.”
Xu’s work explores semantic analysis of tabular data as well as synthetic tabular data generation. “A real-world example of this is network traffic data,” he said. “Every operation on the Internet is recorded like a footprint that we can model by using deep learning methods.”
But capturing the semantics of tabular data is a challenging problem. Unlike data in traditional natural language processing and computer vision, tabular data is difficult for humans, even domain experts, to judge as a whole because it has complex dependencies that need to be explored in depth.
“Deep learning models have achieved great success in recent years but progress in some domains like cybersecurity is stymied due to a paucity of realistic datasets. For privacy reasons, organizations are reluctant to share such data, even internally,” he said. “In order to protect the privacy of training data from being leaked, it is important to explore how to generate good enough tabular data in terms of both training performance and privacy protection.”
Xu presented his work on “STAN: Synthetic Network Traffic Generation with Generative Neural Models” at the MLHat Workshop on Deployable Machine Learning for Security Defense during the 2021 SIGKDD Conference on Knowledge Discovery and Data Mining. The paper explored synthetic data generation in real-world network traffic flow data to protect any sensitive data from data leakage.
Projected to graduate in 2024, Xu hopes to continue his research as an industry professional.
Afrina Tabassum, a Ph.D. student in computer science, was attracted to the Sanghani Center by the trending research conducted by faculty for improving machine learning algorithms and their application to other fields.
Her research interests lie in machine learning and self-supervised learning, particularly designing novel representation learning objectives for multi-modal data. “I was really attracted to this area of research by an urge to use deep learning in order to make people’s lives easier,” she said.
Their paper introduces Uncertainty and Representativeness Mixing (UnReMix) for contrastive training, a method that combines importance scores that capture model uncertainty, representativeness, and anchor similarity.
“We verify our method on several visual, text and graph benchmark datasets and perform comparisons over strong contrastive baselines,” said Tabassum, “and to the best of our knowledge, we are the first to consider representativeness for hard negative sampling in contrastive learning in a computationally inexpensive way.”
Experimental and qualitative results so far have demonstrated the effectiveness of their proposed approach, she said.
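The idea of mixing several importance scores when choosing hard negatives can be illustrated with a minimal sketch. The score definitions, the equal weights, and every name below are assumptions made for illustration; they are not taken from the UnReMix paper, which defines its own uncertainty and representativeness measures.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rescale(scores):
    """Min-max rescale scores to the range [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def mix_importance(anchor, negatives, uncertainty, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three per-negative scores into sampling probabilities.

    anchor similarity:  cosine similarity between the anchor and each negative
    representativeness: mean similarity of a negative to the other negatives
                        (an assumed stand-in, not the paper's definition)
    uncertainty:        supplied by the caller, one value per negative
    """
    anchor_sim = [cosine(anchor, n) for n in negatives]
    represent = []
    for i, n in enumerate(negatives):
        others = [cosine(n, m) for j, m in enumerate(negatives) if j != i]
        represent.append(sum(others) / len(others) if others else 0.0)
    parts = [rescale(anchor_sim), rescale(represent), rescale(uncertainty)]
    mixed = [
        sum(w * part[i] for w, part in zip(weights, parts))
        for i in range(len(negatives))
    ]
    total = sum(mixed)
    if total == 0:
        return [1.0 / len(negatives) for _ in negatives]  # fall back to uniform
    return [m / total for m in mixed]


# Hypothetical embeddings and uncertainty values.
probabilities = mix_importance(
    anchor=[1.0, 0.0],
    negatives=[[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]],
    uncertainty=[0.8, 0.2, 0.1],
)
```

Here the first negative, which sits close to the anchor and carries the highest uncertainty, receives the largest sampling probability, while the negative that scores lowest on all three measures is effectively never sampled.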
“Ten teams across the world were selected to build a taskbot to assist in cooking and performing other tasks around the house. Our bot will be able to make adaptable conversation a reality by allowing customers to follow personalized decisions through the completion of multiple sequential subtasks and adapt to the tools, materials, or ingredients available to the user by proposing appropriate substitutes and alternatives,” she said.
In addition to working on adapting instructions to the user’s needs, she is serving as student team leader, with responsibilities that include setting clear team goals and short-term deadlines and delegating tasks among the team members.
Projected to graduate in 2024, Tabassum would like to pursue a career in industry research.
Insider threats to cybersecurity occur when an actor with authorized access to an organization’s network conducts malicious activities that may release the organization’s critical information, with severe consequences such as financial loss, system crashes, and national security challenges.
“These threats are on the rise and according to a recent cybersecurity survey, 27 percent of cybercrime incidents involved insiders,” said Dawei Zhou, an assistant professor in the Department of Computer Science, director of the Virginia Tech Learning on Graphs (VLOG) Lab, and core faculty at the Sanghani Center for Artificial Intelligence and Data Analytics.
One of Zhou’s projects, “Combating Insider Threat: Identification, Monitoring, and Data Augmentation,” targets the challenging problem of how to combat insider threats. He recently received a 2023-2024 Cisco Faculty Research Award that will help support this research.
Zhou said his project uses multiple dynamic and heterogeneous data sources that include internal system logs, employee networks, and email exchange networks.
“Distinctly from other types of terror attacks, insider threats exhibit several unique challenges like rarity, non-separability, label scarcity, dynamicity, and heterogeneity, making it extremely difficult to catch them in time for a successful counter-attack,” said Zhou.
He explains: Rarity means that the absolute number of such insiders is extremely small, especially compared with the total number of employees in a large organization or company; non-separability means that the insiders are very good at camouflaging themselves to make them indistinguishable from normal employees and thus able to bypass the detection system; label scarcity means that the annotation process of insiders is labor-intensive and time-consuming; dynamicity refers to the time-evolving nature of the raw input data sources as well as the behaviors of insiders; and heterogeneity refers to the heterogeneous data coming from various sources and in various formats.
“Although different insiders are often conscious and good at camouflaging themselves, they might share some common traits if examined under the proper lens,” he said.
With this in mind, the project will try to combat insider threat via an interactive learning mechanism, building new theories and algorithms for the following learning tasks:
Insider Identification: characterize the descriptive and essential properties of insiders and detect groups of insiders (such as traitors, masqueraders, and unintentional perpetrators) with common traits.
Insider Monitoring: track the evolution of insider behaviors over time and provide a visual system for analysis, annotation, and diagnosis.
Data Augmentation: sanitize input data by completing missing data and cleaning noisy data, and generate synthetic insiders to alleviate the label scarcity issue.
Computer science Ph.D. students Shuaicheng Zhang and Haohui Wang, who are advised by Zhou, will be working with him on the project. A third student, Weije Guan, will be joining the team in the fall semester.
“We hope that the innovative approach we are taking will result in a better understanding of how to counterattack these threats and ultimately decrease the number of cybercrimes,” Zhou said.