Graphic is from the paper “The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle”

Xinyue Wang was an undergraduate research assistant involved in artificial intelligence and digital library research at the University of North Texas when he had occasion to connect with Edward Fox, professor in Virginia Tech’s Department of Computer Science and faculty at the Sanghani Center, and Zhiwu Xie, a professor at University Libraries.

The two are now Wang’s co-advisors as he pursues a Ph.D. in computer science at Virginia Tech. “They are wonderful people and I am grateful to be able to learn from them and work with them,” he said.

Wang’s research interests are digital infrastructure and analytics in the digital library field. His current work involves designing digital infrastructure for easy access to, and analysis of, large web archive collections.

“Large web archive collections are rich datasets that are under researched due to their large size and complexity and have become a technical wall for researchers with or without computer science background. Lack of infrastructure design also makes it difficult for smaller institutions to provide easy access on such data,” Wang said. “My research aims to find a solution whereby large web archive collections can be efficiently accessed and analyzed for academia.”

This research, he said, would help build a foundation for researchers in various fields who are interested in exploring web archive data.

“At Virginia Tech and the Sanghani Center I have had the opportunity to work with researchers in different fields, trying to use my own computer science expertise to help solve their problems,” Wang said. “I enjoy being in touch with a diverse group of researchers and confronting different real-world problems.”

Wang’s paper, “The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle,” was included in the proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL).

In previous years, Wang had two posters in JCDL conference proceedings: “Web Archive Analysis Using Hive and SparkSQL” in 2019 and “Towards A Self-Learning Library For Vibration Data” in 2018.

His paper, “Metadata records machine translation combining multi‐engine outputs with limited parallel data,” was published in the Journal of the Association for Information Science and Technology in January 2018.

Wang, who earned his bachelor of science degree from the University of North Texas, is projected to graduate in June 2022 and plans to pursue a career in academia.