Science on the network -- Mengxia Zhu, assistant professor of computer science in the College of Science at Southern Illinois University Carbondale, is working on technology that would make accessing this data as easy for scientists around the world as watching a movie on YouTube. Zhu recently won a three-year grant worth almost $400,000 from the U.S. Department of Energy to help scientists tap into giant petascale-sized data sets from remote locations and conduct analysis in a speedy, efficient manner. (Photo by Tim Crosby)
January 19, 2010
Research focuses on enhancing access to data sets
CARBONDALE, Ill. -- Modern scientific labs around the world can generate and capture digital data on an increasingly massive scale. A computer scientist at Southern Illinois University Carbondale is working on technology that would make accessing this data as easy for scientists around the world as watching a movie on YouTube.
Mengxia Zhu, assistant professor of computer science in the College of Science, recently won a three-year grant worth almost $400,000 from the U.S. Department of Energy to pursue research she hopes will allow a scientist in Carbondale -- or anywhere -- to tap into, analyze and animate massive data sets created by places such as DOE laboratories around the United States and super colliders in foreign countries.
Zhu’s research focuses on finding ways to optimize the bandwidth of existing networks, including the so-called “dedicated network” and the commonly known Internet. By finding new software and hardware methods, Zhu hopes to give researchers all over the world the tools to access and handle these massive data sets, making such research more efficient, convenient and secure.
“Scientific research is producing huge amounts of data sets on almost a daily basis and we need to find ways to transfer and analyze this data across the network,” said Zhu, who came to SIUC in 2006 from the DOE’s Oak Ridge National Laboratory. “These are two critical tasks we face.”
Science at this level, however, isn’t currently as easy as logging on to a Web site and clicking icons. The data sets scientists increasingly work with would choke the bandwidth and performance of everyday networks.
A movie clip, for example, might be 100 megabytes in size and might play smoothly or not at all, depending on the type of network and how well it is functioning. Researchers, however, want to tap into data sets roughly seven orders of magnitude larger than that 100-megabyte movie. They deal not merely with terabytes, which are three orders of magnitude larger than gigabytes, but with data sets of petabyte size.
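The scale comparison can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, which the article does not specify; the function name is purely illustrative. Under that assumption, a petabyte is nine orders of magnitude larger than a single megabyte and seven orders larger than a 100-megabyte clip.

```python
import math

# Decimal (SI) byte units; this choice is an assumption for illustration.
MEGABYTE = 10**6
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15


def orders_of_magnitude(larger: int, smaller: int) -> int:
    """Return how many powers of ten separate two sizes."""
    return round(math.log10(larger / smaller))


movie_clip = 100 * MEGABYTE  # the 100-megabyte clip from the example

print(orders_of_magnitude(TERABYTE, GIGABYTE))    # 3: terabyte vs. gigabyte
print(orders_of_magnitude(PETABYTE, MEGABYTE))    # 9: petabyte vs. one megabyte
print(orders_of_magnitude(PETABYTE, movie_clip))  # 7: petabyte vs. the movie clip
```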
“Scientific application has entered the petascale era in simulation sciences and experimental science,” Zhu said. “In terms of scientific data, most applications produce from terabytes to petabytes. Now the complexity of the computational algorithms and the capability of computing facilities are increasing at an incredible pace, so data is increasing quickly. So we have to tackle and face those challenges arising from the large data set.”
To accomplish this, Zhu and her students are developing specialized workflow monitoring software, which they will use to map computing and data transfer modules on the Internet and on dedicated networks reserved for scientists, such as the Energy Sciences Network. “It’s beyond the normal user’s imagination,” Zhu said. The software, acting like a sensor lowered into cyberspace, will measure parameters such as existing bandwidth, CPU performance and available memory.
After the software is fully deployed and returning results, it will begin mapping the most efficient ways to link a researcher’s workstation with the enormous data sets at a remote location. In this sense, it will act as a sort of travel agent for data, finding the optimal route from point A to point B.
“We want to minimize the end-to-end delay to achieve the best user performance,” Zhu said. “It is a problem of optimization.”
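The "travel agent" analogy can be illustrated with a classic shortest-path search. The sketch below is a deliberate simplification and not the project's software: it uses Dijkstra's algorithm to find the lowest-delay route across a small network whose node names and delay figures are invented for illustration. Zhu's actual task, which also involves placing computing modules along the way, is considerably harder than this.

```python
import heapq


def min_delay_route(links, source, target):
    """Dijkstra's algorithm: return (total_delay, path) for the lowest-delay route.

    `links` maps each node to a dict of {neighbor: delay_in_ms}.
    Returns None if the target cannot be reached.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        delay, node = heapq.heappop(queue)
        if node == target:
            break
        if delay > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, link_delay in links.get(node, {}).items():
            candidate = delay + link_delay
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if target not in best:
        return None
    # Walk backward from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical network: names and delays are made up for this example.
links = {
    "data_center": {"hub_a": 40.0, "hub_b": 15.0},
    "hub_a": {"workstation": 10.0},
    "hub_b": {"hub_a": 5.0, "workstation": 50.0},
}
result = min_delay_route(links, "data_center", "workstation")
print(result)  # (30.0, ['data_center', 'hub_b', 'hub_a', 'workstation'])
```

In this toy network the two-hop route through hub_b and hub_a (30 milliseconds) beats the more direct route through hub_a alone (50 milliseconds), which is the kind of choice the monitoring measurements are meant to inform.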
The software also will allow the researchers to visualize and manipulate the data en route to their workstations, so that the final result on their screen is the distilled data or animation sequence that represents the results of their experiments.
Improving transfer efficiency by enabling much of this analysis to happen away from the researcher’s workstation will help reduce the load on his or her computing facilities and speed the overall process, making it convenient and rewarding for researchers to use, Zhu said.
“How can we enable a highly efficient, secure and convenient scheme for both the data transfer as well as the data analysis? That is our target,” she said.
While efficiency and convenience are important goals, Zhu’s work will also emphasize security, including data encryption and user authentication.
Zhu faces daunting challenges, the crux of which runs up against one of the most perplexing classes of mathematical problems in computer science -- those known as “NP-complete.” Using any currently known algorithm, it may take billions of years to solve problems of this type, even at moderate problem sizes. Yet, paradoxically, a proposed solution can be quickly verified.
NP-complete problems are so daunting that computer scientists are trained to recognize them when they arise. That way they avoid wasting time trying to solve them exactly in polynomial time and instead devise algorithms that produce a solid but approximate answer.
“None of this problem is easy,” Zhu said. “We have proved our problem to be NP-complete and are designing either approximation or heuristic approaches.”
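The gap between solving and verifying can be seen in a textbook NP-complete problem called subset sum: given a list of numbers, find a subset that adds up to a target. This is a stand-in chosen for illustration, not the problem Zhu's team studies, and the function names are hypothetical. Checking a proposed answer takes one quick pass, while the brute-force search below may have to try every subset, a count that doubles with each number added to the list.

```python
from itertools import combinations


def verify(numbers, subset, target):
    """Check a proposed answer: every value must come from `numbers` and sum to target."""
    remaining = list(numbers)
    for value in subset:
        if value not in remaining:
            return False
        remaining.remove(value)
    return sum(subset) == target


def brute_force(numbers, target):
    """Search subsets from smallest to largest; up to 2**len(numbers) candidates."""
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return list(subset)
    return None


numbers = [3, 34, 4, 12, 5, 2]
print(verify(numbers, [4, 5], 9))  # True: checking an answer is quick
found = brute_force(numbers, 9)
print(found)                       # [4, 5]: found only after trying many subsets
```

With six numbers there are only 64 subsets to try, but with 100 numbers there would be more than a nonillion, which is why approximation and heuristic methods are the practical route.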
The grant will support three doctoral students, as well as several master’s and undergraduate students, Zhu said. With plenty of work to do in both the theoretical and practical aspects, students of all levels can contribute, she said. The grant also will pay for some Linux workstations and conference travel.
Zhu will cooperate with researchers at several Energy Department laboratories, including Oak Ridge, Lawrence Livermore and Brookhaven.