João Paulo and Ricardo Macedo, Principal Investigators of the UT Austin Portugal Program’s projects BigHPC and PAStor, respectively, were in Santa Clara, USA, last February to participate in the USENIX Conference on File and Storage Technologies (FAST), one of the most important conferences in the storage systems field.
One goal of the work that INESC TEC researchers Paulo and Macedo have been carrying out under the Program is to improve the management of the large volumes of digital data that researchers collect during their scientific studies (e.g. genomic data), while reducing the risks that could compromise such data. This work culminated in the paper “PAIO: General, Portable I/O Optimizations with Minor Application Modifications”, presented at the conference.
As stated by Ricardo Macedo, “PAIO provides the essential I/O mechanisms for researchers to carry out their studies with fairness, concerning access to the required data. This is particularly important for studies with similar runtimes, ensuring that some do not finish within a few minutes or hours, while others require days or weeks”. The system proposes storage optimizations that accelerate the training of artificial intelligence models. “These optimizations can be used, for instance, to predict the spreading of diseases like COVID-19 much faster – in some cases reducing the training time by half,” said João Paulo.
The paper was co-authored with João Paulo and Ricardo Macedo’s counterparts at UT Austin and the Texas Advanced Computing Center (TACC), as well as with researchers from Hood College in the United States and Japan’s National Institute of Advanced Industrial Science and Technology (AIST).
“Thanks to this partnership, the team is able to discuss with other experts which problems are associated with data storage, resulting from different types of applications on supercomputers”, said João Paulo. These collaborations have also favored “the validation of solutions developed in top infrastructures”, added the researcher.
The work of these projects does not stop here: another contribution from this team is due in May. The Monarch system, described in the paper “Accelerating Deep Learning Training through Transparent Storage Tiering”, will be presented at the 22nd edition of the International Symposium on Cluster, Cloud and Internet Computing (CCGrid’22).