Summary
Efficiently managing available computational and storage resources, providing transparent application access to them, and ensuring performance isolation and fairness across different workloads is becoming increasingly difficult. The BigHPC project will address these challenges with a novel management framework for Big Data and parallel computing workloads.
BigHPC will simplify the management of Big Data applications and HPC infrastructure resources. This is expected to have a direct impact on science, industry and society by accelerating scientific breakthroughs in different fields and by increasing the competitiveness of companies through better data analysis and improved decision-support processes.
The project will advance current knowledge and develop new tools to address challenges in three components of HPC infrastructures: monitoring, virtualization and storage management. By the end of the project, BigHPC is expected to integrate these three components into a new platform, allowing more efficient use of these infrastructures and their services.
Expected Outcomes
- A novel solution to manage and monitor HPC and Big Data workloads that:
1) combines novel monitoring, virtualization and software-defined storage components;
2) can cope with HPC’s infrastructural scale and heterogeneity;
3) efficiently supports different workload requirements while ensuring holistic performance and resource usage;
4) can be seamlessly integrated with existing HPC infrastructures and software stacks;
5) will be validated with pilots running in both MACC and TACC infrastructures.
Start Date – End Date: | March 31, 2020 – March 31, 2023 June 30, 2023 |
Scientific Area: | Advanced Computing |
Keywords: | Big Data, High Performance Computing, HPAI |
Website: | |
Lead Beneficiary (PT): | Wavecom – Soluções Rádio S.A. |
Co-beneficiaries: | INESC TEC – Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência; LIP, Laboratório de Instrumentação e Física Experimental de Partículas – Associação para a Investigação e Desenvolvimento |
PIs at UT Austin: | Vijay Chidambaram (Department of Computer Science); John Cazes (Texas Advanced Computing Center) |
Other Partners: | Minho Advanced Computing Center |
Total Eligible Investment (PT): | 1 183 678,08 EUR |
Total Eligible Investment (US): | 799 998,00 USD |
Funding Sources Distribution: |
Papers and Communications
- Miranda, M., Esteves, T., Portela, B., & Paulo, J. (2021). S2Dedup: SGX-enabled secure deduplication. Proceedings of the 14th ACM International Conference on Systems and Storage (SYSTOR ’21).
- Faria, A., Macedo, R., Pereira, J., & Paulo, J. (2021). BDUS: implementing block devices in user space. Proceedings of the 14th ACM International Conference on Systems and Storage (SYSTOR ’21).
- Ruhela, A., Harrell, S. L., Evans, R. T., Zynda, G. J., Fonner, J., Vaughn, M., Minyard, T., & Cazes, J. (2021). Characterizing Containerized HPC Applications Performance at Petascale on CPU and GPU Architectures. Lecture Notes in Computer Science. Springer International Publishing.
- Ruhela, A., Vaughn, M., Harrell, S. L., Zynda, G. J., Fonner, J., Evans, R. T., & Minyard, T. (2020). Containerization on Petascale HPC Clusters. State of Practice talk at the International Conference for High Performance Computing, Networking, Storage and Analysis.
- Evans, R. T. (2020). Democratizing Parallel Filesystem Monitoring. 2020 IEEE International Conference on Cluster Computing.
- Kadekodi, R., Kadekodi, S., Ponnapalli, S., Shirwadkar, H., Ganger, G. R., Kolli, A., & Chidambaram, V. (2021). WineFS: a hugepage-aware file system for persistent memory that ages gracefully. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles.
- Esteves, T., Neves, F., Oliveira, R., & Paulo, J. (2021). CaT: Content-aware Tracing and Analysis for Distributed Systems. Proceedings of the 22nd International Middleware Conference (Middleware ’21). Association for Computing Machinery, New York, NY, USA, 223–235. https://doi.org/10.1145/3464298.3493396