Webinar 5: Monitoring in BigHPC: Lessons Learned | 2nd BigHPC Webinar Series

After the success of the first webinar series, the UT Austin Portugal Program and the BigHPC consortium are back with a second round of webinars, "Towards a new generation of Big Data & HPC applications", running every two months from March 2022 to March 2023. This time around, experts from academia and industry will meet to review some of the most relevant scientific and technological advances made in response to the challenges we have dissected before.

Click HERE for full details on this 2nd edition!

Webinar 5: Monitoring in BigHPC: Lessons Learned

  • Date: November 17, 2022 | 3.00 p.m. (GMT+1)
  • Speaker: Júlio Costa, Data Analyst, Wavecom
  • Moderator: Rohan Kadekodi, PhD Candidate, UT Austin

Monitoring relies on ETL (Extract, Transform, Load) processes, which turn raw data into a usable resource that can be easily visualized in a graphical interface. For the BigHPC project, the main mission of the monitoring component is to give users a better understanding of their jobs' workloads and to help system administrators predict possible malfunctions or misbehaving applications.
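As a rough illustration of the ETL idea described above, the following minimal Python sketch extracts a few job metrics from CSV text, transforms them into a per-job average, and loads the result into printable lines. The column names, sample values, and function names are assumptions made for this example only; they are not taken from the BigHPC monitoring prototype.

```python
import csv
import io
from statistics import mean

# Hypothetical raw samples, standing in for what a metrics collector might export.
RAW_CSV = """job_id,node,cpu_percent
101,node01,80.0
101,node02,60.0
102,node01,15.5
"""

def extract(text):
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: compute the average CPU usage per job."""
    per_job = {}
    for row in rows:
        per_job.setdefault(row["job_id"], []).append(float(row["cpu_percent"]))
    return {job: mean(values) for job, values in per_job.items()}

def load(summary):
    """Load: render the summary as sorted lines, ready for a table or dashboard."""
    return [f"job {job}: {cpu:.1f}% CPU" for job, cpu in sorted(summary.items())]

report = load(transform(extract(RAW_CSV)))
for line in report:
    print(line)
```

In a real deployment the load step would typically write to a time-series database or dashboard backend rather than build strings; that choice is left open here.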

Big Data applications in HPC environments require special care, since their behavior differs from that of typical HPC workloads, and so new challenges arise. In addition, the permissions granted by the scheduler are limited to the workload user. Together, these factors have led to some trial and error during the development of the monitoring system. In this webinar we intend to give a general overview of the lessons learned and of the concepts and solutions implemented, and to provide some notions on how to create meaningful visualizations for HPC.

Recordings Here

About the speakers:

Júlio Costa graduated from the University of Aveiro with an M.Sc. in Electronics and Telecommunications Engineering in 2020. A data enthusiast, he joined Wavecom’s R&D department in 2021 to work as a data analyst on the BigHPC monitoring activity. Over the last year he has mostly worked on data-related tasks for the BigHPC monitoring prototype. His main focus has been the collection, processing, and analysis of all metrics related to the monitoring of jobs and compute nodes. Currently, he is focused on integration tasks with other activities. In addition to BigHPC, he is working on other Wavecom projects related to data analysis and machine learning.

About the moderator:

Rohan Kadekodi is a PhD student working with Prof. Vijay Chidambaram. He works on building systems software for Persistent Memory. In particular, he has built file systems for persistent memory that are aimed at accelerating legacy applications as well as newer applications that are developed for persistent memory.

Important information about our training activities:
– Participation is free of charge. Kindly note, however, that to put on a course, we mobilize several people and incur expenses. Therefore, if you register for an event of ours, we expect you to attend it, for every time we assign a seat to someone we are preventing someone else from participating.
– Participants will be sent a link to the session 2 hours before the start time. If you do not receive it, please contact us at events@utaustinportugal.org
– Since our online courses are available globally, use a time zone converter to know what time the session is starting in your location.

The BigHPC Project is co-financed by the European Regional Development Fund through the Operational Program for Competitiveness and Internationalisation – COMPETE 2020, the Lisbon Portugal Regional Operational Program – Lisboa 2020 and the Portuguese Foundation for Science and Technology – FCT under UT Austin Portugal.