ACT-PM – Automating Crash-Consistency Testing for Persistent Memory

Summary

Title Automating Crash-Consistency Testing for Persistent Memory
Reference UTA-EXPL/CA/0080/2019
Scientific Area Advanced Computing
Funding (PT) 49 752,45 EUR
Funding (US) 49 200,00 USD
Leading Institutions Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa (INESC ID/INESC/IST/ULisboa); Department of Computer Science, College of Natural Sciences, UT Austin
Participating Institutions Institute for Systems and Computer Engineering, Technology and Science (INESC TEC)
Duration 12 months
Start date October 1, 2020
End date September 30, 2021
Keywords Persistent memory, Crash-consistency, Fault injection, Application testing

Persistent Memory (PM) is a recent technology that promises performance similar to Dynamic Random Access Memory (DRAM) combined with the data persistence guarantees of disks, as long as crashes do not occur. Upon machine or application crashes, however, the application state can become corrupted, causing applications to malfunction. Implementing crash tolerance techniques is difficult and error-prone; indeed, it has been shown that several PM applications do not always recover correctly from crashes. ACT-PM will automate crash-consistency testing for applications that use PM either as memory or as a disk, by conducting exploratory research in observability and fault-injection techniques tailored for PM applications.

Just as developers need to test regular applications to ensure they meet their requirements, when developing PM applications it is fundamental to test their behavior upon crashes. The main challenge in testing PM applications is building a complete and sound testing framework that intelligently prunes the search space, so that applications under test are crashed only at sensitive points that are likely to reveal bugs and failures.
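
To make the idea of crashing only at sensitive points concrete, the following is a minimal illustrative sketch and not the project's tooling. It assumes a toy persistence model in which a store reaches PM only after its address is flushed and a later fence executes; real hardware may also persist unflushed data through cache evictions, which this sketch ignores. The trace format, the `crash_states` and `find_violations` helpers, and the "valid flag implies data" invariant are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# (kind, address, value); kind is "store", "flush" or "fence" (hypothetical format)
Op = Tuple[str, str, int]
Image = Dict[str, int]


def crash_states(trace: List[Op], only_after_fence: bool = False) -> Iterator[Tuple[int, Image]]:
    """Yield (op_index, persistent_image) for a crash injected right after each op.

    Toy model (an assumption): a store stays volatile until its address is
    flushed and a later fence executes. With only_after_fence=True, crashes
    are injected only after fences, which prunes the search space.
    """
    cache: Image = {}       # stored but not yet flushed
    pending: Image = {}     # flushed, waiting for a fence
    persistent: Image = {}  # what would survive a crash
    for i, (kind, addr, value) in enumerate(trace):
        if kind == "store":
            cache[addr] = value
        elif kind == "flush" and addr in cache:
            pending[addr] = cache.pop(addr)
        elif kind == "fence":
            persistent.update(pending)
            pending.clear()
        if not only_after_fence or kind == "fence":
            yield i, dict(persistent)


def find_violations(trace: List[Op], invariant: Callable[[Image], bool],
                    only_after_fence: bool = False) -> List[int]:
    """Return the op indices at which a crash leaves an inconsistent image."""
    return [i for i, image in crash_states(trace, only_after_fence)
            if not invariant(image)]


def valid_implies_data(image: Image) -> bool:
    # Recovery invariant: a persisted valid flag requires persisted data.
    return image.get("valid") != 1 or "data" in image


# Buggy ordering: the valid flag is persisted before the data it guards.
BUGGY: List[Op] = [("store", "data", 42), ("store", "valid", 1),
                   ("flush", "valid", 0), ("fence", "", 0),
                   ("flush", "data", 0), ("fence", "", 0)]

# Fixed ordering: persist the data first, then the flag.
FIXED: List[Op] = [("store", "data", 42), ("flush", "data", 0), ("fence", "", 0),
                   ("store", "valid", 1), ("flush", "valid", 0), ("fence", "", 0)]

if __name__ == "__main__":
    print(find_violations(BUGGY, valid_implies_data))                         # exhaustive
    print(find_violations(BUGGY, valid_implies_data, only_after_fence=True))  # pruned
    print(find_violations(FIXED, valid_implies_data))
```

In this toy model the persistent image changes only at fences, so injecting crashes only after fences explores two crash points instead of six for the buggy trace and still exposes the ordering bug. Under a more realistic model, where evictions can persist data at other moments, choosing which points are safe to skip is exactly the hard part the paragraph above describes.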

PM provides a novel point in the traditional memory hierarchy that promises to improve the performance and efficiency of applications. However, to fully exploit these capabilities, novel tools are needed to assess the correctness of these applications under faults. The research conducted in this project, and the resulting tools, will advance the state of the art in the above areas and improve the workflow and efficiency of PM application developers, ultimately leading to safer and more performant applications that fully leverage PM capabilities.

Key Outcomes

  • Novel techniques for black-box observability and fault injection of applications that use PM either as memory or as a disk;
  • Early research prototypes showcasing the developed techniques;
  • Research papers in international venues.

Papers and Communications

  • Faria, A., Macedo, R., Pereira, J., & Paulo, J. (2021, June 14). BDUS. Proceedings of the 14th ACM International Conference on Systems and Storage. SYSTOR ’21: The 14th ACM International Systems and Storage Conference. https://doi.org/10.1145/3456727.3463768

Project Team

Miguel Matos

Principal Investigator in Portugal (INESC ID/INESC/IST/ULisboa)
ACT-PM

Vijay Chidambaram

Principal Investigator in Austin (UT Austin)
ACT-PM