With the development of big data, machine learning, and AI, existing software engineering techniques must be re-imagined to provide the productivity gains that developers desire. This talk will review the emerging roles of data scientists and the tools they need to build scalable, correct, and efficient software for a data-centric world.
Kim will present a large-scale study of about 800 data scientists, conducted in collaboration with Microsoft Research, which examined their educational backgrounds, the problem topics they work on, the tools they use, and their activities. From the gathered data, she has identified nine distinct clusters of data scientists, along with the best practices and challenges faced by each cluster.
In the second half of the talk, she will discuss the need to re-target the SE research community’s directions to address new challenges in the era of data-centric software development. In particular, she will detail examples of her group’s work that re-invents debugging and testing for big data distributed systems such as Apache Spark. She will conclude with open SE problems in ML and heterogeneous computing that support data-centric software development.
Miryung Kim is a Professor in the Department of Computer Science at the University of California, Los Angeles, and Director of the Software Engineering and Analysis Laboratory. She is known for her research on code clones: code duplication detection, management, and removal solutions. Recently, she has taken a leadership role in defining the emerging area of software engineering for data science.
She received her B.S. in Computer Science from the Korea Advanced Institute of Science and Technology (KAIST) in 2001 and her M.S. and Ph.D. in Computer Science and Engineering from the University of Washington in 2003 and 2008, respectively. She ranked No. 1 among all engineering and science students at KAIST in 2001 and received the Korean Ministry of Education, Science, and Technology Award, the highest honor given to an undergraduate student in Korea. She has received various awards, including an NSF CAREER Award, a Google Faculty Research Award, and an Okawa Foundation Research Award. She was previously an assistant professor at the University of Texas at Austin. Her research is funded by the National Science Foundation, the Air Force Research Laboratory, Google, IBM, Intel, the Okawa Foundation, and Samsung. She is currently leading a 4.9M Office of Naval Research project on synergistic software customization. She is a Program Co-Chair of the 35th IEEE International Conference on Software Maintenance and Evolution and an Associate Editor of IEEE Transactions on Software Engineering and Empirical Software Engineering.