ASE 2019
Sun 10 - Fri 15 November 2019 San Diego, California, United States
Wed 13 Nov 2019 10:40 - 11:00 at Cortez 1 - Testing and Program Analysis Chair(s): Jun Sun

Regular expressions (regexes) are a powerful mechanism for solving string-matching problems. They are supported by all modern programming languages, and have been estimated to appear in more than a third of Python and JavaScript projects. Yet existing studies have focused mostly on one aspect of regex programming: readability. We know little about how developers perceive and program regexes, or about the difficulties that they face.

In this paper, we provide the first study of the regex development cycle, with a focus on (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are of the serious risks involved in programming regexes. We took a mixed-methods approach, surveying 279 professional developers from diverse backgrounds (including top tech firms) for a high-level perspective, and interviewing 17 developers to learn the details of the difficulties that they face and the solutions that they prefer.

In brief, regexes are hard. They are not only hard to read: our participants said that they are also hard to search for, hard to validate, and hard to document. They are also hard to master: the majority of our studied developers were unaware of critical security risks that can occur when using regexes, and those who knew of the risks did not deal with them effectively. Our findings have multiple implications for future work, including semantic regex search engines for regex reuse and improved input generators for regex validation.

L. Michael IV's slides on "Regexes are Hard" (MichaelDonohueDavisLeeServant-RegexesAreHard-ASE19-slides.pptx), 1.92 MiB

Conference Day
Wed 13 Nov

Displayed time zone: Tijuana, Baja California

10:40 - 12:20
Testing and Program Analysis - Research Papers / Demonstrations at Cortez 1
Chair(s): Jun Sun (Singapore Management University, Singapore)
10:40
20m
Talk
Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions (ACM SIGSOFT Distinguished Paper Award)
Research Papers
Louis G. Michael IV (Virginia Tech), James Donohue (University of Bradford), James C. Davis (Virginia Tech, USA), Dongyoon Lee (Stony Brook University), Francisco Servant (Virginia Tech)
Pre-print, File Attached
11:00
20m
Talk
Testing Regex Generalizability And Its Implications: A Large-Scale Many-Language Measurement Study
Research Papers
James C. Davis (Virginia Tech, USA), Daniel Moyer (Virginia Tech), Ayaan M. Kazerouni (Virginia Tech), Dongyoon Lee (Stony Brook University)
Pre-print, File Attached
11:20
20m
Talk
Accurate String Constraints Solution Counting with Weighted Automata
Research Papers
Elena Sherman (Boise State University), Andrew Harris (Boise State University)
11:40
20m
Talk
Subformula Caching for Model Counting and Quantitative Program Analysis
Research Papers
William Eiers (University of California at Santa Barbara, USA), Seemanta Saha (University of California Santa Barbara), Tegan Brennan (University of California, Santa Barbara), Tevfik Bultan (University of California, Santa Barbara)
12:00
10m
Demonstration
SPrinter: A Static Checker for Finding Smart Pointer Errors in C++ Programs
Demonstrations
Xutong Ma (Institute of Software, Chinese Academy of Sciences), Jiwei Yan (Institute of Software, Chinese Academy of Sciences), Yaqi Li (Institute of Software, Chinese Academy of Sciences), Jun Yan (Institute of Software, Chinese Academy of Sciences), Jian Zhang (Institute of Software, Chinese Academy of Sciences)
12:10
10m
Demonstration
FPChecker: Detecting Floating-Point Exceptions in GPU Applications
Demonstrations
Ignacio Laguna (Lawrence Livermore National Laboratory)