Static analysis of expl3 programs (1): Introduction

In 2021, I used the expl3 programming language for the first time in my life. I had already been eyeing expl3 for some time and, when it came to defining a LaTeX-specific interface for processing YAML metadata in version 2.11.0 of the Markdown package for TeX, I took the plunge.

After two and a half years, approximately 3.5k out of the 5k lines of TeX code in version 3.5.0 of the Markdown package are written in expl3. I also developed several consumer products with it, and I have written three journal articles for my local TeX users group about it. Needless to say, expl3 has been a blast for me!

In the Markdown package, each change is reviewed by a number of automated static analysis tools (so-called linters), which look for programming errors in the code. While these tools don’t catch all programming errors, they have proven extremely useful in catching the typos that inevitably start trickling in after 2 AM.

Since the Markdown package contains code in different programming languages, we use many different linters such as shellcheck for shell scripts, luacheck for Lua, and flake8 and pytype for Python. However, since no linters for expl3 exist, typos are often only caught by regression tests, human reviewers, and sometimes even by our users after a release. Nobody is happy about this.

Last month, I realized that, unlike TeX, expl3 has the following two properties that seem to make it well-suited to static analysis:

  1. Simple uniform syntax: (Almost) all operations are expressed as function calls. This Lisp-like quality makes it easy to convert well-behaved expl3 programs that only use high-level interfaces into abstract syntax trees. This is a prerequisite for accurate static analysis.
  2. Explicit type and scope: Variables and constants are separate from functions. Each variable is either local or global. Variables and constants are explicitly typed. This information makes it easy to detect common programming errors related to the incorrect use of variables.
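To make the second property more concrete, here is a minimal Python sketch of the kind of check that the expl3 naming conventions allow. It relies only on the convention that variables are named like `\l_tmpa_tl` (scope prefix `l`, `g`, or `c`, and a type suffix) and functions like `\tl_set:Nn` (module, description, argument specifiers). The function `check_assignment`, the two regular expressions, and the rule that a function's module should equal the variable's type are my own simplifying assumptions for illustration, not part of any existing tool, and the rule would produce false positives on real code.

```python
import re

# Variables: \<scope>_<description>_<type>, e.g. \l_tmpa_tl or \g__markdown_count_int.
VARIABLE = re.compile(r"\\(?P<scope>[lgc])_(?:[A-Za-z_]+_)?(?P<type>[a-z]+)\Z")
# Functions: \<module>_<action>:<argument specifiers>, e.g. \tl_set:Nn.
FUNCTION = re.compile(
    r"\\(?:__)?(?P<module>[a-z]+)_(?P<action>[A-Za-z_]+):(?P<args>[A-Za-z]*)\Z"
)


def check_assignment(function, variable):
    """Return a list of issues for assigning to `variable` using `function`.

    Hypothetical helper: it inspects names only and knows nothing about
    how the function is actually defined.
    """
    f = FUNCTION.match(function)
    v = VARIABLE.match(variable)
    if f is None or v is None:
        return ["name does not follow the naming conventions"]

    issues = []
    # Simplifying assumption: the function's module names the expected type.
    if f["module"] != v["type"]:
        issues.append(
            f"{function} expects type '{f['module']}', "
            f"but {variable} has type '{v['type']}'"
        )
    # Scope checks: constants are never assigned, and the local/global
    # flavor of the setter should agree with the variable's prefix.
    if v["scope"] == "c":
        issues.append(f"{variable} is a constant and should not be assigned")
    elif f["action"] == "set" and v["scope"] == "g":
        issues.append(f"local assignment to global variable {variable}")
    elif f["action"] == "gset" and v["scope"] == "l":
        issues.append(f"global assignment to local variable {variable}")
    return issues


if __name__ == "__main__":
    for function, variable in [
        (r"\tl_set:Nn", r"\l_tmpa_tl"),
        (r"\tl_set:Nn", r"\g_tmpa_tl"),
        (r"\tl_gset:Nn", r"\l_example_int"),
    ]:
        print(function, variable, check_assignment(function, variable))
```

None of these checks requires running the program: the names alone carry enough information to flag a suspicious assignment, which would be much harder for plain TeX macros with arbitrary names.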

For the longest time, I had wanted to try my hand at building a linter from the ground up. Therefore, I decided to kill two birds with one stone: improve the tooling for expl3 and learn something new along the way by building a linter for it.

This is the first in a series of posts in which I will try to create a linter for expl3. Since linters can be arbitrarily thorough in their analysis, I will devote the next post to the requirements specification for the linter. The requirements will form the basis of the design and the implementation of the linter, which I will cover in the following posts.

Written on April 5, 2024