Static analysis of expl3 programs (2): Requirements

This is my second devlog post about the development of a static analysis tool (a so-called linter) for the expl3 programming language, which would help developers discover bugs in their expl3 programs before even running them.

In my previous post, I introduced the idea of linters and why it makes sense to have one for expl3. In this post, I will outline the requirements for the linter. These will form the basis of the design and the implementation.

Functional requirements

The linter should accept a list of input expl3 files. Then, the linter should process each input file and print out issues it has identified with the file.

Initially, the linter should recognize at least the following types of issues:

  • Style:
    • Overly long lines
    • Missing stylistic whitespace
    • Malformed names of functions, variables, constants, quarks, and scan marks
  • Functions:
    • Multiply defined functions and function variants
    • Calling undefined functions and function variants
    • Calling deprecated and removed functions
    • Unknown argument specifiers
    • Unexpected function call arguments
    • Unused private functions and function variants
  • Variables:
    • Multiply declared variables and constants
    • Using undefined variables and constants
    • Using variables of incompatible types
    • Using deprecated and removed variables and constants
    • Setting constants and undeclared variables
    • Unused variables and constants
    • Locally setting global variables and vice versa
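
To make one of these checks concrete, here is a minimal sketch of the simplest one, overly long lines. It is written in Python purely for illustration and is not the linter's implementation; the function name and the 80-character limit are assumptions, since no specific limit is fixed here.

```python
from typing import List, Tuple

MAX_LINE_LENGTH = 80  # assumed limit, for illustration only


def find_overly_long_lines(
    content: str, limit: int = MAX_LINE_LENGTH
) -> List[Tuple[int, int]]:
    """Return (line number, length) pairs for every line longer than `limit`."""
    issues = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        if len(line) > limit:
            issues.append((line_number, len(line)))
    return issues


# The second line of this example input is deliberately too long.
example = (
    "\\ExplSyntaxOn\n"
    "\\tl_set:Nn \\l_example_tl { " + "a" * 80 + " }\n"
    "\\ExplSyntaxOff\n"
)
print(find_overly_long_lines(example))
```

Most of the other checks in the list need an actual understanding of the code (which functions are defined, which variables are declared, and with what types), so they cannot be done line by line like this one.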

Non-functional requirements

Issues

The linter should distinguish between two types of issues: warnings and errors. As a rule of thumb, warnings are suggestions about best practices, whereas errors flag code that will likely fail at runtime.

Here are three examples of warnings:

  • Missing stylistic whitespace around curly braces
  • Using deprecated functions and variables
  • Unused variable or constant

Here are three examples of errors:

  • Using an undefined message
  • Calling a function with a V-type argument with a variable or constant that does not support V-type expansion
  • Multiply declared variable or constant

The overriding design goals for the initial releases of the linter should be simplicity of implementation and robustness to unexpected input. For all issues, the linter should prefer precision over recall: it should report an issue only when it is reasonably certain that it has understood the code, even at the expense of missing some issues.

Each issue should be assigned a unique identifier. Using these identifiers, issues can be disabled globally using a config file, for individual input files from the command line, and for sections of code or individual lines of code using TeX comments.
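
The following Python sketch illustrates how identifiers, the warning/error distinction, and suppression could fit together. Everything specific in it is an assumption made for the example: the identifier scheme ("w…" for warnings, "e…" for errors), the `% noqa:` comment syntax (borrowed from Python linters), and all names. It only covers global and per-line suppression, and it does not handle an escaped `\%`.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class Issue:
    identifier: str  # hypothetical scheme: "w..." is a warning, "e..." an error
    message: str
    line_number: int  # 1-based

    @property
    def is_error(self) -> bool:
        return self.identifier.startswith("e")


# Hypothetical suppression comment, e.g. "% noqa: w100, e200"
SUPPRESSION = re.compile(r"%\s*noqa:\s*([a-z0-9, ]+)", re.IGNORECASE)


def suppressed_on_line(line: str) -> Set[str]:
    """Return the identifiers disabled by a suppression comment on this line."""
    match = SUPPRESSION.search(line)
    if match is None:
        return set()
    parts = match.group(1).split(",")
    return {part.strip().lower() for part in parts if part.strip()}


def filter_issues(
    issues: Sequence[Issue], lines: Sequence[str], ignored: Set[str]
) -> List[Issue]:
    """Drop issues disabled globally (`ignored`) or by a comment on their line."""
    return [
        issue
        for issue in issues
        if issue.identifier not in ignored
        and issue.identifier not in suppressed_on_line(lines[issue.line_number - 1])
    ]


lines = [
    "\\tl_new:N \\l_example_tl  % noqa: w100",
    "\\example_undefined:n { text }",
]
issues = [
    Issue("w100", "unused variable", 1),
    Issue("e200", "undefined function", 2),
    Issue("w300", "line too long", 2),
]
# "w300" stands in for an identifier disabled globally in a config file.
remaining = filter_issues(issues, lines, ignored={"w300"})
print([issue.identifier for issue in remaining])
```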

Architecture

To make the linter easy to use in continuous integration pipelines, it should be written in Lua 5.3 using just the standard Lua library. One possible exception is checking whether functions, variables, and other symbols from the input files are expl3 built-ins. This may require using the texlua interpreter and a minimal TeX distribution that includes the LaTeX3 kernel, at least initially.

The linter should process input files in a series of discrete steps, which should be represented as Lua modules. Users should be able to import the modules into their Lua code and use them independently of the rest of the linter.

Each step should process the input received from the previous step, identify any issues with the input, and transform the input to an output format appropriate for the next step. The default command-line script for the linter should execute all steps and print out issues from all steps. Users should be able to easily adapt the default script in the following ways:

  1. Change how the linter discovers input files.
  2. Change or replace processing steps or insert additional steps.
  3. Change how the linter reacts to issues with the input files.
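
The step-based design can be sketched as follows. This is an illustration in Python rather than the Lua modules described above, and the step names, the 80-character limit, and the `report` callback are all assumptions. Each step returns its output for the next step together with the issues it found, and the driver accepts a custom list of steps and a custom reaction to issues.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A step maps its input to (output for the next step, issues it found).
Step = Callable[[Any], Tuple[Any, List[str]]]


def split_lines(content: str) -> Tuple[List[str], List[str]]:
    """Hypothetical first step: turn raw file content into lines."""
    return content.splitlines(), []


def check_line_lengths(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical second step: flag lines over 80 characters."""
    issues = [
        f"line {number}: longer than 80 characters"
        for number, line in enumerate(lines, start=1)
        if len(line) > 80
    ]
    return lines, issues


DEFAULT_STEPS: Sequence[Step] = (split_lines, check_line_lengths)


def run_steps(
    content: str,
    steps: Sequence[Step] = DEFAULT_STEPS,
    report: Callable[[str], None] = print,
) -> int:
    """Run every step in order, report each issue, and return the issue count."""
    data: Any = content
    count = 0
    for step in steps:
        data, issues = step(data)
        for issue in issues:
            report(issue)
            count += 1
    return count


def check_tabs(lines: List[str]) -> Tuple[List[str], List[str]]:
    """A user-supplied extra step: flag lines that contain a tab character."""
    issues = [
        f"line {number}: contains a tab"
        for number, line in enumerate(lines, start=1)
        if "\t" in line
    ]
    return lines, issues


# Insert the extra step and collect issues in a list instead of printing them.
collected: List[str] = []
run_steps(
    "\\ExplSyntaxOn\n\t\\tl_new:N \\l_example_tl\n",
    steps=(split_lines, check_tabs, check_line_lengths),
    report=collected.append,
)
print(collected)
```

File discovery stays outside `run_steps` in this sketch, so a script can decide on its own how to find input files and then call the driver once per file.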

Validation

As a part of the test-driven development paradigm, all issues identified by a processing step should have at least one associated test in the code repository of the linter. All tests should be executed periodically during the development of the linter.
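
As an illustration of what "one test per issue" could look like, here is a sketch in Python for the "multiply declared variables" issue. The check itself is deliberately naive (a regular expression over `\…_new:N` declarations that ignores comments and scoping) and exists only to give the tests something to exercise; the names are assumptions.

```python
import re
from collections import Counter
from typing import List

# Naive pattern for declarations such as "\tl_new:N \l_example_tl".
DECLARATION = re.compile(r"\\[a-z]+_new:N\s*(\\[A-Za-z_]+)")


def find_multiply_declared(content: str) -> List[str]:
    """Return the names of variables that are declared more than once."""
    counts = Counter(DECLARATION.findall(content))
    return sorted(name for name, count in counts.items() if count > 1)


def test_reports_variable_declared_twice() -> None:
    content = "\\tl_new:N \\l_example_tl\n\\tl_new:N \\l_example_tl\n"
    assert find_multiply_declared(content) == ["\\l_example_tl"]


def test_accepts_distinct_variables() -> None:
    content = "\\tl_new:N \\l_first_tl\n\\int_new:N \\l_second_int\n"
    assert find_multiply_declared(content) == []


# A test runner would normally discover these functions; call them directly here.
test_reports_variable_declared_twice()
test_accepts_distinct_variables()
```

A positive test (the issue is reported) paired with a negative one (clean code is accepted) also guards the preference for precision over recall.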

As a part of the dogfooding paradigm, the linter should be used in the continuous integration pipeline of the Markdown Package for TeX starting with its initial releases in order to collect early user feedback. Other early adopters are also welcome to try the initial releases of the linter and report issues to its code repository.

At some point, a larger-scale validation should be conducted as an experimental part of a TUGboat article that will introduce the linter to the wider TeX community. In this validation, all expl3 packages from current and historical TeX Live distributions should be processed with the linter. The results should be evaluated both quantitatively and qualitatively. While the quantitative evaluation should focus mainly on trends in how expl3 is used in packages, the qualitative evaluation should explore the shortcomings of the linter and ideas for future improvements.

Conclusion

In this post, I outlined the requirements for the linter. In the next post, I will analyze the requirements, discuss the design of the linter, and create a code repository for the linter. Then, I will continue by implementing the first processing step for the linter.

Written on June 27, 2024