Static analysis of expl3 programs (4): Design

This is my fourth devlog post on the development of a static analysis tool (a so-called linter) for the expl3 programming language, which would help developers discover bugs in their expl3 programs before even running them.

In my previous two posts, I outlined the requirements for the linter and I reviewed the related work in the analysis of TeX programs and documents. In this post, I will outline the design of the linter and I will create a code repository for the linter.

Processing steps

As outlined in the requirements, the linter will process input files in a series of discrete steps, each represented by a single Lua module.

Here are the individual processing steps that should be supported by the linter:

  1. Preprocessing: Determine which parts of the input files contain expl3 code.
  2. Lexical analysis: Convert expl3 parts of the input files into TeX tokens.
  3. Syntactic analysis: Convert TeX tokens into a tree of function calls.
  4. Semantic analysis: Determine the meaning of the different function calls.
  5. Flow analysis: Determine additional emergent properties of the code.
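
To illustrate how such a chain of steps might fit together, here is a minimal sketch. It is written in Python for brevity, whereas the linter itself is planned as a set of Lua modules. The names Issue, State, placeholder_step, and run_pipeline are hypothetical, and stopping after the first step that reports an error is an assumption of this sketch, not something stated above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Issue:
    identifier: str  # hypothetical identifier, not taken from the linter
    message: str
    is_error: bool  # True for an error, False for a warning


@dataclass
class State:
    content: str
    issues: List[Issue] = field(default_factory=list)
    completed_steps: List[str] = field(default_factory=list)


Step = Callable[[State], None]


def placeholder_step(state: State) -> None:
    """Stand-in for a real processing step; it reports no issues."""


# The five steps from the list above, in order, each with a placeholder body.
STEPS: List[Tuple[str, Step]] = [
    ("preprocessing", placeholder_step),
    ("lexical analysis", placeholder_step),
    ("syntactic analysis", placeholder_step),
    ("semantic analysis", placeholder_step),
    ("flow analysis", placeholder_step),
]


def run_pipeline(content: str, steps: List[Tuple[str, Step]]) -> State:
    """Run the steps in order, each one building on the shared state."""
    state = State(content)
    for name, step in steps:
        step(state)
        state.completed_steps.append(name)
        # Assumption of this sketch: later steps are skipped after an error.
        if any(issue.is_error for issue in state.issues):
            break
    return state


if __name__ == "__main__":
    result = run_pipeline(r"\ExplSyntaxOn \tl_new:N \g_example_tl", STEPS)
    print(result.completed_steps)
```

Keeping each step behind the same small interface means that a step can be developed and tested on its own, which matches the one-module-per-step layout.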

The lexical analysis step seems like a good target for reusing parts of the code of the language server digestif discussed in the related work.

UPDATE (2024-09-06): After an extended discussion with Karl Berry, I decided to dual-license the code under GNU GPL 2.0 or later and LPPL 1.3c or later, which makes it impossible to reuse code from digestif. For more information, see the updated requirements.

Warnings and errors

As also outlined in the requirements, each processing step should identify issues with its input and produce either a warning or an error. Furthermore, the requirements list 16 types of issues that should be recognized by the linter at a minimum. Lastly, the requirements state that, as a part of the test-driven development paradigm, every issue identified by a processing step should have at least one associated test in the code repository of the linter.

In a document titled “Warnings and errors for the expl3 analysis tool”, I compiled a list of 67 warnings and errors that should be recognized by the initial version of the linter. For each issue, there is also an example of expl3 code with and without the issue. These examples can be directly converted to tests and used during the development of the corresponding processing steps.
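
To illustrate how a pair of examples might become a test, here is a minimal sketch in Python (the linter itself is planned in Lua). Everything in it is hypothetical: the identifier "s000", the 80-character limit, the check_line_length function, and the layout of EXAMPLES are made up for illustration and are not taken from the document with warnings and errors.

```python
MAX_LINE_LENGTH = 80  # hypothetical limit, chosen only for this sketch


def check_line_length(code):
    """Return (identifier, line number) pairs for overly long lines."""
    return [
        ("s000", number)  # "s000" is a made-up issue identifier
        for number, line in enumerate(code.splitlines(), start=1)
        if len(line) > MAX_LINE_LENGTH
    ]


# Each entry pairs an issue with code that has it and code that does not.
EXAMPLES = [
    {
        "identifier": "s000",
        "with_issue": "\\tl_set:Nn \\l_tmpa_tl { " + "a" * 80 + " }",
        "without_issue": "\\tl_set:Nn \\l_tmpa_tl { a }",
    },
]


def run_example_tests(check, examples):
    """Return a list of failure messages; an empty list means success."""
    failures = []
    for example in examples:
        identifier = example["identifier"]
        found = {name for name, _ in check(example["with_issue"])}
        if identifier not in found:
            failures.append(identifier + ": issue not reported")
        found = {name for name, _ in check(example["without_issue"])}
        if identifier in found:
            failures.append(identifier + ": false positive")
    return failures


if __name__ == "__main__":
    print(run_example_tests(check_line_length, EXAMPLES))
```

Testing both halves of each pair guards against a check that never fires as well as against one that fires on correct code.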

Limitations

Due to the dynamic nature of TeX, initial versions of the linter will make some naïve assumptions and simplifications during the analysis, such as:

  • Assume default expl3 catcodes everywhere.
  • Ignore non-expl3 and third-party code.
  • Do not analyze expansion and key–value calls.
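
To make the first assumption concrete, here is a minimal sketch of a fixed table of category codes as set by \ExplSyntaxOn, together with a helper that reads a control sequence name under that table. It is written in Python for brevity (the linter itself is planned in Lua), and the names Catcode, expl3_catcode, and read_control_sequence are hypothetical. Treating the line ending as ignored is an approximation of the fact that expl3 ends each line with a space, which is itself ignored.

```python
from enum import IntEnum


class Catcode(IntEnum):
    ESCAPE = 0
    BEGIN_GROUP = 1
    END_GROUP = 2
    MATH_SHIFT = 3
    ALIGNMENT = 4
    PARAMETER = 6
    SUPERSCRIPT = 7
    IGNORED = 9
    SPACE = 10
    LETTER = 11
    OTHER = 12
    COMMENT = 14


# Characters whose category code is neither "letter" nor "other" by default,
# plus the expl3-specific changes: "_" and ":" are letters, spaces and tabs
# are ignored, and "~" produces a space.
EXPL3_CATCODES = {
    "\\": Catcode.ESCAPE,
    "{": Catcode.BEGIN_GROUP,
    "}": Catcode.END_GROUP,
    "$": Catcode.MATH_SHIFT,
    "&": Catcode.ALIGNMENT,
    "#": Catcode.PARAMETER,
    "^": Catcode.SUPERSCRIPT,
    " ": Catcode.IGNORED,
    "\t": Catcode.IGNORED,
    "\n": Catcode.IGNORED,  # approximation, see the note above
    "~": Catcode.SPACE,
    "_": Catcode.LETTER,
    ":": Catcode.LETTER,
    "%": Catcode.COMMENT,
}


def expl3_catcode(char):
    """Look up the assumed category code of a single character."""
    if char in EXPL3_CATCODES:
        return EXPL3_CATCODES[char]
    if char.isascii() and char.isalpha():
        return Catcode.LETTER
    return Catcode.OTHER


def read_control_sequence(text, start):
    """Return the name of the control sequence starting at text[start]."""
    if expl3_catcode(text[start]) != Catcode.ESCAPE:
        raise ValueError("expected an escape character")
    end = start + 1
    while end < len(text) and expl3_catcode(text[end]) == Catcode.LETTER:
        end += 1
    if end == start + 1 and end < len(text):
        end += 1  # a control symbol: a single non-letter character
    return text[start + 1:end]


if __name__ == "__main__":
    print(read_control_sequence(r"\tl_set:Nn \l_tmpa_tl { foo }", 0))
```

Because "_" and ":" count as letters, a name such as \tl_set:Nn is read as a single control sequence. The cost of a fixed table is that code which changes category codes at run time would be tokenized incorrectly, which is exactly the simplification listed above.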

As a result, the initial version of the linter may not have a sufficient understanding of expl3 code to support proper flow analysis. Instead, the initial version of the linter may need to use pseudo-flow-analysis that would check for simple cases of the warnings and errors from flow analysis. Future versions of the linter should improve their code understanding to the point where proper flow analysis can be performed.

The warnings and errors in the document do not cover the complete expl3 language. The areas that are currently not covered are outlined in the section of the document titled “Caveats”. Future versions of the linter should improve the coverage.

Code repository

I created a repository witiko/expltools titled “Development tools for expl3 programmers” on GitHub. As outlined in the requirements, I licensed the code from the repository under the GNU GPL 3.0 license.

UPDATE (2024-09-06): After an extended discussion with Karl Berry, I decided to dual-license the code under GNU GPL 2.0 or later and LPPL 1.3c or later. For more information, see the updated requirements.

Furthermore, I also registered the expl3 prefix expltools, so that it can be used in the documentation for the linter, in other supporting expl3 code used in the linter, and also possibly in development tools for expl3 programmers other than the linter.

Conclusion

In this post, I outlined the design of the linter and I created a code repository for the linter. In the next post, I will implement the first processing step for the linter, which is preprocessing.

Written on August 15, 2024
Last updated on September 6, 2024