Static analysis of expl3 programs (3): Related work

This is my third devlog post about the development of a static analysis tool (a so-called linter) for the expl3 programming language, which would help developers discover bugs in their expl3 programs before even running them.

In my previous post, I outlined the requirements for the linter. In this post, I will review the related work in the analysis of TeX programs and documents. This related work should be considered in the design of the linter and reused whenever it is appropriate and compatible with the license of the linter.

Unravel

The unravel package by Bruno Le Floch analyzes expl3 programs as well as TeX programs and documents in general. The package was suggested to me as related work by Joseph Wright in personal correspondence.

Unlike a linter, which performs static analysis by leafing through the code and makes suggestions, unravel is a debugger that is used for dynamic analysis. It allows the user to step through the execution of code while providing extra information about the state of TeX. Unravel is written in expl3 and emulates TeX primitives using expl3 functions. It has been released under the LaTeX Project Public License (LPPL) 1.3c.

While both linters and debuggers are valuable in producing bug-free software, linters prevent bugs by proactively pointing out potential bugs without any user interaction, whereas debuggers are typically used interactively to determine the cause of a bug after it has already manifested.

Chktex, chklref, cmdtrack, lacheck, match_parens, nag, and tex2tok

The Comprehensive TeX Archive Network (CTAN) lists related software projects on the topics of debugging support and LaTeX quality, some of which I list in this section.

The chktex package by Jens T. Berger Thielemann is a linter for the static analysis of LaTeX documents. It has been written in ANSI C and released under the GNU GPL 2.0 license. Which types of issues are detected in the input files, and how they are reported to the user, can be configured to some extent from the command line and to a larger extent using configuration files. Chktex is extensible: in addition to configuring existing types of issues, it allows the definition of new types of issues using regular expressions.
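To make the idea of regex-defined issue types concrete, here is a minimal Python sketch of a checker driven by a table of regular-expression rules. It is not chktex's implementation or configuration syntax. The `Rule` class, the `check` function, and both example rules are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical user-defined issue type: a name, a pattern, and a message."""
    name: str
    pattern: re.Pattern
    message: str


# Hypothetical rules in the spirit of user-defined warnings; not chktex's syntax.
RULES = [
    Rule("space-before-footnote", re.compile(r"\s+\\footnote\{"),
         "whitespace before \\footnote"),
    Rule("double-space", re.compile(r"\S  +\S"),
         "multiple consecutive spaces"),
]


def check(text: str, rules=RULES):
    """Return (line number, column, rule name, message) for every match."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule in rules:
            for match in rule.pattern.finditer(line):
                # Columns are reported 1-based, as most linters do.
                issues.append((lineno, match.start() + 1, rule.name, rule.message))
    return issues
```

For example, `check("Text \\footnote{x}\nA  B")` reports the space before `\footnote` on line 1 and the double space on line 2. Because the rules are data rather than code, new issue types can be added without changing the checker itself.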

The lacheck package by Kresten Krab Thorup is a linter for the static analysis of LaTeX documents. Like chktex, lacheck has been written in ANSI C. It has been released under the GNU GPL 1.0 license. Unlike chktex, lacheck can be configured neither from the command line nor using configuration files.

The chklref package by Jérôme Lelong is a linter for the static analysis of LaTeX documents. It has been written in Perl and released under the GNU GPL 3.0 license. Unlike chktex, chklref focuses just on the detection of unused labels, which often accumulate over the lifetime of a LaTeX document.
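The core of unused-label detection can be sketched in a few lines of Python: collect the keys passed to `\label` and remove those passed to referencing commands. This is a simplified illustration, not chklref's Perl implementation. The patterns and the `unused_labels` function are assumptions, and the sketch ignores commented-out text, multi-file documents, and packages such as cleveref that add their own referencing commands.

```python
import re

# Simplified patterns (assumption): keys contain no braces, and only a few
# common referencing commands are recognized.
LABEL = re.compile(r"\\label\{([^}]*)\}")
REF = re.compile(r"\\(?:ref|eqref|pageref|autoref)\{([^}]*)\}")


def unused_labels(source: str) -> list:
    """Return labels that are defined with \\label but never referenced."""
    defined = LABEL.findall(source)
    referenced = set(REF.findall(source))
    return [label for label in defined if label not in referenced]
```

For example, in `\section{A}\label{sec:a} See \ref{sec:b}. \section{B}\label{sec:b} \label{eq:unused}`, the labels `sec:a` and `eq:unused` are reported as unused.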

The match_parens package by Wybo Dekker is a linter for the static analysis of expl3 programs as well as TeX programs and documents in general. It has been written in Ruby and released under the GNU GPL 1.0 license. Unlike chktex, match_parens focuses solely on the detection of mismatched paired punctuation, such as parentheses, braces, brackets, and quotation marks. As such, it can also be used for the static analysis of natural text, as well as of programs and documents in programming and markup languages that use paired punctuation in their syntax.
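Detecting mismatched paired punctuation is a classic use of a stack, shown below as a hypothetical Python sketch rather than match_parens's Ruby code. Quotation marks are left out. A symmetric mark such as `"` cannot be classified as opening or closing from the character alone, and in TeX the closing quote `'` is also the apostrophe. Escaped delimiters such as `\{` are not handled either.

```python
# Hypothetical pairing table; match_parens's own set of pairs may differ.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def mismatches(text: str):
    """Return (position, character) for each unmatched or mismatched delimiter."""
    stack = []  # (position, opener) for delimiters that are not yet closed
    problems = []
    for pos, char in enumerate(text):
        if char in OPENERS:
            stack.append((pos, char))
        elif char in PAIRS:
            if stack and stack[-1][1] == PAIRS[char]:
                stack.pop()
            else:
                # Report the offending closer and leave the stack untouched.
                problems.append((pos, char))
    # Anything left on the stack was opened but never closed.
    problems.extend(stack)
    return problems
```

For example, `mismatches("{[()]}")` returns an empty list and `mismatches("(a]")` returns `[(2, "]"), (0, "(")]`. Leaving the stack untouched on a mismatch means that a single stray closer does not cause the delimiters after it to be reported as well.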

The cmdtrack package by Michael John Downes is a debugger for the dynamic analysis of LaTeX documents. It has been written in LaTeX and released under the LPPL 1.0 license. It detects unused user-defined commands, which also often accumulate over the lifetime of a LaTeX document, and mentions them in the .log file produced during the compilation of a LaTeX document.

The nag package by Ulrich Michael Schwarz is a debugger for the dynamic analysis of LaTeX documents. Like cmdtrack, nag has been written in LaTeX and released under the LPPL 1.0 license. It detects the use of obsolete LaTeX commands, document classes, and packages and mentions them in the .log file produced during the compilation of a LaTeX document.

The tex2tok package by Jonathan Fine is a debugger for the dynamic analysis of expl3 programs as well as TeX programs and documents in general. It has been written in TeX and released under the GNU GPL 2.0 license. It executes a TeX file and produces a new .tok file with a list of the TeX tokens in the file. Compared to static analysis, dynamic analysis ensures that the tokens receive the correct category codes. However, it requires executing the TeX file, which may take a long time or never complete if the code contains bugs.
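The category-code problem can be illustrated with a small Python sketch. A static tokenizer has to assume which characters are letters, and that assumption changes how control sequences are split. Under `\ExplSyntaxOn`, `_` and `:` are letters, so `\tl_set:Nn` is a single control sequence. Under LaTeX's default category codes it is `\tl` followed by other characters. The `control_sequences` function and the two letter sets are hypothetical simplifications that ignore all other category codes.

```python
import string

# Simplified category code assumptions: only the backslash is an escape
# character, and "letters" is the only category that is distinguished.
LETTERS = set(string.ascii_letters)
EXPL3_LETTERS = LETTERS | {"_", ":"}  # among other changes, \ExplSyntaxOn makes _ and : letters


def control_sequences(text: str, letters: set) -> list:
    """Return the names of control sequences, given which characters are letters."""
    names = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            j = i + 1
            if text[j] in letters:
                # A control word: consume the whole run of letters.
                while j < len(text) and text[j] in letters:
                    j += 1
            else:
                # A control symbol: exactly one non-letter character.
                j += 1
            names.append(text[i + 1 : j])
            i = j
        else:
            i += 1
    return names
```

For example, `control_sequences(r"\tl_set:Nn \l_tmpa_tl", EXPL3_LETTERS)` returns `["tl_set:Nn", "l_tmpa_tl"]`, while passing `LETTERS` returns `["tl", "l"]`. Since category codes can change at any point during execution, a static analyzer can only guess which table applies.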

Luacheck and flake8

Luacheck by Peter Melnichenko and flake8 by Tarek Ziade are linters for the static analysis of Lua and Python programs, respectively. They have been written in Lua and Python, respectively, and released under the MIT license. Both tools are widely used and should inform the design of my linter in terms of architecture, configuration, and extensibility.

Similarly to chktex, which types of issues are detected in the input files, and how they are reported to the user, can be configured from the command line and using configuration files. Additionally, reporting can be enabled or disabled directly in the code of the analyzed program using inline comments.
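Inline suppression in the style of flake8's `# noqa` comments can be sketched as a post-processing step that drops reported issues whose line carries such a comment. The sketch below is a simplified, hypothetical Python illustration, not flake8's implementation. As in flake8, a bare `# noqa` silences every issue on its line and `# noqa: E501` silences only the listed codes. However, the regular expression naively ignores whether `#` appears inside a string literal.

```python
import re

# Simplified suppression syntax: "# noqa" or "# noqa: E501,W291".
# Naive: a "#" inside a string literal would also be matched.
NOQA = re.compile(r"#\s*noqa(?::\s*(?P<codes>[A-Z]+[0-9]+(?:[,\s]+[A-Z]+[0-9]+)*))?")


def filter_issues(source: str, issues):
    """Drop (line number, code) issues that are silenced by an inline comment."""
    lines = source.splitlines()
    kept = []
    for lineno, code in issues:
        match = NOQA.search(lines[lineno - 1])
        if match is None:
            kept.append((lineno, code))  # no suppression comment on this line
        elif match.group("codes") is None:
            continue  # bare "# noqa": silence everything on this line
        elif code not in re.split(r"[,\s]+", match.group("codes")):
            kept.append((lineno, code))  # this code is not in the noqa list
    return kept
```

For example, given the lines `x = 1  # noqa`, `y = 2  # noqa: E501`, and `z = 3`, an `E501` issue on line 2 is dropped, while a `W291` issue on line 2 and any issue on line 3 are kept.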

Unlike luacheck, which is not extensible at the time of writing and only allows the configuration of existing issues, flake8 supports Python extensions that can add support for new types of issues.
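For comparison with luacheck, the sketch below shows the shape of a flake8 extension. It is a class that receives the parsed abstract syntax tree and yields `(line, column, message, type)` tuples from its `run` method. The plugin name, the `X100` code, and the choice to flag `print()` calls are hypothetical.

```python
import ast


class PrintCallChecker:
    """A minimal flake8-style AST plugin that reports calls to print()."""

    # Plugin metadata; the name and version are hypothetical.
    name = "flake8-no-print-example"
    version = "0.1.0"

    def __init__(self, tree: ast.AST):
        # flake8 passes the parsed module to AST plugins as "tree".
        self.tree = tree

    def run(self):
        for node in ast.walk(self.tree):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Name)
                    and node.func.id == "print"):
                # The message must start with the issue code.
                yield node.lineno, node.col_offset, "X100 call to print()", type(self)
```

To make flake8 load such a class, a package registers it under the `flake8.extension` entry-point group, using the code prefix (here `X`) as the entry-point name.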

TeXLab and digestif

TeXLab by Eric and Patrick Förscher and digestif by Augusto Stoffel are language servers for the static analysis of TeX programs and documents. They have been written in Rust and Lua, respectively, and released under the GNU GPL 3.0 license. The language servers were suggested to me as related work by Michal Hoftich at TUG 2024, as discussed in the status update from July.

Whereas TeXLab focuses on LaTeX documents, digestif also supports other formats such as ConTeXt and GNU Texinfo. Neither TeXLab nor digestif supports expl3 code at the time of writing.

In terms of programming language, license, and scope, digestif seems to be the work most closely related to my linter and should provide many opportunities for inspiration and code reuse.

UPDATE (2024-09-06): After an extended discussion with Karl Berry, I decided to dual-license the code under GNU GPL 2.0 or later and LPPL 1.3c or later, which makes it impossible to reuse code from digestif. For more information, see the updated requirements.

Conclusion

In this post, I reviewed the related work in the analysis of TeX programs and documents. In the next post, I will analyze the requirements, discuss the design of the linter, and create a code repository for the linter. Then, I will continue by implementing the first processing step for the linter.

Written on August 9, 2024
Last updated on September 6, 2024