\documentclass{article} \usepackage{fancyvrb} \usepackage{graphicx} \usepackage{fullpage} \usepackage{relsize} \usepackage{url} \usepackage{hevea} \usepackage[shortcuts]{extdash} \usepackage{textcomp} % \usepackage{verbdef} \def\topfraction{.9} \def\dbltopfraction{\topfraction} \def\floatpagefraction{\topfraction} % default .5 \def\dblfloatpagefraction{\topfraction} % default .5 \def\textfraction{.1} %HEVEA \footerfalse % Disable hevea advertisement in footer \newcommand{\code}[1]{\ifmmode{\mbox{\relax\ttfamily{#1}}}\else{\relax\ttfamily #1}\fi} %% Hevea version omits "\smaller" %HEVEA \renewcommand{\code}[1]{\ifmmode{\mbox{\ttfamily{#1}}}\else{\ttfamily #1}\fi} \newcommand{\includeimage}[2]{ \begin{center} \ifhevea\imgsrc{#1.png}\else \resizebox{!}{#2}{\includegraphics{figures/#1}} \vspace{-1.5\baselineskip} \fi \end{center}} % Add line between figure and text \makeatletter \def\topfigrule{\kern3\p@ \hrule \kern -3.4\p@} % the \hrule is .4pt high \def\botfigrule{\kern-3\p@ \hrule \kern 2.6\p@} % the \hrule is .4pt high \def\dblfigrule{\kern3\p@ \hrule \kern -3.4\p@} % the \hrule is .4pt high \makeatother \title{Annotation File Format Specification} % Hevea ignores \date, so move the date into \author \author{\url{https://checkerframework.org/annotation-file-utilities/} \\ \today} \date{} \begin{document} \maketitle %HEVEA \setcounter{tocdepth}{2} \tableofcontents \section{Purpose: External storage of annotations\label{purpose}} Java annotations are meta-data about Java program elements, as in ``\code{@Deprecated class Date \{ \ldots\ \}}''. Ordinarily, Java annotations are written in the source code of a \code{.java} Java source file. When \code{javac} compiles the source code, it inserts the annotations in the resulting \code{.class} file (as ``attributes''). Sometimes, it is convenient to specify the annotations outside the source code or the \code{.class} file. \begin{itemize} %BEGIN LATEX \itemsep 0pt \parskip 0pt %END LATEX \item When source code is not available, a textual file provides a format for writing and storing annotations that is much easier to read and modify than a \code{.class} file. Even if the eventual purpose is to insert the annotations in the \code{.class} file, the annotations must be specified in some textual format first. \item Even when source code is available, sometimes it should not be changed, yet annotations must be stored somewhere for use by tools. \item A textual file for annotations can eliminate code clutter. A developer performing some specialized task (such as code verification, parallelization, etc.)\ can store annotations in an annotation file without changing the main version of the source code. (The developer's private version of the code could contain the annotations, but the developer could move them to the separate file before committing changes.) \item Tool writers may find it more convenient to use a textual file, rather than writing a Java or \code{.class} file parser. \item When debugging annotation-processing tools, a textual file format (extracted from the Java or \code{.class} files) is easier to read and is easier for use in testing. \end{itemize} All of these uses require an external, textual file format for Java annotations. The external file format should be easy for people to create, read, and modify. % An ``annotation file'' serves this purpose by specifying a set of Java annotations. The Annotation File Utilities (\url{https://checkerframework.org/annotation-file-utilities/}) are a set of tools that process annotation files. The file format discussed in this document supports both standard Java SE 5 declaration annotations and also the type annotations that are introduced by Java SE 8. The file format provides a simple syntax to represent the structure of a Java program. For annotations in method bodies of \code{.class} files the annotation file closely follows section ``Class File Format Extensions'' of the JSR 308 design document~\cite{JSR308-webpage-201310}, which explains how the annotations are stored in the \code{.class} file. In that sense, the current design is extremely low-level, and users probably would not want to write the files by hand (but they might fill in a template that a tool generated automatically). As future work, we should design a more user-friendly format that permits Java signatures to be directly specified. For \code{.java} source files, the file format provides a separate, higher-level syntax for annotations in method bodies. %% I don't like this, as it may force distributing logically connected %% elements all over a file system. Users should be permitted, but not %% forced, to adopt such a file structure. -MDE % Each file corresponds to exactly one % ``.class'' file, so (for instance) inner classes are written in % separate annotation files, named in the same ``{\tt % OuterClass\$InnerClass}'' pattern as the ``.class'' file. By convention, an annotation file ends with ``\code{.jaif}'' (for ``Java annotation index file''), but this is not required. % \verbdef\lineend|"\n"| %BEGIN LATEX \DefineShortVerb{\|} \SaveVerb{newline}|\n| \UndefineShortVerb{\|} \newcommand{\lineend}{\bnflit{\UseVerb{newline}}} %END LATEX %HEVEA \newcommand{\bs}{\char"5C} %HEVEA \newcommand{\lineend}{\bnflit{\bs{}n}} % literal \newcommand{\bnflit}[1]{\textrm{``}\textbf{#1}\textrm{''}} % non-terminal \newcommand{\bnfnt}[1]{\textsf{\emph{#1}}} % comment \newcommand{\bnfcmt}{\rm \# } % alternative \newcommand{\bnfor}{\ensuremath{|}} \section{Grammar\label{grammar}} This section describes the annotation file format in detail by presenting it in the form of a grammar. Section~\ref{grammar-conventions} details the conventions of the grammar. Section~\ref{java-file-grammar} shows how to represent the basic structure of a Java program (classes, methods, etc.) in an annotation file. Section~\ref{annotations-grammar} shows how to add annotations to an annotation file. \subsection{Grammar conventions\label{grammar-conventions}} Throughout this document, ``name'' is any valid Java simple name or binary name, ``type'' is any valid type, and ``value'' is any valid Java constant, and quoted strings are literal values. % The Kleene qualifiers ``*'' (zero or more), ``?'' (zero or one), and ``+'' (one or more) denote plurality of a grammar element. % A vertical bar (``\bnfor'') separates alternatives. Parentheses (``()'') denote grouping, and square brackets (``[]'') denote optional syntax, which is equivalent to ``( \ldots\ )\ ?''\ but more concise. We use the hash/pound/octothorpe symbol (``\#'') for comments within the grammar. In the annotation file, besides its use as token separator, whitespace (excluding newlines) is optional with one exception: no space is permitted between an ``@'' character and a subsequent name. Indentation is ignored, but is encouraged to maintain readability of the hierarchy of program elements in the class (see the example in Section~\ref{example}). Comments can be written throughout the annotation file using the double-slash syntax employed by Java for single-line comments: anything following two adjacent slashes (``//'') until the first newline is a comment. This is omitted from the grammar for simplicity. Block comments (``/* \ldots\ */'') are not allowed. The line end symbol \lineend{} is used for all the different line end conventions, that is, Windows- and Unix-style newlines are supported. \subsection{Java file grammar\label{java-file-grammar}} This section shows how to represent the basic structure of a Java program (classes, methods, etc.) in an annotation file. For Java elements that can contain annotations, this section will reference grammar productions contained in Section~\ref{annotations-grammar}, which describes how annotations are used in an annotation file. An annotation file has the same basic structure as a Java program. That is, there are packages, classes, fields and methods. The annotation file may omit certain program elements --- for instance, it may mention only some of the packages in a program, or only some of the classes in a package, or only some of the fields or methods of a class. Program elements that do not appear in the annotation file are treated as unannotated. \subsubsection{Package definitions\label{package-definitions}} At the root of an annotation file is one or more package definitions. A package definition describes a package containing a list of annotation definitions and classes. A package definition also contains any annotations on the package (such as those from a \code{package-info.java} file). \begin{tabbing} \qquad \= \kill \bnfnt{annotation-file} ::= \\ \qquad \bnfnt{package-definition}+ \\ \\ \bnfnt{package-definition} ::= \\ \qquad \bnflit{package} ( \bnflit{:} ) \bnfor{} ( \bnfnt{name} \bnflit{:} \bnfnt{decl-annotation}* ) \lineend \\ \qquad ( \bnfnt{annotation-definition} \bnfor{} \bnfnt{class-definition} ) * \end{tabbing} \noindent Use a package line of \code{package:} for the default package. Note that annotations on the default package are not allowed. \subsubsection{Class definitions\label{class-definitions}} A class definition describes the annotations present on a class declaration, as well fields and methods of the class. It is organized according to the hierarchy of fields and methods in the class. Note that we use \bnfnt{class-definition} also for interfaces, enums, and annotation types (to specify annotations in an existing annotation type, not to be confused with \bnfnt{annotation-definition}s described in Section~\ref{annotation-definitions}, which defines annotations to be used throughout an annotation file); for syntactic simplicity, we use \bnflit{class} for all such definitions. % TODO: add test cases for this. Inner classes are treated as ordinary classes whose names happen to contain \code{\$} signs and must be defined at the top level of a class definition file. (To change this, the grammar would have to be extended with a closing delimiter for classes; otherwise, it would be ambiguous whether a field or method appearing after an inner class definition belonged to the inner class or the outer class.) The syntax for inner class names is the same as is used by the \code{javac} compiler. A good way to get an idea of the inner class names for a class is to compile the class and look at the filenames of the \code{.class} files that are produced. \begin{tabbing} \qquad \= \kill \bnfnt{class-definition} ::= \\ \qquad \bnflit{class} \bnfnt{name} \bnflit{:} \bnfnt{decl-annotation}* \lineend \\ % TODO: is the order really important? eg. can fields and methods not % be mixed? \qquad \bnfnt{typeparam-definition}* \\ \qquad \bnfnt{typeparam-bound}* \\ \qquad \bnfnt{extends}* \\ \qquad \bnfnt{implements}* \\ \qquad \bnfnt{field-definition}* \\ \qquad \bnfnt{staticinit}* \\ \qquad \bnfnt{instanceinit}* \\ \qquad \bnfnt{method-definition}* \end{tabbing} \noindent Annotations on the \bnflit{class} line are annotations on the class declaration, not the class name. \paragraph{Type parameter definitions} The \bnfnt{typeparam-definition} production defines annotations on the declaration of a type parameter, such as on \code{K} and \code{T} in \begin{verbatim} public class Class<K> { public <T> void m() { ... } } \end{verbatim} or on the type parameters on the left-hand side of a member reference, as on \code{String} in \code{List<String>::size}. \begin{tabbing} \qquad \= \kill \bnfnt{typeparam-definition} ::= \\ \qquad \bnfcmt The integer is the zero-based type parameter index. \\ \qquad \bnflit{typeparam} \bnfnt{integer} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \paragraph{Type Parameter Bounds} The \bnfnt{typeparam-bound} production defines annotations on a bound of a type variable declaration, such as on \code{Number} and \code{Date} in \begin{verbatim} public class Class<K extends Number> { public <T extends Date> void m() { ... } } \end{verbatim} \begin{tabbing} \qquad \= \kill \bnfnt{typeparam-bound} ::= \\ % The bound should really be a sub-element of the typeparam! \qquad \bnfcmt The integers are respectively the parameter and bound indexes of \\ \qquad \bnfcmt the type parameter bound~\cite{JSR308-webpage-201310}. \\ \qquad \bnflit{bound} \bnfnt{integer} \bnflit{\&} \bnfnt{integer} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \paragraph{Implements and extends} The \bnfnt{extends} and \bnfnt{implements} productions define annotations on the names of classes a class \code{extends} or \code{implements}. (Note: For interface declarations, \bnfnt{implements} rather than \bnfnt{extends} defines annotations on the names of extended interfaces.) \begin{tabbing} \qquad \= \kill \bnfnt{extends} ::= \\ \qquad \bnflit{extends} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \\ \\ \bnfnt{implements} ::= \\ \qquad \bnfcmt The integer is the zero-based index of the implemented interface. \\ \qquad \bnflit{implements} \bnfnt{integer} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \paragraph{Static and instance initializers} The \bnfnt{staticinit} and \bnfnt{instanceinit} productions define annotations on code within static or instance initializer blocks. \begin{tabbing} \qquad \= \kill \bnfnt{staticinit} ::= \\ \qquad \bnfcmt The integer is the zero-based index of the implemented interface. \\ \qquad \bnflit{staticinit} \bnflit{*} \bnfnt{integer} \bnflit{:} \lineend \\ \qquad \bnfnt{compound-type}* \\ \bnfnt{instanceinit} ::= \\ \qquad \bnfcmt The integer is the zero-based index of the implemented interface. \\ \qquad \bnflit{instanceinit} \bnflit{*} \bnfnt{integer} \bnflit{:} \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \subsubsection{Field definitions\label{field-definitons}} A field definition can have annotations on the declaration, the type of the field, or --- if in source code --- the field's initialization. \begin{tabbing} \qquad \= \kill \bnfnt{field-definition} ::= \\ \qquad \bnflit{field} \bnfnt{name} \bnflit{:} \bnfnt{decl-annotation}* \lineend \\ \qquad \bnfnt{type-annotations}* \\ \qquad \bnfnt{expression-annotations}* \end{tabbing} \noindent Annotations on the \bnflit{field} line are on the field declaration, not the type of the field. The \bnfnt{expression-annotations} production specifies annotations on the initialization expression of a field. If a field is initialized at declaration then in bytecode the initialization is moved to the constructor when the class is compiled. Therefore for bytecode, annotations on the initialization expression go in the constructor (see Section~\ref{method-definitions}), rather than the field definition. Source code annotations for the field initialization expression are valid on the field definition. \subsubsection{Method definitions\label{method-definitions}} A method definition can have annotations on the method declaration, in the method header (return type, parameters, etc.), as well as the method body. \begin{tabbing} \qquad \= \kill \bnfnt{method-definition} ::= \\ \qquad \bnflit{method} \bnfnt{method-key} \bnflit{:} \bnfnt{decl-annotation}* \lineend \\ \qquad \bnfnt{typeparam-definition}* \\ \qquad \bnfnt{typeparam-bound}* \\ \qquad \bnfnt{return-type}? \\ \qquad \bnfnt{receiver-definition}? \\ \qquad \bnfnt{parameter-definition}* \\ % TODO: method throws \qquad \bnfnt{variable-definition}* \\ \qquad \bnfnt{expression-annotations}* \end{tabbing} \noindent The annotations on the \bnflit{method} line are on the method declaration, not on the return value. The \bnfnt{method-key} consists of the name followed by the signature in JVML format. For example, the following method \begin{verbatim} boolean foo(int[] i, String s) { ... } \end{verbatim} \noindent has the \bnfnt{method-key}: \begin{verbatim} foo([ILjava/lang/String;)Z \end{verbatim} Note that the signature is the erased signature of the method and does not contain generic type information, but does contain the return type. Using \code{javap -s} makes it easy to find the signature. The method keys ``\code{<init>}'' and ``\code{<clinit>}'' are used to name instance (constructor) and class (static) initialization methods. (The name of the constructor---that is, the final element of the class name---can be used in place of ``\code{<init>}''.) For both instance and class initializers, the ``return type'' part of the signature should be \code{V} (for \code{void}). % TODO: exception types in catch clause % TODO: .class literals % TODO: type arguments in constructor and method calls \paragraph{Return type} A return type defines the annotations on the return type of a method declaration. It is also used for the result of a constructor. \begin{tabbing} \qquad \= \kill \bnfnt{return-type} ::= \\ \qquad \bnflit{return:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \paragraph{Receiver definition} A receiver definition defines the annotations on the type of the receiver parameter in a method declaration. A method receiver is the implicit formal parameter, \code{this}, used in non-static methods. For source code insertion, the receiver parameter will be inserted if it does not already exist. Only inner classes have a receiver. A top-level constructor does not have a receiver, though it does have a result. The type of a constructor result is represented as a return type. \begin{tabbing} \qquad \= \kill \bnfnt{receiver-definition} ::= \\ \qquad \bnflit{receiver:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \paragraph{Parameter definition} A formal parameter definition defines the annotations on a method formal parameter declaration and the type of a method formal parameter, but \emph{not} the receiver formal parameter. \begin{tabbing} \qquad \= \kill \bnfnt{parameter-definition} ::= \\ \qquad \bnfcmt The integer is the zero-based index of the formal parameter in the method. \\ \qquad \bnflit{parameter} \bnfnt{integer} \bnflit{:} \bnfnt{decl-annotation}* \lineend \\ \qquad \bnfnt{type-annotations}* \end{tabbing} \noindent The annotations on the \bnflit{parameter} line are on the formal parameter declaration, not on the type of the parameter. A parameter index of 0 is the first formal parameter. The receiver parameter is not index 0. Use the \bnfnt{receiver-definition} production to annotate the receiver parameter. \subsection{Bytecode Locations\label{bytecode-locations}} Certain elements in the body of a method or the initialization expression of a field can be annotated. The \bnfnt{expression-annotations} rule describes the annotations that can be added to a method body or a field initialization expression: \begin{tabbing} \qquad \= \kill \bnfnt{expression-annotations} ::= \\ \qquad \bnfnt{typecast}* \\ \qquad \bnfnt{instanceof}* \\ \qquad \bnfnt{new}* \\ \qquad \bnfnt{call}* \\ \qquad \bnfnt{reference}* \\ \qquad \bnfnt{lambda}* \\ \qquad \bnfnt{source-insert-typecast}* \\ \qquad \bnfnt{source-insert-annotation}* \end{tabbing} \noindent Additionally, a variable declaration in a method body can be annotated with the \bnfnt{variable-definition} rule, which appears below. Because of the differences between Java source code and \code{.class} files, the syntax for specifying code locations is different for \code{.class} files and source code. For \code{.class} files we use a syntax called ``bytecode offsets''. For source code we use a different syntax called ``source code indexes''. These are both described below. If you wish to be able to insert a given code annotation in both a \code{.class} file and a source code file, the annotation file must redundantly specify the annotation's bytecode offset and source code index. This can be done in a single \code{.jaif} file or two separate \code{.jaif} files. It is not necessary to include redundant information to insert annotations on signatures in both \code{.class} files and source code. Additionally, a new typecast with annotations (rather than an annotation added to an existing typecast) can be inserted into source code. This uses a third syntax that is described below under ``AST paths''. A second way to insert a typecast is by specifying just an annotation, not a full typecast (\code{insert-annotation} instead of \code{insert-typecast}). In this case, the source annotation insertion tool generates a full typecast if Java syntax requires one. \subsubsection{Bytecode offsets\label{bytecode-offsets}} For locations in bytecode, the annotation file uses offsets into the bytecode array of the class file to indicate the specific expression to which the annotation refers. Because different compilation strategies yield different \code{.class} files, a tool that maps such annotations from an annotation file into source code must have access to the specific \code{.class} file that was used to generate the annotation file. The \code{javap -v} command is an effective technique to discover bytecode offsets. Non-expression annotations such as those on methods, fields, classes, etc., do not use a bytecode offset. \subsubsection{Source code indexes\label{source-code-indexes}} For locations in source code, the annotation file indicates the kind of expression, plus a zero-based index to indicate which occurrence of that kind of expression. For example, \begin{verbatim} public void method() { Object o1 = new @A String(); String s = (@B String) o1; Object o2 = new @C Integer(0); Integer i = (@D Integer) o2; } \end{verbatim} \noindent \code{@A} is on new, index 0. \code{@B} is on typecast, index 0. \code{@C} is on new, index 1. \code{@D} is on typecast, index 1. Source code indexes only include occurrences in the class that exactly matches the name of the enclosing \bnfnt{class-definition} rule. Specifically, occurrences in nested classes are not included. Use a new \bnfnt{class-definition} rule with the name of the nested class for source code insertions in a nested class. \subsubsection{Code locations grammar\label{code-grammar}} For each kind of expression, the grammar contains a separate location rule. This location rule contains the bytecode offset syntax followed by the source code index syntax. The grammar uses \bnflit{\#} for bytecode offsets and \bnflit{*} for source code indexes. \begin{tabbing} \qquad \= \kill \bnfnt{variable-location} ::= \\ \qquad \bnfcmt Bytecode offset: the integers are respectively the index, start, and length \\ \qquad \bnfcmt fields of the annotations on this variable~\cite{JSR308-webpage-201310}. \\ \qquad (\bnfnt{integer} \bnflit{\#} \bnfnt{integer} \bnflit{+} \bnfnt{integer}) \\ \qquad \bnfcmt Source code index: the \bnfnt{name} is the identifier of the local variable. \\ \qquad \bnfcmt The \bnfnt{integer} is the optional zero-based index of the intended local \\ \qquad \bnfcmt variable within all local variables with the given \bnfnt{name}. \\ \qquad \bnfcmt The default value for the index is zero. \\ \qquad \bnfor{} (\bnfnt{name} [\bnflit{*} \bnfnt{integer}]) \\ \\ \bnfnt{variable-definition} ::= \\ \qquad \bnfcmt The annotations on the \bnflit{local} line are on the variable declaration, \\ \qquad \bnfcmt not the type of the variable. \\ \qquad \bnflit{local} \bnfnt{variable-location} \bnflit{:} \bnfnt{decl-annotation}* \lineend \\ \qquad \bnfnt{type-annotations}* \\ \\ \bnfnt{typecast-location} ::= \\ \qquad \bnfcmt Bytecode offset: the first integer is the offset field and the optional \\ \qquad \bnfcmt second integer is the type index of an intersection type~\cite{JSR308-webpage-201310}. \\ \qquad \bnfcmt The type index defaults to zero if not specified. \\ \qquad (\bnflit{\#} \bnfnt{integer} [ \bnflit{,} \bnfnt{integer} ]) \\ \qquad \bnfcmt Source code index: the first integer is the zero-based index of the typecast \\ \qquad \bnfcmt within the method and the optional second integer is the type index of an \\ \qquad \bnfcmt intersection type~\cite{JSR308-webpage-201310}. The type index defaults to zero if not specified. \\ \qquad \bnfor{} (\bnflit{*} \bnfnt{integer} [ \bnflit{,} \bnfnt{integer} ]) \\ \\ \bnfnt{typecast} ::= \\ \qquad \bnflit{typecast} \bnfnt{typecast-location} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \\ \\ \bnfnt{instanceof-location} ::= \\ \qquad \bnfcmt Bytecode offset: the integer is the offset field of the annotation~\cite{JSR308-webpage-201310}. \\ \qquad (\bnflit{\#} \bnfnt{integer}) \\ \qquad \bnfcmt Source code index: the integer is the zero-based index of the \code{instanceof} \\ \qquad \bnfcmt within the method. \\ \qquad \bnfor{} (\bnflit{*} \bnfnt{integer}) \\ \\ \bnfnt{instanceof} ::= \\ \qquad \bnflit{instanceof} \bnfnt{instanceof-location} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \\ \\ \bnfnt{new-location} ::= \\ \qquad \bnfcmt Bytecode offset: the integer is the offset field of the annotation~\cite{JSR308-webpage-201310}. \\ \qquad (\bnflit{\#} \bnfnt{integer}) \\ \qquad \bnfcmt Source code index: the integer is the zero-based index of the object or array \\ \qquad \bnfcmt creation within the method. \\ \qquad \bnfor{} (\bnflit{*} \bnfnt{integer}) \\ \\ \bnfnt{new} ::= \\ \qquad \bnflit{new} \bnfnt{new-location} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \\ \bnfnt{call-location} ::= \\ \qquad \bnfcmt Bytecode offset: the integer is the offset field of the annotation~\cite{JSR308-webpage-201310}. \\ \qquad (\bnflit{\#} \bnfnt{integer}) \\ \qquad \bnfcmt Source code index: the integer is the zero-based index of the method call \\ \qquad \bnfcmt within the field or method definition. \\ \qquad \bnfor{} (\bnflit{*} \bnfnt{integer}) \\ \\ \bnfnt{call} ::= \\ \qquad \bnflit{call} \bnfnt{call-location} \bnflit{:} \lineend \\ \qquad \bnfnt{typearg-definition}* \\ \\ \bnfnt{reference-location} ::= \\ \qquad \bnfcmt Bytecode offset: the integer is the offset field of the annotation~\cite{JSR308-webpage-201310}. \\ \qquad (\bnflit{\#} \bnfnt{integer}) \\ \qquad \bnfcmt Source code index: the integer is the zero-based index of the member \\ \qquad \bnfcmt reference~\cite{JSR308-webpage-201310}. \\ \qquad \bnfor{} (\bnflit{*} \bnfnt{integer}) \\ \\ \bnfnt{reference} ::= \\ \qquad \bnflit{reference} \bnfnt{reference-location} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \\ \qquad \bnfnt{typearg-definition}* \\ \\ \bnfnt{lambda-location} ::= \\ \qquad \bnfcmt Bytecode offset: the integer is the offset field of the annotation~\cite{JSR308-webpage-201310}. \\ \qquad (\bnflit{\#} \bnfnt{integer}) \\ \qquad \bnfcmt Source code index: the integer is the zero-based index of the lambda \\ \qquad \bnfcmt expression~\cite{JSR308-webpage-201310}. \\ \qquad \bnfor{} (\bnflit{*} \bnfnt{integer}) \\ \\ \bnfnt{lambda} ::= \\ \qquad \bnflit{lambda} \bnfnt{lambda-location} \bnflit{:} \lineend \\ %\qquad \bnfnt{return-type}? \\ \qquad \bnfnt{parameter-definition}* \\ \qquad \bnfnt{variable-definition}* \\ \qquad \bnfnt{expression-annotations}* \\ \qquad \= \kill \bnfnt{typearg-definition} ::= \\ \qquad \bnfcmt The integer is the zero-based type argument index. \\ \qquad \bnflit{typearg} \bnfnt{integer} \bnflit{:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \subsubsection{AST paths\label{ast-paths}} A path through the AST (abstract syntax tree) specifies an arbitrary expression in source code to modify. AST paths can be used in the \code{.jaif} file to specify a location to insert either a bare annotation (\bnflit{insert-annotation}) or a cast (\bnflit{insert-typecast}). For a cast insertion, the \code{.jaif} file specifies the type to cast to. The annotations on the \bnflit{insert-typecast} line will be inserted on the outermost type of the type to cast to. If the type to cast to is a compound type then annotations on parts of the compound type are specified with the \bnfnt{compound-type} rule. If there are no annotations on the \bnflit{insert-typecast} line then a cast with no annotations will be inserted or, if compound type annotations are specified, a cast with annotations only on the compound types will be inserted. Note that the type specified on the \bnflit{insert-typecast} line cannot contain any qualified type names. For example, use \code{Entry<String, Object>} instead of \code{Map.Entry<java.lang.String, java.lang.Object>}. \begin{tabbing} \bnfnt{source-insert-typecast} ::= \\ \qquad \bnfcmt \bnfnt{ast-path} is described below. \\ \qquad \bnfcmt \bnfnt{type} is the un-annotated type to cast to. \\ \qquad \bnflit{insert-typecast} \bnfnt{ast-path}\bnflit{:} \bnfnt{type-annotation}* \bnfnt{type} \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} An AST path represents a traversal through the AST. AST paths can only be used in \bnfnt{field-definition}s and \bnfnt{method-definition}s. An AST path starts with the first element under the definition. For methods this is \code{Block} and for fields this is \code{Variable}. An AST path is composed of one or more AST entries, separated by commas. Each AST entry is composed of a tree kind, a child selector, and an optional argument. An example AST entry is: \begin{verbatim} Block.statement 1 \end{verbatim} The tree kind is \code{Block}, the child selector is \code{statement} and the argument is \code{1}. The available tree kinds correspond to the Java AST tree nodes (from the package \code{com.sun.source.tree}), but with ``Tree'' removed from the name. For example, the class \code{com.sun.source.tree.BlockTree} is represented as \code{Block}. The child selectors correspond to the method names of the given Java AST tree node, with ``get'' removed from the beginning of the method name and the first letter lowercased. In cases where the child selector method returns a list, the method name is made singular and the AST entry also contains an argument to select the index of the list to take. For example, the method \code{com\-.sun\-.source\-.tree\-.Block\-Tree\-.get\-Statements()} is represented as \code{Block.statement} and requires an argument to select the statement to take. The following is an example of an entire AST path: \begin{verbatim} Block.statement 1, Switch.case 1, Case.statement 0, ExpressionStatement.expression, MethodInvocation.argument 0 \end{verbatim} Since the above example starts with a \code{Block} it belongs in a \bnfnt{method-definition}. This AST path would select an expression that is in statement 1 of the method, case 1 of the switch statement, statement 0 of the case, and argument 0 of a method call (\code{ExpressionStatement} is just a wrapper around an expression that can also be a statement). The following is an example of an annotation file with AST paths used to specify where to insert casts. \begin{verbatim} package p: annotation @A: class ASTPathExample: field a: insert-typecast Variable.initializer, Binary.rightOperand: @A Integer method m()V: insert-typecast Block.statement 0, Variable.initializer: @A Integer insert-typecast Block.statement 1, Switch.case 1, Case.statement 0, ExpressionStatement.expression, MethodInvocation.argument 0: @A Integer \end{verbatim} And the matching source code: \begin{verbatim} package p; public class ASTPathExample { private int a = 12 + 13; public void m() { int x = 1; switch (x + 2) { case 1: System.out.println(1); break; case 2: System.out.println(2 + x); break; default: System.out.println(-1); } } } \end{verbatim} The following is the output, with the casts inserted. \begin{verbatim} package p; import p.A; public class ASTPathExample { private int a = 12 + ((@A Integer) (13)); public void m() { int x = ((@A Integer) (1)); switch (x + 2) { case 1: System.out.println(1); break; case 2: System.out.println(((@A Integer) (2 + x))); break; default: System.out.println(-1); } } } \end{verbatim} Using \code{insert-annotation} instead of \code{insert-typecast} yields almost the same result --- it also inserts a cast. The sole difference is the inability to specify the type in the cast expression. If you use \code{insert-annotation}, then the annotation inserter infers the type, which is \code{int} in this case. Note that a cast can be inserted on any expression, not just the deepest expression in the AST. For example, a cast could be inserted on the expression \code{i + j}, the identifier \code{i}, and/or the identifier \code{j}. To help create correct AST paths it may be useful to view the AST of a class. The Checker Framework has a processor to do this. The following command will output indented AST nodes for the entire input program. \begin{verbatim} javac -processor org.checkerframework.common.util.debug.TreeDebug ASTPathExample.java \end{verbatim} The following is the grammar for AST paths. \begin{tabbing} \qquad \= \kill \bnfnt{ast-path} ::= \\ \qquad \bnfnt{ast-entry} [ \bnflit{,} \bnfnt{ast-entry} ]+ \\ \\ \bnfnt{ast-entry} ::= \\ \qquad \bnfnt{annotated-type} \\ \qquad \bnfor{} \bnfnt{annotation} \\ \qquad \bnfor{} \bnfnt{array-access} \\ \qquad \bnfor{} \bnfnt{array-type} \\ \qquad \bnfor{} \bnfnt{assert} \\ \qquad \bnfor{} \bnfnt{assignment} \\ \qquad \bnfor{} \bnfnt{binary} \\ \qquad \bnfor{} \bnfnt{block} \\ \qquad \bnfor{} \bnfnt{case} \\ \qquad \bnfor{} \bnfnt{catch} \\ \qquad \bnfor{} \bnfnt{compound-assignment} \\ \qquad \bnfor{} \bnfnt{conditional-expression} \\ \qquad \bnfor{} \bnfnt{do-while-loop} \\ \qquad \bnfor{} \bnfnt{enhanced-for-loop} \\ \qquad \bnfor{} \bnfnt{expression-statement} \\ \qquad \bnfor{} \bnfnt{for-loop} \\ \qquad \bnfor{} \bnfnt{if} \\ \qquad \bnfor{} \bnfnt{instance-of} \\ \qquad \bnfor{} \bnfnt{intersection-type} \\ \qquad \bnfor{} \bnfnt{labeled-statement} \\ \qquad \bnfor{} \bnfnt{lambda-expression} \\ \qquad \bnfor{} \bnfnt{member-reference} \\ \qquad \bnfor{} \bnfnt{member-select} \\ \qquad \bnfor{} \bnfnt{method-invocation} \\ \qquad \bnfor{} \bnfnt{new-array} \\ \qquad \bnfor{} \bnfnt{new-class} \\ \qquad \bnfor{} \bnfnt{parameterized-type} \\ \qquad \bnfor{} \bnfnt{parenthesized} \\ \qquad \bnfor{} \bnfnt{return} \\ \qquad \bnfor{} \bnfnt{switch} \\ \qquad \bnfor{} \bnfnt{synchronized} \\ \qquad \bnfor{} \bnfnt{throw} \\ \qquad \bnfor{} \bnfnt{try} \\ \qquad \bnfor{} \bnfnt{type-cast} \\ \qquad \bnfor{} \bnfnt{type-parameter} \\ \qquad \bnfor{} \bnfnt{unary} \\ \qquad \bnfor{} \bnfnt{union-type} \\ \qquad \bnfor{} \bnfnt{variable-type} \\ \qquad \bnfor{} \bnfnt{while-loop} \\ \qquad \bnfor{} \bnfnt{wildcard-tree} \\ \\ \bnfnt{annotated-type} :: = \\ \qquad \bnflit{AnnotatedType} \bnflit{.} ( ( \bnflit{annotation} \bnfnt{integer} ) \bnfor{} \bnflit{underlyingType} ) \\ \\ \bnfnt{annotation} ::= \\ \qquad \bnflit{Annotation} \bnflit{.} ( \bnflit{type} \bnfor{} \bnflit{argument} \bnfnt{integer} ) \\ \\ \bnfnt{array-access} ::= \\ \qquad \bnflit{ArrayAccess} \bnflit{.} ( \bnflit{expression} \bnfor{} \bnflit{index} ) \\ \\ \bnfnt{array-type} ::= \\ \qquad \bnflit{ArrayType} \bnflit{.} \bnflit{type} \\ \\ \bnfnt{assert} ::= \\ \qquad \bnflit{Assert} \bnflit{.} ( \bnflit{condition} \bnfor{} \bnflit{detail} ) \\ \\ \bnfnt{assignment} ::= \\ \qquad \bnflit{Assignment} \bnflit{.} ( \bnflit{variable} \bnfor{} \bnflit{expression} ) \\ \\ \bnfnt{binary} ::= \\ \qquad \bnflit{Binary} \bnflit{.} ( \bnflit{leftOperand} \bnfor{} \bnflit{rightOperand} ) \\ \\ \bnfnt{block} ::= \\ \qquad \bnflit{Block} \bnflit{.} \bnflit{statement} \bnfnt{integer} \\ \\ \bnfnt{case} ::= \\ \qquad \bnflit{Case} \bnflit{.} ( \bnflit{expression} \bnfor{} ( \bnflit{statement} \bnfnt{integer} ) ) \\ \\ \bnfnt{catch} ::= \\ \qquad \bnflit{Catch} \bnflit{.} ( \bnflit{parameter} \bnfor{} \bnflit{block} ) \\ \\ \bnfnt{compound-assignment} ::= \\ \qquad \bnflit{CompoundAssignment} \bnflit{.} ( \bnflit{variable} \bnfor{} \bnflit{expression} ) \\ \\ \bnfnt{conditional-expression} ::= \\ \qquad \bnflit{ConditionalExpression} \bnflit{.} ( \bnflit{condition} \bnfor{} \bnflit{trueExpression} \bnfor{} \bnflit{falseExpression} ) \\ \\ \bnfnt{do-while-loop} ::= \\ \qquad \bnflit{DoWhileLoop} \bnflit{.} ( \bnflit{condition} \bnfor{} \bnflit{statement} ) \\ \\ \bnfnt{enhanced-for-loop} ::= \\ \qquad \bnflit{EnhancedForLoop} \bnflit{.} ( \bnflit{variable} \bnfor{} \bnflit{expression} \bnfor{} \bnflit{statement} ) \\ \\ \bnfnt{expression-statement} ::= \\ \qquad \bnflit{ExpressionStatement} \bnflit{.} \bnflit{expression} \\ \\ \bnfnt{for-loop} ::= \\ \qquad \bnflit{ForLoop} \bnflit{.} ( ( \bnflit{initializer} \bnfnt{integer} ) \bnfor{} \bnflit{condition} \bnfor{} ( \bnflit{update} \bnfnt{integer} ) \bnfor{} \bnflit{statement} ) \\ \\ \bnfnt{if} ::= \\ \qquad \bnflit{If} \bnflit{.} ( \bnflit{condition} \bnfor{} \bnflit{thenStatement} \bnfor{} \bnflit{elseStatement} ) \\ \\ \bnfnt{instance-of} ::= \\ \qquad \bnflit{InstanceOf} \bnflit{.} ( \bnflit{expression} \bnfor{} \bnflit{type} ) \\ \\ \bnfnt{intersection-type} ::= \\ \qquad \bnflit{IntersectionType} \bnflit{.} \bnflit{bound} \bnfnt{integer} \\ \\ \bnfnt{labeled-statement} ::= \\ \qquad \bnflit{LabeledStatement} \bnflit{.} \bnflit{statement} \\ \\ \bnfnt{lambda-expression} ::= \\ \qquad \bnflit{LambdaExpression} \bnflit{.} ( ( \bnflit{parameter} \bnfnt{integer} ) \bnfor{} \bnflit{body} ) \\ \\ \bnfnt{member-reference} ::= \\ \qquad \bnflit{MemberReference} \bnflit{.} ( \bnflit{qualifierExpression} \bnfor{} ( \bnflit{typeArgument} \bnfnt{integer} ) ) \\ \\ \bnfnt{member-select} ::= \\ \qquad \bnflit{MemberSelect} \bnflit{.} \bnflit{expression} \\ \\ \bnfnt{method-invocation} ::= \\ \qquad \bnflit{MethodInvocation} \bnflit{.} ( ( \bnflit{typeArgument} \bnfnt{integer} ) \bnfor{} \bnflit{methodSelect} \\ \qquad \bnfor{} ( \bnflit{argument} \bnfnt{integer} ) ) \\ \\ \bnfnt{new-array} ::= \\ \qquad \bnflit{NewArray} \bnflit{.} ( \bnflit{type} \bnfor{} ( \bnflit{dimension} \bnfor{} \bnflit{initializer} ) \bnfnt{integer} ) \\ \\ \bnfnt{new-class} ::= \\ \qquad \bnflit{NewClass} \bnflit{.} ( \bnflit{enclosingExpression} \bnfor{} ( \bnflit{typeArgument} \bnfnt{integer} ) \bnfor{} \bnflit{identifier} \\ \qquad \bnfor{} ( \bnflit{argument} \bnfnt{integer} ) \bnfor{} \bnflit{classBody} ) \\ \\ \bnfnt{parameterized-type} ::= \\ \qquad \bnflit{ParameterizedType} \bnflit{.} ( \bnflit{type} \bnfor{} ( \bnflit{typeArgument} \bnfnt{integer} ) ) \\ \\ \bnfnt{parenthesized} ::= \\ \qquad \bnflit{Parenthesized} \bnflit{.} \bnflit{expression} \\ \\ \bnfnt{return} ::= \\ \qquad \bnflit{Return} \bnflit{.} \bnflit{expression} \\ \\ \bnfnt{switch} ::= \\ \qquad \bnflit{Switch} \bnflit{.} ( \bnflit{expression} \bnfor{} ( \bnflit{case} \bnfnt{integer} ) ) \\ \\ \bnfnt{synchronized} ::= \\ \qquad \bnflit{Synchronized} \bnflit{.} ( \bnflit{expression} \bnfor{} \bnflit{block} ) \\ \\ \bnfnt{throw} ::= \\ \qquad \bnflit{Throw} \bnflit{.} \bnflit{expression} \\ \\ \bnfnt{try} ::= \\ \qquad \bnflit{Try} \bnflit{.} ( \bnflit{block} \bnfor{} ( \bnflit{catch} \bnfnt{integer} ) \bnfor{} \bnflit{finallyBlock} \bnfor{} ( \bnflit{resource} \bnfnt{integer} ) ) \\ \\ \bnfnt{type-cast} ::= \\ \qquad \bnflit{TypeCast} \bnflit{.} ( \bnflit{type} \bnfor{} \bnflit{expression} ) \\ \\ \bnfnt{type-parameter} ::= \\ \qquad \bnflit{TypeParameter} \bnflit{.} \bnflit{bound} \bnfnt{integer} \\ \\ \bnfnt{unary} ::= \\ \qquad \bnflit{Unary} \bnflit{.} \bnflit{expression} \\ \\ \bnfnt{union-type} ::= \\ \qquad \bnflit{UnionType} \bnflit{.} \bnflit{typeAlternative} \bnfnt{integer} \\ \\ \bnfnt{variable} ::= \\ \qquad \bnflit{Variable} \bnflit{.} ( \bnflit{type} \bnfor{} \bnflit{initializer} ) \\ \\ \bnfnt{while-loop} ::= \\ \qquad \bnflit{WhileLoop} \bnflit{.} ( \bnflit{condition} \bnfor{} \bnflit{statement} ) \\ \\ \bnfnt{wildcard} ::= \\ \qquad \bnflit{Wildcard} \bnflit{.} \bnflit{bound} \\ \\ \end{tabbing} \subsection{Annotations\label{annotations-grammar}} This section describes the details of how annotations are defined, how annotations are used, and the different kinds of annotations in an annotation file. \subsubsection{Annotation definitions\label{annotation-definitions}} An annotation definition describes the annotation's fields and their types, so that they may be referenced in a compact way throughout the annotation file. Any annotation that is used in an annotation file % either on a program element or as a field of another annotation definition. must be defined before use. (This requirement makes it impossible to define, in an annotation file, an annotation that is meta-annotated with itself.) The two exceptions to this rule are the \code{@java.lang.annotation.Target} and \code{@java.lang.annotation.Retention} meta-annotations. These meta-annotations are often used in annotation definitions so for ease of use are they not required to be defined themselves. In the annotation file, the annotation definition appears within the package that defines the annotation. The annotation may be applied to elements of any package. Note that these annotation definitions should not be confused with the \code{@interface} syntax used in a Java source file to declare an annotation. An annotation definition in an annotation file is only used internally. An annotation definition in an annotation file will often mirror an \code{@interface} annotation declaration in a Java source file in order to use that annotation in an annotation file. % TODO, see https://github.com/typetools/annotation-tools/issues/25 % The Annotation File Utilities can read annotation definitions from the % classpath, so it is optional to define them in the annotation file. \begin{tabbing} \qquad \= \kill \bnfnt{annotation-definition} ::= \\ \qquad \bnfcmt The \bnfnt{decl-annotation}s are the meta-annotations on this annotation. \\ \qquad \bnflit{annotation} \bnflit{@}\bnfnt{name} \bnflit{:} \bnfnt{decl-annotation}* \lineend \\ \qquad \bnfnt{annotation-field-definition}* \\ \\ \bnfnt{annotation-field-definition} ::= \\ \qquad \bnfnt{annotation-field-type} \bnfnt{name} \lineend \\ \\ \bnfnt{annotation-field-type} ::= \\ \qquad \bnfcmt \bnfnt{primitive-type} is any Java primitive type (\code{int}, \code{boolean}, etc.). \\ \qquad \bnfcmt These are described in detail in Section~\ref{types-and-values}. \\ \qquad (\bnfnt{primitive-type} \bnfor{} \bnflit{String} \bnfor{} \bnflit{Class} \bnfor{} (\bnflit{enum} \bnfnt{name}) \bnfor{} (\bnflit{annotation-field} \bnfnt{name})) \bnflit{[]}? \\ \qquad \bnfor{} \bnflit{unknown[]} \lineend \end{tabbing} \subsubsection{Annotation uses\label{annotation-uses}} Java SE 8 has two kinds of annotations: ``declaration annotations'' and ``type annotations''. Declaration annotations can be written only on method formal parameters and the declarations of packages, classes, methods, fields, and local variables. Type annotations can be written on any use of a type, and on type parameter declarations. Type annotations must be meta-annotated with \code{ElementType.TYPE\_USE} and/or \code{ElementType.TYPE\_PARAMETER}. These meta-annotations are described in more detail in the JSR 308 specification~\cite{JSR308-webpage-201310}. The previous rules have used two productions for annotation uses in an annotation file: \bnfnt{decl-annotation} and \bnfnt{type-annotation}. The \bnfnt{decl-annotation} and \bnfnt{type-annotation} productions use the same syntax to specify an annotation. These two different rules exist only to show which type of annotation is valid in a given location. A declaration annotation must be used where the \bnfnt{decl-annotation} production is used and a type annotation must be used where the \bnfnt{type-annotation} production is used. The syntax for an annotation is the same as in a Java source file. \begin{tabbing} \qquad \= \kill \bnfnt{decl-annotation} ::= \\ \qquad \bnfcmt \bnfnt{annotation} must be a declaration annotation. \\ \qquad \bnfnt{annotation} \\ \\ \bnfnt{type-annotation} ::= \\ \qquad \bnfcmt \bnfnt{annotation} must be a type annotation. \\ \qquad \bnfnt{annotation} \\ \\ \bnfnt{annotation} ::= \\ \qquad \bnfcmt The name may be the annotation's simple name, unless the file \\ \qquad \bnfcmt contains definitions for two annotations with the same simple name. \\ \qquad \bnfcmt In this case, the fully-qualified name of the annotation name is required. \\ % TODO: % Perhaps we could add that if a class is in the same package % as an annotation it may always use the simple name (even if there's another % annotation with the same simple name in another package)? - MP 06/28 \qquad \bnflit{@}\bnfnt{name} [ \bnflit{(} \bnfnt{annotation-field} [ \bnflit{,} \bnfnt{annotation-field} ]+ \bnflit{)} ] \\ \\ \bnfnt{annotation-field} ::= \\ \qquad \bnfcmt In Java, if a single-field annotation has a field named \\ \qquad \bnfcmt ``\code{value}'', then that field name may be elided in uses of the\\ \qquad \bnfcmt annotation: ``\code{@A(12)}'' rather than ``\code{@A(value=12)}''. \\ \qquad \bnfcmt The same convention holds in an annotation file. \\ \qquad \bnfnt{name} \bnflit{=} \bnfnt{value} \end{tabbing} \noindent Certain Java elements allow both declaration and type annotations (for example, formal method parameters). For these elements, the \bnfnt{type-annotations} rule is used to differentiate between the declaration annotations and the type annotations. \begin{tabbing} \qquad \= \kill \bnfnt{type-annotations} ::= \\ \qquad \bnfcmt holds the type annotations, as opposed to the declaration annotations. \\ \qquad \bnflit{type:} \bnfnt{type-annotation}* \lineend \\ \qquad \bnfnt{compound-type}* \end{tabbing} \paragraph{Compound type annotations} A compound type is a parameterized, wildcard, array, or nested type. Annotations may be on any type in a compound type. In order to specify the location of an annotation within a compound type we use a ``type path''. A type path is composed one or more pairs of type kind and type argument index. \begin{tabbing} \qquad \= \kill \bnfnt{type-kind} ::= \\ \qquad \bnflit{0} \bnfcmt annotation is deeper in this array type \\ \qquad \bnfor{} \bnflit{1} \bnfcmt annotation is deeper in this nested type \\ \qquad \bnfor{} \bnflit{2} \bnfcmt annotation is on the bound of this wildcard type argument \\ \qquad \bnfor{} \bnflit{3} \bnfcmt annotation is on the i'th type argument of this parameterized type \\ \\ \bnfnt{type-path} ::= \\ \qquad \bnfcmt The \bnfnt{integer} is the type argument index. \\ \qquad \bnfnt{type-kind} \bnflit{,} \bnfnt{integer} [ \bnflit{,} \bnfnt{type-kind} \bnflit{,} \bnfnt{integer} ]* \\ \\ \bnfnt{compound-type} ::= \\ \qquad \bnflit{inner-type} \bnfnt{type-path} \bnflit{:} \bnfnt{annotation}* \lineend \end{tabbing} \noindent The type argument index used in the \bnfnt{type-path} rule must be \bnflit{0} unless the \bnfnt{type-kind} is \bnflit{3}. In this case, the type argument index selects which type argument of a parameterized type to use. \urldef\cftp\url|https://checkerframework.org/jsr308/specification/java-annotation-design.html#class-file:ext:type_path| Type paths are explained in more detail, with many examples to ease understanding, in Section 3.4 of the JSR 308 Specification.\footnotemark \footnotetext{\cftp} \section{Example\label{example}} Consider the code of Figure~\ref{fig:java-example}. Figure~\ref{fig:annotation-file-examples} shows two legal annotation files each of which represents its annotations. \begin{figure} \begin{verbatim} package p1; import p2.*; // for the annotations @A through @D import java.util.*; public @A(12) class Foo { public int bar; // no annotation private @B List<@C String> baz; public Foo(@D("spam") Foo this, @B List<@C String> a) { @B List<@C String> l = new LinkedList<@C String>(); l = (@B List<@C String>)l; } } \end{verbatim} \caption{Example Java code with annotations.} \label{fig:java-example} \end{figure} \begin{figure} \begin{tabular}{|c|c|} \hline \begin{minipage}[t]{.5\textwidth} \begin{verbatim} package p2: annotation @A: int value annotation @B: annotation @C: annotation @D: String value package p1: class Foo: @A(value=12) field bar: field baz: @B inner-type 0: @C method <init>( Ljava/util/List;)V: parameter 0: @B inner-type 0: @C receiver: @D(value="spam") local 1 #3+5: @B inner-type 0: @C typecast #7: @B inner-type 0: @C new #0: inner-type 0: @C \end{verbatim} \end{minipage} & \begin{minipage}[t]{.45\textwidth} \begin{verbatim} package p2: annotation @A int value package p2: annotation @B package p2: annotation @C package p2: annotation @D String value package p1: class Foo: @A(value=12) package p1: class Foo: field baz: @B package p1: class Foo: field baz: inner-type 0: @C // ... definitions for p1.Foo.<init> // omitted for brevity \end{verbatim} \end{minipage} \\ \hline \end{tabular} \caption{Two distinct annotation files each corresponding to the code of Figure~\ref{fig:java-example}.} \label{fig:annotation-file-examples} \end{figure} \section{Types and values\label{types-and-values}} The Java language permits several types for annotation fields: primitives, \code{String}s, \code{java.lang.Class} tokens (possibly parameterized), enumeration constants, annotations, and one-dimensional arrays of these. These \textbf{types} are represented in an annotation file as follows: \begin{itemize} \item Primitive: the name of the primitive type, such as \code{boolean}. \item String: \code{String}. \item Class token: \code{Class}; the parameterization, if any, is not represented in annotation files. \item Enumeration constant: \code{enum} followed by the binary name of the enumeration class, such as \code{enum java.lang.Thread\$State}. \item Annotation: \code{@} followed by the binary name of the annotation type. \item Array: The representation of the element type followed by \code{[]}, such as \code{String[]}, with one exception: an annotation definition may specify a field type as \code{unknown[]} if, in all occurrences of that annotation in the annotation file, the field value is a zero-length array.\footnotemark \footnotetext{There is a design flaw in the format of array field values in a class file. An array does not itself specify an element type; instead, each element specifies its type. If the annotation type \code{X} has an array field \code{arr} but \code{arr} is zero-length in every \code{@X} annotation in the class file, there is no way to determine the element type of \code{arr} from the class file. This exception makes it possible to define \code{X} when the class file is converted to an annotation file.} \end{itemize} Annotation field \textbf{values} are represented in an annotation file as follows: \begin{itemize} \item Numeric primitive value: literals as they would appear in Java source code. \item Boolean: \code{true} or \code{false}. \item Character: A single character or escape sequence in single quotes, such as \code{'A'} or \code{'\char`\\''}. \item String: A string literal as it would appear in source code, such as \code{"\char`\\"Yields falsehood when quined\char`\\" yields falsehood when quined."}. \item Class token: The binary name of the class (using \code{\$} for inner classes) or the name of the primitive type or \code{void}, possibly followed by \code{[]}s representing array layers, followed by \code{.class}. Examples: \code{java.lang.Integer[].class}, \code{java.util.Map\$Entry.class}, and \code{int.class}. \item Enumeration constant: the name of the enumeration constant, such as \code{RUNNABLE}. \item Array: a sequence of elements inside \code{\char`\{\char`\}} with a comma between each pair of adjacent elements; a comma following the last element is optional as in Java. Also as in Java, the braces may be omitted if the array has only one element. Examples: \code{\char`\{1\char`\}}, \code{1}, \code{\char`\{true, false,\char`\}} and \code{\char`\{\char`\}}. \end{itemize} The following example annotation file shows how types and values are represented. \begin{verbatim} package p1: annotation @ClassInfo: String remark Class favoriteClass Class favoriteCollection // it's probably Class<? extends Collection> // in source, but no parameterization here char favoriteLetter boolean isBuggy enum p1.DebugCategory[] defaultDebugCategories @p1.CommitInfo lastCommit annotation @CommitInfo: byte[] hashCode int unixTime String author String message class Foo: @p1.ClassInfo( remark="Anything named \"Foo\" is bound to be good!", favoriteClass=java.lang.reflect.Proxy.class, favoriteCollection=java.util.LinkedHashSet.class, favoriteLetter='F', isBuggy=true, defaultDebugCategories={DEBUG_TRAVERSAL, DEBUG_STORES, DEBUG_IO}, lastCommit=@p1.CommitInfo( hashCode={31, 41, 59, 26, 53, 58, 97, 92, 32, 38, 46, 26, 43, 38, 32, 79}, unixTime=1152109350, author="Joe Programmer", message="First implementation of Foo" ) ) \end{verbatim} \section{Alternative formats\label{alternative-formats}} We mention multiple alternatives to the format described in this document. Each of them has its own merits. In the future, the other formats could be implemented, along with tools for converting among them. % Then, we can see which of the formats programmers prefer in practice. An alternative to the format described in this document would be XML\@. % It would be easy to use an XML format to augment the one proposed here, but XML does not seem to provide any compelling advantages. Programmers interact with annotation files in two ways: textually (when reading, writing, and editing annotation files) and programmatically (when writing annotation-processing tools). Textually, XML can be very hard to read; style sheets mitigate this problem, but editing XML files remains tedious and error-prone. Programmatically, a layer of abstraction (an API) is needed in any event, so it makes little difference what the underlying textual representation is. XML files are easier to parse, but the parsing code only needs to be written once and is abstracted away by an API to the data structure. Another alternative is a format like the \code{.spec}/\code{.jml} files of JML~\cite{LeavensBR2006:JML}. The format is similar to Java code, but all method bodies are empty, and users can annotate the public members of a class. This is easy for Java programmers to read and understand. (It is a bit more complex to implement, but that is not particularly germane.) Because it does not permit complete specification of a class's annotations (it does not permit annotation of method bodies), it is not appropriate for certain tools, such as type inference tools. However, it might be desirable to adopt such a format for public members, and to use the format described in this document primarily for method bodies. The Checker Framework~\cite{DietlDEMS2011,CF} uses two additional formats for annotations. The first format is called ``stub files.'' A stub file is similar to the \code{.spec}/\code{.jml} files described in the previous paragraph. It uses Java syntax, only allows annotations on method headers and does not require method bodies. A stub file is used to add annotations to method headers of existing Java classes. For example, the Checker Framework uses stub files to add annotations to method headers of libraries (such as the JDK) without modifying the source code or bytecode of the library. A single stub file can contain multiple packages and classes. This format only allows annotations on method headers, not class headers, fields, and method bodies like in a \code{.jaif} file. Further, stub files are only used by the Checker Framework at run time, they cannot be used to insert annotations into a source or classfile. The Checker Framework also uses a format called an ``annotated JDK.'' The annotated JDK is a \code{.jar} file containing the JDK with annotations. It is created with the Annotation File Utilities, but the annotations are stored in a format similar to a stub file, instead of in a \code{.jaif} file. The annotated JDK starts with a source file for each file in the JDK to be annotated. Like a stub file, each source file only contains method headers with annotations. The annotated JDK also supports annotations in the class header. To build the annotated JDK \code{.jar} file, the source files are compiled, then the \code{extract-annotations} script is run on them to generate a \code{.jaif} file for each source file. The \code{insert-annotations} script then inserts the annotations contained in each \code{.jaif} file into the corresponding JDK class file. These are then packaged up into a single \code{.jar} file. Like a stub files, the annotated JDK is easier to read and write since it uses Java syntax. However, the annotated JDK requires a different file for each original Java source file. It does not allow annotations on fields and in method bodies. The annotated JDK also only contains annotations in the JDK and not other Java files. \bibliographystyle{alpha} \bibliography{annotation-file-format,bibstring-unabbrev,types,ernst,invariants,generals,alias,crossrefs} \end{document} % LocalWords: java javac OuterClass InnerClass TODO Kleene MP subannotations % LocalWords: enum arr quined int pt instanceof RUNTIME JVML ILjava boolean % LocalWords: programmatically jml ernst jaif whitespace 0pt decl enums % LocalWords: filenames typeparam javap init clinit ast un lowercased io % LocalWords: ExpressionStatement AnnotatedType underlyingType ArrayType % LocalWords: ArrayAccess leftOperand rightOperand CompoundAssignment % LocalWords: ConditionalExpression trueExpression falseExpression % LocalWords: DoWhileLoop EnhancedForLoop ForLoop thenStatement NewArray % LocalWords: elseStatement InstanceOf LabeledStatement LambdaExpression % LocalWords: MemberReference qualifierExpression typeArgument NewClass % LocalWords: MemberSelect MethodInvocation methodSelect classBody % LocalWords: enclosingExpression ParameterizedType finallyBlock AScene % LocalWords: TypeCast UnionType typeAlternative WhileLoop ElementType % LocalWords: AClass AMethod AElement objectweb anno tations parseScene % LocalWords: CriterionList isSatisifiedBy CriteriaList afu getPositions % LocalWords: InPackageCriterion InClassCriterion InMethodCriterion % LocalWords: ParamCriterion inserter RUNNABLE ASM src asm