Crafting types with Scala 3 macros - Part 1: Introduction to macros

With the release of Scala 3, one of the biggest changes to the language revolves around metaprogramming: inline functions, match types, generic-programming tools like tuple types and Mirrors, as well as a new macro API have been added to make code generation a first-class concern of the language. Naturally, one of the first things you may want to do is generate new classes, which is much harder than it sounds. My goal for this series of blog posts is to teach you all of the secret tricks to work around macro limitations and obtain the forbidden fruit (or something close to it). Part 2 can be found here.

Anyone looking into Scala metaprogramming on Stack Overflow will quickly find out that Scala 3 macros cannot generate new public classes, variables or functions that are visible to the user. This is on purpose and likely won’t change in the near future because it would make incremental compilation much more difficult, but we won’t let that stop us. In this first part, I will provide a general introduction to Scala 3 metaprogramming facilities and the macro APIs in particular, with a focus on their suitability for generating new classes. In the following installments of the series, we will work through a detailed example of generating a quasi-new type equipped with all the usual amenities like code completion, pattern matching, subtyping and automatic typeclass derivation.

What options do we have for metaprogramming?

  1. Scalafix: Scalafix is a tool for linting and code rewriting. While Scalafix can generate new code, that code is inserted directly into your source files, and only when you manually execute the suggested code action. Who wants to have generated code mixed in with their other stuff? Not a great option.
  2. Scalameta: Scalameta is an external tool that can parse, understand and pretty-print Scala code. It has its own Scala AST and, as I understand it, can generate code in your project by writing .scala text files, which are then compiled as external source roots by SBT, similar to other annotation processors such as Java’s APT. This would probably be the easiest approach if you just want to generate new classes that can live independently of other existing code. Unfortunately, as far as I can tell it has not been released for Scala 3 yet (as of March 2024); it does, however, support Scala 2.13. Since Scala 2.13 libraries can be consumed by Scala 3, it may be possible with some effort to integrate Scalameta into a Scala 3 build. A significant disadvantage of such an external tool is that it can only generate code globally, based on global information. For example, you cannot generate JSON codecs for a private class because the private class is not visible from the package level where the codecs’ source code would be generated.
  3. Typelevel programming with match types: Match types are a new feature in Scala 3 that let you write control structures directly at the type level:

     type Elem[X] = X match
         case String => Char
         case Array[t] => t
         case Iterable[t] => t
    

    Combined with singleton/literal types and type tuples, a lot of logic can be accomplished at the type level (for example computing the Fibonacci sequence or sorting a tuple of strings). Match types are not very useful in our case because they can only manipulate existing types by type application, matching on type constructors and building unions or intersections. They can build tuples, but not structural refinements. Since tuple elements cannot be accessed by name (yet), their ergonomics leave much to be desired. Furthermore, a type-level function cannot introspect the fields of a case class or the cases of a sealed trait, because this information is only available through implicit Mirror instances, value-level objects that cannot be referenced at the type level.

  4. Generic programming with givens: Givens are strictly more powerful than match types for metaprogramming. While implicits/givens are at the value-level, their resolution happens at compile-time and thus the implicit search mechanism can be used to compute things at compile-time by case distinction, recursion and so on. Naturally, givens always return values, but those values can have type members and those type members are also known at compile-time. We can thereby compute types by referencing path-dependent types of other givens. We can also introspect the structure of algebraic data types (case classes and sealed traits) by summoning their Mirror. Mirror.Product denotes a case class, whereas Mirror.Sum denotes a sealed trait. Their type members type MirroredElemTypes and type MirroredElemLabels tell us the types and names of the case class fields or sealed subclasses respectively. Note that this only works for ADTs; we cannot find out what functions or variables an arbitrary class has.

    The following example shows how to use givens and generic programming with Mirror to compute the union of all field types of a case class Person and then reference the computed type in another function:

     import scala.deriving.Mirror

     case class Person(age: Int, name: String)

     trait UnionOfFieldTypes[T] { type Out }
     given [T](using m: Mirror.ProductOf[T]): UnionOfFieldTypes[T] with {
         type Out = Tuple.Union[m.MirroredElemTypes]
     }

     def acceptsAnyPersonField(using u: UnionOfFieldTypes[Person])(field: u.Out) = ???
    

    When working with Mirrors it is often necessary to summon a tuple of instances for all of the types in the fields/cases tuple. The easiest way to do this is to use Shapeless 3. In contrast to the horribly complicated behemoth that is Shapeless 2, Shapeless 3 only does two things:

    • It extends the standard Mirror mechanisms to other kinds:

        object K0 {
            type Generic[O] = Mirror {
                type Kind = K0.type
                override type MirroredType = O
                override type MirroredMonoType = O
                override type MirroredElemTypes <: Tuple
            }
        }
      
        object K1 {
            type Generic[O[_]] = Mirror {
                type Kind = K1.type
                override type MirroredType[X] = O[X]
                override type MirroredMonoType = O[Any]
                override type MirroredElemTypes[_] <: Tuple
            }
        }
      

      As you can see, K0.Generic and K1.Generic are just refinements of Mirror that do nothing more than fix the kinds of the member types so that they take parameters whenever the mirrored type O takes parameters.

    • It allows us to summon instances for all fields/cases of ADTs and map or fold over them using ProductInstances/CoproductInstances:

        import scala.compiletime.constValue
        import scala.deriving.Mirror
        import shapeless3.deriving.K0

        inline def showMyCaseClass[T](
            using m: Mirror.ProductOf[T], fieldInstances: K0.ProductInstances[Show, T]
        ): Show[T] = new Show[T] {
            // constValue turns the label's singleton type into a String value; this works
            // because the method is inline and gets expanded with the concrete Mirror.
            val label: String = constValue[m.MirroredLabel]

            override def show(t: T): String =
                val fieldStrings = fieldInstances.foldLeft(t)(List.empty[String]) {
                    [A] => (acc: List[String], showField: Show[A], field: A) =>
                        acc :+ showField.show(field)
                }

                s"$label(${fieldStrings.mkString(", ")})"
        }

        //> val p = Person(name = "Guy Incognito", age = 99)
        //> showMyCaseClass[Person].show(p)
        // Person(99, "Guy Incognito")
      
  5. Macros: Macros are a code-manipulation feature offered by the language itself. They are deeply integrated into the compiler and thus can (potentially) generate expressions in-place in your code without changing the file like an external tool would. In contrast to match types and givens, they manipulate code, with comments and all, not types or values. This makes them much more powerful. They are also not limited to global information, but can be called at any place, see local symbols and generate code in local scope.

Following Along

Before we get to the meat of macro programming, a few comments on project setup for those readers who want to follow along with the blog post:

  1. I recommend using VSCode with Metals and enabling auto-save after typing with a 500 ms delay. Metals uses the Scala compiler for code highlighting and is thus always accurate. IntelliJ, on the other hand, seems to use its own parser for highlighting and has very significant problems with advanced features of Scala 3, particularly with higher-kinded types, AnyKind type parameters and type arguments on unapply extractors, always showing errors where there are none. VSCode can also show you code completion hints for structural refinements, which will become important in part 2 of this blog post. The disadvantage of Metals is that it has to do incremental compilation after. every. single. keystroke. to update the highlighting, which becomes unbearably slow in macro-heavy projects. But for prototyping it’s a good option.

  2. The following SBT settings should be used:

  ThisBuild / scalaVersion := "3.3.3"
  ThisBuild / scalacOptions ++= Seq(
    "-encoding",
    "UTF-8",
    "-deprecation",
    "-feature",
    "-Xcheck-macros",
    "-Ycheck:all", // also for checking macros
    "-Ycheck-mods",
    "-Ydebug-type-error",
    "-Xprint-types", // Without this flag, we will not see error messages for exceptions during given-macro expansion!
    "-Yshow-print-errors",
    "-language:experimental.macros",
    "-language:implicitConversions",
    "-language:higherKinds",
    "-language:namedTypeArguments",
    "-language:dynamics",
    "-Ykind-projector:underscores",
    "-unchecked"
  )

In addition, the following settings can be enabled when debugging but will produce a lot of log spam:

  Seq(
    "-Xprint:typer",
    "-explain",
  )

Macros

Macro Entry Points

There are two ways to call a macro: annotation macros and def-macros. Annotation macros, like the name suggests, are executed when a class or method is annotated. They have access to the entire tree of the annotated element as well as its parents and are useful mainly for inspection (linting/error messages) or transparently replacing method implementations (for example wrapping a method body to cache results). While they can, theoretically, alter the tree however they like, none of those changes will be visible to the typechecker (firstly because we cannot change the Symbols and secondly because of the way that the macro is executed in compiler phases). Nonetheless, they can generate private member variables (subject to limitations of the lackluster Symbol API explained later), giving them the powerful ability to save runtime state in the annotated class. This very simple annotation macro allows us to annotate any class or method with @printTree to print its tree to the console, without modifying the tree itself:

import scala.annotation.{experimental, MacroAnnotation}
import scala.quoted.*

@experimental
class printTree extends MacroAnnotation {
  override def transform(using q: Quotes)(
    tree: q.reflect.Definition
  ): List[q.reflect.Definition] =
    import q.reflect.{*, given}

    println("print tree: " + tree.show(using Printer.TreeStructure))
    List(tree)
}

Def-macros, on the other hand, do not modify existing trees. They simply provide the body of a normal inline def, generated by a macro. A def-macro can at most introspect the tree of any parameters given to it, or possibly of the call-site.

import scala.quoted.*

inline def myMacroFun(inline arg: String): Int = ${ myMacroFunImpl('arg) }
private def myMacroFunImpl(argExpr: Expr[String])(using q: Quotes): Expr[Int] =
    import q.reflect.{*, given}
    '{ 1234 }

All def-macros must follow this pattern exactly: the public method must be inline and it must do nothing other than immediately call the macro implementation. Arguments passed to the implementation must be quoted (inside your implementation you only have access to Exprs of the arguments, not the argument values directly). The parameters can be made inline (but they don’t have to be), which gives you access to the expression tree that created the argument at the call-site. Def-macros are less powerful than annotation macros, but their advantage is that they are entirely invisible to the user, making for a great user experience. With a few tricks they can be made to do surprising things, which will be our topic in the next part of this blog series.
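
As a small illustration of the inline-parameter point, here is a hedged sketch (the names lengthOf/lengthOfImpl are made up) that reads the argument’s value when it is a compile-time constant and falls back to runtime code otherwise:

import scala.quoted.*

inline def lengthOf(inline s: String): Int = ${ lengthOfImpl('s) }

private def lengthOfImpl(s: Expr[String])(using Quotes): Expr[Int] =
    s.value match                                  // Expr.value uses the FromExpr[String] instance
        case Some(literal) => Expr(literal.length) // constant-fold at compile time
        case None          => '{ $s.length }       // otherwise compute the length at runtime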

Quotes context object

A macro implementation must always have an instance of scala.quoted.Quotes available. Quotes is a context object that serves two purposes: firstly, it scopes the types of the TASTy (Typed Abstract Syntax Tree) reflection API. Since the TASTy API consists entirely of abstract member types, we can only access them as path-dependent types. The Quotes object is that path.

private def myMacroImpl(using q: Quotes): q.reflect.Term // <--- Term is path-dependent on q

The usual caveats of working with path-dependent types apply: We need to somehow keep track of the singleton type q.type so that the compiler can remember that the types are compatible. As the official documentation says, we should avoid saving Quotes into a field. If you want to extract utility methods to a different class, it must be parameterized by the singleton type q.type to remember the path:

class MacroUtilMethods[Q <: Quotes](using val q: Q) {
    def myUtilMethod(): q.reflect.Term = ???
}

given q: Quotes = ???
val utils = new MacroUtilMethods[q.type](using q)
val term: q.reflect.Term = utils.myUtilMethod()

Secondly, the Quotes object captures the context of the macro expansion. This includes the compiler’s symbol table, giving us access to information about all the symbols that are known in the scope where the macro will be expanded (through methods such as Symbol.requiredClass(String) or Expr.summon[T]); the source code position of the macro call-site via Position.ofMacroExpansion which can be used with the various reporting methods to show errors and warnings at a specific line in the IDE; as well as the parent of the generated code via Symbol.spliceOwner.
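
To make this concrete, here is a small sketch (the method name is made up) that touches all of these pieces: it looks up a known class symbol, performs an implicit search at the expansion site, and reports a warning positioned at the macro call:

import scala.quoted.*

def contextInfoImpl[T: Type](using q: Quotes): Expr[Unit] =
    import q.reflect.*
    val listSym  = Symbol.requiredClass("scala.collection.immutable.List") // look up a symbol by its fully qualified name
    val ordering = Expr.summon[Ordering[T]] // implicit search in the scope of the expansion site
    if ordering.isEmpty then
        report.warning(s"no Ordering[${Type.show[T]}] in scope", Position.ofMacroExpansion)
    println(s"expanding inside ${Symbol.spliceOwner.fullName}, found ${listSym.name}")
    '{ () }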

Macro APIs

The world of macros is split into two sides: the orderly, rigid, well type-checked side of quotation & splices, and the dynamic but flexible side of TASTy (Typed Abstract Syntax Tree) reflection. The new Scala 3 quotation feature '{} can be used to compile code and capture it as a value of type Expr[E] that can be manipulated, instead of compiling it directly into the host program. The code of other expressions, in the form of Expr[E] values, can be spliced into a quoted block with ${}, similar to string interpolation. Example:

val condition: Expr[Boolean] = '{
    val random = newRandomInt()
    random == 123
}

val doPrint: Expr[Unit] = '{
    if ${ condition }
    then println("yes")
    else println("no")
}

Quotation can only ever produce an expression as its result, not a declaration. While we can declare new classes in a quoted block, they will not be visible outside, just like a local class declared in a function’s body block will not be visible outside it:

val e: Expr[Any] = '{ // Inferred type must be Any because MyQuotedClass is not visible outside the block scope.
    class MyQuotedClass(val a: Int) // This class declaration statement is not an expression, so it cannot be the last statement of the block.
    new MyQuotedClass(123) // We MUST return something here, even if it is just (): Unit.
}

While Expr[E] represents the code of a value with type E, Type[T] represents a reference to a type T. Having a given Type[T] in scope allows us to use generic T in quoted code:

def myMacro[T](using Type[T], Quotes): Expr[Unit] =
    '{ (callSomeOtherMethod[T]): Unit }

In some cases, we can convert an Expr[T] directly to a value of T using Expr.valueOrAbort. Such a value needs to cross the boundary from code-that-creates-value to runtime value, so it only works if a FromExpr[T] type class instance is available to do this conversion, which is mostly the case only for primitive types.
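
For example (a sketch with made-up names), a macro implementation that repeats a string literal can extract both arguments as plain values and lift the result back into an Expr:

import scala.quoted.*

def repeatImpl(s: Expr[String], n: Expr[Int])(using Quotes): Expr[String] =
    val text  = s.valueOrAbort   // works because a FromExpr[String] instance exists
    val times = n.valueOrAbort   // likewise for Int
    Expr(text * times)           // the ToExpr[String] instance lifts the result back into code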

Both quoted expressions and quoted type constructors can be matched to destructure them. Here we are matching an expression that creates a Tuple2. Don’t forget the implicit class application!

// This is defined in the Scala standard library. I'm only including it here so you can see the definition.
implicit final class ArrowAssoc[A](private val self: A) extends AnyVal {
    @inline def -> [B](y: B): (A, B) = ???
}
// ------------------------------------------------------
val t: Expr[(String, Int)] = '{ "hello" -> 123 }
t match
    // Match on implicit class and -> operator:
    case '{ scala.Predef.ArrowAssoc($key: String).->($value: Int) } =>
        // key: Expr[String]
        // value: Expr[Int]
        ...

Matching on a type constructor:

def myMacro[T <: Tuple](using Type[T], Quotes) =
    summon[Type[T]] match
        case '[head *: tail] =>
            // Type[head] and Type[tail] implicits are now in scope, thus we can reference
            // head and tail directly as types in quoted code:
            '{ callSomeOtherMethod[head, tail]() }
        case '[EmptyTuple] => ...

For more information see https://docs.scala-lang.org/scala3/reference/metaprogramming/macros.html#quote-pattern-matching-2.

This sort of matching is particularly useful if we want to reference a Type[?] in our quoted code (often the result of converting a TypeRepr):

val tpe: Type[?]
tpe match
    case '[t] => '{ foo[t]() }
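
For instance, a Type[?] obtained from TypeRepr.asType can be matched this way to build a new applied type (a minimal sketch; the helper name is made up):

import scala.quoted.*

def wrapInList(using q: Quotes)(repr: q.reflect.TypeRepr): Type[?] =
    repr.asType match
        case '[t] => Type.of[List[t]] // t is now usable as a type inside quotes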

Unfortunately this matching does not work if the type is of a different kind (a type constructor with unapplied type parameters). In that case we have three options. If the type is known but of a different kind, we can splice the abstract type member type Underlying = T of Type[T <: AnyKind] directly:

val tpe = summon[Type[Option]] // Note: Option has kind Type => Type

tpe match
    case '[t[s]] => '{ foo[t[Int]]() } // ERROR: tpe$given1.Underlying does not take type parameters
    case '[t] => '{ foo[t[Int]]() } // ERROR: tpe does not take type parameters

'{ foo[tpe.Underlying[Int]]() } // OK

If the type is unknown (but we, the programmers, know what the kind will be) we can introduce a new abstract type and cast the Type instance to it. This is the most flexible and my favorite trick. Surprisingly, it works! This approach even allows us to add type bounds:

def foo[O[_] <: Option[_]]() = ??? // defined in different file

type X[_] <: Option[_] // <-- with type bounds!
val someType: Type[?] = Type.of[Some]
given Type[X] = someType.asInstanceOf[Type[X]]
'{ foo[X]() }

Starting with Scala 3.4.0 (SIP-53), the pattern matching for Types was improved to be able to do this thing inline as well, but it seems to be still a little finicky and doesn’t always work:

val tpe = summon[Type[Option]]
tpe match
    case '[type t[X] <: Option[X]; t] => '{ foo[t[Int]]() }

Sometimes we want to use givens for a generic type in a quoted block. Here it is important to understand that the quotation is always typechecked in the context where it is written. Splices (referencing a type parameter through a Type[?] instance is also a splice) are then simply substituted at macro expansion, without any additional typechecking. Thus the following code will not compile because a given MyTypeClass[T] cannot be found for a generic T:


def funUsingTypeClass[S](using MyTypeClass[S]) = ???

def myMacroImpl[T: Type](using q: Quotes) =
    import q.reflect.*
    '{ funUsingTypeClass[T] } // ERROR: no instance of MyTypeClass[T] found

We need to explicitly add a deferred summoning at the macro expansion site, using Expr.summon:

def myMacroImpl[T: Type](using q: Quotes) =
    import q.reflect.*
    val tTypeClass: Expr[MyTypeClass[T]] = Expr.summon[MyTypeClass[T]].get
    '{ funUsingTypeClass[T](using $tTypeClass) }

Expr[E] and Type[T] are opaque in the sense that we cannot see what is “inside”. All we can do is quote or splice, not inspect the code. Furthermore, since the code in a quotation must type-check, many things cannot be expressed with quoted code. For example, we cannot instantiate a generic type A because it is not known whether this type is concrete and has an accessible parameterless constructor:

given Type[A] = ...
'{ new A().asInstanceOf[A] } // ERROR: Unknown if `A` has a parameterless constructor!

In cases like this, it becomes necessary to step down to a lower level and construct expression trees manually with TASTy reflection. TASTy is nothing else than a class hierarchy representing the abstract syntax tree of Scala 3 code, like you know from any other code generation tool. With TASTy reflection we can, theoretically, create any kind of syntactical construct; expression or declaration, type annotations, even incomplete or illegal constructs. We can also inspect the quoted code in detail. Unfortunately, the TASTy API is very much lacking in the documentation department, so the usual way to go about things is to write some example code and try to replicate the AST that the compiler generates for it:

val e = '{ Foo(1 + 2) }
println(Printer.TreeStructure.show(e.asTerm))

results in

Inlined(
  Some(
    TypeIdent(
      "Main$package$")),
  Nil,
  Apply(
    TypeApply(
      Select(
        Ident(
          "Foo"),
        "apply"),
      List(
        Inferred())),
    List(
      Apply(
        Select(
          Literal(
            IntConstant(
              1)),
          "+"),
        List(
          Literal(
            IntConstant(
              2)))))))

As you can see, TASTy is extremely verbose, so this simple example will have to suffice.

TASTy types are arranged in a hierarchy, which you can look up in the Scaladoc of scala.quoted.Quotes.reflectModule:

+- Tree -+- PackageClause
         |
         +- Statement -+- Import
         |             +- Export
         |             +- Definition --+- ClassDef
         |             |               +- TypeDef
         |             |               +- ValOrDefDef -+- DefDef
         |             |                               +- ValDef
         |             |
         |             +- Term --------+- Ref -+- Ident -+- Wildcard
         |                             |       +- Select
         |                             |
         |                             +- Literal
         |                             +- This
         |                             +- New
         |                             +- NamedArg
         |                             +- Apply
         |                             +- TypeApply
         |                             +- Super
         |                             +- Assign
         |                             +- Block
         |                             +- Closure
         |                             +- If
         |                             +- Match
         |                             +- SummonFrom
         |                             +- Try
         |                             +- Return
         |                             +- Repeated
         |                             +- Inlined
         |                             +- SelectOuter
         |                             +- While
         |                             +---+- Typed
         |                                /
         +- TypedOrTest +----------------·
         +- Bind
         +- Unapply
         +- Alternatives
         |
         +- CaseDef
         +- TypeCaseDef
         |
         +- TypeTree ----+- Inferred
         |               +- TypeIdent
         |               +- TypeSelect
         |               +- TypeProjection
         |               +- Singleton
         |               +- Refined
         |               +- Applied
         |               +- Annotated
         |               +- MatchTypeTree
         |               +- ByName
         |               +- LambdaTypeTree
         |               +- TypeBind
         |               +- TypeBlock
         |
         +- TypeBoundsTree
         +- WildcardTypeTree

+- ParamClause -+- TypeParamClause
                +- TermParamClause

+- TypeRepr -+- NamedType -+- TermRef
             |             +- TypeRef
             +- ConstantType
             +- SuperType
             +- Refinement
             +- AppliedType
             +- AnnotatedType
             +- AndOrType -+- AndType
             |             +- OrType
             +- MatchType
             +- ByNameType
             +- ParamRef
             +- ThisType
             +- RecursiveThis
             +- RecursiveType
             +- LambdaType -+- MethodOrPoly -+- MethodType
             |              |                +- PolyType
             |              +- TypeLambda
             +- MatchCase
             +- TypeBounds
             +- NoPrefix

+- Selector -+- SimpleSelector
             +- RenameSelector
             +- OmitSelector
             +- GivenSelector

+- Signature

+- Position

+- SourceFile

+- Constant -+- BooleanConstant
             +- ByteConstant
             +- ShortConstant
             +- IntConstant
             +- LongConstant
             +- FloatConstant
             +- DoubleConstant
             +- CharConstant
             +- StringConstant
             +- UnitConstant
             +- NullConstant
             +- ClassOfConstant

+- Symbol

+- Flags

Of most interest to us are Tree, Term, TypeTree, TypeRepr and Symbol. Trees are, loosely speaking, all the things that you write down in source code, with some exceptions. Term is the Tree of expressions, i.e. things that can be used in place of a value; this type is analogous to Expr[E]. We can convert between Term and Expr[E] with the methods asTerm/asExprOf[E]. Class definitions and the like are not values, so they are a Statement but not a Term. TypeTrees can be understood as types “as written down” in source code. This includes things like the right-hand side of a type-alias definition (type Foo = <Bar: TypeTree>) or the type ascription of an expression ((<x: Term>: <Foo: TypeTree>)). A TypeRepr, on the other hand, is not written in source code; it merely represents that something has a certain type, regardless of where that typing comes from. TypeReprs are analogous to Type[T]. They mostly occur together with Symbols, which brings us to the most important class of all.

Everything that has a name in Scala has a Symbol. The Symbol is a handle for metadata about a syntactic construct and it contains all the useful information that one might ask about (name, type, parents, declared methods and so on). For instance, if we want to find out what methods a class has, we could traverse its ClassDef tree, arduously pattern match on all the enclosed DefDef trees, extract their TypeTrees, then do the same recursively for all the declared parents of the ClassDef; or we could simply call Symbol.methodMembers. Think of it this way: when you’re a government employee and you want to know what houses are in a neighborhood, you don’t drive there and count them; you look at the architect’s plan! Unless you want to inspect the source code exactly as written in a macro, you should always rely on Symbols and never on Trees, doubly so because Trees are not even guaranteed to exist for constructs declared outside the current compilation unit. A Symbol can usually be obtained with the TypeRepr.typeSymbol or TypeRepr.termSymbol methods.

Another important thing about Symbols is that they are not just a summary of a definition; they are in a certain sense authoritative over the written source code, because this is the information used in the compiler’s symbol table to govern type checking and everything else. When creating a new definition, a corresponding Symbol is always created first and the Tree must conform to the shape of the Symbol. If your macro changes a ValDef tree from val x: Int = 1 to val x: String = "foo", but the Symbol still says that x is an Int, bad things will happen. The two must always align!
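
As a tiny illustration of Symbol-based introspection (the macro implementation name is made up), listing the method names of an arbitrary type can be done entirely through its Symbol:

import scala.quoted.*

def methodNamesImpl[T: Type](using q: Quotes): Expr[List[String]] =
    import q.reflect.*
    val sym = TypeRepr.of[T].typeSymbol        // from the type to its Symbol
    Expr(sym.methodMembers.map(_.name))        // no Tree traversal required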

In contrast to quoted code, TASTy trees’ types are not known at compile time (evident from the fact that quotes always yield an Expr[E] with a known E, while Term does not have a type parameter). Do not misinterpret this to mean that Terms are untyped! Nodes in a Term tree always include type information; it may be wrong or illegal, but it is always there. Completely untyped trees actually do exist inside the compiler under the untpd module (they are used before the typer phase has assigned all the inferred types), but they are not available to us macro authors. Importantly, the type information of a Term lives at the value level and not at the type level like with Expr[E]. The type of a Term will be known only at the end of macro execution, whereas the type of a quotation is known when the macro itself is compiled. This makes TASTy reflection inherently less safe, but also more flexible, as we can dynamically create Trees whose correctness cannot be checked statically (like calling the possibly non-existent constructor of a generic type).

Back in the days of Scala 2, there was no public API for the AST, and macro authors used internal compiler classes to inspect code. Needless to say, this led to a lot of churn because these internal classes would change frequently and constantly break all the macro libraries. In an effort to avoid this fiasco and expose as few implementation details as possible, the new Scala 3 reflection API was created in a very abstract, peculiar way, where we do not have access to the classes directly. Instead, each node of the AST is hidden behind a type alias and methods are added to it via extensions:

type Import <: Statement
given ImportTypeTest: TypeTest[Tree, Import]

val Import: ImportModule
trait ImportModule { this: Import.type =>
    def apply(expr: Term, selectors: List[Selector]): Import
    def copy(original: Tree)(expr: Term, selectors: List[Selector]): Import
    def unapply(tree: Import): (Term, List[Selector])
}

given ImportMethods: ImportMethods
trait ImportMethods {
    extension (self: Import)
        def expr: Term
        def selectors: List[Selector]
}

Here we have a TypeTest that allows us to pattern match on the type alias, a singleton trait ImportModule with a corresponding object Import emulating a companion object, and a given ImportMethods with extension methods on an Import instance. All of the types in the reflection API follow this pattern, and you had better get used to navigating it because unfortunately it largely breaks code completion and navigation in IntelliJ (I find that it helps to add the superfluous import quotes.reflect.given, but the experience is still quite bad).
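
For example, pattern matching on the opaque Import alias works through the provided TypeTest instance (a minimal sketch; the helper name is made up):

import scala.quoted.*

def countImports(using q: Quotes)(trees: List[q.reflect.Tree]): Int =
    import q.reflect.{*, given}   // the given import brings ImportTypeTest into scope
    trees.count {
        case _: Import => true
        case _         => false
    }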

The trifecta of apply, copy and unapply methods is our only way to introduce or eliminate nodes in a TASTy tree. The meaning of their parameters is usually undocumented and they do not always align with the output that you would get from Printer.TreeStructure, making it difficult to reproduce our example code. One of the biggest problems with the reflection API is that not all nodes have an apply method yet and thus cannot be created from scratch; often enough we are lucky to get copy. I’m not sure exactly what the difference between apply and copy is supposed to be. Looking at the source code of the internal compiler structures that back the API, it seems that copy will reuse the Symbol of the original tree node. In practice this means that we can change some parts of the definition of a node, but not the declaration. As an example, let us consider the copy method of DefDef (function definition):

def copy(original: Tree)(name: String, paramss: List[ParamClause], tpt: TypeTree, rhs: Option[Term]): DefDef

This method seemingly allows us to change the name, parameters, return type (tpt) and implementation rhs of the declared function. However, since the written parameter declarations and the written return type must remain aligned with the DefDef’s old Symbol, we cannot actually change the types in any meaningful way, only the rhs implementation. (Disclaimer: you can technically change them however you want, even copy a tree node of a completely different type, and it will not trigger safety checks, but bad things may happen: the type checking and type inference may differ from the types that you generated, your type changes may have no observable effect, or ClassCastExceptions and NoSuchMethodErrors will be thrown at runtime due to binary incompatibility.)

Creating New Declarations

As mentioned previously, a new declaration always starts with a new Symbol. This is the second major problem with the reflection API because the creation of new symbols is currently still very limited. Let us try to create a new class definition with the Symbol.newClass method:

@experimental def newClass(parent: Symbol, name: String, parents: List[TypeRepr], decls: Symbol => List[Symbol], selfType: Option[TypeRepr]): Symbol

Right away it should be noted that this method is @experimental, an annotation that absolutely infects the entire codebase like cancer if you use it, as every call-site needs to be annotated the same way.

As an example, let us generate the following Greeter class:

class Greeter extends Any {
  def greet(name: String): String = _root_.scala.StringContext.apply("hello ", "").s(name)
}

First we create the symbol for the new class definition. The class symbol also contains symbols for all the declarations in its body, which are created by the decls function. When creating a new Symbol we must set its owner or parent, which is why decls is a function and not a simple list: to be able to construct the symbols of the member declarations, we need access to the (yet unfinished) class Symbol through decls’s parameter. paramInfosExp and resultTypeExp are the parameter types and the result type of the method, respectively. They are also functions, but don’t ask me why (possibly to be able to reference path-dependent types on the parameters via MethodType.param(idx: Int)). To create a function with type parameters, we would wrap our MethodType in a PolyType. Type parameters in general are usually treated like just another parameter list in the reflection API, except that their TypeRepr is a TypeBounds instead of a fully applied type.

val parentsTypeTrees = List(TypeTree.of[Any])
val classSym = Symbol.newClass(
    parent = Symbol.spliceOwner,
    name = "Greeter",
    parents = parentsTypeTrees.map(_.tpe),
    decls =  { (classSym: Symbol) =>
        List(
            Symbol.newMethod(classSym, "greet",
                MethodType(paramNames = List("name"))(
                    paramInfosExp = (_: MethodType) => List(TypeRepr.of[String]),
                resultTypeExp = (_: MethodType) => TypeRepr.of[String])))
    },
    selfType = None)

Next, we create the def greet(name: String): String method definition. Since the definition (DefDef) always references the Symbol, we have to get ahold of the symbol we created beforehand indirectly through the class symbol. Do not forget to change the owner of nested elements in the function body (more on that later).

val defGreet = {
    val defGreetSym = classSym.declaredMethod("greet")(0) // there can be multiple overloaded methods with the same name
    DefDef(defGreetSym, rhsFn = {
        /* I assume that the nested List is used here to represent multiple parameter
        lists. Remember that type parameters are also a parameter list. */
        case List(List(name @ Ident("name"))) =>
            Some('{ s"hello ${${name.asExprOf[String]}}" }.asTerm.changeOwner(defGreetSym))
    })
}

Finally, assemble everything together:

val classDef = ClassDef(classSym, parents = parentsTypeTrees, body = List(defGreet))
println(classDef.show(using Printer.TreeShortCode))
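
If the macro should actually use the generated class instead of just printing it, the ClassDef can be wrapped in a Block together with a constructor call (a sketch building on the example above; since Greeter has no statically known supertype other than Any, the best we can expose is an Expr[Any]):

val newGreeter: Expr[Any] =
    Block(
        List(classDef),
        Apply(Select(New(TypeIdent(classSym)), classSym.primaryConstructor), Nil)
    ).asExprOf[Any]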

“Alright, but how do I make my class a trait or an abstract class?” you will probably ask yourself right now. The disappointing answer is that you cannot, at least not in a reliable way, because the crucial Flags parameter is missing for some reason, despite being present on the internal dotc.core.Symbols.newClassSymbol method. Neither can you create a new Symbol for a type alias declaration and many other things. You can of course cast everything to the internal classes and then work with that, but it would be pretty pointless, since the whole reason for a new macro API was to avoid that:

extension (using q: Quotes)(symb: q.reflect.Symbol)
  @experimental
  def setFlags(flags: q.reflect.Flags): q.reflect.Symbol = {
    given dotty.tools.dotc.core.Contexts.Context = q.asInstanceOf[scala.quoted.runtime.impl.QuotesImpl].ctx

    symb.asInstanceOf[dotty.tools.dotc.core.Symbols.Symbol].denot.setFlag(flags.asInstanceOf[dotty.tools.dotc.core.Flags.FlagSet])
    symb
  }

In conclusion, creating new declarations with TASTy reflection isn’t realistically possible in the current state of the API, except perhaps for def and val members. The best course of action is usually to write everything you can using quoted code and only drop down to TASTy reflection for isolated parts.
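
As an example of that division of labor, here is a sketch (the name is made up) that constructs only the problematic piece, a constructor call for a generic type, with reflection and keeps the rest as ordinary quoted code:

import scala.quoted.*

def defaultInstanceImpl[T: Type](using q: Quotes): Expr[T] =
    import q.reflect.*
    val sym = TypeRepr.of[T].typeSymbol
    // Reflection lets us write down the call even though it cannot be verified statically;
    // macro expansion will fail if T has no accessible parameterless constructor.
    val construct = Apply(Select(New(TypeTree.of[T]), sym.primaryConstructor), Nil)
    '{ val instance: T = ${ construct.asExprOf[T] }; instance }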

Ownership

The TASTy reflection Tree is doubly-linked. Tree nodes not only reference their children, they also reference their parent indirectly through their Symbol’s owner: Symbol field (if the element has a Symbol). The ownership relation, where each syntactic element is owned by the symbol of the nearest enclosing element, must be respected for generated code, too. Functions are owned by their enclosing class, function bodies are owned by their enclosing function, the right-hand side of a variable definition is owned by the variable, and so on. As mentioned previously, the Quotes context object remembers who the parent of the macro expansion site is. All code generated with a given Quotes object, be it through quoted code sections with '{ } or manually through the reflection API, will automatically be owned by Symbol.spliceOwner. Which symbol Symbol.spliceOwner refers to depends on the context of the Quotes object. For all top-level elements it is enough to rely on this automatic mechanism, but if we’re generating function bodies, classes with members and so on, there will be elements nested at deeper levels and we will need to set their owner manually. There are two ways to do this. Either we use the Tree.changeOwner(newOwner: Symbol): Tree function:

val valSym = Symbol.newVal(parent = Symbol.spliceOwner, "x", TypeRepr.of[Int], Flags.EmptyFlags, privateWithin = Symbol.noSymbol)
val valDef = ValDef(valSym, rhs = Some(
    '{ 1 + 2 + 3 }.asTerm.changeOwner(valSym)
))

or we use the Symbol.asQuotes: Quotes function to create a new Quotes context with the right parent:

val valSym = Symbol.newVal(parent = Symbol.spliceOwner, "x", TypeRepr.of[Int], Flags.EmptyFlags, privateWithin = Symbol.noSymbol)
val valDef = ValDef(valSym, rhs = Some(
    {
        given q2: Quotes = valSym.asQuotes
        import q2.reflect.*

        '{ 1 + 2 + 3 }: Expr[Int]
    }.asTerm
))

The issue with asQuotes is that it changes the path-dependent types, so if you return a q2.reflect.Term from the block, it will not compile where a q.reflect.Term is expected. Expr is not path-dependent on Quotes, so we must always return an Expr from the block and do a whole dance of converting back and forth between Expr and Term. Tree is even more problematic because it does not have a non-path-dependent Expr-equivalent. I usually prefer to use changeOwner for this reason.

Error Reporting and Source Positions

We can report errors and warnings using the functions in the Quotes.reflect.report module:

def errorAndAbort(msg: String): Nothing
def errorAndAbort(msg: String, expr: Expr[Any]): Nothing
def errorAndAbort(msg: String, pos: Position): Nothing
def warning(msg: String): Unit
def warning(msg: String, expr: Expr[Any]): Unit
def warning(msg: String, pos: Position): Unit

As you can see, these functions take either an Expr or a Position, which dictates where the squiggly lines will appear in the IDE. To get a Position you can either use Symbol.spliceOwner.pos or Position.ofMacroExpansion (the two are the same) as the default, or you can walk the Tree and call Tree.pos on the element that you want to underline. Passing in an Expr will simply take Expr.asTerm.pos and is useful mainly when your macro takes arguments (which are always Exprs) and you want to underline only the argument, not the entire macro function call. An Expr created by the macro itself will always inherit the Position of its parent, unless set otherwise (which I don’t think is even possible at the moment). Thus it would be pretty pointless to use that overload with your own generated Exprs.
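
For instance, a sketch of a macro implementation (the name is made up) that validates its argument and underlines only that argument:

import scala.quoted.*

def checkPositiveImpl(n: Expr[Int])(using q: Quotes): Expr[Int] =
    import q.reflect.*
    n.value match
        case Some(v) if v <= 0 =>
            report.errorAndAbort(s"expected a positive literal, got $v", n) // squiggles under the argument only
        case _ => n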

A few words on TASTy files

TASTy is not only the name of the new compile-time reflection API, it is also a binary exchange format for compiled Scala 3 code. When compiling a source file, the scalac compiler automatically creates .tasty files:

$ scalac Hello.scala
$ ls -1
Hello$package$.class
Hello$package.class
Hello$package.tasty
Hello.scala
hello.class
hello.tasty

TASTy files are essentially serialized Trees and contain all the same information about code, documentation, symbols, etc. They fill a somewhat similar role to JVM .class files, except they don’t have to break everything down to the lowest common denominator of Java, erasing type parameters and the like in the process. TASTy files are primarily used by external tools like the Scala language server, so we will not explore them any further. However, having a serializable reflection API opens up some interesting possibilities in distributed computing: we could send serialized functions or entire programs over the network, without losing all the metadata as with VM bytecode, while being much easier to parse than concrete syntax.


That’s it for today. Click here for part 2, where we will apply all the lessons learned in this post plus a whole array of other tricks to create a whitebox macro.
