Ricardo Rocha's Website

Musings on Programming and Programming Languages

Type-safe Strings with Scala Macros

This post introduces Scala’s def macros and uses them to provide compile-time type safety for string literals with a well-defined syntax.

Embedded Language Strings are Fragile

As Scala programmers we’re often faced with the need to embed typed strings in our otherwise pristine code:

// XPath
webDriver findElement By.xpath("//ul/li[@class = 'description']")

// CSS selectors
webDriver findElements By.cssSelector("ul > li.description")

// SQL
statement executeQuery "SELECT * FROM emp WHERE sal > 1500"

These “islands” of foreign languages are, by necessity, encoded as strings and are therefore fragile. A typo in, say, an XPath expression will only be detected at runtime when XPath compilation fails.
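To make the fragility concrete, here is a minimal sketch that uses the JDK's own javax.xml.xpath compiler as a stand-in for WebDriver (the object and method names are made up for illustration). Both strings are accepted by scalac; the one with the missing bracket only fails once the XPath compiler sees it at runtime.

```scala
import javax.xml.xpath.XPathFactory
import scala.util.Try

object FragileStrings {
  // Compiles an XPath expression at runtime; a syntax error surfaces only here.
  def compileXPath(expression: String): Try[Unit] =
    Try(XPathFactory.newInstance().newXPath().compile(expression)).map(_ => ())

  // Well-formed expression: compilation succeeds.
  val wellFormed: Try[Unit] = compileXPath("//ul/li[@class = 'description']")

  // Missing closing bracket: scalac is happy, the XPath compiler is not.
  val typo: Try[Unit] = compileXPath("//ul/li[@class = 'description'")
}
```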

Macros to the Rescue

Scala macros are functions executed at compile time to participate in the compilation process.

When a macro function invocation is found during compilation, the Scala compiler calls the macro implementation passing the abstract syntax tree (AST) corresponding to the actual function arguments. The AST returned by the macro is then inserted in the compiled code in lieu of the macro’s original invocation. Neat!

All we need now is a macro that parses the embedded language string at compile time so as to ensure its correctness. Such a macro may also replace the original string by a compiled representation.

A Toy Macro

To illustrate macro implementation at its simplest let’s consider the case where the input string is returned verbatim:

import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context

object Macros {
  def noOp(someString: String): String = macro noOpImpl

  def noOpImpl(c: Context)(someString: c.Expr[String]): c.Expr[String] = {
    someString
  }
}

The function noOp is declared to be a macro, and noOpImpl is specified as its implementation. The implementation’s first argument list takes the macro Context; its second mirrors the function’s parameters in number and name, with each parameter type T wrapped as c.Expr[T]. In our case, the String argument someString becomes c.Expr[String].

Let’s not worry about the Context-dependent types for now. All that matters is that the macro returns the same AST value passed to it through its someString argument.
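One practical detail when trying this out: a macro implementation has to be compiled before any code that invokes it, so the two cannot share a compilation run. A common arrangement is a dedicated sbt subproject for the macros. The fragment below is only a sketch; the project names, directory layout and Scala version are assumptions, not taken from this post's code.

```scala
// build.sbt (sketch; project names and Scala version are assumptions)
ThisBuild / scalaVersion := "2.12.18"

lazy val macros = (project in file("macros"))
  .settings(
    // Macro implementations are written against the reflection API
    libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value
  )

lazy val app = (project in file("app"))
  .dependsOn(macros) // compiled after, and against, the macro project
```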

Validating Emails

Let’s extend our macro to do something meaningful: validate email literals against a regular expression.

First, let’s quickly write a (somewhat clumsy) regular function to validate emails:

val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
def checkEmail(address: String) = EmailRegex.pattern.matcher(address).matches

// stuff...

test("Checks against email regex") {
  assert(checkEmail("me@here.net"))
  assert(!checkEmail("missingAtSign.com"))
  assert(!checkEmail("me@missingDomainSuffix"))
  assert(!checkEmail("me@tooShortDomainSuffix.x"))
  assert(!checkEmail("me@tooLongDomainSuffix.abcdefghijk"))
}

Compile-time Email Validation

We can now write a simple macro that validates email literals passed to it at compile time:

def email(address: String): String = macro emailImpl

def emailImpl(c: Context)(address: c.Expr[String]): c.Expr[String] = {
  import c.universe._

  address.tree match {
    case Literal(Constant(text: String)) =>
      if (checkEmail(text)) address // Pass, return unchanged literal
      else c.abort(c.enclosingPosition, s"Invalid email: $text")
    case _ => address // Not a literal, can't validate at compile time
  }
}

Our macro can validate literal Strings (such as "you@there.net") at compile time. However, expressions such as s"$user@$host" are left unchanged as they depend on values known only at runtime.

Note that the AST corresponding to a literal string has the form Literal(Constant(..)). We deconstruct this form to extract the actual literal value in the text variable (which we subsequently validate by means of checkEmail()).
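The Literal(Constant(..)) shape can be explored outside of a macro too. The sketch below uses the runtime reflection universe rather than the macro's c.universe, and assumes scala-reflect is on the classpath; the object and helper names are hypothetical. It applies the same pattern as the macro to a literal and to a non-literal expression.

```scala
import scala.reflect.runtime.universe._

object LiteralShape {
  // Same match as in the macro: extract the text only when the tree is a string literal.
  def literalText(tree: Tree): Option[String] = tree match {
    case Literal(Constant(text: String)) => Some(text)
    case _                               => None
  }

  // A plain string literal reifies to Literal(Constant("you@there.net")).
  val literalTree: Tree = reify("you@there.net").tree

  // A method call on a literal is an Apply/Select tree, not a Literal.
  val compoundTree: Tree = reify("you@there.net".toLowerCase).tree
}
```

Printing showRaw(LiteralShape.compoundTree) is a handy way to see the full structure of any tree you need to match on.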

If a string fails validation, a compile-time error message will be printed. IDEs such as IntelliJ IDEA and Eclipse will show the error message while editing. Cool!

Icing on the Cake: String Interpolator

So far, we’ve been using a function to validate our email literals. Scala provides a much more legible construct: string interpolators.

Thus, instead of:

val myEmail = email("me@here.net")

we could write:

val myEmail = email"me@here.net"

Ah! This emphasizes the literal nature of the string and greatly improves readability.

String interpolators are not necessarily related to macros. They’re mostly used in their own right to perform string transformation operations.

Let’s say URL-encoding strings becomes a frequent operation. One may then want to write a string interpolator such that, for instance:

assert(urlEncode"Günther Frager" == "G%C3%BCnther+Frager")

The code needed to achieve this would be:

import java.net.URLEncoder

object Interpolators {
  implicit class URLEncode(val sc: StringContext) extends AnyVal {
    // Assemble the interpolated string with the standard s interpolator, then encode it
    def urlEncode(args: Any*): String = URLEncoder.encode(sc.s(args: _*), "UTF-8")
  }
}
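It helps to see what the compiler does with an interpolated string: the literal parts are wrapped in a StringContext and the embedded expressions become the arguments of a method named after the prefix. The sketch below repeats the interpolator (without the value-class optimisation) and spells out that rewrite by hand; the enclosing object name is made up for illustration.

```scala
import java.net.URLEncoder

object InterpolatorDesugaring {
  implicit class URLEncode(val sc: StringContext) {
    def urlEncode(args: Any*): String = URLEncoder.encode(sc.s(args: _*), "UTF-8")
  }

  val name = "Günther"

  // What we write...
  val sugared: String = urlEncode"$name Frager"

  // ...and roughly what it is rewritten to: the parts around the hole go into
  // the StringContext, the interpolated expression becomes the argument.
  val desugared: String = new URLEncode(StringContext("", " Frager")).urlEncode(name)
}
```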

Interpolator-based Macro

To turn our email validation macro into a string interpolator we need the following:

import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context

object Macros {
  val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
  def checkEmail(address: String) = EmailRegex.pattern.matcher(address).matches

  implicit class EmailBuilder(val sc: StringContext) {
    def email(args: Any*): String = macro scEmailImpl
    def email0(args: Any*): String = sc.s(args: _*)
  }

  def scEmailImpl(c: Context)(args: c.Expr[Any]*): c.Expr[String] = {
    import c.universe._
    c.prefix.tree match {
      case Apply(_, List(Apply(_, List(literal @Literal(Constant(text: String)))))) =>
        if (checkEmail(text)) reify(c.Expr[String](literal).splice)
        else c.abort(c.enclosingPosition, s"Invalid email: $text")
      case compound =>
        val rts = compound.tpe.decl(TermName("email0"))
        val rt = internal.gen.mkAttributedSelect(compound, rts)
        c.Expr[String](Apply(rt, args.map(_.tree).toList))
    }
  }
}

This is more involved than our previous version because we’re no longer dealing with a single string literal but with a potentially multi-part string context.

Thus, we match the literal string by means of the uncanny AST expression Apply(_, List(Apply(_, List(literal @Literal(Constant(text: String)))))).

To handle non-literal expressions (such as email"$user@$host") we invoke the non-macro email0 function.
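As written, email0 simply assembles the string, so non-literal emails are never checked. If that matters, one option is to validate them at runtime instead. The following is a hypothetical variant, not part of the code above: email0Checked is an invented name, and it takes the StringContext as an explicit parameter so the sketch stands alone.

```scala
object RuntimeFallback {
  val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
  def checkEmail(address: String): Boolean = EmailRegex.pattern.matcher(address).matches

  // Hypothetical stricter fallback: build the string first, then validate it at runtime.
  def email0Checked(sc: StringContext, args: Any*): String = {
    val address = sc.s(args: _*)
    require(checkEmail(address), s"Invalid email: $address")
    address
  }
}
```

With this variant, literals would still fail at compile time, while interpolated addresses would fail with an IllegalArgumentException when they are built.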

Conclusion

Macros are a powerful feature of the Scala language.

Macro-based string interpolators in particular make it possible to check embedded-language strings for correctness at compile time. For a simple example of XPath and CSS selector validation for WebDriver check this definition and this example usage.

For the official introduction to Scala def macros we’ve explored today see Def Macros.
