Extracting tuples

We’ve seen in a previous tutorial how to extract simple types from matches in a regular expression. Sometimes, however, matches contain more than one interesting value, each in a separate group.

For example, consider the following string:

val input = "(1, 2) and then (3, 4) followed by (5, 6, 7)"

We might want to extract all the parts that look like a point from it - this could be achieved with a simple regular expression, something like:

import kantan.regex.implicits._

val tuple2Regex = rx"\((\d+), (\d+)\)"

Note how the “interesting” parts are each in their own group, this is critical to kantan.regex behaving properly.

We can then proceed to extract our points as (Int, Int) exactly like we did before for simple types, through evalRegex:

input.evalRegex[(Int, Int)](tuple2Regex).foreach(println _)
// Right((1,2))
// Right((3,4))

Note that this will map each group in a match to the corresponding field in a tuple. If your groups and tuple values are not in the same order, you need a bit more legwork.

This is not entirely satisfactory, though: if you take another look at our input string, you’ll see that we have a third point, this one with a z coordinate.

The following regex can be used to match the first two coordinates with an optional third:

val tuple3Regex = rx"\((\d+), (\d+)(?:, (\d+))?\)"

One way of interpreting matches from this regex would be as (Int, Int, Option[Int]): triples with two ints and an optional third. This is achieved exactly as you’d expect:

input.evalRegex[(Int, Int, Option[Int])](tuple3Regex).foreach(println _)
// Right((1,2,None))
// Right((3,4,None))
// Right((5,6,Some(7)))

Another way would be as either an (Int, Int, Int) or an (Int, Int), which is also as simple as specifying the right type parameter to evalRegex:

input.evalRegex[Either[(Int, Int, Int), (Int, Int)]](tuple3Regex).foreach(println _)
// Right(Right((1,2)))
// Right(Right((3,4)))
// Right(Left((5,6,7)))

Note, however, that there’s a small catch when decoding to Either: the most discriminatory type should always go on the left. The reason for this is that Either will first attempt to decode as the left type, and stop there if successful. If we’d swapped the type parameter in our previous example, we’d not have gotten quite what we wanted:

input.evalRegex[Either[(Int, Int), (Int, Int, Int)]](tuple3Regex).foreach(println _)
// Right(Left((1,2)))
// Right(Left((3,4)))
// Right(Left((5,6)))

Other tutorials: