Decoding rows as tuples

In a previous tutorial, we saw how to deal with CSV data composed of rows of homogeneous types. That is a common enough scenario, but you’ll also have to deal with heterogeneous data types fairly often.

Take, for instance, the Wikipedia CSV example, which we’ll get from this project’s resources:

val rawData: java.net.URL = getClass.getResource("/wikipedia.csv")

This is what this data looks like:

scala.io.Source.fromURL(rawData).mkString
// res0: String = """Year,Make,Model,Description,Price
// 1997,Ford,E350,"ac, abs, moon",3000.00
// 1999,Chevy,"Venture ""Extended Edition""","",4900.00
// 1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
// 1996,Jeep,Grand Cherokee,"MUST SELL!
// air, moon roof, loaded",4799.00"""

One way of representing each row could be as a tuple. Let’s declare it as a type alias, for brevity’s sake:

type Car = (Int, String, String, Option[String], Float)
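
To make the positional mapping concrete, here is a small sketch using a hand-written value. The `sample` value is hypothetical: it mirrors the first data row above and is not produced by kantan.csv. Each tuple field lines up with the CSV column in the same position, and the description is an Option because that cell can be empty.

```scala
// Same alias as above, repeated so the snippet stands alone.
type Car = (Int, String, String, Option[String], Float)

// Hypothetical hand-written value mirroring the first data row.
val sample: Car = (1997, "Ford", "E350", Some("ac, abs, moon"), 3000.00f)

// Destructure by position: Year, Make, Model, Description, Price.
val (year, make, model, description, price) = sample

// An empty Description cell would be represented as None.
val summary = s"$year $make $model: ${description.getOrElse("no description")}"
```

Reordering the tuple's fields without reordering the CSV columns would break this correspondence, since nothing but position ties a field to its column.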

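kantan.csv has out-of-the-box support for decoding tuples, so you can simply pass the corresponding type to asCsvReader. The rfc.withHeader argument is the default RFC-compliant configuration, adjusted to treat the first row as a header rather than as data: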

import kantan.csv._
import kantan.csv.ops._

val reader = rawData.asCsvReader[Car](rfc.withHeader)

Now that we have a CsvReader over the data, we can simply iterate through it:

reader.foreach(println _)
// Right((1997,Ford,E350,Some(ac, abs, moon),3000.0))
// Right((1999,Chevy,Venture "Extended Edition",None,4900.0))
// Right((1999,Chevy,Venture "Extended Edition, Very Large",None,5000.0))
// Right((1996,Jeep,Grand Cherokee,Some(MUST SELL!
// air, moon roof, loaded),4799.0))
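
Every printed row is wrapped in Right because the reader yields Either values: a Right holds a successfully decoded row, while a Left would describe a failure. The sketch below shows one way to separate the two using only the standard library. The `results` list and its String error message are hypothetical stand-ins so that the snippet is self-contained; with kantan.csv, the left side would be a ReadError rather than a String.

```scala
type Car = (Int, String, String, Option[String], Float)

// Hypothetical results: two decoded rows and one made-up failure.
val results: List[Either[String, Car]] = List(
  Right((1997, "Ford", "E350", Some("ac, abs, moon"), 3000.0f)),
  Left("row 2: 'abc' is not a valid Int"),
  Right((1999, "Chevy", "Venture \"Extended Edition\"", None, 4900.0f))
)

// Keep only the rows that decoded successfully.
val cars: List[Car] = results.collect { case Right(car) => car }

// Keep only the failures, for logging or reporting.
val errors: List[String] = results.collect { case Left(error) => error }

// Total price of the valid rows (the fifth tuple field).
val total: Float = cars.map(_._5).sum
```

The same collect call should work on the reader itself, for example reader.collect { case Right(car) => car }, assuming that silently dropping failed rows is acceptable for your use case.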

What to read next

If you want to learn more about: