In a previous tutorial, we saw how to deal with CSV data composed of rows of homogeneous types. While that is a common enough scenario, you’ll also find yourself having to deal with rows of heterogeneous types fairly often.
Take, for instance, the Wikipedia CSV example, which we’ll get from this project’s resources:
val rawData: java.net.URL = getClass.getResource("/wikipedia.csv")
This is what this data looks like:
scala.io.Source.fromURL(rawData).mkString
// res0: String = """Year,Make,Model,Description,Price
// 1997,Ford,E350,"ac, abs, moon",3000.00
// 1999,Chevy,"Venture ""Extended Edition""","",4900.00
// 1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
// 1996,Jeep,Grand Cherokee,"MUST SELL!
// air, moon roof, loaded",4799.00"""
One way of representing each row could be as a tuple. Let’s declare it as a type alias, for brevity’s sake:
type Car = (Int, String, String, Option[String], Float)
kantan.csv has out-of-the-box support for decoding tuples, so you can simply pass the corresponding type to asCsvReader:
import kantan.csv._
import kantan.csv.ops._
val reader = rawData.asCsvReader[Car](rfc.withHeader)
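The rfc.withHeader argument tells kantan.csv that the first line is a header rather than data. The following is a minimal sketch of the difference, assuming a small inline String (a hypothetical stand-in for the wikipedia.csv resource) and the same Car alias; the object and value names are illustrative only:

```scala
import kantan.csv._
import kantan.csv.ops._

object HeaderDemo {
  type Car = (Int, String, String, Option[String], Float)

  // Hypothetical inline data, used here instead of the wikipedia.csv resource.
  val data: String =
    "Year,Make,Model,Description,Price\n" +
    "1997,Ford,E350,\"ac, abs, moon\",3000.00"

  // With rfc.withHeader, the first line is treated as a header and skipped.
  val withHeader: List[Either[ReadError, Car]] =
    data.asCsvReader[Car](rfc.withHeader).toList

  // With plain rfc, the header line is decoded like any other row,
  // and "Year" cannot be decoded as an Int.
  val withoutHeader: List[Either[ReadError, Car]] =
    data.asCsvReader[Car](rfc).toList
}
```

In this sketch, withHeader should hold a single decoded row, while withoutHeader should start with a decoding failure for the header line.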
And now that we have a CsvReader on the data, we can simply iterate through it:
reader.foreach(println _)
// Right((1997,Ford,E350,Some(ac, abs, moon),3000.0))
// Right((1999,Chevy,Venture "Extended Edition",None,4900.0))
// Right((1999,Chevy,Venture "Extended Edition, Very Large",None,5000.0))
// Right((1996,Jeep,Grand Cherokee,Some(MUST SELL!
// air, moon roof, loaded),4799.0))
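As the Right(...) wrappers in the output suggest, each row comes back as an Either rather than a bare Car, so a row that fails to decode does not abort the whole read. Below is a minimal sketch of separating decoded rows from failures. It assumes an inline String with a deliberately malformed year (hypothetical data, not the wikipedia.csv resource), and the object and value names are illustrative only:

```scala
import kantan.csv._
import kantan.csv.ops._

object CarResults {
  type Car = (Int, String, String, Option[String], Float)

  // Hypothetical sample: the second data row has a non-numeric year.
  val sample: String =
    "Year,Make,Model,Description,Price\n" +
    "1997,Ford,E350,\"ac, abs, moon\",3000.00\n" +
    "n/a,Chevy,Venture,,4900.00"

  // Each element is either a decoding error or a decoded Car.
  val results: List[Either[ReadError, Car]] =
    sample.asCsvReader[Car](rfc.withHeader).toList

  // Keep the rows that decoded successfully...
  val cars: List[Car] = results.collect { case Right(car) => car }

  // ...and keep the failures separately so they can be reported.
  val errors: List[ReadError] = results.collect { case Left(error) => error }

  def main(args: Array[String]): Unit = {
    cars.foreach { case (year, make, model, _, price) =>
      println(s"$year $make $model: $price")
    }
    println(s"${errors.size} row(s) failed to decode")
  }
}
```

Collecting both sides keeps the good rows usable while still surfacing the bad ones, which is usually preferable to silently dropping failures.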
If you want to learn more about:
how CsvReader guessed how to turn CSV rows into Car instances
how a URL was turned into CSV data