In a previous tutorial, we saw how to decode CSV rows into tuples. This is useful, but we sometimes want a more specific type: a Point instead of an (Int, Int), say. Case classes lend themselves well to such scenarios, and kantan.csv has various mechanisms to support them.
Consider the Wikipedia CSV example, which we’ll get from this project’s resources:
val rawData: java.net.URL = getClass.getResource("/wikipedia.csv")
This is what this data looks like:
scala.io.Source.fromURL(rawData).mkString
// res0: String = """Year,Make,Model,Description,Price
// 1997,Ford,E350,"ac, abs, moon",3000.00
// 1999,Chevy,"Venture ""Extended Edition""","",4900.00
// 1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
// 1996,Jeep,Grand Cherokee,"MUST SELL!
// air, moon roof, loaded",4799.00"""
An obvious representation of each row in this data would be:
case class Car(year: Int, make: String, model: String, desc: Option[String], price: Float)
We find ourselves with a particularly easy scenario to deal with: the columns in the CSV data and the fields in the target case class have a 1-to-1 correspondence and are declared in the same order. This means that, if you don’t mind a shapeless dependency, there’s very little work to do.
You’ll first need to add a dependency on the generic module in your build.sbt:
libraryDependencies += "com.nrinaudo" %% "kantan.csv-generic" % "0.7.0"
Then, with the appropriate imports:
import kantan.csv._
import kantan.csv.ops._
import kantan.csv.generic._
val reader = rawData.asCsvReader[Car](rfc.withHeader)
Let’s make sure this worked by printing all decoded rows:
reader.foreach(println _)
// Right(Car(1997,Ford,E350,Some(ac, abs, moon),3000.0))
// Right(Car(1999,Chevy,Venture "Extended Edition",None,4900.0))
// Right(Car(1999,Chevy,Venture "Extended Edition, Very Large",None,5000.0))
// Right(Car(1996,Jeep,Grand Cherokee,Some(MUST SELL!
// air, moon roof, loaded),4799.0))
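Each decoded row above is wrapped in a Right, which suggests that a row failing to decode would show up as a Left. The following is a minimal plain-Scala sketch of separating the two cases. It does not use kantan.csv: the results list is hand-written stand-in data, and String is an assumed placeholder for the library's error type.

```scala
case class Car(year: Int, make: String, model: String, desc: Option[String], price: Float)

// Stand-in data for illustration only: String replaces the library's error type.
val results: List[Either[String, Car]] = List(
  Right(Car(1997, "Ford", "E350", Some("ac, abs, moon"), 3000.0f)),
  Left("hypothetical decoding failure"),
  Right(Car(1996, "Jeep", "Grand Cherokee", None, 4799.0f))
)

// Keep the successfully decoded rows and the failures in separate lists.
val cars: List[Car]      = results.collect { case Right(car) => car }
val errors: List[String] = results.collect { case Left(err) => err }
```

Whether to skip failures, log them, or stop at the first one depends on the application; the sketch only shows that the distinction is available for each row.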
As we said before though, this was a particularly advantageous scenario. How would we deal with a Car case class where, say, the year and make fields have been swapped and the desc field doesn’t exist?
case class Car2(make: String, year: Int, model: String, price: Float)
This cannot be derived automatically, and we need to provide an instance of RowDecoder[Car2]. This is made easy by helper methods meant for just this problem, the various decoder methods:
import kantan.csv._
implicit val car2Decoder: RowDecoder[Car2] = RowDecoder.decoder(1, 0, 2, 4)(Car2.apply)
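The first parameter list passed to decoder is a list of indexes that map CSV columns to case class fields. Here, 1, 0, 2, 4 means that column 1 (Make) fills make, column 0 (Year) fills year, column 2 (Model) fills model and column 4 (Price) fills price; column 3 (Description) is skipped. The second parameter list is a function that takes 4 arguments and returns a value of the type we want to create a decoder for. With a case class, that’s precisely the apply method declared in the companion object.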
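To make the index mapping concrete, here is a minimal plain-Scala sketch of what an index list such as 1, 0, 2, 4 expresses. It does not use kantan.csv: the decodeCar2 helper and its Throwable error type are hypothetical, written only for illustration, and the library's actual implementation may well differ.

```scala
import scala.util.Try

case class Car2(make: String, year: Int, model: String, price: Float)

// Hypothetical helper, not part of kantan.csv: builds a Car2 from one CSV row
// by reading columns 1, 0, 2 and 4, in the order of Car2's fields.
def decodeCar2(row: Seq[String]): Either[Throwable, Car2] =
  Try {
    Car2(
      row(1),         // make  <- column 1
      row(0).toInt,   // year  <- column 0
      row(2),         // model <- column 2
      row(4).toFloat  // price <- column 4 (column 3, the description, is skipped)
    )
  }.toEither

// A row shaped like the first data line of the sample file.
val decoded = decodeCar2(Seq("1997", "Ford", "E350", "ac, abs, moon", "3000.00"))
```

A row that is too short or holds a non-numeric year ends up as a Left rather than throwing, which mirrors the Right-wrapped results printed in this tutorial.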
Let’s verify that this worked as expected:
rawData.asCsvReader[Car2](rfc.withHeader).foreach(println _)
// Right(Car2(Ford,1997,E350,3000.0))
// Right(Car2(Chevy,1999,Venture "Extended Edition",4900.0))
// Right(Car2(Chevy,1999,Venture "Extended Edition, Very Large",5000.0))
// Right(Car2(Jeep,1996,Grand Cherokee,4799.0))