In a previous tutorial, we saw how to decode CSV rows into tuples. This is useful, but we sometimes
want a more specific type - a
Point instead of an
(Int, Int), say. Case classes lend themselves well to such
scenarios, and kantan.csv has various mechanisms to support them.
Take, for example, the wikipedia CSV example, which we’ll get from this project’s resources:
val rawData: java.net.URL = getClass.getResource("/wikipedia.csv")
This is what this data looks like:
scala.io.Source.fromURL(rawData).mkString // res0: String = """Year,Make,Model,Description,Price // 1997,Ford,E350,"ac, abs, moon",3000.00 // 1999,Chevy,"Venture ""Extended Edition""","",4900.00 // 1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00 // 1996,Jeep,Grand Cherokee,"MUST SELL! // air, moon roof, loaded",4799.00"""
An obvious representation of each row in this data would be:
case class Car(year: Int, make: String, model: String, desc: Option[String], price: Float)
We find ourselves with a particularly easy scenario to deal with: the rows in the CSV data and the fields in the target case class have a 1-to-1 correspondence and are declared in the same order. This means that, if you don’t mind a shapeless dependency, there’s very little work to do.
You’ll first need to add a dependency to the generic module in your
libraryDependencies += "com.nrinaudo" %% "kantan.csv-generic" % "0.7.0"
Then, with the appropriate imports:
import kantan.csv._ import kantan.csv.ops._ import kantan.csv.generic._ val reader = rawData.asCsvReader[Car](rfc.withHeader)
Let’s make sure this worked by printing all decoded rows:
reader.foreach(println _) // Right(Car(1997,Ford,E350,Some(ac, abs, moon),3000.0)) // Right(Car(1999,Chevy,Venture "Extended Edition",None,4900.0)) // Right(Car(1999,Chevy,Venture "Extended Edition, Very Large",None,5000.0)) // Right(Car(1996,Jeep,Grand Cherokee,Some(MUST SELL! // air, moon roof, loaded),4799.0))
As we said before though, this was a particularly advantageous scenario. How would we deal with a
Car case class
where, say, the
make fields have been swapped and the
desc field doesn’t exist?
case class Car2(make: String, year: Int, model: String, price: Float)
This cannot be derived automatically, and we need to provide an instance of
RowDecoder[Car2]. This is
made easy by helper methods meant for just this problem, the various
import kantan.csv._ implicit val car2Decoder: RowDecoder[Car2] = RowDecoder.decoder(1, 0, 2, 4)(Car2.apply)
The first parameter to
decoder is a list of indexes that map CSV columns to case class fields. The second one
is a function that takes 4 arguments and return a value of the type we want to create a decoder for - with a case class,
that’s precisely the
apply method declared in the companion object.
Let’s verify that this worked as expected:
rawData.asCsvReader[Car2](rfc.withHeader).foreach(println _) // Right(Car2(Ford,1997,E350,3000.0)) // Right(Car2(Chevy,1999,Venture "Extended Edition",4900.0)) // Right(Car2(Chevy,1999,Venture "Extended Edition, Very Large",5000.0)) // Right(Car2(Jeep,1996,Grand Cherokee,4799.0))
If you want to learn more about:
URLinto CSV data