kantan.csv mostly works from column indexes: when you need to refer to a specific CSV cell, you do so through its index in a row. The reason behind this behaviour is that headers are entirely optional and cannot be relied on to always be present.
Under some circumstances though, it would be helpful to be able to refer to CSV cells by their corresponding header label - it’s a fairly common scenario when interfacing with people that work with python, for example, where the norm is to work with headers and disregard column order.
kantan.csv supports this, in a limited fashion, through HeaderDecoder
and HeaderEncoder
.
Let’s take the wikipedia CSV example, which we’ll get from this project’s resources:
val rawData: java.net.URL = getClass.getResource("/wikipedia.csv")
This is what this data looks like:
scala.io.Source.fromURL(rawData).mkString
// res0: String = """Year,Make,Model,Description,Price
// 1997,Ford,E350,"ac, abs, moon",3000.00
// 1999,Chevy,"Venture ""Extended Edition""","",4900.00
// 1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.00
// 1996,Jeep,Grand Cherokee,"MUST SELL!
// air, moon roof, loaded",4799.00"""
An obvious representation of each row in this data would be:
case class Car(year: Int, make: String, model: String, price: Float, desc: Option[String])
Note how we’ve made sure to swap desc
and price
, to make sure that they were not in the same order as in the CSV
file.
Let’s also add the usual kantan.csv imports:
import kantan.csv._
import kantan.csv.ops._
At this point, we’d usually declare a RowDecoder
instance, but we’ve decided to work with headers rather than
indexes, so we need a HeaderDecoder
.
Here’s how we declare one:
implicit val carDecoder: HeaderDecoder[Car] = HeaderDecoder.decoder("Year", "Make", "Model", "Price", "Description")(Car.apply _)
Note how this takes two parameter lists:
String
per attribute in Car
: the corresponding header label.Car
.Through the magic of type classes, kantan.csv works out what types are expected and decodes them, and we get the desired result:
rawData.asCsvReader[Car](rfc.withHeader).foreach(println _)
// Right(Car(1997,Ford,E350,3000.0,Some(ac, abs, moon)))
// Right(Car(1999,Chevy,Venture "Extended Edition",4900.0,None))
// Right(Car(1999,Chevy,Venture "Extended Edition, Very Large",5000.0,None))
// Right(Car(1996,Jeep,Grand Cherokee,4799.0,Some(MUST SELL!
// air, moon roof, loaded)))
In a similar fashion, you can create a HeaderEncoder
to have kantan.csv automatically add the correct header when
serializing:
implicit val carEncoder: HeaderEncoder[Car] = HeaderEncoder.caseEncoder("Year", "Make", "Model", "Price", "Description")(Car.unapply _)
This lets you write:
List(Car(1999, "Ford", "E350", 3000F, Some("ac, abs, moon"))).asCsv(rfc.withHeader)
// res2: String = """Year,Make,Model,Price,Description
// 1999,Ford,E350,3000.0,"ac, abs, moon"
// """
In case you need both a HeaderDecoder
and a HeaderEncoder
, you can also create a HeaderCodec
:
val carCodec: HeaderCodec[Car] = HeaderCodec.caseCodec("Year", "Make", "Model", "Price", "Description")(Car.apply _)(Car.unapply _)