Decoding rows as collections

A simple but very common type of CSV data is rows of numerical values. This is something that kantan.csv tries to make as easy as possible to deal with.

First, we’ll need some sample CSV data, which we’ll get from this project’s resources:

val rawData: java.net.URL = getClass.getResource("/nums.csv")

This is what we’re trying to parse:

scala.io.Source.fromURL(rawData).mkString
// res0: String = """85.5, 54.0, 74.7, 34.2
// 63.0, 75.6, 46.8, 80.1
// 85.5, 39.6, 2.7, 38.7"""

In order to turn this into useful types, all we need to do is retrieve a CsvReader instance:

import kantan.csv._
import kantan.csv.ops._

val reader = rawData.asCsvReader[List[Float]](rfc)

The asCsvReader scaladoc can seem a bit daunting with all its implicit parameters, so let’s demystify it.

The first thing you’ll notice is that it takes a type parameter, which is the type into which each row will be decoded. In our example, we requested each row to be decoded into a List[Float].
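To illustrate how the type parameter drives decoding, here is a minimal sketch that reads the same data into two different row types. It assumes that a plain String can be used as a CSV source and that decoders for Vector and for tuples are available out of the box; the inline data is hypothetical and stands in for the resource file.

```scala
// Requires the kantan.csv library on the classpath.
import kantan.csv._
import kantan.csv.ops._

// Hypothetical inline data, used here instead of a resource file.
val data = "85.5,54.0\n63.0,75.6"

// Each row decoded as a Vector[Double].
val asVectors = data.asCsvReader[Vector[Double]](rfc).toList

// Each row decoded as a pair of doubles.
val asPairs = data.asCsvReader[(Double, Double)](rfc).toList
```

Only the type parameter changes between the two calls; the rest of the code stays the same.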

The value parameter, rfc, is the CSV configuration to use when parsing. Among other things, it sets the character that should be used as a column separator. It’s usually a comma, and that is what rfc uses, but not all implementations agree on that - Excel, for instance, is infamous for using a system-dependent column separator.

The configuration also controls whether the first row is treated as a header and skipped rather than decoded. This is important for CSV data that contains a header row.
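As a sketch of how the separator and header settings might be adjusted, the following assumes a kantan.csv version in which rfc is a CsvConfiguration value exposing withCellSeparator and withHeader. The semicolon-separated data is hypothetical.

```scala
// Requires the kantan.csv library on the classpath.
import kantan.csv._
import kantan.csv.ops._

// Hypothetical data: semicolon-separated, with a header row.
val data = "a;b;c\n1.5;2.5;3.5\n4.5;5.5;6.5"

// Start from the default configuration and override what differs.
val config = rfc.withCellSeparator(';').withHeader

// The header row is skipped, so only the two numerical rows are decoded.
val rows = data.asCsvReader[List[Float]](config).toList
```

Deriving the configuration from rfc keeps every other setting at its default value.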

Now that we have our CsvReader instance, we can consume it - by, say, printing each row:

reader.foreach(println _)
// Right(List(85.5, 54.0, 74.7, 34.2))
// Right(List(63.0, 75.6, 46.8, 80.1))
// Right(List(85.5, 39.6, 2.7, 38.7))

Note that each result is wrapped in an instance of ReadResult. This allows decoding to be entirely safe: no exception will be thrown, and all error conditions are encoded at the type level. If safety is not a concern and you’d rather let your code crash than deal with error conditions, you can use asUnsafeCsvReader instead.
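A minimal sketch of what working with these results can look like, assuming ReadResult behaves as an Either with the error on the left, as the Right(...) output above suggests. The inline data, including the deliberately invalid cell, is hypothetical.

```scala
// Requires the kantan.csv library on the classpath.
import kantan.csv._
import kantan.csv.ops._

// Hypothetical input: the second row contains a cell that is not a number.
val data = "1.5,2.5\n3.5,oops"

// Handle successes and failures explicitly, row by row.
data.asCsvReader[List[Float]](rfc).foreach {
  case Right(row)  => println(s"decoded: $row")
  case Left(error) => println(s"failed: $error")
}

// Keep only the rows that decoded successfully.
val valid: List[List[Float]] =
  data.asCsvReader[List[Float]](rfc).collect { case Right(row) => row }.toList

// Unsafe variant: rows are not wrapped, and a decoding failure throws instead.
val clean = "1.5,2.5\n3.5,4.5"
val unwrapped: List[List[Float]] =
  clean.asUnsafeCsvReader[List[Float]](rfc).toList
```

Silently dropping failed rows, as valid does, is only one possible policy. Whether it is appropriate depends on the data.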

Finally, observant readers might have noticed that we didn’t bother closing the CsvReader. We’re obviously dealing with some sort of streamed resource, so not closing it seems like a bug. In this specific case, however, it’s not necessary: CsvReader will automatically close any underlying resource when it has been consumed entirely or when a fatal error occurs.
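Automatic closing applies when the reader is consumed entirely. The following sketch covers the complementary case, where only the first row is read and the reader is then closed by hand. It assumes that CsvReader exposes hasNext, next() and close(), and it uses hypothetical inline data.

```scala
// Requires the kantan.csv library on the classpath.
import kantan.csv._
import kantan.csv.ops._

// Hypothetical inline data, used here instead of a resource file.
val data = "1.5,2.5\n3.5,4.5"

val reader = data.asCsvReader[List[Float]](rfc)

// Read at most one row, then release the underlying resource explicitly,
// since the reader has not been consumed to the end.
val first: Option[ReadResult[List[Float]]] =
  try {
    if (reader.hasNext) Some(reader.next()) else None
  } finally reader.close()
```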
