FunDA (12) - Demonstration: strongly typed data sources

Keywords: Scala Database Programming snapshot

FunDA was designed mainly to fill a gap in batch-oriented FRM (Functional Relational Mapping) libraries such as Slick: they offer no way to traverse a data source row by row. The result set an FRM produces is a static collection with no mode for dynamic, row-level updates. FunDA's solution is to turn that static collection into a dynamic stream in which each element represents one data row, so that a complete stream represents a series of rows. Users can then move through the stream with the stream and function components FunDA provides and perform data updates along the way. A FunDA stream can only be traversed in one direction (fda_next), but it supports several kinds of elements, including data rows and action rows. An action row is built from Slick DBIOActions and can be sent back to the database to update data. Through its function components, FunDA can derive new data rows or action rows from existing data rows, and it can run a user-provided function at any position in the stream so that the function can use the row at that position to update data or to generate new data or action rows. We will demonstrate the use of FunDA in the discussions that follow.

The rows in a result set returned by a Slick Query are usually tuples. Because tuple fields cannot be referenced by name, we call such rows weakly typed. Named fields are more convenient to use, and since FunDA is built on Scala's functional programming style, whose static type system calls for more precise types, the elements of a FunDA stream must be strongly typed, in most cases case classes. Users can then refer to fields by name when writing data processing code. The rest of this post shows how to turn a Slick result set into a strongly typed data stream.
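
As a small illustration of the difference, here is a minimal sketch in plain Scala with no Slick or FunDA involved. The names RawRow and TupleVsCaseClass are made up for this example, and TypedRow here does not extend FDAROW as the real one must:

```scala
// Sketch only: plain Scala, no Slick or FunDA involved.
object TupleVsCaseClass {
  // A row as a tuple projection would deliver it: positional fields only.
  type RawRow = (String, String, String, String)

  // The same row with named fields (the real TypedRow must also extend FDAROW).
  final case class TypedRow(year: String, state: String, county: String, value: String)

  // Conversion from the positional form to the named form.
  def toTypedRow(row: RawRow): TypedRow =
    TypedRow(row._1, row._2, row._3, row._4)

  def main(args: Array[String]): Unit = {
    val raw: RawRow = ("2013", "Ohio", "Stark", "0")
    // Positional access: the compiler cannot tell us that _2 is the state.
    println(raw._2)
    // Named access: the field name documents itself and is checked at compile time.
    val typed = toTypedRow(raw)
    println(typed.state)
  }
}
```

Since every field in this example is a String, mixing up two tuple positions would still compile, which is exactly the kind of mistake named fields help to avoid.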

From the World Bank Open Data website we downloaded about 300,000 rows of raw air quality report data for US states and counties in CSV format and imported them into an H2 database as demonstration data. The structure of the demonstration table is as follows:

import slick.driver.H2Driver.api._

object Models {

  // row template matching the table's fields
  case class AQMRawModel(mid: String
                         , mtype: String
                         , state: String
                         , fips: String
                         , county: String
                         , year: String
                         , value: String)

  // table structure: defines the column types; * is the projection that makes up a result row
  class AQMRawTable(tag: Tag) extends Table[AQMRawModel](tag, "AIRQM") {
    def mid = column[String]("MEASUREID")
    def mtype = column[String]("MEASURETYPE")
    def state = column[String]("STATENAME")
    def fips = column[String]("COUNTYFIPS")
    def county = column[String]("COUNTYNAME")
    def year = column[String]("REPORTYEAR")
    def value = column[String]("VALUE")

    def * = (mid,mtype,state,fips,county,year,value) <> (AQMRawModel.tupled, AQMRawModel.unapply)
  }

  // table query instance
  val AQMRawQuery = TableQuery[AQMRawTable]

}

The SBT build file build.sbt for this demonstration is as follows:

name := "funda-demo"

version := "1.0"

scalaVersion := "2.11.8"

resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "com.typesafe.slick" %% "slick" % "3.1.1",
  "com.typesafe.slick" %% "slick-testkit" % "3.1.1" % "test",
  "org.slf4j" % "slf4j-nop" % "1.7.21",
  "com.h2database" % "h2" % "1.4.191",
  "com.typesafe.slick" %% "slick-hikaricp" % "3.1.1",
  "com.bayakala" % "funda_2.11" % "1.0.0-SNAPSHOT" withSources() withJavadoc()
)

Database configuration was covered in the earlier Slick series of discussions, so we won't repeat it here.

The conversion to a strong type can happen while reading from the database, which produces a stream of strongly typed elements directly, or on the fly while the stream is being used. Let's first look at how to build a stream of strongly typed elements:

  val aqmraw = Models.AQMRawQuery

  val db = Database.forConfig("h2db")
// aqmQuery.result returns Seq[(String,String,String,String)]
  val aqmQuery = aqmraw.map {r => (r.year,r.state,r.county,r.value)}
// user designed strong typed resultset type. must extend FDAROW
  case class TypedRow(year: String, state: String, county: String, value: String) extends FDAROW
// strong typed resultset conversion function. declared implicit to remind during compilation
  implicit def toTypedRow(row: (String,String,String,String)): TypedRow =
    TypedRow(row._1,row._2,row._3,row._4)

Before reading from the database, the user supplies the strongly typed case class TypedRow and the function toTypedRow that converts the tuple rows of the Seq[(...)] result into it, as shown above. This conversion function is passed in when constructing the data loader class FDAViewLoader:

// loader to read from database and convert result collection to strong typed collection
  val viewLoader = FDAViewLoader(slick.driver.H2Driver)(toTypedRow _)
  val dataSeq = viewLoader.fda_typedRows(aqmQuery.result)(db).toSeq

dataSeq now has type Seq[TypedRow]. We build a static data stream from it:

// turn Seq collection into fs2 stream
  val aqmStream =  fda_staticSource(dataSeq)()()

fda_staticSource follows the resource usage pattern built on the bracket function:

  /**
    * produce a static view source from a Seq[ROW] collection using famous 'bracket'
    * provide facade to error handling and cleanup
    * @param acquirer       the Seq[ROW] collection
    * @param errhandler     error handle callback
    * @param finalizer      cleanup callback
    * @tparam ROW           type of row
    * @return               a new stream
    */
  def fda_staticSource[ROW](acquirer: => Seq[ROW])(
                            errhandler: Throwable => FDAPipeLine[ROW] = null)(
                            finalizer: => Unit = ()): FDAPipeLine[ROW] = {...}

The call above leaves out the error handler and the finalizer. The following example shows a complete call:

  val safeSource = fda_staticSource(dataSeq) {
    case e: Exception => fda_appendRow(FDAErrorRow(new Exception(e)))
  }(println("the end finally!"))

In this call, if an exception occurs, the stream continues as a new stream carrying an element that represents the exception. The message "the end finally!" is printed whether the stream completes normally or is interrupted.
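
The acquire, handle-error, always-clean-up shape of this call can be imitated in plain Scala. The following is only a rough sketch under simplifying assumptions: staticSource is a hypothetical stand-in that works eagerly on a Seq, whereas fda_staticSource returns a lazily evaluated stream, and none of this is FunDA's actual implementation:

```scala
import scala.util.control.NonFatal

object BracketSketch {
  // Hypothetical stand-in for the shape of fda_staticSource (eager, not a stream):
  // evaluate `acquire`, recover through `onError` if it throws,
  // and run `finalizer` in every case.
  def staticSource[A](acquire: => Seq[A])(
                      onError: Throwable => Seq[A])(
                      finalizer: => Unit): Seq[A] =
    try {
      acquire
    } catch {
      case NonFatal(e) => onError(e)
    } finally {
      finalizer
    }

  def main(args: Array[String]): Unit = {
    // Normal completion: the rows come back and the finalizer runs once.
    val ok = staticSource(Seq("row-1", "row-2"))(
      e => Seq(s"error: ${e.getMessage}"))(println("the end finally!"))
    println(ok)

    // Failure while acquiring: the handler supplies a replacement row,
    // and the finalizer still runs.
    val failed = staticSource[String](throw new IllegalStateException("no data"))(
      e => Seq(s"error: ${e.getMessage}"))(println("the end finally!"))
    println(failed)
  }
}
```

The three parameter lists mirror acquirer, errhandler and finalizer in the signature above. The sketch takes the handler as a required argument instead of defaulting it to null.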

aqmStream is a strongly typed data stream whose elements are TypedRow. We can use field names in the stream combinators:

  // use stream combinators with field names
  aqmStream.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun

Of course, we can also refer to fields by name inside a user-defined FDAUserTask function:

// now access fields in the strong typed resultset
  def showRecord: FDAUserTask[FDAROW] = row => {
    row match {
      case qmr: TypedRow =>
        println(s"State name: ${qmr.state}")
        println(s"County Name: ${qmr.county}")
        println(s"Particular year: ${qmr.year}")
        println(s"Value: ${qmr.value}")
        println("-------------")
        fda_skip
      case _ => fda_skip
    }
  }

Running aqmStream produces the following output:

State name: Ohio
County Name: Stark
Particular year: 2013
Value: 0
-------------
State name: New Mexico
County Name: Lea
Particular year: 2002
Value: 0
-------------
State name: Texas
County Name: Bowie
Particular year: 2003
Value: 0
-------------

Process finished with exit code 0
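
One way to picture a user task such as showRecord is as a function that receives a row of the general row type, picks out the concrete case class by pattern matching, and decides what to pass downstream. The sketch below is a simplified model in plain Scala, not FunDA's implementation: Row, UserTask, skip and run are hypothetical names, and the assumption that a task returns a list of rows to emit is ours.

```scala
object UserTaskSketch {
  // Marker trait playing the role of FDAROW in this model.
  trait Row
  final case class TypedRow(year: String, state: String) extends Row
  final case class OtherRow(note: String) extends Row

  // Assumed task shape: one row in, zero or more rows out.
  type UserTask = Row => List[Row]

  // Emit nothing downstream (the role fda_skip plays in the article).
  val skip: List[Row] = Nil

  // Writes one line per TypedRow into `sink`, ignores every other row type,
  // and passes nothing on.
  def describe(sink: StringBuilder): UserTask = {
    case r: TypedRow =>
      sink.append(s"${r.state} ${r.year}\n")
      skip
    case _ => skip
  }

  // Applying a task to every row and concatenating whatever it emits.
  def run(rows: List[Row], task: UserTask): List[Row] = rows.flatMap(task)

  def main(args: Array[String]): Unit = {
    val sink = new StringBuilder
    val rest = run(List(TypedRow("2013", "Ohio"), OtherRow("ignored")), describe(sink))
    print(sink.toString)
    println(s"rows passed downstream: ${rest.size}")
  }
}
```

The catch-all case matters: since the task is typed against the general row type, rows of any other type must be handled too, which is why showRecord ends with case _ => fda_skip.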

We can also construct a weakly typed data stream and then use map to convert it into a strongly typed data stream, as follows:

  val allState = aqmraw.map(_.state)
  val stateLoader = FDAViewLoader[String,String](slick.driver.H2Driver)()
  val stateSeq = stateLoader.fda_plainRows(allState.distinct.result)(db).toSeq
  val stateStream =  fda_staticSource(stateSeq)()()
  case class StateRow(state: String) extends FDAROW
  def showState: FDAUserTask[FDAROW] = row => {
     row match {
       case StateRow(sname) =>
         println(s"Name of state: $sname")
         fda_skip
       case _ => fda_skip
     }
  }
  stateStream.map{s => StateRow(s)}
    .filter{r => r.state > "Alabama"}.take(3)
    .appendTask(showState).startRun

allState.distinct.result returns a Seq[String]. Note that when no conversion function is supplied to drive type inference, the SOURCE and TARGET type parameters must be given explicitly when constructing FDAViewLoader. stateStream is a weakly typed data stream; we use map{s => StateRow(s)} to convert its elements to StateRow. Running stateStream produces:

Name of state: North Dakota
Name of state: Maryland
Name of state: Louisiana

Process finished with exit code 0
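
On the explicit type parameters: the following minimal sketch suggests why they are needed when no conversion function is passed. Loader is a hypothetical class written for this illustration, not FunDA's FDAViewLoader, and modelling the optional converter as an Option is our own choice:

```scala
object LoaderInferenceSketch {
  final case class StateRow(state: String)

  // Hypothetical loader with a SOURCE type S and a TARGET type T.
  // The converter is optional, like the empty second argument list
  // in the article's FDAViewLoader[String,String](...)() call.
  final class Loader[S, T](convert: Option[S => T] = None) {
    // Returns the rows unchanged, so only S matters here.
    def plainRows(rows: Seq[S]): Seq[S] = rows

    // Converts every row; only possible when a converter was supplied.
    def typedRows(rows: Seq[S]): Seq[T] = convert match {
      case Some(f) => rows.map(f)
      case None    => throw new IllegalStateException("no conversion function supplied")
    }
  }

  def main(args: Array[String]): Unit = {
    // With a converter, S = String and T = StateRow are inferred from its type.
    val typed = new Loader(Some((s: String) => StateRow(s)))
    println(typed.typedRows(Seq("Ohio", "Texas")))

    // Without a converter there is nothing to infer S and T from,
    // so they are written out by hand. Omitting them would make the
    // compiler pick Nothing for both, and plainRows(Seq("Ohio")) would not compile.
    val plain = new Loader[String, String]()
    println(plain.plainRows(Seq("Ohio", "Texas")))
  }
}
```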

The examples above can also be implemented in Reactive Streams mode, as follows:

  val streamLoader = FDAStreamLoader(slick.driver.H2Driver)(toTypedRow _)
  val streamSource = streamLoader.fda_typedStream(aqmQuery.result)(db)(
    10.seconds,512,512)()()
  streamSource.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun

  val stateStreamLoader = FDAStreamLoader[String,String](slick.driver.H2Driver)()
  val stateStreamSource = stateStreamLoader.fda_plainStream(allState.distinct.result)(db)(
    10.seconds,512,512)()()

  //first convert to StateRows to turn Stream[Task,FDAROW] typed stream
  stateStreamSource.map{s => StateRow(s)}
    .filter{r => r.state > "Alabama"}.take(3)
    .appendTask(showState).startRun

fda_typedStream produces a stream of strongly typed elements. Its signature is as follows:

   /**
      * returns a reactive-stream from Slick DBIOAction result
      * using play-iteratees and fs2 queue to connect to slick data stream publisher
      * provide facade for error handler and finalizer to support exception and cleanup handling
      * also provide stream element conversion from SOURCE type to TARGET type
      * @param action       a Slick DBIOAction to produce query results
      * @param slickDB      Slick database object
      * @param maxInterval  max time to wait for the iteratee to consume the next element
      *                     exceeding it is presumed to mean streaming failure or completion
      *                     and informs the enumerator to release its resources
      *                     use 0.milli to represent infinity
      * @param fetchSize    number of rows cached during database read
      * @param queSize      size of queue used by iteratee as cache to pass elements to fs2 stream
      * @param errhandler   error handler callback
      * @param finalizer    cleanup callback
      * @param convert      just a measure to guarantee conversion function is defined
      *                     when this function is used there has to be a converter defined
      *                     implicitly in compile time
      * @return             a reactive-stream of TARGET row type elements
      */
    def fda_typedStream(action: DBIOAction[Iterable[SOURCE],Streaming[SOURCE],Effect.Read])(
      slickDB: Database)(
      maxInterval: FiniteDuration, fetchSize: Int, queSize: Int)(
      errhandler: Throwable => FDAPipeLine[TARGET] = null)(
      finalizer: => Unit = ())(
      implicit convert: SOURCE => TARGET): FDAPipeLine[TARGET] = {...}

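Note the maxInterval, fetchSize and queSize parameters: maxInterval is the longest wait for the next element before the stream is presumed to have failed or completed, fetchSize is the number of rows cached during a database read, and queSize is the size of the queue that passes elements on to the fs2 stream. The streaming examples above produce the same results as the static versions.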
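
To make the roles of queSize and maxInterval a little more concrete, here is a rough sketch that uses a bounded queue from the Java standard library instead of play-iteratees and fs2. It is an analogy under our own assumptions, not FunDA's mechanism: drain is a hypothetical helper, fetchSize is not modelled, and the special meaning of 0.milli (wait forever) is not implemented.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}
import scala.concurrent.duration._

object BoundedQueueSketch {
  // Takes elements from the queue until none arrives within maxInterval,
  // which the consumer then treats as "the producer is done or has failed".
  def drain[A <: AnyRef](queue: ArrayBlockingQueue[A], maxInterval: FiniteDuration): List[A] =
    Iterator
      .continually(queue.poll(maxInterval.toMillis, TimeUnit.MILLISECONDS)) // null on timeout
      .takeWhile(_ != null)
      .toList

  def main(args: Array[String]): Unit = {
    val queSize = 4 // the queue never holds more than this many rows
    val queue = new ArrayBlockingQueue[String](queSize)

    // The producer blocks on put() whenever the queue is full,
    // so a slow consumer automatically slows the producer down.
    val producer = new Thread(new Runnable {
      def run(): Unit = (1 to 10).foreach(i => queue.put(s"row-$i"))
    })
    producer.start()

    println(drain(queue, 200.millis))
    producer.join()
  }
}
```

A small queue keeps memory use low at the cost of more waiting between producer and consumer, while a short maxInterval risks ending the stream early if the database is slow to deliver the next rows.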

The complete source code of the demo follows:

import slick.driver.H2Driver.api._
import com.bayakala.funda._
import API._
import scala.language.implicitConversions
import scala.concurrent.duration._

object StrongTypedSource extends App {

  val aqmraw = Models.AQMRawQuery

  val db = Database.forConfig("h2db")
// aqmQuery.result returns Seq[(String,String,String,String)]
  val aqmQuery = aqmraw.map {r => (r.year,r.state,r.county,r.value)}
// user designed strong typed resultset type. must extend FDAROW
  case class TypedRow(year: String, state: String, county: String, value: String) extends FDAROW
// strong typed resultset conversion function. declared implicit to remind during compilation
  implicit def toTypedRow(row: (String,String,String,String)): TypedRow =
    TypedRow(row._1,row._2,row._3,row._4)
// loader to read from database and convert result collection to strong typed collection
  val viewLoader = FDAViewLoader(slick.driver.H2Driver)(toTypedRow _)
  val dataSeq = viewLoader.fda_typedRows(aqmQuery.result)(db).toSeq
// turn Seq collection into fs2 stream
  val aqmStream =  fda_staticSource(dataSeq)()()
// now access fields in the strong typed resultset
  def showRecord: FDAUserTask[FDAROW] = row => {
    row match {
      case qmr: TypedRow =>
        println(s"State name: ${qmr.state}")
        println(s"County Name: ${qmr.county}")
        println(s"Particular year: ${qmr.year}")
        println(s"Value: ${qmr.value}")
        println("-------------")
        fda_skip
      case _ => fda_skip
    }
  }
  // use stream combinators with field names
  aqmStream.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun

  val allState = aqmraw.map(_.state)
  //no converter to help type inference. must provide type parameters explicitly
  val stateLoader = FDAViewLoader[String,String](slick.driver.H2Driver)()
  val stateSeq = stateLoader.fda_plainRows(allState.distinct.result)(db).toSeq
  //constructed a Stream[Task,String]
  val stateStream =  fda_staticSource(stateSeq)()()
  //strong typed row type. must extend FDAROW
  case class StateRow(state: String) extends FDAROW
  def showState: FDAUserTask[FDAROW] = row => {
     row match {
       case StateRow(sname) =>
         println(s"Name of state: $sname")
         fda_skip
       case _ => fda_skip
     }
  }
  //first convert to StateRows to turn Stream[Task,FDAROW] typed stream
  stateStream.map{s => StateRow(s)}
    .filter{r => r.state > "Alabama"}.take(3)
    .appendTask(showState).startRun


  val streamLoader = FDAStreamLoader(slick.driver.H2Driver)(toTypedRow _)
  val streamSource = streamLoader.fda_typedStream(aqmQuery.result)(db)(
    10.seconds,512,512)()()
  streamSource.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun

  val stateStreamLoader = FDAStreamLoader[String,String](slick.driver.H2Driver)()
  val stateStreamSource = stateStreamLoader.fda_plainStream(allState.distinct.result)(db)(
    10.seconds,512,512)()()

  //first convert to StateRows to turn Stream[Task,FDAROW] typed stream
  stateStreamSource.map{s => StateRow(s)}
    .filter{r => r.state > "Alabama"}.take(3)
    .appendTask(showState).startRun
}

Posted by luddeb on Wed, 02 Jan 2019 16:18:08 -0800