FunDA is designed mainly to fill a gap in batch-oriented FRM (Functional Relational Mapping) libraries such as Slick: they offer no way to traverse a data source row by row. The result set an FRM produces is a static collection with no mode for dynamic updates. FunDA's solution is to turn that static collection into a dynamic stream in which each element represents a data row and the complete stream represents a sequence of rows. Users traverse the stream with the stream and function components FunDA provides and perform data updates along the way. A FunDA stream can only be traversed in one direction (fda_next), but it can carry several kinds of elements, including data rows and action rows. An action row is composed of Slick DBIOActions, which can be sent back to the backend database to update data. Using its function components, FunDA can derive new data rows or action rows from data rows, and it can run a user-provided function at any point in the stream, so that the rows available at that point can be used for updates or for generating further data or action rows. The following chapters demonstrate how to use FunDA.
The rows in a result set returned by a Slick Query are generally tuples. Because their fields cannot be referred to by name, this article treats them as weakly typed. Named fields are more convenient, and beyond convenience, FunDA is built on Scala's functional programming style, whose static type system calls for stricter typing. The elements of a FunDA data stream must therefore be strongly typed, in most cases case classes, so that users can refer to data fields by name in their processing code. Here is how to turn a Slick result set into a strongly typed data stream:
From the World Bank Open Data website we downloaded, in CSV format, about 300,000 raw records of air quality reports for U.S. states and counties, and imported them into an H2 database as demonstration data. The demonstration table is defined as follows:
import slick.driver.H2Driver.api._

object Models {
  // Row template: one field per table column
  case class AQMRawModel(mid: String
                         , mtype: String
                         , state: String
                         , fips: String
                         , county: String
                         , year: String
                         , value: String)

  // Table structure: defines the column types; * is the result set projection
  class AQMRawTable(tag: Tag) extends Table[AQMRawModel](tag, "AIRQM") {
    def mid = column[String]("MEASUREID")
    def mtype = column[String]("MEASURETYPE")
    def state = column[String]("STATENAME")
    def fips = column[String]("COUNTYFIPS")
    def county = column[String]("COUNTYNAME")
    def year = column[String]("REPORTYEAR")
    def value = column[String]("VALUE")
    def * = (mid,mtype,state,fips,county,year,value) <> (AQMRawModel.tupled, AQMRawModel.unapply)
  }

  // Table query instance
  val AQMRawQuery = TableQuery[AQMRawTable]
}
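The `<>` in the `*` projection pairs two functions: one that builds the row class from a tuple, and one that takes the row class apart again. As a rough illustration of that pairing, here is a plain-Scala sketch with no Slick dependency. `Reading` and `ProjectionSketch` are hypothetical names made up for this example, and the code assumes Scala 2, where a case class's `unapply` returns an `Option` of a tuple.

```scala
object ProjectionSketch {
  // Hypothetical three-field stand-in for a row template such as AQMRawModel.
  case class Reading(state: String, county: String, value: String)

  // Tuple => case class: the direction used when rows are read.
  val pack: ((String, String, String)) => Reading = (Reading.apply _).tupled

  // Case class => Option[tuple]: the direction used when rows are written.
  val unpack: Reading => Option[(String, String, String)] = Reading.unapply _
}
```

Applying `pack` and then `unpack` returns the original tuple wrapped in `Some`, which is the round trip the mapped projection relies on.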
The SBT build file build.sbt for this demonstration is as follows:
name := "funda-demo"

version := "1.0"

scalaVersion := "2.11.8"

resolvers += Resolver.mavenLocal

libraryDependencies ++= Seq(
  "com.typesafe.slick" %% "slick" % "3.1.1",
  "com.typesafe.slick" %% "slick-testkit" % "3.1.1" % "test",
  "org.slf4j" % "slf4j-nop" % "1.7.21",
  "com.h2database" % "h2" % "1.4.191",
  "com.typesafe.slick" %% "slick-hikaricp" % "3.1.1",
  "com.bayakala" % "funda_2.11" % "1.0.0-SNAPSHOT" withSources() withJavadoc()
)
Database configuration was covered in the earlier Slick series of discussions, so it is not repeated here.
The conversion to strong types can be done while reading from the database, which produces a stream of strongly typed elements directly, or on the fly while the stream is being used. First, let's look at how to construct a data stream of strongly typed elements:
val aqmraw = Models.AQMRawQuery
val db = Database.forConfig("h2db")
// aqmQuery.result returns Seq[(String,String,String,String)]
val aqmQuery = aqmraw.map {r => (r.year,r.state,r.county,r.value)}
// user designed strong typed resultset type. must extend FDAROW
case class TypedRow(year: String, state: String, county: String, value: String) extends FDAROW
// strong typed resultset conversion function. declared implicit to remind during compilation
implicit def toTypedRow(row: (String,String,String,String)): TypedRow =
  TypedRow(row._1,row._2,row._3,row._4)
Before reading from the database, the user supplies the strongly typed case class TypedRow and the conversion function toTypedRow, which maps each tuple of the Seq[(...)] result to a TypedRow, as above. The conversion function is passed in when constructing the data-loading class FDAViewLoader:
// loader to read from database and convert result collection to strong typed collection
val viewLoader = FDAViewLoader(slick.driver.H2Driver)(toTypedRow _)
val dataSeq = viewLoader.fda_typedRows(aqmQuery.result)(db).toSeq
dataSeq is now a Seq[TypedRow]. Build a static data stream from dataSeq:
// turn Seq collection into fs2 stream
val aqmStream = fda_staticSource(dataSeq)()()
fda_staticSource follows a resource-usage pattern based on the bracket function:
/**
  * produce a static view source from a Seq[ROW] collection using famous 'bracket'
  * provide facade to error handling and cleanup
  * @param acquirer   the Seq[ROW] collection
  * @param errhandler error handle callback
  * @param finalizer  cleanup callback
  * @tparam ROW       type of row
  * @return           a new stream
  */
def fda_staticSource[ROW](acquirer: => Seq[ROW])(
    errhandler: Throwable => FDAPipeLine[ROW] = null)(
    finalizer: => Unit = ()): FDAPipeLine[ROW] = {...}
The call above omits the error handler and the finalizer. The following example shows a complete call:
val safeSource = fda_staticSource(dataSeq) {
  case e: Exception => fda_appendRow(FDAErrorRow(new Exception(e)))
}(println("the end finally!"))
In this example, if an exception occurs, the stream is replaced by a new stream whose element represents the exception. The message "the end finally!" is displayed whether the stream completes normally or is interrupted.
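To make the acquire / handle-error / finalize shape concrete, here is a minimal plain-Scala sketch of a bracket-style helper. It is not FunDA's or fs2's implementation: `BracketSketch`, `bracket` and `demo` are hypothetical names, and the helper works on an ordinary value rather than on a stream. It only illustrates that the finalizer runs on both the success path and the failure path.

```scala
import scala.util.control.NonFatal

object BracketSketch {
  // Hypothetical helper: acquire a resource, use it, recover from non-fatal
  // failures (raised while acquiring or using), and always run the finalizer.
  def bracket[A, B](acquire: => A)(use: A => B)(
      onError: Throwable => B)(finalizer: => Unit): B =
    try use(acquire)
    catch { case NonFatal(e) => onError(e) }
    finally finalizer

  // Runs one successful and one failing use and records the finalizer calls.
  def demo(): (Int, Int, List[String]) = {
    val log = scala.collection.mutable.ListBuffer.empty[String]
    val ok = bracket(Seq(1, 2, 3))(_.sum)(_ => -1) { log += "finalized"; () }
    // head on an empty Seq throws, so the error handler supplies the result
    val failed = bracket(Seq.empty[Int])(_.head)(_ => -1) { log += "finalized"; () }
    (ok, failed, log.toList)
  }
}
```

In `demo`, the first call returns the sum, the second falls back to the error handler's value, and the log contains one finalizer entry for each call.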
aqmStream is a strongly typed data stream with TypedRow elements, so field names can be used in the stream combinators:
// use stream combinators with field names
aqmStream.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun
Field names can also be used inside a user-defined FDAUserTask function:
// now access fields in the strong typed resultset
def showRecord: FDAUserTask[FDAROW] = row => {
  row match {
    case qmr: TypedRow =>
      println(s"State name: ${qmr.state}")
      println(s"County Name: ${qmr.county}")
      println(s"Particular year: ${qmr.year}")
      println(s"Value: ${qmr.value}")
      println("-------------")
      fda_skip
    case _ => fda_skip
  }
}
Running aqmStream produces the following output:
State name: Ohio
County Name: Stark
Particular year: 2013
Value: 0
-------------
State name: New Mexico
County Name: Lea
Particular year: 2002
Value: 0
-------------
State name: Texas
County Name: Bowie
Particular year: 2003
Value: 0
-------------
Process finished with exit code 0
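One detail of the filter `r.year > "1999"` is worth spelling out: year is a String column, so the comparison is lexicographic. That behaves like a numeric comparison only while every value is a four-digit year. The sketch below contrasts the two. `YearFilterSketch`, `afterLex` and `afterNum` are hypothetical helper names introduced for this illustration, and the numeric variant is just one possible alternative, not something the demo uses.

```scala
import scala.util.Try

object YearFilterSketch {
  // Lexicographic String comparison, the same kind used by r.year > "1999".
  def afterLex(year: String, bound: String): Boolean = year > bound

  // Numeric alternative; yields false for values that are not integers.
  def afterNum(year: String, bound: Int): Boolean =
    Try(year.trim.toInt).toOption.exists(_ > bound)
}
```

For example, `afterLex("2013", "1999")` and `afterNum("2013", 1999)` agree, while a shorter value such as "999" passes the lexicographic test (because '9' sorts after '1') but fails the numeric one.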
We can also build a weakly typed data stream first and then use map to convert it into a strongly typed one:
val allState = aqmraw.map(_.state)
val stateLoader = FDAViewLoader[String,String](slick.driver.H2Driver)()
val stateSeq = stateLoader.fda_plainRows(allState.distinct.result)(db).toSeq
val stateStream = fda_staticSource(stateSeq)()()
case class StateRow(state: String) extends FDAROW
def showState: FDAUserTask[FDAROW] = row => {
  row match {
    case StateRow(sname) =>
      println(s"Name of state: $sname")
      fda_skip
    case _ => fda_skip
  }
}
stateStream.map{s => StateRow(s)}
  .filter{r => r.state > "Alabama"}.take(3)
  .appendTask(showState).startRun
allState.distinct.result returns Seq[String]. Note that when no conversion function is supplied to help type inference, the SOURCE and TARGET type parameters must be given explicitly when constructing FDAViewLoader. stateStream is a weakly typed data stream; map{s => StateRow(s)} converts its elements into StateRow. Running stateStream produces:
Name of state: North Dakota
Name of state: Maryland
Name of state: Louisiana
Process finished with exit code 0
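The same wrap-then-match shape can be tried on an ordinary Scala collection, without FunDA. In this sketch `Row` is a hypothetical stand-in for FDAROW, `OtherRow` exists only to exercise the fallback case, and `describe` returns an Option instead of printing and calling fda_skip; none of these names come from the library.

```scala
object StateRowSketch {
  sealed trait Row                              // hypothetical stand-in for FDAROW
  case class StateRow(state: String) extends Row
  case class OtherRow(note: String) extends Row

  // Same shape as showState: handle StateRow, ignore every other row type.
  def describe(row: Row): Option[String] = row match {
    case StateRow(name) => Some(s"Name of state: $name")
    case _              => None
  }

  // Wrap raw strings, keep names that sort after "Alabama", take the first n.
  def firstAfterAlabama(raw: Seq[String], n: Int): Seq[Row] =
    raw.map(s => StateRow(s)).filter(_.state > "Alabama").take(n)
}
```

Wrapping first and filtering afterwards is what lets the filter refer to `state` by name; on the raw Seq[String] there would be no field to name.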
The examples above can also be implemented in Reactive-Streams mode:
// 10.seconds requires: import scala.concurrent.duration._
val streamLoader = FDAStreamLoader(slick.driver.H2Driver)(toTypedRow _)
val streamSource = streamLoader.fda_typedStream(aqmQuery.result)(db)(
  10.seconds,512,512)()()
streamSource.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun
val stateStreamLoader = FDAStreamLoader[String,String](slick.driver.H2Driver)()
val stateStreamSource = stateStreamLoader.fda_plainStream(allState.distinct.result)(db)(
  10.seconds,512,512)()()
// first convert to StateRow to get a Stream[Task,FDAROW] typed stream
stateStreamSource.map{s => StateRow(s)}
  .filter{r => r.state > "Alabama"}.take(3)
  .appendTask(showState).startRun
fda_typedStream generates a data stream of strongly typed elements. Its signature is as follows:
/**
  * returns a reactive-stream from Slick DBIOAction result
  * using play-iteratees and fs2 queue to connect to slick data stream publisher
  * provide facade for error handler and finalizer to support exception and cleanup handling
  * also provide stream element conversion from SOURCE type to TARGET type
  * @param action       a Slick DBIOAction to produce query results
  * @param slickDB      Slick database object
  * @param maxInterval  max time wait on iteratee to consume of next element
  *                     exceeding presumed streaming failure or completion
  *                     use 0.milli to represent infinity
  *                     inform enumerator to release its resources
  * @param fetchSize    number of rows cached during database read
  * @param queSize      size of queue used by iteratee as cache to pass elements to fs2 stream
  * @param errhandler   error handler callback
  * @param finalizer    cleanup callback
  * @param convert      just a measure to guarantee conversion function is defined
  *                     when this function is used there has to be a converter defined
  *                     implicitly in compile time
  * @return             a reactive-stream of TARGET row type elements
  */
def fda_typedStream(action: DBIOAction[Iterable[SOURCE],Streaming[SOURCE],Effect.Read])(
    slickDB: Database)(
    maxInterval: FiniteDuration, fetchSize: Int, queSize: Int)(
    errhandler: Throwable => FDAPipeLine[TARGET] = null)(
    finalizer: => Unit = ())(
    implicit convert: SOURCE => TARGET): FDAPipeLine[TARGET] = {...}
Note the maxInterval, fetchSize and queSize parameters. The streaming examples above produce the same results as the static versions.
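To give a feel for what queSize and maxInterval control, here is a self-contained sketch that uses only the JDK's ArrayBlockingQueue. It is not how FunDA wires play-iteratees to fs2; `BoundedQueueSketch` and `drain` are hypothetical names. A producer thread pushes rows into a queue bounded by queSize, and the consumer waits at most maxInterval for each row, treating a timeout as failure or completion and a zero duration as "wait indefinitely", in the spirit of the doc comment above. fetchSize concerns the database read itself and is not modeled here.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}
import scala.concurrent.duration.FiniteDuration

object BoundedQueueSketch {
  def drain[A](rows: Seq[A], queSize: Int, maxInterval: FiniteDuration): Vector[A] = {
    // None marks the end of the stream; capacity bounds rows held in memory.
    val queue = new ArrayBlockingQueue[Option[A]](queSize)
    val producer = new Thread(new Runnable {
      def run(): Unit = {
        rows.foreach(r => queue.put(Some(r))) // blocks while the queue is full
        queue.put(None)                       // signal completion
      }
    })
    producer.setDaemon(true)
    producer.start()

    @annotation.tailrec
    def loop(acc: Vector[A]): Vector[A] = {
      val next =
        if (maxInterval.length == 0L) queue.take() // zero means no timeout
        else queue.poll(maxInterval.toMillis, TimeUnit.MILLISECONDS)
      next match {
        case null      => acc                // timed out: presume failure or completion
        case None      => acc                // producer signalled completion
        case Some(row) => loop(acc :+ row)
      }
    }
    loop(Vector.empty)
  }
}
```

A small queSize keeps memory use low at the cost of the producer blocking more often, while a short maxInterval makes the consumer give up sooner on a slow source.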
The complete source code of the demo follows:
import slick.driver.H2Driver.api._
import com.bayakala.funda._
import API._
import scala.language.implicitConversions
import scala.concurrent.duration._

object StrongTypedSource extends App {

  val aqmraw = Models.AQMRawQuery
  val db = Database.forConfig("h2db")
  // aqmQuery.result returns Seq[(String,String,String,String)]
  val aqmQuery = aqmraw.map {r => (r.year,r.state,r.county,r.value)}
  // user designed strong typed resultset type. must extend FDAROW
  case class TypedRow(year: String, state: String, county: String, value: String) extends FDAROW
  // strong typed resultset conversion function. declared implicit to remind during compilation
  implicit def toTypedRow(row: (String,String,String,String)): TypedRow =
    TypedRow(row._1,row._2,row._3,row._4)
  // loader to read from database and convert result collection to strong typed collection
  val viewLoader = FDAViewLoader(slick.driver.H2Driver)(toTypedRow _)
  val dataSeq = viewLoader.fda_typedRows(aqmQuery.result)(db).toSeq
  // turn Seq collection into fs2 stream
  val aqmStream = fda_staticSource(dataSeq)()()
  // now access fields in the strong typed resultset
  def showRecord: FDAUserTask[FDAROW] = row => {
    row match {
      case qmr: TypedRow =>
        println(s"State name: ${qmr.state}")
        println(s"County Name: ${qmr.county}")
        println(s"Particular year: ${qmr.year}")
        println(s"Value: ${qmr.value}")
        println("-------------")
        fda_skip
      case _ => fda_skip
    }
  }
  // use stream combinators with field names
  aqmStream.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun

  val allState = aqmraw.map(_.state)
  //no converter to help type inference. must provide type parameters explicitly
  val stateLoader = FDAViewLoader[String,String](slick.driver.H2Driver)()
  val stateSeq = stateLoader.fda_plainRows(allState.distinct.result)(db).toSeq
  //constructed a Stream[Task,String]
  val stateStream = fda_staticSource(stateSeq)()()
  //strong typed row type. must extend FDAROW
  case class StateRow(state: String) extends FDAROW
  def showState: FDAUserTask[FDAROW] = row => {
    row match {
      case StateRow(sname) =>
        println(s"Name of state: $sname")
        fda_skip
      case _ => fda_skip
    }
  }
  //first convert to StateRows to turn Stream[Task,FDAROW] typed stream
  stateStream.map{s => StateRow(s)}
    .filter{r => r.state > "Alabama"}.take(3)
    .appendTask(showState).startRun

  val streamLoader = FDAStreamLoader(slick.driver.H2Driver)(toTypedRow _)
  val streamSource = streamLoader.fda_typedStream(aqmQuery.result)(db)(
    10.seconds,512,512)()()
  streamSource.filter{r => r.year > "1999"}.take(3).appendTask(showRecord).startRun

  val stateStreamLoader = FDAStreamLoader[String,String](slick.driver.H2Driver)()
  val stateStreamSource = stateStreamLoader.fda_plainStream(allState.distinct.result)(db)(
    10.seconds,512,512)()()
  //first convert to StateRows to turn Stream[Task,FDAROW] typed stream
  stateStreamSource.map{s => StateRow(s)}
    .filter{r => r.state > "Alabama"}.take(3)
    .appendTask(showState).startRun
}