FunDA(7)- Reactive Streams to fs2 Pull Streams

Keywords: Scala

Reactive Streams are not purely push-model streams; they also have a pull-model side. In the Iteratee pattern the Enumerator is nominally responsible for actively pushing data, which is the push model. In practice, though, the Iteratee starts the flow by handing the Enumerator a callback function, which amounts to pulling to some extent. In other words, Reactive Streams coordinate the upstream Enumerator and the downstream Iteratee through a combined push-pull model. Let's start with a simple Iteratee example:

def showElements: Iteratee[Int,Unit] = Cont {
  case Input.El(e) =>
     println(s"EL($e)")
     showElements
  case Input.Empty => showElements
  case Input.EOF =>
     println("EOF")
     Done((),Input.EOF)
}                                                 //> showElements: => play.api.libs.iteratee.Iteratee[Int,Unit]
val enumNumbers = Enumerator(1,2,3,4,5)           //> enumNumbers  : play.api.libs.iteratee.Enumerator[Int] = play.api.libs.iteratee.Enumerator$$anon$19@47f6473

enumNumbers |>> showElements                      //> EL(1)
                                                  //| EL(2)
                                                  //| EL(3)
                                                  //| EL(4)
                                                  //| EL(5)
                                                  //| res0: scala.concurrent.Future[play.api.libs.iteratee.Iteratee[Int,Unit]] = Success(Cont(<function1>))

We see that enumNumbers |>> showElements starts running immediately. However, the operation has not actually completed, because showElements never received Input.EOF: the result is still in the Cont state. To complete the operation we must use Iteratee.run:

Iteratee.flatten(enumNumbers |>> showElements).run//> EL(1)
                                                  //| EL(2)
                                                  //| EL(3)
                                                  //| EL(4)
                                                  //| EL(5)
                                                  //| EOF

This run function is defined as follows:

/**
   * Extracts the computed result of the Iteratee, pushing an Input.EOF first
   * if the Iteratee is in the [[play.api.libs.iteratee.Cont]] state.
   * In case of error, an exception may be thrown synchronously or may
   * be used to complete the returned Promise; this indeterminate behavior
   * is inherited from fold().
   *
   *  @return a [[scala.concurrent.Future]] of the eventually computed result
   */
  def run: Future[A] = fold({
    case Step.Done(a, _) => Future.successful(a)
    case Step.Cont(k) => k(Input.EOF).fold({
      case Step.Done(a1, _) => Future.successful(a1)
      case Step.Cont(_) => sys.error("diverging iteratee after Input.EOF")
      case Step.Error(msg, e) => sys.error(msg)
    })(dec)
    case Step.Error(msg, e) => sys.error(msg)
  })(dec)
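
To make the role of Input.EOF concrete, here is a minimal sketch of the same idea using a toy model rather than the Play API. The names In, It, sum, feed and the simplified run are hypothetical stand-ins invented for illustration; only the overall shape (push EOF when still in Cont, then extract the result) follows the definition above.

```scala
// Toy model only: NOT the Play Iteratee API.
sealed trait In[+E]
case class El[+E](e: E) extends In[E]
case object EOF extends In[Nothing]

sealed trait It[E, A]
case class Done[E, A](a: A) extends It[E, A]
case class Cont[E, A](k: In[E] => It[E, A]) extends It[E, A]

// Sums all elements; only an EOF moves it from Cont to Done.
def sum(acc: Int): It[Int, Int] = Cont[Int, Int] {
  case El(e) => sum(acc + e)
  case EOF   => Done(acc)
}

// Pushes every element and returns whatever state the iteratee ends up in.
def feed[E, A](xs: List[E], it: It[E, A]): It[E, A] =
  xs.foldLeft(it) {
    case (Cont(k), x) => k(El(x))
    case (done, _)    => done
  }

// Simplified run: push EOF if still in Cont, then extract the result.
def run[E, A](it: It[E, A]): A = it match {
  case Done(a) => a
  case Cont(k) => k(EOF) match {
    case Done(a) => a
    case Cont(_) => sys.error("diverging iteratee after EOF")
  }
}

val fed   = feed(List(1, 2, 3, 4, 5), sum(0)) // still a Cont: no EOF seen yet
val total = run(fed)                          // EOF is pushed here
```

After feeding five elements the toy iteratee is still waiting for input; it only yields its result once run pushes the end-of-input marker.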

Another problem is that enumNumbers |>> showElements is a closed operation: we cannot intercept the data stream element by element; we can only obtain the result of the whole operation. That is, if we want to feed the data produced by an Enumerator into an fs2 Stream, we can only do so after all the data has been read into memory, which defeats the purpose of using Reactive Streams. So what should we do? One possible approach is a shared storage structure used by two threads: in one thread the Iteratee stores the current element into the structure, and in the other thread fs2 takes the data out. The fs2.async.mutable package provides a Queue type that we can use as a pipe between the Iteratee and fs2: the Iteratee pushes data in at one end and fs2 pulls data out at the other.
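
The two-thread queue idea can be sketched with the standard library alone. This is only an analogy, assuming java.util.concurrent.ArrayBlockingQueue as a stand-in for the fs2 queue; bridge is a hypothetical helper name, and None is used as the end-of-stream marker.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Stdlib stand-in for the idea, not the fs2 API.
def bridge(n: Int): List[Int] = {
  // Capacity 2: the producer can run at most two elements ahead.
  val q = new ArrayBlockingQueue[Option[Int]](2)

  // Producer side (the role of the Iteratee): pushes from its own thread.
  val producer = new Thread(new Runnable {
    def run(): Unit = {
      var i = 1
      while (i <= n) { q.put(Some(i)); i += 1 } // put blocks while the queue is full
      q.put(None)                               // signal end of stream
    }
  })
  producer.setDaemon(true)
  producer.start()

  // Consumer side (the role of the fs2 stream): pulls one element at a time
  // in the current thread and stops at the first None.
  val pulled = Iterator
    .continually(q.take())
    .takeWhile(_.isDefined)
    .collect { case Some(i) => i }
    .toList

  producer.join()
  pulled
}

val pulled = bridge(5)
```

Because the queue is bounded, the producer never holds more than two unread elements, so the data is never fully materialised in memory.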

Let's first design the enqueue part, which is done in an Iteratee:

def enqueueTofs2(q: async.mutable.Queue[Task,Option[Int]]): Iteratee[Int,Unit] = Cont {
   case Input.EOF =>
       q.enqueue1(None).unsafeRun
       Done((),Input.EOF)
   case Input.Empty => enqueueTofs2(q)
   case Input.El(e) =>
       q.enqueue1(Some(e)).unsafeRun
       enqueueTofs2(q)
}    //> enqueueTofs2: (q: fs2.async.mutable.Queue[fs2.Task,Option[Int]])play.api.libs.iteratee.Iteratee[Int,Unit]
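
Each enqueue in this Iteratee waits until the queue has room. As a rough illustration of what that means when nobody is dequeuing, here is a sketch using java.util.concurrent.ArrayBlockingQueue as an assumed stand-in for the bounded fs2 queue (the names small, first, second and third are illustrative only). A timed offer is used so the example terminates instead of hanging.

```scala
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

// Stdlib analogy only: a bounded queue of capacity 2 and no consumer.
val small = new ArrayBlockingQueue[Option[Int]](2)

val first  = small.offer(Some(1)) // succeeds: queue has room
val second = small.offer(Some(2)) // succeeds: queue is now full
// A blocking put would wait forever here; the timed offer gives up instead.
val third  = small.offer(Some(3), 100, TimeUnit.MILLISECONDS)
```

With a single thread the third element can never be accepted, which is why the dequeue side has to run on a different thread.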

Let's walk through this Iteratee. enqueueTofs2 starts directly in the Cont state, that is, waiting to receive data. Whenever an element arrives it runs q.enqueue1 to put it into q, and it keeps doing so until it receives Input.EOF, at which point it enqueues None as an end marker. Note: q.enqueue1(Some(e)).unsafeRun is a synchronous operation that occupies the thread until enqueue1 has completed. Therefore the dequeue part at the other end of q must run in a different thread, otherwise the whole program will deadlock. fs2's Queue has the type Queue[F,A] and is created inside the effect F, so we have to use Stream.eval to work with this Queue in a functional way:

val fs2Stream: Stream[Task,Int] = Stream.eval(async.boundedQueue[Task,Option[Int]](2)).flatMap { q =>
    //run Enumerator-Iteratee and enqueue data in thread 1
    //dequeue data and en-stream in thread 2(current thread)
  }

Because Stream.eval here yields a Stream[Task,Queue[Task,Option[Int]]], the function passed to flatMap has the type Queue[Task,Option[Int]] => Stream[Task,Int]. Let's first consider how to implement the enqueue part, which is driven by running the Iteratee. As mentioned, this part must run in another thread, so we can use Task to select another thread, as follows:

    Task { Iteratee.flatten(enumNumbers |>> enqueueTofs2(q)).run }.unsafeRunAsyncFuture()

This Task now runs on its own in the background on another thread, but its progress depends on how fast the data is dequeued in the other thread. Let's first look at two functions provided by fs2:

/** Repeatedly calls `dequeue1` forever. */
  def dequeue: Stream[F, A] = Stream.bracket(cancellableDequeue1)(d => Stream.eval(d._1), d => d._2).repeat

/**
   * Halts the input stream at the first `None`.
   *
   * @example {{{
   * scala> Stream[Pure, Option[Int]](Some(1), Some(2), None, Some(3), None).unNoneTerminate.toList
   * res0: List[Int] = List(1, 2)
   * }}}
   */
  def unNoneTerminate[F[_],I]: Pipe[F,Option[I],I] =
    _ repeatPull { _.receive {
      case (hd, tl) =>
        val out = Chunk.indexedSeq(hd.toVector.takeWhile { _.isDefined }.collect { case Some(i) => i })
        if (out.size == hd.size) Pull.output(out) as tl
        else if (out.isEmpty) Pull.done
        else Pull.output(out) >> Pull.done
    }}
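
The behaviour described in the scaladoc example can be mimicked on a plain List. This is only a collections-based analogue, not the fs2 implementation; unNoneTerminateList is a hypothetical name.

```scala
// Keep elements up to (not including) the first None, then unwrap them.
def unNoneTerminateList[A](xs: List[Option[A]]): List[A] =
  xs.takeWhile(_.isDefined).collect { case Some(a) => a }

val out = unNoneTerminateList(List(Some(1), Some(2), None, Some(3), None))
```

Everything after the first None is dropped, including later Some values, which is what makes None usable as an end-of-stream marker.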

Conveniently, dequeue produces a Stream[F,A], and unNoneTerminate ends the stream at the first None. Now we can define the conversion from Reactive Streams to fs2 pull streams as follows:

implicit val strat = Strategy.fromFixedDaemonPool(4)
                                                  //> strat  : fs2.Strategy = Strategy
val fs2Stream: Stream[Task,Int] = Stream.eval(async.boundedQueue[Task,Option[Int]](2)).flatMap { q =>
  Task(Iteratee.flatten(enumNumbers |>> enqueueTofs2(q)).run).unsafeRunAsyncFuture
  pipe.unNoneTerminate(q.dequeue)
}   //> fs2Stream  : fs2.Stream[fs2.Task,Int] = attemptEval(Task).flatMap(<function1>).flatMap(<function1>)

fs2Stream is now an fs2.Stream[Task,Int]. We can try it out with a log function:

def log[A](prompt: String): Pipe[Task,A,A] =
    _.evalMap {row => Task.delay{ println(s"$prompt> $row"); row }}
                                                  //> log: [A](prompt: String)fs2.Pipe[fs2.Task,A,A]
    
fs2Stream.through(log("")).run.unsafeRun          //> > 1
                                                  //| > 2
                                                  //| > 3
                                                  //| > 4
                                                  //| > 5

We have successfully converted an Iteratee-based Reactive Stream into an fs2 pull-model stream.

Here is the source code for this discussion:

import play.api.libs.iteratee._
import scala.concurrent._
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.collection.mutable._
import fs2._
object iteratees {
def showElements: Iteratee[Int,Unit] = Cont {
  case Input.El(e) =>
     println(s"EL($e)")
     showElements
  case Input.Empty => showElements
  case Input.EOF =>
     println("EOF")
     Done((),Input.EOF)
}
val enumNumbers = Enumerator(1,2,3,4,5)

enumNumbers |>> showElements

Iteratee.flatten(enumNumbers |>> showElements).run


def enqueueTofs2(q: async.mutable.Queue[Task,Option[Int]]): Iteratee[Int,Unit] = Cont {
   case Input.EOF =>
       q.enqueue1(None).unsafeRun
       Done((),Input.EOF)
   case Input.Empty => enqueueTofs2(q)
   case Input.El(e) =>
       q.enqueue1(Some(e)).unsafeRun
       enqueueTofs2(q)
}
implicit val strat = Strategy.fromFixedDaemonPool(4)
val fs2Stream: Stream[Task,Int] = Stream.eval(async.boundedQueue[Task,Option[Int]](2)).flatMap { q =>
  Task(Iteratee.flatten(enumNumbers |>> enqueueTofs2(q)).run).unsafeRunAsyncFuture
  pipe.unNoneTerminate(q.dequeue)
}

def log[A](prompt: String): Pipe[Task,A,A] =
    _.evalMap {row => Task.delay{ println(s"$prompt> $row"); row }}
    
fs2Stream.through(log("")).run.unsafeRun
 
}

Posted by AKA Panama Jack on Wed, 12 Dec 2018 14:51:09 -0800