Introduction to SparkSQL Case 2 (SparkSQL 1.x)

Keywords: Big Data SQL Spark Apache

The main ideas in the introductory case for SparkSQL are as follows (a sketch of this approach follows the list):

1. Create a SparkContext
2. Create an SQLContext
3. Create an RDD
4. Create a case class and define its member variables
5. Collate the data and associate it with the case class
6. Convert the RDD to a DataFrame (importing the implicit conversions)
7. Register the DataFrame as a temporary table
8. Write SQL (a Transformation)
9. Execute an Action
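
A minimal sketch of this first, case-class-based approach under Spark 1.x. The Person class, the temporary table name person, and the comma-separated "id,name,age,yz" record layout are assumptions for illustration, mirroring the data used in the full example further below:

package cn.ysjh0014.SparkSql

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

//Hypothetical case class; defined at the top level so Spark can derive the schema via reflection
case class Person(id: Long, name: String, age: Int, yz: Double)

object SparkSqlCaseClassDemo {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkSqlCaseClass").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ //brings the rdd.toDF() implicit conversion into scope

    //Collate the data and associate each record with the case class
    //(assumed record layout: id,name,age,yz)
    val personRdd = sc.textFile(args(0)).map(line => {
      val fields = line.split(",")
      Person(fields(0).toLong, fields(1), fields(2).toInt, fields(3).toDouble)
    })

    //Convert the RDD to a DataFrame via the imported implicit conversion
    val df: DataFrame = personRdd.toDF()

    //Register as a temporary table, write SQL (Transformation), execute the Action
    df.registerTempTable("person")
    val result: DataFrame = sqlContext.sql("SELECT * FROM person ORDER BY yz DESC, age ASC")
    result.show()

    sc.stop()
  }
}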

There is another way of thinking about and writing this:

1. Create a SparkContext
2. Create an SQLContext
3. Create an RDD
4. Create a StructType (the schema)
5. Organize the data and associate it with Row
6. Create a DataFrame from the row RDD and the schema
7. Register the DataFrame as a temporary table
8. Write SQL (a Transformation)
9. Execute an Action

Specific code implementation process:

package cn.ysjh0014.SparkSql


import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

object SparkSqlDemo2 {

  def main(args: Array[String]): Unit = {

    //This program can also be submitted to a Spark cluster (an example spark-submit invocation follows the code)
    val conf = new SparkConf().setAppName("SparkSql2").setMaster("local[4]") //setMaster("local[4]") makes the program run locally with multiple threads

    //Create the SparkContext
    val sc = new SparkContext(conf)
    //SparkContext alone cannot create the special RDD (the DataFrame); SQLContext wraps it and adds SQL support
    val sqlContext = new SQLContext(sc)

    //Create a DataFrame (a special RDD, i.e. an RDD with a schema): first create an ordinary RDD, then associate it with a schema
    val lines = sc.textFile(args(0))

    //Process the data: split each line into fields and wrap them in a Row
    val rowRdd: RDD[Row] = lines.map(line => {
      val fields = line.split(",")
      val id = fields(0).toLong
      val name = fields(1)
      val age = fields(2).toInt
      val yz = fields(3).toDouble
      Row(id,name,age,yz)
    })

    //The schema (effectively the table header) describes the DataFrame's columns
    val schema: StructType = StructType(List(
      StructField("id", LongType, true),
      StructField("name", StringType, true),
      StructField("age", IntegerType, true),
      StructField("yz", DoubleType, true)
    ))


    //Associate the row RDD with the schema
    val df: DataFrame = sqlContext.createDataFrame(rowRdd, schema)

    //Once you have a DataFrame, you can program against it with two APIs: SQL or the DataFrame DSL (shown after the SQL example below)

    //Using the SQL API
    //Register the DataFrame as a temporary table
    df.registerTempTable("body") //deprecated since Spark 2.0 in favour of createOrReplaceTempView
    //Write SQL (the sql method is actually a Transformation)
    val result: DataFrame = sqlContext.sql("SELECT * FROM body ORDER BY yz desc, age asc") //SQL keywords are case-insensitive; uppercase is used here only for readability
    //View the results (show triggers the Action)
    result.show()
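
    //The DataFrame DSL is the other API. A sketch of the rough equivalent of the SQL above,
    //using standard Spark 1.x DataFrame methods (select, orderBy, Column.desc/Column.asc):
    val result2: DataFrame = df.select("id", "name", "age", "yz")
      .orderBy(df("yz").desc, df("age").asc)
    result2.show()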


    //Release resources
    sc.stop()

  }
}
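
Since the first comment notes that the program can be submitted to a Spark cluster, here is a hedged example invocation after packaging; the master URL, jar name, and HDFS input path are placeholders for illustration:

spark-submit --class cn.ysjh0014.SparkSql.SparkSqlDemo2 --master spark://node1:7077 spark-sql-demo.jar hdfs://node1:9000/input/person.txt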

The running results are identical to those in Case 1.
