The main ideas in this introductory SparkSQL case are as follows (a sketch of this approach is given after the list):
1. Create a SparkContext
2. Create an SQLContext
3. Create an RDD
4. Create a case class and define its member variables
5. Collate the data and associate it with the case class
6. Convert the RDD to a DataFrame (importing the implicit conversions)
7. Register the DataFrame as a temporary table
8. Write SQL (a Transformation)
9. Execute an Action
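Since the full code below only implements the second approach, here is a minimal sketch of this first, case-class-based approach. It assumes the same comma-separated input (id,name,age,yz) as the code below; the Body class name and the SparkSqlDemo1 object name are illustrative.

package cn.ysjh0014.SparkSql

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

// The case class whose fields define the schema (step 4); it is declared
// outside the method so Spark's reflection can read its fields
case class Body(id: Long, name: String, age: Int, yz: Double)

object SparkSqlDemo1 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkSql1").setMaster("local[4]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Create an ordinary RDD (step 3) and associate each line with the case class (step 5)
    val lines = sc.textFile(args(0))
    val bodyRdd = lines.map(line => {
      val fields = line.split(",")
      Body(fields(0).toLong, fields(1), fields(2).toInt, fields(3).toDouble)
    })

    // Import the implicit conversions so toDF() becomes available (step 6)
    import sqlContext.implicits._
    val df: DataFrame = bodyRdd.toDF()

    // Register a temporary table (step 7), write SQL (step 8), trigger an Action (step 9)
    df.registerTempTable("body")
    val result: DataFrame = sqlContext.sql("SELECT * FROM body ORDER BY yz DESC, age ASC")
    result.show()

    sc.stop()
  }
}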
There is another way of thinking about and writing this:
1. Create a SparkContext
2. Create an SQLContext
3. Create an RDD
4. Create a StructType (the schema)
5. Organize the data, associating each record with a Row
6. Create a DataFrame from the row RDD and the schema
7. Register the DataFrame as a temporary table
8. Write SQL (a Transformation)
9. Execute an Action
The specific code implementation:
package cn.ysjh0014.SparkSql

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types._
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}

object SparkSqlDemo2 {
  def main(args: Array[String]): Unit = {
    // This program can be submitted to a Spark cluster; setMaster here makes it run locally with multiple threads
    val conf = new SparkConf().setAppName("SparkSql2").setMaster("local[4]")
    // Create the Spark SQL connection: a SparkContext alone cannot create the special RDD (the DataFrame),
    // so it is wrapped in an SQLContext, which enhances it
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // To create a DataFrame (a special RDD, i.e. an RDD with a schema), first create an ordinary RDD,
    // then associate it with the schema
    val lines = sc.textFile(args(0))

    // Process the data into Rows
    val rowRdd: RDD[Row] = lines.map(line => {
      val fields = line.split(",")
      val id = fields(0).toLong
      val name = fields(1)
      val age = fields(2).toInt
      val yz = fields(3).toDouble
      Row(id, name, age, yz)
    })

    // The result type, which is effectively the table header, describes the DataFrame
    val sm: StructType = StructType(List(
      StructField("id", LongType, true),
      StructField("name", StringType, true),
      StructField("age", IntegerType, true),
      StructField("yz", DoubleType, true)
    ))

    // Associate the row RDD with the schema
    val df: DataFrame = sqlContext.createDataFrame(rowRdd, sm)

    // Once you have a DataFrame, you can program against it with two APIs; here is the SQL way
    // Register the DataFrame as a temporary table (registerTempTable is deprecated in newer Spark versions)
    df.registerTempTable("body")

    // Write SQL (the sql method is actually a Transformation)
    // The SQL keywords are upper-cased by convention; Spark SQL keywords are case-insensitive
    val result: DataFrame = sqlContext.sql("SELECT * FROM body ORDER BY yz DESC, age ASC")

    // View the results (triggers an Action)
    result.show()

    // Release resources
    sc.stop()
  }
}
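The comment above notes that a DataFrame can be programmed with two APIs, but the code only demonstrates the SQL one. The snippet below sketches the same query with the DataFrame (DSL) API; it assumes the df value created above, and result2 is an illustrative name.

// The same query expressed with the DataFrame (DSL) API instead of SQL;
// this replaces the registerTempTable/sql steps above
import org.apache.spark.sql.functions.col

val result2: DataFrame = df
  .select(col("id"), col("name"), col("age"), col("yz"))
  .orderBy(col("yz").desc, col("age").asc)

result2.show() // the Action that triggers the computation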
The running results are identical to those in Case 1.