Spark2 Workflow Scheduling for Hue Integrated Oozie Workflow

I. Environment preparation: CDH 5.15.0, Spark 2.3.0, Hue 3.9.0. Note: Because a CDH cluster is used, the default Spark version is 1.6.0, and Spark 2.3.0 is installed through a parcel package, so the cluster now has two Spark versions. Hue integrates Spark 1.6. It is necessary to upload the jar package and o ...

Posted by Xorandnotor on Thu, 24 Jan 2019 10:45:13 -0800

Read and write operations on MongoDB on SparkSql (Python version)

1.1 Read MongoDB data. The Python approach requires submitting through pyspark or spark-submit. Here's how to start pyspark: 1.1.1 Start the command line with pyspark # The locally installed version of Spark is 2.3.1; if other versions need to be modified ve ...

Posted by k3Bobos on Wed, 23 Jan 2019 17:57:13 -0800

Spark in Action on Kubernetes: Playground Construction and Architecture Analysis

Preface: Spark is a very popular big data processing engine. Data scientists use Spark and the related big data ecosystem to carry out a wide range of data analysis and mining tasks. Spark has gradually become the industry standard in the field of data processing. However, Spark itself is designed to use static resource management. ...

Posted by stallingjohn on Tue, 22 Jan 2019 08:18:13 -0800

Packaging a Maven Project and Running wordcount on a Single Node (I)

The Spark shell is only used to test and validate our programs. In a production environment, programs are usually written in an IDE, then packaged into a jar and submitted to the cluster. The most common approach is to create a Maven project and let Maven manage the jar dependencies. First, edit Maven project on ...

Posted by volka on Sat, 19 Jan 2019 13:45:12 -0800

Examples of Basic Operating Functions in Spark Streaming

Guide: The operations in the Spark Streaming documentation can be roughly divided into Transformations, Window Operations, Join Operations, and Output Operations. The code for this article is available through the link to my Code Cloud repository. Please get some basic info ...

Posted by DasHaas on Sat, 19 Jan 2019 04:24:13 -0800

Installing a Spark Cluster on Linux (CentOS 7 + Spark 2.1.1 + Hadoop 2.8.0)

1 Install Scala, which Spark depends on: 1.1 Download and decompress Scala; 1.2 Configure environment variables; 1.3 Verify Scala. 2 Download and decompress Spark: 2.1 Download the Spark archive; 2.2 Decompress Spark. 3 Spark-related configuration: 3.1 Configuring environment variable ...

Posted by micklerlop on Sat, 22 Dec 2018 02:21:06 -0800