Data service analysis of Spark project

Business logic processing: peers. To judge whether two objects are peers, we can use longitude and latitude to check whether both have passed through several of the same places. Alternatively, each monitoring device can be labeled, so that an object passing a device is captured by that device. Tra ...

Posted by aztec on Sun, 17 Nov 2019 08:26:39 -0800
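
The peer check described in the entry above can be sketched as follows. This is a minimal illustration, not the post's implementation: the function name `find_peers`, the `(object_id, device_id)` record shape, and the `min_shared` threshold are all assumptions, and the sketch ignores the time at which each sighting happened.

```python
from itertools import combinations


def find_peers(sightings, min_shared=2):
    """Return pairs of objects seen at `min_shared` or more of the same devices.

    `sightings` is an iterable of (object_id, device_id) tuples; both the
    record shape and the threshold are hypothetical choices for this sketch.
    """
    places = {}
    for obj, device in sightings:
        places.setdefault(obj, set()).add(device)

    peers = set()
    # Compare every pair of objects once, in a stable order.
    for a, b in combinations(sorted(places), 2):
        if len(places[a] & places[b]) >= min_shared:
            peers.add((a, b))
    return peers


sightings = [("A", "d1"), ("A", "d2"), ("B", "d1"), ("B", "d2"), ("C", "d1")]
print(find_peers(sightings))
```

Comparing all pairs is quadratic in the number of objects, so on a real dataset the same idea would more likely be expressed as a join on the device ID.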

Scala 05: defining arrays, traversing arrays with the enhanced for loop, traversing arrays by index generated with until, array conversion, and common array algorithms

1. Fixed-length and variable-length arrays. A fixed-length array is defined as: val arr = new Array[T](array length). A variable-length array is defined as: val arr = ArrayBuffer[T](). Note that you need to import the package: import scala.collection.mutable.ArrayBuffer. The code is as follows: import scala.colle ...

Posted by sashi34u on Sat, 16 Nov 2019 07:37:04 -0800

Building an HA (high availability) cluster

An ordinary Hadoop cluster: namenode (nn), secondarynamenode (2nn), datanode (dn). Problems with an ordinary Hadoop cluster: Is the datanode a single point of failure? No, because datanodes run on multiple machines and are protected by the replication mechanism. Is the namenode a single point of failure? Yes, because 2nn can't replac ...

Posted by zed420 on Wed, 13 Nov 2019 11:31:47 -0800

Troubleshooting Spark error -- Error initializing SparkContext

Spark reported an error when submitting the Spark job: ./spark-shell 19/05/14 05:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel) ...

Posted by motofzr1000 on Sun, 10 Nov 2019 08:02:46 -0800

Using beeline with Spark SQL to access the Hive warehouse

I. Add hive-site.xml. Add the hive-site.xml configuration file under $SPARK_HOME/conf so that Hive metadata can be accessed normally: vim hive-site.xml <configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://192.168.1.201:3306/hiveDB?createDatabaseIfNotExist=true ...

Posted by mrodrigues on Wed, 06 Nov 2019 14:06:19 -0800

I. HBase -- basic principles and usage

HBase data hotspots: the solution is to preprocess the rowkeys of the hot data by adding prefixes, so that the hot data is distributed across multiple regions. Pre-merging? Dynamic partitioning? When the data is first loaded, it should be partitioned and stored in different regions so that the load is balanced. Example: it is easy to divide ...

Posted by daniel_mintz on Mon, 04 Nov 2019 16:20:41 -0800
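
The rowkey-prefix idea in the entry above can be sketched as follows. This is an illustration under assumptions, not the post's code: the helper name `salted_rowkey`, the bucket count, the two-digit prefix format, and the choice of CRC32 as the hash are all hypothetical.

```python
import zlib


def salted_rowkey(rowkey, buckets=4):
    """Prefix a rowkey with a bucket number derived from a hash of the key.

    CRC32 is deterministic, so the same rowkey always maps to the same
    prefix. The bucket count and prefix format are assumptions of this sketch.
    """
    salt = zlib.crc32(rowkey.encode("utf-8")) % buckets
    return f"{salt:02d}_{rowkey}"


for key in ["user0001", "user0002", "user0003"]:
    print(salted_rowkey(key))
```

Because the prefix is computed from the key itself, a point lookup can recompute it, while a range scan over the original key order has to query every bucket.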

Installing and configuring HBase

I. Preparation before the experiment. 1. Download the hbase-1.4.9-bin.tar.gz installation package. ① Official website: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/stable/ ② Baidu Netdisk link: https://pan.baidu.com/s/1x6m30jqcWT_biXV8Z1belQ Extraction code: p95n. 2. Connect to the virtual machine and start the experim ...

Posted by sharmeen on Mon, 04 Nov 2019 11:32:30 -0800

Finding words made of the same letters (anagrams)

Importing the dataset into HDFS. View the dataset just uploaded to HDFS from the command line: [hadoop@master hadoop-2.6.0]$ bin/hdfs dfs -ls /anagram/ Compiling and running the MapReduce program. Step 1: in the Map stage, sort the letters of each word alphabetically to generate sortedWord, then output the key/value pair (sortedWord, word). //Writ ...

Posted by AbydosGater on Mon, 04 Nov 2019 09:53:13 -0800
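
The Map step described in the entry above can be sketched in plain Python, without Hadoop. The function names `map_word` and `group_anagrams` are hypothetical, and the grouping function only stands in for the shuffle that MapReduce performs between the Map and Reduce stages; the post's own code is truncated and is not reproduced here.

```python
from collections import defaultdict


def map_word(word):
    """Map step: emit (sortedWord, word), where sortedWord is the word's
    letters in alphabetical order."""
    return ("".join(sorted(word)), word)


def group_anagrams(words):
    """Stand-in for the shuffle: collect every word under its sortedWord key."""
    groups = defaultdict(list)
    for sorted_word, word in map(map_word, words):
        groups[sorted_word].append(word)
    return dict(groups)


print(map_word("listen"))
print(group_anagrams(["listen", "silent", "cat"]))
```

Words that are anagrams of each other share the same sortedWord, so they end up under the same key.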

Big data (Introduction to Hadoop MapReduce code and programming model)

MapReduce programming model. MapReduce divides the whole job into two stages: the Map stage and the Reduce stage. The Map stage consists of a certain number of Map tasks. Input data format parsing: InputFormat. Input data processing: Mapper. Data grouping: Partitioner. The Reduce stage consists of a certain number of Reduce tasks. Data remot ...

Posted by covert215 on Sun, 03 Nov 2019 15:15:53 -0800

MapReduce [traffic statistics] sum with a user-defined data type

Requirement: from the file, compute the total upstream traffic, total downstream traffic, and total traffic consumed by each user. 1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200 1363157995052 13826544101 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 4 0 264 0 200 1363157991076 13926435656 20-10-7A-28-CC- ...

Posted by alexguz79 on Sun, 03 Nov 2019 08:47:52 -0800
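
The per-user sum in the entry above can be sketched in plain Python. The column layout is an assumption, since the excerpt does not label the fields: the sketch takes the second field as the user's phone number, and the third-from-last and second-from-last fields as upstream and downstream traffic. The function name `sum_traffic` is hypothetical, and only the two complete records from the excerpt are used.

```python
from collections import defaultdict


def sum_traffic(lines):
    """Return {phone: (upstream, downstream, total)} summed over all records.

    Assumed layout: fields[1] is the phone number, fields[-3] the upstream
    traffic and fields[-2] the downstream traffic.
    """
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        phone = fields[1]
        totals[phone][0] += int(fields[-3])
        totals[phone][1] += int(fields[-2])
    return {phone: (up, down, up + down) for phone, (up, down) in totals.items()}


records = [
    "1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200",
    "1363157995052 13826544101 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 4 0 264 0 200",
]
print(sum_traffic(records))
```

Indexing from the end of the line keeps the sketch working for records that omit the host field, as the second record does.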