Data service analysis of Spark project
Business logic processing
Peers
To decide whether two objects are peers (traveling together), we can check whether they passed through several of the same places, using longitude and latitude. Alternatively, each monitoring device can be given an identifier, so that an object passing a device is captured and recorded by it.
Tra ...
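The peer check described above can be sketched in plain Python (not Spark). This is only an illustration: the `(object_id, device_id)` record layout, the function name `find_peers`, and the `min_shared` threshold are assumptions, not taken from the original post.

```python
from collections import defaultdict
from itertools import combinations


def find_peers(sightings, min_shared=2):
    """Return pairs of objects captured by at least `min_shared` common devices.

    `sightings` is an iterable of (object_id, device_id) pairs; this layout
    is an assumption made for illustration.
    """
    # Collect the set of devices each object passed through.
    devices_by_object = defaultdict(set)
    for object_id, device_id in sightings:
        devices_by_object[object_id].add(device_id)

    peers = {}
    # Compare every pair of objects and keep those sharing enough devices.
    for a, b in combinations(sorted(devices_by_object), 2):
        shared = devices_by_object[a] & devices_by_object[b]
        if len(shared) >= min_shared:
            peers[(a, b)] = shared
    return peers


# Hypothetical sample data.
sightings = [
    ("car-1", "dev-A"), ("car-1", "dev-B"), ("car-1", "dev-C"),
    ("car-2", "dev-A"), ("car-2", "dev-C"),
    ("car-3", "dev-B"),
]
print(find_peers(sightings))
```

The sketch ignores capture times; a real job would probably also require the two objects to pass each device within a short time window.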
Posted by aztec on Sun, 17 Nov 2019 08:26:39 -0800
Scala 05: defining arrays, traversing arrays with the enhanced for loop, generating subscripts with until, array conversion, and common array algorithms
1. Fixed-length and variable-length arrays
Format of a fixed-length array definition:
val arr = new Array[T](length)
Format of a variable-length array definition:
val arr = ArrayBuffer[T]()
Note that you need to import the package: import scala.collection.mutable.ArrayBuffer
The code is as follows
import scala.colle ...
Posted by sashi34u on Sat, 16 Nov 2019 07:37:04 -0800
HA high availability cluster construction
Ordinary (non-HA) Hadoop cluster
namenode(nn)
secondarynamenode(2nn)
datanode(dn)
Problems with an ordinary Hadoop cluster
Is the DataNode a single point of failure?
No, because there are multiple DataNode machines and the replication mechanism protects the data.
Is the NameNode a single point of failure?
Yes, because 2nn can't replac ...
Posted by zed420 on Wed, 13 Nov 2019 11:31:47 -0800
Troubleshooting Spark error -- Error initializing SparkContext
Spark reports an error when a Spark job is submitted (here, when starting spark-shell):
./spark-shell
19/05/14 05:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel) ...
Posted by motofzr1000 on Sun, 10 Nov 2019 08:02:46 -0800
Using beeline with Spark SQL to access the Hive warehouse
I. Add hive-site.xml
Add the hive-site.xml configuration file under $SPARK_HOME/conf so that Spark can access the Hive metadata.
vim hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.201:3306/hiveDB?createDatabaseIfNotExist=true ...
Posted by mrodrigues on Wed, 06 Nov 2019 14:06:19 -0800
I. HBase: basic principles and use
Hotspot problems with HBase data:
The solution is to preprocess the rowkeys of the hot data by adding a prefix, so that the hot data is distributed across multiple regions.
Pre-splitting or dynamic partitioning? When the data is first loaded, it should already be partitioned, so that it is stored in different regions and the load is balanced.
Example: it is easy to divide ...
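A minimal Python sketch of the prefixing idea, assuming a hash-derived "salt" as the prefix. The bucket count, the `NN_` prefix format, and the function name `salted_rowkey` are illustrative assumptions; the original post may use a different scheme, and this code does not call the HBase API.

```python
import zlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical number of pre-split regions


def salted_rowkey(rowkey, num_buckets=NUM_BUCKETS):
    """Prefix the rowkey with a bucket number derived from the key itself."""
    # crc32 is deterministic across runs, so a reader can recompute the
    # prefix from the original rowkey when looking a row up.
    bucket = zlib.crc32(rowkey.encode("utf-8")) % num_buckets
    return f"{bucket:02d}_{rowkey}"


# Hypothetical sequential keys, which would otherwise land in one region.
keys = [f"20191104{n:04d}" for n in range(1000)]
print(Counter(salted_rowkey(k)[:2] for k in keys))
```

Because the prefix is computed from the key, point lookups stay cheap, but a range scan over the original key order has to query every bucket.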
Posted by daniel_mintz on Mon, 04 Nov 2019 16:20:41 -0800
Installing and configuring HBase
I. Preparation before the experiment
1. Download the hbase-1.4.9-bin.tar.gz installation package
① Official website: https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/stable/
② Baidu Netdisk link: https://pan.baidu.com/s/1x6m30jqcWT_biXV8Z1belQ Extraction code: p95n
2. Connect the virtual machine and start the experim ...
Posted by sharmeen on Mon, 04 Nov 2019 11:32:30 -0800
Finding anagrams (words made of the same letters)
Importing the dataset into HDFS
List the dataset just uploaded to HDFS from the command line:
[hadoop@master hadoop-2.6.0]$ bin/hdfs dfs -ls /anagram/
Compiling and running the MapReduce program:
Step 1: in the Map stage, sort the letters of each word alphabetically to generate sortedWord, and then emit the key/value pair (sortedWord, word).
//Writ ...
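The Map step above can be sketched in plain Python. This is not the post's Hadoop code: `map_word` mirrors the (sortedWord, word) output described in Step 1, while the grouping step `reduce_anagrams` is an assumption about what the truncated later steps do.

```python
from collections import defaultdict


def map_word(word):
    """Map step: emit (sortedWord, word), where sortedWord is the word's letters sorted."""
    sorted_word = "".join(sorted(word))
    return sorted_word, word


def reduce_anagrams(pairs):
    """Assumed reduce step: group words by sortedWord, keep groups with more than one word."""
    groups = defaultdict(list)
    for sorted_word, word in pairs:
        groups[sorted_word].append(word)
    return {key: words for key, words in groups.items() if len(words) > 1}


# Hypothetical input words.
words = ["listen", "silent", "enlist", "google", "cat", "act"]
pairs = [map_word(w) for w in words]
print(reduce_anagrams(pairs))
```

Words that are anagrams of each other share the same sortedWord, so they arrive at the same key and can be collected together.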
Posted by AbydosGater on Mon, 04 Nov 2019 09:53:13 -0800
Big data (Introduction to Hadoop MapReduce code and programming model)
MapReduce programming model
MapReduce divides the whole operation process into two stages: the Map stage and the Reduce stage.
The Map phase consists of a certain number of Map tasks:
Input data format parsing: InputFormat
Input data processing: Mapper
Data grouping: Partitioner
The Reduce phase consists of a certain number of Reduce tasks:
Data remot ...
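To make the roles listed above concrete, here is a small single-process Python simulation of a word count. It is a sketch, not the Hadoop API: a plain list of lines stands in for InputFormat, and the function names and the crc32-based partitioning rule are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict


def mapper(line):
    """Input data processing: turn one input line into (key, value) pairs."""
    return [(word, 1) for word in line.split()]


def partitioner(key, num_reduce_tasks):
    """Data grouping: choose which Reduce task receives this key."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def reducer(key, values):
    """Combine all values that share one key."""
    return key, sum(values)


def run_job(lines, num_reduce_tasks=2):
    # Map phase: route every emitted pair to the partition of one Reduce task.
    partitions = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for line in lines:
        for key, value in mapper(line):
            partitions[partitioner(key, num_reduce_tasks)][key].append(value)
    # Reduce phase: each task processes the keys of its own partition in sorted order.
    result = {}
    for partition in partitions:
        for key in sorted(partition):
            reduced_key, total = reducer(key, partition[key])
            result[reduced_key] = total
    return result


print(run_job(["a b a", "b c"]))
```

Every occurrence of a key goes to the same partition, which is why each Reduce task can compute a complete total for its keys on its own.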
Posted by covert215 on Sun, 03 Nov 2019 15:15:53 -0800
MapReduce [traffic statistics] sum with a user-defined data type
Requirement: from the log file, compute the total upstream traffic, total downstream traffic, and total traffic consumed by each user.
1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200
1363157995052 13826544101 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 4 0 264 0 200
1363157991076 13926435656 20-10-7A-28-CC- ...
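The requirement can be sketched in plain Python using the two complete sample records above. Two assumptions are made here and are not confirmed by the excerpt: the second field is the user's phone number, and the upstream and downstream traffic are the third-from-last and second-from-last fields. The `Flow` class is a hypothetical stand-in for the user-defined data type mentioned in the title.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical stand-in for the user-defined data type."""
    upstream: int = 0
    downstream: int = 0

    @property
    def total(self):
        return self.upstream + self.downstream


def parse_record(line):
    """Assumed layout: phone is field 2; traffic fields are counted from the end."""
    fields = line.split()
    return fields[1], int(fields[-3]), int(fields[-2])


def sum_traffic(lines):
    """Accumulate upstream and downstream traffic per phone number."""
    totals = defaultdict(Flow)
    for line in lines:
        phone, upstream, downstream = parse_record(line)
        totals[phone].upstream += upstream
        totals[phone].downstream += downstream
    return dict(totals)


records = [
    "1363157985066 13726230503 00-FD-07-A4-72-B8:CMCC 120.196.100.82 i02.c.aliimg.com 24 27 2481 24681 200",
    "1363157995052 13826544101 5C-0E-8B-C7-F1-E0:CMCC 120.197.40.4 4 0 264 0 200",
]
for phone, flow in sum_traffic(records).items():
    print(phone, flow.upstream, flow.downstream, flow.total)
```

Indexing from the end of the line matters because the records do not all have the same number of fields: the second record has no host name column.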
Posted by alexguz79 on Sun, 03 Nov 2019 08:47:52 -0800