Keys fail to aggregate after reduceByKey in a Spark program written with the Java API (custom type as Key)
When writing Spark programs with the Java API, if a PairRDD's key is a custom type, you need to override the hashCode() and equals() methods; otherwise you will find that identical keys are not aggregated.
For example: Use User type as Key
public class User {
    private String name;
    private String age;

    public String getName() {
        return name;
    }
    pu ...
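A complete version of the class, with equals() and hashCode() overridden so that reduceByKey can group identical keys, might look like the sketch below (field names follow the snippet above; the constructor is added for convenience). Note that a Spark key should also be Serializable so it can be shipped across the cluster.

```java
import java.util.Objects;

public class User implements java.io.Serializable {
    private String name;
    private String age;

    public User(String name, String age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public String getAge() { return age; }

    // Two Users with the same field values must compare equal...
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User)) return false;
        User other = (User) o;
        return Objects.equals(name, other.name) && Objects.equals(age, other.age);
    }

    // ...and must produce the same hash code, or reduceByKey will put
    // them in different buckets and never aggregate them.
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }
}
```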
Posted by harrisonad on Thu, 24 Jan 2019 20:18:13 -0800
Writing a WordCount program in IDEA under Windows and submitting it to a Hadoop cluster as a jar package (foolproof version)
Typically, programs are developed in an IDE, packaged as a jar, and then submitted to the cluster. The most common approach is to create a Maven project and let Maven manage the jar dependencies.
1. Generating jar packages for WordCount
1. Open IDEA: File → New → Project → Maven → Next, then fill in the GroupId and Artif ...
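The WordCount logic itself, independent of the Hadoop packaging steps above, can be sketched in plain Java; the cluster version distributes exactly this split-and-count step across mappers and reducers:

```java
import java.util.HashMap;
import java.util.Map;

public class WordCount {
    // Count the occurrences of each whitespace-separated word.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```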
Posted by kcgame on Thu, 24 Jan 2019 19:45:14 -0800
Elastic Search Learning Notes 29 GET APIs of JAVA Client
Get API
Get Request
Optional arguments
Synchronous Execution
Asynchronous Execution
Get Response
Get API
Get Request
A GetRequest looks like this:
GetRequest getRequest = new GetRequest(
"posts", //Index
"doc", //Type
"1"); //Document id
Optio ...
Posted by cbn_noodles on Thu, 24 Jan 2019 16:30:14 -0800
Spark Core Comprehensive Exercise: IP Matching
ip.txt data:
220.177.248.0|220.177.255.255|3702650880|3702652927|Asia|China|Jiangxi|Nanchang|Telecom|360100|China|CN|115.892151|28.676493
220.178.0.0|220.178.56.113|3702652928|3702667377|Asia|China|Anhui|Hefei|Telecom|340100|China|CN|117.283042|31.86119
220.178.56.114 | 220.178.57.33 | 37026 ...
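The usual approach to matching an IP against these ranges is to convert the dotted address into the same numeric form as the third and fourth columns, then binary-search the sorted ranges. A minimal sketch of the conversion (a hypothetical helper, not code from the original article):

```java
public class IpUtil {
    // Convert a dotted IPv4 address into an unsigned 32-bit value,
    // the same representation as the numeric range columns in ip.txt.
    public static long ipToLong(String ip) {
        String[] parts = ip.split("\\.");
        long result = 0;
        for (String part : parts) {
            result = (result << 8) | Long.parseLong(part);
        }
        return result;
    }
}
```

For example, 220.177.248.0 converts to 3702650880, which matches the start of the first range above.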
Posted by penguin_powered on Thu, 24 Jan 2019 16:18:14 -0800
Scheduling Spark2 Workflows with Hue-Integrated Oozie
I. Environment preparation
CDH 5.15.0, Spark 2.3.0, Hue 3.9.0
Note: Because a CDH cluster is used, the default Spark version is 1.6.0, and Spark 2.3.0 was installed via a parcel package. As a result there are two Spark versions in the cluster, and Hue integrates with Spark 1.6. It is necessary to upload the jar package and o ...
Posted by Xorandnotor on Thu, 24 Jan 2019 10:45:13 -0800
Web Crawler Practice
Today we practice crawling a website and distill a crawling template for similar sites.
Let's take http://www.simm.cas.cn/xwzx/kydt/ as an example. The goal is to crawl the title, release time, article link, image links, and source of each news item.
We mainly use the requests, re, BeautifulSoup, and json modules.
Go ...
Posted by kir10s on Thu, 24 Jan 2019 10:15:13 -0800
Deploying Frp, another powerful intranet-penetration tool, on CentOS 7
Previously I introduced how to compile the ngrok intranet-penetration server and client under CentOS; today I introduce Frp, an intranet-penetration tool that works better than ngrok.
frp (fast reverse proxy) is a high-performance reverse proxy application that can be used for intranet penetration. It supports tcp, udp, ...
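A typical minimal setup uses an ini file on each side. The sketch below exposes the client's SSH port through the server; the server address and port numbers are placeholders, and the exact option names may vary by frp version:

```ini
# frps.ini (on the public server)
[common]
bind_port = 7000

# frpc.ini (on the intranet client)
[common]
server_addr = x.x.x.x
server_port = 7000

[ssh]
type = tcp
local_ip = 127.0.0.1
local_port = 22
remote_port = 6000
```

After starting frps on the server and frpc on the client, connecting to port 6000 on the server reaches the client's port 22.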
Posted by Zyx on Thu, 24 Jan 2019 09:42:13 -0800
Using Elasticsearch Curator
Introduction
Elasticsearch Curator helps you plan and manage your Elasticsearch indices and snapshots in the following ways:
Get a complete list of indices (or snapshots) from the cluster as the working list
Iterate over a list of user-defined filters, removing indices (or snapshots) from the list as needed
Perform va ...
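The workflow above maps directly onto a Curator "action file": the filters whittle down the full index list, and the action runs on whatever remains. A sketch of such a file (the index prefix and retention period are placeholders):

```yaml
actions:
  1:
    action: delete_indices
    description: Delete indices older than 30 days
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30
```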
Posted by g_p_java on Thu, 24 Jan 2019 07:30:13 -0800
Lucene Notes 17: Introduction to Chinese Word Segmentation in Lucene
I. The Function of the Word Segmenter
The word segmenter produces a TokenStream, which stores information related to tokenization; detailed information about each token can be obtained through its attributes.
2. A Custom Stop-Word Segmenter
package com.wsy;
import org.apache.lucene.analysis.*;
import org.a ...
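To illustrate what a stop-word segmenter does, independent of Lucene's API: tokenize the text, then drop any token found in a stop set. A plain-Java sketch of that logic (this is not Lucene code; it only mimics what a Tokenizer plus StopFilter produce):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StopWordDemo {
    // Split on whitespace and drop stop words, mimicking the effect
    // of a tokenizer followed by a stop-word filter.
    public static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```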
Posted by programmingjeff on Thu, 24 Jan 2019 06:24:14 -0800
Storm Learning: Submitting a Topology to the Cluster
1. Write a word-count example. Some explanation is provided in the code comments; there is no extra space here to cover the use of Storm itself.
The code is as follows. 1. Write a spout that generates sentences:
/**
* @Author: 18030501
* @Date: 2018/10/24 14:25
* @Description: Data stream generator
*
* sp ...
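The data-generation logic such a spout typically wraps (what its nextTuple() would emit) can be sketched without any Storm dependency; the class name and the sample sentences here are illustrative, not from the original article:

```java
import java.util.Random;

public class SentenceGenerator {
    private static final String[] SENTENCES = {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "four score and seven years ago"
    };
    private final Random random = new Random();

    // Return one random sentence; a Storm spout would emit this
    // as a tuple each time nextTuple() is called.
    public String next() {
        return SENTENCES[random.nextInt(SENTENCES.length)];
    }
}
```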
Posted by Popgun on Thu, 24 Jan 2019 02:12:15 -0800