Non-aggregation problem after reduceByKey in a Spark program written with the Java API (custom type as Key)

When writing Spark with the Java API, if a PairRDD's key is a custom type, you need to override the hashCode and equals methods; otherwise you will find that identical key values are not aggregated. For example, using a User type as the Key: public class User { private String name; private String age; public String getName() { return name; } pu ...
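As a minimal illustration of why the overrides matter (plain Java, outside Spark — Spark's shuffle groups keys via the same hashCode/equals contract that a HashMap uses), here is a hypothetical User key whose field names follow the excerpt above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type; field names follow the excerpt above.
class User {
    private final String name;
    private final String age;

    User(String name, String age) { this.name = name; this.age = age; }

    // Without these two overrides, two Users with identical fields hash to
    // different buckets and never compare equal, so they are never aggregated.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User)) return false;
        User u = (User) o;
        return Objects.equals(name, u.name) && Objects.equals(age, u.age);
    }

    @Override
    public int hashCode() { return Objects.hash(name, age); }
}

public class KeyDemo {
    public static void main(String[] args) {
        Map<User, Integer> counts = new HashMap<>();
        counts.merge(new User("tom", "20"), 1, Integer::sum);
        counts.merge(new User("tom", "20"), 1, Integer::sum);
        System.out.println(counts.size()); // 1 with the overrides, 2 without
    }
}
```

The same pair of overrides is what makes reduceByKey merge two equal User keys into one.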

Posted by harrisonad on Thu, 24 Jan 2019 20:18:13 -0800

Writing a WordCount program in IDEA under Windows and uploading it to a Hadoop cluster as a jar package (foolproof version)

Typically, programs are written in an IDE, packaged as jar files, and then submitted to the cluster. The most common approach is to create a Maven project and let Maven manage the jar dependencies. 1. Generating the jar package for WordCount 1. Open IDEA, File → New → Project → Maven → Next, fill in GroupId and Artif ...

Posted by kcgame on Thu, 24 Jan 2019 19:45:14 -0800

Elasticsearch Learning Notes 29: Get API of the Java Client

Elasticsearch Learning Notes 29: Get API of the Java Client. Contents: Get API, Get Request, Optional arguments, Synchronous Execution, Asynchronous Execution, Get Response. A GetRequest looks like: GetRequest getRequest = new GetRequest( "posts", //Index "doc", //Type "1"); //Document id Optio ...

Posted by cbn_noodles on Thu, 24 Jan 2019 16:30:14 -0800

Spark-core Comprehensive Exercise: IP Matching

ip.txt data:
220.177.248.0|220.177.255.255|3702650880|3702652927|Asia|China|Jiangxi|Nanchang|Telecom|360100|China|CN|115.892151|28.676493
220.178.0.0|220.178.56.113|3702652928|3702667377|Asia|China|Anhui|Hefei|Telecom|340100|China|CN|117.283042|31.86119
220.178.56.114|220.178.57.33|37026 ...
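A common approach for this kind of exercise (a sketch, not necessarily the post's full solution): convert the dotted IP to its unsigned long value and binary-search the sorted [startLong, endLong] ranges from ip.txt. The two ranges below are taken from the sample data above; the test IP is a made-up example.

```java
public class IpMatch {
    // Convert a dotted IPv4 string to its unsigned long value,
    // e.g. "220.177.248.0" -> 3702650880L (matches the ip.txt sample).
    static long ipToLong(String ip) {
        long result = 0;
        for (String part : ip.split("\\.")) {
            result = result * 256 + Long.parseLong(part);
        }
        return result;
    }

    // Binary-search sorted, non-overlapping [start, end] ranges; returns the
    // matching range's index, or -1 if the ip falls in no range.
    static int findRange(long[][] ranges, long ip) {
        int lo = 0, hi = ranges.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (ip < ranges[mid][0]) hi = mid - 1;
            else if (ip > ranges[mid][1]) lo = mid + 1;
            else return mid;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two ranges from the ip.txt sample above: Nanchang and Hefei.
        long[][] ranges = {
            {3702650880L, 3702652927L},
            {3702652928L, 3702667377L},
        };
        long ip = ipToLong("220.178.0.5");
        System.out.println(findRange(ranges, ip)); // prints 1 (the Hefei range)
    }
}
```

In the Spark version, the ranges are typically broadcast to the executors and each log record's IP is looked up this way.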

Posted by penguin_powered on Thu, 24 Jan 2019 16:18:14 -0800

Scheduling Spark2 Workflows with Hue-Integrated Oozie

I. Environment preparation: CDH 5.15.0, Spark 2.3.0, Hue 3.9.0. Note: because a CDH cluster is used, the default Spark version is 1.6.0, and Spark 2.3.0 is installed through the parcel package, so there are two Spark versions in the cluster. Hue integrates with Spark 1.6 by default. It is necessary to upload the jar package and o ...

Posted by Xorandnotor on Thu, 24 Jan 2019 10:45:13 -0800

Web Crawler Practice

Today we practice crawling a website and derive a crawling template for similar sites. Let's take http://www.simm.cas.cn/xwzx/kydt/ as an example. The goal is to crawl the title, release time, article link, picture links, and source of each news item. We mainly use the requests, re, BeautifulSoup, and json modules. Go ...

Posted by kir10s on Thu, 24 Jan 2019 10:15:13 -0800

Deploying Frp, another powerful intranet penetration tool, on CentOS 7

Earlier I introduced how to compile the ngrok intranet penetration server and client under CentOS; today I introduce Frp, an intranet penetration tool even better than ngrok. frp (fast reverse proxy) is a high-performance reverse proxy application that can be used for intranet penetration. It supports tcp, udp, ...
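As a sketch of the typical frp setup (the server address, ports, and section name below are placeholder examples, not values from the post): frps runs on the public server, frpc runs on the intranet machine, and a tcp proxy exposes a local service such as SSH.

```ini
; frps.ini — on the public-facing server (port is an example)
[common]
bind_port = 7000

; frpc.ini — on the intranet machine; server_addr is a placeholder
[common]
server_addr = x.x.x.x
server_port = 7000

[ssh]
type = tcp
local_ip = 127.0.0.1
local_port = 22
remote_port = 6000
```

With this in place, `ssh -p 6000 user@x.x.x.x` on the public side reaches the intranet machine's port 22.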

Posted by Zyx on Thu, 24 Jan 2019 09:42:13 -0800

Using Elasticsearch Curator

Introduction: Elasticsearch Curator helps you plan and manage your Elasticsearch indices and snapshots in the following ways: obtain a complete list of indices (or snapshots) from the cluster as an actionable list; iterate through user-defined filters to progressively remove indices (or snapshots) from the list as needed; perform va ...
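To make the filter-then-act workflow concrete, here is a minimal sketch of a Curator action file; the `logstash-` index prefix and 30-day retention are assumptions for illustration, not values from the post.

```yaml
# Hypothetical action file: the pattern and age filters prune the index
# list, then delete_indices acts on whatever remains.
actions:
  1:
    action: delete_indices
    description: Delete logstash- indices older than 30 days (example values).
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'
        unit: days
        unit_count: 30
```

It would be run as `curator --config config.yml action.yml`, where config.yml holds the cluster connection settings.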

Posted by g_p_java on Thu, 24 Jan 2019 07:30:13 -0800

Lucene Notes 17: Introduction to Chinese Word Segmentation in Lucene

I. The function of the analyzer: the analyzer produces a TokenStream, which stores tokenization-related information; detailed token information can be obtained through its attributes. II. A custom stop-word analyzer: package com.wsy; import org.apache.lucene.analysis.*; import org.a ...

Posted by programmingjeff on Thu, 24 Jan 2019 06:24:14 -0800

Storm Learning: Submitting a Topology to a Cluster

1. Write a word-count example. Some explanation is given in the code comments, as there is no extra space here to cover Storm usage in general. The code is as follows: 1. Write a spout that generates sentences: /** * @Author: 18030501 * @Date: 2018/10/24 14:25 * @Description: Data stream generator * * sp ...

Posted by Popgun on Thu, 24 Jan 2019 02:12:15 -0800