Big Data [Page 33] - Programmer Group - a programming skills sharing group

Big Data

Call From hadoop01/192.168.80.128 to 0.0.0.0:10020 failed on connection exception when Hive executes MR jobs

Today, 42 MapReduce jobs were started while running a Hive query. Halfway through execution, the following error was suddenly reported. I had never encountered it before, and I didn't know whether it was caused by the large number of jobs. The error message said that port 10020 could not be reached from the host. Check the reason on ...
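The address in the message is a useful clue. 10020 is the default port of the MapReduce JobHistory server (`mapreduce.jobhistory.address`), and a target of 0.0.0.0 suggests the client is falling back to that default instead of a configured hostname. The excerpt is cut off before the author's diagnosis, so the following is only a minimal Java sketch for checking whether a TCP port is reachable from the client machine. `PortCheck` and its default host `hadoop01` (taken from the error message) are illustrative assumptions, not part of the original post.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper: does a TCP connection to host:port succeed within timeoutMs?
class PortCheck {
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // ConnectException ("Connection refused"), SocketTimeoutException and
            // UnknownHostException are all IOExceptions and mean "not reachable" here
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults assume the host from the error message and the JobHistory default port 10020
        String host = args.length > 0 ? args[0] : "hadoop01";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10020;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

If the port is refused, the usual next step is to check whether a JobHistory server process is running on the configured host.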

Posted by pha3dr0n on Thu, 31 Jan 2019 13:30:15 -0800

sklearn Library Learning: Linear Models

Linear models make predictions using a linear function of the input features. The algorithms for linear models differ in: (1) how they measure how well a particular combination of coefficients and intercept fits the training data. Different algorithms use different methods to measure the fit to the training set, whic ...
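As a minimal illustration of point (1), the sketch below fits a one-feature linear model by ordinary least squares, the criterion scikit-learn's `LinearRegression` uses, and reports the mean squared error on the training data. It is plain Java written for illustration, not scikit-learn's API, and `SimpleLinearRegression` is a hypothetical name. Regularized models such as Ridge and Lasso add a penalty on the coefficients to this squared-error term.

```java
// Minimal sketch: ordinary least squares for a single feature, y ≈ w * x + b.
class SimpleLinearRegression {
    final double w; // coefficient (slope)
    final double b; // intercept

    SimpleLinearRegression(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two paired samples");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        // Closed-form least-squares solution: w = cov(x, y) / var(x), b = meanY - w * meanX
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0) {
            throw new IllegalArgumentException("x must not be constant");
        }
        this.w = sxy / sxx;
        this.b = meanY - this.w * meanX;
    }

    double predict(double x) {
        return w * x + b;
    }

    // Mean squared error on (x, y): the quantity least squares minimizes on the training set
    double meanSquaredError(double[] x, double[] y) {
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - predict(x[i]);
            sum += residual * residual;
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        SimpleLinearRegression model = new SimpleLinearRegression(x, y);
        System.out.println("w = " + model.w + ", b = " + model.b + ", MSE = " + model.meanSquaredError(x, y));
    }
}
```

For the data in `main`, the points lie exactly on y = 2x + 1, so the fit gives w = 2, b = 1 and zero training error.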

Posted by joix on Thu, 31 Jan 2019 10:45:15 -0800

Python crawler example: download multi-page topic content from Baidu Post Bar

Last week's web crawler course left an exercise: download multi-page topic content from Baidu Post Bar. What I actually did was crawl the multiple pages of a single post in a bar, which was different from what the teacher asked for. Moreover, after the teacher's review, I realized the gap between myself and the ...

Posted by maciek4 on Thu, 31 Jan 2019 10:03:15 -0800

Lucene Notes 05: Index Weighting and a Simple Demonstration of Luke

I. Weighting the Index

package com.wsy;
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader; ...

Posted by steeveherris on Thu, 31 Jan 2019 03:45:14 -0800

Learning matplotlib plotting from scratch (4): Parallel (side-by-side) bar charts

Stacked bar charts have one advantage: because the bars are stacked, we can easily see the trend of the total across several categories. However, in a stacked chart it is hard to read the trend of the categories stacked higher up, because their bars start from different baselines. Therefore, when classification is n ...
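The difference between the two layouts comes down to where each bar starts. In matplotlib this typically maps onto the `bottom` argument of `bar` for stacking versus shifting the x positions by a fraction of `width` for side-by-side bars. The Java sketch below, with hypothetical names, only computes those positions under that assumption; it does not draw anything.

```java
import java.util.Arrays;

// Sketch: bar positions for a stacked layout versus a parallel (side-by-side) layout.
class BarLayout {
    // Stacked: values[k][i] is series k at category i; each bar starts on top of series 0..k-1
    static double[][] stackedBottoms(double[][] values) {
        int series = values.length;
        int categories = values[0].length;
        double[][] bottoms = new double[series][categories];
        for (int i = 0; i < categories; i++) {
            double running = 0;
            for (int k = 0; k < series; k++) {
                bottoms[k][i] = running;
                running += values[k][i];
            }
        }
        return bottoms;
    }

    // Parallel: every bar starts at 0; series k is shifted sideways so that
    // all series share one slot centred on category position i
    static double[] parallelCenters(int categories, int series, int k, double width) {
        double offset = (k - (series - 1) / 2.0) * width;
        double[] centers = new double[categories];
        for (int i = 0; i < categories; i++) {
            centers[i] = i + offset;
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] values = {{3, 5, 2}, {4, 1, 6}};
        System.out.println("stacked bottoms: " + Arrays.deepToString(stackedBottoms(values)));
        System.out.println("series 0 centres: " + Arrays.toString(parallelCenters(3, 2, 0, 0.4)));
        System.out.println("series 1 centres: " + Arrays.toString(parallelCenters(3, 2, 1, 0.4)));
    }
}
```

In the parallel layout every series has a common baseline of 0, which is what makes the upper categories easy to compare.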

Posted by nareshrevoori on Thu, 31 Jan 2019 02:48:16 -0800

12c Grid Infrastructure Management Repository

After installing Grid Infrastructure 12.1.0.2, you can see two additional resources, ora.MGMTLSNR and ora.mgmtdb. An additional instance with SID -MGMTDB is also started.

[grid@prodb1 ~]$ crsctl status res -t
--------------------------------------------------------------------------------
Name Target State Serve ...

Posted by fresch on Thu, 31 Jan 2019 02:30:15 -0800

Integrating Spark Streaming with Flume (Poll and Push)

As a framework for real-time log collection, Flume can be connected to the Spark Streaming real-time processing framework: Flume produces data in real time and Spark Streaming processes it in real time. Spark Streaming integrates with Flume NG in two ways: one is that Flume NG pushes messages to Spark Streaming (Push), the other is that S ...

Posted by kobayashi_one on Wed, 30 Jan 2019 17:18:15 -0800

MapReduce Program Examples: Details Determine Success or Failure (IV): In-Map Aggregation

Why use in-map aggregation? What is the difference between in-map aggregation and a combiner? When should you use a combiner, and when in-map aggregation? Let's start with a diagram showing where the combiner sits in an MR job. Now the details: data files are read by the InputFormat and processed in the Map phase. After the Map is processed, the ...
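A minimal sketch of in-map (in-mapper) aggregation for word count, written in plain Java so it runs without Hadoop. The hypothetical `Emitter` interface stands in for Hadoop's `Context.write`, and `map`/`cleanup` mirror the Mapper lifecycle methods of the same names. Counts are buffered in a `HashMap` and emitted once per distinct word, instead of emitting `(word, 1)` for every token and relying on a combiner, which Hadoop may run zero or more times. The `maxDistinctKeys` flush is an assumed memory guard, not something taken from the post.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of in-map aggregation for word count, runnable without Hadoop.
class InMapperWordCount {
    // Hypothetical stand-in for Hadoop's Context.write(key, value)
    interface Emitter {
        void emit(String word, int count);
    }

    private final Map<String, Integer> partialCounts = new HashMap<>();
    private final Emitter out;
    private final int maxDistinctKeys; // assumed memory guard: flush early when the buffer grows this large

    InMapperWordCount(Emitter out, int maxDistinctKeys) {
        this.out = out;
        this.maxDistinctKeys = maxDistinctKeys;
    }

    // Called once per input record (one line of text)
    void map(String line) {
        for (String word : line.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields an empty first token
            }
            partialCounts.merge(word, 1, Integer::sum); // aggregate in memory instead of emitting (word, 1)
            if (partialCounts.size() >= maxDistinctKeys) {
                flush();
            }
        }
    }

    // Called once after the last record, like Mapper.cleanup()
    void cleanup() {
        flush();
    }

    private void flush() {
        partialCounts.forEach((word, count) -> out.emit(word, count));
        partialCounts.clear();
    }

    public static void main(String[] args) {
        InMapperWordCount mapper = new InMapperWordCount(
                (word, count) -> System.out.println(word + "\t" + count), 10_000);
        mapper.map("to be or not to be");
        mapper.cleanup(); // emits 4 pairs (one per distinct word) instead of 6
    }
}
```

The trade-off is memory in the mapper versus fewer records to serialize, spill and shuffle, which is why the buffer is flushed when it grows too large.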

Posted by hedgehog90 on Wed, 30 Jan 2019 15:21:15 -0800

spark-streaming sample program

Develop a Spark Streaming program that receives data from a server port in real time and performs a word count. Environment: IDEA + Maven; the pom file is as follows: <?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc ...
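The core of such a job is a per-batch word count: the lines received during one batch interval (presumably via `socketTextStream`, which the excerpt does not show) are split into words, mapped to `(word, 1)` pairs and reduced by key. The sketch below reproduces only that per-batch logic with `java.util.stream`. It is not the Spark API, and `BatchWordCount` is a hypothetical name. In a real job the lines would come from the socket, for example one fed with `nc -lk 9999`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java sketch of one batch of a streaming word count (not the Spark API).
class BatchWordCount {
    // Lines received during one batch interval -> word counts for that batch
    static Map<String, Long> countBatch(List<String> linesInBatch) {
        return linesInBatch.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))   // like flatMap: line -> words
                .filter(word -> !word.isEmpty())                   // drop empty tokens from repeated spaces
                .collect(Collectors.groupingBy(                    // like mapToPair((word, 1)) + reduceByKey(+)
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> batch = List.of("hello spark", "hello flume");
        System.out.println(countBatch(batch));
    }
}
```

Using a `TreeMap` only makes the printed result sorted by word; Spark itself gives no ordering guarantee for `reduceByKey` output.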

Posted by phpnew on Wed, 30 Jan 2019 12:21:15 -0800

Big Data Notes 06: Building YARN and a Test Case

YARN
Building YARN: cluster planning, configuration
Test case: wordcount (using the wordcount example provided with MapReduce)

Building YARN
Cluster planning
Configuration
Modify the configuration file mapred-site.xml: <property> <name>mapreduce.framework.name</name> <value ...

Posted by marli on Wed, 30 Jan 2019 11:00:15 -0800
