Python Basic Web Crawler, Day 04

Contents: 1. XPath tool (parsing) 2. lxml library and XPath usage. Day 04: 1. requests module, get() method parameters. Query parameters: params (dictionary). Proxy: proxies (dictionary). Ordinary proxy: {'protocol': "protocol://IP address:port"}. Private proxy: {'protocol': "protocol://username:passw ...

Posted by Bijan on Mon, 21 Jan 2019 08:33:13 -0800
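
The excerpt above names two dictionary parameters of the requests get() method, params and proxies. The following is a minimal sketch of how such dictionaries might be assembled, not the original post's code. The helper name build_proxies, the host 127.0.0.1, the port 8888, the credentials, and the example.com URL are all hypothetical. The user:password@host:port form for an authenticated proxy is the URL shape that requests accepts.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None, scheme="http"):
    """Return a dictionary in the shape accepted by requests.get(proxies=...).

    build_proxies is a hypothetical helper written for this sketch.
    """
    if user is not None and password is not None:
        # Percent-encode credentials so special characters survive in the URL.
        auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    else:
        auth = ""
    proxy_url = f"{scheme}://{auth}{host}:{port}"
    # Route both plain and TLS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


# Query parameters are passed as a plain dictionary (keys are made up here).
params = {"q": "python", "page": "2"}

# Ordinary proxy: protocol://ip:port (address and port are placeholders).
proxies = build_proxies("127.0.0.1", 8888)

# Private proxy: protocol://user:password@ip:port (credentials are placeholders).
private_proxies = build_proxies("127.0.0.1", 8888, user="user", password="secret")

# With the third-party requests package installed, the call would look like:
#   import requests
#   response = requests.get("https://example.com/search",
#                           params=params, proxies=proxies, timeout=5)
```

Keeping the proxy URL construction in one helper means switching between an ordinary and a private proxy is a matter of passing or omitting the credentials.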

callLog: A Big Data Case Study of Telephone Call Logs (4)

1. Change how long Kafka retains data in a topic (the default is 7 days). [kafka/conf/server.properties] log.retention.hours=1 2. Aggregate queries using Hive. 1. Hive command-line query // Query all calle ...

Posted by shortcircuit on Sun, 20 Jan 2019 20:24:12 -0800

Azkaban 3.x Multiple Executors Installation Guide

1. Compiling the source code 1.1 Clone the source code: git clone https://github.com/azkaban/azkaban.git 1.2 Compile: # Enter the azkaban directory cd azkaban; # Build the project with Gradle ./gradlew build installDist After compilation, build directories will be generat ...

Posted by longtone on Sun, 20 Jan 2019 10:18:12 -0800

Developing and Using Hive UDFs

A recent data mining task required counting the number of items within n kilometers of a given longitude and latitude. That in turn requires the distance between two points on the Earth's surface, so a UDF is written to calculate it. I. Writing the UDF A ...

Posted by HairyScotsman on Sun, 20 Jan 2019 08:12:12 -0800
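
The distance calculation mentioned in the excerpt above can be sketched with the haversine formula. This is an illustration in Python, not the Hive UDF from the original post, which is not shown in the excerpt. The function names haversine_km and count_within_km, the mean Earth radius of 6371 km, and the sample coordinates are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def count_within_km(center, points, n_km):
    """Count the points lying within n_km kilometers of center.

    center and every element of points are (latitude, longitude) tuples.
    """
    lat0, lon0 = center
    return sum(1 for lat, lon in points
               if haversine_km(lat0, lon0, lat, lon) <= n_km)


# Hypothetical sample data: one point at the center, one nearby, one far away.
sample_points = [(0.0, 0.0), (0.0, 0.01), (10.0, 10.0)]
nearby_count = count_within_km((0.0, 0.0), sample_points, 5.0)
```

The haversine formula treats the Earth as a sphere, which is usually adequate for "within n kilometers" counts; an ellipsoidal model would be needed only where sub-percent accuracy matters.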

Data Modeling: Factor Analysis

Principal Component Analysis and Factor Analysis # Load packages library(corrplot) library(psych) library(GPArotation) library(nFactors) library(gplots) library(RColorBrewer) Principal component analysis: Principal Component Analysis (PCA) extracts a small number of uncorrelated variables from a large number ...

Posted by Coldman on Sat, 19 Jan 2019 19:06:13 -0800

NetEase Music Spider

Blog redirect: this is only an introductory post; please follow the link to the full article, NetEase Music Spider for DB. Crawlers are a topic I have wanted to study for a long time, but laziness kept getting in the way. Recently, novices who write crawlers have often come to my website to practice. Looking at the log shows that it's har ...

Posted by Spoiler on Sat, 19 Jan 2019 14:30:13 -0800

Packaging a Maven Project and Running WordCount on a Single Node (I)

The Spark shell is used only to test and validate programs. In a production environment, programs are usually written in an IDE, packaged into JAR files, and submitted to the cluster. The most common approach is to create a Maven project and let Maven manage the JAR dependencies. First, edit Maven project on ...

Posted by volka on Sat, 19 Jan 2019 13:45:12 -0800

Driver Structure of the SAS Controller, Based on 3.10.0-693.25.4

The department's test environment recently produced a core dump from a crash in the mpt3sas module. I had never looked at this module before and did not know how to examine such a core, so this was the occasion to learn. The module is the driver for the SAS controller. In previous IO stack research, only the general block layer has been known ...

Posted by nando on Sat, 19 Jan 2019 12:24:13 -0800

Flume data acquisition preparation

Flume, provided by Cloudera, is a highly available, reliable, distributed system for collecting, aggregating, and transferring massive volumes of log data. Flume supports customizing various data senders in the logging system for collecting data. It also provides the ability to perform simple processing on the data and write it to various data rec ...

Posted by ron814 on Sat, 19 Jan 2019 07:21:13 -0800

Examples of Basic Operating Functions in Spark Streaming

Guide: in the Spark Streaming documentation, operations can be roughly divided into Transformations, Window Operations, Join Operations, and Output Operations. Contents: Transformations, Window Operations, Join Operations, Output Operations. The example code for this article is on my Code Cloud (direct link). Please get some basic info ...

Posted by DasHaas on Sat, 19 Jan 2019 04:24:13 -0800