Python Basic Web Crawler, Day 04
Contents
1. XPath tool (parsing)
2. lxml library and XPath usage
day04
1. requests module methods
get() parameters
Query parameters: params - dictionary
Proxy: proxies - dictionary
Ordinary proxy: {'protocol': "protocol://IP address:port"}
Private proxy: {'protocol': "protocol://username:passw ...
Posted by Bijan on Mon, 21 Jan 2019 08:33:13 -0800
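The `proxies` dictionary described in the excerpt above can be sketched as follows. This is a minimal illustration, not the post's own code: the helper name `build_proxies`, the address `192.0.2.10`, and the credentials are all hypothetical placeholders, and the sketch assumes an HTTP proxy that serves both `http` and `https` traffic.

```python
from urllib.parse import quote


def build_proxies(host, port, user=None, password=None, schemes=("http", "https")):
    """Build a dict in the shape accepted by the `proxies` argument of requests.get().

    Hypothetical helper for illustration. With no user, it yields the
    ordinary form "protocol://ip:port"; with a user, it yields the private
    form "protocol://user:password@ip:port".
    """
    auth = ""
    if user is not None:
        # Percent-encode credentials so characters like '@' or ':' do not break the URL.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    proxy_url = f"http://{auth}{host}:{port}"
    return {scheme: proxy_url for scheme in schemes}


# Placeholder values only; 192.0.2.10 is a documentation address, not a real proxy.
ordinary = build_proxies("192.0.2.10", 8080)
private = build_proxies("192.0.2.10", 8080, user="user", password="p@ss")

# Usage with requests (not executed here, since it needs a reachable proxy):
#   import requests
#   requests.get(url, params={"key": "value"}, proxies=private, timeout=5)
```

The keys of the dictionary are the schemes of the target URLs, so one proxy can be mapped to `http` and another to `https` if needed.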
callLog Case Analysis of Big Data Telephone Log (4)
1. Change the retention time of Kafka data in a topic (the default is 7 days)
-------------------------------------------------
[kafka/conf/server.properties]
log.retention.hours=1
2. Aggregate queries using Hive
----------------------------------------------------
1. Hive command-line query
// Query all calle ...
Posted by shortcircuit on Sun, 20 Jan 2019 20:24:12 -0800
Azkaban 3.X Multiple Executors Installation Guide
1. Compiling the source code
1.1 Clone the source code
git clone https://github.com/azkaban/azkaban.git
1.2 Compilation
# Enter the azkaban directory
cd azkaban;
# Compile the project using Gradle
./gradlew build installDist
After compilation, build directories will be generat ...
Posted by longtone on Sun, 20 Jan 2019 10:18:12 -0800
Developing and Using UDFs in Hive
A recent data-mining task required counting the number of items within n kilometers of a given longitude and latitude. Because this means calculating the distance between two points on the Earth, a UDF is needed to do the calculation.
I. Writing the UDF
A ...
Posted by HairyScotsman on Sun, 20 Jan 2019 08:12:12 -0800
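The distance calculation mentioned in the excerpt above can be sketched with the haversine formula. This is an illustrative Python sketch, not the post's UDF (which is not shown in the excerpt): the function names, the choice of haversine, and the mean Earth radius of 6371 km are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def count_within(center, points, n_km):
    """Count the (lat, lon) points lying within n_km of the center point."""
    lat0, lon0 = center
    return sum(1 for lat, lon in points
               if haversine_km(lat0, lon0, lat, lon) <= n_km)


# Hypothetical sample data: three points checked against a 150 km radius.
nearby = count_within((0.0, 0.0), [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], 150)
```

In a Hive UDF the same arithmetic would live in the UDF's evaluation method, with the counting done by an ordinary aggregate query over the result.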
Data Modeling-Factor Analysis
Principal Component Analysis and Factor Analysis
# Load packages
library(corrplot)
library(psych)
library(GPArotation)
library(nFactors)
library(gplots)
library(RColorBrewer)
Principal component analysis
Principal Component Analysis (PCA) extracts a small number of uncorrelated variables from a large number ...
Posted by Coldman on Sat, 19 Jan 2019 19:06:13 -0800
Netease Music Spider
Blog redirect
This is only a teaser; the full article is here: Netease Music Spider for DB
Crawlers are a topic I have wanted to study for a long time.
But I kept putting it off out of laziness.
Recently, novices learning to write crawlers have often come to my website to practice.
Looking at the log shows that it's har ...
Posted by Spoiler on Sat, 19 Jan 2019 14:30:13 -0800
Packaging a Maven Project and Running wordcount on a Single Node (I)
The Spark shell is only used to test and validate programs. In a production environment, programs are usually written in an IDE, packaged into a jar, and submitted to the cluster. The most common approach is to create a Maven project and let Maven manage the jar dependencies. First, edit the Maven project on ...
Posted by volka on Sat, 19 Jan 2019 13:45:12 -0800
Driver Structure of the SAS Controller, Based on 3.10.0-693.25.4
Our department's test environment recently produced a core dump from a crash in the mpt3sas module. I had never looked at this module before, so how should the core be analyzed? Never having seen it just means learning it now. This module is the driver for the SAS controller. In my earlier study of the I/O stack, only the generic block layer has been known ...
Posted by nando on Sat, 19 Jan 2019 12:24:13 -0800
Flume data acquisition preparation
Flume, provided by Cloudera, is a highly available, highly reliable, distributed system for collecting, aggregating, and transporting massive volumes of log data. Flume supports custom data senders in a logging system for collecting data. Flume can also perform simple processing on the data and write it to various data rec ...
Posted by ron814 on Sat, 19 Jan 2019 07:21:13 -0800
Examples of Basic Operating Functions in Spark Streaming
Guide: in the Spark Streaming documentation, operations can be roughly divided into Transformations, Window Operations, Join Operations, and Output Operations.
Contents
Transformations
Window Operations
Join Operations
Output Operations
Example code for this article: my Code Cloud repository (direct link)
Please get some basic info ...
Posted by DasHaas on Sat, 19 Jan 2019 04:24:13 -0800