Environment
Ubuntu 16.x server
Memory: minimum 8 GB
lanmps environment suite (http://www.lanmps.com)
PHP version: 5.6
MySQL version: 5.6
NGINX version: latest
Elasticsearch version: 5.4
Logstash version: 5.4
Java Installation
Method 1
The Java version used here is 1.8.0_131.
Install Java, downloading the corresponding version, following this tutorial (Method 3: source installation):
http://blog.csdn.net/fenglailea/article/details/26006647#t6
Method 2
If the Java installation in Method 1 fails and reports the following errors:
Error: could not find libjava.so
Error: Could not find Java SE Runtime Environment.
and they cannot be resolved no matter how many fixes you try, use the following approach instead.
First delete the Java environment-variable configuration file created in Method 1, then run:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo update-java-alternatives -s java-8-oracle
sudo apt-get install oracle-java8-set-default # set environment variables
Check the Java version:
java -version
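For Oracle JDK 1.8.0_131 the output should look roughly like this (exact build numbers may differ):
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)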
This method comes from http://blog.csdn.net/blueheart20/article/details/50121691
Server User Creation
Create a hadoop user
Run the following commands one line at a time:
useradd -m hadoop -s /bin/bash # create the hadoop user
passwd hadoop # set the password; you will be prompted to enter it twice
usermod -G root hadoop # grant administrator privileges via the root group
Setting Administrator or User Group Permissions
Run the command:
visudo
Below the root line, add a line for hadoop, as shown:
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
Apply the settings
Method 1: Log out, log back in as the hadoop user, and use su - to gain root privileges when needed.
Method 2: Restart the system.
All of the following configuration is done as the hadoop user.
Elasticsearch cannot be started as the root user.
Elasticsearch Installation and Configuration
http://kibana.logstash.es/content/elasticsearch/
https://es.xiaoleilu.com/
Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html
Server notes
If installing on a server, at least 8 GB of memory is required.
If you have less memory, lower the JVM heap settings in the config/jvm.options file.
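For example, a minimal adjustment in config/jvm.options (ES 5.x defaults to a 2 GB heap; pick values that fit your machine):
# config/jvm.options - set initial and maximum heap to the same value
-Xms1g
-Xmx1g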
Do not use the root user (worth repeating: do not use root).
Use the hadoop user created above.
Elasticsearch Download Address
https://www.elastic.co/downloads/elasticsearch
The latest version at the time of writing is 5.4.
As the hadoop user:
cd ~
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.4.0.tar.gz
Extract:
tar zxvf elasticsearch-5.4.0.tar.gz
Configure Elasticsearch
Edit config/elasticsearch.yml:
cd elasticsearch-5.4.0
vim config/elasticsearch.yml
Change the following settings:
network.host: 0.0.0.0
cluster.name: es
cluster.name is optional.
The rest of the file is unchanged and does not need to be modified.
Environment variable settings
sudo vim /etc/profile.d/elasticsearch.sh
Add:
export ES_HOME=/home/hadoop/elasticsearch-5.4.0
export PATH=$ES_HOME/bin:$PATH
Apply the changes:
. /etc/profile
. /etc/bashrc
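To confirm the PATH change took effect:
which elasticsearch
# expected: /home/hadoop/elasticsearch-5.4.0/bin/elasticsearch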
Start
cd elasticsearch-5.4.0
bin/elasticsearch # run in the foreground
# or
bin/elasticsearch -d # run in the background
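To verify Elasticsearch is running (assuming the default HTTP port 9200), request the root endpoint; it returns a small JSON document with the node name and version:
curl http://localhost:9200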
Stop
Find the process ID:
ps -ef | grep elasticsearch
Find the ID and kill it:
kill -9 <pid>
Chinese Word Segmentation Plugin: analysis-ik
https://github.com/medcl/elasticsearch-analysis-ik/releases
Version: 5.4.0
As the hadoop user:
cd ~
wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v5.4.0/elasticsearch-analysis-ik-5.4.0.zip
unzip elasticsearch-analysis-ik-5.4.0.zip -d elasticsearch-analysis-ik-5.4.0
Move it into the plugins directory:
mv elasticsearch-analysis-ik-5.4.0 elasticsearch-5.4.0/plugins/analysis-ik
Elasticsearch must be restarted for the plugin to take effect (you can also wait and restart once after the dictionary is configured).
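After a restart, you can confirm the plugin was picked up by listing installed plugins (elasticsearch-plugin is the standard plugin CLI in ES 5.x):
cd ~/elasticsearch-5.4.0
bin/elasticsearch-plugin list
# should list: analysis-ik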
Dictionary Settings
Go to the Elasticsearch installation directory and edit the dictionary configuration file:
cd ~/elasticsearch-5.4.0
vim plugins/analysis-ik/config/IKAnalyzer.cfg.xml
Change the ext_dict line as follows:
<entry key="ext_dict">custom/sougou.dic;custom/mydict.dic;custom/single_word_low_freq.dic;custom/product.dic</entry>
custom/product.dic is my own dictionary; its contents are not shared here.
Elasticsearch must be restarted for the changes to take effect.
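As an illustration only (the real contents of product.dic are not shared above), a dictionary file is plain UTF-8 text with one term per line; the sample entries below are hypothetical:
mkdir -p ~/elasticsearch-5.4.0/plugins/analysis-ik/config/custom
# hypothetical sample entries, one term per line
printf '词条一\n词条二\n' > ~/elasticsearch-5.4.0/plugins/analysis-ik/config/custom/product.dic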
Hot Updating of the Segmentation Dictionary
To enable hot updates, add the following entry (the URL below does not actually exist; it is only an example):
<!-- Users can configure a remote extended dictionary here -->
<entry key="remote_ext_dict">http://www.foxwho.com/thesaurus/word.txt</entry>
The file must be UTF-8 encoded, one word per line, with \n as the line separator.
Official documentation
https://github.com/medcl/elasticsearch-analysis-ik
The plugin currently supports hot updating of the IK dictionary through the following entries in the IK configuration file mentioned above:
<!-- Users can configure a remote extended dictionary here -->
<entry key="remote_ext_dict">location</entry>
<!-- Users can configure a remote extended stopword dictionary here -->
<entry key="remote_ext_stopwords">location</entry>
Here, location is a URL, such as http://yoursite.com/getCustomDict. The request only needs to satisfy the following two requirements for hot updating to work:
1. The HTTP response must return two headers, Last-Modified and ETag, both strings. If either changes, the plugin fetches the new word list and updates the dictionary.
2. The response body contains one word per line, with \n as the newline character.
Meeting these two requirements enables hot dictionary updates without restarting the ES instance.
Words that need automatic updating can be placed in a UTF-8 encoded .txt file served by nginx or another simple HTTP server. When the .txt file is modified, the HTTP server automatically returns the corresponding Last-Modified and ETag on the next request. A separate tool can extract the relevant vocabulary from the business system and update this .txt file.
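A minimal nginx sketch for serving such a file (the paths are assumptions; nginx sets Last-Modified and ETag for static files automatically):
location /thesaurus/ {
    root /data/www;              # serves /data/www/thesaurus/word.txt
    default_type text/plain;
    charset utf-8;
}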
Word Segmentation Test
Create a test index:
curl -XPUT "http://localhost:9200/index"
Test the segmentation result by opening this URL in a browser:
http://localhost:9200/index/_analyze?analyzer=ik_max_word&text=中华人民共和国
Result:
{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中华人民",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "中华",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "华人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "人民共和国",
      "start_offset": 2,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "共和国",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "共和",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 8
    }
  ]
}
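To actually use the analyzer on a field, a mapping like the following works (a sketch following the plugin's README; the fulltext type and content field names are illustrative):
curl -XPOST "http://localhost:9200/index/fulltext/_mapping" -d '
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word"
    }
  }
}'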
Logstash 5.X Log Collection and Processing
Download address
https://www.elastic.co/downloads/logstash
The latest version at the time of writing is 5.4.0.
As the hadoop user:
Here we install from the tar.gz archive, i.e. Method 1:
cd ~
wget https://artifacts.elastic.co/downloads/logstash/logstash-5.4.0.tar.gz
tar -zxvf logstash-5.4.0.tar.gz
Test whether the installation succeeded:
~/logstash-5.4.0/bin/logstash -e 'input { stdin { } } output { stdout {}}'
If the output looks like the following, the installation succeeded:
The stdin plugin is now waiting for input:
[2017-05-16T21:48:15,233][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
Logstash 5.X Configuration
Create a configuration directory
First enter the Logstash root directory:
cd ~/logstash-5.4.0
mkdir -p etc
vim etc/www.lanmps.com.conf
Contents of the etc/www.lanmps.com.conf file:
input {
file {
type => "nginx-access"
path => ["/www/wwwLogs/www.lanmps.com/*.log"]
start_position => "beginning"
}
}
filter {
grok {
match => { "message" => "%{IPORHOST:client_ip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:http_version})?|-)\" (%{HOSTNAME:domain}|-) %{NUMBER:response} (?:%{NUMBER:bytes}|-) (%{QS:referrer}) %{QS:agent} \"(%{WORD:x_forword}|-)\" (%{URIHOST:upstream_host}|-) (%{NUMBER:upstream_response}|-) (%{WORD:upstream_cache_status}|-) %{QS:upstream_content_type} (%{USERNAME:upstream_response_time}) > (%{USERNAME:response_time})" }
#The match pattern is applied to each log line read into the message field. IPORHOST, USER, HTTPDATE, WORD, NOTSPACE and NUMBER are predefined pattern names from patterns/grok-patterns; compare them against the log format below. (?:...|-) is a conditional match, like a ternary operator in a program. Double quotes and square brackets in the log must be escaped with a backslash.
}
kv {
source => "request"
field_split => "&?"
value_split => "="
}
#The kv plugin then splits the request field into key-value pairs, using "&?" as the field separator and "=" as the key-value separator, collecting fields and values automatically.
urldecode {
all_fields => true
}
#URL-decode all fields (so Chinese characters display correctly)
}
output {
elasticsearch {
hosts => ["10.1.5.66:9200"]
index => "logstash-%{type}-%{+YYYY.MM.dd}"
document_type => "%{type}"
}
}
Configuration description
http://kibana.logstash.es/content/logstash/plugins/input/file.html
Nginx Log Format Definition
log_format access '$remote_addr - $remote_user [$time_local] "$request" $http_host $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for" $upstream_addr $upstream_status $upstream_cache_status "$upstream_http_content_type" $upstream_response_time > $request_time';
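To apply this format (a sketch; the log path is assumed to match the Logstash input above), reference it from an access_log directive in the relevant server block:
server {
    # ... other server configuration ...
    access_log /www/wwwLogs/www.lanmps.com/access.log access;
}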
Logstash 5.X Start and Stop
Test command
cd ~/logstash-5.4.0/
bin/logstash -e 'input { stdin { } } output { stdout {codec=>rubydebug} }'
The terminal will then wait for your input. Type Hello World, press Enter, and see what comes back.
Output like the following appears:
2017-02-23T08:34:25.661Z c-101 Hello World
Test the configuration file for correctness
cd ~/logstash-5.4.0/
bin/logstash -t -f etc/
Start
This loads all *.conf files in the etc directory and stitches them together in memory into one complete configuration:
cd ~/logstash-5.4.0/
bin/logstash -f etc/
Run in the background
cd ~/logstash-5.4.0/ && nohup bin/logstash -f etc/ &
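Once Logstash is running and shipping events, you can check that the daily index was created (assuming the Elasticsearch host configured above):
curl 'http://10.1.5.66:9200/_cat/indices?v'
# look for indices named logstash-nginx-access-YYYY.MM.dd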
Stop
Find the process ID:
ps -ef | grep logstash
Find the ID and kill it:
kill -9 <pid>