Enabling LZO compression is very useful for small-scale clusters: it shrinks logs to roughly one third of their original size, and decompression is fast.
Install
Prepare the jar package
1) First, download the hadoop-lzo project:
https://github.com/twitter/hadoop-lzo/archive/master.zip
2) The downloaded file is named hadoop-lzo-master.zip. Extract it first, then compile it with Maven to produce hadoop-lzo-0.4.20.jar (a build sketch follows).
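A minimal build sketch, assuming Maven, a JDK, and the native lzo development headers (for example lzo-devel) are already installed; the include/library paths are illustrative and may differ on your machine:

wget https://github.com/twitter/hadoop-lzo/archive/master.zip
unzip master.zip
cd hadoop-lzo-master
# point the native build at the lzo headers and libraries, then package
C_INCLUDE_PATH=/usr/local/include LIBRARY_PATH=/usr/local/lib mvn clean package -Dmaven.test.skip=true
# the jar ends up under target/, e.g. target/hadoop-lzo-0.4.20.jar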
3) Put the compiled hadoop-lzo-0.4.20.jar into hadoop-2.7.4/share/hadoop/common/
[root@bigdata-01 common]$ pwd
/export/servers/hadoop-2.7.4/share/hadoop/common
[root@bigdata-01 common]$ ls
hadoop-lzo-0.4.20.jar
4) Use scp to synchronize hadoop-lzo-0.4.20.jar to the other nodes, for example:
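A minimal sketch, assuming the other nodes are named bigdata-02 and bigdata-03 (adjust the hostnames and paths to your cluster):

scp /export/servers/hadoop-2.7.4/share/hadoop/common/hadoop-lzo-0.4.20.jar bigdata-02:/export/servers/hadoop-2.7.4/share/hadoop/common/
scp /export/servers/hadoop-2.7.4/share/hadoop/common/hadoop-lzo-0.4.20.jar bigdata-03:/export/servers/hadoop-2.7.4/share/hadoop/common/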
Configure
1) Add the LZO compression configuration to core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>io.compression.codecs</name>
        <value>
            org.apache.hadoop.io.compress.GzipCodec,
            org.apache.hadoop.io.compress.DefaultCodec,
            org.apache.hadoop.io.compress.BZip2Codec,
            org.apache.hadoop.io.compress.SnappyCodec,
            com.hadoop.compression.lzo.LzoCodec,
            com.hadoop.compression.lzo.LzopCodec
        </value>
    </property>
    <property>
        <name>io.compression.codec.lzo.class</name>
        <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>
</configuration>
2) Use scp to synchronize core-site.xml to the other nodes, for example:
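Again a sketch with the same assumed hostnames; after distributing the file, the Hadoop daemons may need to be restarted for the new codecs to take effect:

scp /export/servers/hadoop-2.7.4/etc/hadoop/core-site.xml bigdata-02:/export/servers/hadoop-2.7.4/etc/hadoop/
scp /export/servers/hadoop-2.7.4/etc/hadoop/core-site.xml bigdata-03:/export/servers/hadoop-2.7.4/etc/hadoop/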
Test
1) Start Hive and create an LZO table:
CREATE TABLE lzo_test (
id STRING,
name STRING
)
partitioned by (
dt STRING
)
row format delimited
fields terminated by '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
2) Import the data:
load data inpath '/xxx/xxx/2019-07-25' into table lzo_test partition(dt='2019-07-25');
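Note that the files under the load path are expected to already be LZO-compressed (.lzo). A sketch of preparing such data, assuming the lzop command-line tool is installed; the file name access_log is a placeholder and the HDFS path reuses the placeholder above:

# compress a raw log file with lzop and upload it to HDFS
lzop -o access_log.lzo access_log
hadoop fs -mkdir -p /xxx/xxx/2019-07-25
hadoop fs -put access_log.lzo /xxx/xxx/2019-07-25
# optionally build an LZO index so the file becomes splittable for MapReduce
hadoop jar /export/servers/hadoop-2.7.4/share/hadoop/common/hadoop-lzo-0.4.20.jar \
  com.hadoop.compression.lzo.DistributedLzoIndexer /xxx/xxx/2019-07-25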