Add LZO compression support for Hadoop

Enabling LZO compression is very useful for small-scale clusters. Compressed logs take up about 1/3 of their original size, and decompression is fast.

Install

Prepare jar package

1) Download the hadoop-lzo source first
https://github.com/twitter/hadoop-lzo/archive/master.zip

2) The downloaded file is a zip archive named hadoop-lzo-master. Extract it first, then compile it with Maven to generate hadoop-lzo-0.4.20.jar.

3) Put the compiled hadoop-lzo-0.4.20.jar into hadoop-2.7.4/share/hadoop/common/

[root@bigdata-01 common]$ pwd
/export/servers/hadoop-2.7.4/share/hadoop/common
[root@bigdata-01 common]$ ls
hadoop-lzo-0.4.20.jar

4) Use scp to copy hadoop-lzo-0.4.20.jar to the same directory on the other nodes
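
The four steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: wget, unzip, mvn and scp are installed, the LZO native development headers that hadoop-lzo compiles against are present, and the host names bigdata-02 and bigdata-03 are hypothetical placeholders for your own nodes. The function names and the DRY_RUN switch are made up for this example. Nothing runs until a function is called.

```shell
# Sketch only. NODES holds hypothetical worker host names; replace with your own.
NODES="${NODES:-bigdata-02 bigdata-03}"
COMMON_DIR=/export/servers/hadoop-2.7.4/share/hadoop/common

# Set DRY_RUN=1 to print each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Steps 1-2: download, extract and build. The body runs in a subshell
# (parentheses) so the cd does not change the caller's directory.
build_hadoop_lzo() (
  run wget -O hadoop-lzo-master.zip https://github.com/twitter/hadoop-lzo/archive/master.zip &&
  run unzip hadoop-lzo-master.zip &&
  run cd hadoop-lzo-master &&
  run mvn clean package -Dmaven.test.skip=true
)

# Steps 3-4: copy the jar into the local common directory, then to each node.
install_jar() {
  jar=$1
  run cp "$jar" "$COMMON_DIR/" || return 1
  for node in $NODES; do
    run scp "$jar" "$node:$COMMON_DIR/" || return 1
  done
}
```

A typical run would call build_hadoop_lzo and then install_jar hadoop-lzo-master/target/hadoop-lzo-0.4.20.jar. Maven writes the jar under target/, and the file name should be adjusted if your build produces a different version. Setting DRY_RUN=1 first prints the commands so they can be reviewed before anything touches the cluster.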

Configure

1) Add the following configuration to core-site.xml to enable LZO compression

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

</configuration>

2) Use scp to copy core-site.xml to the other nodes
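
Before copying the file around, it can help to confirm that both LZO codec classes actually made it into core-site.xml. The check below is a minimal sketch. The function name has_lzo_codecs is hypothetical, and the file location used in the note after it ($HADOOP_HOME/etc/hadoop/core-site.xml) is an assumption about your layout.

```shell
# Succeeds only if the given core-site.xml mentions both LZO codec classes
# configured above (LzoCodec and LzopCodec).
has_lzo_codecs() {
  conf_file=$1
  grep -q 'com\.hadoop\.compression\.lzo\.LzoCodec' "$conf_file" &&
  grep -q 'com\.hadoop\.compression\.lzo\.LzopCodec' "$conf_file"
}
```

For example, has_lzo_codecs "$HADOOP_HOME/etc/hadoop/core-site.xml" && echo ok prints ok when both classes are present. On a node where the Hadoop commands are on the PATH, hdfs getconf -confKey io.compression.codecs should also print the value Hadoop actually reads.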

 

Test

1) Start Hive and create an LZO table

CREATE TABLE lzo_test (
id STRING,
name STRING
)
partitioned by (
dt STRING
)
row format delimited
fields terminated by '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";

2) Load data into a partition

load data inpath '/xxx/xxx/2019-07-25' into table lzo_test partition(dt='2019-07-25');
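
If a new partition is loaded every day, the statement above can be generated from the path and the date rather than typed by hand. The helper below is a hypothetical sketch: the function name and its two parameters are made up for illustration, and it only prints the statement.

```shell
# Print a LOAD DATA statement for the lzo_test table.
# $1: HDFS input path, $2: partition date (value of dt).
load_partition_sql() {
  printf "load data inpath '%s' into table lzo_test partition(dt='%s');\n" "$1" "$2"
}
```

The printed statement can then be passed to the Hive command line, for example hive -e "$(load_partition_sql '/xxx/xxx/2019-07-25' '2019-07-25')", assuming the hive client is on the PATH.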

Posted by fastfingertips on Tue, 15 Oct 2019 10:36:22 -0700