Operating HDFS from Java (Common API)
Operating HDFS from Java
Prerequisites
API operations
View files
Create a new folder
Upload files
Download files
Delete files
Copy and move (cut) within HDFS
Rename
Create new files
Write to a file
Read a file
Append to a file
Get data block locations
Prerequisites
Ensure that the HDFS cluster has been bu ...
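To make the list above concrete, here is a minimal sketch of the common FileSystem calls; the NameNode address node01:8020 and the user root are placeholder assumptions, not values from the original post:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCommonApi {
    public static void main(String[] args) throws Exception {
        // hdfs://node01:8020 and "root" are assumed values for illustration
        FileSystem fs = FileSystem.get(URI.create("hdfs://node01:8020"), new Configuration(), "root");
        fs.mkdirs(new Path("/demo"));                                          // create a new folder
        fs.copyFromLocalFile(new Path("local.txt"), new Path("/demo/"));       // upload a file
        fs.copyToLocalFile(new Path("/demo/local.txt"), new Path("copy.txt")); // download a file
        for (FileStatus s : fs.listStatus(new Path("/demo"))) {                // view files
            System.out.println(s.getPath());
        }
        fs.rename(new Path("/demo/local.txt"), new Path("/demo/renamed.txt")); // rename
        fs.delete(new Path("/demo"), true);                                    // delete recursively
        fs.close();
    }
}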
Posted by StroiX on Sat, 02 Feb 2019 13:57:16 -0800
Zookeeper Learning (5): Connecting to ZooKeeper from Java
Before adding a Semaphore to wait for the connection:
The connection kept failing. Many posts online said the firewall on the server hosting ZooKeeper had not been turned off, or the JDK versions were inconsistent, and so on.
My own analysis is as follows:
Although I've put ZooKeeper's ...
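The fix the post is working toward can be sketched as follows: the ZooKeeper constructor returns before the session is actually established, so requests issued immediately afterwards fail; blocking on a Semaphore until the watcher sees SyncConnected avoids this. The ensemble address below is a placeholder:

import java.util.concurrent.Semaphore;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnect {
    public static void main(String[] args) throws Exception {
        Semaphore connected = new Semaphore(0);
        // 192.168.0.101:2181 is an assumed ensemble address
        ZooKeeper zk = new ZooKeeper("192.168.0.101:2181", 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.release(); // the session is now usable
                }
            }
        });
        connected.acquire(); // block until the watcher reports SyncConnected
        System.out.println("Connected, state = " + zk.getState());
        zk.close();
    }
}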
Posted by bravo81 on Sat, 02 Feb 2019 13:09:16 -0800
Java API for HDFS operations
1. Environment Setup
Configure the HADOOP_HOME environment variable
Add HADOOP_HOME's bin directory to PATH
Permission issues: add the HADOOP_USER_NAME=root environment variable
Eclipse configuration
Add the following hadoop-eclipse-plugin.jar package to the dropins/plugins folder of the Eclipse installation directory
...
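If setting the environment variable is inconvenient, the same effect can be achieved in code. A minimal sketch, assuming Hadoop 2.x (whose UserGroupInformation also reads the system property):

public class HadoopUserSetup {
    public static void main(String[] args) {
        // Must run before the first FileSystem/UserGroupInformation call;
        // "root" mirrors the HADOOP_USER_NAME=root setting described above.
        System.setProperty("HADOOP_USER_NAME", "root");
        // ... then create the Configuration and FileSystem as usual ...
    }
}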
Posted by naskoo on Sat, 02 Feb 2019 12:36:15 -0800
Sqoop Incremental Import/Export and Job Operation Examples
Incremental import
Incremental import in append mode on an incrementing column
# First import
[root@node222 ~]# /usr/local/sqoop-1.4.7/bin/sqoop import --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false --table actor --where "actor_id < 50" --username sakila -P --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_all
...
18/10/1 ...
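The command shown above is only the first full import. A follow-up incremental run in append mode would look roughly like this; the --check-column and --last-value values are illustrative, chosen to continue the actor_id < 50 example:

[root@node222 ~]# /usr/local/sqoop-1.4.7/bin/sqoop import --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false --table actor --username sakila -P --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_all --incremental append --check-column actor_id --last-value 49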
Posted by zak on Sat, 02 Feb 2019 12:21:16 -0800
Postgres CopyManager and connection from Connection Pool
1. PG CopyManager sample code:
package test.simple;
// You need to include the PostgreSQL JDBC jar in your project libraries
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
impor ...
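Since the excerpt is cut off, here is a self-contained sketch of the pattern the title refers to: obtaining a CopyManager from a pooled connection by unwrapping the pool's proxy down to the driver-level PGConnection. The DataSource, table name, and file name are illustrative assumptions:

import java.io.FileInputStream;
import java.sql.Connection;
import javax.sql.DataSource;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class PooledCopy {
    // dataSource, my_table, and data.csv are placeholders for illustration
    static long bulkLoad(DataSource dataSource) throws Exception {
        try (Connection conn = dataSource.getConnection();
             FileInputStream in = new FileInputStream("data.csv")) {
            // unwrap the pool's proxy to reach the underlying driver connection
            PGConnection pgConn = conn.unwrap(PGConnection.class);
            CopyManager copyManager = pgConn.getCopyAPI();
            return copyManager.copyIn("COPY my_table FROM STDIN WITH (FORMAT csv)", in);
        }
    }
}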
Posted by mattyj10 on Sat, 02 Feb 2019 09:09:16 -0800
Operating HDFS with Java
Once a high-availability HDFS cluster has been built, you can operate HDFS from Java in Eclipse to read and write files.
High Availability HDFS Cluster Building Steps: https://blog.csdn.net/Chris_MZJ/article/details/83033471
Connecting HDFS with Eclipse
1. Place hadoop-eclipse-plugin-2.6.0.rar in the installation directory of ...
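Once Eclipse is set up, connecting to the high-availability cluster from code requires the nameservice settings. A minimal sketch; the nameservice id mycluster, the host names, and the user root are placeholders, not values from the original post:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "node01:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "node02:8020");
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf, "root");
        System.out.println(fs.exists(new Path("/")));   // sanity check: root directory exists
        fs.close();
    }
}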
Posted by mfos on Sat, 02 Feb 2019 09:06:15 -0800
Hive Integrated HBase Detailed
Reproduced from: https://www.cnblogs.com/MOBIN/p/5704001.html
1. Create HBase tables from Hive
Create a Hive table pointing to HBase using an HQL statement
CREATE TABLE hbase_table_1(key int, value string) //Table name hbase_table_1 in Hive
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' //Designated Storage P ...
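The statement is cut off above; the canonical example from the Hive wiki, which this table appears to follow, continues with the column mapping and the target HBase table name:

CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");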
Posted by maxpagels on Sat, 02 Feb 2019 02:45:15 -0800
Spark Learning Notes (1) - Introduction to Spark, Cluster Installation
1 Spark Introduction
Spark is a fast, general-purpose, and scalable big-data analytics engine. Born at AMPLab, University of California, Berkeley in 2009 and open-sourced in 2010, it entered the Apache Incubator in June 2013 and became a top-level Apache project in February 2014. At present, the Spark ecosystem has developed into a collecti ...
Posted by All4172 on Sat, 02 Feb 2019 01:21:15 -0800
Fully Distributed Cluster (V) Hbase-1.2.6.1 Installation Configuration
Environment information
Fully Distributed Cluster (I) Cluster Foundation Environment and zookeeper-3.4.10 Installation and Deployment
Hadoop cluster installation and configuration process
A Hadoop cluster must be deployed before installing Hive
Fully Distributed Cluster (II) Hadoop 2.6.5 Installation and Deployment
Hbase Cluster Installatio ...
Posted by MFHJoe on Fri, 01 Feb 2019 19:12:15 -0800
Small pyspark Tips from Work
1. df.na.fill({'field1': 'default1', 'field2': 'default2'}) replaces null values in the named columns
2. df.dropDuplicates() deduplicates by the given column names; with no arguments it deduplicates over all columns
3. df.subtract(df1) returns the rows that appear in the current df but not in df1, with the result deduplicated
4. print time.localtime([ti ...
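A tiny runnable sketch of points 1-3; the column names and values are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tips").getOrCreate()
df = spark.createDataFrame([(1, None), (1, None), (2, "b")], ["id", "val"])
df1 = spark.createDataFrame([(2, "b")], ["id", "val"])

filled = df.na.fill({"val": "default"})  # 1. replace nulls in the named column
deduped = df.dropDuplicates(["id"])      # 2. keep one row per id
diff = df.subtract(df1)                  # 3. rows in df but not in df1, deduplicated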
Posted by nvidia on Fri, 01 Feb 2019 13:21:15 -0800