Operating HDFS from Java (Common API)
Operating HDFS from Java
Prerequisites
API operations
View files
Create a new folder
Upload files
Download files
Delete files
Copy and move (cut) within HDFS
Rename
Create new files
Write to a file
Read a file
Append to a file
Get data block locations
Prerequisites
Ensure that the HDFS cluster has been bu ...
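To make the list above concrete, here is a minimal sketch of the common FileSystem calls; the NameNode address node01:8020 and the user root are placeholder assumptions, not values from the original post:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCommonApi {
    public static void main(String[] args) throws Exception {
        // hdfs://node01:8020 and "root" are assumed values for illustration
        FileSystem fs = FileSystem.get(URI.create("hdfs://node01:8020"), new Configuration(), "root");
        fs.mkdirs(new Path("/demo"));                                          // create a new folder
        fs.copyFromLocalFile(new Path("local.txt"), new Path("/demo/"));       // upload a file
        fs.copyToLocalFile(new Path("/demo/local.txt"), new Path("copy.txt")); // download a file
        for (FileStatus s : fs.listStatus(new Path("/demo"))) {                // view files
            System.out.println(s.getPath());
        }
        fs.rename(new Path("/demo/local.txt"), new Path("/demo/renamed.txt")); // rename
        fs.delete(new Path("/demo"), true);                                    // delete recursively
        fs.close();
    }
}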
Posted by StroiX on Sat, 02 Feb 2019 13:57:16 -0800
Zookeeper Learning (5): Connecting to ZooKeeper from Java
Before adding a Semaphore to wait for the connection:
The connection kept failing. Many posts online said the firewall on the server hosting ZooKeeper had not been turned off, or the JDK versions were inconsistent, and so on.
My own analysis is as follows:
Although I've put ZooKeeper's ...
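The fix the post is working toward can be sketched as follows: the ZooKeeper constructor returns before the session is actually established, so requests issued immediately afterwards fail; blocking on a Semaphore until the watcher sees SyncConnected avoids this. The ensemble address below is a placeholder:

import java.util.concurrent.Semaphore;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnect {
    public static void main(String[] args) throws Exception {
        Semaphore connected = new Semaphore(0);
        // 192.168.0.101:2181 is an assumed ensemble address
        ZooKeeper zk = new ZooKeeper("192.168.0.101:2181", 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.release(); // the session is now usable
                }
            }
        });
        connected.acquire(); // block until the watcher reports SyncConnected
        System.out.println("Connected, state = " + zk.getState());
        zk.close();
    }
}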
Posted by bravo81 on Sat, 02 Feb 2019 13:09:16 -0800
Java API for HDFS operations
1. Environment Setup
Configure the HADOOP_HOME environment variable
Add HADOOP_HOME's bin directory to PATH
Permission issues: add the HADOOP_USER_NAME=root environment variable
Eclipse configuration
Add the following hadoop-eclipse-plugin.jar package to the dropins/plugins folder of the Eclipse installation directory
...
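If setting the environment variable is inconvenient, the same effect can be achieved in code. A minimal sketch, assuming Hadoop 2.x (whose UserGroupInformation also reads the system property):

public class HadoopUserSetup {
    public static void main(String[] args) {
        // Must run before the first FileSystem/UserGroupInformation call;
        // "root" mirrors the HADOOP_USER_NAME=root setting described above.
        System.setProperty("HADOOP_USER_NAME", "root");
        // ... then create the Configuration and FileSystem as usual ...
    }
}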
Posted by naskoo on Sat, 02 Feb 2019 12:36:15 -0800
Sqoop Incremental Import/Export and Job Operation Examples
Incremental import
Incremental import in append mode on an incrementing column
# First import
[root@node222 ~]# /usr/local/sqoop-1.4.7/bin/sqoop import --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false --table actor --where "actor_id < 50" --username sakila -P --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_all
...
18/10/1 ...
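The command shown above is only the first full import. A follow-up incremental run in append mode would look roughly like this; the --check-column and --last-value values are illustrative, chosen to continue the actor_id < 50 example:

[root@node222 ~]# /usr/local/sqoop-1.4.7/bin/sqoop import --connect jdbc:mysql://192.168.0.200:3306/sakila?useSSL=false --table actor --username sakila -P --num-mappers 1 --target-dir /tmp/hive/sqoop/actor_all --incremental append --check-column actor_id --last-value 49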
Posted by zak on Sat, 02 Feb 2019 12:21:16 -0800
Postgres CopyManager and connection from Connection Pool
1. PG CopyManager sample code:
package test.simple;
// You need to include the PostgreSQL JDBC jar in your project libraries
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
impor ...
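Since the excerpt is cut off, here is a self-contained sketch of the pattern the title refers to: obtaining a CopyManager from a pooled connection by unwrapping the pool's proxy down to the driver-level PGConnection. The DataSource, table name, and file name are illustrative assumptions:

import java.io.FileInputStream;
import java.sql.Connection;
import javax.sql.DataSource;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class PooledCopy {
    // dataSource, my_table, and data.csv are placeholders for illustration
    static long bulkLoad(DataSource dataSource) throws Exception {
        try (Connection conn = dataSource.getConnection();
             FileInputStream in = new FileInputStream("data.csv")) {
            // unwrap the pool's proxy to reach the underlying driver connection
            PGConnection pgConn = conn.unwrap(PGConnection.class);
            CopyManager copyManager = pgConn.getCopyAPI();
            return copyManager.copyIn("COPY my_table FROM STDIN WITH (FORMAT csv)", in);
        }
    }
}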
Posted by mattyj10 on Sat, 02 Feb 2019 09:09:16 -0800
Operating HDFS with Java
Once a high-availability HDFS cluster has been built, you can operate HDFS from Java in Eclipse to read and write files.
High Availability HDFS Cluster Building Steps: https://blog.csdn.net/Chris_MZJ/article/details/83033471
Connecting HDFS with Eclipse
1. Place hadoop-eclipse-plugin-2.6.0.rar in the installation directory of ...
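Once Eclipse is set up, connecting to the high-availability cluster from code requires the nameservice settings. A minimal sketch; the nameservice id mycluster, the host names, and the user root are placeholders, not values from the original post:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "node01:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "node02:8020");
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf, "root");
        System.out.println(fs.exists(new Path("/")));   // sanity check: root directory exists
        fs.close();
    }
}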
Posted by mfos on Sat, 02 Feb 2019 09:06:15 -0800
Hive Integrated HBase Detailed
Reproduced from: https://www.cnblogs.com/MOBIN/p/5704001.html
1. Create HBase tables from Hive
Create a Hive table pointing to HBase using an HQL statement
CREATE TABLE hbase_table_1(key int, value string) //Table name hbase_table_1 in Hive
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' //Designated Storage P ...
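The statement is cut off above; the canonical example from the Hive wiki, which this table appears to follow, continues with the column mapping and the target HBase table name:

CREATE TABLE hbase_table_1(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");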
Posted by maxpagels on Sat, 02 Feb 2019 02:45:15 -0800
Spark Learning Notes (1) - Introduction to Spark, Cluster Installation
1 Spark Introduction
Spark is a fast, general-purpose, and scalable big-data analytics engine. Born at AMPLab, University of California, Berkeley in 2009 and open-sourced in 2010, it entered the Apache Incubator in June 2013 and became a top-level Apache project in February 2014. At present, the Spark ecosystem has developed into a collecti ...
Posted by All4172 on Sat, 02 Feb 2019 01:21:15 -0800
Fully Distributed Cluster (V) Hbase-1.2.6.1 Installation Configuration
Environment information
Fully Distributed Cluster (I) Cluster Foundation Environment and zookeeper-3.4.10 Installation and Deployment
Hadoop cluster installation and configuration process
A Hadoop cluster must be deployed before installing Hive
Fully Distributed Cluster (II) Hadoop 2.6.5 Installation and Deployment
Hbase Cluster Installatio ...
Posted by MFHJoe on Fri, 01 Feb 2019 19:12:15 -0800
Small pyspark Tips from Work
1. df.na.fill({'field1': 'default1', 'field2': 'default2'}) replaces null values in the named columns
2. df.dropDuplicates() deduplicates by the given column names; with no arguments it deduplicates over all columns
3. df.subtract(df1) returns the rows that appear in the current df but not in df1, with the result deduplicated
4. print time.localtime([ti ...
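A tiny runnable sketch of points 1-3; the column names and values are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tips").getOrCreate()
df = spark.createDataFrame([(1, None), (1, None), (2, "b")], ["id", "val"])
df1 = spark.createDataFrame([(2, "b")], ["id", "val"])

filled = df.na.fill({"val": "default"})  # 1. replace nulls in the named column
deduped = df.dropDuplicates(["id"])      # 2. keep one row per id
diff = df.subtract(df1)                  # 3. rows in df but not in df1, deduplicated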
Posted by nvidia on Fri, 01 Feb 2019 13:21:15 -0800