Big Data [Page 7] - Programmer Group - a programming skills sharing group

Big Data

2021SC@SDUSC HBase project overview

2021SC@SDUSC 1. HBase overview. What is HBase? HBase is a database system built on HDFS that provides high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes. It is mainly used to store unstructured and semi-structured, loosely organized data. HBase uses Hadoop HDFS as its file storage system, Hadoop MapReduc ...

Posted by gman-03 on Sun, 24 Oct 2021 20:13:50 -0700

MinIO object storage in Java: wrapping the MinIO SDK as an HTTP interface with Spring Cloud

Preface. The project needs to store binary files such as audio and images, which cannot be stored in the database. HBase was previously used for audio storage, but HBase's API is too complex to use. FastDFS can also store files, but its installation and configuration are very complex. Later, I found ...

Posted by Canadian on Sun, 24 Oct 2021 13:04:21 -0700

Analysis and use of consumer groups in sarama

The most widely used Go client for Kafka is probably sarama. However, older versions of sarama did not support consuming through consumer groups, so most people used sarama-cluster. Later, sarama added consumer group support, and sarama-cluster stopped being maintained. There is still little analysis of sarama consumer groups online, a ...

Posted by ryeman98 on Thu, 21 Oct 2021 23:24:56 -0700

It took a month to compile this painstaking Hadoop reference

This document was compiled from the official Hadoop website and many other materials. For neat typesetting and comfortable reading, blurry and black-and-white pictures have been redrawn as high-definition color images. Hadoop 2.x is currently widely used in enterprises, so this article focuses on Hado ...

Posted by b-real on Thu, 21 Oct 2021 21:42:54 -0700

Building a data warehouse environment

Hive environment setup. Hive engine introduction: Hive's execution engines are MR (the default), Tez, and Spark. Hive on Spark: Hive not only stores the metadata but is also responsible for SQL parsing and optimization, and the syntax is HQL. The execution engine becomes Spark, which is responsible for RDD execution. Spark on Hive: Hive is only ...

Posted by wha??? on Thu, 21 Oct 2021 19:46:17 -0700

Phoenix installation and use

Phoenix installation and use. 1. Background. 1.1 What is Phoenix? Phoenix is an open-source SQL skin for HBase. You can use the standard JDBC API instead of the HBase client API to create tables, insert data, and query HBase data. 1.2 Phoenix features. Easy integration with Spark, Hive, Pig, Flume, and MapReduce. Good per ...

Posted by epukinsk on Tue, 19 Oct 2021 12:40:48 -0700

Using Sqoop + Jenkins to transfer data between MySQL and Hive

1. Foreword. We have recently been using Sqoop + Jenkins to transfer data between MySQL and Hive. Sqoop's import command loads MySQL data into Hive, and its export command writes Hive data back to MySQL. Jenkins acts as the scheduler, running the sh scripts on a schedule so that the data is synchronized once a day. Relevant ...

Posted by reethu on Mon, 18 Oct 2021 15:17:56 -0700

HDFS principle and operation

hadoop fs hadoop fs -ls / hadoop fs -lsr hadoop fs -mkdir /user/hadoop hadoop fs -put a.txt /user/hadoop/ hadoop fs -get /user/hadoop/a.txt / hadoop fs -cp src dst hadoop fs -mv src dst hadoop fs -cat /user/hadoop/a.txt hadoop fs -rm /user/hadoop/a.txt hadoop fs -rmr /user/hadoop/a.txt hadoop fs -text /user/hadoop/a.txt hadoop fs -copyFromL ...

Posted by JamieinNH on Sat, 16 Oct 2021 10:26:18 -0700

Solutions to Spark data skew

Data skew caused by shuffle. When data skew occurs during a shuffle, we generally follow these troubleshooting steps: ① check the Web UI to see how the tasks in each job's stages are executing and whether any task takes obviously too long; ② if a task reports an error, check the corresponding log excepti ...

Posted by dstantdog3 on Sat, 16 Oct 2021 10:15:13 -0700

36-Blog Site Database-Blog Comment Information Data Operation

36-Blog Site Database-Blog Comment Information Data Operation. Project description: microblogs and blogs have become the main systems for publishing and disseminating information. To show how such data can be managed, this project mainly operates on the blog information table and the comment table of a blog website. Blog site data ...

Posted by Weedpacket on Fri, 15 Oct 2021 09:19:53 -0700
