1. Environmental preparation
On a Linux machine, install the Hadoop runtime environment. For the installation method, see: Setting up the Hadoop runtime environment
2. Start HDFS and run MapReduce
2.1. Configure cluster
1. Configure hadoop-env.sh
Get the JDK installation path on the Linux system:
[root@hadoop101 ~]# echo $JAVA_HOME
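The path printed by `echo $JAVA_HOME` then goes into hadoop-env.sh. A minimal sketch; the JDK path below is an assumption, so substitute the output from your own machine:

```shell
# etc/hadoop/hadoop-env.sh — set JAVA_HOME explicitly so the Hadoop daemons
# do not depend on the login shell's environment.
# The path below is an assumption; replace it with the output of `echo $JAVA_HOME`.
export JAVA_HOME=/opt/module/jdk1.8.0_144
```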
Posted by spooke2k on Tue, 25 Feb 2020 19:31:27 -0800
1. Environmental preparation:
The Linux machine can be a virtual machine installed in local VMware or a real Linux machine.
If it is a locally installed virtual machine, the following points need to be pre-configured:
Configure a static IP for the machine (so the IP does not change on restart)
Set the hostname (to make configuration easier)
Turn off the ...
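The pre-configuration steps above can be sketched as follows. This is a hypothetical example for a CentOS 7 VM; the interface name, addresses, and hostname are assumptions to adapt to your own network:

```shell
# 1) Static IP: edit the interface config and disable DHCP
#    (file name and addresses are assumptions).
#    /etc/sysconfig/network-scripts/ifcfg-ens33
#      BOOTPROTO=static
#      IPADDR=192.168.1.101
#      GATEWAY=192.168.1.2
#      DNS1=192.168.1.2

# 2) Set the hostname:
hostnamectl set-hostname hadoop101

# 3) Map hostnames to IPs for the whole cluster in /etc/hosts, e.g.:
#      192.168.1.101 hadoop101
```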
Posted by crazylegseddie on Tue, 25 Feb 2020 19:02:32 -0800
Download and unzip
Edit the configuration file sqoop-env.sh
Configure environment variables
Copy the MySQL driver
Check the Sqoop version
Test the connection to MySQL
operating syste ...
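The steps above can be sketched as a sequence of commands. The archive name, version numbers, and install paths here are assumptions (the Hadoop 2.7.2 path matches the version used elsewhere on this page); adjust them to your own download:

```shell
# Download and unzip (archive name and target directory are assumptions)
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
cd /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha

# Create sqoop-env.sh from the template and point it at existing installations:
cp conf/sqoop-env-template.sh conf/sqoop-env.sh
#   export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
#   export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2

# Copy the MySQL JDBC driver into Sqoop's lib directory (jar name is an assumption):
cp mysql-connector-java-5.1.27-bin.jar lib/

# Check the Sqoop version:
bin/sqoop version
```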
Posted by possiblyB9 on Tue, 25 Feb 2020 07:36:16 -0800
1. Basic syntax
bin/hadoop fs <specific command> or bin/hdfs dfs <specific command>
2. Full command list
[andy@xiaoai01 hadoop-2.7.2]$ bin/hadoop fs
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R ...
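A few usage examples of the syntax above; the HDFS paths and file names are illustrative, not from the original:

```shell
# List the HDFS root directory
bin/hadoop fs -ls /

# Create a directory (with parents) and upload a local file into it
bin/hadoop fs -mkdir -p /user/andy/input
bin/hadoop fs -put wc.input /user/andy/input

# Print the contents of a file stored in HDFS
bin/hadoop fs -cat /user/andy/input/wc.input
```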
Posted by Abarak on Fri, 21 Feb 2020 03:07:18 -0800
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf().setAppName("lg").setMaster('local')  # 'local' runs Spark locally with a single thread; 'local[4]' would use 4 cores
1. parallelize and collect
The parallelize function converts the list obj ...
Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800
Given a large file (1 TB? 10 TB?) in which each line stores a user's ID (IP? IQ?), and a computer with only 2 GB of memory, find the ten IDs that occur most frequently.
In recent years, the TopK problem has been one of the most frequently asked questions in technical interviews.
In fact, the answer is r ...
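A sketch of the standard answer to this kind of TopK problem: hash-partition the huge file into buckets small enough to count in memory (the same ID always lands in the same bucket, so per-bucket counts are globally correct), then merge the per-bucket candidates with a heap. The function name, bucket count, and sample data are illustrative:

```python
import heapq
from collections import Counter

def top_k_ids(lines, k=10, num_buckets=4):
    # 1) Partition: IDs with the same value always hash to the same bucket,
    #    so each bucket can be counted independently of the others.
    buckets = [[] for _ in range(num_buckets)]
    for line in lines:
        uid = line.strip()
        buckets[hash(uid) % num_buckets].append(uid)

    # 2) Count each bucket in memory and keep its local top-k as candidates.
    candidates = []
    for bucket in buckets:
        counts = Counter(bucket)
        candidates.extend(counts.most_common(k))

    # 3) A size-k selection over all candidates yields the global top-k.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

data = ["a", "b", "a", "c", "a", "b"]
print(top_k_ids(data, k=2))  # [('a', 3), ('b', 2)]
```

In the real 1 TB setting each bucket would be written to its own small file on disk and counted one file at a time, which is what keeps the 2 GB memory bound.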
Posted by hughmill on Mon, 17 Feb 2020 20:07:30 -0800
Recently my company has been building a big data system, and the architect recommended using Flink to build it. So today I am investigating Flink in my own virtual machine environment (Ubuntu 16.04).
From Ververica I learned the fundamentals of Flink, because I worked on Python data pr ...
Posted by aperales10 on Sun, 16 Feb 2020 18:33:05 -0800
The following error occurred while running the WordCount program in Eclipse with the plug-in (instead of manually packaging it and uploading it to the server):
DEBUG - LocalFetcher 1 going to fetch: attempt_local938878567_0001_m_000000_0
WARN - job_local938878567_0001
java.lang.Exception: org.apache. ...
Posted by jeffshead on Wed, 12 Feb 2020 06:32:57 -0800
Build our Spark platform from scratch
1. Preparing the CentOS environment
To build a real cluster environment with a highly available architecture, we need at least three virtual machines as cluster nodes, so I bought three Alibaba Cloud servers to serve as our cluster nodes.
Posted by knelson on Tue, 04 Feb 2020 23:47:57 -0800
After the NameNode fails, data can be recovered in the following two ways:
1. Copy the data in the SecondaryNameNode to the directory where the NameNode stores its data
(1) Kill the NameNode process with kill -9
[test@hadoop151 ~]$ jps
[test@hadoop151 ~]$ kill -9 3654
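The copy step in method 1 can be sketched as follows. The data directories are assumptions based on a typical `hadoop.tmp.dir` layout under a Hadoop 2.7.2 install (the version used elsewhere on this page); substitute the paths from your own hdfs-site.xml / core-site.xml:

```shell
# Run on the NameNode host after killing the process (paths are assumptions).
NN_DATA=/opt/module/hadoop-2.7.2/data/tmp/dfs/name
SNN_DATA=/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary

rm -rf $NN_DATA/*                      # clear the damaged NameNode metadata
cp -r $SNN_DATA/* $NN_DATA/            # copy the SecondaryNameNode checkpoint over
sbin/hadoop-daemon.sh start namenode   # restart the NameNode
```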
Posted by webtechatlantic on Mon, 03 Feb 2020 08:51:44 -0800