HBase shell operations (illustrated and very complete)

Keywords: Big Data Hadoop HBase

Purpose:
(1) Understand the role of HBase in the Hadoop architecture.
(2) Become proficient with the common HBase Shell commands.
Objectives:
(1) Become familiar with HBase operations, and master creating, modifying, querying, and deleting tables.
(2) Create a table yourself, practice the operations above, and insert at least 10 pieces of data for use with the filters later on.

Steps before running the operations: start Hadoop -> start HBase -> open the HBase shell

1, Data definition
1. Create table
(1) To create a table in HBase, you need to specify the table name and at least one column family name. For example, the command to create a student information table is as follows:

create 'Student', 'StuInfo', 'Grades'

This command creates a table named Student, which contains two column families, StuInfo and Grades. Names are case-sensitive, so the same spelling must be used in every later command. Note that in HBase Shell syntax, string parameters must be quoted. Single quotes are the usual choice; double quotes also work and additionally interpret escape sequences.

After creating the table, you can use the exists command to check whether the table exists, or the list command to list all the tables in the database.

exists 'Student'
list

You can use the describe command to view the column family information of the specified table. The describe command describes the detailed structure of the table, including how many column families there are and the parameter information of each column family.
describe 'Student'

2. Change the structure of the table
The parameters displayed by the describe command can be modified with the alter command.
The alter command handles table structure and table management in HBase: it can change column family parameters, add column families, delete column families, and change table-level settings. First, modify a column family parameter, such as the number of versions the column family keeps.

alter 'Student', {NAME =>'Grades', VERSIONS=>3}

Several column families can be modified in one command, using the same form as in the create command. Note that when a column family attribute is changed on a table that already holds data, HBase needs to modify all the data in that column family, so the change may take a long time if the amount of data is large.
If you need to add a column family hobby to the Student table, use the following command:

alter 'Student','hobby'

If you want to delete an existing column family, use either of the following two commands:

alter 'Student', {NAME=>'hobby',METHOD=>'delete'}

alter 'Student','delete'=>'hobby'


In addition, an HBase table must contain at least one column family, so the last remaining column family of a table cannot be deleted.
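The rules for adding, changing, and deleting column families can be sketched in a few lines of Python. This is only a toy model to make the rules concrete, not HBase code: the class ToySchema, its method names, and the default of one version per column family are assumptions made for this illustration.

```python
class ToySchema:
    """Hypothetical stand-in for a table definition: family name -> parameters."""

    def __init__(self, name, *families):
        if not families:
            raise ValueError("a table needs at least one column family")
        self.name = name
        # Assumption: one version per cell unless changed later.
        self.families = {f: {"VERSIONS": 1} for f in families}

    def alter(self, family, **params):
        # Adds the family if it is new, otherwise updates its parameters.
        self.families.setdefault(family, {"VERSIONS": 1}).update(params)

    def delete_family(self, family):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if len(self.families) == 1:
            raise ValueError("cannot delete the last column family")
        del self.families[family]


schema = ToySchema("Student", "StuInfo", "Grades")
schema.alter("Grades", VERSIONS=3)   # like: alter 'Student', {NAME=>'Grades', VERSIONS=>3}
schema.alter("hobby")                # like: alter 'Student','hobby'
schema.delete_family("hobby")        # like: alter 'Student','delete'=>'hobby'
print(sorted(schema.families))       # ['Grades', 'StuInfo']
```

Deleting StuInfo and then Grades from this model raises an error on the second call, which mirrors the rule that a table cannot lose its last column family.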
3. Delete table
A table must be disabled before it can be deleted. Use the following commands to delete a table:

disable 'Table name'
drop  'Table name'

After disabling a table with disable, you can use is_disabled to check whether it was disabled successfully. If you only want to clear all the data in a table, use the truncate command (truncate 'Table name'), which disables and drops the table and then recreates it with the original structure.
For practice, you can create a new table "yaohandle" and delete it.

2, Data operation
1.put adds a value to the specified cell
Insert data into HBase. Use the put command to add a new row of data to the table or overwrite the data of the specified row:

put 'Student', '0001', 'StuInfo:Name', 'Tom Green', 1

In the command above, the first parameter, Student, is the table name. The second parameter, 0001, is the row key, which is a string. The third parameter, StuInfo:Name, is the column family and column name, separated by a colon. The column family must already exist, otherwise HBase reports an error; the column name is defined on the fly, so the columns in a column family can be extended at will. The fourth parameter, Tom Green, is the value of the cell; HBase stores all data as uninterpreted bytes, and the shell passes the value as a string. The last parameter, 1, is the timestamp. If no timestamp is given, the system automatically uses the current time.
Note that each put command writes a single cell, so a row with several columns takes several put commands.
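To make the put parameters concrete, here is a minimal Python sketch of the data model described above. It is a toy model, not the HBase client API: ToyTable and its attributes are hypothetical names, and the nested dictionary layout (row key, then family:qualifier, then timestamp) is an assumption chosen for readability.

```python
import time
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory table: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts=None):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            # The column family must exist; qualifiers are created on demand.
            raise KeyError(f"unknown column family: {family}")
        if ts is None:
            ts = int(time.time() * 1000)  # default: current time in milliseconds
        # Writing to the same row, column, and timestamp overwrites the value.
        self.rows[row][column][ts] = value


student = ToyTable(["StuInfo", "Grades"])
student.put("0001", "StuInfo:Name", "Tom Green", 1)
student.put("0001", "StuInfo:Age", "18")          # timestamp filled in automatically
print(student.rows["0001"]["StuInfo:Name"])       # {1: 'Tom Green'}
```

A put to a family that was never created, such as student.put("0001", "Hobby:Sport", "chess"), raises an error in this model, just as the shell rejects an unknown column family.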
To make the filter examples later on easier to follow, insert more than 10 pieces of data, for example:

put 'Student', '0001', 'StuInfo:Name', 'Tom Green',1
put 'Student', '0001', 'StuInfo:Age', '18' 
put 'Student', '0001', 'StuInfo:Sex', 'Male'
put 'Student', '0001',  'Grades:BigData', '80'
put 'Student', '0001',  'Grades:Computer','90'
put 'Student', '0001', 'Grades:Math', '85'

put 'Student', '0002', 'StuInfo:Name', 'Jack',1
put 'Student', '0002', 'StuInfo:Age', '19' 
put 'Student', '0002', 'StuInfo:Sex', 'Male'
put 'Student', '0002', 'Grades:BigData', '85'
put 'Student', '0002', 'Grades:Java','83'
put 'Student', '0002', 'Grades:Math','82'

put 'Student', '0003', 'StuInfo:Name', 'Marry',1
put 'Student', '0003', 'StuInfo:Age', '20' 
put 'Student', '0003', 'StuInfo:Sex', 'Female'
put 'Student', '0003', 'Grades:BigData', '59'
put 'Student', '0003', 'Grades:Java','69'

put 'Student', '0004', 'StuInfo:Name', 'Gavin',1
put 'Student', '0004', 'StuInfo:Age', '21' 
put 'Student', '0004', 'StuInfo:Sex', 'Female'
put 'Student', '0004', 'Grades:BigData', '90'
put 'Student', '0004', 'Grades:Java','89'


put 'Student', '0005', 'StuInfo:Name', 'Sun',1
put 'Student', '0005', 'StuInfo:Age', '18' 
put 'Student', '0005', 'StuInfo:Sex', 'Female'
put 'Student', '0005', 'Grades:BigData', '99'
put 'Student', '0005', 'Grades:Java','94'


2.delete
The delete command deletes cell data from a table. Its syntax is similar to put: the table name, row key, and column family must be specified, while the column name and timestamp are optional.

delete 'Student','0005','Grades'

Note that the delete operation does not remove the data immediately. It only marks the corresponding data with a delete marker (a tombstone); the data is physically removed later, when the store files are merged during compaction. In addition, the smallest granularity of the delete command is the cell. For example, the following command deletes the data in the Student table with row key 0005, column StuInfo:Age, and a timestamp less than or equal to 2:

delete 'Student','0005','StuInfo:Age',2
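The tombstone idea can be sketched with a small Python model. It follows the description in the text (versions at or below the given timestamp are hidden until a merge removes them); the exact behaviour may differ between HBase versions, and ToyCellStore with its method names is hypothetical.

```python
class ToyCellStore:
    """Hypothetical model of the versions of one cell plus its delete markers."""

    def __init__(self):
        self.versions = {}     # timestamp -> value
        self.tombstones = []   # each marker hides versions with ts <= marker

    def put(self, ts, value):
        self.versions[ts] = value

    def delete(self, ts):
        # Nothing is removed yet; only a marker is recorded.
        self.tombstones.append(ts)

    def visible(self):
        cutoff = max(self.tombstones, default=-1)
        return {ts: v for ts, v in self.versions.items() if ts > cutoff}

    def compact(self):
        # Physically drop the hidden versions, then discard the markers.
        self.versions = self.visible()
        self.tombstones = []


age = ToyCellStore()
age.put(1, "18")
age.put(2, "19")
age.put(3, "20")
age.delete(2)
print(age.visible())        # {3: '20'}
print(len(age.versions))    # 3, the hidden versions are still stored
age.compact()
print(len(age.versions))    # 1
```

The point of the sketch is the gap between what a read sees (visible) and what is still stored (versions) until the merge runs.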

The delete command cannot operate across column families. To delete the data of all column families in a row, that is, to delete a logical row, use the deleteall command as shown below. You do not need to specify a column family or column name.

deleteall 'Student','0005'

3.get: obtain row or cell data through parameters such as table name and row key

get 'Student','0001',{COLUMN=>'StuInfo',VERSIONS=>3}
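What VERSIONS means on the read side can be sketched as follows. This is a toy Python function, not HBase code: read_versions and the sample history are made up for illustration, and the sketch assumes the column family still holds all the listed versions.

```python
def read_versions(cell_versions, max_versions=1):
    """Return up to max_versions (timestamp, value) pairs, newest first."""
    newest_first = sorted(cell_versions.items(), reverse=True)
    return newest_first[:max_versions]


# Hypothetical history of one cell: timestamp -> value
name_history = {1: "Tom Green", 5: "Tom G.", 9: "Thomas Green", 12: "T. Green"}

print(read_versions(name_history))     # [(12, 'T. Green')]
print(read_versions(name_history, 3))  # [(12, 'T. Green'), (9, 'Thomas Green'), (5, 'Tom G.')]
```

In a real table, the number of versions a read can return is also capped by the VERSIONS setting of the column family, because older versions beyond that cap are not kept.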

4.scan: traverse the table and output row records that meet the conditions.
Specify the name of the column family:

scan 'Student',{COLUMN=>'StuInfo'}

Specify the name of the column family and column:

scan 'Student',{COLUMN=>'StuInfo:Name'}

Specify the number of output rows:

scan 'Student',{LIMIT=>1}

Specify output row key range:

scan 'Student',{STARTROW=>'0003',ENDROW=>'0003'}
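How a row key range selects rows can be sketched in Python. The sketch assumes the usual rules: row keys are ordered byte by byte, the start row is included, and the stop row is excluded. The function scan_rows is hypothetical, and it ignores the special case where the start and stop rows are equal, which some HBase versions may treat like a get of that single row.

```python
from bisect import bisect_left


def scan_rows(row_keys, start=None, stop=None, limit=None):
    """Return row keys in [start, stop), in byte order, at most `limit` of them."""
    keys = sorted(k.encode() for k in row_keys)          # byte-wise order
    lo = bisect_left(keys, start.encode()) if start else 0
    hi = bisect_left(keys, stop.encode()) if stop else len(keys)
    selected = [k.decode() for k in keys[lo:hi]]
    return selected[:limit] if limit else selected


keys = ["0003", "0001", "0005", "0002", "0004"]
print(scan_rows(keys, start="0002", stop="0004"))  # ['0002', '0003']
print(scan_rows(keys, limit=1))                    # ['0001']
print(scan_rows(["2", "10", "0005"]))              # ['0005', '10', '2']
```

The last call shows why zero-padded row keys such as 0001 are convenient: without padding, "10" sorts before "2".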

3, Filter operation
In HBase, both get and scan operations can use filters to restrict the output, similar to the WHERE condition in SQL. Use the show_filters command to view the filter types currently supported by HBase.

1. Row key filter
RowFilter works with comparison operators and comparators to filter on row keys. For example, the binary comparator can match row keys greater than 0001, and the substring comparator can match row keys that contain 0001. Note that the substring comparator supports only the = and != operators, not greater than or less than.

scan 'Student',FILTER=>"RowFilter(=,'substring:0001')"

scan 'Student',FILTER=>"RowFilter(>,'binary:0001')"
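The difference between the two comparators can be sketched in Python. This is a toy model under stated assumptions: binary compares the bytes of the row key, substring tests containment without regard to case, and row_filter itself is a made-up helper, not part of HBase.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def row_filter(row_keys, op, comparator):
    """Apply a 'kind:value' comparator with the given operator to each row key."""
    kind, _, value = comparator.partition(":")
    if kind == "binary":
        def test(key):
            return OPS[op](key.encode(), value.encode())
    elif kind == "substring":
        if op not in ("=", "!="):
            raise ValueError("substring supports only = and !=")

        def test(key):
            return (value.lower() in key.lower()) == (op == "=")
    else:
        raise ValueError(f"comparator not modelled here: {kind}")
    return [k for k in row_keys if test(k)]


rows = ["0001", "0002", "0003", "0004", "0005"]
print(row_filter(rows, "=", "substring:0001"))  # ['0001']
print(row_filter(rows, ">", "binary:0001"))     # ['0002', '0003', '0004', '0005']
```

Asking for row_filter(rows, ">", "substring:0001") raises an error in this model, which matches the restriction on the substring comparator.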

PrefixFilter: a row key prefix comparator that compares row key prefixes

scan 'Student',FILTER=>"PrefixFilter('0001')"

KeyOnlyFilter: returns only the key of each cell (row key, column, and timestamp) and leaves out the value

scan 'Student',FILTER=>"KeyOnlyFilter()"

FirstKeyOnlyFilter: returns only the first cell of each row, as a key value pair

scan 'Student',FILTER=>"FirstKeyOnlyFilter()"

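InclusiveStopFilter: takes a stop row key in place of ENDROW and includes the stop row itself in the results. Its argument is the plain row key, without a comparator prefix such as binary:

scan 'Student',{STARTROW=>'0001',FILTER=>"InclusiveStopFilter('0002')"}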
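The only difference from a plain row range is whether the stop row is returned. A short Python sketch, with made-up helper names and the assumption that ENDROW is exclusive:

```python
def scan_exclusive(row_keys, start, stop):
    """Plain range scan: start row included, stop row excluded."""
    return [k for k in sorted(row_keys) if start <= k < stop]


def scan_inclusive_stop(row_keys, start, stop_row):
    """Range scan that also returns the stop row itself."""
    return [k for k in sorted(row_keys) if start <= k <= stop_row]


keys = ["0001", "0002", "0003", "0004", "0005"]
print(scan_exclusive(keys, "0001", "0002"))       # ['0001']
print(scan_inclusive_stop(keys, "0001", "0002"))  # ['0001', '0002']
```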

2. Column family and column filter
The filter for column families is FamilyFilter. Its syntax is similar to that of RowFilter, except that FamilyFilter compares column family names. For example, the following command scans the Student table and displays only the cells whose column family is Grades.

scan 'Student',FILTER=>"FamilyFilter(=,'substring:Grades')"

QualifierFilter: a column name (qualifier) filter that displays only the data of the columns whose name matches

scan 'Student',FILTER=>"QualifierFilter(=,'substring:Math')"

ColumnPrefixFilter: filters on a prefix of the column name

scan 'Student',FILTER=>"ColumnPrefixFilter('Ma')"

MultipleColumnPrefixFilter: like ColumnPrefixFilter, but accepts several column name prefixes

scan 'Student',FILTER=>"MultipleColumnPrefixFilter('Ma','Ag')"

ColumnRangeFilter: filter the range of column names

scan 'Student',FILTER=>"ColumnRangeFilter('Big',true,'Math',false)"
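The column name filters above boil down to prefix tests and a range test on the column name. A minimal Python sketch, where column_prefix and column_range are hypothetical helpers and the list of column names is taken from the sample data:

```python
def column_prefix(qualifiers, *prefixes):
    """Keep column names that start with any of the given prefixes."""
    return [q for q in qualifiers if q.startswith(prefixes)]


def column_range(qualifiers, low, low_inclusive, high, high_inclusive):
    """Keep column names between low and high, honouring the two inclusive flags."""
    def in_range(q):
        above = q >= low if low_inclusive else q > low
        below = q <= high if high_inclusive else q < high
        return above and below
    return [q for q in sorted(qualifiers) if in_range(q)]


qualifiers = ["Age", "BigData", "Computer", "Java", "Math", "Name", "Sex"]
print(column_prefix(qualifiers, "Ma"))                        # ['Math']
print(column_prefix(qualifiers, "Ma", "Ag"))                  # ['Age', 'Math']
print(column_range(qualifiers, "Big", True, "Math", False))   # ['BigData', 'Computer', 'Java']
```

In the range example, Math is left out because the upper bound is marked as not inclusive.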

3. Value filter
HBase also provides filters that work on the values of cells, that is, value filters.
ValueFilter: a value filter that returns the key value pairs whose value meets the condition

scan 'Student',FILTER=>"ValueFilter(=,'substring:Jack')"

get 'Student','0004',FILTER=>"ValueFilter(=,'substring:Jack')"

SingleColumnValueFilter: tests the value of one specified column family and column, and returns the whole row when that value matches

scan 'Student',FILTER=>"SingleColumnValueFilter('StuInfo','Name',=,'substring:Jack')"

SingleColumnValueExcludeFilter: works like SingleColumnValueFilter, but leaves the tested column itself out of the returned rows

scan 'Student',FILTER=>"SingleColumnValueExcludeFilter('StuInfo','Name',=,'substring:Jack')"
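The three value filters differ in what they return, which is easy to see in a small Python model. This is a sketch under assumptions, not HBase code: the helper names are made up, matching is a case-insensitive substring test, and rows that lack the tested column pass the single column filter, which is the default behaviour as far as I know.

```python
rows = {
    "0002": {"StuInfo:Name": "Jack", "StuInfo:Age": "19"},
    "0003": {"StuInfo:Name": "Marry", "StuInfo:Age": "20"},
}


def value_filter(rows, needle):
    """Keep only the individual cells whose value contains needle."""
    out = {}
    for key, cells in rows.items():
        kept = {c: v for c, v in cells.items() if needle.lower() in v.lower()}
        if kept:
            out[key] = kept
    return out


def single_column_value_filter(rows, column, needle, exclude_tested=False):
    """Keep whole rows whose tested column contains needle."""
    out = {}
    for key, cells in rows.items():
        value = cells.get(column)
        if value is not None and needle.lower() not in value.lower():
            continue  # tested column is present but does not match
        out[key] = {c: v for c, v in cells.items()
                    if not (exclude_tested and c == column)}
    return out


print(value_filter(rows, "Jack"))
# {'0002': {'StuInfo:Name': 'Jack'}}
print(single_column_value_filter(rows, "StuInfo:Name", "Jack"))
# {'0002': {'StuInfo:Name': 'Jack', 'StuInfo:Age': '19'}}
print(single_column_value_filter(rows, "StuInfo:Name", "Jack", exclude_tested=True))
# {'0002': {'StuInfo:Age': '19'}}
```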

4. Other filters
ColumnCountGetFilter: limits the number of key value pairs returned per logical row, which is used in the get method

get 'Student','0001',FILTER=>"ColumnCountGetFilter(3)"

TimestampsFilter: a timestamp filter. It matches cells whose timestamp is exactly equal to one of the listed values; several timestamps can be given

scan 'Student',FILTER=>"TimestampsFilter(1,4)"

InclusiveStopFilter: sets the stop row, which is included in the results

scan 'Student',{STARTROW=>'0001',ENDROW=>'0004',FILTER=>"InclusiveStopFilter('0003')"}

PageFilter: paginates the results by row, limiting the number of rows returned

scan 'Student',{STARTROW=>'0001',ENDROW=>'0004',FILTER=>"PageFilter(3)"}

ColumnPaginationFilter(limit, offset): paginates the columns within each row and returns at most limit columns starting at position offset, that is, the columns in the range [offset, offset+limit)

scan 'Student',{STARTROW=>'0001',ENDROW=>'0004',FILTER=>"ColumnPaginationFilter(2,1)"}
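The two pagination filters cut in different directions: PageFilter limits rows, ColumnPaginationFilter limits columns inside each row. A toy Python sketch with made-up helper names and a cut-down copy of the sample data; it assumes columns are ordered by family and then by column name, and it ignores the fact that on a real cluster a page limit is applied separately on each region server.

```python
rows = {
    "0001": {"Grades:BigData": "80", "Grades:Math": "85",
             "StuInfo:Age": "18", "StuInfo:Name": "Tom Green"},
    "0002": {"Grades:BigData": "85", "Grades:Java": "83", "StuInfo:Age": "19"},
    "0003": {"Grades:BigData": "59", "StuInfo:Age": "20"},
    "0004": {"Grades:BigData": "90", "StuInfo:Age": "21"},
}


def page_filter(rows, page_size):
    """Keep the first page_size rows in row key order."""
    return dict(sorted(rows.items())[:page_size])


def column_pagination(rows, limit, offset):
    """Within each row, keep at most `limit` columns starting at position `offset`."""
    return {key: dict(sorted(cells.items())[offset:offset + limit])
            for key, cells in rows.items()}


print(list(page_filter(rows, 3)))             # ['0001', '0002', '0003']
print(column_pagination(rows, 2, 1)["0001"])  # {'Grades:Math': '85', 'StuInfo:Age': '18'}
```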

Well, that is all for this post. See you in the next one.

Posted by djBuilder on Wed, 17 Nov 2021 08:45:22 -0800