In user-profile (user portrait) scenarios, many tags are usually developed, and each tag is stored as a column qualifier. Tags that are no longer in use need to be taken offline and their data cleaned up. However, the delete APIs provided by HBase operate on one row at a time, so cleaning up all the data of a qualifier is not easy. Here we provide an implementation scheme based on a coprocessor.
HBase provides the following five hooks for embedding custom code in the compaction process:
- preCompactSelection
- postCompactSelection
- preCompactScannerOpen
- preCompact
- postCompact
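preCompact is called after the StoreScanner has been created and before any data is read from it. The idea is therefore to proxy the scanner: return a new scanner whose next method reads from the original one and then filters the raw cells it returned. Cells dropped at this point are never written to the compacted file, so the qualifier's data is physically removed.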
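The proxy idea can be sketched in plain Java, without any HBase dependency. This is only an illustration under simplifying assumptions: `RowScanner` and `ProxyScannerSketch` are hypothetical names standing in for HBase's `InternalScanner`, and each cell is modeled as just its qualifier string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for HBase's InternalScanner (not a real HBase type).
interface RowScanner {
    /** Appends the cells of the next row to result; returns true if more rows remain. */
    boolean next(List<String> result);
}

final class ProxyScannerSketch {

    /** In-memory scanner, playing the role of the scanner handed to preCompact. */
    static RowScanner overRows(List<List<String>> rows) {
        Iterator<List<String>> it = rows.iterator();
        return result -> {
            if (it.hasNext()) {
                result.addAll(it.next());
            }
            return it.hasNext();
        };
    }

    /** Wraps a scanner so that cells with the given qualifier never reach the caller. */
    static RowScanner dropping(RowScanner delegate, String qualifier) {
        return result -> {
            boolean more = delegate.next(result);            // read the raw row first
            result.removeIf(cell -> cell.equals(qualifier)); // then filter it in place
            return more;
        };
    }

    /** Drains a scanner row by row, collecting every non-empty row it yields. */
    static List<List<String>> drain(RowScanner scanner) {
        List<List<String>> out = new ArrayList<>();
        boolean more;
        do {
            List<String> row = new ArrayList<>();
            more = scanner.next(row);
            if (!row.isEmpty()) {
                out.add(row);
            }
        } while (more);
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("q1", "q2"),
            Arrays.asList("q1", "q2", "q3"));
        System.out.println(drain(dropping(overRows(rows), "q1"))); // prints [[q2], [q2, q3]]
    }
}
```

The caller only ever sees the wrapper, so it cannot tell that filtering happened. That is the property the coprocessor relies on when it substitutes its own scanner for the one used by compaction.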
The coprocessor code is as follows; it is modeled on the ValueRewritingObserver class in the hbase-examples module:
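    import java.io.IOException;
    import java.util.List;
    import java.util.Optional;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.CoprocessorEnvironment;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.coprocessor.RegionObserver;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;
    import org.apache.hadoop.hbase.regionserver.ScanType;
    import org.apache.hadoop.hbase.regionserver.ScannerContext;
    import org.apache.hadoop.hbase.regionserver.Store;
    import org.apache.hadoop.hbase.regionserver.compactions.CompactionLifeCycleTracker;
    import org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class QualifierDeletingObserver implements RegionObserver, RegionCoprocessor {
      private static final Logger LOG = LoggerFactory.getLogger(QualifierDeletingObserver.class);

      // Qualifier whose cells are dropped during compaction; read from the coprocessor arguments.
      private byte[] qualifierToDelete = null;

      @Override
      public Optional<RegionObserver> getRegionObserver() {
        // Extremely important to be sure that the coprocessor is invoked as a RegionObserver
        return Optional.of(this);
      }

      @Override
      public void start(@SuppressWarnings("rawtypes") CoprocessorEnvironment env) throws IOException {
        RegionCoprocessorEnvironment renv = (RegionCoprocessorEnvironment) env;
        qualifierToDelete = Bytes.toBytes(renv.getConfiguration().get("qualifier.to.delete"));
      }

      @Override
      public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> c, Store store,
          final InternalScanner scanner, ScanType scanType, CompactionLifeCycleTracker tracker,
          CompactionRequest request) {
        // Proxy the compaction scanner: read each batch, then drop the unwanted cells.
        return new InternalScanner() {
          @Override
          public boolean next(List<Cell> result, ScannerContext scannerContext) throws IOException {
            boolean ret = scanner.next(result, scannerContext);
            // removeIf visits every cell once, so adjacent matches (e.g. several versions) all go.
            result.removeIf(cell -> CellUtil.matchingQualifier(cell, qualifierToDelete));
            return ret;
          }

          @Override
          public void close() throws IOException {
            scanner.close();
          }
        };
      }
    }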
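When cells are filtered out of the result list in place, the removal strategy matters. Calling `remove(i)` inside a forward index loop shifts the following element into slot `i`, and the loop then skips it. This can leave data behind when two matching cells sit next to each other, for example when the column family keeps more than one version. A minimal plain-Java sketch of the difference, with cells again modeled as qualifier strings and `InPlaceRemovalSketch` as a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InPlaceRemovalSketch {

    /** Forward index loop: after remove(i) the next element slides into slot i and is skipped. */
    static List<String> forwardLoop(List<String> cells, String target) {
        List<String> result = new ArrayList<>(cells);
        for (int i = 0; i < result.size(); i++) {
            if (result.get(i).equals(target)) {
                result.remove(i);
            }
        }
        return result;
    }

    /** removeIf tests every element exactly once, so adjacent matches are all removed. */
    static List<String> withRemoveIf(List<String> cells, String target) {
        List<String> result = new ArrayList<>(cells);
        result.removeIf(cell -> cell.equals(target));
        return result;
    }

    public static void main(String[] args) {
        // Two q1 cells next to each other, as a row with multiple versions could contain.
        List<String> row = Arrays.asList("q1", "q1", "q2");
        System.out.println(forwardLoop(row, "q1"));  // prints [q1, q2]
        System.out.println(withRemoveIf(row, "q1")); // prints [q2]
    }
}
```

Iterating backwards or using an explicit `Iterator` with `remove()` would work as well; `removeIf` is simply the shortest form.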
Package the class into a jar and upload it to HDFS.
The following is a simple test run:
    create 'cp_test','f'
    put 'cp_test','rk1','f:q1','123'
    put 'cp_test','rk1','f:q2','123'
    put 'cp_test','rk2','f:q1','123'
    put 'cp_test','rk2','f:q2','123'
    put 'cp_test','rk2','f:q3','123'

    hbase(main):015:0> scan 'cp_test'
    ROW     COLUMN+CELL
     rk1    column=f:q1, timestamp=1590567958995, value=123
     rk1    column=f:q2, timestamp=1590567959023, value=123
     rk2    column=f:q1, timestamp=1590567959048, value=123
     rk2    column=f:q2, timestamp=1590567959073, value=123
     rk2    column=f:q3, timestamp=1590567959842, value=123

    alter 'cp_test', METHOD => 'table_att', 'coprocessor'=>'hdfs://xxx.jar|xxx.QualifierDeletingObserver|1024|qualifier.to.delete=q1'
    flush 'cp_test'
    major_compact 'cp_test'

    hbase(main):017:0> scan 'cp_test'
    ROW     COLUMN+CELL
     rk1    column=f:q2, timestamp=1590567959023, value=123
     rk2    column=f:q2, timestamp=1590567959073, value=123
     rk2    column=f:q3, timestamp=1590567959842, value=123
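The value passed to the coprocessor table attribute packs four fields separated by `|`: the jar path, the class name, the priority, and the arguments as `key=value` pairs. The arguments end up in the configuration the coprocessor reads in start(), which is how qualifier.to.delete reaches the observer. As a rough illustration of that layout only, the parser below is a hypothetical sketch (`CoprocessorSpecSketch` is not HBase's own parsing code) and assumes all four fields are present, with arguments separated by commas:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative parser for a table-attribute coprocessor spec; not HBase's implementation.
final class CoprocessorSpecSketch {
    final String jarPath;
    final String className;
    final int priority;
    final Map<String, String> args;

    private CoprocessorSpecSketch(String jarPath, String className, int priority,
                                  Map<String, String> args) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.args = args;
    }

    /** Parses "jarPath|className|priority|k1=v1,k2=v2"; this sketch requires all four fields. */
    static CoprocessorSpecSketch parse(String spec) {
        String[] parts = spec.split("\\|", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 '|'-separated fields, got " + parts.length);
        }
        Map<String, String> args = new LinkedHashMap<>();
        for (String pair : parts[3].split(",")) {
            if (pair.trim().isEmpty()) {
                continue; // tolerate an empty argument list
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("bad argument: " + pair);
            }
            args.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return new CoprocessorSpecSketch(parts[0].trim(), parts[1].trim(),
                Integer.parseInt(parts[2].trim()), args);
    }

    public static void main(String[] a) {
        CoprocessorSpecSketch spec = parse(
            "hdfs://xxx.jar|xxx.QualifierDeletingObserver|1024|qualifier.to.delete=q1");
        // prints xxx.QualifierDeletingObserver priority=1024 args={qualifier.to.delete=q1}
        System.out.println(spec.className + " priority=" + spec.priority + " args=" + spec.args);
    }
}
```

To take another tag offline, only the qualifier.to.delete argument needs to change; the jar and class stay the same.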