Detailed instructions for using the sensitive-word tool in Java

Keywords: Java Maven GitHub

sensitive-word

In everyday work, wherever users can post freely (blogs, documents, forums), the sensitivity of the content has to be considered.

sensitive-word is a high-performance sensitive word tool based on the DFA algorithm. It is implemented in Java to help solve this common problem.

Features

  • A lexicon of 60,000+ words (6W+), continuously optimized and updated

  • Good performance, based on the DFA algorithm

  • Elegant and simple to use, built on a fluent API

  • Supports common operations on sensitive words: checking, returning, and masking (desensitizing)

  • Supports full-width and half-width character interchange

  • Supports upper-case and lower-case interchange for English
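
To make the DFA idea concrete, here is a minimal sketch of dictionary matching with a character-by-character state machine (in effect a trie). It is an illustration only and not the library's actual implementation: the class DfaSketch, its methods, and the sample words are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of DFA-style matching; not the library's code. */
final class DfaSketch {
    /** One state: transitions keyed by character, plus an end-of-word flag. */
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        boolean end;
    }

    private final Node root = new Node();

    /** Adds one word to the automaton, creating states as needed. */
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            Node child = node.next.get(c);
            if (child == null) {
                child = new Node();
                node.next.put(c, child);
            }
            node = child;
        }
        node.end = true;
    }

    /** Returns the length of the longest word starting at index start, or 0 if none. */
    private int matchLength(String text, int start) {
        Node node = root;
        int longest = 0;
        for (int i = start; i < text.length(); i++) {
            node = node.next.get(text.charAt(i));
            if (node == null) {
                break; // no transition: stop walking from this start index
            }
            if (node.end) {
                longest = i - start + 1;
            }
        }
        return longest;
    }

    /** Returns every matched word, scanning left to right and skipping past each match. */
    List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = matchLength(text, i);
            if (len > 0) {
                found.add(text.substring(i, i + len));
                i += len;
            } else {
                i++;
            }
        }
        return found;
    }

    /** Returns true if the text contains at least one added word. */
    boolean contains(String text) {
        return !findAll(text).isEmpty();
    }
}
```

After adding "bad", "badge" and "evil", findAll("a badge for evil") returns [badge, evil]. Preferring the longest match at each position is a choice made in this sketch, not necessarily what the library does.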

Quick Start

Prerequisites

  • JDK 1.7+

  • Maven 3.x+

Maven dependency

<dependency>
    <groupId>com.github.houbb</groupId>
    <artifactId>sensitive-word</artifactId>
    <version>0.0.4</version>
</dependency>

API overview

SensitiveWordBs serves as the bootstrap class for sensitive words, with the following core methods:

Method | Parameter | Return value | Description
newInstance() | none | bootstrap class | Initializes the bootstrap class
contains(String) | string to check | boolean | Checks whether the string contains sensitive words
findAll(String) | string to check | list of strings | Returns all sensitive words in the string
replace(String, char) | replaces sensitive words with the specified char | string | Returns the desensitized string
replace(String) | replaces sensitive words with * | string | Returns the desensitized string

Usage examples

For all test cases, see SensitiveWordBsTest.

Determine whether sensitive words are included

final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen.";

Assert.assertTrue(SensitiveWordBs.newInstance().contains(text));

Return the first sensitive word

final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen.";

String word = SensitiveWordBs.newInstance().findFirst(text);
// The word is returned exactly as it appears in the text.
Assert.assertEquals("five-star red flag", word);

Return all sensitive words

final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen.";

List<String> wordList = SensitiveWordBs.newInstance().findAll(text);
// Words are listed exactly as they appear in the text, in order of occurrence.
Assert.assertEquals("[five-star red flag, Chairman Mao, Tian'anmen]", wordList.toString());

Default Replacement Policy

final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen.";
String result = SensitiveWordBs.newInstance().replace(text);
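// Each sensitive word is masked with '*'; the mask lengths (4, 3, 3) are kept as given in the source.
Assert.assertEquals("The **** flutters in the wind, and the portrait of *** stands in front of ***.", result);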

Specify the replacement character

final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen.";
String result = SensitiveWordBs.newInstance().replace(text, '0');
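// Each sensitive word is masked with '0'; the mask lengths (4, 3, 3) are kept as given in the source.
Assert.assertEquals("The 0000 flutters in the wind, and the portrait of 000 stands in front of 000.", result);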
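
The replacement step itself can be sketched without the library. The following is a minimal, self-contained illustration, assuming one mask character per character of the matched word (an assumption about the behavior, not something confirmed here). MaskSketch and the word list passed to it are hypothetical; the real tool uses its own lexicon.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of masking; not the library's code. */
final class MaskSketch {
    /** Replaces every occurrence of each listed word with the mask character, one per character. */
    static String replace(String text, List<String> words, char mask) {
        char[] out = text.toCharArray();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // an empty word would never advance the search
            }
            int from = text.indexOf(word);
            while (from >= 0) {
                Arrays.fill(out, from, from + word.length(), mask);
                from = text.indexOf(word, from + word.length());
            }
        }
        return new String(out);
    }

    /** Convenience overload that masks with '*'. */
    static String replace(String text, List<String> words) {
        return replace(text, words, '*');
    }
}
```

With the words "bad" and "evil", replace("a bad, evil plan", words) gives "a ***, **** plan", and passing '0' as the third argument gives "a 000, 0000 plan".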

More features

The features that follow mainly handle a variety of special cases, to raise the hit rate for sensitive words as far as possible.

This is a long *** battle.

Ignore case

final String text = "fuCK the bad words.";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("fuCK", word);
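
One way to get this behavior is to compare characters case-insensitively while still returning the substring exactly as the user wrote it. The sketch below shows the idea with the standard String.regionMatches method; IgnoreCaseSketch and its word list are hypothetical, and this is not how the library is necessarily implemented.

```java
import java.util.List;

/** Hypothetical illustration of case-insensitive lookup; not the library's code. */
final class IgnoreCaseSketch {
    /** Returns the first listed word found in text ignoring case, as written in text; null if none. */
    static String findFirst(String text, List<String> words) {
        for (int i = 0; i < text.length(); i++) {
            for (String word : words) {
                // regionMatches(true, ...) compares the two regions ignoring case
                if (!word.isEmpty() && text.regionMatches(true, i, word, 0, word.length())) {
                    return text.substring(i, i + word.length());
                }
            }
        }
        return null;
    }
}
```

Given the word "darn", findFirst("well DaRn it", words) returns "DaRn": the match ignores case, but the result keeps the original casing, as in the example above.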

Ignore full-width and half-width differences

// The first word of the text is written in full-width characters.
final String text = "ｆｕｃｋ the bad words.";

String word = SensitiveWordBs.newInstance().findFirst(text);
Assert.assertEquals("ｆｕｃｋ", word);
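
Full-width handling usually comes down to mapping each full-width character to its half-width counterpart before comparing. In Unicode, the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above ASCII 0x21 to 0x7E, and the ideographic space U+3000 corresponds to an ordinary space. A minimal sketch of that mapping follows; WidthSketch is a hypothetical helper, not part of the library.

```java
/** Hypothetical illustration of full-width to half-width conversion; not the library's code. */
final class WidthSketch {
    /** Maps full-width ASCII variants and the ideographic space to their half-width forms. */
    static String toHalfWidth(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\u3000') {
                sb.append(' '); // ideographic space becomes a normal space
            } else if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - 0xFEE0)); // shift into the ASCII range
            } else {
                sb.append(c); // everything else is left unchanged
            }
        }
        return sb.toString();
    }
}
```

Because the mapping is one character to one character, indexes in the converted text line up with the original, so a matcher can compare converted characters and still return the original substring.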

Future roadmap

  • Number conversion

  • Traditional and Simplified Chinese interchange

  • Repeated words

  • Stop words

  • Phonetic (pinyin) interchange

  • User-defined sensitive words and whitelist

  • Mirrored (flipped) text

  • Sensitive word tag support

Further reading

Implementing a sensitive word tool

Explanation of the DFA algorithm

Optimizing the sensitive word lexicon

Notes on stop words

Posted by jbruns on Thu, 09 Jan 2020 20:34:07 -0800