sensitive-word
In normal work, as long as the user can speak freely (blog, document, forum), the sensitivity of the content should be considered.
sensitive-word A high performance sensitive word tool based on DFA algorithm.Tools are implemented in java to help us solve common problems.
Characteristic
-
6W+Lexicon with continuous optimization updates
-
Better performance based on DFA algorithm
-
Using elegance and simplicity based on fluent-api implementation
-
Supports common operations such as judgment, return, desensitization of sensitive words
-
Supports full-angle and half-angle interchange
- Supports case-to-case exchange in English
Quick Start
Get ready
-
JDK1.7+
- Maven 3.x+
Maven Introduction
<dependency> <groupId>com.github.houbb</groupId> <artifactId>sensitive-word</artifactId> <version>0.0.4</version> </dependency>
Overview of api
SensitiveWordBs serve as a guide class for sensitive words, with the following core methods:
Method | parameter | Return value | Explain |
---|---|---|---|
newInstance() | nothing | Bootstrap Class | Initialize boot class |
contains(String) | String to verify | Boolean Value | Verify that the string contains sensitive words |
findAll(String) | String to verify | String List | Returns all sensitive words in a string |
replace(String, char) | Replace sensitive words with specified char | Character string | Returns the desensitized string |
replace(String) | Use * to replace sensitive words | Character string | Returns the desensitized string |
Use case
All test cases see SensitiveWordBsTest
Determine whether sensitive words are included
final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen."; Assert.assertTrue(SensitiveWordBs.newInstance().contains(text));
Return to the first sensitive word
final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen."; String word = SensitiveWordBs.newInstance().findFirst(text); Assert.assertEquals("Five-star red flag", word);
Return all sensitive words
final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen."; List<String> wordList = SensitiveWordBs.newInstance().findAll(text); Assert.assertEquals("[Five-star red flag, Chairman Mao, Tiananmen]", wordList.toString());
Default Replacement Policy
final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen."; String result = SensitiveWordBs.newInstance().replace(text); Assert.assertEquals("****Flying in the wind,***The portrait of***Front.", result);
Specify what to replace
final String text = "The five-star red flag flutters in the wind, and the portrait of Chairman Mao stands in front of Tian'anmen."; String result = SensitiveWordBs.newInstance().replace(text, '0'); Assert.assertEquals("0000 With the wind blowing, 000 portraits stand before 1000.", result);
More features
Subsequent features, mainly for a variety of situations, to improve the hit rate of sensitive words as much as possible.
This is a long *** battle.
ignore case
final String text = "fuCK the bad words."; String word = SensitiveWordBs.newInstance().findFirst(text); Assert.assertEquals("fuCK", word);
Ignore Half Corner Roundness
final String text = "fuck the bad words."; String word = SensitiveWordBs.newInstance().findFirst(text); Assert.assertEquals("fuck", word);
Late road-map
-
Conversion of numbers
-
Traditional-Simplified Interchange
-
Repeated Words
-
Pause
-
Phonetic Interchange
-
User-defined Sensitive Words and Whitelist
-
Text Mirror Flip
- Sensitive Word Label Support
Expand reading
Implementing Sensitive Words Tool