AntiSamy is an open source OWASP project. It checks and sanitizes the HTML / CSS / JavaScript that users submit so that the input conforms to the application's rules. AntiSamy is widely used to defend web services against stored and reflected XSS.
1. Maven dependency
AntiSamy can be added to a project as a single dependency. At runtime it relies on xercesImpl, batik and nekohtml, which are pulled in together with it by default.
<!-- OWASP AntiSamy -->
<dependency>
    <groupId>org.owasp.antisamy</groupId>
    <artifactId>antisamy</artifactId>
    <version>1.5.5</version>
</dependency>
2. Policy file
AntiSamy filters "malicious code" according to a policy file, which specifies how each tag and attribute is handled. How strictly the policy file is written determines how well AntiSamy defends against XSS vulnerabilities.
The AntiSamy jar ships with several commonly used policy files.
We can write a policy file from scratch to filter user input, but more often we take one of the existing policy files and adjust it slightly to fit the actual needs of the project.
XML is a natural choice for describing rules of this kind, and AntiSamy policy files are written in XML. As shown in the figure, an AntiSamy policy file with its header removed can be divided into eight parts:
1),directives
Global configuration: controls AntiSamy's filtering and validation rules and its input and output formats.
<directives>
    <directive name="omitXmlDeclaration" value="true"/>
    <directive name="omitDoctypeDeclaration" value="true"/>
    <directive name="maxInputSize" value="200000"/>
    <directive name="useXHTML" value="true"/>
    <directive name="formatOutput" value="true"/>
    <directive name="nofollowAnchors" value="true" />
    <directive name="validateParamAsEmbed" value="true" />
    <!--
    remember, this won't work for relative URIs - AntiSamy doesn't
    know anything about the URL or your web structure
    -->
    <directive name="embedStyleSheets" value="false"/>
    <directive name="connectionTimeout" value="5000"/>
    <directive name="maxStyleSheetImports" value="3"/>
</directives>
2),common-regexps
Common regular expressions. Each one is given a name and can be referenced by that name wherever it is needed.
<common-regexps>
    <regexp name="numberOrPercent" value="(\d)+(%{0,1})"/>
    <regexp name="paragraph" value="([\p{L}\p{N},'\.\s\-_\(\)\?]|&amp;[0-9]{2};)*"/>
    <regexp name="htmlId" value="[a-zA-Z0-9\:\-_\.]+"/>
</common-regexps>
If the "htmlId" rule is needed later, it can be referenced directly through the corresponding name attribute.
<!-- Common to all HTML tags -->
<attribute name="id" description="The 'id' of any HTML attribute should not contain anything besides letters and numbers">
    <regexp-list>
        <!-- Reference directly by regular name -->
        <regexp name="htmlId"/>
    </regexp-list>
</attribute>
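The named expressions are ordinary Java regular expressions, so their effect can be tried out with the JDK alone. The following is a minimal sketch, assuming a value is accepted only when the whole string matches; the class and method names are made up for illustration and are not part of AntiSamy.

```java
import java.util.regex.Pattern;

// Illustrative only: CommonRegexpDemo is a hypothetical name, not an AntiSamy class.
class CommonRegexpDemo {
    // Patterns copied from the <common-regexps> section above
    static final Pattern HTML_ID = Pattern.compile("[a-zA-Z0-9\\:\\-_\\.]+");
    static final Pattern NUMBER_OR_PERCENT = Pattern.compile("(\\d)+(%{0,1})");

    // Assumption: the entire value must match the expression
    static boolean isValidId(String value) {
        return HTML_ID.matcher(value).matches();
    }

    static boolean isNumberOrPercent(String value) {
        return NUMBER_OR_PERCENT.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidId("main-nav_1"));                  // true
        System.out.println(isValidId("x\" onmouseover=\"alert(1)")); // false
        System.out.println(isNumberOrPercent("50%"));                 // true
        System.out.println(isNumberOrPercent("50px"));                // false
    }
}
```

An id value that tries to break out of the attribute contains quotes, spaces and parentheses, none of which are in the htmlId character class, so it is rejected.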
3),common-attributes
Input rules for common attributes, covering both tag attributes and CSS properties. These definitions are reused by the tag and CSS processing rules. An attribute's allowed values are given as a list of regular expressions, a list of literals, or both.
<common-attributes>
    <attribute name="classid">
        <regexp-list>
            <regexp name="anything" />
        </regexp-list>
    </attribute>
    <attribute name="autocomplete">
        <literal-list>
            <literal value="on"/>
            <literal value="off"/>
        </literal-list>
    </attribute>
</common-attributes>
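The difference between a literal-list and a regexp-list can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: a value is accepted if it equals one of the literals or fully matches one of the expressions, the class name is hypothetical, and the ".*" pattern used for "anything" is an assumption because the article does not show its definition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Illustrative model of one <attribute> rule; not an AntiSamy class.
class AttributeRuleSketch {
    private final Set<String> literals;
    private final List<Pattern> regexps = new ArrayList<>();

    AttributeRuleSketch(List<String> literals, List<String> regexps) {
        this.literals = new HashSet<>(literals);
        for (String r : regexps) {
            this.regexps.add(Pattern.compile(r));
        }
    }

    // Accept the value if it equals a literal or fully matches any expression
    boolean accepts(String value) {
        if (literals.contains(value)) {
            return true;
        }
        for (Pattern p : regexps) {
            if (p.matcher(value).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors <attribute name="autocomplete"> above
        AttributeRuleSketch autocomplete =
                new AttributeRuleSketch(Arrays.asList("on", "off"), Collections.<String>emptyList());
        // Mirrors <attribute name="classid">; ".*" for "anything" is an assumption
        AttributeRuleSketch classid =
                new AttributeRuleSketch(Collections.<String>emptyList(), Arrays.asList(".*"));

        System.out.println(autocomplete.accepts("on"));     // true
        System.out.println(autocomplete.accepts("maybe"));  // false
        System.out.println(classid.accepts("clsid:1234"));  // true
    }
}
```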
4),global-tag-attributes
Attributes that are allowed on every tag by default.
<global-tag-attributes>
    <!-- Not valid in base, head, html, meta, param, script, style, and title elements. -->
    <attribute name="id"/>
    <attribute name="style"/>
    <attribute name="title"/>
    <attribute name="class"/>
    <!-- Not valid in base, br, frame, frameset, hr, iframe, param, and script elements. -->
    <attribute name="lang"/>
</global-tag-attributes>
5),tags-to-encode
Tags that are HTML-encoded in the output instead of being interpreted as markup.
<tags-to-encode>
    <tag>g</tag>
    <tag>grin</tag>
</tags-to-encode>
6),tag-rules
Three commonly used ways of handling a tag are:
-
remove
The tag is deleted outright. For example, the rule for the script tag is to remove it:
<tag name="script" action="remove"/>
-
truncate
The tag is cut down: all of its attributes are deleted, and only the tag itself and its text value are kept.
For example, title keeps only the tag and its value:
<tag name="title" action="truncate"/>
-
validate
The attributes of the tag are validated. If the tag rule defines validation rules for its attributes, those rules are applied. If the tag rule defines no attributes, the definitions in <global-tag-attributes> are applied.
<tag name="head" action="validate"/>
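To make the three actions concrete, here is a toy model in plain Java. It is a sketch only, not how AntiSamy is implemented: an element is reduced to a name, an attribute map and a text value, and "validate" checks attribute names against an allow-list without checking their values. All names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of tag actions; not AntiSamy code.
class TagActionSketch {
    enum Action { REMOVE, TRUNCATE, VALIDATE }

    // Renders the element after applying the action; an empty string means the tag was dropped.
    static String apply(Action action, String tag, Map<String, String> attrs,
                        String text, Set<String> allowedAttrs) {
        if (action == Action.REMOVE) {
            return "";
        }
        StringBuilder out = new StringBuilder("<").append(tag);
        if (action == Action.VALIDATE) {
            // Keep only attributes on the allow-list (simplification: values are not checked)
            for (Map.Entry<String, String> e : attrs.entrySet()) {
                if (allowedAttrs.contains(e.getKey())) {
                    out.append(' ').append(e.getKey())
                       .append("=\"").append(e.getValue()).append('"');
                }
            }
        }
        // TRUNCATE falls through here with no attributes written
        return out.append('>').append(text).append("</").append(tag).append('>').toString();
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("class", "intro");
        attrs.put("onclick", "evil()");
        Set<String> allowed = new HashSet<>(Arrays.asList("id", "class", "title", "lang"));

        System.out.println(apply(Action.REMOVE, "script", attrs, "alert(1)", allowed).isEmpty()); // true
        System.out.println(apply(Action.TRUNCATE, "title", attrs, "Hi", allowed)); // <title>Hi</title>
        System.out.println(apply(Action.VALIDATE, "p", attrs, "Hi", allowed));     // <p class="intro">Hi</p>
    }
}
```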
7),css-rules
Processing rules for CSS properties.
<css-rules>
    <property name="bottom" default="auto" description="">
        <category-list>
            <category value="visual"/>
        </category-list>
        <literal-list>
            <literal value="auto"/>
            <literal value="inherit"/>
        </literal-list>
        <regexp-list>
            <regexp name="length"/>
            <regexp name="percentage"/>
        </regexp-list>
    </property>
    <property name="color" description="">
        <category-list>
            <category value="visual"/>
        </category-list>
        <literal-list>
            <literal value="inherit"/>
        </literal-list>
        <regexp-list>
            <regexp name="colorName"/>
            <regexp name="colorCode"/>
            <regexp name="rgbCode"/>
            <regexp name="systemColor"/>
        </regexp-list>
    </property>
</css-rules>
8),allowed-empty-tags
Tags that are allowed to have no content.
<allowed-empty-tags>
    <literal-list>
        <literal value="br"/>
        <literal value="hr"/>
        <literal value="a"/>
        <literal value="img"/>
        <literal value="link"/>
        <literal value="iframe"/>
        <literal value="script"/>
        <literal value="object"/>
        <literal value="applet"/>
        <literal value="frame"/>
        <literal value="base"/>
        <literal value="param"/>
        <literal value="meta"/>
        <literal value="input"/>
        <literal value="textarea"/>
        <literal value="embed"/>
        <literal value="basefont"/>
        <literal value="col"/>
        <literal value="div"/>
    </literal-list>
</allowed-empty-tags>
Once it is clear what each section represents, it is easy to write a policy file that meets your needs. Let's take a brief look at the common policy files in the jar:
-
antisamy-anythinggoes.xml
Allows every valid HTML and CSS element through (but still rejects JavaScript and CSS-related phishing attacks). Because it contains a base rule for each element, it can serve as a knowledge base when tailoring other policy files. Using it directly is generally not recommended.
-
antisamy-ebay.xml
eBay is one of the most popular online auction websites. It is a public-facing site that lets anyone publish rich HTML content, and its listings may contain more rich text than Slashdot allows, so it is much more exposed to attack.
This policy is relatively safe and is suitable for e-commerce websites.
-
antisamy-myspace.xml
MySpace is one of the most popular social networking sites. Users are allowed to submit almost any HTML and CSS they want, except JavaScript.
MySpace validates user-entered HTML with a blacklist, which is relatively dangerous and not recommended.
-
antisamy-slashdot.xml
Slashdot is a technical news website with a very strict security policy. Users can submit only the following HTML tags: <b>, <u>, <i>, <a>, <blockquote>, and CSS is not supported.
This policy file implements similar behavior. It allows all text-formatting tags that directly change font, color or emphasis, and is suitable for filtering comments on news websites.
-
antisamy-tinymce.xml
Only text formatting is allowed through, which is relatively safe.
-
antisamy.xml
The default policy, which allows most HTML through.
3. Usage
AntiSamy is simple to use. After specifying a policy file, we construct an AntiSamy object and pass the data to it for filtering.
// Imports needed by this snippet
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;

// Data to be filtered
String taintedHTML = "<script>alert(\"xss\");</script>HELLO WORLD!";
// Load the filtering policy from the policy file
Policy policy = Policy.getInstance("antisamy-ebay.xml");
// Filter the data according to the policy
AntiSamy antiSamy = new AntiSamy();
CleanResults cr = antiSamy.scan(taintedHTML, policy);
String cleanHTML = cr.getCleanHTML();
In a real project almost every input needs to be checked, so AntiSamy is usually used together with a servlet Filter.
Filters form a filter chain that can pre-process the HttpServletRequest or post-process the HttpServletResponse.
Define a custom XssFilter class that implements the Filter interface and overrides doFilter(ServletRequest request, ServletResponse response, FilterChain chain). To process user requests, wrap the ServletRequest and hand the wrapper to the FilterChain.
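import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

/**
 * XSS (Cross Site Scripting) Filter
 * @author zhangcs
 */
public class XssFilter implements Filter {

    @SuppressWarnings("unused")
    private FilterConfig filterConfig;

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
        this.filterConfig = filterConfig;
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Pass the wrapped request down the chain
        chain.doFilter(new XssRequestWrapper((HttpServletRequest) request), response);
    }

    @Override
    public void destroy() {
        this.filterConfig = null;
    }
}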
Create a custom HttpServletRequest wrapper class, XssRequestWrapper, that extends HttpServletRequestWrapper and overrides getParameter(String param), getParameterValues(String param) and getHeader(String param) as needed.
For example, with Spring MVC as the MVC framework, the parameter values entered by users need to be filtered, so the getParameterValues(String name) method is overridden.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;
import org.owasp.validator.html.PolicyException;
import org.owasp.validator.html.ScanException;

/**
 * Customized XSS request wrapper
 * @author zhangcs
 */
public class XssRequestWrapper extends HttpServletRequestWrapper {

    /**
     * Policy, loaded once instead of on every call.
     * Note that the policy file to be used needs to be placed under the project resource path.
     */
    private static final Policy POLICY = loadPolicy();

    public XssRequestWrapper(HttpServletRequest request) {
        super(request);
    }

    @Override
    public String[] getParameterValues(String name) {
        String[] values = super.getParameterValues(name);
        if (values == null) {
            return null;
        }
        String[] cleaned = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            // Filter each value
            cleaned[i] = xssClean(values[i]);
        }
        return cleaned;
    }

    /** Loads the policy from the classpath via its URL, which also works for paths containing spaces. */
    private static Policy loadPolicy() {
        try {
            return Policy.getInstance(
                    XssRequestWrapper.class.getClassLoader().getResource("antisamy-ebay.xml"));
        } catch (PolicyException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /**
     * Filters data with AntiSamy
     * @param taintedHTML Data to be filtered
     * @return The filtered data
     */
    private String xssClean(String taintedHTML) {
        try {
            // Use AntiSamy for filtering
            AntiSamy antiSamy = new AntiSamy();
            CleanResults cr = antiSamy.scan(taintedHTML, POLICY);
            return cr.getCleanHTML();
        } catch (ScanException e) {
            e.printStackTrace();
        } catch (PolicyException e) {
            e.printStackTrace();
        }
        // As in the original version, the unfiltered value is returned when scanning fails
        return taintedHTML;
    }
}
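The wrapper above is an instance of the decorator pattern: it delegates to the real request and cleans each value on the way out. The same shape can be tried without a servlet container. In the sketch below, ParamSource is a hypothetical stand-in for the part of HttpServletRequest used here, and escapeAngles is a placeholder cleaner that only escapes angle brackets; it is not AntiSamy and is not a substitute for it.

```java
import java.util.function.UnaryOperator;

// Dependency-free illustration of the wrapper idea; all names are hypothetical.
class WrapperSketch {

    /** Stand-in for the single HttpServletRequest method used by the wrapper. */
    interface ParamSource {
        String[] getParameterValues(String name);
    }

    /** Decorator: delegates to the wrapped source and cleans every returned value. */
    static final class CleaningSource implements ParamSource {
        private final ParamSource delegate;
        private final UnaryOperator<String> cleaner;

        CleaningSource(ParamSource delegate, UnaryOperator<String> cleaner) {
            this.delegate = delegate;
            this.cleaner = cleaner;
        }

        @Override
        public String[] getParameterValues(String name) {
            String[] values = delegate.getParameterValues(name);
            if (values == null) {
                return null; // a missing parameter stays missing
            }
            String[] cleaned = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                cleaned[i] = cleaner.apply(values[i]);
            }
            return cleaned;
        }
    }

    /** Placeholder cleaner that escapes angle brackets. NOT AntiSamy. */
    static String escapeAngles(String s) {
        return s.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        ParamSource raw = name ->
                "comment".equals(name) ? new String[] {"<b>hi</b>"} : null;
        ParamSource safe = new CleaningSource(raw, WrapperSketch::escapeAngles);

        System.out.println(safe.getParameterValues("comment")[0]);      // &lt;b&gt;hi&lt;/b&gt;
        System.out.println(safe.getParameterValues("missing") == null); // true
    }
}
```

Because callers only see the ParamSource interface, downstream code needs no changes, which is the same reason the servlet filter can swap in XssRequestWrapper transparently.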
The server creates the Filter object when the web application starts, based on the configuration in web.xml, so the custom XssFilter must also be declared and mapped in web.xml.
<!-- register XssFilter -->
<filter>
    <filter-name>XSSFilter</filter-name>
    <filter-class>cn.ghr.ehr.filter.XssFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<!-- Configure requests that need to be filtered; "/*" matches every request path -->
<filter-mapping>
    <filter-name>XSSFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>