On October 18, the W3C Web Platform Incubator Community Group (WICG) published a draft specification of the HTML Sanitizer API. The draft addresses how browsers can defend against XSS attacks.
Cross-site scripting (XSS) attacks are a constant headache for web developers. An XSS attack usually exploits vulnerabilities left during development to inject malicious code into a web page, so that users' browsers load and execute a program crafted by the attacker.
Because the malicious code is not filtered out, it gets mixed in with the site's normal code. The browser cannot tell which content is trusted, so the malicious script runs. At its core, an XSS attack has two steps: 1. the site accepts malicious code submitted by the attacker; 2. the browser executes that code.
To break this two-step attack, the following methods are commonly used:
- Adding filter conditions to input
- Rendering purely on the front end, keeping data separate from code
- Fully escaping HTML
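The third approach, fully escaping HTML, is the easiest to illustrate. Below is a minimal JavaScript sketch; `escapeHTML`, `renderGreetingUnsafe` and `renderGreetingSafe` are hypothetical names for this example. Escaping the five characters `& < > " '` is a common convention for text placed in element content or quoted attribute values, but it is not a complete defense for every context (URLs and inline scripts, for example, need different handling).

```javascript
// Hypothetical helper: replace the five characters that are significant in
// HTML markup with their character references.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHTML(value) {
  // A single regex pass replaces each character once, so "&" is never double-escaped.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Naive template: user input is concatenated straight into markup.
function renderGreetingUnsafe(name) {
  return `<p>Hello, ${name}</p>`;
}

// Escaped template: the same input is rendered as inert text.
function renderGreetingSafe(name) {
  return `<p>Hello, ${escapeHTML(name)}</p>`;
}

const payload = '<img src="" onerror=alert(0)>';
console.log(renderGreetingUnsafe(payload));
// <p>Hello, <img src="" onerror=alert(0)></p>
console.log(renderGreetingSafe(payload));
// <p>Hello, &lt;img src=&quot;&quot; onerror=alert(0)&gt;</p>
```

Doing this by hand in every template is exactly the kind of error-prone work that motivates a browser-native sanitizer.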
These measures are cumbersome and require attention to many details. To make XSS attacks easier to prevent, browsers are now adding native sanitization capabilities.
The HTML Sanitizer API, jointly initiated by Google, Mozilla and Cure53, is about to be finalized. With this native browser API, we can more easily protect web applications from XSS attacks.
Next, let's take a look at this security API.
Introduction to the Sanitizer API
The Sanitizer API lets the browser remove malicious code directly from markup that a website inserts dynamically. When an untrusted HTML string, Document, or DocumentFragment is about to be inserted into the existing DOM, we can use the HTML Sanitizer API to clean it first. It works a bit like antivirus software on a computer, stripping out risky content.
Using the Sanitizer API has three advantages:
- It reduces the number of cross-site scripting attacks in web applications
- It ensures that HTML output is safe to use in the current user agent
- It is easy to use
Features of the Sanitizer API
The Sanitizer API opens up new possibilities for HTML string security. Its functionality falls roughly into three main features:
1. Sanitizing user input
The main function of the Sanitizer API is to accept strings and convert them into safer strings. The converted strings do not execute additional JavaScript, which keeps the application protected from XSS attacks.
2. Built into the browser
The sanitizer ships with the browser and is updated whenever a bug or a new attack is found. In effect, the browser has built-in antivirus measures, without our having to import any external library.
3. Simple and safe to use
With the Sanitizer API, the browser provides a powerful and secure parser. As a mature platform, the browser knows how every element in the DOM behaves. By contrast, an external parser written in JavaScript is not only costly but also struggles to keep up with changes in the front-end environment.
With these highlights covered, let's look at how to use the API.
Using the Sanitizer API
The Sanitizer API is used through the Sanitizer() constructor, and the Sanitizer class holds the configuration.
The specification provides three basic ways to sanitize content:
1. Sanitize a string with an implied context
Element.setHTML() parses and sanitizes a string and immediately inserts it into the DOM. Use it when the target DOM element is known and the HTML content is a string.
const $div = document.querySelector('div')
const user_input = `<em>Hello There</em><img src="" onerror=alert(0)>` // The user string.
const sanitizer = new Sanitizer() // Our Sanitizer
// We want to insert the HTML in user_input into the target element $div.
// That is, we want the equivalent of $div.innerHTML = user_input, except
// without the XSS risks.
$div.setHTML(user_input, sanitizer)
// <div><em>Hello There</em><img src=""></div>
2. Sanitize a string with a given context
Sanitizer.sanitizeFor() parses and sanitizes a string and prepares it to be added to the DOM later.
Use it when the HTML content is a string and the type of the target DOM element (for example, div or span) is known.
const user_input = `<em>Hello There</em><img src="" onerror=alert(0)>`
const sanitizer = new Sanitizer()
// Later:
// The first parameter describes the node type this result is intended for.
sanitizer.sanitizeFor("div", user_input)
// HTMLDivElement <div>
Note that sanitizeFor() returns an HTMLElement, not a string. To get the sanitized output as a string, read the element's .innerHTML:
sanitizer.sanitizeFor("div", user_input).innerHTML
// <em>Hello There</em><img src="">
3. Sanitize DOM nodes
For a DocumentFragment that is already user-controlled, Sanitizer.sanitize() sanitizes the DOM tree nodes directly.
// Case: The input data is available as a tree of DOM nodes.
const $div = document.querySelector('div')
const sanitizer = new Sanitizer()
const $userDiv = ...;
$div.replaceChildren(sanitizer.sanitize($userDiv));
Beyond these three methods, the Sanitizer API modifies HTML strings by removing and filtering attributes and tags.
For example, the Sanitizer API:
- Drops some tags (script, marquee, head, frame, menu, object, etc.) but keeps content tags.
- Drops most attributes, keeping only href on <a> tags and colspan on <td> and <th> tags.
- Filters out content that could lead to script execution.
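One common kind of content that can execute script is a `javascript:` URL in an `href`. The sketch below shows a technique an application-level filter might use, a URL scheme allow-list; it is an illustration only, not the Sanitizer API's internal algorithm. `isSafeHref`, the `SAFE_SCHEMES` list and the placeholder base URL are assumptions. It relies on the standard `URL` class, available in browsers and Node.js.

```javascript
// Illustrative scheme allow-list for href values. This is NOT how the
// browser's Sanitizer API is implemented; isSafeHref and SAFE_SCHEMES are
// hypothetical names chosen for this example.
const SAFE_SCHEMES = new Set(['http:', 'https:', 'mailto:']);

// Placeholder base so that relative links like "/docs" can be parsed.
const BASE_URL = 'https://example.invalid/';

function isSafeHref(href) {
  try {
    // The WHATWG URL parser trims leading/trailing spaces, removes tabs and
    // newlines, and lowercases the scheme, so variants such as
    // " JaVaScript:..." or "java\tscript:..." normalize to "javascript:".
    const url = new URL(href, BASE_URL);
    return SAFE_SCHEMES.has(url.protocol);
  } catch {
    // Values that cannot be parsed are rejected rather than passed through.
    return false;
  }
}

console.log(isSafeHref('https://developer.mozilla.org/')); // true
console.log(isSafeHref('/relative/path')); // true
console.log(isSafeHref('javascript:alert(0)')); // false
console.log(isSafeHref(' JaVaScript:alert(0)')); // false
```

Checking the parsed protocol instead of matching the raw string avoids being fooled by case changes or embedded whitespace.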
By default, the API focuses only on preventing XSS. In some cases, however, we need our own settings. Here are some common configurations.
Custom sanitization
Create a configuration object and pass it to the Sanitizer constructor when initializing it.
const config = {
  allowElements: [],
  blockElements: [],
  dropElements: [],
  allowAttributes: {},
  dropAttributes: {},
  allowCustomElements: true,
  allowComments: true
};
// sanitized result is customized by configuration
new Sanitizer(config)
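Because the configuration is a WebIDL dictionary, a misspelled option name is typically ignored silently rather than raising an error. A small check like the hypothetical `unknownSanitizerOptions` below can catch such typos during development. The set of known names is an assumption taken from the configuration example above, not the full list from the specification.

```javascript
// Option names used in the configuration example above (draft-era API).
// This list is an assumption taken from that example, not the full spec.
const KNOWN_SANITIZER_OPTIONS = new Set([
  'allowElements',
  'blockElements',
  'dropElements',
  'allowAttributes',
  'dropAttributes',
  'allowCustomElements',
  'allowComments',
]);

// Hypothetical helper: returns the keys of `config` that are not known option names.
function unknownSanitizerOptions(config) {
  return Object.keys(config).filter((key) => !KNOWN_SANITIZER_OPTIONS.has(key));
}

const config = { allowElements: ['b'], allowCustomeElements: true }; // note the typo
const unknown = unknownSanitizerOptions(config);
if (unknown.length > 0) {
  console.warn(`Ignored Sanitizer options: ${unknown.join(', ')}`);
}
// Ignored Sanitizer options: allowCustomeElements
```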
Here are some common options:
- allowElements keeps only the specified elements
- blockElements removes the specified elements but keeps their content
- dropElements removes the specified elements together with their content
const str = `hello <b><i>there</i></b>`
new Sanitizer().sanitizeFor("div", str)
// <div>hello <b><i>there</i></b></div>
new Sanitizer({allowElements: [ "b" ]}).sanitizeFor("div", str)
// <div>hello <b>there</b></div>
new Sanitizer({blockElements: [ "b" ]}).sanitizeFor("div", str)
// <div>hello <i>there</i></div>
new Sanitizer({allowElements: []}).sanitizeFor("div", str)
// <div>hello there</div>
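To make the difference between the three element options concrete, here is a toy model that applies them to a tiny tree of plain objects instead of real DOM nodes. It is a hypothetical sketch of the semantics described above (keep only listed elements; remove the element but keep its content; remove the element and its content), not the browser's implementation. The node shape `{ tag, children }` and the names `applyElementRules` and `toHTML` are assumptions, and attributes are ignored. Unlike the real API, an empty options object here keeps every element instead of applying a default configuration.

```javascript
// Toy model only: a node is either a text string or { tag, children }.
// applyElementRules and toHTML are hypothetical helpers, not Sanitizer API methods.
function applyElementRules(nodes, options = {}) {
  const { allowElements, blockElements = [], dropElements = [] } = options;
  const out = [];
  for (const node of nodes) {
    if (typeof node === 'string') {
      out.push(node); // text nodes are always kept
      continue;
    }
    if (dropElements.includes(node.tag)) {
      continue; // drop: remove the element together with its content
    }
    const children = applyElementRules(node.children, options);
    const notAllowed = Array.isArray(allowElements) && !allowElements.includes(node.tag);
    if (blockElements.includes(node.tag) || notAllowed) {
      out.push(...children); // block (or not allow-listed): keep only the content
    } else {
      out.push({ tag: node.tag, children });
    }
  }
  return out;
}

// Serialize the toy tree back to markup so results are easy to compare.
function toHTML(nodes) {
  return nodes
    .map((n) => (typeof n === 'string' ? n : `<${n.tag}>${toHTML(n.children)}</${n.tag}>`))
    .join('');
}

// The same input as the example above: hello <b><i>there</i></b>
const tree = ['hello ', { tag: 'b', children: [{ tag: 'i', children: ['there'] }] }];
const show = (options) => JSON.stringify(toHTML(applyElementRules(tree, options)));

console.log(show({})); // "hello <b><i>there</i></b>"
console.log(show({ allowElements: ['b'] })); // "hello <b>there</b>"
console.log(show({ blockElements: ['b'] })); // "hello <i>there</i>"
console.log(show({ dropElements: ['b'] })); // "hello "
console.log(show({ allowElements: [] })); // "hello there"
```

The dropElements line is the one case the example above does not show: the `<b>` element disappears along with everything inside it.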
- allowAttributes and dropAttributes specify which attributes to keep or remove. Each maps an attribute name to the list of elements it applies to.
const str = `<span id=foo class=bar style="color: red">hello there</span>`
new Sanitizer().sanitizeFor("div", str)
// <div><span id="foo" class="bar" style="color: red">hello there</span></div>
new Sanitizer({allowAttributes: {"style": ["span"]}}).sanitizeFor("div", str)
// <div><span style="color: red">hello there</span></div>
new Sanitizer({dropAttributes: {"id": ["span"]}}).sanitizeFor("div", str)
// <div><span class="bar" style="color: red">hello there</span></div>
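The attribute options work the same way, except that each rule is scoped to particular elements. Here is a toy model of that lookup on a plain object of attributes. It is a hypothetical sketch that mirrors the shape of allowAttributes and dropAttributes above; `filterAttributes` and `appliesTo` are assumed names, and the real API also applies its own defaults, which this toy ignores.

```javascript
// Toy model only: each rule maps an attribute name to the element names it
// applies to, mirroring the shape of allowAttributes/dropAttributes above.
// filterAttributes and appliesTo are hypothetical helpers.
function appliesTo(rules, attrName, tag) {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  return Object.prototype.hasOwnProperty.call(rules, attrName) && rules[attrName].includes(tag);
}

function filterAttributes(tag, attrs, options = {}) {
  const { allowAttributes, dropAttributes = {} } = options;
  const kept = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (appliesTo(dropAttributes, name, tag)) continue; // explicitly dropped
    if (allowAttributes && !appliesTo(allowAttributes, name, tag)) continue; // not allow-listed
    kept[name] = value;
  }
  return kept;
}

// Attributes of the <span> from the example above.
const spanAttrs = { id: 'foo', class: 'bar', style: 'color: red' };

console.log(filterAttributes('span', spanAttrs));
// { id: 'foo', class: 'bar', style: 'color: red' }
console.log(filterAttributes('span', spanAttrs, { allowAttributes: { style: ['span'] } }));
// { style: 'color: red' }
console.log(filterAttributes('span', spanAttrs, { dropAttributes: { id: ['span'] } }));
// { class: 'bar', style: 'color: red' }
```

Because rules are keyed per element, `{ id: ['span'] }` leaves an `id` on a `<div>` untouched.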
- allowCustomElements controls whether custom elements are allowed
const str = `<elem>hello there</elem>`
new Sanitizer().sanitizeFor("div", str);
// <div></div>
new Sanitizer({ allowCustomElements: true, allowElements: ["div", "elem"] }).sanitizeFor("div", str);
// <div><elem>hello there</elem></div>
If no configuration is passed, the default configuration is used.
The API looks like it can solve many problems for us, but browser support is still limited and more features are still being developed. We look forward to a more fully featured Sanitizer API.
If you are interested, you can try it in Chrome 93+ by enabling about://flags/#enable-experimental-web-platform-features. Firefox support is also experimental: set dom.security.sanitizer.enabled to true in about:config.
To learn more, see: https://developer.mozilla.org/en-US/docs/Web/API/HTML_Sanitizer_API
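Because support is still experimental, code that uses the API should check for it first. The sketch below is one way to do that under stated assumptions: `chooseInsertStrategy` and `insertUntrusted` are hypothetical helpers, the `setHTML(input, sanitizer)` call mirrors the draft-era examples in this article (later drafts changed the second argument), and the fallback inserts the input as plain text via `textContent`. The global object is passed in as a parameter so the logic can run outside a browser.

```javascript
// Hypothetical helper: decide how to insert untrusted markup. The global
// object is a parameter so the logic can be exercised outside a browser.
function chooseInsertStrategy(globalObj) {
  const hasSanitizer = typeof globalObj.Sanitizer === 'function';
  const hasSetHTML =
    typeof globalObj.Element === 'function' &&
    typeof globalObj.Element.prototype.setHTML === 'function';
  return hasSanitizer && hasSetHTML ? 'setHTML' : 'textContent';
}

// Hypothetical helper: sanitize natively when available, otherwise fall back
// to inserting the input as inert text.
function insertUntrusted(el, input, globalObj = globalThis) {
  if (chooseInsertStrategy(globalObj) === 'setHTML') {
    // Call shape taken from this article's draft-era examples; later drafts
    // changed setHTML's second argument, so check your target browser.
    el.setHTML(input, new globalObj.Sanitizer());
  } else {
    // No native sanitizer: nothing is parsed as HTML, so no script can run.
    el.textContent = input;
  }
}

// Outside a browser (e.g. Node.js) there is no Sanitizer, so the fallback is used.
const fakeElement = { textContent: '' };
insertUntrusted(fakeElement, '<img src="" onerror=alert(0)>');
console.log(fakeElement.textContent); // <img src="" onerror=alert(0)>
```

The text fallback loses formatting but never executes markup; an application that needs rich HTML without native support would instead reach for a vetted sanitizer library.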
Concerns about data security
According to the Verizon 2020 Data Breach Investigations Report (Verizon Business, 2020), about 90% of data breaches are caused by cross-site scripting (XSS) and security vulnerabilities. For front-end developers facing increasingly frequent attacks, besides security mechanisms such as the Sanitizer API, front-end table controls such as SpreadJS that follow the principle of "separating data and code" are also worth considering.