Sanitizing Input
by Simon MacDonald (@macdonst)
Photo by Tim Mossholder on Unsplash
By this point in the pandemic, we are all tired of sanitizing our hands, but that doesn’t mean we should let our guard down when sanitizing user input. Don’t rely solely on client-side validation, as it can easily be bypassed.
This topic recently came up on our Discord:
This got me thinking: how would you write some middleware to handle this task, and which package would be best for the job?
Package Selection
Three popular packages for sanitizing HTML:
Name | Weekly Downloads | GitHub Stars |
---|---|---|
dompurify | 2,204,742 | 8.4k |
sanitize-html | 1,016,184 | 2.9k |
xss | 1,987,604 | 4.4k |
Again, popularity is not the best metric for choosing an npm package, but all three of these packages have a healthy development community and are actively maintained.
Let’s quickly evaluate the three packages using code size, the number of dependencies, and speed as our criteria.
Code Size
Using BundlePhobia to check the bundle size, xss is the lightest package. It looks like dompurify is close behind, but to run it in a Node.js runtime, you also need to include the peer dependency jsdom, which is larger than the rest of the packages put together.
Name | Bundle Size | Gzipped Bundle Size |
---|---|---|
dompurify | 19.5 kB | 7.5 kB |
sanitize-html | 147.8 kB | 45.9 kB |
xss | 17.3 kB | 5.5 kB |
jsdom | 2.3 MB | 572.5 kB |
Running slow-deps will show you the size and install time of each package, including its dependencies.
npx slow-deps
Analyzing 4 dependencies...
[====================] 100% 0.0s
--------------------------------------------------
| Dependency | Time | Size | # Deps |
--------------------------------------------------
| jsdom | 2s | 9.4 MB | 58 |
| sanitize-html | 2s | 916 KB | 16 |
| dompurify | 868ms | 618 KB | 1 |
| xss | 742ms | 241 KB | 4 |
--------------------------------------------------
Once again, xss comes out on top with the quickest installation time, and once again dompurify comes in second but is hamstrung by the inclusion of jsdom.
Speed
I wrote a quick test harness to pass in some “dirty” HTML with script tags and inline JavaScript. The test runs the sanitization 10,000 times per package, and the average execution time is reported below, along with the cost per 1 million invocations.
Name | Avg. Execution Time | Cost per 1M runs |
---|---|---|
dompurify | 11.381 ms | $0.19 |
sanitize-html | 0.875773 ms | $0.01 |
xss | 0.752937 ms | $0.01 |
xss is the fastest, with sanitize-html close behind and dompurify a distant third.
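The article does not include the harness itself, so here is a minimal sketch of how such a timing loop could look. Everything in it is an assumption for illustration: `averageMs` is a hypothetical helper, the dirty HTML string is made up, and `escapeAngleBrackets` is a stand-in that is not a real sanitizer. It is there only so the sketch runs without installing anything; to measure a real package you would pass its sanitize function instead.

```javascript
// Hypothetical stand-in so the sketch runs without any npm installs.
// It is NOT a real sanitizer; swap in the package function you want to measure.
function escapeAngleBrackets (html) {
  return html.replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Call fn(input) `runs` times and return the average time per call in milliseconds.
function averageMs (fn, input, runs = 10000) {
  const start = process.hrtime.bigint()
  for (let i = 0; i < runs; i++) {
    fn(input)
  }
  const elapsedNs = process.hrtime.bigint() - start
  return Number(elapsedNs) / 1e6 / runs
}

// Made-up "dirty" input with a script tag and an inline event handler.
const dirty = '<p onclick="alert(1)">Hello</p><script>alert("xss")</script>'

const avg = averageMs(escapeAngleBrackets, dirty)
console.log(`stand-in sanitizer: ${avg.toFixed(6)} ms per call`)
```

Averaging over many runs smooths out one-off effects such as JIT warm-up, although the absolute numbers will still vary from machine to machine.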
It’s pretty clear that xss is our best choice, as it has the quickest execution time and the smallest code footprint, which is key for keeping cold start times down.
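The article does not say how the cost column in the speed table was calculated. As a rough sketch, the figures are consistent with compute-only AWS Lambda pricing, assuming 1 GB of memory and a price of $0.0000166667 per GB-second. Both values are assumptions, as is the hypothetical `costPerMillion` helper; per-request charges and billing granularity are ignored.

```javascript
// Assumptions (not stated in the article): 1 GB of memory and a
// compute price of $0.0000166667 per GB-second. Check current pricing.
const PRICE_PER_GB_SECOND = 0.0000166667
const MEMORY_GB = 1

// Estimate the compute cost in dollars of 1 million invocations
// that each take avgMs milliseconds.
function costPerMillion (avgMs, memoryGb = MEMORY_GB, pricePerGbSecond = PRICE_PER_GB_SECOND) {
  const totalSeconds = (avgMs / 1000) * 1000000
  return totalSeconds * memoryGb * pricePerGbSecond
}

// Average execution times taken from the speed table.
const timings = {
  dompurify: 11.381,
  'sanitize-html': 0.875773,
  xss: 0.752937
}

for (const [name, avgMs] of Object.entries(timings)) {
  console.log(`${name}: $${costPerMillion(avgMs).toFixed(2)}`)
}
// dompurify: $0.19
// sanitize-html: $0.01
// xss: $0.01
```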
Sanitize Middleware
Now that we’ve picked a suitable package, it’s time to write our middleware. Create a new file, src/shared/sanitize.js, where we will write our sanitize code:
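// src/shared/sanitize.js
const xss = require('xss')

module.exports = async function sanitize (req) {
  // Decode the base64-encoded request body into text.
  // utf8 (rather than ascii) keeps non-ASCII characters intact.
  const buff = Buffer.from(req.body, 'base64')
  const text = buff.toString('utf8')
  // Remove script tags together with their contents
  const sanitizedBody = xss(text, {
    stripIgnoreTagBody: ['script']
  })
  // Hand the cleaned body to the next middleware function
  req.body = sanitizedBody
}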
The above code:
- reads the base64-encoded text posted to our HTTP function
- converts it to plain text
- sanitizes the text
- replaces the body with the sanitized text for the next middleware function
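To see the decode and replace steps in isolation, here is a small sketch that runs against a mock request without any npm packages. The names `standInSanitize` and `decodeAndSanitize` are hypothetical, and the stand-in only escapes angle brackets, so it is not a substitute for a real sanitizer. The sketch decodes as utf8 and prints an ascii decode of the same bytes for comparison, since ascii decoding mangles multi-byte characters.

```javascript
// Hypothetical stand-in for the real sanitizer; it only escapes angle brackets.
function standInSanitize (text) {
  return text.replace(/</g, '&lt;').replace(/>/g, '&gt;')
}

// Decode a base64 body to text, then run it through the given sanitizer.
function decodeAndSanitize (base64Body, sanitizer) {
  const text = Buffer.from(base64Body, 'base64').toString('utf8')
  return sanitizer(text)
}

// Mock request: the posted text arrives base64 encoded.
const posted = 'café <script>alert(1)</script>'
const encoded = Buffer.from(posted, 'utf8').toString('base64')
const req = { body: encoded }

// Replace the body with the sanitized text, as the middleware does.
req.body = decodeAndSanitize(req.body, standInSanitize)
console.log(req.body)
// café &lt;script&gt;alert(1)&lt;/script&gt;

// For comparison: decoding the same bytes as ascii does not round-trip the "é".
const asciiDecoded = Buffer.from(encoded, 'base64').toString('ascii')
console.log(asciiDecoded === posted) // false
```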
Since our sanitize function was created in the src/shared folder, it will be available to all of our HTTP functions as long as we require it.
const sanitize = require('@architect/shared/sanitize')
Now, in our cloud function, we only need to add our sanitize function to our call to arc.http.async. For example:
exports.handler = arc.http.async(sanitize, echo)
Our sanitize function will be called first to clean up the request’s body. Then our echo function will be called, which returns the sanitized request body.
Here’s a full code sample:
// src/http/post-echo/index.js
const arc = require('@architect/functions')
const sanitize = require('@architect/shared/sanitize')

exports.handler = arc.http.async(sanitize, echo)

async function echo (req) {
  return {
    statusCode: 200,
    headers: {
      'cache-control': 'no-cache, no-store, must-revalidate, max-age=0, s-maxage=0',
      'content-type': 'text/html; charset=utf8'
    },
    json: { body: req.body }
  }
}
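To build intuition for the ordering, here is a simplified illustration of a middleware chain. It is not Architect's actual implementation: `chain` and `trimBody` are hypothetical names, and the sketch only models the behavior described in this post, where a function that returns nothing passes control to the next one and the first function to return a value ends the chain.

```javascript
// Simplified illustration only; NOT how arc.http.async is implemented.
// Runs each function in order and stops at the first one that returns a value.
function chain (...fns) {
  return async function handler (req) {
    for (const fn of fns) {
      const result = await fn(req)
      if (result) return result // a returned value is treated as the response
    }
    throw new Error('No middleware returned a response')
  }
}

// Hypothetical middleware: mutates the request and returns nothing,
// so the chain continues to the next function.
async function trimBody (req) {
  req.body = String(req.body).trim()
}

// Final handler: returns a response, which ends the chain.
async function echo (req) {
  return { statusCode: 200, body: req.body }
}

const handler = chain(trimBody, echo)

handler({ body: '  hello  ' }).then(res => {
  console.log(res) // { statusCode: 200, body: 'hello' }
})
```

Because the middleware mutates the same request object that the next function receives, the final handler sees the cleaned body without any extra plumbing.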
In Conclusion
I hope this gives you a bit of insight into how we go about selecting Node.js packages for use in our HTTP functions, as well as how to write shared middleware that can be reused across multiple HTTP functions in your application.