bluemonday-1.0.26/.coveralls.yml

repo_token: x2wlA1x0X8CK45ybWpZRCVRB4g7vtkhaw

bluemonday-1.0.26/.editorconfig

root = true

[*]
end_of_line = lf

bluemonday-1.0.26/.gitattributes

* text=auto eol=lf

bluemonday-1.0.26/.github/dependabot.yml

version: 2
updates:
  - package-ecosystem: "gomod"
    directory: "/"
    schedule:
      interval: "daily"

bluemonday-1.0.26/.github/funding.yml

github: buro9

bluemonday-1.0.26/.github/workflows/test.yml

on: [push, pull_request]
name: Test
jobs:
  test:
    strategy:
      fail-fast: false
      matrix:
        go-version: [1.19.x, 1.20.x]
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - name: Install Go
        uses: actions/setup-go@v2
        with:
          go-version: ${{ matrix.go-version }}
      - name: Checkout code
        uses: actions/checkout@v2
      - name: Test
        run: go test -v ./...
  check:
    runs-on: ubuntu-latest
    steps:
      - name: Install Go
        uses: actions/setup-go@v2
        with:
          go-version: 1.20.x
      - name: Checkout code
        uses: actions/checkout@v2

bluemonday-1.0.26/.gitignore

# Binaries for programs and plugins
*.exe
*.exe~
*.dll
*.so
*.dylib

# Test binary, built with `go test -c`
*.test

# Output of the go coverage tool, specifically when used with LiteIDE
*.out

# goland idea folder
*.idea

bluemonday-1.0.26/.travis.yml

language: go
go:
  - 1.2.x
  - 1.3.x
  - 1.4.x
  - 1.5.x
  - 1.6.x
  - 1.7.x
  - 1.8.x
  - 1.9.x
  - 1.10.x
  - 1.11.x
  - 1.12.x
  - 1.13.x
  - 1.14.x
  - 1.15.x
  - 1.16.x
  - tip
matrix:
  allow_failures:
    - go: tip
  fast_finish: true
install:
  - go get .
script:
  - go test -v ./...

bluemonday-1.0.26/CONTRIBUTING.md

# Contributing to bluemonday

Third-party patches are essential for keeping bluemonday secure and offering
the features developers want. However there are a few guidelines that we need
contributors to follow so that we can maintain the quality of work that
developers who use bluemonday expect.

## Getting Started

* Make sure you have a [Github account](https://github.com/signup/free)

## Guidelines

1. Do not vendor dependencies. As a security package, were we to vendor
   dependencies the projects that then vendor bluemonday may not receive the
   latest security updates to the dependencies. By not vendoring dependencies
   the project that implements bluemonday will vendor the latest version of
   any dependent packages. Vendoring is a project problem, not a package
   problem. bluemonday will be tested against the latest version of
   dependencies periodically and during any PR/merge.
2. I do not care about spelling mistakes or whitespace and I do not believe
   that you should either.
PRs therefore must be functional in their nature or be substantial and impactful if documentation or examples. 3. This module does not participate in hacktober, please make your contributions meaningful. ## Submitting an Issue * Submit a ticket for your issue, assuming one does not already exist * Clearly describe the issue including the steps to reproduce (with sample input and output) if it is a bug If you are reporting a security flaw, you may expect that we will provide the code to fix it for you. Otherwise you may want to submit a pull request to ensure the resolution is applied sooner rather than later: * Fork the repository on Github * Issue a pull request containing code to resolve the issue ## Submitting a Pull Request * Submit a ticket for your issue, assuming one does not already exist * Describe the reason for the pull request and if applicable show some example inputs and outputs to demonstrate what the patch does * Fork the repository on Github * Before submitting the pull request you should 1. Include tests for your patch, 1 test should encapsulate the entire patch and should refer to the Github issue 1. If you have added new exposed/public functionality, you should ensure it is documented appropriately 1. If you have added new exposed/public functionality, you should consider demonstrating how to use it within one of the helpers or shipped policies if appropriate or within a test if modifying a helper or policy is not appropriate 1. Run all of the tests `go test -v ./...` or `make test` and ensure all tests pass 1. Run gofmt `gofmt -w ./$*` or `make fmt` 1. Run vet `go tool vet *.go` or `make vet` and resolve any issues 1. Install golint using `go get -u github.com/golang/lint/golint` and run vet `golint *.go` or `make lint` and resolve every warning * When submitting the pull request you should 1. Note the issue(s) it resolves, i.e. 
`Closes #6` in the pull request comment to close issue #6 when the pull
request is accepted

Once you have submitted a pull request, we *may* merge it without changes. If
we have any comments or feedback, or need you to make changes to your pull
request we will update the Github pull request or the associated issue. We
expect responses from you within two weeks, and we may close the pull request
if there is no activity.

### Contributor Licence Agreement

We haven't gone for the formal "Sign a Contributor Licence Agreement" thing
that projects like [puppet](https://cla.puppetlabs.com/),
[Mojito](https://developer.yahoo.com/cocktails/mojito/cla/) and companies like
[Google](http://code.google.com/legal/individual-cla-v1.0.html) are using.

But we do need to know that we can accept and merge your contributions, so
for now the act of contributing a pull request should be considered equivalent
to agreeing to a contributor licence agreement, specifically:

* You accept that the act of submitting code to the bluemonday project is to
  grant a copyright licence to the project that is perpetual, worldwide,
  non-exclusive, no-charge, royalty free and irrevocable.
* You accept that all who comply with the licence of the project (BSD
  3-clause) are permitted to use your contributions to the project.
* You accept, and by submitting code do declare, that you have the legal
  right to grant such a licence to the project and that each of the
  contributions is your own original creation.

bluemonday-1.0.26/CREDITS.md

1. John Graham-Cumming http://jgc.org/
1. Mohammad Gufran https://github.com/Gufran
1. Steven Gutzwiller https://github.com/StevenGutzwiller
1. Andrew Krasichkov @buglloc https://github.com/buglloc
1. Mike Samuel mikesamuel@gmail.com
1. Dmitri Shuralyov shurcooL@gmail.com
1. opennota https://github.com/opennota https://gitlab.com/opennota
1.
Tom Anthony https://www.tomanthony.co.uk/

bluemonday-1.0.26/LICENSE.md

SPDX short identifier: BSD-3-Clause
https://opensource.org/licenses/BSD-3-Clause

Copyright (c) 2014, David Kitchen <david@buro9.com>

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

* Neither the name of the organisation (Microcosm) nor the names of its
  contributors may be used to endorse or promote products derived from
  this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
bluemonday-1.0.26/Makefile

# Targets:
#
#   all:         Builds the code locally after testing
#
#   fmt:         Formats the source files
#   fmt-check:   Check if the source files are formatted
#   build:       Builds the code locally
#   vet:         Vets the code
#   staticcheck: Runs staticcheck over the code
#   test:        Runs the tests
#   cover:       Gives you the URL to a nice test coverage report
#
#   install:     Builds, tests and installs the code locally

GOFILES_NOVENDOR = $(shell find . -type f -name '*.go' -not -path "./vendor/*" -not -path "./.git/*")

.PHONY: all fmt build vet lint test cover install

# The first target is always the default action if `make` is called without
# args we build and install into $GOPATH so that it can just be run

all: fmt vet test install

fmt:
	@gofmt -s -w ${GOFILES_NOVENDOR}

fmt-check:
	@([ -z "$(shell gofmt -d $(GOFILES_NOVENDOR) | head)" ]) || (echo "Source is unformatted"; exit 1)

build:
	@go build

vet:
	@go vet

staticcheck:
	@staticcheck ./...

test:
	@go test -v ./...

cover: COVERAGE_FILE := coverage.out
cover:
	@go test -coverprofile=$(COVERAGE_FILE) && \
	go tool cover -html=$(COVERAGE_FILE) && rm $(COVERAGE_FILE)

install:
	@go install ./...

bluemonday-1.0.26/README.md

# bluemonday [![GoDoc](https://godoc.org/github.com/microcosm-cc/bluemonday?status.png)](https://godoc.org/github.com/microcosm-cc/bluemonday) [![Sourcegraph](https://sourcegraph.com/github.com/microcosm-cc/bluemonday/-/badge.svg)](https://sourcegraph.com/github.com/microcosm-cc/bluemonday?badge)

bluemonday is a HTML sanitizer implemented in Go. It is fast and highly
configurable.

bluemonday takes untrusted user generated content as an input, and will return
HTML that has been sanitised against an allowlist of approved HTML elements
and attributes so that you can safely include the content in your web page.
If you accept user generated content, and your server uses Go, you **need**
bluemonday.

The default policy for user generated content
(`bluemonday.UGCPolicy().Sanitize()`) turns this:

```html
Hello <STYLE>.XSS{background-image:url("javascript:alert('XSS')");}</STYLE><A CLASS=XSS></A>World
```

Into a harmless:

```html
Hello World
```

And it turns this:

```html
<a href="javascript:alert('XSS1')" onmouseover="alert('XSS2')">XSS<a>
```

Into this:

```html
XSS
```

Whilst still allowing this:

```html
<a href="http://www.google.com/">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>
```

To pass through mostly unaltered (it gained a rel="nofollow" which is a good
thing for user generated content):

```html
<a href="http://www.google.com/" rel="nofollow">
  <img src="https://ssl.gstatic.com/accounts/ui/logo_2x.png"/>
</a>
```

It protects sites from [XSS](http://en.wikipedia.org/wiki/Cross-site_scripting)
attacks. There are many
[vectors for an XSS attack](https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet)
and the best way to mitigate the risk is to sanitize user input against a
known safe list of HTML elements and attributes.

You should **always** run bluemonday **after** any other processing.

If you use [blackfriday](https://github.com/russross/blackfriday) or
[Pandoc](http://johnmacfarlane.net/pandoc/) then bluemonday should be run
after these steps. This ensures that no insecure HTML is introduced later in
your process.

bluemonday is heavily inspired by both the
[OWASP Java HTML Sanitizer](https://code.google.com/p/owasp-java-html-sanitizer/)
and the [HTML Purifier](http://htmlpurifier.org/).

## Technical Summary

Allowlist based, you need to either build a policy describing the HTML
elements and attributes to permit (and the `regexp` patterns of attributes),
or use one of the supplied policies representing good defaults.

The policy containing the allowlist is applied using a fast non-validating,
forward only, token-based parser implemented in the
[Go net/html library](https://godoc.org/golang.org/x/net/html) by the core
Go team.

We expect to be supplied with well-formatted HTML (closing elements for every
applicable open element, nested correctly) and so we do not focus on repairing
badly nested or incomplete HTML.
We focus on simply ensuring that whatever elements do exist are described in
the policy allowlist and that attributes and links are safe for use on your
web page. [GIGO](http://en.wikipedia.org/wiki/Garbage_in,_garbage_out) does
apply and if you feed it bad HTML bluemonday is not tasked with figuring out
how to make it good again.

### Supported Go Versions

bluemonday is tested on all versions since Go 1.2 including tip.

We do not support Go 1.0 as we depend on `golang.org/x/net/html` which
includes a reference to `io.ErrNoProgress` which did not exist in Go 1.0.

We support Go 1.1 but Travis no longer tests against it.

## Is it production ready?

*Yes*

We are using bluemonday in production having migrated from the widely used
and heavily field tested OWASP Java HTML Sanitizer.

We are passing our extensive test suite (including AntiSamy tests as well as
tests for any issues raised). Check for any
[unresolved issues](https://github.com/microcosm-cc/bluemonday/issues?page=1&state=open)
to see whether anything may be a blocker for you.

We invite pull requests and issues to help us ensure we are offering
comprehensive protection against various attacks via user generated content.
## Usage

Install in your `${GOPATH}` using `go get -u github.com/microcosm-cc/bluemonday`

Then call it:

```go
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	// Do this once for each unique policy, and use the policy for the life of the program
	// Policy creation/editing is not safe to use in multiple goroutines
	p := bluemonday.UGCPolicy()

	// The policy can then be used to sanitize lots of input and it is safe to use the policy in multiple goroutines
	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	// Output:
	// <a href="http://www.google.com" rel="nofollow">Google</a>
	fmt.Println(html)
}
```

We offer three ways to call Sanitize:

```go
p.Sanitize(string) string
p.SanitizeBytes([]byte) []byte
p.SanitizeReader(io.Reader) bytes.Buffer
```

If you are obsessed about performance, `p.SanitizeReader(r).Bytes()` will
return a `[]byte` without performing any unnecessary casting of the inputs or
outputs. Though the difference is so negligible you should never need to care.

You can build your own policies:

```go
package main

import (
	"fmt"

	"github.com/microcosm-cc/bluemonday"
)

func main() {
	p := bluemonday.NewPolicy()

	// Require URLs to be parseable by net/url.Parse and either:
	//   mailto: http:// or https://
	p.AllowStandardURLs()

	// We only allow <p> and <a href="">
	p.AllowAttrs("href").OnElements("a")
	p.AllowElements("p")

	html := p.Sanitize(
		`<a onblur="alert(secret)" href="http://www.google.com">Google</a>`,
	)

	// Output:
	// <a href="http://www.google.com">Google</a>
	fmt.Println(html)
}
```

We ship two default policies:

1. `bluemonday.StrictPolicy()` which can be thought of as equivalent to
   stripping all HTML elements and their attributes as it has nothing on its
   allowlist. An example usage scenario would be blog post titles where HTML
   tags are not expected at all and if they are then the elements *and* the
   content of the elements should be stripped. This is a *very* strict policy.
2. `bluemonday.UGCPolicy()` which allows a broad selection of HTML elements
   and attributes that are safe for user generated content. Note that this
   policy does *not* allow iframes, object, embed, styles, script, etc. An
   example usage scenario would be blog post bodies where a variety of
   formatting is expected along with the potential for TABLEs and IMGs.

## Policy Building

The essence of building a policy is to determine which HTML elements and
attributes are considered safe for your scenario. OWASP provide an
[XSS prevention cheat sheet](https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet)
to help explain the risks, but essentially:

1. Avoid anything other than the standard HTML elements
1. Avoid `script`, `style`, `iframe`, `object`, `embed`, `base` elements that
   allow code to be executed by the client or third party content to be
   included that can execute code
1. Avoid anything other than plain HTML attributes with values matched to a
   regexp

Basically, you should be able to describe what HTML is fine for your scenario.
If you do not have confidence that you can describe your policy please
consider using one of the shipped policies such as `bluemonday.UGCPolicy()`.
To create a new policy: ```go p := bluemonday.NewPolicy() ``` To add elements to a policy either add just the elements: ```go p.AllowElements("b", "strong") ``` Or using a regex: _Note: if an element is added by name as shown above, any matching regex will be ignored_ It is also recommended to ensure multiple patterns don't overlap as order of execution is not guaranteed and can result in some rules being missed. ```go p.AllowElementsMatching(regex.MustCompile(`^my-element-`)) ``` Or add elements as a virtue of adding an attribute: ```go // Note the recommended pattern, see the recommendation on using .Matching() below p.AllowAttrs("nowrap").OnElements("td", "th") ``` Again, this also supports a regex pattern match alternative: ```go p.AllowAttrs("nowrap").OnElementsMatching(regex.MustCompile(`^my-element-`)) ``` Attributes can either be added to all elements: ```go p.AllowAttrs("dir").Matching(regexp.MustCompile("(?i)rtl|ltr")).Globally() ``` Or attributes can be added to specific elements: ```go // Not the recommended pattern, see the recommendation on using .Matching() below p.AllowAttrs("value").OnElements("li") ``` It is **always** recommended that an attribute be made to match a pattern. XSS in HTML attributes is very easy otherwise: ```go // \p{L} matches unicode letters, \p{N} matches unicode numbers p.AllowAttrs("title").Matching(regexp.MustCompile(`[\p{L}\p{N}\s\-_',:\[\]!\./\\\(\)&]*`)).Globally() ``` You can stop at any time and call .Sanitize(): ```go // string htmlIn passed in from a HTTP POST htmlOut := p.Sanitize(htmlIn) ``` And you can take any existing policy and extend it: ```go p := bluemonday.UGCPolicy() p.AllowElements("fieldset", "select", "option") ``` ### Inline CSS Although it's possible to handle inline CSS using `AllowAttrs` with a `Matching` rule, writing a single monolithic regular expression to safely process all inline CSS which you wish to allow is not a trivial task. 
Instead of attempting to do so, you can allow the `style` attribute on whichever element(s) you desire and use style policies to control and sanitize inline styles. It is strongly recommended that you use `Matching` (with a suitable regular expression) `MatchingEnum`, or `MatchingHandler` to ensure each style matches your needs, but default handlers are supplied for most widely used styles. Similar to attributes, you can allow specific CSS properties to be set inline: ```go p.AllowAttrs("style").OnElements("span", "p") // Allow the 'color' property with valid RGB(A) hex values only (on any element allowed a 'style' attribute) p.AllowStyles("color").Matching(regexp.MustCompile("(?i)^#([0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$")).Globally() ``` Additionally, you can allow a CSS property to be set only to an allowed value: ```go p.AllowAttrs("style").OnElements("span", "p") // Allow the 'text-decoration' property to be set to 'underline', 'line-through' or 'none' // on 'span' elements only p.AllowStyles("text-decoration").MatchingEnum("underline", "line-through", "none").OnElements("span") ``` Or you can specify elements based on a regex pattern match: ```go p.AllowAttrs("style").OnElementsMatching(regex.MustCompile(`^my-element-`)) // Allow the 'text-decoration' property to be set to 'underline', 'line-through' or 'none' // on 'span' elements only p.AllowStyles("text-decoration").MatchingEnum("underline", "line-through", "none").OnElementsMatching(regex.MustCompile(`^my-element-`)) ``` If you need more specific checking, you can create a handler that takes in a string and returns a bool to validate the values for a given property. The string parameter has been converted to lowercase and unicode code points have been converted. 
```go myHandler := func(value string) bool{ // Validate your input here return true } p.AllowAttrs("style").OnElements("span", "p") // Allow the 'color' property with values validated by the handler (on any element allowed a 'style' attribute) p.AllowStyles("color").MatchingHandler(myHandler).Globally() ``` ### Links Links are difficult beasts to sanitise safely and also one of the biggest attack vectors for malicious content. It is possible to do this: ```go p.AllowAttrs("href").Matching(regexp.MustCompile(`(?i)mailto|https?`)).OnElements("a") ``` But that will not protect you as the regular expression is insufficient in this case to have prevented a malformed value doing something unexpected. We provide some additional global options for safely working with links. `RequireParseableURLs` will ensure that URLs are parseable by Go's `net/url` package: ```go p.RequireParseableURLs(true) ``` If you have enabled parseable URLs then the following option will `AllowRelativeURLs`. By default this is disabled (bluemonday is an allowlist tool... you need to explicitly tell us to permit things) and when disabled it will prevent all local and scheme relative URLs (i.e. `href="localpage.html"`, `href="../home.html"` and even `href="//www.google.com"` are relative): ```go p.AllowRelativeURLs(true) ``` If you have enabled parseable URLs then you can allow the schemes (commonly called protocol when thinking of `http` and `https`) that are permitted. Bear in mind that allowing relative URLs in the above option will allow for a blank scheme: ```go p.AllowURLSchemes("mailto", "http", "https") ``` Regardless of whether you have enabled parseable URLs, you can force all URLs to have a rel="nofollow" attribute. This will be added if it does not exist, but only when the `href` is valid: ```go // This applies to "a" "area" "link" elements that have a "href" attribute p.RequireNoFollowOnLinks(true) ``` Similarly, you can force all URLs to have "noreferrer" in their rel attribute. 
```go // This applies to "a" "area" "link" elements that have a "href" attribute p.RequireNoReferrerOnLinks(true) ``` We provide a convenience method that applies all of the above, but you will still need to allow the linkable elements for the URL rules to be applied to: ```go p.AllowStandardURLs() p.AllowAttrs("cite").OnElements("blockquote", "q") p.AllowAttrs("href").OnElements("a", "area") p.AllowAttrs("src").OnElements("img") ``` An additional complexity regarding links is the data URI as defined in [RFC2397](http://tools.ietf.org/html/rfc2397). The data URI allows for images to be served inline using this format: ```html ``` We have provided a helper to verify the mimetype followed by base64 content of data URIs links: ```go p.AllowDataURIImages() ``` That helper will enable GIF, JPEG, PNG and WEBP images. It should be noted that there is a potential [security](http://palizine.plynt.com/issues/2010Oct/bypass-xss-filters/) [risk](https://capec.mitre.org/data/definitions/244.html) with the use of data URI links. You should only enable data URI links if you already trust the content. We also have some features to help deal with user generated content: ```go p.AddTargetBlankToFullyQualifiedLinks(true) ``` This will ensure that anchor `` links that are fully qualified (the href destination includes a host name) will get `target="_blank"` added to them. Additionally any link that has `target="_blank"` after the policy has been applied will also have the `rel` attribute adjusted to add `noopener`. This means a link may start like `` and will end up as ``. It is important to note that the addition of `noopener` is a security feature and not an issue. There is an unfortunate feature to browsers that a browser window opened as a result of `target="_blank"` can still control the opener (your web page) and this protects against that. 
The background to this can be found here: [https://dev.to/ben/the-targetblank-vulnerability-by-example](https://dev.to/ben/the-targetblank-vulnerability-by-example) ### Policy Building Helpers We also bundle some helpers to simplify policy building: ```go // Permits the "dir", "id", "lang", "title" attributes globally p.AllowStandardAttributes() // Permits the "img" element and its standard attributes p.AllowImages() // Permits ordered and unordered lists, and also definition lists p.AllowLists() // Permits HTML tables and all applicable elements and non-styling attributes p.AllowTables() ``` ### Invalid Instructions The following are invalid: ```go // This does not say where the attributes are allowed, you need to add // .Globally() or .OnElements(...) // This will be ignored without error. p.AllowAttrs("value") // This does not say where the attributes are allowed, you need to add // .Globally() or .OnElements(...) // This will be ignored without error. p.AllowAttrs( "type", ).Matching( regexp.MustCompile("(?i)^(circle|disc|square|a|A|i|I|1)$"), ) ``` Both examples exhibit the same issue, they declare attributes but do not then specify whether they are allowed globally or only on specific elements (and which elements). Attributes belong to one or more elements, and the policy needs to declare this. ## Limitations We are not yet including any tools to help allow and sanitize CSS. Which means that unless you wish to do the heavy lifting in a single regular expression (inadvisable), **you should not allow the "style" attribute anywhere**. 
In the same theme, both `<script>` and `<style>` are considered harmful.

			"\nEnde\n\r",
			expected: "Hallo\n\nEnde\n\n",
		},
	}

	p := UGCPolicy()

	for ii, test := range tests {
		out := p.Sanitize(test.in)
		if out != test.expected {
			t.Errorf(
				"test %d failed;\ninput   : %s\noutput  : %s\nexpected: %s",
				ii,
				test.in,
				out,
				test.expected,
			)
		}
	}
}

bluemonday-1.0.26/policy.go

// Copyright (c) 2014, David Kitchen <david@buro9.com>
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// * Redistributions of source code must retain the above copyright notice, this
//   list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright notice,
//   this list of conditions and the following disclaimer in the documentation
//   and/or other materials provided with the distribution.
//
// * Neither the name of the organisation (Microcosm) nor the names of its
//   contributors may be used to endorse or promote products derived from
//   this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
package bluemonday //TODO sgutzwiller create map of styles to default handlers //TODO sgutzwiller create handlers for various attributes import ( "net/url" "regexp" "strings" "github.com/microcosm-cc/bluemonday/css" ) // Policy encapsulates the allowlist of HTML elements and attributes that will // be applied to the sanitised HTML. // // You should use bluemonday.NewPolicy() to create a blank policy as the // unexported fields contain maps that need to be initialized. type Policy struct { // Declares whether the maps have been initialized, used as a cheap check to // ensure that those using Policy{} directly won't cause nil pointer // exceptions initialized bool // If true then we add spaces when stripping tags, specifically the closing // tag is replaced by a space character. addSpaces bool // When true, add rel="nofollow" to HTML a, area, and link tags requireNoFollow bool // When true, add rel="nofollow" to HTML a, area, and link tags // Will add for href="http://foo" // Will skip for href="/foo" or href="foo" requireNoFollowFullyQualifiedLinks bool // When true, add rel="noreferrer" to HTML a, area, and link tags requireNoReferrer bool // When true, add rel="noreferrer" to HTML a, area, and link tags // Will add for href="http://foo" // Will skip for href="/foo" or href="foo" requireNoReferrerFullyQualifiedLinks bool // When true, add crossorigin="anonymous" to HTML audio, img, link, script, and video tags requireCrossOriginAnonymous bool // When true, add and filter sandbox attribute on iframe tags requireSandboxOnIFrame map[string]bool // When true add target="_blank" to fully qualified links // Will add for href="http://foo" // Will skip for href="/foo" or href="foo" addTargetBlankToFullyQualifiedLinks bool // When true, URLs must be parseable by "net/url" url.Parse() requireParseableURLs bool // When true, u, _ := url.Parse("url"); !u.IsAbs() is permitted allowRelativeURLs bool // When true, allow data attributes. 
allowDataAttributes bool // When true, allow comments. allowComments bool // map[htmlElementName]map[htmlAttributeName][]attrPolicy elsAndAttrs map[string]map[string][]attrPolicy // elsMatchingAndAttrs stores regex based element matches along with attributes elsMatchingAndAttrs map[*regexp.Regexp]map[string][]attrPolicy // map[htmlAttributeName][]attrPolicy globalAttrs map[string][]attrPolicy // map[htmlElementName]map[cssPropertyName][]stylePolicy elsAndStyles map[string]map[string][]stylePolicy // map[regex]map[cssPropertyName][]stylePolicy elsMatchingAndStyles map[*regexp.Regexp]map[string][]stylePolicy // map[cssPropertyName][]stylePolicy globalStyles map[string][]stylePolicy // If urlPolicy is nil, all URLs with matching schema are allowed. // Otherwise, only the URLs with matching schema and urlPolicy(url) // returning true are allowed. allowURLSchemes map[string][]urlPolicy // These regexps are used to match allowed URL schemes, for example // if one would want to allow all URL schemes, they would add `.+`. // However pay attention as this can lead to XSS being rendered thus // defeating the purpose of using a HTML sanitizer. // The regexps are only considered if a schema was not explicitly // handled by `AllowURLSchemes` or `AllowURLSchemeWithCustomPolicy`. allowURLSchemeRegexps []*regexp.Regexp // If srcRewriter is not nil, it is used to rewrite the src attribute // of tags that download resources, such as and tag. 
func (p *Policy) addDefaultSkipElementContent() {
	p.init()

	p.setOfElementsToSkipContent["frame"] = struct{}{}
	p.setOfElementsToSkipContent["frameset"] = struct{}{}
	p.setOfElementsToSkipContent["iframe"] = struct{}{}
	p.setOfElementsToSkipContent["noembed"] = struct{}{}
	p.setOfElementsToSkipContent["noframes"] = struct{}{}
	p.setOfElementsToSkipContent["noscript"] = struct{}{}
	p.setOfElementsToSkipContent["nostyle"] = struct{}{}
	p.setOfElementsToSkipContent["object"] = struct{}{}
	p.setOfElementsToSkipContent["script"] = struct{}{}
	p.setOfElementsToSkipContent["style"] = struct{}{}
	p.setOfElementsToSkipContent["title"] = struct{}{}
}

bluemonday-1.0.26/policy_test.go

// Copyright (c) 2014, David Kitchen <david@buro9.com>
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// * Redistributions of source code must retain the above copyright notice, this
//   list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright notice,
//   this list of conditions and the following disclaimer in the documentation
//   and/or other materials provided with the distribution.
//
// * Neither the name of the organisation (Microcosm) nor the names of its
//   contributors may be used to endorse or promote products derived from
//   this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED.
// IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

package bluemonday

import (
	"regexp"
	"testing"
)

func TestAllowElementsContent(t *testing.T) {
	policy := NewPolicy().AllowElementsContent("iframe", "script").AllowUnsafe(true)

	tests := []test{
		{
			in:       "",
			expected: "this is fallback content",
		},
		{
			in:       "",
			expected: "var a = 10; alert(a);",
		},
	}

	for ii, test := range tests {
		out := policy.Sanitize(test.in)
		if out != test.expected {
			t.Errorf(
				"test %d failed;\ninput : %s\noutput : %s\nexpected: %s",
				ii,
				test.in,
				out,
				test.expected,
			)
		}
	}
}

func TestAllowElementsMatching(t *testing.T) {
	tests := map[string]struct {
		policyFn func(policy *Policy)
		in       string
		expected string
	}{
		"Self closing tags with regex prefix should strip any that do not match": {
			policyFn: func(policy *Policy) {
				policy.AllowElementsMatching(regexp.MustCompile(`^my-element-`))
			},
			in: `
`, expected: `
`, }, "Standard elements regex prefix should strip any that do not match": { policyFn: func(policy *Policy) { policy.AllowElementsMatching(regexp.MustCompile(`^my-element-`)) }, in: `
`, expected: `
`, }, "Self closing tags with regex prefix and custom attr should strip any that do not match": { policyFn: func(policy *Policy) { policy.AllowElementsMatching(regexp.MustCompile(`^my-element-`)) policy.AllowElements("not-my-element-demo-one") }, in: `
`, expected: `
`, }, } for name, test := range tests { policy := NewPolicy().AllowElements("div") policy.AllowDataAttributes() if test.policyFn != nil { test.policyFn(policy) } out := policy.Sanitize(test.in) if out != test.expected { t.Errorf( "test %s failed;\ninput : %s\noutput : %s\nexpected: %s", name, test.in, out, test.expected, ) } } } func TestAttrOnElementMatching(t *testing.T) { tests := map[string]struct { policyFn func(policy *Policy) in string expected string }{ "Self closing tags with regex prefix should strip any that do not match with custom attr": { policyFn: func(policy *Policy) { policy.AllowAttrs("my-attr").OnElementsMatching(regexp.MustCompile(`^my-element-`)) }, in: `
`, expected: `
`, }, "Standard elements regex prefix should strip any that do not match": { policyFn: func(policy *Policy) { policy.AllowAttrs("my-attr").OnElementsMatching(regexp.MustCompile(`^my-element-`)) }, in: `
`, expected: `
`, }, "Specific element rule defined should override matching rules": { policyFn: func(policy *Policy) { // specific element rule policy.AllowAttrs("my-other-attr").OnElements("my-element-demo-one") // matched rule takes lower precedence policy.AllowAttrs("my-attr").OnElementsMatching(regexp.MustCompile(`^my-element-`)) }, in: `
`, expected: `
`, }, } for name, test := range tests { policy := NewPolicy().AllowElements("div") policy.AllowDataAttributes() if test.policyFn != nil { test.policyFn(policy) } out := policy.Sanitize(test.in) if out != test.expected { t.Errorf( "test %s failed;\ninput : %s\noutput : %s\nexpected: %s", name, test.in, out, test.expected, ) } } } func TestStyleOnElementMatching(t *testing.T) { tests := map[string]struct { policyFn func(policy *Policy) in string expected string }{ "Self closing tags with style policy matching prefix should strip any that do not match with custom attr": { policyFn: func(policy *Policy) { policy.AllowAttrs("style"). OnElementsMatching(regexp.MustCompile(`^my-element-`)) policy.AllowStyles("color", "mystyle"). MatchingHandler(func(s string) bool { return true }).OnElementsMatching(regexp.MustCompile(`^my-element-`)) }, in: `
`, expected: `
`, }, "Standard elements with style policy and matching elements should strip any styles not allowed": { policyFn: func(policy *Policy) { policy.AllowAttrs("style"). OnElementsMatching(regexp.MustCompile(`^my-element-`)) policy.AllowStyles("color", "mystyle"). MatchingHandler(func(s string) bool { return true }).OnElementsMatching(regexp.MustCompile(`^my-element-`)) }, in: `
`, expected: `
`, }, "Specific element rule defined should override matching rules": { policyFn: func(policy *Policy) { policy.AllowAttrs("style"). OnElements("my-element-demo-one") policy.AllowStyles("color", "mystyle"). MatchingHandler(func(s string) bool { return true }).OnElements("my-element-demo-one") policy.AllowAttrs("style"). OnElementsMatching(regexp.MustCompile(`^my-element-`)) policy.AllowStyles("color", "customstyle"). MatchingHandler(func(s string) bool { return true }).OnElementsMatching(regexp.MustCompile(`^my-element-`)) }, in: `
`, expected: `
`,
		},
	}

	for name, test := range tests {
		policy := NewPolicy().AllowElements("div")
		policy.AllowDataAttributes()
		if test.policyFn != nil {
			test.policyFn(policy)
		}
		out := policy.Sanitize(test.in)
		if out != test.expected {
			t.Errorf(
				"test %s failed;\ninput : %s\noutput : %s\nexpected: %s",
				name,
				test.in,
				out,
				test.expected,
			)
		}
	}
}

bluemonday-1.0.26/sanitize.go

// Copyright (c) 2014, David Kitchen
//
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are met:
//
// * Redistributions of source code must retain the above copyright notice, this
//   list of conditions and the following disclaimer.
//
// * Redistributions in binary form must reproduce the above copyright notice,
//   this list of conditions and the following disclaimer in the documentation
//   and/or other materials provided with the distribution.
//
// * Neither the name of the organisation (Microcosm) nor the names of its
//   contributors may be used to endorse or promote products derived from
//   this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED.
// IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
// FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
// DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
// SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
// CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
// OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

package bluemonday

import (
	"bytes"
	"fmt"
	"io"
	"net/url"
	"regexp"
	"strconv"
	"strings"

	"golang.org/x/net/html"

	"github.com/aymerick/douceur/parser"
)

var (
	dataAttribute             = regexp.MustCompile("^data-.+")
	dataAttributeXMLPrefix    = regexp.MustCompile("^xml.+")
	dataAttributeInvalidChars = regexp.MustCompile("[A-Z;]+")
	cssUnicodeChar            = regexp.MustCompile(`\\[0-9a-f]{1,6} ?`)
	dataURIbase64Prefix       = regexp.MustCompile(`^data:[^,]*;base64,`)
)

// Sanitize takes a string that contains a HTML fragment or document and applies
// the given policy allowlist.
//
// It returns a HTML string that has been sanitized by the policy or an empty
// string if an error has occurred (most likely as a consequence of extremely
// malformed input)
func (p *Policy) Sanitize(s string) string {
	if strings.TrimSpace(s) == "" {
		return s
	}

	return p.sanitizeWithBuff(strings.NewReader(s)).String()
}

// SanitizeBytes takes a []byte that contains a HTML fragment or document and applies
// the given policy allowlist.
//
// It returns a []byte containing the HTML that has been sanitized by the policy
// or an empty []byte if an error has occurred (most likely as a consequence of
// extremely malformed input)
func (p *Policy) SanitizeBytes(b []byte) []byte {
	if len(bytes.TrimSpace(b)) == 0 {
		return b
	}

	return p.sanitizeWithBuff(bytes.NewReader(b)).Bytes()
}

// SanitizeReader takes an io.Reader that contains a HTML fragment or document
// and applies the given policy allowlist.
// // It returns a bytes.Buffer containing the HTML that has been sanitized by the // policy. Errors during sanitization will merely return an empty result. func (p *Policy) SanitizeReader(r io.Reader) *bytes.Buffer { return p.sanitizeWithBuff(r) } // SanitizeReaderToWriter takes an io.Reader that contains a HTML fragment or document // and applies the given policy allowlist and writes to the provided writer returning // an error if there is one. func (p *Policy) SanitizeReaderToWriter(r io.Reader, w io.Writer) error { return p.sanitize(r, w) } // Query represents a single part of the query string, a query param type Query struct { Key string Value string HasValue bool } func parseQuery(query string) (values []Query, err error) { // This is essentially a copy of parseQuery from // https://golang.org/src/net/url/url.go but adjusted to build our values // based on our type, which we need to preserve the ordering of the query // string for query != "" { key := query if i := strings.IndexAny(key, "&;"); i >= 0 { key, query = key[:i], key[i+1:] } else { query = "" } if key == "" { continue } value := "" hasValue := false if i := strings.Index(key, "="); i >= 0 { key, value = key[:i], key[i+1:] hasValue = true } key, err1 := url.QueryUnescape(key) if err1 != nil { if err == nil { err = err1 } continue } value, err1 = url.QueryUnescape(value) if err1 != nil { if err == nil { err = err1 } continue } values = append(values, Query{ Key: key, Value: value, HasValue: hasValue, }) } return values, err } func encodeQueries(queries []Query) string { var buff bytes.Buffer for i, query := range queries { buff.WriteString(url.QueryEscape(query.Key)) if query.HasValue { buff.WriteString("=") buff.WriteString(url.QueryEscape(query.Value)) } if i < len(queries)-1 { buff.WriteString("&") } } return buff.String() } func sanitizedURL(val string) (string, error) { u, err := url.Parse(val) if err != nil { return "", err } // we use parseQuery but not u.Query to keep the order not change 
because // url.Values is a map which has a random order. queryValues, err := parseQuery(u.RawQuery) if err != nil { return "", err } // sanitize the url query params for i, query := range queryValues { queryValues[i].Key = html.EscapeString(query.Key) } u.RawQuery = encodeQueries(queryValues) // u.String() will also sanitize host/scheme/user/pass return u.String(), nil } // Performs the actual sanitization process. func (p *Policy) sanitizeWithBuff(r io.Reader) *bytes.Buffer { var buff bytes.Buffer if err := p.sanitize(r, &buff); err != nil { return &bytes.Buffer{} } return &buff } type asStringWriter struct { io.Writer } func (a *asStringWriter) WriteString(s string) (int, error) { return a.Write([]byte(s)) } func (p *Policy) sanitize(r io.Reader, w io.Writer) error { // It is possible that the developer has created the policy via: // p := bluemonday.Policy{} // rather than: // p := bluemonday.NewPolicy() // If this is the case, and if they haven't yet triggered an action that // would initialize the maps, then we need to do that. p.init() buff, ok := w.(stringWriterWriter) if !ok { buff = &asStringWriter{w} } var ( skipElementContent bool skippingElementsCount int64 skipClosingTag bool closingTagToSkipStack []string mostRecentlyStartedToken string ) tokenizer := html.NewTokenizer(r) for { if tokenizer.Next() == html.ErrorToken { err := tokenizer.Err() if err == io.EOF { // End of input means end of processing return nil } // Raw tokenizer error return err } token := tokenizer.Token() switch token.Type { case html.DoctypeToken: // DocType is not handled as there is no safe parsing mechanism // provided by golang.org/x/net/html for the content, and this can // be misused to insert HTML tags that are not then sanitized // // One might wish to recursively sanitize here using the same policy // but I will need to do some further testing before considering // this. 
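The comment above explains why `parseQuery` exists at all: `url.Values` is a `map[string][]string`, so iterating it loses the original parameter order, while sanitization needs to emit the query string in the order it arrived. A stand-alone sketch of that order-preserving parse follows; the `pair` type and `parsePreservingOrder` name are my own, and unlike the original (which skips bad pairs and keeps going) this version is simplified to return on the first unescape error.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// pair is an order-preserving alternative to url.Values, which is a
// map[string][]string and therefore loses the original parameter order.
type pair struct {
	Key, Value string
	HasValue   bool
}

// parsePreservingOrder splits a raw query on '&' and ';' and unescapes
// each key and value, keeping the pairs in their original order.
func parsePreservingOrder(query string) ([]pair, error) {
	var out []pair
	for query != "" {
		key := query
		if i := strings.IndexAny(key, "&;"); i >= 0 {
			key, query = key[:i], key[i+1:]
		} else {
			query = ""
		}
		if key == "" {
			continue
		}
		value, hasValue := "", false
		if i := strings.Index(key, "="); i >= 0 {
			key, value = key[:i], key[i+1:]
			hasValue = true
		}
		k, err := url.QueryUnescape(key)
		if err != nil {
			return nil, err
		}
		v, err := url.QueryUnescape(value)
		if err != nil {
			return nil, err
		}
		out = append(out, pair{Key: k, Value: v, HasValue: hasValue})
	}
	return out, nil
}

func main() {
	pairs, _ := parsePreservingOrder("z=1&a=2&flag")
	for _, p := range pairs {
		fmt.Printf("%s=%q hasValue=%v\n", p.Key, p.Value, p.HasValue)
	}
}
```

Tracking `HasValue` separately matters when re-encoding: a bare `flag` must round-trip as `flag`, not `flag=`.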
case html.CommentToken: // Comments are ignored by default if p.allowComments { // But if allowed then write the comment out as-is buff.WriteString(token.String()) } case html.StartTagToken: mostRecentlyStartedToken = normaliseElementName(token.Data) switch normaliseElementName(token.Data) { case `script`: if !p.allowUnsafe { continue } case `style`: if !p.allowUnsafe { continue } } aps, ok := p.elsAndAttrs[token.Data] if !ok { aa, matched := p.matchRegex(token.Data) if !matched { if _, ok := p.setOfElementsToSkipContent[token.Data]; ok { skipElementContent = true skippingElementsCount++ } if p.addSpaces { if _, err := buff.WriteString(" "); err != nil { return err } } break } aps = aa } if len(token.Attr) != 0 { token.Attr = p.sanitizeAttrs(token.Data, token.Attr, aps) } if len(token.Attr) == 0 { if !p.allowNoAttrs(token.Data) { skipClosingTag = true closingTagToSkipStack = append(closingTagToSkipStack, token.Data) if p.addSpaces { if _, err := buff.WriteString(" "); err != nil { return err } } break } } if !skipElementContent { if _, err := buff.WriteString(token.String()); err != nil { return err } } case html.EndTagToken: if mostRecentlyStartedToken == normaliseElementName(token.Data) { mostRecentlyStartedToken = "" } switch normaliseElementName(token.Data) { case `script`: if !p.allowUnsafe { continue } case `style`: if !p.allowUnsafe { continue } } if skipClosingTag && closingTagToSkipStack[len(closingTagToSkipStack)-1] == token.Data { closingTagToSkipStack = closingTagToSkipStack[:len(closingTagToSkipStack)-1] if len(closingTagToSkipStack) == 0 { skipClosingTag = false } if p.addSpaces { if _, err := buff.WriteString(" "); err != nil { return err } } break } if _, ok := p.elsAndAttrs[token.Data]; !ok { match := false for regex := range p.elsMatchingAndAttrs { if regex.MatchString(token.Data) { skipElementContent = false match = true break } } if _, ok := p.setOfElementsToSkipContent[token.Data]; ok && !match { skippingElementsCount-- if skippingElementsCount 
== 0 { skipElementContent = false } } if !match { if p.addSpaces { if _, err := buff.WriteString(" "); err != nil { return err } } break } } if !skipElementContent { if _, err := buff.WriteString(token.String()); err != nil { return err } } case html.SelfClosingTagToken: switch normaliseElementName(token.Data) { case `script`: if !p.allowUnsafe { continue } case `style`: if !p.allowUnsafe { continue } } aps, ok := p.elsAndAttrs[token.Data] if !ok { aa, matched := p.matchRegex(token.Data) if !matched { if p.addSpaces && !matched { if _, err := buff.WriteString(" "); err != nil { return err } } break } aps = aa } if len(token.Attr) != 0 { token.Attr = p.sanitizeAttrs(token.Data, token.Attr, aps) } if len(token.Attr) == 0 && !p.allowNoAttrs(token.Data) { if p.addSpaces { if _, err := buff.WriteString(" "); err != nil { return err } } break } if !skipElementContent { if _, err := buff.WriteString(token.String()); err != nil { return err } } case html.TextToken: if !skipElementContent { switch mostRecentlyStartedToken { case `script`: // not encouraged, but if a policy allows JavaScript we // should not HTML escape it as that would break the output // // requires p.AllowUnsafe() if p.allowUnsafe { if _, err := buff.WriteString(token.Data); err != nil { return err } } case "style": // not encouraged, but if a policy allows CSS styles we // should not HTML escape it as that would break the output // // requires p.AllowUnsafe() if p.allowUnsafe { if _, err := buff.WriteString(token.Data); err != nil { return err } } default: // HTML escape the text if _, err := buff.WriteString(token.String()); err != nil { return err } } } default: // A token that didn't exist in the html package when we wrote this return fmt.Errorf("unknown token: %v", token) } } } // sanitizeAttrs takes a set of element attribute policies and the global // attribute policies and applies them to the []html.Attribute returning a set // of html.Attributes that match the policies func (p *Policy) 
sanitizeAttrs( elementName string, attrs []html.Attribute, aps map[string][]attrPolicy, ) []html.Attribute { if len(attrs) == 0 { return attrs } hasStylePolicies := false sps, elementHasStylePolicies := p.elsAndStyles[elementName] if len(p.globalStyles) > 0 || (elementHasStylePolicies && len(sps) > 0) { hasStylePolicies = true } // no specific element policy found, look for a pattern match if !hasStylePolicies { for k, v := range p.elsMatchingAndStyles { if k.MatchString(elementName) { if len(v) > 0 { hasStylePolicies = true break } } } } // Builds a new attribute slice based on the whether the attribute has been // allowed explicitly or globally. cleanAttrs := []html.Attribute{} attrsLoop: for _, htmlAttr := range attrs { if p.allowDataAttributes { // If we see a data attribute, let it through. if isDataAttribute(htmlAttr.Key) { cleanAttrs = append(cleanAttrs, htmlAttr) continue } } // Is this a "style" attribute, and if so, do we need to sanitize it? if htmlAttr.Key == "style" && hasStylePolicies { htmlAttr = p.sanitizeStyles(htmlAttr, elementName) if htmlAttr.Val == "" { // We've sanitized away any and all styles; don't bother to // output the style attribute (even if it's allowed) continue } else { cleanAttrs = append(cleanAttrs, htmlAttr) continue } } // Is there an element specific attribute policy that applies? if apl, ok := aps[htmlAttr.Key]; ok { for _, ap := range apl { if ap.regexp != nil { if ap.regexp.MatchString(htmlAttr.Val) { cleanAttrs = append(cleanAttrs, htmlAttr) continue attrsLoop } } else { cleanAttrs = append(cleanAttrs, htmlAttr) continue attrsLoop } } } // Is there a global attribute policy that applies? 
if apl, ok := p.globalAttrs[htmlAttr.Key]; ok { for _, ap := range apl { if ap.regexp != nil { if ap.regexp.MatchString(htmlAttr.Val) { cleanAttrs = append(cleanAttrs, htmlAttr) } } else { cleanAttrs = append(cleanAttrs, htmlAttr) } } } } if len(cleanAttrs) == 0 { // If nothing was allowed, let's get out of here return cleanAttrs } // cleanAttrs now contains the attributes that are permitted if linkable(elementName) { if p.requireParseableURLs { // Ensure URLs are parseable: // - a.href // - area.href // - link.href // - blockquote.cite // - q.cite // - img.src // - script.src tmpAttrs := []html.Attribute{} for _, htmlAttr := range cleanAttrs { switch elementName { case "a", "area", "base", "link": if htmlAttr.Key == "href" { if u, ok := p.validURL(htmlAttr.Val); ok { htmlAttr.Val = u tmpAttrs = append(tmpAttrs, htmlAttr) } break } tmpAttrs = append(tmpAttrs, htmlAttr) case "blockquote", "del", "ins", "q": if htmlAttr.Key == "cite" { if u, ok := p.validURL(htmlAttr.Val); ok { htmlAttr.Val = u tmpAttrs = append(tmpAttrs, htmlAttr) } break } tmpAttrs = append(tmpAttrs, htmlAttr) case "audio", "embed", "iframe", "img", "script", "source", "track", "video": if htmlAttr.Key == "src" { if u, ok := p.validURL(htmlAttr.Val); ok { if p.srcRewriter != nil { parsedURL, err := url.Parse(u) if err != nil { fmt.Println(err) } p.srcRewriter(parsedURL) u = parsedURL.String() } htmlAttr.Val = u tmpAttrs = append(tmpAttrs, htmlAttr) } break } tmpAttrs = append(tmpAttrs, htmlAttr) default: tmpAttrs = append(tmpAttrs, htmlAttr) } } cleanAttrs = tmpAttrs } if (p.requireNoFollow || p.requireNoFollowFullyQualifiedLinks || p.requireNoReferrer || p.requireNoReferrerFullyQualifiedLinks || p.addTargetBlankToFullyQualifiedLinks) && len(cleanAttrs) > 0 { // Add rel="nofollow" if a "href" exists switch elementName { case "a", "area", "base", "link": var hrefFound bool var externalLink bool for _, htmlAttr := range cleanAttrs { if htmlAttr.Key == "href" { hrefFound = true u, err := 
url.Parse(htmlAttr.Val) if err != nil { continue } if u.Host != "" { externalLink = true } continue } } if hrefFound { var ( noFollowFound bool noReferrerFound bool targetBlankFound bool ) addNoFollow := (p.requireNoFollow || externalLink && p.requireNoFollowFullyQualifiedLinks) addNoReferrer := (p.requireNoReferrer || externalLink && p.requireNoReferrerFullyQualifiedLinks) addTargetBlank := (externalLink && p.addTargetBlankToFullyQualifiedLinks) tmpAttrs := []html.Attribute{} for _, htmlAttr := range cleanAttrs { var appended bool if htmlAttr.Key == "rel" && (addNoFollow || addNoReferrer) { if addNoFollow && !strings.Contains(htmlAttr.Val, "nofollow") { htmlAttr.Val += " nofollow" } if addNoReferrer && !strings.Contains(htmlAttr.Val, "noreferrer") { htmlAttr.Val += " noreferrer" } noFollowFound = addNoFollow noReferrerFound = addNoReferrer tmpAttrs = append(tmpAttrs, htmlAttr) appended = true } if elementName == "a" && htmlAttr.Key == "target" { if htmlAttr.Val == "_blank" { targetBlankFound = true } if addTargetBlank && !targetBlankFound { htmlAttr.Val = "_blank" targetBlankFound = true tmpAttrs = append(tmpAttrs, htmlAttr) appended = true } } if !appended { tmpAttrs = append(tmpAttrs, htmlAttr) } } if noFollowFound || noReferrerFound || targetBlankFound { cleanAttrs = tmpAttrs } if (addNoFollow && !noFollowFound) || (addNoReferrer && !noReferrerFound) { rel := html.Attribute{} rel.Key = "rel" if addNoFollow { rel.Val = "nofollow" } if addNoReferrer { if rel.Val != "" { rel.Val += " " } rel.Val += "noreferrer" } cleanAttrs = append(cleanAttrs, rel) } if elementName == "a" && addTargetBlank && !targetBlankFound { rel := html.Attribute{} rel.Key = "target" rel.Val = "_blank" targetBlankFound = true cleanAttrs = append(cleanAttrs, rel) } if targetBlankFound { // target="_blank" has a security risk that allows the // opened window/tab to issue JavaScript calls against // window.opener, which in effect allow the destination // of the link to control the source: // 
https://dev.to/ben/the-targetblank-vulnerability-by-example // // To mitigate this risk, we need to add a specific rel // attribute if it is not already present. // rel="noopener" // // Unfortunately this is processing the rel twice (we // already looked at it earlier ^^) as we cannot be sure // of the ordering of the href and rel, and whether we // have fully satisfied that we need to do this. This // double processing only happens *if* target="_blank" // is true. var noOpenerAdded bool tmpAttrs := []html.Attribute{} for _, htmlAttr := range cleanAttrs { var appended bool if htmlAttr.Key == "rel" { if strings.Contains(htmlAttr.Val, "noopener") { noOpenerAdded = true tmpAttrs = append(tmpAttrs, htmlAttr) } else { htmlAttr.Val += " noopener" noOpenerAdded = true tmpAttrs = append(tmpAttrs, htmlAttr) } appended = true } if !appended { tmpAttrs = append(tmpAttrs, htmlAttr) } } if noOpenerAdded { cleanAttrs = tmpAttrs } else { // rel attr was not found, or else noopener would // have been added already rel := html.Attribute{} rel.Key = "rel" rel.Val = "noopener" cleanAttrs = append(cleanAttrs, rel) } } } default: } } } if p.requireCrossOriginAnonymous && len(cleanAttrs) > 0 { switch elementName { case "audio", "img", "link", "script", "video": var crossOriginFound bool for _, htmlAttr := range cleanAttrs { if htmlAttr.Key == "crossorigin" { crossOriginFound = true htmlAttr.Val = "anonymous" } } if !crossOriginFound { crossOrigin := html.Attribute{} crossOrigin.Key = "crossorigin" crossOrigin.Val = "anonymous" cleanAttrs = append(cleanAttrs, crossOrigin) } } } if p.requireSandboxOnIFrame != nil && elementName == "iframe" { var sandboxFound bool for i, htmlAttr := range cleanAttrs { if htmlAttr.Key == "sandbox" { sandboxFound = true var cleanVals []string cleanValsSet := make(map[string]bool) for _, val := range strings.Fields(htmlAttr.Val) { if p.requireSandboxOnIFrame[val] { if !cleanValsSet[val] { cleanVals = append(cleanVals, val) cleanValsSet[val] = true } } } 
cleanAttrs[i].Val = strings.Join(cleanVals, " ") } } if !sandboxFound { sandbox := html.Attribute{} sandbox.Key = "sandbox" sandbox.Val = "" cleanAttrs = append(cleanAttrs, sandbox) } } return cleanAttrs } func (p *Policy) sanitizeStyles(attr html.Attribute, elementName string) html.Attribute { sps := p.elsAndStyles[elementName] if len(sps) == 0 { sps = map[string][]stylePolicy{} // check for any matching elements, if we don't already have a policy found // if multiple matches are found they will be overwritten, it's best // to not have overlapping matchers for regex, policies := range p.elsMatchingAndStyles { if regex.MatchString(elementName) { for k, v := range policies { sps[k] = append(sps[k], v...) } } } } //Add semi-colon to end to fix parsing issue attr.Val = strings.TrimRight(attr.Val, " ") if len(attr.Val) > 0 && attr.Val[len(attr.Val)-1] != ';' { attr.Val = attr.Val + ";" } decs, err := parser.ParseDeclarations(attr.Val) if err != nil { attr.Val = "" return attr } clean := []string{} prefixes := []string{"-webkit-", "-moz-", "-ms-", "-o-", "mso-", "-xv-", "-atsc-", "-wap-", "-khtml-", "prince-", "-ah-", "-hp-", "-ro-", "-rim-", "-tc-"} decLoop: for _, dec := range decs { tempProperty := strings.ToLower(dec.Property) tempValue := removeUnicode(strings.ToLower(dec.Value)) for _, i := range prefixes { tempProperty = strings.TrimPrefix(tempProperty, i) } if spl, ok := sps[tempProperty]; ok { for _, sp := range spl { if sp.handler != nil { if sp.handler(tempValue) { clean = append(clean, dec.Property+": "+dec.Value) continue decLoop } } else if len(sp.enum) > 0 { if stringInSlice(tempValue, sp.enum) { clean = append(clean, dec.Property+": "+dec.Value) continue decLoop } } else if sp.regexp != nil { if sp.regexp.MatchString(tempValue) { clean = append(clean, dec.Property+": "+dec.Value) continue decLoop } } } } if spl, ok := p.globalStyles[tempProperty]; ok { for _, sp := range spl { if sp.handler != nil { if sp.handler(tempValue) { clean = append(clean, 
dec.Property+": "+dec.Value) continue decLoop } } else if len(sp.enum) > 0 { if stringInSlice(tempValue, sp.enum) { clean = append(clean, dec.Property+": "+dec.Value) continue decLoop } } else if sp.regexp != nil { if sp.regexp.MatchString(tempValue) { clean = append(clean, dec.Property+": "+dec.Value) continue decLoop } } } } } if len(clean) > 0 { attr.Val = strings.Join(clean, "; ") } else { attr.Val = "" } return attr } func (p *Policy) allowNoAttrs(elementName string) bool { _, ok := p.setOfElementsAllowedWithoutAttrs[elementName] if !ok { for _, r := range p.setOfElementsMatchingAllowedWithoutAttrs { if r.MatchString(elementName) { ok = true break } } } return ok } func (p *Policy) validURL(rawurl string) (string, bool) { if p.requireParseableURLs { // URLs are valid if when space is trimmed the URL is valid rawurl = strings.TrimSpace(rawurl) // URLs cannot contain whitespace, unless it is a data-uri if strings.Contains(rawurl, " ") || strings.Contains(rawurl, "\t") || strings.Contains(rawurl, "\n") { if !strings.HasPrefix(rawurl, `data:`) { return "", false } // Remove \r and \n from base64 encoded data to pass url.Parse. 
matched := dataURIbase64Prefix.FindString(rawurl) if matched != "" { rawurl = matched + strings.Replace( strings.Replace( rawurl[len(matched):], "\r", "", -1, ), "\n", "", -1, ) } } // URLs are valid if they parse u, err := url.Parse(rawurl) if err != nil { return "", false } if u.Scheme != "" { urlPolicies, ok := p.allowURLSchemes[u.Scheme] if !ok { for _, r := range p.allowURLSchemeRegexps { if r.MatchString(u.Scheme) { return u.String(), true } } return "", false } if len(urlPolicies) == 0 { return u.String(), true } for _, urlPolicy := range urlPolicies { if urlPolicy(u) { return u.String(), true } } return "", false } if p.allowRelativeURLs { if u.String() != "" { return u.String(), true } } return "", false } return rawurl, true } func linkable(elementName string) bool { switch elementName { case "a", "area", "base", "link": // elements that allow .href return true case "blockquote", "del", "ins", "q": // elements that allow .cite return true case "audio", "embed", "iframe", "img", "input", "script", "track", "video": // elements that allow .src return true default: return false } } // stringInSlice returns true if needle exists in haystack func stringInSlice(needle string, haystack []string) bool { for _, straw := range haystack { if strings.EqualFold(straw, needle) { return true } } return false } func isDataAttribute(val string) bool { if !dataAttribute.MatchString(val) { return false } rest := strings.Split(val, "data-") if len(rest) == 1 { return false } // data-xml* is invalid. if dataAttributeXMLPrefix.MatchString(rest[1]) { return false } // no uppercase or semi-colons allowed. 
if dataAttributeInvalidChars.MatchString(rest[1]) { return false } return true } func removeUnicode(value string) string { substitutedValue := value currentLoc := cssUnicodeChar.FindStringIndex(substitutedValue) for currentLoc != nil { character := substitutedValue[currentLoc[0]+1 : currentLoc[1]] character = strings.TrimSpace(character) if len(character) < 4 { character = strings.Repeat("0", 4-len(character)) + character } else { for len(character) > 4 { if character[0] != '0' { character = "" break } else { character = character[1:] } } } character = "\\u" + character translatedChar, err := strconv.Unquote(`"` + character + `"`) translatedChar = strings.TrimSpace(translatedChar) if err != nil { return "" } substitutedValue = substitutedValue[0:currentLoc[0]] + translatedChar + substitutedValue[currentLoc[1]:] currentLoc = cssUnicodeChar.FindStringIndex(substitutedValue) } return substitutedValue } func (p *Policy) matchRegex(elementName string) (map[string][]attrPolicy, bool) { aps := make(map[string][]attrPolicy, 0) matched := false for regex, attrs := range p.elsMatchingAndAttrs { if regex.MatchString(elementName) { matched = true for k, v := range attrs { aps[k] = append(aps[k], v...) } } } return aps, matched } // normaliseElementName takes a HTML element like `, expected: `test`, }, { in: `<<<><`, expected: ``, }, { in: "", expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: `
`, expected: ``, }, { in: `
`, expected: ``, }, { in: `
`, expected: `
`, }, { in: `
`, expected: `
`, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: "", expected: ``, }, { in: ``, expected: ``, }, { in: `PT SRC="http://ha.ckers.org/xss.js">`, expected: `PT SRC="http://ha.ckers.org/xss.js">`, }, { in: `PT SRC="http://ha.ckers.org/xss.js">`, expected: `PT SRC="http://ha.ckers.org/xss.js">`, }, { in: ``, expected: ``, }, { in: "", expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ` +ADw-SCRIPT+AD4-alert('XSS')`, expected: ` +ADw-SCRIPT+AD4-alert('XSS')`, }, { in: ``, expected: ``, }, { in: `alert("XSS")'); ?>`, expected: `alert("XSS")'); ?>`, }, { in: ``, expected: ``, }, { in: ` "> `, expected: "\n\n\n">\n", }, { in: ` `, expected: ` `, }, { in: ` `, expected: ` `, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: `
`, expected: `
`, }, { in: `
`, expected: `
`, }, { in: `
`, expected: `
`, }, { in: `
`, expected: `
`, }, { in: `
`, expected: `
`, }, { in: ``, expected: `
`, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: `
`, expected: `
`, }, { in: ``, expected: ``, }, { in: ``, expected: ``, }, { in: `
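The `data-*` handling shown earlier in sanitize.go combines three regexps (`dataAttribute`, `dataAttributeXMLPrefix`, `dataAttributeInvalidChars`) into a single predicate. The stdlib-only sketch below mirrors that `isDataAttribute` logic so it can be run in isolation; the variable names are shortened, but the checks are the same ones the source applies.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	dataAttr     = regexp.MustCompile("^data-.+")
	xmlPrefix    = regexp.MustCompile("^xml.+")
	invalidChars = regexp.MustCompile("[A-Z;]+")
)

// isDataAttribute mirrors the checks made in sanitize.go: the name must
// start with "data-", the suffix must not begin with "xml", and it must
// not contain uppercase letters or semi-colons.
func isDataAttribute(val string) bool {
	if !dataAttr.MatchString(val) {
		return false
	}
	rest := strings.SplitN(val, "data-", 2)
	if len(rest) == 1 {
		return false
	}
	if xmlPrefix.MatchString(rest[1]) {
		return false
	}
	if invalidChars.MatchString(rest[1]) {
		return false
	}
	return true
}

func main() {
	for _, name := range []string{"data-custom", "data-xmlfoo", "data-Upper", "id"} {
		fmt.Println(name, isDataAttribute(name))
	}
	// data-custom true
	// data-xmlfoo false
	// data-Upper false
	// id false
}
```

These constraints track the HTML rules for custom data attributes, where the part after `data-` must not start with `xml` and must be usable via the lowercase `dataset` API.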