pax_global_header 0000666 0000000 0000000 00000000064 14624642056 0014523 g ustar 00root root 0000000 0000000 52 comment=8672fb3dd411dbea45bfafa060013fcbfdccd045
golang-github-johanneskaufmann-html-to-markdown-1.6.0/ 0000775 0000000 0000000 00000000000 14624642056 0023004 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/.github/ 0000775 0000000 0000000 00000000000 14624642056 0024344 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/.github/ISSUE_TEMPLATE/ 0000775 0000000 0000000 00000000000 14624642056 0026527 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/.github/ISSUE_TEMPLATE/bug_report.md 0000664 0000000 0000000 00000001020 14624642056 0031212 0 ustar 00root root 0000000 0000000 ---
name: Bug report
about: Create a report to help us improve
title: "\U0001F41B Bug"
labels: bug
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**HTML Input**
```html
Title
```
**Generated Markdown**
````markdown
# Title
````
**Expected Markdown**
````markdown
# Title!!!
````
**Additional context**
Add any other context about the problem here. For example, if you changed the default options or used a plugin. Also adding the version from the `go.mod` is helpful.
golang-github-johanneskaufmann-html-to-markdown-1.6.0/.github/dependabot.yml 0000664 0000000 0000000 00000000425 14624642056 0027175 0 ustar 00root root 0000000 0000000 # Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
version: 2
updates:
- package-ecosystem: "gomod"
directory: "/"
schedule:
interval: "weekly"
golang-github-johanneskaufmann-html-to-markdown-1.6.0/.github/workflows/ 0000775 0000000 0000000 00000000000 14624642056 0026401 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/.github/workflows/go.yml 0000664 0000000 0000000 00000001572 14624642056 0027536 0 ustar 00root root 0000000 0000000 name: Go
on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
jobs:
build:
name: Build
runs-on: ubuntu-latest
steps:
- name: Set up Go 1.x
uses: actions/setup-go@v2
with:
go-version: ^1.13
id: go
- name: Check out code into the Go module directory
uses: actions/checkout@v2
- name: Get dependencies
run: |
go get -v -t -d ./...
if [ -f Gopkg.toml ]; then
curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
dep ensure
fi
- name: Build
run: go build -v .
- name: Test
run: go test -v -race -coverprofile=coverage.txt -covermode=atomic .
- name: Upload Coverage report to CodeCov
uses: codecov/codecov-action@v2
with:
token: ${{secrets.CODECOV_TOKEN}}
file: ./coverage.txt
golang-github-johanneskaufmann-html-to-markdown-1.6.0/.gitignore 0000664 0000000 0000000 00000000313 14624642056 0024771 0 ustar 00root root 0000000 0000000 # Binaries for programs and plugins
*.exe
*.exe~
*.dll
*.so
*.dylib
# Test binary, build with `go test -c`
*.test
# Output of the go coverage tool, specifically when used with LiteIDE
*.out
.DS_Store
golang-github-johanneskaufmann-html-to-markdown-1.6.0/CONTRIBUTING.md 0000664 0000000 0000000 00000000000 14624642056 0025223 0 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/LICENSE 0000664 0000000 0000000 00000002062 14624642056 0024011 0 ustar 00root root 0000000 0000000 MIT License
Copyright (c) 2018 Johannes Kaufmann
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
golang-github-johanneskaufmann-html-to-markdown-1.6.0/README.md 0000664 0000000 0000000 00000024070 14624642056 0024266 0 ustar 00root root 0000000 0000000 # html-to-markdown
[](https://goreportcard.com/report/github.com/JohannesKaufmann/html-to-markdown)
[](https://codecov.io/gh/JohannesKaufmann/html-to-markdown)

[](http://godoc.org/github.com/JohannesKaufmann/html-to-markdown)

Convert HTML into Markdown with Go. It is using an [HTML Parser](https://github.com/PuerkitoBio/goquery) to avoid the use of `regexp` as much as possible. That should prevent some [weird cases](https://stackoverflow.com/a/1732454) and allows it to be used for cases where the input is totally unknown.
## Installation
```
go get github.com/JohannesKaufmann/html-to-markdown
```
## Usage
```go
import (
"fmt"
"log"
md "github.com/JohannesKaufmann/html-to-markdown"
)
converter := md.NewConverter("", true, nil)
html := `Important`
markdown, err := converter.ConvertString(html)
if err != nil {
log.Fatal(err)
}
fmt.Println("md ->", markdown)
```
If you are already using [goquery](https://github.com/PuerkitoBio/goquery) you can pass a selection to `Convert`.
```go
markdown, err := converter.Convert(selec)
```
### Using it on the command line
If you want to make use of `html-to-markdown` on the command line without any Go coding, check out [`html2md`](https://github.com/suntong/html2md#usage), a cli wrapper for `html-to-markdown` that has all the following options and plugins builtin.
## Options
The third parameter to `md.NewConverter` is `*md.Options`.
For example you can change the character that is around a bold text ("`**`") to a different one (for example "`__`") by changing the value of `StrongDelimiter`.
```go
opt := &md.Options{
StrongDelimiter: "__", // default: **
// ...
}
converter := md.NewConverter("", true, opt)
```
For all the possible options look at [godocs](https://godoc.org/github.com/JohannesKaufmann/html-to-markdown/#Options) and for a example look at the [example](/examples/options/main.go).
## Adding Rules
```go
converter.AddRules(
md.Rule{
Filter: []string{"del", "s", "strike"},
Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
// You need to return a pointer to a string (md.String is just a helper function).
// If you return nil the next function for that html element
// will be picked. For example you could only convert an element
// if it has a certain class name and fallback if not.
content = strings.TrimSpace(content)
return md.String("~" + content + "~")
},
},
// more rules
)
```
For more information have a look at the example [add_rules](/examples/add_rules/main.go).
## Using Plugins
If you want plugins (github flavored markdown like striketrough, tables, ...) you can pass it to `Use`.
```go
import "github.com/JohannesKaufmann/html-to-markdown/plugin"
// Use the `GitHubFlavored` plugin from the `plugin` package.
converter.Use(plugin.GitHubFlavored())
```
Or if you only want to use the `Strikethrough` plugin. You can change the character that distinguishes
the text that is crossed out by setting the first argument to a different value (for example "~~" instead of "~").
```go
converter.Use(plugin.Strikethrough(""))
```
For more information have a look at the example [github_flavored](/examples/github_flavored/main.go).
---
These are the plugins located in the [plugin folder](/plugin) which you can use by importing "github.com/JohannesKaufmann/html-to-markdown/plugin".
| Name | Description |
| --------------------- | ------------------------------------------------------------------------------------------- |
| GitHubFlavored | GitHub's Flavored Markdown contains `TaskListItems`, `Strikethrough` and `Table`. |
| TaskListItems | (Included in `GitHubFlavored`). Converts `` checkboxes into `- [x] Task`. |
| Strikethrough | (Included in `GitHubFlavored`). Converts ``, ``, and `` to the `~~` syntax. |
| Table | (Included in `GitHubFlavored`). Convert a `
` into something like this... |
| TableCompat | |
| | |
| VimeoEmbed | |
| YoutubeEmbed | |
| | |
| ConfluenceCodeBlock | Converts `` elements that are used in Atlassian’s Wiki "Confluence". |
| ConfluenceAttachments | Converts `` elements. |
These are the plugins in other repositories:
| Name | Description |
| ---------------------------- | ------------------- |
| \[Plugin Name\]\(Your Link\) | A short description |
I you write a plugin, feel free to open a PR that adds your Plugin to this list.
## Writing Plugins
Have a look at the [plugin folder](/plugin) for a reference implementation. The most basic one is [Strikethrough](/plugin/strikethrough.go).
## Security
This library produces markdown that is readable and can be changed by humans.
Once you convert this markdown back to HTML (e.g. using [goldmark](https://github.com/yuin/goldmark) or [blackfriday](https://github.com/russross/blackfriday)) you need to be careful of malicious content.
This library does NOT sanitize untrusted content. Use an HTML sanitizer such as [bluemonday](https://github.com/microcosm-cc/bluemonday) before displaying the HTML in the browser.
## Other Methods
[Godoc](https://godoc.org/github.com/JohannesKaufmann/html-to-markdown)
### `func (c *Converter) Keep(tags ...string) *Converter`
Determines which elements are to be kept and rendered as HTML.
### `func (c *Converter) Remove(tags ...string) *Converter`
Determines which elements are to be removed altogether i.e. converted to an empty string.
## Escaping
Some characters have a special meaning in markdown. For example, the character "\*" can be used for lists, emphasis and dividers. By placing a backlash before that character (e.g. "\\\*") you can "escape" it. Then the character will render as a raw "\*" without the _"markdown meaning"_ applied.
But why is "escaping" even necessary?
```md
Paragraph 1
-
Paragraph 2
```
The markdown above doesn't seem that problematic. But "Paragraph 1" (with only one hyphen below) will be recognized as a _setext heading_.
```html
Paragraph 1
Paragraph 2
```
A well-placed backslash character would prevent that...
```md
Paragraph 1
\-
Paragraph 2
```
---
How to configure escaping? Depending on the `EscapeMode` option, the markdown output is going to be different.
```go
opt = &md.Options{
EscapeMode: "basic", // default
}
```
Lets try it out with this HTML input:
| | |
| -------- | ----------------------------------------------------- |
| input | `
fake **bold** and real bold
` |
| | |
| | **With EscapeMode "basic"** |
| output | `fake \*\*bold\*\* and real **bold**` |
| rendered | fake \*\*bold\*\* and real **bold** |
| | |
| | **With EscapeMode "disabled"** |
| output | `fake **bold** and real **bold**` |
| rendered | fake **bold** and real **bold** |
With **basic** escaping, we get some escape characters (the backlash "\\") but it renders correctly.
With escaping **disabled**, the fake and real bold can't be distinguished in the markdown. That means it is both going to render as bold.
---
So now you know the purpose of escaping. However, if you encounter some content where the escaping breaks, you can manually disable it. But please also open an issue!
## Issues
If you find HTML snippets (or even full websites) that don't produce the expected results, please open an issue!
## Contributing & Testing
Please first discuss the change you wish to make, by opening an issue. I'm also happy to guide you to where a change is most likely needed.
_Note: The outside API should not change because of backwards compatibility..._
You don't have to be afraid of breaking the converter, since there are many "Golden File Tests":
Add your problematic HTML snippet to one of the `input.html` files in the `testdata` folder. Then run `go test -update` and have a look at which `.golden` files changed in GIT.
You can now change the internal logic and inspect what impact your change has by running `go test -update` again.
_Note: Before submitting your change as a PR, make sure that you run those tests and check the files into GIT..._
## Related Projects
- [turndown (js)](https://github.com/domchristie/turndown), a very good library written in javascript.
- [lunny/html2md](https://github.com/lunny/html2md), which is using [regex instead of goquery](https://stackoverflow.com/a/1732454). I came around a few edge case when using it (leaving some html comments, ...) so I wrote my own.
## License
This project is licensed under the terms of the MIT license.
golang-github-johanneskaufmann-html-to-markdown-1.6.0/SECURITY.md 0000664 0000000 0000000 00000000332 14624642056 0024573 0 ustar 00root root 0000000 0000000 # Security Policy
## Reporting a Vulnerability
Please report (suspected) security vulnerabilities to johannes@joina.de with the subject _"Security html-to-markdown"_ and you will receive a response within 48 hours.
golang-github-johanneskaufmann-html-to-markdown-1.6.0/commonmark.go 0000664 0000000 0000000 00000026630 14624642056 0025505 0 ustar 00root root 0000000 0000000 package md
import (
"fmt"
"unicode"
"regexp"
"strconv"
"strings"
"unicode/utf8"
"github.com/JohannesKaufmann/html-to-markdown/escape"
"github.com/PuerkitoBio/goquery"
)
var multipleSpacesR = regexp.MustCompile(` +`)
var commonmark = []Rule{
{
Filter: []string{"ul", "ol"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
parent := selec.Parent()
// we have a nested list, were the ul/ol is inside a list item
// -> based on work done by @requilence from @anytypeio
if (parent.Is("li") || parent.Is("ul") || parent.Is("ol")) && parent.Children().Last().IsSelection(selec) {
// add a line break prefix if the parent's text node doesn't have it.
// that makes sure that every list item is on its on line
lastContentTextNode := strings.TrimRight(parent.Nodes[0].FirstChild.Data, " \t")
if !strings.HasSuffix(lastContentTextNode, "\n") {
content = "\n" + content
}
// remove empty lines between lists
trimmedSpaceContent := strings.TrimRight(content, " \t")
if strings.HasSuffix(trimmedSpaceContent, "\n") {
content = strings.TrimRightFunc(content, unicode.IsSpace)
}
} else {
content = "\n\n" + content + "\n\n"
}
return &content
},
},
{
Filter: []string{"li"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
if strings.TrimSpace(content) == "" {
return nil
}
// remove leading newlines
content = leadingNewlinesR.ReplaceAllString(content, "")
// replace trailing newlines with just a single one
content = trailingNewlinesR.ReplaceAllString(content, "\n")
// remove leading spaces
content = strings.TrimLeft(content, " ")
prefix := selec.AttrOr(attrListPrefix, "")
// `prefixCount` is not nessesarily the length of the empty string `prefix`
// but how much space is reserved for the prefixes of the siblings.
prefixCount, previousPrefixCounts := countListParents(opt, selec)
// if the prefix is not needed, balance it by adding the usual prefix spaces
if prefix == "" {
prefix = strings.Repeat(" ", prefixCount)
}
// indent the prefix so that the nested links are represented
indent := strings.Repeat(" ", previousPrefixCounts)
prefix = indent + prefix
content = IndentMultiLineListItem(opt, content, prefixCount+previousPrefixCounts)
return String(prefix + content + "\n")
},
},
{
Filter: []string{"#text"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
text := selec.Text()
if trimmed := strings.TrimSpace(text); trimmed == "" {
return String("")
}
text = tabR.ReplaceAllString(text, " ")
// replace multiple spaces by one space: dont accidentally make
// normal text be indented and thus be a code block.
text = multipleSpacesR.ReplaceAllString(text, " ")
if opt.EscapeMode == "basic" {
text = escape.MarkdownCharacters(text)
}
// if its inside a list, trim the spaces to not mess up the indentation
parent := selec.Parent()
next := selec.Next()
if IndexWithText(selec) == 0 &&
(parent.Is("li") || parent.Is("ol") || parent.Is("ul")) &&
(next.Is("ul") || next.Is("ol")) {
// trim only spaces and not new lines
text = strings.Trim(text, ` `)
}
return &text
},
},
{
Filter: []string{"p", "div"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
parent := goquery.NodeName(selec.Parent())
if IsInlineElement(parent) || parent == "li" {
content = "\n" + content + "\n"
return &content
}
// remove unnecessary spaces to have clean markdown
content = TrimpLeadingSpaces(content)
content = "\n\n" + content + "\n\n"
return &content
},
},
{
Filter: []string{"h1", "h2", "h3", "h4", "h5", "h6"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
if strings.TrimSpace(content) == "" {
return nil
}
content = strings.Replace(content, "\n", " ", -1)
content = strings.Replace(content, "\r", " ", -1)
content = strings.Replace(content, `#`, `\#`, -1)
content = strings.TrimSpace(content)
insideLink := selec.ParentsFiltered("a").Length() > 0
if insideLink {
text := opt.StrongDelimiter + content + opt.StrongDelimiter
text = AddSpaceIfNessesary(selec, text)
return &text
}
node := goquery.NodeName(selec)
level, err := strconv.Atoi(node[1:])
if err != nil {
return nil
}
if opt.HeadingStyle == "setext" && level < 3 {
line := "-"
if level == 1 {
line = "="
}
underline := strings.Repeat(line, len(content))
return String("\n\n" + content + "\n" + underline + "\n\n")
}
prefix := strings.Repeat("#", level)
text := "\n\n" + prefix + " " + content + "\n\n"
return &text
},
},
{
Filter: []string{"strong", "b"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
// only use one bold tag if they are nested
parent := selec.Parent()
if parent.Is("strong") || parent.Is("b") {
return &content
}
trimmed := strings.TrimSpace(content)
if trimmed == "" {
return &trimmed
}
// If there is a newline character between the start and end delimiter
// the delimiters won't be recognized. Either we remove all newline characters
// OR on _every_ line we put start & end delimiters.
trimmed = delimiterForEveryLine(trimmed, opt.StrongDelimiter)
// Always have a space to the side to recognize the delimiter
trimmed = AddSpaceIfNessesary(selec, trimmed)
return &trimmed
},
},
{
Filter: []string{"i", "em"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
// only use one italic tag if they are nested
parent := selec.Parent()
if parent.Is("i") || parent.Is("em") {
return &content
}
trimmed := strings.TrimSpace(content)
if trimmed == "" {
return &trimmed
}
// If there is a newline character between the start and end delimiter
// the delimiters won't be recognized. Either we remove all newline characters
// OR on _every_ line we put start & end delimiters.
trimmed = delimiterForEveryLine(trimmed, opt.EmDelimiter)
// Always have a space to the side to recognize the delimiter
trimmed = AddSpaceIfNessesary(selec, trimmed)
return &trimmed
},
},
{
Filter: []string{"img"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
src := selec.AttrOr("src", "")
src = strings.TrimSpace(src)
if src == "" {
return String("")
}
src = opt.GetAbsoluteURL(selec, src, opt.domain)
alt := selec.AttrOr("alt", "")
alt = strings.Replace(alt, "\n", " ", -1)
text := fmt.Sprintf("", alt, src)
return &text
},
},
{
Filter: []string{"a"},
AdvancedReplacement: func(content string, selec *goquery.Selection, opt *Options) (AdvancedResult, bool) {
// if there is no href, no link is used. So just return the content inside the link
href, ok := selec.Attr("href")
if !ok || strings.TrimSpace(href) == "" || strings.TrimSpace(href) == "#" {
return AdvancedResult{
Markdown: content,
}, false
}
href = opt.GetAbsoluteURL(selec, href, opt.domain)
// having multiline content inside a link is a bit tricky
content = EscapeMultiLine(content)
var title string
if t, ok := selec.Attr("title"); ok {
t = strings.Replace(t, "\n", " ", -1)
// escape all quotes
t = strings.Replace(t, `"`, `\"`, -1)
title = fmt.Sprintf(` "%s"`, t)
}
// if there is no link content (for example because it contains an svg)
// the 'title' or 'aria-label' attribute is used instead.
if strings.TrimSpace(content) == "" {
content = selec.AttrOr("title", selec.AttrOr("aria-label", ""))
}
// a link without text won't de displayed anyway
if content == "" {
return AdvancedResult{}, true
}
if opt.LinkStyle == "inlined" {
md := fmt.Sprintf("[%s](%s%s)", content, href, title)
md = AddSpaceIfNessesary(selec, md)
return AdvancedResult{
Markdown: md,
}, false
}
var replacement string
var reference string
switch opt.LinkReferenceStyle {
case "collapsed":
replacement = "[" + content + "][]"
reference = "[" + content + "]: " + href + title
case "shortcut":
replacement = "[" + content + "]"
reference = "[" + content + "]: " + href + title
default:
id := selec.AttrOr("data-index", "")
replacement = "[" + content + "][" + id + "]"
reference = "[" + id + "]: " + href + title
}
replacement = AddSpaceIfNessesary(selec, replacement)
return AdvancedResult{Markdown: replacement, Footer: reference}, false
},
},
{
Filter: []string{"code", "kbd", "samp", "tt"},
Replacement: func(_ string, selec *goquery.Selection, opt *Options) *string {
code := getCodeContent(selec)
// Newlines in the text aren't great, since this is inline code and not a code block.
// Newlines will be stripped anyway in the browser, but it won't be recognized as code
// from the markdown parser when there is more than one newline.
// So limit to
code = multipleNewLinesRegex.ReplaceAllString(code, "\n")
fenceChar := '`'
maxCount := calculateCodeFenceOccurrences(fenceChar, code)
maxCount++
fence := strings.Repeat(string(fenceChar), maxCount)
// code block contains a backtick as first character
if strings.HasPrefix(code, "`") {
code = " " + code
}
// code block contains a backtick as last character
if strings.HasSuffix(code, "`") {
code = code + " "
}
// TODO: configure delimeter in options?
text := fence + code + fence
text = AddSpaceIfNessesary(selec, text)
return &text
},
},
{
Filter: []string{"pre"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
codeElement := selec.Find("code")
language := codeElement.AttrOr("class", "")
language = strings.Replace(language, "language-", "", 1)
code := getCodeContent(selec)
fenceChar, _ := utf8.DecodeRuneInString(opt.Fence)
fence := CalculateCodeFence(fenceChar, code)
text := "\n\n" + fence + language + "\n" +
code +
"\n" + fence + "\n\n"
return &text
},
},
{
Filter: []string{"hr"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
// e.g. `## --- Heading` would look weird, so don't render a divider if inside a heading
insideHeading := selec.ParentsFiltered("h1,h2,h3,h4,h5,h6").Length() > 0
if insideHeading {
return String("")
}
text := "\n\n" + opt.HorizontalRule + "\n\n"
return &text
},
},
{
Filter: []string{"br"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
return String("\n\n")
},
},
{
Filter: []string{"blockquote"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
content = strings.TrimSpace(content)
if content == "" {
return nil
}
content = multipleNewLinesRegex.ReplaceAllString(content, "\n\n")
var beginningR = regexp.MustCompile(`(?m)^`)
content = beginningR.ReplaceAllString(content, "> ")
text := "\n\n" + content + "\n\n"
return &text
},
},
{
Filter: []string{"noscript"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
// for now remove the contents of noscript. But in the future we could
// tell goquery to parse the contents of the tag.
// -> https://github.com/PuerkitoBio/goquery/issues/139#issuecomment-517526070
return nil
},
},
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/commonmark_test.go 0000664 0000000 0000000 00000014611 14624642056 0026540 0 ustar 00root root 0000000 0000000 package md_test
import (
"bytes"
"fmt"
"io/ioutil"
"os"
"path"
"path/filepath"
"strings"
"testing"
md "github.com/JohannesKaufmann/html-to-markdown"
"github.com/PuerkitoBio/goquery"
"github.com/sebdah/goldie/v2"
"github.com/yuin/goldmark"
"github.com/yuin/goldmark/extension"
)
type Variation struct {
Options *md.Options
Plugins []md.Plugin
}
type GoldenTest struct {
Name string
Domain string
DisableGoldmark bool
Variations map[string]Variation
}
func runGoldenTest(t *testing.T, test GoldenTest, variationKey string) {
variation := test.Variations[variationKey]
g := goldie.New(t)
// testdata/TestCommonmark/name/input.html
p := path.Join(t.Name(), "input.html")
// get the input html from a file
input, err := ioutil.ReadFile(path.Join("testdata", p))
if err != nil {
t.Error(err)
return
}
if test.Domain == "" {
test.Domain = "example.com"
}
conv := md.NewConverter(test.Domain, true, variation.Options)
conv.Keep("keep-tag").Remove("remove-tag")
for _, plugin := range variation.Plugins {
conv.Use(plugin)
}
markdown, err := conv.ConvertBytes(input)
if err != nil {
t.Error(err)
}
// testdata/TestCommonmark/name/output.default.golden
p = path.Join(t.Name(), "output."+variationKey)
g.Assert(t, p, markdown)
gold := goldmark.New(goldmark.WithExtensions(extension.GFM))
var buf bytes.Buffer
if err := gold.Convert(markdown, &buf); err != nil {
t.Error(err)
}
if !test.DisableGoldmark {
// testdata/TestCommonmark/name/goldmark.golden
p = path.Join(t.Name(), "goldmark")
g.Assert(t, p, buf.Bytes())
}
}
func RunGoldenTest(t *testing.T, tests []GoldenTest) {
// loop through all test cases that were added manually
dirs := make(map[string]struct{})
for _, test := range tests {
name := test.Name
name = strings.Replace(name, " ", "_", -1)
dirs[name] = struct{}{}
}
// now add all tests that were found on disk to the tests slice
err := filepath.Walk(path.Join("testdata", t.Name()),
func(p string, info os.FileInfo, err error) error {
if err != nil {
return err
}
if !info.IsDir() {
return nil
}
// skip folders that don't contain an input.html file
if _, err := os.Stat(path.Join(p, "input.html")); os.IsNotExist(err) {
return nil
}
parts := strings.SplitN(p, string(os.PathSeparator), 3)
p = parts[2] // remove "testdata/TestCommonmark/" from "testdata/TestCommonmark/..."
_, ok := dirs[p]
if ok {
return nil
}
// add the folder from disk to the tests slice, since its not it there yet
tests = append(tests, GoldenTest{
Name: p,
})
return nil
})
if err != nil {
t.Error(err)
return
}
for _, test := range tests {
if len(test.Variations) == 0 {
test.Variations = map[string]Variation{
"default": {},
}
}
t.Run(test.Name, func(t *testing.T) {
if strings.Contains(t.Name(), "#") {
fmt.Println("the name", test.Name, t.Name(), "seems too be used for multiple tests")
return
}
for variationKey := range test.Variations {
runGoldenTest(t, test, variationKey)
}
})
}
}
func TestCommonmark(t *testing.T) {
var tests = []GoldenTest{
{
Name: "link",
DisableGoldmark: true,
Variations: map[string]Variation{
"relative": {
Options: &md.Options{
GetAbsoluteURL: func(selec *goquery.Selection, rawURL string, domain string) string {
return rawURL
},
},
},
"inlined": {
Options: &md.Options{LinkStyle: "inlined"},
},
"referenced_full": {
Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "full"},
},
"referenced_collapsed": {
Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "collapsed"},
},
"referenced_shortcut": {
Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "shortcut"},
},
},
},
{
Name: "heading",
Variations: map[string]Variation{
"atx": {
Options: &md.Options{HeadingStyle: "atx"},
},
"setext": {
Options: &md.Options{HeadingStyle: "setext"},
},
},
},
{
Name: "italic",
Variations: map[string]Variation{
"asterisks": {
Options: &md.Options{EmDelimiter: "*"},
},
"underscores": {
Options: &md.Options{EmDelimiter: "_"},
},
},
},
{
Name: "bold",
Variations: map[string]Variation{
"asterisks": {
Options: &md.Options{StrongDelimiter: "**"},
},
"underscores": {
Options: &md.Options{StrongDelimiter: "__"},
},
},
},
{
Name: "pre_code",
Variations: map[string]Variation{
"indented": {
Options: &md.Options{CodeBlockStyle: "indented"},
},
"fenced_backtick": {
Options: &md.Options{CodeBlockStyle: "fenced", Fence: "```"},
},
"fenced_tilde": {
Options: &md.Options{CodeBlockStyle: "fenced", Fence: "~~~"},
},
},
},
{
Name: "list",
Variations: map[string]Variation{
"asterisks": {
Options: &md.Options{BulletListMarker: "*"},
},
"dash": {
Options: &md.Options{BulletListMarker: "-"},
},
"plus": {
Options: &md.Options{BulletListMarker: "+"},
},
},
},
{
Name: "list_nested",
DisableGoldmark: true,
Variations: map[string]Variation{
"asterisks": {
Options: &md.Options{BulletListMarker: "*"},
},
"dash": {
Options: &md.Options{BulletListMarker: "-"},
},
"plus": {
Options: &md.Options{BulletListMarker: "+"},
},
},
},
// + all the test on disk that are added automatically
}
RunGoldenTest(t, tests)
}
func TestRealWorld(t *testing.T) {
var tests = []GoldenTest{
{
Name: "blog.golang.org",
Domain: "blog.golang.org",
Variations: map[string]Variation{
"inlined": {
Options: &md.Options{LinkStyle: "inlined"},
},
"referenced_full": {
Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "full"},
},
"referenced_collapsed": {
Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "collapsed"},
},
"referenced_shortcut": {
Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "shortcut"},
},
"emphasis_asterisks": {
Options: &md.Options{EmDelimiter: "*", StrongDelimiter: "**"},
},
"emphasis_underscores": {
Options: &md.Options{EmDelimiter: "_", StrongDelimiter: "__"},
},
},
},
{
Name: "golang.org",
Domain: "golang.org",
},
// + all the test on disk that are added automatically
}
RunGoldenTest(t, tests)
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/escape/ 0000775 0000000 0000000 00000000000 14624642056 0024244 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/escape/escape.go 0000664 0000000 0000000 00000003522 14624642056 0026035 0 ustar 00root root 0000000 0000000 // Package escape escapes characters that are commonly used in
// markdown like the * for strong/italic.
package escape
import (
"regexp"
"strings"
)
var backslash = regexp.MustCompile(`\\(\S)`)
var heading = regexp.MustCompile(`(?m)^(#{1,6} )`)
var orderedList = regexp.MustCompile(`(?m)^(\W* {0,3})(\d+)\. `)
var unorderedList = regexp.MustCompile(`(?m)^([^\\\w]*)[*+-] `)
var horizontalDivider = regexp.MustCompile(`(?m)^([-*_] *){3,}$`)
var blockquote = regexp.MustCompile(`(?m)^(\W* {0,3})> `)
var link = regexp.MustCompile(`([\[\]])`)
var replacer = strings.NewReplacer(
`*`, `\*`,
`_`, `\_`,
"`", "\\`",
`|`, `\|`,
)
// MarkdownCharacters escapes common markdown characters so that
// `
**Not Bold**
ends up as correct markdown `\*\*Not Strong\*\*`.
// No worry, the escaped characters will display fine, just without the formatting.
func MarkdownCharacters(text string) string {
// Escape backslash escapes!
text = backslash.ReplaceAllString(text, `\\$1`)
// Escape headings
text = heading.ReplaceAllString(text, `\$1`)
// Escape hr
text = horizontalDivider.ReplaceAllStringFunc(text, func(t string) string {
if strings.Contains(t, "-") {
return strings.Replace(t, "-", `\-`, 3)
} else if strings.Contains(t, "_") {
return strings.Replace(t, "_", `\_`, 3)
}
return strings.Replace(t, "*", `\*`, 3)
})
// Escape ol bullet points
text = orderedList.ReplaceAllString(text, `$1$2\. `)
// Escape ul bullet points
text = unorderedList.ReplaceAllStringFunc(text, func(t string) string {
return regexp.MustCompile(`([*+-])`).ReplaceAllString(t, `\$1`)
})
// Escape blockquote indents
text = blockquote.ReplaceAllString(text, `$1\> `)
// Escape em/strong *
// Escape em/strong _
// Escape code _
text = replacer.Replace(text)
// Escape link & image brackets
text = link.ReplaceAllString(text, `\$1`)
return text
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/ 0000775 0000000 0000000 00000000000 14624642056 0024622 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/add_rules/ 0000775 0000000 0000000 00000000000 14624642056 0026564 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/add_rules/main.go 0000664 0000000 0000000 00000002554 14624642056 0030045 0 ustar 00root root 0000000 0000000 package main
import (
"fmt"
"log"
"strings"
md "github.com/JohannesKaufmann/html-to-markdown"
"github.com/PuerkitoBio/goquery"
)
func main() {
html := `Good soundtrack and cake.`
// -> `Good soundtrack ~~and cake~~.`
/*
We want to add a rule when a `span` tag has a class of `bb_strike`.
Have a look at `plugin/strikethrough.go` to see how it is implemented normally.
*/
strikethrough := md.Rule{
Filter: []string{"span"},
Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
// If the span element has not the classname `bb_strike` return nil.
// That way the next rules will apply. In this case the commonmark rules.
// -> return nil -> next rule applies
if !selec.HasClass("bb_strike") {
return nil
}
// Trim spaces so that the following does NOT happen: `~ and cake~`.
// Because of the space it is not recognized as strikethrough.
// -> trim spaces at begin&end of string when inside strong/italic/...
content = strings.TrimSpace(content)
return md.String("~~" + content + "~~")
},
}
conv := md.NewConverter("", true, nil)
conv.AddRules(strikethrough)
// -> add 1+ rules to the converter. the last added will be used first.
markdown, err := conv.ConvertString(html)
if err != nil {
log.Fatal(err)
}
fmt.Printf("\n\nmarkdown:'%s'\n", markdown)
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/custom_tag/ 0000775 0000000 0000000 00000000000 14624642056 0026767 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/custom_tag/main.go 0000664 0000000 0000000 00000002200 14624642056 0030234 0 ustar 00root root 0000000 0000000 package main
import (
"fmt"
"log"
"strings"
md "github.com/JohannesKaufmann/html-to-markdown"
"github.com/PuerkitoBio/goquery"
)
func main() {
html := `https://youtu.be/1SoMeViDhttps://youtu.be/2SoMeViDhttps://youtu.be/3SoMeViDhttps://youtu.be/4SoMeViDhttps://youtu.be/5SoMeViD
`
videoRule := md.Rule{
// We want to add a rule for a `my_video` tag.
Filter: []string{"my_video"},
Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
text := "click to watch video"
// in this case, the content inside the tag is the url
href := strings.TrimSpace(content)
// format it, so that its `[click to watch video](https://youtu.be/1SoMeViD)\n\n`
md := fmt.Sprintf("[%s](%s)\n\n", text, href)
return &md
},
}
conv := md.NewConverter("", true, nil)
conv.AddRules(videoRule)
// -> add 1+ rules to the converter. the last added will be used first.
markdown, err := conv.ConvertString(html)
if err != nil {
log.Fatal(err)
}
fmt.Printf("\n\nresult:'%s'\n", markdown)
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/escaping/ 0000775 0000000 0000000 00000000000 14624642056 0026413 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/escaping/main.go 0000664 0000000 0000000 00000001646 14624642056 0027675 0 ustar 00root root 0000000 0000000 package main
import (
"fmt"
"log"
md "github.com/JohannesKaufmann/html-to-markdown"
)
func main() {
html := `
fake **bold** and real bold
`
// With "basic" we get:
// "fake \*\*bold\*\* and real **bold**"
// which would render as:
// "
fake **bold** and real bold
"
// With "none" we get:
// "fake **bold** and real **bold**"
// which would render as:
// "
`
/*
- [x] Checked!
- [ ] Check Me!
*/
conv := md.NewConverter("", true, nil)
// Use the `GitHubFlavored` plugin from the `plugin` package.
conv.Use(plugin.GitHubFlavored())
markdown, err := conv.ConvertString(html)
if err != nil {
log.Fatal(err)
}
fmt.Println(markdown)
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/goquery/ 0000775 0000000 0000000 00000000000 14624642056 0026315 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/goquery/main.go 0000664 0000000 0000000 00000000656 14624642056 0027577 0 ustar 00root root 0000000 0000000 package main
import (
"fmt"
"log"
md "github.com/JohannesKaufmann/html-to-markdown"
"github.com/PuerkitoBio/goquery"
)
func main() {
url := "https://blog.golang.org/godoc-documenting-go-code"
doc, err := goquery.NewDocument(url)
if err != nil {
log.Fatal(err)
}
content := doc.Find("#content")
conv := md.NewConverter(md.DomainFromURL(url), true, nil)
markdown := conv.Convert(content)
fmt.Println(markdown)
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/options/ 0000775 0000000 0000000 00000000000 14624642056 0026315 5 ustar 00root root 0000000 0000000 golang-github-johanneskaufmann-html-to-markdown-1.6.0/examples/options/main.go 0000664 0000000 0000000 00000000631 14624642056 0027570 0 ustar 00root root 0000000 0000000 package main
import (
"fmt"
"log"
md "github.com/JohannesKaufmann/html-to-markdown"
)
func main() {
html := `Bold Text`
// -> `__Bold Text__`
// instead of `**Bold Text**`
opt := &md.Options{
StrongDelimiter: "__", // default: **
}
conv := md.NewConverter("", true, opt)
markdown, err := conv.ConvertString(html)
if err != nil {
log.Fatal(err)
}
fmt.Println(markdown)
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/from.go 0000664 0000000 0000000 00000027440 14624642056 0024305 0 ustar 00root root 0000000 0000000 // Package md converts html to markdown.
//
// converter := md.NewConverter("", true, nil)
//
// html = `Important`
//
// markdown, err := converter.ConvertString(html)
// if err != nil {
// log.Fatal(err)
// }
// fmt.Println("md ->", markdown)
// Or if you are already using goquery:
// markdown, err := converter.Convert(selec)
package md
import (
"bytes"
"errors"
"fmt"
"io"
"log"
"net/http"
"net/url"
"regexp"
"strconv"
"strings"
"sync"
"time"
"github.com/PuerkitoBio/goquery"
)
type simpleRuleFunc func(content string, selec *goquery.Selection, options *Options) *string
type ruleFunc func(content string, selec *goquery.Selection, options *Options) (res AdvancedResult, skip bool)
// BeforeHook runs before the converter and can be used to transform the original html
type BeforeHook func(selec *goquery.Selection)
// Afterhook runs after the converter and can be used to transform the resulting markdown
type Afterhook func(markdown string) string
// Converter is initialized by NewConverter.
type Converter struct {
mutex sync.RWMutex
rules map[string][]ruleFunc
keep map[string]struct{}
remove map[string]struct{}
before []BeforeHook
after []Afterhook
domain string
options Options
}
func validate(val string, possible ...string) error {
for _, e := range possible {
if e == val {
return nil
}
}
return fmt.Errorf("field must be one of %v but got %s", possible, val)
}
func validateOptions(opt Options) error {
if err := validate(opt.HeadingStyle, "setext", "atx"); err != nil {
return err
}
if strings.Count(opt.HorizontalRule, "*") < 3 &&
strings.Count(opt.HorizontalRule, "_") < 3 &&
strings.Count(opt.HorizontalRule, "-") < 3 {
return errors.New("HorizontalRule must be at least 3 characters of '*', '_' or '-' but got " + opt.HorizontalRule)
}
if err := validate(opt.BulletListMarker, "-", "+", "*"); err != nil {
return err
}
if err := validate(opt.CodeBlockStyle, "indented", "fenced"); err != nil {
return err
}
if err := validate(opt.Fence, "```", "~~~"); err != nil {
return err
}
if err := validate(opt.EmDelimiter, "_", "*"); err != nil {
return err
}
if err := validate(opt.StrongDelimiter, "**", "__"); err != nil {
return err
}
if err := validate(opt.LinkStyle, "inlined", "referenced"); err != nil {
return err
}
if err := validate(opt.LinkReferenceStyle, "full", "collapsed", "shortcut"); err != nil {
return err
}
return nil
}
var (
attrListPrefix = "data-converter-list-prefix"
)
// NewConverter initializes a new converter and holds all the rules.
// - `domain` is used for links and images to convert relative urls ("/image.png") to absolute urls.
// - CommonMark is the default set of rules. Set enableCommonmark to false if you want
// to customize everything using AddRules and DONT want to fallback to default rules.
func NewConverter(domain string, enableCommonmark bool, options *Options) *Converter {
conv := &Converter{
domain: domain,
rules: make(map[string][]ruleFunc),
keep: make(map[string]struct{}),
remove: make(map[string]struct{}),
}
conv.before = append(conv.before, func(selec *goquery.Selection) {
selec.Find("a[href]").Each(func(i int, s *goquery.Selection) {
// TODO: don't hardcode "data-index" and rename it to avoid accidental conflicts
s.SetAttr("data-index", strconv.Itoa(i+1))
})
})
conv.before = append(conv.before, func(selec *goquery.Selection) {
selec.Find("li").Each(func(i int, s *goquery.Selection) {
prefix := getListPrefix(options, s)
s.SetAttr(attrListPrefix, prefix)
})
})
conv.after = append(conv.after, func(markdown string) string {
markdown = strings.TrimSpace(markdown)
markdown = multipleNewLinesRegex.ReplaceAllString(markdown, "\n\n")
// remove unnecessary trailing spaces to have clean markdown
markdown = TrimTrailingSpaces(markdown)
return markdown
})
if enableCommonmark {
conv.AddRules(commonmark...)
conv.remove["script"] = struct{}{}
conv.remove["style"] = struct{}{}
conv.remove["textarea"] = struct{}{}
}
// TODO: put domain in options?
if options == nil {
options = &Options{}
}
if options.HeadingStyle == "" {
options.HeadingStyle = "atx"
}
if options.HorizontalRule == "" {
options.HorizontalRule = "* * *"
}
if options.BulletListMarker == "" {
options.BulletListMarker = "-"
}
if options.CodeBlockStyle == "" {
options.CodeBlockStyle = "indented"
}
if options.Fence == "" {
options.Fence = "```"
}
if options.EmDelimiter == "" {
options.EmDelimiter = "_"
}
if options.StrongDelimiter == "" {
options.StrongDelimiter = "**"
}
if options.LinkStyle == "" {
options.LinkStyle = "inlined"
}
if options.LinkReferenceStyle == "" {
options.LinkReferenceStyle = "full"
}
if options.EscapeMode == "" {
options.EscapeMode = "basic"
}
// for now, store it in the options
options.domain = domain
if options.GetAbsoluteURL == nil {
options.GetAbsoluteURL = DefaultGetAbsoluteURL
}
conv.options = *options
err := validateOptions(conv.options)
if err != nil {
log.Println("markdown options is not valid:", err)
}
return conv
}
func (conv *Converter) getRuleFuncs(tag string) []ruleFunc {
conv.mutex.RLock()
defer conv.mutex.RUnlock()
r, ok := conv.rules[tag]
if !ok || len(r) == 0 {
if _, keep := conv.keep[tag]; keep {
return []ruleFunc{wrap(ruleKeep)}
}
if _, remove := conv.remove[tag]; remove {
return nil // TODO:
}
return []ruleFunc{wrap(ruleDefault)}
}
return r
}
func wrap(simple simpleRuleFunc) ruleFunc {
return func(content string, selec *goquery.Selection, opt *Options) (AdvancedResult, bool) {
res := simple(content, selec, opt)
if res == nil {
return AdvancedResult{}, true
}
return AdvancedResult{Markdown: *res}, false
}
}
// Before registers a hook that is run before the conversion. It
// can be used to transform the original goquery html document.
//
// For example, the default before hook adds an index to every link,
// so that the `a` tag rule (for "reference" "full") can have an incremental number.
func (conv *Converter) Before(hooks ...BeforeHook) *Converter {
conv.mutex.Lock()
defer conv.mutex.Unlock()
for _, hook := range hooks {
conv.before = append(conv.before, hook)
}
return conv
}
// After registers a hook that is run after the conversion. It
// can be used to transform the markdown document that is about to be returned.
//
// For example, the default after hook trims the returned markdown.
func (conv *Converter) After(hooks ...Afterhook) *Converter {
conv.mutex.Lock()
defer conv.mutex.Unlock()
for _, hook := range hooks {
conv.after = append(conv.after, hook)
}
return conv
}
// ClearBefore clears the current before hooks (including the default before hooks).
func (conv *Converter) ClearBefore() *Converter {
conv.mutex.Lock()
defer conv.mutex.Unlock()
conv.before = nil
return conv
}
// ClearAfter clears the current after hooks (including the default after hooks).
func (conv *Converter) ClearAfter() *Converter {
conv.mutex.Lock()
defer conv.mutex.Unlock()
conv.after = nil
return conv
}
// AddRules adds the rules that are passed in to the converter.
//
// By default it overrides the rule for that html tag. You can
// fall back to the default rule by returning nil.
func (conv *Converter) AddRules(rules ...Rule) *Converter {
conv.mutex.Lock()
defer conv.mutex.Unlock()
for _, rule := range rules {
if len(rule.Filter) == 0 {
log.Println("you need to specify at least one filter for your rule")
}
for _, filter := range rule.Filter {
r, _ := conv.rules[filter]
if rule.AdvancedReplacement != nil {
r = append(r, rule.AdvancedReplacement)
} else {
r = append(r, wrap(rule.Replacement))
}
conv.rules[filter] = r
}
}
return conv
}
// Keep certain html tags in the generated output.
func (conv *Converter) Keep(tags ...string) *Converter {
conv.mutex.Lock()
defer conv.mutex.Unlock()
for _, tag := range tags {
conv.keep[tag] = struct{}{}
}
return conv
}
// Remove certain html tags from the source.
func (conv *Converter) Remove(tags ...string) *Converter {
conv.mutex.Lock()
defer conv.mutex.Unlock()
for _, tag := range tags {
conv.remove[tag] = struct{}{}
}
return conv
}
// Plugin can be used to extends functionality beyond what
// is offered by commonmark.
type Plugin func(conv *Converter) []Rule
// Use can be used to add additional functionality to the converter. It is
// used when its not sufficient to use only rules for example in Plugins.
func (conv *Converter) Use(plugins ...Plugin) *Converter {
for _, plugin := range plugins {
rules := plugin(conv)
conv.AddRules(rules...) // TODO: for better performance only use one lock for all plugins
}
return conv
}
// Timeout for the http client
var Timeout = time.Second * 10
var netClient = &http.Client{
Timeout: Timeout,
}
// DomainFromURL returns `u.Host` from the parsed url.
func DomainFromURL(rawURL string) string {
rawURL = strings.TrimSpace(rawURL)
u, _ := url.Parse(rawURL)
if u != nil && u.Host != "" {
return u.Host
}
// lets try it again by adding a scheme
u, _ = url.Parse("http://" + rawURL)
if u != nil {
return u.Host
}
return ""
}
// Reduce many newline characters `\n` to at most 2 new line characters.
var multipleNewLinesRegex = regexp.MustCompile(`[\n]{2,}`)
// Convert returns the content from a goquery selection.
// If you have a goquery document just pass in doc.Selection.
func (conv *Converter) Convert(selec *goquery.Selection) string {
conv.mutex.RLock()
domain := conv.domain
options := conv.options
l := len(conv.rules)
if l == 0 {
log.Println("you have added no rules. either enable commonmark or add you own.")
}
before := conv.before
after := conv.after
conv.mutex.RUnlock()
// before hook
for _, hook := range before {
hook(selec)
}
res := conv.selecToMD(domain, selec, &options)
markdown := res.Markdown
if res.Header != "" {
markdown = res.Header + "\n\n" + markdown
}
if res.Footer != "" {
markdown += "\n\n" + res.Footer
}
// after hook
for _, hook := range after {
markdown = hook(markdown)
}
return markdown
}
// ConvertReader returns the content from a reader and returns a buffer.
func (conv *Converter) ConvertReader(reader io.Reader) (bytes.Buffer, error) {
var buffer bytes.Buffer
doc, err := goquery.NewDocumentFromReader(reader)
if err != nil {
return buffer, err
}
buffer.WriteString(
conv.Convert(doc.Selection),
)
return buffer, nil
}
// ConvertResponse returns the content from a html response.
func (conv *Converter) ConvertResponse(res *http.Response) (string, error) {
doc, err := goquery.NewDocumentFromResponse(res)
if err != nil {
return "", err
}
return conv.Convert(doc.Selection), nil
}
// ConvertString returns the content from a html string. If you
// already have a goquery selection use `Convert`.
func (conv *Converter) ConvertString(html string) (string, error) {
doc, err := goquery.NewDocumentFromReader(strings.NewReader(html))
if err != nil {
return "", err
}
return conv.Convert(doc.Selection), nil
}
// ConvertBytes returns the content from a html byte array.
func (conv *Converter) ConvertBytes(bytes []byte) ([]byte, error) {
res, err := conv.ConvertString(string(bytes))
if err != nil {
return nil, err
}
return []byte(res), nil
}
// ConvertURL returns the content from the page with that url.
func (conv *Converter) ConvertURL(url string) (string, error) {
// not using goquery.NewDocument directly because of the timeout
resp, err := netClient.Get(url)
if err != nil {
return "", err
}
if resp.StatusCode < 200 || resp.StatusCode > 299 {
return "", fmt.Errorf("expected a status code in the 2xx range but got %d", resp.StatusCode)
}
doc, err := goquery.NewDocumentFromResponse(resp)
if err != nil {
return "", err
}
domain := DomainFromURL(url)
if conv.domain != domain {
log.Printf("expected '%s' as the domain but got '%s' \n", conv.domain, domain)
}
return conv.Convert(doc.Selection), nil
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/from_test.go 0000664 0000000 0000000 00000036173 14624642056 0025347 0 ustar 00root root 0000000 0000000 package md
import (
"bytes"
"errors"
"io/ioutil"
"log"
"net/http"
"net/http/httptest"
"os"
"strings"
"sync"
"testing"
"github.com/PuerkitoBio/goquery"
)
func TestConvertReader(t *testing.T) {
input := `Bold`
expected := `**Bold**`
converter := NewConverter("", true, nil)
res, err := converter.ConvertReader(strings.NewReader(input))
if err != nil {
t.Error(err)
}
if !bytes.Equal(res.Bytes(), []byte(expected)) {
t.Error("the result is different that expected")
}
}
type ErrReader struct{ Error error }
// -> https://stackoverflow.com/a/57452918
func (e *ErrReader) Read([]byte) (int, error) {
return 0, e.Error
}
func TestConvertReader_Error(t *testing.T) {
reader := &ErrReader{
Error: errors.New("we got an error"),
}
converter := NewConverter("", true, nil)
res, err := converter.ConvertReader(reader)
if err != reader.Error {
t.Error("expected an error")
}
if res.Len() != 0 {
t.Error("expected an empty buffer")
}
}
func TestConvertBytes(t *testing.T) {
input := `Bold`
expected := `**Bold**`
converter := NewConverter("", true, nil)
res, err := converter.ConvertBytes([]byte(input))
if err != nil {
t.Error(err)
}
if !bytes.Equal(res, []byte(expected)) {
t.Error("the result is different that expected")
}
}
func TestConvertBytes_Empty(t *testing.T) {
converter := NewConverter("", true, nil)
res, err := converter.ConvertBytes(nil)
if err != nil {
t.Error(err)
}
if !bytes.Equal(res, []byte("")) {
t.Error("the result is different that expected")
}
}
func TestConvertResponse(t *testing.T) {
input := `Bold`
expected := `**Bold**`
converter := NewConverter("", true, nil)
res, err := converter.ConvertResponse(&http.Response{
StatusCode: 200,
Body: ioutil.NopCloser(bytes.NewBufferString(input)),
Request: &http.Request{},
})
if err != nil {
t.Error(err)
}
if res != expected {
t.Error("the result is different that expected")
}
}
func TestConvertResponse_Error(t *testing.T) {
expectedErr := errors.New("custom error reader")
converter := NewConverter("", true, nil)
res, err := converter.ConvertResponse(&http.Response{
StatusCode: 200,
Body: ioutil.NopCloser(&ErrReader{
Error: expectedErr,
}),
Request: &http.Request{},
})
if err != expectedErr {
t.Error(err)
}
if res != "" {
t.Error("the result is different that expected")
}
}
func TestConvertString(t *testing.T) {
input := `Bold`
expected := `**Bold**`
converter := NewConverter("", true, nil)
res, err := converter.ConvertString(input)
if err != nil {
t.Error(err)
}
if res != expected {
t.Error("the result is different that expected")
}
}
func TestConvertSelection(t *testing.T) {
input := `Bold`
expected := `**Bold**`
doc, err := goquery.NewDocumentFromReader(strings.NewReader(input))
if err != nil {
t.Error(err)
}
converter := NewConverter("", true, nil)
res := converter.Convert(doc.Selection)
if res != expected {
t.Error("the result is different that expected")
}
}
func TestConvertURL(t *testing.T) {
input := `Bold`
expected := `**Bold**`
// Start a local HTTP server
server := httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
rw.Write([]byte(input))
}))
// Close the server when test finishes
defer server.Close()
// override the client used in `ConvertURL`
netClient = server.Client()
converter := NewConverter(server.URL, true, nil)
res, err := converter.ConvertURL(server.URL)
if err != nil {
t.Error(err)
}
if res != expected {
t.Error("the result is different that expected")
}
}
func TestConvertURL_Error(t *testing.T) {
url := "abc https://example.com"
converter := NewConverter("", true, nil)
res, err := converter.ConvertURL(url)
if err == nil {
t.Error("expected an error")
}
if res != "" {
t.Error("the result is different that expected")
}
}
func TestConvertURL_ErrorStatusCode(t *testing.T) {
// Start a local HTTP server
server := httptest.NewServer(http.HandlerFunc(func(rw http.ResponseWriter, req *http.Request) {
rw.WriteHeader(http.StatusNotFound)
rw.Write([]byte("404 Not Found"))
}))
// Close the server when test finishes
defer server.Close()
// override the client used in `ConvertURL`
netClient = server.Client()
converter := NewConverter(server.URL, true, nil)
res, err := converter.ConvertURL(server.URL)
if err == nil {
t.Error("expected an error")
}
if res != "" {
t.Error("the result is different that expected")
}
}
// - - - - - - - - - - - - //
func TestNewConverter_NoRules(t *testing.T) {
var buf bytes.Buffer
log.SetOutput(&buf)
log.SetFlags(0)
defer func() {
// reset the options back to the defaults
log.SetOutput(os.Stderr)
log.SetFlags(3)
}()
input := `Bold`
expected := ``
// disable commonmark
converter := NewConverter("", false, nil)
res, err := converter.ConvertString(input)
if err != nil {
t.Error(err)
}
if res != expected {
t.Error("the result is different that expected")
}
if strings.TrimSuffix(buf.String(), "\n") != "you have added no rules. either enable commonmark or add you own." {
t.Error("expected a different log message")
}
}
func TestNewConverter_ValidateOptions(t *testing.T) {
var buf bytes.Buffer
log.SetOutput(&buf)
log.SetFlags(0)
defer func() {
// reset the options back to the defaults
log.SetOutput(os.Stderr)
log.SetFlags(3)
}()
input := `Bold`
expected := `====Bold====`
converter := NewConverter("", true, &Options{
StrongDelimiter: "====",
})
res, err := converter.ConvertString(input)
if err != nil {
t.Error(err)
}
if res != expected {
t.Error("the result is different that expected")
}
if strings.TrimSuffix(buf.String(), "\n") != "markdown options is not valid: field must be one of [** __] but got ====" {
t.Error("expected a different log message")
}
}
func TestNewConverter_ValidateOptions_All(t *testing.T) {
var tests = []struct {
name string
options *Options
input string
expected string
}{
{
name: "HeadingStyle",
options: &Options{
HeadingStyle: "invalid",
},
input: `
`,
expected: "^^^\ntest\n^^^",
},
{
name: "EmDelimiter",
options: &Options{
EmDelimiter: "-",
},
input: `test`,
expected: "-test-",
},
{
name: "LinkStyle",
options: &Options{
LinkStyle: "invalid",
},
input: `link`,
expected: `[link][1]
[1]: example.com`,
},
{
name: "LinkReferenceStyle",
options: &Options{
LinkReferenceStyle: "invalid",
},
input: `link`,
expected: "[link](example.com)",
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
var buf bytes.Buffer
log.SetOutput(&buf)
log.SetFlags(0)
defer func() {
// reset the options back to the defaults
log.SetOutput(os.Stderr)
log.SetFlags(3)
}()
converter := NewConverter("", true, test.options)
res, err := converter.ConvertString(test.input)
if err != nil {
t.Error(err)
}
if res != test.expected {
t.Errorf("expected '%s' but got '%s'", test.expected, res)
}
logOutput := strings.TrimSuffix(buf.String(), "\n")
if !strings.Contains(logOutput, "markdown options is not valid: ") {
t.Errorf("expected a different log message but got '%s'", logOutput)
}
})
}
}
func BenchmarkFromString(b *testing.B) {
converter := NewConverter("www.google.com", true, nil)
strongRule := Rule{
Filter: []string{"strong"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
return nil
},
}
var wg sync.WaitGroup
convert := func(html string) {
defer wg.Done()
_, err := converter.ConvertString(html)
if err != nil {
b.Error(err)
}
}
add := func() {
defer wg.Done()
converter.AddRules(strongRule)
}
for n := 0; n < b.N; n++ {
wg.Add(2)
go add()
go convert("Bold")
}
wg.Wait()
}
func TestAddRules_ChangeContent(t *testing.T) {
expected := "Some other Content"
var wasCalled bool
rule := Rule{
Filter: []string{"p"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
wasCalled = true
if content != "Some Content" {
t.Errorf("got wrong `content`: '%s'", content)
}
if !selec.Is("p") {
t.Error("selec is not p")
}
return String(expected)
},
}
conv := NewConverter("", true, nil)
conv.AddRules(rule)
md, err := conv.ConvertString(`
Some Content
`)
if err != nil {
t.Error(err)
}
if md != expected {
t.Errorf("wanted '%s' but got '%s'", expected, md)
}
if !wasCalled {
t.Error("rule was not called")
}
}
func TestAddRules_Fallback(t *testing.T) {
// firstExpected := "Some other Content"
expected := "Totally different Content"
var firstWasCalled bool
var secondWasCalled bool
firstRule := Rule{
Filter: []string{"p"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
firstWasCalled = true
if secondWasCalled {
t.Error("expected first rule to be called before second rule. second is already called")
}
if content != "Some Content" {
t.Errorf("got wrong `content`: '%s'", content)
}
if !selec.Is("p") {
t.Error("selec is not p")
}
return nil
},
}
secondRule := Rule{
Filter: []string{"p"},
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
secondWasCalled = true
if !firstWasCalled {
t.Error("expected first rule to be called before second rule. first is not called yet")
}
if content != "Some Content" {
t.Errorf("got wrong `content`: '%s'", content)
}
if !selec.Is("p") {
t.Error("selec is not p")
}
return String(expected)
},
}
conv := NewConverter("", true, nil)
conv.AddRules(secondRule, firstRule)
md, err := conv.ConvertString(`
Some Content
`)
if err != nil {
t.Error(err)
}
if md != expected {
t.Errorf("wanted '%s' but got '%s'", expected, md)
}
if !firstWasCalled {
t.Error("first rule was not called")
}
if !secondWasCalled {
t.Error("second rule was not called")
}
}
func TestAddRules_NoRules(t *testing.T) {
var buf bytes.Buffer
log.SetOutput(&buf)
log.SetFlags(0)
defer func() {
// reset the options back to the defaults
log.SetOutput(os.Stderr)
log.SetFlags(3)
}()
var wasCalled bool
rule := Rule{
Filter: []string{ /* nothing */ },
Replacement: func(content string, selec *goquery.Selection, opt *Options) *string {
wasCalled = true
return nil
},
}
conv := NewConverter("", true, nil)
conv.AddRules(rule)
md, err := conv.ConvertString(`
Some Content
`)
if err != nil {
t.Error(err)
}
if md != "Some Content" {
t.Error("got different markdown result")
}
logOutput := strings.TrimSuffix(buf.String(), "\n")
if logOutput != "you need to specify at least one filter for your rule" {
t.Errorf("expected a different log message but got '%s'", logOutput)
}
if wasCalled {
t.Error("the rule should not have been called")
}
}
func TestBefore(t *testing.T) {
var firstWasCalled bool
var secondWasCalled bool
firstHook := func(selec *goquery.Selection) {
firstWasCalled = true
if secondWasCalled {
t.Error("the second hook should not be called yet")
}
}
secondHook := func(selec *goquery.Selection) {
secondWasCalled = true
if !firstWasCalled {
t.Error("the first hook should already be called")
}
}
conv := NewConverter("", true, nil)
conv.Before(firstHook, secondHook)
_, err := conv.ConvertString(`Link`)
if err != nil {
t.Error(err)
}
if !firstWasCalled || !secondWasCalled {
t.Error("not all hooks were called")
}
}
func TestAfter(t *testing.T) {
var firstWasCalled bool
var secondWasCalled bool
firstHook := func(md string) string {
firstWasCalled = true
if secondWasCalled {
t.Error("the second hook should not be called yet")
}
return md + " first"
}
secondHook := func(md string) string {
secondWasCalled = true
if !firstWasCalled {
t.Error("the first hook should already be called")
}
return md + " second"
}
conv := NewConverter("", true, nil)
conv.After(firstHook, secondHook)
md, err := conv.ConvertString(`base`)
if err != nil {
t.Error(err)
}
if md != `base first second` {
t.Errorf("expected different markdown result but got '%s'", md)
}
if !firstWasCalled || !secondWasCalled {
t.Error("not all hooks were called")
}
}
func TestClearBefore(t *testing.T) {
var wasCalled bool
hook := func(selec *goquery.Selection) {
wasCalled = true
}
conv := NewConverter("", true, nil)
conv.ClearBefore()
if len(conv.before) != 0 {
t.Error("the before hook array should be of length 0")
}
conv.Before(hook)
_, err := conv.ConvertString(`Link`)
if err != nil {
t.Error(err)
}
if !wasCalled {
t.Error("the hook should have been called")
}
}
func TestClearAfter(t *testing.T) {
var wasCalled bool
hook := func(markdown string) string {
wasCalled = true
return "my new value"
}
conv := NewConverter("", true, nil)
conv.ClearAfter()
if len(conv.after) != 0 {
t.Error("the after hook array should be of length 0")
}
conv.After(hook)
md, err := conv.ConvertString(`Link`)
if err != nil {
t.Error(err)
}
if md != "my new value" {
t.Error("the result was different then expected")
}
if !wasCalled {
t.Error("the hook should have been called")
}
}
func TestDomainFromURL(t *testing.T) {
var tests = []struct {
input string
expected string
}{
{
input: "example.com",
expected: "example.com",
},
{
input: "https://example.com",
expected: "example.com",
},
{
input: "https://www.example.com",
expected: "www.example.com",
},
{
input: "http://example.com/index.html",
expected: "example.com",
},
{
input: "http://example.com?page=home",
expected: "example.com",
},
{
input: "http://example.com#page",
expected: "example.com",
},
{
input: "http://example.com:3000",
expected: "example.com:3000",
},
{
// not so happy about this :(
input: "example",
expected: "example",
},
{
input: "https://developer.mozilla.org/en-US/docs/Web/API/URL/host",
expected: "developer.mozilla.org",
},
{
input: " http://example.com",
expected: "example.com",
},
{
// invalid url
input: "abc http://example.com",
expected: "",
},
}
for _, test := range tests {
t.Run(test.input, func(t *testing.T) {
res := DomainFromURL(test.input)
if res != test.expected {
t.Errorf("for '%s' expected '%s' but got '%s'", test.input, test.expected, res)
}
})
}
}
golang-github-johanneskaufmann-html-to-markdown-1.6.0/go.mod 0000664 0000000 0000000 00000000426 14624642056 0024114 0 ustar 00root root 0000000 0000000 module github.com/JohannesKaufmann/html-to-markdown
go 1.13
require (
github.com/PuerkitoBio/goquery v1.9.2
github.com/sebdah/goldie/v2 v2.5.3
github.com/sergi/go-diff v1.3.1 // indirect
github.com/yuin/goldmark v1.7.1
golang.org/x/net v0.25.0
gopkg.in/yaml.v2 v2.4.0
)
golang-github-johanneskaufmann-html-to-markdown-1.6.0/go.sum 0000664 0000000 0000000 00000016430 14624642056 0024143 0 ustar 00root root 0000000 0000000 github.com/PuerkitoBio/goquery v1.9.2 h1:4/wZksC3KgkQw7SQgkKotmKljk0M6V8TUvA8Wb4yPeE=
github.com/PuerkitoBio/goquery v1.9.2/go.mod h1:GHPCaP0ODyyxqcNoFGYlAprUFH81NuRPd0GX3Zu2Mvk=
github.com/andybalholm/cascadia v1.3.2 h1:3Xi6Dw5lHF15JtdcmAHD3i1+T8plmv7BQ/nsViSLyss=
github.com/andybalholm/cascadia v1.3.2/go.mod h1:7gtRlve5FxPPgIgX36uWBX58OdBsSS6lUvCFb+h7KvU=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/kr/pretty v0.1.0 h1:L/CwN0zerZDmRFUapSPitk6f+Q3+0za1rQkzVuMiMFI=
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/sebdah/goldie/v2 v2.5.3 h1:9ES/mNN+HNUbNWpVAlrzuZ7jE+Nrczbj8uFRjM7624Y=
github.com/sebdah/goldie/v2 v2.5.3/go.mod h1:oZ9fp0+se1eapSRjfYbsV/0Hqhbuu3bJVvKI/NNtssI=
github.com/sergi/go-diff v1.0.0/go.mod h1:0CfEIISq7TuYL3j771MWULgwwjU+GofnZX9QAmXWZgo=
github.com/sergi/go-diff v1.3.1 h1:xkr+Oxo4BOQKmkn/B9eMK0g5Kg/983T9DqqPHwYqD+8=
github.com/sergi/go-diff v1.3.1/go.mod h1:aMJSSKb2lpPvRNec0+w3fl7LP9IOFzdc9Pa4NFbPK1I=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI=
github.com/stretchr/testify v1.4.0 h1:2E4SXV/wtOkTonXsotYi4li6zVWxYlZuYNCXe9XRJyk=
github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
github.com/yuin/goldmark v1.7.1 h1:3bajkSilaCbjdKVsKdZjZCLBNPL9pYzrCakKaf4U49U=
github.com/yuin/goldmark v1.7.1/go.mod h1:uzxRWxtg69N339t3louHJ7+O03ezfj6PlliRlaOzY1E=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.19.0/go.mod h1:Iy9bg/ha4yyC70EfRS8jz+B6ybOBKMaSxLj6P6oBDfU=
golang.org/x/crypto v0.22.0/go.mod h1:vr6Su+7cTlO45qkww3VDJlzDn0ctJvRgYbC2NvXHt+M=
golang.org/x/crypto v0.23.0/go.mod h1:CKFgDieR+mRhux2Lsu27y0fO304Db0wZe70UKqHu0v8=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
golang.org/x/mod v0.8.0/go.mod h1:iBbtSCu2XBx23ZKBPSOrRkjjQPZFPuis4dIYUhu/chs=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.6.0/go.mod h1:2Tu9+aMcznHK/AK1HMvgo6xiTLG5rD5rZLDS+rp2Bjs=
golang.org/x/net v0.9.0/go.mod h1:d48xBJpPfHeWQsugry2m+kC02ZBRGRgulfHnEXEuWns=
golang.org/x/net v0.10.0/go.mod h1:0qNGK6F8kojg2nk9dLZ2mShWaEBan6FAoqfSigmmuDg=
golang.org/x/net v0.21.0/go.mod h1:bIjVDfnllIU7BJ2DNgfnXvpSvtn8VRwhlsaeUTyUS44=
golang.org/x/net v0.24.0/go.mod h1:2Q7sJY5mzlzWjKtYUEXSlBWCdyaioyXzRB2RtU8KVE8=
golang.org/x/net v0.25.0 h1:d/OCCoBEUq33pjydKrGQhw7IlUPI2Oylr+8qLx49kac=
golang.org/x/net v0.25.0/go.mod h1:JkAGAh7GEvH74S6FOH42FLoXpXbE/aqXSrIQjXgsiwM=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.1.0/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.5.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.7.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.8.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.19.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.20.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
golang.org/x/term v0.5.0/go.mod h1:jMB1sMXY+tzblOD4FWmEbocvup2/aLOaQEp7JmGp78k=
golang.org/x/term v0.7.0/go.mod h1:P32HKFT3hSsZrRxla30E9HqToFYAQPCMs/zFMBUFqPY=
golang.org/x/term v0.8.0/go.mod h1:xPskH00ivmX89bAKVGSKKtLOWNx2+17Eiy94tnKShWo=
golang.org/x/term v0.17.0/go.mod h1:lLRBjIVuehSbZlaOtGMbcMncT+aqLLLmKrsjNrUguwk=
golang.org/x/term v0.19.0/go.mod h1:2CuTdWZ7KHSQwUzKva0cbMg6q2DMI3Mmxp+gKJbskEk=
golang.org/x/term v0.20.0/go.mod h1:8UkIAJTvZgivsXaD6/pH6U9ecQzZ45awqEOzuCvwpFY=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.7.0/go.mod h1:mrYo+phRRbMaCq/xk9113O4dZlRixOauAjOtrjsXDZ8=
golang.org/x/text v0.9.0/go.mod h1:e1OnstbJyHTd6l/uOt8jFFHp6TRDWZR/bV3emEE/zU8=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
golang.org/x/tools v0.6.0/go.mod h1:Xwgl3UAJ/d3gWutnCtw505GrjyAbvKui8lOU390QaIU=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 h1:YR8cESwS4TdDjEe65xsg0ogRM/Nc3DYOhEAlW+xobZo=
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY=
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=
golang-github-johanneskaufmann-html-to-markdown-1.6.0/logo_five_years.png 0000664 0000000 0000000 00000541515 14624642056 0026701 0 ustar 00root root 0000000 0000000 PNG
IHDR G iCCPsRGB IEC61966-2.1 (u+DQ?1QIXȏ(#$Q7Լ{OU/`"ReMl橑̹{>{9{."YE5z@͙zx,䛋<ϸi-1І')kwT&`*_K$
*jM7Dž'WLmf%K
u~9z$<Fa_Uay9jvYdnvFbxacQFˠA-+K"*:K`uY'%DOȲjo_TS>'z6-в
GPR~ E,i]˒߁
h}bz(UR)x=(4]Cӳ}!&_u{%罋 0g pHYs IDATxw|չ7Lٝղ-ْLK $@z^ kI( )ܔK=`nl{-Yd}!=]J䟰hwv49<n@!D`k@7L_u_ Du\!m9Klyo/nzup]q82BY &pRbjD,r`TUFN25߾ttի1c,DžB(w8k!JB]{1W"`}2YDGٍ*@lqdL=PK<"B!KyVItWXPG'ϴPzV,ٿ~ #C!0Z#>[ˤu&<נt 2:HN.}8o:Us`!NG!$uWLnvX9=C6(3(l|ZcLx81m+lny7v$x!EBe(3d:PW'Z7(!`1]Oz*GoPo8.to8v^'h!&(!LHTiW4W@L.JtoFEpqc7|6B`q{h׀֯<ύ!&&{@!SJmwX[;TMi82{l*&[ro:ދkBhbhB(yΩeXcHEPL>U9lO/_Be
B('ϽVՒ{W3T:ViF ïw5]A
v$-GBhhB(Թc*=v@ɉfe'FN O*mܶÖ"s[BB!runÆJJ ;@艦9S!˂`s x DkBY]fլ;cZL1pd([0 //;sjAާi=>BFkBc~W5'6c jL6s{_$YmЫ5G-Q 18!!PN+ԍpVԓ{W˪48B