` tag.

+ Fix for horizontal rules preceded by 2 or 3 spaces.

+ `
` tags.
+ You can now write empty links:

    [like this]()

and they'll be turned into anchor tags with empty href attributes.
This should have worked before, but didn't.
+ `***this***` and `___this___` are now turned into

    <strong><em>this</em></strong>

instead of

    <strong><em>this</strong></em>

which isn't valid. (Thanks to Michel Fortin for the fix.)
+ Added a new substitution in `_EncodeCode()`: `s/\$/&#036;/g;`. This
is only for the benefit of Blosxom users, because Blosxom
(sometimes?) interpolates Perl scalars in your article bodies.
+ Fixed problem for links defined with URLs that include parens, e.g.:

    [1]: http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky)

"Chomsky" was being erroneously treated as the URL's title.
+ At some point during 1.0's beta cycle, I changed every sub's
argument fetching from this idiom:

    my $text = shift;

to:

    my $text = shift || return '';

The idea was to keep Markdown from doing any work in a sub
if the input was empty. This introduced a bug, though:
if the input to any function was the single-character string
"0", it would also evaluate as false and return immediately.
How silly. Now fixed.
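The `***this***` fix above can be illustrated with a small sketch. It assumes the strong and emphasis substitutions emit `<strong>` and `<em>` tags; the sub names `old_italics_and_bold` and `new_italics_and_bold` are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; not part of Markdown.pl.

# Without the trailing [*_]*, the strong span closes one marker too early.
sub old_italics_and_bold {
    my $text = shift;
    $text =~ s{ (\*\*|__) (?=\S) (.+?) (?<=\S) \1 }{<strong>$2</strong>}gsx;
    $text =~ s{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }{<em>$2</em>}gsx;
    return $text;
}

# With [*_]*, the strong span also swallows the closing emphasis marker,
# so the emphasis pass ends up nested inside it.
sub new_italics_and_bold {
    my $text = shift;
    $text =~ s{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 }{<strong>$2</strong>}gsx;
    $text =~ s{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }{<em>$2</em>}gsx;
    return $text;
}

print old_italics_and_bold('***this***'), "\n";  # <strong><em>this</strong></em>
print new_italics_and_bold('***this***'), "\n";  # <strong><em>this</em></strong>
```

In both versions the strong pass must run first; only the extra `[*_]*` changes where the strong span ends.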
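For the parenthesized-URL fix above, here is a minimal sketch of how a whitespace lookbehind before the title keeps `(Chomsky)` attached to the URL. The helper `parse_link_def` and its flag are hypothetical; the pattern is a simplified stand-in for the one in `_StripLinkDefinitions`, with the leading indent hard-coded to at most three spaces.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only. Returns (id, url, title);
# title is undef when the definition has none.
sub parse_link_def {
    my ($line, $title_needs_space) = @_;
    # With the fix, a title may only start right after whitespace.
    my $guard = $title_needs_space ? qr/(?<=\s)/ : qr/(?:)/;
    if ($line =~ m{
            ^[ ]{0,3}\[(.+)\]:     # link id
            [ \t]* \n? [ \t]*
            <?(\S+?)>?             # url
            [ \t]* \n? [ \t]*
            (?:
                $guard
                ["(] (.+?) [")]    # optional title
                [ \t]*
            )?
            (?:\n+|\Z)
        }mx) {
        return ($1, $2, $3);
    }
    return;
}

my $def = '[1]: http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky)';
my (undef, $old_url, $old_title) = parse_link_def($def, 0);
my (undef, $new_url, $new_title) = parse_link_def($def, 1);
print "without lookbehind: url=$old_url title=", $old_title // '(none)', "\n";
print "with lookbehind:    url=$new_url title=", $new_title // '(none)', "\n";
```

Without the lookbehind, the non-greedy URL match stops at the first point where a title could begin, which is the opening paren. With it, the paren is not preceded by whitespace, so the URL match has to continue to the end of the line.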
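The "0" bug described in the last item is easy to reproduce. The sketch below contrasts the two idioms; the sub names `with_or_return` and `with_plain_shift` are hypothetical and are not part of Markdown.pl.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only; not part of Markdown.pl.

# The buggy idiom: any false value ("", "0", undef) triggers the early return.
sub with_or_return {
    my $text = shift || return '';
    return "[$text]";
}

# The plain idiom: the argument is taken as-is.
sub with_plain_shift {
    my $text = shift;
    return "[$text]";
}

print with_or_return("0"),   "\n";   # prints an empty line: "0" is false in Perl
print with_plain_shift("0"), "\n";   # prints "[0]"
```

If an early exit on empty input is still wanted, testing the argument's length after a plain `shift` would distinguish `""` from `"0"`.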
Donations
---------
Donations to support Markdown's development are happily accepted. See:
# <p>s around
# "paragraphs" that are wrapped in non-block-level tags, such as anchors,
# phrase emphasis, and spans. The list of tags we're looking for is
# hard-coded:
my $block_tags_a = qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math|ins|del/;
my $block_tags_b = qr/p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|script|noscript|form|fieldset|iframe|math/;
# First, look for nested blocks, e.g.:
# Just type tags
#
my $text = shift;
# Strip leading and trailing lines:
$text =~ s/\A\n+//;
$text =~ s/\n+\z//;
my @grafs = split(/\n{2,}/, $text);
#
# Wrap <p> tags.
#
foreach (@grafs) {
unless (defined( $g_html_blocks{$_} )) {
$_ = _RunSpanGamut($_);
s/^([ \t]*)/<p>/;
$_ .= "</p>";
}
}
# tags get encoded.
#
my $text = shift;
# Clear the global hashes. If we don't clear these, you get conflicts
# from other articles when generating a page which contains more than
# one article (e.g. an index page that shows the N most recent
# articles):
%g_urls = ();
%g_titles = ();
%g_html_blocks = ();
# Standardize line endings:
$text =~ s{\r\n}{\n}g; # DOS to Unix
$text =~ s{\r}{\n}g; # Mac to Unix
# Make sure $text ends with a couple of newlines:
$text .= "\n\n";
# Convert all tabs to spaces.
$text = _Detab($text);
# Strip any lines consisting only of spaces and tabs.
# This makes subsequent regexen easier to write, because we can
# match consecutive blank lines with /\n+/ instead of something
# contorted like /[ \t]*\n+/ .
$text =~ s/^[ \t]+$//mg;
# Turn block-level HTML blocks into hash entries
$text = _HashHTMLBlocks($text);
# Strip link definitions, store in hashes.
$text = _StripLinkDefinitions($text);
$text = _RunBlockGamut($text);
$text = _UnescapeSpecialChars($text);
return $text . "\n";
}
sub _StripLinkDefinitions {
#
# Strips link definitions from text, stores the URLs and titles in
# hash references.
#
my $text = shift;
my $less_than_tab = $g_tab_width - 1;
# Link defs are in the form: ^[id]: url "optional title"
while ($text =~ s{
^[ ]{0,$less_than_tab}\[(.+)\]: # id = $1
[ \t]*
\n? # maybe *one* newline
[ \t]*
<?(\S+?)>? # url = $2
[ \t]*
\n? # maybe one newline
[ \t]*
(?:
(?<=\s) # lookbehind for whitespace
["(]
(.+?) # title = $3
[")]
[ \t]*
)? # title is optional
(?:\n+|\Z)
}
{}mx) {
$g_urls{lc $1} = _EncodeAmpsAndAngles( $2 ); # Link IDs are case-insensitive
if ($3) {
$g_titles{lc $1} = $3;
$g_titles{lc $1} =~ s/"/&quot;/g;
}
}
return $text;
}
sub _HashHTMLBlocks {
my $text = shift;
my $less_than_tab = $g_tab_width - 1;
# Hashify HTML blocks:
# We only want to do this for block-level HTML tags, such as headers,
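# lists, and tables. That's because we still want to wrap <p> tags around
# "paragraphs" that consist of non-block-level tags.
#
# Special case for <hr /> tags. It was easier to make a special case than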
# to make the other regex more complicated.
$text =~ s{
(?:
(?<=\n\n) # Starting after a blank line
| # or
\A\n? # the beginning of the doc
)
( # save in $1
[ ]{0,$less_than_tab}
<(hr) # start tag = $2
\b # word break
([^<>])*? #
/?> # the matching end tag
[ \t]*
(?=\n{2,}|\Z) # followed by a blank line or end of document
)
}{
my $key = md5_hex($1);
$g_html_blocks{$key} = $1;
"\n\n" . $key . "\n\n";
}egx;
# Special case for standalone HTML comments:
$text =~ s{
(?:
(?<=\n\n) # Starting after a blank line
| # or
\A\n? # the beginning of the doc
)
( # save in $1
[ ]{0,$less_than_tab}
(?s:
<!
(--.*?--\s*)+
>
)
[ \t]*
(?=\n{2,}|\Z) # followed by a blank line or end of document
)
}{
my $key = md5_hex($1);
$g_html_blocks{$key} = $1;
"\n\n" . $key . "\n\n";
}egx;
return $text;
}
sub _RunBlockGamut {
#
# These are all the transformations that form block-level
# tags like paragraphs, headers, and list items.
#
my $text = shift;
$text = _DoHeaders($text);
# Do Horizontal Rules:
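$text =~ s{^[ ]{0,2}([ ]?\*[ ]?){3,}[ \t]*$}{\n<hr$g_empty_element_suffix\n}gmx;
# Hash the block-level HTML generated so far, so that we don't wrap <p>
# tags around block-level tags.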
$text = _HashHTMLBlocks($text);
$text = _FormParagraphs($text);
return $text;
}
sub _RunSpanGamut {
#
# These are all the transformations that occur *within* block-level
# tags like paragraphs, headers, and list items.
#
my $text = shift;
$text = _DoCodeSpans($text);
$text = _EscapeSpecialChars($text);
# Process anchor and image tags. Images must come first,
# because ![foo][f] looks like an anchor.
$text = _DoImages($text);
$text = _DoAnchors($text);
# Make links out of things like `
# <pre> or <code> tags.
# my $tags_to_skip = qr!<(/?)(?:pre|code|kbd|script|math)[\s>]!;
foreach my $cur_token (@$tokens) {
if ($cur_token->[0] eq "tag") {
# Within tags, encode * and _ so they don't conflict
# with their use in Markdown for italics and strong.
# We're replacing each such character with its
# corresponding MD5 checksum value; this is likely
# overkill, but it should prevent us from colliding
# with the escape values by accident.
$cur_token->[1] =~ s! \* !$g_escape_table{'*'}!gx;
$cur_token->[1] =~ s! _ !$g_escape_table{'_'}!gx;
$text .= $cur_token->[1];
} else {
my $t = $cur_token->[1];
$t = _EncodeBackslashEscapes($t);
$text .= $t;
}
}
return $text;
}
sub _DoAnchors {
#
# Turn Markdown link shortcuts into XHTML <a> tags.
#
my $text = shift;
#
# First, handle reference-style links: [link text] [id]
#
$text =~ s{
( # wrap whole match in $1
\[
($g_nested_brackets) # link text = $2
\]
[ ]? # one optional space
(?:\n[ ]*)? # one optional newline followed by spaces
\[
(.*?) # id = $3
\]
)
}{
my $result;
my $whole_match = $1;
my $link_text = $2;
my $link_id = lc $3;
if ($link_id eq "") {
$link_id = lc $link_text; # for shortcut links like [this][].
}
if (defined $g_urls{$link_id}) {
my $url = $g_urls{$link_id};
$url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid
$url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold.
$result = "<a href=\"$url\"";
<?(.*?)>? # href = $3
[ \t]*
( # $4
(['"]) # quote char = $5
(.*?) # Title = $6
\5 # matching quote
)? # title is optional
\)
)
}{
my $result;
my $whole_match = $1;
my $link_text = $2;
my $url = $3;
my $title = $6;
$url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid
$url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold.
$result = "<a href=\"$url\"";
#
# Turn Markdown image shortcuts into <img> tags.
#
my $text = shift;
#
# First, handle reference-style labeled images: ![alt text][id]
#
$text =~ s{
( # wrap whole match in $1
!\[
(.*?) # alt text = $2
\]
[ ]? # one optional space
(?:\n[ ]*)? # one optional newline followed by spaces
\[
(.*?) # id = $3
\]
)
}{
my $result;
my $whole_match = $1;
my $alt_text = $2;
my $link_id = lc $3;
if ($link_id eq "") {
$link_id = lc $alt_text; # for shortcut links like ![this][].
}
$alt_text =~ s/"/&quot;/g;
if (defined $g_urls{$link_id}) {
my $url = $g_urls{$link_id};
$url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid
$url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold.
$result = "<img src=\"$url\" alt=\"$alt_text\"";
<?(\S+?)>? # src url = $3
[ \t]*
( # $4
(['"]) # quote char = $5
(.*?) # title = $6
\5 # matching quote
[ \t]*
)? # title is optional
\)
)
}{
my $result;
my $whole_match = $1;
my $alt_text = $2;
my $url = $3;
my $title = '';
if (defined($6)) {
$title = $6;
}
$alt_text =~ s/"/&quot;/g;
$title =~ s/"/&quot;/g;
$url =~ s! \* !$g_escape_table{'*'}!gx; # We've got to encode these to avoid
$url =~ s! _ !$g_escape_table{'_'}!gx; # conflicting with italics/bold.
$result = "<img src=\"$url\" alt=\"$alt_text\"";
$text =~ s{ ^(.+)[ \t]*\n=+[ \t]*\n+ }{
"<h1>" . _RunSpanGamut($1) . "</h1>\n\n";
}egmx;
$text =~ s{ ^(.+)[ \t]*\n-+[ \t]*\n+ }{
"<h2>" . _RunSpanGamut($1) . "</h2>\n\n";
}egmx;
# atx-style headers:
# # Header 1
# ## Header 2
# ## Header 2 with closing hashes ##
# ...
# ###### Header 6
#
$text =~ s{
^(\#{1,6}) # $1 = string of #'s
[ \t]*
(.+?) # $2 = Header text
[ \t]*
\#* # optional closing #'s (not counted)
\n+
}{
my $h_level = length($1);
"<h$h_level>" . _RunSpanGamut($2) . "</h$h_level>\n\n";
}egmx;
#
# Process Markdown `<pre><code>` blocks.
#
my $text = shift;
$text =~ s{
(?:\n\n|\A)
( # $1 = the code block -- one or more lines, starting with a space/tab
(?:
(?:[ ]{$g_tab_width} | \t) # Lines must start with a tab or a tab-width of spaces
.*\n+
)+
)
((?=^[ ]{0,$g_tab_width}\S)|\Z) # Lookahead for non-space at line-start, or end of doc
}{
my $codeblock = $1;
my $result; # return value
$codeblock = _EncodeCode(_Outdent($codeblock));
$codeblock = _Detab($codeblock);
$codeblock =~ s/\A\n+//; # trim leading newlines
$codeblock =~ s/\s+\z//; # trim trailing whitespace
$result = "\n\n<pre><code>" . $codeblock . "\n</code></pre>\n\n";
$result;
}egmx;
return $text;
}
sub _EncodeCode {
#
# Encode/escape certain characters inside Markdown code runs.
# The point is that in code, these characters are literals,
# and lose their special Markdown meanings.
#
local $_ = shift;
# Encode all ampersands; HTML entities are not
# entities within a Markdown code span.
s/&/&amp;/g;
# Encode $'s, but only if we're running under Blosxom.
# (Blosxom interpolates Perl variables in article bodies.)
{
no warnings 'once';
if (defined($blosxom::version)) {
s/\$/&#036;/g;
}
}
# Do the angle bracket song and dance:
s! < !&lt;!gx;
s! > !&gt;!gx;
# Now, escape characters that are magic in Markdown:
s! \* !$g_escape_table{'*'}!gx;
s! _ !$g_escape_table{'_'}!gx;
s! { !$g_escape_table{'{'}!gx;
s! } !$g_escape_table{'}'}!gx;
s! \[ !$g_escape_table{'['}!gx;
s! \] !$g_escape_table{']'}!gx;
s! \\ !$g_escape_table{'\\'}!gx;
return $_;
}
sub _DoItalicsAndBold {
my $text = shift;
# must go first:
$text =~ s{ (\*\*|__) (?=\S) (.+?[*_]*) (?<=\S) \1 }
{<strong>$2</strong>}gsx;
$text =~ s{ (\*|_) (?=\S) (.+?) (?<=\S) \1 }
{<em>$2</em>}gsx;
return $text;
}
sub _DoBlockQuotes {
my $text = shift;
$text =~ s{
( # Wrap whole match in $1
(
^[ \t]*>[ \t]? # '>' at the start of a line
.+\n # rest of the first line
(.+\n)* # subsequent consecutive lines
\n* # blanks
)+
)
}{
my $bq = $1;
$bq =~ s/^[ \t]*>[ \t]?//gm; # trim one level of quoting
$bq =~ s/^[ \t]+$//mg; # trim whitespace-only lines
$bq = _RunBlockGamut($bq); # recurse
$bq =~ s/^/ /g;
# These leading spaces screw with <pre> content, so we need to fix that:
$bq =~ s{
(\s*<pre>.+?</pre>)
}{
my $pre = $1;
$pre =~ s/^ //mg;
$pre;
}egsx;
"<blockquote>\n$bq\n</blockquote>\n\n";
}egmx;
return $text;
}
sub _DoCodeSpans {
#
# * Backtick quotes are used for <code></code> spans.
#
# * You can use multiple backticks as the delimiters if you want to
# include literal backticks in the code span. So, this input:
#
# Just type ``foo `bar` baz`` at the prompt.
#
# Will translate to:
#
# <p>Just type <code>foo `bar` baz</code> at the prompt.</p>
#
my $text = shift;
$text =~ s@
(`+) # $1 = Opening run of `
(.+?) # $2 = The code block
(?<!`)
\1 # Matching closer
(?!`)
@
my $c = "$2";
$c =~ s/^[ \t]*//g; # leading whitespace
$c =~ s/[ \t]*$//g; # trailing whitespace
$c = _EncodeCode($c);
"<code>$c</code>";
@egsx;
return $text;
}
sub _FormParagraphs {
#
# Params:
# $text - string to process with html <p> tags
as well).
For more information about Markdown's syntax, see:
http://daringfireball.net/projects/markdown/
=head1 OPTIONS
Use "--" to end switch parsing. For example, to open a file named "-z", use:
Markdown.pl -- -z
=over 4
=item B<--html4tags>
Use HTML 4 style for empty element tags, e.g.:

    <br>

instead of Markdown's default XHTML style tags, e.g.:

    <br />

=item B<-v>, B<--version>
Display Markdown's version number and copyright information.
=item B<-s>, B<--shortversion>
Display the short-form version number.
=back
=head1 BUGS
To file bug reports or feature requests (other than topics listed in the
Caveats section above) please send email to:
support@daringfireball.net
Please include with your report: (1) the example input; (2) the output
you expected; (3) the output Markdown actually produced.
=head1 VERSION HISTORY
See the readme file for detailed release notes for this version.
1.0.1 - 14 Dec 2004
1.0 - 28 Aug 2004
=head1 AUTHOR
John Gruber
http://daringfireball.net
PHP port and other contributions by Michel Fortin
http://michelf.com
=head1 COPYRIGHT AND LICENSE
Copyright (c) 2003-2004 John Gruber