edbrowse-3.7.2/.gitignore
*.bak
*~
temp*

edbrowse-3.7.2/CHANGES
Here are some changes introduced by recent versions of edbrowse.

3.7.2:
Keep third party open source javascript routines in a separate file third.js. Licenses are included in that file.
Deminimize javascript, so line numbers in error messages actually convey useful information. The demin command toggles this feature. Third party software, in third.js, performs the deminimization.
Disable javascript timers via the timers command, only for debugging.
Compile some js functions once in the master window, and reference them from all the other windows. This saves time and space.
If an argument is tags and css files and apply them to the corresponding javascript objects.
Implement getComputedStyle().

3.7.0:
Switch from Mozilla js to Duktape js.
Mask password fields on input forms with stars, as other browsers do. Issue the ipass command to enter a password without echo.
Curl authorization negotiations enabled or disabled via the can toggle command. This is a workaround for problems with NTLM.
Parallel instances of edbrowse don't clobber each other's cookies when they exit and write the common cookie jar. See mergeCookies() in cookies.c.
curl does not become active until you need it.
Environment variable JSGC forces duktape garbage collection after every script. Environment variable JS1 keeps edbrowse and js in one process. These are for development and testing, and could go away.
Clean up compiler warnings so we can use the -Wall flag.

3.6.3:
Maintain a cache of http files.
Provides faster access to websites that are visited often. Cache directory and size can be set in the config file, but the defaults are reasonable.
Use a substring of the url to determine a mime type, not just the suffix. This is primarily for youtube videos, which can play as a stream, but have no obvious protocol or suffix to key on.
urlmatch = youtube.com/watch?|youtube.com/embed/
Currently a substring, may become a regexp later.
g? prints the url for the link instead of going to it. You can look before you leap. Also g2? g$? etc.
i* on a textarea goes to that editing session, in this case equivalent to e7. This is convenient for scripting, since you don't know ahead of time which buffer will be allocated for this purpose.
The M (move) command does not require a destination session; edbrowse finds an empty session for you.
Represent multiple frames in one edbrowse window. Each has its own javascript world. At this point the worlds do not interact. You can expand and contract frames by the exp and ctr commands.

3.6.2:
Read and convert utf16 and utf32 as per the byte order mark. Convert such files to utf8 or iso8859-1 as per the console setting, though the latter is deprecated. Convert back only if writing back out to the same file.
Enter high unicodes by the string ~u....; where the dots are hex digits and the semi is optional.
New "buffer list" command (bflist) to get a list of buffers and filenames.
Read the attachment filename from the http content-disposition header and set the current filename accordingly. The url could be replaced with foobar.zip, but that's what the web designer wanted.
If a file has a recognized suffix, with a stream plugin, then typing g on a link within that file invokes the same plugin. This is primarily used for .pls files, which are playlists, and each URL therein is intended for a music player. Other browsers seem to work this way.

3.6.1:
Merge edbrowse and edbrowse-js back into one executable image.
More convenient for distribution. The --mode argument determines the action of the process. Example edbrowse --mode js to run as the js engine.
Simple implementation of xhr under javascript, synchronous only.
Don't encode urls that have already been encoded via . The second encoding is mostly harmless, except for + becoming %2b.
Turn on or off the progress dots during an http or ftp download, or receive progress counts by megabyte.
Create the edbrowse temp dir at the outset, and user directories beneath this directory, mod 700, for added security in a multiuser system. Temp files for plugins are created beneath the user directories.
Reload the config file on command. No need to exit and restart.
~0 in an edbrowse function is the whole line, even if more than 9 arguments.
db>filename to redirect debugging output to a file.

3.6.0.1:
Bug fixes. Most importantly, fixed a buffer overflow in sprintf.

3.6.0:
Edbrowse is an imap client. Scan and search through folders, delete move or download emails, pull down attachments.
Use the tidy5 library to parse html and create a tree of nodes. Render the text buffer based on this tree of nodes, rather than the original html text. Rerender the tree after it has been changed by javascript, or via the rr command, and report any differences, i.e. what has javascript changed?
Implement javascript timers and intervals. These run asynchronously in the background.
Various ls commands in directory mode print the size, mod time, and permissions of the file on the current line. Set ls=lt to list the length and time of all the files in subsequent directory scans. See documentation for more.
Port edbrowse to windows, with small modifications.
Set up cmake scripts so that cmake can be used to build edbrowse on windows or on linux. Traditional make is still available from the src directory.
Use cmake to build edbrowse under MacPorts, thus available under OSX.
Enhance the DOM sufficient to compile a jQuery object <= 1.9.1.
This is the first step along the path to full jQuery support.
Add an interactive javascript / DOM debugger. Type jdb to enter this debugging mode, and period to exit.
Create a default .ebrc file if none is found. This is just a template, the user is encouraged to personalize the file. The default .ebrc file is in the user's language for supported languages.
Move all the language files, (ebrc files and message strings), to per-language files in the lang directory. Perl programs map these into strings in the C source.
Support http only cookies.

3.5.4.2:
Limited and preliminary imap access. Envelopes only. Messages can be moved or deleted, but not read or downloaded.
We no longer downgrade to SSL v3 on failure to use newer versions of SSL.
Edbrowse now warns if you try to quit with a modified buffer that has no associated filename. This is consistent with ed and most other editors.

3.5.4.1:
Fix a couple of bugs related to downloading files from the internet.

3.5.4:
Messages in German, thanks to Sebastian Humenda.
Autoplay of audio files found on websites, using content-type, and autoplay of audio files from directory mode.
Use a plugin to convert pdf to html, or any other conversion you wish. Autoconvert such files as you encounter them via the g command.
Directory listing sorted by locale, like /bin/ls.
Automatically include references when replying to an email, re or rea commands, so it threads properly.

3.5.3:
Write a separate process, edbrowse-js, to handle all the javascript objects. This process and only this process interacts with the js library, be it mozilla or v8 or whatever. Edbrowse implements the document object model at a higher level, and communicates with edbrowse-js for the corresponding javascript objects.
Allow users to download large binary files in the background, and straight to disk. Useful for computers with limited memory but plenty of disk.

3.5.2:
The blacklist feature is now gone.
It wasn't really used, as there are more effective ways to fight spam these days. Also, there was the possibility that reading an empty blacklist file could lead to a crash.
This release contains a few additional minor bugfixes, the most significant of which involved the rendering of preformatted sections when browsing html.

3.5.1:
Mozilla javascript version 2.4 and above supports only a C++ interface, so if we want to keep using moz js, then we must follow along. Edbrowse 3.5.1 converts the javascript layer from C to C++. These are the files jsdom.cpp, jsloc.cpp, and html.cpp (used to be .c). Other files may convert to C++ in the future.
Use the curl library to send and receive mail. This replaces home-grown pop3 and smtp software.

3.4.10:
Polish translations, courtesy of Wojciech Gac.

3.4.9:
Various bug fixes.

3.4.8:
* Edbrowse now requires version 1.8.5 (or higher) of Spidermonkey.
* When completing filenames with readline, a trailing space is no longer added.
* Updated French translation of the User's Guide, thanks to Erwin Bliesenick.
* Edbrowse now supports localized HTTP responses; see the User's Guide.
* In the Edbrowse scripting language, function names are now case-insensitive.

3.4.7:
memcpy and strcpy are no longer called on overlapping regions.
Files with unknown length, such as those under /proc, are now readable.
Miscellaneous fixes.

3.4.6:
Fix file corruption bug for large files with more than a million lines.

3.4.5:
Dot stuffing in emails.
Support for readline() on input.
Support for proxies through .ebrc or the environment.

3.4.4:
Fixed a cookie bug; tail matching never took place. Thus a cookie would never propagate to a subdomain. Bad news.

3.4.3:
Hotmail smtp protocol.
outport = ^587
Minor tweaks for compilation under OS X.

3.4.1:
Access to databases through odbc. Modify rows in a table by using the edit commands you already know. Be careful; delete means delete!

3.3.4:
Convert between iso8859-1 and utf8 on the fly, according to the contents of the file and the value of $LANG. This takes place automatically as files are read and written; the user shouldn't notice a thing.

3.3.3:
New reply feature, maintains the thread for discussion lists.
Move docs to a doc directory, and source to an src directory.
Fix some utf8 bugs.

3.3.2:
Supports reading of pdf files by calling the utility pdftohtml.
http://rpmfind.net/linux/RPM/suse/updates/10.0/i386/rpm/i586/pdftohtml-0.36-130.9.i586.html
Also brings in email over ssl. Secure smtp implies auth login; no other authentication method is implemented at this time.

3.3.1:
The error and output messages of edbrowse have been internationalized. Set LANG= to specify the language. At present, LANG=en and LANG=fr are supported. (English and French)

3.2.1:
This version introduces sql database access, through Informix esql (tested) and odbc (not tested). Access a table in the database just as you would access a file. Inserts, updates, and deletes are applied to the database, as they take place in your local buffer. It's almost wysiwyg. And it's dangerous. If you delete a row, there is no undo, so be careful.

3.1.3:
Edbrowse can now fetch and execute a local javascript file, as in
\n";
    # There's a lot of this going around.
    $prequote = 0;
    $prequote = 1 if $item =~ s/^\( *['"]//;
    return ' ' if $inscript and $prequote;
    if(substr($item, 1, 1) eq '/') {
        --$inscript if $inscript;
    } else {
        ++$inscript;
    }
    return $item;
} # processScript

sub backOverSpaces($) {
    my $trunc = shift;
    my $j = length($refbuf) - 1;
    --$j while $j >= 0 and substr($refbuf, $j, 1) =~ /[ \t]/;
    ++$j;
    substr($refbuf, $j) = "" if $trunc;
    return $j;
} # backOverSpaces

# Recompute space value, after the buffer has been cropped.
# 0 = word, 1 = spaces, 2 = newline, 3 = paragraph.
sub computeSpace() {
    return 3 if !
    length $refbuf;
    my $last = substr $refbuf, -1;
    return 0 if $last !~ /\s/;
    return 1 if $last ne "\n";
    return 2 if substr($refbuf, -2) ne "\n\n";
    return 3;
} # computeSpace

# Here are the common keywords for mail header lines.
# These are in alphabetical order, so you can stick more in as you find them.
# The more words we have, the more accurate the test.
# Value = 1 means it might be just a "NextPart" mime header,
# rather than a full-blown email header.
# Value = 2 means it could be part of an English form.
# Value = 4 means it's almost certainly a line in a mail header.
%mhWords = (
    "action" => 2,
    "arrival-date" => 4,
    "content-transfer-encoding" => 1,
    "content-type" => 1,
    "date" => 2,
    "delivered-to" => 4,
    "errors-to" => 4,
    "final-recipient" => 4,
    "from" => 2,
    "importance" => 4,
    "last-attempt-date" => 4,
    "list-id" => 4,
    "mailing-list" => 4,
    "message-id" => 4,
    "mime-version" => 4,
    "precedence" => 4,
    "received" => 4,
    "remote-mta" => 4,
    "reply-to" => 4,
    "reporting-mta" => 4,
    "return-path" => 4,
    "sender" => 4,
    "status" => 2,
    "subject" => 4,
    "to" => 2,
    "x-beenthere" => 4,
    "x-loop" => 4,
    "x-mailer" => 4,
    "x-mailman-version" => 4,
    "x-mimeole" => 4,
    "x-ms-tnef-correlator" => 4,
    "x-msmail-priority" => 4,
    "x-priority" => 4,
    "x-uidl" => 4,
);

# Get a filename from the user.
sub getFileName($$) {
    my $startName = shift;
    my $isnew = shift;
    input: {
        print "Filename: ";
        print "[$startName] " if defined $startName;
        my $line = <STDIN>;
        exit 0 unless defined $line;
        stripWhite \$line;
        if($line eq "") {
            redo input if ! defined $startName;
            $line = $startName;
        } else {
            $startName = undef;
            $line = envLine $line;
            print("$errorMsg\n"), redo input if length $errorMsg;
        } # blank line
        if($isnew and -e $line) {
            print "Sorry, file $line already exists.\n";
            $startName = undef;
            redo input;
        }
        return $line;
    }
} # getFileName

# Get a character from the tty, raw mode.
# For some reason hitting ^c in this routine doesn't leave the tty
# screwed up. I don't know why not.
sub userChar {
    my $choices = shift;
    input: {
        # Too bad there isn't a perl in-built for this.
        # I don't know how to do this in Windows. Help anybody?
        system "stty", "-icanon", "-echo";
        my $c = getc;
        system "stty", "icanon", "echo";
        if(defined $choices and index($choices, $c) < 0) {
            STDOUT->autoflush(1);
            print "\a\b";
            STDOUT->autoflush(0);
            redo input;
        }
        return $c;
    }
} # userChar

# Encode html page or mail message.
# No args, the html is stored in @text, as indicated by $map.
sub render() {
    $dol or $errorMsg = "empty file", return 0;
    $errorMsg = "binary file", return 0 if $fmode&$binmode;
    $errorMsg = "cannot render a directory", return 0 if $fmode&$dirmode;
    my ($i, $j, $k, $rc);
    my $type = "";
    $btags[$context] = $btags = [];
    $$btags[0] = {tag => "special", fw => {} };
    # If it starts with html, head, or comment, we'll call it html.
    my $tbuf = fetchLine 1, 0;
    if($tbuf =~ /^\s*<(?:!-|html|head|meta)/i) {
        $type = "html";
    }
    if(! length $type) {
        # Check for mail header.
        # There might be html tags inside the mail message, so we need to
        # look for mail headers first.
        # This is a very simple test - hopefully not too simple.
        # The first 20 non-indented lines have to look like mail header lines,
        # with at least half the keywords recognized.
        $j = $k = 0;
        for $i (1..$dol) {
            my $line = fetchLine $i, 0;
            last unless length $line;
            next if $line =~ /^[ \t]/; # indented
            ++$j;
            next unless $line =~ /^([\w-]+):/;
            my $word = lc $1;
            my $v = $mhWords{$word};
            ++$k if $v;
            if($k >= 4 and $k*2 >= $j) {
                $type = "mail";
                last;
            }
            last if $j > 20;
        }
    }
    if($type ne "mail") {
        # Put the lines together into one long string.
        # This is necessary to check for, and render, html.
        $tbuf .= "\n";
        $tbuf .= fetchLine($_, 0) . "\n" foreach (2..$dol);
    }
    if(! length $type) {
        # Count the simple html tags, we need at least two per kilobyte.
        $i = length $tbuf;
        $j = $tbuf =~ s/(<\/?[a-zA-Z]{1,7}\d?[>\s])/$1/g;
        $j = 0 if $j eq "";
        $type = "html" if $j * 500 >= $i;
    }
    if(!
    length $type) {
        $errorMsg = "this doesn't look like browsable text";
        return 0;
    }
    $badHtml = 0;
    $badHtml = 1 if is_url($fname);
    $rc = renderMail(\$tbuf) if $type eq "mail";
    $rc = renderHtml(\$tbuf) if $type eq "html";
    return 0 unless $rc;
    pushRenderedText(\$tbuf) or return 0;
    if($type eq "mail") {
        $fmode &= ~$browsemode; # so I can run the next command
        evaluate(",bl");
        $errorMsg = "";
        $dot = $dol;
        $fmode &= ~$changemode;
        $fmode |= $browsemode;
    }
    apparentSize();
    $tbuf = undef;
    if($type eq "mail" and $nat) {
        print "$nat attachments.\n";
        $j = 0;
        foreach $curPart (@mimeParts) {
            next unless $$curPart{isattach};
            ++$j;
            print "Attachment $j\n";
            my $filename = getFileName($$curPart{filename}, 1);
            next if $filename eq "x";
            if($filename eq "e") {
                print "session " . (cxCreate(\$$curPart{data}, $$curPart{filename})+1) . "\n";
                next;
            }
            if(open FH, ">$filename") {
                binmode FH, ':raw' if $doslike;
                print FH $$curPart{data} or dieq "Cannot write to attachment file $filename, $!.";
                close FH;
            } else {
                print "Cannot create attachment file $filename.\n";
            }
        } # loop over attachments
        print "attachments complete.\n";
    } # attachments present
    return 1;
} # render

# Pass the reformatted text, without its last newline.
sub pushRenderedText($) {
    my $tbuf = shift;
    # Replace common nonascii symbols
    # I don't know what this pair of bytes is for!
    $$tbuf =~ s/\xe2\x81//g;
    # Transliterate alternate forms of quote, apostrophe, etc.
    # We replace escape too, cuz it shouldn't be there anyways, and it messes up
    # some terminals, and some adapters.
    # Warning!! Don't change anything in the range \x80-\x8f.
    # These codes are for internal use, and must carry through.
    $$tbuf =~ y/\x1b\x95\x99\x9c\x9d\x92\x93\x94\xa0\xad\x96\x97/_*'`''`' \55\55\55/;
    # Sometimes the bullet list indicator is falsely separated from the subsequent text.
    $$tbuf =~ s/\n\n\*\n\n/\n\n* /g;
    # Turn nonascii math symbols into our encoded versions of math symbols,
    # to be handled like Greek letters etc, in a consistent manner,
    # by the next block of code.
    $$tbuf =~ s/\xb0/\x82176#/; # degrees
    $$tbuf =~ s/\xbc/\x82188#/; # 1 fourth
    $$tbuf =~ s/\xbd/\x82189#/; # 1 half
    $$tbuf =~ s/\xbe/\x82190#/; # 3 fourths
    $$tbuf =~ s/\xd7/\x82215#/; # times
    $$tbuf =~ s/\xf7/\x82247#/; # divided by
    if($$tbuf =~ /\x82\d+#/) { # we have codes to expand.
        # These symbols are going to become words -
        # put spaces on either side, if the neighbors are also words.
        $$tbuf =~ s/#\x82/# \x82/g;
        $$tbuf =~ s/([a-zA-Z\d])(\x82\d+#)/$1 $2/g;
        $$tbuf =~ s/(\x82\d+#)([a-zA-Z\d])/$1 $2/g;
        $$tbuf =~ s/\x82(\d+)#/$symbolWord{$1}/ge;
    }
    # Now push into lines, for the editor.
    my $j = $#text;
    if(length $$tbuf) {
        push @text, split "\n", $$tbuf, -1;
    } else {
        push @text, "";
    }
    $#text = $j, return 0 if lineLimit 0;
    $$btags[0]{map1} = $map;
    $$btags[0]{dot} = $dot;
    $$btags[0]{dol1} = $dol;
    $dot = $dol = $#text - $j;
    $$btags[0]{dol2} = $dol;
    $map = $lnspace;
    $map .= sprintf($lnformat, $j) while ++$j <= $#text;
    $$btags[0]{map2} = $map;
    $fmode &= ~$firstopmode;
    $$btags[0]{fname} = $fname;
    $$btags[0]{fmode} = $fmode;
    $$btags[0]{labels} = $labels;
    $fmode &= $changemode; # only the change bit retains its significance
    $fmode |= $browsemode;
    $labels = $lnspace x 26;
    $fname .= ".browse" if length $fname;
    return 1;
} # pushRenderedText

# Pass in the text to be rendered, by reference.
# The text is *replaced* with the rendered text.
sub renderHtml($) {
    my $tbuf = shift;
    my ($i, $j, $ofs1, $ofs2, $h); # variables
    $baseref = $fname;
    # Ok, here's a real kludge.
    # The utility that converts pdf to html,
    # access.adobe.com/simple_form.html, has a few quirks.
    # One of the common problems in the translation is
    # the following meaningless string, that appears over and over again.
    # I'm removing it here.
    $$tbuf =~ s/Had\strouble\sresolving\sdest\snear\sword\s(<[\w_;:\/'"().,-]+>\s)?action\stype\sis\sGoToR?//g;
    # I don't expect any overstrikes, but just in case ...
    $$tbuf =~ s/[^<>"'&\n]\10//g;
    # Get rid of any other backspaces.
    $$tbuf =~ y/\10/ /;
    # Make sure there aren't any \x80 characters to begin with.
    $$tbuf =~ y/\x80/\x81/;
    # As far as I can tell, href=// means href=http://
    # Is this documented anywhere??
    $$tbuf =~ s,\bhref=(["']?)//\b,HREF=$1http://,ig;
    # Find the simple window javascript functions
    $refbuf = "";
    $lineno = $colno = 1;
    $lspace = 3;
    javaFunctions($tbuf);
    # Before we do the standard tags, get rid of the tags.
    # I turn them into tags,
    # which will be disposed of later, along with all the
    # other unrecognized tags.
    # This is not a perfect implementation.
    # It will glom onto the ,
    # and it shouldn't; but neither should you be writing such a perverse string!
    $$tbuf =~ s///g;
    $bangtag = "";
    $$tbuf =~ s/(['"]|<(!-*)?|-*>)/processBangtag($1)/ge;
    print "comments stripped\n" if $debug >= 6;
    $errorMsg = $intMsg, return 0 if $intFlag;
    # A good web page encloses its javascript in comments ,
    # But some don't, and the (sometimes quoted) < > characters
    # really mess us up. Let's try to strip the javascript,
    # or any other script for that matter.
    $inscript = 0;
    $$tbuf =~ s/((?>(\( *['"])?<(\/?script[^>]*>)?|[>"']))/processScript($1)/gei;
    print "javascript stripped\n" if $debug >= 6;
    $errorMsg = $intMsg, return 0 if $intFlag;
    # I'm about to crack html tags with one regexp,
    # and that would be entirely doable, if people and web tools didn't
    # generate crappy html.
    # The biggest problem is unbalanced quotes, whence the open quote
    # swallows the rest of the document in one tag.
    # I'm going to *try*, emphasis on try, to develop a few heuristics
    # that will detect some of the common misquotings.
    # This stuff should be written in C, a complex procedural algorithm.
    # But I don't have the time or inclination to translate this mess into C,
    # and perl is not the right language to write an algorithm like that.
    # I've seen examples of all of these syntactical nightmares on the web,
    # and others that I can't possibly code around.
    # Only one quote in the tag; get rid of it. Tag is on one line.
    $$tbuf =~ s/<(\/?[a-zA-Z][^<>'"]*)['"]([^<>'"]*)>/<$1$2>/g;
    # Two quotes before the last >, but not ="">, which would be ok.
    $$tbuf =~ s/([^= <>])"">/$1">/g;
    $$tbuf =~ s/([^= <>])''>/$1'>/g;
    # Missing quote before the last > "word>
    # It's usually the last > where things screw up.
    $$tbuf =~ s/["'](\w+)>/$1>/g;
    # &nbsp; is supposed to have a semi after it - it often doesn't.
    $$tbuf =~ s/&nbsp$/&nbsp;/gi;
    $$tbuf =~ s/&nbsp([^;])/&nbsp;$1/gi;
    # Well that's all I can manage right now.
    # Encode number characters.
    # This is kludgy as hell, but I want to be able to read my own math pages.
    $$tbuf =~ s/ *([a-zA-Z]|&#\d+;) *<\/font>/metaSymbol($1)/gei;
    # Now let's encode the tags.
    # Thanks to perl, we can do it in one regexp.
    $$tbuf =~ s/<    # start the tag
        (\/?)    # leading slash turns off the tag
        ([a-zA-Z]+)    # name of the tag
        (    # remember the attributes
        (?>    # fix each subexpression as you find it
        [^>"']+    # unquoted stuff inside the tag
        |
        "[^"]*"    # stuff in double quotes
        |
        '[^']*'    # stuff in single quotes
        )*    # as many of these chunks as you need
        )    # return the set in $3
        >    # close the html tag
        /processTag($2, $1, $3)/xsge;
    print "tags encoded\n" if $debug >= 6;
    $errorMsg = $intMsg, return 0 if $intFlag;
    # Now we can crunch the meta chars without fear.
    $$tbuf =~ s/&([a-zA-Z]+|#\d+);/metaChar($1)/ge;
    print "meta chars translated\n" if $debug >= 6;
    $onloadSubmit = 0;
    $longcut = $lperiod = $lcomma = $lright = $lany = 0;
    my @olcount = (); # Where are we in each nested number list?
    my @dlcount = (); # definition lists
    my $tagnest = "."; # Stack the nestable html tags, such as
    my $tagLock = 0; # other tags are locked out, semantically, until this one is done
    my $tagStart; # location of the tag currently under lock
    # Locking tags are currently: title, select, textarea
    my $inhref = 0; # in anchor reference
    my $intitle = 0;
    my $inselect = 0;
    my $inta = 0; # text area
    my $optStart; # start location of option
    my $opt; # hash of options
    my $optCount; # count of options
    my $optSel; # options selected
    my $optSize; # size of longest option
    my $lastopt; # last option, waiting for next
\n\n\n"; $$bufref =~ s/