canto-0.7.10/COPYING

GNU GENERAL PUBLIC LICENSE
Version 2, June 1991

Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.

GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License.
The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. 
In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. 
You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. 
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. 
<signature of Ty Coon>, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License.

canto-0.7.10/ChangeLog

Canto 0.7.10 ChangeLog
* Add canto-inspect.1 manpage
* Fix other manpages
* Fix some 2.6 incompatible abuse
* Add workaround for bad feed data caused by switching between the system feedparser and the builtin.

Canto 0.7.9 ChangeLog
* Fix fresh install sans system feedparser
* Make canto-inspect use builtin feedparser

Canto 0.7.8 ChangeLog
* Fix feed exception encoding problem.
* Fix occasional zombies / extra pids floating around.
* Import feedparser into source tree.
* Add -s/--sysfp flag to canto-fetch to fall back on system feedparser
* Render improvements by honoring declared content types

Canto 0.7.7 ChangeLog
* Fix harmless widecurse.c warning
* Fix going to locale incompatible URLs
* Fix Python 2.6.5 weirdness
* Fix enclosure parsing/display exception
* Fix set_{filter,tag_filter,sort}(None)
* Improved config validation for gui options
* Work harder to maintain selections through updates
* Make cursor behavior more flexible
* Change default cursor behavior
* Documentation tweaks (thanks acoolon)

Canto 0.7.6 ChangeLog
* Fix non-consecutive sorts
* Fix multi-tag sorts
* Fix change_feed
* Fix possible get_text exception
* Replace multiprocessing functionality
* Worker process initiated on demand
* Add "restart" keybind (|)
* Add wget_link reader_key
* Add highlight_word drawing hook
* Add "setup.py uninstall" command
* Minor c-f cleanup

Canto 0.7.5 ChangeLog
* Fix some reader inconsistencies with reader keys moving selections
* Wrap some harmless, rare curses exceptions
* Workaround messed up SIGCHLD handling in multiprocessing
* Doc updates

Canto 0.7.4 ChangeLog
* Correct some overlooked 0.7.3 stuff =P

Canto 0.7.3 ChangeLog
* Fix various update issues on long-running clients
* Fix sluggish reader link toggling
* Fix worker signals (and ^C as a side-effect)
* Fix reader n/p keys not setting items read
* Fix double quotes in programmatically added main tags
* Fix shadows on horrendously broken feeds
* Fix all-filtered stub
* Refix hard-filters (??)
* Minor cleanups
* Documentation clarifications

Canto 0.7.2 ChangeLog
* Fix some precache troubles with aggregate filters / reverse
* Restore feed-order without sort
* Startup cleanups

Canto 0.7.1 ChangeLog
* Fix hard (feed) filters
* Fix keyword escaping for non-regex searches
* Fix items with totally undefined titles
* Fix fetchlog header from arg refactor
* Ignore some exceptions caused by multiprocessing
* Minor doc tweaks

Canto 0.7.0 ChangeLog
* Convert to multiprocessing worker slave (huge performance)
* Vast memory improvement (esp. for large lists)
* Large scale refactor of *lots* of code
* Better code documentation
* Better site documentation
* Partially validating configuration code
* Partial test framework (to be added to as bugs arise)
* Added configurable update triggers
* Added `never_discard()` to keep certain items indefinitely.
* Added SIGUSR2 signal to output debug backtrace.
* Added state_change_hook
* Added `canto-inspect`, a simple wrapper for examining feed internals
* Added no-content stub for unfetchable feeds to avoid trying to update broken URLs repeatedly.
* Added `add_info` extra function for adding content to the reader
* new_hook now enforced by canto-fetch
* Ignore keep settings lower than the number of items in a feed
* tags variable now implicitly set to sane default
* "reader" keybind now doesn't set item read (coupled with "just_read" for default)
* Filters and Sorts now all subclass Filter and Sort class
* Accept `conf.py` as well as `conf` for config name.
* Fix double enforcement of rates (client more responsive to fetch updates)
* Fix blank titles
* Fix runhere.sh killing c-f

Canto 0.6.13 ChangeLog
* Fix drawing regression

Canto 0.6.12 ChangeLog
* Fix tag crash
* Fix strange character weirdness
* Fix/Improve HTML parser
* Try UTF-8 for config before chardet

Canto 0.6.11 ChangeLog
* Fix OPML handling for OPML without text attribute
* Exception clean ups
* Doc clean ups
* canto.extra additions

Canto 0.6.10 ChangeLog
* Make HTML parser more resistant to broken HTML
* Fix minor exception in c-f thread
* Finally make exceptions play nice with ncurses

Canto 0.6.9 ChangeLog
* Fix setup.py generating null bytes in const.py
* Add 30 second timeout to canto-fetch
* Make selection data persist for hooks / filters
* Unset signals before exit (avoid shell garbage)
* Set User-Agent to Canto/x.y.z (fixes some 403'd feeds)
* Fix multiple c-f subtle corruption bug
* Sync docs now that site runs out of git.

Canto 0.6.8 ChangeLog
* Fix set_tag_sort(None)
* Fix miscount of items when all filtered
* Reader keybinds now passed reader object (like Gui keybinds)
* Cache overused locale query (crazy speedup on that)
* Convert setup.py to pure python (no more sed scripts)

Canto 0.6.7 ChangeLog
* Make filter syntax uniform (add filters now work with canto/extra filters)
* Fix add_tag() without sorts= set (typo)

Canto 0.6.6 ChangeLog
* Make setup.py sed scripts BSD compatible
* Even basic print statements now obey locale.preferredencoding()
* All command-line input is unicode'd() (-n fix)

Canto 0.6.5 ChangeLog
* Fix multiple identical main tag weirdness
* Fix curses crash with TERM misset
* Fix unset default handlers
* Fix lock crash
* Fix small update problem
* Allow 256 color definitions

Canto 0.6.4 ChangeLog
* Re-fix Unicode Tags (incomplete fix last time)
* Detect / add encode declaration to conf (defaults to utf-8)

Canto 0.6.3 ChangeLog
* Fix Unicode Tags (damn you, `exec`)
* Fix imports with unknown encoding
* Fix locking issue causing items to reappear as unread

Canto 0.6.2 ChangeLog
* Browser improvements
* runhere.sh detects 64-bit now

Canto 0.6.1 ChangeLog
* Fix docs/manpage
* Fix add_tag troubles

Canto 0.6.0 ChangeLog
* Much improved reader output using HTMLParser
* New message bar, no more floating boxes
* Brand spanking new fine-grained locking in Canto and Canto-fetch
* Basic multi-threading in Canto-fetch
* New content handling (can now open images and enclosures with custom handlers based on extensions and link type)
  - Content can now be fetched to /tmp for programs too
* Support for Snownews/Liferea type execurl scripts
* Reader can now take a dedicated number of lines on the top, bottom, left, or right of the typical GUI
* Username/password support for feeds using Basic or Digest auth
* Add ; / : to jump skip through feeds by index
* -t flag to use with -r to set a tag on the command line
* New UTF-8 compatible InputBox for non-ASCII searching
* Everything (sorts, filters, tags) is directly set-able with a keybind
* Deprecate `add_feed`, `browser`, and `text_browser`
* Terminal output is now coerced to locale.preferredencoding(), fixing non-UTF locales being used. Internals now strictly Unicode
* Much more advanced usage of tags
* More correct and flexible drawing code
* The beginnings of a test-suite (still incomplete).

Canto 0.5.7 ChangeLog
* Add -r flag to add URL from the command line.
* Added save() example keybind to canto/extra.py
* Fix nasty text browser problems from 0.5.6

Canto 0.5.6 ChangeLog
* Fix OPML import
* Fix changing feed names immediately
* Two feeds with the same name are now merged
* Add -b (background flag) to canto-fetch

Canto 0.5.5 ChangeLog
* Allow add_feed() to be called without a tag (name)
* Add canto-fetch -d (daemon mode)
* Add runhere.sh script to run canto straight from source
* Add source_urls
* Fix OPML export output
* Fix Canto hang after help
* Cleanup C compile warning
* More rendering cleanups

Canto 0.5.4 ChangeLog
* Fix renderer overrides

Canto 0.5.3 ChangeLog
* Fixed browser zombies
* Fixed closing link enumeration
* Fixed missing -i/-o man page references
* Add ability to use any type of URL for feed (e.g. file://)

Canto 0.5.2 ChangeLog
* Added sorting
* Added OPML support
* Added change_feed
* addfeed() == add_feed()
* Added missing default_filterlist()
* Fixed bad keybind crashes

Canto 0.5.1 ChangeLog
* Fixed progressing memory leak
* Hooks/filters are now wrapped in an exception logging wrapper.
* noitem_unsafe decorator applied to all Gui() functions that can't be used without any items
* Fixed uninitialized variable causing crash.

Canto 0.5.0 ChangeLog
* New dependency on feedparser and chardet.
* Global and per-feed filters implemented.
* Hooks, to call code on events (like new items, changing selection, etc.)
* Entirely changed, much less fragile on-disk format.
* Canto and canto-fetch are now a single, multi-call binary
* Canto-fetch no longer has its own config
* More useful keybinds, support lists of actions and arbitrary functions.
* Per-feed renderers can be configured.
* Story items now include all content, rather than just title/link/description (pretty useful for neat per-feed renderers, thanks to feedparser)
* Reader is now prettier, and more correct.
* Canto.extra is provided for nice helper functions for your config.
* Code documentation is much improved
* Code organization is more logical
* Proper locking between multiple instances of canto/canto-fetch.
* Entirely backwards compatible with 0.4.0 configs

Canto 0.4.8 ChangeLog
* Fix minor search problem
* Move title_key logic into canto_fetch

Canto 0.4.7 ChangeLog
* Fix HTML entities in story titles.
* Prioritize read over unread stories on disk.
* Remove --delete, canto-fetch cleans up automatically.
* Add vim-like (j/k) scrolling default bindings.
* Theme cleanups, interpret more HTML tags.
* Add User-Agent request header, fixing feeds like Google News.
* Add next/prev_unread bindings (default to ./,)

Canto 0.4.6 ChangeLog
* More title_key fixups.
* Move entity parsing to client, for more correct handling.
* Add basic list handling to reader.

Canto 0.4.5 ChangeLog
* Fix html entity crash
* Improve reader internal link detection
* Fix title_key behavior
* Minor theme fix for collapsed feeds

Canto 0.4.4.1 ChangeLog
* Quick fix for conf.example generation/use.
Canto 0.4.4 ChangeLog
* Vast cleanups in renderer / renderer format / C code
* Added canto -l for listing
* Added canto -n for printing number of new items in feed
* Added canto -a for printing number of new items in all
* Added canto-fetch -V for verbose state printing
* Added canto-fetch -f to force update, regardless of timestamp
* Added set_collapse_all unset_collapse_all keybinds, eliminate toggle.
* Added title_key option to addfeed with default
* Make canto -u verbose
* Create and use a conf.example if no conf file found.
* Stop kludging paths with sed, start relying on os.system()
* Start forcing an update if all feeds are empty (first start?)
* Fix canto -D
* Fix drawing on very skinny terminals
* Fix feed / tag separation in cfg

Canto 0.4.3 ChangeLog
* `xterm -e canto` now works as planned (reported by Aldrik Dunbar and grunge)
* The canto/.conf is now encoded in memory to UTF-8 fixing embedded, non-ASCII characters (reported by Ricardo Martins)
* Feed names now have forward slashes stripped for disk storage (reported by Ricardo Martins)

canto-0.7.10/INSTALL

==INSTALL==

=REQUIREMENTS=
- Python >=2.5 (tested mostly on 2.5.4)
- NCursesw >=5.5 (yes, Unicode is a requirement)
- Feedparser, chardet

=BUILDING=
- Python development headers
- Ncurses development headers
- GCC

=INSTALL=
- `python setup.py install` will install the files to your root.
- `./runhere.sh` will install the files to the current directory and run canto without being installed.

canto-0.7.10/README

Any information you should require to get started is included in the man page (`man canto`), under the section "GETTING STARTED"

canto-0.7.10/bin/canto

#!/usr/bin/env python
import canto.main
import sys

if __name__ == "__main__" :
    c = canto.main.Main()
    while c.restart:
        c = canto.main.Main(c.cfg.stdscr)
    sys.exit(0)

canto-0.7.10/bin/canto-fetch

#!/usr/bin/env python
import canto.main

if __name__ == "__main__" :
    canto.main.Main()

canto-0.7.10/bin/canto-inspect

#!/usr/bin/env python
import canto.canto_inspect

if __name__ == "__main__" :
    canto.canto_inspect.main()

canto-0.7.10/canto/__init__.py

#

canto-0.7.10/canto/args.py

# -*- coding: utf-8 -*-

#Canto - ncurses RSS reader
# Copyright (C) 2008 Jack Miller
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

# This is probably the most straightforward file in the code base. It handles
# all of the argument parsing and interprets all of the common arguments
# between canto and canto-fetch.

# The one thing to note is that every option that is parsed with an actual
# argument must have that argument converted to unicode right off the bat.
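# A minimal illustrative sketch of that convention (an assumed example, simply
# mirroring the pattern parse_common_args() uses below): a raw byte-string
# argument from getopt is decoded with the detected terminal encoding `enc`
# before it is stored or compared, e.g.
#
#   conf_file = unicode(arg, enc, "ignore")
#
# so every path handed back to the caller is already a unicode object.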
from const import * import getopt import sys import os def print_canto_usage(): print "USAGE: canto [-hvulaniortDCLF]" print "--help -h This help." print "--version -v Print version info." print "--update -u Fetch updates before running." print "--list -l List configured feeds." print "--checkall -a Prints number of new items." print "--checknew -n [feed] Prints number of items that are new in feed." print "" print "--opml -o Convert conf to OPML and print to stdout." print "--import -i [path] Add feeds from OPML file to conf." print "--url -r [url] Add feed at URL to conf." print "--tag -t [tag] Set tag (for -r)" print "" print_common_usage() def print_fetch_usage(): print "USAGE: canto-fetch [-hvVfdbDCLF]" print "--help -h This help." print "--version -v Print version info." print "--verbose -V Print extra info while running." print "--force -f Force update, regardless of timeestamps." print "--daemon -d Run as a daemon." print "--background -b Background (implies -d)" print "--interval -i Update interval when run as a daemon" print "--sysfp -s Use system feedparser instead of builtin." print "" print_common_usage() def print_common_usage(): print "--dir -D [path] Set configuration directory. (~/.canto/)" print "--conf -C [path] Set configuration file. (~/.canto/conf)" print "--log -L [path] Set client log file. (~/.canto/log)" print "--fdir -F [path] Set feed directory. (~/.canto/feeds/)" print "--sdir -S [path] Set script directory (~/.canto/scripts/)" def parse_common_args(enc, extra_short, extra_long, iam="canto"): shortopts = 'D:C:L:F:S:' + extra_short longopts = ["dir=","conf=","log=","fdir=","sdir="] + extra_long try : optlist = getopt.getopt(sys.argv[1:],shortopts,longopts)[0] except getopt.GetoptError, e: print "Error: %s" % e.msg sys.exit(-1) for opt, arg in optlist: if opt in ["-D", "--dir"]: conf_dir = unicode(arg, enc, "ignore") break else: conf_dir = os.getenv("HOME") + "/.canto/" if conf_dir[-1] != '/' : conf_dir += '/' if not os.path.exists(conf_dir): os.mkdir(conf_dir) if iam == "canto": log_file = conf_dir + "log" else: log_file = conf_dir + "fetchlog" conf_file = conf_dir + "conf.py" feed_dir = conf_dir + "feeds/" script_dir = conf_dir + "scripts/" # Make sure that the {feed,script}_dir does, indeed, exist and is # actually a directory. for dir in [feed_dir, script_dir]: if not os.path.exists(dir): os.mkdir(dir) elif not os.path.isdir(dir): os.unlink(dir) os.mkdir(dir) for opt, arg in optlist : if opt in ["-C", "--conf"] : conf_file = unicode(arg, enc, "ignore") elif opt in ["-L","--log"] : log_file = unicode(arg, enc, "ignore") elif opt in ["-F","--fdir"] : feed_dir = unicode(arg, enc, "ignore") if feed_dir[-1] != '/' : feed_dir += '/' elif opt in ["-S","--sdir"] : script_dir = unicode(arg, enc, "ignore") if script_dir[-1] != '/' : script_dir += '/' elif opt in ["-h","--help"] : if iam == "canto": print_canto_usage() else: print_fetch_usage() sys.exit(0) elif opt in ["-v","--version"] : print "Canto v %s (%s)" % ("%d.%d.%d" % VERSION_TUPLE, GIT_SHA) sys.exit(0) return (conf_dir, log_file, conf_file, feed_dir, script_dir, optlist) canto-0.7.10/canto/basegui.py000066400000000000000000000024731142361563200160110ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
# BaseGui enforces that all gui objects have the same basic interface. For # BaseGui ever single function is at least partially overridden, but the Reader # class can take most of it at default value. from const import NOKEY class BaseGui: def __init__(self): self.keys = {} def draw_elements(): pass # Translate a key into a set of actions based on self.keys def key(self, k): if k in self.keys: if type(self.keys[k]) == list: return self.keys[k] return [self.keys[k]] else: return [] # Perform the action. If it's a string, attempt to look it up as an # attribute of the self object. If it's a callable, go ahead and call it. def action(self, a): if hasattr(a, "__call__"): r = a(self) else: f = getattr(self, a, None) if f: r = f() else: r = NOKEY if not r: self.draw_elements() return r canto-0.7.10/canto/canto_fetch.py000066400000000000000000000461261142361563200166520ustar00rootroot00000000000000#Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Canto-fetch is essentially a stand alone binary, it's only packaged with the # canto client source because they share configuration and canto-fetch can # conveniently fit into a single file without there being too much confusion. # There are three parts, roughly. # main() -> arg parsing and (if necessary) runs the daemon loop # run() -> spawns the appropriate threads # FetchThread -> performs the update for one feed # main is only used when canto-fetch is called from the command line. # run is used internally by canto when it needs to invoke an update. import feedparser_builtin feedparser = feedparser_builtin from const import VERSION_TUPLE, GIT_SHA from cfg.base import get_cfg import utility import args from threading import Thread import traceback import commands import urlparse import urllib2 import cPickle import locale import socket import signal import fcntl import time import sys import os def main(enc): conf_dir, log_file, conf_file, feed_dir, script_dir, optlist =\ args.parse_common_args(enc, "hvVfdbi:s", ["help","version","verbose","force","daemon",\ "background", "interval=", "sysfp"], "canto-fetch") try : cfg = get_cfg(conf_file, log_file, feed_dir, script_dir) cfg.parse() except : traceback.print_exc() sys.exit(-1) cfg.log("Canto-fetch v %s (%s)" % \ ("%d.%d.%d" % VERSION_TUPLE, GIT_SHA), "w") cfg.log("Time: %s" % time.asctime()) cfg.log("Config parsed successfully.") def log_func(x): if verbose: print x cfg.log(x) #Defaults updateInterval = 60 daemon = False background = False verbose = False force = False for opt, arg in optlist : if opt in ["-d","--daemon"]: daemon = True if opt in ["-b","--background"]: background = True daemon = True if opt in ["-i","--interval"]: try: arg = unicode(arg, enc, "ignore") i = int(arg) if i < 60: cfg.log("interval must be >= 60 (one minute)") else: updateInterval = i except: cfg.log("%s isn't a valid interval" % arg) else: cfg.log("interval = %d seconds" % updateInterval) if opt in ["-s","--sysfp"]: log_func("Using system feedparser") global feedparser try: import feedparser as feedparser_system feedparser = feedparser_system except: log_func("Import failed. Falling back on builtin.") if opt in ["-V","--verbose"]: verbose = True elif opt in ["-f","--force"]: force = True # Remove any crap out of the directory. This is mostly for # cleaning up when the user has removed a feed from the configuration. 
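# A minimal sketch of that internal use (an assumed call site, not shown in
# this file): canto is expected to hand run() an already-parsed config object
# obtained from canto.cfg.base.get_cfg(), e.g.
#
#   from canto import canto_fetch
#   canto_fetch.run(cfg, verbose=False, force=False)
#
# run() spawns one FetchThread per configured feed, joins them all, and
# returns 0 once the update pass completes.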
valid_names = [f.URL.replace("/"," ") for f in cfg.feeds] for file in os.listdir(cfg.feed_dir): if not file in valid_names: log_func("Deleted extraneous file: %s" % file) try: os.unlink(cfg.feed_dir + file) except: pass if background: # This is a pretty canonical way to do backgrounding. pid = os.fork() if not pid: # New terminal session os.setsid() os.chdir("/") os.umask(0) pid = os.fork() if pid: sys.exit(0) else: sys.exit(0) # Close all possible terminal output # file descriptors. os.close(0) os.close(1) os.close(2) if daemon: while 1: run(cfg, verbose, force) time.sleep(updateInterval) oldcfg = cfg try : cfg = get_cfg(conf_file, log_file, feed_dir, script_dir) self.cfg.parse() except: cfg = oldcfg else: sys.exit(run(cfg, verbose, force)) def run(cfg, verbose=False, force=False): # If we don't explicitly set this, feedparser/urllib will take *forever* to # give up on a connection. 30 is a pretty sane default, I think, considering # that 30 seconds is more than enough time to get your average 4-5k feed # even on a really poor connection. socket.setdefaulttimeout(30) threads = [] def log_func(x): if verbose: print x cfg.log(x) def imdone(): for thread in threads: thread.join() socket.setdefaulttimeout(None) log_func("Gracefully exiting Canto-fetch.") return 1 killme = lambda a, b: imdone() and sys.exit(0) signal.signal(signal.SIGTERM, killme) signal.signal(signal.SIGINT, killme) # The main canto-fetch loop. for fd in cfg.feeds: fpath = cfg.feed_dir + fd.URL.replace("/", " ") spath = cfg.script_dir threads.append(FetchThread(cfg, fd, fpath, spath, force, log_func)) threads[-1].start() imdone() return 0 class FetchThread(Thread): def __init__(self, cfg, fd, fpath, spath, force, log_func): Thread.__init__(self) self.fd = fd self.fpath = fpath self.spath = spath self.force = force self.cfg = cfg self.log_func = log_func self.prevtime = 0 # This emptyfeed forms a skeleton for any canto feed. # Canto_state is a place holder. Canto_update is the # last time the feed was updated, and canto_version is # obviously the version under which it was last written. # Canto_version will be used in the future, if later # releases change anything serious in the format. self.emptyfeed = {"canto_state":[], "entries":[], "canto_update":0, "canto_version": VERSION_TUPLE } # get_curfeed loads the old feed data from disk. It blocks getting the lock, # so it could take awhile, but should never fail if the information isn't # corrupted. def get_curfeed(self): curfeed = self.emptyfeed if os.path.exists(self.fpath): if os.path.isfile(self.fpath): f = open(self.fpath, "r") fcntl.flock(f.fileno(), fcntl.LOCK_SH) self.prevtime = os.stat(self.fpath).st_mtime try: curfeed = cPickle.load(f) except: self.log_func("cPickle load exception on %s" % self.fpath) finally: fcntl.flock(f.fileno(), fcntl.LOCK_UN) f.close() else: # The file doesn't exist yet, so we write a stub so that Canto # detects presence and doesn't endlessly try to refetch error'd # feeds if later on an error occurs. d = { u"title" : u"No content.", u"description" : u"There's no content in this feed. It's" + " possible that it hasn't been fetched yet or an error was" + " encountered. 
Check your fetchlog.", u"canto_state" : ["*"], u"id" : u"canto-internal" } curfeed["entries"].append(d) f = open(self.fpath, "w") try: fcntl.flock(f.fileno(), fcntl.LOCK_EX) cPickle.dump(curfeed, f) f.flush() except: pass finally: fcntl.flock(f.fileno(), fcntl.LOCK_UN) f.close() return curfeed def run(self): curfeed = self.get_curfeed() # Determine whether it's been long enough between # updates to warrant refetching the feed. if time.time() - curfeed["canto_update"] < self.fd.rate * 60 and\ not self.force: return # Attempt to set the tag, if unspecified, by grabbing # it out of the previously downloaded info. if not self.fd.base_set: if "feed" in curfeed and "title" in curfeed["feed"]: replace = lambda x: x or curfeed["feed"]["title"] self.fd.tags = [ replace(x) for x in self.fd.tags] self.fd.base_set = 1 self.log_func("Updating %s" % self.fd.tags[0]) else: # This is the first time we've gotten this URL, # so just use the URL since we don't know the title. self.log_func("New feed %s" % self.fd.URL) else: self.log_func("Updating %s" % self.fd.tags[0]) # This block set newfeed to a parsed feed. try: # Feed from script if self.fd.URL.startswith("script:"): script = self.spath + "/" + self.fd.URL[7:] out = commands.getoutput(script) newfeed = feedparser.parse(out) # Feed from URL else: request = urllib2.Request(self.fd.URL) request.add_header('User-Agent',\ "Canto/%d.%d.%d + http://codezen.org/canto" %\ VERSION_TUPLE) # Feed from URL w/ password if self.fd.username or self.fd.password: mgr = urllib2.HTTPPasswordMgrWithDefaultRealm() domain = urlparse.urlparse(self.fd.URL)[1] mgr.add_password(None, domain,\ self.fd.username, self.fd.password) # First, we try Basic Authentication auth = urllib2.HTTPBasicAuthHandler(mgr) opener = urllib2.build_opener(auth) try: newfeed = feedparser.parse(opener.open(request)) except: # And, failing that, try Digest Authentication auth = urllib2.HTTPDigestAuthHandler(mgr) opener = urllib2.build_opener(auth) newfeed = feedparser.parse(opener.open(request)) # Feed with no password. else: newfeed = feedparser.parse(\ feedparser.urllib2.urlopen(request)) except: # Generally an exception is a connection refusal, but in any # case we either won't get data or can't trust the data, so # just skip processing this feed. enc = locale.getpreferredencoding() self.log_func("Exception trying to get feed %s : %s" % \ (self.fd.URL.encode(enc, "ignore"), sys.exc_info()[1])) return # I don't know why feedparser doesn't actually throw this # since all URLErrors are basically unrecoverable. if "bozo_exception" in newfeed: if type(newfeed["bozo_exception"]) == urllib2.URLError: self.log_func(\ "Feedparser exception getting %s : %s, bailing." %\ (self.fd.URL, newfeed["bozo_exception"].reason)) return if not len(newfeed["entries"]): self.log_func(\ "Feedparser exception, no content in %s : %s, bailing." %\ (self.fd.URL, newfeed["bozo_exception"])) return # Filter out "No Content" message since we apparently have real content curfeed["entries"] = [ x for x in curfeed["entries"] if x["id"] !=\ "canto-internal"] # For new feeds whose base tag is still not set, attempt to get a title # again. if not self.fd.base_set: if "feed" not in newfeed or "title" not in newfeed["feed"]: self.log_func("Ugh. Defaulting to URL for tag. No guarantees.") newfeed["feed"]["title"] = self.fd.URL replace = lambda x: x or newfeed["feed"]["title"] self.fd.tags = [ replace(x) for x in self.fd.tags] # Feedparser returns a very nice dict of information. # if there was something wrong with the feed (usu. 
encodings # being mis-declared or missing tags), it sets # bozo_exception. # These exceptions are recoverable and their objects are # un-Picklable so we log it and remove the value. if "bozo_exception" in newfeed: self.log_func("Recoverable error in feed %s: %s" % (self.fd.tags[0], newfeed["bozo_exception"])) newfeed["bozo_exception"] = None # Make state persist between feeds. Currently, this is completely # unused, as there's no state information that needs to be propagated. # This is a relic from when feeds and tags were the same thing, however # it could be useful when doing integration with another client / # website and thus, hasn't been removed. newfeed["canto_state"] = curfeed["canto_state"] newfeed["canto_update"] = time.time() # We can set this here, without checking curfeed. # Any migration should be done in the get_curfeed function, # when the old data is first loaded. newfeed["canto_version"] = VERSION_TUPLE # For all content that we would usually use, we escape all of the # slashes and other potential escapes. def escape(s): s = s.replace("\\","\\\\") return s.replace("%", "\\%") for key in newfeed["feed"]: if type(newfeed["feed"][key]) in [unicode,str]: newfeed["feed"][key] = escape(newfeed["feed"][key]) for entry in newfeed["entries"]: for subitem in ["content","enclosures"]: if subitem in entry: for e in entry[subitem]: for k in e.keys(): if type(e[k]) in [unicode,str]: e[k] = escape(e[k]) for key in entry.keys(): if type(entry[key]) in [unicode,str]: entry[key] = escape(entry[key]) for entry in newfeed["entries"]: # If the item didn't come with a GUID, then # use link and then title as an identifier. if not "id" in entry: if "link" in entry: entry["id"] = entry["link"] elif "title" in entry: entry["id"] = entry["title"] else: entry["id"] = None # Then search through the current feed to # make item state persistent, and loop until # it's safe to update on disk. new = [] while 1: for entry in newfeed["entries"]: for centry in curfeed["entries"]: if entry["id"] == centry["id"]: entry["canto_state"] = centry["canto_state"] # The entry is removed so that later it's # not a candidate for being appended to the # end of the feed. curfeed["entries"].remove(centry) break else: new.append(entry) # Apply default state to genuinely new items. if "canto_state" not in entry: entry["canto_state"] = self.fd.tags + [u"*"] # Tailor the list to the correct number of items. In canto < 0.7.0, # you could specify a keep that was lower than the number of items # in the feed. This was simply done, but ultimately it caused too # much "bounce" for social news feeds. Items get put into the feed, # are upvoted enough to be within the first n items, you change # their state, they move out of the first n items, are forgotten, # then are upvoted again into the first n item and (as far as c-f # knows) are treated like brand new items. # This will still be a problem if items get taken out of the feed # and put back into the feed (and the item isn't in the extra kept # items), but then it becomes a site problem, not a reader problem. if self.fd.keep and len(newfeed["entries"]) < self.fd.keep: newfeed["entries"] += curfeed["entries"]\ [:self.fd.keep - len(newfeed["entries"])] # Enforce the "never_discard" setting # We iterate through the stories and then the tag so that # feed order is preserved. 
for e in curfeed["entries"]: for tag in self.cfg.never_discard: if tag == "unread": if "read" in e["canto_state"]: continue elif tag not in e["canto_state"]: continue if e not in newfeed["entries"]: newfeed["entries"].append(e) if self.cfg.new_hook: for entry in [e for e in new if e in newfeed["entries"]]: self.cfg.new_hook(newfeed, entry, entry == new[-1]) # Dump the output to the new file. # Locking and writing is counter-intuitive using fcntl. If you open # with "w" and fail to get the lock, the data is still deleted. The # solution is to open with "a", get the lock and then truncate the # file. f = open(self.fpath, "a") fcntl.flock(f.fileno(), fcntl.LOCK_EX) # The feed was modified out from under us. if self.prevtime and self.prevtime != os.stat(self.fpath).st_mtime: # Unlock. fcntl.flock(f.fileno(), fcntl.LOCK_UN) f.close() # Reread the state from disk. newer_curfeed = self.get_curfeed() # There was an actual c-f update done, bail. if newer_curfeed["canto_update"] != curfeed["canto_update"]: self.log_func("%s updated already, bailing" % self.fd.tags[0]) break # Just a state modification by the client, update and continue. else: curfeed = newer_curfeed continue # Truncate the file f.seek(0, 0) f.truncate() try: # Dump the feed item. It's important to flush afterwards to # avoid unlocking the file before all the IO is finished. cPickle.dump(newfeed, f) f.flush() except: self.log_func("cPickle dump exception on %s" % self.fpath) finally: # Unlock. fcntl.flock(f.fileno(), fcntl.LOCK_UN) f.close() # If we managed to write to disk, break out of the while loop and # the thread will exit. break canto-0.7.10/canto/canto_html.py000066400000000000000000000115031142361563200165140ustar00rootroot00000000000000#!/usr/bin/env python #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # This was inspired by Aaron Swartz's html2text, but doesn't do # file IO, doesn't do markdown, and doesn't shy away from Unicode. from handlers import LinkHandler, ImageHandler from HTMLParser import HTMLParser import htmlentitydefs import re class CantoHTML(HTMLParser): # Reset is used, instead of __init__ so a single # instance of the class can parse multiple HTML # fragments. def reset(self): HTMLParser.reset(self) self.result = "" self.list_stack = [] self.verbatim = 0 self.links = [] self.mime_handlers = [LinkHandler(),ImageHandler()] # unknown_* funnel all tags to handle_tag def handle_starttag(self, tag, attrs): self.handle_tag(tag, attrs, 1) def handle_endtag(self, tag): self.handle_tag(tag, {}, 0) def handle_data(self, text): if self.verbatim <= 0: text = text.replace(u"\n", u" ") for handler in self.mime_handlers: if handler.active: handler.content += text self.result += text # convert_* are called by SGMLParser's default # handle_char/entityref functions. def convert_charref(self, ref): try: if ref[0] in [u'x',u'X']: c = int(ref[1:], 16) else: c = int(ref) except: return u"[?]" return unichr(c) def handle_charref(self, ref): self.result += self.convert_charref(ref) def convert_entityref(self, ref): if ref in htmlentitydefs.name2codepoint: return unichr(htmlentitydefs.name2codepoint[ref]) return u"[?]" def handle_entityref(self, ref): self.result += self.convert_entityref(ref) # This is the real workhorse of the HTML parser. 
def handle_tag(self, tag, attrs, open): for handler in self.mime_handlers: output = handler.match(tag, attrs, open, self.links) if output: self.handle_data(output) if tag in [u"h" + unicode(x) for x in xrange(1,7)]: if open: self.result += u"\n%B" else: self.result += u"%b\n" if tag in [u"blockquote"]: if open: self.result += u"\n%Q" else: self.result += u"%q\n" elif tag in [u"pre",u"code"]: if open: if tag == u"pre": self.result += u"\n%Q" self.verbatim += 1 else: if tag == u"pre": self.result += u"%q\n" self.verbatim -= 1 elif tag in [u"sup"]: if open: self.result += u"^" elif tag in [u"p", u"br", u"div"]: self.result += u"\n" elif tag in [u"ul", u"ol"]: if open: self.result += u"\n%I" self.list_stack.append([tag,0]) else: # Grumble grumble. Bad HTML. if len(self.list_stack): self.list_stack.pop() self.result += u"%i\n" elif tag in [u"li"]: if open: self.result += u"\n" # List item with no start tag, default to ul if not len(self.list_stack): self.list_stack.append(["ul",0]) if self.list_stack[-1][0] == u"ul": self.result += u"\u25CF " else: self.list_stack[-1][1] += 1 self.result += unicode(self.list_stack[-1][1])+ ". " else: self.result += u"\n" elif tag in [u"i", u"small", u"em"]: if open: self.result += u"%6%B" else: self.result += u"%b%0" elif tag in [u"b", u"strong"]: if open: self.result += u"%B" else: self.result += u"%b" def ent_wrapper(self, match): return self.convert_entityref(match.groups()[0]) def char_wrapper(self, match): return self.convert_charref(match.groups()[0]) def convert(self, s): # We have this try except because under no circumstances # should the HTML parser crash the application. Better # handling is done per case in the handler itself so that # bad HTML doesn't necessarily lead to garbage output. try: self.feed(s) except: pass r = self.result l = self.links self.reset() return (r,l) canto-0.7.10/canto/canto_inspect.py000066400000000000000000000054711142361563200172240ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Like canto-fetch, canto-inspect is a standalone binary that's just merged into # the source because it's convenient. Right now, c-i essentially amounts to a # custom pretty printer for content from a feed. Eventually, I'd like it to # share canto's config such that you can ask it to display on-disk data more # easily, but that can come at a future date. # Also, I initially intended for this tool to be much smarter. I wanted it to # compress lists, eliminate more extraneous content, highlight interesting stuff # in the feeds, but in reality it's too much effort for too little gain, when # users that want to use this tool generally know what they're looking for. As # it is, it just provides a nice layout for the feed XML. # There are a number of important differences between this custom pretty printer # and the nice dict pretty printer in the pprint module. # # - Truncates strings to 100 characters. In general, strings in feeds that are # longer than that (i.e. descriptions) are already displayed and 100 # characters is enough to get the gist of the content. # # - Removed a lot of unnecessary Python artifacts. For example, strings are # printed without u'', dicts without {}, lists without [], etc. The types are # already evident. 
# # - Demarcate list indexes to make it easier to to tell where one item ends # and another begins. # These improvements lend themselves to readability, in my opinion, but if # anyone cares to do it, it would be dead simple to code a option to use the # default pprint. import feedparser_builtin as feedparser import codecs import time import sys FILE = "/dev/stdout" def print_usage(): print "USAGE: canto-inspect URL" def out(message): f = codecs.open(FILE, "a", "UTF-8") f.write(message) f.close() def pretty_print(obj, prefix="", indent = 0): indentstr = " " * indent if type(obj) in [unicode, str]: out(": %s\n" % obj.replace("\n", " ")[:100]) return elif type(obj) in [int, tuple, time.struct_time]: out(": %s\n" % obj) return else: out("\n") if hasattr(obj, "keys"): for k in obj.keys(): out(indentstr + ("[%s]" % k)) pretty_print(obj[k], "", indent + 1) elif type(obj) == list: for x, i in enumerate(obj): out(indentstr + ("[%d]" % x)) pretty_print(i, prefix, indent + 1) def main(): if len(sys.argv) != 2: print_usage() sys.exit(-1) else: URL = sys.argv[1] d = feedparser.parse(URL) pretty_print(d) canto-0.7.10/canto/cfg/000077500000000000000000000000001142361563200145515ustar00rootroot00000000000000canto-0.7.10/canto/cfg/__init__.py000066400000000000000000000000001142361563200166500ustar00rootroot00000000000000canto-0.7.10/canto/cfg/base.py000077500000000000000000000125141142361563200160430ustar00rootroot00000000000000#!/usr/bin/python # -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from canto.const import VERSION_TUPLE from canto.utility import Cycle import canto.utility as utility import feeds import keys import style import tags import links import hooks import filters import triggers import gui import sorts import sources handlers = [tags, feeds, keys, style,\ links, hooks, filters, triggers, gui, sorts, sources] import xml.parsers.expat import traceback import chardet import codecs import os class Cfg: def __init__(self, conf, log_file, feed_dir, script_dir): self.precache = [] self.locals = {} self.wait_for_pid = 0 self.log_file = log_file self.path = conf self.feed_dir = feed_dir self.script_dir = script_dir self.msg_height = 1 self.msg = None self.msg_tick = 0 self.no_conf = 0 # If we can't stat self.path, generate a default config # and toss a message about making your own. try : os.stat(self.path) except : try: # Attempt to fall back to just conf os.stat(self.path[:-3]) self.path = self.path[:-3] except: print "Unable to find config file. 
Generating and "\ "using ~/.canto/conf.py.example" print "You will keep getting this until you create your "\ "own ~/.canto/conf.py" print "\nRemember: it's 'h' for help.\n" newpath = os.getenv("HOME") + "/.canto/" if not os.path.exists(newpath): os.mkdir(newpath) self.path = newpath + "conf.py.example" f = codecs.open(self.path, "w", "UTF-8") f.write("# Auto-generated by canto because you don't have one.\n" "# Please copy to/create ~/.canto/conf.py\n\n") f.write("""add("""\ """"http://rss.slashdot.org/slashdot/Slashdot")\n""") f.write("""add("""\ """"http://reddit.com/.rss")\n""") f.write("""add("""\ """"http://kerneltrap.org/node/feed")\n""") f.write("""add("""\ """"http://codezen.org/canto/feeds/latest")\n""") f.write("\n") f.close() self.no_conf = 1 def message(self, s, time=0): if self.msg: self.default_renderer.status(self.msg, self.msg_height,\ self.width, s) if not time: self.msg_tick = self.default_msg_tick else: self.msg_tick = time self.msg.refresh() # Simple append log. def log(self, message, mode="a"): self.message(message) try: f = open(self.log_file, mode) f.write(message + "\n") f.close() except: pass def read_decode(self, filename, top_encode=0): enc = "utf-8" f = open(filename, "r") try: data = f.read() try: ret = unicode(data, enc) except UnicodeDecodeError: # If the Python built-in decoders can't figure it # out, it might need some help from chardet. enc = chardet.detect(data)["encoding"] ret = unicode(data, enc) self.log("Chardet detected encoding %s for %s" %\ (enc,filename)) except : self.log("Failed to open config! (%s)" % sys.exc_info()) finally: f.close() if top_encode and not ret.startswith("# -*- coding:"): ret = "# -*- coding: " + enc + " -*-\n" + ret return ret def parse(self, data = None): # The entirety of the config is read in first (rather # than using execfile) because the config could be in # some strange encoding, and execfile would choke attempting # to coerce some character into ASCII. if not data: data = self.read_decode(self.path, 1) try : exec(data.encode("UTF-8"), {}, self.locals) except : print "Invalid line in config." traceback.print_exc() raise for h in handlers: h.post_parse(self) def validate(self): for h in handlers: h.validate(self) for l in [self.all_sorts, self.all_filters]: for e in l: if not e: continue for pc in e.precache: if pc not in self.precache: self.precache.append(pc) self.log("Precaching: %s" % self.precache) def get_cfg(conf, log_file, feed_dir, script_dir): c = Cfg(conf, log_file, feed_dir, script_dir) for h in handlers: h.register(c) return c if __name__ == "__main__": c = Cfg("/dev/null","/dev/null","/dev/null", "/dev/null") for h in handlers: h.register(c) for h in handlers: h.test(c) canto-0.7.10/canto/cfg/feeds.py000066400000000000000000000107721142361563200162200ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from canto.feed import Feed def register(c): c.feeds = [] c.default_rate = 5 c.default_keep = 40 c.never_discard = [] def add(URL, **kwargs): if (not URL) or URL == "" or type(URL) not in [unicode, str]: raise Exception, "%s is not a valid URL" % URL for key in ["keep","rate"]: if not key in kwargs: kwargs[key] = getattr(c, "default_" + key) elif type(kwargs[key]) != int: raise Exception, "%s's %s must be an integer." 
% (URL, key) for key in ["username","password"]: if not key in kwargs: kwargs[key] = None elif type(kwargs[key]) not in [unicode, str]: raise Exception, "%s's %s must be a string." % (URL, key) if "filter" not in kwargs: kwargs["filter"] = None if not "tags" in kwargs: kwargs["tags"] = [None] else: tgs = [] for tag in kwargs["tags"]: if tag: if type(tag) not in [unicode,str]: raise Exception, "%s's tags must be strings." % URL elif type(tag) == str: tgs.append(unicode(tag, "UTF-8", "ignore")) else: tgs.append(tag) else: tgs.append(None) kwargs["tags"] = tgs # The tag is the only thing that has to be unique, so we ignore # any duplicate URLs, or everything will break. if not URL in [f.URL for f in c.feeds]: c.feeds.append(Feed(c, c.feed_dir +\ URL.replace("/", " "), URL, kwargs["tags"], kwargs["rate"], kwargs["keep"], kwargs["filter"], kwargs["username"], kwargs["password"])) return True def change_feed(URL, **kwargs): for f in c.feeds: if f.URL == URL: c.feeds.remove(f) add(URL, **kwargs) break def set_default_rate(rate): c.default_rate = rate def set_default_keep(keep): c.default_keep = keep def never_discard(tag): c.never_discard.append(tag) c.locals.update({ "add" : add, "change_feed" : change_feed, "default_rate" : set_default_rate, "default_keep" : set_default_keep, "never_discard" : never_discard}) def post_parse(c): pass # Fortunately, we can be pretty lax about validating feeds. The config functions # guarantee that we won't have feeds with the same URL and that the types are # all in order. It's important that that's done right off the bat because # between post_parse and validate the feed information will be used. def validate(c): pass def test(c): add = c.locals["add"] add("http://someurl") if c.feeds[0].rate != c.default_rate: raise Exception, "Default rate not transferring" if c.feeds[0].keep != c.default_keep: raise Exception, "Default keep not transferring" add("http://someurl") if len(c.feeds) > 1: raise Exception, "Duplicate URL allowed." c.locals["default_rate"](777) c.locals["default_keep"](777) add("http://someotherurl") if c.feeds[1].rate != 777: raise Exception, "Set default rate not transferred" if c.feeds[1].keep != 777: raise Exception, "Set default keep not transferred" c.feeds = [] try: add(None) except: pass else: raise Exception, "Invalid URL didn't raise exception." try: add("blah", rate="bad") except: pass else: raise Exception, "Invalid rate didn't raise exception." try: add("blah", keep="bad") except: pass else: raise Exception, "Invalid keep didn't raise exception" try: add("blah", username=0xdeadbeef) except: pass else: raise Exception, "Invalid username didn't raise exception" try: add("blah", password=0xdeadcafe) except: pass else: raise Exception, "Invalid password didn't raise exception" print "Feed tests passed" canto-0.7.10/canto/cfg/filters.py000066400000000000000000000063511142361563200166000ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from canto.utility import Cycle import traceback import types all_filters = [] class Filter: def __init__(self): self.precache = [] def __str__(self): return "Unnamed Filter." 
def __call__(self, tag, item): return 1 def filter_dec(c, f): if not f: return None class fdec(): def __init__(self, instance, log): self.instance = instance self.precache = self.instance.precache self.log = log def __eq__(self, other): if not other: return False return str(self) == str(other) def __str__(self): return self.instance.__str__() def __call__(self, *args): try: return self.instance(*args) except: self.log("\nException in filter:") self.log("%s" % traceback.format_exc()) if c: return fdec(f, c.log) else: return f def register(c): def set_default_tag_filters(filters): c.tag_filters = filters c.tag_filters = [None] c.filters = [None] c.all_filters = [] c.locals.update({ "Filter" : Filter, "default_tag_filters" : set_default_tag_filters, "tag_filters" : c.tag_filters, "filters" : c.filters }) def post_parse(c): # This has to be done before the validate stage # because it has to be done before the update # Note that tag_filters isn't moved in because at this point, the tags have # all had their filters set explicitly. c.all_filters = all_filters c.filters = c.locals["filters"] for feed in c.feeds: if not feed.filter: continue feed.filter = validate_filter(c, feed.filter) def validate_filter(c, f): if not f: return None if type(f) not in [types.ClassType, types.InstanceType]: raise Exception, \ "All filters must be classes that subclass Filter (%s)" % f if not isinstance(f, Filter): f = f() if not issubclass(f.__class__, Filter): raise Exception, "All filters must subclass Filter class ("\ + f.__class__.__name__ + ")" return filter_dec(c, f) def validate(c): c.all_filters = [ validate_filter(c, f) for f in c.all_filters ] if type(c.filters) != list: raise Exception, "filters must be a list %s" % c.filters c.filters = [ validate_filter(c, f) for f in c.filters ] for filt in c.filters: if filt not in c.all_filters: c.all_filters.append(filt) c.filters = Cycle(c.filters) for tag in c.cfgtags: if type(tag.filters) != list: raise Exception, "tag filters must be a list %s" % tag.filters tag.filters = [validate_filter(c, f) for f in tag.filters] for filt in tag.filters: if filt not in c.all_filters: c.all_filters.append(filt) tag.filters = Cycle(tag.filters) def test(c): pass canto-0.7.10/canto/cfg/gui.py000066400000000000000000000053051142361563200157120ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from canto.const import VERSION_TUPLE def default_status(c): return u"%8%B" + u"Canto %d.%d.%d" % VERSION_TUPLE + u"%b%1" def register(c): c.columns = 1 c.height = 0 c.width = 0 c.reader_lines = 0 c.reader_orientation = None c.cursor_type = "edge" c.cursor_scroll = "scroll" c.cursor_edge = 5 c.gui_top = 0 c.gui_right = 0 c.gui_height = 0 c.gui_width = 0 c.status = default_status c.locals.update({ "cursor_type" : c.cursor_type, "cursor_edge" : c.cursor_edge, "cursor_scroll" : c.cursor_scroll, "status" : c.status, "reader_orientation" : c.reader_orientation, "reader_lines" : c.reader_lines, "columns" : c.columns}) def post_parse(c): for attr in ["columns", "reader_orientation", "reader_lines", "status", "cursor_type", "cursor_scroll", "cursor_edge"]: setattr(c, attr, c.locals[attr]) def validate(c): if c.cursor_type not in ["edge","top","middle","bottom"]: raise Exception, """cursor_type must be "edge",""" +\ """ "top", "middle", or "bottom". 
Not "%s".""" % c.cursor_type if c.cursor_scroll not in ["page", "scroll"]: raise Exception, """cursor_scroll must be "page" or "scroll".""" if c.cursor_type != "edge" and c.cursor_scroll == "page": print "Page scrolling is incompatible with non-edge cursor type" print "Defaulting back to scroll" c.cursor_scroll = "scroll" if c.cursor_type == "edge": if type(c.cursor_edge) != int: raise Exception, """cursor_edge must be >= 0 integer.""" if c.cursor_edge < 0: raise Exception, """cursor_edge must be >= 0, not %d.""" %\ c.cursor_edge if c.reader_orientation not in ["top","bottom","left","right",None]: raise Exception, """reader_orientation must be "top", "bottom",""" +\ """ "left", "right", or None. Not "%s".""" % c.reader_orientation if type(c.reader_lines) != int: raise Exception, "reader_lines must be an >= 0 integer." if c.reader_lines < 0: raise Exception, "reader_lines must be >= 0, not %d" % c.reader_lines if type(c.columns) != int: raise Exception, "columns must be an >= 0 integer." if c.columns < 0: raise Exception, "columns must be an >= 0 integer, not %d" % \ c.columns def test(c): pass canto-0.7.10/canto/cfg/hooks.py000066400000000000000000000022561142361563200162530ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import traceback def hook_dec(c, fn): if not fn: return None def hdec(*args): try: r = fn(*args) except: c.log("\nException in hook:") c.log("%s" % traceback.format_exc()) return 0 return r return hdec hooks = ["resize_hook","new_hook","select_hook","update_hook",\ "unselect_hook","start_hook","end_hook", "state_change_hook"] def register(c): for h in hooks: setattr(c, h, None) c.locals.update({ h : getattr(c, h)}) def post_parse(c): for h in hooks: setattr(c, h, c.locals[h]) def validate(c): for h in hooks: hk = getattr(c, h) if not hk: continue if not hasattr(hk, "__call__"): raise TypeError("All hooks must be callable. (%s)" % hk) setattr(c, h, hook_dec(c, hk)) def test(c): pass canto-0.7.10/canto/cfg/keys.py000066400000000000000000000060521142361563200161010ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
import curses

def convkey(s):
    if len(s) == 1:
        return (ord(s),0)
    elif s.startswith("C-"):
        k, m = convkey(s[2:])
        return (k & 0x1F, m)
    elif s.startswith("M-"):
        k, m = convkey(s[2:])
        return (k, 1)
    elif s == "KEY_RETURN":
        return (10, 0)
    else:
        return (getattr(curses, s), 0)

def conv_key_list(dict):
    ret = {}
    for key in dict:
        if not dict[key]:
            continue
        try:
            newkey = convkey(key)
        except AttributeError:
            continue
        if type(dict[key]) != type([]):
            ret[newkey] = [dict[key]]
        else:
            ret[newkey] = dict[key]
    return ret

def register(c):
    c.key_list = { "q" : "quit",
                   "KEY_DOWN" : "next_item",
                   "KEY_UP" : "prev_item",
                   "j" : "next_item",
                   "k" : "prev_item",
                   "KEY_RIGHT" : "just_read",
                   "KEY_LEFT" : "just_unread",
                   "KEY_NPAGE" : "next_tag",
                   "KEY_PPAGE" : "prev_tag",
                   "\\" : "restart",
                   "[" : "prev_filter",
                   "]" : "next_filter",
                   "{" : "prev_tag_filter",
                   "}" : "next_tag_filter",
                   "-" : "prev_tag_sort",
                   "=" : "next_tag_sort",
                   "l" : "next_tag",
                   "o" : "prev_tag",
                   "<" : "prev_tagset",
                   ">" : "next_tagset",
                   "g" : "goto",
                   "." : "next_unread",
                   "," : "prev_unread",
                   "f" : "inline_search",
                   "n" : "next_mark",
                   "p" : "prev_mark",
                   " " : "reader",
                   "c" : "toggle_collapse_tag",
                   "C" : "set_collapse_all",
                   "V" : "unset_collapse_all",
                   "m" : "toggle_mark",
                   "M" : "all_unmarked",
                   "r" : "tag_read",
                   "R" : "all_read",
                   "u" : "tag_unread",
                   "U" : "all_unread",
                   ";" : "goto_reltag",
                   ":" : "goto_tag",
                   "C-r" : "force_update",
                   "C-l" : "refresh",
                   "h" : "help"}

    c.reader_key_list = { "KEY_DOWN" : "scroll_down",
                          "KEY_UP" : "scroll_up",
                          "j" : "scroll_down",
                          "k" : "scroll_up",
                          "KEY_NPAGE" : "page_down",
                          "KEY_PPAGE" : "page_up",
                          "g" : "goto",
                          "l" : "toggle_show_links",
                          "n" : ["destroy", "just_read", "next_item", "reader"],
                          "p" : ["destroy", "just_read", "prev_item", "reader"],
                          "h" : ["destroy", "just_read", "help"],
                          "q" : ["destroy", "just_read", "quit"],
                          " " : ["destroy", "just_read"]}

    c.locals.update({ "keys" : c.key_list,
                      "reader_keys" : c.reader_key_list })

def post_parse(c):
    c.key_list = conv_key_list(c.key_list)
    c.reader_key_list = conv_key_list(c.reader_key_list)

def validate(c):
    pass

def test(c):
    pass
canto-0.7.10/canto/cfg/links.py000066400000000000000000000021121142361563200162410ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
#Canto - ncurses RSS reader
# Copyright (C) 2008 Jack Miller
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

def register(c):
    c.handlers = { "link" : {},
                   "image" : {} }

    def handler(handlers, path, **kwargs):
        if not "text" in kwargs:
            kwargs["text"] = False
        if not "fetch" in kwargs:
            kwargs["fetch"] = False
        if not "ext" in kwargs:
            kwargs["ext"] = None

        handlers.update(\
                {kwargs["ext"] : (path, kwargs["text"], kwargs["fetch"])})

    def image_handler(path, **kwargs):
        handler(c.handlers["image"], path, **kwargs)

    def link_handler(path, **kwargs):
        handler(c.handlers["link"], path, **kwargs)

    c.locals.update({ "link_handler": link_handler,
                      "image_handler": image_handler})

def post_parse(c):
    pass

def validate(c):
    pass

def test(c):
    pass
canto-0.7.10/canto/cfg/sorts.py000066400000000000000000000046631142361563200163060ustar00rootroot00000000000000
# -*- coding: utf-8 -*-
#Canto - ncurses RSS reader
# Copyright (C) 2008 Jack Miller
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
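# keys.py and links.py above wire up the remaining pieces of user interaction:
# keys.py converts config-side key names ("C-r", "M-x", "KEY_NPAGE", plain
# characters) into curses keycodes, with each binding mapping to a single
# action string or a list of actions run in order, and links.py records the
# commands used to open links and images, keyed by extension. A sketch of the
# matching ~/.canto/conf.py lines -- the browser and image viewer commands are
# only examples, and "%u" is assumed to be substituted with the URL when the
# handler is invoked:

keys["d"] = "toggle_mark"
keys["C-n"] = ["just_read", "next_item"]
reader_keys["q"] = ["destroy", "quit"]

link_handler("firefox \"%u\"")
image_handler("feh \"%u\"", fetch=True, ext="jpg")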
from canto.utility import Cycle import traceback import types all_sorts = [] class Sort: def __init__(self): self.precache = [] def __str__(self): return "Unnamed Sort." def __call__(self, item, item2): return 0 def sort_dec(c, s): if not s: return None class sdec(): def __init__(self, instance, log): self.instance = instance self.precache = self.instance.precache self.log = log def __eq__(self, other): if not other: return False return str(self) == str(other) def __str__(self): return self.instance.__str__() def __call__(self, *args): try: return self.instance(*args) except: self.log("\nException in sort:") self.log("%s" % traceback.format_exc()) return sdec(s, c.log) def register(c): def set_default_tag_sorts(sorts): c.tag_sorts = sorts c.all_sorts = [] c.tag_sorts = [None] c.locals.update({ "Sort" : Sort, "default_tag_sorts" : set_default_tag_sorts, "tag_sorts" : c.tag_sorts }) def post_parse(c): c.all_sorts = all_sorts def validate_sort(c, s): if not s: return None if type(s) not in [types.ClassType, types.InstanceType]: raise Exception, \ "All sorts must be classes that subclass Sort (%s)" % s if not isinstance(s, Sort): s = s() if not issubclass(s.__class__, Sort): raise Exception, "All sorts must subclass Sort class ("\ + s.__class__.__name__ + ")" if c: return sort_dec(c, s) else: return s def validate(c): c.all_sorts = [ validate_sort(c, s) for s in c.all_sorts ] for tag in c.cfgtags: if type(tag.sorts) != list: raise Exception, "Tag sorts for %s must be a list" % tag.tag newsorts = [ validate_sort(c, s) for s in tag.sorts ] for s in newsorts: if s not in c.all_sorts: c.all_sorts.append(s) tag.sorts = Cycle(newsorts) def test(c): pass canto-0.7.10/canto/cfg/sources.py000066400000000000000000000046671142361563200166230ustar00rootroot00000000000000#!/usr/bin/python # -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
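# sorts.py above treats sorts much like filters: a sort is a class deriving
# from Sort with a __str__ for the status bar and a __call__(x, y) that acts
# like cmp() on two stories, and sort_dec() wraps the instance so a broken
# sort is logged instead of killing the client. A small sketch of a
# config-defined sort, assuming the usual conf.py setup; the "feed" key is set
# on every story by feed.py, so comparing on it is safe:

class by_feed(Sort):
    def __str__(self):
        return "By Feed URL"

    def __call__(self, x, y):
        # Group stories from the same feed together.
        return cmp(x["feed"], y["feed"])

default_tag_sorts([None, by_feed()])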
import xml.parsers.expat import codecs def register(c): def source(fn): def source_dec(*args, **kwargs): append = False if "append" in kwargs: append = kwargs["append"] file = codecs.open(c.path, "a", "UTF-8") l = fn(*args, **kwargs) for f in l: if c.locals["add"](f[0], tags=[f[1]]) and append: if f[1]: file.write(u"""add("%s", tags=["%s"])\n""" % (f[0], f[1].replace("\"", "\\\""))) else: file.write(u"""add("%s")\n""" % f[0]) if append: file.close() return source_dec @source def source_opml(filename, **kwargs): l = [] def start(name, attrs) : if name == "outline" and (\ (("type" in attrs and\ attrs["type"] in ["pie","rss"])) or\ not ("type" in attrs)): if "xmlUrl" in attrs: if "text" in attrs: l.append((attrs["xmlUrl"], attrs["text"])) else: l.append((attrs["xmlUrl"], None)) p = xml.parsers.expat.ParserCreate() p.StartElementHandler = start try: d = c.read_decode(filename) except: raise Exception, "Unable to open %s" % filename p.Parse(d.encode("UTF-8"), 1) return l @source def source_urls(filename, **kwargs): l = [] try: d = c.read_decode(filename).split('\n')[:-1] except: raise Exception, "Unable to open %s" % filename for feed in d: l.append((feed, None)) return l @source def source_url(URL, **kwargs): if "tag" in kwargs: return [(URL, kwargs["tag"])] return [(None, URL)] c.locals.update({ "source_urls" : source_urls, "source_url" : source_url, "source_opml" : source_opml }) def post_parse(c): pass def validate(c): pass def test(c): pass canto-0.7.10/canto/cfg/style.py000066400000000000000000000137051142361563200162710ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from canto.interface_draw import BaseRenderer, Renderer from canto.widecurse import enable_color, enable_style,\ disable_color, disable_style MAX_COLORS = 255 INVALID_COLOR = -2 colordir = {"default" : -1, "black" : 0, "white" : 7, "red" : 1, "green" : 2, "yellow" : 3, "blue" : 4, "magenta" : 5, "pink" : 5, "cyan" : 6} def convcolor(c): if type(c) == int: if -1 <= c <= MAX_COLORS: return c else: return INVALID_COLOR elif type(c) == str: if c in colordir: return colordir[c] return INVALID_COLOR def register(c): c.colors = [("white","black"),"blue","yellow",\ "green","pink","black",\ "blue","black"] c.default_renderer = Renderer() c.default_msg_tick = 5 def set_default_renderer(renderer): c.default_renderer = renderer def get_default_renderer(): return c.default_renderer c.locals.update({ "colors" : c.colors, "renderer" : Renderer, "disable_style" : disable_style, "disable_color" : disable_color, "enable_style" : enable_style, "enable_color" : enable_color, "default_renderer" : set_default_renderer, "get_default_renderer" : get_default_renderer}) def post_parse(c): c.colors = c.locals["colors"] # Acceptable colors are either one off strings (like "blue") or ints (4) that # get converted into a tuple with the same background as the first color pair # (or "default" if it is the first color pair) or they're full tuples (fg, bg). # Int colors have to be 0 <= x < MAX_COLORS # String colors have to be in the colordir (above) # The color array must also be 8 entries long. 
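# Turning the comment above into a concrete sketch of the "colors" setting a
# ~/.canto/conf.py could use (the particular choices are arbitrary; names come
# from the colordir table, bare entries inherit the background of pair 0, and
# raw terminal color numbers up to MAX_COLORS are also accepted):

colors[0] = ("white", "blue")
colors[1] = "yellow"
colors[2] = 124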
def validate_colors(colors, len_check = 1): newcolors = [] if len_check and len(colors) != 8: raise Exception, "colors array must have 8 entries" for i, color in enumerate(colors): if type(color) in [int, str, unicode]: color = tuple([color]) if type(color) == tuple: if len(color) > 2: raise Exception, "%s is not a valid color pair (too long)"\ % color elif len(color) == 1: realcolor = convcolor(color[0]) if realcolor == INVALID_COLOR: raise Exception, "%s is not a valid color" % color[0] else: if i == 0: color = (color[0], "default") else: color = (color[0], newcolors[0][1]) fg = convcolor(color[0]) if fg == INVALID_COLOR: raise Exception, "%s is not a valid foreground color" % color[0] bg = convcolor(color[1]) if bg == INVALID_COLOR: raise Exception, "%s is not a valid background color" % color[1] newcolors.append((fg, bg)) else: raise Exception, "Unknown type for color: %s" % type(color) return newcolors def validate_renderer(r): if not isinstance(r, BaseRenderer): raise Exception,\ "Renderers must be subclass of BaseRenderer in canto.interface_draw" def validate(c): for tag in c.cfgtags: validate_renderer(tag.renderer) c.colors = validate_colors(c.colors) def test(c): # One off int color (valid) t = validate_colors([1], 0) if t != [(1,-1)]: raise Exception, "Failed to convert int to color pair (%s)" % t # One off int color (invalid) try: i = MAX_COLORS + 1 t = validate_colors([i], 0) except: pass else: raise Exception, "Invalid color (%d) didn't raise exception" % i for c in colordir.keys(): # One off string color (valid) t = validate_colors([c], 0) if t != [(colordir[c], -1)]: raise Exception, \ "Failed to convert %s to color pair (%s)" % (c,t) # String tuple (valid) input = (c, "default") t = validate_colors([input], 0) if t[0] == [(colordir[c], colordir["default"])]: raise Exception,\ "Failed to convert %s to color pair (%s)" % (input,t) #One off string color (invalid) try: s = "bullshit" t = validate_colors([s], 0) except: pass else: raise Exception, "Invalid color (%s) didn't raise exception" % s #String tuple (invalid) for t in [("bullshit", "default"), ("default", "bullshit")]: try: validate_colors(t, 0) except: pass else: raise Exception,\ "Invalid string tuple (%s) didn't raise exception" % t #Numeric tuple (valid) for fg in xrange(MAX_COLORS): for bg in xrange(MAX_COLORS): t = validate_colors([(fg, bg)], 0) if t != [(fg, bg)]: raise Exception,\ "Valid color pair %s failed" % (fg, bg) #Numeric tuple (invalid) for fg, bg in [(MAX_COLORS + 1, 0), (0, MAX_COLORS + 1)]: try: t = validate_colors([(fg, bg)], 0) except: pass else: raise Exception, "Invalid color pair didn't raise exception" #Default bg i = [(1, 2), 1] t = validate_colors(i, 0) if t != [(1,2), (1, 2)]: raise Exception, "Default background failed" #Len check (valid) i = [(0,0)] * 8 t = validate_colors(i) if t != i: raise Exception, "Valid len check failed" #Len check (invalid) i = [] try: t = validate_colors(i) except: pass else: raise Exception, "Invalid len check failed" print "Style tests passed." canto-0.7.10/canto/cfg/tags.py000066400000000000000000000117661142361563200160740ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
from canto.utility import Cycle from canto.tag import Tag def register(c): c.tags = None c.cfgtags = [] def add_tag(tags, **kwargs): if "sorts" not in kwargs: kwargs["sorts"] = c.tag_sorts if "filters" not in kwargs: kwargs["filters"] = c.tag_filters if not hasattr(tags, "__iter__"): tags = [tags] for t in tags: c.cfgtags.append(Tag(\ c, c.default_renderer, kwargs["sorts"], kwargs["filters"], unicode(t, "UTF-8", "ignore"))) c.locals.update({"add_tag" : add_tag, "tags" : c.tags }) def post_parse(c): c.tags = c.locals["tags"] def validate_tags(c): configured_tags = [ x.tag for x in c.cfgtags ] potential_tags = [] if c.tags == None: c.tags = [ None ] for feed in c.feeds: for tag in feed.tags[1:]: if [tag] not in c.tags: c.tags.append([tag]) if type(c.tags) != list: raise Exception, "tags must be a list of lists of strings" if not len(c.tags): raise Exception, "tags must not be empty" one_good_set = 0 for i in c.tags: if i: if type(i) != list: raise Exception, "tags must be a list of lists of strings" if not len(i): continue one_good_set = 1 for t in i: if type(t) not in [str, unicode]: raise Exception, "tags are referenced as strings, not %s" %\ type(t) if type(t) == str: t = unicode(t, "UTF-8", "ignore") if t not in potential_tags and\ t not in configured_tags: potential_tags.append(t) elif i == None: # Default case one_good_set = 1 if not one_good_set: raise Exception, "tag lists must not all be empty" for f in c.feeds: for t in f.tags: if type(t) not in [str, unicode]: raise Exception, "tags are referenced as strings, not %s" %\ type(t) if type(t) == str: t = unicode(t, "UTF-8", "ignore") if t not in configured_tags and\ t not in potential_tags: potential_tags.append(t) for tag in potential_tags: c.cfgtags.append(Tag(c, c.default_renderer,\ c.tag_sorts, c.tag_filters, tag)) def get_tag_obj(s): for t in c.cfgtags: if t.tag == s: return t newtags = [] for tagl in c.tags: new = [] if tagl == None: for f in c.feeds: obj = get_tag_obj(f.tags[0]) if obj not in new: new.append(obj) else: for x in tagl: if type(x) == str: obj = get_tag_obj(unicode(x, "UTF-8", "ignore")) else: obj = get_tag_obj(x) if obj not in new: new.append(obj) newtags.append(new) return newtags def validate(c): c.tags = Cycle(validate_tags(c)) class StubFeed: def __init__(self, tags): self.tags = tags def test(c): c.feeds = [] c.cfgtags = [] #Bullshit type for tags for badtype in [[], ["garbage"], [[1]], [[]]]: c.tags = badtype try: validate_tags(c) except: pass else: raise Exception,\ "Bad tags (%s) failed to raise exception." % badtype # Actually creating a tag requires some stub defaults c.default_renderer = None c.tag_sorts = [] c.tag_filters = [] #Default c.tags = [["sometag"]] validate_tags(c) if "sometag" not in [t.tag for t in c.cfgtags]: raise Exception, "Failed to use hard coded tag." # Stub feeds to create tags for. c.feeds = [StubFeed([u"Slashdot",u"news"]), StubFeed([u"Reddit", u"news"])] c.tags = [None] c.tags = validate_tags(c) for tag in [u"Slashdot", u"Reddit", u"news"]: if tag not in [t.tag for t in c.cfgtags]: raise Exception, "Failed to use feed tag." tagstr = [t.tag for t in c.tags[0]] if tagstr != [u"Slashdot", u"Reddit"]: raise Exception, "Failed to generate default None tags %s" % tagstr c.tags = None c.tags = validate_tags(c) tagstr = [[t.tag for t in tagl] for tagl in c.tags] if tagstr != [[u"Slashdot", u"Reddit"], [u"news"]]: raise Exception, "Failed to generate default tag list %s" % tagstr c.feeds = [] print "Tag tests passed." 
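# tags.py above creates Tag objects for anything the config names explicitly
# via add_tag() and fills in the rest from the feeds' own tags during
# validation. A short sketch of how a conf.py might use it -- the URLs are
# placeholders, and show_unread comes from canto/extra.py further down in
# this tree:

add("http://example.com/world.rss", tags=["news"])
add("http://example.org/local.rss", tags=["news"])

add_tag("news", filters=[None, show_unread()])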
canto-0.7.10/canto/cfg/triggers.py000066400000000000000000000014741142361563200167570ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. def register(c): c.triggers = ["interval","signal"] c.locals.update({"triggers" : c.triggers}) def post_parse(c): c.triggers = c.locals["triggers"] def validate(c): if type(c.triggers) != list: raise Exception, "triggers must be a list (%s)" % c.triggers for t in c.triggers: if t not in ["interval","signal","change_tag"]: raise Exception, ("%s is not a valid trigger name, try" + " \"interval\", \"signal\", or \"change_tag\"") % t def test(c): pass canto-0.7.10/canto/const.py000066400000000000000000000014431142361563200155140ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. UPDATE_FIRST = 1 CHECK_NEW = 2 FEED_LIST = 4 OUT_OPML = 8 IN_OPML = 16 IN_URL = 32 EXIT = -1 NOKEY = 0 REFRESH_ALL = 1 READER_NEXT = 2 READER_PREV = 3 UPDATE = 4 KEY_PASSTHRU = 5 REDRAW_ALL = 6 WINDOW_SWITCH = 7 REFILTER = 8 RETAG = 9 TFILTER = 10 RESTART = 11 STORY_SAVED = 0 STORY_UPDATED = 1 STORY_QD = 2 PROC_UPDATE = 0 PROC_FILTER = 1 PROC_BOTH = 2 PROC_TEST = 3 PROC_GETTAGS = 4 PROC_FLUSH = 5 PROC_KILL = 6 PROC_SYNC = 7 PROC_DEQD = 8 VERSION_TUPLE = SET_VERSION_TUPLE GIT_SHA = SET_GIT_SHA canto-0.7.10/canto/extra.py000066400000000000000000000305001142361563200155050ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from cfg.filters import Filter, all_filters, validate_filter from cfg.sorts import Sort, all_sorts, validate_sort import canto_html import utility import input import subprocess import locale import time import os import re def __add_hook_meta(list): def add_hook(r, func, **kwargs): l = getattr(r, list, None) def get_index(s): if type(kwargs[s]) == str: f = getattr(r, kwargs[s], None) else: f = kwargs[s] if f in l: return l.index(f) else: print "Cannot insert %s %s %s, it isn't in the list!" 
%\ (func.func_name, s, f.func_name) return -2 if "after" in kwargs: idx = get_index("after") + 1 elif "before" in kwargs: idx = get_index("before") else: idx = len(l) if idx > -1: l.insert(idx, func) return add_hook add_hook_pre_reader = __add_hook_meta("pre_reader") add_hook_post_reader = __add_hook_meta("post_reader") add_hook_pre_story = __add_hook_meta("pre_story") add_hook_post_story = __add_hook_meta("post_story") add_info_holster = "reader_highlight_quotes" def add_info(r, item, **kwargs): global add_info_holster if type(item) not in [str, unicode]: raise Exception, "Item must be a string" realitem = item.lower() if "format" not in kwargs: kwargs["format"] = "%s%s\n" elif type(kwargs["format"]) not in [str, unicode]: raise Exception, "Format must be a string" if "caption" not in kwargs: kwargs["caption"] = item + ": " elif type(kwargs["caption"]) not in [str, unicode]: raise Exception, "Caption must be a string" if "tags" not in kwargs: kwargs["tags"] = ["*"] elif type(kwargs["tags"]) != list: kwargs["tags"] = [kwargs["tags"]] def hook(dict): mt = [ s for s in kwargs["tags"] if s in dict["story"]["canto_state"]] if realitem == "maintag": dict["content"] = (kwargs["format"] %\ (kwargs["caption"], dict["story"]["canto_state"][0]))\ + dict["content"] elif realitem in dict["story"] and mt: dict["content"] = (kwargs["format"] %\ (kwargs["caption"], dict["story"][realitem]))\ + dict["content"] if "before" in kwargs: add_hook_pre_reader(r, hook, before=kwargs["before"]) elif "after" in kwargs: add_hook_pre_reader(r, hook, after=kwargs["after"]) else: add_hook_pre_reader(r, hook, before=add_info_holster) add_info_holster = hook return hook # Filter for filtering out all read stories. # # Usage : filters=[None, show_unread()] # then using [/] to cycle through. class show_unread(Filter): def __str__(self): return "Show unread" def __call__(self, tag, item): return not item.was("read") # Filter for filtering out all unread stories. # # Usage : filters=[None, show_marked()] # then using [/] to cycle through class show_marked(Filter): def __str__(self): return "Show marked" def __call__(self, tag, item): return item.was("marked") # A filter to take a keyword or regex and filter # all stories that don't contain/match it. # # Usage : filters=[None, only_with("Obama")] # filters=[None, only_with(".*[Ll]inux.*", regex=True)] # class only_with(Filter): def __init__(self, keyword, **kwargs): self.precache = [] self.keyword = keyword if "regex" in kwargs and kwargs["regex"]: self.match = re.compile(keyword) else: self.match = re.compile(".*" + re.escape(keyword) + ".*", re.I) def __str__(self): return "With %s" % self.keyword def __call__(self, tag, item): return self.match.match(item["title"]) # Same as above, except filters out all stories that # *do* match the keyword / regex. 
class only_without(only_with): def __str__(self): return "Without %s" % self.keyword def __call__(self, tag, item): return not self.match.match(item["title"]) # Display feed when it has one or more of the specified tags class with_tag_in(Filter): def __init__(self, *tags): self.tags = set(tags) self.precache = [] def __str__(self): return "With Tags: %s" % '/'.join(self.tags) def __call__(self, tag, item): feed = [f for f in tag.cfg.feeds if f.path == item.ufp_path][0] tags=set(feed.tags) return bool(self.tags.intersection(tags)) # Display when all filters match # # Usage : filters=[all_of(with_tag_in('news'), show_unread)] class aggregate_filter(Filter): def __init__(self, *filters): self.filters = [ validate_filter(None, f) for f in filters ] self.precache = [] for f in self.filters: if not f: continue for pc in f.precache: if pc not in self.precache: self.precache.append(pc) class all_of(aggregate_filter): def __str__(self): return ' & '.join(["(%s)" % f for f in self.filters]) def __call__(self, tag, item): return all([f(tag, item) for f in self.filters]) class any_of(aggregate_filter): def __str__(self): return ' | '.join(["(%s)" % f for f in self.filters]) def __call__(self, tag, item): return any([f(tag, item) for f in self.filters]) def register_filter(filt): if filt not in all_filters: all_filters.append(filt) def register_sort(s): if s not in all_sorts: all_sorts.append(s) def set_filter(filter): register_filter(filter) return lambda x : x.set_filter(filter) def set_tag_filter(filter): register_filter(filter) return lambda x : x.set_tag_filter(filter) def set_tag_sort(sort): register_sort(sort) return lambda x : x.set_tag_sort(sort) def set_tags(tags): return lambda x : x.set_tagset(tags) # Creates a keybind for searching for a keyword or regex. # # Usage : keys["1"] = search("Obama") # keys["2"] = search(".*[Ll]inux.*, regex=True) def search(s, **kwargs): if "regex" in kwargs and kwargs["regex"]: return lambda x : x.do_inline_search(re.compile(s)) else: return lambda x : x.do_inline_search(\ re.compile(".*" + re.escape(s) + ".*", re.I)) # Creates a keybind to do a interactive search. # # Usage : keys["/"] = search_filter def search_filter(gui): rex = input.input(gui.cfg, "Search Filter") if not rex: return gui.set_filter(None) elif rex.startswith("rgx:"): rex = rex[4:] else: rex = "(?i).*" + re.escape(rex) + ".*" return gui.set_filter(only_with(rex, regex=True)) # Creates a keybind to append current story information to a file in the user's # home directory. This is merely an example, but with a little modification it # could be used to output XML chunks or Markdown output, etc. # # Usage : keys["s"] = save def save(x): import locale enc = locale.getpreferredencoding() # We have to encode strings to the preferredencoding() to avoid # getting UnicodeEncode exceptions. file = open(os.getenv("HOME")+"/canto_out", "a") file.write((x.sel["item"]["title"] + "\n").encode(enc, "ignore")) file.write((x.sel["item"]["link"] + "\n\n").encode(enc, "ignore")) file.close() # Creates a keybind to copy the URL of the current story to the clipboard. # xclip must be available for this to work. 
# # Usage : keys["y"] = ["just_read", yank, "next_item"] def yank(gui): xclip = subprocess.Popen('xclip -i', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE) try: xclip.stdin.write(gui.sel["item"]["link"]) xclip.stdin.close() if xclip.wait() != 0: raise IOError except IOError: gui.cfg.log("xclip must be installed for yank to work!") else: gui.cfg.log("Yanked: %s" % gui.sel["item"]["title"]) # Downloads image/link content into a specified directory using wget. # # Usage : reader_keys['w'] = wget_link("/path/to/downloads") def wget_link(path): def do_wget(reader): term = input.num_input(reader.cfg, "Link") if not term: return try: url = reader.links[term][1] except: reader.cfg.log("Invalid link.") return cmd = "wget -P \"" + path + "\" \"%u\"" utility.silentfork(cmd, url, 0, 0) return do_wget # Highlights a word in the reader or main views # # Usage : # r = get_default_renderer() # add_hook_pre_reader(r, highlight_word("NASA")) # add_hook_pre_story(r, highlight_word("never")) # # Highlights the word "never" in the main view, and # "NASA" in the reader. Extra arguments include flags # to pass to the regex (defaults to re.I for ignorecase) # and content, which defines the content in the item to be # parsed. def highlight_word(word, flags=re.I, content="content"): reg = re.compile(r"\b(" + re.escape(word) + r")\b", flags) def hword(dict): dict[content] = reg.sub(r"%R\1%r", dict[content]) return hword # Note: the following two hacks are for xterm and compatible # terminal emulators ([u]rxvt, eterm, aterm, etc.). These should # not be run in screen or standard linux terms because they'll # print garbage to the screen. # Sets the xterm_title to Feed - Title # # Usage : select_hook = set_xterm_title def set_xterm_title(tag, item): # Don't use print! prefcode = locale.getpreferredencoding() os.write(1, (u"\033]0; %s - %s\007" % \ (tag.tag, item["title"])).encode(prefcode)) # Sets the xterm title to " " # # Usage : end_hook = clear_xterm_title def clear_xterm_title(*args): os.write(1, "\033]0; \007") # SORTS class by_date(Sort): def __init__(self): self.precache = ["updated_parsed"] def __str__(self): return "By Date" def __call__(self, x, y): # We wrap this, despite the fact that sorts are all # wrapped in an exception logger because this is a # normal, unimportant problem. 
try: a = int(time.mktime(x["updated_parsed"])) b = int(time.mktime(y["updated_parsed"])) except: return 0 return b - a class by_len(Sort): def __str__(self): return "By Length" def __call__(self, x, y): return len(x["title"]) - len(y["title"]) class by_content(Sort): def __str__(self): return "By Length of Content" def __call__(self, x, y): def get_text(story): s,links = canto_html.convert(story.get_text()) return s return len(get_text(x)) - len(get_text(y)) class by_alpha(Sort): def __str__(self): return "Alphabetical" def __call__(self, x, y): for a, b in zip(x["title"],y["title"]): if ord(a) != ord(b): return ord(a) - ord(b) return len(x["title"]) - len(y["title"]) class by_unread(Sort): def __str__(self): return "By Unread" def __call__(self, x, y): if x.was("read") and not y.was("read"): return 1 if y.was("read") and not x.was("read"): return -1 return 0 class reverse_sort(Sort): def __init__(self, other_sort): self.other_sort = validate_sort(None, other_sort) self.precache = self.other_sort.precache def __str__(self): return "Reversed %s" % self.other_sort def __call__(self, x, y): return -1 * self.other_sort(x,y) class sort_order(Sort): def __init__(self, *sorts): self.sorts = [ validate_sort(None, s) for s in sorts ] self.precache = [] for s in self.sorts: if not s: continue for pc in s.precache: if pc not in self.precache: self.precache.append(pc) def __str__(self): return ", ".join(["%s" % s for s in self.sorts ]) def __call__(self, x, y): for s in self.sorts: r = s(x, y) if r: return r return 0 canto-0.7.10/canto/feed.py000066400000000000000000000214741142361563200152770ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # The Feed() object is the canto client's interface to the files written by # canto-fetch. In essence, it's only purpose is to load stories and keep the # state synced on disk. # The Feed() object is entirely separate from the Tag() objects that are displayed # in the interface, despite the fact that (by default) there is a single tag per # feed. # The only entry points of feed (other than __init__ when it's created) are # update() for updating from disk and todisk() to commit the current state when # Canto shuts down. from const import STORY_QD, STORY_SAVED, STORY_UPDATED import story import cPickle import fcntl class Feed(list): def __init__(self, cfg, dirpath, URL, tags, rate, keep, \ filter, username, password): # We pay attention to whether the base was set at creation time (i.e. # via the config) so that setting tags=["sometag"] on two feeds merges # them whereas tags=[None, ...] 
resolving to the same base tag can have # their base tags resolved to "Base" and "Base (2)" (see canto.py) self.tags = tags if self.tags[0] == None: self.base_set = 0 self.base_explicit = 0 else: self.base_set = 1 self.base_explicit = 1 self.URL = URL self.rate = rate self.keep = keep self.username = username self.password = password # Queue status self.qd = False # Hard filter self.filter = filter self.path = dirpath self.cfg = cfg def __eq__(self, other): return self.URL == other.URL def get_ufp(self): lockflags = fcntl.LOCK_SH if self.base_set: lockflags |= fcntl.LOCK_NB try: f = open(self.path, "r") try: fcntl.flock(f.fileno(), lockflags) ufp = cPickle.load(f) except ImportError: try: # Fortunately, I don't think forcing the cpickle # to use feedparser_builtin is harmful, since they're # basically the same class, feedparser_builtin is just the # only way to properly look up the toplevel module now. f.seek(0) data = f.read() data = data.replace("feedparser\n","feedparser_builtin\n",1) ufp = cPickle.loads(data) except: return 0 except: return 0 finally: fcntl.flock(f.fileno(), fcntl.LOCK_UN) f.close() except: return 0 return ufp def update(self): ufp = self.get_ufp() if not ufp: return 0 # If the base hasn't been set, attempt to set it from the data we just # picked up. get_ufp() blocks if base isn't set so it's impossible that # we'll just bail out unless there's a cPickle.load exception, but at # that point we're totally fucked anyway. if not self.base_set: self.base_set = 1 if "feed" in ufp and "title" in ufp["feed"]: replace = lambda x: x or ufp["feed"]["title"] else: # Using URL for tag, no guarantees replace = lambda x: x or self.URL self.tags = [ replace(x) for x in self.tags] self.extend(ufp["entries"]) self.todisk(ufp) return 1 # Extend's job is to take items from disk, strip them down to the items that # we want to keep in memory (i.e. stuff that's used often) and add them to # the feed, applying the hard filter if necessary. def extend(self, entries): newlist = [] for entry in entries: # This checks existence in newlist as well to avoid # duplicate items on feeds with duplicates in them. # (i.e. broken) if entry in self and entry not in newlist: centry = self[self.index(entry)] if (not centry.updated) and\ (centry["canto_state"] != entry["canto_state"]): centry["canto_state"] = entry["canto_state"] newlist.append(centry) continue # nentry is the new, stripped down version of the item nentry = {} nentry["id"] = entry["id"] nentry["feed"] = self.URL nentry["canto_state"] = entry["canto_state"] if "title" not in entry: nentry["title"] = "" else: nentry["title"] = entry["title"] if "title_detail" in entry: nentry["title_detail"] = entry["title_detail"] for pc in self.cfg.precache: if pc in entry: nentry[pc] = entry[pc] else: nentry[pc] = None if "link" in entry: nentry["link"] = entry["link"] elif "href" in entry: nentry["link"] = entry["href"] # If tags were added in the configuration, c-f won't # notice (doesn't care about tags), so we check and # append as needed. updated = STORY_SAVED if self.tags[0] != nentry["canto_state"][0]: nentry["canto_state"][0] = self.tags[0] updated = STORY_UPDATED for tag in self.tags[1:]: if tag not in nentry["canto_state"]: nentry["canto_state"].append(tag) updated = STORY_UPDATED if nentry not in newlist: newlist.append(story.Story(nentry, self.path, updated)) del self[:] for item in newlist: if not self.filter or self.filter(self, item): list.append(self, item) # Merging items means that they're unvalidated and unfiltered. 
This is # used when story objects are read in from a pipe. def merge(self, iter): for i, item in enumerate(iter): if item in self: cur = self[self.index(item)] if cur.updated in [STORY_SAVED, STORY_QD]: cur["canto_state"] = item["canto_state"] cur.updated = 0 iter[i] = cur del self[:] list.extend(self, iter) # todisk is the complement to get_ufp, however, since the state may have # changed on any of the items, it has to intelligently merge the changes # before writing to disk. def todisk(self, ufp=None): if ufp == None: ufp = self.get_ufp() if not ufp: return changed = self.changed() if not changed : return for entry in changed: # We've stopped caring about this item if entry not in ufp["entries"]: continue old = ufp["entries"][ufp["entries"].index(entry)] if old["canto_state"] != entry["canto_state"]: # States differ, and we've recorded an update, that means we # probably have the newer information, so we handle the # state_change_hook in a batch and overwrite the old data if entry.updated: if self.cfg.state_change_hook: add = [t for t in entry["canto_state"] if\ t not in old["canto_state"]] rem = [t for t in old["canto_state"] if\ t not in entry["canto_state"]] self.cfg.state_change_hook(self, entry, add, rem) old["canto_state"] = entry["canto_state"] # States differ, but we have no change, most likely the on disk # info is newer (i.e. changed by another running canto # instance). We count on the other canto instance handling the # state_change_hook. else: entry["canto_state"] = old["canto_state"] # Dump the feed to disk. f = open(self.path, "r+") try: fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB) f.seek(0, 0) f.truncate() cPickle.dump(ufp, f) f.flush() for x in changed: x.updated = STORY_SAVED except: return 0 finally: fcntl.flock(f.fileno(), fcntl.LOCK_UN) f.close() del ufp return 1 def changed(self): return [ x for x in self if x.updated ] canto-0.7.10/canto/feedparser_builtin.py000066400000000000000000003606441142361563200202470ustar00rootroot00000000000000#!/usr/bin/env python """Universal feed parser Handles RSS 0.9x, RSS 1.0, RSS 2.0, CDF, Atom 0.3, and Atom 1.0 feeds Visit http://feedparser.org/ for the latest version Visit http://feedparser.org/docs/ for the latest documentation Required: Python 2.1 or later Recommended: Python 2.3 or later Recommended: CJKCodecs and iconv_codec """ __version__ = "4.1"# + "$Revision: 1.92 $"[11:15] + "-cvs" __license__ = """Copyright (c) 2002-2006, Mark Pilgrim, All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.""" __author__ = "Mark Pilgrim " __contributors__ = ["Jason Diamond ", "John Beimler ", "Fazal Majid ", "Aaron Swartz ", "Kevin Marks "] _debug = 0 # HTTP "User-Agent" header to send to servers when downloading feeds. # If you are embedding feedparser in a larger application, you should # change this to your application name and URL. USER_AGENT = "UniversalFeedParser/%s +http://feedparser.org/" % __version__ # HTTP "Accept" header to send to servers when downloading feeds. If you don't # want to send an Accept header, set this to None. ACCEPT_HEADER = "application/atom+xml,application/rdf+xml,application/rss+xml,application/x-netcdf,application/xml;q=0.9,text/xml;q=0.2,*/*;q=0.1" # List of preferred XML parsers, by SAX driver name. These will be tried first, # but if they're not installed, Python will keep searching through its own list # of pre-installed parsers until it finds one that supports everything we need. PREFERRED_XML_PARSERS = ["drv_libxml2"] # If you want feedparser to automatically run HTML markup through HTML Tidy, set # this to 1. Requires mxTidy # or utidylib . TIDY_MARKUP = 0 # List of Python interfaces for HTML Tidy, in order of preference. Only useful # if TIDY_MARKUP = 1 PREFERRED_TIDY_INTERFACES = ["uTidy", "mxTidy"] # ---------- required modules (should come with any Python distribution) ---------- import sgmllib, re, sys, copy, urlparse, time, rfc822, types, cgi, urllib, urllib2 try: from cStringIO import StringIO as _StringIO except: from StringIO import StringIO as _StringIO # ---------- optional modules (feedparser will work without these, but with reduced functionality) ---------- # gzip is included with most Python distributions, but may not be available if you compiled your own try: import gzip except: gzip = None try: import zlib except: zlib = None # If a real XML parser is available, feedparser will attempt to use it. feedparser has # been tested with the built-in SAX parser, PyXML, and libxml2. On platforms where the # Python distribution does not come with an XML parser (such as Mac OS X 10.2 and some # versions of FreeBSD), feedparser will quietly fall back on regex-based parsing. try: import xml.sax xml.sax.make_parser(PREFERRED_XML_PARSERS) # test for valid parsers from xml.sax.saxutils import escape as _xmlescape _XML_AVAILABLE = 1 except: _XML_AVAILABLE = 0 def _xmlescape(data): data = data.replace('&', '&') data = data.replace('>', '>') data = data.replace('<', '<') return data # base64 support for Atom feeds that contain embedded binary data try: import base64, binascii except: base64 = binascii = None # cjkcodecs and iconv_codec provide support for more character encodings. 
# Both are available from http://cjkpython.i18n.org/ try: import cjkcodecs.aliases except: pass try: import iconv_codec except: pass # chardet library auto-detects character encodings # Download from http://chardet.feedparser.org/ try: import chardet if _debug: import chardet.constants chardet.constants._debug = 1 except: chardet = None # ---------- don't touch these ---------- class ThingsNobodyCaresAboutButMe(Exception): pass class CharacterEncodingOverride(ThingsNobodyCaresAboutButMe): pass class CharacterEncodingUnknown(ThingsNobodyCaresAboutButMe): pass class NonXMLContentType(ThingsNobodyCaresAboutButMe): pass class UndeclaredNamespace(Exception): pass sgmllib.tagfind = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*') sgmllib.special = re.compile('' % (tag, ''.join([' %s="%s"' % t for t in attrs])), escape=0) # match namespaces if tag.find(':') <> -1: prefix, suffix = tag.split(':', 1) else: prefix, suffix = '', tag prefix = self.namespacemap.get(prefix, prefix) if prefix: prefix = prefix + '_' # special hack for better tracking of empty textinput/image elements in illformed feeds if (not prefix) and tag not in ('title', 'link', 'description', 'name'): self.intextinput = 0 if (not prefix) and tag not in ('title', 'link', 'description', 'url', 'href', 'width', 'height'): self.inimage = 0 # call special handler (if defined) or default handler methodname = '_start_' + prefix + suffix try: method = getattr(self, methodname) return method(attrsD) except AttributeError: return self.push(prefix + suffix, 1) def unknown_endtag(self, tag): if _debug: sys.stderr.write('end %s\n' % tag) # match namespaces if tag.find(':') <> -1: prefix, suffix = tag.split(':', 1) else: prefix, suffix = '', tag prefix = self.namespacemap.get(prefix, prefix) if prefix: prefix = prefix + '_' # call special handler (if defined) or default handler methodname = '_end_' + prefix + suffix try: method = getattr(self, methodname) method() except AttributeError: self.pop(prefix + suffix) # track inline content if self.incontent and self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'): # element declared itself as escaped markup, but it isn't really self.contentparams['type'] = 'application/xhtml+xml' if self.incontent and self.contentparams.get('type') == 'application/xhtml+xml': tag = tag.split(':')[-1] self.handle_data('' % tag, escape=0) # track xml:base and xml:lang going out of scope if self.basestack: self.basestack.pop() if self.basestack and self.basestack[-1]: self.baseuri = self.basestack[-1] if self.langstack: self.langstack.pop() if self.langstack: # and (self.langstack[-1] is not None): self.lang = self.langstack[-1] def handle_charref(self, ref): # called for each character reference, e.g. for ' ', ref will be '160' if not self.elementstack: return ref = ref.lower() if ref in ('34', '38', '39', '60', '62', 'x22', 'x26', 'x27', 'x3c', 'x3e'): text = '&#%s;' % ref else: if ref[0] == 'x': c = int(ref[1:], 16) else: c = int(ref) text = unichr(c).encode('utf-8') self.elementstack[-1][2].append(text) def handle_entityref(self, ref): # called for each entity reference, e.g. 
for '©', ref will be 'copy' if not self.elementstack: return if _debug: sys.stderr.write('entering handle_entityref with %s\n' % ref) if ref in ('lt', 'gt', 'quot', 'amp', 'apos'): text = '&%s;' % ref else: # entity resolution graciously donated by Aaron Swartz def name2cp(k): import htmlentitydefs if hasattr(htmlentitydefs, 'name2codepoint'): # requires Python 2.3 return htmlentitydefs.name2codepoint[k] k = htmlentitydefs.entitydefs[k] if k.startswith('&#') and k.endswith(';'): return int(k[2:-1]) # not in latin-1 return ord(k) try: name2cp(ref) except KeyError: text = '&%s;' % ref else: text = unichr(name2cp(ref)).encode('utf-8') self.elementstack[-1][2].append(text) def handle_data(self, text, escape=1): # called for each block of plain text, i.e. outside of any tag and # not containing any character or entity references if not self.elementstack: return if escape and self.contentparams.get('type') == 'application/xhtml+xml': text = _xmlescape(text) self.elementstack[-1][2].append(text) def handle_comment(self, text): # called for each comment, e.g. pass def handle_pi(self, text): # called for each processing instruction, e.g. pass def handle_decl(self, text): pass def parse_declaration(self, i): # override internal declaration handler to handle CDATA blocks if _debug: sys.stderr.write('entering parse_declaration\n') if self.rawdata[i:i+9] == '', i) if k == -1: k = len(self.rawdata) self.handle_data(_xmlescape(self.rawdata[i+9:k]), 0) return k+3 else: k = self.rawdata.find('>', i) return k+1 def mapContentType(self, contentType): contentType = contentType.lower() if contentType == 'text' or contentType == 'plain': contentType = 'text/plain' elif contentType == 'html': contentType = 'text/html' elif contentType == 'xhtml': contentType = 'application/xhtml+xml' return contentType def trackNamespace(self, prefix, uri): loweruri = uri.lower() if (prefix, loweruri) == (None, 'http://my.netscape.com/rdf/simple/0.9/') and not self.version: self.version = 'rss090' if loweruri == 'http://purl.org/rss/1.0/' and not self.version: self.version = 'rss10' if loweruri == 'http://www.w3.org/2005/atom' and not self.version: self.version = 'atom10' if loweruri.find('backend.userland.com/rss') <> -1: # match any backend.userland.com namespace uri = 'http://backend.userland.com/rss' loweruri = uri if self._matchnamespaces.has_key(loweruri): self.namespacemap[prefix] = self._matchnamespaces[loweruri] self.namespacesInUse[self._matchnamespaces[loweruri]] = uri else: self.namespacesInUse[prefix or ''] = uri def resolveURI(self, uri): return _urljoin(self.baseuri or '', uri) def decodeEntities(self, element, data): return data def push(self, element, expectingText): self.elementstack.append([element, expectingText, []]) def pop(self, element, stripWhitespace=1): if not self.elementstack: return if self.elementstack[-1][0] != element: return element, expectingText, pieces = self.elementstack.pop() output = ''.join(pieces) if stripWhitespace: output = output.strip() if not expectingText: return output # decode base64 content if base64 and self.contentparams.get('base64', 0): try: output = base64.decodestring(output) except binascii.Error: pass except binascii.Incomplete: pass # resolve relative URIs if (element in self.can_be_relative_uri) and output: output = self.resolveURI(output) # decode entities within embedded markup if not self.contentparams.get('base64', 0): output = self.decodeEntities(element, output) # remove temporary cruft from contentparams try: del self.contentparams['mode'] except KeyError: pass 
try: del self.contentparams['base64'] except KeyError: pass # resolve relative URIs within embedded markup if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types: if element in self.can_contain_relative_uris: output = _resolveRelativeURIs(output, self.baseuri, self.encoding) # sanitize embedded markup if self.mapContentType(self.contentparams.get('type', 'text/html')) in self.html_types: if element in self.can_contain_dangerous_markup: output = _sanitizeHTML(output, self.encoding) if self.encoding and type(output) != type(u''): try: output = unicode(output, self.encoding) except: pass # categories/tags/keywords/whatever are handled in _end_category if element == 'category': return output # store output in appropriate place(s) if self.inentry and not self.insource: if element == 'content': self.entries[-1].setdefault(element, []) contentparams = copy.deepcopy(self.contentparams) contentparams['value'] = output self.entries[-1][element].append(contentparams) elif element == 'link': self.entries[-1][element] = output if output: self.entries[-1]['links'][-1]['href'] = output else: if element == 'description': element = 'summary' self.entries[-1][element] = output if self.incontent: contentparams = copy.deepcopy(self.contentparams) contentparams['value'] = output self.entries[-1][element + '_detail'] = contentparams elif (self.infeed or self.insource) and (not self.intextinput) and (not self.inimage): context = self._getContext() if element == 'description': element = 'subtitle' context[element] = output if element == 'link': context['links'][-1]['href'] = output elif self.incontent: contentparams = copy.deepcopy(self.contentparams) contentparams['value'] = output context[element + '_detail'] = contentparams return output def pushContent(self, tag, attrsD, defaultContentType, expectingText): self.incontent += 1 self.contentparams = FeedParserDict({ 'type': self.mapContentType(attrsD.get('type', defaultContentType)), 'language': self.lang, 'base': self.baseuri}) self.contentparams['base64'] = self._isBase64(attrsD, self.contentparams) self.push(tag, expectingText) def popContent(self, tag): value = self.pop(tag) self.incontent -= 1 self.contentparams.clear() return value def _mapToStandardPrefix(self, name): colonpos = name.find(':') if colonpos <> -1: prefix = name[:colonpos] suffix = name[colonpos+1:] prefix = self.namespacemap.get(prefix, prefix) name = prefix + ':' + suffix return name def _getAttribute(self, attrsD, name): return attrsD.get(self._mapToStandardPrefix(name)) def _isBase64(self, attrsD, contentparams): if attrsD.get('mode', '') == 'base64': return 1 if self.contentparams['type'].startswith('text/'): return 0 if self.contentparams['type'].endswith('+xml'): return 0 if self.contentparams['type'].endswith('/xml'): return 0 return 1 def _itsAnHrefDamnIt(self, attrsD): href = attrsD.get('url', attrsD.get('uri', attrsD.get('href', None))) if href: try: del attrsD['url'] except KeyError: pass try: del attrsD['uri'] except KeyError: pass attrsD['href'] = href return attrsD def _save(self, key, value): context = self._getContext() context.setdefault(key, value) def _start_rss(self, attrsD): versionmap = {'0.91': 'rss091u', '0.92': 'rss092', '0.93': 'rss093', '0.94': 'rss094'} if not self.version: attr_version = attrsD.get('version', '') version = versionmap.get(attr_version) if version: self.version = version elif attr_version.startswith('2.'): self.version = 'rss20' else: self.version = 'rss' def _start_dlhottitles(self, attrsD): self.version = 'hotrss' 
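# Editor's note: a minimal, standalone sketch (not part of feedparser; the
# _demo_ name is made up) of the element-stack pattern used by push(),
# handle_data() and pop() above: each open element carries a list of text
# pieces, and pop() joins them into the element's final value.
def _demo_element_stack():
    stack = []                                    # entries: [element, expectingText, pieces]
    stack.append(['title', 1, []])                # push('title', 1)
    stack[-1][2].append('Hello ')                 # handle_data('Hello ')
    stack[-1][2].append('world')                  # handle_data('world')
    element, expectingText, pieces = stack.pop()  # pop('title')
    return ''.join(pieces).strip()                # -> 'Hello world'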
def _start_channel(self, attrsD): self.infeed = 1 self._cdf_common(attrsD) _start_feedinfo = _start_channel def _cdf_common(self, attrsD): if attrsD.has_key('lastmod'): self._start_modified({}) self.elementstack[-1][-1] = attrsD['lastmod'] self._end_modified() if attrsD.has_key('href'): self._start_link({}) self.elementstack[-1][-1] = attrsD['href'] self._end_link() def _start_feed(self, attrsD): self.infeed = 1 versionmap = {'0.1': 'atom01', '0.2': 'atom02', '0.3': 'atom03'} if not self.version: attr_version = attrsD.get('version') version = versionmap.get(attr_version) if version: self.version = version else: self.version = 'atom' def _end_channel(self): self.infeed = 0 _end_feed = _end_channel def _start_image(self, attrsD): self.inimage = 1 self.push('image', 0) context = self._getContext() context.setdefault('image', FeedParserDict()) def _end_image(self): self.pop('image') self.inimage = 0 def _start_textinput(self, attrsD): self.intextinput = 1 self.push('textinput', 0) context = self._getContext() context.setdefault('textinput', FeedParserDict()) _start_textInput = _start_textinput def _end_textinput(self): self.pop('textinput') self.intextinput = 0 _end_textInput = _end_textinput def _start_author(self, attrsD): self.inauthor = 1 self.push('author', 1) _start_managingeditor = _start_author _start_dc_author = _start_author _start_dc_creator = _start_author _start_itunes_author = _start_author def _end_author(self): self.pop('author') self.inauthor = 0 self._sync_author_detail() _end_managingeditor = _end_author _end_dc_author = _end_author _end_dc_creator = _end_author _end_itunes_author = _end_author def _start_itunes_owner(self, attrsD): self.inpublisher = 1 self.push('publisher', 0) def _end_itunes_owner(self): self.pop('publisher') self.inpublisher = 0 self._sync_author_detail('publisher') def _start_contributor(self, attrsD): self.incontributor = 1 context = self._getContext() context.setdefault('contributors', []) context['contributors'].append(FeedParserDict()) self.push('contributor', 0) def _end_contributor(self): self.pop('contributor') self.incontributor = 0 def _start_dc_contributor(self, attrsD): self.incontributor = 1 context = self._getContext() context.setdefault('contributors', []) context['contributors'].append(FeedParserDict()) self.push('name', 0) def _end_dc_contributor(self): self._end_name() self.incontributor = 0 def _start_name(self, attrsD): self.push('name', 0) _start_itunes_name = _start_name def _end_name(self): value = self.pop('name') if self.inpublisher: self._save_author('name', value, 'publisher') elif self.inauthor: self._save_author('name', value) elif self.incontributor: self._save_contributor('name', value) elif self.intextinput: context = self._getContext() context['textinput']['name'] = value _end_itunes_name = _end_name def _start_width(self, attrsD): self.push('width', 0) def _end_width(self): value = self.pop('width') try: value = int(value) except: value = 0 if self.inimage: context = self._getContext() context['image']['width'] = value def _start_height(self, attrsD): self.push('height', 0) def _end_height(self): value = self.pop('height') try: value = int(value) except: value = 0 if self.inimage: context = self._getContext() context['image']['height'] = value def _start_url(self, attrsD): self.push('href', 1) _start_homepage = _start_url _start_uri = _start_url def _end_url(self): value = self.pop('href') if self.inauthor: self._save_author('href', value) elif self.incontributor: self._save_contributor('href', value) elif self.inimage: 
context = self._getContext() context['image']['href'] = value elif self.intextinput: context = self._getContext() context['textinput']['link'] = value _end_homepage = _end_url _end_uri = _end_url def _start_email(self, attrsD): self.push('email', 0) _start_itunes_email = _start_email def _end_email(self): value = self.pop('email') if self.inpublisher: self._save_author('email', value, 'publisher') elif self.inauthor: self._save_author('email', value) elif self.incontributor: self._save_contributor('email', value) _end_itunes_email = _end_email def _getContext(self): if self.insource: context = self.sourcedata elif self.inentry: context = self.entries[-1] else: context = self.feeddata return context def _save_author(self, key, value, prefix='author'): context = self._getContext() context.setdefault(prefix + '_detail', FeedParserDict()) context[prefix + '_detail'][key] = value self._sync_author_detail() def _save_contributor(self, key, value): context = self._getContext() context.setdefault('contributors', [FeedParserDict()]) context['contributors'][-1][key] = value def _sync_author_detail(self, key='author'): context = self._getContext() detail = context.get('%s_detail' % key) if detail: name = detail.get('name') email = detail.get('email') if name and email: context[key] = '%s (%s)' % (name, email) elif name: context[key] = name elif email: context[key] = email else: author = context.get(key) if not author: return emailmatch = re.search(r'''(([a-zA-Z0-9\_\-\.\+]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?))''', author) if not emailmatch: return email = emailmatch.group(0) # probably a better way to do the following, but it passes all the tests author = author.replace(email, '') author = author.replace('()', '') author = author.strip() if author and (author[0] == '('): author = author[1:] if author and (author[-1] == ')'): author = author[:-1] author = author.strip() context.setdefault('%s_detail' % key, FeedParserDict()) context['%s_detail' % key]['name'] = author context['%s_detail' % key]['email'] = email def _start_subtitle(self, attrsD): self.pushContent('subtitle', attrsD, 'text/plain', 1) _start_tagline = _start_subtitle _start_itunes_subtitle = _start_subtitle def _end_subtitle(self): self.popContent('subtitle') _end_tagline = _end_subtitle _end_itunes_subtitle = _end_subtitle def _start_rights(self, attrsD): self.pushContent('rights', attrsD, 'text/plain', 1) _start_dc_rights = _start_rights _start_copyright = _start_rights def _end_rights(self): self.popContent('rights') _end_dc_rights = _end_rights _end_copyright = _end_rights def _start_item(self, attrsD): self.entries.append(FeedParserDict()) self.push('item', 0) self.inentry = 1 self.guidislink = 0 id = self._getAttribute(attrsD, 'rdf:about') if id: context = self._getContext() context['id'] = id self._cdf_common(attrsD) _start_entry = _start_item _start_product = _start_item def _end_item(self): self.pop('item') self.inentry = 0 _end_entry = _end_item def _start_dc_language(self, attrsD): self.push('language', 1) _start_language = _start_dc_language def _end_dc_language(self): self.lang = self.pop('language') _end_language = _end_dc_language def _start_dc_publisher(self, attrsD): self.push('publisher', 1) _start_webmaster = _start_dc_publisher def _end_dc_publisher(self): self.pop('publisher') self._sync_author_detail('publisher') _end_webmaster = _end_dc_publisher def _start_published(self, attrsD): self.push('published', 1) _start_dcterms_issued = _start_published 
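# Editor's note: a hedged, simplified illustration (not the feedparser API; the
# regex below is deliberately simpler than the one in _sync_author_detail above,
# and the address is a made-up example) of how a combined author string such as
# 'Jane Doe (jane@example.com)' gets split into separate name and email fields
# of the *_detail dictionary.
def _demo_split_author(author):
    emailmatch = re.search(r'[\w.+-]+@[\w.-]+', author)
    if not emailmatch:
        return {'name': author.strip()}
    email = emailmatch.group(0)
    name = author.replace(email, '').replace('()', '').strip(' ()')
    return {'name': name, 'email': email}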
_start_issued = _start_published def _end_published(self): value = self.pop('published') self._save('published_parsed', _parse_date(value)) _end_dcterms_issued = _end_published _end_issued = _end_published def _start_updated(self, attrsD): self.push('updated', 1) _start_modified = _start_updated _start_dcterms_modified = _start_updated _start_pubdate = _start_updated _start_dc_date = _start_updated def _end_updated(self): value = self.pop('updated') parsed_value = _parse_date(value) self._save('updated_parsed', parsed_value) _end_modified = _end_updated _end_dcterms_modified = _end_updated _end_pubdate = _end_updated _end_dc_date = _end_updated def _start_created(self, attrsD): self.push('created', 1) _start_dcterms_created = _start_created def _end_created(self): value = self.pop('created') self._save('created_parsed', _parse_date(value)) _end_dcterms_created = _end_created def _start_expirationdate(self, attrsD): self.push('expired', 1) def _end_expirationdate(self): self._save('expired_parsed', _parse_date(self.pop('expired'))) def _start_cc_license(self, attrsD): self.push('license', 1) value = self._getAttribute(attrsD, 'rdf:resource') if value: self.elementstack[-1][2].append(value) self.pop('license') def _start_creativecommons_license(self, attrsD): self.push('license', 1) def _end_creativecommons_license(self): self.pop('license') def _addTag(self, term, scheme, label): context = self._getContext() tags = context.setdefault('tags', []) if (not term) and (not scheme) and (not label): return value = FeedParserDict({'term': term, 'scheme': scheme, 'label': label}) if value not in tags: tags.append(FeedParserDict({'term': term, 'scheme': scheme, 'label': label})) def _start_category(self, attrsD): if _debug: sys.stderr.write('entering _start_category with %s\n' % repr(attrsD)) term = attrsD.get('term') scheme = attrsD.get('scheme', attrsD.get('domain')) label = attrsD.get('label') self._addTag(term, scheme, label) self.push('category', 1) _start_dc_subject = _start_category _start_keywords = _start_category def _end_itunes_keywords(self): for term in self.pop('itunes_keywords').split(): self._addTag(term, 'http://www.itunes.com/', None) def _start_itunes_category(self, attrsD): self._addTag(attrsD.get('text'), 'http://www.itunes.com/', None) self.push('category', 1) def _end_category(self): value = self.pop('category') if not value: return context = self._getContext() tags = context['tags'] if value and len(tags) and not tags[-1]['term']: tags[-1]['term'] = value else: self._addTag(value, None, None) _end_dc_subject = _end_category _end_keywords = _end_category _end_itunes_category = _end_category def _start_cloud(self, attrsD): self._getContext()['cloud'] = FeedParserDict(attrsD) def _start_link(self, attrsD): attrsD.setdefault('rel', 'alternate') attrsD.setdefault('type', 'text/html') attrsD = self._itsAnHrefDamnIt(attrsD) if attrsD.has_key('href'): attrsD['href'] = self.resolveURI(attrsD['href']) expectingText = self.infeed or self.inentry or self.insource context = self._getContext() context.setdefault('links', []) context['links'].append(FeedParserDict(attrsD)) if attrsD['rel'] == 'enclosure': self._start_enclosure(attrsD) if attrsD.has_key('href'): expectingText = 0 if (attrsD.get('rel') == 'alternate') and (self.mapContentType(attrsD.get('type')) in self.html_types): context['link'] = attrsD['href'] else: self.push('link', expectingText) _start_producturl = _start_link def _end_link(self): value = self.pop('link') context = self._getContext() if self.intextinput: 
context['textinput']['link'] = value if self.inimage: context['image']['link'] = value _end_producturl = _end_link def _start_guid(self, attrsD): self.guidislink = (attrsD.get('ispermalink', 'true') == 'true') self.push('id', 1) def _end_guid(self): value = self.pop('id') self._save('guidislink', self.guidislink and not self._getContext().has_key('link')) if self.guidislink: # guid acts as link, but only if 'ispermalink' is not present or is 'true', # and only if the item doesn't already have a link element self._save('link', value) def _start_title(self, attrsD): self.pushContent('title', attrsD, 'text/plain', self.infeed or self.inentry or self.insource) def _start_title_low_pri(self, attrsD): if not self._getContext().has_key('title'): self._start_title(attrsD) _start_dc_title = _start_title_low_pri _start_media_title = _start_title_low_pri def _end_title(self): value = self.popContent('title') context = self._getContext() if self.intextinput: context['textinput']['title'] = value elif self.inimage: context['image']['title'] = value def _end_title_low_pri(self): if not self._getContext().has_key('title'): self._end_title() _end_dc_title = _end_title_low_pri _end_media_title = _end_title_low_pri def _start_description(self, attrsD): context = self._getContext() if context.has_key('summary'): self._summaryKey = 'content' self._start_content(attrsD) else: self.pushContent('description', attrsD, 'text/html', self.infeed or self.inentry or self.insource) def _start_abstract(self, attrsD): self.pushContent('description', attrsD, 'text/plain', self.infeed or self.inentry or self.insource) def _end_description(self): if self._summaryKey == 'content': self._end_content() else: value = self.popContent('description') context = self._getContext() if self.intextinput: context['textinput']['description'] = value elif self.inimage: context['image']['description'] = value self._summaryKey = None _end_abstract = _end_description def _start_info(self, attrsD): self.pushContent('info', attrsD, 'text/plain', 1) _start_feedburner_browserfriendly = _start_info def _end_info(self): self.popContent('info') _end_feedburner_browserfriendly = _end_info def _start_generator(self, attrsD): if attrsD: attrsD = self._itsAnHrefDamnIt(attrsD) if attrsD.has_key('href'): attrsD['href'] = self.resolveURI(attrsD['href']) self._getContext()['generator_detail'] = FeedParserDict(attrsD) self.push('generator', 1) def _end_generator(self): value = self.pop('generator') context = self._getContext() if context.has_key('generator_detail'): context['generator_detail']['name'] = value def _start_admin_generatoragent(self, attrsD): self.push('generator', 1) value = self._getAttribute(attrsD, 'rdf:resource') if value: self.elementstack[-1][2].append(value) self.pop('generator') self._getContext()['generator_detail'] = FeedParserDict({'href': value}) def _start_admin_errorreportsto(self, attrsD): self.push('errorreportsto', 1) value = self._getAttribute(attrsD, 'rdf:resource') if value: self.elementstack[-1][2].append(value) self.pop('errorreportsto') def _start_summary(self, attrsD): context = self._getContext() if context.has_key('summary'): self._summaryKey = 'content' self._start_content(attrsD) else: self._summaryKey = 'summary' self.pushContent(self._summaryKey, attrsD, 'text/plain', 1) _start_itunes_summary = _start_summary def _end_summary(self): if self._summaryKey == 'content': self._end_content() else: self.popContent(self._summaryKey or 'summary') self._summaryKey = None _end_itunes_summary = _end_summary def 
_start_enclosure(self, attrsD): attrsD = self._itsAnHrefDamnIt(attrsD) self._getContext().setdefault('enclosures', []).append(FeedParserDict(attrsD)) href = attrsD.get('href') if href: context = self._getContext() if not context.get('id'): context['id'] = href def _start_source(self, attrsD): self.insource = 1 def _end_source(self): self.insource = 0 self._getContext()['source'] = copy.deepcopy(self.sourcedata) self.sourcedata.clear() def _start_content(self, attrsD): self.pushContent('content', attrsD, 'text/plain', 1) src = attrsD.get('src') if src: self.contentparams['src'] = src self.push('content', 1) def _start_prodlink(self, attrsD): self.pushContent('content', attrsD, 'text/html', 1) def _start_body(self, attrsD): self.pushContent('content', attrsD, 'application/xhtml+xml', 1) _start_xhtml_body = _start_body def _start_content_encoded(self, attrsD): self.pushContent('content', attrsD, 'text/html', 1) _start_fullitem = _start_content_encoded def _end_content(self): copyToDescription = self.mapContentType(self.contentparams.get('type')) in (['text/plain'] + self.html_types) value = self.popContent('content') if copyToDescription: self._save('description', value) _end_body = _end_content _end_xhtml_body = _end_content _end_content_encoded = _end_content _end_fullitem = _end_content _end_prodlink = _end_content def _start_itunes_image(self, attrsD): self.push('itunes_image', 0) self._getContext()['image'] = FeedParserDict({'href': attrsD.get('href')}) _start_itunes_link = _start_itunes_image def _end_itunes_block(self): value = self.pop('itunes_block', 0) self._getContext()['itunes_block'] = (value == 'yes') and 1 or 0 def _end_itunes_explicit(self): value = self.pop('itunes_explicit', 0) self._getContext()['itunes_explicit'] = (value == 'yes') and 1 or 0 if _XML_AVAILABLE: class _StrictFeedParser(_FeedParserMixin, xml.sax.handler.ContentHandler): def __init__(self, baseuri, baselang, encoding): if _debug: sys.stderr.write('trying StrictFeedParser\n') xml.sax.handler.ContentHandler.__init__(self) _FeedParserMixin.__init__(self, baseuri, baselang, encoding) self.bozo = 0 self.exc = None def startPrefixMapping(self, prefix, uri): self.trackNamespace(prefix, uri) def startElementNS(self, name, qname, attrs): namespace, localname = name lowernamespace = str(namespace or '').lower() if lowernamespace.find('backend.userland.com/rss') <> -1: # match any backend.userland.com namespace namespace = 'http://backend.userland.com/rss' lowernamespace = namespace if qname and qname.find(':') > 0: givenprefix = qname.split(':')[0] else: givenprefix = None prefix = self._matchnamespaces.get(lowernamespace, givenprefix) if givenprefix and (prefix == None or (prefix == '' and lowernamespace == '')) and not self.namespacesInUse.has_key(givenprefix): raise UndeclaredNamespace, "'%s' is not associated with a namespace" % givenprefix if prefix: localname = prefix + ':' + localname localname = str(localname).lower() if _debug: sys.stderr.write('startElementNS: qname = %s, namespace = %s, givenprefix = %s, prefix = %s, attrs = %s, localname = %s\n' % (qname, namespace, givenprefix, prefix, attrs.items(), localname)) # qname implementation is horribly broken in Python 2.1 (it # doesn't report any), and slightly broken in Python 2.2 (it # doesn't report the xml: namespace). So we match up namespaces # with a known list first, and then possibly override them with # the qnames the SAX parser gives us (if indeed it gives us any # at all). 
Thanks to MatejC for helping me test this and # tirelessly telling me that it didn't work yet. attrsD = {} for (namespace, attrlocalname), attrvalue in attrs._attrs.items(): lowernamespace = (namespace or '').lower() prefix = self._matchnamespaces.get(lowernamespace, '') if prefix: attrlocalname = prefix + ':' + attrlocalname attrsD[str(attrlocalname).lower()] = attrvalue for qname in attrs.getQNames(): attrsD[str(qname).lower()] = attrs.getValueByQName(qname) self.unknown_starttag(localname, attrsD.items()) def characters(self, text): self.handle_data(text) def endElementNS(self, name, qname): namespace, localname = name lowernamespace = str(namespace or '').lower() if qname and qname.find(':') > 0: givenprefix = qname.split(':')[0] else: givenprefix = '' prefix = self._matchnamespaces.get(lowernamespace, givenprefix) if prefix: localname = prefix + ':' + localname localname = str(localname).lower() self.unknown_endtag(localname) def error(self, exc): self.bozo = 1 self.exc = exc def fatalError(self, exc): self.error(exc) raise exc class _BaseHTMLProcessor(sgmllib.SGMLParser): elements_no_end_tag = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr', 'img', 'input', 'isindex', 'link', 'meta', 'param'] def __init__(self, encoding): self.encoding = encoding if _debug: sys.stderr.write('entering BaseHTMLProcessor, encoding=%s\n' % self.encoding) sgmllib.SGMLParser.__init__(self) def reset(self): self.pieces = [] sgmllib.SGMLParser.reset(self) def _shorttag_replace(self, match): tag = match.group(1) if tag in self.elements_no_end_tag: return '<' + tag + ' />' else: return '<' + tag + '>' def feed(self, data): data = re.compile(r'', self._shorttag_replace, data) # bug [ 1399464 ] Bad regexp for _shorttag_replace data = re.sub(r'<([^<\s]+?)\s*/>', self._shorttag_replace, data) data = data.replace(''', "'") data = data.replace('"', '"') if self.encoding and type(data) == type(u''): data = data.encode(self.encoding) sgmllib.SGMLParser.feed(self, data) def normalize_attrs(self, attrs): # utility method to be called by descendants attrs = [(k.lower(), v) for k, v in attrs] attrs = [(k, k in ('rel', 'type') and v.lower() or v) for k, v in attrs] return attrs def unknown_starttag(self, tag, attrs): # called for each start tag # attrs is a list of (attr, value) tuples # e.g. for
<pre class='screen'>, tag='pre', attrs=[('class', 'screen')]
        if _debug: sys.stderr.write('_BaseHTMLProcessor, unknown_starttag, tag=%s\n' % tag)
        uattrs = []
        # thanks to Kevin Marks for this breathtaking hack to deal with (valid) high-bit attribute values in UTF-8 feeds
        for key, value in attrs:
            if type(value) != type(u''):
                value = unicode(value, self.encoding, errors='replace')
            uattrs.append((unicode(key, self.encoding), value))
        strattrs = u''.join([u' %s="%s"' % (key, value) for key, value in uattrs]).encode(self.encoding)
        if tag in self.elements_no_end_tag:
            self.pieces.append('<%(tag)s%(strattrs)s />' % locals())
        else:
            self.pieces.append('<%(tag)s%(strattrs)s>' % locals())
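    # Editor's note: an illustrative helper (hypothetical; nothing in feedparser
    # calls it) restating the reconstruction above: attributes are re-serialized
    # onto the tag, and elements listed in elements_no_end_tag are emitted in
    # self-closing form so the output stays well-formed.
    def _demo_rebuild_starttag(self, tag, uattrs):
        strattrs = ''.join([' %s="%s"' % (key, value) for key, value in uattrs])
        if tag in self.elements_no_end_tag:
            return '<%s%s />' % (tag, strattrs)   # e.g. ('br', []) -> '<br />'
        return '<%s%s>' % (tag, strattrs)         # e.g. ('p', [('class', 'x')]) -> '<p class="x">'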

    def unknown_endtag(self, tag):
        # called for each end tag, e.g. for </pre>
, tag will be 'pre' # Reconstruct the original end tag. if tag not in self.elements_no_end_tag: self.pieces.append("" % locals()) def handle_charref(self, ref): # called for each character reference, e.g. for ' ', ref will be '160' # Reconstruct the original character reference. self.pieces.append('&#%(ref)s;' % locals()) def handle_entityref(self, ref): # called for each entity reference, e.g. for '©', ref will be 'copy' # Reconstruct the original entity reference. self.pieces.append('&%(ref)s;' % locals()) def handle_data(self, text): # called for each block of plain text, i.e. outside of any tag and # not containing any character or entity references # Store the original text verbatim. if _debug: sys.stderr.write('_BaseHTMLProcessor, handle_text, text=%s\n' % text) self.pieces.append(text) def handle_comment(self, text): # called for each HTML comment, e.g. # Reconstruct the original comment. self.pieces.append('' % locals()) def handle_pi(self, text): # called for each processing instruction, e.g. # Reconstruct original processing instruction. self.pieces.append('' % locals()) def handle_decl(self, text): # called for the DOCTYPE, if present, e.g. # # Reconstruct original DOCTYPE self.pieces.append('' % locals()) _new_declname_match = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9:]*\s*').match def _scan_name(self, i, declstartpos): rawdata = self.rawdata n = len(rawdata) if i == n: return None, -1 m = self._new_declname_match(rawdata, i) if m: s = m.group() name = s.strip() if (i + len(s)) == n: return None, -1 # end of buffer return name.lower(), m.end() else: self.handle_data(rawdata) # self.updatepos(declstartpos, i) return None, -1 def output(self): '''Return processed HTML as a single string''' return ''.join([str(p) for p in self.pieces]) class _LooseFeedParser(_FeedParserMixin, _BaseHTMLProcessor): def __init__(self, baseuri, baselang, encoding): sgmllib.SGMLParser.__init__(self) _FeedParserMixin.__init__(self, baseuri, baselang, encoding) def decodeEntities(self, element, data): data = data.replace('<', '<') data = data.replace('<', '<') data = data.replace('>', '>') data = data.replace('>', '>') data = data.replace('&', '&') data = data.replace('&', '&') data = data.replace('"', '"') data = data.replace('"', '"') data = data.replace(''', ''') data = data.replace(''', ''') if self.contentparams.has_key('type') and not self.contentparams.get('type', 'xml').endswith('xml'): data = data.replace('<', '<') data = data.replace('>', '>') data = data.replace('&', '&') data = data.replace('"', '"') data = data.replace(''', "'") return data class _RelativeURIResolver(_BaseHTMLProcessor): relative_uris = [('a', 'href'), ('applet', 'codebase'), ('area', 'href'), ('blockquote', 'cite'), ('body', 'background'), ('del', 'cite'), ('form', 'action'), ('frame', 'longdesc'), ('frame', 'src'), ('iframe', 'longdesc'), ('iframe', 'src'), ('head', 'profile'), ('img', 'longdesc'), ('img', 'src'), ('img', 'usemap'), ('input', 'src'), ('input', 'usemap'), ('ins', 'cite'), ('link', 'href'), ('object', 'classid'), ('object', 'codebase'), ('object', 'data'), ('object', 'usemap'), ('q', 'cite'), ('script', 'src')] def __init__(self, baseuri, encoding): _BaseHTMLProcessor.__init__(self, encoding) self.baseuri = baseuri def resolveURI(self, uri): return _urljoin(self.baseuri, uri) def unknown_starttag(self, tag, attrs): attrs = self.normalize_attrs(attrs) attrs = [(key, ((tag, key) in self.relative_uris) and self.resolveURI(value) or value) for key, value in attrs] _BaseHTMLProcessor.unknown_starttag(self, tag, attrs) 
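# Editor's note: a hedged usage sketch (the URLs are made-up examples) of the
# relative-URI pass implemented by _RelativeURIResolver above and wrapped by
# _resolveRelativeURIs directly below; relative href/src values are resolved
# against the feed's base URI via _urljoin.
def _demo_resolve_relative_uris():
    html = '<a href="/about">about</a>'
    resolved = _resolveRelativeURIs(html, 'http://example.org/feed', 'utf-8')
    return resolved   # expected: '<a href="http://example.org/about">about</a>'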
def _resolveRelativeURIs(htmlSource, baseURI, encoding): if _debug: sys.stderr.write('entering _resolveRelativeURIs\n') p = _RelativeURIResolver(baseURI, encoding) p.feed(htmlSource) return p.output() class _HTMLSanitizer(_BaseHTMLProcessor): acceptable_elements = ['a', 'abbr', 'acronym', 'address', 'area', 'b', 'big', 'blockquote', 'br', 'button', 'caption', 'center', 'cite', 'code', 'col', 'colgroup', 'dd', 'del', 'dfn', 'dir', 'div', 'dl', 'dt', 'em', 'fieldset', 'font', 'form', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'i', 'img', 'input', 'ins', 'kbd', 'label', 'legend', 'li', 'map', 'menu', 'ol', 'optgroup', 'option', 'p', 'pre', 'q', 's', 'samp', 'select', 'small', 'span', 'strike', 'strong', 'sub', 'sup', 'table', 'tbody', 'td', 'textarea', 'tfoot', 'th', 'thead', 'tr', 'tt', 'u', 'ul', 'var'] acceptable_attributes = ['abbr', 'accept', 'accept-charset', 'accesskey', 'action', 'align', 'alt', 'axis', 'border', 'cellpadding', 'cellspacing', 'char', 'charoff', 'charset', 'checked', 'cite', 'class', 'clear', 'cols', 'colspan', 'color', 'compact', 'coords', 'datetime', 'dir', 'disabled', 'enctype', 'for', 'frame', 'headers', 'height', 'href', 'hreflang', 'hspace', 'id', 'ismap', 'label', 'lang', 'longdesc', 'maxlength', 'media', 'method', 'multiple', 'name', 'nohref', 'noshade', 'nowrap', 'prompt', 'readonly', 'rel', 'rev', 'rows', 'rowspan', 'rules', 'scope', 'selected', 'shape', 'size', 'span', 'src', 'start', 'summary', 'tabindex', 'target', 'title', 'type', 'usemap', 'valign', 'value', 'vspace', 'width'] unacceptable_elements_with_end_tag = ['script', 'applet'] def reset(self): _BaseHTMLProcessor.reset(self) self.unacceptablestack = 0 def unknown_starttag(self, tag, attrs): if not tag in self.acceptable_elements: if tag in self.unacceptable_elements_with_end_tag: self.unacceptablestack += 1 return attrs = self.normalize_attrs(attrs) attrs = [(key, value) for key, value in attrs if key in self.acceptable_attributes] _BaseHTMLProcessor.unknown_starttag(self, tag, attrs) def unknown_endtag(self, tag): if not tag in self.acceptable_elements: if tag in self.unacceptable_elements_with_end_tag: self.unacceptablestack -= 1 return _BaseHTMLProcessor.unknown_endtag(self, tag) def handle_pi(self, text): pass def handle_decl(self, text): pass def handle_data(self, text): if not self.unacceptablestack: _BaseHTMLProcessor.handle_data(self, text) def _sanitizeHTML(htmlSource, encoding): p = _HTMLSanitizer(encoding) p.feed(htmlSource) data = p.output() if TIDY_MARKUP: # loop through list of preferred Tidy interfaces looking for one that's installed, # then set up a common _tidy function to wrap the interface-specific API. 
_tidy = None for tidy_interface in PREFERRED_TIDY_INTERFACES: try: if tidy_interface == "uTidy": from tidy import parseString as _utidy def _tidy(data, **kwargs): return str(_utidy(data, **kwargs)) break elif tidy_interface == "mxTidy": from mx.Tidy import Tidy as _mxtidy def _tidy(data, **kwargs): nerrors, nwarnings, data, errordata = _mxtidy.tidy(data, **kwargs) return data break except: pass if _tidy: utf8 = type(data) == type(u'') if utf8: data = data.encode('utf-8') data = _tidy(data, output_xhtml=1, numeric_entities=1, wrap=0, char_encoding="utf8") if utf8: data = unicode(data, 'utf-8') if data.count(''): data = data.split('>', 1)[1] if data.count('= '2.3.3' assert base64 != None user, passw = base64.decodestring(req.headers['Authorization'].split(' ')[1]).split(':') realm = re.findall('realm="([^"]*)"', headers['WWW-Authenticate'])[0] self.add_password(realm, host, user, passw) retry = self.http_error_auth_reqed('www-authenticate', host, req, headers) self.reset_retry_count() return retry except: return self.http_error_default(req, fp, code, msg, headers) def _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers): """URL, filename, or string --> stream This function lets you define parsers that take any input source (URL, pathname to local or network file, or actual data as a string) and deal with it in a uniform manner. Returned object is guaranteed to have all the basic stdio read methods (read, readline, readlines). Just .close() the object when you're done with it. If the etag argument is supplied, it will be used as the value of an If-None-Match request header. If the modified argument is supplied, it must be a tuple of 9 integers as returned by gmtime() in the standard Python time module. This MUST be in GMT (Greenwich Mean Time). The formatted date/time will be used as the value of an If-Modified-Since request header. If the agent argument is supplied, it will be used as the value of a User-Agent request header. If the referrer argument is supplied, it will be used as the value of a Referer[sic] request header. If handlers is supplied, it is a list of handlers used to build a urllib2 opener. """ if hasattr(url_file_stream_or_string, 'read'): return url_file_stream_or_string if url_file_stream_or_string == '-': return sys.stdin if urlparse.urlparse(url_file_stream_or_string)[0] in ('http', 'https', 'ftp'): if not agent: agent = USER_AGENT # test for inline user:password for basic auth auth = None if base64: urltype, rest = urllib.splittype(url_file_stream_or_string) realhost, rest = urllib.splithost(rest) if realhost: user_passwd, realhost = urllib.splituser(realhost) if user_passwd: url_file_stream_or_string = '%s://%s%s' % (urltype, realhost, rest) auth = base64.encodestring(user_passwd).strip() # try to open with urllib2 (to use optional headers) request = urllib2.Request(url_file_stream_or_string) request.add_header('User-Agent', agent) if etag: request.add_header('If-None-Match', etag) if modified: # format into an RFC 1123-compliant timestamp. We can't use # time.strftime() since the %a and %b directives can be affected # by the current locale, but RFC 2616 states that dates must be # in English. 
short_weekdays = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'] months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'] request.add_header('If-Modified-Since', '%s, %02d %s %04d %02d:%02d:%02d GMT' % (short_weekdays[modified[6]], modified[2], months[modified[1] - 1], modified[0], modified[3], modified[4], modified[5])) if referrer: request.add_header('Referer', referrer) if gzip and zlib: request.add_header('Accept-encoding', 'gzip, deflate') elif gzip: request.add_header('Accept-encoding', 'gzip') elif zlib: request.add_header('Accept-encoding', 'deflate') else: request.add_header('Accept-encoding', '') if auth: request.add_header('Authorization', 'Basic %s' % auth) if ACCEPT_HEADER: request.add_header('Accept', ACCEPT_HEADER) request.add_header('A-IM', 'feed') # RFC 3229 support opener = apply(urllib2.build_opener, tuple(handlers + [_FeedURLHandler()])) opener.addheaders = [] # RMK - must clear so we only send our custom User-Agent try: return opener.open(request) finally: opener.close() # JohnD # try to open with native open function (if url_file_stream_or_string is a filename) try: return open(url_file_stream_or_string) except: pass # treat url_file_stream_or_string as string return _StringIO(str(url_file_stream_or_string)) _date_handlers = [] def registerDateHandler(func): '''Register a date handler function (takes string, returns 9-tuple date in GMT)''' _date_handlers.insert(0, func) # ISO-8601 date parsing routines written by Fazal Majid. # The ISO 8601 standard is very convoluted and irregular - a full ISO 8601 # parser is beyond the scope of feedparser and would be a worthwhile addition # to the Python library. # A single regular expression cannot parse ISO 8601 date formats into groups # as the standard is highly irregular (for instance is 030104 2003-01-04 or # 0301-04-01), so we use templates instead. # Please note the order in templates is significant because we need a # greedy match. _iso8601_tmpl = ['YYYY-?MM-?DD', 'YYYY-MM', 'YYYY-?OOO', 'YY-?MM-?DD', 'YY-?OOO', 'YYYY', '-YY-?MM', '-OOO', '-YY', '--MM-?DD', '--MM', '---DD', 'CC', ''] _iso8601_re = [ tmpl.replace( 'YYYY', r'(?P\d{4})').replace( 'YY', r'(?P\d\d)').replace( 'MM', r'(?P[01]\d)').replace( 'DD', r'(?P[0123]\d)').replace( 'OOO', r'(?P[0123]\d\d)').replace( 'CC', r'(?P\d\d$)') + r'(T?(?P\d{2}):(?P\d{2})' + r'(:(?P\d{2}))?' + r'(?P[+-](?P\d{2})(:(?P\d{2}))?|Z)?)?' for tmpl in _iso8601_tmpl] del tmpl _iso8601_matches = [re.compile(regex).match for regex in _iso8601_re] del regex def _parse_date_iso8601(dateString): '''Parse a variety of ISO-8601-compatible formats like 20040105''' m = None for _iso8601_match in _iso8601_matches: m = _iso8601_match(dateString) if m: break if not m: return if m.span() == (0, 0): return params = m.groupdict() ordinal = params.get('ordinal', 0) if ordinal: ordinal = int(ordinal) else: ordinal = 0 year = params.get('year', '--') if not year or year == '--': year = time.gmtime()[0] elif len(year) == 2: # ISO 8601 assumes current century, i.e. 
93 -> 2093, NOT 1993 year = 100 * int(time.gmtime()[0] / 100) + int(year) else: year = int(year) month = params.get('month', '-') if not month or month == '-': # ordinals are NOT normalized by mktime, we simulate them # by setting month=1, day=ordinal if ordinal: month = 1 else: month = time.gmtime()[1] month = int(month) day = params.get('day', 0) if not day: # see above if ordinal: day = ordinal elif params.get('century', 0) or \ params.get('year', 0) or params.get('month', 0): day = 1 else: day = time.gmtime()[2] else: day = int(day) # special case of the century - is the first year of the 21st century # 2000 or 2001 ? The debate goes on... if 'century' in params.keys(): year = (int(params['century']) - 1) * 100 + 1 # in ISO 8601 most fields are optional for field in ['hour', 'minute', 'second', 'tzhour', 'tzmin']: if not params.get(field, None): params[field] = 0 hour = int(params.get('hour', 0)) minute = int(params.get('minute', 0)) second = int(params.get('second', 0)) # weekday is normalized by mktime(), we can ignore it weekday = 0 # daylight savings is complex, but not needed for feedparser's purposes # as time zones, if specified, include mention of whether it is active # (e.g. PST vs. PDT, CET). Using -1 is implementation-dependent and # and most implementations have DST bugs daylight_savings_flag = 0 tm = [year, month, day, hour, minute, second, weekday, ordinal, daylight_savings_flag] # ISO 8601 time zone adjustments tz = params.get('tz') if tz and tz != 'Z': if tz[0] == '-': tm[3] += int(params.get('tzhour', 0)) tm[4] += int(params.get('tzmin', 0)) elif tz[0] == '+': tm[3] -= int(params.get('tzhour', 0)) tm[4] -= int(params.get('tzmin', 0)) else: return None # Python's time.mktime() is a wrapper around the ANSI C mktime(3c) # which is guaranteed to normalize d/m/y/h/m/s. # Many implementations have bugs, but we'll pretend they don't. return time.localtime(time.mktime(tm)) registerDateHandler(_parse_date_iso8601) # 8-bit date handling routines written by ytrewq1. 
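# Editor's note: a hedged usage sketch of _parse_date_iso8601() registered just
# above (the date string is a made-up example).  Each registered handler returns
# a 9-tuple, so the calendar fields can be read positionally.
def _demo_parse_iso8601():
    t = _parse_date_iso8601('2004-01-05T12:30:00Z')
    return t and t[:3]   # expected: (2004, 1, 5)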
_korean_year = u'\ub144' # b3e2 in euc-kr _korean_month = u'\uc6d4' # bff9 in euc-kr _korean_day = u'\uc77c' # c0cf in euc-kr _korean_am = u'\uc624\uc804' # bfc0 c0fc in euc-kr _korean_pm = u'\uc624\ud6c4' # bfc0 c8c4 in euc-kr _korean_onblog_date_re = \ re.compile('(\d{4})%s\s+(\d{2})%s\s+(\d{2})%s\s+(\d{2}):(\d{2}):(\d{2})' % \ (_korean_year, _korean_month, _korean_day)) _korean_nate_date_re = \ re.compile(u'(\d{4})-(\d{2})-(\d{2})\s+(%s|%s)\s+(\d{,2}):(\d{,2}):(\d{,2})' % \ (_korean_am, _korean_pm)) def _parse_date_onblog(dateString): '''Parse a string according to the OnBlog 8-bit date format''' m = _korean_onblog_date_re.match(dateString) if not m: return w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \ {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\ 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\ 'zonediff': '+09:00'} if _debug: sys.stderr.write('OnBlog date parsed as: %s\n' % w3dtfdate) return _parse_date_w3dtf(w3dtfdate) registerDateHandler(_parse_date_onblog) def _parse_date_nate(dateString): '''Parse a string according to the Nate 8-bit date format''' m = _korean_nate_date_re.match(dateString) if not m: return hour = int(m.group(5)) ampm = m.group(4) if (ampm == _korean_pm): hour += 12 hour = str(hour) if len(hour) == 1: hour = '0' + hour w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \ {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\ 'hour': hour, 'minute': m.group(6), 'second': m.group(7),\ 'zonediff': '+09:00'} if _debug: sys.stderr.write('Nate date parsed as: %s\n' % w3dtfdate) return _parse_date_w3dtf(w3dtfdate) registerDateHandler(_parse_date_nate) _mssql_date_re = \ re.compile('(\d{4})-(\d{2})-(\d{2})\s+(\d{2}):(\d{2}):(\d{2})(\.\d+)?') def _parse_date_mssql(dateString): '''Parse a string according to the MS SQL date format''' m = _mssql_date_re.match(dateString) if not m: return w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s:%(second)s%(zonediff)s' % \ {'year': m.group(1), 'month': m.group(2), 'day': m.group(3),\ 'hour': m.group(4), 'minute': m.group(5), 'second': m.group(6),\ 'zonediff': '+09:00'} if _debug: sys.stderr.write('MS SQL date parsed as: %s\n' % w3dtfdate) return _parse_date_w3dtf(w3dtfdate) registerDateHandler(_parse_date_mssql) # Unicode strings for Greek date strings _greek_months = \ { \ u'\u0399\u03b1\u03bd': u'Jan', # c9e1ed in iso-8859-7 u'\u03a6\u03b5\u03b2': u'Feb', # d6e5e2 in iso-8859-7 u'\u039c\u03ac\u03ce': u'Mar', # ccdcfe in iso-8859-7 u'\u039c\u03b1\u03ce': u'Mar', # cce1fe in iso-8859-7 u'\u0391\u03c0\u03c1': u'Apr', # c1f0f1 in iso-8859-7 u'\u039c\u03ac\u03b9': u'May', # ccdce9 in iso-8859-7 u'\u039c\u03b1\u03ca': u'May', # cce1fa in iso-8859-7 u'\u039c\u03b1\u03b9': u'May', # cce1e9 in iso-8859-7 u'\u0399\u03bf\u03cd\u03bd': u'Jun', # c9effded in iso-8859-7 u'\u0399\u03bf\u03bd': u'Jun', # c9efed in iso-8859-7 u'\u0399\u03bf\u03cd\u03bb': u'Jul', # c9effdeb in iso-8859-7 u'\u0399\u03bf\u03bb': u'Jul', # c9f9eb in iso-8859-7 u'\u0391\u03cd\u03b3': u'Aug', # c1fde3 in iso-8859-7 u'\u0391\u03c5\u03b3': u'Aug', # c1f5e3 in iso-8859-7 u'\u03a3\u03b5\u03c0': u'Sep', # d3e5f0 in iso-8859-7 u'\u039f\u03ba\u03c4': u'Oct', # cfeaf4 in iso-8859-7 u'\u039d\u03bf\u03ad': u'Nov', # cdefdd in iso-8859-7 u'\u039d\u03bf\u03b5': u'Nov', # cdefe5 in iso-8859-7 u'\u0394\u03b5\u03ba': u'Dec', # c4e5ea in iso-8859-7 } _greek_wdays = \ { \ u'\u039a\u03c5\u03c1': u'Sun', # caf5f1 in iso-8859-7 u'\u0394\u03b5\u03c5': u'Mon', # c4e5f5 
in iso-8859-7 u'\u03a4\u03c1\u03b9': u'Tue', # d4f1e9 in iso-8859-7 u'\u03a4\u03b5\u03c4': u'Wed', # d4e5f4 in iso-8859-7 u'\u03a0\u03b5\u03bc': u'Thu', # d0e5ec in iso-8859-7 u'\u03a0\u03b1\u03c1': u'Fri', # d0e1f1 in iso-8859-7 u'\u03a3\u03b1\u03b2': u'Sat', # d3e1e2 in iso-8859-7 } _greek_date_format_re = \ re.compile(u'([^,]+),\s+(\d{2})\s+([^\s]+)\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+([^\s]+)') def _parse_date_greek(dateString): '''Parse a string according to a Greek 8-bit date format.''' m = _greek_date_format_re.match(dateString) if not m: return try: wday = _greek_wdays[m.group(1)] month = _greek_months[m.group(3)] except: return rfc822date = '%(wday)s, %(day)s %(month)s %(year)s %(hour)s:%(minute)s:%(second)s %(zonediff)s' % \ {'wday': wday, 'day': m.group(2), 'month': month, 'year': m.group(4),\ 'hour': m.group(5), 'minute': m.group(6), 'second': m.group(7),\ 'zonediff': m.group(8)} if _debug: sys.stderr.write('Greek date parsed as: %s\n' % rfc822date) return _parse_date_rfc822(rfc822date) registerDateHandler(_parse_date_greek) # Unicode strings for Hungarian date strings _hungarian_months = \ { \ u'janu\u00e1r': u'01', # e1 in iso-8859-2 u'febru\u00e1ri': u'02', # e1 in iso-8859-2 u'm\u00e1rcius': u'03', # e1 in iso-8859-2 u'\u00e1prilis': u'04', # e1 in iso-8859-2 u'm\u00e1ujus': u'05', # e1 in iso-8859-2 u'j\u00fanius': u'06', # fa in iso-8859-2 u'j\u00falius': u'07', # fa in iso-8859-2 u'augusztus': u'08', u'szeptember': u'09', u'okt\u00f3ber': u'10', # f3 in iso-8859-2 u'november': u'11', u'december': u'12', } _hungarian_date_format_re = \ re.compile(u'(\d{4})-([^-]+)-(\d{,2})T(\d{,2}):(\d{2})((\+|-)(\d{,2}:\d{2}))') def _parse_date_hungarian(dateString): '''Parse a string according to a Hungarian 8-bit date format.''' m = _hungarian_date_format_re.match(dateString) if not m: return try: month = _hungarian_months[m.group(2)] day = m.group(3) if len(day) == 1: day = '0' + day hour = m.group(4) if len(hour) == 1: hour = '0' + hour except: return w3dtfdate = '%(year)s-%(month)s-%(day)sT%(hour)s:%(minute)s%(zonediff)s' % \ {'year': m.group(1), 'month': month, 'day': day,\ 'hour': hour, 'minute': m.group(5),\ 'zonediff': m.group(6)} if _debug: sys.stderr.write('Hungarian date parsed as: %s\n' % w3dtfdate) return _parse_date_w3dtf(w3dtfdate) registerDateHandler(_parse_date_hungarian) # W3DTF-style date parsing adapted from PyXML xml.utils.iso8601, written by # Drake and licensed under the Python license. 
Removed all range checking # for month, day, hour, minute, and second, since mktime will normalize # these later def _parse_date_w3dtf(dateString): def __extract_date(m): year = int(m.group('year')) if year < 100: year = 100 * int(time.gmtime()[0] / 100) + int(year) if year < 1000: return 0, 0, 0 julian = m.group('julian') if julian: julian = int(julian) month = julian / 30 + 1 day = julian % 30 + 1 jday = None while jday != julian: t = time.mktime((year, month, day, 0, 0, 0, 0, 0, 0)) jday = time.gmtime(t)[-2] diff = abs(jday - julian) if jday > julian: if diff < day: day = day - diff else: month = month - 1 day = 31 elif jday < julian: if day + diff < 28: day = day + diff else: month = month + 1 return year, month, day month = m.group('month') day = 1 if month is None: month = 1 else: month = int(month) day = m.group('day') if day: day = int(day) else: day = 1 return year, month, day def __extract_time(m): if not m: return 0, 0, 0 hours = m.group('hours') if not hours: return 0, 0, 0 hours = int(hours) minutes = int(m.group('minutes')) seconds = m.group('seconds') if seconds: seconds = int(seconds) else: seconds = 0 return hours, minutes, seconds def __extract_tzd(m): '''Return the Time Zone Designator as an offset in seconds from UTC.''' if not m: return 0 tzd = m.group('tzd') if not tzd: return 0 if tzd == 'Z': return 0 hours = int(m.group('tzdhours')) minutes = m.group('tzdminutes') if minutes: minutes = int(minutes) else: minutes = 0 offset = (hours*60 + minutes) * 60 if tzd[0] == '+': return -offset return offset __date_re = ('(?P\d\d\d\d)' '(?:(?P-|)' '(?:(?P\d\d\d)' '|(?P\d\d)(?:(?P=dsep)(?P\d\d))?))?') __tzd_re = '(?P[-+](?P\d\d)(?::?(?P\d\d))|Z)' __tzd_rx = re.compile(__tzd_re) __time_re = ('(?P\d\d)(?P:|)(?P\d\d)' '(?:(?P=tsep)(?P\d\d(?:[.,]\d+)?))?' + __tzd_re) __datetime_re = '%s(?:T%s)?' % (__date_re, __time_re) __datetime_rx = re.compile(__datetime_re) m = __datetime_rx.match(dateString) if (m is None) or (m.group() != dateString): return gmt = __extract_date(m) + __extract_time(m) + (0, 0, 0) if gmt[0] == 0: return return time.gmtime(time.mktime(gmt) + __extract_tzd(m) - time.timezone) registerDateHandler(_parse_date_w3dtf) def _parse_date_rfc822(dateString): '''Parse an RFC822, RFC1123, RFC2822, or asctime-style date''' data = dateString.split() if data[0][-1] in (',', '.') or data[0].lower() in rfc822._daynames: del data[0] if len(data) == 4: s = data[3] i = s.find('+') if i > 0: data[3:] = [s[:i], s[i+1:]] else: data.append('') dateString = " ".join(data) if len(data) < 5: dateString += ' 00:00:00 GMT' tm = rfc822.parsedate_tz(dateString) if tm: return time.gmtime(rfc822.mktime_tz(tm)) # rfc822.py defines several time zones, but we define some extra ones. # 'ET' is equivalent to 'EST', etc. 
_additional_timezones = {'AT': -400, 'ET': -500, 'CT': -600, 'MT': -700, 'PT': -800} rfc822._timezones.update(_additional_timezones) registerDateHandler(_parse_date_rfc822) def _parse_date(dateString): '''Parses a variety of date formats into a 9-tuple in GMT''' for handler in _date_handlers: try: date9tuple = handler(dateString) if not date9tuple: continue if len(date9tuple) != 9: if _debug: sys.stderr.write('date handler function must return 9-tuple\n') raise ValueError map(int, date9tuple) return date9tuple except Exception, e: if _debug: sys.stderr.write('%s raised %s\n' % (handler.__name__, repr(e))) pass return None def _getCharacterEncoding(http_headers, xml_data): '''Get the character encoding of the XML document http_headers is a dictionary xml_data is a raw string (not Unicode) This is so much trickier than it sounds, it's not even funny. According to RFC 3023 ('XML Media Types'), if the HTTP Content-Type is application/xml, application/*+xml, application/xml-external-parsed-entity, or application/xml-dtd, the encoding given in the charset parameter of the HTTP Content-Type takes precedence over the encoding given in the XML prefix within the document, and defaults to 'utf-8' if neither are specified. But, if the HTTP Content-Type is text/xml, text/*+xml, or text/xml-external-parsed-entity, the encoding given in the XML prefix within the document is ALWAYS IGNORED and only the encoding given in the charset parameter of the HTTP Content-Type header should be respected, and it defaults to 'us-ascii' if not specified. Furthermore, discussion on the atom-syntax mailing list with the author of RFC 3023 leads me to the conclusion that any document served with a Content-Type of text/* and no charset parameter must be treated as us-ascii. (We now do this.) And also that it must always be flagged as non-well-formed. (We now do this too.) If Content-Type is unspecified (input was local file or non-HTTP source) or unrecognized (server just got it totally wrong), then go by the encoding given in the XML prefix of the document and default to 'iso-8859-1' as per the HTTP specification (RFC 2616). Then, assuming we didn't find a character encoding in the HTTP headers (and the HTTP Content-type allowed us to look in the body), we need to sniff the first few bytes of the XML data and try to determine whether the encoding is ASCII-compatible. Section F of the XML specification shows the way here: http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info If the sniffed encoding is not ASCII-compatible, we need to make it ASCII compatible so that we can sniff further into the XML declaration to find the encoding attribute, which will tell us the true encoding. Of course, none of this guarantees that we will be able to parse the feed in the declared character encoding (assuming it was declared correctly, which many are not). CJKCodecs and iconv_codec help a lot; you should definitely install them if you can. 
http://cjkpython.i18n.org/ ''' def _parseHTTPContentType(content_type): '''takes HTTP Content-Type header and returns (content type, charset) If no charset is specified, returns (content type, '') If no content type is specified, returns ('', '') Both return parameters are guaranteed to be lowercase strings ''' content_type = content_type or '' content_type, params = cgi.parse_header(content_type) return content_type, params.get('charset', '').replace("'", '') sniffed_xml_encoding = '' xml_encoding = '' true_encoding = '' http_content_type, http_encoding = _parseHTTPContentType(http_headers.get('content-type')) # Must sniff for non-ASCII-compatible character encodings before # searching for XML declaration. This heuristic is defined in # section F of the XML specification: # http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info try: if xml_data[:4] == '\x4c\x6f\xa7\x94': # EBCDIC xml_data = _ebcdic_to_ascii(xml_data) elif xml_data[:4] == '\x00\x3c\x00\x3f': # UTF-16BE sniffed_xml_encoding = 'utf-16be' xml_data = unicode(xml_data, 'utf-16be').encode('utf-8') elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') and (xml_data[2:4] != '\x00\x00'): # UTF-16BE with BOM sniffed_xml_encoding = 'utf-16be' xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8') elif xml_data[:4] == '\x3c\x00\x3f\x00': # UTF-16LE sniffed_xml_encoding = 'utf-16le' xml_data = unicode(xml_data, 'utf-16le').encode('utf-8') elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and (xml_data[2:4] != '\x00\x00'): # UTF-16LE with BOM sniffed_xml_encoding = 'utf-16le' xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8') elif xml_data[:4] == '\x00\x00\x00\x3c': # UTF-32BE sniffed_xml_encoding = 'utf-32be' xml_data = unicode(xml_data, 'utf-32be').encode('utf-8') elif xml_data[:4] == '\x3c\x00\x00\x00': # UTF-32LE sniffed_xml_encoding = 'utf-32le' xml_data = unicode(xml_data, 'utf-32le').encode('utf-8') elif xml_data[:4] == '\x00\x00\xfe\xff': # UTF-32BE with BOM sniffed_xml_encoding = 'utf-32be' xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8') elif xml_data[:4] == '\xff\xfe\x00\x00': # UTF-32LE with BOM sniffed_xml_encoding = 'utf-32le' xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8') elif xml_data[:3] == '\xef\xbb\xbf': # UTF-8 with BOM sniffed_xml_encoding = 'utf-8' xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8') else: # ASCII-compatible pass xml_encoding_match = re.compile('^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data) except: xml_encoding_match = None if xml_encoding_match: xml_encoding = xml_encoding_match.groups()[0].lower() if sniffed_xml_encoding and (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode', 'iso-10646-ucs-4', 'ucs-4', 'csucs4', 'utf-16', 'utf-32', 'utf_16', 'utf_32', 'utf16', 'u16')): xml_encoding = sniffed_xml_encoding acceptable_content_type = 0 application_content_types = ('application/xml', 'application/xml-dtd', 'application/xml-external-parsed-entity') text_content_types = ('text/xml', 'text/xml-external-parsed-entity') if (http_content_type in application_content_types) or \ (http_content_type.startswith('application/') and http_content_type.endswith('+xml')): acceptable_content_type = 1 true_encoding = http_encoding or xml_encoding or 'utf-8' elif (http_content_type in text_content_types) or \ (http_content_type.startswith('text/')) and http_content_type.endswith('+xml'): acceptable_content_type = 1 true_encoding = http_encoding or 'us-ascii' elif http_content_type.startswith('text/'): true_encoding = http_encoding or 'us-ascii' 
elif http_headers and (not http_headers.has_key('content-type')): true_encoding = xml_encoding or 'iso-8859-1' else: true_encoding = xml_encoding or 'utf-8' return true_encoding, http_encoding, xml_encoding, sniffed_xml_encoding, acceptable_content_type def _toUTF8(data, encoding): '''Changes an XML data stream on the fly to specify a new encoding data is a raw sequence of bytes (not Unicode) that is presumed to be in %encoding already encoding is a string recognized by encodings.aliases ''' if _debug: sys.stderr.write('entering _toUTF8, trying encoding %s\n' % encoding) # strip Byte Order Mark (if present) if (len(data) >= 4) and (data[:2] == '\xfe\xff') and (data[2:4] != '\x00\x00'): if _debug: sys.stderr.write('stripping BOM\n') if encoding != 'utf-16be': sys.stderr.write('trying utf-16be instead\n') encoding = 'utf-16be' data = data[2:] elif (len(data) >= 4) and (data[:2] == '\xff\xfe') and (data[2:4] != '\x00\x00'): if _debug: sys.stderr.write('stripping BOM\n') if encoding != 'utf-16le': sys.stderr.write('trying utf-16le instead\n') encoding = 'utf-16le' data = data[2:] elif data[:3] == '\xef\xbb\xbf': if _debug: sys.stderr.write('stripping BOM\n') if encoding != 'utf-8': sys.stderr.write('trying utf-8 instead\n') encoding = 'utf-8' data = data[3:] elif data[:4] == '\x00\x00\xfe\xff': if _debug: sys.stderr.write('stripping BOM\n') if encoding != 'utf-32be': sys.stderr.write('trying utf-32be instead\n') encoding = 'utf-32be' data = data[4:] elif data[:4] == '\xff\xfe\x00\x00': if _debug: sys.stderr.write('stripping BOM\n') if encoding != 'utf-32le': sys.stderr.write('trying utf-32le instead\n') encoding = 'utf-32le' data = data[4:] newdata = unicode(data, encoding) if _debug: sys.stderr.write('successfully converted %s data to unicode\n' % encoding) declmatch = re.compile('^<\?xml[^>]*?>') newdecl = '''''' if declmatch.search(newdata): newdata = declmatch.sub(newdecl, newdata) else: newdata = newdecl + u'\n' + newdata return newdata.encode('utf-8') def _stripDoctype(data): '''Strips DOCTYPE from XML document, returns (rss_version, stripped_data) rss_version may be 'rss091n' or None stripped_data is the same XML document, minus the DOCTYPE ''' entity_pattern = re.compile(r']*?)>', re.MULTILINE) data = entity_pattern.sub('', data) doctype_pattern = re.compile(r']*?)>', re.MULTILINE) doctype_results = doctype_pattern.findall(data) doctype = doctype_results and doctype_results[0] or '' if doctype.lower().count('netscape'): version = 'rss091n' else: version = None data = doctype_pattern.sub('', data) return version, data def parse(url_file_stream_or_string, etag=None, modified=None, agent=None, referrer=None, handlers=[]): '''Parse a feed from a URL, file, stream, or string''' result = FeedParserDict() result['feed'] = FeedParserDict() result['entries'] = [] if _XML_AVAILABLE: result['bozo'] = 0 if type(handlers) == types.InstanceType: handlers = [handlers] try: f = _open_resource(url_file_stream_or_string, etag, modified, agent, referrer, handlers) data = f.read() except Exception, e: result['bozo'] = 1 result['bozo_exception'] = e data = '' f = None # if feed is gzip-compressed, decompress it if f and data and hasattr(f, 'headers'): if gzip and f.headers.get('content-encoding', '') == 'gzip': try: data = gzip.GzipFile(fileobj=_StringIO(data)).read() except Exception, e: # Some feeds claim to be gzipped but they're not, so # we get garbage. Ideally, we should re-request the # feed without the 'Accept-encoding: gzip' header, # but we don't. 
result['bozo'] = 1 result['bozo_exception'] = e data = '' elif zlib and f.headers.get('content-encoding', '') == 'deflate': try: data = zlib.decompress(data, -zlib.MAX_WBITS) except Exception, e: result['bozo'] = 1 result['bozo_exception'] = e data = '' # save HTTP headers if hasattr(f, 'info'): info = f.info() if info.has_key('Etag'): result['etag'] = info.getheader('ETag') last_modified = info.getheader('Last-Modified') if last_modified: result['modified'] = _parse_date(last_modified) if hasattr(f, 'url'): result['href'] = f.url result['status'] = 200 if hasattr(f, 'status'): result['status'] = f.status if hasattr(f, 'headers'): result['headers'] = f.headers.dict if hasattr(f, 'close'): f.close() # there are four encodings to keep track of: # - http_encoding is the encoding declared in the Content-Type HTTP header # - xml_encoding is the encoding declared in the ; changed # project name #2.5 - 7/25/2003 - MAP - changed to Python license (all contributors agree); # removed unnecessary urllib code -- urllib2 should always be available anyway; # return actual url, status, and full HTTP headers (as result['url'], # result['status'], and result['headers']) if parsing a remote feed over HTTP -- # this should pass all the HTTP tests at ; # added the latest namespace-of-the-week for RSS 2.0 #2.5.1 - 7/26/2003 - RMK - clear opener.addheaders so we only send our custom # User-Agent (otherwise urllib2 sends two, which confuses some servers) #2.5.2 - 7/28/2003 - MAP - entity-decode inline xml properly; added support for # inline and as used in some RSS 2.0 feeds #2.5.3 - 8/6/2003 - TvdV - patch to track whether we're inside an image or # textInput, and also to return the character encoding (if specified) #2.6 - 1/1/2004 - MAP - dc:author support (MarekK); fixed bug tracking # nested divs within content (JohnD); fixed missing sys import (JohanS); # fixed regular expression to capture XML character encoding (Andrei); # added support for Atom 0.3-style links; fixed bug with textInput tracking; # added support for cloud (MartijnP); added support for multiple # category/dc:subject (MartijnP); normalize content model: 'description' gets # description (which can come from description, summary, or full content if no # description), 'content' gets dict of base/language/type/value (which can come # from content:encoded, xhtml:body, content, or fullitem); # fixed bug matching arbitrary Userland namespaces; added xml:base and xml:lang # tracking; fixed bug tracking unknown tags; fixed bug tracking content when # element is not in default namespace (like Pocketsoap feed); # resolve relative URLs in link, guid, docs, url, comments, wfw:comment, # wfw:commentRSS; resolve relative URLs within embedded HTML markup in # description, xhtml:body, content, content:encoded, title, subtitle, # summary, info, tagline, and copyright; added support for pingback and # trackback namespaces #2.7 - 1/5/2004 - MAP - really added support for trackback and pingback # namespaces, as opposed to 2.6 when I said I did but didn't really; # sanitize HTML markup within some elements; added mxTidy support (if # installed) to tidy HTML markup within some elements; fixed indentation # bug in _parse_date (FazalM); use socket.setdefaulttimeout if available # (FazalM); universal date parsing and normalization (FazalM): 'created', modified', # 'issued' are parsed into 9-tuple date format and stored in 'created_parsed', # 'modified_parsed', and 'issued_parsed'; 'date' is duplicated in 'modified' # and vice-versa; 'date_parsed' is duplicated in 
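# Illustrative sketch of the gzip/deflate handling above (standalone, not the
# parse() code itself): gzip bodies carry their own header, while raw deflate
# needs the negative window-bits argument. Bad data raises, which the caller
# above records as a bozo_exception.
import gzip
import io
import zlib

def decompress_body(data, content_encoding):
    if content_encoding == 'gzip':
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    if content_encoding == 'deflate':
        return zlib.decompress(data, -zlib.MAX_WBITS)
    return data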
'modified_parsed' and vice-versa #2.7.1 - 1/9/2004 - MAP - fixed bug handling " and '. fixed memory # leak not closing url opener (JohnD); added dc:publisher support (MarekK); # added admin:errorReportsTo support (MarekK); Python 2.1 dict support (MarekK) #2.7.4 - 1/14/2004 - MAP - added workaround for improperly formed <br/>
tags in # encoded HTML (skadz); fixed unicode handling in normalize_attrs (ChrisL); # fixed relative URI processing for guid (skadz); added ICBM support; added # base64 support #2.7.5 - 1/15/2004 - MAP - added workaround for malformed DOCTYPE (seen on many # blogspot.com sites); added _debug variable #2.7.6 - 1/16/2004 - MAP - fixed bug with StringIO importing #3.0b3 - 1/23/2004 - MAP - parse entire feed with real XML parser (if available); # added several new supported namespaces; fixed bug tracking naked markup in # description; added support for enclosure; added support for source; re-added # support for cloud which got dropped somehow; added support for expirationDate #3.0b4 - 1/26/2004 - MAP - fixed xml:lang inheritance; fixed multiple bugs tracking # xml:base URI, one for documents that don't define one explicitly and one for # documents that define an outer and an inner xml:base that goes out of scope # before the end of the document #3.0b5 - 1/26/2004 - MAP - fixed bug parsing multiple links at feed level #3.0b6 - 1/27/2004 - MAP - added feed type and version detection, result['version'] # will be one of SUPPORTED_VERSIONS.keys() or empty string if unrecognized; # added support for creativeCommons:license and cc:license; added support for # full Atom content model in title, tagline, info, copyright, summary; fixed bug # with gzip encoding (not always telling server we support it when we do) #3.0b7 - 1/28/2004 - MAP - support Atom-style author element in author_detail # (dictionary of 'name', 'url', 'email'); map author to author_detail if author # contains name + email address #3.0b8 - 1/28/2004 - MAP - added support for contributor #3.0b9 - 1/29/2004 - MAP - fixed check for presence of dict function; added # support for summary #3.0b10 - 1/31/2004 - MAP - incorporated ISO-8601 date parsing routines from # xml.util.iso8601 #3.0b11 - 2/2/2004 - MAP - added 'rights' to list of elements that can contain # dangerous markup; fiddled with decodeEntities (not right); liberalized # date parsing even further #3.0b12 - 2/6/2004 - MAP - fiddled with decodeEntities (still not right); # added support to Atom 0.2 subtitle; added support for Atom content model # in copyright; better sanitizing of dangerous HTML elements with end tags # (script, frameset) #3.0b13 - 2/8/2004 - MAP - better handling of empty HTML tags (br, hr, img, # etc.) in embedded markup, in either HTML or XHTML form (
<br>, <br/>, <br />
) #3.0b14 - 2/8/2004 - MAP - fixed CDATA handling in non-wellformed feeds under # Python 2.1 #3.0b15 - 2/11/2004 - MAP - fixed bug resolving relative links in wfw:commentRSS; # fixed bug capturing author and contributor URL; fixed bug resolving relative # links in author and contributor URL; fixed bug resolvin relative links in # generator URL; added support for recognizing RSS 1.0; passed Simon Fell's # namespace tests, and included them permanently in the test suite with his # permission; fixed namespace handling under Python 2.1 #3.0b16 - 2/12/2004 - MAP - fixed support for RSS 0.90 (broken in b15) #3.0b17 - 2/13/2004 - MAP - determine character encoding as per RFC 3023 #3.0b18 - 2/17/2004 - MAP - always map description to summary_detail (Andrei); # use libxml2 (if available) #3.0b19 - 3/15/2004 - MAP - fixed bug exploding author information when author # name was in parentheses; removed ultra-problematic mxTidy support; patch to # workaround crash in PyXML/expat when encountering invalid entities # (MarkMoraes); support for textinput/textInput #3.0b20 - 4/7/2004 - MAP - added CDF support #3.0b21 - 4/14/2004 - MAP - added Hot RSS support #3.0b22 - 4/19/2004 - MAP - changed 'channel' to 'feed', 'item' to 'entries' in # results dict; changed results dict to allow getting values with results.key # as well as results[key]; work around embedded illformed HTML with half # a DOCTYPE; work around malformed Content-Type header; if character encoding # is wrong, try several common ones before falling back to regexes (if this # works, bozo_exception is set to CharacterEncodingOverride); fixed character # encoding issues in BaseHTMLProcessor by tracking encoding and converting # from Unicode to raw strings before feeding data to sgmllib.SGMLParser; # convert each value in results to Unicode (if possible), even if using # regex-based parsing #3.0b23 - 4/21/2004 - MAP - fixed UnicodeDecodeError for feeds that contain # high-bit characters in attributes in embedded HTML in description (thanks # Thijs van de Vossen); moved guid, date, and date_parsed to mapped keys in # FeedParserDict; tweaked FeedParserDict.has_key to return True if asking # about a mapped key #3.0fc1 - 4/23/2004 - MAP - made results.entries[0].links[0] and # results.entries[0].enclosures[0] into FeedParserDict; fixed typo that could # cause the same encoding to be tried twice (even if it failed the first time); # fixed DOCTYPE stripping when DOCTYPE contained entity declarations; # better textinput and image tracking in illformed RSS 1.0 feeds #3.0fc2 - 5/10/2004 - MAP - added and passed Sam's amp tests; added and passed # my blink tag tests #3.0fc3 - 6/18/2004 - MAP - fixed bug in _changeEncodingDeclaration that # failed to parse utf-16 encoded feeds; made source into a FeedParserDict; # duplicate admin:generatorAgent/@rdf:resource in generator_detail.url; # added support for image; refactored parse() fallback logic to try other # encodings if SAX parsing fails (previously it would only try other encodings # if re-encoding failed); remove unichr madness in normalize_attrs now that # we're properly tracking encoding in and out of BaseHTMLProcessor; set # feed.language from root-level xml:lang; set entry.id from rdf:about; # send Accept header #3.0 - 6/21/2004 - MAP - don't try iso-8859-1 (can't distinguish between # iso-8859-1 and windows-1252 anyway, and most incorrectly marked feeds are # windows-1252); fixed regression that could cause the same encoding to be # tried twice (even if it failed the first time) #3.0.1 - 6/22/2004 - 
MAP - default to us-ascii for all text/* content types; # recover from malformed content-type header parameter with no equals sign # ('text/xml; charset:iso-8859-1') #3.1 - 6/28/2004 - MAP - added and passed tests for converting HTML entities # to Unicode equivalents in illformed feeds (aaronsw); added and # passed tests for converting character entities to Unicode equivalents # in illformed feeds (aaronsw); test for valid parsers when setting # XML_AVAILABLE; make version and encoding available when server returns # a 304; add handlers parameter to pass arbitrary urllib2 handlers (like # digest auth or proxy support); add code to parse username/password # out of url and send as basic authentication; expose downloading-related # exceptions in bozo_exception (aaronsw); added __contains__ method to # FeedParserDict (aaronsw); added publisher_detail (aaronsw) #3.2 - 7/3/2004 - MAP - use cjkcodecs and iconv_codec if available; always # convert feed to UTF-8 before passing to XML parser; completely revamped # logic for determining character encoding and attempting XML parsing # (much faster); increased default timeout to 20 seconds; test for presence # of Location header on redirects; added tests for many alternate character # encodings; support various EBCDIC encodings; support UTF-16BE and # UTF16-LE with or without a BOM; support UTF-8 with a BOM; support # UTF-32BE and UTF-32LE with or without a BOM; fixed crashing bug if no # XML parsers are available; added support for 'Content-encoding: deflate'; # send blank 'Accept-encoding: ' header if neither gzip nor zlib modules # are available #3.3 - 7/15/2004 - MAP - optimize EBCDIC to ASCII conversion; fix obscure # problem tracking xml:base and xml:lang if element declares it, child # doesn't, first grandchild redeclares it, and second grandchild doesn't; # refactored date parsing; defined public registerDateHandler so callers # can add support for additional date formats at runtime; added support # for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1); added # zopeCompatibilityHack() which turns FeedParserDict into a regular # dictionary, required for Zope compatibility, and also makes command- # line debugging easier because pprint module formats real dictionaries # better than dictionary-like objects; added NonXMLContentType exception, # which is stored in bozo_exception when a feed is served with a non-XML # media type such as 'text/plain'; respect Content-Language as default # language if not xml:lang is present; cloud dict is now FeedParserDict; # generator dict is now FeedParserDict; better tracking of xml:lang, # including support for xml:lang='' to unset the current language; # recognize RSS 1.0 feeds even when RSS 1.0 namespace is not the default # namespace; don't overwrite final status on redirects (scenarios: # redirecting to a URL that returns 304, redirecting to a URL that # redirects to another URL with a different type of redirect); add # support for HTTP 303 redirects #4.0 - MAP - support for relative URIs in xml:base attribute; fixed # encoding issue with mxTidy (phopkins); preliminary support for RFC 3229; # support for Atom 1.0; support for iTunes extensions; new 'tags' for # categories/keywords/etc. 
as array of dict # {'term': term, 'scheme': scheme, 'label': label} to match Atom 1.0 # terminology; parse RFC 822-style dates with no time; lots of other # bug fixes #4.1 - MAP - removed socket timeout; added support for chardet library canto-0.7.10/canto/gui.py000066400000000000000000000622661142361563200151640ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Where thread.py is the work-horse of the worker thread (hah, aptly named), # gui.py is where most of the important work for the interface thread is done. # The Gui class does a number of things. First of all, it handles the main list # window. It creates it, writes to it, and destroys it. Second of all, it # handles all of the keybinds. # The core data structure of Gui is the map. The map includes one entry for all # items that are visible or could possibly be visible given the current settings # like global filters, tag filters, and current tags. Each entry in the map # is a dict that looks like this: # d["item"] -> the Story() object # d["tag"] -> the Tag() object the story is in. # d["row"] -> the row the item is on # d["lines"] -> the number of lines the item takes up, given the current # width the screen. # This information is regenerated each time the items change or the screen size # changes. All of these keybinds end up traversing the map somehow, and # particularly the self.sel variable that indicates what is selected. # The Gui class has a number of important functions. # refresh() -> create and resize the main window. # __map_items() -> regenerate the map # draw_elements() -> actually draw to the screen # key() -> converts a single key to a group of actions # action() -> perform a list of actions # alarm() -> takes the diffs generated by the worker thread and # integrates it into the current tags # Most of these significant functions relay their events to a Reader() object, # if necessary. # The rest of the functions are keybinds or helpers for said keybinds. from cfg.filters import validate_filter from cfg.sorts import validate_sort from input import input, search, num_input from basegui import BaseGui from reader import Reader from const import * import utility import extra import curses # DECORATOR DEFINITIONS # The following are a number of decorators that move a lot of repeating code out # of the gui itself. These provide some level of consistency between similar # functions without their true intents being lost in what amounts to template # code. # noitem_unsafe makes a keybind fail with a message if it requires items to be # present. Basically anything that doesn't change global settings is going to # make this check. def noitem_unsafe(fn): def ns_dec(self, *args): if self.items: return fn(self, *args) else: self.cfg.log("No Items.") return ns_dec # change_selected handles the select/unselect hooks, and the change_tag trigger. def change_selected(fn): def dec(self, *args): # Don't unselect before the function so that the function can still # ascertain what item is selected. 
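# Illustrative sketch of the map described above: a flat list of dicts keyed
# by "item", "tag", "row" and "lines", with "prev"/"next" indices linking the
# entries. The names build_map and line_count are made up; line_count stands
# in for asking the renderer how many rows an item needs at the current width.
def build_map(tags, line_count):
    item_map, row = [], 0
    for tag in tags:
        for item in tag:
            lines = line_count(tag, item)
            if not lines:
                continue
            entry = {"tag": tag, "item": item, "row": row, "lines": lines}
            if item_map:
                item_map[-1]["next"] = len(item_map)
                entry["prev"] = len(item_map) - 1
            else:
                entry["prev"] = 0
            item_map.append(entry)
            row += lines
    if item_map:
        item_map[-1]["next"] = len(item_map) - 1
    return item_map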
oldsel = self.sel r = fn(self, *args) if oldsel: oldsel["item"].unselect() if self.cfg.unselect_hook: self.cfg.unselect_hook(oldsel["tag"], oldsel["item"]) if self.sel: self.sel["item"].select() if self.cfg.select_hook: self.cfg.select_hook(self.sel["tag"], self.sel["item"]) if "change_tag" in self.cfg.triggers and\ oldsel and self.sel and \ oldsel["tag"] != self.sel["tag"]: self.change_tag_override = 1 return r return dec # The rest of the decorators are self-explanatory, mostly just printing out a # log message when their decorated function is called. def change_filter(fn): def dec(self, *args): r,f = fn(self, *args) if not r: return self.cfg.log("Filter: %s" % f) for t in self.tags: t.clear() self.sel = None self.items = 0 return REFILTER return dec def change_tag_filter(fn): def dec(self, *args): r,f = fn(self, *args) if not r: return self.cfg.log("Tag Filter: %s" % f) return TFILTER return dec def change_sorts(fn): def dec(self, *args): r,s = fn(self, *args) if not r: return self.cfg.log("Sort: %s" % s) # Because sorts, unlike filters, don't change the items that are # present in the tag, we can do these ASAP, whereas the filters have # to be moved through the work thread. self.sel["tag"].sort(s) self.sel["tag"].enum() return UPDATE return dec def change_tags(fn): def dec(self, *args): r,t = fn(self, *args) if r: for ot in self.tags: ot.clear() self.tags = t self.sel = None self.items = 0 self.cfg.log("Tags: %s" % ", ".join([unicode(x) for x in t])) return RETAG return dec # The main class. class Gui(BaseGui) : def __init__(self, cfg, tags): self.keys = cfg.key_list self.window_list = [] self.map = [] self.reader_obj = None self.cfg = cfg self.lines = 0 self.sel = None self.offset = 0 self.max_offset = 0 self.tags = tags self.change_tag_override = 0 if self.cfg.start_hook: self.cfg.start_hook(self) def refresh(self): # Generate all of the columns self.window_list = [curses.newpad(self.cfg.gui_height + 1, \ self.cfg.gui_width / self.cfg.columns)\ for i in range(0, self.cfg.columns)] # Setup the backgrounds. for window in self.window_list: window.bkgdset(curses.color_pair(1)) # Self.lines is the maximum number of visible lines on the screen # at any given time. Used for scroll detection. self.lines = self.cfg.columns * self.cfg.gui_height # A redraw indicates that the map must be regenerated. self.__map_items() # Pass to the reader if self.reader_obj: self.reader_obj.refresh() self.draw_elements() def __map_items(self): # This for loop populates self.map with all stories that # A - are first in a collapsed feed or not in one at all. # B - that actually manage to print something to the screen. # We keep track of the "virtual row" # Essentially, map pretends that we're drawing onto an infinitely long # single window, and it's up to draw_elements to determine what range in # the map is actually visible and then it's up to the Renderer() # (particularly Renderer.__window()) to convert that into a real window # and row to draw to. 
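# Illustrative sketch of the shape shared by the change_* decorators above:
# run the wrapped keybind, bail quietly if nothing changed, otherwise log the
# new value and hand a return code back to the main loop. UPDATE here is a
# stand-in for the real code defined in canto's const module.
UPDATE = 1

def logs_and_returns(fn):
    def wrapper(self, *args):
        changed, value = fn(self, *args)
        if not changed:
            return None
        self.cfg.log("Now: %s" % (value,))
        return UPDATE
    return wrapper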
self.map = [] self.items = 0 row = 0 for i, tag in enumerate(self.tags): for item in tag: if not tag.collapsed or item.idx == 0: lines = self.print_item(tag, item, 0) if lines: self.map.append( {"tag" : tag, "row" : row, "item" : item, "lines" : lines}) self.items += 1 if self.items == 1: self.map[0]["prev"] = 0 else: self.map[-2]["next"] = self.items - 1 self.map[-1]["prev"] = self.items - 2 row += lines if self.items: self.map[-1]["next"] = self.items - 1 # Set max_offset, this is how we know not to recenter the # screen when it would leave unused space at the end. self.max_offset = row - self.lines # Print a single item to the screen. def print_item(self, tag, story, row): d = { "story" : story, "tag" : tag, "row" : row, "cfg" : self.cfg, "width" : self.cfg.gui_width / self.cfg.columns, "window_list" : self.window_list } r = tag.renderer.story(d) # Dereference anything that was fetched from disk story.free() return r # Print all stories in self.map. Ignores all off screen items. def draw_elements(self): if self.items: self.__check_scroll() row = -1 * self.offset for item in self.map: # If row is not offscreen up if item["row"] + item["lines"] > self.offset: # If row is offscreen down if item["row"] > self.lines + self.offset: break self.print_item(item["tag"], item["item"], row) row += item["lines"] else: row = -1 # Actually perform curses screen update. for i,win in enumerate(self.window_list) : # Clear unused space (entirely or partially empty columns) if i * self.cfg.gui_height >= row: win.erase() else: win.clrtobot() win.noutrefresh(0,0, self.cfg.gui_top, i*(self.cfg.gui_width / self.cfg.columns) + self.cfg.gui_right, self.cfg.gui_top + self.cfg.gui_height - 1, (i+1)*(self.cfg.gui_width / self.cfg.columns) + self.cfg.gui_right) if self.reader_obj: self.reader_obj.draw_elements() curses.doupdate() # This is only overridden to pass to the reader, otherwise the BaseGui key # implementation would be suitable. def key(self, k): if self.reader_obj: return self.reader_obj.key(k) return BaseGui.key(self, k) def action(self, a): if self.reader_obj: return self.reader_obj.action(a) r = BaseGui.action(self, a) # The change_tag_override forces the return to be UPDATE if self.change_tag_override: self.change_tag_override = 0 return UPDATE return r def __single_scroll_up(self, adj): self.offset = max(self.sel["row"] - adj, 0) def __single_scroll_down(self, adj): self.offset = min(self.sel["row"] - adj, self.max_offset) def __page_scroll_up(self, adj): self.offset = max(self.sel["row"] -\ (self.lines - self.cfg.cursor_edge), 0) def __page_scroll_down(self, adj): self.offset = min(self.sel["row"] - self.cfg.cursor_edge, self.max_offset) def __scroll_up(self, adj=None): if self.cfg.cursor_scroll == "page": self.__page_scroll_up(adj) elif self.cfg.cursor_scroll == "scroll": self.__single_scroll_up(adj) def __scroll_down(self, adj=None): if self.cfg.cursor_scroll == "page": self.__page_scroll_down(adj) elif self.cfg.cursor_scroll == "scroll": self.__single_scroll_down(adj) def __edge_check_scroll(self): if self.sel["lines"] > self.cfg.cursor_edge: fuzz = self.sel["lines"] else: fuzz = self.cfg.cursor_edge if fuzz >= (self.lines / 2): fuzz = (self.lines / 2) - 1 # Scroll up always uses cursor_edge if max(self.sel["row"] - self.cfg.cursor_edge, 0) < self.offset: self.__scroll_up(self.cfg.cursor_edge) return 1 # Scroll down uses fuzz to take the item's lines into account. 
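# Illustrative sketch of the visibility test draw_elements performs above:
# walk the row-ordered map and keep only the entries whose lines intersect the
# viewport [offset, offset + screen_lines). The function name is made up.
def visible_entries(item_map, offset, screen_lines):
    for entry in item_map:
        if entry["row"] + entry["lines"] <= offset:
            continue      # scrolled off the top
        if entry["row"] >= offset + screen_lines:
            break         # everything after this is below the screen
        yield entry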
if self.sel["row"] + fuzz > self.lines + self.offset: self.__scroll_down(self.lines - fuzz) return 1 return 0 def __pin_check_scroll(self): if self.cfg.cursor_type == "middle": adj = self.cfg.gui_height / 2 elif self.cfg.cursor_type == "top": adj = 0 elif self.cfg.cursor_type == "bottom": adj = self.cfg.gui_height - self.sel["lines"] # If our current item is offscreen up, ret 1 if self.sel["row"] < self.offset + adj and \ self.offset > 0: self.__scroll_up(adj) return 1 # If our current item is offscreen down, ret 1 if self.sel["row"] > (self.offset + adj) and\ self.offset < self.max_offset: self.__scroll_down(adj) return 1 return 0 def __check_scroll(self) : if self.cfg.cursor_type in ["edge", "old"]: return self.__edge_check_scroll() return self.__pin_check_scroll() @change_selected def alarm(self, new=[], old=[]): # This is where the item diffs generated by the worker thread are # integrated into the currently displayed tags. for lst in [new, old]: if lst: for i, t in enumerate(lst): if not t: continue # global filter, tag filter, tag sort, added/removed items gf, tf, s, l = t # Check that the diff was created with the same filters and # sorts that are still in play gf = self.cfg.all_filters[gf] tf = self.cfg.all_filters[tf] s = self.cfg.all_sorts[s] if l and self.tags[i].sorts.cur() == s and\ self.tags[i].filters.cur() == tf and\ self.cfg.filters.cur() == gf: # Add or remove them as necessary if lst is old: self.tags[i].retract(l) else: self.tags[i].extend(l) # Remap since we may have added or removed items # Keep the old map for closest item search. oldmap = self.map self.__map_items() # At this point, the items have successfully been integrated into the # running tags, so now we just attempt to maintain our current selection # status. def search_map(map, sel): for (i, item) in enumerate(map): if item["item"] == sel["item"] and\ item["tag"] == sel["tag"]: return i return -1 def nearest_in_tag(newmap, oldmap, sel): i = search_map(oldmap, sel) if i < 0: return -1 olen = len(oldmap) distance = 1 while True: still_in_tag = 0 if i - distance > 0 and\ oldmap[i - distance]["tag"] == sel["tag"]: still_in_tag = 1 match = search_map(newmap, oldmap[i - distance]) if match >= 0: return match if i + distance < olen and\ oldmap[i + distance]["tag"] == sel["tag"]: still_in_tag = 1 match = search_map(newmap, oldmap[i + distance]) if match >= 0: return match if not still_in_tag: return -1 distance += 1 if self.items: if self.sel: i = search_map(self.map, self.sel) # Item in new map still, maintain selection if i > -1: self.sel = self.map[i] else: # Item's not in map, try to select the nearest i = nearest_in_tag(self.map, oldmap, self.sel) if i > -1: self.sel = self.map[i] else: # Not in the map, no other tag items in map, # go ahead and select the top of the new items. self.__select_topoftag() else: # No selection made, select the first item possible. This is the # initial case and the case after a tag change or refilter. self.__select_topoftag(0) # If we had a selection, and now no items elif self.sel: self.cfg.log("No Items.") self.sel = None if self.cfg.update_hook: self.cfg.update_hook(self) @noitem_unsafe @change_selected def __select_topoftag(self, t=-1): if t < 0: # Default case, attempt to select the top of the current tag. 
ts = self.tags[self.tags.index(self.sel["tag"]):] else: ts = self.tags[t:] for i in xrange(len(self.map)): if self.map[i]["tag"] in ts: self.sel = self.map[i] break else: self.sel = self.map[0] @noitem_unsafe @change_selected def next_item(self): self.sel = self.map[self.sel["next"]] @noitem_unsafe def next_tag(self): curtag = self.sel["tag"] while self.sel != self.map[-1]: if curtag != self.sel["tag"]: break self.next_item() @noitem_unsafe @change_selected def prev_item(self): self.sel = self.map[self.sel["prev"]] @noitem_unsafe def prev_tag(self): curtag = self.sel["tag"] while self.sel != self.map[0]: if curtag != self.sel["tag"] and \ self.sel["item"] == self.sel["tag"][0]: break self.prev_item() # Goto_tag goes to an absolute #'d tag. So the third # tag defined in your configuration will always be '3' @noitem_unsafe @change_selected def goto_tag(self, num = None): if not num: num = num_input(self.cfg, "Absolute Tag") if num == None: return # Simple wrapping like python, so -1 is the last, -2 is the second to # last, etc. etc. if num < 0: num = len(self.tags) + num num = min(len(self.tags) - 1, num) target = self.tags[num] for item in self.map: if item["tag"] == target: self.sel = item break else: self.cfg.log("Abolute Tag %d not visible" % num) # Goto_reltag goes to a tag relative to what's visible. @noitem_unsafe @change_selected def goto_reltag(self, num = None): if not num: num = num_input(self.cfg, "Tag") if not num: return idx = self.tags.index(self.sel["tag"]) idx = max(min(len(self.tags) - 1, idx + num), 0) target = self.tags[idx] for item in self.map: if item["tag"] == target: self.sel = item break @noitem_unsafe @change_selected def next_filtered(self, f) : cursor = self.map[self.sel["next"]] while True: if f(cursor["tag"], cursor["item"]): self.sel = cursor return if cursor == self.map[-1]: return cursor = self.map[cursor["next"]] @noitem_unsafe @change_selected def prev_filtered(self, f) : cursor = self.map[self.sel["prev"]] while True: if f(cursor["tag"], cursor["item"]): self.sel = cursor return if cursor == self.map[0]: return cursor = self.map[cursor["prev"]] def next_mark(self): self.next_filtered(extra.show_marked()) def prev_mark(self): self.prev_filtered(extra.show_marked()) def next_unread(self): self.next_filtered(extra.show_unread()) def prev_unread(self): self.prev_filtered(extra.show_unread()) @noitem_unsafe def just_read(self): self.sel["tag"].set_read(self.sel["item"]) @noitem_unsafe def just_unread(self): self.sel["tag"].set_unread(self.sel["item"]) @noitem_unsafe def goto(self) : self.sel["tag"].set_read(self.sel["item"]) self.draw_elements() utility.goto(("", self.sel["item"]["link"], "link"), self.cfg) def help(self): self.cfg.wait_for_pid = utility.silentfork("man canto", "", 1, 0) @noitem_unsafe def reader(self) : self.sel["item"].in_reader = True self.reader_obj = Reader(self.cfg, self.sel["tag"],\ self.sel["item"], self.reader_dead) return REDRAW_ALL # This is the callback when the reader is done. 
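# Illustrative sketch of the next_filtered traversal above: follow the "next"
# links from the current entry until a predicate (show_unread, show_marked,
# and so on) accepts one; return None when the end of the map is reached.
def next_matching(item_map, current, predicate):
    cursor = item_map[current["next"]]
    while True:
        if predicate(cursor["tag"], cursor["item"]):
            return cursor
        if cursor is item_map[-1]:
            return None
        cursor = item_map[cursor["next"]]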
def reader_dead(self): self.reader_obj = None self.sel["item"].in_reader = False @change_selected @change_filter def set_filter(self, filt): filt = validate_filter(self.cfg, filt) return (self.cfg.filters.override(filt), self.cfg.filters.cur()) @change_selected @change_filter def next_filter(self): return (self.cfg.filters.next(), self.cfg.filters.cur()) @change_selected @change_filter def prev_filter(self): return (self.cfg.filters.prev(), self.cfg.filters.cur()) @noitem_unsafe @change_selected @change_tag_filter def set_tag_filter(self, filt): filt = validate_filter(self.cfg, filt) return (self.sel["tag"].filters.override(filt),\ self.sel["tag"].filters.cur()) @noitem_unsafe @change_selected @change_tag_filter def next_tag_filter(self): self.cfg.log("%s" % self.sel["tag"].filters) return (self.sel["tag"].filters.next(),\ self.sel["tag"].filters.cur()) @noitem_unsafe @change_selected @change_tag_filter def prev_tag_filter(self): return (self.sel["tag"].filters.prev(),\ self.sel["tag"].filters.cur()) @noitem_unsafe @change_sorts def next_tag_sort(self): return (self.sel["tag"].sorts.next(), self.sel["tag"].sorts.cur()) @noitem_unsafe @change_sorts def prev_tag_sort(self): return (self.sel["tag"].sorts.prev(), self.sel["tag"].sorts.cur()) @noitem_unsafe @change_sorts def set_tag_sort(self, sort): sort = validate_sort(self.cfg, sort) return (self.sel["tag"].sorts.override(sort),\ self.sel["tag"].sorts.cur()) @change_selected @change_tags def next_tagset(self): return (self.cfg.tags.next(), self.cfg.tags.cur()) @change_selected @change_tags def prev_tagset(self): return (self.cfg.tags.prev(), self.cfg.tags.cur()) @change_selected @change_tags def set_tagset(self, t): newtags = [] if not t: for feed in self.cfg.feeds: for tag in self.cfg.cfgtags: if tag.tag == feed.tags[0]: newtags.append(tag) else: for tag in t: for ctag in self.cfg.cfgtags: if ctag.tag == tag: newtags.append(ctag) break else: newtags.append(Tag(self.cfg, self.cfg.default_renderer, self.cfg.tag_sorts, self.cfg.tag_filters, tag)) return (self.cfg.tags.override(newtags), self.cfg.tags.cur()) @noitem_unsafe def inline_search(self): self.do_inline_search(search(self.cfg, "Inline Search")) def do_inline_search(self, s) : if s: for t in self.tags: for story in t: if s.match(story["title"]): story.set("marked") else: story.unset("marked") self.prev_mark() self.next_mark() self.draw_elements() @noitem_unsafe def toggle_mark(self): if self.sel["item"].was("marked"): self.sel["item"].unset("marked") else: self.sel["item"].set("marked") @noitem_unsafe def all_unmarked(self): for item in self.map: if item["item"].was("marked"): item["item"].unset("marked") @noitem_unsafe def toggle_collapse_tag(self): self.sel["tag"].collapsed =\ not self.sel["tag"].collapsed self.sel["item"].unselect() self.__map_items() self.__select_topoftag() def __collapse_all(self, c): for t in self.tags: t.collapsed = c self.__map_items() self.__select_topoftag() # These are convenience functions so that keybinds don't have to be lambdas # or other functions and can therefore be more easily manipulated. 
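# The filter/sort/tag keybinds above all drive an object exposing cur(),
# next(), prev() and override(). That object (utility.Cycle) is not shown in
# this excerpt, so the sketch below is only a guess at its interface.
class Cycle:
    def __init__(self, items):
        self.items, self.idx = list(items), 0
    def cur(self):
        return self.items[self.idx]
    def next(self):
        self.idx = (self.idx + 1) % len(self.items)
        return True
    def prev(self):
        self.idx = (self.idx - 1) % len(self.items)
        return True
    def override(self, item):
        if item not in self.items:
            self.items.append(item)
        self.idx = self.items.index(item)
        return True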
def set_collapse_all(self): self.__collapse_all(1) def unset_collapse_all(self): self.__collapse_all(0) def force_update(self): self.cfg.log("Forcing update.") return UPDATE @noitem_unsafe def tag_read(self): self.sel["tag"].all_read() def all_read(self): for t in self.tags: t.all_read() @noitem_unsafe def tag_unread(self): self.sel["tag"].all_unread() def all_unread(self): for t in self.tags : t.all_unread() def restart(self): return RESTART def quit(self): if self.cfg.end_hook: self.cfg.end_hook(self) return EXIT canto-0.7.10/canto/handlers.py000066400000000000000000000032451142361563200161700ustar00rootroot00000000000000#Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. class Handler(): def __init__(self): self.reset() def get_attr(self, attrs, attr): l = [v for k,v in attrs if k == attr] if l : return l[0] class LinkHandler(Handler): def reset(self): self.active = 0 self.link = "" self.content = "" self.handler = "link" def match(self, tag, attrs, open, ll): if tag == "a": if open: href = self.get_attr(attrs, "href") if href: self.link = href self.active = 1 return "%4" else: self.reset() else: ll.append((self.content, self.link, self.handler)) self.reset() return u"[" + unicode(len(ll)) + u"]%0" class ImageHandler(Handler): def reset(self): self.active = 0 self.handler = "image" def match(self, tag, attrs, open, ll): if tag == "img": if open: src = self.get_attr(attrs, "src") alt = self.get_attr(attrs, "alt") if not alt: alt = "image" if src: extension = src.rsplit('.',1)[-1] ll.append((alt, src, self.handler)) self.reset() return u"%7["+ alt + u"][" + unicode(len(ll)) + u"]%0" canto-0.7.10/canto/input.py000066400000000000000000000100621142361563200155220ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from curses import ascii import curses import signal import re # I am aware that Python's curses library comes with a TextBox class # and, indeed, the input() function was using it for awhile. The problems # with Textbox were numerous though: # * Only ASCII characters could be printed/inputted (the *big* one) # * Included a whole bunch of multi-line editing and validation stuff # that was completely unnecessary, since we know the input line # only needs to be one line long. # * To make editing easier, it used a half-ass system of gathering # the data from the window's written content with win.inch(), # which apparently didn't play nice with multi-byte? # # All of these problems have been fixed in half as many lines with all # the same functionality on a single line basis, but the design is still # based on Textbox. 
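# Illustrative sketch of the LinkHandler bookkeeping above: the href is read
# from the tag's attribute list when an anchor opens, and the collected
# (text, url, type) triple is appended to the link list when it closes, with a
# numbered "[n]" marker returned for inline display. Standalone, not the real
# Handler base class.
def get_attr(attrs, name):
    # attrs arrives from the HTML parser as a list of (key, value) pairs.
    values = [v for k, v in attrs if k == name]
    return values[0] if values else None

links = []    # collected (text, url, handler-type) triples

def close_anchor(content, href):
    links.append((content, href, "link"))
    return u"[" + str(len(links)) + u"]"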
class InputBox: def __init__(self, win): self.minx = win.getyx()[1] self.x = self.minx self.win = win self.win.keypad(1) self.result = "" def refresh(self): self.win.move(0, self.minx) maxx = self.win.getmaxyx()[1] try: self.win.addstr(self.result[-1 * (maxx - self.minx):]\ .encode("UTF-8", "replace")) except: pass self.win.clrtoeol() self.win.move(0, min(self.x, maxx - 1)) self.win.refresh() def key(self, ch): if ch in (ascii.STX, curses.KEY_LEFT): if self.x > self.minx: self.x -= 1 elif ch in (ascii.BS, curses.KEY_BACKSPACE): if self.x > self.minx: idx = self.x - self.minx self.result = self.result[:idx - 1] + self.result[idx:] self.x -= 1 elif ch in (ascii.ACK, curses.KEY_RIGHT): # C-f self.x += 1 if len(self.result) + self.minx < self.x: self.result += " " elif ch in (ascii.ENQ, curses.KEY_END): # C-e self.x = self.minx + len(self.result) elif ch in (ascii.SOH, curses.KEY_HOME): # C-a self.x = self.minx elif ch == ascii.NL: # C-j return 0 elif ch == ascii.BEL: # C-g return -1 elif ch == ascii.FF: # C-l self.refresh() else: self.x += 1 idx = self.x - self.minx self.result = self.result[:idx] + unichr(ch) + self.result[idx:] return 1 def edit(self): while 1: ch = self.win.getch() if ch <= 0: continue r = self.key(ch) if not r: break if r < 0: self.result = None break self.refresh() return self.result def input(cfg, prompt): cfg.message("%B%1" + prompt + ":%b ") cfg.msg.move(0, len(prompt) + 2) temp = signal.getsignal(signal.SIGALRM) signal.signal(signal.SIGALRM, signal.SIG_IGN) # These curs_set calls can except, but we shouldn't care try: curses.curs_set(1) except: pass term = InputBox(cfg.msg).edit() try: curses.curs_set(0) except: pass signal.signal(signal.SIGALRM, temp) signal.alarm(1) cfg.msg.erase() cfg.msg.refresh() return term def num_input(cfg, prompt): term = input(cfg, prompt) if not term: return try: term = int(term) except: cfg.log("Not a number.") return None return term def search(cfg, prompt): term = input(cfg, prompt) if not term : return elif term.startswith("rgx:"): str = term[4:] else: str = ".*" + re.escape(term) + ".*" try: m = re.compile(str, re.I) except: return None return m canto-0.7.10/canto/interface_draw.py000066400000000000000000000401301142361563200173370ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Interface_draw comprises the python base of canto's drawing code. The Renderer # class is the object wrapper around canto's C extension (where all of the # actual ncurses drawing is done). # The Renderer class contains functions to draw each of the main components of # the interface. The reader, the story list and the message status. In < 0.7.0 # the entire class had to be overridden in what amounted to an overly complex # way to do *anything*. # In >= 0.7.0, each of these functions is augmented with a series of hooks that # affect the content that's going to be drawn. So where in 0.6.x the reader() # function would explicitly render the content to HTML and insert the links and # display to the screen, >= 0.7.0 decorates the reader() function with a number # of auxiliary functions that amount to doing the same thing by default but are # much easier to modify. 
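# Illustrative sketch of how search() above builds its pattern: a "rgx:"
# prefix is used verbatim as a regular expression, anything else becomes a
# case-insensitive substring match.
import re

def compile_search(term):
    if term.startswith("rgx:"):
        pattern = term[4:]
    else:
        pattern = ".*" + re.escape(term) + ".*"
    try:
        return re.compile(pattern, re.I)
    except re.error:
        return None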
One call to reader() turns into something like this: # reader_base -> adds the basic, unmodified text to the dict # reader_convert_html -> converts any HTML elements in the text, grabs # links out as well. # reader_highlight_quotes -> highlights the quotes with color %5 # reader_add_main_link -> adds the story link to links from convert_html # reader_add_enc_links -> adds any enclosure links to links # reader_render_links -> actually adds the content to # reader -> finally actually perform the write # This is all done transparently. Each of these functions takes a single dict # that can be added to transparently. So, when someone wants to add some # information, they only need to write a function that takes a dict and # manipulates the data in it, and insert it into the hooks. Much simpler than # overriding a Python class. For an example of how that works, see the # add_hook*/add_info functions in canto.extra. # The dict, after the _base() call, contains: # dict["content"] -> the text (either the story title or story description) # dict["story"] -> the relevant Story() object # dict["tag"] -> the tag that the story is in # dict["cfg"] -> Canto's Cfg() object # There may be other particulars in there, but they're more for use in the # drawing than to be modified on the fly. # A call to story() is similarly augmented. # The hooks are capable of doing post_hooks as well, which take place after the # content is drawn. These aren't used by default, but could be used to perform # any tear-down from more complex pre_hooks. from widecurse import core, tlen import canto_html import locale import re # The draw_hooks decorator is what actually turns a single reader or story call # in the succession of calls. def draw_hooks(func): def new_func(self, *args): # Hooray for Python introspection. base = getattr(self, func.func_name + "_base", None) pre = getattr(self, "pre_" + func.func_name, []) post = getattr(self, "post_" + func.func_name, []) r = None # Base function to set dict["content"] if base: base(*args) # Pre hooks for f in pre: f(*args) # The actual expected call r = func(self, *args) # Post hooks for f in post: f(*args) return r return new_func # The BaseRenderer class, to ensure that any custom renderer is going to have # all the necessary functions. class BaseRenderer : def status(self, bar, height, width, str): pass def reader(self, dict): pass def story(self, dict): pass # The main Renderer class. # As mentioned above, the reader() and story() calls are augmented with hooks. # These only handle content. The remainder of the drawing logic (the code that # draws the pretty boxes and the tag headers on top of the first items) is # handled by 5 functions for the story list and 5 functions for the reader. # Story list # tag_head -> draws the top of each tag # firsts -> draws the first line of each item # mids -> draws the middle lines of each item # ends -> draws the last line of each item # tag_foot -> draws the bottom of each tag # The reader's corresponding functions are reader_{head, foot} and # r{firsts, mids, ends}. In the case that there's only one line, only the # firsts() functions are used, so it's not guaranteed that any function but that # one will be called. # All of these functions return tuples of three items: # (head, repeat, end) # Where head is the left content, end is the right content, and repeat is the # string repeated to fill the gap. # The head and foot functions return a list of them, each one assumed to be a # new line. 
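# Illustrative sketch of the draw_hooks decorator described above: the base,
# pre_ and post_ functions are discovered by name at call time. The 0.7.x code
# spells the attribute func.func_name (Python 2); func.__name__ is used here.
def draw_hooks(func):
    def wrapper(self, *args):
        base = getattr(self, func.__name__ + "_base", None)
        if base:
            base(*args)                      # seed dict["content"]
        for hook in getattr(self, "pre_" + func.__name__, []):
            hook(*args)                      # content-mangling hooks
        result = func(self, *args)           # the actual draw
        for hook in getattr(self, "post_" + func.__name__, []):
            hook(*args)                      # tear-down hooks
        return result
    return wrapper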
class Renderer(BaseRenderer): def __init__(self): self.htmlrenderer = canto_html.CantoHTML() self.prefcode = locale.getpreferredencoding() self.html_rgx = [ # Eliminate extraneous HTML (re.compile(u"<.*?>"), u"") ] # These are used by the story pre_hook "strip_entities" self.story_rgx = [ (re.compile(u"&(\w{1,8});"), self.htmlrenderer.ent_wrapper), (re.compile(u"&#([xX]?[0-9a-fA-F]+)[^0-9a-fA-F]"), self.htmlrenderer.char_wrapper) ] # Default hook definitions self.pre_story = [ self.story_strip_entities ] self.pre_reader = [ self.reader_convert_html, self.reader_highlight_quotes, self.reader_add_main_link, self.reader_add_enc_links, self.reader_render_links, ] self.highlight_quote_rgx = re.compile(u"[\\\"](.*?)[\\\"]") self.bq = u"%B%1│%0%b " self.bq_on = 0 self.indent = u" " self.in_on = 0 # Call the initialization hook. self.init_hook() def tag_head(self, dict): t = u"%1" + dict["tag"].tag + u" [%2" + unicode(dict["tag"].unread)\ + u"%0]%0" if dict["tag"].collapsed: if dict["tag"][0].selected(): return [(u"%C%B%1 > " + t + u"", u" ", u" "),(u" ",u" ",u" ")] else: return [(u"%C%B " + t + u"",u" ", u" "),(u" ",u" ",u" ")] return [(u"%B " + t, u" ", u""),(u"%1┌", u"─", u"┐%C%0")] def firsts(self, dict): base = u"%C%1%B│%b%0 " if dict["story"].selected() : base += u"%1%B>%b%0 " else: base += u" " if dict["story"].was("marked"): base += u"%1%B" else: if dict["story"].was("read"): base += u"%3" else: base += u"%2%B" return (base, u" ", u" %1%B│%b%0") def mids(self, dict): return (u"%1%B│%b%0 ", u" ", u" %1%B│%b%0") def ends(self, dict): return (u"%1%B│%b%0 ", u" ", u" %1%B│%b%0") def tag_foot(self, dict): return [(u"%1%B└", u"─", u"┘%C%0")] def reader_head(self, dict): if "html" in dict["story"].get_title_type(): title = self.do_regex(dict["story"]["title"], self.html_rgx) else: title = dict["story"]["title"] title = self.do_regex(title, self.story_rgx) return [(u"%1%B" + title, u" ", u" "),(u"%1┌",u"─",u"┐%C")] def reader_foot(self, dict): return [(u"%B└", u"─", u"┘%C")] def rfirsts(self, dict): return (u"%1%B│%b%0 ", u" ", u" %1%B│%b%0") def rmids(self, dict): return (u"%1%B│%b%0 ", u" ", u" %1%B│%b%0") def rends(self, dict): return (u"%1%B│%b%0 ", u" ", u" %1%B│%b%0") # __window converts a virtual row into a window and an offset row. So if # you're got two columns that are 80 lines long, __window called with row 82 # will return window_list[1], row 1. def __window(self, row, height, window_list): if height != -1: winidx, winrow = divmod(row, height) try : window = window_list[winidx] except IndexError: window = None return (window, winrow) else: return (window_list[0], row) # The core_wrap and tlen_wrap functions exist to handle the encoding of # content to an encoding that can be printed to the terminal. Both take # unicode and return unicode, so aside from when the config / args are # parsed, this is the only place that canto deals with non-Unicode data. def core_wrap(self, window, winrow, width, s, rep, end): ret = core(window, winrow, 0, width, s.encode(self.prefcode, 'replace'), rep.encode(self.prefcode, 'replace'), end.encode(self.prefcode, 'replace')) if ret: ret = unicode(ret, self.prefcode) return ret def tlen_wrap(self, s): return tlen(s.encode(self.prefcode, 'replace')) # simple_out is a simple drawing function that has no overhead for doing # complicated stuff like block level formatting. It also only handles a # single (h, r, e) tuple. This is useful for drawing heads/feet where all of # the formatting is already done and only one (h, r, e) tuple is used. 
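# Illustrative sketch of the __window mapping above: a virtual row on one
# infinitely tall column is converted to (window, row-in-window) with a single
# divmod; height == -1 means a single unsplit window, as used by the reader.
def to_window(row, height, window_list):
    if height == -1:
        return window_list[0], row
    winidx, winrow = divmod(row, height)
    window = window_list[winidx] if winidx < len(window_list) else None
    return window, winrow

# e.g. with 80-line columns, divmod(82, 80) == (1, 2): second column, row 2.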
def simple_out(self, list, row, height, width, window_list): line = 0 for s,rep,end in list: while s: window, winrow = self.__window(row + line, height, window_list) s = self.core_wrap(window, winrow, width, s, rep, end) line += 1 return row + line # out is a much more complex drawing function. It does block level # formatting and takes a list of lines associated with an (h, r, e) tuple. # This is used for all of the real content. See reader() or story() for use. def out(self, list, row, height, width, window_list): line = 0 for s, l in list: if s: # Handle block level style, not covered in widecurse. # This is broken into three sections so that styles # can be applied to a single line and applied in between. # Note, as with any unknown % escape, these will be # totally ignored in the middle of a line. # Toggle on based on start of line while s[:2] in [u"%Q",u"%I"]: if s.startswith(u"%Q"): self.bq_on += 1 else: self.in_on += 1 s = s[2:] # Add decorations to firsts,mids,lasts if self.bq_on: l = [(e[0] + self.bq * self.bq_on,\ e[1],e[2]) for e in l] if self.in_on: l = [(e[0] + self.indent * self.in_on,\ e[1],e[2]) for e in l] # Toggle off based on end of line while s[-2:] in [u"%q",u"%i"]: if s.endswith(u"%q"): self.bq_on -= 1 else: self.in_on -= 1 s = s[:-2] while s : window, winrow = self.__window(row + line, height, window_list) # First line, obviously use first line caps. if line == 0: start, rep, end = l[0] # If line > 1 and we've got more than could be handled # with end_caps, use mid_caps elif self.tlen_wrap(s) > (width - (self.tlen_wrap(l[2][2]))): start, rep, end = l[1] # Otherwise, use end_caps else: start, rep, end = l[2] t = s s = self.core_wrap(window, winrow, width, start + s, rep, end) # Detect an infinite loop caused by start, and canto # trying to be smart about wrapping =). if s == t: s = self.core_wrap(window, winrow, width, s, u" ",u"") line += 1 return row + line # *** From here out, it's all story, reader, status and associated hooks. 
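# Illustrative sketch of the block-level decoration in out() above: the quote
# bar and indent strings are repeated per nesting level and appended to the
# head of every (head, repeat, end) cap tuple. Plain strings stand in for the
# styled "│" bar the real renderer uses.
def decorate_caps(caps, bq_level, in_level, bq=u"| ", indent=u"  "):
    # caps is a (firsts, mids, ends) style list of (head, repeat, end) tuples.
    prefix = bq * bq_level + indent * in_level
    return [(head + prefix, rep, end) for head, rep, end in caps]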
def do_regex(self, target, rlist): s = target for rgx,rep in rlist: s = rgx.sub(rep,s) return s def story_base(self, dict): if dict["story"]["title"]: dict["content"] = dict["story"]["title"] else: dict["content"] = "%B%b" dict["type"] = dict["story"].get_title_type() def story_strip_entities(self, dict): if "html" in dict["type"]: dict["content"] = self.do_regex(dict["content"], self.html_rgx) dict["content"] = self.do_regex(dict["content"], self.story_rgx) dict["content"] = dict["content"].lstrip().rstrip() @draw_hooks def story(self, dict): d = {"tag": dict["tag"], "story" : dict["story"],\ "cfg" : dict["tag"].cfg } row = dict["row"] if dict["story"].idx == 0: row = self.simple_out(self.tag_head(d),\ row, dict["tag"].cfg.gui_height, \ dict["width"], dict["window_list"]) if not dict["tag"].collapsed: row = self.out([[dict["content"], (self.firsts(d), self.mids(d), \ self.ends(d))]], row, dict["tag"].cfg.gui_height,\ dict["width"], dict["window_list"]) if dict["story"].last: row = self.simple_out(self.tag_foot(d),\ row, dict["tag"].cfg.gui_height, \ dict["width"], dict["window_list"]) return row def reader_base(self, dict): dict["content"] = dict["story"].get_text() dict["type"] = dict["story"].get_type() def reader_convert_html(self, dict): if "html" in dict["type"]: dict["content"], dict["links"] = \ self.htmlrenderer.convert(dict["content"]) else: dict["content"] = self.do_regex(dict["content"], self.story_rgx) dict["links"] = [] def reader_add_main_link(self, dict): dict["links"] = [(u"main link", dict["story"]["link"], "link")]\ + dict["links"] def reader_add_enc_links(self, dict): if "enclosures" in dict["story"]: for e in dict["story"]["enclosures"]: if "type" not in e: e["type"] = "unknown" dict["links"].append((u"[%s]" % e["type"], e["href"], "link")) def reader_render_links(self, dict): if not dict["show_links"]: return dict["content"] += "\n" for idx, link in enumerate(dict["links"]): if link[2] == "link": color = u"%4" elif link[2] == "image": color = u"%7" else: color = u"%8" dict["content"] += color + u"[" + unicode(idx) + u"] " + \ link[0] + u"%1 - " + link[1] + "\n" def reader_highlight_quotes(self, dict): dict["content"] = self.highlight_quote_rgx.sub(u"%5\"\\1\"%0",\ dict["content"]) @draw_hooks def reader(self, dict): d = {"story" : dict["story"], "cfg" : dict["cfg"] } l = dict["content"].split("\n") row = self.simple_out(self.reader_head(d), 0, -1,\ dict["width"], [dict["window"]]) row = self.out([[x, (self.rfirsts(d), self.rmids(d), self.rends(d))] for x in l], row, -1,\ dict["width"], [dict["window"]]) row = self.simple_out(self.reader_foot(d), row, -1,\ dict["width"], [dict["window"]]) return row, dict["links"] def status(self, bar, height, width, str): self.simple_out([(str, u" ", u"")], 0, height, width, [bar]) def init_hook(self): # Do nothing. Override this in your own renderer. return 0 canto-0.7.10/canto/main.py000066400000000000000000000566011142361563200153200ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # The Main object encompasses a running instance of Canto. It can be divided # into a number of parts. 
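# Illustrative sketch of reader_render_links above: the numbered link list is
# appended under the story text, coloured by link type with canto's %-escapes
# (%4 link, %7 image, %8 other). Standalone helper, not the hook itself.
def render_links(content, links):
    colors = {"link": u"%4", "image": u"%7"}
    out = [content, u""]
    for idx, (text, url, kind) in enumerate(links):
        color = colors.get(kind, u"%8")
        out.append(color + u"[" + str(idx) + u"] " + text + u"%1 - " + url)
    return u"\n".join(out)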
# init # if called as canto-fetch jump to canto_fetch.main # parse config # parse canto specific arguments # start work thread # load the initial feed data (from thread) # do one-off operation flags (like -a, -n, -o, -i, etc.) # do basic curses init # instantiate Gui class # main loop # check that Gui is still alive # if we're waiting for a process, sleep # check for input # if no input, check on threads # if work done, update screen # if input, pass to Gui and interpret return # if return implies update, queue up work for thread from process import ProcessHandler from utility import Cycle from cfg.base import get_cfg from const import * from gui import Gui import canto_fetch import utility import args import tag import traceback import signal import locale import curses import time import sys import os # This should be phased out after 0.7.x becomes standard def upgrade_help(): print "\nIf you're having trouble upgrading from 0.6.x please visit" print "http://codezen.org/canto/config/#upgrading-from-06x" class Main(): def __init__(self, stdscr = None): signal.signal(signal.SIGUSR2, self.debug_out) # Let locale figure itself out locale.setlocale(locale.LC_ALL, "") enc = locale.getpreferredencoding() # If we're canto-fetch, jump to that main function if sys.argv[0].endswith("canto-fetch"): canto_fetch.main(enc) # Parse arguments that canto shares with canto-fetch, return # a lot of file locations and an optlist that will contain the # parsed, but yet unused canto specific arguments. conf_dir, log_file, conf_file, feed_dir, script_dir, optlist =\ args.parse_common_args(enc, "hvulaor:t:i:n:", ["help","version","update","list","checkall","opml", "import=","url=","checknew=","tag="]) # Instantiate the config and start the log. try : self.cfg = get_cfg(conf_file, log_file, feed_dir, script_dir) self.cfg.parse() except : traceback.print_exc() upgrade_help() sys.exit(-1) self.cfg.log("Canto v %s (%s)" % \ ("%d.%d.%d" % VERSION_TUPLE, GIT_SHA), "w") self.cfg.log("Time: %s" % time.asctime()) self.cfg.log("Config parsed successfully.") # If we were passed an existing curses screen (i.e. restart) # pass it through to the config. self.cfg.stdscr = stdscr if self.cfg.stdscr: self.restarting = True else: self.restarting = False self.restart = False # Default arguments. flags = 0 feed_ct = None opml_file = None url = None newtag = None # Note that every single flag that takes an argument has its # argument converted to unicode. Saves a lot of bullshit later. for opt, arg in optlist : if opt in ["-u","--update"] : flags |= UPDATE_FIRST elif opt in ["-n","--checknew"] : flags |= CHECK_NEW feed_ct = unicode(arg, enc, "ignore") elif opt in ["-a","--checkall"] : flags |= CHECK_NEW elif opt in ["-l","--list"] : flags |= FEED_LIST elif opt in ["-o","--opml"] : flags |= OUT_OPML elif opt in ["-i","--import"] : flags |= IN_OPML opml_file = unicode(arg, enc, "ignore") elif opt in ["-r","--url"] : flags |= IN_URL url = unicode(arg, enc, "ignore") elif opt in ["-t","--tag"] : newtag = unicode(arg, enc, "ignore") # Import flags harness the same functions as their config # based counterparts, source_opml and source_url. if flags & IN_OPML: self.cfg.locals['source_opml'](opml_file, append=True) print "OPML imported." if flags & IN_URL: self.cfg.locals['source_url'](url, append=True, tag=newtag) print "URL added." # All import options should terminate. if flags & (IN_OPML + IN_URL): sys.exit(0) # If self.cfg had to generate a config, make sure we # update first. 
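# Illustrative sketch of the option loop described above: each flag toggles a
# bit, and every string-valued argument is decoded with the locale encoding
# before anything else sees it (a no-op where getopt already returns text).
# Only a few of the options are shown, and the function name is made up.
import getopt

def parse_flags(argv, enc):
    optlist, _ = getopt.getopt(argv, "un:t:", ["update", "checknew=", "tag="])
    flags = {}
    for opt, arg in optlist:
        if isinstance(arg, bytes):
            arg = arg.decode(enc, "ignore")
        if opt in ("-u", "--update"):
            flags["update"] = True
        elif opt in ("-n", "--checknew"):
            flags["checknew"] = arg
        elif opt in ("-t", "--tag"):
            flags["tag"] = arg
    return flags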
if self.cfg.no_conf: self.cfg.log("Conf was auto-generated, adding -u") flags |= UPDATE_FIRST if flags & UPDATE_FIRST: self.cfg.log("Pausing to update...") canto_fetch.run(self.cfg, True, True) # Detect if there are any new feeds by whether their # set path exists. If not, run canto-fetch but don't # force it, so canto-fetch intelligently updates. for i,f in enumerate(self.cfg.feeds) : if not os.path.exists(f.path): self.cfg.log("Detected unfetched feed: %s." % f.URL) canto_fetch.run(self.cfg, True, False) #Still no go? if not os.path.exists(f.path): self.cfg.log("Failed to fetch %s, removing" % f.URL) self.cfg.feeds[i] = None else: self.cfg.log("Fetched.\n") break # Collapse the feed array, if we had to remove some unfetchables. self.cfg.feeds = filter(lambda x: x != None, self.cfg.feeds) self.new = [] self.old = [] self.ph = ProcessHandler(self.cfg) # Force an update from disk by queueing a work item for each thread. # At this point, we just want to do the portion of the update where the # disk is read, so PROC_UPDATE is used. self.cfg.log("Populating feeds...") for f in self.cfg.feeds: self.ph.send((PROC_UPDATE, f.URL, [])) for f in self.cfg.feeds: f.merge(self.ph.recv()[1]) self.ph.send((PROC_GETTAGS, )) fixedtags = self.ph.recv() self.ph.kill_process() for i, f in enumerate(self.cfg.feeds): self.cfg.feeds[i].tags = fixedtags[i] # Now that the tags have all been straightened out, validate the config. # Making sure the tags are unique before validation is important because # part of validation is the actual creation of Tag() objects. try: self.cfg.validate() except Exception, err: print err upgrade_help() sys.exit(0) # Print out a feed list if flags & FEED_LIST: for f in self.cfg.feeds: print f.tags[0].encode(enc, "ignore") sys.exit(0) # This could probably be done better, or more officially # with some XML library, but for now, print is working # fairly well. if flags & OUT_OPML: self.cfg.log("Outputting OPML") print """""" print """""" for feed in self.cfg.feeds: ufp = feed.get_ufp() if "atom" in ufp["version"]: t = "pie" else: t = "rss" print """\t""" %\ (feed.tags[0].encode(enc, "ignore"), feed.URL, t) print """""" print """""" sys.exit(0) # Handle -a/-n flags (print number of new items) if flags & CHECK_NEW: if not feed_ct: unread = 0 for f in self.cfg.feeds: for s in f: if "read" not in s["canto_state"]: unread += 1 print unread else: t = [ f for f in self.cfg.feeds if f.tags[0] == feed_ct ] if not t: print "Unknown Feed" else: print len([ x for x in t[0] if "read"\ not in x["canto_state"]]) sys.exit(0) # After this point, we know that all further operation is going to # require all tags to be populated, we queue up the latter # half of the work, to actually filter and sort the items. # The reason we clear the feeds first is that only after validation do # we know what keys are going to have to be precached, and canto tries # hard to conserve items (see feed.merge), so we need to replace all of # them with corrected, fresh items from the process that knows about the # precache for f in self.cfg.feeds: del f[:] signal.signal(signal.SIGCHLD, self.chld) self.update(1, self.cfg.feeds, PROC_BOTH) # At this point we know that we're going to actually launch # the client, so we fire up ncurses and add the screen # information to our Cfg(). if not self.restarting: self.cfg.stdscr = curses.initscr() self.cfg.stdscr.nodelay(1) # curs_set can return ERR, we shouldn't care try: curses.curs_set(0) except: pass # if any of these mess up though, the rest of the # the operation is suspect, so die. 
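# Illustrative sketch of the -a/-n handling above: count the items whose
# canto_state does not include "read", either across every feed or for the
# single feed whose first tag matches.
def count_unread(feeds, feed_tag=None):
    if feed_tag is not None:
        feeds = [f for f in feeds if f.tags[0] == feed_tag]
    return sum(1 for f in feeds
                 for item in f
                 if "read" not in item["canto_state"])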
try: curses.noecho() curses.start_color() curses.use_default_colors() except: self.cfg.log("Unable to init curses, bailing") self.done() self.sigusr = 0 self.resize = 0 self.alarmed = 0 self.ticks = 60 self.cfg.height, self.cfg.width = self.cfg.stdscr.getmaxyx() # Init colors try: for i, (fg, bg) in enumerate(self.cfg.colors): curses.init_pair(i + 1, fg, bg) except: self.cfg.log("Unable to init curses color pairs!") self.cfg.log("Curses initialized.") # Instantiate the base Gui class self.gui = Gui(self.cfg, self.cfg.tags.cur()) self.cfg.log("GUI initialized.") # Signal handling signal.signal(signal.SIGWINCH, self.winch) signal.signal(signal.SIGALRM, self.alarm) signal.signal(signal.SIGINT, self.done) signal.signal(signal.SIGUSR1, self.sigusr) signal.alarm(1) self.cfg.log("Signals set.") self.estring = None # The main loop is wrapped in one huge try...except so that no matter # what exception is thrown, we can clean up curses and exit without # shitting all over the terminal. try: # Initial draw of the screen, if not restarting self.refresh(self.restarting) self.restarting = False # Main program loop, terminated when all handlers have # deregistered / exited. self.cfg.log("Beginning main loop.") while 1: # Clear the key t = None # Gui is none if a key returned EXIT if not self.gui: break # If we've spawned a text-based process (i.e. elinks), then just # pause and wait to be awakened by SIGCHLD if self.cfg.wait_for_pid: signal.pause() # Tick when SIGALRM is received. if self.alarmed: self.alarmed = 0 self.tick() # Deferred update from signal if self.sigusr: self.sigusr = 0 if "signal" in self.cfg.triggers: self.update() # Get the key k = self.cfg.stdscr.getch() # KEY_RESIZE is the only key not propagated, to # keep users from rebinding it and crashing. if k == curses.KEY_RESIZE or self.resize: self.resize = 0 self.refresh() continue # No input, time to check on the worker. if k == -1: # Make sure we don't pin the CPU, so if there's no input and # no waiting updates, sleep for awhile. if not self.ph.pid: time.sleep(0.01) continue r = self.ph.recv(True, 0.01) if r: feed = [ f for f in self.cfg.feeds if f.URL == r[1]][0] if r[0] == PROC_UPDATE: old = [] for gf, tf, s, l in r[4]: if not l: old.append((gf, tf, s, l)) continue for i, oldidx in enumerate(l): l[i] = feed[oldidx] old.append((gf, tf, s, l)) feed.merge(r[2]) new = [] for gf, tf, s, l in r[3]: if not l: new.append((gf, tf, s, None)) continue for i, newidx in enumerate(l): l[i] = (feed[newidx], newidx) new.append((gf, tf, s, l)) self.gui.alarm(new, old) self.gui.draw_elements() feed.qd = False continue # Handle Meta pairs elif k == 195: k2 = self.cfg.stdscr.getch() if k2 >= 64: t = (k2 - 64, 1) else: t = (k, 0) # Just a normal key-press else: t = (k, 0) # Key resolves a keypress tuple into a list of actions actions = self.gui.key(t) # Actions are executed in order, and each return code # is handled in order. for a in actions: r = self.gui.action(a) if r == REFRESH_ALL: self.refresh() elif r == UPDATE: self.update() elif r in [REFILTER, RETAG]: self.ph.flush() self.update(1) elif r == TFILTER: # Tag filters shouldn't perform a full update, so we map # the relevant tag to all of the feeds that include that # tag and update them. 
t = self.gui.sel["tag"] ufds = [ f for f in self.cfg.feeds\ if t.tag in f.tags] t.clear() self.update(1, ufds) elif r == REDRAW_ALL: self.gui.draw_elements() elif r == RESTART: self.restart = True self.gui = None elif r == EXIT: self.gui = None break except Exception: self.estring = traceback.format_exc() except KeyboardInterrupt: pass self.done() def done(self, a=None, b=None): # Unset signals. for s in [signal.SIGWINCH, signal.SIGCHLD, signal.SIGINT]: signal.signal(s, signal.SIG_DFL) signal.signal(signal.SIGALRM, signal.SIG_IGN) # Kill the message log self.cfg.msg = None # Kill curses if not self.restart: try: curses.endwin() except: pass self.cfg.log("Curses done.") # If there was an exception, nicely print it out. if self.estring: self.cfg.log("\nEXCEPTION:") self.cfg.log(self.estring) print "Canto exited on an exception.\n" print self.estring print "Please report this bug. Send your logfile " +\ "(%s) to jack@codezen.org" % self.cfg.log_file # Flush the work thread to make sure no updates are going on. self.ph.flush() self.ph.sync() self.cfg.log("Flushed to disk.") self.ph.kill_process() # For the most part, it's smart to avoid doing anything but set a flag in an # signal handler. CHLD is an exception because the only case in which we do # anymore work than just acknowledging the process is dead is when # wait_for_pid is set and, in this case the main loop is paused anyway. def chld(self, a=None, b=None): # I'm not sure why, but SIGCHLD gets called and occasionally waitpid # then throws an exception about no waiting processes therefore this is # wrapped in meaningless try...except try: while True: pid, status = os.waitpid(0, os.WNOHANG) if pid == 0: break # If the interface is waiting on this pid to be done, reset the # signal and simulate a resize to make sure the window # information is still fresh. if self.cfg.wait_for_pid == pid: self.cfg.wait_for_pid = 0 signal.signal(signal.SIGALRM, self.alarm) signal.signal(signal.SIGWINCH, self.winch) self.alarmed = 1 self.resize = 1 except: pass # Back to better practices. =) def winch(self, a=None, b=None): self.resize = 1 def alarm(self, a=None, b=None): self.alarmed = 1 def sigusr(self, a, b): self.sigusr = 1 # Tick decrements two timers. One for a possible update (if "interval" is a # valid update trigger), and one for the message box at the bottom of the # interface so that messages don't persist for very long. def tick(self, refilter=0): # Possible update tick self.ticks -= 1 if self.ticks <= 0: if "interval" in self.cfg.triggers: self.update(0, self.cfg.feeds) self.ticks = 60 # Message tick self.cfg.msg_tick -= 1 if self.cfg.msg_tick == 0: self.cfg.message(self.cfg.status(self.cfg), 1) signal.alarm(1) # Update is where the work is queued up for the work thread. def update(self, refilter = 0, iter = None, action=PROC_BOTH): # Default to updating all feeds that match the gui's current tags if iter == None: iter = [] for f in self.cfg.feeds: for t in self.gui.tags: if t.tag == "*" or t.tag in f.tags: iter.append(f) break # This will restart the worker process if an unknown filter has made it # into the global or tag filters. This should *never* happen unless the # user has on-the-fly defined a filter (i.e. with search_filter). This # makes on-the-fly filters rather expensive. Also note that this should # *always* be accompanied by a REFILTER return code from the gui as the # kill_process will ignore any further returned data. 
restart = 0 for t in self.cfg.tags.cur(): tf = t.filters.cur() if tf not in self.cfg.all_filters: self.cfg.all_filters.append(tf) restart = 1 gf = self.cfg.filters.cur() if gf not in self.cfg.all_filters: self.cfg.all_filters.append(gf) restart = 1 if restart: self.ph.kill_process() self.ph.start_process(self.cfg) # Queue up the appropriate items. for f in iter: if f.qd and not refilter: continue self.ph.send( (action, f.URL, f[:],\ self.cfg.all_filters.index(self.cfg.filters.cur()), [(t.tag,\ self.cfg.all_filters.index(t.filters.cur()),\ self.cfg.all_sorts.index(t.sorts.cur()))\ for t in self.cfg.tags.cur()],\ refilter)) for s in f.changed(): s.updated = STORY_QD f.qd = True # Refresh should only be called when it's possible that the screen has # changed shape. def refresh(self, restart = False): # Get new self.cfg.{height, width} if not restart: try: curses.endwin() except: pass self.cfg.stdscr.touchwin() self.cfg.stdscr.refresh() self.cfg.stdscr.keypad(1) self.cfg.height, self.cfg.width = self.cfg.stdscr.getmaxyx() # If there's a resize hook, execute it. if self.cfg.resize_hook: self.cfg.resize_hook(self.cfg) # Make sure we've got a minimum for columns and reader_lines self.cfg.columns = max(self.cfg.columns, 1) self.cfg.reader_lines = max(self.cfg.reader_lines, 3) # Adjust gui_height to compensate for the message at the bottom. self.cfg.gui_height = self.cfg.height - self.cfg.msg_height self.cfg.gui_width = self.cfg.width # Now we interpret the reader_orientation setting from the config to # shape the reader area, and adjust other height / width settings # accordingly. # XXX This logic could be cleaned up... if self.cfg.reader_orientation == "top": self.cfg.gui_height -= self.cfg.reader_lines self.cfg.gui_top = self.cfg.reader_lines elif self.cfg.reader_orientation == "bottom": self.cfg.gui_height -= self.cfg.reader_lines elif self.cfg.reader_orientation == "left": self.cfg.gui_width -= self.cfg.reader_lines self.cfg.gui_right = self.cfg.reader_lines elif self.cfg.reader_orientation == "right": self.cfg.gui_width -= self.cfg.reader_lines # Create the message window. This could arguably be crammed into the cfg # class itself, however, for logging and messaging, the cfg class is # basically just a glorified way to do away with globals =P self.cfg.msg = curses.newwin(self.cfg.msg_height,\ self.cfg.width, self.cfg.height - self.cfg.msg_height, 0) self.cfg.msg.bkgdset(curses.color_pair(1)) self.cfg.msg.erase() self.cfg.msg.refresh() # Perform the main update update. self.gui.refresh() self.gui.draw_elements() # This also doesn't follow good signal practices, but it's an exception # because the backtrace is important. def debug_out(self, a=None, b=None): f = open("canto_debug_out", "w") for l in traceback.format_stack(): f.write(l) f.close() canto-0.7.10/canto/process.py000066400000000000000000000361141142361563200160470ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Processes are used for concurrency in canto because Python's GIL makes threads # worthlessly slow. The initial thread based reimplementation of canto for 0.7.x # made strides towards responsiveness, but the speed was agonizing in some # cases, and after reading http://www.dabeaz.com/python/GIL.pdf, it was clear # that they weren't working as well as you would anticipate. 
# # --SIDE RANT-- # I know that GvR has stated that the GIL isn't going anywhere # anytime soon. I think this is the worst choice possible for Python. When it # was initially implemented in the 90s, multi-processor, multi-core machines # were basically *unheard of* outside of corporate computing. These days you # can't buy a processor without getting multiple cores and sacrificing # concurrent performance for single threaded performance is a huge mistake. In # the future, languages that support massive parallelism are going to clean # Python's clock (see: Haskell, Erlang). # --END SIDE RANT-- # # Processing in canto is not simple. The ProcessManager class forks a process # with two pipes, one directed to the process and another directed from the # process (the typical pipe setup). The interface process puts a work object in # the pipe, the worker process does the work and puts the result in the outgoing # pipe. The work tuples looke like this: # # (action, arguments...) # # There are a number of actions: (PROC_UPDATE, URL, old items) performs on disk # update only, this is used early in init when we're trying to rectify tags from # ondisk content # # (PROC_GETTAGS, ) requests that the process return the rectified tags (i.e. # collisions resolved) # # (PROC_FILTER / PROCESS_BOTH , URL, old items, global filter index, # [tag_info], refilter) performs the filtering/sorting (in addition to # update for BOTH) this is the most common full update. PROC_FILTER is # only used after PROC_UPDATE early on. [tag_info] is a list of one tuple # per tag containing: # # (tag string, tag filter index, tag sort index) # # (PROC_FLUSH, ) This essentially serves as a marker in the pipe that's # returned verbatim when the worker thread receives it. In practice, the # ProcessHandler's flush() call puts it into the pipe and discards any # output until it gets it back. This is used when the items in the pipe # are no longer accurate (i.e. the filter/sort/tag settings have changed. # # (PROC_SYNC, URL, old items) This syncs the state to disk. Typically the # state is saved on update, but on exiting the program, it needs to be # explicitly synced to disk so that any state changes made between the # last disk update are saved. # # (PROC_KILL, ) Kills the process. In implementation, it's just PROC_FLUSH # with an added clause so it terminates after returning the same tuple. # After that tuple if received back, it's guaranteed that the process was # safely exited and it can be assumed that the pipes are no longer active. # Most of the return tuples are self-explanatory. The most common return from # PROC_BOTH looks like this: # # (URL, stories, newdiff, olddiff) # # Where both diffs are arrays that match up with all of the currently used tags. # For each tag, the diff contains # # (global filter index, tag filter index, tag sort index, new/old item # indices) # # This diff includes information to keep everything in sync. While the thread # works the filters and sorts can change so when the interface thread receives # the diff info it has to check that it's still valid. # If it is valid, items are added or evicted from the tags. # NOTE: If this doesn't make sense, canto.gui.alarm() is where this format is # unravelled. # WHY ALL THE INDICES? Since we're already passing everything but the kitchen # sink explicitly through the pipes, it might seem odd to pass index numbers # back and forth. However, passing lambdas or functions through pipes is not # easily possible because they are unpickle-able. 
The solution is that after the # os.fork(), the all_filters and all_sorts list (in addition to the empty Feed # objects) are still resident in the new process' memory. So we pass indices # into those lists to workaround the inability to pass the functions themselves. # An unfortunate side-effect is that all filters and sorts have to be known at # the time of the os.fork since the lists won't be synchronized between # processes automatically (which is one reason processes are more difficult to # work with than threads). # The point here is to make the interface process have to do the absolute # minimum because every second spent updating is a second spent unresponsive to # the user. from const import * from threading import Thread, Lock from cPickle import dumps, loads import select import signal import time import sys import os class Queue(): def __init__(self): self.recvpipe, self.sendpipe = os.pipe() self.poll = select.poll() self.poll.register(self.recvpipe, select.POLLIN) self.objlist = [] self.objlock = Lock() self.thread = None self.alive = True def get(self, block=True, timeout=None): ready = self.poll.poll(timeout) for fd, event in ready: l = os.read(self.recvpipe, 1) while l[-1] != " ": l += os.read(self.recvpipe, 1) l = int(l) s = os.read(self.recvpipe, l) return loads(s) raise Exception def feed_pipe(self): while self.alive: if not len(self.objlist): time.sleep(0.1) continue self.objlock.acquire() obj = self.objlist[0] self.objlist = self.objlist[1:] self.objlock.release() s = dumps(obj) l = len(s) os.write(self.sendpipe, "%d %s" % (l, s)) def put(self, obj): if not self.thread: self.thread = Thread(target = self.feed_pipe) self.thread.start() self.objlock.acquire() self.objlist.append(obj) self.objlock.release() def close(self): if self.thread: while len(self.objlist): pass self.alive = False self.thread.join() os.close(self.recvpipe) os.close(self.sendpipe) class ProcessHandler(): def __init__(self, cfg): self.persist = True self.cfg = cfg self.start_process(cfg, True) def start_process(self, cfg, persist=False): self.persist = persist self.updated = Queue() self.update = Queue() self.pid = os.fork() if not self.pid: self.run(self.update, self.updated, cfg.all_filters, cfg.all_sorts, cfg.feeds) return self.pid def run(self, update, updated, all_filters, all_sorts, feeds): def scan_tags(feeds): # This chunk of code takes any base tags that inadvertantly conflict # (i.e. weren't explicitly set by the user) and resolves them into # Tag, Tag (2), Tag (3), etc. # This may seem like paranoia, but half-assed feed generators that # use default feed titles shouldn't break Canto. base_tags = {} for f in [x for x in feeds if x.base_set and\ not x.base_explicit]: otag = f.tags[0] if f.tags[0] in base_tags: base_tags[otag] += 1 # We check each tag is unique, even if we're # generating a new one so that if a user defines # "Tag, Tag, Tag (2)", it resolves to # "Tag, Tag (3), Tag (2)" while f.tags[0] + (" (%d)" % base_tags[otag]) in base_tags: base_tags[otag] += 1 f.tags[0] += " (%d)" % base_tags[otag] # Remove original tag from all stories in the feed for s in f: s.unset(otag) s.set(f.tags[0]) else: base_tags[f.tags[0]] = 1 def send(obj): return updated.put(obj) # SIGINT is issued to all sub-processes when given as ^C, # and the SIGINT handler for the main process will cleanup # the worker, so that's the only one that's truly necessary # AFAICT, but chalk the rest of the ignores up to paranoia. 
signal.signal(signal.SIGCHLD, signal.SIG_DFL) signal.signal(signal.SIGWINCH, signal.SIG_DFL) signal.signal(signal.SIGALRM, signal.SIG_DFL) signal.signal(signal.SIGINT, signal.SIG_DFL) signal.signal(signal.SIGUSR1, signal.SIG_DFL) while True: while True: try: r = update.get(True, 0.1) except: # If parent canto is dead, kill ourselves if os.getppid() == 1: r = (PROC_KILL, ) send = lambda x: True break continue break action, args = r[0],r[1:] if action == PROC_GETTAGS: scan_tags(feeds) send([ f.tags for f in feeds ]) continue if action in [PROC_FLUSH, PROC_KILL]: # Make sure we leave the on-disk presence constant send((action, )) if action == PROC_KILL: update.close() updated.close() sys.exit(0) continue if action == PROC_SYNC: feed = [ f for f in feeds if f.URL == args[0] ][0] feed.merge(args[1]) while feed.changed(): feed.todisk() send((PROC_SYNC,)) continue # PROC_UPDATE, just load the data from disk. if action >= PROC_UPDATE: feed = [ f for f in feeds if f.URL == args[0] ][0] feed.merge(args[1]) if not feed.update(): send((PROC_DEQD, feed.URL)) continue if action == PROC_UPDATE: send((action, feed[:])) else: prev = args[1] filter = args[2] taginfo = args[3] refilter = args[4] if refilter: prev = [] # Step 1: Global Filters gf = all_filters[filter] new = [] for item in feed: if (item in prev) or (gf and (not gf(feed, item))): continue new.append(item) old = [] for item in prev: if (item in feed) and ((not gf) or gf(feed, item)): continue old.append(item) # Step 2: Tag filters, initial diff ndiff = [None] * len(taginfo) for item in new: for i, (t, tf, ts) in enumerate(taginfo): tagf = all_filters[tf] if t in item["canto_state"] and\ ((not tagf) or tagf(t,item)): if not ndiff[i]: ndiff[i] = [item] else: ndiff[i].append(item) odiff = [None] * len(taginfo) for item in old: for i, (t, tf, ts) in enumerate(taginfo): tagf = all_filters[tf] if t in item["canto_state"] and\ ((not tagf) or tagf(t, item)): if not odiff[i]: odiff[i] = [item] else: odiff[i].append(item) # Step 3: Tag sorts for i, (t, tf, ts) in enumerate(taginfo): sort = all_sorts[ts] if not sort: continue if ndiff[i]: ndiff[i].sort(sort) if odiff[i]: odiff[i].sort(sort) # Step 4: Convert items into indices for newdiff in ndiff: if not newdiff: continue for i, item in enumerate(newdiff): newdiff[i] = feed.index(newdiff[i]) for olddiff in odiff: if not olddiff: continue for i, item in enumerate(olddiff): olddiff[i] = prev.index(olddiff[i]) # Step 5: Add parity information for i, (t, tf, ts) in enumerate(taginfo): ndiff[i] = (filter, tf, ts, ndiff[i]) odiff[i] = (filter, tf, ts, odiff[i]) # Step 6: Queue up the results for the interface process. send((PROC_UPDATE, feed.URL, feed[:], ndiff, odiff)) if action > PROC_UPDATE: del feed[:] def send(self, obj): if not self.pid: self.start_process(self.cfg) return self.update.put(obj) # recv_raw won't attempt to kill the process if no more work is queued. # Its intended for use when syncing / killing when a tuple is floated # through the process and no feeds are queued. def recv_raw(self, block=True, timeout=None): r = None try: r = self.updated.get(block, timeout) except: pass return r def recv(self, block=True, timeout=None): r = self.recv_raw(block, timeout) # If no more feeds are queued and we're not persistent (used very early), # kill the slave process in addition to returning the received value. 
if self.persist: return r for f in self.cfg.feeds: if f.qd: return r self.kill_process() return r def send_and_wait(self, symbol): self.send((symbol, )) while True: got = self.recv_raw() if got == (symbol, ): return # Send_and_wait ignores all items on the queue # so none of the feeds are still queued. for f in self.cfg.feeds: f.qd = False def kill_process(self): if self.pid: self.send_and_wait(PROC_KILL) self.update.close() self.updated.close() self.pid = 0 def flush(self): if self.pid: self.send_and_wait(PROC_FLUSH) def sync(self): for f in self.cfg.feeds: self.send((PROC_SYNC, f.URL, f[:])) for f in self.cfg.feeds: self.recv_raw() canto-0.7.10/canto/reader.py000066400000000000000000000124541142361563200156340ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. from basegui import BaseGui from input import input from const import * import utility import curses class Reader(BaseGui): def __init__(self, cfg, tag, story, dead_call): self.story = story self.cfg = cfg self.keys = cfg.reader_key_list self.focus = 0 self.more = 0 self.offset = 0 self.height = 0 self.width = 0 self.height = 0 self.show_links = 0 self.tag = tag self.dead = dead_call self.refresh() def refresh(self): # It's unfortunate, but because the interface is so complex, # the only way to get the number of lines it will take to completely # render the reader, we actually have to render it to a None window # first. # A way to get this right off the bat would be nice, but I doubt # it would enhance the performance more than one iota. d = { "story" : self.story, "tag" : self.tag, "cfg" : self.cfg, "show_links" : self.show_links, "window" : None } if self.cfg.reader_orientation in ["top","bottom",None]: # First render for self.lines d["width"] = self.cfg.gui_width d["height"] = 0 self.lines, self.links = self.tag.renderer.reader(d) # This is the default, old behavior (floating window) if not self.cfg.reader_orientation: d["width"] = self.cfg.gui_width d["height"] = min(self.lines, self.cfg.gui_height) self.top, self.right = (0,0) # Rendering the reader into a pre-existing space else: d["height"] = self.cfg.reader_lines d["width"] = self.cfg.gui_width if self.cfg.reader_orientation == "top": self.top, self.right = (0,0) else: self.top, self.right = (self.cfg.gui_height, 0) else: d["width"] = self.cfg.reader_lines d["height"] = self.cfg.gui_height self.lines, self.links = self.tag.renderer.reader(d) if self.cfg.reader_orientation == "left": self.top, self.right = (0, 0) else: self.top, self.right = (0, self.cfg.gui_width) self.window = curses.newpad(self.lines, d["width"]) self.window.bkgdset(curses.color_pair(1)) d["window"] = self.window self.width = d["width"] self.height = d["height"] self.lines, self.links = self.tag.renderer.reader(d) self.draw_elements() def draw_elements(self): self.more = self.lines - (self.height + self.offset) self.window.noutrefresh(self.offset, 0, self.top, self.right, \ self.top + self.height - 1, self.width + self.right) def toggle_show_links(self): self.show_links = not self.show_links self.refresh() return REDRAW_ALL def scroll_down(self): if self.more > 0 : self.offset += 1 return REDRAW_ALL def page_down(self): if self.more > self.height: self.offset += self.height else: self.offset = self.lines - self.height return REDRAW_ALL def scroll_up(self): if 
self.offset : self.offset -= 1 return REDRAW_ALL def page_up(self): if self.offset > self.height: self.offset -= self.height else: self.offset = 0 return REDRAW_ALL def goto(self): term = input(self.cfg, "Goto") if not term: return links = [] terms = term.split(',') for t in terms: try: links.append(int(t)) except: if t.count('-') == 1: d = t.index('-') a = t[:d] b = t[(d+1):] try: a = int(a) b = int(b) except: self.cfg.log("Unable to interpret range!") return for l in xrange(a,b + 1): links.append(l) else: self.cfg.log("Unable to interpret link!") return out = "Going to link" if len(links) != 1: out += "s " for n in links[:-1]: out += "%d, " % n out += "and %d" % links[-1] else: out += " %d" % links[0] self.cfg.log(out) for l in links: self.dogoto(l) def dogoto(self, n): if n == None: return if n < len(self.links): utility.goto(self.links[n], self.cfg) return 1 def destroy(self): # Dereference anything fetched from disk to render. self.story.free() self.window.erase() self.draw_elements() self.dead() return REDRAW_ALL canto-0.7.10/canto/story.py000066400000000000000000000114211142361563200155430ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Story is a slick little class. It's main purpose is to hold all of the data # for each story, both content and state. To save memory, 0.7.0 will attempt to # fetch data from disk if it's not held by default. A good example is # description information which can take up kilobytes or megabytes of memory but # are rarely used. # Story doesn't care which feed or tag it's associated with. If you really want # to get the feed, story["feed"] contains the unique URL, but you'd have to use # the config to get the Feed() object. The only thing that the Story() gets from # the feed is the get_ufp function that gets the feedparser dict from disk. from const import STORY_SAVED, STORY_UPDATED import cPickle import fcntl class Story(): def __init__(self, d, ufp_path, updated): self.updated = updated self.ufp_path = ufp_path self.ondisk = None self.d = d self.sel = 0 self.in_reader = 0 def __eq__(self, other): if self["id"] != other["id"]: return 0 # The reason that we have to check for membership is # that sometimes (like when writing to disk) it's convenient # to compare against a dict in the feedparser block than # another Story object. In all of these cases, the feeds # are guaranteed to be the same anyway. if "feed" in other and (self["feed"] != other["feed"]): return 0 return 1 def __str__(self): return self.d["title"] + " " + str(id(self)) # Where get_ufp reads the ufp from disk, this narrows that down to a # particular feed entry. def get_ufp_entry(self): if not self.ufp_path: return {} try: f = open(self.ufp_path, "r") try: fcntl.flock(f.fileno(), fcntl.LOCK_SH) ufp = cPickle.load(f) except: return {} finally: fcntl.flock(f.fileno(), fcntl.LOCK_UN) f.close() except: return {} for ondisk in ufp["entries"]: if ondisk["id"] == self["id"]: break else: self.ondisk = None self.ondisk = ondisk def __getitem__(self, key): if key in self.d: return self.d[key] # If the key isn't stored by default, get it from disk. 
else: if not self.ondisk: self.get_ufp_entry() if not self.ondisk: return "" if key in self.ondisk: return self.ondisk[key] return "" def __setitem__(self, key, item): self.d[key] = item def __contains__(self, key): if key in self.d: return True else: if not self.ondisk: self.get_ufp_entry() if not self.ondisk: return False return key in self.ondisk def was(self, tag): return tag in self.d["canto_state"] def set(self, tag): if not tag in self.d["canto_state"]: self.d["canto_state"].append(tag) self.updated = STORY_UPDATED def unset(self,tag): if tag in self.d["canto_state"]: self.d["canto_state"].remove(tag) self.updated = STORY_UPDATED # These are separate from the was/set/unset since the selected status isn't # stored in the ufp. def selected(self): return self.sel def select(self): self.sel = 1 def unselect(self): self.sel = 0 def get_text(self): if "content" in self: for c in self["content"]: if "type" in c and "text" in c["type"]: return c["value"] return self["description"] def get_type(self): if "content" in self: for c in self["content"]: if "type" in c and "text" in c["type"]: return c["type"] if "summary_detail" in self and "type" in self["summary_detail"]: return self["summary_detail"]["type"] return "text/html" def get_title_type(self): if "title_detail" in self and "type" in self["title_detail"]: return self["title_detail"]["type"] return "text/html" # Free makes the Story() forget all of the uncommon items. Should be called # after anything that could cause Story() to fetch items (ie. in the # Renderer hooks). def free(self): if self.ondisk: self.ondisk = None canto-0.7.10/canto/tag.py000066400000000000000000000071371142361563200151470ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import story class Tag(list): def __init__(self, cfg, renderer, sorts = [[None]], \ filters = [None], c = "*"): list.__init__(self) self.cfg = cfg self.renderer = renderer self.tag = c self.collapsed = 0 self.start = 0 self.read = 0 self.unread = 0 self.filters = filters self.sorts = sorts def __eq__(self, other): return self.tag == other.tag def __str__(self): return self.tag def search_stories(self, story): for i in range(len(self)) : if self[i]["id"] == story["id"]: return i return -1 def all_read(self): for s in self : s.set("read") self.unread = 0 self.read = len(self) def all_unread(self): for s in self : s.unset("read") self.read = 0 self.unread = len(self) def set_read(self, item): if not item.was("read"): item.set("read") self.unread -= 1 self.read += 1 def set_unread(self, item): if item.was("read"): item.unset("read") self.unread += 1 self.read -= 1 def sort_add(self, iter): if not iter: return sort = self.sorts.cur() if not len(self): list.extend(self, [item[0] for item in iter]) return if not sort: for item, idx in iter: list.insert(self, idx, item) return added = 0 for i, item in enumerate(self): while sort(item, iter[0][0]) > 0: list.insert(self, i + added, iter[0][0]) del iter[0] added += 1 if not iter: return list.extend(self, [ item[0] for item in iter]) def retract(self, iter): for item in iter: if item in self: # Items in the reader are immune to being retraced. 
if item.in_reader: continue if item.was("read"): self.read -= 1 else: self.unread -= 1 self.remove(item) empty = 0 if self.filters.cur() and not len(self): d = { "title" : "No unfiltered items.", "description" : "You've filtered out everything!", "canto_state" : [self.tag, "*"], "id" : "canto-internal" } stub = story.Story(d, None, 0) self.append(stub) empty = 1 self.enum(empty) def extend(self, iter): self.sort_add(iter) if len(self) > 1 and self[0]["id"] == "canto-internal": del self[0] self.enum() def enum(self, empty = 0): if empty: self.read = 0 self.unread = 0 self[0].idx = 0 self[0].last = 1 else: lt = len(self) for i in range(lt): self[i].idx = i self[i].last = 0 if lt: self[-1].last = 1 self.read = len(filter(lambda x : x.was("read"), self)) self.unread = len(self) - self.read def clear(self): self.retract(self[:]) canto-0.7.10/canto/utility.py000066400000000000000000000070501142361563200160710ustar00rootroot00000000000000# -*- coding: utf-8 -*- #Canto - ncurses RSS reader # Copyright (C) 2008 Jack Miller # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import tempfile import urllib2 import signal import curses import locale import sys import re import os # The Cycle class has proved to be useful. It's used # to encapsulate every cycle in canto, global filters, # tag filters, global and tag sorts, tags. It's # essentially a list with a current pointer and # exception proof next/prev function and the ability # to temporarily override a particular value. class Cycle(): def __init__(self, list, idx = 0): self.overridden = False self.over = None self.list = list if 0 <= idx < len(self.list): self.idx = idx else: self.idx = 0 def next(self): self.overridden = False if self.idx >= len(self.list) - 1: return 0 self.idx += 1 return 1 def prev(self): self.overridden = False if self.idx <= 0: return 0 self.idx -= 1 return 1 def override(self, cur): if not self.overridden or self.over != cur: self.overridden = True self.over = cur return 1 return 0 def cur(self): if self.overridden: return self.over return self.list[self.idx] def silentfork(path, href, text, fetch): enc = locale.getpreferredencoding() href = href.encode(enc, "ignore") pid = os.fork() if not pid : # A lot of programs don't appreciate # having their fds closed, so instead # we dup them to /dev/null. fd = os.open("/dev/null", os.O_RDWR) os.dup2(fd, sys.stderr.fileno()) if not text: os.setpgid(os.getpid(), os.getpid()) os.dup2(fd, sys.stdout.fileno()) if fetch: response = urllib2.urlopen(href) data = response.read() fd, name = tempfile.mkstemp() os.write(fd, data) os.close(fd) path = path.replace("%u", name) else: path = path.replace("%u", href) os.execv("/bin/sh", ["/bin/sh", "-c", path]) sys.exit(0) if text: signal.signal(signal.SIGALRM, signal.SIG_IGN) signal.signal(signal.SIGWINCH, signal.SIG_IGN) return pid def goto(link, cfg): title,href,handler = link if handler in cfg.handlers: for k in [h for h in cfg.handlers[handler].keys() if h]: if href.endswith(k): binary, text, fetch = cfg.handlers[handler][k] break else: if None in cfg.handlers[handler]: binary, text, fetch = cfg.handlers[handler][None] else: cfg.log("No default %s handler defined!" % handler) return # Escape all "s in the URL, to avoid malicious use # of crafted feeds. Thanks to Andreas. 
href = href.replace("\"","%22") if text: cfg.wait_for_pid = silentfork(binary, href, 1, fetch) else: silentfork(binary, href, 0, fetch) else: cfg.log("No handler set for %s" % handler) def stripchars(string): string = string.replace("\\","\\\\") string = string.replace("%", "\\%") return string def strip_escape_chars(strings): return (re.sub("\\\\(.)", "\\1", string) for string in strings) canto-0.7.10/canto/widecurse.c000066400000000000000000000210501142361563200161460ustar00rootroot00000000000000/* Canto - ncurses RSS reader Copyright (C) 2008 Jack Miller This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation.*/ #include #include #define IGNORE_COLOR 1 #define IGNORE_STYLE 2 static int ignore = 0; char *lstrip(char *s) { int i = 0; for(i=0;s[i];i++) { if((s[i] != ' ')&&(s[i] != '\t')) break; } return &s[i]; } static int theme_strlen(char *message, char end) { int len = 0; int i = 0; while ((message[i] != end) && (message[i] != 0)) { if (message[i] == '%') { i++; } else if (message[i] == '\\') { i++; len++; } else if ((unsigned char) message[i] > 0x7f) { wchar_t dest[2]; int bytes = mbtowc(dest, &message[i], 3) - 1; if (bytes >= 0) { int rwidth = wcwidth(dest[0]); if(rwidth < 0) rwidth = 1; i += bytes; len += rwidth; } else { i++; len += 1; } } else if (message[i] != '\n') len++; i++; } return len; } static PyObject *tlen(PyObject *self, PyObject *args) { char *message; char end = 0; if(!PyArg_ParseTuple(args, "s|c", &message, &end)) return NULL; return Py_BuildValue("i",theme_strlen(message, end)); } #define COLOR_MEMORY 8 static void style_box(WINDOW *win, char code) { /* This function's limited memory */ static int colors[COLOR_MEMORY] = {0}; static int color_idx = 0; static char attrs[6] = {0,0,0,0,0,0}; /* >=2 == even ignore color */ if (!(ignore & IGNORE_COLOR)) { if (code == '0') { if ((color_idx != COLOR_MEMORY - 1)||(!colors[color_idx])) color_idx = (color_idx > 1) ? color_idx - 1: 1; colors[color_idx] = 0; wattron(win, COLOR_PAIR(colors[color_idx - 1])); } else if ((code >= '1') && (code <= '8')) { if (color_idx == COLOR_MEMORY - 1) { if (colors[color_idx]) { int i = 0; for (i = 0; i < color_idx; i++) colors[i] = colors[i + 1]; } colors[color_idx] = code - '0'; wattron(win, COLOR_PAIR(colors[color_idx])); } else { colors[color_idx] = code - '0'; wattron(win, COLOR_PAIR(colors[color_idx])); color_idx++; } } } if (ignore & IGNORE_STYLE) return; if (code == 'B') { attrs[0]++; if(!attrs[5]) wattron(win, A_BOLD); } else if (code == 'b') { attrs[0]--; if(!attrs[0]) wattroff(win, A_BOLD); } else if (code == 'U') { attrs[1]++; if(!attrs[5]) wattron(win, A_UNDERLINE); } else if (code == 'u') { attrs[1]--; if(!attrs[1]) wattroff(win, A_UNDERLINE); } else if (code == 'S') { attrs[2]++; if(!attrs[5]) wattron(win, A_STANDOUT); } else if (code == 's') { attrs[2]--; if(!attrs[2]) wattroff(win, A_STANDOUT); } else if (code == 'R') { attrs[3]++; if(!attrs[5]) wattron(win, A_REVERSE); } else if (code == 'r') { attrs[3]--; if(!attrs[3]) wattroff(win, A_REVERSE); } else if (code == 'D') { attrs[4]++; if(!attrs[5]) wattron(win, A_DIM); } else if (code == 'd') { attrs[4]--; if(!attrs[4]) wattroff(win, A_DIM); } /* For some reason wattron(win, A_NORMAL) doesn't work. 
*/ else if (code == 'N') { attrs[5]++; if (win) wattrset(win, 0); } else if (code == 'n') { attrs[5]--; if(!attrs[5]) { if(attrs[0]) wattron(win, A_BOLD); if(attrs[1]) wattron(win, A_UNDERLINE); if(attrs[2]) wattron(win, A_STANDOUT); if(attrs[3]) wattron(win, A_REVERSE); if(attrs[4]) wattron(win, A_DIM); } } else if (code == 'C') { int j = 0; for(j = 0; j < 5; j++) attrs[j] = 0; if (win) wattrset(win, 0); } } static int putxy(WINDOW *win, int width, int *i, int *y, int *x, char *str) { if ((unsigned char) str[0] > 0x7F) { wchar_t dest[2]; int bytes = mbtowc(dest, &str[0], 3) - 1; if (bytes >= 0) { /* To deal with non-latin characters that can take up more than one character's alotted width, with offset x by wcwidth(character) rather than 1 */ /* Took me forever to find that function, thanks Andreas (newsbeuter) for that one. */ int rwidth = wcwidth(dest[0]); if (rwidth < 0) rwidth = 1; if (rwidth > (width - *x)) return 1; dest[1] = 0; mvwaddwstr(win, *y, *x, dest); *x += rwidth; *i += bytes; } } else mvwaddch(win, *y, (*x)++, str[0]); return 0; } static int do_char(WINDOW *win, int width, int *i, int *y, int *x, char *str) { if (!str[*i]) { return -1; } else if (str[*i] == '\\') { (*i)++; putxy(win, width, i, y, x, &str[*i]); } else if (str[*i] == '%') { (*i)++; if (!str[(*i)]) return -1; else style_box(win, str[*i]); } else if (str[*i] == ' ') { int tmp = theme_strlen(&str[*i + 1], ' '); if ((tmp >= (width - *x)) && (tmp < width)) { (*i)++; return -2; } else putxy(win, width, i, y, x, &str[*i]); } else if(putxy(win, width, i, y, x, &str[*i])) return -2; return 0; } static PyObject * mvw(PyObject *self, PyObject *args) { int y, x, width, rep_len, end_len, ret; char *message, *rep, *end; const char *m_enc, *r_enc, *e_enc; PyObject *window; WINDOW *win; /* We use the 'et' format because we don't want Python to touch the encoding and generate Unicode Exceptions */ if(!PyArg_ParseTuple(args, "Oiiietetet", &window, &y, &x, &width, &m_enc, &message, &r_enc, &rep, &e_enc, &end)) return NULL; if (window != Py_None) win = ((PyCursesWindowObject *)window)->win; else win = NULL; rep_len = strlen(rep); end_len = theme_strlen(end, 0); /* Make width relative to current x */ width += x; int i = 0; for (i = 0; ((x < width - end_len) || (message[i] == '%')); i++) { ret = do_char(win, width - end_len, &i, &y, &x, message); if (ret) break; } int j = 0; for(j = 0; x < (width - end_len); j = (j + 1) % rep_len) do_char(win, width - end_len, &j, &y, &x, rep); for(j = 0; end[j]; j++) do_char(win, width, &j, &y, &x, end); PyMem_Free(rep); PyMem_Free(end); if (ret == -1) { PyMem_Free(message); return Py_BuildValue("s", NULL); } else { PyObject *r = Py_BuildValue("s", lstrip(&message[i])); PyMem_Free(message); return r; } } static PyObject *disable_color(PyObject *self, PyObject *args) { ignore |= IGNORE_COLOR; Py_RETURN_NONE; } static PyObject *enable_color(PyObject *self, PyObject *args) { ignore &= ~IGNORE_COLOR; Py_RETURN_NONE; } static PyObject *disable_style(PyObject *self, PyObject *args) { ignore |= IGNORE_STYLE; Py_RETURN_NONE; } static PyObject *enable_style(PyObject *self, PyObject *args) { ignore &= ~IGNORE_COLOR; Py_RETURN_NONE; } static PyMethodDef MvWMethods[] = { {"core", mvw, METH_VARARGS, "Wide char print."}, {"disable_color", disable_color, METH_VARARGS, "Disable color codes."}, {"enable_color", enable_color, METH_VARARGS, "Enable color codes."}, {"disable_style", disable_style, METH_VARARGS, "Disable style codes."}, {"enable_style", enable_style, METH_VARARGS, "Enable style codes."}, {"tlen", 
tlen, METH_VARARGS, "Len ignoring theme escapes, and accounting for Unicode character width."}, {NULL, NULL, 0, NULL} }; PyMODINIT_FUNC initwidecurse(void) { Py_InitModule("widecurse", MvWMethods); } canto-0.7.10/doc/000077500000000000000000000000001142361563200134535ustar00rootroot00000000000000canto-0.7.10/doc/advanced-configuration000066400000000000000000000664451142361563200200270ustar00rootroot00000000000000This is a section on configuration that requires a little more programming knowledge and knowledge of Canto's internal data structures. [TOC] # Canto Objects
## The `cfg` object

The `cfg` object includes just about every ounce of configuration in Canto. You
can use this object to get to any tag, sort, filter, story or feed that could
be used. Here's a subset of useful content.

### Attributes ###

Attribute | Description
--------- | -----------
`[cfg.feeds]` | A list of all feed objects Canto is aware of.
`[cfg.tags]` | A list of all tag objects Canto is aware of.
`[cfg.tag_sorts]` | A list of all tag sorts Canto is aware of.
`[cfg.tag_filters]` | A list of all tag filters Canto is aware of.
`[cfg.filters]` | A list of all overall filters Canto is aware of.
`[cfg.sorts]` | A list of all overall sorts Canto is aware of.
`cfg.*_hook` | Any set hooks, where `*` is one of: new, resize, select, unselect, state_change, start, and end.
`cfg.log_file` | Path to log file.
`cfg.path` | Path to configuration file.
`cfg.feed_dir` | Path to feed directory.
`cfg.script_dir` | Path to script directory.
`cfg.gui` | Link to the gui object.

### Methods ###

In addition to all of the methods available to the configuration (not listed
here, as they really shouldn't be used outside of the configuration itself,
which is already covered [here](../config/)), the `cfg` object includes
`cfg.log("message")`, which writes a message to the log file on disk and (if
the curses UI is up) prints it to the status bar.

## The `gui` object

The gui object is primarily used in hooks and keybinds, but represents a
running instance of the main interface.

### Attributes ###

Attribute | Description
--------- | -----------
`gui.sel["item"]` | The currently selected story object (could be None)
`gui.sel["tag"]` | The currently selected tag object (could be None)
`gui.keys` | A dict mapping keybinds to gui attributes
`[gui.window_list]` | A list of ncurses windows (one per column)
`gui.reader` | Link to the reader object (could be None)

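For example, a tiny helper along these lines (purely illustrative; the name
`log_selection` is made up) could be called from a hook or custom keybind to
inspect the selection. It only uses `gui.sel`, the documented story keys and
`tag.cfg.log()`:

    :::python
    # Sketch: log the title and link of the story under the cursor, if any.
    def log_selection(gui):
        item, tag = gui.sel["item"], gui.sel["tag"]
        if item and tag:
            tag.cfg.log("Selected: %s (%s)" % (item["title"], item["link"]))
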
### Methods ###

The methods of the `gui` object meant for direct use are mostly
[keybinds](../config/#keybinds). The "name" of the keybind is a method of the
class that takes no arguments. For example, the default binding for the down
arrow is "next_item", so hitting the down arrow is equivalent to running
`gui.next_item()`.

## The `reader` object

The reader object controls an open instance of the article reader.

### Attributes ###

Attribute | Description
--------- | -----------
`reader.story` | Link to the story object open in reader.
`reader.tag` | Link to the tag object story is in.
`reader.width` | Width of the reader.
`reader.height` | Height of the reader.
`reader.keys` | Dict mapping keys to reader attributes
`reader.cfg` | Link to the cfg object.

### Methods ###

Method | Description
------ | -----------
`reader.page_down()` | Scroll the reader down.
`reader.page_up()` | Scroll the reader up.
`reader.goto()` | Prompts user for a link to go to.
`reader.dogoto(n)` | Go to link n, no prompt.
`reader.destroy()` | Kill the reader.

## The `tag` object

A tag is similar to a feed in that, at its base, it's a list of stories. The
important difference, though, is that tags aren't tied to a single Atom/RSS
feed. If a user defines no custom tags, one tag per feed is generated
automatically. However, if two feeds share a main tag, there will still be two
feed objects, but only one tag. Tags are represented as blocks of stories in
the interface.

### Attributes ###

Attribute | Description
--------- | -----------
`[tag.filters]` | This tag's filters, cycled through with {/} by default.
`[tag.sorts]` | This tag's sorts, cycled through with -/= by default.
`tag.unread` | The number of unread stories in this tag.
`tag.read` | The number of read items in this tag.
`tag.collapsed` | Whether this tag is collapsed.
`tag.tag` | This tag's string representation.
`tag.cfg` | Link to the `cfg` object.

### Methods ###

The tag object includes some helpful functions typically accessible via
keybinds; a short usage sketch follows the table below.

Method | Description
------ | -----------
`tag.all_read()` | Set all items read.
`tag.all_unread()` | Set all items unread.
`tag.search_stories(story)` | Returns the index of a story object in the tag, or -1.
`tag.set_read(story)` | Sets a story object as read and updates tag attributes.
`tag.set_unread(story)` | Sets a story object as unread and updates tag attributes.

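For instance, a small helper like this (an illustrative sketch only; the name
`mark_selected_tag_read` is made up) could be called from a keybind or hook to
mark everything in the selected tag as read, using only `gui.sel`,
`tag.all_read()`, `tag.read`, `tag.tag` and `tag.cfg.log()` as documented on
this page:

    :::python
    # Illustrative sketch: mark every story in the currently selected tag as
    # read, then note it in the log.
    def mark_selected_tag_read(gui):
        tag = gui.sel["tag"]    # may be None if nothing is selected
        if tag:
            tag.all_read()
            tag.cfg.log("Marked %d items read in %s" % (tag.read, tag.tag))
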
## The `feed` object

The feed object in canto encompasses a single Atom/RSS feed. At its core it's
a list of story objects, each of which (surprise!) contains the content of a
single Atom/RSS item. As such, you can iterate over it:

    :::python
    for story in feed:
        print story["title"]

A short sketch using the attributes below follows the table.

### Attributes ###

Attribute | Description
--------- | -----------
`[feed.tags]` | The tags this feed falls under. `.tags[0]` is the user- or feed-specified name of the feed. This is a list of strings.
`feed.URL` | The URL of the RSS/Atom feed this object represents.
`feed.path` | The full path to the feed's info on disk.
`feed.rate` | The rate (in minutes) at which canto-fetch will update this feed.
`feed.keep` | The number of items canto-fetch will store for this feed.
`feed.username` | The username canto-fetch will use for basic/digest auth.
`feed.password` | The password canto-fetch will use for basic/digest auth.
`feed.cfg` | Link to the cfg object.

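As a quick, read-only sketch (assuming you already have a `cfg` object in
hand, e.g. inside a hook; `log_feeds` is just an illustrative name), these
attributes can be inspected without touching the feed methods discussed below:

    :::python
    # Sketch: log each configured feed's name, URL and update rate.
    # feed.tags[0] is the feed's display name, per the table above.
    def log_feeds(cfg):
        for feed in cfg.feeds:
            cfg.log("%s (%s) updates every %d minutes" %
                    (feed.tags[0], feed.URL, feed.rate))
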
The feed object has a number of methods, but they shouldn't ever be used from
keybinds or hooks because the feed objects available to the interface may or
may not be active in the worker process. The feed has a boolean attribute
`feed.qd` that specifies whether the feed is queued to be run through the
worker process, but generally feed methods shouldn't be used. If you really
want to anyway, read the source and figure it out =).

## The `story` object

The story object encompasses a single item from an RSS/Atom feed. It functions
like a Python dict object.

### Attributes ###

Attribute | Description
--------- | -----------
`story["title"]` | The title of the item.
`story["link"]` | The main link of the item.
`story["canto_state"]` | The on-disk state of the item.

That's a *very* sparse set of attributes, but `title` and `link` are the only
two pieces of feed content you can usually count on (`canto_state` is added by
Canto itself). The story object actually contains *all* of the information
given in the item. To see what items are included, you should use
`canto-inspect` as described
[here](http://codezen.org/canto/config/#less-common-content).

> **NOTE**: Story objects are extremely flexible and try to be as lightweight
> (in terms of memory) as possible. As such, only commonly used items are kept
> in memory, but attempting to fetch items that aren't already in memory will
> automatically look the content up in the feed's on-disk data. If it attempts
> to look something up that isn't there, it will return "", but not throw an
> error.

> So, if you want to write a sort / hook that deals with content that isn't
> common between all your feeds, you can still reference `story["uncommon"]`
> without checking for the attribute. **However**, be aware that failing to
> find an item is going to take a shared lock and hit the disk anyway, so if
> you are using uncommon content that's tied to a particular feed, you might
> want to check that `story.ufp_path` is set to the path of the feed with the
> content.

### Methods ###

Method | Description
------ | -----------
`story.was(key)` | Returns whether key is in the item's on-disk state.
`story.set(key)` | Add a key to the item's on-disk state.
`story.unset(key)` | Remove a key from the item's on-disk state.
`story.selected()` | Return whether this item is currently selected.
`story.select()` | Set this item as selected.
`story.unselect()` | Set this item as unselected.
`story.get_text()` | Get an item's main content. This will either return the "description" field of the item in Atom/RSS or any text sub-content. Use this instead of `story["description"]`.

At this point, the on-disk state of a given story can include these keys:
"read" and "marked". Previously, a "new" tag was possible, but this has been
phased out due to the fact that `new_hook` is handled promptly by canto-fetch,
not canto. However, the state is just a list of strings; if you want to add
persistent state information to an item, you can add an arbitrary string and
it will be remembered (see the sketch below).

> **NOTE**: Always use the helper functions for messing with item state. It's
> important that changing the state sets the item as "changed" so that canto
> knows it needs to be queued up to write to disk.

> **ALSO NOTE**: The "state" of whether an item is selected is handled
> separately because the selected status of an item doesn't persist between
> runs (yet).
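
For example, a tiny hook could use the state helpers above to keep its own
flag on an item. This is purely illustrative: "favorite" is an arbitrary
string, not a built-in state, and `toggle_favorite` is a made-up name:

    :::python
    # Illustrative only: toggle a custom, persistent "favorite" flag on a
    # story using the state helpers, so it is saved to disk with the item.
    def toggle_favorite(story):
        if story.was("favorite"):
            story.unset("favorite")
        else:
            story.set("favorite")
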
# Writing Draw Hooks #

Draw hooks are one of the most useful aspects of hardcore Canto customization:
they allow you to change the displayed content arbitrarily. Before you spend
too much time here, though, please make sure you check out this
[configuration section](http://codezen.org/canto/config/#adding-content).

## Drawing Format Codes ##

The style of a particular piece of text is set inline in the string itself, so
making things bold or underlined is very easy using escapes.

Escape | Effect
------ | ------
`%B` / `%b` | Turn bold attribute on / off
`%U` / `%u` | Turn underline attribute on / off
`%S` / `%s` | Turn standout attribute on / off
`%R` / `%r` | Turn reverse video attribute on / off
`%D` / `%d` | Turn dim attribute on / off
`%N` / `%n` | Turn off all attributes temporarily (`%n` restores previous attributes)
`%C` | Turn all attributes off permanently
`%1` - `%8` | Turn color pair 0 - 7 on (NOTE `%#` starts at 1, not 0)
`%0` | Return to previous color.

So, a couple of working examples using the color/style escapes:

    "%B This is bold! %b This is not."

    "%B This is bold! %R This is bold and reversed. %b Just reversed. %r"

    "%1 Color one %2 Color two %0%0 Whatever color was on before."

Something to note is that style attributes remember how many times they've
been activated and deactivated. For example:

    "%B%B This is bold! %b This is still bold! %b This is not."

Also, Canto relies on you to **keep your changes self contained**.

A BAD EXAMPLE:

    "%B This is my bold text."

This will cause *everything* after it, until a "%C" call, to be bolded, and
that's not what you want.

A BETTER EXAMPLE:

    "%B This is my bold text.%b"

Similarly, with colors, you should use %0 for each color you start, like in
the first color example. This allows colors to be properly embedded.

> **NOTE**: Support for colors and attributes is entirely up to your terminal.
> Some terminals support `%U` (underline) and some don't (like the raw Linux
> terminal). Most terminals support `%B` (bold) and `%R` (reverse). I've never
> seen a terminal that did `%D` (dim) right, but it's a valid curses option so
> it's included.

> **ALSO NOTE**: The memory for `%0` is only 8 colors. The default interface
> only uses up to a depth of 3, in the case of a link inside a quote. I think
> this limit is more than enough to easily accomplish whatever you want.

## Basic format ##

A draw_hook typically looks like this:

    :::python
    def myhook(dict):
        ## Do something

Where `dict` is a dict with the following keys:

Key | Description
--- | -----------
`dict["content"]` | The content to be printed.
`dict["story"]` | The story object being used.
`dict["cfg"]` | The overall configuration object.
`dict["tag"]` | The relevant tag object.
`dict["width"]` | Width of the reader, or width of the column the story is in.
`dict["height"]` | Height of the reader (READER ONLY)
`dict["show_links"]` | Whether the user wants to see internal links (READER ONLY)

These are available by default to any hook. However, **any content** added to
the `dict` will be made available to the next hook called. So, for example, if
your hook is called after `reader_add_enc_links` (the hook that parses the
content and grabs the links out), you'd also have access to `dict["links"]`.
More on that later.

## Example: highlight words ##

So let's make a basic, hard-coded draw_hook to highlight a word:

    :::python
    def hword(dict):
        import re
        reg = re.compile(r"\b(" + re.escape("MYWORD") + r")\b", re.I)
        dict["content"] = reg.sub(r"%R\1%r", dict["content"])

You import Python's regex library, compile a regex that matches on MYWORD
(escaped for safety's sake, more important later) and captures the content as
`\1`. Then, you apply the regex to `dict["content"]` to surround the word with
`%R` and `%r` (the reverse video hints).

But this is horribly inefficient; you're compiling the regex each time, and
that's unnecessary. The advantage of the above is that it's entirely
self-contained and a friend could paste it straight into their config and it
would work. So, using Python's scoping, we can make it more efficient:

    :::python
    import re
    reg = re.compile(r"\b(" + re.escape("MYWORD") + r")\b", re.I)

    def hword(dict):
        dict["content"] = reg.sub(r"%R\1%r", dict["content"])

There. Adding this to your config will import the regex library and compile
the regex once, and just apply it every time you need to.

Now, we can take advantage of this scoping and make this a whole lot more
useful by being generic. This is the code of `highlight_word` that's been in
`canto.extra` since 0.7.6:

    :::python
    import re

    def highlight_word(word, flags=re.I, content="content"):
        reg = re.compile(r"\b(" + re.escape(word) + r")\b", flags)
        def hword(dict):
            dict[content] = reg.sub(r"%R\1%r", dict[content])
        return hword

So now we have a helper function that takes an argument of the word we want to
highlight. We import the regex library once at the top (as `canto.extra` does,
so that `re.I` is usable as a default argument and the code stays
drop-in-able). We compile the regex (same as before, but instead of MYWORD
we're using the `word` argument). And then we define a hook that uses that
regex and return it. Thanks to Python, the regex we compiled once will follow
that function around. This is called a "closure", and they're very useful.
With this helper function, we can generate arbitrary highlighting hooks that
work efficiently. Now, how do we use them?

## Using Custom Draw Hooks ##

Adding a hook to the renderer is pretty easy.

    :::python
    r = get_default_renderer()
    add_hook_pre_reader(r, my_hook)
    add_hook_pre_story(r, my_hook)

First you get the Canto renderer, then you add the hook before the render
(that's the "pre" part of the `add_hook_pre_*` functions). Or, using a helper
like the one we created above:

    :::python
    r = get_default_renderer()
    add_hook_pre_reader(r, highlight_word("something"))

## Draw Hook Order ##

Sometimes the stage where your draw hook is executed is important. For
example, if you want to access the raw item content (i.e. before any HTML
processing), your hook needs to be run before the HTML conversion hook. Or, if
you want your hook to have access to the enclosure links, it needs to be run
after the hook that extracts those.

### Default Draw Hooks ###

Pre reader render:

Hook | Description
---- | -----------
`reader_convert_html` | Takes the item content and renders any HTML into pure text. Sets `dict["links"]` as well.
`reader_highlight_quotes` | Highlights quotes in color 5.
`reader_add_main_link` | Adds the item's main link as link 0 in `dict["links"]`.
`reader_add_enc_links` | Adds any enclosure links to `dict["links"]`.
`reader_render_links` | Adds the link readout, if `dict["show_links"]`.

Pre story render:

Hook | Description
---- | -----------
`story_strip_entities` | Converts HTML entities from story titles (for reader hooks this is handled in the HTML conversion step).

### Specifying Order ###

If you want to add your draw hook in a particular place, you merely have to
specify the `before` or `after` keyword arguments to the `add_hook_pre_*`
call. For example, to add a reader hook before the HTML is converted, you
would do this:

    :::python
    r = get_default_renderer()
    add_hook_pre_reader(r, myhook, before="reader_convert_html")

If you have another custom hook you want to be before or after, you can also
pass a function that's already been added as a hook instead of a string.
Following from the above example, you could do this:

    :::python
    add_hook_pre_reader(r, anotherhook, after=myhook)

# Writing Filters

Filters are easy to write and use. As of 0.7.0, filters are classes. A quick
example is the `show_unread` filter:

    :::python
    from canto.cfg.filter import Filter

    class show_unread(Filter):
        def __str__(self):
            return "Show unread"

        def __call__(self, tag, item):
            return not item.was("read")

As you can see, there are two components to a filter class.

The Python `__str__` or `__unicode__` function returns the string
representation of the class. This is used to display the status message when
switching to this filter. In this case, "Filter: Show unread" is shown.

The Python `__call__` function is what's run when Canto tries to filter some
items with your filter. It simply returns a boolean saying whether this item
should be shown or not (`True` = show this item, `False` = do not show).
Python will let you use `0/None/""` for `False`, and anything else will be
interpreted as `True`. By default, Python functions return `None`, so if your
filter has no return value it will filter out *everything*. The `__call__`
function is provided with a tag and a story object in order to make your
determination.

Custom defined filters can be used like any other filters in the config:

    :::python
    filters = [show_unread, my_new_filter, None]

Canto doesn't care whether your class is instantiated or not; if it isn't, it
will be instantiated after the config is parsed.
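
Another small example in the same vein, purely illustrative (it doesn't ship
with Canto): a filter that hides any item whose title contains a word you
don't want to see. Both the class name and the word are arbitrary choices:

    :::python
    from canto.cfg.filter import Filter

    class hide_podcasts(Filter):
        def __str__(self):
            return "Hide podcast items"

        def __call__(self, tag, item):
            # Show the item only if the word is absent from its title.
            return "podcast" not in item["title"].lower()
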
# Writing Sorts
Sorts, like filters, are simple custom classes as of 0.7.0. Sorts can be tricky though, in that for them to work *consistently* `sort(a,b)` has to return the exact opposite of `sort(b,a)`. Also, sorts *must* return integers.

Let's take a look at a toy example: sorting by length of title.

    :::python
    from canto.cfg.sorts import Sort

    class by_len(Sort):
        def __str__(self):
            return "By Length"

        def __call__(self, x, y):
            return len(x["title"]) - len(y["title"])

Similar to filters, the sort has a Python `__str__` function that's used when switching to this sort. In this case it will say "Sort: By Length" in the status bar when you switch to this sort.

Also similar to filters, the sort has a `__call__` that does the actual work. A sort's `__call__` receives two story objects (x and y) that it can use to determine whether one item comes before the other. Based on these two objects, you return one of three cases:
- `sort(x, y) < 0`: x should be before y
- `sort(x, y) == 0`: x and y sort the same
- `sort(x, y) > 0`: y should be before x
> **NOTE**: Python's sorts are "stable", meaning that if sort(x, y) == 0 then
> whatever order x and y were in before the sort will be the same order they're
> in after the sort.

Like custom filters, custom sorts can be used like any other sorts:

    :::python
    sorts = [ None, my_sort ]

Canto doesn't care whether you instantiate my_sort or not; if you don't, it will be instantiated after the config is parsed.
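For another toy example, here's a sketch of an alphabetical sort using Python 2's built-in `cmp()`, which already returns a negative, zero, or positive integer and therefore satisfies the consistency requirement above automatically. The `by_title` name is made up, and `canto.extra` already ships a `by_alpha` sort, so this is purely illustrative.

    :::python
    from canto.cfg.sorts import Sort

    class by_title(Sort):
        def __str__(self):
            return "By Title"

        def __call__(self, x, y):
            # cmp() returns a negative, zero, or positive integer,
            # so sort(x, y) is always the exact opposite of sort(y, x).
            return cmp(x["title"], y["title"])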
# Filter and Sort Notes
You should be wary when writing custom filters and sorts, because they are called *very* often when they are used from the interface. As such, it's not a good idea to do anything too intensive inside of the `__call__` function.

## One time work

For both custom sorts and filters you can set an `__init__` function to be called once. However, it's important that you call the parent class' `__init__` as well. Like so:

    :::python
    class myfilter(Filter):
        def __init__(self):
            Filter.__init__(self) # Call default init.
            import re
            self.regex = re.compile("...") # Do something intensive

        def __str__(self):
            return "My Filter"

        def __call__(self, tag, story):
            return self.regex.match(story["title"])

In this case, the regex is compiled *once*, keeping the filter as low intensity as possible.

You can also define an `__init__` that takes arguments, but if you do that **you must instantiate the class in the configuration**; Canto does not pass any arguments when automatically instantiating the class. For example:

    :::python
    # Partial filter taking two arguments
    class myfilter(Filter):
        def __init__(self, arg1, arg2):
            Filter.__init__(self)
            self.arg1 = arg1
            self.arg2 = arg2
        ...

    # Instantiate with arguments
    filters = [ myfilter(arg1, arg2), None ]

## Work with uncommon items (precaching) ##

As mentioned in the [story object docs](http://codezen.org/canto/advconfig/#the-story-object) section, the story object attempts to be efficient by only keeping common content (the title, link, state, etc.) in memory. Using items that aren't "common" requires the story object to attempt to fetch them from the disk, which is *expensive*. In order to keep these items in memory, you can add them to the `precache` list as part of your filter or sort. Like so:

    :::python
    class mysort(Sort):
        def __init__(self):
            Sort.__init__(self) # Call default __init__
            self.precache = ["uncommon_item", "uncommon_item2"]
        ...

Canto will honor this after the config is parsed, and "uncommon_item" and "uncommon_item2" will then automatically be kept in memory.
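Putting the two ideas together, here's a sketch of a filter that takes an argument *and* precaches the field it needs. The `from_author` name and the use of an "author" field are assumptions for illustration; whether a given feed actually provides an author (and under what key) varies, so check with `canto-inspect` first.

    :::python
    from canto.cfg.filter import Filter

    class from_author(Filter):
        def __init__(self, author):
            Filter.__init__(self)
            self.author = author
            # "author" isn't common content, so ask for it to be kept in memory.
            self.precache = ["author"]

        def __str__(self):
            return "From %s" % self.author

        def __call__(self, tag, story):
            # Assumes the feed actually sets an author on its items.
            return story["author"] == self.author

    # __init__ takes an argument, so it must be instantiated in the config.
    filters = [ from_author("kdawson"), None ]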
# General Hooks
General hooks are hooks that are called whenever a certain criterion is met. Currently, they're all passed different arguments, but they might be standardized in the next major release (0.8.0). A general hook looks like:

    :::python
    def my_hook(...args...):

where the args are dependent on which hook you're defining.
- `start_hook(gui)`: Called once on interface bringup.
- `update_hook(gui)`: Called when the interface updates.
- `unselect_hook(tag, story)`: Called when an item is unselected.
- `select_hook(tag, story)`: Called when an item is selected.
- `state_change_hook(feed, story, added_key, removed_keys)`: Called when an item's state has changed and is being written to disk.
- `resize_hook(cfg)`: Called whenever the window geometry could have changed.
- `end_hook(gui)`: Called when the interface is being torn down.
- `new_hook(feed, item, last)`: Called once for each new item.
> **NOTE**: new_hook is the only one of these hooks that doesn't require canto to
> be running to execute. It is handled by canto-fetch. new_hook is primarily
> meant for notification purposes.

Using a user defined hook is just like using any other hook. Either you can define the hook directly:

    :::python
    def resize_hook(cfg):
        # Do whatever

Or you can define separate hooks, like `canto.extra` does, and set them:

    :::python
    def my_resize_hook(cfg):
        # Do whatever

    resize_hook = my_resize_hook
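Since new_hook is the notification-oriented hook, here's a minimal sketch that appends each new item's title to a log file, which a desktop notifier or status bar script could then watch. The log path is arbitrary, and the encoding dance follows the note in the keybind section below.

    :::python
    import locale
    enc = locale.getpreferredencoding()

    def new_hook(feed, item, last):
        # Runs from canto-fetch, once per new item, even if canto isn't open.
        f = open("/tmp/canto-new-items", "a")
        f.write(item["title"].encode(enc, "ignore") + "\n")
        f.close()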
# Writing Custom Keybinds #
Writing custom keybinds (more complex than just chaining multiple key presses together, which can be accomplished with [macros](../config/#macros)) can be a useful way to automate Canto or share information with other programs. A keybind is merely a function that takes a gui or reader object argument and can optionally return a constant to get Canto to refresh, retag, update etc.

For example, let's make a reader keybind that automatically goes to a link number instead of prompting.

    :::python
    def goto_6(reader):
        reader.dogoto(6)

    reader_keys['6'] = goto_6

Or, more generically, we could make a goto_n wrapper, so that we can loop and set a whole bunch of binds automatically:

    :::python
    def goto_n(n):
        def goto(reader):
            reader.dogoto(n)
        return goto

    for i in xrange(6):
        reader_keys["%s" % i] = goto_n(i)

Another example is a simple save function:

    :::python
    import locale
    enc = locale.getpreferredencoding()

    def save(gui):
        f = open("mylog", "a")
        f.write(gui.sel["item"]["title"].encode(enc, "ignore"))
        f.close()

    keys['s'] = save

> **NOTE**: Notice the encoding that has to be done on file I/O; you can thank
> Python for that little oddity. You have to explicitly convert, otherwise you'll
> get UnicodeEncodeError exceptions if you try to write a Unicode character that has
> no equivalent in your system locale.
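As one more sketch of sharing information with another program, here's a keybind that pipes the selected item's link to the X clipboard, much like `canto.extra`'s `yank`. The `yank_link` name is invented, and it assumes `xclip` is installed and in your `PATH` (see the yank note in the configuration documentation).

    :::python
    import subprocess

    def yank_link(gui):
        # Hand the selected item's link to xclip on stdin.
        p = subprocess.Popen(["xclip", "-selection", "clipboard"],
                             stdin=subprocess.PIPE)
        p.communicate(gui.sel["item"]["link"].encode("utf-8"))

    keys['Y'] = yank_link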
This is where you learn the details of `~/.canto/conf.py`. For the impatient, you can skip to the [example config](#example-config).

If you're interested in more programmer-centric customization and aren't afraid of getting your hands dirty with some Python, then you may also be interested in [advanced configuration](http://codezen.org/canto/advconfig).

[TOC]

# Configuration

This section covers Canto's basic features and how to use filters, sorts, tags, and the other pre-written goodies that can be found in `canto.extra`. The actual writing of custom content is covered in later sections. This is all intended to be put into `~/.canto/conf.py` (conf, without the extension, is acceptable as well).

## Adding feeds
### `add()` `add` is the basic building block of Canto's config. As the name suggests, it adds a feed to the config. Ninety-nine percent of the time, a call like this will get the job done: :::python add("http://someurl") You can also tweak some other settings having to do with fetching the feed. The `rate` and `keep` variables effect the rate at which the feed is fetched from the server and `keep` determines how many items should be kept. The following line will update a feed every 30 minutes and keep up to a 100 items. :::python add("http://someurl", rate=30, keep=100) > **NOTE**: "keep" will be silently ignored if it's below the number of items > in the feed. In fact, by default `keep = 0`. This behavior differs from > 0.6.x. > The default `rate` is 5 for fetching from the server every five minutes. ### Password Protected Feeds If the feed is behind browser authentication (i.e. when you try to reach it in a browser it brings up a username/password box), you can specify those in the feed definition too. :::python add("http://someprotectedurl", username="myuser", password="mypass") > **NOTE**: In order to protect sensitive information in your config, it's > standard practice to `chmod 600 ~/.canto/conf` so that other users can't > read your password even if they can read your home directory. However, > Canto will not enforce these permissions as some other programs. There are a few other options for `add`, but these are more logically covered elsewhere. ### Script Extensions Canto supports using Snownews extensions. Essentially, these are executable scripts that, when run, spit out the feed XML. These are usually used to make a feed out of a webpage that doesn't usually provide a feed (which are thankfully becoming more and more rare). By default, these are put into `~/.canto/scripts/`, but this can be changed by adding the `-S` flag to `canto-fetch`. A typical example of using a script is to get a feed for the Slashdot polls which, as of this writing, has no RSS just for it. [slashdotpolls](http://codezen.org/static/slashdotpolls) is a script that will scrape Slashdot and output a feed. To use it: :::bash $ wget http://codezen.org/static/slashdotpolls $ chmod +x slashdotpolls $ mkdir ~/.canto/scripts $ mv slashdotpolls ~/.canto/scripts/ It's very important that the script is marked as executable, or the extension will fail. >**NOTE**: Because these extensions require an arbitrary script to be run as your user, DO NOT EVER use a script that comes from an unknown location without first READING the script to make sure it's not MALICIOUS. Then, to use the script from Canto, you'd add a feed starting with "script:", like this: :::python add("script:slashdotpolls -external") add("script:myscript -arg1 -arg2 ...") For slashdotpolls, `-external` is a flag that makes it print the RSS. You can find a lot more extensions like this in the Snownews [repository](http://kiza.kcore.de/software/snownews/snowscripts/extensions). ### "Sourcing" Other Files Canto supports adding feeds from other file formats. This can be useful when trying to keep URLs synced between readers. Canto can source OPML files at runtime simply by giving a path to the OPML file. :::python source_opml("/home/myuser/feeds.opml") Canto can also source plain lists of URLs, delimited by newlines. :::python source_urls("/home/myuser/urls") Feeds that are sourced are added with the equivalent of a basic `add` call with a URL. 
If you want to add other attributes to feeds that have been added this way, then you can use `change_feed` that takes the same arguments as `add` does. ### Tweaking Defaults At some point you may want to change the `rate` and `keep` parameters for a large quantity of feeds and do so simultaneously. Using `default_rate` and `default_keep` you can set those parameters for **every feed following the call**. Because this change only affects feeds that are added after the call, it can be used to set 'keep' and 'rate' variables for batches of feeds, rather than all feeds. If you want the 'keep' and 'rate' variables to affect all feed behavior globally, set the defaults before you define your feeds. > **NOTE**: To reiterate from above, `rate` is in minutes and `keep` will > ignore any number lower than the number of items in the feed's source. The following is a good application of using the default calls: :::python default_rate(30) # News feeds add("http://news1") add("http://news2") ... default_rate(120) # Slow blog feeds add("http://blog1") add("http://blog2") ... default_rate(1) # Quick feed default_keep(100) # Lots of items could be missed add("http://quick1") add("http://quick2") ... If you choose not to change settings, rate is set to five minutes (5) and keep is set to 0, which indicates that all the items in the feed source should be kept. ### Discard Policy Usually, it's preferable to discard items that are old enough that they're no longer inside the `keep` range for a particular feed. If you'd like to avoid ever discarding items with a particular tag or state, you can use the `never_discard` function. For example, to avoid ever discarding unread items: :::python never_discard("unread") You can also specify a tag like "Slashdot", but I wouldn't suggest it unless you're okay with spending large amounts of disk space for the 1000s of Slashdot articles you'll accumulate.
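Rounding out feed management: the `change_feed` call mentioned under "Sourcing" Other Files above is how you adjust feeds that were added by `source_opml` or `source_urls`. This is only a sketch; it assumes, per the text above, that `change_feed` accepts the same keyword arguments as `add`, and the URL has to match one defined in the sourced file.

    :::python
    source_opml("/home/myuser/feeds.opml")

    # Tweak one of the sourced feeds: fetch every 30 minutes, keep 100 items.
    change_feed("http://someurl", rate=30, keep=100)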
## Cursor Behavior (0.7.7+)
As of 0.7.7 Canto supports multiple types of cursor behavior. The default behavior since the beginning of Canto has been scrolling by one when the cursor attempts to go past an edge. Edge is the simplest type of scrolling, but Canto now supports the user defining how far from the actual end of the screen that edge should be. Also in 0.7.7 the ability to keep the cursor in one spot was added. This cursor behavior is changed with three different variables:
| Name | Valid Settings | Meaning |
|------|----------------|---------|
| `cursor_type` | "edge", "top", "middle", "bottom" | Which cursor behavior to use |
| `cursor_scroll` | "scroll", "page" | How the interface should scroll (only valid with "edge" cursor_type) |
| `cursor_edge` | integer | How far the cursor is from the end of the screen before it scrolls |
Before 0.7.7 the default scroll behavior was:

    :::python
    cursor_type = "edge"
    cursor_scroll = "scroll"
    cursor_edge = 0

In 0.7.7 the new default is the same, but with a wider margin set with cursor_edge.

    :::python
    cursor_type = "edge"
    cursor_scroll = "scroll"
    cursor_edge = 5

Other common scrolling effects can be achieved with these variables. For example, paging like `mutt`:

    :::python
    cursor_type = "edge"
    cursor_scroll = "page"
    cursor_edge = 0

With `cursor_scroll = "page"` the `cursor_edge` value is respected, but it's highly recommended to keep the number low (0, 1, 2) to keep just enough context around the item. Higher edges are generally very disorienting.

Or to keep the cursor in the middle of the page (at least when screen real estate allows):

    :::python
    cursor_type = "middle"
## Browsing
Canto supports using external programs to open the content found in a feed item. Typically, you just want to set a `link_handler` to your favorite browser. :::python link_handler("firefox \"%u\"") This will use firefox as your browser. The `\"%u\"` will be replaced with the URL. Users that want to use a text based browser like [elinks](http://elink.or.cz), have to tell Canto to relinquish the terminal while you use it, like so: :::python link_handler("elinks \"%u\"", text=True) If you find yourself bouncing between the Linux console and an X terminal, you can use a bit of logic to automatically set the browser based on the `TERM` environmental variable. :::python import os if os.getenv("TERM") == "linux": link_handler("elinks \"%u\"", text=True) # Text-only else: link_handler("firefox \"%u\"") # X terminal ### Non-HTML Content Links to PDFs and other content you'd rather view in a program other than your browser (like enclosures) can be setup by using `link_handler` with an extension. For example, to open and .mp3 in a podcast: :::python link_handler("mplayer -someoptions \"%u\"", ext="mp3") Fortunately, mplayer can stream from the web by default. Some applications require the content to be fetched before hand. This requirement can be handled using the `fetch` parameter. For example, to open a .pdf in evince that doesn't support opening from the internet directly you can write: :::python link_handler("evince \"%u\"", ext="pdf", fetch=True) Canto will then fetch the content into `/tmp` and run the associated program. ### Images Images are handled similarly to links with the `image_handler` call. It takes the same arguments as `link_handler`. A good example: :::python image_handler("fbv \"%u\"", text=True, fetch=True) This will use `fbv` to view an image in a text console. > NOTE: Image links are denoted by the color blue in the reader
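Putting the pieces above together, a complete browsing setup might look like the following sketch. The program choices (elinks, fbv, firefox, feh, mplayer, evince) are just examples taken from this section and the example config; substitute your own.

    :::python
    import os

    if os.getenv("TERM") == "linux":
        link_handler("elinks \"%u\"", text=True)
        image_handler("fbv \"%u\"", text=True, fetch=True)
    else:
        link_handler("firefox \"%u\"")
        image_handler("feh \"%u\"", fetch=True)

    # Enclosures and documents get their own handlers, keyed by extension.
    link_handler("mplayer \"%u\"", ext="mp3")
    link_handler("evince \"%u\"", ext="pdf", fetch=True)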
## Reader Layout
You can dedicate space for the reader, rather than having it float above the items (the default behavior), using `reader_orientation` and `reader_lines`.

Reader orientation can be one of five possible settings.

    :::python
    reader_orientation = None     # Default floating
    reader_orientation = "left"   # Dedicated left of the item list
    reader_orientation = "right"  # Dedicated right of the item list
    reader_orientation = "top"    # Dedicated on top of the item list
    reader_orientation = "bottom" # Dedicated under the item list

You can also specify the size for any of the dedicated layouts (i.e. not floating). For "left" and "right", `reader_lines` controls the width, and for "top" and "bottom" it controls the height. It's set like this:

    :::python
    reader_lines = 10

`reader_lines` has a minimum of three lines since the default theme ceases to behave well when its space is so constricted. Three lines is practically unreadable, so this is unlikely to change.

### Layout Hook

Setting the orientation and size of the reader area statically can be useful, but can lead to trouble (like setting the reader area to be larger than the available space, which is not good). [Hooks](#hooks) are covered later, but for now a `resize_hook` is useful to resize the reader area to be a proportion of the available space, rather than a constant.

This hook will make a reader area that takes half of the screen to the left, no matter how the window is resized, and set the number of columns in the main list.

    :::python
    def resize_hook(cfg):
        cfg.reader_orientation = "left"
        cfg.reader_lines = cfg.width / 2
        cfg.columns = (cfg.width / 2) / 65

Copying and pasting this anywhere in your config will achieve the desired effect.
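A variation on the same idea, as a sketch: dedicate the bottom third of the window to the reader instead. This assumes the `cfg` object exposes a `height` attribute analogous to the `width` used above; if it doesn't in your version, stick to width-based layouts.

    :::python
    def resize_hook(cfg):
        cfg.reader_orientation = "bottom"
        # Assumes cfg.height exists, like cfg.width above.
        # Never go below the 3 line minimum for reader_lines.
        cfg.reader_lines = max(3, cfg.height / 3)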
## Colors
Changing the colors of the interface is simple. There are eight default ncurses colors, and one place holder for a default value.
| Curses Color Number | Representation |
|---------------------|----------------|
| -1 | "default" |
| 0 | "black" |
| 1 | "red" |
| 2 | "green" |
| 3 | "yellow" |
| 4 | "blue" |
| 5 | "pink" or "magenta" |
| 6 | "cyan" |
| 7 | "white" |
>**NOTE**: "default" is usually black on a default terminal. If your terminal supports transparency though, it will be made transparent. >**ALSO NOTE**: With curses colors you occasionally have to be creative about getting colors not listed here. For example, to achieve "gray", you have to use "black", but make the text bold. You can use these colors in eight different slots in canto.
| Color Pair | Definition | How it's used |
|------------|------------|---------------|
| 0 | (White, Black) | This is the default color pair |
| 1 | (Blue, Black) | This is used for unread story items. |
| 2 | (Yellow, Black) | This is used for read story items. |
| 3 | (Green, Black) | This is used for links in the reader. |
| 4 | (Magenta, Black) | This is used for quotes in the reader. |
| 5 | (Black, Black) | This is used for emphasis (italic/small/em) text in the reader, used with %B to appear gray |
| 6 | (Blue, Black) | This is used for image links in the reader |
| 7 | (Black, Black) | This is unset/unused. |
Changing these items is as simple as using the `colors` list. :::python colors[0] = "blue" colors[0] = 4 colors[0] = (4, -1) colors[0] = ("blue", "default") These statements are equivalent. If you only specify one number or one color, it's used as the foreground color and inherits the background of `colors[0]`, or "default" if you're setting `colors[0]`. Therefore: :::python colors[0] = ("blue", "white") colors[1] = "red" Now `colors[1]` inherits `colors[0]`'s background, which would now be set to ("red", "white"). ### 256 Colors On terminals that support 256 colors, you can specify colors by number. A color chart for xterm is available [here](http://www.calmar.ws/vim/256-xterm-24bit-rgb-color-chart.html) :::python colors[0] = 120 To make sure that your terminal supports 256 colors, you can test it with this [color script](http://codezen.org/static/colortest), which is a mirrored copy of [this](http://www.vim.org/scripts/script.php?script_id=1349) Vim script. If you're having trouble with your terminal and are sure that it supports 256 colors, try setting your `TERM` variable before invoking canto: :::bash $ TERM="xterm-256color" canto
# Using Advanced Features

Canto is extremely powerful due to its internal use of the Python interpreter for all of its configuration requirements. The details of writing extension content are covered elsewhere, but there is a lot of good information included with the source.

## Importing canto.extra
In order to use extra content it must be imported in the usual pythonic way, as in: :::python from canto.extra import * A call to import canto.extra will make all of the goodies packaged with Canto available to your config.
## Keybinds
### Specifying Keys The first step to define your own keybinds is to learn how to specify which key you're binding to. Typically, it's very easy to rebind keys. :::python keys['a'] = ... reader_keys['a'] = ... Any visible non-newline character can be used directly. Whitespace characters (including newline) can be embedded with their typical escape (i.e. \t for tab, \n for newline, etc.). :::python keys['\n'] = ... # Enter keys['\t'] = ... # Tab keys[' '] = ... # Space keys[' '] = ... # Tab Any invisible characters, like function keys, arrow keys, etc. can be used by their ncurses name. On the man page for `getch()`, a list of all possible names is available. Here's an [online copy](http://www.mkssoftware.com/docs/man3/curs_getch.3.asp). Typically definitions using these keys look like this: :::python keys['KEY_F1'] = ... keys['KEY_LEFT'] = ... To specify Control or Alt key combinations, you can use "C-" for control and "M-" (meta) for Alt. :::python keys['C-a'] = ... # Ctrl+A keys['M-a'] = ... # Alt+A keys['C-M-a'] = ... # Ctrl+Alt+A ### Default Binds The following keybinds are typically available to the user. They will be used in the examples below. #### Main View
| Default Binding | Name | Function |
|-----------------|------|----------|
| `h` | `help` | Shows the man page (has all of these bindings listed). |
| `KEY_DOWN` / `j` | `next_item` | Move to the next item. |
| `KEY_UP` / `k` | `prev_item` | Move to the previous item. |
| `KEY_NPAGE` / `l` | `next_tag` | Move to the next feed/group of items |
| `KEY_PPAGE` / `o` | `prev_tag` | Move to the previous feed/group of items. |
| `KEY_RIGHT` | `just_read` | Mark current story read and nothing else. |
| `KEY_LEFT` | `just_unread` | Mark current story unread and nothing else. |
| `g` | `goto` | Open the current story in your browser. |
| `f` | `inline_search` | Mark all stories matching a search. |
| `n` | `next_mark` | Go to the next marked story. |
| `p` | `prev_mark` | Go to the previous marked story. |
| `.` | `next_unread` | Go to the next unread story. |
| `,` | `prev_unread` | Go to the previous unread story. |
| Space | `"reader"` | Mark the story read and open the reader. |
| `c` | `toggle_collapse_tag` | Collapse/Show a tag of items. |
| `C` | `set_collapse_all` | Collapse on all tags. |
| `V` | `unset_collapse_all` | Uncollapse all tags. |
| `m` | `toggle_mark` | Mark/unmark an item. |
| `M` | `all_unmarked` | Unmark all items |
| `r` | `tag_read` | Set all stories in a feed/group read. |
| `R` | `all_read` | Set all stories read. |
| `u` | `tag_unread` | Set all stories in a feed/group unread. |
| `U` | `all_unread` | Set all stories unread. |
| `C-r` | `force_update` | Reread stories from disk. |
| `C-l` | `refresh` | Redraw the screen. |
| `q` | `quit` | Quit Canto. |
| `\` | `restart` | Restart canto (0.7.6+) |
| `]` | `next_filter` | Apply next global filter. |
| `[` | `prev_filter` | Apply previous global filter |
| `}` | `next_tag_filter` | Apply next tag filter (from filters) |
| `{` | `prev_tag_filter` | Apply previous feed filter |
| `=` | `next_tag_sort` | Apply next tag sort |
| `-` | `prev_tag_sort` | Apply previous tag sort |
| `<` | `prev_tagset` | Show previous set of tags |
| `>` | `next_tagset` | Show next set of tags |
| `;` | `goto_reltag` | Goto the nth visible tag, relative to current index (filter aware) |
| `:` | `goto_tag` | Goto the nth tag (filter unaware) |
#### Reader View
| Default Binding | Name | Function |
|-----------------|------|----------|
| `KEY_DOWN` / `j` | `scroll_down` | Scrolls, if there's more text. |
| `KEY_UP` / `k` | `scroll_up` | Scroll up, if not at the top. |
| `KEY_NPAGE` | `page_down` | Page down. |
| `KEY_PPAGE` | `page_up` | Page Up. |
| `n` | `["destroy", "just_read", "next_item", "reader"]` | Goto the next story without closing the reader. |
| `p` | `["destroy", "just_read", "prev_item", "reader"]` | Goto the previous story without closing the reader. |
| `g` | `goto` | Go to a specific link listed inside the item text. |
| `l` | `toggle_show_links` | Show/hide the list of links at the bottom of the reader. |
| Space | `["destroy", "just_read"]` | Close the reader |
| `q` | `["destroy", "just_read", "quit"]` | Quit Canto. |
| `h` | `["destroy", "just_read", "help"]` | Show help. |
### Using Default Binds Setting a new key for pre-existing functionality is easy to do using strings. As you can see in the above table, the bind "help" brings up the help page. To rebind this functionality to the F1 key (a typical DOS binding), you could simpy do :::python keys["KEY_F1"] = "help" As you might expect, you can also override existing keys :::python keys[' '] = "next_item" # Overrides the default "reader" command And you can unset a key all together :::python keys['q'] = None # Unsets 'q' ### Macros Canto allows you to queue up more than one action with a keybind. A simple list can get the job done. For example, to create a keybind that will set an item as read and move to the next list item (rather than using the right arrow followed by the down arrow) we could set a macro like this :::python keys['j'] = ["just_read", "next_item"] "just_read" sets the item as read and "next_item" moves to the next item. More complicated macros can be created that can cover both main view and reader view keybinds. Take for example the default binding of "n" in reader view. :::python reader_keys['n'] = [ "destroy", "next_item", "reader" ] This macro allows you to go to the next item without leaving the reader. When this macro executes three events happen: "destroy" kills the reader, "next_item" makes the main interface go to the next item, and "reader" makes the main interface re-open the reader. All this work is done with one keystroke. Another common macro task is to open the reader and automatically open the link list. This also can be achieved with this simple code :::python keys[' '] = ["just_read", "reader", "toggle_show_links" ] Using macros and keybinds, it's possible to get a maximum amount of work from a minimum number of keystrokes. ### Keybind Goodies. Rebinding some existing functionality to a different key or creating a simple macro will certainly make most users work faster and easier. Up until now, we've only used strings in the keybinds and macros. These strings are shorthand for built-in functionality. However, in place of these strings, you can bind functions to keystrokes. Doing so, adds a very powerful feature to Canto's interface. Later in the document we'll cover `set_filter`, `set_tag_filter`, and `set_tag_sort` which are all defined in `canto.extra`. For now, we'll cover some more interesting and useful additions. #### Searching You can setup a keybind to search for your favorite terms using the `search` keybind, which takes a keyword argument or a regex. This uses the internal `inline_search` behavior and marks all items matching the search. :::python keys['1'] = search("Linux") keys['2'] = search(".*[Uu]buntu.*", regex=True) You can also use `search_filter` which will prompt you interactively for a keyword (or a regex if you prefix the string with "rgx:") and filter out all unmatching items. :::python keys['/'] = search_filter Once again, note that `search_filter` is **not** in quotes, it is not a string because it's not a builtin keybind. `search_filter` is defined in `canto.extra` and therefore is used as a function. #### Copying (Yanking) A neat function for putting a link on the X clipboard (for use in pasting into a chat, a browser, etc.) can be used :::python keys['y'] = yank > **NOTE**: Yank requires `xclip` to be installed and visible in your `PATH`. On Debian based distros it's the `xclip` package, but on some it might be included with a generic X11 application meta-package. If in doubt, do `which xclip` from your shell. 
#### Downloading Content New in version 0.7.6 is the capability to `wget` content out of links. This essentially amounts to a custom `link_handler`/`image_handler`. :::python reader_keys['w'] = wget_link("/path/to/downloads") The above will make 'w' in the reader prompt you for a link number and will download that link into the path specified. > **NOTE**: wget_link requires `wget` to be installed and visible in your > `PATH`. On most distros this is already installed or is available in a `wget` > package. #### Saving The last neat little utility is `save` which writes a file (~/canto_out) with a title and a link when called. This is designed as a template example for writing a keybind, rather than a fully functional bind but it can be useful. :::python keys['s'] = save
## Filters
Perhaps the most useful extra feature Canto provides is its powerful filter system. `canto.extra` provides a number of useful filters
- `None`: Filter no items.
- `show_unread`: Ignore all items that have been marked read.
- `show_marked`: Ignore all items that are unmarked.
- `only_with("string")`: Show only items that have "string" in the title.
- `only_without("string")`: Show only items that **don't** have "string" in the title.
- `all_of(filter1, filter2, ...)`: Show only items that pass all listed filters (binary and).
- `any_of(filter1, filter2, ...)`: Show items that pass any of the listed filters (binary or).
Additionally, there is `with_tag_in`, which is covered in the tag section, specifically. There are three ways to apply filters. - **Global filters**. These apply regardless. Any items that you see in the interface had to pass through this filter. Global filters are useful, for example, to filter items based upon a given state, as in`show_unread`. - **Tag filters**. These filters only apply to specific tags (See the tag section). - **Feed filters**. These filters are applied when loading content from disk. Items that don't pass this filter will never appear in the interface. Feed filters are useful when you want to ignore a whole set of items entirely, like news posts in webcomic feeds. ### Using Global Filters Of the the three filters, global filters are arguably the most useful.For example, a global filter can be used to filter out items that have already been read. Accomplishing that is simple: :::python filters=[show_unread] Setting the 'show unread' filter will remove all previously read feed items by default when Canto opens. If you still want to have access to all items, you can add the `None` filter to the list: :::python filters=[show_unread, None] With this filter in place, you can switch between `show_unread` and `None` using `[` and `]` to cycle through the list. If you're more comfortable using a keybind to choose your global filters, then you can use `set_filter`. This allows you to set the global filter regardless of whether it's in the `filters` list: :::python keys['1'] = set_filter(show_unread) keys['2'] = set_filter(show_marked) keys['3'] = set_filter(None) This lets you use the 1, 2, and 3 keys to set your filters directly, without needing to cycle through the list. ### Using Feed Filters Most of the time, feed filters are only useful if you want to completely ignore some easily filtered content in a feed. My favorite example is ignoring all non-comic items in a webcomic feed. Take [penny-arcade](http://penny-arcade.com)'s feed for example. Each item's title is clearly marked with "Comic:" or "News:". If I wanted just completely ignore non-comic items, I could modify the `add` call for Penny Arcade to use the `only_with` filter: :::python add("http://feeds.penny-arcade.com/pa-mainsite", filter=only_with("Comic:")) This filter will eliminate all items that don't have "Comic:" in the title. Other examples include filtering distro package feeds for only a certain type of package (i.e. Gentoo, `only_with("sys-devel")`), or filtering porn torrents from torrent site feeds (`only_without("XXX")`, provided the feed items are clearly marked).
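These building blocks also compose. For instance, here's a sketch of a global filter list that shows only unread items while also dropping clearly marked adult torrents, with the plain unread view and the unfiltered view still a keystroke away:

    :::python
    filters = [ all_of(show_unread, only_without("XXX")), show_unread, None ]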
## Tags
A tag is an arbitrary set of stories. By default, Canto creates a single tag per feed and if you never use any other tags, feeds and tags are analogous. A tag allows you to filter, sort and otherwise customize how these groups of items are displayed. ### Manipulating Default Tags As mentioned above, each feed is given a tag by default. That tag's name is the name specified in the feed's source. So for the reddit feed, the tag's title (which is displayed at the top of the box of stories in the interface) is "reddit.com: what's new online!". That title is a bit long, and we want to use something a little more concise. So, to override the default tag, we can add this to the config: :::python add("http://reddit.com/.rss", tags=["Reddit"]) This addition will cause the displayed name to change to "Reddit" from the longer "reddit.com: what's new online!". ### Adding Tags Adding a tag to a feed is as simple as coming up with a name and adding it to the tag list. :::python add("http://some-blog", tags=[None, "blogs"]) add("http://some-other-blog", tags=[None, "blogs"]) > **NOTE**: `None` in the tag is shorthand for using the title included with the > feed. If all tags are omitted, `tags=[None]` is implied. This addition will define an implicit tag "blogs". After adding that tag, you can use < and > to switch between the default set of tags (i.e. one per feed) to the "blogs" tag. Notice that when you switch to the "blogs" tag, the displayed content will be the stories in the first feed followed by the stories in the second feed. This display may not seem very useful if you're using implicit tags, but, when you add a sort to mix, the two feeds you can achieve some neat effects, like organizing all of your favorite blog posts from around the internet in chronological order. ### Using Tags as Folders Typically, the above behavior, appending the items together by using a common tag is not what a user expects unless they're going to use a sort. Usually, tags are used as folders names so that switching to "blogs" means showing all the feeds that have "blogs" in the tags. This behavior is accomplished using the `with_tag_in` filter. Following the above example, we can emulate folders with global filters: :::python add("http://some-blog", tags=[None, "blogs"]) add("http://some-other-blog", tags=[None, "blogs"]) filters = [ None, with_tag_in("blogs") ] With this snippet, using ']' to switch to the next global filter will cause Canto to display only the items in the two "blogs" feeds, but the items will still be organized by feed rather than displayed as an appended list of items. You can also list multiple tags and use implicit default tags for use in `with_tag_in` :::python add("http://rss.slashdot.org/slashdot/Slashdot") # Creates implicit "Slashdot" tag add("http://.some-blog", tags=[None, "blogs"]) filters = [ None, with_tag_in("Slashdot", "blogs") ] Lastly, you can combine `with_tag_in` and other filters with `all_of` :::python filters = [ None, all_of(with_tag_in("blogs"), show_unread) ] This combination will make your second global filter show you all of your blog feeds, but only their unread items. ### Adding Explicit Tags So far we've only dealt with implicit tags that are either created by default or by appending a string to the `tags` list. Such creations are only useful for using tags with < / > or in filters. However, tags themselves can have attributes. You can make an explicit tag with the `add_tag` function. 
:::python add("http://some-blog", tags=[None, "blogs"]) add("http://some-other-blog", tags=[None, "blogs"]) add_tag("blogs", ...parameters...) These definitions can come before or after you use them in `add` calls. ### Tag Filters Tag filters, as the name would suggest only apply to a specific tag. These filters are useful if a filter would only make sense for a certain set of items rather than globally. Let's return to the webcomic example from the [feed filter](#using-feed-filters) section. In that example, we wanted to entirely discard posts that were news and only see comics. Using a tag filter, however, it's possible to keep all items, but merely hide (rather than entirely discard) the other stories. This is useful if you want to prioritize one set of stories over another. In this case, we want to prioritize the comics, but make the news items available on request. :::python add("http://feeds.penny-arcade.com/pa-mainsite") # Implicitly creates "Penny Arcade" tag add_tag("Penny Arcade", filters=[only_with("Comic:"), only_with("News:")]) This example makes the "Penny Arcade" tag explicit and sets up two tag filters. Now when you've selected the Penny Arcade feed, you can use { and } to switch between the tag filters and show comics or news. Alternatively, a similar effect could be achieved by using `only_without("Comic:")` as the second filter, which would allow all items not shown in the first filter, not necessarily just items with "News:" in them. Using these tag filters, you can essentially turn one tag or feed into multiple overlapping tags and cycle through them. > **NOTE**: Tag filters are always overridden by global filters. If your global filter is `show_unread`, even if your tag filter is `None`, you won't see any read items. Like global filters, tag filters can be set by default. :::python default_tag_filters([show_unread]) Similar to `default_rate` and `default_keep`, these defaults are applied as **explicit** tags are created. Any tags created with `add_tag` will inherit the default tag filters from the call immediately before the `add_tag` (or `[None]` if it hasn't been called at all). Implicit tags (i.e. not created with `add_tag`) are made explicit after the rest of the configuration is done, so they will inherit the defaults from the last call to `default_tag_filters` made in the config. Just like global filters, tag filters can be set directly via keybind :::python keys['u'] = set_tag_filter(show_unread) This keybind will set the current tag's filter to `show_unread`. > **NOTE** : Unlike global filters, tag filters will *never* make a tag fully disappear since there would be no easy way to change the tag filter back to one with items in it. ### Sorts Another benefit of making explicit tags is the ability to sort items in varied ways. `canto.extra` defines some default sorts to use.
- `None`: Use the ordering specified in the feed.
- `by_date`: Order by the time the items are parsed.
- `by_len`: Order by length of title.
- `by_content`: Order by length of content.
- `by_alpha`: Sort alphabetically.
- `by_unread`: Order by read status.
- `reverse_sort(sort)`: Reverse the given sort.
> **NOTE**: Sorts based on strings are done on unparsed strings. This means that the strings could still have HTML built into them and untranslated entities. This affects the sort because the length or the first character may not be what's displayed. A title `"<strong>Zoo</strong>"` will sort alphabetically before "Aardvark Sighting" because "<" is before "A", despite the fact that the HTML will not be displayed.
> This was done to speed sorts so that interpretable HTML wouldn't have to be stripped before and replaced after the sorting is done.

> **ALSO NOTE**: Sorts can possibly make Canto's memory footprint increase marginally if they require access to data that isn't usually kept in memory. Sorts that function on the title (`by_len`, `by_alpha`, etc.) have no effect because the title is *always* in memory. Sorts like `by_date` require a date field to be kept in memory, so it adds a couple of bytes per story.
> This was also a speed tweak to avoid stories hitting the disk every time they're sorted, which makes the program grind to a halt.

The simplest way to use a sort is to do so when you define a tag:

    :::python
    add_tag("Tag", sorts=[by_unread])

The above code will sort the given tag with unread stories first.

Similarly to filters, you can set default sorts and use keybinds to set sorts:

    :::python
    default_tag_sorts([by_unread])
    keys['s'] = set_tag_sort(by_unread)

And, once again, like `default_tag_filters`, explicit tags inherit the tag sorts from the previous call to default_tag_sorts, while implicit tags inherit the sorts from the final call to default_tag_sorts.

Sorts like `by_date` are most useful when combining two feeds into a single tag:

    :::python
    add("http://news1", tags=["News"])
    add("http://news2", tags=["News"])

    add_tag("News", sorts=[by_date])

### Sort Order

Anywhere that a sort can be used, you can use multiple sorts with the `sort_order` function from `canto.extra`. This takes any number of sorts in order of priority.

    :::python
    default_tag_sorts([sort_order(by_unread, by_alpha)])

This snippet will make tags sort items first by unread status and then sort the same items alphabetically.
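One more small sketch: `reverse_sort` from the table above can wrap any of these sorts. For example, if `by_date` gives you oldest-first ordering in a combined tag, reversing it puts the newest items on top (which direction `by_date` actually produces isn't spelled out here, so treat this as illustrative).

    :::python
    add("http://news1", tags=["News"])
    add("http://news2", tags=["News"])

    # Flip whatever order by_date produces.
    add_tag("News", sorts=[reverse_sort(by_date)])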
## Adding Content
A common task is to add relevant information to the reader. ### Typical Content A lot of feeds support typical information about each item. By default, the reader displays the title, the description, and the subsequent links. If you wanted to add other content, you can use `add_info`. For example, to add the author of an item to the reader: :::python r = get_default_renderer() add_info(r, "Author") This will add the following line to the reader, above the content: Author: [author] `add_info` takes other arguments to customize how the line is displayed. :::python add_info(r, "Author", caption="by: ") The resulting line now looks like this: by: [author] If the item being displayed doesn't include any author information, the line will be entirely ommitted. Additionally, it could be that the information isn't useful and should be ignored. Lots of feed generators set author to a default like `donotreply@somedomain` which isn't useful info. Other feeds will put author information into the content anyway. Because of this, you can specify to only add the information to particular tags. :::python add_info(r, "Author", tags=["Slashdot"...]) ### Less Common Content It's difficult to know whether your RSS includes any special information. As of 0.7.x, canto includes a simple wrapper script called `canto-inspect`. You call it like so: canto-inspect [URL] > output It's essentially a custom pretty printer for the XML, Although it is not extremely advanced, using canto-inspect you can detect interesting content, as in this partial output from `canto-inspect http://rss.slashdot.org/slashdot/Slashdot`: :::text [entries] [0] [summary_detail] [base]: http://rss.slashdot.org/slashdot/Slashdot [type]: text/html [value]: ... [language] [updated_parsed]: ... [links] [0] [href]: http://rss.slashdot.org/~r/Slashdot/slashdot/... [type]: text/html [rel]: alternate [title]: Doctors Fight Patent On Medical Knowledge [slash_department]: no-not-patent-medicine [feedburner_origlink]: http://yro.slashdot.org/story/09/07/... [author]: kdawson [updated]: 2009-07-21T18:20:00+00:00 [summary]: ... [title_detail] [base]: http://rss.slashdot.org/slashdot/Slashdot [type]: text/plain [value]: Doctors Fight Patent On Medical Knowledge [language] [slash_section]: yro [link]: http://rss.slashdot.org/~r/Slashdot/slashdot/~3/... [slash_hit_parade]: 0,0,0,0,0,0,0 [id]: http://yro.slashdot.org/story/09/07/21/1646216/... [tags] [0] [term]: patents [scheme] [label] In the above hodge-podge of information, we can see some content which might interest some users. Take `slash_department` and `slash_section` divisions for example. Using the add_info function, we can add the content as follows: :::python r = get_default_renderer() add_info(r, "slash_department", caption="Dept: ", tags=["Slashdot"]) > **NOTE**: The first argument to `add_info` corresponds to the content in the [brackets], but isn't case sensitive. ### Highlighting ### New in 0.7.6 is the ability to statically highlight words in the reader or main views. :::python r = get_default_renderer() add_hook_pre_reader(r, highlight_word("NASA")) add_hook_pre_story(r, highlight_word("never")) This will, for example, highlight the word "NASA" in the reader and "never" in the main view. These are *not* case sensitive by default. Those familiar with Python regex can specify a `flags` arg but if all you need is case sensitivity you can set it to `None` :::python r = get_default_renderer() add_hook_pre_reader(r, highlight_word("NASA", None)) A case sensitive version of the reader highlight above.
## Update Triggers
Canto supports a number of different update mechanisms. >**NOTE**: These triggers are to update the client from disk only, they have nothing to do with getting items from the server. That is only controlled by running `canto-fetch` and the rates you have set in the configuration. * **Interval Updating**. This is the default behavior. At intervals (generally about a minute), the feeds are read from disk and the display is updated. This behavior is what most people expect from their RSS reader. * **Change Tag Updating**. This makes the client update whenever you change feeds/tags. This is useful to use with filters. * **Signal Updating**. This enables you to issue a `SIGUSR1` to canto and trigger a screen update. Most useful from a script (i.e. write a script or a cron that runs `canto-fetch` and immediately issues `SIGUSR1`. * **Manual Updating**. Not automatic, but driven by a keybind (C-r by default). ### Considerations Multiple update triggers allow users to update Canto's feeds in different ways depending upon their reading habits. Users who don't appreciate text shifting out from under their eyes might want to avoid the every-minute `interval` update and use the `change_tag` or manual update triggers to insure more predictable refreshes. On the other hand, users that tend to jump from one tag to another and do short bursts of content reading might find the interval triggers more to their liking. It's all about what suits you. If you feel there is an update trigger that we need to implement, file a feature request bug and we'll consider it. Update triggers are fairly easy to implement. ### Using Triggers Using triggers is a simple as using a list. By default, `triggers` is set like this: :::python triggers = ["interval"] You can add triggers or remove triggers with typical Python list functions :::python triggers.append("change_tag") triggers.append("signal") triggers.remove("interval") You can only set "change_tag", "signal", and "interval" in `triggers`. Manual isn't considered a real trigger, but is set like other keybinds to `force_update`. I expect there will be refinements to the trigger system in upcoming releases. Once again, any ideas for new triggers or improvements (or even code) are [welcome](../contact).
## Hooks
Canto includes a number of hooks for outside extensibility. You may find hooks to be most useful when you author them yourself. Even so, `canto.extra` does include a few basic, but useful, hooks. The possible hooks:
- `start_hook`: Run once, on startup.
- `resize_hook`: Run when the window is resized (including on start).
- `new_hook`: Run once for every new item.
- `select_hook`: Run when a new item is selected.
- `unselect_hook`: Run when an item is unselected.
- `state_change_hook`: Run whenever an item's state (read/marked) changes.
- `update_hook`: Run when the interface updates.
- `end_hook`: Run when the interface closes.
> **NOTE**: All hooks are enforced by the interface **except new_hook**. `new_hook` is intended to be used as a notification method. All other hooks don't function unless Canto is running. ### Using Hooks There are only two hooks included in `canto.extra` by default. These are for manipulating the titles of an xterm (or another compatible X terminal). :::python select_hook = set_xterm_title end_hook = clear_xterm_title This will set the xterm's title to "Tag - Title" when you select an item and clear it when Canto closes. Because this hook doesn't work everywhere and where it doesn't work it essentially clobbers ncurses by printing to the screen (you set an xterm title by writing to stdout with a special code), it's usually preferable to check the environment's `TERM` before employing the hooks. :::python import os if os.getenv("TERM") == "xterm": # Or other compatible term select_hook = set_xterm_title end_hook = clear_xterm_title This code ensures that when you switch to the Linux console or another terminal, Canto won't start spewing uninterpreted content to the screen.
# Example Config
Here is a modified version of my own config that should serve as a decent starting point for any new users. :::python from canto.extra import * import os # Handlers when in Linux console or xterm if os.getenv("TERM") == "linux": link_handler("elinks \"%u\"", text=True) image_handler("fbi \"%u\"", text=True, fetch=True) else: link_handler("firefox \"%u\"") image_handler("feh \"%u\"", fetch=True) # Max column width of 65 characters def resize_hook(cfg): cfg.columns = cfg.width / 65 # Never discard items I haven't seen never_discard("unread") # I prefer change_tag to interval # Uncomment these to use it too # triggers.remove("interval") # triggers.append("change_tag") keys['/'] = search_filter keys['y'] = yank # Use [ / ] to switch between global filters filters=[show_unread, None] # Make unread items float to the top, when not # using show_unread filter default_tag_sorts([by_unread]) # Selected Feeds add("http://rss.slashdot.org/slashdot/Slashdot", tags=[None, "news"]) add("http://osnews.com/files/recent.xml", tags=[None, "news"]) add("http://www.damninteresting.com/?feed=rss2") add("http://reddit.com/.rss", tags=["Reddit", "reddits", "news"]) add("http://programming.reddit.com/.rss", tags=[None, "reddits"]) add("http://netsec.reddit.com/.rss", tags=[None, "reddits"]) #... # Some examples # Uncomment if you've downloaded the script # add("script:slashdotpolls -external") # # Simple password example # add("http://feedparser.org/docs/examples/digest_auth.xml", username="test", # password="digest") You can download this example config [here](http://codezen.org/static/conf.py.example)
# Upgrading from 0.6.x
For most users, upgrading to 0.7.x from 0.6.x should be painless. There are some quirks that may cause trouble. ## Standard Procedures First of all, if you run `canto-fetch` as a daemon, you want to make sure that all the old daemons aren't running. There aren't any differences in the disk format between the two versions, but it's bad practice to have multiple versions of software running on the same data. You can properly kill all running `canto-fetch` instances like so: :::bash $ killall -INT canto-fetch After that, you should have no running instances. You can check with :::bash $ ps -u [youruser] | grep canto-fetch If you still having running instances after a few moments, you can issue `killall -9 canto-fetch` to force them to exit. ## Shared Memory The (multi)processing module requires semaphores that are supported by `/dev/shm` with glibc. If you're getting weird errors like OSError: [Errno 13] Permission denied or OSError: [Errno 38] Function not implemented Then you need to mount `/dev/shm` and make sure you have read/write permissions. You can do this as root or with sudo like this: :::bash $ sudo mount shm /dev/shm -t tmpfs By default, `tmpfs` is has 777 permissions, but just in case: :::bash $ sudo chmod 777 /dev/shm As a side note, this is mounted by default in most common distros, and can improve the peformance of some applications using shared memory. In fact, glibc 2.2+ expects it to be mounted. To get it to be mounted on startup, add this line to your `/etc/fstab` tmpfs /dev/shm tmpfs defaults 0 0 ## Configuration Once again, for most users, changes to your configuration shouldn't be necessary. If you loop through the color array, you may have to change your configuration. If you use sorts, or define sorts and filters, then you may need a configuration change. ### Color array This is mainly if you're doing a more advanced loop through the color list. If you're just setting colors in the typical way (`colors[0] = (num/str,num/str)`), then you should be okay. If you're looping with :::python for i, (fg, bg) in colors: Then you may run into trouble. The new default colors are not all set to tuples when configured so the (fg, bg) may except. However, the most common use for this loop is to set a common background for all colors. In 0.7.x, if a color is not set, the background of a color defaults to the background color of the first pair, making the loop unnecessary. In short :::python # 0.6.x version for i, (fg, bg) in colors: colors[i] = (fg, "newbackground") # 0.7.x version, setting the background of 0, changes them all colors[0] = ("white", "newbackground") ### Using Sorts The primary difference with using sorts is that sort order is no longer conveyed as a simple list. This was confusing and made for a lot of double lists in weird places. :::python add_tag("sometag", sorts=[[by_date]]) # 0.6.x add_tag("sometag", sorts=[by_date]) # 0.7.x To convey the same meaning as the double lists used to (i.e. sort order), you can use the new `sort_order` function. :::python add_tag("sometag", sorts=[[by_alpha, by_len]]) # 0.6.x add_tag("sometag", sorts=[sort_order(by_alpha, by_len)] # 0.7.x ### Defining Filters and Sorts If you created your own filters and sorts for 0.6.x, the main difference is that these now must be classes which subclass `Filter` and `Sort` respectively. So where once :::python # 0.6.x valid filter def myfilt(tag, story): ...perform filter... was valid. 
You now need :::python # 0.7.x valid filter class myfilt(Filter): def __call__(self, tag, story): ...perform filter... Also, any items used other than "title", "link", "id", and "canto_state", should be added to the precache variable of the class. :::python class myfilt(Filter): def __init__(self): Filter.__init__(self) self.precache = ["extra_item", ...] ... You'll know that this needs to be done if Canto is extremely sluggish. Of course, you can see examples of the new classes in [canto.extra](http://codezen.org/cgi-bin/gitweb.cgi?p=canto.git;a=blob;f=canto/extra.py;hb=HEAD) ### Validation 0.7.x is more strict than 0.6.x about validating your configuration. It's possible that accepted input that doesn't fall under the previous categories and still doesn't work with 0.7.x. Usually in this case, the error message is enough to set you straight. If you're still having trouble, [contact](../contact) me. ### Other Changes * `new_hook` is enforce by canto-fetch now, and will thus run even without Canto * `keep` variables set < the number of items in the feed source will be ignored and thus `keep=0` now indicates that all items in the feed should be kept and no more. ### If All Else Fails If you're really stuck and confused trying to upgrade: [contact](../contact) me.
[TOC]

# IRC
A number of users and myself hang out in `#canto` on `irc.freenode.net`. My handle is `@jmiller`. This is my preferred way of tackling bugs because it's real time, and you can answer my debugging questions. If you're looking for a fix as fast as possible, come here. This goes especially for crashing bugs. If you're new to IRC, [xchat](http://www.xchat.org) is a good beginner IRC client. I use [irssi](http://www.irssi.org). I also suggest using [pastebin.ca](http://pastebin.ca) as a pastebin because it supports non-ASCII characters.
# Mailing Lists
Canto has two official mailing lists. * [canto-reader](http://codezen.org/cgi-bin/mailman/listinfo/canto-reader) is a mailing list for general discussion, asking questions, requesting features. If you're not a fan of IRC then this is the place for you. This is where you should submit **patches** as well. * [canto-reader-announce](http://codezen.org/cgi-bin/mailman/listinfo/canto-reader-announce) is the typical ANNOUNCE: list, read-only and essentially designed for maintainers or folks that aren't running a distro that packages Canto. This content is not mirrored on the canto-reader list.
# Bug Tracker
Canto uses GitHub for its [bug tracker](http://github.com/themoken/Canto/issues). Generally, you should submit bugs through IRC or the mailing lists first, but you can also create a bug/feature request here. This requires a GitHub account.
# E-Mail
Before Canto had the mailing lists, I encouraged people to mail me directly with any problems. Now, I'd prefer you to use the mailing lists. If you really have something to say just to me that won't help anyone else (i.e. praise/flames) then my email is **jack [at] codezen [dot] org**
The compulsory Frequently Asked Questions page.

[TOC]

# Why Canto?
**The interface.** I'm not sure how people can stand three-pane readers. Graphical or console, I don't want to tab around multiple windows to read my news. The way I read, I want all of the headlines out where I can see them with minimal cruft. It should be one key stroke to do most common actions. **Extensibility** Everything used to configure Canto is actually part of the Python interpreter. It has all of the power of Python anywhere. Hooks are provided to allow you to perform actions on particular events (start, stop, update, resize, select, unselect, new), filters allow you to filter items out on arbitrary terms (globally or on a feed basis), and intelligent keybinds allow you to script your way to pretty much anything you want. Examples: - Use a keybind to write the item to a file, or mail it to a friend. - Use a filter to only browse through items you haven't read yet. - Simply add content like an item's tags or author to the reader. - Sort feeds by date, alphabetically, by length, or by any other arbitrary criteria. - Keybind your favorite search terms to cycle through them. - Detect that your terminal is xterm or a tty and set the browser appropriately - Arbitrary things, like change the theme based on the time of day The point being that, with all of Python behind it, you can manipulate Canto in new and inventive ways without any help from me. My limited vision doesn't encumber how you use the software. **Theming** All of the drawing code is done by a simple class full of literate Python. If you want to modify how anything is drawn to the screen, subclass the renderer, change anything you want and voila... Canto now draws differently. Definitely for advanced users, but extremely powerful.
# Does Canto support Atom feeds?
Yes. Canto uses [feedparser](http://feedparser.org), which is capable of parsing virtually any syndicated news format on the planet. Beyond that, Canto doesn't care whether a feed is any version of RSS, Atom, or any other feed type.
# Does Canto support OPML?
Yes, as of 0.5.2, Canto supports importing from and exporting to OPML, as well as adding feeds from an OPML file at [runtime](../config/#sourcing-other-files).
# Does Canto support Unicode/UTF-8?
Yes. As long as your terminal and its font can support it, Canto can output it. If you can't see certain characters, try using a pan-Unicode font (like unifont) with your terminal. Then you can read all of your Unicode Klingon characters.
# The interface is full of ????s, WTF man?
Your locale isn't UTF-8 compatible. You should probably change that if you expect your terminal apps to do anything smart with non-ASCII characters (like the ones used to draw the interface). More info [here](http://codezen.org/canto/start/#locale-or-wtf-are-these-lines) and [here](http://wiki.archlinux.org/index.php/Locale).
# What does Canto require?
As of version 0.7.8, Canto only requires Python 2.5+ (although 2.4 *may* work, it's not tested at all) and [chardet](http://chardet.feedparser.org). Canto < 0.7.8 requires [feedparser](http://feedparser.org) in addition to the above. Canto < 0.7.6 on Python < 2.6 requires the multiprocessing library. To **build** Canto you need GCC or another standards-compliant C compiler, ncurses header files, and Python header files (for compiling the core code).
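If you want to sanity-check a machine before building, a quick check like the following (illustrative, not part of Canto) will confirm the interpreter and chardet are in place:

    :::python
    # Illustrative pre-build check: Python 2.5+ and chardet importable.
    import sys
    assert sys.version_info >= (2, 5), "Canto 0.7.8+ expects Python 2.5+"

    import chardet
    print "found chardet", chardet.__version__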
# Why ncurses?
I am one of those internet nutjobs that believe that the mouse shouldn't have made it as far as it has in the world. I like to be able to do everything from the keyboard, and a lot of times that means using console apps because they seem to be the only ones not following the click-button paradigm (with a shoutout to [Vimperator](http://vimperator.org) for making Firefox an exception). There's also the fact that virtually every Unix-like distro has ncurses installed by default, so that the user doesn't have to choose apps based on what will best integrate into their desktop. How many times have you found a piece of software and been disappointed because it doesn't really fit with everything else? People in Gnome running Amarok know exactly what I'm talking about. Then there's the fact that I like to read news from SSH sessions (very nice when you're on campus, or at work). And finally, I just really dig text interfaces. I use [XMonad](http://xmonad.org) and have six million xterms running with [vim](http://www.vim.org), [mutt](http://mutt.org), and [moc](http://moc.daper.net) at all times.
# Why Python?
The previous iteration of this project was in C (NRSS, which is deprecated in favor of Canto now). Now C is great. I love C. I do my daily work in C and I enjoy it. But when it comes to *having fun* while coding, Python is the way to go. I know you take a speed hit, but that's why the core drawing logic is in C (surprise!). And there's one thing that I've learned while bugfixing NRSS, and that's that it's no fun. In addition, Mark Pilgrim's excellent [feedparser](http://feedparser.org) and [chardet](http://chardet.feedparser.org) libraries are native to Python, and they make life a lot easier. Lastly, Python is also second only to Perl in system adoption and... God knows I wouldn't want to write this in Perl =).
# What license is Canto under?
Canto is [GPLv2](http://www.gnu.org/licenses/old-licenses/gpl-2.0.html) software. Just like all the other software **real** men write.
# What platform does Canto run on?
Canto should run on any Linux system with the required libraries, regardless of architecture. Other platforms like *BSD, Cygwin, and Windows are 100% untested and unsupported. Patches welcome.
# Where should I request features / report bugs?
Read the [contact](../contact) page. In short, you should either get into the IRC channel, write a message to the mailing list, or submit a bug report to the [bug tracker](http://github.com/themoken/Canto/issues).
canto-0.7.10/doc/getting-started000066400000000000000000000144521142361563200165110ustar00rootroot00000000000000
Getting started with Canto is easy. Most of this is covered in `man canto`, but it seems like it should be online as well.

[TOC]

# Quick Start
Right after you download and install Canto, I'm sure you're eager to get going. If you just run `canto`, it will start up, generate an example configuration (~/.canto/conf.py.example), fetch those feeds, and start up the interface as usual. This is to let you get a feeling for the program quickly, before messing around with the configuration. Here's the content of conf.py.example:

    :::python
    # Auto-generated by canto because you don't have one.
    # Please copy to/create ~/.canto/conf.py

    add("http://rss.slashdot.org/slashdot/Slashdot")
    add("http://reddit.com/.rss")
    add("http://kerneltrap.org/node/feed")
    add("http://codezen.org/canto/feeds/latest")

You can find a list of the default keybinds on the [configuration page](../config/#default-binds), to help you navigate.
# Configuration
So you've messed around with the interface and want to put a little effort into a config. The first place to check is the [configuration page](../config/), to help you get on your way. The most basic config (like the conf.py.example generated if you ran without a config) is just a series of `add` calls, as you can see above. If you store your feeds in *OPML* format, you can use `canto -i [path]` to automatically import your feeds into your config, or `source_opml()` to add them from an OPML file at run time. If you store your feeds in a *list of URLs*, you can use `source_urls()` to read the list from a file at run time.
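As a concrete (and hedged) illustration, a small conf.py mixing these approaches might look like the sketch below. Only `add()` appears exactly as in the generated example config; the argument forms shown for `source_urls()` and `source_opml()` are assumptions, so check the [configuration page](../config/) for their real signatures.

    :::python
    # Small illustrative ~/.canto/conf.py.  The paths and argument forms
    # for source_urls()/source_opml() are assumptions; only add() is
    # shown exactly as in the generated example config.
    add("http://codezen.org/canto/feeds/latest")

    # Hypothetical: read one feed URL per line from a plain text file.
    source_urls("~/.canto/urls.txt")

    # Hypothetical: pull feeds from an exported OPML file at run time.
    source_opml("~/.canto/feeds.opml")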
# Fetching
In order for canto to receive updates, its fetching program, `canto-fetch`, must be run often. You can achieve this either by adding

    * * * * * canto-fetch

to your crontab, meaning that `canto-fetch` will be run once a minute while the computer is running, or, if you don't use cron, you can add

    canto-fetch -db

to your startup scripts (.xinitrc, etc.); this will make canto-fetch into a daemon that runs itself every minute. You can also run `canto-fetch -db` by hand, or just run `canto -u` to force an update every time you run. It's up to you.
# Tips
This covers some less obvious niceties of using the Canto interface that aren't really covered by [keybinds](../config/#default-binds) or anywhere else.

## Locale or "WTF are these ???? lines?"

If you're having trouble with the interface showing question marks all over the place instead of the nice lines shown in the screenshots, then you've got a locale problem; specifically, you're running in a non-UTF-8 locale. My first suggestion in this case is to switch your system locale to a UTF-8 compatible one. This affects a lot of things, like other terminal applications. For example:

* [This](http://codezen.org/static/good_elinks.png) is a screenshot of `elinks` running in a proper UTF-8 locale. Nice, real characters. Thetas look like thetas. Omegas look like omegas.
* [This](http://codezen.org/static/bad_elinks.png) is a screenshot of `elinks` running in iso-8859-1, which is a typical non-Unicode locale. Everything looks funny. Thetas are THs and omega is W*?

So I don't mean to suggest that everyone's running around reading Greek in their terminal apps, but this is an illustration using the same program, with the same input ([wikipedia](http://en.wikipedia.org/wiki/List_of_unicode_characters)), where one looks much better than the other. And it's not just non-Latin languages either. Lots of RSS-generating blogs use fancy symbols (some as basic as double-quote marks) that don't render correctly in non-UTF-8 locales.

### To use a proper locale

1. Run `locale -a`, which will show you the installed locales on your system. If you see an appropriate UTF-8 line, skip to 4.
2. Edit `/etc/locale.gen` (or use `sudo dpkg-reconfigure locales` on Debian-based systems) to add/uncomment your locale. It should be a line in the form of `language_territory.UTF-8 UTF-8`, where language and territory are abbreviations. For example: `en_US.UTF-8 UTF-8` for American English or `de_DE.UTF-8 UTF-8` for German.
3. Run `sudo locale-gen` to actually generate the locale data based on `/etc/locale.gen`.
4. Either:
    * TO CHANGE ALL PROGRAMS: Add `export LC_ALL=xx_XX.UTF-8` to your shell configuration (`~/.bashrc` for example) and restart your terminal/shell.
    * TO JUST RUN CANTO: Invoke Canto like this: `LC_ALL=xx_XX.UTF-8 canto`

Some alternate (and mostly distro-agnostic) help is available from the Arch Wiki [here](http://wiki.archlinux.org/index.php/Locale).

That should make Canto, and all of your other terminal apps, display nicely. Also note that even with a UTF-8 locale, actually displaying the right characters is still dependent on using a good font in your terminal. So, if you plan on reading any feeds in Arabic or Chinese or even Math, you may be interested in setting up your terminal with a font like [unifont](http://packages.debian.org/sid/unifont).

## Input Boxes

Throughout canto you might run into an input box or two. These boxes accept any sort of input (including UTF-8 characters). In addition to plain input, though, they support typical (Emacsy?) terminal control sequences.
* `Ctrl + g` (BEL): Cancel Input
* `Ctrl + j` (NL): Enter Input (same as Enter)
* `Ctrl + a` (SOH): Start of line (same as Home)
* `Ctrl + e` (ENQ): End of line (same as End)
Movement is also supported with the left and right arrows as well as their Emacsy equivalents. Ctrl + g is the only keybind that doesn't have an easily typable equivalent, but it is also probably the most useful.

## Goto Links ##

In addition to the above keybind, the reader's "go to link" input box ('g' in the reader, by default) supports going to multiple links at once (very useful for posts that aggregate a number of cool links). In this input box **only**, you can specify a comma-delimited list of ranges. Some examples:

    1       # Goto link 1
    1,3     # Goto links 1 and 3
    1-3     # Goto links 1, 2, and 3
    1,4-6   # Goto links 1, 4, 5, 6

Also worth noting: link 0 is *always* the story's main link. Going to link 0 from the reader is equivalent to hitting 'g' (by default) in the main view.
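Purely as an illustration of that syntax (this is not canto's actual code), expanding such a range list into individual link numbers could look like:

    :::python
    # Illustrative only -- NOT canto's implementation.  Expands a
    # comma-delimited range list like "1,4-6" into link numbers.
    def expand_ranges(spec):
        links = []
        for part in spec.split(","):
            if "-" in part:
                start, end = part.split("-", 1)
                links.extend(range(int(start), int(end) + 1))
            else:
                links.append(int(part))
        return links

    print expand_ranges("1,4-6")   # prints [1, 4, 5, 6]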
canto-0.7.10/man/000077500000000000000000000000001142361563200134615ustar00rootroot00000000000000canto-0.7.10/man/canto-fetch.1000066400000000000000000000035731142361563200157460ustar00rootroot00000000000000.TH Canto-fetch 1 "MAN_DATE" "Version MAN_VERSION" "Canto-fetch" .SH NAME Canto-fetch \- A quiet feed fetcher. .SH DESCRIPTION Canto-fetch is designed to be run through a cron job, every minute. Every time it's run, it checks the timestamp on each index file and updates the feed, if necessary. The format it produces is a simple UTF-8 encoded, NULL delimited text file on disk, readable by the canto client. .SH GETTING STARTED Canto-fetch is meant to be used through a cron, adding this line to your crontab will cause canto-fetch to poll every minute. * * * * * canto-fetch Alternatively, if you're unable/uncomfortable using cron, canto-fetch has a background daemon mode so you can invoke it in your X session scripts. Just use canto-fetch \-b .SH USAGE These options correspond to options to the canto client. .TP \-h / \--help Print usage and quit. .TP \-v / \--version Print version and quit. .TP \-V / \--verbose Output status while updating. .TP \-d / \--daemon Continue to check for updates every minute. Mostly for debugging with \-V, users probably want \-b to background. .TP \-b / \--background Detach from the terminal (implies \-d) .TP \-f / \--force Force updates on all feeds, ignoring timestamps. .TP \-s / \--sysfp Use feedparser on system instead of builtin copy. .TP \-C / \--conf [PATH] Set path to configuration file (default: ~/.canto/conf) .TP \-F / \--fdir [PATH] Set path to feed directory (default: ~/.canto/feeds/) .TP \-L / \--log [PATH] Set path to log (default: ~/.canto/fetchlog) .TP \-S / \--sdir [PATH] Set the path to execurl scripts (default ~/.canto/scripts/) .SH FILES .TP .I ~/.canto/fetchlog Canto-fetch log file. .TP .I ~/.canto/feeds/ This is the directory where canto-fetch records stories. .SH BUGS None known, but it's not outside of the realm of possibility =P. .SH HOMEPAGE http://codezen.org/canto .SH AUTHOR Jack Miller canto-0.7.10/man/canto-inspect.1000066400000000000000000000005531142361563200163150ustar00rootroot00000000000000.TH Canto-inspect 1 "MAN_DATE" "Version MAN_VERSION" "Canto-inspect" .SH NAME Canto-inspect - print content from a feed. .SH SYNOPSIS canto-inspect [URL] .SH DESCRIPTION Canto-inspect is a custom pretty printer for content from a feed. It provides a nice layout for the feed XML. .SH HOMEPAGE http://codezen.org/canto .SH AUTHOR Jack Miller jack@codezen.org canto-0.7.10/man/canto.1000066400000000000000000000107071142361563200146540ustar00rootroot00000000000000.TH Canto 1 "MAN_DATE" "Version MAN_VERSION" "Canto" .SH NAME Canto \- An ncurses RSS reader .SH DESCRIPTION Canto is an RSS reader built to be flexible and highly customizable on top of python. .SH GETTING STARTED Canto is the client, separate from the fetcher Canto-fetch. In order to update you must invoke canto-fetch on a regular basis. Usually the way to do that is to add a line into your crontab like this: * * * * * canto-fetch If you're not a fan of cron, then you can add `canto-fetch \-b` to your startup scripts to have canto-fetch run as a daemon on its own. After you have your configuration file generated, just run `canto` and it will update for you, if canto-fetch hasn't been run already. .SH COMMAND LINE USAGE .TP \-h/--help Simple help .TP \-v/--version Print version .TP \-u/--update Update feeds before launching client. 
.TP \-l/--list List all configured feeds .TP \-a/--checkall Print number of new items. .TP \-n/--checknew [feed] Print number of new items in feed. .TP \-o/--opml Print feeds to stdout as OPML file. .TP \-i/--import [path] Import feeds from OPML file to your configuration. .TP \-r/--url [URL] Add feed at URL to your configuration. .TP \-t/--tag [tag] Set tag for feed added with \-r .TP \-D/--dir [path] Set the configuration directory (default: ~/.canto/) .TP \-C/--conf [path] Set the configuration file (default: ~/.canto/conf) .TP \-L/--log [path] Set the log file (default: ~/.canto/log) .TP \-F/--fdir [path] Set the feed directory (default: ~/.canto/feeds) .TP \-S /--sdir [PATH] Set the path to execurl scripts (default ~/.canto/scripts/) .SH INTERNAL USAGE Within the program you can use the following (default) keys. These can be changed in your configuration file by using the "key" configuration option. .TP UP / DOWN or k / j Select previous or next item. (next_item) (prev_item) .TP PGUP / PGDOWN or o / l Goto previous or next tag. (next_tag) (prev_tag) .TP RIGHT / LEFT Set item unread or read (just_unread) (just_read) .TP [ / ] Cycle through defined filters (prev_filter) (next_filter) .TP { / } Cycle through defined tag filters (prev_feed_filter) (next_feed_filter) .TP \- / = Cycle through defined tag sorts (prev_tag_sort) (next_tag_sort) .TP < / > Cycle through defined tag sets (prev_tagset) (next_tagset) .TP : Goto a specific tag (order of the config) (goto_tag) .TP ; Goto a specific visible tag (goto_reltag) .TP TAB Switch focus between list and reader (only useful with dedicated reader space) (switch) .TP h Display this man page. (help) .TP Space Read a story (reader) .TP g Use the defined browser to goto the item's URL (goto) .TP C / V Collapse/Uncollapse all tags (set_collapse_all) (unset_collapse_all) .TP c Collapse/Uncollapse current tag (toggle_collapse_tag) .TP f Inline search (inline_search) .TP m Toggle item marked/unmark (toggle_mark) .TP M Unmark all items (all_unmarked) .TP n / p Goto next/previous marked item (next_mark) (prev_mark) .TP , / . Goto next/previous unread item (next_unread) (prev_unread) .TP r / u Mark tag read/unread (tag_read) (tag_unread) .TP R / U Mark all read/unread (all_read) (all_unread) .TP Ctrl+R Refresh feeds (force_update) .TP Ctrl+L Redraw screen (refresh) .TP \\ Restart Canto (restart) .TP q Quit Canto (quit) .SH READER USAGE Inside the reader, there are a number of different keys. These can be changed with the "reader_key" configuration option. .TP UP / DOWN or k / j Scroll up/down if content off screen (scroll_up) (scroll_down) .TP n / p goto next/previous item without closing reader .TP l Enumerate links (toggle_show_links) .TP g Choose a link to goto (goto) .SH CONFIGURATION The ~/.canto/conf.py file is where all of the configuration is. You can start by reading http://codezen.org/canto/config . If you're updating from <= 0.6.x then you should read http://codezen.org/canto/config/#upgrading-from-06x . .SH FILES .TP .I ~/.canto/conf.py Main configuration file. ~/.canto/conf (without the extension) is also checked for compatibility. .TP .I ~/.canto/log Everyday log file. .TP .I ~/.canto/fetchlog Canto-fetch log file. .TP .I ~/.canto/feeds/ This is the directory where the stories are recorded. .SH BUGS I'm sure there are some. If you run into a bug (a crash or bad behavior), then send please report it. Any of the methods described in http://codezen.org/canto/contact are acceptable. 
Also, please include your configuration and log files with the report. .SH HOMEPAGE http://codezen.org/canto .SH AUTHOR Jack Miller canto-0.7.10/runhere.sh000077500000000000000000000010651142361563200147170ustar00rootroot00000000000000#!/bin/bash pkill -INT -f "^python.*canto-fetch" killall -INT canto OLDPPATH=$PYTHONPATH OLDMPATH=$MANPATH python setup.py install --prefix=$PWD/root PYVER=`python -c "import sys; print sys.version[:3]"` if [ -e "$PWD/root/lib64" ]; then echo "Detected 64bit install" LIBDIR="lib64" else LIBDIR="lib" fi export PYTHONPATH="$PWD/root/$LIBDIR/python$PYVER/site-packages:$OLDPPATH" export MANPATH="$PWD/root/share/man:$OLDMPATH" bin/canto-fetch -b bin/canto pkill -INT -f "^python.*canto-fetch" export PYTHONPATH=$OLDPPATH export MANPATH=$OLDMPATH canto-0.7.10/setup.py000077500000000000000000000046061142361563200144310ustar00rootroot00000000000000#!/usr/bin/env python from __future__ import with_statement # This isn't required in Python 2.6 import commands import distutils.core import distutils.command.install_data import uninstall version = ['0','7','10'] man_date = "27 July 2010" git_commit = commands.getoutput("git show --pretty=oneline\ --abbrev-commit").split()[0] # TODO: replace with build_manpages target (@james will do this shortly if approved) class Canto_install_data(distutils.command.install_data.install_data): def run(self): ret = distutils.command.install_data.install_data.run(self) install_cmd = self.get_finalized_command('install') libdir = install_cmd.install_lib mandir = install_cmd.install_data + "/share/man/man1/" for source in ["/canto/const.py"]: with open(libdir + source, "r+") as f: d = f.read().replace("SET_VERSION_TUPLE","(" +\ ",".join(version) + ")") d = d.replace("SET_GIT_SHA", "\"" + git_commit + "\"") f.truncate(0) f.seek(0) f.write(d) for manpage in ["canto.1","canto-fetch.1","canto-inspect.1"]: with open(mandir + manpage, "r+") as f: d = f.read().replace("MAN_VERSION", ".".join(version)) d = d.replace("MAN_DATE", man_date) f.truncate(0) f.seek(0) f.write(d) distutils.core.setup(name='Canto', version=".".join(version), description='An ncurses RSS aggregator.', author='Jack Miller', author_email='jack@codezen.org', url='http://codezen.org/canto', download_url='http://codezen.org/static/canto-' + ".".join(version) + ".tar.gz", platforms=["linux"], license='GPLv2', scripts=['bin/canto','bin/canto-fetch', 'bin/canto-inspect'], packages=['canto', 'canto.cfg'], ext_modules=[distutils.core.Extension('canto.widecurse',\ sources = ['canto/widecurse.c'], libraries = ['ncursesw'], library_dirs=["/usr/local/lib", "/opt/local/lib"], include_dirs=["/usr/local/include", "/opt/local/include"])], data_files = [("share/man/man1/",\ ["man/canto.1", "man/canto-fetch.1", "man/canto-inspect.1"])], cmdclass={ 'install_data': Canto_install_data, 'install': uninstall.install, 'uninstall': uninstall.uninstall } ) canto-0.7.10/uninstall.py000066400000000000000000000161751142361563200153030ustar00rootroot00000000000000#!/usr/bin/python # -*- coding: utf-8 -*- """distutils.command.uninstall Uninstall/install targets for python distutils. Implements the Distutils 'uninstall' command and a replacement (inheriting) install command to add in the hooks to make the uninstall command happy. 
""" # Copyright (C) 2009 James Shubin, McGill University # Written for McGill University by James Shubin # # This program is free software: you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation, either version 3 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU Affero General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program. If not, see . # created 2009/10/10, James Shubin from __future__ import with_statement __revision__ = "$Id$" # TODO: what should i do with this? import os import distutils.core import distutils.command.install import distutils.errors # TODO: have the get stored away as part of the install somewhere INSTALL_LOG = 'install.log' # default filename for install log _ = lambda x: x # add fake gettext function until i fix up i18n __all__ = ['install', 'uninstall'] # NOTE: PEP376 might eventually support (automatic?) uninstall, until then... # see: http://www.python.org/dev/peps/pep-0376/ or search the internets. class install(distutils.command.install.install): """inherit from the main install, replacing it.""" # NOTE: don't add any functions in this class without calling the parent! def run(self): if self.verbose: print _('running custom install') # run whatever was supposed to run from the main install # NOTE: this respects the dry-run option. distutils.command.install.install.run(self) # ...and then add on some hooks to support uninstalling. if not(self.dry_run): try: with open(INSTALL_LOG, 'w') as f: # note: use '\n' for *all* platforms, not just linux. # see: http://docs.python.org/library/os.html#os.linesep f.write(_('# installed files log. needed for uninstall. do NOT delete.\n')) f.writelines([ '%s\n' % x for x in self.get_outputs() ]) except IOError, e: print _('unable to write install log to: %s') % INSTALL_LOG print e class uninstall(distutils.core.Command): description = _('uninstalls a python package, trying its best to succeed') user_options = [ ('force-log', 'L', _('uninstall from install log data')), ('force-guess', 'G', _('uninstall based on a dry-run install')), ('install-log=', 'f', _('specifies the install log file to use')), ('generate-log=', 'g', _('generates a dry-run log file')), # TODO: someone annoying can add this if they're scared :) #('always-ask=', 'a', _('prompt before every removal')), # TODO: add this option. #('purge-config', 'P', _('purge all traces of config')) ] def initialize_options(self): self.force_log = None self.force_guess = None self.install_log = None self.generate_log = None self.always_ask = None self.purge_config = None def finalize_options(self): # uninstaller has to try one method or the other, not both. # if neither is set, then uninstaller gets to choose. 
if self.force_log and self.force_guess: raise distutils.errors.DistutilsOptionError( _('choose either the `force-log\' or `force-guess\' option.') ) # do some validation if (self.install_log is not None) and not(os.path.exists(self.install_log)): raise distutils.errors.DistutilsOptionError( _('the `install-log\' option must point to an existing file.') ) def run(self): success = False # do this unless we are forced to guess if not(self.force_guess): filename = INSTALL_LOG # the default if self.install_log is not None: # override if specified filename = self.install_log try: with open(filename, 'r') as f: # take out the comments filelist = [ x.strip() for x in f.readlines() if x[0] != '#' ] success = True # this worked except IOError, e: if self.force_log: print _('unable to read install log at: %s') % filename print e return # must exit this function # we assume that as a backup, we can `depend' on this heuristic if self.generate_log or (not(success) and not(self.force_log)): if self.verbose: print _('running guess') output = self.get_install_outputs() # success logically represents if we are depending on `guess' if not success: filelist = output # also generate the log if asked if self.generate_log and not(self.dry_run): try: with open(self.generate_log) as f: f.write(_('# installed files guess log.\n')) f.writelines([ '%s\n' % x for x in output ]) except IOError, e: print _('unable to write install guess log to: %s') % self.generate_log print e # document the type of uninstall that the data is coming from if self.verbose: if success: print _('uninstalling from log: %s' % filename) else: print _('uninstalling from guess') # given the list of files, process them and delete here: dirlist = [] for x in filelist: if not(os.path.exists(x)): print _('missing: %s') % x elif os.path.isfile(x): # collect dirs which install log doesn't store dirlist.append(os.path.split(x)[0]) self.__os_rm(x) # remove any .pyc, pyo & (pyw: should we?) mess if os.path.splitext(x)[1] == '.py': for ext in ('c', 'o'): # add 'w' ? mod = x + ext if self.__os_rm(mod): # don't remove it twice if mod in filelist: filelist.remove(mod) # save for later elif os.path.isdir(x): dirlist.append(x) if len(dirlist) == 0: return dirlist = list(set(dirlist)) # remove duplicates # loop through list until it stops changing size. # this way we know all directories have been pruned. # most robust if someone shoves in a weird install log. if self.verbose: print _('attempting to remove directories...') while True: size = len(dirlist) if size == 0: if self.verbose: print _('successfully removed all directories.') break for x in dirlist: # keep non-empty dirs if len(os.listdir(x)) == 0: if self.__os_rm(x): dirlist.remove(x) if len(dirlist) == size: print _('couldn\'t remove any more directories') print _('directories not removed include:') for i in dirlist: print '\t* %s' % i break def get_install_outputs(self): """returns the get_outputs() list of a dry run installation.""" self.distribution.dry_run = 1 # do this under a dry run self.run_command('install') return self.get_finalized_command('install').get_outputs() def __os_rm(self, f): """simple helper function to aid with code reuse.""" if os.path.exists(f): if self.verbose: print _('removing: %s') % f if not(self.dry_run): try: if os.path.isdir(f): os.rmdir(f) else: os.remove(f) return True except OSError, e: if self.verbose: print _('couldn\'t remove: %s') % f return False