fImport/0000755000176000001440000000000012120624003011710 5ustar ripleyusersfImport/MD50000644000176000001440000000766312120624003012234 0ustar ripleyusers22709eaf431b727b40990525a60880ae *ChangeLog 2a5772be068edbee1f23387ff4cf5899 *DESCRIPTION 572c0bc4ab8114fe4564047da45f7704 *NAMESPACE 4943edaa3e07e213da1d4c4c2da250e4 *R/builtin-xlsGdata.R ed967b1e07f635b23aee8e7903238314 *R/class-fWEBDATA.R 0dbf7b63f8b2e43bb2112051b09215a0 *R/import-fred.R ddaeec726cb8aff6bde3bcd8bfbc1527 *R/import-oanda.R f3b9a04d85d58852b66fedcdb943620d *R/import-yahoo.R 11f4e6a4fd1c685a1ac9306ce34e0ff8 *R/methods-show.R 5baefb04ff7e29ae4471721ab3516725 *R/read-download.R c0682a94d00ee6e28919eed15c855f0a *R/read-lines.R a2a223c51b49ca03c633c4f8c8c42384 *R/read-links.R 1fd90792d71f61eb71dfe8109572b629 *R/read-lynx.R 1f798e7678b97a660886a70a11a6f1f0 *R/read-split.R 741b8c0c29e515f559e312b8a1d5e207 *R/read-w3m.R 822d2b5aadb43ce2c8fe9e7f980f786f *R/read-xls.R eeea1f3173a8922eb62c6b40c3d0a364 *R/utils-yahoo.R 4e1bef3c6ce229ac6f7e6719c16a1960 *R/zzz.R eb67e9393d7b73a41874d38d88598379 *data/amexListing.csv.gz ffb69a6101e6aeb8bc9c656037d7a5f9 *data/h15Listing.csv.gz 2ee3619a6ce1fa8543e4c4829162c6fa *data/nasdaqListing.csv.gz 94b8a57b94cb38e94b01ce3cd3c7f382 *data/nyseListing.csv.gz c2811a28fad4bbe884d39897a563a91d *data/oandaListing.csv.gz f6167196c2061461034160c4d5493a5e *data/stoxxListing.csv.gz 16fc9d7e85d555d07b975c3ed380a167 *data/swxListing.csv.gz 6042b9c5e5bec3ecc1b6959cd2858b64 *inst/COPYRIGHT.html ac75910053903467dfeef65a5e8aee81 *inst/THANKS f8109a53128f172d5199998a0774a982 *inst/perl/IO/AtomicFile.pm cb8bf30e73340e4eba233c51dd8b2f34 *inst/perl/IO/InnerFile.pm 5886a657d7e49b133d23f7b2dbe30c21 *inst/perl/IO/Lines.pm 6be2f7b5899b83a897025caf868e2b8b *inst/perl/IO/Scalar.pm aaa5b626b1467f10703f741377f48f45 *inst/perl/IO/Scalar.pm.html 520d9d810f5758f247727f8f2730d71e *inst/perl/IO/ScalarArray.pm 3e242abfa789aff62181bf299d9089e8 *inst/perl/IO/Stringy.pm 546777a943a0b90882709f2b10d317d1 *inst/perl/IO/Wrap.pm 3669bd450d4fc4e6b883fcd7ad604caf *inst/perl/IO/WrapTie.pm ac02eb7b8f5a465ddfbffe3d6ae8e970 *inst/perl/OLE/Storage_Lite.pm 1f654a86fcfa94c11a7ca0727cb80a6b *inst/perl/Spreadsheet/ParseExcel.pm 5b9c925cbfd90af04f94bf79650d6c3d *inst/perl/Spreadsheet/ParseExcel/Dump.pm a7b0e77e315ed0af93cf1eac2947bdf2 *inst/perl/Spreadsheet/ParseExcel/FmtDefault.pm c290c305fc2c72319a7c5a03c9374413 *inst/perl/Spreadsheet/ParseExcel/FmtJapan.pm 022ec2be7c1a29b53b70e9b07e8ca1ca *inst/perl/Spreadsheet/ParseExcel/FmtJapan2.pm 15d505b239530e813c43a47994d7d1b6 *inst/perl/Spreadsheet/ParseExcel/FmtUnicode.pm b4d38ac2cf0d31024032f6f26f312d91 *inst/perl/Spreadsheet/ParseExcel/SaveParser.pm 86f47d356ad68219465e21d9b283733f *inst/perl/Spreadsheet/ParseExcel/Utility.pm c4341957b89bad277d02b806190398de *inst/perl/html-parser/html-ascii.pl 100863c8c612d7fe7a2538f0ee746d0f *inst/perl/html-parser/html-parser.html dad0620a0e3d01b1c288528977cba43b *inst/perl/html-parser/html-to-ascii.pl e3f98170141a84a3e33df38e24a5de0c *inst/perl/html-parser/html-to-rfc.pl cabe6a21afaf40fc91875b4fae9fc1be *inst/perl/html-parser/parse-html.pl 04a1075156469440e0c963ca59c96ac7 *inst/perl/html-parser/rfc.pl d2a8d2a6640daf41a2b1e1a5df9aa1bb *inst/perl/html-parser/tformat.pl b6b727ff08caecfaddf318ca5c2493fd *inst/perl/xls2csv.pl 9bcce6a39d261460a19e94c9bfc098cd *man/00fImport-package.Rd a63c89b8305a50f46451d3a2f8876cd5 *man/class-fWEBDATA.Rd 64d29a4ffc25b9d8d85f367b60739458 *man/import-fred.Rd 2c89294b2b051272583a180c609309fa *man/import-oanda.Rd dfc610ee77657b9b59719300cc99c6d7 *man/import-yahoo.Rd fe6a375d6c5413bc5469741f06452db6 *man/methods-show.Rd e325f3ec995deb13ac213ab4b44dec70 *man/provider-listings.Rd bd16550c8a030bc258981ab2e2af6244 *man/read-download.Rd 33106f53a3ff7d47d6ea2b175eed8fce *man/read-lines.Rd 12d10df68a7b9470d4f29e783277b4f5 *man/read-links.Rd 71955874ff4b00b44e5b27e3e0e86d6c *man/read-lynx.Rd 9de6256e4bec4805cb4b626a1dbdbe60 *man/read-split.Rd 20c3c2c3e34105c554b70d266231355b *man/read-w3m.Rd fc1901177e700d181c41c7c69422e402 *man/read-xls.Rd d106cbf104f1fbbfcb884cc44d663a2e *man/utils-yahooBriefing.Rd 1d07ef9438d49a70fde8d3fadf9c956a *man/utils-yahooKeystats.Rd fImport/man/0000755000176000001440000000000011731576637012515 5ustar ripleyusersfImport/man/utils-yahooKeystats.Rd0000644000176000001440000000260011731422661016772 0ustar ripleyusers\name{utils-yahooKeystats} \alias{yahooKeystats} \title{Import Key Statistics Data from Yahoo} \description{ Imports key statistics data from \verb{chart.yahoo.com}. } \usage{ yahooKeystats(query, file = "tempfile", source = NULL, save = FALSE, try = TRUE) } \arguments{ \item{file}{a character string specifying a filename, usually having extension \file{".csv"}, where to save the downloaded data.} \item{query}{ a character string, denoting the location of the data at the web site. } \item{save}{ a logical value, if set to TRUE the downloaded data file will be stored under the path and file name specified by the string \code{file}. By default FALSE. } \item{source}{a character string setting the URL of the source. If \code{NULL}, then the URL will be set automatically to its default value. } \item{try}{logical indicating if the internet access will be checked.} } \value{ returns a data frame (\code{\link{data.frame}}) with key statistics downloaded from yahoo's web site. } \author{ Diethelm Wuertz for the Rmetrics \R-port. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis (2010). \emph{Financial Market Data for R/Rmetrics}; Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, \url{www.rmetrics.org}. } \keyword{data} fImport/man/utils-yahooBriefing.Rd0000644000176000001440000000263011731422661016713 0ustar ripleyusers\name{utils-yahooBriefing} \alias{yahooBriefing} \title{Import Briefings from Yahoo} \description{ Imports briefings from chart.yahoo.com. } \usage{ yahooBriefing(query, file = "tempfile", source = NULL, save = FALSE, try = TRUE) } \arguments{ \item{file}{ a character string with filename, usually having extension ".csv", where to save the downloaded data. } \item{query}{ a character string, denoting the location of the data at the web site. } \item{save}{ a logical value, if set to TRUE the downloaded data file will be stored under the path and file name specified by the string \code{file}. By default FALSE. } \item{source}{a character string setting the URL of the source. If \code{NULL}, then the URL will be set automatically to its default value. } \item{try}{ a logical value, if set to TRUE the Internet access will be checked. } } \value{ returns a data frame with briefings downloaded from yahoo's web site. } \author{ Diethelm Wuertz for the Rmetrics \R-port. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{data} fImport/man/read-xls.Rd0000644000176000001440000000201611731422661014505 0ustar ripleyusers\name{read-xls} \alias{read.xls} \title{Read from an xls file} \description{ Reads a sheet from an xls file. } \usage{ read.xls(url, sheet=1, lines=-1, verbose=FALSE, encoding="unknown") } \arguments{ \item{url}{ a character string specifying the URL of the web page. } \item{sheet}{ an integer denoting which sheet should be extracted, by default the first. } \item{lines}{ a negative integer with the lines to be skipped, by default the first. } \item{verbose}{ a logical decides about verbose mode, by default FALSE. } \item{encoding}{ a character string with the type of encoding, by defaul \code{"unknown"}. } } \value{ the downloaded text. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{programming} fImport/man/read-w3m.Rd0000644000176000001440000000214411731422661014407 0ustar ripleyusers\name{read-w3m} \alias{read.w3m} \title{w3m Browser interface} \description{ Uses the w3m Browser to read a web page. } \usage{ read.w3m(url, intern = TRUE, bin = NULL, pipe = FALSE, \dots) } \arguments{ \item{url}{ a character string specifying the URL of the web page. } \item{intern}{ a logical which indicates whether to make the output of the command an R object. } \item{bin}{ a string with the path of your w3m binary or NULL if w3m binary is available in the operating system path. } \item{pipe}{ a logical which indicates whether the result should be returned as a \code{pipe()} commmand. } \item{\dots}{ optional arguments passed to w3m binary. For a list of options, see the w3m manual page. } } \value{ the downloaded text. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{programming} fImport/man/read-split.Rd0000644000176000001440000000155711731422661015043 0ustar ripleyusers\name{read-split} \alias{charvecSplit} \alias{dataSplit} \title{Splitting downloaded data sets} \description{ Two helpful data set and charvec splitting utilities. } \usage{ dataSplit(x, split=" ", col=-1) charvecSplit(x, split=" ", col=1, format="\%F") } \arguments{ \item{x}{ character vector to be splitted. } \item{split}{ the split character, by default a blank. } \item{col}{ an integer value or vector, the columns to be selected. } \item{format}{ the date format of the character vector, by default the ISO-8601 date format. } } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{programming} fImport/man/read-lynx.Rd0000644000176000001440000000233011731422661014670 0ustar ripleyusers\name{read-lynx} \alias{read.lynx} \title{Lynx Browser interface} \description{ Uses the Lynx Browser to read a web page. } \usage{ read.lynx(url, intern = TRUE, bin = NULL, pipe = FALSE, \dots) } \arguments{ \item{url}{ a character string specifying the URL of the web page. } \item{intern}{ a logical which indicates whether to make the output of the command an R object. } \item{bin}{ a string with the path of your lynx binary or NULL if lynx binary is available in the operating system path. } \item{pipe}{ a logical which indicates whether the result should be returned as a \code{pipe()} commmand. } \item{\dots}{ optional arguments passed to lynx binary. For example \code{accept_all_cookies = TRUE} or \code{cookie_file="~/.lynx_cookies"}. For a list of options, see the lynx manual page. } } \value{ the downloaded text. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{programming} fImport/man/read-links.Rd0000644000176000001440000000207611731422661015025 0ustar ripleyusers\name{read-links} \alias{read.links} \title{Links Browser interface} \description{ Uses the Links Text Browser to read a web page. } \usage{ read.links(url, intern = TRUE, bin = NULL, pipe = FALSE, \dots) } \arguments{ \item{url}{ a character string specifying the URL of the web page. } \item{intern}{ a logical which indicates whether to make the output of the command an R object. } \item{bin}{ a string with the path of your lynx binary or NULL if lynx binary is available in the operating system path. } \item{pipe}{ a logical which indicates whether the result should be returned as a \code{pipe()} commmand. } \item{\dots}{ optional arguments passed to links binary. } } \value{ the downloaded text. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{programming} fImport/man/read-lines.Rd0000644000176000001440000000242511731576637015032 0ustar ripleyusers\name{read-lines} \alias{read.lines} \title{Read from a text file line by line} \description{ Reads from a text file line by line. Wrapper to readLines() function. } \usage{ read.lines(con=stdin(), n=-1, ok=TRUE, warn=FALSE, encoding="unknown") } \arguments{ \item{con}{ a connection object or a character string. } \item{n}{ an integer, the (maximal) number of lines to read. Negative values indicate that one should read up to the end of input on the connection. } \item{ok}{ a logical, is it OK to reach the end of the connection before n > 0 lines are read? If not, an error will be generated. } \item{warn}{ a logical, warn if a text file is missing a final EOL. The default is FALSE, note different from function \code{readLines}. } \item{encoding}{ a character string, the encoding to be assumed for input strings. } } \value{ the downloaded text. Same output as readLines() function. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{programming} fImport/man/read-download.Rd0000644000176000001440000000176511731422661015520 0ustar ripleyusers\name{read-download} \alias{composeURL} \alias{indexGrep} \title{Download Utilities} \description{ Twp helpful utilities to read downloads.\ } \usage{ composeURL(\dots, prefix="http://") indexGrep(pattern, x, \dots) } \arguments{ \item{\dots}{ [composeURL]\cr character strings from which the URL will be composed.\cr [indexGrep]\cr optional arguments to be passed to the function \code{grep}. } \item{prefix}{ a character string specifying the prefix of the URL. } \item{pattern}{ a character string containing a regular expression to be matched in the given character vector. } \item{x}{ a character vector where matches are sought. } } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{programming} fImport/man/provider-listings.Rd0000644000176000001440000000132011731422661016447 0ustar ripleyusers\name{provider-Listings} \alias{providerListings} \alias{amexListing} \alias{h15Listing} \alias{nasdaqListing} \alias{nyseListing} \alias{oandaListing} \alias{stoxxListing} \alias{swxListing} \title{Provider Listing of symbols and descriptions} \description{ CSV files with provider listings of symbols, descriptions ands related information } \format{ All files are in CSV Excel spreadsheet format. The delimiter is a semicolon. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{datasets} fImport/man/methods-show.Rd0000644000176000001440000000126111415310234015377 0ustar ripleyusers\name{show-methods} \docType{methods} \alias{show-methods} \alias{show,ANY-method} \alias{show,fWEBDATA-method} \title{WEBDATA Download Show Methods} \description{ Show methods for WEBDATA downloads. } \section{Methods}{ \describe{ \item{object = "ANY"}{ Generic function. } \item{object = "fWEBDATA"}{ Print function for objects of class \code{"fWEBDATA"}. } } } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{data} fImport/man/import-yahoo.Rd0000644000176000001440000001225311731422661015421 0ustar ripleyusers\name{import-yahoo} \alias{import-yahoo} \alias{yahooImport} \alias{yahooSeries} \title{Import Market Data from Yahoo} \description{ Imports financial time series data from chart.yahoo.com. } \usage{ yahooSeries(symbols, from = NULL, to = Sys.timeDate(), nDaysBack = 366, \dots) yahooImport(query, file = "tempfile", source = NULL, frequency = c("daily", "weekly", "monthly"), from = NULL, to = Sys.timeDate(), nDaysBack = 366, save = FALSE, sep = ";", try = TRUE) } \arguments{ \item{file}{ a character string with filename, usually having extension ".csv", where to save the downloaded data. } \item{frequency}{ a character string, one of "auto", "quarterly", "monthly", or "daily", defining the frequency of the data records. Note, the import function tries autodetect the frequency of the time series to be dowwnloaded. This may fail, in such case specify the frequency explicitely. } \item{from}{ the date from when to extract the time series. } \item{nDaysBack}{ the number of days back. } \item{query}{ a character string, denoting the location of the data at the web site. } \item{save}{ a logical value, if set to TRUE the downloaded data file will be stored under the path and file name specified by the string \code{file}. By default FALSE. } \item{sep}{ a charcter value specifying the column separator. } \item{source}{a character string setting the URL of the source. If \code{NULL}, then the URL will be set automatically to its default value. } \item{symbols}{ a character string with the symbols to be downloaded. } \item{to}{ the end date of the data download, by default the current date. } \item{try}{ a logical value, if set to TRUE the Internet access will be checked. } \item{\dots}{ optional arguments to be passed. } } \value{ The function \code{yahooImport} returns an S4 object of class \code{fWEBDATA} with the following slots: \item{@call}{ the function call. } \item{@data}{ the data as downloaded formatted as a data.frame. } \item{@param}{ a character vector whose elements contain the values of selected parameters of the argument list. } \item{@title}{ a character string with the name of the download. This can be overwritten specifying a user defined input argument. } \item{@description}{ a character string with an optional user defined description. By default just the current date when the test was applied will be returned.} The function \code{yahooSeries} returns an S4 object of class \code{timeSeries} or alternatively an object specified by the function argument \code{returnClass}. \cr The function \code{keystatsImport} returns a data frame with key statistics downloaded from yahoo's web site. } \details{ \bold{Import data from chart.yahoo.com:} \cr\cr The query string is given as \cr\cr \code{s=SYMBOL&a=DD&b=MM&c=CCYY&g=d&q=q&z=SYMBOL&x=.csv} \cr\cr where \code{SYMBOL} has to replaced by the symbol name of the instrument, and \code{DD}, \code{MM}, and \code{CCYY} by the day, month-1 and century/year when the time series should start. Here are some examples of symbols: \tabular{ll}{ \code{[query]} \tab Description: \cr \tab \cr \code{^DJI} \tab Dow Jones 30 Industrial Averages \cr \code{^NYA} \tab New York Stock Exchange Composite \cr \code{^NDX} \tab Nasdaq 100 Index \cr \code{^IXIC} \tab Nasdaq Composite Index \cr \code{^TYX} \tab US 30Y Treasury Bond Index \cr \code{IBM} \tab BM DJIA Stock \cr \code{KO} \tab Coca-Cola DJIA Stock } The meaning of the tokens in the query string are the following: \tabular{ll}{ Token \tab Description \cr \tab \cr \code{s} \tab Selected Ticker-Symbol \cr \code{a} \tab First Quote starts with Month (mm) \cr \code{b} \tab First Quote starts with Day (dd) \cr \code{c} \tab First Quote starts with Year (ccyy) \cr \code{d} \tab Last Quote ends with Month (mm) \cr \code{e} \tab Last Quote ends with Day (dd) \cr \code{f} \tab Last Quote ends with Year (ccyy) \cr \code{z} \tab Selected Ticker-Symbol } } \note{ \bold{Internet Download Functions:} IMPORTANT NOTE: If the service provider changes the data file format it may become necessary to modify and update the functions. } \author{ Diethelm Wuertz for the Rmetrics \R-port. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{data} fImport/man/import-oanda.Rd0000644000176000001440000000677011731422661015373 0ustar ripleyusers\name{import-oanda} \alias{import-oanda} \alias{oandaImport} \alias{oandaSeries} \title{Import FX Market Data from OANDA} \description{ Imports FX market data from www.oanda.com. } \usage{ oandaSeries(symbols, from = NULL, to = Sys.timeDate(), nDaysBack = 366, \dots) oandaImport(query, file = "tempfile", source = NULL, frequency = "daily", from = NULL, to = Sys.timeDate(), nDaysBack = 366, save = FALSE, sep = ";", try = TRUE) } \arguments{ \item{file}{ a character string with filename, usually having extension ".csv", where to save the downloaded data. } \item{frequency}{ a character string, one of "auto", "quarterly", "monthly", or "daily", defining the frequency of the data records. Note, the import function tries autodetect the frequency of the time series to be dowwnloaded. This may fail, in such case specify the frequency explicitely. } \item{from}{ the date from when to extract the time series. } \item{nDaysBack}{ the number of days back. } \item{query}{ a character string, denoting the location of the data at the web site. } \item{save}{ a logical value, if set to TRUE the downloaded data file will be stored under the path and file name specified by the string \code{file}. By default FALSE. } \item{sep}{ a charcter value specifying the column separator. } \item{source}{a character string setting the URL of the source. If \code{NULL}, then the URL will be set automatically to its default value. } \item{symbols}{ a character string with the symbols to be downloaded. } \item{to}{ the end date of the data download, by default the current date. } \item{try}{ a logical value, if set to TRUE the Internet access will be checked. } \item{\dots}{ optional arguments to be passed. } } \value{ The function \code{fredImport} returns an S4 object of class \code{fWEBDATA} with the following slots: \item{@call}{ the function call. } \item{@data}{ the data as downloaded formatted as a data.frame. } \item{@param}{ a character vector whose elements contain the values of selected parameters of the argument list. } \item{@title}{ a character string with the name of the download. This can be overwritten specifying a user defined input argument. } \item{@description}{ a character string with an optional user defined description. By default just the current date when the test was applied will be returned.} The function \code{fredSeries} returns an S4 object of class \code{timeSeries} or alternatively an object specified by the function argument \code{returnClass}. \cr } \note{ \bold{Internet Download Functions:} \cr\cr IMPORTANT NOTE: If the service provider changes the data file format it may become necessary to modify and update the functions. } \author{ Diethelm Wuertz for the Rmetrics \R-port. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{data} fImport/man/import-fred.Rd0000644000176000001440000000677711731422661015240 0ustar ripleyusers\name{import-fred} \alias{import-fred} \alias{fredImport} \alias{fredSeries} \title{Import Market Data from the Fred} \description{ Imports financial time series data from research.stlouisfed.org. } \usage{ fredSeries(symbols, from = NULL, to = Sys.timeDate(), nDaysBack = 366, \dots) fredImport(query, file = "tempfile", source = NULL, frequency = "daily", from = NULL, to = Sys.timeDate(), nDaysBack = NULL, save = FALSE, sep = ";", try = TRUE) } \arguments{ \item{file}{ a character string with filename, usually having extension ".csv", where to save the downloaded data. } \item{frequency}{ a character string, one of "auto", "quarterly", "monthly", or "daily", defining the frequency of the data records. Note, the import function tries autodetect the frequency of the time series to be dowwnloaded. This may fail, in such case specify the frequency explicitely. } \item{from}{ the date from when to extract the time series. } \item{nDaysBack}{ the number of days back. } \item{query}{ a character string, denoting the location of the data at the web site. } \item{save}{ a logical value, if set to TRUE the downloaded data file will be stored under the path and file name specified by the string \code{file}. By default FALSE. } \item{sep}{ a charcter value specifying the column separator. } \item{source}{a character string setting the URL of the source. If \code{NULL}, then the URL will be set automatically to its default value. } \item{symbols}{ a character string with the symbols to be downloaded. } \item{to}{ the end date of the data download, by default the current date. } \item{try}{ a logical value, if set to TRUE the Internet access will be checked. } \item{\dots}{ optional arguments to be passed. } } \value{ The function \code{fredImport} returns an S4 object of class \code{fWEBDATA} with the following slots: \item{@call}{ the function call. } \item{@data}{ the data as downloaded formatted as a data.frame. } \item{@param}{ a character vector whose elements contain the values of selected parameters of the argument list. } \item{@title}{ a character string with the name of the download. This can be overwritten specifying a user defined input argument. } \item{@description}{ a character string with an optional user defined description. By default just the current date when the test was applied will be returned.} The function \code{fredSeries} returns an S4 object of class \code{timeSeries} or alternatively an object specified by the function argument \code{returnClass}. \cr } \note{ \bold{Internet Download Functions:} IMPORTANT NOTE: If the service provider changes the data file format it may become necessary to modify and update the functions. } \author{ Diethelm Wuertz for the Rmetrics \R-port. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{data} fImport/man/class-fWEBDATA.Rd0000644000176000001440000000315711731422661015315 0ustar ripleyusers\name{class-fWEBDATA} \docType{class} \alias{fWEBDATA-class} \title{Class "fWEBDATA"} \description{ The class fWEBDATA represents a download from the internet. } \section{Objects from the Class}{ Objects can be created by calls of the import or series functions. } \section{Slots}{ \describe{ \item{\code{call}:}{Object of class \code{"call"}: the call of the applied function. } \item{\code{data}:}{Object of class \code{"data.frame"}: the data as downloaded formatted as a data.frame. } \item{\code{param}:}{Object of class \code{"character"}: a character vector whose elements contain the values of selected parameters of the argument list. } \item{\code{title}:}{Object of class \code{"character"}: a character string with the name of the download. This can be overwritten specifying a user defined input argument. } \item{\code{description}:}{Object of class \code{" character"}: a character string with an optional user defined description. By default just the current date and user when the test was applied will be returned.} } } \section{Methods}{ \describe{ \item{show}{\code{signature(object = "fWEBDATA")}: prints an object of class 'fWEBDATA'. } } } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{data} fImport/man/00fImport-package.Rd0000644000176000001440000000766411731422661016155 0ustar ripleyusers\name{fImport-package} \alias{fImport-package} \alias{fImport} \docType{package} \title{Import Data Package} \description{ Package of utility functions to download and manage data sets from the Internet or from other sources. } \details{ \tabular{ll}{ Package: \tab fImport\cr Type: \tab Package\cr Date: \tab 2008, 2009, 2010\cr License: \tab GPL Version 2 or later\cr Copyright: \tab (c) 1999-2008 Diethelm Wuertz and Rmetrics Foundation\cr URL: \tab \url{http://www.rmetrics.org} } The package contains: 1) Three example import funcions for Yahoo, Oanda, and the Federal Reserve. 2) Several utility functions to read and write data, including functions which allow system calls to the \code{Links}, \code{Lynx}, and \code{W3M} text browser, and a function to read \code{xls} Excel files. 3) Several csv files with provider listings. } \section{Overview on Time Series Data:}{ The package makes functions available to download financial market data from the internet. Functions are available for the follwing web sites: 1. Data from research.stlouisfed.org\cr 2. Data from www.oanda.com\cr 3. Data from chart.yahoo.com There are two kinds of functions available, the first kind is called \code{*Series} which downloads a \code{"timeSeries"} object from a web site wher the star \code{*} is a placeholder for the web site (fred, oanda, and yahoo), and the second kind is called \code{*Import} which downloads an S4 object of class \code{"fWEBDATA"} wit a \code{@Data} slot which keeps the \code{"timeSeries"} object and further slots which keep additional download information. We recommend the first kind of functions for easy download of economic and financial time series, whereas we recommend to use the second kind of functions when additional information is required for example for the storage of the data in a data base management system. IMPORTANT NOTE: The download from the web site www.forecasts.org has been withdrawn since the time series are no longer updated. Note, all the series offered by the "forecasts.org" web site are also available from the FRED St. Louis database. } \section{Downloading 'timeSeries' Objects:}{ The following functions 1. fredSeries\cr 2. oandaSeries\cr 3. yahooSeries allow for an easy download of economic and financial time series data as objects of class \code{"timeSeries"}. Note, with version 280.73 major changes were made in the argument list of the downloading functions. We apologize for any inconveniences caused by these changes. The reason was that now all functions have the same arguments which makes their usage much easier. For details we refer to the help pages of the functions mentioned above. If you like to keep additional download information, use the functions 1. fredImport\cr 2. oandaImport\cr 3. yahooImport which return objects of class \code{"fWEBDATA"}. } \section{Fomer Yahoo Utilities:}{ The function \code{yahooKeystats} allows to download key statistics from Yahoo's web site. The function \code{yahooBriefings} allows to download a briefing from Yahoo's web site. Note: The Yahoo utilities \code{yahooKeystats} and \code{yahooBriefings} are (as of March 2012) no longer supported. } \section{License:}{ This Rmetrics package is written for educational usage teaching "Computational Finance and Financial Engineering" and licensed under GPL. } \references{ Diethelm Wuertz, Yohan Chalabi, and Andrew Ellis, (2010); \emph{Financial Market Data for R/Rmetrics}, Rmetrics eBook, Rmetrics Association and Finance Online, Zurich, www.rmetrics.org. } \keyword{data} fImport/inst/0000755000176000001440000000000011720123747012703 5ustar ripleyusersfImport/inst/perl/0000755000176000001440000000000011720123747013645 5ustar ripleyusersfImport/inst/perl/xls2csv.pl0000644000176000001440000000650311370220716015605 0ustar ripleyusers#!/bin/env perl BEGIN { use File::Basename; unshift(@INC, dirname $0); } use strict; use Spreadsheet::ParseExcel; # declare some varibles local my($row, $col, $sheet, $cell, $usage, $basename, $sheetnumber, $filename); ## ## Usage information ## $usage = < [] [] Translate the Microsoft Excel spreadsheet file contained in into comma separated value format (CSV) and store in . If is not specified, the output file will have the same name as the input file with '.xls' or '.XLS' (if any) removed and '.csv' appended. If no worksheet number is given, each worksheet will be written to a separate file with the name '_.csv'. EOF ## ## parse arguments ## if(!defined($ARGV[0])) { print $usage; exit 1; } $basename = $ARGV[1]; $basename =~ s/.csv//; if ($basename eq "") { my @path; @path = split(/[\/\\]/, $ARGV[0]); # split on file separator $basename = $path[$#path]; $basename =~ s/.xls//i; } if(defined($ARGV[2]) ) { $sheetnumber = $ARGV[2]; die "Sheetnumber must be an integer larger than 0." if $sheetnumber < 1; } ## ## open spreadsheet ## my $oExcel = new Spreadsheet::ParseExcel; print "Loading $ARGV[0] ...\n"; open(FH, "<$ARGV[0]") or die "Unable to open file '$ARGV[0]'.\n"; close(FH); my $oBook = $oExcel->Parse($ARGV[0]); print "\n"; print "Orignal Filename :", $oBook->{File} , "\n"; print "Number of Sheets :", $oBook->{SheetCount} , "\n"; print "Author :", $oBook->{Author} , "\n"; print "\n"; my @sheetlist = (@{$oBook->{Worksheet}}); if (defined($sheetnumber)) { @sheetlist=($sheetlist[$sheetnumber-1]); } ## ## iterate across each worksheet, writing out a separat csv file ## my $i=0; foreach my $sheet (@sheetlist) { $i++; my $sheetname = $sheet->{Name}; if(defined($sheetnumber)) { $filename = "${basename}.csv"; } else { $filename = "${basename}_${sheetname}.csv"; } print "Writing Sheet number $i ('$sheetname') to file '$filename'\n"; open(OutFile,">$filename"); my $cumulativeBlankLines=0; my $minrow = $sheet->{MinRow}; my $maxrow = $sheet->{MaxRow}; my $mincol = $sheet->{MinCol}; my $maxcol = $sheet->{MaxCol}; print "Minrow=$minrow Maxrow=$maxrow Mincol=$mincol Maxcol=$maxcol\n"; for(my $row = $minrow; $row <= $maxrow; $row++) { my $outputLine = ""; for(my $col = $mincol; $col <= $maxcol; $col++) { my $cell = $sheet->{Cells}[$row][$col]; if( defined($cell) ) { $_=$cell->Value; #{Val}; # convert '#NUM!' strings to missing (empty) values s/#NUM!//; # escape double-quote characters in the data since # they are used as field delimiters s/\"/\\\"/g; } else { $_ = ''; } $outputLine .= "\"" . $_ . "\"" if(length($_)>0); # separate cells with commas $outputLine .= "," if( $col != $maxcol) ; } #$outputLine =~ s/[, ]+$//g; ## strip off trailing blanks and commas # skip blank/empty lines if( $outputLine =~ /^[, ]*$/ ) { $cumulativeBlankLines++ } else { print OutFile "$outputLine \n" } } close OutFile; print " (Ignored $cumulativeBlankLines blank lines.)\n" if ($cumulativeBlankLines); print "\n"; } fImport/inst/perl/html-parser/0000755000176000001440000000000011720123747016103 5ustar ripleyusersfImport/inst/perl/html-parser/tformat.pl0000644000176000001440000000614211370220716020112 0ustar ripleyusers# Simple text formatter # Jim Davis 17 July 94 # current page, line, and column numbers. $page = 1; $line = 1; $column = 1; $left_margin = 1; $right_margin = 72; # lines on page before footer. or 0 if no limit. $bottom_margin = 58; # add newlines to make page be full length? $fill_page_length = 1; sub print_word_wrap { local ($word) = @_; if (($column + ($whitespace_significant ? 0 : 1) + length ($word) ) > ($right_margin + 1)) { &fresh_line();} if ($column > $left_margin && !$whitespace_significant) { print " "; $column++;} print $word; $column += length ($word);} sub print_whitespace { local ($char) = @_; if ($char eq " ") { $column++; print " ";} elsif ($char eq "\t") { &get_to_column (&tab_column($column));} elsif ($char eq "\n") { &new_line();} else { die "Unknown whitespace character \"$char\"\nStopped";} } sub tab_column { local ($c) = @_; (int (($c-1) / 8) + 1) * 8 + 1;} sub fresh_line { if ($column > $left_margin) {&new_line();} while ($column < $left_margin) { print " "; $column++;}} sub finish_page { # Add extra newlines to finish page. # You might not want to do this on the last page. if ($fill_page_length) { while ($line < $bottom_margin) {&cr();}} &do_footer (); $line = 1; $column = 1;} sub start_page { if ($page != 1) { &do_header ();}} sub print_n_chars { local ($n, $char) = @_; local ($i); for ($i = 1; $i <= $n; $i++) {print $char;} $column += $n;} # need one NL to end current line, and then N to get N blank lines. sub skip_n_lines { local ($n, $room_left) = @_; if ($bottom_margin > 0 && $line + $room_left >= $bottom_margin) { &finish_page(); &start_page();} else { local ($i); for ($i = 0; $i <= $n; $i++) {&new_line();}}} sub new_line { if ($bottom_margin > 0 && $line >= $bottom_margin) { &finish_page(); &start_page();} else {&cr();} &print_n_chars ($left_margin - 1, " ");} # used in footer and header where we don't respect the bottom margin. sub print_blank_lines { local ($n) = @_; local ($i); for ($i = 0; $i < $n; $i++) {&cr();}} sub cr { print "\n"; $line++; $column = 1;} # left, center, and right tabbed items sub print_lcr_line { local ($left, $center, $right) = @_; &print_tab_left (1, $left); &print_tab_center (($right_margin - $left_margin) / 2, $center); &print_tab_right ($right_margin, $right); &cr();} sub print_tab_left { local ($tab_column, $string) = @_; &get_to_column ($tab_column); print $string; $column += length ($string); } sub print_tab_center { local ($tab_column, $string) = @_; &get_to_column ($tab_column - (length($string) / 2)); print $string; $column += length ($string); } sub print_tab_right { local ($tab_column, $string) = @_; &get_to_column ($tab_column - length($string)); print $string; $column += length ($string); } sub get_to_column { local ($goal_column) = @_; if ($column > $goal_column) {print " "; $column++;} else { while ($column < $goal_column) { print " "; $column++;}}} fImport/inst/perl/html-parser/rfc.pl0000644000176000001440000000375111370220716017213 0ustar ripleyusers# Routines for HTML handling of an RFC # load these after loading html-to-ascii.pl because they redefine some things. # Gosh, it sure would be nice to have an object oriented language for this, # so I didn't have to duplicate code in both files. # Jim Davis, July 15 1994 # 3 Aug 94 changed META tag handling. $lines_per_page = 58; $columns_per_line = 72; # Need this info to generate header lines. $author = "(no author)"; $status = "Internet Draft"; $title = "(no title)"; $date = "(no date)"; # The values are read from META elements in the HEAD, e.g. # # # # number of blank lines after header, before text. $top_skip = 2; # blank lines before footer $bottom_skip = 2; $bottom_margin = $lines_per_page - $bottom_skip - 1 ; $End{"HEAD"} = "end_head"; sub end_head { local ($element) = @_; &set_header_variables_from_meta_tags(); $ignore_text = 0;} sub set_header_variables_from_meta_tags { $author = $Variable{"author"}; $status = $Variable{"status"}; $title = $Variable{"title"}; $date = $Variable{"date"};} # Called by tformat sub do_header { local ($save_left) = $left_margin; local ($save_right) = $right_margin; $left_margin = 1; $right_margin = $columns_per_line; &print_lcr_line ($status, $title, $date); $left_margin = $save_left; $right_margin = $save_right; &print_blank_lines ($top_skip);} sub do_footer { &print_blank_lines ($bottom_skip); local ($save_left) = $left_margin; local ($save_right) = $right_margin; $left_margin = 1; $right_margin = $columns_per_line; &print_lcr_line ($author, "", "[Page $page]"); $left_margin = $save_left; $right_margin = $save_right; print "\014\n"; $page++;} $End{"BODY"} = "end_document"; sub end_document { local ($element) = @_; # might not want to fill the last page $fill_page_length = $flush_last_page; &finish_page ();} 1; fImport/inst/perl/html-parser/parse-html.pl0000644000176000001440000002243711370220716020517 0ustar ripleyusers# HTML parser # Jim Davis, July 15 1994 # This is an HTML parser not an SGML parser. It does not parse a DTD, # The DTD is implicit in the code, and specific to HTML. # The processing of the HTML can be customized by the user by # 1) Defining routines to be called for various tags (see Begin and End arrays) # 2) Defining routines html_content and html_whitespace # This is not a validating parser. It does not check the content model # eg you can use DT outside a DL and it won't know. It is too liberal in # what tags are allowed to minimize what other tags. # Bugs - can't parse the prolog or whatever you call it # # # # %html; # ]> # modified 3 Aug to add a bunch of HTML 2.0 tags # modified 3 Sept to print HTML stack to STDERR not STDOUT, to add new # routines html_begin_doc and html_end_doc for application specific cleanup # and to break parse_html into two pieces. # modified 30 Sept 94. parse_attributes now handles tag attributes that # don't have values. thanks to Bill Simpson-Young # for the code. # modified 17 Apr 95 to support FORMS tags. $debug = 0; $whitespace_significant = 0; # global variables: # $line_buffer is line buffer # $line_count is input line number. $line_buffer = ""; $line_count = 0; sub parse_html { local ($file) = @_; open (HTML, $file) || die "Could not open $file: $!\nStopped"; &parse_html_stream (); close (HTML);} # Global input HTML is the handle to the stream of HTML sub parse_html_stream { local ($token, $new); ## initialization @stack=(); $line_count = 0; $line_buffer = ""; ## application specific initialization &html_begin_doc(); main: while (1) { # if whitespace does not matter, trim any leading space. if (! $whitespace_significant) { $line_buffer =~ s/^\s+//;} # now dispatch on the type of token if ($line_buffer =~ /^(\s+)/) { $token = $1; $line_buffer = $'; &html_whitespace ($token);} # This will lose if there is more than one comment on the line! elsif ($line_buffer =~ /^(\)/) { $token = $1; $line_buffer = $'; &html_comment ($token);} elsif ($line_buffer =~ /^(\]*\>)/) { $token = $1; $line_buffer = $'; &html_comment ($token);} elsif ($line_buffer =~ /^(\<\/[^\>]*\>)/) { $token = $1; $line_buffer = $'; &html_etag ($token);} elsif ($line_buffer =~ /^(\<[^!\/][^\>]*\>)/) { $token = $1; $line_buffer = $'; &html_tag ($token);} elsif ($line_buffer =~ /^([^\s<]+)/) { $token = $1; $line_buffer = $'; $token = &substitute_entities($token); &html_content ($token); } else { # No valid token in buffer. Maybe it's empty, or maybe there's an # incomplete tag. So get some more data. $new = ; if (! defined ($new)) {last main;} # if we're trying to find a match for a tag, then get rid of embedded newline # this is, I think, a kludge if ($line_buffer =~ /^\ -1) { print STDERR "Stack not empty at end of document\n"; &print_html_stack();} } sub html_tag { local ($tag) = @_; local ($element) = &tag_element ($tag); local (%attributes) = &tag_attributes ($tag); # the tag might minimize (be an implicit end) for the previous tag local ($prev_element); while (&Minimizes(&stack_top_element(), $element)) { $prev_element = &stack_pop_element (); if ($debug) { print STDERR "MINIMIZING $prev_element with $element on $line_count\n";} &html_end ($prev_element, 0);} push (@stack, $tag); &html_begin ($element, $tag, *attributes); if (&Empty($element)) { pop(@stack); &html_end ($element, 0);} } sub html_etag { local ($tag) = @_; local ($element) = &tag_element ($tag); # pop stack until find matching tag. This is probably a bad idea, # or at least too general. local ( $prev_element) = &stack_pop_element(); until ($prev_element eq $element) { if ($debug) { print STDERR "MINIMIZING $prev_element with /$element on $line_count \n";} &html_end ($prev_element, 0); if ($#stack == -1) { print STDERR "No match found for /$element. You will lose\n"; last;} $prev_element = &stack_pop_element();} &html_end ($element, 1); } # For each element, the names of elements which minimize it. # This is of course totally HTML dependent and probably I have it wrong too $Minimize{"DT"} = "DT:DD"; $Minimize{"DD"} = "DT"; $Minimize{"LI"} = "LI"; $Minimize{"P"} = "P:DT:LI:H1:H2:H3:H4:BLOCKQUOTE:UL:OL:DL"; # Does element E2 minimize E1? sub Minimizes { local ($e1, $e2) = @_; local ($value) = 0; foreach $elt (split (":", $Minimize{$e1})) { if ($elt eq $e2) {$value = 1;}} $value;} $Empty{"BASE"} = 1; $Empty{"BR"} = 1; $Empty{"HR"} = 1; $Empty{"IMG"} = 1; $Empty{"ISINDEX"} = 1; $Empty{"LINK"} = 1; $Empty{"META"} = 1; $Empty{"NEXTID"} = 1; $Empty{"INPUT"} = 1; # Empty tags have no content and hence no end tags sub Empty { local ($element) = @_; $Empty{$element};} sub print_html_stack { print STDERR "\n ==\n"; foreach $elt (reverse @stack) {print STDERR " $elt\n";} print STDERR " ==========\n";} # The element on top of stack, if any. sub stack_top_element { if ($#stack >= 0) { &tag_element ($stack[$#stack]);}} sub stack_pop_element { &tag_element (pop (@stack));} # The element from the tag, normalized. sub tag_element { local ($tag) = @_; $tag =~ /<\/?([^\s>]+)/; local ($element) = $1; $element =~ tr/a-z/A-Z/; $element;} # associative array of the attributes of a tag. sub tag_attributes { local ($tag) = @_; $tag =~ /^<[A-Za-z]+ +(.*)>$/; &parse_attributes($1);} # string should be something like # KEY="value" KEY2="longer value" KEY3="tags o doom" # output is an associative array (like a lisp property list) # attributes names are not case sensitive, do I downcase them # Maybe (probably) I should substitute for entities when parsing attributes. sub parse_attributes { local ($string) = @_; local (%attributes); local ($name, $val); get: while (1) { if ($string =~ /^ *([A-Za-z]+)=\"([^\"]*)\"/) { $name = $1; $val = $2; $string = $'; $name =~ tr/A-Z/a-z/; $attributes{$name} = $val; } elsif ($string =~ /^ *([A-Za-z]+)=(\S*)/) { $name = $1; $val = $2; $string = $'; $name =~ tr/A-Z/a-z/; $attributes{$name} = $val;} elsif ($string =~ /^ *([A-Za-z]+)/) { $name = $1; $val = ""; $string = $'; $name =~ tr/A-Z/a-z/; $attributes{$name} = $val;} else {last;}} %attributes;} sub substitute_entities { local ($string) = @_; $string =~ s/&/&/g; $string =~ s/<//g; $string =~ s/"/\"/g; $string;} @HTML_elements = ( "A", "ADDRESS", "B", "BASE", "BLINK", # Netscape addition :-( "BLOCKQUOTE", "BODY", "BR", "CITE", "CENTER", # Netscape addition :-( "CODE", "DD", "DIR", "DFN", "DL", "DT", "EM", "FORM", "H1", "H2", "H3", "H4", "H5", "H6", "HEAD", "HR", "HTML", "I", "ISINDEX", "IMG", "INPUT", "KBD", "LI", "LINK", "MENU", "META", "NEXTID", "OL", "OPTION", "P", "PRE", "SAMP", "SELECT", "STRIKE", "STRONG", "TITLE", "TEXTAREA", "TT", "UL", "VAR", ); sub define_element { local ($element) = @_; $Begin{$element} = "Noop"; $End{$element} = "Noop";} foreach $element (@HTML_elements) {&define_element($element);} # do nothing sub Noop { local ($element, $xxx) = @_;} # called when a tag begins. Dispatches using Begin sub html_begin { local ($element, $tag, *attributes) = @_; local ($routine) = $Begin{$element}; if ($routine eq "") { print STDERR "Unknown HTML element $element ($tag) on line $line_count\n";} else {eval "&$routine;"}} # called when a tag ends. Explicit is 0 if tag end is because of minimization # not that you should care. sub html_end { local ($element, $explicit) = @_; local ($routine) = $End{$element}; if ($routine eq "") { print STDERR "Unknown HTML element \"$element\" (END $explicit) on line $line_count\n";} else {eval "&$routine(\"$element\", $explicit)";}} sub html_content { local ($word) = @_; } sub html_whitespace { local ($whitespace) = @_;} sub html_comment { local ($tag) = @_;} # redefine these for application-specific initialization and cleanup sub html_begin_doc {} sub html_end_doc {} # return a "true value" when loaded by perl. 1; fImport/inst/perl/html-parser/html-to-rfc.pl0000644000176000001440000000225511370220716020573 0ustar ripleyusers#!/bin/env perl # Program to generate ASCII text for an Internet Draft # according to the standards in # ftp://nis.nsf.net/internet/documents/ietf/1id-guidelines.txt # ftp://nis.nsf.net/internet/documents/rfc/rfc1543.txt # # Created by James R. Davis, July 15 1994 # get directory where this file is. {$0 =~ /^(.*)\/.*$/; $my_dir = $1; if ($my_dir !~ ?^/?) {$my_dir = $ENV{PWD} . "/" . $my_dir;} if ($my_dir =~ ?/$?) {chop ($my_dir);}} push(@INC, $my_dir); # Parse command line arguments. $file = ""; $flush_last_page = 1; while ($#ARGV >=0) { $arg = shift; if ($arg =~ /^-./) { if ($arg =~ /-flush/) { if ($#ARGV == -1) {die "Missing value for $arg\nStopped ";} # If 0, don't add newlines to the last page. # Hmm, might not want page numbering at all in that case... $flush_last_page = shift; } else { die "Unrecognized switch $arg.\nStopped";}} else { $file = $arg; }} if ($file eq "") { die "Missing argument (HTML input file)\n";} require "parse-html.pl" || die "Could not load parse-html.pl"; require "html-ascii.pl" || die "Could not load html-ascii.pl"; require "rfc.pl" || die "Could not load rfc.pl"; &parse_html ($file); fImport/inst/perl/html-parser/html-to-ascii.pl0000644000176000001440000000151211370220716021104 0ustar ripleyusers#!/bin/env perl # Program to generate ASCII text for HTML # Created by James R. Davis, July 15 1994 # get directory where this file is. {$0 =~ /^(.*)\/.*$/; $my_dir = $1; if ($my_dir !~ ?^/?) {$my_dir = $ENV{PWD} . "/" . $my_dir;} if ($my_dir =~ ?/$?) {chop ($my_dir);}} push(@INC, $my_dir); # Parse command line arguments. $file = ""; while ($#ARGV >=0) { $arg = shift; if ($arg =~ /^-./) { if ($arg =~ /-width/) { if ($#ARGV == -1) {die "Missing value for $arg\nStopped ";} $columns_per_line= shift;} else { die "Unrecognized switch $arg.\nStopped";}} else { $file = $arg; }} if ($file eq "") { die "Missing argument (HTML input file)\n";} require "parse-html.pl" || die "Could not load parse-html.pl"; require "html-ascii.pl" || die "Could not load html-ascii.pl"; &parse_html ($file); fImport/inst/perl/html-parser/html-parser.html0000644000176000001440000000565411370220716021234 0ustar ripleyusers Cheap HTML parser in perl

Cheap HTML parser
Jim Davis
davis@dri.cornell.edu
July 1994

This is code for doing simple processing on HTML. I know there are bugs and limitations in the code, but it suffices for simple purposes. Among the limitations: This is an HTML parser, not an SGML parser - it does not accept a DTD, rather the model of HTML is built into the code. Also it does not validate the HTML - it will attempt to parse invalid documents, and the results are undefined if the document is in error.

The source code is available as a compressed Unix tar file. It runs under perl 4.0 patch level 36. I don't know about other versions of perl. This directory contains:

parse-html.pl
A simple HTML parser written in perl. As it parses the HTML, it calls routines (which you may redefine) for each tag encountered, and for whitespace and content. You can redefine these routines so as to process the HTML document.
html-to-ascii.pl
Uses the HTML parser to generate a plain ASCII version of an HTML document.
html-ascii.pl
The actual routines to generate the ASCII.
tformat.pl
A lowlevel text formatter used for generating ASCII. More or less like a subset of nroff
html-to-rfc.pl
Uses the HTML parser to generate a plain ASCII version of an HTML, with special formatting requirements for Internet drafts and RFCs
rfc.pl
Additional routines required for RFC formatting (e.g. page headers and footers)

Generating RFCs from HTML

The RFC format requires there be a header and footer containing, among other things, the name of the authors, a short title, and so on. You specify values for these fields with META tags as shown by the following example.
<META name="status" content="Internet Draft">
<META name="title" content="Internet audio protocol">
<META name="date" content="July 1983">
<META name="author" content="Nixon, Haldeman">
(The META tag is not officially part of HTML, it was proposed by Roy Fielding.) The tags should be in the HEAD.

Known bugs

  • It can't parse the prolog (or whatever you call it) because it does not know how to ensure that the square brackets match, e.g. the following
       <!DOCTYPE HTML [
         <!entity % HTML.Minimal "INCLUDE"<
         <!-- Include standard HTML DTD --<
         <!ENTITY % html PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN"<
         %html;
         ]<
    
  • font tags (e.g. CODE, EM) cause an extra whitespace in output e.g. <TT>foo</TT> yields "foo ,".
fImport/inst/perl/html-parser/html-ascii.pl0000644000176000001440000001423611370220716020473 0ustar ripleyusers# Routines for HTML to ASCII. # (fixed width font, no font changes for size, bold, etc) with a little # BUGS AND MISSING FEATURES # font tags (e.g. CODE, EM) cause an extra whitespace # e.g. foo, -> foo , # Jim Davis July 15 1994 # modified 3 Aug 94 to support MENU and DIR require "tformat.pl" || die "Could not load tformat.pl: $@\nStopped"; # Can be set by command line arg if (! defined($columns_per_line)) { $columns_per_line = 72;} if (! defined($flush_last_page)) { $flush_last_page = 1;} # amount to add to indentation $indent_left = 5; $indent_right = 5; # ignore contents inside HEAD. $ignore_text = 0; # Set variables in tformat $left_margin = 1; $right_margin = $columns_per_line; $bottom_margin = 0; ## Routines called by html.pl $Begin{"HEAD"} = "begin_head"; $End{"HEAD"} = "end_head"; sub begin_head { local ($element, $tag) = @_; $ignore_text = 1;} sub end_head { local ($element) = @_; $ignore_text = 0;} $Begin{"BODY"} = "begin_document"; sub begin_document { local ($element, $tag) = @_; &start_page();} $End{"BODY"} = "end_document"; sub end_document { local ($element) = @_; &fresh_line();} ## Headers $Begin{"H1"} = "begin_header"; $End{"H1"} = "end_header"; $Begin{"H2"} = "begin_header"; $End{"H2"} = "end_header"; $Begin{"H3"} = "begin_header"; $End{"H3"} = "end_header"; $Begin{"H4"} = "begin_header"; $End{"H4"} = "end_header"; $Skip_Before{"H1"} = 1; $Skip_After{"H1"} = 1; $Skip_Before{"H2"} = 1; $Skip_After{"H2"} = 1; $Skip_Before{"H3"} = 1; $Skip_After{"H3"} = 0; sub begin_header { local ($element, $tag) = @_; &skip_n_lines ($Skip_Before{$element}, 5);} sub end_header { local ($element) = @_; &skip_n_lines ($Skip_After{$element});} $Begin{"BR"} = "line_break"; sub line_break { local ($element, $tag) = @_; &fresh_line();} $Begin{"P"} = "begin_paragraph"; # if fewer than this many lines left on page, start new page $widow_cutoff = 5; sub begin_paragraph { local ($element, $tag) = @_; &skip_n_lines (1, $widow_cutoff);} $Begin{"BLOCKQUOTE"} = "begin_blockquote"; $End{"BLOCKQUOTE"} = "end_blockquote"; sub begin_blockquote { local ($element, $tag) = @_; $left_margin += $indent_left; $right_margin = $columns_per_line - $indent_right; &skip_n_lines (1);} sub end_blockquote { local ($element) = @_; $left_margin -= $indent_left; $right_margin = $columns_per_line; &skip_n_lines (1);} $Begin{"PRE"} = "begin_pre"; $End{"PRE"} = "end_pre"; sub begin_pre { local ($element, $tag) = @_; $whitespace_significant = 1;} sub end_pre { local ($element) = @_; $whitespace_significant = 0;} $Begin{"INPUT"} = "form_input"; sub form_input { local ($element, $tag, *attributes) = @_; if ($attributes{"value"} ne "") { &print_word_wrap($attributes{"value"});}} $Begin{"HR"} = "horizontal_rule"; sub horizontal_rule { local ($element, $tag) = @_; &fresh_line (); &print_n_chars ($right_margin - $left_margin, "-");} # Add code for IMG (use ALT attribute) # Ignore I, B, EM, TT, CODE (no font changes) ## List environments $Begin{"UL"} = "begin_itemize"; $End{"UL"} = "end_list_env"; $Begin{"OL"} = "begin_enumerated"; $End{"OL"} = "end_list_env"; $Begin{"MENU"} = "begin_menu"; $End{"MENU"} = "end_list_env"; $Begin{"DIR"} = "begin_dir"; $End{"DIR"} = "end_list_env"; $Begin{"LI"} = "begin_list_item"; # application-specific initialization routine sub html_begin_doc { @list_stack = (); $list_type = "bullet"; $list_counter = 0;} sub push_list_env { push (@list_stack, join (":", $list_type, $list_counter));} sub pop_list_env { ($list_type, $list_counter) = split (":", pop (@list_stack)); $left_margin -= $indent_left;} sub begin_itemize { local ($element, $tag) = @_; &push_list_env(); $left_margin += $indent_left; $list_type = "bullet"; $list_counter = "*";} sub begin_menu { local ($element, $tag) = @_; &push_list_env(); $left_margin += $indent_left; $list_type = "bullet"; $list_counter = "*";} sub begin_dir { local ($element, $tag) = @_; &push_list_env(); $left_margin += $indent_left; $list_type = "bullet"; $list_counter = "*";} sub begin_enumerated { local ($element, $tag) = @_; &push_list_env(); $left_margin += $indent_left; $list_type = "enumerated"; $list_counter = 1;} sub end_list_env { local ($element) = @_; &pop_list_env(); # &fresh_line(); } sub begin_list_item { local ($element, $tag) = @_; $left_margin -= 2; &fresh_line(); &print_word_wrap("$list_counter "); if ($list_type eq "enumerated") {$list_counter++;} $left_margin += 2;} $Begin{"DL"} = "begin_dl"; sub begin_dl { local ($element, $tag) = @_; &skip_n_lines(1,5);} $Begin{"DT"} = "begin_defined_term"; $Begin{"DD"} = "begin_defined_definition"; $End{"DD"} = "end_defined_definition"; sub begin_defined_term { local ($element, $tag) = @_; &fresh_line();} sub begin_defined_definition { local ($element, $tag) = @_; $left_margin += $indent_left; &fresh_line();} sub end_defined_definition { local ($element) = @_; $left_margin -= $indent_left; &fresh_line();} $Begin{"META"} = "begin_meta"; # a META tag sets a value in the assoc array %Variable # i.e. sers $Variable{author} to "Rushdie" sub begin_meta { local ($element, $tag, *attributes) = @_; local ($variable, $value); $variable = $attributes{name}; $value = $attributes{content}; $Variable{$variable} = $value;} $Begin{"IMG"} = "begin_img"; sub begin_img { local ($element, $tag, *attributes) = @_; &print_word_wrap (($attributes{"alt"} ne "") ? $attributes{"alt"} : "[IMAGE]");} # Content and whitespace. sub html_content { local ($string) = @_; unless ($ignore_text) { &print_word_wrap ($string);}} sub html_whitespace { local ($string) = @_; if (! $whitespace_significant) { die "Internal error, called html_whitespace when whitespace was not significant";} local ($i); for ($i = 0; $i < length ($string); $i++) { &print_whitespace (substr($string,$i,1));}} # called by tformat. Do nothing. sub do_footer { } sub do_header { } 1; fImport/inst/perl/Spreadsheet/0000755000176000001440000000000011720123747016114 5ustar ripleyusersfImport/inst/perl/Spreadsheet/ParseExcel.pm0000644000176000001440000024440511370220716020511 0ustar ripleyusers# Spreadsheet::ParseExcel # by Kawai, Takanori (Hippo2000) 2000.10.2 # 2001. 2.2 (Ver. 0.15) # This Program is ALPHA version. #////////////////////////////////////////////////////////////////////////////// # Spreadsheet::ParseExcel Objects #////////////////////////////////////////////////////////////////////////////// use Spreadsheet::ParseExcel::FmtDefault; #============================================================================== # Spreadsheet::ParseExcel::Workbook #============================================================================== package Spreadsheet::ParseExcel::Workbook; use strict; use warnings; sub new { my ($class) = @_; my $self = {}; bless $self, $class; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel::Workbook->ParseAbort #------------------------------------------------------------------------------ sub ParseAbort { my($self, $val) =@_; $self->{_ParseAbort} = $val; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel::Workbook->Parse #------------------------------------------------------------------------------ sub Parse { my($class, $source, $oFmt) =@_; my $excel = Spreadsheet::ParseExcel->new; my $workbook = $excel->Parse($source, $oFmt); $workbook->{_Excel} = $excel; return $workbook; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel::Workbook Worksheet #------------------------------------------------------------------------------ sub Worksheet { my($oBook, $sName) =@_; my $oWkS; foreach $oWkS (@{$oBook->{Worksheet}}) { return $oWkS if($oWkS->{Name} eq $sName); } if($sName =~ /^\d+$/) { return $oBook->{Worksheet}->[$sName]; } return undef; } #DESTROY { # my ($self) = @_; # warn "DESTROY $self called\n" #} #============================================================================== # Spreadsheet::ParseExcel::Worksheet #============================================================================== package Spreadsheet::ParseExcel::Worksheet; use strict; use warnings; use overload '0+' => \&sheetNo, 'fallback' => 1, ; use Scalar::Util qw(weaken); sub new { my ($class, %rhIni) = @_; my $self = \%rhIni; weaken $self->{_Book}; $self->{Cells}=undef; $self->{DefColWidth}=8.38; bless $self, $class; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel::Worksheet->sheetNo #------------------------------------------------------------------------------ sub sheetNo { my($oSelf) = @_; return $oSelf->{_SheetNo}; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel::Worksheet->Cell #------------------------------------------------------------------------------ sub Cell { my($oSelf, $iR, $iC) = @_; # return undef if no arguments are given or if no cells are defined return if ((!defined($iR)) || (!defined($iC)) || (!defined($oSelf->{MaxRow})) || (!defined($oSelf->{MaxCol}))); # return undef if outside defined rectangle return if (($iR < $oSelf->{MinRow}) || ($iR > $oSelf->{MaxRow}) || ($iC < $oSelf->{MinCol}) || ($iC > $oSelf->{MaxCol})); # return the Cell object return $oSelf->{Cells}[$iR][$iC]; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel::Worksheet->RowRange #------------------------------------------------------------------------------ sub RowRange { my($oSelf) = @_; my $iMin = $oSelf->{MinRow} || 0; my $iMax = defined($oSelf->{MaxRow}) ? $oSelf->{MaxRow} : ($iMin-1); # return the range return($iMin, $iMax); } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel::Worksheet->ColRange #------------------------------------------------------------------------------ sub ColRange { my($oSelf) = @_; my $iMin = $oSelf->{MinCol} || 0; my $iMax = defined($oSelf->{MaxCol}) ? $oSelf->{MaxCol} : ($iMin-1); # return the range return($iMin, $iMax); } #DESTROY { # my ($self) = @_; # warn "DESTROY $self called\n" #} #============================================================================== # Spreadsheet::ParseExcel::Font #============================================================================== package Spreadsheet::ParseExcel::Font; use strict; use warnings; sub new { my($class, %rhIni) = @_; my $self = \%rhIni; bless $self, $class; } #DESTROY { # my ($self) = @_; # warn "DESTROY $self called\n" #} #============================================================================== # Spreadsheet::ParseExcel::Format #============================================================================== package Spreadsheet::ParseExcel::Format; use strict; use warnings; sub new { my($class, %rhIni) = @_; my $self = \%rhIni; bless $self, $class; } #DESTROY { # my ($self) = @_; # warn "DESTROY $self called\n" #} #============================================================================== # Spreadsheet::ParseExcel::Cell #============================================================================== package Spreadsheet::ParseExcel::Cell; use strict; use warnings; sub new { my($sPkg, %rhKey)=@_; my($sWk, $iLen); my $self = \%rhKey; bless $self, $sPkg; } sub Value { my($self)=@_; return $self->{_Value}; } #DESTROY { # my ($self) = @_; # warn "DESTROY $self called\n" #} #============================================================================== # Spreadsheet::ParseExcel #============================================================================== package Spreadsheet::ParseExcel; use strict; use warnings; use OLE::Storage_Lite; use IO::File; use Config; our $VERSION = '0.32'; my @aColor = ( '000000', # 0x00 'FFFFFF', 'FFFFFF', 'FFFFFF', 'FFFFFF', 'FFFFFF', 'FFFFFF', 'FFFFFF', 'FFFFFF', #0x08 - This one's Black, too ??? 'FFFFFF', 'FF0000', '00FF00', '0000FF', 'FFFF00', 'FF00FF', '00FFFF', '800000', # 0x10 '008000', '000080', '808000', '800080', '008080', 'C0C0C0', '808080', '9999FF', # 0x18 '993366', 'FFFFCC', 'CCFFFF', '660066', 'FF8080', '0066CC', 'CCCCFF', '000080', # 0x20 'FF00FF', 'FFFF00', '00FFFF', '800080', '800000', '008080', '0000FF', '00CCFF', # 0x28 'CCFFFF', 'CCFFCC', 'FFFF99', '99CCFF', 'FF99CC', 'CC99FF', 'FFCC99', '3366FF', # 0x30 '33CCCC', '99CC00', 'FFCC00', 'FF9900', 'FF6600', '666699', '969696', '003366', # 0x38 '339966', '003300', '333300', '993300', '993366', '333399', '333333', 'FFFFFF' # 0x40 ); use constant verExcel95 => 0x500; use constant verExcel97 =>0x600; use constant verBIFF2 =>0x00; use constant verBIFF3 =>0x02; use constant verBIFF4 =>0x04; use constant verBIFF5 =>0x08; use constant verBIFF8 =>0x18; #Added (Not in BOOK) my %ProcTbl =( #Develpers' Kit P291 0x14 => \&_subHeader, # Header 0x15 => \&_subFooter, # Footer 0x18 => \&_subName, # NAME(?) 0x1A => \&_subVPageBreak, # Veritical Page Break 0x1B => \&_subHPageBreak, # Horizontal Page Break 0x22 => \&_subFlg1904, # 1904 Flag 0x26 => \&_subMergin, # Left Mergin 0x27 => \&_subMergin, # Right Mergin 0x28 => \&_subMergin, # Top Mergin 0x29 => \&_subMergin, # Bottom Mergin 0x2A => \&_subPrintHeaders, # Print Headers 0x2B => \&_subPrintGridlines, # Print Gridlines 0x3C => \&_subContinue, # Continue 0x43 => \&_subXF, # ExTended Format(?) #Develpers' Kit P292 0x55 => \&_subDefColWidth, # Consider 0x5C => \&_subWriteAccess, # WRITEACCESS 0x7D => \&_subColInfo, # Colinfo 0x7E => \&_subRK, # RK 0x81 => \&_subWSBOOL, # WSBOOL 0x83 => \&_subHcenter, # HCENTER 0x84 => \&_subVcenter, # VCENTER 0x85 => \&_subBoundSheet, # BoundSheet 0x92 => \&_subPalette, # Palette, fgp 0x99 => \&_subStandardWidth, # Standard Col #Develpers' Kit P293 0xA1 => \&_subSETUP, # SETUP 0xBD => \&_subMulRK, # MULRK 0xBE => \&_subMulBlank, # MULBLANK 0xD6 => \&_subRString, # RString #Develpers' Kit P294 0xE0 => \&_subXF, # ExTended Format 0xE5 => \&_subMergeArea, # MergeArea (Not Documented) 0xFC => \&_subSST, # Shared String Table 0xFD => \&_subLabelSST, # Label SST #Develpers' Kit P295 0x201 => \&_subBlank, # Blank 0x202 => \&_subInteger, # Integer(Not Documented) 0x203 => \&_subNumber, # Number 0x204 => \&_subLabel , # Label 0x205 => \&_subBoolErr, # BoolErr 0x207 => \&_subString, # STRING 0x208 => \&_subRow, # RowData 0x221 => \&_subArray, #Array (Consider) 0x225 => \&_subDefaultRowHeight, # Consider 0x31 => \&_subFont, # Font 0x231 => \&_subFont, # Font 0x27E => \&_subRK, # RK 0x41E => \&_subFormat, # Format 0x06 => \&_subFormula, # Formula 0x406 => \&_subFormula, # Formula 0x09 => \&_subBOF, # BOF(BIFF2) 0x209 => \&_subBOF, # BOF(BIFF3) 0x409 => \&_subBOF, # BOF(BIFF4) 0x809 => \&_subBOF, # BOF(BIFF5-8) ); my $BIGENDIAN; my $PREFUNC; my $_CellHandler; my $_NotSetCell; my $_Object; my $_use_perlio; #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel->new #------------------------------------------------------------------------------ sub new { my ($class, %hParam) =@_; if (not defined $_use_perlio) { if (exists $Config{useperlio} && $Config{useperlio} eq "define") { $_use_perlio = 1; } else { $_use_perlio = 0; require IO::Scalar; import IO::Scalar; } } # Check ENDIAN(Little: Interl etc. BIG: Sparc etc) $BIGENDIAN = (defined $hParam{Endian})? $hParam{Endian} : (unpack("H08", pack("L", 2)) eq '02000000')? 0: 1; my $self = {}; bless $self, $class; $self->{GetContent} = \&_subGetContent; if ($hParam{EventHandlers}) { $self->SetEventHandlers($hParam{EventHandlers}); } else { $self->SetEventHandlers(\%ProcTbl); } if($hParam{AddHandlers}) { foreach my $sKey (keys(%{$hParam{AddHandlers}})) { $self->SetEventHandler($sKey, $hParam{AddHandlers}->{$sKey}); } } $_CellHandler = $hParam{CellHandler} if($hParam{CellHandler}); $_NotSetCell = $hParam{NotSetCell}; $_Object = $hParam{Object}; return $self; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel->SetEventHandler #------------------------------------------------------------------------------ sub SetEventHandler { my($self, $key, $sub_ref) = @_; $self->{FuncTbl}->{$key} = $sub_ref; } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel->SetEventHandlers #------------------------------------------------------------------------------ sub SetEventHandlers { my($self, $rhTbl) = @_; $self->{FuncTbl} = undef; foreach my $sKey (keys %$rhTbl) { $self->{FuncTbl}->{$sKey} = $rhTbl->{$sKey}; } } #------------------------------------------------------------------------------ # Spreadsheet::ParseExcel->Parse #------------------------------------------------------------------------------ sub Parse { my($self, $source, $oWkFmt)=@_; my $oBook = Spreadsheet::ParseExcel::Workbook->new; $oBook->{SheetCount} = 0; my ($sBIFF, $iLen) = $self->_get_content($source, $oBook); return undef if not $sBIFF; if ($oWkFmt) { $oBook->{FmtClass} = $oWkFmt; } else { $oBook->{FmtClass} = Spreadsheet::ParseExcel::FmtDefault->new; } #3. Parse content my $lPos = 0; my $sWk = substr($sBIFF, $lPos, 4); $lPos += 4; my $iEfFlg = 0; while($lPos<=$iLen) { my($bOp, $bLen) = unpack("v2", $sWk); if($bLen) { $sWk = substr($sBIFF, $lPos, $bLen); $lPos += $bLen; } #printf STDERR "%4X:%s\n", $bOp, 'UNDEFIND---:' . unpack("H*", $sWk) unless($NameTbl{$bOp}); #Check EF, EOF if($bOp == 0xEF) { #EF $iEfFlg = $bOp; } elsif($bOp == 0x0A) { #EOF undef $iEfFlg; } #1. Formula String with No String if (not $iEfFlg) { if($oBook->{_PrevPos} && (defined $self->{FuncTbl}->{$bOp}) && ($bOp != 0x207)) { my $iPos = $oBook->{_PrevPos}; $oBook->{_PrevPos} = undef; my ($iR, $iC, $iF) = @$iPos; _NewCell ( $oBook, $iR, $iC, Kind => 'Formula String', Val => '', FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => undef, Book => $oBook, ); } if(defined $self->{FuncTbl}->{$bOp}) { $self->{FuncTbl}->{$bOp}->($oBook, $bOp, $bLen, $sWk); } $PREFUNC = $bOp if ($bOp != 0x3C); #Not Continue } if (($lPos+4) <= $iLen) { $sWk = substr($sBIFF, $lPos, 4); } $lPos += 4; return $oBook if defined $oBook->{_ParseAbort}; } return $oBook; } # $source is either filename or open filehandle or array of string or scalar # referernce # $oBook is passed to be updated sub _get_content { my ($self, $source, $oBook) = @_; if(ref($source) eq "SCALAR") { #1.1 Specified by Buffer my ($sBIFF, $iLen) = $self->{GetContent}->($source); return $sBIFF ? ($sBIFF, $iLen) : (undef); } #1.2 Specified by Other Things(HASH reference etc) # elsif(ref($source)) { # return undef; # } #1.2 Specified by GLOB reference elsif((ref($source) =~ /GLOB/) or (ref($source) eq 'Fh')) { #For CGI.pm (Light FileHandle) binmode($source); my $sWk; my $sBuff=''; while(read($source, $sWk, 4096)) { $sBuff .= $sWk; } my ($sBIFF, $iLen) = $self->{GetContent}->(\$sBuff); return $sBIFF ? ($sBIFF, $iLen) : (undef); } elsif(ref($source) eq 'ARRAY') { #1.3 Specified by File content $oBook->{File} = undef; my $sData = join('', @$source); my ($sBIFF, $iLen) = $self->{GetContent}->(\$sData); return $sBIFF ? ($sBIFF, $iLen) : (undef); } else { #1.4 Specified by File name $oBook->{File} = $source; return undef unless (-e $source); my ($sBIFF, $iLen) = $self->{GetContent}->($source); return $sBIFF ? ($sBIFF, $iLen) : (undef); } } #------------------------------------------------------------------------------ # _subGetContent (for Spreadsheet::ParseExcel) #------------------------------------------------------------------------------ sub _subGetContent { my ($sFile) = @_; my $oOl = OLE::Storage_Lite->new($sFile); return (undef, undef) unless($oOl); my @aRes = $oOl->getPpsSearch( [OLE::Storage_Lite::Asc2Ucs('Book'), OLE::Storage_Lite::Asc2Ucs('Workbook')], 1, 1); return (undef, undef) if($#aRes < 0); #Hack from Herbert if ($aRes[0]->{Data}) { return ($aRes[0]->{Data}, length($aRes[0]->{Data})); } #Same as OLE::Storage_Lite my $oIo; #1. $sFile is Ref of scalar if(ref($sFile) eq 'SCALAR') { if ($_use_perlio) { open $oIo, "<", \$sFile; } else { $oIo = IO::Scalar->new; $oIo->open($sFile); } } #2. $sFile is a IO::Handle object elsif(UNIVERSAL::isa($sFile, 'IO::Handle')) { $oIo = $sFile; binmode($oIo); } #3. $sFile is a simple filename string elsif(!ref($sFile)) { $oIo = IO::File->new; $oIo->open("<$sFile") || return undef; binmode($oIo); } my $sWk; my $sBuff =''; while($oIo->read($sWk, 4096)) { #4_096 has no special meanings $sBuff .= $sWk; } $oIo->close(); #Not Excel file (simple method) return (undef, undef) if (substr($sBuff, 0, 1) ne "\x09"); return ($sBuff, length($sBuff)); } #------------------------------------------------------------------------------ # _subBOF (for Spreadsheet::ParseExcel) Developers' Kit : P303 #------------------------------------------------------------------------------ sub _subBOF { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iVer, $iDt) = unpack("v2", $sWk); #Workbook Global if($iDt==0x0005) { $oBook->{Version} = unpack("v", $sWk); $oBook->{BIFFVersion} = ($oBook->{Version}==verExcel95)? verBIFF5:verBIFF8; $oBook->{_CurSheet} = undef; $oBook->{_CurSheet_} = -1; } #Worksheeet or Dialogsheet elsif($iDt != 0x0020) { #if($iDt == 0x0010) if(defined $oBook->{_CurSheet_}) { $oBook->{_CurSheet} = $oBook->{_CurSheet_} + 1; $oBook->{_CurSheet_}++; ($oBook->{Worksheet}[$oBook->{_CurSheet}]->{SheetVersion}, $oBook->{Worksheet}[$oBook->{_CurSheet}]->{SheetType},) = unpack("v2", $sWk) if(length($sWk) > 4); } else { $oBook->{BIFFVersion} = int($bOp / 0x100); if (($oBook->{BIFFVersion} == verBIFF2) || ($oBook->{BIFFVersion} == verBIFF3) || ($oBook->{BIFFVersion} == verBIFF4)) { $oBook->{Version} = $oBook->{BIFFVersion}; $oBook->{_CurSheet} = 0; $oBook->{Worksheet}[$oBook->{SheetCount}] = Spreadsheet::ParseExcel::Worksheet->new( _Name => '', Name => '', _Book => $oBook, _SheetNo => $oBook->{SheetCount}, ); $oBook->{SheetCount}++; } } } else { ($oBook->{_CurSheet_}, $oBook->{_CurSheet}) = (((defined $oBook->{_CurSheet})? $oBook->{_CurSheet}: -1), undef); } } #------------------------------------------------------------------------------ # _subBlank (for Spreadsheet::ParseExcel) DK:P303 #------------------------------------------------------------------------------ sub _subBlank { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iR, $iC, $iF) = unpack("v3", $sWk); _NewCell( $oBook, $iR, $iC, Kind => 'BLANK', Val => '', FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => undef, Book => $oBook, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subInteger (for Spreadsheet::ParseExcel) Not in DK #------------------------------------------------------------------------------ sub _subInteger { my($oBook, $bOp, $bLen, $sWk) = @_; my($iR, $iC, $iF, $sTxt, $sDum); ($iR, $iC, $iF, $sDum, $sTxt) = unpack("v3cv", $sWk); _NewCell ( $oBook, $iR, $iC, Kind => 'INTEGER', Val => $sTxt, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => undef, Book => $oBook, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subNumber (for Spreadsheet::ParseExcel) : DK: P354 #------------------------------------------------------------------------------ sub _subNumber { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iR, $iC, $iF) = unpack("v3", $sWk); my $dVal = _convDval(substr($sWk, 6, 8)); _NewCell ( $oBook, $iR, $iC, Kind => 'Number', Val => $dVal, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 1, Code => undef, Book => $oBook, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _convDval (for Spreadsheet::ParseExcel) #------------------------------------------------------------------------------ sub _convDval { my($sWk)=@_; return unpack("d", ($BIGENDIAN)? pack("c8", reverse(unpack("c8", $sWk))) : $sWk); } #------------------------------------------------------------------------------ # _subRString (for Spreadsheet::ParseExcel) DK:P405 #------------------------------------------------------------------------------ sub _subRString { my($oBook, $bOp, $bLen, $sWk) = @_; my($iR, $iC, $iF, $iL, $sTxt); ($iR, $iC, $iF, $iL) = unpack("v4", $sWk); $sTxt = substr($sWk, 8, $iL); #Has STRUN if(length($sWk) > (8+$iL)) { _NewCell ( $oBook, $iR, $iC, Kind => 'RString', Val => $sTxt, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => '_native_', #undef, Book => $oBook, Rich => substr($sWk, (8+$iL)+1), ); } else { _NewCell ( $oBook, $iR, $iC, Kind => 'RString', Val => $sTxt, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => '_native_', Book => $oBook, ); } #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subBoolErr (for Spreadsheet::ParseExcel) DK:P306 #------------------------------------------------------------------------------ sub _subBoolErr { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iR, $iC, $iF) = unpack("v3", $sWk); my ($iVal, $iFlg) = unpack("cc", substr($sWk, 6, 2)); my $sTxt = DecodeBoolErr($iVal, $iFlg); _NewCell ( $oBook, $iR, $iC, Kind => 'BoolError', Val => $sTxt, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => undef, Book => $oBook, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subRK (for Spreadsheet::ParseExcel) DK:P401 #------------------------------------------------------------------------------ sub _subRK { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iR, $iC) = unpack("v3", $sWk); my($iF, $sTxt)= _UnpackRKRec(substr($sWk, 4, 6)); _NewCell ( $oBook, $iR, $iC, Kind => 'RK', Val => $sTxt, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 1, Code => undef, Book => $oBook, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subArray (for Spreadsheet::ParseExcel) DK:P297 #------------------------------------------------------------------------------ sub _subArray { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iBR, $iER, $iBC, $iEC) = unpack("v2c2", $sWk); } #------------------------------------------------------------------------------ # _subFormula (for Spreadsheet::ParseExcel) DK:P336 #------------------------------------------------------------------------------ sub _subFormula { my($oBook, $bOp, $bLen, $sWk) = @_; my($iR, $iC, $iF) = unpack("v3", $sWk); my ($iFlg) = unpack("v", substr($sWk,12,2)); if($iFlg == 0xFFFF) { my($iKind) = unpack("c", substr($sWk, 6, 1)); my($iVal) = unpack("c", substr($sWk, 8, 1)); if(($iKind==1) or ($iKind==2)) { my $sTxt = ($iKind == 1)? DecodeBoolErr($iVal, 0):DecodeBoolErr($iVal, 1); _NewCell ( $oBook, $iR, $iC, Kind => 'Formulra Bool', Val => $sTxt, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => undef, Book => $oBook, ); } else { # Result (Reserve Only) $oBook->{_PrevPos} = [$iR, $iC, $iF]; } } else { my $dVal = _convDval(substr($sWk, 6, 8)); _NewCell ( $oBook, $iR, $iC, Kind => 'Formula Number', Val => $dVal, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 1, Code => undef, Book => $oBook, ); } #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subString (for Spreadsheet::ParseExcel) DK:P414 #------------------------------------------------------------------------------ sub _subString { my($oBook, $bOp, $bLen, $sWk) = @_; #Position (not enough for ARRAY) my $iPos = $oBook->{_PrevPos}; return undef unless($iPos); $oBook->{_PrevPos} = undef; my ($iR, $iC, $iF) = @$iPos; my ($iLen, $sTxt, $sCode); if($oBook->{BIFFVersion} == verBIFF8) { my( $raBuff, $iLen) = _convBIFF8String($oBook, $sWk, 1); $sTxt = $raBuff->[0]; $sCode = ($raBuff->[1])? 'ucs2': undef; } elsif($oBook->{BIFFVersion} == verBIFF5) { $sCode = '_native_'; $iLen = unpack("v", $sWk); $sTxt = substr($sWk, 2, $iLen); } else { $sCode = '_native_'; $iLen = unpack("c", $sWk); $sTxt = substr($sWk, 1, $iLen); } _NewCell ( $oBook, $iR, $iC, Kind => 'String', Val => $sTxt, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => $sCode, Book => $oBook, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subLabel (for Spreadsheet::ParseExcel) DK:P344 #------------------------------------------------------------------------------ sub _subLabel { my($oBook, $bOp, $bLen, $sWk) = @_; my($iR, $iC, $iF) = unpack("v3", $sWk); my ($sLbl, $sCode); #BIFF8 if($oBook->{BIFFVersion} >= verBIFF8) { my ( $raBuff, $iLen, $iStPos, $iLenS) = _convBIFF8String($oBook, substr($sWk,6), 1); $sLbl = $raBuff->[0]; $sCode = ($raBuff->[1])? 'ucs2': undef; } #Before BIFF8 else { $sLbl = substr($sWk,8); $sCode = '_native_'; } _NewCell ( $oBook, $iR, $iC, Kind => 'Label', Val => $sLbl, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => $sCode, Book => $oBook, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subMulRK (for Spreadsheet::ParseExcel) DK:P349 #------------------------------------------------------------------------------ sub _subMulRK { my($oBook, $bOp, $bLen, $sWk) = @_; return if ($oBook->{SheetCount}<=0); my ($iR, $iSc) = unpack("v2", $sWk); my $iEc = unpack("v", substr($sWk, length($sWk) -2, 2)); my $iPos = 4; for(my $iC=$iSc; $iC<=$iEc; $iC++) { my($iF, $lVal) = _UnpackRKRec(substr($sWk, $iPos, 6), $iR, $iC); _NewCell ( $oBook, $iR, $iC, Kind => 'MulRK', Val => $lVal, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 1, Code => undef, Book => $oBook, ); $iPos += 6; } #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iSc, $iEc); } #------------------------------------------------------------------------------ # _subMulBlank (for Spreadsheet::ParseExcel) DK:P349 #------------------------------------------------------------------------------ sub _subMulBlank { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iR, $iSc) = unpack("v2", $sWk); my $iEc = unpack("v", substr($sWk, length($sWk)-2, 2)); my $iPos = 4; for(my $iC=$iSc; $iC<=$iEc; $iC++) { my $iF = unpack('v', substr($sWk, $iPos, 2)); _NewCell ( $oBook, $iR, $iC, Kind => 'MulBlank', Val => '', FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => undef, Book => $oBook, ); $iPos+=2; } #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iSc, $iEc); } #------------------------------------------------------------------------------ # _subLabelSST (for Spreadsheet::ParseExcel) DK: P345 #------------------------------------------------------------------------------ sub _subLabelSST { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iR, $iC, $iF, $iIdx) = unpack('v3V', $sWk); _NewCell ( $oBook, $iR, $iC, Kind => 'PackedIdx', Val => $oBook->{PkgStr}[$iIdx]->{Text}, FormatNo=> $iF, Format => $oBook->{Format}[$iF], Numeric => 0, Code => ($oBook->{PkgStr}[$iIdx]->{Unicode})? 'ucs2': undef, Book => $oBook, Rich => $oBook->{PkgStr}[$iIdx]->{Rich}, ); #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iC, $iC); } #------------------------------------------------------------------------------ # _subFlg1904 (for Spreadsheet::ParseExcel) DK:P296 #------------------------------------------------------------------------------ sub _subFlg1904 { my($oBook, $bOp, $bLen, $sWk) = @_; $oBook->{Flg1904} = unpack("v", $sWk); } #------------------------------------------------------------------------------ # _subRow (for Spreadsheet::ParseExcel) DK:P403 #------------------------------------------------------------------------------ sub _subRow { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); #0. Get Worksheet info (MaxRow, MaxCol, MinRow, MinCol) my($iR, $iSc, $iEc, $iHght, $undef1, $undef2, $iGr, $iXf) = unpack("v8", $sWk); $iEc--; #1. RowHeight if($iGr & 0x20) { #Height = 0 $oBook->{Worksheet}[$oBook->{_CurSheet}]->{RowHeight}[$iR] = 0; } else { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{RowHeight}[$iR] = $iHght/20.0; } #2.MaxRow, MaxCol, MinRow, MinCol _SetDimension($oBook, $iR, $iSc, $iEc); } #------------------------------------------------------------------------------ # _SetDimension (for Spreadsheet::ParseExcel) #------------------------------------------------------------------------------ sub _SetDimension { my($oBook, $iR, $iSc, $iEc)=@_; return undef unless(defined $oBook->{_CurSheet}); #2.MaxRow, MaxCol, MinRow, MinCol #2.1 MinRow $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MinRow} = $iR unless (defined $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MinRow}) and ($oBook->{Worksheet}[$oBook->{_CurSheet}]->{MinRow} <= $iR); #2.2 MaxRow $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MaxRow} = $iR unless (defined $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MaxRow}) and ($oBook->{Worksheet}[$oBook->{_CurSheet}]->{MaxRow} > $iR); #2.3 MinCol $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MinCol} = $iSc unless (defined $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MinCol}) and ($oBook->{Worksheet}[$oBook->{_CurSheet}]->{MinCol} <= $iSc); #2.4 MaxCol $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MaxCol} = $iEc unless (defined $oBook->{Worksheet}[$oBook->{_CurSheet}]->{MaxCol}) and ($oBook->{Worksheet}[$oBook->{_CurSheet}]->{MaxCol} > $iEc); } #------------------------------------------------------------------------------ # _subDefaultRowHeight (for Spreadsheet::ParseExcel) DK: P318 #------------------------------------------------------------------------------ sub _subDefaultRowHeight { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); #1. RowHeight my($iDum, $iHght) = unpack("v2", $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{DefRowHeight} = $iHght/20; } #------------------------------------------------------------------------------ # _subStandardWidth(for Spreadsheet::ParseExcel) DK:P413 #------------------------------------------------------------------------------ sub _subStandardWidth { my($oBook, $bOp, $bLen, $sWk) = @_; my $iW = unpack("v", $sWk); $oBook->{StandardWidth}= _adjustColWidth($oBook, $iW); } #------------------------------------------------------------------------------ # _subDefColWidth(for Spreadsheet::ParseExcel) DK:P319 #------------------------------------------------------------------------------ sub _subDefColWidth { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $iW = unpack("v", $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{DefColWidth}= _adjustColWidth($oBook, $iW); } #------------------------------------------------------------------------------ # _adjustColWidth (for Spreadsheet::ParseExcel) #------------------------------------------------------------------------------ sub _adjustColWidth { my($oBook, $iW)=@_; return (($iW -0xA0)/256); # ($oBook->{Worksheet}[$oBook->{_CurSheet}]->{SheetVersion} == verExcel97)? # (($iW -0xA0)/256) : $iW; } #------------------------------------------------------------------------------ # _subColInfo (for Spreadsheet::ParseExcel) DK:P309 #------------------------------------------------------------------------------ sub _subColInfo { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my($iSc, $iEc, $iW, $iXF, $iGr) = unpack("v5", $sWk); for(my $i= $iSc; $i<=$iEc; $i++) { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{ColWidth}[$i] = ($iGr & 0x01)? 0: _adjustColWidth($oBook, $iW); #0x01 means HIDDEN $oBook->{Worksheet}[$oBook->{_CurSheet}]->{ColFmtNo}[$i] = $iXF; # $oBook->{Worksheet}[$oBook->{_CurSheet}]->{ColCr}[$i] = $iGr; #Not Implemented } } #------------------------------------------------------------------------------ # _subSST (for Spreadsheet::ParseExcel) DK:P413 #------------------------------------------------------------------------------ sub _subSST { my($oBook, $bOp, $bLen, $sWk) = @_; _subStrWk($oBook, substr($sWk, 8)); } #------------------------------------------------------------------------------ # _subContinue (for Spreadsheet::ParseExcel) DK:P311 #------------------------------------------------------------------------------ sub _subContinue { my($oBook, $bOp, $bLen, $sWk) = @_; #if(defined $self->{FuncTbl}->{$bOp}) { # $self->{FuncTbl}->{$PREFUNC}->($oBook, $bOp, $bLen, $sWk); #} _subStrWk($oBook, $sWk, 1) if($PREFUNC == 0xFC); } #------------------------------------------------------------------------------ # _subWriteAccess (for Spreadsheet::ParseExcel) DK:P451 #------------------------------------------------------------------------------ sub _subWriteAccess { my($oBook, $bOp, $bLen, $sWk) = @_; return if (defined $oBook->{_Author}); #BIFF8 if($oBook->{BIFFVersion} >= verBIFF8) { $oBook->{Author} = _convBIFF8String($oBook, $sWk); } #Before BIFF8 else { my($iLen) = unpack("c", $sWk); $oBook->{Author} = $oBook->{FmtClass}->TextFmt(substr($sWk, 1, $iLen), '_native_'); } } #------------------------------------------------------------------------------ # _convBIFF8String (for Spreadsheet::ParseExcel) #------------------------------------------------------------------------------ sub _convBIFF8String { my($oBook, $sWk, $iCnvFlg) = @_; my($iLen, $iFlg) = unpack("vc", $sWk); my($iHigh, $iExt, $iRich) = ($iFlg & 0x01, $iFlg & 0x04, $iFlg & 0x08); my($iStPos, $iExtCnt, $iRichCnt, $sStr); #2. Rich and Ext if($iRich && $iExt) { $iStPos = 9; ($iRichCnt, $iExtCnt) = unpack('vV', substr($sWk, 3, 6)); } elsif($iRich) { #Only Rich $iStPos = 5; $iRichCnt = unpack('v', substr($sWk, 3, 2)); $iExtCnt = 0; } elsif($iExt) { #Only Ext $iStPos = 7; $iRichCnt = 0; $iExtCnt = unpack('V', substr($sWk, 3, 4)); } else { #Nothing Special $iStPos = 3; $iExtCnt = 0; $iRichCnt = 0; } #3.Get String if($iHigh) { #Compressed $iLen *= 2; $sStr = substr($sWk, $iStPos, $iLen); _SwapForUnicode(\$sStr); $sStr = $oBook->{FmtClass}->TextFmt($sStr, 'ucs2') unless($iCnvFlg); } else { #Not Compressed $sStr = substr($sWk, $iStPos, $iLen); $sStr = $oBook->{FmtClass}->TextFmt($sStr, undef) unless($iCnvFlg); } #4. return if(wantarray) { #4.1 Get Rich and Ext if(length($sWk) < $iStPos + $iLen+ $iRichCnt*4+$iExtCnt) { return ([undef, $iHigh, undef, undef], $iStPos + $iLen+ $iRichCnt*4+$iExtCnt, $iStPos, $iLen); } else { return ([$sStr, $iHigh, substr($sWk, $iStPos + $iLen, $iRichCnt*4), substr($sWk, $iStPos + $iLen+ $iRichCnt*4, $iExtCnt)], $iStPos + $iLen+ $iRichCnt*4+$iExtCnt, $iStPos, $iLen); } } else { return $sStr; } } #------------------------------------------------------------------------------ # _subXF (for Spreadsheet::ParseExcel) DK:P453 #------------------------------------------------------------------------------ sub _subXF { my($oBook, $bOp, $bLen, $sWk) = @_; my ($iFnt, $iIdx); my($iLock, $iHidden, $iStyle, $i123, $iAlH, $iWrap, $iAlV, $iJustL, $iRotate, $iInd, $iShrink, $iMerge, $iReadDir, $iBdrD, $iBdrSL, $iBdrSR, $iBdrST, $iBdrSB, $iBdrSD, $iBdrCL, $iBdrCR, $iBdrCT, $iBdrCB, $iBdrCD, $iFillP, $iFillCF, $iFillCB); if($oBook->{BIFFVersion} == verBIFF8) { my ($iGen, $iAlign, $iGen2, $iBdr1, $iBdr2, $iBdr3, $iPtn ); ($iFnt, $iIdx, $iGen, $iAlign, $iGen2, $iBdr1, $iBdr2, $iBdr3, $iPtn ) = unpack("v7Vv", $sWk); $iLock = ($iGen & 0x01)? 1:0; $iHidden = ($iGen & 0x02)? 1:0; $iStyle = ($iGen & 0x04)? 1:0; $i123 = ($iGen & 0x08)? 1:0; $iAlH = ($iAlign & 0x07); $iWrap = ($iAlign & 0x08)? 1:0; $iAlV = ($iAlign & 0x70) / 0x10; $iJustL = ($iAlign & 0x80)? 1:0; $iRotate = (($iAlign & 0xFF00) / 0x100) & 0x00FF; $iRotate = 90 if($iRotate == 255); $iRotate = 90 - $iRotate if($iRotate > 90); $iInd = ($iGen2 & 0x0F); $iShrink = ($iGen2 & 0x10)? 1:0; $iMerge = ($iGen2 & 0x20)? 1:0; $iReadDir = (($iGen2 & 0xC0) / 0x40) & 0x03; $iBdrSL = $iBdr1 & 0x0F; $iBdrSR = (($iBdr1 & 0xF0) / 0x10) & 0x0F; $iBdrST = (($iBdr1 & 0xF00) / 0x100) & 0x0F; $iBdrSB = (($iBdr1 & 0xF000) / 0x1000) & 0x0F; $iBdrCL = (($iBdr2 & 0x7F)) & 0x7F; $iBdrCR = (($iBdr2 & 0x3F80) / 0x80) & 0x7F; $iBdrD = (($iBdr2 & 0xC000) / 0x4000) & 0x3; $iBdrCT = (($iBdr3 & 0x7F)) & 0x7F; $iBdrCB = (($iBdr3 & 0x3F80) / 0x80) & 0x7F; $iBdrCD = (($iBdr3 & 0x1FC000) / 0x4000) & 0x7F; $iBdrSD = (($iBdr3 & 0x1E00000) / 0x200000) & 0xF; $iFillP = (($iBdr3 & 0xFC000000) / 0x4000000) & 0x3F; $iFillCF = ($iPtn & 0x7F); $iFillCB = (($iPtn & 0x3F80) / 0x80) & 0x7F; } else { my ($iGen, $iAlign, $iPtn, $iPtn2, $iBdr1, $iBdr2); ($iFnt, $iIdx, $iGen, $iAlign, $iPtn, $iPtn2, $iBdr1, $iBdr2) = unpack("v8", $sWk); $iLock = ($iGen & 0x01)? 1:0; $iHidden = ($iGen & 0x02)? 1:0; $iStyle = ($iGen & 0x04)? 1:0; $i123 = ($iGen & 0x08)? 1:0; $iAlH = ($iAlign & 0x07); $iWrap = ($iAlign & 0x08)? 1:0; $iAlV = ($iAlign & 0x70) / 0x10; $iJustL = ($iAlign & 0x80)? 1:0; $iRotate = (($iAlign & 0x300) / 0x100) & 0x3; $iFillCF = ($iPtn & 0x7F); $iFillCB = (($iPtn & 0x1F80) / 0x80) & 0x7F; $iFillP = ($iPtn2 & 0x3F); $iBdrSB = (($iPtn2 & 0x1C0) / 0x40) & 0x7; $iBdrCB = (($iPtn2 & 0xFE00) / 0x200) & 0x7F; $iBdrST = ($iBdr1 & 0x07); $iBdrSL = (($iBdr1 & 0x38) / 0x8) & 0x07; $iBdrSR = (($iBdr1 & 0x1C0) / 0x40) & 0x07; $iBdrCT = (($iBdr1 & 0xFE00) / 0x200) & 0x7F; $iBdrCL = ($iBdr2 & 0x7F) & 0x7F; $iBdrCR = (($iBdr2 & 0x3F80) / 0x80) & 0x7F; } push @{$oBook->{Format}} , Spreadsheet::ParseExcel::Format->new ( FontNo => $iFnt, Font => $oBook->{Font}[$iFnt], FmtIdx => $iIdx, Lock => $iLock, Hidden => $iHidden, Style => $iStyle, Key123 => $i123, AlignH => $iAlH, Wrap => $iWrap, AlignV => $iAlV, JustLast => $iJustL, Rotate => $iRotate, Indent => $iInd, Shrink => $iShrink, Merge => $iMerge, ReadDir => $iReadDir, BdrStyle => [$iBdrSL, $iBdrSR, $iBdrST, $iBdrSB], BdrColor => [$iBdrCL, $iBdrCR, $iBdrCT, $iBdrCB], BdrDiag => [$iBdrD, $iBdrSD, $iBdrCD], Fill => [$iFillP, $iFillCF, $iFillCB], ); } #------------------------------------------------------------------------------ # _subFormat (for Spreadsheet::ParseExcel) DK: P336 #------------------------------------------------------------------------------ sub _subFormat { my($oBook, $bOp, $bLen, $sWk) = @_; my $sFmt; if (($oBook->{BIFFVersion} == verBIFF2) || ($oBook->{BIFFVersion} == verBIFF3) || ($oBook->{BIFFVersion} == verBIFF4) || ($oBook->{BIFFVersion} == verBIFF5) ) { $sFmt = substr($sWk, 3, unpack('c', substr($sWk, 2, 1))); $sFmt = $oBook->{FmtClass}->TextFmt($sFmt, '_native_'); } else { $sFmt = _convBIFF8String($oBook, substr($sWk, 2)); } $oBook->{FormatStr}->{unpack('v', substr($sWk, 0, 2))} = $sFmt; } #------------------------------------------------------------------------------ # _subPalette (for Spreadsheet::ParseExcel) DK: P393 #------------------------------------------------------------------------------ sub _subPalette { my($oBook, $bOp, $bLen, $sWk) = @_; for(my $i=0;$i{BIFFVersion} == verBIFF8) { ($iHeight, $iAttr, $iCIdx, $iBold, $iSuper, $iUnderline) = unpack("v5c", $sWk); my($iSize, $iHigh) = unpack('cc', substr($sWk, 14, 2)); if($iHigh) { $sFntName = substr($sWk, 16, $iSize*2); _SwapForUnicode(\$sFntName); $sFntName = $oBook->{FmtClass}->TextFmt($sFntName, 'ucs2'); } else { $sFntName = substr($sWk, 16, $iSize); $sFntName = $oBook->{FmtClass}->TextFmt($sFntName, '_native_'); } $bBold = ($iBold >= 0x2BC)? 1: 0; $bItalic = ($iAttr & 0x02)? 1: 0; $bStrikeout = ($iAttr & 0x08)? 1: 0; $bUnderline = ($iUnderline)? 1: 0; } elsif($oBook->{BIFFVersion} == verBIFF5) { ($iHeight, $iAttr, $iCIdx, $iBold, $iSuper, $iUnderline) = unpack("v5c", $sWk); $sFntName = $oBook->{FmtClass}->TextFmt( substr($sWk, 15, unpack("c", substr($sWk, 14, 1))), '_native_'); $bBold = ($iBold >= 0x2BC)? 1: 0; $bItalic = ($iAttr & 0x02)? 1: 0; $bStrikeout = ($iAttr & 0x08)? 1: 0; $bUnderline = ($iUnderline)? 1: 0; } else { ($iHeight, $iAttr) = unpack("v2", $sWk); $iCIdx = undef; $iSuper = 0; $bBold = ($iAttr & 0x01)? 1: 0; $bItalic = ($iAttr & 0x02)? 1: 0; $bUnderline = ($iAttr & 0x04)? 1: 0; $bStrikeout = ($iAttr & 0x08)? 1: 0; $sFntName = substr($sWk, 5, unpack("c", substr($sWk, 4, 1))); } push @{$oBook->{Font}}, Spreadsheet::ParseExcel::Font->new( Height => $iHeight / 20.0, Attr => $iAttr, Color => $iCIdx, Super => $iSuper, UnderlineStyle => $iUnderline, Name => $sFntName, Bold => $bBold, Italic => $bItalic, Underline => $bUnderline, Strikeout => $bStrikeout, ); #Skip Font[4] push @{$oBook->{Font}}, {} if(scalar(@{$oBook->{Font}}) == 4); } #------------------------------------------------------------------------------ # _subBoundSheet (for Spreadsheet::ParseExcel): DK: P307 #------------------------------------------------------------------------------ sub _subBoundSheet { my($oBook, $bOp, $bLen, $sWk) = @_; my($iPos, $iGr, $iKind) = unpack("Lc2", $sWk); $iKind &= 0x0F; return if(($iKind != 0x00) && ($iKind != 0x01)); if($oBook->{BIFFVersion} >= verBIFF8) { my($iSize, $iUni) = unpack("cc", substr($sWk, 6, 2)); my $sWsName = substr($sWk, 8); if($iUni & 0x01) { _SwapForUnicode(\$sWsName); $sWsName = $oBook->{FmtClass}->TextFmt($sWsName, 'ucs2'); } $oBook->{Worksheet}[$oBook->{SheetCount}] = Spreadsheet::ParseExcel::Worksheet->new( Name => $sWsName, Kind => $iKind, _Pos => $iPos, _Book => $oBook, _SheetNo => $oBook->{SheetCount}, ); } else { $oBook->{Worksheet}[$oBook->{SheetCount}] = Spreadsheet::ParseExcel::Worksheet->new( Name => $oBook->{FmtClass}->TextFmt(substr($sWk, 7), '_native_'), Kind => $iKind, _Pos => $iPos, _Book => $oBook, _SheetNo => $oBook->{SheetCount}, ); } $oBook->{SheetCount}++; } #------------------------------------------------------------------------------ # _subHeader (for Spreadsheet::ParseExcel) DK: P340 #------------------------------------------------------------------------------ sub _subHeader { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $sW; #BIFF8 if($oBook->{BIFFVersion} >= verBIFF8) { $sW = _convBIFF8String($oBook, $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{Header} = ($sW eq "\x00")? undef : $sW; } #Before BIFF8 else { my($iLen) = unpack("c", $sWk); $sW = $oBook->{FmtClass}->TextFmt(substr($sWk, 1, $iLen), '_native_'); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{Header} = ($sW eq "\x00\x00\x00")? undef : $sW; } } #------------------------------------------------------------------------------ # _subFooter (for Spreadsheet::ParseExcel) DK: P335 #------------------------------------------------------------------------------ sub _subFooter { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $sW; #BIFF8 if($oBook->{BIFFVersion} >= verBIFF8) { $sW = _convBIFF8String($oBook, $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{Footer} = ($sW eq "\x00")? undef : $sW; } #Before BIFF8 else { my($iLen) = unpack("c", $sWk); $sW = $oBook->{FmtClass}->TextFmt(substr($sWk, 1, $iLen), '_native_'); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{Footer} = ($sW eq "\x00\x00\x00")? undef : $sW; } } #------------------------------------------------------------------------------ # _subHPageBreak (for Spreadsheet::ParseExcel) DK: P341 #------------------------------------------------------------------------------ sub _subHPageBreak { my($oBook, $bOp, $bLen, $sWk) = @_; my @aBreak; my $iCnt = unpack("v", $sWk); return undef unless(defined $oBook->{_CurSheet}); #BIFF8 if($oBook->{BIFFVersion} >= verBIFF8) { for(my $i=0;$i<$iCnt;$i++) { my($iRow, $iColB, $iColE) = unpack('v3', substr($sWk, 2 + $i*6, 6)); # push @aBreak, [$iRow, $iColB, $iColE]; push @aBreak, $iRow; } } #Before BIFF8 else { for(my $i=0;$i<$iCnt;$i++) { my($iRow) = unpack('v', substr($sWk, 2 + $i*2, 2)); push @aBreak, $iRow; # push @aBreak, [$iRow, 0, 255]; } } @aBreak = sort {$a <=> $b} @aBreak; $oBook->{Worksheet}[$oBook->{_CurSheet}]->{HPageBreak} = \@aBreak; } #------------------------------------------------------------------------------ # _subVPageBreak (for Spreadsheet::ParseExcel) DK: P447 #------------------------------------------------------------------------------ sub _subVPageBreak { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my @aBreak; my $iCnt = unpack("v", $sWk); #BIFF8 if($oBook->{BIFFVersion} >= verBIFF8) { for(my $i=0;$i<$iCnt;$i++) { my($iCol, $iRowB, $iRowE) = unpack('v3', substr($sWk, 2 + $i*6, 6)); push @aBreak, $iCol; # push @aBreak, [$iCol, $iRowB, $iRowE]; } } #Before BIFF8 else { for(my $i=0;$i<$iCnt;$i++) { my($iCol) = unpack('v', substr($sWk, 2 + $i*2, 2)); push @aBreak, $iCol; # push @aBreak, [$iCol, 0, 65535]; } } @aBreak = sort {$a <=> $b} @aBreak; $oBook->{Worksheet}[$oBook->{_CurSheet}]->{VPageBreak} = \@aBreak; } #------------------------------------------------------------------------------ # _subMergin (for Spreadsheet::ParseExcel) DK: P306, 345, 400, 440 #------------------------------------------------------------------------------ sub _subMergin { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $dWk = _convDval(substr($sWk, 0, 8)) * 127 / 50; if($bOp == 0x26) { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{LeftMergin} = $dWk; } elsif($bOp == 0x27) { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{RightMergin} = $dWk; } elsif($bOp == 0x28) { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{TopMergin} = $dWk; } elsif($bOp == 0x29) { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{BottomMergin} = $dWk; } } #------------------------------------------------------------------------------ # _subHcenter (for Spreadsheet::ParseExcel) DK: P340 #------------------------------------------------------------------------------ sub _subHcenter { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $iWk = unpack("v", $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{HCenter} = $iWk; } #------------------------------------------------------------------------------ # _subVcenter (for Spreadsheet::ParseExcel) DK: P447 #------------------------------------------------------------------------------ sub _subVcenter { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $iWk = unpack("v", $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{VCenter} = $iWk; } #------------------------------------------------------------------------------ # _subPrintGridlines (for Spreadsheet::ParseExcel) DK: P397 #------------------------------------------------------------------------------ sub _subPrintGridlines { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $iWk = unpack("v", $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{PrintGrid} = $iWk; } #------------------------------------------------------------------------------ # _subPrintHeaders (for Spreadsheet::ParseExcel) DK: P397 #------------------------------------------------------------------------------ sub _subPrintHeaders { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $iWk = unpack("v", $sWk); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{PrintHeaders} = $iWk; } #------------------------------------------------------------------------------ # _subSETUP (for Spreadsheet::ParseExcel) DK: P409 #------------------------------------------------------------------------------ sub _subSETUP { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $oWkS = $oBook->{Worksheet}[$oBook->{_CurSheet}]; my $iGrBit; ($oWkS->{PaperSize}, $oWkS->{Scale} , $oWkS->{PageStart}, $oWkS->{FitWidth} , $oWkS->{FitHeight}, $iGrBit, $oWkS->{Res}, $oWkS->{VRes},) = unpack('v8', $sWk); $oWkS->{HeaderMergin} = _convDval(substr($sWk, 16, 8)) * 127 / 50; $oWkS->{FooterMergin} = _convDval(substr($sWk, 24, 8)) * 127 / 50; $oWkS->{Copis}= unpack('v2', substr($sWk, 32, 2)); $oWkS->{LeftToRight}= (($iGrBit & 0x01)? 1: 0); $oWkS->{Landscape} = (($iGrBit & 0x02)? 1: 0); $oWkS->{NoPls} = (($iGrBit & 0x04)? 1: 0); $oWkS->{NoColor} = (($iGrBit & 0x08)? 1: 0); $oWkS->{Draft} = (($iGrBit & 0x10)? 1: 0); $oWkS->{Notes} = (($iGrBit & 0x20)? 1: 0); $oWkS->{NoOrient} = (($iGrBit & 0x40)? 1: 0); $oWkS->{UsePage} = (($iGrBit & 0x80)? 1: 0); } #------------------------------------------------------------------------------ # _subName (for Spreadsheet::ParseExcel) DK: P350 #------------------------------------------------------------------------------ sub _subName { my($oBook, $bOp, $bLen, $sWk) = @_; my($iGrBit, $cKey, $cCh, $iCce, $ixAls, $iTab, $cchCust, $cchDsc, $cchHep, $cchStatus) = unpack('vc2v3c4', $sWk); #Builtin Name + Length == 1 if(($iGrBit & 0x20) && ($cCh == 1)) { #BIFF8 if($oBook->{BIFFVersion} >= verBIFF8) { my $iName = unpack('n', substr($sWk, 14 )); my $iSheet = unpack('v', substr($sWk, 8 )) - 1; if($iName == 6) { #PrintArea my($iSheetW, $raArea) = _ParseNameArea(substr($sWk, 16)); $oBook->{PrintArea}[$iSheet] = $raArea; } elsif($iName == 7) { #Title my($iSheetW, $raArea) = _ParseNameArea(substr($sWk, 16)); my @aTtlR = (); my @aTtlC = (); foreach my $raI (@$raArea) { if($raI->[3] == 0xFF) { #Row Title push @aTtlR, [$raI->[0], $raI->[2] ]; } else { #Col Title push @aTtlC, [$raI->[1], $raI->[3] ]; } } $oBook->{PrintTitle}[$iSheet] = {Row => \@aTtlR, Column => \@aTtlC}; } } else { my $iName = unpack('c', substr($sWk, 14 )); if($iName == 6) { #PrintArea my($iSheet, $raArea) = _ParseNameArea95(substr($sWk, 15)); $oBook->{PrintArea}[$iSheet] = $raArea; } elsif($iName == 7) { #Title my($iSheet, $raArea) = _ParseNameArea95(substr($sWk, 15)); my @aTtlR = (); my @aTtlC = (); foreach my $raI (@$raArea) { if($raI->[3] == 0xFF) { #Row Title push @aTtlR, [$raI->[0], $raI->[2] ]; } else { #Col Title push @aTtlC, [$raI->[1], $raI->[3] ]; } } $oBook->{PrintTitle}[$iSheet] = {Row => \@aTtlR, Column => \@aTtlC}; } } } } #------------------------------------------------------------------------------ # ParseNameArea (for Spreadsheet::ParseExcel) DK: 494 (ptgAread3d) #------------------------------------------------------------------------------ sub _ParseNameArea { my ($sObj) =@_; my ($iOp); my @aRes = (); $iOp = unpack('C', $sObj); my $iSheet; if($iOp == 0x3b) { my($iWkS, $iRs, $iRe, $iCs, $iCe) = unpack('v5', substr($sObj, 1)); $iSheet = $iWkS; push @aRes, [$iRs, $iCs, $iRe, $iCe]; } elsif($iOp == 0x29) { my $iLen = unpack('v', substr($sObj, 1, 2)); my $iSt = 0; while($iSt < $iLen) { my($iOpW, $iWkS, $iRs, $iRe, $iCs, $iCe) = unpack('cv5', substr($sObj, $iSt+3, 11)); if($iOpW == 0x3b) { $iSheet = $iWkS; push @aRes, [$iRs, $iCs, $iRe, $iCe]; } if($iSt==0) { $iSt += 11; } else { $iSt += 12; #Skip 1 byte; } } } return ($iSheet, \@aRes); } #------------------------------------------------------------------------------ # ParseNameArea95 (for Spreadsheet::ParseExcel) DK: 494 (ptgAread3d) #------------------------------------------------------------------------------ sub _ParseNameArea95 { my ($sObj) =@_; my ($iOp); my @aRes = (); $iOp = unpack('C', $sObj); my $iSheet; if($iOp == 0x3b) { $iSheet = unpack('v', substr($sObj, 11, 2)); my($iRs, $iRe, $iCs, $iCe) = unpack('v2C2', substr($sObj, 15, 6)); push @aRes, [$iRs, $iCs, $iRe, $iCe]; } elsif($iOp == 0x29) { my $iLen = unpack('v', substr($sObj, 1, 2)); my $iSt = 0; while($iSt < $iLen) { my $iOpW = unpack('c', substr($sObj, $iSt+3, 6)); $iSheet = unpack('v', substr($sObj, $iSt+14, 2)); my($iRs, $iRe, $iCs, $iCe) = unpack('v2C2', substr($sObj, $iSt+18, 6)); push @aRes, [$iRs, $iCs, $iRe, $iCe] if($iOpW == 0x3b); if($iSt==0) { $iSt += 21; } else { $iSt += 22; #Skip 1 byte; } } } return ($iSheet, \@aRes); } #------------------------------------------------------------------------------ # _subBOOL (for Spreadsheet::ParseExcel) DK: P452 #------------------------------------------------------------------------------ sub _subWSBOOL { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); $oBook->{Worksheet}[$oBook->{_CurSheet}]->{PageFit} = ((unpack('v', $sWk) & 0x100)? 1: 0); } #------------------------------------------------------------------------------ # _subMergeArea (for Spreadsheet::ParseExcel) DK: (Not) #------------------------------------------------------------------------------ sub _subMergeArea { my($oBook, $bOp, $bLen, $sWk) = @_; return undef unless(defined $oBook->{_CurSheet}); my $iCnt = unpack("v", $sWk); my $oWkS = $oBook->{Worksheet}[$oBook->{_CurSheet}]; $oWkS->{MergedArea} = [] unless(defined $oWkS->{MergedArea}); for(my $i=0; $i < $iCnt; $i++) { my($iRs, $iRe, $iCs, $iCe) = unpack('v4', substr($sWk, $i*8 + 2, 8)); for(my $iR=$iRs;$iR<=$iRe;$iR++) { for(my $iC=$iCs;$iC<=$iCe;$iC++) { $oWkS->{Cells}[$iR][$iC] ->{Merged} = 1 if(defined $oWkS->{Cells}[$iR][$iC] ); } } push @{$oWkS->{MergedArea}}, [$iRs, $iCs, $iRe, $iCe]; } } #------------------------------------------------------------------------------ # DecodeBoolErr (for Spreadsheet::ParseExcel) DK: P306 #------------------------------------------------------------------------------ sub DecodeBoolErr { my($iVal, $iFlg) = @_; if($iFlg) { # ERROR if($iVal == 0x00) { return "#NULL!"; } elsif($iVal == 0x07) { return "#DIV/0!"; } elsif($iVal == 0x0F) { return "#VALUE!"; } elsif($iVal == 0x17) { return "#REF!"; } elsif($iVal == 0x1D) { return "#NAME?"; } elsif($iVal == 0x24) { return "#NUM!"; } elsif($iVal == 0x2A) { return "#N/A!"; } else { return "#ERR"; } } else { return ($iVal)? "TRUE" : "FALSE"; } } #------------------------------------------------------------------------------ # _UnpackRKRec (for Spreadsheet::ParseExcel) DK:P 401 #------------------------------------------------------------------------------ sub _UnpackRKRec { my($sArg) = @_; my $iF = unpack('v', substr($sArg, 0, 2)); my $lWk = substr($sArg, 2, 4); my $sWk = pack("c4", reverse(unpack("c4", $lWk))); my $iPtn = unpack("c",substr($sWk, 3, 1)) & 0x03; if($iPtn == 0) { return ($iF, unpack("d", ($BIGENDIAN)? $sWk . "\0\0\0\0": "\0\0\0\0". $lWk)); } elsif($iPtn == 1) { # http://rt.cpan.org/Ticket/Display.html?id=18063 my $u31 = unpack("c",substr($sWk, 3, 1)) & 0xFC; $u31 |= 0xFFFFFF00 if ($u31 & 0x80); # raise neg bits for neg 1-byte value substr($sWk, 3, 1) &= pack('c', $u31); my $u01 = unpack("c",substr($lWk, 0, 1)) & 0xFC; $u01 |= 0xFFFFFF00 if ($u01 & 0x80); # raise neg bits for neg 1-byte value substr($lWk, 0, 1) &= pack('c', $u01); return ($iF, unpack("d", ($BIGENDIAN)? $sWk . "\0\0\0\0": "\0\0\0\0". $lWk)/ 100); } elsif($iPtn == 2) { my $sUB = unpack("B32", $sWk); my $sWkLB = pack("B32", (substr($sUB, 0, 1) x 2) . substr($sUB, 0, 30)); my $sWkL = ($BIGENDIAN)? $sWkLB: pack("c4", reverse(unpack("c4", $sWkLB))); return ($iF, unpack("i", $sWkL)); } else { my $sUB = unpack("B32", $sWk); my $sWkLB = pack("B32", (substr($sUB, 0, 1) x 2) . substr($sUB, 0, 30)); my $sWkL = ($BIGENDIAN)? $sWkLB: pack("c4", reverse(unpack("c4", $sWkLB))); return ($iF, unpack("i", $sWkL) / 100); } } #------------------------------------------------------------------------------ # _subStrWk (for Spreadsheet::ParseExcel) DK:P280 .. #------------------------------------------------------------------------------ sub _subStrWk { my($oBook, $sWk, $fCnt) = @_; #1. Continue if(defined($fCnt)) { #1.1 Before No Data No if($oBook->{StrBuff} eq '') { # #print "CONT NO DATA\n"; #print "DATA:", unpack('H30', $oBook->{StrBuff}), " PRE:$oBook->{_PrevCond}\n"; $oBook->{StrBuff} .= $sWk; } #1.1 No PrevCond elsif(!(defined($oBook->{_PrevCond}))) { #print "NO PREVCOND\n"; $oBook->{StrBuff} .= substr($sWk, 1); } else { #print "CONT\n"; my $iCnt1st = ord($sWk); # 1st byte of Continue may be a GR byte my($iStP, $iLenS) = @{$oBook->{_PrevInfo}}; my $iLenB = length($oBook->{StrBuff}); #1.1 Not in String if($iLenB >= ($iStP + $iLenS)) { #print "NOT STR\n"; $oBook->{StrBuff} .= $sWk; # $oBook->{StrBuff} .= substr($sWk, 1); } #1.2 Same code (Unicode or ASCII) elsif(($oBook->{_PrevCond} & 0x01) == ($iCnt1st & 0x01)) { #print "SAME\n"; $oBook->{StrBuff} .= substr($sWk, 1); } else { #1.3 Diff code (Unicode or ASCII) my $iDiff = ($iStP + $iLenS) - $iLenB; if($iCnt1st & 0x01) { #print "DIFF ASC $iStP $iLenS $iLenB DIFF:$iDiff\n"; #print "BEF:", unpack("H6", $oBook->{StrBuff}), "\n"; my ($iDum, $iGr) =unpack('vc', $oBook->{StrBuff}); substr($oBook->{StrBuff}, 2, 1) = pack('c', $iGr | 0x01); #print "AFT:", unpack("H6", $oBook->{StrBuff}), "\n"; for(my $i = ($iLenB-$iStP); $i >=1; $i--) { substr($oBook->{StrBuff}, $iStP+$i, 0) = "\x00"; } } else { #print "DIFF UNI:", $oBook->{_PrevCond}, ":", $iCnt1st, " DIFF:$iDiff\n"; for(my $i = ($iDiff/2); $i>=1;$i--) { substr($sWk, $i+1, 0) = "\x00"; } } $oBook->{StrBuff} .= substr($sWk, 1); } } } else { #2. Saisho $oBook->{StrBuff} .= $sWk; } #print " AFT2:", unpack("H60", $oBook->{StrBuff}), "\n"; $oBook->{_PrevCond} = undef; $oBook->{_PrevInfo} = undef; while(length($oBook->{StrBuff}) >= 4) { my ( $raBuff, $iLen, $iStPos, $iLenS) = _convBIFF8String($oBook, $oBook->{StrBuff}, 1); #No Code Convert if(defined($raBuff->[0])) { push @{$oBook->{PkgStr}}, { Text => $raBuff->[0], Unicode => $raBuff->[1], Rich => $raBuff->[2], Ext => $raBuff->[3], }; $oBook->{StrBuff} = substr($oBook->{StrBuff}, $iLen); } else { $oBook->{_PrevCond} = $raBuff->[1]; $oBook->{_PrevInfo} = [$iStPos, $iLenS]; last; } } } #------------------------------------------------------------------------------ # _SwapForUnicode (for Spreadsheet::ParseExcel) #------------------------------------------------------------------------------ sub _SwapForUnicode { my($sObj) = @_; # for(my $i = 0; $i{_CurSheet}); my $oCell = Spreadsheet::ParseExcel::Cell->new( Val => $rhKey{Val}, FormatNo=> $rhKey{FormatNo}, Format => $rhKey{Format}, Code => $rhKey{Code}, Type => $oBook->{FmtClass}->ChkType( $rhKey{Numeric}, $rhKey{Format}->{FmtIdx}), ); $oCell->{_Kind} = $rhKey{Kind}; $oCell->{_Value} = $oBook->{FmtClass}->ValFmt($oCell, $oBook); if($rhKey{Rich}) { my @aRich = (); my $sRich = $rhKey{Rich}; for(my $iWk=0;$iWk{Font}[$iFnt]]; } $oCell->{Rich} = \@aRich; } if(defined $_CellHandler) { if(defined $_Object){ no strict; ref($_CellHandler) eq "CODE" ? $_CellHandler->($_Object, $oBook, $oBook->{_CurSheet}, $iR, $iC, $oCell) : $_CellHandler->callback($_Object, $oBook, $oBook->{_CurSheet}, $iR, $iC, $oCell); } else{ $_CellHandler->($oBook, $oBook->{_CurSheet}, $iR, $iC, $oCell); } } unless($_NotSetCell) { $oBook->{Worksheet}[$oBook->{_CurSheet}]->{Cells}[$iR][$iC] = $oCell; } return $oCell; } #------------------------------------------------------------------------------ # ColorIdxToRGB (for Spreadsheet::ParseExcel) #------------------------------------------------------------------------------ sub ColorIdxToRGB { my($sPkg, $iIdx) = @_; return ((defined $aColor[$iIdx])? $aColor[$iIdx] : $aColor[0]); } #DESTROY { # my ($self) = @_; # warn "DESTROY $self called\n" #} 1; __END__ =head1 NAME Spreadsheet::ParseExcel - Get information from Excel file =head1 SYNOPSIS I use strict; use Spreadsheet::ParseExcel; my $excel = Spreadsheet::ParseExcel::Workbook->Parse($file); foreach my $sheet (@{$excel->{Worksheet}}) { printf("Sheet: %s\n", $sheet->{Name}); $sheet->{MaxRow} ||= $sheet->{MinRow}; foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) { $sheet->{MaxCol} ||= $sheet->{MinCol}; foreach my $col ($sheet->{MinCol} .. $sheet->{MaxCol}) { my $cell = $sheet->{Cells}[$row][$col]; if ($cell) { printf("( %s , %s ) => %s\n", $row, $col, $cell->{Val}); } } } } I use strict; use Spreadsheet::ParseExcel; my $oExcel = Spreadsheet::ParseExcel->new; #1.1 Normal Excel97 my $oBook = $oExcel->Parse('Excel/Test97.xls'); my($iR, $iC, $oWkS, $oWkC); print "FILE :", $oBook->{File} , "\n"; print "COUNT :", $oBook->{SheetCount} , "\n"; print "AUTHOR:", $oBook->{Author} , "\n"; for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) { $oWkS = $oBook->{Worksheet}[$iSheet]; print "--------- SHEET:", $oWkS->{Name}, "\n"; for(my $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) { for(my $iC = $oWkS->{MinCol} ; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++) { $oWkC = $oWkS->{Cells}[$iR][$iC]; print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC); # Formatted Value print "( $iR , $iC ) =>", $oWkC->{Val}, "\n" if($oWkC); # Original Value } } } =head1 DESCRIPTION Spreadsheet::ParseExcel makes you to get information from Excel95, Excel97, Excel2000 file. =head2 Functions =over 4 =item new I<$oExcel> = Spreadsheet::ParseExcel->new( [ I => \&subCellHandler, I => undef | 1, ]); Constructor. =over 4 =item CellHandler I<(experimental)> specify callback function when a cell is detected. I gets arguments like below: sub subCellHandler (I<$oBook>, I<$iSheet>, I<$iRow>, I<$iCol>, I<$oCell>); B : The atributes of Workbook may not be complete. This function will be called almost order by rows and columns. Take care B, I. =item NotSetCell I<(experimental)> specify set or not cell values to Workbook object. =back =item Parse I<$oWorkbook> = $oParse->Parse(I<$sFileName> [, I<$oFmt>]); return L<"Workbook"> object. if error occurs, returns undef. =over 4 =item I<$sFileName> name of the file to parse From 0.12 (with OLE::Storage_Lite v.0.06), scalar reference of file contents (ex. \$sBuff) or IO::Handle object (inclucdng IO::File etc.) are also available. =item I<$oFmt> L<"Formatter Class"> to format the value of cells. =back =item ColorIdxToRGB I<$sRGB> = $oParse->ColorIdxToRGB(I<$iColorIdx>); I returns RGB string corresponding to specified color index. RGB string has 6 charcters, representing RGB hex value. (ex. red = 'FF0000') =back =head2 Workbook I Workbook class has these methods : =over 4 =item Parse (class method) : same as Spreadsheet::ParseExcel =back =over 4 =item Worksheet I<$oWorksheet> = $oBook->Worksheet(I<$sName>); I returns a Worksheet object with I<$sName> or undef. If there is no worksheet with I<$sName> and I<$sName> contains only digits, it returns a Worksheet object at that position. =back Workbook class has these properties : =over 4 =item File Name of the file =item Author Author of the file =item Flg1904 If this flag is on, date of the file count from 1904. =item Version Version of the file =item SheetCount Numbers of L<"Worksheet"> s in that Workbook =item Worksheet[SheetNo] Array of L<"Worksheet">s class =item PrintArea[SheetNo] Array of PrintArea array refs. Each PrintArea is : [ I, I, I, I] =item PrintTitle[SheetNo] Array of PrintTitle hash refs. Each PrintTitle is : { Row => [I, I], Column => [I, I]} =back =head2 Worksheet I Worksheet class has these methods: =over 4 =item Cell ( ROW, COL ) Return the Cell iobject at row ROW and column COL if it is defined. Otherwise return undef. =item RowRange () Return a two-element list (MIN, MAX) containing the minimum and maximum of defined rows in the worksheet If there is no row defined MAX is smaller than MIN. =item ColRange () Return a two-element list (MIN, MAX) containing the minimum and maximum of defined columns in the worksheet If there is no row defined MAX is smaller than MIN. =back Worksheet class has these properties: =over 4 =item Name Name of that Worksheet =item DefRowHeight Default height of rows =item DefColWidth Default width of columns =item RowHeight[Row] Array of row height =item ColWidth[Col] Array of column width (undef means DefColWidth) =item Cells[Row][Col] Array of L<"Cell">s infomation in the worksheet =item Landscape Print in horizontal(0) or vertical (1). =item Scale Print scale. =item FitWidth Number of pages with fit in width. =item FitHeight Number of pages with fit in height. =item PageFit Print with fit (or not). =item PaperSize Papar size. The value is like below: Letter 1, LetterSmall 2, Tabloid 3 , Ledger 4, Legal 5, Statement 6 , Executive 7, A3 8, A4 9 , A4Small 10, A5 11, B4 12 , B5 13, Folio 14, Quarto 15 , 10x14 16, 11x17 17, Note 18 , Envelope9 19, Envelope10 20, Envelope11 21 , Envelope12 22, Envelope14 23, Csheet 24 , Dsheet 25, Esheet 26, EnvelopeDL 27 , EnvelopeC5 28, EnvelopeC3 29, EnvelopeC4 30 , EnvelopeC6 31, EnvelopeC65 32, EnvelopeB4 33 , EnvelopeB5 34, EnvelopeB6 35, EnvelopeItaly 36 , EnvelopeMonarch 37, EnvelopePersonal 38, FanfoldUS 39 , FanfoldStdGerman 40, FanfoldLegalGerman 41, User 256 =item PageStart Start page number. =item UsePage Use own start page number (or not). =item LeftMergin, RightMergin, TopMergin, BottomMergin, HeaderMergin, FooterMergin Mergins for left, right, top, bottom, header and footer. =item HCenter Print in horizontal center (or not) =item VCenter Print in vertical center (or not) =item Header Content of print header. Please refer Excel Help. =item Footer Content of print footer. Please refer Excel Help. =item PrintGrid Print with Gridlines (or not) =item PrintHeaders Print with headings (or not) =item NoColor Print in black-white (or not). =item Draft Print in draft mode (or not). =item Notes Print with notes (or not). =item LeftToRight Print left to right(0) or top to down(1). =item HPageBreak Array ref of horizontal page breaks. =item VPageBreak Array ref of vertical page breaks. =item MergedArea Array ref of merged areas. Each merged area is : [ I, I, I, I] =back =head2 Cell I Cell class has these properties: =over 4 =item Value I Formatted value of that cell =item Val Original Value of that cell =item Type Kind of that cell ('Text', 'Numeric', 'Date') =item Code Character code of that cell (undef, 'ucs2', '_native_') undef tells that cell seems to be ascii. '_native_' tells that cell seems to be 'sjis' or something like that. =item Format L<"Format"> for that cell. =item Merged That cells is merged (or not). =item Rich Array ref of font informations about each characters. Each entry has : [ I, I] For more information please refer sample/dmpExR.pl =back =head2 Format I Format class has these properties: =over 4 =item Font L<"Font"> object for that Format. =item AlignH Horizontal Alignment. 0: (standard), 1: left, 2: center, 3: right, 4: fill , 5: justify, 7:equal_space B 6 may be I but it seems not to work. =item AlignV Vertical Alignment. 0: top, 1: vcenter, 2: bottom, 3: vjustify, 4: vequal_space =item Indent Number of indent =item Wrap Wrap (or not). =item Shrink Display in shrinking (or not) =item Rotate In Excel97, 2000 : degrees of string rotation. In Excel95 or earlier : 0: No rotation, 1: Top down, 2: 90 degrees anti-clockwise, 3: 90 clockwise =item JustLast JustLast (or not). I =item ReadDir Direction for read. =item BdrStyle Array ref of boder styles : [I, I, I, I] =item BdrColor Array ref of boder color indexes : [I, I, I, I] =item BdrDiag Array ref of diag boder kind, style and color index : [I, I