snow/0000755000176200001440000000000014136256672011254 5ustar liggesusers
snow/NAMESPACE0000644000176200001440000000127214136027532012464 0ustar liggesusers
# Export all names unless they start with a dot
exportPattern("^[^.]")

importFrom("graphics", "abline", "axis", "plot", "rect", "segments")
importFrom("stats", "runif")
importFrom("utils", "flush.console")
importFrom("parallel", "detectCores") ## really just to get `[,cluster`!

S3method(plot, snowTimingData)
S3method(print, snowTimingData)

S3method(closeNode, SOCKnode)
S3method(closeNode, default)
S3method(recvData, MPInode)
S3method(recvData, SOCKnode)
S3method(recvOneData, MPIcluster)
S3method(recvOneData, SOCKcluster)
S3method(sendData, MPInode)
S3method(sendData, SOCKnode)
S3method(stopCluster, MPIcluster)
S3method(stopCluster, default)
S3method(stopCluster, spawnedMPIcluster)
snow/README0000644000176200001440000000663014136027100012119 0ustar liggesusers

		SNOW: Simple Network of Workstations

The snow package provides support for simple parallel computing on a
network of workstations using R.  A master R process calls makeCluster
to start a cluster of worker processes; the master process then uses
functions such as clusterCall and clusterApply to execute R code on
the worker processes and collect and return the results on the master.
This framework supports many forms of "embarrassingly parallel"
computations.

Snow can use one of two communications mechanisms: sockets or MPI.
MPI clusters use the Rmpi package; if MPI is used, the MPI system may
need to be started externally.  SOCK clusters are the easiest approach
for using snow on a single multi-core computer, as they require no
additional software.

CAUTION

Be sure to call stopCluster before exiting R.  Otherwise stray
processes may remain running and need to be shut down manually.

INSTALLATION

MPI clusters require a suitable MPI implementation (e.g. LAM-MPI or
Open MPI) and the Rmpi package.  The rlecuyer package may also be
useful to support parallel random number generation.  These supporting
R packages and the snow package should be installed in the same
library directory.

The snow package and supporting packages need to be available on all
hosts that are to be used for a cluster.  No further configuration
should be needed for a homogeneous network of workstations with a
common architecture, operating system, and common file system layout.

If some hosts have different file system layouts, then SOCK clusters
can use host specifications for the workers that specify where to find
the snow package and the Rscript program to use.  Alternatively, the
file RunSnowWorker should be placed in a directory on the PATH of each
host to be used for worker processes, and each such host should define
the variable R_SNOW_LIB as the directory in which the snow package and
supporting packages have been installed.  Thus if snow has been
installed with

    R CMD INSTALL snow -l $HOME/SNOW/R/lib

then users with a csh shell would place something like

    setenv R_SNOW_LIB $HOME/SNOW/R/lib

in their .cshrc files.  Setting this variable to a nonempty value on
the master as well ensures that the cluster startup mechanism assumes
an inhomogeneous cluster by default.

Rscript should also be on the PATH of the hosts used to run worker
processes.  Alternatively, you can define the environment variable
R_SNOW_RSCRIPT_CMD as the path to Rscript, or you can edit the
RunSnowWorker scripts to use a fully qualified path to the R shell
script.
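For example, users of a Bourne-style shell (sh, bash) could set the
variable by adding a line like

    export R_SNOW_RSCRIPT_CMD=/usr/local/bin/Rscript

to their .profile files; the path shown is only an illustration and
should be adjusted to the location of Rscript on the local host.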
For SOCK clusters the option manual = TRUE forces a manual startup
mode in which the master prints the command to be run manually to
start a worker process.  Together with setting the outfile option this
can be useful for debugging cluster startup.

To date, snow has been used successfully with master and workers
running on combinations of several flavors of Unix-like operating
systems, including Linux, HP-UX, and Mac OS X, using LAM-MPI or
sockets.  The socket version of snow has been run with a master on
Linux or Windows and workers on a combination of Windows, Linux, and
Mac OS X; freeSSHd and PuTTY's plink were used for remote process
startup on Windows.  The MPI version has been run on a single
multi-core Windows machine using DeinoMPI; reports on experiences with
MPICH2 on Windows would be welcome.

REFERENCE

http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html.
snow/man/0000755000176200001440000000000014136030142012005 5ustar liggesusers
snow/man/snow-startstop.Rd0000644000176200001440000001354114136033722015336 0ustar liggesusers
\name{snow-startstop}
\title{Starting and Stopping SNOW Clusters}
\alias{getMPIcluster}
\alias{makeMPIcluster}
%% \alias{makePVMcluster}
%% \alias{makeNWScluster}
\alias{makeSOCKcluster}
\alias{makeCluster}
\alias{stopCluster}
\alias{setDefaultClusterOptions}
\usage{
makeCluster(spec, type = getClusterOption("type"), ...)
stopCluster(cl)

setDefaultClusterOptions(...)

makeSOCKcluster(names, ..., options = defaultClusterOptions)
%% makePVMcluster(count, ..., options = defaultClusterOptions)
makeMPIcluster(count, ..., options = defaultClusterOptions)
%% makeNWScluster(names, ..., options = defaultClusterOptions)
getMPIcluster()
}
\arguments{
  \item{spec}{cluster specification}
  \item{count}{number of nodes to create}
  \item{names}{character vector of node names}
  \item{options}{cluster options object}
  \item{cl}{cluster object}
  \item{...}{cluster option specifications}
  \item{type}{character; specifies cluster type.}
}
\description{
  Functions to start and stop a SNOW cluster and to set default cluster
  options.
}
\details{
  \code{makeCluster} starts a cluster of the specified or default type
  and returns a reference to the cluster.  Supported cluster types are
  \code{"SOCK"},
%% \code{"PVM"},
%% \code{"NWS"},
  and \code{"MPI"}.  For
%% \code{"PVM"} and
  \code{"MPI"} clusters the \code{spec} argument should be an integer
  specifying the number of worker nodes to create.  For \code{"SOCK"}
%% and \code{"NWS"}
  clusters \code{spec} should be a character vector naming the hosts on
  which worker nodes should be started; one node is started for each
  element in the vector.  For \code{"SOCK"}
%% and \code{"NWS"}
  clusters \code{spec} can also be an integer specifying the number of
  worker nodes to create on the local machine.

  For \code{SOCK}
%% and \code{NWS}
  clusters the \code{spec} can also be a list of machine
  specifications, each a list of named option values.  Such a list must
  include a character value named \code{host} specifying the name or
  address of the host to use.  Any other option can be specified as
  well.  For \code{SOCK}
%% and \code{NWS}
  clusters this may be a more convenient alternative than the
  inhomogeneous cluster startup procedure.  The options \code{rscript}
  and \code{snowlib} are often useful; see the examples below.

  \code{stopCluster} should be called to properly shut down the cluster
  before exiting R.  If it is not called it may be necessary to use
  external means to ensure that all worker processes are shut down.
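  A minimal lifecycle, assuming two workers can be started on the
  local machine, is
\preformatted{
cl <- makeCluster(2, type = "SOCK")
## ... use the cluster ...
stopCluster(cl)
}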
  \code{setDefaultClusterOptions} can be used to specify alternate
  values for default cluster options.  There are many options.  The
  most useful ones are \code{type} and \code{homogeneous}.  The default
  value of the \code{type} option is currently set to "MPI" if
  \pkg{Rmpi} is on the search path.  Otherwise it is set to
%% \code{"PVM"} if the \pkg{rpvm} package is available, to
  \code{"MPI"} if \pkg{Rmpi} is available,
%% but \pkg{rpvm} is not,
  and to \code{"SOCK"} otherwise.

  The \code{homogeneous} option should be set to \code{FALSE} to
  specify that the startup procedure for inhomogeneous clusters is to
  be used; this requires some additional configuration.  The default
  setting is \code{TRUE} unless the environment variable
  \code{R_SNOW_LIB} is defined on the master host with a non-empty
  value.

  The option \code{outfile} can be used to specify the file to which
  worker node output is to be directed.  The default is
  \code{/dev/null}; during debugging of an installation it can be
  useful to set this to a proper file.  On some systems setting
  \code{outfile} to \code{""} or to \code{/dev/tty} will result in
  worker output being sent to the terminal running the master process.

  The functions \code{makeSOCKcluster},
%% \code{makePVMcluster},
%% \code{makeNWScluster},
  and \code{makeMPIcluster} can be used to start a cluster of the
  corresponding type.

  In MPI configurations where process spawning is not available and
  something like \code{mpirun} is used to start a master and a set of
  workers the corresponding cluster will have been pre-constructed and
  can be obtained with \code{getMPIcluster}.  It is also possible to
  obtain a reference to the running cluster using \code{makeCluster} or
  \code{makeMPIcluster}.  In this case the \code{count} argument can be
  omitted; if it is supplied, it must equal the number of nodes in the
  cluster.  This interface is still experimental and subject to change.

  For SOCK
%% and NWS
  clusters the option \code{manual = TRUE} forces a manual startup mode
  in which the master prints the command to be run manually to start a
  worker process.  Together with setting the \code{outfile} option this
  can be useful for debugging cluster startup.

  For more details see
  \url{https://stat.uiowa.edu/~luke/R/cluster/cluster.html}.
}
\examples{
  \dontrun{
## Two workers run on the local machine as a SOCK cluster.
cl <- makeCluster(c("localhost","localhost"), type = "SOCK")
clusterApply(cl, 1:2, get("+"), 3)
stopCluster(cl)

## Another approach to running on the local machine as a SOCK cluster.
cl <- makeCluster(2, type = "SOCK")
clusterApply(cl, 1:2, get("+"), 3)
stopCluster(cl)

## A SOCK cluster with two workers on Mac OS X, two on Linux, and two
## on Windows:
macOptions <- list(host = "owasso",
                   rscript = "/Library/Frameworks/R.framework/Resources/bin/Rscript",
                   snowlib = "/Library/Frameworks/R.framework/Resources/library")
lnxOptions <- list(host = "itasca",
                   rscript = "/usr/lib64/R/bin/Rscript",
                   snowlib = "/home/luke/tmp/lib")
winOptions <- list(host="192.168.1.168",
                   rscript="C:/Program Files/R/R-2.7.1/bin/Rscript.exe",
                   snowlib="C:/Rlibs")
cl <- makeCluster(c(rep(list(macOptions), 2), rep(list(lnxOptions), 2),
                    rep(list(winOptions), 2)), type = "SOCK")
clusterApply(cl, 1:6, get("+"), 3)
stopCluster(cl)
  }
}
\keyword{programming}
snow/man/snow-timing.Rd0000644000176200001440000000311111564215220014557 0ustar liggesusers
\name{snow-timing}
\title{Timing SNOW Clusters}
\alias{snow.time}
\alias{plot.snowTimingData}
\alias{print.snowTimingData}
\usage{
snow.time(expr)
\method{print}{snowTimingData}(x, ...)
\method{plot}{snowTimingData}(x, xlab = "Elapsed Time", ylab = "Node",
                              title = "Cluster Usage", ...)
}
\arguments{
  \item{expr}{expression to evaluate}
  \item{x}{timing data object to plot or print}
  \item{xlab}{x axis label}
  \item{ylab}{y axis label}
  \item{title}{plot main title}
  \item{...}{additional arguments}
}
\description{
  Experimental functions to collect and display timing data for cluster
  computations.
}
\details{
  \code{snow.time} collects and returns timing information for cluster
  usage in evaluating \code{expr}.  The return value is an object of
  class \code{snowTimingData}; details of the return value are subject
  to change.  The \code{print} method for \code{snowTimingData} objects
  shows the total elapsed time, the total communication time between
  master and worker nodes, and the compute time on each worker node.
  The \code{plot} method, motivated by the display produced by
  \code{xpvm}, produces a Gantt chart of the computation, with green
  rectangles representing active computation, blue horizontal lines
  representing a worker waiting to return a result, and red lines
  representing master/worker communications.
}
\examples{
  \dontrun{
cl <- makeCluster(2,type="SOCK")
x <- rnorm(1000000)
tm <- snow.time(clusterCall(cl, function(x) for (i in 1:100) sum(x), x))
print(tm)
plot(tm)
stopCluster(cl)
  }
}
\keyword{programming}
snow/man/snow-rand.Rd0000644000176200001440000000405414136023514014214 0ustar liggesusers
\name{snow-rand}
\title{Uniform Random Number Generation in SNOW Clusters}
\alias{clusterSetupRNG}
\alias{clusterSetupSPRNG}
\alias{clusterSetupRNGstream}
\usage{
clusterSetupRNG(cl, type = "RNGstream", ...)
clusterSetupRNGstream(cl, seed = rep(12345, 6), ...)
clusterSetupSPRNG(cl, seed = round(2^32 * runif(1)),
                  prngkind = "default", para = 0, ...)
}
\arguments{
  \item{cl}{Cluster object.}
  \item{type}{\code{type="RNGstream"} (default) initializes L'Ecuyer's
    RNG.  \code{type="SPRNG"} initializes the SPRNG generator.}
  \item{...}{Arguments passed to the underlying function (see details
    below).}
  \item{seed}{Integer value (SPRNG) or a vector of six integer values
    (RNGstream) used as seed for the RNG.}
  \item{prngkind}{Character string naming generator type used with
    SPRNG.}
  \item{para}{Additional parameters for the generator.}
}
\description{
  Initialize independent uniform random number streams to be used in a
  SNOW cluster.  It uses either L'Ecuyer's random number generator
  (package rlecuyer required) or the SPRNG generator (package rsprng
  required).
}
\details{
  \code{clusterSetupRNG} calls (subject to its argument values) one of
  the other functions, passing arguments \code{(cl, ...)}.  If the
  "SPRNG" type is used, then the function \code{clusterSetupSPRNG} is
  called.  If the "RNGstream" type is used, then the function
  \code{clusterSetupRNGstream} is called.

  \code{clusterSetupSPRNG} loads the \code{rsprng} package and
  initializes separate streams on each node.  For further details see
  the documentation of \code{init.sprng}.  The generator on the master
  is not affected.  NOTE: SPRNG is currently not supported.

  \code{clusterSetupRNGstream} loads the \code{rlecuyer} package,
  creates one stream per node, and distributes the stream states to the
  nodes.

  For more details see
  \url{https://stat.uiowa.edu/~luke/R/cluster/cluster.html}.
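  A minimal sketch, assuming a running cluster \code{cl} and an
  installed \pkg{rlecuyer} package, is
\preformatted{
clusterSetupRNG(cl, type = "RNGstream", seed = rep(1, 6))
unlist(clusterEvalQ(cl, runif(1)))  ## one draw from each node's stream
}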
} \examples{ \dontrun{ clusterSetupSPRNG(cl) clusterSetupSPRNG(cl, seed=1234) clusterSetupRNG(cl, seed=rep(1,6)) } } \keyword{programming} snow/man/snow-internal.Rd0000644000176200001440000000637514136035776015131 0ustar liggesusers\name{snow-internal} \title{Internal SNOW Objects} \alias{runMPIworker} \alias{setMPIcluster} \alias{stopCluster.default} \alias{stopCluster.MPIcluster} \alias{stopCluster.spawnedMPIcluster} %% \alias{stopCluster.NWScluster} \alias{defaultClusterOptions} \alias{docall} \alias{workLoop} \alias{sinkWorkerOutput} \alias{newSOCKnode} \alias{makeSOCKmaster} \alias{closeNode.SOCKnode} \alias{sendData.SOCKnode} \alias{recvData.SOCKnode} %% \alias{newPVMnode} %% \alias{makePVMmaster} %% \alias{closeNode.PVMnode} %% \alias{sendData.PVMnode} %% \alias{recvData.PVMnode} \alias{newMPInode} \alias{makeMPImaster} \alias{closeNode.MPInode} \alias{sendData.MPInode} \alias{recvData.MPInode} %% \alias{newNWSnode} %% \alias{makeNWSmaster} %% \alias{closeNode.NWSnode} %% \alias{sendData.NWSnode} %% \alias{recvData.NWSnode} \alias{closeNode} \alias{closeNode.default} \alias{sendData} \alias{recvData} \alias{sendNode} \alias{postNode} \alias{stopNode} \alias{sendCall} \alias{recvResult} %\alias{initSprngNode} \alias{initRNGstreamNode} \alias{splitIndices} \alias{splitList} \alias{splitRows} \alias{splitCols} \alias{addClusterOptions} \alias{initDefaultClusterOptions} \alias{findRecvOneTag} \alias{recvOneData} \alias{recvOneData.MPIcluster} %% \alias{recvOneData.PVMcluster} \alias{recvOneData.SOCKcluster} %% \alias{recvOneData.NWScluster} \alias{recvOneResult} \alias{getClusterOption} \alias{checkCluster} \alias{checkForRemoteErrors} \alias{staticClusterApply} \alias{dynamicClusterApply} \alias{shQuoteIfNeeded} \usage{ \method{stopCluster}{default}(cl) \method{stopCluster}{MPIcluster}(cl) \method{stopCluster}{spawnedMPIcluster}(cl) %% \method{stopCluster}{NWScluster}(cl) defaultClusterOptions addClusterOptions(options, new) initDefaultClusterOptions(libname) docall(fun, args) workLoop(master) sinkWorkerOutput(outfile) newSOCKnode(machine = "localhost", ..., options = defaultClusterOptions, rank) makeSOCKmaster(master = Sys.getenv("MASTER"), port = Sys.getenv("PORT")) \method{closeNode}{SOCKnode}(node) \method{sendData}{SOCKnode}(node, data) \method{recvData}{SOCKnode}(node) %% newPVMnode(where = "", options = defaultClusterOptions, rank) %% makePVMmaster() %% \method{sendData}{PVMnode}(node, data) %% \method{recvData}{PVMnode}(node) setMPIcluster(new) runMPIworker() newMPInode(rank, comm) makeMPImaster(comm) \method{sendData}{MPInode}(node, data) \method{recvData}{MPInode}(node) closeNode(node) \method{closeNode}{default}(node) sendData(node, data) recvData(node) postNode(con, type, value = NULL, tag = NULL) stopNode(n) sendCall(con, fun, args, return = TRUE, tag = NULL) recvResult(con) %initSprngNode(streamno, nstream, seed, kind, para) initRNGstreamNode(stream) splitIndices(nx, ncl) splitList(x, ncl) splitRows(x, ncl) splitCols(x, ncl) findRecvOneTag(cl, anytag) recvOneData(cl) recvOneResult(cl) \method{recvOneData}{MPIcluster}(cl) %% \method{recvOneData}{PVMcluster}(cl) \method{recvOneData}{SOCKcluster}(cl) %% \method{recvOneData}{NWScluster}(cl) getClusterOption(name, options = defaultClusterOptions) checkCluster(cl) checkForRemoteErrors(val) staticClusterApply(cl, fun, n, argfun) dynamicClusterApply(cl, fun, n, argfun) shQuoteIfNeeded(p) } \description{ Internal functions for the snow package. } \details{ These are not to be called by the user. 
}
\keyword{internal}
snow/man/snow-parallel.Rd0000644000176200001440000000303014136023527015061 0ustar liggesusers
\name{snow-parallel}
\title{Higher Level SNOW Functions}
\alias{parLapply}
\alias{parRapply}
\alias{parCapply}
\alias{parApply}
\alias{parMM}
\alias{parSapply}
\usage{
parLapply(cl, x, fun, ...)
parSapply(cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl, X, MARGIN, FUN, ...)
parRapply(cl, x, fun, ...)
parCapply(cl, x, fun, ...)
parMM(cl, A, B)
}
\arguments{
  \item{cl}{cluster object}
  \item{fun}{function or character string naming a function}
  \item{X}{array to be used}
  \item{x}{matrix to be used}
  \item{FUN}{function or character string naming a function}
  \item{MARGIN}{vector specifying the dimensions to use.}
  \item{simplify}{logical; see \code{sapply}}
  \item{USE.NAMES}{logical; see \code{sapply}}
  \item{...}{additional arguments to pass to standard function}
  \item{A}{matrix}
  \item{B}{matrix}
}
\description{
  Parallel versions of \code{apply} and related functions.
}
\details{
  \code{parLapply}, \code{parSapply}, and \code{parApply} are parallel
  versions of \code{lapply}, \code{sapply}, and \code{apply}.

  \code{parRapply} and \code{parCapply} are parallel row and column
  \code{apply} functions for a matrix \code{x}; they may be slightly
  more efficient than \code{parApply}.

  \code{parMM} is a very simple(minded) parallel matrix multiply; it is
  intended as an illustration.

  For more details see
  \url{https://stat.uiowa.edu/~luke/R/cluster/cluster.html}.
}
\examples{
  \dontrun{
cl <- makeSOCKcluster(c("localhost","localhost"))
parSapply(cl, 1:20, get("+"), 3)
  }
}
\keyword{programming}
snow/man/snow-cluster.Rd0000644000176200001440000000767514136033615014763 0ustar liggesusers
\name{snow-cluster}
\title{Cluster-Level SNOW Functions}
\alias{clusterSplit}
\alias{clusterCall}
\alias{clusterApply}
\alias{clusterApplyLB}
\alias{clusterEvalQ}
\alias{clusterExport}
\alias{clusterMap}
\usage{
clusterSplit(cl, seq)
clusterCall(cl, fun, ...)
clusterApply(cl, x, fun, ...)
clusterApplyLB(cl, x, fun, ...)
clusterEvalQ(cl, expr)
clusterExport(cl, list, envir = .GlobalEnv)
clusterMap(cl, fun, ..., MoreArgs = NULL, RECYCLE = TRUE)
}
\arguments{
  \item{cl}{cluster object}
  \item{fun}{function or character string naming a function}
  \item{expr}{expression to evaluate}
  \item{seq}{vector to split}
  \item{list}{character vector of variables to export}
  \item{envir}{environment from which to export variables}
  \item{x}{array}
  \item{...}{additional arguments to pass to standard function}
  \item{MoreArgs}{additional arguments for \code{fun}}
  \item{RECYCLE}{logical; if true shorter arguments are recycled}
}
\description{
  Functions for computing on a SNOW cluster.
}
\details{
  These are the basic functions for computing on a cluster.  All
  evaluations on the worker nodes are done using \code{tryCatch}.
  Currently an error is signaled on the master if any one of the nodes
  produces an error.  More sophisticated approaches will be considered
  in the future.

  \code{clusterCall} calls a function \code{fun} with identical
  arguments \code{...} on each node in the cluster \code{cl} and
  returns a list of the results.

  \code{clusterEvalQ} evaluates a literal expression on each cluster
  node.  It is a cluster version of \code{evalq}, and is a convenience
  function defined in terms of \code{clusterCall}.

  \code{clusterApply} calls \code{fun} on the first cluster node with
  arguments \code{seq[[1]]} and \code{...}, on the second node with
  \code{seq[[2]]} and \code{...}, and so on.
  If the length of \code{seq} is greater than the number of nodes in
  the cluster then cluster nodes are recycled.  A list of the results
  is returned; the length of the result list will equal the length of
  \code{seq}.

  \code{clusterApplyLB} is a load balancing version of
  \code{clusterApply}.  If the length \code{p} of \code{seq} is greater
  than the number of cluster nodes \code{n}, then the first \code{n}
  jobs are placed in order on the \code{n} nodes.  When the first job
  completes, the next job is placed on the node that has become
  available; this continues until all jobs are complete.  Using
  \code{clusterApplyLB} can result in better cluster utilization than
  using \code{clusterApply}.  However, increased communication can
  reduce performance.  Furthermore, the node that executes a particular
  job is nondeterministic, which can complicate ensuring
  reproducibility in simulations.

  \code{clusterMap} is a multi-argument version of \code{clusterApply},
  analogous to \code{mapply}.  If \code{RECYCLE} is true shorter
  arguments are recycled; otherwise, the result length is the length of
  the shortest argument.  Cluster nodes are recycled if the length of
  the result is greater than the number of nodes.

  \code{clusterExport} assigns the values of the variables named in
  \code{list} on the master to variables of the same names in the
  global environment of each node.  The environment on the master from
  which variables are exported defaults to the global environment.

  \code{clusterSplit} splits \code{seq} into one consecutive piece for
  each cluster node and returns the result as a list with length equal
  to the number of cluster nodes.  Currently the pieces are chosen to
  be close to equal in length.  Future releases may attempt to use
  relative performance information about nodes to choose splits
  proportional to performance.

  For more details see
  \url{https://stat.uiowa.edu/~luke/R/cluster/cluster.html}.
}
\examples{
  \dontrun{
cl <- makeSOCKcluster(c("localhost","localhost"))

clusterApply(cl, 1:2, get("+"), 3)

clusterEvalQ(cl, library(boot))

x <- 1
clusterExport(cl, "x")
clusterCall(cl, function(y) x + y, 2)
  }
}
\keyword{programming}
snow/DESCRIPTION0000644000176200001440000000065414136256672012756 0ustar liggesusers
Package: snow
Title: Simple Network of Workstations
Version: 0.4-4
Author: Luke Tierney, A. J. Rossini, Na Li, H. Sevcikova
Description: Support for simple parallel computing in R.
Maintainer: Luke Tierney Suggests: rlecuyer Enhances: Rmpi License: GPL Depends: R (>= 2.13.1), utils NeedsCompilation: no Packaged: 2021-10-26 17:36:46 UTC; luke Repository: CRAN Date/Publication: 2021-10-27 14:10:02 UTC snow/R/0000755000176200001440000000000014136036147011446 5ustar liggesuserssnow/R/mpi.R0000644000176200001440000001210514136036147012355 0ustar liggesusers# # MPI Implementation # newMPInode <- function(rank, comm) structure(list(rank = rank, RECVTAG = 33, SENDTAG = 22, comm = comm), class = "MPInode") makeMPImaster <- function(comm = 0) structure(list(rank = 0, RECVTAG = 22, SENDTAG = 33, comm = comm), class = "MPInode") sendData.MPInode <- function(node, data) Rmpi::mpi.send.Robj(data, node$rank, node$SENDTAG, node$comm) recvData.MPInode <- function(node) Rmpi::mpi.recv.Robj(node$rank, node$RECVTAG, node$comm) recvOneData.MPIcluster <- function(cl) { rtag <- findRecvOneTag(cl, Rmpi::mpi.any.tag()) comm <- cl[[1]]$comm # should all be the same status <- 0 Rmpi::mpi.probe(Rmpi::mpi.any.source(), rtag, comm, status) srctag <- Rmpi::mpi.get.sourcetag(status) data <- Rmpi::mpi.recv.Robj(srctag[1], srctag[2], comm) list(node = srctag[1], value = data) } getMPIcluster <- NULL setMPIcluster <- NULL local({ cl <- NULL getMPIcluster <<- function() cl setMPIcluster <<- function(new) cl <<- new }) makeMPIcluster <- function(count, ..., options = defaultClusterOptions) { options <- addClusterOptions(options, list(...)) cl <- getMPIcluster() if (! is.null(cl)) { if (missing(count) || count == length(cl)) cl else stop(sprintf("MPI cluster of size %d already running", length(cl))) } else if (missing(count)) { # assume something like mpirun -np count+1 has been used to start R count <- Rmpi::mpi.comm.size(0) - 1 if (count <= 0) stop("no nodes available.") cl <- vector("list",count) for (i in seq(along=cl)) cl[[i]] <- newMPInode(i, 0) class(cl) <- c("MPIcluster","cluster") setMPIcluster(cl) cl } else { # use process spawning to create cluster if (! 
requireNamespace("Rmpi")) stop("the `Rmpi' package is needed for MPI clusters.") comm <- 1 intercomm <- 2 if (Rmpi::mpi.comm.size(comm) > 0) stop(paste("a cluster already exists", comm)) scriptdir <- getClusterOption("scriptdir", options) outfile <- getClusterOption("outfile", options) homogeneous <- getClusterOption("homogeneous", options) if (getClusterOption("useRscript", options)) { if (homogeneous) { rscript <- shQuoteIfNeeded(getClusterOption("rscript", options)) snowlib <- getClusterOption("snowlib", options) script <- shQuoteIfNeeded(file.path(snowlib, "snow", "RMPInode.R")) args <- c(script, paste("SNOWLIB=", snowlib, sep=""), paste("OUT=", outfile, sep="")) mpitask <- rscript } else { args <- c("RMPInode.R", paste("OUT=", outfile, sep="")) mpitask <- "RunSnowWorker" } } else { if (homogeneous) { script <- shQuoteIfNeeded(file.path(scriptdir, "RMPInode.sh")) rlibs <- paste(getClusterOption("rlibs", options), collapse = ":") rprog <- shQuoteIfNeeded(getClusterOption("rprog", options)) args <- c(paste("RPROG=", rprog, sep=""), paste("OUT=", outfile, sep=""), paste("R_LIBS=", rlibs, sep=""), script) } else { args <- c(paste("OUT=", outfile, sep=""), "RunSnowNode", "RMPInode.sh") } mpitask <- "/usr/bin/env" } count <- Rmpi::mpi.comm.spawn(slave = mpitask, slavearg = args, nslaves = count, intercomm = intercomm) if (Rmpi::mpi.intercomm.merge(intercomm, 0, comm)) { Rmpi::mpi.comm.set.errhandler(comm) Rmpi::mpi.comm.disconnect(intercomm) } else stop("Failed to merge the comm for master and workers.") cl <- vector("list",count) for (i in seq(along=cl)) cl[[i]] <- newMPInode(i, comm) class(cl) <- c("spawnedMPIcluster", "MPIcluster", "cluster") setMPIcluster(cl) cl } } runMPIworker <- function() { comm <- 1 intercomm <- 2 Rmpi::mpi.comm.get.parent(intercomm) Rmpi::mpi.intercomm.merge(intercomm,1,comm) Rmpi::mpi.comm.set.errhandler(comm) Rmpi::mpi.comm.disconnect(intercomm) workLoop(makeMPImaster(comm)) Rmpi::mpi.comm.disconnect(comm) Rmpi::mpi.quit() } stopCluster.MPIcluster <- function(cl) { NextMethod() setMPIcluster(NULL) } stopCluster.spawnedMPIcluster <- function(cl) { comm <- 1 NextMethod() Rmpi::mpi.comm.disconnect(comm) } #**** figure out how to get Rmpi::mpi.quit called (similar issue for pvm?) #**** fix things so stopCluster works in both versions. 
#**** need .Last to make sure cluster is shut down on exit of master #**** figure out why the workers busy wait under mpirun snow/R/sock.R0000644000176200001440000001361213343257663012542 0ustar liggesusers# # Socket Implementation # #**** allow user to be different on different machines #**** allow machines to be selected from a hosts list newSOCKnode <- function(machine = "localhost", ..., options = defaultClusterOptions, rank) { # **** allow some form of spec here # **** make sure options are quoted options <- addClusterOptions(options, list(...)) if (is.list(machine)) { options <- addClusterOptions(options, machine) machine <- machine$host } outfile <- getClusterOption("outfile", options) if (machine == "localhost") master <- "localhost" else master <- getClusterOption("master", options) port <- getClusterOption("port", options) manual <- getClusterOption("manual", options) ## build the local command for starting the worker homogeneous <- getClusterOption("homogeneous", options) if (getClusterOption("useRscript", options)) { if (homogeneous) { rscript <- shQuoteIfNeeded(getClusterOption("rscript", options)) snowlib <- getClusterOption("snowlib", options) script <- shQuoteIfNeeded(file.path(snowlib, "snow", "RSOCKnode.R")) env <- paste("MASTER=", master, " PORT=", port, " OUT=", outfile, " SNOWLIB=", snowlib, sep="") cmd <- paste(rscript, script, env) } else { script <- "RunSnowWorker RSOCKnode.R" env <- paste("MASTER=", master, " PORT=", port, " OUT=", outfile, sep="") cmd <- paste(script, env) } } else { if (homogeneous) { scriptdir <- getClusterOption("scriptdir", options) script <- shQuoteIfNeeded(file.path(scriptdir, "RSOCKnode.sh")) rlibs <- paste(getClusterOption("rlibs", options), collapse = ":") rprog <- shQuoteIfNeeded(getClusterOption("rprog", options)) env <- paste("MASTER=", master, " PORT=", port, " OUT=", outfile, " RPROG=", rprog, " R_LIBS=", rlibs, sep="") } else { script <- "RunSnowNode RSOCKnode.sh" env <- paste("MASTER=", master, " PORT=", port, " OUT=", outfile, sep="") } cmd <- paste("env", env, script) } if (manual) { cat("Manually start worker on", machine, "with\n ", cmd, "\n") flush.console() } else { ## add the remote shell command if needed if (machine != "localhost") { rshcmd <- getClusterOption("rshcmd", options) user <- getClusterOption("user", options) cmd <- paste(rshcmd, "-l", user, machine, cmd) } if (.Platform$OS.type == "windows") { ## On windows using input = something seems needed to ## disconnect standard input of an ssh process when run ## from Rterm (at least using putty's plink). In ## principle this could also be used for supplying a ## password, but that is probably a bad idea. So, for now ## at least, on windows password-less authentication is ## necessary. system(cmd, wait = FALSE, input = "") } else system(cmd, wait = FALSE) } ## need timeout here because of the way internals work timeout <- getClusterOption("timeout") old <- options(timeout = timeout); on.exit(options(old)) con <- socketConnection(port = port, server=TRUE, blocking=TRUE, open="a+b") structure(list(con = con, host = machine, rank = rank), class = "SOCKnode") } makeSOCKmaster <- function(master = Sys.getenv("MASTER"), port = Sys.getenv("PORT")) { port <- as.integer(port) timeout <- getClusterOption("timeout") ## Retry scheme parameters (do these need to be customizable?) 
retryDelay <- 0.1 # 0.1 second initial delay before retrying retryScale <- 1.5 # 50% increase of delay at each retry setup_timeout <- 120 # retry setup for 2 minutes before failing ## Retry multiple times in case the master is not yet ready t0 <- Sys.time() repeat { con <- tryCatch({ socketConnection(master, port = port, blocking = TRUE, open = "a+b", timeout = timeout) }, error = identity, warning = identity) if (inherits(con, "connection")) break if (Sys.time() - t0 > setup_timeout) break Sys.sleep(retryDelay) retryDelay <- retryScale * retryDelay } if (inherits(con, "condition")) stop(con) structure(list(con = con), class = "SOCKnode") } closeNode.SOCKnode <- function(node) close(node$con) sendData.SOCKnode <- function(node, data) { ## timeout <- getClusterOption("timeout") ## old <- options(timeout = timeout); ## on.exit(options(old)) serialize(data, node$con) } recvData.SOCKnode <- function(node) { ## timeout <- getClusterOption("timeout") ## old <- options(timeout = timeout); ## on.exit(options(old)) unserialize(node$con) } recvOneData.SOCKcluster <- function(cl) { socklist <- lapply(cl, function(x) x$con) repeat { ready <- socketSelect(socklist) if (length(ready) > 0) break; } n <- which(ready)[1] # may need rotation or some such for fairness list(node = n, value = unserialize(socklist[[n]])) } makeSOCKcluster <- function(names, ..., options = defaultClusterOptions) { if (is.numeric(names)) names <- rep('localhost', names[1]) options <- addClusterOptions(options, list(...)) cl <- vector("list",length(names)) for (i in seq(along=cl)) cl[[i]] <- newSOCKnode(names[[i]], options = options, rank = i) class(cl) <- c("SOCKcluster", "cluster") cl } snow/R/timing.R0000644000176200001440000001253611611343415013061 0ustar liggesusers## This approach uses a single global data structure to record timing ## information. The structure contains functions defined in a shared ## environment to implement mutable state. Workers are identified by ## their rank. The functions recvData, recvOneData, and sendCall have ## been modified to record timing information in timing is active. ## The overhead of the test for timing should be fairly small compared ## to the data transmissions. If we had an efficient dynamically ## scoped variable mechanism it would probably be better to use that ## instead of the global variable that is reset via on.exit. ## ## For now, calls to the timing function snow.time() cannot be nested. .snowTimingData <- local({ data <- NULL timerRunning <- function(run) { if (! missing(run)) { if (run) data <<- list(index = integer(0), data = NULL) else data <<- NULL } ! 
is.null(data)
    }
    incrementIndex <- function(rank) {
        n <- length(data$index)
        if (rank > n)
            data$index <<- c(data$index, rep(0L, rank - n))
        data$index[rank] <<- data$index[rank] + 1L
        data$index[rank]
    }
    getIndex <- function(rank) { data$index[rank] }
    enterSend <- function(rank, start, end) {
        n <- length(data$data)
        if (rank > n)
            data$data <<- c(data$data, vector(mode = "list", rank - n))
        if (is.null(data$data[[rank]])) {
            d <- matrix(NA_real_, 10, 5)
            colnames(d) <- c("send_start", "send_end",
                             "recv_start", "recv_end", "exec")
            data$data[[rank]] <<- d
        }
        i <- incrementIndex(rank)
        nr <- nrow(data$data[[rank]])
        if (nr < i)
            data$data[[rank]] <<- rbind(data$data[[rank]],
                                        matrix(NA_real_, nr, 5))
        data$data[[rank]][i, 1:2] <<- c(start, end)
    }
    enterRecv <- function(rank, start, end, exec) {
        i <- getIndex(rank)
        data$data[[rank]][i, 3:5] <<- c(start, end, exec)
    }
    extractTimingData <- function()
        lapply(seq_along(data$data), function(i) {
            d <- data$data[[i]]
            n <- data$index[[i]]
            if (! is.null(d)) {
                d <- d[1:n,, drop = FALSE]
                d[, 3] <- pmax(d[, 3], d[, 2] + d[, 5])
                d
            }
        })
    list(running = timerRunning,
         enterSend = enterSend,
         enterRecv = enterRecv,
         extract = extractTimingData)
})

snow.time <- function(expr) {
    if (.snowTimingData$running())
        stop("nested snow.time calls are currently not supported")
    .snowTimingData$running(TRUE)
    on.exit(.snowTimingData$running(FALSE))
    start <- proc.time()[3]
    expr
    end <- proc.time()[3]
    data <- lapply(.snowTimingData$extract(),
                   function(d) {
                       if (! is.null(d)) {
                           d[,1:4] <- d[,1:4] - start
                           d
                       }
                       else NULL
                   })
    structure(list(elapsed = end - start, data = data),
              class = "snowTimingData")
}

plot.snowTimingData <- function(x, xlab = "Elapsed Time", ylab = "Node",
                                title = "Cluster Usage", ...) {
    w2 <- 0.05
    data <- x$data
    n <- length(data)
    r <- c(0, x$elapsed)
    plot(r, c(0 - w2, max(n, 1) + w2), xlab = xlab, ylab = ylab,
         type = "n", yaxt = "n", ...)
    axis(2, yaxp = c(0, n, max(n, 1)))
    title(title)
    ## show the information for the workers
    for (i in 0 : n) abline(i, 0, lty = 2)
    for (i in seq_along(data)) {
        d <- data[[i]]
        nr <- nrow(d)
        segments(d[, 1], rep(0, nr), d[, 2], rep(i, nr), col = "red")
        rect(d[, 2], rep(i - w2, nr), d[, 2] + d[, 5], rep(i + w2, nr),
             col = "green")
        segments(d[, 2] + d[, 5], rep(i, nr), d[, 3], rep(i, nr),
                 col = "blue")
        segments(d[, 3], rep(i, nr), d[, 4], rep(0, nr), col = "red")
    }
    ## compute and draw the intervals where no worker is active
    if (length(data) > 0) {
        d <- do.call(rbind, data)
        times <- c(d[, 1], d[, 4])
        ord <- order(times)
        cs <- cumsum(rep(c(1,-1), each = nrow(d))[ord])
        st <- sort(times)
        left <- c(0, st[cs == 0])
        right <- c(st[c(1, which(cs[-length(cs)] == 0) + 1)], x$elapsed)
    }
    else {
        left <- 0
        right <- x$elapsed
    }
    rect(left, -w2, right, w2, col = "green")
}

print.snowTimingData <- function(x, ...)
{ data <- x$data send <- sum(unlist(lapply(data, function(d) if (is.null(d)) 0 else sum(d[,2] - d[,1])))) recv <- sum(unlist(lapply(data, function(d) if (is.null(d)) 0 else sum(d[,4] - d[,3])))) nodes <- sapply(data, function(d) if (is.null(d)) 0 else sum(d[,5])) n <- length(data) if (n > 0) nodeNames <- paste("node", 1:n) else nodeNames <- character(0) y <- structure(c(x$elapsed, send, recv, nodes), names = c("elapsed", "send", "receive", nodeNames)) print(y) invisible(x) } snow/R/snow.R0000644000176200001440000004544214136035503012563 0ustar liggesusers# # Utilities # docall <- function(fun, args) { if ((is.character(fun) && length(fun) == 1) || is.name(fun)) fun <- get(as.character(fun), envir = .GlobalEnv, mode = "function") do.call("fun", lapply(args, enquote)) } shQuoteIfNeeded <- function(p) { if (length(grep("[[:space:]]", p)) == 0) p else if (.Platform$OS.type == "windows") shQuote(p) else stop("file names with spaces do not work properly on this platform") } # # Checking and subsetting # checkCluster <- function(cl) { if (! inherits(cl, "cluster")) stop("not a valid cluster"); } ## Now defined in 'parallel' ## "[.cluster" <-function(cl,...) { ## v<-unclass(cl)[...] ## class(v)<-class(cl) ## v ## } # # Work Loop Function # workLoop <- function(master) { repeat tryCatch({ msg <- recvData(master) cat(paste("Type:", msg$type, "\n")) if (msg$type == "DONE") { closeNode(master) break; } else if (msg$type == "EXEC") { success <- TRUE ## This uses the message, rather than the exception since ## the exception class/methods may not be available on the ## master. handler <- function(e) { success <<- FALSE structure(conditionMessage(e), class=c("snow-try-error","try-error")) } t1 <- proc.time() value <- tryCatch(docall(msg$data$fun, msg$data$args), error = handler) t2 <- proc.time() value <- list(type = "VALUE", value = value, success = success, time = t2 - t1, tag = msg$data$tag) msg <- NULL ## release for GC sendData(master, value) value <- NULL ## release for GC } }, interrupt = function(e) NULL) } sinkWorkerOutput <- function(outfile) { if (outfile != "") { if (.Platform$OS.type == "windows" && outfile == "/dev/null") outfile <- "nul:" outcon <- file(outfile, open = "w") sink(outcon) sink(outcon, type = "message") } } # # Higher-Level Node Functions # closeNode <- function(node) UseMethod("closeNode") closeNode.default <- function(node) {} sendData <- function(node, data) UseMethod("sendData") recvData <- function(node) UseMethod("recvData") postNode <- function(con, type, value = NULL, tag = NULL) { sendData(con, list(type = type, data = value, tag = tag)) } stopNode <- function(n) { postNode(n, "DONE") closeNode(n) } recvOneData <- function(cl) UseMethod("recvOneData") # # Cluster Creation and Destruction # defaultClusterOptions <- NULL #**** check valid cluster option initDefaultClusterOptions <- function(libname) { rhome <- Sys.getenv("R_HOME") if (Sys.getenv("R_SNOW_LIB") != "") homogeneous <- FALSE else homogeneous <- TRUE if (.Platform$OS.type == "windows") rscript <- file.path(R.home("bin"), "Rscript.exe") else rscript <- file.path(R.home("bin"), "Rscript") port <- Sys.getenv("R_PARALLEL_PORT") port <- if (identical(port, "random")) NA else as.integer(port) if (is.na(port)) port <- 11000 + 1000 * ((stats::runif(1L) + unclass(Sys.time())/300) %% 1) options <- list(port = as.integer(port), timeout = 60 * 60 * 24 * 30, # 30 days master = Sys.info()["nodename"], homogeneous = homogeneous, type = NULL, outfile = "/dev/null", rhome = rhome, user = Sys.info()["user"], rshcmd = "ssh", 
rlibs = Sys.getenv("R_LIBS"), scriptdir = file.path(libname, "snow"), rprog = file.path(rhome, "bin", "R"), snowlib = libname, rscript = rscript, useRscript = file.exists(rscript), manual = FALSE) defaultClusterOptions <<- addClusterOptions(emptyenv(), options) } addClusterOptions <- function(options, new) { if (! is.null(new)) { options <- new.env(parent = options) names <- names(new) for (i in seq(along = new)) assign(names[i], new[[i]], envir = options) } options } getClusterOption <- function(name, options = defaultClusterOptions) get(name, envir = options) setDefaultClusterOptions <- function(...) { list <- list(...) names <- names(list) for (i in seq(along = list)) assign(names[i], list[[i]], envir = defaultClusterOptions) } makeCluster <- function(spec, type = getClusterOption("type"), ...) { if (is.null(type)) stop("need to specify a cluster type") switch(type, SOCK = makeSOCKcluster(spec, ...), # PVM = makePVMcluster(spec, ...), MPI = makeMPIcluster(spec, ...), # NWS = makeNWScluster(spec, ...), stop("unknown cluster type")) } stopCluster <- function(cl) UseMethod("stopCluster") stopCluster.default <- function(cl) for (n in cl) stopNode(n) # # Cluster Functions # sendCall <- function (con, fun, args, return = TRUE, tag = NULL) { #**** mark node as in-call timing <- .snowTimingData$running() if (timing) start <- proc.time()[3] postNode(con, "EXEC", list(fun = fun, args = args, return = return, tag = tag)) if (timing) .snowTimingData$enterSend(con$rank, start, proc.time()[3]) NULL } recvResult <- function (con) { if (.snowTimingData$running()) { start <- proc.time()[3] r <- recvData(con) end <- proc.time()[3] .snowTimingData$enterRecv(con$rank, start, end, r$time[3]) } else r <- recvData(con) r$value } checkForRemoteErrors <- function(val) { count <- 0 firstmsg <- NULL for (v in val) { if (inherits(v, "try-error")) { count <- count + 1 if (count == 1) firstmsg <- v } } if (count == 1) stop("one node produced an error: ", firstmsg) else if (count > 1) stop(count, " nodes produced errors; first error: ", firstmsg) val } clusterCall <- function(cl, fun, ...) { checkCluster(cl) for (i in seq(along = cl)) sendCall(cl[[i]], fun, list(...)) checkForRemoteErrors(lapply(cl, recvResult)) } staticClusterApply <- function(cl, fun, n, argfun) { checkCluster(cl) p <- length(cl) if (n > 0 && p > 0) { val <- vector("list", n) start <- 1 while (start <= n) { end <- min(n, start + p - 1) jobs <- end - start + 1 for (i in 1:jobs) sendCall(cl[[i]], fun, argfun(start + i - 1)) val[start:end] <- lapply(cl[1:jobs], recvResult) start <- start + jobs } checkForRemoteErrors(val) } } clusterApply <- function(cl, x, fun, ...) { argfun <- function(i) c(list(x[[i]]), list(...)) staticClusterApply(cl, fun, length(x), argfun) } clusterEvalQ<-function(cl, expr) clusterCall(cl, eval, substitute(expr), env=.GlobalEnv) clusterExport <- local({ env <- as.environment(1) ## .GlobalEnv gets <- function(n, v) { assign(n, v, envir = env); NULL } function(cl, list, envir = .GlobalEnv) { ## do this with only one clusterCall--loop on workers? 
        for (name in list) {
            clusterCall(cl, gets, name, get(name, envir = envir))
        }
    }
})

## A variant that does the exports one at a time--this may be useful
## when large objects are being sent
# clusterExportSerial <- function(cl, list) {
#     gets <- function(n, v) { assign(n, v, envir = .GlobalEnv); NULL }
#     for (name in list) {
#         v <- get(name, envir = .GlobalEnv)
#         for (i in seq(along = cl)) {
#             sendCall(cl[[i]], gets, list(name, v))
#             recvResult(cl[[i]])
#         }
#     }
# }

recvOneResult <- function (cl) {
    if (.snowTimingData$running()) {
        start <- proc.time()[3]
        v <- recvOneData(cl)
        end <- proc.time()[3]
        .snowTimingData$enterRecv(v$node, start, end, v$value$time[3])
    }
    else v <- recvOneData(cl)
    list(value = v$value$value, node = v$node, tag = v$value$tag)
}

findRecvOneTag <- function(cl, anytag) {
    rtag <- NULL
    for (node in cl) {
        if (is.null(rtag))
            rtag <- node$RECVTAG
        else if (rtag != node$RECVTAG) {
            rtag <- anytag
            break;
        }
    }
    rtag
}

dynamicClusterApply <- function(cl, fun, n, argfun) {
    checkCluster(cl)
    p <- length(cl)
    if (n > 0 && p > 0) {
        submit <- function(node, job)
            sendCall(cl[[node]], fun, argfun(job), tag = job)
        for (i in 1 : min(n, p))
            submit(i, i)
        val <- vector("list", n)
        for (i in 1:n) {
            d <- recvOneResult(cl)
            j <- i + min(n, p)
            if (j <= n) submit(d$node, j)
            val[d$tag] <- list(d$value)
        }
        checkForRemoteErrors(val)
    }
}

clusterApplyLB <- function(cl, x, fun, ...) {
    ## **** this closure is sending all of x to all nodes
    argfun <- function(i) c(list(x[[i]]), list(...))
    dynamicClusterApply(cl, fun, length(x), argfun)
}

## **** should this allow load balancing?
## **** disallow recycling if one arg is length zero?
clusterMap <- function (cl, fun, ..., MoreArgs = NULL, RECYCLE = TRUE) {
    checkCluster(cl)
    args <- list(...)
    if (length(args) == 0)
        stop("need at least one argument")
    n <- sapply(args, length)
    if (RECYCLE) {
        vlen <- max(n)
        if (!all(n == vlen))
            for (i in 1:length(args))
                args[[i]] <- rep(args[[i]], length = max(n))
    }
    else vlen = min(n)
    ## **** this closure is sending all of ... to all nodes
    argfun <- function(i) c(lapply(args, function(x) x[[i]]), MoreArgs)
    staticClusterApply(cl, fun, vlen, argfun)
}

#
# Cluster RNG Support
#

clusterSetupRNG <- function (cl, type="RNGstream", ...)
{
    RNGnames <- c("RNGstream", "SPRNG")
    rng <- pmatch (type, RNGnames)
    if (is.na(rng))
        stop(paste("'", type,
                   "' is not a valid choice. Choose 'RNGstream' or 'SPRNG'.",
                   sep = ""))
    type <- RNGnames[rng]
    if (rng == 1)
        clusterSetupRNGstream(cl, ...)
    else
        clusterSetupSPRNG(cl, ...)
    type
}

#
# Cluster SPRNG Support
#

# adapted from rpvm (Li & Rossini)
## clusterSetupSPRNG <- function (cl, seed = round(2^32 * runif(1)),
##                                prngkind = "default", para = 0, ...)
## {
##     if (!is.character(prngkind) || length(prngkind) > 1)
##         stop("'rngkind' must be a character string of length 1.")
##     if (!is.na(pmatch(prngkind, "default")))
##         prngkind <- "LFG"
##     prngnames <- c("LFG", "LCG", "LCG64", "CMRG", "MLFG", "PMLCG")
##     kind <- pmatch(prngkind, prngnames) - 1
##     if (is.na(kind))
##         stop(paste("'", prngkind, "' is not a valid choice", sep = ""))
##     nc <- length(cl)
##     invisible(clusterApply(cl, 0:(nc-1), initSprngNode,
##                            nc, seed, kind, para))
## }

clusterSetupSPRNG <- function (cl, seed = round(2^32 * runif(1)),
                               prngkind = "default", para = 0, ...)
    stop("SPRNG is currently not supported")

## initSprngNode <- function (streamno, nstream, seed, kind, para)
## {
##     if (!
requireNamespace("rsprng")) ## stop("the `rsprng' package is needed for SPRNG support.") ## .Call("r_init_sprng", as.integer(kind), as.integer(streamno), ## as.integer(nstream), as.integer(seed), as.integer(para), ## PACKAGE = "rsprng") ## RNGkind("user") ## } # # rlecuyer support # clusterSetupRNGstream <- function (cl, seed=rep(12345,6), ...) { if (! requireNamespace("rlecuyer")) stop("the `rlecuyer' package is needed for RNGstream support.") rlecuyer::.lec.init() rlecuyer::.lec.SetPackageSeed(seed) nc <- length(cl) names <- as.character(1:nc) rlecuyer::.lec.CreateStream(names) states <- lapply(names, rlecuyer::.lec.GetStateList) invisible(clusterApply(cl, states, initRNGstreamNode)) } initRNGstreamNode <- function (stream) { if (! requireNamespace("rlecuyer")) stop("the `rlecuyer' package is needed for RNGstream support.") .lec.Random.seed.table <- get(".lec.Random.seed.table", .GlobalEnv) if (length(.lec.Random.seed.table$name) > 0) { rm(".lec.Random.seed.table", envir=.GlobalEnv) .lec.Random.seed.table <- list(Cg=matrix(0,nrow=0,ncol=6), Bg=matrix(0,nrow=0,ncol=6), Ig=matrix(0,nrow=0,ncol=6), AIP=matrix(0,nrow=0,ncol=2), name=c()) } .lec.Random.seed.table$Cg <- rbind(.lec.Random.seed.table$Cg, stream$Cg[1:6]) .lec.Random.seed.table$Bg <- rbind(.lec.Random.seed.table$Bg,stream$Bg) .lec.Random.seed.table$Ig <- rbind(.lec.Random.seed.table$Ig,stream$Ig) .lec.Random.seed.table$AIP <- rbind(.lec.Random.seed.table$AIP, c(stream$Anti, stream$IncPrec)) .lec.Random.seed.table$name <- c(.lec.Random.seed.table$name, stream$name) env <- as.environment(1) ## .GlobalEnv assign(".lec.Random.seed.table", .lec.Random.seed.table, envir = env) old.kind<-rlecuyer::.lec.CurrentStream(stream$name) old.kind } # # Parallel Functions # splitIndices <- function(nx, ncl) { batchsize <- if (nx %% ncl == 0) nx %/% ncl else 1 + nx %/% ncl batches <- (nx + batchsize - 1) %/% batchsize split(1:nx, rep(1:batches, each = batchsize)[1:nx]) } splitIndices <- function(nx, ncl) { i <- 1:nx; if (ncl == 1) i else structure(split(i, cut(i, ncl)), names=NULL) } # The fuzz used by cut() is too small when nx and ncl are both large # and causes some groups to be empty. The definition below avoids that # while minimizing changes from the results produced by the definition # above. splitIndices <- function(nx, ncl) { i <- seq_len(nx) if (ncl == 0) list() else if (ncl == 1 || nx == 1) list(i) else { fuzz <- min((nx - 1) / 1000, 0.4 * nx / ncl) breaks <- seq(1 - fuzz, nx + fuzz, length = ncl + 1) structure(split(i, cut(i, breaks)), names = NULL) } } clusterSplit <- function(cl, seq) lapply(splitIndices(length(seq), length(cl)), function(i) seq[i]) splitList <- function(x, ncl) lapply(splitIndices(length(x), ncl), function(i) x[i]) splitRows <- function(x, ncl) lapply(splitIndices(nrow(x), ncl), function(i) x[i,, drop=FALSE]) splitCols <- function(x, ncl) lapply(splitIndices(ncol(x), ncl), function(i) x[,i, drop=FALSE]) parLapply <- function(cl, x, fun, ...) docall(c, clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)) parRapply <- function(cl, x, fun, ...) docall(c, clusterApply(cl, splitRows(x,length(cl)), apply, 1, fun, ...)) parCapply <- function(cl, x, fun, ...) docall(c, clusterApply(cl, splitCols(x,length(cl)), apply, 2, fun, ...)) parMM <- function(cl, A, B) docall(rbind,clusterApply(cl, splitRows(A, length(cl)), get("%*%"), B)) # adapted from sapply in the R sources parSapply <- function (cl, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) { FUN <- match.fun(FUN) # should this be done on worker? 
answer <- parLapply(cl,as.list(X), FUN, ...) if (USE.NAMES && is.character(X) && is.null(names(answer))) names(answer) <- X if (simplify && length(answer) != 0) { common.len <- unique(unlist(lapply(answer, length))) if (common.len == 1) unlist(answer, recursive = FALSE) else if (common.len > 1) array(unlist(answer, recursive = FALSE), dim = c(common.len, length(X)), dimnames = list(names(answer[[1]]), names(answer))) else answer } else answer } # adapted from apply in the R sources parApply <- function(cl, X, MARGIN, FUN, ...) { FUN <- match.fun(FUN) # should this be done on worker? ## Ensure that X is an array object d <- dim(X) dl <- length(d) if(dl == 0) stop("dim(X) must have a positive length") ds <- 1:dl # for compatibility with R versions prior to 1.7.0 if (! exists("oldClass")) oldClass <- class if(length(oldClass(X)) > 0) X <- if(dl == 2) as.matrix(X) else as.array(X) dn <- dimnames(X) ## Extract the margins and associated dimnames s.call <- ds[-MARGIN] s.ans <- ds[MARGIN] d.call <- d[-MARGIN] d.ans <- d[MARGIN] dn.call<- dn[-MARGIN] dn.ans <- dn[MARGIN] ## dimnames(X) <- NULL ## do the calls d2 <- prod(d.ans) if(d2 == 0) { ## arrays with some 0 extents: return ``empty result'' trying ## to use proper mode and dimension: ## The following is still a bit `hackish': use non-empty X newX <- array(vector(typeof(X), 1), dim = c(prod(d.call), 1)) ans <- FUN(if(length(d.call) < 2) newX[,1] else array(newX[,1], d.call, dn.call), ...) return(if(is.null(ans)) ans else if(length(d.call) < 2) ans[1][-1] else array(ans, d.ans, dn.ans)) } ## else newX <- aperm(X, c(s.call, s.ans)) dim(newX) <- c(prod(d.call), d2) if(length(d.call) < 2) {# vector if (length(dn.call)) dimnames(newX) <- c(dn.call, list(NULL)) arglist <- lapply(1:d2, function(i) newX[,i]) } else arglist <- lapply(1:d2, function(i) array(newX[,i], d.call, dn.call)) ans <- parLapply(cl, arglist, FUN, ...) 
## answer dims and dimnames ans.list <- is.recursive(ans[[1]]) l.ans <- length(ans[[1]]) ans.names <- names(ans[[1]]) if(!ans.list) ans.list <- any(unlist(lapply(ans, length)) != l.ans) if(!ans.list && length(ans.names)) { all.same <- sapply(ans, function(x) identical(names(x), ans.names)) if (!all(all.same)) ans.names <- NULL } len.a <- if(ans.list) d2 else length(ans <- unlist(ans, recursive = FALSE)) if(length(MARGIN) == 1 && len.a == d2) { names(ans) <- if(length(dn.ans[[1]])) dn.ans[[1]] # else NULL return(ans) } if(len.a == d2) return(array(ans, d.ans, dn.ans)) if(len.a > 0 && len.a %% d2 == 0) return(array(ans, c(len.a %/% d2, d.ans), if(is.null(dn.ans)) { if(!is.null(ans.names)) list(ans.names,NULL) } else c(list(ans.names), dn.ans))) return(ans) } # # Library Initialization # .onLoad <- function(libname, pkgname) { initDefaultClusterOptions(libname) if (exists("mpi.comm.size")) type <- "MPI" ## else if (length(find.package("rpvm", quiet = TRUE)) != 0) ## type <- "PVM" else if (length(find.package("Rmpi", quiet = TRUE)) != 0) type <- "MPI" ## else if (length(find.package("nws", quiet = TRUE)) != 0) ## type <- "NWS" else type <- "SOCK" setDefaultClusterOptions(type = type) } snow/MD50000644000176200001440000000211014136256672011556 0ustar liggesusers472d5ac9b8be15e37145c6628a54a5db *DESCRIPTION 35e2b4ff307f51e8da71107cb0612a62 *NAMESPACE d518ea575733a7b3d0cb2a220bc6112b *R/mpi.R 546a2c934322e412e79c97a791dad32d *R/snow.R 663d08fda3442a514fc2329cc46b7411 *R/sock.R 59d3c83000da91b34a6fbbd9e98f168b *R/timing.R 313bebaa052618f9a50bb43054754759 *README 947dea09fefa0338225dd31eca17b9cf *inst/RMPISNOW 101d5193e259f4e806539caaf57d2aed *inst/RMPISNOWprofile 9377ad04a416ba38bacc13244d0ed88e *inst/RMPInode.R d494656a44cfae8c9927a57843ad4d0e *inst/RMPInode.sh b06b792b3a15008fe63d92126a44c186 *inst/RSOCKnode.R 2939b5cdca29c508ac784aadf1ba8820 *inst/RSOCKnode.sh 33f05b3ea71a95a24d42a8725665ec83 *inst/RunSnowNode 28d30ce64558eef7cac822ff090ebcdf *inst/RunSnowWorker 618bcd4abfbbc08f8c9df392de152497 *inst/RunSnowWorker.bat ba572e8be1672d4a6f8795697163524f *man/snow-cluster.Rd 89b6e364f1b18ecc99f816952849d5c5 *man/snow-internal.Rd fcdb2091efad94d1f2f4b6fe0aa46d42 *man/snow-parallel.Rd d41f7a7d8393ef858feaeb815078c88a *man/snow-rand.Rd f9c9079aacace277b939c8f0fc2c4f91 *man/snow-startstop.Rd 31ba5f9fda5da83fbdac36eaf05c117f *man/snow-timing.Rd snow/inst/0000755000176200001440000000000014136035750012221 5ustar liggesuserssnow/inst/RunSnowNode0000755000176200001440000000011211564215220014354 0ustar liggesusers#!/bin/sh export R_LIBS=${R_SNOW_LIB}:${R_LIBS} exec $R_SNOW_LIB/snow/$1 snow/inst/RSOCKnode.sh0000755000176200001440000000035614136034556014316 0ustar liggesusers#! /bin/sh # the & for backgrounding works in bash--does it work in other sh variants? ${RPROG:-R} --vanilla < ${OUT:-/dev/null} 2>&1 & library(snow) options(timeout=getClusterOption("timeout")) workLoop(makeSOCKmaster()) EOF snow/inst/RMPISNOWprofile0000644000176200001440000000071614136034432015003 0ustar liggesuserslibrary(Rmpi) library(snow) assign(".MPIrun", function() { if (mpi.comm.rank(0) > 0){ sys.load.image(".RData",TRUE) .First.sys() sink(file="/dev/null") workLoop(makeMPImaster()) mpi.quit() } else { makeMPIcluster() .Last <<- function(){ cl <- getMPIcluster() if (! is.null(cl)) stopCluster(cl) mpi.quit() } } }, .GlobalEnv) .MPIrun() snow/inst/RMPInode.sh0000755000176200001440000000016214136035750014174 0ustar liggesusers#! 
/bin/sh ${RPROG:-R} --vanilla < ${OUT:-/dev/null} 2>&1 library(Rmpi) library(snow) runMPIworker() EOF snow/inst/RMPInode.R0000755000176200001440000000130114136035733013760 0ustar liggesuserslocal({ snowlib <- Sys.getenv("R_SNOW_LIB") outfile <- Sys.getenv("R_SNOW_OUTFILE") args <- commandArgs() pos <- match("--args", args) args <- args[-(1 : pos)] for (a in args) { pos <- regexpr("=", a) name <- substr(a, 1, pos - 1) value <- substr(a,pos + 1, nchar(a)) switch(name, SNOWLIB = snowlib <- value, OUT = outfile <- value) } if (! (snowlib %in% .libPaths())) .libPaths(c(snowlib, .libPaths())) library(methods) ## because Rscript as of R 2.7.0 doesn't load methods library(Rmpi) library(snow) sinkWorkerOutput(outfile) cat("starting MPI worker\n") runMPIworker() }) snow/inst/RSOCKnode.R0000644000176200001440000000167014136034515014075 0ustar liggesuserslocal({ master <- "localhost" port <- "" snowlib <- Sys.getenv("R_SNOW_LIB") outfile <- Sys.getenv("R_SNOW_OUTFILE") ##**** defaults to ""; document args <- commandArgs() pos <- match("--args", args) args <- args[-(1 : pos)] for (a in args) { pos <- regexpr("=", a) name <- substr(a, 1, pos - 1) value <- substr(a,pos + 1, nchar(a)) switch(name, MASTER = master <- value, PORT = port <- value, SNOWLIB = snowlib <- value, OUT = outfile <- value) } if (! (snowlib %in% .libPaths())) .libPaths(c(snowlib, .libPaths())) library(methods) ## because Rscript as of R 2.7.0 doesn't load methods library(snow) if (port == "") port <- getClusterOption("port") sinkWorkerOutput(outfile) cat("starting worker for", paste(master, port, sep = ":"), "\n") workLoop(makeSOCKmaster(master, port)) }) snow/inst/RMPISNOW0000755000176200001440000000415014136033410013414 0ustar liggesusers#! /bin/sh # if defined, prepend R_SNOW_LIB to $_LIBS if test ! -z "${R_SNOW_LIB}" ; then R_LIBS=${R_SNOW_LIB}:${R_LIBS}; export R_LIBS fi # find the library containing the snow package; should eventually use Rscript snowdir=`echo 'invisible(cat(tryCatch(dirname(find.package("snow")), error = function(e) ""),"\n",sep=""))' | R --no-echo` # for now this hijacks the R_PROFILE mechanism to start up the R # sessions and load snow and Rmpi into them R_PROFILE=${snowdir}/snow/RMPISNOWprofile; export R_PROFILE if test ! -z "${LAMRANK}" ; then # use the LAMRANK environment variable set by LAM-MPI's mpirun to # run R with appropriate arguments for master and workers. if test "${LAMRANK}" -eq 0 ; then exec R $* else exec R --no-echo > /dev/null 2>&1 fi elif test ! -z "${MV2_COMM_WORLD_RANK}" ; then # For MVAPICH2 if test "${MV2_COMM_WORLD_RANK}" -eq 0 ; then exec R --no-save $* else exec R --no-echo > /dev/null 2>&1 fi elif test ! -z "${OMPI_MCA_ns_nds_vpid}" ; then # Similar approach for OpenMPI using the OMPI_MCA_ns_nds_vpid # variable. This is for backward compatibility with pre-1.3 # versions. The master R process thinks it is non-interactive so # for now --no-save or something like that is needed. if test "${OMPI_MCA_ns_nds_vpid}" -eq 0 ; then exec R --no-save $* else exec R --no-echo > /dev/null 2>&1 fi elif test ! -z "${OMPI_COMM_WORLD_RANK}" ; then # Similar approach for OpenMPI using the OMPI_COMM_WORLD_RANK # variable. This is the correct variable to use as of v1.3+. The # master R process thinks it is non-interactive so for now # --no-save or something like that is needed. 
if test "${OMPI_COMM_WORLD_RANK}" -eq 0 ; then exec R --no-save $* else exec R --no-echo > /dev/null 2>&1 fi else # The fallback is to use the same arguments on master and workers, # with --no-save for cases where workers don't have a terminal. # This means that things like CMD batch won't work. It seems to be # important NOT to use exec here, at least when this code runs under LAM. R --no-save $* fi snow/inst/RunSnowWorker0000755000176200001440000000010311564215220014740 0ustar liggesusers#!/bin/sh exec ${R_SNOW_RSCRIPT_CMD:-Rscript} $R_SNOW_LIB/snow/$* snow/inst/RunSnowWorker.bat0000755000176200001440000000034411564215220015514 0ustar liggesusers@echo off rem If the R_SNOW_RSCRIPT_CMD variable is not defined assume Rscript rem is in the search path. if not defined R_SNOW_RSCRIPT_CMD set R_SNOW_RSCRIPT_CMD=Rscript %R_SNOW_RSCRIPT_CMD% %R_SNOW_LIB%\snow\%*