% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/seqCluster.R
\name{seqCluster}
\alias{seqCluster}
\title{Program for sequentially clustering, removing the cluster found, and starting again.}
\usage{
seqCluster(
  inputMatrix,
  inputType,
  k0,
  subsample = TRUE,
  beta,
  top.can = 0.01,
  remain.n = 30,
  k.min = 3,
  k.max = k0 + 10,
  verbose = TRUE,
  subsampleArgs = NULL,
  mainClusterArgs = NULL,
  warnings = FALSE
)
}
\arguments{
\item{inputMatrix}{numerical matrix on which to run the clustering or a
\code{\link[SummarizedExperiment]{SummarizedExperiment}},
\code{\link{SingleCellExperiment}}, or \code{\link{ClusterExperiment}}
object.}

\item{inputType}{a character vector defining what type of input is given in
the \code{inputMatrix} argument. Must consist of values "diss","X", or
"cat" (see details). "X" and "cat" indicate
matrices with features in the rows and samples in the columns; "cat"
corresponds to the features being numerical integers corresponding to
categories, while "X" are continuous valued features. "diss" corresponds to
an \code{inputMatrix} that is a NxN dissimilarity matrix. "cat" is largely
used internally for clustering of sets of clusterings.}

\item{k0}{the value of K at the first iteration of the sequential algorithm; see
details below or the vignette.}

\item{subsample}{logical as to whether to subsample via 
\code{\link{subsampleClustering}} to get the distance matrix at each 
iteration; otherwise the distance matrix is set by arguments to
\code{\link{mainClustering}}.}

\item{beta}{value between 0 and 1 to decide how stable cluster membership
has to be before 'finding' and removing the cluster.}

\item{top.can}{only the top.can clusters from \code{\link{mainClustering}} (ranked
by the 'orderBy' argument given to \code{\link{mainClustering}}) will be compared
pairwise for stability. Can be either an integer value, giving the absolute
number of clusters, or a value between 0 and 1, meaning to keep all clusters
containing at least this proportion of the remaining samples. Making this
either a very large integer or equal to 0 effectively removes this
parameter, so that all pairwise comparisons of all clusters found will be
considered; this might result in smaller clusters being found. If top.can is
between 0 and 1, there is still a hard threshold of at least 5 samples for a
cluster to be considered.}

\item{remain.n}{when only this number of samples is left (i.e. not yet
clustered), the algorithm will stop.}

\item{k.min}{each iteration of sequential detection of clustering will
decrease the beginning K of subsampling, but not lower than k.min.}

\item{k.max}{the algorithm will stop if K in an iteration is increased beyond
this value.}

\item{verbose}{whether the algorithm should print out information as to its
progress.}

\item{subsampleArgs}{list of arguments to be passed to
\code{\link{subsampleClustering}}.}

\item{mainClusterArgs}{list of arguments to be passed to
\code{\link{mainClustering}}.}

\item{warnings}{logical. Whether to print out the many possible warnings and
messages regarding checking the internal consistency of the parameters.}
}
\value{
A list with values
\itemize{

\item{\code{clustering}}{ a vector of length equal to the number of samples, giving the
integer-valued cluster ids for each sample. The integer values are assigned
in the order that the clusters were found. "-1" indicates the sample was not
clustered.}

\item{\code{clusterInfo}}{ if clusters were successfully found, a matrix of
information regarding the algorithm behavior for each cluster (the starting
and stopping K for each cluster, and the number of iterations for each
cluster).}

\item{\code{whyStop}}{ a character string explaining what triggered the
algorithm to stop.}
}
}
\description{
Given a data matrix, this function calls clustering
routines, sequentially removes the best cluster found, and iterates to find
further clusters.
}
\details{
\code{seqCluster} is not meant to be called by the user. It is
  exported only so that its arguments can be clearly documented; these
  arguments can be passed via the argument \code{seqArgs} in
  functions like \code{\link{clusterSingle}} and \code{\link{clusterMany}}.

This code is adapted from the sequential portion of the code of the
  tightClust package of Tseng and Wong. At each iteration of the algorithm it
  finds a set of samples that constitute a homogeneous cluster, removes
  them, and iterates again to find the next set of samples that form a
  cluster.

In each iteration, to determine the next homogeneous set of
  samples, the algorithm will iteratively cluster the current set of samples
  for a series of increasing values of the parameter \eqn{K}, starting at a value
  \code{kinit} and increasing by 1 at each iteration, until a sufficiently
  homogeneous set of clusters is found. For the first set of homogeneous
  samples, \code{kinit} is set to the argument \code{k0}; for each later
  set, \code{kinit} is adjusted internally (as described below).

How the value of \eqn{K} is used depends on the value of \code{subsample}.
  If \code{subsample=TRUE}, \eqn{K} is the \code{k} sent to the
  cluster function \code{clusterFunction} passed to
  \code{\link{subsampleClustering}} via \code{subsampleArgs}; then
  \code{\link{mainClustering}} is run on the co-occurrence matrix returned by
  \code{\link{subsampleClustering}}, using the \code{ClusterFunction} object
  defined in the argument \code{clusterFunction} set via \code{mainClusterArgs}.
  The number of clusters actually resulting from this run of
  \code{\link{mainClustering}} may not be equal to the \eqn{K} sent to the clustering
  done in \code{\link{subsampleClustering}}. If \code{subsample=FALSE},
  \code{\link{mainClustering}} is called directly on the data to determine the
  clusters, and the \eqn{K} set by \code{seqCluster} for this iteration determines the
  parameter of the clustering done by \code{\link{mainClustering}}. Specifically,
  the argument \code{clusterFunction} defines the clustering of the
  \code{\link{mainClustering}} step and \code{k} is sent to that
  \code{ClusterFunction} object. This means that if \code{subsample=FALSE},
  the \code{clusterFunction} must be of \code{algorithmType} "K".

In either setting of \code{subsample}, the resulting clusters from
  \code{\link{mainClustering}} for a particular \eqn{K} will be compared to the clusters
  found in the previous iteration, with \eqn{K-1}. For computational
  convenience, only the first \code{top.can} clusters of each iteration will
  be compared to the first \code{top.can} clusters of the previous iteration for
  similarity (where \code{top.can} currently refers to ordering by size, i.e.
  the \code{top.can} largest clusters).

If there is no cluster of the first \code{top.can} in the current
  iteration \eqn{K} that has overlap similarity > \code{beta} to any in the
  previous iteration, then the algorithm will move to the next iteration,
  increasing to \eqn{K+1}.

If, however, among these clusters there is a cluster in the current
  iteration \eqn{K} that has overlap similarity > \code{beta} to a cluster in the
  previous iteration \eqn{K-1}, then the cluster with the largest such similarity
  will be identified as a homogeneous set of samples, and the samples in it
  will be removed and designated as such. The algorithm will then start again
  to determine the next set of homogeneous samples, but without these samples.
  Furthermore, in this case (i.e. a cluster was found and removed), the value
  of \code{kinit} will be reset to \code{kinit-1}; i.e. the range of
  increasing \eqn{K} that will be iterated over to find a set of homogeneous
  samples will start one value lower than was the case for the previous
  set of homogeneous samples. If \code{kinit-1}<\code{k.min}, then
  \code{kinit} will be set to \code{k.min}.
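
As a minimal sketch of the comparison and reset just described, the
  following toy code may help. The helper \code{overlap}, the index sets
  \code{prev} and \code{curr}, and the intersection-over-union definition of
  overlap similarity are all assumptions made for illustration; this is not
  the internal code of \code{seqCluster}, and it ignores the \code{top.can}
  filtering.

\preformatted{
# Hypothetical helper: overlap of two sets of sample indices, assumed here to
# be |A and B| / |A or B| (the text above only says "overlap similarity").
overlap <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Toy candidate clusters (sample indices) from iterations K-1 and K.
prev <- list(1:10, 11:18, 19:22)
curr <- list(1:9, 10:14, 15:22)

# Pairwise similarities: rows index current clusters, columns previous ones.
sim <- sapply(prev, function(p) sapply(curr, function(cl) overlap(cl, p)))

beta <- 0.8
kinit <- 5
k.min <- 3
if (any(sim > beta)) {
  # Take the current cluster involved in the most similar pair.
  best <- which(sim == max(sim), arr.ind = TRUE)[1, ]
  found <- curr[[best[["row"]]]]   # samples removed as the new cluster
  kinit <- max(kinit - 1, k.min)   # next search starts one lower, floored at k.min
}
}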

If there are fewer than \code{remain.n} samples left after finding a
  cluster and removing its samples, the algorithm will stop, as subsampling
  is deemed to no longer be appropriate. If \eqn{K} has to be increased
  beyond \code{k.max} without finding any pair of clusters with overlap >
  \code{beta}, then the algorithm will stop. Any samples not found as part of a
  homogeneous cluster at that point will be classified as unclustered
  (given a value of -1).

Certain combinations of inputs to \code{mainClusterArgs} and
  \code{subsampleArgs} are not allowed. See \code{\link{clusterSingle}} for
  an explanation of these restrictions.
}
\examples{
\dontrun{
data(simData)

set.seed(12908)
clustSeqHier <- seqCluster(simData, inputType="X", k0=5, subsample=TRUE,
   beta=0.8, subsampleArgs=list(resamp.n=100,
   samp.p=0.7, clusterFunction="kmeans", clusterArgs=list(nstart=10)),
   mainClusterArgs=list(minSize=5,clusterFunction="hierarchical01",
   clusterArgs=list(alpha=0.1)))
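
# A brief, illustrative look at the returned list, using only the elements
# described in the Value section (no particular output is implied).
table(clustSeqHier$clustering)  # cluster sizes; -1 marks unclustered samples
clustSeqHier$clusterInfo        # per-cluster start/stop K and iterations, if any cluster was found
clustSeqHier$whyStop            # what triggered the algorithm to stop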
}
}
\references{
Tseng and Wong (2005), "Tight Clustering: A Resampling-Based
  Approach for Identifying Stable and Tight Patterns in Data", Biometrics,
  61:10-16.
}
\seealso{
tight.clust,
  \code{\link{clusterSingle}}, \code{\link{mainClustering}},
  \code{\link{subsampleClustering}}
}
