\documentclass[12pt,titlepage]{article}
\usepackage{fullpage}
\usepackage{amsmath}
\usepackage{makeidx}
\usepackage{epsfig}
\usepackage[hyperfootnotes=false]{hyperref}
\usepackage{longtable}
\usepackage[width=.85\textwidth]{caption}
\newcommand{\sub}[1]{\ensuremath{_{\mbox{\scriptsize \,#1}}}}
\newcommand{\supers}[1]{\ensuremath{^{\mbox{\scriptsize #1}}}}
\newcommand{\cname}[1]{\index{#1}\textsf{#1}}
\newcommand{\E}{\ensuremath{\mathbb{E}}}
\newcommand{\V}{\ensuremath{\mathbb{V}}}
\newcommand{\R}{\ensuremath{\mathbb{R}}}
\newcommand{\etal}{\emph{et al.}}
\newcommand{\hf}{\hat{f}_N}
\newcommand{\xf}{x\sub{fit}}
\newcommand{\scriptname}[1]{\texttt{#1}}
\newcommand{\at}{\tilde{a}\sub{fit}}
\newcommand{\bt}{\tilde{b}\sub{fit}}
\makeindex
\renewcommand\indexname{Functions and Classes}
\newenvironment{thinlist}
{
  \begin{list}
  {---}
  {
    \setlength{\topsep}{0.075cm}
    \setlength{\parsep}{0.075cm}
    \setlength{\itemsep}{0.075cm}
  }
}
{\end{list}}
\begin{document}
\title{Statistical Methods in the NPStat Package}
\author{I. Volobouev, {\it i.volobouev@ttu.edu}}
\date{Version: 5.1.0 \hspace{1.5cm} Date: \today}
\maketitle
\tableofcontents
\newpage

\section{Preliminaries}
\label{sec:preliminaries}

The NPStat package provides a C++ implementation of several nonparametric statistical modeling tools together with various supporting code. Nonparametric modeling becomes very useful when there is little prior information available to justify an assumption that the data belongs to a certain parametric family of distributions. Even though nonparametric techniques are usually more computationally intensive than parametric ones, judicious use of fast algorithms in combination with increasing power of modern computers often results in very practical solutions --- at least until the ``curse of dimensionality'' starts imposing its heavy toll.

It will be assumed that the reader of this note is reasonably proficient in C++ programming, and that the meaning of such words as ``class'', ``function'', ``template'', ``method'', ``container'', ``namespace'', {\it etc.} is obvious from their context.

The latest version of the package code can be downloaded from \href{http://npstat.hepforge.org/}{http://npstat.hepforge.org/}

Most classes and functions implemented in NPStat are placed in the ``npstat'' namespace. A~few functions and classes that need Minuit2~\cite{ref:minuit} installation in order to be meaningfully used belong to the ``npsi'' namespace. Such functions and classes can be found in the ``interfaces'' directory of the package.

When a C++ function, class, or template is mentioned in the text for the first time, the header file which contains its declaration is often mentioned as well. If the header file is not mentioned, it is most likely ``npstat/stat/NNNN.hh'', where NNNN stands for the actual name of the class or function.

An expression ``$I(Q)$'' is frequently employed in the text, where $Q$ is some logical proposition. $I(Q)$ is the proposition indicator function defined as follows: $I(Q) = 1$ if $Q$ is true and $I(Q) = 0$ if $Q$ is false. For example, a product of $I(0 \le x \le 1)$ with some other mathematical function is often used to denote some probability density supported on the $[0, 1]$ interval. Note that $I(\neg Q) = 1 - I(Q)$, where $\neg Q$ is the logical negation of $Q$.
\section{Data Representation}
\label{sec:datarep}

For data storage, the NPStat software relies on the object serialization and I/O capabilities provided by the C++ Geners package~\cite{ref:geners}. Geners supports persistence of all standard library containers (vectors, lists, maps, {\it etc.}) including those introduced in the C++11 Standard~\cite{ref:cpp11} (arrays, tuples, unordered sets and maps). In addition to data structures supported by Geners, NPStat offers several other persistent containers useful in statistical data analysis: multidimensional arrays, homogeneous $n$-tuples, and histograms.

Persistent multidimensional arrays are implemented with the \cname{ArrayND} template (header file ``npstat/nm/ArrayND.hh''). This is a~reasonably full-featured array class, with support for subscripting (with and without bounds checking), vector space operations (addition, subtraction, multiplication by a scalar), certain tensor operations (index contraction, outer product), slicing, iteration over subranges, linear and cubic interpolation of array values to fractional indices, {\it etc.} Arrays whose maximum number of elements is known at compile time can be efficiently created on the stack.

Homogeneous $n$-tuples are two-dimensional tables in which the number of columns is fixed while the number of rows can grow dynamically. The stored object type is chosen when the $n$-tuple is created (it is a template parameter). For homogeneous $n$-tuples, the storage type is the same for every column, which allows for more efficient memory allocation than is possible for heterogeneous $n$-tuples.\footnote{The Geners serialization package supports several types of persistent heterogeneous $n$-tuples including std::vector$<$std::tuple$<$...$> >$, the RowPacker variadic template optimized for accessing large data samples row-by-row, and the ColumnPacker variadic template optimized for data access column-by-column.} Two persistent template classes are provided for homogeneous $n$-tuples: \cname{InMemoryNtuple} for $n$-tuples which can completely fit inside the computer memory (header file ``npstat/stat/InMemoryNtuple.hh'') and \cname{ArchivedNtuple} for $n$-tuples which can grow as large as the available disk space (header file ``npstat/stat/ArchivedNtuple.hh''). Both of these classes inherit from the abstract class \cname{AbsNtuple} which defines all interfaces relevant for statistical analysis of the data these $n$-tuples contain. Thus \cname{InMemoryNtuple} and \cname{ArchivedNtuple} are completely interchangeable for use in various data analysis algorithms.

Arbitrary-dimensional\footnote{To be precise, the number of dimensions for both histograms and multidimensional arrays cannot exceed 31 on 32-bit systems and 63 on 64-bit systems. However, your computer will probably run out of memory long before this limitation becomes relevant for your data analysis.} persistent histograms are implemented with the \cname{HistoND} template (header file ``npstat/stat/HistoND.hh''). Histograms with uniform and with non-uniform bin edge placement can be created by using the axis classes \cname{HistoAxis} and \cname{NUHistoAxis}, respectively, as \cname{HistoND} template parameters. The \cname{DualHistoAxis} class can be employed for histograms in which only a fraction of the axes must have non-uniform bins. Two binning schemes are supported for data sample processing. In the normal scheme, the bin count is incremented if the point coordinates fall inside the given bin (the \cname{fill} method of the class).
In the centroid-preserving scheme, all bins in the neighborhood of the sample point are incremented in such a way that the center of mass of bin increments coincides exactly with the point location (the \cname{fillC} method)\footnote{For one-dimensional histograms, this is called ``linear binning''~\cite{ref:linearbinning}. NPStat supports this technique for uniformly binned histograms of arbitrary dimensionality.}. The objects used as histogram bin contents and the objects used as weights added to the bins when the histogram is filled are not required to have anything in common; the only requirement is that a~meaningful binary function (typically, operator+=) can be defined for them. This level of abstraction allows, for example, for using vectors and tensors as histogram bin contents or for implementing profile histograms with the same \cname{HistoND} template. As the bin contents are implemented with the \cname{ArrayND} class, all features of \cname{ArrayND} (vector space operations, slicing, interpolation, {\it etc}) are available for them as well. Overflows are stored in separate arrays with three cells in each dimension: underflow, overflow, and the central part covered by the regular histogram bins.

\section{Descriptive Statistics}
\label{sec:descriptivestats}

A number of facilities are included in the NPStat package for calculating standard descriptive sample statistics such as mean, covariance matrix, {\it etc}. Relevant functions, classes, and templates are described below.

\subsection{Functions and Classes for Use with Point Clouds}

\cname{arrayStats} (header file ``npstat/stat/arrayStats.hh'') --- This function estimates the population mean ($\mu$), standard deviation ($\sigma$), skewness ($s$), and kurtosis ($k$) for a one-dimensional set of points. A numerically sound two-pass algorithm is used.
% Internally, calculations are performed in long double precision.
The following formulae are implemented ($N$ is the number of points in the sample):
\begin{equation}
\mu = \frac{1}{N} \sum_{i=0}^{N-1} x_i
\label{eq:samplemean}
\end{equation}
\begin{equation}
\sigma = \sqrt{\frac{1}{N - 1} \sum_{i=0}^{N-1} (x_i - \mu)^2}
\label{eq:samplestdev}
\end{equation}
\begin{equation}
s = \frac{N}{(N - 1) (N - 2) \sigma^3} \sum_{i=0}^{N-1} (x_i - \mu)^3
\end{equation}
\begin{equation}
g_2 = N \frac{\sum_{i=0}^{N-1} (x_i - \mu)^4}{\left( \sum_{i=0}^{N-1} (x_i - \mu)^2 \right)^2} - 3
\label{eq:kurtosisg2}
\end{equation}
\begin{equation}
k = \frac{N - 1}{(N - 2) (N - 3)} ((N + 1) g_2 + 6) + 3
\label{eq:kurtosis}
\end{equation}
Note that the population kurtosis estimate is defined in such a way that its expectation value for a sample drawn from the Gaussian distribution equals 3 rather than 0. For more details on the origin of these formulae consult Ref.~\cite{ref:sampleskewkurt}.

\cname{empiricalCdf} (header file ``npstat/stat/StatUtils.hh'') --- This function evaluates the empirical cumulative distribution function (ECDF) for a sample of points. The points must be sorted in increasing order by the user before the function call is made. The ECDF is defined as follows. Suppose the $N$ sorted point values are $\{x_0, x_1, ..., x_{N-1}\}$ and all $x_i$ are distinct ({\it i.e.,} there are no ties). Then
\begin{equation}
\mbox{ECDF}(x) = \left\{
\begin{array}{ll}
0 & \mbox{if\ } x \le x_0, \\
\frac{1}{N} \left( \frac{x-x_i}{x_{i+1}-x_i}+i+1/2 \right) & \mbox{if\ } x_i < x \le x_{i+1} \ \mbox{and} \ x < x_{N-1}, \\
1 & \mbox{if\ } x \ge x_{N-1}.
\end{array}
\right.
\label{eq:ecdf}
\end{equation}
When all sample points are distinct, $\mbox{ECDF}(x)$ is discontinuous at $x = x_0$ and $x = x_{N-1}$ and continuous for all other $x$. If the sample has more than one point with the same $x$, say $x_k$ and $x_{k+1}$ where $k > 0$ and $k < N - 2$, $x_k$ also becomes a point of discontinuity. $\mbox{ECDF}(x_k)$ will be set to $\frac{1}{N} (k + 1/2)$, the value coincident with the function's left limit.

\cname{empiricalQuantile} (header file ``npstat/stat/StatUtils.hh'') --- This is the empirical quantile function, $\mbox{EQF}(y)$, which provides an inverse to $\mbox{ECDF}(x)$. For any $x$ from the interval $[x_0, x_{N-1}]$, $\mbox{EQF}(\mbox{ECDF}(x)) = x$ (up to numerical round-off errors). If $x < x_0$ then $\mbox{EQF}(\mbox{ECDF}(x)) = x_0$ and if $x > x_{N-1}$ then $\mbox{EQF}(\mbox{ECDF}(x)) = x_{N-1}$.

\cname{MultivariateSumAccumulator} and \cname{MultivariateSumsqAccumulator} --- These classes are intended for calculating means and covariance matrices for multivariate data samples in a~numerically stable manner. The classes can be used inside user-implemented cycles over sets of points or with \cname{cycleOverRows} and other similar methods of the \cname{AbsNtuple} class. Two passes over the data are necessary for calculating the covariance matrix, first with \cname{MultivariateSumAccumulator} and second with \cname{MultivariateSumsqAccumulator}. The multivariate mean is initially found from
\begin{equation}
\overline{{\bf x}} = \frac{1}{N} \sum_{i=0}^{N-1} {\bf x}_i
\end{equation}
and then an estimate of the population covariance matrix is calculated as
\begin{equation}
V = \frac{1}{N - 1} \ \sum_{i=0}^{N-1} ({\bf x}_i - \overline{{\bf x}}) ({\bf x}_i - \overline{{\bf x}})^T
\label{eq:covar}
\end{equation}

\cname{MultivariateWeightedSumAccumulator} and \cname{MultivariateWeightedSumsqAccumulator} --- These classes are intended for calculating means and covariance matrices for multivariate data samples in which points enter with non-negative weights:
\begin{equation}
\overline{{\bf x}} = \frac{\sum_{i=0}^{N-1} w_i {\bf x}_i}{\sum_{i=0}^{N-1} w_i},
\label{eq:weightedmean}
\end{equation}
\begin{equation}
V = \frac{N_{eff}}{N_{eff} - 1} \ \frac{\sum_{i=0}^{N-1} w_i ({\bf x}_i - \overline{\bf x}) ({\bf x}_i - \overline{\bf x})^T}{\sum_{i=0}^{N-1} w_i},
\label{eq:weightedcov}
\end{equation}
where $N_{eff}$ is the effective number of entries in a weighted sample\footnote{$N_{eff}$ is also known as Kish's effective sample size. Note that this estimate of $V$ makes sense only if the weights and point locations are statistically independent of each other. If the weights are functions of the observed ${\bf x}_i$ then there appears to be no general estimate that is independent of the weighting function.}:
\begin{equation}
N_{eff} = \frac{\left( \sum_{i=0}^{N-1} w_i \right)^2}{\sum_{i=0}^{N-1} w_i^2}.
\label{eq:neff}
\end{equation}
Two passes over the data are required, first with \cname{MultivariateWeightedSumAccumulator} and then with \cname{MultivariateWeightedSumsqAccumulator}.

\cname{StatAccumulator} --- Persistent class which determines the minimum, the maximum, the mean, and the standard deviation of a~sample of values using a single-pass algorithm. It can be used when the two-pass calculation is either impossible or impractical for some reason. This class can be conveniently employed as a bin content type for \cname{HistoND} (this turns \cname{HistoND} into a profile histogram).
\cname{StatAccumulator} maintains a running average which is updated after accumulating $2^K$ points, $K = 0, 1, 2, ...$ The numerical precision of the standard deviation determined in this manner is not as good as in the two-pass method but it is usually much better than that expected from a~naive implementation which simply accumulates sample moments about 0 and then evaluates $\sigma^2$ according to $\frac{N}{N - 1} (\overline{{\bf x}^2} - {\overline{\bf x}}^2)$ (this formula can suffer from a severe subtractive cancellation problem).

\cname{WeightedStatAccumulator} --- Similar class for calculating the mean and standard deviation of a~sample of weighted values using a single-pass algorithm.

\cname{StatAccumulatorArr} --- This class provides functionality similar to an array of \cname{StatAccumulator} objects but with a more convenient interface for accumulating sample values (cycling over the input values is performed by the class itself).

\cname{StatAccumulatorPair} --- Two \cname{StatAccumulator} objects augmented by the sum of cross terms which allows for subsequent calculation of covariance. The covariance is estimated using a formula which can potentially suffer from subtractive cancellation\footnote{This problem is partially alleviated by using the long double type for internal calculations.}. Therefore, this class should not be used to process large samples of points in which, for one of the variables or for both of them, the absolute value of the mean can be much larger than the standard deviation. This class is persistent.

\cname{CrossCovarianceAccumulator} --- To accumulate the sums for covariance matrix calculation according to formula~\ref{eq:covar} in $d$-dimensional space, one has to keep track of $d (d + 1)/2$ sums of squares and cross terms. Sometimes only a small part of the complete covariance matrix is of interest. In this case the full space can often be split into subspaces of dimensionalities $d_1$ and $d_2$, $d = d_1 + d_2$, so that the interesting part of the covariance matrix is dimensioned $d_1 \times d_2$ (plus $d$ diagonal terms). For large values of $d$, accumulating $d + d_1 \times d_2$ sums instead of $d (d + 1)/2$ can make a big difference in speed and memory consumption, especially when $d_1 \ll d_2$ (or $d_1 \gg d_2$). The \cname{CrossCovarianceAccumulator} allows its users to do just that: it accumulates cross terms between two arrays with $d_1$ and $d_2$ elements but not the cross terms between elements of the same array. The covariances are estimated by a single-pass method using a~formula which can suffer from subtractive cancellation\footnotemark[5]. Use with care.

\cname{SampleAccumulator} --- This class stores all values it gets in an internal buffer. Whenever the mean, standard deviation, or some quantile function values are needed for the accumulated sample, they are calculated from the stored values using numerically sound techniques. The mean and standard deviation are calculated according to Eqs.~\ref{eq:samplemean} and~\ref{eq:samplestdev}. Covariance and correlations can be calculated for a pair of \cname{SampleAccumulator} objects with the same number of stored elements by the corresponding methods of the class. Empirical CDF and empirical quantile calculations are handled by calling the \cname{empiricalCdf} and \cname{empiricalQuantile} functions internally, preceded by data sorting. This class is persistent and can be used as the bin content template parameter for \cname{HistoND}.
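
The severity of the subtractive cancellation problem mentioned above is easy to demonstrate with a short standalone C++ fragment (an illustration only, independent of the NPStat classes). It contrasts the naive single-pass variance formula with the two-pass calculation of Eq.~\ref{eq:samplestdev} for a sample whose mean is much larger than its standard deviation:
\begin{verbatim}
// Standalone illustration (not NPStat code): two estimates of the standard
// deviation. The naive single-pass formula N/(N-1)*(<x^2> - <x>^2) can lose
// precision through subtractive cancellation when |mean| >> sigma, which is
// why a two-pass (or running-average) scheme is preferable.
#include <cmath>
#include <cstdio>
#include <vector>

int main()
{
    // Sample with a large offset: mean is about 1e8, sigma is of order 1
    std::vector<double> x;
    for (int i = 0; i < 1000; ++i)
        x.push_back(1.0e8 + std::sin(0.1*i));
    const std::size_t n = x.size();

    // Naive single-pass accumulation of moments about 0
    double sum = 0.0, sumsq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {sum += x[i]; sumsq += x[i]*x[i];}
    const double mean = sum/n;
    const double naiveVar = n/(n - 1.0)*(sumsq/n - mean*mean);

    // Two-pass calculation: accumulate squared deviations from the mean
    double sumd2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {const double d = x[i] - mean; sumd2 += d*d;}
    const double twoPassVar = sumd2/(n - 1.0);

    std::printf("naive sigma = %g, two-pass sigma = %g\n",
                std::sqrt(std::fabs(naiveVar)), std::sqrt(twoPassVar));
    return 0;
}
\end{verbatim}
With the large offset used here, the naive estimate loses essentially all significant digits in double precision while the two-pass result remains accurate.
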
\cname{WeightedSampleAccumulator} --- Similar to SampleAccumulator, but intended for calculating various statistics for samples of weighted points.

Kendall's $\tau$ and Spearman's $\rho$ rank correlation coefficients can be estimated by the functions \cname{sampleKendallsTau} and \cname{sampleSpearmansRho}, respectively. These functions are declared in the header files ``npstat/stat/kendallsTau.hh'' and ``npstat/stat/spearmansRho.hh'', respectively.

\subsection{Functions for Use with Binned Data}

\cname{histoMean} (header file ``npstat/stat/histoStats.hh'') --- This function calculates the histogram center of mass according to Eq.~\ref{eq:weightedmean} with bin contents used as weights for bin center coordinates. Note that the code of this function does not enforce non-negativity of all bin values, and the result of Eq.~\ref{eq:weightedmean} will not necessarily make much sense if negative weights are present. In order to check that all bins are non-negative and that there is at least one positive bin, you can call the \cname{isDensity} method of the \cname{ArrayND} class on the histogram bin contents. The \cname{histoMean} function is standalone and not a member of the \cname{HistoND} class because Eq.~\ref{eq:weightedmean} does not make sense for all possible bin types.

\cname{histoCovariance} (header file ``npstat/stat/histoStats.hh'') --- This function estimates the population covariance matrix for histogrammed data according to Eq.~\ref{eq:weightedcov}, with bin contents used as weights for bin center coordinates. As for \cname{histoMean}, results produced by \cname{histoCovariance} will not make much sense if negative bin values are present.

\cname{arrayCoordMean} (header file ``npstat/stat/arrayStats.hh'') --- Similar to \cname{histoMean} but the bin values are provided in an \cname{ArrayND} object and bin locations are specified by additional arguments. Only uniform bin spacing is supported by this function (unlike \cname{histoMean} which simply gets its bin centers from the argument histogram).

\cname{arrayCoordCovariance} (header file ``npstat/stat/arrayStats.hh'') --- Similar to \cname{histoCovariance} but the bin values are provided in an \cname{ArrayND} object and bin locations are specified by additional arguments. Uniform binning only.

\cname{arrayShape1D} (header file ``npstat/stat/arrayStats.hh'') --- Estimates the population mean, standard deviation, skewness, and kurtosis for one-dimensional histogrammed data with equidistant bins. Bin values $b_i$ used as weights are provided in an \cname{ArrayND} object and bin locations $x_i$ are specified by additional arguments. The mean and the standard deviation squared are calculated according to Eqs.~\ref{eq:weightedmean} and~\ref{eq:weightedcov} reduced to one dimension. The skewness is estimated as
\begin{equation}
s = \frac{N_{eff}^2}{(N_{eff} - 1) (N_{eff} - 2) \sigma^3} \ \frac{\sum_{i=0}^{N_b-1} b_i (x_i - \mu)^3}{\sum_{i=0}^{N_b-1} b_i},
\end{equation}
with $N_{eff}$ defined by Eq.~\ref{eq:neff}. The kurtosis is found from Eq.~\ref{eq:kurtosis} in which $N$ is replaced by $N_{eff}$ and, instead of Eq.~\ref{eq:kurtosisg2}, $g_2$ is determined from
\begin{equation}
g_2 = \sum_{i=0}^{N_b-1} b_i \ \frac{\sum_{i=0}^{N_b-1} b_i (x_i - \mu)^4}{\left( \sum_{i=0}^{N_b-1} b_i (x_i - \mu)^2 \right)^2} - 3.
\end{equation}

\cname{arrayQuantiles1D} (header file ``npstat/stat/arrayStats.hh'') --- This function treats one-dimensional arrays as histograms and determines the values of the quantile function for a~given set of cumulative distribution function values (after normalizing the histogram area to 1). Each bin is considered to have uniform probability density between its edges. If you need to perform this type of calculation more than once with the same array, it will be more efficient to construct a \cname{BinnedDensity1D} object and then to use its \cname{quantile} method instead of calling the \cname{arrayQuantiles1D} function multiple times.

\cname{arrayEntropy} (header file ``npstat/stat/arrayStats.hh'') --- This function calculates
\begin{equation}
\label{eq:entropy}
H = -\frac{1}{N_b} \sum_{i=0}^{N_b-1} b_i \ln b_i
\end{equation}
This formula is appropriate for arrays that tabulate probability density values on uniform grids. To convert the result into the actual entropy of the density, multiply it by the volume of the distribution support.

\subsection{ArrayND Projections}

The following set of classes works with the \cname{ArrayND} class and assists in making array ``projections''. Projections reduce array dimensionality by calculating certain array properties over projected indices. For example, one might want to create a one-dimensional array, $a$, from a three-dimensional array, $b$, by summing all array elements with the given second index: $a_j = \sum_{i, k} b_{ijk}$ (this means that the first and the third indices of the array are ``projected''). This and other similar operations can be performed by the \cname{project} method of the \cname{ArrayND} class. What is done with the array elements when the projection is performed can be specified by the ``projector'' argument of the \cname{project} method. In particular, this argument can be an object which performs some statistical analysis of the projected elements.

\cname{ArrayMaxProjector} (header file ``npstat/stat/ArrayProjectors.hh'') --- For each value of the array indices which are not projected, finds the maximum array element for all possible values of projected indices.

\cname{ArrayMinProjector} (header file ``npstat/stat/ArrayProjectors.hh'') --- Finds the minimum array element for all possible values of projected indices.

\cname{ArraySumProjector} (header file ``npstat/stat/ArrayProjectors.hh'') --- Sums all array values in the iteration over projected indices.

\cname{ArrayMeanProjector} (header file ``npstat/stat/ArrayProjectors.hh'') --- Calculates the mean array value over projected indices.

\cname{ArrayMedianProjector} (header file ``npstat/stat/ArrayProjectors.hh'') --- Calculates the median array value over projected indices.

\cname{ArrayStdevProjector} (header file ``npstat/stat/ArrayProjectors.hh'') --- Calculates the standard deviation of array values encountered during the iteration over projected indices.

\cname{ArrayRangeProjector} (header file ``npstat/stat/ArrayProjectors.hh'') --- Calculates the ``range'' of array values over projected indices. ``Range'' is defined here as the difference between the 75\supers{th} and 25\supers{th} percentiles of the sample values divided by the distance between the 75\supers{th} and 25\supers{th} percentiles of the Gaussian distribution with $\sigma = 1$. Therefore, ``range'' can be used as a robust estimate of the standard deviation.
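
For illustration, the robust ``range'' estimate just described can be sketched in a few lines of standalone C++ (an illustration only, not the NPStat implementation): the sample interquartile range is divided by the interquartile range of the standard normal distribution, $2\,\Phi^{-1}(0.75) \approx 1.349$.
\begin{verbatim}
// Standalone sketch (not the NPStat implementation) of the robust "range"
// estimate of sigma: the sample interquartile range divided by the
// interquartile range of the standard normal distribution.
#include <algorithm>
#include <cstdio>
#include <vector>

// Simple percentile by linear interpolation between order statistics
double percentile(std::vector<double> v, const double p)
{
    std::sort(v.begin(), v.end());
    const double pos = p*(v.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(pos);
    const std::size_t hi = std::min(lo + 1, v.size() - 1);
    const double frac = pos - lo;
    return v[lo]*(1.0 - frac) + v[hi]*frac;
}

double robustSigma(const std::vector<double>& sample)
{
    // 2*Phi^{-1}(0.75): distance between the 25th and 75th percentiles
    // of the Gaussian distribution with sigma = 1
    const double gaussIQR = 2.0*0.674489750196082;
    return (percentile(sample, 0.75) - percentile(sample, 0.25))/gaussIQR;
}

int main()
{
    const std::vector<double> s = {1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 100.0};
    std::printf("robust sigma estimate = %g\n", robustSigma(s));
    return 0;
}
\end{verbatim}
For a Gaussian sample this estimate converges to $\sigma$, while isolated outliers (such as the last value in the example) have little effect on it.
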
All ``projectors'' are derived either from the \cname{AbsArrayProjector} class if the calculated quantity depends on the array indices (header file ``npstat/nm/AbsArrayProjector.hh'') or from the \cname{AbsVisitor} class if it is sufficient to know just the element values (header file ``npstat/nm/AbsVisitor.hh''). Naturally, the user can supply his/her own projector implementations derived from these base classes for use with the \cname{project} method of the \cname{ArrayND} class.

\section{Statistical Distributions}
\label{sec:distros}

A number of univariate and multivariate statistical distributions are supported by the NPStat package. All implementations of univariate continuous distributions share a common interface and can be used interchangeably in a variety of statistical algorithms. A similar approach has been adopted for univariate discrete and multivariate continuous distributions.

\subsection{Univariate Continuous Distributions}

All classes which represent univariate continuous distributions inherit from the \cname{AbsDistribution1D} abstract base class. These classes must implement the methods \cname{density} (probability density function), \cname{cdf} (cumulative distribution function), \cname{exceedance} (the exceedance probability is just 1 $-$ cdf, but direct application of this formula is often unacceptable in numerical calculations due to subtractive cancellation), and \cname{quantile} (the quantile function, inverse of the cdf). For a large number of statistical distributions, the probability density function looks like $\frac{1}{\sigma}\, p\left( \frac{x - \mu}{\sigma} \right)$, where $\mu$ and $\sigma$ are the distribution location and scale parameters, respectively. Distributions of this kind should inherit from the base class \cname{AbsScalableDistribution1D} which handles proper scaling and shifting of the argument (this base class itself inherits from \cname{AbsDistribution1D}, and it is declared in the same header file ``npstat/stat/AbsDistribution1D.hh'').

Some of the standard univariate continuous distributions implemented in NPStat are listed in Table~\ref{table:distros1d}. A few special distributions less frequently encountered in the statistical literature are described in more detail in the following subsections. Of course, user-developed classes inheriting from \cname{AbsDistribution1D} or \cname{AbsScalableDistribution1D} can also be employed with all NPStat functions and classes that take an instance of \cname{AbsDistribution1D} as one of the parameters.

\begin{longtable}{|c|c|c|}
% \captionsetup{width=.9\textwidth}
\caption{Continuous univariate distributions included in NPStat. $P_n(x)$ are the Legendre polynomials. Parameters $\mu$ and $\sigma$ are not shown for scalable distributions. When not given explicitly, the normalization constant $\cal{N}$ ensures that $\int_{-\infty}^{\infty} p(x) dx = 1$. Most of the classes listed in this table are declared in the header file ``npstat/stat/Distributions1D.hh''.
If the distribution is not declared in that header then it has a dedicated header with the same name, {\it e.g.}, ``npstat/stat/TruncatedDistribution1D.hh'' for \cname{TruncatedDistribution1D}.} \label{table:distros1d} \\ \hline \multicolumn{1}{|c|}{Class Name} & \multicolumn{1}{c|}{$p(x)$} & \multicolumn{1}{c|}{Scalable?} \\ \hline \hline \endfirsthead \multicolumn{3}{c}% {\tablename\ \thetable{} --- continued from the previous page} \\ \hline \multicolumn{1}{|c|}{Class Name} & \multicolumn{1}{c|}{$p(x)$} & \multicolumn{1}{c|}{Scalable?} \\ \hline \hline \endhead \multicolumn{3}{|r|}{{Continued on the next page}} \\ \hline \endfoot \hline \endlastfoot \cname{Uniform1D} & $I(0 \le x \le 1)$ & yes \\ \hline \cname{IsoscelesTriangle1D} & $(1 - |x|) I(-1 \le x \le 1)$ & yes \\ \hline \cname{Exponential1D} & $e^{-x} I(x \ge 0)$ & yes \\ \hline \cname{Quadratic1D} & $(1 + a P_1(2 x - 1) + b P_2(2 x - 1)) \,I(0 \le x \le 1)$ & yes \\ \hline \cname{LogQuadratic1D} & ${\cal{N}} \exp (a P_1(2 x - 1) + b P_2(2 x - 1)) \,I(0 \le x \le 1)$ & yes \\ \hline \cname{Gauss1D} & $\frac{1}{\sqrt{2 \pi}} e^{-x^2/2}$ & yes \\ \hline \cname{GaussianMixture1D} & $\frac{1}{\sqrt{2 \pi}} \sum_i \frac{w_i}{\sigma_i} e^{-(x-\mu_i)^2/(2 \sigma_i^2)}, \ \mbox{where} \ \sum_i w_i = 1$ & yes \\ \hline \cname{TruncatedGauss1D} & $\frac{\cal{N}}{\sqrt{2 \pi}} e^{-x^2/2} \, I(-n_{\sigma} \le x \le n_{\sigma})$ & yes \\ \hline \cname{MirroredGauss1D} & \begin{minipage}{0.52\linewidth} \vskip-3mm \begin{eqnarray*} I(0 \le x \le 1)\, \frac{1}{\sqrt{2 \pi} s} \sum_{i=-\infty}^{\infty} & \left[ e^{-(x - m + 2\,i)^2/(2 s^2)} + \right.\\ & \left.e^{-(x + m + 2\,i)^2/(2 s^2)}\right], \end{eqnarray*} \vskip-1mm where $0 \le m \le 1$ and $s > 0$ \newline \vskip-8mm $\mbox{}$ \end{minipage} & yes \\ \hline \cname{BifurcatedGauss1D} & \begin{minipage}{0.52\linewidth} \vskip-2mm \begin{eqnarray*} {\cal{N}} & \left[e^{-x^2/(2 \alpha)} I(-n_{\sigma,L} \le x < 0) \, + \right.\\ & \left. e^{-x^2/(2 (1 - \alpha))} I(0 \le x < n_{\sigma,R})\right] \end{eqnarray*} \vskip0mm \end{minipage} & yes \\ \hline \cname{UGaussConvolution1D} & $\frac{1}{\sqrt{2 \pi} (b - a)} \, e^{-x^2/2} * I(a \le x \le b)$ & yes \\ \hline \cname{SymmetricBeta1D} & ${\cal{N}} \, (1 - x^2)^p \, I(-1 < x < 1)$ & yes \\ \hline \cname{Beta1D} & ${\cal{N}} \, x^{\alpha - 1} (1 - x)^{\beta - 1} \, I(0 < x < 1)$ & yes \\ \hline \cname{Gamma1D} & ${\cal{N}} \, x^{\alpha - 1} e^{-x} \, I(x > 0)$ & yes \\ \hline \cname{Pareto1D} & $\alpha x^{-\alpha - 1} I(x \ge 1)$ & yes \\ \hline \cname{Huber1D} & ${\cal{N}} \, [e^{-x^2/2} I(|x| \le a) + e^{\, a (a/2 - |x|)} (1 - I(|x| \le a))] $ & yes \\ \hline \cname{Cauchy1D} & $\pi^{-1} (1 + x^2)^{-1}$ & yes \\ \hline \cname{StudentsT1D} & ${\cal{N}} \, (1 + x^2/N_{dof})^{-(N_{dof} + 1)/2}$ & yes \\ \hline \cname{Moyal1D} & $\frac{1}{\sqrt{2 \pi}} e^{-(x + e^{-x})/2}$ & yes \\ \hline \cname{Logistic1D} & $e^{-x} \,(1 + e^{-x})^{-2}$ & yes \\ \hline \cname{Tabulated1D} & \begin{minipage}{0.52\linewidth} \vskip1mm Defined by a table of equidistant values on the [0, 1] interval, interpolated by a~polynomial (up to cubic). The first table point is at $x = 0$ and the last is at $x = 1$. \newline \vskip-8mm $\mbox{}$ % Normalization is computed automatically. \end{minipage} & yes \\ \hline \cname{BinnedDensity1D} & \begin{minipage}{0.52\linewidth} \vskip1mm Defined by a table of $N$ equidistant values on the [0, 1] interval, with optional linear interpolation. The first table point is at $x = 1/(2 N)$ and the last is at $x = 1 - 1/(2 N)$. 
Useful for converting 1-d histograms into distributions.
\end{minipage} & yes \\ \hline
\cname{QuantileTable1D} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Defined by a table of $N$ equidistant quantile function values on the [0, 1] interval with linear interpolation between these values (so that the density looks like a histogram with equal area bins). The first table point is at $x = 1/(2 N)$ and the last is at $x = 1 - 1/(2 N)$. Useful for converting data samples into distributions by sampling empirical quantiles.
\end{minipage} & yes \\ \hline
\cname{DistributionMix1D} & $\sum_i w_i p_i(x), \ \mbox{where} \ \sum_i w_i = 1$ & no \\ \hline
\cname{DeltaMixture1D} & $\sum_i w_i \delta(x - x_i), \ \mbox{where} \ \sum_i w_i = 1$ & yes \\ \hline
\cname{LocationScaleFamily1D} & \begin{minipage}{0.52\linewidth}
\vskip1mm
$\frac{1}{s} p\sub{other}\left(\frac{x - m}{s}\right)$, useful for converting non-scalable 1-d distributions into scalable ones.
\end{minipage} & yes \\ \hline
\cname{RatioOfNormals} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Ratio of two correlated normal random variables, as described in~\cite{ref:ratioofnormals}.
\end{minipage} & no \\ \hline
\cname{LeftCensoredDistribution} & $f \, p\sub{other}(x) + (1 - f) \, \delta(x - x_{-\infty})$ & no \\ \hline
\cname{RightCensoredDistribution} & $f \, p\sub{other}(x) + (1 - f) \, \delta(x - x_{+\infty})$ & no \\ \hline
\cname{TruncatedDistribution1D} & ${\cal{N}} \, p\sub{other}(x) \, I(x\sub{min} \le x \le x\sub{max})$ & no \\ \hline
\cname{TransformedDistribution1D} & $p\sub{other}(y) \left| \frac{dy}{dx} \right|$ for monotonic $y(x)$ & no \\
\end{longtable}

\subsection{Johnson Curves}

The density functions of the Johnson distributions~\cite{ref:johnson, ref:johnsonbook, ref:hahnshap} are defined as follows:
\begin{equation}
S_{U}\ \mbox{(unbounded)}:\ \ p(x) = \frac{\delta}{\lambda \sqrt{2 \pi \left(1 + \left(\frac{x - \xi}{\lambda}\right)^{2}\right)}} \,e^{-\frac{1}{2} \left(\gamma + \delta\,\mbox{\scriptsize sinh}^{-1}\left(\frac{x-\xi} {\lambda}\right)\right)^{2}}
\label{eq:johnsu}
\end{equation}
\begin{equation}
S_{B}\ \mbox{(bounded)}:\ \ \ p(x) = \frac{\delta}{\lambda \sqrt{2 \pi} \left(\frac{x - \xi}{\lambda}\right)\left(1 - \frac{x - \xi}{\lambda}\right)} \,e^{-\frac{1}{2}\left( \gamma + \delta \log \left( \frac{x-\xi} {\xi + \lambda - x}\right) \right)^{2}} I(\xi < x < \xi + \lambda)
\label{eq:johnsb}
\end{equation}
They are related to the normal distribution, $N(\mu, \sigma)$, by simple variable transformations. A variable $z$ distributed according to $N(0, 1)$ can be obtained from the Johnson variate $x$ by the transformation $z = \gamma + \delta f\left(\frac{x-\xi}{\lambda}\right)$:
\begin{displaymath}
\begin{array}{rlll}
f(y) = & \mbox{sinh}^{-1}(y) & & S_{U}\ \mbox{curves}\\
f(y) = & \log \left(\frac{y}{1-y}\right) & (0 < y < 1) & S_{B}\ \mbox{curves}
\end{array}
\end{displaymath}
Both densities become arbitrarily close to the lognormal density (for which the variable $z = \gamma + \delta \log (x - \xi)$ is normally distributed) in the limit $\gamma \rightarrow \infty$ and to the normal distribution in the limit $\delta \rightarrow \infty$ ($\xi$ and $\lambda$ have to be adjusted accordingly).
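
As a numerical illustration, the $S_{U}$ density of Eq.~\ref{eq:johnsu} can be evaluated directly from the defining transformation $z = \gamma + \delta\, \mbox{sinh}^{-1}\left(\frac{x-\xi}{\lambda}\right)$. The following standalone C++ sketch (an illustration with arbitrarily chosen parameter values, not the NPStat \textsf{JohnsonSu} implementation) does exactly that:
\begin{verbatim}
// Standalone sketch of the Johnson S_U density in terms of the original
// parameters xi, lambda, gamma, delta (illustration only, not the NPStat
// JohnsonSu class): p(x) = delta/(lambda*sqrt(2*pi*(1 + y*y)))*exp(-z*z/2)
// with y = (x - xi)/lambda and z = gamma + delta*asinh(y).
#include <cmath>
#include <cstdio>

double johnsonSuDensity(const double x, const double xi, const double lambda,
                        const double gamma, const double delta)
{
    const double pi = 3.14159265358979323846;
    const double y = (x - xi)/lambda;
    const double z = gamma + delta*std::asinh(y);  // standard normal variate
    return delta/(lambda*std::sqrt(2.0*pi*(1.0 + y*y)))*std::exp(-0.5*z*z);
}

int main()
{
    // Print a few density values for an arbitrary parameter choice
    for (int i = -3; i <= 3; ++i)
        std::printf("p(%d) = %g\n", i, johnsonSuDensity(i, 0.0, 1.0, 0.5, 1.5));
    return 0;
}
\end{verbatim}
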
Johnson's parameterization of the lognormal density is
\begin{equation}
p(x) = \frac{\delta}{\sqrt{2 \pi} (x - \xi)} e^{-\frac{1}{2} (\gamma + \delta \log (x - \xi))^2} I(x > \xi)
\end{equation}
Together with their limiting cases, Johnson's $S_{U}$ and $S_{B}$ distributions attain {\it all possible values of skewness and kurtosis.} Unfortunately, the parameters $\xi, \lambda, \gamma$, and $\delta$ of $S_{U}$ and $S_{B}$ in Eqs.~\ref{eq:johnsu} and~\ref{eq:johnsb} have no direct relation to each other. Crossing the lognormal boundary requires a discontinuous change in the parameter values which is rather inconvenient for practical data fitting purposes. This problem can be alleviated by reparameterizing the functions in terms of the mean $\mu$, standard deviation $\sigma$, skewness $s$, and kurtosis $k$, so that the corresponding curve type and the original parameters $\xi, \lambda, \gamma, \delta$ are determined numerically. Examples of Johnson's density functions are shown in Fig.~\ref{johns_exa}.

\begin{figure}[t]
\begin{center}
\includegraphics[width=0.6\textwidth]{johnson_examples.pdf}
\caption{ Black: $s = 0, ~k = 3$, Gaussian. Red: $s = -1.5, ~k = 10$, $S_{U}$. Blue: $s = 0.5, ~ k = 2$, $S_{B}$. For all three curves, the mean is 0 and the standard deviation is 1.}
\label{johns_exa}
\end{center}
\end{figure}

Johnson's $S_{U}$ and $S_{B}$ curves are implemented in NPStat with the classes \cname{JohnsonSu} and \cname{JohnsonSb}, respectively (header file ``npstat/stat/JohnsonCurves.hh''). Both of these distributions are parameterized by $\mu$, $\sigma$, $s$, and $k$, with an automatic internal conversion into $\xi, \lambda, \gamma, \delta$. An original algorithm was developed to perform this conversion, based in part on ideas from Refs.~\cite{ref:draper, ref:hill}. The lognormal distribution parameterized by $\mu$, $\sigma$, and $s$ is implemented by the \cname{LogNormal} class (header file ``npstat/stat/Distributions1D.hh''). The class \cname{JohnsonSystem} (header file ``npstat/stat/JohnsonCurves.hh'') can be used when automatic switching between $S_{U}$, $S_{B}$, lognormal, and Gaussian distributions is desired.

\subsection{Composite Distributions}

Composite distributions are built out of two or more component distributions. One of these component distributions is arbitrary while all others must have a density supported on the interval [0, 1]. Suppose $G_k(x), \ k = 1, 2, 3, ...$ are cumulative distribution functions whose corresponding densities $g_k(x)$ are supported on the $[0, 1]$ interval. Then, if $H(x)$ is a cumulative distribution function with density $h(x)$, $F_1(x) = G_1(H(x))$ is also a cumulative distribution function with density $f_1(x) = h(x) \, g_1(H(x))$. Similarly, $F_2(x) = G_2(F_1(x))$ is a cumulative distribution function with density $f_2(x) = f_1(x) g_2(F_1(x)) = h(x) \, g_1(H(x)) g_2(G_1(H(x)))$. We can now construct $F_3(x) = G_3(F_2(x))$ and so on. This sequence can be terminated after an arbitrary number of steps.

Note that $f_1(x) = h(x)$ if $g_1(x)$ is a uniform probability density. Small deviations from uniformity in $g_1(x)$ will lead to corresponding small changes in $f_1(x)$. The resulting density model can be quite flexible, even if the component distributions $H(x)$ and $G_1(x)$ are simple. Therefore, data samples with complicated sets of features can be modeled as follows: construct an approximate model $h(x)$ first, even if it does not fit the sample quite right. Then transform the data points $x_i$ to the [0, 1] interval according to $y_i = H(x_i)$.
The density of points $y_i$\footnote{This density is called ``relative density''~\cite{ref:handcock} or ``comparison density''~\cite{ref:thas} in the statistical literature. Note that $h(x)$ should be selected in such a way that the ratio between the unknown population density of the sample under study and $h(x)$ is bounded for all $x$. If it is not, the relative density will usually be unbounded, and its subsequent representation by polynomials or log-polynomials will not lead to a consistent estimate. Johnson curves often work reasonably well as $h(x)$.} can now be fitted to another parametric distribution (including a composite one) or it can be modeled by nonparametric techniques~\cite{ref:kdetransform}. Due to the elimination of the boundary bias, the LOrPE density estimation method described in Section~\ref{sec:lorpe} becomes especially useful in the latter approach. Once the appropriate $G(y)$ is constructed (parametrically or not), the resulting composite density will provide a good fit to the original set of points $x_i$\footnote{There is, of course, a close connection between this density modeling approach and a number of goodness-of-fit techniques based on the comparison of the empirical cdf with the cdf of the fitted density. For example, using Legendre polynomials to model $\log (g_1(x))$, as in the \cname{LogQuadratic1D} density, directly improves the test criterion used in Neyman's smooth test for goodness-of-fit~\cite{ref:thas, ref:smoothtest}.}.

The composite distributions are implemented in NPStat with the \cname{CompositeDistribution1D} class. One composite distribution can be used to construct another, thus allowing for composition chains of arbitrary length, as described at the beginning of this subsection. Several pre-built distributions of this type are included in the NPStat package (they are declared in the header file ``npstat/stat/CompositeDistros1D.hh''). Flexible models potentially capable of fitting wide varieties of univariate data samples are implemented by the \cname{JohnsonLadder} and \cname{BinnedCompositeJohnson} classes. In both of these, Johnson curves are used as $H(x)$. \cname{JohnsonLadder} takes an arbitrarily long sequence of parametric \cname{LogQuadratic1D} distributions for $G_k(x)$ while \cname{BinnedCompositeJohnson} uses a single nonparametric \cname{BinnedDensity1D} as $G_1(x)$.

\subsection{Univariate Discrete Distributions}

Classes which represent univariate discrete distributions inherit from the \cname{AbsDiscreteDistribution1D} abstract base class. The interface defined by this base class differs in a number of ways from the \cname{AbsDistribution1D} interface. Instead of the method \cname{density} used with arguments of type ``double'', discrete distributions have the method \cname{probability} defined for arguments of type ``long int''. The methods \cname{cdf} and \cname{exceedance} have the same signatures as the corresponding methods of continuous distributions, but the \cname{quantile} function returns long integers. The method \cname{random} generates long integers as well. Discrete univariate distributions which can be trivially shifted should inherit from the \cname{ShiftableDiscreteDistribution1D} base class which handles the shift operation. There is, however, no operation analogous to scaling of continuous distributions. Univariate discrete distributions implemented in NPStat are listed in Table~\ref{table:discrdistros1d}.

\begin{table}[ht!]
\caption{Discrete univariate distributions included in NPStat.
The location parameter is not shown explicitly for shiftable distributions. Classes listed in this table are declared in the header file ``npstat/stat/DiscreteDistributions1D.hh''.} \label{table:discrdistros1d} \begin{center} \noindent\begin{tabular}{|c|c|c|} \hline Class Name & $p(n)$ & Shiftable? \\ \hline\hline \cname{DiscreteTabulated1D} & \begin{minipage}{0.52\linewidth} \vskip1mm Defined by a table of probability values. Normalization is computed automatically. \end{minipage} & yes \\ \hline \cname{Poisson1D} & $\frac{\lambda^n}{n!} e^{-\lambda}$ & no \\ \hline \end{tabular} \end{center} \end{table} Mixtures of discrete univariate distributions with finite support can be implemented using the function \cname{pooledDiscreteTabulated1D} (header file ``npstat/stat/DiscreteDistributions1D.hh''). \subsection{Multivariate Continuous Distributions} All classes which represent multivariate continuous distributions inherit from the \cname{AbsDistributionND} abstract base class. These classes must implement methods \cname{density} (probability density function) and \cname{unitMap} (mapping from the unit $d$-dimensional cube, $U_d$, into the density support region). Densities that can be shifted and scaled in each coordinate separately should be derived from the \cname{AbsScalableDistributionND} base class (which itself inherits from \cname{AbsDistributionND}). Classes representing densities that look like $p({\bf x}) = \prod q(x_i)$, where $q(x)$ is some one-dimensional density, should be derived from the \cname{HomogeneousProductDistroND} base class (the three base classes just mentioned are declared in the ``npstat/stat/AbsDistributionND.hh'' header file). Simple classes inheriting from \cname{AbsDistributionND} are listed in Table~\ref{table:distrosnd}. \begin{table}[ht!] \caption{Continuous multivariate distributions included in NPStat. These distributions are predominantly intended for use as multivariate density estimation kernels. Shifts and scale factors are not shown for scalable distributions. Here, ``scalability'' means the ability to adjust the shift and scale parameters in each dimension, not the complete bandwidth matrix. When not given explicitly, the normalization constant $\cal{N}$ ensures that $\int p({\bf x}) d{\bf x} = 1$. All of these distributions are declared in the header file ``npstat/stat/DistributionsND.hh'' with exception of \cname{ScalableGaussND} which has its own header.} \label{table:distrosnd} \begin{center} \noindent\begin{tabular}{|c|c|c|} \hline Class Name & $p({\bf x})$ & Scalable? \\ \hline\hline \cname{ProductDistributionND} & $\prod_{i=1}^{d} p_i(x_i)$ & \begin{minipage}{0.13\linewidth} depends on components \end{minipage} \\ \hline \cname{UniformND} & $\prod_{i=1}^{d} I(0 \le x_i \le 1)$ & yes \\ \hline \cname{ScalableGaussND} & $(2 \pi)^{-d/2} e^{-|{\bf x}|^2/2}$ & yes \\ \hline \cname{ProductSymmetricBetaND} & ${\cal{N}} \prod_{i=1}^{d} (1 - x_i^2)^p I(-1 < x_i < 1)$ & yes \\ \hline \cname{ScalableSymmetricBetaND} & ${\cal{N}} (1 - |{\bf x}|^2)^p I(|{\bf x}| < 1)$ & yes \\ \hline \cname{ScalableHuberND} & ${\cal{N}} \, [e^{-|{\bf x}|^2/2} I(|{\bf x}| \le a) + e^{\, a (a/2 - |{\bf x}|)} (1 - I(|{\bf x}| \le a))]$ & yes \\ \hline \cname{RadialProfileND} & \begin{minipage}{0.52\linewidth} \vskip1mm Arbitrary centrally-symmetric density. Defined by its radial profile: a table of equidistant values on the [0, 1] interval, interpolated by a~polynomial (up to cubic). The first table point is at $|{\bf x}| = 0$ and the last is at $|{\bf x}| = 1$. 
For $|{\bf x}| > 1$ the density is 0. Normalization is computed automatically.
\end{minipage} & yes \\ \hline
\cname{BinnedDensityND} & \begin{minipage}{0.52\linewidth}
\vskip1mm
Defined by a table of values on $U_d$, equidistant in each dimension, with optional multilinear interpolation. In each dimension, the first table point is at $x_i = 1/(2 N_i)$ and the last is at $x_i = 1 - 1/(2 N_i)$. Useful for converting multivariate histograms into distributions.
\end{minipage} & yes \\ \hline
\end{tabular}
\end{center}
\end{table}

\subsection{Copulas}

For any continuous multivariate density $p({\bf x}|{\bf a})$ in a $d$-dimensional space $X$ of random variables ${\bf x} \in X$ depending on a vector of parameters ${\bf a}$, we define $d$ marginal densities $p_i(x_i|{\bf a})$, $i = 1, \ldots, d$ by
$$
p_i(x_i|{\bf a}) \equiv \int p(x_1, \ldots, x_d |{\bf a}) \mathop{\prod_{j=1}^d}_{j \ne i} d x_j
$$
with their corresponding cumulative distribution functions
$$
F_i(x_i|{\bf a}) \equiv \int_{-\infty}^{x_i} p_i(\tau|{\bf a}) d \tau .
$$
For each point ${\bf x} \in X$, there is a corresponding point ${\bf y}$ in a unit $d$-dimensional cube $U_d$ such that $y_i(x_i) \equiv F_i(x_i|{\bf a})$, $i = 1, \ldots, d$. The {\it copula density} is defined on $U_d$ by
\begin{equation}
c({\bf y}({\bf x})|{\bf a}) \equiv \frac{p({\bf x}|{\bf a})}{\prod_{i=1}^{d} p_i(x_i|{\bf a})}.
\label{eq:copuladef}
\end{equation}
The copula density (as well as its corresponding multivariate distribution function $C({\bf y}|{\bf a})$, or just the {\it copula}) contains all information about the mutual dependence of the individual variables $x_i$. It can be shown that all copula marginals are uniform and, conversely, that any distribution on $U_d$ whose marginals are uniform is a copula~\cite{ref:copulabook}. Naturally, when the copula and the marginals of some multivariate density are known, the density itself is expressed by
\begin{equation}
p({\bf x}|{\bf a}) = c({\bf y}({\bf x})|{\bf a}) \prod_{i=1}^{d} p_i(x_i|{\bf a}).
\end{equation}
Note that $c({\bf y}|{\bf a}) = 1$ for all ${\bf y}$ if and only if all $x_i$ are independent and the density $p({\bf x}|{\bf a})$ is fully factorizable.

The NPStat package allows its users to model multivariate continuous distributions using a copula and marginals with the aid of the \cname{CompositeDistributionND} class. An object of this class is normally constructed out of a user-provided copula and marginals. A \cname{CompositeDistributionND} object can also be constructed from histogram bins. Several standard copulas are implemented: Gaussian, Student's-$t$, and Farlie-Gumbel-Morgenstern~\cite{ref:copulabook}. The corresponding class names are \cname{GaussianCopula}, \cname{TCopula}, and \cname{FGMCopula} (these classes are declared in the header file ``npstat/stat/Copulas.hh'').

Empirical multivariate copula densities can be constructed as follows. Let us assume that there are no coincident point coordinates in any of the dataset variables. We can sort all $N$ elements of the data set in increasing order in each coordinate separately and assign to each data point $i$ a multi-index $\{m_{i0}, ..., m_{i,d-1}\}$. For each dimension $k$, $m_{ik}$ represents the position of point $i$ in the sequence ordered by the dimension $k$ coordinate, so that $0 \le m_{ik} < N$. The empirical copula density, $\mbox{ECD}({\bf x})$, is then defined by
\begin{equation}
\mbox{ECD}({\bf x}) = \sum_{i=0}^{N-1} \prod_{k=0}^{d-1} \delta\left(x_k - \frac{m_{ik} + 1/2}{N}\right),
\end{equation}
where $\delta(x)$ is the Dirac delta function.
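
The rank multi-index entering this formula is simple to construct in practice. The following standalone C++ sketch (illustration only, using a small hypothetical 2-d point set rather than NPStat containers) computes the locations $(m_{ik} + 1/2)/N$ of the delta functions:
\begin{verbatim}
// Standalone sketch of the rank multi-index used by the empirical copula
// density: for every dimension k, m_ik is the position of point i in the
// sample ordered by coordinate k (ties are assumed to be absent).
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <vector>

int main()
{
    // Five 2-d points (N = 5, d = 2), hypothetical data
    const std::vector<std::vector<double> > pts = {
        {0.3, 2.0}, {1.1, 0.5}, {0.7, 3.2}, {2.5, 1.4}, {1.9, 2.8}};
    const std::size_t N = pts.size(), d = pts[0].size();

    // ranks[i][k] = m_ik
    std::vector<std::vector<std::size_t> > ranks(N, std::vector<std::size_t>(d));
    for (std::size_t k = 0; k < d; ++k)
    {
        std::vector<std::size_t> idx(N);
        std::iota(idx.begin(), idx.end(), 0);
        std::sort(idx.begin(), idx.end(),
                  [&](std::size_t a, std::size_t b) {return pts[a][k] < pts[b][k];});
        for (std::size_t pos = 0; pos < N; ++pos)
            ranks[idx[pos]][k] = pos;
    }

    // Copula-space locations (m_ik + 1/2)/N of the delta functions
    for (std::size_t i = 0; i < N; ++i)
        std::printf("point %zu -> (%g, %g)\n", i,
                    (ranks[i][0] + 0.5)/N, (ranks[i][1] + 0.5)/N);
    return 0;
}
\end{verbatim}
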
To get a better idea about the locations of these delta functions, think of an $N \times N \times ... \times N$ uniform grid in $d$ dimensions on which $N$ points are placed in such a way that no two points share the same coordinate in any of the dimensions. In two dimensions, this is like the placement of chess pieces in the eight queens puzzle~\cite{ref:eightqueens} simplified to use rooks instead of queens~\cite{ref:eightrooks}.

In the NPStat package, empirical copula densities can be approximated by histograms defined on $U_d$. Construction of empirical copula histograms can be performed by the functions \cname{empiricalCopulaHisto} (header file ``npstat/stat/empiricalCopulaHisto.hh'') and \cname{empiricalCopulaDensity} (header file ``npstat/stat/empiricalCopula.hh''). The integrated empirical copulas can be constructed on uniform grids by the \cname{calculateEmpiricalCopula} function (header ``npstat/stat/empiricalCopula.hh'').

Spearman's rank correlation coefficient, $\rho$, can be estimated from two-dimensional empirical copulas using the functions \cname{spearmansRhoFromCopula} and \cname{spearmansRhoFromCopulaDensity} declared in the header file ``npstat/stat/spearmansRho.hh''. These functions evaluate the following integrals numerically:
\begin{align}
\rho &= 12 \int_0^1 \int_0^1 C({\bf y}|{\bf a}) dy_0 dy_1 - 3 & \mbox{\ } & \mbox{used by \textsf{spearmansRhoFromCopula}} \\
\rho &= 12 \int_0^1 \int_0^1 c({\bf y}|{\bf a}) y_0 y_1 dy_0 dy_1 - 3 & \mbox{\ } & \mbox{used by \textsf{spearmansRhoFromCopulaDensity}}
\end{align}
While these two formulas are identical mathematically~\cite{ref:copulabook}, the approximations used in numerical integration and differentiation will usually lead to slightly different results returned by these functions for some copula and its corresponding density.

Kendall's rank correlation coefficient, $\tau$, can be estimated from the empirical copulas using the function \cname{kendallsTauFromCopula} declared in the header ``npstat/stat/kendallsTau.hh''. This function evaluates the following formula numerically:
\begin{equation}
\tau = 4 \int_0^1 \int_0^1 C({\bf y}|{\bf a}) c({\bf y}|{\bf a}) dy_0 dy_1 - 1.
\end{equation}
The mutual information between variables can be estimated from copula densities represented on uniform grids according to Eq.~\ref{eq:entropy}\footnote{Mutual information is simply the negative of the copula entropy~\cite{ref:copulaentropy}.}.

Decomposition of multivariate densities into the marginals and the copula is useful not only for a subsequent analysis of mutual dependencies of the variates but also for implementing nonparametric density interpolation, as described in the next section.

\section{Nonparametric Interpolation of Densities}
\label{sec:densityinterpolation}

There are numerous problems in High Energy Physics in which construction of some probability density requires extensive simulations and, due to CPU time limitations, can only be performed for a~limited number of parameter settings. This is typical, for example, for ``mass templates'' which depend on such parameters as the pole mass of the particle under study, sample background fraction, detector jet energy scale, {\it etc}. It is often desirable to have the capability to evaluate such a density for arbitrary parameter values. It is sometimes possible to address this problem by postulating an explicit parametric model and fitting that model to the sets of simulated distributions.
However, complex dependence of the distribution shapes on the parameter values often leads to infeasibility of this approach. Then it becomes necessary to interpolate the densities without postulating a concrete model.

\subsection{Interpolation of Univariate Densities using Quantile Functions}
\label{sec:interpolatebyquant}

As shown in~\cite{ref:read}, a general interpolation of one-dimensional distributions which leads to very natural-looking results can be achieved by interpolating the {\it quantile function} (defined as the inverse of the cumulative distribution function). While Ref.~\cite{ref:read} considers linear interpolation in a one-dimensional parameter space, it is obvious that a similar weighted average interpolation can be easily constructed in multivariate parameter settings and with higher order interpolation schemes. In general, the interpolated quantile function for an~arbitrary parameter value ${\bf a}$ is expressed by
\begin{equation}
\label{eq:quantileinterp1d}
q(y|{\bf a}) = \sum_{j=1}^m w_j q(y|{\bf a}_j),
\end{equation}
where the weighted quantile functions are summed at the $m$ ``nearby'' parameter settings ${\bf a}_j$ for which the distribution was explicitly constructed. The weights $w_j$ are normalized by $\sum_{j=1}^m w_j = 1$. Their precise values depend on the location of ${\bf a}$ w.r.t. nearby ${\bf a}_j$ and on the interpolation scheme used.

Simple high-order interpolation schemes can be constructed in one-dimensional parameter space by using Lagrange interpolating polynomials to determine the weights\footnote{To determine the interpolated value of a function using Lagrange interpolating polynomials, weights are assigned to function values at a set of nearby points according to a rule published by Lagrange in 1795. The weights depend only on the abscissae of the interpolated points and on the coordinate at which the function value is evaluated but not on the interpolated function values.}. In $r$-dimensional parameter space, rectangular grids can be utilized with multilinear or multicubic interpolation. For example, in the case of multilinear interpolation, the weights can be calculated as follows:
\begin{thinlist}
\item Find the hyperrectangular parameter grid cell inside which the value of ${\bf a}$ falls.
\item Shift and scale this cell so that it becomes a hypercube with diagonal vertices at $(0, 0, ..., 0)$ and $(1, 1, ..., 1)$.
\item Let's designate the shifted and scaled value of ${\bf a}$ by ${\bf z}$, with components $(z_1, z_2, ..., z_r)$. If we were to interpolate towards ${\bf z}$ a scalar function $f$ defined at the vertices with coordinates $(c_1, c_2, \ldots, c_r)$, where all $c_k$ values are now either 0 or 1, then we would use the formula
\begin{equation}
\label{eq:multilinear}
f(z_1, z_2, ..., z_r) = \sum_{\substack{ c_1 \in \,\{0, 1\}\\ c_2 \in \,\{0, 1\}\\ \cdot \cdot \cdot \cdot \cdot \cdot \cdot \cdot \cdot \\ c_r \in \,\{0, 1\}\\ }} f(c_1, c_2, ..., c_r) \prod_{k=1}^r z_k^{c_k} (1 - z_k)^{1 - c_k}
\end{equation}
which is obviously linear in every $z_k$ and has correct function values when evaluated at the vertices. In complete analogy, we define the weights for the quantile functions constructed at the vertices with coordinates $(c_1, c_2, \ldots, c_r)$ to be $w(c_1, c_2, \ldots, c_r) = \prod_{k=1}^r z_k^{c_k} (1 - z_k)^{1 - c_k}$. Naturally, there are $m = 2^r$ weights total (a short code sketch of this weight calculation is given below, in Section~\ref{sec:interpolatemultivar}).
\end{thinlist}
The quantile interpolation of univariate distributions is implemented in the NPStat package with the \cname{InterpolatedDistribution1D} class.
A collection of quantile functions is assembled incrementally together with their weights (the formula for calculating the weights has to be supplied by the user). For calculating the interpolated density at point $x$, the equation $x = q(y|{\bf a})$, with $q(y|{\bf a})$ from~\ref{eq:quantileinterp1d}, is solved numerically for $y$. The density is then evaluated by numerically differentiating the interpolated quantile function: $p(x|{\bf a}) = \left( \frac{\partial q(y|{\bf a})}{\partial y} \right)^{-1}$.

Direct interpolation of univariate density functions, also known as ``vertical interpolation'', is implemented by the \cname{VerticallyInterpolatedDistribution1D} class. A class with a simpler interface implementing automatic linear weight assignments between nearby points in a 1-d parameter space is called \cname{InterpolatedDistro1D1P}. This class can be used to interpolate either quantile functions or densities, depending on a switch. A~similar class which interpolates univariate densities on a rectangular grid in a multidimensional parameter space is called \cname{InterpolatedDistro1DNP}.

\subsection{Interpolation of Multivariate Densities}
\label{sec:interpolatemultivar}

The procedure described in the previous section can be generalized to interpolate multivariate distributions. Let $p({\bf x}|{\bf a})$ be a multivariate probability density in a $d$-dimensional space $X$ of random variables ${\bf x} \in X$. ${\bf a}$ is a vector of parameters, and $\int_{X} p({\bf x}|{\bf a}) d {\bf x} = 1$ for every ${\bf a}$\footnote{Compared to interpolation of arbitrary functions, preserving this normalization is one of the major complications in interpolating multivariate densities.}. For a~``well-behaved'' density (Riemann-integrable, {\it etc}), it is always possible to construct a one-to-one mapping from the space $X$ into the unit $d$-dimensional cube, $U_d$, using a sequence of one-dimensional conditional cumulative distribution functions. These functions are defined as follows:
$$
\begin{array}{c}
F_{1}(x_1|x_2, x_3, \ldots, x_d, {\bf a}) \equiv \int_{-\infty}^{x_1} p(z_1, x_2, x_3, \ldots, x_d | {\bf a}) d z_1 / \int_{-\infty}^{\infty} p(z_1, x_2, x_3, \ldots, x_d | {\bf a}) d z_1 \\
F_{2}(x_2 | x_3, \ldots, x_d, {\bf a}) \equiv \int_{-\infty}^{x_2} F_{1}(\infty | z_2, x_3, \ldots, x_d, {\bf a}) d z_2 / \int_{-\infty}^{\infty} F_{1}(\infty | z_2, x_3, \ldots, x_d, {\bf a}) d z_2 \\
F_{3}(x_3 | \ldots, x_d, {\bf a}) \equiv \int_{-\infty}^{x_3} F_{2}(\infty | z_3, \ldots, x_d, {\bf a}) d z_3 / \int_{-\infty}^{\infty} F_{2}(\infty | z_3, \ldots, x_d, {\bf a}) d z_3 \\
.\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .\ .
\\ F_{d}(x_d | {\bf a}) \equiv \int_{-\infty}^{x_d} F_{d-1}(\infty | z_d, {\bf a}) d z_d / \int_{-\infty}^{\infty} F_{d-1}(\infty | z_d, {\bf a}) d z_d \end{array} $$ Naturally, $F_{k}(\infty | x_{k+1}, \ldots, x_d, {\bf a})$ is just renormalized $p({\bf x}|{\bf a})$ in which the first $k$ components of ${\bf x}$ are integrated out ({\it i.e.,} marginalized). The mapping from ${\bf x} \in X$ into ${\bf y} \in U_d$ is defined by $y_i = F_i(x_i | \ldots), \ i = 1, \ldots, d$. In terms of conditional cumulative distribution functions, $$ p({\bf x}|{\bf a}) = \prod_{i=1}^{d}{\frac{\partial F_i(x_i | \ldots)}{\partial x_i}}. $$ This method is not unique and other mappings from $X$ to $U_d$ are possible. What makes this particular construction useful is that the inverse mapping, from $U_d$ into $X$, can be easily constructed as well: we simply solve the equations \begin{equation} \label{eq:invertcondcdf} y_i = F_i(x_i | \ldots) \end{equation} in the reverse order, starting from dimension $d$ and going back to dimension $1$. Each equation in this sequence {\it has only one unknown} and therefore it can be efficiently solved numerically (and sometimes algebraically) by a variety of standard root finding techniques. The solutions of these equations, $x_i = q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a}) \equiv F_i^{-1}(y_i | \ldots)$, are the {\it conditional quantile functions} (CQFs). Note that \begin{equation} \label{eq:invderiv} p({\bf x}|{\bf a}) = \left( \prod_{\ i=1}^{d}{\frac{\partial q_i(y_i | \ldots)}{\partial y_i}} \right)^{-1}. \end{equation} Now, if the CQFs are known for some parameter values ${\bf a}_1$ and ${\bf a}_2$, interpolation towards ${\bf a} = (1 - \lambda) {\bf a}_1 + \lambda {\bf a}_2$ is made in the same manner as described in Section~\ref{sec:interpolatebyquant}: $x_i = (1 - \lambda) q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a}_1) + \lambda q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a}_2)$. If the CQFs are known on a~grid in the parameter space, we have to use an appropriate interpolation technique (multilinear, multicubic, {\it etc}) in that space in order to assign the weights to the CQFs at the nearby grid points. In general, the interpolated CQFs are defined by a weighted average \begin{equation} \label{eq:interpq} q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a}) = \sum_{j=1}^m w_j q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a}_j), \end{equation} where the sum is performed over $m$ nearby parameter points, weights $w_j$ are normalized by $\sum_{j=1}^m w_j = 1$, and their exact values depend on the parameter grid chosen and the interpolation method used. Basically, it is the whole mapping from ${\bf y}$ into ${\bf x}$ which gets interpolated in this method. The CQF interpolation results look very natural, but the process is rather CPU-intensive: for each ${\bf x}$ we need to solve $d$ one-dimensional nonlinear equations \begin{equation} \label{eq:interpsolve} x_i = q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a}), \ \ \ \ i = d, \ldots, 1 \end{equation} in order to determine ${\bf y}$ of the interpolated mapping. In this process, each call to evaluate $q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a})$ triggers $m$ calls to evaluate $q_i(y_i | x_{i+1}, \ldots, x_d, {\bf a}_j)$. Depending on implementation details, each of these $m$ calls may in turn trigger root finding in an equation like~\ref{eq:invertcondcdf}. Once ${\bf y}$ is found for the interpolated CQFs, density is determined by numerical evaluation of~\ref{eq:invderiv} in which $q_i(y_i | \ldots)$ are given by~\ref{eq:interpq}. 
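As a concrete illustration of the weight assignment used in
Eqs.~\ref{eq:multilinear} and~\ref{eq:interpq}, the following self-contained
C++ sketch computes the $2^r$ multilinear weights for a point ${\bf z}$
inside the unit grid cell. It is not part of NPStat (all names are arbitrary)
and is shown only to make the formula explicit.
\begin{verbatim}
// Illustration only: multilinear weights of Eq. (eq:multilinear).
// For a point z inside the unit hypercube, the weight of the vertex with
// coordinates (c_1, ..., c_r), c_k in {0, 1}, is
// prod_k z_k^{c_k} (1 - z_k)^{1 - c_k}.
#include <vector>
#include <cstddef>

std::vector<double> multilinearWeights(const std::vector<double>& z)
{
    const std::size_t r = z.size();
    std::vector<double> w(static_cast<std::size_t>(1) << r, 1.0);
    for (std::size_t vertex = 0; vertex < w.size(); ++vertex)
        for (std::size_t k = 0; k < r; ++k)
        {
            // bit k of "vertex" plays the role of c_{k+1}
            const bool ck = (vertex >> k) & 1U;
            w[vertex] *= ck ? z[k] : 1.0 - z[k];
        }
    return w;  // the weights sum to 1 by construction
}
\end{verbatim}
Because the weights are non-negative and sum to one, the interpolated
quantile functions of Eq.~\ref{eq:interpq} remain monotonic in $y$.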
In practice, finite steps of the parameter grids will cause dependence of the interpolation results on the order in which conditional quantiles are evaluated. If choosing some particular order is considered undesirable and increased CPU loads are acceptable, all $d!$ possible permutations should be averaged. In the NPStat package, multivariate densities whose mapping from the multidimensional unit cube into the density support region is implemented via CQFs return ``true'' when their virtual function \cname{mappedByQuantiles} is called. User-developed implementations of multivariate densities should follow this convention as well. In particular, mapping of the densities represented by the \cname{BinnedDensityND} class (lookup tables on hyperrectangular grids with multilinear interpolation) is performed by CQFs, as well as mapping of all fully factorizable densities. When the fidelity of the model is less critical or when the correlation structure of the distribution is more stable w.r.t. parameter changes than its location and scales, much faster multivariate density interpolation can be performed by decomposing the density into the copula and the marginals. The support of the copula density $c({\bf y}({\bf x})|{\bf a})$ defined in Eq.~\ref{eq:copuladef} is always $U_d$ and it does not depend on the parameter ${\bf a}$. This suggests that the marginals and the copula can be interpolated separately, using quantile function interpolation for the marginals and the standard weighted average interpolation for the copula. The NPStat package uses the same data structure to perform both CQF-based and copula-based interpolation of multivariate densities. Multilinear interpolation is supported on a rectangular parameter grid (not necessarily equidistant). The distributions at the grid points are collected together -using the \cname{GridInterpolatedDistribution} template, -while method-specific -differences are isolated inside a policy class which serves as -the \cname{GridInterpolatedDistribution} template parameter. -Currently, \cname{UnitMapInterpolationND} or -\cname{CopulaInterpolationND} classes can serve as such a~parameter -which results in CQF-based or copula-based interpolation, respectively. +using the \cname{GridInterpolatedDistribution} class. +Internally, the interpolation is performed by either +\cname{UnitMapInterpolationND} or \cname{CopulaInterpolationND} +class, depending on a switch. These classes perform +CQF-based and copula-based interpolation, respectively. \section{Nonparametric Density Estimation} \label{sec:densityestimation} The problem of estimating population probability density function from a finite sample drawn from this population is ubiquitous in data analysis practice. It is often the case that there are no substantial reasons for choosing a particular parametric density model but certain assumptions, such as continuity of the density together with some number of derivatives or absence of narrow spatial features, can still be justified. There is a~number of approaches by which assumptions of this kind can be introduced into the statistical model. These approaches are collectively known as ``nonparametric density estimation'' methods~{\cite{ref:silverman, ref:izenmann, ref:kde}}. The NPStat package provides an efficient implementation of one of such methods, kernel density estimation (KDE), together with its extension to polynomial density models and densities with bounded support. This extension is called ``local orthogonal polynomial expansion'' (LOrPE). 
\subsection{Kernel Density Estimation (KDE)} \label{sec:kde} Suppose, we have an i.i.d.~sample of measurements $x_i$, $i = 0, 1, ..., N-1$ from a univariate probability density $p(x)$. The empirical probability density function (EPDF) for this sample is defined by \begin{equation} \mbox{EPDF}(x) = \frac{1}{N} \sum_{i=0}^{N-1} \delta (x - x_i), \label{eq:edf} \end{equation} where $\delta(x)$ is the Dirac delta function. $\mbox{EPDF}(x)$ can itself be considered an estimate of $p(x)$. However, this estimate can be substantially improved if some additional information about $p(x)$ is available. For example, we can often assume that $p(x)$ is continuous together with its first few derivatives or that it can have at most a few modes. In such cases a convolution of $\mbox{EPDF}(x)$ with a kernel function, $K(x)$, often provides a much better estimate of the population density. $K(x)$ itself is usually chosen to be a symmetric continuous density with a location and scale parameter, so that the resulting estimate looks like \begin{equation} \hat{p}\sub{KDE}(x|h) = \frac{1}{N h} \sum_{i=0}^{N-1} K \left(\frac{x - x_i}{h}\right). \label{eq:kde} \end{equation} In the context of density estimation, parameter $h$ is usually referred to as ``bandwidth''. Use of the Gaussian distribution or one of the distributions from the symmetric beta family as $K(x)$ is very common. In fact, it is so common that beta family kernels have their own names in the density estimation literature. These names are listed in Table~\ref{table:betakernels}. \begin{table}[h!] \caption{One-dimensional kernels from the symmetric beta family. All these kernels look like ${\cal{N}} \, (1 - x^2)^p \, I(-1 < x < 1)$, where ${\cal{N}}$ is the appropriate normalization factor. In the NPStat package, their formulae are implemented by the \cname{SymmetricBeta1D} function.} \label{table:betakernels} \begin{center} \noindent\begin{tabular}{|c|c|} \hline Kernel name & Power $p$ \\ \hline\hline Uniform (also called ``boxcar'') & 0 \\ \hline Epanechnikov & 1 \\ \hline Biweight (also called ``quartic'') & 2 \\ \hline Triweight & 3 \\ \hline Quadweight & 4 \\ \hline % Quintweight & 5 \\ \hline \end{tabular} \end{center} \end{table} In the limit $N \rightarrow \infty$ and with proper choice of $h$ so that $h \rightarrow 0$ and $N h \rightarrow \infty$, $\hat{p}\sub{KDE}(x)$ becomes a~consistent estimate of $p(x)$ in terms of integrated squared error (ISE): \begin{equation} \mbox{ISE}(h) = \int_{-\infty}^{\infty} (\hat{p}\sub{KDE}(x|h) - p(x))^2 dx, \ \ \ \ \ \lim_{N \rightarrow \infty} \mbox{ISE}(h) = 0. \end{equation} For the method analysis purposes, it is useful to understand what happens when we reconstruct known densities $p(x)$. Here, another measure of distance between the true density and its estimator becomes indispensable, called ``mean integrated squared error'' (MISE): \begin{equation} \mbox{MISE}(h) = E(\mbox{ISE}(h)) = E \left( \int_{-\infty}^{\infty} (\hat{p}\sub{KDE}(x|h) - p(x))^2 dx \right), \label{eq:mise} \end{equation} where $E(...)$ stands for the expectation value over samples of $N$ points drawn from $p(x)$. While other distance measures can be defined (and may be more relevant for your problem), $\mbox{MISE}(h)$ is usually the easiest to analyze mathematically\footnote{Note that both ISE and MISE are not dimensionless and, therefore, not invariant under scaling transformations. 
It is legitimate to compare different $\hat{p}\sub{KDE}(x|h)$ with each other
if the underlying $p(x)$ is the same, but ISE or MISE comparison for
different $p(x)$ requires certain care if meaningful results are to be
obtained.}. A typical goal of such an analysis consists in finding the value
of $h$ which minimizes $\mbox{MISE}(h)$ for a~sample of given size, and
a~significant amount of effort has been devoted to such bandwidth
optimization studies (see, {\it e.g.,} Refs.~\cite{ref:bwopt, ref:loaderbw}
for a review). A large fraction of these studies employs a~simple MISE
approximation valid for large values of $N$ known as ``AMISE'' (asymptotic
MISE). This approximation includes just the two leading terms: the ``bias''
term, which increases with increasing bandwidth, and the ``variance'' term,
which decreases with increasing bandwidth, so that the bandwidth optimization
procedure is reduced to finding the best bias-variance trade-off.

For subsequent discussion, it will be useful to introduce the concept of
kernel order. Let's define the functional
\begin{equation}
\mu_{j}(f) = \int_{-\infty}^{\infty} x^j f(x) dx
\label{eq:mufunct}
\end{equation}
which is the $j$-th moment of $f(x)$ about 0. Then it is said that the kernel
$K(x)$ is of order $m$ if
\begin{equation}
\mu_0(K) = 1, \ \mu_j(K) = 0 \ \mbox{for} \ j = 1, ..., m-1, \ \mbox{and} \ \mu_m(K) \ne 0.
\end{equation}
It can be shown that, for one-dimensional KDE, the rate of AMISE convergence
to 0 is proportional to $N^{-\frac{2 m}{2 m + 1}}$ (see, for example,
section~2.8 of~\cite{ref:kernsmooth}). Therefore, kernels with high values of
$m$ should be preferred for large samples. At the same time, only for
$m = 2$ can the kernels be everywhere non-negative ({\it i.e.,} bona fide
densities). When $m > 2$ (so called ``high-order'' kernels), in order to have
$\mu_2(K) = 0$, $K(x)$ must become negative somewhere. Therefore, negative
values of $\hat{p}\sub{KDE}(x|h)$ also become possible, and a mechanism for
dealing with this problem must be specified. In NPStat, this problem is
normally taken care of by setting negative values of $\hat{p}\sub{KDE}(x|h)$
to 0 with subsequent renormalization of the density estimate so that its
integral is 1.

It is instructive to consider $p(x)$, $\mbox{EPDF}(x)$, and
$\hat{p}\sub{KDE}(x|h)$ in the frequency domain\footnote{The Fourier
transform of a probability density is called the ``characteristic function''
of the distribution.}. There, the Fourier transform of $K(x)$ acts as
a low-pass filter which suppresses the sampling noise present in
$\mbox{EPDF}(x)$ so that the result, $\hat{p}\sub{KDE}(x|h)$, has a better
match to the spectrum of $p(x)$. By Parseval's identity, this leads to
the reduction of the ISE. High-order kernels allow for a sharper frequency
cutoff in the filter. Fortunately, the precise shape of the $p(x)$ frequency
spectrum is relatively unimportant; it is only important that its
high-frequency components decay ``fast enough'' so that their suppression
together with the noise does not cause a significant distortion of
$\hat{p}\sub{KDE}(x|h)$ in comparison with $p(x)$. Powerful automatic
bandwidth selection rules can be derived from this type of analysis if
certain assumptions about the $p(x)$ spectrum decay rate at high frequencies
are satisfied~\cite{ref:chiu2, ref:chiumultivar}.
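Before moving on to the multivariate case, the following minimal sketch shows
a direct (and deliberately naive) evaluation of Eq.~\ref{eq:kde} with the
Epanechnikov kernel from Table~\ref{table:betakernels}. It is an illustration
only; NPStat itself discretizes the sample and performs the kernel
convolution with DFFT, as described below.
\begin{verbatim}
// Illustration only: naive evaluation of Eq. (eq:kde) at a single point,
// using the Epanechnikov kernel K(t) = 0.75 (1 - t^2) I(-1 < t < 1).
#include <vector>
#include <cmath>

double epanechnikov(double t)
{
    return std::fabs(t) < 1.0 ? 0.75*(1.0 - t*t) : 0.0;
}

double kdeAt(const std::vector<double>& sample, double x, double h)
{
    double sum = 0.0;
    for (double xi : sample)
        sum += epanechnikov((x - xi)/h);
    return sum/(sample.size()*h);
    // With a high-order (partially negative) kernel one would also clip
    // negative estimates to 0 and renormalize, as described above.
}
\end{verbatim}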
With a few modifications, the ideas described above can be translated to
multivariate settings. The kernel function becomes a multivariate density,
and the bandwidth parameter becomes, in general, the bandwidth matrix (for
example, the covariance matrix in the case of a multivariate Gaussian
kernel).

The NPStat package calculates $\hat{p}\sub{KDE}(x|h)$ on an equidistant grid
in one or more dimensions. Initially, the data sample is histogrammed using
a finely binned histogram and then convolved with a kernel using the Discrete
Fast Fourier Transform (DFFT). The number of histogram bins, $N_b$, should be
selected taking into account the following considerations:
\begin{thinlist}
\item The bins should be sufficiently small so that no ``interesting'' detail
will be missed due to discretization of the density.
\item The bins should be sufficiently small so that the expected optimal MISE
is significantly larger than the ISE due to density discretization. Detailed
exposition of this requirement can be found in Ref.~\cite{ref:discretizedkde}.
\item DFFT should be efficient for this number of bins. It is best to use
$N_b = 2^k$ bins in each dimension, where $k$ is a positive integer.
\end{thinlist}
The computational complexity of this method is
${\cal O}(N) + {\cal O}(N_b \ln N_b)$ which is usually much better than the
${\cal O}(N \times N_b)$ complexity of a ``naive'' KDE implementation.
However, for large sample dimensionalities $N_b$ can become very large, which
limits the applicability of this technique. There are other computational
methods (not yet in NPStat) which can work efficiently for high-dimensional
samples~\cite{ref:fastkde1, ref:fastkde2}.

The following NPStat functions and classes can be used to perform KDE and to
assist in bandwidth selection:

\cname{amiseOptimalBwGauss} (header file
``npstat/stat/amiseOptimalBandwidth.hh'') --- calculates the AMISE-optimal
bandwidth for fixed-bandwidth KDE with the Gaussian kernel as well as with
high-order kernels derived from the Gaussian. The following formula is
implemented:
\begin{equation}
h\sub{AMISE} = \left( \frac{R(K) (m!)^2}{2 m \mu_{m}^2(K) R(p^{(m)}) N} \right)^{\frac{1}{2 m + 1}}.
\label{eq:amisebw}
\end{equation}
In this formula, $R(f)$ denotes the functional
$R(f) = \int_{-\infty}^{\infty} f^2(x) dx$ defined for any square-integrable
function $f$, $\mu_{m}(f)$ is the functional defined in Eq.~\ref{eq:mufunct},
$K$ is the kernel, $m$ is the kernel order, $p^{(m)}$ is the $m$-th
derivative of the reconstructed density\footnote{Of course, in practice
the reconstructed density is usually unknown. It is expected that
a~look-alike density will be substituted.}, and $N$ is the number of points
in the sample. The expected AMISE corresponding to this bandwidth is
calculated from
\begin{equation}
\mbox{AMISE}(h\sub{AMISE}) = \frac{(2 m + 1) R(K)}{2 m \, h\sub{AMISE} \,N}.
\label{eq:bestamise}
\end{equation}
For the origin and further discussion of these formulae consult, for example,
section~2.8 in Ref.~\cite{ref:kernsmooth}. It is assumed that high-order
kernels are generated according to Eq.~\ref{eq:effk}.

\cname{amiseOptimalBwSymbeta} (header file
``npstat/stat/amiseOptimalBandwidth.hh'') --- calculates the AMISE-optimal
bandwidth and the corresponding expected AMISE according to
Eqs.~\ref{eq:amisebw} and~\ref{eq:bestamise} for fixed-bandwidth KDE with
kernels from the symmetric beta family as well as with high-order kernels
derived from symmetric beta distributions.
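The sketch below simply evaluates Eq.~\ref{eq:amisebw} for user-supplied
kernel and density functionals. It is not the interface of the NPStat
functions just mentioned (which compute the kernel-dependent quantities
internally); the function name and argument list here are arbitrary.
\begin{verbatim}
// Illustration only: direct evaluation of Eq. (eq:amisebw).
// RK = R(K), muMK = mu_m(K), Rpm = R(p^(m)), m = kernel order,
// N = sample size. Not the NPStat interface.
#include <cmath>

double amiseOptimalBandwidth(double RK, double muMK, double Rpm,
                             unsigned m, double N)
{
    double mFactorial = 1.0;
    for (unsigned i = 2; i <= m; ++i)
        mFactorial *= i;
    const double numerator = RK*mFactorial*mFactorial;
    const double denominator = 2.0*m*muMK*muMK*Rpm*N;
    return std::pow(numerator/denominator, 1.0/(2.0*m + 1.0));
}

// Example: the Gaussian kernel (m = 2, R(K) = 1/(2 sqrt(pi)), mu_2(K) = 1)
// with a Gaussian reference density (R(p'') = 3/(8 sqrt(pi) sigma^5))
// reproduces the familiar rule h = sigma*(4/(3 N))^(1/5).
\end{verbatim}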
\cname{amisePluginBwGauss} (header file ``npstat/stat/amiseOptimalBandwidth.hh'') --- Gaussian $p(x)$ is substituted in Eqs.~\ref{eq:amisebw} and~\ref{eq:bestamise} and the corresponding quantities are found for the Gaussian kernel as well as high-order kernels derived from Gaussian. \cname{amisePluginBwSymbeta} (header file ``npstat/stat/amiseOptimalBandwidth.hh'') --- Gaussian $p(x)$ is substituted in Eqs.~\ref{eq:amisebw} and~\ref{eq:bestamise} and the corresponding quantities are found for the kernels from the symmetric beta family as well as high-order kernels derived from symmetric beta distributions. \cname{miseOptimalBw} and \cname{gaussianMISE} methods of the \cname{GaussianMixture1D} class --- these methods calculate MISE-optimal bandwidth and the corresponding exact MISE for Gaussian mixture densities estimated with the Gaussian kernel (or high-order kernels derived from the Gaussian) according to the formulae from Ref.~\cite{ref:bwgaussmix}. \cname{ConstantBandwidthSmoother1D} --- This class implements fixed-bandwidth KDE for one-dimensional samples of points using Gaussian kernels or kernels from the symmetric beta family (including high-order kernels which are generated internally). Boundary effects can be alleviated by data mirroring. Kernel convolutions are performed by DFFT after sample discretization. \cname{ConstantBandwidthSmootherND} --- This class implements fixed-bandwidth KDE for multivariate histograms. Arbitrary density implementations which inherit from \cname{AbsDistributionND} can be used as weight functions for generating high-order kernels. Boundary effects can be alleviated by data mirroring. \cname{JohnsonKDESmoother} --- This class constructs adaptive bandwidth KDE estimates for one-dimensional samples. It operates in several steps: \begin{thinlist} \item The sample is discretized using centroid-preserving binning. \item Population mean, $\mu$, standard deviation, $\sigma$, skewness, $s$, and kurtosis, $k$, are estimated by the \cname{arrayShape1D} function. \item A distribution from the Johnson system with these values of $\mu$, $\sigma$, $s$, and $k$ is used as a template for the \cname{amiseOptimalBwSymbeta} (or \cname{amiseOptimalBwGauss}) function which calculates optimal constant bandwidth according to Eq.~\ref{eq:amisebw}. \item A pilot density estimate is built by KDE (using \cname{LocalPolyFilter1D} class) with the bandwidth determined in the previous step. \item The sample is smoothed by the \cname{variableBandwidthSmooth1D} function (described below) using this pilot estimate. \end{thinlist} This method achieves better MISE convergence rate without using high-order kernels (so that density truncation below zero is unnecessary). Give this method a serious consideration if you are working with one-dimensional samples, expect to reconstruct a unimodal distribution, and do not have to worry about boundary effects. \cname{KDECopulaSmoother} --- multivariate KDE in which extra care is applied in order to ensure that the estimation result is a bona fide copula ({\it i.e.,} that all of its marginals are uniform). This class should normally be used to smooth empirical copula densities constructed by \cname{empiricalCopulaHisto} or \cname{empiricalCopulaDensity}. The bandwidth can be supplied by the user or it can be chosen by cross-validation. In general, \cname{LOrPECopulaSmoother} will produce better results in this context, so \cname{KDECopulaSmoother} should be used only in case the slower speed of \cname{LOrPECopulaSmoother} is deemed unacceptable. 
\cname{KDEFilterND} --- A collection of \cname{KDEFilterND} objects which utilize common workspace and DFFT engine can be employed to perform cross-validation calculations and bandwidth scans. Such a collection is used, for example, by the \cname{KDECopulaSmoother} class described above. If you already know the bandwidth, the \cname{ConstantBandwidthSmootherND} class will likely be more convenient to use than this one. \cname{variableBandwidthSmooth1D} --- implements one-dimensional variable kernel adaptive KDE (which should not be confused with {\it local kernel}, or {\it balloon}, method --- see section~2.10 in Ref.~\cite{ref:kernsmooth} for further discussion). In general, this approach consists in assigning different bandwidth values to each sample point. In the \cname{variableBandwidthSmooth1D} function, this assignment is performed according to the formula \begin{equation} h_i = \frac{c}{{\hat \rho}^{\alpha}(x_i)}, \end{equation} where ${\hat \rho(x)}$ is a~pilot density estimate constructed, for example, by fixed-bandwidth KDE. Boundary kernel adjustments are performed automatically. The normalization constant $c$ is determined so that the geometric mean of $h_i$ equals to a~user-provided bandwidth. There are certain reasons to believe that the choice of power parameter $\alpha = 1/2$ is a~particularly good one~\cite{ref:abramson}. \cname{simpleVariableBandwidthSmooth1D} --- a high-level driver function for variable kernel adaptive KDE which uses kernels from the symmetric beta family and automatically generates a~pilot density estimate employing AMISE plugin bandwidth. \subsection{Local Orthogonal Polynomial Expansion (LOrPE)} \label{sec:lorpe} Local Orthogonal Polynomial Expansion (LOrPE) can be viewed as a convenient method for creating kernels with desired properties (including high-order kernels) and for eliminating the KDE boundary bias\footnote{If you are familiar with orthogonal series density estimators (OSDE), you can also view LOrPE as a~localized version of OSDE.}. LOrPE amounts to constructing a truncated expansion of the EPDF defined by Eq.~\ref{eq:edf} into orthogonal polynomial series near each point $x\sub{fit}$ where we want to build the initial density estimate: \begin{equation} \label{eq:expansion} \hat{f}\sub{LOrPE}(x|h) = \sum_{k=0}^{M}c_{k}(x\sub{fit}, h) P_{k}\left(\frac{x - \xf}{h}\right). \end{equation} The polynomials $P_{k}(x)$ are built to satisfy the normalization condition \begin{equation}\label{eq:norm0} \frac{1}{h} \int_{a}^{b} P_{j}\left(\frac{x - \xf}{h}\right)P_{k}\left(\frac{x - \xf}{h}\right) K\left(\frac{x - \xf}{h}\right)dx = \delta_{jk}, \end{equation} which is equivalent to \begin{equation} \label{eq:norm} \int_{(a - x\sub{fit})/h}^{(b - x\sub{fit})/h} P_{j}(y)P_{k}(y)K(y)dy = \delta_{jk}, \end{equation} where $\delta_{jk}$ is the Kronecker delta, $K(x)$ is a suitably chosen kernel function, and $[a, b]$ is the support interval of the estimated density. For commonly used kernels from the beta family (Epanechnikov, biweight, triweight, {\it etc}.), condition (\ref{eq:norm}) generates the normalized Gegenbauer polynomials (up to a common multiplicative constant) at points $\xf$ sufficiently deep inside the support interval, provided $h$ is small enough to guarantee that $(a-\xf)/h\leq -1$ and $(b-\xf)/h\geq 1$. If $\xf$ is sufficiently close to the boundaries of the density support $[a,b]$ then the polynomial system will vary depending on $\xf$, and the notation $P_{k}(\cdot,\xf)$ would be more appropriate. 
However, in the subsequent text this dependence on $\xf$ will be suppressed in order to simplify the notation. The expansion coefficients $c_{k}(x\sub{fit}, h)$ are determined by the usual scalar product of the expanded function with $P_{k}$: \begin{equation}\label{eq:convol} c_{k}(x\sub{fit}, h) = \frac{1}{h} \int \mbox{EPDF}(x) P_{k}\left( \frac{x - x\sub{fit}}{h} \right)K\left( \frac{x - x\sub{fit}}{h} \right)dx, \end{equation} which, after substituting $\mbox{EPDF}(x)$ from~\ref{eq:edf}, leads to \begin{equation}\label{eq:emp-ck} c_{k}(x\sub{fit}, h) = \frac{1}{Nh}\sum_{i=0}^{N-1}P_{k}\left( \frac{x_i - x\sub{fit}}{h} \right) K\left( \frac{x_i - x\sub{fit}}{h} \right). \end{equation} Note that the coefficients $c_{k}(x\sub{fit}, h)$ calculated according to Eq.~\ref{eq:convol} can be naturally interpreted as localized expectation values of orthogonal polynomials $P_{k}\left( \frac{x - x\sub{fit}}{h} \right)$ w.r.t. probability density $\mbox{EPDF}(x)$ in which localization weights are given by $\frac{1}{h} K\left( \frac{x - x\sub{fit}}{h} \right)$. The density estimate at $x\sub{fit}$ is defined by \begin{equation} \hat{p}\sub{LOrPE}(x\sub{fit}|h) = \max\{0, \hat{f}\sub{LOrPE}(x\sub{fit}|h)\}. \label{eq:trunc} \end{equation} In general, LOrPE does not produce a bona fide density (in this respect it is similar to the orthogonal series estimator), and after calculating the density for all $x\sub{fit}$ values one has to perform the overall renormalization. Equation~\ref{eq:expansion} can be usefully generalized as follows: \begin{equation}\label{eq:lorpe} \hat{f}\sub{LOrPE}(x|h) = \sum_{k=0}^{\infty} g(k) c_{k}(x\sub{fit}, h) P_{k}\left( \frac{x - x\sub{fit}}{h} \right). \end{equation} Here, $g(k)$ is a ``taper function''. Normally, $g(0) = 1$ and there is an integer $M$ such that $g(k) = 0$ for any $k > M$. The taper function suppresses high order terms gradually instead of using a sharp cutoff at $M$. When evaluated at points $x\sub{fit}$ which are sufficiently far away from the density support boundaries and if $K(x)$ is an even function, Eq.~\ref{eq:lorpe} is equivalent to a~kernel density estimator with the effective kernel \begin{equation}\label{eq:effk} K\sub{eff}(x) = K(x) \sum_{j=0}^{\infty} g(2 j) P_{2 j}(0) P_{2 j}(x). \end{equation} Moreover, if $g(k)$ is a step function, {\it i.e.,}~$g(k) = 1$ for all $k \le M$ and $g(k) = 0$ for all $k > M$, it can be shown that the effective kernel is of order $M+1$ if $M$ is odd and $M+2$ if $M$ is even~\cite{ref:placida}\footnote{For such kernels the sum in Eq.~\ref{eq:effk} can be reduced to a simple algebraic form via the Christoffel-Darboux identity. However, the resulting formula is not easy to evaluate in a numerically stable manner near $x = 0$.}. A slightly different modification of LOrPE is based on the following identity: \begin{equation}\label{eq:deltasum} \frac{1}{h} \sum_{j=0}^{\infty} P_{j}\left(\frac{x - x_i}{h}\right)P_{j}\left(0\right) K\left(\frac{x - x_i}{h}\right) = \delta(x - x_i). \end{equation} Substituting this into Eq.~\ref{eq:edf}, we obtain \begin{equation} \mbox{EPDF}(x) = \frac{1}{N h} \sum_{i=0}^{N-1} \sum_{j=0}^{\infty} P_{j}\left(\frac{x - x_i}{h}\right)P_{j}\left(0\right) K\left(\frac{x - x_i}{h}\right). 
\end{equation}
The modified density estimate is obtained from this expansion by introducing
a taper:
\begin{equation}
\hat{\hat{f}}\sub{LOrPE}(x\sub{fit}|h) = \frac{1}{N h} \sum_{i=0}^{N-1} \sum_{j=0}^{\infty} g(j) P_{j}\left(\frac{x\sub{fit} - x_i}{h}\right)P_{j}\left(0\right) K\left(\frac{x\sub{fit} - x_i}{h}\right)
\label{eq:lorpe2}
\end{equation}
and then truncation is handled just like in Eq.~\ref{eq:trunc}. For even
kernels and points $x\sub{fit}$ sufficiently far away from the density
support boundaries, Eq.~\ref{eq:lorpe2} is equivalent to Eq.~\ref{eq:lorpe}
evaluated at $x = x\sub{fit}$. However, this is no longer true near
the boundaries. Perhaps the easiest way to think about it is that, in
Eq.~\ref{eq:lorpe2}, an effective kernel is placed at the location of each
data point (and, in general, the shapes of these effective kernels depend on
the data point location $x_i$). All these kernels are then summed to obtain
the density estimates at all $x\sub{fit}$. On the other hand, in
Eq.~\ref{eq:lorpe} the effective kernel is placed at the location of each
point at which the density estimate is made (so that the effective kernel
shape depends on $x\sub{fit}$). This kernel generates the weights for each
data point, which are then summed to obtain the estimate. The difference
between these two approaches is usually ignored for KDE (simple KDE is unable
to deal with boundary bias anyway), but for LOrPE substantially different
results can be obtained near the boundaries. It is not obvious
{\it a priori} which density estimate is better: $\hat{\hat{f}}\sub{LOrPE}$
from Eq.~\ref{eq:lorpe2} or $\hat{f}\sub{LOrPE}$ from Eq.~\ref{eq:lorpe},
although some preliminary experimentation with simple distributions does
indicate that $\hat{f}\sub{LOrPE}$ typically results in smaller MISE. The
integral of the $\hat{\hat{f}}\sub{LOrPE}$ estimate on the complete density
support interval is automatically 1, which is not true for
$\hat{f}\sub{LOrPE}$. On the other hand, $\hat{f}\sub{LOrPE}$ admits
an appealing interpretation in terms of the local density
expansion~\ref{eq:lorpe} in which the localized expectation values of the
orthogonal polynomials $P_k$ are matched to their observed values in the data
sample (this also leads to automatic matching of localized distribution
moments about $x\sub{fit}$).

One-dimensional KDE with fixed kernel $K(x)$ has only one important parameter
which regulates the amount of smoothing: bandwidth $h$. LOrPE has two such
parameters: bandwidth $h$ and the highest polynomial order $M$ (or, in
general, the shape of the taper function). It is intuitively obvious that
polynomial modeling should result in a smaller bias than KDE for densities
with several (at least $M$) continuous derivatives, and that a~proper balance
of $h$ and $M$ should result in a better estimator overall. LOrPE
calculations remain essentially unchanged in multivariate settings: the only
difference is switching to multivariate orthogonal polynomial systems.
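To make Eqs.~\ref{eq:emp-ck} and~\ref{eq:expansion} concrete, the following
sketch evaluates the LOrPE estimate at a single point assumed to be far away
from the support boundaries, using the uniform (boxcar) kernel and a sharp
cutoff at $M = 2$. For this kernel, the polynomials of Eq.~\ref{eq:norm}
reduce to scaled Legendre polynomials, $P_k(t) = \sqrt{2k+1}\, L_k(t)$. This
is an illustration only; NPStat builds the boundary-corrected polynomials
numerically and operates on discretized data.
\begin{verbatim}
// Illustration only: Eqs. (eq:emp-ck) and (eq:expansion) at an interior
// point xfit, boxcar kernel K(t) = 0.5 I(-1 < t < 1), degrees up to M = 2.
#include <vector>
#include <cmath>

double lorpeInteriorEstimate(const std::vector<double>& sample,
                             double xfit, double h)
{
    const std::size_t N = sample.size();
    double c[3] = {0.0, 0.0, 0.0};               // coefficients c_k
    for (double xi : sample)
    {
        const double t = (xi - xfit)/h;
        if (std::fabs(t) < 1.0)
        {
            const double K = 0.5;                 // boxcar kernel value
            c[0] += K;                                    // P_0(t) = 1
            c[1] += std::sqrt(3.0)*t*K;                   // P_1(t)
            c[2] += std::sqrt(5.0)*0.5*(3.0*t*t - 1.0)*K; // P_2(t)
        }
    }
    for (int k = 0; k < 3; ++k)
        c[k] /= N*h;
    // Eq. (eq:expansion) at x = xfit: only the P_k(0) values are needed
    const double P0at0 = 1.0, P1at0 = 0.0, P2at0 = -std::sqrt(5.0)*0.5;
    const double f = c[0]*P0at0 + c[1]*P1at0 + c[2]*P2at0;
    return f > 0.0 ? f : 0.0;                     // truncation, Eq. (eq:trunc)
}
\end{verbatim}
The effective kernel generated by this sketch integrates to 1 and has
a vanishing second moment, in agreement with the statement that a sharp
cutoff at even $M$ produces a kernel of order $M + 2$.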
Even though LOrPE is equivalent to KDE far away from the density support
boundaries, LOrPE does not suffer from the boundary bias because
Eq.~\ref{eq:norm} automatically adjusts the shape of the orthogonal
polynomials near the boundary. This makes LOrPE applicable in a~wider set of
problems than KDE. In addition to just making better estimates of densities
with a~sharp cutoff at the boundary, LOrPE fixes the main problem with some
existing KDE-based methods which are rarely used in practice due to their
severe degradation from the boundary bias. Examples of such methods include
transformation kernel density estimation (see, for example, section 2.10.3
of~\cite{ref:kernsmooth}) in which the transformation target is the uniform
distribution, as well as separate estimation of the marginals and the copula
for multivariate densities.

Unfortunately, LOrPE improvements over KDE do not come without a price in
terms of the algorithmic complexity of the method. The density estimate can
no longer be represented as a simple convolution of the sample EPDF and
a kernel. Because of this, DFFT-based calculations are no longer sufficient.
In the LOrPE implementation within NPStat, simple algorithms are used instead
which perform pointwise convolutions. Their computational complexity scales
as ${\cal O}(N) + {\cal O}(N_b N_s)$, where $N_b$ is the number of bins in
the sample discretization histogram and $N_s$ is the number of bins of
the same length (area, volume, {\it etc}) inside the kernel support. For
large bandwidth values (or for kernels with infinite support) this
essentially becomes ${\cal O}(N) + {\cal O}(N_b^2)$ which can be
significantly slower than the KDE implementation based on DFFT.

The following NPStat classes and functions can be used to perform LOrPE and
to assist in choosing the bandwidth:

\cname{LocalPolyFilter1D} --- LOrPE for one-dimensional samples. A numerical
Gram-Schmidt procedure is used to build polynomials defined by
Eq.~\ref{eq:norm0} on an equidistant grid. A linear filter which combines
formulae~\ref{eq:convol} and~\ref{eq:lorpe} is then constructed for each grid
point $\xf$ from the density support region (the same ``central'' filter is
used for all $\xf$ points far away from the support boundaries). The
\cname{filter} method of the class can then be used to build density
estimates defined by Eq.~\ref{eq:lorpe} with $x = \xf$ from sample
histograms. Alternatively, the \cname{convolve} method can be used to make
estimates according to Eq.~\ref{eq:lorpe2}. If necessary, subsequent
truncation of the reconstructed densities below 0 together with
renormalization should be performed by the user.

\cname{WeightTableFilter1DBuilder} (header file
``npstat/stat/WeightTableFilter1DBuilder.hh'') --- Helper class designed to
work with \cname{LocalPolyFilter1D}. This helper constructs linear filters
out of arbitrary scanned weights utilizing orthogonal polynomials. If it is
necessary to introduce exclusion regions into the data (for example, for
the purpose of interpolating background density from sidebands),
the constructor of this class is the place where this can be done.

\cname{NonmodifyingFilter1DBuilder} (header file
``npstat/stat/WeightTableFilter1DBuilder.hh'') --- Helper class designed to
work with \cname{LocalPolyFilter1D}. This filter does not change the data
({\it i.e.}, this filter is represented by the unit matrix).

\cname{getBoundaryFilter1DBuilder} (header file
``npstat/stat/AbsFilter1DBuilder.hh'') --- This function can be used to
construct various filters that inherit from the
\cname{AbsBoundaryFilter1DBuilder} class. These filters differ by the kernel
adjustments they perform near the density support boundaries. Many filters of
this kind are declared in the ``npstat/stat/Filter1DBuilders.hh'' header
file.

\cname{PolyFilterCollection1D} --- A collection of \cname{LocalPolyFilter1D}
objects which can be used, for example, in bandwidth scans or in
cross-validation calculations.

\cname{LocalPolyFilterND} --- similar to \cname{LocalPolyFilter1D} but
intended for estimating multivariate densities.
\cname{SequentialPolyFilterND} --- similar to \cname{LocalPolyFilterND} but
each dimension is processed sequentially using 1-d filtering. Employs
a collection of \cname{LocalPolyFilter1D} objects, one for each dataset
dimension.

\cname{LOrPECopulaSmoother} --- multivariate LOrPE in which extra care is
applied in order to ensure that the estimation result is a bona fide copula
({\it i.e.,} that all of its marginals are uniform). This class should
normally be used to smooth empirical copula densities constructed by
\cname{empiricalCopulaHisto} or \cname{empiricalCopulaDensity}. The bandwidth
can be supplied by the user or it can be chosen by cross-validation. Less
reliable but faster calculations of this type can be performed with the
\cname{KDECopulaSmoother} class described in Section~\ref{sec:kde}.

\cname{SequentialCopulaSmoother} --- similar to \cname{LOrPECopulaSmoother}
but each dimension is processed sequentially using 1-d filtering.

\cname{NonparametricCompositeBuilder} --- a high-level API for estimating
multivariate densities by applying KDE or LOrPE separately to each marginal
and to the copula. This class builds \cname{CompositeDistributionND} objects
from collections of sample points.

\cname{symbetaLOrPEFilter1D} (header file
``npstat/stat/LocalPolyFilter1D.hh'') --- a convenience utility for building
one-dimensional LOrPE filters using kernels from the symmetric beta family
(including the Gaussian).

\cname{lorpeMise1D} (header file ``npstat/stat/lorpeMise1D.hh'') ---
calculates LOrPE MISE for an arbitrary known density according to
Eq.~\ref{eq:mise}. The support of the density is split into $M$ subintervals.
It is assumed that the sample points are distributed in these subintervals
according to the multinomial distribution. The covariance matrix of this
distribution is then propagated to the density estimation result. The trace
of the propagated covariance, multiplied by the width of the subinterval, is
added to the integrated squared bias in order to obtain the MISE. Of course,
this method reproduces Eq.~\ref{eq:mise} exactly only in the limit
$M \rightarrow \infty$, while in practice the $O(M^3)$ computational
complexity of the error propagation formulae (based on conventional matrix
multiplication) will necessitate a~reasonable choice of finite~$M$. I suggest
choosing $M$ in such a manner that the ISE introduced by the discretization
of the density is significantly smaller than the estimated MISE.

Further remarks on the theoretical development of LOrPE, as well as
a comparison of its performance with other density estimation techniques,
can be found in~\cite{ref:lorpe}.

\subsection{Density Estimation with Bernstein Polynomials}
\label{sec:bernstein}

Density representation by Bernstein polynomial series is an alternative
approach which can be used to alleviate the boundary bias problem of KDE.
Bernstein polynomials are defined as follows:
\begin{equation}
b_{m,n}(x) = C_n^m x^m (1 - x)^{n - m},
\label{eq:bpoly}
\end{equation}
where $m = 0, 1, ..., n$ and $C_n^m$ are the binomial coefficients:
\begin{equation}
C_n^m = \frac{n!}{m! (n-m)!}.
\end{equation}
In the density estimation context, Bernstein polynomials are often
generalized to non-integer values of $n$ and $m$ in which case $C_n^m$ is
replaced by $\frac{\Gamma(n + 1)}{\Gamma(m + 1) \Gamma(n - m +1)}$. Up to
normalization, this representation is equivalent to the beta distribution
with parameters $\alpha = m + 1$ and $\beta = n - m + 1$.
For notational simplicity, I will use the term ``Bernstein polynomials'' even
if $n$ and $m$ are not integers. There are two substantially different
variations of this density estimation method.

In the first scheme~\cite{ref:betakern}, Bernstein polynomials are used as
variable-shape kernels in a KDE-like formula:
\begin{equation}
\hat{f}\sub{B}(x|n) = \frac{n + 1}{N} \sum_{i=0}^{N-1} b_{m(x), n}(x_i),
\label{eq:betakernels}
\end{equation}
where it is assumed that the reconstructed density is supported on
the [0, 1] interval (naturally, this interval can be shifted and scaled as
needed). The requirement of asymptotic estimator consistency does not fix
$m(x)$ uniquely, and this mapping can be chosen in a~variety of ways.
In NPStat, the following relationship is implemented:
\begin{equation}
m(x) = \left\{ \begin{array}{ll}
c & \mbox{ if } x (n - 2 s) + s \leq c \\
x (n - 2 s) + s & \mbox{ if } c < x (n - 2 s) + s < n - c \\
n - c & \mbox{ if } x (n - 2 s) + s \geq n - c
\end{array} \right.
\end{equation}
This formula reproduces the simple mapping $m(x) = x n$ considered in
Ref.~\cite{ref:betakern} in the case $s = 0$, $c = 0$. The offset parameter
$s$ plays the role of the effective Bernstein polynomial degree used at
$x = 0$ and regulates the amount of boundary bias. Meaningful values of $s$
lie between $-1$ and 0. As $s \rightarrow -1$, the mean of the generalized
Bernstein polynomial kernels tends to $x$ so that the estimator becomes
uniformly unbiased for linear density functions. At the same time, the width
of the kernel tends to 0 at the boundary, which leads to an increase in
the estimator variance. As it makes little sense to use kernels whose width
is smaller than the discretization bin size, the cutoff parameter $c$ was
introduced. This cutoff effectively limits the kernel width from below in
a manner which preserves asymptotic consistency of the estimator (the useful
range of $c$ values is also $[-1, 0]$). In addition to appropriate selection
of the main bandwidth parameter $n$, proper choice of the parameters $s$ and
$c$ can significantly improve estimator convergence at the boundary.

In the second variation of this density estimation technique~\cite{ref:babu},
the polynomials are chosen based on the location of the observed points:
\begin{equation}
\hat{\hat{f}}\sub{B}(x|n) = \frac{n + 1}{N} \sum_{i=0}^{N-1} b_{m(x_i), n}(x).
\label{eq:bern2}
\end{equation}
For any $m$ and $n$, $\int_0^1 b_{m,n}(x) dx = \frac{1}{n + 1}$, so this
particular estimate is a bona fide density. Due to the reasons that will be
mentioned later in this subsection, it can be advantageous to keep integer
$m$ and $n$ in this approach. As in the case of $x$-dependent kernel shape,
there is some amount of freedom in the $m(x_i)$ assignment. For asymptotic
consistency we must require that $m(x_i)/n \rightarrow x_i$ as
$n \rightarrow \infty$. However, such assignments are not unique. One can
choose, for example, $m(x_i) = \lfloor x_i (n + 1) \rfloor$ as in
Ref.~\cite{ref:babu} (the symbol $\lfloor \cdot \rfloor$ stands for
the ``floor'' function), but it can also be useful to assign more than one
polynomial to $x_i$. The following scheme is implemented in NPStat in
addition to the $m(x_i)$ assignment just mentioned. First, an integer $k$ is
found such that $k + 0.5 \leq x_i (n + 1) < k + 1.5$. Then, if
$0 \leq k < n$,
\begin{equation}
m(x_i) = \left\{ \begin{array}{ll}
k & \mbox{ with weight } k + 1.5 - x_i (n + 1) \\
k + 1 & \mbox{ with weight } x_i (n + 1) - k - 0.5
\end{array} \right.
\end{equation}
If $k < 0$ then $m(x_i) = 0$ with weight 1, and if $k \ge n$ then
$m(x_i) = n$ with weight~1. The polynomial weighting schemes actually
implemented in the code are slightly more complicated as they take into
account data binning.

With integer values of $m$ and $n$, density estimates constructed according
to Eq.~\ref{eq:bern2} (or its weighted version just described) have
the important property of being positive doubly stochastic. This means that
not only $\int_0^1 \hat{\hat{f}}\sub{B}(x|n) dx = 1$ but also a sum of
an arbitrary number of separate $\hat{\hat{f}}\sub{B}(x|n)$ estimators will
be flat in $x$ as long as the sum of the $x_i$ values used to build all these
estimators is itself flat. If the $x_i$ values are flat between 0 and 1 then
the assigned $m$ values will be flat between 0 and $n$ (inclusive). Then
double stochasticity follows directly from the partition of unity property of
Bernstein polynomials:
$\sum_{m=0}^n b_{m,n}(x) = \sum_{m=0}^n C_n^m x^m (1 - x)^{n - m} = [x + (1 - x)]^n = 1$.
A collection of positive doubly stochastic estimators can be used for copula
filtering by sequentially applying these estimators in each dimension (with
the help of the \cname{SequentialPolyFilterND} class). If the initial data
set processed by this sequence represents a copula density (for example, in
case it is an empirical copula density histogram), the result is also
guaranteed to be a copula density. In NPStat, any density estimator
implemented via the \cname{LocalPolyFilter1D} class can be turned into
the closest (in some sense) doubly stochastic estimator by calling
the \cname{doublyStochasticFilter} method of that class. Non-negative
estimators will be converted into non-negative doubly stochastic estimators
using an iterative procedure similar to the one described
in~\cite{ref:sinkhorn}, while filters with negative entries will be converted
into generalized doubly stochastic filters according to the method described
in~\cite{ref:khoury}.

The following facilities are provided by NPStat for estimating densities with
Bernstein polynomials and beta distribution kernels:

\cname{BetaFilter1DBuilder} --- Constructs linear filters for
the \cname{LocalPolyFilter1D} class according to Eq.~\ref{eq:betakernels}.

\cname{BernsteinFilter1DBuilder} --- Constructs linear filters for
\cname{LocalPolyFilter1D} according to Eq.~\ref{eq:bern2}. These filters are
intended for use with the \cname{convolve} method of
the \cname{LocalPolyFilter1D} class rather than the \cname{filter} method.

\cname{betaKernelsBandwidth} --- This function estimates the optimal
bandwidth (the order $n$ of the generalized Bernstein polynomial) for
Eq.~\ref{eq:betakernels} according to the AMISE calculation presented
in~\cite{ref:betakern}. This bandwidth estimate should not be taken very
seriously for realistic sample sizes, as the finite-sample performance of
Bernstein polynomial methods is not well understood.

It is worth mentioning that there are other ways to create doubly stochastic
filters. For example, the filter which represents the discretized Green's
function for the homogeneous 1-d heat equation with the Neumann boundary
conditions is doubly stochastic. This particular filter can be constructed
with the help of the \cname{symbetaLOrPEFilter1D} function, using
the Gaussian kernel and specifying 0 for the degree of the LOrPE polynomial
as well as \textsf{BM\_FOLD} for the boundary handling option. Subsequently,
the \cname{convolve} method of this filter should be utilized to smooth
the data.
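As an illustration of Eq.~\ref{eq:bern2}, the following sketch uses the
simplest assignment $m(x_i) = \lfloor x_i (n + 1) \rfloor$ (clamped to $n$)
for a sample supported on $[0, 1]$. It is not the NPStat implementation:
\cname{BernsteinFilter1DBuilder} operates on binned data and supports the
weighted $m(x_i)$ assignment described above.
\begin{verbatim}
// Illustration only: Eq. (eq:bern2) with m(x_i) = floor(x_i (n + 1)),
// clamped to n, for a sample on [0, 1].
#include <vector>
#include <cmath>

double bernsteinBasis(unsigned m, unsigned n, double x)
{
    // b_{m,n}(x) = C_n^m x^m (1 - x)^(n - m), Eq. (eq:bpoly)
    double binom = 1.0;
    for (unsigned i = 0; i < m; ++i)
        binom *= static_cast<double>(n - i)/(i + 1);
    return binom*std::pow(x, m)*std::pow(1.0 - x, n - m);
}

double bernsteinEstimate(const std::vector<double>& sample,
                         unsigned n, double x)
{
    double sum = 0.0;
    for (double xi : sample)
    {
        unsigned m = static_cast<unsigned>(std::floor(xi*(n + 1)));
        if (m > n) m = n;               // guard against xi == 1
        sum += bernsteinBasis(m, n, x);
    }
    return (n + 1.0)*sum/sample.size();
}
\end{verbatim}
Because each basis polynomial integrates to $1/(n+1)$ on $[0, 1]$, this
estimate is automatically normalized to 1.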
\subsection {Using Cross-Validation for Choosing the Bandwidth} \label{sec:crossval} Cross-validation is a technique for adaptive bandwidth selection applicable to both KDE and LOrPE. Two types of cross-validation are supported by NPStat: least squares and pseudo-likelihood. The least squares cross-validation is based on the following idea. The MISE from Eq.~\ref{eq:mise} can be written as \begin{eqnarray*} \mbox{MISE}(h) & = & E \left( \int_{-\infty}^{\infty} (\hat{p}(x|h) - p(x))^2 dx \right) \\ & = & E \left( \int_{-\infty}^{\infty} \hat{p}^2(x|h) dx \right) - 2 E \left( \int_{-\infty}^{\infty} \hat{p}(x|h) p(x) dx \right) + \int_{-\infty}^{\infty} p^2(x) dx. \label{eq:lsqcv} \end{eqnarray*} The last term, $\int_{-\infty}^{\infty} p^2(x) dx$, does not depend on $h$, so minimization of MISE is equivalent to minimization of $B(h) \equiv E \left( \int_{-\infty}^{\infty} \hat{p}^2(x|h) dx - 2 \int_{-\infty}^{\infty} \hat{p}(x|h) p(x) dx \right).$ Of course, $p(x)$ itself is unknown. However, it can be shown (as in section 3.4.3 of~\cite{ref:silverman}) that an unbiased estimator of $B(h)$ can be constructed as \begin{equation} \mbox{LSCV}(h) = \int_{-\infty}^{\infty} \hat{p}^2(x|h) dx - \frac{2}{N} \sum_{i=0}^{N-1} \hat{p}_{-1,i}(x_i, h), \label{eq:lscv} \end{equation} where $\hat{p}_{-1,i}(x, h)$ is a ``leaving-one-out'' density estimator to which the point at $x_i$ does not contribute. For example, in the case of KDE this estimator is defined by \begin{equation} \hat{p}_{-1,i}(x|h) = \frac{1}{(N-1) \, h} \sum_{\stackrel{j=0}{j \ne i}}^{N-1} K \left(\frac{x - x_j}{h}\right). \end{equation} Minimization of $\mbox{LSCV}(h)$ can lead to a reasonable bandwidth estimate, $h\sub{LSCV}$. However, as $h\sub{LSCV}$ is itself a random quantity, its convergence towards the bandwidth that optimizes MISE, $h\sub{MISE}$, is known to be rather slow. Moreover, $\mbox{LSCV}(h)$ can have multiple minima, so its minimization is best carried out by simply scanning $h$ within a certain range in the proximity of some value $h^{*}$ suggested by plug-in methods. For more information on the issues related to the least squares cross-validation see Refs.~\cite{ref:silverman, ref:kernsmooth}. The pseudo-likelihood cross-validation (also sometimes called likelihood cross-validation) is based on maximizing the ``leaving-one-out'' likelihood: \begin{equation} \mbox{PLCV}(h) = \prod_{i=0}^{N-1} \hat{p}_{-1,i}(x_i, h) \end{equation} The criterion of maximum $\mbox{PLCV}(h)$ can be obtained by minimizing an~approximate Kullback-Leibler distance between the density and its estimate~\cite{ref:silverman}. Maximizing $\mbox{PLCV}(h)$ is only appropriate in certain situations. In particular, whenever $\hat{p}_{-1,i}(x_i, h)$ becomes 0 even for a~single point, this criterion fails to produce meaningful results. Its use is also problematic for densities with infinite support due to the strong influence fluctuations in the distribution tails exert on $\mbox{PLCV}(h)$. Because of these problems, NPStat implements a ``regularized'' version of $\mbox{PLCV}(h)$ defined by \begin{equation} \mbox{RPLCV}(h) = \prod_{i=0}^{N-1} \max \left(\hat{p}_{-1,i}(x_i, h),\, \frac{\hat{p}\sub{self,$i$}(x_i, h)}{N^{\alpha}}\right), \label{eq:rplcv} \end{equation} where $\alpha$ is the regularization parameter chosen by the user ($\alpha = 1/2$ usually works reasonably well) and $\hat{p}\sub{self,$i$}(x, h)$ is the contribution of the data point at $x_i$ into the original density estimator. 
For KDE, this contribution is \begin{equation} \hat{p}\sub{self,$i$}(x|h) = \frac{1}{N h} K \left(\frac{x - x_i}{h}\right). \end{equation} If the bandwidth is fixed, $\hat{p}\sub{self,$i$}(x_i|h) = K(0)/(Nh)$ for every point $x_i$. $\hat{p}\sub{self,$i$}(x_i, h)$ changes from one point to another for LOrPE and for variable-bandwidth KDE. Cross-validation in the NPStat package is implemented for discretized KDE and LOrPE density estimators. It is assumed that the optimal bandwidth corresponds to the maximum of some quantity, as in the case of $\mbox{RPLCV}(h)$. All classes which perform cross-validation for univariate densities inherit from the abstract base class \cname{AbsBandwidthCV1D}. For multivariate densities, the corresponding base class is \cname{AbsBandwidthCVND} (both of these base classes are declared in the header file ``npstat/stat/AbsBandwidthCV.hh''). The following concrete classes can be used: \cname{BandwidthCVLeastSquares1D} --- implements Eq.~\ref{eq:lscv} for KDE and LOrPE in 1-d. \cname{BandwidthCVLeastSquaresND} --- implements Eq.~\ref{eq:lscv} for multivariate KDE and LOrPE. \cname{BandwidthCVPseudoLogli1D} --- implements Eq.~\ref{eq:rplcv} for KDE and LOrPE in 1-d. \cname{BandwidthCVPseudoLogliND} --- implements Eq.~\ref{eq:rplcv} for multivariate KDE and LOrPE. \noindent The cross-validation classes are used internally by such high-level classes as \cname{KDECopulaSmoother}, \cname{LOrPECopulaSmoother}, and \cname{SequentialCopulaSmoother}. \subsection {The Nearest Neighbor Method} Using NPStat tools, a simple density estimation algorithm similar to the $k$-nearest neighbor method~\cite{ref:silverman} can be implemented for one-dimensional samples as follows: \begin{thinlist} \item Discretize the data using a finely binned histogram. \item Convert this histogram into a distribution by constructing a \cname{BinnedDensity1D} object. \item For any point $x$ at which a density estimate is desired, calculate the corresponding cumulative distribution value $y = F(x)$. \item For some interval $\Delta$, $0 < \Delta < 1$, estimate the density at $x$ by \begin{equation} \hat{p}\sub{NN}(x) = \frac{\Delta}{q(y + \Delta/2)\, - \,q(y - \Delta/2)}, \end{equation} where $q(y)$ is the quantile function: $q(F(x)) = x$. This formula assumes $y - \Delta/2 \ge 0$ and $y + \Delta/2 \le 1$. If $y + \Delta/2 > 1$ then the $\hat{p}\sub{NN}(x)$ denominator should be replaced by $q(1)\, - \,q(1 - \Delta)$, and if $y - \Delta/2 < 0$ then the denominator should become $q(\Delta)\, - \,q(0)$. \end{thinlist} In this approach, the parameter $\Delta$ plays the same role as the $k/N$ ratio in the standard $k$-nearest neighbor method. For best results, $\Delta$ should scale with the number of sample points as $N^{-1/5}$, and the optimal constant of proportionality depends on the estimated density itself~\cite{ref:silverman}. The $k$-nearest neighbor method (and its modification just described) is not recommended for estimating densities with infinite support as it leads to a diverging density integral. For multivariate samples, a similar estimate can be constructed with the help of \cname{HistoNDCdf} class. Its method \cname{coveringBox} can be used to find the smallest $d$-dimensional box with the given center and fixed proportions which encloses the desired sample fraction. 
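The following sketch implements the $\Delta$-based estimate from the list
above, using a sorted copy of the sample as a crude, piecewise-linear
empirical quantile function instead of the \cname{BinnedDensity1D} lookup
table employed by NPStat. It is an illustration of the formula only.
\begin{verbatim}
// Illustration only: nearest-neighbor-style density estimate based on
// an empirical quantile function, as described in the list above.
#include <algorithm>
#include <vector>

double empiricalQuantile(const std::vector<double>& sorted, double y)
{
    // piecewise-linear interpolation of the order statistics
    const std::size_t N = sorted.size();
    if (y <= 0.0) return sorted.front();
    if (y >= 1.0) return sorted.back();
    const double pos = y*(N - 1);
    const std::size_t i = static_cast<std::size_t>(pos);
    if (i + 1 >= N) return sorted.back();
    const double frac = pos - i;
    return sorted[i]*(1.0 - frac) + sorted[i + 1]*frac;
}

double nnDensity(std::vector<double> sample, double x, double delta)
{
    std::sort(sample.begin(), sample.end());
    // empirical cdf value y = F(x)
    const std::size_t below =
        std::lower_bound(sample.begin(), sample.end(), x) - sample.begin();
    const double y = static_cast<double>(below)/sample.size();
    double ylo = y - 0.5*delta, yhi = y + 0.5*delta;
    if (ylo < 0.0) { ylo = 0.0; yhi = delta; }        // boundary adjustments
    if (yhi > 1.0) { yhi = 1.0; ylo = 1.0 - delta; }
    return delta/(empiricalQuantile(sample, yhi) -
                  empiricalQuantile(sample, ylo));
}
\end{verbatim}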
\section{Nonparametric Regression}
\label{sec:npregression}

``Regression'' refers to a form of data analysis in which the behavior of
the dependent variable, called ``response'', is deduced as a function of
the independent variable, called ``predictor'', from a sample of
observations. The response values are considered random ({\it e.g.},
contaminated by noise) while the predictor values are usually assumed to be
deterministic. The purpose of the analysis is thus to determine the location
parameter of the response distribution (mean, median, mode, {\it etc}) as
a function of the predictor. In the NPStat algorithms, the response is always
assumed to be a univariate quantity while the predictor can be either
univariate or multivariate. In the discussion below, the predictor will be
denoted by ${\bf x}$, the response by $y$, $\mu({\bf x})$ will be used to
describe the response location function, and $\hat{\mu}({\bf x})$ will denote
an~estimator of $\mu({\bf x})$.

``Nonparametric regression'' refers to a form of regression analysis in which
no global parametric model is postulated for $\mu({\bf x})$. Instead, for
every ${\bf x}\sub{fit}$, $\mu({\bf x})$ is described in the vicinity of
${\bf x}\sub{fit}$ by a relatively simple model which is fitted using sample
points located nearby. Further discussion of $\mu({\bf x})$ estimation
depends critically on the assumptions which can be made about
the distribution of response values.

\subsection{Local Least Squares}

With the additional assumption of a Gaussian error distribution ({\it i.e.,}
$y_i = \mu({\bf x}_i) + \epsilon_i$, where $\epsilon_i$ are normally
distributed with mean 0 and standard deviation $\sigma_i$), the model fitting
can be efficiently performed by the method of local least squares. In this
method, $\hat{\mu}({\bf x})$ is found by minimizing the quantity:
\begin{equation}
\chi^2({\bf x}\sub{fit}, h) = \sum_{i=0}^{N-1} \left(\frac{y_i - \hat{\mu}({\bf x}_i|{\bf x}\sub{fit},h)}{\sigma_i}\right)^2 K_h({\bf x}_i - {\bf x}\sub{fit}).
\label{eq:localleastsq}
\end{equation}
Here, $h$ refers to one or more parameters which determine the extent of
the kernel $K_h({\bf x})$. In NPStat,
$\hat{\mu}({\bf x}|{\bf x}\sub{fit},h)$ is usually decomposed into orthogonal
polynomials. In the case of a univariate predictor,
\begin{equation}
\hat{\mu}(x|x\sub{fit},h) = \sum_{k=0}^{M} \hat{a}_k(x\sub{fit},h) P_k\left(\frac{x - x\sub{fit}}{h}\right),
\label{eq:regmodel}
\end{equation}
where the polynomials $P_k(x)$ are subject to the normalization
condition~\ref{eq:norm0}. The expansion coefficients
$\hat{a}_k(x\sub{fit},h)$ are determined from the equations
$\partial \chi^2(x\sub{fit},h)/ \partial \hat{a}_k = 0$, which leads to
$M + 1$ simultaneous equations for $k = 0, 1, ..., M$:
\begin{equation}
\sum_{i=0}^{N-1} \frac{1}{\sigma_i^2} \left( y_i - \sum_{j=0}^{M} \hat{a}_j(x\sub{fit},h) P_j\left(\frac{x_i - x\sub{fit}}{h}\right) \right) K\left(\frac{x_i - x\sub{fit}}{h}\right) P_k\left(\frac{x_i - x\sub{fit}}{h}\right) = 0.
\label{eq:regsystem}
\end{equation}
If the predictor values $x_i$ are specified on a regular grid of points, then
the discretized version of Eq.~\ref{eq:norm0} is just
\begin{equation}
\frac{h_g}{h} \sum_{i=0}^{N-1} P_j\left(\frac{x_i - x\sub{fit}}{h} \right) K\left(\frac{x_i - x\sub{fit}}{h}\right) P_k\left(\frac{x_i - x\sub{fit}}{h}\right) = \delta_{jk},
\end{equation}
where $h_g$ is the distance between any two adjacent values of $x_i$.
This leads to a particularly simple solution for $\hat{a}_k(x\sub{fit},h)$ if
the model can be assumed at least locally homoscedastic ({\it i.e.,} if all
$\sigma_i$ are the same in the vicinity of $x\sub{fit}$):
\begin{equation}
\hat{a}_k(x\sub{fit},h) = \frac{h_g}{h} \sum_{i=0}^{N-1} y_i K\left(\frac{x_i - x\sub{fit}}{h}\right) P_k\left(\frac{x_i - x\sub{fit}}{h}\right).
\label{eq:regconvol}
\end{equation}
Substituting this into Eq.~\ref{eq:regmodel}, one gets
\begin{equation}
\hat{\mu}(x\sub{fit} | h) = \frac{h_g}{h} \sum_{i=0}^{N-1} \sum_{k=0}^{M} y_i K\left(\frac{x_i - x\sub{fit}}{h}\right) P_k\left(\frac{x_i - x\sub{fit}}{h}\right) P_k(0).
\label{eq:effreg}
\end{equation}
If $K(x)$ is an even function and $x$ is far away from the boundaries of
the interval on which the regression is performed, Eq.~\ref{eq:effreg} is
equivalent to the well-known Nadaraya-Watson estimator,
\begin{equation}
\hat{\mu}_{NW}(x\sub{fit} | h) = \frac{\sum_{i=0}^{N-1} y_i K\sub{eff}\left(\frac{x_i - x\sub{fit}}{h}\right)}{\sum_{i=0}^{N-1} K\sub{eff}\left(\frac{x_i - x\sub{fit}}{h}\right)},
\end{equation}
with the effective kernel given by
\begin{equation}
K\sub{eff}(x) = \sum_{k=0}^{M} P_{k}(0) P_{k}(x) K(x).
\end{equation}
Just as in the case of LOrPE, a taper function can be introduced for this
kernel, which leads to Eq.~\ref{eq:effk}.

If the predictor values $x_i$ are arbitrary or if the model cannot be
considered homoscedastic even locally, there is no simple formula which
solves the linear system~\ref{eq:regsystem}. In this case the equations must
be solved numerically. To perform this calculation, NPStat calls appropriate
routines from LAPACK~\cite{ref:lapack}.
% Modeling the response by orthogonal polynomial series
% (as opposed to monomial expansion) helps to prevent ill-conditioning
% of the system and significantly improves the numerical stability
% of the solutions.
Generalization of local least squares methods to multivariate predictors is
straightforward: one simply switches to multivariate kernels and polynomial
systems.

The following NPStat classes can be used to perform local least squares
regression of locally homoscedastic polynomial models on regular grids:

\cname{LocalPolyFilter1D}, \cname{LocalPolyFilterND}, and
\cname{SequentialPolyFilterND} --- these classes have already been mentioned
in Section~\ref{sec:lorpe}. It turns out that LOrPE of discretized data
essentially amounts to local least squares regression on histogram bin
contents. To see that, compare Eqs.~\ref{eq:regconvol} and~\ref{eq:convol}.
Up to an overall normalization constant, Eq.~\ref{eq:regconvol} is just
a discretized version of Eq.~\ref{eq:convol}. In fact, all three
``PolyFilter'' classes actually perform local least squares regression which,
in the signal analysis terminology, is a linear filtering procedure. If
the result is to be treated as a density, it has to be truncated below zero
and renormalized by the user.
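To make the connection between Eq.~\ref{eq:effreg} and kernel regression
concrete, the following sketch implements the Nadaraya-Watson estimator,
which coincides with the local least squares model above for $M = 0$. Here
a Gaussian weight is used; this is an illustration only and not one of the
NPStat classes.
\begin{verbatim}
// Illustration only: Nadaraya-Watson estimator (M = 0 local least squares),
// with an unnormalized Gaussian localizing weight.
#include <vector>
#include <cmath>

double nadarayaWatson(const std::vector<double>& x,
                      const std::vector<double>& y,
                      double xfit, double h)
{
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        const double t = (x[i] - xfit)/h;
        const double w = std::exp(-0.5*t*t);   // localizing weight
        num += w*y[i];
        den += w;
    }
    return num/den;   // local weighted mean of the response
}
\end{verbatim}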
The predictor/response data can be provided by a method of some class (callback) which is sometimes more convenient than using just a~grid of points. \cname{LocalQuadraticLeastSquaresND} --- this class fits local least squares regression models for arbitrary multivariate predictor values (no longer required to be on a grid). The models can be heteroscedastic, as in the most general case of Eq.~\ref{eq:regsystem}. The polynomials can be at most quadratic. Calculation of the gradient and the hessian of the local response surface is supported. \subsection{Local Logistic Regression} \label{sec:locallog} For regressing binary response variables, NPStat implements a method known as ``local quadratic logistic regression'' (LQLR). This method is a trivial extension of the local linear logistic regression originally proposed in~\cite{ref:localreg}. In this type of analysis, ``response'' is the probability of success in a Bernoulli trial, estimated as a function of one or more predictors. Due to the manner in which it is often used, this probability will be called ``efficiency'' for the remainder of this section. In the LQLR model, the efficiency dependence on ${\bf x}$ is represented by \begin{equation} \epsilon({\bf x}) = \frac{1}{1 + e^{-P({\bf x})}}, \end{equation} where $P({\bf x})$ is a multivariate quadratic polynomial whose coefficients are determined at each predictor value ${\bf x}\sub{fit}$ by maximizing the local log-likelihood: \begin{equation} L({\bf x}\sub{fit}, h) = \sum_{i=0}^{N-1} K_h({\bf x}_{i} - {\bf x}\sub{fit}) \left[y_i \ln (\hat{\epsilon}({\bf x}_{i})) + (1 - y_i) \ln (1 - \hat{\epsilon}({\bf x}_{i})) \right]. \label{eq:lqlrlogli} \end{equation} Here, $K_h({\bf x}_{i} - {\bf x}\sub{fit})$ is a~suitable localizing kernel which decays to 0 when ${\bf x}_{i}$ is far from ${\bf x}\sub{fit}$, and $y_i$ are the observed values of the Bernoulli trial: 1 if the point ``passes'' and 0 if it ``fails''. The local log-likelihood~\ref{eq:lqlrlogli} is very similar to the one implemented in the Locfit package~\cite{ref:locfit}; the only difference is that orthogonal polynomials are used in NPStat to construct $P({\bf x})$ series instead of monomials. Setting partial derivatives of $L({\bf x}\sub{fit}, h)$ with respect to polynomial coefficients to 0 results in a system of nonlinear equations for these coefficients. Solving such a system of equations does not appear to be any easier than dealing with the original log-likelihood optimization problem by applying, say, the Levenberg-Marquardt algorithm~\cite{ref:lvm}. NPStat includes facilities for efficient calculation of the LQLR log-likelihood together with its gradient with respect to $P({\bf x})$ expansion coefficients. This code does not rely on any specific optimization solver, and can be easily interfaced to a~number of different external optimization tools. The relevant classes are: \cname{LogisticRegressionOnKDTree} (header file ``npstat/stat/LocalLogisticRegression.hh'') --- this class calculates the log-likelihood from Eq.~\ref{eq:lqlrlogli} in the assumption that the kernel $K_h({\bf x})$ has finite support. In this case, iterating over all $N$ points in the sample becomes rather inefficient: there is no reason to cycle over values of ${\bf x}_{i}$ far away from ${\bf x}\sub{fit}$ because $K_h({\bf x}_{i} - {\bf x}\sub{fit})$ is identically zero for all such points.
To automatically restrict the iterated range, the predictor values are arranged into a space-partitioning data structure known as $k$-$d$~tree~\cite{ref:kdtree} which is implemented in NPStat with the \cname{KDTree} class (header file ``npstat/nm/KDTree.hh''). \cname{LogisticRegressionOnGrid} (header file ``npstat/stat/LocalLogisticRegression.hh'') --- this class calculates the log-likelihood from Eq.~\ref{eq:lqlrlogli} in the assumption that the predictor values are histogrammed. Two identically binned histograms must be available: the one filled with all predictor points ${\bf x}_{i}$ regardless of the value of $y_i$ (``the denominator'') and the one which collects only those ${\bf x}_{i}$ for which $y_i = 1$ (``the numerator''). Naturally, only the bins sufficiently close to ${\bf x}\sub{fit}$ are processed when the LQLR log-likelihood is evaluated for these histograms. The ``interfaces'' directory of the NPStat package includes two high-level driver functions for fitting LQLR response surfaces (header file ``npstat/interfaces/minuitLocalRegression.hh''). These functions employ the general-purpose optimization package Minuit~\cite{ref:minuit} for maximizing the log-likelihood. The names of the functions are \cname{minuitUnbinnedLogisticRegression} (intended for use with \cname{LogisticRegressionOnKDTree}) and \cname{minuitLogisticRegressionOnGrid} (for use with \cname{LogisticRegressionOnGrid}). To use these functions, the Minuit package has to be compiled and linked together with the user code which provides the data and calls the functions. \subsection{Local Quantile Regression} \label{sec:localqreg} The method of least squares allows us to solve the problem of response mean determination in the regression context. By recasting calculation of the sample mean as a minimization problem, we have gained the ability to condition the mean on the value of the predictor. In a~similar manner, the method of least absolute deviations can be used to determine the conditional median. In the method of least squares, the expression $S(f) = \sum_{i} f(y_i - \hat{\mu}({\bf x_i}))$ is minimized, with $f(t) = t^2$. The method of least absolute deviations differs only by setting $f(t) = |t|$. Moreover, just as the problem of response mean determination can be localized by introducing a kernel in the predictor space (resulting in local least squares, as in Eq.~\ref{eq:localleastsq}), the problem of response median determination can be subjected to the same localization treatment. As a method of determination of response location, local median regression is extremely robust (insensitive to outliers). Not only the median but an arbitrary distribution quantile can also be determined in this manner. The corresponding function to use is \begin{equation} f_q(t) = q \,t \,I(t \ge 0) - (1 - q) \,t \,I(t < 0), \end{equation} where $q$ is the cumulative distribution value of interest, $0 < q < 1$. You can easily convince yourself of the validity of this statement as follows. For a sample of points $\{y_0, ..., y_{N-1}\}$, define $t = y - y_q$, where $y_q$ is a parameter on which $S(f_q)$ depends. The condition for the minimum of $S(f_q)$, $\frac{d S(f_q)}{d y_q} = 0$, is then equivalent to $\frac{d S(f_q)}{d t} = 0$. As $\frac{d f_q(t)}{d t} = q \,I(t > 0) - (1 - q) \,I(t < 0)$, the minimum of $S(f_q)$ is reached when \begin{equation} q \sum_{i=0}^{N-1} I(y_i > y_q) - (1 - q) \sum_{i=0}^{N-1} I(y_i < y_q) = 0.
\label{eq:qmin} \end{equation} This equation is solved when the number of sample points for which $y_i > y_q$ is $(1 - q) N$ and the number of sample points for which $y_i < y_q$ is $q N$. But this is precisely the definition of the sample quantile which, in the limit $N \rightarrow \infty$, becomes the population quantile of interest. Unfortunately, solving Eq.~\ref{eq:qmin} numerically in the regression context is usually significantly more challenging than solving the corresponding $\chi^2$ minimization problem. For realistic finite samples, $\frac{d S(f_q)}{d y_q}$ is not a continuous function, and Eq.~\ref{eq:qmin} can have multiple solutions. This basically rules out the use of standard gradient-based methods for $S(f_q)$ minimization. The following NPStat classes facilitate the solution of the local quantile regression problem: \cname{QuantileRegression1D} --- calculates the expression to be minimized (also called the ``loss function'') for a single univariate predictor value $x\sub{fit}$. This expression looks as follows: \begin{equation} S\sub{LQR}(x\sub{fit}) = \sum_{i=0}^{N-1} f_q(y_i - \hat{y}_q(x_i|x\sub{fit})) w_i, \end{equation} where weights $w_i$ are provided by the user (for example, these weights can be calculated as $w_i = K((x_i - x\sub{fit})/h)$, but more sophisticated weighting schemes can be applied as well). The quantile dependence on the predictor is modeled by \begin{equation} \hat{y}_q(x|x\sub{fit}) = \sum_{k=0}^{M} \hat{a}_k(x\sub{fit}) P_k\left(\frac{x - x\sub{fit}}{h}\right), \label{eq:quantpred} \end{equation} where $P_k(x)$ are either Legendre or Gegenbauer polynomials. The use of Legendre polynomials is appropriate for a global fit of the regression curve, while Gegenbauer polynomials are intended for use in combination with symmetric beta kernels that generate localizing weights~$w_i$. The expansion coefficients $\hat{a}_k(x\sub{fit})$ are to be determined by minimizing $S\sub{LQR}(x\sub{fit})$. \cname{QuantileRegressionOnKDTree} (header file ``npstat/stat/LocalQuantileRegression.hh'') --- calculates the quantile regression loss function for a multivariate predictor (the class method which returns it is called \cname{linearLoss}): \begin{equation} S\sub{LQR}({\bf x}\sub{fit}, h) = \sum_{i=0}^{N-1} f_q(y_i - \hat{y}_q({\bf x}_i|{\bf x}\sub{fit})) K_h({\bf x}_i - {\bf x}\sub{fit}). \label{eq:quantregnd} \end{equation} The quantile dependence on the predictor is modeled by an expansion similar to Eq.~\ref{eq:quantpred} in which multivariate orthogonal polynomials (up to second order) are generated by $K_h({\bf x})$ used as the weight function. The expansion coefficients are to be determined for each ${\bf x}\sub{fit}$ separately by minimizing $S\sub{LQR}({\bf x}\sub{fit}, h)$. The predictor values are arranged into a~$k$-$d$~tree structure for reasons similar to those mentioned when the \cname{LogisticRegressionOnKDTree} class was described. \cname{QuantileRegressionOnHisto} (header file ``npstat/stat/LocalQuantileRegression.hh'') --- calculates loss function~\ref{eq:quantregnd} in the assumption that the response is histogrammed on a regular grid of predictor values (so that the input histogram dimensionality is larger by one than the dimensionality of the predictor variable). 
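As a simple numerical illustration of the check function $f_q$ and of the loss $S\sub{LQR}$ for the local-constant model ($M = 0$), the following self-contained sketch scans candidate values of the local quantile estimate and confirms that the loss is minimized at the (weighted) sample quantile. It does not use any NPStat classes; the toy sample, the weights, and the helper names are invented purely for this illustration.
\begin{verbatim}
// Stand-alone illustration of the check function f_q and of the
// local-constant (M = 0) version of the quantile regression loss.
// This is not NPStat code.
#include <cstdio>
#include <vector>

// f_q(t) = q*t*I(t >= 0) - (1 - q)*t*I(t < 0)
double fq(double q, double t)
{
    return t >= 0.0 ? q*t : -(1.0 - q)*t;
}

// Weighted loss for a constant model yq (hypothetical helper)
double lqrLoss(double q, double yq, const std::vector<double>& y,
               const std::vector<double>& w)
{
    double s = 0.0;
    for (unsigned i = 0; i < y.size(); ++i)
        s += fq(q, y[i] - yq)*w[i];
    return s;
}

int main()
{
    // Toy sample and localizing weights (e.g., from a kernel)
    const std::vector<double> y = {0.1, 0.4, 0.5, 0.7, 2.0};
    const std::vector<double> w(y.size(), 1.0);
    const double q = 0.5;

    // Brute-force scan: the minimum of the loss is reached at the
    // weighted sample quantile (here, the sample median 0.5)
    double best = 1.0e300, bestYq = 0.0;
    for (double yq = 0.0; yq <= 2.0; yq += 0.001)
    {
        const double s = lqrLoss(q, yq, y, w);
        if (s < best) { best = s; bestYq = yq; }
    }
    std::printf("argmin of the loss is near %g\n", bestYq);
    return 0;
}
\end{verbatim}
Note that the outlier at 2.0 does not shift the minimizer, which illustrates the robustness of quantile-based location estimates mentioned above.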
\cname{CensoredQuantileRegressionOnKDTree} (header ``npstat/stat/CensoredQuantileRegression.hh'') --- this class constructs an appropriate quantile regression loss function in case some of the response values are unavailable due to censoring ({\it i.e.,} there is a cutoff from above or from below on the response values). For example, this situation can occur in modeling the jet response of a particle detector when jets with visible energy below a certain cutoff are not reconstructed due to limitations in the clustering procedure. It is assumed that the censoring efficiency ({\it i.e.,} the fraction of points not affected by the cutoff) can be determined as a~function of the predictor by other techniques, like the local logistic regression described in Section~\ref{sec:locallog}. It is also assumed that the cutoff value is known as a~function of the predictor, and that the presence of this cutoff is the only reason for inefficiency. The appropriate loss function in this case is \begin{equation} S\sub{CLQR}({\bf x}\sub{fit}, h) = \sum_{i=0}^{N\sub{p}-1} \left[f_q(y_i - \hat{y}_q({\bf x}_i|{\bf x}\sub{fit})) + \frac{1 - \epsilon_i}{\epsilon_i} g_q(\epsilon_i, y_{\mbox{\scriptsize cut},i} - \hat{y}_q({\bf x}_i|{\bf x}\sub{fit}))\right] K_h({\bf x}_i - {\bf x}\sub{fit}), \label{eq:clqr} \end{equation} where the summation is performed only over the $N\sub{p}$ points in the sample surviving the cutoff. $\epsilon_i$ is the censoring efficiency for the given ${\bf x}_i$. The function $g_q(\epsilon, t)$ is defined differently for right-censored (R-C) samples in which surviving points are below the cutoff and left-censored (L-C) samples in which surviving points are above the cutoff: \begin{equation} g_{q,\mbox{\scriptsize R-C}}(\epsilon, t) = I(q < \epsilon) f_q(t) + I(q \ge \epsilon) \left(\frac{q - \epsilon}{1 - \epsilon} f_q(t) + \frac{1 - q}{1 - \epsilon} f_q(+\infty) \right) \label{eq:clqrrc} \end{equation} \begin{equation} g_{q,\mbox{\scriptsize L-C}}(\epsilon, t) = I(q > 1 - \epsilon) f_q(t) + I(q \le 1 - \epsilon) \left( \frac{1 - \epsilon - q}{1 - \epsilon} f_q(t) + \frac{q}{1 - \epsilon} f_q(-\infty) \right) \label{eq:clqrlc} \end{equation} Formulae~\ref{eq:clqr}, \ref{eq:clqrrc}, and~\ref{eq:clqrlc} were inspired by Ref.~\cite{ref:localquantreg}. They can be understood as follows. Consider, for example, a right-censored sample. If this sample were not censored, the value of the response cumulative distribution function at $y_{\mbox{\scriptsize cut},i}$ would be equal to $\epsilon_i$. For each point with response $y_i$ below the cutoff, there are $(1 - \epsilon_i)/\epsilon_i$ unobserved points above the cutoff (and $1/\epsilon_i$ points total). Before localization, points below the cutoff contribute the usual amount $f_q(y_i - \hat{y}_q({\bf x}_i|{\bf x}\sub{fit}))$ into $S\sub{CLQR}({\bf x}\sub{fit}, h)$ (compare with Eq.~\ref{eq:quantregnd}). To the points above the cutoff, we assign a response value which equals either $y_{\mbox{\scriptsize cut},i}$ or $+\infty$, in such a manner that the estimate $\hat{y}_q({\bf x}_i|{\bf x}\sub{fit})$ is ``pushed'' in the right direction and by the right amount when it crosses $y_{\mbox{\scriptsize cut},i}$. If the chosen quantile is less than the efficiency, all unobserved points are assigned the response value $y_{\mbox{\scriptsize cut},i}$ (this corresponds to the term $I(q < \epsilon) f_q(t)$ in Eq.~\ref{eq:clqrrc}).
This is because we know that the correct value of $\hat{y}_q({\bf x}_i|{\bf x}\sub{fit})$ should be below $y_{\mbox{\scriptsize cut},i}$, so the penalty for placing $\hat{y}_q({\bf x}_i|{\bf x}\sub{fit})$ above the cutoff is generated by all points, including the unobserved ones. If the chosen quantile is larger than the efficiency, the only thing we know is that the correct value of $\hat{y}_q({\bf x}_i|{\bf x}\sub{fit})$ should be inside the interval $(y_{\mbox{\scriptsize cut},i}, +\infty)$. There is no reason to prefer any particular value from this interval, so the overall contribution of sample point $i$ for which $\hat{y}_q({\bf x}_i|{\bf x}\sub{fit}) \in (y_{\mbox{\scriptsize cut},i}, +\infty)$ into $S\sub{CLQR}({\bf x}\sub{fit}, h)$ must not depend on $\hat{y}_q({\bf x}_i|{\bf x}\sub{fit})$\footnote{If most points ${\bf x}_i$ are like that for some ${\bf x}\sub{fit}$ value, the fit becomes unreliable. Consider increasing the bandwidth of the localization kernel or avoiding such situations altogether. You should not expect to obtain good results for high (low) $q$ values and all possible ${\bf x}\sub{fit}$ in a right (left)-censored sample.}. We do, however, want to prevent $\hat{y}_q({\bf x}_i|{\bf x}\sub{fit})$ from leaving this interval. This is achieved by placing so many points at $y_{\mbox{\scriptsize cut},i}$ that the fraction of sample points (including both observed and unobserved ones) at or below $y_{\mbox{\scriptsize cut},i}$ is exactly $q$, and this is precisely what the term proportional to $I(q \ge \epsilon)$ does in Eq.~\ref{eq:clqrrc}. Similar reasoning applied to a left-censored sample leads to Eq.~\ref{eq:clqrlc}. Naturally, in the computer program, $-\infty$ and $+\infty$ in Eqs.~\ref{eq:clqrrc} and~\ref{eq:clqrlc} should be replaced by suitable user-provided numbers which are known to be below and above, respectively, all possible response values. In order to avoid deterioration of the numerical precision of $S\sub{CLQR}({\bf x}\sub{fit}, h)$, these numbers should not be very different from the minimum and maximum observed response. \cname{CensoredQuantileRegressionOnHisto} (header ``npstat/stat/CensoredQuantileRegression.hh'') --- this class calculates the loss function~\ref{eq:clqr} in the assumption that all information about response, efficiency, and cutoffs is provided on a~regular grid in the space of predictor values. The ``interfaces'' directory of the NPStat package includes several high-level driver functions for performing local quantile regression. These driver functions use the simplex minimization method of the Minuit package to perform local fitting of the quantile curve expansion coefficients. The \cname{minuitLocalQuantileRegression1D} function uses the \cname{QuantileRegression1D} class internally to perform local quantile regression with one-dimensional predictors. This driver function can be used, for example, to construct Neyman belts from numerical simulations of some statistical estimator. A similar function, \cname{weightedLocalQuantileRegression1D}, can be used to perform local quantile regression with one-dimensional predictors when the points are weighted. The \cname{minuitQuantileRegression} driver function can use one of the \cname{QuantileRegressionOnKDTree}, \cname{QuantileRegressionOnHisto}, \cname{CensoredQuantileRegressionOnKDTree}, or \cname{CensoredQuantileRegressionOnHisto} classes (all of which inherit from a common base) to perform local quantile regression with multivariate predictors.
The function \cname{minuitQuantileRegressionIncrBW} has similar functionality, but it can also automatically increase the localization kernel bandwidth so that not less than a certain predefined fraction of the whole sample participates in the local quantile determination for each ${\bf x}\sub{fit}$. \subsection{Iterative Local Least Trimmed Squares} The NPStat package includes an implementation of an iterative local least trimmed squares (ILLTS) algorithm applicable when predictor/response values are supplied on a regular grid of points in a multivariate predictor space (think image denoising). The algorithm operation consists of the following steps: \begin{enumerate} \item Systems of orthogonal polynomials up to user-defined degree $M$ are constructed using weight functions $K_{h,-j}({\bf x})$. These weight functions are defined using symmetric finite support kernels in which the point in the kernel center together with $j - 1$ other points are set to 0. Imagine, for example, a $5 \times 5$ grid in two dimensions. Start with the uniform $K_h({\bf x})$ kernel which is 1 at every grid point. The $K_{h,-1}({\bf x})$ weight function is produced from $K_h({\bf x})$ by setting the central grid cell to 0. 24 weight functions of type $K_{h,-2}({\bf x})$ are produced from $K_{h,-1}({\bf x})$ by setting one other grid cell to 0, in addition to the central one. 276 weight functions of type $K_{h,-3}({\bf x})$ are produced from $K_{h,-1}({\bf x})$ by choosing 2 out of 24 remaining cells which are set to 0, and so on. \item Local least squares regression is performed using all polynomial systems generated by $K_{h,-j}({\bf x})$ weights for all possible positions of the kernel inside the predictor grid (sliding window)\footnote{Edge effects are taken into account by constructing special polynomial systems which use boundary kernels similar to $K_{h,-j}({\bf x})$ but with different placement of zeros.}. For each kernel position, we find the polynomial system which produces the best $\chi^2$ calculated over grid points for which the weight function is not 0. For this polynomial system, we determine the value $\Delta = |y_c - y_{c,\mbox{\scriptsize fit}}|$ for the kernel center, where $y_c$ is the response value in the data sample and $y_{c,\mbox{\scriptsize fit}}$ is the response value produced by the local least squares fit. \item The position of the kernel is found for which $\Delta$ is the largest in the whole data sample. \item The response for the position found in the previous step is adjusted by setting it to the value fitted at the kernel center. \item $\Delta$ is recalculated for all kernel positions affected by the adjustment performed in the previous step. \item The previous three steps are repeated until some stopping criterion is satisfied. For example, the requirement that the largest $\Delta$ in the grid becomes small (below a certain cutoff) can serve as such a stopping criterion. \end{enumerate} The ILLTS algorithm works best if the fraction of outliers in the sample is relatively small and the response errors are homoscedastic. ILLTS is expected to be more efficient (in the statistical sense) than the local quantile regression. If the spectrum of response errors is not known in advance, it becomes very instructive to plot the history of $\Delta$ values for which response adjustments were performed. This plot often exhibits two characteristic ``knees'' which correspond to the suppression of outliers and suppression of ``normal'' response noise.
The procedure can be stopped somewhere between these knees and followed up by normal local least squares on the adjusted sample, perhaps utilizing a different bandwidth. Unfortunately, the computational complexity of the ILLTS algorithm increases exponentially with $j$, so only very small values of $j$ are practical. The inability to use high values of $j$ can be partially compensated for by choosing smaller bandwidth values and performing more iteration cycles. More cycles result in a larger {\it effective} bandwidth --- think, for example, what happens when you perform local least squares multiple times. For certain kernels like Gaussian or Cauchy ({\it i.e.,} stable distributions) and polynomial degree $M = 0$ (local constant fit), this is exactly equivalent to choosing a larger bandwidth. For other types of kernels, not only does the effective bandwidth increase with the number of passes but the effective kernel shape gets modified as well\footnote{ILLTS adds the dimension of time (adjustment cycle number) to the solution of the robust regression problem. It would be interesting to explore this, for example, by introducing a time-dependent bandwidth or by studying the connection between the ILLTS updating scheme and the numerous updating schemes employed in solving partial differential equations.}. The top-level API function for running the ILLTS algorithm is called \cname{griddedRobustRegression}. The $K_{h,-1}({\bf x})$ weights can be used by supplying an object of \cname{WeightedLTSLoss} type as its loss calculator argument, and $K_{h,-2}({\bf x})$ weights are used by choosing \cname{TwoPointsLTSLoss} instead. A simple stopping criterion based on the $\Delta$ value, local least trimmed squares $\chi^2$, or the number of adjustment cycles can be specified with an object of \cname{GriddedRobustRegressionStop} type. The \cname{griddedRobustRegression} implementation is rather general, and can accept user-developed loss calculators and stopping criteria. \subsection{Organizing Regression Results} It is usually desirable to calculate the regression surface on a reasonably fine predictor grid and save the result of this calculation for subsequent fast lookup. A general interface for such a lookup is provided by the \cname{AbsMultivariateFunctor} class (header file ``npstat/nm/AbsMultivariateFunctor.hh''). This class can be used to represent results of both parametric and nonparametric fits. The persistent classes \cname{StorableInterpolationFunctor} and \cname{StorableHistoNDFunctor} derived from \cname{AbsMultivariateFunctor} are designed to represent nonparametric regression results. Both of these classes assume that the regression was performed on a (hyper)rectangular grid of predictor points. \cname{StorableInterpolationFunctor} supports multilinear interpolation and extrapolation of results, with flexible extrapolation (constant or linear) for each dimension beyond the grid boundaries. This class essentially combines the \cname{AbsMultivariateFunctor} interface with the functionality of the \cname{LinInterpolatedTableND} class discussed in more detail in Section~\ref{sec:utilities}. The class \cname{StorableHistoNDFunctor} allows for constant, multilinear, and multicubic interpolation\footnote{Multicubic interpolation is supported for uniform grids only.} inside the grid boundaries, and for constant extrapolation outside the boundaries (the extrapolated response value is set to its value at the closest boundary point).
Both \cname{StorableInterpolationFunctor} and \cname{StorableHistoNDFunctor} support arbitrary transformations of the response variable via a user-provided functor. If a transformation was initially applied to the response values in order to simplify subsequent modeling, this is a good place to perform the inverse. \section{Unfolding with Smoothing} \label{sec:unfolding} In particle physics, the term {\it unfolding} is used to describe methods of nonparametric reconstruction of probability densities using observations affected by particle detector resolution and inefficiency (see~\cite{ref:unfoldingrev} for a contemporary review). In other natural sciences, the term {\it inverse problem} is commonly used~\cite{ref:numrecipes}, while in the statistical literature a more specific name, {\it deconvolution density estimate}, is becoming the norm~\cite{ref:deconvolutionbook}. \subsection{Unfolding Problem} For the purpose of stating the unfolding problem, it will be assumed that the detector can be described by an operator $K$. This operator (also called {\it kernel}, {\it transfer function}, {\it observation function}, or {\it response function}, depending on the author and context) converts probability densities $p({\bf x})$ in the physical process space ${\bf x}$ into the densities $q({\bf y})$ in the observation space ${\bf y}$: $q = K p \equiv \int K({\bf y}, {\bf x}) p({\bf x}) {\bf dx}$. The response function does not have to be fully efficient: $q$ does not have to integrate to 1 when $p$ is normalized. In the subsequent discussion, the operator $K$ will be assumed linear and exactly known but not necessarily invertible. In many situations of interest, observations are described by the empirical density function ({\it i.e.}, there is no error term associated with each individual observation): \begin{equation} \rho_e({\bf y}) = \frac{1}{N} \sum_{i=1}^{N} \delta({\bf y} - {\bf y}_i). \end{equation} In this case, the probability to observe a point at ${\bf y}_i$ is given by the normalized version of $q$ called $r$: $r = q/\epsilon$. In case $p$ is normalized, \begin{equation} \epsilon = \int q({\bf y}) \,{\bf dy} = \int K p \,{\bf dy} \label{eq:efficiency} \end{equation} is the overall detector acceptance for the physical process under study. The purpose of unfolding is to learn as much as possible about $p({\bf x})$ given $\rho_e({\bf y})$ when a~parametric model for $p({\bf x})$ is lacking. The difficulty of this problem can be easily appreciated from the following argument. $K$ typically acts as a low-pass filter. For measurements of a~single scalar quantity, it can often be assumed that the detector has resolution $\sigma$ and that $K(y, x) = {\cal N}(y - x, \sigma^2)$, so that the detector simply convolves $p(x)$ with the normal distribution. If $q$ were exactly known, the Fourier transform of $p$ could simply be obtained from $p(\omega) = q(\omega)/K(\omega)$. As $q(\omega)$ is not known, the closest available approximation is the characteristic function of $\rho_e(y)$: $\rho_e(\omega) = \int \rho_e(y) e^{i \omega y} dy = \frac{1}{N} \sum_{i=1}^{N} e^{i \omega y_i}$. As the characteristic function of the normal distribution is just $K(\omega) = e^{-\sigma^2 \omega^2/2}$, the ratio $\rho_e(\omega)/K(\omega)$ becomes arbitrarily large as $\omega \rightarrow \infty$.
The ``naive'' method of estimating $p(\omega)$ as $\rho_e(\omega)/K(\omega)$ thus fails miserably: the high frequency components of the noise contained in $\rho_e(\omega)$ are multiplied by an arbitrarily large factor so that $\rho_e(\omega)/K(\omega)$ is not even square-integrable. A number of effective approaches to solving the pure deconvolution problem just described are discussed in~\cite{ref:deconvolutionbook}. These approaches invariably involve introduction of additional smoothness assumptions about either $p(x)$ or $q(y)$ or both. Such assumptions essentially declare that the high frequency components of $p(x)$ are of little interest and, therefore, can be suppressed in the $\rho_e(\omega)/K(\omega)$ ratio (so that the inverse Fourier transform can exist). Introduction of new information by applying additional assumptions which make an~originally ill-posed problem treatable is called {\it regularization}. In the problems of interest for particle physics, the action of $K({\bf y}, {\bf x})$ on $p({\bf x})$ is usually more complicated than simple convolution. At the time of this writing, reconstruction of $p({\bf x})$ in particle physics applications is most often performed by either the SVD unfolding~\cite{ref:svdunfold} or the expectation-maximization (a.k.a. D'Agostini, or Bayesian) unfolding~\cite{ref:dagounfold}\footnote{In other disciplines, the utility of these methods was discovered earlier~\cite{ref:unfoldingrev}.}. Both of these methods start with the assumption that ${\bf x}$ and ${\bf y}$ spaces are binned, and that partitioning of these spaces is beyond the control of the method. In the SVD unfolding, one-dimensional ${\bf x}$ is assumed, and the regularization is performed by penalizing the discretized second derivative of the $p(x)$ density\footnote{This regularization technique has been reinvented many times. Depending on the problem, it is also called the {\it constrained linear inversion method}, the {\it Phillips-Twomey method}, the {\it Tikhonov regularization}, or the {\it ridge-parameter approach}~\cite{ref:numrecipes, ref:deconvolutionbook}.}. In the expectation-maximization unfolding, regularization usually consists in imposing a~subjective limit on the number of iteration cycles. This early stopping criterion penalizes deviations from the prior distribution used to start the iterations. However, due to the nonlinearity of the method, it is difficult to augment this statement with analytical derivations of the penalty applicable to arbitrary response functions. It should be appreciated that, for these methods, the sample binning itself serves as a part of the problem regularization. Just imagine making very wide bins --- this leads to a~response matrix which is almost diagonal and easily invertible. However, information about small structures within each bin is now lost. \subsection{EMS Unfolding} The unfolding method implemented in NPStat combines smoothing in the ${\bf x}$ space with the expectation-maximization iterations performed until convergence. This combination (abbreviated as EMS --- expectation-maximization with smoothing) has important advantages over SVD unfolding and expectation-maximization with early stopping: \begin{thinlist} \item{{\it Ad hoc} binning is no longer needed.
While the solution is still implemented on a grid, the cell width can be chosen sufficiently fine so that discretization does not affect the results.} \item{Precise response functions can be employed instead of their coarsely binned versions affected by prior distribution assumptions.} \item{Problem regularization can be unambiguously described in terms of the parameters of the smoothing filter used (bandwidth, {\it etc}).} \end{thinlist} While the empirical success of the EMS unfolding method has already been reported in the statistical literature~\cite{ref:nychka, ref:smoothedem, ref:dagounfold}, the procedures implemented in NPStat also address the issues of uncertainty estimation for the reconstructed distribution and of choosing the filter parameters according to an objective criterion. The method proceeds as follows. The standard expectation-maximization iterations update the reconstructed values of $p({\bf x})$ according to the formula~\cite{ref:nychka, ref:smoothedem} \begin{equation} \lambda_j^{(k+1)} = \frac{\lambda_j^{(k)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij} y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(k)}}. \end{equation} Here, $\lambda_j^{(k)}$ are the unnormalized $p({\bf x})$ values (event counts) discretized on a sufficiently fine grid in the physical process space ${\bf x}$ (whose cells are small in comparison with the typical size of response function features), obtained on a~$k$\supers{th} iteration. The index $j = 1, ..., m$ refers to the~{\it linearized} cell number in this (possibly multidimensional) grid. All $\lambda_j^{(0)}$ values (the starting point for the iterations) can normally be set to the same constant $c = N/(\epsilon \, m)$, where $N$ is the number of observed events and $\epsilon$ is the overall detector efficiency for a~constant $p({\bf x})$. $y_i$, $i = 1, ..., n$, denotes the number of observed events inside the cell with linearized index $i$ in the space of observations ${\bf y}$. Dimensionalities of the ${\bf x}$ and ${\bf y}$ spaces can be arbitrary and distinct. $K_{ij}$ is the discretized response matrix. It is the probability that an event from the physical cell $j$ causes an observation in the cell $i$ of the ${\bf y}$ space. $\epsilon_j = \sum_{i=1}^{n} K_{ij}$ is the detector efficiency for the physical cell $j$. These iterations are modified by introducing a smoothing step. The updating scheme becomes \begin{eqnarray} \lambda_j^{*(k+1)} & = & \frac{\lambda_j^{(k)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij} y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(k)}}, \\ \lambda_r^{(k+1)} & = & \alpha^{(k+1)} \sum_{j=1}^m S_{r j} \lambda_j^{*(k+1)}, \label{eq:semscheme} \end{eqnarray} where $S_{r j}$ is the {\it smoothing matrix}, and the smoothing step normalization constant, $\alpha^{(k+1)}$, preserves the overall event count obtained during the preceding expectation-maximization step (so that $\sum_{r=1}^m \lambda_r^{(k+1)} = \sum_{j=1}^m \lambda_j^{*(k+1)}$). The values $\lambda_j^{(\infty)}$ obtained upon iteration convergence are therefore solutions of the equation \begin{equation} \lambda_r^{(\infty)} = \alpha^{(\infty)} \sum_{j=1}^m S_{r j} \frac{\lambda_j^{(\infty)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij} y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}}, \label{eq:semeq} \end{equation} where $\alpha^{(\infty)} = \sum_{r=1}^m \lambda_r^{*(\infty)} / \sum_{r=1}^m \sum_{j=1}^m S_{r j} \lambda_j^{*(\infty)}$. 
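To make the updating scheme of Eq.~\ref{eq:semscheme} concrete, the following self-contained sketch iterates a toy four-cell problem until the values stabilize. The response matrix, the smoothing matrix, and the observed counts are invented solely for this illustration, and the code does not use any NPStat classes; it is a minimal demonstration of the formulas, not the NPStat implementation.
\begin{verbatim}
// Toy illustration of expectation-maximization iterations with a
// smoothing step (EMS). All matrices are small, dense, and made up.
#include <cstdio>
#include <vector>

int main()
{
    const unsigned m = 4, n = 4;

    // Toy discretized response matrix K[i][j]: probability that an
    // event from physical cell j is observed in cell i
    const double K[n][m] = {{0.7, 0.2, 0.0, 0.0},
                            {0.2, 0.6, 0.2, 0.0},
                            {0.0, 0.2, 0.6, 0.2},
                            {0.0, 0.0, 0.2, 0.7}};

    // Observed counts and efficiencies eps_j = sum_i K[i][j]
    const double y[n] = {25.0, 40.0, 35.0, 20.0};
    double eps[m];
    for (unsigned j = 0; j < m; ++j) {
        eps[j] = 0.0;
        for (unsigned i = 0; i < n; ++i) eps[j] += K[i][j];
    }

    // Toy doubly stochastic smoothing matrix S (nearest neighbors)
    const double S[m][m] = {{0.75, 0.25, 0.0,  0.0 },
                            {0.25, 0.5,  0.25, 0.0 },
                            {0.0,  0.25, 0.5,  0.25},
                            {0.0,  0.0,  0.25, 0.75}};

    // Uniform starting point for the lambda values
    std::vector<double> lam(m, 30.0), lamStar(m), next(m);

    for (unsigned cycle = 0; cycle < 200; ++cycle)
    {
        // Expectation-maximization step
        for (unsigned j = 0; j < m; ++j) {
            double sum = 0.0;
            for (unsigned i = 0; i < n; ++i) {
                double denom = 0.0;
                for (unsigned r = 0; r < m; ++r) denom += K[i][r]*lam[r];
                sum += K[i][j]*y[i]/denom;
            }
            lamStar[j] = lam[j]/eps[j]*sum;
        }

        // Smoothing step; alpha preserves the total event count
        double before = 0.0, after = 0.0;
        for (unsigned j = 0; j < m; ++j) before += lamStar[j];
        for (unsigned r = 0; r < m; ++r) {
            next[r] = 0.0;
            for (unsigned j = 0; j < m; ++j) next[r] += S[r][j]*lamStar[j];
            after += next[r];
        }
        const double alpha = before/after;
        for (unsigned r = 0; r < m; ++r) lam[r] = alpha*next[r];
    }

    for (unsigned j = 0; j < m; ++j)
        std::printf("lambda[%u] = %g\n", j, lam[j]);
    return 0;
}
\end{verbatim}
In a realistic application the number of cells is, of course, much larger and the iterations are stopped by a convergence test rather than by a fixed cycle count.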
The equation for the error propagation matrix, $J_{rs} \equiv \frac{\partial \lambda_r^{(\infty)}}{\partial y_s}$, can be obtained by differentiating Eq.~\ref{eq:semeq} w.r.t. $y_s$. In the matrix notation, this equation is \begin{equation} {\bf J} = (\alpha^{(\infty)} {\bf S} + {\bf A}) \left({\bf M} + {\bf B J}\right), \label{eq:errprop} \end{equation} where \begin{eqnarray} A_{j q} & = & \frac{\left(1 - \alpha^{(\infty)} \sum_{r=1}^m S_{rq} \right) \lambda_j^{(\infty)}}{\sum_{i=1}^{m} \lambda_i^{(\infty)}}, \label{eq:errpropmatricesa} \\ B_{j q} & = & \frac{\delta_{j q}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{ij } y_i}{\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}} - \frac{\lambda_j^{(\infty)}}{\epsilon_j} \sum_{i=1}^{n} \frac{K_{iq} K_{ij} y_i}{\left(\sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}\right)^2},\\ M_{j q} & = & \frac{\lambda_j^{(\infty)}}{\epsilon_j} \frac{K_{q j }}{\sum_{\rho=1}^{m} K_{q\rho} \lambda_{\rho}^{(\infty)}}. \end{eqnarray} The NPStat code solves the equivalent equation, $({\bf I} - (\alpha^{(\infty)} {\bf S} + {\bf A}) {\bf B}) \, {\bf J} = (\alpha^{(\infty)} {\bf S} + {\bf A}) {\bf M}$, using the LU factorization algorithm as implemented in LAPACK~\cite{ref:lapack}, and then runs iterative refinement cycles defined by Eq.~\ref{eq:errprop} until convergence. For unweighted samples, the covariance matrix of observations, ${\bf V}$, can be derived automatically by NPStat according to either Poisson or multinomial distribution assumptions using $\hat{y}_i = \sum_{\rho=1}^{m} K_{i\rho} \lambda_{\rho}^{(\infty)}$ as mean values. In more complicated situations, the user is expected to provide the covariance matrix of observations\footnote{This may require running the unfolding code twice, first to obtain $\lambda_{\rho}^{(\infty)}$, and then, when the covariance matrix is constructed externally, to propagate the uncertainties. Note that unfolding with the expectation-maximization algorithm intrinsically assumes Poisson distribution of the observed counts. If the covariance matrix of observations is highly inconsistent with this assumption, it will be impossible to interpret the result as a maximum likelihood estimate.}. The covariance matrix of unfolded values is then estimated according to ${\bf J} {\bf V} {\bf J}\supers{T}$. While the original expectation-maximization algorithm is agnostic about the dimensionalities of ${\bf x}$ and ${\bf y}$ spaces, the smoothing step is, of course, dimensionality-specific. The unfolding code is using smoothing filters constructed with the help of facilities described in Section~\ref{sec:densityestimation}. The following classes implement unfolding with smoothing: \cname{SmoothedEMUnfold1D} (header file ``npstat/stat/SmoothedEMUnfold1D.hh'') --- unfolds one-dimensional distributions using objects of \cname{LocalPolyFilter1D} class as smoothing filters. \cname{SmoothedEMUnfoldND} (header file ``npstat/stat/SmoothedEMUnfoldND.hh'') --- unfolds multivariate distributions. Dimensionalities of the ${\bf x}$ and ${\bf y}$ spaces can be arbitrary and distinct. Dimensionality of the smoothing filter is, of course, expected to be consistent with the structure of ${\bf x}$. Typically, the filters will be objects of either \cname{LocalPolyFilterND} or \cname{SequentialPolyFilterND} class, adapted for unfolding use by the \cname{UnfoldingFilterND} template. 
The \cname{SmoothedEMUnfoldND} code employs a more efficient implementation of the response matrix than the \cname{Matrix} class used by the rest of the NPStat code, but Eq.~\ref{eq:errprop} is still solved using normal dense matrices. Therefore, the practically usable number of cells in the discretization of the ${\bf x}$ space will be at most a~few thousand, as the computational complexity of the algorithm based on dense matrices is proportional to this number cubed. \subsection{Choosing the Smoothing Parameters} For likelihood-based inference, a useful model selection principle is provided by the Akaike information criterion (AIC)~\cite{ref:aic}. The AIC criterion adjusted for the finite sample size is~\cite{ref:modelsel} \begin{equation} AIC_c = -2 \ln L + 2 k + \frac{2 k (k + 1)}{N - k - 1}, \label{eq:aicc} \end{equation} where $L$ is the maximized value of the model likelihood, $N$ is the sample size, and $k$ is the number of parameters in the model. Selecting a model by minimizing $AIC_c$ avoids overfitting by reaching a compromise between complexity of the model and goodness-of-fit. Application of the $AIC_c$ criterion to the EMS unfolding procedure is, however, not completely straightforward. While calculation of the likelihood can be accomplished assuming a Poisson distribution in the space of observations, it is not immediately obvious how to count the number of parameters in the model. To overcome this difficulty, NPStat assumes that the number of model parameters can be estimated as the {\it effective rank} of the ${\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T}$ matrix. This assumption is based on the following reasoning\footnote{One can also argue that the ${\bf K} {\bf J}$ matrix plays a role similar to that of the hat matrix in linear regression problems. This leads to the same conclusion about the number of model parameters.}. The covariance matrix of the fitted folded values ({\it i.e.}, $\hat{y}_i$) is ${\bf V}_{\hat{y}}({\bf V}) = {\bf K} {\bf J} {\bf V} {\bf J}\supers{T} {\bf K}\supers{T}$. If, using polynomial series, one fits multiple independent samples of random points taken from the uniform distribution, with the number of points per sample varying according to the Poisson distribution, the rank of the covariance matrix of the fitted unnormalized density values calculated over these samples will be equal to the degree of the fitted polynomial plus one. This is precisely the number of parameters of the fitted model. It does not matter how many abscissae are used to construct the covariance matrix of the fitted values as long as the number of abscissae exceeds the degree of the polynomial and the average number of points in a sample is ``sufficiently large''. While the model fitted to the observed values by the EMS unfolding method is not polynomial, we can still identify some measure of the rank of ${\bf V}_{\hat{y}}({\bf I}) = {\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T}$ with the number of model parameters. Two different definitions of the effective rank of a symmetric positive-semidefinite matrix (say, {\bf Q}) are implemented. The first one is the exponent of the von Neumann entropy of ${\bf Q}/\mbox{tr} ({\bf Q})$. In terms of the ${\bf Q}$ eigenvalues, $e_i$, it is expressed as\footnote{Naturally, $\mbox{erank}_1({\bf Q})$ is also the exponent of the Shannon entropy of the normalized eigenspectrum.} \begin{equation} \mbox{erank}_1({\bf Q}) = \exp \left\{ -\sum_{i=1}^n \frac{e_i}{\|e\|} \ln \left( \frac{e_i}{\|e\|} \right) \right\}, \ \ \ \|e\| = \sum_{i=1}^n e_i.
\end{equation} The second is the ratio of the matrix trace to the largest eigenvalue: \begin{equation} \mbox{erank}_2({\bf Q}) = \frac{\mbox{tr} ({\bf Q})}{\max_{1 \le i \le n} e_i} = \frac{\|e\|}{\max_{1 \le i \le n} e_i}. \end{equation} These effective ranks are calculated by the \cname{symPSDefEffectiveRank} method of the NPStat \cname{Matrix} class. From some initial experimentation with simple models, it appears that setting $k$ in Eq.~\ref{eq:aicc} to either $\mbox{erank}_1({\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T})$ or $\mbox{erank}_2({\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T})$ works well, as both of these ranks have similar derivatives w.r.t. filter bandwidth. The corresponding $AIC_c$ criteria will be called $EAIC_c$ ($E$ in this abbreviation stands for ``entropic'') and $TAIC_c$ ($T$ stands for ``trace''). For any smoothing filter, the most important parameter regulating its functionality is the bandwidth. NPStat provides the following classes which facilitate the studies of $AIC_c$ criteria and unfolding performance vs. filter bandwidth: \cname{UnfoldingBandwidthScanner1D} (header ``npstat/stat/UnfoldingBandwidthScanner1D.hh'') --- simplifies and speeds up studies of 1-d EMS unfolding results obtained with multiple bandwidth values. Filters constructed for different bandwidth settings are stored internally and do not have to be recomputed when multiple observed samples are processed. Effective ranks of the ${\bf K} {\bf J} {\bf J}\supers{T} {\bf K}\supers{T}$ matrices are memoized as well. The bandwidth which optimizes either the $EAIC_c$ or the $TAIC_c$ criterion can be automatically searched for by calling the \cname{processAICcBandwidth} method of this class. Typical class usage is illustrated by the ``ems\_unfold\_1d.cc'' example program located in the ``examples/C++'' subdirectory of the NPStat package. \cname{UnfoldingBandwidthScannerND} (header ``npstat/stat/UnfoldingBandwidthScannerND.hh'') --- assists in studies of multivariate EMS unfolding results obtained with multiple bandwidth settings. The \cname{SequentialPolyFilterND} class is used internally to generate multivariate filters according to the user-provided bandwidth values in each dimension of the ${\bf x}$ space. The filters and the effective ranks are memoized, but there is no automatic procedure to determine the optimal bandwidth set\footnote{The \cname{UnfoldingBandwidthScannerND} class is designed to speed up repetitive multidimensional grid scans. For more sophisticated algorithms searching for a~function minimum in multiple dimensions, the strategy of memoizing a lot of intermediate results for each point considered will not be optimal.}. Use of an effective rank to determine the number of model parameters leads to the requirement that the number of discretization cells in the observation space ${\bf y}$ should be substantially larger than this rank. This condition should be verified once the EMS unfolding is performed with the optimal filter. If the unfolded values of $p({\bf x})$ are expected to become close to 0 somewhere in the ${\bf x}$ region considered, it is important to choose a type of filter that guarantees the non-negativity of the smoothed result\footnote{The code will automatically truncate negative filtered values but this truncation will lead to distortions in the covariance matrix of unfolded results not taken into account by linear error propagation formulae.
A~highly inappropriate choice of filter can even break the expectation-maximization iterations by producing $\hat{y}_i = 0$ corresponding to a positive ${y}_i$ value.}. All LOrPE filters of degree zero have this property. \subsection{Large Problems with Sparse Matrices} The standard matrix multiplication methods and the LU factorization algorithm used to solve Eq.~\ref{eq:errprop} have $O(m^3)$ computational complexity, where $m$ is the number of cells used to discretize the physical process space. For programs running on a single processor, this complexity effectively limits $m$ to a~few thousands. However, in many practically important applications the response function is sufficiently short-ranged so that most elements of its discretized representation $K_{ij}$ are zeros. In such situations it may be advantageous to utilize techniques and algorithms developed for sparse matrices in order to increase $m$. NPStat provides a header-only EMS unfolding implementation based on sparse matrix arithmetic in the namespace ``emsunfold''. In order to take advantage of this code, two additional software packages have to be installed by the user. NPStat relies on Eigen~\cite{ref:eigen} for basic operations with sparse matrices and for solving sparse linear systems. The package TRLAN~\cite{ref:trlan} is used for calculating leading eigenvalues and eigenvectors of large covariance matrices\footnote{I am not aware of any method for calculating {\it all} eigenvalues and eigenvectors of a symmetric positive-semidefinite matrix, sparse or not, with computational complexity better than $O(m^3)$.}. The following classes implement EMS unfolding with sparse matrices: \cname{SmoothedEMSparseUnfoldND} (header ``npstat/emsunfold/SmoothedEMSparseUnfoldND.hh'') --- parallels the functionality of the \cname{SmoothedEMUnfoldND} class, with a few peculiarities related to the sparseness of various matrices. In particular, only Poisson uncertainties on the observed data can be calculated automatically, as multinomial covariances would not be sparse. User-provided uncertainties are restricted to diagonal matrices. The smoothing matrix should be doubly stochastic\footnote{That is, each row and column of the smoothing matrix should sum to 1, within the tolerance parameter of the unfolding algorithm. Construction of doubly stochastic filters with NPStat is discussed in Section~\ref{sec:bernstein}. Double stochasticity is not enforced by the code itself, as it is often useful to benchmark the results obtained with sparse matrices against an implementation utilizing dense matrices.}, otherwise Eq.~\ref{eq:errpropmatricesa} will inject a dense matrix into the system. \cname{SparseUnfoldingBandwidthScannerND} --- parallels the functionality of \cname{UnfoldingBandwidthScanner1D}, with additional input parameters and diagnostic outputs needed to drive determination of eigenvalues and eigenvectors of large covariance matrices. This class is declared in the header file ``npstat/emsunfold/SparseUnfoldingBandwidthScannerND.hh''. \section{Pseudo- and Quasi-Random Numbers} \label{sec:randomgen} The C++11 Standard~\cite{ref:cpp11} defines an API for generating pseudo-random numbers. Unfortunately, this API suffers from a disconnect between modeling of statistical distributions (including densities, cumulative distributions, {\it etc}) and generation of random numbers. 
Moreover, quasi-random numbers~\cite{ref:Niederreiter} useful for a large variety of simulation and data analysis purposes are not represented by the Standard, and there is no meaningful support for generating genuinely multivariate random sequences. NPStat adopts a different approach towards generation of pseudo-random, quasi-random, and non-random sequences --- one that is more appropriate in the context of a~statistical analysis package. A small number of high-quality generators is implemented for producing such sequences on a unit $d$-dimensional cube. All such generators inherit from the same abstract base class \cname{AbsRandomGenerator} (header file ``npstat/rng/AbsRandomGenerator.hh''). Generators developed or ported by users can be seamlessly added as well. Conversion of uniformly distributed sequences into other types of distributions and to different support regions is performed by the classes that represent statistical distributions --- in particular, by those classes which inherit from \cname{AbsDistribution1D} and \cname{AbsDistributionND} bases. Both of these bases have a virtual method \cname{random} which takes a sequence generator instance as an input and produces correspondingly distributed numbers (random or not) on output. By default, transformation of sequences is performed by the \cname{quantile} method for one-dimensional distributions and by the \cname{unitMap} method for multivariate ones. However, it is expected that the \cname{random} method itself will be overridden by the derived classes when it is easier, for example, to make a random sequence with the desired properties by the acceptance-rejection technique. The following functions and classes are implemented: \cname{MersenneTwister} (header file ``npstat/rng/MersenneTwister.hh'') --- generates pseudo-random numbers using the Mersenne Twister algorithm~\cite{ref:mercenne}. \cname{SobolGenerator} (header file ``npstat/rng/SobolGenerator.hh'') --- generates Sobol quasi-random sequences~\cite{ref:sobol}. \cname{HOSobolGenerator} (header file ``npstat/rng/HOSobolGenerator.hh'') --- generates higher order scrambled Sobol sequences~\cite{ref:hosobol}. \cname{RandomSequenceRepeater} (header file ``npstat/rng/RandomSequenceRepeater.hh'') --- this class can be used to produce multiple repetitions of sequences created by other generators whenever an instance of \cname{AbsRandomGenerator} is needed. The whole sequence is simply remembered (which can take a significant amount of memory for large sequences) and extended as necessary by calling the original generator. \cname{WrappedRandomGen} (header file ``npstat/rng/AbsRandomGenerator.hh'') --- a simple adaptor class for ``old style'' random generator functions like ``drand48()''. Implements the \cname{AbsRandomGenerator} interface. \cname{CPP11RandomGen} (header file ``npstat/rng/CPP11RandomGen.hh'') --- a simple adaptor class for pseudo-random generator engines defined in the C++11 standard. Implements the \cname{AbsRandomGenerator} interface. \cname{EquidistantSampler1D} (header file ``npstat/rng/EquidistantSampler1D.hh'') --- generates a sequence of equidistant points, similar to bin centers of a~histogram with axis limits at 0 and 1. \cname{RegularSampler1D} (header file ``npstat/rng/RegularSampler1D.hh'') --- generates a sequence of points by splitting the $[0, 1]$ interval by 2, then splitting all subintervals by 2, {\it etc}. The points returned are the split locations. Useful for generating $2^{k} - 1$ points when the integer $k$ is not known in advance.
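The quantile-based transformation of uniform sequences mentioned above can be illustrated with a short stand-alone sketch which does not use any NPStat classes; the exponential distribution and the C++11 Mersenne Twister engine are chosen purely as examples.
\begin{verbatim}
// Stand-alone sketch of the quantile (inverse CDF) transformation
// used to convert uniform sequences into other distributions.
// In NPStat this mapping is provided by the distribution classes;
// here it is written out by hand for the exponential distribution.
#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 gen(12345U);                        // uniform generator
    std::uniform_real_distribution<double> u01(0.0, 1.0);

    const double rate = 2.0;                         // exponential rate
    for (int i = 0; i < 5; ++i)
    {
        const double u = u01(gen);                   // point on [0, 1)
        const double x = -std::log(1.0 - u)/rate;    // exponential quantile
        std::printf("%g -> %g\n", u, x);
    }
    return 0;
}
\end{verbatim}
Exactly the same mapping can be applied to quasi-random or equidistant sequences, which is why the transformation is factored out of the generators themselves.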
\cname{convertToSphericalRandom} (header file ``npstat/rng/convertToSphericalRandom.hh'') --- converts a multivariate random number from a unit $d$-dimensional cube into a random direction in $d$ dimensions and one additional random number between 0 and 1. Useful for generating random numbers according to spherically symmetric distributions. \section{Algorithms Related to Combinatorics} The NPStat package implements several functions and algorithms related to permutations of integers and combinatorics: \cname{factorial} (header file ``npstat/rng/permutation.hh'') --- this function returns an exact factorial if the result does not exceed the largest unsigned long (up to $12!$ on 32-bit systems and up to $20!$ on 64-bit ones). \cname{ldfactorial} (header file ``npstat/rng/permutation.hh'') --- this function returns an approximate factorial up to $1754!$ as a long double. \cname{logfactorial} (header file ``npstat/rng/permutation.hh'') --- natural logarithm of a factorial, up to $\ln((2^{32} - 1)!)$, as a long double. \cname{binomialCoefficient} (header file ``npstat/nm/binomialCoefficient.hh'') --- calculates the binomial coefficients $C_N^M = \frac{N!}{M! (N-M)!}$ using an algorithm which avoids overflows. \cname{orderedPermutation} (header file ``npstat/rng/permutation.hh'') --- this function can be used to iterate over permutations of numbers $\{0, 1, ..., N-1\}$ in a systematic way. It generates a unique permutation of such numbers given a~non-negative input integer below $N!$. \cname{permutationNumber} (header file ``npstat/rng/permutation.hh'') --- inverse of \cname{orderedPermutation}: maps a permutation of numbers $\{0, 1, ..., N-1\}$ into a unique integer between $0$ and $N!$. \cname{randomPermutation} (header file ``npstat/rng/permutation.hh'') --- generates random permutations of numbers $\{0, 1, ..., N-1\}$ in which the probability of every permutation is the same. \cname{NMCombinationSequencer} (header file ``npstat/stat/NMCombinationSequencer.hh'') --- this class iterates over all possible choices of $j_1, ..., j_M$ from $N$ possible values for each $j_k$ in such a way that all $j_1, ..., j_M$ are distinct and appear in the sequence in increasing order, last index changing most often. Naturally, the total number of all such choices equals the number of ways to pick $M$ distinct items out of $N$: it is the binomial coefficient $C_N^M$. \section{Numerical Analysis Utilities} \label{sec:utilities} The NPStat package includes a menagerie of numerical analysis utilities designed, primarily, to support the statistical calculations described in the previous sections. They are placed in the ``nm'' directory of the package. A number of these utilities can be used as stand-alone tools. If the corresponding header file is not mentioned explicitly in the descriptions below, it is ``npstat/nm/NNNN.hh'', where NNNN stands for the actual name of the class or function. \cname{ConvolutionEngine1D} and \cname{ConvolutionEngineND} --- These classes encapsulate the NPStat interface to the FFTW package~\cite{ref:fftw}. They can be used to perform DFFT convolutions of one-dimensional and multivariate functions, respectively. \cname{EquidistantInLinearSpace} (header file ``npstat/nm/EquidistantSequence.hh'') --- A sequence of equidistant points. For use with algorithms that take a vector of points as one of their parameters. \cname{EquidistantInLogSpace} (header file ``npstat/nm/EquidistantSequence.hh'') --- A sequence of points whose logarithms are equidistant.
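As an illustration of the idea behind \cname{EquidistantInLogSpace}, the following stand-alone sketch (not the NPStat implementation; the helper name is invented) generates a sequence whose logarithms are equidistant.
\begin{verbatim}
// Minimal sketch of a sequence equidistant in log space.
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<double> logSpaced(double xmin, double xmax, unsigned n)
{
    std::vector<double> points(n);
    const double lmin = std::log(xmin);
    const double step = (std::log(xmax) - lmin)/(n - 1U);
    for (unsigned i = 0; i < n; ++i)
        points[i] = std::exp(lmin + step*i);
    return points;
}

int main()
{
    for (double x : logSpaced(1.0, 1000.0, 4))
        std::printf("%g ", x);   // prints 1 10 100 1000
    std::printf("\n");
    return 0;
}
\end{verbatim}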
\cname{findRootInLogSpace} --- templated numerical equation solving for 1-$d$ functions (or for 1-$d$ subspaces of multivariate functions) using interval division. It is assumed that the solution can be represented as a product of some object ({\it e.g.,} a vector) by a positive real number, and that number is then searched for. \cname{GaussHermiteQuadrature} and \cname{GaussLegendreQuadrature} --- templated Gauss-Hermite and Gauss-Legendre quadratures for one-dimensional functions. Internally, calculations are performed in long double precision. Of course, lower precision functions can be integrated as well, with corresponding reduction in the precision of the result. \cname{rectangleIntegralCenterAndSize} (header file ``npstat/nm/rectangleQuadrature.hh'') --- Gauss-Legendre cubatures on rectangular and hyperrectangular domains using tensor product integration\footnote{``Tensor product integration'' simply means that the locations at which the function is evaluated and corresponding weights are determined by sequential application of Gauss-Legendre quadratures in each dimension.}. \cname{goldenSectionSearchInLogSpace} (header file ``npstat/nm/goldenSectionSearch.hh'') --- templated numerical search for a minimum of 1-$d$ functions (or for 1-$d$ subspaces of multivariate functions) using the golden section method. It is assumed that the location of the minimum can be represented as a product of some object ({\it e.g.,} a vector) by a positive constant, and that constant is then searched for. \cname{goldenSectionSearchOnAGrid} (header file ``npstat/nm/goldenSectionSearch.hh'') --- search for a minimum using coordinates restricted to a user-defined one-dimensional grid. Appropriate for use with functions which are expensive to evaluate and for which the user has some prior idea about their smoothness. \cname{parabolicExtremum} (header file ``npstat/nm/MathUtils.hh'') --- determines the extremum of a~parabola passing through three given points on a plane. Can be used in combination with \cname{goldenSectionSearchOnAGrid} to refine the location of the minimum. \cname{GridAxis} --- This class can be used to define an axis of a rectangular grid with non-uniform spacing of points. The complementary class \cname{UniformAxis} works more efficiently for representing equidistant points, while the class \cname{DualAxis} can be used to represent both uniform and non-uniform grids. \cname{interpolate\_linear}, \cname{interpolate\_quadratic}, \cname{interpolate\_cubic} (these three functions are declared in the header file ``npstat/nm/interpolate.hh'') --- linear, quadratic, and cubic polynomials with given values at two, three, and four equidistant points, respectively. \cname{LinInterpolatedTable1D} --- persistent one-dimensional lookup table with linear interpolation between the tabulated values. Useful for representing arbitrary one-dimensional functions in case the full numerical precision is not required. If the table is monotonic, the inverse table can be constructed automatically. \cname{LinInterpolatedTableND} --- persistent multidimensional lookup table with multilinear interpolation between the tabulated values, as in Eq.~\ref{eq:multilinear}. Extrapolation beyond the grid boundaries is supported as well. This class is useful for representing arbitrary functions in case the full numerical precision is not required. The \cname{GridAxis}, \cname{UniformAxis}, or \cname{DualAxis} classes (or user-developed classes with similar sets of methods) can be used to define grid point locations.
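For reference, the two-dimensional (bilinear) case of multilinear interpolation can be sketched as follows. This is a self-contained illustration of the interpolation scheme rather than the actual NPStat implementation, and the helper name is invented.
\begin{verbatim}
// Minimal sketch of bilinear interpolation inside one grid cell.
#include <cstdio>

// f00 = f(x0,y0), f10 = f(x1,y0), f01 = f(x0,y1), f11 = f(x1,y1);
// tx and ty are the fractional distances of the lookup point from
// the (x0,y0) corner, both in [0, 1]
double bilinear(double f00, double f10, double f01, double f11,
                double tx, double ty)
{
    const double low  = f00*(1.0 - tx) + f10*tx;  // interpolate in x at y0
    const double high = f01*(1.0 - tx) + f11*tx;  // interpolate in x at y1
    return low*(1.0 - ty) + high*ty;              // then interpolate in y
}

int main()
{
    // The value at the cell center is the average of the four corners
    std::printf("%g\n", bilinear(1.0, 2.0, 3.0, 4.0, 0.5, 0.5)); // 2.5
    return 0;
}
\end{verbatim}
The same construction applied coordinate by coordinate gives the general multilinear case used for tables of arbitrary dimensionality.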
Note that simple location-based lookup of stored values (without interpolation) can be trivially performed with the \cname{closestBin} method of the \cname{HistoND} class. Lookup of histogram bin values with interpolation can be performed by the \cname{interpolateHistoND} function (header file ``npstat/stat/interpolateHistoND.hh'').

\cname{LinearMapper1d}, \cname{LogMapper1d} --- linear and log-linear transformations in 1-$d$ as functors (with the ``double operator()(const double\& x) const'' method defined). The \cname{CircularMapper1d} class works similarly to \cname{LinearMapper1d} in situations with circular data topologies.

\cname{Matrix} --- A templated matrix class. Useful for standard matrix manipulations, solving linear systems, finding eigenvalues and eigenvectors of symmetric matrices, singular value decomposition, {\it etc.} Encapsulates the NPStat interface to LAPACK~\cite{ref:lapack}.

\cname{findPeak3by3}, \cname{findPeak5by5} (header file ``npstat/nm/findPeak2D.hh'') --- utilities which facilitate peak finding for two-dimensional surfaces which can be contaminated by small amounts of noise ({\it e.g.}, from round-off errors). The user can fit a 2-$d$ quadratic polynomial inside a $3 \times 3$ or $5 \times 5$ window by least squares\footnote{A fast method utilizing discrete orthogonal polynomial expansion is used internally.} and check whether that polynomial has an extremum inside the window. Initially intended for studying 2-$d$ log-likelihoods using sliding windows.

\cname{solveQuadratic}, \cname{solveCubic} (header file ``npstat/nm/MathUtils.hh'') --- solutions of quadratic and cubic equations, respectively, by numerically sound methods\footnote{The textbook formula $x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4 a c}}{2 a}$ for the roots of the quadratic equation $a x^2 + b x + c = 0$ is also a~prime example of a numerical analysis pitfall. A minimal modification, $x_1 = -\frac{b + (I(b \ge 0) - I(b < 0))\sqrt{b^2 - 4 a c}}{2 a}$, $x_2 = \frac{c}{a x_1}$, avoids the subtractive cancellation problem in the $x_{1,2}$ numerator.}.

\cname{ndUnitSphereVolume} (header file ``npstat/nm/MathUtils.hh'') --- volume of the $n$-dimensional unit sphere.

\cname{polyAndDeriv} (header file ``npstat/nm/MathUtils.hh'') --- monomial series $\sum_{k=0}^M c_k x^k$ and its derivative with respect to $x$, templated on the type of coefficients $c_k$.

\cname{polySeriesSum}, \cname{legendreSeriesSum}, \cname{gegenbauerSeriesSum}, \cname{chebyshevSeriesSum} (all of these functions are declared in the header file ``npstat/nm/MathUtils.hh'') --- templated series of one-dimensional monomials, Legendre polynomials, Gegenbauer polynomials, and Chebyshev polynomials, respectively. Numerically sound recurrence relations are used to generate the polynomials.

\cname{hermiteSeriesSumProb}, \cname{hermiteSeriesSumPhys} (header file ``npstat/nm/MathUtils.hh'') --- templated series of ``probabilist'' and ``physicist'' Hermite polynomials, respectively.

\cname{chebyshevSeriesCoeffs} (header file ``npstat/nm/MathUtils.hh'') --- utility for approximating mathematical functions with Chebyshev polynomials.

\cname{OrthoPoly1D}, \cname{OrthoPolyND} --- uni- and multivariate orthogonal polynomials on equidistant rectangular grids with arbitrary weight functions.
In addition to producing the polynomials themselves, these classes can be used to calculate polynomial series, polynomial series expansion coefficients for gridded functions, and polynomial filters defined by Eq.~\ref{eq:effk} (but normalized so that the sum of filter coefficients is 1).

The NPStat package also includes implementations of various special functions needed in statistical calculations (incomplete gamma function and its inverse, incomplete beta function, {\it etc.}). These functions are declared in the header file ``npstat/nm/SpecialFunctions.hh''.

\cleardoublepage
\phantomsection
\addcontentsline{toc}{section}{References}
\begin{thebibliography}{99}
\bibitem{ref:minuit} Minuit2 Minimization Package, \href{http://www.cern.ch/minuit/}{http://www.cern.ch/minuit/}
\bibitem{ref:geners} Geners --- Generic Serialization for C++, \href{http://geners.hepforge.org/}{http://geners.hepforge.org/}
\bibitem{ref:cpp11} ISO/IEC Standard 14882:2011 {\it Programming Language C++} (2011).
\bibitem{ref:linearbinning} M.C.~Jones and H.W.~Lotwick, ``On the Errors Involved in Computing the Empirical Characteristic Function'', {\it Journal of Statistical Computation and Simulation} {\bf 17}, 133 (1983).
\bibitem{ref:sampleskewkurt} D.N. Joanes and C.A. Gill, ``Comparing measures of sample skewness and kurtosis'', {\it The Statistician} {\bf 47}, 183 (1998).
\bibitem{ref:ratioofnormals} D.V. Hinkley, ``On the ratio of two correlated normal random variables'', {\it Biometrika} {\bf 56}, 635 (1969).
\bibitem{ref:johnson} N.L. Johnson, ``Systems of Frequency Curves Generated by Methods of Translation'', {\it Biometrika} {\bf 36}, 149 (1949).
\bibitem{ref:johnsonbook} W.P. Elderton and N.L. Johnson, ``Systems of Frequency Curves'', Cambridge University Press (1969).
\bibitem{ref:hahnshap} G.J. Hahn and S.S. Shapiro, ``Statistical Models in Engineering'', Wiley (1994).
\bibitem{ref:draper} J. Draper, ``Properties of Distributions Resulting from Certain Simple Transformations of the Normal Distribution'', {\it Biometrika} {\bf 39}, 290 (1952).
\bibitem{ref:hill} I.D. Hill, R. Hill, and R.L. Holder, ``Algorithm AS 99: Fitting Johnson Curves by Moments'', {\it Applied Statistics} {\bf 25}, 180 (1976).
\bibitem{ref:handcock} M.S. Handcock and M. Morris, ``Relative Distribution Methods'', {\it Sociological Methodology} {\bf 28}, 53 (1998).
\bibitem{ref:thas} O. Thas, ``Comparing Distributions'', Springer Series in Statistics (2009).
\bibitem{ref:kdetransform} L. Yang and J.S. Marron, ``Iterated Transformation-Kernel Density Estimation'', {\it Journal of the American Statistical Association} {\bf 94}, 580 (1999).
\bibitem{ref:smoothtest} J. Neyman, ``'Smooth Test' for Goodness of Fit'', {\it Skandinavisk Aktuarietidskrift} {\bf 20}, 150~(1937).
\bibitem{ref:copulabook} R.B.~Nelsen, ``An Introduction to Copulas'', 2\supers{nd} Ed., Springer Series in Statistics (2006).
\bibitem{ref:eightqueens} \href{http://en.wikipedia.org/wiki/Eight_queens_puzzle}{``Eight queens puzzle'', Wikipedia.}
\bibitem{ref:eightrooks} \href{http://en.wikipedia.org/wiki/Rook_polynomial}{``Rook polynomial'', Wikipedia.}
\bibitem{ref:copulaentropy} J.~Ma and Z.~Sun, ``Mutual information is copula entropy'', \href{http://arxiv.org/abs/0808.0845}{arXiv:0808.0845v1} (2008).
\bibitem{ref:read} A.L.~Read, ``Linear interpolation of histograms'', {\it Nucl. Instr. Meth.} {\bf A 425}, 357 (1999).
\bibitem{ref:silverman} B.W.~Silverman, ``Density Estimation for Statistics and Data Analysis'', Chapman \& Hall (1986).
\bibitem{ref:izenmann} A.J.~Izenman, ``Recent Developments in Nonparametric Density Estimation'', {\it Journal of the American Statistical Association} {\bf 86}, 205 (1991).
\bibitem{ref:kde} D.W.~Scott, ``Multivariate Density Estimation: Theory, Practice, and Visualization'', Wiley (1992).
\bibitem{ref:bwopt} M.C.~Jones, J.S.~Marron, and S.J.~Sheather, ``A Brief Survey of Bandwidth Selection for Density Estimation'', {\it Journal of the American Statistical Association} {\bf 91}, 401 (1996).
\bibitem{ref:loaderbw} C.R.~Loader, ``Bandwidth Selection: Classical or Plug-in?'', {\it The Annals of Statistics} {\bf 27}, 415 (1999).
\bibitem{ref:kernsmooth} M.P.~Wand and M.C.~Jones, ``Kernel Smoothing'', Chapman \& Hall (1995).
\bibitem{ref:chiu2} S.-T. Chiu, ``An Automatic Bandwidth Selector for Kernel Density Estimation'', {\it Biometrika} {\bf 79}, 771 (1992).
\bibitem{ref:chiumultivar} T.-J. Wu and M.-H. Tsai, ``Root n bandwidths selectors in multivariate kernel density estimation'', {\it Probab. Theory Relat. Fields} {\bf 129}, 537 (2004).
\bibitem{ref:discretizedkde} M.C.~Jones, ``Discretized and Interpolated Kernel Density Estimates'', {\it Journal of the American Statistical Association} {\bf 84}, 733 (1989).
\bibitem{ref:fastkde1} C.~Yang, R.~Duraiswami, N.A.~Gumerov, and L.~Davis, ``Improved Fast Gauss Transform and Efficient Kernel Density Estimation'', in {\it Proceedings of the Ninth IEEE International Conference on Computer Vision} (2003).
\bibitem{ref:fastkde2} D.~Lee, A.G.~Gray, and A.W. Moore, ``Dual-Tree Fast Gauss Transforms'', \href{http://arxiv.org/abs/1102.2878}{arXiv:1102.2878v1} (2011).
\bibitem{ref:bwgaussmix} J.S. Marron and M.P. Wand, ``Exact Mean Integrated Squared Error'', {\it The Annals of Statistics} {\bf 20}, 712 (1992).
\bibitem{ref:abramson} I.S.~Abramson, ``On Bandwidth Variation in Kernel Estimates --- A Square Root Law'', {\it The Annals of Statistics} {\bf 10}, 1217 (1982).
\bibitem{ref:placida} P.D.A. Dassanayake, ``Local Orthogonal Polynomial Expansion for Density Estimation'', M.S. Thesis, Texas Tech University (2012).
+\bibitem{ref:lorpe} D.P.A. Dassanayake, I. Volobouev, and A.A. Trindade,
+``Local Orthogonal Polynomial Expansion for Density Estimation'',
+{\it Journal of Nonparametric Statistics} {\bf 29}, 806 (2017).
+
\bibitem{ref:betakern} S.X. Chen, ``Beta kernel estimators for density functions'', {\it Computational Statistics \& Data Analysis} {\bf 31}, 131 (1999).
\bibitem{ref:babu} G.J. Babu, A.J. Canty, and Y.P. Chaubey, ``Application of Bernstein Polynomials for smooth estimation of a distribution and density function'', {\it Journal of Statistical Planning and Inference} {\bf 105}, 377 (2002).
\bibitem{ref:sinkhorn} R. Sinkhorn and P. Knopp, ``Concerning Nonnegative Matrices and Doubly Stochastic Matrices'', {\it Pacific Journal of Mathematics} {\bf 21}, 343 (1967).
\bibitem{ref:khoury} R. Khoury, ``Closest Matrices in the Space of Generalized Doubly Stochastic Matrices'', {\it Journal of Mathematical Analysis and Applications} {\bf 222}, 562 (1998).
\bibitem{ref:lapack} LAPACK --- Linear Algebra PACKage, \href{http://www.netlib.org/lapack/}{http://www.netlib.org/lapack/}
\bibitem{ref:localreg} T.J.~Hastie, ``Non-parametric Logistic Regression'', SLAC PUB-3160, June 1983.
\bibitem{ref:locfit} CRAN - Package locfit, \href{http://cran.r-project.org/web/packages/locfit/} {http://cran.r-project.org/web/packages/locfit/} \bibitem{ref:lvm} K.~Levenberg, ``A Method for the Solution of Certain Non-Linear Problems in Least Squares'', {\it Quarterly of Applied Mathematics} {\bf 2}, 164 (1944). D.~Marquardt, ``An Algorithm for Least-Squares Estimation of Nonlinear Parameters'', {\it SIAM Journal on Applied Mathematics} {\bf 11}, 431 (1963). \bibitem{ref:kdtree} J.L.~Bentley, ``Multidimensional Binary Search Trees Used for Associative Searching'', {\it Communications of the ACM} {\bf 18}, 509 (1975). \bibitem{ref:localquantreg} H.J. Wang and L. Wang, ``Locally Weighted Censored Quantile Regression'', {\it Journal of the American Statistical Association} {\bf 104}, 487 (2009). \bibitem{ref:unfoldingrev} F.~Span\`o, ``Unfolding in particle physics: a window on solving inverse problems'', {\it EPJ Web of Conferences} {\bf 55}, 03002 (2013). \bibitem{ref:numrecipes} W.H.~Press, S.A.~Teukolsky, W.T.~Vetterling, and B.P.~Flannery, ``Numerical Recipes: the Art of Scientific Computing'', 3rd ed., Cambridge University Press (2007). \bibitem{ref:deconvolutionbook} A.~Meister, ``Deconvolution Problems in Nonparametric Statistics'', Springer Lecture Notes in Statistics, Vol. 193 (2009). \bibitem{ref:svdunfold} A. H\"ocker and V. Kartvelishvili, ``SVD approach to data unfolding'', {\it Nuclear Instruments and Methods in Physics Research A} {\bf 372}, 469 (1996). \bibitem{ref:dagounfold} G. D'Agostini, ``A multidimensional unfolding method based on Bayes' theorem'', {\it Nuclear Instruments and Methods in Physics Research A} {\bf 362}, 487 (1995). \bibitem{ref:nychka} D. Nychka, ``Some Properties of Adding a Smoothing Step to the EM Algorithm'', {\it Statistics \& Probability Letters} {\bf 9}, 187 (1990). \bibitem{ref:smoothedem} B.W. Silverman \etal, ``A Smoothed EM Approach to Indirect Estimation Problems, with Particular Reference to Stereology and Emission Tomography'', {\it Journal of the Royal Statistical Society B} {\bf 52}, 271 (1990). \bibitem{ref:aic} H.~Akaike, ``A new look at the statistical model identification'', {\it IEEE Transactions on Automatic Control} {\bf 19}, 716 (1974). \bibitem{ref:modelsel} K.P. Burnham and D.R. Anderson, ``Model Selection and Multi-Model Inference: A~Practical Information-Theoretic Approach'', 2nd ed., Springer (2004). \bibitem{ref:eigen} Eigen --- a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms, \href{http://eigen.tuxfamily.org/}{http://eigen.tuxfamily.org/} \bibitem{ref:trlan} K.~Wu and H.~Simon, ``Thick-Restart Lanczos Method for Large Symmetric Eigenvalue Problems'', {\it SIAM Journal on Matrix Analysis and Applications} {\bf 22}, 602 (2000). \bibitem{ref:Niederreiter} H. Niederreiter, ``Random Number Generation and Quasi-Monte Carlo Methods'', Society for Industrial and Applied Mathematics (SIAM), Philadelphia (1992). \bibitem{ref:mercenne} M. Matsumoto and T. Nishimura, ``Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator'', {\it ACM Transactions on Modeling and Computer Simulation} {\bf 8}, 3 (1998). \bibitem{ref:sobol} I. Sobol and Y.L. Levitan, ``The Production of Points Uniformly Distributed in a Multidimensional Cube'', Tech. Rep. 40, Institute of Applied Mathematics, USSR Academy of Sciences, 1976 (in Russian). \bibitem{ref:hosobol} J. 
Dick, ``Higher order scrambled digital nets achieve the optimal rate of the root mean square error for smooth integrands'', \href{http://arxiv.org/abs/1007.0842}{arXiv:1007.0842} (2010). \bibitem{ref:fftw} FFTW (Fastest Fourier Transform in the West), \href{http://www.fftw.org/}{http://www.fftw.org/} \end{thebibliography} \cleardoublepage \phantomsection \addcontentsline{toc}{section}{Functions and Classes} \printindex \end{document} Index: trunk/doc/stat_methods.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/pdf Index: trunk/INSTALL =================================================================== --- trunk/INSTALL (revision 674) +++ trunk/INSTALL (revision 675) @@ -1,50 +1,51 @@ The NPStat package depends on several other packages which must be installed first: 1) Double precision version of the FFTW library from www.fftw.org 2) The "Geners" package from http://geners.hepforge.org/ 3) The "KSTest" package from http://kstest.hepforge.org/ 4) Fortran LAPACK and BLAS libraries liblapack.so and libblas.so (installed on many systems by default). If these packages are not yet installed on your machine, please install them first following the instructions in their corresponding INSTALL files. Make sure that your C++ compiler supports C++11. A few NPStat algorithms need an external minimization engine. These algorithms can be easily accessed if the Minuit2 minimization package is installed (see http://www.cern.ch/minuit/) However, Minuit2 is not needed in order to compile NPStat -- Minuit2 is interfaced to NPStat via header files only. Set the PKG_CONFIG_PATH environment variable to the name of the directory where the FFTW and Geners libraries install their package configuration files "fftw3.pc" and "geners.pc", respectively. Usually, this directory is /usr/local/lib/pkgconfig. After this, a typical NPStat installation can proceed as follows: ./configure --disable-static --with-pic make make install The last step may have to be performed as root. Run ./configure --help to view available configuration options. -If you want to compile the python API, you might want to install a -well-integrated python distribution first, such as the anaconda one. -After completing the NPStat installation steps described above, run +If you want to compile the python API (NPStat uses python3), you might +want to install a well-integrated python distribution first, such as +the anaconda one (https://www.anaconda.com/download/). After +completing the NPStat installation steps described above, run make python The Python extension will be compiled inside "npstat/swig" directory -but it will not be installed anywhere else. In order to use the python -API, add the absolute path to your "npstat/swig" directory to the -PYTHONPATH environmental variable. Note that compiling the Python +but it will not be installed anywhere else. In order to use the +python API, add the absolute path of your "npstat/swig" directory to +the PYTHONPATH environmental variable. Note that compiling the Python wrapper can take a long time (tens of minutes). Index: trunk/npstat/stat/InterpolatedDistro1D1P.hh =================================================================== --- trunk/npstat/stat/InterpolatedDistro1D1P.hh (revision 674) +++ trunk/npstat/stat/InterpolatedDistro1D1P.hh (revision 675) @@ -1,104 +1,108 @@ #ifndef NPSTAT_INTERPOLATEDDISTRO1D1P_HH_ #define NPSTAT_INTERPOLATEDDISTRO1D1P_HH_ /*! 
// \file InterpolatedDistro1D1P.hh // // \brief Interpolation of 1-d distributions as a function of one parameter // // Author: I. Volobouev // // December 2014 */ #include "geners/CPP11_shared_ptr.hh" #include "npstat/nm/GridAxis.hh" #include "npstat/stat/AbsInterpolatedDistribution1D.hh" namespace npstat { class InterpolatedDistro1D1P : public AbsDistribution1D { public: #ifndef SWIGBUG - /** This constructor builds an object which is ready to use */ + /** + // This constructor builds an object which is ready to use. + // The "distros" argument must provide one distribution for + // each grid cell. + */ InterpolatedDistro1D1P( const GridAxis& axis, const std::vector >& distros, double initialParameterValue = 0.0, bool interpolateVertically = false); #endif // SWIGBUG /** // If this constructor is used, the object must be built incrementally, // using "setGridDistro" calls, one call for each grid cell. After all // distributions have been set, provide the parameter value with // "setParameter". Only then one can call methods "density", etc. */ InterpolatedDistro1D1P(const GridAxis& axis, bool interpolateVertically = false); inline virtual ~InterpolatedDistro1D1P() {delete interpolator_;} InterpolatedDistro1D1P(const InterpolatedDistro1D1P&); InterpolatedDistro1D1P& operator=(const InterpolatedDistro1D1P&); /** Set the distribution corresponding to the given grid point */ void setGridDistro(unsigned cell, const AbsDistribution1D& distro); /** Set the value of the parameter */ void setParameter(double value); /** Get the value of the parameter */ inline double getParameter() const {checkReady(); return param_;} inline const GridAxis& getAxis() const {return axis_;} inline unsigned nDistros() const {return axis_.nCoords();} void interpolateVertically(bool b); inline bool interpolatingVertically() const {return vertical_;} /** "Virtual copy constructor" */ inline virtual InterpolatedDistro1D1P* clone() const {return new InterpolatedDistro1D1P(*this);} inline double density(const double x) const {checkReady(); return interpolator_->density(x);} inline double cdf(const double x) const {checkReady(); return interpolator_->cdf(x);} inline double exceedance(const double x) const {checkReady(); return interpolator_->exceedance(x);} inline double quantile(const double x) const {checkReady(); return interpolator_->quantile(x);} //@{ /** Prototype needed for I/O */ virtual inline gs::ClassId classId() const {return gs::ClassId(*this);} virtual bool write(std::ostream&) const; //@} static inline const char* classname() {return "npstat::InterpolatedDistro1D1P";} static inline unsigned version() {return 2;} static InterpolatedDistro1D1P* read(const gs::ClassId& id, std::istream& in); protected: virtual bool isEqual(const AbsDistribution1D&) const; private: InterpolatedDistro1D1P(); void rebuildInterpolator(); void checkDistros(); void checkReady() const; GridAxis axis_; std::vector > distros_; AbsInterpolatedDistribution1D* interpolator_; double param_; bool vertical_; bool parameterSet_; bool haveAllDistros_; }; } #endif // NPSTAT_INTERPOLATEDDISTRO1D1P_HH_ Index: trunk/NEWS =================================================================== --- trunk/NEWS (revision 674) +++ trunk/NEWS (revision 675) @@ -1,764 +1,764 @@ -Version 5.1.0 - development +Version 5.1.0 - Nov 17 2019, by I. Volobouev * Started using python-specific features (pythonprepend, etc) in the SWIG .i files. * Added function "scannedKSDistance". -* Added Python API for the persistence of several classes. 
+* Added Python API for persistence of several classes. Version 5.0.0 - Oct 17 2019, by I. Volobouev * C++11 support is now mandatory. * Added more classes and functions to Python API. Overhauled some of the existing interfaces. Updated everything to python3. * Increased the maximum order of "classical" Edgeworth expansions to 14. * Increased the maximum order of conversions between 1-d central moments and cumulants to 20. * Added "histoQuantiles" function. * Added "correctDensityEstimateGHU" function. * Added "randomHistoFill1D" utility function. * Added ComparisonDistribution1D class. * Added classes GaussND, LinTransformedDistroND, and DistributionMixND. * Added a lot of SWIG .i files Version 4.11.0 - July 22 2019, by I. Volobouev * Added support for cumulant calculations for various Wald statistics in the Poisson process model. * Added functions convertCumulantsToCentralMoments and convertCentralMomentsToCumulants (header file cumulantConversion.hh). * Added function "cumulantUncertainties". Version 4.10.0 - July 11 2019, by I. Volobouev * Added SemiInfGaussianQuadrature class. * Added functions arrayMoment, arrayMoments, and arrayCentralMoments. * Added enum EdgeworthSeriesMethod and class EdgeworthSeries1D. * Added DeltaMixture1D class. * Added enum LikelihoodStatisticType. * Added functions "mixtureModelCumulants" and "poissonProcessCumulants" in the header likelihoodStatisticCumulants.hh. Version 4.9.0 - Dec 18 2018, by I. Volobouev * Added "integratePoly" and "jointIntegral" methods to the AbsClassicalOrthoPoly1D class. * Added utility functions "truncatedInverseSqrt" and "matrixIndexPairs". * Added a number of functions to "orthoPoly1DVProducts.hh". * Added classes ChebyshevOrthoPoly1st and ChebyshevOrthoPoly2nd inheriting from AbsClassicalOrthoPoly1D. * Added class HermiteProbOrthoPoly1D. * Added FejerQuadrature class. * Added classe IsoscelesTriangle1D and Logistic1D. * Added classes AbsKDE1DKernel and KDE1DHOSymbetaKernel. * Added static function "optimalDegreeHart" to OSDE1D class. Version 4.8.0 - Jul 9 2018, by I. Volobouev * Added ShiftedLegendreOrthoPoly1D class. * Added Lanczos method to generate recurrence coefficients for the ContOrthoPoly1D class. * Added npstat/stat/orthoPoly1DVProducts.hh file with various utilities for statistical analyis of chaos polynomials. Version 4.7.0 - Jan 13 2018, by I. Volobouev * Added "UniPareto1D" distribution (uniform with Pareto tail to the right). * More coordinate/weight cases for the GaussLegendreQuadrature class. * Added ContOrthoPoly1D class -- continuous orthogonal polynomials with discrete measure. * Added functions "linearLeastSquares" and "tdSymEigen" to the Matrix class. * Added OSDE1D class. * Added classes LocationScaleFamily1D and SinhAsinhTransform1D. * Added new functors (CdfFunctor1D, etc) as AbsDistribution1D helpers. * Small fix in StatAccumulatorArr.cc. Version 4.6.0 - Jan 23 2017, by I. Volobouev * Updated 1-d LOrPE cross validation code (classes AbsBandwidthCV, BandwidthCVLeastSquares1D, BandwidthCVPseudoLogli1D) for use with weighted samples in the case the sample itself is available at the point cross validation is run. Version 4.5.0 - Aug 01 2016, by I. Volobouev * A small fix in OrthoPolyND.icc (switched from cycling over unsigned to unsigned long in the scalar product function). * Implemented Gauss-Hermite quadrature with Gaussian density weight. * Changed the meaning of Quadratic1D and LogQuadratic1D parameters to be consistent with Legendre polynomial coefficients on [-1, 1] (new parameters are 1/2 of old). 
* Added class MinuitUnbinnedFitFcn1D (to interfaces). * Added function "findRootNewtonRaphson". * Added "statUncertainties" header with various functions. Version 4.4.0 - May 9 2016, by I. Volobouev * Added "timestamp" function. * Improved implementation of BinnedDensity1D::unscaledQuantile function. Certain problems caused by round-off errors are now fixed. * Added the capability to use the single closest parameter cells (thus disabling actual interpolation between parameter values, for speed) to "GridInterpolatedDistribution". Version 4.3.0 - March 19 2016, by I. Volobouev * Put additional assert statements inside OrthoPolyND.icc to prevent phony "array subscript is above array bounds" messages in g++ 4.9.2. * Improved CmdLine.hh. * Additional methods in CopulaInterpolationND and GridInterpolatedDistribution. * Added function "volumeDensityFromBinnedRadial". * Added convenience method "globalFilter" to the OrthoPoly1D class. * Initiated the transition of the Python API from Python 2 to Python 3. Version 4.2.0 - October 29 2015, by I. Volobouev * Added interpolation methods for the marginals to classes "CopulaInterpolationND" and "GridInterpolatedDistribution". * Removed assert on underflow in the "igamc" function. Now in such cases this function simply returns 0.0. Version 4.1.0 - July 27 2015, by I. Volobouev * Made a few additional methods virtual in AbsNtuple. * Declared methods "columnIndices" of AbsNtuple const (as they should be). * Added function "weightedCopulaHisto" to build copulas for samples of weighted points. * Added function "weightedVariableBandwidthSmooth1D" to use variable bandwidth smoothing with weighted histograms. * Added "AbsWeightedMarginalSmoother" interface class for smoothing samples of weighted points. Modified classes ConstantBandwidthSmoother1D, JohnsonKDESmoother, and LOrPEMarginalSmoother to support this interface. * Added class "VariableBandwidthSmoother1D" which implements both AbsMarginalSmoother and AbsWeightedMarginalSmoother interfaces. * Implemented cross-validation for weighted samples. * Implemented "buildInterpolatedCompositeDistroND" for generic construction of multivariate transfer functions. * Implemented "buildInterpolatedDistro1DNP" for generic construction of univariate transfer functions. Version 4.0.1 - June 17 2015, by I. Volobouev * Added "dump_qmc" example executable. Version 4.0.0 - June 10 2015, by I. Volobouev * Complete overhaul of 1-d filter-building code. Addition of new boundary methods is a lot easier now. The user API for choosing a LOrPE boundary method is encapsulated in the new "BoundaryHandling" class. * Implemented a number of new filter builders with different boundary treatments. * Updated the "LocalPolyFilter1D" class so that it holds the local bandwidth factors derived by the filter builders. Version 3.8.0 - June 1 2015, by I. Volobouev * Implemented class ConstSqFilter1DBuilder (declared in the header file npstat/stat/Filter1DBuilders.hh). The "BoundaryMethod" enum has been extended accordingly. Other files using this enum have been updated. * Implemented class FoldingSqFilter1DBuilder. Similar to ConstSqFilter1DBuilder but it also folds the kernel in addition to stretching it. * Added virtual destructors to a number of classes. * Added "externalMemArrayND" with various signatures to allow the use of ArrayND with memory not managed by ArrayND. * Added move constructors and move assignment operators to ArrayND and Matrix classes. Version 3.7.0 - May 10 2015, by I. 
Volobouev * Better numerical derivative calculation in InterpolatedDistribution1D.cc. * Added class "LocalMultiFilter1D" for fast generation of filters which correspond to each orthogonal polynomial separately. * Added a function calculating the area of n-dimensional sphere. * Added I/O capabilities to the RadialProfileND class. * Added class "LogRatioTransform1D". * Added utility function "multiFill1DHistoWithCDFWeights" (header file histoUtils.hh). * Avoiding underflow of the incomplete gamma in "convertToSphericalRandom". Version 3.6.0 - April 6 2015, by I. Volobouev * Fixed Marsaglia's code calculating the Anderson-Darling statistics (it was breaking down for large values of z). * Added high-level driver function "simpleVariableBandwidthSmooth1D" to automatically build the pilot estimate for "variableBandwidthSmooth1D". * Swithched to log of sigma as Minuit parameter in "minuitFitJohnsonCurves" instead of sigma (linear sigma would sometimes break the fit when Minuit would come up with 0 or negative trial value for it). * Extended "MinuitDensityFitFcn1D" class so that it could be used to fit non-uniformly binned histograms. * Adapted "minuitFitJohnsonCurves" function so that it could be used with histograms templated upon DualHistoAxis. * Added functions "fillArrayCentersPreservingAreas" and "canFillArrayCentersPreservingAreas". * Implemented an interface to monotonous coordinate transformations with the intent of using them in constructing densities. Implemented a number of transforms. * Implemented class "TransformedDistribution1D". * Added class "VerticallyInterpolatedDistro1D1P". * Added utility function "fill1DHistoWithCDFWeights". Version 3.5.0 - February 21 2015, by I. Volobouev * Added "symPDEigenInv" method to the Matrix class. * Added "variableCount" method to unfolding bandwidth scanner classes. * Increased the tolerance parameters in JohnsonSu::initialize and in JohnsonFit constructor. * Bug fix in function "fillHistoFromText". Version 3.4.4 - January 13 2015, by I. Volobouev * Corrected handling of some "assert" statements so that the code compiles correctly with the -DNDEBUG option. Version 3.4.3 - January 5 2015, by I. Volobouev * Implemented class MirroredGauss1D. * Added method "getOracleData" to class UnfoldingBandwidthScanner1D. * Bug fix in FoldingFilter1DBuilder::makeOrthoPoly. Version 3.4.2 - December 15 2014, by I. Volobouev * Implemented InterpolatedDistro1D1P class. Version 3.4.1 - November 07 2014, by I. Volobouev * Implemented "divideTransforms" function for deconvolutions. * Implemented the Moyal distribution. * Added "fillHistoFromText" utility function. * Added "apply_lorpe_1d" example. Version 3.4.0 - October 01 2014, by I. Volobouev * Implemented Hadamard product and Hadamard ratio for matrices. * Bug fix in the "solve_linear_system" lapack interface function. Version 3.3.1 - August 08 2014, by I. Volobouev * Terminate iterative refinement of the unfolding error propagation matrix early in case the solution does not seem to improve. Version 3.3.0 - August 05 2014, by I. Volobouev * Added correction factors to the determination of the number of fitted parameters in various unfolding procedures. Version 3.2.0 - July 25 2014, by I. Volobouev * Added "gaussianResponseMatrix" function for non-uniform binning. * Added Pareto distribution. * Implemented EMS unfolding with sparse matrices. * Added methods "getObservedShape" and "getUnfoldedShape" to the AbsUnfoldND class. * Bug fix in the assignment operator of ProductDistributionND class. 
Made class ProductDistributionND persistent. * Bug fix in the error propagation for unfolding, in the code which takes into account the extra normalization constant. * Added "productResponseMatrix" function to assist in making sparse response matrices. * Bug fix in the factory constructor of the Cauchy1D class. * Completed implementation of the "RatioOfNormals" class. Version 3.1.0 - June 29 2014, by I. Volobouev * Improved (again) random number generator for the 1-d Gaussian distribution. Something about expectation values of normalized Hermite polynomials over random numbers made by this generator is still not fully understood. The standard deviation of these expectations decreases with the polynomial order (while it should stay constant). It is possible that the numbers of points used are simply insufficient to sample the tails correctly. * Implemented smoothed expectation-maximization (a.k.a. D'Agostini) unfolding for 1-d distributions in classes SmoothedEMUnfold1D and MultiscaleEMUnfold1D. In certain usage scenarios, MultiscaleEMUnfold1D can be more efficient than SmoothedEMUnfold1D. * Implemented smoothed expectation-maximization unfolding for multivariate distributions in a class SmoothedEMUnfoldND. * Added class "UnfoldingBandwidthScanner1D" to study 1-d unfolding behavior as a function of filter bandwidth. * Added class "UnfoldingBandwidthScannerND" to study multivariate unfolding behavior as a function of provided bandwidth values. * Added DummyLocalPolyFilter1D class useful when a filter is needed which does not smooth anything. * Added function "poissonLogLikelihood" (header file npstat/stat/arrayStats.hh). * Added function "pooledDiscreteTabulated1D" (header file npstat/stat/DiscreteDistributions1D.hh). * Implemented class UGaussConvolution1D (convolution of uniform distribution with a Gaussian). * Implemented gamma distribution (class Gamma1D). * Defined interface for comparing binned distributions, AbsBinnedComparison1D. * Implemented several comparison classes for comparing binned distributions: PearsonsChiSquared, BinnedKSTest1D, BinnedADTest1D. Class BinnedKSTest1D pulled in dependence on the "kstest" package. * Made classes LocalPolyFilter1D, LocalPolyFilterND, and SequentialPolyFilterND persistent. * Added code generating dense filter matrices to LocalPolyFilterND and SequentialPolyFilterND (as needed for unfolding). * Made class MemoizingSymbetaFilterProvider persistent. * Implemented function goldenSectionSearchOnAGrid (header file npstat/nm/goldenSectionSearch.hh). * Implemented function parabolicExtremum (header npstat/nm/MathUtils.hh). * Added class DistributionMix1D (header npstat/stat/DistributionMix1D.hh). * Added interface to solving A*X = B, with matrices X and B, to the Matrix class (method "solveLinearSystems"). * Added "reshape" methods to the ArrayND class. * Added "gaussianResponseMatrix" function. * Added a section on unfolding to the documentation. * Added "ems_unfold_1d" example program. Version 3.0.0 - March 14 2014, by I. Volobouev * Added interface to the LAPACK SVD routines. * Added function "lorpeMise1D" to calculate MISE for arbitrary distributions. * Added FoldingFilter1DBuilder class. * Changed interfaces for several high-level functions to use FoldingFilter1DBuilder. The major version number got bumped up. * Split DensityScan1D.hh away from AbsDistribution1D.hh. Version 2.7.0 - March 10 2014, by I. Volobouev * Added code to optimize operations with diagonal matrices. 
* Added discretizedDistance.hh file for simple L1 and L2 distance calculations with numeric arrays. * Added base class for future unfolding code. * The "reset" method of the Matrix class was renamed into "uninitialize" in order to be consistent with ArrayND. * Added function "multinomialCovariance1D". * Added "polyTimesWeight" method to the OrthoPoly1D class. * Added methods "TtimesThis" and "timesT" to the Matrix class. These methods are more efficient than transpose followed by multiplication. Version 2.6.0 - January 30 2014, by I. Volobouev * Added function "lorpeBackgroundCVDensity1D" which linearizes calculation of the cross validation likelihood in semiparametric fits. Argument "linearizeCrossValidation" was added to MinuitSemiparametricFitFcn1D constructor, "lorpeBackground1D" function, etc. * Added the ability to build filters with center point removed to classes WeightTableFilter1DBuilder and StretchingFilter1DBuilder. The function "symbetaLOrPEFilter1D" now has an appropriate switch. * Added "removeRowAndColumn" method to the Matrix class. * Added CircularBuffer class. * Added various plugin bandwidth functions which work with non-integer polynomial degrees. * Switched to the Legendre polynomial basis for calculating all 1-d orthogonal polynomials (instead of monomial basis). * Added MemoizingSymbetaFilterProvider class. * Added "operator+=" method to the MultivariateSumAccumulator class. * Simplified implementation of the PolyFilterCollection1D class. File PolyFilterCollection1D.icc is removed. * Added "RatioOfNormals" 1-d distribution function. Only the density is currently implemented but not the CDF. * Added ExpMapper1d class. Version 2.5.0 - October 15 2013, by I. Volobouev * Added "getFilterMatrix" method to the LocalPolyFilter1D class. * Added "genEigen" method to the Matrix class (for determination of eigenvalues and eigenvectors of general real matrices). * Refactored the LAPACK interface so that interface functions to floats are automatically generated from interface functions to doubles. See the comment at the end of the "lapack_interface.icc" file for the shell commands to do this. Version 2.4.0 - October 6 2013, by I. Volobouev * Added functions "lorpeBackground1D", "lorpeBgCVPseudoLogli1D", and "lorpeBgLogli1D". * Added minuit interface classes "MinuitLOrPEBgCVFcn1D" and "MinuitSemiparametricFitFcn1D". * Added "ScalableDensityConstructor1D" class for use with Minuit interface functions. * Added classes AbsSymbetaFilterProvider and SymbetaPolyCollection1D. Version 2.3.0 - October 1 2013, by I. Volobouev * Allowed point dimensionality to be larger than the histogram dimensionality in the "empiricalCopulaHisto" function. * Added "keepAllFilters" method to AbsFilter1DBuilder and all derived classes. * Implemented exclusion regions for WeightTableFilter1DBuilder and StretchingFilter1DBuilder. * "symbetaLOrPEFilter1D" function (in the header LocalPolyFilter1D.hh) is updated to take the exclusion mask argument. * Added "continuousDegreeTaper" function which can do something meaningful with the continuous LOrPE degree parameter. Version 2.2.0 - June 30 2013, by I. Volobouev * Added classes DiscreteBernsteinPoly1D and BernsteinFilter1DBuilder. * Added classes DiscreteBeta1D and BetaFilter1DBuilder. * Added BifurcatedGauss1D class to model Gaussian-like distributions with different sigmas on the left and right sides. * Added virtual destructors to the classes declared in the Filter1DBuilders.hh header. * Added a method to the Matrix template to calculate Frobenius norm. 
* Added methods to the Matrix template to calculate row and column sums. * Added "directSum" method to the Matrix template. * Added constructor from a subrange of another matrix to the Matrix template. * Added code to the LocalPolyFilter1D class that generates a doubly stochastic filter out of an arbitrary filter. * Added "npstat/nm/definiteIntegrals.hh" header and corresponding .cc file for various infrequently used integrals. * Added "betaKernelsBandwidth" function. Version 2.1.0 - June 20 2013, by I. Volobouev * Fixed couple problems which showed up in the robust regression code due to compiler update. * Fixed CensoredQuantileRegressionOnKDTree::process method (needed this-> dereference for some member). Version 2.0.0 - June 15 2013, by I. Volobouev * Updated to use "Geners" version 1.3.0. A few interfaces were changed (API for the string archives was removed because Geners own string archive facilities are now adequate) so the major version number was bumped up. Version 1.6.0 - June 12 2013, by I. Volobouev * Updated some documentation. * Updated fitCompositeJohnson.icc to use simplified histogram constructors. * Bug fix in the "minuitLocalQuantileRegression1D" function. * Changed the "quantileBinFromCdf" function to use unsigned long type for array indices. * Added "weightedLocalQuantileRegression1D" function (namespace npsi) for local regression with single predictor on weighted points. Version 1.5.0 - May 23 2013, by I. Volobouev * Added interfaces to LAPACK routines DSYEVD, DSYEVR, and corresponding single precision versions. * Added the "symPSDefEffectiveRank" method to the Matrix class for calculating effective ranks of symmetric positive semidefinite matrices. * Added converting constructor and assignment operator to the Matrix class. * Run the Gram-Schmidt procedure twice when orthogonal polynomials are derived in order to improve orthogonality. Version 1.4.0 - May 20 2013, by I. Volobouev * Added the "append" method to the AbsNtuple class. Version 1.3.0 - May 10 2013, by I. Volobouev * Added the code for Hermite polynomial series. * Improved random number generator for the 1-d Gaussian distribution. * Added a framework for discrete 1-d distributions as well as two concrete distribution classes (Poisson1D, DiscreteTabulated1D). * Added functions "readCompressedStringArchiveExt" and "writeCompressedStringArchiveExt" which can read/write either compressed or uncompressed string archives, distinguished by file extension. Version 1.2.1 - March 22 2013, by I. Volobouev * Improved CmdLine.hh in the "examples/C++" directory. * Added class QuantileTable1D. * Added classes LeftCensoredDistribution and RightCensoredDistribution. Version 1.2.0 - March 13 2013, by I. Volobouev * Added convenience "fill" methods to work with the ntuples which have small number of columns (up to 10). * Fixed a bug in AbsRandomGenerator for univariate generators making multivariate points. * Added LOrPEMarginalSmoother class. Version 1.1.1 - March 11 2013, by I. Volobouev * Added utility function "symbetaLOrPEFilter1D" which creates 1-d LOrPE filters using kernels from the symmetric beta family (and the Gaussian). * Added high level driver function "lorpeSmooth1D". * Allowed variables with zero variances for calculation of correlation coefficients in "MultivariateSumsqAccumulator". Such variables will have zero correlation coefficients with all other variables. * Added rebinning constructor to the HistoND class. Version 1.1.0 - March 8 2013, by I. 
Volobouev * Changed NUHistoAxis::fltBinNumber method to produce correct results with interpolation degree 0. It is not yet obvious which method would work best for higher interpolation degrees. * Added functions for converting between StringArchive and python bytearray. They have been placed in a new header: wrap/stringArchiveToBinary.hh. * Added methods "exportMemSlice" and "importMemSlice" to ArrayND. These methods allow for filling array slices from unstructured memory buffers and for exporting array slices to such memory buffers. * Added "simpleColumnNames" function (header file AbsNtuple.hh) to generate trivial column names when ntuple column names are not important. * Added functions "neymanPearsonWindow1D" and "signalToBgMaximum1D". They are declared in a new header npstat/neymanPearsonWindow1D.hh. Version 1.0.5 - December 17 2012, by I. Volobouev * Flush string archives before writing them out in stringArchiveIO.cc. * Added class TruncatedDistribution1D. Version 1.0.4 - November 14 2012, by I. Volobouev * Added utilities for reading/writing Geners string archives to files. * Added BinSummary class. * Doxygen documentation improved. Every header file in stat, nm, rng, and interfaces now has a brief description. Version 1.0.3 - September 27 2012, by I. Volobouev * Fixed some bugs related to moving StorableMultivariateFunctor code from "nm" to "stat". Version 1.0.2 - August 6 2012, by I. Volobouev * Added converting copy constructor to the "LinInterpolatedTableND" class. * Added StorableMultivariateFunctor class (together with the corresponding reader class). * Added StorableInterpolationFunctor class which inherits from the above and can be used with interpolation tables. * Added StorableHistoNDFunctor class which inherits from StorableMultivariateFunctor and can be used to interpolate histogram bins. * Added "transpose" method to HistoND class. * Created DualAxis class. * Created DualHistoAxis class. * Added conversion functions between histogram and grid axes. * Added "mergeTwoHistos" function for smooth merging of two histograms. * Added "ProductSymmetricBetaNDCdf" functor to be used as weight in merging histograms. * Added CoordinateSelector class. Version 1.0.1 - June 29 2012, by I. Volobouev * Implemented class LinInterpolatedTableND with related supporting code.