Index: trunk/NEWS =================================================================== --- trunk/NEWS (revision 827) +++ trunk/NEWS (revision 828) @@ -1,924 +1,924 @@ -Version 5.7.0 - development +Version 5.7.0 - September 7 2022, by I. Volobouev * Added method "weightedPointsAverages" to the AbsClassicalOrthoPoly1D and ContOrthoPoly1D classes. * Added methods "sumOfSquaredWeights" to classes WeightedStatAccumulator and WeightedSampleAccumulator. * Added method "sigmaRange" to WeightedSampleAccumulator class. * Added header file LOrPE1DCV.hh with various cross-validation faclities for unbinned LOrPE in one dimension. * Added function "sampleDistro1DWithWeight". * Added class "GaussianDip". * Added functions "findScanMinimum1D" and "findScanMaximum1D", mainly for use in Python code. * Interfaced class DeltaMixture1D to python. DeltaMixture1D can now deal with coordinate ties. * Added classes AcceptanceFunctor1D and InverseAcceptanceFunctor1D. * Changed signature of filter and weight functors used with ntuples (added row number argument to all calls). * Added class ModulatedDistribution1D. * Added weight setting capabilitied to classes LOrPE1D and LOrPE1DCV. * Added function solveForLOrPEWeights together with helper classes LOrPEWeightsResult, LOrPE1DFixedDegreeCVRunner, LOrPE1DFixedDegreeCVScanner, and LOrPEWeightsLocalConvergence. * Added function semiMixLocalLogLikelihood. * Added class FejerQuadratureSet memoizing FejerQuadrature objects. * Added function "assignResamplingWeights". * Added class PowerTransform1D. * Added 1-d distribution classes PowerRatio1D and PowerHalfCauchy1D. * Added class EllipticalDistribution and some derived classes. Version 5.6.0 - May 31 2022, by I. Volobouev * Changed AbsKDE1DKernel, KDE1DDensityKernel, KDE1DHOSymbetaKernel, and KDE1D classes to accept a normalization factor. In certain situations, this allows for a simple boundary correction by data mirroring. Note, however, that cross-validation does not work with mirroring. * Added "resampleWithReplacement" templated function. * Added class FoldedDistribution1D. * Added class LOrPE1DSymbetaKernel, enabling unbinned LOrPE. * Fixed a bug in the support determination of TruncatedDistribution1D. * Added class LOrPE1D, with intended use similar to KDE1D. * Improved some Python wrappers. Version 5.5.1 - May 8 2022, by I. Volobouev * Changed the SWIG interface so that one can now use references to abstract classes for restoring objects from Geners archives. * Added "subrange" function to the SWIG interface of the Matrix class. Version 5.5.0 - April 15 2022, by I. Volobouev * Added method "extWeightAverages" to the ContOrthoPoly1D class. * Added method "getNull" to the AbsUnbinnedGOFTest1D class. * Added class UnbinnedGOFTest1DFactory. Version 5.4.0 - March 7 2022, by I. Volobouev * Added method "extWeightAverages" to the AbsClassicalOrthoPoly1D class. * Bug fix in the "random" method of DistributionMix1D class. Version 5.3.0 - July 29 2021, by I. Volobouev * Added some facilities for quadruple precision calculations. They mostly rely on the __float128 type (gcc extension, libquadmath) and on compiling the LAPACK/BLAS libraries with the "-freal-8-real-16" switch. * Added the "GaussLegendreQuadratureQ" class capable of performing Gauss-Legendre quadratures in quadruple precision. * Added the "GaussLegendreQuadrature2D" class capable of performing tensor-product Gauss-Legendre quadratures in two dimensions. * Added OrthoPoly1DDeg functor. * Added class ScalableClassicalOrthoPoly1D. * Added class ClassicalOrthoPoly1DFromWeight. This also necessitated refactoring of the class DensityOrthoPoly1D. * Added classes AbsIntervalQuadrature1D and RectangleQuadrature1D. * Added function "kernelSensitivityMatrix". * Added function "sumOfSquares". * Added a number of .i files. * Added classes AbsUnbinnedGOFTest1D and OrthoPolyGOFTest1D. * Added class PolynomialDistro1D. * Added a number of unbinned goodness-of-fit tests. * Added class Gauss1DQuadrature. * Added class SmoothGOFTest1D. * Added class Poly1D". Version 5.2.0 - July 20 2020, by I. Volobouev * Added a number of .i files. * Added classes "DiscreteGauss1DBuilder" and "DiscreteGaussCopulaSmoother". * Added class "MatrixFilter1DBuilder". * Added facilities for saddlepoint approximations: classes AbsCGF1D and SeriesCGF1D, function saddlepointDistribution1D, as well as class ExpTiltedDistribution1D which can be useful in saddlepoint approximation precision studies. * Added class "DensityOrthoPoly1D" for building orthonormal polynomial systems using (almost) arbitrary 1-d densities as weights. * Added "empiricalMoment" method to the AbsDistribution1D class. * Added class "HeatEq1DNeumannBoundary". * Added methods "sampleAverages" and "sampleProductAverages" to the ContOrthoPoly1D class. * Added method "expectation" to the AbsDistribution1D class. Version 5.1.0 - Nov 17 2019, by I. Volobouev * Started using python-specific features (pythonprepend, etc) in the SWIG .i files. * Added function "scannedKSDistance". * Added Python API for persistence of several classes. Version 5.0.0 - Oct 17 2019, by I. Volobouev * C++11 support is now mandatory. * Added more classes and functions to Python API. Overhauled some of the existing interfaces. Updated everything to python3. * Increased the maximum order of "classical" Edgeworth expansions to 14. * Increased the maximum order of conversions between 1-d central moments and cumulants to 20. * Added "histoQuantiles" function. * Added "correctDensityEstimateGHU" function. * Added "randomHistoFill1D" utility function. * Added ComparisonDistribution1D class. * Added classes GaussND, LinTransformedDistroND, and DistributionMixND. * Added a lot of SWIG .i files Version 4.11.0 - July 22 2019, by I. Volobouev * Added support for cumulant calculations for various Wald statistics in the Poisson process model. * Added functions convertCumulantsToCentralMoments and convertCentralMomentsToCumulants (header file cumulantConversion.hh). * Added function "cumulantUncertainties". Version 4.10.0 - July 11 2019, by I. Volobouev * Added SemiInfGaussianQuadrature class. * Added functions arrayMoment, arrayMoments, and arrayCentralMoments. * Added enum EdgeworthSeriesMethod and class EdgeworthSeries1D. * Added DeltaMixture1D class. * Added enum LikelihoodStatisticType. * Added functions "mixtureModelCumulants" and "poissonProcessCumulants" in the header likelihoodStatisticCumulants.hh. Version 4.9.0 - Dec 18 2018, by I. Volobouev * Added "integratePoly" and "jointIntegral" methods to the AbsClassicalOrthoPoly1D class. * Added utility functions "truncatedInverseSqrt" and "matrixIndexPairs". * Added a number of functions to "orthoPoly1DVProducts.hh". * Added classes ChebyshevOrthoPoly1st and ChebyshevOrthoPoly2nd inheriting from AbsClassicalOrthoPoly1D. * Added class HermiteProbOrthoPoly1D. * Added FejerQuadrature class. * Added classe IsoscelesTriangle1D and Logistic1D. * Added classes AbsKDE1DKernel and KDE1DHOSymbetaKernel. * Added static function "optimalDegreeHart" to OSDE1D class. Version 4.8.0 - Jul 9 2018, by I. Volobouev * Added ShiftedLegendreOrthoPoly1D class. * Added Lanczos method to generate recurrence coefficients for the ContOrthoPoly1D class. * Added npstat/stat/orthoPoly1DVProducts.hh file with various utilities for statistical analyis of chaos polynomials. Version 4.7.0 - Jan 13 2018, by I. Volobouev * Added "UniPareto1D" distribution (uniform with Pareto tail to the right). * More coordinate/weight cases for the GaussLegendreQuadrature class. * Added ContOrthoPoly1D class -- continuous orthogonal polynomials with discrete measure. * Added functions "linearLeastSquares" and "tdSymEigen" to the Matrix class. * Added OSDE1D class. * Added classes LocationScaleFamily1D and SinhAsinhTransform1D. * Added new functors (CdfFunctor1D, etc) as AbsDistribution1D helpers. * Small fix in StatAccumulatorArr.cc. Version 4.6.0 - Jan 23 2017, by I. Volobouev * Updated 1-d LOrPE cross validation code (classes AbsBandwidthCV, BandwidthCVLeastSquares1D, BandwidthCVPseudoLogli1D) for use with weighted samples in the case the sample itself is available at the point cross validation is run. Version 4.5.0 - Aug 01 2016, by I. Volobouev * A small fix in OrthoPolyND.icc (switched from cycling over unsigned to unsigned long in the scalar product function). * Implemented Gauss-Hermite quadrature with Gaussian density weight. * Changed the meaning of Quadratic1D and LogQuadratic1D parameters to be consistent with Legendre polynomial coefficients on [-1, 1] (new parameters are 1/2 of old). * Added class MinuitUnbinnedFitFcn1D (to interfaces). * Added function "findRootNewtonRaphson". * Added "statUncertainties" header with various functions. Version 4.4.0 - May 9 2016, by I. Volobouev * Added "timestamp" function. * Improved implementation of BinnedDensity1D::unscaledQuantile function. Certain problems caused by round-off errors are now fixed. * Added the capability to use the single closest parameter cells (thus disabling actual interpolation between parameter values, for speed) to "GridInterpolatedDistribution". Version 4.3.0 - March 19 2016, by I. Volobouev * Put additional assert statements inside OrthoPolyND.icc to prevent phony "array subscript is above array bounds" messages in g++ 4.9.2. * Improved CmdLine.hh. * Additional methods in CopulaInterpolationND and GridInterpolatedDistribution. * Added function "volumeDensityFromBinnedRadial". * Added convenience method "globalFilter" to the OrthoPoly1D class. * Initiated the transition of the Python API from Python 2 to Python 3. Version 4.2.0 - October 29 2015, by I. Volobouev * Added interpolation methods for the marginals to classes "CopulaInterpolationND" and "GridInterpolatedDistribution". * Removed assert on underflow in the "igamc" function. Now in such cases this function simply returns 0.0. Version 4.1.0 - July 27 2015, by I. Volobouev * Made a few additional methods virtual in AbsNtuple. * Declared methods "columnIndices" of AbsNtuple const (as they should be). * Added function "weightedCopulaHisto" to build copulas for samples of weighted points. * Added function "weightedVariableBandwidthSmooth1D" to use variable bandwidth smoothing with weighted histograms. * Added "AbsWeightedMarginalSmoother" interface class for smoothing samples of weighted points. Modified classes ConstantBandwidthSmoother1D, JohnsonKDESmoother, and LOrPEMarginalSmoother to support this interface. * Added class "VariableBandwidthSmoother1D" which implements both AbsMarginalSmoother and AbsWeightedMarginalSmoother interfaces. * Implemented cross-validation for weighted samples. * Implemented "buildInterpolatedCompositeDistroND" for generic construction of multivariate transfer functions. * Implemented "buildInterpolatedDistro1DNP" for generic construction of univariate transfer functions. Version 4.0.1 - June 17 2015, by I. Volobouev * Added "dump_qmc" example executable. Version 4.0.0 - June 10 2015, by I. Volobouev * Complete overhaul of 1-d filter-building code. Addition of new boundary methods is a lot easier now. The user API for choosing a LOrPE boundary method is encapsulated in the new "BoundaryHandling" class. * Implemented a number of new filter builders with different boundary treatments. * Updated the "LocalPolyFilter1D" class so that it holds the local bandwidth factors derived by the filter builders. Version 3.8.0 - June 1 2015, by I. Volobouev * Implemented class ConstSqFilter1DBuilder (declared in the header file npstat/stat/Filter1DBuilders.hh). The "BoundaryMethod" enum has been extended accordingly. Other files using this enum have been updated. * Implemented class FoldingSqFilter1DBuilder. Similar to ConstSqFilter1DBuilder but it also folds the kernel in addition to stretching it. * Added virtual destructors to a number of classes. * Added "externalMemArrayND" with various signatures to allow the use of ArrayND with memory not managed by ArrayND. * Added move constructors and move assignment operators to ArrayND and Matrix classes. Version 3.7.0 - May 10 2015, by I. Volobouev * Better numerical derivative calculation in InterpolatedDistribution1D.cc. * Added class "LocalMultiFilter1D" for fast generation of filters which correspond to each orthogonal polynomial separately. * Added a function calculating the area of n-dimensional sphere. * Added I/O capabilities to the RadialProfileND class. * Added class "LogRatioTransform1D". * Added utility function "multiFill1DHistoWithCDFWeights" (header file histoUtils.hh). * Avoiding underflow of the incomplete gamma in "convertToSphericalRandom". Version 3.6.0 - April 6 2015, by I. Volobouev * Fixed Marsaglia's code calculating the Anderson-Darling statistics (it was breaking down for large values of z). * Added high-level driver function "simpleVariableBandwidthSmooth1D" to automatically build the pilot estimate for "variableBandwidthSmooth1D". * Swithched to log of sigma as Minuit parameter in "minuitFitJohnsonCurves" instead of sigma (linear sigma would sometimes break the fit when Minuit would come up with 0 or negative trial value for it). * Extended "MinuitDensityFitFcn1D" class so that it could be used to fit non-uniformly binned histograms. * Adapted "minuitFitJohnsonCurves" function so that it could be used with histograms templated upon DualHistoAxis. * Added functions "fillArrayCentersPreservingAreas" and "canFillArrayCentersPreservingAreas". * Implemented an interface to monotonous coordinate transformations with the intent of using them in constructing densities. Implemented a number of transforms. * Implemented class "TransformedDistribution1D". * Added class "VerticallyInterpolatedDistro1D1P". * Added utility function "fill1DHistoWithCDFWeights". Version 3.5.0 - February 21 2015, by I. Volobouev * Added "symPDEigenInv" method to the Matrix class. * Added "variableCount" method to unfolding bandwidth scanner classes. * Increased the tolerance parameters in JohnsonSu::initialize and in JohnsonFit constructor. * Bug fix in function "fillHistoFromText". Version 3.4.4 - January 13 2015, by I. Volobouev * Corrected handling of some "assert" statements so that the code compiles correctly with the -DNDEBUG option. Version 3.4.3 - January 5 2015, by I. Volobouev * Implemented class MirroredGauss1D. * Added method "getOracleData" to class UnfoldingBandwidthScanner1D. * Bug fix in FoldingFilter1DBuilder::makeOrthoPoly. Version 3.4.2 - December 15 2014, by I. Volobouev * Implemented InterpolatedDistro1D1P class. Version 3.4.1 - November 07 2014, by I. Volobouev * Implemented "divideTransforms" function for deconvolutions. * Implemented the Moyal distribution. * Added "fillHistoFromText" utility function. * Added "apply_lorpe_1d" example. Version 3.4.0 - October 01 2014, by I. Volobouev * Implemented Hadamard product and Hadamard ratio for matrices. * Bug fix in the "solve_linear_system" lapack interface function. Version 3.3.1 - August 08 2014, by I. Volobouev * Terminate iterative refinement of the unfolding error propagation matrix early in case the solution does not seem to improve. Version 3.3.0 - August 05 2014, by I. Volobouev * Added correction factors to the determination of the number of fitted parameters in various unfolding procedures. Version 3.2.0 - July 25 2014, by I. Volobouev * Added "gaussianResponseMatrix" function for non-uniform binning. * Added Pareto distribution. * Implemented EMS unfolding with sparse matrices. * Added methods "getObservedShape" and "getUnfoldedShape" to the AbsUnfoldND class. * Bug fix in the assignment operator of ProductDistributionND class. Made class ProductDistributionND persistent. * Bug fix in the error propagation for unfolding, in the code which takes into account the extra normalization constant. * Added "productResponseMatrix" function to assist in making sparse response matrices. * Bug fix in the factory constructor of the Cauchy1D class. * Completed implementation of the "RatioOfNormals" class. Version 3.1.0 - June 29 2014, by I. Volobouev * Improved (again) random number generator for the 1-d Gaussian distribution. Something about expectation values of normalized Hermite polynomials over random numbers made by this generator is still not fully understood. The standard deviation of these expectations decreases with the polynomial order (while it should stay constant). It is possible that the numbers of points used are simply insufficient to sample the tails correctly. * Implemented smoothed expectation-maximization (a.k.a. D'Agostini) unfolding for 1-d distributions in classes SmoothedEMUnfold1D and MultiscaleEMUnfold1D. In certain usage scenarios, MultiscaleEMUnfold1D can be more efficient than SmoothedEMUnfold1D. * Implemented smoothed expectation-maximization unfolding for multivariate distributions in a class SmoothedEMUnfoldND. * Added class "UnfoldingBandwidthScanner1D" to study 1-d unfolding behavior as a function of filter bandwidth. * Added class "UnfoldingBandwidthScannerND" to study multivariate unfolding behavior as a function of provided bandwidth values. * Added DummyLocalPolyFilter1D class useful when a filter is needed which does not smooth anything. * Added function "poissonLogLikelihood" (header file npstat/stat/arrayStats.hh). * Added function "pooledDiscreteTabulated1D" (header file npstat/stat/DiscreteDistributions1D.hh). * Implemented class UGaussConvolution1D (convolution of uniform distribution with a Gaussian). * Implemented gamma distribution (class Gamma1D). * Defined interface for comparing binned distributions, AbsBinnedComparison1D. * Implemented several comparison classes for comparing binned distributions: PearsonsChiSquared, BinnedKSTest1D, BinnedADTest1D. Class BinnedKSTest1D pulled in dependence on the "kstest" package. * Made classes LocalPolyFilter1D, LocalPolyFilterND, and SequentialPolyFilterND persistent. * Added code generating dense filter matrices to LocalPolyFilterND and SequentialPolyFilterND (as needed for unfolding). * Made class MemoizingSymbetaFilterProvider persistent. * Implemented function goldenSectionSearchOnAGrid (header file npstat/nm/goldenSectionSearch.hh). * Implemented function parabolicExtremum (header npstat/nm/MathUtils.hh). * Added class DistributionMix1D (header npstat/stat/DistributionMix1D.hh). * Added interface to solving A*X = B, with matrices X and B, to the Matrix class (method "solveLinearSystems"). * Added "reshape" methods to the ArrayND class. * Added "gaussianResponseMatrix" function. * Added a section on unfolding to the documentation. * Added "ems_unfold_1d" example program. Version 3.0.0 - March 14 2014, by I. Volobouev * Added interface to the LAPACK SVD routines. * Added function "lorpeMise1D" to calculate MISE for arbitrary distributions. * Added FoldingFilter1DBuilder class. * Changed interfaces for several high-level functions to use FoldingFilter1DBuilder. The major version number got bumped up. * Split DensityScan1D.hh away from AbsDistribution1D.hh. Version 2.7.0 - March 10 2014, by I. Volobouev * Added code to optimize operations with diagonal matrices. * Added discretizedDistance.hh file for simple L1 and L2 distance calculations with numeric arrays. * Added base class for future unfolding code. * The "reset" method of the Matrix class was renamed into "uninitialize" in order to be consistent with ArrayND. * Added function "multinomialCovariance1D". * Added "polyTimesWeight" method to the OrthoPoly1D class. * Added methods "TtimesThis" and "timesT" to the Matrix class. These methods are more efficient than transpose followed by multiplication. Version 2.6.0 - January 30 2014, by I. Volobouev * Added function "lorpeBackgroundCVDensity1D" which linearizes calculation of the cross validation likelihood in semiparametric fits. Argument "linearizeCrossValidation" was added to MinuitSemiparametricFitFcn1D constructor, "lorpeBackground1D" function, etc. * Added the ability to build filters with center point removed to classes WeightTableFilter1DBuilder and StretchingFilter1DBuilder. The function "symbetaLOrPEFilter1D" now has an appropriate switch. * Added "removeRowAndColumn" method to the Matrix class. * Added CircularBuffer class. * Added various plugin bandwidth functions which work with non-integer polynomial degrees. * Switched to the Legendre polynomial basis for calculating all 1-d orthogonal polynomials (instead of monomial basis). * Added MemoizingSymbetaFilterProvider class. * Added "operator+=" method to the MultivariateSumAccumulator class. * Simplified implementation of the PolyFilterCollection1D class. File PolyFilterCollection1D.icc is removed. * Added "RatioOfNormals" 1-d distribution function. Only the density is currently implemented but not the CDF. * Added ExpMapper1d class. Version 2.5.0 - October 15 2013, by I. Volobouev * Added "getFilterMatrix" method to the LocalPolyFilter1D class. * Added "genEigen" method to the Matrix class (for determination of eigenvalues and eigenvectors of general real matrices). * Refactored the LAPACK interface so that interface functions to floats are automatically generated from interface functions to doubles. See the comment at the end of the "lapack_interface.icc" file for the shell commands to do this. Version 2.4.0 - October 6 2013, by I. Volobouev * Added functions "lorpeBackground1D", "lorpeBgCVPseudoLogli1D", and "lorpeBgLogli1D". * Added minuit interface classes "MinuitLOrPEBgCVFcn1D" and "MinuitSemiparametricFitFcn1D". * Added "ScalableDensityConstructor1D" class for use with Minuit interface functions. * Added classes AbsSymbetaFilterProvider and SymbetaPolyCollection1D. Version 2.3.0 - October 1 2013, by I. Volobouev * Allowed point dimensionality to be larger than the histogram dimensionality in the "empiricalCopulaHisto" function. * Added "keepAllFilters" method to AbsFilter1DBuilder and all derived classes. * Implemented exclusion regions for WeightTableFilter1DBuilder and StretchingFilter1DBuilder. * "symbetaLOrPEFilter1D" function (in the header LocalPolyFilter1D.hh) is updated to take the exclusion mask argument. * Added "continuousDegreeTaper" function which can do something meaningful with the continuous LOrPE degree parameter. Version 2.2.0 - June 30 2013, by I. Volobouev * Added classes DiscreteBernsteinPoly1D and BernsteinFilter1DBuilder. * Added classes DiscreteBeta1D and BetaFilter1DBuilder. * Added BifurcatedGauss1D class to model Gaussian-like distributions with different sigmas on the left and right sides. * Added virtual destructors to the classes declared in the Filter1DBuilders.hh header. * Added a method to the Matrix template to calculate Frobenius norm. * Added methods to the Matrix template to calculate row and column sums. * Added "directSum" method to the Matrix template. * Added constructor from a subrange of another matrix to the Matrix template. * Added code to the LocalPolyFilter1D class that generates a doubly stochastic filter out of an arbitrary filter. * Added "npstat/nm/definiteIntegrals.hh" header and corresponding .cc file for various infrequently used integrals. * Added "betaKernelsBandwidth" function. Version 2.1.0 - June 20 2013, by I. Volobouev * Fixed couple problems which showed up in the robust regression code due to compiler update. * Fixed CensoredQuantileRegressionOnKDTree::process method (needed this-> dereference for some member). Version 2.0.0 - June 15 2013, by I. Volobouev * Updated to use "Geners" version 1.3.0. A few interfaces were changed (API for the string archives was removed because Geners own string archive facilities are now adequate) so the major version number was bumped up. Version 1.6.0 - June 12 2013, by I. Volobouev * Updated some documentation. * Updated fitCompositeJohnson.icc to use simplified histogram constructors. * Bug fix in the "minuitLocalQuantileRegression1D" function. * Changed the "quantileBinFromCdf" function to use unsigned long type for array indices. * Added "weightedLocalQuantileRegression1D" function (namespace npsi) for local regression with single predictor on weighted points. Version 1.5.0 - May 23 2013, by I. Volobouev * Added interfaces to LAPACK routines DSYEVD, DSYEVR, and corresponding single precision versions. * Added the "symPSDefEffectiveRank" method to the Matrix class for calculating effective ranks of symmetric positive semidefinite matrices. * Added converting constructor and assignment operator to the Matrix class. * Run the Gram-Schmidt procedure twice when orthogonal polynomials are derived in order to improve orthogonality. Version 1.4.0 - May 20 2013, by I. Volobouev * Added the "append" method to the AbsNtuple class. Version 1.3.0 - May 10 2013, by I. Volobouev * Added the code for Hermite polynomial series. * Improved random number generator for the 1-d Gaussian distribution. * Added a framework for discrete 1-d distributions as well as two concrete distribution classes (Poisson1D, DiscreteTabulated1D). * Added functions "readCompressedStringArchiveExt" and "writeCompressedStringArchiveExt" which can read/write either compressed or uncompressed string archives, distinguished by file extension. Version 1.2.1 - March 22 2013, by I. Volobouev * Improved CmdLine.hh in the "examples/C++" directory. * Added class QuantileTable1D. * Added classes LeftCensoredDistribution and RightCensoredDistribution. Version 1.2.0 - March 13 2013, by I. Volobouev * Added convenience "fill" methods to work with the ntuples which have small number of columns (up to 10). * Fixed a bug in AbsRandomGenerator for univariate generators making multivariate points. * Added LOrPEMarginalSmoother class. Version 1.1.1 - March 11 2013, by I. Volobouev * Added utility function "symbetaLOrPEFilter1D" which creates 1-d LOrPE filters using kernels from the symmetric beta family (and the Gaussian). * Added high level driver function "lorpeSmooth1D". * Allowed variables with zero variances for calculation of correlation coefficients in "MultivariateSumsqAccumulator". Such variables will have zero correlation coefficients with all other variables. * Added rebinning constructor to the HistoND class. Version 1.1.0 - March 8 2013, by I. Volobouev * Changed NUHistoAxis::fltBinNumber method to produce correct results with interpolation degree 0. It is not yet obvious which method would work best for higher interpolation degrees. * Added functions for converting between StringArchive and python bytearray. They have been placed in a new header: wrap/stringArchiveToBinary.hh. * Added methods "exportMemSlice" and "importMemSlice" to ArrayND. These methods allow for filling array slices from unstructured memory buffers and for exporting array slices to such memory buffers. * Added "simpleColumnNames" function (header file AbsNtuple.hh) to generate trivial column names when ntuple column names are not important. * Added functions "neymanPearsonWindow1D" and "signalToBgMaximum1D". They are declared in a new header npstat/neymanPearsonWindow1D.hh. Version 1.0.5 - December 17 2012, by I. Volobouev * Flush string archives before writing them out in stringArchiveIO.cc. * Added class TruncatedDistribution1D. Version 1.0.4 - November 14 2012, by I. Volobouev * Added utilities for reading/writing Geners string archives to files. * Added BinSummary class. * Doxygen documentation improved. Every header file in stat, nm, rng, and interfaces now has a brief description. Version 1.0.3 - September 27 2012, by I. Volobouev * Fixed some bugs related to moving StorableMultivariateFunctor code from "nm" to "stat". Version 1.0.2 - August 6 2012, by I. Volobouev * Added converting copy constructor to the "LinInterpolatedTableND" class. * Added StorableMultivariateFunctor class (together with the corresponding reader class). * Added StorableInterpolationFunctor class which inherits from the above and can be used with interpolation tables. * Added StorableHistoNDFunctor class which inherits from StorableMultivariateFunctor and can be used to interpolate histogram bins. * Added "transpose" method to HistoND class. * Created DualAxis class. * Created DualHistoAxis class. * Added conversion functions between histogram and grid axes. * Added "mergeTwoHistos" function for smooth merging of two histograms. * Added "ProductSymmetricBetaNDCdf" functor to be used as weight in merging histograms. * Added CoordinateSelector class. Version 1.0.1 - June 29 2012, by I. Volobouev * Implemented class LinInterpolatedTableND with related supporting code. Index: trunk/doc/stat_methods.pdf =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/pdf Index: trunk/npstat/stat/LOrPE1DFixedDegreeCVRunner.hh =================================================================== --- trunk/npstat/stat/LOrPE1DFixedDegreeCVRunner.hh (revision 827) +++ trunk/npstat/stat/LOrPE1DFixedDegreeCVRunner.hh (revision 828) @@ -1,83 +1,94 @@ #ifndef NPSTAT_LORPE1DFIXEDDEGREECVRUNNER_HH_ #define NPSTAT_LORPE1DFIXEDDEGREECVRUNNER_HH_ +/*! +// \file LOrPE1DFixedDegreeCVScanner.hh +// +// \brief Optimization of LOrPE bandwidth choice by cross-validation +// using the golden section search +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include #include #include "npstat/nm/goldenSectionSearch.hh" #include "npstat/nm/MathUtils.hh" #include "npstat/stat/LOrPE1DCVResult.hh" namespace npstat { class LOrPE1DFixedDegreeCVRunner { public: inline LOrPE1DFixedDegreeCVRunner(const double i_fixedDegree, const double i_bwFactorGuess, const double i_tol) : fixedDegree_(i_fixedDegree), bwFactorGuess_(i_bwFactorGuess), tol_(i_tol) { assert(fixedDegree_ >= 0.0); assert(bwFactorGuess_ > 0.0); assert(tol_ > 0.0); } inline double fixedDegree() const {return fixedDegree_;} inline double bwFactorGuess() const {return bwFactorGuess_;} inline double tol() const {return tol_;} template inline LOrPE1DCVResult crossValidate(Lorpe& lorpe) const { const double oldDeg = lorpe.boundFilterDegree(); lorpe.bindFilterDegree(fixedDegree_); const double sqr2 = 1.414213562373095; double bwFactor = 0.0, bestFcn, divider = 1.0; bool bestBwFound = false; auto fcn = MultiplyByConst(lorpe, -1.0); const unsigned maxtries = 11; // divider can be increased up to // the factor 2^((maxtries-1)/2) for (unsigned itry=0; itry 0.0); // Construct parabolic approximation to the minimum const double fstep = 1.0 + tol_; const double bwMin = bwFactor/fstep; const double bwMax = bwFactor*fstep; double extremumCoordinate, extremumValue; const bool isMinimum = parabolicExtremum( std::log(bwMin), fcn(bwMin), std::log(bwFactor), bestFcn, std::log(bwMax), fcn(bwMax), &extremumCoordinate, &extremumValue); assert(isMinimum); if (oldDeg < 0.0) lorpe.unbindFilterDegree(); else lorpe.bindFilterDegree(oldDeg); return LOrPE1DCVResult( fixedDegree_, std::exp(extremumCoordinate), -extremumValue, true, false); } private: double fixedDegree_; double bwFactorGuess_; double tol_; }; } #endif // NPSTAT_LORPE1DFIXEDDEGREECVRUNNER_HH_ Index: trunk/npstat/stat/bivariateChiSquare.hh =================================================================== --- trunk/npstat/stat/bivariateChiSquare.hh (revision 827) +++ trunk/npstat/stat/bivariateChiSquare.hh (revision 828) @@ -1,14 +1,25 @@ #ifndef NPSTAT_BIVARIATECHISQUARE_HH_ #define NPSTAT_BIVARIATECHISQUARE_HH_ -// Imagine the bivariate normal density. -// mux and muy are shifts (means). +/*! +// \file bivariateChiSquare.hh +// +// \brief Simple expression for the chi-square in the bivariate case +// +// Author: I. Volobouev +// +// August 2022 +*/ + +/** +// Imagine the bivariate normal density. mux and muy are shifts (means). // sx and sy are the standard deviations (must be positive). // rho is the correlation coefficient (must have |rho| < 1.0). // The function returns the chi-square between the bivariate mean // and the point (x, y). The bivariate normal density at (x, y) // is proportional to exp(-chi_square/2). +*/ double bivariateChiSquare(double mux, double muy, double sx, double sy, double rho, double x, double y); #endif // NPSTAT_BIVARIATECHISQUARE_HH_ Index: trunk/npstat/stat/semiMixLocalLogLikelihood.hh =================================================================== --- trunk/npstat/stat/semiMixLocalLogLikelihood.hh (revision 827) +++ trunk/npstat/stat/semiMixLocalLogLikelihood.hh (revision 828) @@ -1,61 +1,71 @@ #ifndef NPSTAT_SEMIMIXLOCALLOGLIKELIHOOD_HH_ #define NPSTAT_SEMIMIXLOCALLOGLIKELIHOOD_HH_ +/*! +// \file semiMixLocalLogLikelihood.hh +// +// \brief Local likelihood for fitting semiparametric mixture models +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include #include #include #include #include #include "npstat/nm/SimpleFunctors.hh" #include "npstat/nm/coordAndWeight.hh" #include "npstat/nm/fcnOrConst.hh" #include "npstat/stat/AbsDistribution1D.hh" namespace npstat { // WFcn can be just a constant (convertible to double) template inline double semiMixLocalLogLikelihood( const std::vector& sortedCoords, const AbsDistribution1D& signal, const double alpha, const Functor1& bgDensityFcn, const WFcn& localizingWeight, const double localizingWeightXmin, const double localizingWeightXmax) { if (localizingWeightXmin >= localizingWeightXmax) throw std::invalid_argument( "In npstat::semiMixLocalLogLikelihood: invalid weight boundaries"); const double logMin = std::log(DBL_MIN); long double sum = 0.0L; const unsigned long sz = sortedCoords.size(); double x, dummy; for (unsigned long i=0; i= localizingWeightXmin && x <= localizingWeightXmax) { const double w = fcnOrConst(localizingWeight, x); assert(w >= 0.0); if (w > 0.0) { const double s = signal.density(x); const double bg = bgDensityFcn(x); const double d = alpha*s + (1.0-alpha)*bg; const double logd = d > DBL_MIN ? std::log(d) : logMin; sum += w*logd; } } } return sum; } } #endif // NPSTAT_SEMIMIXLOCALLOGLIKELIHOOD_HH_ Index: trunk/npstat/stat/LOrPEWeightsLocalConvergence.hh =================================================================== --- trunk/npstat/stat/LOrPEWeightsLocalConvergence.hh (revision 827) +++ trunk/npstat/stat/LOrPEWeightsLocalConvergence.hh (revision 828) @@ -1,86 +1,97 @@ #ifndef NPSTAT_LORPEWEIGHTSLOCALCONVERGENCE_HH_ #define NPSTAT_LORPEWEIGHTSLOCALCONVERGENCE_HH_ +/*! +// \file LOrPEWeightsLocalConvergence.hh +// +// \brief Class that decides whether the iterations of the LOrPE +// semiparametric mixture Fredholm equation have converged +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include #include namespace npstat { class LOrPEWeightsLocalConvergence { public: inline LOrPEWeightsLocalConvergence(const double i_tol, const double i_normPower, const bool i_useDensity) : tol_(i_tol), normPower_(i_normPower), checkConVergenceForDensity_(i_useDensity) { assert(tol_ > 0.0); assert(normPower_ > 0.0); } inline double tol() const {return tol_;} inline double normPower() const {return normPower_;} inline bool checkingConVergenceForDensity() const {return checkConVergenceForDensity_;} template inline bool converged(const Lorpe& lorpe, const double* previousWeights, const double* currentWeights, const double* previousBg, const double* currentBg) const { if (checkConVergenceForDensity_) return locallyConverged(lorpe, previousBg, currentBg); else return locallyConverged(lorpe, previousWeights, currentWeights); } private: template inline bool locallyConverged(const Lorpe& lorpe, const double* previous, const double* current) const { assert(previous); assert(current); const auto& sample = lorpe.coords(); const unsigned long sampleSize = sample.size(); long double sum = 0.0L, wsum = 0.0L; double p; for (unsigned long i=0; i= 0.0); if (w > 0.0) { wsum += w; const double ac = std::abs(current[i]); const double ap = std::abs(previous[i]); const double d = std::abs(current[i] - previous[i])*2.0/(ac + ap + 1.0); if (normPower_ == 1.0) p = d; else if (normPower_ == 2.0) p = d*d; else p = std::pow(d, normPower_); sum += w*p; } } assert(wsum > 0.0L); const double weightedDiff = sum/wsum; const double norm = std::pow(weightedDiff, 1.0/normPower_); return norm < tol_; } double tol_; double normPower_; bool checkConVergenceForDensity_; }; } #endif // NPSTAT_LORPEWEIGHTSLOCALCONVERGENCE_HH_ Index: trunk/npstat/stat/LOrPE1DFixedDegreeCVPicker.hh =================================================================== --- trunk/npstat/stat/LOrPE1DFixedDegreeCVPicker.hh (revision 827) +++ trunk/npstat/stat/LOrPE1DFixedDegreeCVPicker.hh (revision 828) @@ -1,169 +1,181 @@ #ifndef NPSTAT_LORPE1DFIXEDDEGREECVPICKER_HH_ #define NPSTAT_LORPE1DFIXEDDEGREECVPICKER_HH_ +/*! +// \file LOrPE1DFixedDegreeCVPicker.hh +// +// \brief Optimization of LOrPE bandwidth choice by cross-validation +// with a hybrid algorithm combining golden section search and +// a grid scan +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include #include #include #include #include #include "npstat/nm/goldenSectionSearch.hh" #include "npstat/nm/MathUtils.hh" #include "npstat/nm/ScanExtremum1D.hh" #include "npstat/nm/EquidistantSequence.hh" #include "npstat/nm/DualAxis.hh" #include "npstat/nm/SimpleFunctors.hh" #include "npstat/stat/LOrPE1DCVResult.hh" namespace npstat { namespace Private { template class LOrPE1DFixedDegreeCVPickerHelper : public Functor1 { public: inline LOrPE1DFixedDegreeCVPickerHelper(Lorpe& lorpe, const double fixedDegree) : lorpe_(lorpe), fixedDegree_(fixedDegree) { assert(fixedDegree_ >= 0.0); } inline virtual ~LOrPE1DFixedDegreeCVPickerHelper() {} inline virtual double operator()(const double& bwFactor) const { const double cv = lorpe_(fixedDegree_, bwFactor); const auto insertResult = cvValues_.insert(std::make_pair(bwFactor,cv)); assert(insertResult.second); return -cv; } inline std::pair getMemoized(const double bwFactor) const { const auto it = cvValues_.find(bwFactor); if (it == cvValues_.end()) return std::make_pair(0.0, false); else return std::make_pair(it->second, true); } private: Lorpe& lorpe_; double fixedDegree_; mutable std::map cvValues_; }; } class LOrPE1DFixedDegreeCVPicker { public: inline LOrPE1DFixedDegreeCVPicker(const double degree, const double minBwFactor, const double maxBwFactor, const unsigned nBwFactors, const double firstBwFactorToTry, const unsigned initialStepSizeInFactorsGridCells) : bandwidthFactors_(EquidistantInLogSpace(minBwFactor, maxBwFactor, nBwFactors)), fixedDegree_(degree), initialStep_(initialStepSizeInFactorsGridCells) { if (fixedDegree_ < 0.0) throw std::invalid_argument( "In npstat::LOrPE1DFixedDegreeCVPicker constructor: " "degree argument must be non-negative"); if (minBwFactor >= maxBwFactor) throw std::invalid_argument( "In npstat::LOrPE1DFixedDegreeCVPicker constructor: " "minimum factor must be less than maximum"); i0_ = bandwidthFactors_.getInterval(firstBwFactorToTry).first; assert(i0_ < nBwFactors); if (!initialStep_) initialStep_ = 1U; } inline unsigned nBwFactors() const {return bandwidthFactors_.nCoords();} inline double getBwFactor(const unsigned i) const {return bandwidthFactors_.coordinate(i);} inline double searchStart() const {return bandwidthFactors_.coordinate(i0_);} template inline LOrPE1DCVResult crossValidate(Lorpe& lorpe) const { typedef Private::LOrPE1DFixedDegreeCVPickerHelper Helper; const unsigned nscan = bandwidthFactors_.nCoords(); const Helper helper(lorpe, fixedDegree_); unsigned imin = UINT_MAX; double fMinusOne, fmin, fPlusOne; const MinSearchStatus1D status = goldenSectionSearchOnAGrid( helper, bandwidthFactors_, i0_, initialStep_, &imin, &fMinusOne, &fmin, &fPlusOne); if (status == MIN_SEARCH_FAILED) { // This most likely means we have found a local maximum // instead of a minimum. Switch to a simple scan instead // to find the global minimum. std::vector buffer(2*nscan); double* bwLogs = &buffer[0]; double* cvValues = bwLogs + nscan; for (unsigned i=0; i& searched = helper.getMemoized(bwFactor); if (searched.second) cvValues[i] = searched.first; else cvValues[i] = lorpe(fixedDegree_, bwFactor); } const ScanExtremum1D& scanMax = findScanMaximum1D(bwLogs, nscan, cvValues, nscan); return LOrPE1DCVResult( fixedDegree_, std::exp(scanMax.location()), scanMax.value(), true, scanMax.isOnTheBoundary()); } else if (status == MIN_SEARCH_OK) { assert(imin); assert(imin + 1U < nscan); const double bwMin = bandwidthFactors_.coordinate(imin-1U); const double bwFactor = bandwidthFactors_.coordinate(imin); const double bwMax = bandwidthFactors_.coordinate(imin+1U); // Build parabolic approximation in the log space double extremumCoordinate, extremumValue; const bool isMinimum = parabolicExtremum( std::log(bwMin), fMinusOne, std::log(bwFactor), fmin, std::log(bwMax), fPlusOne, &extremumCoordinate, &extremumValue); assert(isMinimum); return LOrPE1DCVResult( fixedDegree_, std::exp(extremumCoordinate), -extremumValue, true, false); } else { // The minimum is on the grid boundary assert(imin < nscan); const double bwFactor = bandwidthFactors_.coordinate(imin); const std::pair& searched = helper.getMemoized(bwFactor); assert(searched.second); return LOrPE1DCVResult( fixedDegree_, bwFactor, searched.first, true, true); } } private: DualAxis bandwidthFactors_; double fixedDegree_; unsigned i0_; unsigned initialStep_; }; } #endif // NPSTAT_LORPE1DFIXEDDEGREECVPICKER_HH_ Index: trunk/npstat/stat/EllipticalDistribution.hh =================================================================== --- trunk/npstat/stat/EllipticalDistribution.hh (revision 827) +++ trunk/npstat/stat/EllipticalDistribution.hh (revision 828) @@ -1,134 +1,135 @@ #ifndef NPSTAT_ELLIPTICALDISTRIBUTION_HH_ #define NPSTAT_ELLIPTICALDISTRIBUTION_HH_ /*! // \file EllipticalDistribution.hh // // \brief Multivariate elliptical ditributions // // Author: I. Volobouev // // September 2022 */ #include #include "npstat/nm/Matrix.hh" #include "npstat/stat/AbsDistribution1D.hh" #include "npstat/stat/AbsDistributionND.hh" namespace npstat { class EllipticalDistribution : public AbsDistributionND { public: /** // The constructor arguments are as follows: // // location, dim The shift and the dimensionality of // the distribution. The array "location" // must have at least "dim" elements. // // transformationMatrix The square matrix for generating random // variables from this density according to // x = location + transformationMatrix*y, // where y is a spherically distributed // random variable. For multivariate normal, // transformationMatrix is the square root // of the covariance matrix. // // gDistro The "generator" distribution. The // multivariate density is going to be // proportional to gDistro.density(chi-square), // where chi-square variable is constructed // as in the multivariate normal. // Must have gDistro.quantile(0.0) = 0.0. // // hDistro Distribution of the r^2 variable. Must have // hDistro.quantile(0.0) = 0.0. */ EllipticalDistribution(const double* location, unsigned dim, const Matrix& transformationMatrix, const AbsDistribution1D& gDistro, const AbsDistribution1D& hDistro); EllipticalDistribution(const EllipticalDistribution&); EllipticalDistribution& operator=(const EllipticalDistribution&); inline virtual EllipticalDistribution* clone() const {return new EllipticalDistribution(*this);} inline virtual ~EllipticalDistribution() {cleanup();} virtual double density(const double* x, unsigned dim) const; virtual void unitMap(const double* rnd, unsigned bufLen, double* x) const; inline virtual bool mappedByQuantiles() const {return false;} virtual unsigned random(AbsRandomGenerator& g, double* x, unsigned lenX) const; + /** This function returns the value of the quadratic form */ double chiSquare(const double* x, unsigned dim) const; inline const AbsDistribution1D& getGDistro() const {return *g_;} inline const AbsDistribution1D& getHDistro() const {return *h_;} inline const Matrix& getTransformationMatrix() const {return A_;} inline const std::vector& getShift() const {return mu_;} // Methods needed for I/O inline virtual gs::ClassId classId() const {return gs::ClassId(*this);} virtual bool write(std::ostream& os) const; static inline const char* classname() {return "npstat::EllipticalDistribution";} static inline unsigned version() {return 1;} static EllipticalDistribution* read(const gs::ClassId& id, std::istream& in); protected: virtual bool isEqual(const AbsDistributionND&) const; std::vector mu_; Matrix A_; Matrix InvCovmat_; AbsDistribution1D* g_; AbsDistribution1D* h_; double det_; double gNorm_; private: EllipticalDistribution(); void initialize(const AbsDistribution1D& g, const AbsDistribution1D& h); void cleanup(); mutable std::vector buf_; #ifdef SWIG public: inline EllipticalDistribution(const double* location1, unsigned dim1, const double* transform, unsigned nrows, unsigned ncols, const AbsDistribution1D& gDistro, const AbsDistribution1D& hDistro) : AbsDistributionND(dim1), mu_(location1, location1+dim1), A_(nrows, ncols, transform), g_(0), h_(0), det_(0.0), gNorm_(0.0), buf_(dim1) { assert(location1); assert(dim1); assert(transform); initialize(gDistro, hDistro); } inline AbsDistribution1D* getGDistro2() const {return g_->clone();} inline AbsDistribution1D* getHDistro2() const {return h_->clone();} inline Matrix getTransformationMatrix2() const {return A_;} inline std::vector getShift2() const {return mu_;} #endif // SWIG }; } #endif // NPSTAT_ELLIPTICALDISTRIBUTION_HH_ Index: trunk/npstat/stat/00README.txt =================================================================== --- trunk/npstat/stat/00README.txt (revision 827) +++ trunk/npstat/stat/00README.txt (revision 828) @@ -1,808 +1,849 @@ The code in this directory can depend on headers from the "nm" and "rng" directories in the "npstat" package. The classes implemented in this directory can be split into several subgroups by their purpose: * Data representation * Descriptive statistics * Statistical distributions * Fitting of parametric models * Statistical testing * Local filtering * Nonparametric density estimation * Deconvolution density estimation (Unfolding) * Nonparametric regression * Interpolation of statistical distributions * Miscellaneous data analysis techniques * Convenience API * Utilities * I/O Data representation ------------------- AbsNtuple.hh -- interface class for ntuples. Implemented in InMemoryNtuple.hh, ArchivedNtuple.hh. To be used by application code. NtHistoFill.hh, NtNtupleFill.hh, NtRectangularCut.hh -- convenience classes and functors for use inside "cycleOverRows", "conditionalCycleOverRows", and other similar ntuple methods. Column.hh -- helper class for AbsNtuple.hh. Can be used to refer to ntuple columns either by name or by column number. Used by AbsNtuple.hh, and should not be used separately. ArchivedNtuple.hh -- class for storing huge ntuples which do not fit in the computer memory. To be used by application code. HistoAxis.hh -- representation of a histogram axis with equidistant bins. NUHistoAxis.hh -- representation of a histogram axis with non-uniform bins. DualHistoAxis.hh -- histogram axis which works reasonably well for both uniform and non-uniform bins. convertAxis.hh -- conversion functions between histogram and grid axes. HistoND.hh -- arbitrary-dimensional histogram template. interpolateHistoND.hh -- interpolation of histogram data to coordinate values between bin centers. mergeTwoHistos.hh -- a utility for smooth blending of two histograms. ProductSymmetricBetaNDCdf.hh -- an interpolation function for blending histograms. InMemoryNtuple.hh -- class for storing small ntuples which completely fit in the computer memory. Works faster than ArchivedNtuple. NtupleBuffer.hh -- data buffer for data exchange between ArchivedNtuple and disk-based archive. OrderedPointND.hh -- a multidimensional point which can be sorted according to multiple sorting criteria. randomHistoFill1D.hh -- utilities for filling 1-d histograms with random samples. randomHistoFillND.hh -- utilities for filling multivariate histograms with random samples. StorableMultivariateFunctor.hh -- Base class for storable multivariate functors. StorableHistoNDFunctor.hh -- Adaptation that represents HistoND as a StorableMultivariateFunctor. HistoNDFunctorInstances.hh -- A number of concrete typedefs for StorableHistoNDFunctor template. StorableInterpolationFunctor.hh -- Adaptation that represents LinInterpolatedTableND as a StorableMultivariateFunctor. InterpolationFunctorInstances.hh -- A number of concrete typedefs for StorableInterpolationFunctor template. Descriptive statistics ---------------------- ArrayProjectors.hh -- helper classes for making lower-dimensional projections of the ArrayND class. Calculate various statistics over projected dimensions (mean, median, etc). arrayStats.hh -- means, standard deviations, and covariance matrices of multivariate arrays. Skewness, kurtosis, and cumulants for 1-d arrays. cumulantConversion.hh -- conversions between central moments and cumulants for 1-d distributions. cumulantUncertainties.hh -- uncertainties for sample cumulants calculated from population cumulants. statUncertainties.hh -- uncertainties of various sample statistics. histoStats.hh -- means and covariance matrices for histograms. kendallsTau.hh -- Kendall's rank correlation coefficient, from a sample of points or from copula. spearmansRho.hh -- Spearman's rank correlation coefficient, from a sample of points or from copula. logLikelihoodPeak.hh -- summary of a 1-d log-likelihood curve. MultivariateSumAccumulator.hh -- classes in these files are MultivariateSumsqAccumulator.hh intended for calculating MultivariateWeightedSumAccumulator.hh covariance and correlation MultivariateWeightedSumsqAccumulator.hh matrices of multivariate data in a numerically stable manner. These classes can serve as appropriate accumulator functors in various "cycleOverRows" methods of AbsNtuple or as separate tools. SampleAccumulator.hh -- accumulator of items for the purpose of calculating statistical summaries. For use inside histograms, etc. WeightedSampleAccumulator.hh -- similar class for use with weighted items. CircularBuffer.hh -- accumulator of items with fixed length. When this length is exceeded, the oldest items are discarded. Can calculate statistical summaries (mean, stdev) that do not require element sorting. StatAccumulator.hh -- simple, single-pass calculator of mean and standard deviation. Updates a running average, so it works a bit better than a "naive" version. StatAccumulatorPair.hh -- simple, single-pass calculator of mean, standard deviation, and covariance for two variables. StatAccumulatorArr.hh -- simple, single-pass calculator of mean and standard deviation for array sets. CrossCovarianceAccumulator.hh -- single-pass calculator of covariances and correlations for samples with elements represented by arrays. WeightedStatAccumulator.hh -- single-pass calculator of mean and standard deviation for weighted points. WeightedStatAccumulatorPair.hh -- simple, single-pass calculator of mean, standard deviation, and covariance for weighted points. BinSummary.hh -- a five-number summary for SampleAccumulator, StatAccumulator, WeightedSampleAccumulator, and WeightedSampleAccumulator which can be used for making box plots, etc. Allows for direct manipulation of the center value, upper and lower ranges, and min/max value. Statistical distributions ------------------------- AbsDistribution1D.hh -- interface classes for 1-d parametric and tabulated statistical distributions. AbsDiscreteDistribution1D.hh -- interface classes for 1-d discrete statistical distributions. Implemented in DiscreteDistributions1D.hh. AbsDistributionTransform1D.hh -- interface class for 1-d coordinate transforms intended for subsequent use in constructing distributions (in particular, with the TransformedDistribution1D class). AbsCGF1D.hh -- interface class for 1-d cumulant generating functions. AsinhTransform1D.hh -- asinh transform (just like the one used to generate Johnson's S_U curves). SinhAsinhTransform1D.hh -- transform y = sinh(a + b*asinh(x)). LogRatioTransform1D.hh -- log(y/(1-y)) transform with scale and location adjustment (just like the one used to generate Johnson's S_B curves). LogTransform1D.hh -- transform y = log(1 + x). +PowerTransform1D.hh -- transform y = a x^p. + AbsDistributionND.hh -- interface classes for multivariate statistical distributions. ComparisonDistribution1D.hh -- comparison distributions in one dimension CompositeDistribution1D.hh -- represents univariate statistical distributions whose cumulative distribution functions F(x) can be built by composition of two other cumulative distribution functions: F(x) = G(H(x)). CompositeDistributionND.hh -- represents multivariate statistical distributions decomposed into copula and marginals. Copulas.hh -- distributions which are copulas. CompositeDistros1D.hh -- several implementations of CompositeDistribution1D.hh using concrete distributions. DeltaMixture1D.hh -- a mixture of Dirac delta functions. DensityOrthoPoly1D.hh -- generate a system of orthonormal polynomials for an arbitrary density used as the weight. DistributionMixND.hh -- a mixture of multivariate distributions. Distributions1D.hh -- a number of continuous 1-d statistical distributions. DistributionsND.hh -- a number of continuous multivariate statistical distributions. DiscreteDistributions1D.hh -- several discrete statistical distributions, including Poisson and tabulated. DistributionMix1D.hh -- Mixtures of one-dimensional statistical distributions. distro1DStats.hh -- a utility for empirical calculation of distribution mean, standard deviation, skewness, and kurtosis. EdgeworthSeries1D.hh -- Edgeworth series. EdgeworthSeriesMethod.hh -- Helper class for defining Edgeworth series. +EllipticalDistribution.hh -- General implementation of multivariate + elliptical ditributions. + +EllipticalDistributions.hh -- A few concrete multivariate elliptical + ditributions with simple parameterizations. + ExpTiltedDistribution1D.hh -- Exponentially tilted density (a.k.a. "s-tilted"). FoldedDistribution1D.hh -- 1-d distributions with support folded into an interval. GaussianMixtureEntry.hh -- Helper class for defining Gaussian mixtures. GaussianMixture1D.hh -- One-dimensional Gaussian mixtures. GridRandomizer.hh -- class which knows how to map the unit hypercube into a distribution whose density is represented on an n-d grid (and, therefore, how to generate corresponding random numbers). Not intended for direct use by application code (use class "BinnedDensityND" instead). IdentityTransform1D.hh -- Identity coordinate transformation. JohnsonCurves.hh -- Johnson frequency curves. LeftCensoredDistribution.hh -- left-censored distribution. LocationScaleFamily1D.hh -- Creates a location-scale family from a (typically non-scalable) 1-d distribution. LocationScaleTransform1.hh -- Coordinate transformation of the type y = (x - mu)/sigma, with mu and sigma depending on a single parameter and calculated by two separate functors. +ModulatedDistribution1D.hh -- 1-d continuous statistical distributions + multiplied by an arbitrary smooth non-negative function. + PolynomialDistro1D.hh -- statistical distribution constructed using orthonormal Legendre polynomial series. +PowerHalfCauchy1D.hh -- distribution with unscaled density proportional + to x^m/(1 + x^2)^n for x >= 0 (and 0 for x < 0). + +PowerRatio1D.hh -- distribution with unscaled density proportional to + x^m/(1 + x)^n for x >= 0 (and 0 for x < 0). + RatioOfNormals.hh -- Distribution generated when one Gaussian variable is divided by another. RightCensoredDistribution.hh -- right-censored distribution. saddlepointDistribution1D.hh -- saddlepoint approximation to distributions. SbMomentsCalculator.hh -- Calculator of moments for S_b curves. Not for use by application code. ScalableGaussND.hh -- multivariate Gaussian with diagonal covariance matrix. SeriesCGF1D.hh -- cumulant generating function implementation with truncated series using known cumulants. TransformedDistribution1D.hh -- 1-d distributions in x but whose density, cdf, etc are specified in the y (transformed) space. TruncatedDistribution1D.hh -- 1-d distributions with truncated support. UGaussConvolution1D.hh -- convolution of uniform and Gaussian distributions. Fitting of parametric models ---------------------------- FitUtils.hh -- fitting of 1-d histograms. See also headers minuitFitJohnsonCurves.hh and MinuitDensityFitFcn1D.hh in the "interfaces" directory. Statistical testing ------------------- AbsBinnedComparison1D.hh -- interface for comparisons of binned distributions. AbsUnbinnedGOFTest1D.hh -- interface for one-sample unbinned goodness-of-fit tests for 1-d distributions. BinnedADTest1D.hh -- binned version of the Anderson-Darling test. BinnedKSTest1D.hh -- binned version of the Kolmogorov-Smirnov test. likelihoodStatisticCumulants.hh -- calculating cumulants for various signal testing statistics for subsequent use in Edgeworth serires. LikelihoodStatisticType.hh -- enum for the statistics that can be used to test for signal presence in mixture models or Poisson process models. OrthoPolyGOFTest1D.hh -- orthogonal polynomial goodness-of-fit test. PearsonsChiSquared.hh -- chi-squared goodness-of-fit test. scannedKSDistance.hh -- Kolmogorov-Smirnov distance between two (oracle) 1-d distributions. SmoothGOFTest1D.hh -- smooth test for goodness-of-fit. UnbinnedGOFTests1D.hh -- a variety of goodness-of-fit tests (KS, AD, Cramer-von Mises, Zhang's Z_K, Z_A, and Z_C). UnbinnedGOFTest1DFactory.hh -- a factory for creating a number of GoF tests from a uniform interface. Local filtering --------------- AbsFilter1DBuilder.hh -- interface classes for building local polynomial filters in 1d. Implemented in Filter1DBuilders.hh, BetaFilter1DBuilder.hh, WeightTableFilter1DBuilder.hh, BernsteinFilter1DBuilder.hh, DiscreteGauss1DBuilder.hh, MatrixFilter1DBuilder.hh. Used by LocalPolyFilter1D.hh. AbsPolyFilter1D.hh -- interface class for building univariate smoothers that can be optimized by cross-validation. Implemented in ConstantBandwidthSmoother1D.hh, LocalPolyFilter1D.hh. AbsPolyFilterND.hh -- interface class for building multivariate smoothers that can be optimized by cross-validation. Implemented in LocalPolyFilterND.hh, SequentialPolyFilterND.hh, and KDEFilterND.hh. Used by AbsBandwidthCV.hh, BandwidthCVLeastSquaresND.hh, BandwidthCVPseudoLogliND.hh. AbsSymbetaFilterProvider.hh -- interface class for building local polynomial filters in 1d using kernels from the symmetric beta family. MemoizingSymbetaFilterProvider.hh -- implements AbsSymbetaFilterProvider interface and allows for filter storage and lookup. BernsteinFilter1DBuilder.hh -- concrete class for building filters which smooth densities using Bernstein polynomials. BetaFilter1DBuilder.hh -- concrete class for building filters which smooth densities using beta functions (Bernstein polynomials of non-integer degree). betaKernelsBandwidth.hh -- optimal bandwidth for density estimation with beta kernels. BoundaryHandling.hh -- user API for handling LOrPE boundary methods. BoundaryMethod.hh -- enums for handling LOrPE boundary methods. continuousDegreeTaper.hh -- a method for generating tapers with effectively continuous degree. Intended for use with LocalPolyFilter1D. correctDensityEstimateGHU.hh -- Glad-Hjort-Ushakov correction for density estimates that can be negative DiscreteGauss1DBuilder.hh -- concrete class for building filters which smooth densities using the Green's function of the discretized heat equation. Filter1DBuilders.hh -- concrete classes for building local polynomial filters in 1d. They differ by their treatment of the weight function and boundary effects. MatrixFilter1DBuilder.hh -- building a local polynomial filter from a filtering matrix calculated by other code. WeightTableFilter1DBuilder.hh -- concrete classes for building local polynomial filters in 1d from density scans. KDEFilterND.hh -- KDE filtering (Nadaraya-Watson regression) on a regularly spaced 1-d grid. LocalPolyFilter1D.hh -- local polynomial filtering (regression) on a regularly spaced 1-d grid. LocalPolyFilterND.hh -- local polynomial filtering (regression) on a regularly spaced multivariate grid. LocalMultiFilter1D.hh -- local polynomial filtering with separate polynomials from an orthogonal set. LocalQuadraticLeastSquaresND.hh -- local quadratic polynomial filtering for an irregular set of points (possibly including uncertainties). +LOrPE1DCVResult.hh -- an object representing the result of the 1-d LOrPE + cross-validation procedure. + +LOrPE1DFixedDegreeCVPicker.hh -- various algorithms for finding the +LOrPE1DFixedDegreeCVRunner.hh optimal LOrPE bandwidth by +LOrPE1DFixedDegreeCVScanner.hh cross-validation, for a fixed LOrPE degree. + +LOrPE1DVariableDegreeCVPicker.hh -- algorithms for finding the optimal +LOrPE1DVariableDegreeCVRunner.hh LOrPE bandwidth and degree by + cross-validation. + lorpeSmooth1D.hh -- high level driver for LocalPolyFilter1D, etc. Intended for density reconstruction from histograms. lorpeBackground1D.hh -- high level driver for fitting mixed models in which signal is parameterized and background is nonparametric. lorpeBackgroundCVDensity1D.hh -- linearization of cross-validation calculations for fitting mixed models with nonparametric background. +LOrPEWeightsLocalConvergence.hh -- class that decides whether the iterations + of the LOrPE semiparametric mixture Fredholm equation have converged. + QuadraticOrthoPolyND.hh -- local quadratic polynomial filtering on a grid. In comparison with LocalPolyFilterND.hh, supports a finer interface to filtering functionality (direct support of an AbsDistributionND as a weight, calculations of gradient and hessian for the fitted surface, fitting is performed on functors rather than ArrayND objects, etc). Used by LocalLogisticRegression.hh. +semiMixLocalLogLikelihood.hh -- local likelihood for fitting semiparametric + mixture models. + SequentialPolyFilterND.hh -- similar to LocalPolyFilterND.hh, but the filtering is performed sequentially for each dimension using 1-d filters. +solveForLOrPEWeights.hh -- solver for the non-linear Fredholm equation + in the semiparametric mixture model with unbinned LOrPE. + SymbetaPolyCollection1D.hh -- class that builds LocalPolyFilter1D objects and memoizes local polynomials for the bandwidth values used. Nonparametric density estimation -------------------------------- AbsBandwidthCV.hh -- interface classes for calculating 1-d and multivariate cross-validation criteria for bandwidth and taper selection. Interfaces declared in this file are implemented in BandwidthCVLeastSquares1D.hh, BandwidthCVLeastSquaresND.hh, BandwidthCVPseudoLogli1D.hh, and BandwidthCVPseudoLogliND.hh. These interfaces are used by classes in CVCopulaSmoother.hh. AbsBandwidthGCV.hh -- interface classes for calculating 1-d and multivariate cross-validation criteria for bandwidth and taper selection. Interfaces declared in this file are implemented in BandwidthGCVLeastSquares1D.hh, BandwidthGCVLeastSquaresND.hh, BandwidthGCVPseudoLogli1D.hh, and BandwidthGCVPseudoLogliND.hh. These interfaces are used by classes in GCVCopulaSmoother.hh. The difference with the series of classes defined in "AbsBandwidthCV.hh" is that the grouping (i.e., the binning) of data is explicitly acknowledged, so that a substantially different set of filters (removing the whole group) can be used for cross-validation. AbsCompositeDistroBuilder.hh -- interface class for building composite distrubutions (which consist of copula and marginals) out of data samples. Implemented in DummyCompositeDistroBuilder.hh and NonparametricCompositeBuilder.hh. AbsDistro1DBuilder.hh -- interface class for building 1-d distributions out of data samples. Implemented in DummyDistro1DBuilder.hh and NonparametricDistro1DBuilder.hh. AbsCopulaSmootherBase.hh -- interface class for building copulas out of data samples. Implemented in AbsCVCopulaSmoother.hh. Used by NonparametricCompositeBuilder.hh. AbsKDE1DKernel.hh -- interface class for simple, brute-force KDE in 1-d without discretization or boundary correction. Implemented in KDE1DHOSymbetaKernel.hh. AbsMarginalSmootherBase.hh -- interface class for building 1-d smoothers of histograms. Implemented in JohnsonKDESmoother.hh, LOrPEMarginalSmoother.hh, ConstantBandwidthSmoother1D.hh, and VariableBandwidthSmoother1D.hh. Used by NonparametricCompositeBuilder.hh. AbsResponseIntervalBuilder.hh -- interface class for making cuts in the inivariate response space when density estimation is performed in the regression context. Implemented in DummyResponseIntervalBuilder.hh and RatioResponseIntervalBuilder.hh. AbsResponseBoxBuilder.hh -- interface class for making cuts in the multivariate response space when density estimation is performed in the regression context. Implemented in DummyResponseBoxBuilder.hh and RatioResponseBoxBuilder.hh. amiseOptimalBandwidth.hh -- function for selecting optimal LOrPE bandwidth values by optimizing AMISE on a reference distribution. Used in JohnsonKDESmoother.cc and ConstantBandwidthSmoother1D.cc. Can also be used by application code. BandwidthCVLeastSquares1D.hh -- class for calculating KDE or LOrPE cross-validation MISE approximations for 1-d density estimates. BandwidthCVLeastSquaresND.hh -- class for calculating KDE or LOrPE cross-validation MISE approximations for multivariate density estimates. BandwidthCVPseudoLogli1D.hh -- class for calculating KDE or LOrPE cross-validation pseudo log likelihood, for 1-d density estimates. BandwidthCVPseudoLogliND.hh -- Class for calculating KDE or LOrPE cross-validation pseudo log likelihood, for multivariate density estimates. buildInterpolatedCompositeDistroND.hh -- Multivariate density estimation in the regression context, with interpolation. buildInterpolatedDistro1DNP.hh -- Univariate density estimation in the regression context, with interpolation. ConstantBandwidthSmoother1D.hh -- 1-d KDE implementation with constant bandwidth. Implements AbsMarginalSmoother interface. ConstantBandwidthSmootherND.hh -- multivariate KDE implementation with constant bandwidth. CVCopulaSmoother.hh -- an interface to copula smoothers which use constant bandwidth LOrPE and select bandwidth by cross-validation. Implemented in LOrPECopulaSmoother.hh, SequentialCopulaSmoother.hh, KDECopulaSmoother.hh, DiscreteGaussCopulaSmoother.hh, and BernsteinCopulaSmoother.hh. Could be used by application code if it needs to develop its own cross-validation method for nonparametric copula estimation. GCVCopulaSmoother.hh -- an interface to copula smoothers working with grouped data and using substantially different filters for cross-validation. Implemented in KDEGroupedCopulaSmoother.hh, LOrPEGroupedCopulaSmoother.hh, and SequentialGroupedCopulaSmoother.hh. empiricalCopula.hh -- functions for building empirical copulas by constructing kd-tree for the data points and then doing lookups in this tree. empiricalCopulaHisto.hh -- function for building empirical copula densities by ordering the data points in multiple dimensions. weightedCopulaHisto.hh -- function for building empirical copula densities by ordering weighted data points in multiple dimensions. HistoNDCdf.hh -- multivariate cumulative distribution function built from a histogram. Its "coveringBox" method can be used to make k-NN type density estimates (and for other purposes). JohnsonKDESmoother.hh -- 1-d KDE implementation with adaptive bandwidth (see comments in the header file for details). Implements AbsMarginalSmoother interface. See also "fitCompositeJohnson.hh" header in the "interfaces" directory for an alternative approach. KDE1D.hh -- Convenience class which aggregates the kernel and the data for brute-force 1-d KDE without boundary correction. KDE1DCV.hh -- Cross-validation utilities for brute-force KDE in 1-d. KDE1DHOSymbetaKernel.hh -- high order Gaussian or symmetric beta kernels for brute-force KDE in 1-d. KDECopulaSmoother.hh -- constant bandwidth multivariate KDE copula smoother in which the bandwidth is selected by cross-validation. Implements CVCopulaSmoother. kernelSensitivityMatrix.hh -- calculation of the sensitivity matrix for KDE-like density estimation. LOrPE1D.hh -- Convenience class which aggregates the kernel and the data for unbinned LOrPE in one dimension. LOrPE1DCV.hh -- Unbinned density estimation by LOrPE with cross-validation. LOrPE1DSymbetaKernel.hh -- high order Gaussian or symmetric beta kernels for unbinned LOrPE in one dimension. LOrPECopulaSmoother.hh -- constant bandwidth multivariate LOrPE copula smoother in which the bandwidth is selected by cross-validation. Implements CVCopulaSmoother. LOrPEMarginalSmoother.hh -- 1-d LOrPE for fitting margins of composite distributions. Basically, interfaces "lorpeSmooth1D" to AbsMarginalSmoother. lorpeMise1D.hh -- Deterministic MISE calculation for reconstructing an arbitrary 1-d density. NonparametricCompositeBuilder.hh -- an implementation of AbsCompositeDistroBuilder. Uses separate density estimation procedures for copula and marginals. orthoPoly1DVProducts.hh -- utility functions for calculating certain statistical properties of 1-d orthogonal polynomials. OSDE1D.hh -- orthogonal series density estimation in one dimension. PolyFilterCollection1D.hh -- collection of LocalPolyFilter1D objects with the same kernel but different bandwidth values. Intended for use with bandwidth scans (for example, in cross-validation scenarios). QuantileTable1D.hh -- density function defined by its quantile table. Can be easily constructed using "empiricalQuantile" function from StatUtils.hh. SequentialCopulaSmoother.hh -- similar to LOrPECopulaSmoother, but the filters are limited to tensor products of univariate filters. variableBandwidthSmooth1D.hh -- KDE with adaptive bandwidth. Used by JohnsonKDESmoother.hh. Deconvolution density estimation (Unfolding) -------------------------------------------- AbsUnfold1D.hh -- interface class for deconvolution density estimation in 1-d (a.k.a. unfolding). AbsUnfoldND.hh -- interface class for multivariate deconvolution density estimation (a.k.a. unfolding). AbsUnfoldingFilterND.hh -- interface class for smoothers used in multivariate unfolding. gaussianResponseMatrix.hh -- helper function for building response matrices for one-dimensional unfolding problems. MultiscaleEMUnfold1D.hh -- a variation of 1-d unfolding algorithm with multiscale filtering and, potentially, faster convergence. productResponseMatrix.hh -- helper function for building sparse response matrices for multivariate unfolding problems. ResponseMatrix.hh -- sparse response matrix representation for multivariate unfolding. SmoothedEMUnfold1D.hh -- expectation-maximization (a.k.a. D'Agostini) unfolding with smoothing for 1-d distributions. SmoothedEMUnfoldND.hh -- expectation-maximization unfolding with smoothing for multivariate distributions. UnfoldingBandwidthScanner1D.hh -- class which gets various information from 1-d unfolding results in a convenient form. UnfoldingBandwidthScannerND.hh -- class which gets various information from multivariate unfolding results in a convenient form. Nonparametric regression ------------------------ LocalLogisticRegression.hh -- facilities for performing local linear and quadratic logistic regression. The interface is designed for use together with Minuit. See also the header "minuitLocalRegression.hh" in the "interfaces" directory. QuantileRegression1D.hh -- nonparametric quantile regression with one predictor. Supports polynomials of arbitrary degrees. Useful for constructing Neyman belts. See also "minuitLocalQuantileRegression1D.hh" header in the "interfaces" directory. LocalQuantileRegression.hh -- multivariate local linear or quadratic quantile regression. Can be used to fit histograms or collections of points. See also "minuitQuantileRegression.hh" header in the "interfaces" directory. CensoredQuantileRegression.hh -- multivariate local linear or quadratic quantile regression which can be used for data samples affected by a one-sided cut. griddedRobustRegression.hh -- framework for local robust regression (in particular, for local least trimmed squares). GriddedRobustRegressionStop.hh -- simple functor for stopping robust regression sequence. AbsLossCalculator.hh -- abstract class for calculating local loss for local robust regression. Implemented in "WeightedLTSLoss.hh" and "TwoPointsLTSLoss.hh". WeightedLTSLoss.hh -- functor for calculating local least trimmed squares with one point removed. TwoPointsLTSLoss.hh -- functor for calculating local least trimmed squares with two points or 1-d stripes removed. Interpolation of statistical distributions ------------------------------------------ AbsGridInterpolatedDistribution.hh -- interface class for interpolating between probability distributions placed at the points of a rectangular parameter grid. Implemented in GridInterpolatedDistribution.hh. To be used by application code. AbsInterpolatedDistribution1D.hh -- interface class for univariate density interpolation algorithms. Implemented by InterpolatedDistribution1D.hh and VerticallyInterpolatedDistribution1D.hh. AbsInterpolationAlgoND.hh -- interface class for multivariate density interpolation algorithms. Implemented by CopulaInterpolationND.hh and UnitMapInterpolationND.hh. Used by GridInterpolatedDistribution.hh. CopulaInterpolationND.hh -- interpolation of distributions represented by CompositeDistributionND. Copulas and quantile functions of the marginals are combined with externally provided weights. UnitMapInterpolationND.hh -- interpolation of distributions mapped to the unit cube by conditional quantile functions. GridInterpolatedDistribution.hh -- class which represents a complete multivariate distribution interpolated in parameters. Constructed incrementally, by adding distributions to the grid points. InterpolatedDistribution1D.hh -- 1-d continuous statistical distribution which interpolates between other distributions by performing linear interpolation of the quantile function. InterpolatedDistro1D1P.hh -- 1-d continuous statistical distribution interpolation on a 1-d parameter grid, with linear interpolation of weights between parameter values. Supports both interpolation of quantile functions and vertical interpolation. InterpolatedDistro1DNP.hh -- 1-d continuous statistical distribution interpolation on a multivariate parameter grid, with multilinear interpolation of weights between parameter values. Supports both interpolation of quantile functions and vertical interpolation. UnitMapInterpolationND.hh -- this class interpolates between multivariate distributions by interpolating between their unit maps. Miscellaneous data analysis techniques -------------------------------------- neymanPearsonWindow1D.hh -- search for likelihood ratio maximum and determination of optimal cuts in 1-d based on likelihood ratio between signal and background. Convenience API --------------- DensityScan1D.hh -- utility class for discretizing 1-d densities. DensityScanND.hh -- functor for filling multidimensional arrays with multivariate density scans. Calculates the density in the bin center. discretizationErrorND.hh -- function for calculating the ISE due to discretization of multivariate densities. DensityAveScanND.hh -- functor for filling multidimensional arrays with multivariate density scans. Integrates the density over the bin area. Distribution1DFactory.hh -- creation of a number of 1-d distributions from a uniform interface. scanSymmetricDensityAsWeight.hh -- determines density support and scans a multivariate density in a manner suitable for subsequent construction of orthogonal polynomial systems. Fills one quadrant (octant, etc) only. scanMultivariateDensityAsWeight.hh -- determines density support and scans a multivariate density in a manner suitable for subsequent construction of orthogonal polynomial systems. Performs a complete scan. Utilities --------- AllSymbetaParams1D.hh -- complete set of parameters for building 1-d filters from the symmetric beta family. buildInterpolatedHelpers.hh -- utilities for nonparametric interpolation of statistical distributions. Not for use by application code. +bivariateChiSquare.hh -- calculation of the chi-square in the bivariate case, + with the covariance matrix parameterized by the standard deviations and + the correlation coefficient. + histoUtils.hh -- utilities related to special ways of filling histograms, etc. mirrorWeight.hh -- helper function for scanning multivariate densities. Used by LocalPolyFilterND and KDEFilterND codes. multinomialCovariance1D.hh -- helper function for building multinomial distribution covariance matrices. NMCombinationSequencer.hh -- helper class for a certain type of integer permutations (distinct ways of choosing M out of N objects). sampleDistro1DWithWeight.hh -- special acceptance-rejection sampling from a density. StatUtils.hh -- a few useful functions which did not fit naturally anywhere else. SymbetaParams1D.hh -- collects the parameters of symmetric beta kernels. volumeDensityFromBinnedRadial.hh -- convert a density which was obtained from a histogram of radius values into the density per unit area (or per unit volume or hypervolume). WeightedDistro1DPtr.hh -- associates a pointer to AbsDistribution1D with a weight. Not for use by application code. I/O --- Distribution1DReader.hh -- factory for deserializing 1-d distribution functions. DistributionNDReader.hh -- factory for deserializing N-d distribution functions. distributionReadError.hh -- this code throws an appropriate exception if input I/O operations fail for a distribution previously stored on disk. DiscreteDistribution1DReader.hh -- factory for deserializing 1-d discrete distributions. DistributionTransform1DReader.hh -- factory for deserializing 1-d transforms. fillHistoFromText.hh -- utility for filling histograms from text files similar utility for ntuples in declared in the AbsNtuple.hh header). LocalPolyFilter1DReader.hh -- a reader factory for classes derived from LocalPolyFilter1D. NtupleRecordTypes.hh -- mechanisms for locating parts of the ArchivedNtuple NtupleRecordTypesFwd.hh in the archive. Not for use by application code. NtupleReference.hh -- special reference type for ArchivedNtuple. StorableMultivariateFunctorReader.hh -- factory for deserializing for storable multivariate functors. UnfoldingFilterNDReader.hh -- reader factory for classes derived from AbsUnfoldingFilterND. Index: trunk/npstat/stat/EllipticalDistributions.hh =================================================================== --- trunk/npstat/stat/EllipticalDistributions.hh (revision 827) +++ trunk/npstat/stat/EllipticalDistributions.hh (revision 828) @@ -1,250 +1,251 @@ #ifndef NPSTAT_ELLIPTICALDISTRIBUTIONS_HH_ #define NPSTAT_ELLIPTICALDISTRIBUTIONS_HH_ /*! // \file EllipticalDistributions.hh // // \brief A few concrete examples of multivariate elliptical ditributions +// with simple parameters // // Author: I. Volobouev // // September 2022 */ #include "npstat/stat/EllipticalDistribution.hh" #include "npstat/stat/Distributions1D.hh" #include "npstat/stat/PowerRatio1D.hh" #include "npstat/stat/PowerTransform1D.hh" #include "npstat/stat/TransformedDistribution1D.hh" // The distributions below are taken from Chapter 3 of the // monograph "Symmetric Multivariate and Related Distributions" // by K.-T. Fang, S. Kotz and K. W. Ng, CRC Press, 1990. namespace npstat { /** // EllipticalNormal is the same distribution as GaussND. // The transformation matrix in the constructor is just // the square root of the desired covariance matrix. */ class EllipticalNormal : public EllipticalDistribution { public: inline EllipticalNormal(const double* location1, unsigned dim1, const Matrix& transformationMatrix) : EllipticalDistribution(location1, dim1, transformationMatrix, Exponential1D(0.0, 2.0), Gamma1D(0.0, 2.0, dim1/2.0)) {} inline virtual ~EllipticalNormal() {} // Methods needed for I/O inline virtual gs::ClassId classId() const {return gs::ClassId(*this);} inline virtual bool write(std::ostream& os) const { return EllipticalDistribution::classId().write(os) && EllipticalDistribution::write(os); } static inline const char* classname() {return "npstat::EllipticalNormal";} static inline unsigned version() {return 1;} static inline EllipticalNormal* read( const gs::ClassId& id, std::istream& in) { static const gs::ClassId current( gs::ClassId::makeId()); current.ensureSameId(id); CPP11_auto_ptr pb = gs::read_obj(in); return new EllipticalNormal(*pb); } private: inline EllipticalNormal(const EllipticalDistribution& b) : EllipticalDistribution(b) {} #ifdef SWIG public: inline EllipticalNormal(const double* location1, unsigned dim1, const double* transform, unsigned nrows, unsigned ncols) : EllipticalDistribution(location1, dim1, transform, nrows, ncols, Exponential1D(0.0, 2.0), Gamma1D(0.0, 2.0, dim1/2.0)) {} #endif // SWIG }; class EllipticalKotz : public EllipticalDistribution { public: /** For this distribution we must have s > 0, r > 0, 2*N + dim1 > 2 */ inline EllipticalKotz(const double* location1, unsigned dim1, const Matrix& transformationMatrix, const unsigned N, const double r, const double s) : EllipticalDistribution( location1, dim1, transformationMatrix, TransformedDistribution1D(PowerTransform1D(r, s), Gamma1D(0.0, 1.0, N/s)), TransformedDistribution1D(PowerTransform1D(r, s), Gamma1D(0.0, 1.0, (N+dim1/2.0-1)/s))), N_(N), r_(r), s_(s) {calcNorm();} inline virtual ~EllipticalKotz() {} virtual double density(const double* location1, unsigned dim1) const; // Methods needed for I/O inline virtual gs::ClassId classId() const {return gs::ClassId(*this);} virtual bool write(std::ostream& os) const; static inline const char* classname() {return "npstat::EllipticalKotz";} static inline unsigned version() {return 1;} static EllipticalKotz* read(const gs::ClassId& id, std::istream& in); protected: inline virtual bool isEqual(const AbsDistributionND& other) const { const EllipticalKotz& r = static_cast(other); return EllipticalDistribution::isEqual(r) && N_ == r.N_ && r_ == r.r_ && s_ == r.s_; } private: void calcNorm(); inline EllipticalKotz(const EllipticalDistribution& b, const unsigned N, const double r, const double s) : EllipticalDistribution(b), N_(N), r_(r), s_(s) {calcNorm();} unsigned N_; double r_; double s_; double norm_; #ifdef SWIG public: inline EllipticalKotz(const double* location1, unsigned dim1, const double* transform, unsigned nrows, unsigned ncols, const unsigned N, const double r, const double s) : EllipticalDistribution( location1, dim1, transform, nrows, ncols, TransformedDistribution1D(PowerTransform1D(r, s), Gamma1D(0.0, 1.0, N/s)), TransformedDistribution1D(PowerTransform1D(r, s), Gamma1D(0.0, 1.0, (N+dim1/2.0-1)/s))), N_(N), r_(r), s_(s) {calcNorm();} #endif // SWIG }; class EllipticalPearsonTypeVII : public EllipticalDistribution { public: /** // In the constructor we must have s > 0, 2*N > dim1. // Also, the current implementation is limited to N < 7. */ inline EllipticalPearsonTypeVII(const double* location1, unsigned dim1, const Matrix& transformationMatrix, const unsigned N, const double s) : EllipticalDistribution( location1, dim1, transformationMatrix, PowerRatio1D(0.0, s, 0, N), PowerRatio1D(0.0, s, static_cast(dim1)-2, N)) {} inline virtual ~EllipticalPearsonTypeVII() {} // Methods needed for I/O inline virtual gs::ClassId classId() const {return gs::ClassId(*this);} inline virtual bool write(std::ostream& os) const { return EllipticalDistribution::classId().write(os) && EllipticalDistribution::write(os); } static inline const char* classname() {return "npstat::EllipticalPearsonTypeVII";} static inline unsigned version() {return 1;} static inline EllipticalPearsonTypeVII* read( const gs::ClassId& id, std::istream& in) { static const gs::ClassId current( gs::ClassId::makeId()); current.ensureSameId(id); CPP11_auto_ptr pb = gs::read_obj(in); return new EllipticalPearsonTypeVII(*pb); } private: inline EllipticalPearsonTypeVII(const EllipticalDistribution& b) : EllipticalDistribution(b) {} #ifdef SWIG public: inline EllipticalPearsonTypeVII(const double* location1, unsigned dim1, const double* transform, unsigned nrows, unsigned ncols, const unsigned N, const double s) : EllipticalDistribution( location1, dim1, transform, nrows, ncols, PowerRatio1D(0.0, s, 0, N), PowerRatio1D(0.0, s, static_cast(dim1)-2, N)) {} #endif // SWIG }; class EllipticalPearsonTypeII : public EllipticalDistribution { public: /** // Not sure why Pearson type II elliptical distribution does not have // the "s" parameter in the Fang's book. Here it is included, similar // to Pearson type VII. */ inline EllipticalPearsonTypeII(const double* location1, unsigned dim1, const Matrix& transformationMatrix, const unsigned m, const double s) : EllipticalDistribution(location1, dim1, transformationMatrix, Beta1D(0.0, s, 1.0, m+1U), Beta1D(0.0, s, dim1/2.0, m+1U)) {} inline virtual ~EllipticalPearsonTypeII() {} // Methods needed for I/O inline virtual gs::ClassId classId() const {return gs::ClassId(*this);} inline virtual bool write(std::ostream& os) const { return EllipticalDistribution::classId().write(os) && EllipticalDistribution::write(os); } static inline const char* classname() {return "npstat::EllipticalPearsonTypeII";} static inline unsigned version() {return 1;} static inline EllipticalPearsonTypeII* read( const gs::ClassId& id, std::istream& in) { static const gs::ClassId current( gs::ClassId::makeId()); current.ensureSameId(id); CPP11_auto_ptr pb = gs::read_obj(in); return new EllipticalPearsonTypeII(*pb); } private: inline EllipticalPearsonTypeII(const EllipticalDistribution& b) : EllipticalDistribution(b) {} #ifdef SWIG public: inline EllipticalPearsonTypeII(const double* location1, unsigned dim1, const double* transform, unsigned nrows, unsigned ncols, const unsigned m, const double s) : EllipticalDistribution(location1, dim1, transform, nrows, ncols, Beta1D(0.0, s, 1.0, m+1U), Beta1D(0.0, s, dim1/2.0, m+1U)) {} #endif // SWIG }; } #endif // NPSTAT_ELLIPTICALDISTRIBUTIONS_HH_ Index: trunk/npstat/stat/PowerTransform1D.hh =================================================================== --- trunk/npstat/stat/PowerTransform1D.hh (revision 827) +++ trunk/npstat/stat/PowerTransform1D.hh (revision 828) @@ -1,71 +1,71 @@ #ifndef NPSTAT_POWERTRANSFORM1D_HH_ #define NPSTAT_POWERTRANSFORM1D_HH_ /*! // \file PowerTransform1D.hh // -// \brief Transform y = a x^p (a > 0). Obviously, works only for non-negative x. -// x must be strictly positive if p is negative. +// \brief Transform y = a x^p (a > 0). Obviously, works only for +// non-negative x. x must be strictly positive if p is negative. // // Author: I. Volobouev // // September 2022 */ #include "npstat/stat/AbsDistributionTransform1D.hh" namespace npstat { class PowerTransform1D : public AbsDistributionTransform1D { public: PowerTransform1D(double a, double p); inline virtual ~PowerTransform1D() {} inline virtual PowerTransform1D* clone() const {return new PowerTransform1D(*this);} double transformForward(double x, double* dydx) const; double transformBack(double y) const; inline bool isIncreasing() const {return params_[1] > 0.0;} //@{ /** Prototype needed for I/O */ inline virtual gs::ClassId classId() const {return gs::ClassId(*this);} virtual bool write(std::ostream&) const; //@} static inline const char* classname() {return "npstat::PowerTransform1D";} static inline unsigned version() {return 1;} static PowerTransform1D* read(const gs::ClassId& id, std::istream& is); protected: virtual bool isEqual(const AbsDistributionTransform1D&) const; private: void validateScale(double a); void validatePower(double p); inline void setParameterChecked(const unsigned which, const double value) { if (which) validatePower(value); else validateScale(value); params_[which] = value; } inline void setAllParametersChecked(const double* p) { validateScale(p[0]); validatePower(p[1]); params_[0] = p[0]; params_[1] = p[1]; } inline double getParameterChecked(const unsigned which) const {return params_[which];} double params_[2]; }; } #endif // NPSTAT_POWERTRANSFORM1D_HH_ Index: trunk/npstat/stat/LOrPE1DVariableDegreeCVRunner.hh =================================================================== --- trunk/npstat/stat/LOrPE1DVariableDegreeCVRunner.hh (revision 827) +++ trunk/npstat/stat/LOrPE1DVariableDegreeCVRunner.hh (revision 828) @@ -1,178 +1,190 @@ #ifndef NPSTAT_LORPE1DVARIABLEDEGREECVRUNNER_HH_ #define NPSTAT_LORPE1DVARIABLEDEGREECVRUNNER_HH_ +/*! +// \file LOrPE1DVariableDegreeCVRunner.hh +// +// \brief Simultaneous optimization of LOrPE bandwidth and degree +// choice by cross-validation, assuming that the golden +// section search can succeed for both variables +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include #include #include "npstat/nm/LinInterpolatedTable1D.hh" #include "npstat/nm/ConstSubscriptMap.hh" #include "npstat/nm/DualAxis.hh" #include "npstat/nm/SimpleFunctors.hh" #include "npstat/stat/LOrPE1DFixedDegreeCVRunner.hh" namespace npstat { namespace Private { template class LOrPE1DVariableDegreeCVRunnerHelper : public Functor1 { public: typedef LOrPE1DCVResult Status; typedef ConstSubscriptMap StatusMap; inline LOrPE1DVariableDegreeCVRunnerHelper( Lorpe& lorpe, const LinInterpolatedTable1D& relativeBandwidths, const double referenceDegree, const double referenceBandwidth, const double tol) : lorpe_(lorpe), relativeBandwidths_(relativeBandwidths), referenceDegree_(referenceDegree), referenceBandwidth_(referenceBandwidth), tol_(tol) { assert(referenceDegree_ >= 0.0); assert(referenceBandwidth_ > 0.0); assert(tol_ > 0.0); referenceBwFactor_ = relativeBandwidths_(referenceDegree_); assert(referenceBwFactor_ > 0.0); } inline virtual ~LOrPE1DVariableDegreeCVRunnerHelper() {} inline virtual double operator()(const double& degree) const { // Try to guess a good bandwidth value for this degree const double bw1 = relativeBandwidths_(degree); const double fr = bw1/referenceBwFactor_; double bwGuess = referenceBandwidth_*fr; assert(bwGuess > 0.0); if (!statusMap_.empty()) { StatusMap::const_iterator closest = statusMap_.upper_bound(degree); if (closest != statusMap_.begin()) --closest; const double bw0 = relativeBandwidths_(closest->first); const double f0 = bw1/bw0; if (std::abs(std::log(f0)) < std::abs(std::log(fr))) { bwGuess = closest->second.bwFactor()*f0; assert(bwGuess > 0.0); } } const LOrPE1DFixedDegreeCVRunner r(degree, bwGuess, tol_); const Status& status = r.crossValidate(lorpe_); const auto insertResult = statusMap_.insert(std::make_pair(degree,status)); assert(insertResult.second); return -status.cvFunction(); } inline const StatusMap& getStatusMap() const {return statusMap_;} private: Lorpe& lorpe_; const LinInterpolatedTable1D& relativeBandwidths_; double referenceDegree_; double referenceBandwidth_; double referenceBwFactor_; double tol_; mutable StatusMap statusMap_; }; } class LOrPE1DVariableDegreeCVRunner { public: inline LOrPE1DVariableDegreeCVRunner(const LinInterpolatedTable1D& i_relativeBandwidths, const double i_maxDeg, const unsigned i_nDegsInTheGrid, const double i_initialDeg, const double i_initialBwFactorGuess, const double i_initialDegStep, const double i_bwTol) : relativeBandwidths_(i_relativeBandwidths), degAxis_(UniformAxis(i_nDegsInTheGrid, 0.0, i_maxDeg)), initialDeg_(i_initialDeg), initialBwFactorGuess_(i_initialBwFactorGuess), initialDegStep_(i_initialDegStep), bwTol_(i_bwTol) { assert(i_maxDeg > 0.0); assert(initialDeg_ >= 0.0); assert(initialDeg_ <= i_maxDeg); assert(initialBwFactorGuess_ > 0.0); assert(initialDegStep_ > 0.0); assert(bwTol_ > 0.0); } template inline LOrPE1DCVResult crossValidate(Lorpe& lorpe) const { typedef Private::LOrPE1DVariableDegreeCVRunnerHelper Helper; typedef LOrPE1DCVResult Result; const LOrPE1DFixedDegreeCVRunner r( initialDeg_, initialBwFactorGuess_, bwTol_); const double referenceBandwidth = r.crossValidate(lorpe).bwFactor(); const Helper helper(lorpe, relativeBandwidths_, initialDeg_, referenceBandwidth, bwTol_); const double gridStep = degAxis_.intervalWidth(); const unsigned cellStep = std::round(initialDegStep_/gridStep); const std::pair& where = degAxis_.getInterval(initialDeg_); unsigned deg0 = where.first; if (where.second < 0.5) ++deg0; unsigned imin = UINT_MAX; double fMinusOne, fmin, fPlusOne; const MinSearchStatus1D status = goldenSectionSearchOnAGrid( helper, degAxis_, deg0, cellStep, &imin, &fMinusOne, &fmin, &fPlusOne); assert(status != MIN_SEARCH_FAILED); assert(imin < degAxis_.nCoords()); const double degGuess = degAxis_.coordinate(imin); Result resGuess = helper.getStatusMap()[degGuess]; if (status == MIN_SEARCH_OK) { assert(imin); assert(imin + 1U < degAxis_.nCoords()); // Parabolic approximation in the degree variable double bestDegree, extremumValue; const bool isMinimum = parabolicExtremum( degAxis_.coordinate(imin - 1U), fMinusOne, degGuess, fmin, degAxis_.coordinate(imin + 1U), fPlusOne, &bestDegree, &extremumValue); assert(isMinimum); const double bw0 = relativeBandwidths_(degGuess); const double bw1 = relativeBandwidths_(bestDegree); const double bwtry = resGuess.bwFactor()*bw1/bw0; const LOrPE1DFixedDegreeCVRunner r2(bestDegree, bwtry, bwTol_); Result bestGuess = r2.crossValidate(lorpe); Result* bestResult = bestGuess.cvFunction() > resGuess.cvFunction() ? &bestGuess : &resGuess; bestResult->setDegreeBoundaryFlag(false); return *bestResult; } else // Optimum is on the boundary of the degree grid return resGuess; } private: LinInterpolatedTable1D relativeBandwidths_; DualAxis degAxis_; double initialDeg_; double initialBwFactorGuess_; double initialDegStep_; double bwTol_; }; } #endif // NPSTAT_LORPE1DVARIABLEDEGREECVRUNNER_HH_ Index: trunk/npstat/stat/LOrPE1DFixedDegreeCVScanner.hh =================================================================== --- trunk/npstat/stat/LOrPE1DFixedDegreeCVScanner.hh (revision 827) +++ trunk/npstat/stat/LOrPE1DFixedDegreeCVScanner.hh (revision 828) @@ -1,68 +1,79 @@ #ifndef NPSTAT_LORPE1DFIXEDDEGREECVSCANNER_HH_ #define NPSTAT_LORPE1DFIXEDDEGREECVSCANNER_HH_ +/*! +// \file LOrPE1DFixedDegreeCVScanner.hh +// +// \brief Optimization of LOrPE bandwidth choice by cross-validation +// by simply scanning a predefined set of bandwidth values +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include #include #include #include #include "npstat/nm/isMonotonous.hh" #include "npstat/nm/ScanExtremum1D.hh" #include "npstat/stat/LOrPE1DCVResult.hh" namespace npstat { class LOrPE1DFixedDegreeCVScanner { public: inline LOrPE1DFixedDegreeCVScanner(const double degree, const std::vector& bandwidthFactors) : bandwidthFactors_(bandwidthFactors), logBw_(bandwidthFactors.size()), fixedDegree_(degree) { if (fixedDegree_ < 0.0) throw std::invalid_argument( "In npstat::LOrPE1DFixedDegreeCVScanner constructor: " "degree argument must be non-negative"); const unsigned nscan = bandwidthFactors_.size(); if (nscan < 2U) throw std::invalid_argument( "In npstat::LOrPE1DFixedDegreeCVScanner constructor: " "insufficient number of bandwidth factors"); if (!isStrictlyIncreasing(bandwidthFactors_.begin(), bandwidthFactors_.end())) throw std::invalid_argument( "In npstat::LOrPE1DFixedDegreeCVScanner constructor: " "sequence of bandwidth factors must be strictly increasing"); for (unsigned i=0; i inline LOrPE1DCVResult crossValidate(Lorpe& lorpe) const { const unsigned nscan = bandwidthFactors_.size(); std::vector cvValues(nscan); for (unsigned i=0; i bandwidthFactors_; std::vector logBw_; double fixedDegree_; }; } #endif // NPSTAT_LORPE1DFIXEDDEGREECVSCANNER_HH_ Index: trunk/npstat/stat/LOrPE1DCVResult.hh =================================================================== --- trunk/npstat/stat/LOrPE1DCVResult.hh (revision 827) +++ trunk/npstat/stat/LOrPE1DCVResult.hh (revision 828) @@ -1,49 +1,60 @@ #ifndef NPSTAT_LORPE1DCVRESULT_HH_ #define NPSTAT_LORPE1DCVRESULT_HH_ +/*! +// \file LOrPE1DCVResult.hh +// +// \brief An object representing the result of the 1-d LOrPE +// cross-validation procedure +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include namespace npstat { class LOrPE1DCVResult { public: // Default constructor creates an invalid result inline LOrPE1DCVResult() : filterDegree_(-1.0), bwFactor_(0.0), cvFunction_(0.0), onDegreeBoundary_(true), onBwBoundary_(true) {} // This constructor must create a valid result inline LOrPE1DCVResult(const double i_filterDegree, const double i_bwFactor, const double i_cvFunction, const bool i_onDegreeBoundary, const bool i_onBandwidthBoundary) : filterDegree_(i_filterDegree), bwFactor_(i_bwFactor), cvFunction_(i_cvFunction), onDegreeBoundary_(i_onDegreeBoundary), onBwBoundary_(i_onBandwidthBoundary) { assert(this->isValid()); } inline void setDegreeBoundaryFlag(const bool b) {onDegreeBoundary_ = b;} inline double filterDegree() const {return filterDegree_;} inline double bwFactor() const {return bwFactor_;} inline double cvFunction() const {return cvFunction_;} inline bool isOnDegreeBoundary() const {return onDegreeBoundary_;} inline bool isOnBandwidthBoundary() const {return onBwBoundary_;} inline bool isValid() const {return filterDegree_ >= 0.0 && bwFactor_ > 0.0;} private: double filterDegree_; double bwFactor_; double cvFunction_; bool onDegreeBoundary_; bool onBwBoundary_; }; } #endif // NPSTAT_LORPE1DCVRESULT_HH_ Index: trunk/npstat/stat/solveForLOrPEWeights.hh =================================================================== --- trunk/npstat/stat/solveForLOrPEWeights.hh (revision 827) +++ trunk/npstat/stat/solveForLOrPEWeights.hh (revision 828) @@ -1,149 +1,149 @@ #ifndef NPSTAT_SOLVEFORLORPEWEIGHTS_HH_ #define NPSTAT_SOLVEFORLORPEWEIGHTS_HH_ /*! // \file solveForLOrPEWeights.hh // // \brief Solver for the non-linear Fredholm equation in the semiparametric -// mixture approach with unbinned LOrPE +// mixture model with unbinned LOrPE // // Author: I. Volobouev // // July 2022 */ #include #include #include #include #include #include "npstat/stat/LOrPE1DCVResult.hh" #include "npstat/stat/AbsDistribution1D.hh" namespace npstat { class LOrPEWeightsResult { public: inline LOrPEWeightsResult(const double i_alpha, const bool i_converged, const unsigned i_nIterations, const LOrPE1DCVResult& cvResult) : cvResult_(cvResult), alpha_(i_alpha), nIterations_(i_nIterations), converged_(i_converged) { if (converged_) assert(cvResult_.isValid()); } inline double filterDegree() const {return cvResult_.filterDegree();} inline double bwFactor() const {return cvResult_.bwFactor();} inline double cvFunction() const {return cvResult_.cvFunction();} inline bool isOnDegreeBoundary() const {return cvResult_.isOnDegreeBoundary();} inline bool isOnBandwidthBoundary() const {return cvResult_.isOnBandwidthBoundary();} inline double alpha() const {return alpha_;} inline unsigned nIterations() const {return nIterations_;} inline bool converged() const {return converged_;} void print(std::ostream& os) const; private: LOrPE1DCVResult cvResult_; double alpha_; unsigned nIterations_; bool converged_; }; template inline LOrPEWeightsResult solveForLOrPEWeights( Lorpe& lorpe, const CvRunner& cvRunner, const AbsDistribution1D& signal, const double alpha, const ConvergenceCalculator& conv, const unsigned maxIterations) { const auto& sample = lorpe.coords(); const unsigned long sampleSize = sample.size(); std::vector signalDensity(sampleSize); for (unsigned long i=0; i weightBuffer(sampleSize*2); double* previousWeights = &weightBuffer[0]; double* currentWeights = previousWeights + sampleSize; for (unsigned long i=0; i bgDensityBuffer(sampleSize*2); double* previousBg = &bgDensityBuffer[0]; double* currentBg = previousBg + sampleSize; // Run the initial cross-validation LOrPE1DCVResult cvResult = cvRunner.crossValidate(lorpe); { // Create the initial background estimate // at the sample points const double filterDegree = cvResult.filterDegree(); const double bwFactor = cvResult.bwFactor(); for (unsigned long i=0; i= 0.0); } } bool converged = false; unsigned iter = 0; for (; iter 0.0) { const double d = alpha*signalDensity[i] + (1.0 - alpha)*previousBg[i]; assert(d > 0.0); currentWeights[i] = previousBg[i]/d; } else currentWeights[i] = 0.0; } lorpe.setWeights(currentWeights, sampleSize); // Run cross-validation with the new weights cvResult = cvRunner.crossValidate(lorpe); const double filterDegree = cvResult.filterDegree(); const double bwFactor = cvResult.bwFactor(); // Calculate background density for (unsigned long i=0; i= 0.0); } // Check for convergence converged = conv.converged(lorpe, previousWeights, currentWeights, previousBg, currentBg); // Rotate buffers std::swap(previousWeights, currentWeights); std::swap(previousBg, currentBg); } return LOrPEWeightsResult(alpha, converged, iter, cvResult); } } std::ostream& operator<<(std::ostream& os, const npstat::LOrPEWeightsResult& res); #endif // NPSTAT_SOLVEFORLORPEWEIGHTS_HH_ Index: trunk/npstat/stat/LOrPE1DVariableDegreeCVPicker.hh =================================================================== --- trunk/npstat/stat/LOrPE1DVariableDegreeCVPicker.hh (revision 827) +++ trunk/npstat/stat/LOrPE1DVariableDegreeCVPicker.hh (revision 828) @@ -1,324 +1,335 @@ #ifndef NPSTAT_LORPE1DVARIABLEDEGREECVPICKER_HH_ #define NPSTAT_LORPE1DVARIABLEDEGREECVPICKER_HH_ +/*! +// \file LOrPE1DVariableDegreeCVPicker.hh +// +// \brief Simultaneous optimization of LOrPE bandwidth and degree +// choice by cross-validation +// +// Author: I. Volobouev +// +// August 2022 +*/ + #include "npstat/nm/LinearMapper1d.hh" #include "npstat/stat/LOrPE1DFixedDegreeCVPicker.hh" namespace npstat { class LOrPE1DVariableDegreeCVPicker; namespace Private { template class LOrPE1DVariableDegreeCVPickerHelper : public Functor1 { public: typedef std::map StatusMap; inline LOrPE1DVariableDegreeCVPickerHelper( Lorpe& lorpe, const LOrPE1DVariableDegreeCVPicker& picker) : lorpe_(lorpe), picker_(picker) {} inline virtual ~LOrPE1DVariableDegreeCVPickerHelper() {} virtual double operator()(const double& degree) const; // This method returns an invalid result if the degree key is not memoized inline LOrPE1DCVResult getMemoized(const double degree) const { const auto it = statusMap_.find(degree); if (it == statusMap_.end()) return LOrPE1DCVResult(); else return it->second; } private: Lorpe& lorpe_; const LOrPE1DVariableDegreeCVPicker& picker_; mutable StatusMap statusMap_; }; } class LOrPE1DVariableDegreeCVPicker { public: - /* + /** // LOrPE1DVariableDegreeCVPicker constructor arguments are: // // i_maxDeg -- Maximum filter degree to consider // in the optimization procedure. // // minBwFactors -- Minimum bandwidth factors to // consider for each filter degree. // The number of points in the degree // grid will be set to the size // of this vector. For example, // if i_maxDeg = 2 and the size // of minBwFactors is 5, the following // five degrees will be considered: // 0.0, 0.5, 1.0, 1.5, 2.0. The // first element of "minBwFactors" // will be used for degree 0, the // second for degree 0.5, etc. The // size of "minBwFactors" vector // must be at least 2. // // maxBwFactors -- Maximum bandwidth factors to // consider for each degree. The size // of this vector should be equal to // the size of "minBwFactors". // // startingBwFactors -- Bandwidth factors to use to start // searches (for each filter degree). // The size of this vector should be // equal to the size of "minBwFactors". // // nBwFactors -- The size of the grid of bandwidth // factors. For each degree, this grid // will be generated by // "EquidistantInLogSpace" class with // the minimum and maximum taken from // the corresponding elements of // "minBwFactors" and "maxBwFactors" // vectors. This parameter must be // at least 2. // // initialStepSizeInFactorCells -- Initial step size in the units of // bandwidth cell width, for the // optimization over bandwidth factors. // // initialDeg -- Initial guess for the best filter // degree. // // initialStepSizeInDegreeCells -- Initial step size in the units of // filter degree cell width, for the // optimization over filter degrees. // The width of the filter degree cell // is just // i_maxDeg/(minBwFactors.size() - 1). */ inline LOrPE1DVariableDegreeCVPicker(const double i_maxDeg, const std::vector& minBwFactors, const std::vector& maxBwFactors, const std::vector& startingBwFactors, const unsigned nBwFactors, const unsigned initialStepSizeInFactorCells, const double initialDeg, const unsigned initialStepSizeInDegreeCells) : degAxis_(UniformAxis(minBwFactors.size(), 0.0, i_maxDeg)), minBwFactorForEachDeg_(minBwFactors), maxBwFactorForEachDeg_(maxBwFactors), startingBwFactorForEachDeg_(startingBwFactors), nBwFactors_(nBwFactors), initialStepSizeInFactorsGridCells_(initialStepSizeInFactorCells), initialStepSizeInDegreeGridCells_(initialStepSizeInDegreeCells) { const unsigned nDegsInTheGrid = minBwFactorForEachDeg_.size(); assert(i_maxDeg > 0.0); assert(nDegsInTheGrid > 1U); assert(maxBwFactorForEachDeg_.size() == nDegsInTheGrid); assert(startingBwFactorForEachDeg_.size() == nDegsInTheGrid); for (unsigned i=0; i 0.0); assert(maxBwFactorForEachDeg_[i] > minBwFactorForEachDeg_[i]); assert(startingBwFactorForEachDeg_[i] > 0.0); } assert(nBwFactors_ > 1U); if (initialDeg <= 0.0) initialDegCell_ = 0U; else if (initialDeg >= i_maxDeg) initialDegCell_ = nDegsInTheGrid - 1U; else { const std::pair& cell = degAxis_.getInterval(initialDeg); initialDegCell_ = cell.first; if (cell.second < 0.5) ++initialDegCell_; } if (!initialStepSizeInDegreeGridCells_) initialStepSizeInDegreeGridCells_ = 1U; if (!initialStepSizeInFactorsGridCells_) initialStepSizeInFactorsGridCells_ = 1U; } template inline LOrPE1DCVResult crossValidate(Lorpe& lorpe) const { typedef Private::LOrPE1DVariableDegreeCVPickerHelper Helper; typedef LOrPE1DCVResult Result; const unsigned nscan = degAxis_.nCoords(); const Helper helper(lorpe, *this); unsigned imin = UINT_MAX; double fMinusOne, fmin, fPlusOne; const MinSearchStatus1D status = goldenSectionSearchOnAGrid( helper, degAxis_, initialDegCell_, initialStepSizeInDegreeGridCells_, &imin, &fMinusOne, &fmin, &fPlusOne); if (status == MIN_SEARCH_FAILED) { std::vector buffer(2*nscan); double* degrees = &buffer[0]; double* cvValues = degrees + nscan; for (unsigned i=0; i resGuess.cvFunction() ? bestGuess : resGuess; } else { // Optimum is on the boundary of the degree grid assert(imin < nscan); const double deg = degAxis_.coordinate(imin); const LOrPE1DCVResult& searched = helper.getMemoized(deg); assert(searched.isValid()); return searched; } } private: template friend class Private::LOrPE1DVariableDegreeCVPickerHelper; template inline LOrPE1DCVResult cvArbitraryDegree(Lorpe& lorpe, const double bestDegree) const { const unsigned nscan = degAxis_.nCoords(); double minBwFactor, maxBwFactor, firstBwFactorToTry; const unsigned ibest = degAxis_.getInterval(bestDegree).first; const unsigned inext = ibest + 1U; if (inext < nscan) { const double xbest = degAxis_.coordinate(ibest); const double xnext = degAxis_.coordinate(inext); // Perform logarithmic interpolation of the min/max/firstToTry // bandwidth factors to the given degree which no longer // has to be on the degree grid const LinearMapper1d m0(xbest, std::log(minBwFactorForEachDeg_[ibest]), xnext, std::log(minBwFactorForEachDeg_[inext])); minBwFactor = std::exp(m0(bestDegree)); const LinearMapper1d m1(xbest, std::log(maxBwFactorForEachDeg_[ibest]), xnext, std::log(maxBwFactorForEachDeg_[inext])); maxBwFactor = std::exp(m1(bestDegree)); const LinearMapper1d m2(xbest, std::log(startingBwFactorForEachDeg_[ibest]), xnext, std::log(startingBwFactorForEachDeg_[inext])); firstBwFactorToTry = std::exp(m2(bestDegree)); } else { minBwFactor = minBwFactorForEachDeg_[ibest]; maxBwFactor = maxBwFactorForEachDeg_[ibest]; firstBwFactorToTry = startingBwFactorForEachDeg_[ibest]; } const LOrPE1DFixedDegreeCVPicker r2(bestDegree, minBwFactor, maxBwFactor, nBwFactors_, firstBwFactorToTry, initialStepSizeInFactorsGridCells_); LOrPE1DCVResult cv = r2.crossValidate(lorpe); assert(cv.isValid()); cv.setDegreeBoundaryFlag(false); return cv; } DualAxis degAxis_; std::vector minBwFactorForEachDeg_; std::vector maxBwFactorForEachDeg_; std::vector startingBwFactorForEachDeg_; unsigned nBwFactors_; unsigned initialStepSizeInFactorsGridCells_; unsigned initialDegCell_; unsigned initialStepSizeInDegreeGridCells_; }; namespace Private { template inline double LOrPE1DVariableDegreeCVPickerHelper::operator()(const double& degree) const { // Check that the degree argument is actually on // the degree grid. Figure out the bandwidth range // and the first bandwidth to try for this degree. const double eps = 1.e-5; const std::pair& cell = picker_.degAxis_.getInterval(degree); assert(cell.second < eps || cell.second > 1.0 - eps); unsigned ideg = cell.first; if (cell.second < eps) ++ideg; assert(ideg < picker_.degAxis_.nCoords()); const double minBwFactor = picker_.minBwFactorForEachDeg_[ideg]; const double maxBwFactor = picker_.maxBwFactorForEachDeg_[ideg]; const double firstBwFactorToTry = picker_.startingBwFactorForEachDeg_[ideg]; // Now, find the optimal bandwidth for this degree const LOrPE1DFixedDegreeCVPicker r( degree, minBwFactor, maxBwFactor, picker_.nBwFactors_, firstBwFactorToTry, picker_.initialStepSizeInFactorsGridCells_); const LOrPE1DCVResult& status = r.crossValidate(lorpe_); assert(status.isValid()); // Remember the optimal result for this degree const auto insertResult = statusMap_.insert(std::make_pair(degree,status)); assert(insertResult.second); return -status.cvFunction(); } } } #endif // NPSTAT_LORPE1DVARIABLEDEGREECVPICKER_HH_ Index: trunk/npstat/rng/00README.txt =================================================================== --- trunk/npstat/rng/00README.txt (revision 827) +++ trunk/npstat/rng/00README.txt (revision 828) @@ -1,70 +1,74 @@ The code in this directory can depend on headers from the "nm" directory of the "npstat" package but not from any other directory. The classes implemented in this directory mainly deal with generation of pseudo- and quasi-random multivariate sequences -- while the C++11 standard has a good coverage of pseudo-random generators, quasi-random numbers are still not there. Several utilities related to random numbers and combinatorics are implemented in this directory as well. Random Number Generators ------------------------ AbsRandomGenerator.hh -- Abstract interface class for pseudo- and quasi-random number generators. All other generators inherit from this interface. CPP11RandomGen.hh -- Wrapper for generator engines defined in the C++11 standard. EquidistantSampler1D.hh -- Samples a 1-d interval in equidistant steps, like bin centers in a histogram. There is nothing random about it. SobolGenerator.hh -- Multivariate Sobol quasi-random sequences. Note that the improved integral convergence over pseudo-random sequences is achieved only if the number of points used is 2^M - 2^K, with some integer M and K (often K = 0). HOSobolGenerator.hh -- Higher order scrambled Sobol sequences. See the comments in the header file if you want to know more. Probably, the best sequence overall for use in phase space integration. MersenneTwister.hh -- The famous Mersenne twister and its implementation MersenneTwisterImpl.hh (should be replaced by the C++11 implementation in the future). RandomSequenceRepeater.hh -- Can be used to store and repeat a sequence made by other generators. This can be useful when integration is performed in the same manner for a number of parameter values. RegularSampler1D.hh -- Regular sampler of the unit interval conforming to the AbsRandomGenerator interface. Splits the interval in half, then splits obtained halves in half, etc. Naturally, needs 2^M - 1 calls (with integer M) to perform regular sampling. Unlike EquidistantSampler1D, it is not necessary to recalculate function values for previous points if you decide to increase M on the basis of some convergence check. Utilities --------- +assignResamplingWeights.hh -- resample by assigning weights to a set + of points (instead of picking the points + from the set). + convertToSphericalRandom.hh -- converts random numbers generated in an N-dimensional hypercube into a random direction in the corresponding space (a point on a hypersphere) plus one independent random number on [0, 1) which can later be used to generate the distance to the origin. permutation.hh -- utilities related to permuting a set of consecutive integers, iterating over permutations, and calculating factorials. resampleWithReplacement.hh -- resampling with replacement from a vector of points.