
diff --git a/ChangeLog b/ChangeLog
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,382 +1,386 @@
+2016-04-01 Holger Schulz <holger.schulz@cern.ch>
+
+ * Add a first version of the parametrisation using Machine Learning
+
2016-02-16 Holger Schulz <holger.schulz@cern.ch>
-
+
* Bump to version 2.1.3
2016-02-15 Holger Schulz <holger.schulz@cern.ch>
- * prof2-sens: bugfix, add weight file usage, clean up CL
+ * prof2-sens: bugfix, add weight file usage, clean up CL
* prof2-I: add weight file usage
* prof2-predict: add weight file usage, fix output writing
* Extended documentation
2016-02-12 Holger Schulz <holger.schulz@cern.ch>
- * Documentation including asciinema
+ * Documentation including asciinema
2016-02-11 Holger Schulz <holger.schulz@cern.ch>
* Remove min_runs function, replace with numCoeffs
from C++ code
* Add prof2-envelopes script
* In prof-tune simplechi2 --- continue on weight==0
to allow some robustness when switching off bins
2016-02-10 Holger Schulz <holger.schulz@cern.ch>
* First version of prof2-runcombs plus sampling functions
salvaged from Prof1, modified to have no scipy dependence
* Add --rc switch to prof2-ipol to optionally read runcombs
Syntax is --rc runcombs.dat:5
(Minimally invasive addon)
* Code cull in prof2-ipol. Only support multiprocessing
version, default 1 thread. Changed CL switch to '-j'.
2015-12-16 Holger Schulz <holger.schulz@cern.ch>
* Only calculate polynomials structure once per instance
-> significant speed increase
* Add the brute force grid scan (--scan) to prof2-tune
2015-12-11 Holger Schulz <holger.schulz@cern.ch>
* Minimiser tweaks --- no more need for param translation,
make strategy steerable (--strategy or -s), -q suppresses
iminuit output now, calculation and writing of correlation
matrix into results, dump weights into results file
2015-12-10 Holger Schulz <holger.schulz@cern.ch>
* Undo ui madness
* Remove numpy dependence (except sampling.py)
* Add prof-sens for sensitivity plotting exploiting derivatives
2015-12-09 Holger Schulz <holger.schulz@cern.ch>
* Can specify PREFIX when calling make
2015-12-09 Andy Buckley <andy.buckley@cern.ch>
* Improve prof2-predict UI.
* Add metadata printing in prof2-lsipol debug mode.
* Add -v and -q flag shortcuts to all scripts.
2015-12-07 Holger Schulz <holger.schulz@cern.ch>
* Use __all__ in __init__.py
* Started sphinx documentation
* Support only iminuit, bump version to 2.2.alpha, restructure imports
* Sanity checks when calling parametrisation
2015-12-03 Andy Buckley <andy.buckley@cern.ch>
* Provide a version() function in the C++ library, and map it into
Python. Both sources use the VERSION variable defined in the
Makefile at build time.
* Reduce SVD fitting threshold to 1e-20.
2015-12-03 Holger Schulz <holger.schulz@cern.ch>
* Make derivative calculation work also when scaling params
* Let YODA write YODA
2015-12-02 Holger Schulz <holger.schulz@cern.ch>
* Derivative calculation
2015-11-28 Andy Buckley <andy.buckley@cern.ch>
* Version 2.1.2
* Revert range shift in Ipol.cc, since it produced (small)
numerical errors rather than improved stability. To be
understood...
* Add test/mkpolydata script, for generating polynomial pseudodata
used in closure tests.
2015-11-27 Andy Buckley <andy.buckley@cern.ch>
* Fix bug in histo loading.
* Adding log binning and other options to prof2-residuals.
2015-11-24 Andy Buckley <andy.buckley@cern.ch>
* Version 2.1.1 release.
* Fix prof2-predict YODA writing, and improve DataHisto.toYODA.
* Change parameter rescaling from using the [0,1] range to instead
use [1,2], to avoid arbitrarily scaled param numbers. BREAKS IPOL
FILE COMPATIBILITY WITH 2.1.0!!
* Set minimizer initial step size = prange/10.
* Add automatic determination of maximum MC errors in prof-tune,
and use them to regularize interpolated errs which could distort
the fit.
* Separate internal Minuit function names (with syntax
restrictions) from the free-form external param names.
2015-11-23 Andy Buckley <andy.buckley@cern.ch>
* Add pyext/professor2/chi2.py
* Add find_maxerrs() to the Python interface, for use in error regularisation.
* Use the param file name as a regex.
* Use __slots__ in the internal histogram and bin types, for better memory efficiency.
* Fix (re-fix?) use of inappropriate indices in calculation of mean and median MC uncertainties.
* More proactive data object deletion after conversion to YODA data format.
2015-11-22 Andy Buckley <andy.buckley@cern.ch>
* Version 2.1.0 (significant version change due to scaling, all-orders, and ROOT support)
2015-11-18 Andy Buckley <andy.buckley@cern.ch>
* dataio.py: Add ROOT file reading ability.
* Change rescaling I/O behaviour a bit, moving some hacked
Python-only Ipol string constructors into explicit API improvements, and
only activating rescaling if DoParamScaling is set true.
2015-11-16 Andy Buckley <andy.buckley@cern.ch>
* Add CPPFLAGS steering in Makefile and API tidying.
2015-11-11 Holger Schulz <holger.schulz@cern.ch>
* Add automatic raw param value -> unit range mapping to Ipol.
* Add any-order long vector calculation.
2015-11-10 Andy Buckley <andy.buckley@cern.ch>
* Add prof2-lsipol as a handy way to make a tuning weights file
and for general convenience.
2015-10-06 Andy Buckley <andy.buckley@cern.ch>
* Version 2.0.0
* Add 'make dist' target to Makefile, and tweak/reduce Cython necessity
* Allow calls to IpolBin and IpolHisto methods with params passed
as unpacked *args as well as a packed iterable.
* Add possibility to pass params as a dict-like as well as a list
or tuple of floats -- note that the dict-like must return
*ordered* values consistent with the Ipol training. Also ensure
the float type.
* Add conversion of C++ exceptions to Python exceptions.
* Add professor2.utils module, providing an opportunistic use of
OrderedDict for params so the params order required by Ipol is the
same as the order in which they are specified in the (first run's)
params.dat file.
* Change Python-mapped Ipol methods with no args to be properties.
* Add nice __repr__ methods for Histo and Bin objects.
* Specialise Histo as IpolHisto and DataHisto, and provide
toData{Histo,Bin} on IpolHisto and IpolBin respectively.
* Remove ProfMaster.
2015-10-04 Andy Buckley <andy.buckley@cern.ch>
* Add protection of svd.setThreshold call to ensure it is
supported/needed, via Eigen version number macros.
* Add vmin and vmax args to Ipol and IpolBin in Python.
2015-10-02 Andy Buckley <andy.buckley@cern.ch>
* Put histo file reading in a try..except block so read failures
on non-histo files in the run dirs are not fatal.
* Add protection in the histo loader, so we don't trip up on
Rivet's new cross-section and counter objects.
2015-09-30 Andy Buckley <andy.buckley@cern.ch>
* src/Ipol.h: Use Eigen/SVD include path rather than non-standard eigen3/Eigen/SVD.
2015-09-14 Holger Schulz <holger.schulz@durham.ac.uk>
* Use Eigen's svd.setThreshold(1e-20) to prevent the errors we saw with Eigen
3.2.2 and later when looking at 5th order polynomials in 3 dimensions
with 900+ anchors
* Some clean up
2015-09-11 Andy Buckley <andy.buckley@cern.ch>
* Adding prof2-residuals script for ipol faithfulness testing.
* Rename scripts from prof-* to prof2-* to allow parallel installations of Prof1 and Prof2.
* Move some non-core scripts from bin to contrib.
* Make root-config dependency optional
* Lots of reworking to make prof-tune work again, and to use weight file parsing.
2015-09-10 Andy Buckley <andy.buckley@cern.ch>
* Use weight file path parsing in prof-ipol.
* Add prof.weights submodule, with a copy of PointMatcher and a
new read_pointmatchers function for handling weight files.
* Provide a __version__ attribute in the Python module.
* User script CLI consistency, simplification, general smoothing...
2015-08-30 Andy Buckley <andy.buckley@cern.ch>
* Add recursive scangrid generator function, and other tweaks.
2015-08-18 Andy Buckley <andy.buckley@cern.ch>
* Tweaks to ipol file parsing functions.
* Fix a bug in IpolBin's use of interpolated errors.
* Convert mean and median error parameterisation to use new 0th order ipols.
2015-08-15 Holger Schulz <holger.schulz@durham.ac.uk>
* A first version of catching singular matrices in the SVD
* Ipol has 1 additional argument, "threshold", that determines what
singular values are considered 0, fully propagated to pyext
* Going 2 Pro 2 Fessional
* Some startup checks of prof- scripts
* remove bin/prof-sampling, rename prof-prediction -> prof-predict
2015-08-13 Holger Schulz <holger.schulz@durham.ac.uk>
* Reverted unintended commit of hack.
* Added very basic prof-config script to bin
* Update PATH when sourcing setup.sh
2015-08-12 Holger Schulz <holger.schulz@durham.ac.uk>
* Added prof-prediction which reads in a ifile and a point in
parameter space (either a text file, a comma separated list or just
the args) and writes out an ipolhisto. Tested to work.
2015-08-03 Holger Schulz <holger.schulz@durham.ac.uk>
* Added 0-order polynomials, i.e. constant values.
Currently, the coefficient is simply the value of
the first anchor point.
2015-07-15 Holger Schulz <holger.schulz@durham.ac.uk>
* Some pull plotting functionality
2015-07-08 Holger Schulz <holger.schulz@durham.ac.uk>
* Calculate and print G.o.f.
* Write some meta info to results
* Read limits, fixed params from single textfile --limits
* Output files steered by --output
* Remove now obsolete bin/prof-interpolate
2015-07-08 Holger Schulz <holger.schulz@durham.ac.uk>
* First working version of prof-tune (pyminuit, simpleGOF)
* Multiprocessing to speed up prof-ipol (--multi)
* prof-ipol-tabulated to write out interpolation of tabulated data
* Minor bugfixes
2015-06-16 Andy Buckley <andy.buckley@cern.ch>
* Adding mean and median strategies for assigning constant ipol bin errors.
2015-06-03 Andy Buckley <andy.buckley@cern.ch>
* Adding a Bin base class in the Python side, and a coherent handling of multiple (interpolated) bin errors.
* Various tweaks and minor fixes.
2015-05-10 Andy Buckley <andy.buckley@cern.ch>
* Add 'Minuit' class importing to the __init__.py file.
2015-05-05 Andy Buckley <andy.buckley@cern.ch>
* Rewrite C++ ipolstring parsing using STL string functions rather than Boost: shorter! Boost dependency GONE :-)
* Replace boost::tuple with std::pair and clean up ParamPoints interface a bit.
* Add a --veto option to prof-sample, for a user-specified vetoing function.
* Move Sampler from prof-sample into professor2/__init__.py
2015-05-04 Andy Buckley <andy.buckley@cern.ch>
* Adding a first version of prof-tune with ipol reading from file.
* Adding options for (average) error interpolation and ipol persistency to prof-ipol and module functions.
2015-05-03 Andy Buckley <andy.buckley@cern.ch>
* Add bin/prof-ipol simple script using the functions below -- to be merged with prof-interpolate.
* professor2/__init__.py: Add Histo, DataBin, IpolBin and basic data handling functions.
* Remove unnecessary dlopen fiddling from professor2/__init__.py
2015-04-23 Andy Buckley <andy.buckley@cern.ch>
* Remove bound ParamPoints pointer from Ipol, and ditch lazy coeff evaluation in favour of simplicity and construct-time coeff calculating.
* Move long vector and coeff building into unbound functions rather than Ipol methods.
* Start of ChangeLog. Library has already iterated quite a bit.
* FILL IN EARLY HISTORY FROM HG LOG
diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -1,141 +1,144 @@
## Makefile for Professor 2.x
VERSION := 2.1.3
## Default values for user-specifiable build variables
ifndef PREFIX
PREFIX := /usr/local
endif
CXXSTD := c++11
ifndef CXX
CXX := g++
endif
ifndef CPPFLAGS
CPPFLAGS := -I/usr/include/eigen3
endif
ifndef CXXFLAGS
CXXFLAGS := -O3
ifdef DEBUG
ifneq ($(DEBUG),0)
CXXFLAGS += -g
endif
endif
endif
ifndef PYTHON
PYTHON := python
endif
ifndef CYTHON
CYTHON := cython
endif
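## Example override of the defaults above (hypothetical paths/compiler):
##   make install PREFIX=$HOME/local CXX=clang++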
###################
DISTNAME := Professor-$(VERSION)
#SHELL := /bin/bash
HAVE_ROOT := $(shell which root-config 2> /dev/null)
HAVE_CYTHON := $(test `cython --version 2>&1 | sed -e 's/Cython version \([0-9\.]\+\)/\1/' | cut -d. -f2` -ge 20)
LIBHEADERS := $(wildcard include/Professor/*.h)
LIBSOURCES := $(wildcard src/*.cc)
LIBOBJECTS := $(patsubst %,obj/%.o, ParamPoints Counter Ipol Version)
TESTSOURCES := $(wildcard test/*.cc test/testPython*)
TESTPROGS := test/testParamPoints test/testIpol
BINPROGS := $(wildcard bin/*)
+CONTRIBPROGS := $(wildcard contrib/*)
PYTHONSOURCES := $(wildcard pyext/professor2/*.py)
CYTHONSOURCES := $(wildcard pyext/professor2/*.pxd) $(wildcard pyext/professor2/*.pyx)
.PHONY: all lib pyext tests cxxtests pytests check icheck clean root dist
all: lib pyext tests
@true
lib: lib/libProfessor2.so
@true
lib/libProfessor2.so: $(LIBOBJECTS)
mkdir -p lib
$(CXX) -shared -Wl,-soname,libProfessor2.so -o $@ $(LIBOBJECTS)
obj/%.o: src/%.cc $(LIBHEADERS)
mkdir -p obj
$(CXX) -std=$(CXXSTD) -DPROF_VERSION="$(VERSION)" -Iinclude $(CPPFLAGS) $(CXXFLAGS) -c -fPIC $< -o $@
pyext: pyext/professor2/core.so $(wildcard pyext/professor2/*.py)
$(PYTHON) pyext/setup.py install --prefix=.
ifdef HAVE_CYTHON
pyext/professor2/core.cpp: $(LIBHEADERS) $(CYTHONSOURCES) lib
$(CYTHON) pyext/professor2/core.pyx --cplus
else
pyext/professor2/core.cpp: $(LIBHEADERS) $(CYTHONSOURCES) lib
$(error "Cython >= 0.20 not available; can't build $@")
endif
pyext/professor2/core.so: pyext/professor2/core.cpp
PROF_VERSION=$(VERSION) $(PYTHON) pyext/setup.py build_ext -i --force
tests: cxxtests pytests
@true
cxxtests: $(TESTPROGS)
@true
test/%: test/%.cc $(LIBHEADERS) lib
$(CXX) -std=$(CXXSTD) -Iinclude $(CPPFLAGS) $(CXXFLAGS) $< -Llib -lProfessor2 -o $@
ifdef HAVE_ROOT
root: src/testRoot.cc $(LIBHEADERS) lib
$(CXX) -std=$(CXXSTD) $(CPPFLAGS) $(CXXFLAGS) $< -Iinclude `root-config --cflags --libs` -Llib -lProfessor2 -o test/test$@
endif
pytests: pyext
@true
check: tests
@echo
@echo "testParamPoints" && test/testParamPoints && echo "\n\n"
@echo "testIpol" && test/testIpol && echo "\n\n"
icheck: tests
test/testPython
test/testPython1D
test/testPython2D
install: all
mkdir -p $(PREFIX)/bin && cp bin/* $(PREFIX)/bin/
+ mkdir -p $(PREFIX)/contrib && cp contrib/* $(PREFIX)/contrib/
mkdir -p $(PREFIX)/include && cp -r include/Professor $(PREFIX)/include/
test -d lib && mkdir -p $(PREFIX)/lib && cp -r lib/* $(PREFIX)/lib/ || true
test -d lib64 && mkdir -p $(PREFIX)/lib64 && cp -r lib64/* $(PREFIX)/lib64/ || true
# cp setup.sh $(PREFIX)
dist: all
rm -rf $(DISTNAME)
mkdir -p $(DISTNAME)
cp --parents \
README Makefile \
$(LIBHEADERS) \
$(LIBSOURCES) \
$(BINPROGS) \
+ $(CONTRIBPROGS) \
$(TESTSOURCES) \
- $(PYTHONSOURCES) pyext/setup.py pyext/professor2/misc/*py \
+ $(PYTHONSOURCES) pyext/setup.py pyext/professor2/misc/*py pyext/professor2/ml/*py \
$(CYTHONSOURCES) $(wildcard pyext/professor2/*.cpp) \
$(wildcard contrib/*) \
$(DISTNAME)/
tar czf $(DISTNAME).tar.gz $(DISTNAME)
clean:
rm -rf obj/*.o lib/*
rm -f pyext/professor2/core.cpp pyext/professor2/core.so
rm -f $(TESTPROGS)
rm -rf $(DISTNAME) $(DISTNAME).tar.gz
diff --git a/contrib/prof2-ml b/contrib/prof2-ml
new file mode 100755
--- /dev/null
+++ b/contrib/prof2-ml
@@ -0,0 +1,186 @@
+#! /usr/bin/env python
+
+"""\
+%prog <runsdir> [<outfile>=ml.pkl] [opts]
+
+Parametrise histo bin values as functions of the parameter space with a
+machine-learning model (scikit-learn SVR), loading from the usual data
+directory structure $datadir/mc/{rundirs}
+
+TODO:
+ * Use weight file position matches to exclude some bins, as well as path matching
+ * Handle run combination file/string (write a hash of the run list into the ipol filename?)
+ * Support asymm error parameterisation
+"""
+
+import optparse, os, sys
+op = optparse.OptionParser(usage=__doc__)
+op.add_option("--pname", "--pfile", dest="PNAME", default="params.dat", help="Name of the params file to be found in each run directory (default: %default)")
+op.add_option("--wfile", dest="WFILE", default=None, help="Path to a weight file, used to restrict ipol building to a subset of bins (default: %default)")
+op.add_option("--ierr", dest="IERR", default="median", help="Whether to interpolate MC errors: none, mean, median, symm (default: %default)") #< add rel, asymm
+op.add_option("--rc", dest="RUNCOMBS", default=None, help="Run combination file")
+# TODO: Add a no-scale option
+# TODO: Change to instead (optionally) specify the max number of parallel threads/procs. Use by default if ncores > 1?
+op.add_option("-j", dest="MULTI", type=int, default=1, help="Number of threads to use")
+op.add_option("--summ", dest="SUMMARY", default=None, help="Summary description to be written to the ipol output file")
+op.add_option("-v", "--debug", dest="DEBUG", action="store_true", default=False, help="Turn on some debug messages")
+op.add_option("-q", "--quiet", dest="QUIET", action="store_true", default=False, help="Turn off messages")
+opts, args = op.parse_args()
+
+## Get mandatory arguments
+if len(args) < 1:
+ print "Argument missing... exiting\n\n"
+ op.print_usage()
+ sys.exit(1)
+RUNSDIR = args[0]
+IFILE = "ml.pkl"
+if len(args) >= 2:
+ IFILE = args[1]
+
+
+## Load the Professor machinery
+import professor2 as prof
+if not opts.QUIET:
+ print prof.logo
+
+
+## Load MC run histos and params
+import glob
+INDIRS = glob.glob(os.path.join(RUNSDIR, "*"))
+try:
+ PARAMS, HISTOS = prof.read_rundata(INDIRS, opts.PNAME)
+ RUNS, PARAMNAMES, PARAMSLIST = prof.mk_ipolinputs(PARAMS)
+except Exception, e:
+ print e
+ sys.exit(1)
+
+# TODO: add runcomb file parsing to select a runs subset
+# --rc runcombs.dat:4
+# would use the line with index 4 (counting from zero) of the runcombs.dat file
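+# (each line of the runcombs file is assumed to be a whitespace-separated
+#  list of run names, cf. the .split() below)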
+if opts.RUNCOMBS is not None:
+ f_rc, line = opts.RUNCOMBS.split(":")
+ with open(f_rc) as f:
+ temp=[l for l in f]
+ thisRC = temp[int(line)].split()
+
+ # Filtering and overwriting
+ thisRUNS, thisPARAMSLIST = [], []
+ for num, r in enumerate(RUNS):
+ if r in thisRC:
+ thisRUNS.append(r)
+ thisPARAMSLIST.append(PARAMSLIST[num])
+ RUNS=thisRUNS
+ PARAMSLIST=thisPARAMSLIST
+
+
+# ## Some useful announcements about the data loaded and the interpolation planned
+# if not opts.QUIET:
+ # if (len(RUNS) < prof.numCoeffs(len(PARAMNAMES), opts.ORDER)):
+ # print "Not enough runs for order %i polynomials"%opts.ORDER
+ # for i in xrange(1, opts.ORDER):
+ # if (prof.numCoeffs(len(PARAMNAMES), opts.ORDER -i) <= len(RUNS)):
+ # print "Try order %i (min %i runs)"%(opts.ORDER -i, prof.numCoeffs(len(PARAMNAMES), opts.ORDER -i))
+ # import sys
+ # sys.exit(1)
+ # else:
+ # print "Building %dD interpolations in %d params: require at least %d runs" % \
+ # (opts.ORDER, len(PARAMNAMES), prof.numCoeffs(len(PARAMNAMES), opts.ORDER))
+ # print "Loaded %d distinct observables from %d runs" % (len(HISTOS), len(RUNS))
+
+
+## Weight file parsing to select a histos subset
+if opts.WFILE:
+ matchers = prof.read_pointmatchers(opts.WFILE)
+ for hn in HISTOS.keys():
+ if not any(m.match_path(hn) for m in matchers.keys()):
+ del HISTOS[hn]
+ elif opts.DEBUG:
+ print "Observable %s passed weight file path filter" % hn
+ print "Filtered observables by path, %d remaining" % len(HISTOS)
+HNAMES = HISTOS.keys()
+
+## If there's nothing left to interpolate, exit!
+if not HNAMES:
+ print "No observables remaining... exiting"
+ sys.exit(1)
+
+
+## Robustness tests and cleaning: only retain runs that contain every histo
+# TODO: combine with weights histo vetoing -- should we explicitly unload unused run data, or keep it for the next combination to use? Or do we now leave runcombs to the user?
+bad, badnum = [], []
+for irun, run in enumerate(RUNS):
+ for hn in HNAMES:
+ if not HISTOS[hn].has_key(run):
+ bad.append(run)
+ badnum.append(irun)
+ break
+if bad:
+ print "Found %d bad runs in %d total... removing" % (len(bad), len(RUNS))
+ goodr, goodp = [], []
+ for irun, run in enumerate(RUNS):
+ if not irun in badnum:
+ goodr.append(run)
+ goodp.append(PARAMSLIST[irun])
+ RUNS = goodr
+ PARAMSLIST = goodp
+
+## If there's nothing left to interpolate, exit!
+if not RUNS:
+ print "No valid runs remaining... exiting"
+ sys.exit(1)
+
+
+IHISTOS = {}
+
+from professor2.ml import *
+
+def worker(q, rdict):
+ "Function to make bin ipols and store ipol persistency strings for each histo"
+ while True:
+ if q.empty():
+ break
+ hn = q.get()
+ histos = HISTOS[hn]
+ ih = mk_MLHisto(histos, RUNS, PARAMSLIST)
+ del HISTOS[hn] #< pro-actively clear up memory
+ rdict[hn] = ih#zlib.compress(s, 9) #< save some memory
+ del histos
+
+
+print "\n\nParametrising...\n"
+import time, multiprocessing
+time1 = time.time()
+
+## A shared memory object is required for coefficient retrieval
+from multiprocessing import Manager
+manager = Manager()
+tempDict = manager.dict()
+
+## The job queue
+q = multiprocessing.Queue()
+map(lambda x:q.put(x), HNAMES)
+
+## Fire away
+workers = [multiprocessing.Process(target=worker, args=(q, tempDict)) for i in range(opts.MULTI)]
+map(lambda x:x.start(), workers)
+map(lambda x:x.join(), workers)
+
+# ## Finally copy the result dictionary into the object itself
+for k in tempDict.keys():
+ IHISTOS[k] = tempDict[k]
+
+# for HN in HNAMES:
+ # histos = HISTOS[HN]
+ # IHISTOS[HN] = mk_MLHisto(histos, RUNS, PARAMSLIST)
+
+## Timing
+time2 = time.time()
+print 'Parametrisation took %0.2f s' % ((time2-time1))
+
+
+
+## Active memory clean-up
+del HISTOS
+
+import cPickle
+cPickle.dump(IHISTOS, open(IFILE, 'wb'), -1)
+print "\nOutput written to %s" % IFILE
diff --git a/contrib/prof2-ml-predict b/contrib/prof2-ml-predict
new file mode 100755
--- /dev/null
+++ b/contrib/prof2-ml-predict
@@ -0,0 +1,76 @@
+#! /usr/bin/env python
+
+"""\
+%prog PARAMFILE [<ipolfile>=ipol.dat]
+
+Write out ML-parametrised histograms at a given parameter point. (Only the
+parameter-file form is handled below; a bare comma-separated list is not.)
+
+"""
+
+import optparse, os, sys
+op = optparse.OptionParser(usage=__doc__)
+op.add_option("--wfile", dest="WFILE", default=None, help="Path to a weight file, used to restrict output to a subset of histograms (default: %default)")
+op.add_option("-o", dest="OUTPUT", default="mlprediction.yoda", help="output file name (default: %default)")
+op.add_option("-v", "--debug", dest="DEBUG", action="store_true", default=False, help="Turn on some debug messages")
+op.add_option("-q", "--quiet", dest="QUIET", action="store_true", default=False, help="Turn off messages")
+opts, args = op.parse_args()
+
+
+try:
+ import yoda
+except ImportError:
+ "YODA not found, exiting"
+ import sys
+ sys.exit(1)
+
+
+
+## Get mandatory arguments
+if len(args) < 1:
+ print "Argument missing... exiting\n\n"
+ op.print_usage()
+ sys.exit(1)
+PARAMSTR = args[0]
+IFILE = "ipol.dat"
+if len(args) >= 2:
+ IFILE = args[1]
+
+
+import professor2 as prof
+if not opts.QUIET:
+ print prof.logo
+
+## Parse the param point argument
+PARAMS = None
+if os.path.exists(PARAMSTR):
+ with open(args[0]) as f:
+ PARAMS = [float(l.strip().split()[-1]) for l in f if not l.startswith("#")]
+else:
+ print "No parameter file given or specified param file does not exist, exiting..."
+ sys.exit(1)
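+
+## The parsing above takes the last whitespace-separated token of each
+## non-comment line as a float, i.e. it expects a file like (hypothetical names):
+## PAR1 0.5
+## PAR2 1.2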
+
+## Read ML histos
+
+import cPickle
+MLHISTOS=cPickle.load(open(IFILE, 'rb'))
+
+## Weight file parsing
+if opts.WFILE:
+ matchers = prof.read_pointmatchers(opts.WFILE)
+ for hn in MLHISTOS.keys():
+ if not any(m.match_path(hn) for m in matchers.keys()):
+ del MLHISTOS[hn]
+ if len(MLHISTOS.keys())==0:
+ print "Nothing left after weight file parsing, exiting"
+ sys.exit(0)
+
+
+## Write out ipolhistos
+ofile = opts.OUTPUT
+scatters=[MLHISTOS[k].toDataHisto(PARAMS).toScatter2D() for k in sorted(MLHISTOS.keys())]
+yoda.writeYODA(scatters, ofile)
+
+if not opts.QUIET:
+ print "Wrote output to", ofile
diff --git a/contrib/prof2-ml-residuals b/contrib/prof2-ml-residuals
new file mode 100755
--- /dev/null
+++ b/contrib/prof2-ml-residuals
@@ -0,0 +1,175 @@
+#! /usr/bin/env python
+
+"""\
+%prog <runsdir> [<mlfile>=ml.pkl] [opts]
+
+Compute and plot the relative deviations of the given interpolation from the
+corresponding sample points in the given <runsdir>, for both bin values and bin
+errors. Residual histograms are made for each observable independently (summed over
+bins and MC points), and for the sum over all histos and bins, and are written to PDFs
+and to an mlres.yoda data file.
+
+This can be useful for determining whether the interpolation is providing a
+sufficiently faithful approximation to the true observables, and choosing the
+most appropriate order of fit polynomial. It can also be useful to detect
+overfitting, by comparing residuals distributions between the set of runs used
+to make the fit (via prof-ipol) and an equivalent set not included in the fit.
+
+TODO:
+ * Allow control over the output filenames / directory.
+ * Add an option for value-only (no err) residuals
+ * Support runcomb filtering
+"""
+
+from __future__ import print_function
+
+import optparse, os, sys
+op = optparse.OptionParser(usage=__doc__)
+#op.add_option("--ifile", dest="IFILE", default="ipol.dat", help="File from which to read the bin interpolations (default: %default)")
+op.add_option("--pname", "--pfile", dest="PNAME", default="params.dat", help="name of the params file to be found in each run directory (default: %default)")
+op.add_option("--xrange", "--rerange", dest="XRANGE", default=0.1, type=float, help="the +- range of relative deviation to display (default: %default)")
+op.add_option("--nbins", dest="NBINS", default=30, type=int, help="number of bins in relative residuals histograms (default: %default)")
+op.add_option("--no-plots", dest="NO_PLOTS", action="store_true", default=False, help="don't write histogram PDFs (default: %default)")
+op.add_option("--tot-only", dest="ONLY_TOT", action="store_true", default=False, help="only make total residuals histograms, not per-observable (default: %default)")
+op.add_option("--logx", dest="LOG_BINS", action="store_true", default=False, help="use a symmetric log binning for better resolution around 0 (default: %default)")
+op.add_option("-v", "--debug", dest="DEBUG", action="store_true", default=False, help="turn on some debug messages")
+op.add_option("-q", "--quiet", dest="QUIET", action="store_true", default=False, help="turn off messages")
+opts, args = op.parse_args()
+
+## Get mandatory arguments
+if len(args) < 1:
+ print("Argument missing... exiting\n\n", file=sys.stderr)
+ op.print_usage()
+ sys.exit(1)
+RUNSDIR = args[0]
+IFILE = "ml.pkl"
+if len(args) >= 2:
+ IFILE = args[1]
+
+
+## Load the Professor machinery
+import professor2 as prof
+if not opts.QUIET:
+ print(prof.logo)
+
+## No point in going further if YODA isn't available, too
+try:
+ import yoda
+except ImportError:
+ print("YODA is required by this tool... exiting", file=sys.stderr)
+ sys.exit(1)
+
+
+## Load MC run histos and params
+import glob
+# TODO: add runcomb file parsing to select a runs subset
+INDIRS = glob.glob(os.path.join(RUNSDIR,"*"))
+try:
+ PARAMS, HISTOS = prof.read_rundata(INDIRS, opts.PNAME)
+ RUNS, PARAMNAMES, PARAMSLIST = prof.mk_ipolinputs(PARAMS)
+except Exception, e:
+ print(e, file=sys.stderr)
+ sys.exit(1)
+
+## Load interpolated histograms from file
+import cPickle
+IHISTOS=cPickle.load(open(IFILE))
+print("Loaded ipol histos from", IFILE)
+
+
+## Set up residuals histo binnings
+hedges = None
+if opts.LOG_BINS:
+ ## Determine central linear bin (half) size:
+ ## dx1 == exp(ln(xmin) + (ln(xmax) - ln(xmin))/nbins) - xmin
+ ## => set xmin <= dx1 for monotonic bin sizes:
+ ## => ln(2*xmin) == ln(xmin)(1 - 1/nbins) + ln(xmax)/nbins
+ ## => ln(xmin) == ln(xmax) - ln(2)*nbins
+ ## => xmin == exp( ln(xmax) - ln(2)*nbins )
+ eps = opts.XRANGE*(3**-(opts.NBINS//2))
+ edges1 = yoda.logspace(opts.NBINS//2, eps, opts.XRANGE)
+ hedges = [-e for e in reversed(edges1)] + edges1
+else:
+ hedges = yoda.linspace(opts.NBINS, -opts.XRANGE, opts.XRANGE)
+if opts.DEBUG:
+ print("Histogram bin edges = ", hedges)
+
+## Set up total residuals histograms
+hresvaltot = yoda.Histo1D(hedges, path="/TOT/resval")
+hreserrtot = yoda.Histo1D(hedges, path="/TOT/reserr")
+aos = [hresvaltot, hreserrtot]
+
+
+RESLOG = open("res.log", "w")
+
+def log(*args):
+ print(*args, file=RESLOG)
+ if not opts.QUIET:
+ print(*args)
+
+if not opts.NO_PLOTS:
+ from matplotlib import pyplot as plt
+
+if not opts.QUIET:
+ print()
+
+## Loop over histograms and compute residuals histos
+for hn, mchistos in sorted(HISTOS.iteritems()):
+ hnstr = hn.replace("/", "_").strip("_")
+
+ hresval = yoda.Histo1D(hedges, path=hn+"/resval")
+ hreserr = yoda.Histo1D(hedges, path=hn+"/reserr")
+
+ if hn in IHISTOS.keys():
+ ihisto = IHISTOS[hn]
+ for irun, run in enumerate(RUNS):
+ mchisto = mchistos[run]
+ assert mchisto.nbins == ihisto.nbins
+ for ib in xrange(ihisto.nbins):
+
+ ## Value residuals
+ absres = ihisto.bins[ib].val([PARAMSLIST[irun]]) - mchisto.bins[ib].val
+ # print(irun, ib, absres)
+ if mchisto.bins[ib].val != 0:
+ relres = absres / mchisto.bins[ib].val
+ hresval.fill(relres)
+ hresvaltot.fill(relres)
+
+ ## Error residuals
+ # print(ihisto.bins[ib].err(PARAMSLIST[irun]) , mchisto.bins[ib].err)
+ # absres = ihisto.bins[ib].err(PARAMSLIST[irun]) - mchisto.bins[ib].err
+ # if mchisto.bins[ib].err != 0:
+ # relres = absres / mchisto.bins[ib].err
+ # hreserr.fill(relres)
+ # hreserrtot.fill(relres)
+
+ if not opts.ONLY_TOT:
+ aos += [hresval, hreserr]
+
+ if hresval.numEntries() > 0:
+ log(hn, "value:", "mean =", hresval.xMean(), "stddev =", hresval.xStdDev())
+ if not opts.NO_PLOTS:
+ f, _ = yoda.plot(hresval, "res_val_" + hnstr + ".pdf", plotkeys={"Title": hn})
+ plt.close(f)
+
+ # if hreserr.numEntries() > 0:
+ # log(hn, "error:", "mean =", hreserr.xMean(), "stddev =", hreserr.xStdDev())
+ # if not opts.NO_PLOTS:
+ # f, _ = yoda.plot(hreserr, "res_err_" + hnstr + ".pdf", plotkeys={"Title": hn})
+ # plt.close(f)
+
+ # if hresval.numEntries() > 0 or hreserr.numEntries() > 0:
+ # log()
+
+if hresvaltot.numEntries() > 0:
+ log("Tot val", "mean =", hresvaltot.xMean(), "stddev =", hresvaltot.xStdDev())
+ if not opts.NO_PLOTS:
+ yoda.plot(hresvaltot, "res_val_tot.pdf")
+
+if hreserrtot.numEntries() > 0:
+ log("Tot err", "mean =", hreserrtot.xMean(), "stddev =", hreserrtot.xStdDev())
+ if not opts.NO_PLOTS:
+ yoda.plot(hreserrtot, "res_err_tot.pdf")
+
+RESLOG.close()
+yoda.write(aos, "mlres.yoda")
diff --git a/contrib/prof2-ml-tune b/contrib/prof2-ml-tune
new file mode 100755
--- /dev/null
+++ b/contrib/prof2-ml-tune
@@ -0,0 +1,332 @@
+#! /usr/bin/env python
+
+"""\
+%prog <refdir> [<ipolfile>=ipol.dat] [<runsdir>=<refdir>/../mc] [opts]
+
+Use the interpolations stored in <ipolfile> to find optimised parameters with
+the reference histograms found in the <refdir> as the optimisation target.
+
+The <runsdir> is used to calculate the maximum error value seen for each bin,
+to regularise interpolated errors which could otherwise blow up, leading to
+an unrepresentative small chi2 (and hence fit result) outside the sampled ranges.
+
+TODO:
+ * Handle run combination file/string (write a hash of the run list into the ipol filename?)
+"""
+
+import optparse, os, sys
+op = optparse.OptionParser(usage=__doc__)
+op.add_option("--wfile", dest="WFILE", default=None, help="Path to a weight file to specify unequal chi2 weights of each bin in the fit (default: %default)")
+op.add_option("--output", dest="OUTPUT", default="mltunes", help="Prefix for outputs (default: %default)")
+op.add_option("--limits", dest="LIMITS", default=None, help="Simple text file with parameter limits and fixed parameters")
+op.add_option("--scan", dest="SCAN", default=None, type=int, help="Perform a brute force scan of SCAN points to find minimiser start point")
+op.add_option("-g", "--gradient", dest="GRADIENT", action="store_true", default=False, help="Run minimisation with analytic gradient (EXPERIMENTAL!)")
+op.add_option("-v", "--debug", dest="DEBUG", action="store_true", default=False, help="Turn on some debug messages")
+op.add_option("-q", "--quiet", dest="QUIET", action="store_true", default=False, help="Turn off messages")
+op.add_option("-s", "--strategy", dest="STRATEGY", default=1, type=int, help="Set Minuit strategy [0 fast, 1 default, 2 slow]")
+opts, args = op.parse_args()
+
+## Get mandatory arguments
+if len(args) < 1:
+ print "Argument missing... exiting\n\n"
+ op.print_usage()
+ sys.exit(1)
+REFDIR = args[0]
+IFILE = "ipol.dat"
+RUNSDIR = os.path.join(REFDIR, "..", "mc")
+if len(args) >= 2:
+ IFILE = args[1]
+if len(args) >= 3:
+ RUNSDIR = args[2]
+
+
+# TODO: ipol fit limits are in the ipol datfile... automatically use them / warn if result is outside?
+
+
+## Load Professor and show the standard banner
+import professor2 as prof
+if not opts.QUIET:
+ print prof.logo
+
+
+## Read persisted interpolations to re-create the ipol Histos
+import cPickle
+IHISTOS=cPickle.load(open(IFILE))
+# METADATA, IHISTOS = prof.read_ipolhistos(IFILE)
+
+## Read reference data histos
+import os, sys, glob
+DHISTOS = {}
+REFFILES = glob.glob(os.path.join(REFDIR, "*"))
+for rf in REFFILES:
+ DHISTOS.update(prof.read_histos(rf))
+
+## Try to read run histos and extract maximum errors
+MAXERRDICT = None
+try:
+ rundirs = glob.glob(os.path.join(RUNSDIR, "*"))
+ _, RUNHISTOS = prof.read_rundata(rundirs, None) #< don't care about reading params files
+ MAXERRDICT = prof.find_maxerrs(RUNHISTOS)
+except:
+ print "Could not read run data for error regularisation -- chi2 may be unstable"
+
+## Weight file parsing
+matchers = prof.read_pointmatchers(opts.WFILE) if opts.WFILE else None
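+## (A weight-file line is of the form "<path pattern>  <weight>", e.g. the
+##  hypothetical "/SOME_ANALYSIS/d01-x01-y01  2.0"; pattern semantics are as
+##  implemented by prof.read_pointmatchers.)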
+
+
+## Find things available in both ipol and ref data, and in the weight file if there is one
+available = []
+for ihn in IHISTOS.keys():
+ ## Set default bin weights
+ for ib in IHISTOS[ihn].bins:
+ ib.w = 1.0
+ ## Find user-specified bin weights if there was a weight file
+ if matchers is not None:
+ ## Find matches
+ pathmatch_matchers = [(m,wstr) for m,wstr in matchers.iteritems() if m.match_path(ihn)]
+ ## Ditch histos not listed in the weight file
+ if not pathmatch_matchers:
+ del IHISTOS[ihn]
+ continue
+ ## Attach fit weights to the ibins, setting to zero if there's no position match
+ for ib in IHISTOS[ihn].bins:
+ posmatch_matchers = [(m,wstr) for (m,wstr) in pathmatch_matchers if m.match_pos(ib)]
+ ib.w = float(posmatch_matchers[-1][1]) if posmatch_matchers else 0 #< NB. using last match
+ for rhn in DHISTOS.keys():
+ if ihn in rhn: #< TODO: short for rhn = "/REF/"+ihn ?
+ # TODO: we should eliminate this potential mismatch of ref and MC hnames
+ available.append([ihn,rhn])
+ break #< TODO: ok?
+
+
+## Prepare lists of ibins and dbins
+IBINS, DBINS, MAXERRS = [], [], []
+for a in available:
+ # TODO: print out the available observables
+ IBINS.extend(IHISTOS[a[0]].bins)
+ DBINS.extend(DHISTOS[a[1]].bins)
+ if MAXERRDICT:
+ MAXERRS.extend(MAXERRDICT[a[0]])
+if not MAXERRS:
+ MAXERRS = None
+
+
+## Sanity checks
+assert len(IBINS) == len(DBINS)
+if not IBINS:
+ print "No bins ..., exiting"
+ sys.exit(1)
+assert MAXERRS is None or len(IBINS) == len(MAXERRS)
+
+
+def simpleGoF(params):
+ """
+ Very straightforward goodness-of-fit measure
+ """
+ chi2 = 0.0
+ for num, ibin in enumerate(IBINS):
+ ## Weight is attached to the ipol bin (default set to 1.0 above)
+ w = ibin.w
+ if w == 0:
+ continue
+ ## Get ipol & ref bin values and compute their difference
+ ival = ibin.val([params])
+ dval = DBINS[num].val
+ diff = dval - ival
+ ## Data error
+ err2 = DBINS[num].err**2
+ ## Plus interpolation error added in quadrature
+ maxierr = MAXERRS[num] if MAXERRS else None
+ # err2 += ibin.err(params, emax=maxierr)**2
+ # TODO: compute asymm error for appropriate deviation direction cf. sum([e**2 for e in ibin.ierrs])
+ if not err2:
+ raise prof.StatError("Zero uncertainty on a bin being used in the fit -- cannot compute a reasonable GoF")
+ # TODO: should we square w too, so it penalised deviations _linearly_?
+ chi2 += w * diff**2 / err2
+ return chi2
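+## (e.g. a bin with w=1, dval=1.0, ival=1.2 and err=0.1 contributes
+##  1 * (1.0-1.2)**2 / 0.1**2 = 4.0 to the chi2)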
+
+
+
+
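+## The fitted SVR stores its support vectors as an (n_SV, n_params) array, so
+## the last shape entry recovers the dimension of the parameter space: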
+DIM=IHISTOS[IHISTOS.keys()[0]].bins[0].mlval.support_vectors_.shape[-1]
+
+## Take parameter names directly from ifile, or fallback
+# PNAMES = METADATA["ParamNames"].split()
+# TODO pass PNAMES from outside? or put them into the ML bin?
+# if not PNAMES:
+PNAMES = ["A%03i" % i for i in xrange(DIM)]
+
+## Function definition wrapper
+funcdef = prof.mk_fitfunc("simpleGoF", PNAMES, "profGoF")
+exec funcdef in locals()
+if opts.DEBUG:
+ print "Built GoF wrapper from:\n '%s'" % funcdef
+
+
+try:
+ from iminuit import Minuit
+except ImportError, e:
+ print "Unable to import iminuit, exiting", e
+ import sys
+ sys.exit(1)
+
+if not opts.QUIET:
+ print "\n"
+ print 66*"*"
+ print "* Using iminuit, please visit https://github.com/iminuit/iminuit *"
+ print 66*"*"
+ print "\n"
+
+## Ignition
+## Dictionary fitarg for iminuit
+FARG=dict()
+
+## Initial conditions --- use pos = center of hypercube, and step = range/10
+# TODO: Optionally make an initial brute force scan to choose the Minuit starting point, using prof.scangrid
+# pmins = [float(x) for x in METADATA["MinParamVals"].split()]
+# pmaxs = [float(x) for x in METADATA["MaxParamVals"].split()]
+# assert len(pmins) == len(pmaxs)
+
+# pmids = [(pmins[i] + pmaxs[i])/2. for i in xrange(len(pmins))]
+# pranges = [(pmaxs[i] - pmins[i]) for i in xrange(len(pmins))]
+
+
+# if opts.SCAN is not None:
+ # npoints_per_dim=opts.SCAN
+ # print "Scanning %i points"%(npoints_per_dim**len(pmins))
+ # setup=[]
+ # for num, p in enumerate(PNAMES):
+ # setup.append((p, npoints_per_dim, pmins[num], pmaxs[num]))
+
+ # grid=prof.scangrid(*setup)
+ # winner = grid.next()
+ # winner_v = simpleGoF([x[1] for x in winner])
+ # for num, g in enumerate(grid):
+ # currV=simpleGoF([x[1] for x in g])
+ # if currV < winner_v:
+ # winner=g
+ # winner_v=currV
+ # if ((num+1)%100 ==0):
+ # print "%i/%i complete"%(num+1, npoints_per_dim**len(pmins))
+
+ # # This sets the start point
+ # #print "Using startpoint:"
+ # for i, aname in enumerate(PNAMES):
+ # assert(aname==winner[i][0])
+ # pmids[i] = winner[i][1]
+ # #print "%s = %f"%(aname, pmids[i])
+
+
+# for i, aname in enumerate(PNAMES):
+ # FARG[aname] = pmids[i]
+ # FARG['error_%s'%aname] = pranges[i] / 10.
+
+## Fix parameters, set limits (with pname translation)
+limits, fixed = prof.read_limitsandfixed(opts.LIMITS)
+
+for i, pname in enumerate(PNAMES):
+ if pname in limits.keys():
+ FARG['limit_%s'%pname] = limits[pname]
+ if pname in fixed.keys():
+ if not opts.QUIET:
+ print "Fixing", pname, "= %f"%fixed[PNAMES[i]]
+ FARG[pname] = fixed[PNAMES[i]]
+ FARG['fix_%s'%pname] = True
+
+
+# TODO: errordef as CL params?
+PRINTLEVEL = 0 if opts.QUIET else 1
+
+minuit = Minuit(profGoF, errordef=1, print_level=PRINTLEVEL, forced_parameters=PNAMES, **FARG)
+minuit.strategy = opts.STRATEGY
+
+import time
+start_time = time.time()
+## Lift off
+minuit.migrad()
+print("Minimisation finished after %s seconds" % (time.time() - start_time))
+
+
+## Now process the result:
+## Goodness of fit
+if not opts.QUIET:
+ chi2 = minuit.fval
+ ndof = len(DBINS) - (len(PNAMES) - len(fixed.keys()))
+ print "'chi2': %.2f --- Ndf : %i" % (chi2, ndof)
+
+## Check if result is in validity range
+result = [minuit.values[p] for p in PNAMES]
+# rok, rng = prof.is_inrange(result, pmins, pmaxs)
+
+# Max number of characters in any of parameter names --- for formatting (ljust)
+LMAX=max([len(p) for p in PNAMES])
+
+## Write out result
+with open("%s_results.txt" % opts.OUTPUT,"w") as f:
+ ## Meta info
+ f.write("# ProfVersion: %s\n" % prof.version())
+ f.write("# Date: %s\n" % prof.mk_timestamp())
+ f.write("# InterpolationFile: %s\n" % os.path.abspath(IFILE))
+ f.write("# DataDirectory: %s\n" % os.path.abspath(REFDIR))
+ ## Limits
+ lstring = ""
+ for p in PNAMES:
+ if limits.has_key(p):
+ lstring += "\n#\t%s\t%f %f" % (p.ljust(LMAX), limits[p][0], limits[p][1])
+ f.write("#\n# Limits: %s" % lstring)
+ # Fixed parameters
+ fstring = ""
+ for p in PNAMES:
+ if fixed.has_key(p):
+ fstring += "\n#\t%s\t%f" % (p.ljust(LMAX), fixed[p])
+ f.write("\n#\n# Fixed: %s\n" % fstring)
+ f.write("#\n# Minimisation result:\n#\n")
+ ## The tuned parameter values
+ for i, p in enumerate(PNAMES):
+ f.write("%s\t%f\n" % (p.ljust(LMAX), minuit.values[PNAMES[i]]))
+
+ # Correlation matrix --- if params are fixed the covariance is not defined
+ # The keys of minuit.covariance are tuples of strings
+ f.write("#\n# Correlation matrix:\n#\n")
+ t1, t2 = zip(*minuit.covariance.keys())
+ l1=list(t1)
+ CNAMES=list(set(l1))
+
+ from math import sqrt
+ for i in PNAMES:
+ s="# %s"%i.ljust(LMAX)
+ for j in PNAMES:
+ if i in CNAMES and j in CNAMES:
+ s+= " %.2f"%(minuit.covariance[(i,j)]/(sqrt(minuit.covariance[(i,i)])*sqrt(minuit.covariance[(j,j)])))
+ else:
+ s+= " ---"
+ f.write(s+"\n")
+ # Weights --- dump them all at the end
+ f.write("#\n#\n# Weights used\n#\n")
+ #
+ if matchers is None:
+ for k in IHISTOS.keys():
+ f.write("# %s\t1.0\n"%k)
+ else:
+ with open(opts.WFILE) as g:
+ for line in g:
+ l=line.strip()
+ if len(l)==0 or l.startswith("#"):
+ continue
+ f.write("# %s\n"%l)
+
+
+
+## Write out ipolhistos
+try:
+ import yoda
+ result = [minuit.values[name] for name in PNAMES]
+ scatters=[IHISTOS[k].toDataHisto(result).toScatter2D() for k in sorted(IHISTOS.keys())]
+ yoda.writeYODA(scatters, "%s_ipolhistos.yoda" % opts.OUTPUT)
+except ImportError:
+ print "Unable to import yoda, not writing out ipolhistos"
diff --git a/pyext/professor2/ml/__init__.py b/pyext/professor2/ml/__init__.py
new file mode 100644
--- /dev/null
+++ b/pyext/professor2/ml/__init__.py
@@ -0,0 +1,1 @@
+from histos import *
diff --git a/pyext/professor2/ml/histos.py b/pyext/professor2/ml/histos.py
new file mode 100644
--- /dev/null
+++ b/pyext/professor2/ml/histos.py
@@ -0,0 +1,74 @@
+# -*- python -*-
+
+from professor2.histos import *
+
+class MLHisto(Histo):
+ "Specialisation of Histo as a container of MLBins"
+
+ def __init__(self, nnbins=None, path=None):
+ Histo.__init__(self, nnbins, path)
+
+ def toDataHisto(self, *params):
+ "Convert this IpolBin to a DataBin with values and errors computed at params"
+ dbins = [ib.toDataBin(*params) for ib in self.bins]
+ dhist = DataHisto(dbins, self.path)
+ return dhist
+
+
+class MLBin(Bin):
+ """
+ A bin whose value is parametrised by a machine-learning regressor (SVR)
+ """
+
+ __slots__ = ["mlval", "__dict__"]
+
+ def __init__(self, xmin, xmax, X, Y, pnames=None):
+ Bin.__init__(self, xmin, xmax)
+ from numpy import array
+ X=array(X)
+ Y=array([[y,0] for y in Y]) # Manky hack to get rid of the deprecation warning
+
+ # Data scaling --- Standard scaler works much better than MinMaxScaler
+ from sklearn import preprocessing
+ self._xscaler = preprocessing.StandardScaler()
+ xscaled = self._xscaler.fit_transform(X)
+ self._yscaler = preprocessing.StandardScaler()
+ yscaled = self._yscaler.fit_transform(Y) # This produces the noisy deprecation warning
+
+ # Machine Learning magic
+ from sklearn import svm
+ ml = svm.SVR() # TODO --- explore parameters of SVR
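+ # (scikit-learn defaults: RBF kernel, C=1.0, epsilon=0.1)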
+ ml.fit(xscaled, yscaled[:,0]) # Part of the hack above
+ self.mlval = ml
+
+ def val(self, *params):
+ "Get the ML prediction of this bin"
+ from numpy import array
+ p_raw =array(params[0][0]).reshape(1,-1) # The raw, unscaled param point
+ p_scaled = self._xscaler.transform(p_raw) # The scaled param point
+ ret_raw = self.mlval.predict(p_scaled) # The prediction in the scaled value world
+ ret = self._yscaler.inverse_transform([ret_raw,0]) # The prediction in the unscaled value world
+ return float(ret[0]) # part of the hack
+
+ def toDataBin(self, *params): #< needs Python3
+ "Convert this NNBin to a DataBin with values at params"
+ db = DataBin(self.xmin, self.xmax,
+ val=self.val(params),
+ )
+ return db
+
+def mk_MLHisto(histos, runs, paramslist, paramnames=None):
+ from numpy import array
+
+ nbins = len(histos.itervalues().next().bins)
+ mbins = []
+
+ for n in xrange(nbins):
+ xmins = set([histos[run].bins[n].xmin for run in runs])
+ xmaxs = set([histos[run].bins[n].xmax for run in runs])
+ xmin, xmax = xmins.pop(), xmaxs.pop()
+ vals = [histos[run].bins[n].val for run in runs]
+
+ mbins.append(MLBin(xmin, xmax, array(paramslist), array(vals), paramnames))
+
+ return MLHisto(mbins, histos.values()[0].path)
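+
+
+## Minimal usage sketch (illustrative only: toy anchor points rather than real
+## run data; assumes scikit-learn and numpy are importable):
+if __name__ == "__main__":
+ X = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.8], [0.3, 0.7]] # 4 anchor points in a 2D param space
+ Y = [1.0, 2.0, 3.0, 2.5] # bin values at those anchors
+ b = MLBin(0.0, 1.0, X, Y)
+ print b.val([[0.4, 0.5]]) #< SVR prediction at an unsampled point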
diff --git a/pyext/setup.py b/pyext/setup.py
--- a/pyext/setup.py
+++ b/pyext/setup.py
@@ -1,31 +1,31 @@
from distutils.core import setup
from distutils.extension import Extension
from glob import glob
import os
srcdir = os.environ["PWD"] #< assume makefile is called from base dir TODO: use cwd()?
libdir = os.path.abspath("lib") #< assume makefile is called from base dir TODO: use srcdir var?
incdir = os.path.abspath("include") #< assume makefile is called from base dir TODO: use srcdir var?
os.environ.setdefault("CC", "g++")
os.environ.setdefault("CXX", "g++")
ext = Extension("professor2.core",
["pyext/professor2/core.cpp"],
language="C++",
depends=glob("../include/*.h"),
include_dirs=[incdir, os.path.join(srcdir, "pyext", "professor2")],
extra_compile_args="-std=c++11 -Wno-unused-but-set-variable -Wno-sign-compare".split(),
library_dirs=[libdir],
runtime_library_dirs=[],
libraries=["Professor2"])
v = os.environ.get("PROF_VERSION", "X.Y.Z")
setup(name = "professor2",
version=v,
ext_modules = [ext],
- packages = ["professor2", "professor2/misc"],
+ packages = ["professor2", "professor2/ml", "professor2/misc"],
package_dir = {"": "pyext"},
description="Professor version 2",
author="Professor collaboration",
author_email="professor@projects.hepforge.org",
url="http://professor.hepforge.org")
diff --git a/setup.sh b/setup.sh
--- a/setup.sh
+++ b/setup.sh
@@ -1,11 +1,13 @@
#!/bin/bash
fpath=`readlink -f ${BASH_SOURCE[0]}`
profdir=`dirname ${fpath}`
PYV=`python -c "import sys;v=sys.version_info[0:2];print '%i.%i'%(v)"`
export LD_LIBRARY_PATH=${profdir}/local/lib:$LD_LIBRARY_PATH
export PYTHONPATH=${profdir}/local/lib/python${PYV}/site-packages:$PYTHONPATH
+export PYTHONPATH=${profdir}/local/lib64/python${PYV}/site-packages:$PYTHONPATH
export PATH=${profdir}/local/bin:$PATH
+export PATH=${profdir}/local/contrib:$PATH
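+# (intended usage: "source setup.sh" from the Professor top directory)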
