Page Menu
Home
HEPForge
Search
Configure Global Search
Log In
Files
F8310210
No One
Temporary
Actions
View File
Edit File
Delete File
View Transforms
Subscribe
Mute Notifications
Award Token
Flag For Later
Size
84 KB
Subscribers
None
View Options
Index: trunk/bin/prof-tune
===================================================================
--- trunk/bin/prof-tune (revision 1399)
+++ trunk/bin/prof-tune (revision 1400)
@@ -1,679 +1,678 @@
#! /usr/bin/env python
"""\
%prog --datadir DATADIR [--ipoldir IPOLDIR] --weights OBSFILE [--runsfile RUNSFILE]
%prog --refdir REFDIR --ipoldir IPOLDIR --weights OBSFILE [--runsfile RUNSFILE]
Minimise a goodness of fit (GoF) measure using pre-built MC-interpolations from
IPOLDIR against reference data in REFDIR. The MC-interpolations must be created
with prof-interpolate before running %prog.
If only the DATADIR variable is used to specify input file locations, it is
assumed that the saved interpolations are to be found in DATADIR/ipol, and that
the reference data is in DATADIR/ref. The --refdir and --ipoldir switches can be
used to specify the information explicitly or to override one of the guesses
made using the --datadir value.
The same format of observable weights file (and indeed same file!) as used for
prof-interpolate can be used to specify the observable or bin/range-wise weights
which will enter into the GoF calculation. However, it is of course necessary
that the bins for which weights are requested exist in the stored
interpolations: hence the observables specified at this stage should be a subset
of those used for prof-interpolate. To build output histograms but not include
an observable in the optimisation measure, use a weight of 0.
The RUNSFILE is used to restrict the run combinations for which optimisations
will be calculated. As for the observables specified in the weights file,
interpolations must exist for the requested run combinations. If unspecified, it
will be assumed that a file called runcombs.dat exists and is to be used.
[EXPERIMENTAL]
To consider binwise correlations in the same observable there are two ways implemented.
Both build up a covariance matrix and minimize chi^2, in which the inverse of the
covariance matrix is used.
1.) Minimum overlap method: cov_i,j = min(sigma_i, sigma_j)**2
2.) Correlated (100%) and uncorrelated uncertainties in one file per observable.
The file should be created with 'prof-lsobs --correlations' and contain one line per
bin with (several) uncorrelated and or correlated uncertainties, separeted by '---':
uncorr1 [uncorr2 ...] --- corr1 [corr2 ...]
TODO:
* read in histogram titles
* check result is in-/outside limits
* use limits dynamically determined from runcomb
* threading?? Just use prof-batchtune
"""
import sys, os
import professor.user as prof
from professor.tools import shell as shell
ipshell = shell.usePrettyTraceback()
shell.setProcessName("prof-tune")
-
## Set up signal handling
import signal
global RECVD_KILL_SIGNAL
RECVD_KILL_SIGNAL = None
def handleKillSignal(signum, frame):
"""
Declare us as having been signalled, and return to default handling
behaviour.
"""
prof.log.critical("Signal handler called with signal " + str(signum))
prof.log.critical("Waiting for this minimization to finish...")
global RECVD_KILL_SIGNAL
RECVD_KILL_SIGNAL = signum
signal.signal(signum, signal.SIG_DFL)
## Signals to handle
signal.signal(signal.SIGTERM, handleKillSignal);
signal.signal(signal.SIGHUP, handleKillSignal);
signal.signal(signal.SIGINT, handleKillSignal);
signal.signal(signal.SIGUSR2, handleKillSignal);
-
## Build command-line options parser
import optparse
parser = optparse.OptionParser(usage=__doc__, version=prof.version)
## Add standard Prof options
prof.addDataCLOptions(parser, mc=True, ref=True, ipol=True, scan=False)
prof.addRunCombsCLOptions(parser)
prof.addIpolCLOptions(parser)
prof.addOutputCLOptions(parser)
## Callback function for optparse to allow for 0 or argument
## Most of it is copied from
## http://docs.python.org/library/optparse.html#optparse-option-callbacks
def vararg_callback(option, opt_str, value, parser):
assert value is None
value = []
def floatable(str):
try:
float(str)
return True
except ValueError:
return False
for arg in parser.rargs:
# stop on --foo like options
if arg[:2] == "--" and len(arg) > 2:
break
# stop on -a, but not on -3 or -3.0
if arg[:1] == "-" and len(arg) > 1 and not floatable(arg):
break
value.append(arg)
if len(value) > 1:
prof.log.error("%s can have either one or no arguments... exiting" % (opt_str))
sys.exit(1)
elif len(value) == 0:
prof.log.warning("Treating '%s' as flag" % (opt_str))
valuetouse = 'DYNAMIC'
elif len(value) == 1:
prof.log.warning("Setting '%s' to '%s' flag" % (opt_str, value[0]))
valuetouse = value[0]
del parser.rargs[:len(value)]
setattr(parser.values, option.dest, valuetouse)
## Tuning options
group = optparse.OptionGroup(parser, "Tuning")
group.add_option("--no-ipolhistos", dest="SAVEIPOLHISTOS",
action="store_false", default=True,
help="Switch off interpolation histogram storing.")
group.add_option("--no-params", dest="SAVEPARAMS",
action="store_false", default=True,
help="Switch off creating parameter files with tune results.")
group.add_option("--correlations", dest="CORRELATIONS",
action="store", default=False,
help="[EXPERIMENTAL]:"
" Specify 'minoverlap' to use the minimum overlap"
" method to derive correlations or give a directory"
" with correlation information. [default: False]")
group.add_option("--offset", "--runnum-offset", dest="IRUNOFFSET", metavar="N", default=0, type=int,
help="Offset the runcomb variation index by the given value [default: %default]")
group.add_option("--name", dest="TUNENAME", metavar="NAME", default=None,
help="Identify this tune group with a special name, which will be "
"included in the names of output tune files and directories [default: %default]")
parser.add_option_group(group)
mingroup = optparse.OptionGroup(parser, "Minimizer")
mingroup.add_option("--minimizer", choices=("pyminuit", "scipy"),
dest="MINIMIZER", default="pyminuit",
help="Select the minimizer to use (pyminuit|scipy) [default: %default]")
# TODO: Move to a single --errors=none|migrad|minos option
mingroup.add_option("--minos", dest="USEMINOS", action="store_true",
default=False, help="When using Minuit, use MINOS to estimate the "
"parameter errors [default: off]")
mingroup.add_option("--print-minuit", dest="PRINTMINUIT", type="choice",
choices=("0","1","2","3"), metavar="N", default="0",
help="When using Minuit, set printMode to the given "
"value. [default: %default]")
mingroup.add_option("--limits", dest="LIMITS", default=None, metavar="FILE",
action="callback", callback=vararg_callback,
help="File with parameter limits for Minuit. E.g. the "
"file used to create the MC sample points. If no argument "
"is given, the limits will be dynamically computed based "
"on the distribution of input MC anchor points.")
mingroup.add_option("--spmethods", "--start-points", metavar="FOO,BAR",
dest="STARTPOINTMETHODS", default="center",
help="Comma separated list with minimisation starting"
" point methods. [default: %default]")
mingroup.add_option("--manual-sp", "--manual-startpoint",
dest="MANUALSTARTPOINT", metavar="FOO=N,BAR=M",
help="Comma separated list of parameter name - value"
" pairs to be used with 'manual' start point"
" method. Unspecified parameters are chosen"
" randomly. Name and value are separated by an"
" equals sign ('='), e.g."
" PARJ(21)=1.2,PARJ(22)=2.4 . If no start point"
" methods are specified this replaces the default"
" start point method is set to 'manual'.")
mingroup.add_option("--fixed-parameters", dest="FIXEDPARAMETERS",
metavar="FOO=N,BAR=M",
help="Comma separated list of parameter name - value"
" pairs with parameter values to be fixed during"
" minimization. Format is the same as with"
" '--manual-startpoint', e.g."
" PARJ(21)=1.2,PARJ(22)=2.4 . NB: Only supported by"
" pyminuit!")
mingroup.add_option("--eigentunes", dest="EIGENTUNES",
default=False, action="store_true",
help="Make eigentunes using the parameter covariance matrix in the vicinity of the central "
"tune to construct parameter eigenvectors which minimise parameter correlations. The "
"eigentunes are then the deviations along these lines which produce a fixed deviation in the GoF.")
mingroup.add_option("--eigentunes-dgof", dest="EIGENTUNES_DELTA_GOF", metavar="DGOF", default="1x",
help="The deviation from the minimum GoF used to define the extent of "
"eigentune deviation. For a Pearson chi2-distributed GoF measure, Delta(GoF)=1 "
"corresponds to a 1 sigma deviation. Specifying the quantity with an 'x' suffix "
"will be interpreted as a multiplier on the minimum GoF value, GoF_min. An "
"'xn' suffix is used to specify the deviation on GoF in units of (GoF_min/Ndof), "
"which is formally the most appropriate deviation for chi2-distributed quantities. "
"Note that the Professor chi2 is not explicitly designed to have a chi2-distribution "
"and that in practice a Delta(chi2) = 1 or chi2_min/Ndof deviation may produce "
"miniscule deviations: this is the same issue as encountered in construction of "
"PDF error sets and will probably require similar use of judgement and inflated "
"tolerances to produce representative resulting eigentune sets. For now the most "
"reasonable deviations are produced with a delta of 1 or chi2_min/Ndf on chi2/Ndf "
"rather than the total chi2, and hence we recommend using values of 1n or 1x "
"(default = %default)")
parser.add_option_group(mingroup)
## Add standard logging control
prof.addLoggingCLOptions(parser, logoswitch=True)
## Parse arguments
opts, args = parser.parse_args()
+
## Correct argument type on Minuit log level (optparse is not quite general enough)
opts.PRINTMINUIT = int(opts.PRINTMINUIT)
## Set up logging level and print initial messages
prof.log.setPriority(opts)
if opts.SHOW_LOGO:
prof.writeLogo()
prof.writeGuideLine()
## Check that Delta(GoF) is sane
if opts.EIGENTUNES_DELTA_GOF <= 0:
prof.log.error("--eigentunes-dgof argument must have a positive value... exiting")
sys.exit(5)
## Test paths
try:
## Test if we can read input ipols and ref data
dirs_ok = True
paths = prof.DataProxy.getPathsFromCLOptions(opts)
ipoldir = paths["ipol"]
if not ipoldir:
prof.log.error("No interpolation directory given: Use the --datadir or --ipoldir option!")
dirs_ok = False
refdir = paths["ref"]
if not refdir:
prof.log.error("No reference data directory given: Use the --datadir or --refdir option!")
dirs_ok = False
if not dirs_ok:
sys.exit(1)
## Ensure that we can write to output directories
OUTDIR = paths["outdir"]
if not OUTDIR:
prof.log.error("No output directory given: Use the --datadir or --outdir option!")
sys.exit(1)
## Tune data dir
TUNEOUTDIR = os.path.join(OUTDIR, "tunes")
prof.log.debug("Using %s for tune param & pickle file storage" % TUNEOUTDIR)
prof.io.makeDir(TUNEOUTDIR)
except Exception, e:
prof.log.error("Error: %s" % e)
sys.exit(1)
## Get data proxy
try:
## Get a DataProxy object (core object for tuning data)
dataproxy = prof.DataProxy.mkFromCLOptions(opts)
except Exception, e:
prof.log.error("Error: %s" % e)
sys.exit(1)
## Get the specified interpolation class. We're only interested in the order
## of the polynomial to construct the file name of the ipol pickles. This is
## not necessarily the exact interpolation method used. I.e. IpolCls will
## always refer to the pure-Python implementation but the pickled files
## might contain the Weave version.
try:
IpolCls = prof.InterpolationClass
IpolCls.method = opts.IPOLMETHOD
prof.log.info("Using %s for parameterisation." % IpolCls.__name__)
except Exception, e:
prof.log.error("Problem getting parameterisation method: %s" % e)
prof.log.error("Exiting!")
sys.exit(1)
## Get the specified minimizer class.
try:
MinimizerCls = prof.getMinimizerClass(opts.MINIMIZER, useminos=opts.USEMINOS,
printminuit=opts.PRINTMINUIT)
prof.log.debug("Using %s as minimizer." % MinimizerCls.__name__)
except Exception, e:
prof.log.error("Problem getting minimizer: %s" % e)
prof.log.error("Exiting...")
sys.exit(1)
## Read limits file / dynamically compute limits
limits = None
checklimits = None
if opts.LIMITS is None:
prof.log.debug("Not using minimisation limits.")
elif opts.LIMITS == "DYNAMIC":
prof.log.info("Using minimisation limits from MC anchor points")
else:
prof.io.testReadFile(opts.LIMITS)
checklimits = limits = prof.ParameterRange.mkFromFile(opts.LIMITS)
prof.log.info("Using minimisation limits from '%s':\n%s" % (opts.LIMITS, limits))
## Parse fixed parameter option
fixedparams = None
if opts.FIXEDPARAMETERS is not None:
fixedparams = prof.ParameterPoint.mkFromString(opts.FIXEDPARAMETERS)
## Parse start point methods
spmethods = opts.STARTPOINTMETHODS.split(',')
manualsp = None
if opts.MANUALSTARTPOINT is not None:
manualsp = prof.ParameterPoint.mkFromString(opts.MANUALSTARTPOINT)
if len(spmethods) == 1:
spmethods = ["manual"]
elif "manual" not in spmethods:
spmethods.append("manual")
## Load run combinations
allruns = []
if opts.RUNSFILE:
prof.log.debug("Using %s as runs file" % opts.RUNSFILE)
try:
rcm = prof.RunCombManager.mkFromFile(opts.RUNSFILE)
allruns = rcm.runcombs
except Exception, e:
prof.log.error("Error while opening run combination file %s: %s" % (opts.RUNSFILE, e))
sys.exit(1)
else:
prof.log.debug("No run combination file given! Using all available runs.")
allruns.append(dataproxy.getMCData().availableruns)
prof.log.info("Loaded %i run combinations" % len(allruns))
## Select the observables we want to use for our tune
weights = None
weightsname = None
if opts.OBSERVABLEFILE:
weightsname = os.path.basename(opts.OBSERVABLEFILE)
try:
prof.io.testReadFile(opts.OBSERVABLEFILE)
weights = prof.WeightManager.mkFromFile(opts.OBSERVABLEFILE)
except Exception, e:
prof.log.error("Problem when reading observable file: %s" % e)
prof.log.error("Exiting!")
sys.exit(1)
prof.log.debug("Loaded observable file from %s" % opts.OBSERVABLEFILE)
else:
weightsname = "unitweights"
prof.log.debug("No observable file given! Using all available observables.")
weights = prof.WeightManager()
for obsname in dataproxy.getMCData().getAvailableObservables():
weights.addBinRangeWeight(obsname)
prof.log.debug("Loaded observables: %s" % weights)
## Check that all interpolations are available in ipoldir
for runs in allruns:
path = dataproxy.getIpolFilePath(IpolCls, runs)
if not os.path.exists(path):
prof.log.error("Could not find interpolation file for runs %s in"
" ipoldir %s: %s" % (sorted(runs), dataproxy.ipolpath, path))
prof.log.error("Please call prof-interpolate with the correct"
" arguments before using this program to build"
" interpolation files!")
prof.log.error("Exiting!")
sys.exit(1)
def tune(gofobject, spmethod="center"):
global limits, checklimits, manualsp
## If no explicit limits file but the --limits option has been given, use
## the ipol-specific paramrange
## The opts.LIMITS flag only applies to use of limits in tune minimization
if opts.LIMITS == "DYNAMIC":
limits = gofobject.tunedata.paramranges
## Use a separate, always-enabled param range for checking if proposed tunes are within range
if checklimits is None:
checklimits = gofobject.tunedata.paramranges
## Use the center of the limits space as starting point in case
if manualsp is None and spmethod=="center" and opts.LIMITS is not None:
manualsp = limits.center
## Check if starting point was accidently requested to be outside limits
if isinstance(limits, prof.ParameterRange) and not limits.isInside(manualsp):
prof.log.error("Starting point for minimisation outside limits, exiting...")
sys.exit(1)
## Instantiate minimizer
minimizer = MinimizerCls()
msg = "Starting minimiser...\n"
msg += " using spmethod %s\n" % spmethod
msg += " using manualsp %s\n" % manualsp
msg += " using limits \n%s\n" % limits
msg += " using fixedpars %s" % fixedparams
prof.log.debug(msg)
## Run minimizer on provided GoF
result = minimizer.minimize(gofobject,
spmethod=spmethod, manualsp=manualsp,
limits=limits, fixedpars=fixedparams)
return result
def mkIpolHistos(result, tundat, outpath):
prof.log.debug("Writing interpolation histograms to %s" % outpath)
f = open(outpath, "w")
# TODO: convert to YODA, or at least add a YODA format option
f.write('<?xml version="1.0" encoding="ISO-8859-1" ?>\n')
f.write('<!DOCTYPE aida SYSTEM "http://aida.freehep.org/schemas/3.3/aida.dtd">\n')
f.write('<aida version="3.3">\n')
for obs in sorted(tundat.observables):
f.write(tundat.getInterpolationHisto(obs, result).asAIDA() + "\n\n")
f.write("</aida>\n")
f.close()
## Do tuning for each run combination, and append to the ResultList
reslist = prof.ResultList()
reslistname = "results.pkl"
if opts.TUNENAME:
reslistname = opts.TUNENAME + "-" + reslistname
reslistfile = os.path.join(TUNEOUTDIR, reslistname)
nummins = len(allruns)*len(spmethods)
tunenum = 0
for irun, runs in enumerate(allruns):
for imeth, spmethod in enumerate(spmethods):
tunenum += 1
itune = len(spmethods) * (irun + opts.IRUNOFFSET) + imeth
itune_name = "tune-%s-%03i" % (weightsname, itune)
if opts.TUNENAME:
itune_name = opts.TUNENAME + "-" + itune_name
prof.log.info("Starting %i/%i tune, %s..." % (tunenum, nummins, itune_name))
## Tune output directory for params and histograms
ITUNEOUTDIR = os.path.join(TUNEOUTDIR, itune_name)
prof.io.makeDir(ITUNEOUTDIR)
if opts.SAVEIPOLHISTOS:
IPOLHISTOOUTDIR = os.path.join(ITUNEOUTDIR, "ipolhistos")
prof.log.debug("Using %s for interpolation histo storage" % IPOLHISTOOUTDIR)
prof.io.makeDir(IPOLHISTOOUTDIR)
try:
## Load interpolation set
path = dataproxy.getIpolFilePath(IpolCls, runs)
prof.log.debug("Loading ipolset from %s" % path)
ipolset = prof.InterpolationSet.mkFromPickle(path)
prof.log.debug("Loaded interpolation set %s" % (ipolset))
ipolhistonames = ipolset.getHistogramNames()
prof.log.debug("Creating TuneData")
tundat = None
try:
tundat = dataproxy.getTuneData(withref=True, useipol=IpolCls,
useobs=weights.observables, useruns=runs)
tundat.applyObservableWeights(weights)
tundat.vetoEmptyErrors() # TODO: Fix and remove this
except prof.DataProxyError, e:
prof.log.error("Could not build tune data from the specified interpolation. "
"Does the ipol object contain all the bins that your tune is asking for?")
print e
prof.log.debug(e)
sys.exit(4)
## Make GoF object for this ipol and tunedata
# gof = prof.SimpleIpolChi2(tundat, epsilon=0.05, withcorrelation=opts.CORRELATIONS)
gof = prof.SimpleIpolChi2(tundat, withcorrelation=opts.CORRELATIONS)
## Make tune
result = tune(gof, spmethod)
prof.log.info("\nTune:\n%s\n" % result.format(cmp=ipolset.pnamecmp))
## Check that tune is within parameter ranges, otherwise warn and discard
if isinstance(checklimits, prof.ParameterRange) and not checklimits.isInside(result.values):
prof.log.warning("Tune %s is outside the parameter boundaries." % itune_name)
## Add the successful tune to the results list
reslist.append(result)
## Get GoF
gof.setParams(result.values)
TUNE_GOF = gof.calcGoF()
## Computation of actual Delta(GoF) from a relative spec if required
if opts.EIGENTUNES:
msg = "Computing eigentunes with Delta(GoF) = %s" % opts.EIGENTUNES_DELTA_GOF
if opts.EIGENTUNES_DELTA_GOF.endswith("x"):
## Scale delta value by chi2_min
opts.EIGENTUNES_DELTA_GOF = float(opts.EIGENTUNES_DELTA_GOF[:-1]) * TUNE_GOF
elif opts.EIGENTUNES_DELTA_GOF.endswith("xn") or opts.EIGENTUNES_DELTA_GOF.endswith("nx"):
## Scale delta value by chi2_min/Ndf
opts.EIGENTUNES_DELTA_GOF = float(opts.EIGENTUNES_DELTA_GOF[:-2]) * TUNE_GOF / gof.calcNdof()
elif opts.EIGENTUNES_DELTA_GOF.endswith("n"):
## Scale delta value by Ndf (equivalent to a raw delta on GoF/Ndf)
opts.EIGENTUNES_DELTA_GOF = float(opts.EIGENTUNES_DELTA_GOF[:-1]) * gof.calcNdof()
else:
## Use the raw delta value
opts.EIGENTUNES_DELTA_GOF = float(opts.EIGENTUNES_DELTA_GOF)
msg += " => %e" % opts.EIGENTUNES_DELTA_GOF
prof.log.info(msg)
## Write out params file for central tune
if opts.SAVEPARAMS:
paramsfile = os.path.join(ITUNEOUTDIR, "%s-0.params" % itune_name)
result.values.writeParamFile(paramsfile, key=ipolset.pnamecmp)
## Store interpolation histograms for central tune
if opts.SAVEIPOLHISTOS:
ipolhistofile = os.path.join(IPOLHISTOOUTDIR, "%s-0.aida" % itune_name)
mkIpolHistos(result, tundat, ipolhistofile)
prof.log.info("Ipolhistos written to %s"%ipolhistofile)
else:
prof.log.debug("Not storing any histogram data for later plotting.")
## Write out tune pickle file
import pickle
onereslist = prof.ResultList([result])
f = open(os.path.join(ITUNEOUTDIR, "%s-0.pkl" % itune_name), "w")
pickle.dump(onereslist, f)
f.close()
## EIGENTUNES
## Create variations on optimal result vector by going in the 2N +- principle direction vectors
if opts.EIGENTUNES:
prof.log.info("Eigentunes:\n")
import numpy
from professor.tools import eigen
def mkDeltaTune(result, i, scale):
"""Construct a deviation tune of magnitude scale in the ith
covariance-rotated direction."""
global fixedparams
if fixedparams is None:
fixedparams = {}
try:
covmat = result.covariance#_incl_fixed
T_transp, S, T = eigen.eigenDecomposition(covmat)
delta = scale * numpy.sqrt(S[i])
eigenbasis_unitvec = list(numpy.zeros(covmat._dim))
eigenbasis_unitvec[i] = 1
eigenbasis_vec = delta * numpy.matrix(eigenbasis_unitvec)
## Rotate result params into eigenbasis
result_params_reduced = numpy.matrix([result.values[pname] for pname in result.names
if pname not in fixedparams.keys()])
assert len(result_params_reduced.tolist()[0]) == len(result.values) - len(fixedparams)
assert len(result_params_reduced.tolist()[0]) == len(T_transp.tolist()[0])
result_params_trf = (T_transp * result_params_reduced.transpose()).transpose()
etune_params_trf = result_params_trf + eigenbasis_vec
etune_params_t = T * etune_params_trf.transpose()
etune_paramnames = [pname for pname in result.names if pname not in fixedparams.keys()]
etune_params = etune_params_t.transpose().tolist()[0]
for pname in fixedparams.keys():
etune_paramnames.append(pname)
etune_params.append(fixedparams[pname])
etune = prof.ParameterTune(etune_paramnames, numpy.array(etune_params),
runs=runs, obs=weights.observables)
except Exception, e:
prof.log.warning(e)
#import traceback
#traceback.print_exc()
return S[i], etune
def mkDeltaTuneAndGoF(result, i, scale):
mys, mytune = mkDeltaTune(result, i, scale)
gof.setParams(mytune)
return mys, mytune, gof.calcGoF()
for i in xrange(result.covariance._dim):
def delta_gof_with_offset(nsigma):
"""
Find Delta(GoF) point explicitly -- the 1-sigma eigenvectors from Minuit seem far too big
"""
global result, i, TUNE_GOF, opts
mys, mytune, mygof = mkDeltaTuneAndGoF(result, i, nsigma)
return mygof - TUNE_GOF - opts.EIGENTUNES_DELTA_GOF
def find_delta_gof_offset_dsigma(tune_name, maxattempts=10):
global opts, i, result
import scipy.optimize
dnsig_inner = 0
dnsig_outer = opts.EIGENTUNES_DELTA_GOF + 1
for attempt in xrange(1, maxattempts+1):
prof.log.debug("%s: attempt #%d to find dGoF = %f from GoF = %f. |dsigma| = %f" %
(tune_name, attempt, opts.EIGENTUNES_DELTA_GOF, TUNE_GOF, dnsig_outer))
try:
if "+" in tune_name:
assert(delta_gof_with_offset(dnsig_inner) < 0)
nsig_dtune = scipy.optimize.brentq(delta_gof_with_offset, dnsig_inner, dnsig_outer)
elif "-" in tune_name:
assert(delta_gof_with_offset(-dnsig_inner) < 0)
nsig_dtune = scipy.optimize.brentq(delta_gof_with_offset, -dnsig_outer, -dnsig_inner)
else:
raise Exception("Eigentune names must contain a + or - sign to denote the direction of deviation")
lambda_i, dtune, gof_dtune = mkDeltaTuneAndGoF(result, i, nsig_dtune)
return nsig_dtune, gof_dtune, dtune, lambda_i
except ValueError, ve:
dnsig_inner = dnsig_outer
dnsig_outer *= 2
if maxattempts and attempt == maxattempts:
raise Exception("No %s eigentune could be constructed for Delta(GoF) = %f" %
(tune_name, opts.EIGENTUNES_DELTA_GOF))
def process_dtune(tune_name):
try:
nsig_dtune, gof_dtune, tune_dtune, lambda_i = find_delta_gof_offset_dsigma(tune_name)
prof.log.info("%s: (Delta(sigma) = %e, GoF = %e)\n%s\n" % (tune_name, nsig_dtune, gof_dtune, tune_dtune))
if isinstance(checklimits, prof.ParameterRange) and not checklimits.isInside(tune_dtune):
prof.log.warning("Tune %s is outside the parameter boundaries." % itune_name)
prof.log.info("Tune %s %% deviation vector =\n%s\n" % (tune_name, 100*(tune_dtune - result.values)/result.values))
prof.log.info("")
## Store params files for error tunes
if opts.SAVEPARAMS:
paramsfile_dtune = os.path.join(ITUNEOUTDIR, "%s-%s.params" % (itune_name, tune_name))
tune_dtune.writeParamFile(paramsfile_dtune)
## Store interpolation histograms for error tunes
if opts.SAVEIPOLHISTOS:
ipolhistofile_dtune = os.path.join(IPOLHISTOOUTDIR, "%s-%s.aida" % (itune_name, tune_name))
mkIpolHistos(tune_dtune, tundat, ipolhistofile_dtune)
# ## Write out eigentune pickle file
# TODO: Make sure that the eigentune "result" object that's stored has the right params ("values") set
# import pickle
# f = open(os.path.join(ITUNEOUTDIR, "%s-%s.pkl" % (itune_name, tune_name)), "w")
# pickle.dump(result, f)
# f.close()
except Exception, e:
prof.log.warning(e)
# import traceback
# traceback.print_exc()
## Make and process the eigentunes
tune_plus_name = "%02d+" % (i+1)
process_dtune(tune_plus_name)
tune_minus_name = "%02d-" % (i+1)
process_dtune(tune_minus_name)
# TODO: Include error tunes in some ResultCollectionList-type object for pickling
# import numpy
# for s in numpy.linspace(-1.1, 1.1, num=23):
# mys, mytune, mygof = mkDeltaTuneAndGoF(result, i, s)
# gof.setParams(mytune)
# print s, mygof
except prof.MinimizerError, err:
prof.log.error("Minimizer error for this tuning attempt: %s" % err)
prof.log.error("No result is saved!\n")
continue
except IOError, ioe:
prof.log.error("Error when retrieving interpolation: %s" % ioe)
prof.log.error("No result computed.\n")
continue
except KeyboardInterrupt:
prof.log.critical("Keyboard interrupt detected, you'll have to "
"make do with only %i results." % len(reslist))
break
if RECVD_KILL_SIGNAL is not None:
prof.log.critical("Leaving loop early due to signal %s: you'll "
"have to make do with only %i results." %
(RECVD_KILL_SIGNAL, len(reslist)))
break
## Write results snapshot
prof.log.debug("Writing %i results snapshots in %s" % (len(reslist), reslistfile))
reslist.write(reslistfile)
if RECVD_KILL_SIGNAL is not None:
break
## Pickle the final list of all tune results
prof.log.debug("Writing results to %s" % reslistfile)
reslist.write(reslistfile)
Index: trunk/professor/histo/lighthisto.py
===================================================================
--- trunk/professor/histo/lighthisto.py (revision 1399)
+++ trunk/professor/histo/lighthisto.py (revision 1400)
@@ -1,856 +1,856 @@
import posixpath
import os, sys, re
from professor.tools.elementtree import ET
from professor.tools.decorators import deprecated
import professor.tools.log as logging
__all__= ['Histo', 'Bin', 'PlotParser', 'FORMATS']
if "ET" not in dir():
try:
import xml.etree.cElementTree as ET
except ImportError:
logging.debug("Could not load module xml.etree.cElementTree,"
" so we're on a Python < 2.5 system."
" Trying to load cElementTree...")
try:
import cElementTree as ET
except ImportError:
logging.warning("Could not load module cElementTree:"
" using slower xml.etree.ElementTree instead!")
import xml.etree.ElementTree as ET
from htmlentitydefs import codepoint2name
unichr2entity = {}
for code, name in codepoint2name.iteritems():
# exclude "&"
if code != 38:
unichr2entity[unichr(code)] = u"&%s;" % (name)
# Using mutable types as (dict, list) as default arguments can have nasty
# side effects.
def htmlescape(text, d=None):
if d is None:
d = unichr2entity
if u"&" in text:
text = text.replace(u"&", u"&")
for key, value in d.iteritems():
if key in text:
text = text.replace(key, value)
return text
## Accepted histo formats / extensions
FORMATS = ("yoda", "aida") #, "dat", "flat")
class Histo(object):
"""Simple container for histograms storing a list of :class:`Bin` instances.
Histogram trees can be read in from AIDA and flat format files with
:meth:`~Histo.fromAIDA` and :meth:`~histo.fromFlat`. These methods
produce dictionaries mapping histogram paths to histograms. For string
representations for AIDA and flat output :meth:`~Histo.asFlat` and
:meth:`~Histo.asAIDA` can be used.
Two different paths for histograms exist:
:attr:`histopath`
The path of the histogram.
:attr:`fullpath`
The full path of the histogram including "/REF" for reference histograms.
Looping over the bins in a histogram can be simply done by::
>>> for b in myhisto:
... # do stuff with Bin b
"""
aidaindent = " "
def __init__(self):
self._bins = []
# the path dirname (including /REF but not the observable name)
self.path = "/"
# the observable name, e.g. d01-x02-y01
self.name = None
# the histogram title
self.title = ''
self.xlabel = ''
self.ylabel = ''
self.annotations = {}
self._sorted = False
def __cmp__(self, other):
"""Sort by $path/$name string"""
return self.fullPath() > other.fullPath()
def __str__(self):
out = "Histogram '%s' with %d bins\n" % (self.fullPath(), self.numBins())
out += "Title: %s\n" % self.title
out += "XLabel: %s\n" % self.xlabel
out += "YLabel: %s\n" % self.ylabel
out += "\n".join([str(b) for b in self.getBins()])
return out
def fullPath(self):
if self.path and self.name:
return posixpath.join(self.path, self.name)
if self.path:
return self.path
if self.name:
return "/" + self.name
return None
fullpath = property(fullPath, doc="Full histo path including leading '/REF' and histogram name")
def histoPath(self):
if self.path.startswith("/REF"):
return self.fullpath[4:]
else:
return self.fullpath
histopath = property(histoPath, doc="Histo path but without a leading '/REF'")
# def header(self):
# out = "# BEGIN PLOT\n"
# out += "LogY=1\n"
# out += "Title=%s\n" % self.title
# out += "XLabel=%s\n" % self.xlabel
# out += "YLabel=%s\n" % self.ylabel
# out += "# END PLOT\n"
# return out
@deprecated
def asFlat(self):
out = "# BEGIN HISTOGRAM %s\n" % self.fullPath()
out += "AidaPath=%s\n" % self.fullPath()
out += "Title=%s\n" % self.title
if self.xlabel:
out += "XLabel=%s\n" % self.xlabel
if self.ylabel:
out += "YLabel=%s\n" % self.ylabel
if self.fullpath and self.fullpath.startswith('/REF'):
out += "PolyMarker=*\n"
out += "ErrorBars=1\n"
for aname, aval in self.annotations.iteritems():
out += "%s=%s\n" % (aname, aval)
out += "## Area: %e\n" % self.area()
out += "## Num bins: %d\n" % self.numBins()
out += "## xlow \txhigh \tval \terrminus\terrplus\n"
out += "\n".join([b.asFlat() for b in self.getBins()])
out += "\n# END HISTOGRAM\n"
return out
# def asGnuplot(self):
# out = "## HISTOGRAM: %s\n" % self.fullPath()
# out += "## Title: %s\n" % self.title
# if (self.xlabel!=''):
# out += "## XLabel: %s\n" % self.xlabel
# if (self.ylabel!=''):
# out += "## YLabel: %s\n" % self.ylabel
# out += "## Area: %s\n" % self.area()
# out += "## Num bins: %d\n" % self.numBins()
# out += "## xval \tyval \txlow \txhigh \tylow \tyhigh\n"
# out += "\n".join([b.asGnuplot() for b in self.getBins()])
# out += "\n# END HISTOGRAM\n"
# return out
# TODO: Remove!
aidaindent = " "
def asAIDA(self):
ind = self.aidaindent
r = ind + '<dataPointSet name="%s" dimension="2"\n' % (
self.name)
if self.title is not None:
r += ind + ' path="%s" title="%s">\n' % (
self.path, htmlescape(self.title))
else:
r += ind + ' path="%s" title="">\n' % (
self.path)
if (self.xlabel!=''):
r += ind + ' <dimenstion dim="0" title="%s"/>\n' % (
htmlescape(self.xlabel))
if (self.ylabel!=''):
r += ind + ' <dimenstion dim="1" title="%s"/>\n' % (
htmlescape(self.ylabel))
r += ind + " <annotation>\n"
if (self.title!=''):
r += ind + ' <item key="Title" value="%s" sticky="true"/>\n' % (
htmlescape(self.title))
else:
r += ind + ' <item key="Title" value="" sticky="true"/>\n'
r += ind + ' <item key="AidaPath" value="%s" sticky="true"/>\n' % (
self.fullPath())
# TODO: FullPath annotation item?
# r += ind + ' <item key="FullPath" value
r += ind + " </annotation>\n"
for b in self:
r += b.asAIDA()
r += ind + "</dataPointSet>\n"
return r
def numBins(self):
return len(self._bins)
def getBins(self):
if not self._sorted:
self._bins.sort()
self._sorted = True
return self._bins
def setBins(self, bins):
self._bins = bins
self._sorted = False
return self
def addBin(self, bin):
self._bins.append(bin)
self._sorted = False
return self
def getBin(self, index):
if not self._sorted:
self._bins.sort()
self._sorted = True
return self.getBins()[index]
bins = property(getBins, setBins)
def addAnnotation(self, aname, aval):
self.annotations[aname] = aval
return self
def getAnnotation(self, aname):
return self.annotations.get(aname)
def area(self):
return sum([bin.area() for bin in self.bins])
getArea = area
def __iter__(self):
return iter(self.getBins())
def __len__(self):
return len(self._bins)
def __getitem__(self, index):
return self.getBin(index)
def chop(self, *xranges):
"""Return a chopped histogram.
The kept bins are defined by (xstart, xstop) pairs. The first xstart
and last xstop can be None meaning that all is included from the
first or up to the last bin respectively.
Example:
>>> hist.chop((2.5, 5.5), (7.5, None))
"""
if len(xranges) == 0:
raise ValueError("At least one (xstart, xstop) range is needed!")
# check that xranges is
laststop = xranges[0][1]
for xr in xranges[1:]:
if laststop >= xr[0]:
raise ValueError("(xstart, xstop) ranges must be in numerical order!")
laststop = xr[1]
new = Histo()
new.path = self.path
new.name = self.name
new.title = self.title
new.xlabel = self.xlabel
new.ylabel = self.ylabel
irange = 0
curran = xranges[irange]
for b in self:
#lowok = False
#highok = False
br = b.getXRange()
# update the current range used if we exceed the current upper
# limit
while (curran[1] is not None and
irange < len(xranges) - 1 and
br[0] > curran[1]):
irange += 1
curran = xranges[irange]
if ((curran[0] is None or curran[0] <= br[0] or
br[0] <= curran[0] <= br[1]) and
(curran[1] is None or curran[1] >= br[1] or
br[0] <= curran[1] <= br[1])):
new.addBin(b)
else:
sys.stderr.write("Chopping bin %s: %e\n" % (self.fullPath(), b.getBinCenter()))
return new
def renormalise(self, newarea):
""" Renormalise histo to newarea """
# Construc new histo
new = Histo()
# Use the same metadata
new.path = self.path
new.name = self.name
new.title = self.title
new.xlabel = self.xlabel
new.ylabel = self.ylabel
# The current histogram area
oldarea = self.getArea()
# Iterate over all bins
for b in self:
# Rescale Value, Err+, Err-
newy = b.val * float(newarea) / oldarea
newerrplus = b.errplus * float(newarea) / oldarea
newerrminus = b.errminus * float(newarea) / oldarea
newbin = Bin(b.xlow, b.xhigh, newy, newerrplus, newerrminus, b.focus)
new.addBin(newbin)
return new
@classmethod
def fromDPS(cls, dps):
"""Build a histogram from a XML dataPointSet."""
new = cls()
new.name = dps.get("name")
new.title = dps.get("title")
new.path = dps.get("path")
# # strip /REF from path
# if new.path.startswith("/REF"):
# new.path = new.path[4:]
axes = dps.findall("dimension")
if (len(axes)>=2):
for a in axes:
if (a.get("dim")=="0"):
new.xlabel = a.get("title")
elif (a.get("dim")=="1"):
new.ylabel = a.get("title")
elif (a.get("dim")=="2"):
new.zlabel = a.get("title")
points = dps.findall("dataPoint")
#numbins = len(points)
for binnum, point in enumerate(points):
bin = Bin()
measurements = point.findall("measurement")
for d, m in enumerate(measurements):
val = float(m.get("value"))
down = float(m.get("errorMinus"))
up = float(m.get("errorPlus"))
if d == 0:
low = val - down
high = val + up
bin.setXRange(low, high)
elif (len(measurements) == 2 and d == 1) or (len(measurements) == 3 and d == 2):
bin.val = val
bin.errplus = up
bin.errminus = down
elif (len(measurements) == 3 and d == 1):
low = val - down
high = val + up
bin.setYRange(low, high)
new.addBin(bin)
return new
@classmethod
def fromYODAHisto(cls, stringbuf):
"""Build a histogram from its YODA format representation.
"""
desc = {}
new = cls()
for line in stringbuf.splitlines():
line = line.strip()
if not line:
continue
if 'BEGIN YODA_HISTO1D' in line:
fullpath = line.split('BEGIN YODA_HISTO1D', 1)[1].strip()
new.path = os.path.dirname(fullpath)
new.name = os.path.basename(fullpath)
continue
elif 'END YODA_HISTO1D' in line:
break
elif line.startswith("#"):
continue
elif "=" in line:
linearray = line.split("=", 1)
desc[linearray[0]] = linearray[1]
else:
linearray = line.split()
if desc["Type"] == "Histo1D": # TODO: deal with under/overflows later
try:
float(linearray[0])
new.addBin(Bin(float(linearray[0]), float(linearray[1]),
float(linearray[2]),
float(linearray[4]), float(linearray[4])))
except:
pass
elif desc["Type"] == "Scatter2D":
try:
float(linearray[0])
new.addBin(Bin(float(linearray[0]) - float(linearray[1]),
float(linearray[0]) + float(linearray[2]),
float(linearray[3]),
float(linearray[4]), float(linearray[5])))
except:
pass
else:
sys.stderr.write("Unknown line format in '%s'\n" % line)
## Apply special annotations as histo obj attributes
if desc.has_key("Path"):
new.path, new.name = posixpath.split(desc["Path"])
if desc.has_key("Title"):
new.title = desc["Title"]
if desc.has_key("XLabel"):
new.title = desc["XLabel"]
if desc.has_key("YLabel"):
new.title = desc["YLabel"]
return new
@classmethod
def fromFlatHisto(cls, stringbuf):
"""Build a histogram from its flat text representation.
"""
desc = {}
new = cls()
for line in stringbuf.splitlines():
line = line.strip()
if not line:
continue
if 'BEGIN HISTOGRAM' in line:
fullpath = line.split('BEGIN HISTOGRAM', 1)[1].strip()
new.path = os.path.dirname(fullpath)
new.name = os.path.basename(fullpath)
continue
elif 'END HISTOGRAM' in line:
break
elif line.startswith("#"):
continue
elif "=" in line:
linearray = line.split("=", 1)
desc[linearray[0]] = linearray[1]
else:
linearray = line.split()
if len(linearray) == 4:
new.addBin(Bin(float(linearray[0]), float(linearray[1]),
float(linearray[2]),
float(linearray[3]), float(linearray[3])))
elif len(linearray) == 5:
new.addBin(Bin(float(linearray[0]), float(linearray[1]),
float(linearray[2]),
float(linearray[3]), float(linearray[4])))
else:
sys.stderr.write("Unknown line format in '%s'\n" % line)
## Apply special annotations as histo obj attributes
if desc.has_key("AidaPath"):
new.path, new.name = posixpath.split(desc["AidaPath"])
if desc.has_key("Title"):
new.title = desc["Title"]
if desc.has_key("XLabel"):
new.title = desc["XLabel"]
if desc.has_key("YLabel"):
new.title = desc["YLabel"]
return new
@classmethod
def fromFile(cls, path):
histos = {}
try:
import yoda
aos = yoda.read(path)
for hpath, ao in sorted(aos.iteritems()):
h = cls()
if hpath.startswith("/REF"):
hpath = hpath[4:]
h.path = os.path.dirname(hpath)
h.name = os.path.basename(hpath)
- if "Scatter" not in ao.type:
- ao = ao.mkScatter()
+ if "Scatter" not in ao.type: # TODO: this is not necessary from YODA 1.2.0
+ ao = ao.mkScatter()
for p in ao.points:
h.addBin( Bin(p.x-p.xErrs[0], p.x+p.xErrs[1], p.y, p.yErrs[1], p.yErrs[0]) )
histos[hpath] = h
- # TODO: delete local readers, and rely on YODA reader (except for a very basic format, maybe)
+ # TODO: delete local readers, and rely on YODA reader (except for a very basic flat format, maybe)
except ImportError, e: ## fall back to local readers (discard!)
#print e
ftype = os.path.splitext(path)[1]
#print path, ftype
if ftype == ".yoda":
histos = cls.fromYODA(path)
elif ftype == ".aida":
histos = cls.fromAIDA(path)
#elif ftype == ".flat":
#histos = cls.fromFlat(path)
else:
raise ValueError("File type %s not supported." % ftype)
return histos
@classmethod
def fromYODA(cls, path):
"""Load all histograms in file 'path' into a histo-path=>histo dict.
The keys of the dictionary are the full paths of the histogram, i.e.
AnalysisID/HistoID, a leading "/REF" is stripped from the keys.
"""
runhistos = {}
if path == "-":
f = sys.stdin
else:
f = open(path, "r")
fullpath = None
s = ""
for line in f:
if "BEGIN YODA_HISTO1D" in line:
fullpath = line.split('BEGIN YODA_HISTO1D', 1)[1].strip()
# TODO: Really? Here?
if fullpath.startswith("/REF"):
fullpath = fullpath[4:]
if fullpath:
s += line
if "END YODA_HISTO1D" in line:
runhistos[fullpath] = cls.fromYODAHisto(s)
## Reset for next histo
fullpath = None
s = ""
if f is not sys.stdin:
f.close()
return runhistos
@classmethod
def fromFlat(cls, path):
"""Load all histograms in file 'path' into a histo-path=>histo dict.
The keys of the dictionary are the full paths of the histogram, i.e.
AnalysisID/HistoID, a leading "/REF" is stripped from the keys.
"""
runhistos = []
if path == "-":
f = sys.stdin
else:
f = open(path, "r")
fullpath = None
s = ""
for line in f:
if "BEGIN HISTOGRAM" in line:
fullpath = line.split('BEGIN HISTOGRAM', 1)[1].strip()
# TODO: Really? Here?
if fullpath.startswith("/REF"):
fullpath = fullpath[4:]
if fullpath:
s += line
if "END HISTOGRAM" in line:
runhistos.append(cls.fromFlatHisto(s))
## Reset for next histo
fullpath = None
s = ""
if f is not sys.stdin:
f.close()
return runhistos
@classmethod
def fromAIDA(cls, path):
"""Load all histograms in file 'path' into a histo-path=>histo dict.
The keys of the dictionary are the full paths of the histogram, i.e.
AnalysisID/HistoID, a leading "/REF" is stripped from the keys.
TODO: /REF stripping should really happen in user code...
"""
runhistos = dict()
tree = ET.parse(path)
for dps in tree.findall("dataPointSet"):
fullpath = posixpath.join(dps.get("path"), dps.get("name"))
# TODO: Really? Here?
if fullpath.startswith("/REF"):
fullpath = fullpath[4:]
runhistos[fullpath] = cls.fromDPS(dps)
return runhistos
class Bin(object):
"""A simple container for a binned value with an error."""
__slots__ = ["xlow", "xhigh", "ylow", "yhigh", "val", "errplus", "errminus", "_focus"]
def __init__(self, xlow=None, xhigh=None, val=0, errplus=0, errminus=0, focus=None, ylow=None, yhigh=None):
def _float(f):
if f is None:
return None
return float(f)
self.xlow = _float(xlow)
self.xhigh= _float(xhigh)
self.ylow = _float(ylow)
self.yhigh= _float(yhigh)
self.val = _float(val)
self.errplus = _float(errplus)
self.errminus = _float(errminus)
self._focus= _float(focus)
def __str__(self):
out = "%e to %e: %e +%e-%e" % (self.xlow, self.xhigh,
self.val, self.errplus, self.errminus)
return out
@deprecated
def asFlat(self):
if self.ylow==None or self.yhigh==None:
out = "%e\t%e\t%e\t%e\t%e" % (self.xlow, self.xhigh, self.val, self.errminus, self.errplus)
else:
out = "%e\t%e\t%e\t%e\t%e\t%e" % (self.xlow, self.xhigh, self.ylow, self.yhigh, self.val, 0.5*(self.errminus+self.errplus))
return out
def asGnuplot(self):
out = "%e\t%e\t%e\t%e\t%e\t%e" % (self.getBinCenter(), self.val,
self.xlow, self.xhigh, self.val-self.errminus,
self.val+self.errplus)
return out
# TODO: Remove!
aidaindent = " "
def asAIDA(self):
"Return this bin as AIDA formatted string."
ind = self.aidaindent
return (ind + "<dataPoint>\n"
+ ind
+ ' <measurement errorPlus="%e" value="%e" errorMinus="%e"/>\n' % (
.5*(self.xhigh - self.xlow), self.getBinCenter(), .5*(self.xhigh - self.xlow))
+ ind
+ ' <measurement errorPlus="%e" value="%e" errorMinus="%e"/>\n' % (
self.errplus, self.val, self.errminus)
+ ind + "</dataPoint>\n")
def __cmp__(self, other):
"""Sort by mean x value (yeah, I know...)"""
return (self.xlow + self.xhigh) > (other.xlow + other.xhigh)
def getXRange(self):
return (self.xlow, self.xhigh)
def getYRange(self):
return (self.ylow, self.yhigh)
def setXRange(self, xlow, xhigh):
self.xlow = xlow
self.xhigh = xhigh
return self
def setYRange(self, ylow, yhigh):
self.ylow = ylow
self.yhigh = yhigh
return self
def getBinCenter(self):
"""Geometric middle of the bin range."""
return float(self.xlow + .5*(self.xhigh - self.xlow))
def getFocus(self):
"""Mean x-value of the bin."""
if self._focus is not None:
return (self.xlow + self.xhigh)/2.0
else:
return self._focus
focus = property(getFocus)
def getVal(self):
"""Y-value of the bin."""
return self.val
def area(self):
return self.val * (self.xhigh - self.xlow)
getArea = area
def getErr(self):
"""Get mean of +ve and -ve y-errors."""
return (self.errplus + self.errminus)/2.0
def setErr(self, err):
"""Set both +ve and -ve y-errors simultaneously."""
self.errplus = err
self.errminus = err
return self
err = property(getErr, setErr)
class PlotParser(object):
"""Parser for Rivet's .plot plot info files."""
pat_begin_block = re.compile('^#+ BEGIN ([A-Z0-9_]+) ?(\S+)?')
# temporarily allow several hashes before END for YODA
pat_end_block = re.compile('^#+ END ([A-Z0-9_]+)')
pat_comment = re.compile('^#|^\s*$')
pat_property = re.compile('^(\w+?)=(.*)$')
pat_path_property = re.compile('^(\S+?)::(\w+?)=(.*)$')
def __init__(self, plotpaths=None):
"""
Parameters
----------
plotpaths : list of str, optional
The directories to search for .plot files.
The default is to call :command:`rivet-config --datadir` to get
the directory where the .plot files can be found.
Raises
------
ValueError
If `plotpaths` is not specified and calling
:command:`rivet-config` fails.
"""
if plotpaths is None:
plotpaths = []
self.plotpaths = plotpaths
if len(self.plotpaths) == 0:
try:
import rivet
try:
self.plotpaths = rivet.getAnalysisPlotPaths()
except AttributeError:
self.plotpaths = rivet.getAnalysisRefPaths()
except AttributeError, e:
sys.stderr.write("Failed to load Rivet analysis plot/reference paths: %s\n" % e)
sys.stderr.write("Rivet version is too old.\n")
raise ValueError("No plot paths given and rivet module is too old.")
except ImportError, e:
sys.stderr.write("Failed to import rivet module: %s\n" % e)
raise ValueError("No plot paths given and the rivet module could not be loaded!")
def getSection(self, section, hpath):
"""Get a section for a histogram from a .plot file.
Parameters
----------
section : ('PLOT'|'SPECIAL'|'HISTOGRAM')
The section that should be extracted.
hpath : str
The histogram path, i.e. /AnaylsisID/HistogramID .
Todo
----
Caching!
At the moment the result of the lookup is not cached so every
call requires a file to be searched for and opened.
"""
if section not in ['PLOT', 'SPECIAL', 'HISTOGRAM']:
raise ValueError("Can't parse section \'%s\'" %section)
parts = hpath.split("/")
if len(parts) != 3:
raise ValueError("hpath has wrong number of parts (%i)" % (len(parts)))
base = parts[1] + ".plot"
ret = {'PLOT': {}, 'SPECIAL': None, 'HISTOGRAM': {}}
for pidir in self.plotpaths:
plotfile = os.path.join(pidir, base)
if os.access(plotfile, os.R_OK):
#print plotfile
startreading = False
f = open(plotfile)
for line in f:
m = self.pat_begin_block.match(line)
if m:
tag, pathpat = m.group(1,2)
# pathpat could be a regex
if tag == section and re.match(pathpat,hpath):
startreading = True
if section in ['SPECIAL']:
ret[section] = ''
continue
if not startreading:
continue
if self.isEndMarker(line, section):
startreading = False
continue
elif self.isComment(line):
continue
if section in ['PLOT', 'HISTOGRAM']:
vm = self.pat_property.match(line)
if vm:
prop, value = vm.group(1,2)
#print prop, value
ret[section][prop] = value
elif section in ['SPECIAL']:
ret[section] += line
f.close()
# no break, as we can collect settings from multiple .plot files
return ret[section]
def getHeaders(self, hpath):
"""Get the plot headers for histogram hpath.
This returns the PLOT section.
Parameters
----------
hpath : str
The histogram path, i.e. /AnalysisID/HistogramID .
Returns
-------
plot_section : dict
The dictionary usually contains the 'Title', 'XLabel' and
'YLabel' properties of the respective plot.
See also
--------
:meth:`getSection`
"""
return self.getSection('PLOT', hpath)
def getSpecial(self, hpath):
"""Get a SPECIAL section for histogram hpath.
The SPECIAL section is only available in a few analyses.
Parameters
----------
hpath : str
Histogram path. Must have the form /AnalysisID/HistogramID .
See also
--------
:meth:`getSection`
"""
return self.getSection('SPECIAL', hpath)
def getHistogramOptions(self, hpath):
"""Get a HISTOGRAM section for histogram hpath.
The HISTOGRAM section is only available in a few analyses.
Parameters
----------
hpath : str
Histogram path. Must have the form /AnalysisID/HistogramID .
See also
--------
:meth:`getSection`
"""
return self.getSection('HISTOGRAM', hpath)
def isEndMarker(self, line, blockname):
m = self.pat_end_block.match(line)
return m and m.group(1) == blockname
def isComment(self, line):
return self.pat_comment.match(line) is not None
def updateHistoHeaders(self, hist):
headers = self.getHeaders(hist.histopath)
if headers.has_key("Title"):
hist.title = headers["Title"]
if headers.has_key("XLabel"):
hist.xlabel = headers["XLabel"]
if headers.has_key("YLabel"):
hist.ylabel = headers["YLabel"]
Index: trunk/professor/data/proxy.py
===================================================================
--- trunk/professor/data/proxy.py (revision 1399)
+++ trunk/professor/data/proxy.py (revision 1400)
@@ -1,654 +1,654 @@
import os
import itertools
from professor import histo
from professor.tools.hashes import md5
from mcdata import MCData
from tunedata import TuneData
from professor.interpolation.interpolationset import InterpolationSet
from professor.tools.errors import DataProxyError, IOTestFailed
from professor.tools import io
import professor.tools.log as logging
class DataProxy(object):
"""Central object for loading data from the file system.
Three types of data are handled:
Reference data :
TODO
MC data :
Different types of MC data can be stored. The MC data is stored in a
`dict` ``{type-ID => MCData}`` . type-IDs are for example 'sample'
or 'scan'.
Interpolations :
TODO
See Also
--------
MCData : Abstraction of a MC data subdirectory.
"""
def __init__(self):
self._refpath = None
self._ipolpath = None
self._mcpaths = dict()
# { histo path => Histo }
self._refdata = None
# { data type => MCData }
self._mcdata = dict()
# Do not cache InterpolationSets: this is probably too memory-intensive
# self._ipoldata = None
#
self._outdir = None
def __str__(self):
"""
Mildly informative string representation.
"""
s = "<%s ref(%s)" % (self.__class__.__name__, self._refpath)
for typeid, mcdata in self._mcdata.items():
s += " mc-%s(%s)" % (typeid, mcdata.basepath)
s += " ipol(%s)>" % self._ipolpath
return s
@staticmethod
def getPathsFromCLOptions(opts):
"""Return a dict with the data paths specified on command line.
The dictionary has the following 4 keys:
* mc
* ref
* scan
* ipol
Each value can contain `None` meaning that the respective
command-line option is not available or that a value could not be
constructed from e.g. DATADIR/mc .
Arguments
---------
opts : optparse.Values
"""
datadir = opts.DATADIR
paths = {"mc":None, "ref":None, "scan":None, "ipol":None, "outdir":None}
try:
if opts.MCDIR is not None:
paths["mc"] = opts.MCDIR
elif datadir is not None:
paths["mc"] = os.path.join(datadir, "mc")
else:
logging.debug("Command-line option defined for `mcdir` but"
" no value could be built. Continue and hope"
" for the best...")
except AttributeError, err:
logging.debug("MCDIR not found in CL options: %s" % err)
try:
if opts.REFDIR is not None:
paths["ref"] = opts.REFDIR
elif datadir is not None:
paths["ref"] = os.path.join(datadir, "ref")
else:
logging.debug("Command-line option defined for `refdir` but"
" no value could be built. Continue and hope"
" for the best...")
except AttributeError, err:
logging.debug("REFDIR not found in CL options: %s" % err)
try:
if opts.SCANDIR is not None:
paths["scan"] = opts.SCANDIR
elif datadir is not None:
paths["scan"] = os.path.join(datadir, "scan")
else:
logging.debug("Command-line option defined for `scandir` but"
" no value could be built. Continue and hope"
" for the best...")
except AttributeError, err:
logging.debug("SCANDIR not found in CL options: %s" % err)
try:
if opts.IPOLDIR is not None:
paths["ipol"] = opts.IPOLDIR
elif datadir is not None:
paths["ipol"] = os.path.join(datadir, "ipol")
else:
logging.debug("Command-line option defined for `ipoldir` but"
" no value could be built. Continue and hope"
" for the best...")
except AttributeError, err:
logging.debug("IPOLDIR not found in CL options: %s" % err)
try:
if opts.OUTDIR is not None:
paths["outdir"] = opts.OUTDIR
elif datadir is not None:
paths["outdir"] = datadir
else:
logging.debug("Command-line option defined for `outdir` but"
" no value could be built. Continue and hope"
" for the best...")
except AttributeError, err:
logging.debug("IPOLDIR not found in CL options: %s" % err)
# Summary of paths in DEBUG mode
logging.debug("MCDIR:\t%s"%paths['mc'])
logging.debug("REFDIR:\t%s"%paths['ref'])
logging.debug("SCANDIR:\t%s"%paths['scan'])
logging.debug("IPOLDIR:\t%s"%paths['ipol'])
logging.debug("OUTDIR:\t%s"%paths['outdir'])
return paths
@classmethod
def mkFromCLOptions(cls, opts):
"""Build DataProxy from CL options that were prepared with
addDataCLOptions.
Only the paths are set in the returned DataProxy for which the
parser has an according option.
See Also
--------
addDataCLOptions : Add a data location command-line option group to
an `OptionParser`.
getPathsFromCLOptions : Get a `dict` of data-location paths from
command line options.
"""
# Try to create more comfortable RivetDataProxy. Do this here so we
# don't need to modify all prof-* scripts.
# Use the refdir as specified if it is given explicitly on the command line
if cls is not RivetDataProxy and hasattr(opts, "REFDIR") and opts.REFDIR is None:
try:
logging.debug("Trying to create RivetDataProxy")
return RivetDataProxy.mkFromCLOptions(opts)
except Exception, err:
logging.warning("Could not create RivetDataProxy: %s" % err)
else:
logging.debug("Not creating RivetDataProxy")
proxy = cls()
paths = cls.getPathsFromCLOptions(opts)
proxy.refpath = paths["ref"]
proxy.ipolpath = paths["ipol"]
proxy.outdir = paths["outdir"]
if paths["mc"] is not None:
proxy.setMCPath(paths["mc"], "sample")
elif paths["scan"] is not None:
proxy.setMCPath(paths["scan"], "linescan")
else:
proxy.setMCPath(None)
return proxy
def setDataPath(self, base):
"""Set data location paths rooted at `base`.
Sets the data location paths for reference data (`base/ref`),
MC sample (`base/mc`) and interpolation storage (`base/ipol/`).
Parameters
----------
base : str
Base path for data locations.
"""
temp = os.path.join(base, "ref")
if os.path.isdir(temp):
self.setRefPath(temp)
temp = os.path.join(base, "mc")
if os.path.isdir(temp):
self.setMCPath(temp, "sample")
temp = os.path.join(base, "ipol")
if os.path.isdir(temp):
self.setIpolPath(temp)
temp = os.path.join(base, "outdir")
if os.path.isdir(temp):
self.setOutputPath(temp)
def setOutputPath(self, path):
self._outdir = None
if path:
io.testReadDir(path)
self._outdir = path
def getOutputPath(self):
# if self._refpath is None:
# raise DataProxyError("No reference data path set!")
return self._outdir
outdir = property(getOutputPath, setOutputPath,
doc="Base directory for output")
def setRefPath(self, path):
self._refdata = None
if path:
io.testReadDir(path)
self._refpath = path
def getRefPath(self):
# if self._refpath is None:
# raise DataProxyError("No reference data path set!")
return self._refpath
refpath = property(getRefPath, setRefPath,
doc="Base directory for reference data files")
def _loadRefData(self, force=False):
"""
Load all reference data if not done before.
If 'force' is true, reload data even if already loaded.
"""
## Exit early if refdata is already populated (and a reload is not forced)
if not force and self._refdata:
return
self._refdata = {}
if self.refpath:
refdircontent = os.listdir(self.refpath)
for reffile in refdircontent:
# TODO: The file format stuff should be done in the histo module
if not (reffile.endswith(".yoda") or reffile.endswith(".aida")):
continue
reffilepath = os.path.join(self.refpath, reffile)
if not os.path.isfile(reffilepath):
logging.warn("Could not read reference file: " + reffilepath)
self._refdata.update(histo.Histo.fromFile(reffilepath))
def getRefHisto(self, histopath):
"""Get a reference histogram.
Parameters
----------
histopath : str
A histogram path of the form '/Analysis/HistoID'.
Raises
------
DataProxyError
If `self.refpath` is not set.
KeyError
If `histopath` is not available.
-
Returns
-------
histogram : histo.Histo
The reference histogram.
"""
self._loadRefData()
return self._refdata[histopath]
def getRefData(self):
"""Get the dictionary of all loaded reference histograms, indexed
by histogram path.
Returns
-------
refhistos : dict(path => histo.Histo)
The reference histograms.
"""
self._loadRefData()
return self._refdata
def setMCPath(self, path, datatype="sample"):
"""Add MC data of given type rooted at `path`.
Parameters
----------
path : str
Base directory of the MC data.
datatype : str, optional
The type identifier of the MC data, e.g. `'sample'` or
`'linescan'`. The default is `'sample'`.
Raises
------
IOTestFailed
If `path` is not a readable directory.
"""
if path:
io.testReadDir(path)
if self._mcdata.has_key(datatype):
del self._mcdata[datatype]
self._mcdata[datatype] = MCData(path)
def addMCData(self, mcdata, datatype):
"""Add MC data of given data type.
Add a MC data interface to the internal storage dictionary. If an
entry for `datatype` already exists it will be overwritten!
Parameters
----------
mcdata : MCData (or subclass)
The MC data to add.
datatype : str
The MC data type, e.g. `'sample'` or `'scan'` or `'tunes'`.
Raises
------
TypeError
If mcdata has wrong type.
"""
if not isinstance(mcdata, MCData):
raise TypeError("Argument mcdata must be a MCData (or subclass)"
" instance. But type is %s" % (type(mcdata)))
self._mcdata[datatype] = mcdata
def getMCData(self, datatype="sample"):
"""Get MC data of the given type.
Parameters
----------
datatype : str, optional
The MC data type, e.g. `'sample'` or `'scan'` (default is
`'sample'`.
Raises
------
DataProxyError
If no MC data of type `datatype` is available.
Returns
-------
mcdata : MCData
The `datatype` MC data.
"""
if not self._mcdata.has_key(datatype):
raise DataProxyError("MC data type '%s' not set!" % (datatype))
return self._mcdata[datatype]
def setIpolPath(self, path):
if path:
io.testReadDir(path)
self._ipolpath = path
def getIpolPath(self):
# if self._ipolpath is None:
# raise DataProxyError("No interpolation base path set!")
return self._ipolpath
ipolpath = property(getIpolPath, setIpolPath,
doc="Base directory for interpolation set files")
def getIpolFilePath(self, ipolcls, runs, output=False):
"""Return the canonical path for an interpolation pickle.
Parameters
----------
ipolcls : class
The interpolation method class. Must have a 'method' attribute.
runs : list, str
The runs that are used as anchor points for the interpolation.
Can be a list of strings or a single string of colon-separated
run keys.
"""
if type(runs) in [list, tuple]:
#print runs
runs = ":".join(sorted(runs))
if output:
return os.path.join(self.outdir,
"profipol_%s_%s.pkl" % (ipolcls.method,
md5(runs).hexdigest()))
else:
return os.path.join(self.ipolpath,
"profipol_%s_%s.pkl" % (ipolcls.method,
md5(runs).hexdigest()))
# TODO: raise meaningful error
def getInterpolationSet(self, ipolcls, runs):
"""Get an InterpolationSet.
This is loaded from disk on-the-fly.
Parameters
----------
ipolcls : class
The interpolation method class.
runs : list, str
The runs that are used as anchor points for the interpolation.
Can be a list of strings or a single string of colon-separated
run keys.
"""
path = self.getIpolFilePath(ipolcls, runs)
return InterpolationSet.mkFromPickle(path)
# TODO: is this necessary?
# TODO: filter for max info ipol
def listInterpolationSets(self):
"""Return a list of *all* InterpolationSets in the ipol directory.
Raises
------
DataProxyError
If `self.ipolpath` is not set.
"""
l = []
for f in os.listdir(self.ipolpath):
if not f.endswith(".pkl"):
continue
p = os.path.join(self.ipolpath, f)
l.append(InterpolationSet.mkFromPickle(p))
return l
# TODO: add method that gets a ipolcls and a list of runcombs and
# checks, that all files exist
def getTuneData(self, withref=True, withmc=None,
useipol=None, useruns=None, useobs=None):
"""Return a TuneData object with the desired data.
The kind of data that is given to TuneData can be steered via the
(optional) flags. Depending on the kind of computation (calculating
interpolation coefficients/minimising/...) different kinds of data
must be turned on.
This is the central data preparation function.
Parameters
----------
withref : bool, optional
Equip `TuneData` with reference data (the default is `True`).
withmc : {str, `None`}, optional
If not `None`, the type of MC data that is stored in the
`TuneData`, e.g. `'sample'`. The default is `None`.
useipol : {interpolation_class, `None`}, optional
If not `None`, the interpolation method class used for the
per-bin interpolations. Only the
:attr:`~BinInterpolation.method` attribute is important because
this is used to construct the file name of the pickle file.
useruns : {list of str, `None`}, optional
The run numbers used for interpolation. Can be `None` if
`withmc` is given. In this case, all available MC runs are used.
useobs : {list of str, `None`}, optional
The observables to use. Can be `None` if `withmc` is given. In
this case, all available observables in the MC data are used.
"""
return TuneData(self, withref, withmc, useipol, useruns, useobs)
@staticmethod
def getBinID(histo, ibin):
"""Get a canonical bin id of the form Analysis/HistoID:BinIdx .
Parameters
----------
histo : Histo
Histogram.
ibin : int
Bin index.
"""
return "%s:%i" % (histo.histopath, ibin)
@staticmethod
def getBinIndex(binid):
"""Get the bin index from a canonical bin ID.
Parameters
----------
binid : str
The bin ID.
Returns
-------
index : int
"""
return DataProxy.splitBinID(binid)[1]
@staticmethod
def splitBinID(binid):
"""Split a bin ID in observable and bin index.
Parameters
----------
binid : str
The bin ID.
Returns
-------
observable : str
index : int
"""
obs, idx = binid.split(":")
return obs, int(idx)
class RivetDataProxy(DataProxy):
"""Data proxy that loads the reference data from the files distributed with
Rivet.
"""
def __init__(self):
import rivet
super(RivetDataProxy, self).__init__()
self._refdirs = ["."]
try:
self._refdirs = rivet.getAnalysisRefPaths()
except:
logging.warning("Your copy of Rivet does not support the getAnalysisRefPaths() function: please upgrade!")
def getRefHisto(self, histopath):
"""Get a reference histogram.
Parameters
----------
histopath : str
A histogram path of the form '/Analysis/HistoID'.
Raises
------
KeyError
If `histopath` is not available.
Returns
-------
histogram : histo.Histo
The reference histogram.
"""
if self._refdata is None or not self._refdata.has_key(histopath):
self._loadRefData(histopath)
return self._refdata[histopath]
def _loadRefData(self, histopath):
try:
rd = itertools.chain([self.refpath], self._refdirs)
logging.debug("Using %s and %s for ref data search." % (self.refpath, self._refdirs))
except DataProxyError:
rd = self._refdirs
logging.debug("refpath not set. Using only %s for ref data search." % self._refdirs)
ana = histopath[1:].split("/")[0]
rp = None
for rd in self._refdirs:
for extn in histo.FORMATS:
rp = os.path.join(rd, ana + "." + extn)
try:
io.testReadFile(rp)
logging.debug("Found ref data file for analysis %s: %s" % (ana, rp))
break
except IOTestFailed:
rp = None
if rp:
break
- if rp is None:
- ## No file matching the analysis name found. (Re-)load all ref data in refpath.
+ if not rp:
+ ## No file matching the analysis name found. (Re-)load all ref data in refpath
logging.debug("(Re-)loading reference data from %s" % self.refpath)
super(RivetDataProxy, self)._loadRefData(True)
else:
if self._refdata is None:
self._refdata = {}
logging.debug("Loading histograms from %s" % rp)
- self._refdata.update(histo.Histo.fromFile(rp))
+ refhistos = histo.Histo.fromFile(rp)
+ self._refdata.update(refhistos)
def getRefData(self):
# if self._refdata is None:
# raise DataProxyError("No reference data loaded!")
return self._refdata
@classmethod
def mkFromCLOptions(cls, opts):
"""Build DataProxy from CL-options that were prepared with
addDataCLOptions.
Only the paths are set in the returned DataProxy for which the
parser has an according option.
See Also
--------
addDataCLOptions : Add a data location command-line option group to
an `OptionParser`.
getPathsFromCLOptions : Get a `dict` of data-location paths from
command line options.
"""
proxy = cls()
paths = cls.getPathsFromCLOptions(opts)
if paths["ref"] is not None:
try:
proxy.refpath = paths["ref"]
except IOTestFailed, err:
logging.debug("Reference directory does not exist: %s (this"
" is not fatal for RivetDataProxy)" % (err))
if paths["ipol"] is not None:
proxy.ipolpath = paths["ipol"]
if paths["mc"] is not None:
proxy.setMCPath(paths["mc"], "sample")
if paths["scan"] is not None:
proxy.setMCPath(paths["scan"], "linescan")
return proxy
File Metadata
Details
Attached
Mime Type
text/x-diff
Expires
Sat, Dec 21, 5:43 PM (10 h, 35 m)
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
4023684
Default Alt Text
(84 KB)
Attached To
rPROFESSORSVN professorsvn
Event Timeline
Log In to Comment