picca_continuum

Continuum fitting module.

class PiccaContinuumFitter(args)[source]

Bases: object

Picca continuum fitter class.

Fits spectra without coadding. Pipeline inverse variance preferably should be smoothed before fitting. Mean continuum and var_lss are smoothed using inverse weights and cubic spline to help numerical stability.

When fitting for var_lss, number of wavelength bins in the observed frame for variance fitting nwbins are calculated by demanding 120 A steps between bins as closely as possible by VarLSSFitter.

Contruct an instance, then call iterate() with local spectra.

Parameters:

args (argparse.Namespace) – Namespace. Wavelength values are taken to be the edges (not centers). See respective parsers for default values.

nbins

Number of bins for the mean continuum in the rest frame.

Type:

int

rfwave

Rest-frame wavelength centers for the mean continuum.

Type:

ndarray

_denom

Denominator for the slope term in the continuum model.

Type:

float

meancont_interp

Fast cubic spline object for the mean continuum.

Type:

FastCubic1DInterp

comm

MPI comm object to reduce, broadcast etc.

Type:

MPI.COMM_WORLD

mpi_rank

Rank of the MPI process.

Type:

int

meanflux_interp

Interpolator for mean flux. If fiducial is not set, this equals to 1.

Type:

FastLinear1DInterp

flux_stacker

Stacks flux. Set up with 8 A wavelength bin size.

Type:

FluxStacker

varlss_fitter

None if fiducials are set for var_lss.

Type:

VarLSSFitter or None

varlss_interp

Cubic spline for var_lss if fitting. Linear if from file.

Type:

FastLinear1DInterp or FastCubic1DInterp

eta_interp

Interpolator for eta. Returns one if fiducial var_lss is set.

Type:

FastCubic1DInterp

niterations

Number of iterations from args.num_iterations.

Type:

int

cont_order

Order of continuum polynomial from args.cont_order.

Type:

int

outdir

Directory to save catalogs. If None or empty, does not save.

Type:

str or None

fit_eta

True if fitting eta and fiducial var_lss is not set.

Type:

bool

normalize_stacked_flux

Normalizes observed flux to be 1 if True.

Type:

bool

eta_calib_ivar

Calibrate IVAR with eta estimates.

Type:

bool

model

Continuum model to use

Type:

Derived(BaseContinuumModel)

_get_fiducial_interp(fname, col2read)[source]

Return an interpolator for mean flux or var_lss.

FITS file must have a STATS extention, which must have LAMBDA, MEANFLUX and VAR_LSS columns. This is the same format as rawio output from picca, except VAR column in picca is the variance of flux not deltas. We break away from that convention by explicitly requiring variance on deltas in a new column. LAMBDA must be linearly and equally spaced. This function sets up col2read as FastLinear1DInterp object.

Parameters:
  • fname (str) – Filename of the FITS file.

  • col2read (str) – Should be MEANFLUX or VAR_LSS.

Return type:

FastLinear1DInterp

Raises:

QsonicException – If LAMBDA is not equally spaced or col2read is not in the file.

_set_meanflux_varlss_interps(args)[source]
_set_continuum_model(args)[source]
fit_continua(spectra_list)[source]

Fits all continua for a list of Spectrum objects.

Parameters:

spectra_list (list(Spectrum)) – Spectrum objects to fit.

Raises:

QsonicException – If there are no valid fits.

_project_normalize_meancont(new_meancont)[source]

Project out higher order Legendre polynomials from the new mean continuum since these are degenerate with the free fitting parameters. Returns a normalized mean continuum. Integrals are calculated using CubicSpline with ln lambda_RF as x array.

Parameters:

new_meancont (ndarray) – First estimate of the new mean continuum.

Returns:

  • new_meancont (ndarray) – Legendere polynomials projected out and normalized mean continuum.

  • mean (float) – Normalization of the mean continuum.

update_mean_cont(spectra_list)[source]

Update the global mean continuum and stacked flux.

Uses forestivar_sm in inverse variance, but must be set beforehand. Raw mean continuum estimates are smoothed with a weighted scipy.interpolate.UnivariateSpline. The mean continuum is removed from higher Legendre polynomials and normalized by the mean. This function updates meancont_interp.fp.

Parameters:

spectra_list (list(Spectrum)) – Spectrum objects to fit.

Returns:

has_converged – True if all continuum updates on every point are less than 0.33 times the error estimates.

Return type:

bool

update_var_lss_eta(spectra_list)[source]

Fit and update var_lss and eta if enabled. See VarLSSFitter for fitting details.

Parameters:

spectra_list (list(Spectrum)) – Spectrum objects to fit.

_normalize_flux(spectra_list)[source]

Multiplies continuum estimates with stacked values in order to normalize flux in the observed grid to be 1 if --normalize-stacked-flux is passed.

Parameters:

spectra_list (list(Spectrum)) – Spectrum objects.

_eta_calibate_ivar(spectra_list)[source]

Divides forest ivar values by eta estimates if --eta-calib-ivar is passed.

Parameters:

spectra_list (list(Spectrum)) – Spectrum objects.

iterate(spectra_list)[source]

Main function to fit continua and iterate.

Consists of three major steps: initializing, fitting, updating global variables. The initialization sets cont_params variable of every Spectrum object. Continuum polynomial order is carried by setting cont_params[x]. At each iteration:

  1. Global variables (mean continuum, var_lss) are saved to attributes.fits file. This ensures the order of what is used in each iteration.

  2. All spectra are fit.

  3. Mean continuum is updated by stacking, smoothing and removing degenarate modes. Check for convergence if update is small.

  4. If fitting for var_lss, fit and update by calculating variance statistics.

Parameters:

spectra_list (list(Spectrum)) – Spectrum objects to fit.

fit_final_varlss(spectra_list)[source]
save(fattr, suff='')[source]

Save mean continuum and var_lss (if fitting) to a fits file.

Parameters:
  • fattr (MPISaver) – File handler to save only on master node.

  • suff (str) – Suffix for the current iteration number.

save_contchi2_catalog(spectra_list)[source]

Save the chi2 catalog that includes information regarding chi2, mean_snr, targetid, etc. if self.outdir is set. All values are gathered to and saved on the master node.

Parameters:

spectra_list (list(Spectrum)) – Spectrum objects to fit.

_fast_weighted_vector_bincount(x, delta, var, minlength)[source]
class VarLSSFitter(w1obs, w2obs, nwbins=None, var1=0.0001, var2=20.0, nvarbins=100, nsubsamples=None, use_cov=False, comm=None)[source]

Bases: object

Variance fitter for the large-scale fluctuations.

Input wavelengths and variances are the bin edges, so centers will be shifted. Valid bins require at least 500 pixels from 50 quasars. Assumes no spectra has wave < w1obs or wave > w2obs.

Note

This class is designed to be used in a linear fashion. You create it, add statistics to it and finally fit. After fit() is called, fit() and add() cannot be called again. You may reset() and start over.

Usage:

...
varfitter = VarLSSFitter(
    wave1, wave2, nwbins,
    var1, var2, nvarbins,
    nsubsamples=100, comm=comm)
# Change static minimum numbers for valid statistics
VarLSSFitter.min_num_pix = min_num_pix
VarLSSFitter.min_num_qso = min_num_qso

for delta in deltas_list:
    varfitter.add(delta.wave, delta.delta, delta.ivar)

fit_results = np.ones((nwbins, 2))
fit_results[:, 0] = 0.1
fit_results, std_results = varfitter.fit(fit_results)

varfitter.save("variance-file.fits")

# You CANNOT call ``fit`` again!
Parameters:
  • w1obs (float) – Lower observed wavelength edge.

  • w2obs (float) – Upper observed wavelength edge.

  • nwbins (int, default: None) – Number of wavelength bins. If none, automatically calculated to yield 120 A wavelength spacing.

  • var1 (float, default: 1e-4) – Lower variance edge.

  • var2 (float, default: 20) – Upper variance edge.

  • nvarbins (int, default: 100) – Number of variance bins.

  • nsubsamples (int, default: None) – Number of subsamples for the Jackknife covariance. Auto determined by number of bins (nvarbins^2) if None.

  • use_cov (bool, default: False) – Use the Jackknife covariance when fitting.

  • comm (MPI.COMM_WORLD or None, default: None) – MPI comm object to allreduce if enabled.

waveobs

Wavelength centers in the observed frame.

Type:

ndarray

ivar_edges

Inverse variance edges.

Type:

ndarray

minlength

Minimum size of the combined bin count array. It includes underflow and overflow bins for both wavelength and variance bins.

Type:

int

wvalid_bins

Bool array slicer to get non-overflow bins of 1D arrays.

Type:

ndarray

subsampler

Subsampler object that stores mean_delta in axis=0, var_delta in axis=1 , var2_delta in axis=2, mean bin variance in axis=3.

Type:

SubsampleCov

comm

MPI comm object to allreduce if enabled.

Type:

MPI.COMM_WORLD or None, default: None

mpi_rank

Rank of the MPI process if comm!=None. Zero otherwise.

Type:

int

min_num_pix = 500

Minimum number of pixels a bin must have to be valid.

Type:

int

min_num_qso = 50

Minimum number of quasars a bin must have to be valid.

Type:

int

static variance_function(var_pipe, var_lss, eta=1)[source]

Variance model to be fit.

\[\sigma^2_\mathrm{obs} = \eta \sigma^2_\mathrm{pipe} + \sigma^2_\mathrm{LSS}\]
Parameters:
  • var_pipe (ndarray) – Pipeline variance.

  • var_lss (ndarray) – Large-scale structure variance.

  • eta (float) – Pipeline variance calibration scalar.

Returns:

Expected variance of deltas.

Return type:

ndarray

construct_interp(y0=1.0)[source]

Return a FastCubic1DInterp object that interpolates in observed wavelength bins.

Parameters:

y0 (float, default: 1.0) – Initial constant value for the interpolator.

Return type:

FastCubic1DInterp

reset()[source]

Reset delta and num arrays to zero.

add(wave, delta, ivar)[source]

Add statistics of a single spectrum. Updates delta and num arrays.

Assumes no spectra has wave < w1obs or wave > w2obs.

Parameters:
  • wave (ndarray) – Wavelength array.

  • delta (ndarray) – Delta array.

  • ivar (ndarray) – Inverse variance array.

calculate_subsampler_stats()[source]

Calculates mean, variance and error on the variance.

It also calculates the delete-one Jackknife variance over nsubsamples. The variance on var_delta is updated with the variance on the mean estimates, then it is regularized by calculated var2_delta (Gaussian estimates), where if Jackknife variance is smaller than the Gaussian estimate, it is replaced by the Gaussian estimate.

Covariance is calculated if use_cov is set to True in init. Covariance of mean_delta^2 is propagated using the formula in qsonic.mathtools.block_covariance_of_square().

_smooth_fit_results(fit_results, std_results)[source]
_fit_array_shape_assert(arr)[source]
_get_wave_bin_slice_params(iwave)[source]
fit(initial_guess, smooth=True)[source]

Syncronize all MPI processes and fit for var_lss and eta.

Second axis always contains eta values. Example:

var_lss = initial_guess[:, 0]
eta = initial_guess[:, 1]

This implemented using scipy.optimize.curve_fit() with e_var_delta or cov_var_delta as absolute errors. Domain is bounded to (0, 2). These fits are then smoothed via scipy.interpolate.UnivariateSpline using weights from curve_fit, while missing values or failed wavelength bins are extrapolated.

Parameters:
  • initial_guess (ndarray) – Initial guess for var_lss and eta. If 1D array, eta is fixed to one. If 2D, its shape must be (nwbins, 2).

  • smooth (bool, default: True) – Smooth results using UnivariateSpline.

Returns:

  • fit_results (ndarray) – Smoothed fit results at observed wavelengths where missing values are extrapolated. 1D array containing LSS variance if initial_guess is 1D. 2D containing eta values on the second axis if initial_guess is 2D ndarray.

  • std_results (ndarray) – Error on var_lss from sqrt of curve_fit output. Same behavior as fit_results.

write(mpi_saver, min_snr=0, max_snr=100)[source]

Write variance statistics to FITS file in ‘VAR_STATS’ extension.

Parameters:
  • mpi_saver (MPISaver) – MPI FITS file handler.

  • min_snr (float, default: 0) – Minimum SNR in this sample to be written into header.

  • max_snr (float, default: 100) – Maximum SNR in this sample to be written into header.

property num_pixels

ndarray: Number of pixels in bins.

property num_qso

ndarray: Number of quasars in bins.

property mean_delta

Mean delta.

Type:

ndarray

property var_delta

ndarray: Variance delta.

property e_var_delta

ndarray: Error on variance delta using delete-one Jackknife. The error of mean_delta is propagated, then regularized using var2_delta. See source code of calculate_subsampler_stats().

property cov_var_delta

ndarray: 3D array of shape (nwbins, nvarbins, nvarbins) for the covariance on variance delta using delete-one Jackknife. The covariance of mean_delta is propagated. See qsonic.mathtools.block_covariance_of_square() for details. None if use_cov is False.

property var2_delta

ndarray: Variance delta^2.

property var_centers

ndarray: Mean variance for the bin centers in descending order.

property e_var_centers

ndarray: Error on the mean variance for the bin centers.

class FluxStacker(w1obs, w2obs, dwobs, waverf, dwrf, comm=None)[source]

Bases: object

The class to stack flux values to obtain IGM mean flux and other problems.

This object can be called. Stacked flux is initialized to one. Reset before adding statistics.

Parameters:
  • w1obs (float) – Lower observed wavelength edge.

  • w2obs (float) – Upper observed wavelength edge.

  • dwobs (float) – Wavelength spacing.

  • waverf (ndarray) – Rest-frame wavelength bins.

  • dwrf (float) – Wavelength spacing in the rest frame.

  • comm (MPI.COMM_WORLD or None, default: None) – MPI comm object to allreduce if enabled.

waveobs

Wavelength centers in the observed frame.

Type:

ndarray

nwbins

Number of wavelength bins

Type:

int

dwobs

Wavelength spacing. Usually same as observed grid.

Type:

float

nwrfbins

Number of rest-frame wavelength bins

Type:

int

waverf0

The first rest-frame wavelength bin

Type:

float

dwrf

Rest-frame wavelength spacing.

Type:

float

_interp

Interpolator. Saves stacked_flux in fp and weights in ep.

Type:

FastLinear1DInterp

_interp_rf

Rest-frame interpolator. Saves stacked_flux in fp and weights in ep.

Type:

FastLinear1DInterp

comm

MPI comm object to allreduce if enabled.

Type:

MPI.COMM_WORLD or None, default: None

add(wave, wave_rf, weighted_flux, weighted_cont, weight)[source]

Add statistics of a single spectrum.

Updates stacked_flux and weights. Assumes no spectra has wave < w1obs or wave > w2obs.

Parameters:
  • wave (ndarray) – Wavelength array in the observed frame.

  • wave_rf (ndarray) – Wavelength array in the rest frame.

  • weighted_flux (ndarray) – Weighted flux array. Specifically w * f/C.

  • weighted_cont (ndarray) – Weighted continuum array.

  • weight (ndarray) – Weight array.

calculate()[source]

Calculate stacked flux by allreducing if necessary.

reset()[source]

Reset stacked_flux and weights arrays to zero.

property stacked_flux

Stacked flux.

Type:

ndarray

property weights

Weights.

Type:

ndarray

property stacked_flux_rf

Stacked flux in the rest-frame.

Type:

ndarray

property std_flux_rf

Error in the rest-frame.

Type:

ndarray

Arguments

add_picca_continuum_parser(parser=None)[source]

Adds PiccaContinuumFitter related arguments to parser. These arguments are grouped under ‘Continuum fitting options’. All of them come with defaults, none are required.

Parameters:

parser (argparse.ArgumentParser, default: None) –

Returns:

parser

Return type:

argparse.ArgumentParser

usage: foo [-h] [--num-iterations NUM_ITERATIONS]
           [--continuum-model {picca,true,input}] [--true-continuum]
           [--input-continuum-dir INPUT_CONTINUUM_DIR]
           [--fiducial-meanflux FIDUCIAL_MEANFLUX]
           [--fiducial-varlss FIDUCIAL_VARLSS] [--cont-order CONT_ORDER]
           [--var-fit-eta] [--var-use-cov] [--normalize-stacked-flux]
           [--eta-calib-ivar] [--rfdwave RFDWAVE]
           [--minimizer {iminuit,l_bfgs_b}]

Continuum fitting options

--num-iterations

Number of iterations for continuum fitting.

Default: 10

--continuum-model

Possible choices: picca, true, input

Continuum model. picca fits the continuum in the forest region. true is the true continuum analysis for mock analysis. input in the input continuum analysis for all.

Default: “picca”

--true-continuum

Alternative argument for true continuum analysis.

Default: False

--input-continuum-dir

Input continuum directory.

--fiducial-meanflux

Fiducial mean flux FITS file.

--fiducial-varlss

Fiducial var_lss FITS file.

--cont-order

Order of continuum fitting polynomial.

Default: 1

--var-fit-eta

Fit for noise calibration (eta).

Default: False

--var-use-cov

Use covariance in varlss-eta fitting.

Default: False

--normalize-stacked-flux

Force stacked flux to be one at the end.Note this does not change STACKED_FLUX in attributes. It only updates CONT of each delta.

Default: False

--eta-calib-ivar

Calibrate IVAR with eta estimates.

Default: False

--rfdwave

Rest-frame wave steps. Complies with forest limits

Default: 0.8

--minimizer

Possible choices: iminuit, l_bfgs_b

Minimizer to fit the continuum.

Default: “iminuit”