API Reference

This section contains the complete API documentation for PySIPS.

Main Components

The primary interface to PySIPS is through the PysipsRegressor class, which provides a scikit-learn compatible API for Bayesian symbolic regression.

PysipsRegressor

class pysips.PysipsRegressor(operators=None, max_complexity=24, terminal_probability=0.1, constant_probability=None, command_probability=0.2, node_probability=0.2, parameter_probability=0.2, prune_probability=0.2, fork_probability=0.2, repeat_mutation_probability=0.05, crossover_pool_size=None, mutation_prob=0.75, crossover_prob=0.25, exclusive=True, num_particles=50, num_mcmc_samples=5, target_ess=0.8, param_init_bounds=None, opt_restarts=1, model_selection='mode', checkpoint_file=None, random_state=None, max_time=None, max_equation_evals=None, show_progress_bar=True)[source]

Bases: BaseEstimator, RegressorMixin

A scikit-learn compatible wrapper for PySIPS symbolic regression.

This regressor uses Sequential Monte Carlo (SMC) sampling to explore the space of symbolic expressions and find mathematical models that best explain the observed data. The approach provides principled uncertainty quantification and supports checkpointing for long-running fits.

Parameters:
  • operators (list, default=['+', '*']) – List of operators to use in symbolic expressions.

  • max_complexity (int, default=24) – Maximum complexity of symbolic expressions.

  • terminal_probability (float, default=0.1) – Probability of selecting a terminal during expression generation.

  • constant_probability (float or None, default=None) – Probability of selecting a constant terminal. If None, will be set to 1/(x_dim + 1).

  • command_probability (float, default=0.2) – Probability of command mutation.

  • node_probability (float, default=0.2) – Probability of node mutation.

  • parameter_probability (float, default=0.2) – Probability of parameter mutation.

  • prune_probability (float, default=0.2) – Probability of pruning mutation.

  • fork_probability (float, default=0.2) – Probability of fork mutation.

  • repeat_mutation_probability (float, default=0.05) – Probability of repeating a mutation.

  • crossover_pool_size (int, default=num_particles) – Size of the crossover pool.

  • mutation_prob (float, default=0.75) – Probability of mutation (vs crossover).

  • crossover_prob (float, default=0.25) – Probability of crossover (vs mutation).

  • exclusive (bool, default=True) – Whether mutation and crossover are exclusive.

  • num_particles (int, default=50) – Number of particles for sampling.

  • num_mcmc_samples (int, default=5) – Number of MCMC samples.

  • target_ess (float, default=0.8) – Target effective sample size.

  • param_init_bounds (list, default=[-5, 5]) – Bounds for parameter initialization.

  • opt_restarts (int, default=1) – Number of optimization restarts.

  • model_selection (str, default="mode") – The way to choose a best model from the produced distribution of models. Current options are “mode” for the most frequently occurring model and “max_nml” for the model with maximum normalized marginal likelihood.

  • checkpoint_file (str or None, default=None) – File path for saving and loading sampling progress. If the checkpoint file exists, fitting will attempt to resume from the saved state and continue updating the checkpoint as sampling proceeds. If None, no checkpointing is performed.

  • random_state (int or None, default=None) – Random seed for reproducibility.

  • max_time (float or None, default=None) – Maximum time in seconds to run the sampling process. If None, the sampling will run until completion without time constraints. Cannot be used together with max_equation_evals.

  • max_equation_evals (int or None, default=None) – Maximum number of equation evaluations during the sampling process. If None, the sampling will run until completion without an evaluation limit. Cannot be used together with max_time.

  • show_progress_bar (bool, default=True) – Whether to display a progress bar during fitting. When False, the progress bar will be hidden, which is useful for hyperparameter tuning or when running multiple fits in parallel.
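The two model_selection strategies can be illustrated with a toy stand-in in which models are plain strings and their scores are floats (hypothetical values; PySIPS selects among sampled AGraph models internally):

```python
from collections import Counter

def select_model(models, nmlls, strategy="mode"):
    """Toy sketch of the two documented strategies.

    "mode"    -> most frequently occurring model in the sampled population
    "max_nml" -> model with the maximum normalized marginal likelihood
    """
    if strategy == "mode":
        return Counter(models).most_common(1)[0][0]
    if strategy == "max_nml":
        return max(zip(models, nmlls), key=lambda pair: pair[1])[0]
    raise ValueError(f"unknown strategy: {strategy}")

models = ["x0 + c", "x0 + c", "x0 * x1", "x0 + c", "x0 * x1"]
nmlls = [-10.2, -10.2, -7.5, -10.2, -7.5]

print(select_model(models, nmlls, "mode"))     # "x0 + c" occurs most often
print(select_model(models, nmlls, "max_nml"))  # "x0 * x1" has the highest score
```

Note that the two strategies can disagree, as above: the most frequently sampled model need not be the one with the best marginal likelihood.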

__init__(operators=None, max_complexity=24, terminal_probability=0.1, constant_probability=None, command_probability=0.2, node_probability=0.2, parameter_probability=0.2, prune_probability=0.2, fork_probability=0.2, repeat_mutation_probability=0.05, crossover_pool_size=None, mutation_prob=0.75, crossover_prob=0.25, exclusive=True, num_particles=50, num_mcmc_samples=5, target_ess=0.8, param_init_bounds=None, opt_restarts=1, model_selection='mode', checkpoint_file=None, random_state=None, max_time=None, max_equation_evals=None, show_progress_bar=True)[source]
fit(X, y)[source]

Fit the symbolic regression model to training data.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Training input samples.

  • y (array-like of shape (n_samples,)) – Target values.

Returns:

self – Returns self.

Return type:

object

predict(X)[source]

Predict using the best symbolic regression model.

Parameters:

X (array-like of shape (n_samples, n_features)) – Samples to predict.

Returns:

y_pred – Returns predicted values.

Return type:

array-like of shape (n_samples,)

score(X, y, sample_weight=None)[source]

Return the coefficient of determination R^2 of the prediction.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – R^2 of self.predict(X) with respect to y.

Return type:

float
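The returned R^2 follows the standard coefficient-of-determination definition; a minimal plain-Python sketch of the weighted formula used by scikit-learn regressors is:

```python
def r2_score(y_true, y_pred, sample_weight=None):
    # R^2 = 1 - SS_res / SS_tot, optionally with per-sample weights,
    # mirroring the definition used by scikit-learn's RegressorMixin.score.
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    w_sum = sum(sample_weight)
    y_mean = sum(w * y for w, y in zip(sample_weight, y_true)) / w_sum
    ss_res = sum(w * (y - p) ** 2
                 for w, y, p in zip(sample_weight, y_true, y_pred))
    ss_tot = sum(w * (y - y_mean) ** 2
                 for w, y in zip(sample_weight, y_true))
    return 1.0 - ss_res / ss_tot

print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0 for perfect predictions
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0 for predicting the mean
```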

get_expression()[source]

Get the symbolic expression of the best model.

Returns:

expression – String representation of the best model.

Return type:

str

get_models()[source]

Get all sampled models and their likelihoods.

Returns:

  • models (list) – List of all sampled models.

  • likelihoods (numpy.ndarray) – Array of corresponding likelihoods.

classmethod __init_subclass__(**kwargs)

Set the set_{method}_request methods.

This uses PEP-487 [1] to set the set_{method}_request methods. It looks for the information available in the set default values which are set using __metadata_request__* class attributes, or inferred from method signatures.

The __metadata_request__* class attributes are used when a method does not explicitly accept a metadata through its arguments or if the developer would like to specify a request value for those metadata which are different from the default None.

References

[1] PEP 487 – Simpler customisation of class creation (https://peps.python.org/pep-0487/)

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance
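The <component>__<parameter> convention can be sketched with hypothetical stand-in objects (not real estimators): keys containing "__" are routed one level down to the named sub-object.

```python
def set_nested_params(obj, **params):
    # Sketch of the <component>__<parameter> convention: a key like
    # "inner__alpha" is split at the first "__" and the remainder is
    # set on the sub-object; plain keys are set directly on obj.
    for key, value in params.items():
        if "__" in key:
            component, _, sub_key = key.partition("__")
            setattr(getattr(obj, component), sub_key, value)
        else:
            setattr(obj, key, value)
    return obj

class Inner:  # hypothetical stand-ins, not sklearn estimators
    alpha = 0.0

class Outer:
    inner = Inner()
    depth = 1

o = set_nested_params(Outer(), depth=3, inner__alpha=0.5)
print(o.depth, o.inner.alpha)  # 3 0.5
```

This sketch handles one level of nesting; scikit-learn's real implementation recurses so that arbitrarily deep components (e.g. inside a Pipeline) can be addressed.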

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PysipsRegressor

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

Core Modules

These modules provide the underlying implementation of the symbolic regression algorithm.

Sampler

Sequential Monte Carlo (SMC) Sampling with Custom Prior and MCMC Kernel.

This module provides high-level functions for performing Sequential Monte Carlo sampling using custom prior distributions and Metropolis-Hastings MCMC kernels. It integrates with the smcpy library to provide adaptive sampling capabilities with unique value generation and optional checkpointing support.

The module is designed for scenarios where you need to sample from a parameter space using a custom generator function while ensuring uniqueness of samples and applying likelihood-based filtering. Checkpointing allows for resuming interrupted sampling runs and provides fault tolerance for long-running computations.

Example

>>> def my_likelihood(x):
...     return np.exp(-0.5 * x**2)  # Gaussian-like likelihood
>>>
>>> def my_proposal(x):
...     return x + np.random.normal(0, 0.1)  # Random walk proposal
>>>
>>> def my_generator():
...     return np.random.uniform(-5, 5)  # Uniform parameter generator
>>>
>>> # Basic usage without checkpointing
>>> models, likelihoods = sample(my_likelihood, my_proposal, my_generator)
>>> print(f"Found {len(models)} models with likelihoods")
>>>
>>> # Usage with checkpointing
>>> models, likelihoods = sample(my_likelihood, my_proposal, my_generator,
...                              checkpoint_file="my_sampling.pkl")
>>> print(f"Checkpointed run completed with {len(models)} models")

Notes

This module uses the following workflow:

  1. Creates a custom Prior that generates unique models

  2. Sets up a Metropolis-Hastings MCMC kernel

  3. Optionally enables checkpointing for progress persistence

  4. Runs adaptive SMC sampling

  5. Returns the final population of models and their likelihoods

The covariance calculation is disabled in the mutator as a workaround for object-based parameters that may not support standard covariance computation.

Checkpointing automatically saves sampling progress and can resume from the last saved state if the checkpoint file exists when sampling begins.

pysips.sampler.sample(likelihood, proposal, generator, max_time=None, max_equation_evals=None, multiprocess=False, kwargs=None, seed=None, checkpoint_file=None, show_progress_bar=True)[source]

Perform Sequential Monte Carlo sampling with default parameters.

This is a high-level convenience function that sets up and runs SMC sampling with commonly used default parameters. For more control over the sampling process, use run_smc directly.

Parameters:
  • likelihood (callable) – Function that computes the likelihood of a given parameter value. Should accept a single parameter and return a scalar likelihood value.

  • proposal (callable) – Function that proposes new parameter values given a current value. Used in the Metropolis-Hastings MCMC steps.

  • generator (callable) – Function that generates initial parameter values when called with no arguments. Should return hashable values for uniqueness tracking.

  • max_time (float, optional) – Maximum compute time limit for the sampling, in seconds (default: no time limit).

  • max_equation_evals (int, optional) – Maximum number of equation evaluations allowed during sampling (default: no limit).

  • multiprocess (bool, optional) – Whether to use multiprocessing for likelihood evaluations (default: False).

  • kwargs (dict, optional) – Additional keyword arguments to override default SMC parameters. Default parameters are {“num_particles”: 5000, “num_mcmc_samples”: 10}.

  • seed (int, optional) – Random seed for reproducible results (default: None).

  • checkpoint_file (str, optional) – File path for saving and loading sampling progress. If the checkpoint file exists, sampling will attempt to resume from the saved state and continue updating the checkpoint as it proceeds. If None, no checkpointing is performed (default: None).

  • show_progress_bar (bool, optional) – Whether to display a progress bar during sampling. When False, the progress bar will be hidden, which is useful for hyperparameter tuning or when running multiple fits in parallel (default: True).

Returns:

  • models (list) – List of parameter values from the final SMC population.

  • likelihoods (list) – List of likelihood values corresponding to each model in the final population.

  • phis (list) – List of phi values (tempering parameters) from the SMC sequence.

Examples

>>> def likelihood_func(x):
...     return np.exp(-0.5 * (x - 2)**2)
>>>
>>> def proposal_func(x):
...     return x + np.random.normal(0, 0.5)
>>>
>>> def generator_func():
...     return np.random.uniform(-10, 10)
>>>
>>> # Basic sampling without checkpointing
>>> models, likes, phis = sample(likelihood_func, proposal_func, generator_func)
>>> print(f"Sampled {len(models)} models")
>>>
>>> # Sampling with checkpointing
>>> models, likes, phis = sample(likelihood_func, proposal_func, generator_func,
...                              checkpoint_file="progress.pkl")
>>> print(f"Checkpointed sampling completed")
>>>
>>> # Sampling with max equation evaluations limit
>>> models, likes, phis = sample(likelihood_func, proposal_func, generator_func,
...                              max_equation_evals=10000)
>>> print(f"Sampling completed with evaluation limit")

Notes

This function internally calls run_smc with default parameters. The default configuration uses 5000 particles and 10 MCMC samples per SMC step, which provides a reasonable balance between accuracy and computational cost for many applications.

When checkpointing is enabled:

  • If the checkpoint file exists, sampling resumes from the saved state

  • Progress is automatically saved during the sampling process

  • The checkpoint file uses pickle format for serialization

  • Interrupted runs can be restarted from the last checkpoint

When both max_equation_evals and max_time are specified, max_time takes precedence over max_equation_evals.

pysips.sampler.run_smc(likelihood, proposal, generator, max_time, max_equation_evals, multiprocess, kwargs, rng, checkpoint_file, show_progress_bar)[source]

Execute Sequential Monte Carlo sampling with full parameter control.

This function implements the core SMC sampling algorithm using a custom prior distribution and Metropolis-Hastings MCMC kernel. It provides complete control over all sampling parameters and optional checkpointing.

Parameters:
  • likelihood (callable) – Function that computes the likelihood of a given parameter value.

  • proposal (callable) – Function that proposes new parameter values in MCMC steps.

  • generator (callable) – Function that generates unique initial parameter values.

  • max_time (float, None) – Maximum compute time limit for the sampling, in seconds. None value indicates no time limit.

  • max_equation_evals (int, None) – Maximum number of equation evaluations allowed during sampling. None value indicates no evaluation limit.

  • multiprocess (bool) – Whether to enable multiprocessing for likelihood evaluations.

  • kwargs (dict) – Keyword arguments for the SMC sampler (e.g., num_particles, num_mcmc_samples).

  • rng (numpy.random.Generator) – Random number generator instance for reproducible sampling.

  • checkpoint_file (str, None) – File path for checkpointing. If None, no checkpointing is performed. If provided, sampling progress will be saved to this file and can be resumed if the file exists from a previous run.

  • show_progress_bar (bool) – Whether to display a progress bar during sampling. When False, progress updates are suppressed.

Returns:

  • models (list) – Parameter values from the final SMC population, converted to list format.

  • likelihoods (list) – Likelihood values for each model in the final population, computed fresh to ensure consistency.

  • phis (list) – Phi values (tempering parameters) from the SMC sequence.

Notes

The checkpointing mechanism uses SMCPy’s PickleStorage context manager:

  • Automatically detects existing checkpoint files and resumes

  • Saves progress incrementally during sampling

  • Uses append mode (‘a’) by default for safe restarts

  • Handles serialization of the complete sampler state

Sampling strategy selection logic:

  • If max_time is specified, FixedTimeSampler is used

  • If max_equation_evals is specified (and max_time is not), MaxStepSampler is used

  • If neither is specified, AdaptiveSampler is used
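The selection logic can be sketched as a small dispatch function (the sampler class names are taken from the notes above; this is an illustration, not the module’s actual code):

```python
def choose_sampler(max_time=None, max_equation_evals=None):
    # Mirrors the documented precedence: a time limit selects the
    # fixed-time strategy, an evaluation limit selects the max-step
    # strategy, and with neither limit the adaptive sampler is used.
    if max_time is not None:
        return "FixedTimeSampler"
    if max_equation_evals is not None:
        return "MaxStepSampler"
    return "AdaptiveSampler"

print(choose_sampler(max_time=60.0))              # FixedTimeSampler
print(choose_sampler(max_equation_evals=10_000))  # MaxStepSampler
print(choose_sampler())                           # AdaptiveSampler
```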

Prior

Custom Prior Distribution for Unique Random Value Generation.

This module provides a specialized prior distribution class that extends the ImproperUniform prior from smcpy to generate unique random values using a custom generator function. It is designed to prevent duplicate values in sampling scenarios where uniqueness is required.

Constants

MAX_REPEATS : int

Maximum number of consecutive attempts allowed before warning about potential generator issues (default: 100).

Example

>>> def my_generator():
...     return np.random.randint(0, 1000)
>>>
>>> prior = Prior(my_generator)
>>> samples = prior.rvs(10)  # Generate 10 unique samples
>>> print(samples.shape)
(10, 1)

class pysips.prior.Prior(generator)[source]

Bases: ImproperUniform

A class that extends ImproperUniform to generate unique random values.

This prior uses a custom generator function to produce unique random values and warns if the generator repeatedly produces duplicates.

Parameters:

generator (callable) – A function that generates random values when called with no arguments. This generator should return a hashable type.

Notes

This class tracks duplicate values and warns if the generator fails to produce a unique value after a set number of consecutive attempts.
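The uniqueness-tracking behavior can be sketched without smcpy; this standalone function mirrors the documented loop, warning after MAX_REPEATS consecutive duplicate draws:

```python
import random
import warnings

MAX_REPEATS = 100  # consecutive duplicate draws before warning

def unique_rvs(generator, n):
    # Draw until n distinct hashable values have been produced,
    # warning if the generator keeps returning values already seen.
    seen, out, repeats = set(), [], 0
    while len(out) < n:
        value = generator()
        if value in seen:
            repeats += 1
            if repeats >= MAX_REPEATS:
                warnings.warn("generator may be unable to produce unique values")
                repeats = 0
        else:
            seen.add(value)
            out.append(value)
            repeats = 0
    return [[v] for v in out]  # shape (N, 1), matching Prior.rvs

rng = random.Random(0)
samples = unique_rvs(lambda: rng.randint(0, 1000), 10)
print(len(samples), len(samples[0]))  # 10 1
```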

__init__(generator)[source]
rvs(N, random_state=None)[source]

Generate N unique random values using the generator.

Parameters:

N (int) – Number of unique random values to generate.

Returns:

Array of shape (N, 1) containing unique values generated by the generator.

Return type:

ndarray

Warns:

UserWarning – If the generator fails to produce a new unique value after MAX_REPEATS consecutive attempts.

Notes

The random_state parameter is included for compatibility with scipy.stats distributions but is not actually used by this method.

Metropolis

Metropolis-Hastings MCMC Implementation for Symbolic Regression.

This module provides a specialized Metropolis-Hastings Markov Chain Monte Carlo (MCMC) sampler designed for symbolic regression models. It extends the smcpy VectorMCMC class to handle symbolic expressions (bingo AGraph objects) as parameters, with custom proposal mechanisms and likelihood evaluation for equation discovery.

The implementation supports both single-process and multiprocess likelihood evaluation, making it suitable for computationally intensive symbolic regression tasks where model evaluation is the computational bottleneck.

Algorithm Overview

The Metropolis algorithm follows the standard accept/reject framework:

  1. Proposal Generation: Uses a provided proposal function to generate new symbolic expressions from current ones

  2. Likelihood Evaluation: Computes log-likelihood for proposed expressions using the provided likelihood function

  3. Accept/Reject Decision: Accepts or rejects proposals based on the Metropolis criterion comparing likelihoods

  4. Chain Evolution: Iteratively builds a Markov chain of symbolic expressions that converges to the target distribution

The key adaptation for symbolic regression is handling discrete, structured parameter spaces (symbolic expressions) rather than continuous parameters.
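The accept/reject framework can be sketched on a toy discrete space, with integers standing in for symbolic expressions and a symmetric random-walk proposal (as the Metropolis criterion assumes):

```python
import math
import random

def metropolis_chain(log_likelihood, proposal, start, num_samples, rng):
    # Standard Metropolis accept/reject with a symmetric proposal:
    # accept with probability min(1, exp(logL(new) - logL(current))).
    chain = [start]
    current, current_ll = start, log_likelihood(start)
    for _ in range(num_samples):
        candidate = proposal(current)
        candidate_ll = log_likelihood(candidate)
        if math.log(rng.random()) < candidate_ll - current_ll:
            current, current_ll = candidate, candidate_ll
        chain.append(current)
    return chain

rng = random.Random(42)
# Toy target: integers near 0 are most likely.
chain = metropolis_chain(
    log_likelihood=lambda k: -0.5 * k * k,
    proposal=lambda k: k + rng.choice([-1, 1]),  # symmetric random walk
    start=10,
    num_samples=500,
    rng=rng,
)
print(len(chain))  # 501 states, including the starting point
```

In pysips the states are AGraph expressions and the proposal is a mutation or crossover operator, but the accept/reject logic is the same.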

Example Integration

>>> from bingo.symbolic_regression import AGraph
>>>
>>> def likelihood_func(model):
...     # Evaluate model on data and return log-likelihood
...     return model.evaluate_fitness_vector(X, y)
>>>
>>> def proposal_func(model):
...     # Generate new model via mutation
...     return mutate(model)
>>>
>>> mcmc = Metropolis(
...     likelihood=likelihood_func,
...     proposal=proposal_func,
...     prior=uniform_prior,
...     multiprocess=True
... )

Implementation Notes

  • Uniform priors are assumed (evaluate_log_priors returns ones)

  • Proposal updates are called after each sampling round to maintain an adaptive gene pool for crossover operations

  • Fitness values are cached on AGraph objects to avoid redundant computation

  • The implementation handles vectorized operations for efficiency

class pysips.metropolis.Metropolis(likelihood, proposal, prior, multiprocess=False)[source]

Bases: VectorMCMC

Class for running basic MCMC with the Metropolis algorithm.

Parameters:
  • likelihood (callable) – Computes marginal log likelihood given a bingo AGraph

  • proposal (callable) – Proposes a new AGraph conditioned on an existing AGraph; must be symmetric.

__init__(likelihood, proposal, prior, multiprocess=False)[source]
Parameters:
  • likelihood (callable) – Computes marginal log likelihood given a bingo AGraph

  • proposal (callable) – Proposes a new AGraph conditioned on an existing AGraph; must be symmetric.

  • prior (object) – Prior distribution over models, e.g. pysips.prior.Prior

  • multiprocess (bool, optional) – Whether to use multiprocessing for likelihood evaluations, by default False

smc_metropolis(inputs, num_samples, cov=None)[source]
Parameters:
  • inputs (AGraph) – model at which the Markov chain initiates

  • num_samples (int) – number of samples in the chain; includes burnin

evaluate_model(_=None)[source]
evaluate_log_priors(inputs)[source]
evaluate_log_likelihood(inputs)[source]

Laplace NMLL

Laplace Approximation for Normalized Marginal Log-Likelihood Estimation.

This module provides functionality for computing the Normalized Marginal Log-Likelihood (NMLL) using the Laplace approximation method. It integrates with the bingo symbolic regression library to evaluate the likelihood of symbolic mathematical models given observed data.

The Laplace approximation is a method for approximating integrals that appear in Bayesian model selection, particularly useful for comparing different symbolic regression models. It approximates the marginal likelihood by making a Gaussian approximation around the maximum a posteriori (MAP) estimate of the parameters.
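The approximation can be sketched on a conjugate toy problem where it happens to be exact (Gaussian likelihood with known unit variance and a standard normal prior; the functions here are illustrative, not the pysips implementation):

```python
import math

def laplace_log_evidence(y):
    # Laplace approximation to log p(y) for the toy model
    #   y_i ~ N(theta, 1),  theta ~ N(0, 1):
    #   log Z ≈ log p(y|th_map) + log p(th_map)
    #           + 0.5*log(2*pi) - 0.5*log(h)
    # with h the curvature of the negative log posterior at the MAP.
    n, s = len(y), sum(y)
    th_map = s / (n + 1)  # MAP estimate
    log_lik = (-0.5 * n * math.log(2 * math.pi)
               - 0.5 * sum((yi - th_map) ** 2 for yi in y))
    log_prior = -0.5 * math.log(2 * math.pi) - 0.5 * th_map ** 2
    h = n + 1  # curvature at the MAP
    return log_lik + log_prior + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(h)

def exact_log_evidence(y):
    # Closed-form marginal likelihood for the same conjugate model.
    n, s = len(y), sum(y)
    quad = sum(yi ** 2 for yi in y) - s ** 2 / (n + 1)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1) - 0.5 * quad

y = [0.3, -0.1, 0.7, 0.2]
print(abs(laplace_log_evidence(y) - exact_log_evidence(y)) < 1e-12)  # True
```

For a Gaussian posterior the approximation is exact, which is why the two values agree; for the non-Gaussian posteriors arising in symbolic regression it is an approximation.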

Key Features

  • Integration with bingo’s symbolic regression framework

  • Multiple optimization restarts to avoid local minima

  • Configurable scipy-based optimization backend

  • Automatic parameter bound initialization for robust optimization

Usage Example

>>> import numpy as np
>>> from bingo.symbolic_regression import AGraph
>>>
>>> # Generate sample data
>>> X = np.random.randn(100, 2)
>>> y = X[:, 0]**2 + X[:, 1] + np.random.normal(0, 0.1, 100)
>>>
>>> # Create NMLL evaluator
>>> nmll_evaluator = LaplaceNmll(X, y, opt_restarts=3)
>>>
>>> # Evaluate a symbolic model (assuming you have an AGraph model)
>>> # nmll_score = nmll_evaluator(model)

Notes

The multiple restart strategy helps ensure robust optimization by avoiding local minima in the parameter space, which is especially important for complex symbolic expressions.
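The restart strategy can be sketched with a crude random local search over a multimodal objective (illustrative only; the actual implementation uses bingo’s scipy-based optimizer):

```python
import random

def multistart_minimize(objective, n_restarts, rng, steps=200):
    # Run a simple local search from several random initializations
    # and keep the best result, as in the multiple-restart strategy.
    best_x, best_f = None, float("inf")
    for _ in range(n_restarts):
        x = rng.uniform(-10.0, 10.0)  # random initialization
        fx = objective(x)
        for _ in range(steps):        # crude local descent
            cand = x + rng.gauss(0.0, 0.1)
            fc = objective(cand)
            if fc < fx:               # accept only improvements
                x, fx = cand, fc
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# Multimodal objective: local minimum near x = -4, global near x = 3.
objective = lambda x: min((x + 4) ** 2 + 1.0, (x - 3) ** 2)
x_best, f_best = multistart_minimize(objective, n_restarts=5, rng=random.Random(0))
print(round(f_best, 2))
```

A single start can get trapped in the basin of the poorer minimum; the best-of-restarts result is far more likely to land near the global one.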

class pysips.laplace_nmll.LaplaceNmll(X, y, opt_restarts=1, **optimizer_kwargs)[source]

Bases: object

Normalized Marginal Likelihood using Laplace approximation

Parameters:
  • X (2d Numpy Array) – Array of shape [num_datapoints, num_features] representing the input features

  • y (1d Numpy Array) – Array of labels of shape [num_datapoints]

  • opt_restarts (int, optional) – number of times to perform gradient based optimization, each with different random initialization, by default 1

  • **optimizer_kwargs – any keyword arguments to be passed to bingo’s scipy optimizer

__init__(X, y, opt_restarts=1, **optimizer_kwargs)[source]
__call__(model)[source]

Calculates NMLL using the Laplace approximation.

Parameters:

model (AGraph) – a bingo equation using the AGraph representation

Proposal Mechanisms

Mutation Proposal

Mutation-Based Proposal Generator for Symbolic Regression Models.

This module provides a proposal mechanism for symbolic regression that uses bingo’s AGraph mutation operations to generate new candidate models from existing ones. It is designed to work within Markov Chain Monte Carlo (MCMC) sampling frameworks where new model proposals are needed at each step.

The module implements a configurable mutation strategy that can perform various types of structural changes to symbolic mathematical expressions, including adding/removing nodes, changing operations, modifying parameters, and pruning or expanding expression trees.

Key Features

  • Multiple mutation types: command, node, parameter, prune, and fork mutations

  • Configurable probabilities for each mutation type

  • Repeat mutation capability for more dramatic changes

  • Ensures non-identical proposals (prevents proposing the same model)

  • Seeded random number generation for reproducible results

  • Integration with bingo’s ComponentGenerator for operator management

Mutation Types

Command Mutation

Changes the operation at a node (e.g., ‘+’ to ‘*’)

Node Mutation

Replaces a node with a new randomly generated subtree

Parameter Mutation

Modifies the numeric constants in the expression

Prune Mutation

Removes a portion of the expression tree

Fork Mutation

Adds a new branch to the expression tree

Repeat Mutation

Recursively applies additional mutations with specified probability

Usage Example

>>> # Create a mutation proposal generator
>>> proposal = MutationProposal(
...     X_dim=3,  # 3 input features
...     operators=["+", "subtract", "multiply", "divide"],
...     terminal_probability=0.2,
...     command_probability=0.3,
...     node_probability=0.2,
...     seed=42
... )
>>>
>>> # Use in MCMC sampling (assuming you have a model)
>>> # new_model = proposal(current_model)

Notes

The proposal generator ensures that new proposals are always different from the input model by repeatedly applying mutations until a change occurs. This prevents MCMC chains from getting stuck with identical consecutive states.
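The retry-until-different behavior can be sketched with a toy mutation over tuples of operator symbols (hypothetical stand-ins for AGraph models):

```python
import random

def propose_different(model, mutate, max_tries=1000):
    # Re-apply mutation until the proposal differs from the input model,
    # mirroring the documented guarantee that a proposal is never
    # identical to the current state.
    for _ in range(max_tries):
        candidate = mutate(model)
        if candidate != model:
            return candidate
    raise RuntimeError("mutation never produced a distinct model")

rng = random.Random(7)

# Toy "model": a tuple of operator symbols; the mutation rewrites one
# slot and may, by chance, leave the model unchanged.
def mutate(model):
    i = rng.randrange(len(model))
    ops = ("+", "*", "-")
    return model[:i] + (rng.choice(ops),) + model[i + 1:]

new = propose_different(("+", "+", "+"), mutate)
print(new != ("+", "+", "+"))  # True
```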

The update() method is provided for compatibility with adaptive MCMC frameworks but currently performs no operations, as the mutation probabilities are fixed at initialization.

class pysips.mutation_proposal.MutationProposal(x_dim, operators, terminal_probability=0.1, constant_probability=None, command_probability=0.2, node_probability=0.2, parameter_probability=0.2, prune_probability=0.2, fork_probability=0.2, repeat_mutation_probability=0.0, seed=None)[source]

Bases: object

Proposal functor that performs bingo’s AGraph mutation

Parameters:
  • x_dim (int) – dimension of input data (number of features in dataset)

  • operators (list of str) – list of equation primitives to allow, e.g. [“+”, “subtraction”, “pow”]

  • terminal_probability (float, optional) – [0.0-1.0] probability that a new node will be a terminal, by default 0.1

  • constant_probability (float, optional) – [0.0-1.0] probability that a new terminal will be a constant, by default weighted the same as a single feature of the input data

  • command_probability (float, optional) – probability of command mutation, by default 0.2

  • node_probability (float, optional) – probability of node mutation, by default 0.2

  • parameter_probability (float, optional) – probability of parameter mutation, by default 0.2

  • prune_probability (float, optional) – probability of pruning (removing a portion of the equation), by default 0.2

  • fork_probability (float, optional) – probability of forking (adding an additional branch to the equation), by default 0.2

  • repeat_mutation_probability (float, optional) – probability of a repeated mutation (applied recursively). default 0.0

  • seed (int, optional) – random seed used to control repeatability

__init__(x_dim, operators, terminal_probability=0.1, constant_probability=None, command_probability=0.2, node_probability=0.2, parameter_probability=0.2, prune_probability=0.2, fork_probability=0.2, repeat_mutation_probability=0.0, seed=None)[source]
__call__(model)[source]

Apply mutation to generate a new symbolic expression model.

This method takes a symbolic regression model (AGraph) as input and returns a new model created by applying one or more mutation operations. The method guarantees that the returned model is different from the input model by repeating mutations if necessary.

Parameters:

model (AGraph) – The input symbolic regression model to be mutated. This should be a bingo AGraph instance representing a mathematical expression.

Returns:

AGraph – A new symbolic regression model created by applying mutation(s) to the input model. Guaranteed to be different from the input model.

Mutation Process

  1. Initial Mutation: Applies the configured mutation operation to the model

  2. Repeat Mutations: May apply additional mutations based on repeat_mutation_probability

  3. Difference Check: Ensures the new model differs from the original one

  4. Repeated Attempts: If the mutation produces an identical model, tries again

Notes

  • The mutation type applied is selected probabilistically based on the probabilities specified during initialization (command_probability, node_probability, etc.)

  • The repeat mutation feature allows for more dramatic changes by applying multiple mutations in sequence with probability repeat_mutation_probability

  • This method will always return a different model, never the same as the input

See also

AGraphMutation

Bingo’s mutation implementation used internally

update(*args, **kwargs)[source]

Update method for compatibility with adaptive MCMC frameworks.

This method is provided to maintain API compatibility with other proposal mechanisms that support adaptive behavior. In the current implementation, the method is a no-op as the mutation proposal does not adapt its behavior based on sampling history.

Parameters:
  • *args (tuple) – Positional arguments (not used in the current implementation).

  • **kwargs (dict) – Keyword arguments (not used in the current implementation).

Returns:

This method does not return any value.

Return type:

None

Notes

Future versions might implement adaptive behavior such as:

  • Adjusting mutation probabilities based on acceptance rates

  • Learning which mutation types are more effective for a given problem

In composite proposal mechanisms that combine multiple proposal types (such as RandomChoiceProposal), this method will be called as part of the update process, but currently has no effect on this proposal.

Crossover Proposal

Crossover-Based Proposal Generator for Symbolic Regression Models.

This module provides a crossover-based proposal mechanism for symbolic regression that creates new candidate models by combining genetic material from existing models. It implements genetic programming crossover operations using bingo’s AGraphCrossover functionality within an MCMC or evolutionary algorithm framework.

The crossover operation mimics biological reproduction by exchanging subtrees between two parent expressions to create offspring that inherit characteristics from both parents. This approach can effectively explore the space of symbolic expressions by combining successful components from different models.

Key Features

  • Random partner selection from a configurable gene pool

  • Stochastic child selection (50/50 probability between two crossover offspring)

  • Avoids self-crossover by ensuring different parent selection

  • Updateable gene pool for adaptive sampling strategies

  • Seeded random number generation for reproducible results

Crossover Mechanism

The crossover operation works by:

  1. Selecting a random crossover point in each parent expression tree

  2. Swapping the subtrees at those points between the two parents

  3. Producing two offspring that combine features from both parents

  4. Randomly selecting one of the two offspring as the proposal

This process allows successful expression fragments to be preserved and recombined in novel ways, potentially discovering better solutions through the exploration of hybrid models.

Usage Example

>>> # Assume you have a collection of symbolic models
>>> gene_pool = [model1, model2, model3, model4]  # List of AGraph models
>>>
>>> # Create crossover proposal generator
>>> crossover = CrossoverProposal(gene_pool, seed=42)
>>>
>>> # Use in MCMC or evolutionary sampling
>>> current_model = model1
>>> new_proposal = crossover(current_model)
>>>
>>> # Update gene pool as better models are discovered
>>> updated_pool = [best_model1, best_model2, new_good_model]
>>> crossover.update(updated_pool)

Integration Notes

The update() method allows for dynamic gene pool management, enabling adaptive strategies where successful models from the sampling process can be added to influence future proposals.

class pysips.crossover_proposal.CrossoverProposal(gene_pool, seed=None)[source]

Bases: object

A proposal operator that performs crossover between AGraph models.

This class implements a callable object that creates new models by performing crossover operations between an input model and randomly selected partners from a gene pool. It utilizes bingo’s AGraphCrossover mechanism and randomly selects one of the two children produced by each crossover operation.

Parameters:
  • gene_pool (list of AGraph) – A collection of AGraph models that will be used as potential partners during crossover operations

  • seed (int, optional) – Random seed for the internal random number generator, used to control repeatability of operations

__init__(gene_pool, seed=None)[source]
__call__(model)[source]

Perform crossover between the input model and a randomly selected one from the gene pool.

This method randomly selects a parent from the gene pool, performs crossover between the input model and the selected parent, and returns one of the two resulting children with equal probability.

Parameters:

model (AGraph) – The model to be used as the first parent in the crossover operation

Returns:

A new model resulting from crossover between the input model and a randomly selected model from the gene pool

Return type:

AGraph
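The partner-selection and child-selection behavior described above can be sketched like this. This is a hypothetical stand-alone version, not the pysips source; `crossover_fn(a, b)` stands in for bingo's AGraphCrossover and is assumed to return two children:

```python
import random

def crossover_call(model, gene_pool, crossover_fn, rng=None):
    """Sketch of partner selection and 50/50 child choice (hypothetical)."""
    rng = rng or random.Random()
    # Pick a partner from the pool, avoiding self-crossover by
    # excluding the input model itself.
    partners = [m for m in gene_pool if m is not model]
    partner = rng.choice(partners) if partners else model
    child_1, child_2 = crossover_fn(model, partner)
    # Return one of the two offspring with equal probability.
    return child_1 if rng.random() < 0.5 else child_2
```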

update(gene_pool, *_, **__)[source]

Update the gene pool used for selecting crossover partners.

Parameters:
  • gene_pool (iterable of AGraph) – The new collection of AGraph models to use as the gene pool

  • *_ (tuple) – Additional positional arguments (ignored)

  • **__ (dict) – Additional keyword arguments (ignored)

Notes

This method allows for updating the gene pool while maintaining the same crossover behavior. The additional parameters are included for compatibility with other proposal update interfaces but are not used.

Random Choice Proposal

Composite Proposal Generator with Probabilistic Selection.

This module provides a meta-proposal mechanism that probabilistically selects and applies one or more proposal operators from a collection of available proposals. It supports both exclusive selection (choosing exactly one proposal) and non-exclusive selection (choosing multiple proposals to apply sequentially).

This approach allows for flexible proposal strategies in MCMC sampling or evolutionary algorithms by combining different types of modifications (e.g., mutation, crossover, local optimization) with configurable probabilities.

Selection Modes

Exclusive Mode (default)

Selects exactly one proposal via weighted random selection, using the provided probabilities as weights. The weights are normalized internally, so they need not sum to one.

Non-Exclusive Mode

Each proposal is independently selected based on its probability. If no proposals are selected in a round, the process repeats until at least one is chosen. Selected proposals are applied sequentially in random order.
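The two selection modes can be sketched like this (a hypothetical helper, not the pysips source):

```python
import random

def select_proposals(proposals, probabilities, exclusive=True, rng=None):
    """Sketch of exclusive vs. non-exclusive selection (hypothetical)."""
    rng = rng or random.Random()
    if exclusive:
        # Weighted choice of exactly one proposal; weights are
        # normalized by choices(), so they need not sum to one.
        return [rng.choices(proposals, weights=probabilities, k=1)[0]]
    # Independent trial per proposal; retry until at least one is chosen.
    while True:
        chosen = [p for p, q in zip(proposals, probabilities)
                  if rng.random() < q]
        if chosen:
            rng.shuffle(chosen)  # randomize application order to avoid bias
            return chosen
```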

Usage Examples

Exclusive selection (choose one proposal type):

>>> from mutation import MutationProposal
>>> from crossover import CrossoverProposal
>>>
>>> mutation = MutationProposal(X_dim=3, operators=["+", "*"])
>>> crossover = CrossoverProposal(gene_pool)
>>>
>>> # 70% mutation, 30% crossover
>>> proposal = RandomChoiceProposal(
...     [mutation, crossover],
...     [0.7, 0.3],
...     exclusive=True
... )

Non-exclusive selection (can apply multiple proposals):

>>> # Each proposal is selected independently with its own probability
>>> proposal = RandomChoiceProposal(
...     [mutation, crossover, local_optimizer],
...     [0.4, 0.4, 0.2],
...     exclusive=False
... )

Integration Notes

The update() method automatically propagates parameter updates to all constituent proposals, making this class compatible with adaptive sampling frameworks that modify proposal parameters during execution.

All constituent proposals must implement:

  • __call__(model) method for applying the proposal

  • update(*args, **kwargs) method for parameter updates (optional)
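A minimal conforming proposal might look like this. ConstantShiftProposal is a hypothetical illustration (operating on plain numbers rather than AGraph models); any object exposing these two methods satisfies the interface:

```python
class ConstantShiftProposal:
    """Minimal example of the duck-typed proposal interface (hypothetical)."""

    def __init__(self, shift=1.0):
        self.shift = shift

    def __call__(self, model):
        # Return a new model; here "model" is just a number for illustration.
        return model + self.shift

    def update(self, *args, **kwargs):
        # Optional hook for adaptive schemes; a no-op is acceptable.
        pass
```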

class pysips.random_choice_proposal.RandomChoiceProposal(proposals, probabilities, exclusive=True, seed=None)[source]

Bases: object

Randomly choose and apply one or more of the given proposals.

Parameters:
  • proposals (list of proposals) – the candidate proposal operators to choose from

  • probabilities (list of float) – probabilities of choosing each proposal

  • exclusive (bool, optional) – whether exactly one proposal is applied per call or several can be applied in sequence, by default True

  • seed (int, optional) – random seed used to control repeatability

__init__(proposals, probabilities, exclusive=True, seed=None)[source]
__call__(model)[source]

Apply randomly selected proposal(s) to generate a new model.

This method implements the core functionality of the composite proposal generator. It selects one or more proposals based on the configured probabilities and selection mode, then applies them sequentially to transform the input model.

Parameters:

model (object) – The input model to be transformed. This should be compatible with all constituent proposal operators (typically an AGraph for symbolic regression or similar structured representation).

Returns:

object – A new model resulting from applying the selected proposal(s). The type matches the input model type.

Process Overview

  1. **Selection Phase**: randomly selects active proposals based on:

     - Exclusive mode: exactly one proposal via weighted selection

     - Non-exclusive mode: zero or more proposals via independent trials

  2. **Application Phase**: applies selected proposals sequentially:

     - First proposal transforms the original model

     - Subsequent proposals transform the result of previous applications

     - Order is randomized in non-exclusive mode to avoid bias

Notes

  • In non-exclusive mode, if no proposals are initially selected, the selection process repeats until at least one proposal is chosen

  • Sequential application means later proposals operate on the results of earlier ones, potentially creating compound transformations
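The sequential application phase amounts to folding the model through the selected proposals, as in this sketch (a hypothetical helper, not the pysips source):

```python
def apply_proposals(model, selected):
    """Apply proposals in order; later proposals see earlier results."""
    for proposal in selected:
        model = proposal(model)
    return model
```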

update(*args, **kwargs)[source]

Propagate parameter updates to all constituent proposals.

This method forwards update calls to all constituent proposal operators, enabling the composite proposal to participate in adaptive sampling schemes where proposal parameters are modified during the sampling process.

Parameters:
  • *args (tuple) – Positional arguments to be passed to each constituent proposal’s update method. Common examples include new gene pools, population statistics, or adaptation parameters.

  • **kwargs (dict) – Keyword arguments to be passed to each constituent proposal’s update method. May include parameters like learning rates, temperature schedules, or other adaptive parameters.

Returns:

This method modifies the constituent proposals in-place and does not return any values.

Return type:

None