Quickstart Guide

This quickstart guide will help you get up and running with PySIPS in under 5 minutes.

Installation

Install PySIPS using pip:

pip install pysips
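
If you work with several Python environments, it can be worth confirming that the package landed in the one you are actually running. The sketch below uses only the standard library; the helper name is_installed is illustrative and not part of PySIPS.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pysips"):
    print("pysips is importable")
else:
    print("pysips not found; check that pip installed into the active environment")
```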

Basic Usage

Here’s a minimal example to get you started with symbolic regression:

import numpy as np
from pysips import PysipsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Generate synthetic data: y = x^2 + noise
np.random.seed(42)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X[:, 0]**2 + np.random.normal(0, 0.1, size=X.shape[0])

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create the regressor
regressor = PysipsRegressor(
    operators=['+', '-', '*'],  # Available operators
    max_complexity=12,          # Maximum expression size
    num_particles=100,          # Population size
    num_mcmc_samples=10,        # MCMC steps per iteration
    max_time=60,                # Maximum runtime in seconds
    random_state=42
)

# Fit the regressor
regressor.fit(X_train, y_train)

# Make predictions
y_pred = regressor.predict(X_test)

# Get the discovered expression
expression = regressor.get_expression()
print(f"Discovered expression: {expression}")
print(f"R² score: {r2_score(y_test, y_pred):.4f}")

Expected Output

Discovered expression: x_0^2
R² score: 0.9987
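
For readers unfamiliar with the R² score printed above, here is a minimal pure-Python sketch of the coefficient of determination, the same quantity that r2_score reports. The function name r2 and the toy data are illustrative only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [0.0, 1.0, 4.0, 9.0]
perfect = [0.0, 1.0, 4.0, 9.0]      # predictions match the targets exactly
mean_only = [3.5, 3.5, 3.5, 3.5]    # always predict the mean of y_true

print(r2(y_true, perfect))    # 1.0
print(r2(y_true, mean_only))  # 0.0
```

A score close to 1 means the discovered expression explains nearly all of the variance in the held-out data; a score near 0 means it does no better than predicting the mean.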

Understanding the Parameters

Essential Parameters:

  • operators: List of mathematical operators to use (e.g., ['+', '-', '*', '/', 'pow'])

  • max_complexity: Maximum size of the expression graph (controls model complexity)

  • num_particles: Number of particles in the SMC population (more particles improve exploration but increase runtime)

  • random_state: Random seed for reproducibility

Time Control:

  • max_time: Maximum runtime in seconds (default: no limit)

  • show_progress_bar: Display progress during fitting (default: True)

Model Selection:

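  • model_selection: How the final expression is chosen from the sampled models: 'mode' (the most frequently sampled expression) or 'max_likelihood' (the expression with the highest likelihood)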
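
To make the difference between the two model_selection criteria concrete, the sketch below applies both to a small, made-up population of sampled expressions. It illustrates the criteria as described in this guide and is not PySIPS's internal implementation; the sample data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical population: one (expression, likelihood) pair per particle.
samples = [
    ("x_0*x_0", 0.80),
    ("x_0*x_0", 0.80),
    ("x_0*x_0", 0.80),
    ("x_0*x_0 + x_0", 0.85),
    ("x_0", 0.10),
]


def select_mode(samples):
    """Pick the expression that appears most often in the population."""
    counts = Counter(expr for expr, _ in samples)
    return counts.most_common(1)[0][0]


def select_max_likelihood(samples):
    """Pick the expression with the single highest likelihood."""
    return max(samples, key=lambda pair: pair[1])[0]


print(select_mode(samples))            # x_0*x_0
print(select_max_likelihood(samples))  # x_0*x_0 + x_0
```

In this toy population the two criteria disagree: the most common expression is the simpler one, while the highest-scoring one carries an extra term.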

Accessing Results

After fitting, you can access various results:

# Get the best expression as a string
expression = regressor.get_expression()

# Get all unique models and their likelihoods
models, likelihoods = regressor.get_models()
print(f"Number of unique models: {len(models)}")

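# Make predictions on new data
# (illustrative single-feature inputs, matching the shape of X_train above)
X_new = np.array([[-1.5], [0.0], [2.5]])
y_pred = regressor.predict(X_new)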
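
Once you have the models and their likelihoods, a common next step is to rank them and inspect the top few candidates. The sketch below uses stand-in values instead of a fitted regressor, and it assumes that get_models() returns two parallel sequences and that a higher likelihood is better; verify both against the API documentation.

```python
# Stand-in values; in practice these would come from regressor.get_models().
models = ["x_0", "x_0*x_0", "x_0*x_0 + x_0"]
likelihoods = [0.10, 0.80, 0.85]

# Pair each model with its likelihood and sort from best to worst.
ranked = sorted(zip(models, likelihoods), key=lambda pair: pair[1], reverse=True)

# Show the two highest-ranked candidates.
for model, likelihood in ranked[:2]:
    print(f"{likelihood:.2f}  {model}")
```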

Next Steps

  • Continue to the PySIPS Tutorial for more advanced usage and examples

  • Explore the PySIPS API Documentation for a detailed reference of all classes and parameters

  • Check out the examples in the demos/ directory of the repository

Common Issues

Long Runtime:

If fitting takes too long, try:

  • Reducing num_particles (e.g., 50-100 for quick experiments)

  • Reducing max_complexity (e.g., 10-15 for simpler expressions)

  • Setting max_time to limit the runtime

Poor Results:

If the discovered expression is not accurate, try:

  • Increasing num_particles (e.g., 200-500 for better exploration)

  • Adjusting the available operators

  • Increasing max_complexity if you expect more complex relationships

  • Running for longer (increase or remove max_time)
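
One way to keep these trade-offs manageable is to store a "quick" and a "thorough" set of keyword arguments and switch between them. The sketch below is plain Python: the preset names and the build_kwargs helper are hypothetical, and the specific values are picked from the ranges suggested in this section, not from library defaults.

```python
# Settings for fast iteration: small population, small expressions, hard time cap.
QUICK = {"num_particles": 50, "max_complexity": 10, "max_time": 60}

# Settings for a more careful search: larger population and expressions,
# with max_time left out so the run is not cut short.
THOROUGH = {"num_particles": 300, "max_complexity": 20}


def build_kwargs(preset, **overrides):
    """Merge shared settings, a preset, and any per-run overrides."""
    base = {"operators": ["+", "-", "*"], "random_state": 42}
    return {**base, **preset, **overrides}


kwargs = build_kwargs(QUICK, max_time=30)
print(kwargs)

# With PySIPS installed, the result can be passed straight to the constructor:
# regressor = PysipsRegressor(**kwargs)
```

Starting with the quick preset to check that the pipeline runs end to end, then switching to the thorough one, avoids spending a long run on a misconfigured experiment.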