nl_causal.sparse_reg

Package Contents

Classes

WLasso

Linear Model trained with Weighted L1 prior as regularizer (aka the weighted-Lasso)

SCAD

Linear Model trained with Weighted SCAD prior as regularizer (aka the weighted-SCAD)

SCAD_IC

Linear Model Selection trained with SCAD as regularizer

L0_IC

Linear Model Selection trained with L0 prior as regularizer

class nl_causal.sparse_reg.WLasso(alpha=1.0, *, ada_weight=1.0, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')

Bases: sklearn.base.RegressorMixin, sklearn.linear_model._base.LinearModel

Linear model trained with a weighted L1 prior as regularizer (aka the weighted Lasso). The optimization objective for the weighted Lasso is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * sum_{j=1}^d weight_j * |w_j|

Technically, the weighted Lasso optimizes the same objective function as the Lasso applied to the rescaled design matrix X / ada_weight[None, :].
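The rescaling above can be checked numerically. The following is a minimal sketch, not the library's implementation: it assumes strictly positive weights, and the helper names (weighted_lasso_objective, lasso_objective) are hypothetical. It shows that the weighted objective at w equals the plain Lasso objective on X / weight[None, :] at the rescaled coefficients weight * w.

```python
import numpy as np

def weighted_lasso_objective(X, y, w, alpha, weight):
    """(1 / (2 n)) * ||y - Xw||^2 + alpha * sum_j weight_j * |w_j|."""
    n_samples = X.shape[0]
    residual = y - X @ w
    return residual @ residual / (2 * n_samples) + alpha * np.sum(weight * np.abs(w))

def lasso_objective(X, y, v, alpha):
    """Plain Lasso objective: every coefficient has unit weight."""
    return weighted_lasso_objective(X, y, v, alpha, np.ones_like(v))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = np.array([0.5, -1.0, 2.0])
weight = np.array([1.0, 2.0, 0.5])  # assumed strictly positive so the division is defined
alpha = 0.1

X_scaled = X / weight[None, :]  # the rescaling named in the text
v = weight * w                  # coefficients of the rescaled problem

lhs = weighted_lasso_objective(X, y, w, alpha, weight)
rhs = lasso_objective(X_scaled, y, v, alpha)
print(np.isclose(lhs, rhs))
```

Under this reparameterization, a solution v of the plain Lasso on the rescaled matrix maps back to the weighted solution through w = v / weight.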

Parameters:
alpha: float, default=1.0

Constant that multiplies the L1 term. Defaults to 1.0. alpha = 0 is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised; use the LinearRegression object instead.

ada_weight: float or ndarray of shape (n_features,), default=1.0

Weight that multiplies the L1 term for each coefficient.

fit_intercept: bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize: bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

precompute: ‘auto’, bool or array-like of shape (n_features, n_features), default=False

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.

copy_X: bool, default=True

If True, X will be copied; else, it may be overwritten.

max_iter: int, default=1000

The maximum number of iterations.

tol: float, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

positive: bool, default=False

When set to True, forces the coefficients to be positive.

random_state: int, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection: {‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Examples

>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.WLasso(alpha=0.1, ada_weight=[1.,0.])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[0.         0.99999998]
>>> print(clf.intercept_)
1.7881393254981504e-08
>>> clf = sparse_reg.WLasso(alpha=0.1, ada_weight=[0.,1.])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[0.99999998 0.        ]
>>> print(clf.intercept_)
1.7881393254981504e-08
Attributes:
coef_: ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

dual_gap_: float or ndarray of shape (n_targets,)

Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.

sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)

Readonly property derived from coef_.

intercept_: float or ndarray of shape (n_targets,)

Independent term in decision function.

n_iter_: int or list of int

Number of iterations run by the coordinate descent solver to reach the specified tolerance.

fit(X, y, sample_weight=None)

Fit linear model.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Training data

y: array-like of shape (n_samples,) or (n_samples, n_targets)

Target values. Will be cast to X’s dtype if necessary

sample_weight: array-like of shape (n_samples,), default=None

Individual weights for each sample

Returns:
self: returns an instance of self.
class nl_causal.sparse_reg.SCAD(alpha=1.0, *, ada_weight=1.0, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')

Bases: sklearn.base.RegressorMixin, sklearn.linear_model._base.LinearModel

Linear model trained with a weighted SCAD prior as regularizer (aka the weighted SCAD). The optimization objective for the weighted SCAD is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * sum_{j=1}^d weight_j * SCAD(|w_j|)
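For reference, the SCAD term in the objective can be illustrated with the standard SCAD penalty of Fan and Li. This is a minimal sketch under assumptions: the function name scad_penalty and the threshold argument lam are hypothetical, and how the library maps alpha and ada_weight onto the threshold is not stated in this page.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty evaluated at a single coefficient.

    Linear (Lasso-like) near zero, quadratic in the middle, constant for
    large coefficients, so large effects are not shrunk further.
    """
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

# The three regimes for lam = 1.0 and the conventional a = 3.7
small = scad_penalty(0.5, 1.0)   # linear regime
middle = scad_penalty(2.0, 1.0)  # quadratic regime
large = scad_penalty(10.0, 1.0)  # constant regime
```

The pieces join continuously at |theta| = lam and |theta| = a * lam, which is why the penalty behaves like the Lasso for small coefficients while leaving large ones nearly unbiased.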
Parameters:
alpha: float, default=1.0

Constant that multiplies the SCAD penalty. Defaults to 1.0. alpha = 0 is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised; use the LinearRegression object instead.

ada_weight: float or ndarray of shape (n_features,), default=1.0

Weight that multiplies the SCAD term for each coefficient.

fit_intercept: bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize: bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

precompute: ‘auto’, bool or array-like of shape (n_features, n_features), default=False

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.

copy_X: bool, default=True

If True, X will be copied; else, it may be overwritten.

max_iter: int, default=1000

The maximum number of iterations.

tol: float, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

positive: bool, default=False

When set to True, forces the coefficients to be positive.

random_state: int, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection: {‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Examples

>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.SCAD(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[0.99999998 0.        ]
>>> print(clf.intercept_)
1.7881393254981504e-08
Attributes:
coef_: ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

dual_gap_: float or ndarray of shape (n_targets,)

Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.

sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)

Readonly property derived from coef_.

intercept_: float or ndarray of shape (n_targets,)

Independent term in decision function.

n_iter_: int or list of int

Number of iterations run by the coordinate descent solver to reach the specified tolerance.

fit(X, y, sample_weight=None)

Fit linear model.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Training data

y: array-like of shape (n_samples,) or (n_samples, n_targets)

Target values. Will be cast to X’s dtype if necessary

sample_weight: array-like of shape (n_samples,), default=None

Individual weights for each sample

Returns:
self: returns an instance of self.
grad_SCAD_(a=3.7)

Compute the first-order gradient of the SCAD penalty.
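As an illustration of what such a gradient looks like, here is a minimal sketch of the derivative of the standard SCAD penalty with respect to |theta|. The function name scad_grad and its explicit lam argument are assumptions for this example; the method documented above takes only a, and its internals are not shown in this page.

```python
def scad_grad(theta, lam, a=3.7):
    """Derivative of the standard SCAD penalty with respect to |theta|.

    Equals lam on [0, lam], decays linearly to zero on (lam, a * lam],
    and is zero beyond a * lam.
    """
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)

# One value from each regime for lam = 1.0
g_small = scad_grad(0.5, 1.0)
g_middle = scad_grad(2.0, 1.0)
g_large = scad_grad(5.0, 1.0)
```

A common use of this derivative is the local linear approximation, in which a SCAD problem is solved as a sequence of weighted Lasso problems whose per-coefficient weights are these gradient values. Whether this class does exactly that is not confirmed here.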

class nl_causal.sparse_reg.SCAD_IC(alphas, *, criterion='bic', ada_weight=1.0, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, var_res=None, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')

Bases: sklearn.linear_model.LassoLarsIC

Linear model selection trained with SCAD as regularizer. The optimization objective for the weighted SCAD is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * sum_{j=1}^d weight_j * SCAD(|w_j|)
Parameters:
alphas: array-like of float

List of alphas at which to compute the SCAD solution.

criterion: {‘bic’, ‘aic’}, default=’bic’

Information criterion used for model selection.

mask: ndarray of shape (n_features,); dtype = bool

Boolean indicator of which variables are counted in the L0 term. default = ‘full’

fit_intercept: bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize: bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

precompute: ‘auto’, bool or array-like of shape (n_features, n_features), default=False

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.

copy_X: bool, default=True

If True, X will be copied; else, it may be overwritten.

max_iter: int, default=1000

The maximum number of iterations.

tol: float, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

positive: bool, default=False

When set to True, forces the coefficients to be positive.

random_state: int, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection: {‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

Examples

>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.SCAD_IC(alphas=[.001, .01, .1, 1.])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[1. 0.]
>>> print(clf.intercept_)
1.7881396363605973e-10
>>> clf.selection_summary()
   alpha model      criteria           mse
0  0.001   [0]  3.663001e-01  2.131628e-20
1  0.010   [0]  3.758041e-01  2.131628e-18
2  0.100   [0]  1.326204e+00  2.131628e-16
3  1.000    []  3.002400e+15  6.666667e-01
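The criteria column in the summary above is numerically consistent with a BIC of the form mse / var_res + n_nonzero * log(n_samples) / n_samples. The sketch below reproduces the printed values under that assumption. The formula is inferred from the table, not taken from the library source, and the residual variance is assumed to equal machine epsilon for this perfectly fitted toy data. The name bic_criterion is hypothetical.

```python
import math
import sys

def bic_criterion(mse, n_nonzero, n_samples, var_res):
    """Assumed BIC form: scaled fit term plus a log(n) penalty per nonzero coefficient."""
    return mse / var_res + n_nonzero * math.log(n_samples) / n_samples

eps = sys.float_info.epsilon  # assumed residual-variance value for this toy example
n_samples = 3

# (alpha, number of selected features, mse) copied from the summary table
rows = [
    (0.001, 1, 2.131628e-20),
    (0.010, 1, 2.131628e-18),
    (0.100, 1, 2.131628e-16),
    (1.000, 0, 6.666667e-01),
]

criteria = [bic_criterion(mse, k, n_samples, eps) for _, k, mse in rows]
for (alpha, _, _), value in zip(rows, criteria):
    print(f"{alpha:.3f}  {value:.6e}")
```

The empty model receives a very large criterion because its mse is divided by a residual variance close to zero, which is why the one-feature models are preferred in the example.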
Attributes:
coef_: ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

dual_gap_: float or ndarray of shape (n_targets,)

Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.

sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)

Readonly property derived from coef_.

intercept_: float or ndarray of shape (n_targets,)

Independent term in decision function.

n_iter_: int or list of int

Number of iterations run by the coordinate descent solver to reach the specified tolerance.

fit(X, y, sample_weight=None)

Fit linear model.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Training data

y: array-like of shape (n_samples,) or (n_samples, n_targets)

Target values. Will be cast to X’s dtype if necessary

sample_weight: array-like of shape (n_samples,), default=None

Individual weights for each sample

Returns:
self: returns an instance of self.
_get_estimator()
_is_multitask()
_more_tags()
selection_summary()

Summarize the model-selection results of the sparse regression in Stage 2.

Returns:
df: dataframe

dataframe with columns: “candidate_model”, “criteria”, and “mse”.

class nl_causal.sparse_reg.L0_IC(alphas, criterion='bic', *, Ks=range(10), ada_weight=True, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, verbose=False, eps=np.finfo(float).eps, tol=0.0001, warm_start=False, positive=False, var_res=None, refit=True, find_best=True, random_state=None, selection='cyclic')

Bases: sklearn.linear_model.LassoLarsIC

Linear model selection trained with an L0 prior as regularizer. The optimization objective for the L0-constrained regression is:

(1 / (2 * n_samples)) * ||y - Xw||^2_2, s.t. ||w||_0 <= K
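To make the constrained objective concrete, the following is a minimal brute-force sketch that enumerates every support of size at most K and keeps the one with the smallest loss. It is an illustration only: the function name best_subset is hypothetical, no intercept is fitted, and exhaustive enumeration is feasible only for a handful of features. It is not claimed to be how L0_IC searches over candidate models.

```python
from itertools import combinations

import numpy as np

def best_subset(X, y, K):
    """Minimize (1 / (2 n)) * ||y - Xw||^2 subject to ||w||_0 <= K by enumeration."""
    n_samples, n_features = X.shape
    # Start from the empty model, whose loss is that of the all-zero coefficient vector.
    best_support, best_loss = (), float(y @ y) / (2 * n_samples)
    for k in range(1, K + 1):
        for support in combinations(range(n_features), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            residual = y - X[:, cols] @ coef
            loss = float(residual @ residual) / (2 * n_samples)
            # Require a strict improvement so smaller supports win ties.
            if loss < best_loss - 1e-12:
                best_support, best_loss = support, loss
    return best_support, best_loss

X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0],
              [2.0, 0.0, 1.0]])
y = 3.0 * X[:, 1]  # only the second feature drives the response

support, loss = best_subset(X, y, K=1)
print(support, loss)
```

Because the response depends only on the second column, the search with K=1 selects that single feature and attains a loss of essentially zero.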
Parameters:
Ks: range of int, default=range(10)

Candidate numbers of nonzero coefficients to tune over.

alphas: array-like of float

List of alphas at which to compute the SCAD solution.

criterion: {‘bic’, ‘aic’}, default=’bic’

Information criterion used for model selection.

mask: ndarray of shape (n_features,); dtype = bool

Boolean indicator of which variables are counted in the L0 term. default = ‘full’

fit_intercept: bool, default=True

Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).

normalize: bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

precompute: ‘auto’, bool or array-like of shape (n_features, n_features), default=False

Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.

copy_X: bool, default=True

If True, X will be copied; else, it may be overwritten.

max_iter: int, default=1000

The maximum number of iterations.

tol: float, default=1e-4

The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

warm_start: bool, default=False

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.

positive: bool, default=False

When set to True, forces the coefficients to be positive.

random_state: int, RandomState instance, default=None

The seed of the pseudo random number generator that selects a random feature to update. Used when selection == ‘random’. Pass an int for reproducible output across multiple function calls. See Glossary.

selection: {‘cyclic’, ‘random’}, default=’cyclic’

If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.

refit: bool, default=True

Whether to refit the best selected model by OLS.

Examples

>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.L0_IC(alphas=[.001, .01, .1, 1.], Ks=[1,2])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[1. 0.]
>>> print(clf.intercept_)
2.220446049250313e-16
>>> clf.selection_summary()
  model      criteria           mse
0  (0,)  3.662041e-01  3.286920e-32
1    ()  3.002400e+15  6.666667e-01
Attributes:
coef_: ndarray of shape (n_features,) or (n_targets, n_features)

Parameter vector (w in the cost function formula).

dual_gap_: float or ndarray of shape (n_targets,)

Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.

sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)

Readonly property derived from coef_.

intercept_: float or ndarray of shape (n_targets,)

Independent term in decision function.

n_iter_: int or list of int

Number of iterations run by the coordinate descent solver to reach the specified tolerance.

fit(X, y, sample_weight=None)

Fit linear model.

Parameters:
X: {array-like, sparse matrix} of shape (n_samples, n_features)

Training data

y: array-like of shape (n_samples,) or (n_samples, n_targets)

Target values. Will be cast to X’s dtype if necessary

sample_weight: array-like of shape (n_samples,), default=None

Individual weights for each sample

Returns:
self: returns an instance of self.
selection_summary()

Summarize the model-selection results of the sparse regression in Stage 2.

Returns:
df: dataframe

dataframe with columns: “candidate_model”, “criteria”, and “mse”.