nl_causal.sparse_reg
Package Contents
Classes
- WLasso: Linear Model trained with Weighted L1 prior as regularizer (aka the weighted-Lasso)
- SCAD: Linear Model trained with Weighted SCAD prior as regularizer (aka the weighted-SCAD)
- SCAD_IC: Linear Model Selection trained with SCAD as regularizer
- L0_IC: Linear Model Selection trained with L0 prior as regularizer
- class nl_causal.sparse_reg.WLasso(alpha=1.0, *, ada_weight=1.0, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
Bases: sklearn.base.RegressorMixin, sklearn.linear_model._base.LinearModel
Linear Model trained with Weighted L1 prior as regularizer (aka the weighted-Lasso). The optimization objective for the weighted Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * sum_{j=1}^d weight_j * |w_j|
Here weight_j is the j-th entry of ada_weight. Technically, the weighted Lasso optimizes the same objective function as the Lasso fitted on the rescaled features X = X / ada_weight[None,:].
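The rescaling claim can be checked numerically. The sketch below is a dependency-free illustration, not the package's code: `weighted_lasso_objective` is a hypothetical helper, and the data, coefficients and weights are made up. It evaluates the weighted objective at w and the plain Lasso objective at u = weight * w on the rescaled features X / weight. Strictly positive weights are assumed here, because dividing a column by a zero weight is undefined.

```python
def weighted_lasso_objective(X, y, w, alpha, weight):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * sum_j weight_j * |w_j|."""
    n_samples = len(y)
    residuals = [
        yi - sum(xij * wj for xij, wj in zip(row, w))
        for row, yi in zip(X, y)
    ]
    loss = sum(r * r for r in residuals) / (2 * n_samples)
    penalty = alpha * sum(cj * abs(wj) for cj, wj in zip(weight, w))
    return loss + penalty


# Made-up data for illustration only.
X = [[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]]
y = [0.0, 1.0, 2.0]
w = [0.5, -0.25]
weight = [2.0, 0.5]  # plays the role of ada_weight; must be > 0 in this sketch
alpha = 0.1

# Rescale each column by its weight and move the weight into the coefficient.
X_scaled = [[xij / cj for xij, cj in zip(row, weight)] for row in X]
u = [cj * wj for cj, wj in zip(weight, w)]

lhs = weighted_lasso_objective(X, y, w, alpha, weight)             # weighted Lasso at w
rhs = weighted_lasso_objective(X_scaled, y, u, alpha, [1.0, 1.0])  # plain Lasso at u
```

Because X_scaled times u reproduces X times w and sum_j |u_j| equals sum_j weight_j * |w_j|, `lhs` and `rhs` coincide, so a plain Lasso solution u on the rescaled data maps back to w_j = u_j / weight_j.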
- Parameters:
- alpha: float, default=1.0
Constant that multiplies the L1 term. Defaults to 1.0.
alpha = 0 is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Given this, you should use the LinearRegression object.
- ada_weight: ndarray of shape (n_features,)
Weight that multiplies the L1 term for each coefficient. Defaults to 1.0.
- fit_intercept: bool, default=True
Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
- normalize: bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.
- precompute: 'auto', bool or array-like of shape (n_features, n_features), default=False
Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto', let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.
- copy_X: bool, default=True
If True, X will be copied; else, it may be overwritten.
- max_iter: int, default=1000
The maximum number of iterations.
- tol: float, default=1e-4
The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
- warm_start: bool, default=False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
- positive: bool, default=False
When set to True, forces the coefficients to be positive.
- random_state: int, RandomState instance, default=None
The seed of the pseudo random number generator that selects a random feature to update. Used when selection == 'random'. Pass an int for reproducible output across multiple function calls. See Glossary.
- selection: {'cyclic', 'random'}, default='cyclic'
If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to 'random') often leads to significantly faster convergence especially when tol is higher than 1e-4.
Examples
>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.WLasso(alpha=0.1, ada_weight=[1.,0.])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[0. , 0.99999998]
>>> print(clf.intercept_)
1.7881393254981504e-08
>>> clf = sparse_reg.WLasso(alpha=0.1, ada_weight=[0.,1.])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[0.99999998 0. ]
>>> print(clf.intercept_)
1.7881393254981504e-08
- Attributes:
- coef_: ndarray of shape (n_features,) or (n_targets, n_features)
Parameter vector (w in the cost function formula).
- dual_gap_: float or ndarray of shape (n_targets,)
Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.
- sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)
Read-only property derived from coef_.
- intercept_: float or ndarray of shape (n_targets,)
Independent term in decision function.
- n_iter_: int or list of int
Number of iterations run by the coordinate descent solver to reach the specified tolerance.
- fit(X, y, sample_weight=None)
Fit linear model.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Training data
- y: array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary
- sample_weight: array-like of shape (n_samples,), default=None
Individual weights for each sample
- Returns:
- self: returns an instance of self.
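The WLasso attributes refer to a coordinate descent solver, and the selection parameter to cycling over features. The sketch below shows one way such a solver can look for the weighted L1 objective. It is a simplified, dependency-free illustration and not the package's implementation: `soft_threshold` and `weighted_lasso_cd` are hypothetical names, no intercept is fitted, only cyclic selection is shown, and the stopping rule (largest coefficient update below tol) is a simplification of the dual gap check described for tol.

```python
def soft_threshold(z, t):
    """Shrink z towards zero by t, returning 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X, y, alpha, weight, max_iter=1000, tol=1e-4):
    """Cyclic coordinate descent for
    (1 / (2 * n)) * ||y - Xw||^2_2 + alpha * sum_j weight_j * |w_j| (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    # Mean squared value of each column, the curvature of the loss along w_j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        max_update = 0.0
        for j in range(d):  # 'cyclic' order: features are visited in sequence
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot change the fit
            # Correlation of column j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            new_wj = soft_threshold(rho, alpha * weight[j]) / col_sq[j]
            max_update = max(max_update, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_update < tol:
            break
    return w, n_iter


# Same data as the WLasso example: two identical columns, the first one penalised.
w, n_iter = weighted_lasso_cd(
    [[0, 0], [1, 1], [2, 2]], [0, 1, 2], alpha=0.1, weight=[1.0, 0.0]
)
```

On this data the penalised first coefficient is driven to zero and the unpenalised second coefficient moves to about 1, which matches the pattern of the first WLasso example.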
- class nl_causal.sparse_reg.SCAD(alpha=1.0, *, ada_weight=1.0, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
Bases: sklearn.base.RegressorMixin, sklearn.linear_model._base.LinearModel
Linear Model trained with Weighted SCAD prior as regularizer (aka the weighted-SCAD). The optimization objective for the weighted SCAD is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * sum_{j=1}^d weight_j * SCAD(|w_j|)
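The objective does not spell out SCAD(|w_j|). As a reference point, the sketch below implements the standard SCAD penalty of Fan and Li with shape parameter a = 3.7. This is an assumption about the penalty's form: `scad_penalty` and its `lam` argument are hypothetical, and how the package maps alpha and ada_weight onto the threshold is not stated here.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at |t| with threshold lam and shape a > 2."""
    t = abs(t)
    if t <= lam:
        # Small coefficients are penalised exactly like the Lasso.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that gradually relaxes the penalty.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients pay a constant penalty, so they are not shrunk further.
    return lam * lam * (a + 1) / 2


small = scad_penalty(0.5, lam=1.0)   # Lasso-like region
large = scad_penalty(10.0, lam=1.0)  # flat region
```

The three pieces join continuously at |t| = lam and |t| = a * lam, and the flat tail is what lets SCAD avoid the bias that the L1 penalty puts on large coefficients.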
- Parameters:
- alpha: float, default=1.0
Constant that multiplies the SCAD penalty. Defaults to 1.0.
alpha = 0 is equivalent to ordinary least squares, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Given this, you should use the LinearRegression object.
- ada_weight: ndarray of shape (n_features,)
Weight that multiplies the SCAD term for each coefficient. Defaults to 1.0.
- fit_intercept: bool, default=True
Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
- normalize: bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.
- precompute: 'auto', bool or array-like of shape (n_features, n_features), default=False
Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto', let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.
- copy_X: bool, default=True
If True, X will be copied; else, it may be overwritten.
- max_iter: int, default=1000
The maximum number of iterations.
- tol: float, default=1e-4
The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
- warm_start: bool, default=False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
- positive: bool, default=False
When set to True, forces the coefficients to be positive.
- random_state: int, RandomState instance, default=None
The seed of the pseudo random number generator that selects a random feature to update. Used when selection == 'random'. Pass an int for reproducible output across multiple function calls. See Glossary.
- selection: {'cyclic', 'random'}, default='cyclic'
If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to 'random') often leads to significantly faster convergence especially when tol is higher than 1e-4.
Examples
>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.SCAD(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[0.99999998 0. ]
>>> print(clf.intercept_)
1.7881393254981504e-08
- Attributes:
- coef_: ndarray of shape (n_features,) or (n_targets, n_features)
Parameter vector (w in the cost function formula).
- dual_gap_: float or ndarray of shape (n_targets,)
Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.
- sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)
Read-only property derived from coef_.
- intercept_: float or ndarray of shape (n_targets,)
Independent term in decision function.
- n_iter_: int or list of int
Number of iterations run by the coordinate descent solver to reach the specified tolerance.
- fit(X, y, sample_weight=None)
Fit linear model.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Training data
- y: array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary
- sample_weight: array-like of shape (n_samples,), default=None
Individual weights for each sample
- Returns:
- self: returns an instance of self.
- grad_SCAD_(a=3.7)
Compute first-order gradient of SCAD
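For orientation, the first-order gradient of the standard SCAD penalty (Fan and Li, shape parameter a = 3.7) can be written as below. The names `scad_grad` and `lla_weights` and the `lam` argument are hypothetical and do not mirror the signature of grad_SCAD_. Using the gradient as per-coefficient L1 weights, as in a local linear approximation, is a common technique, but whether SCAD combines it with WLasso in this way is an assumption.

```python
def scad_grad(t, lam, a=3.7):
    """Derivative of the standard SCAD penalty with respect to |t|."""
    t = abs(t)
    if t <= lam:
        return lam                        # same slope as the Lasso penalty
    if t <= a * lam:
        return (a * lam - t) / (a - 1)    # slope decays linearly to zero
    return 0.0                            # large coefficients are not penalised


def lla_weights(w, lam, a=3.7):
    """Per-coefficient L1 weights from a local linear approximation around w."""
    return [scad_grad(wj, lam, a) for wj in w]


# A zero, a moderate and a large coefficient, with threshold lam = 1.
weights = lla_weights([0.0, 2.0, 5.0], lam=1.0)
```

Coefficients near zero keep the full L1 weight, moderate ones get a reduced weight, and large ones get weight zero, so a weighted L1 fit with these weights approximates one step of SCAD.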
- class nl_causal.sparse_reg.SCAD_IC(alphas, *, criterion='bic', ada_weight=1.0, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, var_res=None, tol=0.0001, warm_start=False, positive=False, random_state=None, selection='cyclic')
Bases: sklearn.linear_model.LassoLarsIC
Linear Model Selection trained with SCAD as regularizer. The optimization objective for SCAD is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * sum_{j=1}^d weight_j * SCAD(|w_j|)
- Parameters:
- alphas: list of float
List of alphas where to compute the SCAD. The docstring gives default=np.arange(-3,3,.1), although the signature above requires this argument.
- criterion: {'bic', 'aic'}, default='bic'
Criterion used for model selection.
- mask: ndarray of shape (n_features,); dtype = bool
Indicator to count the variable in L0 term. default = 'full'
- fit_intercept: bool, default=True
Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
- normalize: bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.
- precompute: 'auto', bool or array-like of shape (n_features, n_features), default=False
Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto', let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.
- copy_X: bool, default=True
If True, X will be copied; else, it may be overwritten.
- max_iter: int, default=1000
The maximum number of iterations.
- tol: float, default=1e-4
The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
- warm_start: bool, default=False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
- positive: bool, default=False
When set to True, forces the coefficients to be positive.
- random_state: int, RandomState instance, default=None
The seed of the pseudo random number generator that selects a random feature to update. Used when selection == 'random'. Pass an int for reproducible output across multiple function calls. See Glossary.
- selection: {'cyclic', 'random'}, default='cyclic'
If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to 'random') often leads to significantly faster convergence especially when tol is higher than 1e-4.
Examples
>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.SCAD_IC(alphas=[.001, .01, .1, 1.])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[1. 0.]
>>> print(clf.intercept_)
1.7881396363605973e-10
>>> clf.selection_summary()
   alpha model      criteria           mse
0  0.001   [0]  3.663001e-01  2.131628e-20
1  0.010   [0]  3.758041e-01  2.131628e-18
2  0.100   [0]  1.326204e+00  2.131628e-16
3  1.000    []  3.002400e+15  6.666667e-01
- Attributes:
- coef_: ndarray of shape (n_features,) or (n_targets, n_features)
Parameter vector (w in the cost function formula).
- dual_gap_: float or ndarray of shape (n_targets,)
Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.
- sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)
Read-only property derived from coef_.
- intercept_: float or ndarray of shape (n_targets,)
Independent term in decision function.
- n_iter_: int or list of int
Number of iterations run by the coordinate descent solver to reach the specified tolerance.
- fit(X, y, sample_weight=None)
Fit linear model.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Training data
- y: array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary
- sample_weight: array-like of shape (n_samples,), default=None
Individual weights for each sample
- Returns:
- self: returns an instance of self.
- _get_estimator()
- _is_multitask()
- _more_tags()
- selection_summary()
A summary for the result of model selection of the sparse regression in Stage 2.
- Returns:
- df: dataframe
dataframe with columns: “candidate_model”, “criteria”, and “mse”.
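The docstring does not say how the "criteria" column is computed. The numbers printed in the SCAD_IC example are, however, consistent with a BIC-style score of the form mse / var_res + k * log(n) / n, where k is the number of selected features, n = 3 is the number of samples, and var_res equals machine epsilon (plausible here because the data are fitted exactly). The sketch below only reproduces that arithmetic: the formula is inferred from the printed output rather than taken from the source code, and `bic_like_criterion` is a hypothetical helper.

```python
import math
import sys


def bic_like_criterion(mse, n_nonzero, n_samples, var_res):
    """Scaled residual error plus a log(n) / n charge per selected feature."""
    return mse / var_res + n_nonzero * math.log(n_samples) / n_samples


n_samples = 3
var_res = sys.float_info.epsilon  # assumed residual variance, inferred from the output

# (alpha, number of selected features, mse), copied from the example's summary table.
rows = [
    (0.001, 1, 2.131628e-20),
    (0.010, 1, 2.131628e-18),
    (0.100, 1, 2.131628e-16),
    (1.000, 0, 6.666667e-01),
]
criteria = [bic_like_criterion(mse, k, n_samples, var_res) for _, k, mse in rows]
best_alpha = min(zip(criteria, rows))[1][0]  # alpha with the smallest criterion
```

Under this reading, the smallest criterion belongs to alpha = 0.001 with the single-feature model [0], which agrees with the fitted coef_ of [1. 0.] in the example.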
- class nl_causal.sparse_reg.L0_IC(alphas, criterion='bic', *, Ks=range(10), ada_weight=True, fit_intercept=True, precompute=False, copy_X=True, max_iter=1000, verbose=False, eps=np.finfo(float).eps, tol=0.0001, warm_start=False, positive=False, var_res=None, refit=True, find_best=True, random_state=None, selection='cyclic')
Bases: sklearn.linear_model.LassoLarsIC
Linear Model Selection trained with L0 prior as regularizer. The optimization objective for the L0-constrained regression is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2, s.t. ||w||_0 <= K
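The constraint ||w||_0 <= K limits the number of nonzero coefficients. The short sketch below illustrates the two notions involved: counting nonzeros, and making an arbitrary coefficient vector feasible by keeping only its K largest entries in magnitude. `l0_norm` and `keep_top_k` are hypothetical helpers for illustration, and nothing here is claimed about how L0_IC searches over candidate supports.

```python
def l0_norm(w, tol=0.0):
    """Number of coefficients whose magnitude exceeds tol."""
    return sum(1 for wj in w if abs(wj) > tol)


def keep_top_k(w, k):
    """Zero out all but the k largest-magnitude coefficients of w."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [wj if j in keep else 0.0 for j, wj in enumerate(w)]


w = [0.2, -1.5, 0.0, 0.7]
w_k2 = keep_top_k(w, 2)  # feasible for K = 2
```

Keeping the K largest magnitudes is the closest feasible vector in Euclidean distance. Refitting by least squares on the retained features, as the refit option suggests, then removes the effect of the discarded coefficients on the remaining ones.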
- Parameters:
- Ks: range of int, default=range(10)
Candidate numbers of nonzero coefficients to be tuned.
- alphas: list of float
List of alphas where to compute the SCAD. The docstring gives default=np.arange(-3,3,.1), although the signature above requires this argument.
- criterion: {'bic', 'aic'}, default='bic'
Criterion used for model selection.
- mask: ndarray of shape (n_features,); dtype = bool
Indicator to count the variable in L0 term. default = 'full'
- fit_intercept: bool, default=True
Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (i.e. data is expected to be centered).
- normalize: bool, default=False
This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.
- precompute: 'auto', bool or array-like of shape (n_features, n_features), default=False
Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto', let us decide. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.
- copy_X: bool, default=True
If True, X will be copied; else, it may be overwritten.
- max_iter: int, default=1000
The maximum number of iterations.
- tol: float, default=1e-4
The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
- warm_start: bool, default=False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
- positive: bool, default=False
When set to True, forces the coefficients to be positive.
- random_state: int, RandomState instance, default=None
The seed of the pseudo random number generator that selects a random feature to update. Used when selection == 'random'. Pass an int for reproducible output across multiple function calls. See Glossary.
- selection: {'cyclic', 'random'}, default='cyclic'
If set to 'random', a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to 'random') often leads to significantly faster convergence especially when tol is higher than 1e-4.
- refit: bool, default=True
Whether to refit the best selected model by OLS.
Examples
>>> from nl_causal import sparse_reg
>>> clf = sparse_reg.L0_IC(alphas=[.001, .01, .1, 1.], Ks=[1,2])
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
>>> print(clf.coef_)
[1. 0.]
>>> print(clf.intercept_)
2.220446049250313e-16
>>> clf.selection_summary()
  model      criteria           mse
0  (0,)  3.662041e-01  3.286920e-32
1    ()  3.002400e+15  6.666667e-01
- Attributes:
- coef_: ndarray of shape (n_features,) or (n_targets, n_features)
Parameter vector (w in the cost function formula).
- dual_gap_: float or ndarray of shape (n_targets,)
Given param alpha, the dual gaps at the end of the optimization, same shape as each observation of y.
- sparse_coef_: sparse matrix of shape (n_features, 1) or (n_targets, n_features)
Read-only property derived from coef_.
- intercept_: float or ndarray of shape (n_targets,)
Independent term in decision function.
- n_iter_: int or list of int
Number of iterations run by the coordinate descent solver to reach the specified tolerance.
- fit(X, y, sample_weight=None)
Fit linear model.
- Parameters:
- X: {array-like, sparse matrix} of shape (n_samples, n_features)
Training data
- y: array-like of shape (n_samples,) or (n_samples, n_targets)
Target values. Will be cast to X’s dtype if necessary
- sample_weight: array-like of shape (n_samples,), default=None
Individual weights for each sample
- Returns:
- self: returns an instance of self.
- selection_summary()
A summary for the result of model selection of the sparse regression in Stage 2.
- Returns:
- df: dataframe
dataframe with columns: “candidate_model”, “criteria”, and “mse”.