close Warning: Can't synchronize with repository "(default)" (/var/svn/tolp does not appear to be a Subversion repository.). Look in the Trac log for more information.

Package QltvRespModel

Max-likelihood and bayesian estimation of qualitative response models as an special case of generalized linear models

Weighted Boolean Regresions

Abstract class @WgtBoolReg is an specialization of class GrzLinModel::@WgtReg and is the base to inherit weighted boolean regressions as logit or probit or any other, given just the scalar distribution function  F and the corresponding density function  f . In a weighted regression each row of input data has a distinct weight in the likelihood function. For example, it can be very usefull to handle with data extrated from an stratified sample.

Let be

  •  X\in\mathbb{R}^{m\times n} the regression input matrix
  •  w\in\mathbb{R}^{m} the vector of weights of each register
  •  y\in\mathbb{R}^{m} the regression output matrix

The hypotesis is that  \forall i=1 \dots m

 y_{i}\sim Bernoulli\left(\pi_{i}\right)

 \pi_{i}=Pr\left[y_{i}=1\right] = F\left(X_{i}\beta\right)

The likelihood function is then

 lk\left(\beta\right)=\underset{i}{\prod}\pi_{i}^{w_{i}y_{i}}\left(1-\pi_{i}\right)^{w_{i}\left(1-y_{i}\right)}

and its logarithm

 L\left(\beta\right)=\ln\left(lk\left(\beta\right)\right)=\underset{i}{\sum}w_{i}\left(y_{i}\ln\left(\pi_{i}\right)+\left(1-y_{i}\right)\ln\left(1-\pi_{i}\right)\right)

The gradient of the logarithm of the likelihood function will be

 \frac{\partial L\left(\beta\right)}{\partial\beta_{j}}=\underset{i}{\sum}w_{i}\left(y_{i}\frac{f\left(x_{i}\beta\right)}{F\left(x_{i}\beta\right)}-\left(1-y_{i}\right)\frac{f\left(x_{i}\beta\right)}{1-F\left(x_{i}\beta\right)}\right)x_{ij}

and the hessian is

 \frac{\partial L\left(\beta\right)}{\partial\beta_{i}\partial_{j}}=\underset{k}{\sum}w_{k}\left(y_{k}\frac{f'\left(x_{k}\beta\right)F\left(x_{k}\beta\right)-f^{2}\left(x_{k}\beta\right)}{F^{2}\left(x_{k}\beta\right)}-\left(1-y_{k}\right)\frac{f'\left(x_{k}\beta\right)\left(1-F\left(x_{k}\beta\right)\right)+f^{2}\left(x_{k}\beta\right)}{\left(1-F\left(x_{k}\beta\right)\right)^{2}}\right)x_{ik}x_{jk}

User can and should define scalar truncated normal or uniform prior information and bounds for all variables for which he/she has robust knowledge.

 \beta_k \sim N\left(\nu_k, \sigma_k \right)

 l_k \le \beta_k \le u_k \wedge l_k < u_k

When  \sigma_k is infinite or unknown we will express a uniform prior. When  l_k = -\infty or unknown we will express that variable has no lower bound. When  u_k = +\infty or unknown we will express that variable has no upper bound.

It's also allowed to give any set of constraining linear inequations if they are compatible with lower and upper bounds

 A \beta \le a

This class implements max-likelihood estimation by means of package NonLinGloOpt and bayesian simulation using BysSampler.

The only mandatory members are the matrices of output and input of the regression

  //Output vector 0 o 1 (mx1)
  VMatrix y;
  //Input matrix (mxn)
  VMatrix X;

You can also specify these other members:

  //Weights  vector (mx1), default values are 1
  VMatrix w=Rand(0,0,0,0);
  //Name of output
  Text output.name = "";
  //Names of input variables
  Set input.name = Copy(Empty);
  //Set of GrzLinModel::@PsbTrnNrmUnfSclDst
  Set prior = Copy(Empty);
  //Constraining matrices A*b<=a
  //Constraining coefficient matrix
  VMatrix A=Rand(0,0,0,0); 
  //Constraining border vector
  VMatrix a=Rand(0,0,0,0); 

Weighted Logit Regression

Class @WgtLogit is an specialization of class @WgtBoolReg that handles with weighted logit regressions.

You can view test_0003 of using this class.

In this case we have that scalar distribution is the logistic one.

Scalar cumulant:

 F\left(z\right) = \frac{1}{1+e^{-z}}

Scalar density:

 f\left(z\right) = \frac{e^{-z}}{\left(1+e^{-z}\right)^2}

Scalar density derivative:

 f'\left(z\right) = - f\left(z\right) F\left(z\right) {\left(1-e^{-z}\right)}

Logarithm of likelihood:

 L\left(\beta\right)=\underset{i}{-\sum}w_{i}\left(\ln\left(1+e^{-x_{i}^{t}\beta}\right)+\left(1-y_{i}\right)x_{i}^{t}\beta\right)

Gradient:

 \frac{\partial L\left(\beta\right)}{\partial\beta_{j}}=\underset{i}{\sum}w_{i}\left(\frac{e^{-x_{i}^{t}\beta}}{1+e^{-x_{i}^{t}\beta}}-\left(1-y_{i}\right)\right)x_{ij}=\underset{i}{\sum}w_{i}\left(\left(1-\pi_{i}\right)-\left(1-y_{i}\right)\right)x_{ij}=\underset{i}{\sum}w_{i}\left(y_{i}-\pi_{i}\right)x_{ij}

Hessian:

 \frac{\partial L\left(\beta\right)}{\partial\beta_{i}\partial_{j}}=\underset{k}{\sum}w_{k}\frac{-e^{-x_{k}^{t}\beta}}{\left(1+e^{-x_{k}^{t}\beta}\right)^{2}}x_{ki}x_{kj}=-\underset{k}{\sum}x_{ki}x_{kj}w_{k}\pi_{k}\left(1-\pi_{k}\right)

From the standpoint of arithmetic discrete numerical calculation must take into account that

 e^{710}=\infty

For this reason we must carefully try to contain the exponential expressions. In this case it will use the following asymptotic equalities

 \ln\left(1+e^{-z}\right)\;\overset{z\rightarrow-\infty}{\longrightarrow}\;-z

 \frac{e^{-z}}{1+e^{-z}}\;\overset{z\rightarrow-\infty}{\longrightarrow}\;1

Weighted Probit Regression

Class @WgtProbit is an specialization of class @WgtBoolReg that handles with weighted probit regressions.

You can view test_0004 of using this class.

In this case we have that scalar distribution is the standard normal one.

Scalar cumulant:

 F\left(z\right) = \Phi\left(z\right)

Scalar density:

 f\left(z\right) = \phi\left(z\right)

Scalar density derivative:

 f'\left(z\right) = -z \phi\left(z\right)

Logarithm of likelihood:

 L\left(\beta\right)=\underset{i}{\sum}w_{i}\left(y_{i}\ln\left(\Phi\left(x_{i}\beta\right)\right)+\left(1-y_{i}\right)\ln\left(\Phi\left(-x_{i}\beta\right)\right)\right)

Gradient:

 \frac{\partial L\left(\beta\right)}{\partial\beta_{j}}=\underset{i}{\sum}w_{i}\left(y_{i}\frac{\phi\left(x_{i}\beta\right)}{\Phi\left(x_{i}\beta\right)}-\left(1-y_{i}\right)\frac{\phi\left(x_{i}\beta\right)}{\Phi\left(-x_{i}\beta\right)}\right)x_{ij}

Hessian:

 \frac{\partial L\left(\beta\right)}{\partial\beta_{i}\partial_{j}}=\frac{\partial L\left(\beta\right)}{\partial\beta_{i}\partial_{j}}=-\underset{k}{\sum}w_{k}\phi\left(x_{k}\beta\right)\left(y_{k}\frac{z\Phi\left(x_{k}\beta\right)+\phi\left(x_{k}\beta\right)}{\Phi\left(x_{k}\beta\right)^{2}}+\left(1-y_{k}\right)\frac{-z\Phi\left(-x_{k}\beta\right)+\phi\left(x_{k}\beta\right)}{\Phi\left(-x_{k}\beta\right)^{2}}\right)x_{ik}x_{jk}

To avoid numerical problems will use the following equality

 \ln\left(\Phi\left(z\right)\right)=\ln\left(1-erf\left(\frac{-z}{\sqrt{2}}\right)\right)-\ln2

The function logarithm of complemetary error function

 \ln\left(1-erf\left(u\right)\right)

is implemented as gsl_sf_log_erfc that is available in TOL.

In the gradient it appears two times the Hazard function

 h\left(z\right)=\frac{\phi\left(z\right)}{1-\Phi\left(z\right)}=\frac{\phi\left(z\right)}{\Phi\left(-z\right)}

 \frac{\phi\left(z\right)}{\Phi\left(z\right)}=\frac{\phi\left(-z\right)}{\Phi\left(z\right)}=h\left(-z\right)

decreases rapidly as z approaches -\infty and asymptotes to h\left(z\right) \sim z as z approaches +\infty.

Hazard function is implemented as gsl_sf_hazard that is also available in TOL.

Last modified 14 years ago Last modified on Mar 31, 2011, 8:37:13 AM