cmx             package:PresenceAbsence             R Documentation

_C_o_n_f_u_s_i_o_n _M_a_t_r_i_x

_D_e_s_c_r_i_p_t_i_o_n:

     'cmx' calculates the confusion matrix for a single model.

_U_s_a_g_e:

     cmx(DATA, threshold = 0.5, which.model = 1, na.rm = FALSE)

_A_r_g_u_m_e_n_t_s:

    DATA: a matrix or data frame of observed and predicted values where
          each row represents one plot and where columns are:

                  DATA[,1]  plot ID                                            text
                  DATA[,2]  observed values                                    zero-one values
                  DATA[,3]  predicted probabilities from first model           numeric (between 0 and 1)
                  DATA[,4]  predicted probabilities from second model, etc...  

threshold: a single cutoff value between zero and one used for
          translating predicted probabilities into 0/1 values.
          Defaults to 0.5.

which.model: a number indicating which model from 'DATA' should be
          used, defaults to 1 (the first model)

   na.rm: a logical indicating whether missing values should be
          removed, defaults to 'FALSE'

_D_e_t_a_i_l_s:

     'cmx' calculates the confusion matrix for a single model at a
     single threshold. 

     If 'DATA' contains predictions from more than one model,
     'which.model' can be used to specify which model should be used.
     If 'which.model' is not given, 'cmx' will use predictions from
     the first model by default.
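
     Given the column layout listed under Arguments, model number
     'which.model' corresponds to column 'which.model + 2' of 'DATA'.
     The following is a small base-R sketch of that lookup on made-up
     toy data; the names 'toy' and 'pred' are illustrative, and the
     indexing is inferred from the column layout rather than taken
     from the package source.

```r
# Toy data following the documented layout:
# column 1 = plot ID, column 2 = observed, columns 3+ = model predictions
toy <- data.frame(plotID   = c("a", "b", "c"),
                  Observed = c(1, 0, 1),
                  Model1   = c(0.9, 0.2, 0.6),
                  Model2   = c(0.4, 0.1, 0.8))

which.model <- 2

# Predictions for the chosen model sit two columns past the model number
pred <- toy[, which.model + 2]
pred
```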

     When calculating the confusion matrix, any plot with a predicted
     probability greater than 'threshold' is considered to be predicted
     'Present', while any plot with a predicted probability less than
     or equal to 'threshold' is considered to be predicted 'Absent'.
     The only exception is when 'threshold' equals zero. In that case,
     all plots are considered to be predicted 'Present'.
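
     The thresholding rule above can be sketched in base R as follows.
     'classify_at_threshold' is a hypothetical helper written for
     illustration only; it is not part of 'PresenceAbsence' and is not
     claimed to match the internals of 'cmx'.

```r
# Hypothetical helper: turns predicted probabilities into 0/1 predictions
# using the rule described in the text.
classify_at_threshold <- function(pred, threshold = 0.5) {
  # threshold must be a single number between zero and one
  stopifnot(length(threshold) == 1, threshold >= 0, threshold <= 1)
  if (threshold == 0) {
    # Special case: every plot is predicted Present
    return(rep(1L, length(pred)))
  }
  # Strictly greater than threshold -> Present (1), otherwise Absent (0)
  as.integer(pred > threshold)
}

pred <- c(0, 0.2, 0.5, 0.51, 1)
classify_at_threshold(pred, 0.5)
classify_at_threshold(pred, 0)
```

     Note that a prediction exactly equal to the threshold is treated
     as 'Absent', and that without the special case a prediction of
     exactly zero would remain 'Absent' at a threshold of zero.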

     Unlike in other functions in this package, 'threshold' cannot be
     a vector or an integer greater than one. Instead, 'threshold'
     must be given as a single number between zero and one.

     If 'na.rm' equals 'FALSE' and 'NA' values are present in 'DATA',
     the function will return 'NA'.

     If 'na.rm' equals 'TRUE' and 'NA' values are present in 'DATA',
     the function will remove every row that contains an 'NA' in any
     column. The function will also print the number of rows that
     have been removed.
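
     The row removal described for 'na.rm = TRUE' can be approximated
     in base R with 'complete.cases'. The helper 'drop_na_rows' and
     the wording of its message are assumptions made for this sketch;
     the message actually printed by 'cmx' may differ.

```r
# Hypothetical helper: drops every row that contains at least one NA
# and reports how many rows were dropped.
drop_na_rows <- function(DATA) {
  keep <- complete.cases(DATA)
  message(sum(!keep), " rows removed because they contain NA")
  DATA[keep, , drop = FALSE]
}

toy <- data.frame(plotID    = 1:4,
                  Observed  = c(1, 0, NA, 1),
                  Predicted = c(0.9, NA, 0.3, 0.6))

clean <- drop_na_rows(toy)
nrow(clean)
```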

_V_a_l_u_e:

     The confusion matrix is returned in the form of a table where:

 columns: observed values

    rows: predicted values
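
     A table with this orientation can be built by hand with base R's
     'table'. This is only a sketch on made-up data: the ordering of
     the levels (1 before 0) and the dimension names 'predicted' and
     'observed' are assumptions for illustration, not confirmed
     details of the object returned by 'cmx'.

```r
obs  <- c(1, 1, 0, 0, 0, 1)               # observed presence/absence
pred <- c(0.9, 0.4, 0.2, 0.7, 0.1, 0.8)   # predicted probabilities

# Apply a threshold of 0.5: strictly greater means predicted Present
pred01 <- as.integer(pred > 0.5)

# Rows are predicted values, columns are observed values
tab <- table(predicted = factor(pred01, levels = c(1, 0)),
             observed  = factor(obs,    levels = c(1, 0)))
tab
```

     Here 'tab["1", "0"]' counts plots predicted 'Present' but observed
     'Absent', and 'tab["0", "1"]' counts the reverse.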

_A_u_t_h_o_r(_s):

     Elizabeth Freeman eafreeman@fs.fed.us

_R_e_f_e_r_e_n_c_e_s:

_S_e_e _A_l_s_o:

     'pcc', 'sensitivity', 'specificity', 'Kappa'

_E_x_a_m_p_l_e_s:

     ### EXAMPLE 1 ###
          ### generate simulated data ###
          set.seed(666)
          N=1000
          SIMDATA<-matrix(0,N,3)
          SIMDATA<-as.data.frame(SIMDATA)
          names(SIMDATA)<-c("plotID","Observed","Predicted")
          SIMDATA$plotID<-1:N
          SIMDATA$Observed<-rbinom(n=N,size=1,prob=.2)
          ### presences get predictions centered on 0.8, absences on 0.2 ###
          SIMDATA$Predicted[SIMDATA$Observed==1]<-rnorm(n=sum(SIMDATA$Observed==1),mean=.8,sd=.15)
          SIMDATA$Predicted[SIMDATA$Observed==0]<-rnorm(n=sum(SIMDATA$Observed==0),mean=.2,sd=.15)
          ### rescale predictions to the interval [0,1] ###
          SIMDATA$Predicted<-(SIMDATA$Predicted-min(SIMDATA$Predicted))/(max(SIMDATA$Predicted)-min(SIMDATA$Predicted))

          ### plot simulated data
          hist(SIMDATA$Predicted,100)

          ### calculate confusion matrix ###
          cmx(SIMDATA)

     ### EXAMPLE 2 ###

          data(SIM3DATA)

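          ### first model, default threshold of 0.5 ###
          cmx(SIM3DATA)
          ### second model, default threshold of 0.5 ###
          cmx(SIM3DATA,which.model=2)
          ### third model, threshold of 0.2 ###
          cmx(SIM3DATA,which.model=3,threshold=.2)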

