LCS                  package:qualV                  R Documentation

_A_l_g_o_r_i_t_h_m _f_o_r _t_h_e _L_o_n_g_e_s_t _C_o_m_m_o_n _S_u_b_s_e_q_u_e_n_c_e _P_r_o_b_l_e_m

_D_e_s_c_r_i_p_t_i_o_n:

     Determines the longest common subsequence of two strings.

_U_s_a_g_e:

     LCS(a, b)

_A_r_g_u_m_e_n_t_s:

       a: vector (numeric or character), missing values are not
          accepted

       b: vector (numeric or character), missing values are not
          accepted

_D_e_t_a_i_l_s:

     A longest common subsequence ('LCS') is a common subsequence  of
     two strings of maximum length. The 'LCS' Problem consists of
     finding a 'LCS' of two given strings and its length ('LLCS'). The
     'QSI' is computed by division of the 'LLCS' over maximum length of
     ''a'' and ''b''.
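
     The definitions above can be illustrated with a minimal base R
     sketch. It uses the textbook dynamic programming table and is not
     taken from the package source; the function name 'llcs_sketch' and
     its return structure are hypothetical and only mirror the 'LLCS',
     'LCS' and 'QSI' components described on this page.

```r
# Hypothetical helper, NOT part of qualV: textbook dynamic programming for
# the length of a longest common subsequence (LLCS), one LCS, and the QSI.
llcs_sketch <- function(a, b) {
  stopifnot(!anyNA(a), !anyNA(b))   # missing values are not accepted
  n <- length(a)
  m <- length(b)

  # tab[i + 1, j + 1] holds the LLCS of a[1:i] and b[1:j];
  # the first row and column stand for the empty prefix.
  tab <- matrix(0L, nrow = n + 1, ncol = m + 1)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      tab[i + 1, j + 1] <- if (a[i] == b[j]) {
        tab[i, j] + 1L
      } else {
        max(tab[i, j + 1], tab[i + 1, j])
      }
    }
  }

  # Walk back through the table to recover one (of possibly several) LCS.
  idx <- integer(0)
  i <- n
  j <- m
  while (i > 0 && j > 0) {
    if (a[i] == b[j]) {
      idx <- c(i, idx)
      i <- i - 1
      j <- j - 1
    } else if (tab[i, j + 1] >= tab[i + 1, j]) {
      i <- i - 1
    } else {
      j <- j - 1
    }
  }

  llcs <- tab[n + 1, m + 1]
  list(LLCS = llcs, LCS = a[idx], QSI = llcs / max(n, m))
}

res <- llcs_sketch(c("b", "c", "a", "b", "c", "b"),
                   c("a", "b", "c", "c", "b"))
res$LLCS  # 4
res$QSI   # 4 / 6, since the longer input has 6 elements
```

     Several subsequences of length 4 exist for these inputs, so which
     one the sketch returns depends on how ties are broken while
     walking back through the table.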

_V_a_l_u_e:

       a: vector ''a''

       b: vector ''b''

    LLCS: length of 'LCS'

     LCS: longest common subsequence

     QSI: quality similarity index

      va: one possible 'LCS' of vector ''a''

      vb: one possible 'LCS' of vector ''b''

_N_o_t_e:

     Basing on the most prominent but simple calculation scheme this
     algorithm is not very efficient with respect to its time and
     memory requirements.
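
     To make the memory cost concrete: a simple scheme of the
     Wagner-Fischer kind fills a table with (length(a) + 1) *
     (length(b) + 1) cells. The following sketch, which is an
     illustration and not how the package is implemented, keeps only
     two rows of that table. It assumes that only the 'LLCS' (and hence
     the 'QSI') is needed, because the subsequence itself cannot be
     recovered without the full table. The name 'llcs_two_rows' is
     hypothetical. Running time is still proportional to length(a) *
     length(b).

```r
# Hypothetical helper, NOT part of qualV: LLCS with two table rows only.
llcs_two_rows <- function(a, b) {
  stopifnot(!anyNA(a), !anyNA(b))
  m <- length(b)
  # prev[j + 1]: LLCS of the prefix of a processed so far and b[1:j]
  prev <- integer(m + 1)
  for (i in seq_along(a)) {
    cur <- integer(m + 1)
    for (j in seq_len(m)) {
      cur[j + 1] <- if (a[i] == b[j]) {
        prev[j] + 1L
      } else {
        max(prev[j + 1], cur[j])
      }
    }
    prev <- cur
  }
  prev[m + 1]
}

a_demo <- c("b", "c", "a", "b", "c", "b")
b_demo <- c("a", "b", "c", "c", "b")
llcs_demo <- llcs_two_rows(a_demo, b_demo)
qsi_demo  <- llcs_demo / max(length(a_demo), length(b_demo))
```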

_R_e_f_e_r_e_n_c_e_s:

     Wagner, R. A. and Fischer, M. J. (1974) The String-to-String
     Correction Problem. Journal of the ACM, 21, 168-173.

     Paterson, M. and Dančík, V. (1994) Longest Common Subsequences.
     Mathematical Foundations of Computer Science, 841, 127-142.

     Gusfield, D. (1997) Algorithms on Strings, Trees, and Sequences:
     Computer Science and Computational Biology. Cambridge University
     Press, England, ISBN 0-521-58519-8.

_E_x_a_m_p_l_e_s:

     # direct use
     a <- c("b", "c", "a", "b", "c", "b")
     b <- c("a", "b", "c", "c", "b")
     LCS(a, b)

     # a constructed example
     x <- seq(0, 2 * pi, 0.1)  # time
     y <- 5 + sin(x)           # a process
     o <- y + rnorm(length(x), sd = 0.2) # observation with random error
     p <- y + 0.1              # simulation with systematic bias
     plot(x, o); lines(x, p)

     lcs <- LCS(f.slope(x, o), f.slope(x, p))  # too much noise
     lcs$LLCS
     lcs$QSI

     os <- ksmooth(x, o, kernel = "normal", bandwidth = dpill(x, o), x.points = x)$y
     lcs <- LCS(f.slope(x, os), f.slope(x, p))
     lcs$LLCS
     lcs$QSI

     # observed and simulated data with non-matching time intervals
     data(phyto)
     bbobs    <- dpill(obs$t, obs$y)
     n        <- tail(obs$t, n = 1) - obs$t[1] + 1
     obsdpill <- ksmooth(obs$t, obs$y, kernel = "normal", bandwidth = bbobs,
                         n.points = n)
     obss     <- data.frame(t = obsdpill$x, y = obsdpill$y)
     obss     <- obss[match(sim$t, obss$t),]
     obs_f1   <- f.slope(obss$t, obss$y)
     sim_f1   <- f.slope(sim$t, sim$y)
     lcs      <- LCS(obs_f1, sim_f1)
     lcs$QSI

