algebra              package:relations              R Documentation

_R_e_l_a_t_i_o_n_a_l _A_l_g_e_b_r_a

_D_e_s_c_r_i_p_t_i_o_n:

     Various "relational algebra"-like operations.

_U_s_a_g_e:

     relation_projection(x, margin = NULL)
     relation_selection(x, subset)
     relation_cartesian(x, y, ...)
     relation_union(x, y, ...)
     relation_complement(x, y)
     relation_intersection(x, y)
     relation_symdiff(x, y)
     relation_division(x, y)
     relation_remainder(x, y)
     relation_join(x, y, ...)
     relation_semijoin(x, y, ...)
     relation_antijoin(x, y, ...)

_A_r_g_u_m_e_n_t_s:

    x, y: Relation objects.

  margin: Either a character vector of domain names, or an integer
          vector of domain indices.

  subset: Expression resulting in a logical vector of length equal to
          the number of tuples in the graph.

     ...: For 'relation_cartesian' and 'relation_union': relation
          objects; else: parameters passed to 'merge'.

_D_e_t_a_i_l_s:

     These functions provide functionality similar to the corresponding
     operations defined in relational algebra theory as introduced by
     Codd (1970).  Note, however, that domains in database relations,
     unlike the concept of relations we use here, are unordered.  A
     database relation ("table") is defined as a set of elements called
     "tuples", whose components are named but unordered.  A "tuple" in
     this sense is thus a mapping from the attribute names into the
     union of the attribute domains.

     The _projection_ of a relation on a specified margin (i.e., a
     vector of domain names or indices) is the relation obtained when
     all tuples are restricted to this margin. As a consequence,
     duplicate tuples are removed.
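
     A minimal base R sketch of this idea, using a plain data frame
     in place of a relation object. It only illustrates the semantics
     and is not the package's implementation; the name 'projected' is
     chosen here for illustration.

```r
Person <- data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"),
                     Age = c(34, 28, 29, 54, 34),
                     Weight = c(80, 64, 70, 54, 80),
                     stringsAsFactors = FALSE)

# Restrict every tuple to the margin, then drop the duplicates
# that the restriction creates.
margin <- c("Age", "Weight")
projected <- unique(Person[, margin, drop = FALSE])

nrow(Person)     # 5 tuples before projection
nrow(projected)  # 4 tuples: Harry and Peter both map to (34, 80)
```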

     The _selection_ of a relation is the relation obtained by taking a
     subset of the relation graph, defined by some logical expression.
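
     The following base R sketch mimics a selection on a plain data
     frame: the logical expression is evaluated against the columns
     and yields one value per tuple. It is an analogy under that
     assumption, not the code used by 'relation_selection'.

```r
Person <- data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"),
                     Age = c(34, 28, 29, 54, 34),
                     Weight = c(80, 64, 70, 54, 80),
                     stringsAsFactors = FALSE)

# Evaluate the condition with the columns in scope; the result is a
# logical vector with one entry per tuple.
keep <- with(Person, Age >= 34)
selected <- Person[keep, , drop = FALSE]

length(keep)    # 5, one entry per tuple
selected$Name   # "Harry" "Helena" "Peter"
```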

     The _cartesian product_ of two relations is obtained by building
     the cartesian product of their graph elements and combining each
     resulting pair into a single tuple.
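
     As a rough illustration on plain data frames, base R's 'merge'
     returns the cartesian product when no columns are matched
     ('by = NULL'). The data frames 'Size' and 'Color' are made up
     for this sketch and do not appear elsewhere on this page.

```r
# Hypothetical one-column tables, used only for illustration.
Size  <- data.frame(Size = c("S", "M", "L"), stringsAsFactors = FALSE)
Color <- data.frame(Color = c("red", "blue"), stringsAsFactors = FALSE)

# With no matching columns, every row of Size is paired with every
# row of Color, and each pair becomes one combined row.
product <- merge(Size, Color, by = NULL)

dim(product)    # 6 rows (3 * 2), 2 columns
names(product)  # "Size" "Color"
```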

     The _union_ of two relations simply combines the graph elements of
     both relations; the _complement_ of two relations X and Y removes
     the tuples of Y from X.

     The _intersection_ (_symmetric difference_) of two relations is
     the relation with all tuples they have (do not have) in common.
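
     The four set operations above can be sketched in base R by
     treating each row of a data frame as one tuple. The helper
     'key' is hypothetical; it encodes a row as a single string so
     that rows can be compared as set elements. This is an
     illustration of the semantics only.

```r
Person <- data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"),
                     Age = c(34, 28, 29, 54, 34),
                     Weight = c(80, 64, 70, 54, 80),
                     stringsAsFactors = FALSE)
R2 <- Person[Person$Age >= 34, ]             # Harry, Helena, Peter
R3 <- Person[Person$Age == Person$Weight, ]  # Helena

# Hypothetical helper: one string per tuple.
key <- function(df) do.call(paste, c(df, sep = "\r"))

uni   <- unique(rbind(R2, R3))                 # union
inter <- R2[key(R2) %in% key(R3), ]            # intersection
compl <- Person[!key(Person) %in% key(R2), ]   # Person without R2
symd  <- rbind(R2[!key(R2) %in% key(R3), ],    # symmetric difference
               R3[!key(R3) %in% key(R2), ])

sapply(list(union = uni, intersection = inter,
            complement = compl, symdiff = symd), nrow)
# union 3, intersection 1, complement 2, symdiff 2
```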

     The _division_ of relation X by relation Y is the reversed
     cartesian product. The result is a relation with the domain unique
     to X and containing the maximum number of tuples which, multiplied
     by Y, are contained in X. The _remainder_ of this operation is the
     complement of X and the division of X by Y. Note that for both
     operations, the domain of Y must be contained in the domain of X.
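
     A base R sketch of division on plain data frames, using the
     data from the Examples. The remainder is computed here under
     the assumption that it consists of the tuples of X not covered
     by the quotient multiplied by Y; that reading, and the names
     'quotient', 'covered' and 'key', are assumptions of this sketch
     rather than a statement about the package's code.

```r
Completed <- data.frame(Student = c("Fred", "Fred", "Fred", "Eugene",
                                    "Eugene", "Sara", "Sara"),
                        Task = c("Database1", "Database2", "Compiler1",
                                 "Database1", "Compiler1", "Database1",
                                 "Database2"),
                        stringsAsFactors = FALSE)
DBProject <- data.frame(Task = c("Database1", "Database2"),
                        stringsAsFactors = FALSE)

# Division: keep the students whose tasks include every task in DBProject.
done <- split(Completed$Task, Completed$Student)
covers <- vapply(done, function(tasks) all(DBProject$Task %in% tasks),
                 logical(1))
quotient <- data.frame(Student = names(covers)[covers],
                       stringsAsFactors = FALSE)
quotient$Student  # "Fred" "Sara"

# Remainder (assumed reading): tuples of Completed that are not in
# quotient x DBProject.
covered <- merge(quotient, DBProject, by = NULL)
key <- function(df) paste(df$Student, df$Task, sep = "\r")
remainder <- Completed[!key(Completed) %in% key(covered), ]
nrow(remainder)   # 3
```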

     The (natural) _join_ of two relations is their cartesian product,
     restricted to the subset where the elements of the common
     attributes match. The left/right/full outer join of two relations
     X and Y is the union of their inner join with X/Y/both X and Y.
     The implementation uses 'merge', so the left/right/full outer
     joins are obtained by setting 'all.x'/'all.y'/'all' to 'TRUE' in
     'relation_join'. The domains to be matched are specified using
     'by'.
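
     Since the arguments are passed on to 'merge', the effect of
     'by', 'all.x', 'all.y' and 'all' can be previewed on plain data
     frames. The sketch below calls base R's 'merge' directly on the
     data from the Examples; it does not use relation objects.

```r
Employee <- data.frame(Name = c("Harry", "Sally", "George", "Harriet", "John"),
                       EmpId = c(3415, 2241, 3401, 2202, 3999),
                       DeptName = c("Finance", "Sales", "Finance", "Sales", "N.N."),
                       stringsAsFactors = FALSE)
Dept <- data.frame(DeptName = c("Finance", "Sales", "Production"),
                   Manager = c("George", "Harriet", "Charles"),
                   stringsAsFactors = FALSE)

# Inner join: only tuples whose DeptName occurs in both tables.
inner <- merge(Employee, Dept, by = "DeptName")
# Outer joins: unmatched tuples are kept and padded with NA.
left  <- merge(Employee, Dept, by = "DeptName", all.x = TRUE)
right <- merge(Employee, Dept, by = "DeptName", all.y = TRUE)
full  <- merge(Employee, Dept, by = "DeptName", all = TRUE)

sapply(list(inner = inner, left = left, right = right, full = full), nrow)
# inner 4, left 5, right 5, full 6
```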

     The left (right) _semijoin_ of two relations X and Y is the join
     of these, projected to the attributes of X (Y). Thus, it yields
     all tuples of X (Y) participating in the join of X and Y. 

     The left (right) _antijoin_ of two relations X and Y is the
     complement of X (Y) and the join of both, projected to the
     attributes of X (Y). Thus, it yields all tuples of X (Y) _not_
     participating in the join of X and Y.
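
     For the common case of a single shared attribute, semijoins and
     antijoins can be sketched in base R as a membership test on
     that attribute. This is a simplified illustration on plain data
     frames (the variable names are chosen here), not the package's
     implementation.

```r
Employee <- data.frame(Name = c("Harry", "Sally", "George", "Harriet", "John"),
                       EmpId = c(3415, 2241, 3401, 2202, 3999),
                       DeptName = c("Finance", "Sales", "Finance", "Sales", "N.N."),
                       stringsAsFactors = FALSE)
Dept <- data.frame(DeptName = c("Finance", "Sales", "Production"),
                   Manager = c("George", "Harriet", "Charles"),
                   stringsAsFactors = FALSE)

# A tuple takes part in the join if its DeptName occurs in the other table.
emp_matched  <- Employee$DeptName %in% Dept$DeptName
dept_matched <- Dept$DeptName %in% Employee$DeptName

left_semijoin  <- Employee[emp_matched, ]   # Harry, Sally, George, Harriet
left_antijoin  <- Employee[!emp_matched, ]  # John
right_semijoin <- Dept[dept_matched, ]      # Finance, Sales
right_antijoin <- Dept[!dept_matched, ]     # Production
```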

_R_e_f_e_r_e_n_c_e_s:

     E. F. Codd (1970). A relational model of data for large shared
     data banks. _Communications of the ACM_, *13*(6), 377-387.

_S_e_e _A_l_s_o:

     'relation'

_E_x_a_m_p_l_e_s:

     ## projection
     Person <-
         data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"),
                    Age = c(34, 28, 29, 54, 34),
                    Weight = c(80, 64, 70, 54, 80),
                    stringsAsFactors = FALSE)
     Person <- as.relation(Person)
     relation_table(Person)
     relation_table(relation_projection(Person, c("Age", "Weight")))

     ## selection
     relation_table(R1 <- relation_selection(Person, Age < 29))
     relation_table(R2 <- relation_selection(Person, Age >= 34))
     relation_table(R3 <- relation_selection(Person, Age == Weight))

     ## union
     relation_table(R1 %U% R2)

     ## works only for the same domains:
     relation_table(R2 | R3)

     ## complement
     relation_table(Person - R2)

     ## intersection
     relation_table(relation_intersection(R2, R3))

     ## works only for the same domains:
     relation_table(R2 & R3)

     ## symmetric difference
     relation_table(relation_symdiff(R2, R3))

     ## cartesian product
     Employee <-
         data.frame(Name = c("Harry", "Sally", "George", "Harriet", "John"),
                    EmpId = c(3415, 2241, 3401, 2202, 3999),
                    DeptName = c("Finance", "Sales", "Finance", "Sales", "N.N."),
                    stringsAsFactors = FALSE)
     Employee <- as.relation(Employee)
     relation_table(Employee)
     Dept <- data.frame(DeptName = c("Finance", "Sales", "Production"),
                        Manager = c("George", "Harriet", "Charles"),
                        stringsAsFactors = FALSE)
     Dept <- as.relation(Dept)
     relation_table(Dept)

     relation_table(Employee %><% Dept)

     ## natural join
     relation_table(Employee %|><|% Dept)

     ## left (outer) join
     relation_table(Employee %=><% Dept)

     ## right (outer) join
     relation_table(Employee %><=% Dept)

     ## full outer join
     relation_table(Employee %=><=% Dept)

     ## antijoin
     relation_table(Employee %|>% Dept)
     relation_table(Employee %<|% Dept)

     ## semijoin
     relation_table(Employee %|><% Dept)
     relation_table(Employee %><|% Dept)

     ## division
     Completed <-
         data.frame(Student = c("Fred", "Fred", "Fred", "Eugene",
                                "Eugene", "Sara", "Sara"),
                    Task = c("Database1", "Database2", "Compiler1",
                             "Database1", "Compiler1", "Database1",
                             "Database2"),
                    stringsAsFactors = FALSE)
     Completed <- as.relation(Completed)
     relation_table(Completed)
     DBProject <- data.frame(Task = c("Database1", "Database2"),
                             stringsAsFactors = FALSE)
     DBProject <- as.relation(DBProject)
     relation_table(DBProject)

     relation_table(Completed %/% DBProject)

     ## division remainder
     relation_table(Completed %% DBProject)

