Johnny Deng's Column: Relational Algebra

Operations for relational database include: project, restrict (select), product, union, difference, join, intersect, divide. In SQL, the 'project' or 'projection' operation works by just specifying a subset of the attributes and a 'restrict' operation works with a SELECT command with a WHERE clause. More generally, a 'query' combines several operations.

l SELECT. RESTRICTS the rows chosen from a table to those entries with specified attribute values.

SELECT item

FROM stock_level

WHERE quantity > 100

constructs a new, logical table - an unnamed relation - with one column per row (i.e. item) containing all rows from stock_level that satisfy the WHERE clause.

l PROJECT. Selects rows made up of a sub-set of columns from a table.

produces a new logical table where each row contains only two columns - item and description. The new table will only contain distinct rows from stock_item; i.e. any duplicate rows so formed will be eliminated.

l JOIN. Associates entries from two tables on the basis of matching column values.

It is not necessary for there to be a one-to-one relationship between entries in two tables to be joined - entries which do not match anything will be eliminated from the result, and entries from one table which match several entries in the other will be duplicated the required number of times. The above definition is actually that of a NATURAL or EQUI-JOIN - i.e. a join in which the values of the matching columns are equal. Outer join - expands the natural join by making sure that every record from both relations is listed in the join relation at least once, that is, it includes all rows from each table, even unmatched rows, where columns are filled in with nulls.There are two steps: (1) A natural join is executed. (2) Then if any record in one relation does not match a record from the other relation in the natural join, that unmatched record is added to the join relation, and the additional columns are filled with nulls.It has become normal to extend join to include other comparison operators such as less than, greater than, etc. It is important to be clear about one's intentions here to obtain meaningful results. Join is obviously a very general operation, and the principal source of processing power in relational systems, but it is also costly in time and space. Because no ordering can be guaranteed, a join may require a comparison of every entry in one table with every entry in the other, and create large intermediate results. That is why users of large-scale data bases, while acknowledging the power and flexibility of the relational approach, were slow to adopt it instead of methods based on more efficient file processing techniques.

l PRODUCT. Builds a relation from two specified relations consisting of all possible combinations of rows, one from each of the two relations.

l UNION. Builds a relation consisting of all rows appearing in either or both of the two relations.

l INTERSECT. Builds a relation consisting of all rows appearing in both of the two relations.

l DIFFERENCE. Builds a relation consisting of all rows appearing in the first and not in the second of the two relations.

l DIVIDE.Takes two relations, one binary and one unary, and builds a relation consisting of all values of one column of the binary relation that match, in the other column, all values in the unary relation.

Integrity rules

l entity integrity: no duplicate tuples;

l referential integrity: references only to indexes or keys that exist;

l domain integrity: domain types include date, currency (dollar), etc;

l column integrity: constraints on a column or attribute;

l user-defined integrity: constraints among different columns or attributes.

SQL components.

l Data definition language: statements (commands) used to create, modify and delete columns and keys, etc.

l Data manipulation language: statements (commands) used to add, change, retrieve and delete data in tables, or create views;

l Transaction management language: statements to track the changes to a database;

l Data control language: statements to grant or restrict authorization privileges to users (for database manager only).

Johnny Deng's Column

Saturday, 11 August 2007

Relational Algebra

No comments:

Site Search

Blog Archive

Who am I?

Access History