Relational Model

Primary construct of relational model is:

  • Database = set of named relations(or tables)
    Each relation has a set of named attributes(or columns)
    Each tuple (or row) has a value for each attribute
    Each attributes has a type
  • Schema - structural description of relations in database.
  • Instance - actual contents at given point in time.
  • NULL - special value for “unknown” or “undefined”.
  • KEY - attribute whose value is unique for each tuple.

Intro To SQL

Data Definition Language

Contains command to create table and drop table

Data Manipulation Language

Contains command to query and modify database.

SELECT Statement

It consists of 3 basic clauses: Select, From, Where.

Table Variables & Set Operators

Table variables are easy way to refer to table/relations used within FROM clause in SELECT or WHERE clause.
Example

SELECT A.id, B.name  
FROM STUDENT A, COLLEGE B  
WHERE A.cName = B.cName  

There are 3 SET Operators: Union, Intersect & Except.

  • Union: It will return the union of data of two SQL query. The reference table of both queries can be same.
SELECT cname from College  
UNION  
SELECT sName from Student  
  • Intersect: It will return intersection or common data of two SQL query.
SELECT sId from Apply where major = 'CS'  
UNION  
SELECT sId from Apply where major = 'EE'  
  • Except: It will return data from first query which are not part of/returned in 2nd query.
SELECT sId from Apply where major = 'CS'  
EXCEPT  
SELECT sId from Apply where major = 'EE'  

In the above example, data returned will be student ids who applied to ‘CS’ as major and did not apply to ‘EE’ as major.

Subqueries

Subqueries are nested Select statements used within WHERE, SELECT & FROM clause.

WHERE Clause

Subquery in WHERE clause is also called nested query. This subquery is executed first and then used to filter data of main/outer query. Correlated subquery references outer query and it is executed once for each row of outer query.

SELECT GPA FROM STUDENT AS S WHERE S.cId IN (SELECT id FROM COLLEGE AS C WHERE C.name = 'MIT') 
FROM Clause

Subquery within FROM clause is also called derived table. Subquery within FROM clause should have a explicit alias. Subquery is executed first before outer query is executed.

SELECT * FROM
(SELECT S.name, S.id, (S.GPA * (S.sizeHs/1000) - S.GPA) as scaledGPA FROM STUDENT AS S) AS S WHERE S.scaledGPA > 3
SELECT CLAUSE

Subquery in SELECT clause can only return data with one column.The subquery is executed for each row processed by the main query. This can lead to performance issues if the subquery is complex and the main query processes a large number of rows.

SELECT e1.employee_id, (SELECT MAX(e2.salary) FROM employees e2 WHERE e2.department_id = e1.department_id) AS max_salary FROM employees e1; 

The query will return employee id and avg salary of department that the employee belongs to.

JOIN Family of Operators

Types of JOIN
  • Inner Join Returns records that have matching values in both tables. Result set includes only rows where the join condition is met
    • with On condition The ON keyword is followed by join condition that specifies how the tables should be related. Join conditions can use any columns from joined table and can use complex expressions.
    SELECT *
    FROM table1
    INNER JOIN table2
    ON table1.column_name = table2.column_name;
    • with USING(attributes/column_name) The USING keyword is followed by list of column names which exists in both tables. The join condition is implicitly based on equality of specified columns. USING is not supported in all browsers while ON condition is universally supported.
    SELECT *
    FROM table1
    INNER JOIN table2
    USING (column_name);
  • Natural JOIN Joins tables based on columns with the same name and compatible data types.Implicit join condition is equality of all columns with the same name. Removes duplicate columns with the same name from the result set.
	SELECT *
	FROM customers
	NATURAL JOIN orders;
  • Left Outer Join Returns all records from the left table, and matched records from the right table. Result set includes all rows from left table with NULL values for unmatched right table columns.
    SELECT customers.customer_id, customers.name, orders.order_id, orders.order_date
    FROM customers
    LEFT JOIN orders ON customers.customer_id = orders.customer_id;

- Right Outer Join
	  Returns all records from right table and matched records from left table. Result set includes all rows from right table with NULL values for unmatched left columns.
```sql
	SELECT orders.order_id, orders.order_date, customers.customer_id, customers.name
	FROM orders
	RIGHT JOIN customers ON orders.customer_id = customers.customer_id;
  • Full Outer Join Returns all records when there is match in left or right table. Combines results of both left join and right join. Result set contains all rows from both tables, with NULL values for unmatched columns.
	SELECT customers.customer_id, customers.name, orders.order_id, orders.order_date
	FROM customers
	FULL OUTER JOIN orders ON customers.customer_id = orders.customer_id;
  • Cross Joins Returns Cartesian product of the sets of records from joined tables. Matches every row from the first table with every row with second table.
	SELECT color, size
	FROM colors
	CROSS JOIN sizes;
	SELECT color, size
	FROM colors, sizes;
Commutativity and Associativity
  • Commutativity It refers to the property where order of tables in JOIN operation can be swapped without changing the result. A op B = B op A . Left and Right Outer join do not satisfy commutativity property.
  • Associativity It refers to grouping of three of more tables in JOIN operation can be changed without changing the result set. A op (B op C) = (A op B) op C. It does not hold for outer joins.

Aggregation

Aggregate functions perform computations over rows of a relation. Along with aggregation function two new clauses can be used: GROUP BY and HAVING. GROUP BY allows us to partition rows in groups and then aggregate function can compute over each group independently. HAVING clause filters groups after they have been created by group by.

Databases: Advanced topics In SQL

Index And Transactions

Indexes

Indexes are persistent data structure, stored in database. It’s primary mechanism is to improve performance on a database. Underlying Data Structure:

  • Balanced Trees(B trees, B+ trees) For A=V, A< V, A (log N time complexity)
  • Hash Tables For A = V operations (constant time complexity)
Downside of Indexes:
  • Needs extra space (marginal downside since storage is not a big issue).
  • Overhead of index creation. When database is loaded initially indexes needs to be created which can be a somewhat time consuming operation.
  • Index maintenance: When data is updated/mutated in database, the index has to be modified to reflect those changes. This can increase time for update operation and if database is write heavy, it will offset benefits of indexes.
Benefit of an index depends on:
  • Size of table and its layout.
  • Data distribution.
  • Query vs Update load.
SQL Syntax
Create Index IndexName on T(A) -- create index on single column
Create Index IndexName on T(A1,A2,…,An) -- create index on multiple columns
Create Unique Index IndexName on T(A) -- Add uniqueness constraint on column for whom index is being created
Drop Index IndexName -- Delete index
Transactions

A transaction is a sequence of one or more SQL operations treated as a unit. Transactions appear to run in isolation. If system fail, entire transaction’s changes will either be reflected or it won’t. Transactions are solution to concurrency problems and system failure problems.

Properties (ACID)
  • Isolation Isolation property ensures that intermediate state of transaction is hidden from concurrently running transactions. Isolation is implemented by serializability. Serializability says that operations may be interleaved but the execution must be in some sequential order of execution.
  • Durability If system crashes after transaction is commits then when system comes back up the changes are reflected in the database.
  • Atomicity This property ensures that all transactions are commited all or nothing. That means if system crashed in between a transaction then when system comes back up the partial changes of that transaction are roll-backed and the state of database will be same as before the transaction started.
  • Consistency This property ensures that integrity constraints of database which holds true before the transaction holds true after the transaction i.e the change of state of database due to transaction satisfies the integrity constraints of database. Integrity constraints are specification of legal states of database or predefined rules of database.
Isolation Levels

Serializability has overhead and it reduces concurrency. DBMS do offer weaker isolation levels (ordered weaker stronger): Read Uncommitted, Read Committed, Repeatable Read. These isolation levels have lower overhead and higher concurrency but at the cost of lower consistency guarantee.

Dirty Reads: Read is dirty when transaction reads a uncommitted value that was modified by a different transaction.

  • Read Uncommitted A transaction may perform dirty reads. For example:

    UPDATE Student Set GPA = (1.1) * GPA Where sizeHS > 2500 -- T1
    -- concurrent with
    SET Transaction ISOLATION Level Read Uncommited; -- T2
    Select Avg(GPA) FROM Student;

    In the above example, T1 is serializable transaction and T2 is Read Uncommitted Transaction and both are executed concurrently. So T1 & T2 don’t follow a sequential order and T2 might read Student data modified by T1 before T1 is commited and it might be possible that T1 might get roll-backed, making data read by T1 inaccurate.

  • Read Committed Does not allow transaction to perform dirty read but does not guarantee global serializabilty. Transaction can read data committed by other transactions. For example:

    UPDATE Student Set GPA = (1.1) * GPA Where sizeHS > 2500 -- T1
    -- concurrent with
    SET Transaction ISOLATION Level Read Commited; -- T2
    Select Avg(GPA) From Student;
    Select Max(GPA) From Student;

    In the above example it is possible that The Avg statement/operation is performed before update query is committed and max operation/statement is performed after the update query is committed. So Avg operation will not take into account increased GPA but Max operation will take increased GPA into account.

  • Repeatable Read A transaction cannot do dirty read and if item is read multiple times, it cannot change the value of item between multiple reads It does not guarantee global serializabilty. So for previous example, if T2 was repeatable read transaction the GPA value for both Avg & Max would have been the same. There is one caveat with repeatable read, it allows a relation to change its value multiple times through Phantom Tuples.

    Insert Into Student [ 100 new tuples ] -- T1
    -- concurrent-with
    --T2
    Set Transaction Isolation Level Repeatable Read;
    Select Avg(GPA) From Student;
    Select Max(GPA) From Student;

    For the above example, Avg operation takes place before insert and max operation takes place after insert. Although GPA should have returned same value, the second operation will take into account newly inserted data as the query has same data but different rows are being selected due to insertion of data. This read is called phantom read. The rows or tuples which were not present in the initial read and are present in the next read are called phantom tuples Transactions can be set Read-only which helps systems optimize performance.

Constraints And Triggers

Integrity Constraints

Impose restrictions on allowable data, beyond those imposed by structure and types. Examples: 0.0 GPA 4.0, enrolment < 75000. Why use them ?

  • Data entry errors (inserts)
  • correctness criteria (updates)
  • enforce consistency Classification of constraints:
  • Non-NULL On setting this, column will not accept NULL as value.
create table Student (sId int, sName text, GPA real NOT NULL, sizeHs int);
  • Key

    • Primary key constraint. On setting this, column will accept only unique values and will throw error when non unique values are set or updated. Also a table can have only one primary key.
    • Unique constraint. On setting this, column will accept only unique values and tables can have multiple unique columns.
    create table Student (sId int primary key, sName text unique, GPA real, sizeHs int);
  • Referential Integrity (Foreign Key) Refers to integrity of references. Consider two tables R & S having referential integrity with there column A & B respectively then each value in column A of table R must must appear in column B of table S. A is called foreign key. B is usually required to be primary key or unique. Multi attribute foreign keys are allowed.

    Referential Integrity is enforced by Relational Databases and potential violating operation for referential integrity are (consider for two tables R and S described above):

    • Insert into R: If data is inserted into R and value of column A is not present in any row of table S in column B.
    • Update S.B: If column B of a row in table S is being updated with the same data being referenced in table R.
    • Update R.A: If column A is updated with a value not present in table S.
    • Delete S: If a row is deleted in table S which is being referenced in table R. For any update/delete in table S(including dropping/deleting table S), it is possible to not violate referential integrity if the referential integrity constraint is defined with special options then it is possible for database to perform automatic operations which does not violate the referential integrity. This special actions/options are:
    • Restrict(Default): This is default behaviour of throwing error when referential integrity is violated.
    • SET Null: When row/tuple in referenced table(table S here) are updated or deleted, the referencing tuples/rows in referencing table(table R here) are updated with NULL value on referencing columns.
    • Cascade: Similar to SET Null but on updating/deleting rows in referenced table the rows/tuples in referencing table are also updated or deleted.
  • Attribute Based Attribute/Column based constraint allow us to associate a condition with specific attribute/column and these conditions are checked when new row is created or when data in column is updated.

    create table Student(sId int, sName text, GPA real check(GPA <= 4.0 and GPA > 0.0), sizeHs int check(sizeHs < 5000))
  • Tuple Based Tuple/Row based constraint allow us to associate a relationship between different values in each tuple. It is also checked when row is added or updated.

    create table Apply(sId int, cName text, major text, decision text, check (decision = 'N' or cName <> 'Stanford' or major <> 'CS'))
  • General Assertions General Assertions are part of SQL standards but are not supported by any SQLs. General assertions are constraints defined at database level and are used to ensure certain conditions hold true across the entire database at all times. Constraints can be declared as well as enforced. We can declare constraints at the time we create tables and constraints are initially checked after bulk loading and constraints are enforced after every modification. Deferred constraint checking is checking of constraints after every transaction.

Triggers

Event-Condition-Action Rules. Monitor database,when event occurs check condition; if true, do action. Why use them:

  • Move logic from apps into DBMS.
  • To enforce constraints

Syntax in SQL Standards:

CREATE TRIGGER name
BEFORE | AFTER | INSTEAD of EVENTS -- events can be insert on T, update on T[col1, col2, ..., coln], delete on table T
 
[referencing variables] -- it provides a way to refer modified data which caused intialisation of trigger. Referencing variables can be:
-- old row as var
-- new row as var
-- old table as var
-- new table as var
-- on insert operation, only new row/table can be referenced. on delete operation,only old row/table can be referenced. on update both old and new row/table can be referenced. old/new row refers to each row whose modification caused the trigger (only happens when for each row is enabled) while old/new table refers to all rows/tuples together which were modified and not old state of database.
 
[For Each Row] -- optional keyword which will trigger action for each modified tuple/row
 
When (condition) -- check the condition on database and if true action is performed
action -- action is sql statement

Tricky Issues with TRIGGERS:

  • Row level v/s Statement level: Usage of new/old row/table data when event trigger is before & instead of modification.
  • Multiple triggers activated at same time for same event.
  • Triggers activating other triggers(chaining which includes multiple chaining, self chaining).
  • How to write trigger when conditions are involved. Add Condition in when vs in action.
  • Triggers as mentioned in the SQL standards is not implemented exactly in any of the database system. Many of the databases deviate from implementation and some of them don’t support all the Trigger options.

Views And Authorization

Views

Views represents logical layer of database.

  • View V = ViewQuery() Syntax:
Create View Vname AS <SQL-Query> -- schema of Vname will be schema of query result
Create View Vname(A1, A2,...An) AS <SQL-Query>

Created views can be referenced as table in SQL query. Views are not stored anywhere in databases. When views are used in SQL query, the query is rewritten to use base tables so that the SQL query of views replaces views itself(rewritten and flattened so that views query is used efficiently). Views can use other views.

Views Modification

Modification to views are rewritten to modify base tables. The issue with translation of modification of views to modification of base tables is there are number of ways in which the modification of base tables can be translated and some ways may not be desired modification. For example: Tables R(A,B) and S(B,C) and a view V = select A,C from R,S where R.B=S.B. Suppose R={(1,5),(2,5)} and S={(5,10)}, so V={(1,10),(2,10)}. The user wants to delete tuple (2,10) from V. This can be translated to delete (2,5) from R, update (2,5) in R to (2,6), update (2,5) in R to (1,6).

There are two approaches to modification of views:

  1. Rewriting process specified explicitly by view creator
    • Can handle all modifications.
    • No guarantee of correctness (or meaningful). This approach is enabled by TRIGGER Instead OF(In Postgres SQL Rules is similar to this)
  2. Restrict views + modifications so that translation to base table modifications is meaningful and unambiguous.
    • No user intervention.
    • Restrictions are significant. The SQL standard has adopted this approach and provides rigorous limitations on what views can be modified. As per lecturer, only supported in MYSQL.
Modifying Views Using Trigger

Instead Of Trigger is used to allow views to be modified as defined by views creator. Consider three tables:

Student(sId INT, sname VARCHAR)
College(cId INT, cName VARCHAR)
APPLY(aId INT, sId INT FOREIGN KEY, cName VARCHAR FOREIGN KEY, major VARCHAR, descision CHAR)

and views

-- View 1
CREATE VIEW CSaccept as
SELECT sId, cName FROM APPLY WHERE major = 'CS' AND decision = 'Y'
 
-- View 2

following are examples of type of modifications on these views.

  • Inserting value in Views Trigger to insert row in cName.
CREATE TRIGGER CSAcceptInsert
INSTEAD OF insert ON CSAccept
FOR EACH ROW
WHEN New.major = 'CS'
begin
	INSERT INTO APPLY VALUES(New.sId, New.cName, New.major, NULL)
end
SQL Statement to insert can be
INSERT INTO CSAccept VALUES(111, 'Random', 'CS')
  • Updating values in Views Trigger to update cName in CSaccept view.
CREATE TRIGGER CSAcceptUpdate
INSTEAD OF update of cName ON CSaccept
For Each Row
begin
	update APPLY SET cName = New.cName
	WHERE sId = OLD.sId AND cName = OLD.cName AND major = 'CS' AND decision = 'Y'
end
SQL for updating can be
UPDATE CSaccept SET cName = 'Cornell' WHERE sId = 2 ANSD cName = 'MIT'
  • Deleting values in Views Trigger to delete value in CSaccept can be written as

    CREATE TRIGGER CSacceptDel
    INSTEAD OF delete ON CSaccept
    For Each Row
    begin
    	delete FROM APPLY WHERE sId = OLD.sId AND cName = OLD.cName AND major = 'CS' AND decision = 'Y'
    end

    Then SQL statement for deleting value within the view will be:

    DELETE FROM CSaccept WHERE sId = 2 AND cName = 'Stanford'

    This will ensure that data which are part of CSaccept are deleted. If conditions of major and decision are removed from trigger sql statement then the view delete command will also delete data which are in APPLY but not part of view which is incorrect.

Constraints defined for base tables are maintained for modification of Views and database will throw constraint violation error when any modification violates the base tables constraints.Views should not be allowed to be modified if they have aggregated data of base tables.

Restriction in Updating Views In SQL standard
  • SELECT (No Distinct) on single Table T.
  • Attributes not in View can be NULL or have default value.
  • Subqueries must not refer to/use Table T but can refer to other tables.
  • No GROUP BY or Aggregation.
Materialized Views

Usually Views are Virtual Views i.e when views are referenced the query is rewritten and are not stored as Table in database. Materialized Views are views which are stored as table in database. Added advantage of Materialised Views over Views are that it improves query performance. Modifications to the materialised views must also modify base tables similar to virtual views so the issues with modification of views are same as well in Materialised Views. (Efficiency) benefits of a materialized view depend on:

  • Size of data.
  • Complexity of view.
  • Number of queries using view.
  • Number of modifications affecting view
Authorization

Authorisation in database is a process of granting and revoking access rights and privileges to users. It ensures user can operate on only those tables to which they are authorised to. Syntax for granting privileges:

Grant privs On R To users [ With Grant Option ]
-- grant option allows the users to grant priviliges to other users

Syntax for revoking privileges:

Revoke privs On R From users [ Cascade | Restrict ]
-- Cascade will also revoke priviliges granted from priviliges being revoked. Restrict is opposite of Cascade

Databases: Modeling And Theory

Relational Algebra

Relational algebra is a formal system for manipulating relations (tables) in the relational model of database management systems (DBMS). It serves as the foundation for querying databases, providing a set of operations that take one or more relations as input and produce a new relation as output.

It is formal language and is base for SQL language. Relational Algebra Query(expression) on set of relations produces relation as a result. Simplest query in relational algebra is the name of the relation. Operators are used in relational algebra to filter, slice and combine relations.

  • Select Operator: Picks certain rows. Is denoted by . For example: Query Students with GPA > 3.7 can be denoted . Query Students with GPA > 3.7 and HS < 100 can be denoted as
  • Project Operator: Picks certain columns. It is denoted by . For ex: ID and decision of all applications is denoted as . Operators can be composed. For example to pick both row and columns, one of the query can be ID and name of students with GPA > 3.7. This can be denoted in relational algebra as .
  • Cross Product Operator: Combine two relations. The result is schema which is union of both relations and contents are every combination of rows from those two relations. For example: Query Names and GPAs of students with HS>1000 who applied to CS and were rejected can be denoted as
  • Natural Join: Performs cross product but enforces equality on same name attributes/column(eliminates one copy of duplicate attributes).The example in cross product can be written in terms of natural join as Also,
  • Theta Join: It is expressed as . Theta join allows us to use comparison operator other than ’=’ operator.It is most basic join operation implemented in DBMS. ‘join’ usually means theta join. There are no implicit join conditions in Theta join.
  • Set Operator: The result of this operator is different than join. The result are concatenated vertically.
    • Union Operator: . Combines rows of two relations and eliminates duplicate rows. Two relations must be union compatible i.e
      • Must have same number of columns/attributes.
      • Corresponding attributes must have same datatype. For example, if for relation A first column S is of type int then for relation B first column should be of type int.
    • Difference Operator . Rows/tuples present in relation A but not in B. Relations should be union compatible.
    • Intersection Operator . Intersection is basically equal to natural join. Relations must be union compatible.
  • Rename Operator: It renames/reassigns schema of result of Expression. It is described as . It is mainly used for two scenarios: To unify schema’s for set operator & for to clearly define and use schema in self joins.
Alternate Notations
  • Assignment statements: In relational algebra, an assignment statement is used to assign the result of a relational algebra expression to a relation variable. This allows you to reuse intermediate results in subsequent operations.Example:
  • Expression Tree: An expression tree in relational algebra is a tree data structure used to represent the hierarchy and operations of a relational algebra expression. Each node in the tree represents an operation (e.g., selection, projection, join), and the leaf nodes represent base relations or attributes.

Relational Design Theory

Design by Decomposition
  • Start with “Mega” relation containing everything.
  • Decompose into smaller, better relations with same information such that decomposed tables on reassembly/natural join produces original table(Lossless join property).
  • Can do decomposition automatically. Automatic decomposition works by:
    • Specifying “Mega” relations and properties of the data.
    • System will decompose relations based on properties.
    • Final set of relation will satisfy normal form(No anomalies, No lost information). Properties of Data:
      • Functional Dependencies
      • Multivalued Dependencies Normal Forms:
      • Functional Dependencies Boyce-Codd Normal Form
      • Functional Dependencies + Multivalued Dependencies Fourth Normal Form
      • Skipping 1st,2nd,3rd Normal Form.
Functional Dependencies

Functional dependency between two attributes/set of two attributes occurs when one attribute/set of attributes uniquely determine second attribute/set of attributes. Here functionally determines . This means for any instance of there will be one instance of and when two tuples/rows have same value of they will have same value of . For example consider relation Apply(SSN, cName, state, date, major), cName date will mean for same value of cName there will be same date and SSN,cName Major will mean for same value of SSN, cName there will be same value of Major.Inverse is also applicable for the mentioned functional dependencies. If set of attributes in a relation determines all the other attributes then the set of attributes are key.This will mean that if values of set of attributes in two tuples are same then values of other attributes are same and tuples are duplicate. Rules for Function Dependency:

  • Splitting rule ,,,…, then , , ,… Same is not applicable for left side splitting.
  • Combining rule then ,,,…,
  • Trivial Dependency rule , then
  • Transitive rule and then
Closure of Attributes

Closure of set of attributes A is set of attributes functionally determined by set of attributes X. It is denoted as . Steps to find closure of set of attributes :

  1. Initialise closure of denoted as to itself.
  2. If is present in and then add to
  3. Repeat step 2 until there is no change in .
Closure and Keys

Relation of closure and keys can be used to determine following:

  • Is a key for R ? To determine if is a key of relation R, compute . If has all attributes of relation R then is a key.
  • Find all keys given a set of Functional Dependencies ? Consider all subset of attributes of a relation in increasing size and compute closure attributes of each subset of attributes. If closure has all attributes, it is key and every super set of key is a key.
  • Find if FD is applicable for the given relation R Compute and check if is present or not. If present, the FD will satisfy relation R.
Specifying set of FDs for a relation

S​ “follows from” S​ if every relation instance that satisfies S also satisfies S. This means that if the dependencies in S​ hold true, then those in S must also hold true for any given dataset.

Methods to test if S​​ follows from S​: - Using Attribute Closure: Compute the closure of attributes based on S​and check if it includes all the dependencies in S. For example To check if SSN→Priority (from S​) follows from S, compute the closure of SSN using S.If SSN’s closure includes Priority, then SSN→Priority follows from S​​.

Boyce-Codd Normal Form(BCNF)
Decomposition of a Relational Schema

Relation R(A,A, …, A or ) can be decomposed into two Relation R(C,C, …, Cor ) and R (D,D,…,D or ) where and R R. R and R can be denoted in terms of relational algebra as , .

Relation R with FDs is in Boyce-Codd Normal Form if: For each functional dependency in R B, is a key.

BCNF decomposition algorithm

Input: relation R + FDs for R Output: decomposition of R into BCNF relations with “lossless join”.

  • Compute Keys for R.
  • Repeat until all relations are in BCNF:
    • Pick any R’ with A B that violates BCNF.
    • Decompose R’ into R(A,B) and R(A, rest).
    • Compute FDs for R and R^84eea5.
    • Compute keys for R and R.
Multivalued Dependencies & Fourth normal form
Multivalued Dependency

For a relation R, multivalued dependency is expressed as A ↠ B which is said as ‘A multi determines B’. A multi-valued dependency occurs when, within a database table, one attribute (or a set of attributes) determines multiple independent values of another attribute (or set of attributes).

For a relation R with attributes X,Y, and Z (where Z is the set of attributes in R not in X or Y), X↠Y holds if, for every pair of tuples t1 and t2 in R that agree on X, there exist tuples t3 and t4 in R such that:

  • t1[X] = t2[X] = t3[X] = t4[X]
  • t3[Y] = t1[Y] and t3[Z] = t2[Z]
  • t4[Y] = t2[Y] and t4[Z] = t1[Z]
Trivial & Nontrivial Multivalued Dependency

Trivial Multivalued dependency is defined as and or . If conditions are not met then Multivalued dependency is Non-Trivial.

Rules for Multivalued Dependencies
  • Functional Dependency is Multivalued Dependency.
  • Intersection rule: & then
  • Transitive rule: & then
Fourth Normal Form

Relation R with Multivalued Dependencies is in 4NF if: For each nontrivial , is a key.

4NF decomposition algorithm

Input: relation R + FDs for R + MVDs for R Output: decomposition of R into 4NF relations with “lossless join”

  • Compute keys for R.
  • Repeat until all relations are in 4NF:
    • Pick any R’ with nontrivial A ↠ B that violates 4NF.
    • Decompose R’ into R (A, B) and R (A, rest).
    • Compute FDs and MVDs for R and R.
    • Compute keys for R and R.

Unified Modeling Language