Dave's Quick-But-Unproven Graphical Normalisation Method

By Dave Voorhis <dave@armchair.mb.ca>
September 6, 2000 (Revised May 30, 2003)

Given a set of functional dependencies like the following...

...the goal is to achieve Boyce-Codd Normal Form. This appears to be easily obtained by writing the functional dependencies in a bubble diagram and following some steps. While it it has yet to fail me, this algorithm has not yet been proven correct. If you can find a proof of its (in)correctness, please let me know at dave@armchair.mb.ca

For each functional dependency, draw the left side as a new bubble, unless it already appears on the diagram.

For each functional dependency, draw the right side as a new bubble, unless it already appears on the diagram.

For each functional dependency, draw an arrow from the appropriate left-side bubble to the right-side bubble.

Once all functional dependencies have been processed in step 3, write down each non-leaf node (i.e., a bubble that points to other bubbles -- a bubble that points to nothing else is a leaf node) as the primary key of a new relation:

ABThing(A,B,
AThing(A,
EThing(E,
CThing(C,
FBThing(F,B,
CBThing(C,B,

For each relation, find its bubble on the diagram and note the adjacent bubbles to which it points. These become attributes of the relation.

ABThing(A,B,D,C)
AThing(A,E)
EThing(E,G)
CThing(C,I,H)
FBThing(F,B,C)
CBThing(C,B,F)

Eliminate any duplicate relations. In the example above, the last two are duplicates.

ABThing(A,B,D,C)
AThing(A,E)
EThing(E,G)
CThing(C,I,H)
FBThing(F,B,C)

Update

May 30, 2003

Not long after I put this page on the Web, I was shown some comments [Note1] and discussion on the methodology by C. J. Date, Hugh Darwen, David Miles, Kevin Waugh, and others. Based on these comments, it exhibits at least the following limitations:

Waugh claims it is not reversible in some cases, i.e., after applying the methodology to produce normalised relations, it is apparently not always possible to denormalise back to the original relations.[Waugh1] I have not verified this, but agree that it is probably true.
There is no means to identify candidate keys.[Waugh2]
The method fails if the model contains cycles.[Date1]

Futhermore, the diagramming style was claimed[Waugh3] to be similar to one employed in Date's text, "An Introduction to Database Systems".[Date2] While it turns out this doesn't necessarily affect the methodology (other than justifying alterations to match a more familiar model), and it may be a useful learning tool, I do question its overall usefulness in actual practice.[Note2].

Recently, I resumed interest in the idea -- a side effect of a growing interest in software (particularily database) visualisation. As time permits, I'll work on verifying and addressing the limitations, and review algorithms such as reference [Date3] for converting a set of FDs into 3NF (but not BCNF) relations. This may lead to an improved diagramming methodology and/or a means of visualising existing or new normalisation algorithms. Suggestions or comments are welcome.

[Note1] These were published on an email mailing list open to tutors and lecturers on a given university Database course. Out of respect for their privacy and copyright, I shall not reprint their comments or identify the university or the mailing list until I have obtained written permission to do so.

[Waugh1] Waugh, Kevin. Email to mailing list. (September 11, 2000 2:35 PM)

[Waugh2] Waugh, Kevin. Email to mailing list. (September 11, 2000 3:11 PM)

[Date1] Date, C. J. "A Response to Dave's Quick-But-Unproven Normalisation Method", Fax image of printed text attached to email to mailing list, forwarded by Hugh Darwen. (September 12, 2000.)

[Waugh3] Waugh, Kevin. Email to mailing list. (Thursday, September 07, 2000 10:14 AM)

[Note2] As more of a practitioner than a theoretician, I'd always treated normalisation as a means to an end rather than an object of study, so I was quite happy to normalise using conventional methods. Business models (or at least the ones I typically dealt with) tended to generalise to a limited set of common generic models (eg., Accounts Receivable, General Ledger, Inventory, etc.) -- each normalised once, reused frequently, coupled and decoupled easily, and altered rarely (with renormalisation as needed) to meet specific user requirements.

[Date2] Date, C. J. "An Introduction to Database Systems. Sixth Edition.", Addison Wesley Publishing Company, Inc. (1995)

[Date3] ibid. Page 321

END