When data modeling, you are telling the RDBMS the following:
For example, consider a table for recording user-submitted comments to a Web site. The publisher has made the following stipulations:
Implicit in this model is the assumption that
create table user_submitted_comments (
comment_id integer primary key,
user_id not null references users,
submission_time date default sysdate not null,
ip_address varchar(50) not null,
content clob,
approved_p char(1) check(approved_p in ('t','f'))
);
approved_p
can be NULL and that, if not explicitly set during the INSERT, that is
what it will default to. What about the check constraint? It would
seem to restrict approved_p to values of "t" or "f". NULL,
however, is a special value and if we wanted to prevent
approved_p from taking on NULL we'd have to add an explicit
not null constraint.
How do NULLs work with queries? Let's fill
user_submitted_comments with some sample data and see:
We've successfully JOINed the
insert into user_submitted_comments
(comment_id, user_id, ip_address, content)
values
(1, 23069, '18.30.2.68', 'This article reminds me of Hemingway');
Table created.
SQL> select first_names, last_name, content, user_submitted_comments.approved_p
from user_submitted_comments, users
where user_submitted_comments.user_id = users.user_id;
FIRST_NAMES LAST_NAME CONTENT APPROVED_P
------------ --------------- ------------------------------------ ------------
Philip Greenspun This article reminds me of Hemingway
user_submitted_comments and
users table to get both the comment content and the name of
the user who submitted it. Notice that in the select list we had to
explicitly request
user_submitted_comments.approved_p. This is because
the users table also has an approved_p
column.
When we inserted the comment row we did not specify a value for the
approved_p column. Thus we expect that the value would be
NULL and in fact that's what it seems to be. Oracle's SQL*Plus
application indicates a NULL value with white space.
For the administration page, we'll want to show only those
comments where the approved_p column is NULL:
"No rows selected"? That's odd. We know for a fact that we have one row in the comments table and that is
SQL> select first_names, last_name, content, user_submitted_comments.approved_p
from user_submitted_comments, users
where user_submitted_comments.user_id = users.user_id
and user_submitted_comments.approved_p = NULL;
no rows selected
approved_p column is
set to NULL. How to debug the query? The first thing to do is simplify
by removing the JOIN:
What is happening here is that any expression involving NULL evaluates to NULL, including one that effectively looks like "NULL = NULL". The WHERE clause is looking for expressions that evaluate to true. What you need to use is the special test IS NULL:
SQL> select * from user_submitted_comments where approved_p = NULL;
no rows selected
An adage among SQL programmers is that the only time you can use "= NULL" is in an UPDATE statement (to set a column's value to NULL). It never makes sense to use "= NULL" in a WHERE clause.
SQL> select * from user_submitted_comments where approved_p is NULL;
COMMENT_ID USER_ID SUBMISSION_T IP_ADDRESS
---------- ---------- ------------ ----------
CONTENT APPROVED_P
------------------------------------ ------------
1 23069 2000-05-27 18.30.2.68
This article reminds me of Hemingway
The bottom line is that as a data modeler you will have to decide which columns can be NULL and what that value will mean.
This data model locks you into some realities:
create table mailing_list (
email varchar(100) not null primary key,
name varchar(100)
);
create table phone_numbers (
email varchar(100) not null references mailing_list,
number_type varchar(15) check (number_type in ('work','home','cell','beeper')),
phone_number varchar(20) not null
);
number_type column may be too constrained.
Suppose William H. Gates the Third wishes to record some extra phone
numbers with types of "boat", "ranch", "island", and "private_jet". The
check (number_type in ('work','home','cell','beeper'))
statement prevents Mr. Gates from doing this.
name column of
mailing_list is free to be NULL.
email as a key in two tables and
therefore will have to update both tables. The references
mailing_list keeps you from making the mistake of only updating
mailing_list and leaving orphaned rows in
phone_numbers. But if users changed their email addresses
frequently, you might not want to do things this way.
These aren't necessarily bad realities in which to be locked. However,
a good data modeler recognizes that every line of code in the .sql file
has profound implications for the Web service.
To get some more information on how a simple datamodel for a Discussion Forum can evolve, read en:sql-wn-data_modeling-philip
We have some big goals to consider. We want the data in the database to
Note that we use a generated integer
create table static_pages (
page_id integer not null primary key,
url_stub varchar(400) not null unique,
original_author integer references users(user_id),
page_title varchar(4000),
page_body clob,
obsolete_p char(1) default 'f' check (obsolete_p in ('t','f')),
members_only_p char(1) default 'f' check (members_only_p in ('t','f')),
price number,
copyright_info varchar(4000),
accept_comments_p char(1) default 't' check (accept_comments_p in ('t','f')),
accept_links_p char(1) default 't' check (accept_links_p in ('t','f')),
last_updated date,
-- used to prevent minor changes from looking like new content
publish_date date
);
create table static_page_authors (
page_id integer not null references static_pages,
user_id integer not null references users,
notify_p char(1) default 't' check (notify_p in ('t','f')),
unique(page_id,user_id)
);
page_id key for this
table. We could key the table by the url_stub (filename),
but that would make it very difficult to reorganize files in the Unix
file system (something that should actually happen very seldom on a Web
server; it breaks links from foreign sites).
How to generate these unique integer keys when you have to insert a new
row into static_pages? You could
page_id so far
page_id
Then we can get new page IDs by using
create sequence page_id_sequence start with 1;
page_id_sequence.nextval in INSERT statements (see
the Transactions chapter for a fuller
discussion of sequences).
--- based on SQL for Web Nerds