The Movies Database Documentation

CSCI4380 Database Systems

Rensselaer Polytechnic Institute

Copyright 1996 Gio Wiederhold. Modified 1997 David L. Spooner.

1. Introduction

The Movies Database contains information about many films, actors, directors, studios, etc. While there is much missing information, you will find that the database contains many interesting and useful facts. The database is organized as a relational database. Not all the data conveniently fits into the form of relations (tables), however. Thus, many of the fields are text fields containing, for example, a list of producers for a film. Many other fields are null. You will also find that many relations have an attribute named "Notes" that contains interesting comments about the persons or films described by the relations. There is no guarantee that the information contained in this database is correct; although every effort has been made to ensure its accuracy.

The next sections describe the relations in The Movies Database. For each relation is listed its approximate size in tuples and the names and types of its attributes. Following is a section that shows associations between the relations in the database and the referential integrity rules that result. The final section discusses one implementation of The Movies Database using Microsoft Access.

2. The Films Relation

2.1 Summary

The Films relation contains information about films such as their titles, directors, and producers, as well as their year of release. It currently contains data for 10,145 films. Each film is identified with a unique number that serves as the primary key for the relation.

2.2 Attributes and Domains

FilmId (integer) =

Title (text string, length 50) =

Year (integer) =

Director (text string, length 50) =

Producers (text string, length 75) =

Studios (text string, length 75) =

Process (text string, length 100) =

Category (text string, length 30) =

Awards (text string, length 50) =

Location (text string, length 75) =

Notes (text string, length 255) =

3. The Actors Relation

3.1 Summary

This table contains information about actors. Each actor is identified by a "stagename", and there are intervals indicating the dates that the actor worked and the actor's lifetime. Other information in this relation is the actor's real name, background, and the type of roles he/she typically plays. There are currently 14,001 actors in the relation, but only about half have detailed information.

3.2 Attributes and Domains

ActorId (integer) =

StageName (text string, length 50) =

DatesOfWork (text string, length 10) =

BirthName (text string, length 20) =

FirstName (text string, length 20) =

Gender (character) =

DateOfBirth (text string, length 10) =

DateOfDeath (text string, length 10) =

RoleType (text string, length 20) =

Origin (text string, length 30) =

Notes (text string, length 255) =

4. The People Relation

4.1 Summary

This table contains information about directors, producers and other people associated with the film industry who are not primarily actors. It is similar to the Actors relation in that it contains intervals for when a person worked and when he/she lived. The key of the relation is a unique id number assigned to each person in the relation. A person code is used in the Films relation to refer to a person in this relation. The person code associated with each person is defined in this relation. As with the Actors relation, this relation includes the real name of a person. It also contains a field that indicates the occupation of a person within the film industry. There are currently 2878 people in this relation.

4.2 Attributes and Domains

PersonId (integer) =

PersonCode (text string, length 50) =

Jobs (text string, length 75) =

Dates (text string, length 20) =

LastName (text string, length 20) =

FirstName (text string, length 30) =

BirthDate (text string, length 10) =

DeathDate (text string, length 20) =

BirthCountry (text string, length 20) =

Notes (text string, length 255) =

5. The Studios Relation

5.1 Summary

This relation contains information about important studios in the history of the film industry. The studios are divided into three categories: pre-studio facilities, early studios, and recent studios. The key of the relation is a unique id number assigned to each studio. There are currently 177 studios included in this relation.

5.2 Attributes and Domains

StudioId (integer) =

Name (text string, length 30) =

Company (text string, length 30) =

Period (text string, length 20) =

City (text string, length 20) =

Country (text string, length 20) =

FoundedDate (text string, length 20) =

EndDate (text string, length 20) =

Founder (text string, length 100) =

Successor (text string, length 50) =

Notes (text string, length 150) =

6. The Casts Relation

6.1 Summary

This is a large relation that identifies who acted as what in which film. This is an association relation, linking the Actors relation with the Films relation. The key of the relation is the combination of a film id number, actor id number and role. All three values are needed for the key since an actor can play more than one role in a film. The relation currently has 40,588 tuples in it.

6.2 Attributes and Domains

FilmId (integer, foreign key of Films relation) =

ActorId (integer, foreign key of Actors relation) =

RoleType (text string, length 30) =

Role (text string, length 50) =

Awards (text string, length 10) =

Notes (text string, length 100) =

7. The Remakes Relation

7.1 Summary

This table gives information about films that are remakes of other films. It is an example of recursion in a relational database. For each remake, a number between 0 and 1 is given to estimate how identical the two films are. The key of the relation is the combination of a remade film id and the original film id. This relation currently has 1115 remakes in it.

7.2 Attributes and Domains

RemakeId (integer, foreign key of Films relation) =

Similarity (float) =

OriginalId (integer, foreign key of Films relation) =

8. The Synonyms Relation

8.1 Summary

This relation contains alternate titles for films. Alternate titles are sometimes used in different countries or when a film is re-released. The key for this relation is the combination of a film id number and an alternate title for the film. This relation currently has 362 alternate titles in it.

8.2 Attributes and Domains

FilmId (integer, foreign key of Films relation) =

SynTitle (text string, length 75) =

Synonym (text string, length 20) =

Original (text string, length 20) =

9. The Sayings Relation

9.1 Summary

This relation contains a few memorable quotes from films. Long quotes are broken into multiple lines, so that each tuple in the relation contains a line from a quote. The key for the relation is a combination of a quote identifier number and a line number. The Sayings relation currently has 25 quotes in it.

9.2 Attributes and Domains

Saying# (integer) =

FilmId (integer, foreign key of Films relation) =

Line# (integer) =

Speaker (text string, length 50) =

ActorId (integer, foreign key of the Actors relation) =

Role (text string, length 50) =

Listener (text string, length 50) =

Quote (text string, length 255) =

10. The AwardCodes Relation

10.1 Summary

This relation lists the types of awards given in the film industry and the corresponding awarding agencies. Books and magazines that rate films are also included here. Currently this relation has 30 tuples.

10.2 Attributes and Domains

Code (text string, length 10) =

AwardingOrg (text string, length 150) =

Country (text string, length 20) =

PopularName (text string, length 50) =

Years (text string, length 20) =

Notes (text string, length 100) =

11. The SpecialAwards Relation

11.1 Summary

This relation contains information about awards received for special occasions. Regular awards associated with a particular film are given in the Films relation. Regular awards associated with a particular performance are given in the Casts relation. Some of the special awards are given to actors and some are given to other people in the film industry (e.g., directors). Thus, the relation contains two attributes, one to identify an actor in the Actors relation and the other to identify a person in the People relation. Each tuple in this relation must specify a value in exactly one of these two attributes. This relation currently contains 20 special awards.

11.2 Attributes and Domains

SpecialAwardId (integer) =

ActorId (integer, foreign key of the Actors relation) =

PeopleId (integer, foreign key of the People relation) =

AwardOrg (text string, length 10, foreign key of the AwardCodes relation) =

Year (text string, length 10) =

AwardType (text string, length 50) =

AwardedFor (text string, length 50) =

12. The References Relation

12.1 Summary

This relation contains the bibliographic information for books that provided material for this database. In the database, references are indicated with a number in brackets (e.g., [1]). These numbers in brackets are used as the key for this relation. The relation currently contains 78 references.

12.1 Attributes and Domains

Ref# (text string, length 10) =

Authors (text string, length 50) =

Title (text string, length 100) =

Publisher (text string, length 150) =

Year (text string, length 10) =

Notes (text string, length 100) =

13. The Production Relation

13.1 Summary

The Production relation is a small relation that contains typical annual film production statistics for countries with a large film industry. The purpose of this relation is to convey an idea of where the majority of films are made. The relation currently has 4 tuples.

13.2 Attributes and Domains

Country (text string, length 20) =

AnnualFilmProd (integer) =

14. The Budget Relation

14.1 Summary

This small relation contains information about the typical breakdown of a film budget into various categories of expenses. Its purpose is to convey an idea of the relative cost of various aspects of film production. The relation currently contains 8 tuples.

14.2 Attributes and Domains

Category (text string, length 30) =

Percent (float) =

 15. Associations and Referential Integrity Rules

There are many associations that exist between the relations in The Movies Database. Each of these associations establishes a referential integrity rule for the database. These associations are shown in the figure below taken from the Microsoft Access implementation of the database.

In all cases, the associations are one-to-many associations. Note that the Studios, Budget, Production, and References relations do not participate in any associations. The bold attributes in each relation are the key attributes for the relations.

Each of the associations creates a referential integrity requirement for the database. As an example, consider the association between the ActorId attribute in the Actors relation and the ActorId attribute in the Casts relation. The referential integrity requirement established by this association is that any non-null value in the ActorId attribute of the Casts relation must match the ActorId attribute in some tuple in the Actors relation. In other words, an actor cannot participate in the cast of a film unless the actor is listed in the Actors relation. It is the job of the database system to make sure that these referential integrity requirements are never violated as a result of updates to the database.

16. A Microsoft Access Implementation of The Movies Database

The Movies Database has been implemented using Microsoft Access97 running on Windows95. This version of the database is read-only. The database file is roughly 9.5 megabytes in size. It is available on the World Wide Web at URL: http://www.cs.rpi.edu/~spoonerd/dbs/MOVIEDB/movies.mdb . When the database is opened using Access97, the following form appears.

The functions of the buttons are as follows:

Movies -- this button displays a form with a variety of queries about films in the database. These queries include:

People in the movie industry -- this button displays a form with a variety of queries about actors, directors, producers, and other people in the film industry. These queries include:

Movie Studios -- this button displays a form with a variety of queries about movie studios and facilities. Specific queries include:

Movie Production Data -- this button displays a form with queries about annual production data and film budgets. Specific queries include:

Other Data -- this button displays a form with queries about other types of data in the database. Specific queries include:

Each of these forms also has a button that returns to the main menu shown above. The Exit button on the main menu closes The Movies Database. Each of the forms used to display the results of a query has a close button that removes the form from the screen.

Queries requiring the user to enter data begin with a dialogue box requesting this data. For example, in the Movies menu, when the Find Film button is clicked, the following dialogue box is displayed.

If the user types Forrest Gump and clicks OK, the following form appears with the result of the query.

To close this form, click the Close button. If the result of a query contains more than one tuple, this will be indicated in the lower left corner of the form. Click on the left and right arrows to scroll through the tuples. In this case, the result of the query is a single tuple, so the left and right arrows are disabled.

Questions, error reports, comments and database corrections should be sent to David Spooner at spoonerd@cs.rpi.edu.