The Movies Database Documentation
CSCI4380 Database Systems
Rensselaer Polytechnic Institute
Copyright 1996 Gio Wiederhold. Modified 1997 David L. Spooner.
1. Introduction
The Movies Database contains information about many films, actors, directors, studios, etc. While there is much missing information, you will find that the database contains many interesting and useful facts. The database is organized as a relational database. Not all the data conveniently fits into the form of relations (tables), however. Thus, many of the fields are text fields containing, for example, a list of producers for a film. Many other fields are null. You will also find that many relations have an attribute named "Notes" that contains interesting comments about the persons or films described by the relations. There is no guarantee that the information contained in this database is correct; although every effort has been made to ensure its accuracy.
The next sections describe the relations in The Movies Database. For each relation is listed its approximate size in tuples and the names and types of its attributes. Following is a section that shows associations between the relations in the database and the referential integrity rules that result. The final section discusses one implementation of The Movies Database using Microsoft Access.
2. The Films Relation
2.1 Summary
The Films relation contains information about films such as their titles, directors, and producers, as well as their year of release. It currently contains data for 10,145 films. Each film is identified with a unique number that serves as the primary key for the relation.
2.2 Attributes and Domains
FilmId (integer) =
- An internally generated id number for the film. This is the key of the relation and is unique.
Title (text string, length 50) =
- The film's title. This field is not necessarily unique, but it is required.
Year (integer) =
- Year the film was released. This field is required.
Director (text string, length 50) =
- Director of the film. In some cases the full name is given, otherwise a standardized id-name is used. All directors must appear in the People relation, which can be used to decode the standardized id-name. If there are multiple candidate directors, the one who finished the film is chosen and other candidates are listed in the Notes field (see below) labeled as CoDirector(). This field is optional.
Producers (text string, length 75) =
- Producer(s) of the film. In some cases a full name is given, in other cases a standardized id-name is used. If a standardized id-name is used, it can be decoded in the People relation. Multiple producers are permitted and common. In these cases the field contains a list of producers. This field is optional.
Studios (text string, length 75) =
- Studio(s) where the film was made. Common studio names appear in the Studios relation. If the studio is not known or uncommon its location may be given by specifying the name of the country that contains the studio. If more than one studio was used to make the film, then this field contains a list of studios. This field is optional.
- If a film was distributed by a different studio than the one(s) at which it was filmed, then the distributor for the film is identified in this field, preceded by the label "Distributor:".
Process (text string, length 100) =
- Process used to make the film (e.g. black and white or color). Color processes are broken down into various types. Often a film uses more than one process. This field also indicates if a film is a silent film. This field is required.
Category (text string, length 30) =
- Category of the film (e.g. suspense, mystery). A film may belong to more than one category. This field is optional.
Awards (text string, length 50) =
- List of awards received by the film. The codes used to identify awards are listed in the AwardCodes relation. This field included not only actual awards but also favorable (mostly) mentions in compendia such as Halliwell [4] and Roger Ebert's [10] books, with the appropriate number of stars. A + symbol is a half star and a - after a code indicates a negative mention. -H means not in Halliwell [4]. This field is optional.
Location (text string, length 75) =
- Location for the story depicted by the film. Multiple locations are separated by semicolons (;), multiple levels in one location are separated by commas. For example, "high school, countryside, CA", indicates the film location is a California high school in the countryside. For countries other than the USA the country name is given as well. If the period of the film is significant it is given as Time([[dd]mmm[yyyy]). This field is optional.
Notes (text string, length 255) =
- Here a variety of notes are kept. All entries have a designator to identify the type of a note. The actual value of the note is listed in parentheses after the designator (e.g., Writer(name), R(rating)) . Designators such as writers can have multiple entries, separated by commas. If an award is associated with an entry, such as an academy award for the writers of a film, it follows the name(s) after a semicolon (;). For authors (also music directors), the title is specified as Author(author:book;awards). Error() means possible error to be verified sometime. This field is optional.
3. The Actors Relation
3.1 Summary
This table contains information about actors. Each actor is identified by a "stagename", and there are intervals indicating the dates that the actor worked and the actor's lifetime. Other information in this relation is the actor's real name, background, and the type of roles he/she typically plays. There are currently 14,001 actors in the relation, but only about half have detailed information.
3.2 Attributes and Domains
ActorId (integer) =
- This field is the key for the Actors relation and assigns a unique id number to each actor. This field is needed since the stage names of actors are not unique. This field is required.
StageName (text string, length 50) =
- Stagename of an actor. When an actor has used multiple names the last one used is preferred. There are a few actors with identical names. This field is required.
DatesOfWork (text string, length 10) =
- The interval during which the actor worked. In general the first date is the date of the first film by the actor. For actors who are no longer working, the second date is the date of the final film by the actor. For actors who are still working, the second date, if given, is often meaningless. This field is optional.
BirthName (text string, length 20) =
- Original last name. This field is optional.
FirstName (text string, length 20) =
- Original first name. Nicknames or other assumed names are specified in parentheses. This field is optional.
Gender (character) =
- Coded as F for female, M for male, X for unknown, G for group (i.e., musical group), and A for Animal. This field is optional.
DateOfBirth (text string, length 10) =
- If not found in a particular reference book, the number of the reference book is given in brackets (i.e., [ref]). Consult the Reference relation for the details of this reference book. If found in the book, but the birth date is unknown, the field contains a *. This field is optional.
DateOfDeath (text string, length 10) =
- If unknown or the actor is alive, this field contains "unknown". Year+ indicates that the actor was still alive in that year. This field is optional.
RoleType (text string, length 20) =
- Types of roles played by the actor (e.g., leading man, hero). This field is optional.
Origin (text string, length 30) =
- Country of origin. This field is optional.
Notes (text string, length 255) =
- Used mainly for additional facts such as MarriedTo(name), LivedWith(name), and WorkedWith(name). A code Cit(n) indicates how frequently the actor is cited in the Casts relation. This field is optional.
4. The People Relation
4.1 Summary
This table contains information about directors, producers and other people associated with the film industry who are not primarily actors. It is similar to the Actors relation in that it contains intervals for when a person worked and when he/she lived. The key of the relation is a unique id number assigned to each person in the relation. A person code is used in the Films relation to refer to a person in this relation. The person code associated with each person is defined in this relation. As with the Actors relation, this relation includes the real name of a person. It also contains a field that indicates the occupation of a person within the film industry. There are currently 2878 people in this relation.
4.2 Attributes and Domains
PersonId (integer) =
- A unique id number assigned to each person. This field is the key for the relation and is required.
PersonCode (text string, length 50) =
- A unique code for each person that is based on the name of the person in standardized form. This code is referenced by some of the fields in the Films relation (e.g., Director, Producer). This field is required and is a candidate key for the relation.
Jobs (text string, length 75) =
- This field indicates if a person is a Producer, Director, Writer, Actor, Cinematographer, Choreographer, or a Visual arts director. A person may be more than one of these, so the field contains a list of job titles. This field is optional.
Dates (text string, length 20) =
- This field contains the interval of time in which the person worked. If two dates are given, the first is the first year of work for the person and the second is the last year of work. For people who are still alive, the second date, if given, is often meaningless. This field is optional.
LastName (text string, length 20) =
- Given last name of the person; it may be spelled more precisely here than in the PersonCode field above. This field is required.
FirstName (text string, length 30) =
- Given first name of the person. Nicknames or other assumed names are given in parentheses. This field is optional.
BirthDate (text string, length 10) =
- Date of Birth. If not found in a reference book, then the number of the reference book in the Reference relation is given in square brackets (i.e., [ref]). If found in the book, but the birth date is unknown, the field contains a *. This field is optional.
DeathDate (text string, length 20) =
- Date of Death. If unknown or the person is alive, the field contains "unknown". This field is optional.
BirthCountry (text string, length 20) =
- The person's birth country. This field is optional.
Notes (text string, length 255) =
- This field is similar to the Notes field in the Actors relation. Persons who are female are identified as Gender(F). Special awards (not associated with a film) are shown as Award(). This field is optional.
5. The Studios Relation
5.1 Summary
This relation contains information about important studios in the history of the film industry. The studios are divided into three categories: pre-studio facilities, early studios, and recent studios. The key of the relation is a unique id number assigned to each studio. There are currently 177 studios included in this relation.
5.2 Attributes and Domains
StudioId (integer) =
- A unique id number assigned to a studio. This field is the key of the relation and is required.
Name (text string, length 30) =
- Short name of the studio. This field is required.
Company (text string, length 30) =
- Company that owns the studio. This field is optional.
Period (text string, length 20) =
- Category in which the studio belongs (i.e., pre-studio facility, early studio, recent studio). This field is optional.
City (text string, length 20) =
- City where the studio is located. This field is optional.
Country (text string, length 20) =
- The country in which the studio is located. This field is optional.
FoundedDate (text string, length 20) =
- Date the studio was founded or first opened. This field is optional.
EndDate (text string, length 20) =
- Last date at which the studio was active. This field is optional.
Founder (text string, length 100) =
- The studio's founder. There may be more than one. This field is optional.
Successor (text string, length 50) =
- The fate of the studio. This field is optional.
Notes (text string, length 150) =
- Other interesting facts about the studio. This field is optional.
6. The Casts Relation
6.1 Summary
This is a large relation that identifies who acted as what in which film. This is an association relation, linking the Actors relation with the Films relation. The key of the relation is the combination of a film id number, actor id number and role. All three values are needed for the key since an actor can play more than one role in a film. The relation currently has 40,588 tuples in it.
6.2 Attributes and Domains
FilmId (integer, foreign key of Films relation) =
- Identifier of the film. All films must appear in the Films relation. This field is part of the key and is required.
ActorId (integer, foreign key of Actors relation) =
- Identifier of the actor in this role. All actors must be in the Actors relation. If the actor is unknown and the role is important, then a placeholder actor named "sa" (some actor) is used. This field is part of the key and is required.
RoleType (text string, length 30) =
- Type of the role (e.g., cameo, hero, ruler, suspect, wimp). If the role is not classified, then the field contains "unclassified". This field is required.
Role (text string, length 50) =
- Short description of the role. This field is part of the key and is required.
Awards (text string, length 10) =
- Awards given to this actor for this role. The awards are listed using award codes that are defined in the AwardCodes relation. This is an optional field.
Notes (text string, length 100) =
- This field is rarely used except for something exceptional in the performance such as "Note(Garbo laughs)". This field is optional.
7. The Remakes Relation
7.1 Summary
This table gives information about films that are remakes of other films. It is an example of recursion in a relational database. For each remake, a number between 0 and 1 is given to estimate how identical the two films are. The key of the relation is the combination of a remade film id and the original film id. This relation currently has 1115 remakes in it.
7.2 Attributes and Domains
RemakeId (integer, foreign key of Films relation) =
- Identifier of the remake. All remakes must appear in the Films relation. This field is part of the key and is required.
Similarity (float) =
- A fraction indicating how similar the remake is to the original. This field is required.
OriginalId (integer, foreign key of Films relation) =
- Identifier of the original film. All original films must appear in the Films relation. This field is part of the key and is required.
8. The Synonyms Relation
8.1 Summary
This relation contains alternate titles for films. Alternate titles are sometimes used in different countries or when a film is re-released. The key for this relation is the combination of a film id number and an alternate title for the film. This relation currently has 362 alternate titles in it.
8.2 Attributes and Domains
FilmId (integer, foreign key of Films relation) =
- Identifier of a film in the Films relation that has an alternate title. This field is part of the key and is required.
SynTitle (text string, length 75) =
- Alternate title (synonym) for the film. This field is part of the key and is required.
Synonym (text string, length 20) =
- Country, year, or circumstances in which the alternate title was used. This field is required.
Original (text string, length 20) =
- Country, year, or circumstances in which the original title was used. This field is required.
9. The Sayings Relation
9.1 Summary
This relation contains a few memorable quotes from films. Long quotes are broken into multiple lines, so that each tuple in the relation contains a line from a quote. The key for the relation is a combination of a quote identifier number and a line number. The Sayings relation currently has 25 quotes in it.
9.2 Attributes and Domains
Saying# (integer) =
- A unique id number for each saying. This field is part of the primary key and is required.
FilmId (integer, foreign key of Films relation) =
- The id number for the film from which the quote is taken. In a few cases the film is unknown or the quote does not come from a film. Hence, this field is optional.
Line# (integer) =
- The line number within a quote. Most quotes require only a single line; however a few long quotes require multiple lines. This field is part of the key and is required.
Speaker (text string, length 50) =
- The name of the actor delivering the quote. This field is required.
ActorId (integer, foreign key of the Actors relation) =
- The id number for the quoted actor in the Actors relation, if the speaker is in the actor relation. This field is optional.
Role (text string, length 50) =
- Role of quoted actor. This field is optional.
Listener (text string, length 50) =
- Role of the person to whom the quote is spoken. This field is optional.
Quote (text string, length 255) =
- Text of one line of a quote. This field is required.
10. The AwardCodes Relation
10.1 Summary
This relation lists the types of awards given in the film industry and the corresponding awarding agencies. Books and magazines that rate films are also included here. Currently this relation has 30 tuples.
10.2 Attributes and Domains
Code (text string, length 10) =
- Short code used to identify the award in the Films, Casts, and SpecialAwards relations. This field is the key of the relation and is required.
AwardingOrg (text string, length 150) =
- Agency that gives awards or author of a book or magazine that rates films. This field is required.
Country (text string, length 20) =
- Country in which the award is given or the book or magazine is published. This field is required.
PopularName (text string, length 50) =
- An alternate name for the type of award. This field is optional.
Years (text string, length 20) =
- Interval in time in which the award is/was given. This field is required.
Notes (text string, length 100) =
- Special interesting facts concerning the award. This field is optional.
11. The SpecialAwards Relation
11.1 Summary
This relation contains information about awards received for special occasions. Regular awards associated with a particular film are given in the Films relation. Regular awards associated with a particular performance are given in the Casts relation. Some of the special awards are given to actors and some are given to other people in the film industry (e.g., directors). Thus, the relation contains two attributes, one to identify an actor in the Actors relation and the other to identify a person in the People relation. Each tuple in this relation must specify a value in exactly one of these two attributes. This relation currently contains 20 special awards.
11.2 Attributes and Domains
SpecialAwardId (integer) =
- Unique identifier for the special awards. This is the key of the relation and is required.
ActorId (integer, foreign key of the Actors relation) =
- If the special award is for an actor, this attribute contains the id number for this actor in the Actors relation. This field is optional.
PeopleId (integer, foreign key of the People relation) =
- If the special award is for a director, producer, etc., this attribute contains the id number of the person in the People relation. This field is optional.
AwardOrg (text string, length 10, foreign key of the AwardCodes relation) =
- Award code for the type of award given. This field is required.
Year (text string, length 10) =
- Year in which the award was given. This field is required.
AwardType (text string, length 50) =
- A description of the award that was given. This field is optional.
AwardedFor (text string, length 50) =
- A description of why the award was given. This field is optional.
12. The References Relation
12.1 Summary
This relation contains the bibliographic information for books that provided material for this database. In the database, references are indicated with a number in brackets (e.g., [1]). These numbers in brackets are used as the key for this relation. The relation currently contains 78 references.
12.1 Attributes and Domains
Ref# (text string, length 10) =
- Unique reference number in []. This is the key of the relation and is required.
Authors (text string, length 50) =
- Names of the authors of the reference. This field is optional.
Title (text string, length 100) =
- Title of reference. This field is required.
Publisher (text string, length 150) =
- Bibliographic information about the publisher of the reference. This field is required.
Year (text string, length 10) =
- Year of publication of the reference. This field is required.
Notes (text string, length 100) =
- Important or interesting facts about the reference. This field is optional.
13. The Production Relation
13.1 Summary
The Production relation is a small relation that contains typical annual film production statistics for countries with a large film industry. The purpose of this relation is to convey an idea of where the majority of films are made. The relation currently has 4 tuples.
13.2 Attributes and Domains
Country (text string, length 20) =
- The name of a country that produces many films in a typical year. This is the key of the relation and is required.
AnnualFilmProd (integer) =
- The number of films produced by this country in a typical year. This field is required.
14. The Budget Relation
14.1 Summary
This small relation contains information about the typical breakdown of a film budget into various categories of expenses. Its purpose is to convey an idea of the relative cost of various aspects of film production. The relation currently contains 8 tuples.
14.2 Attributes and Domains
Category (text string, length 30) =
- A category of expenses in film production. This field is the key of the relation and is required.
Percent (float) =
- Percentage of a typical film budget that is spent in this category. This field is required.
15. Associations and Referential Integrity Rules
There are many associations that exist between the relations in The Movies Database. Each of these associations establishes a referential integrity rule for the database. These associations are shown in the figure below taken from the Microsoft Access implementation of the database.

In all cases, the associations are one-to-many associations. Note that the Studios, Budget, Production, and References relations do not participate in any associations. The bold attributes in each relation are the key attributes for the relations.
Each of the associations creates a referential integrity requirement for the database. As an example, consider the association between the ActorId attribute in the Actors relation and the ActorId attribute in the Casts relation. The referential integrity requirement established by this association is that any non-null value in the ActorId attribute of the Casts relation must match the ActorId attribute in some tuple in the Actors relation. In other words, an actor cannot participate in the cast of a film unless the actor is listed in the Actors relation. It is the job of the database system to make sure that these referential integrity requirements are never violated as a result of updates to the database.
16. A Microsoft Access Implementation of The Movies Database
The Movies Database has been implemented using Microsoft Access97 running on Windows95. This version of the database is read-only. The database file is roughly 9.5 megabytes in size. It is available on the World Wide Web at URL: http://www.cs.rpi.edu/~spoonerd/dbs/MOVIEDB/movies.mdb . When the database is opened using Access97, the following form appears.

The functions of the buttons are as follows:
Movies -- this button displays a form with a variety of queries about films in the database. These queries include:
- Find a film given its name
- Find the actors in a film given the film's id number (obtained by using the first query)
- Find the roles in a film given the film's id number (obtained by using the first query)
- Find the remakes of a film given its id number (obtained by using the first query)
- Find the original films of a remade film given its id number (obtained by using the first query)
- Find alternate titles for a film given the film's id number (obtained by using the first query)
People in the movie industry -- this button displays a form with a variety of queries about actors, directors, producers, and other people in the film industry. These queries include:
- Find an actor given the actor's name
- Find the films of an actor given the actor's id number (obtained using the first query)
- Find the roles of an actor given the actor's id number (obtained using the first query)
- Find directors, producers, writers, etc. given the last name
Movie Studios -- this button displays a form with a variety of queries about movie studios and facilities. Specific queries include:
- Browse pre-studio facilities
- Browse early studios
- Browse recent studios
Movie Production Data -- this button displays a form with queries about annual production data and film budgets. Specific queries include:
- Browse annual production statistics in textual form and optionally as a pie chart
- Browse a typical film budget breakdown in textual form and optionally as a pie chart
Other Data -- this button displays a form with queries about other types of data in the database. Specific queries include:
- Browse often quoted lines from movies
- Browse special awards that have been given
- Find the details of an award type given an award code
- Find the details of a reference given the number of the reference in square brackets
Each of these forms also has a button that returns to the main menu shown above. The Exit button on the main menu closes The Movies Database. Each of the forms used to display the results of a query has a close button that removes the form from the screen.
Queries requiring the user to enter data begin with a dialogue box requesting this data. For example, in the Movies menu, when the Find Film button is clicked, the following dialogue box is displayed.

If the user types Forrest Gump and clicks OK, the following form appears with the result of the query.

To close this form, click the Close button. If the result of a query contains more than one tuple, this will be indicated in the lower left corner of the form. Click on the left and right arrows to scroll through the tuples. In this case, the result of the query is a single tuple, so the left and right arrows are disabled.
Questions, error reports, comments and database corrections should be sent to David Spooner at spoonerd@cs.rpi.edu.