IMDB movies data           package:ggplot           R Documentation

_M_o_v_i_e _i_n_f_o_r_m_a_t_i_o_n _a_n_d _u_s_e_r _r_a_t_i_n_g_s _f_r_o_m _I_M_D_B._c_o_m

_D_e_s_c_r_i_p_t_i_o_n:

     The Internet Movie Database, \href{http://imdb.com/}{imdb.com}, is
     a website devoted to collecting movie data supplied by studios and
     fans.  It claims to be the biggest movie database on the web and
     is run by Amazon.  More information about imdb.com can be found
     \href{http://imdb.com/help/show_leaf?about}{online}, including
     information about the
     \href{http://imdb.com/help/show_leaf?infosource}{data collection
     process}.

     IMDB makes their \href{http://uk.imdb.com/interfaces/}{raw data
     available}.  Unfortunately, the data is divided into many text
     files and the format of each file differs slightly.  To create one
     data file containing all the desired information I wrote a script
     in \href{http://ruby-lang.org}{Ruby} to extract the relevant
     information and store it in a database.  This data was then
     exported to CSV for easy import into many programs.
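
     As a rough illustration of the CSV import mentioned above, the
     following sketch reads such a file with base R.  The temporary
     file and its two rows are invented for the example and are not
     part of the real export; only the column names follow the field
     list on this page.

```r
# Hypothetical stand-in for the exported file: two invented rows in a temp file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "title,year,budget,length,rating,votes",
  "Example Movie A,1999,1000000,95,6.5,120",
  "Example Movie B,2003,,12,7.1,8"
), csv_path)

# read.csv() reads the empty budget field as NA because the column is numeric.
movies_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
str(movies_csv)
```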

     The following text files were downloaded and used:

     \begin{itemize}

     *  business.list.  Total budget.

     *  genres.list.  Genres that a movie belongs to (e.g. comedy and
        action)

     *  movies.list.  Master list of all movie titles with year of
        production.

     *  mpaa-ratings-reasons.list.  MPAA ratings.

     *  ratings.list.  IMDB fan ratings.

     *  running-times.list.  Movie length in minutes. \end{itemize}

        Movies were selected for inclusion if they had a known length
        and had been rated by at least one IMDB user.  The CSV file
        contains the following fields:

        \begin{itemize}

     *  title.  Title of the movie.

     *  year.  Year of release.

     *  budget.  Total budget (if known) in US dollars.

     *  length.  Length in minutes.

     *  rating.  Average IMDB user rating.

     *  votes.  Number of IMDB users who rated this movie.

     *  r1-10.  Ten variables, r1 to r10, one per rating value. 
        Multiplying r1 by ten gives the percentage (to the nearest
        10%) of users who rated this movie a 1, and likewise for r2 to
        r10.

     *  mpaa.  MPAA rating.

     *  action, animation, comedy, drama, documentary, romance, short. 
        Binary variables indicating whether the movie was classified
        as belonging to that genre. \end{itemize}
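
     The inclusion rule and a few of the fields above can be sketched
     in base R as follows.  The data frame here is a toy stand-in with
     invented values, and it assumes that an unknown length or budget
     is stored as NA; the real movies data may encode these
     differently.

```r
# Toy stand-in for the merged raw data (all values invented for illustration).
raw <- data.frame(
  title  = c("Example A", "Example B", "Example C", "Example D", "Example E"),
  length = c(95, 12, NA, 110, 100),
  votes  = c(120, 8, 40, 0, 5),
  budget = c(1e6, NA, 2e6, 5e5, 5e5),
  r1     = c(1, 0, 3, 0, 2),
  comedy = c(1, 0, 1, 1, 1),
  short  = c(0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Inclusion rule: keep movies with a known length and at least one vote.
toy <- raw[!is.na(raw$length) & raw$votes >= 1, ]

# Following the r1-10 description: r1 times ten is the percentage who rated a 1.
toy$pct_rated_1 <- toy$r1 * 10

# Genre columns are binary, so they can be used directly as filters.
comedies <- toy[toy$comedy == 1, ]

# budget is only given "if known", so drop missing values when summarising.
mean_budget <- mean(toy$budget, na.rm = TRUE)
```

     With the real data, the same expressions would apply to the
     movies data frame in place of toy.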

_U_s_a_g_e:

     data(movies)

_F_o_r_m_a_t:

     A data frame with 28819 rows and 24 variables

_R_e_f_e_r_e_n_c_e_s:

     <URL: http://had.co.nz/data/movies/>

