CSCI 4964/6963 Interactive Visualization
Spring 2016

Assignment #4: Data Collection and Preparation

For this homework, you may work in a team of 2 or individually. You are encouraged to work with someone you hadn't met before this course.

This week your primary task is to identify and collect a new and interesting (to you!) data set that is also interestingly large. You are expected to use your programming skills to obtain and/or wrangle this data into a file format you can visualize and analyze.

Some examples of where to start:

  • Take a nontrivial computer program (for example, a simulation or a solver) you have written and add dense logging information. How often does each function get called? How many times does an inner loop execute? What is the pattern of data stored in a variable or passed into a function?

  • Monitor your own computer activity: what keys do you press, where does your mouse move, what files do you open, and so on?

  • Scrape the GPS data off of your phone to gather your location over time. Or your heart-rate from a smart watch.

  • Set up a microphone or video camera and collect a stream of audio and/or images.
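
As one illustration of the first idea above, here is a minimal Python sketch of dense call logging. Everything in it is an assumption made for the example: the `logged` decorator, the `call_log` list, and the `fib` function (a stand-in for your own program) are hypothetical names, and a real run would write to a file rather than an in-memory buffer.

```python
import csv
import functools
import io
import time

# Hypothetical in-memory log; a real run would write rows to a file instead.
call_log = []

def logged(func):
    """Record one row (timestamp, function name, positional args) per call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append((time.perf_counter(), func.__name__, repr(args)))
        return func(*args, **kwargs)
    return wrapper

@logged
def fib(n):
    # Stand-in for a "nontrivial program"; naive recursion makes many calls.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def write_log(rows, stream):
    """Write the logged calls as CSV: one header row, then one row per call."""
    writer = csv.writer(stream)
    writer.writerow(["timestamp", "function", "args"])
    writer.writerows(rows)

result = fib(10)
buffer = io.StringIO()
write_log(call_log, buffer)
```

Each call becomes one row with three columns, so even a small input produces a few hundred rows that can be loaded directly into a plotting tool.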

Try to find a dataset that's not simply "download a file". You should be doing a moderate amount of work (writing code) to either collect or parse/reorganize/simplify/post-process this data.

NOTE: Grad students working on a thesis or undergraduates working on a research project are strongly encouraged (required?) to work with a research-related data source.

Once you've selected a data source...

  • Write down at least 2 specific research questions that can be answered by analyzing this data. The first should be "obvious" and may simply communicate the overall quantity of data you've got your hands on. The second should be more complex or subtle: a question that can be answered by the data, but only by rearranging, simplifying, or finding correlations within the data.

  • What are your specific hypotheses related to these research questions? What knowledge are you drawing on to make these predictions?

  • With your research questions in mind, design the detailed format for your raw data (the columns of your data "spreadsheet") and decide on the action or sampling frequency for each "row" of the data. Make sure you are able to acquire an "interesting" amount of data, both in number of samples (at least 1000 rows?) and in dimensions per sample (at least 3 columns?). Note: These estimates are not requirements. If your data has many more columns, things can be quite interesting even with far fewer rows.

  • Create (at least 2) simple visualization plots of this data using a tool that's new to you (or you would like to learn more about). Consider using: Excel, LineUp, Tableau, Google Analytics, Plotly, or VTK. These plots should attempt to answer the research questions you posed earlier. You can revise your research questions as needed as you work with the data.
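
The data-format step above can be sketched in Python as follows. This is only an illustration under stated assumptions: the column names, the `read_sample` function (a synthetic stand-in for a real sensor, log, or API), and the 0.1-second sampling interval are all hypothetical choices, not part of the assignment.

```python
import csv
import io
import math

COLUMNS = ["t_seconds", "sensor_a", "sensor_b"]  # hypothetical column names

def read_sample(t):
    """Hypothetical stand-in for a real data source (sensor, log, API)."""
    return [round(t, 3), round(math.sin(t), 6), round(math.cos(2 * t), 6)]

def collect(n_rows, interval):
    """Take one sample every `interval` seconds of simulated time."""
    return [read_sample(i * interval) for i in range(n_rows)]

def summarize(rows):
    """Report the shape of the data set: number of rows and columns."""
    return {"rows": len(rows), "columns": len(rows[0]) if rows else 0}

rows = collect(n_rows=1000, interval=0.1)
shape = summarize(rows)

# Write the header and rows as CSV (to an in-memory buffer for this sketch).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(COLUMNS)
writer.writerows(rows)
```

Checking the shape before collecting in earnest is a cheap way to confirm that the planned sampling frequency will yield roughly the suggested number of rows and columns.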

When you're ready to submit:

  • Prepare a writeup for this assignment with the information requested above as either a .pdf with inline images or a plaintext README.txt with well-named image files. Additionally, your writeup should detail the efforts you made to collect, parse, reorganize, simplify, and/or post-process this data source.

  • In a code directory, include the source code you wrote to collect the data. (Don't include third-party libraries; the code won't be compiled or run for grading purposes.)

  • In a data directory, include interesting samples of the data. Don't attempt to upload the entire dataset (it might be too big!), but a sample that shows the format and range of values. Document the overall size of the data (# of rows and/or file size for context). Depending on any work you had to do to wrangle the data into an alternate format, include samples of the data at intermediate and final stages as well.

  • Include a brief review of the tool you used to create the visualizations.
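
For the data directory item above, one possible way to pull a small sample that still shows the range of the data is to take evenly spaced rows, including the first and last. The `sample_rows` helper below is a hypothetical sketch, not a required approach, and the list of integers stands in for a real data set.

```python
def sample_rows(rows, k):
    """Return k rows spread evenly across the data, keeping first and last."""
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(rows)
    if n <= k:
        return list(rows)
    step = (n - 1) / (k - 1)
    return [rows[round(i * step)] for i in range(k)]

# Stand-in data set: 1000 "rows", each just its own index.
full_data = list(range(1000))
sample = sample_rows(full_data, 5)

# Document the overall size alongside the sample, as the writeup requests.
size_note = f"{len(full_data)} rows total; {len(sample)} rows in this sample"
```

Evenly spaced rows tend to show more of the value range than the first few lines of a file, which often cover only the start of a recording.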

Note: Teams of two should clearly label their submission with both names, and both students should upload the full assignment.