# CS 1173 Data Analysis and Visualization in MATLAB: Laboratory 2, Cats vs Dogs

## Objectives:

• Analyze a new data set by applying tools from previous lessons and labs.
• Further develop critical thinking skills.
• Interpret data and draw conclusions.

Data.world, with the help of the American Veterinary Medical Association, did a study in 2018 of Cats vs Dogs popularity in the U.S. Their goal was to find out which pet is more popular by state. The goal of this lab is to use the same data but instead focus on the quantity of pets within certain regions of the US.

### Hand-in Requirements:

• The lab2Script.m script file containing the code used to read the data and generate the graphs. (Be sure your script displays each of the figures you want to have considered.)
• A Microsoft Word document with the bullet points and paragraph described in the Analysis part.

The lab should be submitted electronically through Blackboard under the Labs menu. Zip up your entire lab2 directory to submit. (Right-click on the lab2 folder to zip all the files together.) Remember to put your Word document in the lab2 directory along with your script and the data.

| File | Description |
|---|---|
| Lab2Pets.zip | The .zip file contains 3 files. |
| pets.mat | The dataset contains a 48x6 matrix. Each row represents a state, in alphabetical order, with the exclusion of Alaska and Hawaii. (Ex: Alabama is row 1 and Wyoming is row 48.) pets.mat has 6 columns, with the breakdown below. |
| us_states.csv | States listed in alphabetical order, each with its index number. |
| lab2.m | A template that you should use for Lab 2. |

Columns of pets.mat:

• Column 1 contains the number of households surveyed.
• Column 2 contains the number of households with any cat or dog pets.
• Column 3 contains the number of households that had at least 1 dog as a pet.
• Column 4 contains the overall number of pet dogs.
• Column 5 contains the number of households that had at least 1 cat as a pet.
• Column 6 contains the overall number of pet cats.

### Part I: Initial Setup

• Open the script file Lab2.m and load the data. You will start with the Lab2.m file and edit it.
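
A minimal sketch of the loading step, assuming pets.mat sits in the current folder. The name of the variable stored inside the file is not stated here, so the sketch loads into a struct and looks the name up instead of guessing it. The names `dataFile`, `S`, `names`, and `pets` are illustrative, not part of the template.

```matlab
dataFile = 'pets.mat';               % expected in the current folder
pets = [];                           % stays empty if the file is not found
if exist(dataFile, 'file') == 2
    S = load(dataFile);              % load into a struct; no variable name assumed
    names = fieldnames(S);           % names of the variables stored in the file
    pets = S.(names{1});             % assumption: the first variable is the 48x6 matrix
    disp(size(pets))                 % should display 48 and 6 if that assumption holds
else
    warning('%s not found; change to the lab2 folder first.', dataFile);
end
```

If the template already loads the data under a specific variable name, use that name instead.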

### Part II: Identifying and extracting the regions

We are going to be dividing the US into 4 major regions. They are Northeast, Midwest, South and West. To find what region each state belongs to, use the table below.

| Region | List of States |
|---|---|
| Northeast | Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont, New Jersey, New York, and Pennsylvania |
| Midwest | Illinois, Indiana, Michigan, Ohio, Wisconsin, Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota |
| South | Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, West Virginia, Alabama, Kentucky, Mississippi, Tennessee, Arkansas, Louisiana, Oklahoma, and Texas |
| West | Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, Wyoming, California, Oregon, and Washington |

Using the above regional definitions and the data in us_states.csv, determine which rows belong to which region, and create the variables below with the data from the states in each region. For example, West has 11 states in it, so the resulting array should have 11 rows and 6 columns. You will edit the code in Lab2.m to create these variables.

• West = ? % Extract the appropriate rows for the West region
• Midwest = ? % Extract the appropriate rows for the Midwest region
• Northeast = ? % Extract the appropriate rows for the Northeast region
• South = ? % Extract the appropriate rows for the South region
• Region_households = ? % A vector of four values, each value is the total number of households surveyed in each region - one value for each region.
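
As an illustration of the row-extraction pattern only, the sketch below uses a small made-up matrix with the same 6-column layout and hypothetical row indices. It is not the lab data, and the names `toy`, `groupA`, `groupB`, and `group_households` are assumptions chosen for the example.

```matlab
% Toy stand-in for the pets matrix: 5 rows ("states") by 6 columns.
toy = [100  60  40  55  30  45;
       200 120  80 110  60  90;
       150  90  60  80  45  70;
       120  70  50  65  35  50;
       180 100  70  95  50  75];

groupA_rows = [1 3 4];               % hypothetical row indices, not real state indices
groupB_rows = [2 5];

groupA = toy(groupA_rows, :);        % keep all 6 columns for the selected rows
groupB = toy(groupB_rows, :);

% Column 1 holds households surveyed, so summing it gives one total per group.
group_households = [sum(groupA(:, 1)), sum(groupB(:, 1))];

disp(size(groupA))                   % rows selected, then 6 columns
disp(group_households)
```

For the lab itself, the row indices come from matching the region table against us_states.csv.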

### Part III: Pie Chart of households surveyed by region

Create a new cell that uses the West, Midwest, Northeast and South variables you created above to draw a pie chart of the percentage of households surveyed by region. Hint: There should be 4 pie pieces.
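
A minimal sketch of a labeled pie chart, using made-up totals for three hypothetical groups instead of the lab's four regions. The names `households`, `labels`, and `shares` are illustrative assumptions.

```matlab
households = [370 380 250];                      % made-up totals, one per group
labels = {'Group A', 'Group B', 'Group C'};      % hypothetical group names
shares = 100 * households / sum(households);     % percentage of the total per group

figure
pie(households)                                  % pie scales the values to fractions of the total
legend(labels, 'Location', 'eastoutside')
title('Households surveyed by group (toy data)')
```

By default `pie` labels each slice with its percentage, so `shares` is computed here only as a check on what the chart displays.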

### Part IV: Comparing cats and dogs by region.

Plot 2 separate bar charts on the same figure (subplot), the left showing the total number of cats by region and the right showing the total number of dogs by region. Label and scale your axes appropriately.

To label each bar, use set(gca,'xticklabel',{'West','Midwest','Northeast','South'}). You must use the region_cats and region_dogs variables.
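
The side-by-side layout can be sketched as follows, with made-up totals for three hypothetical groups. The names `toy_cats`, `toy_dogs`, `groupNames`, and `yMax` are assumptions for the example; giving both panels the same y-axis limit is one way to make the two charts directly comparable.

```matlab
toy_cats = [45 90 70];                       % made-up totals, one per group
toy_dogs = [55 110 80];
groupNames = {'A', 'B', 'C'};                % hypothetical group labels
yMax = 1.1 * max([toy_cats, toy_dogs]);      % shared upper limit for both panels

figure
subplot(1, 2, 1)                             % left panel
bar(toy_cats)
set(gca, 'xticklabel', groupNames)
ylim([0 yMax])
xlabel('Group'); ylabel('Number of cats'); title('Cats by group')

subplot(1, 2, 2)                             % right panel
bar(toy_dogs)
set(gca, 'xticklabel', groupNames)
ylim([0 yMax])
xlabel('Group'); ylabel('Number of dogs'); title('Dogs by group')
```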

### Part V: Statistics table

Populate the statistics variables in the script with the correct definitions. Once you are done, this will print out a table of statistics similar to the one below. You are free to create other variables to help you with these values.

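| | All | Midwest | West | South | Northeast |
|---|---|---|---|---|---|
| Houses | ? | ? | ? | ? | ? |
| Max Dogs | ? | ? | ? | ? | ? |
| % of dogs | ? | ? | ? | ? | ? |
| Avg dogs | ? | ? | ? | ? | ? |
| Max Cats | ? | ? | ? | ? | ? |
| % of cats | ? | ? | ? | ? | ? |
| Avg cats | ? | ? | ? | ? | ? |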

where the rows mean:

• Num of Houses: Number of houses surveyed.
• Max_Cats: maximum number of cats found in any state in that region.
• Per_Cats: percentage of households surveyed that have at least one cat as a pet.
• Avg_Cats: average number of cats in a household reporting cats as pets.
• Max_Dogs: maximum number of dogs found in any state in that region.
• Per_Dogs: percentage of households surveyed that have at least one dog as a pet.
• Avg_Dogs: average number of dogs in a household reporting dogs as pets.
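
One reading of the cat definitions above, sketched on a small made-up matrix with the same 6-column layout (the dog statistics would follow the same pattern with columns 3 and 4). This is an interpretation under stated assumptions, not the template's code; the names `toy`, `num_houses`, `max_cats`, `per_cats`, and `avg_cats` are illustrative.

```matlab
% Toy stand-in for one region: 3 rows ("states") by 6 columns.
toy = [100  60  40  55  30  45;
       200 120  80 110  60  90;
       150  90  60  80  45  70];

num_houses = sum(toy(:, 1));                     % households surveyed
max_cats   = max(toy(:, 6));                     % largest cat count in any single row
per_cats   = 100 * sum(toy(:, 5)) / num_houses;  % percent of surveyed households with a cat
avg_cats   = sum(toy(:, 6)) / sum(toy(:, 5));    % cats per cat-owning household

fprintf('%-10s %10s\n',   'Statistic', 'Toy');
fprintf('%-10s %10d\n',   'Houses',    num_houses);
fprintf('%-10s %10d\n',   'Max Cats',  max_cats);
fprintf('%-10s %10.2f\n', '% of cats', per_cats);
fprintf('%-10s %10.2f\n', 'Avg cats',  avg_cats);
```

Note that the percentage divides by all households surveyed (column 1), while the average divides by only the households reporting that pet (column 5 for cats).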

### Other requirements:

Implement each part of the lab in a separate cell. Document what each cell does.

### Part VI: Analysis

Create a Microsoft Word document containing the following:

• In bullet point format, give 3 possible reasons why the South region has the highest number of pets.
• In bullet point format, give 3 possible reasons why the overall average number of cats is higher than the overall average number of dogs.
• In bullet point format, identify 3 potential problems this survey is susceptible to.
• A short paragraph on how you would conduct this survey and what information you would add to make it more robust.

### Grading rubric for Lab 2 (point values)

| Criterion / Performance indicator | Missing | Needs improvement | Needs a little improvement | Meets expectations | Excellent |
|---|---|---|---|---|---|
| Part II graph is correct and has appropriate labeling | 0 | 2.5 | 4 | 4.5 | 5 |
| Part III graph is correct and has appropriate labeling | 0 | 2.5 | 4 | 4.5 | 5 |
| Part IV graph is correct and has appropriate labeling | 0 | 2.5 | 4 | 4.5 | 5 |
| Part V statistics table has correct values | 0 | 3.75 | 6 | 6.75 | 7.5 |
| Script runs without error | 0 | 3.75 | 6 | 6.75 | 7.5 |
| 3 bullet points discussing why South has highest number of pets | 0 | 2.5 | 4 | 4.5 | 5 |
| 3 bullet points comparing average number of cats and average number of dogs | 0 | 2.5 | 4 | 4.5 | 5 |
| 3 bullet points discussing problems with the survey | 0 | 2.5 | 4 | 4.5 | 5 |
| Paragraph on the implications | 0 | 2.5 | 4 | 4.5 | 5 |

This project was created by Dave Patrick and input by Dawn Roberson of the University of Texas at San Antonio and last modified on 6 Oct 2019. Please contact dawnlee.roberson@utsa.edu with comments or suggestions.