Murach’s R for Data Analysis

by Scott McCoy
15 chapters, 576 pages, 235 illustrations
Published December 2022
ISBN 978-1-943873-03-6
Print: $59.50
eBook: $54.50
Print + eBook: $72.00

From its start, the R programming language was designed to be used for statistical analysis. Today, it’s one of the top languages used by data analysts and statisticians. With this book, you’ll learn the R skills you need to become a successful data analyst, even if you’re new to programming or have never studied statistics.

Praise for:
Murach's Python for Data Analysis

“In his first at-bat, Scott McCoy smashes this one out of the park! This book is not just informative, it is exciting.”

—Scott Spurlock, Software Engineer

Who this book is for

This book is for anyone who wants to learn how to visualize and present data professionally. The only prerequisite is basic computer literacy. That’s because chapters 1 and 2 present the parts of the R programming language that you need to get started with data analysis. Then, the rest of the book shows how to use R to analyze data.

Thanks to its unique paired-pages format, this book works equally well if you’re new to programming or if you’re an experienced programmer. Each figure is paired with explanatory text in clear, easy-to-understand language. If you’re new to programming, you’ll want to read each chapter carefully, following along with the code examples. If you’re an experienced programmer, you can read more quickly and apply your new skills on the job.

What this book does

Section 1: Get off to a fast start

This section gets you started programming right away. First, you’ll learn how to use RStudio, a popular program for coding in R that’s available for free. Then, you’ll learn the parts of the R language that you need to analyze data. Next, you’ll learn how to use R with the tidyverse package to create your first analysis.

Section 2: The essential skills for descriptive analysis

Most analysis is descriptive analysis, in which you analyze data to better understand it. That’s why section 2 of this book presents the critical descriptive analysis skills that you need for success on the job. In this section, you’ll learn how to:

  • Create powerful visualizations that can guide your analysis
  • Get data from CSV files, Excel files, JSON files, and databases
  • Clean data by dropping unneeded rows and columns, by using the correct data types, and by finding and fixing missing values and outliers
  • Prepare data by adding columns, modifying the data in columns, applying functions and lambda expressions, grouping and aggregating data, and more
  • Enhance your data visualizations to make them ready for professional presentation
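
To give a rough idea of what a few of these skills look like in R, here is a minimal sketch that uses the dplyr package from the tidyverse. It is not taken from the book: the `fires` table, its column names, and its values are all made up for illustration, and a small inline tibble stands in for data that you would normally read from a file.

```r
library(dplyr)

# Hypothetical data (made up for this sketch); in practice you would
# read a file instead of typing the values in.
fires <- tibble(
  state = c("CA", "CA", "OR", "OR", "WA"),
  acres = c("1200", "850", NA, "430", "95"),  # numbers stored as strings
  notes = c("a", "b", "c", "d", "e")          # a column we don't need
)

summary_by_state <- fires %>%
  select(-notes) %>%                       # clean: drop an unneeded column
  mutate(acres = as.numeric(acres)) %>%    # clean: use the correct data type
  filter(!is.na(acres)) %>%                # clean: drop rows with missing values
  mutate(large = acres >= 500) %>%         # prepare: add a column
  group_by(state) %>%                      # prepare: group the rows...
  summarize(total_acres = sum(acres),      # ...and aggregate each group
            large_fires = sum(large))

print(summary_by_state)
```

Each step in the pipeline corresponds to one of the cleaning or preparation tasks in the list above, which is one reason this style of code is easy to read back later.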

Section 3: The case studies

This section presents three complete analyses that show how the skills presented in the first two sections can be applied to real-world data sets:

  • Polling data for the 2016 US presidential election
  • Wildfire data from the US Forest Service
  • Basketball shot data from the NBA (National Basketball Association)

These in-depth analyses make sure that you master the professional skills you need.

Section 4: Get started with predictive analysis

Predictive analysis takes data analysis to another level by using statistical models to predict unknown or future values. Although predictive analysis is a large and complex topic, this section presents the concepts you need to get started with it. More specifically, this section shows how to use linear regression models to predict continuous numeric values and how to use classification models to predict categorical values.
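
As a rough illustration of the regression half of this idea, the sketch below fits a straight-line model with base R’s `lm()` function and uses it to predict values for rows that the model hasn’t seen. It is not the book’s code: it assumes the built-in `mtcars` data set as a stand-in, and the 24-row training split and the seed are arbitrary choices made for this example.

```r
# The built-in mtcars data set is used here only as a convenient stand-in.
set.seed(42)                                   # arbitrary seed, for repeatability
train_rows <- sample(nrow(mtcars), size = 24)  # 24 of the 32 rows for training
train <- mtcars[train_rows, ]
test  <- mtcars[-train_rows, ]

# Fit a linear regression model: miles per gallon as a function of weight
model <- lm(mpg ~ wt, data = train)

# Use the model to predict a continuous numeric value for the held-out rows
predicted <- predict(model, newdata = test)

# Put the actual and predicted values side by side for comparison
results <- data.frame(actual = test$mpg, predicted = predicted)
print(head(results))
```

Holding back some rows for testing is what lets you judge how well the model predicts values it wasn’t trained on.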

Section 5: Presentation skills

Section 5 shows how to present an analysis. To do that, you can use R Markdown to convert your analysis into an HTML document, PDF file, or PowerPoint slideshow. This is an important skill because the value of an analysis comes from being able to present the insights gained from it to your target audience, whether that’s your boss, your clients, or the general public.

Why you’ll learn faster and better with this book

Like all our books, this book is designed to make it as easy as possible for you to learn new skills faster and retain them better. Here are a few of the features that make this possible:

  • All of the information is presented in paired pages, with the essential syntax, guidelines, and examples on the right page and clear explanations on the left page. This helps you learn faster by reading less.
  • The paired-pages format is ideal for reference when you need to refresh your memory about how to do something.
  • The three analyses presented in section 3 use real-world data sets.
  • The hundreds of short examples present usable code for tasks that you’re likely to need for your own analyses.
  • The exercises at the end of each chapter provide a way for you to gain valuable hands-on experience without any extra busywork.

What software you need for this book

To use R for data analysis, you only need to download and install the RStudio program and the R programming language. Both are available for free. Appendix A shows how to install them on Windows, and appendix B shows how to install them on macOS.

What people say about Murach's Python for Data Analysis

“In his first at-bat, Scott McCoy smashes this one out of the park! This book is not just informative, it is exciting.”
— Scott Spurlock, Software Engineer, Georgia

“Unlike some other books on data analysis with Python, the explanations of how to perform data analysis are thorough rather than terse or with no explanations.”
— Posted at an online bookseller

What people say about Murach books

“This is my first exposure to Murach’s books, and I love them. I like the organization of the content, the consistent approach in each book, and the accuracy of the material.”
— Bob L., Michigan

“I can’t praise this book highly enough. The clarity used in picking what to include, when to introduce it, and how to do so is remarkable.”
— Charles Ferguson, Software Developer, Australia

“Another thing I like is the exercises at the end of each chapter. They’re a great way to reinforce the main points of each chapter and force you to get your hands dirty.”
— Hien Luu, SD Forum/Java SIG

“Throughout the entire project, your book was indispensable to me. The answers were right there at every turn. All the examples made sense, and they all worked!”
— Alan Vogt, ETL Consultant, Massachusetts

“This book covers the perfect amount of description, and it does not make you bored by providing unnecessary details.”
— Posted at an online bookseller

“I picked up my first Murach book at a local bookstore in 2006, not knowing what was inside or what level of knowledge it would require of me, and it has changed my life since, literally. Your format (the paired pages) made it easy for me, an accountant with no IT or software development background, to understand databases and gain skills that proved useful throughout my entire career.”
— Giovanni Galope, Accountant, Philippines

“Your books shine out from the rest—the quality of writing and presentation of information is topnotch, and the consistency of quality across books is impressive.”
— Nolan Tamashiro, Developer

Table of contents

Section 1 Get started fast

Chapter 1 Introduction to RStudio and R

Introduction to data analysis

What data analysis is

The five phases of data analysis

How to get started with RStudio

Introduction to RStudio

How to run code in the Console pane

How to run code in the Source pane

How to view variables in the Environment pane

How to get started with R

How to create variables

How to work with variables

How to code arithmetic expressions

How to use arithmetic expressions in statements

How to interpret error messages

Chapter 2 More skills for working with R

How to use functions

How to call functions

How to use functions to work with strings

How to use functions to work with numbers

How to work with data structures

How to work with vectors

How to work with data frames

How to work with lists

How to add values to data structures

How to code Boolean expressions

How to use the relational operators

How to use the logical operators

How to work with control structures

How to code if statements

How to code nested if statements

How to code for loops

How to define functions

Chapter 3 How to code your first analysis

An introduction to the analysis

The child mortality data

How to set the working directory

How to work with packages

How to get and examine the data

How to read the data into a tibble

How to select the top and bottom rows

How to view summary statistics

How to prepare and analyze the data

How to melt the data

How to add, modify, and rename columns

How to save a tibble as an RDS file

How to calculate summary columns

How to create a line plot

Section 2 The essential skills for data analysis

Chapter 4 How to visualize data

How to get some data to plot

How to use the datasets package

How to get the irises data

How to get the chicks data

How to select rows based on a condition

An introduction to ggplot2

How to create a base plot

Functions for common plot types

How to create relational plots

How to create a line plot

How to create a scatter plot

How to create categorical plots

How to create a bar plot

How to create a box plot

How to create distribution plots

How to create a histogram

How to create a KDE plot

How to create an ECDF plot

How to create a 2D KDE plot

More skills for working with plots

How to combine plots

How to create a grid of plots

How to view documentation

How to save a plot

Chapter 5 How to get data

Basic skills for getting data

How to find the data you want

How to read data from CSV and Excel files

How to download data

How to work with a zip file

How to read data from a database

How to connect to a database

How to list the tables in a database

How to list the columns of a table

How to code a query

How to use a query to read data

How to read data from a JSON file

How to read a JSON file into a list

How to get the index for a list

How to get data from a list

How to build a tibble from the data in a list

Chapter 6 How to clean data

Introduction to cleaning data

A general plan for cleaning data

How to display column names and data types

How to examine the unique values for a single column

How to display the unique values for all columns

How to count the unique values for all columns

How to display the value counts

How to sort the data

How to simplify a data set

How to filter and drop rows

How to drop columns

How to rename columns

How to work with missing values

How to find missing values

How to fix missing values

How to work with data types

How to select columns by data type

How to convert strings to numbers

How to convert strings to dates and times

How to work with the factor type

How to work with outliers

How to assess outliers

How to calculate quartiles and quantiles

How to calculate the fences for the box plot

How to fix the outliers

Chapter 7 How to prepare data

How to add and modify columns

How to work with date columns

How to use stringr to work with strings

How to work with string and numeric columns

How to group and bin data

How to use statistical functions

How to summarize data

How to group and summarize data

Another way to group and summarize data

How to rank rows

How to add a cumulative sum

How to bin data

How to apply functions and lambda expressions

How to define functions that operate on rows

How to define functions that operate on columns

How to use lambda expressions instead of functions

How to reshape tibbles

How to add columns by joining tibbles

How to add rows

Chapter 8 More skills for data visualization

More skills for working with plots

Get the data

More skills for working with scatter plots

More skills for working with bar plots

How to add an error bar to a bar plot

More skills for working with line plots

How to create a smooth line plot

How to add labels to plots

How to work with shapes

How to plot shapes

How to plot a baseball field

How to return plot components from a function

How to plot hits on a baseball field

How to work with maps

How to plot maps

How to add data to a map

How to tune plots

How to zoom in on part of a plot

How to adjust the limits of a plot

How to work with the plot title and axes labels

How to change the position of the legend

How to edit the legend

How to hide the text and ticks for each axis

How to set the colors for the plot

How to change the theme of the plot

How to work with a grid of plots

How to create a pairwise grid of scatter plots

How to use other plot types in the grid

Section 3 Three case studies

Chapter 9 The Polling analysis

Get and examine the data

Load the packages

Get the data

Examine the data

Clean the data

Select and rename the columns

Sort the rows

Select the rows

Improve some columns

Prepare the data

Add columns

Pivot the data

Analyze the data

Plot the national polls

Plot the polls for swing states

Analyze the polls by voter type

More preparation and analysis

Plot the gap for the last week of the election

Plot the weekly gap over time

Chapter 10 The Wildfires analysis

Get the data

Load the packages for this analysis

Unzip the database file

Read the data from the database

Clean the data

Improve column names and data types

Drop duplicate rows

Select rows for large fires

Examine NA values

Prepare the data

Add, modify, and select columns

Sort the rows

Analyze the data

Plot the largest fire per year in California

Plot the mean and median acres burned in California

Plot the fires per month in California

Plot the total acres burned for the top 10 states

Plot the acres burned per year for the top 4 states

Plot the data on a map

Plot the 20 largest fires in California

Plot all fires in California larger than 500 acres

Plot all fires in the U.S. larger than 100,000 acres

Chapter 11 The Basketball Shots analysis

Get the data

Load the packages

Read the data

Build the tibble

Clean the data

Examine the unique values

Select and rename the columns

Improve the data types for two columns

Prepare the data

Add a Season column

Add a Points column

Add some summary columns

Analyze the shot statistics

Plot shots made per game by season

Plot shots attempted vs. made per game

Plot shots made per game for all seasons

Plot shot statistics by season

Plot shooting percentages per season

Analyze the shot locations

Plot shot locations for two games

Define a function for drawing the court

Plot shot locations for two games on a court

Plot shots by zone for one season

Plot shot count by zone

Plot shooting percentage by zone

Plot shot density

Compare shot locations and density for two seasons

Section 4 An introduction to data modeling

Chapter 12 How to work with simple regression models

Introduction to predictive analysis

Types of predictive models

Introduction to regression analysis

The diamonds data set

How to get the data

How to examine and clean the data

How to find correlations

How to interpret correlation coefficients

How to identify correlations with r-values

How to identify correlations visually

How to create a model that uses a straight line

A procedure for working with a regression model

How to split the data

How to drop outliers from the training data set

How to create a model

How to use a model to make predictions

How to work with formulas

How to plot an equation

How to plot an equation on a scatter plot

How to code formulas

How to plot a formula on a scatter plot

How to create a model for a curved line

Chapter 13 How to work with multiple regression models

How to work with multiple variables

How to create and fit the model

How to judge the model by its R² value

How to judge the model by its residuals

How to work with variable interactions

More formula operators

How to create and fit the model

How to view the model’s terms

How to remove insignificant terms

How to plot regression coefficients

How to work with nonlinear patterns

Five common nonlinear patterns

How to transform variables

How to create, fit, and judge the model

How to work with ordinal variables

How to examine ordinal variables

How to create, fit, and judge the model

Chapter 14 How to work with classification models

Introduction to classification models

Introduction to classification analysis

How to get the data for this chapter

How to visually investigate the data

How to create and judge a classification model

How to create a decision tree

How to plot a decision tree

How to judge a model with a confusion matrix

How to tune a classification model

How to use variable importance to select variables

How to adjust the hyperparameters

How to compare decision trees

How to cross validate a model

How to tune hyperparameters with a grid search

Section 5 Presentation skills

Chapter 15 How to use R Markdown to present an analysis

How to get started with R Markdown files

How to create an R Markdown file

How to render an R Markdown file

How to work with R Markdown documents

How to code the YAML header

How to add headings and paragraphs

How to add chunks of code

How to run chunks of code

How to format text

How to create dynamic documents

How to specify multiple output formats

The Wildfires document

The HTML document displayed in a browser

The PDF and Word documents for the same markdown

The R Markdown

How to work with R Markdown presentations

How to start a presentation

The first two slides of a presentation

Appendixes

Appendix A How to set up Windows for this book

How to install R

How to install RStudio

How to install the files for this book

How to install the packages for this book

Appendix B How to set up macOS for this book

How to install R

How to install RStudio

How to install the files for this book

How to install the packages for this book

Free chapter

To get a better idea of how well this book can work for you regardless of your level of experience, you can download the third chapter of this book in PDF format.

Chapter 3: How to code your first analysis

The goal of this chapter is to give you a taste of how data analysis works. In addition, it’s designed to introduce you to some of the most important R packages for working with data analysis. To do that, this chapter presents the R code for a simple but complete analysis of child mortality data. The code for this analysis uses a collection of packages known as the tidyverse.
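
To show the general shape of such an analysis, here is a minimal sketch that reshapes a small table from wide to long format and then draws a line plot. It is an illustration only, not the chapter’s code: the `wide` table, its column names, and its values are invented, and it assumes the tidyr, dplyr, and ggplot2 packages from the tidyverse are installed.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Hypothetical wide data (values invented): one row per year,
# one column per age group
wide <- tibble(
  year    = c(1900, 1950, 2000),
  age_1_4 = c(100, 50, 10),
  age_5_9 = c(40, 20, 5)
)

# Reshape ("melt") the data so there is one row per year and age group
long <- wide %>%
  pivot_longer(cols = -year,
               names_to = "age_group",
               values_to = "death_rate")

# Create a line plot with one line per age group
line_plot <- ggplot(long, aes(x = year, y = death_rate, color = age_group)) +
  geom_line()

print(line_plot)
```

The long format is what makes the plot simple to code, because ggplot2 can map the `age_group` column directly to line color.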

Book analyses and exercises

This download includes files for:

  • The R scripts for all examples and analyses presented in this book
  • The starting points for the exercises presented at the end of each chapter
  • The solutions to those exercises
  • The data for all of the examples, analyses, and exercises

Appendixes A and B show how to install and use these files on Windows and macOS.

On this page, we’ll be posting answers to the questions that come up most often about our R data analysis book. If you have any questions that you haven’t found answered here, please email us. Thanks!

To view the corrections for this book in a PDF, just click on this link: View the corrections

Then, if you find any other errors, please email us so we can correct them in the next printing of the book. Thank you!

Our Ironclad Guarantee

You must be satisfied. Try our print books for 30 days or our eBooks for 14 days. If they aren't the best you've ever used, you can return the books or cancel the eBooks for a prompt refund. No questions asked!

Contact Murach Books

For orders and customer service:

1-800-221-5528

Weekdays, 8 to 4 Pacific Time

College Instructors

If you're a college instructor who would like to consider a book for a course, please visit our website for instructors to learn how to get a complimentary review copy and the full set of instructional materials.