From a review of the first edition: "Modern Data Science with R… is rich with examples and is guided by a strong narrative voice. What’s more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician).
Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions.
The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.
Part I: Introduction to Data Science. 1. Prologue: Why data science? 2. Data visualization. 3. A grammar for graphics. 4. Data wrangling on one table. 5. Data wrangling on multiple tables. 6. Tidy data. 7. Iteration. 8. Data science ethics.
Part II: Statistics and Modeling. 9. Statistical foundations. 10. Predictive modeling. 11. Supervised learning. 12. Unsupervised learning. 13. Simulation.
Part III: Topics in Data Science. 14. Dynamic and customized data graphics. 15. Database querying using SQL. 16. Database administration. 17. Working with spatial data. 18. Geospatial computations. 19. Text as data. 20. Network science.
Part IV: Appendices.
Benjamin S. Baumer is an associate professor in the Statistical & Data Sciences program at Smith College. He has been a practicing data scientist since 2004, when he became the first full-time statistical analyst for the New York Mets. Ben is a co-author of The Sabermetric Revolution and Analyzing Baseball Data with R. He received the 2019 Waller Education Award and the 2016 Significant Contributor Award from the Society for American Baseball Research.
Daniel T. Kaplan is the DeWitt Wallace emeritus professor of mathematics and computer science at Macalester College. He is the author of several textbooks on statistical modeling and statistical computing. Danny received the 2006 Macalester Excellence in Teaching award and the 2017 CAUSE Lifetime Achievement Award.
Nicholas J. Horton is Beitzel Professor of Technology and Society (Statistics and Data Science) at Amherst College. He is a Fellow of the ASA and the AAAS, co-chair of the National Academies Committee on Applied and Theoretical Statistics, recipient of a number of national teaching awards, author of a series of books on statistical computing, and actively involved in data science curriculum efforts to help students "think with data".
"This text continues to be fantastic! There are a number of courses
for which I would require this book and others that I would
recommend it as a supplement. I would likely require it for courses
focused on computing in R or courses in data science. I would
include it as a recommended text in introductory and other
statistics courses that used R as the software of choice, where
this text could be used as a supplemental resource in how to use R
to work with data."
-Hunter Glanz, Cal Poly San Luis Obispo
"Easy for students to read and relate to the exercises and examples.
Many questions and hands-on activities with data sets to practice
skills."
-Lynn Collen, St. Cloud State University
"I used the first edition of this book as the primary text for an
intermediate data science course a few years ago and I liked it very
much…I think that the technical breadth, writing style, and level of
difficulty are very clear strengths. Also, my students and I found
the `tidyverse` approach to be particularly well-suited for teaching
and learning R…and I love that the MDSR book includes such complete
code. Students can program everything they see in the book, and
often times there are tips & tricks for them to discover along the
way just by studying expert code provided by the authors. This
really sets MDSR apart from other books I considered for the
course."
-Matthew Beckman, Penn State University
"[...] To answer a wide range of modern research questions, this
book by Baumer, Kaplan, and Horton features an excellent
introduction to data wrangling, visualization, statistical
modeling, machine learning, and other advanced statistical
applications through the RStudio environment following the
tidyverse syntax. [...] Overall, Modern Data Science with R, 2nd
edition serves as an excellent introductory resource to help
develop techniques to extract, transform, visualize, and learn from
datasets through the R environment. It focuses on implementing
those techniques in R and does not provide a theoretical background
for the discussed methods. The book will be a perfect reference for
a broad audience ranging from undergraduates in data science
courses to advanced graduate students and professionals from a
variety of research fields."
-Kohma Arai and Vyacheslav Lyubchich, in Technometrics, July 2022
"Overall, I enjoyed reading this book. The authors were very
good at creating a complete tool for studying data science.
Therefore, I recommend this book, for its content, writing, and
organization, to graduate students in data science and statistics.
I also recommend the book to professionals who should prepare
themselves for the challenges they are going to face in the future
with the voluminous and heterogenous amount of data that should be
timely analyzed to extract meaningful information to guide
action."
-Georgios Nikolopoulos, in ISCB News, June 2022
"The authors have
successfully completed the job of choosing the content with
relevant topics and, deciding the extent of knowledge to be
delivered, and finally, putting them in an understandable sequence.
This is a well-written book and does not cover much theory. ... The
book’s second edition contents are updated, expanded, revised,
split, rewritten and rearranged compared to the first edition. The
key changes are the use of recently developed R packages, ...
(and) updated exercises in the chapters ..."
-Shalabh, in Journal of the Royal Statistical Society Series A, August 2021
"[This book] provides an excellent basis for
statisticians who want to dig deeper into, for example, data
handling, for computer scientists who aim to strengthen their
knowledge of statistical methods as well as for all other
researchers who are interested in data science in general. ... Each
section is structured as an interplay between R-code and
explanatory text for understanding. The division into several
stand-alone segments is an advantage, because the reader may easily
choose the section she or he is interested in without missing
relevant information. A key feature of the book is its focus on
different example data sets that are available via R-packages or
from URLs that are embedded in the text. These data sets are used
to illustrate the methodology presented using R-code. Their
availability allows the reader to reproduce the code while working
with the book. ... It can be warmly recommended to practical
researchers who seek a comprehensive overview of different topics
in data science with focus on implementations in R."
-Annika Hoyer, in Biometrical Journal, August 2021
"The authors have covered
almost all aspects of data science, a revolutionary field that
marries elements of computational thinking and traditional
statistical theory. The book can thus equip the readers with the
necessary knowledge and skills to extract data from a variety of
sources, restructure observations in a form that allows analysis,
store data in efficient databases, and work effectively on massive
and complex data sets in order to produce actionable
information."
- Georgios Nikolopoulos, University of Cyprus, ISCB Book Reviews,
June 2022.