Foreword by Gareth James
Foreword by Ravi Bapna
Preface to the Python Edition
Acknowledgments
Part I Preliminaries
Chapter 1 Introduction
1.1 What is Business Analytics?
1.2 What is Data Mining?
1.3 Data Mining and Related Terms
1.4 Big Data
1.5 Data Science
1.6 Why are There So Many Different Methods?
1.7 Terminology and Notation
1.8 Road Maps to This Book
Chapter 2 Overview of the Data Mining Process
2.1 Introduction
2.2 Core Ideas in Data Mining
2.3 The Steps in Data Mining
2.4 Preliminary Steps
2.5 Predictive Power and Overfitting
2.6 Building a Predictive Model
2.7 Using Python for Data Mining on a Local Machine
2.8 Automating Data Mining Solutions
2.9 Ethical Practice in Data Mining
Problems
Part II Data Exploration and Dimension Reduction
Chapter 3 Data Visualization
3.1 Introduction
3.2 Data Examples
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
3.4 Multidimensional Visualization
3.5 Specialized Visualizations
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal
Problems
Chapter 4 Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality
4.3 Practical Considerations
4.4 Data Summaries
4.5 Correlation Analysis
4.6 Reducing the Number of Categories in Categorical Variables
4.7 Converting a Categorical Variable to a Numerical Variable
4.8 Principal Components Analysis
4.9 Dimension Reduction Using Regression Models
4.10 Dimension Reduction Using Classification and Regression Trees
Problems
Part III Performance Evaluation
Chapter 5 Evaluating Predictive Performance
5.1 Introduction
5.2 Evaluating Predictive Performance
5.3 Judging Classifier Performance
5.4 Judging Ranking Performance
5.5 Oversampling
Problems
Part IV Prediction and Classification Methods
Chapter 6 Multiple Linear Regression
6.1 Introduction
6.2 Explanatory vs. Predictive Modeling
6.3 Estimating the Regression Equation and Prediction
6.4 Variable Selection in Linear Regression
Appendix: Using Statmodels
Problems
Chapter 7 k-Nearest Neighbors (kNN)
7.1 The k-NN Classifier (Categorical Outcome)
7.2 k-NN for a Numerical Outcome
7.3 Advantages and Shortcomings of k-NN Algorithms
Problems
Chapter 8 The Naive Bayes Classifier
8.1 Introduction
Example 1: Predicting Fraudulent Financial Reporting
8.2 Applying the Full (Exact) Bayesian Classifier
8.3 Advantages and Shortcomings of the Naive Bayes Classifier
Problems
Chapter 9 Classification and Regression Trees
9.1 Introduction
9.2 Classification Trees
9.3 Evaluating the Performance of a Classification Tree
9.4 Avoiding Overfitting
9.5 Classification Rules from Trees
9.6 Classification Trees for More Than Two Classes
9.7 Regression Trees
9.8 Improving Prediction: Random Forests and Boosted Trees
9.9 Advantages and Weaknesses of a Tree
Problems
Chapter 10 Logistic Regression
10.1 Introduction
10.2 The Logistic Regression Model
10.3 Example: Acceptance of Personal Loan
10.4 Evaluating Classification Performance
10.5 Logistic Regression for Multi-class Classification
10.6 Example of Complete Analysis: Predicting Delayed Flights
Appendix: Using Statmodels
Problems
Chapter 11 Neural Nets
11.1 Introduction
11.2 Concept and Structure of a Neural Network
11.3 Fitting a Network to Data
11.4 Required User Input
11.5 Exploring the Relationship Between Predictors and Outcome
11.6 Deep Learning
11.7 Advantages and Weaknesses of Neural Networks
Problems
Chapter 12 Discriminant Analysis
12.1 Introduction
12.2 Distance of a Record from a Class
12.3 Fisher’s Linear Classification Functions
12.4 Classification Performance of Discriminant Analysis
12.5 Prior Probabilities
12.6 Unequal Misclassification Costs
12.7 Classifying More Than Two Classes
12.8 Advantages and Weaknesses
Problems
Chapter 13 Combining Methods: Ensembles and Uplift Modeling
13.1 Ensembles
13.2 Uplift (Persuasion) Modeling
13.3 Summary
Problems
Part V Mining Relationships among Records
Chapter 14 Association Rules and Collaborative Filtering
14.1 Association Rules
14.2 Collaborative Filtering
14.3 Summary
Problems
Chapter 15 Cluster Analysis
15.1 Introduction
15.2 Measuring Distance Between Two Records
15.3 Measuring Distance Between Two Clusters
15.4 Hierarchical (Agglomerative) Clustering
15.5 Non-Hierarchical Clustering: The k-Means Algorithm
Problems
Part VI Forecasting Time Series
Chapter 16 Handling Time Series
16.1 Introduction
16.2 Descriptive vs. Predictive Modeling
16.3 Popular Forecasting Methods in Business
16.4 Time Series Components
16.5 Data-Partitioning and Performance Evaluation
Problems
Chapter 17 Regression-Based Forecasting
17.1 A Model with Trend
17.2 A Model with Seasonality
17.3 A Model with Trend and Seasonality
17.4 Autocorrelation and ARIMA Models
Problems
Chapter 18 Smoothing Methods
18.1 Introduction
18.2 Moving Average
18.3 Simple Exponential Smoothing
18.4 Advanced Exponential Smoothing
Problems
Part VII Data Analytics
Chapter 19 Social Network Analytics
19.1 Introduction
19.2 Directed vs. Undirected Networks
19.3 Visualizing and Analyzing Networks
19.4 Social Data Metrics and Taxonomy
19.5 Using Network Metrics in Prediction and Classification
19.6 Collecting Social Network Data with Python
19.7 Advantages and Disadvantages
Problems
Chapter 20 Text Mining
20.1 Introduction
20.2 The Tabular Representation of Text: Term-Document Matrix and “Bag-of-Words”
20.3 Bag-of-Words vs. Meaning Extraction at Document Level
20.4 Preprocessing the Text
20.5 Implementing Data Mining Methods
20.6 Example: Online Discussions on Autos and Electronics
20.7 Summary
Problems
Part VIII Cases
Chapter 21 Cases
21.1 Charles Book Club
21.2 German Credit
21.3 Tayko Software Cataloger
21.4 Political Persuasion
21.5 Taxi Cancellations
21.6 Segmenting Consumers of Bath Soap
21.7 Direct-Mail Fundraising
21.8 Catalog Cross-Selling
21.9 Time Series Case: Forecasting Public Transportation Demand
References
Data Files Used in the Book
Python Utilities Functions
Index
GALIT SHMUELI, PhD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and taught data mining courses since 2004 at University of Maryland, Statistics.com, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 100 publications, including books.
PETER C. BRUCE is President and Founder of the Institute for Statistics Education at Statistics.com. He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective (Wiley) and co-author of Practical Statistics for Data Scientists: 50 Essential Concepts (O'Reilly).
PETER GEDECK, PhD, is a Senior Data Scientist at Collaborative Drug Discovery, where he helps develop cloud-based software to manage the huge amount of data involved in the drug discovery process. He also teaches data mining at Statistics.com.
NITIN R. PATEL, PhD, is cofounder and board member of Cytel Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years.