Anandarajan, Murugan.

Practical Text Analytics : Maximizing the Value of Text Data. - 1 online resource (294 pages). - eBooks on Demand. - (Advances in Analytics and Data Science Ser. ; v.2).

Intro -- Dedication -- Preface -- Acknowledgments -- Contents -- About the Authors -- List of Abbreviations -- List of Figures -- List of Tables -- Chapter 1: Introduction to Text Analytics -- 1.1 Introduction -- 1.2 Text Analytics: What Is It? -- 1.3 Origins and Timeline of Text Analytics -- 1.4 Text Analytics in Business and Industry -- 1.5 Text Analytics Skills -- 1.6 Benefits of Text Analytics -- 1.7 Text Analytics Process Road Map -- 1.7.1 Planning -- 1.7.2 Text Preparing and Preprocessing -- 1.7.3 Text Analysis Techniques -- 1.7.4 Communicating the Results -- 1.8 Examples of Text Analytics Software -- References -- Part I: Planning the Text Analytics Project -- Chapter 2: The Fundamentals of Content Analysis -- 2.1 Introduction -- 2.2 Deductive Versus Inductive Approaches -- 2.2.1 Content Analysis for Deductive Inference -- 2.2.2 Content Analysis for Inductive Inference -- 2.3 Unitizing and the Unit of Analysis -- 2.3.1 The Sampling Unit -- 2.3.2 The Recording Unit -- 2.3.3 The Context Unit -- 2.4 Sampling -- 2.5 Coding and Categorization -- 2.6 Examples of Inductive and Deductive Inference Processes -- 2.6.1 Inductive Inference -- 2.6.2 Deductive Inference -- References -- Further Reading -- Chapter 3: Planning for Text Analytics -- 3.1 Introduction -- 3.2 Initial Planning Considerations -- 3.2.1 Drivers -- 3.2.2 Objectives -- 3.2.3 Data -- 3.2.4 Cost -- 3.3 Planning Process -- 3.4 Problem Framing -- 3.4.1 Identifying the Analysis Problem -- 3.4.2 Inductive or Deductive Inference -- 3.5 Data Generation -- 3.5.1 Definition of the Project's Scope and Purpose -- 3.5.2 Text Data Collection -- 3.5.3 Sampling -- 3.5.3.1 Non-probability Sampling -- 3.5.3.2 Probability Sampling -- 3.5.3.3 Sampling for Classification Analysis -- 3.5.3.4 Sample Size -- 3.6 Method and Implementation Selection -- 3.6.1 Analysis Method Selection. 
3.6.2 The Selection of Implementation Software -- References -- Further Reading -- Part II: Text Preparation -- Chapter 4: Text Preprocessing -- 4.1 Introduction -- 4.2 The Preprocessing Process -- 4.3 Unitize and Tokenize -- 4.3.1 N-Grams -- 4.4 Standardization and Cleaning -- 4.5 Stop Word Removal -- 4.5.1 Custom Stop Word Dictionaries -- 4.6 Stemming and Lemmatization -- 4.6.1 Syntax and Semantics -- 4.6.2 Stemming -- 4.6.3 Lemmatization -- 4.6.4 Part-of-Speech (POS) Tagging -- References -- Further Reading -- Chapter 5: Term-Document Representation -- 5.1 Introduction -- 5.2 The Inverted Index -- 5.3 The Term-Document Matrix -- 5.4 Term-Document Matrix Frequency Weighting -- 5.4.1 Local Weighting -- 5.4.1.1 Logarithmic (Log) Frequency -- 5.4.1.2 Binary/Boolean Frequency -- 5.4.2 Global Weighting -- 5.4.2.1 Document Frequency (df) -- 5.4.2.2 Global Frequency (gf) -- 5.4.2.3 Inverse Document Frequency (idf) -- 5.4.3 Combinatorial Weighting: Local and Global Weighting -- 5.4.3.1 Term Frequency-Inverse Document Frequency (tfidf) -- 5.5 Decision-Making -- References -- Further Reading -- Part III: Text Analysis Techniques -- Chapter 6: Semantic Space Representation and Latent Semantic Analysis -- 6.1 Introduction -- 6.2 Latent Semantic Analysis (LSA) -- 6.2.1 Singular Value Decomposition (SVD) -- 6.2.2 LSA Example -- 6.3 Cosine Similarity -- 6.4 Queries in LSA -- 6.5 Decision-Making: Choosing the Number of Dimensions -- References -- Further Reading -- Chapter 7: Cluster Analysis: Modeling Groups in Text -- 7.1 Introduction -- 7.2 Distance and Similarity -- 7.3 Hierarchical Cluster Analysis -- 7.3.1 Hierarchical Cluster Analysis Algorithm -- 7.3.2 Graph Methods -- 7.3.2.1 Single Linkage -- 7.3.2.2 Complete Linkage -- 7.3.3 Geometric Methods -- 7.3.3.1 Centroid -- 7.3.3.2 Ward's Minimum Variance Method -- 7.3.4 Advantages and Disadvantages of HCA. 
7.4 k-Means Clustering -- 7.4.1 kMC Algorithm -- 7.4.2 The kMC Process -- 7.4.3 Advantages and Disadvantages of kMC -- 7.5 Cluster Analysis: Model Fit and Decision-Making -- 7.5.1 Choosing the Number of Clusters -- 7.5.1.1 Subjective Methods -- 7.5.1.2 Graphing Methods -- Scree Plot -- Silhouette Plot -- 7.5.2 Naming/Describing Clusters -- 7.5.3 Evaluating Model Fit -- 7.5.4 Choosing the Cluster Analysis Model -- References -- Further Reading -- Chapter 8: Probabilistic Topic Models -- 8.1 Introduction -- 8.2 Latent Dirichlet Allocation (LDA) -- 8.3 Correlated Topic Model (CTM) -- 8.4 Dynamic Topic Model (DT) -- 8.5 Supervised Topic Model (sLDA) -- 8.6 Structural Topic Model (STM) -- 8.7 Decision Making in Topic Models -- 8.7.1 Assessing Model Fit and Number of Topics -- 8.7.2 Model Validation and Topic Identification -- 8.7.3 When to Use Topic Models -- References -- Further Reading -- Chapter 9: Classification Analysis: Machine Learning Applied to Text -- 9.1 Introduction -- 9.2 The General Text Classification Process -- 9.3 Evaluating Model Fit -- 9.3.1 Confusion Matrices/Contingency Tables -- 9.3.2 Overall Model Measures -- 9.3.2.1 Accuracy -- 9.3.2.2 Error Rate -- 9.3.3 Class-Specific Measures -- 9.3.3.1 Precision -- 9.3.3.2 Recall -- 9.3.3.3 F-Measure -- 9.4 Classification Models -- 9.4.1 Naïve Bayes -- 9.4.2 k-Nearest Neighbors (kNN) -- 9.4.3 Support Vector Machines (SVM) -- 9.4.4 Decision Trees -- 9.4.5 Random Forests -- 9.4.6 Neural Networks -- 9.5 Choosing a Classification -- 9.5.1 Model Fit -- References -- Further Reading -- Chapter 10: Modeling Text Sentiment: Learning and Lexicon Models -- 10.1 Lexicon Approach -- 10.2 Machine Learning Approach -- 10.2.1 Naïve Bayes (NB) -- 10.2.2 Support Vector Machines (SVM) -- 10.2.3 Logistic Regression -- 10.3 Sentiment Analysis Performance: Considerations and Evaluation -- References. 
Further Reading -- Part IV: Communicating the Results -- Chapter 11: Storytelling Using Text Data -- 11.1 Introduction -- 11.2 Telling Stories About the Data -- 11.3 Framing the Story -- 11.3.1 Storytelling Framework -- 11.3.2 Applying the Framework -- 11.4 Organizations as Storytellers -- 11.4.1 United Parcel Service -- 11.4.2 Zillow -- 11.5 Data Storytelling Checklist -- References -- Further Reading -- Chapter 12: Visualizing Analysis Results -- 12.1 Strategies for Effective Visualization -- 12.1.1 Be Purposeful -- 12.1.2 Know the Audience -- 12.1.3 Solidify the Message -- 12.1.4 Plan and Outline -- 12.1.5 Keep It Simple -- 12.1.6 Focus Attention -- 12.2 Visualization Techniques in Text Analytics -- 12.2.1 Corpus/Document Collection-Level Visualizations -- 12.2.2 Theme and Category-Level Visualizations -- 12.2.2.1 LSA Dimensions -- 12.2.2.2 Cluster-Level Visualizations -- 12.2.2.3 Topic-Level Visualizations -- 12.2.2.4 Category or Class-Level Visualizations -- 12.2.2.5 Sentiment-Level Visualizations -- 12.2.3 Document-Level Visualizations -- References -- Further Reading -- Part V: Text Analytics Examples -- Chapter 13: Sentiment Analysis of Movie Reviews Using R -- 13.1 Introduction to R and RStudio -- 13.2 SA Data and Data Import -- 13.3 Objective of the Sentiment Analysis -- 13.4 Data Preparation and Preprocessing -- 13.4.1 Tokenize -- 13.4.2 Remove Stop Words -- 13.5 Sentiment Analysis -- 13.6 Sentiment Analysis Results -- 13.7 Custom Dictionary -- 13.8 Out-of-Sample Comparison -- References -- Further Reading -- Chapter 14: Latent Semantic Analysis (LSA) in Python -- 14.1 Introduction to Python and IDLE -- 14.2 Preliminary Steps -- 14.3 Getting Started -- 14.4 Data and Data Import -- 14.5 Analysis -- Further Reading -- Chapter 15: Learning-Based Sentiment Analysis Using RapidMiner -- 15.1 Introduction -- 15.2 Getting Started in RapidMiner -- 15.3 Text Data Import -- 15.4 Text Preparation and Preprocessing -- 15.5 Text Classification Sentiment Analysis -- Reference -- Further Reading -- Chapter 16: SAS Visual Text Analytics -- 16.1 Introduction -- 16.2 Getting Started -- 16.3 Analysis -- Further Reading -- Index.

9783319956633


Big data.


Electronic books.

HF4999.2-6182

005.7