System of Automated Text Messages Clustering by Semantic Proximity Based on NLP and Machine Learning Methods
Table 1: Examples of data tokenization in the BBC tweets dataset
| Original documents from the corpus | Tokenized documents |
| --- | --- |
| Ambulance progress not fast enough | ambulance progress not fast enough |
| Guinea declares Ebola emergency | guinea declare ebola emergency |
| VIDEO: Sitting down poses health risk | video sit down pose health risk |
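
The rows of Table 1 suggest that tokenization here involves lowercasing, dropping punctuation, and reducing words to their base forms (for example, "declares" becomes "declare"). The following is a minimal sketch of such a step, not the system's actual implementation. The function name `tokenize` and the lookup table `TOY_LEMMAS` are hypothetical and introduced only for illustration; the lookup covers just the words in Table 1, and a real pipeline would presumably rely on a full lemmatizer.

```python
import re
from typing import List

# Hypothetical toy lemma lookup covering only the words in Table 1.
# Assumption: a real system would use a proper lemmatizer instead.
TOY_LEMMAS = {"declares": "declare", "sitting": "sit", "poses": "pose"}


def tokenize(document: str) -> List[str]:
    """Lowercase the text, keep alphabetic runs only, and map tokens to lemmas."""
    # Lowercasing first lets a single character class match every word.
    tokens = re.findall(r"[a-z]+", document.lower())
    # Tokens without an entry in the toy lookup are kept unchanged.
    return [TOY_LEMMAS.get(token, token) for token in tokens]


if __name__ == "__main__":
    corpus = [
        "Ambulance progress not fast enough",
        "Guinea declares Ebola emergency",
        "VIDEO: Sitting down poses health risk",
    ]
    for text in corpus:
        print(" ".join(tokenize(text)))
```

Note that, as in Table 1, common words such as "not" and "down" are retained; this sketch performs no stop-word removal.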