Voice of customer is typically captured through multiple connect points such as surveys, warranty claims, and social media. Customer verbatim is collected at these connect points to encourage customers to express their opinions freely. Such verbatim data is of high value and is typically analyzed with Natural Language Processing (NLP) techniques so that it can be translated into actions in the manufacturing, customer service, marketing, and product development departments. One challenge in analyzing unstructured verbatim data is mapping it onto the appropriate customer concern codes (CCCs), which automotive firms typically use to track quality and satisfaction metrics. These concern codes map to a hierarchy of functional areas in the organization whose aim is to improve the product and service, and hence the customer's overall experience. In this paper, we discuss our approach to mapping customer verbatim to concern codes, which is a classic natural language text classification problem.
In this work, we use verbatim inputs from the transactional systems of the quality office, warranty claims, the issues matrix, surveys, and social media content. The diversity of sources and formats of these textual comments poses challenges arising from free-flowing writing, varied content, language, and abbreviations. We apply the traditional approaches of word counts and TF-IDF to represent documents. To capture the complexity of word relationships, we also adopt word embeddings using word2vec to derive a document vector representation of the text. Machine learning models are then used to classify customer comments into concern codes. As next steps, we plan to use translation techniques to extend this comment classification framework from English to other languages.
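To make the TF-IDF representation and the classification step concrete, the following is a minimal, standard-library-only sketch. It is not the authors' pipeline. The tokenizer, the smoothed IDF formula (log((1 + n) / (1 + df)) + 1, the same smoothing scikit-learn uses by default), and the nearest-centroid classifier are all assumptions chosen for brevity; the classifier stands in for whatever machine learning models the work actually uses. The concern codes `BRAKE_NOISE` and `INFOTAINMENT` and the training comments are hypothetical examples, not real CCCs or customer data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalize(vec):
    """Scale a sparse vector to unit L2 length; empty vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def fit_idf(docs):
    """Smoothed IDF per term: log((1 + n) / (1 + df)) + 1 (assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    """Raw term count times IDF, L2-normalized; terms unseen in training are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return normalize({t: c * idf[t] for t, c in counts.items()})


def cosine(a, b):
    """Sparse dot product; equals cosine similarity because inputs are unit length."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def train_centroids(comments, codes, idf):
    """Hypothetical nearest-centroid model: one normalized mean TF-IDF vector per code."""
    sums = defaultdict(lambda: defaultdict(float))
    for text, code in zip(comments, codes):
        for t, v in tfidf(text, idf).items():
            sums[code][t] += v
    return {code: normalize(vec) for code, vec in sums.items()}


def classify(text, centroids, idf):
    """Assign the concern code whose centroid is most similar to the comment."""
    vec = tfidf(text, idf)
    return max(centroids, key=lambda code: cosine(vec, centroids[code]))


# Hypothetical labeled verbatim; real CCCs and comments would come from the sources above.
train = [
    ("brakes squeal when stopping", "BRAKE_NOISE"),
    ("loud grinding noise from front brakes", "BRAKE_NOISE"),
    ("touch screen freezes and radio reboots", "INFOTAINMENT"),
    ("bluetooth keeps disconnecting from the radio", "INFOTAINMENT"),
]
comments, codes = zip(*train)
idf = fit_idf(comments)
centroids = train_centroids(comments, codes, idf)

print(classify("squeal from the brakes", centroids, idf))  # BRAKE_NOISE
print(classify("radio screen freezes", centroids, idf))    # INFOTAINMENT
```

A word2vec-based variant could replace `tfidf()` with a function that combines the embedding vectors of a comment's tokens into one document vector (averaging them is one common choice, assumed here rather than taken from this work), leaving the classification step unchanged.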