Matchit: A Domain-Adaptable Information Extraction and Categorization Tool for Enhanced Decision Making

2025-01-8107

04/01/2025

Event
WCX SAE World Congress Experience
Abstract
This paper presents Matchit, a novel method for expediting issue investigation and generating actionable insights from textual data. To address the difficulty of extracting relevant information from large, unstructured datasets, we propose a domain-adaptable approach that integrates expert domain knowledge to guide large language models (LLMs) in automatically identifying key information and categorizing it into distinct topics. The approach offers two functionalities: fully automatic topic extraction based solely on the input data, which provides a concise overview of the problem and potential solutions, and user-guided extraction, in which domain experts specify the type of information or pre-defined categories to target. This flexibility supports both broad exploration and focused analysis of the data. We demonstrate Matchit's efficacy through its application in the automotive industry, where it extracts repair diagnostics from diverse textual sources such as repair records, surveys, and customer service logs. By identifying and categorizing information related to failure modes, symptoms, repair actions, and procedures, Matchit enables efficient identification of similar repairs in new datasets, significantly reducing manual review effort. The case studies presented demonstrate the tool's effectiveness in achieving accurate matching results. Matchit's versatility extends beyond the automotive domain: it offers a solution for any application requiring customized information extraction, categorization, and matching from textual data.
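The two extraction modes and the matching step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the prompt wording, the category keys, and the Jaccard-based similarity score are all assumptions chosen for illustration, and the LLM call itself is left out, so the sketch only shows how a prompt might be assembled and how already-categorized records might be compared.

```python
# Hypothetical category keys, named after the topics listed in the abstract.
CATEGORIES = ("failure_mode", "symptom", "repair_action", "procedure")


def build_prompt(text, categories=None):
    """Assemble an extraction prompt (assumed wording).

    With categories given, this is the user-guided mode; with
    categories=None, the model is asked to discover topics itself.
    """
    if categories:
        task = "Extract the following categories: " + ", ".join(categories) + "."
    else:
        task = "Identify the key topics in the text and name them yourself."
    return (
        f"{task}\n"
        "Return JSON mapping each category to a list of phrases.\n\n"
        f"Text: {text}"
    )


def jaccard(a, b):
    """Jaccard overlap of two phrase lists; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(rec_a, rec_b, categories=CATEGORIES):
    """Average per-category overlap between two categorized records.

    Exact phrase overlap is a simplification assumed here; the paper
    does not state how similarity is computed.
    """
    scores = [jaccard(rec_a.get(c, []), rec_b.get(c, [])) for c in categories]
    return sum(scores) / len(scores)


def find_similar(query, corpus, threshold=0.5):
    """Return (index, score) pairs at or above threshold, best match first."""
    scored = [(i, match_score(query, rec)) for i, rec in enumerate(corpus)]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: -pair[1])
```

In this sketch, each record is a dictionary of phrase lists keyed by category, as an LLM might return after receiving the prompt from `build_prompt`. `find_similar` then ranks previously categorized repairs against a new one, which mirrors the "identification of similar repairs in new datasets" step only in outline.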
DOI
https://doi.org/10.4271/2025-01-8107
Pages
18
Citation
Wang, L., and Arora, K., "Matchit: A Domain-Adaptable Information Extraction and Categorization Tool for Enhanced Decision Making," SAE Technical Paper 2025-01-8107, 2025, https://doi.org/10.4271/2025-01-8107.
Additional Details
Published
Apr 01, 2025
Product Code
2025-01-8107
Content Type
Technical Paper
Language
English