Country / Region
APAC
Tags
Artificial intelligence, Implementation, Innovation, Mapping, Pre/postcoordination, Tooling
Kakao Healthcare developed the Healthcare data Research Suite (HRS) platform to support multi-institutional research by enabling semantic interoperability through terminology standardization using SNOMED CT. However, manual mapping of local terms to SNOMED CT is labor-intensive and prone to inconsistency. To overcome these challenges, we developed and evaluated "Chipmunk", a large language model (LLM)-based tool that automates SNOMED CT mapping and supports the authoring of new concepts through post-coordination.
The automated mapping process involved preprocessing local terms, applying syntactic matching, conducting vector similarity searches with an embedding model, and continuously enriching the reference terms through iterative incorporation of verified mapping results. New concepts were authored via a structured post-coordination process. Performance was evaluated using diagnostic and surgical procedural terms from four hospitals in South Korea, and usability feedback was collected from expert clinical terminologists.
After adding reference terms, accuracy in the diagnostic domain ranged from 89.69% to 98.67%, while in the surgical procedural domain, it ranged from 82.62% to 99.19%. The tool also reduced mapping time by 47% and human resource usage by 70%. Users reported reduced effort, decreased errors, and improved usability.
Chipmunk demonstrated strong accuracy and efficiency in real-world SNOMED CT mapping by integrating pre- and post-coordination into a unified workflow. However, challenges remain with complex surgical procedures, ambiguous terms, and abbreviations, highlighting the need for ongoing expert review. Future directions include integrating knowledge graph-based retrieval and enhanced validation features. Despite advances in automation, human expertise remains essential for ensuring precise and contextually accurate terminology mapping.
Description
Research Aim and Scope
This research aims to develop and evaluate Chipmunk, an LLM-based SNOMED CT automatic mapping tool designed for multi-institutional research. The scope encompasses the methodology for developing the tool, its implementation in a real-world environment, and a comprehensive evaluation that includes both performance (accuracy) and usability assessments. The evaluation was conducted using diagnostic and surgical procedural terms collected from four South Korean hospitals. In addition, special attention is paid to the challenges of automatic mapping when standardizing clinical terminologies in multi-center settings.
Anticipated Impact
The system offers several key improvements: (1) it enhances resource and time efficiency through advanced preprocessing (such as Korean-to-English translation of local terms and abbreviation handling) and by applying curation rules, thereby significantly reducing repetitive manual tasks; (2) it improves the quality and accuracy of mapping by leveraging automated mapping results, reducing human errors and increasing precision; (3) it increases consistency by minimizing the variability between different mappers that often occurs with manual mapping; (4) it supports an efficient workflow by enabling both pre-coordinated and post-coordinated mapping within a single process; and (5) it improves usability with an intuitive, Excel-like interface that is easier to use than traditional methods. Ultimately, Chipmunk allows large-scale mapping projects, such as multi-institution terminology standardization, to be performed more easily, quickly, accurately, and efficiently, thereby ensuring semantic interoperability and facilitating the meaningful use of clinical data.
Scope
1. Global Standard Terminology System: SNOMED CT is globally adopted, ensuring international standardization and interoperability.
2. Comprehensive Reference Terminology: It covers various medical domains, ensuring broad consistency in healthcare data.
3. Support for Post-Coordination: It enables creating new concepts by combining existing terms, providing flexibility.
4. Rich Attribute Relationships and Hierarchical Structure: Its detailed structure allows for sophisticated and precise data querying.
5. Standard Terminology System for HRS: It is the chosen standard for multi-institutional data standardization within Kakao Healthcare's HRS platform.
How SNOMED CT will be used
SNOMED CT serves as the standard terminology system for multi-institutional data standardization in the Healthcare data Research Suite (HRS) developed by Kakao Healthcare.
Here are the key ways SNOMED CT is utilized:
1. System Architecture and Implementation
The automatic mapping process begins with setting up a mapping project and includes two main steps: auto-mapping and new concept authoring. Final mappings are stored in a database as the local term-SNOMED CT mapping table. Both mapping results and new SNOMED CT concepts are added to the reference term database, which holds previous mapping data and extensions. A terminology review committee, along with the mapping team, reviews and cross-validates these results.
1.1 Auto-mapping Method
The overall automated mapping process is illustrated in Figure 2. Key steps include:
* Preprocessing: Refine local terms by removing unnecessary symbols, expanding abbreviations, and translating non-English terms into English using tools such as Google Cloud and the OpenAI API.
* Syntactic Mapping: Local terms are mapped to exact text matches in SNOMED CT and reference databases, with search results restricted by semantic tags to ensure precision and efficiency. Consistency is maintained by applying mappings across duplicate terms while taking into consideration any domain differences.
* Vector Similarity Mapping: Vector similarity search is run against reference terms generated from SNOMED CT descriptions and previously mapped terms. These reference terms are embedded with the text-embedding-3-small model, and the resulting vectors are stored in a FAISS vector database for fast retrieval.
* Reference Term Integration and Improvement: Figure 3 illustrates the process of generating reference terms. Finalized mappings are added to the reference term set for ongoing refinement, with duplicate terms removed and unique ones undergoing vector embedding. This process uses an iterative approach, continuously enhancing tool performance across multiple institutions.
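The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Chipmunk's implementation: the character-trigram `embed` function is a toy stand-in for the text-embedding-3-small model, the in-memory dictionaries stand in for the FAISS index, the `ABBREVIATIONS` table and the `ReferenceTerms` class are hypothetical, and the `C-0001` style identifiers are placeholders rather than real SNOMED CT codes. Translation and semantic-tag filtering are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation table; a real project would curate one per institution.
ABBREVIATIONS = {"fx": "fracture", "lt": "left", "rt": "right"}


def preprocess(term: str) -> str:
    """Lowercase, replace symbols with spaces, and expand known abbreviations."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", term.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in cleaned.split())


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: character-trigram counts."""
    padded = f"  {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ReferenceTerms:
    """In-memory reference term set (stand-in for the reference DB and vector index)."""

    def __init__(self) -> None:
        self.by_text: dict[str, str] = {}      # preprocessed text -> concept id
        self.vectors: dict[str, Counter] = {}  # preprocessed text -> embedding

    def add(self, term: str, concept_id: str) -> None:
        """Add a reference term or a verified mapping; duplicates are skipped."""
        key = preprocess(term)
        if key not in self.by_text:
            self.by_text[key] = concept_id
            self.vectors[key] = embed(key)

    def map_term(self, local_term: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Try an exact (syntactic) match first, then fall back to vector similarity."""
        key = preprocess(local_term)
        if key in self.by_text:
            return [(self.by_text[key], 1.0)]
        query = embed(key)
        scored = [(self.by_text[t], cosine(query, v)) for t, v in self.vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


refs = ReferenceTerms()
refs.add("Fracture of left femur", "C-0001")  # placeholder identifiers
refs.add("Acute appendicitis", "C-0002")

candidates = refs.map_term("Lt. femur Fx")    # no exact match: ranked by similarity
refs.add("Lt. femur Fx", candidates[0][0])    # a reviewer accepts the top candidate
print(refs.map_term("LT femur FX"))           # now resolved by exact match
```

Feeding the verified mapping back through `add` is what makes the next lookup of the same local term an exact match, which mirrors the iterative enrichment of reference terms described above.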
1.2 Authoring New Concepts
Chipmunk facilitates the authoring of new SNOMED CT concepts through a structured post-coordination process:
* Focus Concept Selection: Begin with choosing a focus concept, aided by proximal primitive concept retrieval to ensure accurate supertype inference and minimize errors.
* Concept Modeling with MRCM: Model the concept following MRCM rules, utilizing guided naming to prevent errors. Chipmunk restricts attribute selection based on domain, ensuring only relevant ones are available, and narrows permissible values using ECL queries.
* Classification and Validation: These are performed to maintain precision and consistency, using tools like the SNOMED Release Validation Framework and the OWL Toolkit.
* Integration: Newly authored concepts are integrated into a mapping table, structured to include the source terminology and corresponding SNOMED CT identifiers.
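As an illustration of the modeling step, the sketch below assembles a post-coordinated expression in SNOMED CT compositional grammar and rejects attributes that are not allowed for the chosen domain. It is a simplified sketch, not Chipmunk's authoring logic: the `ALLOWED_ATTRIBUTES` table is a hypothetical, hand-written excerpt rather than the actual MRCM, the `build_expression` helper is invented for this example, and the concept identifiers should be verified against the SNOMED CT release in use. Value-range checks via ECL and classification are not shown.

```python
# Hypothetical, heavily simplified excerpt of MRCM-style domain rules.
ALLOWED_ATTRIBUTES = {
    "procedure": {
        "260686004 |Method|",
        "405813007 |Procedure site - Direct|",
    },
    "finding": {
        "363698007 |Finding site|",
        "116676008 |Associated morphology|",
    },
}


def build_expression(focus: str, domain: str, refinements: dict[str, str]) -> str:
    """Return a compositional grammar expression with a single role group.

    Raises ValueError if an attribute is not permitted for the given domain.
    """
    allowed = ALLOWED_ATTRIBUTES[domain]
    for attribute in refinements:
        if attribute not in allowed:
            raise ValueError(f"{attribute} is not allowed in domain '{domain}'")
    if not refinements:
        return focus
    pairs = ", ".join(f"{attr} = {value}" for attr, value in refinements.items())
    return f"{focus} : {{ {pairs} }}"


expression = build_expression(
    "71388002 |Procedure|",
    "procedure",
    {
        "260686004 |Method|": "129304002 |Excision - action|",
        "405813007 |Procedure site - Direct|": "66754008 |Appendix structure|",
    },
)
print(expression)
```

Restricting the attribute choices before the expression is built is one way to surface modeling errors at authoring time instead of at classification time.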
2. System Evaluation
* Dataset: Data from four South Korean medical centers provided a diverse set of diagnostic and surgical procedural codes, ensuring comprehensive evaluation aligned with SNOMED CT semantic tags.
* Accuracy Evaluation: Accuracy was tested by comparing automated results to verified manual mappings, using top-1 and top-5 metrics. Performance was assessed both within individual hospitals and with added reference terms from others.
* Efficiency Evaluation: Baseline manual processes were timed and then compared to the tool-enabled processes to measure time savings and resource optimization.
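The two kinds of metric can be made concrete with a short sketch. The definitions below are the conventional ones and are assumed here, since the abstract does not spell them out: top-k accuracy counts a term as correct when the verified concept appears among the first k candidates, and the efficiency gain is the relative reduction against the manual baseline. The function names and the sample data are hypothetical.

```python
def top_k_accuracy(predictions: dict[str, list[str]], gold: dict[str, str], k: int) -> float:
    """Share of terms whose verified concept is among the first k ranked candidates."""
    if not gold:
        raise ValueError("gold standard must not be empty")
    hits = sum(
        1 for term, target in gold.items()
        if target in predictions.get(term, [])[:k]
    )
    return hits / len(gold)


def relative_reduction(baseline: float, with_tool: float) -> float:
    """Relative saving compared with the manual baseline (0.5 means 50% less)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - with_tool) / baseline


# Hypothetical ranked candidates per local term and the verified manual mapping.
predictions = {"t1": ["A", "B"], "t2": ["C", "D"], "t3": ["E"]}
gold = {"t1": "A", "t2": "D", "t3": "Z"}

print(top_k_accuracy(predictions, gold, k=1))
print(top_k_accuracy(predictions, gold, k=5))
print(relative_reduction(baseline=100.0, with_tool=53.0))
```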
Why SNOMED CT will be used
Contact


