Saturday, December 17, 2011

What can natural language processing do for clinical decision support? Ch#6-8

6. Providing evidence: personalized context-sensitive summarization and question answering

The need to link evidence to patients’ records was stated in the 1977 assessment of computer-based medical information systems, undertaken because of increased concern over the quality and rising costs of medical care [103]. The assessment concluded that the quality and cost concerns could be addressed by medical information systems that supply physicians with information and incorporate valid findings of medical research [103]. The results of medical research might soon become directly available through querying clinical research databases; to date, however, findings of medical research are found primarily in the literature. Following the 1977 report, medical informatics research focused on understanding physicians’ information needs and enabling physicians’ access to the published results of clinical studies. This research provides a solid foundation for NLP aimed at satisfying physicians’ desiderata. The most desired features include comprehensive, specific bottom-line recommendations that anticipate and directly answer clinical questions, rapid access, current information, and an evidence-based rationale for recommendations [104].


6.1. Clinical data and evidence summarization for clinicians


Unlike the comparatively better researched summarization and visualization of structured clinical data [105–108], summarization of clinical narrative is an evolving area of research. Afantenos et al. surveyed the potential of summarization technology in the medical domain [109]. Van Vleck et al. identified the information physicians consider relevant to summarizing a patient’s medical history in the medical record. The following categories were identified as necessary for capturing a patient’s history: Labs and Tests, Problem and Treatment, History, Findings, Allergies, Meds, Plan, and Identifying Info [110]. Meng et al. approached generation of clinical notes as an extractive summarization problem [111]. In this approach, sentences containing patient information that needs to be repeated are extracted based on their rhetorical categories, which are determined using semantic patterns. This extraction method compares favorably to the baseline extraction method (the position of a sentence in the note) on a test set of 162 sentences in urological clinical notes [111]. Cao et al. summarized patients’ discharge summaries into problem lists [70].
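The contrast between pattern-based extraction and the position baseline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the regular expressions, the category assignments, and the sample note are hypothetical and are not the semantic patterns or data used in [111].

```python
import re

# Hypothetical patterns mapping a rhetorical category to a regular expression.
# These are illustrative stand-ins, not the semantic patterns from [111].
CATEGORY_PATTERNS = {
    "Allergies": re.compile(r"\ballerg(y|ies|ic)\b", re.IGNORECASE),
    "Meds": re.compile(r"\b(mg|tablet|daily|prescribed)\b", re.IGNORECASE),
    "Plan": re.compile(r"\b(follow[- ]up|return|scheduled?)\b", re.IGNORECASE),
}


def categorize(sentence):
    """Return the first category whose pattern matches, or None."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(sentence):
            return category
    return None


def extract_by_category(sentences, wanted):
    """Keep sentences whose category is one that should be repeated."""
    return [s for s in sentences if categorize(s) in wanted]


def extract_by_position(sentences, k):
    """Position baseline: keep the first k sentences of the note."""
    return sentences[:k]


# Invented sample note, one sentence per list item.
note = [
    "Patient seen in clinic today.",
    "He is allergic to penicillin.",
    "Tamsulosin 0.4 mg daily was prescribed.",
    "Weather was discussed briefly.",
    "Follow-up in six weeks.",
]

by_category = extract_by_category(note, {"Allergies", "Plan"})
by_position = extract_by_position(note, 2)
```

On this toy note the category-based extractor keeps the allergy and plan sentences wherever they occur, while the position baseline keeps only the first two sentences regardless of content.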


The PERSIVAL project (a prototype system, not currently in use) summarized medical scientific publications [112,113]. The summarization module of the PERSIVAL system generated summaries tailored for physicians and patients. Summaries generated for a physician contained information relevant to a specific patient’s record. Each publication was represented using a set of templates. Templates were then clustered into semantically related units in order to generate a summary [112,113].


Based on the semantic abstraction paradigm, Fiszman et al. are developing a summarization system that relies on SemRep for semantic interpretation of the biomedical literature. The system condenses SemRep predications and presents them in graphical format [114]. We hope to see in the future if the above method holds promise for summarization and visual presentation of clinical notes.

6.2. Clinical data and evidence summarization for patients

The online access to personal health and medical records and the overwhelming amount of health-related information available to patients (alternatively called health care consumers and lay users) pose many interesting questions. Hardcastle and Hallet studied which text segments of a patient record require explanation before being released to patients and what types of explanation are appropriate [115]. Elhadad and Sutaria presented an unsupervised method for building a lexicon of semantically equivalent pairs of technical and lay medical terms [116].
    Ahlfeldt et al. surveyed issues related to communicating technical medical terms in everyday language for patients and generating patient-friendly texts [117]. The survey presents research on alleviating the lack of understanding of clinical documents caused by medical terminology. This research includes generation of patient vocabularies and matching those vocabularies and problem lists with standard terminologies; generation of terminological resources, corpora and annotation tools; development of natural consumer language generation systems; and customization of patient education materials [117]. Green presents the design of a discourse generator that plans the content and organization of lay-oriented genetic counseling documents to assist in drafting letters that summarize the results for patients [118].


6.3. Clinical question answering

One of the principal purposes of CDS is answering questions [14]. Questions occurring in clinical situations could pertain to “information on particular patients; data on health and sickness within the local population; medical knowledge; local information on doctors available for referral; information on local social influences and expectations; and information on scientific, political, legal, social, management, and ethical changes affecting both how medicine is practiced and how doctors interact with individual patients” [119]. Some questions do not need NLP and can be answered directly by a known resource. For example, the NLM Go Local service (which connects users to health services in their local communities and directs users of the Go Local sites to MedlinePlus health information) was established to answer logistics questions by providing access to local information. Questions about particular patients are currently answered by manually browsing or searching the EHR. Answering these questions can be facilitated by summarization (which requires NLP if information is extracted from free-text fields) and visualization tools [105–108]. Facilitating access to medical knowledge by providing answers to clinical questions is an area of active NLP research [120]. The goal of clinical question answering systems is to satisfy medical knowledge questions by providing answers in the form of short action items supported by strong evidence.


Jacquemart and Zweigenbaum studied the feasibility of answering students’ questions in the domain of oral pathology using Web resources. Questions involving pathology, procedures, treatments, examinations, indications, diagnosis and anatomy were used to develop eight broad semantic models comprising 66 different syntactico-semantic patterns representing the questions. The triple-based model ([concept]–(relation)–[concept]) combined with which, why, and does modalities accounted for a vast majority of questions. The formally represented questions were used to query 10 different search engines. Search results were checked manually to find a passage answering the question in a consistent context [121].


The [concept]–(relation)–[concept] triples generated by SemRep can be used to generate conceptual condensates that summarize a set of documents [114], or to answer specific questions, for example, finding the best pharmacotherapy for a given disease [65]. Within the EpoCare project, the same question type is answered by using an SVM to classify MEDLINE abstract sentences as containing an outcome (answer) or not and extracting the high-ranking sentences [122]. The CQA-1.0 system also implements an Evidence Based Medicine (EBM)-inspired approach to outcome extraction [120]. In addition to extracting outcomes from individual MEDLINE abstracts to answer a wide range of questions, the CQA-1.0 system aggregates answers to questions about the best drug therapy into 5–6 drug classes generated based on the individual pharmaceutical treatments extracted from each abstract. Each class is supported by the strongest patient-oriented outcome pertaining to each drug in the class. The EpoCare and CQA-1.0 systems rely on the Patient-Intervention-Comparison-Outcome (PICO) framework developed to help clinicians formulate clinical questions [99]. The MedQA system answers definitional questions by integrating information retrieval, extraction, and summarization techniques to automatically generate paragraph-level text [123].
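As a rough illustration of how [concept]–(relation)–[concept] triples could support a question such as "which drugs treat disease X?", the sketch below stores triples in a small tuple type and counts the subjects of matching TREATS triples. The `Predication` type, the `treatments_for` helper, the hand-written triples, and the ranking by raw frequency are all assumptions made for this example; this is not SemRep output and not the method used in [65].

```python
from collections import Counter
from typing import NamedTuple


class Predication(NamedTuple):
    """One [concept]-(relation)-[concept] triple (hypothetical type)."""
    subject: str
    relation: str
    obj: str


# Hand-written toy triples for illustration only; not real SemRep output.
predications = [
    Predication("metformin", "TREATS", "type 2 diabetes"),
    Predication("metformin", "TREATS", "type 2 diabetes"),
    Predication("insulin", "TREATS", "type 2 diabetes"),
    Predication("metformin", "CAUSES", "nausea"),
    Predication("aspirin", "TREATS", "headache"),
]


def treatments_for(disease, preds):
    """Count how often each subject appears in a TREATS triple for `disease`.

    Ranking by raw frequency is an assumption made for this sketch.
    """
    counts = Counter(
        p.subject
        for p in preds
        if p.relation == "TREATS" and p.obj == disease
    )
    return counts.most_common()


ranked = treatments_for("type 2 diabetes", predications)
```

A real system would also need to weigh the strength of the supporting evidence, which is what the outcome-oriented approaches in EpoCare and CQA-1.0 address.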


7. Clinical NLP: direct applications of NLP in healthcare
   In addition to processing text pertaining to patients and generated by clinicians and researchers, NLP methods have been applied directly to patients’ narratives for diagnostic and prognostic purposes.
   The Linguistic Inquiry and Word Count (LIWC) tool was used to explore personality expressed through a person’s linguistic style [124]. The LIWC tool (which calculates the percentage of words in written text that match up to 82 language dimensions) was evaluated in predicting post-bereavement improvements in mental and physical health [125], predicting adjustment to cancer [126], differentiating between the Internet message board entries and homepages of pro-anorexics or recovering anorexics [127], and recognizing suicidal and non-suicidal individuals [128]. Pestian et al. demonstrated that the sequential minimal optimization algorithm can classify completer and simulated suicide notes as accurately as mental health professionals [129].
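
The word-percentage measure described above can be sketched in a few lines. The two dimensions and their word lists below are hypothetical placeholders; the actual LIWC dictionaries are far larger and are not reproduced here, and the tokenizer is a deliberately simple assumption.

```python
import re

# Hypothetical mini-dictionary; not the actual LIWC dimensions or word lists.
DIMENSIONS = {
    "first_person": {"i", "me", "my", "mine"},
    "negative_emotion": {"sad", "afraid", "hurt"},
}


def dimension_percentages(text):
    """Return, per dimension, the percentage of words in `text` that match it."""
    # Simple tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in DIMENSIONS}
    return {
        name: 100.0 * sum(word in vocab for word in words) / len(words)
        for name, vocab in DIMENSIONS.items()
    }


scores = dimension_percentages("I am sad and my back hurt today")
```

The resulting percentages could then serve as features for a classifier, which is the general idea behind the predictive studies cited above.
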
   Another potential clinical NLP application is assessment of neurodegenerative impairments. Roark et al. studied automation of NLP methods for diagnosis of mild cognitive impairment (MCI). Automatic psychometric evaluation included syntactic annotation and analysis of spoken language samples elicited during neuropsychological exams of elderly subjects. Evaluation of the syntactic complexity of the narrative was based on analysis of dependency structures and deviations from the standard (for English) right-branching trees in parse trees of subjects’ utterances. Measures derived from automatic parses were highly correlated with manually derived measures, indicating that automatically derived measures may be useful for discriminating between healthy and MCI subjects [130].


Clinical NLP is also used for medication compliance and drug abuse monitoring. Butler et al. explored the usefulness of content analysis of Internet message board postings for detection of potentially abusable opioid analgesics [131]. In this study, the attractiveness for abuse of OxyContin, Vicodin, and Kadian was determined automatically and compared to the known attractiveness of the products. The automatic measures were the total number of posts by product, the total number of mentions by product (including synonyms and misspellings), the total number of posts containing at least one mention of each product, the total number of unique authors, and the number of unique authors of posts referencing any of the 3 target products. The numbers of mentions of the products were significantly different and corresponded to the product attractiveness. Based on this and other metrics, the authors conclude that a systematic approach to post-marketing surveillance of Internet chatter related to pharmaceutical products is feasible [131]. Understanding patient compliance issues could help in clinical decisions. This understanding could be gained through processing of informal textual communications found in publicly available blog postings and e-mail archives. For example, Malouf et al. analyzed 316,373 posts to 19 Internet discussion groups and other websites from 8731 distinct users and found associations (such as cognitive side effects, risks, and dosage-related issues) that epilepsy patients and their caregivers make with different medications [132].
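
Three of the counting measures listed for the Butler et al. study (mentions by product, posts with at least one mention, and unique authors) can be sketched as follows. The term lists, including the misspellings, and the sample posts are invented for illustration, and the `product_metrics` helper is hypothetical; the study's actual lexicon and data are not reproduced here.

```python
import re

# Hypothetical synonym and misspelling lists; not the lexicon used in [131].
PRODUCT_TERMS = {
    "OxyContin": ["oxycontin", "oxycotin"],
    "Vicodin": ["vicodin", "vicoden"],
    "Kadian": ["kadian"],
}

# One case-insensitive, whole-word pattern per product.
PATTERNS = {
    product: re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    for product, terms in PRODUCT_TERMS.items()
}


def product_metrics(posts):
    """Compute per-product counts from an iterable of (author, text) pairs."""
    mentions = {product: 0 for product in PATTERNS}
    post_counts = {product: 0 for product in PATTERNS}
    authors = {product: set() for product in PATTERNS}
    for author, text in posts:
        for product, pattern in PATTERNS.items():
            hits = len(pattern.findall(text))
            if hits:
                mentions[product] += hits
                post_counts[product] += 1
                authors[product].add(author)
    return {
        product: {
            "mentions": mentions[product],
            "posts": post_counts[product],
            "unique_authors": len(authors[product]),
        }
        for product in PATTERNS
    }


# Invented sample posts as (author, text) pairs.
posts = [
    ("a", "Vicodin or vicoden, same thing"),
    ("b", "tried oxycontin once"),
    ("a", "oxycotin again"),
    ("c", "nothing relevant"),
]

metrics = product_metrics(posts)
```

Comparing such counts across products is only the first step; the study related them to the known attractiveness of each product.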


To the best of our knowledge, the applications described in this section are experimental rather than deployed and regularly used in clinical settings. The difficulties in translating clinical NLP research into clinical practice and the obstacles in determining the level of practical engagement of NLP systems are discussed in the next section.


Most of the above presented methods and systems were developed for specific users, document types and CDS goals. Future research might indicate if such systems could be easily retargeted for new users and goals and whether the retargeted systems can compete with those designed for specific tasks and clinical systems. Evaluation methods for measuring the impact of NLP methods on healthcare in addition to reliable standardized evaluation of NLP systems need to be developed.


For several issues very important to the future development of NLP for CDS, there is currently only anecdotal evidence and sparse publications. For example, with few exceptions, we do not know which of the reviewed NLP–CDS systems are actually implemented or deployed, and what makes these systems worthwhile. We might speculate that, for example, MedLEE is successfully integrated with a clinical information system because it was developed and adapted, as needed, for specific users and CDS goals, but the reason for its success could also be its sophisticated NLP. We could better judge which features determine whether NLP–CDS systems are applied outside of the experimental setting if we had more data points. We believe it would be valuable to have a special venue for presenting case studies and analysis of applied NLP systems in the near future.


Priorities in NLP development will be determined by the readiness of intended users to adopt NLP. The early successes in NLP and CDS led to high user expectations that were not always met. NLP researchers need to regain clinicians’ trust, which is achievable through a better understanding of NLP strengths and weaknesses by clinicians, as well as significant progress in biomedical NLP. Reacquainting clinicians with NLP can be facilitated by NLP training, well-planned NLP experiments, careful and thoughtful evaluation of the results, high-quality implementation of NLP modules, semi-automated and easier methods for adapting NLP for other domains, and evaluations of NLP–CDS adequacy in satisfying user needs.


We believe NLP can contribute to decision support for all groups involved in the clinical process, but the development will probably focus on the areas for which there is higher demand. For example, if researchers are more eager consumers of NLP than clinicians, NLP research into text mining and literature summarization will continue dominating the field.


The NLP CDS tasks are so numerous and complex that this area of research will succeed in making a practical impact only as a result of a coordinated community-wide effort.
