Learning More with Less: GAN-based Medical Image Augmentation
Changhee Han, Kohei Murao, Shin'ichi Satoh, Hideki Nakayama
Date Published: 7 May, 2019(Originally Published: 29 Mar, 2019)
Categories at arxiv.org: cs.CV, cs.AI
6 pages, 2 figures, Accepted to MEDICAL IMAGING TECHNOLOGY Special Issue
Convolutional Neural Network (CNN)-based accurate prediction typically requires large-scale annotated training data. In Medical Imaging, however, both obtaining medical data and annotating them by expert physicians are challenging; to overcome this lack of data, Data Augmentation (DA) using Generative Adversarial Networks (GANs) is essential, since they can synthesize additional annotated training data to handle small and fragmented medical images from various scanners--those generated images, realistic but completely novel, can further fill the real image distribution uncovered by the original dataset. As a tutorial, this paper introduces GAN-based Medical Image Augmentation, along with tricks to boost classification/object detection/segmentation performance using them, based on our experience and related work. Moreover, we show our first GAN-based DA work using automatic bounding box annotation, for robust CNN-based brain metastases detection on 256 x 256 MR images; GAN-based DA can boost 10% sensitivity in diagnosis with a clinically acceptable number of additional False Positives, even with highly-rough and inconsistent bounding boxes.
PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools monitoring and prioritizing the literature to understand the clinical implications of the pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM) which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN) which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified) while the CNN model achieves 88.95 % accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy while the CNN model achieves 89.13 % accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
The Hausdorff Distance (HD) is widely used in evaluating medical image segmentation methods. However, existing segmentation methods do not attempt to reduce HD directly. In this paper, we present novel loss functions for training convolutional neural network (CNN)-based segmentation methods with the goal of reducing HD directly. We propose three methods to estimate HD from the segmentation probability map produced by a CNN. One method makes use of the distance transform of the segmentation boundary. Another method is based on applying morphological erosion on the difference between the true and estimated segmentation maps. The third method works by applying circular/spherical convolution kernels of different radii on the segmentation probability maps. Based on these three methods for estimating HD, we suggest three loss functions that can be used for training to reduce HD. We use these loss functions to train CNNs for segmentation of the prostate, liver, and pancreas in ultrasound, magnetic resonance, and computed tomography images and compare the results with commonly-used loss functions. Our results show that the proposed loss functions can lead to approximately 18-45 % reduction in HD without degrading other segmentation performance criteria such as the Dice similarity coefficient. The proposed loss functions can be used for training medical image segmentation methods in order to reduce the large segmentation errors.
Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding
Zihao Zhu, Changchang Yin, Buyue Qian, Yu Cheng, Jishang Wei, Fei Wang
Date Published: 9 Feb, 2019
Categories at arxiv.org: stat.ML, cs.AI, cs.LG
Published in ICDM 2016, arXiv version. Code link is added
Evaluating the clinical similarities between pairwise patients is a fundamental problem in healthcare informatics. A proper patient similarity measure enables various downstream applications, such as cohort study and treatment comparative effectiveness research. One major carrier for conducting patient similarity research is Electronic Health Records(EHRs), which are usually heterogeneous, longitudinal, and sparse. Though existing studies on learning patient similarity from EHRs have shown being useful in solving real clinical problems, their applicability is limited due to the lack of medical interpretations. Moreover, most previous methods assume a vector-based representation for patients, which typically requires aggregation of medical events over a certain time period. As a consequence, temporal information will be lost. In this paper, we propose a patient similarity evaluation framework based on the temporal matching of longitudinal patient EHRs. Two efficient methods are presented, unsupervised and supervised, both of which preserve the temporal properties in EHRs. The supervised scheme takes a convolutional neural network architecture and learns an optimal representation of patient clinical records with medical concept embedding. The empirical results on real-world clinical data demonstrate substantial improvement over the baselines. We make our code and sample data available for further study.
Adversarial Attacks Against Medical Deep Learning Systems
Samuel G. Finlayson, Hyung Won Chung, Isaac S. Kohane, Andrew L. Beam
Date Published: 4 Feb, 2019(Originally Published: 15 Apr, 2018)
Categories at arxiv.org: cs.CR, cs.CY, cs.LG, stat.ML
The discovery of adversarial examples has raised concerns about the practical deployment of deep learning systems. In this paper, we demonstrate that adversarial examples are capable of manipulating deep learning systems across three clinical domains. For each of our representative medical deep learning classifiers, both white and black box attacks were highly successful. Our models are representative of the current state of the art in medical computer vision and, in some cases, directly reflect architectures already seeing deployment in real world clinical settings. In addition to the technical contribution of our paper, we synthesize a large body of knowledge about the healthcare system to argue that medicine may be uniquely susceptible to adversarial attacks, both in terms of monetary incentives and technical vulnerability. To this end, we outline the healthcare economy and the incentives it creates for fraud and provide concrete examples of how and why such attacks could be realistically carried out. We urge practitioners to be aware of current vulnerabilities when deploying deep learning systems in clinical settings, and encourage the machine learning community to further investigate the domain-specific characteristics of medical learning systems.
Training Medical Image Analysis Systems like Radiologists
Gabriel Maicas, Andrew P. Bradley, Jacinto C. Nascimento, Ian Reid, Gustavo Carneiro
Date Published: 4 Feb, 2019(Originally Published: 28 May, 2018)
Categories at arxiv.org: cs.CV, cs.AI
Oral Presentation at MICCAI 2018
The training of medical image analysis systems using machine learning approaches follows a common script: collect and annotate a large dataset, train the classifier on the training set, and test it on a hold-out test set. This process bears no direct resemblance with radiologist training, which is based on solving a series of tasks of increasing difficulty, where each task involves the use of significantly smaller datasets than those used in machine learning. In this paper, we propose a novel training approach inspired by how radiologists are trained. In particular, we explore the use of meta-training that models a classifier based on a series of tasks. Tasks are selected using teacher-student curriculum learning, where each task consists of simple classification problems containing small training sets. We hypothesize that our proposed meta-training approach can be used to pre-train medical image analysis models. This hypothesis is tested on the automatic breast screening classification from DCE-MRI trained with weakly labeled datasets. The classification performance achieved by our approach is shown to be the best in the field for that application, compared to state of art baseline approaches: DenseNet, multiple instance learning and multi-task learning.
Medical Diagnosis with a Novel SVM-CoDOA Based Hybrid Approach
M. Hanefi Calp
Date Published: 2 Feb, 2019
Categories at arxiv.org: cs.NE, cs.AI
9 pages
Machine Learning is an important sub-field of the Artificial Intelligence and it has been become a very critical task to train Machine Learning techniques via effective method or techniques. Recently, researchers try to use alternative techniques to improve ability of Machine Learning techniques. Moving from the explanations, objective of this study is to introduce a novel SVM-CoDOA (Cognitive Development Optimization Algorithm trained Support Vector Machines) system for general medical diagnosis. In detail, the system consists of a SVM, which is trained by CoDOA, a newly developed optimization algorithm. As it is known, use of optimization algorithms is an essential task to train and improve Machine Learning techniques. In this sense, the study has provided a medical diagnosis oriented problem scope in order to show effectiveness of the SVM-CoDOA hybrid formation.
Typical personal medical data contains sensitive information about individuals. Storing or sharing the personal medical data is thus often risky. For example, a short DNA sequence can provide information that can not only identify an individual, but also his or her relatives. Nonetheless, most countries and researchers agree on the necessity of collecting personal medical data. This stems from the fact that medical data, including genomic data, are an indispensable resource for further research and development regarding disease prevention and treatment. To prevent personal medical data from being misused, techniques to reliably preserve sensitive information should be developed for real world application. In this paper, we propose a framework called anonymized generative adversarial networks (AnomiGAN), to improve the maintenance of privacy of personal medical data, while also maintaining high prediction performance. We compared our method to state-of-the-art techniques and observed that our method preserves the same level of privacy as differential privacy (DP), but had better prediction results. We also observed that there is a trade-off between privacy and performance results depending on the degree of preservation of the original data. Here, we provide a mathematical overview of our proposed model and demonstrate its validation using UCI machine learning repository datasets in order to highlight its utility in practice. Experimentally, our approach delivers a better performance compared to that of the DP approach.
Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
Ping-Chun Hsieh, Xi Liu, Anirban Bhattacharya, P. R. Kumar
Date Published: 14 May, 2019(Originally Published: 29 Oct, 2018)
Categories at arxiv.org: cs.LG, stat.ML
To appear in ICML 2019
Sequential decision making for lifetime maximization is a critical problem in many real-world applications, such as medical treatment and portfolio selection. In these applications, a `reneging' phenomenon, where participants may disengage from future interactions after observing an unsatisfiable outcome, is rather prevalent. To address the above issue, this paper proposes a model of heteroscedastic linear bandits with reneging, which allows each participant to have a distinct `satisfaction level,' with any interaction outcome falling short of that level resulting in that participant reneging. Moreover, it allows the variance of the outcome to be context-dependent. Based on this model, we develop a UCB-type policy, namely HR-UCB, and prove that it achieves \pch{$\mathcal{O}\big(\sqrt{{T}(\log({T}))^{3}}\big)$} regret. Finally, we validate the performance of HR-UCB via simulations.
Gauge Equivariant Convolutional Networks and the Icosahedral CNN
Taco S. Cohen, Maurice Weiler, Berkay Kicanaoglu, Max Welling
Date Published: 13 May, 2019(Originally Published: 11 Feb, 2019)
Categories at arxiv.org: cs.LG, cs.CV, cs.NE, stat.ML
Proceedings of the International Conference on Machine Learning (ICML), 2019
The principle of equivariance to symmetry transformations enables a theoretically grounded approach to neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations. This enables the development of a very general class of convolutional neural networks on manifolds that depend only on the intrinsic geometry, and which includes many popular methods from equivariant and geometric deep learning. We implement gauge equivariant CNNs for signals defined on the surface of the icosahedron, which provides a reasonable approximation of the sphere. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. Using this method, we demonstrate substantial improvements over previous methods on the task of segmenting omnidirectional images and global climate patterns.
Adversarial Examples for Electrocardiograms
Xintian Han, Yuxuan Hu, Luca Foschini, Larry Chinitz, Lior Jankelson, Rajesh Ranganath
Date Published: 13 May, 2019
Categories at arxiv.org: eess.SP, cs.CR, cs.LG, stat.ML
Among all physiological signals, electrocardiogram (ECG) has seen some of the largest expansion in both medical and recreational applications with the rise of single-lead versions. These versions are embedded in medical devices and wearable products such as the injectable Medtronic Linq monitor, the iRhythm Ziopatch wearable monitor, and the Apple Watch Series 4. Recently, deep neural networks have been used to classify ECGs, outperforming even physicians specialized in cardiac electrophysiology. However, deep learning classifiers have been shown to be brittle to adversarial examples, including in medical-related tasks. Yet, traditional attack methods such as projected gradient descent (PGD) create examples that introduce square wave artifacts that are not physiological. Here, we develop a method to construct smoothed adversarial examples. We chose to focus on models learned on the data from the 2017 PhysioNet/Computing-in-Cardiology Challenge for single lead ECG classification. For this model, we utilized a new technique to generate smoothed examples to produce signals that are 1) indistinguishable to cardiologists from the original examples 2) incorrectly classified by the neural network. Further, we show that adversarial examples are not rare. Deep neural networks that have achieved state-of-the-art performance fail to classify smoothed adversarial ECGs that look real to clinical experts.
Deep Landscape Forecasting for Real-time Bidding Advertising
Kan Ren, Jiarui Qin, Lei Zheng, Zhengyu Yang, Weinan Zhang, Yong Yu
Date Published: 12 May, 2019(Originally Published: 7 May, 2019)
Categories at arxiv.org: cs.IR, cs.GT, cs.LG
KDD 2019. The reproducible code and dataset link is https://github.com/rk2900/DLF
The emergence of real-time auction in online advertising has drawn huge attention of modeling the market competition, i.e., bid landscape forecasting. The problem is formulated as to forecast the probability distribution of market price for each ad auction. With the consideration of the censorship issue which is caused by the second-price auction mechanism, many researchers have devoted their efforts on bid landscape forecasting by incorporating survival analysis from medical research field. However, most existing solutions mainly focus on either counting-based statistics of the segmented sample clusters, or learning a parameterized model based on some heuristic assumptions of distribution forms. Moreover, they neither consider the sequential patterns of the feature over the price space. In order to capture more sophisticated yet flexible patterns at fine-grained level of the data, we propose a Deep Landscape Forecasting (DLF) model which combines deep learning for probability distribution forecasting and survival analysis for censorship handling. Specifically, we utilize a recurrent neural network to flexibly model the conditional winning probability w.r.t. each bid price. Then we conduct the bid landscape forecasting through probability chain rule with strict mathematical derivations. And, in an end-to-end manner, we optimize the model by minimizing two negative likelihood losses with comprehensive motivations. Without any specific assumption for the distribution form of bid landscape, our model shows great advantages over previous works on fitting various sophisticated market price distributions. In the experiments over two large-scale real-world datasets, our model significantly outperforms the state-of-the-art solutions under various metrics.
Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.
Ink removal from histopathology whole slide images by combining classification, detection and image generation models
Sharib Ali, Nasullah Khalid Alham, Clare Verrill, Jens Rittscher
Date Published: 10 May, 2019
Categories at arxiv.org: cs.CV, cs.LG, eess.IV
Accepted paper at IEEE International Symposium on Biomedical Imaging (ISBI) 2019, Venice, Italy
Histopathology slides are routinely marked by pathologists using permanent ink markers that should not be removed as they form part of the medical record. Often tumour regions are marked up for the purpose of highlighting features or other downstream processing such an gene sequencing. Once digitised there is no established method for removing this information from the whole slide images limiting its usability in research and study. Removal of marker ink from these high-resolution whole slide images is non-trivial and complex problem as they contaminate different regions and in an inconsistent manner. We propose an efficient pipeline using convolution neural networks that results in ink-free images without compromising information and image resolution. Our pipeline includes a sequential classical convolution neural network for accurate classification of contaminated image tiles, a fast region detector and a domain adaptive cycle consistent adversarial generative model for restoration of foreground pixels. Both quantitative and qualitative results on four different whole slide images show that our approach yields visually coherent ink-free whole slide images.
The dearth of prescribing guidelines for physicians is one key driver of the current opioid epidemic in the United States. In this work, we analyze medical and pharmaceutical claims data to draw insights on characteristics of patients who are more prone to adverse outcomes after an initial synthetic opioid prescription. Toward this end, we propose a generative model that allows discovery from observational data of subgroups that demonstrate an enhanced or diminished causal effect due to treatment. Our approach models these sub-populations as a mixture distribution, using sparsity to enhance interpretability, while jointly learning nonlinear predictors of the potential outcomes to better adjust for confounding. The approach leads to human-interpretable insights on discovered subgroups, improving the practical utility for decision support
Recently, Deep Neural Networks (DNNs) have been achieving impressive results on wide range of tasks. However, they suffer from being well-calibrated. In decision-making applications, such as autonomous driving or medical diagnosing, the confidence of deep networks plays an important role to bring the trust and reliability to the system. To calibrate the deep networks' confidence, many probabilistic and measure-based approaches are proposed. Temperature Scaling (TS) is a state-of-the-art among measure-based calibration methods which has low time and memory complexity as well as effectiveness. In this paper, we study TS and show it does not work properly when the validation set that TS uses for calibration has small size or contains noisy-labeled samples. TS also cannot calibrate highly accurate networks as well as non-highly accurate ones. Accordingly, we propose Attended Temperature Scaling (ATS) which preserves the advantages of TS while improves calibration in aforementioned challenging situations. We provide theoretical justifications for ATS and assess its effectiveness on wide range of deep models and datasets. We also compare the calibration results of TS and ATS on skin lesion detection application as a practical problem where well-calibrated system can play important role in making a decision.
Clinical diagnostic decision making and population-based studies often rely on multi-modal data which is noisy and incomplete. Recently, several works proposed geometric deep learning approaches to solve disease classification, by modeling patients as nodes in a graph, along with graph signal processing of multi-modal features. Many of these approaches are limited by assuming modality- and feature-completeness, and by transductive inference, which requires re-training of the entire model for each new test sample. In this work, we propose a novel inductive graph-based approach that can generalize to out-of-sample patients, despite missing features from entire modalities per patient. We propose multi-modal graph fusion which is trained end-to-end towards node-level classification. We demonstrate the fundamental working principle of this method on a simplified MNIST toy dataset. In experiments on medical data, our method outperforms single static graph approach in multi-modal disease classification.
Artificial intelligence (AI) researchers claim that they have made great `achievements' in clinical realms. However, clinicians point out the so-called `achievements' have no ability to implement into natural clinical settings. The root cause for this huge gap is that many essential features of natural clinical tasks are overlooked by AI system developers without medical background. In this paper, we propose that the clinical benchmark suite is a novel and promising direction to capture the essential features of the real-world clinical tasks, hence qualifies itself for guiding the development of AI systems, promoting the implementation of AI in real-world clinical practice.
Unsupervised Temperature Scaling: Post-Processing Unsupervised Calibration of Deep Models Decisions
Azadeh Sadat Mozafari, Hugo Siqueira Gomes, Wilson Leão, Christian Gagné
Date Published: 8 May, 2019(Originally Published: 1 May, 2019)
Categories at arxiv.org: cs.CV, cs.LG
arXiv admin note: text overlap with arXiv:1810.11586
Great performances of deep learning are undeniable, with impressive results on wide range of tasks. However, the output confidence of these models is usually not well calibrated, which can be an issue for applications where confidence on the decisions is central to bring trust and reliability (e.g., autonomous driving or medical diagnosis). For models using softmax at the last layer, Temperature Scaling (TS) is a state-of-the-art calibration method, with low time and memory complexity as well as demonstrated effectiveness.TS relies on a T parameter to rescale and calibrate values of the softmax layer, using a labelled dataset to determine the value of that parameter.We are proposing an Unsupervised Temperature Scaling (UTS) approach, which does not dependent on labelled samples to calibrate the model,allowing, for example, using a part of test samples for calibrating the pre-trained model before going into inference mode. We provide theoretical justifications for UTS and assess its effectiveness on the wide range of deep models and datasets. We also demonstrate calibration results of UTS on skin lesion detection, a problem where a well-calibrated output can play an important role for accurate decision-making.
Clinical decision making is challenging because of pathological complexity, as well as large amounts of heterogeneous data generated as part of routine clinical care. In recent years, machine learning tools have been developed to aid this process. Intensive care unit (ICU) admissions represent the most data dense and time-critical patient care episodes. In this context, prediction models may help clinicians determine which patients are most at risk and prioritize care. However, flexible tools such as artificial neural networks (ANNs) suffer from a lack of interpretability limiting their acceptability to clinicians. In this work, we propose a novel interpretable Bayesian neural network architecture which offers both the flexibility of ANNs and interpretability in terms of feature selection. In particular, we employ a sparsity inducing prior distribution in a tied manner to learn which features are important for outcome prediction. We evaluate our approach on the task of mortality prediction using two real-world ICU cohorts. In collaboration with clinicians we found that, in addition to the predicted outcome results, our approach can provide novel insights into the importance of different clinical measurements. This suggests that our model can support medical experts in their decision making process.
Integrative Analysis of Patient Health Records and Neuroimages via Memory-based Graph Convolutional Network
Xi Sheryl Zhang, Jingyuan Chou, Fei Wang
Date Published: 7 May, 2019(Originally Published: 17 Sep, 2018)
Categories at arxiv.org: cs.LG, stat.ML
With the arrival of the big data era, more and more data are becoming readily available in various real-world applications and those data are usually highly heterogeneous. Taking computational medicine as an example, we have both Electronic Health Records (EHR) and medical images for each patient. For complicated diseases such as Parkinson's and Alzheimer's, both EHR and neuroimaging information are very important for disease understanding because they contain complementary aspects of the disease. However, EHR and neuroimage are completely different. So far the existing research has been mainly focusing on one of them. In this paper, we proposed a framework, Memory-Based Graph Convolution Network (MemGCN), to perform integrative analysis with such multi-modal data. Specifically, GCN is used to extract useful information from the patients' neuroimages. The information contained in the patient EHRs before the acquisition of each brain image is captured by a memory network because of its sequential nature. The information contained in each brain image is combined with the information read out from the memory network to infer the disease state at the image acquisition timestamp. To further enhance the analytical power of MemGCN, we also designed a multi-hop strategy that allows multiple reading and updating on the memory can be performed at each iteration. We conduct experiments using the patient data from the Parkinson's Progression Markers Initiative (PPMI) with the task of classification of Parkinson's Disease (PD) cases versus controls. We demonstrate that superior classification performance can be achieved with our proposed framework, comparing with existing approaches involving a single type of data.
Clustering is an essential technique for discovering patterns in data. The steady increase in amount and complexity of data over the years led to improvements and development of new clustering algorithms. However, algorithms that can cluster data with mixed variable types (continuous and categorical) remain limited, despite the abundance of data with mixed types particularly in the medical field. Among existing methods for mixed data, some posit unverifiable distributional assumptions or that the contributions of different variable types are not well balanced. We propose a two-step hybrid density- and partition-based algorithm (HyDaP) that can detect clusters after variables selection. The first step involves both density-based and partition-based algorithms to identify the data structure formed by continuous variables and recognize the important variables for clustering; the second step involves partition-based algorithm together with a novel dissimilarity measure we designed for mixed data to obtain clustering results. Simulations across various scenarios and data structures were conducted to examine the performance of the HyDaP algorithm compared to commonly used methods. We also applied the HyDaP algorithm on electronic health records to identify sepsis phenotypes.
Learning From Noisy Labels By Regularized Estimation Of Annotator Confusion
Ryutaro Tanno, Ardavan Saeedi, Swami Sankaranarayanan, Daniel C. Alexander, Nathan Silberman
Date Published: 6 May, 2019(Originally Published: 10 Feb, 2019)
Categories at arxiv.org: cs.LG, cs.CV, stat.ML
CVPR 2019
The predictive performance of supervised learning algorithms depends on the quality of labels. In a typical label collection process, multiple annotators provide subjective noisy estimates of the "truth" under the influence of their varying skill-levels and biases. Blindly treating these noisy labels as the ground truth limits the accuracy of learning algorithms in the presence of strong disagreement. This problem is critical for applications in domains such as medical imaging where both the annotation cost and inter-observer variability are high. In this work, we present a method for simultaneously learning the individual annotator model and the underlying true label distribution, using only noisy observations. Each annotator is modeled by a confusion matrix that is jointly estimated along with the classifier predictions. We propose to add a regularization term to the loss function that encourages convergence to the true annotator confusion matrix. We provide a theoretical argument as to how the regularization is essential to our approach both for the case of single annotator and multiple annotators. Despite the simplicity of the idea, experiments on image classification tasks with both simulated and real labels show that our method either outperforms or performs on par with the state-of-the-art methods and is capable of estimating the skills of annotators even with a single label available per image.
Multiple Instance Learning (MIL) is a weak supervision learning paradigm that allows modeling of machine learning problems in which labels are available only for groups of examples called bags. A positive bag may contain one or more positive examples but it is not known which examples in the bag are positive. All examples in a negative bag belong to the negative class. Such problems arise frequently in fields of computer vision, medical image processing and bioinformatics. Many neural network based solutions have been proposed in the literature for MIL, however, almost all of them rely on introducing specialized blocks and connectivity in the architectures. In this paper, we present a novel and effective approach to Multiple Instance Learning in neural networks. Instead of making changes to the architectures, we propose a simple bag-level ranking loss function that allows Multiple Instance Classification in any neural architecture. We have demonstrated the effectiveness of our proposed method for popular MIL benchmark datasets. In addition, we have tested the performance of our method in convolutional neural networks used to model an MIL problem derived from the well-known MNIST dataset. Results have shown that despite being simpler, our proposed scheme is comparable or better than existing methods in the literature in practical scenarios. Python code files for all the experiments can be found at https://github.com/amina01/ESMIL.
Automatic construction of Chinese herbal prescription from tongue image via CNNs and auxiliary latent therapy topics
Yang Hu, Guihua Wen, Huiqiang Liao, Changjun Wang, Dan Dai, Zhiwen Yu
Date Published: 6 May, 2019(Originally Published: 23 Jan, 2018)
Categories at arxiv.org: cs.CV, cs.LG, cs.NE
17 pages, 10 figures
The tongue image provides important physical information of humans. It is of great importance for diagnoses and treatments in clinical medicine. Herbal prescriptions are simple, noninvasive and have low side effects. Thus, they are widely applied in China. Studies on the automatic construction technology of herbal prescriptions based on tongue images have great significance for deep learning to explore the relevance of tongue images for herbal prescriptions, it can be applied to healthcare services in mobile medical systems. In order to adapt to the tongue image in a variety of photographic environments and construct herbal prescriptions, a neural network framework for prescription construction is designed. It includes single/double convolution channels and fully connected layers. Furthermore, it proposes the auxiliary therapy topic loss mechanism to model the therapy of Chinese doctors and alleviate the interference of sparse output labels on the diversity of results. The experiment use the real world tongue images and the corresponding prescriptions and the results can generate prescriptions that are close to the real samples, which verifies the feasibility of the proposed method for the automatic construction of herbal prescriptions from tongue images. Also, it provides a reference for automatic herbal prescription construction from more physical information.
The electrocardiogram (ECG) is a widely-used medical test, typically consisting of 12 voltage versus time traces collected from surface recordings over the heart. Here we hypothesize that a deep neural network can predict an important future clinical event (one-year all-cause mortality) from ECG voltage-time traces. We show good performance for predicting one-year mortality with an average AUC of 0.85 from a model cross-validated on 1,775,926 12-lead resting ECGs, that were collected over a 34-year period in a large regional health system. Even within the large subset of ECGs interpreted as 'normal' by a physician (n=297,548), the model performance to predict one-year mortality remained high (AUC=0.84), and Cox Proportional Hazard model revealed a hazard ratio of 6.6 (p<0.005) for the two predicted groups (dead vs alive one year after ECG) over a 30-year follow-up period. A blinded survey of three cardiologists suggested that the patterns captured by the model were generally not visually apparent to cardiologists even after being shown 240 paired examples of labeled true positives (dead) and true negatives (alive). In summary, deep learning can add significant prognostic information to the interpretation of 12-lead resting ECGs, even in cases that are interpreted as 'normal' by physicians.
The recent adoption of Electronic Health Records (EHRs) by health care providers has introduced an important source of data that provides detailed and highly specific insights into patient phenotypes over large cohorts. These datasets, in combination with machine learning and statistical approaches, generate new opportunities for research and clinical care. However, many methods require the patient representations to be in structured formats, while the information in the EHR is often locked in unstructured texts designed for human readability. In this work, we develop the methodology to automatically extract clinical features from clinical narratives from large EHR corpora without the need for prior knowledge. We consider medical terms and sentences appearing in clinical narratives as atomic information units. We propose an efficient clustering strategy suitable for the analysis of large text corpora and to utilize the clusters to represent information about the patient compactly. To demonstrate the utility of our approach, we perform an association study of clinical features with somatic mutation profiles from 4,007 cancer patients and their tumors. We apply the proposed algorithm to a dataset consisting of about 65 thousand documents with a total of about 3.2 million sentences. We identify 341 significant statistical associations between the presence of somatic mutations and clinical features. We annotated these associations according to their novelty, and report several known associations. We also propose 32 testable hypotheses where the underlying biological mechanism does not appear to be known but plausible. These results illustrate that the automated discovery of clinical features is possible and the joint analysis of clinical and genetic datasets can generate appealing new hypotheses.
Odor identification is an important area in a wide range of industries like cosmetics, food, beverages and medical diagnosis among others. Odor detection could be done through an array of gas sensors conformed as an electronic nose where a data acquisition module converts sensor signals to a standard output to be analyzed. To facilitate odors detection a system is required for the identification. This paper presents the results of an automated odor identification process implemented by a fuzzy system and an electronic nose. First, an electronic nose prototype is manufactured to detect organic compounds vapor using an array of five tin dioxide gas sensors, an arduino uno board is used as a data acquisition section. Second, an intelligent module with a fuzzy system is considered for the identification of the signals received by the electronic nose. This solution proposes a system to identify odors by using a personal computer. Results show an acceptable precision.
Dynamic Transfer Learning for Named Entity Recognition
Parminder Bhatia, Kristjan Arumae, Busra Celikkaya
Date Published: 1 May, 2019(Originally Published: 13 Dec, 2018)
Categories at arxiv.org: cs.LG, cs.CL, stat.ML
AAAI 2019 Workshop on Health Intelligence
State-of-the-art named entity recognition (NER) systems have been improving continuously using neural architectures over the past several years. However, many tasks including NER require large sets of annotated data to achieve such performance. In particular, we focus on NER from clinical notes, which is one of the most fundamental and critical problems for medical text analysis. Our work centers on effectively adapting these neural architectures towards low-resource settings using parameter transfer methods. We complement a standard hierarchical NER model with a general transfer learning framework consisting of parameter sharing between the source and target tasks, and showcase scores significantly above the baseline architecture. These sharing schemes require an exponential search over tied parameter sets to generate an optimal configuration. To mitigate the problem of exhaustively searching for model optimization, we propose the Dynamic Transfer Networks (DTN), a gated architecture which learns the appropriate parameter sharing scheme between source and target datasets. DTN achieves the improvements of the optimized transfer learning framework with just a single training setting, effectively removing the need for exponential search.
Discovering heterogeneous subpopulations for fine-grained analysis of opioid use and opioid use disorders
Jen J. Gong, Abigail Z. Jacobs, Toby E. Stuart, Mathijs de Vaan
Date Published: 1 May, 2019(Originally Published: 11 Nov, 2018)
Categories at arxiv.org: q-bio.QM, cs.LG, stat.ML
Withdrawn pending data use agreement clarification
The opioid epidemic in the United States claims over 40,000 lives per year, and it is estimated that well over two million Americans have an opioid use disorder. Over-prescription and misuse of prescription opioids play an important role in the epidemic. Individuals who are prescribed opioids, and who are diagnosed with opioid use disorder, have diverse underlying health states. Policy interventions targeting prescription opioid use, opioid use disorder, and overdose often fail to account for this variation. To identify latent health states, or phenotypes, pertinent to opioid use and opioid use disorders, we use probabilistic topic modeling with medical diagnosis histories from a statewide population of individuals who were prescribed opioids. We demonstrate that our learned phenotypes are predictive of future opioid use-related outcomes. In addition, we show how the learned phenotypes can provide important context for variability in opioid prescriptions. Understanding the heterogeneity in individual health states and in prescription opioid use can help identify policy interventions to address this public health crisis.
Quantum Generalized Linear Models
Colleen M. Farrelly, Srikanth Namuduri, Uchenna Chukwu
Date Published: 1 May, 2019
Categories at arxiv.org: stat.ME, cs.LG, quant-ph
10 pages, 2 figures, 3 tables
Generalized linear models (GLM) are link function based statistical models. Many supervised learning algorithms are extensions of GLMs and have link functions built into the algorithm to model different outcome distributions. There are two major drawbacks when using this approach in applications using real world datasets. One is that none of the link functions available in the popular packages is a good fit for the data. Second, it is computationally inefficient and impractical to test all the possible distributions to find the optimum one. In addition, many GLMs and their machine learning extensions struggle on problems of overdispersion in Tweedie distributions. In this paper we propose a quantum extension to GLM that overcomes these drawbacks. A quantum gate with non-Gaussian transformation can be used to continuously deform the outcome distribution from known results. In doing so, we eliminate the need for a link function. Further, by using an algorithm that superposes all possible distributions to collapse to fit a dataset, we optimize the model in a computationally efficient way. We provide an initial proof-of-concept by testing this approach on both a simulation of overdispersed data and then on a benchmark dataset, which is quite overdispersed, and achieved state of the art results. This is a game changer in several applied fields, such as part failure modeling, medical research, actuarial science, finance and many other fields where Tweedie regression and overdispersion are ubiquitous.
Factor Analysis in Fault Diagnostics Using Random Forest
Nagdev Amruthnath, Tarun Gupta
Date Published: 30 Apr, 2019
Categories at arxiv.org: cs.LG, stat.ML
Factor analysis or sometimes referred to as variable analysis has been extensively used in classification problems for identifying specific factors that are significant to particular classes. This type of analysis has been widely used in application such as customer segmentation, medical research, network traffic, image, and video classification. Today, factor analysis is prominently being used in fault diagnosis of machines to identify the significant factors and to study the root cause of a specific machine fault. The advantage of performing factor analysis in machine maintenance is to perform prescriptive analysis (helps answer what actions to take?) and preemptive analysis (helps answer how to eliminate the failure mode?). In this paper, a real case of an industrial rotating machine was considered where vibration and ambient temperature data was collected for monitoring the health of the machine. Gaussian mixture model-based clustering was used to cluster the data into significant groups, and spectrum analysis was used to diagnose each cluster to a specific state of the machine. The significant features that attribute to a particular mode of the machine were identified by using the random forest classification model. The significant features for specific modes of the machine were used to conclude that the clusters generated are distinct and have a unique set of significant features.
Archiving large sets of medical or cell images in digital libraries may require ordering randomly scattered sets of image data according to specific criteria, such as the spatial extent of a specific local color or contrast content that reveals different meaningful states of a physiological structure, tissue, or cell in a certain order, indicating progression or recession of a pathology, or the progressive response of a cell structure to treatment. Here we used a Self Organized Map (SOM)-based, fully automatic and unsupervised, classification procedure described in our earlier work and applied it to sets of minimally processed grayscale and/or color processed Scanning Electron Microscopy (SEM) images of CD4+ T-lymphocytes (so-called helper cells) with varying extent of HIV virion infection. It is shown that the quantization error in the SOM output after training permits to scale the spatial magnitude and the direction of change (+ or -) in local pixel contrast or color across images of a series with a reliability that exceeds that of any human expert. The procedure is easily implemented and fast, and represents a promising step towards low-cost automatic digital image archiving with minimal intervention of a human operator.
Clinical decision support systems (CDSS) will play an in-creasing role in improving the quality of medical care for critically ill patients. However, due to limitations in current informatics infrastructure, CDSS do not always have com-plete information on state of supporting physiologic monitor-ing devices, which can limit the input data available to CDSS. This is especially true in the use case of mechanical ventilation (MV), where current CDSS have no knowledge of critical ventilation settings, such as ventilation mode. To enable MV CDSS to make accurate recommendations related to ventilator mode, we developed a highly performant ma-chine learning model that is able to perform per-breath clas-sification of 5 of the most widely used ventilation modes in the USA with an average F1-score of 97.52%. We also show how our approach makes methodologic improvements over previous work and that it is highly robust to missing data caused by software/sensor error.
Adversarial Examples: Opportunities and Challenges
Jiliang Zhang, Chen Li
Date Published: 29 Apr, 2019(Originally Published: 13 Sep, 2018)
Categories at arxiv.org: cs.LG, stat.ML
16 pages, 13 figures, 5 tables
With the advent of the era of artificial intelligence (AI), deep neural networks (DNNs) have shown huge superiority over human in image recognition, speech processing, autonomous vehicles and medical diagnosis. However, recent studies indicate that DNNs are vulnerable to adversarial examples (AEs) which are designed by attackers to fool deep learning models. Different from real examples, AEs can hardly be distinguished by human eyes, but mislead the model to predict incorrect outputs and therefore threaten security critical deep-learning applications. In recent years, the generation and defense of AEs have become a research hotspot in the field of AI security. This article reviews the latest research progress of AEs. First, we introduce the concept, cause, characteristics and evaluation metrics of AEs, then give a survey on the state-of-the-art AE generation methods with the discussion of advantages and disadvantages. After that, we review the existing defenses and discuss their limitations. Finally, the future research opportunities and challenges on AEs are prospected.
Automatic alignment of surgical videos using kinematic data
Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, François Petitjean, Lhassane Idoumghar, Pierre-Alain Muller
Date Published: 26 Apr, 2019(Originally Published: 3 Apr, 2019)
Categories at arxiv.org: cs.CV, cs.AI, cs.LG, stat.ML
Accepted at AIME 2019
Over the past one hundred years, the classic teaching methodology of "see one, do one, teach one" has governed the surgical education systems worldwide. With the advent of Operation Room 2.0, recording video, kinematic and many other types of data during the surgery became an easy task, thus allowing artificial intelligence systems to be deployed and used in surgical and medical practice. Recently, surgical videos has been shown to provide a structure for peer coaching enabling novice trainees to learn from experienced surgeons by replaying those videos. However, the high inter-operator variability in surgical gesture duration and execution renders learning from comparing novice to expert surgical videos a very difficult task. In this paper, we propose a novel technique to align multiple videos based on the alignment of their corresponding kinematic multivariate time series data. By leveraging the Dynamic Time Warping measure, our algorithm synchronizes a set of videos in order to show the same gesture being performed at different speed. We believe that the proposed approach is a valuable addition to the existing learning tools for surgery.
Clinical researchers use disease progression modeling algorithms to predict future patient status and characterize progression patterns. One approach for disease progression modeling is to describe patient status using a small number of states that represent distinctive distributions over a set of observed measures. Hidden Markov models (HMMs) and its variants are a class of models that both discover these states and make predictions concerning future states for new patients. HMMs can be trained using longitudinal observations of subjects from large-scale cohort studies, clinical trials, and electronic health records. Despite the advantages of using the algorithms for discovering interesting patterns, it still remains challenging for medical experts to interpret model outputs, complex modeling parameters, and clinically make sense of the patterns. To tackle this problem, we conducted a design study with physician scientists, statisticians, and visualization experts, with the goal to investigate disease progression pathways of certain chronic diseases, namely type 1 diabetes (T1D), Huntington's disease, Parkinson's disease, and chronic obstructive pulmonary disease (COPD). As a result, we introduce DPVis which seamlessly integrates model parameters and outcomes of HMMs into interpretable, and interactive visualizations. In this study, we demonstrate that DPVis is successful in evaluating disease progression models, visually summarizing disease states, interactively exploring disease progression patterns, and designing and comparing clinically relevant subgroup cohorts by introducing a case study on observation data from clinical studies of T1D.
Quadratic Autoencoder (Q-AE) for Low-dose CT Denoising
Fenglei Fan, Hongming Shan, Mannudeep K. Kalra, Ramandeep Singh, Guhan Qian, Matthew Getzin, Yueyang Teng, Juergen Hahn, Ge Wang
Date Published: 26 Apr, 2019(Originally Published: 17 Jan, 2019)
Categories at arxiv.org: cs.LG, eess.SP, stat.ML
Inspired by complexity and diversity of biological neurons, our group proposed quadratic neurons by replacing the inner product in current artificial neurons with a quadratic operation on input data, thereby enhancing the capability of an individual neuron. Along this direction, we are motivated to evaluate the power of quadratic neurons in popular network architectures, simulating human-like learning in the form of quadratic-neuron-based deep learning. Our prior theoretical studies have shown important merits of quadratic neurons and networks in representation, efficiency, and interpretability. In this paper, we use quadratic neurons to construct an encoder-decoder structure, referred as the quadratic autoencoder, and apply it to low-dose CT denoising. The experimental results on the Mayo low-dose CT dataset demonstrate the utility of quadratic autoencoder in terms of image denoising and model efficiency. To our best knowledge, this is the first time that the deep learning approach is implemented with a new type of neurons and demonstrates a significant potential in the medical imaging field.
Computational Approaches to Access Probabilistic Population Codes for Higher Cognition an Decision-Making
Kevin Jasberg, Sergej Sizov
Date Published: 25 Apr, 2019
Categories at arxiv.org: cs.NE, stat.AP
arXiv admin note: text overlap with arXiv:1804.10861
In recent years, research unveiled more and more evidence for the so-called Bayesian Brain Paradigm, i.e. the human brain is interpreted as a probabilistic inference machine and Bayesian modelling approaches are hence used successfully. One of the many theories is that of Probabilistic Population Codes (PPC). Although this model has so far only been considered as meaningful and useful for sensory perception as well as motor control, it has always been suggested that this mechanism also underlies higher cognition and decision-making. However, the adequacy of PPC for this regard cannot be confirmed by means of neurological standard measurement procedures. In this article we combine the parallel research branches of recommender systems and predictive data mining with theoretical neuroscience. The nexus of both fields is given by behavioural variability and resulting internal distributions. We adopt latest experimental settings and measurement approaches from predictive data mining to obtain these internal distributions, to inform the theoretical PPC approach and to deduce medical correlates which can indeed be measured in vivo. This is a strong hint for the applicability of the PPC approach and the Bayesian Brain Paradigm for higher cognition and human decision-making.
Attention-based Transfer Learning for Brain-computer Interface
Chuanqi Tan, Fuchun Sun, Tao Kong, Bin Fang, Wenchang Zhang
Date Published: 25 Apr, 2019
Categories at arxiv.org: eess.SP, cs.AI, cs.HC, cs.LG
In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP...
Different functional areas of the human brain play different roles in brain activity, which has not been paid sufficient research attention in the brain-computer interface (BCI) field. This paper presents a new approach for electroencephalography (EEG) classification that applies attention-based transfer learning. Our approach considers the importance of different brain functional areas to improve the accuracy of EEG classification, and provides an additional way to automatically identify brain functional areas associated with new activities without the involvement of a medical professional. We demonstrate empirically that our approach out-performs state-of-the-art approaches in the task of EEG classification, and the results of visualization indicate that our approach can detect brain functional areas related to a certain task.
One major challenge in the medication of Parkinson's disease is that the severity of the disease, reflected in the patients' motor state, cannot be measured using accessible biomarkers. Therefore, we develop and examine a variety of statistical models to detect the motor state of such patients based on sensor data from a wearable device. We find that deep learning models consistently outperform a classical machine learning model applied on hand-crafted features in this time series classification task. Furthermore, our results suggest that treating this problem as a regression instead of an ordinal regression or a classification task is most appropriate. For consistent model evaluation and training, we adopt the leave-one-subject-out validation scheme to the training of deep learning models. We also employ a class-weighting scheme to successfully mitigate the problem of high multi-class imbalances in this domain. In addition, we propose a customized performance measure that reflects the requirements of the involved medical staff on the model. To solve the problem of limited availability of high quality training data, we propose a transfer learning technique which helps to improve model performance substantially. Our results suggest that deep learning techniques offer a high potential to autonomously detect motor states of patients with Parkinson's disease.
Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data
Sergul Aydore, Bertrand Thirion, Gael Varoquaux
Date Published: 22 Apr, 2019(Originally Published: 31 Jul, 2018)
Categories at arxiv.org: cs.LG, stat.ML
12 pages, 14 figures
In many applications where collecting data is expensive, for example neuroscience or medical imaging, the sample size is typically small compared to the feature dimension. It is challenging in this setting to train expressive, non-linear models without overfitting. These datasets call for intelligent regularization that exploits known structure, such as correlations between the features arising from the measurement device. However, existing structured regularizers need specially crafted solvers, which are difficult to apply to complex models. We propose a new regularizer specifically designed to leverage structure in the data in a way that can be applied efficiently to complex models. Our approach relies on feature grouping, using a fast clustering algorithm inside a stochastic gradient descent loop: given a family of feature groupings that capture feature covariations, we randomly select these groups at each iteration. We show that this approach amounts to enforcing a denoising regularizer on the solution. The method is easy to implement in many model architectures, such as fully connected neural networks, and has a linear computational cost. We apply this regularizer to a real-world fMRI dataset and the Olivetti Faces datasets. Experiments on both datasets demonstrate that the proposed approach produces models that generalize better than those trained with conventional regularizers, and also improves convergence speed.
Effectiveness of LSTMs in Predicting Congestive Heart Failure Onset
Sunil Mallya, Marc Overhage, Navneet Srivastava, Tatsuya Arai, Cole Erdman
Date Published: 13 Feb, 2019(Originally Published: 7 Feb, 2019)
Categories at arxiv.org: cs.LG, cs.AI, stat.ML
LSTMs, Electronic Health Records
In this paper we present a Recurrent neural networks (RNN) based architecture that achieves an AUCROC of 0.9147 for predicting the onset of Congestive Heart Failure (CHF) 15 months in advance using a 12-month observation window on a large cohort of 216,394 patients. We believe this to be the largest study in CHF onset prediction with respect to the number of CHF case patients in the cohort and the test set (3,332 CHF patients) on which the AUC metrics are reported. We explore the extent to which LSTM (Long Short Term Memory) based model, a variant of RNNs, can accurately predict the onset of CHF when compared to known linear baselines like Logistic Regression, Random Forests and deep learning based models such as Multi-Layer Perceptron and Convolutional Neural Networks. We utilize demographics, medical diagnosis and procedure data from 21,405 CHF and 194,989 control patients to as our features. We describe our feature embedding strategy for medical diagnosis codes that accommodates the sparse, irregular, longitudinal, and high-dimensional characteristics of EHR data. We empirically show that LSTMs can capture the longitudinal aspects of EHR data better than the proposed baselines. As an attempt to interpret the model, we present a temporal data analysis-based technique on false positives to attribute feature importance. A model capable of predicting the onset of congestive heart failure months in the future with this level of accuracy and precision can support efforts of practitioners to implement risk factor reduction strategies and researchers to begin to systematically evaluate interventions to potentially delay or avert development of the disease with high mortality, morbidity and significant costs.
Online detection of instantaneous changes in the generative process of a data sequence generally focuses on retrospective inference of such change points without considering their future occurrences. We extend the Bayesian Online Change Point Detection algorithm to also infer the number of time steps until the next change point (i.e., the residual time). This enables us to handle observation models which depend on the total segment duration, which is useful to model data sequences with temporal scaling. In addition, we extend the model by removing the i.i.d. assumption on the observation model parameters. The resulting inference algorithm for segment detection can be deployed in an online fashion, and we illustrate applications to synthetic and to two medical real-world data sets.
Over-parameterized deep neural networks have proven to be able to learn an arbitrary dataset with 100$\%$ training accuracy. Because of a risk of overfitting and computational cost issues, we cannot afford to increase the number of network nodes if we want achieve better training results for medical images. Previous deep learning research shows that the training ability of a neural network improves dramatically (for the same epoch of training) when a few nodes with supplementary information are added to the network. These few informative nodes allow the network to learn features that are otherwise difficult to learn by generating a disentangled data representation. This paper analyzes how concatenation of additional information as supplementary axes affects the training of the neural networks. This analysis was conducted for a simple multilayer perceptron (MLP) classification model with a rectified linear unit (ReLU) on two-dimensional training data. We compared the networks with and without concatenation of supplementary information to support our analysis. The model with concatenation showed more robust and accurate training results compared to the model without concatenation. We also confirmed that our findings are valid for deeper convolutional neural networks (CNN) using ultrasound images and for a conditional generative adversarial network (cGAN) using the MNIST data.
Finding and Following of Honeycombing Regions in Computed Tomography Lung Images by Deep Learning
Emre Eğriboz, Furkan Kaynar, Songül Varlı Albayrak, Benan Müsellim, Tuba Selçuk
Date Published: 11 Feb, 2019(Originally Published: 31 Oct, 2018)
Categories at arxiv.org: cs.CV, cs.LG, stat.ML
4 pages, 9 figures, 3 tables
In recent years, besides the medical treatment methods in medical field, Computer Aided Diagnosis (CAD) systems which can facilitate the decision making phase of the physician and can detect the disease at an early stage have started to be used frequently. The diagnosis of Idiopathic Pulmonary Fibrosis (IPF) disease by using CAD systems is very important in that it can be followed by doctors and radiologists. It has become possible to diagnose and follow up the disease with the help of CAD systems by the development of high resolution computed imaging scanners and increasing size of computation power. The purpose of this project is to design a tool that will help specialists diagnose and follow up the IPF disease by identifying areas of honeycombing and ground glass patterns in High Resolution Computed Tomography (HRCT) lung images. Creating a program module that segments the lung pair and creating a self-learner deep learning model from given Computed Tomography (CT) images for the specific diseased regions thanks to doctors are the main purposes of this work. Through the created model, program module will be able to find special regions in given new CT images. In this study, the performance of lung segmentation was tested by the S{\o}rensen-Dice coefficient method and the mean performance was measured as 90.7%, testing of the created model was performed with data not used in the training stage of the CNN network, and the average performance was measured as 87.8% for healthy regions, 73.3% for ground-glass areas and 69.1% for honeycombing zones.
Diabetic eye disease is one of the fastest growing causes of preventable blindness. With the advent of anti-VEGF (vascular endothelial growth factor) therapies, it has become increasingly important to detect center-involved diabetic macular edema (ci-DME). However, center-involved diabetic macular edema is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites because of cost and workflow constraints. Instead, screening programs rely on the detection of hard exudates in color fundus photographs as a proxy for DME, often resulting in high false positive or false negative calls. To improve the accuracy of DME screening, we trained a deep learning model to use color fundus photographs to predict ci-DME. Our model had an ROC-AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, three retinal specialists had similar sensitivities (82-85%), but only half the specificity (45-50%, p<0.001 for each comparison with model). The positive predictive value (PPV) of the model was 61% (95% CI: 56-66%), approximately double the 36-38% by the retinal specialists. In addition to predicting ci-DME, our model was able to detect the presence of intraretinal fluid with an AUC of 0.81 (95% CI: 0.81-0.86) and subretinal fluid with an AUC of 0.88 (95% CI: 0.85-0.91). The ability of deep learning algorithms to make clinically relevant predictions that generally require sophisticated 3D-imaging equipment from simple 2D images has broad relevance to many other applications in medical imaging.
Early hospital mortality prediction using vital signals
Reza Sadeghi, Tanvi Banerjee, William Romine
Date Published: 9 Feb, 2019(Originally Published: 18 Mar, 2018)
Categories at arxiv.org: cs.LG, stat.ML
11 pages, 5 figures, preprint of accepted paper in IEEE&ACM CHASE 2018 and published in Smart Heal...
Early hospital mortality prediction is critical as intensivists strive to make efficient medical decisions about the severely ill patients staying in intensive care units. As a result, various methods have been developed to address this problem based on clinical records. However, some of the laboratory test results are time-consuming and need to be processed. In this paper, we propose a novel method to predict mortality using features extracted from the heart signals of patients within the first hour of ICU admission. In order to predict the risk, quantitative features have been computed based on the heart rate signals of ICU patients. Each signal is described in terms of 12 statistical and signal-based features. The extracted features are fed into eight classifiers: decision tree, linear discriminant, logistic regression, support vector machine (SVM), random forest, boosted trees, Gaussian SVM, and K-nearest neighborhood (K-NN). To derive insight into the performance of the proposed method, several experiments have been conducted using the well-known clinical dataset named Medical Information Mart for Intensive Care III (MIMIC-III). The experimental results demonstrate the capability of the proposed method in terms of precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). The decision tree classifier satisfies both accuracy and interpretability better than the other classifiers, producing an F1-score and AUC equal to 0.91 and 0.93, respectively. It indicates that heart rate signals can be used for predicting mortality in patients in the ICU, achieving a comparable performance with existing predictions that rely on high dimensional features from clinical records which need to be processed and may contain missing information.
Compressed Sensing with Deep Image Prior and Learned Regularization
David Van Veen, Ajil Jalal, Eric Price, Sriram Vishwanath, Alexandros G. Dimakis
Date Published: 7 Feb, 2019(Originally Published: 17 Jun, 2018)
Categories at arxiv.org: stat.ML, cs.IT, cs.LG, math.IT
We propose a novel method for compressed sensing recovery using untrained deep generative models. Our method is based on the recently proposed Deep Image Prior (DIP), wherein the convolutional weights of the network are optimized to match the observed measurements. We show that this approach can be applied to solve any differentiable inverse problem. We also introduce a novel learned regularization technique which incorporates a small amount of prior information on the network weights. Compared to previous unlearned methods for compressed sensing, our algorithm requires fewer measurements in most cases. Unlike previous learned approaches based on generative models, our method does not require pre-training over large datasets. As such, we can apply our method to various medical imaging datasets for which data acquisition is expensive and generative models are difficult to train.
Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning
Wouter van Loon, Marjolein Fokkema, Botond Szabo, Mark de Rooij
Date Published: 7 Feb, 2019(Originally Published: 6 Nov, 2018)
Categories at arxiv.org: stat.ML, cs.LG, stat.ME
24 pages, 9 figures; corrected a minor error in reported AUC and accuracy values for Group Lasso, ...
In biomedical research many different types of patient data can be collected, including various types of omics data and medical imaging modalities. Applying multi-view learning to these different sources of information can increase the accuracy of medical classification models compared with single-view procedures. However, the collection of biomedical data can be expensive and taxing on patients, so that superfluous data collection should be avoided. It is therefore necessary to develop multi-view learning methods which can accurately identify the views most important for prediction. In recent years, several biomedical studies have used an approach known as multi-view stacking (MVS), where a model is trained on each view separately and the resulting predictions are combined through stacking. In these studies, MVS has been shown to increase classification accuracy. However, the MVS framework can also be used for selecting a subset of important views. To study the view selection potential of MVS, we develop a special case called stacked penalized logistic regression (StaPLR). Compared with existing view-selection methods, StaPLR can make use of faster optimization algorithms and is easily parallelized. We show that nonnegativity constraints on the parameters of the function which combines the views are important for preventing unimportant views from entering the model. We investigate the performance of StaPLR through simulations, and consider two real data examples. We compare the performance of StaPLR with an existing view selection method called the group lasso and observe that, in terms of view selection, StaPLR has a consistently lower false positive rate.
Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Due to it's complexity, i For high-stakes domains such as medical, providing intuitive explanations that can be consumed by domain experts without ML expertise becomes crucial. To this demand, concept-based methods (e.g., TCAV) were introduced to provide explanations using user-chosen high-level concepts rather than individual input features. While these methods successfully leverage rich representations learned by the networks to reveal how human-defined concepts are related to the prediction, they require users to select concepts of their choice and collect labeled examples of those concepts. In this work, we introduce DTCAV (Discovery TCAV) a global concept-based interpretability method that can automatically discover concepts as image segments, along with each concept's estimated importance for a deep neural network's predictions. We validate that discovered concepts are as coherent to humans as hand-labeled concepts. We also show that the discovered concepts carry significant signal for prediction by analyzing a network's performance with stitched/added/deleted concepts. DTCAV results revealed a number of undesirable correlations (e.g., a basketball player's jersey was a more important concept for predicting the basketball class than the ball itself) and show the potential shallow reasoning of these networks.
Speeding up scaled gradient projection methods using deep neural networks for inverse problems in image processing
Byung Hyun Lee, Se Young Chun
Date Published: 7 Feb, 2019
Categories at arxiv.org: cs.LG, cs.CV, stat.ML
10 pages, 5 figures
Conventional optimization based methods have utilized forward models with image priors to solve inverse problems in image processing. Recently, deep neural networks (DNN) have been investigated to significantly improve the image quality of the solution for inverse problems. Most DNN based inverse problems have focused on using data-driven image priors with massive amount of data. However, these methods often do not inherit nice properties of conventional approaches using theoretically well-grounded optimization algorithms such as monotone, global convergence. Here we investigate another possibility of using DNN for inverse problems in image processing. We propose methods to use DNNs to seamlessly speed up convergence rates of conventional optimization based methods. Our DNN-incorporated scaled gradient projection methods, without breaking theoretical properties, significantly improved convergence speed over state-of-the-art conventional optimization methods such as ISTA or FISTA in practice for inverse problems such as image inpainting, compressive image recovery with partial Fourier samples, image deblurring, and medical image reconstruction with sparse-view projections.
Integral Privacy for Sampling from Mollifier Densities with Approximation Guarantees
Hisham Husain, Zac Cranko, Richard Nock
Date Published: 6 Feb, 2019(Originally Published: 13 Jun, 2018)
Categories at arxiv.org: stat.ML, cs.LG
$\varepsilon$-differential privacy is a leading protection setting, focused by design on individual privacy. Many applications, such as in the medical / pharmaceutical domains, would rather posit privacy at a group level, furthermore of unknown size, a setting in which classical budget scaling tricks typically cannot guarantee non-trivial privacy levels. We call this privacy setting integral privacy. In this paper, we study a major problem of machine learning and statistics with related applications in domains cited above that have recently met with substantial press: sampling. Our formal contribution is twofolds: we provide a general theory for sampling to be integrally private, and we show how to achieve integral privacy with guarantees on the approximation of the true (non-private) density. Our theory introduces $\varepsilon$-mollifiers, subsets of densities whose sampling is guaranteed to be integrally private. Guaranteed approximation bounds of the true density are obtained via the boosting theory as it was originally formulated: we learn sufficient statistics in an $\varepsilon$-mollifier of exponential families using classifiers, which brings guaranteed approximation and convergence rates that degrade gracefully with the privacy budget, under weak assumptions. Approximation guarantees cover the mode capture problem. Experimental results against private kernel density estimation and private GANs displays the quality of our results, in particular for high privacy regimes.
Neural Networks (NN) have recently emerged as backbone of several sensitive applications like automobile, medical image, security, etc. NNs inherently offer Partial Fault Tolerance (PFT) in their architecture; however, the biased PFT of NNs can lead to severe consequences in applications like cryptography and security critical scenarios. In this paper, we propose a revised implementation which enhances the PFT property of NN significantly with detailed mathematical analysis. We evaluated the performance of revised NN considering both software and FPGA implementation for a cryptographic primitive like AES SBox. The results show that the PFT of NNs can be significantly increased with the proposed methodology.
Impossibility and Uncertainty Theorems in AI Value Alignment (or why your AGI should not have a utility function)
Peter Eckersley
Date Published: 5 Feb, 2019(Originally Published: 31 Dec, 2018)
Categories at arxiv.org: cs.AI
Published in SafeAI 2019: Proceedings of the AAAI Workshop on Artificial Intelligence Safety 2019
Utility functions or their equivalents (value functions, objective functions, loss functions, reward functions, preference orderings) are a central tool in most current machine learning systems. These mechanisms for defining goals and guiding optimization run into practical and conceptual difficulty when there are independent, multi-dimensional objectives that need to be pursued simultaneously and cannot be reduced to each other. Ethicists have proved several impossibility theorems that stem from this origin; those results appear to show that there is no way of formally specifying what it means for an outcome to be good for a population without violating strong human ethical intuitions (in such cases, the objective function is a social welfare function). We argue that this is a practical problem for any machine learning system (such as medical decision support systems or autonomous weapons) or rigidly rule-based bureaucracy that will make high stakes decisions about human lives: such systems should not use objective functions in the strict mathematical sense. We explore the alternative of using uncertain objectives, represented for instance as partially ordered preferences, or as probability distributions over total orders. We show that previously known impossibility theorems can be transformed into uncertainty theorems in both of those settings, and prove lower bounds on how much uncertainty is implied by the impossibility results. We close by proposing two conjectures about the relationship between uncertainty in objectives and severe unintended consequences from AI systems.
Privacy Preserving Off-Policy Evaluation
Tengyang Xie, Philip S. Thomas, Gerome Miklau
Date Published: 1 Feb, 2019
Categories at arxiv.org: cs.LG, stat.ML
Many reinforcement learning applications involve the use of data that is sensitive, such as medical records of patients or financial information. However, most current reinforcement learning methods can leak information contained within the (possibly sensitive) data on which they are trained. To address this problem, we present the first differentially private approach for off-policy evaluation. We provide a theoretical analysis of the privacy-preserving properties of our algorithm and analyze its utility (speed of convergence). After describing some results of this theoretical analysis, we show empirically that our method outperforms previous methods (which are restricted to the on-policy setting).
Deep Learning for Inverse Problems: Bounds and Regularizers
Jaweria Amjad, Zhaoyan Lyu, Miguel R. D. Rodrigues
Date Published: 31 Jan, 2019
Categories at arxiv.org: cs.LG, stat.ML
Inverse problems arise in a number of domains such as medical imaging, remote sensing, and many more, relying on the use of advanced signal and image processing approaches -- such as sparsity-driven techniques -- to determine their solution. This paper instead studies the use of deep learning approaches to approximate the solution of inverse problems. In particular, the paper provides a new generalization bound, depending on key quantity associated with a deep neural network -- its Jacobian matrix -- that also leads to a number of computationally efficient regularization strategies applicable to inverse problems. The paper also tests the proposed regularization strategies in a number of inverse problems including image super-resolution ones. Our numerical results conducted on various datasets show that both fully connected and convolutional neural networks regularized using the regularization or proxy regularization strategies originating from our theory exhibit much better performance than deep networks regularized with standard approaches such as weight-decay.
catch22: CAnonical Time-series CHaracteristics
Carl H Lubba, Sarab S Sethi, Philip Knaute, Simon R Schultz, Ben D Fulcher, Nick S Jones
Date Published: 30 Jan, 2019(Originally Published: 29 Jan, 2019)
Categories at arxiv.org: cs.IR, cs.LG, stat.ML
Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a generically useful set of 22 CAnonical Time-series CHaracteristics, catch22. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
Detection of Alzheimers Disease from MRI using Convolutional Neural Networks, Exploring Transfer Learning And BellCNN
GuruRaj Awate
Date Published: 29 Jan, 2019
Categories at arxiv.org: eess.IV, cs.CV, cs.LG
IEEE Conference, Intended for non-technical audiences
There is a need for automatic diagnosis of certain diseases from medical images that could help medical practitioners for further assessment towards treating the illness. Alzheimers disease is a good example of a disease that is often misdiagnosed. Alzheimers disease (Hear after referred to as AD), is caused by atrophy of certain brain regions and by brain cell death and is the leading cause of dementia and memory loss [1]. MRI scans reveal this information but atrophied regions are different for different individuals which makes the diagnosis a bit more trickier and often gets misdiagnosed [1, 13]. We believe that our approach to this particular problem would improve the assessment quality by pre-flagging the images which are more likely to have AD. We propose two solutions to this; one with transfer learning [9] and other by BellCNN [14], a custom made Convolutional Neural Network (Hear after referred to as CNN). Advantages and disadvantages of each approach will also be discussed in their respective sections. The dataset used for this project is provided by Open Access Series of Imaging Studies (Hear after referred to as OASIS) [2, 3, 4], which contains over 400 subjects, 100 of whom have mild to severe dementia. The dataset has labeled these subjects by two standards of diagnosis; MiniMental State Examination (Hear after referred to as MMSE) and Clinical Dementia Rating (Hear after referred to as CDR). These are some of the general tools and concepts which are prerequisites to our solution; CNN [5, 6], Neural Networks [10] (Hear after referred to as NN), Anaconda bundle for python, Regression, Tensorflow [7]. Keywords: Alzheimers Disease, Convolutional Neural Network, BellCNN, Image Recognition, Machine Learning, MRI, OASIS, Tensorflow
Neural eliminators and classifiers
Włodzisław Duch, Rafał Adamczak, Yoichi Hayashi
Date Published: 28 Jan, 2019
Categories at arxiv.org: cs.LG, stat.ML, 62-07, 62G-05, I.2.6
11 pages, 1 fig
Classification may not be reliable for several reasons: noise in the data, insufficient input information, overlapping distributions and sharp definition of classes. Faced with several possibilities neural network may in such cases still be useful if instead of a classification elimination of improbable classes is done. Eliminators may be constructed using classifiers assigning new cases to a pool of several classes instead of just one winning class. Elimination may be done with the help of several classifiers using modified error functions. A real life medical application of neural network is presented illustrating the usefulness of elimination.
We propose a novel automatic method for accurate segmentation of the prostate in T2-weighted magnetic resonance imaging (MRI). Our method is based on convolutional neural networks (CNNs). Because of the large variability in the shape, size, and appearance of the prostate and the scarcity of annotated training data, we suggest training two separate CNNs. A global CNN will determine a prostate bounding box, which is then resampled and sent to a local CNN for accurate delineation of the prostate boundary. This way, the local CNN can effectively learn to segment the fine details that distinguish the prostate from the surrounding tissue using the small amount of available training data. To fully exploit the training data, we synthesize additional data by deforming the training images and segmentations using a learned shape model. We apply the proposed method on the PROMISE12 challenge dataset and achieve state of the art results. Our proposed method generates accurate, smooth, and artifact-free segmentations. On the test images, we achieve an average Dice score of 90.6 with a small standard deviation of 2.2, which is superior to all previous methods. Our two-step segmentation approach and data augmentation strategy may be highly effective in segmentation of other organs from small amounts of annotated medical images.
Deep learning has been successfully applied to a variety of image classification tasks. There has been keen interest to apply deep learning in the medical domain, particularly specialties that heavily utilize imaging, such as ophthalmology. One issue that may hinder application of deep learning to the medical domain is the vast amount of data necessary to train deep neural networks (DNNs). Because of regulatory and privacy issues associated with medicine, and the generally proprietary nature of data in medical domains, obtaining large datasets to train DNNs is a challenge, particularly in the ophthalmology domain. Transfer learning is a technique developed to address the issue of applying DNNs for domains with limited data. Prior reports on transfer learning have examined custom networks to fully train or used a particular DNN for transfer learning. However, to the best of my knowledge, no work has systematically examined a suite of DNNs for transfer learning for classification of diabetic retinopathy, diabetic macular edema, and two key features of age-related macular degeneration. This work attempts to investigate transfer learning for classification of these ophthalmic conditions. Part I gives a condensed overview of neural networks and the DNNs under evaluation. Part II gives the reader the necessary background concerning diabetic retinopathy and prior work on classification using retinal fundus photographs. The methodology and results of transfer learning for diabetic retinopathy classification are presented, showing that transfer learning towards this domain is feasible, with promising accuracy. Part III gives an overview of diabetic macular edema, choroidal neovascularization and drusen (features associated with age-related macular degeneration), and presents results for transfer learning evaluation using optical coherence tomography to classify these entities.
The classification of time series data is a well-studied problem with numerous practical applications, such as medical diagnosis and speech recognition. A popular and effective approach is to classify new time series in the same way as their nearest neighbours, whereby proximity is defined using Dynamic Time Warping (DTW) distance, a measure analogous to sequence alignment in bioinformatics. However, practitioners are not only interested in accurate classification, they are also interested in why a time series is classified a certain way. To this end, we introduce here the problem of finding a minimum length subsequence of a time series, the removal of which changes the outcome of the classification under the nearest neighbour algorithm with DTW distance. Informally, such a subsequence is expected to be relevant for the classification and can be helpful for practitioners in interpreting the outcome. We describe a simple but optimized implementation for detecting these subsequences and define an accompanying measure to quantify the relevance of every time point in the time series for the classification. In tests on electrocardiogram data we show that the algorithm allows discovery of important subsequences and can be helpful in detecting abnormalities in cardiac rhythms distinguishing sick from healthy patients.
Optimal Nonparametric Inference under Quantization
Ruiqi Liu, Ganggang Xu, Zuofeng Shang
Date Published: 25 Jan, 2019(Originally Published: 24 Jan, 2019)
Categories at arxiv.org: math.ST, cs.LG, stat.ML, stat.TH
Statistical inference based on lossy or incomplete samples is of fundamental importance in research areas such as signal/image processing, medical image storage, remote sensing, signal transmission. In this paper, we propose a nonparametric testing procedure based on quantized samples. In contrast to the classic nonparametric approach, our method lives on a coarse grid of sample information and are simple-to-use. Under mild technical conditions, we establish the asymptotic properties of the proposed procedures including asymptotic null distribution of the quantization test statistic as well as its minimax power optimality. Concrete quantizers are constructed for achieving the minimax optimality in practical use. Simulation results and a real data analysis are provided to demonstrate the validity and effectiveness of the proposed test. Our work bridges the classical nonparametric inference to modern lossy data setting.
Learning Interpretable Models with Causal Guarantees
Carolyn Kim, Osbert Bastani
Date Published: 24 Jan, 2019
Categories at arxiv.org: cs.LG, stat.ML
Machine learning has shown much promise in helping improve the quality of medical, legal, and economic decision-making. In these applications, machine learning models must satisfy two important criteria: (i) they must be causal, since the goal is typically to predict individual treatment effects, and (ii) they must be interpretable, so that human decision makers can validate and trust the model predictions. There has recently been much progress along each direction independently, yet the state-of-the-art approaches are fundamentally incompatible. We propose a framework for learning causal interpretable models---from observational data---that can be used to predict individual treatment effects. Our framework can be used with any algorithm for learning interpretable models. Furthermore, we prove an error bound on the treatment effects predicted by our model. Finally, in an experiment on real-world data, we show that the models trained using our framework significantly outperform a number of baselines.
Online Adaptive Image Reconstruction (OnAIR) Using Dictionary Models
Brian E. Moore, Saiprasad Ravishankar, Raj Rao Nadakuditi, Jeffrey A. Fessler
Date Published: 24 Jan, 2019(Originally Published: 6 Sep, 2018)
Categories at arxiv.org: stat.ML, cs.LG
Submitted to IEEE Transactions on Computational Imaging
Sparsity and low-rank models have been popular for reconstructing images and videos from limited or corrupted measurements. Dictionary or transform learning methods are useful in applications such as denoising, inpainting, and medical image reconstruction. This paper proposes a framework for online (or time-sequential) adaptive reconstruction of dynamic image sequences from linear (typically undersampled) measurements. We model the spatiotemporal patches of the underlying dynamic image sequence as sparse in a dictionary, and we simultaneously estimate the dictionary and the images sequentially from streaming measurements. Multiple constraints on the adapted dictionary are also considered such as a unitary matrix, or low-rank dictionary atoms that provide additional efficiency or robustness. The proposed online algorithms are memory efficient and involve simple updates of the dictionary atoms, sparse coefficients, and images. Numerical experiments demonstrate the usefulness of the proposed methods in inverse problems such as video reconstruction or inpainting from noisy, subsampled pixels, and dynamic magnetic resonance image reconstruction from very limited measurements.
To improve the performance of Intensive Care Units (ICUs), the field of bio-statistics has developed scores which try to predict the likelihood of negative outcomes. These help evaluate the effectiveness of treatments and clinical practice, and also help to identify patients with unexpected outcomes. However, they have been shown by several studies to offer sub-optimal performance. Alternatively, Deep Learning offers state of the art capabilities in certain prediction tasks and research suggests deep neural networks are able to outperform traditional techniques. Nevertheless, a main impediment for the adoption of Deep Learning in healthcare is its reduced interpretability, for in this field it is crucial to gain insight on the why of predictions, to assure that models are actually learning relevant features instead of spurious correlations. To address this, we propose a deep multi-scale convolutional architecture trained on the Medical Information Mart for Intensive Care III (MIMIC-III) for mortality prediction, and the use of concepts from coalitional game theory to construct visual explanations aimed to show how important these inputs are deemed by the network. Our results show our model attains state of the art performance while remaining interpretable. Supporting code can be found at https://github.com/williamcaicedo/ISeeU.
A Question-Entailment Approach to Question Answering
Asma Ben Abacha, Dina Demner-Fushman
Date Published: 23 Jan, 2019
Categories at arxiv.org: cs.CL, cs.AI, cs.IR, cs.LG, 68T50
One of the challenges in large-scale information retrieval (IR) is to develop fine-grained and domain-specific methods to answer natural language questions. Despite the availability of numerous sources and datasets for answer retrieval, Question Answering (QA) remains a challenging problem due to the difficulty of the question understanding and answer extraction tasks. One of the promising tracks investigated in QA is to map new questions to formerly answered questions that are `similar'. In this paper, we propose a novel QA approach based on Recognizing Question Entailment (RQE) and we describe the QA system and resources that we built and evaluated on real medical questions. First, we compare machine learning and deep learning methods for RQE using different kinds of datasets, including textual inference, question similarity and entailment in both the open and clinical domains. Second, we combine IR models with the best RQE method to select entailed questions and rank the retrieved answers. To study the end-to-end QA approach, we built the MedQuAD collection of 47,457 question-answer pairs from trusted medical sources, that we introduce and share in the scope of this paper. Following the evaluation process used in TREC 2017 LiveQA, we find that our approach exceeds the best results of the medical task with a 29.8% increase over the best official score. The evaluation results also support the relevance of question entailment for QA and highlight the effectiveness of combining IR and RQE for future QA efforts. Our findings also show that relying on a restricted set of reliable answer sources can bring a substantial improvement in medical QA.
This paper presents a new method for medical diagnosis of neurodegenerative diseases, such as Parkinson's, by extracting and using latent information from trained Deep convolutional, or convolutional-recurrent Neural Networks (DNNs). In particular, our approach adopts a combination of transfer learning, k-means clustering and k-Nearest Neighbour classification of deep neural network learned representations to provide enriched prediction of the disease based on MRI and/or DaT Scan data. A new loss function is introduced and used in the training of the DNNs, so as to perform adaptation of the generated learned representations between data from different medical environments. Results are presented using a recently published database of Parkinson's related information, which was generated and evaluated in a hospital environment.
Chest radiography is an extremely powerful imaging modality, allowing for a detailed inspection of a patient's thorax, but requiring specialized training for proper interpretation. With the advent of high performance general purpose computer vision algorithms, the accurate automated analysis of chest radiographs is becoming increasingly of interest to researchers. However, a key challenge in the development of these techniques is the lack of sufficient data. Here we describe MIMIC-CXR, a large dataset of 371,920 chest x-rays associated with 227,943 imaging studies sourced from the Beth Israel Deaconess Medical Center between 2011 - 2016. Each imaging study can pertain to one or more images, but most often are associated with two images: a frontal view and a lateral view. Images are provided with 14 labels derived from a natural language processing tool applied to the corresponding free-text radiology reports. All images have been de-identified to protect patient privacy. The dataset is made freely available to facilitate and encourage a wide range of research in medical computer vision.
Aggregated Pairwise Classification of Statistical Shapes
Min Ho Cho, Sebastian Kurtek, Steven N. MacEachern
Date Published: 22 Jan, 2019
Categories at arxiv.org: stat.ML, cs.CV, cs.LG
The classification of shapes is of great interest in diverse areas ranging from medical imaging to computer vision and beyond. While many statistical frameworks have been developed for the classification problem, most are strongly tied to early formulations of the problem - with an object to be classified described as a vector in a relatively low-dimensional Euclidean space. Statistical shape data have two main properties that suggest a need for a novel approach: (i) shapes are inherently infinite dimensional with strong dependence among the positions of nearby points, and (ii) shape space is not Euclidean, but is fundamentally curved. To accommodate these features of the data, we work with the square-root velocity function of the curves to provide a useful formal description of the shape, pass to tangent spaces of the manifold of shapes at different projection points which effectively separate shapes for pairwise classification in the training data, and use principal components within these tangent spaces to reduce dimensionality. We illustrate the impact of the projection point and choice of subspace on the misclassification rate with a novel method of combining pairwise classifiers.
Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models. The dataset is freely available at https://stanfordmlgroup.github.io/competitions/chexpert .
Label shift refers to the phenomenon where the marginal probability p(y) of observing a particular class changes between the training and test distributions while the conditional probability p(x|y) stays fixed. This is relevant in settings such as medical diagnosis, where a classifier trained to predict disease based on observed symptoms may need to be adapted to a different distribution where the baseline frequency of the disease is higher. Given calibrated estimates of p(y|x), one can apply an EM algorithm to correct for the shift in class imbalance between the training and test distributions without ever needing to calculate p(x|y). Unfortunately, modern neural networks typically fail to produce well-calibrated probabilities, compromising the effectiveness of this approach. Although Temperature Scaling can greatly reduce miscalibration in these networks, it can leave behind a systematic bias in the probabilities that still poses a problem. To address this, we extend Temperature Scaling with class-specific bias parameters, which largely eliminates systematic bias in the calibrated probabilities and allows for effective domain adaptation under label shift. We term our calibration approach "Bias-Corrected Temperature Scaling". On experiments with CIFAR10, we find that EM with Bias-Corrected Temperature Scaling significantly outperforms both EM with Temperature Scaling and the recently-proposed Black-Box Shift Estimation.
Results complete.