DDDAS.org

Dynamic Data-Driven Application Systems

Titles and Abstracts for DDCS 2019

All papers are in Computational Science – ICCS 2019: 19th International Conference, Faro, Portugal, June 12–14, 2019, Proceedings, Part IV, João M. F. Rodrigues, Pedro J. S. Cardoso, Jânio Monteiro, Roberto Lam, Valeria V. Krzhizhanovskaya, Michael H. Lees, Jack J. Dongarra, and Peter M. A. Sloot (eds.), Lecture Notes in Computer Science, volume 11539, Springer, Cham, Switzerland, 2019.

Anne D. Brooks and Robert A. Lodder, Nonparametric Approach to Weak Signal Detection in the Search for Extraterrestrial Intelligence (SETI), pp. 3–15.

Abstract

Intelligent extraterrestrial civilizations might be easier to find if they mark their position with a bright laser beacon. Given the possible distances involved, however, weak signal detection techniques would likely still be required to identify even the brightest SETI beacon. The Bootstrap Error-adjusted Single-sample Technique (BEST) is such a detection method. BEST has been shown to outperform the more traditional Mahalanobis metric in the analysis of SETI data from a Project Argus near-infrared telescope. The BEST algorithm identifies unusual signals and returns a distance in asymmetric nonparametric multidimensional central 68% confidence intervals (equivalent to standard deviations for normally distributed 1-D data, or Mahalanobis distance units for normally distributed data of d dimensions). Calculation of the Mahalanobis metric requires matrix factorization and is of order d³. Furthermore, the accuracy and precision of the BEST metric are greater than those of the Mahalanobis metric in realistic data collection scenarios (many more wavelengths available than observations at those wavelengths). An extension of BEST that examines multiple samples (subclusters of data) simultaneously is explored in this paper.
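
For readers unfamiliar with the method, the following is a minimal Python sketch of a BEST-style bootstrap distance, written from the description above; the projection step, the one-sided 68% quantile, and all parameter names are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def best_distance(X, x, n_boot=2000, seed=0):
        # Sketch only: measures how far test spectrum x lies from the
        # training cluster X, in nonparametric "SD-equivalent" units
        # derived from bootstrap replicates of the cluster center.
        rng = np.random.default_rng(seed)
        n, _ = X.shape
        centers = np.array([X[rng.integers(0, n, n)].mean(axis=0)
                            for _ in range(n_boot)])
        center = centers.mean(axis=0)
        direction = x - center
        dist = np.linalg.norm(direction)
        if dist == 0.0:
            return 0.0
        direction /= dist
        # Project replicate centers onto the center-to-x direction and take
        # the one-sided central-68% point as one "standard deviation"
        # (asymmetric intervals: only the side facing x is used).
        proj = (centers - center) @ direction
        side = proj[proj > 0]
        sigma = np.quantile(side, 0.68) if side.size else 1e-12
        return dist / sigma

Because no covariance matrix is factorized, a sketch like this stays usable even when the number of wavelengths far exceeds the number of observations, the regime the abstract highlights.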

Keywords

Parallel algorithm, Bootstrap, Supernova, Gamma ray burst, Solar transit

Junteng Hou, Shupeng Wang, Guangjun Wu, Ge Fu, Siyu Jia, Yong Wang, and Binbin Li, Parallel Strongly Connected Components Detection with Multi-partition on GPUs, pp. 16–30.

Abstract

Graph computing is often used to analyze complex relationships in the interconnected world, and strongly connected component (SCC) detection in digraphs is a basic problem in graph computing. As graph sizes increase, many GPU-based parallel algorithms have been proposed to detect SCCs. State-of-the-art parallel SCC detection algorithms accelerate well on various graphs, but there is still room for improvement: (1) multiple traversals are time-consuming when processing real-world graphs; (2) pivot selection is either inaccurate or time-consuming. We propose an SCC detection method with multi-partition that optimizes the algorithmic process and achieves high performance. Unlike existing parallel algorithms, we select a pivot and traverse forward from it, then select a vice pivot and traverse backward from both the pivot and the vice pivot simultaneously. After updating the state of each vertex, we obtain multiple partitions in which SCCs can be detected in parallel. At different phases of our approach, we use either a vertex with the largest degree product or a random vertex as the pivot to balance selection accuracy and efficiency. We also implement weakly connected component (WCC) detection and 2-SCC to optimize our algorithm, and the vertices marked by the WCC partition are selected as pivots to reduce unnecessary operations. We conducted experiments on an NVIDIA K80 with real-world and synthetic graphs. The results show that the proposed algorithm achieves average detection accelerations of 8.8× and 21× when compared with well-known algorithms such as Tarjan's algorithm and Barnat's algorithm.
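
As context for the approach, a compact Python sketch of the classic forward-backward (FB) partitioning that GPU SCC detectors build on is given below; the vice-pivot traversal, WCC pre-partitioning, and 2-SCC optimizations described above are not reproduced, and the graph representation (dicts of successor and predecessor lists) is an assumption for illustration.

    from collections import deque

    def fb_scc(vertices, succ, pred):
        # Classic forward-backward SCC partitioning (sequential sketch).
        # succ[v] / pred[v]: out- and in-neighbors of vertex v.
        def reach(start, nbrs, allowed):
            seen, queue = {start}, deque([start])
            while queue:
                v = queue.popleft()
                for w in nbrs[v]:
                    if w in allowed and w not in seen:
                        seen.add(w)
                        queue.append(w)
            return seen

        sccs, stack = [], [set(vertices)]
        while stack:
            part = stack.pop()
            if not part:
                continue
            pivot = next(iter(part))        # GPU variants choose pivots by degree
            F = reach(pivot, succ, part)    # forward reachable set
            B = reach(pivot, pred, part)    # backward reachable set
            sccs.append(F & B)              # the pivot's SCC
            # The three remaining partitions are independent and can be
            # processed in parallel: the basis of multi-partition schemes.
            stack.extend([F - B, B - F, part - (F | B)])
        return sccs

Every SCC lies entirely inside one of the three leftover partitions, which is why they can be handed to independent workers, the property the paper's multi-partition scheme exploits.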

Keywords

Strongly connected components detection, GPU, Multi-partition scheme, Real-world graphs

Michel Pires, Nicollas Silva, Leonardo Rocha, Wagner Meira, and Renato Ferreira, Efficient Parallel Associative Classification Based on Rules Memoization, pp. 31–44.

Abstract

Associative classification refers to a class of algorithms that are highly effective in classification problems. Data in this domain are multidimensional, with data instances represented as points in a fixed-length attribute space, and are drawn from two large sets: training and testing datasets. Models, known as classifiers, are mined from the training set as class association rules and are used in eager and lazy strategies for labeling test data instances. Because test data instances are independent and are evaluated by sophisticated and costly computations, significant overlap among similar data instances may be introduced. To overcome this drawback, we propose a parallel and high-performance associative classification based on a lazy strategy, in which partial computations of similar data instances are cached and shared efficiently. To this end, a PageRank-driven similarity metric is introduced to reorder computations by affinity, improving the memoization of frequently demanded association rules in typical cache strategies. The experimental results show that our similarity-based metric maximizes the reuse of cached rules and, consequently, improves application performance, with gains of up to 60% in execution time and a 40% higher cache hit rate, mainly under limited cache space conditions.
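
A minimal Python sketch of the caching idea follows; the LRU policy, the sort-based reordering (a crude stand-in for the paper's PageRank-driven similarity metric), and the mine_rules callable are illustrative assumptions rather than the authors' system.

    from collections import OrderedDict

    class RuleCache:
        # Bounded LRU cache of mined class-association rules, keyed by the
        # attribute set a test instance projects onto.
        def __init__(self, capacity):
            self.capacity = capacity
            self.entries = OrderedDict()

        def get(self, key):
            if key not in self.entries:
                return None
            self.entries.move_to_end(key)
            return self.entries[key]

        def put(self, key, rules):
            self.entries[key] = rules
            self.entries.move_to_end(key)
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)     # evict LRU entry

    def classify(test_instances, mine_rules, capacity=1024):
        # Evaluating similar instances consecutively raises the hit rate of
        # a bounded cache; here instances are simply sorted by attribute set.
        cache, labels = RuleCache(capacity), []
        for inst in sorted(test_instances, key=lambda i: tuple(sorted(i))):
            key = frozenset(inst)
            rules = cache.get(key)
            if rules is None:
                rules = mine_rules(inst)             # costly lazy rule mining
                cache.put(key, rules)
            labels.append(max(rules, key=rules.get)) # best-scoring class
        return labels

Here mine_rules is a hypothetical callable returning a dict of class-to-score, standing in for the lazy rule-induction step the abstract describes as the expensive shared computation.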

Keywords

Parallel associative classification, Memoization, Class association rules

Sreelekha Guggilam, Syed Mohammed Arshad Zaidi, Varun Chandola, and Abani K. Patra, Integrated Clustering and Anomaly Detection (INCAD) for Streaming Data, pp. 45–59.

Abstract

Most current clustering-based anomaly detection methods use a scoring schema and thresholds to classify anomalies. These methods are often tailored to target specific datasets with a "known" number of clusters. This paper provides a streaming clustering and anomaly detection algorithm that requires neither strict arbitrary thresholds on the anomaly scores nor knowledge of the number of clusters, while performing probabilistic anomaly detection and clustering simultaneously. This ensures that cluster formation is not impacted by the presence of anomalous data, leading to a more reliable definition of "normal vs. abnormal" behavior. The motivations behind developing the INCAD model [17] and the path that leads to the streaming model are discussed.
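
To make the nonparametric idea concrete, here is a toy Python sketch of streaming, Chinese-restaurant-process-style clustering in which a point may open a new cluster rather than being forced into an existing one, with persistently tiny clusters flagged as anomalous; INCAD's extreme-value-theory component and its exact generative model are not reproduced, and all parameters are illustrative.

    import numpy as np

    def stream_cluster(points, alpha=1.0, sigma=1.0, min_size=5):
        # Toy CRP-style streaming assignment (MAP choice for brevity).
        # clusters: list of (count, running mean) per cluster.
        clusters, labels = [], []
        for x in points:
            x = np.asarray(x, dtype=float)
            weights = [n * np.exp(-np.sum((x - mu) ** 2) / (2 * sigma ** 2))
                       for n, mu in clusters]
            weights.append(alpha)          # weight of opening a new cluster
            k = int(np.argmax(weights))
            if k == len(clusters):
                clusters.append((1, x.copy()))
            else:
                n, mu = clusters[k]
                clusters[k] = (n + 1, mu + (x - mu) / (n + 1))
            labels.append(k)
        # Points stuck in tiny clusters are flagged as anomalous; no fixed
        # score threshold or preset number of clusters is required.
        sizes = np.bincount(labels)
        flags = [sizes[k] < min_size for k in labels]
        return labels, flags

Because the concentration parameter alpha lets new clusters form on demand, anomalous points collect in their own small clusters instead of distorting the centers of normal ones, which is the behavior the abstract emphasizes.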

Keywords

Anomaly detection, Bayesian non-parametric models, Extreme value theory, Clustering-based anomaly detection

Xiukun Hu and Craig C. Douglas, An Implementation of a Coupled Dual-Porosity-Stokes Model with FEniCS, pp. 60–73.

Abstract

Porous media and conduit coupled systems are heavily used in a variety of areas. A coupled dual-porosity-Stokes model has been proposed to simulate fluid flow in a coupled system of dual-porosity media and conduits. In this paper, we propose an implementation of this multi-physics model. We solve the system with FEniCS, an automated, high-performance environment for solving differential equations. Tests of the convergence rate of our implementation in both 2D and 3D are conducted, and we also test the performance and scalability of the implementation.
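
As a flavor of the implementation environment, the following is a minimal legacy-FEniCS (dolfin) script solving a plain Stokes channel problem with Taylor-Hood elements; the actual dual-porosity-Stokes coupling, interface conditions, and 3D meshes in the paper are substantially more involved, and the boundary data here are illustrative.

    from fenics import *

    # Taylor-Hood (P2-P1) Stokes flow in a unit-square channel.
    mesh = UnitSquareMesh(32, 32)
    P2 = VectorElement("P", mesh.ufl_cell(), 2)     # velocity
    P1 = FiniteElement("P", mesh.ufl_cell(), 1)     # pressure
    W = FunctionSpace(mesh, MixedElement([P2, P1]))

    (u, p) = TrialFunctions(W)
    (v, q) = TestFunctions(W)
    f = Constant((0.0, 0.0))
    a = inner(grad(u), grad(v))*dx - div(v)*p*dx - div(u)*q*dx
    L = dot(f, v)*dx

    # Parabolic inflow on the left, no-slip walls, natural outflow right.
    inflow = Expression(("4.0*x[1]*(1.0 - x[1])", "0.0"), degree=2)
    bcs = [DirichletBC(W.sub(0), inflow, "near(x[0], 0.0)"),
           DirichletBC(W.sub(0), Constant((0.0, 0.0)),
                       "near(x[1], 0.0) || near(x[1], 1.0)")]

    w = Function(W)
    solve(a == L, w, bcs)
    u_h, p_h = w.split()

The mixed function space and weak form are stated almost verbatim in FEniCS, which is what makes it attractive for a multi-physics model like the one in the paper.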

Keywords

Domain decomposition, Finite element method, Multi-physics, Parallel computing, FEniCS

Shamoz Shah and Madhu Goyal, Anomaly Detection in Social Media Using Recurrent Neural Network, pp. 74–83.

Abstract

In today’s information environment there is an increasing reliance on online and social media for the acquisition, dissemination, and consumption of news. In particular, social media platforms such as Facebook and Twitter are increasingly used as cutting-edge media for breaking news. On the other hand, the low cost, easy access, and rapid propagation of news through social media make these platforms more susceptible to fake and anomalous reporting. The propagation of fake and anomalous news is not a benign exercise: the extensive spread of fake news has the potential to do serious and real damage to individuals and society. As a result, the detection of fake news in social media has become a vibrant and important field of research. In this paper, a novel application of machine learning approaches to the detection and classification of fake and anomalous data is considered. An initial clustering step with the K-Nearest Neighbor (KNN) algorithm is proposed before training the result with a Recurrent Neural Network (RNN). A preliminary application of the KNN phase before the RNN phase produces a quantitative and measurable improvement in the detection of outliers, and as such is more effective in detecting anomalies against a test dataset of 2016 US Presidential Election predictions.
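
One plausible reading of the two-stage pipeline is sketched below in Python: a KNN pass scores each sample by its mean distance to its k nearest neighbors, and that score is fed alongside the token sequence into a small GRU classifier. The feature choices, the architecture, and the use of PyTorch and scikit-learn are assumptions for illustration, not the authors' exact pipeline.

    import torch
    import torch.nn as nn
    from sklearn.neighbors import NearestNeighbors

    def knn_outlier_scores(X, k=5):
        # Mean distance to the k nearest neighbors (self excluded); larger
        # values suggest a sample sits away from any dense cluster.
        nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
        dists, _ = nbrs.kneighbors(X)
        return dists[:, 1:].mean(axis=1)

    class GRUClassifier(nn.Module):
        # Token sequence -> final GRU state, concatenated with the KNN score.
        def __init__(self, vocab_size, embed_dim=64, hidden=64, classes=2):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden + 1, classes)

        def forward(self, tokens, knn_score):
            _, h = self.rnn(self.emb(tokens))
            features = torch.cat([h[-1], knn_score.unsqueeze(1)], dim=1)
            return self.out(features)

The design intuition matches the abstract: the KNN stage supplies an explicit outlier signal that the RNN alone would have to learn from the sequences.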

Keywords

Clustering, Recurrent neural networks, Twitter, Presidential Election

Xing Wu, Shangwen Lv, Liangjun Zang, Jizhong Han, and Songlin Hu, Conditional BERT Contextual Augmentation, pp. 84–95.

Abstract

Data augmentation methods are often applied to prevent overfitting and improve the generalization of deep neural network models. The recently proposed contextual augmentation augments labeled sentences by randomly replacing words with more varied substitutions predicted by a language model. Bidirectional Encoder Representations from Transformers (BERT) demonstrates that a deep bidirectional language model is more powerful than either a unidirectional language model or the shallow concatenation of a forward and a backward model. We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. We retrofit BERT to conditional BERT by introducing a new conditional masked language model task. (The term "conditional masked language model" appeared once in the original BERT paper, where it indicates context-conditioning and is equivalent to the term "masked language model"; in our paper, "conditional masked language model" means that we apply an extra label-conditional constraint to the masked language model.) The well-trained conditional BERT can then be applied to enhance contextual augmentation. Experiments on six text classification tasks show that our method can be easily applied to both convolutional and recurrent neural network classifiers to obtain improvements.
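
A hedged sketch of the label-conditioning idea, using the Hugging Face transformers API: the segment (token_type) embedding slot is repurposed to carry the class label, the masked-LM head would be fine-tuned under that convention (fine-tuning not shown), and augmentation then masks a word and picks a label-compatible replacement. Off-the-shelf bert-base supports only labels {0, 1} this way unless the token-type embedding table is resized; all of this is a reading of the abstract, not the authors' released code.

    import torch
    from transformers import BertForMaskedLM, BertTokenizer

    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    mlm = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

    def augment(sentence, label, mask_pos):
        # Mask one token and predict a replacement while feeding the class
        # label through the token_type_ids channel (the conditioning trick).
        enc = tok(sentence, return_tensors="pt")
        ids = enc["input_ids"].clone()
        ids[0, mask_pos] = tok.mask_token_id
        type_ids = torch.full_like(ids, label)
        with torch.no_grad():
            logits = mlm(input_ids=ids, token_type_ids=type_ids).logits
        ids[0, mask_pos] = int(logits[0, mask_pos].argmax())
        return tok.decode(ids[0], skip_special_tokens=True)

    # e.g. augment("the movie was great fun", label=1, mask_pos=4)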

Ricardo Martins, Alberto Azevedo, André B. Fortunato, Elsa Alves, Anabela Oliveira, and Alexandra Carvalho, An Innovative and Reliable Water Leak Detection Service Supported by Data-Intensive Remote Sensing Processing, pp. 96–108.

Abstract

The WADI project (Water-tightness Airborne Detection Implementation), integrated within the H2020 initiative, is developing an airborne water leak detection surveillance service based on manned and unmanned aerial vehicles. This service aims to provide water utilities with adequate information on leaks in large water distribution infrastructures outside urban areas. Given the high cost associated with repairs to water infrastructure networks, a reliability layer based on complementary leak detection technologies is necessary to improve the trustworthiness of the WADI leak identification procedure. Herein, a methodology based on the combined use of Sentinel remote sensing data and a water leak pathways model, supported by data-intensive computing, is presented. The resulting water leak detection reliability service, provided to users through a web interface, targets prompt and cost-effective infrastructure repairs with the required degree of confidence in the detected leaks. The web platform allows for both data analysis and visualization of Sentinel images and relevant leak indicators at sites selected by the user. Users can also provide aerial imagery inputs, to be processed together with Sentinel remote sensing data at the satellite acquisition dates identified by the user. The platform provides information about the location and time evolution of detected leaks, and will be linked in the future with the outputs of water pathway models.
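
As an example of the kind of leak indicator such a platform can compute from Sentinel-2 data, here is a small Python sketch of the standard Normalized Difference Water Index (NDWI) over two band arrays; treating anomalously wet pixels near a pipeline as a leak signal is an illustrative choice, not the WADI reliability algorithm.

    import numpy as np

    def ndwi(green, nir):
        # McFeeters NDWI from Sentinel-2 bands B3 (green) and B8 (NIR);
        # values near +1 indicate open water or very wet surfaces.
        g = green.astype(np.float64)
        n = nir.astype(np.float64)
        return (g - n) / (g + n + 1e-9)   # epsilon avoids divide-by-zero

    def wet_pixel_mask(green, nir, threshold=0.2):
        # Candidate leak pixels: unusually wet ground along the pipeline.
        return ndwi(green, nir) > threshold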

Keywords

Remote sensing, Water leak service, Data-intensive computing, HPC

This website has been established and is maintained by Prof. Craig C. Douglas.
©2014-2019 by www.dddas.org and Craig C. Douglas