Future predictions of Coronavirus cases using ARIMA model

This post aims to track the spread of COVID-19, also known as 2019 Novel Coronavirus. It is a new respiratory virus first identified in Wuhan in December 2019. According to the Centers for Disease Control and Prevention (2020), the virus probably emerged from an animal source, but the many cases now reported indicate that person-to-person spread is occurring. At this time, how easily or sustainably this …

How to learn R

There is an incredibly large number of resources available for learning R: the R manuals, step-by-step books, YouTube videos, R blogs, and so on. Here you will find a list of only some of these resources. Where to get R (software): you can download R from https://cran.r-project.org/. RStudio: an R editor with additional features, which provides …

Determining the Number of Clusters

By Adán José-García. Discovering the number of clusters (k) in a dataset is a fundamental problem in data clustering, or cluster analysis. Clustering is an unsupervised learning technique that aims to discover the natural partition of data objects into clusters. Clustering algorithms can be broadly divided into two groups: hierarchical and partitional. Algorithms from both categories, e.g., single-link (hierarchical) and k-means (partitional), require as input the …

SDG Indicator Filtering Function

Handling missing values is a common issue in the data preparation step before analysis. In R, missing values are represented by NA, and R provides many functions for dealing with them. Since we would like to cluster the SDG indicators later, it is highly recommended to construct a filtering function to guarantee there are no NA values in filtered data …

Chapter 3 – DBSCAN: Density-Based Clustering

As mentioned before, agglomerative algorithms are slow on large data sets, Kmeans cannot be applied to non-convex data sets, and neither is able to detect and remove outliers. Thus density-based clustering, or DBSCAN, was proposed to meet requirements such as identifying and removing noise, handling data sets of arbitrary shape, and processing large data sets more efficiently …

Chapter 2 – Kmeans Clustering

Partitioning algorithms are among the most widely used and deeply studied clustering algorithms. They aim to partition the dataset into several clusters of similar objects while maximizing the between-cluster variation (Dabbura, 2018). Though there are many modified partitioning algorithms, we will focus on the Kmeans algorithm in this blog. Kmeans has been widely applied in data mining, pattern recognition, image compression and many machine learning fields …

Chapter 1 – Agglomerative Hierarchical Clustering

Nowadays, clustering techniques are frequently used in data analysis. Among them, partitioning and hierarchical clustering are the two most deeply studied and widely used methods. Unlike partitioning, hierarchical clustering approaches the problem by constructing a hierarchy of clusters; it is therefore heavily used in bioinformatics, e.g., for phylogenetic trees of animal evolution or virus transmission (Kilitcioglu, 2018). In this blog, the hierarchical clustering method …