The impact of Covid-19 in World’s Economy

By Maria Fernanda Ibarra Gutiérrez

The Coronavirus disease (Covid-19) is a worldwide health problem that, according to the World Health Organization (WHO), has spread to 213 countries. As of 13 April 2020, there were 1,807,308 cases around the world, according to the Our World in Data database (Ritchie, 2020). At present, the United States has the highest number of cases …

mferibg, April 13, 2020 (updated April 14, 2020)

Article, Data Science / Machine Learning, LifeScience

Future predictions of Coronavirus cases using ARIMA model

This post aims to track the spread of COVID-19, also known as the 2019 Novel Coronavirus, a new respiratory virus first identified in Wuhan in December 2019. According to the Centers for Disease Control and Prevention (2020), the virus probably emerged initially from an animal source, but many cases now indicate that person-to-person spread is occurring. At this time, how easily or sustainably this …

dinhtrang24, April 13, 2020 (updated April 17, 2020)

Article, Security, SocialScience Data

Article review: Exploring crime patterns in Mexico City

By Maria Fernanda Ibarra Gutiérrez

Big Data analysis is a research approach of growing importance for studying many aspects of society. We live surrounded by governmental and private systems, technological devices and social media platforms that gather information from our daily activities, choices, purchases, searches, health patterns and other digital touchpoints. Therefore, there is a large amount of data suitable for …

mferibg, February 27, 2020 (updated April 15, 2020)

AlgorithmTips, Article, Data Science / Machine Learning

How to learn R

There is an incredibly large number of resources available for learning R, from the R manuals to step-by-step books, YouTube videos, R blogs and so on. Here you will find a list of only some of these resources.

Where to get R (software): you can download R from https://cran.r-project.org/.

RStudio: an R editor with additional features, which provides …

leospinaf, February 6, 2020 (updated July 24, 2020)

AlgorithmTips, Data Science / Machine Learning, GuestPost

Determining the Number of Clusters

By Adán José-García

Discovering the number of clusters (k) in a dataset is a fundamental problem in data clustering, or cluster analysis. Clustering is an unsupervised learning technique that aims to discover the natural partition of data objects into clusters. Clustering algorithms can be broadly divided into two groups: hierarchical and partitional. Algorithms in both categories, e.g., single-link (hierarchical) and k-means (partitional), require as input the …

leospinaf, January 20, 2020
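The excerpt does not say which criterion the post uses to choose k. One common option is the mean silhouette width. The sketch below computes it in plain Python rather than the R used elsewhere on this blog; `silhouette_score` and the toy points are hypothetical. To use it, run a clustering algorithm for several candidate values of k and keep the k whose partition scores highest.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_score(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of another cluster.
    """
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


if __name__ == "__main__":
    # Toy data: two well-separated groups of four points each.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    candidates = {
        2: [0, 0, 0, 0, 1, 1, 1, 1],  # one cluster per group
        3: [0, 0, 1, 1, 2, 2, 2, 2],  # first group split in two
    }
    scores = {k: silhouette_score(pts, labs) for k, labs in candidates.items()}
    best_k = max(scores, key=scores.get)
    print(scores, "best k:", best_k)
```

Splitting a compact group into two clusters lowers the silhouette of its points, so the k = 2 partition scores higher here.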

Article, Data posts, Finance, Finance Data

Dataset I: The Atlas of Economic Complexity-Part A

In this blog post, I would like to introduce the Atlas of Economic Complexity, a powerful data visualisation tool developed by the Harvard Growth Lab. Even at first glance, the homepage presents an abundance of information in a compelling way. If we take more time to dive into the data and tweak different settings, the website delivers even more knowledge and …

keithwangjunzhe, November 22, 2019 (updated December 4, 2019)

Article, Data posts, Finance, Finance Data

Dataset I: The Atlas of Economic Complexity-Part B

The first dataset we would like to discuss is the Atlas of Economic Complexity, available at http://atlas.cid.harvard.edu/explore. After exploring the charts and graphs in Part A, we now turn to the datasets underlying them. The data sources for global trade in goods and services are the United Nations Statistics Division and the IMF's Direction of Trade Statistics database …

KEFANJIN, November 22, 2019 (updated December 4, 2019)

Article, Data posts, Finance, Finance Data

Examining Purchasing Power Parity theory by a time regression model

Introduction

According to the Bank for International Settlements (2019), the foreign exchange market (or forex market) is the largest and most liquid financial market in the world, with global daily trading of $6.6 trillion in April 2019. Among leading currencies, the British pound sterling (GBP) ranks fourth among the most widely traded currencies in the world, and the pound has a …

dinhtrang24, November 10, 2019 (updated February 28, 2020)

Data Science / Machine Learning, SocialScience, SocialScience Data

SDG Indicator Filtering Function

Handling missing values is a common task in the data preparation step before analysis. In R, missing values are represented by NA, and R provides many NA-related functions for dealing with them. Since we plan to cluster the SDG indicators later, it is advisable to construct a filtering function that guarantees there are no NA values in the filtered data …

KEFANJIN, July 29, 2019 (updated October 24, 2019)
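A filter of this kind can be sketched as below. It is written in Python for illustration rather than in the R used in the post, and it assumes (hypothetically) that each indicator is stored as a list of values, with None or NaN playing the role of R's NA. Only indicators with no missing values are kept. The function names and the `indicator_a`-style keys are made up for the example.

```python
import math
from typing import Dict, List, Optional

# Hypothetical layout: indicator name -> one value per country, where
# None or float("nan") marks a missing observation (R's NA).
Indicators = Dict[str, List[Optional[float]]]


def is_missing(value: Optional[float]) -> bool:
    """True for None or NaN, the closest Python analogues of R's NA."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_complete_indicators(data: Indicators) -> Indicators:
    """Keep only the indicators that contain no missing values."""
    return {
        name: values
        for name, values in data.items()
        if not any(is_missing(v) for v in values)
    }


if __name__ == "__main__":
    sdg = {
        "indicator_a": [1.0, 2.5, 3.1],
        "indicator_b": [0.4, None, 0.9],
        "indicator_c": [7.0, float("nan"), 6.2],
    }
    print(sorted(filter_complete_indicators(sdg)))
```

A stricter or looser rule, such as tolerating a small share of missing values, would only change the condition inside the dictionary comprehension.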

Data Science / Machine Learning

Chapter 3 – DBSCAN: Density-Based Clustering

As mentioned before, agglomerative algorithms run slowly on large data sets, k-means cannot be applied to non-convex data sets, and neither can detect and remove outliers. Density-based clustering, in particular DBSCAN, was therefore proposed to meet requirements such as identifying and removing noise, handling data sets of arbitrary shape, and processing large data sets more efficiently …

KEFANJIN, July 28, 2019 (updated October 24, 2019)
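To make the idea concrete, here is a minimal, unoptimised DBSCAN sketch in Python; the post's own code may differ. A point with at least `min_pts` neighbours within radius `eps` (counting itself) is a core point. Clusters grow by chaining core points, and points reachable from no core point are labelled noise (-1). Each neighbourhood query scans every point, so this version is O(n²); practical implementations use a spatial index.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1
UNVISITED = -2


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
                continue
            if labels[j] != UNVISITED:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also core: keep growing
    return labels
```

For example, two tight groups of four points plus one distant point give two clusters, and the distant point is labelled as noise. No cluster count has to be supplied in advance.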

Data Science / Machine Learning

Chapter 2 – Kmeans Clustering

Partitioning algorithms are among the most widely used and deeply studied clustering algorithms. They aim to partition the dataset into clusters of similar objects while maximizing the between-cluster variation (Dabbura, 2018). Although there are many modified partitioning algorithms, we will focus on the k-means algorithm in this blog. K-means has been widely applied in data mining, pattern recognition, image compression and many other machine learning fields …

KEFANJIN, July 12, 2019 (updated October 23, 2019)
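Below is a compact Python sketch of Lloyd's algorithm, the standard k-means iteration. It assigns each point to its nearest centroid, moves each centroid to the mean of its points, and stops when the assignments no longer change. To keep the example deterministic, it uses farthest-point initialisation. That choice is our assumption: random starts or k-means++ are more common and may be what the post uses.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Start from the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100):
    """Return (labels, centroids) found by Lloyd's algorithm."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean(members) if members else centroids[c])
        centroids = new_centroids
    return labels, centroids
```

Each iteration can only lower the within-cluster sum of squared distances, which is why the loop terminates. It may still stop at a local optimum, which is why the choice of starting centroids matters.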

AlgorithmTips, Article

Introduction to ggplot2

We will use bric_data and melted_data1 in this tutorial. bric_data has two variables, Country.Name and gini; melted_data1 has three variables: Country.Name, year, and value.

keithwangjunzhe, July 5, 2019 (updated October 2, 2019)
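melted_data1 is a "melted" (long-format) table, with one row per country and year instead of one column per year. This is the shape ggplot2 expects when year goes on an axis and value is mapped to an aesthetic. In R, this reshaping is typically done with a `melt()` function such as the one in the reshape2 package. The sketch below shows the same transformation in plain Python; the column names and values are placeholders, not the post's data.

```python
from typing import Dict, List, Sequence, Union

Value = Union[str, float]


def melt(
    rows: Sequence[Dict[str, Value]],
    id_col: str,
    var_name: str = "year",
    value_name: str = "value",
) -> List[Dict[str, Value]]:
    """Turn wide rows (one column per year) into long rows with the
    columns id_col, var_name and value_name."""
    long_rows: List[Dict[str, Value]] = []
    for row in rows:
        for col, val in row.items():
            if col == id_col:
                continue  # the identifier is repeated on each row, not melted
            long_rows.append({id_col: row[id_col], var_name: col, value_name: val})
    return long_rows


if __name__ == "__main__":
    # Placeholder numbers, not real Gini values.
    wide = [{"Country.Name": "Country A", "2000": 1.0, "2001": 2.0}]
    for r in melt(wide, "Country.Name"):
        print(r)
```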

Data Science / Machine Learning

Chapter 1 – Agglomerative Hierarchical Clustering

Nowadays, clustering techniques are frequently used in data analysis. Among them, partitioning and hierarchical clustering are the two most deeply studied and widely used methods. Unlike partitioning, hierarchical clustering approaches the problem by constructing a hierarchy of clusters. It is therefore heavily used in bioinformatics, e.g., for phylogenetic trees of animal evolution or virus transmission (Kilitcioglu, 2018). In this blog, the hierarchical clustering method …

KEFANJIN, July 1, 2019 (updated October 24, 2019)
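The bottom-up idea can be sketched as follows. Start with every point in its own cluster, then repeatedly merge the two closest clusters, recording each merge; that record is what a dendrogram draws. The excerpt does not say which linkage the post uses. This Python sketch assumes single linkage, where the distance between two clusters is the distance of their closest pair of points. It also uses a naive pairwise search that is only suitable for small data.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Merge = Tuple[List[int], List[int], float]


def single_link_distance(points: Sequence[Point], a: List[int], b: List[int]) -> float:
    """Distance between the closest pair of points drawn from clusters a and b."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def agglomerative(points: Sequence[Point], n_clusters: int = 1):
    """Naive bottom-up clustering with single linkage.

    Returns the final clusters (sorted lists of point indices) and the
    merge history [(cluster_a, cluster_b, distance), ...]."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    merges: List[Merge] = []
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under single linkage.
        best_d, best_x, best_y = math.inf, -1, -1
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = single_link_distance(points, clusters[x], clusters[y])
                if d < best_d:
                    best_d, best_x, best_y = d, x, y
        merges.append((clusters[best_x], clusters[best_y], best_d))
        merged = sorted(clusters[best_x] + clusters[best_y])
        clusters = [
            c for idx, c in enumerate(clusters) if idx not in (best_x, best_y)
        ] + [merged]
    return clusters, merges
```

Stopping at `n_clusters` is equivalent to cutting the dendrogram at that level. Running down to a single cluster yields the full merge history, whose last entry is the distance at which the two top-level groups join.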

Data and Methods Exploration Group

Blog on Data Science research and projects conducted by lab members

Category: Article
