Transparency on the reporting of public procurement information: lessons from handling compiled procurement information

In this blog post, we summarise the key challenges affecting the transparency of public procurement information in the UK, including data quality issues such as a lack of unique identifiers, duplicated records, inconsistent dates and missing data fields. We argue that improving data collection, quality and availability in public procurement is important to support accountability and transparency and to inform policy reform. Finally, we will describe …
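Checks for the data quality issues listed here can be automated. The sketch below is a minimal Python illustration, not the authors' pipeline. The field names, the accepted date formats and the rule that a record counts as duplicated when all required fields match are hypothetical. Day-first formats are assumed, as is usual for UK data, so a date such as 01/03/2021 is read as 1 March 2021.

```python
from collections import Counter
from datetime import datetime

# Hypothetical field names; real procurement exports will differ.
REQUIRED_FIELDS = ("notice_id", "buyer", "supplier", "value", "award_date")
# Assumed date formats (day-first); extend after inspecting the compiled data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y")


def parse_date(text):
    """Return a date if text matches one of DATE_FORMATS, otherwise None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except (TypeError, ValueError):
            continue
    return None


def audit(records):
    """Count duplicated records, missing required fields and unparseable dates."""
    # A record is treated as a duplicate when all required fields repeat exactly.
    keys = [tuple(r.get(f) for f in REQUIRED_FIELDS) for r in records]
    duplicates = sum(count - 1 for count in Counter(keys).values() if count > 1)
    # Empty strings and None both count as missing.
    missing = Counter(
        f for r in records for f in REQUIRED_FIELDS if r.get(f) in (None, "")
    )
    bad_dates = [
        r for r in records
        if r.get("award_date") and parse_date(r["award_date"]) is None
    ]
    return {"duplicates": duplicates, "missing": dict(missing), "bad_dates": len(bad_dates)}
```

A missing notice_id is a symptom of the lack of unique identifiers: without one, the duplicate check has to fall back on comparing every field, which cannot tell a true duplicate from two genuinely identical awards.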

Multivariate Time Series analysis with volatility-Oil Prices

Building on the basic univariate analysis in the previous blog post, “Univariate Time Series Analysis -Oil Prices”, this blog post continues with multivariate time series analysis. The first step is to check multivariate normality with the Henze-Zirkler test, which the mvn function performs when mvnTest = "hz". The last column of the output indicates whether the data set follows a multivariate normal distribution or …
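For readers who want to see what the test computes, here is a minimal pure-Python sketch of the Henze-Zirkler statistic. It is not the implementation behind R's mvn function: it follows the closed-form statistic proposed by Henze and Zirkler, standardising with the sample covariance divided by n and using their usual smoothing parameter β = (1/√2)((2p + 1)n/4)^(1/(p + 4)). It returns only the statistic; the lognormal approximation that turns it into a p-value is left out. The function names are hypothetical.

```python
import math


def _inverse(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def henze_zirkler_statistic(data):
    """Henze-Zirkler statistic for n observations, each a list of p numbers."""
    n, p = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(p)]
    centred = [[row[k] - mean[k] for k in range(p)] for row in data]
    # Covariance matrix with divisor n (not n - 1).
    cov = [[sum(c[a] * c[b] for c in centred) / n for b in range(p)] for a in range(p)]
    inv = _inverse(cov)

    def mahalanobis_sq(u, v):
        d = [ui - vi for ui, vi in zip(u, v)]
        return sum(d[a] * inv[a][b] * d[b] for a in range(p) for b in range(p))

    # Squared smoothing parameter beta^2.
    b2 = (((2 * p + 1) * n / 4) ** (1 / (p + 4)) / math.sqrt(2)) ** 2
    origin = [0.0] * p
    # Pairwise term over all (j, k), including j == k.
    pairwise = sum(math.exp(-b2 / 2 * mahalanobis_sq(x, y))
                   for x in centred for y in centred) / n
    # Term based on each observation's distance to the sample mean.
    to_centre = 2 * (1 + b2) ** (-p / 2) * sum(
        math.exp(-b2 / (2 * (1 + b2)) * mahalanobis_sq(x, origin)) for x in centred)
    return pairwise - to_centre + n * (1 + 2 * b2) ** (-p / 2)
```

Because the statistic depends on the data only through Mahalanobis distances, it is unchanged by any invertible affine transformation of the variables, for example rescaling one benchmark's returns.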

Univariate Time Series Analysis -Oil Prices

This blog post attempts to model and forecast a univariate time series with an ARMA-GARCH model and to examine the goodness of fit with some basic tests. The oil prices dataset consists of the log returns of four benchmarks (West Texas Intermediate (WTI), Brent Blend, Dubai Crude and Maya) from 10/1/1997 to 4/6/2010. In this data set, each benchmark contains 698 observations, each of which was divided between …
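To make the ARMA-GARCH idea more concrete, the sketch below shows two of its ingredients in plain Python: log returns, r_t = ln(P_t / P_{t-1}), computed from a price series, and the GARCH(1,1) recursion σ²_t = ω + α ε²_{t-1} + β σ²_{t-1} for the conditional variance. It is an illustration under stated assumptions, not the post's model: the parameters are supplied by hand rather than estimated by maximum likelihood, no ARMA mean equation is fitted, and starting the recursion at the unconditional variance ω / (1 - α - β) is one common choice among several. The function names are hypothetical.

```python
import math


def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances s2_t = omega + alpha * e_{t-1}^2 + beta * s2_{t-1}.

    Returns one variance per residual. The recursion starts at the
    unconditional variance omega / (1 - alpha - beta), which needs alpha + beta < 1.
    """
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a stationary GARCH(1,1)")
    sigma2 = [omega / (1 - alpha - beta)]
    # Each residual drives the variance of the following period.
    for e in residuals[:-1]:
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2
```

In a full ARMA-GARCH fit the residuals would come from the ARMA mean equation; passing demeaned log returns instead corresponds to a constant-mean model.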

Analysis of two-mode networks

This blog post analyses a two-mode network of students’ enrolments in modules at the University. First, it shows how to visualise this two-mode network. Second, it demonstrates how to transform the network into one-mode networks to explore the similarities within each mode, using three methods: Overlap Count, Jaccard Similarity and Simple …
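The projection step can be sketched in a few lines of Python, assuming the enrolments are stored as a dictionary that maps each student to a set of module codes (this data layout and the function name are hypothetical). Projecting onto the module mode links two modules whenever they share students: the overlap count is the number of shared students, and the Jaccard similarity divides it by the number of students taking either module. The third method from the post is not covered by this sketch.

```python
from itertools import combinations


def project_modules(enrolments):
    """Project a student-module two-mode network onto the module mode.

    enrolments maps each student to the set of modules they take.
    Returns {(module_a, module_b): (overlap_count, jaccard)} for every module pair.
    """
    # Invert the mapping: which students take each module.
    students_by_module = {}
    for student, modules in enrolments.items():
        for module in modules:
            students_by_module.setdefault(module, set()).add(student)

    result = {}
    for a, b in combinations(sorted(students_by_module), 2):
        shared = students_by_module[a] & students_by_module[b]
        either = students_by_module[a] | students_by_module[b]
        result[(a, b)] = (len(shared), len(shared) / len(either))
    return result
```

Swapping the roles of students and modules in the input gives the projection onto the student mode instead.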

Dataset: House Property Sales. Exploratory analysis.

By Maria Fernanda Ibarra Gutiérrez and Thu Trang Dinh

In this blog post, we describe the House Property Sales database, which can be downloaded from: https://www.kaggle.com/htagholdings/property-sales?select=raw_sales.csv As shown in the first figure, this database describes the property sales through 5 variables and 29,580 observations from the 7th of February 2007 to the 26th of July 2019. This database does not have …
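A first exploratory summary like the one described here (variables, number of observations, date range and missing values) can be sketched with Python's csv module. The column name datesold and its timestamp format are assumptions about raw_sales.csv, so check them against the downloaded file; the function name is hypothetical.

```python
import csv
from datetime import datetime

# Assumed timestamp layout of the date column; adjust after inspecting the file.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarise_sales(lines, date_column="datesold"):
    """Basic exploratory summary of a property-sales CSV given as an iterable of lines."""
    reader = csv.DictReader(lines)
    rows = list(reader)
    dates = [datetime.strptime(r[date_column], DATE_FORMAT)
             for r in rows if r[date_column]]
    return {
        "variables": reader.fieldnames,
        "observations": len(rows),
        "first_sale": min(dates).date(),
        "last_sale": max(dates).date(),
        # A field counts as missing when it is empty or absent from the row.
        "missing": {col: sum(1 for r in rows if not r[col]) for col in reader.fieldnames},
    }
```

With the downloaded file, the call would be along the lines of `with open("raw_sales.csv", newline="") as f: summary = summarise_sales(f)`.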

Article review: Exploring crime patterns in Mexico City

By Maria Fernanda Ibarra Gutiérrez

Big Data analysis is a research approach of growing importance for studying many aspects of society, as we live surrounded by governmental and private systems, technological devices and social media platforms that gather information from our daily activities, choices, purchases, searches, health patterns and other digital touchpoints. Therefore, there is a large amount of data suitable for …

Introduction to scatter plot

By Maria Fernanda Ibarra Gutiérrez

This blog post looks at how scatter plots can be used to visualise multiple sets of data and the relationships between several variables. Working with one data set, it covers dealing with outliers, formatting the graphs for clarity, using bubbles to show a third variable, adding regression models and trend lines to the plots, and splitting the data into separate …
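The plotting tool is not named in this excerpt, so the sketch below only computes, in plain Python, the numbers behind two of the steps listed: a least-squares trend line y = intercept + slope · x, and an outlier screen. The 1.5 × IQR rule (Tukey's fences) is an assumption chosen for illustration; the post may handle outliers differently. The function names are hypothetical.

```python
import statistics


def fit_trend_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], with quartiles from statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]
```

Screening outliers before fitting matters because a single extreme point can pull a least-squares trend line noticeably towards it.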

Dataset I: The Atlas of Economic Complexity-Part A

In this blog post, I would like to introduce the Atlas of Economic Complexity, a powerful data visualisation tool developed by the Harvard Growth Lab. Even at first glance, the homepage presents an abundance of information in a compelling way. If we take more time to dive into the data and tweak different settings, the website delivers even more knowledge and …

Dataset I: The Atlas of Economic Complexity-Part B

The first dataset we would like to discuss is the Atlas of Economic Complexity, available at http://atlas.cid.harvard.edu/explore. Having explored the charts and graphs in Part A, we now turn to the datasets underlying them. The data sources for global goods trade and services trade are the United Nations Statistical Division and the Direction of Trade Statistics database (IMF) …

Examining Purchasing Power Parity theory by a time regression model

Introduction

According to the Bank for International Settlements (2019), the foreign exchange (forex) market is the largest and most liquid financial market in the world, with global daily trading of $6.6 trillion in April 2019. Among the leading currencies, the British pound sterling (GBP) ranks fourth among the most widely traded currencies in the world, and the pound has a …

SDG Indicator Filtering Function

Handling missing values is a common task in the data preparation step before analysis. In R, missing values are represented by NA, and R provides many functions for dealing with them. Since we want to cluster the SDG indicators later, it is advisable to construct a filtering function that guarantees there are no NA values in the filtered data …
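In R this kind of filter is usually built from is.na() or complete.cases(). As a language-neutral illustration, here is a minimal Python sketch that treats None and NaN as the counterparts of NA and keeps only indicators with no missing observations. The data layout (an indicator code mapped to a list of values, for example one per country), the function names and the optional max_missing threshold are assumptions, not the post's own function.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing, mirroring R's NA."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_complete_indicators(indicators, max_missing=0):
    """Keep indicators with at most max_missing missing values.

    indicators maps an indicator code to its list of observations.
    The default max_missing=0 guarantees that no missing values remain.
    """
    return {
        code: values
        for code, values in indicators.items()
        if sum(is_missing(v) for v in values) <= max_missing
    }
```

Keeping the threshold as a parameter makes it easy to see how many indicators would survive a looser rule before settling on the strict no-NA filter needed for clustering.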