Algorithm Tips

  • Ball mapper over bank’s customers.
    In this blog post, I will show an R application of a Topological Data Analysis tool called Ball Mapper (BM) to visualise the distribution of the bank’s customers who have stayed with or exited the bank across the joint distribution of the customers’ characteristics. BM is a useful tool for visualising datasets with multiple dimensions. To do so, BM summarises points that are close to each …
  • Web Scraping – How to retrieve 1807 skills using three lines of code
    This blog post details the steps required to begin your journey into web scraping. Web scraping can help solve many of the challenges faced in an increasingly digital world. Some of these challenges include processing the vast amount of data online, having a system that can react quickly when this data changes frequently, and making sure that the quality of the …
  • Visualization with R Studio
    “The simple graph has brought more information to the data analyst’s mind than any other device.” (John Tukey, American mathematician and co-creator of the Cooley–Tukey Fast Fourier Transform algorithm.) During data exploration, many approaches lead us to different aspects of the dataset. Among them, visualization is a recommended starting point because its results are clear and straightforward. The output of visualization could help …
  • Introduction to scatter plot
    By Maria Fernanda Ibarra Gutiérrez. This blog post looks at the ways in which scatter plots can be used to visualise multiple sets of data and the relationships between several variables. It takes a data set and covers dealing with outliers, formatting the graphs for clarity, using bubbles to show a third variable, adding regression models and trends to the plots, and splitting the data into separate …
  • How to learn R
    There is an incredibly large number of resources available for learning R: the R Manual, step-by-step books, YouTube videos, more books, R blogs and so on. Here you will find a list of only some of these resources. Where to get R (software): you can download R from https://cran.r-project.org/. RStudio: an R editor with additional features, which provides …
  • Determining the Number of Clusters
    By Adán José-García. Discovering the number of clusters (k) in a dataset is a fundamental problem in data clustering, or cluster analysis. Clustering is an unsupervised learning technique that aims to discover the natural partition of data objects into clusters. Clustering algorithms can be broadly divided into two groups: hierarchical and partitional. Algorithms in both categories, e.g., single-link and k-means respectively, require as input the …
  • Introduction to ggplot2
    We will use bric_data and melted_data1 for this tutorial. bric_data has two variables: Country.Name and gini. melted_data1 has three variables: Country.Name, year, and value.