R is a programming language designed for data analysis. Together with its integrated development environment, RStudio, it is well suited to serve as the central platform that ties together your data analytics. R is ideal for activities such as data manipulation, data visualisation, connecting to external databases and APIs, creating interactive dashboards, forecasting, data modelling and scheduling tasks.
One of R's strengths is its wealth of extensions created by its highly active user community as well as the R Core Team. These extensions are known as packages and can be downloaded from CRAN (the Comprehensive R Archive Network).
Below I've listed some of the packages used in this website along with an overview of their functions.
-
tidyverse - The data to be used for analysis and reporting rarely arrives in a form suitable for your needs. It will need to be processed before you can continue. This is the realm of the tidyverse. The tidyverse isn't actually a single package but a collection of packages designed to process data with the principles of Tidy Data in mind.
-
ggplot2 - A favourite package for graphing in R. The "gg" in its name stands for Grammar of Graphics, referring to its conventions of translating textual instructions into visuals. These visuals are layered on top of each other to create highly customisable graphs.
ggplot2 is actually part of the tidyverse but, as the primary package for graphing, it merits a separate mention.
-
shiny - The package used for building interactive web applications in R. The dashboards you see on this site are built in shiny along with the package shinydashboard, which provides additional functionality helpful in dashboard design.
Shiny web apps consist of two parts. There is a user interface (UI), which defines the client-side aspect of the app: what the user sees and interacts with. To create this, shiny translates a UI script written in R into a combination of HTML, JavaScript and CSS.
Then there is the server side. This does the work behind the scenes to create the app: pulling in data, processing the data and rendering the outputs of the app.
The server script and UI communicate with each other through common variables. As an example, when you select an indicator in the World Development Indicators dashboard, your selection is stored as the input variable indicator_selection and passed to the server script. The server script takes this variable and creates a new request to pull this specific indicator's data from the database. This data is processed in the server script and used to render the geographical visualisation, which is in turn passed back to the UI to be displayed.
-
DBI - Provides an interface to databases. It is used in conjunction with packages such as RMySQL, which provide the drivers for specific relational database management systems (RDBMS).
-
googleAnalyticsR - A package used to interact with the Google Analytics API. Allows data from your Google Analytics account to be seamlessly integrated into your wider reporting.
-
sf and sp - sf stands for Simple Features and sp is short for spatial. These are two packages for handling GIS (Geographic Information Systems) data in R.
GIS data has many uses but requires special treatment compared to other families of data. These packages provide the tools to do so.
The World Development Indicators dashboard uses these packages to assign each country in the World Bank data its own geospatial polygon, which is then mapped onto the globe.
-
leaflet - While the sf and sp packages include functionality for mapping GIS data, the leaflet package takes this a step further by providing interactivity within the maps.
The leaflet package in R is built on the JavaScript library Leaflet.
-
taskscheduleR and cronR - These packages allow you to automate the running of your scripts. taskscheduleR works with the Windows Task Scheduler and cronR works with cron on Linux.
If you have a task that follows a defined process and needs to be performed on a regular basis - daily, weekly, monthly, whatever - you can write a script in R to achieve this and schedule that script to run automatically. This frees up your time to do work that can't be done by automated scripts.
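To make the shiny entry in the list above more concrete, here is a minimal sketch of how a UI input reaches the server script. It is not the code behind the dashboards on this site: the list of indicators, the output name summary and the text output are assumptions made for illustration, and a real app would query the database and render a map instead.

```r
library(shiny)

# Illustrative choices only; the real dashboard loads its indicators from a database
indicators <- c("GDP (current US$)", "Population, total")

# UI: what the user sees. The chosen value is exposed to the server
# as input$indicator_selection.
ui <- fluidPage(
  selectInput("indicator_selection", "Indicator", choices = indicators),
  textOutput("summary")  # hypothetical output slot filled in by the server
)

# Server: reacts to the input and renders an output that is sent back to the UI
server <- function(input, output, session) {
  output$summary <- renderText({
    # A real app would pull and process the indicator's data here
    paste("Selected indicator:", input$indicator_selection)
  })
}

# Build the app object; launch it with runApp(app) in an interactive session
app <- shinyApp(ui = ui, server = server)
```

Whenever the selection changes, shiny re-runs the renderText block because it reads input$indicator_selection, which is the same mechanism the dashboard relies on.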
Databases are an efficient and reliable way of storing data. Data is stored in a hierarchical structure with a server at the top level. The server contains databases, which in turn hold schemas. These schemas hold tables, which are made up of fields.
When two tables have fields that contain shared values, the tables can be joined on those fields.
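As a rough illustration of joining on a shared field, the sketch below uses base R's merge() on two small made-up data frames. The table and column names are assumptions chosen to resemble a countries table and a values table; none of the data is real.

```r
# Made-up lookup table: one row per country
countries <- data.frame(
  id = c(1, 2, 3),
  ISO_code = c("AAA", "BBB", "CCC")
)

# Made-up fact table: country_id holds values shared with countries$id
yearly_values <- data.frame(
  country_id = c(1, 2, 2, 4),
  year = c(2017, 2016, 2017, 2017),
  indicator_value = c(10, 20, 25, 99)
)

# merge() performs an inner join by default: only rows whose key
# appears in both tables are kept
joined <- merge(yearly_values, countries, by.x = "country_id", by.y = "id")
print(joined)
```

The row with country_id 4 is dropped because it has no match in countries, and country 3 is dropped because it has no values.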
Some of the most popular relational database management systems (RDBMS) in use today are Microsoft SQL Server, MySQL and Oracle Database. This site uses MySQL, which is the most popular open source RDBMS.
SQL (Structured Query Language) is the means of communicating with an RDBMS. It allows the user to alter the structure of a database and to insert and extract data.
The most frequent action in the world of data analytics is extracting data, known as querying. Here is an example of a query used in the World Development Indicators dashboard on the home page.
SELECT
  c.ISO_code,
  c.`Short.Name`,
  i.`Indicator.Name`,
  i.order_transform_label,
  i.prefix,
  i.suffix,
  v.year,
  (v.indicator_value * i.order_transform_number) AS indicator_value
FROM WDI_Yearly_Values v
INNER JOIN WDI_Countries c ON c.id = v.country_id
INNER JOIN WDI_Indicators i ON i.id = v.indicator_id
WHERE i.`Indicator.Name` = 'GDP (current US$)'
  AND v.year = 2017;
Each RDBMS has its own version of SQL. Microsoft SQL Server has T-SQL, MySQL's version is simply called MySQL and Oracle Database has PL/SQL. These are fundamentally very similar to one another but differ slightly in their syntax.
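A query like the one above can be sent from R through DBI. The sketch below is only an approximation of that workflow: it assumes the RSQLite package and an in-memory SQLite database in place of the site's MySQL server, uses a trimmed-down version of the query, and fills the tables with made-up rows. The column name indicator_name and the indicator "Example indicator" are hypothetical.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the MySQL server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up rows in tables shaped loosely like those in the query above
dbWriteTable(con, "WDI_Countries",
             data.frame(id = 1:2, ISO_code = c("AAA", "BBB")))
dbWriteTable(con, "WDI_Indicators",
             data.frame(id = 1L, indicator_name = "Example indicator",
                        order_transform_number = 0.001))
dbWriteTable(con, "WDI_Yearly_Values",
             data.frame(country_id = c(1L, 2L, 1L),
                        indicator_id = c(1L, 1L, 1L),
                        year = c(2017L, 2017L, 2016L),
                        indicator_value = c(5000, 12000, 4000)))

# The ? placeholders are filled from params, so values are never pasted into the SQL
sql <- "
  SELECT c.ISO_code, v.year,
         (v.indicator_value * i.order_transform_number) AS indicator_value
  FROM WDI_Yearly_Values v
  INNER JOIN WDI_Countries c ON c.id = v.country_id
  INNER JOIN WDI_Indicators i ON i.id = v.indicator_id
  WHERE i.indicator_name = ? AND v.year = ?
  ORDER BY c.ISO_code
"
result <- dbGetQuery(con, sql, params = list("Example indicator", 2017L))
dbDisconnect(con)

print(result)
```

Passing the indicator and year as parameters rather than building the SQL string by hand is a design choice that suits a dashboard, where those values come from user input.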
Google Analytics is a free tool provided by Google for monitoring traffic on a website. It provides website owners with insights into how users interact with their site. When this data is analysed properly, it highlights areas of concern and opportunities for optimisation.
This is achieved through snippets of JavaScript embedded within a website that send user action data out to Google's servers.
The data is made up of metrics and dimensions. A metric is something that can be counted, like a page view, or something that can be assigned a specific numeric value, like average time on page. A dimension is something like device type, which has defined categories (mobile, desktop, tablet) that the metrics fall into. Metrics can be turned into dimensions by bucketing them. So we could, for example, have a dimension called timeOnPage with the categories <1 minute and >1 minute.
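Bucketing a metric into a dimension can be sketched in base R with cut(). The timings below are made up, and because the two categories in the text leave a visit of exactly one minute undefined, this sketch assumes it belongs in the upper bucket and labels that bucket ">=1 minute".

```r
# Made-up time on page, in seconds, for five page views
time_on_page <- c(12, 45, 75, 180, 60)

# right = FALSE makes the intervals [0, 60) and [60, Inf),
# so exactly 60 seconds lands in the upper bucket (an assumption)
time_bucket <- cut(
  time_on_page,
  breaks = c(0, 60, Inf),
  labels = c("<1 minute", ">=1 minute"),
  right = FALSE
)

# Count how many page views fall into each category of the new dimension
print(table(time_bucket))
```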
Google Analytics comes with hundreds of predefined metrics and dimensions out of the box, but also gives the option to define custom metrics and dimensions as required. If, for example, you wanted to track how often a specific button on your site was clicked, you could tag this button with some JavaScript code and assign this code to a metric. When the button was clicked, the JavaScript would send this data to your Google Analytics account.
One of the keys to gaining insights from your web analytics is proper segmentation of your audience. Segmentation means assigning users to groups known as segments, then comparing and contrasting the behaviours of these segments. For example, you could break users down by browser type and look at the average time each group spends on your website. If a particular browser has significantly worse metrics than the others, it could mean your site isn't rendering properly in that browser.
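The browser comparison described above could look something like this once the data is in R. The sessions data frame is entirely made up, and its column names are assumptions rather than fields exported by Google Analytics.

```r
# Made-up sessions: one row per visit, with time on site in seconds
sessions <- data.frame(
  browser = c("Chrome", "Chrome", "Firefox", "Firefox", "Safari", "Safari"),
  time_on_site = c(120, 180, 110, 130, 20, 40)
)

# Segment by browser and compute the average time on site per segment
avg_by_browser <- aggregate(time_on_site ~ browser, data = sessions, FUN = mean)

# Sort so the weakest segment appears first
avg_by_browser <- avg_by_browser[order(avg_by_browser$time_on_site), ]
print(avg_by_browser)
```

In this invented data the Safari segment averages far less time than the others, which is the kind of gap that would prompt a check of how the site renders in that browser.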
This site, the apps contained within it and the data the apps run from are all hosted in the cloud through the provider DigitalOcean. The machine has 2GB of RAM and 50GB of disk space and costs $10 per month to rent. This $10 per month makes up the entirety of the expense of this site.
Hosting in this manner provides many benefits, such as security, flexibility, scalability and price. There is no on-premises hardware that needs to be protected and maintained. If your cloud machine ever needs more RAM or disk space, you can simply increase its resources from your account in a process that takes less than a minute.
Many businesses are moving their IT infrastructure to the cloud for these reasons.
The machine hosting this site runs on Ubuntu, a distribution of the open source operating system Linux.
This site makes use of both the Nginx and Apache HTTP web servers for handling web traffic, with Nginx acting as a reverse proxy server for Apache. This configuration increases the security of the site, makes use of Nginx's superior speed as a web server and provides load balancing functionality.
JavaScript controls the interactive aspects of websites, triggering events such as a menu expanding when a button is clicked or a visitor being redirected to a different page based on certain criteria. For example, this site uses JavaScript to detect the pixel width of your device when you land on the home page. If this width is less than 800 pixels, it automatically sends you to the mobile version of the site.
HTML defines the objects from which web pages are built using tags. These tags define the type of element you want to include. For example, the tag for including a paragraph of text is p and the image tag is img. A block of HTML code begins with a tag contained within angle brackets and ends with the same tag within angle brackets again, but preceded by a forward slash. So the following code block inserts a paragraph into a webpage:
<p>text to show on page</p>
These tags are contained within each other to form a hierarchy. Tags used to display content are nearly always contained within a tag called div (standing for division). These divisions form the skeleton of the site.
Without CSS a web page would still display all of the content contained in its HTML tags, but not in an appealing format. CSS creates rules that are then attached to HTML tags. These rules control the appearance of the tags on the web page, affecting things such as colour, size and spacing. The CSS code can be embedded directly into a web page within style tags or, more commonly, stored in a separate file called a CSS stylesheet, which the web page file links to. This allows a single file to set rules shared across multiple web pages.