Introduction to Data Visualization
Data visualisations are one of the most important components of data analysis, as they can effectively summarise extensive data in graphical form. Many chart types are available, each with its own strengths and use cases. One of the challenging aspects of the analysis process is selecting the right visualisation to describe your data.
When deciding on a chart type, think about the role the chart will fulfil. Common purposes for data visualisation include:
- Displaying change through time
- Representing a part-to-whole composition
- Illustrating flows and processes
- Displaying data distribution
- Comparing values between groups
- Analysing the relationship between variables
- Analysing geographical information
Next, think about the type of data you intend to plot. The choice of chart type will depend on whether the data is categorical, numerical or a combination of both. Some visualisations can serve several purposes depending on these criteria. This blog is organised with this concept in mind, dedicating one chapter to each visualisation role. Each chapter includes various chart types to address common data types and subtasks.
Note that this document only provides general guidelines; exploring beyond the standard options might help you gain additional insights. Explore not only other chart types but also alternative methods of encoding variables in each chart. Remember that you aren’t compelled to present your information in a single plot. It’s better to keep each plot simple and clear, and to use several plots to facilitate comparisons, demonstrate trends and illustrate relationships among multiple variables.
How This Blog Is Organized
This blog is structured into chapters, one for each primary purpose of data visualisation. Every chapter begins with a brief description, followed by a list of chart types that fall into that category. Each chart type is accompanied by a brief introduction and an icon. Below is the key to decoding these symbols:
Basic (B):
Chart types with this icon are the standard, most commonly used charts. When creating a data visualisation, try starting with one of these standard chart types before exploring uncommon or advanced options.
Uncommon (U):
Chart types with this icon are more unusual than the standard types. Their use cases are more specialised than those of other chart types in the same category, or they are more commonly used in a different role.
Advanced (A):
Chart types with this icon are even more specialised in their role. Make sure that the selected chart type is the most suitable one for your purpose before implementing it. Sometimes these chart types will not be available in visualisation software or libraries, requiring additional effort to assemble them.
RAW NUMBERS: JUST SHOWING THE DATA
It is essential to keep in mind that you don’t always need to rely on a chart to depict your data. Sometimes, presenting the data as text is the most powerful method of conveying information.
Single Value Chart (B)
When you only have one number, it is best to just report it as it is without adding any visual elements. Graphically plotting a single value (with a bar or point) usually lacks meaning if there aren’t any other values to compare it with.
Single value with indicator(B)
An indicator compares a value with another one, usually to evaluate a metric’s value between the current period and the previous period.
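As a rough illustration of the idea, the sketch below formats a value together with its change against a previous period. The function name `indicator` and the "up"/"down"/"flat" labels are hypothetical choices made for this example, not part of any particular dashboard tool.

```python
def indicator(current: float, previous: float) -> str:
    """Format a value with its change versus a previous period (illustrative helper)."""
    if previous == 0:
        # A percentage change is undefined against a zero baseline.
        return f"{current:,.0f} (n/a)"
    change = current - previous
    pct = change / abs(previous) * 100
    direction = "up" if change > 0 else "down" if change < 0 else "flat"
    return f"{current:,.0f} ({direction} {pct:+.1f}%)"


print(indicator(1250, 1000))
print(indicator(900, 1000))
```

Reporting the percentage alongside the raw value keeps the single number readable while still giving the comparison that makes it meaningful.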
Bullet Chart(B)
A chart type that compares a single value against another number, often a benchmark rather than another data point.
Table(B)
Compares data points (rows) across multiple attributes (columns). Usually sorted by a significant attribute to enhance utility.
CHARTS FOR SHOWING CHANGE OVER TIME
One of the most common purposes of data visualisation is to observe how the value of a feature or metric varies over time. Usually, these charts display time on the horizontal axis, moving from left to right, while the vertical axis represents the values of the variable of interest.
Line chart(B)
The most prevalent chart type for showing changes over time is a line chart, where a point is plotted for each time period from left to right. Points are connected through line segments to depict the progression over time.
Sparkline(B)
A small line chart with little or no labelling, designed to be placed alongside text passages or within tables. Delivers a top-level summary without drawing too much attention.
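Because a sparkline carries no axes or labels, it can even be approximated in plain text. The sketch below is one possible way to do that, assuming Unicode block characters are acceptable in the output; the `sparkline` function and the eight-level scale are illustrative choices.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight height levels, lowest to highest


def sparkline(values):
    """Map each value to one of eight block heights, scaled to the data's own range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal, so draw a flat line at the lowest level.
        return BLOCKS[0] * len(values)
    chars = []
    for v in values:
        level = int((v - lo) / span * (len(BLOCKS) - 1) + 0.5)  # round to nearest level
        chars.append(BLOCKS[level])
    return "".join(chars)


print("Weekly signups", sparkline([12, 15, 11, 18, 25, 23, 30]))
```

Note that the scale is relative to each series, so two sparklines side by side show shape rather than comparable magnitudes.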
Connected scatter plot
Demonstrates change over time across two numerical values. While line segments still connect points across time, they may not run consistently from left to right, unlike in a typical line chart.
Bar chart
Every time period is depicted with a bar, and the bar’s value is shown by its height above or below a zero baseline. It works best when there are fewer time periods to show.
Box plot
In this format, every time period is associated with a box and whiskers, each summarising the range of the most typical data values in that period. This format excels when there are numerous records for each time period and a distribution of values needs to be illustrated.
CHARTS FOR SHOWING PART-TO-WHOLE COMPOSITION
At times, it’s important to know not just the total, but also the components that make up the total. Other charts, like the standard bar chart, can be used to compare component values, but the charts below are designed to show the part-to-whole relationship.
Pie Chart (B)
The whole is shown as a filled circle, with each part displayed as a proportional slice of that circle, one for each categorical group. It works best with five or fewer slices, each having a distinct proportion.
Doughnut chart (B)
A pie chart with a hole in the centre. The central area is mainly used to display a relevant single numeric value. Sometimes it is also used as an alternative to a standard progress bar.
Waffle chart/grid plot (U)
Squares are typically arranged in a 10 by 10 grid, where each square represents one percent of the total. The squares are coloured according to the size of each categorical group.
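One practical detail is that group shares rarely round to exactly 100 squares. The sketch below shows one way to handle this, using the largest-remainder method (a choice made for this example, not something prescribed by the chart type) and assuming non-negative values with a positive total. The names `waffle_counts` and `render` are hypothetical.

```python
def waffle_counts(values: dict[str, float], cells: int = 100) -> dict[str, int]:
    """Split `cells` squares across groups in proportion to their values."""
    total = sum(values.values())  # assumed to be positive
    exact = {k: v * cells / total for k, v in values.items()}
    counts = {k: int(x) for k, x in exact.items()}  # round down first
    leftover = cells - sum(counts.values())
    # Hand the remaining squares to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def render(counts: dict[str, int], width: int = 10) -> str:
    """Draw the grid as text, using the first letter of each group name as its 'colour'."""
    symbols = "".join(name[0] * n for name, n in counts.items())
    return "\n".join(symbols[i:i + width] for i in range(0, len(symbols), width))


counts = waffle_counts({"Desktop": 2, "Mobile": 1})
print(counts)
print(render(counts))
```

Rounding each share independently could leave the grid with 99 or 101 squares; distributing the leftover squares keeps the total at exactly 100.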
Stacked bar chart (B)
A bar chart where each bar is broken down into multiple sub-bars that display a part-to-whole breakdown. A single stacked bar can be used instead of a pie or doughnut chart, as people tend to make more precise judgements of length than of area or angle.
Stacked area chart (B)
A line chart, where shaded regions are added beneath the line to break the total into sub-group values.
Streamgraph (A)
It is the modified version of a stacked area chart in which areas are stacked around a central axis. It highlights relative changes rather than exact values.
Waterfall chart (A)
Depicts a change over time with a part-to-whole decomposition. The bars at each end show the value at two time points, and the lengths of the intermediate floating bars show the change between those points.
Note:
Some part-to-whole compositions follow a hierarchical form, in which each component can be further divided into finer parts at lower levels. The following are more specialised chart types for visualising this kind of data:
Mosaic plot / Marimekko chart(U)
This can be thought of as a stacked bar chart on both axes. The whole box is first segmented along one axis based on one categorical variable. Further divisions are then made along the other axis within each sub-box based on a second categorical variable.
Treemap (A)
It can be thought of as a more generalised Marimekko chart. Sub-boxes do not need to follow a consistent cut direction at a particular hierarchy level, and the visualisation can contain more than two levels of hierarchy.
CHARTS FOR DEPICTING FLOWS AND PROCESSES
A more specialised use of charts, related to the decomposition of a whole, involves tracking the flow of quantities through a multi-stage process. At their most advanced, these charts can efficiently illustrate how multiple inputs map to multiple outputs.
Funnel chart
Commonly seen in business contexts, it illustrates how individuals encounter a product and eventually become users and customers. One bar is plotted for each stage, with its length representing the number of users at that stage. The connecting regions link successive stages and give the chart the funnel shape it is named for.
Parallel sets chart
Parallel stacked bars represent multiple part-to-whole divisions on different dimensions. The connecting regions depict how different sub-groups relate to one another across dimensions.
Sankey diagram
The width of the coloured portion illustrates the relative volume at each stage of a process. It also enables the visualisation of multiple sources of inputs and outputs.
Gantt chart
These charts are used for project scheduling, breaking down projects into individual tasks. Each task is accompanied by a bar, which offers the timeline for when each task should begin and end.
CHARTS FOR LOOKING AT HOW DATA IS DISTRIBUTED
Visualisations play a key role in showing how data point values are distributed. This is especially useful during the exploration process while developing an understanding of the characteristics of data features.
Bar chart
Used when a variable is qualitative or takes discrete values. The height of each bar represents the quantity within each categorical group.
Histogram
It is very similar to a bar chart, but it is used when a variable takes continuous numeric values. The numeric range of the variable is divided into bins, and the data points falling into each bin are counted. Bars are plotted directly adjacent to each other to reflect the continuous nature of the variable.
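The binning step described above can be sketched in a few lines. This is a minimal version assuming equal-width bins, with the last bin closed on the right so that the maximum value is counted; the function name `histogram` and its parameters are illustrative rather than taken from any plotting library.

```python
def histogram(values, bins, lo=None, hi=None):
    """Count values in `bins` equal-width bins spanning [lo, hi]."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi <= lo:
        raise ValueError("histogram needs hi > lo")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if not lo <= v <= hi:
            continue  # ignore values outside the requested range
        # A value equal to `hi` would index one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = histogram([1, 2, 2, 3, 3, 3, 4, 9, 10], bins=3, lo=0, hi=12)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:>4.0f} to {right:<4.0f} {'#' * n}")
```

The choice of bin count changes the picture considerably, so it is worth trying a few values before settling on one.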
Density curve
An alternative to the histogram that produces a smooth curve. Every individual data point contributes a small amount of area around its value; these contributions are summed across all points to form the full curve.
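One common way to build such a curve is kernel density estimation. The sketch below assumes a Gaussian kernel and a hand-picked bandwidth, both of which are choices made for this example; the function name `gaussian_kde` is hypothetical and unrelated to any library function of a similar name.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density function built by summing one Gaussian bump per data point."""
    norm = 1 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each data point d contributes a bell-shaped bump centred on d.
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


density = gaussian_kde([1, 2, 2, 3], bandwidth=0.5)
for x in (0, 1, 2, 3, 4):
    print(x, round(density(x), 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one gives a smoother curve that may hide detail.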
Box plot
A box and whiskers plot summarises the spread of the data. The box spans the middle 50% of the data, with its ends marking the first and third quartiles. It is more commonly used for comparing distributions across groups than as a comprehensive overview of a single distribution.
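The numbers behind a box plot can be computed directly. The sketch below uses `statistics.quantiles` from the Python standard library for the quartiles and assumes the common 1.5 × IQR convention for the whiskers; other whisker conventions exist, and the helper name `box_stats` is illustrative.

```python
import statistics


def box_stats(values):
    """Quartiles, whisker ends and outliers, assuming the 1.5 * IQR whisker rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value within the fences
        "whisker_high": inside[-1],  # largest value within the fences
        "outliers": [v for v in data if v < lo_fence or v > hi_fence],
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

Here the box runs from 3 to 7, and the value 30 falls beyond the upper fence, so it would be drawn as a separate outlier point.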
Letter-value plot
Extends the box plot beyond quartiles by adding further boxes for eighths, sixteenths and finer quantiles. Best when there is plenty of data, so that the estimates are reliable.
Violin plot
Combines a density curve drawn around a central line with a box plot that provides summary statistics. It’s usually used to compare distributions between groups rather than to give a complete summary.
Note:
The violin plot commonly includes a box plot to provide statistical information alongside the density curve. Sometimes the internal box plot is left out, or another type of linear distribution chart is used as a substitute. All of the methods below are most effective with small or moderate numbers of data points. When there are many data points, a summary like the box plot is best.
Rug Plot
Each data point is displayed as a tick mark on a straight line, at a position that corresponds exactly to its value.
Strip plot
Similar to a rug plot, but dots are used instead of tick marks. Points are often randomly jittered up or down to reduce overlap.
Swarm plot
Similar to a strip plot, but points are deliberately offset to avoid overlap. Some horizontal jitter might be necessary to keep the dot swarm compact.
CHARTS FOR COMPARING VALUES BETWEEN GROUPS
A common use of data visualisation is to compare values between distinct groups. This is often integrated with other purposes of data visualisation, such as displaying changes over time or examining data distribution. Therefore, it constitutes the largest category of chart types:
Bar chart
The basic method for comparing numeric values between groups and categories. Each group is given a bar and the height of each bar corresponds to its value above a zero baseline.
Grouped bar chart
Expands upon a bar chart by allowing comparison of data across two categorical variables. Each bar represents an intersection of variable levels: categories of one variable are shown by the positions of the bar clusters, while the other variable is shown by colour or by position within each cluster.
Lollipop Chart
It replaces the bars of a bar chart with lines topped by dots. This is useful when there are numerous groups or categories to plot.
Dot plot
The bars of the bar chart are replaced with dots. Since the value is represented by position instead of length, the dot plot can be very advantageous when a zero baseline is not meaningful.
Line chart
Every line in a line chart depicts the change in value across time. One line is plotted for each group being compared. This works best when there are five or fewer groups to plot.
Sparkline
Small line charts, usually with minimal or no labelling. Intended to provide a broad overview that integrates seamlessly with text or tables, they are also useful when numerous groups need to be shown.
Ridgeline
A series of line charts or density curves with partially overlapping, shifted axes, used to compare distributions between groups. Best when there are clear and distinct patterns across different groups.
Box plot
It compares a statistical overview of numerical values among different groups. Each group or category is assigned a box and whiskers plot, depicting the range of the most common data values.
Letter-value plot
Used in the same way as the box plot, but each group is assigned a letter-value plot instead. This works best when there is plenty of data in every group, ensuring stable statistical estimates.
Violin plot
It compares distributions among different groups. For each group or category, a violin plot is allocated, which combines a density curve with a box plot.
Note:
One subcategory of comparison charts arises from the comparison of values between groups for multiple attributes.
Slope Chart
A specialised form of line chart. Here, two parallel vertical lines represent two points in time, with vertical position indicating value. For each data point, a line segment is drawn between the two times. This is helpful when there are many data points, since the slope of each line gives a rapid indication of the direction of change.
Parallel co-ordinates plot
An extension of the slope chart to multiple dimensions. Every vertical line now represents a different variable, each with its own scale. Useful for spotting patterns and relationships within the data, although when dealing with only two variables, scatter plots are usually simpler to work with.
Dumbbell plot
Mostly used for comparing two data points across multiple variables. As with parallel coordinates, each data point has a value on each variable. Here, however, line segments connect the two points within each variable, showing the difference in value.
At certain times, you might only be interested in the ranking among the groups, without knowing the actual value.
Bump Chart
A modified version of the line chart where the vertical position denotes rank instead of value. This change enables it to accommodate a larger number of categories than a standard line chart.
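Converting values to ranks is the only data preparation a bump chart needs. The sketch below is one way to do it, assuming every group has a value for every period and that rank 1 means the highest value; ties are broken by input order, which is an arbitrary choice for this example. The name `ranks_by_period` is hypothetical.

```python
def ranks_by_period(series: dict[str, list[float]]) -> dict[str, list[int]]:
    """Replace each group's values with its rank (1 = highest) in every period."""
    names = list(series)
    periods = len(next(iter(series.values())))  # assumes equal-length series
    ranks = {name: [] for name in names}
    for t in range(periods):
        # Sort group names by their value in period t, highest first.
        ordered = sorted(names, key=lambda n: series[n][t], reverse=True)
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return ranks


sales = {"A": [10, 20, 15], "B": [12, 18, 30], "C": [5, 25, 20]}
print(ranks_by_period(sales))
```

Plotting these ranks as lines, with rank 1 at the top, gives the bump chart; the actual magnitudes are deliberately discarded.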
Grouped Bar chart
Normally, grouped bar charts arrange the bars within each group in a consistent order. However, they can be sorted by value within each group to highlight ranking, though this can make it more difficult to locate each sub-category.
CHARTS FOR OBSERVING RELATIONSHIPS BETWEEN VARIABLES
A frequent task in data exploration involves identifying the relationships between various data features. The following chart types can be employed to compare two or more variables, helping to identify trends and patterns.
Scatter plot
A standard chart type for displaying the relationship between two numeric variables. The position of each point on the horizontal and vertical axes represents the value of the respective variable.
Bubble chart
In this kind of scatter plot, the size of each dot is determined by a third variable. Scatter plots can be extended in other ways too, such as using different shapes for different categories, or using colour to encode another variable, which may be categorical or numeric. It’s best to limit a scatter plot to three variables to keep it easy to understand.
Connected scatter plot
If a third variable represents time, points in a scatter plot can be linked with line segments to show how values change over time.
Dual-axis bar-line plot
A bar-line plot shares a horizontal axis between two types of chart: the bar chart and the line chart. It is useful when the data represented by each chart type are related but measured on different numeric scales.
Grouped bar chart
A grouped bar chart shows data across two categorical variables. Each cluster of bars represents one category of the first variable, and the colours or positions within each cluster represent the categories of the second. The length of each bar represents the value for that combination, such as the data frequency or a summary of a third numeric variable.
Heatmap
Extends bar charts and histograms to two variables, each of which can be categorical or numeric. Each axis represents the groups or bins of values for one variable, together forming a grid. The colour of each cell indicates the data frequency, or a summary of a third variable, for that combination of the two axis variables.
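The grid of cell values behind a frequency heatmap is simply a two-way count. The sketch below builds such a grid for two categorical variables using `collections.Counter`; the function name `crosstab` and the sample data are made up for illustration, and mapping the counts to colours is left to whatever plotting tool is in use.

```python
from collections import Counter


def crosstab(pairs):
    """Count how often each (row, column) combination occurs and lay the counts out as a grid."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur, so empty cells are filled in.
    grid = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, grid


visits = [("Mon", "AM"), ("Mon", "AM"), ("Mon", "PM"), ("Tue", "PM")]
rows, cols, grid = crosstab(visits)
print("    ", *cols)
for label, line in zip(rows, grid):
    print(label, line)
```

For numeric variables, the same idea applies after each value has been assigned to a bin.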
2-d density curve
An extension of density curves to two numeric variables. Colours are assigned to values as in a heatmap, but they are best applied to contours rather than to discrete bins. Confusingly, this chart is sometimes also called a heatmap.
Dendrogram
A specialised diagram designed to show similarity between data points. The closer the branch connecting two data points, the more similar they are. Sometimes it is drawn next to a heatmap to illustrate the underlying information.
Note:
Sometimes the relationships between entities take a more arbitrary form. A mathematical graph with nodes connected by edges is the basic method, but there are other types of charts for visualising such data.
Network diagram
Points represent individual objects, while lines connect objects that share a specific relationship. A value can be represented by the weight of the lines. Point locations do not inherently carry meaning and should simply be placed so that the connections are as clear as possible.
Transit map
A practical application of network diagrams is in mapping train and subway systems. Often, these diagrams abstractly emphasise connections among stations rather than their exact geographical locations.
Chord diagram
Like a standard network diagram, but the points are arranged in a circle.
Tree diagram
A network diagram arranged to show hierarchical relationships. The direction of each edge represents the relationship between the connected nodes, such as a parent-child relationship.
CHARTS FOR LOOKING AT GEOGRAPHICAL DATA
Sometimes, data sets feature geographical information such as latitude and longitude, or location descriptors like countries or states. Plotting this data often involves overlaying it onto a map background, and there are chart types tailored specifically for mapping and geographical analysis.
Scatter Map
A scatter plot constructed on top of a geographical map, using geographic coordinates to position the points.
Bubble map
A bubble chart built on top of a geographical map, where the size of each point depicts the value. It can also be used to group together the points of a scatter map when they are densely packed.
2-d histogram
A heatmap constructed on top of a geographical map. It is sometimes displayed with a hexagonal grid instead of a rectangular grid, which may distort the geographic representation at its edges.
Isopleth/contour map
A 2-d density curve overlaid on top of a geographical map.
Connection map
Network connections and flows are overlaid on top of geographical maps.
Choropleth
Like a heatmap, but the colours are assigned to geopolitical regions instead of the cells of a grid. Values are commonly expressed as rates or ratios to prevent distortion due to differences in population.
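To see why rates matter, consider two regions with the same raw count but very different populations. The sketch below normalises counts to a rate per 100,000 residents; the function name `rate_per_100k`, the region names and the figures are all invented for illustration.

```python
def rate_per_100k(counts: dict[str, int], population: dict[str, int]) -> dict[str, float]:
    """Convert raw counts into a rate per 100,000 residents for each region."""
    return {
        region: counts[region] * 100_000 / population[region]
        for region in counts
    }


# Made-up figures: identical raw counts, very different populations.
cases = {"North": 50, "South": 50}
people = {"North": 1_000_000, "South": 100_000}
print(rate_per_100k(cases, people))
```

Colouring the map by raw counts would show the two regions as identical, whereas colouring by rate reveals a tenfold difference.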
Cartogram
Geopolitical regions are sized according to their value, which necessarily requires distortion in shapes and overall topology.
ESSENTIAL CHARTS FOR DATA ANALYSIS
This guide covers a wide variety of chart types, including some for quite specialised use cases. Sometimes, it is difficult to figure out which chart will be most suitable for the data at hand.
To help with selecting an appropriate chart, the most common chart types for data analysis are: single value chart, single value with indicator, bullet chart, table, line chart, sparkline, bar chart, histogram, box plot, pie chart, stacked bar chart, stacked area chart, scatter plot, bubble chart, grouped bar chart, heatmap, bubble map, geographic map and choropleth. Most visualisations for dashboards and reports will be effectively served by one of these chart types. When picking a chart, it’s important to keep these three questions in mind:
- What specific role or analysis will the chart fulfil?
- What kind of data do I have (categorical or numeric), and how many variables do I need to plot?
- After making the chart, does it convey the important information?
CHARTS THAT SHOULD BE USED JUDICIOUSLY
A few chart types have been left out of this guide even though they aren’t especially rare or unique. The chart types in this section were excluded because they are less effective than other chart types or have drawbacks that make them harder to understand. Use these charts only when you need to convey a specific point that would benefit from an alternative visual representation.
Pictogram / Isotype
Used to compare values among groups and in other instances where a bar chart might be suitable. Every icon represents a particular quantity with values that are generally rounded to the nearest whole icon. As a result, this approach loses some accuracy compared to the common bar chart.
Circular/radial bar chart
A bar chart, but with bars plotted in circular arcs. However, this distorts each group’s value as it is ambiguous whether the values are displayed by bar angles or arc lengths. It’s recommended to stick with a standard bar chart.
Radar/spider plot
Used to contrast values among data points across several attributes. Each attribute is represented as a spoke, with its value indicated by the distance from the centre. For each data point, a separate polygon is drawn. However, the area of this polygon influences the perceived value. To avoid distortion, it is usually better to use a parallel coordinates plot or multiple grouped bar charts.
ADDITIONAL WAYS TO VISUALISE DATA
Effective charting extends beyond just picking the correct chart type and data encodings. Here are some common techniques that can improve the clarity and interpretation of data.
Horizontal vs. vertical orientation
Comparison chart types such as bar charts and box plots can be displayed either vertically or horizontally. The horizontal orientation can be advantageous when the groups have long names.
Small multiples/faceting
Instead of plotting multiple groups or categories on a single axis, generate an individual plot for each group. This can help in differentiating between groups, especially when there are many of them.
Chart compositions/dashboards
Groups of charts, statistics and tables are employed to rapidly communicate key information to the users. Organising related items by grouping them and keeping the most important at the top and the least important at the bottom can assist viewers in extracting insights from the data.