Select Distinct Logo Clear Background

Business Analytics Blog

UNION in SQL

Combine data with union in SQL

A Guide to Using UNION in SQL: Combining and Manipulating Data

Introduction: Understanding the Purpose and Functionality of UNION in SQL

The SQL UNION operator allows us to merge the results of two or more SELECT statements into a single result set.

It performs a set operation by combining rows from different tables based on their column names and data types. This functionality proves to be immensely useful when we need to consolidate related information from various sources into one cohesive dataset. For example you may need to consolidate sales data from different subsidiaries, each of which may be using different data sources.

By utilizing the UNION operator, we can effectively eliminate duplicate records while merging data from multiple tables. This ensures that our resulting dataset remains clean and concise without any redundant information.

Throughout this article we will explore its syntax, examine various use cases where it can be applied, and understand how it contributes to efficient data manipulation within relational databases.

SQL UNION is an essential tool for combining and merging data in SQL.


Let us start with a simple example

We will use two similar sales data sets, the first covers store numbers 1 to 5 (SalesA)

a list of store numbers

We also have a second dataset, covering store numbers 5 to 10 (SalesB)

A sample sales dataset

You will notice that store 5 features in both datasets, you will see what happens with the duplicates shortly

Using UNION to Combine Data from Multiple Tables

The syntax of the UNION operator is 

SELECT columns FROM SalesA
UNION
SELECT columns FROM SalesB;

If we apply this to our sample dataset our SQL code looks like this

 select [Product_ID], [Store_ID], [qty_sold] from SalesA
union 
select [Product_ID], [Store_ID], [qty_sold] from SalesB

We can run this to get the results we are looking for

SQL Union dataset example

The Union operator has merged rows from the separate tables into a single dataset, and if you look carefully you will notice that the duplicate row for store 5 has been resolved

Eliminating rows is a key feature of the UNION operator

Key points to note when using the UNION method

  • There must be a matching number of columns in both sides of the union
  • The data types must be the same or compatible in the corresponding columns, (e.g. you cant mix text and integers unless you translate the integers values to text in one of queries)
  • You must be sure that eliminating duplicate rows with UNION is the correct treatment you need
  • You need to be sure that the columns are in the same order. in this example it would be easy to transpose the product ID column with the store ID column. The query would still work as the data types are the same but the data would be wrong and difficult to spot

Applying UNION ALL for Including Duplicate Rows in the Result Set

Eliminating duplicates is all well and good, but what if you need to keep the duplicates.

If we use the UNION ALL operator then all rows are retained including duplicates

The syntax for UNION ALL is essentially the same as UNION

lets see the result

Example of a UNION ALL query output in SQL, showing duplicate rows

We can clearly see the effect of retaining duplicate rows with UNION ALL

Leveraging the Power of Union to Manipulate and Combine Data Effortlessly in SQL Queries

Now that you understand the power of being able to combine data using UNION, you can go on and efficiently combine data from different sources

Watch this space as we will show how to do the same in Power Query in the coming weeks

Remember to make sure that your columns are in the correct order as lining up the columns incorrectly wouldn't necessarily fail or be identified if the data types are the same


One final tip that we find useful if combining datasets is to add a source column, particularly when using UNION ALL, this way you can easily trace which table a row has been returned from

Example

select [Product_ID], [Store_ID], [qty_sold], 'SalesA' as Source from SalesA
union ALL
select [Product_ID], [Store_ID], [qty_sold], 'SalesB' from SalesB

Its also worth pointing out that the UNION ALL query inherits its field names from the first query, in this example we don’t need to name the last column in the second query

Adding a source column in a UNION ALL query

Subscribe to our channel to see more SQL tips and timesavers

Select Distinct YouTube Channel

Or find other useful SQL, Power BI or other business analytics timesavers in our Blog

Our Business Analytics Timesavers are selected from our day to day analytics consultancy work and are the everyday things we see that really help analysts, SQL developers, BI Developers and many more people. Our blog has something for everyone, from tips for improving your SQL skills to posts about BI tools and techniques. We hope that you find these helpful!

Business Analytics Blog

By Simon Harrison

Simon Harrison Founder of Select Distinct Limited and a business intelligence expert