Select Distinct Logo Clear Background

Business Analytics Blog

Split a column in BigQuery

How to split a column in BigQuery

This quick guide shows you how to split a column in BigQuery

When working with text data, quite often you need to split a column or trim excess characters to clean the data

This quick guide shows you how to it using URL data from Google Analytics GA4 to create separate columns for the Clean URL and the URL parameter

 

Initial Set up

Lets start with a list of URLs from our web traffic

You can copy this code to try it

 

WITH URLs AS
 (SELECT 'https://www.selectdistinct.co.uk/?utm_campaign=directory&utm_medium=referral&utm_source=themanifest.com' as page_location
  UNION ALL
  SELECT 'https://www.selectdistinct.co.uk/business-intelligence-expert-services/power-bi-mentoring/?gclid=EAIaIQobChMI1I3E3aa_ggMVJz4GAB3jzgg-EAAYASAAEgKnwPD_BwE' as page_location
  UNION ALL
  SELECT 'https://www.selectdistinct.co.uk/2023/06/28/topn-in-power-bi/?fbclid=IwAR2WKv315a8lxzS3kBxuGd3sZSr7VXcZDkxxFOVDxI0fcevx-I506qLfYew' as page_location
  UNION ALL
  SELECT 'https://www.selectdistinct.co.uk/2023/04/27/rolling-averages-and-rolling-sums-in-power-bi/?fbclid=IwAR2_o-1lnKTuNynhgdSrBZkGLzRH5Sa3aNpqtDjFwH4mq2Z7ii0U0RRDO0c'
  as page_location
  UNION ALL
  SELECT 'https://www.selectdistinct.co.uk' as page_location
)
select URLs.* from URLs

 

After running this code you will see 5 rows of URL data for testing

a lit of URL data from Google Analytics GA4 for testing

Step 1

The first step is to find the position of the delimiter, In our case it’s a question mark

select URLs.*, strpos(page_location, ‘?’) as String_Position

STRPOS tells us the character position of the question mark

 

using STRPOS we can find the position of a character in a text field

 

Step 2

Next,

Find the length of the page location field

length(page_location) as URL_length from URLs

 

The LENGTH command tells us the total number of characters in the full name

adding this as a column to our SQL gives us

 

select URLs.*

, strpos(page_location, ‘?’) as String_Position

, length(page_location) as URL_length from URLs

 

 

finding the length of a text field in bigquery

 

Step 3

The next step is to find the characters that make up the clean url before the url parameter, by finding everything to the left of the question mark

 

case when strpos(page_location, ‘?’) = 0 then page_location

    else left(page_location, strpos(page_location, ‘?’) -1) end as clean_url

Here we need to first check that the url data has as url parameter by checking for the question mark, if it does not have a question mark then the url is good to use. But if the data does contain a question mark we find everything to the left of that position

This takes the question mark position of 34 in the first example, subtracts 1 to remove the question mark itself, and returns the first 33 characters to give us a clean url

https://www.selectdistinct.co.uk/

 

extracting the clean url data from GA4 data

 

Step 4

For the URL parameters,

we want everything to the right of the question mark

case when strpos(page_location, ‘?’) = 0 then NULL else

substring(page_location, strpos(page_location, ‘?’)+1, length(page_location) – strpos(page_location, ‘?’)) end as URL_Parameter

This takes the question mark position of 34 in the first example, adds 1 to remove the question mark itself, and returns the remaining characters by evaluating how many characters remain

 

 

GA4 data with the page location split into a clean url column and a url parameter column

This data can then be used for further analysis, linking the URL back to other data , whilst also being able to see the parameter details

With more work we could break down the URL parameter further and provide even more insights

Here is the full SQL code that you should have if you have been following along

WITH URLs AS

(SELECT ‘https://www.selectdistinct.co.uk/?utm_campaign=directory&utm_medium=referral&utm_source=themanifest.com’ as page_location

  UNION ALL

  SELECT ‘https://www.selectdistinct.co.uk/business-intelligence-expert-services/power-bi-mentoring/?gclid=EAIaIQobChMI1I3E3aa_ggMVJz4GAB3jzgg-EAAYASAAEgKnwPD_BwE’ as page_location

  UNION ALL

  SELECT ‘https://www.selectdistinct.co.uk/2023/06/28/topn-in-power-bi/?fbclid=IwAR2WKv315a8lxzS3kBxuGd3sZSr7VXcZDkxxFOVDxI0fcevx-I506qLfYew’ as page_location

  UNION ALL

  SELECT ‘https://www.selectdistinct.co.uk/2023/04/27/rolling-averages-and-rolling-sums-in-power-bi/?fbclid=IwAR2_o-1lnKTuNynhgdSrBZkGLzRH5Sa3aNpqtDjFwH4mq2Z7ii0U0RRDO0c’

  as page_location

  UNION ALL

  SELECT ‘https://www.selectdistinct.co.uk’ as page_location

)

select URLs.*

, strpos(page_location, ‘?’) as String_Position

, length(page_location) as URL_length

, case when strpos(page_location, ‘?’) = 0 then page_location

    else left(page_location, strpos(page_location, ‘?’) -1) end as clean_url

, case when strpos(page_location, ‘?’) = 0 then NULL else

substring(page_location, strpos(page_location, ‘?’)+1, length(page_location) – strpos(page_location, ‘?’)) end as URL_Parameter

from URLs

 

You can apply this same technique to split other types of data, or substitute the space for a different character

 

 

 

Subscribe to our channel to see more tips and timesavers

Select Distinct YouTube Channel

Or find other useful SQL, Power BI or other business analytics timesavers in our Blog

Our Business Analytics Timesavers are selected from our day to day analytics consultancy work.  They are the everyday things we see that really help analysts, SQL developers, BI Developers and many more people.

Our blog has something for everyone, from tips for improving your SQL skills to posts about BI tools and techniques. We hope that you find these helpful!

Business Analytics Blog

By Simon Harrison

Simon Harrison Founder of Select Distinct Limited and a business intelligence expert