How do you retrieve the exit rate from BigQuery?

In preparation for the phasing out of Universal Analytics on July 1, 2023, we at Online Dialogue have been working hard to ensure that we will be able to do all the analytics we used to do in Universal Analytics in GA4 from now on. In our daily work, we used to use the Google Analytics add-on for Google Sheets. This tool made it possible to build custom analysis sheets in a quick and accessible way to, for example, get a first impression of a new customer's website performance, create advanced funnel analyses and standardize extensive A/B testing analyses. With the move to GA4 we now do the same analyses with BigQuery data.

Over the past year, we have gained a lot of experience writing queries to answer our analysis questions using GA4 data. We have created a comprehensive set of queries for the different metrics that are frequently used in our clients' A/B testing programs. In this blog, we highlight one of these metrics for you: the exit rate (the exit rate), and we explain step by step how this query works. Here we use the raw GA4 data made available in BigQuery. In doing so, we assume that GA4 is already linked to BigQuery.

GA4 data and API data versus BigQuery data

Before we dive into the query, a few words about the difference between data from GA4, the API and BigQuery. After all, why don't we just use the new Analytics API from Google? The main reason is that the data from the GA4 interface and API is not 100% accurate. In fact, instead of the actual numbers, GA4 gives an estimate based on a HyperLogLog++ algorithm weather. For many purposes, this estimate is fine, for example, if you want to see the general trend in your visitor numbers. For an A/B test, however, the situation is different. Just 1% difference between the displayed number and the actual number of conversions or users can make, for example, a test that is actually a winner seem inconclusive. So for a thorough analysis of A/B testing, we need very accurate data and this is only available through BigQuery. 

What does ‘exit rate’ mean?

In addition, it is important to define what the term ‘exit rate’ or ‘exit rate’ means. Google describes it as “For all page views on the page, the exit percentage is the percentage of page views that were the last in the session.”. In other words, divide the number of page views on a specific page that is the last page view of a session by the total number of page views on that page, and multiply that by 100%. Or shorter: exits / pageviews * 100%.

Retrieve Exit Rate from BigQuery

Then it is now time for the query! The query consists of seven subqueries that eventually result in one table, containing the general exit rate of a page. To distinguish between the percentage for the two variants of an A/B test, additional subqueries are required to retrieve which pageviews belong to which test variant. Since the exact setup for that will differ for everyone, we will leave that part out of consideration here. 

Variables

At the top of the query, we declare a number of variables so that we can use the same information at different places in the query, or to easily modify that information at the top. In this case, we declare the start date, end date, and the page for which you need the exit rate wants to determine.

DECLARE STARTDATE STRING DEFAULT '20230601'
DECLARE ENDDATE STRING DEFAULT '20230628'
DECLARE PAGE STRING DEFAULT 'http://example.com/product/nice-product';

The start and end dates are used to retrieve the appropriate tables. In fact, in BigQuery, there is 1 table per day with all Analytics data for that day. Each table name ends with the date in the yyyymmdd format. Therefore, that is also the date format to be used in the variables. For example, in the example above, that is ‘20230601’ for June 1, 2023. 

The page you pass as a value to the variable will be treated in the query as a regular expression (regex). You can enter a specific URL, or use a regex to view an entire group of pages (e.g., all product pages) at once.

Page views and users

In the first subquery, we extract all the pageviews on, with associated time of day, pseudo-id of the user, session number and url. We have the pageviews of all pages needed, to later determine what the last pageview of a session is. In the query below, replace and with the appropriate names. The last line of this subquery ensures that only data is retrieved from the tables belonging to a date between the specified start date and end date.

WITH users_and_page_views AS (
 SELECT
   event_timestamp,
   user_pseudo_id,
   value.int_value AS session_number,
   (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_url
 FROM `..events_*`, UNNEST(event_params)
 WHERE event_name = 'page_view'
 AND key = 'ga_session_number'
 AND _table_suffix BETWEEN STARTDATE AND ENDDATE
),

Timestamp of the last pageview per session

Next, based on the result of the previous subquery, we look at what the timestamp of the last pageview of each session is. The result of this query is a list of timestamps and their associated user and session. This is how we know at what time the last pageview-action of a user has been within a session. Since a user can have multiple sessions, they can also have multiple exits. Therefore, it is important to perform this step at the session level.

timestamp_of_last_url_visited_per_session AS (
 SELECT
   MAX(event_timestamp) AS last_timestamp,
   user_pseudo_id,
   session_number,
 FROM users_and_page_views
 GROUP BY user_pseudo_id, session_number
),

Last URL per session

In the next step, it is important to link the timestamp information with the location of the last action. Therefore, in the next subquery we create a summary of not only the timestamp of the last pageview per session, as well as the corresponding URL. To do this, we combine the two previous tables, linking the rows based on the timestamp and user_pseudo_id. That way, we can be sure that we are linking the correct URL to the correct timestamp, even if two users happen to view a page at the same time.

last_visited_url_per_session AS (
 SELECT DISTINCT
   timestamp_of_last_url_visited_per_session.last_timestamp,
   users_and_page_views.user_pseudo_id,
   users_and_page_views.session_number,
   users_and_page_views.page_url
 FROM timestamp_of_last_url_visited_per_session LEFT JOIN users_and_page_views
 ON timestamp_of_last_url_visited_per_session.user_pseudo_id = users_and_page_views.user_pseudo_id
   AND timestamp_of_last_url_visited_per_session.last_timestamp = users_and_page_views.event_timestamp
),

Exits by URL

In the previous subqueries, we retrieved all the information we need to complete the exit rate-formula ( exits / pageviews * 100%). In the following subqueries, we determine the appropriate numbers of the page whose exit rate we want to know.

Now that we know from each user what the last pageview of each of their sessions was, we can determine how much exits there were on a given page. For this we use the PAGE we defined as a variable at the top. For pageviews whose URL satisfies the regex defined in the PAGE variable, the number of times they appear in the result table of the preceding subquery is counted.

In the current example, we have passed ‘http://example.com/product/nice-product’ at the very top as a value to the PAGE variable. So then the subquery counts how many times that URL is the last pageview in a user's session was. This is the first value for calculating the exit rate: exits / pageviews * 100%.

exits_on_page_per_url AS (
 SELECT
   page_url,
   COUNT(*) AS exit_count,
 FROM last_visited_url_per_session
 WHERE REGEXP_CONTAINS(page_url, PAGE)
 GROUP BY page_url
 ORDER BY page_url
),

Pageviews per URL

To obtain the second value from the calculation, the following subquery counts the total number of pageviews on the defined page. To do this, the number of pageviews counted in the previously created ‘users_and_page_views’ table that satisfies the regex from the PAGE variable. This makes the second value for calculating the exit rate (exits / page views * 100%) also found. 

pageviews_per_url AS (
 SELECT
   page_url,
   COUNT(*) AS page_views,
 FROM users_and_page_views
 WHERE REGEXP_CONTAINS(page_url, PAGE)
 GROUP BY page_url
 ORDER BY page_url
)

Calculating the exit rate

The final step is the actual calculation of the exit rate. To do this, we combine the data from the two previous subqueries. Those tables are combined based on the URL. Based on the data from those two tables, the following sum is calculated: exits / pageviews * 100%.

SELECT
 pageviews_per_url.page_url,
 exits_on_page_per_url.exit_count / pageviews_per_url.page_views * 100 AS exit_rate
FROM pageviews_per_url INNER JOIN exits_on_page_per_url
 ON pageviews_per_url.page_url = exits_on_page_per_url.page_url

The result is a table containing the URL and the corresponding exit rate

page_urlexit_rate
http://example.com/product/nice-product5.75

Successful! You have now calculated the exit rate of the desired page!

... Just a quick note: we send out a newsletter every three weeks that includes the latest blogs, team updates and, of course, news about the offerings in our academy. Click here to subscribe.


Newsletter sign up

It can be more efficient

There are all sorts of ways to make the above query, with its many subqueries more efficient. After each subquery, a temporary table is created that can be used in subsequent subqueries. Writing out and retrieving these temporary tables takes processing power and time. As analysts at Online Dialogue, we are therefore constantly working to make our queries better and more efficient. However, writing good and ultimately efficient queries starts with properly understanding GA4's logic and data model. To really understand what is happening, queries that are built incrementally are preferred. A step-by-step query is not only useful to communicate the logic, as in this blog, but is also useful to easily check that your data is correct and you are not making mistakes in your calculation.

This exit rate query is just a start

The query for the exit rate is just one of the queries we have written recently to continue our analyses with GA4. For example, we have also written queries for the number of users, the number of conversions, the time on page, and the average order value per A/B test variation. We also converted the impact analysis all the way to a GA4 version. These are available to our customers. Want to know more about this, If so, please contact Pim. Should you also be in the middle of switching from UA to GA4 for your A/B testing analyses or other data analyses, and need help with that? Please contact us, and then together we'll see what Online Dialogue can do in that.