GoogleAnalytics4のBigQuery連携データをユーザー単位のページ遷移に加工するクエリ
GoogleAnalytics4のrawデータから、ユーザー単位(user_pseudo_id)でページ遷移が分かるデータマートを作成する。
手順
1. UNNESTでフラット化
2. ga_session_id, page_title, page_locationを横持ち
3. 同一ページで複数のeventが発生していても最初のtimestampだけ残すように、event_date, user_pseudo_id, ga_session_id, page_title, page_location単位のQUALIFY句
#standardSQL WITH -- 1. UNNESTでフラット化 tbl AS ( SELECT DISTINCT event_date, event_timestamp, user_pseudo_id, event_params.key AS event_params_key, event_params.value.string_value AS event_params_value_string_value, event_params.value.int_value AS event_params_value_int_value FROM `hatena-ga.analytics_xxxxxxx.events_*`, UNNEST(event_params) AS event_params WHERE (event_params.key IN ("page_location", "page_title", "ga_session_id")) ) SELECT -- 2. ga_session_id, page_title, page_locationを横持ち event_date, FORMAT_TIMESTAMP("%Y-%m-%d %H:%M:%S", TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), second), "Asia/Tokyo") AS event_timestamp_tokyo, user_pseudo_id, MAX(CASE event_params_key WHEN "ga_session_id" THEN event_params_value_int_value ELSE NULL END) AS ga_session_id, MAX(CASE event_params_key WHEN "page_title" THEN event_params_value_string_value ELSE NULL END) AS page_title, MAX(CASE event_params_key WHEN "page_location" THEN event_params_value_string_value ELSE NULL END) AS page_location, FROM tbl GROUP BY event_date, user_pseudo_id, event_timestamp QUALIFY -- 3. 同一ページで複数のeventが発生していても最初のtimestampだけ残す ROW_NUMBER() OVER (PARTITION BY event_date, user_pseudo_id, ga_session_id, page_title, page_location ORDER BY event_timestamp ASC) = 1 ORDER BY event_date, user_pseudo_id, event_timestamp
結果はこんな感じ。