Unlock geospatial insights with Data Studio and BigQuery GIS

Celonis, IBM and Red Hat Form Strategic Partnership to Help Transform Business Execution
April 1, 2021
Speeding up small queries in BigQuery with BI Engine
April 1, 2021
Celonis, IBM and Red Hat Form Strategic Partnership to Help Transform Business Execution
April 1, 2021
Speeding up small queries in BigQuery with BI Engine
April 1, 2021

Data Analytics

Unlock geospatial insights with Data Studio and BigQuery GIS

Chances are, your data contains information about geographic locations in some form, whether it’s addresses, postal codes, GPS coordinates, or regions that are meaningful to your business. Are you putting this data to work to understand your key metrics from every angle? In the past, you might’ve needed specialized Geographic Information System (GIS) software, but today, these capabilities are built into Google BigQuery. You can store locations, routes, and boundaries with geospatial data types and manipulate them with geospatial functions. Ultimately, helping people explore this data and spot geospatial patterns requires visualizing it on a map. To that end, we’re excited to announce new enhancements to Data Studio, including support for choropleth maps of BigQuery GEOGRAPHY polygons, so you can easily visualize BigQuery GIS data in a Google Maps-based interface.

Google Maps in Data Studio

Data Studio is a no-cost, self-serve reporting and data visualization service from Google Marketing Platform that connects to BigQuery and hundreds of other data sources. With it, you can visually explore your data and design and share beautiful, interactive reports. With the addition in the past year of a Google Maps-based visualization, you can visualize and interact with your geographic data just as you do with Google Maps: pan around, zoom in, even pop into Street View.

Don’t have geographic coordinates in your data? No problem: Data Studio recognizes countries, states/provinces, Designated Market Areas (DMAs), cities, postal codes, addresses, and other supported geographic field types. For example, even if all you have are DMA codes and metrics from Google Ads, you can visualize click-through rate by DMA:

US DMA Choropleth map in Data Studio
Click-through rate by Designated Market Area

Visualize BigQuery GEOGRAPHY polygons

But what if you want to visualize boundaries beyond the most commonly used ones? What if there are different boundaries that are important in your industry or business? What if you’ve done an analysis that groups locations into clusters and drawn boundaries around them?

With support for BigQuery GEOGRAPHY polygons in Data Studio, you can now visualize arbitrary polygons in a choropleth map. When you connect to BigQuery data that contains GEOGRAPHY fields, you’ll see them recognized as geospatial data:

Geospatial Field in Data Studio

To visualize this data, add a Google Maps “filled map” visualization:

Configuring "filled maps" visualizations in Data Studio

Then, for the Geospatial field, simply choose the field with geospatial data:

Configuring the geospatial field in Data Studio

You can group by a location dimension and color by a dimension or metric. To learn more, check out this step-by-step walkthrough.

Let’s take a look at a few examples of this feature in action. We’ll use data from BigQuery Public Datasets, which contain several datasets with geospatial data.

Mapping census tracts

Suppose we want to visualize rent affordability in different areas of the United States. We can get data about the percentage of income spent on rent from the U.S. Census Bureau’s American Community Survey dataset. We could visualize this metric on a map by state, county, metro area, or zip code, but it can vary quite a bit even within the same zip code. To understand it at a more detailed level, we might want to visualize census tracts. Thankfully, census tract boundaries are available in the U.S. Boundaries dataset. By joining these datasets and visualizing in Data Studio, we can understand rent affordability at a deeper level:

Mapping census tracts in Data Studio
Rent affordability by census tract in the Seattle, Washington area

Here, we’re seeing census tracts in the Seattle area, with the least affordable areas in orange. Two areas stand out for very different reasons: the University District (cheaper rent, but many students with low or no income) and Medina (high incomes, but multi-million dollar lakefront houses).

Here’s the query to get this data:

Language: SQL

  select
  ct.state_fips_code,
  ct.county_fips_code,
  c.county_name,
  ct.tract_ce,
  ct.geo_id,
  ct.tract_name,
  ct.lsad_name,
  ct.tract_geom,
  acs.percent_income_spent_on_rent,
from `bigquery-public-data.geo_census_tracts.census_tracts_washington` ct
left join `bigquery-public-data.geo_us_boundaries.counties` c
  on (ct.state_fips_code || ct.county_fips_code) = c.geo_id
left join `bigquery-public-data.census_bureau_acs.censustract_2018_5yr` acs
  on ct.geo_id = acs.geo_id

Mapping New York City taxi zones

Next, suppose we’re analyzing New York City taxi trips and want to understand how tipping varies by pickup location. New York City is divided into taxi zones, whose boundaries are available in the dataset. Using Data Studio, we can visualize the median tip percentage by taxi zone in the Brooklyn and Queens boroughs:

NYC Taxi Zone map in Data Studio
Median tip percentage by New York City taxi zone

The map helps us see a clear geospatial pattern: passengers picked up in the zones nearer to Manhattan tend to tip more.

Here’s the query to get this data:

Language: SQL

  with
zones_deduped as (
  select
    zone_id,
    zone_name,
    borough,
    st_union_agg(zone_geom) as zone_geom
  from `bigquery-public-data.new_york_taxi_trips.taxi_zone_geom`
  group by 1, 2, 3
),
pickup_zone_stats as (
  select
    pickup_location_id,
    approx_quantiles(safe_divide(tip_amount, fare_amount), 2)[ordinal(2)]
      as pickup_median_tip_percent,
  from `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
  where payment_type = "1" -- Credit card
  group by 1
)
select *
from zones_deduped z
left join pickup_zone_stats p on z.zone_id = p.pickup_location_id

While this example involves taxi zones, there are many specialized boundaries that exist across various sectors and businesses: electoral districts, school districts, hospital referral regions, and flood risk zones, for instance.

Clustering severe storms

Finally, suppose we want to understand where in the U.S. different types of severe storms tend to occur. Rather than visualize the individual storms, we want to visualize “clusters” of many storms within a given area. BigQuery’s geospatial functions come in handy here: We can assign storms to clusters using the ST_CLUSTERDBSCAN function and draw boundaries around them using the ST_CONVEXHULL function. Then we can visualize these polygons in Data Studio:

Map of severe storm clusters in the US
Clusters of severe storms, and most common storm type

The map helps us see how the frequency and type of severe storms vary from west to east, from flooding in the Bay Area, to hail storms in the Great Plains, to thunderstorms in the Midwest and East Coast. (If you’d prefer to avoid severe storms altogether, you might want to live in the Pacific Northwest, where drizzle is frequent but severe storms are rare.)

Here’s the query to get this data:

Language: SQL

  with storms as (
  select
    st_geogpoint(event_longitude, event_latitude) as geo_point,
    event_type
  from `bigquery-public-data.noaa_historic_severe_storms.storms_2019`
),
storms_clustered as (
  select
    geo_point,
    event_type,
    st_clusterdbscan(geo_point, 20000, 20) over() as cluster_num
  from storms
),
storm_cluster_polygons as (
  select
    cluster_num,
    st_convexhull(st_union_agg(geo_point)) as hull,
    approx_top_count(event_type, 1)[offset(0)].value as most_common_event_type
  from storms_clustered
  where cluster_num is not null
  group by cluster_num 
)
select *, st_area(hull) as cluster_area
from storm_cluster_polygons

Leave a Reply

Your email address will not be published. Required fields are marked *