November 4, 2019
Part of the fun of using cloud and big data together is exploring and experimenting with all the tools and datasets that are now available. There are lots of public datasets, including those hosted on Google Cloud, available for exploration. You can choose your favorite environment and language to delve into those datasets to find new and interesting results. We’re excited to announce that our newest BigQuery ML competition, available on Kaggle, is open for you to show off your data analytics skills.
If you haven’t used Kaggle before, you’ll find a ready-to-use notebook environment with a ton of community-published data and public code: more than 19,000 public datasets and 200,000 notebooks. Since we recently announced an integration between Kaggle kernels and BigQuery, you can now use BigQuery right from within Kaggle and take advantage of tools like BigQuery ML, which lets you train and serve ML models right in BigQuery. BigQuery uses standard SQL queries, so it’s easy to get started if you haven’t used it before.
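To make the "ML in standard SQL" idea concrete, here is a minimal sketch that assembles a BigQuery ML `CREATE MODEL` statement as a Python string. The dataset, table, and column names are hypothetical placeholders rather than the competition's actual schema, and the linear regression model type is just one possible choice.

```python
def build_create_model_sql(model_name, source_table, label_column, feature_columns):
    """Return a BigQuery ML CREATE MODEL statement as a string."""
    # The training query selects the feature columns plus the label column.
    select_list = ",\n  ".join(list(feature_columns) + [label_column])
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = 'linear_reg', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM `{source_table}`"
    )


# Hypothetical dataset, table, and column names, used only for illustration.
sql = build_create_model_sql(
    model_name="my_dataset.congestion_model",
    source_table="my_dataset.train",
    label_column="total_time_stopped",
    feature_columns=["hour", "weekend", "month"],
)
print(sql)
```

With the `google-cloud-bigquery` client library installed and credentials configured, a statement like this could be submitted with `bigquery.Client().query(sql).result()`. Inside a Kaggle notebook the setup may differ, so treat this as a starting point only.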
Green means go: getting started
Now, on to the competition details. You’ll find a dataset from Geotab as your starting point. We’re excited to partner with Geotab, which provides a variety of aggregate datasets gathered from commercial vehicle telematics devices. The dataset for this competition includes aggregate stopped vehicle information and intersection wait times, gathered from trip logging metrics from vehicles like semi-trucks. The data have been grouped by intersection, month, hour of day, direction driven through the intersection, and whether the day was on a weekend or not. Your task is to predict congestion based on an aggregate measure of stopping distance and waiting times at intersections in four major U.S. cities: Atlanta, Boston, Chicago and Philadelphia.
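To picture what "grouped by intersection, month, hour of day, direction, and weekend" means in practice, the sketch below aggregates a handful of made-up stop observations along those five dimensions. The records, field order, and use of a simple mean are all assumptions for illustration; the post does not describe how Geotab actually computes its aggregates.

```python
from collections import defaultdict
from statistics import mean

# Made-up observations:
# (intersection, month, hour, direction, is_weekend, seconds_stopped)
trips = [
    ("A", 6, 8, "N", False, 30.0),
    ("A", 6, 8, "N", False, 50.0),
    ("A", 6, 8, "S", False, 10.0),
    ("B", 7, 17, "E", True, 20.0),
]


def aggregate_wait_times(records):
    """Average the stopped time for each combination of grouping dimensions."""
    groups = defaultdict(list)
    for intersection, month, hour, direction, is_weekend, seconds in records:
        key = (intersection, month, hour, direction, is_weekend)
        groups[key].append(seconds)
    return {key: mean(values) for key, values in groups.items()}


summary = aggregate_wait_times(trips)
for key, avg_seconds in sorted(summary.items()):
    print(key, avg_seconds)
```

Each key in `summary` corresponds to one row of an aggregate table of this shape, which is the level of detail a model for this task would work with.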
Geotab’s data serves as an intriguing basis for this challenge. “Cities have a wealth of data organically available. We’re excited about this competition because it truly puts a data scientist’s creativity to the test to pull in data from interesting external resources available to them,” says Mike Branch, vice president of data and analytics at Geotab. “This competition truly shows the democratization of AI to ease the accessibility of machine learning by positioning the power of ML alongside familiar SQL syntax. We hope that this inspires cities to see the art of the possible through data that is readily available to them to help drive insight into congestion.”
Take a look at the competition page for more information on how to make a submission.
Thinking big to win
But, you may be asking, how do I win this competition? We’d encourage you to think beyond the raw dataset provided to see what other elements can steer you toward success. There are lots of other external public datasets that may spark your imagination. What happens if you join weather or construction data? How about open maps datasets? What other Geotab datasets are available? Is there public city data that you can mine?
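As one way to think about joining external data, the sketch below attaches monthly weather values to intersection rows by city and month. Every value and field name here is a made-up placeholder, not real weather or competition data, and a per-city monthly join is only one of many possible granularities.

```python
# Placeholder intersection rows; field names are hypothetical.
rows = [
    {"city": "Boston", "month": 1, "hour": 8, "wait_seconds": 25.0},
    {"city": "Atlanta", "month": 7, "hour": 17, "wait_seconds": 40.0},
    {"city": "Chicago", "month": 1, "hour": 8, "wait_seconds": 30.0},
]

# Placeholder weather lookup keyed by (city, month); values are invented.
weather = {
    ("Boston", 1): {"avg_temp_c": -1.0, "precip_mm": 85.0},
    ("Atlanta", 7): {"avg_temp_c": 27.0, "precip_mm": 120.0},
}


def left_join_weather(intersection_rows, weather_by_city_month):
    """Add weather columns to each row, using None where no match exists."""
    joined = []
    for row in intersection_rows:
        extra = weather_by_city_month.get((row["city"], row["month"]))
        merged = dict(row)  # copy so the input rows are not modified
        merged["avg_temp_c"] = extra["avg_temp_c"] if extra else None
        merged["precip_mm"] = extra["precip_mm"] if extra else None
        joined.append(merged)
    return joined


joined = left_join_weather(rows, weather)
for row in joined:
    print(row)
```

Keeping unmatched rows, as a left join does, avoids silently dropping training examples when the external dataset has gaps. In BigQuery itself the same idea would be a `LEFT JOIN` on the shared keys.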
You can also find some useful resources here as you’re getting started:
Along with bragging rights, you can win GCP coupon awards with your submission in two categories: BigQuery ML models built in SQL or BigQuery ML models built in TensorFlow. To get started with BigQuery if you’re not using it already, try the BigQuery sandbox option for 10 GB of free storage, 1 TB per month of query processing, and 10 GB of BigQuery ML model creation queries.
You can get all the submission file guidelines and deadlines on the competition page, plus a discussion forum and leaderboard. You’ve got till the end of the year to submit your idea. Have fun, and make sure to show off your skills when you’re done!