Level Up Data Minions!

Signalflow
5 min readJan 5, 2019

Quickly visualize dynamic data with Google Drive, Sheets, BigQuery & Redash

Your business is growing up and its’ needs are becoming increasingly more sophisticated. As a data minion (my quirky reference to data analysts / scientists / engineers) you can quickly add value to your company without breaking the bank.

Tools, Glorious Tools!

With this tutorial you will be able to start your company’s data warehouse solution lean, and scale thereafter. The tools you need for this workflow are:

  • Google Account for BigQuery, Drive, Sheets and Cloud Console
  • Redash Hosted Account (or Redash Self-Hosted Account)

Steps to complete this workflow:

Step 1: Create a Sheet

  • Create data in your Google Spreadsheet

Step 2: Create a new GCP Project

Step 3: Switch on API access

  • Navigate to your projects’ API section and enable Google Drive API. Enable BigQuery API if necessary.

Step 4: Create a Redash Account

Step 5: Link Redash & BigQuery

  • Follow the Redash guidelines to create a BigQuery service account. Thereafter upload the .json-key to allow Redash access to Google BigQuery through an encryption layer. Test access to make sure your connection was set-up correctly.

Step 6: Share sheet-access to your service account email

  • Once you have setup the data source, you can load spreadsheets into Redash. To do so, you need to share the spreadsheet with the Service Account’s email address. This can be found in the Google Sheets API credentials page or in the JSON file under the "client_email" key. Sharing is done like you would share with any regular user (Source).

Step 7: Set-up the BigQuery Dataset and Table

  • In the BigQuery Web UI, click on a dataset and choose to create a table.
  • Choose the source as “Google Drive”.
  • Paste the URL for your Sheet into the location bar. Note: Make sure you copy the URL from the worksheet in Sheets that you want to serve as the table.
  • Choose either CSV or Sheets as the format. Note: CSV format will allow you to check “Auto-detect Schema.” Sheets’ format will allow you to specify the column names and types.
  • Choose the correct data schema / format per column.
  • Click “Create Table”

Step 8: Test BigQuery-GDrive connection by running a query in BQ Web UI

  • Your GSheet should have no headers, otherwise your query will not work
  • Make sure your DateColumn is set to “Date format” and not “Number format” in your GSheet.

Step 9: Navigate to Redash, write a query and run the output

  • Test BigQuery-GDrive-GSheet connection by running a query in Redash

Step 10: Update your GSheet and re-run your query in Redash

  • Your Redash output should contain the newest data which is in your Gsheet (below)

Step 11: Create a Redash visualisation on top of your data

Step 12: Create a dashboard and add the graph to it

Insights:

Data is like water. If it sits around it typically becomes stale. If you adapt your company’s data to continually flow through all the various departments; data will turn into insights and insights will impact your roadmap unimaginably.

There are various ways to approach data needs within a company. To keep things simple, one could think of data for ad hoc analysis and reporting data. We will focus only on the latter.

Reporting data could be divided into 3 columns:

  • Column A: This process should be defined at least once a year by the management / shareholder team. It will force them to focus on metrics which are crucial for the company and hold each team accountable for their own projects.
  • Column B: This process should as far as possible be automated, because it is simply impossible to efficiently scale insights to all departments in near real-time, if it relies on manual data collection and updates. By automating we limit human errors.
  • Column C: To further assist management / shareholders with decision-making, an additional layer of algorithmic prediction could be applied on-top of actual data. It makes sense to forecast only 3 months into the future.

Lastly, teams are often very busy to view metrics, one could think about setting up a “re-active alert system” or “summary updates” at various time intervals of B & C and automatically notify each team over a slack channel.

Future enhancements:

  • Expand your data sources: Shareholder forecasts (i.e. x5 Monthly Registrations by year 3), Survey data (i.e. NPS), External data via an API (i.e. Play / App Store Ratings), External data via Crawl (i.e. Amazon-crawl), etc.
  • Schedule App Script js-code to access a rest api.
  • Schedule App Script js-code to push sheet data to your SQLite / MySQL DB.
  • Schedule a cron-job on your server to pull data from your GSheet.
  • Schedule Google BigQuery meta-metric DBs.
  • Apply Spreadsheet Forecasting Function(s) to your dataset.
  • Apply BigQuery Forecasting Function(s) to your dataset.
Stay rockstar and go apply your new found insights minion!

Who in this world does not like feedback?

  • ⚡Comments: Is there something I could improve or perhaps address in a next post? Let your fingers do the talking and please drop me a comment below.
  • ⚡Followers: If you loved this post, please consider following me on Medium and Twitter for future posts on various indie tasks, workflows, and open source tech.
  • ⚡Readers: If you found this article helpful, please feel free to also scan over some of my other workflows.

--

--

Signalflow

Insightfull tasks to optimise your startup’s worflows.