Signalflow
2 min readApr 2, 2021

--

I actually did an interesting test around this…. Up to 30% more data can propagate from intra tables, after daily tables have been created within 48hours since that daily table’s creation.

A nice rule of thumb is even if you run scheduled queries on daily-tables, write your logic of time ranges to stretch back at least 2–3 days since table creation. So knowing the 30% shortage due to all kinds of reasons, I addressed the same problem as you had, just from another angle. I love your approach as well and make more sense I guess if you absolutely can not wait 2/3 days for a data delay on your reports ☺

If you want you can test this 30% rule on your own data (as no dataset’s assumption will ever be 100% universal — hehe!).

Create a scheduled query on your intraday-table to run every hour. Write the results to a table and let it run for a week, because some days you have more traffic than other days, given your web/mobile-app.

On the other end write a daily snapshot of all existing records also to seperate tables via a scheduled query and compare the count of rows of each of the two approaches, to answer the following:

  • which hour of the day brought about the highest amount of your data ingestion for your intraday table?
  • what is the volume of rows of your daily table at 23rd/24th hour of intra vs permant table D0,D1,D2 after table creation

Super interesting exercise! Ping me the %/results at “@dirkjobosman” on twitter. Would be curious to know if you also get 30% or not ☺

I might just write an article at some point on this, so feel free to also follow me here ♥️

--

--

Signalflow
Signalflow

Written by Signalflow

Insightfull tasks to optimise your startup’s worflows.

No responses yet