I actually did an interesting test around this…. Up to 30% more data can propagate from intra tables, after daily tables have been created within 48hours since that daily table’s creation.
A nice rule of thumb is even if you run scheduled queries on daily-tables, write your logic of time ranges to stretch back at least 2–3 days since table creation. So knowing the 30% shortage due to all kinds of reasons, I addressed the same problem as you had, just from another angle. I love your approach as well and make more sense I guess if you absolutely can not wait 2/3 days for a data delay on your reports ☺
If you want you can test this 30% rule on your own data (as no dataset’s assumption will ever be 100% universal — hehe!).
Create a scheduled query on your intraday-table to run every hour. Write the results to a table and let it run for a week, because some days you have more traffic than other days, given your web/mobile-app.
On the other end write a daily snapshot of all existing records also to seperate tables via a scheduled query and compare the count of rows of each of the two approaches, to answer the following:
- which hour of the day brought about the highest amount of your data ingestion for your intraday table?
- what is the volume of rows of your daily table at 23rd/24th hour of intra vs permant table D0,D1,D2 after table creation
Super interesting exercise! Ping me the %/results at “@dirkjobosman” on twitter. Would be curious to know if you also get 30% or not ☺
I might just write an article at some point on this, so feel free to also follow me here ♥️