Forum OpenACS Q&A: More fooling with time-series data

Collapse
Posted by Andrew Piskorski on
The query you originally posted also fails to be sufficiently general to adapt to the situation where the previous day might be more than one day off. (Your day_date comparisons in the pseudo-tables only look for adjacent days.)

Daryl, I don't think so. Well, it wouldn't be the first time I've been wrong, but I'm pretty sure my ugly Oracle query works exactly as I originally described. Since it assigns a continuous sequence of integers to the days actually present, and joins on those integers, not on dates, it always gets the previous day, no matter how long ago the previous day was. E.g.:

ROW_NUM PREV_ROW_NUM DAY_DATE   PREV_DAY_D PRICE PREV_PRICE
------- ------------ ---------- ---------- ----- ----------
      5            4 2000-01-07 2000-01-06   101        101
      6            5 2000-01-10 2000-01-07   101        101

Now, the rest of your comments and concerns are quite valid. I came up with that ugly query while experimenting, I would think more than twice before actually putting something like that into production use.

I've also learned to be wary of pushing pieces of queries into PL/SQL procedures. That often makes the query clearer, so I like doing it during development. But I've seen that result in queries that are slower by an order of magnitude or more, I think because PL/SQL is completely opaque to the query optimizer.

Hm, Daryl, could you expand on your comment about "It's also using a behavior that Oracle doesn't guarantee (ORDER BY within a view)."? Do you know any more specifics about that?

I think Oracle does guarantee that "order by" works when defining a materialized view, so it would be pretty weird if it didn't work when defining an on the fly view. Then again, assuming that normal and materialized views always work the same way would be pretty foolish on my part...

Yeah, the holiday/calendar stuff you posted is smart. I was actually doing something similar inside my "test_data_view", in order to filter out any bogus dates (prices when the market was not open), and optionally fill in values for any missing days. Compared to my RowNum hack, that sort of thing is downright elegant. :)

But, gosh, you're right. I can use that same holiday/calendaring strategy to replace my RowNum hack - thanks for pointing that out. E.g.:

-- We can do the same query using max() instead of the RowNum
-- pseudo-column.  This might be considered clearer, and more correct.
select
  d.id
  ,d.day_date
  ,dp.day_date  as prev_day_date
  ,d.price
  ,dp.price  as prev_price
  ,(d.price / dp.price)
from test_data_view d   -- Today.
    ,test_data_view dp  -- Previous day.
where d.id = 0
  and d.id = dp.id
  and  d.day_date between '2000-01-01' and '2000-03-31'
  and dp.day_date between '2000-01-01' and '2000-03-31'
  and dp.day_date = (
    select max(d2.day_date)
    from test_data_view d2
    where d.day_date > d2.day_date
  )
;

But guess what? The above query is much slower - about 75 vs. 3 seconds. I didn't look at the query plans, but I'd guess that this lousy performance is due to one or both of these factors:

  1. We are joining views which themselves are doing joins, and Oracle is known to sometimes be very bad at optimizing this. Flattening everything out into a single SQL query or view definition would probably help.

  2. Joining on "max(d2.day_date)" is less specific than joining on "(d2.row_num + 1)", so it might be a lot slower unless we've got an index on day_date (which we do), and a really smart query optimizer.

Oh well. You know, all these solutions look like pretty brittle hacks to me. I suspect what we're really learning here, is that Oracle and standard SQL just don't support ordered data very well at all. As David W. pointed out, in a lot of these cases it would probably be better just to do calculations like (price / previous price) outside the database.