Quantcast

Materialized views WIP patch

classic Classic list List threaded Threaded
61 messages Options
12345
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
Robert Haas wrote:
> Josh Berkus wrote:

>> Being empty (in an inaccurate way) is just one kind of stale data.
>
> This is my feeling also.

If you had an MV summarizing Wisconsin courts cumulative case counts
by case type, "empty" would not have been a valid "stale" state for
over 150 years. That is a degree of staleness that IMV is not just a
quantitative degree of staleness, as if a nightly recalculation had
failed to occur, but a qualitatively different state entirely. While
you may or may not want to use the stale data if last night's regen
failed, and so it should be under application control, I can't
imagine a situation where you would want to proceed if the MV didn't
have data that had at some time been correct -- preferrably at some
time since the invention of digital electronic computers. Could you
provide an example where it would be a good thing to do so?

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Josh Berkus
In reply to this post by Tom Lane-2

> You could make that same claim about plain views, but in point of
> fact the demand for making them work in COPY has been minimal.
> So I'm not convinced this is an essential first-cut feature.
> We can always add it later.

Of course.  I just had the impression that we could support COPY FROM by
*deleting* a couple lines from Kevin's patch, rather than it being extra
work.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Tom Lane-2
Josh Berkus <[hidden email]> writes:
>> You could make that same claim about plain views, but in point of
>> fact the demand for making them work in COPY has been minimal.
>> So I'm not convinced this is an essential first-cut feature.
>> We can always add it later.

> Of course.  I just had the impression that we could support COPY FROM by
> *deleting* a couple lines from Kevin's patch, rather than it being extra
> work.

Even if it happens to be trivial in the current patch, it's an added
functional requirement that we might later regret having cavalierly
signed up for.  And, as noted upthread, relations that support only
one direction of COPY don't exist at the moment; that would be adding
an asymmetry that we might later regret, too.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Simon Riggs
In reply to this post by Kevin Grittner-4
On 16 November 2012 11:25, Kevin Grittner <[hidden email]> wrote:

>>> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>>> VIEW mat view_name. This seemed more descriptive to me that the
>>> alternatives and avoids declaring any new keywords beyond
>>> MATERIALIZED. If the MV is flagged as relisvalid == false, this
>>> will change it to true.
>>
>> UPDATE MATERIALIZED VIEW was problematic?
>
> Not technically, really, but I saw two reasons that I preferred LOAD MV:
>
> 1. It seems to me to better convey that the entire contents of the MV
>    will be built from scratch, rather than incrementally adjusted.
> 2. We haven't hashed out the syntax for more aggressive maintenance of
>    an MV, and it seemed like UPDATE MV might be syntax we would want to
>    use for something which updated selected parts of an MV when we do.
>
>> Does LOAD automatically TRUNCATE the view before reloading it? If not,
>> why not?
>
> It builds a new heap and moves it into place. When the transaction
> running LMV commits, the old heap is deleted. In implementation it is
> closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
> creating a new table. This allows all permissions, etc., to stay in
> place.

This seems very similar to the REPLACE command we discussed earlier,
except this is restricted to Mat Views.

If we're going to have this, I would prefer a whole command.

e.g. REPLACE matviewname REFRESH

that would also allow

REPLACE tablename AS query

Same thing under the covers, just more widely applicable and thus more useful.



Either way, I don't much like overloading the use of LOAD, which
already has a very different meaning.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Albe Laurenz *EXTERN*
In reply to this post by Kevin Grittner-4
Josh Berkus wrote:
>> It would be nice for the user to have some way to know that a matview is
>> empty due to never being LOADed or recently being TRUNCATEd. However, I
>> don't think that relisvalid flag -- and preventing scanning the relation
>> -- is a good solution. What I'd rather have instead is a timestamp of
>> when the MV was last LOADed. If the MV was never loaded (or was
>> truncated) that timestamp would be NULL. Such a timestamp would allow
>> users to construct all kinds of ad-hoc refresh schemes for MVs which
>> would not be possible without it.

+1

Kevin Grittner wrote:
> I see your point there; I'll think about that. My take was more that MVs
> would often be refreshed by crontab, and that you would want to keep
> subsequent steps from running and generating potentially plausible but
> completely inaccurate results if the LMV failed.

If one of these subsequent steps doesn't care if refresh
failed once, it shouldn't be forced to fail.  I imagine
that for many applications yesterday's data can be good enough.

Those that care should check the timestamp.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Albe Laurenz wrote:

> Kevin Grittner wrote:
>
>> My take was more that MVs would often be refreshed by crontab, and
>> that you would want to keep subsequent steps from running and
>> generating potentially plausible but completely inaccurate results
>> if the LMV failed.
>
> If one of these subsequent steps doesn't care if refresh
> failed once, it shouldn't be forced to fail. I imagine
> that for many applications yesterday's data can be good enough.
>
> Those that care should check the timestamp.

It sounds like you and I are in agreement on this; I just didn't
state it very precisely. If a LMV on a MV which already has data
fails, the relisvalid would not prevent it from being used -- it
would be stale, but still valid data from *some* point in time. The
point is that if an MV is created WITH NO DATA or has been TRUNCATEd
and there has not been a subsequent LMV, what it contains may not
represent any state which was *ever* valid, or it may represent a
state which would only have been valid hundreds of years in the past,
had the system been computerized at that time. To me, that does not
seem like the same thing as a simple "stale" state.

I'm looking at whether there is some reasonable way to detect invalid
data as well as capture age of data. Every solution I've thought of
so far has at least one hard-to-solve race condition, but I have
hopes that I can either solve that for one of the ideas, or come up
with an idea which falls more gracefully under MVCC management.

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Simon Riggs wrote:

> This seems very similar to the REPLACE command we discussed
> earlier, except this is restricted to Mat Views.

I don't remember that discussion -- do you have a reference?

> If we're going to have this, I would prefer a whole command.
>
> e.g. REPLACE matviewname REFRESH
>
> that would also allow
>
> REPLACE tablename AS query
>
> Same thing under the covers, just more widely applicable and thus
> more useful.

An interesting throught. I would have thought that if we were going
to allow changing the definition of an existing MV, we would be
better off with CREATE OR REPLACE MATERIALIZED VIEW. Either way, if
you allow the column types or the number of columns to be changed,
you do tend to run into issues if there are other MVs, views,
triggers, rules, etc., which depend on the MV, so I don't think it's
material for an initial patch. But it is worth considering which way
we might want to extend it.

> Either way, I don't much like overloading the use of LOAD, which
> already has a very different meaning.

Well, it's hard to avoid creating new keywords without overloading
the meaning of exsiting ones. Personally I didn't find

 ¬†LOAD MATERIALIZED VIEW matview_name;

to be very easy to confuse with

 ¬†LOAD 'filename';

But that's a subjective thing. If too many people find that
confusing, it may be worth creating a new keyword; but I wanted to
see whether it was really necessary first.

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Tom Lane-2
"Kevin Grittner" <[hidden email]> writes:
> Simon Riggs wrote:
>> Either way, I don't much like overloading the use of LOAD, which
>> already has a very different meaning.

> Well, it's hard to avoid creating new keywords without overloading
> the meaning of exsiting ones.

FWIW, I'd much rather see us overload LOAD (which is seldom used)
than REPLACE (which might in the future become a widely-used DML
command).

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Josh Berkus
In reply to this post by Kevin Grittner-4
Kevin,

> I'm looking at whether there is some reasonable way to detect invalid
> data as well as capture age of data. Every solution I've thought of
> so far has at least one hard-to-solve race condition, but I have
> hopes that I can either solve that for one of the ideas, or come up
> with an idea which falls more gracefully under MVCC management.

What's the race condition?  I'd think that LOAD would take an exclusive
lock on the matview involved.

>   LOAD MATERIALIZED VIEW matview_name;
>
> to be very easy to confuse with
>
>   LOAD 'filename';
>
> But that's a subjective thing. If too many people find that
> confusing, it may be worth creating a new keyword; but I wanted to
> see whether it was really necessary first.

I do not find them confusing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Josh Berkus
On 11/19/12 9:57 AM, Josh Berkus wrote:

> Kevin,
>
>> I'm looking at whether there is some reasonable way to detect invalid
>> data as well as capture age of data. Every solution I've thought of
>> so far has at least one hard-to-solve race condition, but I have
>> hopes that I can either solve that for one of the ideas, or come up
>> with an idea which falls more gracefully under MVCC management.
>
> What's the race condition?  I'd think that LOAD would take an exclusive
> lock on the matview involved.

BTW, another thought on the timestamp: while it would be better to have
a lastrefresh timestamp in pg_class, the other option is to have an
extra column in the matview (pg_last_update).  While that would involve
some redundant storage, it would neatly solve the issues around unlogged
matviews; the timestamp and the data would vanish at the same time.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Tom Lane wrote:

> "Kevin Grittner" <[hidden email]> writes:
>> Josh Berkus wrote:
>>> What use would a temporary matview be?
>
>> It would be essentially like a temporary table, with all the same
>> persistence options. I'm not really sure how often it will be more
>> useful than a temporary table before we have incremental
>> maintenance of materialized views; once we have that, though, it
>> seems likely that there could be reasonable use cases.
>
> One of the principal attributes of a temp table is that its
> contents aren't (reliably) accessible from anywhere except the
> owning backend. Not sure where you're going to hide the incremental
> maintenance in that scenario.

The more I think about that, the less sensible temporary MVs seem.
Unless I can figure out some reasonable use case, I'll diable that in
the next version of the patch.

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Marko Tiikkaja-3
In reply to this post by Kevin Grittner-4
Hi Kevin,

On 15/11/2012 03:28, Kevin Grittner wrote:
> Attached is a patch that is still WIP but that I think is getting
> pretty close to completion.

I've been looking at this, but I unfortunately haven't had as much time
as I had hoped for, and have not looked at the code in detail yet.  It's
also a relatively big patch, so I wouldn't mind another pair of eyes on it.


I have been testing the patch a bit, and I'm slightly disappointed by
the fact that it still doesn't solve this problem (and I apologize if I
have missed discussion about this in the docs or in this thread):

<assume "foo" is a non-empty materialized view>

T1: BEGIN;
T1: LOAD MATERIALIZED VIEW foo;

T2: SELECT * FROM foo;

T1: COMMIT;

<T2 sees an empty table>


As others have pointed out, replacing the contents of a table is
something which people have been wanting to do for a long time, and I
think having this ability would make this patch a lot better; now it
just feels like syntactic sugar.

> 1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
>      TABLE AS, with all the same clauses supported. That includes
>      declaring a materialized view to be temporary or unlogged.
> 2.  MVs don't support inheritance.
> 3.  MVs can't define foreign keys.
> 4.  MVs can't be the target of foreign keys.
> 5.  MVs can't have triggers.
> 6.  Users can't create rules which reference MVs (although MVs
>      [ab]use the rules mechanism internally, similar to how views do).
> 7.  MVs can't be converted to views, nor vice versa.
> 8.  Users may not directly use INSERT/UPDATE/DELETE on an MV.
> 9.  MVs can't directly be used in a COPY statement, but can be the
>      source of data using a SELECT.
> 10. MVs can't own sequences.
> 11. MVs can't be the target of LOCK statements, although other
>      statements get locks just like a table.
> 12. MVs can't use data modifying CTEs in their definitions.
> 13. pg_class now has a relisvalid column, which is true if an MV is
>      truncated or created WITH NO DATA. You can not scan a relation
>      flagged as invalid.
> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed
>      to make sense. For example, you can change the tablespace or
>      schema, but you cannot add or drop column with ALTER.
> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>      VIEW mat view_name. This seemed more descriptive to me that the
>      alternatives and avoids declaring any new keywords beyond
>      MATERIALIZED. If the MV is flagged as relisvalid == false, this
>      will change it to true.
> 17. Since the data viewed in an MV is not up-to-date with the latest
>      committed transaction, it didn't seem to make any sense to try to
>      apply SERIALIZABLE transaction semantics to queries looking at
>      the contents of an MV, although if LMV is run in a SERIALIZABLE
>      transaction the MV data is guaranteed to be free of serialization
>      anomalies. This does leave the transaction running the LOAD
>      command vulnerable to serialization failures unless it is also
>      READ ONLY DEFERRABLE.
> 18. Bound parameters are not supported for the CREATE MATERIALIZED
>      VIEW statement.

I believe all of these points have been under discussion, and I don't
have anything to add to the ongoing discussions.

> 19. LMV doesn't show a row count. It wouldn't be hard to add, it just
>      seemed a little out of place to do that, when CLUSTER, etc.,
>      don't.

This sounds like a useful feature, but your point about CLUSTER and
friends still stands.

> In the long term, we will probably need to separate the
> implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but
> for now there is so little that they need to do differently it seemed
> less evil to have a few "if" clauses that that much duplicated code.

Seems sensible.

I'll get back when I manage to get a better grasp of the code.


Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Robert Haas
On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja <[hidden email]> wrote:
> As others have pointed out, replacing the contents of a table is something
> which people have been wanting to do for a long time, and I think having
> this ability would make this patch a lot better; now it just feels like
> syntactic sugar.

I agree that it's mostly syntactic sugar, but I think we need to have
realistic expectations for what is possible in an initial patch.  When
I committed the first patch for foreign data wrappers, it didn't work
at all: it was just syntax support.  Tom later committed a follow-on
patch that made them work.  Similarly, I split the event trigger patch
into two halves, one of which added the syntax support and the other
of which made them functional: and even with both commits in, I think
it's fair to say that event triggers are still in a fairly primitive
state.

None of those patches were small patches.  It's going to take multiple
years to get materialized views up to a state where they're really
useful to a broad audience in production applications, but I don't
think we should sneer at anyone for writing a patch that is "just
syntactic sugar".  As it turns out, adding a whole new object type is
a lot of work and generates a big patch even if it doesn't do much
just yet.  Rejecting such patches on the grounds that they aren't
comprehensive enough is, IMHO, extremely unwise; we'll either end up
landing even larger patches that are almost impossible to review
comprehensively and therefore more likely to break something, or else
we'll kill the projects outright and end up with nothing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Marko Tiikkaja-3
On 11/26/12 2:07 PM, Robert Haas wrote:

> On Sun, Nov 25, 2012 at 7:30 PM, Marko Tiikkaja <[hidden email]> wrote:
>> As others have pointed out, replacing the contents of a table is something
>> which people have been wanting to do for a long time, and I think having
>> this ability would make this patch a lot better; now it just feels like
>> syntactic sugar.
>
> I agree that it's mostly syntactic sugar, but I think we need to have
> realistic expectations for what is possible in an initial patch.  When
> I committed the first patch for foreign data wrappers, it didn't work
> at all: it was just syntax support.  Tom later committed a follow-on
> patch that made them work.  Similarly, I split the event trigger patch
> into two halves, one of which added the syntax support and the other
> of which made them functional: and even with both commits in, I think
> it's fair to say that event triggers are still in a fairly primitive
> state.
>
> None of those patches were small patches.  It's going to take multiple
> years to get materialized views up to a state where they're really
> useful to a broad audience in production applications, but I don't
> think we should sneer at anyone for writing a patch that is "just
> syntactic sugar".  As it turns out, adding a whole new object type is
> a lot of work and generates a big patch even if it doesn't do much
> just yet.  Rejecting such patches on the grounds that they aren't
> comprehensive enough is, IMHO, extremely unwise; we'll either end up
> landing even larger patches that are almost impossible to review
> comprehensively and therefore more likely to break something, or else
> we'll kill the projects outright and end up with nothing.

First of all, I have to apologize.  Re-reading the email I sent out last
night, it does indeed feel a bit harsh and I can understand your reaction.

At no point did I mean to belittle Kevin's efforts or the patch itself.
  I was mostly looking for Kevin's input on how hard it would be to
solve the particular problem and whether it would be possible to do so
for 9.3.

While I feel like the problem I pointed out is a small caveat and should
be at least documented for 9.3, I think this patch has merits of its own
even if that problem never gets fixed, and I will continue to review
this patch.


Regards,
Marko Tiikkaja


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Peter Geoghegan-2
In reply to this post by Robert Haas
On 26 November 2012 13:07, Robert Haas <[hidden email]> wrote:
> None of those patches were small patches.  It's going to take multiple
> years to get materialized views up to a state where they're really
> useful to a broad audience in production applications, but I don't
> think we should sneer at anyone for writing a patch that is "just
> syntactic sugar".

+1. I have a sweet tooth. I don't like it when people criticise
patches on the basis of "obviously you could achieve the same effect
with $CONVOLUTION". Making things simpler is a desirable outcome. Now,
that isn't to say that we should disregard everything or even anything
else in pursuit of simplicity; just that "needing a Ph.D is
Postgresology", as you once put it, to do something routine to many is
really hard to defend.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

12345
Loading...