Quantcast

Materialized views WIP patch

classic Classic list List threaded Threaded
61 messages Options
1234
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Materialized views WIP patch

Kevin Grittner-4
Attached is a patch that is still WIP but that I think is getting
pretty close to completion. It is not intended to be the be-all and
end-all for materialized views, but the minimum useful feature set --
which is all that I've had time to do for this release. In
particular, the view is only updated on demand by a complete rebuild.
For the next release, I hope to build on this base to allow more
eager and incremental updates, and perhaps a concurrent batch update.

1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
    TABLE AS, with all the same clauses supported. That includes
    declaring a materialized view to be temporary or unlogged.
2.  MVs don't support inheritance.
3.  MVs can't define foreign keys.
4.  MVs can't be the target of foreign keys.
5.  MVs can't have triggers.
6.  Users can't create rules which reference MVs (although MVs
    [ab]use the rules mechanism internally, similar to how views do).
7.  MVs can't be converted to views, nor vice versa.
8.  Users may not directly use INSERT/UPDATE/DELETE on an MV.
9.  MVs can't directly be used in a COPY statement, but can be the
    source of data using a SELECT.
10. MVs can't own sequences.
11. MVs can't be the target of LOCK statements, although other
    statements get locks just like a table.
12. MVs can't use data modifying CTEs in their definitions.
13. pg_class now has a relisvalid column, which is true if an MV is
    truncated or created WITH NO DATA. You can not scan a relation
    flagged as invalid.
14. ALTER MATERIALIZED VIEW is supported for the options that seemed
    to make sense. For example, you can change the tablespace or
    schema, but you cannot add or drop column with ALTER.
15. The SELECT query used to define the MV may not contain a
    data-modifying CTE.
16. To get new data into the MV, the command is LOAD MATERIALIZED
    VIEW mat view_name. This seemed more descriptive to me that the
    alternatives and avoids declaring any new keywords beyond
    MATERIALIZED. If the MV is flagged as relisvalid == false, this
    will change it to true.
17. Since the data viewed in an MV is not up-to-date with the latest
    committed transaction, it didn't seem to make any sense to try to
    apply SERIALIZABLE transaction semantics to queries looking at
    the contents of an MV, although if LMV is run in a SERIALIZABLE
    transaction the MV data is guaranteed to be free of serialization
    anomalies. This does leave the transaction running the LOAD
    command vulnerable to serialization failures unless it is also
    READ ONLY DEFERRABLE.
18. Bound parameters are not supported for the CREATE MATERIALIZED
    VIEW statement.
19. LMV doesn't show a row count. It wouldn't be hard to add, it just
    seemed a little out of place to do that, when CLUSTER, etc.,
    don't.

I wasn't able to wrap up a few things in time for this commitfest:

 - Documentation is incomplete.
 - pg_dump support needs addtional dependencies added to properly
   handle MVs which are defined using other MVs.
 - pg_dump binary hasn't had a lot of attention yet.
 - There are no regression tests yet.
 - I ran into problems getting the validity check working right, so I
   have disabled it by commenting out the function body in this
   patch.
 - TRUNCATE should probably support a MATERIALIZED VIEW clause.

It would be good to have some discussion to try to reach a consensus
about whether we need to differentiate between *missing* datat (where
a materialized view which has been loaded WITH NO DATA or TRUNCATEd
and has not been subsequently LOADed) and potentially *stale* data.
If we don't care to distinguish between a view which generated no
rows when it ran and a one for which the query has not been run, we
can avoid adding the relisvalid flag, and we could support UNLOGGED
MVs. Perhaps someone can come up with a better solution to that
problem.

In the long term, we will probably need to separate the
implementation of CREATE TABLE AS and CREATE MATERIALIZED VIEW, but
for now there is so little that they need to do differently it seemed
less evil to have a few "if" clauses that that much duplicated code.

The paint is pretty wet still, but hopefully people can evaluate the
approach and work out any issues with the design choices in the CF so
that it can be wrapped up nicely for the next one.

 92 files changed, 2377 insertions(+), 440 deletions(-)

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

matview-v1.patch (242K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Alvaro Herrera-9
Kevin Grittner wrote:

Interesting stuff.

>   /*
> +  * SetRelationIsValid
> +  * Set the value of the relation's relisvalid field in pg_class.
> +  *
> +  * NOTE: caller must be holding an appropriate lock on the relation.
> +  * ShareUpdateExclusiveLock is sufficient.
> +  *
> +  * NOTE: an important side-effect of this operation is that an SI invalidation
> +  * message is sent out to all backends --- including me --- causing plans
> +  * referencing the relation to be rebuilt with the new list of children.
> +  * This must happen even if we find that no change is needed in the pg_class
> +  * row.
> +  */
> + void
> + SetRelationIsValid(Oid relationId, bool relisvalid)
> + {

It's not clear to me that it's right to do this by doing regular heap
updates here instead of heap_inplace_update.  Also, I think this might
end up causing a lot of pg_class tuple churn (at least for matviews that
delete rows at xact end), which would be nice to avoid.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Greg Smith-21
In reply to this post by Kevin Grittner-4
On 11/14/12 6:28 PM, Kevin Grittner wrote:
>   - Documentation is incomplete.
>   ...
>   - There are no regression tests yet.

Do you have any simple test cases you've been using you could attach?
With epic new features like this, when things don't work it's hard to
distinguish between "that just isn't implemented yet" and "the author
never tested that".  Having some known good samples you have tested,
even if they're not proper regression tests, would be helpful for
establishing the code baseline works.

--
Greg Smith   2ndQuadrant US    [hidden email]   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Josh Berkus
In reply to this post by Kevin Grittner-4
Kevin,

> Attached is a patch that is still WIP but that I think is getting
> pretty close to completion. It is not intended to be the be-all and
> end-all for materialized views, but the minimum useful feature set --
> which is all that I've had time to do for this release. In
> particular, the view is only updated on demand by a complete rebuild.
> For the next release, I hope to build on this base to allow more
> eager and incremental updates, and perhaps a concurrent batch update.

Nice to see this come in!

> 1.  CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
>     TABLE AS, with all the same clauses supported. That includes
>     declaring a materialized view to be temporary or unlogged.

What use would a temporary matview be?

Unlogged is good.

> 2.  MVs don't support inheritance.

In which direction?  Can't inherit, or can't be inherited from?

> 3.  MVs can't define foreign keys.
> 4.  MVs can't be the target of foreign keys.
> 5.  MVs can't have triggers.

Makes sense.

> 9.  MVs can't directly be used in a COPY statement, but can be the
>     source of data using a SELECT.

Hmmm? I don't understand the reason for this.

> 13. pg_class now has a relisvalid column, which is true if an MV is
>     truncated or created WITH NO DATA. You can not scan a relation
>     flagged as invalid.

What error would a user see?

> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed
>     to make sense. For example, you can change the tablespace or
>     schema, but you cannot add or drop column with ALTER.

How would you change the definition of an MV then?

> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>     VIEW mat view_name. This seemed more descriptive to me that the
>     alternatives and avoids declaring any new keywords beyond
>     MATERIALIZED. If the MV is flagged as relisvalid == false, this
>     will change it to true.

UPDATE MATERIALIZED VIEW was problematic?

Does LOAD automatically TRUNCATE the view before reloading it?  If not,
why not?

> It would be good to have some discussion to try to reach a consensus
> about whether we need to differentiate between *missing* datat (where
> a materialized view which has been loaded WITH NO DATA or TRUNCATEd
> and has not been subsequently LOADed) and potentially *stale* data.
> If we don't care to distinguish between a view which generated no
> rows when it ran and a one for which the query has not been run, we
> can avoid adding the relisvalid flag, and we could support UNLOGGED
> MVs. Perhaps someone can come up with a better solution to that
> problem.

Hmmm.  I understand the distinction you're making here, but I'm not sure
it actually matters to the user.  MVs, by their nature, always have
potentially stale data.  Being empty (in an inaccurate way) is just one
kind of stale data.

It would be nice for the user to have some way to know that a matview is
empty due to never being LOADed or recently being TRUNCATEd.  However, I
don't think that relisvalid flag -- and preventing scanning the relation
-- is a good solution.  What I'd rather have instead is a timestamp of
when the MV was last LOADed.  If the MV was never loaded (or was
truncated) that timestamp would be NULL.  Such a timestamp would allow
users to construct all kinds of ad-hoc refresh schemes for MVs which
would not be possible without it.

I don't see how this relates to UNLOGGED matviews either way.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Jeff Davis-8
In reply to this post by Kevin Grittner-4
On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote:
> Attached is a patch that is still WIP but that I think is getting
> pretty close to completion. It is not intended to be the be-all and
> end-all for materialized views, but the minimum useful feature set --
> which is all that I've had time to do for this release. In
> particular, the view is only updated on demand by a complete rebuild.
> For the next release, I hope to build on this base to allow more
> eager and incremental updates, and perhaps a concurrent batch update.

The documentation says that a materialized view is basically a
create-table-as-select except that it remembers the query. Would you say
that there is a compelling use case for this alone, or is this a
building block for more sophisticated materialized view support (e.g.
eager updating) later?

Regards,
        Jeff Davis



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Dimitri Fontaine-7
Jeff Davis <[hidden email]> writes:
> The documentation says that a materialized view is basically a
> create-table-as-select except that it remembers the query. Would you say
> that there is a compelling use case for this alone, or is this a
> building block for more sophisticated materialized view support (e.g.
> eager updating) later?

The implementation of the re-LOAD'ing command makes it already
worthwile. Bonus point if locking is limited to when the new content is
all computer and ready, but even without that, I want to have it. ;)

I'd bikeshed and prefer the UPDATE MATERIALIZED VIEW nsp.foo; of course.

The alternative is creating a view, a matching table and a stored
procedure that will implement the rebuilding, for each mat view you want
to have. So that's already a big step forward in my eyes.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Alvaro Herrera wrote:

> It's not clear to me that it's right to do this by doing regular heap
> updates here instead of heap_inplace_update. Also, I think this might
> end up causing a lot of pg_class tuple churn (at least for matviews that
> delete rows at xact end), which would be nice to avoid.

If we keep the flag, I will look into heap_inplace_update.

Thanks!

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Greg Smith wrote:

> On 11/14/12 6:28 PM, Kevin Grittner wrote:
>> - Documentation is incomplete.
>> ...
>> - There are no regression tests yet.
>
> Do you have any simple test cases you've been using you could attach?
> With epic new features like this, when things don't work it's hard to
> distinguish between "that just isn't implemented yet" and "the author
> never tested that". Having some known good samples you have tested,
> even if they're not proper regression tests, would be helpful for
> establishing the code baseline works.

I can probably post something along those lines Monday. Sorry for the
delay. Basically, though, I tried to state everything that I know of
which is not yet done and working; so if you find something which
doesn't behave as you would expect, please let me know. It's well
within the realm of possibility that there are issues that I didn't
think of. I certainly can fall  into the tendency of programmers to
think about testing those things which I thought to cover in the code.

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Merlin Moncure-2
In reply to this post by Dimitri Fontaine-7
On Fri, Nov 16, 2012 at 7:13 AM, Dimitri Fontaine
<[hidden email]> wrote:

> Jeff Davis <[hidden email]> writes:
>> The documentation says that a materialized view is basically a
>> create-table-as-select except that it remembers the query. Would you say
>> that there is a compelling use case for this alone, or is this a
>> building block for more sophisticated materialized view support (e.g.
>> eager updating) later?
>
> The implementation of the re-LOAD'ing command makes it already
> worthwile. Bonus point if locking is limited to when the new content is
> all computer and ready, but even without that, I want to have it. ;)

Seconded.  Background lock free refresh of materialization table would
be wonderful.  But moving dependency between source and materialized
table out of plpgsql function and into defined schema justifies
feature on its own merits.

merlin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Josh Berkus wrote:

>> 1. CREATE MATERIALIZED VIEW syntax is stolen directly from CREATE
>> TABLE AS, with all the same clauses supported. That includes
>> declaring a materialized view to be temporary or unlogged.
>
> What use would a temporary matview be?

It would be essentially like a temporary table, with all the same
persistence options. I'm not really sure how often it will be more
useful than a temporary table before we have incremental maintenance
of materialized views; once we have that, though, it seems likely
that there could be reasonable use cases.

> Unlogged is good.

I agree that there are likely to be more use cases for this than
temp MVs. Unfortunately, I've had a hard time figuring out how to
flag an MV which is empty because its contents were lost after a
crash with preventing people from using an MV which hasn't been
populated, which has the potential to silently return incorrect
results.

>> 2. MVs don't support inheritance.
>
> In which direction? Can't inherit, or can't be inherited from?

The table inheritance has not been implemented in either direction
for MVs. It didn't seem clear to me that there were reasonable use
cases. Do you see any?

>> 9. MVs can't directly be used in a COPY statement, but can be the
>> source of data using a SELECT.
>
> Hmmm? I don't understand the reason for this.

Consistency. There are other object types which seem to enforce this
rule for no reason that I can see beyond maybe a desire to have both
directions of COPY work with the same set of objects. If I remember
correctly, allowing this would eliminate one line of code from the
patch, so if sentiment is in favor of it, it is very easily done.

>> 13. pg_class now has a relisvalid column, which is true if an MV is
>> truncated or created WITH NO DATA. You can not scan a relation
>> flagged as invalid.
>
> What error would a user see?

I can more directly answer that on Monday. If you enable the body of
the function which makes the relisvalid check you can see the messages.
I commented it out because I have not yet figured out how to suppress
the check for a LOAD MV command.

>> 14. ALTER MATERIALIZED VIEW is supported for the options that seemed
>> to make sense. For example, you can change the tablespace or
>> schema, but you cannot add or drop column with ALTER.
>
> How would you change the definition of an MV then?

At this point you would need to drop and re-create the MV. If we
want to add columns to an MV or change what an existing column holds,
perhaps we could have an ALTER MV which changed the SELECT statement
that populates the MV? I would prefer to leave that to a later patch,
though -- it seems like a bit of a minefield compared to what is
being implemented in this patch.

>> 16. To get new data into the MV, the command is LOAD MATERIALIZED
>> VIEW mat view_name. This seemed more descriptive to me that the
>> alternatives and avoids declaring any new keywords beyond
>> MATERIALIZED. If the MV is flagged as relisvalid == false, this
>> will change it to true.
>
> UPDATE MATERIALIZED VIEW was problematic?

Not technically, really, but I saw two reasons that I preferred LOAD MV:

1. It seems to me to better convey that the entire contents of the MV
   will be built from scratch, rather than incrementally adjusted.
2. We haven't hashed out the syntax for more aggressive maintenance of
   an MV, and it seemed like UPDATE MV might be syntax we would want to
   use for something which updated selected parts of an MV when we do.

> Does LOAD automatically TRUNCATE the view before reloading it? If not,
> why not?

It builds a new heap and moves it into place. When the transaction
running LMV commits, the old heap is deleted. In implementation it is
closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
creating a new table. This allows all permissions, etc., to stay in
place.

>> It would be good to have some discussion to try to reach a consensus
>> about whether we need to differentiate between *missing* datat (where
>> a materialized view which has been loaded WITH NO DATA or TRUNCATEd
>> and has not been subsequently LOADed) and potentially *stale* data.
>> If we don't care to distinguish between a view which generated no
>> rows when it ran and a one for which the query has not been run, we
>> can avoid adding the relisvalid flag, and we could support UNLOGGED
>> MVs. Perhaps someone can come up with a better solution to that
>> problem.
>
> Hmmm. I understand the distinction you're making here, but I'm not sure
> it actually matters to the user. MVs, by their nature, always have
> potentially stale data. Being empty (in an inaccurate way) is just one
> kind of stale data.

Robert feels the same way, but I disagree. Some MVs will not be terribly
volatile. In my view there is a big difference between having a "top ten"
list which might be based on yesterday's base tables rather than the base
table states as of this moment, and having a "top ten" list with no
entries. If you want to, for example, take some action if an order comes
in for one of your top customers, and a different path for other
customers, suddenly treating all of your long-time top customers as not
being so, without any squawk from the database, seems dangerous.

> It would be nice for the user to have some way to know that a matview is
> empty due to never being LOADed or recently being TRUNCATEd. However, I
> don't think that relisvalid flag -- and preventing scanning the relation
> -- is a good solution. What I'd rather have instead is a timestamp of
> when the MV was last LOADed. If the MV was never loaded (or was
> truncated) that timestamp would be NULL. Such a timestamp would allow
> users to construct all kinds of ad-hoc refresh schemes for MVs which
> would not be possible without it.

I see your point there; I'll think about that. My take was more that MVs
would often be refreshed by crontab, and that you would want to keep
subsequent steps from running and generating potentially plausible but
completely inaccurate results if the LMV failed.

> I don't see how this relates to UNLOGGED matviews either way.

UNLOGGED tables and indexes are made empty on crash recovery by copying
the initialization fork over the "normal" relations. Care was taken to
avoid needing to connect to each database in turn to complete that
recovery. This style of recovery can't really set the relisvalid flag, as
far as I can see; which leaves us choosing between unlogged MVs and
knowing whether they hold valid data -- unless someone has a better idea.

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Jeff Davis wrote:

> On Wed, 2012-11-14 at 21:28 -0500, Kevin Grittner wrote:
> > Attached is a patch that is still WIP but that I think is getting
> > pretty close to completion. It is not intended to be the be-all and
> > end-all for materialized views, but the minimum useful feature set --
> > which is all that I've had time to do for this release. In
> > particular, the view is only updated on demand by a complete rebuild.
> > For the next release, I hope to build on this base to allow more
> > eager and incremental updates, and perhaps a concurrent batch update.
>
> The documentation says that a materialized view is basically a
> create-table-as-select except that it remembers the query. Would you say
> that there is a compelling use case for this alone, or is this a
> building block for more sophisticated materialized view support (e.g.
> eager updating) later?

IMV, this has some slight value as it stands, although perhaps not enough
to justify a patch this big. The idea is that with this much in place,
patches to implement more aggressive and incremental maintenance of the
MV data become possible. So I think the bar it should pass for commit is
that it seems a sane basis for that, while providing some functionality
which people will find useful.

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Robert Haas
In reply to this post by Josh Berkus
On Thu, Nov 15, 2012 at 1:36 PM, Josh Berkus <[hidden email]> wrote:
> Hmmm.  I understand the distinction you're making here, but I'm not sure
> it actually matters to the user.  MVs, by their nature, always have
> potentially stale data.  Being empty (in an inaccurate way) is just one
> kind of stale data.

This is my feeling also.

> I don't see how this relates to UNLOGGED matviews either way.

Right now, Kevin has things set up so that when you do "TRUNCATE mv",
it clears the relisvalid flag.  If we allowed unlogged materialized
views, the table would be automatically truncated on a crash, but
there wouldn't be any way to clear relisvalid in that situation, so
Kevin felt we should simply disable unlogged MVs.  Personally, I'm not
excited about having a relisvalid flag at all, and doubly not excited
if it means we can't have unlogged MVs.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Thom Brown-2
In reply to this post by Kevin Grittner-4
On 16 November 2012 16:25, Kevin Grittner <[hidden email]> wrote:
Josh Berkus wrote:

> Unlogged is good.

I agree that there are likely to be more use cases for this than
temp MVs. Unfortunately, I've had a hard time figuring out how to
flag an MV which is empty because its contents were lost after a
crash with preventing people from using an MV which hasn't been
populated, which has the potential to silently return incorrect
results.

pg_class.relisvalid = false.. getting rid of the use for truncated MVs?

--
Thom
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Tom Lane-2
In reply to this post by Kevin Grittner-4
"Kevin Grittner" <[hidden email]> writes:
> Josh Berkus wrote:
>> What use would a temporary matview be?

> It would be essentially like a temporary table, with all the same
> persistence options. I'm not really sure how often it will be more
> useful than a temporary table before we have incremental maintenance
> of materialized views; once we have that, though, it seems likely
> that there could be reasonable use cases.

One of the principal attributes of a temp table is that its contents
aren't (reliably) accessible from anywhere except the owning backend.
Not sure where you're going to hide the incremental maintenance in
that scenario.

> The table inheritance has not been implemented in either direction
> for MVs. It didn't seem clear to me that there were reasonable use
> cases. Do you see any?

We don't have inheritance for views, so how would we have it for
materialized views?

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Thom Brown-2
In reply to this post by Kevin Grittner-4
On 15 November 2012 02:28, Kevin Grittner <[hidden email]> wrote:
Attached is a patch that...

 Got this error:

postgres=# create view v_test as select 1;
CREATE VIEW
postgres=# create materialized view mv_test as select * from v_test;
ERROR:  could not open file "base/12064/16425": No such file or directory

--
Thom
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Antonin Houska
By chance (?) I got similar one today too, when dropping extension:

ERROR:  could not open file "base/12623/12548": No such file or directory

I thought something might have gone wrong during Linux upgrade 2 days ago, but it's not likely that we both have the issue.

I wonder if something is broken in the catalog. The last commit I have in my environment is

commit 4af3dda13601d859a20425e3554533fde0549056
Author: Peter Eisentraut [hidden email]
Date:   Sun Oct 28 10:35:46 2012 -0400

Kind regards,
Tony.

On 11/16/2012 06:14 PM, Thom Brown wrote:
On 15 November 2012 02:28, Kevin Grittner <[hidden email]> wrote:
Attached is a patch that...

 Got this error:

postgres=# create view v_test as select 1;
CREATE VIEW
postgres=# create materialized view mv_test as select * from v_test;
ERROR:  could not open file "base/12064/16425": No such file or directory

--
Thom

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Josh Berkus
In reply to this post by Kevin Grittner-4
Kevin,

> I agree that there are likely to be more use cases for this than
> temp MVs. Unfortunately, I've had a hard time figuring out how to
> flag an MV which is empty because its contents were lost after a
> crash with preventing people from using an MV which hasn't been
> populated, which has the potential to silently return incorrect
> results.

See below.

>>> 2. MVs don't support inheritance.
>>
>> In which direction? Can't inherit, or can't be inherited from?
>
> The table inheritance has not been implemented in either direction
> for MVs. It didn't seem clear to me that there were reasonable use
> cases. Do you see any?

No, I just wanted clarity on this.  I can see a strong case for
eventually supporting CREATE MATERIALIZED VIEW matview_1 LIKE matview,
in order to "copy" mativews, though.

> Consistency. There are other object types which seem to enforce this
> rule for no reason that I can see beyond maybe a desire to have both
> directions of COPY work with the same set of objects. If I remember
> correctly, allowing this would eliminate one line of code from the
> patch, so if sentiment is in favor of it, it is very easily done.

There's going to be a pretty strong demand for COPY FROM matviews.
Forcing the user to use COPY FROM ( SELECT ... ) will be seen as
arbitrary and unintuitive.

>> How would you change the definition of an MV then?
>
> At this point you would need to drop and re-create the MV. If we
> want to add columns to an MV or change what an existing column holds,
> perhaps we could have an ALTER MV which changed the SELECT statement
> that populates the MV? I would prefer to leave that to a later patch,
> though -- it seems like a bit of a minefield compared to what is
> being implemented in this patch.

I agree that it should be a later patch.

> Not technically, really, but I saw two reasons that I preferred LOAD MV:
>
> 1. It seems to me to better convey that the entire contents of the MV
>    will be built from scratch, rather than incrementally adjusted.
> 2. We haven't hashed out the syntax for more aggressive maintenance of
>    an MV, and it seemed like UPDATE MV might be syntax we would want to
>    use for something which updated selected parts of an MV when we do.

Hmmm, I see your point.  So "LOAD" would recreate, and (when supported)
UPDATE would incrementally update?

> It builds a new heap and moves it into place. When the transaction
> running LMV commits, the old heap is deleted. In implementation it is
> closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
> creating a new table. This allows all permissions, etc., to stay in
> place.

OK, so same effect as a truncate.

> Robert feels the same way, but I disagree. Some MVs will not be terribly
> volatile. In my view there is a big difference between having a "top ten"
> list which might be based on yesterday's base tables rather than the base
> table states as of this moment, and having a "top ten" list with no
> entries. If you want to, for example, take some action if an order comes
> in for one of your top customers, and a different path for other
> customers, suddenly treating all of your long-time top customers as not
> being so, without any squawk from the database, seems dangerous.

Right, but a relisvalid flag just tells me that the matview was updated
at sometime in the past, and not *when* it was updated.  It could have
been 3 years ago.  The fact that it was updated at some indefinite time
is fairly valueless information.

There's a rule in data warehousing which says that it's better to have
no data (and know that you have no data) than to have incorrect data.

> I see your point there; I'll think about that. My take was more that MVs
> would often be refreshed by crontab, and that you would want to keep
> subsequent steps from running and generating potentially plausible but
> completely inaccurate results if the LMV failed.

Yeah, that too.  Also, a timestamp it would make it easy to double-check
if the cron job was failing or had been disabled.

> UNLOGGED tables and indexes are made empty on crash recovery by copying
> the initialization fork over the "normal" relations. Care was taken to
> avoid needing to connect to each database in turn to complete that
> recovery. This style of recovery can't really set the relisvalid flag, as
> far as I can see; which leaves us choosing between unlogged MVs and
> knowing whether they hold valid data -- unless someone has a better idea.

Yeah, well, whether we have relisvalid or mvlastupdate, we're going to
have to work out some way to have that field react to changes to the
table overall.  I don't know *how*, but it's something we'll have to solve.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Dimitri Fontaine-7
In reply to this post by Kevin Grittner-4
"Kevin Grittner" <[hidden email]> writes:
>> UPDATE MATERIALIZED VIEW was problematic?
>
> Not technically, really, but I saw two reasons that I preferred LOAD MV:
>
> 1. It seems to me to better convey that the entire contents of the MV
>    will be built from scratch, rather than incrementally adjusted.
> 2. We haven't hashed out the syntax for more aggressive maintenance of
>    an MV, and it seemed like UPDATE MV might be syntax we would want to
>    use for something which updated selected parts of an MV when we do.

Good point, and while I'm in the mood for some grammar input, here's a
try:

  ALTER MATERIALIZED VIEW foo RESET;
  ALTER MATERIALIZED VIEW foo UPDATE;

I think such wholesale operations make more sense as ALTER statement
than as UPDATE statements.

> It builds a new heap and moves it into place. When the transaction
> running LMV commits, the old heap is deleted. In implementation it is
> closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
> creating a new table. This allows all permissions, etc., to stay in
> place.

When you say closer to CLUSTER, do you include the Access Exclusive Lock
that forbids reading the previous version's data while you prepare the
new one? That would be very bad and I wouldn't understand the need to,
in the scope of MATERIALIZED VIEWs which are by definition lagging
behind…

If as I think you don't have that limitation in your implementation,
it's awesome and just what I was hoping to read :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Tom Lane-2
In reply to this post by Josh Berkus
Josh Berkus <[hidden email]> writes:
> There's going to be a pretty strong demand for COPY FROM matviews.
> Forcing the user to use COPY FROM ( SELECT ... ) will be seen as
> arbitrary and unintuitive.

You could make that same claim about plain views, but in point of
fact the demand for making them work in COPY has been minimal.
So I'm not convinced this is an essential first-cut feature.
We can always add it later.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Materialized views WIP patch

Kevin Grittner-4
In reply to this post by Kevin Grittner-4
Thom Brown wrote:

> postgres=# create view v_test as select 1;
> CREATE VIEW
> postgres=# create materialized view mv_test as select * from v_test;
> ERROR: could not open file "base/12064/16425": No such file or directory

Thanks for the report; will investigate.

-Kevin


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1234
Loading...