Quantcast

GIN index not used

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

GIN index not used

Mark
I have problem with GIN index. Queries over it takes a lot of time. Some informations:
 
I've got a table with tsvector- textvector:
CREATE TABLE mediawiki.pagecontent
(
  old_id integer NOT NULL DEFAULT nextval('mediawiki.text_old_id_seq'::regclass),
  old_text text,
  old_flags text,
  textvector tsvector,
  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
)
The table has about 311 000 rows.
I've created GIN index over textvector:
CREATE INDEX gin_index2 ON mediawiki.pagecontent USING gin (textvector);

At first all was ok, but last days I find out that the queries take a lot of time (10sec and more). When I start EXPLAIN ANALIZE of the query
SELECT old_id FROM mediawiki.pagecontent WHERE (textvector @@ (to_tsquery('den')))
the result is this:
"Bitmap Heap Scan on pagecontent  (cost=8677.26..26663.01 rows=9037 width=4) (actual time=8.258..8131.677 rows=9093 loops=1)"
"  Recheck Cond: (textvector @@ to_tsquery('den'::text))"
"  ->  Bitmap Index Scan on gin_index2  (cost=0.00..8675.00 rows=9037 width=0) (actual time=6.002..6.002 rows=9093 loops=1)"
"        Index Cond: (textvector @@ to_tsquery('den'::text))"
"Total runtime: 8150.949 ms"
 
It seems that the GIN index was not used. At first I though, that there is some mess in the db, cause I provided some experiments(delete and import data and GIN index), so I used vacuum, afterwards I tried pg_dump and pg_restore, but it did not help.
Could you please point me in the right direction, where could be the problem?
Thanks a lot
Mark
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: GIN index not used

Alban Hertroys-3
On 27 Apr 2011, at 22:55, Mark wrote:

> I've got a table with tsvector- textvector:
> CREATE TABLE mediawiki.pagecontent
> (
>  old_id integer NOT NULL DEFAULT
> nextval('mediawiki.text_old_id_seq'::regclass),
>  old_text text,
>  old_flags text,
>  textvector tsvector,
>  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
> )
> The table has about 311 000 rows.
> I've created GIN index over textvector:
> CREATE INDEX gin_index2 ON mediawiki.pagecontent USING gin (textvector);

> When I start EXPLAIN ANALIZE of the query

> SELECT old_id FROM mediawiki.pagecontent WHERE (textvector @@
> (to_tsquery('den')))
> the result is this:
> "Bitmap Heap Scan on pagecontent  (cost=8677.26..26663.01 rows=9037 width=4)
> (actual time=8.258..8131.677 rows=9093 loops=1)"
> "  Recheck Cond: (textvector @@ to_tsquery('den'::text))"
> "  ->  Bitmap Index Scan on gin_index2  (cost=0.00..8675.00 rows=9037
> width=0) (actual time=6.002..6.002 rows=9093 loops=1)"
> "        Index Cond: (textvector @@ to_tsquery('den'::text))"
> "Total runtime: 8150.949 ms"
>
> It seems that the GIN index was not used.

But it _is_ being used. It takes almost 1ms per row it finds though, I wonder why that is. Maybe you're using slow hardware or you didn't provide PG with enough resources to work with efficiently?

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4db8fc3811671483513299!



--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: GIN index not used

Mark
Alban thanks for your quick reply.
It is true that I use for this only 2,5GB RAM on Intel Core i5 CPU 2.67GHz and resources I didn't changed from instalation of postgres:
max_connections = 100
shared_buffers = 32MB
(other parameters are commented)
, but that would not be the reason I think.

I was maybe wrong to explain. As I said at first all was ok. That means that after creating the index one query took less time.
For example query over:
tsquery(liquid & water) - 38ms, 219 results

but now it is like this:
tsquery(liquid & water) - 2859ms, 219 results

I though that it was caused by some "magic" over this table (I have deleted and inserted data and index again).
That is why I tried mentioned vacuum and pg_dump with pg_restore.
Could it be caused by this?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: GIN index not used

Oleg Bartunov
In reply to this post by Mark
It should be better in 9.1
http://archives.postgresql.org/message-id/4C2DDC9B.1060908@...

Oleg
On Wed, 27 Apr 2011, Mark wrote:

> I have problem with GIN index. Queries over it takes a lot of time. Some
> informations:
>
> I've got a table with tsvector- textvector:
> CREATE TABLE mediawiki.pagecontent
> (
>  old_id integer NOT NULL DEFAULT
> nextval('mediawiki.text_old_id_seq'::regclass),
>  old_text text,
>  old_flags text,
>  textvector tsvector,
>  CONSTRAINT pagecontent_pkey PRIMARY KEY (old_id)
> )
> The table has about 311 000 rows.
> I've created GIN index over textvector:
> CREATE INDEX gin_index2 ON mediawiki.pagecontent USING gin (textvector);
>
> At first all was ok, but last days I find out that the queries take a lot of
> time (10sec and more). When I start EXPLAIN ANALIZE of the query
> SELECT old_id FROM mediawiki.pagecontent WHERE (textvector @@
> (to_tsquery('den')))
> the result is this:
> "Bitmap Heap Scan on pagecontent  (cost=8677.26..26663.01 rows=9037 width=4)
> (actual time=8.258..8131.677 rows=9093 loops=1)"
> "  Recheck Cond: (textvector @@ to_tsquery('den'::text))"
> "  ->  Bitmap Index Scan on gin_index2  (cost=0.00..8675.00 rows=9037
> width=0) (actual time=6.002..6.002 rows=9093 loops=1)"
> "        Index Cond: (textvector @@ to_tsquery('den'::text))"
> "Total runtime: 8150.949 ms"
>
> It seems that the GIN index was not used. At first I though, that there is
> some mess in the db, cause I provided some experiments(delete and import
> data and GIN index), so I used vacuum, afterwards I tried pg_dump and
> pg_restore, but it did not help.
> Could you please point me in the right direction, where could be the
> problem?
> Thanks a lot
> Mark
>
> --
> View this message in context: http://postgresql.1045698.n5.nabble.com/GIN-index-not-used-tp4344826p4344826.html
> Sent from the PostgreSQL - general mailing list archive at Nabble.com.
>
>

  Regards,
  Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [hidden email], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: GIN index not used

Alban Hertroys-3
In reply to this post by Mark
On 28 Apr 2011, at 10:07, Mark wrote:

> Alban thanks for your quick reply.
> It is true that I use for this only 2,5GB RAM on Intel Core i5 CPU 2.67GHz
> and resources I didn't changed from instalation of postgres:
> max_connections = 100
> shared_buffers = 32MB
> (other parameters are commented)
> , but that would not be the reason I think.

It probably is, the default Postgres settings are quite modest and GIN indexes are memory hungry.
I think you need to increase shared_buffers. With 2.5GB of memory (such a strange number) the docs suggest about 250MB.
See http://www.postgresql.org/docs/current/static/runtime-config-resource.html for details.

> I was maybe wrong to explain. As I said at first all was ok. That means that
> after creating the index one query took less time.
> For example query over:
> tsquery(liquid & water) - 38ms, 219 results
>
> but now it is like this:
> tsquery(liquid & water) - 2859ms, 219 results

Presumably the index did fit in your available shared buffers before your delete/insert actions, but now it doesn't anymore (which possibly means there are still dead rows referenced by the index). I'm also not sure why you're getting a bitmap index scan instead of a normal index scan, maybe that's normal for GIN indexes (considering what I know about how they work internally, that seems plausible).

What version of Postgres is this?
If it's 9.0, you could try VACUUM FULL (before you increase shared_buffers) and see whether your performance is back where it was, which would indicate that dead rows are the cause. In older versions you could as well, but you'll want to REINDEX after that. That's not a solution though, only a test ;)

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4db987e912122982898556!



--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: GIN index not used

Alban Hertroys-3
On 28 Apr 2011, at 17:29, Alban Hertroys wrote:
> With 2.5GB of memory (such a strange number) the docs suggest about 250MB.

Correction, 25% of 2.5GB isn't 250MB of course. It would be somewhat over 500MB, although it's really just a rule-of-thumb (no point in calculating exact numbers). Anyway, much more than you've currently configured, so even 250MB would probably show a significant improvement.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4db98c8412122209668288!



--
Sent via pgsql-general mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: GIN index not used

Mark
Alban thank for your ideas
> It probably is, the default Postgres settings are quite modest and GIN indexes are memory hungry.
> I think you need to increase shared_buffers. With 2.5GB of memory (such a strange number) the docs        > suggest about 250MB.
> See http://www.postgresql.org/docs/current/static/runtime-config-resource.html for details.
Well I did not mentioned that it all runs on Virtual Machine so that's why 2.5GB. I could not assigned more.
Today I have tried the VM machine on PC where I assigned 9GB of memory to this VM. There I set the shared_buffers on 900MB. You were right it wa's definitely caused by the size of memory.
I do not understand why before it was such quick even, if I had less memory.

Thanks very much for your help :-)

Loading...