Quantcast

[HACKERS] Inefficiency in recent pgtz patch

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[HACKERS] Inefficiency in recent pgtz patch

Tom Lane-2
I have just noticed that this patch:
       
        * Implements a hash-table cache of loaded tables, so we don't have
        to read and parse the TZ file everytime we change a timezone. While
        not necesasry now (we don't change timezones very often), I beleive
        this will be necessary (or at least good) when "multiple timezones
        in the same query" is eventually implemented. And code-wise, this
        was the time to do it.

balloons the working store of every backend process by something over 5
megabytes:

Timezones: 5678768 total in 19 blocks; 3904 free (0 chunks); 5674864 used

The reason is that during postmaster start we load every single timezone
definition in the zic database whilst searching for the one that matches
the system timezone most closely.  Because pg_tzset is designed to
unconditionally put every loaded timezone into the hashtable, that means
the postmaster ends up with copies of everything, and then all that junk
is inherited via fork by every backend.

I'm not sure about the actual cost of inheriting data that we (probably)
don't ever change, but this seems like a bad idea in principle,
considering that 99% of backends will likely have a use for only one
timezone value.

Please fix it.  Possibly pg_tzset could take an extra argument
indicating whether or not to cache the data.  Or maybe score_timezone
shouldn't use pg_tzset.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [hidden email] so that your
      message can get through to the mailing list cleanly
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [HACKERS] Inefficiency in recent pgtz patch

Magnus Hagander
>I have just noticed that this patch:
>
> * Implements a hash-table cache of loaded tables, so we
>don't have
...

>balloons the working store of every backend process by something over 5
>megabytes:
>
>Timezones: 5678768 total in 19 blocks; 3904 free (0 chunks);
>5674864 used
>
>The reason is that during postmaster start we load every
>single timezone
>definition in the zic database whilst searching for the one
>that matches
>the system timezone most closely.

Yikes. I checked memory consumption but apparantly only on win32, which
uses a different method for picking the timezone, and doesn't scan
everything. Definitly needs a fix.


> Because pg_tzset is designed to
>unconditionally put every loaded timezone into the hashtable,
>that means
>the postmaster ends up with copies of everything, and then all
>that junk
>is inherited via fork by every backend.
>
>I'm not sure about the actual cost of inheriting data that we
>(probably)
>don't ever change, but this seems like a bad idea in principle,
>considering that 99% of backends will likely have a use for only one
>timezone value.

I'd say there are lots of cases where the backend will need *a couple*
of timezones (both in "single connections" but especially int he case of
pooled connections). And the use will increase when we can support these
timezones in "AT TIME ZONE".

So I still think it's a good idea to have and use the hash table, but
definitly not to fill it with everything we know at backend startup
(because it definitly won't be common to need any larger percentage of
the number of timezones).


>Please fix it.  Possibly pg_tzset could take an extra argument
>indicating whether or not to cache the data.  Or maybe score_timezone
>shouldn't use pg_tzset.

Will look at that as soon as I can. I think making score_timezone not
use pg_tzset is the cleanest way.

Do you agree that using a hashtable for it in general is a good idea
assuming this sideeffect is removed, though?

/Magnus

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [hidden email] so that your
      message can get through to the mailing list cleanly
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [HACKERS] Inefficiency in recent pgtz patch

Tom Lane-2
"Magnus Hagander" <[hidden email]> writes:
> Do you agree that using a hashtable for it in general is a good idea
> assuming this sideeffect is removed, though?

I have no problem with the hashtable, only with preloading it with
everything.  What I'd like to see is that the table inherited at fork()
contains just the data for the default timezone.  (At least in the
normal case where that setting hasn't been changed since postmaster
start.)

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Loading...