initdb and fsync

initdb and fsync

Jeff Davis-8
It looks like initdb doesn't fsync all the files it creates, e.g. the
PG_VERSION file.

While it's unlikely that it would cause any real data loss, it can be
inconvenient in some testing scenarios involving VMs.

Thoughts? Would a patch to add a few fsync calls to initdb be accepted?
Is a platform-independent fsync available at initdb time?

Regards,
    Jeff Davis


--
Sent via pgsql-hackers mailing list ([hidden email])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: initdb and fsync

Noah Misch-2
On Fri, Jan 27, 2012 at 04:19:41PM -0800, Jeff Davis wrote:
> It looks like initdb doesn't fsync all the files it creates, e.g. the
> PG_VERSION file.
>
> While it's unlikely that it would cause any real data loss, it can be
> inconvenient in some testing scenarios involving VMs.
>
> Thoughts? Would a patch to add a few fsync calls to initdb be accepted?

+1.  If I'm piloting "strace -f" right, initdb currently issues *no* syncs.

We'd probably, then, want a way to re-disable the fsyncs for hacker benefit.

> Is a platform-independent fsync available at initdb time?

Not sure.

Thanks,
nm


Re: initdb and fsync

Andrew Dunstan


On 01/27/2012 11:52 PM, Noah Misch wrote:
>> Is a platform-independent fsync available at initdb time?
> Not sure.
>

It's a macro on Windows that calls _commit(fd), so it should be portable
enough.
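
For readers unfamiliar with that macro, here is a minimal sketch of the idea. It assumes either a POSIX system or Windows with the Microsoft C runtime; the helper name flush_stream_to_disk is hypothetical and is not taken from the PostgreSQL tree.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes fileno() under strict -std flags */
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <io.h>
/* Windows has no fsync(); _commit() flushes one descriptor to disk. */
#define fsync(fd) _commit(fd)
#else
#include <unistd.h>
#endif

/*
 * Hypothetical helper: push stdio's buffer to the OS, then ask the OS to
 * write the file to stable storage.  Returns 0 on success, -1 on failure.
 */
int
flush_stream_to_disk(FILE *fp)
{
    assert(fp != NULL);

    if (fflush(fp) != 0)
        return -1;
    return fsync(fileno(fp));
}
```

Because both fsync() and _commit() return 0 on success and -1 on failure, callers would not need any platform-specific error handling in this sketch.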

I'm curious what problem we're actually solving here, though. I've run
the buildfarm countless thousands of times on different VMs, and five of
my seven current animals run in VMs, and I don't think I've ever seen a
failure ascribable to inadequately synced files from initdb.

cheers

andrew


Re: initdb and fsync

Jeff Davis-8
On Sat, 2012-01-28 at 10:31 -0500, Andrew Dunstan wrote:
> I'm curious what problem we're actually solving here, though. I've run
> the buildfarm countless thousands of times on different VMs, and five of
> my seven current animals run in VMs, and I don't think I've ever seen a
> failure ascribable to inadequately synced files from initdb.

I believe I have seen such a problem second hand in a situation where
the VM was known to be killed harshly (not sure if you do that
regularly).

It's a little difficult for me to _prove_ that this would have solved
the problem, and I think it was only observed once (though I could
probably reproduce it if I tried). The symptom was a log message
indicating that PG_VERSION was missing or corrupt on a system that was
previously started and online (albeit briefly for a test).

Regards,
        Jeff Davis



Re: initdb and fsync

Tom Lane-2
In reply to this post by Andrew Dunstan
Andrew Dunstan <[hidden email]> writes:
> I'm curious what problem we're actually solving here, though. I've run
> the buildfarm countless thousands of times on different VMs, and five of
> my seven current animals run in VMs, and I don't think I've ever seen a
> failure ascribable to inadequately synced files from initdb.

Yeah.  Personally I would be sad if initdb got noticeably slower, and
I've never seen or heard of a failure that this would fix.

I wonder whether it wouldn't be sufficient to call sync(2) at the end,
anyway, rather than cluttering the entire initdb codebase with fsync
calls.

                        regards, tom lane


Re: initdb and fsync

Jeff Janes
In reply to this post by Andrew Dunstan
On Sat, Jan 28, 2012 at 7:31 AM, Andrew Dunstan <[hidden email]> wrote:

>
>
> On 01/27/2012 11:52 PM, Noah Misch wrote:
>>>
>>> Is a platform-independent fsync available at initdb time?
>>
>> Not sure.
>>
>
> It's a macro on Windows that calls _commit(fd), so it should be portable
> enough.
>
> I'm curious what problem we're actually solving here, though. I've run the
> buildfarm countless thousands of times on different VMs, and five of my
> seven current animals run in VMs, and I don't think I've ever seen a failure
> ascribable to inadequately synced files from initdb.

I wouldn't expect you to ever see that problem on the buildfarm.  If
the OS gets thunked in the middle of a regression test, then when it
comes back up the code is not going to try to pick up where it left
off; it is just going to blow away the entire install and start over
from scratch.  So any crash-recoverability problems will never be
detected.  I would guess the original poster is doing a more stringent
kind of test.

Cheers,

Jeff


Re: initdb and fsync

Jeff Davis-8
In reply to this post by Tom Lane-2
On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:

> Andrew Dunstan <[hidden email]> writes:
> > I'm curious what problem we're actually solving here, though. I've run
> > the buildfarm countless thousands of times on different VMs, and five of
> > my seven current animals run in VMs, and I don't think I've ever seen a
> > failure ascribable to inadequately synced files from initdb.
>
> Yeah.  Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.
>
> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.

I can always add a "sync" call to the test, also (rather than modifying
initdb). Or, it could be an initdb option, which might be a good
compromise. I don't have a strong opinion here.

As machines get more memory and filesystems get more lazy, I wonder if
it will be a more frequent occurrence, however. On the other hand, if
filesystems are more lazy, that also increases the cost associated with
extra "sync" calls. I think there would be a surprise factor if
sometimes initdb had a long pause at the end and caused 10GB of data to
be written out.

Regards,
        Jeff Davis



Re: initdb and fsync

Jeff Janes
In reply to this post by Tom Lane-2
On Sat, Jan 28, 2012 at 10:18 AM, Tom Lane <[hidden email]> wrote:

> Andrew Dunstan <[hidden email]> writes:
>> I'm curious what problem we're actually solving here, though. I've run
>> the buildfarm countless thousands of times on different VMs, and five of
>> my seven current animals run in VMs, and I don't think I've ever seen a
>> failure ascribable to inadequately synced files from initdb.
>
> Yeah.  Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.
>
> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.
>
>                        regards, tom lane

Does sync(2) behave like sync(8) and flush the entire system cache, or
does it just flush the files opened by the process which called it?

The man page didn't enlighten me on that.

Sometimes sync(8) never returns.  It doesn't just flush what was dirty
at the time it was called, it actually keeps running until there are
simultaneously no dirty pages anywhere in the system.  On busy
systems, this condition might never be reached.  And it can't be
interrupted, not even with kill -9.

Cheers,

Jeff


Re: initdb and fsync

Andrew Dunstan
In reply to this post by Jeff Davis-8


On 01/28/2012 01:46 PM, Jeff Davis wrote:

> On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
>> Andrew Dunstan <[hidden email]> writes:
>>> I'm curious what problem we're actually solving here, though. I've run
>>> the buildfarm countless thousands of times on different VMs, and five of
>>> my seven current animals run in VMs, and I don't think I've ever seen a
>>> failure ascribable to inadequately synced files from initdb.
>> Yeah.  Personally I would be sad if initdb got noticeably slower, and
>> I've never seen or heard of a failure that this would fix.
>>
>> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
>> anyway, rather than cluttering the entire initdb codebase with fsync
>> calls.
> I can always add a "sync" call to the test, also (rather than modifying
> initdb). Or, it could be an initdb option, which might be a good
> compromise. I don't have a strong opinion here.
>
> As machines get more memory and filesystems get more lazy, I wonder if
> it will be a more frequent occurrence, however. On the other hand, if
> filesystems are more lazy, that also increases the cost associated with
> extra "sync" calls. I think there would be a surprise factor if
> sometimes initdb had a long pause at the end and caused 10GB of data to
> be written out.
>

-1 for that. A very quick look at initdb.c suggests to me that there are
only two places where we'd need to put fsync(): right before we call
fclose() in write_file() and write_version_file(). If we're going to do
anything, that seems to be the least painful and most portable way to go.
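
A rough sketch of that pattern follows, assuming a POSIX system. The function write_text_file_durably is a hypothetical stand-in and is not initdb's actual write_version_file(). The one detail worth spelling out is that fflush() has to come before fsync(), since fsync() only covers data that stdio has already handed to the kernel.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes fileno() under strict -std flags */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a "write a small file" helper.
 * Returns 0 on success, -1 on any failure.
 */
int
write_text_file_durably(const char *path, const char *contents)
{
    FILE *fp;

    assert(path != NULL && contents != NULL);

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Write, drain the stdio buffer, then sync, all before fclose(). */
    if (fputs(contents, fp) == EOF ||
        fflush(fp) != 0 ||
        fsync(fileno(fp)) != 0)
    {
        fclose(fp);
        return -1;
    }

    /* fclose() can still report an error, so check it too. */
    return (fclose(fp) == 0) ? 0 : -1;
}
```

A call such as write_text_file_durably("data/PG_VERSION", "9.2\n") would cover the file contents only; whether the directory entry is durable is a separate question.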

cheers

andrew



Re: initdb and fsync

Florian Weimer
In reply to this post by Tom Lane-2
* Tom Lane:

> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.

We tried to do this in the Debian package manager.  It works as
expected on Linux systems, but it can cause a lot of data to hit the
disk, and there are kernel versions where sync(2) never completes if
the system is rather busy.

initdb is much faster with 9.1 than with 8.4.  It's so fast that you
can use it in test suites, instead of reusing an existing cluster.
I think this is a rather desirable property.


Re: initdb and fsync

Jeff Davis-8
In reply to this post by Tom Lane-2
On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
> Yeah.  Personally I would be sad if initdb got noticeably slower, and
> I've never seen or heard of a failure that this would fix.

I worked up a patch, and it looks like it does about 6 file fsync's and
a 7th for the PGDATA directory. That degrades the time from about 1.1s
to 1.4s on my workstation.
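
The seventh sync, the one for the PGDATA directory, is there because a newly created file's directory entry is stored in the parent directory rather than in the file itself. A minimal POSIX-only sketch of syncing a directory might look like the following. The name fsync_directory is hypothetical, and treating EINVAL/EBADF as success is an assumption made here for systems that refuse to fsync a directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a directory's entries to stable storage.
 * Returns 0 on success, -1 on failure.
 */
int
fsync_directory(const char *dirpath)
{
    int fd;
    int rc;

    assert(dirpath != NULL);

    /* A directory can be opened read-only and then passed to fsync(). */
    fd = open(dirpath, O_RDONLY);
    if (fd < 0)
        return -1;

    rc = fsync(fd);

    /* Assumption: "this system cannot fsync directories" is not fatal. */
    if (rc != 0 && (errno == EINVAL || errno == EBADF))
        rc = 0;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```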

pg_test_fsync says this about my workstation (one 8kB write):
        open_datasync                     117.495 ops/sec
        fdatasync                         117.949 ops/sec
        fsync                              25.530 ops/sec
        fsync_writethrough                            n/a
        open_sync                          24.666 ops/sec

25 ops/sec means about 40ms per fsync, times 7 is about 280ms, so that
seems like about the right degradation for fsync. I tried with fdatasync
as well to see if it improved things, and I wasn't able to realize any
difference (not sure exactly why).
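
The back-of-the-envelope estimate above can be reproduced in a few lines of C. The function name is hypothetical, and the model assumes the syncs run strictly one after another with no overlap, which is a simplification.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimated added latency, in milliseconds, of
 * n_syncs serialized fsync calls given a measured fsync rate.
 */
double
estimated_fsync_overhead_ms(double fsync_ops_per_sec, int n_syncs)
{
    assert(fsync_ops_per_sec > 0.0 && n_syncs >= 0);

    /* One fsync takes 1000 / rate milliseconds; multiply by the count. */
    return n_syncs * (1000.0 / fsync_ops_per_sec);
}
```

With the measured 25.530 ops/sec and 7 syncs this comes to roughly 274 ms, which is in line with the observed change from about 1.1s to 1.4s.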

So, is it worth it? Should we make it an option that can be specified?

> I wonder whether it wouldn't be sufficient to call sync(2) at the end,
> anyway, rather than cluttering the entire initdb codebase with fsync
> calls.

It looks like there are only a few places, so I don't think clutter is
really the problem with the simple patch at this point (unless there is
a portability problem with just calling fsync).

Regards,
        Jeff Davis



Attachment: initdb-fsync.patch.gz (1K)

Re: initdb and fsync

Noah Misch-2
On Sat, Feb 04, 2012 at 03:41:27PM -0800, Jeff Davis wrote:
> On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote:
> > Yeah.  Personally I would be sad if initdb got noticeably slower, and
> > I've never seen or heard of a failure that this would fix.
>
> I worked up a patch, and it looks like it does about 6 file fsync's and
> a 7th for the PGDATA directory. That degrades the time from about 1.1s
> to 1.4s on my workstation.

> So, is it worth it? Should we make it an option that can be specified?

If we add fsync calls to the initdb process, they should cover the entire data
directory tree.  This patch syncs files that initdb.c writes, but we ought to
also sync files that bootstrap-mode backends had written.  An optimization
like the pg_flush_data() call in copy_file() may reduce the speed penalty.

initdb should do these syncs by default and offer an option to disable them.


Re: initdb and fsync

Jeff Davis-8
On Sat, 2012-02-04 at 20:18 -0500, Noah Misch wrote:
> If we add fsync calls to the initdb process, they should cover the entire data
> directory tree.  This patch syncs files that initdb.c writes, but we ought to
> also sync files that bootstrap-mode backends had written.

It doesn't make sense for initdb to take responsibility to sync files
created by the backend. If there are important files that the backend
creates, it should be the backend's responsibility to fsync them (and
their parent directory, if needed). And if they are unimportant to the
backend, then there is no reason for initdb to fsync them.

> An optimization
> like the pg_flush_data() call in copy_file() may reduce the speed penalty.

That worked pretty well. It took it down about 100ms on my machine,
which closes the gap significantly.

> initdb should do these syncs by default and offer an option to disable them.

For test frameworks that run initdb often, that makes sense.

But for developers, it doesn't make sense to spend 0.5s typing an option
that saves you 0.3s. So, we'd need some more convenient way to choose
the no-fsync option, like an environment variable that developers can
set. Or maybe developers don't care about 0.3s?

Regards,
        Jeff Davis



Re: initdb and fsync

Tom Lane-2
Jeff Davis <[hidden email]> writes:
> On Sat, 2012-02-04 at 20:18 -0500, Noah Misch wrote:
>> If we add fsync calls to the initdb process, they should cover the entire data
>> directory tree.  This patch syncs files that initdb.c writes, but we ought to
>> also sync files that bootstrap-mode backends had written.

> It doesn't make sense for initdb to take responsibility to sync files
> created by the backend.

No, but the more interesting question is whether bootstrap mode troubles
to fsync its writes.  I'm not too sure about that either way ...

                        regards, tom lane


Re: initdb and fsync

Noah Misch-2
In reply to this post by Jeff Davis-8
On Sun, Feb 05, 2012 at 10:53:20AM -0800, Jeff Davis wrote:

> On Sat, 2012-02-04 at 20:18 -0500, Noah Misch wrote:
> > If we add fsync calls to the initdb process, they should cover the entire data
> > directory tree.  This patch syncs files that initdb.c writes, but we ought to
> > also sync files that bootstrap-mode backends had written.
>
> It doesn't make sense for initdb to take responsibility to sync files
> created by the backend. If there are important files that the backend
> creates, it should be the backend's responsibility to fsync them (and
> their parent directory, if needed). And if they are unimportant to the
> backend, then there is no reason for initdb to fsync them.

I meant primarily to illustrate the need to be comprehensive, not comment on
which executable should fsync a particular file.  Bootstrap-mode backends do
not sync anything during an initdb run on my system.  With your patch, we'll
fsync a small handful of files and leave nearly everything else vulnerable.

That being said, having each backend fsync its own writes will mean syncing
certain files several times within a single initdb run.  If the penalty from
that proves high enough, we may do well to instead have initdb.c sync
everything just once.

> > initdb should do these syncs by default and offer an option to disable them.
>
> For test frameworks that run initdb often, that makes sense.
>
> But for developers, it doesn't make sense to spend 0.5s typing an option
> that saves you 0.3s. So, we'd need some more convenient way to choose
> the no-fsync option, like an environment variable that developers can
> set. Or maybe developers don't care about 0.3s?

Developers have shell aliases/functions/scripts and command history.  I
wouldn't object to having an environment variable control it, but I would not
personally find that more convenient than a command-line switch.

Thanks,
nm


Re: initdb and fsync

Peter Eisentraut-2
In reply to this post by Jeff Davis-8
On Sun, 2012-02-05 at 10:53 -0800, Jeff Davis wrote:

> > initdb should do these syncs by default and offer an option to
> > disable them.
>
> For test frameworks that run initdb often, that makes sense.
>
> But for developers, it doesn't make sense to spend 0.5s typing an
> option that saves you 0.3s. So, we'd need some more convenient way to
> choose the no-fsync option, like an environment variable that
> developers can set. Or maybe developers don't care about 0.3s?
>
You can use https://launchpad.net/libeatmydata for those cases.




Re: initdb and fsync

Robert Haas
On Fri, Feb 10, 2012 at 3:57 PM, Peter Eisentraut <[hidden email]> wrote:

> On Sun, 2012-02-05 at 10:53 -0800, Jeff Davis wrote:
>> > initdb should do these syncs by default and offer an option to
>> > disable them.
>>
>> For test frameworks that run initdb often, that makes sense.
>>
>> But for developers, it doesn't make sense to spend 0.5s typing an
>> option that saves you 0.3s. So, we'd need some more convenient way to
>> choose the no-fsync option, like an environment variable that
>> developers can set. Or maybe developers don't care about 0.3s?
>>
> You can use https://launchpad.net/libeatmydata for those cases.

That's hilarious.

But, a command-line option seems more convenient.

It also seems entirely sufficient.  The comments above suggest that it
would take too long to type the option, but any PG developers who are
worried about the speed difference surely know how to create shell
aliases, shell functions, shell scripts, ... and if anyone's really
concerned about it, we can provide a short form for the option.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: initdb and fsync

Jeff Davis-8
In reply to this post by Noah Misch-2
On Sun, 2012-02-05 at 17:56 -0500, Noah Misch wrote:
> I meant primarily to illustrate the need to be comprehensive, not comment on
> which executable should fsync a particular file.  Bootstrap-mode backends do
> not sync anything during an initdb run on my system.  With your patch, we'll
> fsync a small handful of files and leave nearly everything else vulnerable.

Thank you for pointing that out. With that in mind, I have a new version
of the patch which just recursively fsync's the whole directory
(attached).
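
The attached patch is not reproduced here. As an illustration only, a one-pass recursive sync could be sketched with POSIX nftw() as below. The names fsync_tree and fsync_entry are hypothetical, only regular files and directories are handled, and directory sync errors are ignored as a simplifying assumption.

```c
#define _XOPEN_SOURCE 700   /* required for nftw() */
#include <assert.h>
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callbacks take no user pointer, so failures are counted here. */
static int sync_failures;

static int
fsync_entry(const char *path, const struct stat *sb, int typeflag,
            struct FTW *ftwbuf)
{
    int fd;
    int is_dir = (typeflag == FTW_DP);
    int is_file = (typeflag == FTW_F && S_ISREG(sb->st_mode));

    (void) ftwbuf;
    if (!is_dir && !is_file)
        return 0;               /* skip symlinks, FIFOs, unreadable entries */

    fd = open(path, O_RDONLY);
    if (fd < 0)
    {
        sync_failures++;
        return 0;               /* keep walking; report failure at the end */
    }
    /* Assumption: a failed directory fsync is tolerated in this sketch. */
    if (fsync(fd) != 0 && !is_dir)
        sync_failures++;
    close(fd);
    return 0;
}

/* Hypothetical helper: returns 0 if every file was synced, else -1. */
int
fsync_tree(const char *root)
{
    assert(root != NULL);
    sync_failures = 0;

    /* FTW_DEPTH visits a directory after its contents (reported as FTW_DP). */
    if (nftw(root, fsync_entry, 16, FTW_PHYS | FTW_DEPTH) != 0)
        return -1;
    return (sync_failures == 0) ? 0 : -1;
}
```

Each fsync() here is issued while the file's data is still dirty, so every call waits for its own writeback. That is the per-file cost that shows up in the timings below.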

I also introduced a new option --nosync (-N) to disable this behavior.

The bad news is that it introduces a lot more time to initdb -- it goes
from about 1s to about 10s on my machine. I tried fsync'ing the whole
directory twice just to make sure that the second was a no-op, and
indeed it didn't make much difference (still about 10s).

That's pretty inefficient considering that

  initdb -D data --nosync && sync

only takes a couple seconds. Clearly batching the operation is a big
help. Maybe there's some more efficient way to fsync a lot of
files/directories? Or maybe I can mitigate it by avoiding files that
don't really need to be fsync'd?

Regards,
        Jeff Davis



Attachment: initdb-fsync-20120312.patch.gz (2K)

Re: initdb and fsync

Andres Freund
On Tuesday, March 13, 2012 04:49:40 AM Jeff Davis wrote:

> On Sun, 2012-02-05 at 17:56 -0500, Noah Misch wrote:
> > I meant primarily to illustrate the need to be comprehensive, not comment
> > on which executable should fsync a particular file.  Bootstrap-mode
> > backends do not sync anything during an initdb run on my system.  With
> > your patch, we'll fsync a small handful of files and leave nearly
> > everything else vulnerable.
>
> Thank you for pointing that out. With that in mind, I have a new version
> of the patch which just recursively fsync's the whole directory
> (attached).
>
> I also introduced a new option --nosync (-N) to disable this behavior.
>
> The bad news is that it introduces a lot more time to initdb -- it goes
> from about 1s to about 10s on my machine. I tried fsync'ing the whole
> directory twice just to make sure that the second was a no-op, and
> indeed it didn't make much difference (still about 10s).
I suggest you try making it two loops:

for recursively everything in dir:
   posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

for recursively everything in dir:
   fsync(fd);

In my experience that gives much better performance because it does not
force a separate metadata/journal commit/transaction for every file; the
work can be batched. copydir() has done the same for some releases now.

Obviously it's not that nice to use _DONTNEED, but I haven't found something
that works equally well. You could try sync_file_range(fd, 0, 0,
SYNC_FILE_RANGE_WRITE) in the first loop, but my experience with that hasn't
been that good.
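
A compact sketch of the two-loop idea, written over an explicit list of paths rather than a directory walk, might look like this. The function names are hypothetical, posix_fadvise() is assumed to be available, and the assumption that POSIX_FADV_DONTNEED starts writeback without waiting reflects Linux behaviour as described in this thread rather than anything POSIX guarantees.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Pass 1: hint that the whole file (offset 0, len 0) is no longer needed. */
static int
pre_sync_hint(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    /* posix_fadvise() returns an error number directly, not -1/errno. */
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return (rc == 0) ? 0 : -1;
}

/* Pass 2: the real durability guarantee still comes from fsync(). */
static int
sync_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical driver: returns 0 if every fsync succeeded, else -1. */
int
two_pass_sync(const char *const paths[], size_t n)
{
    size_t i;
    int rc = 0;

    assert(paths != NULL || n == 0);

    for (i = 0; i < n; i++)
        (void) pre_sync_hint(paths[i]);   /* advisory only; errors ignored */

    for (i = 0; i < n; i++)
        if (sync_file(paths[i]) != 0)
            rc = -1;
    return rc;
}
```

The first loop is purely advisory, so its failures are ignored. Only the second loop decides the return value.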

Andres


Re: initdb and fsync

Jeff Davis-8
On Tue, 2012-03-13 at 09:42 +0100, Andres Freund wrote:
> for recursively everything in dir:
>    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
>
> for recursively everything in dir:
>    fsync(fd);

Wow, that made a huge difference!

  no sync:      ~ 1.0s
  sync:         ~10.0s
  fadvise+sync: ~ 1.3s

Patch attached.

Now I feel much better about it. Most people will either have fadvise, a
write cache (rightly or wrongly), or actually need the sync. Those that
have none of those can use -N.

Regards,
        Jeff Davis



Attachment: initdb-fsync-20120313.patch.gz (3K)
