Commit e40c1a55 authored by Dr Catherine Pitt

Make database connections short-lived

The code used to open one database connection per thread and keep it
open for as long as the scheduler ran. This worked reasonably well
until we moved to PostgreSQL 13 on Focal, although the scheduler would
fail if the database was restarted, because there was no logic to
reconnect after a connection dropped.
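The missing reconnect logic could, for example, be a check before each use: if the handle is gone or DBI's `ping` fails, open a new one. The following is a minimal sketch only. `live_dbh` is a hypothetical helper, not part of the scheduler, and the connection is passed in as a coderef so the logic can be shown without a live PostgreSQL server.

```perl
use strict;
use warnings;

# Hypothetical helper: return a usable handle, reconnecting if the old one
# is missing or fails DBI's ping(). $connect is a coderef that opens a new
# connection, e.g. sub { DBI->connect("dbi:Pg:", undef, undef) }.
# Returns whatever $connect returns (undef if the server is still down).
sub live_dbh {
    my ($dbh, $connect) = @_;
    return $dbh if defined $dbh && $dbh->ping;
    return $connect->();
}
```

This still keeps a long-lived connection, so on its own it would not address the memory growth described below.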

On Focal/PG13 the connection for the 'cron' thread steadily consumes
memory until the machine runs out. This appears to be a Postgres change
rather than a Perl DBI change: the problem can be reproduced by sitting
in psql and running 'select * from backup_queue' repeatedly. Once or
twice a minute, one run of this query makes the connection consume
another MB of RAM, which is not released until the connection is
closed. The cron thread runs that query every two seconds. My guess is
that it's something peculiar about the view the query selects from; the
time interval handling in that view may be involved. This needs more
investigation.
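One way to watch this happen is to get the backend's process ID with `select pg_backend_pid();` in the psql session, then sample that process's resident memory on the database host while the query is repeated. A minimal sketch, assuming Linux and the `VmRSS` line of `/proc/<pid>/status`; `rss_kb` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the resident set size, in kB, of process
# $pid by reading the VmRSS line of /proc/<pid>/status (Linux only).
# Returns undef if the process or the line cannot be found.
sub rss_kb {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/status") or return undef;
    while (my $line = <$fh>) {
        if ($line =~ /^VmRSS:\s+(\d+)\s+kB/) {
            close($fh);
            return $1;
        }
    }
    close($fh);
    return undef;
}
```

Calling `rss_kb($pid)` every few seconds while the query is re-run in psql should show whether the backend's VmRSS keeps climbing in roughly 1 MB steps.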

In the meantime I'd like backup servers that don't endlessly gobble
RAM, so this change makes the threads connect to the database only when
they need to and close the connection afterwards. This should also make
things behave better across database restarts, though that has not been
carefully tested.
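The pattern described above (connect, do the work, disconnect) can be wrapped in a small helper so the disconnect still happens when the work dies. This is a sketch, not the scheduler's code: `with_dbh` is a hypothetical name, and the connection is supplied as a coderef (for example one wrapping `DBI->connect("dbi:Pg:", undef, undef, { RaiseError => 1 })`) so it can be exercised without a database.

```perl
use strict;
use warnings;

# Hypothetical helper: open a connection, run $work with it, and always
# disconnect afterwards, even if $work dies. If $connect returns a false
# value, nothing is run and undef is returned. If $connect itself dies
# (e.g. with RaiseError set), that error propagates to the caller.
sub with_dbh {
    my ($connect, $work) = @_;
    my $dbh = $connect->() or return undef;
    my $result = eval { $work->($dbh) };
    my $err = $@;               # capture before disconnect can touch $@
    $dbh->disconnect;           # close the backend, releasing its memory
    die $err if $err;           # re-raise the work's error after cleanup
    return $result;
}
```

Re-raising only after `disconnect` means an error inside the work still closes the backend connection, which is the point of making connections short-lived.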
parent 86279e65
Tags 0.9-ch75
@@ -3,6 +3,6 @@ Priority: optional
 Section: otherosfs
 Maintainer: Chemistry COs <support@ch.cam.ac.uk>
 Architecture: all
-Version: 0.9-ch74
+Version: 0.9-ch75
 Depends: zfs-dkms, postgresql-13 | postgresql-9.5 | postgresql-9.4 , liblockfile-simple-perl, libdbi-perl, libjson-perl, libzfs-perl-chem, libnet-openssh-perl, libdbd-pg-perl, mbuffer, rsync, nfs-kernel-server, pv, libwww-curl-perl
 Description: a backup system using ZFS (repository 'backup-scheduler')
@@ -212,15 +212,14 @@ sub cron {
 my $keepgoing=1;
 my $ping=Net::Ping->new('icmp');
-my $dbh=DBI->connect("dbi:Pg:",,);
 my $sql='select backup_task_id,hostname,zfs_target from backup_queue';
 my $offlinesql='update host set offline=true where hostname=?';
-my $sth=$dbh->prepare($sql);
-my $off_sth=$dbh->prepare($offlinesql);
 # Start a log file
+my $dbh;
+my $sth;
 while ($keepgoing) {
-$sth->execute() || die "Unable to run $sql: ".$dbh->errstr;
+$dbh=DBI->connect("dbi:Pg:",,) || logEntry($logfh,'Unable to connect to database');
+$sth=$dbh->prepare($sql);
+$sth->execute() || logEntry($logfh,"Unable to run $sql: ".$dbh->errstr);
 # Get any items we need to backup
 while (my $row=$sth->fetchrow_hashref) {
 my $h=$row->{'hostname'};
@@ -232,6 +231,7 @@ sub cron {
 if (defined($q)) {
 my $logid=&stampjoinqueue($dbh,$row->{'backup_task_id'},$logfh);
 print "Enqueueing ".&shorthost($h)." item id ".$row->{'backup_item_id'}." with log id ".$logid."\n";
+logEntry($logfh,"Enqueueing ".&shorthost($h)." item id ".$row->{'backup_item_id'}." with log id ".$logid."\n");
 # Queue the item by adding the unique log ID
 $q->enqueue($logid);
 } else {
@@ -240,15 +240,14 @@ sub cron {
 } else {
 # Machine is offline
 logEntry($logfh,"Machine $h is not pingable.");
 #$off_sth->execute($h);
 }
 }
+$dbh->disconnect;
 # Should we quit?
 if (examinecontrolq(D_CRON) & M_STOP) {
 $keepgoing=0;
 } else {
 sleep(SLEEP_CRON);
-threads->exit() unless $dbh->ping()
 }
 }
 logEntry($logfh,"Cron thread ending");
@@ -294,7 +293,7 @@ sub writer($$$) {
 $SIG{'INT'}='DEFAULT';
 };
-my $dbh=DBI->connect("dbi:Pg:",,);
+my $dbh;
 my $keepgoing=1;
 my $actions={zfs => \&zfswrite, zfs_rsync=> \&zfsrsyncwrite};
 while ($keepgoing) {
@@ -302,6 +301,7 @@ sub writer($$$) {
 my $qlen=$q->pending();
 if ($qlen) {
 my @items=$q->dequeue_nb();
+$dbh=DBI->connect("dbi:Pg:",,);
 my $sql="select backup_type_name from writer_view where backup_log_id=?";
 my $sth=$dbh->prepare($sql);
 foreach my $id (@items) {
@@ -313,6 +313,7 @@ sub writer($$$) {
 $actions->{$class}->($dbh,$id,$logfh);
 }
 }
+$dbh->disconnect;
 }
 # Should we quit?
 if (examinecontrolq(D_WRITER) & M_STOP) {
@@ -484,6 +485,7 @@ sub tidydatabase() {
 $dbh->do($sql,undef);
 $sql="update backup_task set isrunning='f'";
 $dbh->do($sql,undef);
+$dbh->disconnect;
 }
 sub setoffline($$) {
......
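In the new cron loop, `$dbh=DBI->connect(...) || logEntry(...)` leaves `$dbh` holding `logEntry`'s return value when the connection fails, so the following `prepare` would most likely die and take the cron thread with it. The sketch below shows one polling cycle that logs and skips the rest of the cycle instead, so a database restart would cost one cycle. `poll_once` and the coderefs passed to it are hypothetical stand-ins for `DBI->connect`, `logEntry` and the per-row enqueue logic; this is an illustration, not the scheduler's code.

```perl
use strict;
use warnings;

# Hypothetical sketch of one cron polling cycle: connect, run the query,
# hand each row to $handle_row, then disconnect. If connecting fails the
# cycle is logged and skipped; if the query fails it is logged and the
# connection is still closed. Returns the number of rows handled.
sub poll_once {
    my ($connect, $sql, $log, $handle_row) = @_;
    my $dbh = $connect->();
    unless ($dbh) {
        $log->('Unable to connect to database');
        return 0;
    }
    my $n = 0;
    my $sth = $dbh->prepare($sql);
    if ($sth && $sth->execute()) {
        while (my $row = $sth->fetchrow_hashref) {
            $handle_row->($dbh, $row);
            $n++;
        }
    } else {
        $log->("Unable to run $sql: " . ($dbh->errstr // 'unknown error'));
    }
    $dbh->disconnect;    # short-lived: never keep the handle between cycles
    return $n;
}
```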