Thursday, June 5, 2014

Sharing a script that logs all long-running queries, kills them :) and sends an email alert to my inbox. I generally modify it according to the production requirement. In my environment there are a couple of production servers where I set this script up to only alert and not kill any query.
Please double-test before implementing it in a live production environment. Follow the general rule of thumb of rolling out all changes/scripts from lower to higher environments. :) This script comes with the disclaimer that the person using it takes complete ownership of its results.
Ensure postfix is installed and configured for the emails to work. The script runs as the postgres user, since postgres has privileges over all databases on the server.
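For the servers that should only alert, one option is a switch rather than a second copy of the script. The sketch below is only an illustration: the ALERT_ONLY variable and the handle_pid function are hypothetical names that are not part of the script, and the cancel branch assumes psql can connect as the postgres user.

```shell
#!/bin/sh
# Hypothetical switch (not in the original script):
# ALERT_ONLY=1 only reports the PID, ALERT_ONLY=0 cancels its query.
ALERT_ONLY=${ALERT_ONLY:-1}

# handle_pid PID: print one line describing what was done with the backend PID.
handle_pid() {
    pid=$1
    # Refuse anything that is not a plain number before it reaches SQL.
    case $pid in
        ''|*[!0-9]*) echo "Not a PID: $pid" >&2; return 1 ;;
    esac
    if [ "$ALERT_ONLY" = "1" ]; then
        echo "Would cancel PID $pid (alert-only mode)"
        return 0
    fi
    # -A (unaligned) and -t (tuples only) make psql print just "t" or "f".
    result=$(psql -At -c "SELECT pg_cancel_backend($pid);")
    if [ "$result" = "t" ]; then
        echo "Cancelling PID $pid ... OK."
    else
        echo "Cancelling PID $pid ... Failed."
    fi
}
```

Defaulting to alert-only means a server has to opt in (ALERT_ONLY=0) before any query is cancelled, which fits the lower-to-higher rollout advice.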
############################
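# Run only as the postgres OS user; exit quietly for anyone else.
if [ "$(whoami)" != "postgres" ]; then
    exit 0
fi
# Scratch file that collects the report before it is mailed (the path is an example).
LOGFILE=$(mktemp /tmp/long_running_queries.XXXXXX)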
# Select non-idle queries that have been running for at least 6 minutes.
# procpid and current_query are the column names up to PostgreSQL 9.1;
# 9.2 and later call them pid and query.
psql -c "select procpid, client_addr, query_start, current_query from pg_stat_activity
where current_query != '<IDLE>' and current_query not ilike 'copy%' and current_query not ilike 'vacuum%' and query_start + '6 min'::interval < now()
and substring(current_query, 1, 11) != 'autovacuum:'
order by query_start desc" > "$LOGFILE"
# psql ends its output with "(N rows)"; pull N out of that footer.
NUMBER_OF_STUCK_QUERIES=$(grep "([0-9]* row[s]*)" "$LOGFILE" | sed 's/(//' | awk '{ print $1 }')
if [ "${NUMBER_OF_STUCK_QUERIES:-0}" != 0 ]; then
    # Take the first column of the output and drop every non-digit character (headers and table borders in psql's output).
    STUCK_PIDS=$(sed "s/([0-9]* row[s]*)//" "$LOGFILE" | awk '{ print $1 }' | sed "s/[^0-9]//g")
    for PID in $STUCK_PIDS; do
        printf 'Cancelling PID %s ... ' "$PID" >> "$LOGFILE"

        # "t" means the cancel request was sent successfully; -At makes psql print only that value.
        SUCCESS=$(psql -At -c "SELECT pg_cancel_backend($PID);")
        if [ "$SUCCESS" = "t" ]; then
            SUCCESS="OK."
        else
            SUCCESS="Failed."
        fi
        echo "$SUCCESS" >> "$LOGFILE"
    done

    mail -s "Long-running PostgreSQL queries (over 6 minutes) detected and cancelled" youremail@whatever.com < "$LOGFILE"

fi

rm -f "$LOGFILE"
#######################################
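The listing relies on the current_query column and the '<IDLE>' marker, which exist only up to PostgreSQL 9.1. From 9.2 onward pg_stat_activity exposes pid, query and a separate state column instead. A rough equivalent of the selection for those versions might look like the sketch below; the build_query helper and its minutes argument are made up for illustration, and the COPY/VACUUM exclusions are assumed to be prefix matches.

```shell
#!/bin/sh
# build_query [MINUTES]: print the long-running-query SELECT for PostgreSQL 9.2+.
# Hypothetical helper, not part of the original script; MINUTES defaults to 6.
build_query() {
    minutes=${1:-6}
    # Accept whole numbers only, since the value is pasted into the SQL text.
    case $minutes in
        ''|*[!0-9]*) echo "minutes must be a whole number" >&2; return 1 ;;
    esac
    cat <<EOF
select pid, client_addr, query_start, query from pg_stat_activity
where state <> 'idle'
and query not ilike 'copy%' and query not ilike 'vacuum%'
and query not like 'autovacuum:%'
and query_start + '$minutes min'::interval < now()
order by query_start desc
EOF
}

# Example use inside the script:
#   psql -c "$(build_query 6)" > "$LOGFILE"
```

Keeping the threshold as an argument lets the same script run with a different limit on each server without editing the SQL.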