Home
The Toolkit for Online Communities
17243 Community Members, 1 member online, 1936 visitors today
Log In Register
OpenACS Home : Forums : OpenACS Development : OpenACS performance issues

Forum OpenACS Development: OpenACS performance issues

Icon of envelope Request notifications

Collapse
Posted by Matthew Coupe on
I'm in a bit of a mess and could really do with some help please!

We recently upgraded to OpenACS 5.3.2, dotLRN 2.3.1, with Postgresql 8.2.5, Aolserver 4.0.10, tcl 8.4.13.

Server: Red Hat ES 4
Memory: 4GB
Processor: 2X Dual Core Xeon 2.8GHz

Our site makes use of xowiki heavily but we seems to be having massive performance issues compared to before we upgraded and started using Xowiki so much.

The problem seems to be that when there are a few people logged in (8+) when you try to view a page it can take a few seconds to display and seems to hang halfway through, occasionally there are server hangs of 10 seconds during which time it seems as thoguh the server is not responding at all.

There are quite a few variables in play and I'm struggling to narrow it down but I'll try posting as much info as I can in the hope that someone can help me out.

We have 13,000 accounts on the system but there are usually only 10-15 people online at any 1 time.

Next week we will have 200 people online all of the time so this is a bit of a concern!

When executing Top, total memory can be anything between 50% and 90% of 4GB taken at any given time.

This seems to be the business end of the config file:
server parameters:
#
# Server parameters
#
ns_section ns/server/${server}
ns_param directoryfile $directoryfile
ns_param pageroot $pageroot
ns_param maxconnections 150
ns_param maxdropped 0
ns_param maxthreads 60
ns_param minthreads 20
ns_param threadtimeout 3600
ns_param globalstats false ;# Enable built-in statistics
ns_param urlstats false ;# Enable URL statistics
ns_param maxurlstats 1000 ;# Max number of URL's to do stats on
# ns_param directoryadp $pageroot/dirlist.adp ;# Choose one or the ot
her
# ns_param directoryproc _ns_dirlist ;# ...but not both!
# ns_param directorylisting fancy ;# Can be simple or fancy

An extract from the log file is below. As you can see, it is quite busy and this part in particular seems to be a bit concerning, 1240ms + 478ms + 480ms + 491ms seems to be excessive?

[05/Mar/2008:15:57:39][24412.2533358512][-conn:service0::20] Notice: --i ::3551196 DONE, ::3551196 -> (1240ms, 10
97ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --starting... /ezine/competitions/weblog sum
mary=1&date=1999-02-18, ::3551421 -> (478ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --try weblog -> 0, ::3551421 ::xowiki::Packa
ge->resolve_request (480ms, 1ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --try weblog-portlet -> 0, ::3551421 ::xowik
i::Package->resolve_request (491ms, 10ms)

[05/Mar/2008:15:57:36][24412.2815663024][-conn:service0::3] Warning: blank-compat: /var/lib/aolserver/service0
n/packages/xowiki/www/index.vuh uses deprecated property header_stuff instead of head.
[05/Mar/2008:15:57:36][24412.2815663024][-conn:service0::3] Notice: --i ::3551222 DONE, ::3551222 -> (670ms, 406m
s)
[05/Mar/2008:15:57:38][24412.2533358512][-conn:service0::20] Notice: --starting... /campus/home/ , ::3551196 -> (
48ms)
[05/Mar/2008:15:57:38][24412.2533358512][-conn:service0::20] Notice: --try index -> 0, ::3551196 ::xowiki::Packag
e->resolve_request (51ms, 2ms)
[05/Mar/2008:15:57:38][24412.2533358512][-conn:service0::20] Notice: --after options, ::3553858 ::xowiki::Page->v
iew (143ms, 92ms)
[05/Mar/2008:15:57:39][24412.2533358512][-conn:service0::20] Warning: blank-compat: /var/lib/aolserver/nccedudotl
rn/packages/xowiki/www/index.vuh uses deprecated property title instead of doc(title).
[05/Mar/2008:15:57:39][24412.2533358512][-conn:service0::20] Warning: blank-compat: /var/lib/aolserver/nccedudotl
rn/packages/xowiki/www/index.vuh uses deprecated property header_stuff instead of head.
[05/Mar/2008:15:57:39][24412.2533358512][-conn:service0::20] Notice: --i ::3551196 DONE, ::3551196 -> (1240ms, 10
97ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --starting... /ezine/competitions/weblog sum
mary=1&date=1999-02-18, ::3551421 -> (478ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --try weblog -> 0, ::3551421 ::xowiki::Packa
ge->resolve_request (480ms, 1ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --try weblog-portlet -> 0, ::3551421 ::xowik
i::Package->resolve_request (491ms, 10ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --W starting, ::xotcl::__#m ::xowiki::Weblog
->init (495ms, 4ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Warning: db_qd_get_fullname: there is no documented
proc with name xowiki::init returning dbqd..NULL (declare proc xowiki::init with ad_proc to make it work with the que
ry dispatcher
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --W done, ::xotcl::__#m ::xowiki::Weblog->in
it (582ms, 87ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --try categories-portlet -> 0, ::3551421 ::x
owiki::Package->resolve_request (642ms, 59ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Warning: db_qd_get_fullname: there is no documented
proc with name 3552051::content returning dbqd..NULL (declare proc 3552051::content with ad_proc to make it work with
the query dispatcher
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Notice: --after options, ::3552049 ::xowiki::Page->v
iew (759ms, 117ms)
[05/Mar/2008:15:57:41][24412.2531253168][-conn:service0::22] Warning: blank-compat: /var/lib/aolserver/nccedudotl
rn/packages/xowiki/www/index.vuh uses deprecated property title instead of doc(title).
[05/Mar/2008:15:58:00][24412.2809346992][-conn:service0::9] Notice: --starting... /campus/home/ , ::3551196 -> (7
0ms)
[05/Mar/2008:15:58:00][24412.2809346992][-conn:service0::9] Notice: --try index -> 0, ::3551196 ::xowiki::Package
->resolve_request (72ms, 2ms)
[05/Mar/2008:15:58:00][24412.2809346992][-conn:service0::9] Notice: --after options, ::3553858 ::xowiki::Page->vi
ew (162ms, 89ms)
[05/Mar/2008:15:58:00][24412.2809346992][-conn:service0::9] Warning: blank-compat: /var/lib/aolserver/service0
n/packages/xowiki/www/index.vuh uses deprecated property title instead of doc(title).
[05/Mar/2008:15:58:00][24412.2809346992][-conn:service0::9] Warning: blank-compat: /var/lib/aolserver/service0
n/packages/xowiki/www/index.vuh uses deprecated property header_stuff instead of head.
[05/Mar/2008:15:58:01][24412.2809346992][-conn:service0::9] Notice: --i ::3551196 DONE, ::3551196 -> (1589ms, 142
7ms)
[05/Mar/2008:15:58:08][24412.2817768368][-conn:service0::1] Warning: blank-compat: /var/lib/aolserver/service0
n/packages/acs-subsite/www/pvt/home.adp uses deprecated property title instead of doc(title).
[05/Mar/2008:15:58:08][24412.2817768368][-conn:service0::1] Warning: blank-compat: /var/lib/aolserver/service0
n/packages/acs-subsite/www/pvt/home.adp uses deprecated property header_stuff instead of head.
[05/Mar/2008:15:58:08][24412.2817768368][-conn:service0::1] Warning: blank-compat: property focus is deprecated i
n blank-master - focus should be handled in site-master.
[05/Mar/2008:15:58:24][24412.2532305840][-conn:service0::21] Warning: blank-
compat: /var/lib/aolserver/service0/packages/acs-subsite/www/user/password-update.adp uses deprecated property ti
tle instead of doc(title).
[05/Mar/2008:15:58:24][24412.2532305840][-conn:service0::21] Warning: blank-compat: /var/lib/aolserver/nccedudotl
rn/packages/acs-subsite/www/user/password-update.adp uses deprecated property header_stuff instead of head.
[05/Mar/2008:15:58:24][24412.2532305840][-conn:service0::21] Warning: blank-compat: property focus is deprecated
in blank-master - focus should be handled in site-master.
[05/Mar/2008:15:58:29][24412.2814610352][-conn:service0::4] Notice: --starting... /campus/facultyzone/ , ::355111
8 -> (27ms)
[05/Mar/2008:15:58:29][24412.2814610352][-conn:service0::4] Notice: --try index -> 0, ::3551118 ::xowiki::Package
->resolve_request (29ms, 2ms)
[05/Mar/2008:15:58:30][24412.2814610352][-conn:service0::4] Notice: --after options, ::3551878 ::xowiki::Page->vi
ew (269ms, 239ms)
[05/Mar/2008:15:58:30][24412.2814610352][-conn:service0::4] Warning: blank-compat: /var/lib/aolserver/service0
n/packages/xowiki/www/index.vuh uses deprecated property title instead of doc(title).
[05/Mar/2008:15:58:30][24412.2814610352][-conn:service0::4] Warning: blank-compat: /var/lib/aolserver/service0
n/packages/xowiki/www/index.vuh uses deprecated property header_stuff instead of head.
[05/Mar/2008:15:58:30][24412.2814610352][-conn:service0::4] Notice: --i ::3551118 DONE, ::3551118 -> (517ms, 248m
s)
[05/Mar/2008:15:58:40][24412.2817768368][-conn:service0::1] Warning: blank-compat: /var/lib/aolserver/service0

Let me know if you need any more info please!

Thanks in advance,
Matthew

Collapse
Posted by Dave Bauer on
What process is consuming CPU, if any.
I'd look specifically for nsd and postgres procresses in TOP to see if one of them is consuming excessive cpu.

If its postgres, check your postgresql.conf tuning settings.

Collapse
Posted by Matthew Coupe on
The CPU is barely getting above 5-20% use.

It's almost as if when one person makes a request everyone else has to wait for their request before anything else happens.

From Postgresql conf
max_connections = 100
#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 32MB # min 128kB or max_connections*16kB
# (change requires restart)
#temp_buffers = 8MB # min 800kB
#max_prepared_transactions = 5 # can be 0 or more
# (change requires restart)
# Note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
#work_mem = 1MB # min 64kB
#maintenance_work_mem = 16MB # min 1MB
#max_stack_depth = 2MB # min 100kB

# - Free Space Map -

max_fsm_pages = 204800 # min max_fsm_relations*16, 6 bytes each
# (change requires restart)
#max_fsm_relations = 1000 # min 100, ~70 bytes each
# (change requires restart)
.
.
.

These seem to take an age considering the CPU is so idle?

[05/Mar/2008:16:22:27][24412.2803030960][-conn:service0::15] Notice: --i ::3551196 DONE, ::3551196 -> (4774ms, 38
5ms)
[05/Mar/2008:16:22:36][24412.2806188976][-conn:service0::12] Notice: --starting... /campus/myprogramme/ , ::35512
22 -> (2806ms)
[05/Mar/2008:16:22:36][24412.2806188976][-conn:service0::12] Notice: --try index -> 0, ::3551222 ::xowiki::Packag
e->resolve_request (2809ms, 2ms)
[05/Mar/2008:16:22:36][24412.2806188976][-conn:service0::12] Notice: --after options, ::3551914 ::xowiki::Page->v
iew (3022ms, 213ms)

Collapse
Posted by Raúl Morales Hidalgo on
You should also check if max connections in each pool is too low, we had that problem and requests had to be queued making the user wait until one connection was available

Also if you have developer support instaled check what is consuming more time in the page load.

Collapse
Posted by Matthew Coupe on
What pool settings would you recommend for the following?
ns_param maxthreads 60
ns_param minthreads 20

Currently they are:
ns_section ns/db/pools
ns_param pool1 "Pool 1"
ns_param pool2 "Pool 2"
ns_param pool3 "Pool 3"

ns_section ns/db/pool/pool1
ns_param maxidle 1000000000
ns_param maxopen 1000000000
ns_param connections 5
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}

ns_section ns/db/pool/pool2
ns_param maxidle 1000000000
ns_param maxopen 1000000000
ns_param connections 5
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}

ns_section ns/db/pool/pool3
ns_param maxidle 1000000000
ns_param maxopen 1000000000
ns_param connections 5
ns_param verbose $debug
ns_param extendedtableinfo true
ns_param logsqlerrors $debug
if { $database == "oracle" } {
ns_param driver ora8
ns_param datasource {}
ns_param user $db_name
ns_param password $db_password
} else {
ns_param driver postgres
ns_param datasource ${db_host}:${db_port}:${db_name}
ns_param user $db_user
ns_param password ""
}

ns_section ns/server/${server}/db
ns_param pools "*"
ns_param defaultpool pool1

Collapse
Posted by Dave Bauer on
You have 5 connections for your db pool.

I think this is the culprit.

CPU is not slow, so might not be shared memory issue after all.

You have minthreads at 20 which means 20 concurrent connections.

I am not sure what the idle db pool connection settings are though, I know it was discussed before.

Collapse
Posted by Raúl Morales Hidalgo on
connections is way to low, I'd try with at least 20 in each pool and if it starts getting very slow with more users raise it at once.

Also keep in mind you have it 100 connections configured in postgres, so if you raise it over 33 you'll have to change the value in postgres also

Collapse
Posted by Gustaf Neumann on
for maxthreads=60, i would recommend 50 connections in pool1, the other ones smaller (e.g. 10). define at least 100 connectons in postgres

i would as well recommend to set maxidle and maxopen to 0, but this does not have to do with your performace problems

i have noticed on some machines problem with the keepalive (when e.g. the page is already show, the load takes forever to finish). Therefore you might try

ns_section "ns/parameters"
....
ns_param keepalivetimeout 0
This change won't help with the long times reported in the log.

By looking into the log i wonder about e.g. the line with "--starting". The reported time is the time, when xowiki starts working. In the example below, something was already taking 2.8 secs before xowiki became active

[05/Mar/2008:16:22:36][24412.2806188976][-conn:service0::12] Notice: --starting... /campus/myprogramme/ , ::35512
22 -> (2806ms)

so, the culprit might be in the master templates. Do you have custom templates with SQL-calls?

Collapse
Posted by Matthew Coupe on
We started off with the Zen theme and added some extra stuff, not too much but possibly some bad code!

What you explain about the wait before it becomes active is exactly how it feels, you click on a link and then it takes a few seconds before you get any sort of response from the system. It's very strange.

From developer support these are the database queries that run on our master template. Some of them seem to take an excessive amount of time...

0 ms gethandle gethandle (returned pool2)
389 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'read' )

1 ms 0or1row dbqd.-lookup.entry_exists_select: 0or1row nsdb0

select item_id from cr_items where name = 'index' and parent_id = '3551907'

1 ms 0or1row dbqd.-lookup.entry_exists_select: 0or1row nsdb0

select item_id from cr_items where name = 'en:index' and parent_id = '3551907'

17 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '-4' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

1 ms select dbqd.::xowiki::portlet::my-references-render.get_references: select nsdb0

SELECT page,ci.name,f.package_id from xowiki_references,cr_items ci,cr_folders f where reference=3553858 and ci.item_id = page and ci.parent_id = f.folder_id

1 ms getrow getrow nsdb0

1 ms select dbqd.categories.tcl.categories-procs.category::get_mapped_categories.get_mapped_categories: select nsdb0

select category_id
from category_object_map
where object_id = '3553858'

0 ms getrow getrow nsdb0

2 ms dml dbqd.::xowiki::Page-record_last_visited.update_last_visisted: dml nsdb0

update xowiki_last_visited set time = current_timestamp, count = count + 1 where page_id = 3553858 and user_id = 14554

1 ms 0or1row dbqd.views.tcl.views-procs.views::record_view.record_view: 0or1row nsdb0

select views__record_view('3553858', '14554')

1 ms 0or1row dbqd.views.tcl.views-procs.views::get.views: 0or1row nsdb0

SELECT views, unique_views, to_char(last_viewed,'YYYY-MM-DD HH24:MI:SS') as last_viewed
FROM view_aggregates
WHERE object_id = '3553858'

170 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'write' )

18 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

23 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'create' )

17 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '109968' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

2 ms 0or1row dbqd.dotlrn.tcl.community-procs.dotlrn_community::member_p.select_count_membership: 0or1row nsdb0

select count(*)
from dual
where exists (select 1
from dotlrn_member_rels_approved
where community_id = '3225'
and user_id = '14554')

3 ms 0or1row dbqd.dotlrn.tcl.dotlrn-security-procs.dotlrn::user_p.select_count: 0or1row nsdb0

select count(*)
from dual
where exists (select 1
from dotlrn_users
where user_id = '14554')

3 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '1303' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

3 ms 0or1row dbqd.dotlrn.tcl.dotlrn-security-procs.dotlrn::user_p.select_count: 0or1row nsdb0

select count(*)
from dual
where exists (select 1
from dotlrn_users
where user_id = '14554')

1 ms select dbqd.theme-campus.tcl.navigation-procs.campus::portal_subnavbar.list_page_nums_select: select nsdb0

select pretty_name, accesskey, sort_key
from portal_pages
where portal_id = null and hidden_p = 'f'
order by sort_key

0 ms getrow getrow nsdb0

18 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '344' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

18 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '14554' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

412 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551056' and ppm.party_id = '14554' and ppm.privilege = 'read' )

23 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551056' and ppm.party_id = '14554' and ppm.privilege = 'create' )

10 ms select dbqd.weblogger-portlet.www.weblogger-portlet-front-page.entries: select nsdb0

select e.entry_id,
acs_object__name(apm_package__parent_id(e.package_id)) as parent_name,
(select site_node__url(site_nodes.node_id)
from site_nodes
where site_nodes.object_id = e.package_id) as base_url,
e.title,
e.content,
u.first_names as poster_first_names,
u.last_name as poster_last_name,
to_char(e.entry_date , 'HH24:MI') as posted_time_pretty,
p.package_id
from apm_packages p left outer join
pinds_blog_entries e on (e.package_id = p.package_id) join
acs_objects o on (o.object_id = e.entry_id) join
acs_users_all u on (u.user_id = o.creation_user)
where p.package_id in (3551056)

and e.draft_p = 'f'
and e.deleted_p = 'f'
order by p.package_id asc, e.entry_date desc, e.entry_date desc

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

0 ms getrow getrow nsdb0

1,136 Total Duration (ms)

Collapse
Posted by Dave Bauer on
Those long permissions queries are suspicious. Normally checking one object takes 1ms or so.
Collapse
Posted by Matthew Coupe on
How can I find out more information about what query this is? 408ms is a loooong time for every page impression.


408 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'read' )

Collapse
Posted by Brian Fenton on
Hi Gustaf,

I'm just reading over this old thread and I spotted your comment "for maxthreads=60, i would recommend 50 connections in pool1, the other ones smaller (e.g. 10)". Can I ask you why you said this, instead of just giving each pool 20 connections? I'm just curious, as it seems more intuitive to equally divide the connections between each pool.

many thanks
Brian Fenton

Is more related to your applications and how many nested queries (api-db-calls) you run.

Gustaf, when you mention "i would as well recommend to set maxidle and maxopen to 0, but this does not have to do with your performace problems"; can you elaborate why?

Collapse
Posted by Gustaf Neumann on
Brian,

for each request, the openacs handle allocator tries to first to satisfy requests for handle from the first pool, for nested queries for the second pool, etc. Every request accessing the database will need a handle from pool1. If you have e.g. 60 connection threads, the needed number depends on how many of these potential concurrent threads need the database. The value certainly depends on the applications, it might be different for custom applications. The values above are based on our experience in our learn@wu system (we added some monitoring to get better understanding of this issue).

Roc, see the two following postings for the maxidle/maxopen magic:
http://dossy.org/archives/000319.html
http://openacs.org/forums/message-view?message_id=921467

The folklore setting of

ns_param maxidle 1000000000
ns_param maxopen 1000000000

are just another source for possible confusion and overflow, not contributing anything useful.

hope, this helps
-gustaf neumann

Collapse
Posted by Brian Fenton on
Many thanks Gustaf. That makes sense. Thanks for taking the time to respond.

Brian

Collapse
Posted by Dave Bauer on
I'd recommend increasing shared buffers to 128MB.

You need to take the aolserver down, and restart postgresql after making the change.

32MB is very small for a modern server with 4GB ram.

128mb shared buffers is the default recommended value for OpenACS.

Collapse
Posted by Matthew Coupe on
I tried to set it to 128MB but postgres wouldn't restart, I set it to 80MB and it restarted but then I couldn't get any sort of connection whatsoever to my site. Sounds strange? Is there a total memory value somewhere that I need to increase?
Collapse
Posted by Matthew Coupe on
From: http://openacs.org/xowiki/postgresql-admin

I take it we need to increase this value too?
What would you recommend?

#Change the kernel parameter for maximum shared memory segment size to 128Mb: DAVEB: How current is this? Do modern 2.4 or 2.6 kernels have such low settings? This is highly dependent also on the amount of RAM on your server, most realy servers have gigabytes of RAM so adjust accordingly.

[root root]# echo 134217728 >/proc/sys/kernel/shmmax
[root root]#

Make that change permanent by editing /etc/sysctl.conf to add these lines at the end:

# increase shared memory limit for postgres
kernel.shmmax = 134217728

Collapse
Posted by Dave Bauer on
Yes, sorry I forgot to mention that! Your kernel max shared memory segment value needs to be increased to allow postgresql to use that much memory.
Collapse
Posted by Matthew Coupe on
What value would you recommend for that if I'm going for 128MB in the shared buffers?
Collapse
Posted by Dave Bauer on
This is the correct setting

# increase shared memory limit for postgres
kernel.shmmax = 134217728

for 128mb.

Collapse
Posted by Matthew Coupe on
I have kernel.shmmax set to 134217728 but then when I try to set shared_buffers to 128MB in the PostgreSQL config it won't start the postgresql service again?
Collapse
Posted by Matthew Coupe on
For information:

The database .dmp file is approximately 1GB. We run vacuum --analyze every hour and --full --analyze daily. With a DB that size do we need to increase memory by alot? I actually have the memory at hand to upgrade to 12GB but we haven't had chance to install it yet.

I've installed the latest dump file on a rubbish server and it is proportionally slow, so that should count out any other issues such as location or Red Hat OS issues.

Collapse
Posted by Gustaf Neumann on
> 0 ms gethandle gethandle (returned pool2)
> 389 ms 0or1row
> dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

permission handling seems very slow. maybe you have missed some updates adding indices; what is the output of the following postgres commands?

\d acs_permissions
select count(*) from acs_permissions;

\d acs_object_context_index
select count(*) from acs_object_context_index;

\d acs_privilege_descendant_map
select count(*) from acs_privilege_descendant_map;

Collapse
Posted by Matthew Coupe on
=# \d acs_permissions
Table "public.acs_permissions"
Column | Type | Modifiers
------------+------------------------+-----------
object_id | integer | not null
grantee_id | integer | not null
privilege | character varying(100) | not null
Indexes:
"acs_permissions_pk" PRIMARY KEY, btree (object_id, grantee_id, privilege)
"acs_permissions_grantee_idx" btree (grantee_id)
"acs_permissions_object_id_idx" btree (object_id)
"acs_permissions_privilege_idx" btree (privilege)
Foreign-key constraints:
"acs_permissions_grantee_id_fk" FOREIGN KEY (grantee_id) REFERENCES parties(party_id)
"acs_permissions_on_what_id_fk" FOREIGN KEY (object_id) REFERENCES acs_objects(object_id)
"acs_permissions_priv_fk" FOREIGN KEY (privilege) REFERENCES acs_privileges(privilege)

=# select count(*) from acs_permissions;
count
--------
193374
(1 row)

=# \d acs_object_context_index
Table "public.acs_object_context_index"
Column | Type | Modifiers
---------------+---------+-----------
object_id | integer | not null
ancestor_id | integer | not null
n_generations | integer | not null
Indexes:
"acs_object_context_index_pk" PRIMARY KEY, btree (object_id, ancestor_id)
"acs_obj_ctx_idx_ancestor_idx" btree (ancestor_id)
Check constraints:
"acs_obj_context_idx_n_gen_ck" CHECK (n_generations >= 0)
Foreign-key constraints:
"acs_obj_context_idx_anc_id_fk" FOREIGN KEY (ancestor_id) REFERENCES acs_objects(object_id)
"acs_obj_context_idx_obj_id_fk" FOREIGN KEY (object_id) REFERENCES acs_objects(object_id)

n=# select count(*) from acs_object_context_index;
count
--------
859795
(1 row)

=# \d acs_privilege_descendant_map
Table "public.acs_privilege_descendant_map"
Column | Type | Modifiers
------------+------------------------+-----------
privilege | character varying(100) | not null
descendant | character varying(100) | not null
Indexes:
"acs_priv_desc_map_idx" btree (descendant)
Foreign-key constraints:
"acs_priv_hier_child_priv_fk" FOREIGN KEY (descendant) REFERENCES acs_privileges(privilege)
"acs_priv_hier_priv_fk" FOREIGN KEY (privilege) REFERENCES acs_privileges(privilege)

=# select count(*) from acs_privilege_descendant_map;
count
-------
309
(1 row)

Collapse
Posted by Matthew Coupe on
On my dev box I changed the Master Template back to Zen and I was still getting some slow request times on the permission checks.

429 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'read' )

Collapse
Posted by Gustaf Neumann on
zen is no performance bottleneck, it won't affect the speed of permissions at all.

we have on our learn system more than 10 times the number of entries in the permission tables, and much better speed (most time below 1ms, with up to 2000 concurrently active users; but we have also a different hardware).

The indices you have are correct, these are the indices of the standard openacs installation.

We have on our productive environment two indices more:
Try to add these:

create index acs_privilege_descendant_map_privilege_idx on acs_privilege_descendant_map (privilege);
create index acs_object_context_index_object_id_idx on acs_object_context_index (object_id);
after creating the indices, do a "VACUUM" on both tables.

measure again. It is a while, since i have added these indices. They might be an artefact of our local modifications. if they don't not help, remove the indices again.

Collapse
Posted by Matthew Coupe on
> zen is no performance bottleneck, it won't affect the speed of permissions at all.

Sorry, I wasn't very clear. I was benchmarking against Zen to see if there are any extra expensive queries in my custom Zen template. Zen is a little faster but that is to be expected as our custom template is a little more complex.

I added the indices but there was little effect and some queries were still taking 240ms+.

Dave mentioned increasing Shared_Buffers from 32MB - I tried this but Postgresql wouldn't restart. What sort of value should I be going for here? Could this be the issue?

Collapse
Posted by Gustaf Neumann on
Seems twice as fast, but not really fast.
Have you set shmmax as indicated above? shmmax must be larger than the shared buffers, otherwise postgres won't start. note, that if you have multiple postgres servers running, you need memory for each server. What is the message, when postgres refuses to start?
Collapse
Posted by Dave Bauer on
Matthew if the shhmax is set try using less than 128mb shared buffers to see if that works.
ie: try 64mb or 96mb.
Collapse
Posted by Matthew Coupe on
I changed shared buffers to 96MB and check this out:

Total query time for db transactions:
58 Total Duration (ms)

Before it was:
1,136 Total Duration (ms)

0 ms gethandle gethandle (returned pool2)
4 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'read' )

1 ms 0or1row dbqd.-lookup.entry_exists_select: 0or1row nsdb0

select item_id from cr_items where name = 'index' and parent_id = '3551907'

1 ms 0or1row dbqd.-lookup.entry_exists_select: 0or1row nsdb0

select item_id from cr_items where name = 'en:index' and parent_id = '3551907'

2 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '-4' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

1 ms select dbqd.::xowiki::portlet::my-references-render.get_references: select nsdb0

SELECT page,ci.name,f.package_id from xowiki_references,cr_items ci,cr_folders f where reference=3553858 and ci.item_id = page and ci.parent_id = f.folder_id

0 ms select dbqd.categories.tcl.categories-procs.category::get_mapped_categories.get_mapped_categories: select nsdb0

select category_id
from category_object_map
where object_id = '3553858'

3 ms dml dbqd.::xowiki::Page-record_last_visited.update_last_visisted: dml nsdb0

update xowiki_last_visited set time = current_timestamp, count = count + 1 where page_id = 3553858 and user_id = 14554

1 ms 0or1row dbqd.views.tcl.views-procs.views::record_view.record_view: 0or1row nsdb0

select views__record_view('3553858', '14554')

1 ms 0or1row dbqd.views.tcl.views-procs.views::get.views: 0or1row nsdb0

SELECT views, unique_views, to_char(last_viewed,'YYYY-MM-DD HH24:MI:SS') as last_viewed
FROM view_aggregates
WHERE object_id = '3553858'

2 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'write' )

2 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

3 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551196' and ppm.party_id = '14554' and ppm.privilege = 'create' )

3 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '109968' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

2 ms 0or1row dbqd.dotlrn.tcl.community-procs.dotlrn_community::member_p.select_count_membership: 0or1row nsdb0

select count(*)
from dual
where exists (select 1
from dotlrn_member_rels_approved
where community_id = '3225'
and user_id = '14554')

6 ms 0or1row dbqd.dotlrn.tcl.dotlrn-security-procs.dotlrn::user_p.select_count: 0or1row nsdb0

select count(*)
from dual
where exists (select 1
from dotlrn_users
where user_id = '14554')

3 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '1303' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

2 ms 0or1row dbqd.dotlrn.tcl.dotlrn-security-procs.dotlrn::user_p.select_count: 0or1row nsdb0

select count(*)
from dual
where exists (select 1
from dotlrn_users
where user_id = '14554')

1 ms select dbqd.theme-campus.tcl.navigation-procs.campus::portal_subnavbar.list_page_nums_select: select nsdb0

select pretty_name, accesskey, sort_key
from portal_pages
where portal_id = null and hidden_p = 'f'
order by sort_key

3 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '344' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

2 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '14554' and ppm.party_id = '14554' and ppm.privilege = 'admin' )

2 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551056' and ppm.party_id = '14554' and ppm.privilege = 'read' )

3 ms 0or1row dbqd.acs-tcl.tcl.acs-permissions-procs.permission::permission_p_not_cached.select_permission_p: 0or1row nsdb0

select 1 from dual
where exists
( select 1
from acs_object_party_privilege_map ppm
where ppm.object_id = '3551056' and ppm.party_id = '14554' and ppm.privilege = 'create' )

10 ms select dbqd.weblogger-portlet.www.weblogger-portlet-front-page.entries: select nsdb0

select e.entry_id,
acs_object__name(apm_package__parent_id(e.package_id)) as parent_name,
(select site_node__url(site_nodes.node_id)
from site_nodes
where site_nodes.object_id = e.package_id) as base_url,
e.title,
e.content,
u.first_names as poster_first_names,
u.last_name as poster_last_name,
to_char(e.entry_date , 'HH24:MI') as posted_time_pretty,
p.package_id
from apm_packages p left outer join
pinds_blog_entries e on (e.package_id = p.package_id) join
acs_objects o on (o.object_id = e.entry_id) join
acs_users_all u on (u.user_id = o.creation_user)
where p.package_id in (3551056)

and e.draft_p = 'f'
and e.deleted_p = 'f'
order by p.package_id asc, e.entry_date desc, e.entry_date desc

58 Total Duration (ms)

Collapse
Posted by Gustaf Neumann on
Good news! i guess, this means that you are happy with the performance now. maybe a conclusion we can draw is that if people realize a performance degration when upgrading to newer postgres versions (e.g. 8.2.5) is to check the shared buffers. Maybe postgres 8.2.5 suffers more than previous versions from insufficient shared buffers.

Note, that if you have now 96MB of shared buffers,
you are sill on the rather low side with your 4GB of memory.
Here are some old recommendations from postgres tuning:

* Start at 4MB (512) for a workstation
* Medium size data set and 256-512MB available RAM: 16-32MB (2048-4096)
* Large dataset and lots of available RAM (1-4GB): 64-256MB (8192-32768)

http://www.varlena.com/GeneralBits/Tidbits/perf.html#shbuf

Use "ipcs -m" and "ipcs -ml" on Linux to find memory usage and current limits.

Collapse
Posted by Matthew Coupe on
Yes, I'm feeling a bit better about it now!

I'm going to play around with the values next week and try to optimise the performance even further. We're upping to 10GB of RAM in the server next week so I should be able to increase it even more.

SHMMAX was the only problem still outstanding. I'm going to try doubling that figure and then increase the Shared Buffers accordingly.

There are so many variables to consider when upgrading it's almost like starting all over again!

I'll try the ipcs-m and ipcs-ml commands when I get back to the office and report back. hopefully this thread will be useful for others wanting to optimise postgres performance.

Thank you so much for all of the replies to this post. Without you all putting in the time this project wouldn't be so successful.

Matthew