Forum OpenACS Development: Scalability of acs_objects and acs_object_context_index

I started digging through the permissioning and object system. Good to see the permission_p PL/SQL function has got that small. I am looking at a loaded system with real-life data distribution and the permissioning query is still too expensive. Here's the first qustions I have:

The acs_objects table for the system in question has a whopping 1844064 rows. Here's the result of an object type count query:

portal_page - 316676
content_folder - 226755
apm_parameter_value - 216478
cal_item - 148456
acs_activity - 120809
portal - 109971
file_storage_object - 109523
content_item - 109523
calendar - 101141
user - 96731
membership_rel - 96731
dotlrn_student_profile_rel - 96728
site_node - 39744
apm_package - 30945
rel_segment - 8833
static_portal_content - 4416
dotlrn_club - 4413
dotlrn_member_rel - 3911
fs_url - 1550
dotlrn_admin_rel - 228

Objects of type content_folder or cal_item are certainly "things" in the system. But why is an object of type portal_page, apm_parameter_value, acs_activity, portal, dotlrn_student_profile_rel, apm_package, rel_segment, static_portal_content, dotlrn_club, dotlrn_member_rel, fs_url, dotlrn_admin_rel a "thing"?

Can someone confirm that objects of above mentioned types really retrieve information from acs_objects? If yes, which ones and what is the price of transferring this information to another table?

So acs_objects is not part of the permissioning code, however it is a table used extensively. Then I looked at acs_object_context_index with its even more whopping 5935428 (!) rows:

content_folder - 1243313
file_storage_object - 769692
content_item - 660169
portal_page - 651004
apm_parameter_value - 432956
cal_item - 417719
membership_rel - 290193
dotlrn_student_profile_rel - 290184
portal - 246420
acs_activity - 241618
calendar - 202282
user - 193462
apm_package - 150176
site_node - 79488
rel_segment - 17666
dotlrn_member_rel - 15644
dotlrn_club - 13239
fs_url - 9358
static_portal_content - 8832
dotlrn_admin_rel - 912
apm_parameter - 366

A couple of object types have incredibly many rows. Which of these object types really have permissioning information attached, which ones could be drop from this table and what would be the price of dropping them ?

I forgot to add: acs_object_context_index is the biggest of 4 tables in acs_permission.permission_p.
Dirk,

These are great questions. The main issue when determining if something needs to be an object is whether you want to get able to assign permissions to it.

Right. My gutfeel is that something like a portal_page or a apm_parameter_value can hardly have a finely-grained permissioning record attached to it, so these rows may just be deleted from acs_object_context_index.

We probably also have to look at each individual package to determine which object an have finely grained permissioning. E.g. on one project I worked on they were using a group calender: in such a case a single cal_item wouldn't need a permission/context record.

I think the Package Developer initially determined which things needed to be ACS Objects. I don't think you can just decide that you don't want them to be, and delete them, or am I missing something here? Your performance is going to be more related to the number of permissions checks per page, rather than the number of objects.

Some packages over use permissions/privileges as an index to who can view what on a given page. These pages will be very slow, and potentially slow down the whole system.

Whether or not permissions will be granted on something is *one of*, not the *only*, criteria to consider when making something an object.  Also important are whether or not it's important to track when the item was created, and by who, and whether or not an item is going to be plugged into the general  scheme of things.  If we stop making such things objects, then we'll be retrograding to the 3.x-style world where different suites of packages solve the same problem in different ad hoc ways.  That wasn't a particularly pretty picture, as I recall, so I'd like to avoid our going back to that state of affairs.

Having said that, there are clearly too many object types in the basic design.  Over time I'd like to pare them down, but only with careful thought.  It's simple to say that an apm_parameter doesn't need permissions, for instance, because anyone who can admin the package should be able to admin the parameters.  But the web admin UI allows for the creation of new parameters, and perhaps having audit info as to who created or last modified a param is useful?  If so, then in the 4.x design that means it should be an object.  Doing auditing separately might be a good idea but that's not how 4.x works and the odds of our making fundamental changes in the overall design at this level AND PROVIDING UPGRADE SCRIPTS for existing sites is very low.  On the other hand a future redesign that abandons existing sites would have  a lot more freedom, but so far the community has expressed little interest in this approach.  Most of us seem to want to write sites today incrementally improving 4.x instead.

Now, on to specifics ...

Have you actually measured the amount of time permission_p takes on your data?  This is another case of handwaving, saying "permissions are too slow" without posting data.  The number of rows in acs_objects is meaningless in the analysis, the pertinent question is "how long does a permissions check take?"

Can you post the query you're using to test performance?  Can you post how long it takes to execute and the query plan generated by Oracle and the hardware you're testing on?  Can you post pertinent Oracle stats so we can be sure the large amount of data in the system isn't causing cache misses etc?

The load testing you're doing is extremely useful, but we need a lot more detail before we can analyze the results.  I've personally just done load testing with a couple hundred thousand objects rather than a couple million objects because at the time I only had a P500 available.  However in that context permission checks, properly written ones at least, performed adequately.

In general the permissions design anticipates that you'll use acs_object_party_privilege_map in where clauses to filter rows returned by queries rather than calls to acs_permission.permission_p().

In the past, though, that view gave horrifically slow performance and ars Digitan advice was to avoid it, which is reflected in the coding of some of the packages.  And when we ported to Postgres, it turned out that view was *always* too slow to use, and we were forced to abandon its use.  In some contexts an alternate view, all_object_party_privilege_map, was usable but not in all.  So many people working on packages began calling the permission_p function to check perms in the where clause as a last resort.

Now, though, with my 4.6.1 permissions upgrade, the acs_object_party_privilete_map view's fast and should be used if your query does something like "return all the rows in this table that the user has the 'read' permission on".  In my work on file-storage while improving the performance of permissions I got a 3x-4x performance improvement using this view to filter files the user has read permission on vs. calling the permission_p() function.  Before my rewrite of permissions using the view would've slowed down the query by a couple of orders of magnitude, so you can see how the rewrite as impacted my thinking, at least, as to how to write queries that do permission checking (and I've upgraded the permissions design doc for 4.6.3 to reflect this fact, though I've not committed the new doc yet.)

Michael Hinds and I are slowly working on getting the rewrite of new-portals that Open Force began but did not complete working again.  It will take some time.  One reason I've been pushing to complete this rewrite is that it reduces the number of object types in the datamodel, with mapping tables which are strictly local to the package being plain tables rather than derived from acs_objects.  This should reduce the number of objects in a dotLRN installation  considerably.

I'm also curious as to the hardware you're testing on, and as to what kind of expectations people have for performance on a particular hardware platform.  A dotLRN installation with nearly one hundred thousand users is a HUGE system, and I personally wouldn't expect to be able to support such a community with a cheap PC, for instance.  Your loaded instance has about four times as many objects as the live system at Sloan/MIT that's been running for about a year now, and has three times the number of users than students that attend the University of Heidelberg.

As far as specific object types go, let's think about dotlrn_clubs as an example.  These are objects because they need to do user-role permission checking.  Who are the admins?  Who are the members?  Admins can modify policy for clubs on an individual basis, and classes, which are implemented similarly to clubs, have a more complex role structure (students, profs, TAs, etc.)

If these weren't objects, dotLRN would have to implement a separate permissions management scheme of its own, adding to its complexity.  The first two or three attempts at implementing a general permissions scheme for 4.x failed miserably in performance, and it took me a lot of work analyzing it before I figured out how to improve it to wher e it is usable with reasonably large numbers of objects.  So what are the odds that a first attempt to create a parallel scheme for dotLRN would scale well?  Even if it did, new hackers coming to the system would have more code with additional complexity to master before they could extend the  base system.  Isn't avoiding this kind of redudancy and parallel effort important, too?

There's an underlying philisophical issue that the community may want to discuss - our target price/performance for real websites, in some sense.  How much consistency in the design and ease of development are we willing to trade off in order to make it possible to run larger sites on less expensive server hardware?

There's not an obviously correct answer here.  You and Timo argue that execution efficiency should be number one, but efficiency of development as we continue efforts to support more rapid development is another way to save clients money.  There are likely to be trade-offs here.  How many in the community truly expect to be developing sites for clients with a hundred thousand users in an intranet (real users as opposed to unregistered visitors more typical of public sites), and for those who are, doesn't the budget for such projects tend to be much larger than budgets for more modest site development?  How far should we go to simplify the deployment of extremely large and busy sites that have significant development budgets if doing so increases the effort/cost to customize/write new packages for more modest sites?

I think these and similar questions are important, too ... and that there's probably not a single answer that fits the vision of everyone in the community.  So they should be considered when we talk about revisiting the design of various aspects.

There's actually a fairly long list of detailed modifications to the kernel that we can pursue that would help scalabilty without sacrificing generality, and we should start talking about them before too long.  Again, upgrade scripts for existing installations are a real hassle.  I spent about twice as much time writing and thoroughly testing my upgrade scripts for the permissions code as I did on the actual rewrite, for instance, and this is probably typical for changes to fundamental pieces of acs-kernel.

I just started working a bit on this, I'll try to answer Don's questions in the next days.

Personally I don't find the permissioning code too slow. The system in question is huge and finely-grained permissioning is probably not a good idea - not for "coarse" things.

I was careful in avoiding to complain about speed of the permissioning system :) Here's the explain plan output


         94  recursive calls
        232  db block gets
     167407  consistent gets
       1401  physical reads

for this query:


SQL>   select site_node.url(node_id) as url, acs_object.name(object_id) as name                                                                                                                     
      from site_nodes                                                                                                                                                                               
      where parent_id = site_node.node_id('/')                                                                                                                                                      
      and object_id is not null                                                                                                                                                                     
      and acs_permission.permission_p(                                                                                                                                                              
          object_id,                                                                                                                                                                                
          acs.magic_object_id('the_public'),                                                                                                                                                        
          'read') = 't';

Timed statistics etc. all later

What I want is a careful consideration of how we can make central tables a bit (fair bit? a lot?) lighter - *without* breaking the past and that step by step.

Don, could you post this list of modifications to the kernel? There's no need for the list to be complete, we could put this onto a project page, try to order it by impact in complexity, time, and benefits, and start discussing the value of every modification.

Collapse
Posted by Dirk Gomez on
Ts, I forgot very basic information: 12 rows out of 16 rows in site_nodes yield a 't'. Here's the cost for the some query without the callout to acs_permission.permission_p:

         98  recursive calls
          8  db block gets
        592  consistent gets
          0  physical reads

Ahh ... see, the code's calling permission_p() rather than use the acs_object_party_privilege_map, which might look something like:
and exists (select 1
            from acs_object_party_privilege_map m
            where m.object_id = site_nodes.object_id
              and m.party_id = acs.magic_object_id('whatever')
              and m.privilege = 'read')
and perhaps even poking at the magic object map table directly avoiding the call to acs.magic_object_id. Personally I'm not the least bit offended by code that pokes into tables we know won't change if there's good reason, and testing with cc_users showed that calling acs.magic_object_id() can significantly slow down queries.

If you have time, could you try recasting the query first using acs_object_party_privilege_map and then poking into the magic object map directly, and then post the results here? It would be interesting to compare the three cases, the current query, then modified to use acs_object_party_privilege_map, then further modified to use the magic object map table.

I'm curious as to the large number of physical reads in the existing query ... if that holds steady for repeated runs then doesn't that indicate your buffer cache is too small? This is just a side issue, though, I would expect rewriting the query to speed things up considerably.

Also the call to site_node.node_id will become problematical as the site_node map gets large, there's no index on the name column in the datamodel and the function's pretty expensive.
I had a query almost like
select site_node.url(node_id) as url, acs_object.name(object_id) as name                                                                                                                     
      from site_nodes                                                                                                                                                                               
      where parent_id = site_node.node_id('/')                                                                                                                                                      
      and object_id is not null                                                                                                                                                                     
      and acs_permission.permission_p(                                                                                                                                                              
          object_id,                                                                                                                                                                                
          acs.magic_object_id('the_public'),                                                                                                                                                        
          'read') = 't'
in my system until a few weeks ago and it became to slow to use. First of all site_node.url and acs_object.name are just as slow if not slower than permission_p. Also calling permission_p in the where clause (at least with oracle) is asking for trouble.

I've got more than 5000 site nodes so I created a seperate table to hold the url and name and ended up with something like

select url, name, acs_permission.permission_p(object_id,:user_id,'read) perm
from site_nodes n, site_nodesextra e
where n.node_id = e.extra_id
and paerent_id = :slash_id
You can maintain the extra stuff with a trigger. I stuffed the whole mess in a function and only return rows where perm is true. I also memoized the result for 5 minutes. In your case you could also add the public permission.

The last time I looked at the permission docs I think they say the permission system is not designed to retreive permissions on large row sets. Hopefully it still says this because it's true.

This is not as bad as it sounds and it might be helpful if the docs suggested some alternitives. If you need to group things make a package, put the object_id of the package in the context_id and look for that. I think the docs say don't do this but it works really well. You can also check for direct permssions which is really fast but not as flexible.

I've been running a production system for a couple of years now. We put about .5 gig of data into it every day and everything is an object. It's better to make everything an object and deal with whatever permission problems arrise, than to deal with inconsistancies caused by not making things objects.

Barry, are you saying 0.5 gigabytes of data into your production OpenACS system every day for two years? Equals 365 GB of data currently? That sounds like a lot! Can you tell us more about what your application is, what hardware and OS you use, and how you've coped with such large data volumes?
I think the use of the acs_object_party_privilege_map is safe to use to retrieve the correct rows rather than to do as you've done, which is to retrive all the possibly correct rows then return whether or not "read" is allowed, where presumably you throw away the "false" rows at the Tcl level. Note though that this is *only* true because of the changes made in 4.6.1. Beforehand using the view was, as I've said above. asking for trouble, guaranteed in Postgres and with about a 50% probability in Oracle with the odds being due to the unpredictable behavior of the Oracle query optimizer. So you might try recasting that query using my
exists
clause above as a model, getting rid of the permission_p call in the rowset portion of your select clause. I'd be interested in seeing a comparison between the two, if, of course, you have the time to do so and sufficient curiousity. The more data we get, the easier it will be to think about any future optimizations we might make and the more confident we can feel when writing a "best way to query permissions" guide.
I've tried acs_object_party_privilege_map before without much success, but I may try the 4.6.1 version if I run into trouble again. From what I've seen the permission system is very usefull but I you need to return a rowset based on the value of permission_p you need to rethink your design.
The view is now literally a couple of orders of magnitude faster in some (not all) cases in Oracle, varies little in performance whether an object gets a privilege directly or through inheritance, and can be much, much more than a couple of orders magnitude faster in PostgreSQL (the view would occasionally eat all virtual memory and cause Postgres to get very unhappy.)

I don't think your statement's really true anymore.  I added the permissions check into SloanSpace's file-storage folder display query on their test server (they'd taken it out for the very reason you state) and it only slowed the query by about 10-15%.  That's not a bad penalty at all for the fine-grained control you get (if you need it.)

I forget how many files were in their system at the time but I know there were over 850 URLs alone and that the vast majority of entries were not URLs.  10-20K files at least and a half-million objects ...

YMMV of course.

Wow! Don: it looks like I really need to revisit my thinking on data model design. If real permissions can now be queried so efficiently, things I thought were just impossible in the real world may now be the only reasonable way to do things.

Let's put it this way ... it's a LOT faster than it, and it doesn't degrade spectacularly in performance when you add lots of parties, objects, etc the way it used to.

Whether or not it is fast enough for a particular real-life site is an open question, though.  Testing is advised!

Also I'd still like to flatten out the number of defined permissions.  If we didn't have 20 different "foo_read" privileges performance would be better.  Currently the more packages you add the poorer the permissions system performs simply because of all these "foo_read", "foo_write" and "foo_admin" permissions all of which are children of the global "read", "write" and "admin" perms which adds to the denormalized hierarchy maps.

Server prices are getting awfully low for impressive performance, though, my new SCSI based Dell 600C, which after a second disk (for software RAID 1) and an extra 512 MB registered ECC RAM came in at about $800, greatly outperforms the dual P450 SCSI based system I built for $2100 four years ago and has 8 times as much RAID 1 storage and the disks are 1/3 faster with huge 8MB buffers, too.

Not that this means we want to wantonly burn cycles, but the flexibility of the permissions system certainly is attractive and high throughput servers are cheaper and cheaper ...

Collapse
Posted by Ola Hansson on
Also I'd still like to flatten out the number of defined permissions. If we didn't have 20 different "foo_read" privileges performance would be better. Currently the more packages you add the poorer the permissions system performs simply because of all these "foo_read", "foo_write" and "foo_admin" permissions all of which are children of the global "read", "write" and "admin" perms which adds to the denormalized hierarchy maps.
Don, the "foo_admin", etc., kind of privileges obviously exist in some numbers in the current toolkit ... I am curious if you (or someone else) could explain why they are there in the first place? What purpose do they serve?

From a practical administrative viewpoint, it is of course important to be able to delegate the administrative responsibility over certain packages in order to relieve the SWA(s). But that, in itself, doesn't mean you have to create a specific "foo_admin" privilege for every such package - giving a person the normal "admin" privilege on a package instance would suffice, right?.

As far as I can tell, the only benefit from using "foo_admin" to using "admin" would be that you may give someone "foo_admin" (admin restricted to instances of package "foo") on a whole subsite or the entire site, and not have to hand out the same permissions again when the you (the SWA) mount new instances of package "foo".

I presume the use of the "foo_blah" type of privileges may have better justification than that, though, if it is to be worth the performance cost.

It's not clear why Ars Digita wrote its packages to define so many new privileges.  The permissions design document urges one to be conservative about doing so, presumably someone in the core team changed their mind about that during the early days of ACS 4.0 without changing the documentation.

Do any of our Ars Digitans know?

The case you describe is just about the only use for these "foo-admin" etc privileges I've heard.  But as you say, it's really not sufficient reason if it imposes a performance penalty, and I know for certain it does.  It would just be a convenience to subsite admins, that's all, and there are other ways it could be implemented.

If we only had a handful of privileges in the system, it would make the generalized permissions admin UI a lot more usable, too.  Right now you're presented with a whole list of "foo-admin", "bar-admin" etc privileges you can map to a given object even though they don't have any meaning in most cases (elsewhere I've discussed possibly adding a mechanism to restrict the object types a particular privilege can be assigned to, but if we cut down the privileges in the system to a rational set this might not be necessary.)

Here's a thread about permissions hierarchy to refresh your memory:

https://openacs.org/forums/message-view?message_id=26613

Not a particularly flattering look at my early post-aD days, but oh well.  It does discuss why I ended up creating "glossary_admin", etc.

Enjoy my defensive responses,

I have only been putting in that much data for the last few months. I've got about 50 gig total. Much of it is XML documents and images. I just passed 10,000,000 acs objects. I may be at about 350 gig and 100,000,000 objects by the end of the year.

For hardware I've got 3 Sun X1s, Intel ssl box and a Dell 24?? dual xeon Oracle database with about 500gig mirror disks, Currently I do DNS load balancing but I've got an Arrow Point Load balancer I may hook up. Interesting hardware is dirt cheap these days. The SSL boxes where $250 a piece.

The site is only available to registered users and runs all SSL. It allows insurance companies to order house inspections and inspection companys to complete them and return them. It's really just a many to many B2B site.

We use acs_objects and permissions for everything. Currently I'm running 4.2 but I'd like to upgrade but I'm not sure quite how. I'm also planning on switching to 9i so I guess I'll do it all at once. Maybe I should write a case study for this site.

I will try the view memtioned above and see how it works.

Barry, FYI, Paul Huibers has upgraded a (much smaller than yours) ACS 4.2 site to OpenACS 4.6.x via the "run all upgrade scripts on the database" route. (As opposed to the "clean install of 4.6 and insert all old data" route.) I don't know if that'd be the best way to go for you, but it worked for us. It was not trivial, but a lot of the work may have been Paul fixing broken old upgrade scripts, etc.
Since there was never any scalabilty testing, we were told to make all the packages have their own permissions. That is why acs-reference (for one) have outdated permissioning.

I should probably change that now that deleting things from objects, permissions et al works (which it didn't used to so I left it there).

And yes you are right about wanting to assign someone say news admin but not site admin. I am not sure that is still the right way to do it as you can just create a group and assign the group admin on that instance of the package.

Jon,

That is such great, simple idea! In fact, you could create a news-admin group per subsite, and programatically assign the admin permission on the group using the APM callbacks.

I think that this is a better solution to the admin all packages of this type problem.

That sounds like the start of a plan! In more detail, how would this scenario work?

Subsite:
In an after_instantiate callback we'd create "News admin", "Forum admin", etc, groups.

News:
In an after_instantiate callback we'd grant "admin" to the subsite's "News admin" group on the News package_id (the package_key, that is).

...and on and on with Forums, etc...

Would this work if you mounted a second and a third ... subsite, where a new group with the same name (News admin, for instance) would be created?

In my scenario above, we should not grant admin on the news package_key, but rather on the news instance_id ...

(I was playing around with an _after_install_ callback for news, that's why I wrote such an obviously wrong thing)

Yep, Jon's pointed out one of the "other ways" this could be done.

Barry - do NOT try the acs_object_party_privilege_map view in a stock ACS 4.2 system!  Making that view scale was one of the major changes in OpenACS 4.6.1.

As far as upgrading goes ... I know at least one ACS 4.2 site has upgraded to the 4.6.1 permissions code without ill effect.  It's just a datamodel change and is transparent to client code.

It adds a new denormalized table which trades speed for space.

Ola,

Something like this:

Afer a news package is mounted, it would check is a news-admin group existed for the subsite. I am not sure exactly how to do this yet. If one already existed, it would use that and grant admin on itself to the existing group. If the group did not exist it would be created first.

What I am planning on the rewrite of the Cronjob Package is to allow multiple instances. All crons will have the context_id set to the instance_id of the package instance that created the cron. All privileges are assigned to the instance_id. Essentially the instance_id is the 'group_id'. However, I'm trying to get two modes: user and group. In group mode the package_id would be used to select which crons you can read/write. In user mode, the user_id of the cron creator would be used. You can switch back and forth between modes, per instance.

I don't know how good an idea this is, but I have never found a need for any additional privileges. It seems like the group idea of Jon's would allow any additional specific privs to be created in a very understandable way.

I think Greenpeace also upgraded a 4.2 site to OpenACS 4.6. You might contact them and see if they have their migration scripts around.

It would definitely be a service to the community to post these.

You might also talk with Jon Griffin: http://jongriffin.com/static/openacs/acs-conversion/upgrade

We only migrated custom packages from the ACS 4.2 core to the OpenACS 4.5 core.  There was no existing data therefore no data migration problem, and there was no use of the standard packages (forums, news, cms, etc.)
Barry and Don, I wasn't totally clear when I said:

It would definitely be a service to the community to post these.

I meant the case study on doing Barry's migration from 4.2 to OpenACS 4.6, not Greenpeace's scripts. Sorry for the confusion. And thanks for the clarification on what was migrated Don.

(5 days without net access - sorry)

Damn, I should not have posted the "offending" query. I meant to start a discussion about application deficiencies, not about a particular query.

Nonetheless here the results for diff. queries with the same semantics:


 select site_node.url(node_id) as url, acs_object.name(object_id) as name
      from site_nodes
      where parent_id = site_node.node_id('/')
      and object_id is not null
      and acs_permission.permission_p(
          object_id,
          acs.magic_object_id('the_public'),
          'read') = 't';

Statistics
----------------------------------------------------------
        118  recursive calls
        136  db block gets
     167407  consistent gets
       2026  physical reads

 select site_node.url(node_id) as url, acs_object.name(object_id) as name
      from site_nodes
      where parent_id = site_node.node_id('/')
      and object_id is not null
      and exists(
         select 1
           from acs_object_party_privilege_map m
           where m.object_id = site_nodes.object_id
           and m.party_id = acs.magic_object_id('the_public')
           and m.privilege = 'read');

Statistics
----------------------------------------------------------
     166032  recursive calls
     663816  db block gets
     333325  consistent gets
       1984  physical reads

select site_node.url(node_id) as url, acs_object.name(object_id) as name
      from site_nodes
      where parent_id = site_node.node_id('/')
      and object_id is not null
      and exists(
         select 1
           from acs_object_party_privilege_map m
           where m.object_id = site_nodes.object_id
           and m.party_id = -1
           and m.privilege = 'read');

Statistics
----------------------------------------------------------
         80  recursive calls
          8  db block gets
     167324  consistent gets
       1959  physical reads


So unfortunately no advance with directly accessing acs_object_party_privilege_map. But note the price of acs.magic_object_id: "evils of PL/SQL" (my broken vinyl record) and not telling the database what I know. Maybe I should start there.

Proposal 1: Hardcode values that are hardcoded anyway.

the_public always has the id -1. Not using a function to retrieve this id made a *gigantic* difference for this particular query. It will make a gigantic difference for many other queries as well.

Replacing it with a join to the magic_objects table is not enough. I *do* know that I want -1. Why not tell the database right away? The optimizer may make a much better decision and I don't lose flexibility. If people dislike hardcoding values, then we should use PL/SQL global variables and the TCL function acs_magic_object_id.

Proposal 2: Gradually remove permission types.

Not only do they make the permissioning system less scalable, they make the packages much more complex. Why are the foo_* permissions there in the first place? I think they were a selling argument..."our permissioning system is extensible..." There's no real use for most of them. And forum_write and write is similar enough. As to having admin on news and not on subsite - you would grant a particular user admin on the news package, not on the subsite - or am I missing something here?

Proposal 3: Research which object_types in acs_objects and acs_object_context_index are being used.

There still is not a good policy on which data needs to go to acs_objects etc. It is not unlikely that some information in the central tables is write-only. We may be able to remove rows from central object tables without further ado...

Proposal 4: Come up with a good policy for acs_objects etc.

The delination of what needs to be an object and what not is still very fuzzy. https://openacs.org/doc/openacs-4-6-2/objects.html refers to a rule of thumb. We should find a "proper rule" and then have an eye on changes to central tables. Here's a fuzzy propsal: "Everything that has a URL to the user needs to be an acs_object", the rest goes to other meta_tables e. g. acs_audited_objects, acs_internal_objects

Proposal 5: Flatten out the privilege hierarchy into a table

The privilege hierarchy is sstill a view that performs an implicit connect-by. We should just flatten that out with triggers or - Oracle! - a materialized view.

Proposal 6:: Add a section "Permissions and site-wide metadata" to packages docs.

As using central tables inevitably affects other packages their use should be documented. Probably helps us get a grip on a "metadata table use".

I'll be throwing in my time once we have reached a agreement on any of above mentioned points...

<blockquote> But note the price of acs.magic_object_id: "evils of
PL/SQL" (my broken vinyl record) and not telling the
database what I know.
</blockquote>

Hi Dirk!  What if you tried the following in your query above:
    and m.party_id = (select acs.magic_object_id('the_public'))

I seem to remember optimizing a Postgres query by doing something similar, the idea being that the subselect only runs once.  BTW, does Oracle support the notion of defining acs.magic_object_id() as immutable or stable?

Try it without any permission check and see what the results are. I suspect acs_object.name and site_node.url are as much a problem (if not more) than the permission call
Dirk, is there any specific reason why you are always including site_node.url in the queries you are testing? I thought that was discussed previously and it was agreed that this proc could be replaced in most cases. I would be more interested in stats that measure realistic security checks only, which are required in many situations.
Yes, it gets much more interesting if you remove the other PL/SQL calls.  If you have time could you just return the site node node_id and otherwise strip out all the PL/SQL calls with the acs_party_object_privilege_map version using "-1" so we can get an idea as to the cost of the perm check alone?

Using magic numbers in the code isn't really a good idea, for instance I just changed the security context object id from 0 to -4 in 4.6.1 and had to chase code that had hardcoded object 0 (fortunately not much had.)  I needed object id 0 in order to provide a "real" party with party_id 0 for the "unregistered visitor" ([ad_conn user_id] == 0 case) so I could remove a UNION branch from the various views (all UNION branches are gone now, more important for PG than Oracle.)

PG doesn't have global variables and I'd hate to see us adopt an Oracle-only solution.  With a subselect in your from clause you can probably grab the magic object id "properly" and end up with the same efficiency:

from (select acs.magic_object_id('the public') as party_id from dual) public
...
where party_id = public.party_id

while maintaining the abstraction.

File storage queries improved to a much greater extent when I moved to using the acs_party_object_privilege_map.  What I've been seeing is more consistency using the view compared with calling permission_p in the where clause.  Sometimes Oracle seems to get this bright idea that it should call permission_p() first on all rows before doing any other qualification on them and bang! you're out 'o luck.

Hmmm ... come to think of it, I think Oracle's calling acs.magic_object_id for every row in the acs_party_object_privilege_map before it weeds any out and that's why removing the call made such a HUGE difference!!!  Yeah, I'm sure of it.  BAD Oracle!  NASTY Oracle!  DON'T DO THAT, ORACLE!

Flattening the privilege hierarchy can be done, we've denormalized all this in the PG version.  But ... getting rid of all those "foo_*" style privileges should make that unnecessary.  Even CONNECT BY should be fast when there's only one or two things to connect!

I know you didn't want to focus on this query per se but real world examples like this are a big help when thinking about improving scalability.

When I last proposed using a materialized view in the permissions realm I was informed that they're an Enterprise Edition feature only.  We've held back from requiring EE features since the added licensing costs are burdensome.

We could, however, insert the code to materialize the view in comments, as we've done with bitmap indexes, and a common inviting people to uncomment the code if they're EE licensees.

Tilmann, the site_node function is in there because it is in the code I found. I didn't write the query, I am just using it as an example. Where did I find this query? In /www/index-oracle.xql - the first page you visit on a freshly installed OpenACS system. Or well, the first page you visit on any OpenACS system.

Don, yes the CBO is apparently bad about estimating the price of a PL/SQL function - hence performs it on every row.

Barry, look above. Of course I had run the query without the permission check and had posted the result in this thread. For your convenience the result:


         98  recursive calls
          8  db block gets
        592  consistent gets
          0  physical reads

Still expensive (the result has only 13 rows), but *definitely* not the culprit here.

http://technet.oracle.com/doc/oracle8i_816/server.816/a76962/ch4.htm: Yeah, materialized views are EE edition only 🤔

Why not just pass on the magic object id from the TCL code. It would give the database all the information it needs. cc_users could be commented e.g.: -- this is ugly -1 is the magic_object_id for the_public Robert, the CBO rewrites the query anyway. Per default it tries up to 80.000 different execution plans and however you shuffle around your PL/SQL function, your query plan is most likely the same. (There is a "trick" with rownum in subselects though)

Here is the version that is both - textbook-compliant (no magic hardcoded values in your SQL script) and efficient (only *one* substitution at creation time, never ever during run-time):

column magic_object_id new_value MAGIC_OBJECT_ID
select acs.magic_object_id('the_public') MAGIC_OBJECT_ID from dual;

create or replace view foobar as select * from
  acs_permissions
  where grantee_id = &MAGIC_OBJECT_ID;

It is unfortunate that this query is on the first page of acs_subsite because it will fail sooner or later. From what I've seen it is not easily fixed either. My usual quick solution is to move the permssion check into the select part and get rid of the rows in TCL. That as least makes the query time dependent on the number of rows returned not the number of acs_objects.

I *really* don't want to tune this particular, but discuss the size of kernel metadata tables...

If you want to make the CBO defer the evaluation of PL/SQL functions you can try this:

select * from
 (select rownum, a,b,c
  from d,e,f
  where ...)
  where acs_permission.permission_p(...)='t';

Try it. I find the TCL solution very, very kludgy.

Dirk, yes, grabbing the magic id with the tcl proc then using the result as a bindvar in the query is a clean way to avoid using a magic number in the query.

Also ... acs_object_context_index is weighing in at less than four times the number of objects.  This is a table with three integer columns in it, a pretty skinny table, the actual storage used will be considerably less than that for the objects and auxillary tables.

Hmmm...one concern with your query (even though you don't want to talk about it :) is the number of physical reads.  Looks like bumping the shared buffer space for Oracle is in order, no?

In the next few weeks I'll load up my nice new server with its nice new disks with a much larger number of objects than I've tested with in the past.  I'll play around with expanding the hierarchy view in Oracle as we do in PG and see what effect that has.

I agree the TCL solution is kludgy (perhaps even worse if you use db_multirow), but I've found that Oracle is way to cleaver to be fooled by something like
select * from
(select blah)
where acs_permision() = 't'
I'm running 8.1.7. Perhaps other versions are not so smart.
Barry, note the rownum in "my" query. That should do the "defer" magic.
A new stupid Oracle trick. I'll have to try it.
Don, yeah, bumping up the memory should avoid physical IO. But while the data is real, the instance isn't.

I'll write up some "plan" over the next few days recapping and summarizing this thread. This is probably more a 5.1 thing then, because Peter and I have picked up "noquote" again.