Forum .LRN Q&A: Assessment items discussion thread

To limit individual emails and to store some ideas from chats, I want to open the assessment item discussion thread, which deals with the creation of the assessment item page at https://openacs.org/projects/openacs/packages/assessment/design/as_items.

From the preamble:

The Item and Section catalogues are central parts of the assessment system. These repositories support reuse of Assessment components by storing the various Items (or questions, if you like) and groups of Items (i.e. Sections) that can be used in an assessment. You are able to add/edit/delete an item of a certain type within a certain scope. Furthermore it allows you to search and browse for questions for inclusion in your assessment, as well as import and export multiple questions using various formats.

Posted by Malte Sussdorff on
I hope we will get a general help system attached to widgets, but for the moment I have a few notions:
  • Why do we have a sort order in as_item_help_map? Is it possible to have multiple help texts under *one* icon (just thinking UI here)?
  • Shouldn't as_messages contain a locale?
Furthermore, it would be good if we could colorize the mandatory tables vs. the optional ones. Even better would be to colorize the tables into feature areas, which would allow us a staged development process.

To give an example: though internationalization is key, our method of overlaying the default locale stored in as_items with additional locales in as_item_checks allows us to defer that functionality to a later stage, but we know that we always have to add a locale to the as_items table (probably defaulting to the user's current locale).
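To make the overlay idea concrete, here is a minimal SQL sketch. The table and column names (as_item_localized, item_text) are only placeholders, since we haven't settled the data model yet; the point is simply that the default wording lives in as_items and a per-locale row, if present, wins:

    -- fall back to the item's default wording when no translation exists
    select coalesce(loc.item_text, i.item_text) as item_text
      from as_items i
           left outer join as_item_localized loc
                  on loc.item_id = i.item_id
                 and loc.locale  = :user_locale
     where i.item_id = :item_id;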

3: as_items IRC discussion (response to 1)
Posted by Malte Sussdorff on
https://openacs.org/irc/log/2004-03-01#T18-42-47

The talks were about as_items, creation of item_types, localization, help, display, and other things. Major changes in the specs are a result.

Posted by Stan Kaufman on
Malte, I've updated the Items page with a "colorized" graphic that divides these metadata components functionally. It's hard for me to see how any of these pieces are "optional". They all look mandatory to me. 😉

Re your specific points:

- a sort_order on as_item_help_map may not be necessary, but it doesn't hurt anything either. What mapping table doesn't benefit from a sort_order column?

- a locale column in as_messages makes sense and is now there.

Posted by Carl Robert Blesius on
Great graphic Stan. Makes it possible to actually follow where things are heading.

Posted by Stan Kaufman on

Thanks Carl. Those with more intracranial RAM may not need it, but I gotta have pictures of this stuff. 😉

I'd like to get some ideas from people about what it means to maintain versions of the components in an Assessment. The reason this subject is important is that it will guide how we use the CR in the package. Humor me here while I think aloud.

First, consider that we're trying to support maximal reuse of Items, Sections and even entire Assessments while also enabling Assessment authors to modify elements as they reuse them. When an author uses an existing Item, she can use it exactly as is, modify it completely, or modify it to some degree in between. Unfortunately for us, there are no clear semantic boundaries here. If an author takes a multiple-choice Item with three options, and then adds a fourth, is this Item now a "small" version change? Or is it a fundamentally different Item? How about if the author changes the wording of the "question" but leaves all the choices the same? (We'll ignore translations to other languages for now, but that is yet another issue.)

Second, consider that the CR provides two main ways to "reuse" things. An existing cr_item can either get a new cr_revision, or it can be cloned completely into a new cr_item. From the standpoint of the CR's definitions, the first way produces two versions of one item, while the second way produces two different items. This is true even if the new cr_revision has every attribute ripped out and changed to something different from the first cr_revision, while the new cr_item is cloned without changing any of its ancestor's attributes. The first case potentially produces an "evil opposite 'twin'" while the second case produces a "completely identical stranger." The point: the CR doesn't impose any semantic controls here on what constitutes a "revision" vs a new "item". The distinction is entirely left to the whims (or hopefully good judgment) of the user.
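To make the two paths concrete, here is a sketch of what each outcome looks like in the standard CR tables (assuming only the usual cr_items / cr_revisions columns; in practice you'd go through the CR's API rather than poking at the tables by hand):

    -- Path 1: one cr_item, several cr_revisions -- "two versions of one item"
    select r.revision_id, r.title
      from cr_revisions r
     where r.item_id = :item_id
     order by r.revision_id;

    -- Path 2: two distinct cr_items, each with its own revision chain --
    -- "two different items", however similar their content may be
    select i.item_id, i.name, i.live_revision
      from cr_items i
     where i.item_id in (:original_item_id, :cloned_item_id);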

So here are the questions:

1. Is it the case that we can't and shouldn't try to be any more prescriptive in this matter than "leave it up to the user"? Should the core elements in the Assessment package be implemented as cr_items/cr_revisions and let users branch their repositories in whatever fashion they like, no matter how cluttered their repositories will get over time? This would be easiest, perhaps, but doesn't seem best to me.

2. Is the notion of a "revision" nonsensical here such that we shouldn't even use the CR? If an Item (or Section etc.) is changed even a little bit, is it now a "new" entity that should stand in the repository as a sibling to its predecessor, not a child? In that case, should we merely "clone and modify" new rows in the as_items table, for example, and forget about the CR?

3. Can anyone articulate a convincing rationale for when an author should make a new cr_revision "revision" instead of a new cr_item "clone"? How about this: since the "item_text" and "item_subtext" (i.e. "what the question is") express the meaning of the Item, if these attributes were put in the cr_item and thus made immutable, then other attributes (the number/nature of the choices to the Item, scoring, etc.) could be changed.
So if the author means the same thing when revising the Item, she would make a new cr_revision and not change the wording of the question. But if she wants to change the wording of the question, then she really wants to change its meaning, so she needs to insert a new cr_item cloned off the original.

On one hand, I think that option 3 makes good sense and may serve most potential applications the best, as well as make optimal use of the CR (which is the goal within the OpenACS framework).

However, let me cite a counterexample. At some point during his research, my colleague who developed and validated the "Seattle Angina Questionnaire" (see http://www.cvoutcomes.org/demos/) added a fifth choice to one question. He didn't change the wording of the question -- just added another option. This changed the scoring of one scale and fundamentally altered the interpretation of the entire instrument. Prior clinical data scored the first way was no longer directly comparable to subsequent data. In a very real sense, the minor "revision" resulted in a semantically new assessment. So if the SAQ were being implemented via the Assessment package, how should this be handled?

Anyway, I'm keen to hear what people think about this, since whatever semantic implications we decide there are during editing any of these elements will have to be implemented via triggers or other procedural mechanisms as well as the fundamental physical modeling in the db. The better we can understand what people expect here, the more likely we are to get the package right. Thanks.

Oh yeah, and if you think that this is all a steaming pile of nonsense, please point that out, too. I may have worried myself into a deep dark hole from mulling over this too long. 😉

Posted by Malte Sussdorff on
I did some cleaning up between as_items and the rest of the metadata, moving around bits and pieces.

Stan, could you change your graffle on the as_items page to reflect this fact and not show any section or assessment details, but recreate them on the metadata page with as_items showing up as a black box? (We only have two interfaces anyway, from as_item_section_map and as_scale_item_map.)

For those concerned: application-wise we are going to use an as::item::display function that will take the snippet out of as_items and insert it appropriately. This allows us to stay scalable despite the many, many tables and functions surrounding as_items. We might actually go even further and create the whole survey in HTML pages, but for the moment we don't have to.
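In other words -- and this is only a sketch, with presentation_snippet as a stand-in for whatever we end up calling the column -- rendering an item at request time becomes a single-row fetch instead of a join across item types, choices and checks:

    -- one-row fetch of the pre-rendered widget markup for an item
    select presentation_snippet
      from as_items
     where item_id = :item_id;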

Posted by Stan Kaufman on
Based on our discussions earlier today, I've heavily updated the entire Design section here: https://openacs.org/projects/openacs/packages/assessment/design/

Please review it and point out anything I missed or stuff that you still don't agree with, if anything.

Posted by Stan Kaufman on
I've also edited the Items Requirements page (https://openacs.org/projects/openacs/packages/assessment/requirements/item_types) a fair bit, highlighting some of the more important things in red. This list doesn't include a number of authoring configuration options that we've already listed in the Design documents.

More importantly, though, there are several "item types" that are really not *item* types but in fact are *section* types. It probably doesn't really matter for the purposes of this page, since whether a given Question Experience is fashioned via manipulation of an Item or a Section is really a Design Issue. But I think it's probably useful to keep the distinctions clear even though we've already leapfrogged the requirements in the design docs.

Posted by Malte Sussdorff on
Stan, this is awesome work! You are Mr. Graffle, definitely. Thanks for doing the grunt work of splitting it up; this makes it so much easier to read and to edit.

Can I ask you to upload the graffles, before you leave, to the file storage folder with the same name as the image? This will allow me to link them to the images, and I could edit them as well (and not have you go through all changes in the documents on your return to update the graffles).

Which brings me to another point: How shall we model the relationships (aka foreign keys) and types of the table attributes?

My inclination is to add this information to the text and not the graffles, but maybe others prefer to have a clean ER diagram?

Posted by Stan Kaufman on
Malte, I've uploaded all the OG files from which I exported the subsystem graphics. You'll see them in the listing of the Assessment files in the file storage, each with the same name as the graphic image, appended with the term OmniGraffle. It should be clear.

I've also included a link to the OG file just above the image in each page, so that should make it easier to download the OG files for editing. Hopefully the naming scheme will make it easy to upload new versions into the correct place.

There are a variety of other files in our repository used in the older doc pages. They kinda clutter up the area but I don't think it's bad enough to move them into subfolders, and I don't want to bother with reediting all those archival pages.

Re the level of detail to put into the diagrams (like FKs etc.) -- I didn't find an obvious way to make OG do some of the formatting that the tool Lars et al. use in their Simulation package (/contrib/packages/simulation) can do. If anyone knows how to do this and wants to bother, that's fine.

But personally I think that kind of detail is better kept in the "Specific Entities" section where we list and discuss the design. The purpose of the graphics is to draw the Big Picture and keep things in perspective. Trying to maintain All The Details in *two* places (the listings *and* the graphics) makes it likely that *neither* place will have the reliable information.

Even the way we have the graphics now invites errors during edits, since we have the "master graphic" (which I started off considering the Authoritative Schema) and all the subsystem graphics. If anyone edits the latter, I would think they should make the same edits to the former. But that doesn't seem all that likely, does it? So there is a high risk of Graphic Slippage. 😉

Anyway, I suggest that we follow this convention:

1. The Definitive Design Spec for any entity is its listing/discussion in the "Specific Entities" section of its subsystem.
2. The Most Reliable Graphic for any entity is the subsystem graphic on the page on which that entity's "Specific Entities" discussion occurs.
3. The Master Schema Graphic is *not* the definitive documentation of anything, though everyone who edits stuff here should make a good faith effort to keep it accurate and up to date.

So, how much SQL detail do we need to write in these html pages? Seems to me that we don't need a lot more; the primary keys and foreign keys are pretty obvious just by their naming scheme. I don't think that we need to type all the constraints etc here; we'll just have to do it again when we crank up emacs and start writing *.sql files. If anyone has a strong opinion about something like whether an attribute needs to be a varchar(100) and not a varchar(200), that's the kind of thing that *should* be stated in the "Specific Entities" discussion, but the *rationale* is more important to type in than the exact SQL syntax.

At least that's the way I like to approach development docs. 😉

Posted by Malte Sussdorff on
We started the discussion on the SQL. A couple of questions came up:
  1. We have seen the table as_item_localized used to localize items, but we haven't found a way to localize the item choices. How are you going to do that?
  2. Why don't we use only the as_item_localized table for localization and remove the default_locale columns in the as_items table?
  3. We also don't know whether the as_items table is an acs_object, a cr_revision, or both.
Answers given:
  1. This is a big issue. It might be best to create an as_localization table consisting of (item_id, table_name, column_name, locale, value); see the sketch after this list. I'm not sure how the database API will allow us to set the variable called "column_name" to the value "value" in the caller's context.
  2. We don't want to hit the localization table with every request. Furthermore it is not necessary, as we do not assume that all content is internationalized (most items won't be). Therefore overlaying is the name of the game (yet another of these OO-like features I like).
  3. The question of how to use the ACS object system, and more specifically the CR, is one we need to work through before anyone starts investing a lot of effort in typing the SQL. In fact, if we do use the CR, we'll actually want to create much of the data model using the Tcl interface to it. See this thread for more info: https://openacs.org/forums/message-view?message_id=162095 (I allude to this in the API page I started at https://openacs.org/projects/openacs/packages/assessment/design/api but sort of stalled on).
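Here is the sketch referred to in answer 1, using exactly the proposed (item_id, table_name, column_name, locale, value) layout; the column sizes and the example row are made up and only meant to show the shape of the thing:

    create table as_localization (
        item_id      integer      not null,
        table_name   varchar(100) not null,
        column_name  varchar(100) not null,
        locale       varchar(30)  not null,
        value        text,
        primary key (item_id, table_name, column_name, locale)
    );

    -- one translated string for one column of one row, e.g. a choice label
    insert into as_localization (item_id, table_name, column_name, locale, value)
    values (:choice_id, 'as_item_choices', 'label', 'de_DE', 'Sonstige');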
This is just a summary; thanks to Eduardo, Alvaro, and Stan for the comments and questions.
Posted by Stan Kaufman on
Let me amplify my email remark that Malte included as "answer 3" above, since it's really more of a question than an "answer".

We need input from two directions:

1. Those people in dotLRN land who are actively implementing "testing" in real curricula: how do you understand "revisions" of such tests? What is the distinction between a modification of the "same" question and a modification that implies that you now have an entirely "new" question? Is there a distinction? Is there a definable transition point? If so, what is it? If not, are we nuts to talk about "revisions"?

2. The OpenACS gurus who best understand the CR: where in our Assessment datamodeling can we best utilize the CR? Between the two polar extremes of making *every* table inherit from cr_items/cr_revisions and *none* of them, what is the "best practice" application to our constructs? Or perhaps more usefully, what are the main conceptual principles we need to consider while we decide what to "stick in the CR" and what not to?

Posted by Carl Robert Blesius on
Let me give your first question a try, Stan.

<blockquote>Those people in dotLRN land who are actively
implementing "testing" in real curricula: how do you
understand "revisions" of such tests?
</blockquote>

Right now, we revise a test and it affects all people that have taken that instance of the test (e.g. change a question or an answer to a question and all people who have taken the test in the past are affected by the change).

<blockquote>What is the distinction between a modification
of the "same" question and a modification that
implies that you now have an entirely "new" question?
</blockquote>

Let me try to distinguish the two: the modification of the "same" question is the modification of a question that was/is actually used in an exam (e.g. it was discovered that the question and answer pairs were not matched up correctly, and once they are corrected the results of this instance of the exam change). A totally "new" question is when the content changes independent of a specific instance of an exam (e.g. a change of actual knowledge: once it was believed that the primary cause of peptic ulcers was hypersecretion of acid, now Helicobacter pylori is the peptic ulcer star, but the fact that this has changed in the books over the past couple of years does not mean that the results of an exam in 1985 are changed).

<blockquote>Is there a distinction?
</blockquote>

I am not sure. 😊 I do not think the examples I used did a good job of distinguishing the two above, because they both could be solved using "revisions" in the content repository (revisions of a single item).

<blockquote>Is there a definable transition point?
</blockquote>

Probably.

<blockquote>If so, what is it?
</blockquote>

The point at which we actually ask the user whether they want to create a "new" question?

<blockquote>If not, are we nuts to talk about "revisions"?
</blockquote>

Far from nuts, but keep it up and you are getting closer. 😉

So in summary: I tend to want to leave it up to the user (test admin) to define when a "new" question should be created (with warnings). Revisions can and should be used when a question that is in use or has been used in the past is changed.

Posted by Malte Sussdorff on
My take on the revision of tests is as follows, and it is closely related to Carl's insight.

Each test that has been taken by responders needs to be preserved in the state in which the user took it. This means we need to preserve the assessment settings, the section settings, and the items once a test goes live.

When it comes down to revisions, I therefore don't think we have the luxury of differentiation. Every version that has gone live needs to be preserved, and while we are at it, just preserve everything.

It gets interesting from a user point of view though. What happens if you change an item that is already in use in sections? My idea is to send an email to the owners of the sections with a one-click "approve new version" or "stay at old version" functionality. Usually you'd approve the new version if the section is *not* in an active assessment with responses in it (we should give a warning about this case) and you agree with the content of the change (especially interesting if the owners of the sections have a different view of the world).

But what happens if an item has been changed in a way that the author thinks is the same kind of question (and you can calculate statistics across all revisions of the question), but the other owners of sections disagree (stating that the change has made a comparison between responses to an item futile)?

Another question would be: do we need to store revisions of, e.g., item_types or all the other "supporting" tables? Here our denormalization with the adp snippet comes in handy. As we store the snippet with the item, we always store the representation of a revision of an item to the respondent. This way we can easily reconstruct an assessment for any given point in time.

To give more concrete answers on how to do this for sections and assessments, we'd have to look deeper into this data model along with the inter-item checks and such. We at least need to revision the mapping tables (as_item_section_map and as_section_assessment_map).
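One way to read "revision the mapping tables" -- a sketch only, with illustrative column names, not a settled design -- is to have the map point at cr_revisions rather than at bare items, so that a given revision of a section pins the exact item revisions it contained:

    create table as_item_section_map (
        section_revision_id  integer not null
                             references cr_revisions(revision_id),
        item_revision_id     integer not null
                             references cr_revisions(revision_id),
        sort_order           integer,
        primary key (section_revision_id, item_revision_id)
    );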

Posted by Stan Kaufman on
Carl and Malte: thanks for your ideas. Looks like we're nearing consensus here, and I think that a standard use of the CR will work for us.

- We need to maintain the "state" of an Assessment throughout its lifetime. This "state" consists of the wording/composition/etc of each Item and Section as it was during a subject's data collection event. This "state" is defined by the revisions of the Items and Sections and the rows in the mapping tables that associate them. Maintenance of this "state" is most important when creating reports/data extracts/etc of collected subject data.

- The semantics of when a new "state" exists is something that ultimately needs to be declared by the domain experts who set up the Assessment -- ie the admin users with edit privileges. However, since there can be more than one such admin user, we need a communication mechanism that alerts all other admin users whenever a new "state" is being constructed (ie when someone revises an Item, adds an Item, etc etc) and allows the other admin users to adopt the changes, or not.

- This means that there can be multiple "states" of the entire Assessment hierarchy that are "live" simultaneously. So we need mechanisms to make sure that each subject who comes to do an Assessment gets the right one.

If we make each of the main components (as_assessments, as_items, as_item_choices) CR entities (extending cr_items and cr_revisions), procedurally maintain the appropriate rows in the mapping tables (which don't need to be acs_objects like acs_rels but could be -- though do we win anything by making them such?), and build appropriate notifications, warnings, etc. into the editing UIs, then will we accomplish this? Seems so, though until we start to work out the details, I think it will be hard to know for sure.
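For concreteness, the usual OpenACS pattern for "extend cr_items and cr_revisions" would look roughly like this for as_items; the attribute columns shown are placeholders, and the real list belongs in the Specific Entities discussion:

    create table as_items (
        item_id               integer
                              constraint as_items_item_id_fk
                              references cr_revisions(revision_id)
                              constraint as_items_item_id_pk
                              primary key,
        default_locale        varchar(30),
        presentation_snippet  text
        -- further type, scoring and presentation attributes go here
    );

Each row then corresponds to one revision of an Item, which is exactly the "state" we need to preserve for reporting.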

Storing the adp snippets with the revisions seems like mostly an efficiency convenience, eh? It will make spitting out the Assessment to a user who needs to take it faster, but we don't want to rely on the snippet for any definition of what the nature of the revision is.

Posted by Matthias Melcher on

Stan wrote:

<blockquote> the CR provides two main ways to "reuse" things
An existing cr_item can either get a new cr_revision, or
it can be cloned completely into a new cr_item.
</blockquote>

This notion of reuse does not seem to fit with the reuse in the Content Packaging Standard IMS CP, and is therefore questionable, and the versioning concept is only an obstacle in filestore, anyway. In IMS CP, this distinction between version and clone is not necessary, nor is the question

<blockquote> If an author takes a multiple-choice Item with three
options, and then adds a fourth, is this Item now a "small"
version change? Or is it a fundamentally different Item?
</blockquote>

because either the reusable entities, like the three options, are separately addressable for reuse, or they are not considered worth reusing.

Even if the IMS standards are not yet complied with, or will never be met, it would be useful to align the principal notions with the standard's concepts, wouldn't it?

Posted by Stan Kaufman on
Hi Matthias, thanks for posting some ideas. A couple responses:

- The hypothesis behind all these docs/discussion is that an Assessment package can be constructed that will have generic applicability -- not merely to educational applications like .LRN but also to lots of other contexts. Therefore, limiting the design to specs from only one constituency, like IMS, contradicts this goal. On the other hand, if it turns out that there are incompatibilities amongst the specs, then the hypothesis will have been disproved, and multiple packages will need to be developed by different teams. I've always had concerns that this might be the case, but I'm still hoping not. But I definitely think we haven't proven this one way or another just yet.

- Maybe (probably) I haven't been adequately clear about the issue I'm addressing regarding versions. Consider this example: say that an investigator (or professor) has a survey (or test) with a question in it like this: "What is the political party of the British PM?" and the question choices (this is a multiple-choice format) are "Tory", "Liberal", and "Other".

Say that this survey/test is released and 1000 responses are in the database. Because 90% of the responses are "Other", the investigator/professor decides to sharpen up the responses by adding several more choices: "Liberal Democrat", "Green", "Republican", "Democrat", while retaining the others, including "Other". From this point on, all respondents will have a semantically different question to answer since they have additional options. This means that the section containing this question is semantically different, and ultimately the entire survey/test is semantically different.

So here's the rub: is it so different that it is an entirely *new* survey/test that should be stored as a "clone with modifications", or is it a variant of the original survey/test that should be stored as a revision? This becomes important at the point of data retrieval/analysis: how should the first 1000 responses be handled vis-a-vis all subsequent ones? Very practically, does the system have to map different cr_items to "the same assessment", or is there a single cr_item for this assessment but multiple cr_revisions? What is the query that pulls all these data points? How are the "states-of-the-survey/test" tagged in that result set so that anyone analyzing the results can make sense of this?
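Just to make the reporting question tangible, here is the kind of query I have in mind. The as_item_data table, its columns, and the idea of storing the item_revision_id with each answer are all hypothetical; the only real pieces are the standard cr_revisions columns:

    -- every answer ever given to "the same" question, tagged with the
    -- Item revision (i.e. the state of the survey) it was collected under
    select d.response_id,
           d.item_revision_id,
           r.title          as item_wording,
           d.answer_value
      from as_item_data d
           join cr_revisions r on r.revision_id = d.item_revision_id
     where r.item_id = :pm_party_item_id
     order by d.item_revision_id, d.response_id;

Whoever analyzes the data can then decide whether responses collected under different revisions are comparable -- which is precisely the judgment the system itself cannot make.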

I hope this makes some sense. If none of these issues are relevant to the IMS spec, then we'll have to go back to the basic question of whether we need to fork this effort.

Posted by Matthias Melcher on
Hi Stan,
I am not an IMS expert (I was only worried that no alignment was mentioned), but as far as I understand http://www.imsglobal.org/question/qtiv1p2/imsqti_asi_infov1p2.html#1442265 , IMS QTI/ASI is more oriented towards reuse and sequencing rather than statistical comparison. Therefore, I would think that in your example the new wording of the question must be seen as a different thing in the content repository, and hence the resulting assessment will have to be regarded as a new one as well. If all the other items in the sequence and all the other sequences in the assessment remain the same, the problem of matching old and new answers could perhaps be mitigated, but I think such complicated statistical analysis is so different from normal edu usage that it should at least be hidden from the normal edu course admin (to avoid confusing them) and placed in separate UIs. Couldn't the specs be similarly modularized as well?
Posted by Eduardo Santos on
Hi everybody,

I'm sorry to bring this topic back, but it's the only reference I found about assessment localization. I'll have to run some kind of Survey that has to be localized. This means I must have one question, and all the answer options have to be rendered according to the user's locale. I've seen that the assessment package should provide this feature, as said in /doc/assessment/versioning.html. However, when I looked for the table as_item_localized in my install, I just couldn't find it. It seems that the table doesn't exist.

Even after reading this thread, the IRC logs, and the docs, the work done so far is not clear to me. Why was this table removed? What's the focus right now, and what can be done to solve this issue?

If somebody can help me with that, I'll be very thankful.

Posted by Dave Bauer on
As an experiment you can try entering the questions as message keys.

i.e. #assessment.This_is_a_question#, and make each one unique.

The same goes for the choices:

#assessment.choice_label#

Then you could turn on translator mode and edit translations while administering or viewing the assessment.

It is not elegant, but it should work.

Posted by Eduardo Santos on
Hi Dave,

Thank you very much for your reply. I've thought about that already, but I was looking for a more generic, clean, and elegant solution. My system already has a lot of i18n keys, a solution like this would increase that number a lot, and regular users just couldn't do it.

My idea is that the person who is going to translate the surveys doesn't have to be the same one who creates them. I also think that the translator will not have any technical skills.

The funny thing is that the assessment package anticipates this in the design documents, but it's like somebody just ripped it out of there. The IRC logs show one conversation about it, but nobody reached a conclusion or a final spec about it.

If somebody can give me the design idea, maybe I can do it myself, but I would like to follow the package designers' ideas.