Forum OpenACS Q&A: Non-english characters in OpenACS 4.5

Is there a way to tell OpenACS (4.5beta1) to accept non-English
characters, e.g. in bboard postings (like this one) when they are set
to "Plain Text"?

This looks horrible: äöü ���

(They should be ä, ö, ..., Ü.)

Posted by Malte Sussdorff on
You should use UTF-8 with AOLserver and Postgres. Search this bboard for internationalization threads, and if you can't find an answer, ask Don to reply 😊.
Posted by Henry Minsky on
"Non-English" covers a lot of ground. If you mean accepting
ISO-8859-1 correctly, that can be done. If you mean other character
sets (including multibyte encodings like SJIS and EUC) and Unicode, those
can be done too. But currently you pretty much have to decide on a single character set
that your site will be delivered in and that form data is expected to arrive in; otherwise it becomes quite a bit more work.
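A quick way to see why the delivery charset and the form-data charset have to agree is to encode text in one charset and decode it in another. The sketch below uses Python's built-in codecs as a stand-in for what a browser or server does; it is not ACS or AOLserver code. Notably, three Latin-1 bytes decoded as UTF-8 come out as three replacement characters, which is consistent with (though not proof of) the `���` in the original question.

```python
original = "äöü"

# Text encoded as UTF-8 but decoded as Latin-1 (ISO-8859-1):
# each letter becomes two bytes, shown as two separate characters.
utf8_read_as_latin1 = original.encode("utf-8").decode("latin-1")

# Text encoded as Latin-1 but decoded as UTF-8:
# the single bytes are invalid UTF-8 and become U+FFFD replacement characters.
latin1_read_as_utf8 = original.encode("latin-1").decode("utf-8", errors="replace")

print(utf8_read_as_latin1)  # Ã¤Ã¶Ã¼
print(latin1_read_as_utf8)  # ���
```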

For historical reasons, ACS and AOLserver think that 7-bit ASCII
is a reasonable default. But the USA is still using Imperial
units instead of metric, and it doesn't seem to have hurt us
much.

I will try to write up the definitive guide to forcing character sets to work under ACS and AOLserver one of these nights.

In the meantime, checking the bboard will probably help.

Posted by Reuven Lerner on

I'm using Unicode more and more on Web sites that I work on, because my clients need English + Hebrew + some other language. There's no easy or reasonable way to do this without Unicode.
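To make the point concrete: a legacy single-byte charset covers one script family, so a page mixing English, Hebrew and, say, German cannot be represented in any one of them, while UTF-8 handles all three. This is a small Python check; `can_encode` is a hypothetical helper, and the sample string is only an illustration.

```python
def can_encode(text, charset):
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

sample = "Hello, שלום, Grüße"  # English + Hebrew + German

for charset in ("iso-8859-1", "iso-8859-8", "utf-8"):
    print(charset, can_encode(sample, charset))
```

Latin-1 has no Hebrew letters and ISO-8859-8 (Hebrew) has no ü or ß, so only UTF-8 reports True.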

Luckily, it's pretty easy to set up an OpenACS system that uses Unicode with UTF-8 encoding:

  • Create your PostgreSQL database with Unicode encoding. That is, instead of just saying createdb openacs, say createdb openacs --encoding=UNICODE.
  • The only other major step involves telling AOLserver to modify the outgoing Content-type header so that it indicates output will be in UTF-8, rather than the default of Latin-1. (If you have HTML forms, input from those forms will automatically be in UTF-8 if the page itself was sent in UTF-8.) Add these four directives to your nsd.tcl:
    ns_param   HackContentType 1
    ns_param   URLCharset      utf-8
    ns_param   OutputCharset   utf-8
    ns_param   HttpOpenCharset utf-8
    

    (I found these directives on the bboard a while back, and don't remember who originally suggested them.)
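The reason the page charset and the URL/form charset have to match shows up in percent-encoding: a browser encodes form fields using the page's charset, so the server must decode them with the same one. Going by its name, URLCharset presumably tells AOLserver which charset to assume here; that reading is an assumption, not documented in this thread. The sketch uses Python's urllib purely as an illustration, not AOLserver's implementation.

```python
from urllib.parse import quote, unquote

word = "ä"

# The same character percent-encodes differently depending on the charset.
as_utf8 = quote(word, encoding="utf-8")      # '%C3%A4'
as_latin1 = quote(word, encoding="latin-1")  # '%E4'

# Decoding with a mismatched charset garbles the submitted value.
print(unquote(as_utf8, encoding="latin-1"))  # Ã¤
print(unquote(as_latin1, encoding="utf-8"))  # � (U+FFFD, invalid UTF-8)

# Decoding with the matching charset recovers the original text.
print(unquote(as_utf8, encoding="utf-8"))    # ä
```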

These should be sufficient to ensure that OpenACS works in UTF-8. However, you still have to deal with design issues relating to language (such as the right-to-left and left-to-right issues that we deal with in Hebrew) and with the formatting of dates, currencies, and other such things. I also had problems with OpenACS 3's implementation of ns_sendmail, which needed some tweaking to send UTF-8. And my friend and colleague Danny Lieberman reports that, depending on your version of glibc, Unicode collation (i.e., sorting) might not work quite right, so lists of bboards or users might look funny. (Newer versions of glibc seem to be much better about this than old ones.)

But these many little issues aside, it works pretty darned well!
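The collation problem is easy to reproduce outside the database: a naive sort compares Unicode code points, so accented capitals land after every unaccented lowercase letter. The Python sketch below contrasts that with a crude accent-folding sort key. This is a stand-in for illustration only, not what glibc or PostgreSQL does; real locale-aware collation should come from the system locale or a library such as ICU. The folding also misses letters with no canonical decomposition, such as Polish ł.

```python
import unicodedata

def codepoint_sort(words):
    # Plain sort: compares code points, so 'Ä' (U+00C4) sorts after 'z'.
    return sorted(words)

def folded_key(word):
    # Decompose (NFD), drop combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def folded_sort(words):
    return sorted(words, key=folded_key)

words = ["Zebra", "Äpfel", "apple"]
print(codepoint_sort(words))  # ['Zebra', 'apple', 'Äpfel']
print(folded_sort(words))     # ['Äpfel', 'apple', 'Zebra']
```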

Posted by David Kuczek on
Hi Christoph,

Here is a thread with step-by-step instructions for getting OpenACS up and running with Unicode or Latin-1 (German, French, Spanish, etc.):

https://openacs.org/bboard/q-and-a-fetch-msg.tcl?msg_id=00044O

Posted by Jay Dubanik on
I have tried to switch my system (OpenACS 4.5 on Oracle) to the ISO-8859-2 character set.

I was able to get the output right, but I can't get input from forms to work.

Problem:
I tried adding new data through the general-comments and news forms,
both with the same result: characters entered in the content field (the large text area) work fine and display back correctly, but
the TITLE line does not convert. For each Polish letter I get something that looks like binary: a capital letter followed by a square.
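The symptom of a capital letter plus a square per Polish letter matches what happens when UTF-8 bytes are interpreted as ISO-8859-2. Every Polish letter is two bytes in UTF-8; the lead byte (0xC3, 0xC4 or 0xC5) reads as Ă, Ä or Ĺ in ISO-8859-2, and for most letters the second byte falls in 0x80-0x9F, a control-character range that browsers often draw as a box. That suggests, but does not prove, that the title value is UTF-8 encoded at some point on its way in. A Python illustration (the helper name is hypothetical):

```python
def misread_utf8_as_latin2(text):
    # Encode as UTF-8, then wrongly decode those bytes as ISO-8859-2.
    return text.encode("utf-8").decode("iso-8859-2")

for ch in "ąćęłńśóźż":
    # repr() makes the invisible control characters visible as \x.. escapes.
    print(ch, "->", repr(misread_utf8_as_latin2(ch)))
```

For example, ą comes out as 'Ä\x85'. For ó, ź and ż the second byte happens to be a printable ISO-8859-2 letter instead of a control character.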

Here is what I have done so far to set up the system for ISO-8859-2 characters:

  • converted the Oracle database with alter database character set ee8iso8859p2;
  • in the /modules/tcl/init.tcl file, changed
    SystemEncoding utf-8 to SystemEncoding iso8859-2
  • added these lines to nsd.tcl:
    ns_section ns/parameters
    ns_param OutputCharset iso-8859-2
  • in modules/tcl/http.tcl, form.tcl and charsets.tcl, changed every reference to iso-8859-1 to iso-8859-2

After these changes I had the static pages module scan a few HTML pages containing Polish characters, and that works: I get the pages back with the 'add comment' link and with the correct characters displayed.

I don't know where to start looking.

Please help, Janus

Posted by Jay Dubanik on
In addition to my previous posting,
I would like somebody to tell me whether I understand this correctly.
If I'm not mistaken, static pages are not stored in the database (only a reference to them is), which is why my ISO-8859-2 pages are displayed correctly.
Titles for news items and for general comments, on the other hand, are stored in the RDBMS, and that's why any ISO-8859-2 characters saved to the RDBMS are displayed incorrectly.

The only problem is that I don't know where it gets broken.
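One way to narrow down where it breaks is to inspect the title at each stage (right after form parsing, and again after reading it back from Oracle) and test whether it looks like UTF-8 text that some layer decoded as ISO-8859-2. The Python sketch below is a heuristic, and `looks_like_utf8_misread` is a hypothetical helper: True means the string round-trips through ISO-8859-2 bytes into different, valid UTF-8, which is strong evidence for that particular mix-up but not proof. The first stage where it returns True could point at the layer that introduced the damage.

```python
def looks_like_utf8_misread(stored, legacy="iso-8859-2"):
    """Heuristic: is `stored` UTF-8 text that was decoded with `legacy`?"""
    try:
        repaired = stored.encode(legacy).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible this way, so probably not this particular mix-up.
        return False
    # Plain ASCII round-trips unchanged and is not evidence of anything.
    return repaired != stored

correct = "zażółć"
garbled = correct.encode("utf-8").decode("iso-8859-2")

print(looks_like_utf8_misread(garbled))  # True
print(looks_like_utf8_misread(correct))  # False
```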

janus