Mixed byte & Unicode support

Postby Garv » Mon Aug 31, 2009 12:56 pm

Hi,

Can someone give some feedback on the level of support in VA Smalltalk for both mixed-byte text (English plus one other language) and Unicode? We're planning to localize our Smalltalk application in the near future for the Asian and European markets.

Any VA Smalltalk related tips from people who have gone through this process would be greatly appreciated.

Thanks
Garv
 
Posts: 7
Joined: Thu May 10, 2007 11:05 am

Re: Mixed byte & Unicode support

Postby tc » Wed Sep 02, 2009 10:32 am

Hello,

I can answer the part about Asian languages, etc. VA ST uses '.mpr' files to hold the strings in the language of your choice, so different countries would get different mpr files. If you search the documentation for 'building .mpr files', you will find a full explanation of what they are and how to use them. A few pages after that, there is an example that uses everything you have read.

So, instead of System message: 'Enter name:', the code would be System message: 'Ms1234', and 'Ms1234' would be a reference to a string in an mpr file.
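
The indirection described above can be pictured with a small language-neutral sketch. This is Python, not VA Smalltalk, and the CATALOGS table and the message function are hypothetical names made up for illustration; real .mpr files are built with the VA Smalltalk tooling and are not plain dictionaries.

```python
# Hypothetical in-memory stand-in for per-language message catalogs.
# In VA Smalltalk the strings would live in .mpr files, one per language.
CATALOGS = {
    "en": {"Ms1234": "Enter name:"},
    "de": {"Ms1234": "Name eingeben:"},
}


def message(key: str, language: str, fallback: str = "en") -> str:
    """Look up a message key for a language, falling back to a default catalog."""
    catalog = CATALOGS.get(language, {})
    if key in catalog:
        return catalog[key]
    return CATALOGS[fallback][key]


print(message("Ms1234", "de"))  # string from the German catalog
print(message("Ms1234", "fr"))  # no French catalog here, so the English one is used
```

The application code only ever mentions the key, so switching language means switching catalogs rather than editing code.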

Internally, there is a class DBString which handles double-byte characters.

We have customers currently selling the same product in multi-byte languages in Asia and in single-byte languages in Europe.

--tc
tc
Moderator
 
Posts: 304
Joined: Tue Oct 17, 2006 7:40 am
Location: Raleigh, NC

Re: Mixed byte & Unicode support

Postby marten » Wed Sep 02, 2009 10:40 am

tc wrote:Hello,

I can answer the part about Asian languages, etc. VA ST uses something called '.mpr' files to hold the strings in the language of your choice.


And if I remember correctly, the code page is handled as well, which means that you may use the same mpr file (for one language) with different code pages (i.e. on different platforms).

Marten
Marten Feldtmann, Principal Smalltalk User, Private
SkypeMe callto://marten.feldtmann
marten
Posts: 641
Joined: Sat Oct 14, 2006 7:10 am
Location: Hamburg - Germany

Re: Mixed byte & Unicode support

Postby tc » Wed Sep 02, 2009 11:28 am

Hello,

I forgot to mention the code page, but it is covered in the documentation I mentioned. When I did some work in multi-byte languages, the only issue I had was making sure the code page for the desired language was in the abt.ini file.

--tc

Re: Mixed byte & Unicode support

Postby Garv » Fri Oct 23, 2009 6:06 am

Hi,

Thanks for the feedback. Looks like double-byte will work. Any thoughts on Unicode, i.e. a single code page which can be used for all languages? We need a single Smalltalk application to communicate with applications which may be in different code pages, i.e. we may be getting values in French, Japanese, Chinese, etc., all coming into a single image.

The backend for storage would be MS SQL Server running a Unicode collation.

*Edited*
I noticed there is now an October date for version 8.0.1, which lists UTF-8 support. This may go a long way toward addressing some of our concerns. For planning purposes, is this still on track for an October release?

Thanks
Garv
 

Re: Mixed byte & Unicode support

Postby marten » Fri Oct 23, 2009 7:00 am

I think that you can do most of this right now. You may store your work (in databases) using UTF-8 based strings. That only means that you have to convert your local code page based strings to UTF-8 strings (and back again).

The code page for utf-8 is 65001 (under Windows).
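
As a rough illustration of that round trip outside VA Smalltalk, here is a Python sketch. It assumes the local code page is Windows-1252 (Python codec name "cp1252"); the helper names local_to_utf8 and utf8_to_local are made up for this example.

```python
LOCAL_CODE_PAGE = "cp1252"  # assumption: a Western European Windows code page


def local_to_utf8(data: bytes, code_page: str = LOCAL_CODE_PAGE) -> bytes:
    """Re-encode bytes from a local code page to UTF-8."""
    return data.decode(code_page).encode("utf-8")


def utf8_to_local(data: bytes, code_page: str = LOCAL_CODE_PAGE) -> bytes:
    """Re-encode UTF-8 bytes to a local code page.

    Raises UnicodeEncodeError if a character has no place in that code page.
    """
    return data.decode("utf-8").encode(code_page)


local_bytes = "Größe".encode(LOCAL_CODE_PAGE)    # one byte per character
utf8_bytes = local_to_utf8(local_bytes)          # ö and ß take two bytes each
assert utf8_to_local(utf8_bytes) == local_bytes  # lossless round trip
print(len(local_bytes), len(utf8_bytes))
```

The direction from UTF-8 back to a local code page is the one that can fail, because UTF-8 can hold characters the local code page has no slot for.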

Marten


Re: Mixed byte & Unicode support

Postby jtuchel » Mon Oct 26, 2009 12:31 am

Marten, Taylor,

are there any code samples available as a quick start into supporting Unicode by converting?
Seaside and its traction in VAST 8, as well as web services as the integration strategy of choice in many VAST projects, really push for a Unicode solution. A few code snippets showing how to encode to Unicode and back to a code page like the ISO-8859 family would be a great start...

Kind Regards,

Joachim
jtuchel
Posts: 245
Joined: Fri Oct 05, 2007 1:05 am
Location: Ludwigsburg, Germany

Re: Mixed byte & Unicode support

Postby marten » Mon Oct 26, 2009 4:27 am

jtuchel wrote:Marten, Taylor,
are there any code samples available as a quick start into supporting Unicode by converting?


If I assume that all my text is stored as UTF-8 strings (as, for example, within an SQLite database), then I would use this code (Windows only!) to convert each string within my application before binding the converted value to an insert/update string column parameter of my SQL statement (as a usage example).

Therefore the "storing" part must execute something like:

Code: Select all
^(AbtCodePageConverter current 
      convert: currentCodePageString
      fromCodePage: AbtCodePageConverter currentCodePage
      toCodePage: AbtCodePageConverter utf8CodePage) trimNull

and the "loading" part:

Code: Select all
^(AbtCodePageConverter current 
      convert: utf8String
      fromCodePage: AbtCodePageConverter utf8CodePage
      toCodePage: AbtCodePageConverter currentCodePage) trimNull
Within a multi-user/multi-session application you must get/store the "AbtCodePageConverter currentCodePage" value from/to your session bindings and not from the global image settings.

So, in a multi-user/multi-session/multi-language application, you would store all your texts as UTF-8 based strings, in whatever languages, in your "database". The user then selects a native language and its code page (however you obtain it), and the conversion should work.
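
A sketch of that per-session idea, in Python rather than VA Smalltalk. The Session class and its code_page field are hypothetical; the point is only that the target code page travels with the session, and that text the session's code page cannot represent has to be handled somehow (here it is replaced with '?').

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical per-user session that remembers the user's code page."""
    code_page: str  # a Python codec name, e.g. "cp1252" or "cp932"

    def load(self, stored_utf8: bytes) -> bytes:
        """Convert stored UTF-8 text to this session's code page.

        Characters the code page cannot represent become '?'.
        """
        return stored_utf8.decode("utf-8").encode(self.code_page, errors="replace")

    def store(self, local_bytes: bytes) -> bytes:
        """Convert text in this session's code page to UTF-8 for storage."""
        return local_bytes.decode(self.code_page).encode("utf-8")


stored = "こんにちは".encode("utf-8")      # text kept as UTF-8 in the database

japanese = Session(code_page="cp932")    # Windows Japanese (Shift-JIS family)
western = Session(code_page="cp1252")    # Windows Western European

assert japanese.store(japanese.load(stored)) == stored  # round trip is lossless
assert western.load(stored) == b"?????"                 # not representable
```

Whether to replace, reject, or transliterate unrepresentable characters is an application decision; the sketch just makes the failure case visible.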

Another question is what happens if you do not have a native client but want to work with a code page other than the default one: you may need additional code to sort and compare strings. The built-in support (Locale) only works against the current code page, but you may use third-party libraries to do this work.

To make this work with databases, you must consider the connection settings available in the various database products. As an example, the database kernel may work in UTF-8 while each connection has a different code page. In that case the conversion is already done inside the database driver and is not needed at all in the VA code. Or you decide to always set the client connection code page to UTF-8, and then your Smalltalk code has to do the conversion again. And again: sorting and searching need special consideration.
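
For the SQLite case mentioned earlier, the following Python sketch shows text going in as Unicode and sitting in the database as UTF-8 bytes. It uses Python's standard sqlite3 module with an in-memory database; the table and column names are invented for the example, and other database drivers will differ in where the conversion happens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite's default text encoding is UTF-8
conn.execute("CREATE TABLE labels (id INTEGER PRIMARY KEY, caption TEXT)")

# Bind the text as a parameter; the driver hands it to SQLite as UTF-8.
conn.execute("INSERT INTO labels (caption) VALUES (?)", ("Prénom",))

# CAST(... AS BLOB) exposes the raw stored bytes of the text value.
raw, text = conn.execute(
    "SELECT CAST(caption AS BLOB), caption FROM labels"
).fetchone()

assert raw == "Prénom".encode("utf-8")  # stored as UTF-8 bytes
assert text == "Prénom"                 # read back as the same text
conn.close()
```

Here the driver does the conversion, which corresponds to the first case above where no conversion code is needed in the application.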

Re: Mixed byte & Unicode support

Postby skrishnamachari » Mon Dec 07, 2009 5:48 am

I am tying myself in knots trying out VA for Japanese display through the mpr file mechanism described. This is on Windows XP.

I have the mpr/tra files working fine with French and German.

Any pointers on where the issue may be would be appreciated. Possibly the code page is not correctly linked. Is there a simple way to run through this with:

Code: Select all
aStr := DBString new: 100.
1 to: 100 do: [ :ea |
  aStr at: ea put: <the japanese character values..>].
anyTextWidget/anyButton/anyWindowTitle object: ( or title: ...) aStr.


That way we can test it out completely for all UI widgets, probably through an SUnit test suite that can set and get the text and check that the characters display correctly as Japanese Katakana or Hiragana.

Also, can we use the explicit code page converter to display US English and Japanese interchangeably?

-Skrish
skrishnamachari
 
Posts: 7
Joined: Sun Jun 28, 2009 9:58 pm

Re: Mixed byte & Unicode support

Postby marten » Tue Dec 08, 2009 9:36 am

But what are your specific questions? This is such a general topic. I tried to answer, but always stopped because it's too general:

- Were you able to build the mpr files with Japanese text?
- Do you have a Japanese system?
- What did you try out? What did not work?
- How do you want to do it?

In general, you can set a font on each widget. Therefore, with additional work, you might be able to have widgets with one font (code page 8859-1, English text) and, in the same window, a widget with a Japanese font. But this would mean additional work and painful management on your own, unless you write your own framework to help you here.

Build an mpr file on a system with the Japanese font installed (as the default code page), load it into your application, and then it should work.

Just some ideas. One would need additional information to be more specific...

Marten


Re: Mixed byte & Unicode support

Postby skrishnamachari » Thu Dec 10, 2009 3:35 am

I should have updated this immediately. Japanese works fine after going through all the required settings in the WinXP Regional and Language Settings panel.

The only glitch I noticed was that the label was not updating correctly: the Japanese characters were garbled. Everything else (buttons, tab labels, window title, input, text editor) works fine.

Will post an image soon.

Re: Mixed byte & Unicode support

Postby skrishnamachari » Thu Dec 10, 2009 3:39 am

OK, nonetheless, what is the prognosis on Unicode?

a) You already have DBString. Would the Windows-specific work of changing all the function calls from TextOutA/TabbedTextOutA to TextOutW/TabbedTextOutW, or similar work, be a large barrier to cross?

b) How much effort would the change delta require for more or less workable Unicode in the UI?

I have tried some bare-bones hacks starting from the WindowsPlatformFramework and the related calls of those API methods (TabbedTextOutA, for instance), and they get a certain mileage.

-Skrish