Conversion of UNICODE entities

Post by heyrothu » Wed Oct 28, 2009 8:14 am

Hi y'all!

We have a problem in our project.

We have two concurrent systems running: one Smalltalk component and several other components implemented in Java. Communication is via synchronous web services and/or asynchronous database messages. Both communication channels use XML for the exchange of data. Between the systems we have agreed to use ISO-8859-1 encoding. The following XML structure is giving us real trouble when we try to convert it to Smalltalk objects:

Code:
<?xml version="1.0" encoding="ISO-8859-1"?>
<aprod.ermittelteAuszahlinfosFuerAuszahlidsUndDaten
    xsi:schemaLocation="aprod.ermittelteAuszahinfosFuerAuszahlidsUndDaten.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <auszahlinfo>
        <id>1000005648</id>
        <gueltigam>2009-10-26</gueltigam>
        <gueltigab>2009-10-01</gueltigab>
        <gueltigbis>3000-12-31</gueltigbis>
        <kassenoderkundennummer>0G11570</kassenoderkundennummer>
        <auszahlkennzeichen>A</auszahlkennzeichen>
        <empfaengername1>Finanzamt Idar-Oberstein</empfaengername1>
        <kontonummer>7401507497</kontonummer>
        <swiftcode>SOLADEST600</swiftcode>
        <bankleitzahl>60050101</bankleitzahl>
        <bankname>BW-Bank/LBBW Stuttgart</bankname>
        <verwendungszweck2>09/225/1072/1-I/2</verwendungszweck2>
        <bemerkung>GGR;01.10.2009;Abtretung &#x20ac;500,00 pro Abrechnung</bemerkung>
    </auszahlinfo>
</aprod.ermittelteAuszahlinfosFuerAuszahlidsUndDaten>


The Unicode character reference &#x20ac;, which is the € sign, is translated into a double-byte character. That character cannot be stored in the output string, which happens to be a single-byte character string, so String throws an exception.

This raises the following questions:

1. Is it valid to use unicode entities in ISO-8859-1 encoded XML?
2. Should we just use another encoding scheme, or
3. Can the parser be configured so that unicode entities can indeed be used with ISO-8859-1 encoded XML?

Any help and/or suggestions are deeply appreciated,

Best regards

Uwe

Re: Conversion of UNICODE entities

Post by tc » Wed Oct 28, 2009 8:57 am

Hello,

You can try converting the DBString to a String by doing the following:
Code:
myDBString asByteArray asString

... and see if that helps. Full Unicode support is on our roadmap for VA Smalltalk, but I do not think the XML parser has switches to tell it which encoding to use.

--tc

Re: Conversion of UNICODE entities

Post by jtuchel » Wed Oct 28, 2009 9:36 pm

Uwe,

Just a small remark: you mention the Euro sign and talk about ISO-8859-1. That character set doesn't contain the Euro sign at all; you need to use ISO-8859-15 instead. Take a look at http://de.wikipedia.org/wiki/ISO_8859-1 for more details.
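To make the difference between the two character sets concrete, here is a minimal sketch. It is written in Python rather than VA Smalltalk, purely because the standard codecs there are easy to check; the helper name `try_encode` is made up for this illustration.

```python
# Illustration only: Python stands in for "some Unicode-aware runtime".
EURO = "\u20ac"  # the Euro sign, U+20AC


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the character set lacks a character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


latin1 = try_encode(EURO, "iso-8859-1")   # None: ISO-8859-1 has no Euro sign
latin9 = try_encode(EURO, "iso-8859-15")  # one byte, 0xA4

# The same byte value means something else in ISO-8859-1:
# U+00A4, the generic currency sign.
same_byte_in_latin1 = b"\xa4".decode("iso-8859-1")

print(latin1, latin9, repr(same_byte_in_latin1))
```

So a byte stream that is really ISO-8859-15 but is labelled ISO-8859-1 would show a currency sign where the Euro sign was meant.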

It is valid to use encoded characters in XML as long as the encoding is declared in the XML declaration (at the beginning of the document). From what I've found, it is okay to use a hexadecimal character reference, which always refers to a Unicode code point regardless of the document's encoding, so your use of &#x20ac; looks fine to me.
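As a rough cross-check outside of VAST, the sketch below feeds a cut-down version of Uwe's document to Python's standard `xml.etree.ElementTree` parser. This only shows how one Unicode-aware parser behaves; it says nothing about the VA Smalltalk parser's internals, and the shortened document is my own reduction of the one posted above.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the document from the first post, as raw bytes
# declared as ISO-8859-1.
doc = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<auszahlinfo>"
    b"<bemerkung>GGR;01.10.2009;Abtretung &#x20ac;500,00 pro Abrechnung</bemerkung>"
    b"</auszahlinfo>"
)

root = ET.fromstring(doc)
bemerkung = root.find("bemerkung").text

# The reference is resolved to the code point U+20AC, whatever the declared encoding.
has_euro = "\u20ac" in bemerkung

# That code point is exactly the character that does not fit into a
# single-byte ISO-8859-1 string on the receiving side.
try:
    bemerkung.encode("iso-8859-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

print(has_euro, fits_in_latin1)
```

The document itself is accepted; the trouble only starts when the parsed text has to be squeezed back into a single-byte string, which matches the exception Uwe describes.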

Still, ISO-8859-1 is the wrong encoding here; replace it with ISO-8859-15 to get a character set that actually contains the Euro sign. But I guess the exception you get has nothing to do with that. So if Taylor's suggestion doesn't work, you're probably out of luck until VAST supports Unicode...

You may also want to take a look at Marten's code snippets in another thread on this forum: viewtopic.php?f=12&t=3294#p13151

cu

Joachim

Re: Conversion of UNICODE entities

Post by heyrothu » Fri Oct 30, 2009 5:51 am

Thanks a lot guys!

We decided to go for the ISO-8859-15 solution. That way we are able to use all characters on the German keyboard.
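For anyone wondering what changes on the wire, here is a small sketch of the sending side. It uses Python's standard `xml.etree.ElementTree` serializer as a stand-in, since our actual Java and Smalltalk code is not shown in this thread; other serializers may behave differently.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("bemerkung")
elem.text = "Abtretung \u20ac500,00 pro Abrechnung"

as_latin1 = ET.tostring(elem, encoding="iso-8859-1")
as_latin9 = ET.tostring(elem, encoding="iso-8859-15")

# ISO-8859-1 cannot hold the Euro sign, so this serializer falls back
# to a numeric character reference.
print(b"&#8364;" in as_latin1)

# ISO-8859-15 can, so the Euro sign goes out as the single byte 0xA4
# and no character reference is needed.
print(b"\xa4" in as_latin9)
```

With ISO-8859-15 every character in the message fits into one byte, so a receiver that works with single-byte strings never sees a character it cannot store.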

We will integrate Unicode support as soon as it is available.

Cheers

Uwe

Re: Conversion of UNICODE entities

Post by jtuchel » Fri Oct 30, 2009 6:13 am

Uwe,

If I read your post correctly, you got the Euro sign to appear on both sides of the wire?

If so, could you share a bit of your code for conversion from/to XML?

cu

Joachim