Thread: StaxEventItemReader ISO-8859-1 Character Normalization

  1. #1
    Join Date
    Nov 2010
    Posts
    7

    Hello,

    Newbie here. I have a batch program that uses a StaxEventItemReader to input some XML. The XML is UTF-8 and contains some ISO-8859-1 Latin characters.

    The StAX parser works fine through successive nextEvent calls until it has to retrieve the XMLEvent that contains one of these characters, at which point it throws this exception:

    com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 middle byte 0x6e (at char #35463, byte #31999)

    I'm looking for ideas on how to either normalize the data at the point of retrieving the event, or configure the StaxEventItemReader so that it normalizes the data. Any ideas?

    Thanks!

  2. #2
    Join Date
    Jun 2005
    Posts
    4,232

    You probably need to consult the documentation for your XML library (Woodstox, by the looks of it). But it is telling you there is an invalid byte, so are you sure it is really ISO-8859-1?
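
    One rough way to answer that question is to run the raw bytes through a strict UTF-8 decoder using only the JDK. The sketch below is an illustration, not anything from Spring Batch or Woodstox; the class name Utf8Check and the method isValidUtf8 are made up for this example. Note that 0x6e in the exception is ASCII 'n', which would be consistent with a single-byte Latin-1 accented character being read as the start of a multi-byte UTF-8 sequence and then followed by a plain 'n'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Spring Batch or Woodstox).
public class Utf8Check {

    /** Returns true if the bytes decode as strict UTF-8, false otherwise. */
    public static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

    If this returns false for the file contents (read, for example, with java.nio.file.Files.readAllBytes), the file is not valid UTF-8, whatever its XML declaration says.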

  3. #3
    Join Date
    Nov 2010
    Posts
    7

    Thanks for the reply, Dave. Yes, Woodstox is our XML library. I took your advice and checked the documentation, but unfortunately it did not provide any information or clues on how to handle this scenario. The links to their issues/bugs database and logs show as unavailable.

    I am also pretty sure it is ISO-8859-1. Googling, I have found similar issues with the same character, and when I delete that character, it runs fine. Posts seem to hint at differences between the encoding used by StAX reader implementations (e.g. StAXUtils.createXMLStreamReader(InputStream)) and the encoding the String uses (the JVM default).

    Was really hoping to see some properties/interface of the StaxEventItemReader available to normalize characters, set encoding options, alter the type of underlying reader it uses, etc. Any ideas are welcomed. Thank you!
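
    In the meantime, one workaround that does not depend on any reader property is to transcode the input before the reader sees it. The following is only a sketch under a big assumption: that the whole file is really ISO-8859-1 and merely labelled UTF-8. The class name Latin1ToUtf8 and its methods are made up for illustration. If the file is a mix of valid UTF-8 and stray Latin-1 bytes, this would double-encode the valid multi-byte characters, so it would not be the right fix in that case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Spring Batch or Woodstox).
public class Latin1ToUtf8 {

    /** Re-encodes a byte array from ISO-8859-1 to UTF-8. */
    public static byte[] transcode(byte[] latin1) {
        return new String(latin1, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    /** Streaming variant: decodes 'in' as ISO-8859-1 and writes UTF-8 to 'out'. */
    public static void transcode(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.ISO_8859_1);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[8192];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, n);
        }
        // Flush so buffered characters reach 'out'; the caller closes the streams.
        writer.flush();
    }
}
```

    The transcoded file could then be handed to the StaxEventItemReader as its resource. Another option I have seen suggested, which I have not verified here, is to change the encoding in the XML declaration to ISO-8859-1 so the parser decodes the bytes that way itself.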
