Thread: UTF-8 form input garbled

  1. #1
    Join Date
    Sep 2004
    Location
    Toulouse, France
    Posts
    50

    UTF-8 form input garbled

    Hello,

    I'm having a strange problem with UTF-8 form input in Spring 1.1, which occurs whether I use JSTL or Freemarker 2.3 as the view.

    If I use a JSTL view with the following settings:
    Code:
    test.class=org.springframework.web.servlet.view.JstlView
    test.contentType=text/html; charset=utf-8
    test.url=/WEB-INF/jsp/test.jsp
    plus the following directive in my JSP page:
    Code:
    <%@ page language="java" contentType="text/html; charset=utf-8" %>
    then when I submit the form to my controller with various non-Latin-1 characters, I get strange values set on my bean. The same occurs with a Freemarker view set to UTF-8, using the following settings:
    Code:
    test.class=org.springframework.web.servlet.view.freemarker.FreeMarkerView
    test.exposeSpringMacroHelpers=true
    test.requestContextAttribute=rc
    test.contentType=text/html; charset=utf-8
    test.url=test.ftl
    Configured as:
    Code:
    <bean 
    	id="freemarkerConfig" 
    	class="org.springframework.web.servlet.view.freemarker.FreeMarkerConfigurer">
    	<property name="templateLoaderPath"><value>/WEB-INF/freemarker/</value></property> 
    	<property name="freemarkerSettings">
    		<props>
    			<prop key="default_encoding">UTF-8</prop>
    		</props>
    	</property>
    </bean>
    For example, if I submit the Russian characters "дор", what actually gets assigned in the bean's setter method is:
    2004-09-21 18:24:07,477 DEBUG [org.springframework.beans.BeanWrapperImpl] - Invoked write method [public void test.TestBean.setName(java.lang.String)] with value [до�?]

    This seems to happen consistently with certain Unicode characters only, while others always come through correctly.

    Best regards,
    Assaf

  2. #2
    Join Date
    Sep 2004
    Location
    Toulouse, France
    Posts
    50

    Well, one part of the mystery has been solved: I forgot to set log4j to write the log in UTF-8:
    log4j.appender.logfile.Encoding=UTF-8

    So log4j was writing in ANSI (or CP1252), but Eclipse was displaying this log in UTF-8! A mess, the result of which was that Eclipse was displaying most of the data correctly as UTF-8, but the database and webpage were storing/displaying them as ANSI.

    So this is my actual problem: if I enter a string like "fatigué" on my web page, what I get in the log, database, and following web page is "fatiguÃ©". The last two characters, read as ANSI, correspond to bytes C3 and A9, which are precisely the UTF-8 representation of e-acute (é).

    So, somewhere between posting my data from the web page and storing the string found in request.getParameter("name"), the UTF-8 is being converted into ANSI. Any ideas how to fix this?
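
    To make the symptom concrete, here is a minimal sketch that reproduces it outside any container. It assumes the bytes are being decoded as ISO-8859-1 (which agrees with CP1252 for these two bytes); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Spring or the servlet API.
class MojibakeDemo {

    // Encode as UTF-8 (what the browser sends), then decode the same
    // bytes as ISO-8859-1 (what the server assumes when told nothing).
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverse the damage: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "fatigu\u00e9";          // "fatigué", 7 characters
        String garbled = garble(original);          // é becomes two characters
        System.out.println(garbled.length());       // 8
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

    The repair step only works because ISO-8859-1 maps every byte to a character and back; it is a diagnostic aid, not a substitute for decoding the request correctly in the first place.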

    BTW, the page is displaying UTF-8 correctly (it's picking up French accents from the resource bundle), it's just the form submit that's breaking.

    My charset configuration settings can be seen in the previous post.

    Best regards,
    Assaf

  3. #3
    Join Date
    Aug 2004
    Location
    London, UK
    Posts
    339

    Try putting an accept-charset attribute on the form tag in your HTML:
    Code:
    <form action="" method="POST" accept-charset="UTF-8">
    ...
    </form>
    Regards,
    Darren Davison.
    Public Key: 0xE855B3EA

  4. #4
    Join Date
    Sep 2004
    Location
    Toulouse, France
    Posts
    50

    The accept-charset attribute didn't work.
    However, I finally found something that did: the trick was to intercept the request before binding and set its character encoding to the correct value.

    The only place I found for doing this in the workflow was to override the isFormSubmission method, set the character encoding, and then call the super method.

    Code:
    protected boolean isFormSubmission(HttpServletRequest request) {
    	try {
    		request.setCharacterEncoding("utf-8");
    	} catch (UnsupportedEncodingException uee) {
    		LOG.error(uee);
    		throw new RuntimeException(uee);
    	}
    	LOG.debug("encoding: " + request.getCharacterEncoding());
    	
    	return super.isFormSubmission(request);
    }
    However, it would seem simpler if this were a standard property of SimpleFormController or one of its superclasses, configurable in the servlet context config file and applied automatically before form request binding.

    Best regards,
    Assaf

  5. #5
    Join Date
    Aug 2004
    Location
    London, UK
    Posts
    339

    It strikes me that it's possibly a fault of the servlet container - which one are you using? I'll see how a few others behave with this too and see if there's any consistency.
    Darren Davison.
    Public Key: 0xE855B3EA

  6. #6
    Join Date
    Sep 2004
    Posts
    18

    The default request encoding according to the Servlet specification is ISO-8859-1. If the client doesn't send any charset information (and none of the major browsers do) then this is used, unless you explicitly set a different encoding, for example in a filter. Most browsers submit form data in the same charset the page was served in, so what I'm doing at the moment is to send *only* UTF-8 in my responses, force the request into UTF-8 with a filter, and additionally use accept-charset. It seems to work so far.
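
    As a rough illustration of why the request encoding matters, the sketch below decodes the same percent-encoded form value under both charsets using the JDK's URLDecoder. This only approximates what a container does when it parses parameters; the class and method names are invented for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; a real container parses the body internally.
class FormDecodeDemo {

    // Decode a percent-encoded value using the given charset name.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // "fatigué" as a browser would submit it from a UTF-8 page.
        String submitted = "fatigu%C3%A9";
        // Decoded as UTF-8, the two bytes form one character.
        System.out.println(decode(submitted, "UTF-8").length());      // 7
        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println(decode(submitted, "ISO-8859-1").length()); // 8
    }
}
```

    The submitted bytes are identical in both cases; only the charset chosen on the server side differs, which is why setting it before the parameters are first read is what fixes the problem.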

    Hope this helps
    Carl-Eric

  7. #7
    Join Date
    Sep 2004
    Location
    Toulouse, France
    Posts
    50

    Quote Originally Posted by davison
    It strikes me that it's possibly a fault of the servlet container - which one are you using?
    Resin 3.0.7 - I wonder if the others are smarter about this?

    Carl-Eric, thanks for your suggestions. The filter works and seems cleaner than messing with isFormSubmission. For anybody else running into the same problem, here's a sample filter, with set-up in web.xml and code.
    Code:
    <filter>
      <filter-name>
    	charsetFilter
      </filter-name>
      <filter-class>
    	com.blah.blah.blah.CharsetFilter
      </filter-class>
    	<init-param>
    	  <param-name>requestEncoding</param-name>
    	  <param-value>UTF-8</param-value>
    	</init-param>
    </filter>
    
    <filter-mapping>
    	<filter-name>charsetFilter</filter-name>
    	<url-pattern>/*</url-pattern>
    </filter-mapping>
    Code:
    import java.io.IOException;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    public class CharsetFilter implements Filter {
    	FilterConfig config;
    	// Default used when no requestEncoding init-param is configured
    	String encoding = "UTF-8";
    	
    	/**
    	 * @see javax.servlet.Filter#destroy()
    	 */
    	public void destroy() {
    	}
    	
    	/**
    	 * Sets the character encoding on the request. This must happen
    	 * before any request parameter is read.
    	 * @see javax.servlet.Filter#doFilter(javax.servlet.ServletRequest, javax.servlet.ServletResponse, javax.servlet.FilterChain)
    	 */
    	public void doFilter(ServletRequest request, ServletResponse response,
    			FilterChain chain) throws IOException, ServletException {
    		request.setCharacterEncoding(encoding);
    		chain.doFilter(request, response);
    	}
    	
    	/**
    	 * @see javax.servlet.Filter#init(javax.servlet.FilterConfig)
    	 */
    	public void init(FilterConfig config) throws ServletException {
    		this.config = config;
    		// Keep the default if the init-param is missing, instead of storing null
    		String configured = config.getInitParameter("requestEncoding");
    		if (configured != null) {
    			this.encoding = configured;
    		}
    	}
    }
    Regarding accept-charset: I presume this is supposed to tell the browser to automatically reject any input not matching a particular charset (?). I don't find the definition in the W3C recommendations very clear. Anyway, it's completely ignored by Firefox 0.8: I tested by entering Russian characters with accept-charset set to ISO-8859-1, and they all got sent correctly. So for now, using "accept-charset" seems like needless typing...

  8. #8
    Join Date
    Sep 2004
    Posts
    1

    I solved the UTF-8 problem in Tomcat 5 by adding the following lines to web.xml:
    Code:
        <locale-encoding-mapping-list>
            <locale-encoding-mapping>
                <locale>en</locale>
                <encoding>UTF-8</encoding>
            </locale-encoding-mapping>
            <locale-encoding-mapping>
                <locale>no</locale>
                <encoding>UTF-8</encoding>
            </locale-encoding-mapping>
            <locale-encoding-mapping>
                <locale>ru</locale>
                <encoding>UTF-8</encoding>
            </locale-encoding-mapping>
            <locale-encoding-mapping>
                <locale>pl</locale>
                <encoding>UTF-8</encoding>
            </locale-encoding-mapping>
        </locale-encoding-mapping-list>

  9. #9
    Join Date
    Oct 2006
    Posts
    1

    Since this thread almost satisfied my needs:

    You don't have to write your own filter to set the character encoding. The Spring Framework (I'm using version 1.2.6) comes with 'org.springframework.web.filter.CharacterEncodingFilter' for this purpose.

    Use it like this in your web.xml:

    Code:
    <filter>
      <filter-name>charsetFilter</filter-name>
      <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
      <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
      </init-param>
    </filter>
    
    <filter-mapping>
      <filter-name>charsetFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>
