Jul 17th, 2011, 04:10 AM
Unable to Override the Spring MVC URL decoding which uses default "ISO-8859-1"
We are passing "reifengröße", which has 2 special German characters, in the URL query string value. The request is received from the browser correctly encoded as "reifengr%C3%B6%C3%9Fe". But then the Spring MVC framework decodes it using the default "ISO-8859-1", which results in "reifengrÃ¶ÃŸe" (wrong value).
We tried overriding this behavior by defining a filter that sets UTF-8 as the encoding
in the request, but it does not work (even though the filter gets called before the Dispatcher Servlet). We also tried adding the lines below to web.xml ...
before the dispatcher servlet, but that did not work either. We are using the Spring 3.0.0.RELEASE jars.
Is there a bug in the MVC framework? What am I missing here? Do I need to do anything additional? Any help in this regard will be appreciated.
Jul 17th, 2011, 12:05 PM
Charset decoding of a URL and parameters is a property of the servlet container (more so than a property of the web-application).
It is the servlet container that provides the Java language "String" values. By the time the binary byte stream is decoded and turned into a String, the character set conversion is already done. This conversion is usually done by the servlet container so that it can provide the Servlet APIs that Spring MVC is built on top of, such as ServletRequest#getParameter() http://download.oracle.com/javaee/5/...va.lang.String)
It may be possible to retrieve the original byte stream of the URI from the servlet container and decode it yourself. It is certainly possible to retrieve the original %-encoded URI, which would allow you to decode it yourself (if you wanted). That seems the long and hard way to solve the problem.
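For anyone who does want to go the long way, here is a rough sketch of decoding the %-encoded query string yourself. It assumes you pass in the raw value from HttpServletRequest.getQueryString(), which the container leaves undecoded. The class name, method name and the parameter names "q" and "page" are made up for illustration. Note that URLDecoder also turns '+' into a space, and this sketch keeps only the last value when a parameter is repeated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class RawQueryDecoder {

    // rawQuery is the still percent-encoded query string,
    // e.g. the value returned by HttpServletRequest.getQueryString().
    static Map<String, String> decodeUtf8(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            try {
                // Decode the escaped octets as UTF-8 instead of the container default.
                params.put(URLDecoder.decode(name, "UTF-8"),
                           URLDecoder.decode(value, "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException("UTF-8 is required on every JVM", e);
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> p = decodeUtf8("q=reifengr%C3%B6%C3%9Fe&page=2");
        System.out.println(p.get("q").length()); // 11 characters once decoded as UTF-8
    }
}
```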
The filter you are using only affects the content bodies: maybe the request body, maybe the response body, maybe both. A content body can convey its character set in the HTTP headers (so the filter may affect POST method requests). Most character-set material you read up on relates to the content body, not to URLs. The two things (content body and URL encode/decode) should not be confused.
It is important to know that a URL is character-set-less, that is, there is no standard character set in use if you only look at the HTTP/1.0-1.1 and URI standards. To those standards it is simply a series of binary octets that are encoded for use in the HTTP protocol. There are de facto standard ways of encoding 8-bit data into the US-ASCII it uses for headers/meta-data, but the meaning of that 8-bit data (of the original URI) is arbitrary. In your case you want it to be UTF-8, but the servlet container is using ISO-8859-1. To this end a URI/URL is just an opaque identifier that only has meaning when delivered to the appropriate server.
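To make the point concrete, here is a small sketch (plain JDK, no servlet container involved) that decodes the same %-encoded octets from the original post under both character sets. The class and method names are only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PercentDecodeDemo {

    // Decode a percent-encoded value, interpreting the escaped octets in the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String wire = "reifengr%C3%B6%C3%9Fe"; // the octets as sent by the browser

        String asUtf8 = decode(wire, "UTF-8");        // 2 octets become 1 character, twice
        String asLatin1 = decode(wire, "ISO-8859-1"); // every octet becomes its own character

        System.out.println(asUtf8.length());   // 11
        System.out.println(asLatin1.length()); // 13
    }
}
```

The octets on the wire are identical in both cases; only the interpretation chosen by the decoder differs.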
So then the Servlet specification comes in, built on top of the HTTP and URI specifications. To get consistent default application behavior, it looks to have mandated the ISO-8859-1 default for URL encoding/decoding that you are seeing. This is because the specifications that the Servlet specification was built on top of did not mandate any particular meaning for a URI/URL.
So which servlet container are you using? If you are using Tomcat, the documentation at http://wiki.apache.org/tomcat/FAQ/CharacterEncoding may help you specify the character set you want the container to use for URL decoding/encoding. Check out the question "How do I change how GET parameters are interpreted?"
I would hope this can be set up in web.xml using <context-param>. You kind of want the <Connector> default encoding up to the part of the path that is the ServletPath, and then from that point on allow the remainder of the URL to be overridden from <context-param>.
If it is only URL encoding you need to change then consider re-evaluating your need to keep the filter.
Last edited by dlmiles; Jul 17th, 2011 at 12:36 PM.
Jul 17th, 2011, 12:37 PM
The answer is over here: http://stackoverflow.com/questions/4...acter-encoding
Summary: Spring definitely does the wrong thing here. It really should (a) allow setting the encoding directly without relying on this workaround, and (b) use UTF-8 as the default.
Jul 17th, 2011, 12:58 PM
IMHO: Spring itself should use a "pass-through mode" by default. That is use ServletRequest#getParameter(String) for path variable lookup (and not attempt to perform its own decoding).
Originally Posted by Julian Reschke
If it has to pick a standard character set to go with, then ISO-8859-1 remains the more correct choice as long as the Servlet specification mandates the same.
While UTF-8 seems a more worldly default for a servlet specification to have, that does not negate the fact that ISO-8859-1 is in effect "8bit ASCII" and does not require any kind of processing or filtering, which makes it a highly efficient default to have. From there you can opt to lower your performance by using UTF-8 if you want. So I can fully understand why it was chosen.
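A quick sketch of what "8bit ASCII" means in practice: under ISO-8859-1 each byte becomes the character with the same numeric value, so the decoding loses nothing. A side effect is that a string the container mis-decoded as ISO-8859-1 can be turned back into its original bytes and re-read as UTF-8. The class and method names are hypothetical, and the repair only holds if the container really used ISO-8859-1 and the bytes really were UTF-8.

```java
import java.nio.charset.StandardCharsets;

class Latin1RoundTrip {

    // Recover the original bytes of a string that was decoded as ISO-8859-1,
    // then interpret those bytes as UTF-8 instead.
    static String redecodeAsUtf8(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The four octets behind %C3%B6%C3%9F in the original post.
        byte[] wire = {(byte) 0xC3, (byte) 0xB6, (byte) 0xC3, (byte) 0x9F};
        String latin1 = new String(wire, StandardCharsets.ISO_8859_1);

        for (int i = 0; i < wire.length; i++) {
            // Each char carries exactly the value of the byte it came from.
            System.out.println((int) latin1.charAt(i) == (wire[i] & 0xFF)); // true
        }

        // Same bytes, read as UTF-8: the two intended characters come back.
        System.out.println(redecodeAsUtf8(latin1).length()); // 2
    }
}
```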
It is not clear from the OP's example whether they are using Spring request annotations or plain old RequestMapping methods. Also, the OP appears to have tried using a filter to solve the problem and claims it does not work. The conclusion of the stackoverflow article is to use such a servlet filter.
Jul 17th, 2011, 01:14 PM
You may be correct for request parameters.
Originally Posted by dlmiles
For elements of the path, IMHO the servlet spec doesn't say, and the only reliable way is to get the raw URI from the servlet request (there's one method that is guaranteed to return the actual bits on the wire, without the servlet container messing with it); this is what Spring should be using to extract path parameters.
Jul 17th, 2011, 02:11 PM
Originally Posted by Julian Reschke
My bias toward talking about Query String parameters is due to the OP's problem, which is the decoding of Query String data.
But to talk about the request path: the "pass-through" method would be to use data from HttpServletRequest.getRequestURL(), which relies on the Servlet container to perform the HTTP protocol to Servlet API conversion. Whereas HttpServletRequest.getRequestURI() retains the original HTTP protocol encoded form (which is US-ASCII), and thus conversion to the Java data type "String" is bi-directional with no loss of precision.
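As an analogy only (this uses java.net.URI from the JDK, not the Servlet API), the difference between a raw and a decoded path can be seen in a few lines. The host and path are made up. The raw form stays pure US-ASCII, so it survives conversion to a String untouched, while the decoded form depends on which character set the decoder picked (java.net.URI always picks UTF-8).

```java
import java.net.URI;

class RawVersusDecodedPath {

    // True if every character fits in 7-bit US-ASCII.
    static boolean isUsAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.com/shop/reifengr%C3%B6%C3%9Fe");

        String raw = uri.getRawPath();  // still percent-encoded, exactly as written
        String decoded = uri.getPath(); // escaped octets interpreted as UTF-8

        System.out.println(raw);                // /shop/reifengr%C3%B6%C3%9Fe
        System.out.println(isUsAscii(raw));     // true
        System.out.println(isUsAscii(decoded)); // false
    }
}
```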
The Servlet Spec may not mandate URI/URL decoding/encoding, but neither does HTTP 1.0 or 1.1 or the URI specification. The background info explains the reason for that: the URI does not necessarily have to relate to human-readable text, since it is an opaque binary identifier. But the Servlet APIs use "String" Java types, I can only guess for practical reasons.
Jul 18th, 2011, 12:02 AM
Issue resolved -- Thanks.
Thanks a lot to both of you for your respective feedback on this issue. I do need the filter because the Request Body Content does need to be encoded / decoded using UTF-8. Thanks for clarifying the point that this filter has no effect on the URI / URL itself but only on the Request Body. So I made a configuration change in the server.xml of my Tomcat as recommended in "http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q2" and then it started behaving as expected, i.e. UTF-8 decoded. I used the 2nd recommended option, which says "Set the useBodyEncodingForURI attribute on the <Connector> element in server.xml to true. This will cause the Connector to use the request body's encoding for GET parameters. ", since the latter is already taken care of by the filter I am using.
Thanks dlmiles & Julian Reschke for your help on this topic.
Nov 15th, 2011, 07:36 AM
First of all, thank you for the excellent discussion on this issue. This not only helped me understand how it works but also gave directions to resolve it.
I resolved this issue by adding URIEncoding="UTF-8" to both the HTTP and AJP connectors in Tomcat's server.xml. I did nothing other than this. I am using Spring and Apache/Tomcat on Linux.
At first I added URIEncoding="UTF-8" only to the HTTP connector. It worked on my local Windows PC. However, when the same was deployed on Linux it failed. I had to add URIEncoding="UTF-8" to the AJP connector as well to make it work.
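For reference, a sketch of what the relevant part of server.xml might look like. The Tomcat attribute is spelled URIEncoding. The ports and protocols shown are the usual Tomcat defaults and are only assumptions here; any other attributes your connectors already carry should stay as they are.

```xml
<!-- Fragment of Tomcat's conf/server.xml. Ports and protocols are assumed defaults. -->

<!-- HTTP connector: decode %-encoded URI octets as UTF-8 -->
<Connector port="8080" protocol="HTTP/1.1" URIEncoding="UTF-8" />

<!-- AJP connector: needed as well when requests arrive through Apache httpd -->
<Connector port="8009" protocol="AJP/1.3" URIEncoding="UTF-8" />
```

The alternative mentioned earlier in the thread is useBodyEncodingForURI="true" on the <Connector>, which makes the connector reuse the request body's encoding for GET parameters.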