Web components usually use PrintWriter to produce responses; by default, the PrintWriter encodes characters using ISO-8859-1. Servlets can also output binary data using OutputStream classes, which perform no encoding. An application whose characters cannot be represented in the default encoding must explicitly set a different encoding.
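The effect of the default encoding can be seen outside a container. The following is a minimal sketch using only JDK classes, not the servlet API: the class name WriterEncodingSketch and its write helper are hypothetical, and the ByteArrayOutputStream merely stands in for the response stream. It writes two Japanese characters through a PrintWriter, once with ISO-8859-1 and once with UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the byte array stands in for a response stream.
class WriterEncodingSketch {

    // Writes text through a PrintWriter that encodes with the given charset
    // and returns the bytes that would reach the client.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintWriter writer = new PrintWriter(new OutputStreamWriter(sink, charset))) {
            writer.print(text);
        } // closing the writer flushes the encoded bytes into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u65e5\u672c"; // two Japanese characters (U+65E5, U+672C)

        // ISO-8859-1 cannot represent these characters, so the encoder
        // substitutes a '?' byte for each one.
        byte[] latin1 = write(text, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ??

        // UTF-8 represents each of these characters with three bytes.
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 6
    }
}
```

The silent substitution of question marks is the usual symptom of a response that was left at the default encoding.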
For web components, three encodings must be considered:
Request
Page (JSP pages)
Response
The request encoding is the character encoding in which parameters in an incoming request are interpreted. Currently, many browsers do not send a request encoding qualifier with the Content-Type header. In such cases, a web container will use the default encoding, ISO-8859-1, to parse request data.
If the client hasn’t set character encoding and the request data is encoded with a different encoding from the default, the data won’t be interpreted correctly. To remedy this situation, you can use the ServletRequest.setCharacterEncoding(String enc) method to override the character encoding supplied by the container.
To control the request encoding from JSP pages, you can use the JSTL fmt:requestEncoding tag.
You must call the method or tag before parsing any request parameters or reading any input from the request. Calling the method or tag once data has been read will not affect the encoding.
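To illustrate why the request encoding matters, here is a small sketch that decodes the same URL-encoded parameter value twice, using only the JDK. It does not use the servlet API: RequestEncodingSketch and decodeParameter are hypothetical names, and java.net.URLDecoder only approximates what a container does when it parses parameters. In a servlet, the corresponding step would be to call setCharacterEncoding on the request before reading any parameter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical stand-in for a container's parameter parsing.
class RequestEncodingSketch {

    // Decodes one URL-encoded parameter value using the given request encoding.
    static String decodeParameter(String rawValue, String requestEncoding) {
        try {
            return URLDecoder.decode(rawValue, requestEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + requestEncoding, e);
        }
    }

    public static void main(String[] args) {
        // The word "caf" + U+00E9 (e with acute accent), as sent by a
        // browser that submitted the form in UTF-8.
        String raw = "caf%C3%A9";

        // Parsed with the default encoding: the two UTF-8 bytes become two
        // separate characters, U+00C3 and U+00A9.
        String wrong = decodeParameter(raw, "ISO-8859-1");
        System.out.println(wrong.length()); // 5

        // Parsed with the encoding the client actually used.
        String right = decodeParameter(raw, "UTF-8");
        System.out.println(right.length()); // 4
    }
}
```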
For JSP pages, the page encoding is the character encoding in which the file is encoded.
For JSP pages in standard syntax, the page encoding is determined from the following sources:
The page encoding value of a JSP property group (see Setting Properties for Groups of JSP Pages) whose URL pattern matches the page.
The pageEncoding attribute of the page directive of the page. It is a translation-time error to name different encodings in the pageEncoding attribute of the page directive of a JSP page and in a JSP property group.
The CHARSET value of the contentType attribute of the page directive.
If none of these is provided, ISO-8859-1 is used as the default page encoding.
The pageEncoding and contentType attributes determine the page character encoding of only the file that physically contains the page directive. A web container raises a translation-time error if an unsupported page encoding is specified.
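The lookup described above can be modeled as a small function. This is only a sketch of the rules as stated in this section, not container code: PageEncodingSketch and resolve are hypothetical names, a null argument means the source is absent, the sources are assumed to be consulted in the order listed, encoding names are assumed to be compared without regard to case, and an IllegalStateException stands in for the translation-time error.

```java
// Hypothetical model of page-encoding resolution for a standard-syntax JSP page.
class PageEncodingSketch {

    static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Each argument is null when the corresponding source is not provided.
    static String resolve(String propertyGroupEncoding,
                          String pageEncodingAttribute,
                          String contentTypeCharset) {
        // Naming different encodings in the property group and in the
        // pageEncoding attribute is an error (modeled here as an exception).
        if (propertyGroupEncoding != null && pageEncodingAttribute != null
                && !propertyGroupEncoding.equalsIgnoreCase(pageEncodingAttribute)) {
            throw new IllegalStateException("Conflicting page encodings: "
                    + propertyGroupEncoding + " vs " + pageEncodingAttribute);
        }
        if (propertyGroupEncoding != null) {
            return propertyGroupEncoding;
        }
        if (pageEncodingAttribute != null) {
            return pageEncodingAttribute;
        }
        if (contentTypeCharset != null) {
            return contentTypeCharset;
        }
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, null));           // ISO-8859-1
        System.out.println(resolve(null, "UTF-8", "Shift_JIS")); // UTF-8
        System.out.println(resolve(null, null, "Shift_JIS"));    // Shift_JIS
    }
}
```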
The response encoding is the character encoding of the textual response generated by a web component. The response encoding must be set appropriately so that the characters are rendered correctly for a given locale. A web container sets an initial response encoding for a JSP page from the following sources:
The CHARSET value of the contentType attribute of the page directive
The encoding specified by the pageEncoding attribute of the page directive
The page encoding value of a JSP property group whose URL pattern matches the page
If none of these is provided, ISO-8859-1 is used as the default response encoding.
The setCharacterEncoding, setContentType, and setLocale methods can be called repeatedly to change the character encoding. Calls made after the servlet response’s getWriter method has been called or after the response is committed have no effect on the character encoding. Data is sent to the response stream on buffer flushes (for buffered pages) or on encountering the first content on unbuffered pages.
Calls to setContentType set the character encoding only if the given content type string provides a value for the charset attribute. Calls to setLocale set the character encoding only if neither setCharacterEncoding nor setContentType has set the character encoding before. To control the response encoding from JSP pages, you can use the JSTL fmt:setLocale tag.
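The interaction between these three methods is easier to follow in a toy model. The class below is a hypothetical simplification, not the servlet API: ResponseEncodingModel only tracks the encoding, it ignores response commit (only the getWriter cutoff is modeled), it does not handle quoted charset values, and the locale mapping is assumed to be a plain map keyed by language code.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of how a response's character encoding is chosen.
class ResponseEncodingModel {

    private final Map<String, String> localeEncodings; // language code -> encoding
    private String encoding = "ISO-8859-1";            // default response encoding
    private boolean explicitlySet = false;             // set via encoding or content type
    private boolean writerObtained = false;

    ResponseEncodingModel(Map<String, String> localeEncodings) {
        this.localeEncodings = localeEncodings;
    }

    void setCharacterEncoding(String enc) {
        if (writerObtained) return; // too late: no effect
        encoding = enc;
        explicitlySet = true;
    }

    void setContentType(String type) {
        if (writerObtained) return;
        // Only a charset parameter changes the encoding.
        for (String part : type.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                encoding = p.substring(8).trim();
                explicitlySet = true;
            }
        }
    }

    void setLocale(Locale locale) {
        // Ignored once the encoding has been set explicitly.
        if (writerObtained || explicitlySet) return;
        String mapped = localeEncodings.get(locale.getLanguage());
        if (mapped != null) encoding = mapped;
    }

    void getWriter() { writerObtained = true; }

    String getCharacterEncoding() { return encoding; }

    public static void main(String[] args) {
        ResponseEncodingModel r =
                new ResponseEncodingModel(Collections.singletonMap("ja", "Shift_JIS"));
        r.setContentType("text/html");                // no charset: encoding unchanged
        r.setLocale(Locale.JAPANESE);                 // nothing explicit yet: mapping applies
        System.out.println(r.getCharacterEncoding()); // Shift_JIS
        r.setContentType("text/html; charset=UTF-8"); // explicit charset wins
        r.setLocale(Locale.JAPANESE);                 // ignored now
        r.getWriter();
        r.setCharacterEncoding("ISO-8859-1");         // ignored after getWriter
        System.out.println(r.getCharacterEncoding()); // UTF-8
    }
}
```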
To obtain the character encoding for a locale, the setLocale method checks the locale encoding mapping for the web application. For example, to map Japanese to the Japanese-specific encoding Shift_JIS, follow these steps:
Select the WAR.
Click the Advanced Settings button.
In the Locale Character Encoding table, click the Add button.
Enter ja in the Extension column.
Enter Shift_JIS in the Character Encoding column.
If a mapping is not set for the web application, setLocale uses an Application Server mapping.
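The fallback from the application mapping to the server mapping might be pictured as follows. This is an illustrative sketch only: LocaleEncodingLookupSketch and encodingFor are hypothetical names, the maps stand in for whatever structures a container keeps, the EUC-JP server entry is invented for the example, and trying the full locale name before the bare language code is an assumption rather than documented behavior.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup: application mapping first, then the server mapping.
class LocaleEncodingLookupSketch {

    // Returns the mapped encoding, or null when neither mapping has an entry.
    static String encodingFor(Locale locale,
                              Map<String, String> applicationMapping,
                              Map<String, String> serverMapping) {
        String fromApplication = lookup(locale, applicationMapping);
        return fromApplication != null ? fromApplication : lookup(locale, serverMapping);
    }

    // Assumption: try the full locale name (for example ja_JP), then the language (ja).
    private static String lookup(Locale locale, Map<String, String> mapping) {
        String exact = mapping.get(locale.toString());
        return exact != null ? exact : mapping.get(locale.getLanguage());
    }

    public static void main(String[] args) {
        Map<String, String> application = Collections.singletonMap("ja", "Shift_JIS");
        Map<String, String> server = Collections.singletonMap("ja", "EUC-JP"); // invented entry
        Map<String, String> none = Collections.emptyMap();

        System.out.println(encodingFor(Locale.JAPAN, application, server)); // Shift_JIS
        System.out.println(encodingFor(Locale.JAPAN, none, server));        // EUC-JP
    }
}
```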
The first application in Chapter 5, JavaServer Pages Technology, allows a user to choose an English string representation of a locale from all the locales available to the Java 2 platform and then outputs a date localized for that locale. To ensure that the characters in the date can be rendered correctly for a wide variety of character sets, the JSP page that generates the date sets the response encoding to UTF-8 by using the following directive:
<%@ page contentType="text/html; charset=UTF-8" %>
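The choice of UTF-8 can be checked with a short sketch that asks each charset whether it can represent a few date-like strings. The class name ResponseCharsetSketch is hypothetical, and the sample strings are hand-written illustrations of what localized dates may contain, not output captured from the application.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical check of which charsets can carry localized date text.
class ResponseCharsetSketch {

    // True when every character of text can be encoded in the given charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Illustrative strings only: a French, a Japanese, and a Russian style date.
        String[] samples = {
            "15 janvier 2024",
            "2024\u5e741\u670815\u65e5",
            "15 \u044f\u043d\u0432\u0430\u0440\u044f 2024"
        };
        for (String sample : samples) {
            System.out.println(
                    canRepresent(sample, StandardCharsets.ISO_8859_1)
                    + " " + canRepresent(sample, StandardCharsets.UTF_8));
        }
        // prints:
        // true true
        // false true
        // false true
    }
}
```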