{"id":79,"date":"2004-10-23T14:32:53","date_gmt":"2004-10-23T13:32:53","guid":{"rendered":"\/?p=79"},"modified":"2025-02-01T03:04:17","modified_gmt":"2025-02-01T03:04:17","slug":"java-filewriter-xml-and-utf-8","status":"publish","type":"post","link":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/2004\/10\/23\/java-filewriter-xml-and-utf-8\/","title":{"rendered":"Java FileWriter, XML and UTF-8"},"content":{"rendered":"<p>Oddly enough, the java.io.FileWriter class doesn&#8217;t use UTF-8 by default. I&#8217;m not exactly sure what the default encoding is (possibly ISO-8859-1 or US-ASCII?) but it doesn&#8217;t seem to be UTF-8, which is odd given that Java strings are supposed to be Unicode. This causes a problem if you want to write non-ASCII characters and you don&#8217;t realise what&#8217;s happening. This was a bug in SQLEditor: somebody accidentally typed an umlaut into one of the fields and the file wouldn&#8217;t reload (which was annoying).<\/p>\n<p>The correct thing to do seems to be to use the following:<\/p>\n<p><code>OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(path), \"UTF-8\");<\/code><\/p>\n<p>This ensures that the file is written as UTF-8.<\/p>\n<p>I suppose the motivation is that simple use of FileWriter stays compatible with applications that are not Unicode-aware and don&#8217;t support UTF-8. It probably makes sense at some level, but it just goes to show that you can&#8217;t assume anything. \ud83d\ude42<\/p>\n<p>Update: <a href=\"http:\/\/www.malcolmhardie.com\/weblogs\/angus\/2004\/10\/23\/java-filewriter-xml-and-utf-8\/comment-page-1\/#comment-21288\">Bela&#8217;s comment<\/a> (below) explains more about which character set you&#8217;ll actually get.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Oddly enough, the java.io.FileWriter class doesn&#8217;t use UTF-8 by default. I&#8217;m not exactly sure what the default encoding is (possibly ISO-8859-1 or US-ASCII?) 
but it doesn&#8217;t seem to be UTF-8, which is odd given that Java strings are supposed to be Unicode. This causes a problem if you want to write non-ASCII characters and you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[12,14],"tags":[],"class_list":["post-79","post","type-post","status-publish","format-standard","hentry","category-sqleditor","category-writing-software"],"_links":{"self":[{"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/posts\/79","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/comments?post=79"}],"version-history":[{"count":1,"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/posts\/79\/revisions"}],"predecessor-version":[{"id":1601,"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/posts\/79\/revisions\/1601"}],"wp:attachment":[{"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/media?parent=79"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/categories?post=79"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.malcolmhardie.com\/weblogs\/angus\/wp-json\/wp\/v2\/tags?post=79"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}