Java FileWriter, XML and UTF-8

Oddly enough, the java.io.FileWriter class doesn’t use UTF-8 by default. I’m not exactly sure what the default encoding is (possibly ISO-8859-1 or US-ASCII?), but it doesn’t seem to be UTF-8, which is odd given that Java strings are supposed to be Unicode. This causes a problem if you want to write non-ASCII characters and you don’t realise what’s happening. It was the cause of a bug in SQLEditor: somebody accidentally typed an umlaut into one of the fields and the file wouldn’t reload. (Which was annoying.)

The correct thing to do seems to be to use the following:

OutputStreamWriter out = new OutputStreamWriter(new FileOutputStream(path),"UTF-8");

Which ensures that you are using UTF-8.
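As a slightly fuller illustration of the same idea, here is a minimal sketch that writes a string containing an umlaut as UTF-8 and reads it back with a matching InputStreamReader. The class name Utf8RoundTrip, the helper names writeUtf8 and readUtf8, and the sample XML are made up for this example, and it assumes Java 7 or later for StandardCharsets and try-with-resources; on older versions the "UTF-8" string form shown above does the same job.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8RoundTrip {

    // Writes the text as UTF-8, regardless of the platform default encoding.
    static void writeUtf8(File file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    // Reads the whole file back, decoding the bytes as UTF-8.
    static String readUtf8(File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("utf8-demo", ".xml");
        file.deleteOnExit();

        // \u00fc is a lowercase u with an umlaut; the escape keeps this
        // source file pure ASCII so its own encoding cannot interfere.
        String xml = "<name>w\u00fcnscht</name>";

        writeUtf8(file, xml);
        System.out.println("Round trip matches: " + xml.equals(readUtf8(file)));
        System.out.println("Bytes on disk: " + file.length());
    }
}
```

The important part is that the writer and the reader name the same encoding explicitly; if either side falls back to the platform default, the umlaut can come back garbled.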

I suppose the motivation is that simple use of FileWriter stays compatible with applications that are not Unicode-aware and don’t support UTF-8. It probably makes sense at some level, but it just goes to show that you can’t assume anything. 🙂

Update: Bela’s comment (below) explains more about which character set you’ll actually get.
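If you want to check what your own JVM would pick when no encoding is given, something like the following sketch should do it. The class name DefaultCharsetCheck and the method defaultCharsetName are made up for this example; Charset.defaultCharset() and the file.encoding system property are standard, but the values you see depend on your platform and Java version.

```java
import java.nio.charset.Charset;

// Hypothetical class name, for illustration only.
class DefaultCharsetCheck {

    // Name of the charset the JVM falls back to when none is specified.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

Whatever this prints, naming the encoding explicitly when you open the file means the output no longer depends on it.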


43 Responses to Java FileWriter, XML and UTF-8

  1. Florian says:

    That’s exactly the line of code I needed. You are currently no. 2 in a google search for “FileWriter UTF-8” 😉

    The Java input/output API is sooo unintuitive – and when someone actually wrote an easy-to-use FileWriter class, he forgot to implement a setEncoding(…).

  2. Severine says:

    And no. 1 with “java FileWriter UTF-8”
    Danke schön !

  3. laurent says:

    Thank You !

    your answer is so accurate for my
    “FileWriter UTF8” google search !!!

    That’s exactly the line of code I needed too !

  4. Edge says:

    Thank you!
    That’s what I need.

  5. sinka says:

    Thank you!!Gracias!

  6. Nabil says:

    chukran! (thx in arabic)

  7. Shachar says:

    Toda (thx in Hebrew)

  8. simon says:

    4 years later and your code is still helping people. Many thanks my friend!!

  9. Damian Mora says:

    Excellent, just the code line I was looking for. Muchas Gracias. 🙂

  10. kann says:

    Thanks,nice work ^^

  11. Bela says:

    Köszönöm (thx in Hungarian)

    After this article I read the Java API carefully, and the answer is there: (http://java.sun.com/javase/6/docs/api/java/io/FileWriter.html)

    “Convenience class for writing character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are acceptable. To specify these values yourself, construct an OutputStreamWriter on a FileOutputStream. ”

    You get the default character encoding on your system:
    System.getProperty("file.encoding") => on my machine this gives cp1252

    So, never use FileWriter! It is anything but convenient.

  12. Gowmukhi says:

    Awesome !!!

    Thanks

  13. Vijay says:

    Thank you! Dude.. those who are struggling with castor utf-8 conversion.. this is very helpful piece of code…

  14. Nitish says:

    Your answer gave me exactly the answer to my question. I used the same approach with UTF-16 encoding for my encryption/decryption project, and it worked. But I still have a problem: on decryption the file is saved with [] blocks every time, even though it is read correctly. I checked that it is reading the UTF-16 code. I would like to chat with you about the problem any time you like.

  15. Senny says:

    I was using FileWriter and was having problems with the copyright symbol, which left invalid characters in my XML. Your line of code gave me exactly what I was looking for….

  16. Vaibhav says:

    Thanks a lot!! Finally I got what i was looking for 🙂

  17. Vishal says:

    This post is really great!

  18. Simon says:

    very nice! exactly what i’m looking for.

  19. Konstantin Petrukhnov says:

    first result from Google:
    “java xml output utf-8”

  20. Thiago says:

    Thanks!!!!!

  21. Sebastien says:

    Thanks a lot for posting this… even many months later, it still helps some people! 🙂

  22. Marcelo says:

    Thanks !! Gracias !!! its Nov-2009 and this code keeps helping people ! =)

  23. You’re still number one hit on google for “java filewriter for utf-8”. Your code is exactly what I need. Thank You.

  24. agustin says:

    Just fine…

    tks

    From Chile.

  25. Angus Hardie says:

    A most illuminating explanation, thank you!

  26. Iulian says:

    A big thank you from Romania !

  27. Menio says:

    …and from the Netherlands too!

  28. Y says:

    Thank you a lot, Java sux in default…

  29. milan says:

    dakujem (in slovak) 🙂

  30. arny says:

    great & thanks much.
    just to add:
    looking at:
    http://java.sun.com/javase/6/docs/api/java/io/FileWriter.html
    made me add like this:
    —java:
    Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(this.outputFilename), "UTF-8"));

    (I guess that’s what they call the “decorator pattern” in for example:
    http://oreilly.com/catalog/9780596007126
    )
    HTH

  31. Abraham says:

    Thanks man! I’m starting with JDom to create XMLs and this post was what I was looking for 😉 GBY

  32. Russ says:

    Yes, now July 29, 2010 and this post is still a lifesaver! I didn’t suspect that this class was the source of my problems, now solved.

  33. Esteban says:

    thànks mán!!!!!!!!!

  34. Marco says:

    Love you, man! It solved my problem! =D

  35. milkywayfarer says:

    Moreover, today is the 25th of December and the post is still relevant!
    Thx from cold Russia (:

  36. mohan verma says:

    Thanks a lot my friend!!!!
    I found what I was looking for!!!

  37. Ivan says:

    Thank you!!!!!!!!!!!!!!!!!!!!!!!!!!!)

  38. Jomo Frodo says:

    Beautiful – thanks!

  39. nick says:

    I found similar problem in March 2008 with reading UTF-8 encoded files in. I wrote it up here:

    http://footech.blogspot.com/search/label/UTF8

  40. EPO says:

    Before
    new FileWriter( ….
    output
    wÃŒnscht

    After
    new OutputStreamWriter(…. ,"UTF-8")
    output
    wĂĽnscht

    expected
    wünscht

    shit went in the second round ….

  41. Anh says:

    Cảm ơn bạn!!!

  42. fereshteh says:

    Sepaas (Thanks in Persian!) ^_^
