Internationalized Strings

MongoDB supports UTF-8 for strings in stored objects and queries.  (Specifically, BSON strings are UTF-8.)

Generally, the driver for each programming language converts between the language's native string format and UTF-8 when serializing and deserializing BSON.  For example, the Java driver converts Java Unicode strings to UTF-8 on serialization.
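To make the conversion concrete, here is a minimal Python sketch of how a string value could be written to and read from BSON. It assumes the layout given in the BSON specification (a little-endian int32 byte count that includes the trailing null, then the UTF-8 bytes, then a null terminator). The helper names encode_bson_string and decode_bson_string are hypothetical and are not part of any driver's API; real drivers do this work internally.

```python
import struct


def encode_bson_string(text: str) -> bytes:
    """Encode a Python str as a BSON string value (hypothetical helper)."""
    utf8 = text.encode("utf-8")
    # The length prefix counts the UTF-8 bytes plus the trailing null byte.
    return struct.pack("<i", len(utf8) + 1) + utf8 + b"\x00"


def decode_bson_string(data: bytes) -> str:
    """Decode a BSON string value back into a Python str (hypothetical helper)."""
    (length,) = struct.unpack_from("<i", data, 0)
    # Skip the 4-byte length prefix and drop the trailing null byte.
    return data[4:4 + length - 1].decode("utf-8")


if __name__ == "__main__":
    encoded = encode_bson_string("日本語")
    print(encoded)
    print(decode_bson_string(encoded))
```

Note that the length prefix counts bytes, not characters, so a string of three Japanese characters is stored with nine bytes of UTF-8 data.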

In practice, this means you can store most international characters in MongoDB strings. A few notes:

  • MongoDB regex queries support UTF-8 in the regex string.
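  • Currently, sort() on a string uses strcmp, which compares the UTF-8 bytes of the strings: the sort order will be reasonable (it matches Unicode code point order) but not fully correct for international text.  For example, uppercase ASCII letters sort before lowercase ones, and accented letters sort after "z".  Future versions of MongoDB may support full UTF-8 sort ordering.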
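The effect of a strcmp-style sort can be approximated outside the database by sorting strings on their UTF-8 bytes. The following Python sketch is only an illustration of byte-wise ordering, not MongoDB's actual implementation; the function name bytewise_sort and the sample words are made up for the example.

```python
def bytewise_sort(words):
    """Sort strings by their UTF-8 byte sequences, as a strcmp-style comparison would."""
    return sorted(words, key=lambda word: word.encode("utf-8"))


if __name__ == "__main__":
    words = ["zebra", "Apple", "éclair", "apple"]
    # "Apple" sorts before "apple" because 'A' (0x41) is less than 'a' (0x61).
    # "éclair" sorts last because 'é' is encoded as 0xC3 0xA9, above any ASCII byte.
    print(bytewise_sort(words))
```

A dictionary in most languages would place "éclair" near "e" and treat "Apple" and "apple" as neighbors regardless of case; if that ordering matters, one option is to sort in the application using a locale-aware collation.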
