4.11. Using UTF-8 with APIs
An API has two vectors over which you're going to need to enforce a character set and encoding: input and output. (Throughout this book, the term API refers to external web services APIs, unless otherwise noted. We're not talking about language tools or classes.)
As far as the output goes, you probably already have it covered. If API responses are XML-based, then you can use the same HTTP and XML headers as we previously discussed. If your output is HTML-based, the HTTP header and <meta> tag combination will work fine.
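As a rough illustration of the XML case, here is a minimal Python sketch that declares UTF-8 in both the HTTP header and the XML declaration. The function name build_xml_response and the text/xml media type are assumptions made for this example, not something your framework provides or requires.

```python
from typing import Dict, Tuple


def build_xml_response(payload: str) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for an XML API response encoded as UTF-8.

    Hypothetical helper: `payload` is assumed to be the XML document
    without its declaration, as an ordinary Python string.
    """
    # State the encoding inside the document itself...
    document = '<?xml version="1.0" encoding="UTF-8"?>\n' + payload
    body = document.encode("utf-8")
    # ...and again in the HTTP header, so the two always agree.
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "Content-Length": str(len(body)),  # length in bytes, not characters
    }
    return headers, body
```

Note that the length is computed from the encoded bytes rather than the string, since any non-ASCII character takes more than one byte in UTF-8.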
For other custom outputs, using a BOM can be a good idea if you have some way to determine the start of a stream. If you can't or don't want to use a BOM, nothing beats just documenting what you're sending. Making your output character set and encoding explicit early on will guard against people developing applications that work at first but crash when they finally encounter some foreign text.
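For a custom output format, adding and removing a UTF-8 BOM might look something like the following Python sketch. The helper names with_bom and strip_bom are invented for this example; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of Python's standard library.

```python
import codecs


def with_bom(text: str) -> bytes:
    """Encode text as UTF-8 and mark the start of the stream with a BOM."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def strip_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # "utf-8-sig" accepts input with or without the BOM.
    return data.decode("utf-8-sig")
```

A consumer that reads with strip_bom handles both kinds of producer, which is handy when you can't be sure every client bothered to send the marker.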
Input to APIs can be a bigger problem. As the saying goes, the only things less intelligent than computers are their users. If you expose a public API to your application, you can't guarantee that the text sent will be in the correct character set. As with all input vectors, it's extremely important to verify that all input is both valid and good, something we're going to look at in detail in the next chapter.
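To give a flavor of the "valid" half of that check, here is a minimal Python sketch that refuses any input that isn't well-formed UTF-8. The function name decode_utf8_or_reject is hypothetical, and the sketch makes no attempt to decide whether the decoded text is "good".

```python
def decode_utf8_or_reject(raw: bytes) -> str:
    """Decode API input as UTF-8, raising ValueError if it is malformed."""
    try:
        # bytes.decode is strict by default: any invalid sequence raises.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "input is not valid UTF-8 at byte offset %d" % exc.start
        ) from exc
```

Text sent in Latin-1 by mistake, for example, will usually fail here as soon as it contains an accented character, so the caller finds out immediately instead of quietly storing garbage.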