mbrtowc

Converts a multibyte character to a wide character, and saves the parse state

#include <wchar.h>
size_t mbrtowc ( wchar_t * restrict widebuffer , const char * restrict string ,
                size_t maxsize , mbstate_t * restrict state  );

The mbrtowc( ) function, like mbtowc( ), determines the wide character that corresponds to the multibyte character referenced by the second pointer argument, and stores the result in the location referenced by the first pointer argument. Its additional parameter, a pointer to an mbstate_t object, describes the shift state of a multibyte character sequence in the given encoding. mbrtowc( ) updates this shift-state object after analyzing the multibyte character in the string, so you can use it in a subsequent function call to interpret the next character correctly. (Hence the additional "r" in the function name, which stands for "restartable.") If the last argument is a null pointer, mbrtowc( ) uses an internal, static mbstate_t object.
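
To illustrate the restartable behavior, the following is a minimal sketch, not a definitive implementation, in which the input arrives in separate chunks and a caller-owned mbstate_t object carries the parse state from one call to the next. The helper name feed_chunk( ) and its parameters are hypothetical, invented for this illustration. The sketch assumes that the caller has zero-initialized the state object before the first chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: converts the len bytes at buf to wide characters,
   storing at most outmax of them in out. The caller-owned *state carries
   any partial multibyte character over to the next call.
   Returns: the number of wide characters stored;
   or (size_t)-1 on an encoding error. */
size_t feed_chunk( const char *buf, size_t len, mbstate_t *state,
                   wchar_t *out, size_t outmax )
{
  size_t pos = 0, count = 0;

  assert( buf != NULL && state != NULL && out != NULL );

  while ( pos < len && count < outmax )
  {
    wchar_t wc;
    size_t result = mbrtowc( &wc, buf + pos, len - pos, state );

    if ( result == (size_t)-1 )   // Invalid sequence: give up.
      return result;
    if ( result == (size_t)-2 )   // Chunk ends inside a character; *state
      break;                      // remembers the bytes consumed so far.
    if ( result == 0 )            // Null character: treat as end of input.
      break;

    out[count++] = wc;            // One complete character converted.
    pos += result;
  }
  return count;
}
```

A caller would pass the same state object with each successive chunk. For brevity, the sketch does not report how many input bytes it consumed when the output array fills up before the chunk is exhausted.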

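The third argument is the maximum number of bytes to read for the multibyte character. If those bytes begin with a complete, valid multibyte character, the return value is the number of bytes that the function read to obtain it. If that multibyte character is the null character, mbrtowc( ) stores a null wide character, returns 0, and sets the parse state object to the initial state. If the bytes examined form the beginning of a valid multibyte character but do not complete it, mbrtowc( ) returns (size_t)-2 and stores no wide character; the state object records the partial character so that a later call can continue with the bytes that follow. If the string pointer does not point to a valid multibyte character, mbrtowc( ) returns (size_t)-1, sets the errno variable to EILSEQ, and leaves the mbstate_t object in an unspecified state.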
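
The return values described above can be handled as in the following minimal sketch, which counts the multibyte characters in a string. The function name mbcount( ) is hypothetical and chosen only for this illustration; the sketch assumes a null-terminated string that begins in the initial shift state, and treats an incomplete character the same as an invalid one.

```c
#include <assert.h>
#include <stdlib.h>   // MB_CUR_MAX
#include <string.h>   // memset()
#include <wchar.h>    // mbrtowc(), mbstate_t

/* Hypothetical helper: counts the multibyte characters in the
   null-terminated string s, which must begin in the initial shift state.
   Returns: the character count, not including the terminating null;
   or (size_t)-1 if the string is invalid or ends inside a character. */
size_t mbcount( const char *s )
{
  mbstate_t state;
  size_t count = 0;

  assert( s != NULL );
  memset( &state, '\0', sizeof state );    // Start in the initial state.

  for ( ;; )
  {
    // A null first argument means: parse, but store no wide character.
    size_t result = mbrtowc( NULL, s, MB_CUR_MAX, &state );

    switch ( result )
    {
      case 0:                  // Reached the terminating null character.
        return count;

      case (size_t)-1:         // Invalid multibyte sequence.
      case (size_t)-2:         // Incomplete multibyte character.
        return (size_t)-1;

      default:                 // Consumed result bytes for one character.
        s += result;
        ++count;
    }
  }
}
```

Because the result depends on the LC_CTYPE category of the current locale, a program would normally call setlocale( ) before using such a function on text that is not plain ASCII.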

Example

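#include <stdlib.h>    // MB_CUR_MAX
#include <string.h>    // memset( )
#include <wchar.h>     // mbrtowc( ), wcrtomb( ), mbstate_t
#include <wctype.h>    // towupper( )

size_t mbstoupper( char *s1, const char *s2, size_t n )
/* Copies the multibyte string from s2 to s1, converting all the characters
   to upper case on the way. n is the size of the buffer at s1, in bytes.
   Because there are no standard functions for case-mapping in multibyte
   encodings, converts to and from the wide-character encoding (using the
   current locale setting for the LC_CTYPE category). The source string must
   begin in the initial shift state.
   Returns: the number of bytes written, including the terminating null
   character; or (size_t)-1 on an encoding error.
 */
{
  const char *inptr = s2;
  char *outptr = s1;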
  wchar_t thiswc[1];
  size_t inresult, outresult;

  mbstate_t states[2], *instate = &states[0], *outstate = &states[1];

  memset( states, '\0', sizeof states );

  do
  {
    inresult = mbrtowc( thiswc, inptr, MB_CUR_MAX, instate );
    switch ( inresult )
      {
      case (size_t)-2:  // The (MB_CUR_MAX) bytes at inptr do not make a
            // complete mb character. Maybe there is a redundant sequence of
            // shift codes. Treat the same as an encoding error.
        *outptr = '\0';
        return (size_t)-1;

      case (size_t)-1:   // Found an invalid mb sequence at inptr:
        *outptr = '\0';  // terminate the output string and
        return inresult; // pass the error to the caller.

      case 0:         // Got a null character. Make a last null wc.
                      // The default action, with wcrtomb, does this nicely,
                      // so *no break statement* necessary here.

      default:        // Read inresult bytes to get one wide
                      // character.
        /* Check for length limit before writing anything but a null.
           Note: Using inresult as an approximation for the output length.
           The actual output length could conceivably be different due to a
           different succession of state-shift sequences.
        */
        if (( outptr - s1 ) + inresult + MB_CUR_MAX > n )
        {   // i.e., if bytes written + bytes to write + termination > n,
            // then terminate now by simulating a null-character input.
          thiswc[0] = L'\0';
          inresult = 0;
        }
        inptr += inresult;
        if (( outresult = wcrtomb( outptr, (wchar_t)towupper(thiswc[0]),
                                   outstate )) == (size_t)-1 )
        {                               // Encoding error on output:
          *outptr = '\0';               // Terminate and return error.
          return outresult;
        }
        else
          outptr += outresult;
      }
  } while ( inresult );                 // Drop out after handling '\0'.
  return outptr - s1;
}

See Also