Determines the length of a multibyte character, or whether the multibyte encoding is stateful
#include <stdlib.h>
int mblen ( const char *s , size_t maxsize );
The mblen( ) function determines the length in bytes of a multibyte character referenced by its pointer argument. If the argument points to a valid multibyte character, then mblen( ) returns a value greater than zero. If the argument points to a null character ('\0'), then mblen( ) returns 0. A return value of -1 indicates that the argument does not point to a valid multibyte character, or that the multibyte character is longer than the maximum size specified by the second argument.

The LC_CTYPE category in the current locale settings determines which byte sequences are valid multibyte characters. The second argument specifies a maximum byte length for the multibyte character, and should not be greater than the value of the symbolic constant MB_CUR_MAX, defined in stdlib.h.

If you pass mblen( ) a null pointer as the first argument, then the return value indicates whether the current multibyte encoding is stateful. This behavior is the same as that of mbtowc( ). If mblen( ) returns 0, then the encoding is stateless. If it returns any other value, the encoding is stateful; that is, the interpretation of a given byte sequence may depend on the shift state.
Example
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* mbsrcat: multibyte string restartable concatenation.
 * Appends s2 to s1, respecting final shift state of destination string,
 * indicated by *p_s1state. String s2 must start in the initial shift state.
 * Returns: number of bytes appended from s2, or (size_t)-1 on encoding error.
 * Max. total length (incl. terminating null byte) is <= n;
 * stores ending state of concatenated string in *p_s1state.
 */
size_t mbsrcat( char * restrict s1, const char * restrict s2,
                mbstate_t * restrict p_s1state, size_t n )
{
    int result;
    size_t nbytes;
    size_t i = strlen( s1 );
    size_t j = 0;

    // Sanity check: room for 1 multibyte char + string terminator.
    // Testing n first keeps the unsigned subtraction from wrapping around.
    if ( n < MB_CUR_MAX + 1 || i >= n - ( MB_CUR_MAX + 1 ))
        return 0;                          // Report 0 bytes written.

    // Shift s1 down to initial state:
    if ( !mbsinit( p_s1state ))            // If not initial state, then append
    {                                      // shift sequence to get initial state.
        nbytes = wcrtomb( s1+i, L'\0', p_s1state );
        if ( nbytes == (size_t)-1 )
        {                                  // Encoding error:
            s1[i] = '\0';                  // Try restoring termination.
            return (size_t)-1;             // Report error to caller.
        }
        i += nbytes - 1;                   // wcrtomb() also stored a null byte;
    }                                      // the next character overwrites it.

    // Copy only whole multibyte characters at a time.
    // Get length of next char w/o changing *p_s1state:
    while (( result = mblen( s2+j, MB_CUR_MAX )) > 0
           && (size_t)result <= n - ( 1 + i ))
    {
        // Next character fits; update state and copy it:
        (void)mbrlen( s2+j, (size_t)result, p_s1state );
        memcpy( s1+i, s2+j, (size_t)result );
        i += result;
        j += result;
    }
    s1[i] = '\0';                          // Terminate in every case.
    if ( result == -1 )                    // Encoding error in s2:
        return (size_t)-1;                 // Report error to caller.
    return j;
}
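As a smaller illustration of the return values described above, the following sketch walks through a string one multibyte character at a time and also wraps the null-pointer query in a function. The names count_mbchars and encoding_is_stateful are hypothetical helpers introduced here for illustration, not standard library functions, and the sketch assumes that the program has already selected the desired LC_CTYPE locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* count_mbchars (hypothetical helper, not a standard function):
 * returns the number of multibyte characters in the string s,
 * or (size_t)-1 if mblen() reports an invalid sequence.
 * Assumes LC_CTYPE has already been set, e.g. with setlocale().
 */
size_t count_mbchars( const char *s )
{
    size_t count = 0;
    int len;

    assert( s != NULL );
    (void)mblen( NULL, 0 );            // Reset mblen()'s internal shift state.

    while (( len = mblen( s, MB_CUR_MAX )) > 0 )
    {
        s += len;                      // Skip over one whole character.
        ++count;
    }
    return ( len == 0 ) ? count        // 0: reached the terminating '\0'.
                        : (size_t)-1;  // -1: invalid multibyte sequence.
}

/* encoding_is_stateful (hypothetical helper): nonzero if the current
 * multibyte encoding uses shift states, 0 if it is stateless.
 */
int encoding_is_stateful( void )
{
    return mblen( NULL, 0 ) != 0;
}
```

In the default "C" locale, every character occupies one byte, so count_mbchars( ) gives the same result as strlen( ) for plain ASCII text. After a call such as setlocale( LC_CTYPE, "" ) in a UTF-8 environment, the count should be smaller than the byte length for any string that contains non-ASCII characters.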
See Also
mbrlen( ), mbtowc( )