substr() sample case

migration/RELEASE_1_0_0
Andrei Zmievski 21 years ago
parent
commit
cd43b7dda7
Changed files: README.UNICODE-UPGRADES (60 lines changed)

@@ -262,6 +262,66 @@ Unicode strings:
Upgrading Functions
===================
Let's take a look at a couple of functions that have been upgraded to
support new string types.
substr()
--------
This function returns part of a string based on offset and length
parameters.
    void *str;
    int32_t str_len, cp_len;
    zend_uchar str_type;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "tl|l", &str, &str_len, &str_type, &f, &l) == FAILURE) {
        return;
    }
The first thing we notice is that the incoming string specifier is 't',
which means that we can accept all three string types (native, binary, and
Unicode). The 'str' variable is declared as void*, because it can point to
either UChar* or char*. The actual type of the incoming string is stored in
the 'str_type' variable.
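
As a minimal, engine-independent sketch of this "untyped pointer plus type
tag" pattern: the enum values and the helper sketch_unit_at() below are
hypothetical names invented for illustration, and uint16_t stands in for
ICU's UChar. The tag alone decides how the buffer behind the void* is read.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the engine's type tags (illustrative only). */
enum sketch_str_type { SKETCH_STRING, SKETCH_BINARY, SKETCH_UNICODE };

/* Read storage unit i from a buffer whose element type is known only
 * through the accompanying tag: 16-bit units for Unicode strings,
 * single bytes for native and binary strings. */
static uint32_t sketch_unit_at(const void *str, int32_t i,
                               enum sketch_str_type type)
{
    if (type == SKETCH_UNICODE) {
        return ((const uint16_t *)str)[i];   /* assumed UChar-like unit */
    }
    return (unsigned char)((const char *)str)[i];
}
```

The cast happens only after the tag has been checked, which is the same
discipline the substr() code follows with 'str' and 'str_type'.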
    if (str_type == IS_UNICODE) {
        cp_len = u_countChar32(str, str_len);
    } else {
        cp_len = str_len;
    }
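If the string is a Unicode one, we cannot rely on the str_len value to tell
us the number of characters in it: str_len counts UTF-16 code units, and a
single character (codepoint) may occupy two of them. Instead, we call
u_countChar32() to obtain the number of codepoints.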
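
To make the difference between code units and codepoints concrete, here is a
small stand-alone sketch of the counting step. It is not ICU's
implementation; sketch_count_code_points() is a hypothetical name, and
uint16_t stands in for UChar so the example compiles without ICU.

```c
#include <stdint.h>

/* Count codepoints in a UTF-16 buffer of `len` code units.
 * A lead surrogate (0xD800-0xDBFF) immediately followed by a trail
 * surrogate (0xDC00-0xDFFF) forms one codepoint; every other unit,
 * including an unpaired surrogate, counts as one codepoint by itself. */
static int32_t sketch_count_code_points(const uint16_t *s, int32_t len)
{
    int32_t i = 0, count = 0;

    while (i < len) {
        if ((s[i] & 0xFC00) == 0xD800 &&
            i + 1 < len && (s[i + 1] & 0xFC00) == 0xDC00) {
            i += 2;     /* surrogate pair: two units, one codepoint */
        } else {
            i += 1;     /* BMP character or unpaired surrogate */
        }
        count++;
    }
    return count;
}
```

For the four units 0x0061, 0xD83D, 0xDE00, 0x0062 the length in code units
is 4, but the function reports 3 codepoints, which is the value substr()
needs for cp_len.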
The next several lines normalize the start ('f') and length ('l') parameters
to fit within the string. Nothing new here. Then we locate the appropriate
segment.
    if (str_type == IS_UNICODE) {
        int32_t start = 0, end = 0;
        U16_FWD_N((UChar*)str, end, str_len, f);
        start = end;
        U16_FWD_N((UChar*)str, end, str_len, l);
        RETURN_UNICODEL((UChar*)str + start, end-start, 1);
Since codepoint (character) #n is not necessarily at offset #n in Unicode
strings, we start at the beginning and use U16_FWD_N() to iterate forward
until we have gone through the number of codepoints given by the offset 'f',
which brings us to the start of the segment. Then we save the location in
'start' and continue iterating through the number of codepoints specified by
the length 'l'. Once that's done, we can return the segment as a Unicode
string.
    } else {
        RETURN_STRINGL((char*)str + f, l, 1);
    }
For native and binary types, we can return the segment directly.
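
The Unicode branch can be tried outside the engine with the following
sketch. Everything in it is an assumption made for illustration:
sketch_fwd_n() imitates what U16_FWD_N() does, sketch_segment() is a
hypothetical helper, uint16_t stands in for UChar, and the normalization
rules (negative values count from the end, out-of-range values are clamped)
are one plausible reading of the step that the walk-through skips over, not
a copy of the actual source.

```c
#include <stdint.h>

/* Advance *pos by up to n codepoints without going past len. */
static void sketch_fwd_n(const uint16_t *s, int32_t *pos, int32_t len,
                         int32_t n)
{
    while (n > 0 && *pos < len) {
        uint16_t c = s[(*pos)++];
        /* A lead surrogate followed by a trail surrogate is one codepoint. */
        if ((c & 0xFC00) == 0xD800 &&
            *pos < len && (s[*pos] & 0xFC00) == 0xDC00) {
            (*pos)++;
        }
        n--;
    }
}

/* Find the code-unit range [*start, *end) that holds `l` codepoints
 * beginning at codepoint `f`. `len` is the length in code units and
 * `cp_len` the length in codepoints. Returns 0 if `f` lies beyond the
 * end of the string, 1 otherwise. */
static int sketch_segment(const uint16_t *s, int32_t len, int32_t cp_len,
                          int32_t f, int32_t l,
                          int32_t *start, int32_t *end)
{
    /* Assumed normalization of offset and length. */
    if (f < 0) { f += cp_len; if (f < 0) f = 0; }
    if (f > cp_len) return 0;
    if (l < 0) { l += cp_len - f; if (l < 0) l = 0; }
    if (l > cp_len - f) l = cp_len - f;

    *end = 0;
    sketch_fwd_n(s, end, len, f);   /* skip f codepoints to reach the start */
    *start = *end;
    sketch_fwd_n(s, end, len, l);   /* then cover l more codepoints */
    return 1;
}
```

The resulting pair plays the role of 'start' and 'end' in the code above:
the segment begins at s + start and is end - start code units long.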
References
==========
