Sneak peek at the new Unicode-friendly PHP6

The following is a guest post from Andrew Mager, associate technical producer at ZDNet. This dispatch is from the Bay Area PHP Meetup. He can be found at Andrewmager.com.

When Andrei Zmievski isn't busy building infrastructure for a social gaming startup, or processing photos from his Nikon D3, he is compiling C code that will define PHP 6.

He stopped by the old CNET Networks building last week to speak to the Bay Area PHP Meetup audience.

PHP6 will have Unicode support everywhere: in the engine, in extensions, and in the API. It's going to be native and complete, with no hacks, no external libraries, and no language bias. English is just another language, not the primary one.

Unicode is a system that assigns a unique number, called a code point, to every character. Its current version defines more than 99,000 characters, but the code space has room for over 1.1 million (U+0000 through U+10FFFF, or 1,114,112 code points).
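
To make the "unique number" idea concrete, here is a small Python sketch. Python stands in for PHP here, so this is not PHP 6 code. It prints the code points of a few characters and the size of the code space. The helper name `code_point` is illustrative.

```python
def code_point(ch: str) -> str:
    """Return a single character's code point in the conventional U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# Valid code points run from U+0000 to U+10FFFF inclusive.
CODE_SPACE_SIZE = 0x10FFFF + 1

for ch in ["A", "é", "中", "😀"]:
    print(ch, code_point(ch))   # A U+0041, é U+00E9, 中 U+4E2D, 😀 U+1F600

print(CODE_SPACE_SIZE)          # 1114112
```

Going the other way, `chr(0x4E2D)` returns "中". `chr(0x110000)` raises `ValueError` because it falls outside the code space.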

Complete Unicode support will prevent mojibake: the incorrect, unreadable characters that appear when software decodes text with a different character encoding from the one it was written in.

We've all seen it, and it's ugly.
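
Mojibake is easy to reproduce: write text in one encoding and read the bytes back in another. The Python sketch below uses the hypothetical helper names `garble` and `repair`. It assumes UTF-8 text misread as Latin-1, which is one common way mojibake happens.

```python
def garble(text: str, written_as: str = "utf-8", read_as: str = "latin-1") -> str:
    """Encode text with one encoding and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)

def repair(garbled: str, misread_as: str = "latin-1", actual: str = "utf-8") -> str:
    """Undo a known misreading: recover the original bytes, then decode them correctly."""
    return garbled.encode(misread_as).decode(actual)

bad = garble("café")
print(bad)           # cafÃ©
print(repair(bad))   # café
```

`repair` only works when you know both encodings and no bytes were lost along the way. In general, the fix is to decode with the right encoding in the first place.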

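PHP6 supports Unicode composition: a base character followed by one or more combining marks (such as accents) is displayed as a single character. New combinations can therefore be written as languages evolve, without waiting for a dedicated code point.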
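
Python's standard `unicodedata` module can illustrate composition. This is Python's normalization API, not PHP 6's. The sketch shows a base letter plus a combining accent collapsing into one precomposed character where Unicode defines one, and staying a two-code-point sequence where it does not.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"

# NFC normalization composes the pair into the precomposed U+00E9 where one exists.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))   # 2 1
print(composed == "\u00e9")             # True

# "q" + U+0303 COMBINING TILDE has no precomposed form, so NFC leaves two code points.
q_tilde = unicodedata.normalize("NFC", "q\u0303")
print(len(q_tilde))                     # 2
```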

Unicode simplifies development, but doesn't solve all of the internationalization problems.

Internationalization is designing and developing an application so that it has no built-in cultural assumptions and is efficient to localize. Time formats, currencies, and the order in which letters are sorted all differ from one place to another.
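
Sorting is a good example of a cultural assumption. The Python sketch below compares plain code-point order with a deliberately crude key that ignores case and accents; `crude_sort_key` is a hypothetical helper. It is a simplification for illustration, not the Unicode Collation Algorithm or any locale's actual rules.

```python
import unicodedata

def crude_sort_key(word: str) -> str:
    """Approximate dictionary order by dropping accents and ignoring case.

    A simplification for illustration, not real locale-aware collation.
    """
    decomposed = unicodedata.normalize("NFD", word)                 # split é into e + accent
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()

words = ["zebra", "Éclair", "apple"]
print(sorted(words))                      # ['apple', 'zebra', 'Éclair'] (code-point order)
print(sorted(words, key=crude_sort_key))  # ['apple', 'Éclair', 'zebra']
```

Real collation rules depend on the locale. Swedish, for example, sorts "ä" after "z", which no single language-neutral key can capture.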

PHP6 will have two string types: Unicode strings for text and binary strings for raw bytes.
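
Python 3 draws the same line between text and bytes, so it makes a convenient analogy. It is only an analogy: PHP 6's types are not Python's. The sketch shows that a text string counts code points while its encoded form counts bytes, and that the two cannot be mixed without an explicit conversion.

```python
text = "naïve"                  # text: a sequence of code points
data = text.encode("utf-8")     # binary: a sequence of bytes

print(len(text))                # 5 code points
print(len(data))                # 6 bytes: "ï" takes two bytes in UTF-8

try:
    text + data                 # mixing the two types raises TypeError...
except TypeError:
    print("decode the bytes first")

print(text == data.decode("utf-8"))  # ...so convert explicitly: True
```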

Unicode identifiers are allowed:

[Slide: code example using Unicode identifiers]
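
Python 3 also accepts non-ASCII identifiers (PEP 3131), so a rough Python counterpart of that idea might look like the sketch below. The names and values are made up for illustration.

```python
# Python 3 identifiers may use letters from many scripts (PEP 3131).
π = 3.14159
größe_cm = 180
名前 = "Andrei"   # hypothetical value

def площадь_круга(радиус: float) -> float:
    """Area of a circle; the function and parameter names are Cyrillic."""
    return π * радиус * радиус

print(名前, größe_cm, площадь_круга(2.0))
```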

Built-in functions will understand Unicode text, so string operations work on characters rather than on raw bytes.

Streams have built-in support for converting between Unicode strings and other encodings on the fly, so the usual file functions such as fopen(), fread(), and fwrite() can convert text as it is read or written.
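
In Python, the same idea is expressed by wrapping a byte stream in `io.TextIOWrapper`, which decodes while reading and encodes while writing. For illustration, the sketch assumes a file saved in Shift_JIS. An in-memory `BytesIO` stands in for the file so the example runs anywhere.

```python
import io

# Pretend these bytes were read from a file saved in Shift_JIS.
raw = "こんにちは、世界".encode("shift_jis")

# TextIOWrapper decodes on the fly as the underlying byte stream is read.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="shift_jis")
decoded = reader.read()
print(decoded)                 # こんにちは、世界

# Writing goes the other way: str in, encoded bytes out.
sink = io.BytesIO()
writer = io.TextIOWrapper(sink, encoding="utf-8")
writer.write(decoded)
writer.flush()                 # push buffered text down to the byte stream
print(len(sink.getvalue()))    # 24: eight characters, three UTF-8 bytes each
```

With a real file, `open(path, encoding="shift_jis")` sets up the same kind of wrapper for you.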

In version 6, PHP will be much easier to use across different languages. For instance, this is how simple it is to grab a Chinese news feed, parse the first five stories, clean them up, and convert them to JSON:

[Slide: code that fetches a Chinese news feed and converts the first five stories to JSON]

Then you can easily display it on the web in any format you want.
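
The slide's PHP 6 code isn't reproduced here, but the same pipeline can be sketched in Python with the standard library. The feed below is a made-up sample so the example runs without network access, and `first_stories_as_json` is a hypothetical helper. The HTML cleanup is a crude regular expression, not a real sanitizer.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for a real Chinese news site.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title> 北京今日晴 </title><description>&lt;p&gt;气温 &lt;b&gt;25&lt;/b&gt; 度&lt;/p&gt;</description></item>
<item><title>上海地铁新线开通</title><description>交通</description></item>
<item><title>广州举办美食节</title><description>美食</description></item>
<item><title>深圳科技展开幕</title><description>科技</description></item>
<item><title>杭州西湖游客增加</title><description>旅游</description></item>
<item><title>成都熊猫基地迎来新生幼崽</title><description>熊猫</description></item>
</channel></rss>"""

def first_stories_as_json(feed_xml: str, limit: int = 5) -> str:
    """Parse an RSS 2.0 string, keep the first `limit` items, and return them as JSON."""
    root = ET.fromstring(feed_xml)
    stories = []
    for item in root.iter("item"):
        if len(stories) == limit:
            break
        title = (item.findtext("title") or "").strip()
        summary = item.findtext("description") or ""
        summary = re.sub(r"<[^>]+>", "", summary)  # crude tag stripping
        summary = " ".join(summary.split())        # collapse runs of whitespace
        stories.append({"title": title, "summary": summary})
    # ensure_ascii=False keeps the Chinese text readable instead of \uXXXX escapes.
    return json.dumps(stories, ensure_ascii=False, indent=2)

print(first_stories_as_json(SAMPLE_FEED))
```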

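TextIterator is a new feature of PHP 6: it lets you iterate over code points, characters, graphemes, words, lines, or sentences, either forwards or backwards. This makes truncating text much easier, because you can cut at a character or word boundary instead of in the middle of one.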
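
Python's standard library has no grapheme iterator, so the sketch below approximates one by attaching combining marks to the preceding base character. `simple_graphemes` and `truncate` are hypothetical helpers. The grouping is a simplification of the Unicode text segmentation rules (UAX #29) and ignores emoji sequences, Hangul jamo, and other special cases. It still shows why truncating by code point can break a character, while truncating by grapheme does not.

```python
from __future__ import annotations

import unicodedata

def simple_graphemes(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch      # attach the mark to the preceding base character
        else:
            clusters.append(ch)
    return clusters

def truncate(text: str, limit: int) -> str:
    """Keep at most `limit` visible characters without splitting off any marks."""
    return "".join(simple_graphemes(text)[:limit])

word = "cafe\u0301s"                 # "cafés" spelled with a combining accent
print(len(word))                     # 6 code points
print(len(simple_graphemes(word)))   # 5 visible characters
print(truncate(word, 4))             # café, with the accent kept
print(word[:4])                      # cafe: slicing by code point drops the accent
print("".join(reversed(simple_graphemes(word))))  # backwards, accent still on the e
```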

Transliteration lets you take names written in Japanese and render them in Latin script so you can pronounce them. It only takes two lines of code in PHP 6:

[Slide: two-line transliteration example]
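
The two-line PHP 6 version relies on a full transliteration engine. As a much smaller illustration, the Python sketch below romanizes a handful of hiragana with a lookup table. `KANA_TO_LATIN` is a hypothetical toy table covering only the syllables in the examples. Real transliterators, such as ICU's transforms, also handle small kana, long vowels, katakana, and kanji, and kanji readings cannot be derived from a character table at all.

```python
# Toy hiragana-to-Latin table (hypothetical): covers only the syllables used below.
KANA_TO_LATIN = {
    "さ": "sa", "く": "ku", "ら": "ra",
    "や": "ya", "ま": "ma", "だ": "da",
    "た": "ta", "な": "na", "か": "ka",
}

def transliterate(kana: str) -> str:
    """Replace each known hiragana character with its romanization; keep others unchanged."""
    return "".join(KANA_TO_LATIN.get(ch, ch) for ch in kana)

print(transliterate("さくら"))   # sakura
print(transliterate("やまだ"))   # yamada
print(transliterate("たなか"))   # tanaka
```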

The pecl/intl extension will be bundled with PHP 5.3. It complements the native Unicode support in PHP 6.

Other cool features of PHP 6 include number collation, number formatting, message formatting, a bundled APC, closures, traits, a 64-bit integer type, a new MySQL driver, and general cleanup.

Zmievski hopes PHP 6 will be ready by March 2009. Here is a link to the full presentation.

Talkback

  • More interested in 5.3....

    and the features coming with that. And with the Zend Framework coming along even the stuff in 5.3 isn't as important. Now if they would just work on their documentation...

    I like the fact that the accelerator will be bundled, although it's not the one I use.
    storm14k
  • RE: Sneak peek at the new Unicode-friendly PHP6

    I think 5.3 comes out very soon.
    magerleagues1