Sneak peek at the new Unicode-friendly PHP6

The following is a guest post from Andrew Mager, associate technical producer at ZDNet. This dispatch is from the Bay Area PHP Meetup.
Written by Larry Dignan, Contributor

Mager can be found at Andrewmager.com.

When Andrei Zmievski isn't busy building infrastructure for a social gaming startup, or processing photos from his Nikon D3, he is compiling C code that will define PHP 6.

He stopped by the old CNET Networks building last week to speak to the Bay Area PHP Meetup audience.

PHP 6 will have Unicode support everywhere: in the engine, in extensions, and in the API. It's going to be native and complete, with no hacks, no external libraries, and no language bias. English is just another language, not the primary one.

Unicode is a system that provides a unique number for every character. Its current version has more than 99,000 characters, and it has capacity for over 1 million.

Complete support of Unicode will prevent Mojibake, the incorrect, unreadable characters shown when software fails to render text according to its associated character encoding.
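
To make the failure concrete, here is a minimal sketch in released PHP (not PHP 6) of how Mojibake arises. It assumes the mbstring extension is loaded, and the helper name misreadAsLatin1 is made up for illustration. It mimics software that treats UTF-8 bytes as if they were ISO-8859-1.

```php
<?php
// Hypothetical helper: simulate software that wrongly assumes its input
// bytes are ISO-8859-1 when they are really UTF-8.
function misreadAsLatin1(string $utf8Bytes): string
{
    // Interpret each byte as one ISO-8859-1 character, then re-encode the
    // result as UTF-8 so it can be shown on a UTF-8 terminal or page.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

$original = "caf\u{00E9}";              // "café": the é is two bytes in UTF-8
$garbled  = misreadAsLatin1($original); // each of those bytes becomes its own character

echo $original, "\n"; // café
echo $garbled, "\n";  // cafÃ©
```

Plain ASCII text survives the mistake unchanged, which is why this kind of bug often goes unnoticed until non-English text shows up.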

We've all seen it, and it's ugly.

PHP6 supports Unicode composition, so you can create new characters as languages evolve.
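
One concrete form of composition in Unicode is normalization, where a base letter plus a combining mark is folded into a single precomposed code point. The sketch below shows that with the Normalizer class from the intl extension in released PHP. It is an illustration under that assumption, not the PHP 6 API from the talk.

```php
<?php
// Requires the intl extension (an assumption about the runtime).
$decomposed = "e\u{0301}";  // "e" followed by U+0301 COMBINING ACUTE ACCENT
$composed   = Normalizer::normalize($decomposed, Normalizer::FORM_C);

var_dump($composed === "\u{00E9}");               // bool(true): one code point, é
var_dump(strlen($decomposed), strlen($composed)); // int(3) and int(2): byte lengths
var_dump(Normalizer::isNormalized($decomposed, Normalizer::FORM_C)); // bool(false)
```

Both strings look identical on screen, so normalizing before comparing or storing text avoids mismatches that are hard to spot.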

Unicode simplifies development, but doesn't solve all of the internationalization problems.

Internationalization is the design and development of an application that has no built-in cultural assumptions and is efficient to localize. Time formats, currencies, and the sort order of letters all vary around the world.
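
Sorting is an easy inconsistency to demonstrate. The sketch below uses the Collator class from the intl extension in released PHP. The helper name sortForLocale is hypothetical, and the locales were picked only because they disagree about where "ä" belongs.

```php
<?php
// Requires the intl extension. sortForLocale is a hypothetical helper name.
function sortForLocale(array $words, string $locale): array
{
    $collator = new Collator($locale);
    $collator->sort($words); // sorts the local copy in place by the locale's rules
    return array_values($words);
}

$words = ['z', 'ä', 'a'];

print_r(sortForLocale($words, 'de_DE')); // a, ä, z: German sorts ä next to a
print_r(sortForLocale($words, 'sv_SE')); // a, z, ä: Swedish treats ä as a letter after z
```

The same three strings produce two different correct answers, which is why the sort order cannot be hard-coded in an internationalized application.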

PHP6 will have two string types: Unicode and Binary.

Unicode identifiers are allowed:
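
The example from the talk is not available, so the following is only a sketch of what non-Latin identifiers look like. It runs on released PHP because identifiers may contain any byte from 0x80 upward, so a UTF-8 source file works even though the engine does not interpret the names as Unicode. All names here are invented for illustration.

```php
<?php
// Save this file as UTF-8. The function and variable names are illustrative.
function приветствие(string $имя): string
{
    return "Привет, {$имя}!";
}

$名前 = 'Андрей';
echo приветствие($名前), "\n"; // Привет, Андрей!
```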

Functions will understand how to read Unicode text.
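
For comparison, core string functions in released PHP count bytes, and Unicode-aware behavior has to be requested through the mbstring extension. This sketch assumes mbstring is loaded and shows the difference that native Unicode support is meant to remove.

```php
<?php
// Requires the mbstring extension.
$text = "na\u{00EF}ve"; // "naïve" in UTF-8: 5 characters stored in 6 bytes

echo strlen($text), "\n";                   // 6: counts bytes
echo mb_strlen($text, 'UTF-8'), "\n";       // 5: counts code points
echo mb_strtoupper($text, 'UTF-8'), "\n";   // NAÏVE
echo mb_substr($text, 0, 3, 'UTF-8'), "\n"; // naï
```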

Streams have built-in support for converting between Unicode strings and other encodings on the fly, for example when reading a text file with fopen() and fread().
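
Released PHP can approximate on-the-fly conversion with a stream filter. The sketch below assumes the iconv extension is available, since it provides the convert.iconv.* filters. The helper name readAsUtf8 is hypothetical, and this is not the PHP 6 stream API described in the talk.

```php
<?php
// Requires the iconv extension, which provides the convert.iconv.* filters.
// Hypothetical helper: read a file stored in $fromEncoding and return UTF-8.
function readAsUtf8(string $path, string $fromEncoding): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Convert bytes to UTF-8 as they are read from the stream.
    stream_filter_append($handle, "convert.iconv.$fromEncoding/UTF-8", STREAM_FILTER_READ);
    $contents = stream_get_contents($handle);
    fclose($handle);
    return $contents;
}

// Usage: store "café" as ISO-8859-1 (é is the single byte 0xE9), then read it back.
$path = tempnam(sys_get_temp_dir(), 'enc');
file_put_contents($path, "caf\xE9");
$utf8 = readAsUtf8($path, 'ISO-8859-1');
unlink($path);

echo $utf8, "\n";
```

The calling code never handles the legacy bytes directly. The conversion happens inside the stream, which is the convenience the article describes.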

In version 6, PHP will be much easier to use across different languages. For instance, this is how simple it is to grab a Chinese news feed, parse the first five stories, clean it up, and convert it to JSON.
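
What follows is not the code from the talk but a minimal sketch of the same idea in released PHP, using SimpleXML and json_encode. The helper name feedToJson and the sample feed are invented, and the feed is built inline so the sketch runs without network access.

```php
<?php
// Hypothetical helper: turn the first $limit <item> elements of an RSS 2.0
// document into a JSON array of {title, link} objects.
function feedToJson(string $rssXml, int $limit = 5): string
{
    $feed = simplexml_load_string($rssXml);
    if ($feed === false) {
        throw new InvalidArgumentException('Feed is not well-formed XML');
    }

    $stories = [];
    foreach ($feed->channel->item as $item) {
        if (count($stories) >= $limit) {
            break;
        }
        $stories[] = [
            // strip_tags and trim are a simple stand-in for "cleaning up".
            'title' => trim(strip_tags((string) $item->title)),
            'link'  => trim((string) $item->link),
        ];
    }

    // JSON_UNESCAPED_UNICODE keeps Chinese text readable instead of \uXXXX escapes.
    return json_encode($stories, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
}

// Inline sample feed with seven stories; only the first five should come back.
$items = '';
for ($i = 1; $i <= 7; $i++) {
    $items .= "<item><title> 新闻标题 {$i} </title><link>http://example.com/{$i}</link></item>";
}
$rss = '<rss version="2.0"><channel>' . $items . '</channel></rss>';

echo feedToJson($rss), "\n";
```

For a live feed, the XML string would come from something like file_get_contents() on the feed's address instead of the inline sample.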

Then you can easily display it on the web in any format you want.

TextIterator is a new feature of PHP 6: you can iterate over code points, characters, graphemes, words, lines, and sentences, either forwards or backwards. It makes truncating text much easier.
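
TextIterator itself is a PHP 6 feature, so the sketch below substitutes the closest tools in released PHP's intl extension: the grapheme_* functions and IntlBreakIterator. The helper name truncateGraphemes is hypothetical. It shows why truncating by graphemes is safer than truncating by bytes or code points.

```php
<?php
// Requires the intl extension. truncateGraphemes is a hypothetical helper.
function truncateGraphemes(string $text, int $max): string
{
    // grapheme_substr counts user-perceived characters, so a base letter is
    // never separated from its combining marks.
    return grapheme_strlen($text) > $max ? grapheme_substr($text, 0, $max) : $text;
}

$text = "re\u{0301}sume\u{0301} tips"; // "résumé tips" written with combining accents

echo truncateGraphemes($text, 6), "\n"; // résumé (6 graphemes, 8 code points)

// Walk the same text segment by segment with a word break iterator.
$words = IntlBreakIterator::createWordInstance('en_US');
$words->setText($text);
foreach ($words->getPartsIterator() as $part) {
    echo '[', $part, ']';
}
echo "\n";
```

Cutting the same string after six code points would drop the final accent and leave "résume", which is the kind of bug grapheme-aware iteration avoids.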

Transliteration allows you to take names written in Japanese and convert them into Latin script so you can pronounce them. It only takes two lines of code in PHP 6:
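
The two-line PHP 6 version from the talk is not available. As a rough equivalent, released PHP exposes ICU transliteration through the Transliterator class in the intl extension. The helper name toLatin and the sample name are invented for illustration.

```php
<?php
// Requires the intl extension. toLatin is a hypothetical helper name.
function toLatin(string $text): string
{
    // "Any-Latin" converts other scripts to Latin letters; "Latin-ASCII" then
    // removes diacritics such as macrons so the result is plain ASCII.
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        throw new RuntimeException('Transliterator rules not available');
    }
    return $transliterator->transliterate($text);
}

echo toLatin('やまだ たろう'), "\n"; // a name written in hiragana
```

One caveat with this approach: kanji pass through ICU's generic Han rules, which produce Mandarin readings rather than Japanese ones, so names written in kana transliterate more faithfully than names written in kanji.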

pecl/intl will be bundled in PHP 5.3. It's complementary to the Unicode support.

Other cool features of PHP 6 include number collation, number formatting, message formatting, bundled APC, closures, traits, a 64-bit integer type, a new MySQL driver, and general cleanup.
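
As an illustration of message formatting, here is a short sketch using the MessageFormatter class from the intl extension in released PHP. The pattern and the helper name describeFiles are made up for the example. It shows locale-aware plurals and number grouping handled by one pattern.

```php
<?php
// Requires the intl extension. The pattern and helper name are illustrative.
function describeFiles(string $locale, int $count): string
{
    // "#" is replaced by the count, formatted for the locale.
    $pattern = '{0, plural, one {# file} other {# files}}';
    return MessageFormatter::formatMessage($locale, $pattern, [$count]);
}

echo describeFiles('en_US', 1), "\n";    // 1 file
echo describeFiles('en_US', 1234), "\n"; // 1,234 files
```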

Zmievski says he hopes the new version will be ready by March 2009. Here is a link to the full presentation.
