Monday, October 31, 2011

PHP UTF-8 character to decimal converter

A while ago I was developing a connector to an SMS sending API. Because SMS messages are partially transmitted in a limited 7-bit character set (http://en.wikipedia.org/wiki/GSM_03.38), not all characters are supported. To validate each and every character in a message, it is useful, at least for debugging purposes, to convert the characters to their decimal code points and filter them by looking them up in an array.

Correct me if I am wrong, but in my opinion PHP does not really support development with UTF-8: its core string functions work on bytes, not on characters. This is why my function inspects the bytes at a low level.
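To make the byte-level view concrete, here is a minimal sketch that decodes a single character by hand. The variable names `$euro`, `$b` and `$codePoint` are only illustrative, and the euro sign is written as escaped bytes so the example does not depend on how the file is saved.

```php
<?php
// "€" (U+20AC) written as its three UTF-8 bytes.
$euro = "\xE2\x82\xAC";

// strlen() counts bytes, not characters.
echo strlen($euro), "\n";        // 3

// unpack('C*') exposes the raw byte values.
$b = array_values(unpack('C*', $euro));
echo implode(' ', $b), "\n";     // 226 130 172

// Three-byte sequence: 4 payload bits from the lead byte,
// 6 payload bits from each of the two continuation bytes.
$codePoint = (($b[0] & 0x0F) << 12)
           | (($b[1] & 0x3F) << 6)
           |  ($b[2] & 0x3F);
echo $codePoint, "\n";           // 8364
```

The same masking and shifting, with different masks for two-byte and four-byte sequences, is what a byte-level decoder has to do for every character.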

Simply change the variable $string in the script below to the text you want to convert; the output lists value and character row by row.

<html xmlns="http://www.w3.org/1999/xhtml" 
   xml:lang="en-us" lang="en-us" dir="ltr" >
<head>
   <meta http-equiv="content-type" content="text/html; 
      charset=UTF-8" />
</head>
<body>
<?php
$string = "|^€{}[~]\\";
$count = 0;

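// Advance by the number of bytes consumed; step one byte on invalid input
// so the loop can never stall.
for ($i = 0; $i < strlen($string); $i += max(1, $count))
{
    $code = ordUTF8($string, $i, $count);
    // Print the whole multi-byte character, not just its first byte.
    $char = substr($string, $i, max(1, $count));
    echo ($code === false ? "invalid" : $code)." ".htmlspecialchars($char)."<br />";
}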

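// Returns the code point that starts at byte $index, or false on invalid input.
// $bytes receives the length of the sequence in bytes (0 on failure).
function ordUTF8($string, $index = 0, &$bytes = null)
{
    $len = strlen($string);
    $bytes = 0;

    if ($index >= $len)
    {
        return false;
    }

    // The lead byte determines the length of the sequence.
    $h = ord($string[$index]);

    if ($h <= 0x7F)
    {
        // Single byte: plain ASCII.
        $bytes = 1;
        return $h;
    }
    else if ($h < 0xC2)
    {
        // Continuation byte or overlong lead byte: not a valid start.
        return false;
    }
    else if ($h <= 0xDF && $index < $len - 1)
    {
        // Two bytes: 5 bits from the lead byte, 6 from the continuation byte.
        $bytes = 2;
        return ($h & 0x1F) << 6
            | (ord($string[$index + 1]) & 0x3F);
    }
    else if ($h <= 0xEF && $index < $len - 2)
    {
        // Three bytes: 4 bits from the lead byte, 6 from each continuation byte.
        $bytes = 3;
        return ($h & 0x0F) << 12
            | (ord($string[$index + 1]) & 0x3F) << 6
            | (ord($string[$index + 2]) & 0x3F);
    }
    else if ($h <= 0xF4 && $index < $len - 3)
    {
        // Four bytes: 3 bits from the lead byte, 6 from each continuation byte.
        $bytes = 4;
        return ($h & 0x07) << 18
            | (ord($string[$index + 1]) & 0x3F) << 12
            | (ord($string[$index + 2]) & 0x3F) << 6
            | (ord($string[$index + 3]) & 0x3F);
    }
    else
    {
        // Lead byte above 0xF4 or a truncated sequence.
        return false;
    }
}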
?>
</body>
</html>
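
The lookup step mentioned at the top of this post, filtering the decimal values against an array, could be sketched as follows. This is only an illustration under assumptions: the table `$gsmExtension` and the helpers `codePoint()` and `filterByTable()` are hypothetical names, not part of the connector. The table holds just the printable characters of the GSM 03.38 extension table (the same ones used in $string above), each of which costs two septets in an SMS.

```php
<?php
// Hypothetical lookup table: decimal code point => description.
$gsmExtension = array(
    91   => 'left square bracket',
    92   => 'backslash',
    93   => 'right square bracket',
    94   => 'circumflex',
    123  => 'left curly bracket',
    124  => 'vertical bar',
    125  => 'right curly bracket',
    126  => 'tilde',
    8364 => 'euro sign',
);

// Decode one already-isolated UTF-8 character to its decimal code point.
function codePoint($char)
{
    $bytes = array_values(unpack('C*', $char));
    $n = count($bytes);
    if ($n === 1)
    {
        return $bytes[0];
    }
    // Strip the length marker from the lead byte,
    // then append 6 bits per continuation byte.
    $cp = $bytes[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++)
    {
        $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
    }
    return $cp;
}

// Return the code points of $text that appear as keys in $table,
// or false if $text is not valid UTF-8.
function filterByTable($text, array $table)
{
    // The /u modifier makes PCRE split on UTF-8 characters instead of bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false)
    {
        return false;
    }
    $hits = array();
    foreach ($chars as $char)
    {
        $cp = codePoint($char);
        if (isset($table[$cp]))
        {
            $hits[] = $cp;
        }
    }
    return $hits;
}

// "a", the euro sign as escaped bytes, "[", "b": lists 8364 and 91.
print_r(filterByTable("a\xE2\x82\xAC[b", $gsmExtension));
```

Keying the table by decimal value keeps the check a single isset() call per character. A real validator would also need a table for the basic GSM character set, which is left out here.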

Sunday, October 30, 2011

Class generator out of JSON definitions

This is the first tool I would like to offer. I started to write a Plasma widget (Plasmoid) that shows my Google+ streams on my desktop. Unfortunately, Google does not offer an API for C++. It would be possible to write Plasmoids in other languages such as Python, but because my knowledge of Python is very poor I decided to use C++. The data from Google+ is provided as JSON objects, so I needed a compiler that generates classes directly from the Google+ definitions. For example, the definition for a comment looks like this:

{
  "kind": "plus#comment",
  "id": string,
  "published": datetime,
  "updated": datetime,
  "actor": {
    "id": string,
    "displayName": string,
    "url": string,
    "image": {
      "url": string
    }
  },
  "verb": "post",
  "object": {
    "objectType": "comment",
    "content": string
  },
  "selfLink": string,
  "inReplyTo": [
    {
      "id": string,
      "url": string
    }
  ]
}

The tool can now create classes from this definition and, of course, from all other definitions of the same form. It can generate classes for Qt and Java, and it should be easy to extend it to other languages without much effort.
The tool uses JavaCC to parse the JSON definition.

If the grammar rules have to be changed, I recommend the Eclipse plugin for JavaCC, which can be found at http://eclipse-javacc.sourceforge.net/. If you only need to add another language, there is no need for JavaCC; all changes can be made directly in the Java sources.

To use the tool, download the jar file from http://141.3.80.191/~mehrwald/JSONClassGenerator.jar
On the command line, type:
java -jar JSONClassGenerator.jar <FileToParse> <ObjectName> <PathToSave> (Qt|Java) [<FileToParse> <ObjectName> <PathToSave> (Qt|Java)...]

If you need to change something or add another language, the sources are available at http://141.3.80.191/~mehrwald/JSONClassGenerator_src.zip

I would appreciate all sorts of comments on this tool. If you have questions, do not hesitate to ask me.

What is this blog for?

In this blog we will provide tools and code snippets which we created to make our lives as programmers easier. There may be tools from others that fulfil the same tasks as ours, but we did not find them right away or they were much too complex for the work we had to do.

As the name already states, the tools are hacked. We needed them fast, and the code is quick and dirty. In most cases it should be possible for someone who can read source code to use and/or modify them. We did not want to write perfect code for these tools. We know there might be a better way to do something in the code, but the quick and dirty approach was good enough for us.

Unless otherwise noted, we want our code to be GPLv3. We will not provide licence information inside every piece of code, but we hope that everyone is honest enough to follow the rules of GPLv3. If you want something for commercial use, please feel free to contact us directly.

The blog is now open! Have fun with our tools if they are helpful to you; we hope the information we provide helps you reach the goals of your own projects faster.