Blog of V.Ganesh: August 2011

Saturday, August 27, 2011

Upgrade to MeTA Studio : full jython integration, first steps of jchempaint integration

A new upgrade to MeTA Studio is now available for download. The current version is 2.0.27082011 and the update is available from the usual place: http://code.google.com/p/metastudio/

Changes in this version:
- Updated smackx libraries, GTalk integration works again!
- Fully supported Jython integration into the IDE

- Initial implementation of JChempaint integration, more improvements on the way

- Other improvements: surface normals, improved GTalk widget, bug fixes and improvements in federation framework, issues with .mar handling fixed, new APIs for managing workflows (no UI integration yet).
- I also welcome javadba, who recently joined the project and plans to contribute unit tests for the parallel framework in MeTA Studio and, in the process, improve it.

With this release, MeTA Studio SVN has 500+ commits (http://code.google.com/p/metastudio/updates/list). Though development on MeTA Studio started quite some time back (2003-04 for version 2.0.x), it was only made open source in 2009. Just interesting to look back ;)

Enjoy!

(PS: as you might notice from SVN logs, I unsuccessfully tried porting the project to JDK 7, which was recently released. I am almost sure that the problem is with a bug in the current JDK release. Will be investigating and may eventually migrate to the new JDK release once it becomes stable enough.)

Tuesday, August 16, 2011

Hello Kinect : Speech recognition

If you have ever used the speech recognition built into recent versions of Windows, you will know all the troubles it has. It is good, but not good enough to make you want to use it on an everyday basis. It is good when it works, and painful at other times. Handwriting recognition or merely typing is much less annoying when you want to get real work done. Most of the trouble comes when you are in a room where other people are also talking. The engine gets confused.
With Kinect, however, things are better, as the speech recognizer also uses additional input from the multi-array microphone to estimate the direction of the sound source it should recognize. This makes speech recognition quite interesting.
So what do we need to identify speech using the Kinect SDK?
First of all, you need to add Microsoft.Speech.dll to your project, and then make the following two imports:

using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

All of the speech recognition work is then essentially handled by two classes: KinectAudioSource and SpeechRecognitionEngine. The latter is part of the Microsoft speech API and provides a generalized framework for speech recognition.

private KinectAudioSource kinectSource;
private SpeechRecognitionEngine sre;

RecognizerInfo ri = SpeechRecognitionEngine.InstalledRecognizers().Where(
r => r.Id == RecognizerId).FirstOrDefault();

if (ri == null) return;
sre = new SpeechRecognitionEngine(ri.Id);
var helloChoice = new Choices();
helloChoice.Add("hello");
helloChoice.Add("kinect");
var gb = new GrammarBuilder();
gb.Append(helloChoice);
var g = new Grammar(gb);
sre.LoadGrammar(g);
sre.SpeechRecognized += sre_SpeechRecognized;
sre.SpeechHypothesized += sre_SpeechHypothesized;
sre.SpeechRecognitionRejected += new EventHandler<SpeechRecognitionRejectedEventArgs>(sre_SpeechRecognitionRejected);
var t = new Thread(StartKinectAudioStream);
t.Start();

For the speech recognizer to work correctly, you need to provide the words that need to be identified. This is done by constructing a 'grammar' for them. In the above code we construct a simple grammar that recognizes only two words, 'hello' and 'kinect'. Next we register event handlers, which are the 'callbacks' invoked when the SpeechRecognitionEngine recognizes (or fails to recognize) something that is spoken.
After this we open the Kinect's audio stream and start listening to it in a different thread.

The body of StartKinectAudioStream() function is as follows:

kinectSource = new KinectAudioSource();
kinectSource.SystemMode = SystemMode.OptibeamArrayOnly;
kinectSource.FeatureMode = true;
kinectSource.AutomaticGainControl = false;
kinectSource.MicArrayMode = MicArrayMode.MicArrayAdaptiveBeam;
var kinectStream = kinectSource.Start();
sre.SetInputToAudioStream(kinectStream, new SpeechAudioFormatInfo(
                                               EncodingFormat.Pcm, 16000, 16, 1,
                                               32000, 2, null));
sre.RecognizeAsync(RecognizeMode.Multiple);

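The code above basically sets up the microphone array for adaptive beamforming, so that Kinect steers a listening beam toward whoever is speaking and the recognizer receives a cleaner 16 kHz, 16-bit mono audio stream.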
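The last two numbers passed to SpeechAudioFormatInfo (32000 and 2) are not independent settings: for uncompressed PCM they follow from the sample rate, bit depth and channel count. The sketch below shows the arithmetic; the class and method names (PcmFormat, BlockAlign, AverageBytesPerSecond) are mine, purely for illustration, and are not part of the Kinect SDK or the speech API.

```csharp
using System;

// Hypothetical helper, not part of any SDK: derives the PCM byte-rate
// figures that the audio format description expects.
public static class PcmFormat
{
    // Bytes in one sample frame (one sample for every channel).
    public static int BlockAlign(int bitsPerSample, int channels)
    {
        return channels * bitsPerSample / 8;
    }

    // Bytes of audio produced per second of uncompressed PCM.
    public static int AverageBytesPerSecond(int samplesPerSecond,
                                            int bitsPerSample,
                                            int channels)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channels);
    }
}
```

For the 16000 samples per second, 16-bit, single-channel stream used above, BlockAlign(16, 1) gives 2 and AverageBytesPerSecond(16000, 16, 1) gives 32000, matching the values in the call.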

Finally, the signatures of the event handlers for the speech recognizer are as follows:

void sre_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
   Console.Write("\rSpeech Recognized: \t{0}", e.Result.Text);
   lastRecognizedWord = e.Result.Text;
}

Here is a short video:

More to come soon :)

Monday, August 15, 2011

Google + Motorola = Googarola !

Here we go, Googarola (http://googleblog.blogspot.com/2011/08/supercharging-android-google-to-acquire.html)! Google has announced that it will be acquiring Motorola Mobility, the unit of Motorola that builds the Droids. Essentially, Google has stopped being a software-only company by buying a major Android handset manufacturer. This is quite a smart move. From the official blog post, the motive for this buy is quite apparent: patents. Lately, Android handset manufacturers have been hit by patents from all sides, some of which may be legit, but overall they simply seem to serve as a way to scare companies away from using Android altogether. With the Motorola acquisition, Google gets its hands on a large pool of mobile patents, and at the same time also weakens its complaint against Apple+Microsoft et al. to the US DoJ over the recent Nortel patents buyout.
Personally, I feel this is a good move from the customers' point of view. Hopefully, the mobile space has room for at least about 5 different sustainable ecosystems, as against only 2 players in the case of PCs. Google can now have more control over the hardware and at the same time will have less to worry about from patent litigation. Also, I think this move is good for other Android handset manufacturers, as Google can now be more helpful to them when it comes to patent litigation. What actually worries me is whether the culture of an 80-year-old company will gel well with a still-young Google.
In view of this acquisition, one more thing seems to be clear: the reason why Nokia didn't go with Android. I guess, with the Motorola acquisition in mind, Google obviously couldn't think of giving Nokia enough room for customization of Android.

Things in the mobile space are getting very interesting. Maybe another major consolidation is underway... like the 'unlikely' acquisition of Nokia by Microsoft? Although I am highly skeptical of such an event.

Saturday, August 06, 2011

Hello Kinect!

So, I finally gave up resisting this cool new gadget and ordered one for myself from http://www.flipkart.com/. The unit arrived last week, but I had to spend the weekend rearranging my living room so that I could get enough space to use my Kinect. Plugging it into my PC and using the drivers provided with the Microsoft Kinect SDK was straightforward. I did initially face a problem with the driver not installing properly, but quickly figured out that this was because the Kinect was not plugged into a root USB port.

My intention of getting Kinect was not to play games, but to play with programming it. Although I would happily take an XBox, if you gift me one ;-)
This post and subsequent posts on Kinect on this blog will describe my experience of programming the Kinect. As a first step, I ensured that the samples provided with the SDK work well. Next, I installed Visual Studio Express 2010 (available for free here: http://www.microsoft.com/express).
I chose C# (C-Sharp, maybe they should call it C-Dumb :P) as my programming language for Kinect. Jokes apart, I have very little experience with C#, having mostly used Java or C++ (I stopped using Fortran on a day-to-day basis 2 years ago!). Anyhow, I found C# to be quite a neatly designed language and easy to learn, particularly if you come from a Java or C++ background. If you come from C++, you are sure to enjoy some of the freshness that Java brought to object-oriented programming.
So, assuming you have some idea of how to program in C#, writing a 'Hello Kinect' is relatively easy. First ensure that you have added Microsoft.Research.Kinect.dll as an external dependency to the Visual Studio project you create.

Next is to import the Kinect APIs:
using Microsoft.Research.Kinect.Nui;
All the initialization of Kinect NUI (Natural User Interface) is handled using the Runtime class.
Runtime nui = new Runtime();
nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex |
               RuntimeOptions.UseSkeletalTracking |
               RuntimeOptions.UseColor);

Next we open the Video and Depth streams of Kinect:
nui.VideoStream.Open(ImageStreamType.Video, 2,
                     ImageResolution.Resolution640x480,
                     ImageType.Color);
nui.DepthStream.Open(ImageStreamType.Depth, 2,
                     ImageResolution.Resolution320x240,
                     ImageType.DepthAndPlayerIndex);
Note that the current Kinect hardware only supports VGA resolution (max) for the video stream.
When a data frame is available for processing on Kinect, the driver sends a notification to the application. In C# this is handled by registering an appropriate event handler as follows:
nui.DepthFrameReady +=
new EventHandler<ImageFrameReadyEventArgs>(nui_DepthFrameReady);
nui.SkeletonFrameReady +=
new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);

The signatures of the event handlers look as below:
void nui_DepthFrameReady(object sender, ImageFrameReadyEventArgs e)
void nui_SkeletonFrameReady(object sender,
                            SkeletonFrameReadyEventArgs e)

Now, I have managed to see how to get the Skeletal data easily:
SkeletonFrame skeletonFrame = e.SkeletonFrame;
foreach (SkeletonData data in skeletonFrame.Skeletons)
{
       foreach (Joint joint in data.Joints)
       {
              // transform and plot joint.Position
       }
}
In the end uninitialize:
nui.Uninitialize();
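The 'transform and plot joint.Position' step in the loop above is where a 3D joint position becomes a 2D dot on screen. One simple way to do this is a pinhole projection, roughly sketched below. This is my own sketch and not how the SDK does it: the JointProjection class, the ToScreen method and the axis convention (x to the right, y up, z away from the sensor, all in metres) are assumptions, and the 57 by 43 degree field of view is a nominal figure. If the image comes out mirrored, flip the sign of x.

```csharp
using System;

// Hypothetical helper, not part of the Kinect SDK: maps a 3D point in
// front of the sensor to pixel coordinates on a width x height canvas.
public static class JointProjection
{
    // Nominal field of view of the sensor in degrees (assumed values).
    private const double HorizontalFovDegrees = 57.0;
    private const double VerticalFovDegrees = 43.0;

    public static void ToScreen(double x, double y, double z,
                                int width, int height,
                                out int px, out int py)
    {
        if (z <= 0)
            throw new ArgumentOutOfRangeException("z", "point must be in front of the sensor");

        // Tangent of half the field of view gives the half-extent of the
        // visible area at unit distance from the sensor.
        double halfW = Math.Tan(HorizontalFovDegrees * Math.PI / 360.0);
        double halfH = Math.Tan(VerticalFovDegrees * Math.PI / 360.0);

        // Normalised coordinates: -1..1 across the visible area.
        double nx = (x / z) / halfW;
        double ny = (y / z) / halfH;

        // Screen y grows downward, so the vertical axis is flipped.
        // Points outside the field of view land outside the canvas.
        px = (int)Math.Round((nx + 1.0) * 0.5 * width);
        py = (int)Math.Round((1.0 - ny) * 0.5 * height);
    }
}
```

Inside the inner loop one would call something like JointProjection.ToScreen(joint.Position.X, joint.Position.Y, joint.Position.Z, 640, 480, out px, out py) and draw a dot at (px, py). A point straight ahead of the sensor lands at the centre of the canvas.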

So here is a video of my ‘dot avatar’ in action :)

My ‘dot avatar’; the other ‘dot avatar’ is my dad in the background. The Kinect SDK for Windows 7 at the moment allows tracking of only two people.

Next up, I need to figure out how exactly to use depth data as well as handle audio. Hopefully a post for next weekend :)

Have fun!