Thursday, July 26, 2012

Reading UTF-8 Characters From An Infinite Byte Stream

I’ve been playing with the Twitter streaming API today. In very simple terms, you make an HTTP request and then sit on the response stream reading objects off it. The stream is UTF-8 encoded text, and each object is a JSON-encoded data structure terminated by \r\n. Simple, I thought: I’ll just create a StreamReader and set up a while loop on its Read method. Here’s my first attempt …

using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    var messageBuilder = new StringBuilder();
    var nextChar = 'x';
    while (reader.Peek() >= 0)
    {
        nextChar = (char)reader.Read();
        messageBuilder.Append(nextChar);

        if (nextChar == '\r')
        {
            ProcessBuffer(messageBuilder.ToString());
            messageBuilder.Clear();
        }
    }
}

Unfortunately it didn’t work. The StreamReader maintains a small internal buffer so I wouldn’t see the \r\n combination that marked the end of a new tweet until the next tweet came along and flushed the buffer.

OK, so let’s just read each byte from the stream and decode them one by one as UTF-8 characters. This works fine when your tweets are all in English, but UTF-8 encodes many characters as more than one byte; any Japanese tweets I tried to read failed.
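To see why this breaks, here is a minimal sketch of the naive approach. The class and method names are mine and purely illustrative. It decodes every byte in isolation with Encoding.UTF8.GetString: ASCII text survives, but each byte of a multi-byte character is invalid on its own, so the decoder substitutes a replacement character for it.

```csharp
using System;
using System.Text;

// Illustrative only: NaiveDecoding and DecodeByteByByte are hypothetical names.
public static class NaiveDecoding
{
    // Decodes every byte in isolation, which is the broken approach.
    public static string DecodeByteByByte(byte[] bytes)
    {
        var builder = new StringBuilder();
        foreach (var b in bytes)
        {
            // A lone lead or continuation byte is not valid UTF-8,
            // so the decoder substitutes a replacement character for it.
            builder.Append(Encoding.UTF8.GetString(new[] { b }));
        }
        return builder.ToString();
    }

    public static void Main()
    {
        var english = Encoding.UTF8.GetBytes("hi");
        var japanese = Encoding.UTF8.GetBytes("日"); // three bytes: E6 97 A5

        Console.WriteLine(DecodeByteByByte(english) == "hi");  // True
        Console.WriteLine(DecodeByteByByte(japanese) == "日"); // False
    }
}
```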

Thanks to ‘Richard’ on Stack Overflow, the answer turned out to be the Decoder class. It buffers the bytes of incomplete UTF-8 characters, allowing you to keep stacking up bytes until they are complete. Here’s a revised example that works great with Japanese tweets:

int byteAsInt;
var messageBuilder = new StringBuilder();
var decoder = Encoding.UTF8.GetDecoder();
var oneByte = new byte[1];
// two chars, because a 4-byte UTF-8 sequence decodes to a surrogate pair
var nextChars = new char[2];

while ((byteAsInt = stream.ReadByte()) != -1)
{
    oneByte[0] = (byte)byteAsInt;
    // returns 0 until the decoder has seen a complete character
    var charCount = decoder.GetChars(oneByte, 0, 1, nextChars, 0);
    if (charCount == 0) continue;

    Console.Write(nextChars, 0, charCount);
    messageBuilder.Append(nextChars, 0, charCount);

    if (nextChars[0] == '\r')
    {
        ProcessBuffer(messageBuilder.ToString());
        messageBuilder.Clear();
    }
}
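If you want to try the Decoder approach without a live Twitter connection, the following is a self-contained sketch that runs the same kind of loop over a MemoryStream. StreamSplitter and ReadMessages are names I made up for illustration, and unlike the loop above it waits for the full \r\n pair and strips it before handing back each message.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Illustrative only: StreamSplitter and ReadMessages are hypothetical names.
public static class StreamSplitter
{
    // Reads bytes one at a time and yields each message terminated by \r\n.
    public static IEnumerable<string> ReadMessages(Stream stream)
    {
        var decoder = Encoding.UTF8.GetDecoder();
        var oneByte = new byte[1];
        var chars = new char[2]; // room for a surrogate pair
        var builder = new StringBuilder();
        int byteAsInt;

        while ((byteAsInt = stream.ReadByte()) != -1)
        {
            oneByte[0] = (byte)byteAsInt;
            // Returns 0 until the decoder has a complete character.
            var charCount = decoder.GetChars(oneByte, 0, 1, chars, 0);
            if (charCount == 0) continue;
            builder.Append(chars, 0, charCount);

            var length = builder.Length;
            if (length >= 2 && builder[length - 2] == '\r' && builder[length - 1] == '\n')
            {
                // Hand back the message without its \r\n terminator.
                yield return builder.ToString(0, length - 2);
                builder.Clear();
            }
        }
    }

    public static void Main()
    {
        var bytes = Encoding.UTF8.GetBytes(
            "{\"text\":\"こんにちは\"}\r\n{\"text\":\"hello\"}\r\n");
        using (var stream = new MemoryStream(bytes))
        {
            foreach (var message in ReadMessages(stream))
            {
                Console.WriteLine(message);
            }
        }
    }
}
```

Because the decoder keeps its state between calls, the same instance has to be reused for the whole stream; creating a new one per byte would bring back the original problem.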

Thursday, July 12, 2012

Tracing System.Net to debug HTTP Clients

If you are writing software that leverages the System.Net.WebRequest class, you’re probably familiar with tools like Fiddler or Wireshark. You can use these tools to see the actual HTTP requests and responses going back and forth between your client and the server. A nice alternative to these tools, that I only recently discovered, is the System.Net trace source. The System.Net source emits logging messages from the HttpWebRequest and HttpWebResponse classes that give a very similar experience to using Fiddler.

Here’s an example App.config file that configures the System.Net trace source to output both to the console and to a log file:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>

  <system.diagnostics>

    <trace autoflush="true" />

    <sources>
      <source name="System.Net" maxdatasize="1024">
        <listeners>
          <add name="MyTraceFile"/>
          <add name="MyConsole"/>
        </listeners>
      </source>
    </sources>

    <sharedListeners>
      <add
        name="MyTraceFile"
        type="System.Diagnostics.TextWriterTraceListener"
        initializeData="System.Net.trace.log"
        />
      <add name="MyConsole" type="System.Diagnostics.ConsoleTraceListener" />
    </sharedListeners>

    <switches>
      <add name="System.Net" value="Information" />
      <!-- <add name="System.Net" value="Verbose" />-->
    </switches>

  </system.diagnostics>

</configuration>

Here I’ve set up two listeners: ‘MyTraceFile’, which outputs the trace information to a log file, and ‘MyConsole’, which outputs to the console.

My favourite test tool is TestDriven.NET which allows you to run arbitrary methods and sends the output to the Visual Studio output console. Being able to run a test method (I’ve got F8 mapped to run tests, so it’s a single keystroke) and see the System.Net trace output immediately in Visual Studio is very cool.

Here’s some code which makes a GET request to www.google.com:

var request = WebRequest.CreateDefault(new Uri("http://www.google.com/"));
request.Method = "GET";

// dispose the response so that the underlying connection is released
using (var response = (HttpWebResponse)request.GetResponse())
using (var responseStream = response.GetResponseStream())
{
    if (responseStream == null)
    {
        Console.Out.WriteLine("response stream is null");
        return;
    }

    using (var reader = new StreamReader(responseStream))
    {
        // do something with the response body
        var responseBody = reader.ReadToEnd();
    }
}

When I run this code, I get the following trace output …

System.Net Information: 0 : [5752] Current OS installation type is 'Client'.
System.Net Information: 0 : [5752] RAS supported: True
System.Net Error: 0 : [5752] Can't retrieve proxy settings for Uri 'http://www.google.com/'. Error code: 12180.
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with ServicePoint#53977989
System.Net Information: 0 : [5752] Associating Connection#56846532 with HttpWebRequest#49685557
System.Net Information: 0 : [5752] Connection#56846532 - Created connection from 192.168.1.146:53202 to 173.194.67.99:80.
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with ConnectStream#19026863
System.Net Information: 0 : [5752] HttpWebRequest#49685557 - Request: GET / HTTP/1.1

System.Net Information: 0 : [5752] ConnectStream#19026863 - Sending headers
{
Host: www.google.com
Connection: Keep-Alive
}.
System.Net Information: 0 : [5752] Connection#56846532 - Received status line: Version=1.1, StatusCode=302, StatusDescription=Found.
System.Net Information: 0 : [5752] Connection#56846532 - Received headers
{
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Location: http://www.google.co.uk/
Set-Cookie: expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com,path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com,domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com,expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com,path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com,domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com,expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com,path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com,domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com,expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com,path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com,domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google...}.
System.Net Information: 0 : [5752] ConnectStream#10789400::ConnectStream(Buffered 221 bytes.)
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with ConnectStream#10789400
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with HttpWebResponse#11016073
System.Net Warning: 0 : [5752] HttpWebRequest#49685557::() - Error code 302 was received from server response.
System.Net Warning: 0 : [5752] HttpWebRequest#49685557::() - Resubmitting request.
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with ServicePoint#23936385
System.Net Information: 0 : [5752] Associating Connection#22196665 with HttpWebRequest#49685557
System.Net Information: 0 : [5752] Connection#22196665 - Created connection from 192.168.1.146:53203 to 173.194.67.94:80.
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with ConnectStream#57250404
System.Net Information: 0 : [5752] HttpWebRequest#49685557 - Request: GET / HTTP/1.1

System.Net Information: 0 : [5752] ConnectStream#57250404 - Sending headers
{
Host: www.google.co.uk
Connection: Keep-Alive
}.
System.Net Information: 0 : [5752] Connection#22196665 - Received status line: Version=1.1, StatusCode=200, StatusDescription=OK.
System.Net Information: 0 : [5752] Connection#22196665 - Received headers
{
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Date: Thu, 12 Jul 2012 13:39:26 GMT
Expires: -1
Set-Cookie: NID=61=CTUlcAyhXQp63NVCOkXYWVgi2nMQiOUpyG-x1yRlw-Unhq3OyQ5zXCIxIJ9ctSN_qg6Lni90142sYKQzDZ7oZXBZxnWQbzhcjqVcKQEgCfBgMAjxhDgVLOfgXBR6IzTm; expires=Fri, 11-Jan-2013 13:39:26 GMT; path=/; domain=.google.co.uk; HttpOnly,PREF=ID=b7c02536ab59a395:FF=0:TM=1342100366:LM=1342100366:S=gqGT-3tWl96NIpdz; expires=Sat, 12-Jul-2014 13:39:26 GMT; path=/; domain=.google.co.uk,NID=61=CTUlcAyhXQp63NVCOkXYWVgi2nMQiOUpyG-x1yRlw-Unhq3OyQ5zXCIxIJ9ctSN_qg6Lni90142sYKQzDZ7oZXBZxnWQbzhcjqVcKQEgCfBgMAjxhDgVLOfgXBR6IzTm; expires=Fri, 11-Jan-2013 13:39:26 GMT; path=/; domain=.google.co.uk; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
}.
System.Net Information: 0 : [5752] ConnectStream#42047594::ConnectStream(Buffered -1 bytes.)
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with ConnectStream#42047594
System.Net Information: 0 : [5752] Associating HttpWebRequest#49685557 with HttpWebResponse#47902635
System.Net Information: 0 : [5752] ContentLength=-1

You can see that Google users in the UK get a 302 redirect to www.google.co.uk. This is a special Google that refuses to say anything bad about Her Majesty and stops working to drink tea at exactly 11 am.

With the reporting level set at ‘Information’, you can see all the HTTP header information and some of the work that the underlying sockets are doing. Setting the level to ‘Verbose’ will give you the HTTP bodies as well.

For more information, see the MSDN documentation on network tracing.

Happy HTTPing!

Tuesday, July 10, 2012

Playing with Twitter

My excellent clients, 15Below, are interested in exploring how social media, like Twitter and Facebook, can work in their business area, travel communications. I’ve been tasked with investigating the technical side of this, in particular interacting with the Twitter API. Before I share what little I’ve learnt about talking to Twitter with .NET, I’d like to throw a stone in the lake and ripple out some thoughts on life on planet Twitter.

The first conclusion I’ve come to is that a database of Twitter handles isn’t very useful; there’s not much you can do with them. You can’t send more than 250 direct messages per day, and in any case the user has to be following you to receive them. So Twitter as a point-to-point mass private communication channel, like email, is a non-starter. No spamming then. You can mention slightly more people, but once again you would soon start to hit the 1000 status updates per day limit with any serious targeted communications. You used to be able to get your application whitelisted by Twitter, which allowed you higher API limits, but they’ve stopped that now.

We thought about creating multiple senders, but you can’t automate account creation and most of the limits are per account. Twitter is very strict about any application that they consider to be bending their rules, and you’ll soon find yourself blacklisted even if you manually set about creating multiple accounts for an application that’s deemed to be out of bounds.

The interesting scenarios coalesce around the core Twitter interaction use case: broadcasting to followers and joining in with conversations based around mentions. One idea we’re playing with is for an account that replies with status updates when you mention it with some kind of token, say a flight number or the name of a city. You’ve still got that 1000 tweets per day limit, but so long as the use case is partitioned correctly, that needn’t be a problem.
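As a rough illustration of the token idea, here is a small sketch that pulls a flight number out of the text of a mention. Everything in it is an assumption made for the example: the class and method names, and the flight-number format of two capital letters followed by one to four digits.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative only: the class, method, and flight-number format are assumptions.
public static class MentionParser
{
    // Assumed format: two capital letters followed by one to four digits, e.g. "BA123".
    private static readonly Regex FlightNumber = new Regex(@"\b[A-Z]{2}\d{1,4}\b");

    // Returns the first flight number found in the tweet text, or null if there is none.
    public static string FindFlightNumber(string tweetText)
    {
        var match = FlightNumber.Match(tweetText);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindFlightNumber("@flightbot BA123 please"));  // BA123
        Console.WriteLine(FindFlightNumber("@flightbot hello") == null); // True
    }
}
```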

OK, that’s the ‘what’s it good for’ waffle out of the way, now for the interesting stuff: how easy is it to talk to Twitter from .NET?

I’ve been playing with the excellent TweetSharp library by Daniel Crenna, which makes basic Twitter interaction pretty trivial. The only thing I had to think at all hard about was the OAuth workflow.

OAuth is a protocol that allows a user to authorise an application for a subset of functionality on their account without giving away their password. Now I will probably be building server side components that automate a small set of Twitter accounts, so I need an OAuth workflow that allows me to do a one-time configuration of the component to give it access to an account. This means I have to have a one-time OAuth tool that allows me to harvest the OAuth tokens for a single account.

I’ve written a little console app to do just that …

using System;
using System.Diagnostics;
using TweetSharp;

class Program
{
    // you get the consumer key and secret from Twitter when you register your app.
    private const string consumerKey = " --- ";
    private const string consumerSecret = " --- ";

    static void Main(string[] args)
    {
        var service = new TwitterService(consumerKey, consumerSecret);
        var requestToken = service.GetRequestToken();
        var authorisationUri = service.GetAuthorizationUri(requestToken);

        // open up a browser window and point it at the authorisationUri
        Process.Start(authorisationUri.ToString());

        Console.WriteLine("Input the verifier pin number from the web page:");
        var verifier = Console.ReadLine();
        var accessToken = service.GetAccessToken(requestToken, verifier);

        // now output the access token and secret so that we can cut-n-paste them into our
        // server side configuration.
        Console.Out.WriteLine("accessToken.Token = {0}", accessToken.Token);
        Console.Out.WriteLine("accessToken.TokenSecret = {0}", accessToken.TokenSecret);

        // check that the access token and secret work.
        service.AuthenticateWith(accessToken.Token, accessToken.TokenSecret);

        Console.WriteLine(service.Response.Response);
    }
}

This will pop up a browser window and present a pin number generated by Twitter that you then have to copy and paste. It then outputs the generated token and ‘token secret’ for that account.

Once you’ve got an access token and a token secret, you can use them for as long as you want. If you own the accounts your server side component is using, you don’t have to consider the workflow for when the user cancels your application’s access rights.

Sending a status update is trivial …

var service = new TwitterService(consumerKey, consumerSecret);
service.AuthenticateWith(previouslySavedToken, previouslySavedTokenSecret);
service.SendTweet(message);

My next challenge is getting to grips with the streaming API so that I can monitor mentions and reply to status update requests. I’ve got a working demo that polls the Twitter API, but it’s a very wasteful use of the scarce API access allowance.

So there you have it. Some very early thoughts on using Twitter for business and pleasure. I think there are some very exciting opportunities, especially for ‘you don’t have to install anything’ scenarios where users communicate with remote applications.

Tuesday, July 03, 2012

EasyNetQ: Continuous Delivery with TeamCity and NuGet

I recently applied to have EasyNetQ, my awesome RabbitMQ API for .NET, built on the CodeBetter TeamCity build server (you can log on as ‘guest’ with no password). The very helpful Anne Epstein set it up for me and all is now building, testing and packaging nicely.

The bit I really like is that TeamCity can automatically push the EasyNetQ NuGet package up to NuGet.org on each build. So now EasyNetQ has continuous delivery: when you install or update it using NuGet, you are getting the very latest version fresh from GitHub.

My attention was also drawn to MyGet.org, a personal-nuget-in-the-cloud service. For the moment, with EasyNetQ still very much a beta project, I’m happy to have the latest master always be available, but I can envisage some time in the future where I’ll want to have a bit more confidence before pushing a release up to NuGet.org; MyGet would allow me to publish beta packages for testing and then promote them to NuGet when they’ve passed all the QA barriers.

I now consider continuous deployment / delivery to be essential for any software project; it’s just brilliant being able to point anyone who cares to a test server that always has the very latest trunk build. It’s all a part of keeping development process cycles as tight as possible and always having working software.