<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dynamical Software</title>
	<atom:link href="http://www.dynamicalsoftware.com/news/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.dynamicalsoftware.com/news</link>
	<description>Towards a Smarter Software Development Experience</description>
	<lastBuildDate>Tue, 15 May 2012 05:19:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using Solr/Lucene to Surface the Big Data of Social Media</title>
		<link>http://www.dynamicalsoftware.com/news/?p=212</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=212#comments</comments>
		<pubDate>Sun, 06 May 2012 19:37:14 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Apache Solr]]></category>
		<category><![CDATA[EHcache]]></category>
		<category><![CDATA[Lucene]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=212</guid>
		<description><![CDATA[Although you need Big Data to effectively implement a large scale social media solution, Hadoop is not always the right tool. This implementation description details how to use Solr/Lucene as a NoSql solution to meet the near real-time Big Data needs of a social news feed.]]></description>
			<content:encoded><![CDATA[<p>On Wednesday, May 9, I will be speaking at the <a title="Welcome to Lucene Revolution 2012" href="http://lucenerevolution.org/" target="_blank">2012 Lucene Revolution conference</a> in Boston on Using <a title="Apache Lucene - Apache Solr" href="http://lucene.apache.org/solr/" target="_blank">Solr/Lucene</a> to Surface the Big Data of Social Media. Solr is an open source technology most known for its capabilities as a search engine. Big Data is a recent IT trend where large amounts of data (both in terms of volume and rate) are collected and used. The amount of data is too large for a single relational database to handle. Social Media is any system where the users contribute content and express affinity for other content and those actions get published to the user&#8217;s social graph.</p>
<p>In this presentation, I will be focusing on how to use Solr as a kind of NoSql solution for Big Data. Topics will include scaling Solr both up and out, sharding, replication, caching, SOA, indexing, and synchronization. I will also give advice on how best to integrate Solr with other open source technologies such as Jetty, RabbitMQ, Spring, Ehcache, and HOWL.</p>
<p>If you are attending this conference, then I hope that you catch my presentation. If not, and you are interested in Solr as a NoSql solution, then be sure to check back with this blog topic as I will include links to the published slide deck afterwards.</p>
<p><strong>Update May 14</strong>: Back from the conference. Many Apache committers talked about Lucene 4. SolrCloud is Solr on ZooKeeper. Good keynote from Hortonworks. Lame keynote from Microsoft. O&#8217;Reilly published a nice <a title="Lucene conference touches many areas of growth in search" href="http://radar.oreilly.com/2012/05/lucene-conference-touches-many.html" target="_blank">review of the conference</a> too.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=212</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Top Ten Ways to Customize Tigase</title>
		<link>http://www.dynamicalsoftware.com/news/?p=195</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=195#comments</comments>
		<pubDate>Mon, 02 Jan 2012 00:37:22 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[chat]]></category>
		<category><![CDATA[Extensible Messaging and Presence Protocol]]></category>
		<category><![CDATA[Groovy]]></category>
		<category><![CDATA[IM]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[OSS]]></category>
		<category><![CDATA[real-time communications]]></category>
		<category><![CDATA[Tigase]]></category>
		<category><![CDATA[XMPP]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=195</guid>
		<description><![CDATA[Tigase is a highly customizable XMPP server that provides an excellent, scalable platform for real-time communications that is easy to extend to meet your specific needs. There are over seventy different ways in which you can extend or enhance chat server functionality in Tigase, but I am going to cover just the top ten here.]]></description>
			<content:encoded><![CDATA[<p>With last month&#8217;s announcement that <a title="Microsoft opens up Messenger with XMPP access" href="http://www.techrepublic.com/blog/australia/microsoft-opens-up-messenger-with-xmpp-access/422" target="_blank">Microsoft is adopting XMPP in their Windows Live (was MSN) messenger</a> product,  I thought that I would help you ring in the new year and celebrate the <a title="Skype adds XMPP support, IM interoperability next?" href="http://gigaom.com/2011/06/28/skype-xmpp-support/" target="_blank">ever</a> <a title="Cleartext Enterprise IM App" href="http://www.cleartext.com/desktop.php">increasing</a> <a title="XMPP Is Hotter Than Ever: Cisco Acquires Jabber, Inc" href="http://metajack.im/2008/09/19/xmpp-is-hotter-than-ever/">popularity</a> of XMPP by publishing this post on some of the more efficacious ways in which you can customize the premier XMPP server technology Tigase in order to add Real Time Communications capability to your web based product or service.</p>
<p>Most of Tigase&#8217;s core code is written in Java. The best way to customize your version of Tigase is to build a Java project that extends Tigase classes or implements Tigase interfaces and to include the resulting JAR file in the libs folder. You tell Tigase about your classes through the init.properties file. There are over seventy different ways in which you can extend or enhance chat server functionality this way in Tigase but I am going to cover just the top ten here.</p>
<p><strong>AuthRepository</strong></p>
<p>Implement this interface to customize the handling of login requests beyond what a single stored procedure call can do. If all you need is your own stored procedure, then stick with the TigaseCustomAuth implementation and specify your stored procedure in the init.properties file.</p>
<p><strong>AbstractMessageReceiver</strong></p>
<p>Extend this abstract class to build a component. The most important methods to override are processPacket, getDefaults, and setProperties. The processPacket method is where you handle packets that are sent to this component. If your component needs additional configuration parameters, then use the getDefaults method to obtain your configuration out of the init.properties file. The setProperties method allows your component to accept dynamic properties that are set while Tigase is running.</p>
<p><strong>XMPPProcessor</strong></p>
<p>Extend this abstract class to build a plugin. The Id method returns a string that must match an entry in the --sm-plugins comma-delimited list in order for this plugin to get loaded. The supElements and supNamespaces methods tell Tigase which kinds of packets this plugin supports. The init method is called when your plugin gets initialized with configuration information from the init.properties file. The process method is what gets called when it is time for your plugin to process a packet.</p>
<p><strong>What is the difference between a plugin and a component?</strong> Think of <a title="Plugin Development" href="http://www.tigase.org/content/plugin-development" target="_blank">plugins</a> as the best way to extend existing functionality or add new functionality to already existing or new stanzas, no matter who sends or receives them. <a title="Component Development" href="http://www.tigase.org/content/component-development" target="_blank">Components</a> permit you to add server-side chat bots that respond to traditional stanzas such as IQ, presence, or message packets. Plugins register for particular stanza types and get invoked no matter who the recipient is. For a component to process a request, that request must be addressed to the JID of the component itself. When a plugin handles a request, it gets access to either the sender&#8217;s (if outbound) or the receiver&#8217;s (if inbound) session data. A component has no access to user session data when it is invoked. A plugin always runs in the JVM of the session manager, so the only way to scale out plugins is clustering. Components can be configured as either internal or external, which is a more natural and intuitive way to scale out additional functionality. New in version 5.1.0 of Tigase is the ability to cluster external components.</p>
<p><strong>DataRepository</strong></p>
<p>In all likelihood, you won&#8217;t need to implement this interface. Objects of this type help you manage multiple, pooled JDBC connections and their associated prepared statements. The factory method for objects of this type is RepositoryFactory.getDataRepository.</p>
<p><strong>ShutdownHook</strong></p>
<p>Implement this interface if you need to run some code when Tigase shuts down but that code can&#8217;t take a long time to run. You have to add code to your project that registers your own shutdown hooks. Logging during shutdown time is already turned off so debugging this code will not be easy.</p>
<p><strong>UserRepository</strong></p>
<p>Implement this interface when you need to integrate Tigase user data (e.g. rosters, vcards) with your web site. If you still need to access a relational database, then consider extending JDBCRepository instead. If you are setting up a federated chat system, then you will also need to extend UserRepositoryMDImpl for multi-domain rosters, etc.</p>
<p>Here are some additional tips and tricks to customizing Tigase.</p>
<p>If you want to filter packets, then extend XMPPProcessor and return true from your override of the preProcess method. Returning true means to filter this packet out. Returning false means to permit this packet to continue.</p>
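<p>The inverted boolean is easy to get backwards, so here is a toy model in plain Java. The PreProcessor interface and the deliver method are hypothetical stand-ins, not Tigase&#8217;s actual classes or signatures; the only thing carried over from Tigase is the rule that true means drop and false means continue. Treating a single true verdict as enough to drop the packet is a simplification made for this sketch.</p>

```java
import java.util.ArrayList;
import java.util.List;

public class FilterSketch {
    /** Hypothetical stand-in for a preProcess hook; NOT Tigase's real signature. */
    interface PreProcessor {
        /** Return true to filter the packet out, false to let it continue. */
        boolean preProcess(String packet);
    }

    /** In this toy model a packet survives only if no pre-processor returns true. */
    static List<String> deliver(List<String> packets, List<PreProcessor> preProcessors) {
        List<String> delivered = new ArrayList<>();
        for (String packet : packets) {
            boolean dropped = false;
            for (PreProcessor p : preProcessors) {
                if (p.preProcess(packet)) { // true: filter this packet out
                    dropped = true;
                    break;
                }
            }
            if (!dropped) {
                delivered.add(packet);      // every hook returned false: continue
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        // Hypothetical rule: drop message stanzas that mention "spam".
        PreProcessor spamFilter = packet -> packet.startsWith("<message") && packet.contains("spam");
        List<String> out = deliver(List.of(
                "<message to='bob@example.com'><body>hi</body></message>",
                "<message to='bob@example.com'><body>buy spam</body></message>",
                "<presence/>"),
            List.of(spamFilter));
        System.out.println(out.size()); // prints 2
    }
}
```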
<p>If you want to count packets, then implement the poorly named PacketFilterIfc which is more suitable for statistics generation than for filtering. Be sure to override the getStatistics method and add your counters (name and value) to the StatisticsList argument. Your statistics will be included in the results to any ad hoc stats command request.</p>
<p>If your processor is more about extending existing functionality (such as integrating chat with your web site) than about providing completely new functionality, then consider extending one of the existing plugins in the tigase.xmpp.impl package. For example, extending tigase.xmpp.impl.Presence lets you override the processInSubscribe, processOutSubscribe, and stopped methods, which is an easier way to handle those complex changes in presence state.</p>
<p>You can also customize Tigase by adding custom scripts to the scripts/admin folder. Usually written in the Groovy programming language, these scripts get loaded at Tigase startup time and are available to be invoked as XMPP ad hoc commands. You can also reload these scripts dynamically without having to restart Tigase. Groovy code runs much slower than Java, so you should limit how often these scripts are invoked. You load these scripts either by copying them into the scripts/admin folder and restarting Tigase or by loading them dynamically via the add-script ad hoc command. When copied into the scripts/admin folder, each script must contain AS:Description, AS:CommandId, and AS:Component in a comment block near the top. The component must match one of the item names returned from a disco#items query.</p>
<p>There you have it. Tigase is a highly customizable XMPP server that provides an excellent, scalable platform for real-time communications that is easy to extend to meet your specific needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=195</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Configuring Your Application</title>
		<link>http://www.dynamicalsoftware.com/news/?p=204</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=204#comments</comments>
		<pubDate>Sat, 19 Nov 2011 06:24:50 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[ANTLR]]></category>
		<category><![CDATA[Document Object Model]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[Parsing]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[YAML]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=204</guid>
		<description><![CDATA[Which format should this information get stored as and how should the information get read in and parsed? What are the informational complexity needs of the configuration? ]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Modern applications need to be configurable. What good is an application that cannot work after moving the database or any other dependent service? I find that applications tend to collect quite a large number of configurable information over time. When the project manager can&#8217;t make a decision, the engineer makes it a configuration switch to defer the decision till later.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">This information doesn&#8217;t change during the run of the application. It just needs to get loaded into memory at start up time in order to be used during the run of the application. This is true for both client and server applications.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">You don&#8217;t need a database to store this information between runs. That would be overkill. Besides, your configuration information is most probably manged by configuration management software which knows how to write text files. In all likelihood, your configuration should be stored in a text file.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Which format should this information get stored as and how should the information get read in and parsed? That depends on how the configuration information is organized. What are the informational complexity needs of the configuration? In my experience, that comes down to two choices, a bunch of name/value pairs or something that is context sensitive.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">What does context sensitive mean in this case? It means that actual structure of the information depends on its earlier content. For example, I recently wrote a light weight configuration management system where configuration files get generated using a model driven template approach. The model can look something like this vastly simplified example.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">class Server attribute port;</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">class MySql extends Server attribute password template mysqlconf;</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">object db instanceof MySql attribute port : &#8216;3306&#8242;, password : &#8216;test&#8217; ;</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">You could say that this could be captured as a series of name/value pairs but what the actual names are would depend on the proceeding class name (identified by what follows the instanceof keyword in this case).</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">The easiest way to parse something like this is to use a grammar parser generator such as Antlr. You  specify the grammar and tell antlr to generate the source code needed to parse input in the format of that grammar. Throw in a little boilerplate code to invoke the  parser and you get an easily traverable object hierarchy that contains the information to your specification.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">This is easy to code and runs quick but it requires that you understand compiler theory enough to write that grammar specification. That&#8217;s not for everyone so I don&#8217;t recommend using Antlr if all you have is a set of name/value pairs. What should you use in that case?</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">It all depends on whether or not the name/value pairs are flat or in a hierarchy. Flat name/value pairs would  look something like  this.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">dbport=3306</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">dbpassword=test</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">For Java developers, the best way to load this kind of information is with the Properties class. You call the load method to read the file and the getProperty method to access the value of any property. You can also enumerate through all properties and use the Properties class to update the information back to the file if need be. This is, by far, the simplest and easiest approach. A lot of applications can safely use this approach without compromise.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Hierarchical name/value pairs would look something like this.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Db:</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;"><span style="white-space: pre;"> </span>- port: 3306</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;"><span style="white-space: pre;"> </span> password: test</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">The three most popular formats for storing hierarchical name/value pairs is XML, YAML, and JSON. Historically speaking, XML would be the most popular choice for Java developers but there are also good libraries available for parsing JSON in Java too. YAML is more popular amongst the Ruby and python programmers  but there are parsing libraries for Java too.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">With XML you get two flavors of parsers, SAX and DOM. SAX is an event driven approach where you code handlers that get called as the XML is being parsed. DOM does all the parsing up front then returns an object hierarchy for you to traverse in your code in order to access the underlying data. SAX is usually quicker to run than DOM but more complicated to code for as you will most probably have to maintain an explicit stack during the parsing operation. If you prefer the DOM approach but want to provide your own classes instead of using the generic xerces classes, then take a look at either the Apache Digester project or JAXB.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">You have probably noticed by now that the kind of system level configuration for a typical application (client or sever) could be modeled as context sensitive, flat name/value pairs, or hierarchical name/value pairs. The question isn&#8217;t so much about mandatory organizational style as it is about what is most likely to make the most sense cognitively to the engineers who will be maintaining this application in the future? To me, if it looks like a catalog of system resources, then it will best fit the context sensitive approach. If it looks like default properties with some specific overrides, then it will best fit in a hierarchical name/value approach. If we are just talking about how to connect to a small list of services, then the flat name/value approach would be my preferred approach.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Depending on the complexity of the information in your app&#8217;s configuration needs, choose the right technology for the job. In this topic, we covered three such choices; Antlr for context sensitive configurations, Properties for flat name/value configurations (most popular), and XML/YAML/JSON parsing for hierarchical name/value configurations.</div>
<p>Modern applications need to be configurable. What good is an application that stops working after the database or any other dependent service moves? I find that applications tend to collect a large amount of configurable information over time. When the project manager can&#8217;t make a decision, the engineer makes it a configuration switch to defer the decision until later.</p>
<p>This information doesn&#8217;t change while the application runs. It just needs to be loaded into memory at startup so that it can be used for the rest of the run. This is true for both client and server applications.</p>
<p>You don&#8217;t need a database to store this information between runs; that would be overkill. Besides, your configuration information is most probably managed by configuration management software, which knows how to write text files. In all likelihood, your configuration should be stored in a text file.</p>
<p>Which format should this information be stored in, and how should it be read in and parsed? That depends on how the configuration information is organized, that is, on the informational complexity of the configuration. In my experience, that comes down to two choices: a bunch of name/value pairs or something that is context sensitive.</p>
<p>What does context sensitive mean in this case? It means that the actual structure of the information depends on its earlier content. For example, I recently wrote a <a title="DIY Configuration Management" href="http://ccmbeta.dynamicalsoftware.com:8080/site/news/2011/diy-configuration-management-263806147.html" target="_self">lightweight configuration management system</a> where configuration files get generated using a model-driven template approach. The model can look something like this vastly simplified example.</p>
<pre>class Server attribute port;
class MySql extends Server attribute password template mysqlconf;
object db instanceof MySql attribute port : '3306', password : 'test' ;</pre>
<p>You could say that this could be captured as a series of name/value pairs, but which names are allowed would depend on the preceding class name (identified by what follows the instanceof keyword in this case).</p>
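<p>A rough illustration of what context sensitive means here: in the plain-Java sketch below, the attribute names an object may set are only known after resolving the class named after instanceof and walking its extends chain. This is a hand-written regular-expression parser swapped in for Antlr purely for illustration, it only understands the three statement shapes in the simplified example above, and the class and method names are hypothetical. The template clause is matched but otherwise ignored.</p>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigModel {
    // class <Name> [extends <Parent>] attribute <a>[, <b>...] [template <t>];
    private static final Pattern CLASS_DECL = Pattern.compile(
            "class\\s+(\\w+)(?:\\s+extends\\s+(\\w+))?\\s+attribute\\s+([\\w\\s,]+?)(?:\\s+template\\s+\\w+)?\\s*;");
    // object <name> instanceof <Class> attribute k : 'v'[, k : 'v'...] ;
    private static final Pattern OBJECT_DECL = Pattern.compile(
            "object\\s+(\\w+)\\s+instanceof\\s+(\\w+)\\s+attribute\\s+(.+?)\\s*;");
    private static final Pattern PAIR = Pattern.compile("(\\w+)\\s*:\\s*'([^']*)'");

    private final Map<String, String> parents = new HashMap<>();        // class -> superclass (null if none)
    private final Map<String, List<String>> declared = new HashMap<>(); // class -> its own attributes

    /** Every attribute name a class accepts, including those inherited via extends. */
    Set<String> attributesOf(String cls) {
        Set<String> result = new LinkedHashSet<>();
        Set<String> seen = new HashSet<>();
        for (String c = cls; c != null; c = parents.get(c)) {
            if (!seen.add(c)) throw new IllegalArgumentException("Cyclic extends at: " + c);
            List<String> own = declared.get(c);
            if (own == null) throw new IllegalArgumentException("Unknown class: " + c);
            result.addAll(own);
        }
        return result;
    }

    /** Parses one statement; returns the object's values, or null for a class declaration. */
    Map<String, String> parse(String statement) {
        String stmt = statement.trim();
        Matcher m = CLASS_DECL.matcher(stmt);
        if (m.matches()) {
            declared.put(m.group(1), Arrays.asList(m.group(3).trim().split("\\s*,\\s*")));
            parents.put(m.group(1), m.group(2)); // group 2 is null without an extends clause
            return null;
        }
        m = OBJECT_DECL.matcher(stmt);
        if (!m.matches()) throw new IllegalArgumentException("Cannot parse: " + statement);
        // The context-sensitive step: the legal names depend on the class after instanceof.
        Set<String> allowed = attributesOf(m.group(2));
        Map<String, String> values = new LinkedHashMap<>();
        Matcher pair = PAIR.matcher(m.group(3));
        while (pair.find()) {
            if (!allowed.contains(pair.group(1))) {
                throw new IllegalArgumentException(pair.group(1) + " is not an attribute of " + m.group(2));
            }
            values.put(pair.group(1), pair.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        ConfigModel model = new ConfigModel();
        model.parse("class Server attribute port;");
        model.parse("class MySql extends Server attribute password template mysqlconf;");
        System.out.println(model.parse("object db instanceof MySql attribute port : '3306', password : 'test' ;"));
        // prints {port=3306, password=test}
    }
}
```

<p>The same password pair would be rejected on an object declared as instanceof Server, because Server neither declares nor inherits it. A flat list of name/value pairs has no way to express that rule on its own.</p>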
<p>The easiest way to parse something like this is to use a parser generator such as <a title="Server Side Antlr" href="http://www.dynamicalsoftware.com/java/server/antlr" target="_self">Antlr</a>. You specify the grammar and tell Antlr to generate the source code needed to parse input in the format of that grammar. Throw in a little boilerplate code to invoke the parser and you get an easily traversable object hierarchy that contains the information to your specification.</p>
<p>This is easy to code and runs quickly, but it requires that you understand enough compiler theory to write that grammar specification. That&#8217;s not for everyone, so I don&#8217;t recommend using Antlr if all you have is a set of name/value pairs. What should you use in that case?</p>
<p>That depends on whether the name/value pairs are flat or hierarchical. Flat name/value pairs would look something like this.</p>
<pre>dbport=3306
dbpassword=test</pre>
<p>For Java developers, the best way to load this kind of information is with the <a title="Properties" href="http://download.oracle.com/javase/6/docs/api/java/util/Properties.html" target="_blank">Properties class</a>. You call the load method to read the file and the getProperty method to access the value of any property. You can also enumerate all properties and use the store method to write the information back to the file if need be. This is, by far, the simplest and easiest approach, and a lot of applications can safely use it without compromise.</p>
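<p>A minimal sketch of that approach follows. It reads the flat example from a string so that it runs as-is; in a real application you would pass a reader opened on your configuration file instead (the loadFile helper and the app.properties file name are illustrative assumptions). Only standard java.util.Properties calls are used: load, getProperty, stringPropertyNames, setProperty, and store.</p>

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class FlatConfig {
    /** Loads flat name=value pairs from any character source. */
    public static Properties load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    /** Hypothetical convenience wrapper for a configuration file on disk. */
    public static Properties loadFile(Path path) throws IOException {
        try (Reader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return load(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same content as the flat example; normally loadFile(Paths.get("app.properties")).
        Properties props = load(new StringReader("dbport=3306\ndbpassword=test\n"));

        // getProperty returns a String; the second argument is a default for missing keys.
        int port = Integer.parseInt(props.getProperty("dbport", "3306"));
        String password = props.getProperty("dbpassword");
        System.out.println("port=" + port + ", password=" + password);

        // Enumerate every property name (iteration order is not guaranteed).
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " -> " + props.getProperty(name));
        }

        // Change a value and write the set back out; store also emits a comment and a timestamp line.
        props.setProperty("dbport", "3307");
        StringWriter out = new StringWriter();
        props.store(out, "updated configuration");
        System.out.print(out);
    }
}
```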
<p>Hierarchical name/value pairs would look something like this.</p>
<pre>Db:
       - port: 3306
         password: test</pre>
<p>The three most popular formats for storing hierarchical name/value pairs are <a title="Processing XML with Java" href="http://www.ibiblio.org/xml/books/xmljava/" target="_blank">XML</a>, <a title="The Official YAML Web Site" href="http://www.yaml.org/" target="_blank">YAML</a>, and <a title="JSON" href="http://json.org/" target="_blank">JSON</a>. Historically speaking, XML has been the most popular choice for Java developers, but there are also good libraries for parsing JSON in Java. YAML is more popular among Ruby and Python programmers, but there are parsing libraries for Java too.</p>
<p>With XML you get two flavors of parsers: SAX and DOM. SAX is an event-driven approach where you code handlers that get called as the XML is being parsed. DOM does all the parsing up front, then returns an object hierarchy for you to traverse in your code in order to access the underlying data. SAX is usually quicker to run than DOM but more complicated to code for, as you will most probably have to maintain an explicit stack during the parsing operation. If you prefer the DOM approach but want to provide your own classes instead of using the generic Xerces classes, then take a look at either the <a title="Commons" href="http://commons.apache.org/digester/" target="_blank">Apache Digester</a> project or <a title="Java Architecture for XML" href="http://www.oracle.com/technetwork/articles/javase/index-140168.html" target="_blank">JAXB</a>.</p>
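<p>The difference is easier to see side by side. The sketch below assumes the hierarchical Db example has been rendered as XML (that XML layout, the dotted db.port key naming, and the class and method names are all assumptions for illustration) and reads it twice with the parsers that ship with the JDK: once with DOM, walking the finished tree recursively, and once with SAX, where the handler keeps its own stack of open element names to know where it is.</p>

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlConfig {
    // Hypothetical XML rendering of the Db example.
    static final String XML = "<config><db><port>3306</port><password>test</password></db></config>";

    /** DOM: parse everything up front, then traverse the resulting tree. */
    static Map<String, String> readWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> out = new LinkedHashMap<>();
        walk(doc.getDocumentElement(), "", out);
        return out;
    }

    // Records each leaf element as "parent.child" -> text; the root element's name is omitted.
    private static void walk(Element element, String path, Map<String, String> out) {
        NodeList children = element.getChildNodes();
        boolean leaf = true;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                leaf = false;
                String name = ((Element) child).getTagName();
                walk((Element) child, path.isEmpty() ? name : path + "." + name, out);
            }
        }
        if (leaf && !path.isEmpty()) out.put(path, element.getTextContent().trim());
    }

    /** SAX: callbacks fire during parsing, so the handler tracks its position itself. */
    static Map<String, String> readWithSax(String xml) throws Exception {
        Map<String, String> out = new LinkedHashMap<>();
        Deque<String> stack = new ArrayDeque<>(); // open element names, innermost first
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean noChildYet; // true until a child element starts or this element ends

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                stack.push(qName);
                text.setLength(0);
                noChildYet = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String path = pathOf(stack);
                if (noChildYet && !path.isEmpty()) out.put(path, text.toString().trim());
                stack.pop();
                text.setLength(0);
                noChildYet = false;
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return out;
    }

    // Joins the open element names root-first, dropping the root element's name.
    private static String pathOf(Deque<String> stack) {
        List<String> names = new ArrayList<>();
        stack.descendingIterator().forEachRemaining(names::add);
        return String.join(".", names.subList(1, names.size()));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readWithDom(XML));
        System.out.println(readWithSax(XML));
    }
}
```

<p>Both methods should produce the same map. The SAX version never builds the whole tree, which is where its usual speed and memory advantage comes from, at the cost of the stack bookkeeping in the handler.</p>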
<p>You have probably noticed by now that the kind of system-level configuration for a typical application (client or server) could be modeled as context sensitive, flat name/value pairs, or hierarchical name/value pairs. The question isn&#8217;t so much about mandatory organizational style as it is about what will make the most sense cognitively to the engineers who will be maintaining this application in the future. To me, if it looks like a catalog of system resources, then it will best fit the context-sensitive approach. If it looks like default properties with some specific overrides, then it will best fit a hierarchical name/value approach. If we are just talking about how to connect to a small list of services, then the flat name/value approach would be my preference.</p>
<p>Depending on the complexity of your app&#8217;s configuration needs, choose the right technology for the job. In this topic, we covered three such choices: Antlr for context-sensitive configurations, Properties for flat name/value configurations (most popular), and XML/YAML/JSON parsing for hierarchical name/value configurations.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=204</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Making of the Conversational Content Management Beta</title>
		<link>http://www.dynamicalsoftware.com/news/?p=199</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=199#comments</comments>
		<pubDate>Sat, 15 Oct 2011 20:00:57 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[Apache web server]]></category>
		<category><![CDATA[Collaboration]]></category>
		<category><![CDATA[Content Management]]></category>
		<category><![CDATA[Content management systems]]></category>
		<category><![CDATA[Extensible Messaging and Presence Protocol]]></category>
		<category><![CDATA[Hippo CMS]]></category>
		<category><![CDATA[Jack Rabbit]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[MySQL AB]]></category>
		<category><![CDATA[online collaborative document creation environments]]></category>
		<category><![CDATA[Open source]]></category>
		<category><![CDATA[real-time communications]]></category>
		<category><![CDATA[Social information processing]]></category>
		<category><![CDATA[Strophe]]></category>
		<category><![CDATA[Tigase]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=199</guid>
		<description><![CDATA[With online collaboration tools such as Google Apps, users get no clear clues as to who is authoring now. There is a chat area where participants can discuss the content, but that discussion does not directly become the content itself. With CCM, the content authoring medium is basically a chat room. People discuss the topic, and a chat reporter records the conversation and then distills it down to an actionable summary.]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Maybe it is just the inner entrepreneur but I really enjoy getting to explore an innovative idea and bringing it to the light of day. That was perhaps the main reason why I enjoyed working on the Conversational Content Management beta so much.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">What was innovative about this project was in the way that the collaboration problem was addressed. In most online collaborative document creation environments, the actual editing is the visual equivalent to what was used to be called half duplex. If you move the mouse and started typing, then you rudely took away control from whoever was editing last. This has a very chilling effect on the fragile process of group creative interaction.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Have you ever served on a board of directors? The monthly board meeting usually looks like this. A group of people discuss the agenda while an assistant takes notes. Later on, the highlights (i.e. the actionable summary) get published.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">What if you could have access to that assistant in any meeting? That is what the Conversational Content Management beta is all about. All collaboration is done in the chat room itself. A robot records the entire conversation and extracts out the actionable summary later. The following technical white paper discusses how and introduces the following open source projects used to create this beta and accelerate its time to market.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Hippo CMS is an open source Content Management System written in Java. Its emphasis is on publishing with a limited number of authors with access available to the anonymous public at large.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Strophe is an open source Java Script library that permits web pages to access an XMPP chat server via BOSH.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Code Igniter is a PHP MVC framework for writing web applications hosted by the Apache web server.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Jack Rabbit is an implementation of the Java Content Repository which is how the chat reporter publishes content in the CMS.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Tigase is the open source XMPP server, written in Java, that serves as the chat room implementation.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">Thrift is the communications layer between Tigase and the chat session manager.</div>
<div id="_mcePaste" style="position: absolute; left: -10000px; top: 0px; width: 1px; height: 1px; overflow-x: hidden; overflow-y: hidden;">MySql is the open source relational data base where the web site and the chat server store their data.</div>
<p>Maybe it is just my inner entrepreneur, but I really enjoy exploring an innovative idea and bringing it into the light of day. That was perhaps the main reason why I enjoyed working on the <a title="Conversational Content Management" href="http://www.dynamicalsoftware.com/ccm/beta" target="_self">Conversational Content Management</a> beta so much.</p>
<p>What was innovative about this project was the way it addressed the collaboration problem. In most online collaborative document creation environments, the actual editing is the visual equivalent of what used to be called half duplex: only one party can transmit at a time. If you moved the mouse and started typing, then you rudely took control away from whoever was editing last. This has a very chilling effect on the fragile process of group creative interaction.</p>
<p>Have you ever served on a board of directors? The monthly board meeting usually looks like this. A group of people discuss the agenda while an assistant takes notes. Later on, the highlights (i.e. the actionable summary) get published.</p>
<p>What if you could have access to that assistant in any meeting? That is what the Conversational Content Management beta is all about. All collaboration is done in the chat room itself. A robot records the entire conversation and later extracts the actionable summary. The technical white paper below explains how it works and introduces the open source projects used to create this beta and accelerate its time to market:</p>
<ul>
<li><a title="Hippo CMS - Empower your Audience" href="http://onehippo.com" target="_blank">Hippo CMS</a> is an open source Content Management System written in Java. Its emphasis is on publishing by a limited number of authors to the anonymous public at large.</li>
<li><a title="Strophe - libraries for XMPP poets" href="http://strophe.im/" target="_blank">Strophe</a> is an open source JavaScript library that lets web pages access an XMPP chat server via BOSH (Bidirectional-streams Over Synchronous HTTP).</li>
<li><a title="Code Igniter - open source PHP web application framework" href="http://codeigniter.com/" target="_blank">CodeIgniter</a> is a PHP MVC framework for writing web applications hosted by the <a title="The Apache HTTP Server Project" href="http://httpd.apache.org/" target="_blank">Apache web server</a>.</li>
<li><a title="Welcome to Apache Jack Rabbit" href="http://jackrabbit.apache.org/" target="_blank">Jackrabbit</a> is an implementation of the <a title="Introducing the Java Content Repository" href="http://www.ibm.com/developerworks/java/library/j-jcr/" target="_blank">Java Content Repository</a> (JCR) API; it is how the chat reporter publishes content into the CMS.</li>
<li><a title="Tigase Open Source and Free Jabber/XMPP Server" href="http://tigase.org" target="_blank">Tigase</a> is the open source XMPP server, written in <a title="OpenJDK" href="http://openjdk.java.net/" target="_blank">Java</a>, that serves as the chat room implementation.</li>
<li><a title="Apache Thrift" href="http://thrift.apache.org/" target="_blank">Thrift</a> is the communications layer between Tigase and the chat session manager.</li>
<li><a title="MySQL - The world's most popular open source database" href="http://www.mysql.com/" target="_blank">MySQL</a> is the open source relational database where the web site and the chat server store their data.</li>
</ul>
<p><span style="font-size: xx-small;"><a href="http://www.docstoc.com/docs/98853452/Conversational-Content-Management">Conversational Content Management</a></span></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=199</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics Part 3 &#8211; Graphing</title>
		<link>http://www.dynamicalsoftware.com/news/?p=191</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=191#comments</comments>
		<pubDate>Tue, 05 Apr 2011 06:01:41 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Essbase]]></category>
		<category><![CDATA[OLAP]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[Weka]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=191</guid>
		<description><![CDATA[Only through the process of constant and unrelenting improvement can you hope to remain competitive. Analytics provides the reality check for that process.]]></description>
			<content:encoded><![CDATA[<p>When you apply the rigour of numerical analysis to studying the complex behaviour of your app and how its users relate to it, you take the guesswork out of improving and fine-tuning its effectiveness. Capturing the raw data through logging, then massaging it and selecting the right subset, are necessary prerequisites for the final deliverable of analysis: the graph or table-based report.</p>
<p>In the past two blogs, we have surveyed some popular <a href="http://www.dynamicalsoftware.com/analytics/logging">logging</a> and data <a href="http://www.dynamicalsoftware.com/analytics/selection">selection</a> techniques. In this blog, I will introduce you to some tools for graphing or reporting the data in ways that make it easy to draw action-oriented conclusions.</p>
<p>A picture is worth a thousand words, as they say. Plotting a graph is a great way to spot trends or reveal new relationships in various aspects of your app. Graphing data entails reading in one or more CSV files and rendering an image based on the data from those files. I introduce four ways to do this: spreadsheets, OLAP, R, and Weka.</p>
<p>Microsoft Office, Oracle <a href="http://www.openoffice.org/">OpenOffice.org</a>, IBM <a href="http://symphony.lotus.com/software/lotus/symphony/home.nsf/home">Lotus Symphony</a>, and <a href="http://docs.google.com">Google Docs</a> all offer a spreadsheet application capable of graphing the data from CSV files. Typically, you open the CSV file, which displays its contents in rows and columns. You then select the right rows and columns (include the label data for the graph) and invoke a chart wizard to render the image. After that, you capture the image, usually through some kind of print screen capability.</p>
<p>Sometimes your analytics are so complex that CSV files become too cumbersome. Sometimes the CSV files are just too big to import into the spreadsheet. Perhaps your analysts feel more comfortable slicing and dicing their data in real time and pivoting with a drag-and-drop interface. In any of these cases, you may want to take a look at using an OLAP database as the back end to your spreadsheets instead of opening the CSV files directly in the spreadsheet app itself.</p>
<p>When analysing log files the normal way, you have multiple data sets and they are all two dimensional. With OLAP, you have only one data set but it is multi-dimensional. This multi-dimensional data set is usually referred to as the cube. Many OLAP cubes have 20 or so dimensions. Each dimension is actually a hierarchy of values (for example, year, month, and day), which is what lets you roll up or drill down in an OLAP spreadsheet. You still use the CSV files, but now you load them into the cube and then access the cube with spreadsheets. Much OLAP technology these days uses a relational database under the covers, an approach known as ROLAP.</p>
<p>Most of the relational database vendors also provide an OLAP offering. Microsoft and Oracle are the two most notable examples. Oracle&#8217;s offering is probably the most mature, as Oracle acquired Essbase from Hyperion, which in turn had obtained it through its merger with Arbor Software, Essbase&#8217;s original developer. For an open source OLAP database, consider using Pentaho&#8217;s <a href="http://mondrian.pentaho.com/">Mondrian</a> project. The GUI of choice for both the Microsoft and Oracle offerings is MS-Excel. Mondrian is intended to be embedded in your Java applications and also has a web interface, but it has no first-class plug-in for MS-Excel.</p>
<p>Mondrian is open source and so carries no licensing fees. Jaspersoft also releases a community edition of their <a title="Welcome to JasperForge" href="http://jasperforge.org/">OLAP server</a>. The commercial offerings carry fairly high licensing costs. Be prepared to incur high operational and maintenance costs when using any OLAP solution.</p>
<p>The downside to using spreadsheets to render your graphs is that it is a manual operation. This can become daunting when you have to generate these reports often. Another approach is to use the open source <a href="http://www.r-project.org/">R Programming Language</a> to generate the actual graphs.</p>
<p>Perhaps the biggest advantage to R is that you can easily apply various transformations to the data set, or work with multiple data sets, as easily as writing a formula. R comes with lots of plotting and graphing options, such as dot charts, bar charts, pie charts, notched box plots, histograms, topographic maps, and heat maps.</p>
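<p>To make the idea of a hierarchical dimension concrete, here is a minimal in-memory sketch in Python. It assumes a hypothetical CSV extract with a date column (treated as a year, month, day hierarchy), a page column, and a visits measure, and shows how the same facts roll up to a coarser level. A real OLAP server such as Mondrian does this at scale with far more machinery; this is only an illustration of the concept.</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV log extract: one fact row per day and page.
RAW = """date,page,visits
2011-03-01,home,120
2011-03-02,home,80
2011-03-02,cart,15
2011-04-01,home,95
2011-04-01,cart,22
"""

def load_facts(text):
    """Parse CSV text into (year, month, day, page, visits) tuples.

    The date column becomes a three-level hierarchy: year > month > day.
    """
    facts = []
    for row in csv.DictReader(io.StringIO(text)):
        year, month, day = row["date"].split("-")
        facts.append((year, month, day, row["page"], int(row["visits"])))
    return facts

# How many levels of the date hierarchy to keep at each granularity.
LEVELS = {"year": 1, "month": 2, "day": 3}

def roll_up(facts, level, by_page=True):
    """Sum visits at the requested date level, optionally split by page."""
    depth = LEVELS[level]
    cube = defaultdict(int)
    for year, month, day, page, visits in facts:
        date_key = (year, month, day)[:depth]
        key = date_key + ((page,) if by_page else ())
        cube[key] += visits
    return dict(cube)

facts = load_facts(RAW)
print(roll_up(facts, "month", by_page=False))
# {('2011', '03'): 215, ('2011', '04'): 117}
print(roll_up(facts, "year"))
# {('2011', 'home'): 295, ('2011', 'cart'): 37}
```

<p>Drilling down is the same call with a finer level, for example roll_up(facts, "day").</p>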
<p>Just like with the spreadsheets, you read in a CSV file&#8217;s contents to create a data set in R. Once loaded, there are various functions that you can apply to a data set in order to pivot, slice and dice, filter, normalize, etc. Vector and matrix math is available in R as are lists, data frames and just about any probability distribution function that you could imagine. The R programming language was originally designed for statistical analysis.</p>
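<p>In R, the read.csv function loads a CSV file into a data frame. As a rough stand-in for readers who do not have R at hand, the following Python sketch (standard library only) performs the same kinds of operations on a hypothetical response-time log: filter, normalize, and pivot. The column names and values are assumptions for illustration.</p>

```python
import csv
import io
from collections import defaultdict

# Hypothetical log extract: one row per request.
RAW = """hour,endpoint,ms
9,search,120
9,search,180
9,checkout,300
10,search,90
10,checkout,500
"""

rows = list(csv.DictReader(io.StringIO(RAW)))
for row in rows:
    row["hour"] = int(row["hour"])
    row["ms"] = float(row["ms"])

# Filter: keep only the search endpoint.
search = [r for r in rows if r["endpoint"] == "search"]

# Normalize: min-max scale response times into the range [0, 1].
lo = min(r["ms"] for r in rows)
hi = max(r["ms"] for r in rows)
for r in rows:
    r["ms_norm"] = (r["ms"] - lo) / (hi - lo)

# Pivot: mean response time with hours as rows and endpoints as columns.
totals = defaultdict(lambda: [0.0, 0])
for r in rows:
    cell = totals[(r["hour"], r["endpoint"])]
    cell[0] += r["ms"]
    cell[1] += 1
pivot = defaultdict(dict)
for (hour, endpoint), (total, count) in totals.items():
    pivot[hour][endpoint] = total / count

print(len(search))   # 3
print(dict(pivot))   # {9: {'search': 150.0, 'checkout': 300.0}, 10: {'search': 90.0, 'checkout': 500.0}}
```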
<p>If you are trying to find correlation between various dimensions in your data, then R has a special feature for you: it automatically plots a 2D graph of every column vs. every other column in your data set. This is called a scatter plot matrix (R&#8217;s pairs() function draws one). You can quickly inspect each graph visually in order to discover hitherto unknown relationships.</p>
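<p>A scatter plot matrix is something you inspect by eye. As a complementary numerical check, this Python sketch computes the Pearson correlation coefficient for every pair of columns, which is the same all-pairs idea expressed as numbers. The data and column names are hypothetical.</p>

```python
import itertools
import math

# Hypothetical per-user metrics, one list per column.
data = {
    "visits":    [1, 2, 3, 4, 5],
    "pageviews": [3, 5, 7, 9, 11],
    "bounces":   [5, 4, 3, 2, 1],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Every column against every other column, like the panels of a scatter plot matrix.
for a, b in itertools.combinations(data, 2):
    print(f"{a} vs {b}: {pearson(data[a], data[b]):+.2f}")
# visits vs pageviews: +1.00
# visits vs bounces: -1.00
# pageviews vs bounces: -1.00
```

<p>Values near +1 or -1 flag pairs worth plotting; values near 0 suggest no linear relationship, although a plot can still reveal a non-linear one.</p>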
<p>Ubuntu users can easily install the R programming language environment via the Synaptic Package Manager by marking the r-recommended package for installation.</p>
<p>Perhaps you see analysis in terms of data mining but don&#8217;t feel comfortable using a programming language such as R. If so, take a look at <a href="http://www.cs.waikato.ac.nz/ml/weka/">Weka</a>, a collection of machine learning algorithms for data mining tasks. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization, along with an easy-to-use GUI. Although it prefers its own ARFF format, you can load Weka with CSV files. Like R, Weka can easily show you scatter plots to help you find correlation between various dimensions in your data. Unlike R, Weka provides advanced classification and clustering capability through machine learning; its classifiers learn from training data that you provide and then do a better job on the real data later on.</p>
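<p>If you want to control attribute types yourself rather than let Weka guess them from a CSV file, converting to ARFF is straightforward. This Python sketch treats a column as numeric when every value parses as a number and as nominal otherwise; that type-inference rule and the relation name are assumptions, and values containing commas, quotes, or spaces would need quoting, which this sketch omits.</p>

```python
import csv
import io

def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def csv_to_arff(csv_text, relation):
    """Convert simple CSV text to ARFF (no quoting or missing values handled)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        column = [row[i] for row in body]
        if all(is_number(v) for v in column):
            kind = "numeric"
        else:
            # Nominal attribute: list the distinct values in first-seen order.
            kind = "{" + ",".join(dict.fromkeys(column)) + "}"
        lines.append(f"@attribute {name} {kind}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in body]
    return "\n".join(lines) + "\n"

sample = "visits,browser,bought\n3,firefox,yes\n7,ie,no\n2,firefox,no\n"
print(csv_to_arff(sample, "shoppers"))
```

<p>For the sample above, the header declares visits as numeric, browser as {firefox,ie}, and bought as {yes,no}, followed by the @data rows unchanged.</p>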
<p>There you have it. With these three blogs, you now have a survey level introduction to the various tools and techniques for <a href="http://www.dynamicalsoftware.com/analytics/logging">logging</a>, <a href="http://www.dynamicalsoftware.com/analytics/selection">selecting</a>, and graphing your app&#8217;s activities for the purposes of analysing its performance and behavioural characteristics.</p>
<p>Only through the process of constant and unrelenting improvement can you hope to remain competitive. Analytics provides the reality check for that process.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=191</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics Part 2 &#8211; Data Selection</title>
		<link>http://www.dynamicalsoftware.com/news/?p=175</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=175#comments</comments>
		<pubDate>Sun, 06 Mar 2011 20:03:11 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[AWK]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[open source tools]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=175</guid>
		<description><![CDATA[Learn how to massage your application's event logging into data that is compatible with analytics.]]></description>
			<content:encoded><![CDATA[<p>It has been said by numerous industry pundits and trend watchers that we are currently in the age of information analysis. From the A/B testing of your humble iPhone app to planning the next six-figure media campaign for your large-scale social content site, analytics is the method of choice for learning how to better optimize and tune the effectiveness and relevancy of your app.</p>
<p>This blog is part 2 of 3 parts. In today&#8217;s blog, I will go into how to massage the raw logging data captured from <a href="http://www.dynamicalsoftware.com/analytics/logging">part 1</a> in preparation for <a href="http://www.dynamicalsoftware.com/analytics/graphing">graphing</a> or reporting on interesting trends which you will learn about in part 3. </p>
<p>You have now instrumented your app for logging and collected a set of raw log files recording its activity. These files capture every aspect of your app&#8217;s activity, which amounts to information overload when you are trying to spot trends or answer specific questions about relevance or usability. What you need to do in the selection phase is convert those log files from a format suited to logging into a format suited to graphing.</p>
<div id="attachment_177" class="wp-caption aligncenter" style="width: 310px"><a href="http://www.dynamicalsoftware.com/news/?attachment_id=177" rel="attachment wp-att-177"><img src="http://www.dynamicalsoftware.com/news/wp-content/uploads/2011/03/dataAggregation-300x151.png" alt="converting log data into trend data" title="Data Aggregation" width="300" height="151" class="size-medium wp-image-177" /></a><p class="wp-caption-text">converting log data into trend data</p></div>
<p>How do you filter and aggregate these raw log files into data that is more easily able to be graphed or displayed as a table based report?</p>
<p>If you are using Linux or Mac OS X, then you most probably have all the command line tools that you will need to get the job done. Windows users should check out <a href="http://www.cygwin.com/">cygwin</a>, which provides similar command line tools for Windows machines. Here is a gentle introduction to some of these tools and what they can do for you.</p>
<p>If you are looking only at one type of activity and wish to filter out the rest, then consider using <a href="http://www.gnu.org/software/grep/">grep</a>, which searches your log files for lines matching regular expression patterns that you specify and reports or extracts them (the name comes from the ed command g/re/p: globally search for a regular expression and print). It can also do some rudimentary aggregation, such as counting matching lines with the -c option, and it can operate on a single file or, with the -r option, on every file in a folder hierarchy. Another way to aggregate is to pipe the grep output into the wc -l command.</p>
<p>If each line in your log file looks like a row in a multi-column table where the columns are delimited by a special character (e.g. CSV or Comma Separated Values files), then you can use the Linux command line tools <a href="http://www.techrepublic.com/article/master-the-lesser-known-cut-and-diff-linux-commands/5034547">cut</a>, <a href="http://webtools.live2support.com/linux/join.php">join</a>, and <a href="http://www.computerhope.com/unix/upaste.htm">paste</a> to extract and combine columns. The cut command extracts specific columns out of a file. The paste command combines two files side by side, line by line. The join command combines the lines of two files that share a common value in a specified column, much like a relational join (both files must first be sorted on that column).</p>
<p>Sometimes you might need to massage a log file in order to get it into CSV format. In that case, <a href="http://www.linuxjournal.com/article/7231">sed</a> might just be the right tool for the job. This line-oriented stream editor is ideally suited to shell scripts that need to do this kind of data massaging.</p>
<p>For more complex aggregation, pivoting, filtering, and report generation, consider using <a href="http://www.awktutorial.com/">awk</a>, a powerful pattern-scanning and report-writing language. To use awk, you write a script whose syntax looks like a hybrid of the C programming language and a sed script: each rule is a pattern followed by an action in braces. awk reads your log file line by line, splits each line into fields, and runs the actions of the matching rules to generate the output.</p>
<p>Each of these tools performs a very specific task. You can easily combine them to get the entire job done by using such shell programming features as pipes, sideways pipes (command substitution), and redirection. Constructs such as conditionals, for loops, and here documents should also come in handy.</p>
<p>A pipe (|) takes the output from one tool and makes it the input to another. A sideways pipe (backticks or $(...)) takes the output from one tool and makes it the command line arguments to another. If that output is multi-line, then you will want to pipe it into xargs instead. Redirection lets a tool read its input from a previously existing file (&lt;) or save its output as a new file (&gt;).</p>
<p>If your filtering needs to be context sensitive, where a match of one pattern should count only in the context of another (possibly on a previous row in the log file), then consider using <a href="http://www.antlr.org/">ANTLR</a> to perform these more advanced filtering operations. It is easy to install ANTLR: just download the JARs and run them. You will most probably also want to use <a href="http://www.antlr.org/works/index.html">ANTLRWorks</a> to create your grammar specification files.</p>
<p>ANTLR is actually a general purpose parser generator, but you can repurpose it via its filter and rewrite grammar options and its integration with another Java library called StringTemplate.</p>
<p>Maybe you already use relational databases and would rather take advantage of the expressive power of the Structured Query Language&#8217;s select statement to generate reports from your log files instead of using all these different command line tools. If that is your preference, then consider dumping your log files into <a href="http://hadoop.apache.org/">Hadoop</a> and using Hive to run SQL statements against your logging data. While not standards compliant, Hive&#8217;s query language (HiveQL) is very expressive and will work on extremely large data sets (though not quickly).</p>
<p>The easiest way to install Hadoop and <a href="http://hive.apache.org/">Hive</a> is through the <a href="http://www.cloudera.com/">Cloudera</a> distribution, which is available to Linux users as Debian packages. You will have to add Cloudera&#8217;s repository to your <a href="http://www.nongnu.org/synaptic/">Synaptic Package Manager</a> repository configuration before installing.</p>
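<p>For readers who want to see what that filtering amounts to, here is a Python sketch of the equivalent of grep -c PATTERN file (or grep PATTERN file | wc -l): count the lines that match a regular expression, or return the matching lines themselves. The log lines and patterns are hypothetical.</p>

```python
import re

def count_matches(lines, pattern):
    """Count lines matching a regular expression, like grep -c."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))

def matching_lines(lines, pattern):
    """Return the matching lines themselves, like plain grep."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

log = [
    "2011-03-06 10:01:02 INFO  login user=42",
    "2011-03-06 10:01:07 ERROR checkout user=42 code=500",
    "2011-03-06 10:02:11 INFO  search user=7 q=solr",
    "2011-03-06 10:03:45 ERROR search user=7 code=503",
]

print(count_matches(log, r"\bERROR\b"))   # 2
for line in matching_lines(log, r"user=7\b"):
    print(line)
```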
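<p>The following Python sketch mirrors what cut -d, -f1,3 and join -t, do on two small comma-separated files: extract selected columns, then combine two files on a shared key column, with the key printed first as join does. The data is hypothetical, and unlike the real join command, this version does not need sorted input because it builds a lookup table first.</p>

```python
import csv
import io

def cut(rows, fields):
    """Keep only the given 1-based column numbers, like cut -d, -f."""
    return [[row[f - 1] for f in fields] for row in rows]

def join(left, right, key=1):
    """Combine rows whose key column matches, like join -t, on that column."""
    lookup = {}
    for row in right:
        lookup.setdefault(row[key - 1], []).append(row)
    joined = []
    for row in left:
        for match in lookup.get(row[key - 1], []):
            # Output the key once, then the remaining fields of each side.
            rest_left = row[:key - 1] + row[key:]
            rest_right = match[:key - 1] + match[key:]
            joined.append([row[key - 1]] + rest_left + rest_right)
    return joined

events = list(csv.reader(io.StringIO("42,login,09:00\n7,search,09:05\n42,checkout,09:10\n")))
users = list(csv.reader(io.StringIO("42,gold\n7,free\n")))

print(cut(events, [1, 3]))   # [['42', '09:00'], ['7', '09:05'], ['42', '09:10']]
print(join(events, users))
# [['42', 'login', '09:00', 'gold'], ['7', 'search', '09:05', 'free'], ['42', 'checkout', '09:10', 'gold']]
```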
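<p>As an example of that kind of massaging, a sed command such as sed -E 's/ +/,/g' turns runs of spaces into commas. The Python sketch below does the same substitution with re.sub, plus one extra rule that strips the square brackets around a timestamp. The log layout is hypothetical, and this simple approach assumes that no field itself contains spaces.</p>

```python
import re

def to_csv(line):
    """Turn a space-delimited log line into a CSV line."""
    line = re.sub(r"[\[\]]", "", line.strip())   # drop the [ ] around the timestamp
    return re.sub(r" +", ",", line)              # collapse runs of spaces into commas

print(to_csv("[2011-03-06T10:01:07]  ERROR   checkout  user=42"))
# 2011-03-06T10:01:07,ERROR,checkout,user=42
```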
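<p>A typical awk report is a group-by. For instance, awk -F, '{ total[$1] += $3 } END { for (k in total) print k, total[k] }' sums the third column for each distinct value of the first. The Python sketch below performs the same aggregation on hypothetical CSV lines so you can see what the one-liner does; note that awk's for (k in total) visits keys in an unspecified order, whereas this version sorts them.</p>

```python
from collections import defaultdict

def sum_by_key(lines, key_col=1, value_col=3, sep=","):
    """Sum one column grouped by another (1-based, like awk's $1 and $3)."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        totals[fields[key_col - 1]] += float(fields[value_col - 1])
    return dict(sorted(totals.items()))

lines = ["search,09:00,120", "checkout,09:01,300", "search,09:02,80"]
for endpoint, total in sum_by_key(lines).items():
    print(endpoint, total)
# checkout 300.0
# search 200.0
```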
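<p>Before reaching for a full grammar, it may help to see what "context sensitive" means in practice. The Python sketch below is a hand-written state machine, not ANTLR: it reports an ERROR line only when an earlier line for the same session opened a checkout that has not yet closed. The log format, the session= field, and the BEGIN/END markers are all hypothetical. ANTLR becomes attractive once rules like this multiply and nest.</p>

```python
import re

LINE = re.compile(r"session=(\w+) (BEGIN checkout|END checkout|ERROR .*)")

def checkout_errors(lines):
    """Yield ERROR lines that occur inside an open checkout for that session."""
    in_checkout = set()   # sessions currently inside a checkout
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        session, event = m.groups()
        if event == "BEGIN checkout":
            in_checkout.add(session)
        elif event == "END checkout":
            in_checkout.discard(session)
        elif session in in_checkout:
            yield line

log = [
    "session=a BEGIN checkout",
    "session=b ERROR timeout",
    "session=a ERROR card declined",
    "session=a END checkout",
    "session=a ERROR timeout",
]
print(list(checkout_errors(log)))   # ['session=a ERROR card declined']
```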
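<p>Hive itself needs a Hadoop cluster, but the flavour of querying log data with SQL can be tried locally. The Python sketch below swaps in SQLite (through the standard sqlite3 module) as a stand-in for Hive: it loads hypothetical CSV log rows into a table and runs a GROUP BY. A HiveQL query this simple would look very similar, but the two dialects are not interchangeable.</p>

```python
import csv
import io
import sqlite3

RAW = """ts,endpoint,status,ms
2011-03-06 10:00,search,200,120
2011-03-06 10:01,checkout,500,900
2011-03-06 10:02,search,200,80
2011-03-06 10:03,checkout,200,300
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (ts TEXT, endpoint TEXT, status INTEGER, ms INTEGER)")
rows = csv.reader(io.StringIO(RAW))
next(rows)  # skip the header line
# SQLite's INTEGER column affinity converts numeric strings such as '200' to integers.
conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", rows)

query = """
    SELECT endpoint, COUNT(*) AS hits, AVG(ms) AS avg_ms, SUM(status >= 500) AS errors
    FROM log
    GROUP BY endpoint
    ORDER BY endpoint
"""
for row in conn.execute(query):
    print(row)
# ('checkout', 2, 600.0, 1)
# ('search', 2, 100.0, 0)
```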
<p>It&#8217;s amazing just how easy it is to slice and dice your raw log files in order to get the data just right for graphing and reporting. Today, you got a glimpse into some of the more common tools used for these purposes. In the next instalment, you will see how to use more <a href="http://www.dynamicalsoftware.com/analytics/graphing">open source tools in order to graph</a> and/or display this data in professional looking reports.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=175</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics Part 1 &#8211; Logging</title>
		<link>http://www.dynamicalsoftware.com/news/?p=168</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=168#comments</comments>
		<pubDate>Wed, 09 Feb 2011 08:01:07 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Apache HTTP Server]]></category>
		<category><![CDATA[AWStats]]></category>
		<category><![CDATA[Google Analytics]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Log4j]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=168</guid>
		<description><![CDATA[Analytics has grown increasingly popular over the years and for good reason. There is currently no better way to arrive at a solid understanding of what your complex application is doing right and wrong than to numerically analyze its activity.
This blog is a three parter. In today&#8217;s blog, I will cover the basics of logging. [...]]]></description>
			<content:encoded><![CDATA[<p>Analytics has grown increasingly popular over the years and for good reason. There is currently no better way to arrive at a solid understanding of what your complex application is doing right and wrong than to numerically analyze its activity.</p>
<p>This blog is a three parter. In today&#8217;s blog, I will cover the basics of logging: how you can add code to your application that will log its activity for later analysis. In the next part, I will go into how to <a href="http://www.dynamicalsoftware.com/analytics/selection">select the raw log data</a> in meaningful ways. In the final instalment, I shall elaborate on how to plot or graph relevant subsets of logging data for the purposes of insightful analysis.</p>
<p>Logging is a necessary prerequisite to any kind of data analytics. If your application doesn&#8217;t log its activity, then you cannot analyze it later. The simplest way to instrument your app for logging is to write out what it is currently doing to the console. Various tools are available to take the console output and split it up into many files over time. Splitting the output into multiple files makes it easier to copy to another server. It is not recommended that you manipulate and analyze the data on the same server where your app is running. If your app can run as a service, then consider using <a href="http://smarden.org/runit/">runit</a> with its svlogd tool.</p>
<p>If yours is a web application, then a lot of logging is most likely already happening for you. Both Apache and IIS automatically record each HTTP operation, the end point URI, and the status code. From these log files, you can learn a lot about how your app is used, but there are limitations. With this type of logging, you cannot trace the identity of the user as he or she flows through your site. The only demographic questions you can answer are about browser type or OS; nothing in the user&#8217;s profile is available. Form post data is also not recorded at this level of logging. The status code comes from the web server, not from your application: your insert statement may have failed, but the server still returned a response, so the status code will indicate success. One easy way to analyze Apache log data is with the open source project <a href="http://awstats.sourceforge.net/">awstats</a>.</p>
<p>The most popular way to log web application activity on the Internet is <a href="http://www.google.com/analytics/">Google Analytics</a>. This is a free service, provided by Google, where you drop a little JavaScript snippet on each page. Over time, you can access your data via the Google Analytics web site, where you can track such metrics as types of visitors (browser, OS, etc.), where they come from (what sites are linking to yours), most popular content (including number of visits, unique visitors, and average time spent on the page), and most popular landing and exit pages.</p>
<p>Google Analytics also lets you set up and track conversion goals. A goal can include a funnel, essentially a sequence of pages that you hope to move your users through. This is especially important for e-commerce sites evaluating the effectiveness of their shopping cart.</p>
<p>When instrumenting your app to log its activity, take a look at using a third-party library instead of writing the log output directly in your app code. There really is no reason to duplicate all that logging logic in a million different places in your app. Instead, take advantage of the object oriented concept of encapsulation by wrapping your logging logic in a class or two.</p>
<p>Java developers get this for free in the java.util.logging package. The Logger class and the Level class (with constants such as SEVERE, WARNING, INFO, and FINE) are what you will be dealing with the most. Use system properties and a configuration file (logging.properties) to customize how and where all that logging gets saved, including the number of files to rotate the logs through. Logging levels permit you to write copious data for short term debugging purposes or less data for long term trend watching and forecasting purposes. Advanced use may entail subclassing the Formatter or Handler class if you need to do fancy stuff such as aggregating data.</p>
<p>PHP developers may wish to use <a href="https://github.com/facebook/scribe">scribe</a> to handle the mechanics of logging. Originally developed by <a href="http://www.facebook.com/note.php?note_id=32008268919">Facebook</a> and now an open source project, scribe is great for centralizing logs from large server farms. There is a <a href="http://www.cloudera.com/blog/2008/10/thrift-scribe-hive-and-cassandra-open-source-data-management-software/">Thrift-based API</a> which makes scribe available to other development languages and platforms.</p>
<p>From a data store perspective, logging usually entails writing huge amounts of immutable data to be aggregated and analyzed later on. This sounds like a perfect application for <a href="http://hadoop.apache.org/">Hadoop</a>. Even if you do write your logs to files, a batch process can come along behind and import those files into HDFS, Hadoop&#8217;s distributed file system. Because HDFS spreads and replicates the data across many machines, it scales out well and tolerates the failure of individual data nodes.</p>
<p>From the humble print statement to the system console to massively parallel NoSQL data stores, we have surveyed the most effective and popular methods for logging your application&#8217;s activity. This is an important first step on the road to analyzing your app&#8217;s performance and usability. Stay tuned for the next instalment, which is all about how to select and prepare all these raw logging files for analysis.</p>
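<p>To see exactly what the web server records, here is a Python sketch that parses lines in Apache's Combined Log Format and tallies status codes. The first sample line follows the format's documented layout; the second is made up. The regular expression is a simplified assumption and may need adjusting for custom LogFormat directives.</p>

```python
import re
from collections import Counter

# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "POST /cart/add HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

statuses = Counter()
for line in lines:
    m = COMBINED.match(line)
    if m:
        statuses[m.group("status")] += 1
        print(m.group("method"), m.group("path"), m.group("status"), m.group("agent"))

print(statuses)   # Counter({'200': 2})
```

<p>Note that the 200 on the POST line says nothing about whether the cart update actually succeeded inside the application, which is the limitation described above.</p>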
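<p>Google Analytics computes funnels for you, but the underlying idea is easy to sketch. Given each visitor's sequence of page views, the Python below counts how many visitors reached each step of a hypothetical checkout funnel in order. The page paths are made up, and this is not how Google Analytics itself implements goals.</p>

```python
FUNNEL = ["/cart", "/checkout/shipping", "/checkout/payment", "/checkout/done"]

def funnel_counts(sessions, steps):
    """For each step, count sessions that reached it after all earlier steps."""
    counts = [0] * len(steps)
    for pages in sessions:
        reached = 0
        for page in pages:
            if reached == len(steps):
                break
            if page == steps[reached]:
                reached += 1
        # A session that got through k steps counts toward each of those k steps.
        for i in range(reached):
            counts[i] += 1
    return counts

sessions = [
    ["/home", "/cart", "/checkout/shipping", "/checkout/payment", "/checkout/done"],
    ["/cart", "/checkout/shipping", "/home"],
    ["/home", "/search"],
    ["/cart", "/checkout/payment"],
]
for step, count in zip(FUNNEL, funnel_counts(sessions, FUNNEL)):
    print(f"{step:20} {count}")
```

<p>Here three visitors reach the cart but only one completes the purchase, so the drop-off between the cart and shipping steps is where to look first.</p>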
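<p>Here is what that encapsulation might look like, sketched in Python on top of the standard logging module. The hypothetical EventLog class is the one place that knows how events are formatted (here as comma-separated fields) and where they go; the rest of the app just calls record(). The class name, the field layout, and the in-memory StringIO destination used for the demo are all assumptions.</p>

```python
import io
import logging
import time

class EventLog:
    """Hypothetical wrapper that writes one CSV-style line per app event."""

    def __init__(self, stream, name="events"):
        self._logger = logging.getLogger(name)
        self._logger.setLevel(logging.INFO)
        self._logger.propagate = False          # keep events out of the root logger
        handler = logging.StreamHandler(stream)
        handler.setFormatter(logging.Formatter("%(message)s"))
        self._logger.addHandler(handler)

    def record(self, event, user_id, **fields):
        """Log an event as: epoch_seconds,event,user_id,key=value,..."""
        extras = [f"{k}={v}" for k, v in sorted(fields.items())]
        parts = [f"{time.time():.0f}", event, str(user_id)] + extras
        self._logger.info(",".join(parts))

buffer = io.StringIO()
log = EventLog(buffer)
log.record("search", 42, query="solr", results=17)
log.record("checkout", 42, total="19.99")
print(buffer.getvalue())
```

<p>In a real app you would more likely pass a file-based handler, such as logging.handlers.RotatingFileHandler, so that the log rotates across many files as described earlier; only the wrapper's constructor would need to change.</p>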
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=168</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Server Side ANTLR</title>
		<link>http://www.dynamicalsoftware.com/news/?p=161</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=161#comments</comments>
		<pubDate>Tue, 16 Nov 2010 07:28:57 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[ANTLR]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=161</guid>
		<description><![CDATA[ANTLR is used to embed a domain specific language parser into a Java servlet then stress tested to determine its scalability.]]></description>
			<content:encoded><![CDATA[<p>ANTLR is ANother Tool for Language Recognition. I have used ANTLR many times before to embed the ability to parse a configuration or set of instructions written in a domain specific language into a command line tool. This past weekend, I coded up a <a href="http://code.google.com/p/newswidget/">Java servlet that uses ANTLR technology</a> to parse web requests written in a specific query language conducive to web publishing.</p>
<p>Compiler theory and language parsing is a pretty deep subject in computer science. You start with Chomsky&#8217;s Theory of Formal Languages and end up with parsing strategies such as LALR(1) and LL(k). It&#8217;s not for everyone. ANTLR makes it easy to develop and maintain your own parser for a language whose specification keeps changing but only if you have a solid grasp of the fundamental concepts. Otherwise, it is going to look a lot like voodoo.</p>
<p>I have used ANTLR before but this is the first time that I have used version 3. There has been a lot of improvements since the last version. One of the biggest improvements has been the change of parsing strategy from LL(k) to LL(*). Perhaps a small digression to explain that is in order.</p>
<p>A very important part of parsing is the ability to distinguish strings that conform to the language specification from those that do not. You write the language specification in an EBNF-like notation (ANTLR grammar files use the .g extension) and run the ANTLR tool. The tool generates classes in a variety of languages (e.g. Java, C#, Python) whose job it is to parse these strings.</p>
<p>What the parser does is break the string up into tokens and then process those tokens. The tokenizing step is called the lexical analysis phase, and in ANTLR a separate generated lexer handles it. While processing the tokens, the parser traverses the language specification, which is a set of production rules. If it finds a rule that accepts the next token, then the string is still compliant with the language specification. Otherwise, it is not.</p>
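<p>The following minimal, hand-written Java sketch makes those two phases concrete. It is not ANTLR output, and it is not the query language from my servlet. It assumes a hypothetical toy grammar that accepts bracketed lists of numbers such as [1, 22, 333]. The tokenize method plays the role of the lexer, and the list method mirrors the single production rule. The input is rejected as soon as no rule accepts the next token.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch of the two phases a generated recognizer goes through.
// Hypothetical toy grammar (not the query language from this post):
//   list : '[' ( NUMBER ( ',' NUMBER )* )? ']' ;
class ToyRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    private ToyRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    // Phase 1, lexical analysis: turn characters into a list of token types.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // whitespace only separates tokens
            } else if (c == '[' || c == ']' || c == ',') {
                tokens.add(String.valueOf(c));         // single-character tokens
                i++;
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;                               // consume the whole number
                }
                tokens.add("NUMBER");
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
        }
        tokens.add("EOF");                             // marks the end of input
        return tokens;
    }

    // Consumes the next token if it is the expected one; otherwise rejects the input.
    private void match(String expected) {
        if (!tokens.get(pos).equals(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + tokens.get(pos));
        }
        pos++;
    }

    // Phase 2, parsing: one method per production rule.
    private void list() {
        match("[");
        if (tokens.get(pos).equals("NUMBER")) {        // the optional group is present
            match("NUMBER");
            while (tokens.get(pos).equals(",")) {      // zero or more ", NUMBER"
                match(",");
                match("NUMBER");
            }
        }
        match("]");
    }

    // True when the whole input conforms to the grammar.
    static boolean accepts(String input) {
        try {
            ToyRecognizer parser = new ToyRecognizer(tokenize(input));
            parser.list();
            parser.match("EOF");
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

<p>With this sketch, "[1, 22, 333]" and "[]" are accepted, while "[1,]" and "[1 2]" are rejected by the parser and "[a]" is rejected by the lexer.</p>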
<p>What happens if it finds two or more rules that match the next token? This is called ambiguity, which is a bad thing. The parser needs to decide which rule comes next, or it can&#8217;t proceed. The way to make that decision is to look ahead and see whether tokens further downstream resolve the ambiguity. Grammar specifications written for a parser with limited lookahead are harder to read. The rules have to be contorted so that a few tokens are always enough to decide.</p>
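<p>Here is a small illustration of a lookahead decision. It assumes a hypothetical grammar in which both alternatives of a statement rule begin with an identifier. Looking at one token cannot pick the alternative, so the sketch peeks at the second token, which is what an LL(2) parser would do. The rule and token names are made up for illustration.</p>

```java
import java.util.List;

// Hypothetical grammar: both alternatives of 'statement' start with ID, so one
// token of lookahead cannot choose between them; the second token can.
//   statement  : assignment | call ;
//   assignment : ID '=' NUMBER ;
//   call       : ID '(' ')' ;
class LookaheadDemo {

    // Returns the token i positions ahead (0 = current), or "EOF" past the end.
    static String peek(List<String> tokens, int i) {
        return i < tokens.size() ? tokens.get(i) : "EOF";
    }

    // Predicts which alternative of 'statement' applies, using two tokens of lookahead (LL(2)).
    static String predictStatement(List<String> tokens) {
        if (!peek(tokens, 0).equals("ID")) {
            return "syntax error";          // neither alternative can start here
        }
        switch (peek(tokens, 1)) {          // token 0 is ID either way; token 1 decides
            case "=":
                return "assignment";
            case "(":
                return "call";
            default:
                return "syntax error";
        }
    }
}
```

<p>A tool like ANTLR derives such prediction decisions from the grammar automatically. The sketch only shows why the second token is needed.</p>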
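<p>That is why the move from LL(k) to LL(*) is a big deal. With LL(k), you have to specify the maximum lookahead, and the bigger the number, the slower the parsing. With LL(*), the lookahead is not capped in advance; the parser can scan as far ahead as a decision requires. Version 3 also adds backtracking for decisions that this scanning cannot settle. A second new feature, memoization, keeps backtracking fast by remembering partial results instead of re-parsing the same input.</p>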
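<p>The following Java sketch shows the idea behind backtracking with memoization, in the style of a packrat parser. It is a hand-written illustration built on a hypothetical grammar, not the code ANTLR generates. ANTLR 3 turns these features on through its backtrack and memoize grammar options. In the sketch, the two alternatives of rule s share a prefix a, and a can nest parentheses arbitrarily deep, so the deciding token can sit arbitrarily far ahead. The parser tries the first alternative and rewinds on failure. The memo table then lets the second attempt reuse the earlier result for a instead of parsing it again.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of backtracking with memoization (packrat style), not ANTLR's generated code.
// Hypothetical grammar whose alternatives share an arbitrarily long prefix 'a':
//   s : a '!' | a '?' ;
//   a : '(' a ')' | 'x' ;
class MemoBacktracker {
    private final String input;
    // Memo table: start position of rule 'a' -> end position of the match, or -1 on failure.
    private final Map<Integer, Integer> memoA = new HashMap<>();
    int ruleAInvocations = 0;   // counts real (non-memoized) attempts at rule 'a'

    MemoBacktracker(String input) {
        this.input = input;
    }

    // Tries rule 'a' at pos; returns the position just after the match, or -1.
    int a(int pos) {
        Integer cached = memoA.get(pos);
        if (cached != null) {
            return cached;                          // reuse the earlier result instead of re-parsing
        }
        ruleAInvocations++;
        int result = -1;
        if (pos < input.length() && input.charAt(pos) == 'x') {
            result = pos + 1;
        } else if (pos < input.length() && input.charAt(pos) == '(') {
            int inner = a(pos + 1);
            if (inner != -1 && inner < input.length() && input.charAt(inner) == ')') {
                result = inner + 1;
            }
        }
        memoA.put(pos, result);
        return result;
    }

    // Rule 's': try each alternative in order, rewinding to position 0 on failure.
    boolean s() {
        for (char terminator : new char[] {'!', '?'}) {
            int end = a(0);                         // on the second alternative this is a memo hit
            if (end != -1 && end == input.length() - 1 && input.charAt(end) == terminator) {
                return true;
            }
        }
        return false;
    }
}
```

<p>For the input ((x))?, the counter ends at 3, one real attempt per starting position of a. Without the memo table, the second alternative would re-parse all three levels, bringing the count to 6.</p>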
<p>Other cool features in version 3 include the ability to use ANTLR to easily create filters and rewriters. With these new features, it is easy to use ANTLR to create log file analysis tools, for example. Grammar rules can now have return values, which let you easily propagate relevant information up the call stack of the rule traversal in a parsing session.</p>
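<p>Rule return values are easiest to see in a hand-written recursive descent parser, where each rule is a method. The sketch below assumes a hypothetical arithmetic grammar. Each rule method returns an int to its caller, much like an ANTLR 3 rule declared with returns [int value] hands its result up to the rule that invoked it.</p>

```java
// Sketch of rule return values: each rule method returns a value to its caller.
// Hypothetical grammar (for illustration only):
//   expr : term ('+' term)* ;
//   term : NUMBER ('*' NUMBER)* ;
class ReturnValues {
    private final String input;
    private int pos = 0;

    ReturnValues(String input) {
        this.input = input.replace(" ", "");   // this sketch simply ignores spaces
    }

    int expr() {
        int value = term();
        while (pos < input.length() && input.charAt(pos) == '+') {
            pos++;
            value += term();                    // the sub-rule's result propagates upward
        }
        return value;
    }

    int term() {
        int value = number();
        while (pos < input.length() && input.charAt(pos) == '*') {
            pos++;
            value *= number();
        }
        return value;
    }

    int number() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalStateException("expected NUMBER at " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Parses the whole text and returns the value computed by the top-level rule.
    static int evaluate(String text) {
        ReturnValues parser = new ReturnValues(text);
        int value = parser.expr();
        if (parser.pos != parser.input.length()) {
            throw new IllegalStateException("trailing input at " + parser.pos);
        }
        return value;
    }
}
```

<p>With this sketch, ReturnValues.evaluate("2 + 3 * 4") returns 14. The term rule returns 12 for 3 * 4 before expr adds it to 2.</p>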
<p>What do you do with a string once it has been parsed? Earlier versions of ANTLR offered two options. You could embed your own commands (actions) in the parser class, which get executed when the corresponding rule is chosen. Or you could create an AST (Abstract Syntax Tree) that can be traversed after parsing has concluded.</p>
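<p>The minimal Java sketch below illustrates both options. It is built around a hypothetical one-rule grammar for sums and assumes the input has already been split into tokens. The first method does its work in an embedded action while the tokens are being matched. The second builds a small abstract syntax tree during the parse, and a separate method walks that tree afterwards. Both reach the same result; the tree version simply defers the work until parsing is done.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Two ways to act on a parse of the hypothetical rule:  sum : NUMBER ('+' NUMBER)* ;
// Input is an already-lexed token array such as {"1", "+", "2"}.
class ActionsVersusAst {

    // Option 1: embedded action. The work happens while the rule is being matched.
    static int sumWithActions(String[] tokens) {
        int total = Integer.parseInt(tokens[0]);
        for (int i = 1; i < tokens.length; i += 2) {
            if (!tokens[i].equals("+")) {
                throw new IllegalStateException("expected '+'");
            }
            total += Integer.parseInt(tokens[i + 1]);   // action fires as each NUMBER is matched
        }
        return total;
    }

    // Option 2: build an abstract syntax tree during the parse...
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String text) {
            this.text = text;
        }
    }

    static Node buildTree(String[] tokens) {
        Node root = new Node(tokens[0]);
        for (int i = 1; i < tokens.length; i += 2) {
            if (!tokens[i].equals("+")) {
                throw new IllegalStateException("expected '+'");
            }
            Node plus = new Node("+");                  // left-associative: earlier sums nest on the left
            plus.children.add(root);
            plus.children.add(new Node(tokens[i + 1]));
            root = plus;
        }
        return root;
    }

    // ...then traverse it after parsing has finished.
    static int evaluate(Node node) {
        if (node.text.equals("+")) {
            return evaluate(node.children.get(0)) + evaluate(node.children.get(1));
        }
        return Integer.parseInt(node.text);
    }
}
```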
<p>Version 3 adds a third option: using StringTemplate to generate output. However, that has not proven to be very useful in my own exploration of version 3.</p>
<p>I also spent part of the weekend stress testing the servlet with Tsung. I wanted to see how well ANTLR could perform when expected to serve about two requests per second for an hour, or roughly 7,200 requests in total.</p>
<p>What I found was that ANTLR easily handled the load, taking around one millisecond per parse. A surprise discovery for me was that the AST version performed just as well as the embedded commands version. I had expected the AST version to take longer, since it has to traverse the AST data structure afterwards. There was no noticeable difference in latency between the two approaches.</p>
<p><a href="http://www.docstoc.com/docs/61821470/Scalable-Server-Side-Language-Parsing">Scalable Server Side Language Parsing</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=161</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eclipse: Twilight Saga for Coders</title>
		<link>http://www.dynamicalsoftware.com/news/?p=149</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=149#comments</comments>
		<pubDate>Sat, 18 Sep 2010 23:07:44 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Apache Maven]]></category>
		<category><![CDATA[Aptana]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Eclipse]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Maven2]]></category>
		<category><![CDATA[Mylyn]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Ruby on Rails]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=149</guid>
		<description><![CDATA[Here are some of the Eclipse plugins that I recommend.]]></description>
			<content:encoded><![CDATA[<p>Coders come in two flavors, those who use a simple text editor and those who code with a full featured IDE or Integrated Development Environment. An IDE does a lot more than edit text files. Building, packaging, dependency management, refactoring, and code generation are just some of the additional functionality that typically comes with an IDE. Modern IDEs also reach into more phases of software development including requirements, analysis,  debugging, and deployment.</p>
<p>Many IDEs are available, including Visual Studio and NetBeans; however, the IDE that I want to talk about today is Eclipse. Originally IBM&#8217;s replacement for the atrocity known as VisualAge for Java, <a title="Eclipse.org Home" href="http://www.eclipse.org/" target="_blank">Eclipse</a> has grown into a highly pluggable IDE for just about any software development language and platform.</p>
<p>It&#8217;s that high pluggability, along with the rich marketplace of freely available, high-quality plugins that can be surfaced within Eclipse, that makes Eclipse so special. Here are some of the plugins that I use on my home machine.</p>
<p>There are lots of plugins for Eclipse, and it would be hard to set up just what you need if you started with a blank Eclipse, just the shell with no plugins. That&#8217;s OK, because there are lots of convenient packagings of plugins for you to start from. Some of the best packages feature Eclipse for <a title="Eclipse IDE for Java Developers" href="http://eclipse.org/downloads/packages/eclipse-ide-java-developers/heliosr" target="_self">Java developers</a>, <a title="Eclipse IDE for Java EE Developers" href="http://eclipse.org/downloads/packages/eclipse-ide-java-ee-developers/heliosr" target="_self">J2EE developers</a>, <a title="Eclipse for PHP Developers" href="http://eclipse.org/downloads/packages/eclipse-php-developers/heliosr" target="_self">PHP developers</a>, <a title="Eclipse CDT" href="http://www.eclipse.org/cdt/" target="_self">C++ developers</a>, and <a title="Eclipse Modeling Tools" href="http://eclipse.org/downloads/packages/eclipse-modeling-tools-includes-incubating-components/heliosr" target="_self">modelers</a>. Web developers might want to take a look at <a title="Aptana" href="http://www.aptana.com/" target="_self">Aptana</a>, especially if you do Ruby on Rails.</p>
<p>The good news is that you can mix and match. You can have multiple installations of Eclipse, each of which you can enhance with additional plugins. I could have both the JDT (Java Development Tools) and the PDT (PHP Development Tools) running as two separate instances of Eclipse. Or I could just install the <a title="PHPEclipse" href="http://www.phpeclipse.com/" target="_self">PHPEclipse</a> plugin into the JDT. Why pick one over the other? Use PDT if you are a full-time PHP developer. Pick PHPEclipse if you mostly do Java and just need to do a little PHP coding every now and then.</p>
<p>If you find that the Eclipse Modeling Tools (EMT) package is overkill for your design needs, then check out the Eclipse plugin for <a title="Free UML Tool for Fast UML Diagrams" href="http://www.umlet.com/" target="_self">UMLet</a>.</p>
<p>Another favorite plugin of mine is <a title="Pydev" href="http://pydev.org/" target="_self">PyDev</a>, which is helpful for Python and Django developers.</p>
<p>Not only are there plugins for different development languages, there are also plugins for different frameworks. The Eclipse plugins for <a title="DataNucleus Access Platform - Eclipse Tutorial" href="http://www.datanucleus.org/products/accessplatform/guides/eclipse/index.html" target="_self">JDO</a>, <a title="SpringSource Tool Suite" href="http://www.springsource.com/developer/sts" target="_self">Spring</a>, <a title="Google Plugin for Eclipse" href="http://code.google.com/eclipse/" target="_self">GWT, and GAE</a> are some of my favorites in this category.</p>
<p>Eclipse plugins are also very helpful for code-related activities beyond writing code. <a title="subclipse.tigris.org" href="http://subclipse.tigris.org/" target="_self">Subclipse</a> is a full-featured plugin for working with the <a title="Apache Subversion" href="http://subversion.apache.org/" target="_self">Subversion</a> version control system. <a title="m2eclipse: Project Home" href="http://m2eclipse.sonatype.org/" target="_self">M2Eclipse</a> lets Eclipse developers easily build their projects and manage their project dependencies using <a title="Apache Maven" href="http://maven.apache.org/" target="_self">Maven2</a>. With <a title="Mylyn: Code at the Speed of Thought" href="http://www.tasktop.com/mylyn/" target="_self">Mylyn</a>, you can access and keep track of your list of development tasks right within the Eclipse GUI. There are many Mylyn connectors for various project management tools, including <a title="Free Rally Project Management" href="http://www.rallydev.com/agile_products/editions/community" target="_self">Rally</a>, <a title="CollabNet TeamForge" href="http://www.collab.net/products/ctf/" target="_self">CollabNet TeamForge</a>, <a title="FogBugz - Bug &amp; Issue Tracking, Project Management, Help Desk Software" href="http://www.fogcreek.com/fogbugz/" target="_self">FogBugz</a>, and <a title="The Trac Project" href="http://trac.edgewall.org/" target="_self">Trac</a>.</p>
<p>If you are a coder who prefers using an IDE, then consider Eclipse for its wealth of plugins that span a wide variety of languages, frameworks, and methodologies.</p>
<div id="attachment_158" class="wp-caption aligncenter" style="width: 310px"><a rel="attachment wp-att-158" href="http://www.dynamicalsoftware.com/news/?attachment_id=158"><img class="size-medium wp-image-158" title="Mylyn Connector Discovery Dialog" src="http://www.dynamicalsoftware.com/news/wp-content/uploads/2010/09/mylynConnectorDiscovery-300x168.png" alt="Surfacing Project Management within Eclipse" width="300" height="168" /></a><p class="wp-caption-text">Surfacing Project Management within Eclipse</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=149</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adopting Real Time Communication: Openfire vs Tigase</title>
		<link>http://www.dynamicalsoftware.com/news/?p=143</link>
		<comments>http://www.dynamicalsoftware.com/news/?p=143#comments</comments>
		<pubDate>Sat, 04 Sep 2010 20:04:03 +0000</pubDate>
		<dc:creator>glenn</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Extensible Messaging and Presence Protocol]]></category>
		<category><![CDATA[Instant messaging]]></category>
		<category><![CDATA[Openfire]]></category>
		<category><![CDATA[Tigase]]></category>

		<guid isPermaLink="false">http://www.dynamicalsoftware.com/news/?p=143</guid>
		<description><![CDATA[You are interested in adding real-time communication, but which open source project should you use?]]></description>
			<content:encoded><![CDATA[<p>There is a growing trend right now to add real time communication to your enterprise collaboration solution. Companies are taking a page from the play books of Facebook and Yahoo (who copied Twitter) and are adding Instant Messaging to their service offerings. When it comes to IM, consider XMPP which is the protocol designed from day one for IM.</p>
<p>When setting up your own chat server, there are two open source projects that you should seriously consider: Openfire and Tigase.</p>
<p>Sponsored by such corporate leaders in IM as Jive, IBM, Apple, and Google, <a title="Ignite Realtime: Openfire Server" href="http://www.igniterealtime.org/projects/openfire/" target="_blank">Openfire</a> is an XMPP server written in Java and stewarded by Ignite Realtime.</p>
<p><a title="Open Source and Free (GPLv3) Jabber/XMPP Server" href="http://www.tigase.org/" target="_blank">Tigase</a> has humbler, more traditional open source origins. An engineer seasoned in server development had an &#8220;itch to scratch&#8221; and started building this XMPP server.</p>
<p>How are Tigase and Openfire similar? They both fully support the <a title="The XMPP Standards Foundation" href="http://xmpp.org/" target="_blank">XMPP</a> standard and many of the more popular extensions to that standard. They both match the high extensibility of the protocol with pluggable architectures of their own. They are both written in Java.</p>
<p>You can extend the chat server functionality by writing your own plugins and components. The heart of XMPP is its three main stanzas: IQ, presence, and message. IQ (info/query) is a request-and-response mechanism, most familiar from roster (buddy list) management. Presence is how those away and available notifications get delivered. Messages are what you text back and forth with your friends. If you want to add new functionality to those stanzas, or create your own, then you should write a plugin. If you want to write a chat bot (i.e. a robot that you can exchange messages with), then you should write a component.</p>
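<p>For reference, here is a minimal Java sketch that assembles each of the three stanzas as a string, to show what they look like on the wire. The JIDs and the id value are made-up examples. Real XMPP code would write stanzas onto an open XML stream with a proper XML library rather than concatenating strings. The sketch only shows the shape of each stanza as described in the XMPP core and instant messaging specifications (RFC 6120 and RFC 6121).</p>

```java
// Sketch of the three core XMPP stanza types as raw XML strings.
// Example addresses and ids are made up; real code would use an XML library and a live stream.
class Stanzas {

    // Escapes the five XML special characters in attribute values and text ('&' first).
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // message: the text you send back and forth.
    static String message(String to, String body) {
        return "<message to='" + esc(to) + "' type='chat'><body>" + esc(body) + "</body></message>";
    }

    // presence: availability notifications; show values include away, chat, dnd, and xa.
    static String presence(String show) {
        return "<presence><show>" + esc(show) + "</show></presence>";
    }

    // iq: a request/response exchange; here, a request for the user's roster.
    static String rosterRequest(String id) {
        return "<iq type='get' id='" + esc(id) + "'><query xmlns='jabber:iq:roster'/></iq>";
    }
}
```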
<p>A JID (Jabber IDentifier) is how you log in to the server and how you address other users. JIDs look a lot like email addresses. A component has its own JID, whereas a plugin does not.</p>
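<p>A full JID has the form localpart@domainpart/resourcepart. The resource part is optional, and a server or component JID may consist of a domain part alone. The sketch below splits a JID into those pieces in the order RFC 7622 describes: first the resource, at the first slash, then the localpart, at the first @. It is a simplification that skips the normalization and validation a real server performs, and the example addresses are made up.</p>

```java
// Sketch of splitting a JID (localpart@domainpart/resourcepart) into its parts,
// in the order described by RFC 7622. No normalization or validation is applied.
class Jid {
    final String local;     // null when absent, e.g. for a component such as echo.example.com
    final String domain;
    final String resource;  // null for a "bare" JID

    Jid(String local, String domain, String resource) {
        this.local = local;
        this.domain = domain;
        this.resource = resource;
    }

    static Jid parse(String jid) {
        String rest = jid;
        String resource = null;
        int slash = rest.indexOf('/');          // the resource is everything after the first '/'
        if (slash >= 0) {
            resource = rest.substring(slash + 1);
            rest = rest.substring(0, slash);
        }
        String local = null;
        int at = rest.indexOf('@');             // the localpart is everything before the first '@'
        if (at >= 0) {
            local = rest.substring(0, at);
            rest = rest.substring(at + 1);
        }
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("missing domainpart: " + jid);
        }
        return new Jid(local, rest, resource);
    }

    // The bare JID drops the resource, e.g. juliet@example.com.
    String bare() {
        return local == null ? domain : local + "@" + domain;
    }
}
```

<p>Because the resource is split off first, a resource may itself contain an @ character. For example, a@b/c@d parses to localpart a, domain b, and resource c@d.</p>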
<p>How are Tigase and Openfire different? Openfire provides a web-based administrative interface, which you can extend with your own JSPs. You can add your own plugins by packaging them in a WAR file and adding them through the admin interface. Openfire depends on being hosted by a J2EE application container such as Tomcat. Tigase is a stand-alone program that runs by itself. The only administrative interface for Tigase is a text-based properties file that you edit before launching the Tigase service. To get Tigase to load your plugins and components, you just have to ensure that your JAR files are on the classpath before starting Tigase.</p>
<p>With Tigase, you can also plug in your own repository instead of depending on their relational database. Tigase is more oriented towards high scalability with its support for clustering, virtual hosts, and thread management.</p>
<p>Which one is best for your needs? If you are more concerned about IT footprint and cost containment than you are about serving large numbers of people, then you should start with Openfire. If you already know that you are going to have to serve large numbers of users and are willing to commit the resources of a server farm, then you won&#8217;t go wrong with Tigase.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.dynamicalsoftware.com/news/?feed=rss2&amp;p=143</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

