Configuring Your Application

Modern applications need to be configurable. What good is an application that stops working when the database or some other dependent service moves? I find that applications tend to accumulate a large amount of configuration information over time. When the project manager can’t make a decision, the engineer turns it into a configuration switch to defer the decision until later.

This information doesn’t change while the application runs. It just needs to be loaded into memory at startup so that it can be used for the rest of the run. This is true for both client and server applications.

You don’t need a database to store this information between runs; that would be overkill. Besides, your configuration is most likely managed by configuration management software, which knows how to write text files. In all likelihood, your configuration should be stored in a text file.

In what format should this information be stored, and how should it be read in and parsed? That depends on how the configuration information is organized, in other words, on how complex it is. In my experience, it comes down to two choices: a set of name/value pairs or something that is context sensitive.

What does context sensitive mean in this case? It means that the actual structure of the information depends on content that appears earlier. For example, I recently wrote a lightweight configuration management system in which configuration files are generated using a model-driven template approach. The model can look something like this vastly simplified example.

class Server attribute port;
class MySql extends Server attribute password template mysqlconf;
object db instanceof MySql attribute port : '3306', password : 'test' ;

You could argue that this could be captured as a series of name/value pairs, but the actual names would depend on the preceding class name (identified by what follows the instanceof keyword in this case). Here, db may use password because MySql declares it, and port because MySql inherits it from Server.
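
To make the context sensitivity concrete, here is a minimal Java sketch that holds the simplified model in memory and checks an object’s attribute names against its class chain. It is not the system described above and not Antlr-generated code: the model is hard-coded rather than parsed, the template clause is ignored, and the names ModelCheck, ClassDef, allowedAttributes, and unknownAttributes are hypothetical. It assumes Java 16 or later (for the record syntax) and no cycles in the extends chain.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModelCheck {
    // One "class" declaration from the model: name, optional parent, and its own attributes.
    // (Hypothetical representation; the template clause is not modeled.)
    record ClassDef(String name, String parent, List<String> attributes) {}

    // Collects every attribute a class accepts by walking up its "extends" chain.
    // Assumes the chain has no cycles.
    static Set<String> allowedAttributes(Map<String, ClassDef> classes, String className) {
        Set<String> allowed = new LinkedHashSet<>();
        ClassDef current = classes.get(className);
        while (current != null) {
            allowed.addAll(current.attributes());
            current = current.parent() == null ? null : classes.get(current.parent());
        }
        return allowed;
    }

    // Returns the attribute names an object uses that its class chain does not declare.
    static Set<String> unknownAttributes(Map<String, ClassDef> classes, String className,
                                         Map<String, String> values) {
        Set<String> unknown = new LinkedHashSet<>(values.keySet());
        unknown.removeAll(allowedAttributes(classes, className));
        return unknown;
    }

    public static void main(String[] args) {
        // class Server attribute port;
        // class MySql extends Server attribute password template mysqlconf;
        Map<String, ClassDef> classes = new LinkedHashMap<>();
        classes.put("Server", new ClassDef("Server", null, List.of("port")));
        classes.put("MySql", new ClassDef("MySql", "Server", List.of("password")));

        // object db instanceof MySql attribute port : '3306', password : 'test' ;
        Map<String, String> db = Map.of("port", "3306", "password", "test");

        System.out.println(allowedAttributes(classes, "MySql"));      // [password, port]
        System.out.println(unknownAttributes(classes, "MySql", db));  // []
        // The same pairs are not valid if db were declared as a plain Server.
        System.out.println(unknownAttributes(classes, "Server", db)); // [password]
    }
}
```

The point of the sketch is that the same pair (password : 'test') is valid or invalid depending on the class named earlier, which a flat name/value reader has no way to know.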

The easiest way to parse a model language like this one is to use a parser generator such as Antlr. You specify the grammar and tell Antlr to generate the source code needed to parse input in that grammar’s format. Add a little boilerplate code to invoke the parser, and you get an easily traversable object hierarchy that holds the information according to your specification.

This is easy to code and runs quickly, but it requires that you understand enough compiler theory to write the grammar specification. That’s not for everyone, so I don’t recommend Antlr if all you have is a set of name/value pairs. What should you use in that case?

It depends on whether the name/value pairs are flat or hierarchical. Flat name/value pairs look something like this.

dbport=3306
dbpassword=test

For Java developers, the best way to load this kind of information is the java.util.Properties class. You call its load method to read the file and its getProperty method to access the value of any property. You can also enumerate all of the properties and use the store method to write updated values back to the file if need be. This is, by far, the simplest approach, and a lot of applications can use it without compromise.
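
A minimal sketch of those Properties calls, assuming the two flat pairs shown above. The class and method names (FlatConfig, loadFile, parse), the file name app.properties, and the dbhost key are hypothetical; parse reads from a string only so the example runs without a file on disk.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class FlatConfig {
    // Reads flat name/value pairs from a file, e.g. a hypothetical "app.properties".
    static Properties loadFile(Path path) throws IOException {
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(path)) {
            props.load(in);
        }
        return props;
    }

    // Same parsing from a string, so the example runs without a file on disk.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // A StringReader will not fail here, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse("dbport=3306\ndbpassword=test\n");

        // getProperty returns a String (or null), so convert types yourself.
        int port = Integer.parseInt(props.getProperty("dbport"));
        // The two-argument form supplies a default when the key is absent.
        String host = props.getProperty("dbhost", "localhost");
        System.out.println(port + " " + host); // 3306 localhost

        // Enumerate every property that was loaded.
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }

        // Change a value and write all pairs back out in the same text format.
        props.setProperty("dbport", "3307");
        StringWriter out = new StringWriter();
        props.store(out, "updated configuration");
        System.out.print(out);
    }
}
```

Because every value comes back as a String, numeric settings such as the port need an explicit conversion in your own code.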

Hierarchical name/value pairs look something like this.

Db:
       - port: 3306
         password: test

The three most popular formats for storing hierarchical name/value pairs are XML, YAML, and JSON. Historically, XML has been the most popular choice for Java developers, but good libraries for parsing JSON are available in Java as well. YAML is more popular among Ruby and Python programmers, but there are parsing libraries for Java too.
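
As an illustration, the Db example above might be written as XML and read with the DOM API that ships with the JDK (javax.xml.parsers plus org.w3c.dom). The element names and the DomConfig class are assumptions made for this sketch, not a schema from any real system, and the XML is parsed from a string where a real application would more likely pass a file or stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DomConfig {
    // A hypothetical XML rendering of the Db example.
    static final String XML =
        "<config><db><port>3306</port><password>test</password></db></config>";

    // Parses the whole document up front, then reads the text of <db>/<name>.
    // Returns null when the setting is missing.
    static String dbSetting(String xml, String name) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            NodeList dbs = doc.getElementsByTagName("db");
            if (dbs.getLength() == 0) {
                return null;
            }
            // Walk the in-memory tree: the first <db>, then its first <name> descendant.
            Element db = (Element) dbs.item(0);
            NodeList matches = db.getElementsByTagName(name);
            return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dbSetting(XML, "port"));     // 3306
        System.out.println(dbSetting(XML, "password")); // test
    }
}
```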

With XML you get two flavors of parser: SAX and DOM. SAX is an event-driven approach in which you code handlers that are called as the XML is being parsed. DOM does all the parsing up front and then returns an object hierarchy that your code traverses to access the underlying data. SAX is usually faster than DOM but more complicated to code for, because you will most probably have to maintain an explicit stack during the parsing operation. If you prefer the DOM approach but want to provide your own classes instead of the generic Xerces classes, take a look at either the Apache Commons Digester project or JAXB.
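
For contrast, here is a rough SAX sketch over an XML rendering of the Db example, using the JDK’s javax.xml.parsers.SAXParserFactory and org.xml.sax.helpers.DefaultHandler. It keeps the explicit stack as a Deque of open element names and flattens each leaf element into a dotted key such as config.db.port. The class names, the element names, and the dotted-key convention are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxConfig {
    // Flattens leaf elements into dotted keys, e.g. config.db.port=3306.
    static class FlatteningHandler extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        // The explicit stack of open element names that SAX does not keep for you.
        private final Deque<String> path = new ArrayDeque<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            path.addLast(qName);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times for one element, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                values.put(String.join(".", path), value);
            }
            path.removeLast();
            text.setLength(0);
        }
    }

    static Map<String, String> parse(String xml) {
        try {
            FlatteningHandler handler = new FlatteningHandler();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<config><db><port>3306</port><password>test</password></db></config>";
        System.out.println(parse(xml)); // {config.db.port=3306, config.db.password=test}
    }
}
```

Even in this small case, the handler has to track where it is in the document by itself, which is the extra bookkeeping that DOM avoids by building the whole tree first.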

You have probably noticed by now that the system-level configuration of a typical application (client or server) could be modeled as context sensitive, as flat name/value pairs, or as hierarchical name/value pairs. The question isn’t so much which organizational style is mandatory as which one will make the most sense to the engineers who will maintain this application in the future. To me, if it looks like a catalog of system resources, it best fits the context sensitive approach. If it looks like default properties with some specific overrides, it best fits a hierarchical name/value approach. If we are just talking about how to connect to a small list of services, then flat name/value pairs are my preferred approach.

Depending on the complexity of your app’s configuration needs, choose the right technology for the job. In this topic, we covered three such choices: Antlr for context sensitive configurations, Properties for flat name/value configurations (the most popular), and XML/YAML/JSON parsing for hierarchical name/value configurations.