Category Archives: Patterns and Architecture

An Investigation on Database Id Generation Strategies (Part I)


In this post, I will investigate how the Id generation strategies chosen for the primary keys of business entities at the database level may impact the entire solution. Databases that serve as the repository of a transactional application do not work in a vacuum: they are part of a larger solution.

So, design decisions for the database should not be made away from the context of the design decisions of the entire solution. On the contrary, all main design decisions regarding the database of any transactional application should be made considering the impact they may have on the solution as a whole.

Proper trade-offs therefore apply to all main design decisions in a database that serves as the repository of a transactional application, including the Id generation strategies for all business entities that the application persists in the database.

Regarding the architecture of any given solution, we could say that it is all about the design decisions that we make and the consequences of such decisions.

Proper trade-offs are those that could strike the best possible balance of the consequences (both positive and negative) that may impact the solution in the short, mid and long term.

To conduct this investigation, I will take an ad-hoc approach:
I will first navigate the problem, so that we can all get a better grasp of what is at stake when we make isolated design decisions regarding Id generation strategies.

Once we have navigated the problem and have a good understanding of the pros & cons of such a way of making design decisions, we will be in a much better position to explore possible solutions to this problem.

We could not argue that we fully understand any given solution if we do not have a deep understanding of the problem that such a solution is meant to solve.

Since my interest is to investigate this general problem from an architectural perspective, I  will use common design patterns and tools, like Separation of Concerns, Model-View-Mediator (this is a generic way to refer to patterns like Model-View-Controller or MVC, Model-View-Presenter or MVP, or Model-View-ViewModel or MVVM) and Object Relational Mapping (ORM).

When I say “MVC” I mean any tool that implements the MVC form of the general Model-View-Mediator pattern, and not just ASP.NET MVC; it applies just as well to Spring.NET, or to any other MVC-based tool.

Why should we care to use these patterns and tools?

Mainly, because they are useful to us in a very practical way: they allow us to achieve our development goals with the least amount of effort on our part, provided we make proper use of them.

The principle of Separation of Concerns is a very pervasive principle in Software Architecture, since it is applied in just about any architectural tool that we could consider, like for instance, when we use any Model-View-Mediator based tool, or when we use any ORM tool.

The principle of Separation of Concerns (SoC) states that we should organize our code in chunks in such a way that any given chunk of code has a single, well-defined purpose and does not assume any superfluous responsibilities.

It means that if we choose to have an n-tier (or multiple layer) architecture, one of the main reasons behind this decision is the SoC principle.

It also means that if we choose to use some kind of Model-View-Mediator approach (like say, MVC, or MVVM, or MVP), one of the main reasons behind this decision is the SoC principle.

It would also mean that if we choose to use an ORM tool (like say, NHibernate, or EF), one of the main reasons behind this decision is the SoC principle.

With any of these tools and patterns, we use the concept of Model.

The Model is a software representation of a solution to a known problem.

The Model includes all the entities or business objects that are required by the solution to solve such a known problem.

Following the SoC principle, some chunk of code at some layer or tier will use these entities to apply the necessary logic that solves the business problem at hand.

By the same token, some other chunk of code at some other layer or tier will use these entities to persist their changes of state at the proper time and at the proper data repository.

The focus of my investigation will be at the level of this particular responsibility: how the different database id generation strategies affect the CRUD operations of business objects, and I will use an ORM tool as a helper for my analysis.

Speaking of ORM tools: why do we use them? What kind of problem do they solve for us?

As I have already said, the Model is a software representation of a solution to a known problem.

If we use an object-oriented representation of a given solution, such representation is aptly named the Object Model (OM) of said solution.

If we use an entity-relationship representation of a given solution, such representation is aptly named the Data Model (DM) of said solution.

For any given solution, its Object Model is very different from its Data Model.

If your team has to implement a solution with an OOP language like C# and a database like MS-SQL Server, such difference between the two representations of the solution poses a very serious problem to the software development effort of your team.

The formal name for this problem (the wide gap between the OM and the DM of a given solution) is Object-Relational Impedance Mismatch (ORIM).
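
To make the mismatch a bit more tangible, here is a minimal sketch (a simplified preview of the Order and LineItem entities used later in this post): in the Object Model the parent owns an in-memory collection of child objects, while in the Data Model the relationship is stored the other way around, as a foreign key column on the child table.

using System.Collections.Generic;

// Object Model side: the parent holds its children as an object collection.
public class Order
{
    public virtual int Id { get; set; }
    public virtual IList<LineItem> LineItems { get; set; }
}

public class LineItem
{
    public virtual int Id { get; set; }
    public virtual int Quantity { get; set; }
}

// Data Model side: there is no collection anywhere; instead, the LineItem table
// carries an Order_id foreign key column pointing back to its parent row.

Bridging these two shapes (collections versus foreign keys, references versus joins, inheritance versus tables) is exactly what the Mapping discussed below takes care of.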

It has been proven that a certain set of patterns is effective in the solution of the ORIM problem.

ORM tools are practical implementations of these patterns.

All ORM tools use a technique known as Mapping to bridge the gap of the ORIM problem.

ORM tools allow us to use a default set of Mapping rules and conventions, and they also allow us to customize the rules and conventions to be used by our implementation.

The simplest way to use any ORM tool is with the default set of Mapping rules and conventions.

In this post I will use NHibernate as a reference model for an ORM tool.

I will present and use concepts that are relevant to any mainstream ORM tool, but I will use the names of those concepts as they are referred to by NHibernate.

I will start with the simplest of examples, and I will gradually move on to more complex examples.

Since I want to explore how the different database Id generation strategies may affect the CRUD operations of business objects, in my first example I will let the ORM tool choose the Id generation strategy by relying on its default behaviour. I will then run some basic CRUD operations and use the debugging tools of the ORM engine to gather the information we need to analyze how good (or bad) the default Id generation strategy is from the perspective of the system as a whole.

To do this, I have chosen to use the approach commonly known as “Code First”, and let the ORM tool generate the database schema source code for the Model used in my first example.

I will use some POCO classes as the entities of my Model.

But before I go on, it would be useful to explore a little deeper into the Model and how it is used by the different layers or tiers.

When it comes to solving a given kind of problem, it is at the level of the Business Logic Layer where the “actual” solving of the problem happens.

When it comes to persisting and retrieving the state of business objects, it is at the level of the Data Access Layer where those kinds of operations happen.

At the level of the Business Logic Layer (BLL), all business objects (instances) of all business entities (entity classes) participate.

At the level of the Data Access Layer (DAL), only instances of persistent business entities (persistent entity classes) participate.

For many kinds of businesses, there is a subset of business entities that are non-persistent: that is, instances of such non-persistent entity classes are required and used at the BLL level, but none of such instances of such classes exist at the DAL level, which means that the database schema has no tables to represent the non-persistent entity classes.

At this point, it is very useful to present an example of such kind of scenario.

Let’s consider the following business example: a company has a customer loyalty program as part of its CRM business processes.

Some of the business processes involved in the customer loyalty program apply certain business rules based on algorithms that calculate metrics as a function of the “age” of a given customer in the customer loyalty program.

Let’s suppose that, for any given order, there are 10 different algorithms that use this “age” of the customer to calculate these metrics.

The “age” of a given customer in the customer loyalty program is the elapsed time, expressed in years as a real number, between the start date when the customer joined the program and today’s date.

We should all realize that the start date when any given customer joins the customer loyalty program has to be a public property of some business entity, and that this entity has to be a persistent entity class.

The “age” of a customer in the program, as a property, is a function of the start date and today’s date; it is not an independent property, so it should not be persisted.

Regarding the aforementioned algorithms (we have supposed that there are 10 different calculations for each new order), we could just as well use the persistent start date as a parameter to each one of them. But if we did so, then for each order we would calculate the very same subtraction ten times in a row, which is a clear waste of resources.

So, why not use some non-persistent business entities at the Business Logic Layer when it seems to be useful and it makes a lot of sense from many perspectives?

Now that we have gone through the rationale behind non-persistent business entities, let’s delve into a simple Object Model that could solve the “Tango with Persistent & Non-Persistent classes”:

[Class diagram: EntityHierarchy]

Now we can get back to the simplest way to use the “Code First” approach, so that our ORM tool of choice, using its defaults, generates the source code for the database schema of our Model. As we are using NHibernate as a reference model for any ORM tool, the simplest way to achieve what we need is with Automapping. What Automapping really means is that we use the default set of rules and conventions with very little customizing.

With Automapping we can tell our ORM tool to generate the source code of the database schema that corresponds to our Model, that is, the object model that represents the business entities of the domain of our solution.

Since the domain of our solution is comprised of two subsets, a subset of persistent business entities, and a subset of non-persistent business entities, we need to tell our ORM tool to generate a database schema that only includes the persistent business entities.

The code for the base classes that we need to solve the “Tango” is the following:

namespace SimpleAutoMap.Domain
{
    public abstract class EntityBase
    {
    }
}

namespace SimpleAutoMap.Domain
{
    public abstract class NonPersistentEntityBase : EntityBase
    {
    }
}

namespace SimpleAutoMap.Domain
{
    public abstract class PersistentEntityBase : EntityBase
    {
        public virtual int Id { get; set; }
    }
}

It is interesting to note that, in our model, the base class for all persistent business entity classes already has the Id property included: in this case, we are using implementation inheritance so as to save code!
It is also very important to note that so far, we have only dealt with the “Tango” of Persistent and Non-Persistent classes strictly from the perspective of pure implementation inheritance, and we still need to do some more work so that our ORM tool will work with the business entities as we expect it to do.
Now that we have our base classes in place, we can move on to the main classes of our (rather simple) model:
namespace SimpleAutoMap.Domain
{
    public class Product : PersistentEntityBase
    {
        public virtual string ProductName { get; set; }
    }
}

namespace SimpleAutoMap.Domain
{
    public class Customer : PersistentEntityBase
    {
        public virtual string CustomerName { get; set; }
        public virtual DateTime InceptionDate { get; set; }
        public virtual DateTime ClpStartDate { get; set; }
    }
}

namespace SimpleAutoMap.Domain
{
    public class LineItem : PersistentEntityBase
    {
        public virtual int Quantity { get; set; }
        public virtual decimal UnitPrice { get; set; }
        public virtual Product Product { get; set; }
    }
}

namespace SimpleAutoMap.Domain
{
    public class Order : PersistentEntityBase
    {
        public virtual DateTime OrderDate { get; set; }
        public virtual Customer Customer { get; set; }
        public virtual IList<LineItem> LineItems { get; set; }
    }
}

namespace SimpleAutoMap.Domain
{
    public class ClpProcessingOptions : NonPersistentEntityBase
    {
        public double Age { get; set; }
    }
}

(NOTE: in the original post, I forgot to include the properties InceptionDate and ClpStartDate in Customer. This is now fixed!)
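
To tie this model back to the earlier rationale, here is a minimal, hypothetical sketch (the helper class and the CalculateMetric1…CalculateMetric10 methods are placeholders of my own, not part of the sample solution): the “age” is computed once per order from the persistent ClpStartDate, stored in the non-persistent ClpProcessingOptions instance, and then reused by every algorithm.

public static class ClpAgeCalculator
{
    // "Age" in years, as a real number, counted from the persistent start date
    // (assuming an average year length of 365.25 days).
    public static double GetAgeInYears(Customer customer, DateTime today)
    {
        return today.Subtract(customer.ClpStartDate).TotalDays / 365.25;
    }
}

// Usage sketch: compute once, reuse in all ten algorithms.
// var options = new ClpProcessingOptions
// {
//     Age = ClpAgeCalculator.GetAgeInYears(order.Customer, DateTime.Today)
// };
// var metric1 = CalculateMetric1(order, options);   // hypothetical algorithm 1
// // ...
// var metric10 = CalculateMetric10(order, options); // hypothetical algorithm 10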

Before we go any further, it is worth saying a word about why all the properties of the persistent entities have the virtual modifier, while the properties of the non-persistent entities do not.

At this point I do not want to distract the attention away from the main goal of this post, but nonetheless I will give a short but proper answer to this valid and important question.

From the perspective of the engine of any ORM tool, the model is an atomic unit, in the sense that each and every entity class that is a part of the persistent subset of the model (the part of the model that is relevant to the ORM engine) is “created equal”.

Unless we say otherwise, when we tell the ORM engine to “load”, it will try to load into memory each and every instance of each and every entity class (which happens to be a real waste of resources!).

This way of behaving (the default behaviour) is aptly named eager loading. But if an ORM tool only supported eager loading, it would be of little use to us.

So, in order to be useful, all ORM tools also support another behaviour, aptly named lazy loading.

With lazy loading, we have complete programmatic control over when and how any given set of instances of any given entity class is loaded into memory by the ORM engine.

To support lazy loading, all entity classes that are to be handled by the ORM tool in this way MUST have all of their public properties declared as virtual.
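
As a minimal sketch of what lazy loading buys us (illustrative only; sessionFactory is assumed to be an NHibernate ISessionFactory configured elsewhere): loading an Order does not drag its Customer and LineItems into memory until they are actually touched.

using (var session = sessionFactory.OpenSession())
{
    // Only the Order row is fetched here.
    var order = session.Get<Order>(1);

    // The Customer proxy is resolved (and its SQL issued) only when we touch it.
    Console.WriteLine(order.Customer.CustomerName);

    // Likewise, the LineItems collection is loaded only when it is enumerated.
    foreach (var item in order.LineItems)
        Console.WriteLine(item.UnitPrice);
}

The virtual modifier is precisely what allows the ORM engine to substitute runtime proxy subclasses for our entity classes, which is the mechanism that makes this deferred loading possible.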

Well, now we can get back to our main interest: we have to figure out a way to tell the ORM engine to include in the Data Model only the entity classes that inherit from the class PersistentEntityBase.

With NHibernate this goal is very simple to achieve: the default set of rules and conventions is controlled by the class DefaultAutomappingConfiguration.

All we have to do is create a subclass of DefaultAutomappingConfiguration with the proper behaviour and use it in our implementation.

The class DefaultAutomappingConfiguration has a very useful method that will help us in what we want to achieve: the method ShouldMap.

The overload of this method that is interesting to our investigation has the following signature:

public virtual bool ShouldMap(Type type)

This overload in particular is very useful indeed, for it is virtual (which means that we can override it with our own specialized logic), and it receives as a parameter an object of the class Type.

This is simple and wonderful at the same time, as we can figure out how the ORM engine uses this overload: it iterates over the entire set of entity classes of the model and, for each entity class, passes it to this method and uses the outcome to determine whether that entity class has to be mapped or not.

This is exactly what we need in order to tell the ORM engine to map only those entity classes that inherit from the base class PersistentEntityBase.

So, our subclass of the base class DefaultAutomappingConfiguration looks like this:
namespace SimpleAutoMapping
{
    public class SimpleAutoMappingConfiguration : DefaultAutomappingConfiguration
    {
        public override bool ShouldMap(Type type)
        {
            return type.IsSubclassOf(typeof(PersistentEntityBase));
        }
    }
}

Finally, we are ready to tell our ORM tool to follow its default behaviour (with just a very simple customizing), and generate the database schema for the subset of the persistent entity classes of our model.

With a powerful ORM tool (like for instance, NHibernate!), we need a very simple routine to do this:

using System;
using System.Configuration;
using FluentNHibernate.Automapping;
using FluentNHibernate.Cfg;
using FluentNHibernate.Cfg.Db;
using NHibernate.Tool.hbm2ddl;
using SimpleAutoMap.Domain;
using SimpleAutoMapping;

class Program
{
    static void Main(string[] args)
    {
        string outputFileName = ConfigurationManager.AppSettings["OutputFileName"];

        var cfg = new SimpleAutoMappingConfiguration();

        var configuration = Fluently.Configure()
            .Database(MsSqlConfiguration.MsSql2008)
            .Mappings(m => m.AutoMappings.Add(
                AutoMap.AssemblyOf<Customer>(cfg)))
            .BuildConfiguration();

        var exporter = new SchemaExport(configuration);
        exporter.SetOutputFile(outputFileName);
        exporter.Execute(false, false, false);

        Console.WriteLine("\n\nDB schema source code.");
        Console.ReadLine();
    }
}

This routine generates a database schema that looks like this:

if exists (select 1 from sys.objects where object_id = OBJECT_ID(N'[FKDDD0206ACBEF7F6]') AND parent_object_id = OBJECT_ID('[LineItem]'))
alter table [LineItem] drop constraint FKDDD0206ACBEF7F6

if exists (select 1 from sys.objects where object_id = OBJECT_ID(N'[FKDDD0206A75BA3E60]') AND parent_object_id = OBJECT_ID('[LineItem]'))
alter table [LineItem] drop constraint FKDDD0206A75BA3E60

if exists (select 1 from sys.objects where object_id = OBJECT_ID(N'[FK3117099B4095694A]') AND parent_object_id = OBJECT_ID('[Order]'))
alter table [Order] drop constraint FK3117099B4095694A

if exists (select * from dbo.sysobjects where id = object_id(N'[Customer]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [Customer]

if exists (select * from dbo.sysobjects where id = object_id(N'[LineItem]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [LineItem]

if exists (select * from dbo.sysobjects where id = object_id(N'[Order]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [Order]

if exists (select * from dbo.sysobjects where id = object_id(N'[Product]') and OBJECTPROPERTY(id, N'IsUserTable') = 1) drop table [Product]

create table [Customer] (
Id INT IDENTITY NOT NULL,
CustomerName NVARCHAR(255) null,
InceptionDate DATETIME null,
ClpStartDate DATETIME null,
primary key (Id)
)

create table [LineItem] (
Id INT IDENTITY NOT NULL,
Quantity INT null,
UnitPrice DECIMAL(19,5) null,
Product_id INT null,
Order_id INT null,
primary key (Id)
)

create table [Order] (
Id INT IDENTITY NOT NULL,
OrderDate DATETIME null,
Customer_id INT null,
primary key (Id)
)

create table [Product] (
Id INT IDENTITY NOT NULL,
ProductName NVARCHAR(255) null,
primary key (Id)
)

alter table [LineItem]
add constraint FKDDD0206ACBEF7F6
foreign key (Product_id)
references [Product]

alter table [LineItem]
add constraint FKDDD0206A75BA3E60
foreign key (Order_id)
references [Order]

alter table [Order]
add constraint FK3117099B4095694A
foreign key (Customer_id)
references [Customer]

We can check that the ORM tool, with the small set of constraints that we have given it and its own default behaviour, has generated a database schema that uses IDENTITY-based primary keys on all entities.

How good (or bad) is this decision from the perspective of the entire solution (and not just from the perspective of the database itself)?
We will explore this in my next blog post (Part II of this investigation).

To download the code sample, click here

Kind regards, GEN

What is software architecture? What does a software architect do? How could we tell the good software architects from the bad software architects?


In this post I will deal with an interesting (and thorny) set of questions regarding architecture.

Without much further ado, let’s get into the game.

What is software architecture?

In a broad sense, software architecture is the complete set of design decisions that defines and determines the structure of a given software solution. Let’s bear in mind, though, that when we talk about “software architecture” we really mean the main design decisions that define the main structural elements of a given software solution, and not each and every design decision, which would include a large number of rather simple, obvious and very uninteresting decisions.

What does a software architect do?

In plain and simple terms, the software architect is the person that makes those design decisions, so, that is mainly what the software architect does:
making the design decisions regarding a given software solution.

How could we tell the good software architects from the bad software architects?

To be able to answer this question, we should start with a simpler question:

How could we tell a good design decision from a bad design decision?

Well, design decisions are either good or bad just because of their consequences: a design decision is good if and only if it produces good consequences, just as much as it will be bad if and only if it produces bad consequences.

So, getting back to the answer for the original question, the good software architects are the ones that make design decisions that have good consequences for the project, the team and the product, and the bad software architects are the ones that make design decisions that have bad consequences for the project, the team and the product.

I am aware of the fact that any Product Owner may argue that my answer is not useful for them since, by the time they realize that the software architect is not any good, it is already too late!

Well, allow me to say in my defense that this is not so, since you can detect the tell-tale signs of either good or bad consequences of the design decisions being made by the architect early in any software development project.

As a parting idea to this post, I will give you another tip on good software architects. All of them are really good at these two things:

1) Good software architects, when they have to make a design decision, never forget to ask the following question:

How could I make sure that the design decision that I’m about to make will neither compromise nor limit our ability to keep making the design decisions that we need, for the foreseeable future?

(Since you have allowed me a few things already, allow me to say that this is a quote of my own.)

2) Good software architects, when they ask themselves this question, they always figure out a successful answer to it for the solution at hand.

Most people may argue that the question in 1) is an impossible question, since we cannot predict the future, so there is no way we could guess today which design decisions we will need to make in the future.

Again, allow me to say in my defense that most people are not software architects (let alone good software architects), so most people will not pay attention to the key to this question: “neither compromise nor limit our ability to keep making the design decisions that we need”.

Good software architects do not need to predict the future to be able to figure out the answer to that question.

Kind regards, GEN

We should always use MVC

Every now and then, I bump into some forum where the merits of MVC are being discussed.

When I say MVC, I mean the pattern, and not the ASP.NET MVC tool.

There seems to be an ongoing (more like a never-ending) argument pertaining to major drawbacks of the pattern.

MVC is a pattern that has some differences with other patterns.

To start with, I should call MVC an architectural pattern, and not just a design pattern.

MVC is an architectural pattern simply because it defines the structure of an entire application, while design patterns only take care of some responsibility within an application.

The aspect that defines the character of MVC is Separation of Concerns. It clearly identifies the main concerns in an application: Presentation, Interactions with the actors, CRUD operations on business objects and other application objects.

Any application, large or small, has to take care of these responsibilities, so MVC could and should be applied to all types of applications. The point is how it is applied to small apps and how it is applied to large apps.

The key is scope. Let’s consider a console application as an example of a small app, like for instance, a command-line tool like grep.

It has presentation (it presents the results of its analysis on screen as a stream of characters), interactions with the actors (through command-line switches), and CRUD operations on business objects and other application objects (it searches for files, scans the contents of the files that match the search pattern, analyzes the lines in their content that match the target regular expression, and then compiles a list of results to present).

To be able to design a tool that could be maintainable, it would make sense to use MVC to separate the aforementioned concerns of this app.

The original grep tool was designed and developed in C, and there would be no major problems to use MVC with a C application. In C, we have structs, pointers, pointers to functions, dynamic memory allocation, release of resources, etc.

The main application execution loop should talk to the three concerns, the Model, the View and the Controller, and the concerns should talk with each other as the pattern indicates. They should use events (callbacks based on function pointers) to talk to each other asynchronously.

Each concern should have its own set of helper functions that would take care of specific responsibilities (parse command-line switches, search files that match search string, scan file content line by line, compare line text with regular expression, compile list of results).
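
As an illustrative sketch only (written in C#, the language used elsewhere in this blog, with type and method names invented for the example rather than taken from the actual grep sources), the separation could look roughly like this: the Controller interprets the actor’s input, the Model does the searching and matching, and the View renders the results as a stream of characters.

using System;
using System.IO;
using System.Text.RegularExpressions;

// Model: owns the business logic (file scanning and pattern matching).
class GrepModel
{
    public event Action<string> MatchFound; // callback-style notification, as described above

    public void Search(string path, string pattern)
    {
        var regex = new Regex(pattern);
        foreach (var line in File.ReadLines(path))
        {
            if (regex.IsMatch(line) && MatchFound != null)
                MatchFound(line);
        }
    }
}

// View: presentation only (a stream of characters on the console).
class GrepView
{
    public void ShowMatch(string line)
    {
        Console.WriteLine(line);
    }
}

// Controller: interprets the actor's input (command-line switches) and coordinates.
class GrepController
{
    private readonly GrepModel model;
    private readonly GrepView view;

    public GrepController(GrepModel model, GrepView view)
    {
        this.model = model;
        this.view = view;
        this.model.MatchFound += this.view.ShowMatch;
    }

    public void Run(string[] args)
    {
        // args[0] = pattern, args[1] = file (argument parsing kept deliberately trivial)
        model.Search(args[1], args[0]);
    }
}

The same shape carries over to the C implementation described above, with structs for the three concerns and function pointers in place of the event.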

As we can see, if we can apply this architecture in C, we can certainly apply it to any other modern programming language like Java or C#.

On a larger application, we would still have the three concerns talking to each other in a similar fashion, but we would have more internal structures, components and layers to take care of, as a larger application will also have to take care of many other quality attributes, like scalability, availability, security, etc.

As briefly shown by my comments, the key is the scope. If you wisely use scope with your implementations of MVC, there is no way that you will misuse it, abuse it, or otherwise face drawbacks.

Kind regards, Gastón

In engineering, form follows failure, not function


The recent Amazon Web Services (AWS) outage made it clear that some customers were affected while others were not, because the latter had designed their solutions on the cloud with an additional combination of redundancy and elimination of single points of failure.

This event clearly shows that the problem does not lie in the maturity of the cloud solutions available on the market, but rather in a simplistic view of how organizations should implement this kind of solution.

Put another way, instead of saying “I believe in Cloud Computing”, we should say “I take Cloud Computing at its word”.

If we really “take Cloud Computing at its word”, we should apply the same availability design principles used by the cloud solution we contract when determining which components, dependencies and interactions will guarantee a fault-tolerant design.

To complement these ideas, here are two posts that I find interesting:

http://gigaom.com/cloud/how-to-design-your-service-for-failures-in-the-cloud/

http://www.techrepublic.com/blog/networking/how-innovative-design-allowed-one-cloud-company-to-withstand-amazons-recent-outage/3995

Kind regards, GEN

Let’s clear the confusion regarding multiple implementation inheritance in C++

Every now and then, in some of the dev forums and communities I frequently participate in, I find that some developers seem to have some confusion about why C++ offers multiple implementation inheritance and what its real purpose is.
Here I include an answer I gave recently on one such forum, as I find it useful for other devs.

Let’s not confuse (multiple) implementation inheritance, which is what C++ has, with (multiple) interface inheritance, which is what Java and C# have.

C++ does not have interfaces, but it does have abstract classes.

One of the reasons why abstract classes exist in C++ is to be able to do what interfaces do for Java and C#.

So, just as it is necessary to allow for multiple interface inheritance, C++ must support multiple implementation inheritance to be able to use abstract classes as interfaces and to have multiple “interface” inheritance (with abstract classes) just like Java and C# have.

Why is it necessary to allow for multiple interface inheritance?

Well, to find a short and simple way to explain this requires some pattern thinking.

Let’s suppose that we have a certain class that participates in a few mechanisms or design patterns.

In each mechanism, that class will have its own role in the society of classes that participate in the pattern.

So, the same class will have a set of roles, and in each role it will have to comply with the semantics of the mechanism it belongs to.

Each set of semantics will be formed by public methods and properties.

Each one of these sets of public methods and properties is an interface of that class, and, to be useful, the class has to expose one such interface for each of the mechanisms it participates in.

As we all know, the way to make software architecture possible is to allow for components that contain classes that can work as interchangeable parts: weak coupling!

But to have a simple way to implement that weak coupling, we need interfaces (or their sibling: abstract classes).

As you see, there is a clear requirement for multiple interface inheritance.
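
As a small sketch in C# (with interface and class names invented purely for illustration), the same class can play a role in two different mechanisms, one interface per mechanism, which is exactly the multiple interface inheritance being discussed:

using System;

// Role in an Observer-style mechanism.
interface IChangeNotifier
{
    event EventHandler Changed;
}

// Role in a persistence-style mechanism.
interface IPersistable
{
    void Save();
}

// One class, two roles: it inherits (implements) both interfaces at once,
// and each mechanism depends only on the interface it cares about.
class CustomerRecord : IChangeNotifier, IPersistable
{
    public event EventHandler Changed;

    public void Save()
    {
        // persistence logic would go here
        if (Changed != null)
            Changed(this, EventArgs.Empty);
    }
}

In C++, the same shape would be expressed with two abstract classes containing only pure virtual functions, and CustomerRecord deriving from both of them.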

If we care to analyze most Application Frameworks done over the years for C++, we may find the unexpected surprise that they do not use multiple implementation inheritance with concrete classes, and they just use it with abstract classes.

It does make sense to ask why companies like Borland and Microsoft did not use a good thing like multiple implementation inheritance with concrete classes.

Well, we could argue a few possible answers, but I would guess that the simplest one suffices: multiple implementation inheritance with concrete classes is simply not practical beyond trivial code samples, because it is too complex.

Application Frameworks are really complex beasts in their own right.

Another idea that we could entertain as an explanation is coupling: we try that our architectures use as much weak coupling as possible, and just to be very clear, we are talking about decoupling dependencies between concrete classes.

One way to achieve that is with interface inheritance, which in C++ takes the form of implementation inheritance from abstract classes.

Well, if we were to follow this principle, it does not make much sense to use multiple implementation inheritance with concrete classes, as it increases dependencies between concrete classes, which is the opposite effect of the principle.

Over the years, I have read a few times a question similar to the following:

If Java and C# also support abstract classes like C++ does, why don’t they also support multiple implementation inheritance?

Well, the truth is that multiple interface inheritance, support for abstract classes and single implementation inheritance is a much better solution for the inheritance paradigm than just multiple implementation inheritance with support for abstract classes.

Both Java and C# architects and developers can’t go wrong with their abstractions, as the possibility to make that mistake is avoided from the start, while that is not the case with C++.

If SOAP Services are a standard and REST services are not, why bother with REST services?

This seems to be a very powerful statement: why waste any time at all with REST?

With REST, all parties involved (providers and consumers) have to agree on most aspects of the design, as there are no standards beyond the actual (HTTP) GET, POST, PUT and DELETE.

What benefit could come out of REST?

Well, I think that to be able to offer a good operational answer to this question, we should picture a simple scenario.

Let’s suppose that you have an information system that maintains weather information for many locations, and you update that information into the system on a timely basis (like say, every 15 minutes).

Like any system, it should support the usual CRUD functions: CREATE (INSERT), READ (SELECT), UPDATE (UPDATE), DELETE (DELETE).

You have many distributed weather data capture stations that send data to a centralized service following the aforementioned update period.

You need to offer a scalable and simple service façade to both the data providers (data capture stations) and the data consumers (other systems that query for weather info).

SOAP web services at first seem to be fine with this requirement (why shouldn’t they?).

But when you start thinking about quality attributes and architectural decisions that would better give support to those attributes, you start wondering…

One characteristic of the entire operation is a thorny issue: within each and every 15-minute period, data does not change!

So, all consumers of data for any given location (with the location ID being itself a parameter of the query) would always get the same result within any given 15-minute period.

SOAP web services are great for highly transactional systems, that is, systems that are affected by high rates of data changes per unit of time.

For the kind of scenario that we are conceiving, SOAP web services are not necessarily that great, as we would benefit from a service that could make good use of web page caching mechanisms (that are very common on most web servers).

Well, the good thing about REST services is that, from the perspective of the web server where they are hosted, they appear and behave as simple web pages.

As that is the case, we can actually make good use of standard page caching mechanisms with REST services, in particular, with GET calls.

In WCF REST services, we can use standard Output Cache mechanisms (including cache profiles), starting with VaryByParam, VaryByParams and VaryByCustom.
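
As a hedged sketch of what this can look like (the service, operation, URI template and profile name are invented for this example; the attributes and configuration elements are the standard .NET 4 ones, and ASP.NET compatibility mode is assumed to be enabled for the service): a GET operation is tied to an ASP.NET output cache profile that varies by the location ID.

using System.ServiceModel;
using System.ServiceModel.Activation;
using System.ServiceModel.Web;

// Hypothetical data contract for a weather reading.
public class WeatherInfo { /* temperature, humidity, etc. */ }

[ServiceContract]
public interface IWeatherService
{
    [OperationContract]
    [WebGet(UriTemplate = "weather?locationId={locationId}")]
    WeatherInfo GetWeather(string locationId);
}

[AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
public class WeatherService : IWeatherService
{
    // Ties this GET operation to the "WeatherCache" output cache profile defined in web.config.
    [AspNetCacheProfile("WeatherCache")]
    public WeatherInfo GetWeather(string locationId)
    {
        // lookup of the latest reading for the location would go here
        return new WeatherInfo();
    }
}

// web.config (shown here as a comment): one cached entry per locationId value,
// kept for 900 seconds, i.e. the 15-minute update period of the scenario.
//
//   <system.web>
//     <caching>
//       <outputCacheSettings>
//         <outputCacheProfiles>
//           <add name="WeatherCache" duration="900" varyByParam="locationId" />
//         </outputCacheProfiles>
//       </outputCacheSettings>
//     </caching>
//   </system.web>

With something like this in place, repeated GET calls for the same location within a 15-minute window can be served from the web server’s output cache instead of re-executing the service code.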

So now, we have a better understanding of when and how to take advantage of REST services.

KR, GEN

Are we validating design decisions with metrics and facts?

One key factor that we should always consider is what actions we have taken to make sure that our design decisions are in line with the prioritized quality attributes of the product to be developed.

A good way to grasp this is with a simple example.

The argument here is about how we give factual support to our decisions, not about the technical merits of the decisions in the following two samples.

Let’s suppose that we have a routine that needs to do some work with a collection of integer values, and performance is the main concern (the topmost prioritized quality attribute).

Should we use an array as the collection? Or should we use a generic collection, like say List<int>?

We will compare two routines whose source code is as similar as it can be, except for the actual type of the collection holding the numeric integer values.

Code Sample 1 (array-based collection)

class Program
{
    public const int TOP_VALUE = 1000;

    static void Main()
    {
        DateTime start = DateTime.Now;

        int[] List = new int[TOP_VALUE];

        for (int i = 0; i < TOP_VALUE; i++)
        {
            List[i] = i;
        }

        for (int i = 0; i < TOP_VALUE; i++)
        {
            int j = (int) List[i];

            Console.WriteLine("{0}: {1}", i, j);
        }

        DateTime end = DateTime.Now;

        Console.WriteLine("Array Sample; TOP VALUE: {0}; Elapsed Time: {1}", TOP_VALUE, end.Subtract(start).ToString());

        Console.Read();
    }
}

 

Code Sample 2 (Generics-based collection)

class Program
{
    public const int TOP_VALUE = 1000;

    static void Main()
    {
        DateTime start = DateTime.Now;

        List<int> list = new List<int>();

        for (int i = 0; i < TOP_VALUE; i++)
        {
            list.Add(i);
        }

        for (int i = 0; i < TOP_VALUE; i++)
        {
            int j = list[i];

            Console.WriteLine("{0}: {1}", i, j);
        }

        DateTime end = DateTime.Now;

        Console.WriteLine("List<> Sample; TOP VALUE: {0}; Elapsed Time: {1}", TOP_VALUE, end.Subtract(start).ToString());

        Console.Read();
    }
}

 

The first sample, the one using the array-based collection, includes an explicit type cast on element access (intended to bring casting overhead into the picture); note that with a strongly typed int[] this cast does not actually introduce any boxing or unboxing.

Instead of giving way to assumptions and scenario analysis, we will use each one of the two routines to take some measures and calculate some simple performance metrics.

By changing the value of the constant, we will compare the performance of the same two routines with different sizes of the collections, from 1000 items all the way up to 10000000 items, in ten-fold increments.

We will also take three measures for each size of the collections, and then use the average value of each set of three measurements.

The performance metric we will measure is execution time (elapsed time of execution in seconds). You may see that the source code of the two routines already includes code to determine the elapsed time of execution.

To all practical purposes, the metrics captured for the two routines show just about the same elapsed time of execution for the same range of collection sizes, that is, from 1000 items to 10000000 items, which means that they experience and offer the same performance!

With this simple factual support, we can make a strong argument regarding performance without having to resort to assumptions or scenarios.

Let’s bear in mind that, if we were to argue with mere assumptions in a court of law, the other side would cry “hearsay” to the presiding judge.

Once we have metrics and facts to support whatever design decisions we make, we had better also have some good explanations of what may be happening under the hood.

That’s a good subject for an upcoming post.

See ya!