Incompatible AVRO schema in Schema Registry

19 Apr 2016

Testing what’s going on

My company uses Apache Kafka as the spine for its next-generation architecture. Kafka is a distributed append-only log that can be used as a pub-sub mechanism. We use Kafka to publish events once business processes have completed successfully, allowing a high degree of decoupling between producers and consumers.

These events are encoded using Avro schemas. Avro is a binary serialization format that enables a compact representation of data, much more than, for instance, JSON. Given the high volume of events we publish to kafka, using a compact format is critical.

In combination with Avro we use Confluent’s Schema Registry to manage our schemas. The registry provides a RESTful API to store and retrieve schemas.

Compatibility modes

The Schema Registry can control what schemas get registered, ensuring a certain level of compatibility between existing and new schemas. This compatibility can be set to one of the next four modes:

BACKWARD: a new schema is allowed if it can be used to read all data ever published into the corresponding topic.
FORWARD: a new schema is allowed if it can be used to write data that all previous schemas would be able to read.
FULL: a new schema that fullfils both registrations.
NONE: a schema is allowed as long as it is valid Avro.

By default, Schema Registry sets BACKWARD compatibility, which is most likely your preferred option in PROD environment, unless you want to have a hard time with your consumers not quite understanding events published with a newer, incompatible version of the schema.

Incompatible schemas

In development phase it is perfectly fine to replace schemas with others that are incompatible. Schema Registry will prevent updating the existing schema to an incompatible newer version unless we change its default setting.

Fortunately Schema Registry offers a complete API that allows to register and retrieve schemas, but also to change some of its configuration. More specifically, it offers a /config endpoint to PUT new values for its compatibility setting.

The following command would change the compatibility setting to NONE for all schemas in the Registry:

curl -X PUT http://your-schema-registry-address/config 
     -d '{"compatibility": "NONE"}'
     -H "Content-Type:application/json"

This way next registration would be allowed by the Registry as long as the newer schema were valid Avro. The configuration can be set for an specific schema too, simply appending the name (i.e., /config/subject-name).

Once the incompatible schema has been registered, the setting should be set back to a more cautious value.

Summary

The combination of Kafka, Avro and Schema Registry is a great way to store your events in the most compact way possible, while still retains the ability to evolve the corresponding schemas.

However some of the limitations that the Schema Registry imposes make less sense on a development environment. On some occassions, making incompatible changes in a simple way is necessary and recommendable.

The Schema Registry API allows changing the compatibility setting to accept schemas that, otherwise, would be rejected.

FAQ: Story points

03 Dec 2015

Story points are quite old, but there are still way too many misunderstandings around them. Below I'm going to try to shed some light on the most common doubts around them.

What are Story Points?
It's a way to measure the effort necessary to implement a story, where a story is some requirement that an Agile team is going to convert into working software.

How do they work?
You have a scale of values, you define a baseline (a really simple story that you would consider requires an effort of 1 point) and then you estimate everything relatively to that baseline story. If a story requires the same or less effort than your baseline, you give it 1 point. If it is roughly twice as difficult, you assign 2 points. The values in the scale have to be spacious enough to make sure you don't try to estimate "too precisely". Therefore many teams choose Fibonacci series as their scale (1, 2, 3, 5, 8, etc).

Wait a minute, what do you mean by "don't try to estimate too precisely"? And why not just estimating using time?
I mean exactly that. When you use this technique, you are implicitly recognizing that you can't provide meaningful estimations with the level of detail that a time estimation requires. In plain English, you recognize your estimations in time are not accurate, therefore they don't have any value.

Instead you use a more high-level, less-precise measure like story points. Even if it is less precise than a time-based estimation, it is more valuable because it's more stable and, overtime, it will be more helpful to forecast team and project progress.

Is effort all I have to take into account when estimating with story points?
Not necessarily, although it is the most important bit. Other things that you may consider are:

How clear are the requirements and acceptance criteria in the story?
Does it look like they may be many technical or business unknowns that will be discovered during the implementation phase?
Is there any technical risk? For example, are you using a technology for the first time?

The more question marks around the story, the higher the number of story points.

Can I sum story points?

No, you can't. They don't represent numbers, they represent buckets. That means that, when you have a story that is the same or less effort than your base line, you put in the 1-point bucket. When it's the same or less than twice the effort for your base line, you throw it to the 2-point bucket, etc. You get the point.

Also quite often the amount of time require to implement a 3-point story will be much more than 50% more the effort of a 2-point story. There is no linearity, not to mention that the higher the bucket, the wilder the oscillation in implementation time (which makes sense because the higher the risk too).

Is Story Points the only way to measure stories and forecast?

No, there are other metrics. T-shirt sizes is quite common too. Some people also consider using "ideal days". This one is, more or less, a representation of how much work you can do in a perfect day, without meetings, without distractions and without any other problem. Then you assign those ideal days to stories and, if you're working on sprints, over time you can measure how many actual ideal days your team has per sprint.

Do I have to use Story Points if I do Scrum?

Not at all. If you check the Scrum.org Scrum Guide, story points aren't mention anywhere. That makes all the sense, because contrary to what many people think, Scrum is a quite loose framework (not a process) that you have to fill in with your own practices to come up with a development process. Actually, years ago the Guide didn't even mention estimations. It just mentioned your backlog should be ordered and it was up to the Product Owner to discover what that order should be.

Why should I use Story Points then?

You shouldn't if you don't know why you would use them. And you would use them if you want to provide some forecasting regarding your project. Basically, been able to answer the question: "when is this going to be done?". Story Points help you answer that question because, overtime, you get some sense of how many points you can deliver per unit of time, where that unit of time is usually your sprint size in weeks. Based on that, you can be reasonable confident about how many stories you can get done and when, on a relatively close time horizon. Don't try to estimate a massive project using story points before even starting it, it won't work. You won't have enough understanding of the project, the stakeholders and the technology and your estimations will have zero value.

Why should I estimate in the first place?

Well, if you are a developer, estimating doesn't add any value to you; zero. You just want to get a list of things to do and nail them and you don't need to communicate in advance when they'll be done, right? However, some people would argue that part of been a professional engineer includes providing meaningful estimations regarding delivery of software to the rest of the business. In better words than mine:

Avoiding responsibility for estimates is another way of saying, “I’m not ready to be relied upon for building critical pieces of infrastructure.” All businesses rely on estimates, and all engineers working on a project are involved in Joint Activity, which means that they have a responsibility to others to make themselves interpredictable. In general, mature engineers are comfortable with working within some nonzero amount of uncertainty and risk.

So man up and come up with some respetable estimations that you're willing to commit to.

Should Management measure team's productivity using Story Points?

NEVER. That is one of the biggest mistakes that can be done. If you do so, you're going to make two mistakes in one:

You will ruin story points as a tool to estimate. Eventually every human being tends to trick any system rules, even unconsciously. If you measure people's productivity with points, they will just inflate their estimations to make it look like more points are delivered per sprint, therefore the team is doing more. Wrong and useless.
You'll miss the opportunity to use a proper and useful measure, like business value. Not saying that business value is easy to measure, though, but definitively worth trying instead of measuring something that is completely irrelevant and easy to trick.

What's the difference with Planning Poker?

Planning Poker is just a estimation technique, not a estimation measure. You use planning poker as a way to take advance of the "Wisdom of Crowds". Planning Poker is useful because:

Estimations are done and presented without knowing other members' opinion. Therefore more junior/shy members won't be influenced by estimations presented by senior/stronger players.
If estimations don't match, a healthy debate is triggered where more information is brought into the discussion for those that have bigger/smaller numbers. That benefits the final estimation and also helps all team benefit from the insights of each member.

Is that all?

Not really, there are many other things that are interesting on this topic, like trying to correlate points with time (bad idea IMHO) , what a good scale for points should look like, what to do if you realize after implementing a story that it was over/under estimated, how to manage scope creep, etc. Maybe for another day.

Knockout: bindear booleano a radio button

09 Jan 2015

Knockout JS es un framework realmente útil para hacer páginas web dinámicas. Últimamente he tenido la oportunidad de utilizarlo bastante y, siendo un total inexperto, puedo decir que facilita mucho las cosas y tiene una excelente documentación.

Sin embargo, recientemente me encontré en una situación en la que tuve que invertir bastante tiempo para conseguir hacer funcionar un binding entre un par de radio buttons, una propiedad JavaScript del viewModel de Knockout y la correspondiente propiedad en el modelo MVC que recibía mi acción del controlador.

El error

El error se manifestaba simplemente no funcionando el binding entre mi propiedad JS del viewModel y los correspondiente radio buttons. En un primer momento el binding parecía ir bien, pero al enviar la página de vuelta al servidor y retornar al mismo punto por existir algún error en la validación del modelo MVC, el binding no saltaba y los radio buttons no se marcaban según los valores que el usuario hubiera elegido.

La razón

Los radio buttons definen sus valores como strings mientras que la propiedad del modelo MVC era booleano. Entre medias Knockout intentaba “lidiar” entre ambas, pero al recibir el modelo MVC de vuelta tras la validación, su valor booleano no bindeaba correctamente con los radio buttons por ser sus valores cadenas en lugar de booleanos también.

La solución

Probablemente haya múltiples, pero en mi caso la más sencilla fue utilizar un binding custom entre los radio buttons con valores string y la propiedad JS del viewModel.

https://gist.github.com/javierholguera/7bad4fa6e6a4e1aa2ffb

Utilizando un interceptor entre los valores que llegan de los radio buttons y la propiedad, podemos convertir en ambos sentidos entre booleanos y strings convenientemente.

El custom binding se aplica sobre el binding “checked” habitual de los radio buttons, de forma que podemos reaprovechar todo el mecanismo ya existente.

Referencias

StackOverflow - Knockoutjs (version 2.1.0): bind boolean value to select box

Por qué hacemos self = this en JS?

01 Jan 2015

Cuando empecé a utilizar Knockout JS, descubrí que existía una cierta convención según la cual, al crear la función Constructor del modelo que se “bindea” con Knockout, la primera línea siempre era la siguiente:

var self = this;

No tenía ni idea de porqué se hacía esto, pero entendía que era necesario para, posteriormente, poder definir y añadir los distintos métodos y propiedades que formaban el modelo.

Ahora, gracias a los vídeos de “JavaScript The Good Parts” de Douglas Crockford en Pluralsight, por fin sé el porqué de esta misteriosa pero indispensable línea. Y las razones son realmente dos.

Una función anidada no tiene acceso al this externo

Como bien explica Douglas, cuando dentro de una función definimos otra función, la función anidada no tiene acceso al puntero this que la función externa ha recibido.

Para superar esta limitación, en ocasiones veremos código en el que se define una línea como la siguiente.

var that = this;

Con esto lo que conseguimos es capturar en la función superior el puntero this, para posteriormente permitir a la función anidada acceder a la variable that que contendrá el mismo valor que contenía this cuando fue capturado.

Exactamente la misma técnica es la que estaremos aplicando al capturar this en una variable de nombre self. De esta forma las distintas funciones que crearemos como parte de la definición del modelo, podrán acceder al this que originalmente recibió la función constructora.

Pero, por qué querríamos acceder a ese this original que la función constructora recibe? Aquí entra en juego el segundo principio que define esta técnica.

La función constructora recibe el nuevo objeto en this

Esta es la otra clave de esta técnica. Cuando una función se invoca con el operador new (como hacemos al crear el modelo), se crea un nuevo objeto y se asigna al puntero this que recibe la función que estamos invocando con el operador.

Esto a efectos prácticos significa que el this que recibiremos en la función constructora es el propio nuevo objeto al que estaremos añadiendo propiedades y métodos como parte del código de dicha función.

De este peculiar modo nuestra función constructora no sólo inicializa el objeto como haría un constructor de un lenguaje estático como Java o C#, sino que también añade la propia funcionalidad al asignar funciones y propiedades.

Routing with WCF

18 Jul 2012

Today we face a problem in Production environment. We needed to route some WCF requests from one “publicly visible” server to an internal one. A typical routing scenario.

Fortunately these requests were received in a WCF service and this technology has a built-in routing feature since 4.0 version. To use it we don’t need to change any code, it is enough to modify the app.config/web.config of the services. Here we can see how to use it, step by step. All these XML code will be place inside system.serviceModel tag

First Step – Define the new Service

We have to define a new Routing service that will receive all the requests. It will, later, internally dispatch them depending on certain routing rules. The XML necessary is:

<services>
  <service name="System.ServiceModel.Routing.RoutingService" behaviorConfiguration="routerConfig">
    <endpoint address=""
              binding="basicHttpBinding"
              contract="System.ServiceModel.Routing.IRequestReplyRouter"
              name="reqReplyEndpoint" />
  </service>
</services>

Two things that we may notice:

The services needs a behavior configuration. In this configuration, later, we will define the routing table.
We don’t define an address because we assume the service will be deployed in a IIS server. If we want to do some tests with Casinni, we will need to define an address.

Second Step – SVC hosting file for the new service

Since WCF 4.0 we don’t need the SVC files to host a WCF service, we can define them in app.config/web.config and the internal plumbery of WCF/IIS is smart enough to allow us calling the corresponding URL as if the SVC file really exists. This is the XML necessary for that:

<serviceHostingEnvironment>
  <serviceActivations>
    <add relativeAddress="RiskManagementServiceUAT.svc" service="System.ServiceModel.Routing.RoutingService, System.ServiceModel.Routing, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
  </serviceActivations>
</serviceHostingEnvironment>

Two more points to consider:

The relative address must contain an extension. For example, if we define it as “RiskManagementService.UAT”, without SVC extension, it will fail.
The service needs to specify the complete qualify name in this case, but it is not usual. In other projects that we have used this “virtual SVC” system, it was not necessary. Apparently there is some kind of limitation with the RoutingService.

Third Step – Service behavior configuration

We referenced a service behavior when we defined the service in step 1. Below we can see that configuration, that will need to indicate what routing table we have to use.

<behaviors>
  <serviceBehaviors>
    <behavior name="routerConfig">
      <serviceMetadata httpGetEnabled="true" />
      <serviceDebug includeExceptionDetailInFaults="true" />
      <routing routeOnHeadersOnly="false" filterTableName="routingTable" />
    </behavior>
  </serviceBehaviors>
</behaviors>

Things to consider here:

Be sure the name of the behavior matches what you set when defined the service.
serviceMetadata may not be necessary in a production environment, if you don’t expect new clients to be created from your service WSDL.
serviceDebug MUST NOT be activated in production environment. It is a security risk.
In the routing tag we will indicate the name for the routing table to be used in this RoutingService service.

Fourth Step – Routing table

We are getting close to the end… We need to define the routing table. There is complex patterns that we may want to follow, like taking into account headers or contents in the messages. It is really useful for versioning, load-balancing and similar stuff. However, in our concrete scenario we only want to redirect all messages to the internal server, so we didn’t need a complex solution. This was our table:

<routing>
  <filters>
    <filter name="matchAll" filterType="MatchAll" />
  </filters>

  <filterTables>
    <filterTable name="routingTable">
        <add filterName="matchAll" endpointName="RiskService" />
    </filterTable>
  </filterTables>
</routing>

Three important points here:

Under filters tag we will define all possible filters. In this case we use the MatchAll filter. In bibliography there is a link for all filter types in WCF.
Under filterTables tag we define all the possible tables. We may have different tables for different routers. In our case, we define a “routingTable” in the service behavior in Step 3 and here it is.
As part of every table we will bound filters with endpoints. This is a very flexible approach, we can define as many possible filters as we may need for all our endpoints and later just correlate them in the filter table.

filterName will be the name of the filter to apply.
endpointName will be the name of the endpoint where the message will be routed when the filter is matched. This endpoint corresponds to a new element defined in the next step.

Fifth Step – Client endpoints

The router is a sum of input messages, routing logic and destination services. We have defined 2 of 3. In this point we will define what services will receive the messages once the routing is done. Here it is is the XML.

<client>
  <endpoint name="RiskService"
            address="http://appserver5/RiskManagementTool/RiskManagementService.svc"
            binding="basicHttpBinding"
            contract="*" />
</client>

So what do we have here? Just being “client” of the service where we want to send messages under certain criteria defined in Step 4. Pay attention to the name that we give to the endpoint, because this name must match the value for endpointName in the entries of the filter table defined in Step 4.

Conclusions

WCF is not a easy technology, but it is really powerful and, with enough knowledge and patience there is a great number of scenarios that you can cover with just defining appropriate XML configuration. No coding, no compiling, no deploying, just playing with the app.config/web.config and you get a very powerful Routing Service that is able to route base on headers, message content, protocol, etc.

Bibliography

Older Newer

El Javi