solarisBank Core Banking: beginning of a journey
As a tech company with a banking license, we at solarisBank are excited to start introducing more tech-related topics on our blog. Our tech team members already talk about these topics at various meetups and conferences, but we want to dive a bit deeper and showcase some of the cogs and wheels behind our banking platform.
For our first Tech post, we turn to one of our freshest products, Core Banking, and hand over the writing to some of our awesome team members: Daria, Egor, and Felipe.
Three years ago, solarisBank started with a small project known as Digital Banking. It gave our partners — businesses using our services — the chance to create their own banking products by connecting to our platform via RESTful API. This service was a way to orchestrate the essential processes within the underlying core banking system. It provided a CRUD (Create, Read, Update, Delete) based interface and was built using the Ruby programming language.
Over time, the number of our partners increased, and we started developing many new products as well. We saw a need to revise our initial approach to the Digital Banking architecture. The starting point was separating the communication with our providers from the partners' interface. The two had been coupled before, and removing this coupling would let us add new API features more easily. With this goal in mind, we started turning the Digital Banking monolith into microservices. This is what we are doing now: combining modern software architecture design patterns with accounting techniques that have been used for centuries.
This is how the idea of our in-house Core Banking product was born. By focusing less on features for partner products, the new team could concentrate more on internal banking processes. Later on, this idea evolved into a subledger system for payment-related accounts.
Core Banking at solarisBank
We want to build our system to be as stable as we can. That means our APIs should have clearly defined boundaries around the underlying software systems, so that if any part of our infrastructure has issues, only the affected functionality becomes inoperable. For the rest of the system, the problem shouldn't exist at all.
For this reason, we decided to build an abstraction on top of our providers' APIs and shift all the underlying communication one level down. Reducing the complexity of our API products also means that we filter out unnecessary data before sending requests to our providers. At the same time, we keep the request-and-response consistency that our partners are already used to.
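To make this a bit more concrete, here is a minimal sketch of such an adapter in Go. Every name in it (the request types, the fields being filtered, the injected transport) is an illustrative assumption for the sake of the example, not our actual interface.

```go
package provider

// partnerRequest is what a partner sends to our public API.
// The Purpose field is only relevant internally.
type partnerRequest struct {
	CustomerID string
	IBAN       string
	Purpose    string
}

// providerRequest is the reduced payload the provider actually needs.
type providerRequest struct {
	CustomerID string
	IBAN       string
}

// Adapter hides the provider-specific communication in one place,
// so the partner-facing API never talks to the provider directly.
type Adapter struct {
	send func(providerRequest) error // provider transport, injected
}

// CreateAccount filters out the data the provider does not need and
// forwards only the reduced request one level down.
func (a *Adapter) CreateAccount(req partnerRequest) error {
	return a.send(providerRequest{
		CustomerID: req.CustomerID,
		IBAN:       req.IBAN,
	})
}
```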
By introducing an internal abstraction like this, we can also provide better tools to our partners: even when we add new functionality, we can be sure that all the data passed to our providers stays intact. To ensure that we don't overlook anything, we decided to use Clean Architecture, and we chose to apply Event Sourcing and Command Query Responsibility Segregation (CQRS).
These terms are not as well-known as we’d like them to be, so if you are not sure what they mean, here’s a brief intro:
- Event sourcing means that the changes to the state of an object are stored as an immutable sequence (or “log”) of events.
- With CQRS we split an application into two parts: one that writes the changes (the command side) and one that reads them back (the query side). A minimal sketch of both ideas follows below.
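To illustrate both ideas, here is a deliberately simplified sketch in Go. The event names, the in-memory store, and the balance projection are all invented for the example; they are not our production types, and a real system would persist the events rather than keep them in a slice.

```go
package main

import "fmt"

// Event is one immutable fact that happened to an account.
type Event struct {
	Type   string // e.g. "AccountOpened", "MoneyDeposited"
	Amount int64  // amount in cents; zero for events that carry none
}

// Command side: the only thing it ever does is append new events.
type EventStore struct {
	events []Event
}

func (s *EventStore) Append(e Event) {
	s.events = append(s.events, e) // never updated or deleted, only appended
}

// Query side: builds a read model (a "projection") from the events.
type BalanceProjection struct {
	Balance int64
}

func (p *BalanceProjection) Apply(e Event) {
	switch e.Type {
	case "MoneyDeposited":
		p.Balance += e.Amount
	case "MoneyWithdrawn":
		p.Balance -= e.Amount
	}
}

func main() {
	store := &EventStore{}
	store.Append(Event{Type: "AccountOpened"})
	store.Append(Event{Type: "MoneyDeposited", Amount: 10_00})
	store.Append(Event{Type: "MoneyWithdrawn", Amount: 3_00})

	// The query side replays the log to answer "what is the balance?".
	var projection BalanceProjection
	for _, e := range store.events {
		projection.Apply(e)
	}
	fmt.Println(projection.Balance) // 700
}
```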
Event sourcing to the rescue
When we introduce a change to our data, we have to ensure that the new data is valid. If for some reason it isn't, we have to be able to restore the previous state of the system as if the error had never happened. In addition, our license from the German Federal Financial Supervisory Authority requires us to keep an audit log. And, as with any platform carrying this kind of responsibility, all our systems ought to be consistent, traceable, and scalable.
The main problem we faced in tracing our data change processes was that traditional CRUD data storage systems do not have all the features we need. Every relational database has a transaction log, but it isn't exactly designed for the use cases of the banking domain. The solution the software world came up with is event sourcing: a log of our own, designed to fit our specific needs and, importantly for us, one that scales very well. One of the members of our team, Armin Pasalic, gave a great talk about event sourcing.
In short, an event sourced system does not store state, but a chain of events that have led to that state. From this chain, we can rebuild the state of a business entity at any given time. In addition, events make our data storage error-resistant, because we can figure out the exact state of the system at any given moment. This makes it easier to discover errors and their causes, and it leads to quick and easy fixes: once we've identified the corrupt data, we can emit a correcting event.
Yes, we correct errors by adding data — not erasing or replacing it. The reason is simple: events are history, and history cannot be rewritten. This feature of an event sourced system gives us the audit log mentioned above. Completely for free!
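Sticking with the same made-up example from above: a wrong booking stays in the log forever, and the correction is just one more event appended on top of it. Replaying the full log yields the corrected state.

```go
package main

import "fmt"

// Illustrative events only; names and amounts are invented.
type Event struct {
	Type   string
	Amount int64 // cents
}

// balance replays the full log; corrections are ordinary events.
func balance(log []Event) int64 {
	var b int64
	for _, e := range log {
		switch e.Type {
		case "MoneyDeposited":
			b += e.Amount
		case "DepositCorrected":
			// The original deposit stays untouched in the log;
			// a negative amount here undoes the overbooking.
			b += e.Amount
		}
	}
	return b
}

func main() {
	log := []Event{
		{Type: "MoneyDeposited", Amount: 100_00}, // should have been 10_00
		{Type: "DepositCorrected", Amount: -90_00},
	}
	fmt.Println(balance(log)) // 1000, i.e. the intended 10.00
}
```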
Now we know that everything that ever happened in our system goes to the event store. Thus, we can treat it as the source of truth for the whole system. This leads us to:
- No cross-dependencies between our services. They all listen to updates on the event store and act upon them. No single point of failure anymore.
- Infinite read scalability. With CQRS we can build as many projections of our data as we need, identical or completely different (see the sketch after this list).
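As an illustration of that second point, the same (again, made-up) event stream can feed several independent read models, each maintained by its own consumer and potentially its own store:

```go
package main

import "fmt"

// Illustrative event shape; not our production schema.
type Event struct {
	Type    string
	Account string
	Amount  int64 // cents
}

// Projection 1: current balance per account.
func balances(events []Event) map[string]int64 {
	out := map[string]int64{}
	for _, e := range events {
		switch e.Type {
		case "MoneyDeposited":
			out[e.Account] += e.Amount
		case "MoneyWithdrawn":
			out[e.Account] -= e.Amount
		}
	}
	return out
}

// Projection 2: a flat audit trail of everything that happened.
func auditTrail(events []Event) []string {
	var out []string
	for _, e := range events {
		out = append(out, fmt.Sprintf("%s %s %d", e.Account, e.Type, e.Amount))
	}
	return out
}

func main() {
	events := []Event{
		{Type: "MoneyDeposited", Account: "DE01", Amount: 50_00},
		{Type: "MoneyWithdrawn", Account: "DE01", Amount: 20_00},
	}
	fmt.Println(balances(events))   // map[DE01:3000]
	fmt.Println(auditTrail(events)) // [DE01 MoneyDeposited 5000 DE01 MoneyWithdrawn 2000]
}
```

Because each projection only reads from the event store, we can add, rebuild, or scale them independently of the write path.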
However, every software developer knows that nice things don’t come for free. There are always tradeoffs. What are the downsides of using event sourcing?
- Our event store is going to be huge at some point. But when we hit the limits of our store, the consistency boundaries of aggregates will help us distribute the events across multiple stores.
- Every time we want to add something new, we need to build the state based on events. On the bright side, this gives us more assurance of data correctness.
- Eventual consistency: the query side lags slightly behind the command side, because processing the events takes some time.
- Enforced backward compatibility: when the event structure changes, we can't drop support for older versions. If we can't read early events, we can no longer rebuild the state (see the sketch after this list).
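To give an idea of what keeping that support can look like, one common approach is an "upcaster" that translates old event versions into the current shape when reading, while the stored events stay untouched. The versions and field names below are invented for the example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Current shape (v2): the amount is split into value and currency.
type MoneyDepositedV2 struct {
	Version  int    `json:"version"`
	Value    int64  `json:"value"`
	Currency string `json:"currency"`
}

// Old shape (v1) only had an amount, implicitly in EUR cents.
type moneyDepositedV1 struct {
	Amount int64 `json:"amount"`
}

// upcast reads any stored version and returns the current one.
// Old events stay in the store as they are; we only translate on read.
func upcast(raw []byte) (MoneyDepositedV2, error) {
	var probe struct {
		Version int `json:"version"`
	}
	if err := json.Unmarshal(raw, &probe); err != nil {
		return MoneyDepositedV2{}, err
	}
	if probe.Version >= 2 {
		var e MoneyDepositedV2
		err := json.Unmarshal(raw, &e)
		return e, err
	}
	// Version 0/1: map the old structure onto the new one.
	var old moneyDepositedV1
	if err := json.Unmarshal(raw, &old); err != nil {
		return MoneyDepositedV2{}, err
	}
	return MoneyDepositedV2{Version: 2, Value: old.Amount, Currency: "EUR"}, nil
}

func main() {
	oldEvent := []byte(`{"amount": 1000}`)
	e, _ := upcast(oldEvent)
	fmt.Printf("%+v\n", e) // {Version:2 Value:1000 Currency:EUR}
}
```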
Event sourcing is no silver bullet. But for financial ledger systems, the concepts around event sourcing are a natural fit.
Our first try with event sourcing
Before diving deep into the fascinating world of event sourcing, the Core Banking team decided to start with a proof of concept. It was an application written in Ruby, with the event store in MySQL. The team chose Ruby for fast prototyping, and MySQL simply because it was what was available at the time.
Why not Kafka, you might ask? Before starting a project this ambitious, the team had many conversations with experts who had event sourced systems running in production. Some of them were using Kafka, and they were not very happy with it. If you'd like to know more, check out a talk given by one of our staff engineers, Satyajit Ranjeev. His team built a similar system at OptioPay, and in this talk he discusses the challenges they faced.
So how did our first try with event sourcing go, and where are we today?
The prototype for handling account data was built and put to the test. That was a year ago, and it ended up being the starting point for the Core Banking team. The service turned out to be fast, stable, and reliable: what else is there to ask for? Since then, the team has grown from only 2 people to 7. Today we continue developing our domain. Our tools now are Go and PostgreSQL, and Domain-Driven Design (DDD) is our guiding star. The proof of concept is still out there, serving its purpose. There is no reason to shut down a perfectly working part of the infrastructure.